| Internet-Draft | aipref-autoctl | September 2025 | 
| Silver | Expires 12 March 2026 | [Page] | 
This Internet-Draft proposes a category entitled "AI Substitutive Use", which would enable parties to express a preference regarding how digital assets are used by automated processing systems. It focuses on post-training (inference-time) uses that are likely to result in AI-generated outputs that substitute for the original asset. The proposal is for this category to nest within the larger category of Automated Processing, currently envisaged in the working group draft [AIPREF-VOCAB] (21 July 2025).¶
This note is to be removed before publishing as an RFC.¶
The latest revision of this draft can be found at https://datatracker.ietf.org/doc/draft-silver-aipref-vocab-substitutive/. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-silver-aipref-vocab-substitutive/.¶
Discussion of this document takes place on the AI Preferences Working Group mailing list (mailto:ai-control@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/ai-control/. Subscribe at https://www.ietf.org/mailman/listinfo/ai-control/.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 12 March 2026.¶
Copyright (c) 2025 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Existing mechanisms for expressing preferences, including those under consideration by the AI Preferences WG, do not address strongly articulated concerns about the practice of using digital assets as input to AI models to generate outputs that substitute for or undermine the value of the original assets. This gap leaves a broad group of stakeholders (including creators, journalists and publishers) without a means to express a preference regarding a type of use that is already having a material adverse impact on their rights. Developers and deployers of AI systems are also left without a clear, standardized preference signal regarding such uses, which results in "blunt" approaches to gathering such content and exposes them to legal risk. This proposal defines a tailored preference category to address that specific need, improve visibility for all parties and support continued broad access to information and content.¶
The use of digital assets for inferencing "in real time" is widespread as a means of improving the accuracy and contextual relevance of outputs, for example through techniques such as Retrieval-Augmented Generation (RAG) [RAG2020]. The flipside of that value is that such outputs tend to substitute for or dilute the value of the original asset, which decreases user engagement with the original asset. This harms revenue opportunities and undermines the ability of the owner or distributor of the original asset to connect directly with their intended audience. For example, the use of journalistic material to create AI-generated summaries has resulted in a substantial reduction of internet traffic to online publications. In the longer term, this jeopardises the sustainability of those enterprises and the underlying incentives to create and publish such material. To mitigate this, some are moving content behind paywalls and deploying other means of limiting open access, which diminishes access to information and content.¶
Should incentives to create diminish, AI innovation will also suffer, as there will be less high-quality content on which to build a distribution funnel. This would also undermine the sustainability and verifiability of news and information services relied upon by the public and government institutions. Where the AI model or platform takes on the role of information gatekeeper and shaper, connections between the public and original sources can be severed (or warped). This undermines the ability and willingness of internet users to verify that what they are reading, hearing or watching matches the original source(s), allowing factual misrepresentations to propagate unchecked.¶
Creators have also justifiably expressed the need for a preference that addresses the use of their assets to create derivative works "in the style of" such original assets. Creators are harmed by the unfettered use of their works as inputs to AI models to create outputs that dilute the market for those works by adopting distinctive elements and styles established by the creators themselves. Such use also harms their moral rights and their interests in protecting the integrity of their works and ensuring attribution.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
For the purposes of this document, the following terms are used:¶
Post-training (inference-time): Uses of an AI/ML model that occur after the model has been trained and frozen, typically when generating outputs in response to inputs at runtime.¶
Retrieval-Augmented Generation (RAG): A technique where external content is retrieved at query time and supplied to a model to condition the generated output. This document references RAG as a common mechanism by which substitutive outputs may be produced [RAG2020].¶
AI Substitutive Use: The act of using one or more assets as input to a trained AI/ML model (as opposed to the training of the model), where that use results in an output that incorporates, summarizes, aggregates or reproduces the assets, including stylistic elements thereof; provided, however, that this category does not cover the use of a lawfully acquired digital asset where carried out directly by an end user (as opposed to a search application or bot) as input to a trained model to create a summary of such digital asset.¶
The use of assets for AI Substitutive Use is a proper subset of Automated Processing usage [AIPREF-VOCAB].¶
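The subset relationship can be illustrated with a minimal Python sketch. It rests on two assumptions that this document does not state: the category labels `ai-substitutive-use` and `automated-processing` are hypothetical identifiers chosen for illustration, and the fallback rule (a preference stated for the broader category applies to the nested one when nothing more specific is stated) is one plausible reading of nesting rather than text defined here.

```python
from typing import Dict, Optional

# Hypothetical labels: this document does not assign short identifiers.
# Each category maps to its enclosing (broader) category, or None at the root.
PARENT: Dict[str, Optional[str]] = {
    "ai-substitutive-use": "automated-processing",
    "automated-processing": None,
}


def resolve(category: str, stated: Dict[str, bool]) -> Optional[bool]:
    """Return the most specific stated preference for `category`.

    `stated` maps a category label to True (allow) or False (disallow).
    Assumed rule: walk from the category up through its ancestors and
    return the first preference found; return None if none is stated.
    """
    current: Optional[str] = category
    while current is not None:
        if current in stated:
            return stated[current]
        current = PARENT[current]
    return None


# A party allows automated processing in general but not substitutive use.
prefs = {"automated-processing": True, "ai-substitutive-use": False}
print(resolve("ai-substitutive-use", prefs))
print(resolve("automated-processing", prefs))
```

Under these assumptions, a preference stated only for the nested category has no effect on the broader one, which is what a proper subset implies.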
This category is distinct from AI Training or Generative AI Training, as it addresses uses that occur after a model has been trained, during inference. It is also distinct from Search, which covers uses that direct users back to the original asset. Substitutive Use, by contrast, describes outputs that replace, reduce the utility of, or make the source asset redundant to users by summarizing, reproducing, or restyling its contents.¶
Consistent with that objective, this category would not apply where end users summarise digital assets that they have already acquired, as opposed to using search to find such assets or retrieving them from online locations in summarized form.¶
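As a rough illustration of how the definition and its end-user carve-out might be checked together, the following Python sketch encodes the conditions as boolean fields. The `Use` record, its field names and the `is_substitutive_use` function are hypothetical and introduced only for this example; in practice each condition would require judgment that a boolean flag cannot capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Use:
    """Hypothetical description of one use of a digital asset."""
    post_training: bool         # asset is input to a trained model, not training data
    output_draws_on_asset: bool  # output incorporates, summarizes, aggregates or
                                 # reproduces the asset, including its style
    by_end_user_directly: bool   # carried out by an end user, not a search app or bot
    lawfully_acquired: bool      # the asset was lawfully acquired
    output_is_summary: bool      # the output is a summary of that asset


def is_substitutive_use(u: Use) -> bool:
    """Return True if the use falls within the category as sketched here."""
    # The category only covers inference-time uses whose output draws on the asset.
    if not (u.post_training and u.output_draws_on_asset):
        return False
    # Carve-out: an end user directly summarising a lawfully acquired asset.
    end_user_exception = (
        u.by_end_user_directly and u.lawfully_acquired and u.output_is_summary
    )
    return not end_user_exception


# A bot summarising a retrieved article falls within the category.
print(is_substitutive_use(Use(True, True, False, True, True)))
# An end user summarising an asset they already acquired does not.
print(is_substitutive_use(Use(True, True, True, True, True)))
```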
This document has no IANA actions.¶