<?xml version='1.0' encoding='utf-8'?>
<rfc ipr="trust200902" docName="draft-jurkovikj-collab-tunnel-02" category="exp" consensus="false" version="3">
  
  <front>
    <title abbrev="TCT">The Collaboration Content Transfer (TCT) Protocol</title>
    <seriesInfo name="Internet-Draft" value="draft-jurkovikj-collab-tunnel-02" />
    <author initials="A." surname="Jurkovikj" fullname="Antun Jurkovikj">
      <address>
        <postal>
          <country>North Macedonia</country>
        </postal>
        <email>antunjurkovic@gmail.com</email>
      </address>
    </author>
    <date year="2026" month="May" day="12" />
    <abstract>
      

<t>This document specifies the Collaboration Content Transfer (TCT) Protocol, an HTTP-based method for efficient, verifiable delivery of web content to automated agents. TCT defines a JSON envelope that encapsulates resource content in negotiated text-based formats (such as Markdown or plain text) together with canonical metadata. It uses bidirectional URL discovery between human-facing and machine-facing URLs, JSON sitemaps, strong ETag validators, and conditional requests to reduce bandwidth while preserving semantic structure and canonical identity.</t>
    </abstract>
  </front>
  <middle>
    

<section anchor="introduction">
      <name>Introduction</name>
      <t>Automated agents (search engines, AI crawlers, archives, monitoring tools, aggregators) increasingly consume web content at scale. Fetching and parsing full HTML pages for machine consumption is often inefficient:</t>
      <ul spacing="normal">
        <li>
          <t>Page weight is dominated by templates, navigation, ads, and scripts.</t>
        </li>
        <li>
          <t>Machine consumers typically need a stable textual representation of the core content.</t>
        </li>
        <li>
          <t>Many pages do not change frequently, but are re-fetched in full.</t>
        </li>
      </ul>
      <t>Various sites already expose JSON APIs or feeds, but:</t>
      <ul spacing="normal">
        <li>
          <t>formats differ across sites, and</t>
        </li>
        <li>
          <t>use of HTTP validators is inconsistent.</t>
        </li>
      </ul>
      <t>The Collaboration Content Transfer (TCT) Protocol (abbreviated "TCT", and referred to as "Collaboration Tunnel" in earlier experimental deployments) defines a simple, interoperable profile on top of HTTP that:</t>
      <ul spacing="normal">
        <li>
          <t>exposes a canonical JSON representation (M-URL) for selected resources;</t>
        </li>
        <li>
          <t>advertises C-URL/M-URL mappings and validators in a JSON sitemap (M-Sitemap);</t>
        </li>
        <li>
          <t>uses a single, well-defined strong ETag method for M-URLs; and</t>
        </li>
        <li>
          <t>enables "zero-fetch" behavior when content is unchanged.</t>
        </li>
      </ul>
      <t>Note: The term "tunnel" in earlier references does not imply network-layer tunneling; rather, it refers to providing a direct content channel optimized for automated consumption alongside the traditional human-facing web interface.</t>
      <t>TCT is intentionally conservative: it uses only existing HTTP mechanisms, is backward compatible with the Web, and does not define policy or licensing semantics.</t>
      </section>
  <section anchor="terminology">
        <name>Terminology</name>
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 <xref target="RFC2119" /> <xref target="RFC8174" /> when, and only when, they appear in all capitals.</t>
        <ul spacing="normal">
          <li>
            <t>C-URL:
            </t>
            <ul spacing="normal">
              <li>
                <t>The canonical, human-facing URL of a resource, typically an HTML document.</t>
              </li>
            </ul>
          </li>
          <li>
            <t>M-URL:
            </t>
            <ul spacing="normal">
              <li>
                <t>The machine-facing URL providing the TCT JSON representation of that resource.</t>
              </li>
            </ul>
          </li>
          <li>
            <t>M-Sitemap:
            </t>
            <ul spacing="normal">
              <li>
                <t>The JSON sitemap enumerating C-URL/M-URL pairs and associated validators.</t>
              </li>
            </ul>
          </li>
          <li>
            <t>Representation:
            </t>
            <ul spacing="normal">
              <li>
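                <t>As defined in <xref target="RFC9110" />: information that reflects a past, current, or desired state of a resource, consisting of representation metadata and representation data. The representation selected for a response can be subject to content negotiation.</t>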
              </li>
            </ul>
          </li>
        </ul>
        <t>Unless stated otherwise, "client" refers to an automated agent that is aware of TCT.</t>
        </section>
      <section anchor="problem-statement">
          <name>Problem Statement</name>
          <t>Key inefficiencies in current automated consumption of web content include:</t>
          <ul spacing="normal">
            <li>
              <t>repeated transfer of large HTML documents whose core content has not changed;</t>
            </li>
            <li>
              <t>lack of a standard, compact, semantics-focused representation for page-like resources;</t>
            </li>
            <li>
              <t>ad hoc usage or absence of validators (ETag, Last-Modified), hindering efficient revalidation;</t>
            </li>
            <li>
              <t>difficulty for agents to reason about change detection at scale using only HTML and XML sitemaps.</t>
            </li>
          </ul>
        </section>
        <section anchor="goals-and-non-goals">
          <name>Goals and Non-Goals</name>
          <t>TCT is designed to:</t>
          <ul spacing="normal">
            <li>
              <t>reuse HTTP semantics (<xref target="RFC9110" />, <xref target="RFC9111" />) rather than introduce new ones;</t>
            </li>
            <li>
              <t>provide a simple, deterministic JSON representation appropriate for machine learning systems, automated agents, and programmatic content consumption, including:
              </t>
              <ul spacing="normal">
                <li>
                  <t>search indexing,</t>
                </li>
                <li>
                  <t>content analysis and classification,</t>
                </li>
                <li>
                  <t>retrieval-augmented generation (RAG) and similar ML workflows,</t>
                </li>
                <li>
                  <t>archival and monitoring;</t>
                </li>
              </ul>
            </li>
            <li>
              <t>define discovery and validation clearly enough for interoperable clients and servers.</t>
            </li>
          </ul>
          <t>TCT explicitly does NOT:</t>
          <ul spacing="normal">
            <li>
              <t>define how content may or may not be used (policy, licensing, AI usage);</t>
            </li>
            <li>
              <t>change the semantics of resources at C-URLs;</t>
            </li>
            <li>
              <t>require new HTTP methods or status codes.</t>
            </li>
          </ul>
        </section>
      <section anchor="architecture">
        <name>Architecture</name>
        <t>TCT introduces three main elements per participating origin:</t>
        <ul spacing="normal">
          <li>
            <t>C-URL:
            </t>
            <ul spacing="normal">
              <li>
                <t>The canonical resource URL for humans, often serving HTML.</t>
              </li>
            </ul>
          </li>
          <li>
            <t>M-URL:
            </t>
            <ul spacing="normal">
              <li>
                <t>A URL providing a canonical JSON representation of the same logical resource (TCT JSON).</t>
              </li>
            </ul>
          </li>
          <li>
            <t>M-Sitemap:
            </t>
            <ul spacing="normal">
              <li>
                <t>A JSON document listing C-URLs, M-URLs, and strong validators (<tt>etag</tt> values).</t>
              </li>
            </ul>
          </li>
        </ul>
        <t>High-level flow (informative):</t>
        <ol spacing="normal" type="1"><li>
            <t>Client performs <tt>GET /</tt> at <tt>https://example.com/</tt>.</t>
          </li>
          <li>
            <t>The origin root response includes:
            </t>
            <ul spacing="normal">
              <li>
                <t><tt>Link: &lt;/llm-sitemap.json&gt;; rel="index"; type="application/json"</tt>.</t>
              </li>
            </ul>
          </li>
          <li>
            <t>Client fetches <tt>/llm-sitemap.json</tt> (M-Sitemap).</t>
          </li>
          <li>
            <t>For each item listed in the M-Sitemap, the client:
            </t>
            <ul spacing="normal">
              <li>
                <t>learns <tt>(cUrl, mUrl, etag)</tt>.</t>
              </li>
            </ul>
          </li>
          <li>
            <t>Client fetches <tt>mUrl</tt> as needed:
            </t>
            <ul spacing="normal">
              <li>
                <t><tt>GET mUrl</tt>, with <tt>If-None-Match</tt> on subsequent checks.</t>
              </li>
            </ul>
          </li>
          <li>
            <t>Server responds with:
            </t>
            <ul spacing="normal">
              <li>
                <t><tt>200 OK</tt> + JSON when changed;</t>
              </li>
              <li>
                <t><tt>304 Not Modified</tt> when unchanged.</t>
              </li>
            </ul>
          </li>
        </ol>
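        <t>The revalidation step of the flow above can be sketched as follows. This is a minimal, non-normative illustration: the names <tt>sync_item</tt>, <tt>CacheEntry</tt>, and <tt>make_fake_origin</tt> are hypothetical, and the transport is an in-memory stand-in rather than a real HTTP client.</t>
        <sourcecode type="python"><![CDATA[
```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# (status, etag, body) as returned by a hypothetical transport function.
Response = Tuple[int, Optional[str], Optional[bytes]]
# A transport takes (url, if_none_match) and returns a Response.
Fetch = Callable[[str, Optional[str]], Response]


@dataclass
class CacheEntry:
    etag: str
    body: bytes


def sync_item(m_url: str, cache: Dict[str, CacheEntry], fetch: Fetch) -> bytes:
    """Return the current body for m_url, revalidating with If-None-Match."""
    cached = cache.get(m_url)
    status, etag, body = fetch(m_url, cached.etag if cached else None)
    if status == 304 and cached is not None:
        return cached.body  # unchanged: reuse the stored representation
    if status == 200 and etag is not None and body is not None:
        cache[m_url] = CacheEntry(etag, body)  # changed: store new validator
        return body
    raise RuntimeError(f"unexpected response {status} for {m_url}")


def make_fake_origin(etag: str, body: bytes) -> Fetch:
    """In-memory stand-in for an origin serving one M-URL (illustration only)."""
    def fetch(url: str, if_none_match: Optional[str]) -> Response:
        if if_none_match == etag:
            return 304, etag, None
        return 200, etag, body
    return fetch
```
]]></sourcecode>
        <t>In this sketch the first call stores the body and its ETag; later calls send the stored ETag and reuse the cached body whenever the origin answers <tt>304 Not Modified</tt>.</t>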
        <t>TCT is additive and optional:</t>
        <ul spacing="normal">
          <li>
            <t>Non-TCT clients ignore TCT artifacts.</t>
          </li>
          <li>
            <t>Servers can deploy TCT gradually alongside existing content and sitemaps.</t>
          </li>
        </ul>
      </section>
      <section anchor="design-rationale-and-relation-to-existing-mechanisms">
        <name>Design Rationale and Relation to Existing Mechanisms</name>
        <t>TCT is grounded in existing mechanisms and complements several related efforts:</t>
        <section anchor="related-work">
          <name>Related Work</name>
          <t><strong>XML Sitemaps:</strong></t>
          <ul spacing="normal">
            <li>
              <t>Widely deployed for URL discovery (search engines, crawlers).</t>
            </li>
            <li>
              <t>Provide <tt>&lt;lastmod&gt;</tt> timestamps but do not define normative bindings between sitemap entries and HTTP validators (ETags) for endpoint representations.</t>
            </li>
            <li>
              <t>TCT builds on this model by adding structured JSON representations and strong validator integration.</t>
            </li>
          </ul>
          <t><strong>ResourceSync:</strong></t>
          <ul spacing="normal">
            <li>
              <t>Developed by the Open Archives Initiative and collaborators for resource synchronization in digital libraries.</t>
            </li>
            <li>
              <t>Provides resource lists, change lists, and capability documents.</t>
            </li>
            <li>
              <t>Focus is synchronization and preservation; does not define a single, tightly integrated pattern combining:
              </t>
              <ul spacing="normal">
                <li>
                  <t>per-resource JSON representation,</t>
                </li>
                <li>
                  <t>sitemap listing with validators, and</t>
                </li>
                <li>
                  <t>ETag-based zero-fetch semantics.</t>
                </li>
              </ul>
            </li>
            <li>
              <t>TCT addresses the specific use case of efficient web content delivery to automated agents.</t>
            </li>
          </ul>
          <t><strong>AMP (Accelerated Mobile Pages):</strong></t>
          <ul spacing="normal">
            <li>
              <t>Defines an HTML subset optimized for fast rendering on mobile devices.</t>
            </li>
            <li>
              <t>Provides alternate representations via <tt>&lt;link rel="amphtml"&gt;</tt>.</t>
            </li>
            <li>
              <t>TCT provides JSON (not HTML) for machine consumption, targeting crawlers and content analysis rather than human browsing.</t>
            </li>
          </ul>
          <t><strong>Custom JSON APIs:</strong></t>
          <ul spacing="normal">
            <li>
              <t>Many sites expose custom JSON endpoints for content access.</t>
            </li>
            <li>
              <t>Structures, field names, and validator usage vary widely across implementations.</t>
            </li>
            <li>
              <t>TCT aims to standardize a minimal, interoperable profile suitable for broad adoption.</t>
            </li>
          </ul>
          <t><strong>robots.txt:</strong></t>
          <ul spacing="normal">
            <li>
              <t>Defines crawl directives and access policies.</t>
            </li>
            <li>
              <t>TCT does not replace robots.txt; publishers MAY use both mechanisms:
              </t>
              <ul spacing="normal">
                <li>
                  <t>robots.txt for crawl permissions and rate guidance, and</t>
                </li>
                <li>
                  <t>TCT for efficient content delivery.</t>
                </li>
              </ul>
            </li>
          </ul>
          <t><strong>RSS/Atom Feeds:</strong></t>
          <ul spacing="normal">
            <li>
              <t>Provide syndication of updates and content excerpts.</t>
            </li>
            <li>
              <t>Typically lack HTTP caching integration (per-item ETags, conditional requests).</t>
            </li>
            <li>
              <t>TCT can be viewed as "RSS with disciplined HTTP caching semantics."</t>
            </li>
          </ul>
        </section>
        <section anchor="tct-design-choices">
          <name>TCT Design Choices</name>
          <t>Key design choices in TCT:</t>
          <ul spacing="normal">
            <li>
              <t>Use only existing HTTP semantics:
              </t>
              <ul spacing="normal">
                <li>
                  <t>GET, HEAD, 200, 304, ETag, Cache-Control, Link.</t>
                </li>
              </ul>
            </li>
            <li>
              <t>Use JSON as the machine representation:
              </t>
              <ul spacing="normal">
                <li>
                  <t>Simple, widely supported, easy to parse.</t>
                </li>
                <li>
                  <t>TCT is complementary to existing HTTP compression mechanisms when
normal HTTP validator rules for content-coded representations are
respected.</t>
                </li>
              </ul>
            </li>
            <li>
              <t>Define one strong ETag method:
              </t>
              <ul spacing="normal">
                <li>
                  <t>Based on canonical JSON bytes of the M-URL payload.</t>
                </li>
                <li>
                  <t>Avoid ambiguous or dual hash methods.</t>
                </li>
              </ul>
            </li>
            <li>
              <t>Keep policy and energy considerations out of the core:
              </t>
              <ul spacing="normal">
                <li>
                  <t>Those can be specified separately as informational work.</t>
                </li>
              </ul>
            </li>
          </ul>
          <t>The intent is to offer an interoperable profile that is:</t>
          <ul spacing="normal">
            <li>
              <t>easy to implement,</t>
            </li>
            <li>
              <t>friendly to existing caches/CDNs, and</t>
            </li>
            <li>
              <t>precise enough for standardization.</t>
            </li>
          </ul>
        </section>
      </section>
      <section anchor="discovery">
        <name>Discovery</name>
        <section anchor="m-sitemap-discovery">
          <name>M-Sitemap Discovery</name>
          <t>A publisher implementing TCT MUST expose an M-Sitemap and advertise it from the origin root resource.</t>
          <t>When a client performs:</t>
          <ul spacing="normal">
            <li>
              <t><tt>GET /</tt> with <tt>Host: example.com</tt></t>
            </li>
          </ul>
          <t>and receives a successful (2xx) response (either directly or after following redirects per <xref target="RFC9110" />), that response:</t>
          <ul spacing="normal">
            <li>
              <t>MUST include a <tt>Link</tt> header with:
              </t>
              <ul spacing="normal">
                <li>
                  <t><tt>rel="index"</tt></t>
                </li>
                <li>
                  <t><tt>type="application/json"</tt></t>
                </li>
                <li>
                  <t>a target that is the M-Sitemap URL for this origin.</t>
                </li>
              </ul>
            </li>
          </ul>
          <t>Example:</t>
          <ul spacing="normal">
            <li>
              <t><tt>Link: &lt;/llm-sitemap.json&gt;; rel="index"; type="application/json"</tt></t>
            </li>
          </ul>
          <t>In addition to HTTP <tt>Link</tt> headers, publishers MAY advertise an M-Sitemap from HTML C-URL responses.</t>
          <t>If the C-URL response body is HTML, and the <tt>&lt;head&gt;</tt> element contains a <tt>&lt;link&gt;</tt> element with:</t>
          <ul spacing="normal">
            <li>
              <t><tt>rel="index"</tt>, and</t>
            </li>
            <li>
              <t><tt>type="application/json"</tt>,</t>
            </li>
          </ul>
          <t>then clients MAY treat the referenced URL as an M-Sitemap URL, subject to content-type and profile checks.</t>
          <t>Example:</t>
          <sourcecode type="html">&lt;link rel="index" type="application/json" href="/llm-sitemap.json"&gt;</sourcecode>
          <t>This mechanism is particularly useful for deployments where adding HTTP response headers is difficult but HTML templates are easily editable.</t>
          <t>In this specification, a Link header field with <tt>rel="index"</tt>, <tt>type="application/json"</tt>, and a target whose content matches Section 7 identifies the TCT M-Sitemap for that origin. Other uses of <tt>rel="index"</tt> remain valid and are outside the scope of this document.</t>
          <t>Notes:</t>
          <ul spacing="normal">
            <li>
              <t><tt>/llm-sitemap.json</tt> is an example; any stable path MAY be used.</t>
            </li>
            <li>
              <t>If <tt>/</tt> redirects (e.g., <tt>301</tt> or <tt>302</tt> to <tt>/en/</tt> or <tt>/index.html</tt>), the <tt>Link</tt> header MUST appear on the final redirect target.</t>
            </li>
            <li>
              <t>Clients:
              </t>
              <ul spacing="normal">
                <li>
                  <t>SHOULD follow redirects per <xref target="RFC9110" /> before checking for TCT support.</t>
                </li>
                <li>
                  <t>MUST discover the M-Sitemap via the <tt>Link</tt> header on the final response.</t>
                </li>
                <li>
                  <t>If no such <tt>Link</tt> is present, SHOULD assume TCT is not deployed and MUST NOT guess paths.</t>
                </li>
              </ul>
            </li>
            <li>
              <t>If multiple <tt>Link</tt> headers with <tt>rel="index"</tt> and <tt>type="application/json"</tt> are present, clients MAY load all of them. Publishers MAY also provide an M-Sitemap Index as defined in Section 7.2 to list multiple M-Sitemaps for an origin.</t>
            </li>
          </ul>
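          <t>A client-side sketch of the <tt>Link</tt> header check described above is shown below. It is non-normative and simplified: the function name <tt>find_m_sitemaps</tt> is hypothetical, and the regular expressions do not handle commas or semicolons inside quoted parameter values, which a full Web Linking parser would need to support.</t>
          <sourcecode type="python"><![CDATA[
```python
import re
from typing import List
from urllib.parse import urljoin

# One link-value: <target> followed by ";name=value" parameters (simplified).
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
_PARAM_RE = re.compile(r';\s*([\w*-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def find_m_sitemaps(link_header: str, base_url: str) -> List[str]:
    """Return absolute M-Sitemap URLs advertised in one Link header value."""
    found = []
    for target, raw_params in _LINK_RE.findall(link_header):
        params = {}
        for name, quoted, token in _PARAM_RE.findall(raw_params):
            # Keep the first occurrence of each parameter name.
            params.setdefault(name.lower(), quoted or token)
        rels = params.get("rel", "").lower().split()
        media_type = params.get("type", "").lower()
        if "index" in rels and media_type == "application/json":
            # Resolve relative targets against the final response URL.
            found.append(urljoin(base_url, target))
    return found
```
]]></sourcecode>
          <t>The <tt>base_url</tt> argument is assumed to be the URL of the final response after redirects, so that relative targets such as <tt>/llm-sitemap.json</tt> resolve against the correct origin. An empty result corresponds to the case where the client assumes TCT is not deployed.</t>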
          <t>This specification does not define or require any <tt>/.well-known/</tt> URI.</t>
        </section>
        <section anchor="m-url-discovery-and-canonical-links">
          <name>M-URL Discovery and Canonical Links</name>
          <t>For each resource where an M-URL is provided:</t>
          <ul spacing="normal">
            <li>
              <t>The C-URL response (typically HTML) SHOULD advertise the M-URL as an alternate JSON representation:  </t>
              <t>
Either via HTML:  </t>
              <ul spacing="normal">
                <li>
                  <t><tt>&lt;link rel="alternate" type="application/json" href="https://example.com/post/llm.json"&gt;</tt></t>
                </li>
              </ul>
              <t>
Or HTTP:  </t>
              <ul spacing="normal">
                <li>
                  <t><tt>Link: &lt;https://example.com/post/llm.json&gt;; rel="alternate"; type="application/json"</tt></t>
                </li>
              </ul>
            </li>
            <li>
              <t>The M-URL response for that resource MUST include a corresponding canonical link (<xref target="RFC6596" />):  </t>
              <ul spacing="normal">
                <li>
                  <t><tt>Link: &lt;https://example.com/post/&gt;; rel="canonical"</tt></t>
                </li>
              </ul>
            </li>
          </ul>
          <t><strong>Note on M-URL paths:</strong>
The choice of M-URL path (e.g., <tt>/post/llm.json</tt>, <tt>/post/llm/</tt>, <tt>/post.json</tt>, etc.) is not specified by this document. Publishers MAY choose any stable URL scheme that suits their architecture. Clients MUST discover M-URLs via advertised links (as shown above) and MUST NOT assume a fixed path pattern.</t>
          <t>Examples of valid M-URL patterns:</t>
          <ul spacing="normal">
            <li>
              <t><tt>https://example.com/post/llm.json</tt> (used in this document)</t>
            </li>
            <li>
              <t><tt>https://example.com/post/llm/</tt> (directory-style)</t>
            </li>
            <li>
              <t><tt>https://example.com/post.json</tt> (extension-based)</t>
            </li>
          </ul>
          <t>This bidirectional linkage allows clients to:</t>
          <ul spacing="normal">
            <li>
              <t>verify that an M-URL is an alternate for the expected C-URL;</t>
            </li>
            <li>
              <t>detect misconfigurations when links are inconsistent; and</t>
            </li>
            <li>
              <t>ensure the <tt>canonical_url</tt> field in the M-URL JSON matches (or is consistent with) the URL advertised via <tt>rel="canonical"</tt> for the corresponding C-URL.</t>
            </li>
          </ul>
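          <t>The three checks above can be sketched as a single client-side function. This is an illustrative, non-normative example: the name <tt>check_linkage</tt> and its parameters are hypothetical, and exact string comparison is used as a simplifying assumption where this document allows URLs to be "consistent with" each other.</t>
          <sourcecode type="python"><![CDATA[
```python
import json
from typing import List


def check_linkage(c_url: str, advertised_m_url: str, fetched_m_url: str,
                  m_canonical_link: str, m_body: bytes) -> List[str]:
    """Return a list of problems; an empty list means the links agree."""
    problems = []
    # 1. The C-URL must advertise the M-URL that was actually fetched.
    if advertised_m_url != fetched_m_url:
        problems.append("C-URL advertises a different M-URL")
    # 2. The M-URL response must point back via rel="canonical".
    if m_canonical_link != c_url:
        problems.append("M-URL rel=canonical does not point back to the C-URL")
    # 3. The canonical_url field in the JSON body must agree as well.
    envelope = json.loads(m_body.decode("utf-8"))
    if envelope.get("canonical_url") != c_url:
        problems.append("canonical_url field differs from the C-URL")
    return problems
```
]]></sourcecode>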
        </section>
        <section anchor="template-invariance">
          <name>Template-Invariance</name>
          <t>TCT's template-invariance property:</t>
          <ul spacing="normal">
            <li>
              <t>Changes to HTML templates, CSS, or JavaScript at the C-URL SHOULD NOT require changes to the M-URL JSON, so long as the underlying resource content has not changed.</t>
            </li>
          </ul>
          <t>This is achieved by:</t>
          <ul spacing="normal">
            <li>
              <t>treating M-URLs as distinct, canonical JSON representations of content; and</t>
            </li>
            <li>
              <t>computing strong ETags over the M-URL JSON only.</t>
            </li>
          </ul>
          <t>Template-invariance is NOT achieved by relaxing strong ETag semantics; for any given M-URL, identical strong ETags MUST imply byte-identical JSON bodies.</t>
        </section>
      </section>
      <section anchor="m-url-representation">
        <name>M-URL Representation</name>
        <t>An M-URL is an HTTP resource that serves a JSON representation of a content resource suitable for machine consumption.</t>
        <t>This specification defines the observable JSON representation at M-URLs; it does not constrain how servers derive these representations from their internal data models, templates, or storage.</t>
        <t>TCT does not define cross-representation concurrency control. Origins
that also support Semantic Validators for HTTP can include
<tt>Semantic-ETag</tt> on C-URL responses to help capable clients correlate
human-facing and machine-facing representations without first fetching
the M-URL.</t>
        <section anchor="content-type-and-encoding">
          <name>Content-Type and Encoding</name>
          <t>M-URL responses:</t>
          <ul spacing="normal">
            <li>
              <t>MUST use <tt>Content-Type: application/json</tt>.</t>
            </li>
            <li>
              <t>MUST be valid JSON as per <xref target="RFC8259" />.</t>
            </li>
            <li>
              <t>MUST be encoded in UTF-8 without BOM.</t>
            </li>
            <li>
              <t>Producers SHOULD include <tt>charset=utf-8</tt> in the Content-Type header, but clients MUST NOT rely on the charset parameter being present.</t>
            </li>
          </ul>
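          <t>A consumer might verify these requirements along the following lines. This is a non-normative sketch; the function name <tt>check_m_url_response</tt> is hypothetical. Python's <tt>json</tt> module accepts <tt>NaN</tt> and <tt>Infinity</tt> by default, so the sketch rejects them explicitly to stay within <xref target="RFC8259" />.</t>
          <sourcecode type="python"><![CDATA[
```python
import codecs
import json
from typing import List


def _reject_constant(name: str):
    # NaN, Infinity and -Infinity are not part of the JSON grammar.
    raise ValueError(f"{name} is not valid JSON")


def check_m_url_response(content_type: str, body: bytes) -> List[str]:
    """Return a list of problems found in an M-URL response."""
    problems = []
    # Compare the media type only; the charset parameter is optional.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/json":
        problems.append("Content-Type is not application/json")
    if body.startswith(codecs.BOM_UTF8):
        problems.append("body starts with a UTF-8 BOM")
    try:
        json.loads(body.decode("utf-8"), parse_constant=_reject_constant)
    except (UnicodeDecodeError, ValueError):
        problems.append("body is not UTF-8 encoded JSON")
    return problems
```
]]></sourcecode>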
        </section>
        <section anchor="required-json-fields">
          <name>Required JSON Fields</name>
          <t>The M-URL JSON object acts as an envelope containing the resource's data and metadata. It defines the following fields; those marked REQUIRED MUST be present:</t>
          <ul spacing="normal">
            <li>
              <t><tt>profile</tt> (string, REQUIRED)
              </t>
              <ul spacing="normal">
                <li>
                  <t>A TCT profile identifier, for example <tt>tct-1</tt>. This field signals that the representation follows this specification.</t>
                </li>
              </ul>
            </li>
            <li>
              <t><tt>canonical_url</tt> (string, REQUIRED)
              </t>
              <ul spacing="normal">
                <li>
                  <t>The canonical human-facing URL (C-URL) for the resource.</t>
                </li>
              </ul>
            </li>
            <li>
              <t><tt>title</tt> (string, REQUIRED)
              </t>
              <ul spacing="normal">
                <li>
                  <t>A human-readable title for the resource.</t>
                </li>
              </ul>
            </li>
            <li>
              <t><tt>content_media_type</tt> (string, OPTIONAL)
              </t>
              <ul spacing="normal">
                <li>
                  <t>The IANA media type of the data contained in the <tt>content</tt> field.</t>
                </li>
                <li>
                  <t>If omitted, the default value is <tt>text/plain; charset=utf-8</tt>.</t>
                </li>
                <li>
                  <t>Typical values include <tt>text/plain; charset=utf-8</tt>, <tt>text/markdown; charset=utf-8</tt>, and <tt>text/html; charset=utf-8</tt>.</t>
                </li>
                <li>
                  <t>This specification is primarily intended for text-based media types (text/*). Producers SHOULD NOT use binary media types in <tt>content_media_type</tt> unless the payload is safely representable as a UTF-8 string.</t>
                </li>
              </ul>
            </li>
            <li>
              <t><tt>content</tt> (string, REQUIRED)
              </t>
              <ul spacing="normal">
                <li>
                  <t>The main payload of the resource, encoded as a JSON string.</t>
                </li>
                <li>
                  <t>The structure of this string MUST conform to the format specified in <tt>content_media_type</tt>. For example:
                  </t>
                  <ul spacing="normal">
                    <li>
                      <t>If <tt>content_media_type</tt> is <tt>text/plain; charset=utf-8</tt>, <tt>content</tt> contains plain text.</t>
                    </li>
                    <li>
                      <t>If <tt>content_media_type</tt> is <tt>text/markdown; charset=utf-8</tt>, <tt>content</tt> contains markdown text (including headings, emphasis, lists, etc.).</t>
                    </li>
                  </ul>
                </li>
                <li>
                  <t>Content SHOULD exclude purely template/boilerplate text (navigation menus, footers, etc.).</t>
                </li>
              </ul>
            </li>
          </ul>
          <t>Note on earlier drafts:
Earlier experimental versions of this protocol included a <tt>hash</tt> field inside the JSON body. That field has been removed. In this specification, the HTTP <tt>ETag</tt> header is computed strictly over the final canonical JSON representation (<xref target="strong-etag-generation-single-method" />); there is no hash field inside the JSON body itself.</t>
          <t>Example (non-normative):</t>
          <sourcecode type="json"><![CDATA[
{
  "profile": "tct-1",
  "canonical_url": "https://example.com/post/",
  "title": "Article Title",
  "content_media_type": "text/plain; charset=utf-8",
  "content": "Core article content..."
}
]]></sourcecode>
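          <t>A consumer-side check of the envelope fields described in this section could look like the sketch below. It is non-normative; the names <tt>validate_envelope</tt>, <tt>REQUIRED_FIELDS</tt>, and <tt>DEFAULT_MEDIA_TYPE</tt> are hypothetical, and the sketch does not verify that <tt>content</tt> actually conforms to the declared media type.</t>
          <sourcecode type="python"><![CDATA[
```python
REQUIRED_FIELDS = ("profile", "canonical_url", "title", "content")
DEFAULT_MEDIA_TYPE = "text/plain; charset=utf-8"


def validate_envelope(envelope: dict) -> str:
    """Check the envelope fields and return the effective content media type."""
    for name in REQUIRED_FIELDS:
        # Every REQUIRED field must be present and must be a JSON string.
        if not isinstance(envelope.get(name), str):
            raise ValueError(f"missing or non-string field: {name}")
    # content_media_type is OPTIONAL and falls back to the documented default.
    media_type = envelope.get("content_media_type", DEFAULT_MEDIA_TYPE)
    if not isinstance(media_type, str):
        raise ValueError("content_media_type must be a string")
    return media_type
```
]]></sourcecode>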
        </section>
        <section anchor="payload-flexibility-and-structure">
          <name>Payload Flexibility and Structure</name>
          <t>TCT treats the JSON representation as a strict envelope, while the <tt>content</tt> field is a flexible payload. This allows different deployments to choose a representation that balances token efficiency with semantic fidelity for their agents.</t>
          <t>Some common patterns:</t>
          <ul spacing="normal">
            <li>
              <t><strong>Plain text (<tt>text/plain; charset=utf-8</tt>)</strong>
              </t>
              <ul spacing="normal">
                <li>
                  <t>Suitable for simple data mining, basic full-text search, or scenarios where minimizing tokens or representation size is more important than preserving layout or hierarchy.</t>
                </li>
              </ul>
            </li>
            <li>
              <t><strong>Markdown (<tt>text/markdown; charset=utf-8</tt>)</strong>
              </t>
              <ul spacing="normal">
                <li>
                  <t>Recommended for many machine-learning and LLM scenarios. Markdown preserves headings, lists, emphasis, and other structural cues that can significantly improve machine comprehension and answer quality compared to unstructured text.</t>
                </li>
              </ul>
            </li>
            <li>
              <t><strong>Other text-based formats (for example, <tt>text/html; charset=utf-8</tt>)</strong>
              </t>
              <ul spacing="normal">
                <li>
                  <t>In some cases, producers MAY choose other text-based media types when both producers and consumers agree on how to interpret them. TCT does not define the semantics of these media types; it only transports them.</t>
                </li>
              </ul>
            </li>
          </ul>
          <t>Producers SHOULD choose a <tt>content_media_type</tt> that meets their agents' needs while avoiding unnecessary overhead. Rather than relying solely on implicit formatting, semantic structure that is critical for downstream use SHOULD be either:</t>
          <ul spacing="normal">
            <li>
              <t>encoded in the chosen text format (for example, Markdown headings), and/or</t>
            </li>
            <li>
              <t>expressed in additional structured JSON fields.</t>
            </li>
          </ul>
        </section>
        <section anchor="representation-stability">
          <name>Representation Stability</name>
          <t>For an M-URL implementing this specification:</t>
          <ul spacing="normal">
            <li>
              <t>For a given resource state, the JSON body MUST be deterministic.</t>
            </li>
            <li>
              <t>Any change to the JSON body bytes (including required or optional fields) MUST result in a different strong ETag (<xref target="strong-etag-generation-single-method" />).</t>
            </li>
            <li>
              <t>Servers MUST NOT include per-request randomness (e.g., varying timestamps) in the TCT JSON representation.</t>
            </li>
          </ul>
        </section>
        <section anchor="operational-considerations-for-large-content">
          <name>Operational Considerations for Large Content</name>
          <t>Many TCT consumers are large language models or similar systems with bounded context windows. Publishers SHOULD avoid placing arbitrarily large documents into a single <tt>content</tt> field.</t>
          <t>Non-normative guidance:</t>
          <ul spacing="normal">
            <li>
              <t>Prefer segmenting very large resources into multiple logical items when feasible (for example, per section, chapter, or article).</t>
            </li>
            <li>
              <t>Keeping individual <tt>content</tt> payloads below a few hundred kilobytes is often sufficient for typical LLM context limits, but deployments may use stricter limits.</t>
            </li>
            <li>
              <t>Clients MAY enforce their own maximum payload size and skip, truncate, or defer items that exceed their limits.</t>
            </li>
          </ul>
        </section>
      </section>
      <section anchor="deterministic-json-and-strong-etags">
        <name>Deterministic JSON and Strong ETags</name>
        <section anchor="deterministic-json-serialization">
          <name>Deterministic JSON Serialization</name>
          <t>M-URL responses MUST use deterministic JSON serialization sufficient to support strong ETag semantics.</t>
          <t>Producers of M-URLs MUST use the JSON Canonicalization Scheme (JCS) as specified in <xref target="RFC8785" /> to canonicalize the JSON representation into a UTF-8 octet sequence. The same canonicalization algorithm MUST be used both:</t>
          <ul spacing="normal">
            <li>
              <t>when computing the strong ETag value, and</t>
            </li>
            <li>
              <t>when generating the response body.</t>
            </li>
          </ul>
          <t>In particular, canonicalization:</t>
          <ul spacing="normal">
            <li>
              <t>produces a single, unique octet sequence for each abstract JSON value;</t>
            </li>
            <li>
              <t>applies stable ordering of object members at all levels;</t>
            </li>
            <li>
              <t>uses deterministic formatting of numbers; and</t>
            </li>
            <li>
              <t>does not introduce insignificant whitespace beyond what is necessary to delimit JSON tokens.</t>
            </li>
          </ul>
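          <t>As a non-normative illustration, the properties listed above can be approximated in Python with the standard <tt>json</tt> module. This sketch is not a JCS implementation. It assumes that object member names are ASCII and that values are limited to strings, booleans, null, integers of modest size, arrays, and objects; floating-point numbers and non-ASCII member names need a full <xref target="RFC8785" /> implementation. The function name <tt>canonicalize_simple</tt> is hypothetical.</t>
          <sourcecode type="python">
import json

def canonicalize_simple(value) -> bytes:
    """Approximate JCS output for a restricted class of values.

    Assumption: ASCII member names; values are str, bool, None,
    modest-size int, list, or dict. Not a full RFC 8785 encoder.
    """
    text = json.dumps(
        value,
        sort_keys=True,         # stable member ordering, all levels
        separators=(",", ":"),  # no insignificant whitespace
        ensure_ascii=False,     # non-ASCII text is emitted as UTF-8
    )
    return text.encode("utf-8")
</sourcecode>
          <t>Under these assumptions, two objects that differ only in member order serialize to the same octet sequence, which is the property the strong ETag depends on.</t>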
        </section>
        <section anchor="strong-etag-generation-single-method">
          <name>Strong ETag Generation (Single Method)</name>
          <t>TCT defines one mandatory method for computing strong ETags for M-URLs.</t>
          <t>For an M-URL representation:</t>
          <ol spacing="normal" type="1"><li>
              <t>Construct the full JSON object representing the resource, including all required and optional fields.</t>
            </li>
            <li>
              <t>Canonicalize this JSON object to a UTF-8 octet sequence using the JSON Canonicalization Scheme (JCS), as required by Section 6.1.</t>
            </li>
            <li>
              <t>Compute the SHA-256 digest of these canonical bytes.</t>
            </li>
            <li>
              <t>Let <tt>F</tt> be the 64-character lowercase hexadecimal encoding of this digest.</t>
            </li>
            <li>
              <t>Set the HTTP <tt>ETag</tt> header field to:  </t>
              <ul spacing="normal">
                <li>
                  <t><tt>"sha256-</tt>F<tt>"</tt></t>
                </li>
              </ul>
            </li>
            <li>
              <t>Send the canonical JCS octet sequence from step 2 as the response body,
with no <tt>Content-Encoding</tt>, when using this ETag value as the strong
validator for the selected representation.</t>
            </li>
          </ol>
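          <t>The steps above can be sketched in Python as follows (non-normative). The helper name <tt>tct_etag</tt> is hypothetical, and the <tt>json.dumps</tt> call only stands in for a real JCS library under the assumption that the object contains ASCII member names and no floating-point numbers.</t>
          <sourcecode type="python">
import hashlib
import json

def tct_etag(resource: dict) -> tuple[bytes, str]:
    """Return (response_body, etag_field_value) for an M-URL."""
    # Step 2 (approximated; a JCS library is assumed in production).
    body = json.dumps(
        resource,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    # Steps 3 and 4: SHA-256 as 64 lowercase hexadecimal characters.
    digest_hex = hashlib.sha256(body).hexdigest()
    # Step 5: strong validator, quoted-string, no W/ prefix.
    etag = '"sha256-' + digest_hex + '"'
    # Step 6: the caller sends exactly these bytes, uncompressed.
    return body, etag
</sourcecode>
          <t>Returning the body together with the ETag is a deliberate choice in this sketch: it makes it harder to hash one serialization and accidentally send another.</t>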
          <t>Requirements:</t>
          <ul spacing="normal">
            <li>
              <t>ETags for M-URLs:
              </t>
              <ul spacing="normal">
                <li>
                  <t>MUST be strong (no <tt>W/</tt> prefix).</t>
                </li>
                <li>
                  <t>MUST be quoted-strings as in <xref target="RFC9110" />.</t>
                </li>
              </ul>
            </li>
            <li>
              <t>Two successful M-URL responses with the same strong ETag MUST have identical response body bytes.</t>
            </li>
            <li>
              <t>Any change in the response body bytes MUST cause the strong ETag to change.</t>
            </li>
          </ul>
          <t>From the perspective of HTTP semantics, clients MUST treat ETag values as opaque validators. TCT specifies a reproducible form (SHA-256 over canonical JSON) to support efficient comparison, but HTTP intermediaries are not required to understand this structure.</t>
          <t>Important: The strong ETag is computed over the entire canonical JSON representation described above. There is no <tt>hash</tt> field inside the JSON body itself. This ensures that the ETag is a true validator for the representation bytes, consistent with <xref target="RFC9110" /> and common HTTP caching practice.</t>
          <t>If a deployment serves content-coded variants of an M-URL response, it
MUST follow HTTP validator rules for those selected representations. In
particular, a strong ETag for a content-coded representation MUST NOT be
reused as the strong ETag for the identity-coded canonical JSON
representation unless the selected representation bytes are identical.
The simplest conformant deployment is to serve M-URLs without
<tt>Content-Encoding</tt> and use transport-level security and ordinary cache
revalidation to reduce transfer cost.</t>
        </section>
        <section anchor="content-digest">
          <name>Content-Digest</name>
          <t>For M-URL responses served without <tt>Content-Encoding</tt>, servers SHOULD
include a <tt>Content-Digest</tt> header field computed over the same canonical
octet sequence used for ETag generation, using the <tt>sha-256</tt> algorithm
as defined in <xref target="RFC9530" />.</t>
          <t>For example:</t>
          <sourcecode type="http">
Content-Digest: sha-256=:&lt;base64-value&gt;:
</sourcecode>
          <t>This provides end-to-end integrity protection in addition to the
change-detection semantics of strong ETags. When present on an
identity-coded M-URL response, the <tt>Content-Digest</tt> value SHOULD be
consistent with the ETag and the canonical JSON representation described
in this document.</t>
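          <t>A non-normative sketch of building this field value in Python follows. The function name <tt>content_digest</tt> is hypothetical, and <tt>body</tt> is assumed to be the canonical octet sequence that is sent as the identity-coded response body.</t>
          <sourcecode type="python">
import base64
import hashlib

def content_digest(body: bytes) -> str:
    """Build a Content-Digest field value using sha-256."""
    digest = hashlib.sha256(body).digest()
    # The digest is carried as a byte sequence: standard base64
    # placed between two colons.
    encoded = base64.b64encode(digest).decode("ascii")
    return "sha-256=:" + encoded + ":"
</sourcecode>
          <t>Note that the ETag carries the digest in hexadecimal while <tt>Content-Digest</tt> carries the same 32 octets in base64, so the two strings differ even though they are derived from the same hash.</t>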
          <t>If a deployment serves content-coded M-URL responses and also needs to
expose a digest of the unencoded canonical JSON representation, it MAY
use <tt>Unencoded-Digest</tt> <xref target="I-D.ietf-httpbis-unencoded-digest" /> when
supported by the sender and recipient. Otherwise, deployments using
<tt>Content-Digest</tt> on content-coded responses need to follow the
content-coding rules defined by <xref target="RFC9530" />.</t>
        </section>
        <section anchor="relationship-to-template-invariance">
          <name>Relationship to Template-Invariance</name>
          <t>Strong ETags in TCT are validators for the M-URL JSON representation only.</t>
          <t>Template-invariance is achieved structurally:</t>
          <ul spacing="normal">
            <li>
              <t>M-URLs do not include HTML templates or layout.</t>
            </li>
            <li>
              <t>Changes to C-URL HTML that do not affect the M-URL JSON do not affect the strong ETag.</t>
            </li>
            <li>
              <t>For TCT-conformant M-URLs, strong ETags MUST be representation-based and MUST NOT vary per request in the absence of a change to the underlying JSON representation.</t>
            </li>
          </ul>
        </section>
        <section anchor="canonical-text-normalization-optional-for-producers">
          <name>Canonical Text Normalization (Optional for Producers)</name>
          <t>TCT does not require clients to normalize text or recompute hashes.</t>
          <t>Use of this normalization algorithm is OPTIONAL and is not required for TCT conformance by either servers or clients. It defines an additional 'TCT text normalization' profile for implementations that choose to adopt it.</t>
          <t>However, producers or validators that derive text fields (such as <tt>content</tt>) from upstream formats MAY implement the normalization algorithm defined here to ensure stable input before JSON canonicalization.</t>
          <t>Normalization can change the surface form of text; publishers SHOULD only enable the TCT normalization profile when these transformations are acceptable for their application and do not undermine the semantics of the content.</t>
          <t>If an implementation claims conformance to this normalization, it:</t>
          <ul spacing="normal">
            <li>
              <t>MUST follow the algorithm in this section; and</t>
            </li>
            <li>
              <t>MUST pass the test vectors in Appendix A.</t>
            </li>
          </ul>
          <t>Algorithm (summary):</t>
          <t>Given input string S:</t>
          <ol spacing="normal" type="1"><li>
              <t>(Optional) HTML entity decoding:
              </t>
              <ul spacing="normal">
                <li>
                  <t>If starting from HTML source, deterministically decode character references.</t>
                </li>
              </ul>
            </li>
            <li>
              <t>Unicode normalization:
              </t>
              <ul spacing="normal">
                <li>
                  <t>Apply NFKC as defined in Unicode Standard Annex #15 (UAX15).</t>
                </li>
              </ul>
            </li>
            <li>
              <t>Case folding:
              </t>
              <ul spacing="normal">
                <li>
                  <t>Apply Unicode case folding as defined in Unicode-CaseFolding.</t>
                </li>
                <li>
                  <t>Locale-dependent mappings MUST NOT be used.</t>
                </li>
              </ul>
            </li>
            <li>
              <t>Control characters:
              </t>
              <ul spacing="normal">
                <li>
                  <t>Remove all Cc characters except:
                  </t>
                  <ul spacing="normal">
                    <li>
                      <t>U+0009 (TAB), U+000A (LF), U+000D (CR).</t>
                    </li>
                  </ul>
                </li>
              </ul>
            </li>
            <li>
              <t>Whitespace collapsing:
              </t>
              <ul spacing="normal">
                <li>
                  <t>Define set W:
                  </t>
                  <ul spacing="normal">
                    <li>
                      <t>U+0020, U+0009, U+000A, U+000D.</t>
                    </li>
                  </ul>
                </li>
                <li>
                  <t>Optionally treat U+00A0 as whitespace, if done consistently.</t>
                </li>
                <li>
                  <t>Replace each maximal run of W (and optional NBSP) with a single U+0020 SPACE.</t>
                </li>
              </ul>
            </li>
            <li>
              <t>Trim:
              </t>
              <ul spacing="normal">
                <li>
                  <t>Remove leading and trailing SPACE (U+0020).</t>
                </li>
              </ul>
            </li>
          </ol>
          <t>The result is the normalized string N(S).</t>
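          <t>The algorithm summary can be sketched in Python using only the standard library (non-normative). The function name <tt>normalize_text</tt> and the <tt>from_html</tt> flag are hypothetical, and the sketch has not been checked against the test vectors in Appendix A, so it does not by itself establish conformance.</t>
          <sourcecode type="python">
import html
import re
import unicodedata

KEEP = {"\t", "\n", "\r"}          # Cc characters kept by step 4
WS_RUN = re.compile("[ \t\n\r]+")  # maximal runs of set W (step 5)

def normalize_text(s: str, from_html: bool = False) -> str:
    """Sketch of N(S); not verified against Appendix A."""
    if from_html:
        s = html.unescape(s)             # step 1 (optional)
    # Step 2. NFKC also maps U+00A0 (NBSP) to U+0020, so the
    # optional NBSP handling in step 5 needs no separate code.
    s = unicodedata.normalize("NFKC", s)
    s = s.casefold()                     # step 3, locale-independent
    s = "".join(                         # step 4
        c for c in s
        if unicodedata.category(c) != "Cc" or c in KEEP
    )
    s = WS_RUN.sub(" ", s)               # step 5
    return s.strip(" ")                  # step 6
</sourcecode>
          <t>Python's <tt>str.casefold</tt> is used here because it does not depend on the process locale, which matches the requirement in step 3.</t>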
          <t>Note on structured text formats:
When <tt>content_media_type</tt> is <tt>text/markdown</tt> or another structured text-based format, normalization applies to the string value of the <tt>content</tt> field. Normalization MUST NOT strip or rewrite syntax characters that are semantically meaningful for that media type (for example, <tt>#</tt>, <tt>*</tt>, or <tt>|</tt> in Markdown). Normalization is limited to the steps defined in this section: optional HTML entity decoding, Unicode NFKC normalization, case folding (where configured), control-character removal, and whitespace handling.</t>
          <t>Details, examples, and conformance language are in Appendix A.</t>
        </section>
      </section>
      <section anchor="m-sitemap-format-and-semantics">
        <name>M-Sitemap Format and Semantics</name>
        <section anchor="m-sitemap-json-structure">
          <name>M-Sitemap JSON Structure</name>
          <t>The M-Sitemap is a JSON object that lists TCT-enabled resources.</t>
          <t>Fields:</t>
          <ul spacing="normal">
            <li>
              <t><tt>version</tt> (integer, REQUIRED)
              </t>
              <ul spacing="normal">
                <li>
                  <t>M-Sitemap format version. This specification defines version <tt>2</tt>.</t>
                </li>
                <li>
                  <t>Earlier experimental deployments used version <tt>1</tt> with different representation details (including a <tt>hash</tt> field in M-URL JSON). Those deployments are not specified here.</t>
                </li>
              </ul>
            </li>
            <li>
              <t><tt>profile</tt> (string, RECOMMENDED)
              </t>
              <ul spacing="normal">
                <li>
                  <t>Profile identifier for the sitemap, e.g., <tt>tct-1</tt>.</t>
                </li>
                <li>
                  <t>Note: The <tt>profile</tt> string identifies the higher-level TCT profile used by representations (both M-URLs and M-Sitemaps). It is independent of the numeric <tt>version</tt> field, which appears on M-Sitemaps and M-Sitemap Indexes and tracks only the JSON format of those catalog documents. M-URLs also include a <tt>profile</tt> field (Section 5.2), which SHOULD match the M-Sitemap's <tt>profile</tt>. Future revisions may introduce new sitemap <tt>version</tt> values while keeping the same <tt>profile</tt>, or vice versa, so implementations MUST treat these two fields as independent dimensions.</t>
                </li>
              </ul>
            </li>
          </ul>
          <t>The <tt>version</tt> and <tt>profile</tt> fields serve different purposes:</t>
          <table>
            <thead>
              <tr>
                <th align="left">Field</th>
                <th align="left">Scope</th>
                <th align="left">Changes when...</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td align="left">version</td>
                <td align="left">JSON format</td>
                <td align="left">Field names, structure, or wire format change.</td>
              </tr>
              <tr>
                <td align="left">profile</td>
                <td align="left">TCT semantics</td>
                <td align="left">Processing semantics or algorithms change.</td>
              </tr>
            </tbody>
          </table>
          <t>Implementations MUST treat these two fields as independent dimensions: a future document could define a new M-Sitemap format version while keeping the same <tt>profile</tt>, or vice versa.</t>
          <ul spacing="normal">
            <li>
              <t><tt>items</tt> (array of objects, REQUIRED)
              </t>
              <ul spacing="normal">
                <li>
                  <t>Each item:      </t>
                  <ul spacing="normal">
                    <li>
                      <t><tt>cUrl</tt> (string, REQUIRED)
                      </t>
                      <ul spacing="normal">
                        <li>
                          <t>The canonical URL (C-URL) of the resource.</t>
                        </li>
                      </ul>
                    </li>
                    <li>
                      <t><tt>mUrl</tt> (string, REQUIRED)
                      </t>
                      <ul spacing="normal">
                        <li>
                          <t>The corresponding M-URL for the resource.</t>
                        </li>
                      </ul>
                    </li>
                    <li>
                      <t><tt>etag</tt> (string, RECOMMENDED)
                      </t>
                      <ul spacing="normal">
                        <li>
                          <t>A strong ETag hint for the current representation at <tt>mUrl</tt>.</t>
                        </li>
                        <li>
                          <t>The value SHOULD be equal to the current HTTP <tt>ETag</tt> value used in M-URL responses (excluding HTTP quoting), but sitemap <tt>etag</tt> values are advisory. Clients MUST treat ETags used in M-URL responses as the authoritative HTTP validators.</t>
                        </li>
                      </ul>
                    </li>
                    <li>
                      <t><tt>lastModified</tt> (string, OPTIONAL)
                      </t>
                      <ul spacing="normal">
                        <li>
                          <t>An RFC 3339 timestamp indicating when the underlying resource content at the C-URL was last modified, as known to the publisher.</t>
                        </li>
                        <li>
                          <t>This value is advisory only and MUST NOT be treated as an HTTP validator. ETag remains authoritative for change detection and revalidation.</t>
                        </li>
                      </ul>
                    </li>
                  </ul>
                </li>
              </ul>
            </li>
          </ul>
          <t>Clients MAY use <tt>lastModified</tt> as a hint for scheduling or prioritizing fetches, but they MUST still treat ETag values as the authoritative indicator of representation changes.</t>
          <t>Example (non-normative):</t>
          <sourcecode type="json">
{
  "version": 2,
  "profile": "tct-1",
  "items": [
    {
      "cUrl": "https://example.com/post/",
      "mUrl": "https://example.com/post/llm.json",
      "etag": "sha256-2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae",
      "lastModified": "2025-11-27T14:00:00Z"
    }
  ]
}
</sourcecode>
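          <t>As a non-normative sketch of using <tt>lastModified</tt> purely as a scheduling hint, the following Python function orders parsed <tt>items</tt> so that the most recently modified entries are visited first. The function name <tt>schedule_order</tt> is hypothetical, and the sketch assumes timestamps that <tt>datetime.fromisoformat</tt> can parse once a trailing <tt>Z</tt> is rewritten; anything else is simply placed last.</t>
          <sourcecode type="python">
from datetime import datetime, timezone

def schedule_order(items: list) -> list:
    """Most recently modified items first; unknown dates last.

    This only affects fetch order. ETags stay authoritative for
    deciding whether a representation has changed.
    """
    def key(item):
        raw = item.get("lastModified")
        try:
            # Older Python versions reject a trailing "Z".
            ts = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        except (AttributeError, ValueError):
            return (1, 0.0)   # missing or unparseable value
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)
        return (0, -ts.timestamp())
    return sorted(items, key=key)
</sourcecode>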
          <t>Clients can distinguish an M-Sitemap from an M-Sitemap Index by inspecting the top-level object. An M-Sitemap:</t>
          <ul spacing="normal">
            <li>
              <t>has <tt>version</tt> equal to 2,</t>
            </li>
            <li>
              <t>MUST contain an <tt>items</tt> array, and</t>
            </li>
            <li>
              <t>MUST NOT contain a top-level <tt>sitemaps</tt> array.</t>
            </li>
          </ul>
        </section>
        <section anchor="m-sitemap-index-optional">
          <name>M-Sitemap Index (Optional)</name>
          <t>For large sites, a single M-Sitemap JSON document may be impractical. Publishers MAY provide an M-Sitemap Index that lists multiple M-Sitemaps.</t>
          <t>An M-Sitemap Index is a JSON object with the following fields:</t>
          <ul spacing="normal">
            <li>
              <t><tt>version</tt> (integer, REQUIRED)
              </t>
              <ul spacing="normal">
                <li>
                  <t>Index format version. This specification defines version <tt>1</tt>.</t>
                </li>
              </ul>
            </li>
          </ul>
          <t>Note: M-Sitemaps and M-Sitemap Indexes use independent <tt>version</tt> numbers. This specification defines M-Sitemap format version <tt>2</tt> and M-Sitemap Index format version <tt>1</tt>. A change to one format does not imply a change to the other.</t>
          <ul spacing="normal">
            <li>
              <t><tt>profile</tt> (string, RECOMMENDED)
              </t>
              <ul spacing="normal">
                <li>
                  <t>Profile identifier for the sitemap index, e.g., <tt>tct-sitemap-index-1</tt>.</t>
                </li>
              </ul>
            </li>
            <li>
              <t><tt>sitemaps</tt> (array of objects, REQUIRED)
              </t>
              <ul spacing="normal">
                <li>
                  <t>Each entry:      </t>
                  <ul spacing="normal">
                    <li>
                      <t><tt>url</tt> (string, REQUIRED)
                      </t>
                      <ul spacing="normal">
                        <li>
                          <t>The URL of an M-Sitemap JSON document as defined above.</t>
                        </li>
                      </ul>
                    </li>
                  </ul>
                </li>
              </ul>
            </li>
          </ul>
          <t>Example (non-normative):</t>
          <sourcecode type="json">
{
  "version": 1,
  "profile": "tct-sitemap-index-1",
  "sitemaps": [
    { "url": "https://example.com/sitemaps/part-1.json" },
    { "url": "https://example.com/sitemaps/part-2.json" }
  ]
}
</sourcecode>
          <t>Similarly, an M-Sitemap Index:</t>
          <ul spacing="normal">
            <li>
              <t>has <tt>version</tt> equal to 1,</t>
            </li>
            <li>
              <t>MUST contain a <tt>sitemaps</tt> array, and</t>
            </li>
            <li>
              <t>MUST NOT contain a top-level <tt>items</tt> array.</t>
            </li>
          </ul>
          <t>Clients can use these structural differences to reliably distinguish between M-Sitemaps and M-Sitemap Indexes even when both are served with <tt>Content-Type: application/json</tt>.</t>
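          <t>The structural checks can be sketched in Python as follows (non-normative). The function name <tt>classify_catalog</tt> and its return strings are hypothetical; the input is assumed to be the already parsed top-level JSON object.</t>
          <sourcecode type="python">
def _is_version(value, expected: int) -> bool:
    # bool is a subclass of int in Python, so exclude True/False.
    return type(value) is int and value == expected

def classify_catalog(doc: dict) -> str:
    """Return "m-sitemap", "m-sitemap-index", or "unknown"."""
    has_items = isinstance(doc.get("items"), list)
    has_sitemaps = isinstance(doc.get("sitemaps"), list)
    version = doc.get("version")
    if _is_version(version, 2) and has_items and not has_sitemaps:
        return "m-sitemap"
    if _is_version(version, 1) and has_sitemaps and not has_items:
        return "m-sitemap-index"
    return "unknown"
</sourcecode>
          <t>Documents that satisfy neither rule set are reported as unknown rather than guessed at, which keeps the client from treating an unrelated JSON resource as a catalog.</t>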
          <t>Clients that support M-Sitemap Indexes:</t>
          <ul spacing="normal">
            <li>
              <t>MUST treat the <tt>sitemaps</tt> list as the complete set of M-Sitemaps for that origin.</t>
            </li>
            <li>
              <t>MUST NOT assume that an index will itself reference other indexes. Publishers SHOULD avoid chaining indexes (for example, index -&gt; index -&gt; sitemap) to keep discovery logic simple. Implementations MAY detect and ignore recursive or cyclic references.</t>
            </li>
          </ul>
        </section>
        <section anchor="http-response-for-m-sitemap">
          <name>HTTP Response for M-Sitemap</name>
          <t>The M-Sitemap:</t>
          <ul spacing="normal">
            <li>
              <t>MUST use <tt>Content-Type: application/json</tt>. The M-Sitemap MUST be encoded in UTF-8. Producers SHOULD include <tt>charset=utf-8</tt> in the Content-Type header, but clients MUST NOT rely on the charset parameter being present.</t>
            </li>
            <li>
              <t>SHOULD use cache directives that encourage timely revalidation, for example:
              </t>
              <ul spacing="normal">
                <li>
                  <t><tt>Cache-Control: max-age=0, must-revalidate</tt>, or</t>
                </li>
                <li>
                  <t>a short max-age value appropriate to the site's update frequency.</t>
                </li>
              </ul>
            </li>
            <li>
              <t>SHOULD include:
              </t>
              <ul spacing="normal">
                <li>
                  <t><tt>Vary: Accept-Encoding</tt></t>
                </li>
              </ul>
            </li>
          </ul>
        </section>
        <section anchor="parity-semantics">
          <name>Parity Semantics</name>
          <t>Design intent:</t>
          <ul spacing="normal">
            <li>
              <t>M-Sitemap <tt>etag</tt> values SHOULD match the current strong ETag values of their corresponding M-URLs.</t>
            </li>
          </ul>
          <t>Requirements:</t>
          <ul spacing="normal">
            <li>
              <t>Publishers MUST compute <tt>etag</tt> values using the same algorithm as Section 6.2.</t>
            </li>
            <li>
              <t>Publishers SHOULD keep M-Sitemap <tt>etag</tt> values in sync with M-URL ETags.</t>
            </li>
            <li>
              <t>Transient mismatches (due to non-atomic updates, caches, or propagation delays) MAY occur.</t>
            </li>
          </ul>
          <t>Client behavior:</t>
          <ul spacing="normal">
            <li>
              <t>Clients SHOULD treat M-Sitemap <tt>etag</tt> as a hint for change detection.</t>
            </li>
            <li>
              <t>Clients MAY compare <tt>etag</tt> with the M-URL ETag.</t>
            </li>
            <li>
              <t>Clients MUST NOT treat mismatches alone as protocol errors.</t>
            </li>
            <li>
              <t>In case of mismatch, clients SHOULD fall back to standard conditional requests on the M-URL and treat the M-Sitemap <tt>etag</tt> as an advisory signal only.</t>
            </li>
          </ul>
          <t>Strong ETag values for M-URLs and etag values in the M-Sitemap are scoped to the origin that serves them. Clients MUST NOT assume that identical validator values observed on different origins imply identical content.</t>
        </section>
      </section>
      <section anchor="client-behavior">
        <name>Client Behavior</name>
        <section anchor="zero-fetch-optimization">
          <name>Zero-Fetch Optimization</name>
          <t>A typical TCT-aware client:</t>
          <ol spacing="normal" type="1"><li>
              <t>Fetches <tt>/</tt> and discovers the M-Sitemap via <tt>Link</tt>.</t>
            </li>
            <li>
              <t>Fetches the M-Sitemap.</t>
            </li>
            <li>
              <t>For each item <tt>(cUrl, mUrl, etag)</tt>:  </t>
              <ul spacing="normal">
                <li>
                  <t>If it has a cached ETag for <tt>mUrl</tt>:
                  </t>
                  <ul spacing="normal">
                    <li>
                      <t>If cached ETag equals sitemap <tt>etag</tt>:
                      </t>
                      <ul spacing="normal">
                        <li>
                          <t>MAY skip fetching <tt>mUrl</tt> (zero-fetch).</t>
                        </li>
                      </ul>
                    </li>
                    <li>
                      <t>Otherwise:
                      </t>
                      <ul spacing="normal">
                        <li>
                          <t>SHOULD issue conditional GET:
                          </t>
                          <ul spacing="normal">
                            <li>
                              <t><tt>GET mUrl</tt></t>
                            </li>
                            <li>
                              <t><tt>If-None-Match: "sha256-..."</tt>.</t>
                            </li>
                          </ul>
                        </li>
                      </ul>
                    </li>
                  </ul>
                </li>
                <li>
                  <t>If no cached ETag:
                  </t>
                  <ul spacing="normal">
                    <li>
                      <t>SHOULD fetch <tt>mUrl</tt> with GET.</t>
                    </li>
                    <li>
                      <t>MAY include <tt>If-None-Match</tt> using sitemap <tt>etag</tt> as hint.</t>
                    </li>
                  </ul>
                </li>
              </ul>
            </li>
          </ol>
          <t>This enables large reductions in redundant fetches.</t>
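          <t>The per-item decision in step 3 can be sketched in Python as follows (non-normative). The function name <tt>plan_fetch</tt> and the shape of the returned dictionary are hypothetical. The sketch assumes the client stores its cached ETag without the surrounding HTTP quotes, so that it can be compared directly with the sitemap <tt>etag</tt>, and it revalidates with the validator of the copy it actually holds.</t>
          <sourcecode type="python">
from typing import Optional

def plan_fetch(cached_etag: Optional[str],
               sitemap_etag: Optional[str]) -> dict:
    """Decide how to handle one M-Sitemap item.

    Both arguments are unquoted values such as "sha256-...".
    """
    if cached_etag is not None:
        if cached_etag == sitemap_etag:
            return {"action": "skip"}   # zero-fetch
        # Conditional GET using the cached validator.
        quoted = '"' + cached_etag + '"'
        return {"action": "get",
                "headers": {"If-None-Match": quoted}}
    # Nothing cached: plain GET.
    return {"action": "get", "headers": {}}
</sourcecode>
          <t>A client that prefers strict HTTP revalidation could replace the skip branch with the conditional GET branch and ignore the sitemap hint entirely.</t>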
          <t>This optimization is OPTIONAL. Clients that prefer strict HTTP cache validation MAY always perform a conditional GET with <tt>If-None-Match</tt> on M-URLs instead of relying solely on M-Sitemap <tt>etag</tt> hints.</t>
          <t>When using <tt>etag</tt> values from an M-Sitemap as hints in <tt>If-None-Match</tt> conditional requests:</t>
          <ul spacing="normal">
            <li>
              <t>If the client sends <tt>If-None-Match</tt> based on a sitemap <tt>etag</tt> and receives a <tt>200 OK</tt> response with a different strong ETag, the client MUST update its local cache with the new ETag and representation.</t>
            </li>
            <li>
              <t>Clients MUST NOT treat a mismatch between sitemap <tt>etag</tt> and the authoritative ETag in an M-URL response as a protocol error. Sitemap <tt>etag</tt> values are advisory and may be temporarily stale.</t>
            </li>
          </ul>
        </section>
        <section anchor="conditional-requests-to-m-urls">
          <name>Conditional Requests to M-URLs</name>
          <t>M-URLs implementing TCT:</t>
          <ul spacing="normal">
            <li>
              <t>MUST support <tt>If-None-Match</tt> as defined in <xref target="RFC9110" />.</t>
            </li>
          </ul>
          <t>Specifically:</t>
          <ul spacing="normal">
            <li>
              <t>If <tt>If-None-Match</tt> matches the current strong ETag:
              </t>
              <ul spacing="normal">
                <li>
                  <t>Respond with <tt>304 Not Modified</tt>, no body.</t>
                </li>
              </ul>
            </li>
            <li>
              <t>Otherwise:
              </t>
              <ul spacing="normal">
                <li>
                  <t>Respond with <tt>200 OK</tt> and the JSON representation.</t>
                </li>
              </ul>
            </li>
          </ul>
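          <t>A non-normative sketch of the server-side decision in Python follows. The function name <tt>respond</tt> and its return tuple are hypothetical. The sketch relies on <xref target="RFC9110" /> specifying weak comparison for <tt>If-None-Match</tt>, so a <tt>W/</tt> prefix on a client-supplied tag is ignored, and it treats <tt>*</tt> as matching any current representation. It assumes entity tags contain no double quotes, which holds for the <tt>sha256-</tt> form defined in this document.</t>
          <sourcecode type="python">
import re

# Matches each entity-tag in a field value; group 1 is the opaque tag.
ETAG = re.compile(r'(?:W/)?"([^"]*)"')

def respond(if_none_match, current_etag: str, body: bytes):
    """Return (status, headers, body) for a GET on an M-URL.

    current_etag is the quoted strong ETag, e.g. '"sha256-..."'.
    if_none_match is the raw request field value, or None.
    """
    headers = {"ETag": current_etag}
    if if_none_match is not None:
        value = if_none_match.strip()
        opaque = current_etag.strip('"')
        if value == "*" or opaque in ETAG.findall(value):
            return 304, headers, b""   # Not Modified, no body
    return 200, headers, body
</sourcecode>
          <t>The 304 response still carries the current <tt>ETag</tt>, so a client can refresh its stored validator without receiving the body again.</t>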
          <t>If both <tt>If-None-Match</tt> and <tt>If-Modified-Since</tt> are present:</t>
          <ul spacing="normal">
            <li>
              <t><tt>If-None-Match</tt> MUST take precedence (per <xref target="RFC9110" />).</t>
            </li>
          </ul>
          <t>Servers SHOULD send appropriate <tt>Cache-Control</tt> directives to encourage revalidation and safe caching.</t>
        </section>
        <section anchor="use-of-head-optional">
          <name>Use of HEAD (Optional)</name>
          <t>Clients MAY use <tt>HEAD</tt> on M-URLs as an optimization:</t>
          <ul spacing="normal">
            <li>
              <t>If a HEAD response includes an <tt>ETag</tt> equal to the cached one, a GET may be skipped.</t>
            </li>
            <li>
              <t>If HEAD is unreliable, clients SHOULD fall back to conditional GET.</t>
            </li>
          </ul>
          <t>Conditional GET with <tt>If-None-Match</tt> SHOULD be considered the primary mechanism.</t>
        </section>
        <section anchor="client-validation-optional">
          <name>Client Validation (Optional)</name>
          <t>For protocol correctness:</t>
          <ul spacing="normal">
            <li>
              <t>Clients are not required to implement the text normalization algorithm or to recompute validators.</t>
            </li>
            <li>
              <t>Clients MAY treat ETags and sitemap <tt>etag</tt> values as opaque.</t>
            </li>
          </ul>
          <t>Clients MAY perform additional checks as desired, such as:</t>
          <ul spacing="normal">
            <li>
              <t>verifying that an M-URL includes <tt>rel="canonical"</tt> pointing at the expected C-URL;</t>
            </li>
            <li>
              <t>checking M-Sitemap <tt>etag</tt> vs M-URL ETag parity;</t>
            </li>
            <li>
              <t>recomputing hashes using the published algorithms.</t>
            </li>
          </ul>
          <t>Such additional checks are implementation choices and are out of scope for TCT compliance.</t>
        </section>
      </section>
      <section anchor="operational-considerations-informative">
        <name>Operational Considerations (Informative)</name>
        <section anchor="deployment-with-cdns-and-proxies">
          <name>Deployment with CDNs and Proxies</name>
          <ul spacing="normal">
            <li>
              <t>M-URLs and M-Sitemaps are ordinary cacheable JSON resources.</t>
            </li>
            <li>
              <t>CDNs and reverse proxies:
              </t>
              <ul spacing="normal">
                <li>
                  <t>MAY cache them;</t>
                </li>
                <li>
                  <t>SHOULD honor strong ETags and 304 responses.</t>
                </li>
              </ul>
            </li>
            <li>
              <t>If CDNs or reverse proxies apply content-coding:
              </t>
              <ul spacing="normal">
                <li>
                  <t>they MUST preserve correct HTTP validator semantics for each selected
representation; and</t>
                </li>
                <li>
                  <t>they MUST NOT reuse one strong ETag across non-identical coded and
identity representations.</t>
                </li>
              </ul>
            </li>
          </ul>
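          <t>The following non-normative Python sketch shows one way an operator might check the second requirement. Appending the coding name to the validator (as in <tt>"abc-gzip"</tt>) is a convention used by some HTTP servers and is an assumption here, not something TCT mandates; the function names are likewise hypothetical.</t>
          <sourcecode type="python"><![CDATA[
```python
import hashlib
from collections import defaultdict
from typing import Iterable, Set, Tuple


def coded_etag(identity_etag: str, coding: str) -> str:
    """Derive a distinct strong ETag for a content-coded variant.

    Assumes identity_etag is a quoted entity-tag such as '"abc"'.
    """
    if coding == "identity" or identity_etag.startswith("W/"):
        return identity_etag
    return identity_etag[:-1] + "-" + coding + '"'


def reused_strong_etags(
    observations: Iterable[Tuple[str, bytes]]
) -> Set[str]:
    """Return strong ETags that were seen with more than one body."""
    bodies = defaultdict(set)
    for etag, body in observations:
        if not etag.startswith("W/"):
            bodies[etag].add(hashlib.sha256(body).hexdigest())
    return {etag for etag, seen in bodies.items() if len(seen) > 1}
```
]]></sourcecode>
          <t>A non-empty result from <tt>reused_strong_etags</tt> indicates that an intermediary served non-identical octets under the same strong validator.</t>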
        </section>
        <section anchor="large-sites-and-sharding">
          <name>Large Sites and Sharding</name>
          <t>For very large sites, operators MAY create multiple M-Sitemaps (for example, per section or per date) and list them from an M-Sitemap Index (Section 7.2). More advanced sharding conventions (for example, per-language or per-tenant indexes) are out of scope for this document.</t>
        </section>
        <section anchor="error-handling">
          <name>Error Handling</name>
          <t>Recommended behavior:</t>
          <ul spacing="normal">
            <li>
              <t>If the M-Sitemap is unavailable or invalid:
              </t>
              <ul spacing="normal">
                <li>
                  <t>Treat TCT as temporarily unavailable.</t>
                </li>
              </ul>
            </li>
            <li>
              <t>If an M-URL returns 4xx/5xx:
              </t>
              <ul spacing="normal">
                <li>
                  <t>Follow normal HTTP semantics for retries/backoff.</t>
                </li>
              </ul>
            </li>
            <li>
              <t>If the M-URL is persistently broken:
              </t>
              <ul spacing="normal">
                <li>
                  <t>Clients MAY ignore it and rely on the C-URL HTML, if appropriate.</t>
                </li>
              </ul>
            </li>
          </ul>
        </section>
        <section anchor="rate-limiting-and-backoff">
          <name>Rate Limiting and Backoff</name>
          <t>TCT clients SHOULD respect HTTP rate limiting and overload signals. In particular:</t>
          <ul spacing="normal">
            <li>
              <t>When a client receives a <tt>429 Too Many Requests</tt> or <tt>503 Service Unavailable</tt> response, it SHOULD implement exponential backoff or a similar retry-suppression strategy.</t>
            </li>
            <li>
              <t>Clients SHOULD NOT re-fetch the same M-Sitemap or M-Sitemap Index more frequently than once per minute, unless explicitly directed otherwise by HTTP caching headers (for example, a <tt>Cache-Control: max-age=...</tt> directive).</t>
            </li>
          </ul>
          <t>These requirements are intended to complement, not replace, origin-specific guidance such as <tt>robots.txt</tt> rules, authentication policies, or out-of-band API documentation.</t>
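          <t>A minimal, non-normative Python sketch of these recommendations follows. The base delay, the cap, and the use of full jitter are assumptions chosen for illustration; only the delta-seconds form of <tt>Retry-After</tt> is handled, and all function names are hypothetical.</t>
          <sourcecode type="python"><![CDATA[
```python
import random
from typing import Callable, Optional


def should_back_off(status: int) -> bool:
    """True for the overload signals named in this section."""
    return status in (429, 503)


def retry_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,
    cap: float = 300.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None and retry_after.strip().isdigit():
        # Honor an explicit server hint (delta-seconds form only).
        return float(retry_after.strip())
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling * rng()  # exponential backoff with full jitter


def may_refetch_sitemap(
    last_fetch: float, now: float, min_interval: float = 60.0
) -> bool:
    """Enforce the default once-per-minute floor for M-Sitemaps."""
    return now - last_fetch >= min_interval
```
]]></sourcecode>
          <t>A client would pass a different <tt>min_interval</tt> when HTTP caching headers explicitly direct otherwise.</t>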
        </section>
        <section anchor="backwards-compatibility">
          <name>Backwards Compatibility</name>
          <ul spacing="normal">
            <li>
              <t>TCT is purely additive:
              </t>
              <ul spacing="normal">
                <li>
                  <t>It does not alter C-URL semantics.</t>
                </li>
                <li>
                  <t>Non-participating clients see no change.</t>
                </li>
              </ul>
            </li>
            <li>
              <t>Operators can deploy TCT incrementally on selected resources.</t>
            </li>
          </ul>
        </section>
      </section>
      <section anchor="changes-since-01">
        <name>Changes Since -01</name>
        <t>This section summarizes the main changes between draft-jurkovikj-collab-tunnel-01 and this version (-02):</t>
        <ul spacing="normal">
          <li>
            <t><strong>Envelope and validators</strong>
            </t>
            <ul spacing="normal">
              <li>
                <t>Removed the <tt>hash</tt> field from the M-URL JSON envelope.</t>
              </li>
              <li>
                <t>The HTTP <tt>ETag</tt> header is now the sole authoritative validator for M-URL representations.</t>
              </li>
              <li>
                <t>The JSON body and the ETag are both derived from the same canonical octet sequence (Section 6.2).</t>
              </li>
            </ul>
          </li>
          <li>
            <t><strong>M-URL JSON structure</strong>
            </t>
            <ul spacing="normal">
              <li>
                <t>Added a REQUIRED <tt>profile</tt> field to the M-URL envelope so that clients can identify the TCT profile in use.</t>
              </li>
              <li>
                <t>Added an OPTIONAL <tt>content_media_type</tt> field to describe the media type of <tt>content</tt>. The default is <tt>text/plain; charset=utf-8</tt>.</t>
              </li>
              <li>
                <t>Clarified that <tt>content</tt> is a payload field and may carry Markdown or other text-based formats.</t>
              </li>
            </ul>
          </li>
          <li>
            <t><strong>Canonicalization</strong>
            </t>
            <ul spacing="normal">
              <li>
                <t>Strengthened the canonicalization requirement: implementations MUST use RFC 8785 JSON Canonicalization Scheme (JCS) for both the HTTP response body and ETag computation.</t>
              </li>
            </ul>
          </li>
          <li>
            <t><strong>M-Sitemap and M-Sitemap Index</strong>
            </t>
            <ul spacing="normal">
              <li>
                <t>Defined M-Sitemap format version <tt>2</tt> and clarified the relationship between <tt>version</tt> and <tt>profile</tt>.</t>
              </li>
              <li>
                <t>Introduced an M-Sitemap Index format with its own <tt>version</tt> field, intended for large deployments that shard their M-Sitemaps.</t>
              </li>
              <li>
                <t>Added an OPTIONAL <tt>lastModified</tt> field to M-Sitemap items for advisory timestamp hints.</t>
              </li>
            </ul>
          </li>
          <li>
            <t><strong>Discovery</strong>
            </t>
            <ul spacing="normal">
              <li>
                <t>Added HTML-based discovery for the M-Sitemap using <tt>&lt;link rel="index" type="application/json"&gt;</tt> in HTML <tt>&lt;head&gt;</tt> sections, in addition to HTTP <tt>Link</tt> headers.</t>
              </li>
            </ul>
          </li>
          <li>
            <t><strong>Normalization profile</strong>
            </t>
            <ul spacing="normal">
              <li>
                <t>Defined an optional TCT text normalization profile and provided test vectors for normalization behavior.</t>
              </li>
            </ul>
          </li>
          <li>
            <t><strong>Operational and security guidance</strong>
            </t>
            <ul spacing="normal">
              <li>
                <t>Added guidance on rate limiting and exponential backoff for clients.</t>
              </li>
              <li>
                <t>Added text on scraping, permission boundaries, and content divergence (cloaking) between C-URLs and M-URLs.</t>
              </li>
              <li>
                <t>Recommended the use of <tt>Content-Digest</tt> for stronger integrity protection.</t>
              </li>
              <li>
                <t>Added operational considerations for large content and LLM context window limits.</t>
              </li>
            </ul>
          </li>
        </ul>
      </section>
      <section anchor="security-considerations">
        <name>Security Considerations</name>
        <t>TCT builds directly on HTTP; most security considerations are inherited from <xref target="RFC9110" /> and <xref target="RFC9111" />.</t>
        <t>Key points:</t>
        <ul spacing="normal">
          <li>
            <t>HTTPS:
            </t>
            <ul spacing="normal">
              <li>
                <t>Publishers SHOULD serve M-URLs and M-Sitemaps over HTTPS.</t>
              </li>
              <li>
                <t>Clients SHOULD validate TLS server certificates as usual.</t>
              </li>
            </ul>
          </li>
          <li>
            <t>Integrity:
            </t>
            <ul spacing="normal">
              <li>
                <t>Strong ETags identify specific representations but do not authenticate servers.</t>
              </li>
              <li>
                <t>For stronger guarantees, publishers MAY use:
                </t>
                <ul spacing="normal">
                  <li>
                    <t><tt>Content-Digest</tt> headers (<xref target="RFC9530" />); and/or</t>
                  </li>
                  <li>
                    <t>HTTP Message Signatures (<xref target="RFC9421" />).</t>
                  </li>
                </ul>
              </li>
            </ul>
          </li>
          <li>
            <t>Access control:
            </t>
            <ul spacing="normal">
              <li>
                <t>If a C-URL requires authentication, its corresponding M-URL SHOULD be similarly protected.</t>
              </li>
              <li>
                <t>M-Sitemaps MUST NOT leak sensitive M-URLs or validators.</t>
              </li>
            </ul>
          </li>
          <li>
            <t>Cache poisoning:
            </t>
            <ul spacing="normal">
              <li>
                <t>Correct use of ETag and Cache-Control reduces the risk of caches serving stale or mixed content.</t>
              </li>
              <li>
                <t>Clients MUST treat JSON as untrusted data and validate/sanitize as appropriate.</t>
              </li>
            </ul>
          </li>
          <li>
            <t>Privacy:
            </t>
            <ul spacing="normal">
              <li>
                <t>As with any sitemap, listing URLs in an M-Sitemap can reveal site structure.</t>
              </li>
              <li>
                <t>Publishers concerned about this SHOULD limit entries or restrict access.</t>
              </li>
            </ul>
          </li>
          <li>
            <t>Scraping and automated access:
            </t>
            <ul spacing="normal">
              <li>
                <t>TCT lowers technical friction for automated content retrieval by providing a machine-optimized representation and discovery mechanism.</t>
              </li>
              <li>
                <t>TCT does not change existing permission boundaries: publishers retain the same control mechanisms available for traditional web content (robots.txt, authentication, rate limiting, etc.).</t>
              </li>
              <li>
                <t>Clients SHOULD respect robots.txt directives and other access control policies as they would for any HTTP resource.</t>
              </li>
            </ul>
          </li>
          <li>
            <t>Content divergence (cloaking):
            </t>
            <ul spacing="normal">
              <li>
                <t>Publishers SHOULD ensure that M-URL content accurately reflects the substantive content of the corresponding C-URL.</t>
              </li>
              <li>
                <t>Clients MAY occasionally fetch and compare both C-URL and M-URL representations to detect extreme divergence or cloaking behavior.</t>
              </li>
              <li>
                <t>Clients may treat persistent or deceptive divergence between C-URL and M-URL content as a negative trust signal.</t>
              </li>
            </ul>
          </li>
          <li>
            <t>Origin trust:
            </t>
            <ul spacing="normal">
              <li>
                <t>Clients that consume M-Sitemaps and M-URLs inherently trust the origin in the same way they trust HTML pages or XML Sitemaps from that origin. A compromised or misconfigured origin can advertise incorrect mappings; this is not a new class of attack introduced by TCT.</t>
              </li>
            </ul>
          </li>
          <li>
            <t>Intermediaries:
            </t>
            <ul spacing="normal">
              <li>
                <t>Deployments SHOULD ensure that CDNs and other intermediaries do not strip or rewrite strong ETags on M-URLs or M-Sitemaps, as doing so can interfere with correct validation and zero-fetch behavior.</t>
              </li>
            </ul>
          </li>
        </ul>
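        <t>As a non-normative illustration of the <tt>Content-Digest</tt> option listed under Integrity, the Python sketch below computes and checks a SHA-256 digest in the header syntax defined by <xref target="RFC9530" />. The function names are hypothetical, and the parsing is deliberately simplified: it assumes a comma-separated list of entries and looks only for an exact <tt>sha-256</tt> match.</t>
        <sourcecode type="python"><![CDATA[
```python
import base64
import hashlib


def content_digest(body: bytes) -> str:
    """Build a Content-Digest field value using SHA-256."""
    digest = hashlib.sha256(body).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    return "sha-256=:" + encoded + ":"


def verify_content_digest(body: bytes, header_value: str) -> bool:
    """Check the received message content against the header value.

    Simplified parsing: entries are split on commas and compared
    as whole strings; other algorithms are ignored.
    """
    expected = content_digest(body)
    entries = [entry.strip() for entry in header_value.split(",")]
    return expected in entries
```
]]></sourcecode>
        <t>Because <tt>Content-Digest</tt> covers the message content as transferred, a client would run this check on the received bytes before undoing any content-coding.</t>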
      </section>
      <section anchor="iana-considerations">
        <name>IANA Considerations</name>
        <t>This document has no IANA actions.</t>
        <t>Future documents may define:</t>
        <ul spacing="normal">
          <li>
            <t>a well-known URI for discovering M-Sitemaps, and</t>
          </li>
          <li>
            <t>a media type parameter or profile URI for identifying TCT JSON representations.</t>
          </li>
        </ul>
        <t>Those registrations are intentionally out of scope for this Experimental specification.</t>
      </section>
    </middle>
  <back>
    <references anchor="sec-normative-references">
        <name>Normative References</name>
        <reference anchor="RFC2119" target="https://www.rfc-editor.org/info/rfc2119" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml">
          <front>
            <title>Key words for use in RFCs to Indicate Requirement Levels</title>
            <author fullname="S. Bradner" initials="S." surname="Bradner" />
            <date month="March" year="1997" />
            <abstract>
              <t>In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.</t>
            </abstract>
          </front>
          <seriesInfo name="BCP" value="14" />
          <seriesInfo name="RFC" value="2119" />
          <seriesInfo name="DOI" value="10.17487/RFC2119" />
        </reference>
        <reference anchor="RFC8174" target="https://www.rfc-editor.org/info/rfc8174" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml">
          <front>
            <title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words</title>
            <author fullname="B. Leiba" initials="B." surname="Leiba" />
            <date month="May" year="2017" />
            <abstract>
              <t>RFC 2119 specifies common key words that may be used in protocol specifications. This document aims to reduce the ambiguity by clarifying that only UPPERCASE usage of the key words have the defined special meanings.</t>
            </abstract>
          </front>
          <seriesInfo name="BCP" value="14" />
          <seriesInfo name="RFC" value="8174" />
          <seriesInfo name="DOI" value="10.17487/RFC8174" />
        </reference>
        <reference anchor="RFC8259" target="https://www.rfc-editor.org/info/rfc8259" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8259.xml">
          <front>
            <title>The JavaScript Object Notation (JSON) Data Interchange Format</title>
            <author fullname="T. Bray" initials="T." role="editor" surname="Bray" />
            <date month="December" year="2017" />
            <abstract>
              <t>JavaScript Object Notation (JSON) is a lightweight, text-based, language-independent data interchange format. It was derived from the ECMAScript Programming Language Standard. JSON defines a small set of formatting rules for the portable representation of structured data.</t>
              <t>This document removes inconsistencies with other specifications of JSON, repairs specification errors, and offers experience-based interoperability guidance.</t>
            </abstract>
          </front>
          <seriesInfo name="STD" value="90" />
          <seriesInfo name="RFC" value="8259" />
          <seriesInfo name="DOI" value="10.17487/RFC8259" />
        </reference>
        <reference anchor="RFC8785" target="https://www.rfc-editor.org/info/rfc8785" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8785.xml">
          <front>
            <title>JSON Canonicalization Scheme (JCS)</title>
            <author fullname="A. Rundgren" initials="A." surname="Rundgren" />
            <author fullname="B. Jordan" initials="B." surname="Jordan" />
            <author fullname="S. Erdtman" initials="S." surname="Erdtman" />
            <date month="June" year="2020" />
            <abstract>
              <t>Cryptographic operations like hashing and signing need the data to be expressed in an invariant format so that the operations are reliably repeatable. One way to address this is to create a canonical representation of the data. Canonicalization also permits data to be exchanged in its original form on the "wire" while cryptographic operations performed on the canonicalized counterpart of the data in the producer and consumer endpoints generate consistent results.</t>
              <t>This document describes the JSON Canonicalization Scheme (JCS). This specification defines how to create a canonical representation of JSON data by building on the strict serialization methods for JSON primitives defined by ECMAScript, constraining JSON data to the Internet JSON (I-JSON) subset, and by using deterministic property sorting.</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="8785" />
          <seriesInfo name="DOI" value="10.17487/RFC8785" />
        </reference>
        <reference anchor="RFC9110" target="https://www.rfc-editor.org/info/rfc9110" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9110.xml">
          <front>
            <title>HTTP Semantics</title>
            <author fullname="R. Fielding" initials="R." role="editor" surname="Fielding" />
            <author fullname="M. Nottingham" initials="M." role="editor" surname="Nottingham" />
            <author fullname="J. Reschke" initials="J." role="editor" surname="Reschke" />
            <date month="June" year="2022" />
            <abstract>
              <t>The Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol for distributed, collaborative, hypertext information systems. This document describes the overall architecture of HTTP, establishes common terminology, and defines aspects of the protocol that are shared by all versions. In this definition are core protocol elements, extensibility mechanisms, and the "http" and "https" Uniform Resource Identifier (URI) schemes.</t>
              <t>This document updates RFC 3864 and obsoletes RFCs 2818, 7231, 7232, 7233, 7235, 7538, 7615, 7694, and portions of 7230.</t>
            </abstract>
          </front>
          <seriesInfo name="STD" value="97" />
          <seriesInfo name="RFC" value="9110" />
          <seriesInfo name="DOI" value="10.17487/RFC9110" />
        </reference>
        <reference anchor="RFC9111" target="https://www.rfc-editor.org/info/rfc9111" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9111.xml">
          <front>
            <title>HTTP Caching</title>
            <author fullname="R. Fielding" initials="R." role="editor" surname="Fielding" />
            <author fullname="M. Nottingham" initials="M." role="editor" surname="Nottingham" />
            <author fullname="J. Reschke" initials="J." role="editor" surname="Reschke" />
            <date month="June" year="2022" />
            <abstract>
              <t>The Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol for distributed, collaborative, hypertext information systems. This document defines HTTP caches and the associated header fields that control cache behavior or indicate cacheable response messages.</t>
              <t>This document obsoletes RFC 7234.</t>
            </abstract>
          </front>
          <seriesInfo name="STD" value="98" />
          <seriesInfo name="RFC" value="9111" />
          <seriesInfo name="DOI" value="10.17487/RFC9111" />
        </reference>
      </references>
      <references anchor="sec-informative-references">
        <name>Informative References</name>
        <reference anchor="RFC6596" target="https://www.rfc-editor.org/info/rfc6596" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6596.xml">
          <front>
            <title>The Canonical Link Relation</title>
            <author fullname="M. Ohye" initials="M." surname="Ohye" />
            <author fullname="J. Kupke" initials="J." surname="Kupke" />
            <date month="April" year="2012" />
            <abstract>
              <t>RFC 5988 specifies a way to define relationships between links on the web. This document describes a new type of such a relationship, "canonical", to designate an Internationalized Resource Identifier (IRI) as preferred over resources with duplicative content. This document is not an Internet Standards Track specification; it is published for informational purposes.</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="6596" />
          <seriesInfo name="DOI" value="10.17487/RFC6596" />
        </reference>
        <reference anchor="RFC9421" target="https://www.rfc-editor.org/info/rfc9421" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9421.xml">
          <front>
            <title>HTTP Message Signatures</title>
            <author fullname="A. Backman" initials="A." role="editor" surname="Backman" />
            <author fullname="J. Richer" initials="J." role="editor" surname="Richer" />
            <author fullname="M. Sporny" initials="M." surname="Sporny" />
            <date month="February" year="2024" />
            <abstract>
              <t>This document describes a mechanism for creating, encoding, and verifying digital signatures or message authentication codes over components of an HTTP message. This mechanism supports use cases where the full HTTP message may not be known to the signer and where the message may be transformed (e.g., by intermediaries) before reaching the verifier. This document also describes a means for requesting that a signature be applied to a subsequent HTTP message in an ongoing HTTP exchange.</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="9421" />
          <seriesInfo name="DOI" value="10.17487/RFC9421" />
        </reference>
        <reference anchor="RFC9530" target="https://www.rfc-editor.org/info/rfc9530" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9530.xml">
          <front>
            <title>Digest Fields</title>
            <author fullname="R. Polli" initials="R." surname="Polli" />
            <author fullname="L. Pardue" initials="L." surname="Pardue" />
            <date month="February" year="2024" />
            <abstract>
              <t>This document defines HTTP fields that support integrity digests. The Content-Digest field can be used for the integrity of HTTP message content. The Repr-Digest field can be used for the integrity of HTTP representations. Want-Content-Digest and Want-Repr-Digest can be used to indicate a sender's interest and preferences for receiving the respective Integrity fields.</t>
              <t>This document obsoletes RFC 3230 and the Digest and Want-Digest HTTP fields.</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="9530" />
          <seriesInfo name="DOI" value="10.17487/RFC9530" />
        </reference>
        <reference anchor="I-D.ietf-httpbis-unencoded-digest" target="https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-unencoded-digest-04" xml:base="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.ietf-httpbis-unencoded-digest.xml">
          <front>
            <title>HTTP Unencoded Digest</title>
            <author fullname="Lucas Pardue" initials="L." surname="Pardue">
              <organization>Cloudflare</organization>
            </author>
            <author fullname="Mike West" initials="M." surname="West">
              <organization>Google</organization>
            </author>
            <date day="2" month="March" year="2026" />
            <abstract>
              <t>The Repr-Digest and Content-Digest integrity fields are subject to HTTP content coding considerations. There are some use cases that benefit from the unambiguous exchange of integrity digests of unencoded representation. The Unencoded-Digest and Want-Unencoded- Digest fields complement existing integrity fields for this purpose. This document updates the terms "Integrity fields" and "Integrity preference fields" defined in RFC 9530.</t>
            </abstract>
          </front>
          <seriesInfo name="Internet-Draft" value="draft-ietf-httpbis-unencoded-digest-04" />
        </reference>
      </references>
    <section anchor="appendix-a-normalization-test-vectors">
      <name>Appendix A. Normalization Test Vectors</name>
      <t>This appendix provides test vectors for implementations that claim conformance to the normalization algorithm (Section 6.4).</t>
      <t><strong>Conformance Requirement:</strong> Implementations claiming normalization support MUST produce the outputs specified below for all test inputs.</t>
      <t><strong>Test Format:</strong> Each test shows:</t>
      <ul spacing="normal">
        <li>
          <t>Input string</t>
        </li>
        <li>
          <t>Output after each normalization step</t>
        </li>
        <li>
          <t>Final SHA-256 hash (computed over UTF-8 bytes of final output)</t>
        </li>
      </ul>
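      <t>For orientation, the six steps named in the test vectors can be approximated with the Python standard library as in the non-normative sketch below. It is not the algorithm of Section 6.4 itself. In particular, the set of control characters removed in step 4 (Unicode category Cc, except tab, line feed, and carriage return) and the whitespace definition used in step 5 are assumptions of this example.</t>
      <sourcecode type="python"><![CDATA[
```python
import hashlib
import html
import re
import unicodedata

# Assumption: these control characters survive step 4 so that
# step 5 can turn them into spaces.
_KEPT_CONTROLS = {"\t", "\n", "\r"}


def normalize(text: str) -> str:
    text = html.unescape(text)                  # step 1: HTML decode
    text = unicodedata.normalize("NFKC", text)  # step 2: NFKC
    text = text.casefold()                      # step 3: case folding
    text = "".join(                             # step 4: drop controls
        ch for ch in text
        if ch in _KEPT_CONTROLS or unicodedata.category(ch) != "Cc"
    )
    text = re.sub(r"\s+", " ", text)            # step 5: collapse
    return text.strip()                         # step 6: trim


def normalized_sha256(text: str) -> str:
    """Hex SHA-256 over the UTF-8 bytes of the normalized text."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
```
]]></sourcecode>
      <t>An implementation could run each input below through <tt>normalized_sha256</tt> and compare the result with the listed hash.</t>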
      <section anchor="a1-basic-ascii">
        <name>A.1. Basic ASCII</name>
        <t><strong>Test 1: Simple ASCII text</strong>
- Input: <tt>"Hello World"</tt>
- After step 1 (HTML decode): <tt>"Hello World"</tt>
- After step 2 (NFKC): <tt>"Hello World"</tt>
- After step 3 (casefold): <tt>"hello world"</tt>
- After step 4 (control chars): <tt>"hello world"</tt>
- After step 5 (whitespace): <tt>"hello world"</tt>
- After step 6 (trim): <tt>"hello world"</tt>
- SHA-256: <tt>b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9</tt></t>
        <t><strong>Test 2: Leading/trailing whitespace</strong>
- Input: <tt>"  Hello World  "</tt>
- After step 1: <tt>"  Hello World  "</tt>
- After step 2: <tt>"  Hello World  "</tt>
- After step 3: <tt>"  hello world  "</tt>
- After step 4: <tt>"  hello world  "</tt>
- After step 5: <tt>" hello world "</tt>
- After step 6: <tt>"hello world"</tt>
- SHA-256: <tt>b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9</tt></t>
        <t><strong>Test 3: Multiple spaces</strong>
- Input: <tt>"Hello    World"</tt>
- After step 1: <tt>"Hello    World"</tt>
- After step 2: <tt>"Hello    World"</tt>
- After step 3: <tt>"hello    world"</tt>
- After step 4: <tt>"hello    world"</tt>
- After step 5: <tt>"hello world"</tt>
- After step 6: <tt>"hello world"</tt>
- SHA-256: <tt>b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9</tt></t>
        <section anchor="a2-html-entities">
          <name>A.2. HTML Entities</name>
          <t><strong>Test 4: Common HTML entities</strong>
- Input: <tt>"Hello &amp;amp; goodbye"</tt>
- After step 1: <tt>"Hello &amp; goodbye"</tt>
- After step 2: <tt>"Hello &amp; goodbye"</tt>
- After step 3: <tt>"hello &amp; goodbye"</tt>
- After step 4: <tt>"hello &amp; goodbye"</tt>
- After step 5: <tt>"hello &amp; goodbye"</tt>
- After step 6: <tt>"hello &amp; goodbye"</tt>
- SHA-256: <tt>da73536eaa9c427f3189de5b6371d798193e98f3c31df8bef710bba835e8c621</tt></t>
          <t><strong>Test 5: Angle brackets</strong>
- Input: <tt>"&amp;lt;tag&amp;gt;"</tt>
- After step 1: <tt>"&lt;tag&gt;"</tt>
- After step 2: <tt>"&lt;tag&gt;"</tt>
- After step 3: <tt>"&lt;tag&gt;"</tt>
- After step 4: <tt>"&lt;tag&gt;"</tt>
- After step 5: <tt>"&lt;tag&gt;"</tt>
- After step 6: <tt>"&lt;tag&gt;"</tt>
- SHA-256: <tt>c81ef880af0fcfef49e1b45c3690a1666c47d9e064b7eaead2af09bb78884dcd</tt></t>
          <t><strong>Test 6: Quotes</strong>
- Input: <tt>"&amp;quot;quoted&amp;quot;"</tt>
- After step 1: <tt>"\"quoted\""</tt>
- After step 2: <tt>"\"quoted\""</tt>
- After step 3: <tt>"\"quoted\""</tt>
- After step 4: <tt>"\"quoted\""</tt>
- After step 5: <tt>"\"quoted\""</tt>
- After step 6: <tt>"\"quoted\""</tt>
- SHA-256: <tt>272fca25899893eeb27b89583d5c81b8a4ac5af4d1e37e3909d879947303c1c5</tt></t>
        </section>
        <section anchor="a3-unicode-normalization-nfkc">
          <name>A.3. Unicode Normalization (NFKC)</name>
          <t><strong>Test 7: Composed vs decomposed U+00E9</strong>
- Input (composed): <tt>"Caf\u00E9"</tt> (U+00E9)
- After step 1: <tt>"Caf\u00E9"</tt>
- After step 2 (NFKC): <tt>"Caf\u00E9"</tt> (normalized to composed form)
- After step 3: <tt>"caf\u00E9"</tt>
- After step 4: <tt>"caf\u00E9"</tt>
- After step 5: <tt>"caf\u00E9"</tt>
- After step 6: <tt>"caf\u00E9"</tt>
- SHA-256: <tt>850f7dc43910ff890f8879c0ed26fe697c93a067ad93a7d50f466a7028a9bf4e</tt></t>
          <t><strong>Test 7b: Decomposed form (should produce same result)</strong>
- Input (decomposed): <tt>"Cafe\u0301"</tt> (e + combining acute)
- After step 2 (NFKC): <tt>"Caf\u00E9"</tt> (normalized to composed)
- Final result: Same as Test 7
- SHA-256: <tt>850f7dc43910ff890f8879c0ed26fe697c93a067ad93a7d50f466a7028a9bf4e</tt> (same as Test 7)</t>
          <t><strong>Test 8: Full-width characters</strong>
- Input: <tt>"\uFF28\uFF25\uFF2C\uFF2C\uFF2F"</tt> (full-width Latin)
- After step 1: <tt>"\uFF28\uFF25\uFF2C\uFF2C\uFF2F"</tt>
- After step 2 (NFKC): <tt>"HELLO"</tt> (converted to half-width)
- After step 3: <tt>"hello"</tt>
- After step 4: <tt>"hello"</tt>
- After step 5: <tt>"hello"</tt>
- After step 6: <tt>"hello"</tt>
- SHA-256: <tt>2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824</tt></t>
        </section>
        <section anchor="a4-case-folding-edge-cases">
          <name>A.4. Case Folding Edge Cases</name>
          <t><strong>Test 9: German sharp S</strong>
- Input: <tt>"Stra\u00DFe"</tt>
- After step 1: <tt>"Stra\u00DFe"</tt>
- After step 2: <tt>"Stra\u00DFe"</tt>
- After step 3 (casefold): <tt>"strasse"</tt> (U+00DF -&gt; ss)
- After step 4: <tt>"strasse"</tt>
- After step 5: <tt>"strasse"</tt>
- After step 6: <tt>"strasse"</tt>
- SHA-256: <tt>16d96952087774fee069b7585d3991b24d90c181c09b2129b4908c35baa7f0c0</tt></t>
          <t><strong>Test 10: Turkish U+0130 (dotted capital I)</strong>
- Input: <tt>"\u0130stanbul"</tt>
- After step 1: <tt>"\u0130stanbul"</tt>
- After step 2: <tt>"\u0130stanbul"</tt>
- After step 3 (casefold): <tt>"i\u0307stanbul"</tt> (locale-independent)
- After step 4: <tt>"i\u0307stanbul"</tt>
- After step 5: <tt>"i\u0307stanbul"</tt>
- After step 6: <tt>"i\u0307stanbul"</tt>
- SHA-256: <tt>4a4df120f7d1f3c286f58651abfcec2aade892ace635f96f02b946c96e6e1f86</tt></t>
        </section>
        <section anchor="a5-control-characters">
          <name>A.5. Control Characters</name>
          <t><strong>Test 11: Embedded tab</strong>
- Input: <tt>"Hello\tWorld"</tt>
- After step 1: <tt>"Hello\tWorld"</tt>
- After step 2: <tt>"Hello\tWorld"</tt>
- After step 3: <tt>"hello\tworld"</tt>
- After step 4: <tt>"hello\tworld"</tt> (tab preserved)
- After step 5: <tt>"hello world"</tt> (tab -&gt; space)
- After step 6: <tt>"hello world"</tt>
- SHA-256: <tt>b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9</tt></t>
          <t><strong>Test 12: Embedded newline</strong>
- Input: <tt>"Hello\nWorld"</tt>
- After step 1: <tt>"Hello\nWorld"</tt>
- After step 2: <tt>"Hello\nWorld"</tt>
- After step 3: <tt>"hello\nworld"</tt>
- After step 4: <tt>"hello\nworld"</tt> (newline preserved)
- After step 5: <tt>"hello world"</tt> (newline -&gt; space)
- After step 6: <tt>"hello world"</tt>
- SHA-256: <tt>b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9</tt></t>
          <t><strong>Test 13: Control character (BEL)</strong>
- Input: <tt>"Hello\u0007World"</tt> (U+0007 = BEL)
- After step 1: <tt>"Hello\u0007World"</tt>
- After step 2: <tt>"Hello\u0007World"</tt>
- After step 3: <tt>"hello\u0007world"</tt>
- After step 4: <tt>"helloworld"</tt> (control char removed)
- After step 5: <tt>"helloworld"</tt>
- After step 6: <tt>"helloworld"</tt>
- SHA-256: <tt>936a185caaa266bb9cbe981e9e05cb78cd732b0b3280eb944412bb6f8f8f07af</tt></t>
        </section>
        <section anchor="a6-whitespace-edge-cases">
          <name>A.6. Whitespace Edge Cases</name>
          <t><strong>Test 14: Non-breaking space (NBSP)</strong>
- Input: <tt>"Hello\u00A0World"</tt> (U+00A0 = NBSP)
- After step 1: <tt>"Hello\u00A0World"</tt>
- After step 2 (NFKC): <tt>"Hello World"</tt> (NFKC maps U+00A0 to U+0020)
- After step 3: <tt>"hello world"</tt>
- After step 4: <tt>"hello world"</tt>
- After step 5: <tt>"hello world"</tt>
- After step 6: <tt>"hello world"</tt>
- SHA-256: <tt>b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9</tt></t>
          <t><strong>Note:</strong> Section 6.4 leaves Step 5 handling of NBSP implementation-defined ("Optionally treat U+00A0 as whitespace, if done consistently"). That choice does not affect this vector, because NFKC in step 2 has already replaced U+00A0 with U+0020.</t>
          <t><strong>Test 15: Mixed whitespace</strong>
- Input: <tt>"Hello \t\n World"</tt>
- After step 1: <tt>"Hello \t\n World"</tt>
- After step 2: <tt>"Hello \t\n World"</tt>
- After step 3: <tt>"hello \t\n world"</tt>
- After step 4: <tt>"hello \t\n world"</tt>
- After step 5: <tt>"hello world"</tt>
- After step 6: <tt>"hello world"</tt>
- SHA-256: <tt>b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9</tt></t>
        </section>
        <section anchor="a7-empty-and-edge-cases">
          <name>A.7. Empty and Edge Cases</name>
          <t><strong>Test 16: Empty string</strong>
- Input: <tt>""</tt>
- After all steps: <tt>""</tt>
- SHA-256: <tt>e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855</tt></t>
          <t><strong>Test 17: Whitespace only</strong>
- Input: <tt>"   "</tt>
- After step 1: <tt>"   "</tt>
- After step 2: <tt>"   "</tt>
- After step 3: <tt>"   "</tt>
- After step 4: <tt>"   "</tt>
- After step 5: <tt>" "</tt>
- After step 6: <tt>""</tt> (trimmed)
- SHA-256: <tt>e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855</tt></t>
          <t><strong>Test 18: Single character</strong>
- Input: <tt>"A"</tt>
- After step 1: <tt>"A"</tt>
- After step 2: <tt>"A"</tt>
- After step 3: <tt>"a"</tt>
- After step 4: <tt>"a"</tt>
- After step 5: <tt>"a"</tt>
- After step 6: <tt>"a"</tt>
- SHA-256: <tt>ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb</tt></t>
        </section>
        <section anchor="a8-complex-real-world-examples">
          <name>A.8. Complex Real-World Examples</name>
          <t><strong>Test 19: Article excerpt</strong>
- Input: <tt>"The caf\u00E9's "special" offer: 50% off!"</tt> (Note: uses straight ASCII quotes ' and ", not curly quotes)
- Expected output after normalization: <tt>the caf\u00E9's "special" offer: 50% off!</tt>
- SHA-256: <tt>25cdbe2315674d38ddaf1df6fe7ccd494ce89efebe8a3b5285742e57e7367545</tt></t>
          <t><strong>Test 20: Unicode mixed with entities</strong>
- Input: <tt>"Clich\u00E9 &amp;amp; r\u00E9sum\u00E9"</tt>
- Expected output after normalization: <tt>clich\u00E9 &amp; r\u00E9sum\u00E9</tt>
- SHA-256: <tt>7d56f360edd22f7be0bc0f126d45481df83e8afc68b83788cf37544c4ee6ce21</tt></t>
        </section>
        <section anchor="a9-implementation-notes">
          <name>A.9. Implementation Notes</name>
          <t><strong>Computing SHA-256:</strong>
- Encode the final normalized string as UTF-8 bytes
- Compute SHA-256 over those bytes
- Express result as 64 lowercase hex characters</t>
          <t><strong>Test Vector Validation:</strong>
- Implementations claiming normalization support MUST produce the SHA-256 hashes specified above
- The seven "hello world" variants (Tests 1, 2, 3, 11, 12, 14, 15) all normalize to identical output (<tt>"hello world"</tt>), demonstrating whitespace normalization equivalence
- Tests 7 and 7b demonstrate NFKC combining character handling (both produce identical hashes)</t>
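          <t>As an informal aid, the vectors in this appendix can be checked with a short Python sketch along the following lines. It is not a reference implementation. It assumes that step 1 is HTML entity decoding (inferred from Test 20, not restated in this appendix) and that step 4 removes Unicode control characters other than tab, line feed, and carriage return. The function names <tt>normalize</tt> and <tt>sha256_hex</tt> are hypothetical.</t>
          <sourcecode type="python"><![CDATA[
```python
import hashlib
import html
import re
import unicodedata


def normalize(text: str) -> str:
    """Sketch of the six normalization steps exercised by Appendix A."""
    text = html.unescape(text)                  # step 1 (assumed): decode entities
    text = unicodedata.normalize("NFKC", text)  # step 2: NFKC
    text = text.casefold()                      # step 3: locale-independent folding
    # step 4 (assumed scope): drop control characters, keeping tab/LF/CR
    text = "".join(
        ch for ch in text
        if ch in "\t\n\r" or unicodedata.category(ch) != "Cc"
    )
    text = re.sub(r"\s+", " ", text)            # step 5: collapse whitespace runs
    return text.strip()                         # step 6: trim


def sha256_hex(text: str) -> str:
    """Hash the UTF-8 bytes and return 64 lowercase hex characters (A.9)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```
]]></sourcecode>
          <t>For example, <tt>sha256_hex(normalize("Hello \t\n World"))</tt> can be compared against the value listed for Test 15.</t>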
        </section>
      </section>
      <section anchor="appendix-b-example-flows-informative">
        <name>Appendix B. Example Flows (Informative)</name>
        <t>This appendix illustrates TCT discovery and fetch patterns.</t>
        <section anchor="b1-initial-discovery-and-first-fetch">
          <name>B.1. Initial Discovery and First Fetch</name>
          <t><strong>Scenario:</strong> Client visits origin for the first time.</t>
          <artwork>
1. Client -&gt; Server: GET / HTTP/1.1
   Host: example.com

2. Server -&gt; Client: HTTP/1.1 200 OK
   Link: &lt;/llm-pages.json&gt;; rel="index"; type="application/json"
   Content-Type: text/html

   [HTML body...]

3. Client -&gt; Server: GET /llm-pages.json HTTP/1.1
   Host: example.com

4. Server -&gt; Client: HTTP/1.1 200 OK
   Content-Type: application/json; charset=utf-8
   Cache-Control: max-age=0, must-revalidate

   {
     "version": 2,
     "profile": "tct-1",
     "items": [
       {
         "cUrl": "https://example.com/article/",
         "mUrl": "https://example.com/article/llm.json",
         "etag": "sha256-abc123..."
       }
     ]
   }

   (M-Sitemap version 2 as defined in Section 7.1)

5. Client -&gt; Server: GET /article/llm.json HTTP/1.1
   Host: example.com

6. Server -&gt; Client: HTTP/1.1 200 OK
   Content-Type: application/json; charset=utf-8
   ETag: "sha256-abc123..."
   Link: &lt;https://example.com/article/&gt;; rel="canonical"

   {
     "profile": "tct-1",
     "canonical_url": "https://example.com/article/",
     "title": "The Golden Ball Rule",
     "content_media_type": "text/markdown; charset=utf-8",
     "content": "## Overview\n\nThe Golden Ball rule allows for a **147** maximum break..."
   }
</artwork>
          <t><strong>Client actions after step 6:</strong>
- Stores M-URL content locally
- Caches ETag <tt>"sha256-abc123..."</tt> for future revalidation</t>
        </section>
        <section anchor="b2-zero-fetch-optimization-content-unchanged">
          <name>B.2. Zero-Fetch Optimization (Content Unchanged)</name>
          <t><strong>Scenario:</strong> Client returns after some time; content hasn't changed.</t>
          <artwork>
1. Client -&gt; Server: GET /llm-pages.json HTTP/1.1
   Host: example.com

2. Server -&gt; Client: HTTP/1.1 200 OK
   Content-Type: application/json; charset=utf-8

   {
     "version": 2,
     "profile": "tct-1",
     "items": [
       {
         "cUrl": "https://example.com/article/",
         "mUrl": "https://example.com/article/llm.json",
         "etag": "sha256-abc123..."
       }
     ]
   }

3. Client local comparison:
   - Sitemap etag: "sha256-abc123..."
   - Cached ETag:  "sha256-abc123..."
   - Match! -&gt; Skip fetch entirely (zero-fetch)

4. Client uses locally cached content for /article/
</artwork>
          <t><strong>Result:</strong> Zero bytes transferred for M-URL; content known to be current.</t>
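          <t>The comparison in step 3 can be sketched in Python as follows. This is a minimal illustration, not a normative algorithm. It assumes the sitemap item fields <tt>mUrl</tt> and <tt>etag</tt> shown above and a client cache keyed by M-URL that stores the ETag header value with its surrounding quotes. The function name <tt>urls_needing_fetch</tt> is hypothetical.</t>
          <sourcecode type="python"><![CDATA[
```python
from typing import Dict, List


def urls_needing_fetch(sitemap_items: List[dict],
                       cached_etags: Dict[str, str]) -> List[str]:
    """Return the M-URLs whose sitemap etag differs from the cached ETag."""
    stale = []
    for item in sitemap_items:
        cached = cached_etags.get(item["mUrl"])
        # Strip quotes so a cached header value ("sha256-...") compares
        # equal to the unquoted value carried in the sitemap.
        if cached is None or cached.strip('"') != item["etag"].strip('"'):
            stale.append(item["mUrl"])
    return stale
```
]]></sourcecode>
          <t>An empty result means every listed M-URL can be served from the local cache without any request.</t>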
        </section>
        <section anchor="b3-conditional-request-content-unchanged">
          <name>B.3. Conditional Request (Content Unchanged)</name>
          <t><strong>Scenario:</strong> Sitemap etag differs from cache, but actual content hasn't changed.</t>
          <artwork>
1. Client -&gt; Server: GET /llm-pages.json HTTP/1.1

2. Server -&gt; Client: [Sitemap shows etag: "sha256-def456..."]

3. Client comparison:
   - Sitemap etag: "sha256-def456..." (different!)
   - Cached ETag:  "sha256-abc123..."
   - Mismatch -&gt; Must fetch, but use conditional request

4. Client -&gt; Server: GET /article/llm.json HTTP/1.1
   If-None-Match: "sha256-abc123..."

5. Server -&gt; Client: HTTP/1.1 304 Not Modified
   ETag: "sha256-abc123..."

6. Client actions:
   - Content unchanged; uses cached copy
   - Notes: sitemap was stale/inconsistent; no protocol violation
</artwork>
          <t><strong>Result:</strong> Small 304 response instead of full payload.</t>
        </section>
        <section anchor="b4-content-changed">
          <name>B.4. Content Changed</name>
          <t><strong>Scenario:</strong> Content has been updated.</t>
          <artwork>
1. Client -&gt; Server: GET /llm-pages.json HTTP/1.1

2. Server -&gt; Client: [Sitemap shows etag: "sha256-xyz789..."]

3. Client comparison:
   - Sitemap etag: "sha256-xyz789..." (different)
   - Cached ETag:  "sha256-abc123..."
   - Mismatch -&gt; Fetch with If-None-Match

4. Client -&gt; Server: GET /article/llm.json HTTP/1.1
   If-None-Match: "sha256-abc123..."

5. Server -&gt; Client: HTTP/1.1 200 OK
   Content-Type: application/json; charset=utf-8
   ETag: "sha256-xyz789..."

   {
     "profile": "tct-1",
     "canonical_url": "https://example.com/article/",
     "title": "Updated Title",
     "content_media_type": "text/markdown; charset=utf-8",
     "content": "## Overview\n\nUpdated content with new information..."
   }

6. Client actions:
   - Replaces cached content
   - Updates cached ETag to "sha256-xyz789..."
</artwork>
          <t><strong>Result:</strong> Full new representation received.</t>
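          <t>The client-side handling of the 304 and 200 outcomes in B.3 and B.4 might look like the following Python sketch. It is transport-agnostic and makes several assumptions: the caller performs the HTTP request itself, response header names have already been normalized to the spelling <tt>ETag</tt>, and the names <tt>CacheEntry</tt>, <tt>conditional_headers</tt>, and <tt>apply_response</tt> are hypothetical.</t>
          <sourcecode type="python"><![CDATA[
```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CacheEntry:
    etag: str    # ETag header value, including quotes
    body: bytes  # stored M-URL representation


def conditional_headers(entry: Optional[CacheEntry]) -> Dict[str, str]:
    """Send If-None-Match only when a cached copy exists."""
    return {"If-None-Match": entry.etag} if entry else {}


def apply_response(entry: Optional[CacheEntry], status: int,
                   headers: Dict[str, str], body: bytes) -> CacheEntry:
    """Return the cache entry to keep after a conditional GET."""
    if status == 304:
        if entry is None:
            raise ValueError("304 received without a cached copy")
        return entry  # content unchanged; keep the cached copy
    if status == 200:
        # Always store the ETag from the M-URL response itself,
        # not the value advertised in the sitemap.
        return CacheEntry(etag=headers["ETag"], body=body)
    raise ValueError(f"unexpected status {status}")
```
]]></sourcecode>
          <t>Storing the ETag taken from the response, rather than the sitemap value, keeps later revalidation correct even when the sitemap is briefly out of date.</t>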
        </section>
        <section anchor="b5-parity-mismatch-handling">
          <name>B.5. Parity Mismatch Handling</name>
          <t><strong>Scenario:</strong> Sitemap etag doesn't match actual M-URL ETag (transient inconsistency).</t>
          <artwork>
1. Client -&gt; Server: GET /llm-pages.json HTTP/1.1

2. Server -&gt; Client: [Sitemap shows etag: "sha256-old999..."]

3. Client -&gt; Server: GET /article/llm.json HTTP/1.1
   If-None-Match: "sha256-old999..."

4. Server -&gt; Client: HTTP/1.1 200 OK
   ETag: "sha256-new000..."  &lt;- Different from sitemap!

   [Full M-URL JSON...]

5. Client actions:
   - Accepts M-URL response (valid per HTTP)
   - Uses ETag from M-URL response ("sha256-new000...") for future revalidation
   - Notes: sitemap inconsistency tolerated; no error
</artwork>
          <t><strong>Result:</strong> Client falls back to standard HTTP caching; no protocol failure.</t>
        </section>
        <section anchor="b6-using-head-for-efficient-freshness-check">
          <name>B.6. Using HEAD for Efficient Freshness Check</name>
          <t><strong>Scenario:</strong> Client wants to check freshness before fetching.</t>
          <artwork>
1. Client -&gt; Server: HEAD /article/llm.json HTTP/1.1
   If-None-Match: "sha256-abc123..."

2. Server -&gt; Client: HTTP/1.1 304 Not Modified
   ETag: "sha256-abc123..."

3. Client actions:
   - Content unchanged; uses cached copy
   - Avoided transferring full body

OR (if content changed):

2. Server -&gt; Client: HTTP/1.1 200 OK
   ETag: "sha256-xyz789..."
   Content-Length: 4567

3. Client -&gt; Server: GET /article/llm.json HTTP/1.1
   If-None-Match: "sha256-abc123..."

4. Server -&gt; Client: HTTP/1.1 200 OK
   ETag: "sha256-xyz789..."

   [Full M-URL JSON...]
</artwork>
          <t><strong>Note:</strong> HEAD support is optional; conditional GET is the primary mechanism.</t>
        </section>
        <section anchor="b7-discovery-via-c-url">
          <name>B.7. Discovery via C-URL</name>
          <t><strong>Scenario:</strong> Client discovers M-URL directly from HTML page.</t>
          <artwork>
1. Client -&gt; Server: GET /article/ HTTP/1.1

2. Server -&gt; Client: HTTP/1.1 200 OK
   Link: &lt;/article/llm.json&gt;; rel="alternate"; type="application/json"
   Content-Type: text/html

   &lt;!DOCTYPE html&gt;
   &lt;html&gt;
   &lt;head&gt;
     &lt;link rel="alternate" type="application/json"
           href="https://example.com/article/llm.json"&gt;
   &lt;/head&gt;
   ...

3. Client -&gt; Server: GET /article/llm.json HTTP/1.1

4. Server -&gt; Client: HTTP/1.1 200 OK
   ETag: "sha256-abc123..."
   Link: &lt;https://example.com/article/&gt;; rel="canonical"

   [M-URL JSON...]

5. Client verifies:
   - Canonical link points back to /article/ -&gt; Consistent OK
</artwork>
          <t><strong>Result:</strong> M-URL discovered and validated without sitemap.</t>
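          <t>The verification in step 5 can be approximated with the Python sketch below. It uses a deliberately simplified Link header parser that assumes no commas appear inside URLs or quoted parameters and that <tt>rel</tt> carries a single value. The function names <tt>canonical_from_link</tt> and <tt>is_bidirectional</tt> are hypothetical.</t>
          <sourcecode type="python"><![CDATA[
```python
import re
from typing import Optional
from urllib.parse import urljoin


def canonical_from_link(link_header: str, base_url: str) -> Optional[str]:
    """Return the absolute target of the first rel="canonical" link, if any."""
    for part in link_header.split(","):
        match = re.match(r'\s*<([^>]*)>(.*)', part)
        if match and re.search(r';\s*rel="?canonical"?\s*(;|$)',
                               match.group(2), re.IGNORECASE):
            # Resolve relative targets against the URL that was fetched.
            return urljoin(base_url, match.group(1))
    return None


def is_bidirectional(c_url: str, m_url: str, link_header: str) -> bool:
    """True when the M-URL response points back at the originating C-URL."""
    return canonical_from_link(link_header, m_url) == c_url
```
]]></sourcecode>
          <t>A production client would more likely rely on a complete Link header parser; the sketch only shows the comparison being made.</t>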
        </section>
      </section>
      <section anchor="appendix-c-implementation-notes-informative">
        <name>Appendix C. Implementation Notes (Informative)</name>
        <t>This appendix provides guidance for implementers.</t>
        <section anchor="c1-reference-implementations">
          <name>C.1. Reference Implementations</name>
          <t><strong>Note:</strong> The following implementations are provided as informative examples. Repository URLs, package names, and deployment details may change over time and are not normative.</t>
          <t>The following implementations demonstrate TCT in production environments:</t>
          <t><strong>WordPress Plugin (PHP):</strong>
- Repository: https://github.com/antunjurkovic-collab/tct-wp-plugin
- Version: 1.0.0
- Deployment: 3 production sites (970 URLs total)
- Features:
  - Automatic M-URL generation for posts/pages
  - M-Sitemap generation and caching
  - Strong ETag computation using canonical JSON
  - Normalization algorithm implementation
- Dependencies: WordPress 6.0+, PHP 7.4+
- JSON serialization: <tt>json_encode()</tt> with <tt>JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE</tt></t>
          <t><strong>Python Validator:</strong>
- PyPI package: <tt>collab-tunnel</tt> (https://pypi.org/project/collab-tunnel/1.0.2/)
- Version: 1.0.2
- Purpose: Protocol compliance testing
- Features:
  - Validates M-URL and M-Sitemap structure
  - Tests ETag parity
  - Verifies canonical link bidirectionality
  - Runs normalization test vectors
- Usage:</t>
          <sourcecode type="python"><![CDATA[
from collab_tunnel import validate_origin
results = validate_origin("https://example.com")
]]></sourcecode>
          <t><strong>Cloudflare Worker (Edge Proxy):</strong>
- Repository: https://github.com/antunjurkovic-collab/tct-worker
- Purpose: Demonstrates CDN integration
- Features:
  - Proxies M-URLs with proper ETag handling
  - Implements 304 Not Modified caching
  - Handles conditional requests correctly
- Deployment: Cloudflare Workers platform</t>
        </section>
        <section anchor="c2-deterministic-json-libraries">
          <name>C.2. Deterministic JSON Libraries</name>
          <t>For RFC 8785 (JSON Canonicalization Scheme) compliance:</t>
          <t><strong>Python:</strong></t>
          <sourcecode type="python"><![CDATA[
import canonicaljson
canonical_bytes = canonicaljson.encode_canonical_json(obj)
]]></sourcecode>
          <ul spacing="normal">
            <li>
              <t>Library: <tt>pip install canonicaljson</tt></t>
            </li>
            <li>
              <t>Docs: https://github.com/matrix-org/python-canonicaljson</t>
            </li>
          </ul>
          <t><strong>JavaScript:</strong></t>
          <sourcecode type="javascript"><![CDATA[
const canonicalize = require('canonicalize');
const canonical_string = canonicalize(obj);
]]></sourcecode>
          <ul spacing="normal">
            <li>
              <t>Library: <tt>npm install canonicalize</tt></t>
            </li>
            <li>
              <t>Docs: https://github.com/cyberphone/json-canonicalization</t>
            </li>
          </ul>
          <t><strong>Go:</strong></t>
          <sourcecode type="go"><![CDATA[
import "github.com/cyberphone/json-canonicalization/go/json"
canonical, _ := json.CanonicalizeJSON(input)
]]></sourcecode>
          <t><strong>Alternative (Stable Ordering) for Non-Conformant Experiments:</strong>
If you are experimenting without claiming full TCT conformance, you can approximate canonicalization by ensuring:
- Object keys sorted lexicographically (at ALL nesting levels)
- No insignificant whitespace
- Consistent number formatting
- UTF-8 encoding without BOM</t>
          <t><strong>Important:</strong> Simple key-sorting helpers (e.g., <tt>Object.keys(obj).sort()</tt> in JavaScript) are <strong>insufficient</strong> for nested objects and do not guarantee conformance. For production implementations claiming TCT conformance, you MUST use RFC 8785 libraries or implement the full RFC 8785 specification. The code examples in C.3 are <strong>illustrative only</strong> and may not handle all edge cases correctly.</t>
        </section>
        <section anchor="c3-sha-256-computation">
          <name>C.3. SHA-256 Computation</name>
          <t>The examples in this section are simplified and illustrative only. By themselves they do not satisfy the deterministic JSON requirements of Section 6.1; production implementations need a complete canonicalization algorithm such as RFC 8785 (see C.2) applied before hashing.</t>
          <t><strong>Python:</strong></t>
          <sourcecode type="python"><![CDATA[
import hashlib
import json

# Canonical JSON
# NOTE: Illustrative only; sort_keys alone does not implement RFC 8785.
canonical_json = json.dumps(obj, ensure_ascii=False, sort_keys=True, separators=(',', ':'))
canonical_bytes = canonical_json.encode('utf-8')

# Hash
sha256_hash = hashlib.sha256(canonical_bytes).hexdigest()
etag_value = f'"sha256-{sha256_hash}"'
]]></sourcecode>
          <t><strong>JavaScript:</strong></t>
          <sourcecode type="javascript"><![CDATA[
const crypto = require('crypto');

// Canonical JSON
// NOTE: This is illustrative only; for full conformance, use RFC 8785 or a
// complete canonicalization implementation. The replacer below sorts keys
// at every nesting level (an array replacer would drop nested keys).
const sortKeys = (k, v) =>
  v && typeof v === 'object' && !Array.isArray(v)
    ? Object.fromEntries(Object.keys(v).sort().map((n) => [n, v[n]]))
    : v;
const canonical_json = JSON.stringify(obj, sortKeys);
const canonical_bytes = Buffer.from(canonical_json, 'utf-8');

// Hash
const sha256_hash = crypto.createHash('sha256').update(canonical_bytes).digest('hex');
const etag_value = `"sha256-${sha256_hash}"`;
]]></sourcecode>
          <t><strong>PHP:</strong></t>
          <sourcecode type="php"><![CDATA[
// Canonical JSON
// NOTE: Illustrative only; json_encode keeps insertion order, so keys
// must already be sorted at every level for a stable result.
$canonical_json = json_encode($obj, JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);

// Hash
$sha256_hash = hash('sha256', $canonical_json);
$etag_value = '"sha256-' . $sha256_hash . '"';
]]></sourcecode>
        </section>
      </section>
      <section anchor="c4-unicode-normalization">
        <name>C.4. Unicode Normalization</name>
        <t><strong>Python:</strong></t>
        <sourcecode type="python"><![CDATA[
import unicodedata

# NFKC normalization
normalized = unicodedata.normalize('NFKC', text)

# Case folding
casefolded = normalized.casefold()
]]></sourcecode>
        <t><strong>JavaScript:</strong></t>
        <sourcecode type="javascript"><![CDATA[
// NFKC normalization
const normalized = text.normalize('NFKC');

// Case folding (approximation): toLowerCase() is locale-independent,
// but it is not full Unicode case folding (for example, U+00DF is
// left unchanged, so Test 9 in Appendix A would not pass).
const casefolded = normalized.toLowerCase();
]]></sourcecode>
        <t><strong>PHP:</strong></t>
        <sourcecode type="php"><![CDATA[
// NFKC normalization (requires intl extension)
$normalized = Normalizer::normalize($text, Normalizer::NFKC);

// Case folding (mbstring, PHP 7.3+)
$casefolded = mb_convert_case($normalized, MB_CASE_FOLD, 'UTF-8');
]]></sourcecode>
      </section>
      <section anchor="c5-http-response-headers">
        <name>C.5. HTTP Response Headers</name>
        <t><strong>Typical headers for M-URLs:</strong></t>
        <t>A conformant M-URL response will include at least: <tt>Content-Type: application/json</tt>, <tt>ETag</tt> (strong, quoted), and <tt>Link: rel="canonical"</tt> (see Sections 5-6). Example:</t>
        <artwork>
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
ETag: "sha256-abc123..."
Link: &lt;https://example.com/article/&gt;; rel="canonical"
Cache-Control: public, max-age=3600, must-revalidate, no-transform
</artwork>
        <t><strong>Typical headers for M-Sitemap:</strong></t>
        <artwork>
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Cache-Control: max-age=0, must-revalidate
</artwork>
        <t>The values above are an example; a short max-age consistent with Section 7.2 MAY also be used.</t>
        <t><strong>Conditional request handling:</strong></t>
        <artwork>
# Request with If-None-Match
GET /article/llm.json HTTP/1.1
If-None-Match: "sha256-abc123..."

# Response if unchanged
HTTP/1.1 304 Not Modified
ETag: "sha256-abc123..."
Cache-Control: public, max-age=3600, must-revalidate

# Response if changed
HTTP/1.1 200 OK
ETag: "sha256-xyz789..."
Content-Type: application/json; charset=utf-8

[full body...]
</artwork>
      </section>
      <section anchor="c6-production-deployment-data">
        <name>C.6. Production Deployment Data</name>
        <t><strong>Note:</strong> The following deployment data represents a snapshot as of November 2025. Specific sites, URLs, and metrics are provided as informative examples and may change.</t>
        <t>As of November 2025, TCT is deployed on:</t>
        <t><strong>bestdemotivationalposters.com:</strong>
- 500 URLs
- 100K+ pageviews/month
- WordPress 6.4 + TCT plugin v1.0.0
- Average HTML size: 103 KB (gzipped)
- Average M-URL size: 17.7 KB (gzipped)
- Bandwidth reduction: 83%</t>
        <t><strong>wellbeing-support.com:</strong>
- 400 URLs
- Health/wellness content
- Average zero-fetch rate: 85%</t>
        <t><strong>omacedonii.com:</strong>
- 70 URLs
- Multilingual (Polish)
- Demonstrates Unicode normalization in production</t>
        <t><strong>Aggregate Results:</strong>
- Total URLs: 970
- Bandwidth reduction: 83% median
- Zero-fetch rate: 70-90% (depends on update frequency)
- Combined bandwidth elimination: ~98% in steady-state</t>
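        <t>The relationship between these figures can be illustrated with the arithmetic below. The combination formula is an assumption made for illustration (skipped fetches save the full transfer, and the remaining fetches save the per-fetch reduction); the source data does not state how the combined figure was derived, and the function names are hypothetical.</t>
        <sourcecode type="python"><![CDATA[
```python
def reduction(html_kb: float, m_url_kb: float) -> float:
    """Fraction of bytes saved when an M-URL replaces the HTML page."""
    return 1 - m_url_kb / html_kb


def combined_elimination(zero_fetch_rate: float,
                         per_fetch_reduction: float) -> float:
    """Assumed model: skipped fetches save 100 percent of the transfer,
    the remaining fetches save per_fetch_reduction."""
    return 1 - (1 - zero_fetch_rate) * (1 - per_fetch_reduction)


r = reduction(103, 17.7)              # about 0.828, reported as 83%
low = combined_elimination(0.70, r)   # about 0.948
high = combined_elimination(0.90, r)  # about 0.983
```
]]></sourcecode>
        <t>Under this assumed model, the figure of roughly 98% corresponds to the upper end of the reported zero-fetch range.</t>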
        <section anchor="c7-common-implementation-pitfalls">
          <name>C.7. Common Implementation Pitfalls</name>
          <t><strong>Pitfall 1: Non-deterministic JSON</strong>
- Problem: Random key ordering, floating-point precision issues, timestamps
- Solution: Use RFC 8785 or strict ordering; exclude per-request randomness</t>
          <t><strong>Pitfall 2: Weak ETags</strong>
- Problem: Using <tt>W/"sha256-..."</tt> instead of <tt>"sha256-..."</tt>
- Solution: Always use strong ETags for M-URLs (no W/ prefix)</t>
          <t><strong>Pitfall 3: Sitemap Staleness</strong>
- Problem: Sitemap regenerated asynchronously; race conditions cause mismatches
- Solution: Accept transient inconsistency; clients fall back to conditional GET</t>
          <t><strong>Pitfall 4: Incorrect Canonical Links</strong>
- Problem: rel="canonical" points to wrong URL or is missing
- Solution: Validate bidirectional linkage (C-URL &lt;-&gt; M-URL)</t>
          <t><strong>Pitfall 5: HTML in Content Field</strong>
- Problem: Including raw HTML tags in <tt>content</tt> field
- Solution: Extract plain text or use deterministic markup (e.g., Markdown)</t>
          <t><strong>Pitfall 6: Reusing Strong ETags Across Content-Coded Variants</strong>
- Problem: CDN serves gzipped and identity-coded variants with the same
  strong ETag even though their selected representation bytes differ.
- Solution: Either serve M-URLs without <tt>Content-Encoding</tt>, or use
  distinct strong ETags and appropriate <tt>Vary: Accept-Encoding</tt>
  handling for each coded variant.</t>
          <t><strong>Pitfall 7: Locale-Dependent Case Folding</strong>
- Problem: Locale-sensitive case conversion gives different results under Turkish rules (U+0130 -&gt; i, I -&gt; U+0131) than under other locales
- Solution: Use Unicode default case folding (locale-independent)</t>
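          <t>As an illustration of Pitfall 2, a validator could reject weak or malformed ETags with a check like the Python sketch below. It assumes the ETag form <tt>"sha256-"</tt> followed by 64 lowercase hexadecimal characters, as produced by the examples in C.3; the function name <tt>is_strong_sha256_etag</tt> is hypothetical.</t>
          <sourcecode type="python"><![CDATA[
```python
import re

# Quoted, no W/ prefix, "sha256-" followed by 64 lowercase hex characters.
_STRONG_SHA256_ETAG = re.compile(r'"sha256-[0-9a-f]{64}"')


def is_strong_sha256_etag(value: str) -> bool:
    """True for a quoted, non-weak ETag of the assumed sha256 form."""
    return _STRONG_SHA256_ETAG.fullmatch(value.strip()) is not None
```
]]></sourcecode>
          <t>A value such as <tt>W/"sha256-..."</tt> fails the check because of the weak-validator prefix.</t>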
        </section>
        <section anchor="c8-testing-and-validation">
          <name>C.8. Testing and Validation</name>
          <t><strong>Automated Testing:</strong>
1. Protocol compliance: Use <tt>collab-tunnel-validator</tt> PyPI package
2. Normalization: Run Appendix A test vectors
3. ETag parity: Compare sitemap etag vs. actual M-URL ETag
4. Conditional requests: Test If-None-Match with matching/non-matching ETags</t>
          <t><strong>Manual Testing:</strong>
1. Check discovery: <tt>curl -I https://example.com/</tt> (look for Link header)
2. Fetch sitemap: <tt>curl https://example.com/llm-pages.json</tt>
3. Validate M-URL: <tt>curl -H "Accept: application/json" https://example.com/article/llm.json</tt>
4. Test 304: <tt>curl -H 'If-None-Match: "sha256-..."' https://example.com/article/llm.json</tt></t>
          <t><strong>Integration Testing:</strong>
- Deploy to staging environment
- Monitor 304 response rates (should be &gt;70% after initial crawl)
- Check CDN cache hit rates
- Verify canonical link bidirectionality</t>
        </section>
        <section anchor="c9-performance-considerations">
          <name>C.9. Performance Considerations</name>
          <t><strong>Server-Side:</strong>
- Cache normalized content and ETags (don't recompute on every request)
- Generate M-Sitemap asynchronously during content updates
- Use HTTP/2 for parallel M-URL fetches
- Implement early ETag validation (before full response generation)</t>
          <t><strong>Client-Side:</strong>
- Use zero-fetch optimization when possible (sitemap comparison)
- Batch M-Sitemap fetches (don't fetch per-URL)
- Implement exponential backoff for 429/503 responses
- Cache M-URLs and ETags persistently</t>
          <t><strong>CDN/Proxy:</strong>
- Configure strong ETag preservation
- If compression is enabled, use distinct strong ETags and
  <tt>Vary: Accept-Encoding</tt> for coded variants
- Cache 304 responses appropriately
- Don't strip ETag headers</t>
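          <t>The client-side recommendation to back off exponentially on 429 and 503 responses could be implemented along the lines of the Python sketch below. The retry count, base delay, cap, and the use of full jitter are illustrative choices rather than values taken from this document, and the names are hypothetical. When a response carries a <tt>Retry-After</tt> header, honoring it takes precedence over the computed delay.</t>
          <sourcecode type="python"><![CDATA[
```python
import random
from typing import Iterator

RETRYABLE_STATUSES = {429, 503}


def backoff_delays(max_retries: int = 5, base: float = 1.0,
                   cap: float = 60.0, jitter: bool = True) -> Iterator[float]:
    """Yield one sleep duration in seconds per retry attempt.

    The nominal delay doubles each attempt (base * 2**attempt) and is
    capped; with jitter enabled, a random value up to that delay is used.
    """
    for attempt in range(max_retries):
        delay = min(cap, base * 2 ** attempt)
        yield random.uniform(0, delay) if jitter else delay
```
]]></sourcecode>
          <t>A caller would sleep for each yielded duration before retrying, and stop as soon as a response status falls outside <tt>RETRYABLE_STATUSES</tt>.</t>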
        </section>
        <section anchor="c10-security-best-practices">
          <name>C.10. Security Best Practices</name>
          <t><strong>HTTPS:</strong>
- It is RECOMMENDED to serve M-URLs and M-Sitemap over HTTPS (see Section 10)
- Validate TLS certificates properly
- Use HSTS where appropriate</t>
          <t><strong>Access Control:</strong>
- If C-URL requires authentication, protect M-URL similarly
- Don't leak sensitive URLs in public sitemaps
- Implement rate limiting to prevent abuse</t>
          <t><strong>Content Validation:</strong>
- Treat M-URL JSON as untrusted input
- Validate/sanitize before use
- Don't execute code from content field</t>
          <t><strong>ETag Integrity:</strong>
- Use Content-Digest (RFC 9530) for additional integrity
- Consider HTTP Message Signatures (RFC 9421) for authentication
- Monitor for ETag collision attempts (though SHA-256 makes this infeasible)</t>
        </section>
      </section>
    </section>
    <section anchor="acknowledgments">
      <name>Acknowledgments</name>
      <t>Thanks to reviewers and implementers who provided feedback on earlier revisions, including those who highlighted:</t>
      <ul spacing="normal">
        <li>
          <t>the need for a single strong ETag method;</t>
        </li>
        <li>
          <t>the importance of deterministic JSON;</t>
        </li>
        <li>
          <t>the separation of protocol from policy; and</t>
        </li>
        <li>
          <t>simplification of discovery and parity semantics.</t>
        </li>
      </ul>
    </section>
  </back>
  

</rfc>