| Internet-Draft | TCP Provenance Identifier Option | February 2026 |
| Liang, et al. | Expires 8 August 2026 | [Page] |
This document describes a TCP option that carries a Provenance Identifier (ProvID) to enable correlation of TCP connections when transport-layer identifiers change along the path.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 8 August 2026.¶
Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
In administrative domains (e.g., cloud platforms, enterprise networks, and data centers), TCP traffic often traverses devices such as NATs, load balancers, and service proxies that rewrite transport-layer identifiers or terminate and re-originate TCP connections. As a result, a single end-to-end exchange between two workloads may correspond to a sequence of distinct TCP connections within the domain. This document refers to that end-to-end exchange as a "logical communication".¶
These transformations break provenance continuity. Observations of TCP traffic at different points in the domain cannot reliably be associated with the same logical communication, and operators cannot determine which workload instance originally initiated a connection once rewriting has occurred.¶
This document defines an experimental TCP option that carries a small Provenance Identifier (ProvID). A ProvID is a compact value generated for the duration of a logical communication using workload-scoped attributes, such as host IP with process identifier. As illustrated in Figure 1, the ProvID enables provenance correlation across rewriting boundaries within the domain.¶
The ProvID option is intended for use within administrative domains and is not designed for use on the open Internet.¶
+------------------+ +--------------+ +-------------+
| Originating Host | | Middlebox | | Destination |
| | | (terminates) | | |
| [Workload A] | TCP #1 | | TCP #2 | |
| ProvID = P1 | (P1) | | (P1) | |
| |--------->| Relay P1 |--------->| |
| [Workload B] | | | | |
| ProvID = P2 | | | | |
| | | | | |
+------------------+ +--------------+ +-------------+
| | |
Observation Point A Observation Point B Observation Point C
logs: (TCP #1, P1) logs: (TCP #1, P1) logs: (TCP #2, P1)
(TCP #2, P1)
Logical Communication (ProvID = P1) for Workload A:
{ TCP #1, TCP #2 }
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
In many administrative domains, operators need to reconstruct the path of a logical communication for troubleshooting, incident investigation, or auditing. Flow logs and measurements collected at different observation points commonly reference different TCP connections that correspond to the same logical communication, and the originating endpoint cannot be reliably identified from transport identifiers alone.¶
In this use case, the originating endpoint includes the ProvID in the TCP header options. Observation points record the ProvID alongside locally observed connection identifiers. When a logical communication is realized as a series of distinct TCP connections within the domain, the ProvID provides a stable correlation handle for aggregating these disparate records. This enables end-to-end reconstruction of communication paths across middleboxes, supports cross-layer observability (for example, linking network telemetry with application or process context), and maintains provenance continuity.¶
An operator may detect anomalous TCP behavior at an observation point and need to remediate the issue by acting on the specific originator responsible. At that point, the original source may be obscured by address translation or connection rewriting, and even host-level attribution may be insufficient because multiple independent processes can share a host.¶
In this use case, the originating endpoint generates a ProvID using the pair (host IP address, process identifier) of the process that created the socket. An operator that observes abnormal traffic associated with a ProvID can map the ProvID to the initiating (host IP, process identifier) and take immediate action on that process (for example, suspend or restart the process, isolate the workload instance, or apply a narrowly scoped network policy).¶
The Provenance Identifier (ProvID) option uses a fixed-length experimental TCP option format. The option is identified by the experimental option kind and is distinguished by a fixed option length.¶
0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Kind | Length | ExID | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ProvID | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
To maintain provenance continuity within an administrative domain, middleboxes (as defined in [RFC3234]) MUST handle the ProvID option according to their function.¶
A non-terminating middlebox is a device that resides on the communication path but does not terminate the end-to-end TCP connection. These middleboxes MUST forward the ProvID option unmodified in any segment where it appears. If the middlebox modifies transport-layer identifiers (e.g., performing Network Address Translation), it MUST NOT strip or alter the ProvID option.¶
A terminating middlebox is a device that acts as the endpoint for a TCP connection (e.g., a service proxy). To maintain provenance continuity, these middleboxes MUST relay the ProvID from the incoming connection to the outgoing connection. Specifically, whenever a ProvID is observed in an incoming TCP segment, the middlebox MUST include the identical ProvID value in the corresponding segment of the outgoing connection.¶
To prevent the leakage of internal network metadata, middleboxes at the boundary of the administrative domain MUST strip the ProvID option from any TCP segments exiting the domain. Similarly, any ProvID options present in traffic entering the domain from the open Internet MUST be stripped.¶
This document's IANA considerations are to be determined and will be provided in a subsequent revision of this draft. [TODO]¶
This document's security considerations are to be determined and will be provided in a subsequent revision of this draft. [TODO]¶
TODO¶