| Internet-Draft | SRv6 Path CC | February 2026 |
| Liu, et al. | Expires 1 September 2026 | [Page] |
This document describes a congestion control solution based on SRv6. It defines mechanisms for congestion notification and flow control within an SRv6-based network, optimizing congestion handling through hierarchical congestion control messages along SRv6 paths.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 1 September 2026.¶
Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
The SRv6 network needs a reliable and efficient mechanism for handling congestion across different segments. Current congestion control techniques lack the ability to handle congestion in a fine-grained, per-path manner. This draft proposes a solution that uses SRv6 path segments and slicing to notify upstream nodes and trigger actions that reduce congestion. The key idea is to notify upstream nodes about congestion and enable flow control based on SRv6 segments (SID lists). This process is integrated with the SRv6 network's slicing capabilities to provide fine-grained control over network traffic, ensuring lossless transmission of data across the SRv6 network.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
Priority Flow Control (PFC) provides hop-by-hop, priority-based traffic control. Compared to the traditional Ethernet Pause mechanism, PFC enables more precise flow management by creating multiple virtual channels on a link, each of which can be paused or resumed independently, ensuring that traffic of different priorities does not interfere with one another.¶
With the growth of intelligent computing services, scenarios such as disaggregated computing and real-time inference require the lossless transmission of large volumes of bursty traffic. In interconnected wide-area networks (WANs), when network congestion occurs, the congestion status must be quickly propagated upstream to both head-end devices and edge devices, enabling hop-by-hop reduction of sending rates. These intelligent computing WANs typically use SRv6 Policies for transport. However, once traffic enters a policy, traditional PFC mechanisms face the following three major challenges:¶
Imprecise Congestion Notification: PFC propagates congestion information via Ethernet multicast frames. In WANs with complex topologies, multicast-based congestion signaling cannot accurately reach upstream SRv6 nodes, potentially leading to incorrect flow suppression and impacting unrelated services.¶
Long Path Latency: WAN paths are long and have significant latency. If congestion signals must be sent all the way back to the data center or edge devices, it results in prolonged traffic degradation. Therefore, upstream control on the SRv6 path is needed to respond promptly.¶
Control Overhead at the Head Node: A single head node in the WAN may manage numerous SRv6 paths. If all congestion messages are sent back to the head node, it could become a processing bottleneck. Performing distributed traffic control at intermediate nodes along the SRv6 path can alleviate the burden on the head node.¶
   +----------+                                   +----------+
   |   Data   |                                   |   Data   |
   | center A |                                   | center B |
   +----------+                                   +----------+
        |          Congestion Occurs                   ^
        |                  |                           |
        v                  v                           |
     +----+ --> +----+ --> +----+ --> +----+ --> +----+
     | R1 |     | R2 |     | R3 |     | R4 |     | R5 |
     +----+     +----+     +----+     +----+     +----+
                              |
        <---------------------|
         Congestion Notification

         Figure 1: Congestion Notification along an SRv6 Path
Consider two data centers, A and B, connected via an SRv6 path defined as R1 -> R2 -> R3 -> R4 -> R5, as shown in Figure 1. The process follows these steps:¶
The head node R1 encapsulates the SID list (SRv6 path) containing R2 -> R3 -> R4 -> R5. It may optionally carry an SRv6 path segment (PSID) and starts forwarding the data. The source address is R1, and the destination address is the SID of R2.¶
Transit nodes (R2, R3, R4) forward data according to the SID list, with each node checking its local SID table for forwarding and slice-related information.¶
When a node, such as R4, faces congestion (such as queue overload), it sends a congestion notification message to the previous node in the SID list (R3). The message includes congestion-related information: for example, the priority queue where congestion occurs, congestion control parameters (such as pause-time and/or target bandwidth), and the slice ID of the suppressed tenant.¶
R3 receives the notification and adjusts the forwarding rate based on local capacity. If R3 cannot handle the congestion, the notification is forwarded further upstream to R2 and so on.¶
If no node can manage the congestion, the head node R1 adjusts the path load balancing or selects an alternate path to mitigate the congestion.¶
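The hop-by-hop behavior in the steps above can be sketched as follows. This is a minimal illustration, not part of the draft: the names `propagate_congestion`, `can_absorb`, `pause_queue`, and `reroute` are assumptions standing in for the node actions described in the text.

```python
def propagate_congestion(sid_list, congested_index):
    """Walk upstream from the congested node (e.g., R4). Each transit node
    either absorbs the congestion locally (by pausing/shaping the queue)
    or relays the notification one hop further upstream. Returns the index
    of the node that handled the congestion (0 = head node)."""
    # Try each transit node upstream of the congestion point, nearest first.
    for i in range(congested_index - 1, 0, -1):
        node = sid_list[i]
        if node.can_absorb():       # enough local buffer: pause the queue here
            node.pause_queue()
            return i
        # Otherwise the notification is forwarded to the next upstream node.
    # No transit node could absorb the congestion: the head node adjusts
    # path load balancing or selects an alternate path.
    sid_list[0].reroute()
    return 0
```

In the example of Figure 1, if R3 has sufficient buffer, the walk stops one hop upstream of R4; otherwise the notification propagates toward R1.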
The congestion notification message can be encapsulated in either ICMPv6 [RFC4443] or UDP [RFC768] messages. Regardless of the encapsulation format, the message contains the following fields:¶
Checksum: Used for error-checking the packet.¶
Flags: Contains special flags; no flags are currently defined.¶
Priority: Queue priority identifier; each priority queue occupies 1 bit (from high-order to low-order bits, representing high to low priority respectively). If a bit is set to 1, the corresponding priority queue is suppressed due to congestion control; if a bit is set to 0, suppression is released for that queue.¶
Argument[]: Congestion control parameter information; each priority occupies 2 bytes, for a total of 16 bytes. The interpretation of the arguments can be combined with the Flags field, supporting flexible definition of congestion control parameters. By default (when all flag bits are 0), an argument carries the pause-time, measured in microseconds. Once the upstream node's suppression has lasted for the full pause-time, traffic must be restored immediately.¶
Target Bandwidth: Indicates the target bandwidth information for expectation suppression. The default value is 0.¶
Slice ID: The identifier for the slice experiencing congestion.¶
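As an illustration of the Priority bitmap and per-priority Argument encoding described above, the helpers below assume queue 0 is the highest-priority queue and therefore maps to the high-order bit (bit 7) of the 8-bit Priority field, per the high-to-low ordering in the field description. The function names are illustrative, not part of the draft.

```python
def set_suppressed(priority_bitmap, queue):
    """Mark a priority queue as suppressed (set its bit to 1).
    Queue 0 (highest priority) maps to bit 7, the high-order bit."""
    return priority_bitmap | (1 << (7 - queue))

def clear_suppressed(priority_bitmap, queue):
    """Release suppression for a priority queue (set its bit to 0)."""
    return priority_bitmap & ~(1 << (7 - queue))

def suppressed_queues(priority_bitmap):
    """List the queue numbers whose bits are set to 1."""
    return [q for q in range(8) if priority_bitmap & (1 << (7 - q))]

def set_pause_time(arguments, queue, microseconds):
    """Write the default Argument[] meaning (pause-time in microseconds)
    for one priority queue; each argument is a 16-bit value."""
    arguments[queue] = microseconds & 0xFFFF
```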
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |     Code      |           Checksum            |
+---------------+---------------+-------------------------------+
|     Flags     |   Priority    |           Reserved            |
+---------------+---------------+-------------------------------+
|          Argument[0]          |          Argument[1]          |
+-------------------------------+-------------------------------+
|          Argument[2]          |          Argument[3]          |
+-------------------------------+-------------------------------+
|          Argument[4]          |          Argument[5]          |
+-------------------------------+-------------------------------+
|          Argument[6]          |          Argument[7]          |
+-------------------------------+-------------------------------+
|                       Target Bandwidth                        |
+---------------------------------------------------------------+
|                           Slice ID                            |
+---------------------------------------------------------------+
Where:¶
Type and Code: These fields indicate the specific congestion notification type and its sub-type, providing details about the kind of congestion event being reported.¶
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        UDP source port        |     UDP destination port      |
+-------------------------------+-------------------------------+
|          UDP length           |         UDP checksum          |
+---------------+---------------+-------------------------------+
|     Flags     |   Priority    |           Reserved            |
+---------------+---------------+-------------------------------+
|          Argument[0]          |          Argument[1]          |
+-------------------------------+-------------------------------+
|          Argument[2]          |          Argument[3]          |
+-------------------------------+-------------------------------+
|          Argument[4]          |          Argument[5]          |
+-------------------------------+-------------------------------+
|          Argument[6]          |          Argument[7]          |
+-------------------------------+-------------------------------+
|                       Target Bandwidth                        |
+---------------------------------------------------------------+
|                           Slice ID                            |
+---------------------------------------------------------------+
Where:¶
UDP Destination Port: A newly allocated port number that identifies the congestion notification packet.¶
The SID configuration of each node in the figure is as follows: the End.X SIDs of nodes R1 through R5 are A::1:1, A::2:1, A::3:1, A::4:1, and A::5:1, and the slice ID corresponding to each SID is 1. The VPN SID of node R5 is A::5:F.¶
The running process of each node is as follows:¶
The data packet sent by R1 is encapsulated with the SRv6 Policy. The SID list is {A::2:1, A::3:1, A::4:1, A::5:F}. The source address is A::1:1, and the destination address is A::2:1.¶
Packets are forwarded based on the SID list of the SRv6 path. The destination address of the packet forwarded by R4 is replaced with A::5:F. The forwarding plane selects the corresponding slice based on the slice ID carried in the packet, and selects the priority queue to be used based on the service class of the packet. When the 6th priority queue of slice 1 on the outbound interface of R4 is congested, i.e., the current buffer usage exceeds the preset threshold (50%), a congestion notification message needs to be sent to the previous hop of the SRv6 path.¶
R4 constructs a congestion control packet in ICMPv6/UDP format and sends it to the previous-hop node. The packet carries the slice ID that identifies the tenant, the priority queue where congestion occurs, and the parameters that the tenant is expected to use for traffic control, such as pause-time. The source address of the IP packet can be the local address of R4, and the destination address can be the address of the previous-hop node R3, so that the packet reaches R3 through normal routing. In this example, the slice ID is 1, the priority queue is 6, and the Argument field carries the pause-time, which is 5 ms (5000 microseconds). This indicates that the upstream node R3 is expected to stop sending packets for 5 ms.¶
After receiving the congestion notification packet, R3 checks whether the destination address is local. R3 parses the ICMPv6/UDP congestion control packet to obtain the tenant slice ID and the priority queue that identifies the congestion, and then performs traffic control on the priority queue of the tenant based on the traffic control parameter information carried in the packet. In this example, R3 performs traffic control on priority queue 6 of slice 1 for a duration of 5 ms.¶
If the local buffer of R3 is sufficient, the congestion on R4 is relieved at the previous hop R3. This method effectively alleviates small network bursts through congestion control packets, thereby avoiding packet loss due to congestion. When the local buffer of the priority queue of the tenant on R3 is insufficient (the usage exceeds the threshold), R3 constructs an ICMPv6/UDP congestion control packet (for details about how to construct congestion control parameters, see R4) to notify the previous-hop node R2. This process is repeated until the congestion status is transmitted to the ingress node of the tunnel through each hop. The ingress node then resolves the congestion status through multipath load balancing or selects an alternate path.¶
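The transit-node behavior just described can be summarized as: pause the indicated tenant queue, and relay the notification upstream only when the node's own buffer for that queue is over threshold. A minimal sketch follows; `BUFFER_THRESHOLD`, the action tuples, and the function name are illustrative assumptions, with the 50% threshold taken from the example above.

```python
BUFFER_THRESHOLD = 0.5  # 50% buffer usage, as in the example

def handle_notification(slice_id, queue, pause_time_us, local_buffer_usage):
    """Return the actions a transit node (e.g., R3) takes on receiving a
    congestion notification: always pause the tenant's priority queue for
    pause_time_us; additionally relay the notification to the previous hop
    if this node's own buffer for that queue exceeds the threshold."""
    actions = [("pause", slice_id, queue, pause_time_us)]
    if local_buffer_usage > BUFFER_THRESHOLD:
        # Local buffer is also congested: propagate one hop further upstream.
        actions.append(("relay_upstream", slice_id, queue, pause_time_us))
    return actions
```

With sufficient local buffer the congestion is absorbed at R3; otherwise the relay step repeats hop by hop until the ingress node is reached.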
This document does not introduce any new security considerations.¶
This document requests IANA to allocate a new ICMPv6 message type and a new UDP port number.¶