<?xml version="1.0" encoding="utf-8"?>
<?xml-model href="rfc7991bis.rnc"?>  <!-- Required for schema validation and schema-aware editing -->
<!-- <?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?> -->
<!-- This third-party XSLT can be enabled for direct transformations in XML processors, including most browsers -->


<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>
<!-- If further character entities are required then they should be added to the DOCTYPE above.
     Use of an external entity file is not recommended. -->

<rfc
  xmlns:xi="http://www.w3.org/2001/XInclude"
  category="info"
  docName="draft-hu-6man-ipv6-flowlabel-load-balancing-rdma-00"
  ipr="trust200902"
  obsoletes=""
  updates=""
  submissionType="IETF"
  xml:lang="en"
  version="3">

  <front>
    <title abbrev="RoCEv2 Flow Label Load Balancing">A RoCEv2 Flow-Level Load Balancing Method Based on the IPv6 Flow Label</title>

    <seriesInfo name="Internet-Draft" value="draft-hu-6man-ipv6-flowlabel-load-balancing-rdma-00"/>
   
    <author fullname="Jiayuan Hu" initials="J." role="editor" surname="Hu">
      <organization>China Telecom</organization>
      <address>
        <postal>
          <street>109, West Zhongshan Road, Tianhe District</street>
          <city>Guangzhou</city>
          <region>Guangdong</region>
          <code>510000</code>
          <country>CN</country>
        </postal>
        <email>hujy5@chinatelecom.cn</email>
      </address>
    </author>

    <author fullname="Xia Gong" initials="X." role="editor" surname="Gong">
      <organization>China Telecom</organization>
      <address>
        <postal>
          <street>109, West Zhongshan Road, Tianhe District</street>
          <city>Guangzhou</city>
          <region>Guangdong</region>
          <code>510000</code>
          <country>CN</country>
        </postal>
        <email>gongxia@chinatelecom.cn</email>
      </address>
    </author>

    <date year="2026"/>

    <area>Internet</area>
    <workgroup>IPv6 Maintenance</workgroup>
    <!-- "Internet Engineering Task Force" is fine for individual submissions.  If this element is 
          not present, the default is "Network Working Group", which is used by the RFC Editor as 
          a nod to the history of the RFC Series. -->

    <keyword>RoCEv2</keyword>
    <keyword>IPv6 Flow Label</keyword>
    <keyword>load balancing</keyword>

    <abstract>
      <t>
        This document proposes a method for achieving flow-level load balancing in RoCEv2 (RDMA over Converged Ethernet
        version 2) networks. Traditional per-flow load balancing based on the 5-tuple cannot distinguish between
        different RDMA sessions that share the same 5-tuple, so "elephant flows" can be hashed to the same path,
        leading to network congestion. The method addresses this issue by parsing the QP (Queue Pair) information
        from the IB BTH (Base Transport Header) and IB DETH (Datagram Extended Transport Header) of the RoCEv2
        packet. The QP information is combined with portions of the IPv6 source and destination addresses, which
        serve as an entropy source, and a CRC32 hash over the result yields a 32-bit value whose lower 20 bits are
        written into the Flow Label field of the IPv6 header. Network devices can subsequently use
        "5-tuple + Flow Label" for finer-grained flow-level load balancing, improving transmission efficiency in
        high-performance networks such as those used for AI computing.
      </t>
    </abstract>
 
  </front>


  <middle>
    
    <section>
      <name>Introduction</name>
      <t>
        The rapid advancement of Artificial Intelligence (AI) and High-Performance Computing (HPC) has driven the
        widespread adoption of Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCEv2) in data center and
        intelligent computing networks. RoCEv2 enables high-throughput, low-latency data transfers that are critical
        for distributed training and storage workloads. However, the effective operation of these networks is challenged
        by the inherent characteristics of RDMA traffic, particularly the "elephant flow" problem.
      </t>
      <t>
        Traditional load balancing mechanisms in IP networks typically rely on a 5-tuple (source/destination IP address,
        source/destination port, and protocol number) to identify and distribute traffic flows. In RoCEv2 networks, a
        significant limitation arises: multiple distinct RDMA sessions or flows generated by the same upper-layer
        application may share an identical 5-tuple. This is because the RDMA Queue Pair (QP) information, which uniquely
        identifies a session, is encapsulated within the InfiniBand Base Transport Header (IB BTH) and Datagram Extended
        Transport Header (IB DETH) of the RoCEv2 packet. Consequently, conventional 5-tuple-based hashing treats these
        distinct RDMA flows as a single entity and forwards them to the same network path, leading to severe congestion,
        packet loss, and a significant degradation in overall network throughput and performance.
      </t>
      <t>
        To address this problem, this document introduces a method for flow-level load balancing that uses a
        standard IPv6 header field, the Flow Label. The core idea is to enable network devices, such as routers and
        switches, to extract the QP pair information (source QP and destination QP) from RoCEv2 packets. The
        extracted QP pair, together with entropy bits taken from the IPv6 source and destination addresses, is used
        as input to a CRC32-based hash function to generate a per-flow identifier. This identifier is then written
        into the Flow Label field of the IPv6 header.
      </t>
      <t>
        By combining the traditional 5-tuple with this dynamically generated Flow Label, the proposed method creates a
        fine-grained "5-tuple + Flow Label" flow identification key. This allows network devices to effectively
        distinguish between different RDMA sessions that were previously indistinguishable, thereby achieving true
        flow-level load balancing. This approach minimizes path collisions, reduces congestion, and enhances the
        utilization of multi-path network topologies within RoCEv2 environments.
      </t>
      <t>
        This document outlines the concept, details the packet processing method, and describes the mapping of the QP
        pair to the IPv6 Flow Label field. The following sections describe the construction of the hash input, the
        CRC32 computation, and the population of the Flow Label field, and then discuss security and compatibility
        considerations.
      </t>
    </section>
      
    <section>
      <name>Conventions Used in This Document</name>
      <section>
        <name>Requirements Language</name>
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
          "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT
          RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
          interpreted as described in BCP 14 <xref target="RFC2119"/>
          <xref target="RFC8174"/> when, and only when, they appear in
          all capitals, as shown here.</t>
      </section>

      <section>
        <name>Abbreviations</name>
        <dl newline="false" spacing="compact">
          <dt>AIDC:</dt><dd>Artificial Intelligence Data Center</dd>
          <dt>RoCEv2:</dt><dd>RDMA over Converged Ethernet version 2</dd>
          <dt>RDMA:</dt><dd>Remote Direct Memory Access</dd>
          <dt>QP:</dt><dd>Queue Pair</dd>
          <dt>IB BTH:</dt><dd>InfiniBand Base Transport Header</dd>
          <dt>IB DETH:</dt><dd>InfiniBand Datagram Extended Transport Header</dd>
          <dt>CRC32:</dt><dd>32-bit Cyclic Redundancy Check</dd>
        </dl>
      </section>
    </section>

    <section>
      <name>Flow-Level Load Balancing Based on the IPv6 Flow Label</name>
      <section>
        <name>Construction of the Hash Input</name>
        <t>
          The generated Flow Label needs to distinguish RDMA flows from one another while being random enough to keep
          the collision probability low. The hash input is constructed as follows:
        </t>
        <ol>
          <li>
            <t>Extract the QP pair:</t>
            <ul>
              <li>Src_QP: extracted from the IB DETH, 24 bits long (e.g., 0x123456).</li>
              <li>Dst_QP: extracted from the IB BTH, 24 bits long (e.g., 0xABCDEF).</li>
            </ul>
          </li>
          <li>
            <t>
              Generate the entropy source. To increase hash randomness, an entropy source is introduced. This scheme
              recommends using portions of the IPv6 addresses:
            </t>
            <ul>
              <li>Entropy_Src: the lower 16 bits of the IPv6 source address.</li>
              <li>Entropy_Dst: the lower 16 bits of the IPv6 destination address.</li>
            </ul>
          </li>
          <li>
            <t>
              Concatenate the four values to form the 80-bit Hash_Input. The example used in the next section,
              Hash_Input = 0x123456ABCDEF00010002, corresponds to the order Src_QP, Dst_QP, Entropy_Src, Entropy_Dst,
              with Entropy_Src = 0x0001 and Entropy_Dst = 0x0002.
            </t>
          </li>
        </ol>
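        <t>
          The construction above can be sketched in Python as follows. This is a minimal illustration, not a
          normative definition: the helper names low16 and build_hash_input are hypothetical, the field order
          (Src_QP, Dst_QP, Entropy_Src, Entropy_Dst) and big-endian packing are assumptions taken from the example
          value 0x123456ABCDEF00010002 used in this document, and the addresses are documentation-prefix
          placeholders.
        </t>
        <sourcecode type="python"><![CDATA[
```python
import ipaddress


def low16(addr: str) -> int:
    """Return the lower 16 bits of an IPv6 address given in text form."""
    return int(ipaddress.IPv6Address(addr)) & 0xFFFF


def build_hash_input(src_qp: int, dst_qp: int,
                     src_addr: str, dst_addr: str) -> bytes:
    """Pack Src_QP (24 bits), Dst_QP (24 bits), Entropy_Src (16 bits)
    and Entropy_Dst (16 bits) into 10 bytes, most significant byte first.

    The field order is an assumption inferred from the example value
    0x123456ABCDEF00010002; it is not stated normatively in the text.
    """
    if not (0 <= src_qp < 1 << 24 and 0 <= dst_qp < 1 << 24):
        raise ValueError("QP numbers are 24-bit values")
    return (
        src_qp.to_bytes(3, "big")
        + dst_qp.to_bytes(3, "big")
        + low16(src_addr).to_bytes(2, "big")
        + low16(dst_addr).to_bytes(2, "big")
    )


# Hypothetical example values; the addresses use the documentation prefix.
hash_input = build_hash_input(0x123456, 0xABCDEF,
                              "2001:db8::1", "2001:db8::2")
print(hash_input.hex())  # 123456abcdef00010002
```
]]></sourcecode>
        <t>
          With these inputs the sketch reproduces the 10-byte example value that the CRC32 walkthrough operates on.
        </t>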
      </section>
        <section>
          <name>Hash by CRC32 Algorithm</name>
          <t>
            This document uses CRC32 as the core hash algorithm, with the CRC register initialized to 0xFFFFFFFF.
            CRC32 offers fast computation, a hardware-friendly implementation, and a low collision rate, making it
            well suited to line-rate forwarding in network devices.
          </t>
          <t>
            The hash input is first split into bytes. Using Hash_Input = 0x123456ABCDEF00010002 as an example, the
            byte sequence is 0x12, 0x34, 0x56, 0xAB, 0xCD, 0xEF, 0x00, 0x01, 0x00, 0x02.
          </t>
          <t>
            Each byte is then processed in turn (using the first byte, 0x12, as an example):
          </t>
          <ol>
            <li>XOR: XOR the lower 8 bits of the CRC register with the byte 0x12.</li>
            <li>
              <t>
                8-bit Shift-XOR Loop: Process the result of step 1 bit by bit for 8 iterations. In each iteration:
              </t>
              <ol type="a">
                <li>Check the least significant bit (LSB) of the CRC register.</li>
                <li>Shift the CRC register right by one bit, padding the high bit with 0.</li>
                <li>
                  If the LSB was 1, XOR the result with 0xEDB88320. This constant is the bit-reversed representation
                  of the generator polynomial 0x04C11DB7, which is the form required by this right-shifting
                  (LSB-first) variant of the algorithm.
                </li>
              </ol>
            </li>
          </ol>
          <t>
            Repeat steps 1 and 2 for each subsequent byte.
          </t>
          <t>
            After all bytes have been processed, the value in the CRC register is the final 32-bit hash result (e.g., 0x8E4D7A2F).
          </t>
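          <t>
            The byte-wise procedure above can be sketched in Python as follows. This is an illustrative sketch under
            stated assumptions rather than a reference implementation: the function name crc32_register is
            hypothetical, and because the loop shifts right it uses the constant 0xEDB88320, the bit-reversed
            representation of the generator polynomial 0x04C11DB7. The sketch returns the raw register value, as
            described in the text; the widely deployed CRC-32 (for example, zlib) additionally complements the
            final register, a step this document does not mention.
          </t>
          <sourcecode type="python"><![CDATA[
```python
import zlib

# Bit-reversed form of the generator polynomial 0x04C11DB7, as needed
# by the right-shifting (LSB-first) variant of the algorithm.
REFLECTED_POLY = 0xEDB88320


def crc32_register(data: bytes, init: int = 0xFFFFFFFF) -> int:
    """Run the byte-wise shift-XOR loop and return the raw CRC register.

    No final XOR is applied, matching the description in the text.
    """
    crc = init
    for byte in data:
        crc ^= byte                 # Step 1: XOR byte into the low 8 bits
        for _ in range(8):          # Step 2: eight shift-XOR iterations
            lsb = crc & 1           # a. check the LSB
            crc >>= 1               # b. shift right, high bit becomes 0
            if lsb:                 # c. conditional XOR with the constant
                crc ^= REFLECTED_POLY
    return crc


data = bytes.fromhex("123456ABCDEF00010002")
raw = crc32_register(data)

# Sanity check: zlib's CRC-32 equals the raw register complemented.
assert (raw ^ 0xFFFFFFFF) == zlib.crc32(data)
print(hex(raw))
```
]]></sourcecode>
          <t>
            The comparison with zlib is included only as a sanity check of the loop; whether a deployment uses the
            raw register or the complemented value is a design choice that both ends of a comparison would need to
            agree on.
          </t>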
        </section>
        <section>
          <name>Flow Label Field Population</name>
          <t>
            From the 32-bit CRC32 hash result, take the lower 20 bits as the Flow Label value and write this 20-bit
            value into the Flow Label field of the IPv6 header.
          </t>
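          <t>
            As a hedged illustration of this step, the following Python sketch masks a 32-bit hash result to its
            lower 20 bits and writes it into the first 32-bit word of an IPv6 header, leaving the Version and
            Traffic Class bits untouched. The function names and the all-zero sample header are hypothetical, and
            the hash value 0x8E4D7A2F is the illustrative value used in this document, not a computed result.
          </t>
          <sourcecode type="python"><![CDATA[
```python
FLOW_LABEL_MASK = 0xFFFFF  # the Flow Label occupies the low 20 bits


def flow_label_from_hash(crc: int) -> int:
    """Take the lower 20 bits of the 32-bit hash result."""
    return crc & FLOW_LABEL_MASK


def set_flow_label(ipv6_header: bytes, flow_label: int) -> bytes:
    """Return a copy of an IPv6 header with the Flow Label replaced.

    The first 32-bit word holds Version (4 bits), Traffic Class (8 bits)
    and Flow Label (20 bits); only the Flow Label bits are changed.
    """
    if len(ipv6_header) < 40:
        raise ValueError("an IPv6 fixed header is 40 bytes long")
    word = int.from_bytes(ipv6_header[:4], "big")
    word = (word & ~FLOW_LABEL_MASK & 0xFFFFFFFF) | (flow_label & FLOW_LABEL_MASK)
    return word.to_bytes(4, "big") + ipv6_header[4:]


# Hypothetical header: Version 6, Traffic Class 0, Flow Label 0, rest zero.
header = bytes([0x60, 0x00, 0x00, 0x00]) + bytes(36)

label = flow_label_from_hash(0x8E4D7A2F)
print(hex(label))                              # 0xd7a2f
print(set_flow_label(header, label)[:4].hex())  # 600d7a2f
```
]]></sourcecode>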
          <figure>
          <name>Updated IPv6 Header Structure Showing the Newly Populated Flow Label Field</name>
          <artwork align="center"><![CDATA[
+---------+---------+---------+---------+---------+---------+---------+
| Version | Traffic Class     |          Flow Label (20 bits)         |
+---------+---------+---------+---------+---------+---------+---------+
|         Payload Length         |  Next Header  |        Hop Limit   |
+---------+---------+---------+---------+---------+---------+---------+
|                                                                     |
+                       IPv6 Source Address                           +
|                                                                     |
+---------+---------+---------+---------+---------+---------+---------+
|                                                                     |
+                       IPv6 Destination Address                      +
|                                                                     |
+---------+---------+---------+---------+---------+---------+---------+
|  ...    | UDP Header        |  ...    | IB BTH  |  ...    | IB DETH |
+---------+---------+---------+---------+---------+---------+---------+
            ]]>
          </artwork>
        </figure>
        </section>
      </section>
    
    <section anchor="IANA">
    <!-- All drafts are required to have an IANA considerations section. See RFC 8126 for a guide.-->
      <name>IANA Considerations</name>
      <t>This document makes no request to IANA.</t>
    </section>
    
    <section anchor="Security">
      <!-- All drafts are required to have a security considerations section. See RFC 3552 for a guide. -->
      <name>Security Considerations</name>
      <section>
        <name>Security issue</name>
        <t>
          This scheme modifies only the Flow Label field of the IPv6 header, and the modification is performed by the
          ingress network device. It does not alter the packet payload and does not affect end-to-end security
          mechanisms such as IPsec. Because IP addresses and port numbers are unchanged, the scheme imposes no
          additional processing burden on existing stateful firewalls or NAT devices.
        </t>
      </section>
      <section>
        <name>Compatibility issue</name>
        <t>
          End-to-End Protocol: The receiving device typically ignores the Flow Label field, making the scheme completely
          transparent to terminals that support standard IPv6.
        </t>
        <t>
          Intermediate Devices: All network devices supporting the IPv6 Flow Label field can benefit from this scheme.
          Legacy devices that do not support the Flow Label can still forward packets based on the traditional
          5-tuple. The scheme will not cause connectivity issues, but the full performance benefits will not be realized.
        </t>
        <t>
          Hardware-Friendly Implementation: The CRC32 algorithm is widely supported in existing network ASICs.
          Implementing the required logic (parsing BTH/DETH headers, performing the hash, and modifying the Flow Label)
          is relatively straightforward and requires minimal changes to existing hardware.
        </t>
      </section>
    </section>
    
  </middle>

  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
        
      </references>
    </references>
    
    <section anchor="Contributors" numbered="false">
      <name>Contributors</name>
      <t>Thanks to all the contributors.</t>
    </section>
    
 </back>
</rfc>
