Network Working Group                                        M. Mathis
Request for Comments: 2018                                  J. Mahdavi
Category: Standards Track                                          PSC
                                                              S. Floyd
                                                                  LBNL
                                                            A. Romanow
                                                      Sun Microsystems
                                                          October 1996
                  TCP Selective Acknowledgment Options
Status of this Memo
   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.
Abstract
   TCP may experience poor performance when multiple packets are lost
   from one window of data.   With the limited information available
   from cumulative acknowledgments, a TCP sender can only learn about a
   single lost packet per round trip time.  An aggressive sender could
   choose to retransmit packets early, but such retransmitted segments
   may have already been successfully received.
   A Selective Acknowledgment (SACK) mechanism, combined with a
   selective repeat retransmission policy, can help to overcome these
   limitations.  The receiving TCP sends back SACK packets to the sender
   informing the sender of data that has been received. The sender can
   then retransmit only the missing data segments.
   This memo proposes an implementation of SACK and discusses its
   performance and related issues.
Acknowledgements
   Much of the text in this document is taken directly from RFC1072 "TCP
   Extensions for Long-Delay Paths" by Bob Braden and Van Jacobson.  The
   authors would like to thank Kevin Fall (LBNL), Christian Huitema
   (INRIA), Van Jacobson (LBNL), Greg Miller (MITRE), Greg Minshall
   (Ipsilon), Lixia Zhang (XEROX PARC and UCLA), Dave Borman (BSDI),
   Allison Mankin (ISI) and others for their review and constructive
   comments.
Mathis, et. al.             Standards Track                     [Page 1]
RFC 2018         TCP Selective Acknowledgement Options      October 1996
1.  Introduction
   Multiple packet losses from a window of data can have a catastrophic
   effect on TCP throughput. TCP [Postel81] uses a cumulative
   acknowledgment scheme in which received segments that are not at the
   left edge of the receive window are not acknowledged.  This forces
   the sender to either wait a roundtrip time to find out about each
   lost packet, or to unnecessarily retransmit segments which have been
   correctly received [Fall95].  With the cumulative acknowledgment
   scheme, multiple dropped segments generally cause TCP to lose its
   ACK-based clock, reducing overall throughput.
   Selective Acknowledgment (SACK) is a strategy which corrects this
   behavior in the face of multiple dropped segments.  With selective
   acknowledgments, the data receiver can inform the sender about all
   segments that have arrived successfully, so the sender need
   retransmit only the segments that have actually been lost.
   Several transport protocols, including NETBLT [Clark87], XTP
   [Strayer92], RDP [Velten84], NADIR [Huitema81], and VMTP [Cheriton88]
   have used selective acknowledgment.  There is some empirical evidence
   in favor of selective acknowledgments -- simple experiments with RDP
   have shown that disabling the selective acknowledgment facility
   greatly increases the number of retransmitted segments over a lossy,
   high-delay Internet path [Partridge87].  A recent simulation study by
   Kevin Fall and Sally Floyd [Fall95] demonstrates the strength of TCP
   with SACK over the non-SACK Tahoe and Reno TCP implementations.
   RFC1072 [VJ88] describes one possible implementation of SACK options
   for TCP.  Unfortunately, it has never been deployed in the Internet,
   as there was disagreement about how SACK options should be used in
   conjunction with the TCP window shift option (initially described in
   RFC1072 and revised in [Jacobson92]).
   We propose slight modifications to the SACK options as proposed in
   RFC1072.  Specifically, sending a selective acknowledgment for the
   most recently received data reduces the need for long SACK options
   [Keshav94, Mathis95].  In addition, the SACK option now carries full
   32 bit sequence numbers.  These two modifications represent the only
   changes to the proposal in RFC1072.  They make SACK easier to
   implement and address concerns about robustness.
   The selective acknowledgment extension uses two TCP options. The
   first is an enabling option, "SACK-permitted", which may be sent in a
   SYN segment to indicate that the SACK option can be used once the
   connection is established.  The other is the SACK option itself,
   which may be sent over an established connection once permission has
   been given by SACK-permitted.
   The SACK option is to be included in a segment sent from a TCP that
   is receiving data to the TCP that is sending that data; we will refer
   to these TCPs as the data receiver and the data sender,
   respectively.  We will consider a particular simplex data flow; any
   data flowing in the reverse direction over the same connection can be
   treated independently.
2.  Sack-Permitted Option
   This two-byte option may be sent in a SYN by a TCP that has been
   extended to receive (and presumably process) the SACK option once the
   connection has opened.  It MUST NOT be sent on non-SYN segments.
       TCP Sack-Permitted Option:
       Kind: 4
       +---------+---------+
       | Kind=4  | Length=2|
       +---------+---------+
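
   As an illustration only, and not as part of this specification, the
   two-byte option could be produced and recognized as in the following
   Python sketch.  The function names are hypothetical.  The handling of
   option kinds 0 (End of Option List) and 1 (No-Operation) follows the
   general TCP option layout of RFC 793.

```python
import struct

SACK_PERMITTED_KIND = 4   # Kind value from the diagram above
SACK_PERMITTED_LEN = 2    # total option length in bytes


def encode_sack_permitted() -> bytes:
    """Return the two-byte Sack-Permitted option (hypothetical helper)."""
    return struct.pack("!BB", SACK_PERMITTED_KIND, SACK_PERMITTED_LEN)


def has_sack_permitted(options: bytes) -> bool:
    """Scan the raw option bytes of a SYN for Sack-Permitted."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:               # End of Option List
            break
        if kind == 1:               # No-Operation occupies a single byte
            i += 1
            continue
        if i + 1 >= len(options):
            break                   # truncated option, stop scanning
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break                   # malformed length, stop scanning
        if kind == SACK_PERMITTED_KIND and length == SACK_PERMITTED_LEN:
            return True
        i += length
    return False
```

   A caller would apply has_sack_permitted() only to the options of a
   received SYN, since the option MUST NOT appear on non-SYN segments.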
3.  Sack Option Format
   The SACK option is to be used to convey extended acknowledgment
   information from the receiver to the sender over an established TCP
   connection.
       TCP SACK Option:
       Kind: 5
       Length: Variable
                         +--------+--------+
                         | Kind=5 | Length |
       +--------+--------+--------+--------+
       |      Left Edge of 1st Block       |
       +--------+--------+--------+--------+
       |      Right Edge of 1st Block      |
       +--------+--------+--------+--------+
       |                                   |
       /            . . .                  /
       |                                   |
       +--------+--------+--------+--------+
       |      Left Edge of nth Block       |
       +--------+--------+--------+--------+
       |      Right Edge of nth Block      |
       +--------+--------+--------+--------+
   The SACK option is to be sent by a data receiver to inform the data
   sender of non-contiguous blocks of data that have been received and
   queued.  The data receiver awaits the receipt of data (perhaps by
   means of retransmissions) to fill the gaps in sequence space between
   received blocks.  When missing segments are received, the data
   receiver acknowledges the data normally by advancing the left window
   edge in the Acknowledgement Number Field of the TCP header.  The SACK
   option does not change the meaning of the Acknowledgement Number
   field.
   This option contains a list of some of the blocks of contiguous
   sequence space occupied by data that has been received and queued
   within the window.
   Each contiguous block of data queued at the data receiver is defined
   in the SACK option by two 32-bit unsigned integers in network byte
   order:
   *    Left Edge of Block
        This is the first sequence number of this block.
   *    Right Edge of Block
        This is the sequence number immediately following the last
        sequence number of this block.
   Each block represents received bytes of data that are contiguous and
   isolated; that is, the bytes just below the block, (Left Edge of
   Block - 1), and just above the block, (Right Edge of Block), have not
   been received.
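
   The layout described above can be sketched in Python as follows.
   This is a minimal illustration under stated assumptions: the names
   encode_sack and decode_sack are hypothetical, and the decoder only
   checks the Kind and Length fields, not whether the blocks are
   plausible for the connection.

```python
import struct

SACK_KIND = 5  # Kind value from the option diagram


def encode_sack(blocks):
    """Build a SACK option from (left_edge, right_edge) pairs.

    Each edge is written as a 32-bit unsigned integer in network
    byte order, as the text above requires.
    """
    length = 2 + 8 * len(blocks)
    out = struct.pack("!BB", SACK_KIND, length)
    for left, right in blocks:
        out += struct.pack("!II", left & 0xFFFFFFFF, right & 0xFFFFFFFF)
    return out


def decode_sack(option):
    """Return the list of (left_edge, right_edge) pairs in a SACK option."""
    kind, length = struct.unpack_from("!BB", option, 0)
    if kind != SACK_KIND or length != len(option) or (length - 2) % 8 != 0:
        raise ValueError("not a well-formed SACK option")
    return [struct.unpack_from("!II", option, off)
            for off in range(2, length, 8)]
```

   For instance, a single block covering sequence numbers 5500 through
   5999 would be passed as (5500, 6000), because the right edge is the
   sequence number immediately following the block.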
   A SACK option that specifies n blocks will have a length of 8*n+2
   bytes, so the 40 bytes available for TCP options can specify a
   maximum of 4 blocks.  It is expected that SACK will often be used in
   conjunction with the Timestamp option used for RTTM [Jacobson92],
   which takes an additional 10 bytes (plus two bytes of padding); thus
   a maximum of 3 SACK blocks will be allowed in this case.
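
   The arithmetic behind these limits can be checked with a short
   Python sketch.  The helper name max_sack_blocks and its parameter are
   hypothetical; the constants come from the paragraph above.

```python
TCP_MAX_OPTION_BYTES = 40  # option space available in a TCP header


def max_sack_blocks(other_option_bytes=0):
    """Largest n such that 8*n + 2 fits in the remaining option space."""
    available = TCP_MAX_OPTION_BYTES - other_option_bytes
    return max(0, (available - 2) // 8)


print(max_sack_blocks())    # SACK alone
print(max_sack_blocks(12))  # Timestamp option: 10 bytes plus 2 of padding
```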
   The SACK option is advisory, in that, while it notifies the data
   sender that the data receiver has received the indicated segments,
   the data receiver is permitted to later discard data which have been
   reported in a SACK option.  A discussion appears below in Section 8
   of the consequences of advisory SACK, in particular that the data
   receiver may renege, or drop already SACKed data.
4.  Generating Sack Options: Data Receiver Behavior
   If the data receiver has received a SACK-Permitted option on the SYN
   for this connection, the data receiver MAY elect to generate SACK
   options as described below.  If the data receiver generates SACK
   options under any circumstance, it SHOULD generate them under all
   permitted circumstances.  If the data receiver has not received a
   SACK-Permitted option for a given connection, it MUST NOT send SACK
   options on that connection.
   If sent at all, SACK options SHOULD be included in all ACKs which do
   not ACK the highest sequence number in the data receiver's queue.  In
   this situation the network has lost or mis-ordered data, such that
   the receiver holds non-contiguous data in its queue.  RFC 1122,
   Section 4.2.2.21, discusses the reasons for the receiver to send ACKs
   in response to additional segments received in this state.  The
   receiver SHOULD send an ACK for every valid segment that arrives
   containing new data, and each of these "duplicate" ACKs SHOULD bear a
   SACK option.
   If the data receiver chooses to send a SACK option, the following
   rules apply:
      * The first SACK block (i.e., the one immediately following the
      kind and length fields in the option) MUST specify the contiguous
      block of data containing the segment which triggered this ACK,
      unless that segment advanced the Acknowledgment Number field in
      the header.  This assures that the ACK with the SACK option
      reflects the most recent change in the data receiver's buffer
      queue.
      * The data receiver SHOULD include as many distinct SACK blocks as
      possible in the SACK option.  Note that the maximum available
      option space may not be sufficient to report all blocks present in
      the receiver's queue.
      * The SACK option SHOULD be filled out by repeating the most
      recently reported SACK blocks (based on first SACK blocks in
      previous SACK options) that are not subsets of a SACK block
      already included in the SACK option being constructed.  This
      assures that in normal operation, any segment remaining part of a
      non-contiguous block of data held by the data receiver is reported
      in at least three successive SACK options, even for large-window
      TCP implementations [Jacobson92].  After the first SACK block, the
      following SACK blocks in the SACK option may be listed in
      arbitrary order.
   It is very important that the SACK option always reports the block
   containing the most recently received segment, because this provides
   the sender with the most up-to-date information about the state of
   the network and the data receiver's queue.
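
   One way a data receiver might follow these rules is sketched below in
   Python.  This is not a definitive implementation.  The class name
   SackReceiver, its fields, and the default of three blocks are
   assumptions made for illustration, and the sketch ignores sequence
   number wraparound and buffer limits.

```python
class SackReceiver:
    """Hypothetical receiver state: cumulative ACK point plus queued blocks."""

    def __init__(self, rcv_nxt, max_blocks=3):
        self.rcv_nxt = rcv_nxt        # next in-order sequence number expected
        self.max_blocks = max_blocks  # option space permitting
        self.blocks = []              # isolated blocks, most recent first

    def on_segment(self, seq, length):
        """Process one arriving segment; return (ack, sack_blocks)."""
        left, right = seq, seq + length
        if right > self.rcv_nxt:               # segment carries new data
            left = max(left, self.rcv_nxt)
            kept = []
            for b_left, b_right in self.blocks:
                if b_left <= right and left <= b_right:
                    # Overlapping or adjacent: merge into the new block.
                    left, right = min(left, b_left), max(right, b_right)
                else:
                    kept.append((b_left, b_right))
            if left <= self.rcv_nxt:
                # The segment advanced the Acknowledgment Number, so
                # the merged data is reported by the ACK field alone.
                self.rcv_nxt = right
                self.blocks = kept
            else:
                # The block holding the triggering segment goes first;
                # the rest follow in order of most recent report.
                self.blocks = [(left, right)] + kept
        return self.rcv_nxt, self.blocks[:self.max_blocks]
```

   Keeping the list ordered by recency means that truncating it to the
   available option space automatically repeats the most recently
   reported blocks, which is the behavior the third rule asks for.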
5.  Interpreting the Sack Option and Retransmission Strategy: Data
   Sender Behavior
   When receiving an ACK containing a SACK option, the data sender
   SHOULD record the selective acknowledgment for future reference.  The
   data sender is assumed to have a retransmission queue that contains
   the segments that have been transmitted but not yet acknowledged, in
   sequence-number order.  If the data sender performs re-packetization
   before retransmission, the block boundaries in a SACK option that it
   receives may not fall on boundaries of segments in the retransmission
   queue; however, this does not pose a serious difficulty for the
   sender.
   One possible implementation of the sender's behavior is as follows.
   Let us suppose that for each segment in the retransmission queue
   there is a (new) flag bit "SACKed", to be used to indicate that this
   particular segment has been reported in a SACK option.
   When an acknowledgment segment arrives containing a SACK option, the
   data sender will turn on the SACKed bits for segments that have been
   selectively acknowledged.  More specifically, for each block in the
   SACK option, the data sender will turn on the SACKed flags for all
   segments in the retransmission queue that are wholly contained within
   that block.  This requires straightforward sequence number
   comparisons.
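
   Because sequence numbers are 32 bits and wrap around, such
   comparisons are usually done modulo 2**32.  The Python sketch below
   shows one common idiom.  It is an assumption about how an
   implementation might proceed, not something this document specifies,
   and the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
HALF32 = 0x80000000


def seq_lt(a, b):
    """True if sequence number a precedes b, allowing for wraparound."""
    return ((a - b) & MASK32) >= HALF32


def seq_leq(a, b):
    """True if a precedes or equals b, allowing for wraparound."""
    return ((b - a) & MASK32) < HALF32


def wholly_contained(seg_start, seg_end, left_edge, right_edge):
    """True if segment [seg_start, seg_end) lies inside the SACK block.

    seg_end and right_edge are the sequence numbers immediately
    following the last byte, matching the Right Edge definition.
    """
    return seq_leq(left_edge, seg_start) and seq_leq(seg_end, right_edge)
```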
   After the SACKed bit is turned on (as the result of processing a
   received SACK option), the data sender will skip that segment during
   any later retransmission.  Any segment that has the SACKed bit turned
   off and is less than the highest SACKed segment is available for
   retransmission.
   After a retransmit timeout the data sender SHOULD turn off all of the
   SACKed bits, since the timeout might indicate that the data receiver
   has reneged.  The data sender MUST retransmit the segment at the left
   edge of the window after a retransmit timeout, whether or not the
   SACKed bit is on for that segment.  A segment will not be dequeued
   and its buffer freed until the left window edge is advanced over it.
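
   The sender behavior outlined in this section could be sketched in
   Python as follows.  The names Segment and Scoreboard are
   hypothetical, sequence number wraparound is ignored for brevity, and
   congestion control is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: int            # first sequence number of the segment
    end: int              # sequence number following its last byte
    sacked: bool = False  # the "SACKed" flag bit described above


class Scoreboard:
    """Hypothetical retransmission queue kept in sequence-number order."""

    def __init__(self, segments):
        self.queue = sorted(segments, key=lambda s: s.start)

    def on_ack(self, ack, sack_blocks):
        # Dequeue only what the left window edge has advanced over.
        self.queue = [s for s in self.queue if s.end > ack]
        for left, right in sack_blocks:
            for s in self.queue:
                if left <= s.start and s.end <= right:  # wholly contained
                    s.sacked = True

    def retransmit_candidates(self):
        # Un-SACKed segments below the highest SACKed segment.
        sacked_starts = [s.start for s in self.queue if s.sacked]
        if not sacked_starts:
            return []
        highest = max(sacked_starts)
        return [s for s in self.queue
                if not s.sacked and s.start < highest]

    def on_timeout(self):
        # The receiver may have reneged: forget all SACK information
        # and return the segment at the left edge of the window.
        for s in self.queue:
            s.sacked = False
        return self.queue[0] if self.queue else None
```

   Note that SACKed segments stay in the queue until the cumulative
   acknowledgment covers them, so they remain available if the SACKed
   bits are later cleared.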
5.1  Congestion Control Issues
   This document does not attempt to specify in detail the congestion
   control algorithms for implementations of TCP with SACK.  However,
   the congestion control algorithms present in the de facto standard
   TCP implementations MUST be preserved [Stevens94].  In particular, to
   preserve robustness in the presence of packets reordered by the
   network, recovery is not triggered by a single ACK reporting out-of-
   order packets at the receiver.  Further, during recovery the data
   sender limits the number of segments sent in response to each ACK.
   Existing implementations limit the data sender to sending one segment
   during Reno-style fast recovery, or to two segments during slow-start
   [Jacobson88].  Other aspects of congestion control, such as reducing
   the congestion window in response to congestion, must similarly be
   preserved.
   The use of time-outs as a fall-back mechanism for detecting dropped
   packets is unchanged by the SACK option.  Because the data receiver
   is allowed to discard SACKed data, when a retransmit timeout occurs
   the data sender MUST ignore prior SACK information in determining
   which data to retransmit.
   Future research into congestion control algorithms may take advantage
   of the additional information provided by SACK.  One such area for
   future research concerns modifications to TCP for a wireless or
   satellite environment where packet loss is not necessarily an
   indication of congestion.
6.  Efficiency and Worst Case Behavior
   If the return path carrying ACKs and SACK options were lossless, one
   block per SACK option packet would always be sufficient.  Every
   segment arriving while the data receiver holds discontinuous data
   would cause the data receiver to send an ACK with a SACK option
   containing the one altered block in the receiver's queue.  The data
   sender is thus able to construct a precise replica of the receiver's
   queue by taking the union of all the first SACK blocks.
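
   Taking that union amounts to merging intervals.  The Python sketch
   below illustrates the idea under the simplifying assumptions that no
   ACK is lost and that sequence numbers do not wrap.  The function name
   union_of_blocks is hypothetical.

```python
def union_of_blocks(blocks):
    """Merge (left_edge, right_edge) pairs into disjoint, sorted blocks."""
    merged = []
    for left, right in sorted(blocks):
        if merged and left <= merged[-1][1]:
            # Overlaps or touches the previous block: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], right))
        else:
            merged.append((left, right))
    return merged


# First SACK blocks from four successive ACKs, each with one block.
first_blocks = [(6000, 6500), (7000, 7500), (8000, 8500), (6000, 7500)]
print(union_of_blocks(first_blocks))
```

   A sender would additionally discard anything below the current
   Acknowledgment Number, since that data is no longer in the
   receiver's out-of-order queue.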
   Since the return path is not lossless, the SACK option is defined to
   include more than one SACK block in a single packet.  The redundant
   blocks in the SACK option packet increase the robustness of SACK
   delivery in the presence of lost ACKs.  For a receiver that is also
   using the time stamp option [Jacobson92], the SACK option has room to
   include three SACK blocks.  Thus each SACK block will generally be
   repeated at least three times, if necessary, once in each of three
   successive ACK packets.  However, if all of the ACK packets reporting
   a particular SACK block are dropped, then the sender might assume
   that the data in that SACK block has not been received, and
   unnecessarily retransmit those segments.
   The deployment of other TCP options may reduce the number of
   available SACK blocks to 2 or even to 1.  This will reduce the
   redundancy of SACK delivery in the presence of lost ACKs.  Even so,
   the exposure of TCP SACK in regard to the unnecessary retransmission
   of packets is strictly less than the exposure of current
   implementations of TCP.  The worst-case conditions necessary for the
   sender to needlessly retransmit data are discussed in more detail in
   a separate document [Floyd96].
   Older TCP implementations which do not have the SACK option will not
   be unfairly disadvantaged when competing against SACK-capable TCPs.
   This issue is discussed in more detail in [Floyd96].
7.  Sack Option Examples
   The following examples attempt to demonstrate the proper behavior of
   SACK generation by the data receiver.
   Assume the left window edge is 5000 and that the data sender sends a
   burst of 8 segments, each containing 500 data bytes.
      Case 1: The first 4 segments are received but the last 4 are
      dropped.
      The data receiver will return a normal TCP ACK segment
      acknowledging sequence number 7000, with no SACK option.
      Case 2:  The first segment is dropped but the remaining 7 are
      received.
         Upon receiving each of the last seven packets, the data
         receiver will return a TCP ACK segment that acknowledges
         sequence number 5000 and contains a SACK option specifying
         one block of queued data:
             Triggering    ACK       Left Edge   Right Edge
             Segment
             5000          (lost)
             5500          5000      5500        6000
             6000          5000      5500        6500
             6500          5000      5500        7000
             7000          5000      5500        7500
             7500          5000      5500        8000
             8000          5000      5500        8500
             8500          5000      5500        9000
      Case 3:  The 2nd, 4th, 6th, and 8th (last) segments are
      dropped.
      The data receiver ACKs the first packet normally.  The
      third, fifth, and seventh packets trigger SACK options as
      follows:
          Triggering  ACK    First Block   2nd Block     3rd Block
          Segment            Left   Right  Left   Right  Left   Right
                             Edge   Edge   Edge   Edge   Edge   Edge
          5000       5500
          5500       (lost)
          6000       5500    6000   6500
          6500       (lost)
          7000       5500    7000   7500   6000   6500
          7500       (lost)
          8000       5500    8000   8500   7000   7500   6000   6500
          8500       (lost)
      Suppose at this point, the 4th packet is received out of order.
      (This could either be because the data was badly misordered in the
      network, or because the 2nd packet was retransmitted and lost, and
      then the 4th packet was retransmitted). At this point the data
      receiver has only two SACK blocks to report.  The data receiver
      replies with the following Selective Acknowledgment:
          Triggering  ACK    First Block   2nd Block     3rd Block
          Segment            Left   Right  Left   Right  Left   Right
                             Edge   Edge   Edge   Edge   Edge   Edge
          6500       5500    6000   7500   8000   8500
      Suppose at this point, the 2nd segment is received.  The data
      receiver then replies with the following Selective Acknowledgment:
          Triggering  ACK    First Block   2nd Block     3rd Block
          Segment            Left   Right  Left   Right  Left   Right
                             Edge   Edge   Edge   Edge   Edge   Edge
          5500       7500    8000   8500
8.  Data Receiver Reneging
   Note that the data receiver is permitted to discard data in its queue
   that has not been acknowledged to the data sender, even if the data
   has already been reported in a SACK option.  Such discarding of
   SACKed packets is discouraged, but may be used if the receiver runs
   out of buffer space.
   The data receiver MAY elect not to keep data which it has reported in
   a SACK option.  In this case, the receiver SACK generation is
   additionally qualified:
      * The first SACK block MUST reflect the newest segment.  Even if
      the newest segment is going to be discarded and the receiver has
      already discarded adjacent segments, the first SACK block MUST
      report, at a minimum, the left and right edges of the newest
      segment.
      * Except for the newest segment, all SACK blocks MUST NOT report
      any old data which is no longer actually held by the receiver.
   Since the data receiver may later discard data reported in a SACK
   option, the sender MUST NOT discard data before it is acknowledged by
   the Acknowledgment Number field in the TCP header.
9.  Security Considerations
   This document neither strengthens nor weakens TCP's current security
   properties.
10. References
   [Cheriton88]  Cheriton, D., "VMTP: Versatile Message Transaction
   Protocol", RFC 1045, Stanford University, February 1988.
   [Clark87] Clark, D., Lambert, M., and L. Zhang, "NETBLT: A Bulk Data
   Transfer Protocol", RFC 998, MIT, March 1987.
   [Fall95]  Fall, K. and Floyd, S., "Comparisons of Tahoe, Reno, and
   Sack TCP", ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z, December 1995.
   [Floyd96]  Floyd, S.,  "Issues of TCP with SACK",
   ftp://ftp.ee.lbl.gov/papers/issues_sa.ps.Z, January 1996.
   [Huitema81] Huitema, C., and Valet, I., "An Experiment on High Speed
   File Transfer using Satellite Links", 7th Data Communication
   Symposium, Mexico, October 1981.
   [Jacobson88] Jacobson, V., "Congestion Avoidance and Control",
   Proceedings of SIGCOMM '88, Stanford, CA., August 1988.
   [VJ88] Jacobson, V. and R. Braden, "TCP Extensions for Long-
   Delay Paths", RFC 1072, October 1988.
   [Jacobson92] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions
   for High Performance", RFC 1323, May 1992.
   [Keshav94]  Keshav, presentation to the Internet End-to-End Research
   Group, November 1994.
   [Mathis95]  Mathis, M., and Mahdavi, J., "TCP Forward Acknowledgment
   Option", presentation to the Internet End-to-End Research Group, June
   1995.
   [Partridge87]  Partridge, C., "Private Communication", February 1987.
   [Postel81]  Postel, J., "Transmission Control Protocol - DARPA
   Internet Program Protocol Specification", RFC 793, DARPA, September
   1981.
   [Stevens94] Stevens, W., TCP/IP Illustrated, Volume 1: The Protocols,
   Addison-Wesley, 1994.
   [Strayer92] Strayer, T., Dempsey, B., and Weaver, A., XTP: The
   Xpress Transfer Protocol, Addison-Wesley Publishing Company, 1992.
   [Velten84] Velten, D., Hinden, R., and J. Sax, "Reliable Data
   Protocol", RFC 908, BBN, July 1984.
11. Authors' Addresses
    Matt Mathis and Jamshid Mahdavi
    Pittsburgh Supercomputing Center
    4400 Fifth Ave
    Pittsburgh, PA 15213
    mathis@psc.edu
    mahdavi@psc.edu
    Sally Floyd
    Lawrence Berkeley National Laboratory
    One Cyclotron Road
    Berkeley, CA 94720
    floyd@ee.lbl.gov
    Allyn Romanow
    Sun Microsystems, Inc.
    2550 Garcia Ave., MPK17-202
    Mountain View, CA 94043
    allyn@eng.sun.com