| Internet-Draft | Enhanced BGP Resilience | November 2025 |
| Zhuang & Wang | Expires 28 May 2026 | [Page] |
According to the base BGP specification, a BGP speaker that receives an UPDATE message containing a malformed attribute is required to reset the session over which the offending attribute was received. RFC7606 revises the error handling procedures for a number of existing attributes. The use of the "treat-as-withdraw" and "attribute discard" approaches significantly reduces the likelihood of BGP sessions being reset when receiving malformed BGP update messages, thereby greatly enhancing network stability. However, in practical applications, there are still numerous instances where BGP session oscillations occur due to the receipt of malformed BGP update messages, unrecognized attribute fields, or routing rules generated by a certain BGP AFI/SAFI that affect the forwarding of BGP messages.¶
This document introduces some approaches to enhance the stability of BGP sessions.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 RFC 2119 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 28 May 2026.¶
Copyright (c) 2025 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
As the Internet carries an increasing number of services, its stability has become increasingly important. BGP plays a crucial role in the Internet, enabling different ISPs and enterprises to provide reliable connectivity so that data can be exchanged quickly and securely. Oscillation (flapping) of BGP sessions can therefore have a significant negative impact on the Internet. The error handling introduced in RFC7606 improves the stability of BGP sessions. However, in practice there are still numerous cases where BGP sessions oscillate because of malformed BGP update messages, unrecognized attribute fields, or routing rules generated by a certain BGP AFI/SAFI that affect the forwarding of BGP messages.¶
This document introduces some approaches to enhance the stability of BGP sessions.¶
The figure below illustrates the operation of BGP protection mode.¶
Router2                    Router1
~                             ~
|--------BGP Flapping---------|
|                             |
|                             * Router1 detects multiple
|                             | oscillations in the BGP session
|                             | and initiates BGP protection
|                             | mode.
|                             |
|<-------BGP Open Msg.--------* The BGP Open message sent by
|                             | Router1 contains only the
|                             | capability parameter sets defined
|                             | by the protection mode.
|                             |
*--------BGP Open Msg.------->|
|                             * When receiving a BGP Open message
|                             | from Router2, only the predefined
|                             | capability sets are accepted, while
|                             | the rest are ignored.
|-------Session Established---|
|                             |
|<-------BGP Update Msg.------* Router1 sends Update messages in
|                             | protection mode.
|                             |
*--------BGP Update Msg.----->|
|                             * Router1 processes the received Update
|                             | message in protection mode.
~                             ~
Figure 1: Schematic Diagram of BGP Protection Mode Operation
The processing procedure is as follows:¶
1) Router1 detects multiple oscillations in the BGP session and initiates BGP protection mode.¶
2) The BGP Open message sent by Router1 contains only the capability parameter sets defined by the protection mode.¶
For example:¶
Before implementing this solution: Router1 advertises the IPv4 unicast address family, the IPv4 Flowspec address family, the Route Refresh capability, and so on. Under some erroneous operations, rules published through the Flowspec address family can filter out the protocol messages of the BGP session itself. With no protocol messages exchanged for a prolonged period, the BGP session is eventually interrupted.¶
After implementing this solution: Router1 advertises only the IPv4 unicast address family and the Route Refresh capability, and no longer advertises the IPv4 Flowspec address family or other capability parameters that may cause oscillation.¶
3) When receiving a BGP Open message from Router2, Router1 accepts only the predefined capability sets and ignores the rest. As a result, Router1 has no capability to filter protocol packets on the new session, which eliminates the repeated BGP session flaps caused by problematic Flowspec routes.¶
4) After the BGP session is established, when Router1 sends a BGP Update message to Router2, the BGP Update message contains only the set of attributes customized for protection mode.¶
5) When Router1 receives a BGP Update message from Router2, it accepts only the attributes that are configured for protection mode and ignores all other BGP path attributes.¶
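The filtering in steps 2) to 5) can be sketched as follows. This is a minimal illustration under stated assumptions, not a definition of protection mode: the allow-lists PROTECTION_CAPABILITIES and PROTECTION_ATTRIBUTES, the Capability type, and both helper functions are hypothetical names, and the actual sets would be chosen by the implementation or the operator. The numeric values are the IANA-assigned codes (capability 1 = Multiprotocol Extensions, capability 2 = Route Refresh, AFI 1 = IPv4, SAFI 1 = unicast, SAFI 133 = Flowspec, attribute types 1 to 5 = ORIGIN, AS_PATH, NEXT_HOP, MULTI_EXIT_DISC, LOCAL_PREF).

```python
from typing import NamedTuple, Optional

AFI_IPV4 = 1
SAFI_UNICAST = 1
SAFI_FLOWSPEC = 133
CAP_MULTIPROTOCOL = 1
CAP_ROUTE_REFRESH = 2


class Capability(NamedTuple):
    """One capability carried in a BGP Open message (illustrative model)."""
    code: int
    afi_safi: Optional[tuple] = None  # set only for Multiprotocol Extensions


# Hypothetical allow-lists for protection mode (assumptions, not from the draft).
PROTECTION_CAPABILITIES = {
    Capability(CAP_MULTIPROTOCOL, (AFI_IPV4, SAFI_UNICAST)),
    Capability(CAP_ROUTE_REFRESH),
}
PROTECTION_ATTRIBUTES = {1, 2, 3, 4, 5}


def filter_capabilities(caps):
    """Steps 2 and 3: keep only capabilities allowed in protection mode."""
    return [c for c in caps if c in PROTECTION_CAPABILITIES]


def filter_attributes(attrs):
    """Steps 4 and 5: keep only path attributes allowed in protection mode.

    attrs maps an attribute type code to its (already decoded) value.
    """
    return {t: v for t, v in attrs.items() if t in PROTECTION_ATTRIBUTES}


# The Open message Router1 would send in normal operation.
normal_open = [
    Capability(CAP_MULTIPROTOCOL, (AFI_IPV4, SAFI_UNICAST)),
    Capability(CAP_MULTIPROTOCOL, (AFI_IPV4, SAFI_FLOWSPEC)),
    Capability(CAP_ROUTE_REFRESH),
]
# In protection mode the Flowspec address family is no longer advertised.
protected_open = filter_capabilities(normal_open)
```

The same filter_capabilities function can be applied to the capabilities received from Router2, so that sending and receiving use one allow-list.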
The figure below illustrates the operation of BGP diagnostic mode.¶
Router2                    Router1
~                             ~
|--------BGP Flapping---------|
|                             |
|                             * Router1 detects multiple
|                             | oscillations in the BGP session
|                             | and initiates BGP Diagnostic
|                             | mode.
|                             |
~                             ~
Figure 2: Schematic Diagram of BGP Diagnostic Mode Operation
After the number of BGP session flaps reaches a configured threshold (for example, 5), a router implementing this solution enters diagnostic mode the next time the BGP session starts:¶
1) Upon entering diagnostic mode, the router allocates additional storage resources to the BGP module and optionally activates diagnostic modules that are disabled in normal mode. These diagnostic modules are normally disabled because they may degrade the operational performance of BGP.¶
2) The router records and stores the BGP messages received from and sent to the peer.¶
3) The diagnostic module on the router analyzes the validity of the recorded BGP messages.¶
4) If the diagnostic module identifies an issue, then when the session restarts (or when the router actively restarts the session), the router excludes the fault-causing information from the messages it sends and ignores the fault-causing information in the messages it receives from the peer.¶
5) If the session continues to flap after starting diagnostic mode (either because the diagnostic module did not promptly identify the issue or because there is no diagnostic module), the router initiates the protection mode; the subsequent process is the same as in section 3.¶
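The escalation from normal operation to diagnostic mode and then to protection mode can be sketched as a small state machine. This is only an illustration under stated assumptions: the class SessionSupervisor, its method names, and the Mode enum are hypothetical, and reusing the same flap threshold for the step from diagnostic mode to protection mode is an assumption, since the text above does not specify when "continues to flap" is reached.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    DIAGNOSTIC = "diagnostic"
    PROTECTION = "protection"


class SessionSupervisor:
    """Counts session flaps and selects the mode for the next session start."""

    def __init__(self, flap_threshold=5):
        # 5 is the example threshold given in the text; treat it as configurable.
        self.flap_threshold = flap_threshold
        self.mode = Mode.NORMAL
        self.flaps_in_mode = 0
        # Items the diagnostic module has identified as causing the fault (step 4).
        self.excluded = set()

    def on_flap(self):
        """Record one session flap and return the mode for the next start."""
        self.flaps_in_mode += 1
        if self.flaps_in_mode >= self.flap_threshold:
            if self.mode is Mode.NORMAL:
                self._enter(Mode.DIAGNOSTIC)
            elif self.mode is Mode.DIAGNOSTIC:
                # Assumption: the same threshold triggers protection mode (step 5).
                self._enter(Mode.PROTECTION)
        return self.mode

    def on_diagnosis(self, faulty_item):
        """Step 4: remember information to exclude when sending and receiving."""
        self.excluded.add(faulty_item)

    def _enter(self, mode):
        self.mode = mode
        self.flaps_in_mode = 0


supervisor = SessionSupervisor(flap_threshold=5)
for _ in range(5):
    next_mode = supervisor.on_flap()
# After the fifth flap the next session start uses diagnostic mode.
```

Keeping the per-mode flap counter separate from the overall history means a router that has already entered diagnostic mode needs a fresh run of flaps before it falls back to protection mode.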
TBD¶
No IANA actions are required for this document.¶
This document does not change the security properties of BGP.¶
TBD¶
The authors would like to acknowledge the review and inputs from ... (TBD)¶