8.4. BGP-4 Support for IPv6

There is no actual BGP for IPv6. The IPv6 support derives from the capability of BGP-4 to exchange information about network layer protocols other than IPv4. These multiprotocol extensions of BGP-4 are defined in RFC 2858, which obsoletes RFC 2283. RFC 2283 is mentioned here because it is the base document for RFC 2545, which defines the IPv6 extensions for BGP-4. It is important to understand BGP-4 fully before looking at its multiprotocol extensions. The following sections start with a short overview of BGP-4 and its operations as defined in RFC 4271 (formerly RFC 1771). BGP message types are then discussed. The last part covers the implementation of IPv6 information carried within BGP-4.

8.4.1. BGP-4 Overview

Each AS runs its interior routing protocol (RIP, OSPF, etc.) to distribute all routing information within the AS. BGP is an exterior routing protocol whose primary function is to exchange information about the reachability of networks between ASes. Each AS receives a unique AS number assigned by the numbering authority. Figure 8-34 shows the different types of ASes that can be interconnected using BGP-4.

Figure 8-34. BGP traffic and AS types

The AS types are further explained in the following list:

Transit AS: A transit AS has multiple connections to other ASes. Routing updates from any AS arriving at the transit AS may be passed through the AS and distributed out to other neighboring ASes. A transit AS can forward traffic to any other AS based on the routing information received. The ASes of larger ISPs are usually of this type.
Stub AS: A stub AS has a single connection to another AS. All traffic to or from the stub AS passes through this link. Smaller ISPs and campus or corporate networks use this kind of AS. Most stub ASes don't have a unique AS number assigned, as they don't really have any use for BGP. Their network addresses are treated as part of the parent AS.
Multihomed nontransit AS: A multihomed nontransit AS has multiple connections to one or more other ASes. It does not pass routing updates through. Traffic not belonging to this AS is therefore never forwarded. A multihomed nontransit AS allows multiple entry/exit points to be used for load sharing of inbound and outbound traffic.

Two routers exchanging routing information with BGP are called BGP peers, BGP speakers, or BGP neighbors. They first establish a TCP connection to ensure reliable transport. The peers then open the actual BGP connection to exchange BGP messages. The most important BGP message is the UPDATE message, which contains the routes to be exchanged. A BGP route is defined as a unit of information consisting of the Network Layer Reachability Information (NLRI) and a set of path attributes. The NLRI is basically an IPv4 prefix and its prefix length. Any concept of IPv4 class information has been eliminated. The NLRI may represent a single network or, more commonly, an aggregate (summary) of a range of addresses. Each NLRI is accompanied by a set of path attributes that add information to the BGP route, e.g., the next hop address, the sequence of ASes through which the route has passed during its update, or its origin. Routing decisions and traffic management are often based on these path attributes. One attribute must be emphasized here, as it plays a very important role in loop detection: AS_PATH carries the sequence of AS numbers through which the route has passed. If the receiving peer recognizes its own AS number within the AS_PATH, it rejects the corresponding route.
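The AS_PATH loop check can be sketched in a few lines of Python. This is only an illustration: the function name `passes_loop_check` is hypothetical, and the AS numbers in the usage note come from the range reserved for documentation.

```python
from typing import Sequence


def passes_loop_check(as_path: Sequence[int], local_as: int) -> bool:
    """Return True if the route may be kept, False if it must be rejected.

    A route is rejected when the receiving router finds its own AS number
    anywhere in the AS_PATH, because the route has already crossed this AS.
    """
    return local_as not in as_path
```

For a router in AS 64496, `passes_loop_check([64497, 64498], 64496)` keeps the route, while a path of `[64497, 64496, 64498]` is rejected.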

BGP routing updates are exchanged between two peers. They are governed by policies. Outbound policies specify which NLRIs are advertised to a particular peer. A router can advertise only the BGP routes it uses itself. Inbound policies specify which BGP routes are accepted from a particular peer. Policies may also be used to modify a BGP route (including its attributes), to change its characteristics.

8.4.1.1. Establishing a BGP connection

In order to exchange routing updates, two peers first have to establish a BGP connection. Figure 8-35 illustrates the steps needed to establish a BGP connection, including the different BGP messages exchanged and the peer state. The entire state machine is explained in detail in RFC 4271. Each message and its fields are explained in the upcoming section "BGP Message Header."

Figure 8-35. Establishing a BGP connection

To initiate and establish a BGP connection, the peers use the BGP OPEN message. If both routers simultaneously try to establish a BGP connection to each other, two parallel connections might well be formed. To avoid this connection collision, one router has to back down. The connection initiated by the router with the higher BGP Identifier prevails. The BGP Identifier is uniquely assigned to each BGP router and is carried in the OPEN message. Once the open is confirmed, the routers exchange the entire routing table based on their policies. From then on, only changes in the routing table are exchanged. Routing exchanges are done using BGP UPDATE messages. BGP KEEPALIVE messages prevent the connection from timing out. The TCP session guarantees reliable delivery of each BGP message.
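The collision rule can be illustrated with a small Python sketch. It assumes the BGP Identifiers are written as dotted IPv4 addresses and compared as unsigned 32-bit integers. The function name `resolve_collision` and its string return values are hypothetical.

```python
import ipaddress


def resolve_collision(local_id: str, remote_id: str) -> str:
    """Decide whose connection survives a connection collision.

    Returns "local" if the connection initiated by this router prevails,
    or "remote" if the connection initiated by the peer prevails.
    """
    local = int(ipaddress.IPv4Address(local_id))
    remote = int(ipaddress.IPv4Address(remote_id))
    if local == remote:
        # Identical identifiers are not a valid peering.
        raise ValueError("BGP Identifiers must be unique")
    # The connection initiated by the higher BGP Identifier prevails.
    return "local" if local > remote else "remote"
```

Converting to integers matters: a plain string comparison would rank 192.0.2.9 above 192.0.2.10.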

BGP distinguishes between the following peer connections:

IBGP connection: The peers are in the same AS and are called internal peers. BGP routes learned from internal peers must not be sent back to other internal peers; they can be sent only to external peers. Each internal peer must have a connection to all other internal peers so internal peers are fully meshed. The introduction of AS confederation for BGP (RFC 3065) or BGP route reflection (RFC 2796) relaxes this rule. The AS_PATH and NEXT_HOP attributes must not be modified when passing updates to internal peers.
EBGP connection: The peers are in different ASes and are called external peers. BGP routes learned from external peers can be advertised to all other peers. When sending an update to an external peer, the AS_PATH and NEXT_HOP attributes are modified. The sending router adds the local AS number to the AS_PATH and sets the NEXT_HOP field to its local IPv4 address.
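The two peer types differ in which routes may be passed on and in how the attributes are treated. The following Python sketch models both rules under simplifying assumptions: no route reflection or confederations, and a hypothetical `Route` record that tracks only the fields discussed here.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: Tuple[int, ...]
    next_hop: str
    learned_from_internal: bool


def may_advertise(route: Route, peer_is_internal: bool) -> bool:
    """Routes learned from internal peers go only to external peers."""
    return not (route.learned_from_internal and peer_is_internal)


def prepare_for_peer(route: Route, peer_is_internal: bool,
                     local_as: int, local_address: str) -> Route:
    """Return the route as it would be sent to the given peer."""
    if peer_is_internal:
        # IBGP: AS_PATH and NEXT_HOP are passed on unmodified.
        return route
    # EBGP: prepend the local AS and set NEXT_HOP to the local address.
    return replace(route,
                   as_path=(local_as,) + route.as_path,
                   next_hop=local_address)
```

A route with AS_PATH `(64497,)` sent from AS 64496 to an external peer leaves with AS_PATH `(64496, 64497)` and the sender's own address as NEXT_HOP.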

BGP NOTIFICATION messages inform the peers of any errors during the open or update process. The connection can be shut down in a controlled fashion using a cease NOTIFICATION message.

8.4.1.2. Route storage and policies

BGP routes are stored in a Routing Information Base (RIB). Figure 8-36 shows the three different RIBs and their interactions.

Figure 8-36. BGP RIBs and their interactions

Incoming messages could contain new feasible routes, replacement routes of earlier updates, or routes that have been withdrawn by the advertising peer. All these routes are placed into the Adj-RIB-In. For each new or changed route, a degree of preference is calculated based on the inbound policy. This preference is placed into the attribute LOCAL_PREF. If the route arrives from an internal peer, the LOCAL_PREF is already carried in the update and should not be recalculated. Each route in the Adj-RIB-In is now processed by the route selection process and entered into the Loc-RIB. The selection process first looks at the NEXT_HOP and AS_PATH attributes of the route. The IP address specified by the NEXT_HOP must be reachable through an entry in the local routing table. The AS_PATH must not contain the local AS number. If the two attributes comply, the route is accepted or ignored based on the inbound policy; otherwise, the route is ignored. In case of multiple routes to the same destination, the route with the highest preference is accepted. In case of the same preference, a complex tie-breaking rule ensures that only one of the routes to the same destination is accepted. See RFC 4271 for more details on this tie-breaking rule.
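A much-simplified Python sketch of this selection step follows. It assumes all candidates are for the same destination. It models only the NEXT_HOP reachability check, the AS_PATH check, and the preference comparison. The inbound policy and the RFC 4271 tie-breaking rule are deliberately left out, and the names `Route`, `select_route`, and `next_hop_reachable` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    as_path: Tuple[int, ...]
    local_pref: int


def select_route(candidates: Iterable[Route], local_as: int,
                 next_hop_reachable: Callable[[str], bool]) -> Optional[Route]:
    """Pick the route for one destination that would enter the Loc-RIB."""
    usable = [
        r for r in candidates
        # NEXT_HOP must resolve through the local routing table.
        if next_hop_reachable(r.next_hop)
        # AS_PATH must not contain the local AS number.
        and local_as not in r.as_path
    ]
    if not usable:
        return None
    # Highest preference wins; ties are NOT resolved per RFC 4271 here,
    # max() simply keeps the first of several equally preferred routes.
    return max(usable, key=lambda r: r.local_pref)
```

The `next_hop_reachable` callback stands in for a lookup in the local routing table, which keeps the sketch independent of any particular table implementation.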

Routes in the Loc-RIB are now placed into the local routing table. The true next hop address is taken from the local route entry to the IPv4 address specified in the NEXT_HOP attribute.

All routes in the Loc-RIB and all routes in the local routing table are eligible to be advertised to external peers of this router. Only routes in the Loc-RIB learned from external peers are eligible to be advertised to all internal peers of this router unless route reflection is enabled (see RFC 2796). The outbound policy disseminates the routes to a peer-specific Adj-RIB-Out. The outbound policy may perform route aggregation or path attribute modification. Changes in the Adj-RIB-Out cause the update process to send an update to the peer.

8.4.2. BGP Message Header

BGP messages are carried on top of TCP connections, which can be established over either IPv4 or IPv6. The source and destination IP addresses of the datagram depend on the peer configuration. They are always unicast. BGP connections use the well-known TCP port 179. Remember that only one TCP connection is established between two peering routers. Figure 8-37 shows the BGP message header format. The header has a fixed size of 19 bytes.

Figure 8-37. BGP message header format

The fields of the BGP header are explained in detail in the following list:

Marker (16 bytes): Contains authentication data if authentication was negotiated between the peers. All bits are set to one if no authentication is used or in the OPEN message.
Length (2 bytes): The total length of the BGP message, including headers. The value must be between 19 and 4096. The maximum message size of any BGP message is 4096 bytes.
Type (1 byte): Indicates the BGP message types as listed in Table 8-9.

Table 8-9. BGP message types
Type	Name	Description
1	OPEN	Initializes BGP connection and negotiates session parameters
2	UPDATE	Exchanges feasible and withdrawn BGP routes
3	NOTIFICATION	Reports errors or terminates BGP connections
4	KEEPALIVE	Keeps the BGP connection from expiring
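The fixed 19-byte header lends itself to a short parsing sketch. The Python below uses the standard `struct` module. The function name `parse_header` is hypothetical, and the marker contents are not validated.

```python
import struct

HEADER_LEN = 19
MAX_MESSAGE_LEN = 4096
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def parse_header(data: bytes):
    """Return (total_length, type_name) for the BGP header at the start of data."""
    if len(data) < HEADER_LEN:
        raise ValueError("a BGP header needs 19 bytes")
    # 16-byte Marker, 2-byte Length, 1-byte Type, all in network byte order.
    _marker, length, msg_type = struct.unpack("!16sHB", data[:HEADER_LEN])
    if not HEADER_LEN <= length <= MAX_MESSAGE_LEN:
        raise ValueError(f"invalid message length {length}")
    if msg_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type {msg_type}")
    return length, MESSAGE_TYPES[msg_type]
```

Because TCP is a byte stream, a receiver would typically read 19 bytes, call a function like this, and then read the remaining `length - 19` bytes of the message body.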

8.4.3. OPEN Message

As soon as the TCP connection between two BGP peers has been established, the routers send OPEN messages to initialize the BGP connection. This message verifies the validity of the peer and negotiates parameters used for the session using the fields illustrated in Figure 8-38. To verify the validity of a peer, each side of the connection must configure the IP address and the AS number of the peer.

Figure 8-38. The BGP OPEN message

The following list details the fields of the OPEN message:

Version (1 byte): Indicates the BGP version used by the sending peer. The current version is 4. Both peers have to agree on the same version. The version can be negotiated. Each peer usually indicates the highest version it supports. If the receiving peer does not support this version, it notifies the peer and terminates the session.
My Autonomous System (2 bytes): Indicates the AS number of the sending router. The receiving router must verify this number to be the peer's AS number. If it is incorrect, the peer is notified and the session is terminated. If the AS number is the same as the receiving router's AS number, the peer is internal (IBGP); otherwise, the peer is external (EBGP).
Hold time (2 bytes): Proposes a maximum time in seconds that may elapse before any BGP message must arrive on this interface. The hold timer is negotiated to the smaller value advertised by either peer. To keep a BGP connection from expiring, the peers send KEEPALIVE messages once every HoldTime/3 seconds. A hold time of 0 indicates that no KEEPALIVE messages need to be sent. The value of the hold time is 0 or greater than 2.
BGP Identifier (4 bytes): Each router must be identified by a unique, globally assigned BGP identifier. At startup, the BGP Identifier is set to an IPv4 address of a local interface. The message is rejected if the BGP Identifier equals the BGP Identifier of the receiver or if the BGP Identifier is illegal. During route selection, the BGP Identifier may be used to break a tie.
Optional Parameter Length (1 byte): Indicates the length of optional parameters to be negotiated. A length of 0 indicates that there are no optional parameters.
Optional Parameters: Each Optional Parameter consists of a <Type, Length, Value> (TLV) triplet. Both routers must know and agree on the optional parameter; otherwise, the peer is notified of the rejection of the parameter. This could lead to the termination of the session. At the moment, two parameters are specified, as explained in Table 8-10. The optional parameter BGP Capability is very important for IPv6 support.

Table 8-10. Optional parameters
Type	Name	Description
1	Authentication	The parameter consists of two fields: Authentication Code and Authentication Data. The Authentication Code defines the authentication mechanism used and how the marker and authentication data fields are to be computed.
2	BGP Capability	The parameter consists of one or more <Code, Length, Value> triplets identifying different BGP Capabilities. It is defined in RFC 3392. The capability parameter may appear more than once in the `OPEN` message. The Capability Code set to 1 indicates the Multiprotocol Extension Capability as defined in RFC 2858.
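As an illustration of the OPEN fields, the following Python sketch decodes the fixed part of an OPEN message body (the bytes that follow the 19-byte BGP header) and applies the hold time negotiation. The function names and the dictionary keys are hypothetical, and the optional parameters are not decoded.

```python
import ipaddress
import struct
from typing import Optional, Tuple


def parse_open_fixed(body: bytes) -> dict:
    """Decode Version, My Autonomous System, Hold Time, BGP Identifier,
    and Optional Parameter Length from an OPEN message body."""
    if len(body) < 10:
        raise ValueError("the fixed part of OPEN needs 10 bytes")
    version, my_as, hold_time, bgp_id, opt_len = struct.unpack(
        "!BHH4sB", body[:10])
    if hold_time in (1, 2):
        # The hold time must be 0 or greater than 2.
        raise ValueError("unacceptable hold time")
    return {
        "version": version,
        "my_as": my_as,
        "hold_time": hold_time,
        "bgp_identifier": str(ipaddress.IPv4Address(bgp_id)),
        "optional_parameter_length": opt_len,
    }


def negotiate_hold_time(local: int, remote: int) -> Tuple[int, Optional[float]]:
    """Return (hold_time, keepalive_interval); the interval is None
    when the negotiated hold time is 0 and no KEEPALIVEs are needed."""
    hold = min(local, remote)  # the smaller advertised value is used
    return hold, (hold / 3 if hold else None)
```

With a local proposal of 180 seconds and a peer proposing 90, the session would use a hold time of 90 seconds and a KEEPALIVE every 30 seconds.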

The authentication for BGP connection is currently mostly based on the MD5-signature-option and implemented directly in TCP. This authentication does not use the authentication data subfield.

The multiprotocol capability is used in conjunction with IPv6. It has a 4-byte value field. The first 2 bytes identify the Address Family Identifier (AFI), byte 3 is reserved, and byte 4 defines the Subsequent Address Family Identifier (SAFI). AFI defines the network layer protocol used in the multiprotocol extension. SAFI defines additional information about the protocol, such as whether the protocol uses unicast forwarding (SAFI=1), multicast forwarding (SAFI=2), or both (SAFI=3). To support IPv6, the Multiprotocol Extension Capability is set to <Code=1, Length=4, Value=hexadecimal 0x0002 0001>.
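The byte layout of this capability can be reproduced with a few lines of Python. The sketch assumes 1-byte Code/Type and Length fields for both the capability and the enclosing optional parameter (type 2 in Table 8-10). The function names are hypothetical.

```python
import struct


def encode_mp_capability(afi: int, safi: int) -> bytes:
    """Encode the Multiprotocol Extension Capability <Code=1, Length=4, Value>."""
    # Value: 2-byte AFI, 1 reserved byte (0), 1-byte SAFI.
    value = struct.pack("!HBB", afi, 0, safi)
    return struct.pack("!BB", 1, len(value)) + value


def wrap_as_optional_parameter(capability: bytes) -> bytes:
    """Wrap an encoded capability in a BGP Capability optional parameter (type 2)."""
    return struct.pack("!BB", 2, len(capability)) + capability
```

For IPv6 unicast, `encode_mp_capability(2, 1)` yields the six bytes `01 04 00 02 00 01`. Wrapped as an optional parameter, they become `02 06 01 04 00 02 00 01`.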

8.4.4. UPDATE Message

An UPDATE message carries BGP route(s) advertised by the originating peer. It is divided into three sections, as outlined in Figure 8-39. The first section specifies the IPv4 NLRI that the sending peer is withdrawing. The second section defines all path attributes associated with the feasible IPv4 NLRI followed in section three. Multiple NLRI with the exact same set of path attributes can be placed in a single UPDATE message.

Figure 8-39. The BGP UPDATE message

The fields of the UPDATE message are detailed in the following list:

Unfeasible Routes length (2 bytes)

Defines the length of the Withdrawn Routes field. When set to 0, it indicates that the originating peer has no route to withdraw with this message.

Withdrawn Routes

A list of IPv4 NLRIs that are no longer valid. Each NLRI is encoded as <length, prefix> and represents an IPv4 prefix. The 1-byte Length field defines the length of the prefix in bits. The Prefix field is padded with trailing bits to the next full octet. Because the NLRIs are IPv4 prefixes, this field can never be used to withdraw IPv6 routes. See "BGP Multiprotocol Extension for IPv6" for further details.

Total Path Attribute length (2 bytes)

Defines the length of the Path Attributes field.

Path attributes

Contains a list of path attributes that belong to the feasible NLRI advertised. Attributes are further explained in the next section.

Network Layer Reachability Information

A list of IPv4 NLRI that are advertised with this update. Each NLRI is encoded as <length, prefix> and represents an IPv4 prefix. The 1-byte Length field defines the length of the prefix in bits. The Prefix field is padded with trailing bits to the next full octet. The total length of this field is calculated as follows:

UPDATE message length - 23 - Unfeasible Routes Length - Total Path Attribute Length

Because the NLRIs are IPv4 prefixes, this field can never be used to advertise IPv6 routes. See "BGP Multiprotocol Extension for IPv6" for further details.
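The <length, prefix> encoding and the length calculation can be sketched in Python as follows. The sketch assumes, per RFC 4271, that the Length field counts bits and that the prefix occupies the minimum number of whole octets. The function names are hypothetical.

```python
import ipaddress
from typing import List


def parse_ipv4_nlri(data: bytes) -> List[str]:
    """Decode a sequence of <length, prefix> entries into "a.b.c.d/len" strings."""
    prefixes = []
    offset = 0
    while offset < len(data):
        bit_len = data[offset]
        if bit_len > 32:
            raise ValueError("IPv4 prefix length exceeds 32 bits")
        byte_len = (bit_len + 7) // 8  # prefix is padded to a full octet
        raw = data[offset + 1:offset + 1 + byte_len]
        if len(raw) < byte_len:
            raise ValueError("truncated NLRI")
        address = ipaddress.IPv4Address(raw.ljust(4, b"\x00"))
        prefixes.append(f"{address}/{bit_len}")
        offset += 1 + byte_len
    return prefixes


def nlri_field_length(message_length: int, unfeasible_routes_length: int,
                      total_path_attribute_length: int) -> int:
    """23 = 19-byte header + the two 2-byte length fields of the UPDATE."""
    return (message_length - 23 - unfeasible_routes_length
            - total_path_attribute_length)
```

The bytes `18 c0 00 02` (hexadecimal) decode to 192.0.2.0/24: a 24-bit prefix needs only three prefix octets, so the entry occupies four bytes in total.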

8.4.5. BGP Attributes

Path attributes provide additional information about the advertised NLRI. Each path attribute has a 2-byte attribute header, as depicted in Figure 8-40.

Figure 8-40. The BGP path attributes

The following list explains the Path Attribute in detail:

O bit (Optional bit): Defines whether the attribute is optional (set to 1) or well-known (set to 0). A well-known attribute must be recognized and supported by each BGP router. Optional attributes may not be recognized by some routers.
T bit (Transitive bit): Defines whether the attribute is transitive (set to 1) or nontransitive (set to 0). Transitive attributes must always be passed on when the NLRI is advertised to another peer. Well-known attributes must always be transitive.
P bit (Partial bit): Applies only to optional transitive attributes. If any router along the update path does not recognize the optional transitive attribute, it must set the P bit to 1. This setting indicates that at least one router in the path to the route does not recognize this attribute. This bit must always be set to 0 for optional nontransitive or well-known attributes.
E bit (Extended length bit): Defines whether the attribute length field is 1 byte (set to 0) or 2 bytes (set to 1). Extended length may be used if the attribute's data is longer than 255 bytes.
Attribute code (1 byte): Defines the type of attribute. Table 8-11 lists and explains some of the most common attributes. Detailed explanations should be taken directly from RFC 4271 or any RFC extending BGP (e.g., BGP Route Reflection defines attribute types 9 and 10).

Table 8-11. BGP attributes
Type	Name/flags	Description
1	`ORIGIN` (well-known)	Defines the original source of this route. 0=IGP, 1=EGP, 2=Incomplete
2	`AS_PATH` (well-known)	A sequence of AS numbers that this route has crossed during its update. The rightmost AS number defines the originating AS. Each AS crossed is prepended. Prevents loops and can be used for policies.
3	`NEXT_HOP` (well-known)	Specifies the next hop's IPv4 address. Cannot be used for IPv6.
4	`MED` (optional nontransitive)	The `MULTI_EXIT_DISC` (`MED`) indicates a desired preference (4-byte) of the route to the peer; the lower the better. Designed for multiple EBGP connections between two ASes to load-share inbound traffic.
5	`LOCAL_PREF` (well-known)	Defines a local preference (4 byte) of the route. The higher the better. It is usually calculated on routes arriving from external peers and preserved to internal peers. Designed for multiple EBGP connections to any AS to manage outbound traffic.
6	`ATOMIC_AGGREGATE` (well-known)	Specifies that one of the routers has selected the less-specific route over a more-specific route.
7	`AGGREGATOR` (optional transitive)	The BGP Identifier of the router that aggregated routes into this route.
8	`COMMUNITY` (optional transitive)	Carries a 4-byte informational tag. Can be used by the route selection process. Defined in RFC 1997.
14	`MP_REACH_NLRI` (optional nontransitive)	Advertises multiprotocol NLRI. Used for IPv6 prefixes. See "BGP Multiprotocol Extension for IPv6."
15	`MP_UNREACH_NLRI` (optional nontransitive)	Withdraws multiprotocol NLRI. Used for IPv6 prefixes. See "BGP Multiprotocol Extension for IPv6."
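The attribute header can be decoded with simple bit masks. The Python sketch below assumes the four flag bits occupy the high-order bits of the first byte in the order O, T, P, E, as laid out in RFC 4271. The function name and the dictionary keys are hypothetical.

```python
import struct


def parse_attribute_header(data: bytes) -> dict:
    """Decode the flags, attribute code, and length of one path attribute."""
    if len(data) < 3:
        raise ValueError("attribute header too short")
    flags, code = data[0], data[1]
    extended = bool(flags & 0x10)  # E bit: 2-byte length field
    if extended:
        if len(data) < 4:
            raise ValueError("attribute header too short")
        (length,) = struct.unpack("!H", data[2:4])
        value_offset = 4
    else:
        length = data[2]
        value_offset = 3
    return {
        "optional": bool(flags & 0x80),    # O bit
        "transitive": bool(flags & 0x40),  # T bit
        "partial": bool(flags & 0x20),     # P bit
        "extended_length": extended,
        "code": code,
        "length": length,
        "value_offset": value_offset,      # where the attribute data starts
    }
```

An MP_REACH_NLRI attribute (code 14, optional nontransitive) would typically start with the flag byte 0x80, or 0x90 when its data exceeds 255 bytes.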

8.4.6. NOTIFICATION and KEEPALIVE Messages

NOTIFICATION messages are used to report errors. A 1-byte Error Code field specifies the main category of the error. A Subcode field providing the actual error follows the Error Code field. For troubleshooting reasons, additional data about the error is placed in the Data field. See RFC 4271 for all error codes. Additional documents extending BGP add error subcodes. Error messages for the BGP Extension for IPv6 are specified in RFC 2858.

KEEPALIVE messages contain no data whatsoever, just the BGP message header with the message type 4. They are used to prevent a BGP connection from timing out.
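Both message types are simple enough to build by hand. The following Python sketch assumes a marker of all ones (no authentication) and a 1-byte Subcode field. The error code 6 used for Cease in the usage note is taken from RFC 4271. The function names are hypothetical.

```python
import struct

MARKER = b"\xff" * 16  # all ones: no authentication in use


def build_message(msg_type: int, body: bytes = b"") -> bytes:
    """Prepend the 19-byte BGP header to a message body."""
    length = 19 + len(body)
    if length > 4096:
        raise ValueError("BGP messages are limited to 4096 bytes")
    return MARKER + struct.pack("!HB", length, msg_type) + body


def build_keepalive() -> bytes:
    """A KEEPALIVE is just the header with message type 4."""
    return build_message(4)


def build_notification(error_code: int, subcode: int,
                       data: bytes = b"") -> bytes:
    """A NOTIFICATION carries Error Code, Subcode, and optional Data."""
    return build_message(3, struct.pack("!BB", error_code, subcode) + data)
```

`build_keepalive()` returns exactly 19 bytes, and `build_notification(6, 0)` produces a 21-byte Cease message.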

8.4.7. BGP Multiprotocol Extension for IPv6

BGP-4 carries only three pieces of information that are truly IPv4-specific:

NLRI (feasible and withdrawn) in the UPDATE message contains an IPv4 prefix.
NEXT_HOP path attribute in the UPDATE message contains an IPv4 address.
BGP Identifier is in the OPEN message and in the AGGREGATOR attribute.

To make BGP-4 available for other network layer protocols, the multiprotocol NLRI and its next hop information must be added. RFC 2858 extends BGP to support multiple network layer protocols. IPv6 is one of the protocols supported, as emphasized in a separate document (RFC 2545). To accommodate the new requirement for multiprotocol support, BGP-4 adds two new attributes to advertise and withdraw multiprotocol NLRI. The BGP Identifier stays unchanged. BGP-4 routers with IPv6 extensions therefore still need a local IPv4 address. To establish a BGP connection exchanging IPv6 prefixes, the peering routers need to advertise the optional parameter BGP capability to indicate IPv6 support. BGP connections and route selection remain unchanged. Each implementer needs to extend the RIB to accommodate IPv6 routes. Policies need to take IPv6 NLRI and next hop information into consideration for route selection.

An UPDATE message advertising only IPv6 NLRI sets the unfeasible route length field to 0 and carries no IPv4 NLRI. All advertised or withdrawn IPv6 routes are carried within the MP_REACH_NLRI and MP_UNREACH_NLRI. The UPDATE must carry the path attributes ORIGIN and AS_PATH; in IBGP connections it must also carry LOCAL_PREF. The NEXT_HOP attribute should not be carried. If the UPDATE message contains the NEXT_HOP attribute, the receiving peer must ignore it. All other attributes can be carried and are recognized.

An UPDATE message can advertise both IPv6 NLRI and IPv4 NLRI having the same path attributes. In this case, all fields can be used. For IPv6 NLRI, however, the NEXT_HOP attribute should be ignored. IPv4 and IPv6 NLRI are separated in the corresponding RIB.
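The attribute requirements for an IPv6-only UPDATE can be expressed as a small check. This Python sketch uses the attribute type codes from Table 8-11. The function names are hypothetical, and the check covers only the rules stated in this section.

```python
from typing import Set

ORIGIN, AS_PATH, NEXT_HOP, LOCAL_PREF = 1, 2, 3, 5
MP_REACH_NLRI, MP_UNREACH_NLRI = 14, 15


def missing_attributes(attribute_codes: Set[int], internal_peer: bool) -> Set[int]:
    """Return the required attribute codes absent from an IPv6-only UPDATE."""
    required = {ORIGIN, AS_PATH}
    if internal_peer:
        # On IBGP connections LOCAL_PREF must be carried as well.
        required.add(LOCAL_PREF)
    return required - attribute_codes


def attributes_to_process(attribute_codes: Set[int]) -> Set[int]:
    """For IPv6 NLRI, a received NEXT_HOP attribute is ignored."""
    return attribute_codes - {NEXT_HOP}
```

An UPDATE from an internal peer that carries only ORIGIN, AS_PATH, and MP_REACH_NLRI would be reported as missing attribute 5 (LOCAL_PREF).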

8.4.7.1. MP_REACH_NLRI path attribute

This optional nontransitive attribute allows the exchange of feasible IPv6 NLRI to a peer, along with its next hop IPv6 address. The NLRI and the next hop are delivered in one attribute, as depicted in Figure 8-41.

Figure 8-41. The MP_REACH_NLRI path attribute for IPv6

The fields comprising the MP_REACH_NLRI path attribute are detailed in the following list:

Address Family Identifier (AFI) (2 bytes)

Defines the network layer protocol. IPv6 uses the value 0x0002 (hexadecimal) as specified at http://www.iana.org/numbers.html (see RFC 3232).

Subsequent Address Family Identifier (SAFI) (1 byte)

Defines whether the protocol uses unicast forwarding (SAFI=1), multicast forwarding (SAFI=2), or both (SAFI=3).

Length of the next hop network address (1 byte)

Defines the number of bytes used for the Next Hop Address field. IPv6 sets this field to either 16 or 32, depending on the number of next hop addresses provided.

Network address of next hop

Contains the next hop IPv6 address of this IPv6 route. This field is updated when advertising this route to an external peer. The router chooses its own IPv6 global/local address of the link to the external peer. This field is generally not updated when advertising this route to an internal peer. If the next hop IPv6 address and the peer IPv6 address share a common link (e.g., a link between two external peers), the link-local address of the common link should be added as a second next hop address. In return, when advertising this route to an internal peer, the link-local address received from an external peer needs to be removed.

RFC 2545 still uses the term site-local address instead of local address. Site-local addresses have been deprecated in the meantime. For more information, refer to the sections in Chapter 3 about IPv6 addressing.

Number of SNPA (1 byte)

Defines the number of Subnetwork Points of Attachment (SNPAs) that follow right after this field. SNPAs carry additional information about the router identified by the next hop address. IPv6 does not use this field and sets it to 0. Therefore, no SNPA data field follows.

Network Layer Reachability Information (NLRI)

A list of IPv6 NLRI that are advertised with this attribute. Each NLRI is encoded as <length, prefix>. The 1-byte Length field defines the length of the prefix in bits. The Prefix field is padded with trailing bits to the next full octet. The length of this field is the remaining length after deducting the length of all previous fields from the attribute length.
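Putting the fields together, the value of an MP_REACH_NLRI attribute for IPv6 could be assembled as in the Python sketch below. It produces only the attribute data, without the attribute header. It assumes that prefix lengths are given in bits and that the SNPA count is always 0. The function names are hypothetical.

```python
import ipaddress
import struct
from typing import List

AFI_IPV6 = 2


def encode_ipv6_prefix(prefix: str) -> bytes:
    """Encode one NLRI as <length in bits, prefix padded to a full octet>."""
    network = ipaddress.IPv6Network(prefix)
    byte_len = (network.prefixlen + 7) // 8
    return bytes([network.prefixlen]) + network.network_address.packed[:byte_len]


def encode_mp_reach_nlri(next_hops: List[str], prefixes: List[str],
                         safi: int = 1) -> bytes:
    """Build the attribute value: AFI, SAFI, next hop(s), SNPA count, NLRI."""
    if len(next_hops) not in (1, 2):
        raise ValueError("one global and at most one link-local next hop")
    next_hop = b"".join(ipaddress.IPv6Address(h).packed for h in next_hops)
    value = struct.pack("!HBB", AFI_IPV6, safi, len(next_hop))  # length 16 or 32
    value += next_hop
    value += b"\x00"  # Number of SNPAs: always 0 for IPv6
    value += b"".join(encode_ipv6_prefix(p) for p in prefixes)
    return value
```

Advertising 2001:db8:1::/48 with the single next hop 2001:db8::1 gives a 28-byte value: 4 bytes for AFI, SAFI, and next hop length, 16 bytes of next hop, 1 SNPA count byte, and 7 bytes of NLRI.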

8.4.7.2. MP_UNREACH_NLRI path attribute

This optional nontransitive attribute allows the sending peer to withdraw multiple IPv6 routes that are no longer valid. As illustrated in Figure 8-42, it basically contains a list of IPv6 prefixes that the peer should remove from its RIB.

Figure 8-42. The MP_UNREACH_NLRI path attribute for IPv6

The fields comprising the MP_UNREACH_NLRI path attribute are detailed in the following list:

Address Family Identifier (AFI) (2 bytes): Defines the network layer protocol. IPv6 uses the value 0x0002 (hexadecimal).
Subsequent Address Family Identifier (SAFI) (1 byte): Defines whether the protocol uses unicast forwarding (SAFI=1), multicast forwarding (SAFI=2), or both (SAFI=3).
Withdrawn routes: A list of IPv6 NLRI that are withdrawn from service. Each NLRI is encoded as <length, prefix>. The 1-byte Length field defines the length of the prefix in bits. The Prefix field is padded with trailing bits to the next full octet. The length of this field is the remaining length after deducting the length of all previous fields from the attribute length.
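The receiving side of a withdrawal can be sketched in the same way. This Python example decodes the value of an MP_UNREACH_NLRI attribute (without its attribute header) into a list of IPv6 prefixes. It assumes prefix lengths are given in bits, and the function name is hypothetical.

```python
import ipaddress
import struct
from typing import List, Tuple


def parse_mp_unreach_nlri(value: bytes) -> Tuple[int, List[str]]:
    """Return (safi, withdrawn_prefixes) from an MP_UNREACH_NLRI value."""
    if len(value) < 3:
        raise ValueError("attribute value too short")
    afi, safi = struct.unpack("!HB", value[:3])
    if afi != 2:
        raise ValueError("not an IPv6 address family")
    prefixes = []
    offset = 3
    while offset < len(value):
        bit_len = value[offset]
        if bit_len > 128:
            raise ValueError("IPv6 prefix length exceeds 128 bits")
        byte_len = (bit_len + 7) // 8  # prefix is padded to a full octet
        raw = value[offset + 1:offset + 1 + byte_len]
        if len(raw) < byte_len:
            raise ValueError("truncated NLRI")
        address = ipaddress.IPv6Address(raw.ljust(16, b"\x00"))
        prefixes.append(f"{address}/{bit_len}")
        offset += 1 + byte_len
    return safi, prefixes
```

A router would then remove each returned prefix from the IPv6 RIB associated with the sending peer.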