9.2 The Finite State Machine


The operational functions of BGP are defined in RFC 1771. These definitions should be complied with for any implementation to be successful. Failure to do so could give rise to serious concerns about the integrity of the information provided by the protocol and potentially create a chaotic state of routing.

It is necessary to understand that BGP operates as a finite state machine (FSM). This means that at any point in its operation the process is in exactly one defined state. The process cannot move to another state or perform other functions until a predetermined set of criteria has been met. Once those criteria are met, the process proceeds to another predefined state, where a new set of criteria applies, and so on. The same mechanism is what allows BGP to handle error conditions.

9.2.1 Transport

BGP uses TCP (port 179) as the underlying transport protocol. TCP provides reliable delivery, error detection, and retransmission of the higher-level data when necessary. TCP relies on IP as the network layer protocol, so as long as there is IP connectivity to a destination, the TCP session should remain active. This will become relevant during our discussion on peering and how logical peering works.

9.2.2 Events

In BGP there are 13 different events that cause the FSM to change states. Since an FSM is based on a calculated response to a predetermined number of possible events, the state change is predictable. However, the state-event correlation relies not only on the event, but on the current state of the FSM. The following events allow the FSM state to change:

  • BGP start

  • BGP stop

  • BGP transport connection Open

  • BGP transport connection Closed

  • BGP transport connection open Failed

  • BGP transport fatal error

  • Connect-retry timer expired

  • Hold timer expired

  • Keepalive timer expired

  • Receive OPEN message

  • Receive KEEPALIVE message

  • Receive UPDATE message

  • Receive NOTIFICATION message

The next section will discuss the various connection states and how the above listed events determine state change.

9.2.3 Connection States

There are six different states to the FSM as documented in RFC 1771. Each state represents where the BGP process is in terms of operation, and also dictates which events can cause the state to change.

9.2.3.1 Idle

The initial state of the BGP process is Idle. In this state, the local system is essentially waiting for some start event to get the process moving. The most common start event is nothing more than the network administrator configuring the local system to peer with a remote system.

Depending on which configuration parameters are used, the local system will either wait for an incoming TCP connection on port 179 and subsequent OPEN messages, or it will attempt to establish a TCP connection and begin sending OPEN messages. In JUNOS, if the passive parameter is used, the local system will listen for an incoming TCP connection on TCP port 179, then listen for incoming OPEN messages. If the passive parameter is not used, the local system will start a connect-retry timer and attempt to establish a TCP connection on port 179 to the remote system. BGP is typically configured on one router and then on the other, so the first one is usually trying to establish a connection before the remote system has been configured. If both systems attempt to create a peering session at the same time, the potential for two peering sessions between the same neighbors exists. RFC 1771 outlines a mechanism called connection-collision detection, which prevents multiple sessions between the same neighbors from being established. Once the TCP connection attempt is started, the FSM will transition to the Connect state.

9.2.3.2 Connect

The FSM is in this state while the local system waits for the TCP connection on port 179 to be completed. This state is key to the overall operation of BGP. If the local system cannot transition out of the Connect state, there is a potential problem with establishing the TCP connection. This can be a useful tip in troubleshooting BGP peering problems.

When the TCP connection is established, the local system will reset the connect-retry timer it started earlier and will send an OPEN message to the remote system. When the local system sends the first OPEN message, the FSM transitions to the OpenSent state.

9.2.3.3 Active

If the router is unable to create the TCP connection, the FSM will transition to the Active state. In this state, the local system will still try to create the TCP connection. If it is able to complete the connection, the local system will send out the OPEN message to the potential peer, and the FSM will transition to the OpenSent state. In the Active state, the local system is capable of receiving a connection attempt from a potential neighbor.

9.2.3.4 OpenSent

When the local system sends out its OPEN message, the FSM will transition to the OpenSent state. When this occurs, the local system will listen for an OPEN message from the remote system. When the local system receives an OPEN message from the remote system, it sends out a KEEPALIVE message to the remote, and the FSM transitions to the OpenConfirm state.

When the OPEN message is received from the remote system, the local system is able to determine certain characteristics of the session. If the remote ASN differs from the local system's ASN, the session will be external; if it is the same, the session will be internal.

9.2.3.5 OpenConfirm

When the FSM in the local system reaches this state, it waits for a KEEPALIVE message from the remote system. When the local system receives the KEEPALIVE message, the FSM changes to the Established state. If the hold timer expires first, the local system sends a NOTIFICATION message to the remote system, and the FSM changes to the Idle state.

9.2.3.6 Established

When the FSM for the peering session transitions to the Established state, the peers are now ready to begin sending UPDATE messages to exchange reachability information.

When a BGP session is truly UP and the FSM is in the Established state, the following can be assumed:

  • The BGP process has been started.

  • A reliable transport session has been created.

  • OPEN messages have been exchanged successfully, and session parameters have been negotiated.

  • KEEPALIVE messages have been received successfully, and no error condition exists.
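The state descriptions above can be sketched as a small transition table. This is a minimal illustration covering only the transitions described in this section, not the full RFC 1771 state machine; the names State, Event, TRANSITIONS, next_state, and run are hypothetical, and sending any unlisted state/event pair back to Idle is a simplifying assumption.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    CONNECT = auto()
    ACTIVE = auto()
    OPEN_SENT = auto()
    OPEN_CONFIRM = auto()
    ESTABLISHED = auto()


class Event(Enum):
    START = auto()
    TRANSPORT_OPEN = auto()
    TRANSPORT_OPEN_FAILED = auto()
    HOLD_TIMER_EXPIRED = auto()
    RECV_OPEN = auto()
    RECV_KEEPALIVE = auto()


# Only the transitions described in the text (a deliberate simplification).
TRANSITIONS = {
    (State.IDLE, Event.START): State.CONNECT,
    (State.CONNECT, Event.TRANSPORT_OPEN): State.OPEN_SENT,
    (State.CONNECT, Event.TRANSPORT_OPEN_FAILED): State.ACTIVE,
    (State.ACTIVE, Event.TRANSPORT_OPEN): State.OPEN_SENT,
    (State.OPEN_SENT, Event.RECV_OPEN): State.OPEN_CONFIRM,
    (State.OPEN_CONFIRM, Event.RECV_KEEPALIVE): State.ESTABLISHED,
    (State.OPEN_CONFIRM, Event.HOLD_TIMER_EXPIRED): State.IDLE,
    (State.ESTABLISHED, Event.RECV_KEEPALIVE): State.ESTABLISHED,
}


def next_state(state, event):
    """The new state depends on both the current state and the event."""
    # Assumption: any pair not listed above drops the session back to Idle.
    return TRANSITIONS.get((state, event), State.IDLE)


def run(events, state=State.IDLE):
    """Feed a sequence of events through the FSM and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state
```

Feeding the events START, TRANSPORT_OPEN, RECV_OPEN, and RECV_KEEPALIVE through run() walks the session from Idle to Established, which mirrors the normal peering sequence described in the text.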

9.2.4 Message Types and Formats

The importance of messages was shown in the previous section on FSM events and states. If you refer to the list of events, you will see the last four events involve the OPEN, UPDATE, NOTIFICATION, and KEEPALIVE messages. Each message has a distinct purpose and provides details necessary for establishing, maintaining, and discontinuing BGP peer sessions between local and remote systems.

The maximum message size supported in BGP is 4,096 bytes, and the minimum message size is 19 bytes. The minimum-size message contains only the BGP header without any trailing data and is used as the KEEPALIVE message. Every message begins with a fixed-length header. There are three parts to a message header (see Figure 9-14):

  1. Marker (16 bytes): contains all 1s in an OPEN message and whenever the OPEN message carried no authentication information; otherwise, the value is computed by the authentication mechanism configured for the BGP peering session

  2. Length (2 bytes): indicates the total length of the BGP message, including the header, to a receiving system

  3. Type (1 byte): indicates the message type

    OPEN: 0001 (1)

    UPDATE: 0010 (2)

    NOTIFICATION: 0011 (3)

    KEEPALIVE: 0100 (4)

Figure 9-14. BGP Message Header
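The header layout described above can be sketched with Python's struct module. This is an illustrative sketch only: the function names build_message and parse_header are hypothetical, and the marker is assumed to be all 1s (no authentication).

```python
import struct

# Marker (16 bytes), Length (2 bytes), Type (1 byte), in network byte order.
HEADER_FORMAT = "!16sHB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 19 bytes
MAX_MESSAGE_SIZE = 4096
MARKER = b"\xff" * 16  # all 1s; assumes no authentication is in use
MSG_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def build_message(msg_type, payload=b""):
    """Prefix a payload with a BGP header; Length covers header plus payload."""
    length = HEADER_SIZE + len(payload)
    if length > MAX_MESSAGE_SIZE:
        raise ValueError("BGP message exceeds 4,096 bytes")
    return struct.pack(HEADER_FORMAT, MARKER, length, msg_type) + payload


def parse_header(data):
    """Return (length, type name) from the first 19 bytes of a message."""
    _marker, length, msg_type = struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE])
    if not HEADER_SIZE <= length <= MAX_MESSAGE_SIZE:
        raise ValueError("bad message length")
    if msg_type not in MSG_TYPES:
        raise ValueError("bad message type")
    return length, MSG_TYPES[msg_type]
```

A KEEPALIVE is simply build_message(4) with no payload, which yields the 19-byte minimum message.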

9.2.4.1 OPEN Message

The OPEN message is sent by the local system once the TCP connection between the two potential peers has been created. Figure 9-15 illustrates the following OPEN message fields:

  • Version (1 byte): This indicates the version of BGP used by the local system (in the case of Juniper Networks, it is BGP4).

  • My AS (2 bytes): This indicates the local system's ASN.

  • Hold time (2 bytes): This indicates the configured hold time of the local system. If the local and remote values are not equal, the session uses the lower of the two. A value of 0 indicates no use of the hold timer or keepalive timer.

  • BGP identifier (4 bytes): This is the RID field. In JUNOS, the use of the router-id statement will set this value. Otherwise, the value will be taken from the lowest configured IP address on the router.

  • Optional parameters length (1 byte): This indicates the length of the optional parameters field. When this field is set to 0, there will not be any optional parameters in the packet.

  • Optional parameters (variable): This is a variable-length field comprising the following parameters (see Figure 9-16):

    Parameter type (1 byte): This indicates the type of parameter that will be used. RFC 1771 only defines one optional parameter, authentication information (parameter type 1). RFC 2842 makes a provision to negotiate a list of capabilities. Without this, a BGP implementation that saw a parameter type it did not recognize would raise an error and close the peering session.

    Parameter length (1 byte): This indicates the length of the variable-length parameter value field, listed next.

    Parameter value (variable): This is interpreted according to the parameter type field.

    Figure 9-16. OPEN Message Parameters

Figure 9-15. OPEN Message Format
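As a rough sketch of the OPEN fields listed above, the fixed part of the message body can be packed and unpacked as follows. The BGP header is omitted, and the names build_open_body, parse_open_body, and session_type are hypothetical helpers introduced for illustration.

```python
import ipaddress
import struct

# Version (1), My AS (2), Hold time (2), BGP identifier (4), Opt parm len (1).
OPEN_FORMAT = "!BHH4sB"
OPEN_FIXED_SIZE = struct.calcsize(OPEN_FORMAT)  # 10 bytes


def build_open_body(my_as, hold_time, router_id, opt_params=b""):
    """Pack the OPEN fields that follow the BGP header (version fixed at 4)."""
    identifier = ipaddress.IPv4Address(router_id).packed
    fixed = struct.pack(OPEN_FORMAT, 4, my_as, hold_time, identifier, len(opt_params))
    return fixed + opt_params


def parse_open_body(body):
    """Unpack the fixed OPEN fields and return them as a dictionary."""
    version, my_as, hold_time, identifier, opt_len = struct.unpack(
        OPEN_FORMAT, body[:OPEN_FIXED_SIZE]
    )
    return {
        "version": version,
        "my_as": my_as,
        "hold_time": hold_time,
        "bgp_identifier": str(ipaddress.IPv4Address(identifier)),
        "opt_params": body[OPEN_FIXED_SIZE:OPEN_FIXED_SIZE + opt_len],
    }


def session_type(local_as, remote_as):
    """A session is internal when both ASNs match, external otherwise."""
    return "internal" if local_as == remote_as else "external"
```

With no optional parameters, the body is 10 bytes long and the optional parameters length field is 0.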

9.2.4.2 UPDATE Message

BGP uses UPDATE messages to exchange all routing information related to BGP. A single message can contain both NLRI with attributes and a list of withdrawn routes. If the attributes are equal, multiple prefixes can be sent in a single message. This functionality provides an efficient method of information exchange by combining multiple functions into a single message.

The UPDATE message can be thought of as having three distinct categories:

  1. Unfeasible routes (withdrawn routes that no longer exist)

  2. NLRI (prefix and mask, such as 192.168.0.0/16)

  3. Path attributes (such as MED or AS_PATH)

Unlike some IGPs, where route information is used to construct a topology (usually with the local system as root of that tree), BGP uses a list of ASs that the NLRI has passed through. This list is built on the concept of creating a loop-free path to the destination prefix.

Figure 9-17 illustrates the format of the UPDATE message. The BGP UPDATE message fields are as follows:

  • Unfeasible routes length (2 bytes): This indicates the length of the withdrawn routes field. A value of 0 tells the receiving system that no withdrawn routes are listed in this UPDATE message.

  • Withdrawn routes (variable): This consists of IP prefixes to be removed or withdrawn from the routing table. Each prefix is actually broken down into two parts, prefix length and prefix (see Figure 9-18). The length in this case is not related to the actual physical space of the prefix field, but rather to the subnet mask in terms of bits:

    Prefix length (16): refers to a mask of 255.255.0.0

    Prefix (10.5.0.0)

    The above example refers to 10.5.0.0/16.

    Figure 9-18. UPDATE: IP Prefix and Prefix Length Fields

Figure 9-17. UPDATE Message

There are two important points regarding the withdrawn routes field. First, the prefix length can be set to 0, in which case the prefix field itself is empty; RFC 1771 defines this as a prefix that matches all IP addresses. Second, the prefix field occupies only as many bytes as are needed to hold the number of bits given by the prefix length. When the prefix length is not a multiple of 8, RFC 1771 calls for trailing bits to pad the field out to the next byte boundary. These trailing bits are insignificant, as they are only used for padding.
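The prefix encoding just described can be sketched as follows: one byte of prefix length, followed by just enough bytes to hold that many bits. The function names encode_prefix and decode_prefix are hypothetical, and the sketch handles IPv4 only.

```python
import ipaddress


def encode_prefix(prefix):
    """Encode 'a.b.c.d/len' as <length byte><minimum number of prefix bytes>."""
    net = ipaddress.IPv4Network(prefix)
    nbytes = (net.prefixlen + 7) // 8  # round up to a whole number of bytes
    return bytes([net.prefixlen]) + net.network_address.packed[:nbytes]


def decode_prefix(data):
    """Decode one prefix; return (prefix string, number of bytes consumed)."""
    length = data[0]
    nbytes = (length + 7) // 8
    # Pad the truncated prefix back out to four bytes for display.
    raw = data[1:1 + nbytes] + b"\x00" * (4 - nbytes)
    return f"{ipaddress.IPv4Address(raw)}/{length}", 1 + nbytes
```

For 10.5.0.0/16, the encoded form is three bytes: the length (16) followed by the two significant bytes 10 and 5. A /25 needs four prefix bytes, with the last seven bits acting as padding.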

Figure 9-19 illustrates the attribute encoding of the UPDATE message.

  • Total path attribute length (2 bytes): This gives the length of the variable path attribute field. The field is always present; a value of 0 indicates that no NLRI is present in the UPDATE message.

  • Path attribute: Each path attribute is a triple of attribute type, attribute length, and attribute value. The attribute type (2 bytes) comprises two parts: attribute flags (1 byte) and attribute type code (1 byte).

  • The four high-order bits of the attribute flags determine the category of the attribute.

Figure 9-19. UPDATE: Attribute Encoding

Attributes are used to provide specific information regarding the characteristics of a particular prefix being advertised. Each of these attributes and its meaning is described in Section 9.2.5. For now, it is important to understand that BGP interprets and advertises attributes based upon four distinct categories as defined in RFC 1771 and shown in Figure 9-20. The first high-order bit of the attribute flags indicates whether the attribute is well known or optional (0 = well known, 1 = optional). The second high-order bit indicates whether it is transitive or nontransitive (0 = nontransitive, 1 = transitive). For well-known attributes, this bit must be set to 1 for transitive. The third high-order bit is the partial bit. This bit must be set to 0 for well-known and optional nontransitive attributes. The fourth high-order bit is the extended length bit. If this bit is set to 1, the attribute length field is 2 bytes rather than 1; extended length may be used only if the length of the attribute value is greater than 255 bytes. The four low-order bits are not currently used in BGP. The attribute flags are followed by these fields:

  • Attribute type code (1 byte)

  • Attribute length (1 byte, or 2 bytes when the extended length bit is set): length of the attribute value field

  • Attribute value: actual value of the attribute

Figure 9-20. UPDATE: Attribute Type Field
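The flag bits described above lend themselves to a short decoding sketch. The names decode_attr_flags, category, and parse_attribute are hypothetical; note that the flags alone cannot distinguish well-known mandatory from well-known discretionary, since that depends on the type code.

```python
def decode_attr_flags(flags):
    """Split the attribute flags byte into its four high-order bits."""
    return {
        "optional": bool(flags & 0x80),         # first high-order bit
        "transitive": bool(flags & 0x40),       # second high-order bit
        "partial": bool(flags & 0x20),          # third high-order bit
        "extended_length": bool(flags & 0x10),  # fourth high-order bit
    }


def category(flags):
    """Name the attribute category as far as the flags can determine it."""
    decoded = decode_attr_flags(flags)
    if not decoded["optional"]:
        return "well-known"
    return "optional transitive" if decoded["transitive"] else "optional nontransitive"


def parse_attribute(data):
    """Parse one path attribute; return (type code, value, bytes consumed)."""
    flags, type_code = data[0], data[1]
    if flags & 0x10:  # extended length: the length field is two bytes
        length = int.from_bytes(data[2:4], "big")
        offset = 4
    else:
        length = data[2]
        offset = 3
    return type_code, data[offset:offset + length], offset + length
```

For example, the bytes 0x40, 1, 1, 0 decode as a well-known transitive attribute with type code 1 (ORIGIN) and a one-byte value of 0.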

9.2.4.3 NOTIFICATION Message

Any BGP-speaking router will send a NOTIFICATION message whenever an error condition exists. When this message is sent, the sending router closes the BGP session and transitions to the Idle state. The NOTIFICATION message consists of three fields (see Figure 9-21):

  1. Error code (1 byte): see Table 9-3 for values.

  2. Error subcode (1 byte): see Table 9-3 for values.

  3. Data (variable): provides additional information on the error. For example, incorrect RID information received for a session would cause an error condition, and the offending data could be placed here.

Figure 9-21. NOTIFICATION Message Format
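A NOTIFICATION body can be sketched in a few lines, assuming the wire order of error code, error subcode, then data. The helper names are hypothetical, the BGP header is omitted, and the code names are those of Table 9-3.

```python
import struct

# Error code names as listed in Table 9-3.
ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}


def build_notification_body(code, subcode=0, data=b""):
    """Pack error code and subcode (1 byte each) followed by optional data."""
    return struct.pack("!BB", code, subcode) + data


def parse_notification_body(body):
    """Return (error code name, subcode, data) from a NOTIFICATION body."""
    code, subcode = struct.unpack("!BB", body[:2])
    return ERROR_CODES.get(code, "Unknown"), subcode, body[2:]
```

A hold-timer expiry, which has no subcode, is therefore just the two bytes 4 and 0.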

The error codes and subcodes indicate the major and minor reasons for which the error was detected. They are essential to the control of the BGP process over the session: without them, a peer would have no way of learning why a session was torn down, and the integrity of the route information passed between ASs and across the Internet would suffer. Table 9-3 lists these error codes. The number following each code is its decimal value.

Table 9-3. NOTIFICATION Message Error Codes
Error Code                        Error Subcode
Message Header Error (1)          Connection Not Synchronized (1)
                                  Bad Message Length (2)
                                  Bad Message Type (3)
OPEN Message Error (2)            Unsupported Version Number (1)
                                  Bad Peer AS (2)
                                  Bad BGP Identifier (3)
                                  Unsupported Optional Parameter (4)
                                  Authentication Failure (5)
                                  Unacceptable Hold Time (6)
UPDATE Message Error (3)          Malformed Attribute List (1)
                                  Unrecognized Well-known Attribute (2)
                                  Missing Well-known Attribute (3)
                                  Attribute Flags Error (4)
                                  Attribute Length Error (5)
                                  Invalid Origin Attribute (6)
                                  AS Routing Loop (7)
                                  Invalid NEXT_HOP Attribute (8)
                                  Optional Attribute Error (9)
                                  Invalid Network Field (10)
                                  Malformed AS_PATH (11)
Hold Timer Expired (4)
Finite State Machine Error (5)
Cease (6)

9.2.4.4 KEEPALIVE Message

KEEPALIVE messages are exchanged to let each peering neighbor know that the other is still there (see Figure 9-22). Two elements work together here: the hold timer and the KEEPALIVE messages that keep it from expiring. KEEPALIVE messages cannot be sent more frequently than one per second. When the BGP session is first negotiated, the HoldTime is agreed upon. If the agreed-upon HoldTime is 0, then no KEEPALIVE messages will be sent. As noted previously, a KEEPALIVE message consists only of the BGP header.

Figure 9-22. KEEPALIVE Message Format
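The relationship between the negotiated HoldTime and KEEPALIVE messages can be sketched as follows. Two points here are assumptions drawn from RFC 1771 rather than from the text above: the session uses the lower of the two advertised hold times, and one third of the hold time is treated as a reasonable keepalive interval. The function names are hypothetical.

```python
def negotiate_hold_time(local_hold, remote_hold):
    """Assumption (RFC 1771): the session uses the lower of the two values."""
    return min(local_hold, remote_hold)


def keepalive_interval(hold_time):
    """Return seconds between KEEPALIVEs, or None when none are sent."""
    if hold_time == 0:
        return None  # a HoldTime of 0 disables KEEPALIVE messages
    # Assumption: one third of the hold time, never faster than one per second.
    return max(1, hold_time // 3)
```

With hold times of 90 and 30 seconds configured on the two peers, this sketch settles on 30 seconds and sends a KEEPALIVE every 10 seconds.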

9.2.5 Attributes

This section will discuss the attributes associated with BGP and what each means. Attributes are passed in UPDATE messages. There are four categories of BGP path attributes:

  1. Well-known mandatory: must accompany each advertisement of the prefix

  2. Well-known discretionary: must be recognized by the BGP implementation, but it is at the discretion of the local system as to whether or not the attribute will be sent in any UPDATE messages

  3. Optional transitive: does not need to be known by the BGP implementation, but it is recommended, due to the attribute's transitive nature, for the local system to pass the attribute when announcing the prefix to other systems

  4. Optional nontransitive: does not need to be known by the BGP implementation, but, due to the nature of the attribute, it is recommended that it not be passed to other peering neighbors via the UPDATE message

Table 9-4 lists the ten most common attributes used in BGP by category name and type code. The type code is the decimal value that is used in the UPDATE message to specify the type of attribute being passed.

Table 9-4. BGP Path Attributes
Attribute           Category                    Type Code
ORIGIN              Well-known mandatory        1
AS_PATH             Well-known mandatory        2
NEXT_HOP            Well-known mandatory        3
MULTI_EXIT_DISC     Optional nontransitive      4
LOCAL_PREF          Well-known discretionary    5
ATOMIC_AGGREGATE    Well-known discretionary    6
AGGREGATOR          Optional transitive         7
COMMUNITY           Optional transitive         8
ORIGINATOR_ID       Optional nontransitive      9
CLUSTER_LIST        Optional nontransitive      10

9.2.5.1 ORIGIN

This attribute gives an indication of how this particular prefix was learned. The following possible values can be used:

  • 0 (IGP): learned from the IGP in the originating AS

  • 1 (EGP): learned from the EGP in the originating AS

  • 2 (Incomplete): origin of this prefix cannot be determined

By default, JUNOS uses the ORIGIN code value 0, whether or not the route was learned from the IGP, statically defined, or part of an aggregate route.

9.2.5.2 AS_PATH

The AS_PATH attribute lists the ASs through which the announcement of a prefix has passed. This attribute serves two functions: routing loop avoidance and path selection. If the receiving AS sees its own ASN in the AS_PATH list, it will ignore that announcement.

When a prefix is announced, the AS_PATH lists only the ASs that announced the prefix. A single AS may have 10 routers or 100, so the AS_PATH provides no additional granularity into how a packet would travel to a destination within a given AS.

The AS_PATH is built from path segments, each identified by a type field. If the type field has a value of 1 (AS_SET), then the resulting list is an unordered set of ASs. If the type value is 2 (AS_SEQUENCE), then the list that results is an ordered set of ASs. This means that when the local system readvertises the prefix, it will prepend the local system's ASN to the AS_SEQUENCE. Prepending always occurs at the left-most position of the sequence.

If the prefix originates in the local AS, then the border router will add the local AS to the prefix and send it to the external neighbor. If the prefix is advertised internally, then no prepending is necessary. Remember, the AS_PATH is the listing of ASs that have ANNOUNCED the prefix, ANNOUNCED meaning advertising to other external neighboring ASs.
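The two roles of the AS_PATH described above, loop avoidance and prepending, can be sketched as follows. The AS_PATH is modeled as a plain list of ASNs standing in for a single AS_SEQUENCE segment, and the function names are hypothetical.

```python
def accept_announcement(local_as, as_path):
    """Ignore an announcement whose AS_PATH already contains the local ASN."""
    return local_as not in as_path


def path_to_advertise(local_as, as_path, external_neighbor):
    """Prepend the local ASN only when advertising to an external neighbor."""
    if external_neighbor:
        return [local_as] + list(as_path)  # prepend at the left-most position
    return list(as_path)  # internal advertisement: AS_PATH is unchanged
```

For a prefix that originates in AS 65001, the path starts out empty; the border router advertises it externally with an AS_PATH of just 65001, and each subsequent AS adds itself on the left.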

9.2.5.3 NEXT_HOP

This attribute is vital in the route-selection process. In short, the NEXT_HOP attribute indicates the IP address of the border router that can be used to reach a given destination, not the next-hop as in interface or gateway to the next Layer 3 device. The show route <prefix> detail or show route <prefix> extensive commands can be used to see both physical next-hops and protocol or border router next-hops. Understanding NEXT_HOP and how it is used is essential to understanding BGP route selection. A case study in Sections 10.2.2 and 10.2.3 covers NEXT_HOP.

9.2.5.4 MED

MED is a metric specified by an announcing external neighbor to identify the ingress point to use in the announcing AS for a given prefix. This attribute is used by the announcing system to influence the local system's decision process. When it is received, it can be propagated via IBGP, but does not get propagated when the local AS advertises the route to another external AS. A case study in Section 10.2.4 can be referenced for more information.

9.2.5.5 LOCAL_PREF

LOCAL_PREF is used by IBGP to influence internal routers to use a particular border router to reach a given prefix. The higher the value, the better the degree of preference. This means that if border router A advertises prefix 10.10.0.0/16 with a LOCAL_PREF of 100, and border router B advertises 10.10.0.0/16 with a LOCAL_PREF of 150, the internal routers will choose to send packets via border router B.
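The example above can be expressed as a one-line selection. This sketch considers LOCAL_PREF only and ignores every other step of BGP route selection; the function name preferred_exit and the (router, LOCAL_PREF) tuple layout are hypothetical.

```python
def preferred_exit(advertisements):
    """Pick the border router whose advertisement has the highest LOCAL_PREF.

    advertisements is a list of (border_router, local_pref) pairs for one prefix.
    """
    router, _local_pref = max(advertisements, key=lambda adv: adv[1])
    return router
```

Given the advertisements ("A", 100) and ("B", 150) for 10.10.0.0/16, the function returns "B".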

9.2.5.6 ATOMIC_AGGREGATE

Aggregation is a method by which the local system advertises a route representative of several more specific routes that it knows about. Aggregation can cause a loss of information relating to the more specific prefixes, such as AS_PATH. When that happens, the local system attaches the ATOMIC_AGGREGATE attribute to the prefix when advertising it. This is important for the receiving system: if a system receives a prefix with the ATOMIC_AGGREGATE attribute set, it must not make that prefix more specific when advertising it to other systems. It can also be assumed in some cases that traffic for routes carrying the ATOMIC_AGGREGATE attribute will traverse ASs that are not in the AS_PATH list, hence the need to signal other systems that aggregation has occurred.

9.2.5.7 AGGREGATOR

If a local system performs aggregation on a series of routes, it will include in the aggregated prefix advertisement the local system ASN and local system IP address that performed the aggregation.

9.2.5.8 COMMUNITY

Communities and policy go hand in hand with BGP. You can assign several prefixes the same values by including them in a particular community. Associated with this attribute are three well-known communities, as defined in RFC 1997:

  1. NO_EXPORT (0xFFFFFF01): Routes with this attribute value must not be advertised outside the AS boundary.

  2. NO_ADVERTISE (0xFFFFFF02): When this attribute is set, the local system does not advertise this route to other BGP neighbors.

  3. NO_EXPORT_SUBCONFED (0xFFFFFF03): Routes with this attribute set must not be advertised outside the local AS to external peers. In confederation scenarios, a confederation boundary router will not advertise the route to other external routers, even in the same parent AS.

Communities are essential in service provider networks. They can play a vital role in route coloring and further enhancing the routing domain's ability to maintain routing-policy control. Section 11.7 provides coverage on the use of communities.
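The effect of the three well-known communities on advertisement can be sketched as follows. This is a simplified illustration that assumes no confederations are in use, so every external peer is outside the local AS; the function name may_advertise is hypothetical.

```python
NO_EXPORT = 0xFFFFFF01
NO_ADVERTISE = 0xFFFFFF02
NO_EXPORT_SUBCONFED = 0xFFFFFF03


def may_advertise(communities, peer_is_external):
    """Decide whether a route may be advertised to a given neighbor.

    Assumption: no confederations, so NO_EXPORT and NO_EXPORT_SUBCONFED
    behave identically and simply block advertisement to external peers.
    """
    if NO_ADVERTISE in communities:
        return False  # never advertised to any BGP neighbor
    if peer_is_external and (
        NO_EXPORT in communities or NO_EXPORT_SUBCONFED in communities
    ):
        return False
    return True
```

A route tagged NO_EXPORT is still passed to internal neighbors but is held back from external ones, while NO_ADVERTISE stops the route at the local system.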

9.2.5.9 ORIGINATOR_ID

This attribute is defined in RFC 1966. Simply put, the ORIGINATOR_ID is the RID of the router that originated the route into the AS. Route reflectors will not send a route learned from an originator back to that originator. The route-reflector server will set this attribute when advertising the prefix to internal neighbors. This attribute will not be sent to external neighbors.

9.2.5.10 CLUSTER_LIST

This is an optional nontransitive attribute and is used by BGP in route-reflector scenarios as well. The route-reflector server adds its CLUSTER_ID to the CLUSTER_LIST when it reflects a route. Any route received whose CLUSTER_LIST contains the local CLUSTER_ID will be ignored. This, too, is part of the loop avoidance scheme in BGP route reflection and is especially useful when implementing multiple route-reflector clusters within a single AS.
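The route-reflection loop checks can be sketched as follows. The CLUSTER_LIST check follows the description above; the receiver-side ORIGINATOR_ID check is an assumption based on later route-reflection specifications rather than on this text. All function and parameter names are hypothetical.

```python
def accept_reflected_route(local_rid, local_cluster_id, originator_id, cluster_list):
    """Apply the two loop-avoidance checks used with route reflection."""
    # Assumption: a router ignores a route it originated itself.
    if originator_id == local_rid:
        return False
    # A route that already passed through this cluster is ignored.
    if local_cluster_id in cluster_list:
        return False
    return True


def reflect(local_cluster_id, cluster_list):
    """Return the CLUSTER_LIST a route reflector would attach when reflecting."""
    return [local_cluster_id] + list(cluster_list)
```

A route whose CLUSTER_LIST already holds the local CLUSTER_ID has looped back into the cluster and is dropped, which is what makes multiple clusters within one AS safe to interconnect.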



Juniper Networks Reference Guide. JUNOS Routing, Configuration, and Architecture
ISBN: 0201775921
Year: 2002
Pages: 176
