Inter-AS Multicasting

 

A challenge facing any multicast routing protocol (or any unicast routing protocol, for that matter) is scaling efficiently to the set of hosts requiring delivery of packets. You have seen how dense mode protocols such as PIM-DM and DVMRP do not scale well; by definition, the protocols assume that most hosts in the multicast domain are group members. PIM-SM, being a sparse mode protocol, scales better because it assumes most hosts in the multicast domain are not group members. Yet both dense mode and sparse mode protocols assume that they span a single domain. In other words, all the IP multicast routing protocols you have examined so far can be considered multicast IGPs.

How, then, can multicast packets be delivered across AS boundaries while maintaining the autonomy of each AS?

The PIM-SM Internet Draft begins to address the issue by defining a PIM Multicast Border Router (PMBR). The PMBR resides at the edge of a PIM domain and builds special branches to all RPs in the domain, as illustrated in Figure 7-4. Each branch is represented by a (*,*,RP) entry, where the two wildcard components represent all source and group addresses that map to that RP. When an RP receives traffic from a source, it forwards the traffic to the PMBR, which then forwards the traffic into the neighboring domain. The PMBR depends on the neighboring domain to send it prunes for any unwanted traffic, and the PMBR then sends prunes to the RP.

Figure 7-4. A PIM Multicast Border Router Forms Multicast Branches to Each RP in Its Domain Called (*, *, RP) Branches. RPs Forward All Source Traffic to the PMBR Along These Branches

The shortcoming of the PMBR concept is this flood-and-prune behavior. In fact, PMBRs were proposed primarily to connect PIM-SM domains to DVMRP domains. Because of the poor scalability inherent in the approach, Cisco IOS Software does not support PMBRs.

Accepting that PIM-SM is the de facto standard IP multicast routing protocol, the question of how to route multicast traffic between autonomous systems can be reduced to a question of how to route between PIM-SM domains. Two issues must be addressed:

  • When a source is in one domain and group members are in other domains, RPF procedures must remain valid.

  • To preserve autonomy, a domain cannot rely on an RP in another domain.

PIM-SM is protocol-independent, so the first issue seems easy enough to resolve. Just as PIM uses the unicast IGP routes to determine RPF interfaces within a domain, it can use BGP routes to determine RPF interfaces to sources in other autonomous systems. When moving traffic between domains, however, you may want your multicast traffic to use different links from your unicast traffic, as shown in Figure 7-5. If a multicast packet arrives on link A, and BGP indicates that the unicast route to the packet's source is via link B, the RPF check fails. Static mroutes could be used to prevent RPF problems, but they are not practical on a large scale. Instead, BGP must be extended so that it can indicate whether an advertised prefix is to be used for unicast routing, multicast RPF checks, or both.

Figure 7-5. Inter-AS Traffic Engineering Requirements May Dictate That Multicast Traffic Pass Over a Link Separate from Unicast Traffic
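
The RPF failure described above can be sketched in a few lines of Python. This is only an illustration: the table contents, the interface names link-A and link-B, and the helper names are assumptions made for the example, and the rule of consulting a multicast RPF table before the unicast table is a simplification of what a router does.

```python
import ipaddress

def longest_match(table, address):
    """Return the (network, interface) entry with the longest matching prefix, or None."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, interface in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, interface)
    return best

def rpf_check(source, arrival_interface, multicast_rpf_table, unicast_table):
    """Accept a packet only if it arrived on the interface that leads back to its source.

    The multicast RPF table is consulted first; the unicast table is the fallback.
    """
    match = longest_match(multicast_rpf_table, source) or longest_match(unicast_table, source)
    return match is not None and match[1] == arrival_interface

# Hypothetical routes: unicast traffic toward 172.16.0.0/16 uses link B,
# but multicast traffic from that prefix is expected on link A.
unicast_table = {"172.16.0.0/16": "link-B"}
multicast_rpf_table = {"172.16.0.0/16": "link-A"}

# With unicast routes only, a packet arriving on link A fails the check.
unicast_only = rpf_check("172.16.5.4", "link-A", {}, unicast_table)
# With a separate multicast RPF route, the same packet passes.
with_multicast_routes = rpf_check("172.16.5.4", "link-A", multicast_rpf_table, unicast_table)
```

Here unicast_only is False and with_multicast_routes is True, which is the distinction the extended BGP advertisements are meant to carry.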

As it happens, PIM can take advantage of existing extensions to BGP. The extended version of BGP is called Multiprotocol BGP (MBGP) and is described in RFC 2283.[2] Although the extensions were created to allow BGP to carry reachability information for protocols such as IPv6 and IPX, the widespread application of MBGP is to advertise multicast sources. As a result, the "M" in MBGP is frequently and inaccurately thought to represent "multicast" rather than "multiprotocol."

The most common application of MBGP is for peer connections at NAPs among service providers that have agreed to exchange multicast traffic. As Figure 7-6 shows, the autonomous systems may be peered for unicast traffic but must share a separate peering point for multicast traffic. Some prefixes will be advertised over both the unicast and multicast NAPs, so MBGP is used to differentiate multicast RPF paths from unicast paths.

Figure 7-6. MBGP Is Used When Separate Peering Points Are Required for Multicast and Unicast

NOTE

Multicast NAPs are usually some nonswitched medium such as FDDI, as depicted in Figure 7-6.


The second inter-AS PIM issue (to preserve autonomy, a domain cannot rely on an RP in another domain) stems from the fact that an AS does not want to depend on an RP that it does not control. If each AS places its own RPs, however, there must be a protocol that each RP can use to share its source information with other RPs across AS boundaries and in turn discover sources known by other RPs, as illustrated in Figure 7-7. That protocol is the Multicast Source Discovery Protocol (MSDP).[3]

Figure 7-7. Multicast Source Discovery Protocol Is Spoken Between RPs and Allows Each RP to Discover Sources Known by Other RPs

The following two sections describe the MBGP extensions and the operation of MSDP.

Multiprotocol Extensions for BGP (MBGP)

RFC 2283 extends BGP for multiprotocol support by defining two new attributes:

  • Multiprotocol Reachable NLRI, or MP_REACH_NLRI (type 14)

  • Multiprotocol Unreachable NLRI, or MP_UNREACH_NLRI (type 15)

NOTE

See Chapter 2, Table 2-7, for a more complete list of BGP attribute type codes.


Both attributes are optional, nontransitive. Recall from Chapter 2, "Introduction to Border Gateway Protocol 4," that this means BGP speakers are not required to support the attributes, and BGP speakers that do not support the attributes do not pass them to their peers.

The MP_REACH_NLRI attribute advertises feasible routes, and MP_UNREACH_NLRI withdraws feasible routes. The Network Layer Reachability Information (NLRI) contained in the attributes is the protocol-specific destination information. When MBGP is used for IP multicast, the NLRI is always an IPv4 prefix describing one or more multicast sources. Remember that PIM routers do not use this information for packet forwarding but only for determining the RPF interface toward a particular source. These two new attributes provide the capability of signaling to a BGP peer whether a particular prefix is to be used for unicast routing, multicast RPF, or both.

The MP_REACH_NLRI consists of one or more [Address Family Information, Next Hop Information, NLRI] triples. The MP_UNREACH_NLRI consists of one or more [Address Family Information, Unfeasible Routes Length, Withdrawn Routes] triples.

NOTE

The complete format of the MP_REACH_NLRI is more complicated than is indicated here; some fields are irrelevant to IP multicast. For a complete description, see RFC 2283.


The Address Family Information consists of an Address Family Identifier (AFI) and a Subsequent AFI (Sub-AFI). The AFI for IPv4 is 1, so it is always set to 1 for IP multicast.

The Sub-AFI describes whether the NLRI is to be used for unicast routing only, multicast RPF information only, or both, as documented in Table 7-3.

Table 7-3. Subsequent Address Family Identifiers
Sub-AFI Description
1 Unicast route information only
2 Multicast RPF information only
3 Prefix can be used for both unicast routing information and multicast RPF information
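
As a rough illustration of how the Sub-AFI values in Table 7-3 steer a prefix, the following Python sketch places an advertised prefix into a unicast table, a multicast RPF table, or both. The table names, the install_prefix helper, and the sample prefixes and next hops are assumptions for the example, not part of any BGP implementation.

```python
AFI_IPV4 = 1

# Sub-AFI values from Table 7-3, mapped to the tables a prefix is installed in.
SUB_AFI_USAGE = {
    1: {"unicast"},
    2: {"multicast_rpf"},
    3: {"unicast", "multicast_rpf"},
}

def install_prefix(afi, sub_afi, prefix, next_hop, tables):
    """Place an advertised IPv4 prefix into the unicast table, the multicast RPF table, or both."""
    if afi != AFI_IPV4:
        raise ValueError(f"unsupported AFI {afi}")
    if sub_afi not in SUB_AFI_USAGE:
        raise ValueError(f"unknown Sub-AFI {sub_afi}")
    for name in SUB_AFI_USAGE[sub_afi]:
        tables[name][prefix] = next_hop

# Hypothetical advertisements, one per Sub-AFI value.
tables = {"unicast": {}, "multicast_rpf": {}}
install_prefix(1, 1, "10.1.0.0/16", "192.0.2.1", tables)
install_prefix(1, 2, "10.2.0.0/16", "192.0.2.2", tables)
install_prefix(1, 3, "10.3.0.0/16", "192.0.2.3", tables)
```

After these calls, 10.1.0.0/16 appears only in the unicast table, 10.2.0.0/16 only in the multicast RPF table, and 10.3.0.0/16 in both.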

Operation of Multicast Source Discovery Protocol (MSDP)

The purpose of MSDP is, as the name states, to discover multicast sources in other PIM domains. The advantage of running MSDP is that your own RPs exchange source information with RPs in other domains; your group members do not have to be directly dependent on another domain's RP.

NOTE

You will see in some subsequent case studies how MSDP can prove useful for sharing source information within a single domain, too.


MSDP uses TCP (port 639) for its peering connections. As with BGP, using point-to-point TCP peering means that each peer must be explicitly configured. When a PIM DR registers a source with its RP as illustrated in Figure 7-8, the RP sends a Source Active (SA) message to all of its MSDP peers.

Figure 7-8. RPs Advertise Sources to Their MSDP Neighbors with Source Active Messages

The SA contains the following:

  • The address of the multicast source

  • The group address to which the source is sending

  • The IP address of the originating RP

Each MSDP peer that receives the SA floods the SA to all of its own peers downstream from the originator. In some cases, such as the RPs in AS 6 and AS 7 of Figure 7-8, an RP may receive a copy of an SA from more than one MSDP peer. To prevent looping, the RP consults the BGP next-hop database to determine the next hop toward the SA's originator. If both MBGP and unicast BGP are configured, MBGP is checked first, and then unicast BGP. That next-hop neighbor is the RPF peer for the originator, and SAs received from the originator on any interface other than the interface to the RPF peer are dropped. The SA flooding process is, therefore, called peer RPF flooding. Because of the peer RPF flooding mechanism, BGP or MBGP must be running in conjunction with MSDP.
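
The peer RPF rule can be sketched as follows. This is a simplified model, not router code: next hops are looked up by exact RP address rather than by longest prefix match, an accepted SA is forwarded to every peer except the one it arrived from, and all names and addresses are hypothetical.

```python
def rpf_peer_for(originator, mbgp_next_hops, bgp_next_hops):
    """Look up the next hop toward the originating RP: MBGP first, then unicast BGP."""
    if originator in mbgp_next_hops:
        return mbgp_next_hops[originator]
    return bgp_next_hops.get(originator)

def flood_sa(sa, received_from, peers, mbgp_next_hops, bgp_next_hops):
    """Return the peers an SA is forwarded to, or an empty list if it fails the peer RPF check."""
    rpf_peer = rpf_peer_for(sa["rp"], mbgp_next_hops, bgp_next_hops)
    if rpf_peer is None or received_from != rpf_peer:
        return []  # not received from the RPF peer: drop to prevent loops
    return [peer for peer in peers if peer != received_from]

# Hypothetical SA and peer set.
sa = {"source": "172.16.5.4", "group": "228.1.2.3", "rp": "10.5.4.3"}
peers = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
bgp_next_hops = {"10.5.4.3": "192.0.2.1"}

accepted = flood_sa(sa, "192.0.2.1", peers, {}, bgp_next_hops)
duplicate = flood_sa(sa, "192.0.2.2", peers, {}, bgp_next_hops)
```

The copy received from the RPF peer 192.0.2.1 is flooded onward, while the second copy arriving from 192.0.2.2 is dropped.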

When an RP receives an SA, it determines whether there are any members of the SA's group in its domain by checking for interfaces on the group's (*, G) outgoing interface list. If there are no group members, the RP does nothing. If there are group members, the RP sends an (S, G) join toward the source. As a result, a branch of the source tree is constructed across AS boundaries to the RP. As multicast packets arrive at the RP, they are forwarded down its own shared tree to the group members in the RP's domain. The members' DRs then have the option of joining the shortest path tree (SPT) to the source using standard PIM-SM procedures.
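
The RP's decision on receiving an SA reduces to a lookup of the group's (*, G) entry. The sketch below assumes a toy mroute table keyed by ("*", group) with an "oil" list; that layout, the handle_sa helper, and the interface name are illustrative assumptions only.

```python
def handle_sa(source, group, mroute_table):
    """Return the (S, G) join to send for a received SA, or None if the group has no members."""
    star_g = mroute_table.get(("*", group))
    if not star_g or not star_g["oil"]:
        return None  # no interfaces on the (*, G) outgoing interface list
    return (source, group)

# Hypothetical state: one interface has members of 228.1.2.3.
mroute_table = {("*", "228.1.2.3"): {"oil": ["Ethernet0"]}}

join = handle_sa("172.16.5.4", "228.1.2.3", mroute_table)
no_join = handle_sa("172.16.5.4", "228.9.9.9", mroute_table)
```

Only the first SA triggers a join, because only group 228.1.2.3 has an interface on its outgoing interface list.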

The originating RP continues to send periodic SAs for the (S, G) every 60 seconds for as long as the source is sending packets to the group. When an RP receives an SA, it has the option to cache the message. Suppose, for example, that an RP receives an SA for (172.16.5.4, 228.1.2.3) from originating RP 10.5.4.3. The RP consults its mroute table and finds that there are no active members for group 228.1.2.3, so it passes the SA message to its peers downstream of 10.5.4.3 without caching the message. If a host in the domain then sends a join to the RP for group 228.1.2.3, the RP adds the interface toward the host to the outgoing interface list of its (*, 228.1.2.3) entry. Because the previous SA was not cached, however, the RP has no knowledge of the source. Therefore, the RP must wait until the next SA message is received before it can initiate a join to the source.

If, on the other hand, the RP is caching SAs, the router will have an entry for (172.16.5.4, 228.1.2.3) and can join the source tree as soon as a host requests a join. The trade-off here is that in exchange for reducing the join latency, memory is consumed caching SA messages that may or may not be needed. If the RP belongs to a very large MSDP mesh, and there are large numbers of SAs, the memory consumption can be significant.
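
The difference in join latency can be made concrete with a toy model of an RP. The RendezvousPoint class and its attributes are hypothetical; the model ignores SA forwarding and timers and only records which (S, G) joins would be sent and when.

```python
class RendezvousPoint:
    """Toy RP that tracks group members and, optionally, cached SA state."""

    def __init__(self, cache_sa):
        self.cache_sa = cache_sa
        self.sa_cache = {}      # group -> set of sources learned from SAs
        self.members = set()    # groups with at least one local member
        self.joins_sent = []    # (S, G) joins sent toward remote sources

    def receive_sa(self, source, group):
        if self.cache_sa:
            self.sa_cache.setdefault(group, set()).add(source)
        if group in self.members and (source, group) not in self.joins_sent:
            self.joins_sent.append((source, group))

    def receive_group_join(self, group):
        self.members.add(group)
        # With caching, sources already learned can be joined immediately.
        for source in sorted(self.sa_cache.get(group, ())):
            if (source, group) not in self.joins_sent:
                self.joins_sent.append((source, group))

caching = RendezvousPoint(cache_sa=True)
noncaching = RendezvousPoint(cache_sa=False)
for rp in (caching, noncaching):
    rp.receive_sa("172.16.5.4", "228.1.2.3")   # SA arrives before any member exists
    rp.receive_group_join("228.1.2.3")         # a host then joins the group

joins_before_next_sa = (list(caching.joins_sent), list(noncaching.joins_sent))
noncaching.receive_sa("172.16.5.4", "228.1.2.3")  # the next periodic SA
```

The caching RP joins the source as soon as the host joins; the noncaching RP sends its join only after the next periodic SA arrives.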

By default, Cisco IOS Software does not cache SAs. You can enable caching with the command ip msdp cache-sa-state. To help alleviate possible memory stress, you can link the command to an extended access list that specifies what (S, G) pairs to cache.

If an RP has an MSDP peer that is caching SAs, you can reduce the join latency at the RP without turning on caching by using SA Request and SA Response messages. When a host requests a join to a particular group, the RP sends an SA Request message to its caching peer(s). If a peer has cached source information for the group in question, it sends the information to the requesting RP with an SA Response message. The requesting RP uses the information in the SA Response but does not forward the message to any other peers. If a noncaching RP receives an SA Request, it sends an error message back to the requestor.
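
A peer's handling of an SA Request might be sketched as below. The function name, the tuple-based return values, and the cache layout are assumptions for illustration; returning nothing when the cache holds no entry for the group is also an assumption, because the text does not say what a caching peer sends in that case.

```python
def answer_sa_request(group, caching_enabled, sa_cache):
    """Return the reply a peer sends to an SA Request for a group."""
    if not caching_enabled:
        # Table 7-5: error code 2 (SA Request error), subcode 1 (Does not cache SA)
        return ("notification", 2, 1)
    entries = sa_cache.get(group, [])
    if not entries:
        return None  # assumption: nothing cached for the group, so no response
    return ("sa_response", list(entries))

# Hypothetical cache: group -> list of (source, originating RP).
sa_cache = {"228.1.2.3": [("172.16.5.4", "10.5.4.3")]}

from_caching_peer = answer_sa_request("228.1.2.3", True, sa_cache)
from_noncaching_peer = answer_sa_request("228.1.2.3", False, {})
```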

To enable a Cisco router to send SA Request messages, use the ip msdp sa-request command to specify the IP address or name of a caching peer. You can use the command multiple times to specify multiple caching peers.

MSDP Message Formats

MSDP messages are carried in TCP segments. When two routers are configured as MSDP peers, the router with the higher IP address listens on TCP port 639, and the router with the lower IP address attempts an active connect to port 639.
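
The role selection can be expressed as a numeric comparison of the two peer addresses. The msdp_role helper below is a hypothetical name used only for this sketch; note that the comparison must be numeric, because comparing the dotted-decimal strings would order 10.0.0.10 below 10.0.0.9.

```python
import ipaddress

MSDP_PORT = 639

def msdp_role(local_address, peer_address):
    """Return 'listen' if the local router waits on TCP port 639, or 'connect' if it opens the session."""
    local = ipaddress.ip_address(local_address)
    peer = ipaddress.ip_address(peer_address)
    if local == peer:
        raise ValueError("peers must have different addresses")
    # The higher address listens; the lower address attempts the active connect.
    return "listen" if local > peer else "connect"

role_high = msdp_role("10.0.0.10", "10.0.0.9")
role_low = msdp_role("10.0.0.9", "10.0.0.10")
```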

The MSDP messages use a TLV (Type/Length/Value) format and may be one of five types, shown in Table 7-4. The following sections detail the format of each message type.

Table 7-4. MSDP Message Types
Type Message
1 Source Active
2 Source Active Request
3 Source Active Response
4 Keepalive
5 Notification
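
Because MSDP messages arrive as a TCP byte stream, a receiver has to split the stream into individual TLVs before interpreting them. The following sketch assumes a 1-octet Type followed by a 2-octet Length that counts the whole TLV, which is consistent with the 24-bit Keepalive and the 5-octet Notification header described later in this section; the MsdpType and split_tlvs names are hypothetical.

```python
import struct
from enum import IntEnum

class MsdpType(IntEnum):
    """Message types from Table 7-4."""
    SOURCE_ACTIVE = 1
    SOURCE_ACTIVE_REQUEST = 2
    SOURCE_ACTIVE_RESPONSE = 3
    KEEPALIVE = 4
    NOTIFICATION = 5

def split_tlvs(stream):
    """Split a byte string into (type, value) pairs, assuming a 1-octet type and 2-octet length."""
    tlvs = []
    offset = 0
    while offset < len(stream):
        if len(stream) - offset < 3:
            raise ValueError("truncated TLV header")
        msg_type, length = struct.unpack_from("!BH", stream, offset)
        if length < 3 or offset + length > len(stream):
            raise ValueError("bad TLV length")
        tlvs.append((MsdpType(msg_type), stream[offset + 3:offset + length]))
        offset += length
    return tlvs

# A Keepalive followed by an 8-octet TLV of type 2 (the value bytes are arbitrary sample data).
stream = bytes([4, 0, 3]) + bytes([2, 0, 8, 32, 228, 1, 2, 3])
parsed = split_tlvs(stream)
```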

Source Active TLV

When an MSDP RP receives a PIM Register message from an IP multicast source, it sends a Source Active message to its peers. Figure 7-9 shows the MSDP Source Active TLV format. SA messages are subsequently sent every 60 seconds until the source is no longer active. Multiple (S, G) entries can be advertised by a single SA.

Figure 7-9. The MSDP Source Active TLV Format

The fields for the MSDP Source Active TLV format are defined as follows:

  • Entry Count specifies the number of (S, G) entries being advertised by the specified RP address.

  • RP Address is the IP address of the originating RP.

  • Reserved is set to all zeroes.

  • Sprefix Length specifies the prefix length of the associated source address. This length is always 32.

  • Group Address is the multicast IP address to which the associated source is sending multicast packets.

  • Source Address is the IP address of the active source.
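
The fields listed above can be packed and unpacked with Python's struct module. The field widths used here (1-octet Type, 2-octet Length, 1-octet Entry Count, 4-octet RP Address, then per entry 3 reserved octets, a 1-octet Sprefix Length, and two 4-octet addresses) are assumptions taken from the published MSDP specification rather than from the figure, and the sketch omits any encapsulated data packet. The build_sa and parse_sa names are hypothetical.

```python
import socket
import struct

def build_sa(rp, entries):
    """Encode a Source Active TLV for a list of (source, group) pairs, without encapsulated data."""
    body = struct.pack("!B4s", len(entries), socket.inet_aton(rp))
    for source, group in entries:
        # 3 reserved octets (zero), Sprefix Length (always 32), group, source
        body += struct.pack("!3sB4s4s", b"\x00" * 3, 32,
                            socket.inet_aton(group), socket.inet_aton(source))
    return struct.pack("!BH", 1, 3 + len(body)) + body

def parse_sa(tlv):
    """Decode a Source Active TLV into (rp, [(source, group), ...])."""
    msg_type, length, count = struct.unpack_from("!BHB", tlv, 0)
    if msg_type != 1:
        raise ValueError("not a Source Active TLV")
    rp = socket.inet_ntoa(tlv[4:8])
    entries = []
    for i in range(count):
        _reserved, _sprefix_len, group, source = struct.unpack_from("!3sB4s4s", tlv, 8 + 12 * i)
        entries.append((socket.inet_ntoa(source), socket.inet_ntoa(group)))
    return rp, entries

tlv = build_sa("10.5.4.3", [("172.16.5.4", "228.1.2.3")])
decoded = parse_sa(tlv)
```

Under these assumptions, a single-entry SA is 20 octets long: 8 octets of fixed header plus 12 octets per (S, G) entry.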

Source Active Request TLV

SA Request messages, the format of which is shown in Figure 7-10, are used to request (S, G) information from MSDP peers that are caching SA state. SA Request messages should be sent only to caching peers (noncaching peers will return an error notification) and are sent only by RPs that are explicitly configured to do so.

Figure 7-10. The MSDP Source Active Request TLV Format

The fields for the MSDP Source Active Request TLV format are defined as follows:

  • Gprefix Length specifies the length of the group address prefix.

  • Group Address Prefix specifies the group address for which source information is requested.

Source Active Response TLV

SA Response messages, the format of which is shown in Figure 7-11, are sent by a caching peer in response to an SA Request message. They provide the requesting peer the source address and RP address associated with the specified group address. The format is the same as the SA message.

Figure 7-11. The MSDP Source Active Response TLV Format

Keepalive TLV

The active side (the peer with the lower IP address) of an MSDP connection tracks the passive side of the connection with a 75-second Keepalive timer. If no MSDP message is received from the passive side before the Keepalive timer expires, the active peer resets the TCP connection. If an MSDP message is received, the timer is reset. If the passive peer has no other MSDP messages to send, it sends a Keepalive message to prevent the active peer from resetting the connection. As Figure 7-12 shows, the Keepalive message is a simple 24-bit TLV consisting of a type and length field.

Figure 7-12. The MSDP Keepalive TLV Format
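
The Keepalive TLV and the timer on the active side can be modeled as follows. The split of the 24 bits into a 1-octet type and a 2-octet length is an assumption, as is the KeepaliveTimer class, which takes the current time as an argument so that the logic can be followed without a real clock.

```python
import struct

KEEPALIVE_TIMEOUT = 75  # seconds, as described in the text

# Type 4 (Keepalive) with a length of 3 octets: the TLV carries no value.
KEEPALIVE_TLV = struct.pack("!BH", 4, 3)

class KeepaliveTimer:
    """Timer kept by the active peer; any MSDP message from the passive peer resets it."""

    def __init__(self, now):
        self.deadline = now + KEEPALIVE_TIMEOUT

    def message_received(self, now):
        self.deadline = now + KEEPALIVE_TIMEOUT

    def expired(self, now):
        """True when the active peer should reset the TCP connection."""
        return now >= self.deadline

timer = KeepaliveTimer(now=0)
timer.message_received(now=60)          # a Keepalive or any other MSDP message arrives
still_up_at_100 = not timer.expired(now=100)
reset_at_135 = timer.expired(now=135)
```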

Notification TLV

A Notification message is sent when an error is detected. Figure 7-13 shows the Notification message format.

Figure 7-13. The MSDP Notification TLV Format

The fields for the MSDP Notification TLV format are defined as follows:

  • Length = x + 5 is the length of the TLV, where x is the length of the data field and 5 is the first 5 octets.

  • O is the open bit. If this bit is cleared, the connection must be closed upon receipt of the Notification. Table 7-5 shows the states of the O bit for different error subcodes. MC indicates must close; the O bit is always cleared. CC indicates can close; the O bit might be cleared.

  • Error code is a 7-bit unsigned integer indicating the Notification type. Table 7-5 lists the error codes.

  • Error Subcode is an 8-bit unsigned integer that may offer more details about the error code. If the error code has no subcode, this field is zero. Table 7-5 shows the possible error subcodes associated with the error codes.

  • Data is a variable-length field containing information specific to the error code and error subcode. The various data fields are not covered in this chapter; see the MSDP Internet Draft for more information on the possible contents of this field.

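Table 7-5. MSDP Error Codes and Subcodes
Error Code  Error Code Description        Error Subcode  Error Subcode Description               O-Bit State
1           Message header error          0              Unspecific                              MC
1           Message header error          2              Bad message length                      MC
1           Message header error          3              Bad message type                        CC
2           SA Request error              0              Unspecific                              MC
2           SA Request error              1              Does not cache SA                       MC
2           SA Request error              2              Invalid group                           MC
3           SA message/SA response error  0              Unspecific                              MC
3           SA message/SA response error  1              Invalid entry count                     CC
3           SA message/SA response error  2              Invalid RP address                      MC
3           SA message/SA response error  3              Invalid group address                   MC
3           SA message/SA response error  4              Invalid source address                  MC
3           SA message/SA response error  5              Invalid sprefix length                  MC
3           SA message/SA response error  6              Looping SA (self is RP)                 MC
3           SA message/SA response error  7              Unknown encapsulation                   MC
3           SA message/SA response error  8              Administrative scope boundary violated  MC
4           Hold timer expired            0              Unspecific                              MC
5           Finite state machine error    0              Unspecific                              MC
5           Finite state machine error    1              Unexpected message type FSM error       MC
6           Notification                  0              Unspecific                              MC
7           Cease                         0              Unspecific                              MC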
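
The Notification fields can be encoded with a few lines of Python. The sketch assumes that the O bit is the high-order bit of the octet it shares with the 7-bit error code, and that the 5 leading octets are a 1-octet Type, a 2-octet Length, the O bit plus error code octet, and the subcode octet. The function names are hypothetical.

```python
import struct

def build_notification(error_code, subcode, open_bit, data=b""):
    """Encode a Notification TLV; Length is the data length plus the 5 leading octets."""
    if not 0 <= error_code <= 0x7F:
        raise ValueError("error code must fit in 7 bits")
    if not 0 <= subcode <= 0xFF:
        raise ValueError("error subcode must fit in 8 bits")
    flags = (0x80 if open_bit else 0) | error_code
    return struct.pack("!BHBB", 5, 5 + len(data), flags, subcode) + data

def parse_notification(tlv):
    """Decode a Notification TLV into its fields."""
    msg_type, length, flags, subcode = struct.unpack_from("!BHBB", tlv, 0)
    if msg_type != 5:
        raise ValueError("not a Notification TLV")
    return {
        "open_bit": bool(flags & 0x80),
        "error_code": flags & 0x7F,
        "subcode": subcode,
        "data": tlv[5:length],
    }

# Error code 2, subcode 1 ("Does not cache SA") is a must-close error, so the O bit is cleared.
does_not_cache = build_notification(2, 1, open_bit=False)
fields = parse_notification(does_not_cache)
```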


Routing TCP[s]IP (Vol. 22001)