Up to this point, we have been discussing the host side of multicast. You have learned how hosts interact with switches and routers to join multicast groups and receive the traffic. It is now time to move on to discuss how multicast traffic travels across the Internet (or intranet) from a source on a remote network to a local router and host.
Unicast data uses routing protocols to accomplish the task of getting data to and from remote destinations. Multicast does the same, but it goes about it in a somewhat different manner. Unicast relies on routing tables. Multicast uses a sort of spanning tree system to distribute its data. This section describes the tree structures that can be implemented to enable multicast routing. In addition to trees, several different protocol methods can be used to achieve the desired implementation of multicast.
Two types of trees exist in multicast:
Source trees Source trees use the architecture of the source of the multicast traffic as the root of the tree.
Shared trees Shared trees use an architecture in which multiple sources share a common rendezvous point.
Each of these methods is effective and enables sourced multicast data to reach an arbitrary number of recipients of the multicast group. Let's discuss each of them in detail.
Source trees use special notation. This notation is used in what becomes a multicast route table. Unicast route tables use the destination address and next-hop information to establish a topology for forwarding information. Here is a sample from a unicast routing table:
B 126.96.36.199/24 [20/0] via 188.8.131.52, 3d08h
B 184.108.40.206/24 [20/0] via 220.127.116.11, 2w1d
B 18.104.22.168/24 [20/0] via 22.214.171.124, 1d03h
B 126.96.36.199/16 [20/0] via 188.8.131.52, 3d07h
B 184.108.40.206/24 [20/0] via 220.127.116.11, 1w2d
18.104.22.168/24 is variably subnetted, 2 subnets, 2 masks
B 22.214.171.124/24 [20/0] via 126.96.36.199, 1w2d
B 188.8.131.52/32 [20/0] via 184.108.40.206, 1w2d
Multicast route tables are somewhat different. A sample of a multicast table follows. Notice that the notation is different. Instead of having the destination address listed and then the next hop to get to the destination, source tree uses the notation (S, G). This notation specifies the source host's IP address and the multicast group address for which it is sourcing information. Let's take the first one, for example. This is seen as (220.127.116.11, 18.104.22.168), which means that the source host is 22.214.171.124 and it is sourcing traffic for the multicast group 126.96.36.199:
(188.8.131.52, 184.108.40.206), 00:01:04/00:01:55, flags: PT
  Incoming interface: POS1/0/0, RPF nbr 220.127.116.11, Mbgp
  Outgoing interface list: Null
(18.104.22.168, 22.214.171.124), 00:02:06/00:00:53, flags: PT
  Incoming interface: POS1/0/0, RPF nbr 126.96.36.199, Mbgp
  Outgoing interface list: Null
(188.8.131.52, 184.108.40.206), 00:00:28/00:02:31, flags: CLM
  Incoming interface: POS1/0/0, RPF nbr 220.127.116.11, Mbgp
  Outgoing interface list:
    FastEthernet4/0/0, Forward/Sparse, 00:00:28/00:02:54
    FastEthernet4/1/0, Forward/Sparse, 00:00:28/00:02:31
(18.104.22.168, 22.214.171.124), 00:00:40/00:02:19, flags: CLM
  Incoming interface: POS1/0/0, RPF nbr 126.96.36.199, Mbgp
  Outgoing interface list:
    FastEthernet4/0/0, Forward/Sparse, 00:00:41/00:02:53
    FastEthernet4/1/0, Forward/Sparse, 00:00:41/00:02:19
(188.8.131.52, 184.108.40.206), 00:04:43/00:02:06, flags: CLMT
  Incoming interface: POS1/0/0, RPF nbr 220.127.116.11, Mbgp
  Outgoing interface list:
    FastEthernet4/0/0, Forward/Sparse, 00:04:43/00:02:43
    FastEthernet4/1/0, Forward/Sparse, 00:04:43/00:03:07
(18.104.22.168, 22.214.171.124), 00:17:58/00:03:29, flags: MT
  Incoming interface: POS1/0/0, RPF nbr 126.96.36.199, Mbgp
  Outgoing interface list:
    FastEthernet4/0/0, Forward/Sparse, 00:17:58/00:02:44
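To make the (S, G) notation concrete, here is a minimal Python sketch of a multicast route table keyed by source and group. It is an illustration only, not how a router stores its table: the class name MrouteEntry, the helper functions, and every address and interface name are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address


@dataclass
class MrouteEntry:
    """State kept for one (S, G) pair."""
    incoming_interface: str          # interface that leads back to the source
    rpf_neighbor: IPv4Address        # upstream neighbor toward the source
    outgoing_interfaces: list = field(default_factory=list)


def add_entry(table, source, group, incoming, rpf_neighbor, outgoing=()):
    """Store an entry under the key (S, G)."""
    group_addr = IPv4Address(group)
    if not group_addr.is_multicast:
        raise ValueError(f"{group} is not a multicast group address")
    table[(IPv4Address(source), group_addr)] = MrouteEntry(
        incoming, IPv4Address(rpf_neighbor), list(outgoing)
    )


def sources_for_group(table, group):
    """List every source that has its own tree for the given group."""
    group_addr = IPv4Address(group)
    return sorted(s for (s, g) in table if g == group_addr)


# Hypothetical addresses and interface names, for illustration only.
table = {}
add_entry(table, "10.1.1.10", "239.1.1.1", "POS1/0/0", "10.0.0.1")
add_entry(table, "10.2.2.20", "239.1.1.1", "POS1/0/0", "10.0.0.1",
          ["FastEthernet4/0/0", "FastEthernet4/1/0"])
add_entry(table, "10.3.3.30", "239.1.1.1", "POS1/0/0", "10.0.0.1",
          ["FastEthernet4/0/0"])
```

Calling sources_for_group(table, "239.1.1.1") on this table returns three sources, one per (S, G) entry. An entry with an empty outgoing list corresponds to the Null outgoing interface list in the router output.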
Figure 8.12 gives you a good picture of how source trees work.
Figure 8.12: Source tree forwarding
Also notice in the drawing that the shortest path to the receivers was chosen. This is known as choosing the shortest path tree (SPT). You can see from the preceding output that there are three sources for the same group of 188.8.131.52. This indicates that there are three SPT groups shown here: (184.108.40.206, 220.127.116.11), (18.104.22.168, 22.214.171.124), and (126.96.36.199, 188.8.131.52). Each of these sources has its own shortest path tree to the receivers.
There are two types of shared tree distribution: unidirectional and bidirectional.
Both of them work a little differently from source tree distribution. Shared tree architecture lies in the possibility that there might be multiple sources for one multicast group. Instead of each individual source creating its own SPT and distributing the data apart from the other sources, a shared root is designated. Multiple sources for a multicast group forward their data to a shared root or rendezvous point (RP). The rendezvous point then follows SPT to forward the data to the members of the group. Figure 8.13 depicts how the shared tree distribution works.
Unidirectional shared tree distribution operates as shown in Figure 8.13. All recipients of a multicast group receive the data from an RP no matter where they are located in the network. This is very inefficient if subscribers are close to the source because they still need to get the multicast stream from the RP.
Figure 8.13: Shared tree forwarding
Bidirectional shared tree distribution operates somewhat differently. If a receiver lives upstream from the RP, it can receive data directly from the upstream source. Figure 8.14 depicts how this works. As you can see, Host A is a source for group 184.108.40.206, and Host B is a receiver of that same group. In a bidirectional shared tree, data goes directly from Host A to Host B without having to come from the RP.
Figure 8.14: Bidirectional shared tree
The tree distributions explain how source information is managed; now we must discuss how the actual data delivery is managed. There are several methods of making sure that delivery is as efficient as possible. The following are discussed here:
Reverse Path Forwarding (RPF)
Time to Live (TTL) attributes
RPF works in tandem with the routing protocols, but it is described briefly here. As you have seen in Figures 8.13 and 8.14, the traffic goes only to the multicast group receivers. We also indicated that bidirectional distribution eliminates the need to forward data upstream. You might ask, 'How do you define upstream?' It is easy to clarify. By means of the routing protocols, routers are aware of which interface leads to the source(s) of the multicast group. That interface is considered upstream.
The Reverse Path Forwarding process is based on the upstream information. After receiving an incoming multicast packet, the router verifies that the packet came in on an interface that leads back to the source. The router forwards the packet if the verification is positive; otherwise, the packet is discarded. This check stops potential loops. To avoid increased overhead on the router's processor, a multicast forwarding cache is implemented for the RPF lookups.
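The RPF check can be sketched in a few lines of Python. This is a simplified model, assuming the upstream interface is found by a longest-prefix match of the source address against a unicast route table; the route entries, interface names, and function names are hypothetical.

```python
from ipaddress import ip_address, ip_network

# Hypothetical unicast routes: prefix -> interface that leads toward it.
UNICAST_ROUTES = {
    ip_network("10.1.0.0/16"): "Serial0",
    ip_network("10.1.2.0/24"): "Ethernet1",
    ip_network("0.0.0.0/0"): "Serial1",
}


def upstream_interface(source):
    """Longest-prefix match on the source gives the upstream interface."""
    addr = ip_address(source)
    matches = [net for net in UNICAST_ROUTES if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return UNICAST_ROUTES[best]


def rpf_check(source, arrival_interface):
    """Pass only if the packet arrived on the interface toward its source."""
    return upstream_interface(source) == arrival_interface
```

A packet from 10.1.2.5 passes the check when it arrives on Ethernet1 and is discarded when it arrives on Serial0, because the more specific /24 route says the source lies behind Ethernet1.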
You can also control the delivery of IP multicast packets through the TTL counter and TTL thresholds. The Time to Live counter is decremented by one every time the packet passes through a router. After the TTL counter reaches zero, the packet is discarded.
Thresholds are used to achieve higher granularity and greater control within one's own network. Thresholds are applied to specified interfaces of multicast-enabled routers. The router compares the TTL value of the multicast packet to the threshold value specified in the interface configuration. If the TTL value of the packet is greater than or equal to the TTL threshold configured for the interface, the packet is forwarded through that interface.
TTL thresholds enable network administrators to bound their network and limit the distribution of multicast packets beyond the boundaries. This is accomplished by setting high values for outbound external interfaces. The maximum value for the TTL threshold is 255. Refer to Figure 8.15 to see how network boundaries can be set to limit distribution of multicast traffic.
Figure 8.15: TTL threshold utilization
The multicast source initially sets the TTL value for the multicast packet and then forwards it on throughout the network. In this scenario, the TTL threshold values have been set to 200 on both of the exiting Packet over SONET (POS) interfaces. The initial TTL value has been set to 30 by the application. There are three to four router hops to get out of the campus network. Router 3 will decrement by one, leaving a TTL value of 29; the Catalyst 6509's MSFC will decrement by one as well, leaving the value set to 28. After the packet reaches Router 2 or Router 1, the value will be 27 or 26, respectively. Both of these values are less than the TTL threshold of 200, which means that Router 1 and Router 2 will drop any outbound multicast packets.
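The arithmetic in this scenario can be traced with a short Python sketch. It assumes a simplified model in which each router decrements the TTL by one and then compares the result with the threshold on its outbound interface; the function name trace_ttl and the threshold of 0 on internal interfaces are assumptions made for the example.

```python
def trace_ttl(initial_ttl, hops):
    """Follow a packet along a path of (router, outbound_threshold) hops.

    Returns a list of (router, ttl_after_decrement, forwarded) tuples and
    stops at the first router that does not forward the packet.
    """
    ttl = initial_ttl
    trace = []
    for router, threshold in hops:
        ttl -= 1  # each router hop decrements the TTL by one
        forwarded = ttl > 0 and ttl >= threshold
        trace.append((router, ttl, forwarded))
        if not forwarded:
            break
    return trace


# Path that leaves the campus through Router 2 (threshold 200 on its POS link).
via_router2 = trace_ttl(30, [("Router 3", 0),
                             ("Catalyst 6509 MSFC", 0),
                             ("Router 2", 200)])

# Path that crosses Router 2 internally and leaves through Router 1.
via_router1 = trace_ttl(30, [("Router 3", 0),
                             ("Catalyst 6509 MSFC", 0),
                             ("Router 2", 0),
                             ("Router 1", 200)])
```

In both traces the last hop is marked as not forwarded: the packet arrives at the exit interface with a TTL of 27 or 26, well below the threshold of 200.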
Unicast has several routing protocols that build route tables enabling layer 3 devices such as routers and some switches to forward unicast data to the next hop toward its final destination. We have also discussed some of the methods that multicast, in general, uses to distribute multicast data. Similar to unicast, multicast has a variety of routing protocols, including distance vector and link state protocols.
Protocols are used to enhance the efficiency by which multicast application data is distributed and to optimize the use of existing network resources. This section covers Distance Vector Multicast Routing Protocol (DVMRP), Multicast Open Shortest Path First (MOSPF), and Protocol Independent Multicast dense mode (PIM DM).
Distance Vector Multicast Routing Protocol (DVMRP) has achieved widespread use in the multicast world. A few years ago, you might have often heard the term 'DVMRP tunnel' used when discussing the implementation of multicast feeds from an ISP or a feed from the Multicast Backbone (MBONE). As the name indicates, this protocol uses a distance-vector algorithm. It uses several of the features that other distance-vector protocols (such as RIP) implement. Some of these features are a maximum hop count of 32, poison reverse, and 60-second route updates. It also allows for IP classless masking of addresses.
Just as with other routing protocols, DVMRP-enabled routers must establish adjacencies in order to share route information. After the adjacency is established, the DVMRP route table is created. Route information is exchanged via route reports. It is important to remember that the DVMRP route table is stored separately from the unicast routing table. The DVMRP route table is more like a unicast route table than the multicast route table that was shown earlier in this chapter. A DVMRP table contains the layer 3 IP network of the multicast source and the next hop toward the source.
Because the DVMRP table has this form, it works perfectly with source tree distribution, as discussed earlier. Using the information in the DVMRP table, the tree for the source can be established. In addition, the router uses this information to perform the Reverse Path Forwarding check to verify that the multicast data coming into the interface is coming in an interface that leads back to the source of the data. DVMRP uses SPT for its multicast forwarding.
Figure 8.16 shows how DVMRP works. You can see that not every router in the network is a DVMRP router. You should also notice that the adjacencies are established over tunnel interfaces. DVMRP information is tunneled through an IP network. On either end of the tunnel, information is learned and exchanged to build a multicast forwarding database or route table.
Figure 8.16: DVMRP tunnels
Multicast Open Shortest Path First (MOSPF) is a link state protocol. OSPFv2 includes some changes that allow multicast to be enabled on OSPF-enabled routers. This eliminates the need for tunnels such as those used for DVMRP.
To completely understand the full functionality of MOSPF, you must have a thorough understanding of OSPF itself. Here we cover only the basic functionality of MOSPF, so you should be fine with just a basic understanding of OSPF.
For more on OSPF, see CCNP: Building Scalable Cisco Internetworks Study Guide, by Carl Timm and Wade Edwards (Sybex, 2003).
MOSPF's basic functionality lies within a single OSPF area. Design gets more complicated as you route multicast traffic to other areas (inter-area routing) or to other autonomous systems (inter-AS routing). This additional complication requires more knowledge of OSPF routing. We briefly discuss how this is accomplished in MOSPF, but most of the details will be regarding MOSPF intra-area routing.
OSPF route information is shared via different Link State Advertisement (LSA) types. LSAs are flooded throughout an area to give all OSPF-enabled routers a logical image of the network topology. When changes are made to the topology, new LSAs are flooded to propagate the change.
In addition to the unicast-routing LSA types, in OSPFv2 there is a special multicast LSA for flooding multicast group information throughout the area. This additional LSA type required some modification to the OSPF packet format.
Here is where you need to understand a little about OSPF. Multicast LSA flooding is done by the Designated Router (DR) when multiple routers are connected to a multi-access medium, such as Ethernet. On point-to-point connections, there is no DR or Backup Designated Router (BDR). Look at the following output from a Cisco router running OSPF over point-to-point circuits:
Neighbor ID     Pri   State      Dead Time   Address         Interface
172.16.1.2        1   FULL/ -    00:00:31    172.16.1.2      Serial3/0
192.168.1.2       1   FULL/ -    00:00:39    192.168.1.2     Serial3/1
On a multi-access network, the DR must be multicast enabled, that is, running MOSPF. If any non-MOSPF routers are on the same network, their OSPF priority must be lowered so that none of them becomes the DR. If a non-MOSPF router were to become the DR, it would not be able to forward the multicast LSA to the other routers on the segment.
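The effect of lowering the priority can be sketched as follows. This is a deliberately simplified model of DR election, assuming only that a priority of 0 makes a router ineligible and that the highest priority wins, with the highest router ID as the tie-breaker. It ignores the BDR and the fact that an existing DR is not preempted. The router IDs and the function name elect_dr are hypothetical.

```python
from ipaddress import IPv4Address


def elect_dr(routers):
    """Pick a DR from a list of (router_id, priority, runs_mospf) tuples."""
    eligible = [r for r in routers if r[1] > 0]  # priority 0 never becomes DR
    if not eligible:
        return None
    # Highest priority wins; the highest router ID breaks ties.
    return max(eligible, key=lambda r: (r[1], IPv4Address(r[0])))


# Default priorities: the non-MOSPF router wins on router ID.
before = elect_dr([("10.0.0.1", 1, True), ("10.0.0.9", 1, False)])

# Priority of the non-MOSPF router lowered to 0: the MOSPF router wins.
after = elect_dr([("10.0.0.1", 1, True), ("10.0.0.9", 0, False)])
```

With default priorities the non-MOSPF router at 10.0.0.9 would be chosen, which is exactly the situation the text warns about. Lowering its priority to 0 leaves the MOSPF router as the only candidate.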
Inside the OSPF area, updates are sent describing which links have active multicast members on them so that the multicast data can be forwarded to those interfaces. MOSPF also uses (S, G) notation and calculates the SPT by using the Dijkstra algorithm. You must also understand that an SPT is created for each source in the network.
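As an illustration of the per-source calculation, here is a minimal Dijkstra sketch in Python. It is a generic shortest-path computation over a small hypothetical topology, not MOSPF's actual implementation; the router names, link costs, and function name are assumptions.

```python
import heapq


def shortest_path_tree(graph, source):
    """Return {router: parent} describing the SPT rooted at source.

    graph maps each router to a dict of {neighbor: link_cost}.
    """
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbor, cost in graph[node].items():
            new_dist = d + cost
            if new_dist < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_dist
                parent[neighbor] = node
                heapq.heappush(heap, (new_dist, neighbor))
    return parent


# Hypothetical four-router area with symmetric link costs.
graph = {
    "R1": {"R2": 1, "R3": 4},
    "R2": {"R1": 1, "R3": 1},
    "R3": {"R1": 4, "R2": 1, "R4": 1},
    "R4": {"R3": 1},
}

tree_from_r1 = shortest_path_tree(graph, "R1")
tree_from_r4 = shortest_path_tree(graph, "R4")
```

Each call produces a separate tree, which mirrors the point that one SPT exists per source. In this topology the direct R1 to R3 link (cost 4) is never used, because the path through R2 costs only 2.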
When discussing the difference between intra-area and inter-area MOSPF, you must remember that all areas connect through Area 0, the backbone. In large networks, having full multicast tables in addition to all the unicast tables flow across Area 0 would cause a great deal of overhead and possibly latency.
Unicast OSPF uses a Summary LSA to inform the routers in Area 0 about the networks and topology in an adjacent area. This task is performed by the area's Area Border Router (ABR). The ABR summarizes all the information about the area and then passes it on to the backbone (Area 0) routers in a summary LSA. The same is done for the multicast topology. The ABR summarizes which multicast groups are active and which groups have sources within the area. This information is then sent to the backbone routers.
In addition to summarizing multicast group information, the ABR is responsible for the actual forwarding of multicast group traffic into and out of the area. Each area has an ABR that performs these two functions within an OSPF network.
OSPF implements Autonomous System Border Routers to be the bridges between different autonomous systems. These routers perform much the same as an ABR but must be able to communicate with non-OSPF-speaking devices. Multicast group information and data are forwarded and received by the Multicast Autonomous System Border Router (MASBR). Because MOSPF runs natively within OSPF, there must be a method or protocol by which the multicast information can be taken from MOSPF and communicated to the external AS. Historically, DVMRP has provided this bridge.
There are three types of Protocol Independent Multicast (PIM): sparse mode, dense mode, and a combination of the two. Although PIM dense mode (PIM DM) maintains several functions, the ones that are discussed here are flooding, pruning, and grafting. We'll talk about sparse mode later in this chapter.
PIM is considered 'protocol independent' because it actually uses the unicast route table for RPF and multicast forwarding. PIM DM understands classless subnet masking and uses it when the router is running an IP classless unicast protocol.
PIM DM routers establish neighbor relationships with other routers running PIM DM. It uses these neighbors to establish an SPT and forward multicast data throughout the network. The SPT created by PIM DM is based on source tree distribution.
PIM, in either sparse mode or dense mode, is the method that Cisco recommends for multicast routing on its routers.
Flooding When a multicast source begins to transmit data, PIM runs the RPF, using the unicast route table to verify that the interface leads toward the source. It then forwards the data to all PIM neighbors. Those PIM neighbors then forward the data to their PIM neighbors. This happens throughout the network, whether there are group members on the router or not. Every multicast-enabled router participates; that is why it is considered flooding and is where the term 'dense mode' comes from.
When multiple equal-cost links exist, the link to the PIM neighbor with the highest IP address is selected as the incoming interface (used for RPF). Every router runs the RPF when it receives the multicast data.
Figure 8.17 depicts the initial multicast flooding in a PIM DM network. You can see that the data is forwarded to every PIM neighbor throughout the network. After a PIM neighbor does the RPF calculation, the router then forwards the data to interfaces that have active members of the group.
Figure 8.17: PIM DM flooding
Pruning After the initial flooding through the PIM neighbors, pruning starts. Pruning is the act of trimming down the SPT. Because the data has been forwarded to every router, regardless of group membership, the routers must now prune back the distribution of the multicast data to routers that actually have active group members connected.
Figure 8.18 shows the pruning action that occurs for the PIM DM routers that don't have active group members. Router 5 does not have any active group members, so it sends a prune message to Router 3. Even though Router 4 has a network that does not have members, it does have an interface that does, so it will not send a prune message.
Figure 8.18: PIM DM pruning
Four criteria merit a prune message being sent by a router:
The incoming interface fails the RPF check.
There are no directly connected active group members and no PIM neighbors. (This is considered a leaf router because it has no downstream PIM neighbors.)
A point-to-point non-leaf router receives a prune request from a neighbor.
A LAN non-leaf router receives a prune request from another router, and no other router on the segment overrides the prune request.
If any of these criteria are met, a prune request is sent to the PIM neighbor and the SPT is pruned back.
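The flood-then-prune behavior can be modeled with a small Python sketch. It assumes a loop-free distribution tree and covers only the "no members and nothing downstream still wants the traffic" case; RPF failures and prune overrides on a LAN are left out. The five-router topology and the function names are hypothetical and are not taken from Figure 8.18.

```python
def flood(downstream, root):
    """Initial flood: every router below the root receives the data."""
    reached, stack = [], [root]
    while stack:
        router = stack.pop()
        reached.append(router)
        stack.extend(downstream.get(router, []))
    return sorted(reached)


def prune(downstream, root, routers_with_members):
    """Return (routers that stay on the tree, prune messages sent)."""
    kept, prunes = set(), []

    def visit(router, upstream):
        # Build a list (not a generator) so every downstream router is visited.
        child_stays = [visit(c, router) for c in downstream.get(router, [])]
        stays = router in routers_with_members or any(child_stays)
        if stays:
            kept.add(router)
        elif upstream is not None:
            prunes.append((router, upstream))  # (sender, upstream neighbor)
        return stays

    visit(root, None)
    return kept, prunes


# Hypothetical topology: R1 is nearest the source.
downstream = {"R1": ["R2", "R3"], "R3": ["R4", "R5"]}
reached = flood(downstream, "R1")
kept, prunes = prune(downstream, "R1", routers_with_members={"R2", "R4"})
```

Here the flood reaches all five routers, but only R5 sends a prune (to R3). R3 stays on the tree because R4, downstream of it, still has active members.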
Grafting PIM DM is also ready to forward multicast data after a previously inactive interface becomes active. This is done through the process of grafting. When a host sends an IGMP group membership report to the router, the router then sends a Graft message to the nearest upstream PIM neighbor. After this message is acknowledged, multicast data begins to be forwarded to the router and on to the host. Figure 8.19 depicts the grafting process.
Figure 8.19: PIM DM grafting
Sparse mode protocols use shared tree distribution as their forwarding methods. This is done to create a more efficient method of multicast distribution. Two sparse mode protocols are discussed in this section:
Core-based trees (CBT)
Protocol Independent Multicast sparse mode (PIM SM)
When we discussed shared trees, you learned that there were two types, unidirectional and bidirectional. CBT utilizes the bidirectional method for its multicast data distribution. Because CBT uses a shared tree system, it designates a core router that is used as the root of the tree, enabling data to flow up or down the tree.
Data forwarding in a CBT multicast system is similar to the shared tree distribution we covered earlier. If a source to a multicast group sends multicast data to the CBT-enabled router, the router then forwards the data out all interfaces that are included in the tree, not just the interface that leads to the core router. In this manner, data flows up and down the tree. After the data gets to the core router, the core router then forwards the information to the other routers that are in the tree. Figure 8.20 depicts this process.
Figure 8.20: CBT data distribution
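The bidirectional forwarding rule, forward out every on-tree interface except the one the data arrived on, can be sketched in Python. The tree below is hypothetical (it is not the topology of Figure 8.20), with R2 standing in for the core router; the function name distribute is an assumption.

```python
from collections import deque


def distribute(tree, entry_router):
    """Return the routers reached, in order, when data enters at entry_router.

    tree maps each on-tree router to the set of its on-tree neighbors.
    Each router forwards to every tree neighbor except the one it heard from.
    """
    reached = [entry_router]
    queue = deque([(entry_router, None)])
    while queue:
        router, came_from = queue.popleft()
        for neighbor in sorted(tree[router]):
            if neighbor != came_from:
                reached.append(neighbor)
                queue.append((neighbor, router))
    return reached


# Hypothetical shared tree; R2 plays the role of the core router.
tree = {
    "R1": {"R2"},
    "R2": {"R1", "R3"},
    "R3": {"R2", "R4"},
    "R4": {"R3"},
}

from_leaf = distribute(tree, "R4")
from_core_side = distribute(tree, "R1")
```

Data entering at R4 reaches R3 before it ever reaches the core, which is the "up and down the tree" behavior described above. Because the shared tree has no loops, skipping only the arrival interface is enough to avoid duplicates in this model.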
It is important to see the difference between this sparse mode method and the dense mode method. In sparse mode operation, routers are members of the tree only if they have active members directly connected. Notice in Figure 8.20 that Router 5 is not participating. Dense mode operates on the initial premise that all PIM neighbors have active members directly connected. The tree changes when the directly connected routers request to be pruned from the tree.
A CBT router might become part of the tree after a host sends an IGMP Membership Report to the directly connected router. The router then sends a join tree request to the core router. If the request reaches a CBT tree member first, that router will add the leaf router to the tree and begin forwarding multicast data.
Pruning the tree is done much the same way. When there are no more active members on a router's interfaces, the router sends a prune request to the upstream router. The answering router removes the interface from the forwarding cache if it is on a point-to-point circuit, or it waits for a timer to expire if it is on a shared access network. The timer gives enough time for other CBT routers on the segment to override the prune request.
PIM sparse mode (PIM SM) also uses the architecture of shared tree distribution. There is an RP router that acts as the root of the shared tree. Unlike CBT, however, PIM SM uses the unidirectional shared tree distribution mechanism. Because PIM SM uses the unidirectional method, all multicast sources for any group must register with the RP of the shared tree. This enables the RP and other routers to establish the RPT, or RP tree (the shared tree counterpart of the SPT in source tree distribution).
Just as with CBT, PIM SM routers join the shared tree when they are notified via IGMP that a host requests membership of a multicast group. If a (*, G) entry for the group does not already exist in the router's table, it is created and the join tree request is sent to the next hop toward the RP. The next router receives the request. Depending on whether it has an existing entry for (*, G), two things can happen:
If an entry for (*, G) exists, the router simply adds the interface to the shared tree and no further join requests are sent toward the RP.
If an entry for (*, G) does not exist, the router creates an entry for the (*, G) group and adds the link to the forwarding cache. In addition to doing this, the router sends its own join request toward the RP.
This happens until the join request reaches a router that already has the (*, G) entry or a join request reaches the RP.
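The hop-by-hop join logic above can be sketched in Python. This is a simplified model, assuming each router knows only its next hop toward the RP and keeps a set of downstream neighbors per (*, G) entry; the router names, the group label "G", and the function name join are hypothetical.

```python
def join(state, toward_rp, leaf, group, member="host"):
    """Propagate a (*, G) join from leaf toward the RP.

    state[router][group] is the set of downstream neighbors for (*, G).
    toward_rp maps each router to its next hop toward the RP (None at the RP).
    Returns the list of routers that processed the join.
    """
    path = []
    router, downstream = leaf, member
    while router is not None:
        entries = state.setdefault(router, {})
        existed = group in entries
        entries.setdefault(group, set()).add(downstream)
        path.append(router)
        if existed:
            break  # already on the shared tree: no further join is sent
        downstream, router = router, toward_rp[router]
    return path


# Hypothetical topology: R2 is the RP, R3 sits between it and R4/R5.
toward_rp = {"R5": "R3", "R4": "R3", "R3": "R2", "R2": None}
state = {}
first = join(state, toward_rp, "R5", "G")   # travels all the way to the RP
second = join(state, toward_rp, "R4", "G")  # stops at R3, already on the tree
```

The first join creates (*, G) state on R5, R3, and the RP. The second join stops at R3, which only adds R4 to its existing entry.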
The next facet of PIM SM is the shared tree pruning. With PIM SM, pruning turns out to be just the opposite of the explicit Join mechanism used to construct the shared tree.
When a member leaves a group, it does so via IGMP. When it happens to be the last member on a segment, the router removes the interface from the forwarding cache entry and then sends a prune request toward the RP of the shared tree. If the upstream router that receives the prune still has other active members or downstream routers for the group, it simply removes the requesting interface from its outgoing interface list, and no additional Prune messages are sent toward the RP. See Figure 8.21 for a visual description.
Figure 8.21: PIM SM pruning
Router 5 receives an IGMP message requesting the removal of Host G from the group. Because Host G was the last active member of the group, the (*, G) entry is set to null 0 and a prune request is sent by Router 5 to Router 3. When Router 3 receives the request, it removes the link for interface S0 from the forwarding table. Because Host F is a directly connected active member of the group, the entry for (*, G) is not null 0, so no prune request is sent to Router 2 (the RP for this example).
If Host F were not active, the entry for (*, G) would have been set to null 0 also and a prune request would have been sent to the RP.
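The pruning walk-through can be modeled the same way. In this sketch, an empty set stands in for a (*, G) entry whose outgoing list is null; the data structure and the function name leave are assumptions, while the router and host names follow the example above.

```python
def leave(state, toward_rp, leaf, group, member):
    """Propagate a prune from leaf toward the RP after member leaves.

    state[router][group] is the set of downstream neighbors for (*, G);
    an empty set represents a null outgoing interface list.
    Returns the list of routers that processed the leave or prune.
    """
    path = []
    router, downstream = leaf, member
    while router is not None:
        oil = state.get(router, {}).get(group)
        if oil is None:
            break  # no (*, G) state here, nothing to prune
        oil.discard(downstream)
        path.append(router)
        if oil:
            break  # other members remain: no prune is sent upstream
        downstream, router = router, toward_rp[router]
    return path


toward_rp = {"Router 5": "Router 3", "Router 3": "Router 2", "Router 2": None}
state = {
    "Router 5": {"G": {"Host G"}},
    "Router 3": {"G": {"Router 5", "Host F"}},
    "Router 2": {"G": {"Router 3"}},  # Router 2 is the RP
}

host_g_leaves = leave(state, toward_rp, "Router 5", "G", "Host G")
```

After Host G leaves, the prune stops at Router 3 because Host F is still active there. Calling leave(state, toward_rp, "Router 3", "G", "Host F") afterward would empty Router 3's list and carry the prune on to the RP.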
In PIM sparse mode, the routers closest to the sources and receivers register with the RP, and so the RP knows about all the sources and receivers for any group. But it is possible that several RPs may need to be created, resulting in several PIM SM domains. Naturally, RPs don't know about multicast sources in other domains. Multicast Source Discovery Protocol (MSDP) was developed to address this issue.
ISPs offering multicast routes to their customers faced a dilemma. Naturally, they didn't want to have to rely on an RP maintained by another ISP, but they needed to access multicast traffic coming from the Internet. MSDP allows them to each run their own RP. RPs peer together using a TCP-based connection that allows them to share information about active sources inside their own domains.
ISPs have the option of which sources they will forward to other MSDP peers, or which sources they will accept, using filtering configurations. PIM SM is used to forward traffic between the RP domains.
ISPs have no problem with this peering relationship. ISP border routers already establish peering relationships with neighboring ISPs, running Border Gateway Protocol (BGP) version 4 to exchange routing information as part of the Internet architecture. ISPs with such peering relationships have regular meetings, and their inter-ISP links are part of their commercial raison d'etre. MSDP peering is simply an addition to the agenda.
Within any multicast group, it is possible for two sources to exist. Therefore, as multiple listeners join the group, they all receive multicast streams from both sources. This can be filtered out, but possibly not until the last router is reached, in which case considerable unnecessary traffic will have been transmitted. Source-Specific Multicasting (SSM) is an extension to the PIM protocol that removes that problem without having to resort to MSDP source discovery. SSM requires the network be running IGMPv3.
In SSM multicast networks, the router closest to the receiver receives a request from that receiver to join a specific multicast source. The receiver application uses the Include option to specify the required source. Once the multicast router knows the specific source of the multicast stream, it no longer needs to communicate via the RP, but can instead forward data to the receiver directly, using a source-based shortest path tree (SPT) distribution system.
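The effect of the Include option can be illustrated with a short Python sketch. It models only the filtering outcome, not IGMPv3 message handling: each receiver subscribes to exact (source, group) pairs, and traffic from any other source to the same group is not delivered. The receiver names, addresses, and function name are hypothetical.

```python
def receivers_for(subscriptions, source, group):
    """Return the receivers that asked for this exact (source, group) pair."""
    return sorted(
        receiver
        for receiver, channels in subscriptions.items()
        if (source, group) in channels
    )


# Hypothetical subscriptions; two sources send to the same group address.
subscriptions = {
    "Host A": {("10.1.1.10", "232.1.1.1")},
    "Host B": {("10.2.2.20", "232.1.1.1")},
    "Host C": {("10.1.1.10", "232.1.1.1"), ("10.2.2.20", "232.1.1.1")},
}

from_first_source = receivers_for(subscriptions, "10.1.1.10", "232.1.1.1")
from_second_source = receivers_for(subscriptions, "10.2.2.20", "232.1.1.1")
```

Host A never receives the stream from 10.2.2.20 even though it shares the group address, which is the unnecessary traffic that SSM is meant to remove.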