Two of the newer additions to the Cisco range (the 3550 and 4000 series) are sometimes described as Multilayer Switches, with the obvious implication that something beyond legacy routing is going on. With respect to the 3550 series and both the 4000 and 6500 series using the Supervisor IV engine, that something is Cisco Express Forwarding (CEF). In fact, the 3550 is advertised as supporting CEF-based multilayer switching.
CEF differs from other MLS implementations in that there is no caching in the traditional sense. Caching introduces a number of issues that need to be addressed. For example, how long should a cache stay valid? How big should a cache be permitted to grow? And how do we deal with routing topology changes that invalidate cache entries?
Well, Cisco has constantly worked to try to optimize cache behavior, but the problems remain. It seems that the only good way to do layer 3 data forwarding is to use a routing table. But that slows everything down again, right? Actually, no, not necessarily. You see, if you create a stripped-down version of the routing table, and a separate adjacency table (which is similar to a separate ARP cache), then you can get the best of both worlds. The table resides close to the interfaces (figuratively speaking), keeping data away from the busy route processor and its buses. And because the table is in communication with the main routing table, it is always as up-to-date as the main table.
It may seem wrong to refer to CEF as layer 3 switching, but layer 3 switching is so poorly defined that it is easy to see how someone could become confused. Remember, though, that the CEF process has much in common with the way we have previously described layer 3 switching.
The routing decision is taken for the first packet at the route processor, and the frame address is rewritten to allow the packet to be properly forwarded. This is true in both CEF and layer 3 switching. It is also true to say that subsequent frames are forwarded (and the MAC address rewritten) according to cached information, and that they never get to the route processor.
I guess that, arguably, the story of layer 3 switching began a long time ago with the introduction of fast switching, and it has just progressed to caches placed further and further from the route processor. Sometimes the cache is moved all the way to a separate box, namely the switch. But once modern switches incorporated IOS and the associated routing capability, the cache naturally moved back into the same housing.
The point here is that layer 3 switching really is vendor-speak, and in an ideal world we would not even have a chapter with this title—we would be calling it something like “How to Speed Up the Routing Process.” The problem is that Cisco is under pressure from other vendors who call their offerings layer 3 switching, and so the myth continues to be propagated. And as long as Cisco exams are going to have questions on this topic, we have to use the same language. The term means little enough, but once you start building boxes with IOS that have the capability to perform both switching and routing, all this business of switch-router intercommunications disappears inside the proprietary architecture, and you can’t see it anymore. So the early switches didn’t do layer 3 switching at all, the (dying) range of CatOS-based switches do it in a complex fashion (as in the first part of this chapter), and the new IOS-based switches do it wonderfully, but it’s a secret!
CEF, then, is not a first-generation attempt to speed up the forwarding process, but the most recent in a long line of such mechanisms. I think that it would assist us in placing CEF in the proper context if we looked at how we got here, so I propose to first consider the actual forwarding mechanisms that have traditionally been used by Cisco routers.
Over the years, as Cisco routers have matured from the early days of the IGS and AGS platforms, faster processors have been employed to make the forwarding decisions more quickly. Nonetheless, it is not only the processor power that determines the latency of a switch. Right up there with processing delay is the time taken to forward packets around inside the router, hence the move toward ever faster router architectures.
Designers soon realized that even with faster buses, there were still some delays associated with internal packet forwarding that might benefit from other techniques, and this gave rise to the different switching modes employable in modern routers. Because the 3550 and Supervisor IV–equipped 4000 are really routers as well as switches, these processes suddenly became relevant to those engineers studying switches.
In order to really see the progression here from legacy routing to layer 3 switching, let’s look at some of the history, specifically that of process switching (which you could easily call legacy routing), fast switching, and optimum switching (both cache-based methods for speeding up the forwarding process). Finally, we’ll look properly at CEF.
When packets are process switched, the complete packet is forwarded across the internal architecture to the route processor. This is the “heart” of the router, and is a busy place to be! Often accessed via two buses—the Cbus and the system bus—the trip through the router and out to the forwarding interface is a long one for the whole packet. At the route processor, the forwarding interface is determined and the MAC header rewrite information is applied. Delay is considerable, but there are some advantages: if the routing table holds multiple paths of equal cost to the destination, then load balancing can be carried out on a per-packet basis.
The routing process is shown in Figure 7.9. This diagram illustrates the linear nature of process switching, where a packet travels right through the “heart” of the router, resulting in slow forwarding.
Figure 7.9: Process switching flow
Like process switching, fast switching has been available on all Cisco platforms for many years, including the ubiquitous 2500 series. Fast switching involves the use of a cache on the route processor where forwarding information is maintained. The first packet in a conversation is passed to the route processor, matched against routes, and process switched. The fast switching cache is updated, and subsequent packets have only the header matched in the cache. The result is that the rest of the conversation is forwarded without being passed to the route processor.
Forwarding information is stored in the form of a binary tree, which allows bit-by-bit decision making to be carried out regarding the next hop. This binary tree may require up to 32 levels of comparison to fully match a route, but the decision is often reached much more quickly, and is considered to be a very efficient lookup mechanism.
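If you like to see such things in code, here is a rough Python sketch of the binary-tree idea (the structure and names are my own illustration, not Cisco’s implementation): each level of the tree branches on one bit of the destination address, and the deepest (longest) prefix matched along the way wins.

```python
# Illustrative sketch of a binary lookup tree, one bit per level.
# Addresses and prefixes are 32-bit integers; names are assumptions.

class Node:
    def __init__(self):
        self.children = [None, None]  # branch on bit value 0 or 1
        self.next_hop = None          # set if a prefix terminates here

def insert(root, prefix, length, next_hop):
    """Store next_hop under the first `length` bits of `prefix`."""
    node = root
    for i in range(length):
        bit = (prefix >> (31 - i)) & 1
        if node.children[bit] is None:
            node.children[bit] = Node()
        node = node.children[bit]
    node.next_hop = next_hop

def lookup(root, addr):
    """Walk bit by bit, remembering the longest prefix matched so far."""
    node, best = root, None
    for i in range(32):
        if node.next_hop is not None:
            best = node.next_hop
        bit = (addr >> (31 - i)) & 1
        if node.children[bit] is None:
            break
        node = node.children[bit]
    else:
        if node.next_hop is not None:
            best = node.next_hop
    return best
```

A /24 route needs 24 comparisons to install but, as the text says, a lookup can stop early as soon as no deeper branch exists.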
Entries in the fast cache are created at the beginning of a conversation, and therefore suffer the perennial problem of caches—how do updates to other information, such as the ARP cache, affect the cached information? The answer is that they don’t, leaving the possibility that changes in the ARP cache render the fast cache out of date and incorrect. In that case, the cache must be re-created. The second problem with fast switching is that the cache can hold only a single route to a destination, so any load sharing must be performed on a conversation-by-conversation basis (sometimes called per-destination load sharing), with a cache entry for each conversation.
Nonetheless, fast switching is perhaps ten times faster than process switching and is widely used.
The fast switching tree is shown in Figure 7.10. Each bit in the destination address is compared with the table, and because each possibility is either a one or a zero, a single match is gained with every pass.
Figure 7.10: Fast switching tree
Optimum switching also relies on a caching mechanism, but there are important differences from fast switching. The first difference is in the operation of the tree. Instead of a binary tree, with each level being a single comparison (a one or a zero in the binary string), optimum switching employs a 256-way multiway tree (mtree). Each level selects on a single octet of the destination address, so a maximum of four lookup probes is needed to find any target.
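The same sketch reworked as a 256-way mtree makes the difference obvious (again, this is my own illustration, not Cisco’s code): each level consumes a whole octet, so no lookup ever needs more than four probes.

```python
# Illustrative 256-way mtree: one octet of the destination per level,
# so at most four probes per lookup. Names and layout are assumptions.

def new_node():
    return {"children": [None] * 256, "next_hop": None}

def insert(root, octets, next_hop):
    """octets: the leading octets of the prefix, e.g. [192, 168, 1] for a /24."""
    node = root
    for octet in octets:
        if node["children"][octet] is None:
            node["children"][octet] = new_node()
        node = node["children"][octet]
    node["next_hop"] = next_hop

def lookup(root, addr):
    """addr: the four octets of an address; longest match in <= 4 probes."""
    node, best = root, None
    for octet in addr:
        node = node["children"][octet]
        if node is None:
            break
        if node["next_hop"] is not None:
            best = node["next_hop"]
    return best
```

The cost of the speed is memory: every node carries 256 child slots, most of them empty.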
Optimum switching is very fast, but it still suffers from the same cache-invalidation problems, so entries must be aged out regularly, interrupting the optimum flow while the cache is rebuilt from requests to the route processor.
The optimum switching tree is shown in Figure 7.11. Each octet in the 32-bit dotted-decimal address is matched individually, resulting in a far faster lookup process.
Figure 7.11: Optimum switching tree
At last we come to CEF. CEF maintains two separate but related tables: the forwarding table, which contains routing information, and the adjacency table, which contains the layer 2 next-hop addressing. CEF uses a trie instead of a tree. No, that’s not a misprint. A trie is a tree-like structure in which the nodes do not themselves contain the forwarding data; instead, each node holds a pointer to the data, which in CEF’s case lives in the separate adjacency table.
Because the data is held apart from the lookup structure, the lookup is resolved afresh for every packet, so different equal-cost routes can be selected for successive packets, enabling per-packet load sharing. For the same reason, if the underlying information changes, the very next lookup uses the most up-to-date entry.
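To make the pointer idea concrete, here is a deliberately simplified Python sketch (my own illustration, with invented names and addresses, not Cisco’s data structures): the forwarding entry stores pointers into a separate adjacency table rather than the rewrite information itself, so adjacency updates take effect on the very next packet, and multiple pointers per prefix give per-packet load sharing.

```python
# Illustrative only: forwarding entries point into a separate adjacency
# table. All addresses, interfaces, and MACs here are invented examples.
from itertools import cycle

adjacency = {                      # layer 2 rewrite info, keyed by next hop
    "10.1.1.1": ("Fa0/1", "00d0.bc01.aaaa"),
    "10.1.2.1": ("Fa0/2", "00d0.bc01.bbbb"),
}

fib = {                            # prefix -> rotating pointers into adjacency
    "192.168.1.0/24": cycle(["10.1.1.1", "10.1.2.1"]),
}

def forward(prefix):
    """Resolve the adjacency pointer at forwarding time, per packet."""
    next_hop = next(fib[prefix])   # pick the next equal-cost path
    return adjacency[next_hop]     # current rewrite info, never a stale copy
```

Successive calls alternate between the two equal-cost adjacencies, and because `adjacency` is consulted per packet, updating it is seen immediately—there is no cached copy to invalidate.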
The CEF forwarding process is shown in Figure 7.12. This simple diagram illustrates that the lookup is much swifter, because the 256-way data structure is a highly efficient lookup method and is directly associated with the adjacency table.
Figure 7.12: CEF forwarding process
The result of CEF forwarding is a much higher throughput. True, a lot of this increased speed is due to proprietary architecture inside the switch or router, including the increased use of ASICs and specialized buses and memory arrangements. But it is also true that packets no longer need to be forwarded across internal buses to the busy route processor, which is where most of the router latency is introduced. And there are other benefits to CEF, such as the ability to support packet-by-packet load sharing, which cannot be achieved using cached entries as in fast or optimum switching.
To configure CEF on a 3550 switch, you first have to enable IP routing. Remember that because this is a multilayer switch, only the layer 2 switching processes are enabled by default, to maintain the plug-and-play nature of all switches. Use the global command ip routing to enable IP routing, and the global command ip cef to enable CEF.
Terry_3550#
Terry_3550#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
Terry_3550(config)#ip routing
Terry_3550(config)#ip cef ?
  accounting          Enable CEF accounting
  load-sharing        Load sharing
  table               Set CEF forwarding table characteristics
  traffic-statistics  Enable collection of traffic statistics
  <cr>
Next, you have to convert the layer 2 interface to layer 3. To do this, use the interface command no switchport. Enable IP on the interface using the standard command, and then enable CEF on the interface using the ip route-cache cef interface command.
Terry_3550#
Terry_3550#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
Terry_3550(config)#int fa 0/1
Terry_3550(config-if)#no switchport
Terry_3550(config-if)#ip address 192.168.1.1 255.255.255.0
Terry_3550(config-if)#no shut
Terry_3550(config-if)#ip route-cache cef
Terry_3550(config-if)#^Z
Finally, you can confirm that CEF is running on the interface by using the show ip interface command:
Terry_3550#show ip int fa 0/1
FastEthernet0/1 is down, line protocol is down
  Internet address is 192.168.1.1/24
  Broadcast address is 255.255.255.255
[output cut]
  IP fast switching is enabled
  IP fast switching on the same interface is disabled
  IP Flow switching is disabled
  IP CEF switching is enabled
  IP CEF Fast switching turbo vector
  IP multicast fast switching is enabled
[output cut]
Terry_3550#
Any entries in the CEF table will be displayed in the following format, using the show ip cef command:
Terry_3550#sho ip cef fastEthernet 0/1
Prefix              Next Hop            Interface
Terry_3550#