5.2 The IP Layer: Routing


Interception caching begins at the IP (network) layer, where all sorts of IP packets are routed between nodes. Here, a router or switch recognizes HTTP packets and diverts them to a cache instead of forwarding them to their original destination. There are a number of ways to accomplish the interception:

Inline

An inline cache is a device that combines both web caching and routing (or bridging) into a single piece of equipment. Inline caches usually have two or more network interfaces. Products from Cacheflow and Network Appliance can operate in this fashion, as can Unix boxes running Squid.

Layer four switch

Switching is normally a layer two (datalink layer) activity. A layer four switch, however, can make forwarding decisions based on upper layer characteristics, such as IP addresses and TCP port numbers. In addition to HTTP redirection, layer four switches are often used for server load balancing.

Web Cache Coordination Protocol

WCCP is an encapsulation protocol developed by Cisco Systems that requires implementation in both a router (or maybe even a switch) and the web cache. Cisco has implemented two versions of WCCP in their router products; both are openly documented as Internet Drafts. Even so, use of the protocol in a product may require licensing from Cisco.

Cisco policy routing

Policy routing refers to a router's ability to make forwarding decisions based on more than the destination address. We can use this to divert packets based on destination port numbers.

5.2.1 Inline Caches

An inline cache is a single device that performs both routing (or bridging) and web caching. Such a device is placed directly in the network path so it captures HTTP traffic passing through it. HTTP packets are processed by the caching application, while other packets are simply routed between interfaces.

Not many caching products are designed to operate in this manner. An inline cache is a rather obvious single point of failure. Let's face it, web caches are relatively complicated systems and therefore more likely to fail than a simpler device, such as an Ethernet switch. Most caching vendors recommend using a third-party product, such as a layer four switch, when customers need high reliability.

You can build an inexpensive inline cache with a PC, FreeBSD or Linux, and Squid. Any Unix system can route IP packets between two or more network interfaces. Add to that a web cache plus a little packet redirection (as described in Section 5.3), and you've got an inline interception cache. Note that such a system does not have very good failure-mode characteristics. If the system goes down, it affects all network traffic, not just caching. If Squid goes down, all web traffic is affected. It should be possible, however, to develop some clever scripts that monitor Squid's status and alter the packet redirection rules if necessary.
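Such a monitoring script could be sketched as follows. This is a minimal Python sketch, not a production watchdog; it assumes Squid listens on its default port 3128 and that redirection is toggled with hypothetical FreeBSD ipfw rule-set commands (the actual mechanism depends on your packet filter, as described in Section 5.3):

```python
import socket
import subprocess

SQUID_PORT = 3128  # Squid's default proxy port (assumption)

def squid_is_healthy(host="127.0.0.1", port=SQUID_PORT, timeout=2.0):
    """Return True if Squid accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def redirection_command(healthy, redirecting):
    """Decide which (hypothetical) firewall command to run, if any."""
    if healthy and not redirecting:
        return ["ipfw", "set", "enable", "1"]   # resume diverting port 80
    if not healthy and redirecting:
        return ["ipfw", "set", "disable", "1"]  # pass traffic through untouched
    return None  # no state change needed

def check_once(redirecting):
    """One watchdog pass: probe Squid and adjust redirection if needed."""
    healthy = squid_is_healthy()
    cmd = redirection_command(healthy, redirecting)
    if cmd:
        subprocess.run(cmd, check=False)
    return healthy
```

Run from cron or a loop, this keeps the Squid failure case from taking web traffic down with it, though a hardware or kernel failure still cuts off everything.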

InfoLibria has a product that fits best in the inline category. They actually use two tightly coupled devices to accomplish inline interception caching. The DynaLink is a relatively simple device that you insert into a 100BaseT Ethernet segment. Two of the DynaLink's ports are for the segment you are splitting. The other two deliver packets to and from an attached DynaCache. The DynaLink is a layer one (physical) device. It does not care about Ethernet (layer two) packets or addresses. When the DynaLink is on, electromechanical switches connect the first pair of ports to the second. The DynaLink then skims the HTTP packets off to the cache while bridging all other traffic through to the other side. If the DynaLink loses power or detects a failure of the cache, the electromechanical switches revert to the passthrough position.

If you want to use an inline caching configuration, carefully consider your reliability requirements and the failure characteristics of individual products. If you choose an inexpensive computer system and route all your traffic through it, be prepared for the fact that a failed disk drive, network card, or power supply can totally cut off your Internet traffic.

5.2.2 Layer Four Switches

Recently, a new class of products known as layer four switches [2] has become widely available. The phrase "layer four" refers to the transport layer of the OSI reference model; it indicates the switch's ability to forward packets based on more than just IP addresses. Switches generally operate at layer two (datalink) and don't know or care about IP addresses, let alone transport layer port numbers. Layer four switches, on the other hand, peek into TCP/IP headers and make forwarding decisions based on TCP port numbers, IP addresses, etc. These switches also have other intelligent features, such as load balancing and failure detection. When used for interception caching, a layer four switch diverts HTTP packets to a connected web cache. All of the other (non-HTTP) traffic is passed through.

[2] You might also hear about "layer seven" or "content routing" switches. These products have additional features for looking even deeper into network traffic. Unfortunately, there is no widely accepted term to describe all the smart switching products.

Layer four switches also have very nice failure detection features (a.k.a. health checks). If the web cache fails, the switch simply disables redirection and passes the traffic through normally. Similarly, when the cache comes back to life, the switch once again diverts HTTP packets to it. Layer four switches can monitor a device's health in a number of ways, including the following:

ARP

The switch makes Address Resolution Protocol (ARP) requests for the cache's IP address. If the cache doesn't respond, then it is probably powered off, disconnected, or experiencing some other kind of serious failure.

Of course, this only works for devices that are "layer two attached" to the switch. The cache might be on a different subnet, in which case ARP is not used.

ICMP echo

ICMP echo requests, a.k.a. "pings," also test the cache's low-level network configuration. As with ARP, ICMP tells the switch if the cache is on the network. However, ICMP can be used when the switch is on a different subnet.

ICMP round-trip time measurements can sometimes provide additional health information. If the cache is overloaded, or the network is congested, the time between ICMP echo and reply may increase. When the switch notices such an increase, it may send less traffic to the cache.

TCP

ARP and ICMP simply tell the switch that the cache is on the network. They don't, for example, indicate that the application is actually running and servicing requests. To check this, the switch sends connection probes to the cache. If the cache accepts the connection, that's a good indication that the application is running. If, however, the cache's TCP stack generates a reset message, the application cannot handle real traffic.

HTTP

In some cases, even establishing a TCP connection is not sufficient evidence that the cache is healthy. A number of layer four/seven products can send the cache a real HTTP request and analyze the response. For example, unless the HTTP status code is 200 (OK), the switch marks the cache as "down."

SNMP

Layer four switches can query the cache with SNMP. This can provide a variety of information, such as recent load, number of current sessions, service times, and error counts. Furthermore, the switch may be able to receive SNMP traps from the cache when certain events occur.
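The TCP and HTTP checks can be illustrated in a few lines. The following Python sketch shows the idea, not any vendor's actual probe; it assumes the cache answers plain HTTP on the probed port:

```python
import http.client
import socket

def tcp_probe(host, port, timeout=2.0):
    """Layer four check: does the cache accept a TCP connection at all?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def http_probe(host, port, url="/", timeout=2.0):
    """Layer seven check: send a real request and insist on a 200 reply."""
    try:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        conn.request("GET", url)
        ok = conn.getresponse().status == 200
        conn.close()
        return ok
    except OSError:
        return False
```

A refused connection (TCP reset) makes both probes return False, which is exactly the signal a switch uses to stop diverting traffic to that cache.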

Load balancing is another useful feature of layer four switches. When the load placed on your web caches becomes too large, you can add additional caches and the switch will distribute the load between them. Switches often support numerous load balancing techniques, not all of which are necessarily good for web caches:

Round-robin

A counter is kept for each cache and incremented for every connection sent to it. The next request is sent to the cache with the lowest counter.

Least connections

The switch monitors the number of active connections per cache. The next request is sent to the cache with the fewest active connections.

Response time

The switch measures the response time of each cache, perhaps based on the time it takes to respond to a connection request. The next request is sent to the cache with the smallest response time.

Packet load

The switch monitors the number of packets traversing each cache's network port. The next request is sent to the cache with the lowest packet load.

Address hashing

The switch computes a hash function over the client and/or server IP addresses. The hash function returns an integer value, which is then taken modulo the number of caches. Thus, if I have three caches, each cache receives requests for one-third of the IP address space.

URL hashing

Address hashing doesn't always result in a well-balanced distribution of load. Some addresses may be significantly more popular than others, causing one cache to receive more traffic than the others. URL hashing is more likely to spread the load evenly. However, it also requires more memory and CPU capacity.

With address hashing, the forwarding decision can be made upon receipt of the first TCP packet. With URL hashing, the decision cannot be made until the entire URL has been received. Usually, URLs are quite small, less than 100 bytes, but they can be much larger. If the switch doesn't receive the full URL in the first data packet, it must store the incomplete URL and wait for the remaining piece.

If you have a cluster of caches, URL hashing or destination address hashing are the best choices. Both ensure that the same request always goes to the same cache. This partitioning maximizes your hit ratio and your disk utilization because a given object is stored in only one cache. The other techniques are likely to spread requests around randomly so that, over time, all of the caches come to hold the same objects. We'll talk more about cache clusters in Chapter 9.
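To make the hashing idea concrete, here is a Python sketch of both schemes. The cache names and the MD5-based hash are purely illustrative; real switches use their own (usually undocumented) hash functions:

```python
import hashlib

CACHES = ["cache-1", "cache-2", "cache-3"]  # hypothetical cluster

def _hash(key):
    """Turn a string into a stable integer (MD5 chosen for illustration)."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big")

def pick_cache_by_address(dest_ip, caches=CACHES):
    """Destination address hashing: decided on the first TCP packet."""
    return caches[_hash(dest_ip) % len(caches)]

def pick_cache_by_url(url, caches=CACHES):
    """URL hashing: finer-grained and usually better balanced, but the
    switch must buffer packets until the full URL has arrived."""
    return caches[_hash(url) % len(caches)]
```

Because both functions are deterministic, a given origin server (or URL) always maps to the same cache, which is what keeps each object on only one disk.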

Table 5-1 lists switch products and vendors that support layer four redirection. The Linux Virtual Server is an open source solution for turning a Linux box into a redirector and/or load balancer.

Table 5-1. Switches and Products That Support Web Redirection

Vendor                         Product Line           Home Page
Alteon (bought by Nortel)      AceSwitch              http://www.alteonwebsystems.com
Arrowpoint (bought by Cisco)   Content Smart Switch   http://www.arrowpoint.com
Cisco                          Local Director         http://www.cisco.com
F5 Labs                        Big/IP                 http://www.f5.com
Foundry                        ServerIron             http://www.foundrynet.com
Linux Virtual Server           LVS                    http://www.linuxvirtualserver.org
Radware                        Cache Server Director  http://www.radware.com
Riverstone Networks            Web Switch             http://www.riverstonenet.com

These smart switching products have many more features than I've mentioned here. For additional information, please visit the products' home pages or http://www.lbdigest.com.

5.2.3 WCCP

Cisco invented WCCP to support interception caching with their router products. At the time of this writing, Cisco has developed two versions of WCCP. Version 1 has been documented within the IETF as an Internet Draft. The most recent version is dated July 2000. It's difficult to predict whether the IETF will grant any kind of RFC status to Cisco's previously proprietary protocols. Regardless, most of the caching vendors have already licensed and implemented WCCPv1. Some vendors are licensing Version 2, which was also recently documented as an Internet Draft. The remainder of this section refers only to WCCPv1, unless stated otherwise.

WCCP consists of two independent components: the control protocol and traffic redirection. The control protocol is relatively simple, with just three message types: HERE_I_AM, I_SEE_YOU, and ASSIGN_BUCKETS. A proxy cache advertises itself to its home router with the HERE_I_AM message. The router responds with an I_SEE_YOU message. The two devices continue exchanging these messages periodically to monitor the health of the connection between them. Once the router knows the cache is running, it can begin diverting traffic.

As with layer four switches, WCCP does not require that the proxy cache be connected directly to the home router. Since there may be additional routers between the proxy and the home router, diverted packets are encapsulated with GRE (Generic Routing Encapsulation, RFC 2784). WCCP is hardcoded to divert only TCP packets with destination port 80. The encapsulated packet is sent to the proxy cache. Upon receipt of the GRE packet, the cache strips off the encapsulation headers and pretends the TCP packet arrived there normally. Packets flowing in the reverse direction, from the cache to the client, are not GRE-encapsulated and don't necessarily flow through the home router.

WCCP supports cache clusters and load balancing. A WCCP-enabled router can divert traffic to many different caches. [3] In its I_SEE_YOU messages, the router tells each cache about all the other caches. The one with the lowest numbered IP address nominates itself as the designated cache. The designated cache is responsible for coming up with a partitioning scheme and sending it to the router with an ASSIGN_BUCKETS message. The buckets (really a lookup table with 256 entries) map hash values to particular caches. In other words, the value for each bucket specifies the cache that receives requests for the corresponding hash value. The router calculates a hash function over the destination IP address, looks up the cache index in the bucket table, and sends an encapsulated packet to that cache. The WCCP documentation is vague on a number of points. It does not specify the hash function, nor how the designated cache should divide up the load. WCCPv1 can support up to 32 caches associated with one router.

[3] Each cache can have only one home router, however.
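The bucket mechanism can be sketched as follows. Since WCCPv1 specifies neither the hash function nor the assignment scheme, both choices here (MD5 and round-robin assignment) are illustrative only:

```python
import hashlib

def assign_buckets(num_caches, num_buckets=256):
    """The designated cache builds the table sent in ASSIGN_BUCKETS;
    round-robin is one simple way to divide up the load."""
    return [i % num_caches for i in range(num_buckets)]

def cache_for_packet(dest_ip, buckets):
    """The router hashes the destination address into one of 256 buckets
    and forwards the GRE-encapsulated packet to the cache assigned there."""
    bucket = hashlib.md5(dest_ip.encode()).digest()[0]  # value in 0..255
    return buckets[bucket]
```

The important property is that every packet for a given origin server hashes to the same bucket, and therefore the same cache, until the designated cache issues a new assignment.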

WCCP also supports failure detection. The cache sends HERE_I_AM messages every 10 seconds. If the router does not receive at least one HERE_I_AM message in a 30-second period, the cache is marked as unusable. Requests are not diverted to unusable caches. Instead, they are sent along the normal routing path towards the origin server. The designated cache can choose to reassign the unusable cache's buckets in a future ASSIGN_BUCKETS message.
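The timing rule is simple enough to state in code. Here is a sketch of the router's bookkeeping (the names are invented; only the 10- and 30-second figures come from the protocol):

```python
HERE_I_AM_INTERVAL = 10.0  # seconds between a cache's advertisements
USABLE_TIMEOUT = 30.0      # router marks the cache unusable after this

def cache_usable(last_here_i_am, now, timeout=USABLE_TIMEOUT):
    """True while a HERE_I_AM has been heard within the timeout window;
    once False, requests follow the normal route to the origin server."""
    return (now - last_here_i_am) <= timeout
```

Since the advertisement interval is a third of the timeout, a cache can miss two consecutive HERE_I_AM messages before the router gives up on it.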

WCCPv1 is supported in Cisco's IOS versions 11.1(19)CA, 11.1(19)CC, 11.2(14)P, and later. WCCPv2 is supported in all 12.0 and later versions. Most IOS 12.x versions also support WCCPv1, but 12.0(4)T and earlier do not. Be sure to check whether your Cisco hardware supports any of these IOS versions.

When configuring WCCP in your router, you should refer to your Cisco documentation. Here are the basic commands for IOS 11.x:

ip wccp enable
!
interface fastethernet0/0
 ip wccp web-cache redirect

Use the following commands for IOS 12.x:

ip wccp version 1
ip wccp web-cache
!
interface fastethernet0/0
 ip wccp web-cache redirect out

Notice that with IOS 12.x you need to specify which WCCP version to use. This command is only available in IOS releases that support both WCCP versions, however. The fastethernet0/0 interface may not be correct for your installation; use the name of the router interface that connects to the outside Internet. Note that packets are redirected on their way out of an interface. IOS does not yet support redirecting packets on their way in to the router. If needed, you can use access lists to prevent redirecting requests for some origin server or client addresses. Consult the WCCP documentation for full details.

5.2.4 Cisco Policy Routing

Interception caching can also be accomplished with Cisco policy routing. The next-hop for an IP packet is normally determined by looking up the destination address in the IP routing table. Policy routing allows you to set a different next-hop for packets that match a certain pattern, specified as an IP access list. For interception caching, we want to match packets destined for port 80, but we do not want to change the next-hop for packets originating from the cache. Thus, we have to be a little bit careful when writing the access list. The following example does what we want:

access-list 110 deny   tcp host 10.1.2.3 any eq www
access-list 110 permit tcp any any eq www

10.1.2.3 is the address of the cache. The first line excludes packets with a source address 10.1.2.3 and a destination port of 80 (www). The second line matches all other packets destined for port 80. Once the access list has been defined, you can use it in a route-map statement as follows:

route-map proxy-redirect permit 10
 match ip address 110
 set ip next-hop 10.1.2.3

Again, 10.1.2.3 is the cache's address. This is where we want the packets to be diverted. The final step is to apply the policy route to specific interfaces:

interface Ethernet0
 ip policy route-map proxy-redirect

This instructs the router to check the policy route we specified for packets received on interface Ethernet0.

On some Cisco routers, policy routing may degrade overall performance of the router. In some versions of the Cisco IOS, policy routing requires main CPU processing and does not take advantage of the "fast path" architecture. If your router is moderately busy in its normal mode, policy routing may impact the router so much that it becomes a bottleneck in your network.

Some amount of load balancing can be achieved with policy routing. For example, you can apply a different next-hop policy to each of your interfaces. It might even be possible to write a set of complicated access lists that make creative use of IP address netmasks.

Note that policy routing does not support failure detection. If the cache goes down or stops accepting connections for some reason, the router blindly continues to divert packets to it. Policy routing is only a mediocre replacement for sophisticated layer four switching products. If your production environment requires high availability, policy routing is probably not for you.



Web Caching
ISBN: 156592536X
Year: 2001
Pages: 160