For years, Internet traffic has been growing at a breakneck pace. Because of the sheer amount of traffic coursing across the Internet and intranets, congestion has gotten worse. ISPs and organizations are being challenged to deal with this problem, because it is difficult to ensure QoS and deliver content to clients efficiently and affordably. By localizing traffic patterns on your existing network, you get a double bonus: not only is content delivered more quickly, but the freed resources are also available for additional traffic.
Content delivery is accelerated by locally filling content requests, instead of having to go across the Internet to fetch the information. This ensures the content is delivered quickly, and you don't have to worry about bottlenecks beyond your control. Traffic localization reduces the amount of redundant traffic on your WAN connections. This allows additional network resources for more users and for new services (like VoIP, for example).
In order to achieve this solution, it is necessary to have a network that enables transparent redirection technologies, like Web Cache Communication Protocol (WCCP). With this technology in place, network caches are added to key locations in the network to realize the traffic localization solution. Network caches store frequently accessed content and then serve it locally in response to later requests for the same content, without having to go back across the Internet or WAN to get it. This relieves congestion because repeated transmissions no longer need to cross the WAN. Figure 11-6 illustrates this process:
Using a Web browser, a user requests a Web page.
The network examines this request and then redirects it to a local network cache. This is done transparently so the user is unaware of the redirection.
The cache may not have the Web page stored. In that event, the cache makes its own request of the original Web server.
The original Web server delivers the requested Web page to the cache, which then resends it to the user. The cache stores a copy of the page in case it is needed later.
When another user requests the same Web page, this time, the cache has the page on hand and the request is fulfilled locally.
The cache delivers the Web page to the user locally. This eliminates the need to use WAN bandwidth and delivers the content much more quickly.
Figure 11-6: Web caching stores frequently accessed Web pages locally
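The steps in Figure 11-6 can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the `ContentCache` class, the `fake_origin` function, and the example URL are hypothetical names invented here, and a real content engine also handles expiry, storage limits, and redirection, none of which is shown.

```python
from typing import Callable, Dict, List


class ContentCache:
    """Toy stand-in for a network cache: serves hits locally, fetches misses once."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._store:
            # The page is on hand, so the request is fulfilled locally.
            self.hits += 1
            return self._store[url]
        # Cache miss: ask the original Web server, then keep a copy for later.
        self.misses += 1
        body = self._fetch(url)
        self._store[url] = body
        return body


origin_requests: List[str] = []


def fake_origin(url: str) -> bytes:
    """Hypothetical origin server; records every request that crosses the WAN."""
    origin_requests.append(url)
    return f"<html>{url}</html>".encode()


cache = ContentCache(fake_origin)
first = cache.get("http://example.com/index.html")   # miss: goes to the origin
second = cache.get("http://example.com/index.html")  # hit: served from the cache
```

Only the first request reaches `fake_origin`; the second is answered from the local store, which is the WAN saving the figure describes.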
Though we mentioned several CDN protocols earlier in this chapter, Cisco's solution is built on its own proprietary protocol for enabling transparent caching throughout a network: Web Cache Communication Protocol (WCCP). The protocol works by having routers transparently redirect Web traffic to content engines. The first version of WCCP allowed communication with just one router, did not support multicasting, and was limited to HTTP traffic. The current version, WCCP v2, resolves the shortcomings of version 1.
WCCP uses UDP port 2048, operating through a generic routing encapsulation (GRE) tunnel between the router and the content engine (or content engines). Whether the content comes from the content engine or from the source Web server, the HTTP packets are delivered to the client unaltered.
The content engines maintain a list of routers with which they have WCCP communications. When the content engine identifies itself to the routers, it shares its list of routers. In turn, the routers reply with a list of content engines that they see in the service group. As soon as all the devices know about one another, one content engine becomes the lead engine and determines in what way packets will be redirected.
The content engines send heartbeats to the routers every ten seconds through a GRE tunnel. If there is a cluster of content engines and one of the engines fails to send a heartbeat within 30 seconds, the router informs the lead content engine that the engine is missing and its resources must be reallocated to the remaining engines.
Note: We'll talk more about the specific resources in a cluster and what clustering is later in this section.
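The heartbeat timing described above can be modeled as follows. This is a minimal sketch, not the WCCP message format: the `EngineMonitor` class and the engine names are hypothetical, and only the ten-second interval and 30-second timeout come from the text.

```python
from typing import Dict, List

HEARTBEAT_INTERVAL_S = 10  # engines send a heartbeat every ten seconds
FAILURE_TIMEOUT_S = 30     # an engine silent for this long is reported missing


class EngineMonitor:
    """Hypothetical router-side bookkeeping of the last heartbeat per engine."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, engine_id: str, now: float) -> None:
        self._last_seen[engine_id] = now

    def missing_engines(self, now: float) -> List[str]:
        # Engines whose last heartbeat is at least FAILURE_TIMEOUT_S old.
        return sorted(
            engine
            for engine, seen in self._last_seen.items()
            if now - seen >= FAILURE_TIMEOUT_S
        )


monitor = EngineMonitor()
monitor.heartbeat("ce1", now=0.0)
monitor.heartbeat("ce2", now=0.0)
# ce1 keeps sending heartbeats on schedule; ce2 goes silent after time 0.
for t in (10.0, 20.0, 30.0):
    monitor.heartbeat("ce1", now=t)

missing = monitor.missing_engines(now=31.0)
```

In this run `ce2` is the only engine reported missing, at which point the router would ask the lead content engine to reallocate its resources.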
An obvious concern when using caching is the issue of freshness. How can you be sure that the page you're looking at contains the most current information? That is, what prevents the content engine from storing last Friday's visit to a newspaper Web site in perpetuity?
Each Web page is made up of a number of Web objects, and each object has its own caching parameters that are established and managed by the Web page authors and HTTP standards. So, for example, our newspaper Web site will have new content, but things like the toolbars, navigation buttons, and the masthead are likely to be cacheable. As such, when the content engine stores the newspaper's Web site, it stores the elements that are not likely to change, then goes out to cull the new content. Content engines deliver fresh content by obeying HTTP caching standards (which we'll talk about in a moment) and allowing the administrators to decide when content should be refreshed from the source Web servers.
Web authors can establish to what degree to allow caching. In HTTP, caching parameters for each object on a Web site can be managed. Content can be set up for caching based on three settings:
The content is non-cacheable.
The content is cacheable (the default setting).
The content is cacheable, but it will expire on a given date.
HTTP provides a freshness mechanism called If-Modified-Since (IMS), which ensures cached data is up to date. Content engines send an IMS request to the destination Web server when the engine receives a request for cached content that has expired, or when it receives IMS requests from clients where the cached content is older than a percentage of its maximum age. If the destination Web server determines that the content has not been modified since the engine stored it, it sends a message to the content engine to go ahead and serve its stored data to the client. If the content has been updated, and the cached copy is therefore no longer fresh, the content engine retrieves the new content.
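The IMS exchange can be sketched with a simulated origin server. The function and class names here are hypothetical, and no real network traffic is involved; the sketch relies only on the standard HTTP behavior that an unmodified resource is answered with status 304 (Not Modified) and no body, while a modified one is answered with 200 and the new content.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass
class WebObject:
    body: bytes
    last_modified: datetime


def origin_respond(
    obj: WebObject, if_modified_since: Optional[datetime]
) -> Tuple[int, Optional[bytes]]:
    """Simulated origin: 304 with no body if unchanged, otherwise 200 with the body."""
    if if_modified_since is not None and obj.last_modified <= if_modified_since:
        return 304, None
    return 200, obj.body


def revalidate(cached: WebObject, origin: WebObject) -> WebObject:
    """Content-engine side: send an IMS request and keep or replace the stored copy."""
    status, body = origin_respond(origin, cached.last_modified)
    if status == 304 or body is None:
        return cached  # the stored copy may be served as-is
    return WebObject(body=body, last_modified=origin.last_modified)


friday = datetime(2024, 1, 5, tzinfo=timezone.utc)
monday = datetime(2024, 1, 8, tzinfo=timezone.utc)

cached_copy = WebObject(b"friday edition", friday)
unchanged = revalidate(cached_copy, WebObject(b"friday edition", friday))
refreshed = revalidate(cached_copy, WebObject(b"monday edition", monday))
```

The first call keeps the cached copy; the second replaces it because the origin's copy carries a later modification date.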
But freshness is not just in the hands of the Web page creators. Network administrators can control the freshness of Web objects in their content engines. Content engines have a parameter called the freshness factor that can be configured by the network administrator.
This determines just how quickly content expires. When an object is stored in the cache, a TTL value is computed. That value is:
TTL = (current date – last modified date) * freshness factor
If the content has expired, based on the aforementioned formula, the Web data is refreshed in the cache the next time an IMS request is issued.
To establish a conservative freshness policy, the freshness factor can be set to a small value (such as 0.05) so that objects expire more quickly. This will, however, cause more bandwidth to be consumed as pages are refreshed. Setting the freshness factor higher causes less bandwidth to be consumed, at the cost of a greater chance of serving stale content.
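As a worked example of the TTL formula, the following sketch computes the TTL for an object last modified ten days before it was cached. The function name and the dates are illustrative assumptions; only the formula and the 0.05 factor come from the text.

```python
from datetime import datetime, timedelta


def compute_ttl(now: datetime, last_modified: datetime, freshness_factor: float) -> timedelta:
    """TTL = (current date - last modified date) * freshness factor."""
    return (now - last_modified) * freshness_factor


cached_at = datetime(2024, 1, 11, 12, 0, 0)
last_modified = cached_at - timedelta(days=10)

short_ttl = compute_ttl(cached_at, last_modified, 0.05)  # about 12 hours
long_ttl = compute_ttl(cached_at, last_modified, 0.5)    # about 5 days
```

With a factor of 0.05 the object expires after roughly half a day, so it is revalidated often; with 0.5 the same object is considered fresh for about five days.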
Note: Freshness can also be managed by the client. The client can click the browser's Reload or Refresh button. This will cause a series of IMS requests asking for Web objects that have been refreshed. Alternatively, SHIFT-REFRESH or SHIFT-RELOAD causes content engines to be bypassed and have the content sent directly to the client from the Web server.
There are three primary ways in which content can be cached in Cisco's solution: transparent, proxy-style, and reverse proxy-style. The most common means of caching utilizes the transparent style of caching. However, the other methods are also useful to understand, as they may be more relevant and useful for your organization's needs.
The first method of caching is known as transparent caching. We outlined the steps involved in this type of caching already. In essence, a Web browser requests a Web page. That request first runs through the WCCP-enabled router, where it is analyzed. If the request qualifies for redirection, the router sends it to a local content engine. If the content engine has the desired content cached, it delivers the content back to the browser. If the content isn't cached, the content engine goes to the Internet to fetch the page, stores it, and delivers it to the browser.
Because this method utilizes a WCCP-enabled router, the content engine functions transparently to the browser. Clients need not be configured to be pointed to a specific proxy cache. As the content engine is transparent to the network, the router acts in a "normal" role for traffic that does not have to be redirected.
Using a CSS switch, however, the client's request need never reach the router. In larger deployments, it makes better sense to have a CSS switch to make decisions as to whether particular content has already been cached locally. Furthermore, large deployments might employ several content engines and data would be stored in each device, based on a uniform resource locator (URL).
HTTPS: The whole process of caching seems straightforward enough, especially if someone is requesting static content. However, there is a slew of dynamic content on the Internet, on your intranet, or traversing a WAN link. For instance, there are times when a user's Web page request will have to go to the intended Web server. The concept of caching is not thrown out the window in these cases. Let's consider what happens when a Secure HTTP (HTTPS) session is initiated:
The user initiates an HTTPS session. It is taken by the WCCP-enabled router and sent on to the content engine.
The content engine, configured as an HTTPS server, receives the request from the router.
A Secure Sockets Layer (SSL) certificate is obtained from the destination Web server by the content engine and then sent back (through the content engine) to the client to negotiate an SSL connection.
The client sends HTTPS requests within the SSL connection.
The content engine analyzes the request and performs normal HTTP request processing. If the content is in the content engine's cache (also known as a cache hit), it sends the desired content back using the SSL connection.
If the content is not stored within the content engine (also known as a cache miss), it establishes a connection to the destination Web server and requests the content through the SSL connection.
If possible, the content engine will cache the information and then send a copy back to the client through the SSL connection.
Content Bypassing: There are times when the content engine simply has to be avoided in order to get the session that the client needs. Though there are mechanisms in place for establishing HTTPS sessions, not all secure conversations can be accepted through the content engine.
Some Web sites rely on IP authentication and, as such, won't allow the content engine to connect on the client's behalf. Content engines can use authentication traffic bypass to avoid service disruption. Authentication traffic bypass is used automatically to create a dynamic access list for client/server pairs.
When a client/server pair goes into authentication bypass, it is bypassed for a set amount of time. The default setting is 20 minutes, but that value can be changed, depending on the organization's need.
The nontransparent, or proxy, style of caching is known to the client, whereas transparent caching occurs without the client's knowledge. With proxy caching, the proxy cache performs the DNS lookup on the client's behalf. Proxies are used for different protocols, like HTTP, HTTPS, FTP, and so forth. Consider the network in Figure 11-7.
Figure 11-7: Proxy-style caching works on the client's behalf
The client has been configured to use a proxy server for HTTP requests. Normally, port 8080 is used, but different ports can be configured for the protocol you wish to manage.
The IP address of the proxy is also configured on the client. In the example, we're using address 10.1.100.100. Let's follow this method of caching step by step:
HTTP requests for content are directed to the proxy.
If the proxy cache does not have the content, the proxy performs the DNS lookup for the destination Web site.
When the DNS has been resolved, the proxy requests the content from the destination Web server and then retrieves it.
The content is stored in the cache before being forwarded to the client. This ensures that the next time the content is requested, the cache will have it.
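On the client side, proxy-style caching comes down to telling the HTTP client where the proxy lives. The sketch below does this with Python's standard `urllib.request` module, using the example address 10.1.100.100 and port 8080 from the text. The helper name `build_proxy_opener` is a hypothetical choice, and no request is actually sent, since the address is only an example.

```python
import urllib.request
from typing import Tuple

PROXY_ADDRESS = "10.1.100.100"  # example proxy address from the text
PROXY_PORT = 8080               # port commonly used for HTTP proxies


def build_proxy_opener(host: str, port: int) -> Tuple[urllib.request.OpenerDirector, str]:
    """Return an opener that sends plain-HTTP requests through the given proxy."""
    proxy_url = f"http://{host}:{port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler), proxy_url


opener, proxy_url = build_proxy_opener(PROXY_ADDRESS, PROXY_PORT)

# With a reachable proxy, a call such as opener.open("http://example.com/")
# would be directed to the proxy, which performs the DNS lookup and fetch.
```

Because the client hands the full URL to the proxy, the client itself never needs to resolve the destination Web site's name, which matches the steps listed above.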
Proxy caching is useful because the cache can be anywhere in the network. Furthermore, a measure of network security is provided: because clients contact only the proxy, the firewall rules can be stricter, allowing only the proxy to pass traffic through the firewall.
In the aforementioned proxy cache method, the proxy server is a proxy for the client. In the reverse proxy method, the proxy server acts as a proxy for the server. Reverse proxy caches also store selected content, whereas transparent and proxy methods store frequently requested content.
There are two cases in which reverse proxy caching is desirable:
Replicating content to geographically disparate locations
Replicating content for the sake of load balancing
In this scenario, the proxy server is set up with an Internet-routable IP address. That is, clients go to the proxy server based on DNS resolution of a domain name.
WCCP Servers: Consider the cache deployment in Figure 11-8. The content engine works with a WCCP-enabled router and is configured for reverse proxy service for a Web server. In this scenario, the router interface linked to the Internet has an IP address of 192.168.1.100. HTTP requests sent to this server are first sent to the router interface at 172.12.12.1. Once the HTTP request has been received at this interface, the router redirects the request to the content engine (with an IP address of 172.12.12.20). In this case, the content engine is in front of the Web server, helping reduce the amount of traffic on it. If the information requested is not in the content engine, it sends a request to the Web server to locate the content.
Figure 11-8: Reverse proxy caching with WCCP-enabled routers places the content engine in front of the Web server
CSS Switches: Consider the cache deployment in Figure 11-9. Here, the content engines are deployed with a WCCP-enabled router and a CSS switch. A user sends a request for Web page content. This is accepted at the CSS switch's virtual IP address. When the CSS switch takes the request, it forwards the request to the content engine. If the content engine does not have the requested data, the content engine will forward a request to the Web server.
Figure 11-9: In a reverse proxy scenario with CSS switches, the content engines are checked first before forwarding the request to the Web server
Content engines can be located at multiple points throughout an organization for optimal caching performance. For instance, consider the organization in Figure 11-10. In this case, the organization is served by three content engines. The first handles caching for the Customer Service department at the organization's headquarters; the second handles caching for the Production department in a branch office; the third sits at the organization's main Internet point of access, at corporate headquarters.
Figure 11-10: Content engines at different levels in an organization provide greater content availability
If a client in the Customer Service department sends a Web page request that can be accommodated by the first content engine, it will be served by that device. If that content engine cannot fulfill the request, it passes the request on toward the destination Web server. Before the request gets to the destination Web server, it is considered by the content engine at the main Internet point of access. This provides another chance for the content to be served before having to go out onto the Internet or across a WAN. If this content engine is able to fulfill the request, then it is unnecessary to go onto the Internet. For example, if someone at the branch office had previously requested the page, it might still be stored on content engine number three, at corporate headquarters.
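The tiered lookup just described can be modeled as a chain of caches that a request passes through on its way to the origin. The `Tier` class, the tier names, and the `fetch` function below are hypothetical; the sketch assumes each tier stores whatever passes through it, which is a simplification of real cache policy.

```python
from typing import Callable, Dict, List, Tuple


class Tier:
    """Hypothetical content engine at one level of the organization."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, bytes] = {}


def fetch(url: str, path: List[Tier], origin: Callable[[str], bytes]) -> Tuple[bytes, str]:
    """Try each tier on the path in order; fall back to the origin server."""
    for depth, tier in enumerate(path):
        if url in tier.store:
            body = tier.store[url]
            # Tiers nearer the client keep a copy as the response passes back.
            for nearer in path[:depth]:
                nearer.store[url] = body
            return body, tier.name
    body = origin(url)
    for tier in path:
        tier.store[url] = body
    return body, "origin"


def origin_server(url: str) -> bytes:
    return b"page:" + url.encode()


customer_service = Tier("customer-service")
branch = Tier("branch-production")
internet_edge = Tier("internet-edge")

# A branch-office user requests the page first, so it goes all the way out.
_, first_source = fetch("http://example.com/", [branch, internet_edge], origin_server)
# A Customer Service user then asks for the same page.
_, second_source = fetch("http://example.com/", [customer_service, internet_edge], origin_server)
```

The second request misses in the Customer Service engine but is satisfied by the engine at the Internet access point, so it never leaves the organization.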
This scenario is especially useful for Internet service providers. With a content engine serving a number of clients, if common Web sites are requested, it is unnecessary to keep getting the page from the Internet. Instead, it can be served up locally by the content engine.
Another way to manage high traffic levels is through clustering. This simply means that multiple content engines are set up together. For instance, one Cisco content engine 7325 can support in excess of 155 Mbps of traffic and up to 936 GB of data. However, if a second 7325 is added, then the cluster can handle more than 310 Mbps throughput and 1.87 TB of data. Up to 32 content engines can be clustered together.
When a new content engine is added to the organization's cluster, the WCCP-enabled router detects the new device and reallocates resources for the new content engine.
Content engines use so-called buckets. WCCP-enabled routers redirect traffic to content engines using a hashing procedure based on the incoming request's destination IP address, and the request is sent to one of 256 buckets. Using a hashing technique, requests are spread evenly across all 256 buckets and, therefore, to all content engines in the cluster.
When a new content engine is added to the cluster, the WCCP-enabled router detects the new engine, and then the number of buckets is reconfigured, based on the total number of content engines. For example, let's say your organization has two content engines. Each engine would contain 128 buckets. If a third is added, then each engine is reconfigured to contain 85 or 86 buckets.
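The bucket arithmetic can be checked with a short sketch. Two things here are assumptions made for illustration: the XOR-fold in `bucket_for` is a stand-in hash, not the hash WCCP actually uses, and assigning contiguous ranges of buckets to engines is just one simple way to divide 256 buckets.

```python
import ipaddress
from typing import Dict, List

NUM_BUCKETS = 256


def bucket_for(dest_ip: str) -> int:
    """Map a destination IPv4 address to one of 256 buckets (illustrative hash only)."""
    value = int(ipaddress.IPv4Address(dest_ip))
    # XOR the four octets together; the result always fits in one byte.
    return (value ^ (value >> 8) ^ (value >> 16) ^ (value >> 24)) & 0xFF


def allocate_buckets(engines: List[str]) -> Dict[str, List[int]]:
    """Divide the 256 buckets as evenly as possible among the engines."""
    if not engines:
        raise ValueError("at least one content engine is required")
    base, extra = divmod(NUM_BUCKETS, len(engines))
    allocation: Dict[str, List[int]] = {}
    start = 0
    for index, engine in enumerate(engines):
        count = base + (1 if index < extra else 0)
        allocation[engine] = list(range(start, start + count))
        start += count
    return allocation


two = allocate_buckets(["ce1", "ce2"])
three = allocate_buckets(["ce1", "ce2", "ce3"])
sizes_two = sorted(len(b) for b in two.values())
sizes_three = sorted(len(b) for b in three.values())
```

With two engines each receives 128 buckets; adding a third gives two engines 85 buckets and one engine 86, matching the figures in the text.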
However, since a brand-new content engine won't have any content when it is added, it will suffer frequent cache misses until it has built up its storage. This problem is initially ameliorated, because the new content engine sends a message to others in the cluster, seeking the requested content. If another engine in the cluster has the content, it will be sent to the new engine. Once the engine decides that it has gotten enough content from its cohorts (based on parameters established by the network administrator), it will stop bothering its peers for content requests and instead query the end server.
Clustering is not only a good way to balance the load of caching requests, it also is a good way to ensure reliability. In the event one of the content engines in a cluster goes down, the WCCP-enabled router steps in and redistributes that engine's load across the remaining engines. The system continues operating, but with one fewer content engine. Certainly, this is not ideal from an availability standpoint, but at least the system remains accessible until the failed content engine can be restored.
If the entire cluster fails, then the WCCP-enabled router will stop bothering with caching, sending Web requests to their destination Web servers. To end users, it will appear as though it is simply taking longer for Web content to arrive.
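The failover behavior can be sketched as a reassignment of bucket ownership. The function name, the engine names, and the round-robin policy for handing out orphaned buckets are all assumptions for illustration; the text says only that the load is redistributed across the remaining engines and that caching is skipped if none remain.

```python
from typing import Dict


def reassign_buckets(owner_by_bucket: Dict[int, str], failed: str) -> Dict[int, str]:
    """Hand the failed engine's buckets to the survivors, round-robin.

    Returns an empty mapping when no engine survives, meaning the router
    should stop redirecting and forward requests to their destination servers.
    """
    survivors = sorted(set(owner_by_bucket.values()) - {failed})
    if not survivors:
        return {}
    result = dict(owner_by_bucket)
    orphaned = [b for b, owner in sorted(owner_by_bucket.items()) if owner == failed]
    for i, bucket in enumerate(orphaned):
        result[bucket] = survivors[i % len(survivors)]
    return result


engines = ["ce1", "ce2", "ce3"]
owners = {bucket: engines[bucket % 3] for bucket in range(256)}

after_one_failure = reassign_buckets(owners, "ce3")
after_total_failure = reassign_buckets({b: "ce1" for b in range(256)}, "ce1")
```

After `ce3` fails, all 256 buckets are still owned, but only by `ce1` and `ce2`. When the last engine fails, the mapping is empty and traffic would simply bypass the cache.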
As you have probably noticed by now, there seems to be a lot of responsibility placed on the shoulders of the WCCP-enabled router. If an engine in a cluster goes down, it is easy enough to redistribute its load to the other engines. But what happens if the WCCP-enabled router fails? In such an event, and assuming the pieces are in place before a failure, a WCCP-enabled, Multigroup Hot-Standby Router Protocol (MHSRP) router pair provides routing protection. This is known as WCCP multihoming.
Consider the network in Figure 11-11. There are two WCCP-enabled routers depicted. In the event one of these routers fails, the other would step in to take over for its failed brother, redirecting Web requests to the content engine cluster. The network in Figure 11-11 is fully redundant, because it employs both a content engine cluster and WCCP multihoming.
Figure 11-11: Multihoming provides reliability in addition to load balancing
Multihoming and clustering are good ways to plan for problems. However, as effective as they are, they aren't perfect. There might be a time when the entire cache system must be avoided. There are two scenarios in which the cache system is bypassed.
Overload: Overload bypassing is used when there is a sudden surge of Web traffic and the content engine or cluster is simply overwhelmed. When this happens, the content engine is able to sense when it is overloaded and refuses additional requests until it can handle those already backlogged. Thus, incoming Web requests are simply forwarded to their destination Web servers, whether or not the content is stored in the content engine. The content engine continues to refuse requests until it determines both that the overload has passed and that it does not expect to become overloaded again if it takes in new requests.
If the content engine is so besieged with requests that it cannot communicate with the WCCP-enabled router and share status messages, the router will logically remove that engine from the cluster, reallocating its buckets to other engines in the cluster.
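One way to picture overload bypass is as a gate with two thresholds, so that the engine does not flip back and forth at the edge of overload. The `OverloadGate` class and its backlog thresholds are hypothetical; the text does not say how a content engine actually measures its load or decides that it is safe to resume.

```python
class OverloadGate:
    """Hypothetical two-threshold gate: start bypassing at high_water, resume at low_water."""

    def __init__(self, high_water: int, low_water: int) -> None:
        if low_water >= high_water:
            raise ValueError("low_water must be below high_water")
        self.high_water = high_water
        self.low_water = low_water
        self.bypassing = False

    def should_bypass(self, backlog: int) -> bool:
        if self.bypassing:
            # Resume only once the backlog has drained well below the trigger.
            if backlog <= self.low_water:
                self.bypassing = False
        elif backlog >= self.high_water:
            self.bypassing = True
        return self.bypassing


gate = OverloadGate(high_water=100, low_water=40)
decisions = [gate.should_bypass(backlog) for backlog in (50, 100, 70, 40, 50)]
```

The engine starts refusing requests when the backlog reaches 100 and keeps refusing at 70, because the backlog has not yet fallen to the lower threshold. It accepts requests again only at 40.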
Client: If a client needs to be authenticated to the Web site using the client's IP address, authentication will fail if the content engine's IP address is seen and not the client's IP address. In such cases, the content engine will allow clients to bypass the engine and connect directly to the destination Web server.
Consider the exchange in Figure 11-12. In this figure, the client is attempting to access a Web server that insists on authentication. First, the request is funneled through the content engine. Because the information is not stored locally, the content engine forwards the request on to the destination Web server. If error codes are returned to the content engine (for instance, 401 Unauthorized or 403 Forbidden), the engine will automatically enter client bypass mode and allow the client to interact directly with the destination Web server. In addition, the content engine will store the destination IP address along with the client IP address. The next time the client attempts to access that Web server, the content engine will automatically enter client bypass mode.
Figure 11-12: If need be, the client can bypass the caching infrastructure and go directly to the source
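The client bypass logic can be sketched as a table of client/server pairs with an expiry time. The `BypassList` class and the IP addresses are hypothetical. The 401 and 403 triggers and the 20-minute default come from the text, but applying that timer to this table is an assumption of the sketch.

```python
from typing import Dict, Tuple

BYPASS_STATUS_CODES = {401, 403}  # Unauthorized, Forbidden
BYPASS_SECONDS = 20 * 60          # default bypass period of 20 minutes


class BypassList:
    """Hypothetical dynamic list of client/server pairs that skip the cache."""

    def __init__(self) -> None:
        self._expires: Dict[Tuple[str, str], float] = {}

    def record_response(self, client_ip: str, server_ip: str, status: int, now: float) -> None:
        # An authentication-style error puts the pair into bypass mode.
        if status in BYPASS_STATUS_CODES:
            self._expires[(client_ip, server_ip)] = now + BYPASS_SECONDS

    def should_bypass(self, client_ip: str, server_ip: str, now: float) -> bool:
        pair = (client_ip, server_ip)
        expiry = self._expires.get(pair)
        if expiry is None:
            return False
        if now >= expiry:
            del self._expires[pair]  # the bypass period has run out
            return False
        return True


bypass = BypassList()
bypass.record_response("192.0.2.10", "198.51.100.5", status=401, now=0.0)

during = bypass.should_bypass("192.0.2.10", "198.51.100.5", now=60.0)
other_client = bypass.should_bypass("192.0.2.11", "198.51.100.5", now=60.0)
after_expiry = bypass.should_bypass("192.0.2.10", "198.51.100.5", now=1200.0)
```

Only the pair that produced the error is bypassed, and only until the timer expires. Other clients of the same server continue to go through the content engine.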
If you think the golden days of internetworking are behind us, you haven't seen anything yet. The future is bright, as technologies and protocols are able to handle more and more. As networks become more intelligent, we can expect to see more services delivered, with special attention paid to network conditions and the specific needs of the content.