3.7 Servers on the WAN




The continued growth of business-critical Internet applications, corporate intranets, storage services, conferencing, and e-commerce has prompted the need for innovations to improve traffic flow between servers on the WAN. Increasingly, organizations of all types and sizes are turning to management products that allocate bandwidth according to various application performance criteria in an effort to reduce congestion, ensure delivery of priority messages, and support real-time multimedia traffic. Bandwidth management products can markedly improve the performance of IP networks through such means as load balancing, caching, and queuing.

3.7.1 Load Balancing

By applying load balancing to the servers on the WAN, the performance of various services can be greatly improved. Through the use of bandwidth management tools, traffic can be directed to the best available server to handle a job. In a load-balanced network, incoming traffic is distributed among replicated servers, permitting server clusters to share the processing load, provide failover capability, and speed response time for users.

With advanced load-balancing tools, policies can be defined that reflect the capabilities of individual servers on the network. For example, all requests for video clips can be redirected to a video server that is specifically equipped to handle multiple streams, all routine information requests can be redirected to a Web server, and all employees in marketing can be redirected to the data warehouse server. The load-balancing tool continuously adjusts both the flow and prioritization of applications through the network and the distribution of those applications to servers.

Traffic can be balanced between available servers using algorithms such as:

  • Round robin: Each server is treated with equal priority.

  • Weighted round robin: Each server is given an individual weight or priority based on its ability to deliver specific applications.

  • Maintenance rerouting: Traffic is rerouted to another server when an originally targeted server becomes unavailable.

Selecting the right load-balancing approach results in efficient utilization of bandwidth and other resources and improves traffic flow throughout the organization.
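
To make the selection logic concrete, the first two algorithms can be sketched in a few lines of Python. The server names and weights below are purely illustrative assumptions, and a production load balancer would also track server health and current load:

    import itertools

    # Hypothetical server pool; names and weights are illustrative only.
    SERVERS = ["web1", "web2", "video1"]
    WEIGHTS = {"web1": 1, "web2": 1, "video1": 3}  # video1 can carry 3x the traffic

    def round_robin(servers):
        """Round robin: every server is treated with equal priority."""
        return itertools.cycle(servers)

    def weighted_round_robin(servers, weights):
        """Weighted round robin: each server appears in the rotation in
        proportion to its weight (its ability to deliver the application)."""
        expanded = [s for s in servers for _ in range(weights[s])]
        return itertools.cycle(expanded)

    picker = weighted_round_robin(SERVERS, WEIGHTS)
    print([next(picker) for _ in range(10)])  # "video1" appears three times as often

In practice the weights would be set by policy, mirroring the per-application redirection rules described above.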

3.7.2 Caching

A cache is temporary storage of frequently accessed information. Caching has long been used in computer systems to increase performance, and a cache can be found in nearly every computer today, from mainframes to PCs. More recently, caching has been applied to improve the performance of corporate intranets, and many vendors of bandwidth management products offer network caching as well. Instead of users repeatedly retrieving the same information over the WAN, it is stored on a local server. This arrangement gives users the information they need quickly, while freeing the WAN of unnecessary traffic, which improves its performance for all users.

Caching is frequently applied to the Web, especially for e-commerce applications. When a user revisits a Web site, the browser first checks whether a copy of the requested page is already in the computer's hard disk cache. If it is, the page loads almost instantaneously; if not, the request goes out over the Internet.
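
The lookup flow can be illustrated with a short Python sketch; here a dictionary stands in for the browser's hard disk cache, and fetch_from_network is a hypothetical placeholder for a real HTTP request:

    cache = {}  # url -> page; stands in for the browser's disk cache

    def fetch_from_network(url):
        return "<html>...</html>"       # placeholder for a real HTTP request

    def get_page(url):
        if url in cache:                # hit: load time is nearly instantaneous
            return cache[url]
        page = fetch_from_network(url)  # miss: the request goes out over the Internet
        cache[url] = page               # keep a local copy for the next visit
        return page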

At the backbone level, lack of sufficient bandwidth is a global problem, and Internet telephony, videoconferencing, and multimedia applications are consuming ever greater amounts of it. Network caching offers an effective and economical way to offload some of this massive demand: Internet service providers (ISPs) and corporations with their own intranets can maintain an active cache of the most often visited Web sites, so that when those pages are requested again, the download comes from the locally maintained cache server instead of the request being routed to the origin server. The result is a faster download.

Caches can reside at various points in the network. For enterprises, caches can be deployed on servers throughout campus networks and in remote and branch offices. Within enterprise networks, caches are on the way to becoming as ubiquitous as IP routers, and just about every large company now depends on Web caches to keep its intranet running smoothly. There are two types of caching, contrasted in the sketch that follows this list:

  • Passive caching: The cache waits until a user requests the object again, then sends a refresh request to the server. If the object has not changed, the cached object is served to the requesting user. If the object has changed, the cache retrieves the new object and serves it to the requesting user. However, this approach forces the end user to wait for the refresh request, which can take as long as the object retrieval itself. It also consumes bandwidth for unnecessary refresh requests.

  • Active caching: The cache performs the refresh request before the next user request—if the object is likely to be requested again and the object is likely to have changed on the server. This automatic and selective approach keeps the cache up to date so the next request can be served immediately. Network traffic does not increase because an object in cache is refreshed only if it has a high probability of being requested again, and only if there is a statistically high probability that it has changed on the source server.
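
The contrast between the two strategies can be sketched as follows. This is illustrative Python, not any vendor's implementation; changed_on_server stands in for a freshness check such as a conditional HTTP GET, and the probability thresholds are assumptions:

    import time

    cache = {}  # url -> (object, time of last refresh); a toy in-memory store

    def fetch(url):
        return "object from " + url     # stand-in for a real retrieval

    def changed_on_server(url, since):
        return True                     # stand-in for a conditional GET

    def passive_get(url):
        # Passive: validate only when a user asks again, so the user
        # waits for the refresh request (and possibly the retrieval).
        obj, ts = cache.get(url, (None, 0))
        if obj is None or changed_on_server(url, ts):
            obj = fetch(url)            # inline retrieval; the user is waiting
            cache[url] = (obj, time.time())
        return obj

    def active_refresh(url, p_requested, p_changed, threshold=0.5):
        # Active: refresh ahead of the next request, but only when the object
        # is likely to be requested again AND likely to have changed, so the
        # refresh itself does not inflate network traffic.
        if p_requested > threshold and p_changed > threshold:
            cache[url] = (fetch(url), time.time())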

Active caches can achieve hit ratios of up to 75%, meaning a greater percentage of user requests can be served by the cache. If the requested data is in the cache and is up to date, the cache can serve it to the user immediately upon request. If not, the user must wait while the cache retrieves the requested data from the network.

Passive caches, on the other hand, typically achieve hit rates of only about 30%. With a 70% miss rate, versus 25% for an active cache, users are forced to go out to the network nearly three times as often to get the information they need.

3.7.3 Queuing

Queuing techniques may be used separately or combined with rate control on IP networks. Queuing types include priority, weighted, and class-based. Priority queuing assigns traffic to high- and low-priority queues and always empties the high-priority queues first. Cisco's weighted fair queuing (WFQ), for example, assigns traffic to priority queues and also apportions each a share of bandwidth. Class-based queuing (CBQ) guarantees a transmission rate to a queue, while other queues can borrow from any unused bandwidth.
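
A strict priority scheme of the kind just described can be sketched in a few lines of Python; the two-level split is an illustrative assumption, since real implementations usually support several priority levels:

    from collections import deque

    high, low = deque(), deque()        # one queue per priority level

    def enqueue(packet, high_priority=False):
        (high if high_priority else low).append(packet)

    def dequeue():
        # The high-priority queue is always emptied before any
        # low-priority packet is served.
        if high:
            return high.popleft()
        return low.popleft() if low else None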

WFQ classifies traffic into “conversations” and applies priorities (weights) to the identified traffic to determine how much bandwidth each conversation is allowed relative to the others. Conversations fall into two categories: those requiring large amounts of bandwidth and those requiring smaller amounts. The goal is to always have bandwidth available for the small-bandwidth conversations while the large-bandwidth conversations split the rest in proportion to their weights.

WFQ provides consistent response time to heavy and light network users alike without adding excessive bandwidth. It is a flow-based queuing algorithm that schedules interactive traffic to the front of the queue to reduce response time and fairly shares the remaining bandwidth among the other high-bandwidth flows. Low-volume traffic streams, which comprise the majority of traffic, receive preferential service and transmit their entire offered loads in a timely fashion; high-volume streams share the remaining capacity proportionally among themselves.
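
The proportional sharing described above can be approximated with a simple allocation routine. The following is an illustrative sketch of the weighted-sharing idea, not Cisco's actual WFQ scheduler; the conversation names, weights, and offered loads are made-up examples:

    def wfq_shares(conversations, capacity):
        # Split link capacity among conversations in proportion to weight,
        # capped at each conversation's offered load; capacity a light flow
        # leaves unused is re-split among the heavier flows.
        remaining = dict(conversations)  # name -> (weight, offered_load)
        shares, cap = {}, capacity
        while remaining:
            total_w = sum(w for w, _ in remaining.values())
            # Low-volume conversations whose entire load fits within their
            # weighted share transmit everything they offer.
            satisfied = {n: load for n, (w, load) in remaining.items()
                         if load <= cap * w / total_w}
            if not satisfied:
                # The remaining high-volume conversations split what is
                # left in proportion to their weights.
                for n, (w, _) in remaining.items():
                    shares[n] = cap * w / total_w
                break
            for n, load in satisfied.items():
                shares[n] = load
                cap -= load
                del remaining[n]
        return shares

    # A light interactive flow sends its full load; two heavy flows split
    # the remainder in proportion to their weights (2:1).
    print(wfq_shares({"telnet": (1, 5), "ftp": (2, 100), "video": (1, 100)}, 95))
    # -> {'telnet': 5, 'ftp': 60.0, 'video': 30.0}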





