10.6 High Availability

Unfortunately, computer systems do occasionally fail. If 100% availability is important, you should build some redundancy into your caching service. Redundancy can be a complicated business, and many organizations take it to extremes. Because computer systems have so many different ways to fail, you'll be faced with numerous options.

Power is one of the most obvious and common causes of failure. An uninterruptible power supply (UPS) is a good idea and relatively inexpensive. Even a small UPS that provides enough power for a few minutes will get you through most power outages. Many caching products and general-purpose servers have the option of redundant power supplies. These guard against failure of a power supply itself and, if you have separate power feeds, against an outage on one of the feeds.

Disk drives are also prone to failures. This is especially true for caching proxies that are busy 24 hours a day, 7 days a week. Some people like to use RAID systems to provide high-availability storage. In my experience, caches perform noticeably worse with RAID than without. If you really want to use RAID, you should probably use mirroring (RAID level 1) only. The importance of reliable storage depends on your location and the quality of your Internet connection. If your connection is good, the loss of a disk is not a big deal. However, if your connection is poor, the data on disk is very valuable, and RAID makes more sense.

In Chapter 9, I mentioned a number of ways that clusters can improve reliability. These range from simple DNS tricks to complicated load-balancing configurations. In general, cost is an important factor here. You can implement DNS-based failover and balancing for free, but you'll probably spend a lot of time working on it. The layer four switches and related products are expensive, but they provide better failure detection.
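As a rough illustration of the "free but time-consuming" end of that range, here is a minimal Python sketch of the health-check half of DNS-based failover. It assumes a hypothetical list of cache addresses (the 192.0.2.x addresses and port 3128 are placeholders) and only decides which members should stay in the published address list. Updating the DNS zone itself is left out because it depends entirely on your name server.

```python
import socket
from typing import Callable, Iterable, List, Tuple

Address = Tuple[str, int]  # (host, port)


def tcp_probe(addr: Address, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to addr succeeds within timeout."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the cache as down.
        return False


def live_caches(
    members: Iterable[Address],
    probe: Callable[[Address], bool] = tcp_probe,
) -> List[Address]:
    """Return the members that pass the probe, preserving their order."""
    return [m for m in members if probe(m)]


if __name__ == "__main__":
    # Hypothetical cluster members; replace with your own caches.
    cluster = [("192.0.2.11", 3128), ("192.0.2.12", 3128)]
    print(live_caches(cluster))
```

A successful TCP connection shows only that something is listening on the port, not that the cache is answering requests correctly. A more thorough probe would fetch a known URL through the proxy, which is closer to the kind of failure detection you pay for in dedicated switching products.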

Once you insert a device like a layer four switch into your architecture, you also need to consider the reliability of that device. What happens if the layer four switch stops working? Some organizations use multiple switches in complex, cross-connected configurations, such as the one shown in Figure 10-1. In addition to providing fault tolerance, these configurations allow you to upgrade individual components without affecting the overall service. However, the complexity of these topologies makes them difficult to get "just right." You may find yourself spending a lot of time working on the configuration, especially when adding or removing servers.

Figure 10-1. Redundant layer four switches

If you're using interception caching, you can still provide robustness without redundancy in everything. Recall that with interception, a network device diverts packets to the cache instead of sending them to the origin server. If the cache fails, the device can simply send them to the origin server and pretend the cache doesn't exist. Most of the layer four switching products (see Table 5-1) have this bypass feature, as does Cisco's WCCP. InfoLibria's products use a device called the DynaLink that bypasses the cache in the event of a failure, including power loss. It has physical relays that essentially couple two cables together in the absence of electrical power.
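The bypass decision described above can be sketched in a few lines of Python. This is a simplified model, not how any particular switch or WCCP implements it: the class name, the `max_missed` threshold, and the idea of counting consecutive failed health checks are all assumptions made for illustration.

```python
class InterceptingDevice:
    """Toy model of a network device that diverts traffic to a cache."""

    def __init__(self, cache_addr: str, max_missed: int = 3) -> None:
        self.cache_addr = cache_addr
        self.max_missed = max_missed  # hypothetical failure threshold
        self.missed = 0               # consecutive failed health checks

    def record_health_check(self, ok: bool) -> None:
        """Reset the counter on success, increment it on failure."""
        self.missed = 0 if ok else self.missed + 1

    @property
    def cache_alive(self) -> bool:
        return self.missed < self.max_missed

    def next_hop(self, origin_addr: str) -> str:
        """Divert to the cache while it is healthy, else bypass to the origin."""
        return self.cache_addr if self.cache_alive else origin_addr


if __name__ == "__main__":
    device = InterceptingDevice("10.0.0.5")
    for _ in range(3):
        device.record_health_check(False)
    # After three missed checks the device forwards straight to the origin.
    print(device.next_hop("192.0.2.80"))
```

Requiring several consecutive misses before bypassing is one way to avoid flapping on a single lost health check. The cost is a short window in which requests are still sent to a dead cache.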

If you choose one of these techniques to provide high availability, remember to think through the consequences of a failure. For example, if you have a cluster of caches and one goes down, can the remaining systems handle the full load? If not, you're likely to see cascading failures. The failure of one machine may cause another to fail, and so on. If you plan to use the bypass feature, think about the capacity of your Internet connection. Is there enough bandwidth to handle all the requests that would normally be served from the cache? Even if the network becomes severely congested, it's probably better than a complete denial of service for your users.
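Both questions reduce to simple arithmetic, sketched below in Python. The function names and the sample figures are hypothetical, and the model is deliberately crude: it assumes load spreads evenly across the surviving caches, and that with the cache in place the uplink carries only cache misses, so bypassing the cache raises uplink traffic by a factor of 1 / (1 - byte hit ratio).

```python
def survives_failure(
    peak_rps: float, per_cache_rps: float, n_caches: int, failed: int = 1
) -> bool:
    """True if the remaining caches can absorb the peak request rate."""
    remaining = n_caches - failed
    return remaining > 0 and remaining * per_cache_rps >= peak_rps


def bypass_bandwidth_mbps(current_uplink_mbps: float, byte_hit_ratio: float) -> float:
    """Estimate uplink traffic if every request goes to the origin server."""
    if not 0.0 <= byte_hit_ratio < 1.0:
        raise ValueError("byte_hit_ratio must be in [0, 1)")
    # Today the uplink carries only misses, i.e. (1 - ratio) of client demand.
    return current_uplink_mbps / (1.0 - byte_hit_ratio)


if __name__ == "__main__":
    # Three caches rated at 120 requests/sec each, 300 requests/sec at peak.
    print(survives_failure(peak_rps=300, per_cache_rps=120, n_caches=3))
    # Uplink currently at 6 Mbps with a 25% byte hit ratio.
    print(bypass_bandwidth_mbps(6.0, 0.25))
```

In the first example, two surviving caches can handle only 240 requests per second, so a single failure risks the cascade described above. A fourth cache would give the cluster enough headroom.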

Web Caching
ISBN: 156592536X
EAN: N/A
Year: 2001
Pages: 160