8.2 CARP

only for RuBoard - do not distribute or recompile

8.2 CARP

CARP is an algorithm and not a protocol per se. It was designed to address a particular problem: how to achieve efficient and scalable load balancing while maximizing hit ratios and minimizing latency. It is useful in situations where a proxy caching service consists of a number of machines tightly clustered together. For example, a large ISP may need five web caches to handle all the traffic from their customers. Given a cache cluster or array, how can we distribute the load among the members ?

One technique is known as DNS round- robin . Each cache has its own IP address, but all are listed under a single hostname. For example:

 $ORIGIN us.ircache.net. bo                      IN      A       192.52.106.30                         IN      A       192.52.106.31 

A DNS server, such as BIND, cycles the order of the addresses for every lookup. When one client asks for bo.us.ircache.net , it receives the answer (192.52.106.30, 192.52.106.31). The next client that asks gets the same addresses in a different order: (192.52.106.31, 192.52.106.30). [2] Clients normally try the first IP address in the list. They'll try the next address only if there is a failure with the first. With this technique, each cache should receive about half of the the requests .

[2] You can see this for yourself by running host www.cnn.com on your Unix system.

The drawback to this approach is that client requests are randomly assigned among the member caches. One client may request http://www.web-cache.com from cache A, while another requests the same URL from cache B, thereby missing the opportunity for a cache hit. Of course, the caches can use ICP among themselves to get cache hits from each other, but this adds some delay. Furthermore, the member caches become nearly identical over time.

URL hashing, as described in Section 4.3.2, is better than DNS round-robin because it deterministically partitions all URLs among the set of caches. However, it also has a couple of annoying problems. It is easy to split the load equally among all caches, but anything more sophisticated (such as 10%/30%/60%) is awkward to implement. Second, adding or removing a group member causes a significant readjustment of the URL-to-cache mapping. Because of the modulo function, adding one cache to a set of N means that the mapping changes for N out of each N+1 URLs. Increasing the number of caches from four to five members displaces 80% of all cached objects.

CARP addresses this last problem with a creative, if not complicated, algorithm. For a given request, CARP calculates a score for every proxy cache. The request is forwarded to the proxy with the highest score. If this fails, then the second-highest scoring cache is tried. The score is a calculation based on a hash of the URL, a hash of the cache's name , and weights assigned to each cache. The important characteristic is that adding a new cache to the array does not change the relative ranking of the scores for the other caches; instead, the new cache creates new scores. Statistically, the new scores will be higher than the existing caches' scores for a fraction of the URLs that is proportional to the cache's weight within the array.

CARP also specifies a file format for a Proxy Array Membership Table . This table allows clients to figure out which caches belong to a group. The table may be accessed via web protocols (HTTP) so many clients can easily retrieve the information.

The CARP algorithm may be used by any web client, such as a browser or proxy cache, that needs to choose among a set of caches. Note, however, that it only works for parent relationships because CARP does not predict cache hits. Another minor problem with CARP is related to persistent HTTP connections. A client that makes four requests may use more TCP connections with CARP than it would without. Finally, also note that CARP has linear scaling properties (similar to ICP) because a score must be calculated for every group member.

At the time of this writing, CARP is documented only as an Internet draft, which is now expired . The details of this document are included in Appendix C. Whether CARP advances to become a more formal standard remains to be seen. The document authors, Vinod Valloppillil of Microsoft and Keith W. Ross of the University of Pennsylvania, have not been actively pushing for its adoption recently. Both Squid and the Microsoft Proxy products implement CARP. The draft document mentions that CARP can be implemented as a proxy autoconfiguration script (see Section 4.3), but none is provided.

only for RuBoard - do not distribute or recompile


Web Caching
Web Caching
ISBN: 156592536X
EAN: N/A
Year: 2001
Pages: 160

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net