Other Caches | The Book of Webmin: Or How I Learned to Stop Worrying and Love UNIX

The Other Caches page provides an interface to one of Squid’s most interesting, but also widely misunderstood, features. Squid is the reference implementation of ICP, a simple but effective means for multiple caches to communicate with each other regarding the content that is available on each. This opens the door for many interesting possibilities when one is designing a caching infrastructure.

Internet Cache Protocol

It is probably useful to discuss how ICP works and some common usages for ICP within Squid, in order to quickly make it clear what it is good for, and perhaps even more importantly, what it is not good for. The most popular uses for ICP are discussed, and more good ideas will probably arise in the future as the Internet becomes even more global in scope and the web-caching infrastructure must grow with it.

Parent and Sibling Relationships

ICP specifies that a web cache can act as either a parent or a sibling. A parent cache is simply an ICP-capable cache that will answer both hits and misses for child caches, while a sibling will only answer hits for other siblings. This subtle distinction means that a parent cache can proxy for caches that have no direct route to the Internet. A sibling cache, on the other hand, cannot be relied upon to answer all requests, and your cache must have another way to retrieve objects that the sibling cannot supply. This usually means that in sibling relationships, your cache will also have a direct connection to the Internet or a parent proxy that can retrieve misses from the origin servers. ICP is a somewhat chatty protocol, in that an ICP request will be sent to every neighbor cache each time a cache miss occurs. By default, whichever cache replies with an ICP hit first will be the cache used to request the object.
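As a rough sketch of how these two relationships might look in squid.conf, the fragment below defines one parent and one sibling. The hostnames are hypothetical placeholders, and the ports are simply the Squid defaults (3128 for HTTP and 3130 for ICP); adjust both to match your own neighbors.

```conf
# Hypothetical neighbors; replace the hostnames with your own.
# A parent may be asked to fetch misses on your behalf.
cache_peer parent.example.com  parent  3128 3130
# A sibling is only used when it reports a hit.
cache_peer sibling.example.com sibling 3128 3130
```

With only the sibling line present, your cache would still need its own route to the origin servers for anything the sibling does not hold.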

When to Use ICP

ICP is often used in situations wherein one has multiple Internet connections, or several types of paths to Internet content. Other possibilities include having a cache mesh such as the IRCache Hierarchy [http://www.ircache.net/Cache/] in the U.S. or The National Janet Web Caching Service [http://wwwcache.ja.net/] in the UK, which can utilize lower-cost non-backbone links to connect several remote caches in order to lower costs and raise performance. Finally, it is possible, though usually not recommended, to implement a rudimentary form of load balancing through the use of multiple parents and multiple child web caches. All of these options are discussed in some detail, but this document should not be considered the complete reference to ICP. Other good sources of information include the two RFCs on the subject, RFC 2186 [http://www.ircache.net/Cache/ICP/rfc2186.txt], which discusses the protocol itself, and RFC 2187 [http://www.ircache.net/Cache/ICP/rfc2187.txt], which describes the application of ICP.

One common ICP-based solution in use today is satellite cache prepopulation services. In this case, there are at least two caches at a site, one of which is connected to a satellite Internet uplink. The satellite-connected cache is provided by the service provider, and it is automatically filled with popular content via the satellite link. The other cache uses the satellite-connected cache as a sibling, which it queries for every cache miss that it has. If the satellite-connected sibling has the content, it will be served from the sibling cache; if not, the primary cache will fetch the content from the origin server or a parent cache. ICP is an effective, if somewhat bandwidth- and processor-intensive, means of accomplishing this task. A refinement of this process would be to use Cache Digests for the satellite-connected sibling in order to reduce traffic between the sibling caches.

Another common use is cache meshes. A cache mesh is, in short, a number of web caches at remote sites interconnected using ICP. The web caches could be in different cities, or they could be in different buildings of the same university or different floors in the same office building. This type of hierarchy allows each cache in the mesh to benefit from a larger client population than is directly available to it. All other things being equal, a cache that is not overloaded will perform better (with regard to hit ratio) with a larger number of clients. Simply put, a larger client population leads to a higher quality of cache content, which in turn leads to higher hit ratios and improved bandwidth savings. So, whenever it is possible to increase the client population without overloading the cache, such as in the case of a cache mesh, it may be worth considering. Again, this type of hierarchy can be improved upon by the use of Cache Digests, but ICP is usually simpler to implement and is a widely supported standard, even on non-Squid caches.

Finally, ICP is also sometimes used for load balancing multiple caches at the same site. ICP, or even Cache Digests for that matter, is almost never the best way to implement load balancing. However, for completeness, I’ll discuss it briefly. Using ICP for load balancing can be achieved in a few ways. One common method is to have several local siblings, which can each provide hits to the others’ clients, while the client load is evenly divided across the caches. Another option is to have a very fast but low-capacity web cache in front of two or more lower-cost, but higher-capacity, parent web caches. The parents will then each serve a roughly equal share of the requests. As mentioned, there are much better options for balancing web caches, the most popular being WCCP (version 1 is fully supported by Squid), and L4 or L7 switches.

Other Proxy Cache Servers

This section of the Other Caches page provides a list of currently configured sibling and parent caches, and also allows one to add more neighbor caches. Clicking the name of a neighbor cache will allow you to edit it. This section also provides the vital information about the neighbor caches, such as the type (parent, sibling, multicast), the proxy or HTTP port, and the ICP or UDP port of the caches. Note that Proxy port is the port where the neighbor cache normally listens for client traffic, which defaults to 3128.

Edit Cache Host

Clicking a cache peer name or clicking Add another cache on the primary Other Caches page brings you to this page, which allows you to edit most of the relevant details about neighbor caches (Figure 12-2).

Figure 12-2: Edit Cache Host page

Hostname

The name or IP address of the neighbor cache you want your cache to communicate with. Note that this sets up traffic in one direction only: your cache querying the neighbor. Allowing other caches to send ICP requests to your cache is handled separately with Access Control Lists, or ACLs, which are covered later. This option plus most of the rest of the options on this page correspond to cache_peer lines in squid.conf.

Type

The type of relationship you want your cache to have with the neighbor cache. If the cache is upstream, and you have no control over it, you will need to consult with the administrator to find out what kind of relationship you should set up. If it is configured wrong, cache misses will likely result in errors for your users. The options here are sibling, parent, and multicast.

Proxy port

The port on which the neighbor cache is listening for standard HTTP requests. Even though the caches transmit availability data via ICP, actual web objects are still transmitted via HTTP on the port usually used for standard client traffic. If your neighbor cache is a Squid-based cache, then it is likely to be listening on the default port of 3128. Other common ports used by cache servers include 8000, 8888, 8080, and even 80 in some circumstances.

ICP port

The port on which the neighbor cache is configured to listen for ICP traffic. If your neighbor cache is a Squid-based proxy, this value can be found by checking the icp_port directive in the squid.conf file on the neighbor cache. Generally, however, the neighbor cache will listen on the default port 3130.
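Taken together, the four fields described so far map onto the positional arguments of a cache_peer line. The following is only an illustrative sketch: the hostname is hypothetical, and the neighbor is assumed to serve HTTP on 8080 rather than the Squid default.

```conf
# cache_peer <hostname> <type> <proxy port> <ICP port> [options]
# Hypothetical sibling listening for HTTP on 8080 and ICP on 3130.
cache_peer cache2.example.com sibling 8080 3130

# On the neighbor itself (if it runs Squid), the ICP port above should
# match its own icp_port setting:
# icp_port 3130
```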

Proxy only?

A simple yes or no question that determines whether objects fetched from the neighbor cache are also cached locally. Selecting Yes means such objects are passed on to the client but not stored in the local cache. This can be used when all caches are operating well below their client capacity, but disk space is at a premium or hit ratio is of prime importance.
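In squid.conf terms, answering Yes here should correspond to the proxy-only option on the cache_peer line. A minimal sketch, assuming a hypothetical local sibling:

```conf
# Objects fetched from this (hypothetical) sibling are relayed to the
# client but not written to the local cache.
cache_peer sibling.example.com sibling 3128 3130 proxy-only
```

This avoids storing the same object on two nearby caches, at the cost of one extra hop to the sibling on each later request for that object.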

Send ICP queries?

Tells your cache whether or not to send ICP queries to a neighbor. The default is Yes, and it should probably stay that way. ICP queries are the method by which Squid knows which caches are responding and which caches are closest or best able to quickly answer a request.

Default cache

This should be switched to Yes if this neighbor cache is to be the last-resort parent cache, used in the event that no other neighbor cache is present as determined by ICP queries. Note that this does not prevent it from being used normally while other caches are responding as expected. Also, if this neighbor is the sole parent proxy, and no other route to the Internet exists, this should be enabled.
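For the sole-parent case just described, a squid.conf sketch might look like the following. The hostname is a hypothetical placeholder for whatever upstream proxy your site uses.

```conf
# Hypothetical upstream proxy marked as the parent of last resort.
cache_peer gateway.example.com parent 3128 3130 default
```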

Round-robin cache?

Chooses whether to use round-robin scheduling between multiple parent caches in the absence of ICP queries. This should be set on all parents that you would like to schedule in this way.
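As an illustration only, two hypothetical parents scheduled this way could be written as shown below. The no-query option is what Send ICP queries? set to No produces, so neither parent is probed with ICP before a request is sent.

```conf
# Hypothetical pair of parents; requests alternate between them.
cache_peer parent1.example.com parent 3128 3130 no-query round-robin
cache_peer parent2.example.com parent 3128 3130 no-query round-robin
```

This is the rudimentary load balancing mentioned earlier in the chapter, and the same caveat applies: dedicated balancing methods are usually a better choice.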

ICP time-to-live

Defines the multicast TTL for ICP packets. When using multicast ICP, it is usually wise for security and bandwidth reasons to use the minimum TTL suitable for your network.

Cache weighting

Sets the weight for a parent cache. When using this option it is possible to set higher numbers for preferred caches. The default value is 1, and if left unset for all parent caches, whichever cache responds positively first to an ICP query will be sent a request to fetch that object.
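A brief sketch of weighting in squid.conf, assuming two hypothetical parents where the first is preferred:

```conf
# The weight=2 parent is favored over the unweighted one (weight 1).
cache_peer fast-parent.example.com parent 3128 3130 weight=2
cache_peer slow-parent.example.com parent 3128 3130
```

The names and the value 2 are illustrative; any integer larger than the other parents' weights expresses the same preference.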

Closest only

Allows you to specify that your cache wants only CLOSEST_PARENT_MISS replies from parent caches. This allows your cache to then request the object from the parent cache closest to the origin server.

No digest?

Chooses whether your cache should request cache digests from this neighbor cache. Selecting Yes disables digest requests for this neighbor.

No NetDB exchange

When using ICP, it is possible for Squid to keep a database of network information about the neighbor caches, including availability and RTT, or Round Trip Time, information. This usually allows Squid to choose more wisely which caches to make requests to when multiple caches have the requested object. Selecting Yes here disables the exchange of this NetDB information with this neighbor.

No delay?

Prevents accesses to this neighbor cache from affecting delay pools. Delay pools, discussed in more detail later, are a means by which Squid can regulate bandwidth usage. If a neighbor cache is on the local network, and bandwidth usage between the caches does not need to be restricted, then this option can be used.

Login to proxy

Select this if you need to send authentication information when challenged by the neighbor cache. On local networks, this type of security is unlikely to be necessary.
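In the Squid releases this chapter was written against, the corresponding cache_peer option takes the form login=user:password. The sketch below uses a hypothetical hostname and placeholder credentials; check the cache_peer documentation for your Squid version before relying on the exact syntax.

```conf
# Hypothetical upstream parent that challenges for proxy authentication.
# "cacheuser" and "secret" are placeholders, not real credentials.
cache_peer upstream.example.com parent 3128 3130 login=cacheuser:secret
```

Because the password sits in squid.conf in plain text, it is worth confirming that the file is readable only by the appropriate users.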

Multicast responder

Allows Squid to know where to accept multicast ICP replies. Because multicast is fed on a single IP to many caches, Squid must have some way of determining which caches to listen to and what options apply to that particular cache. Selecting Yes here configures Squid to listen for multicast replies from the IP of this neighbor cache.
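A multicast setup therefore tends to need two kinds of cache_peer lines: one for the multicast group that queries are sent to, and one per neighbor whose replies should be accepted. The following sketch assumes a hypothetical group address of 224.9.9.9, a TTL of 16, and two hypothetical siblings.

```conf
# Send ICP queries to the (hypothetical) multicast group, limiting how
# far the packets travel with the ICP time-to-live option.
cache_peer 224.9.9.9 multicast 3128 3130 ttl=16

# Accept multicast ICP replies from these neighbors and treat them
# as siblings.
cache_peer cache1.example.com sibling 3128 3130 multicast-responder
cache_peer cache2.example.com sibling 3128 3130 multicast-responder
```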

Query host for domains, Don’t query for domains

These two options are the only options on this page to configure a directive other than cache_peer in Squid. In this case they set the cache_peer_domain directive. This allows you to configure which domains this neighbor may be queried for via ICP and which it should not be queried for. It is often used to configure caches not to query other caches for content within the local domain. Another common usage, such as in the national web hierarchies discussed above, is to define which web cache is used for requests destined for different TLDs. So, for example, if one has a low-cost satellite link to the U.S. backbone from another country that is preferred for web traffic over the much more expensive land line, one can configure the satellite-connected cache as the cache to query for all .com, .edu, .org, .net, .us, and .gov addresses.
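The satellite example might be sketched in squid.conf as follows. The hostnames and the local domain are hypothetical, and a leading ! is used to mark a domain the neighbor should not be queried for. Newer Squid releases may have dropped or replaced cache_peer_domain, so treat this as a sketch for the Squid versions this chapter describes.

```conf
# Hypothetical satellite-connected parent, queried only for these TLDs.
cache_peer satellite.example.com parent 3128 3130
cache_peer_domain satellite.example.com .com .edu .org .net .us .gov

# Hypothetical sibling that is never queried for the local domain.
cache_peer sibling.example.com sibling 3128 3130
cache_peer_domain sibling.example.com !.example.com
```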

Cache Selection Options

This section provides configuration options for general ICP configuration (Figure 12-3). These options affect all of the other neighbor caches that you define.

Figure 12-3: Some global ICP options

Directly fetch URLs containing

Allows you to configure a match list of items to always fetch directly rather than query a neighbor cache. The default here is cgi-bin ? and should continue to be included unless you know what you’re doing. This helps prevent wasting intercache bandwidth on lots of requests that are usually never considered cacheable, and so will never return hits from your neighbor caches. This option sets the hierarchy_stoplist directive.
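With the default match list in place, the resulting squid.conf line should look roughly like this. Newer Squid releases may no longer support the directive, so check the documentation for your version.

```conf
# URLs containing either string are fetched directly instead of being
# sent to neighbor caches.
hierarchy_stoplist cgi-bin ?
```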

ICP query timeout

The time in milliseconds that Squid will wait before timing out ICP requests. The default allows Squid to calculate an optimum value based on average RTT of the neighbor caches. Usually, it is wise to leave this unchanged. However, for reference, the default value in the distant past was 2000, or 2 seconds. This option edits the icp_query_timeout directive.

Multicast ICP timeout

Timeout in milliseconds for multicast probes, which are sent out to discover the number of active multicast peers listening on a given multicast address. This configures the mcast_icp_query_timeout directive and defaults to 2000 ms, or 2 seconds.

Dead peer timeout

Controls how long Squid waits to declare a peer cache dead. If there are no ICP replies received in this amount of time, Squid will declare the peer dead and will not expect to receive any further ICP replies. However, it continues to send ICP queries for the peer and will mark it active again on receipt of a reply. This timeout also affects when Squid expects to receive ICP replies from peers. If more than this number of seconds have passed since the last ICP reply was received, Squid will not expect to receive an ICP reply on the next query. Thus, if your time between requests is greater than this timeout, your cache will send more requests DIRECT rather than through the neighbor caches.
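For reference, the three timeouts in this section might appear in squid.conf as sketched below. The directive name dead_peer_timeout and its 10 second value are stated from my understanding of Squid's defaults rather than from this page, and 0 for icp_query_timeout is assumed to mean "let Squid calculate the value"; verify all three against your version's documentation.

```conf
# 0 lets Squid derive the timeout from recent ICP round-trip times;
# a non-zero value (in milliseconds) overrides that calculation.
icp_query_timeout 0

# Multicast probe timeout, in milliseconds.
mcast_icp_query_timeout 2000

# How long without ICP replies before a peer is declared dead.
dead_peer_timeout 10 seconds
```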