7.9 Controlling Cachability

HTTP defines several ways for a server to specify how long a document can be cached before it expires. In decreasing order of priority, the server can:

                Attach a Cache-Control: no-store header to the response.

                Attach a Cache-Control: must-revalidate header to the response.

                Attach a Cache-Control: no-cache header to the response.

                Attach a Cache-Control: max-age header to the response.

                Attach an Expires date header to the response.

                Attach no expiration information, letting the cache determine its own heuristic expiration date.
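The priority order above can be sketched as a simple lookup. This is only an illustration, not how any particular cache is implemented: the function name expiration_source and the assumption that headers arrive as a dict with lowercase names are ours.

```python
def expiration_source(headers):
    """Return the highest-priority expiration control present in a response.

    `headers` is assumed to be a dict keyed by lowercase header name.
    """
    cache_control = headers.get("cache-control", "")
    # Keep directive names only; any "=value" argument is ignored here.
    present = {part.split("=", 1)[0].strip().lower()
               for part in cache_control.split(",")}
    # Checked in the same order as the list above.
    for directive in ("no-store", "must-revalidate", "no-cache", "max-age"):
        if directive in present:
            return directive
    if "expires" in headers:
        return "expires"
    # No explicit information: the cache falls back to its own heuristic.
    return "heuristic"
```

For example, a response carrying both max-age and an Expires date is governed by max-age, because max-age appears earlier in the list.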

This section describes the cache-controlling headers. The next section, Section 7.10, describes how to assign different cache information to different content.

7.9.1 No-Cache and No-Store Headers

HTTP/1.1 offers several ways to mark an object uncachable. Technically, these uncachable pages should never be stored in a cache and, hence, never will get to the freshness calculation stage.

Here are a few HTTP headers that mark a document uncachable:

 Pragma: no-cache 
 Cache-Control: no-cache 
 Cache-Control: no-store 

RFC 2616 allows a cache to store a response that is marked "no-cache"; however, the cache needs to revalidate the response with the origin server before serving it. A response that is marked "no-store" forbids a cache from making a copy of the response at all.

The Pragma: no-cache header is included in HTTP/1.1 for backward compatibility with HTTP/1.0+. It is technically valid and defined only for HTTP requests; however, it is widely used as an extension header for both HTTP/1.0 and HTTP/1.1 requests and responses. HTTP/1.1 applications should use Cache-Control: no-cache, except when dealing with HTTP/1.0 applications, which understand only Pragma: no-cache.
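As a rough sketch of how a cache might tell these headers apart, the function below sorts a response into one of three outcomes. The function name storage_policy and the three labels are hypothetical. Treating a response Pragma: no-cache like Cache-Control: no-cache is an assumption based on the widespread extension use described above, not something the specification requires.

```python
def storage_policy(response_headers):
    """Classify a response as 'no-store', 'revalidate', or 'cacheable'.

    `response_headers` is assumed to be a dict keyed by lowercase header name.
    """
    cache_control = response_headers.get("cache-control", "")
    directives = {part.split("=", 1)[0].strip().lower()
                  for part in cache_control.split(",")}
    pragma = response_headers.get("pragma", "").lower()

    if "no-store" in directives:
        return "no-store"    # never keep a copy
    if "no-cache" in directives or "no-cache" in pragma:
        return "revalidate"  # may keep a copy, but must revalidate before serving
    return "cacheable"
```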

7.9.2 Max-Age Response Headers

The Cache-Control: max-age header gives the number of seconds, counted from when the response came from the server, for which a document can be considered fresh. There is also an s-maxage header (note the absence of a hyphen in "maxage") that acts like max-age but applies only to shared (public) caches:

 Cache-Control: max-age=3600 
 Cache-Control: s-maxage=3600 

Servers can request that caches either not cache a document or refresh it on every access by setting the maximum age to zero:

 Cache-Control: max-age=0 
 Cache-Control: s-maxage=0 
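A minimal sketch of how a cache could read these values follows. The function name freshness_lifetime and the shared_cache flag are our own. Letting s-maxage take precedence over max-age in a shared cache reflects our reading of the specification; a private cache ignores s-maxage here.

```python
def freshness_lifetime(cache_control, shared_cache):
    """Return the server-specified freshness lifetime in seconds, or None."""
    directives = {}
    for part in cache_control.split(","):
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip()
    # A shared cache prefers s-maxage when the server supplies it.
    names = ("s-maxage", "max-age") if shared_cache else ("max-age",)
    for name in names:
        if directives.get(name, "").isdigit():
            return int(directives[name])
    return None  # no usable max-age information in this header
```

A return value of 0 means the document is stale as soon as it arrives, so every access triggers a revalidation.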

7.9.3 Expires Response Headers

The deprecated Expires header specifies an actual expiration date instead of a time in seconds. The HTTP designers later decided that, because many servers have unsynchronized or incorrect clocks, it would be better to represent expiration in elapsed seconds rather than absolute time. An analogous freshness lifetime can be calculated as the difference, in seconds, between the Expires value and the Date value:

 Expires: Fri, 05 Jul 2002 05:00:00 GMT 

Some servers also send back an Expires: 0 response header to try to make documents always expire, but this syntax is illegal and can cause problems with some software. You should try to support this construct as input, but shouldn't generate it.
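The calculation can be sketched with Python's standard email.utils date parser. The function name expires_lifetime is hypothetical, and treating an unparseable Expires value (such as "0") as "already expired" is our interpretation of "support this construct as input."

```python
from email.utils import parsedate_to_datetime


def expires_lifetime(date_value, expires_value):
    """Freshness lifetime in seconds: Expires minus Date, never negative."""
    date = parsedate_to_datetime(date_value)
    try:
        expires = parsedate_to_datetime(expires_value)
    except (TypeError, ValueError):
        # Malformed values such as "Expires: 0" are accepted as input
        # and treated as already expired.
        return 0
    return max(0, int((expires - date).total_seconds()))
```

For instance, a Date of Fri, 05 Jul 2002 04:00:00 GMT combined with the Expires value shown earlier yields a lifetime of 3,600 seconds.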

7.9.4 Must-Revalidate Response Headers

The Cache-Control: must-revalidate response header tells the cache to bypass the freshness calculation mechanisms and revalidate on every access:

 Cache-Control: must-revalidate 

Attaching this header to a response is actually a stronger caching limitation than using Cache-Control: no-cache, because this header instructs a cache to always revalidate the response before serving the cached copy. This is true even if the server is unavailable, in which case the cache should not serve the cached copy, as it can't revalidate the response. Only the "no-store" directive is more limiting on a cache's behavior, because the no-store directive instructs the cache to not even make a copy of the resource (thereby always forcing the cache to retrieve the resource).

7.9.5 Heuristic Expiration

If the response doesn't contain either a Cache-Control: max-age header or an Expires header, the cache may compute a heuristic maximum age. Any algorithm may be used, but if the resulting maximum age is greater than 24 hours, a Heuristic Expiration Warning (Warning 113) header should be added to the response headers. As far as we know, few browsers make this warning information available to users.

One popular heuristic expiration algorithm, the LM-Factor algorithm, can be used if the document contains a last-modified date. The LM-Factor algorithm uses the last-modified date as an estimate of how volatile a document is. Here's the logic:

                If a cached document was last changed in the distant past, it may be a stable document and less likely to change suddenly, so it is safer to keep it in the cache longer.

                If the cached document was modified just recently, it probably changes frequently, so we should cache it only a short while before revalidating with the server.

The actual LM-Factor algorithm computes the time between when the cache talked to the server and when the server said the document was last modified, takes some fraction of this intervening time, and uses this fraction as the freshness duration in the cache. Here is some Perl pseudocode for the LM-factor algorithm:

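 use List::Util qw(max);  # max() is not a Perl builtin
 # Freshness limit = $lm_factor (e.g., 0.2) times the seconds since the last modification
 $time_since_modify = max(0, $server_Date - $server_Last_Modified); 
 $server_freshness_limit = int($time_since_modify * $lm_factor); 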

Figure 7-16 depicts the LM-factor freshness period graphically. The cross-hatched line indicates the freshness period, using an LM-factor of 0.2.

Figure 7-16. Computing a freshness period using the LM-Factor algorithm

Typically, people place upper bounds on heuristic freshness periods so they can't grow excessively large. A week is typical, though more conservative sites use a day.

Finally, if you don't have a last-modified date either, the cache doesn't have much information to go on. Caches typically assign a default freshness period (an hour or a day is typical) for documents without any freshness clues. More conservative caches sometimes choose freshness lifetimes of 0 for these heuristic documents, forcing the cache to validate that the data is still fresh each time before it is served to a client.
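Putting the pieces of this section together, here is a small sketch of heuristic expiration with an upper bound and a default. The function name and the parameter defaults (an LM-factor of 0.2, a one-week cap, a one-hour default) are illustrative choices drawn from the typical values mentioned above, not fixed rules.

```python
def heuristic_freshness(server_date, last_modified=None,
                        lm_factor=0.2, max_lifetime=7 * 24 * 3600,
                        default_lifetime=3600):
    """Heuristic freshness lifetime in seconds.

    server_date and last_modified are Unix timestamps (seconds).
    """
    if last_modified is None:
        # No freshness clues at all: fall back to the cache's default.
        return default_lifetime
    # LM-factor: a fraction of the time since the last modification.
    time_since_modify = max(0, server_date - last_modified)
    # Cap the result so old documents don't get excessive lifetimes.
    return min(int(time_since_modify * lm_factor), max_lifetime)
```

A conservative cache could pass default_lifetime=0 to force validation of documents that carry no freshness clues.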

One last note about heuristic freshness calculations: they are more common than you might think. Many origin servers still don't generate Expires and max-age headers. Pick your cache's expiration defaults carefully!

7.9.6 Client Freshness Constraints

Web browsers have a Refresh or Reload button to forcibly refresh content, which might be stale in the browser or proxy caches. The Refresh button issues a GET request with additional Cache-Control request headers that force a revalidation or unconditional fetch from the server. The precise Refresh behavior depends on the particular browser, document, and intervening cache configurations.

Clients use Cache-Control request headers to tighten or loosen expiration constraints. Clients can use Cache-Control headers to make the expiration more strict, for applications that need the very freshest documents (such as the manual Refresh button). On the other hand, clients might also want to relax the freshness requirements as a compromise to improve performance, reliability, or cost. Table 7-4 summarizes the Cache-Control request directives.

Table 7-4. Cache-Control request directives

| Directive | Purpose |
|---|---|
| Cache-Control: max-stale, Cache-Control: max-stale=<s> | The cache is free to serve a stale document. If the <s> parameter is specified, the document must not be stale by more than this amount of time. This relaxes the caching rules. |
| Cache-Control: min-fresh=<s> | The document must still be fresh for at least <s> seconds in the future. This makes the caching rules more strict. |
| Cache-Control: max-age=<s> | The cache cannot return a document that has been cached for longer than <s> seconds. This directive makes the caching rules more strict, unless the max-stale directive also is set, in which case the age can exceed its expiration time. |
| Cache-Control: no-cache, Pragma: no-cache | This client won't accept a cached resource unless it has been revalidated. |
| Cache-Control: no-store | The cache should delete every trace of the document from storage as soon as possible, because it might contain sensitive information. |
| Cache-Control: only-if-cached | The client wants a copy only if it is in the cache. |
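To make the freshness-related directives in Table 7-4 concrete, here is a hedged sketch of the check a cache might run before serving a stored copy without revalidation. The function name cache_can_serve and the dict representation of the directives are our assumptions. The sketch treats the request max-age as a hard limit on age and applies max-stale only to the server's expiration time, which is one reading of the table. It leaves out no-store and only-if-cached, which do not affect freshness.

```python
def cache_can_serve(age, freshness_lifetime, request_directives):
    """Decide whether a cached copy satisfies the client's constraints.

    age and freshness_lifetime are in seconds. request_directives maps
    directive names to integer seconds, or to None when no value is given.
    """
    if "no-cache" in request_directives:
        return False  # client insists on revalidation
    max_age = request_directives.get("max-age")
    if max_age is not None and age > max_age:
        return False
    # Seconds of freshness left; negative once the copy is stale.
    remaining = freshness_lifetime - age
    min_fresh = request_directives.get("min-fresh")
    if min_fresh is not None and remaining < min_fresh:
        return False
    if remaining >= 0:
        return True
    # Stale copy: acceptable only if max-stale allows the overshoot.
    if "max-stale" in request_directives:
        limit = request_directives["max-stale"]
        return limit is None or -remaining <= limit
    return False
```

For example, a copy that is 400 seconds old with a 300-second lifetime is refused by default, but it is served to a client that sends max-stale=100.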

7.9.7 Cautions

Document expiration isn't a perfect system. If a publisher accidentally assigns an expiration date too far in the future, any document changes she needs to make won't necessarily show up in all caches until the document has expired. [17] For this reason, many publishers don't use distant expiration dates. Also, many publishers don't use expiration dates at all, making it tough for caches to know how long the document will be fresh.

[17] Document expiration is a form of "time to live" technique used in many Internet protocols, such as DNS. DNS, like HTTP, has trouble if you publish an expiration date far in the future and then find that you need to make a change. However, HTTP provides mechanisms for a client to override and force a reloading, unlike DNS.

 



HTTP: The Definitive Guide
ISBN: 1565925092
Year: 2001
Pages: 294
