2.6 Forcing a Cache to Refresh

only for RuBoard - do not distribute or recompile

One of the tradeoffs of caching is that you may occasionally receive stale data. What can you do if you believe (or know) that a cache has given you stale data? You need some way to refresh or validate the data received from the cache. HTTP provides a couple of mechanisms for doing just that. Clients can generate requests with Cache-control directives, the two most common of which are no-cache and max-age. We'll discuss no-cache first because it has been around the longest.

2.6.1 The no-cache Directive

The no-cache directive notifies a cache that it cannot return a cached copy. Even if a fresh copy of the response, with a specific expiration time, is in the cache, the client's request must be forwarded to the origin server. RFC 2616 calls such a request an "end-to-end reload" (Section 14.9.4). The no-cache directive is sent when you click on the Reload button on your browser. In an HTTP request, it looks like this:

 GET /index.html HTTP/1.1
 Cache-control: no-cache

Recall that the Cache-control header does not exist in the HTTP/1.0 standard. Instead, HTTP/1.0 clients use a Pragma header for the no-cache directive:

 Pragma: no-cache

no-cache is the only directive defined for the Pragma header in RFC 1945. For backwards compatibility, RFC 2616 also defines the Pragma header. In fact, many of the recent HTTP/1.1 browsers still use Pragma for the no-cache directive instead of the newer Cache-control.
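As a rough illustration of the two header forms, the following Python sketch builds a reload request for either protocol version. The function names and the choice to send both headers under HTTP/1.1 are assumptions made for this example, not something a particular browser is known to do.

```python
def reload_headers(http_version="1.1"):
    """Return the headers that ask caches not to answer from a stored copy."""
    # Pragma is the only form an HTTP/1.0 cache understands.
    headers = {"Pragma": "no-cache"}
    if http_version == "1.1":
        # HTTP/1.1 adds the Cache-control form; sending both is harmless.
        headers["Cache-control"] = "no-cache"
    return headers


def format_request(path, host, http_version="1.1"):
    """Serialize a minimal GET request carrying the no-cache directive."""
    lines = [f"GET {path} HTTP/{http_version}"]
    if http_version == "1.1":
        lines.append(f"Host: {host}")  # required in HTTP/1.1 requests
    for name, value in reload_headers(http_version).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(format_request("/index.html", "www.foo.com"))
```

Sending both headers keeps the request meaningful to any HTTP/1.0 cache that sits between the client and the origin server.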

Note that the no-cache directive does not necessarily require the cache to purge its copy of the object. The client may generate a conditional request (with If-modified-since or another validator), in which case the origin server's response may be 304 (Not Modified). If, however, the server responds with 200 (OK), then the cache replaces the old object with the new one.
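The cache's side of this exchange can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the cache is a plain dictionary keyed by URL, and fetch_from_origin is a hypothetical callable standing in for the forwarded request.

```python
def handle_no_cache_request(cache, url, fetch_from_origin):
    """Forward a no-cache request and update the cache from the reply.

    `cache` maps URLs to stored bodies; `fetch_from_origin` is a stand-in
    for forwarding the request and returns a (status, body) pair.
    """
    # The cached copy is never used to answer a no-cache request.
    status, body = fetch_from_origin(url)
    if status == 200:
        # A full response replaces the old object.
        cache[url] = body
    # On 304 the stored object is still valid, so it is left in place.
    return status, body
```

Note that nothing here purges the stored object before the origin server answers; it is only overwritten when a full 200 (OK) response arrives.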

The interaction between no-cache and If-modified-since is tricky and often the source of some confusion. Consider, for example, the following sequence of events:

You are viewing an HTML page in your browser. This page is cached in your browser and was last modified on Friday, February 16, 2001, at 12:00:00.
The page author replaces the current HTML page with an older, backup copy of the page, perhaps with this Unix command:
```
 mv index.html.old index.html 
```
Now there is a "new" version of the HTML page on the server, but it has an older modification timestamp.

You try to reload the HTML page by using the Reload button. Your browser sends this request:

 GET http://www.foo.com/index.html
 Pragma: no-cache
 If-Modified-Since: Fri, 16 Feb 2001 09:46:18 GMT

The origin server sends a 304 (Not Modified) response and your browser displays the same page as before.

You could click on Reload until your mouse wears out and you would never get the "new" HTML page. What can you do to see the correct page?

If you are using Netscape Navigator, you can hold the Shift key down while clicking on Reload. This instructs Netscape to leave out the If-modified-since header. If you use Internet Explorer, hold down the Ctrl key while clicking on Reload. Alternatively, you can flush your browser's cache and then press Reload, which prevents the browser from sending an If-modified-since header in its request. Note that this is a user-agent problem, not a caching proxy problem.
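The sequence above comes down to a single timestamp comparison on the origin server. The sketch below assumes the server answers 304 whenever the file's modification time is not later than the If-modified-since value, which is a common but not universal behavior. The backup file's timestamp is a made-up value for illustration.

```python
from datetime import datetime, timezone


def origin_status(last_modified, if_modified_since=None):
    """Return the status code a simple origin server would send."""
    if if_modified_since is not None and last_modified <= if_modified_since:
        return 304  # Not Modified: the client keeps its copy
    return 200      # OK: the full object is sent


# Timestamp the browser sends in If-Modified-Since (from the request above).
browser_copy_time = datetime(2001, 2, 16, 9, 46, 18, tzinfo=timezone.utc)
# Hypothetical modification time of the restored backup file.
backup_file_time = datetime(2001, 2, 1, 8, 0, 0, tzinfo=timezone.utc)

# Plain Reload: the conditional header is sent and the older file looks unchanged.
print(origin_status(backup_file_time, browser_copy_time))  # 304
# Shift-Reload or an emptied browser cache: no conditional header is sent.
print(origin_status(backup_file_time))                     # 200
```

Because the restored file is older than the browser's copy, the comparison can never report a change, no matter how often the conditional request is repeated.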

In addition to the above problem, the Reload button, as implemented in most web browsers, leaves much to be desired. For example, it is not possible to reload a single inline image object. Similarly, it is not possible to reload web objects that are displayed externally from the browser, such as sound files and other "application" content types. If you need to refresh an image, PostScript document, or other externally displayed object, you may need to ask the cache administrator to do it for you. Some caches may have a web form that allows you to refresh cache objects. For this you need to know (and type in) the object's full URL.

Another problem with Reload is that it is often misused simply to rerequest a page. When the Web seems slow, we often interrupt a request as the page is being retrieved. To request the page again, you might use the Reload button. This, of course, sends the no-cache directive. Browsers do not have a button that requests a page again without sending no-cache. You can accomplish this by simply moving the cursor to the URL location box and pressing the Enter key.

As a cache administrator, you might wonder if caches ever can, or should, ignore the no-cache directive. A person who keeps a close watch on bandwidth usage might have the impression that the Reload button gets used much more often than necessary. Some products, such as Squid, have features that provide special treatment for no-cache requests. However, I personally do not recommend enabling these features because they violate the HTTP/1.1 protocol and leave users unable to get up-to-date information. One Squid option turns a no-cache request into an If-modified-since request. Another ignores the no-cache directive entirely.

2.6.2 The max-age Directive

The max-age directive specifies in seconds the maximum age of a cached response that the client is willing to accept. Whereas no-cache means "I won't accept any cached response," max-age says, "I will accept fresh responses that were last validated within this time period." To specify a limit of one hour, a client sends:

 Cache-control: max-age=3600

Note that max-age has lower precedence than an origin server expiration time. For example, if a cached object became stale five seconds ago, and it is requested with max-age=100000000, the cache must still validate the object with the origin server. In other words, max-age can cause validation to occur but can never prevent it from occurring.
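The precedence rule can be written as a small decision function. This is a minimal sketch with hypothetical parameter names: age is the object's current age, and freshness_lifetime is the lifetime derived from the origin server's expiration information, both in seconds.

```python
def can_serve_without_validation(age, freshness_lifetime, max_age=None):
    """Decide whether a cache may answer from its stored copy.

    All values are in seconds. `max_age` is the value from the client's
    request, or None if the client sent no max-age directive.
    """
    # The origin server's expiration time always applies.
    if age >= freshness_lifetime:
        return False
    # A request max-age can only tighten the limit, never loosen it.
    if max_age is not None and age > max_age:
        return False
    return True


# Object went stale five seconds ago; a huge max-age does not rescue it.
print(can_serve_without_validation(3605, 3600, max_age=100000000))  # False
# Fresh object, but older than the client is willing to accept.
print(can_serve_without_validation(1800, 3600, max_age=600))        # False
# Fresh object within the client's limit.
print(can_serve_without_validation(1800, 3600, max_age=3600))       # True
```

The two checks are independent, which is why a large max-age value has no effect on an object the origin server already considers stale.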

RFC 2616 calls a max-age=0 request an end-to-end revalidation, because it forces the cache to revalidate its stored response. The client might still get that stored response, unlike with an end-to-end reload (no-cache), where a cache is not allowed to return a cached response. The use of max-age=0, however, is a request to validate a cached response. If the client includes its own validator, then it's a specific end-to-end revalidation. Otherwise, a caching proxy adds its own validator, if any, and the request is an unspecified end-to-end revalidation.
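To make the three cases concrete, here is a small Python sketch that labels a request using the terms from RFC 2616, Section 14.9.4. The function name and the simple comma-splitting parser are assumptions for this example; a real cache would parse headers more carefully.

```python
def classify_request(headers):
    """Label a request by the kind of cache refresh it asks for."""
    # Header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    directives = [d.strip().lower()
                  for d in h.get("cache-control", "").split(",") if d.strip()]

    if "no-cache" in directives or h.get("pragma", "").strip().lower() == "no-cache":
        return "end-to-end reload"
    if "max-age=0" in directives:
        # A validator supplied by the client makes the revalidation "specific".
        has_validator = "if-modified-since" in h or "if-none-match" in h
        if has_validator:
            return "specific end-to-end revalidation"
        return "unspecified end-to-end revalidation"
    return "ordinary request"


print(classify_request({"Cache-control": "max-age=0"}))
print(classify_request({"Cache-control": "max-age=0",
                        "If-Modified-Since": "Fri, 16 Feb 2001 09:46:18 GMT"}))
print(classify_request({"Pragma": "no-cache"}))
```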

So far as I know, max-age is not widely supported in web browsers at this time. This is unfortunate, because a max-age=0 request is more likely to generate a short 304 (Not Modified) response, whereas a no-cache request is more likely to generate a full 200 (OK) response. It would be helpful if user agents had an option to set a max-age value in their requests. Users who prefer faster access over freshness would use a higher max-age setting. Similarly, those with lots of bandwidth and good response times could use a lower setting to ensure up-to-date responses.

2.6.3 The min-fresh Directive

In the previous section I talked about the max-stale directive, whereby clients can relax freshness requirements. Conversely, the min-fresh directive allows clients to specify more stringent freshness conditions. The value of the min-fresh directive is added to the object's age when performing staleness calculations. For example, if the client gives a min-fresh value of 3,600 seconds, the cache cannot return a response that would become stale within the next hour.
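The staleness calculation with min-fresh can be sketched as follows. The parameter names are hypothetical, and all values are in seconds.

```python
def fresh_enough(age, freshness_lifetime, min_fresh=0):
    """Return True if the cached response meets the client's min-fresh demand.

    The min-fresh value is added to the object's age before the usual
    comparison against its freshness lifetime.
    """
    return age + min_fresh < freshness_lifetime


# Lifetime of two hours, object cached 30 minutes ago, min-fresh=3600:
# the response stays fresh for another 90 minutes, so it qualifies.
print(fresh_enough(1800, 7200, min_fresh=3600))  # True
# Same object 90 minutes after caching: it expires in 30 minutes,
# which is inside the one-hour window, so the cache cannot return it.
print(fresh_enough(5400, 7200, min_fresh=3600))  # False
```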
