2.5 Validation

I've already discussed cache validation in the context of cache hits versus misses. Upon receiving a request for a cached object, the cache may want to validate the object with the origin server. If the cached object is still valid, the server replies with a short HTTP 304 (Not Modified) message. Otherwise, the entire new object is sent. HTTP/1.1 provides two validation mechanisms: last-modified timestamps and entity tags.

2.5.1 Last-modified Timestamps

Under HTTP/1.0, timestamps are the only type of validator. Even though HTTP/1.1 provides a new technique, last-modified timestamps remain in widespread use. Most HTTP responses include a Last-modified header that specifies the time when the resource was last changed on the origin server. The Last-modified timestamp is given in Greenwich Mean Time (GMT) with one-second resolution, for example:

    HTTP/1.1 200 OK
    Date: Sun, 04 Mar 2001 03:57:45 GMT
    Last-Modified: Fri, 02 Mar 2001 04:09:20 GMT

For objects that correspond to regular files on the origin server, this timestamp is the filesystem modification time.

When a cache validates an object, this same timestamp is sent in the If-modified-since header of a conditional GET request.

    GET http://www.ircache.net/ HTTP/1.1
    If-Modified-Since: Wed, 14 Feb 2001 15:35:26 GMT

If the server's response is 304 (Not Modified), the cache's object is still valid. In this case, the cache must update the object to reflect any new HTTP response header values, such as Date and Expires. If the server's response is not 304, the cache treats the server's response as new content, replaces the cached object, and delivers it to the client.
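The handling of a validation reply described above can be sketched in a few lines of Python. This is only an illustration, not how any particular cache is implemented: the function name `apply_validation_response` and the dictionary layout of a cache entry are assumptions made for the example.

```python
def apply_validation_response(cached, status, headers, body=b""):
    """Return the cache entry to keep after a validation reply.

    `cached` and the return value are plain dicts with "headers" and
    "body" keys (a hypothetical layout chosen for this sketch).
    """
    if status == 304:
        # The object is still valid: keep the stored body, but refresh
        # header values such as Date and Expires from the 304 reply.
        merged = dict(cached["headers"])
        merged.update(headers)
        return {"headers": merged, "body": cached["body"]}
    # Any other reply is treated as new content that replaces the entry.
    return {"headers": dict(headers), "body": body}
```

A real cache would also have to compare header names case-insensitively and decide which non-304 status codes are cachable at all; both are left out here.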

The use of timestamps as validators has a number of undesirable consequences:

  • A file's timestamp might get updated without any change in the actual content of the file. Consider, for example, moving your entire origin server document tree from one disk partition to another. Depending on the method you use to copy the files, the modification times may not be preserved. Then, any If-modified-since requests to your server result in 200 (OK) replies, instead of 304 (Not Modified), even though the content of the files has not changed. [6]

    [6] On Unix, cp -p and tar -p preserve modification times and other file attributes.

  • Dealing with timestamps becomes complicated when different systems have different notions of the current time. Unfortunately, it is not safe to assume that all Internet hosts are synchronized to the same time. It's easy to find clocks that are off by days, months, and even years. Thus, we can't really compare an origin server timestamp to the local time. It should be compared only to other origin server timestamps, such as the Date and Expires values. All hosts involved in serving web objects, and especially caches, should synchronize their clocks to known, reliable sources using the Network Time Protocol (NTP). Unix hosts can use the ntpd or xntpd programs.

  • If-modified-since values cannot be used for objects that may be updated more frequently than once per second.
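Two of these points are easy to see in a short Python sketch that uses the standard `email.utils` module. The helper name `http_date` and the sample epoch value are assumptions made for the example; the header values are the ones shown earlier in this section.

```python
from email.utils import formatdate, parsedate_to_datetime


def http_date(epoch_seconds):
    """Format a Unix timestamp as an HTTP date (GMT, whole seconds only)."""
    return formatdate(epoch_seconds, usegmt=True)


# Compare origin server timestamps only to each other: here the age of
# the object is measured against the server's own Date header rather
# than against the cache's local clock.
date = parsedate_to_datetime("Sun, 04 Mar 2001 03:57:45 GMT")
last_modified = parsedate_to_datetime("Fri, 02 Mar 2001 04:09:20 GMT")
age_at_server = date - last_modified

# One-second resolution: two modifications half a second apart produce
# the same header value, so If-modified-since cannot tell them apart.
first_write = 983678265.25   # hypothetical modification time
second_write = 983678265.75  # same second, later write
same_header = http_date(first_write) == http_date(second_write)
```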

On the other hand, timestamps have some nice characteristics as well:

  • Timestamps can be stored internally with a relatively small amount of memory (typically four bytes). The fixed data size simplifies data structures and coding.

  • We can derive some meaning from the last-modified timestamp. That is, we can use it for more than just validation. As mentioned in the previous section, last-modified times are often used in heuristics to estimate expiration times.

2.5.2 Entity Tags

HTTP/1.1 provides another kind of validator known as an entity tag. An entity tag is an opaque string used to identify a specific instance of an object, for example:

    HTTP/1.1 200 OK
    ETag: "8cac4-276e-35b36b6a"

A cache uses an entity tag to validate its object with the If-none-match request header:

    GET /index.html HTTP/1.1
    If-None-Match: "8cac4-276e-35b36b6a"

Upon receiving this request, the origin server examines its metadata for the object. If the entity tag (8cac4-276e-35b36b6a) is still valid, the server returns a 304 (Not Modified) reply. Otherwise, the server ignores the If-none-match header and processes the request normally. Most likely, it returns a 200 (OK) response with the new, updated content.

The phrase "If-none-match" is a bit difficult to grasp at first. To understand it, you need to realize that HTTP/1.1 allows an origin server to associate multiple entity tags with a single resource. A cache can store different versions of the same object (same URI), each with a unique entity tag. If the cached responses are stale, the cache learns which are still valid by using the If-none-match header, listing all of its entity tags for that resource:

    GET /index.html HTTP/1.1
    If-None-Match: "foo","bar","xyzzy"

In essence, a cache says to an origin server, "If none of these entity tags are valid, send me the new content."

If the origin server's reply is a 304 (Not Modified) message, it must tell the cache which tag is valid. The valid entity tag is given in the ETag header of a 304 response:

    HTTP/1.1 304 Not Modified
    ETag: "xyzzy"
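On the server side, the If-none-match exchange might look roughly like the following Python sketch. The function name `validate_if_none_match` and the return shape are assumptions made for the example, and the comparison is a plain string equality test. The special value `*` and the weak comparison rules are not handled.

```python
import re

# Matches one entity tag, optionally carrying the weak prefix.
ETAG_PATTERN = re.compile(r'(?:W/)?"[^"]*"')


def validate_if_none_match(if_none_match, current_etag):
    """Return (status, headers) for a conditional GET.

    `if_none_match` is the raw request header value and `current_etag`
    is the tag the server currently associates with the resource.
    """
    offered = ETAG_PATTERN.findall(if_none_match)
    if current_etag in offered:
        # One of the cache's tags is still valid; name it in the reply.
        return 304, {"ETag": current_etag}
    # None of the tags match: process the request normally and send the
    # new content (the body is omitted from this sketch).
    return 200, {"ETag": current_etag}
```

For the request shown above, a server whose current tag is `"xyzzy"` would answer 304 and echo that tag, while a server whose tag has changed to something else would answer 200.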

Unlike the Last-modified validator, entity tag validators are opaque to the cache. In other words, the cache cannot derive any meaning from an entity tag. The only operation a cache can perform on an entity tag is a test for equality (string comparison). Timestamps, however, are more useful to caches because they do convey some meaning. A cache can use the Last-modified timestamp to calculate the LM-factor and apply the heuristics described previously to identify objects likely to still be fresh. If servers all over the world suddenly stopped sending last-modified timestamps and sent only entity tags, caches would find it much more difficult to predict freshness. Fortunately, RFC 2616 (Section 13.3.4) advocates sending both last-modified timestamps and entity tags whenever possible.

2.5.3 Weak and Strong Validators

RFC 2616 defines two types of validators: weak and strong. Strong validators are used for exact, byte-for-byte equivalence of the content. Weak validators, on the other hand, are used for semantic equivalence. Consider, for example, a very simple change to a text document, such as correcting a spelling error. The correction does not alter the meaning of the document, but it does change the bits of the file. In this situation, a conditional request with a strong validator would return the new content, while the same conditional request with a weak validator would return "not modified."

HTTP/1.1 requires strong comparison in some circumstances and allows weak comparison in others. When strong comparison is required, only strong validators may be used. When weak comparison is allowed, either type of validator may be used.

An entity tag is, by default, a strong validator. Thus, even the slightest change to an origin server resource requires the entity tag to change as well. The Apache server uses a checksum over the message body as one component of its entity tags, so this happens automatically. If an origin server wants to create a weak entity tag, it prefixes the quoted tag with W/, placed outside the quotation marks:

    ETag: W/"foo-bar-123"
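The difference between the two comparison functions can be sketched in Python as follows. This assumes the RFC 2616 syntax in which a weak tag carries a W/ prefix in front of the quoted string; the function names are made up for the example, and the sketch covers entity tags only, not timestamps.

```python
def split_etag(etag):
    """Return (is_weak, opaque_tag) for a tag such as W/"foo" or "foo"."""
    if etag.startswith("W/"):
        return True, etag[2:]
    return False, etag


def strong_match(a, b):
    """Strong comparison: the tags are identical and neither is weak."""
    weak_a, tag_a = split_etag(a)
    weak_b, tag_b = split_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_match(a, b):
    """Weak comparison: the opaque parts are identical; weakness is ignored."""
    return split_etag(a)[1] == split_etag(b)[1]
```

Under these definitions, `W/"foo-bar-123"` and `"foo-bar-123"` match weakly but not strongly, which is why a weak tag cannot be used where strong comparison is required.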

Last-modified timestamps, by comparison, are implicitly weak unless a cache can determine, according to a complex set of rules, that a particular timestamp is strong. The reason is that the timestamp provides only single-second resolution. An object may be modified twice (or more) within a one-second interval. If this happens and the protocol requires strong validation, users could receive the wrong response. A caching proxy that does not support entity tags is better off forwarding conditional requests to origin servers when strong comparison is required.

HTTP/1.1 has many rules regarding weak and strong validators. These details are a bit too overwhelming to cover here, but you can find more information in Section 13.3.3 of RFC 2616.

Web Caching
ISBN: 156592536X
Year: 2001
Pages: 160