3.6 Content Integrity

Can you trust the information you receive from a cache? How do you know it has not been modified? How do you know it is what the origin server intends for you to see?

This is an extremely difficult problem, with no known solutions at this time. TCP does not currently provide any form of end-to-end security, which means this problem is not specific to HTTP or the Web. The Transport Layer Security protocol (TLS, formerly Secure Sockets Layer) does provide end-to-end security on top of the network transport protocols. The TLS protocol [Dierks and Allen, 1999] is designed to prevent eavesdropping, tampering, and message forgery. However, the security provided by TLS is in effect only for the duration of the data transfer. It does not guarantee, especially for cache hits, that the object you receive has not been modified since the origin server generated it. Unfortunately, we do not have a general-purpose digital signature scheme for web objects. Even if such a thing did exist, to be of any real value it would require out-of-band communication for the key exchange. In other words, it would be pointless to retrieve signing keys from the cache.

Recent security features being added to DNS [Eastlake, 1999] might be able to support a scheme for authenticating web objects. For example, let's say you request the URL http://www.monkeybrains.net/index.html. The response is an HTML page that includes, in comments, a digital signature. To validate the signature, you need the public key of the author or owner. Such keys can be entered into a DNS zone. Continuing with our example, we query the DNS for a http.www.monkeybrains.net KEY record. The returned key (if any) and the signature are enough to prove that the HTML page is authentic.
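
The first two steps of such a scheme can be sketched in Python. This is only an illustration under stated assumptions: the comment syntax `<!-- signature: ... -->` and the helper names `key_record_name` and `extract_signature` are hypothetical, since no standard format exists. The sketch stops short of the DNS query and the cryptographic check, which would need a resolver and a signature library.

```python
import re
from urllib.parse import urlsplit

# Hypothetical comment format: the text says only that the signature
# lives in an HTML comment, not how it would be encoded.
SIGNATURE_COMMENT = re.compile(r"<!--\s*signature:\s*([A-Za-z0-9+/=]+)\s*-->")


def key_record_name(url):
    """Build the DNS name to query for a KEY record (scheme.hostname)."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.hostname:
        raise ValueError("not an absolute URL: %r" % url)
    return "%s.%s" % (parts.scheme, parts.hostname)


def extract_signature(html):
    """Return the embedded signature string, or None if the page has none."""
    match = SIGNATURE_COMMENT.search(html)
    return match.group(1) if match else None


# Made-up page body and signature value, for illustration only.
page = "<html><!-- signature: c2lnbmF0dXJl --><body>Hello</body></html>"
name = key_record_name("http://www.monkeybrains.net/index.html")
signature = extract_signature(page)
print(name, signature)
```

A real verifier would then fetch the KEY record for `name` and check `signature` against the page content with the returned public key.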

To date, I am not aware of any caches that have been broken into and had cache content modified. However, on numerous occasions, origin server security has been compromised, and the perpetrators have replaced the normal home page content with something else. Usually these pranks are short-lived and not a real problem. If the bogus pages make it into web caches, though, some users could receive the wrong content even after the origin server has been restored.

Another way to get bogus content to web users is to attack the DNS. Most networked applications in use today, including proxy caches, inherently trust the answers to their DNS queries. If my cache asks for the IP address of www.microsoft.com and gets a wrong answer, it happily connects to that address and retrieves the wrong content. The best way to prevent this from happening is to remain up to date with new releases (and patches) of name resolver software (e.g., BIND).

Assuming incorrect objects have been loaded into a cache, what can be done to get rid of them? In most cases, it is sufficient to issue a request with a Pragma: no-cache header. This operation, most easily accomplished by clicking the Reload button in a browser, replaces the old object with a new one. Occasionally, this may not work, or you may want to remove an object entirely instead of replacing it. Caching products should provide this functionality via their management interface. With Squid, you can use the client program to issue a purge request:

 client -m PURGE http://ircache.nlanr.net/badobject
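
Both operations can also be scripted. The following Python sketch builds the reload and purge requests with the standard library. It assumes a cache that accepts the PURGE method from the caller, which typically has to be permitted in the cache's access controls. The proxy address `cache.example.com:3128` and the helper names are placeholders, not part of Squid.

```python
import urllib.request

# Placeholder proxy address (an assumption); substitute your own cache.
PROXY = "http://cache.example.com:3128"


def reload_request(url):
    """Request that asks caches for a fresh copy, like a browser Reload."""
    return urllib.request.Request(url, headers={"Pragma": "no-cache"})


def purge_request(url):
    """Request that asks the cache to remove the object entirely."""
    return urllib.request.Request(url, method="PURGE")


def send_via_proxy(request, proxy=PROXY):
    """Send the request through the cache and return the HTTP status code.

    Error responses (4xx, 5xx) raise urllib.error.HTTPError.
    """
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy})
    )
    with opener.open(request, timeout=10) as response:
        return response.status


bad = "http://ircache.nlanr.net/badobject"
print(reload_request(bad).get_header("Pragma"))
print(purge_request(bad).get_method())
```

Calling `send_via_proxy(purge_request(bad))` would send the purge through the configured cache. The sketch itself only constructs the requests, so it runs without network access.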
