Resin and Caching | Mastering Resin

When the usefulness of the Web became apparent after the flurry of activity in the late 1990s, site developers suddenly had to worry about staleness of a site, response times, bandwidth consumption, server usage, and other such topics. The fundamental theme today is to make the Web faster (with better performing servers) and less expensive to implement and maintain. All of these topics revolve around page caching.

Overview of Page Caching

Although it might seem like a simple topic, page caching is important because it brings advantages to the Web without much effort on the developer's part. Some of these advantages are as follows:

Reduced page latency— Users quickly get the information they are seeking.
Less backend hardware— Serving cached pages requires fewer database and Web servers.
Reduced bandwidth— Pages cached in the browser are served without contacting the server.

Caching can occur on both the client and server as well as in a semi-middle tier. On the client, the cache is found in the Web browser; it handles storing pages returned from a server. The cache can typically be sized according to the needs of the user; the size can range from a single megabyte to many hundreds. The performance of built-in Web caches isn't always the best, so third-party extensions are available to increase size and performance.

When you browse to a large and potentially busy Web site, you may see a URL redirection away from the primary site to a proxy location where a proxy cache is used to store pages. The proxy cache is situated between the client and the server, typically near a network gateway. If the proxy cache is located near a gateway, users on the local network don't need to traverse the entire Internet to retrieve pages.

The last cache sits at the Web server. These caches are usually called reverse proxy caches; you can find them built into the Web server or as a separate product placed in front of it. Clients hit the cache before hitting the server.

How Web Caches Work

The functionality of most caches is governed by rules established in the HTTP protocol. These rules work with the page passed from the Web server to the client browser to determine whether the page should be cached. Some of the rules are as follows:

Secure pages are not cached.
Authenticated pages are not cached.
If a page's headers say not to cache it, the page isn't cached.

A page is cached if:

The expiration date on the page has not yet passed.
The page is already in the cache, and the cache is set to check for a newer copy only once per session.
The page was last modified well in the past, which suggests that it changes rarely.
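The rules above can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions: the CacheRules and Page names are hypothetical, the once-per-session and modification-date rules are left out, and real caches apply more detailed logic.

```java
/** Illustrative sketch of the rules above; not the logic of any particular cache. */
final class CacheRules {

    /** The few facts about a response that the rules look at (hypothetical type). */
    static final class Page {
        final boolean secure;        // served over a secure connection
        final boolean authenticated; // the request required a login
        final boolean noCache;       // the page itself asked not to be cached
        final long expiresAtMillis;  // expiration time in epoch millis, or -1 if absent

        Page(boolean secure, boolean authenticated, boolean noCache, long expiresAtMillis) {
            this.secure = secure;
            this.authenticated = authenticated;
            this.noCache = noCache;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    /** The three "not cached" rules: any one of them keeps the page out of the cache. */
    static boolean mayStore(Page page) {
        return !page.secure && !page.authenticated && !page.noCache;
    }

    /** The expiration rule: a stored page can be served until its expiration time passes. */
    static boolean isFresh(Page page, long nowMillis) {
        return page.expiresAtMillis >= 0 && nowMillis < page.expiresAtMillis;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        Page page = new Page(false, false, false, now + 3_600_000L); // expires in one hour
        System.out.println("may store: " + mayStore(page));
        System.out.println("fresh now: " + isFresh(page, now));
    }
}
```

Keeping the "may it be stored" question separate from the "is it still fresh" question mirrors the two lists: the first decides whether a page enters the cache at all, the second whether a stored copy can be served without going back to the server.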

Let's examine how a page tells the Web browser all these things.

HTML Metatags and HTTP Headers

One of the tags page developers use least often is the metatag. A metatag provides attributes about the page that are read by the Web browser. Unfortunately, metatags are generally honored only by browser caches, not by most proxy caches: a Web browser has to parse the page in order to display it, whereas a proxy cache passes the page along without reading its HTML.

The metatag that Web designers are most likely to include in a page is Pragma: no-cache. Its format is as follows:

 <META HTTP-EQUIV="Pragma" CONTENT="no-cache">

This metatag is meant to tell caches not to store the page. Unfortunately, because many caches never see the tag and some don't honor it, it is largely a waste of page space.

The alternative to metatags is to use HTTP headers that pass caching information. You don't place an HTTP header directly in a Web page; instead, the server adds it to the response dynamically. All Web pages begin their transfer from the server to the Web browser with an HTTP header, which is seen by any proxy caches and the Web browser.

Here's an example:

 HTTP/1.1 200 OK
 Date: Fri, 13 Sep 2002 13:50:24 GMT
 Server: Apache/1.3.3 (Unix)
 Cache-Control: max-age=3600, must-revalidate
 Expires: Fri, 13 Sep 2002 14:50:04 GMT
 Last-Modified: Wed, 29 May 2002 03:23:54 GMT
 ETag: "5c23-763-2333accd"
 Content-Length: 1045
 Content-Type: text/html

After this information comes the HTML for the served Web page. As you can see, the header carries two powerful cache headers: Cache-Control and Expires. The Expires header is recognized by most caches and Web browsers; it specifies how long the page stays fresh and can be served to clients without asking the server for a new copy. Setting an Expires value far in the future lets clients and proxy caches store your pages and respond quickly. The value is always a date/time stamp expressed in GMT. A value in the past, even by a second, marks the page as already expired, so caches won't serve it without checking with the server.
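To make the GMT date format concrete, here is a minimal Java sketch that builds the two cache headers for a page. It uses only the standard java.time classes; the CacheHeaders class and its freshFor method are hypothetical names chosen for illustration, and the code is not tied to any server API.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative helper (hypothetical names) that builds the two cache headers. */
final class CacheHeaders {

    // HTTP dates are always written in GMT, for example "Fri, 13 Sep 2002 14:50:24 GMT".
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    /** Returns Expires and Cache-Control values for a page served at servedAt. */
    static Map<String, String> freshFor(Instant servedAt, Duration lifetime) {
        Map<String, String> headers = new LinkedHashMap<>();
        // Expires is an absolute point in time: the serving time plus the lifetime.
        headers.put("Expires", HTTP_DATE.format(servedAt.plus(lifetime)));
        // max-age expresses the same lifetime as a number of seconds.
        headers.put("Cache-Control", "max-age=" + lifetime.getSeconds());
        return headers;
    }

    public static void main(String[] args) {
        freshFor(Instant.now(), Duration.ofHours(1))
                .forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

Deriving both values from the same lifetime keeps the absolute Expires date and the relative max-age value from disagreeing with each other.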

The Cache-Control header was introduced in the HTTP 1.1 protocol to help caches do their job of caching pages. The Cache-Control header can contain a number of attributes:

max-age=[seconds]— The maximum time a page is fresh and can be served from the cache. The seconds start from the time the page was requested.
s-maxage=[seconds]— Same as max-age, but applies only to proxy caches.
public— Allows a normally uncached page to be cached.
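no-cache— Indicates that a stored copy must not be served without first revalidating it with the server, whether the copy is held by the Web browser or by a proxy cache. (To forbid storing the page at all, the specification provides no-store.)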
must-revalidate— Forces the cache to be strict with the Cache-Control attributes. (The HTTP protocol allows caches to fudge some on the serving of cached pages.)
proxy-revalidate— Same as must-revalidate, but for proxy caches only.
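As a rough sketch of how a cache might read these attributes, the following Java class splits a Cache-Control value into its parts and picks the lifetime that applies to a given kind of cache. The CacheControl class and its method names are hypothetical, and the parsing is simplified: it assumes no attribute value contains a quoted comma.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative Cache-Control reader (hypothetical names, simplified parsing). */
final class CacheControl {

    /** Splits a value such as "max-age=3600, must-revalidate" into attributes. */
    static Map<String, String> parse(String headerValue) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (String part : headerValue.split(",")) {
            String token = part.trim();
            if (token.isEmpty()) {
                continue;
            }
            int eq = token.indexOf('=');
            // Attribute names are case-insensitive, so store them in lowercase.
            String name = (eq < 0 ? token : token.substring(0, eq)).trim().toLowerCase(Locale.ROOT);
            String value = eq < 0 ? "" : token.substring(eq + 1).trim();
            attributes.put(name, value);
        }
        return attributes;
    }

    /** Lifetime in seconds for a proxy (shared) or browser cache, or -1 if none is given. */
    static long freshnessSeconds(Map<String, String> attributes, boolean proxyCache) {
        // s-maxage takes the place of max-age, but only for proxy caches.
        if (proxyCache && attributes.containsKey("s-maxage")) {
            return Long.parseLong(attributes.get("s-maxage"));
        }
        if (attributes.containsKey("max-age")) {
            return Long.parseLong(attributes.get("max-age"));
        }
        return -1;
    }

    public static void main(String[] args) {
        Map<String, String> attributes = parse("max-age=60, s-maxage=600, must-revalidate");
        System.out.println("browser cache: " + freshnessSeconds(attributes, false));
        System.out.println("proxy cache:   " + freshnessSeconds(attributes, true));
    }
}
```

A fuller implementation would also have to act on no-cache, must-revalidate, and proxy-revalidate before deciding whether a stored page can be served.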

You can find complete information in the HTTP 1.1 specification at www.ietf.org/rfc/rfc2616.txt. Now, let's see how to use this information with XTP, JSP, and servlets in the Resin server.