Section 13.1. Browser-Side Cache | Ajax Design Patterns

13.1. Browser-Side Cache

Auto-Update, Memoise, Memoize, Sync, Synchronise, Sychronize, Real-Time

Figure 13-1. Browser-Side Cache

13.1.1. Developer Story

Devi's produced a physics simulation and, though it's functional, she's disappointed that it keeps pausing to request calculations from the server. To speed it up, she sets up a cache that will retain the calculations locally.

13.1.2. Problem

How can you make the system respond quickly to user activity?

13.1.3. Forces

The application should respond to user actions quicklyideally, instantaneously.
Many user actions require a response from the server.
Responses from the server can be noticeably latent due to data transfer and server processing overheads.

13.1.4. Solution

Retain server results in a Browser-Side Cache. The cache is a JavaScript map-style object that holds query-result pairs; the queries are cache keys and the server results are server results. So whenever the browser needs to query the server, first check the cache. If the query is held as a key in the cache, the corresponding value is used as the result, and there's no need to access the server. Caching like this is performed in many systems, but there are some interesting Ajax-specific issues, such as the relationship with the browser's built-in cache and the asynchronous nature of calls.

XMLHttpRequest Call (Chapter 6) explains that call results can already be cached by the browser, so you might wonder why this pattern exists. After all, the browser not only handles caching for you, but can cache huge quantities of data using the filesystem, and, unlike a Browser-Side Cache, this data will usually live beyond the current session. The reason you'd use a Browser-Side Cache is to exert more control over caching. You get to choose how much data is cached and how long it lasts, whereas those things are set by the user in a web browser cache. Moreover, you bypass the notorious portability issues associated with cache control, choosing exactly what will be and what will not be cached. Another advantage of a Browser-Side Cache over the web browser's cache is that you can save JavaScript objects you've built up, thus saving the effort of reconstructing them later on (a process known as "memoisation").

What exactly is the format of the key and value? In the simplest case, the query would just be the URL, and the result would be the response body. However, that's not so useful because if you make calls correctly, as discussed in XMLHttpRequest Call (Chapter 6) and RESTful Service (Chapter 9), the browser will handle caching on your behalf, and as it's backed by the filesystem, it can cache much greater quantities of data. However, more useful are caches that hold high-level semantic contenttypically the results of processing server responses.

A cache data structure requires some way to access the key-value pairs "randomly"that is, directly, without having to traverse the entire data structure until the desired key-value pair is found. In JavaScript, a cache can be created using an associative array (in JavaScript, this is the same as an object). So an empty array is created at startup and gradually accumulates server results. Of course, an Ajax App lives indefinitely, so the array could continue growing if not controlled. There are a few ways to handle this, as discussed later in "Decisions."

It's common in many systems to treat the cache as a Proxy, as shown in Figure 13-2 (see Gamma et al.'s [1995] Proxy pattern). That is, clients retrieve results from an object called something like "ResultFetcher," which encapsulates anything to do with fetching of results. From their perspective, the ResultFetcher just provides results. It doesn't matter to the client whether all results are directly pulled from the server or whether some of them are cached inside ResultFetcher.

Figure 13-2. Standard synchronous caching setup

Here's where the asynchronous nature of XMLHttpRequest adds a slight complication. Under normal circumstances, a caching ResultFetcher always returns the desired value synchronously, whether the result was in the cache or not. But with Ajax, it can only do this if the result was indeed found in the cache. Otherwise, the server response will come back some time later, after the fetching function has already returned. To deal with this, you might consider having the requester pass a callback function to the fetching procedure. So the fetching procedure never returns anything directly, but instead ensures the callback function is eventually called with the desired value. If it knows the value immediately, the callback function will be called straightaway, and if not, it will be called later on. In other words, the ResultFetcher has roughly the same interface and external behavior as XMLHttpRequest (or a suitable wrapper).

13.1.5. Decisions

13.1.5.1. What will be stored as keys? For values?

The solution mentioned a couple of possibilities for keys and values. In the first instance, they might simply be URLs and response bodies. Using the entire URL and response body does come at a cost, however. Both contain a lot of useless information, which limits how many items can be stored in the cache. An alternative is to use semantically related values. For example, the key might be customer names, and the values might be customer objects. Semantic values also lets you incorporate other data, such as results from calculations.

13.1.5.2. How will cache size be kept under control?

You typically keep the cache size under control by deciding on its capacity and on some way to remove elements when that size has been reached. Each new element usually results in an older element being discarded. For efficiency, you might discard a large batch at once, and then let the cache gradually build up again.

Two common algorithms are (http://en.wikipedia.org/wiki/Cache_algorithms):

Least Recently Used (LRU): The discarded item is the one with the longest time since it was last retrieved.
Least Frequently Used (LFU): The discarded item is the one that has been retrieved the least.

Both algorithms are feasible in JavaScript, provided you use the right data structure. "Code Example: Cached Sum Demo," later, illustrates LRU.

13.1.5.3. How will you protect against stale data?

Regarding stale data, the first question to ask is, "How recent does the data have to be?" If it's real-time stats being used by a nurse to monitor a patient's health, it probably needs to be pretty recent. So much so, that a cache might even be out of the question. If it's a student perusing a 50-year old article on ancient Athenian literature, a 12-hour cache will do fine.

There are several ways to enforce this decision:

Attach a timestamp to each item that goes into the cache. Whenever you retrieve an item, inspect the timestamp to determine if the result is recent enough.
As a variant on the above, schedule a periodic loop to actively delete stale items.
Implement a browser-server protocol that allows the browser to determine if items are stale. So the browser might keep a timestamp for each item, and the server exposes a service to accept a timestamp. The server only needs to send the whole response if the value has changed since that time. An alternative to timestamps would be a hash functiona function that the browser runs against the cached value that can then be compared by the server against the item's most recent hash value, to see if a change has occurred.
Implement a service on the server that announces changes that have occurred. Use Periodic Refresh (Chapter 10) to actively listen for such changes and to delete from the cache any items that have changed.

13.1.6. Real-World Examples

This section contains a XMLHttpRequest Call library with caching support. "Real-World Examples" in Predictive Fetch includes several other systems with Browser-Side Caches.

13.1.6.1. libXmlRequest library

Stephen W. Cote's libXmlRequest (http://www.whitefrost.com/servlet/connector?file=reference/2003/06/17/libXmlRequest.html) was one of the earlier wrappers of the XMLHttpRequest object. A typical asynchronous request looks like this:

   getXml(path, callbackFunction, 1, requestId);

To make the request cached, simply add an extra argument at the end:

   getXml(path, callbackFunction, 1, requestId, 1);

The approach shows how cache functionality can be made orthogonal to core application functionality.

13.1.7. Code Example: Cached Sum Demo

The Basic Sum Demo (http://ajaxify.com/run/sum/) has to resort to the server each time the cache is reached. This refactoring adds caching functionality to reduce server calls and is implemented in three stages.

13.1.7.1. Including the query in the response

The asynchronous nature of XMLHttpRequest Call separates the initial query from the response, so it's not always clear when a request comes in what the corresponding request was. One way to achieve this is to include the original query as part of the response. That's the first refactoring here. The refactoring is detailed in XML Message (Chapter 9), and the net effect is a response like the one that is shown next (http://ajaxify.com/run/sum/xml/sumXML.php?figure1=4&figure2=8&figure3=).

   <sum>     <inputs>       <figure >4</figure>       <figure >8</figure>       <figure ></figure>     </inputs>     <outputs>12</outputs>   </sum>

Previously, the response was just 12. The resulting value is extracted from the XML in a slightly different way, but the inputs are not used until the next iteration.

13.1.7.2. An infinite cache

The next refactoring creates a very basic cacheone that has no regard for the laws of physicsit just keeps growing indefinitely. That's a bad thing, but a useful stepping stone to the final iteration. The cache holds the sum against a figuresString, simply a comma-separated list of the three figures.

First, the cache is created by a global variable and initialized from the onload method:

   var sumByFigures;   ...   function restartCache(html) {     sumByFigures = new Array( );     ...   }

Each time a sum is submitted, figuresString is calculated to form a key for the cache. The cache is then interrogated to see if it already contains that sum. If not, an asynchronous call is set up. Either way, the ultimate consequence is that repaintSum( ) will eventually be called with the new sum. If the result is already cached, it will be called straightaway. If not, it will be called after the server has returned.

   function submitSum( ) {     ...     var figuresString =       figures.figure1 + "," + figures.figure2 + "," + figures.figure3;     var cachedSum = sumByFigures[figuresString];     if (cachedSum) {       repaintSum(cachedSum);     } else {       repaintSum("---");       ajaxCaller.get("sumXML.php", figures, onSumResponse, true, null);     }   }

onSumResponse not only calls repaintSum( ) with the value in the response, but also pushes the result onto the cache:

   function onSumResponse(xml, headers, callingContext) {     var sum = xml.getElementsByTagName("output")[0].firstChild.nodeValue;     repaintSum(sum);     var figures = xml.getElementsByTagName("figure");     var figuresString =   figures[0].firstChild.nodeValue + ","                         + figures[1].firstChild.nodeValue + ","                         + figures[2].firstChild.nodeValue;     sumByFigures[figuresString] = sum;   }

Finally, repaintSum is the function that detects a changeeither wayand simply morphs the display:

   function repaintSum(html) {     self.$("sum").innerHTML = html;   }

13.1.7.3. A finite cache

The final cache (http://www.ajaxify.com/run/sum/xml/cached/expiry/) enhances the previous version by introducing a least-recently used disposal algorithm. Each time a new item is added to the cache, the least recently used item is discarded from the cache. It would be inefficient to trawl through the entire cache each time that happens, comparing usage times. So, in addition to the associative array, a parallel data structure is composed. It's a queue, where each new item is pushed to the tail of the queue and gradually approaches the head as further items are pushed on. When the queue is full and an item is pushed on to the tail, each item moves down one, and the head item "falls off" the queue, so it is deleted from both the queue and the associative array. Whenever an item is retrieved, it's sent back to the tail of the queue. That's what ensures the least recently used item is always at the head.

The queue itself is a class with the following functions: enqueue( ), dequeue( ), and sendToTail( ). It works by tracking the head, the tail, and the size, and by keeping the items in a doubly linked list. For example, enqueue( ) is defined like this:

   Queue.prototype.enqueue = function(obj) {     newEntry = {       value: obj,       next: this.tail     }     if (this.tail) {       this.tail.prev = newEntry;     } else { // Empty queue       this.head = newEntry;     }     this.tail = newEntry;     this.size++;   }

Back to the sum application, which now declares a queue as well as the associative array:

   var figuresQueue, sumByFigures;

Each time a new element arrives from the server, it's sent to encache( ). encache will lop the least recently used item off both data structures if the queue is full. Then it will add the new item to both.

   function encache(figuresString, sum) {     // Add to both cache and queue.     // Before adding to queue, take out queue head and     // also remove it from the cache.     if (figuresQueue.size == cacheSize) {       removedFigures = figuresQueue.dequeue(figuresString);       delete figuresString[removedFigures];     }     figuresQueue.enqueue(figuresString);     sumByFigures[figuresString] = sum;     $("queueSummary").innerHTML = figuresQueue.describe( );   }

Whenever the cache is queried, the queried value is not only returned, but is also sent back to the tail of the queue to mark it as having been recently used:

   function queryCache(figuresString) {     // Recently used, so move corresponding entry back to tail of queue     // if it exists.     figuresQueue.sendToTail(figuresString);     $("queueSummary").innerHTML = figuresQueue.describe( );     return sumByFigures[figuresString];   }

With these abstractions in place, there is not much change to the core part of the sum script. submitSum( ) queries the cache and calls the server if the result is not found. And the server response handler ensures new results are added to the cache:

   function submitSum( ) {     ...     var cachedSum = queryCache(figuresString);     if (cachedSum) {       repaintSum(cachedSum);     } else {       repaintSum("---");       ajaxCaller.get("sumXML.php", figures, onSumResponse, true, null);     }   }   ...   function onSumResponse(xml, headers, callingContext) {     ...     encache(figuresString, sum);   }

13.1.8. Alternatives

13.1.8.1. Built-in browser cache

As explained in the solution, the built-in browser cache is an alternative form of cache. It's better suited for larger data HTML, but is more difficult to control than a Browser-Side Cache.

13.1.8.2. Server-side cache

Caching data on the server cuts down on processing, especially when the data is shared by multiple users. However, that's mainly a benefit if the server processing is the bottleneck. Bandwidth is more often the chief constraint, and the server-side cache won't reduce browser-server traffic. The best option is often a combination of browser-side and server-side caching.

13.1.9. Related Patterns

13.1.9.1. Submission Throttling

Browser-Side Cache focuses on "read caching," in which responses from the server are held in the cache. Another flavor of caching is "write caching," where output is held in a buffer to defer outputting to its true destination. Submission Throttling (Chapter 10) is a form of write-caching.