Types of Caches | Scalable Internet Architectures

There are five basic types of caching approaches when dealing with content distribution. Each has pros and cons, and more importantly each has its place. Often it is awkward, or simply impossible, to implement a specific caching approach due to the nature of the underlying data.

Layered/Transparent Cache

The transparent (or layered) cache should be well understood by now. This is the technology deployed to solve our image-serving problem in Chapter 6. The caching proxy is placed in front of the architecture and all requests arrive at the proxy. The proxy attempts to service the request by using its private cache. If the response to that request has not yet been cached (or the cache has expired or otherwise been destroyed), the cache is (re)populated by issuing a request for the content to the authoritative source and storing the response.

This technique is ideal for static content because it is often easy to build commodity proxy caches capable of holding the entire static content store (capacity-wise). This means that as time goes on, the cache will closely resemble the authoritative source, and nearly 100% of all requests issued to the cache can be satisfied on the spot from its local copy.

This technology is called transparent because many implementations allow you to place a cache in front of web servers without any configuration changes. In Chapter 6, we chose not to use the technology in this fashion because we wanted to play tricks with our domain name (images.example.com). In a local web environment, the cache can intercept data and proxy it back to web servers by acting as a bridge or as a configured part of the architecture in the form of an IP gateway. Either way, web clients have no knowledge of its placement in the architecture, and it has no noticeable side effects from their perspective.

These proxy caches have an additional added value discussed briefly in Chapter 6. This value is acceleration and it is appropriate to delve into it more deeply now. Web clients (users with browsers) will most certainly be connecting to the website over a long-haul link that has a distinctly poorer performance than the main egress point of the website to the Internet. This means that the website could push the request content back to the client very rapidly if it wasn't for the slow connection over which it must feed the data. This is often referred to as spoon-feeding clients.

Application servers differ from static content servers in that they have more work to do. A static server interprets a request and hands back the appropriate content to the client. A dynamic server interprets a request and is responsible for constructing the content and then handing it back to the client. The act of constructing the content often requires more expensive resources than simply serving it (more RAM and faster processors). This means that application servers sport a higher price tag than their static counterparts.

At the end of the day, you do not want to look back and find that you spent a considerable amount of resources on your application servers spoon-feeding clients. An application server should be generating content and delivering it as fast as possible so that it can proceed to the next dynamic content generation job in its queue. This is where proxy caches shine.

A proxy cache is an "accelerator" if it sits between the client and the application server. In this configuration, the connection between the proxy cache and application server is a local area one that boasts high throughput and low latency (shown in Figure 7.2). This means that the response from the application server can be completed quickly and stored on the proxy cache allowing the application server to proceed to the next request. The proxy cache then spoon-feeds the content to the slow client and acts as a cost-effective buffer preventing slow clients from affecting the application servers.

Figure 7.2. An accelerating web proxy cache

If most pages served from an application server are unique, it is clear that caching the data on a transparent cache will be of little value because the cache hit-rate will be nearly 0%. However, the accelerating aspect of the technology alone is often worth the investment.

Integrated (Look-Aside) Cache

An integrated cache steps far away from the transparent caching in both implementation and mentality. Transparent caching is simply a black-box architectural component that can be placed in front of an architecture for added benefit. Integrated caching is, as you might assume, integrated into the architecturemore specifically into the application. Integrated caching is most similar to the programming technique of computational reuse where we store the results of expensive computations on the side so that they don't need to be repeated.

Computational Reuse

Computational reuse is a programming technique that attempts to capitalize on situations where the cost of storing the results of a computation and later finding them again is less expensive than performing the computation again. The classic computer science example of computational reuse is the computation of Fibonacci numbers.

The nth Fibonacci number is defined as F(n) = F(n-1) + F(n-2) where F(0) = 0 and F(1) = 1. A small perl function to calculate the nth Fibonacci number would be as follows:

sub fib {     my $o = shift;     die "Only natural number indexes." if($o < 0 || $o != int($o));     return $o if ($o < 2);     return fib($o-2) + fib($o-1); }

Running fib(30) on my laptop takes approximately 6.1 seconds. Why is this simple calculation so slow? When I calculate fib(28) and then proceed to calculate fib(29), I end up calculating fib(28) again. I calculate fib(28) twice, fib(27) tHRee times, fib(26) five times, fib(25) eight times...and fib(2) 514229 times! (Do you see the sequence there? Math is fascinating.)

Let's make a simple attempt at storing the result of each calculation so that every subsequent time we are asked for fib(n) we look up the answer instead of calculating it:

my %fib_cache; sub fib_cr {     my $o = shift;     die "Only natural number indexes." if($o < 0 || $o != int($o));     return $o if($o < 2);     # Here we return a cached result if one exists.     return $fib_cache{$o} if(exists($fib_cache{$o}));     # Here we cache the result and return it.     return $fib_cache{$o} = fib_cr($o-2) + $fib_cr($o-1); }

Running fib_cr(30) on my laptop takes less than 0.01 secondsroughly a 61000% speed up. This painfully obvious example demonstrates the value of computational reuse. This particular example's simplicity and obvious benefit are the reasons for its place as the "poster child" of computational reuse.

Real Application

Although the Fibonacci example may seem contrived, it illustrates a simple concept: It is often cheaper to store a previous calculation and retrieve it later than it is to perform the calculation again from scratch. This is true in all arenas of computer science, engineering, and life.

Integrated caches are most similar to computational reuse because the application itself attempts to remember (cache) any acquisition of data (or transformation of data) considered by the developer to be frequently used and sufficiently expensive.

Computational reuse itself can be applied directly to core application code, but when the problem is extrapolated to an architecture with multiple interdependent components, the technique evolves from computational reuse to integrated caching; and with that evolution comes complexity.

It is important to digress for a moment and understand that the gain from computation reuse is a performance gain and does not imply scalability. Computation reuse is an edict of high-performance computer programming, and although high-performance does not ensure scalability, low performance is a death warrant.

The gain from integrated caching is a step toward horizontal scalability (although not always a good step to take, as you will see). Integrated caching allows one component in an architecture to use another component less. A good example is the use of a database as the backing store for content on a website. You could query the database every time a specific page was requested, or you could cache the results and future requests would not require a database query.

The concept of an application code cache is to allow the application to store the result of an expensive operation and determine an answer to that query more quickly by referencing local data than by requerying the authoritative source (see Figure 7.3).

Figure 7.3. An integrated look-aside cache

Although computational reuse is a good approach in computer programming, it makes the assumption that the result of a particular question will not change. The 19th Fibonacci number isn't going to change when we aren't looking. However, if we were to cache a user's preferences or another result of an expensive database query, the answer is likely to change when we aren't looking.

This leaves us with the challenge of architecting a cache invalidation infrastructure or "fudging it" by setting cache timeoutsbasically accepting that an answer is good for some magical amount of time, and if it changes in that time, we will be stuck with the old incorrect answer.

A beautiful example of integrated caching is that which is inside recursive domain name system (DNS) servers. When I perform a lookup on www.example.com, I issue a simple question to my local recursive DNS server and expect a simple response. My DNS server must query root servers to determine who is responsible for the example.com domain and then query the responsible partyand this is the simplest case. If my DNS server did this every time I needed www.example.com, the Internet would be a wholly unsatisfying (slow) place. Instead, my DNS server caches the answer for some amount of time so that subsequent identical requests can be answered on the spot.

Why can't we use the same technique in our web applications? The technique clearly applies, but DNS has the advantage (called good engineering) that every piece of data is distributed with a time-to-live (TTL). Unlike web applications, every DNS system has insight into how long it is allowed to cache the result before it must throw away that answer as stale and then requery.

In web applications, the queries performed typically do not have an explicit TTL and the only implicit TTL for data is the life of the request itself. This means that the data changes, and we don't know how fast it will change. These circumstances make caching beyond the life of a single web request difficult to maintain and debug, and often lead to compromises that can be avoided by using data caches or write-thru caches.

Although integrated caches in their most simple form are an awkward fit for large-scale web architectures, we can use aspects of this technology combined with write-thru caches and distributed caches to realize true horizontal scalability.

Data Cache

The data cache is a simple concept. It is effectively computational reuse on a lower level than the integrated cache. With a data cache, we expect our core services (those on which the application relies) to employ computational reuse and extensive caching.

As we defined web caching, this doesn't exactly fit. The idea here is to have a database or other core service that efficiently uses a cache so that the web application can transparently benefit. The only reason that data caches are relevant is that it is possible to place a database replica on each web server and then we can more legitimately call data caching a form of web caching.

The intrinsic problem with integrated caching is that the cache is distributed and uncoordinated. If one node running the application performs an SQL query against a database and stores the answer, it has no sound way of determining whether the underlying data that formed the response has changed. As such, it has no good way of knowing whether it will receive the same answer if it issues the same question. However, one system does know whether the data has changedthe database.

We will just accept the fact that the vast majority of Internet architectures use a relation database management system (RDBMS) to store their data. All queries for that data are posed to the RDBMS, and it is responsible for answering them.

MySQL, for example, employs a data cache calling it a query cache. Essentially, it caches the complete answers to questions that are asked of it. Along with those answers it stores a reference to the data on which their accuracy depends. If that data changes, the answer is efficiently invalidated.

This is a simpler solution than an integrated cache on the web application level because a single product is in complete control of both the caching infrastructure and the authoritative data.

What does this buy you? The vast majority of data that powers Internet sites has a high read-to-write ratio, meaning that questions are asked of the database far more often than data is placed in the database. Although databases are fast and robust, they still must spend time answering queries, and that time is precious.

Write-Thru and Write-Back Caches

Write-back caches are classic computer engineering. You will find this technology in every computer processor, code optimizer, and disk drive. Any time that it is expensive to save a piece of data that is also commonly accessed, you will find a write-back cache. A write-back cache means that all modifications of data are performed on the cache. When the cache is full and entries are removed, they are then written to the backing store if needed. This simple design is depicted in Figure 7.4.

Figure 7.4. Write-Thru and Write-Back caches

If you stop reading here and get a cup of coffee, you should be able to come up with a few catastrophic problems with the write-back architecture.

A cache on most systems is a faster storage medium than the authoritative source for which it caches. In a web proxy cache, it is the cheap disk drive on the cheap box serving static contentno data is modified on those drives, so a failure cannot incur data loss. In a disk drive, volatile memory (DRAM) is faster than the physical disk drives. If data is written to the volatile cache on a drive and that cache operates in write-back mode, what happens when the power fails? Data does not make it back to disk, and you have inconsistency, corruption, and lost data.

Some IDE drives enable write-back cache that cannot be turned off! External RAID arrays often use write-back caches and have expensive battery units capable of keeping all the drives up and running long enough to completely journal its cache back to permanent storage.

You may be wondering how in the world you would incorporate a write-back cache paradigm in a distributed/cluster web application architecture? Well, there is certainly no good concrete example I can find. The concept does not adapt well to web systems; but, let's jump to write-back's brother named write-thru.

Write-thru caches operate in an interesting fashion. They exploit the fact that reads and writes in many systems are temporally relative. What does this mean? Reads often occur on data most recently written. A concrete example is the addition of a news article to a database. New news is read more often than old news. The news most recently added to the system will be the news most frequently requested from the system.

A write-thru cache writes the data to the cache and to the backing store knowing that a read request (or many read requests) are likely to follow for that data.

Distributed Cache

Distributed caching is a loosely defined term. Any time more than a single machine is being used to cache information you can argue that the solution is distributed. Herein, we refine our view of distributed caches to those that are collaborative. What does this really mean?

In a distributed cache, when a particular node computes and stores the results in its cache, other nodes will benefit. In effect, it is no longer "its" cache, but rather "the" cache. All the nodes participating in the cluster share a common cache where information is shared.

This sounds a lot like putting the cache in a centralized database. However, there is one distinct difference. Distributed caches are distributed, which means that the data is not stored in any single place. Typically, all data is stored at every node. Chapter 8, "Distributed Databases Are Easy, Just Read the Fine Print," touches on more complicated data replication techniques that will have information placed on some plural subset of machines within a cluster. In both situations, the data is available on the nodes that use the cache, and any single system failure will not pose a threat to availability.

Distributed systems are pretty tricky in general. What seem like simple problems at first, turn out to be complicated to solve in a general case. Let's look at a simple distributed cache that operates like a dictionary of keys and values both of which are simple strings. Caching techniques apply directly applies to countless technologies, but we will look at stock quotes and SSL session caching as two acutely different examples.

Simple Distributed Information Caches

For simplicity's sake, we will assume that we have real-time stock quotes feeding in from ACME Tickers. One of the services on our news site is real-time stock quotes on every page if the user opts for that in her personal preferences. We want all the information on these stock quotes on every web server so as not to stress or rely on any single point for the information. If we were to use a centralized database and it were to go down, all stock quotes would be unavailable. However, if we were to keep a cached copy on every box, and the main distribution point were to become unavailable, information would be stale but available until the distribution point was once again operational. This is a spectacular example of why people think distributed caches are easy.

In this example, changes are effected at a single point in the architecture and propagated from there. It is not the generalized case of distributed caching. It is a specific, distributed cache that operates on a one-to-many replication paradigm shown in Figure 7.5.

Figure 7.5. One-to-many replication for stock quotes

This sort of information dissemination can be accomplished via asynchronous database replication (discussed in Chapter 8), as well as a simple, reliable multicast approach using a technology such as Spread.

Not-So-Simple Distributed Information Caches

Now look at an SSL session cache. When an SSL connection is established, a session key and a corresponding session context are established. When a browser has a session ID and initiates another request, if it is found in the server's session cache, an expensive asymmetric key negotiation can be avoided.

In a clustered environment where we want to capitalize on all of our resources as efficiently as possible, we want to be able to allocate a request to the machine that has the most sufficient resources to satisfy it. In a typical SSL environment, subsequent requests are often sent back to the same server to avoid expensive cryptographic computationseven if that server is overloaded.

Alternatively, the SSL session information can be placed in a distributed cache (shown in Figure 7.6) so that subsequent requests can be serviced by any machine in the cluster without unnecessary overhead induced by an "absent" SSL session.

Figure 7.6. Commutative replication of SSL sessions

This problem is distinctly different from the stock quote problem. In this system, we must publish from all participating nodes and as such, the issues of partitions and merges of the network arise.

Ben Laurie's Splash! SSL session caching system based on Spread tackles some of these issues, and Spread Concept's Replicated Hash Table (RHT) product, although it doesn't integrate directly with SSL session caching systems, tackles all the commutative replication requirements for such a distributed caching architecture.