Caching Expensive Results with MemCached

Credit: Michael Granger with Ben Bleything

Problem

You want to transparently cache the results of expensive operations, so that code that triggers the operations doesn't need to know how to use the cache. The memcached program, described in Recipe 16.16, lets you use other machines' RAM to store key-value pairs. The question is how to hide the use of this cache from the rest of your code.

Solution

If you have the luxury of designing your own implementation of the expensive operation, you can design in transparent caching from the beginning. The following code defines a get method that delegates to expensive_get only if it can't find an appropriate value in the cache. In this case, the expensive operation that gets cached is the (relatively inexpensive, actually) string reversal operation:

	require 'rubygems'
	require 'memcache'

	class DataLayer

	  def initialize(*cache_servers)
	    @cache = MemCache.new(*cache_servers)
	  end

	  # Return the cached value, computing and storing it on a miss.
	  def get(key)
	    @cache[key] ||= expensive_get(key)
	  end
	  alias_method :[], :get

	  protected
	  def expensive_get(key)
	    # …do expensive fetch of data for key
	    puts "Fetching expensive value for #{key}"
	    key.to_s.reverse
	  end
	end

Assuming you've got a memcached server running on your local machine, you can use this DataLayer as a way to cache the reversed versions of strings:

	layer = DataLayer.new('localhost:11211')

	3.times do
	  puts "Data for foo: #{layer['foo']}"
	end
	# Fetching expensive value for foo
	# Data for foo: oof
	# Data for foo: oof
	# Data for foo: oof

Discussion

That's the easy case. But you don't always get the opportunity to define a data layer from scratch. If you want to add memcaching to an existing data layer, you can create a caching strategy and add it to your existing classes as a mixin.

Here's a data layer, already written, that has no caching:

	class MyDataLayer
	  def get(key)
	    puts "Getting value for #{key} from data layer"
	    return key.to_s.reverse
	  end
	end

The data layer doesn't know about the cache, so all of its operations are expensive. In this instance, it's reversing a string every time you ask for it:

	layer = MyDataLayer.new

	"Value for foo: #{layer.get('foo')}"
	# Getting value for foo from data layer
	# => "Value for foo: oof"

	"Value for foo: #{layer.get('foo')}"
	# Getting value for foo from data layer
	# => "Value for foo: oof"

	"Value for foo: #{layer.get('foo')}"
	# Getting value for foo from data layer
	# => "Value for foo: oof"

Let's improve performance a little by defining a caching mixin. It'll wrap the get method so that it only runs the expensive code (the string reversal) if the answer isn't already in the cache:

	require 'memcache'

	module GetSetMemcaching
	  SERVER = 'localhost:11211'

	  def self::extended(mod)
	    mod.module_eval do
	      # Keep the original method reachable under a private-looking name.
	      alias_method :__uncached_get, :get
	      remove_method :get

	      def get(key)
	        puts "Cached get of #{key.inspect}"
	        get_cache()[key] ||= __uncached_get(key)
	      end

	      def get_cache
	        puts "Fetching cache object for #{SERVER}"
	        @cache ||= MemCache.new(SERVER)
	      end
	    end
	    super
	  end

	  # Including the module has the same effect as extending with it.
	  def self::included(mod)
	    mod.extend(self)
	    super
	  end
	end

Once we mix GetSetMemcaching into our data layer, the same code we ran before will magically start to use the cache:

	# Mix in caching to the pre-existing class
	MyDataLayer.extend(GetSetMemcaching)

	"Value for foo: #{layer.get('foo')}"
	# Cached get of "foo"
	# Fetching cache object for localhost:11211
	# Getting value for foo from data layer
	# => "Value for foo: oof"

	"Value for foo: #{layer.get('foo')}"
	# Cached get of "foo"
	# Fetching cache object for localhost:11211
	# => "Value for foo: oof"

	"Value for foo: #{layer.get('foo')}"
	# Cached get of "foo"
	# Fetching cache object for localhost:11211
	# => "Value for foo: oof"
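
On Ruby 2.0 and later, Module#prepend can give a similar effect without aliasing and removing the original method. The following is only a minimal sketch of that alternative, not the approach used above: PrependedCaching, SlowLayer, and calls are hypothetical names, and a plain Hash stands in for the MemCache object so that the example runs without a memcached server.

```ruby
# Sketch: wrap an existing #get using Module#prepend (Ruby 2.0+).
# Assumption: a Hash stands in for the MemCache object.
module PrependedCaching
  def get(key)
    # super invokes the class's original, uncached #get
    cache[key] ||= super
  end

  private

  def cache
    @cache ||= {}
  end
end

# Hypothetical pre-existing data layer with no caching of its own.
class SlowLayer
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def get(key)
    @calls += 1 # count how often the expensive path runs
    key.to_s.reverse
  end
end

# Reopen the class and put the caching module in front of it.
class SlowLayer
  prepend PrependedCaching
end

layer = SlowLayer.new
results = Array.new(3) { layer.get('foo') }
puts layer.calls # prints 1
```

Because the prepended module sits ahead of the class in the method lookup chain, the original get stays in place and is reached through super.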

The examples above are missing a couple of features you'd see in real life. Their API is very simple (just get methods), and they have no cache invalidation: items will stay in the cache forever, even if the underlying data changes.
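
One common way to close the invalidation gap is to delete a key whenever the underlying data changes. The sketch below is illustrative only: StoreBackedLayer and its put method are hypothetical, and a plain Hash stands in for the MemCache object so that the example runs without a server. With a real client you would typically call its delete method in the same place, and you could also store entries with an expiry time so that stale values age out on their own.

```ruby
# Sketch: drop a cached entry when the underlying data changes.
# Assumptions: StoreBackedLayer and #put are hypothetical names, and a
# Hash stands in for the MemCache object.
class StoreBackedLayer
  def initialize(cache = {})
    @cache = cache # any object that responds to [], []= and delete
    @store = {}    # pretend this is the slow backing store
  end

  def get(key)
    @cache[key] ||= expensive_get(key)
  end

  def put(key, value)
    @store[key] = value
    @cache.delete(key) # invalidate; the next get refetches
    value
  end

  protected

  def expensive_get(key)
    @store[key]
  end
end

layer = StoreBackedLayer.new
layer.put('foo', 'first')
layer.get('foo')           # caches "first"
layer.put('foo', 'second') # removes the stale "first" from the cache
puts layer.get('foo')      # prints second
```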

The same basic principles apply to more complex caches, though. When you need a value that's expensive to find or calculate, you first ask the cache for the value, keyed by its identifying feature. The cache might map a SQL query to its result set, a primary key to the corresponding database object, an array of compound keys to the corresponding database object, and so on. If the object is missing from the cache, you fetch it the expensive way, and put it in the cache.
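
Keys for such caches need some care: memcached limits keys to 250 bytes and does not allow whitespace or control characters in them, so a raw SQL string usually can't serve as a key directly. A minimal sketch of deriving keys with a digest follows. The cache_key helper and the 'sql' and 'user' prefixes are hypothetical naming choices, not part of any library.

```ruby
require 'digest/sha1'

# Sketch: build memcached-safe keys from arbitrary identifying data.
# cache_key is a hypothetical helper; the namespace prefix is only a convention.
def cache_key(namespace, *parts)
  # Join the parts with a NUL separator, then hash them so the key is
  # short and contains no whitespace, whatever the parts contain.
  "#{namespace}:#{Digest::SHA1.hexdigest(parts.join("\0"))}"
end

sql_key  = cache_key('sql', 'SELECT * FROM users WHERE id = 42')
pair_key = cache_key('user', 42, 'en') # compound key: id plus locale

puts sql_key
puts pair_key
```

The same inputs always produce the same key, so the key can be recomputed at lookup time instead of being stored anywhere.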
