Recipe18.5.Memoizing (Caching) the Return Values of Functions


Recipe 18.5. Memoizing (Caching) the Return Values of Functions

Credit: Paul Moore, Mitch Chapman, Hannu Kankaanp

Problem

You have a pure function that is often called with the same arguments (particularly a recursive function) and is slow to compute its results. You want to find a simple way to gain substantial improvement in performance.

Solution

The key idea behind memoizing is to store a function's results in a dictionary, keyed by the arguments that produce each result. This approach makes sense only for a pure function (i.e., one that yields the same result when called more than once with the same arguments). It's easy to memoize a function by hand. For example, using the recursive Fibonacci function, here is a manually memoized function:

fib_memo = {  } def fib(n):     if n < 2: return 1     if n not in fib_memo:         fib_memo[n] = fib(n-1) + fib(n-2)     return fib_memo[n]

Having to code the memoization inside each function to be memoized is repetitive and degrades the function's readability. An alternative is to encapsulate the memoization mechanics into a closure:

def memoize(fn):     memo = {  }     def memoizer(*param_tuple, **kwds_dict):         # can't memoize if there are any named arguments         if kwds_dict:             return fn(*param_tuple, **kwds_dict)         try:             # try using the memo dict, or else update it             try:                 return memo[param_tuple]             except KeyError:                 memo[param_tuple] = result = fn(*param_tuple)                 return result         except TypeError:             # some mutable arguments, bypass memoizing             return fn(*param_tuple)     # 2.4 only: memoizer._ _name_ _ = fn._ _name_ _     return memoizer

Using this memoize closure to memoize fib, the function definition becomes obvious, without caching boilerplate to obscure the algorithm. You must assign the memoize result to the same name, fib, as the recursive function; otherwise, the recursive calls bypass the memoizing:

def fib(n):     if n < 2: return 1     return fib(n-1) + fib(n-2) fib = memoize(fib)

This latest snippet shows that memoize is meant to be used exactly as a Python 2.4 decorator, so in Python 2.4, you could use decorator syntax (instead of the explicit call to memoize):

@memoize def fib(n):     if n < 2: return 1     return fib(n-1) + fib(n-2)

giving exactly the same semantics as the previous snippet.

Discussion

The memoize function is called with just one argument, a function f. It returns a closure memoizer that acts just like f but memoizes its arguments and result if the actual arguments to a call are hashable and positional. Calls with mutable or keyword arguments bypass the memoizing. If you're worried that such bypassing happens too often, making memoizing counterproductive, you should do a few dry runs that are representative of your intended production usage, with a closure that's modified to collect statistics. Then you can decide whether memoization is worth using for your specific application. Here's the modified memoize for this purpose:

def memoize(fn):     memo = {  }     def memoizer(*param_tuple, **kwds_dict):         if kwds_dict:             memoizer.namedargs += 1             return fn(*param_tuple, **kwds_dict)         try:             memoizer.cacheable += 1             try:                 return memo[param_tuple]             except KeyError:                 memoizer.misses += 1                 memo[param_tuple] = result = fn(*param_tuple)                 return result         except TypeError:             memoizer.cacheable -= 1             memoizer.noncacheable += 1             return fn(*param_tuple)     memoizer.namedargs = memoizer.cacheable = memoizer.noncacheable = 0     memoizer.misses = 0     return memoizer

Functions to be memoized must be pure (i.e., they must have no side effects and must always return the same value whenever they are called with the same arguments). More significantly, memoize returns a closure that does not memoize calls that receive mutable arguments, such as len on a list, nor functions that receive named parameters.

memoize cannot really check the semantics of the functions you wrap. The notions of same value and same arguments are vaguely defined in many cases, so take care. memoize does try to field occasional calls with keyword and mutable arguments (with an interesting mix of checking and try/except), but performance will suffer unless such cases are rare. This is why it's worth having around a version of memoize that keeps counts of the various possibilities, so that you can check their rarity.

See Also

Recipe 20.4 applies caching to class instances' attributes.



Python Cookbook
Python Cookbook
ISBN: 0596007973
EAN: 2147483647
Year: 2004
Pages: 420

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net