Recipe5.14.Enriching the Dictionary Type with Ratings Functionality

Recipe 5.14. Enriching the Dictionary Type with Ratings Functionality

Credit: Dmitry Vasiliev, Alex Martelli

Problem

You want to use a dictionary to store the mapping between some keys and a current score value for each key. You frequently need to access the keys and scores in natural order (meaning, in order of ascending scores) and to check on a "key"'s current ranking in that order, so that using just a dict isn't quite enough.

Solution

We can subclass dict and add or override methods as needed. By using multiple inheritance, placing base UserDict.DictMixin before base dict and carefully arranging our various delegations and "over"rides, we can achieve a good balance between getting good performance and avoiding the need to write "boilerplate" code.

By enriching our class with many examples in its docstring, we can use the standard library's module doctest to give us unit-testing functionality, as well as ensuring the accuracy of all the examples we write in the docstring:

#!/usr/bin/env python ''' An enriched dictionary that holds a mapping from keys to scores ''' from bisect import bisect_left, insort_left import UserDict class Ratings(UserDict.DictMixin, dict):     """ class Ratings is mostly like a dictionary, with extra features: the         value corresponding to each key is the 'score' for that key, and all         keys are ranked in terms their scores.  Values must be comparable; keys,         as well as being hashable, must be comparable if any two keys may ever         have the same corresponding value (i.e., may be "tied" on score).         All mapping-like behavior is just as you would expect, such as:         >>> r = Ratings({"bob": 30, "john": 30})         >>> len(r)         2         >>> r.has_key("paul"), "paul" in r         (False, False)         >>> r["john"] = 20         >>> r.update({"paul": 20, "tom": 10})         >>> len(r)         4         >>> r.has_key("paul"), "paul" in r         (True, True)         >>> [r[key] for key in ["bob", "paul", "john", "tom"]]         [30, 20, 20, 10]         >>> r.get("nobody"), r.get("nobody", 0)         (None, 0)         In addition to the mapping interface, we offer rating-specific         methods.  r.rating(key) returns the ranking of a "key" in the         ratings, with a ranking of 0 meaning the lowest score (when two         keys have equal scores, the keys themselves are compared, to         "break the tie", and the lesser key gets a lower ranking):         >>> [r.rating(key) for key in ["bob", "paul", "john", "tom"]]         [3, 2, 1, 0]         getValueByRating(ranking) and getKeyByRating(ranking) return the         score and key, respectively, for a given ranking index:         >>> [r.getValueByRating(rating) for rating in range(4)]         [10, 20, 20, 30]         >>> [r.getKeyByRating(rating) for rating in range(4)]         ['tom', 'john', 'paul', 'bob']         An important feature is that the keys( ) method returns keys in         ascending order of ranking, and all other related methods return         lists or iterators fully consistent with this ordering:         >>> r.keys( )         ['tom', 'john', 'paul', 'bob']         >>> [key for key in r]         ['tom', 'john', 'paul', 'bob']         >>> [key for key in r.iterkeys( )]         ['tom', 'john', 'paul', 'bob']         >>> r.values( )         [10, 20, 20, 30]         >>> [value for value in r.itervalues( )]         [10, 20, 20, 30]         >>> r.items( )         [('tom', 10), ('john', 20), ('paul', 20), ('bob', 30)]         >>> [item for item in r.iteritems( )]         [('tom', 10), ('john', 20), ('paul', 20), ('bob', 30)]         An instance can be modified (adding, changing and deleting         key-score correspondences), and every method of that instance         reflects the instance's current state at all times:         >>> r["tom"] = 100         >>> r.items( )         [('john', 20), ('paul', 20), ('bob', 30), ('tom', 100)]         >>> del r["paul"]         >>> r.items( )         [('john', 20), ('bob', 30), ('tom', 100)]         >>> r["paul"] = 25         >>> r.items( )         [('john', 20), ('paul', 25), ('bob', 30), ('tom', 100)]         >>> r.clear( )         >>> r.items( )         [  ]         """     ''' the implementation carefully mixes inheritance and delegation         to achieve reasonable performance while minimizing boilerplate,         and, of course, to ensure semantic correctness as above.  All         mappings' methods not implemented below get inherited, mostly         from DictMixin, but, crucially!, _ _getitem_ _ from dict. '''     def _ _init_ _(self, *args, **kwds):         ''' This class gets instantiated just like 'dict' '''         dict._ _init_ _(self, *args, **kwds)         # self._rating is the crucial auxiliary data structure: a list         # of all (value, key) pairs, kept in "natural"ly-sorted order         self._rating = [ (v, k) for k, v in dict.iteritems(self) ]         self._rating.sort( )     def copy(self):         ''' Provide an identical but independent copy '''         return Ratings(self)     def _ _setitem_ _(self, k, v):         ''' besides delegating to dict, we maintain self._rating '''         if k in self:             del self._rating[self.rating(k)]         dict._ _setitem_ _(self, k, v)         insort_left(self._rating, (v, k))     def _ _delitem_ _(self, k):         ''' besides delegating to dict, we maintain self._rating '''         del self._rating[self.rating(k)]         dict._ _delitem_ _(self, k)     ''' delegate some methods to dict explicitly to avoid getting         DictMixin's slower (though correct) implementations instead '''     _ _len_ _ = dict._ _len_ _     _ _contains_ _ = dict._ _contains_ _     has_key = _ _contains_ _     ''' the key semantic connection between self._rating and the order         of self.keys( ) -- DictMixin gives us all other methods 'for         free', although we could implement them directly for slightly         better performance. '''     def _ _iter_ _(self):         for v, k in self._rating:             yield k     iterkeys = _ _iter_ _     def keys(self):         return list(self)     ''' the three ratings-related methods '''     def rating(self, key):         item = self[key], key         i = bisect_left(self._rating, item)         if item == self._rating[i]:             return i         raise LookupError, "item not found in rating"     def getValueByRating(self, rating):         return self._rating[rating][0]     def getKeyByRating(self, rating):         return self._rating[rating][1] def _test( ):     ''' we use doctest to test this module, which must be named         rating.py, by validating all the examples in docstrings. '''     import doctest, rating     doctest.testmod(rating) if _ _name_ _ == "_ _main_ _":     _test( )

Discussion

In many ways, a dictionary is the natural data structure for storing a correspondence between keys (e.g., names of contestants in a competition) and the current "score" of each key (e.g., the number of points a contestant has scored so far, or the highest bid made by each contestant at an auction, etc.). If we use a dictionary for such purposes, we will probably want to access it often in natural orderthe order in which the keys' scores are ascendingand we'll also want fast access to the rankings (ratings) implied by the current "score"s (e.g., the contestant currently in third place, the score of the contestant who is in second place, etc.).

To achieve these purposes, this recipe subclasses dict to add the needed functionality that is completely missing from dict (methods rating, getValueByRating, getKeyByRating), and, more subtly and crucially, to modify method keys and all other related methods so that they return lists or iterators with the required order (i.e., the order in which scores are ascending; if we have to break ties when two keys have the same score, we implicitly compare the keys themselves). Most of the detailed documentation is in the docstring of the class itselfa crucial issue because by keeping the documentation and examples there, we can use module doctest from the Python Standard Library to provide unit-testing functionality, as well as ensuring that our examples are correct.

The most interesting aspect of the implementation is that it takes good care to minimize boilerplate (meaning repetitious and boring code, and therefore code where bugs are most likely to hide) without seriously impairing performance. class Ratings multiply inherits from dict and DictMixin, with the latter placed first in the list of bases, so that all methods come from the mixin, if it provides them, unless explicitly overridden in the class.

Raymond Hettinger's DictMixin class was originally posted as a recipe to the online version of the Python Cookbook and later became part of Python 2.3's standard library. DictMixin provides all the methods of a mapping except _ _init_ _, copy, and the four fundamental methods: _ _getitem_ _, _ _setitem_ _, _ _delitem_ _, and, last but not least, keys. If you are coding a mapping class and want to ensure that your class supports all of the many methods that a full mapping provides to application code, you should subclass DictMixin and supply at least the fundamental methods (depending on your class' semanticse.g., if your class has immutable instances, you need not supply the mutator methods _ _setitem_ _ and _ _delitem_ _). You may optionally implement other methods for performance purposes, overriding the implementation that DictMixin provides. The whole DictMixin architecture can be seen as an excellent example of the classic Template Method Design Pattern, applied pervasively in a useful mix-in variant.

In this recipe's class, we inherit _ _getitem_ _ from our other base (namely, the built-in type dict), and we also delegate explicitly to dict everything we can for performance reasons. We have to code the elementary mutator methods (_ _setitem_ _ and _ _delitem_ _) because, in addition to delegating to our base class dict, we need to maintain our auxiliary data structure self._ratinga list of (score, key) pairs that we keep in sorted order with the help of standard library module bisect. We implement keys ourselves (and while we're at it, we implement _ _iter_ _ i.e., iterkeys as well, since clearly keys is easiest to implement by using _ _iter_ _) to exploit self._rating and return the keys in the order we need. Finally, we add the obvious implementations for _ _init_ _ and copy, in addition to the three, ratings-specific methods that we supply.

The result is quite an interesting example of balancing concision, clarity, and well-advised reuse of the enormous amount of functionality that the standard Python library places at our disposal. If you use this module in your applications, profiling may reveal that a method that this recipe's class inherits from DictMixin has somewhat unsatisfactory performanceafter all, the implementations in DictMixin are, of necessity, somewhat generic. If this is the case, by all means add a direct implementation of whatever further methods you need to achieve maximum performance! For example, if your application performs a lot of looping on the result of calling r.iteritems( ) for some instance r of class Ratings, you may get slightly better performance by adding to the body of the class the direct implementation of the method:

    def iteritems(self):         for v, k in self._rating:             yield k, v

Recipe5.14.Enriching the Dictionary Type with Ratings Functionality