Recipe4.14.Inverting a Dictionary


Recipe 4.14. Inverting a Dictionary

Credit: Joel Lawhead, Ian Bollinger, Raymond Hettinger

Problem

An existing dict maps keys to unique values, and you want to build the inverse dict, mapping each value to its key.

Solution

You can write a function that passes a list comprehension as dict's argument to build the new requested dictionary:

def invert_dict(d):     return dict([ (v, k) for k, v in d.iteritems( ) ])

For large dictionaries, though, it's faster to use the generator izip from the itertools module in the Python Standard Library:

from itertools import izip def invert_dict_fast(d):     return dict(izip(d.itervalues( ), d.iterkeys( )))

Discussion

If the values in dict d are not unique, then d cannot truly be inverted, meaning that there exists no dict id such that for any valid key k, id[d[k]]==k. However, the functions shown in this recipe still construct, even in such cases, a "pseudo-inverse" dict pd such that, for any v that is a value in d, d[pd[v]]==v. Given the original dict d and the dict x returned by either of the functions shown in this recipe, you can easily check whether x is the true inverse of d or just d's pseudo-inverse: x is the true inverse of d if and only if len(x)==len(d). That's because, if two different keys have the same value, then, in the result of either of the functions in this recipe, one of the two keys will simply go "poof" into the ether, thus leaving the resulting pseudo-inverse dict shorter than the dict you started with. In any case, quite obviously, the functions shown in this recipe can work only if all values in d are hashable (meaning that they are all usable as keys into a dict): otherwise, the functions raise a TypeError exception.

When we program in Python, we normally "disregard minor optimizations," as Donald Knuth suggested over thirty years ago: we place a premium on clarity and correctness and care relatively little about speed. However, it can't hurt to know about faster possibilities: when we decide to code in a certain way because it's simpler or clearer than another, it's best if we are taking the decision deliberately, not out of ignorance.

Here, function invert_dict in this recipe's Solution might perhaps be considered clearer because it shows exactly what it's doing. Take the pairs k, v of key and value that method iteritems yields, swap them into (value, key) order, and feed the resulting list as the argument of dict, so that dict builds a dictionary where each value v is a key and the corresponding key k becomes that key's valuejust the inverse dict that our problem requires.

However, function invert_dict_fast, also in this recipe's Solution, isn't really any more complicated: it just operates more abstractly, by getting all keys and all values as two separate iterators and zipping them up (into an iterator whose items are the needed, swapped (value, key) pairs) via a call to generator izip, supplied by the itertools module of the Python Standard Library. If you get used to such higher abstraction levels, they will soon come to feel simpler than lower-level code!

Thanks to the higher level of abstraction, and to never materializing the whole list of pairs (but rather operating via generators and iterators that yield only one item at a time), function invert_dict_fast can be substantially faster than function invert_dict. For example, on my machine, to invert a 10,000-item dictionary, invert_dict takes about 63 milliseconds, but invert_dict_fast manages the same task in just 20 milliseconds. A speed increase by a factor of three, in general, is not to be sneered at. Such performance gains, when you work on large amounts of data, are the norm, rather than the exception, for coding at higher abstraction levels. This is particularly true when you can use itertools rather than loops or list comprehensions, because you don't need to materialize some large list in memory at one time. Performance gain is an extra incentive for getting familiar with working at higher abstraction levels, a familiarity that has conceptual and productivity pluses, too.

See Also

Documentation on mapping types and itertools in the Library Reference and Python in a Nutshell; Chapter 19.



Python Cookbook
Python Cookbook
ISBN: 0596007973
EAN: 2147483647
Year: 2004
Pages: 420

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net