Recipe4.17.Finding Unions and Intersections of Dictionaries


Recipe 4.17. Finding Unions and Intersections of Dictionaries

Credit: Tom Good, Andy McKay, Sami Hangaslammi, Robin Siebler

Problem

Given two dictionaries, you need to find the set of keys that are in both dictionaries (the intersection) or the set of keys that are in either dictionary (the union).

Solution

Sometimes, particularly in Python 2.3, you find yourself using dictionaries as concrete representations of sets. In such cases, you only care about the keys, not the corresponding values, and often you build the dictionaries by calls to dict.fromkeys, such as

a = dict.fromkeys(xrange(1000)) b = dict.fromkeys(xrange(500, 1500))

The fastest way to compute the dict that is the set-union is:

union = dict(a, **b)

The fastest concise way to compute the dict that is the set-intersection is:

inter = dict.fromkeys([x for x in a if x in b])

If the number of items in dictionaries a and b can be very different, then it can be important for speed considerations to have the shorter one in the for clause, and the longer one in the if clause, of this list comprehension. In such cases, it may be worth sacrificing some conciseness in favor of speed, by coding the intersection computation as follows:

if len(a) < len(b):     inter = dict.fromkeys([x for x in a if x not in b]) else:     inter = dict.fromkeys([x for x in b if x not in a])

Python also gives you types to represent sets directly (in standard library module sets, and, in Python 2.4, also as built-ins). Here is a snippet that you can use at the start of a module: the snippet ensures that name set is bound to the best available set type, so that throughout the module, you can then use the same code whether you're using Python 2.3 or 2.4:

try:     set except NameError:     from sets import Set as set

Having done this, you can now use type set to best effect, gaining clarity and conciseness, and (in Python 2.4) gaining a little speed, too:

a = set(xrange(1000)) b = set(xrange(500, 1500)) union = a | b inter = a & b

Discussion

In Python 2.3, even though the Python Standard Library module sets offers an elegant data type Set that directly represents a set (with hashable elements), it is still common to use a dict to represent a set, partly for historical reasons. Just in case you want to keep doing it, this recipe shows you how to compute unions and intersections of such sets in the fastest ways, which are not obvious. The code in this recipe, on my machine, takes about 260 microseconds for the union, about 690 for the intersection (with Python 2.3; with Python 2.4, 260 and 600,respectively), while alternatives based on loops or generator expressions are substantially slower.

However, it's best to use type set instead of representing sets by dictionaries. As the recipe shows, using set makes your code more direct and readable. If you dislike the or-operator (|) and the "and-operator" (&), you can equivalently use a.union(b) and a.intersection(b), respectively. Besides clarity, you also gain speed, particularly in Python 2.4: computing the union still takes about 260 microseconds, but computing the intersection takes only about 210. Even in Python 2.3, this approach is acceptably fast: computing the union takes about 270 microseconds, computing the intersection takes about 650not quite as fast as Python 2.4 but still quite comparable to what you can get if you represent sets by dictionaries. Last but not least, once you use type set (whether it is the Python 2.4 built-in, or class Set from the Python Standard Library module sets, the interface is the same), you gain a wealth of useful set operations. For example, the set of elements that are in either a or b but not both is a^b or, equivalently, a.symmetric_difference(b).

Even if you start with dicts for other reasons, consider using sets anyway if you need to perform set operations. Say, for example, that you have in phones a dictionary that maps names to phone numbers and in addresses one that maps names to addresses. The clearest and simplest way to print all names for which you know both address and phone number, and their associated data, is:

for name in set(phones) & set(addresses):     print name, phones[name], addresses[name]

This is much terser, and arguably clearer, than something like:

for name in phones:     if name in addresses:         print name, phones[name], addresses[name]

Another excellent alternative is:

for name in set(phones).intersection(addresses):     print name, phones[name], addresses[name]

If you use the named intersection method, rather than the & intersection operator, you don't need to turn both dicts into sets: just one of them. Then call intersection on the resulting set, and pass the other dict as the argument to the intersection method.

See Also

The Library Reference and Python in a Nutshell sections on mapping types, module sets, and Python 2.4's built-in set type.



Python Cookbook
Python Cookbook
ISBN: 0596007973
EAN: 2147483647
Year: 2004
Pages: 420

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net