Recipe19.9.Looping Through the Cross-Product of Multiple Iterables


Recipe 19.9. Looping Through the Cross-Product of Multiple Iterables

Credit: Attila Vàsàrhelyi, Raymond Hettinger, Steven Taschuk

Problem

You need to loop through every item of multiple iterables cross-productwise, meaning that you first want to get the first item of the first iterable paired with all the others, next, the second item of the first iterable paired with all the others, and so forth.

Solution

Say you have two iterables (lists, in this case) such as:

a = ['a1', 'a2', 'a3'] b = ['b1', 'b2']

If you want to loop over their cross-product, the simplest approach is often just a couple of nested for loops:

for x in a:     for y in b:         print x, y

This snippet's output is six lines:

a1 b1 a1 b2 a2 b1 a2 b2 a3 b1 a3 b2

However, in many cases, you'd rather get all items in the "cross-product" of multiple iterables as a single, linear sequence, suitable for using in a single for or for passing onwards to other sequence manipulation functions, such as those supplied by itertools. For such needs, you may put the nested fors in a list comprehension:

for x, y in [(x,y) for x in a for y in b]:     print x, y

Discussion

A list comprehension lets you easily generate (as a single, linear sequence) all the pairings of several iterables (also known as the cross-product, product set, or Cartesian product of these iterables). However, the number of items in such a cross-product is the arithmetic product (multiplication) of the lengths of all the iterables involved, a number that may easily get quite large. A list comprehension, by definition, builds the entire list at once, which means that it may consume substantial amounts of memory. Also, you get to start iterating only when the whole cross-product list is entirely built.

Python 2.4 offers one obvious way to solve this problem: the newly introduced construct of generator expressions:

for x, y in ((x,y) for x in a for y in b): print x, y

A generator expression looks just like a list comprehension, except that it uses parentheses rather than brackets: it returns an iterator, suitable for looping on, rather than building and returning a list. Thus, a generator expression can save substantial amounts of memory, if you are iterating over a very long sequence. Also, you start executing the loop's body very soon, since each successive element gets generated iteratively, before each iteration of the loop's body. If your loop's body contains conditional breaks, so that execution terminates as soon as some conditions are met, using a generator expression rather than a list comprehension can mean a potentially substantial improvement in performance.

If you need to support Python 2.3, and yet you want to achieve the kind of advantages that generator expressions can afford over list comprehensions, the best approach may be to code your own generator. This is quite simple if you only need to deal with a known number of sequences, such as two:

def cross_two(a, b):     for x in a:         for y in b:             yield a, b

Dealing with an arbitrary number of sequences is a bit more complicated, but not terribly so, particularly if we use recursion to help:

def cross_loop(*sequences):     if sequences:         for x in sequences[0]:             for y in cross_loop(sequences[1:]):                 yield (x,) + y     else:         yield ( )

We can also do it without recursion. It's not hard if we're willing to build the entire result list in memory at once before returning it, just as a list comprehension would:

def cross_list(*sequences):     result = [[  ]]     for seq in sequences:         result = [sublist+[item] for sublist in result for item in seq]     return result

Alternatively, you can return map(tuple, result) if you need to ensure that each item of the sequence you return is a tuple, not a list.

Recursion-free iterative (incremental) generation of the "cross-product" sequence is also feasible, even though it's nowhere as simple as either the recursive or the nonincremental versions:

def cross(*sequences):     # visualize an odometer, with "wheels" displaying "digits"...:     wheels = map(iter, sequences)     digits = [it.next( ) for it in wheels]     while True:         yield tuple(digits)         for i in range(len(digits)-1, -1, -1):             try:                 digits[i] = wheels[i].next( )                 break             except StopIteration:                 wheels[i] = iter(sequences[i])                 digits[i] = wheels[i].next( )         else:             break

In Python 2.4, you might express the for statement more clearly as for i in reversed(range(len(digits))).

To repeat, it is important to remember that all of these solutions should be considered only if you do have the problemthat is, if and only if you do need to view all items in the "cross-product" of multiple iterables as a single, linear sequence. Many cases have no such requirement, and simply coding multiple nested for loops inline is quite acceptable, simpler, and more readable. In many cases, getting all items in the "cross-product" as a single sequence is preferable, so it's worth knowing how to do that. However, do keep in mind that simplicity is an important virtue, and do not lose sight of it in pursuit of a cool (but complicated) solution. All the cool tools, constructs, and library modules that Python offers exist strictly to serve you, to let you build and maintain your applications with minimal effort. Don't go out of your way to use the new shiny tools if you can solve your application's problems with less effort in simpler ways!

See Also

The Library Reference and Python in a Nutshell docs about built-ins iter, enumerate, map, and (Python 2.4 only) reversed; the Language Reference and Python in a Nutshell docs about list comprehensions and (Python 2.4 only) generator expressions.



Python Cookbook
Python Cookbook
ISBN: 0596007973
EAN: 2147483647
Year: 2004
Pages: 420

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net