Recipe4.5.Creating Lists of Lists Without Sharing References


Recipe 4.5. Creating Lists of Lists Without Sharing References

Credit: David Ascher

Problem

You want to create a multidimensional list but want to avoid implicit reference sharing.

Solution

To build a list and avoid implicit reference sharing, use a list comprehension. For example, to build a 5 x 10 array of zeros:

multilist = [[0 for col in range(5)] for row in range(10)]

Discussion

When a newcomer to Python is first shown that multiplying a list by an integer repeats that list that many times, the newcomer often gets quite excited about it, since it is such an elegant notation. For example:

>>> alist = [0] * 5

is clearly an excellent way to get an array of 5 zeros.

The problem is that one-dimensional tasks often grow a second dimension, so there is a natural progression to:

>>> multi = [[0] * 5] * 3 >>> print multi [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]

This appears to work, but the same newcomer is then often puzzled by bugs, which typically can be boiled down to a snippet such as:

>>> multi[0][0] = 'oops!' >>> print multi [['oops!', 0, 0, 0, 0], ['oops!', 0, 0, 0, 0], ['oops!', 0, 0, 0, 0]]

This issue confuses most programmers at least once, if not a few times (see the FAQ entry at http://www.python.org/doc/FAQ.html#4.50). To understand the issue, it helps to decompose the creation of the multidimensional list into two steps:

>>> row = [0] * 5          # a list with five references to 0 >>> multi = [row] * 3      # a list with three references to the row object

This decomposed snippet produces a multi that's identical to that given by the more concise snippet [[0]*5]*3 shown earlier, and it has exactly the same problem: if you now assign a value to multi[0][0], you have also changed the value of multi[1][0] and that of multi[2][0] . . . , and, indeed, you have changed the value of row[0], too!

The comments are key to understanding the source of the confusion. Multiplying a sequence by a number creates a new sequence with the specified number of new references to the original contents. In the case of the creation of row, it doesn't matter whether or not references are being duplicated, since the referent (the object being referred to) is a number, and therefore immutable. In other words, there is no practical difference between an object and a reference to an object if that object is immutable. In the second line, however, we create a new list containing three references to the contents of the [row] list, which holds a single reference to a list. Thus, multi contains three references to a single list object. So, when the first element of the first element of multi is changed, you are actually modifying the first element of the shared list. Hence the surprise.

List comprehensions, as shown in the "Solution", avoid the problem. With list comprehensions, no sharing of references occursyou have a truly nested computation. If you have followed the discussion thoroughly, it may have occurred to you that we don't really need the inner list comprehension, only the outer one. In other words, couldn't we get just the same effect with:

multilist = [[0]*5 for row in range(10)]

The answer is that, yes, we could, and in fact using list multiplication for the innermost axis and list comprehension for all outer ones is fasterover twice as fast in this example. So why don't I recommend this latest solution? Answer: the speed improvement for this example is from 57 down to 24 microseconds in Python 2.3, from 49 to 21 in Python 2.4, on a typical PC of several years ago (AMD Athlon 1.2 GHz CPU, running Linux). Shaving a few tens of microseconds from a list-creation operation makes no real difference to your application's performance: and you should optimize your code, if at all, only where it matters, where it makes a substantial and important difference to the performance of your application as a whole. Therefore, I prefer the code shown in the recipe's Solution, simply because using the same construct for both the inner and the outer list creations makes it more conceptually symmetrical and easier to read!

See Also

Documentation for the range built-in function in the Library Reference and Python in a Nutshell.



Python Cookbook
Python Cookbook
ISBN: 0596007973
EAN: 2147483647
Year: 2004
Pages: 420

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net