Recipe 4.5. Creating Lists of Lists Without Sharing ReferencesCredit: David Ascher ProblemYou want to create a multidimensional list but want to avoid implicit reference sharing. SolutionTo build a list and avoid implicit reference sharing, use a list comprehension. For example, to build a 5 x 10 array of zeros: multilist = [[0 for col in range(5)] for row in range(10)] DiscussionWhen a newcomer to Python is first shown that multiplying a list by an integer repeats that list that many times, the newcomer often gets quite excited about it, since it is such an elegant notation. For example: >>> alist = [0] * 5 is clearly an excellent way to get an array of 5 zeros. The problem is that one-dimensional tasks often grow a second dimension, so there is a natural progression to: >>> multi = [[0] * 5] * 3 >>> print multi [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]] This appears to work, but the same newcomer is then often puzzled by bugs, which typically can be boiled down to a snippet such as: >>> multi[0][0] = 'oops!' >>> print multi [['oops!', 0, 0, 0, 0], ['oops!', 0, 0, 0, 0], ['oops!', 0, 0, 0, 0]] This issue confuses most programmers at least once, if not a few times (see the FAQ entry at http://www.python.org/doc/FAQ.html#4.50). To understand the issue, it helps to decompose the creation of the multidimensional list into two steps: >>> row = [0] * 5 # a list with five references to 0 >>> multi = [row] * 3 # a list with three references to the row object This decomposed snippet produces a multi that's identical to that given by the more concise snippet [[0]*5]*3 shown earlier, and it has exactly the same problem: if you now assign a value to multi[0][0], you have also changed the value of multi[1][0] and that of multi[2][0] . . . , and, indeed, you have changed the value of row[0], too! The comments are key to understanding the source of the confusion. Multiplying a sequence by a number creates a new sequence with the specified number of new references to the original contents. In the case of the creation of row, it doesn't matter whether or not references are being duplicated, since the referent (the object being referred to) is a number, and therefore immutable. In other words, there is no practical difference between an object and a reference to an object if that object is immutable. In the second line, however, we create a new list containing three references to the contents of the [row] list, which holds a single reference to a list. Thus, multi contains three references to a single list object. So, when the first element of the first element of multi is changed, you are actually modifying the first element of the shared list. Hence the surprise. List comprehensions, as shown in the "Solution", avoid the problem. With list comprehensions, no sharing of references occursyou have a truly nested computation. If you have followed the discussion thoroughly, it may have occurred to you that we don't really need the inner list comprehension, only the outer one. In other words, couldn't we get just the same effect with: multilist = [[0]*5 for row in range(10)] The answer is that, yes, we could, and in fact using list multiplication for the innermost axis and list comprehension for all outer ones is fasterover twice as fast in this example. So why don't I recommend this latest solution? Answer: the speed improvement for this example is from 57 down to 24 microseconds in Python 2.3, from 49 to 21 in Python 2.4, on a typical PC of several years ago (AMD Athlon 1.2 GHz CPU, running Linux). Shaving a few tens of microseconds from a list-creation operation makes no real difference to your application's performance: and you should optimize your code, if at all, only where it matters, where it makes a substantial and important difference to the performance of your application as a whole. Therefore, I prefer the code shown in the recipe's Solution, simply because using the same construct for both the inner and the outer list creations makes it more conceptually symmetrical and easier to read! See AlsoDocumentation for the range built-in function in the Library Reference and Python in a Nutshell. |