Recipe 19.2. Building a List from Any Iterable

Credit: Tom Good, Steve Alexander

Problem

You have an iterable object x (it might be a sequence or any other kind of object on which you can iterate, such as an iterator, a file, or a dict) and need a list object y, with the same items as x and in the same order.

Solution

When you know that iterable object x is bounded (so that, e.g., a loop for item in x would surely terminate), building the list object you require is trivial:

y = list(x)

However, when you know that x is unbounded, or when you are not sure, then you must ensure termination before you call list. In particular, if you want to make a list with no more than n items from x, then standard library module itertools' function islice does exactly what you need:

import itertools
y = list(itertools.islice(x, n))
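As a minimal sketch of the unbounded case, assuming a Python 3 interpreter and using itertools.count as a stand-in for an iterable that never ends (the values bound to x and n here are purely illustrative):

```python
import itertools

# itertools.count(1) yields 1, 2, 3, ... forever, so list(x) would never return.
x = itertools.count(1)
n = 5

# islice stops after at most n items, which makes the call to list safe.
y = list(itertools.islice(x, n))
print(y)  # [1, 2, 3, 4, 5]

# If the iterable holds fewer than n items, islice simply stops early.
short = list(itertools.islice(iter("ab"), n))
print(short)  # ['a', 'b']
```

Note that islice consumes items from x as it goes: after the call, x continues from 6 rather than starting over.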

Discussion

Python's generators, iterators, and sundry other iterables are a wondrous thing, as this entire chapter strives to point out. The powerful and generic concept of an iterable is a great way to represent all sorts of sequences, including unbounded ones, in ways that can potentially save you huge (and even infinite!) amounts of memory. With the standard library module itertools, generators you can code yourself, and, in Python 2.4, generator expressions, you can perform many manipulations on completely general iterables.

However, once in a while, you need to build a good old-fashioned full-fledged list object from such a generic iterable. For example, building a list is the simplest way to sort or reverse the items in the iterable, and lists have many other useful methods you may want to apply. As long as you know for sure that the iterable is bounded (i.e., has a finite number of items), just call list with the iterable as the argument, as the "Solution" points out. In particular, avoid the goofiness of misusing a list comprehension such as [i for i in x], when list(x) is faster, cleaner, and more readable!
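A small sketch of the bounded case, assuming Python 3.7 or later (where a dict iterates over its keys in insertion order); the generator squares is a hypothetical example, not part of the recipe:

```python
def squares(limit):
    """Hypothetical bounded generator, used only for illustration."""
    for i in range(limit):
        yield i * i

# A generator has no sort or reverse methods; a list built from it does.
y = list(squares(5))
y.reverse()  # reverses in place
print(y)  # [16, 9, 4, 1, 0]

# Iterating on a dict yields its keys, so list(d) is a list of keys.
keys = list({"b": 2, "a": 1})
keys.sort()  # sorts in place
print(keys)  # ['a', 'b']
```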

Calling list won't help if you're dealing with an unbounded iterable. The need to ensure that some iterable x is bounded also arises in many other contexts, besides that of calling list(x): all "accumulator" functions (sum(x), max(x), etc.) intrinsically need a bounded-iterable argument, and so does a statement such as for i in x (unless you have appropriate conditional breaks within the loop's body), a test such as if i in x, and so on.
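The same bounding technique applies to accumulators and membership tests. The following is a sketch under the assumption of Python 3, with itertools.count standing in for an arbitrary unbounded iterable:

```python
import itertools

# sum over an unbounded iterable would never finish, so bound it first.
evens = itertools.count(0, 2)  # 0, 2, 4, ... without end
total = sum(itertools.islice(evens, 5))  # 0 + 2 + 4 + 6 + 8
print(total)  # 20

# A membership test on an unbounded iterable terminates only when the
# item is found; bounding the iterable guarantees an answer either way.
found = 6 in itertools.islice(itertools.count(0, 2), 10)
missing = 7 in itertools.islice(itertools.count(0, 2), 10)
print(found, missing)  # True False
```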

If, as is frequently the case, all you want is to ensure that no more than n items of iterable x are taken, then itertools.islice, as shown in the "Solution", does just what you need. The islice function of the standard library itertools module offers many other possibilities (essentially equivalent to the various possibilities that slicing offers on sequences), but out of all of them, the simple "truncation" functionality (i.e., take no more than n items) is by far the most frequently used. The programming language Haskell, from which Python took many of the ideas underlying its list comprehensions and generator expressions, has a built-in take function to cater to this rather frequent need, and itertools.islice is most often used as an equivalent to Haskell's built-in take.
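To make the analogy concrete, here is a sketch of a take helper built on islice, together with the start, stop, and step form that mirrors sequence slicing. The helper name take is an illustrative choice echoing Haskell's function; it is not a built-in of Python, and the code assumes Python 3:

```python
import itertools

def take(n, iterable):
    """Return a list of at most the first n items of iterable."""
    return list(itertools.islice(iterable, n))

head = take(3, itertools.count())
print(head)  # [0, 1, 2]

# islice(iterable, start, stop, step) mirrors sequence[start:stop:step],
# except that it works on any iterable and rejects negative arguments.
sliced = list(itertools.islice(itertools.count(), 2, 10, 3))
same = list(range(20))[2:10:3]
print(sliced, same)  # [2, 5, 8] [2, 5, 8]
```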

In some cases, you cannot specify a maximum number of items, but you are able to specify a generic condition that you know will eventually be satisfied by some item of iterable x and can terminate the proceedings. itertools.takewhile lets you deal with such cases in a very general way, since it accepts the controlling predicate as a callable argument: it yields items for as long as the predicate returns a true value and stops at the first item for which the predicate is false. For example:

y = list(itertools.takewhile((11).__cmp__, x))

binds name y to a new list made up of the sequence of items in iterable x up to, but not including, the first one that equals 11. This works because the bound method returns 0, which is false, only when it compares 11 with an equal item. (The reason we need to code (11).__cmp__ with parentheses is a somewhat subtle one: if we wrote 11.__cmp__ without parentheses, Python would parse 11. as a floating-point literal, and the entire construct would be syntactically invalid. The parentheses are included to force the tokenization we mean, with 11 as an integer literal and the period indicating an access to its attribute, in this case, bound method __cmp__.)
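The special method used above exists only in Python 2; Python 3 removed it. As a hedged sketch of the same idea for Python 3, an explicit predicate can express the condition directly (the sample data and the squares generator are illustrative assumptions):

```python
import itertools

x = [1, 6, 3, 5, 7, 11, 2, 9]

# Keep taking items while they differ from 11; stop at the first 11.
y = list(itertools.takewhile(lambda item: item != 11, x))
print(y)  # [1, 6, 3, 5, 7]

# Any predicate works, which is what makes takewhile more general than
# an equality test: here, squares of 1, 2, 3, ... while they stay under 50.
squares = (i * i for i in itertools.count(1))
small = list(itertools.takewhile(lambda s: s < 50, squares))
print(small)  # [1, 4, 9, 16, 25, 36, 49]
```

Keep in mind that takewhile must read the first failing item in order to stop, so that item is consumed from the underlying iterator and is not available afterward.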

For the special and frequent case in which the terminating condition is the equality of an item to some given value, a useful alternative is to use the two-argument variant of the built-in function iter:

y = list(iter(iter(x).next, 11))

Here, the iter(x) call (which is innocuous if x is already an iterator) gives us an object on which we can surely access callable (bound method) next, which is necessary, because iter in its two-argument form requires a callable as its first argument. The second argument is the sentinel value, meaning the value that terminates the iteration as soon as an item equal to it appears. For example, if x were a sequence with items 1, 6, 3, 5, 7, 11, 2, 9, ..., y would now be the list [1, 6, 3, 5, 7]. (The sentinel value itself is excluded: from the beginning, included, to the end, excluded, is the normal Python convention for just about all loops, implicit or explicit.)
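In Python 3, the iterator method is spelled __next__ rather than next, so the snippet above needs a small adjustment there. The following sketch assumes Python 3 and uses illustrative sample data:

```python
x = [1, 6, 3, 5, 7, 11, 2, 9]

# Keep a reference to the iterator so we can inspect what is left over.
it = iter(x)

# The callable is invoked repeatedly until it returns the sentinel 11.
y = list(iter(it.__next__, 11))
print(y)  # [1, 6, 3, 5, 7]

# The sentinel itself was consumed, so the iterator resumes after it.
rest = list(it)
print(rest)  # [2, 9]

# If the sentinel never appears, iteration ends when the iterable does.
no_sentinel = list(iter(iter([1, 2, 3]).__next__, 11))
print(no_sentinel)  # [1, 2, 3]
```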

See Also

Library Reference documentation on built-ins list and iter, and module itertools.



Python Cookbook
ISBN: 0596007973
Year: 2004
Pages: 420
