Recipe 4.2. Constructing Lists with List Comprehensions

Credit: Luther Blissett

Problem

You want to construct a new list by operating on elements of an existing sequence (or other kind of iterable).

Solution

Say you want to create a new list by adding 23 to each item of some other list. A list comprehension expresses this idea directly:

thenewlist = [x + 23 for x in theoldlist]

Similarly, say you want the new list to comprise all items in the other list that are larger than 5. A list comprehension says exactly that:

thenewlist = [x for x in theoldlist if x > 5]

When you want to combine both ideas, you can perform selection with an if clause, and also use some expression, such as adding 23, on the selected items, in a single pass:

thenewlist = [x + 23 for x in theoldlist if x > 5]
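As a quick check of the three forms above, here is a small sketch that runs each one on made-up sample values (the contents of theoldlist, and the names added, selected, and both, are assumptions chosen only for illustration):

```python
# Hypothetical sample data; any iterable of numbers would do.
theoldlist = [1, 4, 6, 10, 3, 8]

# Transform every item.
added = [x + 23 for x in theoldlist]

# Keep only the items that pass the test.
selected = [x for x in theoldlist if x > 5]

# Select first, then transform just the selected items, in one pass.
both = [x + 23 for x in theoldlist if x > 5]

print(added)     # [24, 27, 29, 33, 26, 31]
print(selected)  # [6, 10, 8]
print(both)      # [29, 33, 31]
```

Note that the if clause tests the original item x, not the result of x + 23.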

Discussion

Elegance, clarity, and pragmatism are Python's core values. List comprehensions show how pragmatism can enhance both clarity and elegance. Indeed, list comprehensions are often the best approach even when, instinctively, you're thinking not of constructing a new list but rather of "altering an existing list". For example, if your task is to set all items greater than 100 to 100 in an existing list object L, the best solution is:

L[:] = [min(x, 100) for x in L]

Assigning to the "whole-list slice" L[:] alters the existing list object in place, rather than just rebinding the name L, as would be the case if you coded L = . . . instead.
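The difference between altering in place and rebinding shows up only when some other name refers to the same list object. The following sketch makes that visible; the names alias, M, and other, and the sample values, are assumptions introduced for the demonstration:

```python
L = [50, 150, 99, 300]
alias = L                        # a second name for the same list object

L[:] = [min(x, 100) for x in L]  # alters the existing object in place
print(alias)                     # [50, 100, 99, 100]
print(alias is L)                # True

M = [50, 150, 99, 300]
other = M                        # a second name for the same list object

M = [min(x, 100) for x in M]     # rebinds M to a brand-new list
print(other)                     # [50, 150, 99, 300]
print(other is M)                # False
```

With the slice assignment, every reference to the list sees the capped values; with plain assignment, only the rebound name does.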

You should not use a list comprehension when you simply want to perform a loop. When you want a loop, code a loop. For an example of looping over a list, see Recipe 4.4. See Chapter 19 for more information about iteration in Python.
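To illustrate the distinction, here is a minimal sketch in which the goal is a side effect (tallying words into a dictionary), not a new list. The names words, counts, and printed are hypothetical:

```python
words = ["spam", "eggs", "spam"]  # hypothetical sample data

# A loop run for its effect on counts: no new list is wanted.
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
print(counts)  # {'spam': 2, 'eggs': 1}

# The form to avoid: a comprehension used only for its side effects
# builds a list of None results that serves no purpose.
printed = [print(word) for word in words]
print(printed)  # [None, None, None]
```

The comprehension works, but it hides the intent and allocates a list nobody needs.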

It's also best not to use a list comprehension when another built-in does what you want even more directly and immediately. For example, to copy a list, use L1 = list(L), not:

L1 = [x for x in L]

Similarly, when the operation you want to perform on each item is to call a function on the item and use the function's result, use L1 = map(f, L) rather than L1 = [f(x) for x in L]. But in most cases, a list comprehension is just right.
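A short sketch of both alternatives follows, with the built-in abs standing in for the function f (an assumption for illustration). One caveat when running this on a modern interpreter: in Python 3, map returns a lazy iterator and not a list, so the call is wrapped in list to get the same result the text describes for Python 2:

```python
L = [3, -1, 4]  # hypothetical sample data

# Copying: list(L) makes a new list with the same items.
L1 = list(L)
print(L1 == L, L1 is L)  # True False

# Calling a function on each item: abs plays the role of f.
# In Python 3, list(...) is needed to materialize map's result.
L2 = list(map(abs, L))
print(L2)                         # [3, 1, 4]
print(L2 == [abs(x) for x in L])  # True
```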

In Python 2.4, you should consider using a generator expression, rather than a list comprehension, when the sequence may be long and you only need one item at a time. The syntax of generator expressions is just the same as for list comprehensions, except that generator expressions are surrounded by parentheses, ( and ), not brackets, [ and ]. For example, say that we only need the summation of the list computed in this recipe's Solution, not each item of the list. In Python 2.3, we would code:

total = sum([x + 23 for x in theoldlist if x > 5])

In Python 2.4, we can code more naturally, omitting the brackets (no need to add additional parentheses: the parentheses already needed to call the built-in sum suffice):

total = sum(x + 23 for x in theoldlist if x > 5)

Besides being a little bit cleaner, this method avoids materializing the list as a whole in memory and thus may be slightly faster when the list is extremely long.
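The sketch below checks that both forms give the same total, and then shows what computing items only on demand makes possible: a generator expression over an unbounded source. The sample input and the names total_list, total_gen, and squares are assumptions for illustration; count and islice come from the standard itertools module:

```python
from itertools import count, islice

theoldlist = range(10)  # hypothetical sample input

# Same total either way; the generator form never builds the list.
total_list = sum([x + 23 for x in theoldlist if x > 5])
total_gen = sum(x + 23 for x in theoldlist if x > 5)
print(total_list, total_gen)  # 122 122

# Items are produced one at a time, so an endless source is fine.
# The equivalent list comprehension would never finish.
squares = (n * n for n in count(1))
print(list(islice(squares, 5)))  # [1, 4, 9, 16, 25]
```

A generator expression can be consumed only once; if you need to traverse the results repeatedly, a list comprehension is still the right tool.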

Python borrowed list comprehensions from the functional language Haskell (http://www.haskell.org), changing the syntax to use keywords rather than punctuation. If you do know Haskell, though, take care! Haskell's list comprehensions, like the rest of Haskell, use lazy evaluation (also known as normal order or call by need): each item is computed only when it's needed. Python, like most other languages, uses (for list comprehensions as well as elsewhere) eager evaluation (also known as applicative order, call by value, or strict evaluation). That is, the entire list is computed when the list comprehension executes, and is kept in memory afterwards for as long as necessary. If you are translating into Python a Haskell program that uses list comprehensions to represent infinite sequences, or even just long sequences of which only one item at a time must be kept around, Python list comprehensions may not be suitable. Rather, look into Python 2.4's new generator expressions, whose semantics are closer to the spirit of Haskell's lazy evaluation: each item gets computed only when needed.

See Also

The Reference Manual section on list displays (another name for list comprehensions) and Python 2.4 generator expressions; Chapter 19; the Library Reference and Python in a Nutshell docs on the itertools module and on the built-in functions map, filter, and sum; Haskell is at http://www.haskell.org.
