Section 8.13. Generator Expressions


Generator expressions extend naturally from list comprehensions ("list comps"). When list comps came into being in Python 2.0, they revolutionized the language by giving users an extremely flexible and expressive way to designate the contents of a list on a single line. Ask any long-time Python user what new features have changed the way they program Python, and list comps should be near the top of the list.

Another significant feature that was added to Python in version 2.2 was the generator. A generator is a specialized function that allows you to return a value and "pause" the execution of that code and resume it at a later time. We will discuss generators in Chapter 11.

The one weakness of list comps is that all of the data have to be made available in order to create the entire list. This can have negative consequences if an iterator with a large dataset is involved. Generator expressions resolve this issue by combining the syntax and flexibility of list comps with the power of generators.

Introduced in Python 2.4, generator expressions are similar to list comprehensions in that the basic syntax is nearly identical; however, instead of building a list of values, they return a generator that "yields" each item after processing it. Because of this "lazy evaluation," generator expressions are much more memory efficient. Take a look at how similar they appear to list comps:

LIST COMPREHENSION:

[expr for iter_var in iterable if cond_expr]


GENERATOR EXPRESSION:

(expr for iter_var in iterable if cond_expr)
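
To make the difference concrete, here is a minimal sketch that runs both forms over the same data. The names squares_list and squares_gen are illustrative only, and the next() and sys.getsizeof() calls assume Python 2.6 or later (under Python 2.4 you would call the generator's own next() method instead).

```python
import sys

numbers = range(100000)

# List comprehension: every result is computed and stored right away.
squares_list = [n * n for n in numbers if n % 2 == 0]

# Generator expression: nothing is computed until a value is requested.
squares_gen = (n * n for n in numbers if n % 2 == 0)

first = next(squares_gen)      # 0, computed on demand
second = next(squares_gen)     # 4, the generator resumes where it paused

# The generator object stays small no matter how many items it will yield.
list_is_bigger = sys.getsizeof(squares_list) > sys.getsizeof(squares_gen)
```

The list holds all 50,000 results at once, while the generator holds only enough state to produce the next one.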


Generator expressions do not make list comps obsolete. They are just a more memory-friendly construct, and on top of that, are a great use case of generators. We now present a set of generator expression examples, including a long-winded one at the end showing you how Python code has changed over the years.

Disk File Example

In the previous section on list comprehensions, we took a look at finding the total number of non-whitespace characters in a text file. In the final snippet of code, we showed you how to perform that in one line of code using a list comprehension. If that file became unwieldy due to size, it would become fairly unfriendly memory-wise because we would have to put together a very long list of word lengths.

Instead of creating that large list, we can use a generator expression to perform the summing. It calculates the individual lengths one at a time and feeds them to the sum() function, which takes not just lists but any iterable, including generator expressions. We can then shorten our example above to be more efficient in both code and execution:

>>> sum(len(word) for line in data for word in line.split())
408


All we did was remove the enclosing list comprehension square brackets: Two bytes shorter and it saves memory ... very environmentally friendly!
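
As a rough, self-contained check of that claim, the sketch below runs both versions against a small in-memory stand-in for the file. The contents of data here are hypothetical sample lines, not the file used earlier in the chapter.

```python
# Hypothetical stand-in for the lines of a text file.
data = ["the quick brown fox\n", "jumps over\n", "\n", "the lazy dog\n"]

# List comprehension: builds the whole list of word lengths first.
total_from_list = sum([len(word) for line in data for word in line.split()])

# Generator expression: lengths are handed to sum() one at a time.
total_from_gen = sum(len(word) for line in data for word in line.split())
```

Both totals are the same; only the intermediate list disappears. When a generator expression is the sole argument of a function call, as it is here, the function's own parentheses are enough and no extra pair is needed.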

Cross-Product Pairs Example

Unlike list comprehensions, generator expressions are lazy, which is their main benefit. They are also great ways of dealing with other lists and generators, like rows and cols here:

rows = [1, 2, 3, 17]

def cols():        # example of simple generator
    yield 56
    yield 2
    yield 1


We do not need to create a new list. We can piece together things on the fly. Let us create a generator expression for rows and cols:

x_product_pairs = ((i, j) for i in rows for j in cols())


Now we can loop through x_product_pairs, and it will loop through rows and cols lazily:

>>> for pair in x_product_pairs:
...     print pair
...
(1, 56)
(1, 2)
(1, 1)
(2, 56)
(2, 2)
(2, 1)
(3, 56)
(3, 2)
(3, 1)
(17, 56)
(17, 2)
(17, 1)
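
One consequence of this laziness is worth a short sketch: a generator expression can be traversed only once. The names first_pass and second_pass below are illustrative, and the snippet repeats the rows and cols definitions so that it stands alone.

```python
rows = [1, 2, 3, 17]

def cols():        # same simple generator as in the example
    yield 56
    yield 2
    yield 1

x_product_pairs = ((i, j) for i in rows for j in cols())

first_pass = list(x_product_pairs)    # consumes every pair
second_pass = list(x_product_pairs)   # nothing left: the generator is exhausted
```

If the pairs are needed more than once, either rebuild the generator expression or keep the list that the first pass produced.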


Refactoring Example

Let us look at some evolutionary code via an example that finds the longest line in a file. In the old days, the following was acceptable for reading a file:

f = open('/etc/motd', 'r')
longest = 0
while True:
    line = f.readline()
    if not line:        # readline() returns '' only at end of file
        break
    linelen = len(line.strip())
    if linelen > longest:
        longest = linelen
f.close()
return longest


Actually, this is not that old. If it were really old Python code, the Boolean constant True would be the integer 1, and instead of using the string strip() method, you would be using the string module:

import string
          :
    len(string.strip(f.readline()))


Since that time, we realized that we could release the (file) resource sooner if we read all the lines at once. If this were a log file used by many processes, then it would behoove us not to hold onto a (write) file handle for an extended period of time. Yes, our example is for read, but you get the idea. So the preferred way of reading in lines from a file changed slightly to reflect this preference:

f = open('/etc/motd', 'r')
longest = 0
allLines = f.readlines()
f.close()
for line in allLines:
    linelen = len(line.strip())
    if linelen > longest:
        longest = linelen
return longest


List comps allow us to simplify our code a little bit more and give us the ability to do more processing before we get our set of lines. In the next snippet, in addition to reading in the lines from the file, we call the string strip() method immediately instead of waiting until later.

f = open('/etc/motd', 'r')
longest = 0
allLines = [x.strip() for x in f.readlines()]
f.close()
for line in allLines:
    linelen = len(line)
    if linelen > longest:
        longest = linelen
return longest


Still, both examples above have a problem when dealing with a large file as readlines() reads in all its lines. When iterators came around, and files became their own iterators, readlines() no longer needed to be called. While we are at it, why can't we just make our data set the set of line lengths (instead of lines)? That way, we can use the max() built-in function to get the longest string length:

f = open('/etc/motd', 'r')
allLineLens = [len(x.strip()) for x in f]
f.close()
return max(allLineLens)


The only problem here is that even though you are iterating over f line by line, the list comprehension itself needs all lines of the file read into memory in order to generate the list. Let us simplify our code even more: we will replace the list comp with a generator expression and move it inside the call to max() so that all of the complexity is on a single line:

f = open('/etc/motd', 'r')
longest = max(len(x.strip()) for x in f)
f.close()
return longest


One more refactoring, which we are not as much fans of, is dropping the file mode (defaulting to read) and letting Python clean up the open file. Leaving the cleanup to Python is less risky for a file opened for reading than for one opened for writing, and it does work:

return max(len(x.strip()) for x in open('/etc/motd'))
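
For readers who want the brevity of the one-liner without leaving the file for Python to clean up, one possible middle ground is sketched below. It assumes Python 2.6 or later, where the with statement is available; the function name longest_line and the temporary sample file are hypothetical and exist only to make the snippet self-contained.

```python
import os
import tempfile

def longest_line(path):
    """Return the length of the longest stripped line in the file at path."""
    with open(path) as f:      # the file is closed when the block is left
        return max(len(x.strip()) for x in f)

# Hypothetical sample file so the sketch does not depend on /etc/motd.
fd, sample_path = tempfile.mkstemp(text=True)
with os.fdopen(fd, 'w') as sample:
    sample.write("short\n\na much longer line   \nmid\n")

result = longest_line(sample_path)
os.remove(sample_path)
```

Note that max() raises ValueError when given an empty sequence, so this sketch, like the versions above that call max(), assumes the file contains at least one line.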


We have come a long way, baby. Note that even a one-liner is not obfuscated enough in Python to make it difficult to read. Generator expressions were added in Python 2.4, and you can read more about them in PEP 289.



Core Python Programming
Core Python Programming (2nd Edition)
ISBN: 0132269937
EAN: 2147483647
Year: 2004
Pages: 334
Authors: Wesley J Chun
