Recipe19.13.Fetching Large Record Sets from a Database with a Generator


Recipe 19.13. Fetching Large Record Sets from a Database with a Generator

Credit: Christopher Prinos

Problem

You want to fetch a result set from a database (using the Python DB API) and easily iterate over each record in the result set. However, you don't want to use the DB cursor's method fetchall: it could consume a lot of memory and would force you to wait until the whole result set comes back before you can start iterating.

Solution

A generator is the ideal solution to this problem:

def fetchsome(cursor, arraysize=1000):     ''' A generator that simplifies the use of fetchmany '''     while True:         results = cursor.fetchmany(arraysize)         if not results: break         for result in results:             yield result

Discussion

In applications that use the Python DB API, you often see code that goes somewhat like (where cursor is a DB API cursor object):

cursor.execute('select * from HUGE_TABLE') for result in cursor.fetchall( ):     doSomethingWith(result)

This simple approach is "just" fine, as long as fetchall returns a small result set, but it does not work very well if the query result is very large. A large result set can take a long time to return. Also, cursor.fetchall( ) needs to allocate enough memory to store the entire result set in memory at once. Further, with this simple approach, the doSomethingWith function isn't going to get called until the entire query's result finishes coming over from the database to our program.

An alternative approach is to rely on the cursor.fetchone method:

for result in iter(cursor.fetchone, None):     doSomethingWith(result)

However, this alternative approach does not allow the database to optimize the fetching process: most databases can exhibit better efficiency when returning multiple records for a single query, rather than returning records one at a time as fetchone requires.

To let your applications obtain greater efficiency than fetchone allows, without the risks of unbounded memory consumption and delay connected to the use of fetchall, Python's DB API's cursors also have a fetchmany method. However, the direct use of fetchmany makes your iterations somewhat more complicated than the simple for statements such as those just shown. For example:

while True:     results = cursor.fetchmany(1000)     if not results: break     for result in results:         doSomethingWith(result)

Python's generators are a great way to encapsulate complicated iteration logic so that application code can just about always loop with simple for statements. With this recipe's fetchsome generator, you get the same efficiencies and safeguards as with the native use of the fetchmany method in the preceding snippet but with the same crystal-clear simplicity as in the snippets that used either fetchall or fetchone, namely:

for result in fetchsome(cursor):     doSomethingWith(result)

By default, fetchsome fetches up to 1,000 records at a time, but you can change that number, depending on your requirements. Optimal values can depend on schema, database type, choice of Python DB API module. In general, you're best advised to experiment with a few different values in your specific settings if you need to optimize this specific aspect. (Such experimentation is often a good idea for any optimization task.)

This recipe is clearly an example of a more general case: a subsequence unbuncher generator that you can use when you have a sequence of subsequences (each subsequence being obtained through some call, and the end of the whole sequence being indicated by an empty subsequence) and want to flatten it into a simple, linear sequence of items. You can think of this unbunching task as the reverse of the sequence-bunching tasks covered earlier in Recipe 19.10 and Recipe 19.11, or as a simpler variant of the sequence-flattening task covered in Recipe 4.6. A generator for unbunching might be:

def unbunch(next_subseq, *args):     ''' un-bunch a sequence of subsequences into a linear sequence '''     while True:         subseq = next_subseq(*args)         if not subseq: break         for item in subseq:             yield item

As you can see, the structure of unbunch is basically identical to that of the recipe's fetchsome. Usage would also be just about the same:

for result in unbunch(cursor.fetchmany, 1000):     doSomethingWith(result)

However, while it is important and instructive to consider this kind of generalization, when you're writing applications you're often better off using specific generators that directly deal with your application's specific needs. In this case, for example, calling fetchsome(cursor) is more obvious and direct than calling unbunch(cursor.fetchmany, 1000), and fetchsome usefully hides the usage of fetchmany as well as the specific choice of 1,000 as the subsequence size to fetch at each step.

See Also

Recipe 19.10; Recipe 19.11; Recipe 4.6; Python's DB API is covered in Chapter 7 and in Python in a Nutshell.



Python Cookbook
Python Cookbook
ISBN: 0596007973
EAN: 2147483647
Year: 2004
Pages: 420

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net