SQL Database Interfaces | Databases and Persistence

Shelves are a powerful tool; they allow scripts to throw Python objects on a keyed-access file and load them back later in a single step. They aren't quite a full-blown database system, though; objects (records) are accessed with a single key, and there is no notion of SQL queries. It's as if shelves were a database with a single index and no other query-processing support.

Although it's possible to build a multiple-index interface to store data with multiple shelves, it's not a trivial task and requires manually coded extensions (see the dbaseindexed module in the PyErrata system near the end of Chapter 14 for a prototype of this concept).

For industrial-strength persistence needs, Python also supports relational database systems. Today, there are freely available interfaces that let Python scripts utilize all common database systems, both free and commercial: Oracle, Sybase, Informix, mSql, MySql, Interbase, Postgres, ODBC, and more. In addition, the Python community has defined a database API specification that works portably with a variety of underlying database packages. Scripts written for this API can be migrated to different database vendor packages with minimal or no source code changes.

16.6.1 Interface Overview

Unlike all the persistence topics presented in this chapter and book so far, though, SQL databases are optional extensions that are not part of Python itself, and you need to know SQL to make the most sense of their interfaces. Because I don't have space to teach SQL in this text, this section instead gives a brief overview of the API; please consult other SQL references and the database API resources mentioned in the next section for more details.

The good news is that you can access SQL databases from Python, through a straightforward and portable model. The Python database API specification defines an interface for communicating with underlying database systems from Python scripts. Vendor-specific database interfaces for Python may or may not conform to this API completely, but all database extensions for Python seem minor variations on a theme. SQL databases in Python are grounded on a few concepts:

Connection objects represent a connection to a database, are the interface to rollback and commit operations, and generate cursor objects.
Cursor objects represent a single SQL statement submitted as a string, and can be used to step through SQL statement results.
Query results of SQL select statements are returned to scripts as Python lists of Python tuples, representing database tables of rows. Within these row tuples, field values are normal Python objects such as strings, integers, and floats, or special types (e.g., [('bob',38), ('emily',37)]).

Beyond this, the API defines a standard set of database exception types, special database type object constructors (e.g., nulls and dates), and informational calls.

For instance, to establish a database connection under the Python API-compliant Oracle interface available from Digital Creations, install the extension and then run a line of this form:

connobj = Connect("user/password@system")

The string argument's contents may vary per database and vendor, but they generally contain what you provide to log in to your database system. Once you have a connection object, there a variety of things you can do with it, including:

connobj.close( ) close connection now (not at object __del__ time)
connobj.commit( ) commit any pending transactions to the database
connobj.rollback( ) roll database back to start of pending transactions
connobj.getSource(proc) fetch stored procedure's code

But one of the most useful things to do with a connection object is to generate a cursor object:

cursobj = connobj.cursor( ) return a new cursor object for running SQL

Cursor objects have a set of methods, too (e.g., close to close the cursor before its destructor runs), but the most important may be this one:

cursobj.execute(sqlstring [, parm, parm,...]) run SQL query or command string

The execute method can be used to run a variety of SQL statement strings:

DDL definition statements (e.g., CREATE TABLE)
DML modification statements (e.g., UPDATE or INSERT)
DQL query statements (e.g., SELECT)

For DML statements, execute returns the number of rows effected. For DQL query statements, a None is returned and you must call one of the fetch methods to complete the operation:

tuple = cursobj.fetchone( ) fetch next row of a query result

listoftuple = cursobj.fetchmany([size]) fetch next set of rows of query result

listoftuple = cursobj.fetchall( ) fetch all remaining rows of the result

And once you've received fetch method results, table information is processed using normal Python list and tuple object operations (e.g., you can step through the tuples in a fetchall result list with a simple for loop). Most Python database interfaces also allow you to provide values to be passed to SQL statement strings, by providing targets and a tuple of parameters. For instance:

query = 'SELECT name, shoesize FROM spam WHERE job = ? AND age = ?' 
cursobj.execute(query, (value1, value2))
results = cursobj.fetchall( )
for row in results: ...

In this event, the database interface utilizes prepared statements (an optimization and convenience) and correctly passes the parameters to the database regardless of their Python types. The notation used to code targets in the query string may vary in some database interfaces (e.g., ":p1" and ":p2", rather than "?" and "?"); in any event, this is not the same as Python's "%" string formatting operator.

Finally, if your database supports stored procedures, you can generally call them with the callproc method, or by passing an SQL CALL or EXEC statement string to the execute method; use a fetch variant to retrieve its results.

16.6.2 Resources

There is more to database interfaces than the basics just mentioned, but additional API documentation is readily available on the Web. Perhaps the best resource for information about database extensions today is the home page of the Python database special interest group (SIG). Go to http://www.python.org, click on the SIGs link near the top, and navigate to the database group's page (or go straight to http://www.python.org/sigs/db-sig, the page's current address at the time of writing). There, you'll find API documentation, links to database vendor-specific extension modules, and more.

While you're at python.org, be sure to also explore the Gadfly database package -- a Python-specific SQL-based database extension, which sports wide portability, socket connections for client/server modes, and more. Gadfly loads data into memory, so it is currently somewhat limited in scope. On the other hand, it is ideal for prototyping database applications -- you can postpone cutting a check to a vendor until it's time to scale up for deployment. Moreover, Gadfly is suitable by itself for a variety of applications -- not every system needs large data stores, but many can benefit from the power of SQL.

Introducing Python

Part I: System Interfaces