Using the Python Interface


Now that you have the PySQLite interface installed, let's look at how the interface is used from a Python script. In this section we'll see how to open and close databases and issue commands to SQLite.

Opening and Closing a Database

In order to use the PySQLite functionality, Python must first import the sqlite module. Then you can call the sqlite.connect() constructor method in order to open a database. The following is an example with Python running in interactive mode:

 $ python Python 2.3.3 (#1, May  7 2004, 10:31:40) [GCC 3.3.3 20040412 (Red Hat Linux 3.3.3-7)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import sqlite >>> cx = sqlite.connect("pydb") >>> 

In its simplest form, the constructor takes a single database name parameter. The filename given can include a relative or absolute path and will be read from the current working directory if no path is specified. If that file exists, that database is opened; otherwise, a new empty database is created with the given filename. A connection object is returned, assigned to cx in this example.

If you followed the preceding example and no error occurred, exit Python and you'll see that the file pydb has been created in the current directory.

To close the connection to a database, simply call the .close() method on the connection object.

 >>> cx.close() 

After the connection has been closed, cx becomes unusable and an Error exception will be raised if you attempt to perform any operation on it.

Executing SQL Commands

To perform any database operation you must first create a cursor object using a valid database connection object. SQLite does not use the concept of cursors internally; however, PySQLite emulates this behavior in its interface in order to provide compliance with the Python Database API Specification. This is done using the .cursor() method, as follows:

 >>> cx = sqlite.connect("pydb") >>> cu = cx.cursor() 

Note

You can actually create as many cursor objects as you like on the same database connection, although there is usually little point in doing so. Separate cursors do not isolate datachanges made in one cursor are immediately reflected in other cursors created from the same connection.


With a cursor object, you can use the .execute() method to execute an SQL command. Listing 9.1 shows a simple Python script that opens the pydb database and issues a CREATE TABLE statement to create the contacts table.

Listing 9.1. Executing a CREATE TABLE Statement Using the Python Interface
 import sqlite sql = """ CREATE TABLE contacts (   id INTEGER PRIMARY KEY,   first_name CHAR,   last_name CHAR,   email CHAR UNIQUE ) """ cx = sqlite.connect("pydb") cu = cx.cursor() cu.execute(sql) cx.commit() cx.close() 

Run this script to create the table, and verify that it has been created using the sqlite tool.

 $ python listing9.1.py $ sqlite pydb SQLite version 2.8.15 Enter ".help" for instructions sqlite> .tables contacts sqlite> .schema CREATE TABLE contacts (   id INTEGER PRIMARY KEY,   first_name CHAR,   last_name CHAR,   email CHAR UNIQUE ); 

Error Handling

If for any reason the database file cannot be opened or the query cannot be executed, Python will raise an error. The following output was generated when an attempt was made to create a new SQLite database in a write-protected directory:

 >>> cx = sqlite.connect("pydb") Traceback (most recent call last):   File "<stdin>", line 1, in ?   File "/usr/lib/python2.3/site-packages/sqlite/__init__.py", line 61,   in connect     return Connection(*args, **kwargs)   File "/usr/lib/python2.3/site-packages/sqlite/main.py", line 445,    in __init__     self.db = _sqlite.connect(database, mode) _sqlite.DatabaseError: unable to open database: pydb 

This is pretty ugly, so you'll probably want to trap any such errors and handle them in a nice way. The StandardError error class can be used to trap an error from PySQLite, but the sqlite package also includes its own Error class, with subclasses for more specific error detection.

Listing 9.2 performs the same CREATE TABLE statement as before, but uses the sqlite.Error exception so that the error is handled within the script.

Listing 9.2. Using sqlite.Error to Trap Errors
 import sys; import sqlite; sql = """ CREATE TABLE contacts (   id INTEGER PRIMARY KEY,   first_name CHAR,   last_name CHAR,   email CHAR UNIQUE ) """ try:         cx = sqlite.connect("pydb") except sqlite.Error, errmsg:         print "Could not open the database file: " + str(errmsg)         sys.exit() try:         cu = cx.cursor()         cu.execute(sql)         cx.commit() except sqlite.Error, errmsg:         print "Could not execute the query: " + str(errmsg)         sys.exit() cx.close() 

When this script is run from a location in which it does not have permission to open the database file, the output will look like the following:

 $ ./listing9.2.py Could not open the database file: unable to open database: pydb 

If file permissions are okay but you run the script after the table has already been created, this will be the error produced:

 $ ./listing9.2.py Could not execute the query: table contacts already exists 

PySQLite provides the subclasses of Error specified by the Database API specification in order to handle specific types of error message. The different error classes are shown in Table 9.1.

Table 9.1. Error Subclasses Implemented by PySQLite

InterfaceError

Error returned by the database interface.

DatabaseError

Error returned by the database engine when a database cannot be opened, or when SQLite returns SQLITE_ERROR, SQLITE_READONLY, SQLITE_CORRUPT, SQLITE_FULL, SQLITE_CANTOPEN, SQLITE_SCHEMA, or an unknown error code.

DataError

Indicates a problem with data values when SQLite returns SQLITE_TOOBIG, for example division by zero.

OperationalError

Error related to the operation of the database that is out of the developer's control, when SQLite returns SQLITE_PERM, SQLITE_ABORT, SQLITE_BUSY, SQLITE_LOCKED, SQLITE_INTERRUPT, SQLITE_IOERR, or SQLITE_PROTOCOL.

IntegrityError

Error due to a problem with the referential integrity of the database, when SQLite has returned SQLITE_CONSTRAINT or SQLITE_MISMATCH.

InternalError

An internal error in the database engine, raised when SQLite returns SQLITE_INTERNAL, SQLITE_NOTFOUND, or SQLITE_EMPTY.

ProgrammingError

Indicates a programming error, caused by an error in a user-defined function or an operation being attempted on a closed database.

NotSupportedError

Raised if a non-supported operation is attempted.


Connection Parameters

Several optional arguments can be supplied to sqlite.connect() after the name of the database file. The full prototype is

 def connect (database, mode=0755, converters={}, autocommit=0,              encoding=None, timeout=None, command_logfile=None) 

The mode argument is currently not used but was intended to allow you to supply a file mode in which the database will be opened. Mode 0644 could be used, for instance, to open a database read-only.

The converters argument can be used to supply a mapping from SQL to Python data types. We will discuss the issues surrounding Python and SQLite data types in the section "Working with Data Types."

The autocommit feature is off by default but can be turned on for a connection object by setting this argument to 1. We will discuss this feature later in this chapter in the "Transactions" section.

Use the encoding argument to tell PySQLite what encoding method you want to use to save Unicode strings to the database. For example, pass the value utf-8 as the encoding argument to use UTF-8 encoding.

The timeout argument takes a value in seconds, which is the amount of time you want SQLite to continue trying to obtain a lock on the database file before giving up with a DatabaseError exception. The default value is None, meaning that no special action will be taken, and only one attempt will be made to open the database.

You can use the command_logfile argument to specify a file object that all the raw SQL statements passed to the SQLite engine through PySQLite will be logged to.

Using Commands That Change the Database

The script in Listing 9.3 prompts for three pieces of user input before generating an SQL statement to insert a new record into the contacts table.

Listing 9.3. Inserting a Database Record Using Data from User Input
 import sys import sqlite try:         cx = sqlite.connect("pydb") except sqlite.Error, errmsg:         print "Could not open the database file: " + str(errmsg)         sys.exit() fn = raw_input("Enter first name : ") ln = raw_input("Enter last name : ") em = raw_input("Enter email address: ") sql = """ INSERT INTO contacts (first_name, last_name, email) VALUES ('%s', '%s', '%s') """ % (fn, ln, em) try:         cu = cx.cursor()         cu.execute(sql)         cx.commit() except sqlite.Error, errmsg:         print "Could not execute the query: " + str(errmsg)         sys.exit() cx.close() 

The three calls to raw_input() gather information from standard input that we'll use to build the SQL INSERT statement. The first name, last name, and email are stored to fn, ln, and em respectively.

 fn = raw_input("Enter first name : ") ln = raw_input("Enter last name : ") em = raw_input("Enter email address: ") 

We then use these values to build up a string, sql, which contains the command to be passed to SQLite.

 sql = """ INSERT INTO contacts (first_name, last_name, email) VALUES ('%s', '%s', '%s') """ % (fn, ln, em) 

The record is then inserted when we open a cursor on the SQLite connection and execute the SQL statement. The commit() method is called to ensure that the record is saved before closing the database connection.

Executing the script to add a record to the contacts table looks like this:

 $ python listing9.3.py Enter first name : Chris Enter last name : Newman Enter email address: chris@lightwood.net 

We can then verify that the record has been inserted using the sqlite tool.

 $ sqlite pydb sqlite> SELECT * FROM contacts; id    first_name  last_name   email ----  ----------  ----------  -------------------- 1     Chris       Newman      chris@lightwood.net 

Care must be taken to ensure that records read from user input are safe for insertion into SQLite. The following example shows how a name containing an apostrophe character can break the script in Listing 9.3:

 Enter first name : Paddy Enter last name : O'Brien Enter email address: paddy@irish.com Could not execute the query: unrecognized token: "@" 

If you were to output the value of sql before it was executed, it would look like this:

 INSERT INTO contacts (first_name, last_name, email) VALUES ('Paddy', 'O'Brien', 'paddy@irish.com') 

SQLite is unable to determine that the apostrophe in O'Brien is not the closing quote around the second data value. It considers only values within pairs of single quotes to be the data items.

In this case, the values to be inserted will be seen as 'Paddy', 'O' and ', ', and so the SQL statement is nonsense. It is the @ symbol that first causes a tokenizer error and the script to raise a DatabaseError exception.

Quotes are delimited in SQLite by another quote characteryou have to use two single quotes wherever you want one to appear. The preceding SQL statement would work if it were written as follows:

 INSERT INTO contacts (first_name, last_name, email) VALUES ('Paddy', 'O''Brien', 'paddy@irish.com') 

However, to save having to validate user input and delimit apostrophes yourself, the execute() method can be used to substitute values into the query in a safe way. Rather than substituting values into sql, consider the following changes.

If we alter the SQL statement so that values are not enclosed in quotes at all, we can make PySQLite do the hard work of determining whether values require quoting and, if necessary, how special characters need delimiting.

 sql = """ INSERT INTO contacts (first_name, last_name, email) VALUES (%s, %s, %s) """ 

Then we execute the statement by passing a list of substitution values as the second argument to .execute().

 cu.execute(sql, (fn, ln, em) ) 

The modified version can be seen in full in Listing 9.4.

Listing 9.4. Using .execute() to Substitute Values into a Query
 import sys import sqlite try:         cx = sqlite.connect("pydb") except sqlite.Error, errmsg:         print "Could not open the database file: " + str(errmsg)         sys.exit() fn = raw_input("Enter first name : ") ln = raw_input("Enter last name : ") em = raw_input("Enter email address: ") sql = """ INSERT INTO contacts (first_name, last_name, email) VALUES (%s, %s, %s) """ try:         cu = cx.cursor()         cu.execute(sql, (fn, ln, em) )         cx.commit() except sqlite.Error, errmsg:         print "Could not execute the query: " + str(errmsg)         sys.exit() cx.close() 

We can now insert records using strings that contain apostrophes into SQLite without having to worry about delimiting quotes first.

 $ python listing9.4.py Enter first name : Paddy Enter last name : O'Brien Enter email address: paddy@irish.com $ sqlite pydb sqlite> SELECT * FROM contacts; id    first_name  last_name   email ----  ----------  ----------  -------------------- 1     Chris       Newman      chris@lightwood.net 2     Paddy       O'Brien     paddy@irish.com 

When a database operation is performed that affects rows in the databasenamely UPDATE and DELETE operationsyou can find the number of rows affected by the command by using the .rowcount attribute of the cursor object.

In Listing 9.5 we perform an UPDATE command that changes the case of the first_name field in contacts to uppercase across all database rows. To show how many rows were actually affected, the .rowcount attribute is displayed.

Listing 9.5. Finding the Number of Rows Affected by an SQL Statement
 import sys import sqlite try:         cx = sqlite.connect("pydb") except sqlite.Error, errmsg:         print "Could not open the database file: " + str(errmsg)         sys.exit() sql = "UPDATE contacts SET first_name = upper(first_name)" try:         cu = cx.cursor()         cu.execute(sql)         cx.commit() except sqlite.Error, errmsg:         print "Could not execute the query: " + str(errmsg)         sys.exit() print str(cu.rowcount) + " row(s) were affected" cx.close() 

When you execute this script, every row in the database is updated and the number of rows is printed to screen.

 $ python listing9.5.py 2 row(s) were affected 

Note

The number of affected rows returned by .rowcount is the total number of rows that the UPDATE or DELETE statement considersin other words, the number of rows matching the WHERE clause, if one is given. If a row is updated to the same value it held before the statement was executed, it is still counted by .rowcount even though no change to the data has been made.


After a SELECT statement, .rowcount also contains the number of rows returned by the query. In fact it returns the number of rows returned or affected by the WHERE clause in any SQL statement.

Transactions

PySQLite contains some built-in optimization to make the best use possible of SQLite's transactions. When a number of INSERT, UPDATE, or DELETE statements are issued in succession in a script, they are all executed within a single transaction.

The default behavior, if the SQL commands were entered through sqlite or via the underlying C/C++ interface, would be to open an implicit transaction for each statement and commit each transaction before moving on to the next command. There is significant performance benefit to be gained from grouping multiple statements into a single transaction, and so PySQLite attempts to handle the grouping automatically.

Whenever the .execute() method is passed an SQL statement that changes the database, PySQLite automatically sends a BEGIN TRANSACTION before that command is executed if a transaction has not already been started. A .commit() therefore needs to be issued before the changes will be saved to the database.

Note

If you look back to Listing 9.1, you'll see that we included a .commit() instruction before closing the database. Note that calling .close() while a transaction is open will roll back any unsaved changes.


PySQLite opens a transaction whenever necessary but is optimized to leave it as late as possible before the BEGIN TRANSACTION command is sent so that the database file is locked for as little time as possible. It is therefore the type of statement processed by .execute() that determines whether a transaction is started.

SELECT statements do not need to be run inside a transaction but if the result of a query is subsequently used in another statement that does change the database, you should make sure that both SQL commands are executed within the same transaction.

This is the default behavior of PySQLite, although it can be changed if desired by turning on autocommit mode by using a value of 1 for the autocommit argument in sqlite.connect().

 cx = sqlite.connect("pydb", 755, {}, 1) 

Because autocommit is the fourth argument, the mode and converters arguments must be supplied. In this example we have simply given them their default values.

An alternative syntax that is more compact is given in the following example. PySQLite recognizes that the second parameter is not a file mode and applies the autocommit setting:

 cx = sqlite.connect("pydb", autocommit=1); 

When autocommit is turned on, PySQLite will not attempt to do anything clever with transactions. An INSERT, UPDATE, or DELETE command issued outside of a transaction will be executed within its own implicit transaction. If you want to start a transaction manually, you must pass BEGIN TRANSACTION to the .execute() method.

Fetching Records from the Database

Let's take a look at how the result of a SELECT query is processed using PySQLite. The query is submitted through .execute() just as other types of SQL statement are executed.

To fetch the data retrieved by the SELECT, call .fetchone() on the cursor object. Data is returned one row at a time each time .fetchone() is called, and returns None if there are no more rows to be fetched.

Listing 9.6 shows a simple example that selects first_name and last_name from the contacts table. Each data record is fetched into row, which is then used to output the two column values to screen.

Listing 9.6. Fetching Selected Data Using .fetchone()
 import sys import sqlite try:         cx = sqlite.connect("pydb") except sqlite.Error, errmsg:         print "Could not open the database file: " + str(errmsg)         sys.exit() sql = "SELECT first_name, last_name FROM contacts" try:         cu = cx.cursor()         cu.execute(sql) except sqlite.Error, errmsg:         print "Could not execute the query: " + str(errmsg)         sys.exit() row = cu.fetchone() while row:         print row.first_name + " " + row.last_name         row = cu.fetchone() cx.close() 

The while loop simply prints the contents of row and, assuming there are more records to come, fetches the next row and repeats. The names of the selected columns are also the names of the elements of rowwe reference row.first_name and row.last_name respectively.

To comply with the Python Database API Specification, fetched columns can also be referenced using the index number of their position in the list, starting at zero. Take this example:

 print row.first_name + " " + row.last_name 

This could also be written as

 print row[0] + " " + row[1] 

The following is an example of output from running Listing 9.6:

 $ python listing9.6.py CHRIS Newman PADDY O'Brien Bill Williams 

Complimentary to .fetchone(), PySQLite also implements the .fetchmany() and .fetchall() methods to retrieve the multiple rows or even the entire data set returned by the SELECT all at once.

The .fetchmany() method takes a single argument to specify the number of rows to fetch. If the value is smaller than the number of rows remaining, the cursor position is advanced so that the next call to .fetchone() or .fetchmany() will begin at the next row in sequence.

The .fetchall() method, as you have probably guessed, is equivalent to calling .fetchmany() with the argument equal to the total number of rows returned by the query. If .fetchone() or .fetchmany() has been called on that cursor object previously, only the remaining rows will be fetched.

Listing 9.7 performs the same query on the contacts table using .fetchmany() to grab the result two records at a time.

Listing 9.7. Fetching Several Rows at a Time Using .fetchmany()
 import sys import sqlite try:         cx = sqlite.connect("pydb", 755) except sqlite.Error, errmsg:         print "Could not open the database file: " + str(errmsg)         sys.exit() sql = "SELECT first_name, last_name FROM contacts" try:         cu = cx.cursor()         cu.execute(sql) except sqlite.Error, errmsg:         print "Could not execute the query: " + str(errmsg)         sys.exit() print "Query returns " + str(cu.rowcount) + " row(s)" rows = cu.fetchmany(2) while rows:         for row in rows:                 print row.first_name + " " + row.last_name         rows = cu.fetchmany(2) cx.close() 

This time, we use two nested loops in order to display all the records. The outer while loop fetches rows two at a time, and the inner for loop iterates each of the fetched rows into a single row.

The .rowcount attribute for SELECT statements contains the total number of rows returned by the query. We output this value in Listing 9.7 to show the number of records that will be fetched.

 $ python listing9.7.py Query returns 3 row(s) CHRIS Newman PADDY O'Brien Bill Williams 

If no value is given, the number of rows is taken from the .arraysize attribute, which will contain the optimum number of rows to fetch at a time.

For example, the following alternative could have been used in Listing 9.7:

 cu.arraysize = 2 rows = cu.fetchmany() 

Working with Data Types

As we saw in Chapter 2, "Working with Data," SQLite is typeless. However, Python uses strong data typing and although most of the time it is able to infer what type it needs to convert a string from SQLite to, sometimes you need to give a mapping to ensure that a data value is converted to the correct type in Python.

Although the column types given to SQLite in the CREATE TABLE statement are mostly ignoredother than to determine whether numeric or text-based sorting is performed on that columnPySQLite will look at the column type to try to determine how a column should be converted to a Python data type.

Table 9.2 shows the default mappings, which should be kept in mind when creating the schema for a database that is to be interfaced from PySQLite.

Table 9.2. Default Mappings of SQLite Columns to Python Data Types

Column Type

Converter Function

CHAR, TEXT

str()

INT

int()

FLOAT, NUMERIC, NUMBER, DECIMAL, REAL, and DOUBLE

float()

UNICODE

UnicodeConvertor(self.encoding)

BINARY, BLOB

sqlite.decode()

DATE

DateTime.DateFrom()

TIME

DateTime.TimeFrom()

TIMESTAMP

DateTime.DateTimeFrom()

INTERVAL

DateTime.DateTimeDeltaFrom()


Note

The column types shown in Table 9.2 can be substrings of the actual type specified in the table schema. For example, an INTEGER column will also be converted using int(). Column types are not case-sensitive. The closest match found is used and sometimes you must be careful; for example, INT is closer to INTEGER than INTERVAL. To be sure of using the correct column type, always give the name in full.


To override the default mapping converter functions, the converter name can be passed to .execute() before a query is executed, in the following format:

 cu.execute("-- types type1, type2, ...") 

In this example, col1, col2, and so on are the converter functions to use for each column in turn. For example, to tell SQLite to convert the first two columns returned from the next query executed as string and integer respectively, you would use the following command:

 cu.execute("-- types str, int") 

The following built-in types can be passed in a -- types command:

  • str

  • int

  • long

  • float

  • unicode

  • binary

  • date

  • time

  • timestamp

  • interval

Adding New SQL Functions

Though not part of the Python Database API Specification, PySQLite provides an interface to SQLite's powerful user-defined function capabilities for both simple and aggregating functions.

This feature allows you to create your own functions in Python that can be registered as SQLite functions and therefore called within an SQL statement. There is therefore no need to know C in order to extend the SQL language when using PySQLite.

User-defined functions are registered through the .create_function() method on a database connection object. Functions are available in SQL only for the duration of that connectionthey are not saved to the database itself or made available to other connection objects within the same script.

In Listing 9.8 we create a function, altcaps(), in Python and then register it as an SQL function. The function takes a single string argument and returns it with alternating capitalization. This function is fairly useless but should serve as an example of how user-defined functions are implemented via PySQLite.

Listing 9.8. Creating a User-Defined SQL Function
 import sys import string import sqlite def altcaps (word):         altstr = ""         for i in range(len(word)):                 if i%2:                         altstr += string.lower(word[i])                 else:                         altstr += string.upper(word[i])         return altstr try:         cx = sqlite.connect("pydb") except sqlite.Error, errmsg:         print "Could not open the database file: " + str(errmsg)         sys.exit() cx.create_function("altcaps", 1, altcaps) sql = "SELECT altcaps(first_name) FROM contacts" try:         cu = cx.cursor()         cu.execute('-- types str')         cu.execute(sql) except sqlite.Error, errmsg:         print "Could not execute the query: " + str(errmsg)         sys.exit() row = cu.fetchone() while row:         print row[0]         row = cu.fetchone() cx.close() 

The altcaps() function uses range() to create a loop that will run once for each character in word.

 for i in range(len(word)): 

We then use the mod operator (%) to test whether the loop index is odd or even. A new string altstr is built up a character at a time, with every odd-numbered character added in uppercase and every even position in lowercase.

 if i%2:        altstr += string.lower(word[i]) else:        altstr += string.upper(word[i]) 

This function is then registered in SQLite:

 cx.create_function("altcaps", 1, altcaps) 

There are three arguments to .create_function()the name of the SQL function, the number of arguments, and a reference to the Python function that is to be executed when the SQL function is called.

It is important to tell PySQLite what data type to expect to be returned from a user-defined function; otherwise, float will be assumed. We set a string type for this function with the following line :

 cu.execute('-- types str') 

Note

PySQLite assumes that user-defined functions return a NUMERIC unless told otherwise. Therefore the default converter function is float().


Executing Listing 9.8 produces the following output, consisting of the first_name values from contacts with alternating upper- and lowercase letters:

 $ python listing9.8.py ChRiS PaDdY BiLl 

In this example we gave the SQL function the same name as the Python function, although this is not a requirement. In Listing 9.9 we reference Python's random() function from the standard library and create an SQL function called pyrandom() so as not to conflict with SQLite's built-in random() function.

Note

If you use .create_function() to create an SQL function with the same name as an existing function, the user-defined function will simply overload the existing function. No error or warning is raised.


Listing 9.9. Registering a Python Library Function in SQL
 import sys import random import sqlite try:         cx = sqlite.connect("pydb") except sqlite.Error, errmsg:         print "Could not open the database file: " + str(errmsg)         sys.exit() cx.create_function("pyrandom", 0, random.random) sql = "SELECT pyrandom()"; try:         cu = cx.cursor()         cu.execute(sql) except sqlite.Error, errmsg:         print "Could not execute the query: " + str(errmsg)         sys.exit() row = cu.fetchone() print row[0] cx.close()  

Adding Aggregating Functions

PySQLite also allows you to add custom aggregating functions to SQL. An aggregating function is one that applies a calculation across many rows at a timeeither the entire dataset returned by a query or rows grouped together by a GROUP BY clause.

Aggregating functions require a class to be defined containing four functions:

  • A step function to be executed for each row, named step()

  • A function to be executed after each row has been processed, named finalize()

  • A function to reset data stored in the class, named reset() when a new aggregate calculation begins

We will create a class definition that can be used to register an aggregating function to compute the median of a set of numbers. First, we need some data to try the function on.

Listing 9.10 creates a simple table, numbers, and loops from 1 to 9 inserting the square of each value. Because the sequence of square numbers does not increase proportionally, the mean and median averages will be different.

Listing 9.10. Creating a Table Containing a Sequence of Square Numbers
 import sys import sqlite try:         cx = sqlite.connect("pydb") except sqlite.Error, errmsg:         print "Could not open the database file: " + str(errmsg)         sys.exit() cu = cx.cursor() sql = "CREATE TABLE numbers (num INTEGER)" cu.execute(sql) for i in range(1,10):         sql = "INSERT INTO numbers VALUES (%d)" % (i*i)         cu.execute(sql) cx.commit() cx.close() 

The sequence inserted looks like this:

 1 4 9 16 25 36 49 64 81 

The median is a value such that half the values in the set are above the median and half are below it. From looking at the sequence we can see that the median will be 25.

Listing 9.11 contains the class definition for our median() aggregating function and executes an SQL statement to find both the median and the meanusing the SQLite built-in AVG() functionon the sequence of square numbers.

Listing 9.11. Creating a Table Containing the Mean and Median Averages
 import sys import sqlite class median:         def __init__(self):                 self.reset()         def reset(self):                 self.values = []         def step(self, val):                 self.values.append(float(val))         def finalize(self):                 self.values.sort()                 n = len(self.values)                 if n%2 == 1:                         return self.values[n//2]                 else:                         return (self.values[n//2-1] + self.values[n//2])/2 try:         cx = sqlite.connect("pydb") except sqlite.Error, errmsg:         print "Could not open the database file: " + str(errmsg)         sys.exit() cx.create_aggregate("median", 1, median) sql = "SELECT median(num), avg(num) FROM numbers"; try:         cu = cx.cursor()         cu.execute(sql) except sqlite.Error, errmsg:         print "Could not execute the query: " + str(errmsg)         sys.exit() row = cu.fetchone() print "Median is " + str(row[0]) print "Mean is " + str(row[1]) cx.close() 

The class contains the following member functions:

 def __init__(self):         self.reset() 

The constructor calls reset() to initialize the local array in which the values from which the median is calculated will be stored.

 def reset(self):         self.values = [] 

The step() function simply appends each value to the local array. Nothing more is required here; the hard work is done in finalize().

 def step(self, val):         self.values.append(float(val)) 

First of all we sort the array into numerical order.

 def finalize(self):         self.values.sort() 

The rule to find the median is slightly different depending on whether there is an odd or even number of elements.

If there are an odd number of elements, the median is simply the middle value from the list. The index of the midpoint of the array is half the length of the array, rounded down to the nearest whole number.

 n = len(self.values) if n%2 == 1:         return self.values[n//2] 

If there is an even number of elements, the median is the midpoint between the two middle values.

 else:         return (self.values[n//2-1] + self.values[n//2])/2 

The aggregating function is registered in SQLite using .create_aggregate().

 cx.create_aggregate("median", 1, median) 

The syntax is similar to .create_function() except that the argument is the name of the class that contains the reset(), step(), and finalize() methods required to implement that aggregate.

Listing 9.11 executes the following query to find both the median and mean averages from the test data:

 SELECT median(num), avg(num) FROM numbers 

The following output is produced to show that the median and mean are indeed different values:

 $ python listing9.11.py Median is 25.0 Mean is 31.6666666667 



    SQLite
    SQLite
    ISBN: 067232685X
    EAN: 2147483647
    Year: 2004
    Pages: 118
    Authors: Chris Newman

    flylib.com © 2008-2017.
    If you may any questions please contact us: flylib@qtcs.net