Recipe 7.8. Using the Berkeley DB Database
Credit: Farhad Fouladi
You want to persist some data, exploiting the simplicity and good performance of the Berkeley DB database library.
If you have previously installed Berkeley DB on your machine, the Python Standard Library comes with package bsddb (and optionally bsddb3, to access Berkeley DB release 3.2 databases) to interface your Python code with Berkeley DB. To get either bsddb or, lacking it, bsddb3, use a try/except on import:
try:
    from bsddb import db                  # first try release 4
except ImportError:
    from bsddb3 import db                 # not there, try release 3 instead
print db.DB_VERSION_STRING
# emits, e.g: Sleepycat Software: Berkeley DB 4.1.25: (December 19, 2002)
To create a database, instantiate a db.DB object, then call its method open with appropriate parameters, such as:
adb = db.DB()
adb.open('db_filename', dbtype=db.DB_HASH, flags=db.DB_CREATE)
db.DB_HASH is just one of several access methods you may choose when you create a database; a popular alternative is db.DB_BTREE, to use B+tree access (handy if you need to get records in sorted order). You may make an in-memory database, without an underlying file for persistence, by passing None instead of a filename as the first argument to the open method.
Once you have an open instance of db.DB, you can add records, each composed of two strings, key and data:
for i, w in enumerate('some words for example'.split()):
    adb.put(w, str(i))
You can access records via a cursor on the database:
def irecords(curs):
    record = curs.first()
    while record:
        yield record
        record = curs.next()

for key, data in irecords(adb.cursor()):
    print 'key=%r, data=%r' % (key, data)
# emits (the order may vary):
# key='some', data='0'
# key='example', data='3'
# key='words', data='1'
# key='for', data='2'
When you're done, you close the database:
adb.close()
At any future time, in the same or another Python program, you can reopen the database by giving just its filename as the argument to the open method of a newly created db.DB instance:
the_same_db = db.DB()
the_same_db.open('db_filename')
and work on it again in the same ways:
the_same_db.put('skidoo', '23')           # add a record
the_same_db.put('words', 'sweet')         # replace a record
for key, data in irecords(the_same_db.cursor()):
    print 'key=%r, data=%r' % (key, data)
# emits (the order may vary):
# key='some', data='0'
# key='example', data='3'
# key='words', data='sweet'
# key='for', data='2'
# key='skidoo', data='23'
Again, remember to close the database when you're done:
the_same_db.close()
The Berkeley DB is a popular open source database. It does not support SQL, but it's simple to use, offers excellent performance, and gives you a lot of control over exactly what happens, if you care to exert it, through a huge array of options, flags, and methods. Berkeley DB is just as accessible from many other languages as from Python: for example, you can perform some changes or queries with a Python program, and others with a separate C program, on the same database file, using the same underlying open source library that you can freely download from Sleepycat.
The Python Standard Library shelve module can use the Berkeley DB as its underlying database engine, just as it uses cPickle for serialization. However, shelve does not let you take advantage of the ability to access a Berkeley DB database file from several different languages, exactly because the records are strings produced by pickle.dumps, and languages other than Python can't easily deal with them. Accessing the Berkeley DB directly with bsddb also gives you access to many advanced functionalities of the database engine that shelve simply doesn't support.
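To make the portability point concrete, here is a small sketch (not part of the original recipe) that contrasts the bytes you would store yourself with the bytes a pickle-based layer would store for the same value. It uses only the standard pickle module, the variable names are illustrative, and it is written to run under both Python 2 and Python 3:

```python
import pickle

value = 23

# What direct use of the database would store: a plain string whose
# format you chose, which a C program reading the same file can parse.
plain = str(value).encode('ascii')

# What a pickle-based layer such as shelve would store for the same
# value: a pickle, whose byte layout is Python-specific and depends on
# the pickle protocol in use.
pickled = pickle.dumps(value)

print('plain record:   %r' % (plain,))
print('pickled record: %r' % (pickled,))

# Python turns the pickled bytes back into the value with one call;
# other languages would need their own pickle decoder.
restored = pickle.loads(pickled)
```

The exact pickled bytes differ between Python versions and protocols, which is one more reason they make a poor interchange format for programs written in other languages.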
For example, creating a database with an access method of db.DB_HASH, as shown in the recipe, may give maximum performance, but, as you'll have noticed when listing all records with the generator irecords that is also presented in the recipe, hashing puts records in apparently random, unpredictable order. If you need to access records in sorted order, you can use an access method of db.DB_BTREE instead. Berkeley DB also supports more advanced functionality, such as transactions, which you can enable through direct access but not via anydbm or shelve.
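The switch to sorted access might look like the following sketch. It assumes the same bsddb or bsddb3 package as the recipe and simply skips the demonstration when neither is installed (bsddb is no longer part of the standard library in Python 3). The helpers irecords_closing and build_btree are hypothetical names introduced here, not part of the library; the first is a variant of the recipe's irecords that also closes its cursor. Keys and data are byte strings so the same code can run under Python 2 and Python 3:

```python
from __future__ import print_function

# Same fallback import as in the recipe; db stays None when neither
# package is available, in which case the demonstration is skipped.
try:
    from bsddb import db
except ImportError:
    try:
        from bsddb3 import db
    except ImportError:
        db = None


def irecords_closing(adb):
    """Yield (key, data) pairs in cursor order, then close the cursor."""
    curs = adb.cursor()
    try:
        record = curs.first()
        while record:
            yield record
            record = curs.next()
    finally:
        curs.close()


def build_btree(words):
    """Hypothetical helper: in-memory B+tree database, word -> index."""
    adb = db.DB()
    # None as the filename requests an in-memory database, as in the recipe.
    adb.open(None, dbtype=db.DB_BTREE, flags=db.DB_CREATE)
    for i, w in enumerate(words):
        adb.put(w, str(i).encode('ascii'))
    return adb


if db is not None:
    adb = build_btree(b'some words for example'.split())
    for key, data in irecords_closing(adb):
        print('key=%r, data=%r' % (key, data))
    adb.close()
```

With the default B+tree comparison, the keys should come back in lexicographic order (example, for, some, words) rather than in the apparently random order that hashing produces.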
For detailed documentation about all functionality of the Python Standard Library bsddb package, see http://pybsddb.sourceforge.net/bsddb3.html. For documentation, downloads, and more of the Berkeley DB itself, see http://www.sleepycat.com/.
See also: Library Reference and Python in a Nutshell docs for modules anydbm, shelve, and bsddb; http://pybsddb.sourceforge.net/bsddb3.html for many more details about bsddb and bsddb3; http://www.sleepycat.com/ for downloads of, and very detailed documentation on, the Berkeley DB itself.