19.3. DBM Files

Flat files are handy for simple persistence tasks, but they are generally geared toward a sequential processing mode. Although it is possible to jump around to arbitrary locations with seek calls, flat files don't provide much structure to data beyond the notion of bytes and text lines.

DBM files, a standard tool in the Python library for database management, improve on that by providing key-based access to stored text strings. They implement a random-access, single-key view on stored data. For instance, information related to objects can be stored in a DBM file using a unique key per object and later can be fetched back directly with the same key. DBM files are implemented by a variety of underlying modules (including one coded in Python), but if you have Python, you have a DBM.

19.3.1. Using DBM Files

Although DBM filesystems have to do a bit of work to map chunks of stored data to keys for fast retrieval (technically, they generally use a technique called hashing to store data in files), your scripts don't need to care about the action going on behind the scenes. In fact, DBM is one of the easiest ways to save information in Python. DBM files behave so much like in-memory dictionaries that you may forget you're actually dealing with a file. For instance, given a DBM file object:

  • Indexing by key fetches data from the file.

  • Assigning to an index stores data in the file.

DBM file objects also support common dictionary operations such as fetching the list of keys, testing for a key, and deleting a key. The DBM library itself is hidden behind this simple model. Since it is so simple, let's jump right into an interactive example that creates a DBM file and shows how the interface works:

    % python
    >>> import anydbm                           # get interface: dbm, gdbm, ndbm,..
    >>> file = anydbm.open('movie', 'c')        # make a DBM file called 'movie'
    >>> file['Batman'] = 'Pow!'                 # store a string under key 'Batman'
    >>> file.keys()                             # get the file's key directory
    ['Batman']
    >>> file['Batman']                          # fetch value for key 'Batman'
    'Pow!'
    >>> who  = ['Robin', 'Cat-woman', 'Joker']
    >>> what = ['Bang!', 'Splat!', 'Wham!']
    >>> for i in range(len(who)):
    ...     file[who[i]] = what[i]              # add 3 more "records"
    ...
    >>> file.keys()
    ['Joker', 'Robin', 'Cat-woman', 'Batman']
    >>> len(file), file.has_key('Robin'), file['Joker']
    (4, 1, 'Wham!')
    >>> file.close()                            # close sometimes required

Internally, importing anydbm automatically loads whatever DBM interface is available in your Python interpreter, and opening the new DBM file creates one or more external files with names that start with the string 'movie' (more on the details in a moment). But after the import and open, a DBM file is virtually indistinguishable from a dictionary. In effect, the object called file here can be thought of as a dictionary mapped to an external file called movie.
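The session above uses Python 2 names. As a rough sketch only, assuming Python 3 (where anydbm was renamed dbm and has_key gave way to the in operator), the same steps might look like the following. The build_movie_db helper and the temporary directory are illustrative assumptions, not part of the original example. Note that in Python 3, DBM files hand keys and values back as bytes:

```python
import dbm
import os
import tempfile


def build_movie_db(path):
    """Create a DBM file at path and store four string records in it."""
    db = dbm.open(path, 'c')              # 'c': create the file if it is missing
    try:
        db['Batman'] = 'Pow!'             # assigning to a key stores to the file
        who = ['Robin', 'Cat-woman', 'Joker']
        what = ['Bang!', 'Splat!', 'Wham!']
        for name, sound in zip(who, what):
            db[name] = sound              # add 3 more "records"
        # keys and values come back as bytes in Python 3
        return sorted(db.keys()), len(db), 'Robin' in db, db['Joker']
    finally:
        db.close()                        # flush changes to disk


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as folder:
        print(build_movie_db(os.path.join(folder, 'movie')))
```

The keys are sorted here only to get a stable listing; as in the session above, a DBM file makes no promise about key order.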

Unlike normal dictionaries, though, the contents of file are retained between Python program runs. If we come back later and restart Python, our dictionary is still available. DBM files are like dictionaries that must be opened:

    % python
    >>> import anydbm
    >>> file = anydbm.open('movie', 'c')        # open existing DBM file
    >>> file['Batman']
    'Pow!'
    >>> file.keys()                             # keys gives an index list
    ['Joker', 'Robin', 'Cat-woman', 'Batman']
    >>> for key in file.keys(): print key, file[key]
    ...
    Joker Wham!
    Robin Bang!
    Cat-woman Splat!
    Batman Pow!
    >>> file['Batman'] = 'Ka-Boom!'             # change Batman slot
    >>> del file['Robin']                       # delete the Robin entry
    >>> file.close()                            # close it after changes
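To see the same persistence without restarting the interpreter, here is a minimal sketch that treats each open/close pair as a separate "run." It assumes Python 3's dbm module in place of anydbm; the snapshot and persistence_demo names are made up for this illustration:

```python
import dbm
import os
import tempfile


def snapshot(path):
    """Reopen the DBM file at path and return its contents as a plain dict."""
    db = dbm.open(path, 'c')
    try:
        return {key: db[key] for key in db.keys()}
    finally:
        db.close()


def persistence_demo(path):
    """Write, close, reopen, change, and reopen again; return final contents."""
    db = dbm.open(path, 'c')
    db['Batman'] = 'Pow!'
    db['Robin'] = 'Bang!'
    db.close()                            # first "run" ends here

    db = dbm.open(path, 'c')              # second "run": the data is still there
    db['Batman'] = 'Ka-Boom!'             # change an existing entry
    del db['Robin']                       # delete an entry
    db.close()

    return snapshot(path)                 # third "run": changes were retained


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as folder:
        print(persistence_demo(os.path.join(folder, 'movie')))
```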

Apart from having to import the interface and open and close the DBM file, Python programs don't have to know anything about DBM itself. DBM modules achieve this integration by overloading the indexing operations and routing them to more primitive library tools. But you'd never know that from looking at this Python code: DBM files look like normal Python dictionaries, stored on external files. Changes made to them are retained indefinitely:

    % python
    >>> import anydbm                           # open DBM file again
    >>> file = anydbm.open('movie', 'c')
    >>> for key in file.keys(): print key, file[key]
    ...
    Joker Wham!
    Cat-woman Splat!
    Batman Ka-Boom!
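To make the idea of overloading index operations and routing them to lower-level storage calls a bit more concrete, here is a toy sketch. The KeyedFile class and its JSON file format are hypothetical, invented purely for illustration; real DBM modules rely on hashed file structures and do not rewrite the whole file on every store:

```python
import json
import os
import tempfile


class KeyedFile:
    """Toy dictionary-like object backed by a JSON file (not a real DBM)."""

    def __init__(self, path):
        self.path = path
        if not os.path.exists(path):
            self._save({})                # start with an empty store

    def _load(self):
        with open(self.path) as stream:
            return json.load(stream)

    def _save(self, table):
        with open(self.path, 'w') as stream:
            json.dump(table, stream)

    def __getitem__(self, key):           # runs for: obj[key]
        return self._load()[key]

    def __setitem__(self, key, value):    # runs for: obj[key] = value
        table = self._load()
        table[key] = value
        self._save(table)

    def __delitem__(self, key):           # runs for: del obj[key]
        table = self._load()
        del table[key]
        self._save(table)

    def __len__(self):                    # runs for: len(obj)
        return len(self._load())

    def keys(self):
        return list(self._load())


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as folder:
        movie = KeyedFile(os.path.join(folder, 'movie.json'))
        movie['Batman'] = 'Pow!'          # looks like a dictionary store
        print(movie['Batman'], len(movie), movie.keys())
```

Client code indexes and assigns as though movie were a dictionary, while every operation actually goes through the file.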

As you can see, this is about as simple as it can be. Table 19-1 lists the most commonly used DBM file operations. Once such a file is opened, it is processed just as though it were an in-memory Python dictionary. Items are fetched by indexing the file object by key and are stored by assigning to a key.

Table 19-1. DBM file operations

Python code                           Action   Description
------------------------------------  -------  -------------------------------------------------
import anydbm                         Import   Get dbm, gdbm, and so on... whatever is installed
file = anydbm.open('filename', 'c')   Open     Create or open an existing DBM file
file['key'] = 'value'                 Store    Create or change the entry for key
value = file['key']                   Fetch    Load the value for the entry key
count = len(file)                     Size     Return the number of entries stored
index = file.keys()                   Index    Fetch the stored keys list
found = file.has_key('key')           Query    See if there's an entry for key
del file['key']                       Delete   Remove the entry for key
file.close()                          Close    Manual close, not always needed


Despite the dictionary-like interface, DBM files really do map to one or more external files. For instance, a classic dbm (ndbm-style) interface typically writes two files, movie.dir and movie.pag, when a DBM file called movie is made, while gdbm normally keeps everything in a single file. If your Python was built with a different underlying keyed-file interface, different external files might show up on your computer.
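If you are curious which external files your own interface produces, a quick experiment like the following can show them. This sketch assumes Python 3 and deliberately uses dbm.dumb (the descendant of the all-Python dumbdbm) so that the result does not depend on which C libraries your Python was built with; the files_created helper is a made-up name:

```python
import dbm.dumb
import os
import tempfile


def files_created(folder, name='movie'):
    """Create a dbm.dumb file called name in folder; list what appears on disk."""
    db = dbm.dumb.open(os.path.join(folder, name), 'c')
    db['Batman'] = 'Pow!'
    db.close()                            # make sure everything is written out
    return sorted(os.listdir(folder))


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as folder:
        print(files_created(folder))      # every name starts with 'movie'
```

With dbm.dumb you should see files such as movie.dat and movie.dir; other interfaces will leave different names behind.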

Technically, the module anydbm is really an interface to whatever DBM-like filesystem you have available in your Python. When creating a new file, anydbm today tries to load the dbhash, gdbm, and dbm keyed-file interface modules; Pythons without any of these automatically fall back on an all-Python implementation called dumbdbm. When opening an already existing DBM file, anydbm tries to determine the system that created it with the whichdb module instead. You normally don't need to care about any of this, though (unless you delete the files your DBM creates).
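The detection step can be tried directly. The sketch below assumes Python 3, where the whichdb module became the dbm.whichdb function and dumbdbm became dbm.dumb; the detect_format helper is a hypothetical name for this illustration. It creates a file with one specific interface and then lets the generic open call work out which module to use:

```python
import dbm
import dbm.dumb
import os
import tempfile


def detect_format(folder):
    """Create a dbm.dumb file, then ask dbm.whichdb which module made it."""
    path = os.path.join(folder, 'movie')
    before = dbm.whichdb(path)            # None: no such file yet

    db = dbm.dumb.open(path, 'c')         # create with one specific interface
    db['Batman'] = 'Pow!'
    db.close()
    after = dbm.whichdb(path)             # module name, as a string

    db = dbm.open(path, 'r')              # generic open picks the right module
    try:
        value = db['Batman']
    finally:
        db.close()
    return before, after, value


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as folder:
        print(detect_format(folder))
```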

Note that DBM files may or may not need to be explicitly closed, per the last entry in Table 19-1. Some DBM files don't require a close call, but some depend on it to flush changes out to disk. On such systems, your file may be corrupted if you omit the close call. Unfortunately, the default DBM as of the 1.5.2 Windows Python port, dbhash (a.k.a. bsddb), is one of the DBM systems that requires a close call to avoid data loss. As a rule of thumb, always close your DBM files explicitly after making changes and before your program exits, to avoid potential problems. This rule extends by proxy to shelves, which is a topic we'll meet later in this chapter.
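One way to follow this rule of thumb is to let Python run the close call for you, even if an exception occurs partway through. The following is a sketch under the assumption of Python 3's dbm module; store_safely is an invented helper name, and contextlib.closing is used because it works with any object that has a close method:

```python
import contextlib
import dbm
import os
import tempfile


def store_safely(path, records):
    """Store records and guarantee close() runs, even if a store fails."""
    with contextlib.closing(dbm.open(path, 'c')) as db:
        for key, value in records.items():
            db[key] = value
    # the file is closed at this point, so changes have been flushed

    with contextlib.closing(dbm.open(path, 'r')) as db:
        return len(db)                    # count what actually reached the file


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, 'movie')
        print(store_safely(path, {'Batman': 'Pow!', 'Joker': 'Wham!'}))
```

A try/finally statement with an explicit close call in the finally block would serve equally well.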

In Python versions 1.5.2 and later, be sure to also pass a string 'c' as a second argument when calling anydbm.open, to force Python to create the file if it does not yet exist, and to simply open it otherwise. This used to be the default behavior but is no longer. You do not need the 'c' argument when opening shelves, discussed ahead; they still use an "open or create" mode by default if passed no open mode argument. Other open mode strings can be passed to anydbm (e.g., 'n' to always create the file and 'r' for read-only, the new default); see the library reference manuals for more details.
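The effect of the mode strings is easy to check by experiment. Here is a minimal sketch, again assuming Python 3's dbm module rather than anydbm; open_mode_demo and the keys of its result dictionary are names invented for this illustration:

```python
import dbm
import os
import tempfile


def open_mode_demo(folder):
    """Show how 'r', 'c', and 'n' treat missing and existing DBM files."""
    path = os.path.join(folder, 'movie')
    results = {}

    try:
        dbm.open(path, 'r')               # read-only: the file must already exist
        results['r_on_missing'] = 'opened'
    except dbm.error:
        results['r_on_missing'] = 'error'

    db = dbm.open(path, 'c')              # create if missing, else open as is
    db['Batman'] = 'Pow!'
    db.close()

    db = dbm.open(path, 'c')              # existing data survives a 'c' open
    results['after_c'] = len(db)
    db.close()

    db = dbm.open(path, 'n')              # always start with a new, empty file
    results['after_n'] = len(db)
    db.close()
    return results


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as folder:
        print(open_mode_demo(folder))
```

The main thing to remember is that 'n' silently discards whatever was stored before, so 'c' is the safer choice for files you intend to keep.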





Programming Python
ISBN: 0596009259
EAN: 2147483647
Year: 2004
Pages: 270
Authors: Mark Lutz
