19.3. DBM Files
Flat files are handy for simple persistence tasks, but they are generally geared toward a sequential processing mode. Although it is possible to jump around to arbitrary locations with seek calls, flat files don't provide much structure to data beyond the notion of bytes and text lines.
DBM files, a standard tool in the Python library for database management, improve on that by providing key-based access to stored text strings. They implement a random-access, single-key view on stored data. For instance, information related to objects can be stored in a DBM file using a unique key per object and later can be fetched back directly with the same key. DBM files are implemented by a variety of underlying modules (including one coded in Python), but if you have Python, you have a DBM.
19.3.1. Using DBM Files
Although DBM filesystems have to do a bit of work to map chunks of stored data to keys for fast retrieval (technically, they generally use a technique called hashing to store data in files), your scripts don't need to care about the action going on behind the scenes. In fact, DBM is one of the easiest ways to save information in PythonDBM files behave so much like in-memory dictionaries that you may forget you're actually dealing with a file. For instance, given a DBM file object:
DBM file objects also support common dictionary methods such as keys-list fetches and tests and key deletions. The DBM library itself is hidden behind this simple model. Since it is so simple, let's jump right into an interactive example that creates a DBM file and shows how the interface works:
% python >>> import anydbm # get interface: dbm, gdbm, ndbm,.. >>> file = anydbm.open('movie', 'c') # make a DBM file called 'movie' >>> file['Batman'] = 'Pow!' # store a string under key 'Batman' >>> file.keys( ) # get the file's key directory ['Batman'] >>> file['Batman'] # fetch value for key 'Batman' 'Pow!' >>> who = ['Robin', 'Cat-woman', 'Joker'] >>> what = ['Bang!', 'Splat!', 'Wham!'] >>> for i in range(len(who)): ... file[who[i]] = what[i] # add 3 more "records" ... >>> file.keys( ) ['Joker', 'Robin', 'Cat-woman', 'Batman'] >>> len(file), file.has_key('Robin'), file['Joker'] (4, 1, 'Wham!') >>> file.close( ) # close sometimes required
Internally, importing anydbm automatically loads whatever DBM interface is available in your Python interpreter, and opening the new DBM file creates one or more external files with names that start with the string 'movie' (more on the details in a moment). But after the import and open, a DBM file is virtually indistinguishable from a dictionary. In effect, the object called file here can be thought of as a dictionary mapped to an external file called movie.
Unlike normal dictionaries, though, the contents of file are retained between Python program runs. If we come back later and restart Python, our dictionary is still available. DBM files are like dictionaries that must be opened:
% python >>> import anydbm >>> file = anydbm.open('movie', 'c') # open existing DBM file >>> file['Batman'] 'Pow!' >>> file.keys( ) # keys gives an index list ['Joker', 'Robin', 'Cat-woman', 'Batman'] >>> for key in file.keys( ): print key, file[key] ... Joker Wham! Robin Bang! Cat-woman Splat! Batman Pow! >>> file['Batman'] = 'Ka-Boom!' # change Batman slot >>> del file['Robin'] # delete the Robin entry >>> file.close( ) # close it after changes
Apart from having to import the interface and open and close the DBM file, Python programs don't have to know anything about DBM itself. DBM modules achieve this integration by overloading the indexing operations and routing them to more primitive library tools. But you'd never know that from looking at this Python codeDBM files look like normal Python dictionaries, stored on external files. Changes made to them are retained indefinitely:
% python >>> import anydbm # open DBM file again >>> file = anydbm.open('movie', 'c') >>> for key in file.keys( ): print key, file[key] ... Joker Wham! Cat-woman Splat! Batman Ka-Boom!
As you can see, this is about as simple as it can be. Table 19-1 lists the most commonly used DBM file operations. Once such a file is opened, it is processed just as though it were an in-memory Python dictionary. Items are fetched by indexing the file object by key and are stored by assigning to a key.
Despite the dictionary-like interface, DBM files really do map to one or more external files. For instance, the underlying gdbm interface writes two files, movie.dir and movie.pag, when a GDBM file called movie is made. If your Python was built with a different underlying keyed-file interface, different external files might show up on your computer.
Technically, the module anydbm is really an interface to whatever DBM-like filesystem you have available in your Python. When creating a new file, anydbm today tries to load the dbhash, gdbm, and dbm keyed-file interface modules; Pythons without any of these automatically fall back on an all-Python implementation called dumbdbm. When opening an already existing DBM file, anydbm tries to determine the system that created it with the whichdb module instead. You normally don't need to care about any of this, though (unless you delete the files your DBM creates).
Note that DBM files may or may not need to be explicitly closed, per the last entry in Table 19-1. Some DBM files don't require a close call, but some depend on it to flush changes out to disk. On such systems, your file may be corrupted if you omit the close call. Unfortunately, the default DBM as of the 1.5.2 Windows Python port, dbhash (a.k.a. bsddb), is one of the DBM systems that requires a close call to avoid data loss. As a rule of thumb, always close your DBM files explicitly after making changes and before your program exits, to avoid potential problems. This rule extends by proxy to shelves, which is a topic we'll meet later in this chapter.