19.5. Shelve Files

Pickling allows you to store arbitrary objects on files and file-like objects, but it's still a fairly unstructured medium; it doesn't directly support easy access to members of collections of pickled objects. Higher-level structures can be added, but they are not inherent:

You can sometimes craft your own higher-level pickle file organizations with the underlying filesystem (e.g., you can store each pickled object in a file whose name uniquely identifies the object), but such an organization is not part of pickling itself and must be manually managed.
You can also store arbitrarily large dictionaries in a pickled file and index them by key after they are loaded back into memory, but this will load the entire dictionary all at once when unpickled, not just the entry you are interested in.

Shelves provide structure to collections of pickled objects that removes some of these constraints. They are a type of file that stores arbitrary Python objects by key for later retrieval, and they are a standard part of the Python system. Really, they are not much of a new topic; shelves are simply a combination of DBM files and object pickling:

To store an in-memory object by key, the shelve module first serializes the object to a string with the pickle module, and then it stores that string in a DBM file by key with the anydbm module.
To fetch an object back by key, the shelve module first loads the object's serialized string by key from a DBM file with the anydbm module, and then converts it back to the original in-memory object with the pickle module.
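The two steps above can be sketched by hand in a few lines. This is only an illustration of the idea, not the shelve module's actual source. It is written in Python 3 syntax, where the anydbm module is named dbm, and the helper names store and fetch and the file name are made up for the example:

```python
import dbm        # Python 3 name for the anydbm interface
import os
import pickle
import tempfile

def store(path, key, obj):
    """Pickle obj and save the resulting byte string under key in a DBM file."""
    db = dbm.open(path, 'c')                 # 'c': create the file if needed
    try:
        db[key.encode('utf-8')] = pickle.dumps(obj)
    finally:
        db.close()

def fetch(path, key):
    """Load the byte string stored under key and unpickle it into an object."""
    db = dbm.open(path, 'c')
    try:
        return pickle.loads(db[key.encode('utf-8')])
    finally:
        db.close()

path = os.path.join(tempfile.mkdtemp(), 'handmade')   # hypothetical file name
record = {'name': 'Brian', 'motto': ['The', 'bright', ('side', 'of'), ['life']]}
store(path, 'brian', record)
copy = fetch(path, 'brian')                  # an equal but distinct object
```

The shelve module wraps this same pairing behind a dictionary-like interface, so your code never has to call pickle or the DBM module itself.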

Because shelve uses pickle internally, it can store any object that pickle can: strings, numbers, lists, dictionaries, cyclic objects, class instances, and more.

19.5.1. Using Shelves

In other words, shelve is just a go-between; it serializes and deserializes objects so that they can be placed in DBM files. The net effect is that shelves let you store nearly arbitrary Python objects on a file by key and fetch them back later with the same key.

Your scripts never see all of this interfacing, though. Like DBM files, shelves provide an interface that looks like a dictionary that must be opened. In fact, a shelve is simply a persistent dictionary of persistent Python objects: the shelve dictionary's content is automatically mapped to a file on your computer so that it is retained between program runs. This is quite a trick, but it's simpler in your code than it may sound. To gain access to a shelve, import the module and open your file:

 import shelve
 dbase = shelve.open("mydbase")

Internally, Python opens a DBM file with the name mydbase, or creates it if it does not yet exist. Assigning to a shelve key stores an object:

 dbase['key'] = object

Internally, this assignment converts the object to a serialized byte stream and stores it by key on a DBM file. Indexing a shelve fetches a stored object:

 value = dbase['key']

Internally, this index operation loads a string by key from a DBM file and unpickles it into an in-memory object that is the same as the object originally stored. Most dictionary operations are supported here too:

 len(dbase)        # number of items stored
 dbase.keys()      # stored item key index

And except for a few fine points, that's really all there is to using a shelve. Shelves are processed with normal Python dictionary syntax, so there is no new database API to learn. Moreover, objects stored and fetched from shelves are normal Python objects; they do not need to be instances of special classes or types to be stored away. That is, Python's persistence system is external to the persistent objects themselves. Table 19-2 summarizes these and other commonly used shelve operations.

Table 19-2. Shelve file operations
Python code	Action	Description
`import shelve`	Import	Get `dbm`, `gdbm`, and so on...whatever is installed
`file = shelve.open('filename')`	Open	Create or open an existing DBM file
`file['key'] = anyvalue`	Store	Create or change the entry for `key`
`value = file['key']`	Fetch	Load the value for the entry `key`
`count = len(file)`	Size	Return the number of entries stored
`index = file.keys()`	Index	Fetch the stored keys list
`found = file.has_key('key')`	Query	See if there's an entry for `key`
`del file['key']`	Delete	Remove the entry for `key`
`file.close()`	Close	Manual close, not always needed

Because shelves export a dictionary-like interface too, this table is almost identical to the DBM operation table. Here, though, the module name anydbm is replaced by shelve, open calls do not require a second 'c' argument, and stored values can be nearly arbitrary kinds of objects, not just strings. To be safe, you should still close shelves explicitly after making changes; shelves use anydbm internally, and some underlying DBMs require closes to avoid data loss or damage.
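For readers on a newer Python, here is a minimal sketch of the operations in Table 19-2 in Python 3 syntax. It assumes a Python recent enough for shelves to work in a with statement, which closes the shelve on exit; the scratch directory and file name are made up for the example:

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'mydbase')    # hypothetical location

with shelve.open(path) as dbase:          # create or open; closed on exit
    dbase['brian'] = {'name': 'Brian', 'age': 33}
    dbase['knight'] = {'name': 'Knight', 'motto': 'Ni!'}

with shelve.open(path) as dbase:          # reopen, as a later run would
    count = len(dbase)                    # size
    keys = sorted(dbase.keys())           # index; key order is not guaranteed
    found = 'brian' in dbase              # query, the Python 3 form of has_key
    knight = dbase['knight']              # fetch
    del dbase['brian']                    # delete
    remaining = list(dbase.keys())
```

Nothing here is specific to shelves except the open call; the rest is ordinary dictionary code.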

19.5.2. Storing Built-In Object Types in Shelves

Let's run an interactive session to experiment with shelve interfaces. As mentioned, shelves are essentially just persistent dictionaries of objects, which you open and close:

 % python
 >>> import shelve
 >>> dbase = shelve.open("mydbase")
 >>> object1 = ['The', 'bright', ('side', 'of'), ['life']]
 >>> object2 = {'name': 'Brian', 'age': 33, 'motto': object1}
 >>> dbase['brian']  = object2
 >>> dbase['knight'] = {'name': 'Knight', 'motto': 'Ni!'}
 >>> dbase.close()

Here, we open a shelve and store two fairly complex dictionary and list data structures away permanently by simply assigning them to shelve keys. Because shelve uses pickle internally, almost anything goes here; the trees of nested objects are automatically serialized into strings for storage. To fetch them back, just reopen the shelve and index:

 % python
 >>> import shelve
 >>> dbase = shelve.open("mydbase")
 >>> len(dbase)                             # entries
 2
 >>> dbase.keys()                           # index
 ['knight', 'brian']
 >>> dbase['knight']                        # fetch
 {'motto': 'Ni!', 'name': 'Knight'}
 >>> for row in dbase.keys():
 ...     print row, '=>'
 ...     for field in dbase[row].keys():
 ...         print '  ', field, '=', dbase[row][field]
 ...
 knight =>
    motto = Ni!
    name = Knight
 brian =>
    motto = ['The', 'bright', ('side', 'of'), ['life']]
    age = 33
    name = Brian

The nested loops at the end of this session step through nested dictionaries: the outer scans the shelve and the inner scans the objects stored in the shelve. The crucial point to notice is that we're using normal Python syntax, both to store and to fetch these persistent objects, as well as to process them after loading.

19.5.3. Storing Class Instances in Shelves

One of the more useful kinds of objects to store in a shelve is a class instance. Because its attributes record state and its inherited methods define behavior, persistent class objects effectively serve the roles of both database records and database-processing programs. We can also use the underlying pickle module to serialize instances to flat files and other file-like objects (e.g., trusted network sockets), but the higher-level shelve module also gives us a convenient keyed-access storage medium. For instance, consider the simple class shown in Example 19-2, which is used to model people.

Example 19-2. PP3E\Dbase\person.py (version 1)

 # a person object: fields + behavior
 class Person:
     def __init__(self, name, job, pay=0):
         self.name = name
         self.job  = job
         self.pay  = pay               # real instance data
     def tax(self):
         return self.pay * 0.25        # computed on call
     def info(self):
         return self.name, self.job, self.pay, self.tax()

Nothing about this class suggests it will be used for database records; it can be imported and used independent of external storage. It's easy to use it for a database, though: we can make some persistent objects from this class by simply creating instances as usual, and then storing them by key on an opened shelve:

 C:\...\PP3E\Dbase>python
 >>> from person import Person
 >>> bob   = Person('bob', 'psychologist', 70000)
 >>> emily = Person('emily', 'teacher', 40000)
 >>>
 >>> import shelve
 >>> dbase = shelve.open('cast')          # make new shelve
 >>> for obj in (bob, emily):             # store objects
 ...     dbase[obj.name] = obj            # use name for key
 ...
 >>> dbase.close()                        # need for bsddb

Here we used the instance objects' name attribute as their key in the shelve database. When we come back and fetch these objects in a later Python session or script, they are re-created in memory as they were when they were stored:

 C:\...\PP3E\Dbase>python
 >>> import shelve
 >>> dbase = shelve.open('cast')          # reopen shelve
 >>>
 >>> dbase.keys()                         # both objects are here
 ['emily', 'bob']
 >>> print dbase['emily']
 <person.Person instance at 799940>
 >>>
 >>> print dbase['bob'].tax()             # call: bob's tax
 17500.0

Notice that calling Bob's tax method works even though we didn't import the Person class here. Python is smart enough to link this object back to its original class when unpickled, such that all the original methods are available through fetched objects.

19.5.4. Changing Classes of Objects Stored in Shelves

Technically, Python reimports a class to re-create its stored instances as they are fetched and unpickled. Here's how this works:

Store: When Python pickles a class instance to store it in a shelve, it saves the instance's attributes plus a reference to the instance's class. In effect, pickled class instances in the prior example record the self attributes assigned in the class. Really, Python serializes and stores the instance's __dict__ attribute dictionary along with enough source file information to be able to locate the class's module later.
Fetch: When Python unpickles a class instance fetched from a shelve, it re-creates the instance object in memory by reimporting the class, assigning the saved attribute dictionary to a new empty instance, and linking the instance back to the class.

The key point in this is that the class and stored instance data are separate. The class itself is not stored with its instances, but is instead located in the Python source file and reimported later when instances are fetched.
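To see this separation for yourself, you can inspect the bytes that pickle produces for an instance. The following is a rough sketch in Python 3 syntax, not part of the book's examples; it writes a small class to a scratch module first so that the class is importable, and the module name person_fields is made up for the demonstration:

```python
import importlib
import os
import pickle
import sys
import tempfile

# The class must live in an importable module, so write one to a scratch
# directory first (person_fields is a made-up module name for this sketch).
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, 'person_fields.py'), 'w') as f:
    f.write(
        "class Person:\n"
        "    def __init__(self, name, job, pay=0):\n"
        "        self.name, self.job, self.pay = name, job, pay\n"
        "    def tax(self):\n"
        "        return self.pay * 0.25\n"
    )
sys.path.insert(0, workdir)
importlib.invalidate_caches()
from person_fields import Person

data = pickle.dumps(Person('bob', 'psychologist', 70000))

has_module = b'person_fields' in data     # the module name is recorded
has_class = b'Person' in data             # the class name is recorded
has_fields = b'name' in data and b'psychologist' in data   # instance state
has_method = b'tax' in data               # method names and code are not
```

Only the first three checks come out true: the pickled data names the module and class and carries the instance's attributes, but nothing about the tax method travels with it.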

The upshot is that by modifying external classes in module files, we can change the way stored objects' data is interpreted and used without actually having to change those stored objects. It's as if the class is a program that processes stored records.

To illustrate, suppose the Person class from the previous section was changed to the source code in Example 19-3.

Example 19-3. PP3E\Dbase\person.py (version 2)

 # a person object: fields + behavior
 # change: the tax method is now a computed attribute
 class Person:
     def __init__(self, name, job, pay=0):
         self.name = name
         self.job  = job
         self.pay  = pay                  # real instance data
     def __getattr__(self, attr):         # on person.attr
         if attr == 'tax':
             return self.pay * 0.30       # computed on access
         else:
             raise AttributeError         # other unknown names
     def info(self):
         return self.name, self.job, self.pay, self.tax

This revision has a new tax rate (30 percent), introduces a __getattr__ qualification overload method, and deletes the original tax method. Tax attribute references are intercepted and computed when accessed:

 C:\...\PP3E\Dbase>python
 >>> import shelve
 >>> dbase = shelve.open('cast')        # reopen shelve
 >>>
 >>> print dbase.keys()                 # both objects are here
 ['emily', 'bob']
 >>> print dbase['emily']
 <person.Person instance at 79aea0>
 >>>
 >>> print dbase['bob'].tax             # no need to call tax()
 21000.0

Because the class has changed, tax is now simply qualified, not called. In addition, because the tax rate was changed in the class, Bob pays more this time around. Of course, this example is artificial, but when used well, this separation of classes and persistent instances can eliminate many traditional database update programs. In most cases, you can simply change the class, not each stored instance, for new behavior.

19.5.5. Shelve Constraints

Although shelves are generally straightforward to use, there are a few rough edges worth knowing about.

19.5.5.1. Keys must be strings

First, although shelves can store arbitrary objects, their keys must still be strings. The following fails, unless you convert the integer 42 to the string '42' manually first:

 dbase[42] = value      # fails, but str(42) will work

This is different from in-memory dictionaries, which allow any immutable object to be used as a key, and derives from the shelve's use of DBM files internally.
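A short sketch of this constraint and its workaround, in Python 3 syntax with a made-up file name. The exception raised for a nonstring key has differed across Python versions, so the code below catches both of the likely types rather than assuming one:

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'keysdemo')   # hypothetical file name

with shelve.open(path) as dbase:
    try:
        dbase[42] = 'value'               # nonstring key is rejected
        int_key_failed = False
    except (TypeError, AttributeError):   # type has varied by Python version
        int_key_failed = True

    dbase[str(42)] = 'value'              # convert the key by hand instead
    fetched = dbase['42']
    stored_keys = list(dbase.keys())      # only the string key was stored
```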

19.5.5.2. Objects are unique only within a key

Although the shelve module is smart enough to detect multiple occurrences of a nested object and re-create only one copy when fetched, this holds true only within a given slot:

 dbase[key] = [object, object]    # OK: only one copy stored and fetched
 dbase[key1] = object
 dbase[key2] = object             # bad?: two copies of object in the shelve

When key1 and key2 are fetched, they reference independent copies of the original shared object; if that object is mutable, changes from one won't be reflected in the other. This stems from the fact that each key assignment runs an independent pickle operation: the pickler detects repeated objects, but only within each pickle call. This may or may not be a concern in practice, and it can be avoided with extra support logic, but an object can be duplicated if it spans keys.
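The difference is easy to observe with the is operator. This is a minimal sketch in Python 3 syntax, using a plain list as the shared object and a made-up file name:

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'sharing')    # hypothetical file name
shared = ['spam']

with shelve.open(path) as dbase:
    dbase['pair'] = [shared, shared]      # one pickle call: sharing is recorded
    dbase['key1'] = shared                # two separate pickle calls:
    dbase['key2'] = shared                # the object is stored twice

with shelve.open(path) as dbase:
    pair = dbase['pair']
    one, two = dbase['key1'], dbase['key2']

same_within_key = pair[0] is pair[1]      # still a single object
same_across_keys = one is two             # independent copies
one.append('eggs')                        # so this change is not seen in two
```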

19.5.5.3. Updates must treat shelves as fetch-modify-store mappings

Because objects fetched from a shelve don't know that they came from a shelve, operations that change components of a fetched object change only the in-memory copy, not the data on a shelve:

 dbase[key].attr = value   # shelve unchanged

To really change an object stored on a shelve, fetch it into memory, change its parts, and then write it back to the shelve as a whole by key assignment:

 object = dbase[key]       # fetch it
 object.attr = value       # modify it
 dbase[key] = object       # store back: shelve changed
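Here is the same point as a runnable sketch in Python 3 syntax, with a dictionary standing in for the stored object and a made-up file name. The first update is silently lost; the second follows the fetch-modify-store pattern and sticks:

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'updates')    # hypothetical file name

with shelve.open(path) as dbase:
    dbase['bob'] = {'job': 'psychologist', 'pay': 70000}

    dbase['bob']['pay'] = 99999           # changes a temporary in-memory copy
    after_in_place = dbase['bob']['pay']  # the stored record is unchanged

    record = dbase['bob']                 # fetch it
    record['pay'] = 80000                 # modify it
    dbase['bob'] = record                 # store it back: shelve changed
    after_store_back = dbase['bob']['pay']
```

The standard library's shelve.open also accepts a writeback=True argument, which caches fetched objects and writes them back when the shelve is synced or closed, at some cost in memory and close time. The explicit pattern above keeps the write visible in your code.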

19.5.5.4. Concurrent updates are not directly supported

The shelve module does not currently support simultaneous updates. Simultaneous readers are OK, but writers must be given exclusive access to the shelve. You can trash a shelve if multiple processes write to it at the same time, which is a real possibility in programs such as Common Gateway Interface (CGI) server-side scripts. If your shelves may be hit by multiple processes, be sure to wrap updates in calls to tools such as fcntl.flock or os.open to lock files and provide exclusive access.
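One way to arrange such locking is sketched below in Python 3 syntax. It is only one possible scheme, not something shelve provides: the locked_shelve helper and the separate .lock file are assumptions made for this example, fcntl is available on Unix-like platforms only, and the lock is advisory, so it protects the shelve only if every process that touches it goes through the same helper:

```python
import fcntl      # Unix-only module
import os
import shelve
import tempfile
from contextlib import contextmanager

@contextmanager
def locked_shelve(path):
    """Open the shelve at path while holding an exclusive lock on path + '.lock'."""
    with open(path + '.lock', 'w') as lockfile:
        fcntl.flock(lockfile, fcntl.LOCK_EX)      # blocks until the lock is free
        try:
            dbase = shelve.open(path)
            try:
                yield dbase
            finally:
                dbase.close()                     # flush changes before unlocking
        finally:
            fcntl.flock(lockfile, fcntl.LOCK_UN)

path = os.path.join(tempfile.mkdtemp(), 'counter')    # hypothetical file name
for _ in range(3):
    with locked_shelve(path) as dbase:
        dbase['hits'] = dbase.get('hits', 0) + 1  # fetch, modify, store

with locked_shelve(path) as dbase:
    hits = dbase['hits']
```

Locking a companion file rather than the shelve itself sidesteps the fact that a DBM system may keep its data in more than one file, or under a different name than the one you pass to open.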

19.5.5.5. Underlying DBM format portability

With shelves, the files created by an underlying DBM system used to store your persistent objects are not necessarily compatible with all possible DBM implementations or Pythons. For instance, a file generated by gdbm on Linux, or by the BSD library on Windows, may not be readable by a Python with other DBM modules installed.

Technically, when a DBM file (or by proxy, a shelve) is created, the anydbm module tries to import all possible DBM system modules in a predefined order and uses the first that it finds. When anydbm later opens an existing file, it attempts to determine which DBM system created it by inspecting the file(s) using the module whichdb. Because the BSD system is tried first at file creation time and is available on both Windows and many Unix-like systems, your DBM file is portable as long as your Pythons support BSD on both platforms. If the system used to create a DBM file is not available on the underlying platform, though, the DBM file cannot be used.
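You can ask which DBM system created a given shelve yourself. The sketch below uses Python 3, where both anydbm and whichdb were folded into the dbm package and the check is spelled dbm.whichdb; the file name is made up, and the result depends entirely on which DBM modules your Python has installed, so no particular output is shown:

```python
import dbm        # Python 3 home of the anydbm and whichdb functionality
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'portable')   # hypothetical file name

with shelve.open(path) as dbase:          # created by the first DBM module found
    dbase['key'] = 'value'

kind = dbm.whichdb(path)                  # name of the module that made the file
print('created by:', kind)                # result depends on the installation
```

Running the same check on the machine that will read the file is a quick way to find out whether a shelve will travel.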

If DBM file portability is a concern, make sure that all the Pythons that will read your data use compatible DBM modules. If that is not an option, use the pickle module directly and flat files for storage, or use the ZODB system we'll meet later in this chapter.

19.5.6. Pickled Class Constraints

In addition to these shelve constraints, storing class instances in a shelve adds a set of additional rules you need to be aware of. Really, these are imposed by the pickle module, not by shelve, so be sure to follow these if you store class objects with pickle directly too:

Classes must be importable

The Python pickler stores instance attributes only when pickling an instance object, and it reimports the class later to re-create the instance. Because of that, the classes of stored objects must be importable when objects are unpickled: they must be coded unnested at the top level of a module file that is accessible on the module import search path at load time (e.g., named in PYTHONPATH or in a .pth file).

Further, they must be associated with a real module when instances are pickled, not with a top-level script (with the module name __main__), unless they will only ever be used in the top-level script. You need to be careful about moving class modules after instances are stored. When an instance is unpickled, Python must find its class's module on the module search path using the original module name (including any package path prefixes) and fetch the class from that module using the original class name. If the module or class has been moved or renamed, it might not be found.

In applications where pickled objects are shipped over network sockets, it's possible to deal with this constraint by shipping the text of the class along with stored instances; recipients may simply store the class in a local module file on the import search path prior to unpickling received instances. Where this is inconvenient, simpler objects such as lists and dictionaries with nesting may be transferred instead.
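The importability rule can be demonstrated by pickling an instance and then taking its module away. This is a sketch in Python 3 syntax that uses pickle directly; the module name person_moved is made up, and removing the scratch directory from the search path stands in for moving or renaming the module file:

```python
import importlib
import os
import pickle
import sys
import tempfile

# Write a throwaway module so the class is importable (person_moved is a
# made-up module name for this sketch).
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, 'person_moved.py'), 'w') as f:
    f.write(
        "class Person:\n"
        "    def __init__(self, name):\n"
        "        self.name = name\n"
    )
sys.path.insert(0, workdir)
importlib.invalidate_caches()
import person_moved

data = pickle.dumps(person_moved.Person('bob'))
name_before = pickle.loads(data).name     # works: the module is importable

# Simulate moving the module away after instances were stored.
sys.path.remove(workdir)
del sys.modules['person_moved']
importlib.invalidate_caches()
try:
    pickle.loads(data)
    load_failed = False
except ImportError:                       # the class's module can't be found
    load_failed = True
```

The pickled bytes are untouched in both cases; only the availability of the class's module decides whether they can be turned back into an object.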

Class changes must be backward compatible

Although Python lets you change a class while instances of it are stored on a shelve, those changes must be backward compatible with the objects already stored. For instance, you cannot change the class to expect an attribute not associated with already stored persistent instances unless you first manually update those stored instances or provide extra conversion protocols on the class.
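As one example of such a conversion protocol, a class can define the pickle module's __setstate__ hook to fill in attributes that older stored instances lack. The sketch below is in Python 3 syntax and rests on several assumptions made for illustration: the email attribute, the contact method, and the person_compat module name are all invented, and an "old" record is simulated by building an instance without running the new constructor:

```python
import importlib
import os
import pickle
import sys
import tempfile

# New version of the class: it now expects an email attribute that older
# stored instances lack (person_compat is a made-up module name).
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, 'person_compat.py'), 'w') as f:
    f.write(
        "class Person:\n"
        "    def __init__(self, name, email):\n"
        "        self.name = name\n"
        "        self.email = email\n"
        "    def __setstate__(self, state):\n"
        "        state.setdefault('email', None)   # fill in for old records\n"
        "        self.__dict__.update(state)\n"
        "    def contact(self):\n"
        "        return self.email or 'unknown'\n"
    )
sys.path.insert(0, workdir)
importlib.invalidate_caches()
from person_compat import Person

# Simulate a record pickled before the change: no email attribute.
old = Person.__new__(Person)              # bypass __init__
old.name = 'bob'
data = pickle.dumps(old)

fetched = pickle.loads(data)              # __setstate__ supplies the default
contact = fetched.contact()
```

Without the __setstate__ method, the contact call on the fetched object would fail with an AttributeError, because the stored attribute dictionary has no email entry.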

Other pickle module constraints

Shelves also inherit the pickling systems' nonclass limitations. As discussed earlier, some types of objects (e.g., open files and sockets) cannot be pickled, and thus cannot be stored in a shelve.
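For instance, trying to shelve an open file object fails at pickling time, before anything is written under that key. A minimal sketch in Python 3 syntax, with made-up file names; storing the file's name instead is one simple stand-in:

```python
import os
import pickle
import shelve
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'nofiles')               # hypothetical file name

with open(os.path.join(workdir, 'data.txt'), 'w') as datafile:
    with shelve.open(path) as dbase:
        try:
            dbase['file'] = datafile      # open files can't be pickled
            stored = True
        except (TypeError, pickle.PicklingError):
            stored = False
        dbase['filename'] = datafile.name # store a picklable stand-in instead
        keys = list(dbase.keys())
```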

In a prior Python release, persistent object classes also had to either use constructors with no arguments or provide defaults for all constructor arguments (much like the notion of a C++ copy constructor). This constraint was dropped as of Python 1.5.2; classes with nondefaulted constructor arguments now work fine in the pickling system.^[*]

^[*] Subtle thing: internally, Python now avoids calling the class to re-create a pickled instance and instead simply makes a class object generically, inserts instance attributes, and sets the instance's __class__ pointer to the original class directly. This avoids the need for defaults, but it also means that the class's __init__ constructor is no longer called as objects are unpickled, unless you provide extra methods to force the call. See the library manual for more details, and see the pickle module's source code (pickle.py in the source library) if you're curious about how this works. Better yet, see the formtable module listed ahead in this chapter; it does something very similar with __class__ links to build an instance object from a class and dictionary of attributes, without calling the class's __init__ constructor. This makes constructor argument defaults unnecessary in classes used for records browsed by PyForm, but it's the same idea.
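The footnote's point about constructors can be checked directly. In this Python 3 sketch, a made-up module named person_init keeps a list of the names its constructor has seen; creating an instance adds to the list, but unpickling one does not:

```python
import importlib
import os
import pickle
import sys
import tempfile

# person_init is a made-up module name; calls records each constructor run.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, 'person_init.py'), 'w') as f:
    f.write(
        "calls = []\n"
        "class Person:\n"
        "    def __init__(self, name, pay):    # no argument defaults\n"
        "        calls.append(name)\n"
        "        self.name, self.pay = name, pay\n"
    )
sys.path.insert(0, workdir)
importlib.invalidate_caches()
import person_init

data = pickle.dumps(person_init.Person('bob', 70000))
calls_after_create = list(person_init.calls)   # constructor ran once

fetched = pickle.loads(data)                   # rebuilt without __init__
calls_after_load = list(person_init.calls)     # no new constructor call
```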

19.5.7. Other Shelve Limitations

Finally, although shelves store objects persistently, they are not really object-oriented database systems. Such systems also implement features such as automatic write-through on changes, transaction commits and rollbacks, safe concurrent updates, and object decomposition and delayed ("lazy") component fetches based on generated object ID. Parts of larger objects are loaded into memory only as they are accessed. It's possible to extend shelves to support such features manually, but you don't need to: the ZODB system provides an implementation of a more complete object-oriented database system. It is constructed on top of Python's built-in pickling persistence support, but it offers additional features for advanced data stores. For more on ZODB, let's move on to the next section.