19.5. Shelve FilesPickling allows you to store arbitrary objects on files and file-like objects, but it's still a fairly unstructured medium; it doesn't directly support easy access to members of collections of pickled objects. Higher-level structures can be added, but they are not inherent:
Shelves provide structure to collections of pickled objects that removes some of these constraints. They are a type of file that stores arbitrary Python objects by key for later retrieval, and they are a standard part of the Python system. Really, they are not much of a new topicshelves are simply a combination of DBM files and object pickling:
Because shelve uses pickle internally, it can store any object that pickle can: strings, numbers, lists, dictionaries, cyclic objects, class instances, and more. 19.5.1. Using ShelvesIn other words, shelve is just a go-between; it serializes and deserializes objects so that they can be placed in DBM files. The net effect is that shelves let you store nearly arbitrary Python objects on a file by key and fetch them back later with the same key. Your scripts never see all of this interfacing, though. Like DBM files, shelves provide an interface that looks like a dictionary that must be opened. In fact, a shelve is simply a persistent dictionary of persistent Python objectsthe shelve dictionary's content is automatically mapped to a file on your computer so that it is retained between program runs. This is quite a trick, but it's simpler to your code than it may sound. To gain access to a shelve, import the module and open your file: import shelve dbase = shelve.open("mydbase") Internally, Python opens a DBM file with the name mydbase, or creates it if it does not yet exist. Assigning to a shelve key stores an object: dbase['key'] = object Internally, this assignment converts the object to a serialized byte stream and stores it by key on a DBM file. Indexing a shelve fetches a stored object: value = dbase['key'] Internally, this index operation loads a string by key from a DBM file and unpickles it into an in-memory object that is the same as the object originally stored. Most dictionary operations are supported here too: len(dbase) # number of items stored dbase.keys( ) # stored item key index And except for a few fine points, that's really all there is to using a shelve. Shelves are processed with normal Python dictionary syntax, so there is no new database API to learn. Moreover, objects stored and fetched from shelves are normal Python objects; they do not need to be instances of special classes or types to be stored away. That is, Python's persistence system is external to the persistent objects themselves. Table 19-2 summarizes these and other commonly used shelve operations.
Because shelves export a dictionary-like interface too, this table is almost identical to the DBM operation table. Here, though, the module name anydbm is replaced by shelve, open calls do not require a second c argument, and stored values can be nearly arbitrary kinds of objects, not just strings. You still should close shelves explicitly after making changes to be safe, though; shelves use anydbm internally, and some underlying DBMs require closes to avoid data loss or damage. 19.5.2. Storing Built-In Object Types in ShelvesLet's run an interactive session to experiment with shelve interfaces. As mentioned, shelves are essentially just persistent dictionaries of objects, which you open and close: % python >>> import shelve >>> dbase = shelve.open("mydbase") >>> object1 = ['The', 'bright', ('side', 'of'), ['life']] >>> object2 = {'name': 'Brian', 'age': 33, 'motto': object1} >>> dbase['brian'] = object2 >>> dbase['knight'] = {'name': 'Knight', 'motto': 'Ni!'} >>> dbase.close( ) Here, we open a shelve and store two fairly complex dictionary and list data structures away permanently by simply assigning them to shelve keys. Because shelve uses pickle internally, almost anything goes herethe trees of nested objects are automatically serialized into strings for storage. To fetch them back, just reopen the shelve and index: % python >>> import shelve >>> dbase = shelve.open("mydbase") >>> len(dbase) # entries 2 >>> dbase.keys( ) # index ['knight', 'brian'] >>> dbase['knight'] # fetch {'motto': 'Ni!', 'name': 'Knight'} >>> for row in dbase.keys( ): ... print row, '=>' ... for field in dbase[row].keys( ): ... print ' ', field, '=', dbase[row][field] ... knight => motto = Ni! name = Knight brian => motto = ['The', 'bright', ('side', 'of'), ['life']] age = 33 name = Brian The nested loops at the end of this session step through nested dictionariesthe outer scans the shelve and the inner scans the objects stored in the shelve. The crucial point to notice is that we're using normal Python syntax, both to store and to fetch these persistent objects, as well as to process them after loading. 19.5.3. Storing Class Instances in ShelvesOne of the more useful kinds of objects to store in a shelve is a class instance. Because its attributes record state and its inherited methods define behavior, persistent class objects effectively serve the roles of both database records and database-processing programs. We can also use the underlying pickle module to serialize instances to flat files and other file-like objects (e.g., trusted network sockets), but the higher-level shelve module also gives us a convenient keyed-access storage medium. For instance, consider the simple class shown in Example 19-2, which is used to model people. Example 19-2. PP3E\Dbase\person.py (version 1)
Nothing about this class suggests it will be used for database recordsit can be imported and used independent of external storage. It's easy to use it for a database, though: we can make some persistent objects from this class by simply creating instances as usual, and then storing them by key on an opened shelve: C:\...\PP3E\Dbase>python >>> from person import Person >>> bob = Person('bob', 'psychologist', 70000) >>> emily = Person('emily', 'teacher', 40000) >>> >>> import shelve >>> dbase = shelve.open('cast') # make new shelve >>> for obj in (bob, emily): # store objects >>> dbase[obj.name] = obj # use name for key >>> dbase.close( ) # need for bsddb Here we used the instance objects' name attribute as their key in the shelve database. When we come back and fetch these objects in a later Python session or script, they are re-created in memory as they were when they were stored: C:\...\PP3E\Dbase>python >>> import shelve >>> dbase = shelve.open('cast') # reopen shelve >>> >>> dbase.keys( ) # both objects are here ['emily', 'bob'] >>> print dbase['emily'] <person.Person instance at 799940> >>> >>> print dbase['bob'].tax( ) # call: bob's tax 17500.0 Notice that calling Bob's tax method works even though we didn't import the Person class here. Python is smart enough to link this object back to its original class when unpickled, such that all the original methods are available through fetched objects. 19.5.4. Changing Classes of Objects Stored in ShelvesTechnically, Python reimports a class to re-create its stored instances as they are fetched and unpickled. Here's how this works:
The key point in this is that the class and stored instance data are separate. The class itself is not stored with its instances, but is instead located in the Python source file and reimported later when instances are fetched. The upshot is that by modifying external classes in module files, we can change the way stored objects' data is interpreted and used without actually having to change those stored objects. It's as if the class is a program that processes stored records. To illustrate, suppose the Person class from the previous section was changed to the source code in Example 19-3. Example 19-3. PP3E\Dbase\person.py (version 2)
This revision has a new tax rate (30 percent), introduces a _ _getattr_ _ qualification overload method, and deletes the original tax method. Tax attribute references are intercepted and computed when accessed: C:\...\PP3E\Dbase>python >>> import shelve >>> dbase = shelve.open('cast') # reopen shelve >>> >>> print dbase.keys( ) # both objects are here ['emily', 'bob'] >>> print dbase['emily'] <person.Person instance at 79aea0> >>> >>> print dbase['bob'].tax # no need to call tax( ) 21000.0 Because the class has changed, tax is now simply qualified, not called. In addition, because the tax rate was changed in the class, Bob pays more this time around. Of course, this example is artificial, but when used well, this separation of classes and persistent instances can eliminate many traditional database update programs. In most cases, you can simply change the class, not each stored instance, for new behavior. 19.5.5. Shelve ConstraintsAlthough shelves are generally straightforward to use, there are a few rough edges worth knowing about. 19.5.5.1. Keys must be stringsFirst, although they can store arbitrary objects, keys must still be strings. The following fails, unless you convert the integer 42 to the string 42 manually first: dbase[42] = value # fails, but str(42) will work This is different from in-memory dictionaries, which allow any immutable object to be used as a key, and derives from the shelve's use of DBM files internally. 19.5.5.2. Objects are unique only within a keyAlthough the shelve module is smart enough to detect multiple occurrences of a nested object and re-create only one copy when fetched, this holds true only within a given slot: dbase[key] = [object, object] # OK: only one copy stored and fetched dbase[key1] = object dbase[key2] = object # bad?: two copies of object in the shelve When key1 and key2 are fetched, they reference independent copies of the original shared object; if that object is mutable, changes from one won't be reflected in the other. This really stems from the fact the each key assignment runs an independent pickle operationthe pickler detects repeated objects but only within each pickle call. This may or may not be a concern in your practice, and it can be avoided with extra support logic, but an object can be duplicated if it spans keys. 19.5.5.3. Updates must treat shelves as fetch-modify-store mappingsBecause objects fetched from a shelve don't know that they came from a shelve, operations that change components of a fetched object change only the in-memory copy, not the data on a shelve: dbase[key].attr = value # shelve unchanged To really change an object stored on a shelve, fetch it into memory, change its parts, and then write it back to the shelve as a whole by key assignment: object = dbase[key] # fetch it object.attr = value # modify it dbase[key] = object # store back-shelve changed 19.5.5.4. Concurrent updates are not directly supportedThe shelve module does not currently support simultaneous updates. Simultaneous readers are OK, but writers must be given exclusive access to the shelve. You can trash a shelve if multiple processes write to it at the same time, which is a common potential in things such as Common Gateway Interface (CGI) server-side scripts. If your shelves may be hit by multiple processes, be sure to wrap updates in calls to the fcntl.flock or os.open built-ins to lock files and provide exclusive access. 19.5.5.5. Underlying DBM format portabilityWith shelves, the files created by an underlying DBM system used to store your persistent objects are not necessarily compatible with all possible DBM implementations or Pythons. For instance, a file generated by gdbm on Linux, or by the BSD library on Windows, may not be readable by a Python with other DBM modules installed. Technically, when a DBM file (or by proxy, a shelve) is created, the anydbm module tries to import all possible DBM system modules in a predefined order and uses the first that it finds. When anydmb later opens an existing file, it attempts to determine which DBM system created it by inspecting the files(s) using the module whichdb. Because the BSD system is tried first at file creation time and is available on both Windows and many Unix-like systems, your DBM file is portable as long as your Pythons support BSD on both platforms. If the system used to create a DBM file is not available on the underlying platform, though, the DBM file cannot be used. If DBM file portability is a concern, make sure that all the Pythons that will read your data use compatible DBM modules. If that is not an option, use the pickle module directly and flat files for storage, or use the ZODB system we'll meet later in this chapter. 19.5.6. Pickled Class ConstraintsIn addition to these shelve constraints, storing class instances in a shelve adds a set of additional rules you need to be aware of. Really, these are imposed by the pickle module, not by shelve, so be sure to follow these if you store class objects with pickle directly too:
In a prior Python release, persistent object classes also had to either use constructors with no arguments or provide defaults for all constructor arguments (much like the notion of a C++ copy constructor). This constraint was dropped as of Python 1.5.2classes with nondefaulted constructor arguments now work fine in the pickling system.[*]
19.5.7. Other Shelve LimitationsFinally, although shelves store objects persistently, they are not really object-oriented database systems. Such systems also implement features such as automatic write-through on changes, transaction commits and rollbacks, safe concurrent updates, and object decomposition and delayed ("lazy") component fetches based on generated object ID. Parts of larger objects are loaded into memory only as they are accessed. It's possible to extend shelves to support such features manually, but you don't need tothe ZODB system provides an implementation of a more complete object-oriented database system. It is constructed on top of Python's built-in pickling persistence support, but it offers additional features for advanced data stores. For more on ZODB, let's move on to the next section. |