Recipe7.7.Mutating Objects with shelve

Recipe 7.7. Mutating Objects with shelve

Credit: Luther Blissett

Problem

You are using the standard module shelve. Some of the values you have shelved are mutable objects, and you need to mutate these objects.

Solution

The shelve module offers a kind of persistent dictionaryan important niche between the power of relational-database engines and the simplicity of marshal, pickle, dbm, and similar file formats. However, you should be aware of a typical trap you need to avoid when using shelve. Consider the following interactive Python session:

>>> import shelve >>> # Build a simple sample shelf >>> she = shelve.open('try.she', 'c') >>> for c in 'spam': she[c] = {c:23} ... >>> for c in she.keys( ): print c, she[c] ... p {'p': 23} s {'s': 23} a {'a': 23} m {'m': 23} >>> she.close( )

We've created the shelve file, added some data to it, and closed it. Goodnow we can reopen it and work with it:

>>> she=shelve.open('try.she', 'c') >>> she['p'] {'p': 23} >>> she['p']['p'] = 42 >>> she['p'] {'p': 23}

What's going on here? We just set the value to 42, but our setting didn't take in the shelve object! The problem is that we were working with a temporary object that shelve gave us, not with the "real thing". shelve, when we open it with default options, like here, doesn't track changes to such temporary objects. One reasonable solution is to bind a name to this temporary object, do our mutation, and then assign the mutated object back to the appropriate item of shelve:

>>> a = she['p'] >>> a['p'] = 42 >>> she['p'] = a >>> she['p'] {'p': 42} >>> she.close( )

We can verify that the change was properly persisted:

>>> she=shelve.open('try.she','c') >>> for c in she.keys( ): print c,she[c] ... p {'p': 42} s {'s': 23} a {'a': 23} m {'m': 23}

A simpler solution is to open the shelve object with the writeback option set to TRue:

>>> she = shelve.open('try.she', 'c', writeback=True)

The writeback option instructs shelve to keep track of all the objects it gets from the file and write them all back to the file before closing it, just in case they have been modified in the meantime. While simple, this approach can be quite expensive, particularly in terms of memory consumption. Specifically, if we read many objects from a shelve object opened with writeback=True, even if we only modify a few of them, shelve is going to keep them all in memory, since it can't tell in advance which one we may be about to modify. The previous approach, where we explicitly take responsibility to notify shelve of any changes (by assigning the changed objects back to the place they came from), requires more care on our part, but repays that care by giving us much better performance.

Discussion

The standard Python module shelve can be quite convenient in many cases, but it hides a potentially nasty trap, admittedly well documented in Python's online docs but still easy to miss. Suppose you're shelving mutable objects, such as dictionaries or lists. Naturally, you are quite likely to want to mutate some of those objectsfor example, by calling mutating methods (append on a list, update on a dictionary, etc.) or by assigning a new value to an item or attribute of the object. However, when you do this, the change doesn't occur in the shelve object. This is because we actually mutate a temporary object that the shelve object has given us as the result of shelve's own _ _getitem_ _ method, but the shelve object, by default, does not keep track of that temporary object, nor does it care about it once it returns it to us.

As shown in the recipe, one solution is to bind a name to the temporary object obtained by keying into the shelf, doing whatever mutations are needed to the object via the name, then assigning the newly mutated object back to the appropriate item of the shelve object. When you assign to a shelve object's item, the shelve object's _ _setitem_ _ method gets invoked, and it appropriately updates the shelve object itself, so that the change does occur.

Alternatively, you can add the flag writeback=True at the time you open the shelve object, and then shelve keeps track of every object it hands you, saving them all back to disk at the end. This approach may save you quite a bit of fussy and laborious coding, but take care: if you read many items of the shelve object and only modify a few of them, the writeback approach can be exceedingly costly, particularly in terms of memory consumption. When opened with writeback=True, shelve will keep in memory any item it has ever handed you, and save them all to disk at the end, since it doesn't have a reliable way to tell which items you may be about to modify, nor, in general, even which items you have actually modified by the time you close the shelve object. The recommended approach, unless you're going to modify just about every item you read (or unless the shelve object in question is small enough compared with your available memory that you don't really care), is the previous one: bind a name to the items that you get from a shelve object with intent to modify them, and assign each item back into the shelve object once you're done mutating that item.

Recipe7.7.Mutating Objects with shelve