Recipe 4.10. Adding an Entry to a Dictionary

Credit: Alex Martelli, Martin Miller, Matthew Shomphe

Problem

Working with a dictionary d, you need to use the entry d[k] when it's already present, or add a new value as d[k] when k isn't yet a key in d.

Solution

This is what the setdefault method of dictionaries is for. Say we're building a word-to-page-numbers index, a dictionary that maps each word to the list of page numbers where it appears. A key piece of code in that application might be:

def addword(theIndex, word, pagenumber):
    theIndex.setdefault(word, []).append(pagenumber)

This code is equivalent to more verbose approaches such as:

def addword(theIndex, word, pagenumber):
    if word in theIndex:
        theIndex[word].append(pagenumber)
    else:
        theIndex[word] = [pagenumber]

and:

def addword(theIndex, word, pagenumber):
    try:
        theIndex[word].append(pagenumber)
    except KeyError:
        theIndex[word] = [pagenumber]

Using the setdefault method simplifies this task considerably.
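As a minimal sketch of how addword might be used, the following builds a small index from a list of (page number, text) pairs. The build_index helper, the input format, and the sample pages are assumptions made for illustration; they are not part of the recipe.

```python
def addword(theIndex, word, pagenumber):
    # Fetch the list stored for word, creating an empty one first if the
    # word is new, then append to that list in place.
    theIndex.setdefault(word, []).append(pagenumber)

def build_index(pages):
    # pages: iterable of (pagenumber, text) pairs (hypothetical input format)
    theIndex = {}
    for pagenumber, text in pages:
        for word in text.split():
            addword(theIndex, word, pagenumber)
    return theIndex

# Hypothetical sample data
sample_pages = [(1, "spam eggs"), (2, "spam ham"), (3, "eggs")]
index = build_index(sample_pages)
print(index)
```

Each word ends up mapped to the list of pages on which it occurs, and no call site has to check whether the word has been seen before.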

Discussion

For any dictionary d, d.setdefault(k, v) is very similar to d.get(k, v), which was covered previously in Recipe 4.9. The essential difference is that, if k is not a key in the dictionary, the setdefault method assigns d[k]=v as a side effect, in addition to returning v. (get would just return v, without affecting d in any way.) Therefore, consider using setdefault any time you have get-like needs, but also want to produce this side effect on the dictionary.
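The difference between the two methods can be seen in a short sketch; the dictionary contents and variable names here are illustrative only.

```python
d = {"a": 1}

# get returns the default but leaves the dictionary untouched.
got = d.get("b", 2)
after_get = dict(d)            # snapshot: "b" has not been added

# setdefault returns the default and also stores it under the missing key.
stored = d.setdefault("b", 2)
after_setdefault = dict(d)     # snapshot: "b" is now a key

# When the key is already present, setdefault just returns the existing
# value and ignores the default.
existing = d.setdefault("a", 99)

print(got, after_get)
print(stored, after_setdefault)
print(existing)
```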

setdefault is particularly useful in a dictionary with values that are lists, as detailed in Recipe 4.15. The most typical usage for setdefault is something like:

somedict.setdefault(somekey, []).append(somevalue)

setdefault is not all that useful for immutable values, such as numbers. If you just want to count words, for example, the right approach is to use get rather than setdefault:

theIndex[word] = theIndex.get(word, 0) + 1

since you must rebind the dictionary entry at theIndex[word] anyway (because numbers are immutable). But for our word-to-page-numbers example, you definitely do not want to fall into the performance trap that's hidden in the following approach:

def addword(theIndex, word, pagenumber):
    theIndex[word] = theIndex.get(word, []) + [pagenumber]

This latest version of addword builds three new lists each time you call it: an empty list that's passed as the second argument to theIndex.get, a one-item list containing just pagenumber, and a list with N+1 items obtained by concatenating the list returned by get with that one-item list (where N is the number of times that word was previously found). Building such a huge number of lists is sure to take its toll in performance terms. For example, on my machine, I timed the task of indexing the same four words occurring once each on each of 1,000 pages. Taking the if/else version of addword as a reference point, the try/except version is about 10% faster and the setdefault version is about 20% slower, the kind of performance differences that you should blissfully ignore in just about all cases. The version using get is four times slower, the kind of performance difference you just can't afford to ignore.
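To repeat this kind of measurement yourself, a sketch along the following lines could be used. The function names, the sample words, and the repetition count are assumptions chosen for illustration, and the timings you get will depend on your machine and Python version, so they need not match the ratios quoted above.

```python
import timeit

def addword_setdefault(theIndex, word, pagenumber):
    theIndex.setdefault(word, []).append(pagenumber)

def addword_if(theIndex, word, pagenumber):
    if word in theIndex:
        theIndex[word].append(pagenumber)
    else:
        theIndex[word] = [pagenumber]

def addword_try(theIndex, word, pagenumber):
    try:
        theIndex[word].append(pagenumber)
    except KeyError:
        theIndex[word] = [pagenumber]

def addword_get(theIndex, word, pagenumber):
    # Builds a brand-new list on every call instead of appending in place.
    theIndex[word] = theIndex.get(word, []) + [pagenumber]

WORDS = ("spam", "eggs", "ham", "toast")  # hypothetical sample words

def build(addword, pages=1000):
    # Index the same four words once each on each page.
    theIndex = {}
    for pagenumber in range(1, pages + 1):
        for word in WORDS:
            addword(theIndex, word, pagenumber)
    return theIndex

VERSIONS = (addword_setdefault, addword_if, addword_try, addword_get)

# Time each version over a few repetitions of the whole indexing task.
timings = {}
for func in VERSIONS:
    timings[func.__name__] = timeit.timeit(lambda: build(func), number=10)

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

All four versions produce the same index; only the amount of work done per call differs.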

See Also

Recipe 4.9; Recipe 4.15; Library Reference and Python in a Nutshell documentation about dict.



Python Cookbook
ISBN: 0596007973
Year: 2004
Pages: 420
