
Recipe 4.11. Building a Dictionary Without Excessive Quoting

Credit: Brent Burley, Peter Cogolo

Problem

You want to construct a dictionary whose keys are literal strings, without having to quote each key.

Solution

Once you get into the swing of Python, you'll find yourself constructing a lot of dictionaries. When the keys are identifiers, you can avoid quoting them by calling dict with named-argument syntax:

data = dict(red=1, green=2, blue=3)

This is neater than the equivalent use of dictionary-display syntax:

data = {'red': 1, 'green': 2, 'blue': 3}
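A quick check that the two spellings build equal dictionaries:

```python
# Named-argument syntax: keys are written as bare identifiers.
data_kw = dict(red=1, green=2, blue=3)

# Dictionary-display syntax: every key must be quoted.
data_display = {'red': 1, 'green': 2, 'blue': 3}

# Both spellings build the same mapping; dict equality ignores key order.
assert data_kw == data_display
print(sorted(data_kw.items()))  # [('blue', 3), ('green', 2), ('red', 1)]
```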

Discussion

One powerful way to build a dictionary is to call the built-in type dict. It's often a good alternative to the dictionary-display syntax with braces and colons. This recipe shows that, by calling dict, you can avoid having to quote keys when the keys are literal strings that happen to be syntactically valid as Python identifiers. You cannot use this approach for keys such as the literal strings '12ba' or 'for': '12ba' starts with a digit, and 'for' is a Python keyword rather than an identifier.
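If keys come from data rather than being typed by hand, you can check in advance whether each one could be written as a named argument. The helper below, usable_as_dict_keyword, is hypothetical and assumes Python 3, where str.isidentifier exists; keyword.iskeyword comes from the standard keyword module. The last lines show that a keyword used as an argument name is rejected when the code is compiled.

```python
import keyword

def usable_as_dict_keyword(key):
    """Hypothetical helper: True if `key` could be written as dict(key=...).

    Assumes Python 3 (str.isidentifier). The key must be a valid
    identifier and must not be a reserved keyword such as 'for'.
    """
    return key.isidentifier() and not keyword.iskeyword(key)

for key in ['red', '12ba', 'for', 'rof']:
    print(key, usable_as_dict_keyword(key))

# Writing such a key directly fails at compile time with a SyntaxError.
try:
    compile("dict(for=23)", "<example>", "eval")
except SyntaxError:
    print("dict(for=23) is a SyntaxError")
```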

Also, dictionary-display syntax is the only case in Python where you need to use braces: if you dislike braces, or happen to work on a keyboard that makes braces hard to reach (as all Italian layout keyboards do!), you may be happier, for example, using dict() rather than { } to build an empty dictionary.

Calling dict also gives you other possibilities. dict(d) returns a new dictionary that is a separate (shallow) copy of an existing dictionary d, just like d.copy(). However, dict(d) also works when d is a sequence of (key, value) pairs rather than a dictionary; when a key occurs more than once in the sequence, its last occurrence applies. A common dictionary-building idiom is:

d = dict(zip(the_keys, the_values))

where the_keys is a sequence of keys and the_values a "parallel" sequence of corresponding values. Built-in function zip builds and returns a list of (key, value) pairs, and built-in type dict accepts that list as its argument and constructs a dictionary accordingly. If the sequences are long, it's faster to use module itertools from the standard Python library:

import itertools
d = dict(itertools.izip(the_keys, the_values))

Built-in function zip constructs the whole list of pairs in memory, while itertools.izip yields only one pair at a time. On my machine, with sequences of 10,000 numbers, the latter idiom is about twice as fast as the one using zip: 18 versus 45 milliseconds with Python 2.3, and 17 versus 32 with Python 2.4.
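The following sketch exercises the behaviors described in the last few paragraphs: dict(d) versus d.copy(), building from a sequence of pairs, and the zip idiom. It is written for Python 3, where itertools.izip no longer exists because the built-in zip already yields pairs one at a time; the try/except fallback (an addition, not part of the recipe) picks whichever lazy pairing function is available. Note that, like d.copy(), dict(d) makes a shallow copy, so mutable values such as lists end up shared. The timing lines use the standard timeit module; results depend on machine and interpreter, so no figures are claimed here.

```python
import timeit

try:
    from itertools import izip as lazy_zip  # Python 2: izip yields pairs lazily
except ImportError:
    lazy_zip = zip  # Python 3: built-in zip already yields pairs lazily

# dict(d) and d.copy() both return a new top-level dictionary.
original = {'a': 1, 'b': [2, 3]}
copy_via_dict = dict(original)
copy_via_method = original.copy()
assert copy_via_dict == copy_via_method == original
assert copy_via_dict is not original

# Rebinding a key in the copy leaves the original untouched...
copy_via_dict['a'] = 99
assert original['a'] == 1
# ...but both copies are shallow: the list object is shared.
assert copy_via_dict['b'] is original['b']

# A sequence of (key, value) pairs also works; for a repeated key,
# the last pair wins.
from_pairs = dict([('x', 1), ('y', 2), ('x', 3)])
assert from_pairs == {'x': 3, 'y': 2}

# Parallel sequences of keys and values become a dictionary in one call.
the_keys = list(range(10000))
the_values = [k * k for k in the_keys]
d = dict(lazy_zip(the_keys, the_values))
assert len(d) == 10000 and d[7] == 49

# Rough timing: build the full list of pairs first vs. pair lazily.
# Numbers vary with machine and interpreter version.
eager = timeit.timeit(lambda: dict(list(zip(the_keys, the_values))), number=100)
lazy = timeit.timeit(lambda: dict(lazy_zip(the_keys, the_values)), number=100)
print("list of pairs: %.3f s, lazy pairs: %.3f s" % (eager, lazy))
```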

You can use both a positional argument and named arguments in the same call to dict (if the named argument clashes with a key specified in the positional argument, the named argument applies). For example, here is a workaround for the previously mentioned issue that Python keywords, and other nonidentifiers, cannot be used as argument names:

d = dict({'12ba':49, 'for': 23}, rof=41, fro=97, orf=42)
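A minimal sketch of mixing a positional argument with named arguments, including what happens on a clash; the names colors and mixed are illustrative only:

```python
# Keys that are not identifiers go in the positional mapping; the rest
# can be given as named arguments in the same call.
d = dict({'12ba': 49, 'for': 23}, rof=41, fro=97, orf=42)
assert d == {'12ba': 49, 'for': 23, 'rof': 41, 'fro': 97, 'orf': 42}

# When a named argument repeats a key from the positional argument,
# the named argument's value wins.
colors = dict({'red': 1, 'green': 2}, red=10)
assert colors == {'red': 10, 'green': 2}

# The positional argument may also be a sequence of (key, value) pairs.
mixed = dict([('12ba', 49), ('for', 23)], rof=41)
assert mixed == {'12ba': 49, 'for': 23, 'rof': 41}
```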

If you need to build a dictionary where the same value corresponds to each key, call dict.fromkeys(keys_sequence, value) (if you omit the value, it defaults to None). For example, here is a neat way to initialize a dictionary to be used for counting occurrences of various lowercase ASCII letters:

import string
count_by_letter = dict.fromkeys(string.ascii_lowercase, 0)
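A sketch of using that dictionary to count letters in a sample string (the string is arbitrary), plus two behaviors of dict.fromkeys worth knowing: omitting the value gives None for every key, and every key shares the single value object passed in, which matters when that value is mutable, such as a list:

```python
import string

count_by_letter = dict.fromkeys(string.ascii_lowercase, 0)

# Count only lowercase ASCII letters; other characters are skipped.
for ch in "hello world":
    if ch in count_by_letter:
        count_by_letter[ch] += 1

assert count_by_letter['l'] == 3 and count_by_letter['o'] == 2
assert count_by_letter['z'] == 0

# Omitting the value gives None for every key.
assert dict.fromkeys(['a', 'b']) == {'a': None, 'b': None}

# Caution: all keys share the same value object, so a mutable value
# changed through one key is visible through every other key.
shared = dict.fromkeys(['a', 'b'], [])
shared['a'].append(1)
assert shared['b'] == [1]
```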
