Section 7.1. Mapping Type: Dictionaries | Core Python Programming (2nd Edition)

7.1. Mapping Type: Dictionaries

Dictionaries are the sole mapping type in Python. Mapping objects associate hashable values (keys) with the objects they represent (values): each key maps to exactly one value, although several keys may map to the same value. They are similar to Perl hashes and can generally be thought of as mutable hash tables. A dictionary is yet another container type that can store any number of Python objects, including other container types. What makes dictionaries different from sequence-type containers like lists and tuples is the way the data are stored and accessed.

Sequence types use numeric keys only (numbered sequentially as indexed offsets from the beginning of the sequence). Mapping types may use most other object types as keys; strings are the most common. Unlike sequence indexes, mapping keys are usually related, often directly, to the data value being stored. But because we are no longer using "sequentially ordered" keys with mapping types, we are left with an unordered collection of data. As it turns out, this does not hinder our use because mapping types do not require a numeric value to index into a container to obtain the desired item. With a key, you are "mapped" directly to your value, hence the term "mapping type." They are commonly referred to as hash tables because that is exactly the type of object that dictionaries are. Dictionaries are one of Python's most powerful data types.

Core Note: What are hash tables and how do they relate to dictionaries?

Sequence types use sequentially ordered numeric keys as index offsets to store your data in an array format. The index number usually has nothing to do with the data value that is being stored. There should also be a way to store data based on another associated value such as a string. We do this all the time in everyday living. You file people's phone numbers in your address book based on last name, you add events to your calendar or appointment book based on date and time, etc. For each of these examples, an associated value to a data item was your key.

Hash tables are a data structure that does exactly what we described. They store each piece of data, called a value, based on an associated data item, called a key. Together, these are known as key-value pairs. The hash table algorithm takes your key, performs an operation on it, called a hash function, and based on the result of the calculation, chooses where in the data structure to store your value. Where any one particular value is stored depends on what its key is. Because placement is driven by the hash of the key rather than by the order in which items were added, you cannot rely on any ordering of the values in the hash table. You have an unordered collection of data.
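To make the idea concrete, here is a minimal sketch of a hash table that stores key-value pairs in a fixed number of buckets. The names ToyHashTable and NUM_BUCKETS are invented for this illustration, and the design (a list of pairs per bucket, no resizing) is a deliberate simplification; Python's real dictionaries resize themselves and resolve collisions differently.

```python
# Toy hash table: a fixed number of buckets, each a list of (key, value) pairs.
# ToyHashTable and NUM_BUCKETS are hypothetical names used only in this sketch.
NUM_BUCKETS = 8


class ToyHashTable(object):
    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def _bucket_for(self, key):
        # The hash function turns the key into an integer; the remainder
        # decides which bucket holds the pair.
        return self.buckets[hash(key) % NUM_BUCKETS]

    def put(self, key, value):
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)   # key already present: replace value
                return
        bucket.append((key, value))        # new key: add the pair

    def get(self, key):
        for existing_key, value in self._bucket_for(key):
            if existing_key == key:
                return value
        raise KeyError(key)


table = ToyHashTable()
table.put('name', 'earth')
table.put('port', 80)
table.put('port', 8080)    # same key again, so the old value is replaced
```

Notice that get() never scans the whole table; it looks only in the one bucket that the key's hash points to.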

The only way to obtain an ordering is to extract a dictionary's keys or values and sort them. The keys() and values() methods return lists, which are sortable. You can also call items() to get a list of keys and values as tuple pairs and sort that. Dictionaries themselves have no implicit ordering because they are hashes.
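As a short illustration of sorting what a dictionary hands back, the sketch below uses the sorted() built-in (available from Python 2.4). The sample dictionary and the variable names are assumptions made for this example only.

```python
dict2 = {'name': 'earth', 'port': 80, 'arch': 'sunos5'}

# Sort the keys on their own.
sorted_keys = sorted(dict2.keys())      # ['arch', 'name', 'port']

# Sort the (key, value) pairs; because the keys are unique, the pairs
# end up ordered by key.
sorted_items = sorted(dict2.items())

for key, value in sorted_items:
    print('%s -> %s' % (key, value))
```

The dictionary itself is untouched; only the extracted lists are ordered.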

Hash tables generally provide good performance because a lookup goes straight from the key to its storage location; on average, the cost of a lookup does not grow with the number of items stored.

Python dictionaries are implemented as resizeable hash tables. If you are familiar with Perl, then we can say that dictionaries are similar to Perl's associative arrays or hashes.

We will now take a closer look at Python dictionaries. The syntax of a dictionary entry is key:value. Entries are separated by commas, and the whole collection is enclosed in braces ( { } ).

How to Create and Assign Dictionaries

Creating dictionaries simply involves assigning a dictionary to a variable, regardless of whether the dictionary has elements or not:

>>> dict1 = {}
>>> dict2 = {'name': 'earth', 'port': 80}
>>> dict1, dict2
({}, {'port': 80, 'name': 'earth'})

In Python versions 2.2 and newer, dictionaries may also be created using the factory function dict(). We discuss more examples later when we take a closer look at dict(), but here's a sneak peek for now:

>>> fdict = dict((['x', 1], ['y', 2]))
>>> fdict
{'y': 2, 'x': 1}

In Python versions 2.3 and newer, dictionaries may also be created using a very convenient built-in method for creating a "default" dictionary whose elements all have the same value (defaulting to None if not given), fromkeys():

>>> ddict = {}.fromkeys(('x', 'y'), -1)
>>> ddict
{'y': -1, 'x': -1}
>>>
>>> edict = {}.fromkeys(('foo', 'bar'))
>>> edict
{'foo': None, 'bar': None}

How to Access Values in Dictionaries

To traverse a dictionary (normally by key), you only need to cycle through its keys, like this:

>>> dict2 = {'name': 'earth', 'port': 80}
>>>
>>> for key in dict2.keys():
...     print 'key=%s, value=%s' % (key, dict2[key])
...
key=name, value=earth
key=port, value=80

Beginning with Python 2.2, you no longer need to use the keys() method to extract a list of keys to loop over. Iterators were created to simplify accessing of sequence-like objects such as dictionaries and files. Using just the dictionary name itself will cause an iterator over that dictionary to be used in a for loop:

>>> dict2 = {'name': 'earth', 'port': 80}
>>>
>>> for key in dict2:
...     print 'key=%s, value=%s' % (key, dict2[key])
...
key=name, value=earth
key=port, value=80
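If you want the key and the value together on each pass, one option is to loop over the pairs returned by items(). The following is only a sketch; the list called lines is an assumption added so that the output can be sorted and printed in a predictable order. It uses the function form of print, which behaves the same way for a single argument on Python 2.

```python
dict2 = {'name': 'earth', 'port': 80}

lines = []
for key, value in dict2.items():
    # Each pass unpacks one (key, value) tuple.
    lines.append('key=%s, value=%s' % (key, value))

# Sort before printing because the dictionary promises no order.
for line in sorted(lines):
    print(line)
```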

To access individual dictionary elements, you use the familiar square brackets along with the key to obtain its value:

>>> dict2['name']
'earth'
>>>
>>> print 'host %s is running on port %d' % \
...  (dict2['name'], dict2['port'])
host earth is running on port 80

Dictionary dict1 defined above is empty while dict2 has two data items. The keys in dict2 are 'name' and 'port', and their associated value items are 'earth' and 80, respectively. Access to the value is through the key, as you can see from the explicit access to the 'name' key.

If we attempt to access a data item with a key that is not part of the dictionary, we get an error:

>>> dict2['server']
Traceback (innermost last):
  File "<stdin>", line 1, in ?
KeyError: server

In this example, we tried to access a value with the key 'server' which, as you know from the code above, does not exist. The best way to check whether a dictionary has a specific key is to use the in or not in operators, available starting with version 2.2; before that, the dictionary's has_key() method was the only choice. The has_key() method is deprecated and is removed in Python 3, so it is best to just use in or not in.

We will introduce all of a dictionary's methods below. The has_key() method and the in operator return True if a dictionary has that key and False otherwise; not in returns the opposite. (In Python versions that predate the Boolean constants [older than 2.3], the values returned are 1 and 0, respectively.)

>>> 'server' in dict2 # or dict2.has_key('server')
False
>>> 'name' in dict2 # or dict2.has_key('name')
True
>>> dict2['name']
'earth'
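Putting the membership test and the KeyError together, a lookup that cannot fail might be sketched as follows. The helper name lookup and its default parameter are hypothetical, introduced only for this illustration.

```python
dict2 = {'name': 'earth', 'port': 80}


def lookup(mapping, key, default=None):
    """Return mapping[key] if the key is present, else default."""
    if key in mapping:
        return mapping[key]
    return default


# The alternative is to attempt the access and catch the error.
try:
    dict2['server']
    missing = False
except KeyError:
    missing = True
```

Here lookup(dict2, 'name') gives back 'earth', while lookup(dict2, 'server', 'n/a') falls through to the default instead of raising an exception.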

Here is another dictionary example mixing the use of numbers and strings as keys:

>>> dict3 = {}
>>> dict3[1] = 'abc'
>>> dict3['1'] = 3.14159
>>> dict3[3.2] = 'xyz'
>>> dict3
{3.2: 'xyz', 1: 'abc', '1': 3.14159}

Rather than adding each key-value pair individually, we could have also entered all the data for dict3 at the same time:

dict3 = {3.2: 'xyz', 1: 'abc', '1': 3.14159}

Creating the dictionary with a complete set of key-value pairs is possible only if all the data items are known in advance. The goal of the examples using dict3 is to illustrate the variety of keys that you can use. If we were to pose the question of whether a key for a particular value should be allowed to change, you would probably say, "No." Right?

Not allowing keys to change during execution makes sense if you think of it this way: Let us say that you created a dictionary element with a key and value. Somehow during execution of your program, the key changed, perhaps due to an altered variable. When you went to retrieve that data value again with the original key, you got a KeyError (since the key changed), and you had no idea how to obtain your value now because the key had somehow been altered. For this reason, keys must be hashable, so numbers and strings are fine, but lists and other dictionaries are not. (See Section 7.5.2 for why keys must be hashable.)
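You can watch the interpreter enforce this rule. In the sketch below (the names lookup_table and rejected are made up for the example), numbers, strings, and a tuple of immutable items are accepted as keys, while a list and a dictionary are refused with a TypeError.

```python
lookup_table = {}
lookup_table[42] = 'number key'
lookup_table['42'] = 'string key'
lookup_table[('x', 1)] = 'tuple key'   # a tuple of hashable items is hashable

rejected = []
for bad_key in (['x', 1], {'x': 1}):
    try:
        lookup_table[bad_key] = 'never stored'
    except TypeError:
        # Mutable containers cannot be hashed, so they cannot be keys.
        rejected.append(type(bad_key).__name__)
```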

How to Update Dictionaries

You can update a dictionary by adding a new entry or element (i.e., a key-value pair), modifying an existing entry, or deleting an existing entry (see below for more details on removing an entry).

>>> dict2['name'] = 'venus' # update existing entry
>>> dict2['port'] = 6969    # update existing entry
>>> dict2['arch'] = 'sunos5' # add new entry
>>>
>>> print 'host %(name)s is running on port %(port)d' % dict2
host venus is running on port 6969

If the key already exists, its previous value is overwritten by the new one; if it does not, a new entry is added. The print statement above illustrates an alternative way of using the string format operator ( % ), specific to dictionaries. Using the dictionary argument, you can shorten the print request somewhat because the dictionary is named only once, as opposed to once for each element when using a tuple argument.
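For comparison, here are the tuple form and the dictionary form of the format operator side by side. This is a small sketch; the variable names with_tuple and with_dict are assumptions made for the example.

```python
dict2 = {'name': 'venus', 'port': 6969, 'arch': 'sunos5'}

# Tuple argument: the dictionary is named once per value.
with_tuple = 'host %s is running on port %d' % (dict2['name'], dict2['port'])

# Dictionary argument: each %(key)s directive pulls its value by key,
# and keys the format string does not mention are simply ignored.
with_dict = 'host %(name)s is running on port %(port)d' % dict2

print(with_dict)
```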

You may also add the contents of an entire dictionary to another dictionary by using the update() built-in method. We will introduce this method in Section 7.4.
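As a brief preview, and only a sketch with an invented second dictionary named extra, update() copies every entry of its argument into the dictionary it is called on, replacing the values of any keys the two have in common.

```python
dict2 = {'name': 'venus', 'port': 6969}
extra = {'port': 8080, 'arch': 'sunos5'}   # hypothetical extra settings

dict2.update(extra)    # 'port' is overwritten, 'arch' is added
```

The argument itself is left unchanged.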

How to Remove Dictionary Elements and Dictionaries

Removing an entire dictionary is not a typical operation. Generally, you either remove individual dictionary elements or clear the entire contents of a dictionary. However, if you really want to "remove" an entire dictionary, use the del statement (introduced in Section 3.5.5). Here are some deletion examples for dictionaries and dictionary elements:

del dict2['name']       # remove entry with key 'name'
dict2.clear()           # remove all entries in dict2
del dict2               # delete entire dictionary
dict2.pop('name')       # remove & return entry w/key
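The statements above are independent alternatives rather than a script to run from top to bottom. A sequence that does run end to end might look like the following sketch, in which the helper variables (removed, remaining, is_empty, still_defined) are assumptions added so that each step can be inspected.

```python
dict2 = {'name': 'venus', 'port': 6969, 'arch': 'sunos5'}

del dict2['arch']              # remove one entry; KeyError if the key is absent
removed = dict2.pop('name')    # remove the entry and hand back its value
remaining = dict(dict2)        # copy of what is left: {'port': 6969}

dict2.clear()                  # dict2 still exists but is now empty
is_empty = (len(dict2) == 0)

del dict2                      # now the name dict2 itself is gone
try:
    dict2
    still_defined = True
except NameError:
    still_defined = False
```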

Core Tip: Avoid using built-in object names as identifiers for variables!

For those of you who began traveling in the Python universe before version 2.3, you may have once used dict as an identifier for a dictionary. However, because dict() is now a type and factory function, overriding it may cause you headaches and potential bugs. The interpreter will allow such overriding (hey, it thinks you seem smart and look like you know what you are doing!), so be careful. Do NOT use variables named after built-in types or functions such as dict, list, file, bool, str, input, or len!
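Here is a small sketch of the kind of headache this can cause. The function shadowing_demo is hypothetical; the shadowing is confined to a function body so that the built-in stays intact for the rest of the program.

```python
def shadowing_demo():
    dict = {'name': 'earth'}    # shadows the built-in type inside this function
    try:
        dict(x=1)               # this now "calls" a dictionary object
        return 'built-in still callable'
    except TypeError:
        return 'TypeError: the name dict no longer refers to the type'


result = shadowing_demo()
print(result)
```

Outside the function, dict(x=1) still works as the factory function because the assignment created only a local name.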