2.3. Step 1: Representing Records

If we're going to store records in a database, the first step is probably deciding what those records will look like. There are a variety of ways to represent information about people in the Python language. Built-in object types such as lists and dictionaries are often sufficient, especially if we don't care about processing the data we store.

2.3.1. Using Lists

Lists, for example, can collect attributes about people in a positionally ordered way. Start up your Python interactive interpreter and type the following two statements (this works in the IDLE GUI, after typing python at a shell prompt, and so on, and the >>> characters are Python's prompt; if you've never run Python code this way before, see an introductory resource such as O'Reilly's Learning Python for help with getting started):

 >>> bob = ['Bob Smith', 42, 30000, 'software']
 >>> sue = ['Sue Jones', 45, 40000, 'music']

We've just made two records, albeit simple ones, to represent two people, Bob and Sue (my apologies if you really are Bob or Sue, generically or otherwise[*]). Each record is a list of four properties: name, age, pay, and job field. To access these fields, we simply index by position (the result is in parentheses here because it is a tuple of two results):

 >>> bob[0], sue[2]             # fetch name, pay
 ('Bob Smith', 40000)

[*] No, I'm serious. For an example I present in Python classes I teach, I had for many years regularly used the name "Bob Smith," age 40.5, and jobs "developer" and "manager" as a supposedly fictitious database record, until a recent class in Chicago, where I met a student named Bob Smith who was 40.5 and was a developer and manager. The world is stranger than it seems.

Processing records is easy with this representation; we just use list operations. For example, we can extract a last name by splitting the name field on blanks and grabbing the last part, and we may give someone a raise by changing their list in-place:

 >>> bob[0].split()[-1]         # what's bob's last name?
 'Smith'
 >>> sue[2] *= 1.25             # give sue a 25% raise
 >>> sue
 ['Sue Jones', 45, 50000.0, 'music']

The last-name expression here proceeds from left to right: we fetch Bob's name, split it into a list of substrings around spaces, and index his last name (run it one step at a time to see how).
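To make the left-to-right evaluation concrete, the expression can be unrolled into its three steps. This is an illustrative sketch only; the intermediate variable names are assumptions and do not appear in the original session. It avoids print so that it runs the same way under Python 2 and Python 3.

```python
bob = ['Bob Smith', 42, 30000, 'software']

full_name = bob[0]            # step 1: index the record by position
parts = full_name.split()     # step 2: split the string on whitespace
last_name = parts[-1]         # step 3: index the final substring

# The one-line form from the text computes the same value.
same_result = (last_name == bob[0].split()[-1])
```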

2.3.1.1. A database list

Of course, what we really have at this point is just two variables, not a database; to collect Bob and Sue into a unit, we might simply stuff them into another list:

 >>> people = [bob, sue]
 >>> for person in people:
         print person

 ['Bob Smith', 42, 30000, 'software']
 ['Sue Jones', 45, 50000.0, 'music']

Now, the people list represents our database. We can fetch specific records by their relative positions and process them one at a time, in loops:

 >>> people[1][0]
 'Sue Jones'
 >>> for person in people:
         print person[0].split()[-1]            # print last names
         person[2] *= 1.20                      # give each a 20% raise

 Smith
 Jones
 >>> for person in people: print person[2]      # check new pay

 36000.0
 60000.0

Now that we have a list, we can also collect values from records using some of Python's more powerful iteration tools, such as list comprehensions, maps, and generator expressions:

 >>> pays = [person[2] for person in people]    # collect all pay
 >>> pays
 [36000.0, 60000.0]
 >>> pays = map((lambda x: x[2]), people)       # ditto
 >>> pays
 [36000.0, 60000.0]
 >>> sum(person[2] for person in people)        # generator expression sum (Python 2.4)
 96000.0

To add a record to the database, the usual list operations, such as append and extend, will suffice:

 >>> people.append(['Tom', 50, 0, None])
 >>> len(people)
 3
 >>> people[-1][0]
 'Tom'

Lists work for our people database, and they might be sufficient for some programs, but they suffer from a few major flaws. For one thing, Bob and Sue, at this point, are just fleeting objects in memory that will disappear once we exit Python. For another, every time we want to extract a last name or give a raise, we'll have to repeat the kinds of code we just typed; that could become a problem if we ever change the way those operations work, because we may have to update many places in our code. We'll address these issues in a few moments.
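One way to limit the repeated-code problem is to wrap each operation in a function, so that a change in how it works is made in a single place. The following is a minimal sketch under that assumption; the function names last_name and give_raise are hypothetical and are not taken from this section, and the sketch does nothing about persistence.

```python
def last_name(record):
    """Return the last word of the name field (position 0)."""
    return record[0].split()[-1]

def give_raise(record, percent):
    """Raise the pay field (position 2) in place; percent is a fraction such as 0.25."""
    record[2] *= (1 + percent)

bob = ['Bob Smith', 42, 30000, 'software']
sue = ['Sue Jones', 45, 40000, 'music']

give_raise(sue, 0.25)                            # same 25% raise as before
names = [last_name(rec) for rec in (bob, sue)]   # one definition, many uses
```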

2.3.1.2. Field labels

Perhaps more fundamentally, accessing fields by position in a list requires us to memorize what each position means: if you see a bit of code indexing a record on magic position 2, how can you tell it is extracting a pay? In terms of understanding the code, it might be better to associate a field name with a field value.

We might try to associate names with relative positions by using the Python range built-in function, which builds a list of successive integers:

 >>> NAME, AGE, PAY = range(3)                # [0, 1, 2]
 >>> bob = ['Bob Smith', 42, 10000]
 >>> bob[NAME]
 'Bob Smith'
 >>> PAY, bob[PAY]
 (2, 10000)

This addresses readability: the three variables essentially become field names. This makes our code dependent on the field position assignments, though: we have to remember to update the range assignments whenever we change record structure. Because they are not directly associated, the names and records may become out of sync over time and require a maintenance step.

Moreover, because the field names are independent variables, there is no direct mapping from a record list back to its field names. A raw record, for instance, provides no way to label its values with field names in a formatted display. In the preceding record, without additional code, there is no path from value 42 to label AGE.
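The "additional code" needed to label a raw record might look like the sketch below. It assumes a separate FIELDS tuple (a hypothetical name) that has to be kept in the same order as the record by hand, which is the same synchronization burden just described.

```python
FIELDS = ('name', 'age', 'pay')        # must mirror the record layout manually
bob = ['Bob Smith', 42, 10000]

# Pair each label with the value stored at the same position.
labeled = list(zip(FIELDS, bob))
lines = ['%s => %s' % (label, value) for (label, value) in labeled]
```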

We might also try this by using lists of tuples, where the tuples record both a field name and a value; better yet, a list of lists would allow for updates (tuples are immutable). Here's what that idea translates to, with slightly simpler records:

 >>> bob = [['name', 'Bob Smith'], ['age', 42], ['pay', 10000]]
 >>> sue = [['name', 'Sue Jones'], ['age', 45], ['pay', 20000]]
 >>> people = [bob, sue]

This really doesn't fix the problem, though, because we still have to index by position in order to fetch fields:

 >>> for person in people:
         print person[0][1], person[2][1]      # name, pay

 Bob Smith 10000
 Sue Jones 20000
 >>> [person[0][1] for person in people]       # collect names
 ['Bob Smith', 'Sue Jones']
 >>> for person in people:
         print person[0][1].split()[-1]        # get last names
         person[2][1] *= 1.10                  # give a 10% raise

 Smith
 Jones
 >>> for person in people: print person[2]

 ['pay', 11000.0]
 ['pay', 22000.0]

All we've really done here is add an extra level of positional indexing. To do better, we might inspect field names in loops to find the one we want (the loop uses tuple assignment here to unpack the name/value pairs):

 >>> for person in people:
         for (name, value) in person:
             if name == 'name': print value    # find a specific field

 Bob Smith
 Sue Jones

Better yet, we can code a fetcher function to do the job for us:

 >>> def field(record, label):
         for (fname, fvalue) in record:
             if fname == label:                # find any field by name
                 return fvalue

 >>> field(bob, 'name')
 'Bob Smith'
 >>> field(sue, 'pay')
 22000.0
 >>> for rec in people:
         print field(rec, 'age')               # print all ages

 42
 45

If we proceed down this path, we'll eventually wind up with a set of record interface functions that generically map field names to field data. If you've done any Python coding in the past, you probably already know that there is an easier way to code this sort of association, and you can probably guess where we're headed in the next section.
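To give a sense of where that path leads, here is a sketch of a companion to the field fetcher for the list-of-lists record format used above. The name set_field is hypothetical, and appending a new pair when the label is missing is a design assumption made for this illustration, not something the original code specifies.

```python
def field(record, label):
    """Fetch a field value by name; falls through to None if the label is absent."""
    for (fname, fvalue) in record:
        if fname == label:
            return fvalue

def set_field(record, label, value):
    """Change an existing field in place, or append a new name/value pair."""
    for pair in record:
        if pair[0] == label:
            pair[1] = value
            return
    record.append([label, value])

bob = [['name', 'Bob Smith'], ['age', 42], ['pay', 10000]]
set_field(bob, 'pay', 12000)       # update a field that exists
set_field(bob, 'job', 'dev')       # add a field that does not
```

Every call still scans the record from the front, which is the search cost that a built-in mapping type avoids.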

2.3.2. Using Dictionaries

The list-based record representations in the prior section work, though not without some cost in terms of performance required to search for field names (assuming you need to care about milliseconds and such). But if you already know some Python, you also know that there are more convenient ways to associate property names and values. The built-in dictionary object is a natural:

 >>> bob = {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'}
 >>> sue = {'name': 'Sue Jones', 'age': 45, 'pay': 40000, 'job': 'mus'}

Now, Bob and Sue are objects that map field names to values automatically, and they make our code more understandable and meaningful. We don't have to remember what a numeric offset means, and we let Python search for the value associated with a field's name with its efficient dictionary indexing:

 >>> bob['name'], sue['pay']            # not bob[0], sue[2]
 ('Bob Smith', 40000)
 >>> bob['name'].split()[-1]
 'Smith'
 >>> sue['pay'] *= 1.10
 >>> sue['pay']
 44000.0

Because fields are accessed mnemonically now, they are more meaningful to those who read your code (including you).
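Because a dictionary carries its own field names, the labeled display that a raw list record could not provide takes only a loop over the keys. This is a minimal sketch; sorting the keys is an assumption made here to get a predictable order, since dictionary ordering differs across Python versions.

```python
bob = {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'}

# Each value is labeled by its own key; no separate list of names is needed.
report = ['%s => %s' % (key, bob[key]) for key in sorted(bob)]
```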

2.3.2.1. Other ways to make dictionaries

Dictionaries turn out to be so useful in Python programming that there are even more convenient ways to code them than the traditional literal syntax shown earlier, for example, with keyword arguments and the type constructor:

 >>> bob = dict(name='Bob Smith', age=42, pay=30000, job='dev')
 >>> bob
 {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}

Other Uses for Lists

Lists are convenient any time we need an ordered container of other objects that may need to change over time. A simple way to represent matrices in Python, for instance, is as a list of nested lists: the top list is the matrix, and the nested lists are the rows:

 >>> M = [[1, 2, 3],                 # 3x3, 2-dimensional
          [4, 5, 6],
          [7, 8, 9]]
 >>> N = [[2, 2, 2],
          [3, 3, 3],
          [4, 4, 4]]

Now, to combine one matrix's components with another's, step over their indexes with nested loops; here's a simple pairwise multiplication:

 >>> for i in range(3):
         for j in range(3):
             print M[i][j] * N[i][j],
         print

 2 4 6
 12 15 18
 28 32 36

To build up a new matrix with the results, we just need to create the nested list structure along the way:

 >>> tbl = []
 >>> for i in range(3):
         row = []
         for j in range(3):
             row.append(M[i][j] * N[i][j])
         tbl.append(row)

 >>> tbl
 [[2, 4, 6], [12, 15, 18], [28, 32, 36]]

Nested list comprehensions such as either of the following will do the same job, albeit at some cost in complexity (if you have to think hard about expressions like these, so will the next person who has to read your code!):

 [[M[i][j] * N[i][j] for j in range(3)] for i in range(3)]

 [[x * y for x, y in zip(row1, row2)]
             for row1, row2 in zip(M, N)]
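As a quick check that these forms really are interchangeable, the two comprehensions can be compared against the loop-built result. The sketch repeats the matrices so that it runs on its own; the names by_index and by_zip are introduced here only for illustration.

```python
M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
N = [[2, 2, 2], [3, 3, 3], [4, 4, 4]]

# Loop version: build each row, then append it to the result.
tbl = []
for i in range(3):
    row = []
    for j in range(3):
        row.append(M[i][j] * N[i][j])
    tbl.append(row)

# Comprehension versions: index-based, then zip-based.
by_index = [[M[i][j] * N[i][j] for j in range(3)] for i in range(3)]
by_zip = [[x * y for x, y in zip(row1, row2)] for row1, row2 in zip(M, N)]
```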

List comprehensions are powerful tools, provided you restrict them to simple tasks, for example, listing selected module functions or stripping end-of-lines:

 >>> import sys
 >>> [x for x in dir(sys) if x.startswith('getr')]
 ['getrecursionlimit', 'getrefcount']
 >>> lines = [line.rstrip() for line in open('README.txt')]
 >>> lines[0]
 'This is Python version 2.4 alpha 3'

If you are interested in matrix processing, also see the mathematical and scientific extensions freely available for Python, such as NumPy and SciPy. The code here works, but these extensions provide optimized tools. NumPy, for instance, is seen by some as an open source Matlab equivalent.


We can also make a dictionary by filling it out one field at a time:

 >>> sue = {}
 >>> sue['name'] = 'Sue Jones'
 >>> sue['age']  = 45
 >>> sue['pay']  = 40000
 >>> sue['job']  = 'mus'
 >>> sue
 {'job': 'mus', 'pay': 40000, 'age': 45, 'name': 'Sue Jones'}

and by zipping together name/value lists:

 >>> names  = ['name', 'age', 'pay', 'job']
 >>> values = ['Sue Jones', 45, 40000, 'mus']
 >>> zip(names, values)
 [('name', 'Sue Jones'), ('age', 45), ('pay', 40000), ('job', 'mus')]
 >>> sue = dict(zip(names, values))
 >>> sue
 {'job': 'mus', 'pay': 40000, 'age': 45, 'name': 'Sue Jones'}

We can even make a dictionary from a sequence of keys and an optional starting value shared by all the keys (handy for initializing a record whose field values are not yet known):

 >>> fields = ('name', 'age', 'job', 'pay')
 >>> record = dict.fromkeys(fields, '?')
 >>> record
 {'job': '?', 'pay': '?', 'age': '?', 'name': '?'}
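One caution about dict.fromkeys: the starting value is a single object shared by every key, so a mutable starting value such as a list is shared too. The sketch below illustrates that behavior and one plain-loop alternative; the variable names shared and separate are used for illustration only.

```python
fields = ('name', 'age', 'job', 'pay')

shared = dict.fromkeys(fields, [])   # every key refers to the same list object
shared['job'].append('dev')          # so this change is visible under all keys

# Building the dictionary in a loop gives each key its own list.
separate = {}
for f in fields:
    separate[f] = []
separate['job'].append('dev')
```

With an immutable starting value such as the '?' string used above, the sharing is harmless.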

2.3.2.2. Lists of dictionaries

Regardless of how we code them, we still need to collect our records into a database; a list does the trick again, as long as we don't require access by key:

 >>> people = [bob, sue]
 >>> for person in people:
         print person['name'], person['pay']          # all name, pay

 Bob Smith 30000
 Sue Jones 44000.0
 >>> for person in people:
         if person['name'] == 'Sue Jones':            # fetch sue's pay
             print person['pay']

 44000.0

Iteration tools work just as well here, but we use keys rather than obscure positions (in database terms, the list comprehension and map in the following code project the database on the "name" field column):

 >>> names = [person['name'] for person in people]    # collect names
 >>> names
 ['Bob Smith', 'Sue Jones']
 >>> map((lambda x: x['name']), people)               # ditto
 ['Bob Smith', 'Sue Jones']
 >>> sum(person['pay'] for person in people)          # sum all pay
 74000.0
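In the same database terms, adding an if clause to a comprehension selects rows instead of projecting a column. The following sketch is an illustration only: the pay threshold is arbitrary, and the records are restated so the code runs on its own.

```python
bob = {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'}
sue = {'name': 'Sue Jones', 'age': 45, 'pay': 44000.0, 'job': 'mus'}
people = [bob, sue]

# Selection: keep only the records that pass a test.
well_paid = [person for person in people if person['pay'] > 40000]

# Selection plus projection: just the names of the selected records.
well_paid_names = [person['name'] for person in people if person['pay'] > 40000]
```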

And because dictionaries are normal Python objects, these records can also be accessed and updated with normal Python syntax:

 >>> for person in people:
         print person['name'].split()[-1]             # last name
         person['pay'] *= 1.10                        # a 10% raise

 Smith
 Jones
 >>> for person in people: print person['pay']

 33000.0
 48400.0

2.3.2.3. Nested structures

Incidentally, we could avoid the last-name extraction code in the prior examples by further structuring our records. Because all of Python's compound datatypes can be nested inside each other and as deeply as we like, we can build up fairly complex information structures easily: simply type the object's syntax, and Python does all the work of building the components, linking memory structures, and later reclaiming their space. This is one of the great advantages of a scripting language such as Python.

The following, for instance, represents a more structured record by nesting a dictionary, list, and tuple inside another dictionary:

 >>> bob2 = {'name': {'first': 'Bob', 'last': 'Smith'},
             'age':  42,
             'job':  ['software', 'writing'],
             'pay':  (40000, 50000)}

Because this record contains nested structures, we simply index twice to go two levels deep:

 >>> bob2['name']                            # bob's full name
 {'last': 'Smith', 'first': 'Bob'}
 >>> bob2['name']['last']                    # bob's last name
 'Smith'
 >>> bob2['pay'][1]                          # bob's upper pay
 50000

The name field is another dictionary here, so instead of splitting up a string, we simply index to fetch the last name. Moreover, people can have many jobs, as well as minimum and maximum pay limits. In fact, Python becomes a sort of query language in such cases: we can fetch or change nested data with the usual object operations:

 >>> for job in bob2['job']: print job       # all of bob's jobs

 software
 writing
 >>> bob2['job'][-1]                         # bob's last job
 'writing'
 >>> bob2['job'].append('janitor')           # bob gets a new job
 >>> bob2
 {'job': ['software', 'writing', 'janitor'], 'pay': (40000, 50000), 'age': 42,
 'name': {'last': 'Smith', 'first': 'Bob'}}

It's OK to grow the nested list with append, because it is really an independent object. Such nesting can come in handy for more sophisticated applications; to keep ours simple, we'll stick to the original flat record structure.
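By contrast, the nested pay tuple cannot be changed in place; to alter it, a new tuple is assigned to the key. Here is a short sketch of the difference, with the flag changed_in_place introduced only to record what happened:

```python
bob2 = {'name': {'first': 'Bob', 'last': 'Smith'},
        'age':  42,
        'job':  ['software', 'writing'],
        'pay':  (40000, 50000)}

bob2['job'].append('janitor')            # lists are mutable: grows in place

try:
    bob2['pay'][1] = 55000               # tuples are immutable: raises TypeError
    changed_in_place = True
except TypeError:
    changed_in_place = False

bob2['pay'] = (bob2['pay'][0], 55000)    # replace the whole tuple instead
```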

2.3.2.4. Dictionaries of dictionaries

One last twist on our people database: we can get a little more mileage out of dictionaries here by using one to represent the database itself. That is, we can use a dictionary of dictionaries: the outer dictionary is the database, and the nested dictionaries are the records within it. Rather than a simple list of records, a dictionary-based database allows us to store and retrieve records by symbolic key:

 >>> db = {}
 >>> db['bob'] = bob
 >>> db['sue'] = sue
 >>>
 >>> db['bob']['name']                    # fetch bob's name
 'Bob Smith'
 >>> db['sue']['pay'] = 50000             # change sue's pay
 >>> db['sue']['pay']                     # fetch sue's pay
 50000

Notice how this structure allows us to access a record directly instead of searching for it in a loop (we get to Bob's name immediately by indexing on key bob). This really is a dictionary of dictionaries, though you won't see all the gory details unless you display the database all at once:

 >>> db
 {'bob': {'pay': 33000.0, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'},
  'sue': {'job': 'mus', 'pay': 50000, 'age': 45, 'name': 'Sue Jones'}}
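The difference between direct access and a search loop can be sketched side by side. The helper find_by_name is a hypothetical name standing in for the loop a list-based database requires, and the records are restated so the code runs on its own.

```python
bob = dict(name='Bob Smith', age=42, pay=30000, job='dev')
sue = dict(name='Sue Jones', age=45, pay=40000, job='mus')

people = [bob, sue]                  # list database: records must be searched
db = {'bob': bob, 'sue': sue}        # dictionary database: records have keys

def find_by_name(records, name):
    """Linear search: scan the records until the name field matches."""
    for record in records:
        if record['name'] == name:
            return record
    return None

from_list = find_by_name(people, 'Sue Jones')
from_dict = db['sue']                # direct access, no loop
```

Both routes reach the same record object; the dictionary just gets there without scanning.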

If we still need to step through the database one record at a time, we can now rely on dictionary iterators. In recent Python releases, a dictionary iterator produces one key in a for loop each time through (in earlier releases, call the keys method explicitly in the for loop: say db.keys() rather than just db):

 >>> for key in db:
         print key, '=>', db[key]['name']

 bob => Bob Smith
 sue => Sue Jones
 >>> for key in db:
         print key, '=>', db[key]['pay']

 bob => 33000.0
 sue => 50000

To visit all records, either index by key as you go:

 >>> for key in db:
         print db[key]['name'].split()[-1]
         db[key]['pay'] *= 1.10

 Smith
 Jones

or step through the dictionary's values to access records directly:

 >>> for record in db.values(): print record['pay']

 36300.0
 55000.0
 >>> x = [db[key]['name'] for key in db]
 >>> x
 ['Bob Smith', 'Sue Jones']
 >>> x = [rec['name'] for rec in db.values()]
 >>> x
 ['Bob Smith', 'Sue Jones']
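The order in which a dictionary yields its keys is not something to rely on in the Python releases this section targets, and later releases that preserve insertion order still do not sort the keys. When a predictable order matters, sorting the keys is one simple option, sketched here with a cut-down copy of the database:

```python
db = {'sue': {'name': 'Sue Jones', 'pay': 50000},
      'bob': {'name': 'Bob Smith', 'pay': 33000.0}}

# sorted() returns the keys as a new list in ascending order.
ordered = [(key, db[key]['name']) for key in sorted(db)]
```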

And to add a new record, simply assign it to a new key; this is just a dictionary, after all:

 >>> db['tom'] = dict(name='Tom', age=50, job=None, pay=0)
 >>>
 >>> db['tom']
 {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom'}
 >>> db['tom']['name']
 'Tom'
 >>> db.keys()
 ['bob', 'sue', 'tom']
 >>> len(db)
 3

Although our database is still a transient object in memory, it turns out that this dictionary-of-dictionaries format corresponds exactly to a system that saves objects permanently: the shelve (yes, this should be shelf, grammatically speaking, but the Python module name and term is shelve). To learn how, let's move on to the next section.
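As a brief preview only, and not the code the next section develops, the correspondence can be sketched with the standard library's shelve module. The file name people-shelve and the temporary directory are assumptions made so the example cleans up after itself; a real program would choose its own location.

```python
import os
import shelve
import tempfile

bob = dict(name='Bob Smith', age=42, pay=30000, job='dev')
sue = dict(name='Sue Jones', age=45, pay=40000, job='mus')

# Hypothetical file location inside a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), 'people-shelve')

db = shelve.open(path)       # a shelve is indexed by key, like a dictionary
db['bob'] = bob
db['sue'] = sue
db.close()                   # records are now saved in the file

db = shelve.open(path)       # reopen, as a later program run might
restored_name = db['bob']['name']
restored_keys = sorted(db.keys())
db.close()
```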




Programming Python
ISBN: 0596009259
Year: 2004
Pages: 270
Authors: Mark Lutz
