6.3. Exploring the WhatWhat Status Model
Now that you have a picture of how WhatWhat Status works, let's take a look at the code. The authors of WhatWhat Status have built a significantly more complicated database structure than our Bookmarker application, and although there should be plenty of familiar looking code, we also find quite a few new TurboGears features. In general, I find that if I start my code reviews by trying to understand the data model and working my way up through the controllers to the front end templates, I understand everything better. Here's a picture of the WhatWhat Status data model:
Figure 6.4. WhatWhat Status data diagram
As you can see, there's actually quite a bit going on in the WhatWhat Status model, and that's not the half of it because the WhatWhat Status authors have created a number of methods in the SQLObject classes that map to each of the fields in the database. Unfortunately, that means that we won't have the time or the space to cover every single class or method in depth, but we highlight all the interesting parts so that you can easily go over the rest of the model code at your leisure and understand everything that is happening.
There are quite a few imports at the top of the WhatWhat Status model.py file, which pull in all of SQLObject's column type classes:
from sqlobject import (SQLObject, UnicodeCol, StringCol, ForeignKey, MultipleJoin, IntCol, DateTimeCol, BoolCol, RelatedJoin, DatabaseIndex) from turbogears.database import PackageHub from datetime import datetime, timedelta
There are a few constants defined, but let's skip on down to the first table object, which defines the Person class, which is used to store information about WhatWhat Status users.
class Person(SQLObject): displayName = StringCol(length=255, alternateID=True) userId = StringCol(length=32, alternateID=True) password = StringCol() emailAddress = StringCol() groups = RelatedJoin("Groups", intermediateTable="tg_user_group", joinColumn="user_id", otherColumn="group_id") def _get_permissions( self ): perms = set() for g in self.groups: perms = perms | set(g.permissions) return perms
The Person class is pretty simple, but there are a couple of new things worth mentioning here.
The first is that SQLObject provides easy ways for you to customize the way your column objects are mapped to database tables, right in your model class definition. The authors of WhatWhat Status are using this SQLObject feature to customize the groups column class, by defining the name of the intermediate table that they want created, as well as the column names they want included in that table.
Remember, SQLObject enables you to define properties for a column. And we've already seen that the most important property on a RelatedJoin class is the first, which tells SQLObject which SQLObject class this RelatedJoin connects to (in this case) Groups.
You might think it's easier to just pass the Groups class in here, rather than a string, but it's possible that this class is going to have a circular reference to the Groups class, and that can create ugly problems at instantiation time. Passing a string allows SQLObject a fighting chance to resolve this for you.
There are other SQLObject class properties; WhatWhat Status uses the intermediateTable, joinColumn, and otherColumn properties to override the default names for the join table and its columns.
This is particularly useful when you want to connect to a legacy database, where you don't want your Python objects to be named after the existing column names.
The second thing to notice is the _get_ syntax; this is a SQLObject shortcut that can look like magic until you understand it. Any time you create a _get_something method, SQLObject automatically creates a something attribute for your class. When you try to retrieve the something attribute, _get_something is called, and the results are returned. This can be used to override attribute access for existing table rows or to create entirely new attributes (in this case, def _get_permissions).
This is required because Person replaces TurboGears.identity's User table and needs to provide the API that Identity is looking for in a user type object. We return to this method and delve into the details of identity, custom identity providers, and how WhatWhat Status user authentication/authorization works in Chapter 22, "Turbo-Gears Identity and Security."
class Groups(SQLObject): groupId = StringCol(length=16, alternateID=True) displayName = StringCol(length=255) created = DateTimeCol(default=datetime.now) # collection of all users belonging to this group users = RelatedJoin("Person", intermediateTable="tg_user_group", joinColumn="group_id", otherColumn="user_id") # collection of all permissions for this group permissions = RelatedJoin("Permission", joinColumn="group_id", intermediateTable="tg_group_permission", otherColumn="permission_id")
This code does a lot of the same things we've seen before, but there are a couple things worth looking at in a bit more depth in the Groups class.
In line 4, notice that we can use standard Python expressions when setting the default value for a particular column. Not only that, but the DateTimeCol knows enough to take a Python datetime object.
In line 7, we see the RelatedJoin that matches the one we saw earlier in the Person table. This allows our Identity system to use the group() method on any Person object to find all the groups that user is a member of, or the users method of any Group object to find all the people who are in that group.
The next class we are going to look at is by far the largest and most complex in our model. But don't worry, we've seen a lot of it before, and we'll go slowly and highlight each of the pieces that are new.
class Project(SQLObject): name = UnicodeCol(length=255, alternateID=True) description = UnicodeCol() archived = BoolCol(default=False) status = IntCol(default=STATUS_NOT_STARTED) parent_project = ForeignKey('Project') contact = ForeignKey('Person') issues = MultipleJoin('Issue', orderBy=['-impact', '-creation_date']) risks = MultipleJoin('Risk', orderBy=['-impact', '-creation_date']) questions = MultipleJoin('Question') notes = MultipleJoin('Note', orderBy='-creation_date') sub_projects = MultipleJoin('Project', joinColumn='parent_project_id') files = MultipleJoin('ProjectFile', orderBy='creation_date') idx_project = DatabaseIndex('parent_project') idx_contact = DatabaseIndex('contact')
A couple of new features here can save you a lot of time and hassle if you learn them:
The orderBy property works for any kind of column, and it can reduce the need to pass special sort-order information in when you call a select object. For example, WhatWhat Status always lists the issues for a particular project sorted first by impact and then by date. So, they add an orderBy property to the issues column to create a default order for the SQLObject result set that is returned through that join.
Because WhatWhat Status defines orderBy to be ['-impact', '-creation_date'], the results of a call to a Project object's issues method will return a set of issues sorted first by impact, in reverse (high to low) order, and then by creation date (again in reverse order).
The idx_project = DatabaseIndex('parent_project') line is useful only if you are using tg-admin sql create to generate your tables automatically from your SQLObject model classes. When you create a new database using sql create, this line adds an index for the parent_project column to the Project table, which can drastically increase performance on some select queries.
We look at this in more detail in Chapter 12, "Customizing SQLObject Behavior," but it's worth mentioning a couple of additional features of the DatabaseIndex syntax here. You can pass multiple columns to the DatabaseIndex method, and even to SQLObject's special sqlbuilder expressions, to make multicolumn indexes, or other "special" indexes. You can also require that the index (even a multicolumn index) be unique by adding a unique = True parameter.
The WhatWhat Status developers have added more than 20 additional methods to the Project class that will make common actions they want to take easier to handle in the controller. Here are a couple of examples:
def _get_recent_notes(self): now = datetime.now() delta = timedelta(15) return [note for note in self.notes if (now - note.creation_date <= delta) or (note.last_edit_date is not None and now - note.last_edit_date <= delta)] def _get_open_risks(self): return [risk for risk in self.risks if not risk.closed] def _get_closed_risks(self): return [risk for risk in self.risks if risk.closed]
By using _get_recent_notes, the controller can get back a list of all notes added or edited in the past 15 days for a particular project. Remember, SQLObject automaticallygeneratesa recent_notes attribute from the _get_recent_notes method, which always returns only the notes created or modified within the past 15 days. This returns a list of notes for each note in self.notes, which fits the filter criteria. Notice how using SQLObject's multiple join feature makes the call to the database totally transparent. You just have a project object and you call its notes method to get an iterator back, no SQL trickery needed.
If you aren't familiar with list comprehensions, here's what they do: They enable you to create a new list by filtering the contents of an existing list based on the expression that comes after the list.
The basic syntax of a list comprehension is: for item in iterable if expression. So you could write something like the following one liner to get a new list of all the items with a value greater than 1 in a list_or_iterator.
new_list for item in list_or_iterator if item > 1
Of course, using list comprehensions to filter a SQLObject results iterator is not the most high-performance way to go about getting a result set back from the database. It'll go to the database and retrieve all the members of that SQLObject result set, create a copy in memory, and only then run the list comprehension to filter the results down to what you want.
In general, the fastest way to get a list that contains only the notes related to this project added or edited in the past 15 days is to use SQLObject's query builder syntax to do this same query, using database-independent code to build a SQL query that only returns the rows you need. SQLObject's query builder enables you to create arbitrarily complex queries that perform similarly to native SQL, but also have database independence and give you the easy SQLSelect result set access methods that we've come to expect from our model objects.
In this case, there are unlikely to be more than a few dozen notes per project, so it's quite unlikely that this operation is going to be a bottleneck. So, it's better to do this the easy way in standard Python. However, if we were to profile WhatWhat Status later (as you learn about in Chapter 21, "Testing a TurboGears Application"), and discover that this method actually is a bottleneck, we could easily replace the contents of this method with an optimized version that uses the SQLObject query builder (without having to change a line of controller or view code).
In some cases, you might find that SQLObject's caching behavior works in your favor, particularly if you are doing a series of list comprehensions on the same results, because you avoid the round-trip overhead of another database call. (For more information on the theory behind this, see Martin Fowler's excellent book Patterns of Enterprise Application Architecture; for more information about SQLObject and performance, see Chapter 11, "Mastering SQLObject," and Chapter 12, "Customizing SQLObject Behavior").
Skipping over a few methods that don't do anything new, we find _total_questions has a slight twist on the same list-comprehension-as-results-filter idiom we've been discussing:
def _total_questions(self): list1 = [question for question in self.questions] list2 = [question for sub_project in self.sub_projects for question in sub_project.questions] return list1 + list2
At first glance, this seems to be exactly the same thing we've already seen; and for the most part, it is. However, the call to self.sub_project bears mentioning because sub_project is a multiple join against projects, so we aren't looking to another database table to find our sub_projects. Every sub_project is itself a project with all the same fields as any other project. So, we are now selecting all the sub_projects related to this project and iterating over them to get a list of questions, returning both lists, so _total_questions is actually returning all the questions associated with this project and with all of its sub_projects.