2.5. Step 3: Stepping Up to OOP

Let's step back for a moment and consider how far we've come. At this point, we've created a database of records: both the shelve and the per-record pickle file approaches of the prior section suffice for basic data storage tasks. As is, our records are represented as simple dictionaries, which provide easier-to-understand access to fields than do lists (by key, rather than by position). Dictionaries, however, still have some limitations that may become more critical as our program grows over time.

For one thing, there is no central place for us to collect record processing logic. Extracting last names and giving raises, for instance, can be accomplished with code like the following:

    >>> import shelve
    >>> db = shelve.open('people-shelve')
    >>> bob = db['bob']
    >>> bob['name'].split()[-1]             # get bob's last name
    'Smith'
    >>> sue = db['sue']
    >>> sue['pay'] *= 1.25                  # give sue a raise
    >>> sue['pay']
    75000.0
    >>> db['sue'] = sue
    >>> db.close()

This works, and it might suffice for some short programs. But if we ever need to change the way last names and raises are implemented, we might have to update this kind of code in many places in our program. In fact, even finding all such magical code snippets could be a challenge; hardcoding or cutting and pasting bits of logic redundantly like this in more than one place will almost always come back to haunt you eventually.

It would be better to somehow hide (that is, encapsulate) such bits of code. Functions in a module would allow us to implement such operations in a single place and thus avoid code redundancy, but still wouldn't naturally associate them with the records themselves. What we'd like is a way to bind processing logic with the data stored in the database in order to make it easier to understand, debug, and reuse.
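To make the module-function option concrete, here is a minimal sketch. The function names lastName and giveRaise and the sample record are ours for illustration (assumptions, not code from the prior section). The logic now lives in one place, but nothing ties the functions to the records they expect:

```python
# Hypothetical helper functions that could live in a module of their own.

def lastName(record):
    """Return the last word of a dictionary record's name field."""
    return record['name'].split()[-1]

def giveRaise(record, percent):
    """Increase a dictionary record's pay in place by the given fraction."""
    record['pay'] *= (1.0 + percent)

# A sample dictionary-based record (values are illustrative only)
sue = {'name': 'Sue Jones', 'age': 45, 'pay': 40000, 'job': 'music'}

giveRaise(sue, .25)
print('%s %s' % (lastName(sue), sue['pay']))

# Nothing stops a caller from passing in a dictionary that is not a
# person record at all; the mistake shows up only when a key lookup fails.
```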

Another downside to using dictionaries for records is that they are difficult to expand over time. For example, suppose that the set of data fields or the procedure for giving raises is different for different kinds of people (perhaps some people get a bonus each year and some do not). If we ever need to extend our program, there is no natural way to customize simple dictionaries. For future growth, we'd also like our software to support extension and customization in a natural way.

This is where Python's OOP support begins to become attractive:

Structure: With OOP, we can naturally associate processing logic with record data; classes provide both a program unit that combines logic and data in a single package and a hierarchy that allows code to be easily factored to avoid redundancy.
Encapsulation: With OOP, we can also wrap up details such as name processing and pay increases behind method functions; i.e., we are free to change method implementations without breaking their users.
Customization: And with OOP, we have a natural growth path. Classes can be extended and customized by coding new subclasses, without changing or breaking already working code.

That is, under OOP, we program by customizing and reusing, not by rewriting. OOP is an option in Python and, frankly, is sometimes better suited for strategic than for tactical tasks. It tends to work best when you have time for upfront planning, something that might be a luxury if your users have already begun storming the gates.

But especially for larger systems that change over time, its code reuse and structuring advantages far outweigh its learning curve, and it can substantially cut development time. Even in our simple case, the customizability and reduced redundancy we gain from classes can be a decided advantage.

2.5.1. Using Classes

OOP is easy to use in Python, thanks largely to Python's dynamic typing model. In fact, it's so easy that we'll jump right into an example: Example 2-14 implements our database records as class instances rather than as dictionaries.

Example 2-14. PP3E\Preview\person_start.py

    class Person:
        def __init__(self, name, age, pay=0, job=None):
            self.name = name
            self.age  = age
            self.pay  = pay
            self.job  = job

    if __name__ == '__main__':
        bob = Person('Bob Smith', 42, 30000, 'sweng')
        sue = Person('Sue Jones', 45, 40000, 'music')
        print bob.name, sue.pay
        print bob.name.split()[-1]
        sue.pay *= 1.10
        print sue.pay

There is not much to this class: just a constructor method that fills out the instance with data passed in as arguments to the class name. It's sufficient to represent a database record, though, and it can already provide tools such as defaults for pay and job fields that dictionaries cannot. The self-test code at the bottom of this file creates two instances (records) and accesses their attributes (fields); here is this file being run under IDLE:

    >>>
    Bob Smith 40000
    Smith
    44000.0

This isn't a database yet, but we could stuff these objects into a list or dictionary as before in order to collect them as a unit:

    >>> from person_start import Person
    >>> bob = Person('Bob Smith', 42)
    >>> sue = Person('Sue Jones', 45, 40000)
    >>> people = [bob, sue]                          # a "database" list
    >>> for person in people:
            print person.name, person.pay

    Bob Smith 0
    Sue Jones 40000
    >>> x = [(person.name, person.pay) for person in people]
    >>> x
    [('Bob Smith', 0), ('Sue Jones', 40000)]

Notice that Bob's pay defaulted to zero this time because we didn't pass in a value for that argument (maybe Sue is supporting him now?). We might also implement a class that represents the database, perhaps as a subclass of the built-in list or dictionary types, with insert and delete methods that encapsulate the way the database is implemented. We'll abandon this path for now, though, because it will be more useful to store these records persistently in a shelve, which already encapsulates stores and fetches behind an interface for us. Before we do, though, let's add some logic.
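As an aside, the database class mentioned above might look something like the following sketch. The class name PeopleDB and the exact behavior of its insert and delete methods are our own assumptions for illustration; the Person class is repeated so that the sketch runs on its own:

```python
class Person:
    def __init__(self, name, age, pay=0, job=None):
        self.name = name
        self.age  = age
        self.pay  = pay
        self.job  = job

class PeopleDB(dict):
    """Hypothetical in-memory database: a dictionary with insert/delete."""
    def insert(self, key, person):
        if key in self:                       # refuse to overwrite silently
            raise KeyError('duplicate key: %s' % key)
        self[key] = person
    def delete(self, key):
        del self[key]                         # KeyError if key is absent

db = PeopleDB()
db.insert('bob', Person('Bob Smith', 42))
db.insert('sue', Person('Sue Jones', 45, 40000))
db.delete('bob')

names = [person.name for person in db.values()]
print(names)
```

Because clients call only insert and delete, the storage behind those methods could later be swapped for something else without changing the calling code.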

2.5.2. Adding Behavior

So far, our class is just data: it replaces dictionary keys with object attributes, but it doesn't add much to what we had before. To really leverage the power of classes, we need to add some behavior. By wrapping up bits of behavior in class method functions, we can insulate clients from changes. And by packaging methods in classes along with data, we provide a natural place for readers to look for code. In a sense, classes combine records and the programs that process those records; methods provide logic that interprets and updates the data.

For instance, Example 2-15 adds the last-name and raise logic as class methods; methods use the self argument to access or update the instance (record) being processed.

Example 2-15. PP3E\Preview\person.py

    class Person:
        def __init__(self, name, age, pay=0, job=None):
            self.name = name
            self.age  = age
            self.pay  = pay
            self.job  = job
        def lastName(self):
            return self.name.split()[-1]
        def giveRaise(self, percent):
            self.pay *= (1.0 + percent)

    if __name__ == '__main__':
        bob = Person('Bob Smith', 42, 30000, 'sweng')
        sue = Person('Sue Jones', 45, 40000, 'music')
        print bob.name, sue.pay
        print bob.lastName()
        sue.giveRaise(.10)
        print sue.pay

The output of this script is the same as the last, but the results are being computed by methods now, not by hardcoded logic that appears redundantly wherever it is required:

    >>>
    Bob Smith 40000
    Smith
    44000.0

2.5.3. Adding Inheritance

One last enhancement to our records before they become permanent: because they are implemented as classes now, they naturally support customization through the inheritance search mechanism in Python. Example 2-16, for instance, customizes the last section's Person class in order to give a 10 percent bonus by default to managers whenever they receive a raise (any relation to practice in the real world is purely coincidental).

Example 2-16. PP3E\Preview\manager.py

    from person import Person

    class Manager(Person):
        def giveRaise(self, percent, bonus=0.1):
            self.pay *= (1.0 + percent + bonus)

    if __name__ == '__main__':
        tom = Manager(name='Tom Doe', age=50, pay=50000)
        print tom.lastName()
        tom.giveRaise(.20)
        print tom.pay

    >>>
    Doe
    65000.0

Here, the Manager class appears in a module of its own, but it could have been added to the person module instead (Python doesn't require just one class per file). It inherits the constructor and last-name methods from its superclass, but it customizes just the raise method. Because this change is being added as a new subclass, the original Person class, and any objects generated from it, will continue working unchanged. Bob and Sue, for example, inherit the original raise logic, but Tom gets the custom version because of the class from which he is created. In OOP, we program by customizing, not by changing.

In fact, code that uses our objects doesn't need to be at all aware of what the raise method does; it's up to the object to do the right thing based on the class from which it is created. As long as the object supports the expected interface (here, a method called giveRaise), it will be compatible with the calling code, regardless of its specific type, and even if its method works differently than others.

If you've already studied Python, you may know this behavior as polymorphism; it's a core property of the language, and it accounts for much of your code's flexibility. When the following code calls the giveRaise method, for example, what happens depends on the obj object being processed; Tom gets a 20 percent raise instead of 10 percent because of the Manager class's customization:

    >>> from person import Person
    >>> from manager import Manager
    >>> bob = Person(name='Bob Smith', age=42, pay=10000)
    >>> sue = Person(name='Sue Jones', age=45, pay=20000)
    >>> tom = Manager(name='Tom Doe',  age=55, pay=30000)
    >>> db = [bob, sue, tom]
    >>> for obj in db:
            obj.giveRaise(.10)         # default or custom

    >>> for obj in db:
            print obj.lastName(), '=>', obj.pay

    Smith => 11000.0
    Jones => 22000.0
    Doe => 36000.0
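The interface-based compatibility described above does not depend on inheritance at all. In the following sketch, Contractor is a hypothetical class of our own that is unrelated to Person; because it supplies lastName and giveRaise methods, the same loops process it without change. The flat 1000 increase is an arbitrary assumption chosen to make the difference visible:

```python
class Person:
    def __init__(self, name, age, pay=0, job=None):
        self.name = name
        self.age  = age
        self.pay  = pay
        self.job  = job
    def lastName(self):
        return self.name.split()[-1]
    def giveRaise(self, percent):
        self.pay *= (1.0 + percent)

class Contractor:
    """Hypothetical class: not a Person subclass, but the same interface."""
    def __init__(self, name, pay):
        self.name = name
        self.pay  = pay
    def lastName(self):
        return self.name.split()[-1]
    def giveRaise(self, percent):
        self.pay += 1000                   # flat increase; percent is ignored

db = [Person('Bob Smith', 42, 10000), Contractor('Ann Lee', 5000)]

for obj in db:
    obj.giveRaise(.10)                     # whichever version the object has
for obj in db:
    print('%s => %s' % (obj.lastName(), obj.pay))
```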

2.5.4. Refactoring Code

Before we move on, there are a few coding alternatives worth noting here. Most of these underscore the Python OOP model, and they serve as a quick review.

2.5.4.1. Augmenting methods

As a first alternative, notice that we have introduced some redundancy in Example 2-16: the raise calculation is now repeated in two places (in the two classes). We could also have implemented the customized Manager class by augmenting the inherited raise method instead of replacing it completely:

    class Manager(Person):
        def giveRaise(self, percent, bonus=0.1):
            Person.giveRaise(self, percent + bonus)

The trick here is to call back the superclass's version of the method directly, passing in the self argument explicitly. We still redefine the method, but we simply run the general version after adding 10 percent (by default) to the passed-in percentage. This coding pattern can help reduce code redundancy (the original raise method's logic appears in only one place and so is easier to change) and is especially handy for kicking off superclass constructor methods in practice.

If you've already studied Python OOP, you know that this coding scheme works because we can always call methods through either an instance or the class name. In general, the following are equivalent, and both forms may be used explicitly:

    instance.method(arg1, arg2)
    class.method(instance, arg1, arg2)

In fact, the first form is mapped to the second: when calling through the instance, Python determines the class by searching the inheritance tree for the method name and passes in the instance automatically. Either way, within giveRaise, self refers to the instance that is the subject of the call.
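The equivalence of the two call forms can be checked with a short sketch. The classes here are pared-down versions of the ones in this section (the constructor takes only a name and pay, which is our simplification), and the extra names are illustrative:

```python
class Person:
    def __init__(self, name, pay=0):
        self.name = name
        self.pay  = pay
    def giveRaise(self, percent):
        self.pay *= (1.0 + percent)

class Manager(Person):
    def giveRaise(self, percent, bonus=0.1):
        Person.giveRaise(self, percent + bonus)

bob = Person('Bob Smith', 30000)
sue = Person('Sue Jones', 30000)
bob.giveRaise(.10)                 # call through the instance
Person.giveRaise(sue, .10)         # call through the class, instance explicit
same = (bob.pay == sue.pay)        # both forms ran the same method

tom = Manager('Tom Doe', 30000)
ann = Manager('Ann Lee', 30000)
tom.giveRaise(.10)                 # inheritance search finds Manager's version
Person.giveRaise(ann, .10)         # naming the class picks Person's version
print('%s %s %s' % (same, tom.pay, ann.pay))
```

The last two calls show why the class form is useful inside Manager.giveRaise: naming Person explicitly bypasses the search that would otherwise find the Manager version again.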

2.5.4.2. Display format

For more object-oriented fun, we could also add a few operator overloading methods to our people classes. For example, a __str__ method, shown here, could return a string to give the display format for our objects when they are printed as a whole, much better than the default display we get for an instance:

    class Person:
        def __str__(self):
            return '<%s => %s>' % (self.__class__.__name__, self.name)

    tom = Manager('Tom Jones', 50)
    print tom                                # prints: <Manager => Tom Jones>

Here __class__ gives the lowest class from which self was made, even though __str__ may be inherited. The net effect is that __str__ allows us to print instances directly instead of having to print specific attributes. We could extend this __str__ to loop through the instance's __dict__ attribute dictionary to display all attributes generically.
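A generic display along those lines might be coded as in the following sketch. The output layout and the choice to sort attribute names are our own assumptions; the Manager class is left empty here only to show that the method is inherited:

```python
class Person:
    def __init__(self, name, age, pay=0, job=None):
        self.name = name
        self.age  = age
        self.pay  = pay
        self.job  = job
    def __str__(self):
        # Walk the instance's attribute dictionary instead of naming fields;
        # sorting the keys keeps the display order predictable.
        fields = ', '.join('%s=%r' % (key, self.__dict__[key])
                           for key in sorted(self.__dict__))
        return '<%s: %s>' % (self.__class__.__name__, fields)

class Manager(Person):
    pass                                   # inherits __str__ unchanged

text = str(Manager('Tom Jones', 50))
print(text)    # <Manager: age=50, job=None, name='Tom Jones', pay=0>
```

Attributes added to instances later would show up in the display automatically, with no change to __str__.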

We might even code an __add__ method to make + expressions automatically call the giveRaise method. Whether we should is another question; the fact that a + expression gives a person a raise might seem more magical to the next person reading our code than it should.
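For the curious, such an __add__ might look like the sketch below. Returning self so that the result can be assigned back is our own design assumption, and the fact that + changes its left operand in place is exactly the sort of surprise the preceding paragraph warns about:

```python
class Person:
    def __init__(self, name, age, pay=0, job=None):
        self.name = name
        self.age  = age
        self.pay  = pay
        self.job  = job
    def giveRaise(self, percent):
        self.pay *= (1.0 + percent)
    def __add__(self, percent):
        self.giveRaise(percent)            # '+' is routed to the raise logic
        return self                        # same object, changed in place

sue = Person('Sue Jones', 45, 40000)
result = sue + .10                         # runs sue.__add__(.10)
print(sue.pay)
```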

2.5.4.3. Constructor customization

Finally, notice that we didn't pass the job argument when making a manager in Example 2-16; if we had, it would look like this with keyword arguments:

 tom = Manager(name='Tom Doe', age=50, pay=50000, job='manager')

The reason we didn't include a job in the example is that it's redundant with the class of the object: if someone is a manager, their class should imply their job title. Instead of leaving this field blank, though, it may make more sense to provide an explicit constructor for managers, which fills in this field automatically:

    class Manager(Person):
        def __init__(self, name, age, pay):
            Person.__init__(self, name, age, pay, 'manager')

Now when a manager is created, its job is filled in automatically. The trick here is to call to the superclass's version of the method explicitly, just as we did for the giveRaise method earlier in this section; the only difference here is the unusual name for the constructor method.

2.5.4.4. Alternative classes

We won't use any of this section's three extensions in later examples, but to demonstrate how they work, Example 2-17 collects these ideas in an alternative implementation of our Person classes.

Example 2-17. PP3E\Preview\people-alternative.py

    """
    alternative implementation of person classes
    data, behavior, and operator overloading
    """

    class Person:
        """
        a general person: data+logic
        """
        def __init__(self, name, age, pay=0, job=None):
            self.name = name
            self.age  = age
            self.pay  = pay
            self.job  = job
        def lastName(self):
            return self.name.split()[-1]
        def giveRaise(self, percent):
            self.pay *= (1.0 + percent)
        def __str__(self):
            return ('<%s => %s: %s, %s>' %
                   (self.__class__.__name__, self.name, self.job, self.pay))

    class Manager(Person):
        """
        a person with custom raise
        inherits general lastname, str
        """
        def __init__(self, name, age, pay):
            Person.__init__(self, name, age, pay, 'manager')
        def giveRaise(self, percent, bonus=0.1):
            Person.giveRaise(self, percent + bonus)

    if __name__ == '__main__':
        bob = Person('Bob Smith', 44)
        sue = Person('Sue Jones', 47, 40000, 'music')
        tom = Manager(name='Tom Doe', age=50, pay=50000)
        print sue, sue.pay, sue.lastName()
        for obj in (bob, sue, tom):
            obj.giveRaise(.10)                 # run this obj's giveRaise
            print obj                          # run common __str__ method

Notice the polymorphism in this module's self-test loop: all three objects share the last-name and printing methods (and all are ultimately initialized by the Person constructor, which the Manager constructor calls), but the raise method called depends upon the class from which an instance is created. When run, Example 2-17 prints the following to standard output: the manager's job is filled in at construction, we get the new custom display format for our objects, and the new version of the manager's raise method works as before:

    <Person => Sue Jones: music, 40000> 40000 Jones
    <Person => Bob Smith: None, 0.0>
    <Person => Sue Jones: music, 44000.0>
    <Manager => Tom Doe: manager, 60000.0>

Such refactoring (restructuring) of code is common as class hierarchies grow and evolve. In fact, as is, we still can't give someone a raise if his pay is zero (Bob is out of luck); we probably need a way to set pay, too, but we'll leave such extensions for the next release. The good news is that Python's flexibility and readability make refactoring easy: it's simple and quick to restructure your code. If you haven't used the language yet, you'll find that Python development is largely an exercise in rapid, incremental, and interactive programming, which is well suited to the shifting needs of real-world projects.
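To see why Bob is out of luck, and what a way to set pay might look like, consider this small sketch. The setPay method is a hypothetical addition of ours, not part of any example in this section:

```python
class Person:
    def __init__(self, name, age, pay=0, job=None):
        self.name = name
        self.age  = age
        self.pay  = pay
        self.job  = job
    def giveRaise(self, percent):
        self.pay *= (1.0 + percent)
    def setPay(self, pay):                 # hypothetical extension
        self.pay = pay

bob = Person('Bob Smith', 44)
bob.giveRaise(.10)                         # zero times anything is still zero
stuck = bob.pay

bob.setPay(30000)                          # give Bob a starting salary
bob.giveRaise(.10)                         # now the raise has an effect
print('%s %s' % (stuck, bob.pay))
```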

2.5.5. Adding Persistence

It's time for a status update. We have now encapsulated customizable implementations of our records and their processing logic in the form of classes. Making our class-based records persistent is a minor last step. We could store them in per-record pickle files again; a shelve-based storage medium will do just as well for our goals and is often easier to code. Example 2-18 shows how.

Example 2-18. PP3E\Preview\make_db_classes.py

    import shelve
    from person import Person
    from manager import Manager

    bob = Person('Bob Smith', 42, 30000, 'sweng')
    sue = Person('Sue Jones', 45, 40000, 'music')
    tom = Manager('Tom Doe',  50, 50000)

    db = shelve.open('class-shelve')
    db['bob'] = bob
    db['sue'] = sue
    db['tom'] = tom
    db.close()

This file creates three class instances (two from the original class and one from its customization) and assigns them to keys in a newly created shelve file to store them permanently. In other words, it creates a shelve of class instances; to our code, the database looks just like a dictionary of class instances, but the top-level dictionary is mapped to a shelve file again. To check our work, Example 2-19 reads the shelve and prints fields of its records.

Example 2-19. PP3E\Preview\dump_db_class.py

    import shelve
    db = shelve.open('class-shelve')
    for key in db:
        print key, '=>\n  ', db[key].name, db[key].pay

    bob = db['bob']
    print bob.lastName()
    print db['tom'].lastName()

Note that we don't need to reimport the Person class here in order to fetch its instances from the shelve or run their methods. When instances are shelved or pickled, the underlying pickling system records both instance attributes and enough information to locate their classes automatically when they are later fetched (the class's module simply has to be on the module search path when an instance is loaded). This is on purpose; because the class and its instances in the shelve are stored separately, you can change the class to modify the way stored instances are interpreted when loaded (more on this later in the book). Here is the shelve dump script running under IDLE just after creating the shelve:

    >>>
    tom =>
       Tom Doe 50000
    bob =>
       Bob Smith 30000
    sue =>
       Sue Jones 40000
    Smith
    Doe
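The separation of stored instances from their class can be demonstrated with pickle directly. This sketch is an illustration under stated assumptions: rather than editing a real person.py file between runs, it builds a throwaway in-memory module named demo_person (a hypothetical name) and replaces the class inside it after an instance has been pickled. The whole-number rounding in the second version is likewise made up for the demonstration:

```python
import pickle
import sys
import types

# Stand-in for a module file on the search path (hypothetical name).
mod = types.ModuleType('demo_person')
sys.modules['demo_person'] = mod

# Version 1 of the class, defined inside the stand-in module.
exec('''
class Person:
    def __init__(self, name, pay=0):
        self.name = name
        self.pay  = pay
    def giveRaise(self, percent):
        self.pay *= (1.0 + percent)
''', mod.__dict__)

# The pickled data holds the instance's attributes plus the names of its
# module and class, but none of the class's code.
data = pickle.dumps(mod.Person('Sue Jones', 40000))

# Version 2 replaces the class: raises now yield whole numbers.
exec('''
class Person:
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1.0 + percent))
''', mod.__dict__)

sue = pickle.loads(data)                   # class is located again by name
sue.giveRaise(.25)                         # runs the version 2 method
print('%s %r' % (sue.name, sue.pay))
```

The stored attributes come back unchanged, but the object picks up whatever behavior the class has at load time.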

As shown in Example 2-20, database updates are as simple as before, but dictionary keys become object attributes and updates are implemented by method calls, not by hardcoded logic. Notice how we still fetch, update, and reassign to keys to update the shelve.

Example 2-20. PP3E\Preview\update_db_class.py

    import shelve
    db = shelve.open('class-shelve')

    sue = db['sue']
    sue.giveRaise(.25)
    db['sue'] = sue

    tom = db['tom']
    tom.giveRaise(.20)
    db['tom'] = tom
    db.close()

And last but not least, here is the dump script again after running the update script; Tom and Sue have new pay values, because these objects are now persistent in the shelve. We could also open and inspect the shelve by typing code at Python's interactive command line; despite its longevity, the shelve is just a Python object containing Python objects.

    >>>
    tom =>
       Tom Doe 65000.0
    bob =>
       Bob Smith 30000
    sue =>
       Sue Jones 50000.0
    Smith
    Doe

Tom and Sue both get a raise this time around, because they are persistent objects in the shelve database. Although shelves can store simpler object types such as lists and dictionaries, class instances allow us to combine both data and behavior for our stored items. In a sense, instance attributes and class methods take the place of records and processing programs in more traditional schemes.
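The fetch, update, and reassign pattern used in Example 2-20 matters because, by default, a shelve hands back a fresh copy of the stored object on each fetch. The following sketch shows the difference; it uses plain dictionaries as stand-in records and a temporary directory (both our assumptions) so that it runs on its own without touching the class-shelve file:

```python
import os
import shelve
import shutil
import tempfile

tmpdir = tempfile.mkdtemp()                # scratch directory for the demo
path = os.path.join(tmpdir, 'demo-shelve')

db = shelve.open(path)
db['sue'] = {'name': 'Sue Jones', 'pay': 40000}

# Changing a fetched object in place alters only a temporary copy;
# nothing is written back to the shelve.
db['sue']['pay'] = 50000
lost = db['sue']['pay']                    # still the original value

# Fetch, update, and reassign to the key to store the change.
sue = db['sue']
sue['pay'] = 50000
db['sue'] = sue
kept = db['sue']['pay']

db.close()
shutil.rmtree(tmpdir)                      # clean up the scratch files
print('%s %s' % (lost, kept))
```

Opening the shelve with writeback=True is an alternative that caches fetched objects and writes them back on close, at some cost in memory and close time.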

2.5.6. Other Database Options

At this point, we have a full-fledged database system: our classes simultaneously implement record data and record processing, and they encapsulate the implementation of the behavior. And the Python pickle and shelve modules provide simple ways to store our database persistently between program executions. This is not a relational database (we store objects, not tables, and queries take the form of Python object processing code), but it is sufficient for many kinds of programs.

If we need more functionality, we could migrate this application to even more powerful tools. For example, should we ever need full-blown SQL query support, there are interfaces that allow Python scripts to communicate with relational databases such as MySQL, PostgreSQL, and Oracle in portable ways.
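As a rough preview of what the SQL route looks like, the sketch below stores the same three records in a relational table. It uses the sqlite3 module from the standard library purely as a stand-in, because it follows the same general database API pattern (connect, cursor, execute, fetch) as the interfaces for the systems named above; the table layout and queries are our own assumptions:

```python
import sqlite3

conn = sqlite3.connect(':memory:')         # throwaway in-memory database
curs = conn.cursor()
curs.execute('create table people '
             '(name text, age integer, pay real, job text)')

rows = [('Bob Smith', 42, 30000, 'sweng'),
        ('Sue Jones', 45, 40000, 'music'),
        ('Tom Doe',   50, 50000, 'manager')]
# '?' is sqlite3's placeholder style; other database modules may differ.
curs.executemany('insert into people values (?, ?, ?, ?)', rows)
conn.commit()

# Updates and queries are SQL strings rather than method calls on objects.
curs.execute('update people set pay = pay * 1.25 where name = ?',
             ('Sue Jones',))
curs.execute('select name, pay from people where pay >= ? order by name',
             (50000,))
result = curs.fetchall()
conn.close()
print(result)
```

Note the trade-off: the raise logic now lives in an SQL statement instead of a giveRaise method, so the class-based customization shown in this section no longer applies automatically.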

Moreover, the open source ZODB system provides a more comprehensive object database for Python, with support for features missing in shelves, including concurrent updates, transaction commits and rollbacks, automatic updates on in-memory component changes, and more. We'll explore these more advanced third-party tools in Chapter 19. For now, let's move on to putting a good face on our system.