Section 10.1. ORM Basics | Rapid Web Applications with TurboGears: Using Python to Create Ajax-Powered Sites

10.1. ORM Basics

An ORM is a software layer that maps data stored in a relational database to an object model in some object-oriented programming language.

You probably already know about relational databases. They are powerful tools to store data and to help ensure the integrity of that data. The theory behind relational databases was first explored by E. F. Codd in 1970, and in the years since then, various products that implement relational theory have grown up to become the dominant solution for storing and retrieving persistent data.

Relational databases can be accessed using the Structured Query Language (SQL), and every popular programming language has libraries that allow manipulating data in relational database management systems (RDBMSs) such as Postgres, MySQL, MSSQL Server, and Oracle.

Not long after relational algebra was invented and the first RDBMSs started appearing, another similar rise from obscurity to market dominance took place. Procedural programming began to give way to object-oriented programming, which allowed the bundling of data and actions into objects.

The problem we face is that when we write object-oriented applications, we want to manipulate the world through object models and not through the totally different paradigm of relational algebra. The ORM attempts to solve this problem by automatically mapping a particular DB schema to a corresponding object model in a specific language. For example, an employee table that contains columns such as employee_id, department_id, name, and manager, will be mapped to an Employee class and the columns will be mapped to properties/attributes.

Note

There are also generic object-oriented application programming interfaces (APIs) for databases such as ADO.NET. These APIs enable the developer to access the database through objects and get objects as results of queries, but the level of abstraction is lower. The objects these APIs expose are tables, row/record sets, columns, and constraints.

10.1.1. Who Needs ORM?

It turns out that accessing data through object models (at least in object-oriented programming languages) is a popular approach. It removes the cognitive dissonance between totally different paradigms, it allows developers to think about everything in terms of objects, and in many cases, it abstracts away the particular database implementation so that it's easy to switch to another vendor. Many developers who don't use an ORM solution end up doing the mapping manually. Typically, such developers will have a data access layer in their code that accesses the DB through SQL or another API and constructs an object model based on the results. Developing this manual ORM layer is a tedious and error-prone job. Any change to the schema or the object model requires synchronization with the other component and in the code that performs the mapping. ORM is really an automated implementation of the Data Access Object (DAO) design pattern.

10.1.2. ORM Alternatives

You have several alternatives to ORM. First of all, you can just use SQL or some other non-object-oriented API and live with it. You can also generate the data access layer manually. Another option is to use object-oriented DB or XML. All of these are viable solutions and may be more suitable for certain situations. However, ORM is the mainstream approach these days.

10.1.3. ORM Negatives

ORM is not a silver bullet. The biggest problem with ORM is the impedance mismatch between the relational paradigm and the object-oriented paradigm. This means that sometimes there is not an easy way to map a relational schema to an object model. For example, inheritance is an object-oriented feature that doesn't have a counterpart in the relational paradigm.

There are ways to simulate inheritance, but they can introduce subtle issues. Another common mapping problem is how to handle the database NULL value for static languages. Microsoft solved this problem in the .NET 2.0 by introducing Nullable types. This means that objects of any type (such as an integer) can be NULL and not just object references. SQLObject simply uses Python's "None" to mean NULL. This is natural and doesn't require any kludge or extension to the language or a special SQLObject concept.

ORMs can also introduce subtle performance problems when you don't know what is happening under the covers. If you are working with huge data sets, or you find that data access is a bottleneck for your application, you might want to know what queries are being generated by the ORM.

SQLObject provides a debug mechanism for exactly this purpose, which can be turned on easily by appending ?debug=True to the end of your database URI in the dev.cfg file. You will probably want to turn this on while you go through the examples in this chapter so that you can see exactly what SQL is being generated.