9.3 Designing a Database Interface

One central issue in the approach just described is the design of the database interface. When using the test-first approach, it will be built gradually and can constantly be adapted to the real-world requirements of the higher layers. Nevertheless, in this evolutionary approach, too, we should be careful to observe consistent naming and uniform semantics of the supplied interface methods. The design of CRMPersistence was based on the following guidelines:

We distinguish between independent and dependent objects. Independent objects (i.e., Customer and CustomerCategory in our above example) get methods to create (createXXX), override (writeXXX), and delete (deleteXXX) in and from the persistence interface. Dependent objects (i.e., only CustomerContact in this example) are stored, overwritten, and deleted through their mother objects. We could also treat Customer as a dependent class, because its instances depend on the existence of the assigned category object. However, we decided not to allow navigation from the category to the customer and to permit no automatic deletion of all customer objects belonging to a category.
Query and retrieval methods are added to the interface as needed (i.e., allCategories(), getCustomer(...), and allCustomers(...) in this example). These methods can return both independent and dependent objects.
When an object is created, written, and deleted, then this will automatically create, write, or delete all dependent objects. We have to be particularly careful about recursion when implementing this property.
All methods defined in the interface are wrapped by transactions. This means that both getter and setter access to persistent objects outside of transactions are allowed. It also means that the consistency and actuality of data is guaranteed only at the time when an interface method is used. This simplified assumption is sufficient for many applications and simplifies work with a persistence interface.
All checked exceptions (see Glossary) occurring during storage are converted into a special exception type (i.e., CRMException in this case). In particular, this signals when consistency conditions in our persistent domain model are violated. For example, an attempt to delete a CustomerCategory object should fail when this category still contains persistent Customer instances.
If necessary, the implementation takes care of maintaining the identity of an object. In this example, we don't insist on identity (==), but content ourselves with equality (equals(...)).

These basic assumptions have to match the requirements of our application. At times, both the design and the implementation of a persistence interface are more difficult due to special requirements. We will look at three typical difficulties in the sections that follow: transactions visible to the outside, ad hoc queries, and object-centered persistence.

Transactions

The transaction concept plays a central role in databases. So far, we have assumed that encapsulation of single persistence calls in transactions is sufficient. However, there are situations where transactions have to be available from the outside, for example, to execute several persistent actions in a transaction-protected manner. From the technical interface perspective, we could use an additional transaction interface in such cases. In the simplest case, we would use one single method:

 public interface CRMTransaction {    Object run() throws Exception; }

In addition, CRMPersistence has to be extended by one (or several) methods to execute the transactions:

 public interface CRMPersistence {    ...    Object executeTransaction(CRMTransaction transaction)       throws CRMException; }

In our application code, a transaction call would then look like this:

 public Set allCustomersFirstCategory() throws CRMException {    CRMTransaction t = new CRMTransaction() {       public Object run() throws CRMException {          Set categories = persistence.allCategories();          CustomerCategory cat =             CustomerCategory) categories.get(0);          return persistence.allCustomers(cat);       }    };    return (Set) persistence.executeTransaction(t); }

The situation in a real-world implementation dealing with a database or another persistence framework can be easy or very difficult, depending on the availability of nested transactions and various transaction types. In contrast, the implementation in DummyCRMPersistence is very easy:

 public Object executeTransaction(CRMTransaction transaction)    throws CRMException {    try {       return transaction.run();    } catch (CRMException crmex) {       throw crmex;    } catch (Exception ex) {       throw new CRMException(ex.getMessage());    } }

The above interface includes the transaction's commit implicitly. Alternatively, we could offer a commit(), a rollback(), or even the creation of subtransactions in the transaction object (i.e., CRMTransaction in this example). This would allow us an even finer control of the transactional behavior.

However, we should not forget that making transactions visible means that the program code will be more complicated and inflated. This means that we should make transactions visible only if we cannot do without explicit transactions. It is often sufficient to extend the persistence interface by a few parameters or methods to avoid this additional complexity.

Ad Hoc Queries

Many applications are characterized by the fact that SQL code—mostly specialized and optimized queries—are spread over all parts of a program. This "decentralization" is often motivated by a need to optimize queries. From the designer's perspective, there are several drawbacks inherent in this approach. First, code of the business logic becomes dependent upon a specific technology (SQL database), a specific database, and a fixed schema. Second, code from an initially specific level, namely, the production of persistence, is distributed over numerous classes, packages, and layers, instead of concentrating it in one place.

The separation of the persistence interface from the implementation introduced in this chapter prevents SQL queries from reaching the upper layers. Nevertheless, developers sometimes get to a point where they have to extend the persistence interface by a new query method for each new functionality. This is the reason why most persistence frameworks and object-oriented databases offer a way to build ad hoc queries against the database. We could give in to this pressure and extend our CRMPersistence interface by the following method:

 Set executeSqlQuery(String queryString) {}

But we would let ourselves in for the drawbacks just described. In addition, we would spoil our option to easily test this method's client code. In the simplest case, we would have to verify SQL strings for correct syntax and semantics. For this reason, such an expansion should be used as a last resort.

A better and normally sufficient way is to build a very small query language and specialize it for our application. A very simple example is the method we already have: allCustomers(CustomerCategory category). Passing an example object, for example, an object with its attribute set so it serves as search parameter and wildcard, is a little more complex. We can expand this query language as needed—to include our own classes for searching value ranges and/or combinations and much more.

The benefit of such a program-specific query language is twofold: it is independent of any persistence mechanism and data schema, and it is independent in representing queries by use of the programming language. Compared to string-based languages like SQL, a program-specific query language improves testability and allows the compiler to remove many query input errors upfront. As long as we keep the query definition within the Java source we can even stick to a pure object-oriented description of it. In case we have to externalize the query definitions, in order to make them configurable for instance, we can build a simple textual parser as described in Building Parsers with Java [Metsker01].

Sometimes there is a requirement to optimize queries for a specific database, a specific schema, or a specific physical database layout for performance reasons. In my experience, it is sufficient to offer separate time-critical queries in the persistence interface to treat and optimize them separately, whereas the majority of all ad hoc queries is still created generically from the query language.

Object-Centered Persistence

So far in this chapter, we directed each access to the persistent storage medium over an interface (CRMPersistence). This approach has several drawbacks:

The implementing object must be passed to all objects that need to invoke persistence methods, which causes the number of method or constructor parameters to increase.
In a large persistent domain model, the number of methods available in the interface would soon exceed the limit of clarity.

One approach to mitigate these drawbacks is to move part of the persistence methods towards the objects: writeXXX() and deleteXXX() become write() and delete() in class XXX. And if we enable write() to store newly created objects, we will no longer need the createXXX() method. On the other hand, we don't want to lose the independence of our persistent objects. Figure 9.2 shows a possible solution to this problem.

click to expand
Figure 9.2: An object-centered persistence interface.

Two points are important in this design:

The original interface CRMPersistence is split into two interfaces: CRM-Persistence for access from the outside and InternalCRMPersistence for access by the persistent objects themselves.
All independent persistent objects know their "internal" persistence implementation, to which they pass the write() and delete() invocations.

There is one constraint left in this approach: the internal persistence interface can tend to become (too) big. In this case, the next scaling step consists of splitting the internal interface, normally into one interface for each independent object. Splitting the internal interface means that the implementation class should also be split, eventually leading to a much more complex design.

For this reason, we should stick to the principle that the simplest solution is the best. Consequently, evolutionary designers would begin with a centralized persistence interface, then swing round to object-centered persistence at some point in time, and eventually split both the internal interface and the implementation at a later stage. An interesting experience report about the evolution of a persistence framework can be found in Jim Little's text [Little01]. Therein Little describes how a complex architecture, which looked good on paper but was unusable in the real world, was migrated step by step into a simpler, yet superior design by applying an evolutionary and test-driven approach.