2.4 Rolling Your Own Persistence Mapping Layer


The preceding sections explored some mapping issues specific to object-relational mapping. The following section looks into topics that are not strictly O/R-related, but apply to any persistence framework that maps Java objects to and from an external datastore, be it a relational database, an object database, an XML repository, or any other imaginable data store.

Again, such issues are addressed by existing products and do not need to be a source of concern for application-level developers, but an understanding of them, or at least an awareness, may be of interest.

Furthermore, such issues have to be addressed if you want to realize your own JDO-like persistence layer with good performance characteristics. This section concludes with a short "build versus buy" discussion.

2.4.1 Caching

Now that objects can be instantiated on demand and unique references are returned for unique data in the database, it is a small step to realize further caching. Why should a call to getAuthor in the above example do anything at all, if the author object was previously retrieved? A reference to the in-memory object can be returned again. The following pseudo code illustrates the idea:

  • Check whether the book instance already contains valid data. If yes, return the Author reference.

  • Take the book's BOOKID .

  • Create a JDBC statement like

      SELECT * FROM BOOKS WHERE BOOKID = <id>
  • Copy the necessary fields of the result record into the book instance.

  • Create an Author instance.

  • Copy the AUTHORID from the result record into the Author instance.

  • Mark the Author instance so that it is loaded when a get() method is called.

  • Return the author instance.

In this case, the on-demand loading of objects is split into instantiation of objects that are just placeholders for corresponding database identities and the actual data retrieval from the database.
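
To make the split concrete, a hand-rolled layer might implement the placeholder idea roughly as follows. This is only a minimal sketch: the Author class layout and the loadFromDatabase() helper are assumptions of the example, not the actual code of the book/author model used above.

    // Minimal sketch of on-demand loading; names are illustrative only.
    public class Author {
        private final long authorId;     // database identity, always present
        private String name;             // actual data, loaded lazily
        private boolean loaded = false;  // marks whether data has been fetched

        Author(long authorId) {
            this.authorId = authorId;    // placeholder: identity only, no data yet
        }

        public String getName() {
            if (!loaded) {
                loadFromDatabase();      // hits the AUTHORS table on first access
                loaded = true;
            }
            return name;
        }

        private void loadFromDatabase() {
            // e.g. SELECT NAME FROM AUTHORS WHERE AUTHORID = ? via a PreparedStatement,
            // then copy the result columns into the fields of this instance.
        }
    }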

Whenever objects are modified, the actual update operation on the database can be delayed until all work related to the object graph is done. This saves unnecessary update operations on the same object or on dependent objects. Diverse solutions exist for detecting changes to the objects of an object graph, and they are realized in O/R mapping tools: one can copy objects and compare each original copy with the application object through Java reflection, or a tool can generate set and get methods for fields to track modifications and load data on demand.
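
For example, assuming generated or hand-written accessors, a tool can simply record a dirty flag in every setter and flush only dirty objects at commit time. Class and field names in this sketch are illustrative:

    // Sketch: tracking modifications through setters instead of comparing snapshots.
    public class Book {
        private String title;
        private boolean dirty = false;   // set whenever application code changes a field

        public void setTitle(String title) {
            this.title = title;
            dirty = true;                // remember that this object needs an UPDATE
        }

        public boolean isDirty() {
            return dirty;
        }

        void markClean() {
            dirty = false;               // called by the layer after a successful flush
        }
    }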

What all these solutions have in common is that the application has the notion of some point in time at which all modifications have to be copied out to the datastore. Obviously, the same is true for reading: at some point, cached objects need to be thrown away (cache eviction) and data has to be reloaded. Various strategies exist, usually tied to a time limit or memory exhaustion, and often implemented in Java using SoftReference, WeakReference, and the like.
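
A minimal sketch of such a cache, built on java.lang.ref.SoftReference, shows the idea: the garbage collector may clear entries under memory pressure, at which point the data simply has to be reloaded from the datastore.

    import java.lang.ref.SoftReference;
    import java.util.HashMap;
    import java.util.Map;

    // Sketch of a cache whose entries the JVM may evict under memory pressure.
    public class SoftCache<K, V> {
        private final Map<K, SoftReference<V>> entries = new HashMap<>();

        public V get(K key) {
            SoftReference<V> ref = entries.get(key);
            return (ref != null) ? ref.get() : null;   // null if never cached or already evicted
        }

        public void put(K key, V value) {
            entries.put(key, new SoftReference<>(value));
        }
    }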

2.4.2 Transactional database access and transactional objects

The above-mentioned cache consistency is directly connected to transactional database access. Developers trust a database system and rely on its ACID properties. A transactional system fulfilling these properties guarantees that it can commit several pieces of work as a single unit: either everything or nothing is done (atomicity), the work is isolated from other, concurrent parts of the system (isolation), and committed work is stored persistently, so that the system can fail afterward without data loss (durability).

A typical workflow with transactional access looks like the following:

  • Start a new transaction.

  • Look up a first instance as a root of the object graph by using a query or provided application identity.

  • Navigate through the object graph.

  • Modify or delete objects.

  • Add new objects.

  • Commit or abort the transaction.
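
As a preview of how this workflow looks in JDO, which is covered in detail in later chapters, the following sketch maps it onto the JDO API. The Book and Author classes, their fields, the filter string, and the already configured PersistenceManagerFactory pmf are assumptions of this example.

    // Uses javax.jdo.PersistenceManager, Transaction, and Query.
    PersistenceManager pm = pmf.getPersistenceManager();
    Transaction tx = pm.currentTransaction();
    try {
        tx.begin();                                       // start a new transaction

        // Look up a root instance via a query.
        Query query = pm.newQuery(Book.class, "title == \"Core JDO\"");
        Collection books = (Collection) query.execute();
        Book book = (Book) books.iterator().next();

        book.getAuthor().setName("New Name");             // navigate and modify
        pm.makePersistent(new Book("Another Title"));     // add a new object

        tx.commit();                                      // or tx.rollback() to abort
    } finally {
        if (tx.isActive()) {
            tx.rollback();                                // abort on any failure
        }
        pm.close();
    }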

Beyond the "simple" usage of underlying datastore transactions as just described, some persistence layers (and JDO certainly does, as we'll see later) add a real transactional objects mechanism. Pure Java objects clearly are not transactional. For example, imagine that a transaction is started, an object's attributes are changed, and the transaction is then aborted. The Java object would keep the old, now invalid values from within a transaction that was rolled back. In a persistence framework with true transactional object support, the values of the attributes of the object itself would "roll back" as well.

2.4.3 Locking

Another issue, somewhat related to correct transaction handling in persistence layers, is locking. What happens in the case of concurrent read and write access to the same object? Different strategies exist, as we examine in the transaction chapter, but whatever the details, a persistence layer again needs to make an extra effort for this purpose.

For example, to support optimistic locking on a relational database, an extra timestamp column or incrementing numeric counter column is often introduced on each table, kept up to date, and checked. For pessimistic locking, the non-standard "... FOR UPDATE" SQL syntax may have to be used. Such underlying locking mechanisms must be implemented transparently to the application logic and translated to the respective API.
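
With JDBC, the optimistic variant typically boils down to an UPDATE whose WHERE clause checks the version counter read when the object was loaded; if no row is affected, another transaction modified the record first. The table and column names in this sketch are assumptions:

    // Sketch of an optimistic update: VERSION is the extra counter column.
    String sql = "UPDATE BOOKS SET TITLE = ?, VERSION = VERSION + 1 "
               + "WHERE BOOKID = ? AND VERSION = ?";
    try (PreparedStatement stmt = connection.prepareStatement(sql)) {
        stmt.setString(1, newTitle);
        stmt.setLong(2, bookId);
        stmt.setInt(3, versionReadAtLoadTime);   // version seen when the object was loaded
        if (stmt.executeUpdate() == 0) {
            // No row matched: someone else updated the record; report a conflict.
            throw new IllegalStateException("Optimistic lock conflict for book " + bookId);
        }
    }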

2.4.4 Arrays, sets, lists, and maps

The Java language, and even more so the standard java.util collection library, offers various ways of modeling associations between objects, from simple arrays to Sets, Lists, and finally Maps.

Different datastores have different and sometimes limited direct support for such models. Relational datastores, for example, out of the box (with the ubiquitous "join table in the middle") really model only what a correct Java model would express as a Set, while in XML-to-object mapping a collection of objects is always ordered, thus delivering what Java would model with a List type.

A persistence layer needs to provide mapping of such object collections in order to ensure that associations work exactly as their Java interface contracts require and perform well for large amounts of data.
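
For example, an unordered association on the Java side maps naturally onto a join table on the relational side. The class, table, and column names below are only illustrative:

    import java.util.HashSet;
    import java.util.Set;

    // Java side: an unordered association without duplicates is naturally a Set.
    public class Author {
        private Set<Book> books = new HashSet<>();
        public Set<Book> getBooks() { return books; }
    }

    // Relational side (illustrative DDL): the "join table in the middle".
    //   CREATE TABLE AUTHOR_BOOKS (
    //       AUTHORID BIGINT NOT NULL,
    //       BOOKID   BIGINT NOT NULL,
    //       PRIMARY KEY (AUTHORID, BOOKID)
    //   );
    // Mapping a List instead would additionally require a position column to preserve order.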

2.4.5 Performance and efficiency

Questions of performance and efficiency arise with many aspects of the implementation of a persistence layer, including the following:

  • How are attributes of the persistent class read and written? Accessing object fields through reflection is slow; JDO implementations, for example, generally do not use Java reflection at all. Other persistence layers take different approaches.

  • How are large query results (collections of objects) handled? Is everything loaded at once, or is some sort of Cursor or Iterator architecture used? Is pre-fetching a certain number of initial records possible (like SQL's "OPTIMIZE FOR x ROWS" or Oracle's "/*+ FIRST_ROWS */" hint syntax)?

  • How are large associations between two objects handled? Are partial object reading, basic lazy loading of referenced objects, and "non-default fetch group" attributes supported? Is a Set or List always fully loaded into memory, are there optimized framework-specific implementations of such interfaces, or are techniques such as Java proxies used? Are such purely performance-enhancing optimizations transparent to an application using the framework, or do some sort of ValueObject/DataTransferObject-type classes have to be used explicitly?

  • Is there any optimization before interacting with the underlying database? In other words, are "redundant" operations within one transaction (e.g., multiple sets of the same attribute) recognized and simplified? Are protocol-specific features utilized, e.g., for O/R mapping through JDBC, reuse of prepared statements, batch statements, and so on? An example of the latter follows this list.
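
To illustrate the last point, a mapping layer that flushes many new objects can reuse a single PreparedStatement and send the inserts as a JDBC batch rather than issuing individual statements. The table, columns, and accessor names are assumptions of this sketch:

    // Sketch: one reused PreparedStatement plus JDBC batching for a flush of many objects.
    String sql = "INSERT INTO BOOKS (BOOKID, TITLE, AUTHORID) VALUES (?, ?, ?)";
    try (PreparedStatement stmt = connection.prepareStatement(sql)) {
        for (Book book : booksToInsert) {
            stmt.setLong(1, book.getId());
            stmt.setString(2, book.getTitle());
            stmt.setLong(3, book.getAuthorId());
            stmt.addBatch();                 // queue the insert instead of executing immediately
        }
        stmt.executeBatch();                 // one executeBatch call instead of one execute per row
    }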

2.4.6 Building versus buying a persistence framework

The previous pages have outlined the issues that a persistence framework in general, and an O/R mapping-based one in particular, needs to address. The issues presented are not exhaustive; for example, we have not discussed the definition and implementation of a complex object lifecycle used to model some of the behavior outlined. In short, while such a framework certainly could be built in-house, it is worth a detailed "build versus buy" cost and effort analysis.

This book cannot attempt to provide cost estimates, but merely outlines what is involved in case of a "build" decision. People who have tried "build instead of buy" usually argue for "buy instead of build" afterward.

Alternatively, a much simpler and less generic framework may be envisioned for "build" in rare situations. As for any major project, clearly defining the scope of a framework to build is essential. Areas that possibly provide room for simplification include the following:

  • No real transparent, orthogonal persistence and no "persistence by reachability," but simple explicit "save" or "update" methods (instead of makePersistent) on each modified persistent object.

  • No real transactions or transactional objects, and no locking. Not a great architecture, but many simple JDBC applications do work like that.

  • Limited query capability: no generic query language and thus no query translation, only special application-specific cases. Alternatively, use of SQL as the query language, with no support for queries via object navigation, and so on.

  • No support for inheritance.

  • No caching.


