Another Classification: Infrastructure Patterns

In this second part of the classification, I will use some of the PoEAA patterns [Fowler PoEAA], some of which are focused on infrastructure.

Again, this description can be used even if you decide to go the custom route. Who knows, perhaps this description will help one or two readers avoid creating their own custom solutions and instead choose one of the existing solutions. I think this is usually a good idea because it's so much work building your own full-fledged solution. Been there, tried that.

Metadata Mapping: Type of Metadata

We need to describe the relationship between the Domain Model and the database schema in metadata; that's what Metadata Mapping [Fowler PoEAA] is all about. O/R Mappers are implementations of the Metadata Mapping pattern, but different mappers use different types of metadata. Typical examples are:

XML document(s) or other document formats
Attributes (annotations might be a clearer term; for example, [MyAttribute] in C#)
Source code

As usual, each one comes with its own catch. For instance, XML documents suffer from XML hell. For most developers, XML isn't a very productive format to work with. It's often said that XML isn't for people but for parsers, but it's still the case that tools aren't up to par here, so we often find ourselves sitting there editing huge XML documents.

Another problem with the XML documents is that they are external to the Domain Model source code, so it's easy for the two to get out of sync with each other. Many IDEs also lack an understanding of the semantics of the XML documents, and therefore refactoring won't work seamlessly.

Attributes are used for decorating the Domain Model, so the risk of getting out of sync with the Domain Model is somewhat smaller. A bigger problem in this case is that the Domain Model is slightly more coupled to the database. If you want to use your Domain Model with two different databases, there is a greater risk that you'll have to have two different versions of the source code for the Domain Model if you use attributes than if you use some external type of metadata. This might not be a huge problem, but it can be. It's also harder to get an overview of the mapping in this case, but tooling can help.
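To make the attribute option concrete, here is a minimal sketch. The TableAttribute and ColumnAttribute types, the MappingReader helper, and the table and column names are all assumptions made up for the illustration; a real O/R Mapper ships its own attribute types.

```csharp
using System;
using System.Reflection;

// Hypothetical mapping attributes; not the API of any particular O/R Mapper.
[AttributeUsage(AttributeTargets.Class)]
public sealed class TableAttribute : Attribute
{
    public TableAttribute(string name) { Name = name; }
    public string Name { get; }
}

[AttributeUsage(AttributeTargets.Property)]
public sealed class ColumnAttribute : Attribute
{
    public ColumnAttribute(string name) { Name = name; }
    public string Name { get; }
}

// The Domain Model class carries the mapping metadata itself,
// which is what couples it slightly more to the database.
[Table("Customers")]
public class Customer
{
    [Column("CustomerNumber")]
    public int CustomerNumber { get; set; }

    [Column("CustomerName")]
    public string Name { get; set; }
}

public static class MappingReader
{
    // Reads the table name via reflection, or null if the class is unmapped.
    public static string TableFor(Type type)
    {
        var attr = type.GetCustomAttribute<TableAttribute>();
        return attr == null ? null : attr.Name;
    }

    // Reads the column name for one property, or null if it is unmapped.
    public static string ColumnFor(Type type, string propertyName)
    {
        var prop = type.GetProperty(propertyName);
        var attr = prop == null ? null : prop.GetCustomAttribute<ColumnAttribute>();
        return attr == null ? null : attr.Name;
    }
}
```

Because the table and column names are compiled into the Customer class, pointing the same class at a second schema means changing or duplicating the source.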

It is debatable whether providing the mapping information in source code is a category of its own. It's actually just another document format. Anyway, I thought having just two categories was a bit cheap. The distinction I'm after here is that the source code describes the mapping information in a procedural way rather than a declarative way. I also see this option as something coming between XML documents and Attributes. The metadata is in the source code and compiled, but it's written in a way that you get it as an overview. To some this means "nice C# code instead of ugly XML." I can't say that I totally disagree. Still, this is an esoteric option, not commonly used.
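As a rough sketch of what mapping in source code could look like, assume a small chainable ClassMap type. ClassMap, Mappings.ForCustomer(), and the table and column names are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;

// The Domain Model class stays free of mapping information.
public class Customer
{
    public int CustomerNumber { get; set; }
    public string Name { get; set; }
}

// Hypothetical mapping description; not the API of any particular O/R Mapper.
public class ClassMap
{
    public ClassMap(Type type, string table) { Type = type; Table = table; }

    public Type Type { get; }
    public string Table { get; }
    public Dictionary<string, string> Columns { get; } = new Dictionary<string, string>();

    // Returns this so that the calls can be chained into a readable overview.
    public ClassMap Property(string property, string column)
    {
        Columns[property] = column;
        return this;
    }
}

public static class Mappings
{
    // The whole mapping for one class in one place, compiled with the model.
    public static ClassMap ForCustomer()
    {
        return new ClassMap(typeof(Customer), "Customers")
            .Property(nameof(Customer.CustomerNumber), "CustomerNumber")
            .Property(nameof(Customer.Name), "CustomerName");
    }
}
```

Using nameof means that a rename refactoring of a property is picked up by the compiler, which is one of the things the XML option makes harder.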

And as usual, it doesn't have to be "one and only one." For example, perhaps the information is provided as attributes but can be overridden by XML information.

Are you wondering what to describe in metadata? It's the relationship between the Domain Model and the database, but to become concrete, we need an example. We could take Identity Fields as an example.

Identity Field

An Identity Field [Fowler PoEAA] of an entity holds on to the value of the Primary Key of the underlying table in the database. That's how the relationship between the entity instance and the table row is handled.

When it comes to new entities, the values for Identity Fields can be generated in at least four different layers:

Consumer
Domain Model
Database
O/R Mapper

The first two are very similar from the O/R Mapper's perspective. As the O/R Mapper sees it, the value is provided from outside and is out of its control. The O/R Mapper just has to hope that the Consumer or Domain Model follows the protocol. If I may choose, I prefer the Domain Model to the Consumer. These two options are a problem for O/R Mappers that use the Identity Field value to judge whether an instance is new and should be inserted, or already exists and should be updated.

Note

This is a good example of letting one thing have two responsibilities. It's simple and it looks good at first, but problems are waiting around the corner.

The third option is pretty common, but it comes with some semantic problems. It means that the database uses something like IDENTITY (SQL Server) or a sequence (Oracle) for setting the value of the primary key when the entity is persisted. The problem is that this is pretty late in the life cycle, and when the identity value changes, there can be problems with any sets to which you have already added the entity (for example, sets that rely on a hash code derived from the identity).
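The problem with sets can be shown with a small sketch. It assumes an Order class whose Equals() and GetHashCode() are based on an int Id that the database assigns at persist time; the class and the LateIdentityDemo helper are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;

public class Order
{
    // 0 means "not yet persisted"; the database assigns the real value.
    public int Id { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as Order;
        return other != null && other.Id == Id;
    }

    // The hash code follows the Id, so it changes when the Id changes.
    public override int GetHashCode() { return Id.GetHashCode(); }
}

public static class LateIdentityDemo
{
    // Returns whether the set still finds the order after the simulated insert.
    public static bool StillFoundAfterPersist()
    {
        var order = new Order();
        var orders = new HashSet<Order> { order };

        // Simulates the database assigning an IDENTITY value at persist time.
        order.Id = 42;

        return orders.Contains(order);
    }
}
```

StillFoundAfterPersist() returns false: the set stored the order under the hash code of Id 0, and the lookup now uses the hash code of Id 42.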

Finally, the O/R Mapper itself can take care of the generation of the values of the Identity Fields, and that's the most convenient solution, at least for the O/R Mapper.

So what all this comes down to is that it might be a good idea to keep two identities around for entities. One is a natural identity, let's call it Business Identity, that can get its value at different times in the life cycle, not necessarily from the very start (not all entities have such an identity, though). It's also common to have more than one such identification per class.

The second is the Identity Field, which is more of a Persistence ID or Database ID. If we compare this with Relational database terms, we get Table 8-2.

Table 8-2. Domain Model Terms by Relational Database Terms
Domain Model	Relational Database
Business ID	Alternate Key
Identity Field	Primary Key

You'll find a good discussion of this in [Bauer/King HiA].

This is actually something I need to apply as an implementation detail in my current Domain Model. For example, the OrderNumber of Order is a Business ID rather than an Identity Field. That's very apparent in the Equals()/GetHashCode() implementations, and it creates problems for sets, for example. Therefore, I also add Id fields whose values are generated by the O/R Mapper when the newly created entity instance is associated with its repository. After that, the value won't ever change again. I use Guid for those Id fields, as shown in the diagram in Figure 8-1.

Figure 8-1. Added Identity Fields to the entities that had Business IDs
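In code, an entity with both kinds of identity might look like the following sketch. The AssignId() hook is an assumption about how the repository or O/R Mapper would hand over the generated value; the actual mechanism depends on the mapper.

```csharp
using System;

public class Order
{
    // Identity Field: set once, when the instance is associated
    // with its repository, and never changed afterwards.
    public Guid Id { get; private set; }

    // Business ID: may get its value later in the life cycle.
    public int? OrderNumber { get; set; }

    // Hypothetical hook that the repository or O/R Mapper would call.
    public void AssignId(Guid id)
    {
        if (Id != Guid.Empty)
            throw new InvalidOperationException("Id is already assigned.");
        Id = id;
    }

    // Equality rests on the stable Id, not on the Business ID.
    public override bool Equals(object obj)
    {
        var other = obj as Order;
        if (other == null) return false;
        if (ReferenceEquals(this, other)) return true;
        return Id != Guid.Empty && Id == other.Id;
    }

    public override int GetHashCode() { return Id.GetHashCode(); }
}
```

In this sketch, two instances without an Id are equal only if they are the same reference, and an instance should not be put in a hash-based set before AssignId() has been called, because its hash code changes at that single point.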

Note

The OrderNumber and CustomerNumber are still around. Also note that the repositories will change slightly because of this. There will be new overloads for GetOrder() and GetCustomer().

This was a change that was very much needed for the infrastructure and is therefore a typical example of a distraction. It was not anything I did for the sole purpose of just one specific O/R Mapper, but rather as a simplification for more or less all O/R Mappers.

Note

It's ironic. When we built Valhalla, we decided that using Guids should be compulsory. We weren't altogether happy with that decision because it put a "must" on the developer, and it was on our list for future changes. Now when I'm using other persistence frameworks, I'm free to choose. Nevertheless, I still think it's often a good idea to use a Guid as the Identity Field.

Let's get back to the metadata with another example of what it contains.

Foreign Key Mapping

Another thing that typically appears in metadata is the Foreign Key Mapping pattern [Fowler PoEAA]. It's a description of the foreign keys and the related associations in the Domain Model.

Unlike the Identity Fields, this isn't about copied values in the Domain Model. Instead, it's just a metadata thing.

There are a variety of relationships that can be used, and the ones that are supported by the O/R Mappers can differ quite a lot. In my experience, most of the time you will only use relatively few types of relationships, but when you need something more esoteric, you are glad if you find support for it.
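For the simplest case, a many-to-one association, the translation that the metadata describes can be sketched as follows. The OrderRow shape and the OrderMapper helper are assumptions for the illustration; a real O/R Mapper derives this from its metadata instead of hand-written code.

```csharp
using System;
using System.Collections.Generic;

public class Customer
{
    public Guid Id { get; set; }
    public string Name { get; set; }
}

public class Order
{
    public Guid Id { get; set; }

    // Association in the Domain Model: an object reference, not a key value.
    public Customer Customer { get; set; }
}

// Shape of a row in a hypothetical Orders table.
public class OrderRow
{
    public Guid Id { get; set; }
    public Guid CustomerId { get; set; }   // foreign key to Customers
}

public static class OrderMapper
{
    // On save, the reference is translated to the foreign key value.
    public static OrderRow ToRow(Order order)
    {
        return new OrderRow { Id = order.Id, CustomerId = order.Customer.Id };
    }

    // On load, the foreign key is translated back to a reference.
    public static Order FromRow(OrderRow row, IDictionary<Guid, Customer> customers)
    {
        return new Order { Id = row.Id, Customer = customers[row.CustomerId] };
    }
}
```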

Embedded Value

One very important way of bridging the Impedance Mismatch is the possibility of having coarse-grained tables and fine-grained classes in the Domain Model. That's where the Embedded Value pattern [Fowler PoEAA] comes in.

It means that you should be able to store a customer in a single Customers table in the database, but work with the customer as a Customer object and an Address object in the Domain Model. (This example was very simplistic; in reality, the difference is often very large.)

In the simplest form (let's call it level one), you are able to just describe the relationship between the Embedded Value and the columns in the database table. Level two is where you might have to write assisting code that helps out with the translation for advanced cases.
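A level-one mapping can be sketched as a plain translation between one row and two objects. The column names and the CustomerRowMapper helper are made up for the illustration, and the row is represented as a dictionary to keep the sketch free of database code.

```csharp
using System;
using System.Collections.Generic;

// Value Object without a table of its own.
public class Address
{
    public Address(string street, string city) { Street = street; City = city; }
    public string Street { get; }
    public string City { get; }
}

public class Customer
{
    public string Name { get; set; }
    public Address Address { get; set; }
}

public static class CustomerRowMapper
{
    // Flattens Customer plus its Address into one Customers row.
    public static Dictionary<string, object> ToRow(Customer customer)
    {
        return new Dictionary<string, object>
        {
            { "Name", customer.Name },
            { "AddressStreet", customer.Address.Street },
            { "AddressCity", customer.Address.City }
        };
    }

    // Rebuilds the fine-grained objects from the coarse-grained row.
    public static Customer FromRow(IDictionary<string, object> row)
    {
        return new Customer
        {
            Name = (string)row["Name"],
            Address = new Address((string)row["AddressStreet"], (string)row["AddressCity"])
        };
    }
}
```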

Inheritance Solutions

Inheritance hierarchies in the Domain Model don't have a perfect match in the relational database because inheritance isn't a relational database concept (at least not before SQL:1999, which few database products support at the time of this writing). Furthermore, inheritance in the Domain Model is probably less commonly used than many would expect. That said, when you need to use it, you should be able to support it with the O/R Mapper.

There are three different types of solutions to the problem. They are Single Table Inheritance, Class Table Inheritance, and Concrete Table Inheritance [Fowler PoEAA]. I have chosen to group them together because they are just different solutions to the same problem.

The main difference is regarding how many tables are used for storing an inheritance hierarchy. Assume Person as base class and Student and Teacher as subclasses. Then the different patterns will lead to the following typical tables, as shown in Table 8-3. From there you can probably easily deduce what columns will go where.

Table 8-3. Patterns for Persistence of Inheritance Hierarchies and What Tables Are Needed
Pattern	Tables in the Database
Single Table Inheritance	`People`
Class Table Inheritance	`People`, `Students`, `Teachers`
Concrete Table Inheritance	`Students`, `Teachers`

If the O/R Mapper only supports one of those, your flexibility in the database design has decreased compared to if you had all three from which to choose. You have to decide whether that's important or not.
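To give a feeling for what the O/R Mapper has to do, here is a sketch of materializing one row under Single Table Inheritance. The PersonType discriminator column, the Program and Department columns, and the PeopleTable helper are assumptions for the illustration.

```csharp
using System;
using System.Collections.Generic;

public abstract class Person { public string Name { get; set; } }
public class Student : Person { public string Program { get; set; } }
public class Teacher : Person { public string Department { get; set; } }

public static class PeopleTable
{
    // Materializes one row from a single People table. The discriminator
    // column decides which subclass to instantiate; columns that belong
    // to the other subclass are simply ignored (they would be NULL).
    public static Person FromRow(IDictionary<string, object> row)
    {
        var type = (string)row["PersonType"];
        Person person;
        switch (type)
        {
            case "Student":
                person = new Student { Program = (string)row["Program"] };
                break;
            case "Teacher":
                person = new Teacher { Department = (string)row["Department"] };
                break;
            default:
                throw new InvalidOperationException("Unknown PersonType: " + type);
        }
        person.Name = (string)row["Name"];
        return person;
    }
}
```

With Class Table Inheritance, the same materialization would read the shared columns from People and the rest from Students or Teachers; with Concrete Table Inheritance, the table itself tells which subclass to create.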

Identity Map

In my early attempts to create an O/R Mapper, I first thought I could skip the Identity Map, but I came to the conclusion that it gets too complex and there is too much responsibility for the consumer programmer.

On the other hand, a lot also depends on your Domain Model design. If you never have relationships between entities in different Aggregates, the need for the Identity Map decreases. So DDD ideas, such as simplification and decoupling within the Domain Model, make it easier to live life without an Identity Map. That said, I still think it's useful to have an active Identity Map.

Note

If you think about O/R Mapping as a work horse that goes between database and object, the Identity Map might not be important. But that's not what we are talking about here, because then we would need to do more work. Here we'd like to just support the Domain Model with as simple (or rather as good) a solution as possible.

The Identity Map can be used for other things as well and not only for controlling the object graph the consumer knows about. For instance, it can be used for dealing with building M:N relationships of objects in the Domain Model when reading from the database.

Note

M:N describes the relationship regarding cardinality/multiplicity between objects by saying it is many-to-many. For example, a house has many people staying there, and at the same time every person can own several houses.

It's also often considered a cache for performance reasons, but as you know, I'm not overly cache-friendly, so I see the Identity Map as a convenience for the programming model rather than as a means to improve performance.

Different O/R Mappers differ regarding what "session" level you can/must have the Identity Map for. It can be a machine, a process, and/or a session.
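The core of an Identity Map is small, as the following sketch shows. It assumes Guid identities and one map instance per session; the load delegate stands in for the actual database read.

```csharp
using System;
using System.Collections.Generic;

public class Customer
{
    public Guid Id { get; set; }
    public string Name { get; set; }
}

public class IdentityMap<T> where T : class
{
    private readonly Dictionary<Guid, T> _instances = new Dictionary<Guid, T>();

    // Returns the already known instance, or loads and remembers a new one,
    // so that the consumer never sees two objects for the same row.
    public T Get(Guid id, Func<Guid, T> load)
    {
        T instance;
        if (!_instances.TryGetValue(id, out instance))
        {
            instance = load(id);
            _instances.Add(id, instance);
        }
        return instance;
    }
}
```

Asking twice for the same Id gives back the same object reference, which is the programming-model convenience; that the second call also skips the load is a side effect.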

Another pattern often goes hand in hand with the Identity Map. I'm thinking about Unit of Work.

Unit of Work

Most O/R Mappers use, or at least have support for, the Unit of Work pattern. The main difference is really how transparent the Unit of Work is for the consumer. Where the O/R Mapper is of "runtime-Persistence Ignorant" style, the consumer might have to talk to the Unit of Work explicitly for registering a new instance to be inserted at next persist. Other O/R Mappers are more transparent, but then you probably will have to instantiate with a factory supplied by the O/R Mapper. There are pros and cons to these approaches.

The "session" level might also differ for the Unit of Work, at least in theory, just as I said for the Identity Map.
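The explicit style can be sketched like this. The method names RegisterNew(), RegisterDirty(), and Persist() are assumptions, and the insert and update delegates stand in for the SQL a real implementation would issue inside a transaction.

```csharp
using System;
using System.Collections.Generic;

public class UnitOfWork
{
    private readonly List<object> _newInstances = new List<object>();
    private readonly List<object> _dirtyInstances = new List<object>();

    // Explicit registration, as the consumer might have to do it.
    public void RegisterNew(object instance) { _newInstances.Add(instance); }

    public void RegisterDirty(object instance)
    {
        // A new instance is inserted in its current state anyway.
        if (!_newInstances.Contains(instance) && !_dirtyInstances.Contains(instance))
            _dirtyInstances.Add(instance);
    }

    // Hands all pending work to the supplied actions in one go and
    // returns the number of instances written.
    public int Persist(Action<object> insert, Action<object> update)
    {
        foreach (var instance in _newInstances) insert(instance);
        foreach (var instance in _dirtyInstances) update(instance);
        int count = _newInstances.Count + _dirtyInstances.Count;
        _newInstances.Clear();
        _dirtyInstances.Clear();
        return count;
    }
}
```

A more transparent O/R Mapper would make the registration calls for you, for example from instances created by its own factory.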

Lazy Load/Eager Load

I've said before that my Domain Model style isn't using Lazy Load [Fowler PoEAA] a lot within my Aggregates [Evans DDD]. Even so, it's a piece of the puzzle that we need to have in the persistence infrastructure, and we need to be able to use it as an optimization.

Instead, as the default strategy for my Aggregates, I load eagerly. Or aggressively, or greedily, or span loading, or pre-load, or whatever you like to call it. I will from now on call it Eager Load.

Eager Load is pretty much the opposite of Lazy Load; you load the complete graph immediately instead of delaying loading parts of the graph until later.

As I see it, Eager Load goes hand in hand with Aggregates [Evans DDD], at least as the default solution.
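The difference between the two can be sketched with the Lazy<T> class from the .NET base class library. The Order constructor and its eager flag are made up for the illustration, and the loadLines delegate stands in for the database read of the order lines.

```csharp
using System;
using System.Collections.Generic;

public class OrderLine { public string Product { get; set; } }

public class Order
{
    private readonly Lazy<List<OrderLine>> _lines;

    public Order(Func<List<OrderLine>> loadLines, bool eager)
    {
        _lines = new Lazy<List<OrderLine>>(loadLines);
        if (eager)
        {
            // Eager Load: fetch now, as part of loading the order itself.
            var unused = _lines.Value;
        }
    }

    public bool LinesLoaded { get { return _lines.IsValueCreated; } }

    // Lazy Load: the first access triggers the read if it hasn't happened yet.
    public List<OrderLine> Lines { get { return _lines.Value; } }
}
```

In a real O/R Mapper the choice is normally expressed in metadata or in the query, not as a constructor flag.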

Note

To some I might have put too much emphasis on Aggregates when it comes to reading scenarios. I have been using Aggregates as the default load scheme and optimized when I have found the need for it.

After all, Aggregates are most important for write scenarios. When seeing it that way, adding a need for something like a GetForWrite() to the protocol before making changes to an Aggregate instance might make sense. The GetForWrite() would load everything from the Aggregate, possibly with read consistency.

If you do have a concept that is spanning many instances and you want to treat it as a unit, Aggregates is the thing, and it has implications.

I also think the read-only/writable distinction is often something nice for users, who have to actively choose to move into write mode. Another good thing about it is that the user won't start making changes to an already stale object when optimistic concurrency control is used, which means that the risk of collisions becomes smaller. This approach also translates nicely to pessimistic concurrency control schemes.

There are many different implementation variants for Eager Load. It's often solved with OUTER JOINs in SQL, and the resultset is then broken down to the graph. The other most common solution is to batch several SELECT statements after each other.
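Breaking a joined resultset down to a graph can be sketched as follows. The JoinedRow shape, the table and column names in the comment, and the GraphBuilder helper are assumptions for the illustration.

```csharp
using System;
using System.Collections.Generic;

public class OrderLine { public string Product { get; set; } }

public class Order
{
    public int OrderNumber { get; set; }
    public List<OrderLine> Lines { get; } = new List<OrderLine>();
}

// One row of a resultset such as
//   SELECT o.OrderNumber, l.Product
//   FROM Orders o LEFT OUTER JOIN OrderLines l ON l.OrderNumber = o.OrderNumber
public class JoinedRow
{
    public int OrderNumber { get; set; }
    public string Product { get; set; }   // null when the order has no lines
}

public static class GraphBuilder
{
    // Turns the flat rows into orders with their lines attached.
    public static List<Order> Build(IEnumerable<JoinedRow> rows)
    {
        var byNumber = new Dictionary<int, Order>();
        var result = new List<Order>();
        foreach (var row in rows)
        {
            Order order;
            if (!byNumber.TryGetValue(row.OrderNumber, out order))
            {
                // First row for this order: create it exactly once.
                order = new Order { OrderNumber = row.OrderNumber };
                byNumber.Add(row.OrderNumber, order);
                result.Add(order);
            }
            if (row.Product != null)
                order.Lines.Add(new OrderLine { Product = row.Product });
        }
        return result;
    }
}
```

The dictionary plays the role of a small Identity Map here: the order data is repeated on every joined row, but only one Order instance is created per order.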

A competent O/R Mapper should support both Lazy Load and different Eager Load strategies, for lists as well as for single instances. It may even apply to Lazy Loading groups of fields of a single instance, even if I don't consider that crucial. Such a group of attributes could always be factored out into a Value Object instead, which would make a lot of sense in most situations. And then we are back to Lazy Load/Eager Load of instances again.

Controlling Concurrency

As I've said several times already, the Aggregate pattern is a good tool for controlling concurrency. Thanks to it, I get the unit I want to use and work with as a single whole. Yet that's not the complete solution. I also need to avoid collisions or detect whether collisions have occurred so that we don't get inconsistent data (and especially not without being notified).

From [Fowler PoEAA], we find the following solutions to the problem:

Coarse-Grained Lock: Instead of locking on the instance level, this pattern suggests locking a more coarse-grained unit. We can, for example, use it on the Aggregate root level, thereby implicitly locking all the parts of the Aggregate.
Optimistic Offline Lock: Expect no conflict, but check before commit.
Pessimistic Offline Lock: Prevent conflicts by an exclusive check-out mechanism.

This reminds me that I now want to add some versioning information to the Domain Model as a way of stating where we need to deal with controlling concurrency, for dealing with feature 4, "Concurrency conflict detection is important." I think my Aggregate roots should get a Version field to support Optimistic Offline Lock. It's not an automatic action. Instead, you add it where you need it. You'll find the change in Figure 8-2.

Figure 8-2. Added Version fields to some of the Aggregate roots
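How a Version field supports Optimistic Offline Lock can be sketched with an in-memory stand-in for the Orders table. The OrderStore class and ConcurrencyException are made up for the illustration; a real O/R Mapper would typically issue an UPDATE with the expected version in the WHERE clause and check the number of affected rows.

```csharp
using System;
using System.Collections.Generic;

public class Order
{
    public Guid Id { get; set; }
    public int Version { get; set; }
    public string Note { get; set; }
}

public class ConcurrencyException : Exception
{
    public ConcurrencyException(string message) : base(message) { }
}

// In-memory stand-in for the Orders table.
public class OrderStore
{
    private readonly Dictionary<Guid, Order> _rows = new Dictionary<Guid, Order>();

    public void Insert(Order order) { _rows[order.Id] = Copy(order); }

    // Each reader gets its own copy, as it would from a database read.
    public Order Get(Guid id) { return Copy(_rows[id]); }

    // Expect no conflict, but check before commit.
    public void Update(Order order)
    {
        var stored = _rows[order.Id];
        if (stored.Version != order.Version)
            throw new ConcurrencyException("Order was changed by someone else.");
        order.Version++;
        _rows[order.Id] = Copy(order);
    }

    private static Order Copy(Order o)
    {
        return new Order { Id = o.Id, Version = o.Version, Note = o.Note };
    }
}
```

If two users read the same order and both try to save, the first Update() succeeds and increments the version; the second one throws because its copy still carries the old version.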

It's not too much of a problem if your O/R Mapper doesn't support Pessimistic Offline Lock. First of all, you should be careful when using it at all because it's pretty costly when it comes to overhead. Second, if you need it, you can build a decent solution on your own pretty easily. (I discussed a solution based on a custom Locks-table in my previous book [Nilsson NED].)
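To show the general idea of an exclusive check-out, here is a sketch with an in-memory stand-in for a locks table keyed by Aggregate root Id. It is not the solution from [Nilsson NED]; the LockTable class and its method names are assumptions, and a real solution would also need persistence and a way to expire abandoned locks.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for a locks table keyed by Aggregate root Id.
public class LockTable
{
    private readonly Dictionary<Guid, string> _owners = new Dictionary<Guid, string>();
    private readonly object _sync = new object();

    // Returns true if the owner now holds the lock (or already held it).
    public bool TryAcquire(Guid aggregateId, string owner)
    {
        lock (_sync)
        {
            string current;
            if (_owners.TryGetValue(aggregateId, out current))
                return current == owner;
            _owners.Add(aggregateId, owner);
            return true;
        }
    }

    // Releases the lock, but only if this owner holds it.
    public bool Release(Guid aggregateId, string owner)
    {
        lock (_sync)
        {
            string current;
            if (_owners.TryGetValue(aggregateId, out current) && current == owner)
                return _owners.Remove(aggregateId);
            return false;
        }
    }
}
```

Locking on the Aggregate root Id also makes this a Coarse-Grained Lock, since one entry covers all the parts of the Aggregate.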
