Requirements on the Persistence Infrastructure
Let me repeat that I want the infrastructure to stay out of the way of the Domain Model as much as possible, so that I can focus on creating a powerful Domain Model for solving the business problem without more distractions than are necessary. For example, I believe that I should be able to work with as high a level of Persistence Ignorance (PI) as possible, so that's a requirement of the infrastructure.
At the same time, it's also important not to put too much responsibility on the consumer of the Domain Model. I want to provide the consumer with a very simple API so that the consumer programmer can focus on what is important to him: creating a good experience for consumer users.
I have already defined certain desirable features for the life cycle of the persistent entities in Chapter 5, "Moving Further with Domain-Driven Design," but let's repeat them here in Table 8-1.
Table 8-1. Summary of the Semantics for the Life Cycle of the Domain Model Instances
Another way of thinking about the requirements of the persistence infrastructure is to look at the non-functional requirements I have. Typical examples are scalability, security, and maintainability.
Of course, this varies widely, not only concerning the requirements, but also because a certain persistence framework can sometimes fail and sometimes succeed for the same mix of non-functional requirements. The reason for this is that the success or failure depends on the type of application, the usage patterns, and so on. I have to leave it up to the reader to define her own mix of non-functional requirements and to check whether a certain persistence framework can fulfill the requirements by carrying out careful tests. Just remember that if your appetite for non-functional requirements becomes too big, it will cost you a lot somewhere else.
Most often it's very hard to get the customer to express any non-functional requirements at all. If you succeed, the risk is that he will want everything at 100%, and that's when it's good to remember the cost side of it.
For further discussion of non-functional requirements, see [POSA 1], [Fowler PoEAA] or [Nilsson NED].
As you may have noticed, I didn't mention it as a requirement on the persistence framework to be able to have a network between the consumer and the Domain Model. Building it is very possible, as we did in Valhalla [Valhalla]. However, it's probably better to think along the lines of explicit boundaries where networks are involved and design the interaction across those boundaries accordingly.
So, we have defined requirements to fulfill for the Domain Model's sake, and we have defined how to use the API for the persistence framework.
With the requirements now in place, let's get going and make some choices about the infrastructure. First is the location for the persistent data.
Where to Store Data
Assume we start with a clean sheet of paper; how would we like to store the data? We have at least four choices:

- In RAM
- In the file system
- In an object database
- In a relational database
I realize that this may be a case of apples and oranges because it is a two-part problem: what to store (objects, hierarchies such as XML, or tables) and where to store it (RAM, file system, object database, or relational database). But that's not without problems, either. After thinking through it a couple of times, I think my original way of describing it is good enough, so let's move on.
I'll start with RAM.
Storing the data in RAM isn't as far-fetched as it may first sound.
The beauty is that you avoid mapping from the Domain Model onto something else. The Domain Model is persistent by itself.
You could also store hierarchies in memory, with XML documents as a typical example. Then you do get some impedance mismatch because you are using two models and need to transform between them. On the other hand, you do get some functionality for free, such as querying if you find XPath and XQuery to be good languages for that.
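If XML hierarchies are kept in memory, querying can indeed be done with XPath-style expressions. As a rough sketch (the document and element names are invented), Python's standard library supports a limited XPath subset:

```python
# Sketch: querying an in-memory XML hierarchy with (limited) XPath,
# using only the standard library. The document content is invented.
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<orders>
  <order id="1"><customer>Volvo</customer><total>120</total></order>
  <order id="2"><customer>Saab</customer><total>340</total></order>
</orders>
""")

# ElementTree supports a subset of XPath 1.0, e.g. predicates on child text:
big = [o.get("id") for o in doc.findall("./order[total='340']")]
print(big)  # ['2']
```

Note that this is only a subset of full XPath; a dedicated XML store would offer richer query support.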
No matter what you "store" in RAM, one problem is the limited amount of memory available.
On the other hand, 64-bit servers are becoming more and more common, and RAM prices are dropping all the time. For a majority of applications, the current as well as the coming sizes of RAM will probably be sufficient.
Another problem is that it takes time to recreate the Domain Model after a system crash, because working through a huge log will take time. The problem is minimized if the RAM-based solution takes snapshots of the current Domain Model to disk every now and then. But the problem is still there, and taking snapshots might be a problem in itself because it can bring the system to its knees and cause periods of waiting for other requests.
Gregory Young pointed out that to reduce the snapshot problem, context boundaries can be used within the domain, and each context can then be snapshotted separately.
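The idea of snapshotting contexts separately can be sketched roughly like this (the contexts and their contents are invented, and a real solution would also need a change log alongside the snapshots):

```python
# Minimal sketch of per-context snapshots: each bounded context in the
# in-memory model is serialized on its own, so taking a snapshot never
# blocks the whole Domain Model at once. All names are invented.
import pickle

contexts = {
    "orders":   {"1": {"customer": "Volvo", "total": 120}},
    "shipping": {"1": {"carrier": "DHL"}},
}

snapshots = {}

def snapshot(name):
    # Only this context is serialized; the others stay writable meanwhile.
    snapshots[name] = pickle.dumps(contexts[name])

def restore(name):
    contexts[name] = pickle.loads(snapshots[name])

snapshot("orders")
contexts["orders"]["1"]["total"] = 999   # change lost in a simulated crash
restore("orders")
print(contexts["orders"]["1"]["total"])  # 120
```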
One of the worst problems with this approach comes in making changes to the schema. What you'll have to do is serialize the objects to disk, and then deserialize them into the new version of the schema.
Another big problem is that of transactions. Fulfilling Atomicity, Consistency, Isolation, and Durability (ACID) is not easily done without hurting scalability a lot. First, instead of using the "try and see" approach, in this case it's better to prepare for a transaction as much as possible in order to investigate whether the task is likely to succeed. ("I need to do this; will that get me into trouble?") Of course, this won't solve the whole problem, but at least it will reduce it. It's good to be proactive here, but how easy that is varies.
This depends on the topology. If you keep the whole Domain Model in a single process, making such a check beforehand is relatively cheap; if the model is spread over several machines, it becomes much harder.
Again, being proactive didn't provide any transactional semantics; it just made a rollback less likely. One approach I've seen for getting further is to let a single writer apply all changes sequentially, which simplifies isolation at the price of throughput.
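The "will that get me into trouble?" idea can be sketched as a check that inspects state without mutating anything, performed before the actual change is attempted (the Account class and its invariant are invented examples):

```python
# Sketch of a proactive pre-check: validate an intended change against
# current state before touching anything, so a costly rollback becomes
# less likely (though not impossible). All names here are invented.

class Account:
    def __init__(self, balance):
        self.balance = balance

def can_withdraw(account, amount):
    # Proactive check: no state is modified here.
    return 0 < amount <= account.balance

def withdraw(account, amount):
    if not can_withdraw(account, amount):
        raise ValueError("would violate invariants")
    account.balance -= amount

acc = Account(100)
print(can_withdraw(acc, 150))  # False: we learn this without mutating anything
withdraw(acc, 40)
print(acc.balance)  # 60
```

In a concurrent setting the check and the change would still have to be made atomic somehow, which is exactly the hard part discussed above.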
One more problem is that there is no obvious choice for a query language. There are some candidates, such as XPath or XQuery if the hierarchies are XML, but nothing as widely established as SQL.
If possible, reporting should be done on a dedicated server and dedicated database anyway, so this might be less of a problem than what was first anticipated. On the other hand, in reality there is often at least a grey zone between what are reporting needs and what are ordinary application features.
Yet another problem is that navigation in the Domain Model is typically based on traversing lists. There might not be built-in support for indexing. Sure, you can use hash tables here and there, but they only solve part of the problem. You can, of course, add an in-memory indexing solution if you need it. On the other hand, you should note that this is yet another piece of infrastructure you then have to build and maintain yourself.
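A hash table used as a secondary index might look like the following sketch (the data and field names are invented). Note that it only helps exact-match lookups, has to be kept in sync with the lists by hand, and gives no help with range queries:

```python
# Sketch: bolting a hash-table index onto list-based navigation.
# The customer data is invented for illustration.

customers = [
    {"id": 1, "city": "Malmo"},
    {"id": 2, "city": "Lund"},
    {"id": 3, "city": "Malmo"},
]

# Secondary index: city -> list of matching customers.
by_city = {}
for c in customers:
    by_city.setdefault(c["city"], []).append(c)

# Exact-match lookup is now O(1) instead of a full list scan:
print([c["id"] for c in by_city.get("Malmo", [])])  # [1, 3]
```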
Finally, as I've already said a couple of times, I consider this approach to be a bit immature, but very interesting and well worth keeping an eye on.
Another solution is to use the file system instead of RAM. What to persist is the same as with RAM, namely the Domain Model objects or XML. As a matter of fact, this solution could be very close to the RAM solution. It could be the same if the database is small, and it might "only" spill out to disk when the RAM is filled to a certain level.
This approach has similar problems to the previous one, except that the size of RAM isn't as much of a limiting factor in this case. On the other hand, the performance characteristics will probably be less impressive.
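As an illustration of the idea of RAM-like access backed by the file system, here is a sketch using Python's shelve module (the keys and values are invented, and this is in no way a suggestion for a production-grade store):

```python
# Sketch: a dict-like store backed by the file system. Reads and writes
# look like in-memory access, but the data survives a process restart.
# The key/value content is invented for illustration only.
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "domainmodel")

with shelve.open(path) as db:
    db["customer:1"] = {"name": "Volvo", "city": "Gothenburg"}

# A later process (simulated here by reopening) still sees the data:
with shelve.open(path) as db:
    print(db["customer:1"]["name"])  # Volvo
```

Even this toy example hints at the issues discussed above: no querying, no indexing, and no transactions beyond what you build yourself.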
I believe that it might be pretty appealing to write your own solution for persisting the Domain Model this way, but think twice before doing so.
If you do decide to build a Domain Model that could spill out to disk when persisting, what you actually create is quite like an object database. (Perhaps that gives a better sense of the amount of work and complexity.)
Historically, there have been many different styles of object databases, but the common denominator was that they tried to avoid the transformation between objects and some other storage format. This was done for more or less the same reasons as I have been talking about when wanting to delay adding infrastructure to the Domain Model, as well as for performance reasons.
The number of styles increases even more if we also consider the hybrids, such as object-relational databases, but I think those hybrids have most often come from a relational background and style rather than from the object-oriented side.
As it turned out, the number of distractions was not zero. In fact, you could say that the impedance mismatch was still there, but compared to bridging the gap between objects and a relational database, using object databases was pretty clean.
So far, the problems with object databases have been as much about market acceptance and tool support as about the technology itself.
My evil friend Martin Rosén-Lidholm pointed out that many of the same arguments actually could be used against DDD and O/R Mapping compared to a more database-centric approach.
I'm certainly no expert on object databases. I've only tried them in passing, so take my impressions for what they are.
There was a time, around 1994, when I thought object databases were taking over as the de facto standard. But I based that idea on purely technical characteristics, and life isn't as simple as that. Object databases were promising a way out of the impedance mismatch, but technical merit alone doesn't decide a market.
As I said, the de facto solution for storing data in applications is to use a relational database, and this is the case even if you work with a Domain Model.
Storing the data in a relational database means that the data is stored in tabular format, where everything is data, including the relationships. This has proved to be a simple and yet effective (enough) solution in many applications. But no solution is without problems, and in this case when we want to persist a Domain Model in a relational database, the problem is the impedance mismatch. However, I talked about that at length in Chapter 1, "Values to Value," so I won't repeat it here.
If we go this route, the most common solution is to use an implementation of the Data Mapper pattern [Fowler PoEAA]. The purpose of the Data Mapper pattern is to bridge the gap between the Domain Model and the persistent representation, to shuffle the data both ways. We'll come back to that pattern in a few minutes.
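A minimal sketch of the Data Mapper idea might look like this (the table, column, and class names are invented for illustration): the domain class is kept persistence-ignorant, and the mapper shuffles data both ways between objects and rows.

```python
# Minimal Data Mapper sketch [Fowler PoEAA]: the domain class knows
# nothing about storage; the mapper translates between objects and rows.
# Table and class names are invented for illustration.
import sqlite3

class Customer:
    """Persistence-ignorant domain class."""
    def __init__(self, id, name):
        self.id = id
        self.name = name

class CustomerMapper:
    """Bridges the gap between Customer objects and the customers table."""
    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT)")

    def insert(self, customer):
        self.conn.execute("INSERT INTO customers VALUES (?, ?)",
                          (customer.id, customer.name))

    def find(self, id):
        row = self.conn.execute(
            "SELECT id, name FROM customers WHERE id = ?", (id,)).fetchone()
        return Customer(*row) if row else None

conn = sqlite3.connect(":memory:")
mapper = CustomerMapper(conn)
mapper.insert(Customer(1, "Volvo"))
print(mapper.find(1).name)  # Volvo
```

The point is the direction of the dependencies: Customer compiles without any reference to the database, while CustomerMapper depends on both sides.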
Choosing what storage solution to use isn't obvious. Still, we have to make a choice.
Before choosing and moving forward, I'd like to think about a couple of other questions.
One or Several Resource Managers?
Another, completely different, question to ask is whether one resource manager should be used or several. It might sound like a side issue, but the answer affects the persistence infrastructure quite a bit.
The Domain Model excels in a situation where there are several resource managers because it can completely hide this complexity from the consumers if desired. But we should also be clear that the presence of multiple resource managers adds to the complexity of mapping the Domain Model to its persistence. In order to make things simpler in the discussion, I'll only assume one resource manager here.
In reality, we rarely start with a clean sheet of paper. There are factors that constrain the choice, such as databases that are already in place, the skills of the team, and the operational environment.
Other typical factors that come into play, apart from the raw technology factors that we talked about earlier and that didn't point decisively in one direction, are of a more organizational and historical nature.
Maturity in solutions is also a very influential factor when it comes to the data. Losing data is rarely acceptable, and relational databases have proved over decades that they can be trusted with it.
Choose and Move On
Taking the technological reasons, as well as the other factors I have mentioned, into consideration, it's no surprise that the choice here is the relational database.
It actually makes me want to add some items to our list of requirements on the persistence infrastructure:
As I said, what is then needed is an implementation of the Data Mapper pattern. The question is how to implement that pattern.