There are certainly many different ways of dealing with the Data Mapper pattern, but I think the most typical are the following:
Let's start by discussing custom manual code.
Custom Manual Code
Here you will typically write persistence code on your own. The code will reside in the repositories. Of course, helper classes should be used, but that won't solve all the problems. Some typical technical problems are then the following:
I think a sentence about each of the patterns mentioned previously is in order. The Unit of Work pattern [Fowler PoEAA] is about capturing information about changes that are done to the Domain Model during a logical unit of work. That information can then be used to affect the persistent representation.
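As a minimal illustration of the idea, a Unit of Work can be as simple as three lists and a commit that replays them against storage. This is a Python sketch with made-up class and method names, not the API of any particular framework:

```python
# Hypothetical Unit of Work sketch: record new, changed, and removed
# entities during a logical unit of work, then replay the changes
# against the persistent store in one place, in a well-defined order.
class UnitOfWork:
    def __init__(self):
        self._new = []
        self._dirty = []
        self._removed = []

    def register_new(self, entity):
        self._new.append(entity)

    def register_dirty(self, entity):
        # A brand-new entity will be inserted anyway; no need to track it twice.
        if entity not in self._new and entity not in self._dirty:
            self._dirty.append(entity)

    def register_removed(self, entity):
        self._removed.append(entity)

    def commit(self, mapper):
        # One place decides the ordering: inserts, then updates, then deletes.
        for entity in self._new:
            mapper.insert(entity)
        for entity in self._dirty:
            mapper.update(entity)
        for entity in self._removed:
            mapper.delete(entity)
```

The `mapper` argument stands in for whatever object knows how to translate an entity into SQL; the point is only that the change tracking is separated from the persistence code itself.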
The Identity Map pattern [Fowler PoEAA] is about keeping no more than one instance for each entity in the session. It's like an identity-based cache. This is vital for bridging the impedance mismatch. When you work with objects, you want to be able to use the built-in object identity, the address to the object.
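A minimal sketch of the idea in Python (names are made up for illustration): the session keeps at most one instance per identity, so two loads of the same row hand back the very same object and built-in object identity just works.

```python
# Hypothetical Identity Map sketch: an identity-based cache per session.
class IdentityMap:
    def __init__(self):
        self._instances = {}

    def get(self, entity_type, entity_id):
        return self._instances.get((entity_type, entity_id))

    def add(self, entity_type, entity_id, entity):
        self._instances[(entity_type, entity_id)] = entity

def load_customer(id_map, entity_id, fetch_row):
    # Only hit the database when the session hasn't seen this id yet.
    entity = id_map.get("Customer", entity_id)
    if entity is None:
        entity = fetch_row(entity_id)
        id_map.add("Customer", entity_id, entity)
    return entity
```

Loading the same id twice in one session returns the identical instance, and the database is queried only once.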
Finally, the Lazy Load pattern [Fowler PoEAA] is about loading subgraphs just in time.
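One common way to implement Lazy Load is a virtual proxy. Here is a Python sketch (made-up names): the collection only fetches its subgraph from the database the first time someone actually touches it.

```python
# Hypothetical Lazy Load sketch: a virtual proxy for a collection.
class LazyList:
    def __init__(self, loader):
        self._loader = loader      # callable that fetches the subgraph
        self._items = None

    def _materialize(self):
        # Load just in time, and only once.
        if self._items is None:
            self._items = self._loader()
        return self._items

    def __iter__(self):
        return iter(self._materialize())

    def __len__(self):
        return len(self._materialize())
```

A Customer could hold a `LazyList` of its orders; constructing the customer costs nothing extra, and the orders query only runs if the orders are actually used.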
Unit of Work and Identity Map are needed, according to the requirements we set up. That also goes for querying. Lazy Load might not be needed, though; it can be solved pretty easily on your own if you need it occasionally. That said, it's nice to get support for it in case we do need it.
So it seems as if there is some work to do. And those were just a couple of examples. There is more, much more to it.
If you decide to stick to the requirements I defined, you will find that you are about to build a specific and (hopefully) simplified O/R Mapper if you go for custom manual code. That means that we have kind of left this categorization and moved to the third one. Let's ignore that for now and just assume that we will be trying really hard to live without Unit of Work, Identity Map, Lazy Load, dynamic and flexible querying, and so on.
First, you have to decide how everything should work in detail. And when all that work has been done, you have to apply it to your complete Domain Model. This is very tedious, time-consuming, and prone to errors. Even worse, the problem will hit you again each time you make changes to the Domain Model from now on.
Code Generation of Custom Code
Another approach is to use code generation for dealing with the Data Mapper pattern, whether the generator is custom built, a generic tool driven by your own templates, or a purchased complete solution if you find something appropriate. What I mean here is a solution similar to the custom manual code, but once the design is settled, it is applied over and over again with the help of code generation.
This approach has the same problems as custom manual code, plus the complexities of the code generation itself, of course. The upside is that productivity will be much better once the design itself is mature.
Another common problem with code generation-based solutions is that of source code control. A small change to the Domain Model will usually force you to check out (if you use a tool for source code control that requires a checkout before making changes) all the classes that deal with the Data Mapping and regenerate them all. It could be worse, but it's still inconvenient.
There are more problems, such as adding loads of uninteresting code to the code base: code that is meant to be read only by the compiler, not by humans.
A common problem is that the generated code is often tightly coupled to a specific database product. Even if that's solvable with an abstraction layer, the problem is still there if you need to support different database schemas. Then you need to keep one generated code base per schema. This is also a viable option for different database products, even though it's not a very smooth solution.
Another problem is that you will probably have fewer possibilities for runtime tuning. Take how the optimizer of a relational database works as a comparison. Only "what" is decided statically, "how" is decided at runtime, and therefore "how" can change depending upon the situation. I don't see this as a big problem for most applications at the moment, though. Moreover, if you compare a specific scenario with the same approaches for generated code and reflective code, it is often possible to squeeze out better performance from generated static code than from reflective code.
Perhaps a worse problem is that of new generator versions. My gut feeling is that it's hard to get a new version of a generator into production, because it will have to regenerate all the code.
Also, if the database schema changes, it's not possible to make changes at runtime without a recompile. But this is something that's rarely recommended anyway. These sorts of changes should involve the developers.
Whether the code generation is roundtrip style or forward only makes a pretty big difference. With roundtrip style, your changes to the generated code are preserved; with forward-only style, you should never change the generated code, because any changes will be lost at the next generation.
Let's end on an up note. Debugging might be easier with a solution based on code generation than based on metadata mapping. All the code is "there," but the code might be hard to understand, and there is a lot of it.
Often the catalyst for wanting to look further is the lack of dynamics, such as when it comes to querying. And I also said that if we decide to implement dynamic querying, Unit of Work, Identity Map, and so on, I think we should take a good look at the next category: the O/R Mapper.
Metadata Mapping (Object Relational (O/R) Mapper)
A specific style of the Data Mapper pattern is what is called the Metadata Mapping pattern. You define the relationship between the Domain Model and the Relational database in metadata. The rest of the work is done for you automatically.
The most typical implementation is probably that of O/R Mappers. I will use the term O/R Mapper for the family of products that take care of Metadata Mapping.
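The core idea can be sketched very briefly: the table and column mapping lives in plain data, and one generic routine builds the SQL for any mapped class. This Python sketch uses made-up names (the `Customer` mapping, the helper functions) purely for illustration:

```python
# Hypothetical Metadata Mapping sketch: the mapping between Domain Model
# attributes and table columns is data, not code.
MAPPING = {
    "Customer": {
        "table": "Customer",
        "columns": {"id": "Id", "name": "Name"},  # attribute -> column
    },
}

def insert_sql(class_name):
    # One generic routine serves every mapped class.
    meta = MAPPING[class_name]
    cols = ", ".join(meta["columns"].values())
    params = ", ".join(":" + attr for attr in meta["columns"])
    return f"INSERT INTO {meta['table']} ({cols}) VALUES ({params})"

def select_by_id_sql(class_name):
    meta = MAPPING[class_name]
    cols = ", ".join(meta["columns"].values())
    id_col = meta["columns"]["id"]
    return f"SELECT {cols} FROM {meta['table']} WHERE {id_col} = :id"
```

Adding a new persistent class then means adding a mapping entry, not writing or generating new persistence code.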
My friend Mats Helander describes O/R Mapping as Tai Chi.
Tai Chi is composed of two halves. The first half is about learning to elevate and lower the arms slowly, in harmony with the breathing. The second half is everything else. This isn't a joke. No matter how good you get at Tai Chi, you are "expected" or recommended to continue to spend as much time on the first move as on all the other moves together.
O/R Mapping is also composed of two halves. The first is about shuffling data between objects in memory and rows in the database and back. The other half is everything else.
As you can probably guess by now, most O/R Mappers have built-in support for Unit of Work, Identity Map, Lazy Load, and Querying.
But there's no solution without problems. A common complaint against O/R Mappers is that they are incapable of creating really good SQL code. Let's have a look at some examples that are often brought up. First is UPDATE with the WHERE clause. Look at the following example:
UPDATE Inventory SET Balance = Balance - 1 WHERE Id = 42 AND Balance >= 1
This means that I only want to change the balance if there are products in stock. Otherwise, I do not want to change the balance. This is usually not supported directly by O/R Mappers. Instead, the approach an O/R Mapper would take here is to read the Inventory row with an optimistic lock, make the change, and write it back (hoping that no concurrency conflict occurs).
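The single-statement style can be driven from application code as well. Here is a sketch against SQLite (the schema and id follow the example above; the function name is made up): the business rule "only decrement if in stock" rides along in the WHERE clause, so no read-modify-write cycle is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Inventory (Id INTEGER PRIMARY KEY, Balance INTEGER)")
conn.execute("INSERT INTO Inventory VALUES (42, 1)")

def decrement_if_in_stock(conn, item_id):
    # The condition is evaluated atomically by the database itself.
    cur = conn.execute(
        "UPDATE Inventory SET Balance = Balance - 1 "
        "WHERE Id = ? AND Balance >= 1",
        (item_id,),
    )
    return cur.rowcount == 1   # True if the row qualified and was updated
```

The first call succeeds and empties the stock; a second call finds no qualifying row and reports failure, all without any locking in application code.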
To clarify, here's an example of the optimistic approach (in this case as a SQL batch, but note that we aren't holding on to any locks because there is no explicit transaction, so this exemplifies the scenario):
--Remember the old Balance.
SET @oldBalance = (SELECT Balance FROM Inventory WHERE Id = 42)

--Calculate what the new Balance should be...

UPDATE Inventory
SET Balance = @newBalance
WHERE Id = 42 AND Balance = @oldBalance

--If @@ROWCOUNT now is 0, then the update failed!
Alternatively, it would read the Inventory row with a pessimistic lock, make the change, and write it back. Both approaches would cause scalability to suffer.
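For comparison, here is a sketch of that optimistic read-modify-write cycle as client code against SQLite (the schema follows the earlier example; the function name is made up): the old value is re-checked in the WHERE clause, and zero affected rows is treated as a concurrency conflict.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Inventory (Id INTEGER PRIMARY KEY, Balance INTEGER)")
conn.execute("INSERT INTO Inventory VALUES (42, 5)")

def optimistic_decrement(conn, item_id):
    # Step 1: read the current value.
    (old_balance,) = conn.execute(
        "SELECT Balance FROM Inventory WHERE Id = ?", (item_id,)
    ).fetchone()
    # Step 2: make the change in memory.
    new_balance = old_balance - 1
    # Step 3: write back, but only if nobody changed the row meanwhile.
    cur = conn.execute(
        "UPDATE Inventory SET Balance = ? WHERE Id = ? AND Balance = ?",
        (new_balance, item_id, old_balance),
    )
    if cur.rowcount == 0:
        raise RuntimeError("concurrency conflict: row changed since it was read")
    return new_balance
```

Note the extra roundtrip compared to the single UPDATE, and the possibility of a conflict between steps 1 and 3; that is exactly where the scalability cost comes from.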
Another complaint about O/R Mappers is that they are often ineffective for updating large numbers of rows. The approach for updating all products via an O/R Mapper will probably be to read all products into a list, loop the list, and update the products one by one. If that could have been done with a single UPDATE statement instead, the throughput would have been tremendously better.
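A sketch of the set-based alternative against SQLite (table and numbers are made up for illustration): a single UPDATE touches every row in one statement and one roundtrip, instead of one read and one write per product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (Id INTEGER PRIMARY KEY, Price REAL)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?)",
    [(i, 100.0) for i in range(1, 1001)],
)

# One statement raises all 1000 prices by 10 percent; no list is loaded,
# no loop runs in application code.
cur = conn.execute("UPDATE Product SET Price = Price * 1.1")
```

The typical O/R Mapper equivalent would issue one SELECT plus 1000 UPDATEs, which is where the throughput difference comes from.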
Yet another complaint is that it's hard to balance the amount to read. Either too little is read, causing lots of roundtrips when lazy loading, or more is read than is necessary. A common result is type explosion: several variations of the same type definition, each reading a different amount of data.
But we can look at it the other way around as well. Can one person write all that code extremely well and consistently? And are all the developers on the team as good as the best one? Even if that's the case, should we spend person hours on this if we get good enough performance with an automatic approach?
To be fair, the point just made is also valid for code generation.
We found some pros and some cons. I like that, because it makes me feel that I understand (to some degree) the technology that is being discussed.
Is this an easy choice? Of course not, but in my experience the approach that best fulfills the requirements we decided on together is the O/R Mapper. Intellectually, it feels like a decent approach for many situations, but it's not without problems.
If we are in YAGNI mood, I think O/R Mappers make sense because we can solve the problem in a simple way (hopefully without adding distractions to the Domain Model) and stand strong for the future. When the performance isn't good enough, these (hopefully rare) cases can be dealt with by custom code. This is probably a very efficient way of dealing with the problem, at least as long as not all of the problems are performance problems, and that's hardly ever the case.
So let's assume we've chosen the O/R Mapper.