Querying | Applying Domain-Driven Design and Patterns: With Examples in C# and .NET

Querying differs greatly between infrastructure solutions. It's also something that runs a big risk of affecting the consumer. That might not be apparent at first, when you start out simply, but after a while you'll often find quite a lot of querying requirements, and while fulfilling them, you're normally tying yourself to the chosen infrastructure.

Let's take a step back and look at another solution, one that is not query-based. Earlier, I showed you how to fetch by identity in a Repository with the following code:

//OrderRepository
public Order GetOrder(int orderNumber)
{
    return (Order)_ws.GetById(typeof(Order), orderNumber);
}

However, if OrderNumber isn't an identity, the interface of the Repository method must change to return a list instead, because several orders can have OrderNumber 0 before they have reached a certain state. But then what? GetById() is useless now, because OrderNumber isn't an identity (and let's assume it's not unique, since I said the answer could be a list). I need a way to get at the second layer of the Fake's Identity Maps, the one for Orders. Let's assume I could do that with a GetAll(Type typeOfResult) like this:

//OrderRepository
public IList GetOrders(int orderNumber)
{
    IList result = new ArrayList();
    IList allOrders = _ws.GetAll(typeof(Order));
    foreach (Order o in allOrders)
    {
        if (o.OrderNumber == orderNumber)
            result.Add(o);
    }
    return result;
}

It's still pretty silly code, and it's definitely not what you want to write when you have infrastructure in place, at least not for real-world applications.
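To make "the second layer of Identity Maps" more concrete, here is a minimal C# sketch of how a Fake might nest them: one map per type, each mapping identity to instance. FakeWorkspace, its Add() method, and this Order class are hypothetical stand-ins for illustration, not the NWorkspace API.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for a Fake workspace; not the NWorkspace API.
// First layer: one Identity Map per type. Second layer: identity -> instance.
public class FakeWorkspace
{
    private readonly Dictionary<Type, Dictionary<object, object>> _maps =
        new Dictionary<Type, Dictionary<object, object>>();

    // Registers an instance in the Identity Map for its type.
    public void Add(Type type, object id, object instance)
    {
        if (!_maps.TryGetValue(type, out Dictionary<object, object> map))
        {
            map = new Dictionary<object, object>();
            _maps[type] = map;
        }
        map[id] = instance;
    }

    // Lookup by identity: one hit in the type's map, no scanning.
    public object GetById(Type type, object id)
    {
        if (_maps.TryGetValue(type, out Dictionary<object, object> map)
            && map.TryGetValue(id, out object instance))
            return instance;
        return null;
    }

    // Exposes the whole second-layer map for a type so callers can filter it.
    public IList GetAll(Type type)
    {
        IList result = new ArrayList();
        if (_maps.TryGetValue(type, out Dictionary<object, object> map))
        {
            foreach (object instance in map.Values)
                result.Add(instance);
        }
        return result;
    }
}

// Minimal hypothetical Order: the identity is an int, OrderNumber is not unique.
public class Order
{
    public Order(int id, int orderNumber) { Id = id; OrderNumber = orderNumber; }
    public int Id { get; }
    public int OrderNumber { get; }
}
```

GetById() is a single dictionary lookup, while GetAll() hands out every instance of a type, which is why filtering on a non-identity such as OrderNumber ends up as a loop in the Repository.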

Single-Set of Query Objects

Just as with the Repositories, it would be nice to write the queries "correctly" from day one in a single implementation which could be used both with the Fake and with the real infrastructure. How can we deal with that?

Let's assume that we want to work with Query Objects [Fowler PoEAA] (encapsulate queries as objects, providing object-oriented syntax for working with the queries), and that we also rename GetAll() to GetByQuery() and let it take an IQuery (as defined in NWorkspace) instead. The code could now look like this:

//OrderRepository
public IList GetOrders(int orderNumber)
{
    IQuery q = new Query(typeof(Order));
    q.AddCriterion("OrderNumber", orderNumber);
    return _ws.GetByQuery(q);
}

OK, that was pretty straightforward. You just create an IQuery instance by saying which type you expect in the result. Then you set the criteria that keep the result set as small as possible, typically by having the query processed in the database (or in the Fake code, when you're using the Fake implementation).

Note

We could pretty easily make the query interface more fluent, but let's stay with the most basic version we can come up with for now.
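One way to picture how the Fake could process such a query in memory is the following sketch. It assumes only equality criteria and matches them against public properties via reflection; IQuery, Query, and InMemoryQueryProcessor are simplified, hypothetical shapes rather than the actual NWorkspace types.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical, simplified shapes for illustration; the real NWorkspace IQuery may differ.
public interface IQuery
{
    Type ResultType { get; }
    IList<KeyValuePair<string, object>> Criteria { get; }
}

public class Query : IQuery
{
    private readonly List<KeyValuePair<string, object>> _criteria =
        new List<KeyValuePair<string, object>>();

    public Query(Type resultType) { ResultType = resultType; }

    public Type ResultType { get; }
    public IList<KeyValuePair<string, object>> Criteria => _criteria;

    // Equality criterion: property name plus expected value.
    public void AddCriterion(string propertyName, object value) =>
        _criteria.Add(new KeyValuePair<string, object>(propertyName, value));
}

// How a Fake might process a query in memory. An adapter for a real backend
// would instead translate the same criteria into, say, a SQL WHERE clause.
public static class InMemoryQueryProcessor
{
    public static IList Process(IQuery query, IEnumerable candidates)
    {
        IList result = new ArrayList();
        foreach (object candidate in candidates)
        {
            if (query.ResultType.IsInstanceOfType(candidate) && Matches(query, candidate))
                result.Add(candidate);
        }
        return result;
    }

    // A candidate matches when every criterion names an existing property with an equal value.
    private static bool Matches(IQuery query, object candidate)
    {
        foreach (KeyValuePair<string, object> criterion in query.Criteria)
        {
            PropertyInfo property = candidate.GetType().GetProperty(criterion.Key);
            if (property == null || !object.Equals(property.GetValue(candidate), criterion.Value))
                return false;
        }
        return true;
    }
}

public class Order
{
    public int OrderNumber { get; set; }
}
```

A database-backed implementation would receive the very same criteria and push them down to SQL instead, which is the point of having a single set of Query Objects.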

That covers instantiating part of the Domain Model. Let's get back to the _GetNumberOfStoredCustomers() that we talked about earlier. How could that code look with our newly added querying tool? Let's assume it calls a Repository method like the following:

//CustomerRepository
public int GetNumberOfStoredCustomers()
{
    return _ws.GetByQuery(new Query(typeof(Customer))).Count;
}

It works, but for production scenarios that solution would reduce every DBA to tears. At least it will if the result is a SELECT that fetches all rows just so you can count how many rows there are. It's just not acceptable for most situations.

We need to add some more capabilities to the querying API. Here's an example where we can return simple types, such as an int, combined with an aggregate query (and this time aggregate doesn't refer to the DDD pattern Aggregate, but, for example, to a SUM or AVG query in SQL):

//CustomerRepository
public int GetNumberOfStoredCustomers()
{
    IQuery q = new Query(typeof(Customer),
        new ResultField("CustomerNumber", Aggregate.Count));
    return (int)_ws.GetByQuery(q)[0];
}

A bit raw and immature, but I think that this should give you an idea of how the basic querying API in NWorkspace is designed.
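Here is one hypothetical way a Fake could honor such an aggregate query: when the query carries a Count result field, it returns a single-element list holding the count, so the (int)...[0] cast in the Repository works unchanged. ResultField, Aggregate, Query, and FakeWorkspace below are simplified stand-ins, not NWorkspace's definitions.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical, simplified types; the NWorkspace originals may look different.
public enum Aggregate { Count }

public class ResultField
{
    public ResultField(string fieldName, Aggregate aggregate)
    {
        FieldName = fieldName;
        Aggregate = aggregate;
    }
    public string FieldName { get; }
    public Aggregate Aggregate { get; }
}

public class Query
{
    public Query(Type resultType, ResultField resultField = null)
    {
        ResultType = resultType;
        ResultField = resultField;
    }
    public Type ResultType { get; }
    public ResultField ResultField { get; }
}

public class FakeWorkspace
{
    private readonly List<object> _stored = new List<object>();

    public void Add(object instance) => _stored.Add(instance);

    public IList GetByQuery(Query query)
    {
        IList matches = new ArrayList();
        foreach (object instance in _stored)
        {
            if (query.ResultType.IsInstanceOfType(instance))
                matches.Add(instance);
        }
        // Aggregate query: return one scalar row instead of instances.
        // A SQL adapter would push this down as something like
        // SELECT COUNT(CustomerNumber) FROM Customers.
        if (query.ResultField != null && query.ResultField.Aggregate == Aggregate.Count)
            return new ArrayList { matches.Count };
        return matches;
    }
}

public class Customer
{
    public int CustomerNumber { get; set; }
}
```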

It would be nice to have a standard querying language, wouldn't it? Perhaps the absence of one was what kept Object Databases from really taking off. Sure, there was Object Query Language (OQL), but I think it came pretty late, and it was also a pretty complex standard to implement. It's competent, but complex. (Well, it was probably a combination of things that kept Object Databases from becoming mainstream; isn't it always?) I'll talk more about Object Databases in Chapter 8, "Infrastructure for Persistence."

What I want now, though, is a querying standard for persistence frameworks: something as widespread as SQL, but for Domain Models. Until we have such a standard, the NWorkspace version could bridge the gap, for me at least. Is there a cost for that? Is there such a thing as a free lunch?

The Cost for Single-Set of Query Objects

Of course there's a cost for such a translation, and that goes for this case, too. First, there's a performance cost in going from an IWorkspace implementation to the infrastructure solution. However, that cost will probably be pretty low in the context of end-to-end scenarios.

Then there's the loss of power, which is probably much worse: the NWorkspace API is simplified, and competent infrastructure solutions have more to offer. Yet another cost is that the API of NWorkspace itself is pretty raw and perhaps not as nice as the querying API of your infrastructure solution. OK, all those costs sound fair, and if there's a lot to be gained, I can live with them.

I left out one very important detail earlier about querying and the Identity Map: whether or not querying bypasses the cache.

Querying and the Cache

I mentioned earlier that when you do a GetById(), if that operation must be fulfilled by going to persistence, the fetched instance will be added to the Identity Map before being returned to the consumer. That goes for GetByQuery() as well; that is, the instances will be added to the Identity Map.

However, there's a big difference: GetByQuery() won't check the Identity Map before hitting persistence. The reason is partly that we want to use the power of the backend, but above all that we don't know whether the Identity Map (or cache, if you will) holds all the information necessary for fulfilling the query. To find that out, we need to hit the database. This brings us to another problem. GetById() starts with the Identity Map; GetByQuery() does not. This is totally different behavior, and it actually means that GetByQuery() bypasses the cache, which is problematic. If you ask for the new Customer Volvo that has been added to the Identity Map/Unit of Work but not yet persisted, it will be found if you ask by ID, but not if you ask by name with a query. Weird.

To tell you the truth, it was a painful decision, but I decided to let GetByQuery() do an implicit PersistAll() by default before going to the database. (There's an overload of GetByQuery() to avoid this, but again, the default is to call PersistAll() implicitly.) I came to the conclusion that this default is probably most in line with the rest of NWorkspace and its goals of simplicity and reduced risk of errors, which is why I accepted some sacrifices in transactional semantics. Some might argue that this violates the principle of least surprise, but I think what is surprising in this case depends on your background.

The biggest drawback is definitely that when doing GetByQuery(), you might get save-related exceptions. What a painful decision! But I need to decide something for now in order to move on.
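The following sketch illustrates the chosen default in a simplified, hypothetical workspace (not NWorkspace's implementation): GetByQuery() calls PersistAll() first unless told otherwise, so a new, unpersisted Customer such as Volvo becomes visible to queries. Queries are reduced to a result type to keep the focus on the flush, and RegisterNew() is an assumed name for adding a new instance to the Unit of Work.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical sketch of "implicit PersistAll() by default"; not NWorkspace code.
public class Workspace
{
    private readonly List<object> _database = new List<object>();   // stand-in for persistence
    private readonly List<object> _pendingNew = new List<object>(); // Unit of Work: new, not yet persisted

    public void RegisterNew(object instance) => _pendingNew.Add(instance);

    public void PersistAll()
    {
        // In a real implementation, this is where save-related exceptions come from.
        _database.AddRange(_pendingNew);
        _pendingNew.Clear();
    }

    // Default overload: flush pending work first so the query can see it.
    public IList GetByQuery(Type resultType) => GetByQuery(resultType, true);

    public IList GetByQuery(Type resultType, bool implicitPersistAll)
    {
        if (implicitPersistAll)
            PersistAll();
        IList result = new ArrayList();
        foreach (object instance in _database)
        {
            if (resultType.IsInstanceOfType(instance))
                result.Add(instance);
        }
        return result;
    }
}

public class Customer
{
    public string Name { get; set; }
}
```

Passing false gives the "only what is already stored" semantics that a method counting stored customers needs, at the price of not seeing transient work.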

Do you remember the simple, unoptimized version of _GetNumberOfStoredCustomers()? It's not just slow; it might not work as expected when it looks like this (which goes for the optimized version as well):

public int GetNumberOfStoredCustomers()
{
    return _ws.GetByQuery(new Query(typeof(Customer))).Count;
}

The reason it won't work for my purpose is that GetByQuery() will do that implicit PersistAll(). Instead, an overload must be used like this, where false is for the implicitPersistAll parameter:

public int GetNumberOfStoredCustomers()
{
    return _ws.GetByQuery(new Query(typeof(Customer)), false).Count;
}

Note

And of course we could (and should) use the aggregate version instead. The focus here was how to deal with implicit PersistAll().

All this affects the programming model to a certain degree. First of all, you should definitely try to work in blocks when querying and changing with the same workspace instance: once you have made changes, make sure you're happy with them before querying, because querying will make the changes persistent.

Note

You might be right. I might be responsible for fostering a sloppy style among consumer programmers. They just code and it works, even though they forget about saving explicitly and so on.

An unexpected and unwanted side effect is that GetByQuery() can throw exceptions totally different from the ones you'd expect, because they might really come from PersistAll(). Therefore, it's definitely a good idea to call PersistAll() explicitly in the consumer code anyway.

And again, if you hate this behavior, there's nothing to stop you from using the overload. (Actually, GetById() could offer the opposite: an overload that goes to the database regardless, without checking the Identity Map, for when you mean "I don't care about my own transient work; I want to know what's in the database.")
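A GetById() overload of that kind could look like the following sketch. The Workspace class, SaveToDatabase(), and the dictionary standing in for the database are all hypothetical; the point is only that the bypassing read returns the persisted state without touching the Identity Map.

```csharp
using System;
using System.Collections.Generic;

public class Customer
{
    public Guid Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical sketch of a GetById() overload that skips the Identity Map; not NWorkspace code.
public class Workspace
{
    private readonly Dictionary<Guid, Customer> _identityMap = new Dictionary<Guid, Customer>();
    private readonly Dictionary<Guid, string> _database = new Dictionary<Guid, string>(); // persisted Name per Id

    public void SaveToDatabase(Guid id, string name) => _database[id] = name;

    public Customer GetById(Guid id) => GetById(id, false);

    public Customer GetById(Guid id, bool bypassIdentityMap)
    {
        if (!bypassIdentityMap && _identityMap.TryGetValue(id, out Customer cached))
            return cached;
        if (!_database.TryGetValue(id, out string name))
            return null;
        var fresh = new Customer { Id = id, Name = name };
        // Only the default path registers the instance. A bypassing read returns the
        // persisted state without replacing the instance the caller may have changed.
        if (!bypassIdentityMap)
            _identityMap[id] = fresh;
        return fresh;
    }
}
```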

That was a bit about how querying works in relation to the Identity Map. Next up is where to host the queries.

Where to Locate Queries

As I see it, we have at least the following three places in which to host query instances:

In Repositories
In consumers of Repositories
In the Domain Model

Let's discuss them one by one.

In Repositories

Probably the most common place to set up queries is in the Repositories. Then the queries become the tool for fulfilling method requests, such as GetCustomersByName() and GetUndeliveredOrders(). That is, the consumer might send parameters to the methods, but those are just ordinary types and not Query Objects. The Query Objects are then set up in accordance with the method and possible parameter values.

In Consumers of Repositories

In the second case, the queries are set up in the consumers of the Repositories and sent to the Repositories as parameters. This is typically used for highly flexible queries, such as when the user can choose to fill in any fields in a large filtering form. A typical Repository method of this kind could be named GetCustomersByFilter(), taking an IQuery as a parameter.
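A minimal sketch of the consumer side of such a filtering form follows, assuming a hypothetical CustomerFilterForm whose blank fields should simply be ignored. The Query class is a simplified stand-in for an IQuery implementation, and the City property is invented for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified Query; the real NWorkspace IQuery may differ.
public class Query
{
    public Query(Type resultType) { ResultType = resultType; }
    public Type ResultType { get; }
    public List<KeyValuePair<string, object>> Criteria { get; } =
        new List<KeyValuePair<string, object>>();
    public void AddCriterion(string propertyName, object value) =>
        Criteria.Add(new KeyValuePair<string, object>(propertyName, value));
}

public class Customer
{
    public string Name { get; set; }
    public string City { get; set; }
    public int CustomerNumber { get; set; }
}

// Hypothetical filter form: null or empty means "the user left this field blank".
public class CustomerFilterForm
{
    public string Name { get; set; }
    public string City { get; set; }
    public int? CustomerNumber { get; set; }
}

// Consumer-side code: only the fields the user filled in become criteria. The
// resulting query would then be passed to a method such as GetCustomersByFilter().
public static class CustomerFilterQueryBuilder
{
    public static Query Build(CustomerFilterForm form)
    {
        var query = new Query(typeof(Customer));
        if (!string.IsNullOrEmpty(form.Name))
            query.AddCriterion(nameof(Customer.Name), form.Name);
        if (!string.IsNullOrEmpty(form.City))
            query.AddCriterion(nameof(Customer.City), form.City);
        if (form.CustomerNumber.HasValue)
            query.AddCriterion(nameof(Customer.CustomerNumber), form.CustomerNumber.Value);
        return query;
    }
}
```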

In the Domain Model

Finally, it might be interesting to set up typed Query Objects in the Domain Model (still queries that implement IQuery of NWorkspace). The consumer still gets the power of queries to be used for sending to Repositories, for example, but with a highly intuitive and typesafe API. How the API looks is, of course, totally up to the Domain Model developer.

Instead of the following simple typeless query:

//Consumer code
IQuery q = new Query(typeof(Customer));
q.AddCriterion("Name", "Volvo");

the consumer could set up the same query with the following code:

//Consumer code
CustomerQuery q = new CustomerQuery();
q.Name.Eq("Volvo");

In addition to getting simpler consumer code, this also further encapsulates the Domain Model.

It's also about reducing flexibility, and that is a very good thing. Don't make everything possible.
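Here is one hypothetical way to build such a CustomerQuery so that it still is a Query underneath and could therefore be handed to GetByQuery() like any other. The simplified Query class, the QueryField&lt;T&gt; helper, and the CustomerNumber field are assumptions for illustration; because Eq() is typed, something like q.CustomerNumber.Eq("Volvo") won't compile.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified Query; the real NWorkspace IQuery may differ.
public class Query
{
    public Query(Type resultType) { ResultType = resultType; }
    public Type ResultType { get; }
    public List<KeyValuePair<string, object>> Criteria { get; } =
        new List<KeyValuePair<string, object>>();
    public void AddCriterion(string propertyName, object value) =>
        Criteria.Add(new KeyValuePair<string, object>(propertyName, value));
}

// One typed field of a query; it knows its property name, so consumers never type strings.
public class QueryField<T>
{
    private readonly Query _owner;
    private readonly string _propertyName;

    internal QueryField(Query owner, string propertyName)
    {
        _owner = owner;
        _propertyName = propertyName;
    }

    public void Eq(T value) => _owner.AddCriterion(_propertyName, value);
}

public class Customer
{
    public string Name { get; set; }
    public int CustomerNumber { get; set; }
}

// Typed query hosted in the Domain Model; it produces the same criteria as the
// untyped version, so it can be used wherever a Query is expected.
public class CustomerQuery : Query
{
    public CustomerQuery() : base(typeof(Customer))
    {
        Name = new QueryField<string>(this, nameof(Customer.Name));
        CustomerNumber = new QueryField<int>(this, nameof(Customer.CustomerNumber));
    }

    public QueryField<string> Name { get; }
    public QueryField<int> CustomerNumber { get; }
}
```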

Assuming you like the idea of Domain Model-hosted queries, which queries do we need and how many?

Aggregates as a Tool Again

That last question may have sounded quite open, but I actually have an opinion on that. Guess what? Yes, I think we should use Aggregates.

Each of your Aggregates is a typical candidate for having a Query Object in the Domain Model. Of course, you could also go the XP route and create them only when they're first needed, which is probably better.

Speaking of Aggregates, I'd like to point out again that I see Aggregates as the default mechanism for determining how big the loadgraphs should be.

Note

By loadgraph, I mean how far from the target instance we should load instances. For example, when we ask for a certain order, should we also load its customer or not? What about its orderLines?

And when the default isn't good enough performance-wise, we have lots of room for optimizations. A typical example is to not load complete graphs when you need to list instances, such as Orders. Instead, you create a cut-down type, perhaps called OrderSnapshot, with just the fields you need in your typical list situations. Those lists also won't be integrated with the Unit of Work and Identity Map, which is probably exactly what you want, again for performance reasons (or it might create problems; as always, it depends).

An abstraction layer could support creating such lists so that your code will be agnostic about the implementation of the abstraction layer at work. It could look like this:

//For example OrderRepository.GetSnapshots()
IQuery q = new Query(typeof(Order), typeof(OrderSnapshot)
    , new string[]{"Id", "OrderNumber", "Customer.Id"
    , "Customer.Name"});
q.AddCriterion("Year", year);
result = _ws.GetByQuery(q);

For this to work, OrderSnapshot must have a suitable constructor, like this (assuming here that the Ids are implemented as Guids):

//OrderSnapshot
public OrderSnapshot(Guid id, int orderNumber
    , Guid customerId, string customerName)
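How an abstraction layer fills in such a snapshot is an implementation detail. A Fake might do it roughly like this sketch, which resolves each dotted field path (such as "Customer.Name") by reflection and calls the snapshot constructor whose parameters line up positionally with the paths. SnapshotProjector and the minimal Order and Customer classes are hypothetical, and filtering (such as the Year criterion) is left out.

```csharp
using System;
using System.Collections;
using System.Linq;
using System.Reflection;

public class Customer
{
    public Guid Id { get; set; }
    public string Name { get; set; }
}

public class Order
{
    public Guid Id { get; set; }
    public int OrderNumber { get; set; }
    public Customer Customer { get; set; }
}

public class OrderSnapshot
{
    public OrderSnapshot(Guid id, int orderNumber, Guid customerId, string customerName)
    {
        Id = id;
        OrderNumber = orderNumber;
        CustomerId = customerId;
        CustomerName = customerName;
    }
    public Guid Id { get; }
    public int OrderNumber { get; }
    public Guid CustomerId { get; }
    public string CustomerName { get; }
}

// Hypothetical projector; not part of NWorkspace.
public static class SnapshotProjector
{
    // Walks a dotted path such as "Customer.Name", one property at a time.
    private static object Resolve(object source, string path)
    {
        object current = source;
        foreach (string part in path.Split('.'))
        {
            if (current == null)
                return null;
            PropertyInfo property = current.GetType().GetProperty(part)
                ?? throw new ArgumentException("Unknown property: " + part);
            current = property.GetValue(current);
        }
        return current;
    }

    // Builds one snapshot per source, passing the resolved values to the
    // constructor in the same order as the field paths.
    public static IList Project(IEnumerable sources, Type snapshotType, string[] fieldPaths)
    {
        IList result = new ArrayList();
        foreach (object source in sources)
        {
            object[] args = fieldPaths.Select(p => Resolve(source, p)).ToArray();
            result.Add(Activator.CreateInstance(snapshotType, args));
        }
        return result;
    }
}
```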

Note

Type explosion, which I provided an example of earlier, is commonly referred to as the Achilles heel of O/R Mappers. In my experience, the problem is real, but it's nothing we can't deal with. First, you do this only when you really have to, so not all Aggregate roots in your model will have a snapshot class. Second, you can often get away with a single snapshot class for several different list scenarios. The optimization effect of doing it at all is often significant enough that you don't need to go further.

Another common approach is to use Lazy Load for tuning by loading some of the data just in time. (We'll talk more about this in a later chapter.)

And if that isn't powerful enough, you can write manual SQL code and instantiate the snapshot type on your own. Just be very clear that at that point you are starting to actively use the database model as well.

One or Two Repository Assemblies?

With all this in place, you understand that you have many different possibilities for how to structure the Repositories, but you might wonder how it's done in real-world projects.

In my most recent large project (which is in production; it's not a toy project), I use a single set of Repositories, both for Fake execution and for execution against the database. A few optimizations use native SQL, but there I used a little hack: if the optimized method finds an injected connection string, it calls out to another method where I'm totally on my own.

Otherwise, the un-optimized code will be used instead. That way, the Fake will use the un-optimized version, and the database-related code will typically use the optimized version.

Again, this is only used in a handful of cases. Not extremely nice and clean, but it works fine for now.
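The hack could be sketched like this, assuming the Repository receives an optional connection string and a delegate standing in for the regular workspace query. All names are hypothetical, and the native SQL method is deliberately left as a stub.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public string Name { get; set; }
}

// Hypothetical sketch of the connection-string switch; not the author's actual code.
public class CustomerRepository
{
    private readonly Func<IEnumerable<Customer>> _queryAllCustomers; // stands in for a workspace query
    private readonly string _connectionString;                       // injected only when running against the database

    public CustomerRepository(Func<IEnumerable<Customer>> queryAllCustomers, string connectionString = null)
    {
        _queryAllCustomers = queryAllCustomers;
        _connectionString = connectionString;
    }

    public int GetNumberOfStoredCustomers()
    {
        // Optimized path: only taken when a connection string has been injected.
        if (!string.IsNullOrEmpty(_connectionString))
            return GetNumberOfStoredCustomersWithNativeSql();

        // Un-optimized path: the Fake always ends up here.
        return _queryAllCustomers().Count();
    }

    // "Totally on my own": hand-written SQL such as SELECT COUNT(*) FROM Customers.
    // Left as a stub so the sketch compiles without a database provider.
    private int GetNumberOfStoredCustomersWithNativeSql()
    {
        throw new NotImplementedException("Native SQL path is not part of this sketch.");
    }
}
```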

Specifications as Queries

Yet another approach for querying is to use the Specification pattern [Evans DDD] (encapsulate conceptual specifications for describing something, such as what a customer that isn't allowed to buy more "looks like"). The concept gets a describing name and can be used again and again.

Using this approach is similar to creating type-safe, Domain Model-specific queries, but it goes a step further, because it's not generic querying we are after but very specific querying, based on domain-specific concepts. This is one level higher than the query definitions that live in the Domain Model, such as CustomerQuery. Instead of exposing generic properties to filter by, a specification is a concept with a very specific domain meaning. A common example is the concept of a gold customer. The specification describes what is needed for a customer to be a gold customer.

The code gets very clear and purposeful because the concepts are caught and described as Specification classes.

Even those Specification classes could very well produce an IQuery, so that you can use them with GetByQuery() of IWorkspace (or equivalent) if you like the idea of an abstraction layer.
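Below is a minimal sketch of a gold customer Specification that can both check a single instance in memory and produce a query for something like GetByQuery(). The rule (total purchases of at least 100,000), the TotalPurchases property, the Operator enum, and the simplified Query class are invented for illustration, not taken from NWorkspace.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified query model with an operator per criterion.
public enum Operator { Equal, GreaterOrEqual }

public class Criterion
{
    public Criterion(string propertyName, Operator op, object value)
    {
        PropertyName = propertyName;
        Operator = op;
        Value = value;
    }
    public string PropertyName { get; }
    public Operator Operator { get; }
    public object Value { get; }
}

public class Query
{
    public Query(Type resultType) { ResultType = resultType; }
    public Type ResultType { get; }
    public List<Criterion> Criteria { get; } = new List<Criterion>();
    public void AddCriterion(string propertyName, Operator op, object value) =>
        Criteria.Add(new Criterion(propertyName, op, value));
}

public class Customer
{
    public string Name { get; set; }
    public decimal TotalPurchases { get; set; }
}

public interface ISpecification<T>
{
    bool IsSatisfiedBy(T candidate); // in-memory check
    Query ToQuery();                 // the same rule, expressed as a query
}

// The gold-customer rule below is invented purely for illustration.
public class GoldCustomerSpecification : ISpecification<Customer>
{
    public const decimal MinimumTotalPurchases = 100000m;

    public bool IsSatisfiedBy(Customer candidate) =>
        candidate.TotalPurchases >= MinimumTotalPurchases;

    public Query ToQuery()
    {
        var query = new Query(typeof(Customer));
        query.AddCriterion(nameof(Customer.TotalPurchases), Operator.GreaterOrEqual, MinimumTotalPurchases);
        return query;
    }
}
```

Keeping both forms of the rule in one class means the concept has a single name and a single place to change, whether it's evaluated against one instance or pushed down to the database.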

Other Querying Alternatives

So far we have just talked about Query Objects as being the query language we need (sometimes embedded in other constructs, though). As a matter of fact, when the queries get complex, it's often more powerful to be able to write the queries in, for example, a string-based language similar to SQL.

I haven't thought about any more query languages for NWorkspace. But perhaps someday I can talk some people into having their query languages produce NWorkspace queries as output. That way, their queries (such as something like SQL, but for Domain Models) would be useful against any infrastructure solution that has an adapter implementation for NWorkspace.

I can't let go of the idea that the choice of querying language is as much a lifestyle choice as the choice of programming language. Of course, which query language we want to use, if we can choose, might depend even more on the situation. That's a nice way to end this subject for now, I think (more to come in Chapters 8 and 9). It's time to move on.