Don't Forget About Operations | Applying Domain-Driven Design and Patterns: With Examples in C# and .NET

Don't Forget About Operations

Not too long ago, I was talking to a team at a large Swedish company. I talked, for example, about the Valhalla framework and how it looked at that particular point in time.

They asked me how we had dealt with operational mechanisms, such as logging, configuration, security, and so on. When I told them that we hadn't added those yet, they first went quiet and then started laughing out loud. They said they had spent years on those aspects in their own framework, and we hadn't even started thinking about them.

Luckily, I could defend myself to some extent. We had been thinking quite a lot about it, but we wanted to settle the core parts of the framework before adding the operational mechanisms. After all, the core parts influence how the mechanisms should look. I could also direct them to my last book [Nilsson NED], where I talked a lot in the initial chapters about mechanisms like those (such as tracing, logging, and configuration).

An Example of When a Mechanism Is Needed

Why are the operational aspects important? Let's take an example. Assume an application that is in production lacks tracing. (This isn't just fictional. I know that this operational aspect is forgotten pretty often. Even though I have been talking myself blue in the face about this for the last few years, I myself have old applications in production without tracing built in.) When a weird problem occurs that reveals little about itself in the error log, the cause is very hard to find and the problem is very hard to solve.

No Tracing in Place

You could always add tracing at that particular point in time, but it would probably take you a couple of days at least. If the problem is serious, the customer will expect you to find and solve the problem in less time than a couple of days.

A common, and most often pretty inefficient, way to approach this is to make ad-hoc changes and after each change cross your fingers and hope that the problem is gone.

What you probably do instead is add ad-hoc tracing here and there. It will make your code much uglier, and it will take some time before you track down the problem. The next time there is another problem, very little has changed. You will be back at square one.

It might also be possible to run a debugger in the production environment. However, this has problems of its own: you might interfere too much with other systems, or you might have obfuscated the code with some tool so that it's hard to debug.

It's also risky to change the code in production, even if the change is as small as adding tracing. Not a big risk, but it's there.

Note

If you have the possibility of using Aspect-Oriented Programming (AOP), it might not take more than a few minutes to add tracing afterward. We will discuss AOP quite a lot in Chapter 10, "Design Techniques to Embrace."

Tracing in Place

If you have a working tracing solution in place, you know how much more efficiently the problem can be found and solved. The days-long delay is gone, and you are on the way to tracking down the problem in minutes.
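As a rough illustration of what "tracing in place" can mean, here is a minimal sketch in Python using the standard `logging` module. It is not how Valhalla or any particular framework does it. The logger name `app.trace`, the `traced` decorator, and the `order_total` function are all hypothetical names made up for this example. The point is only that the tracing calls are already in the code and can be switched on at run time, without changing or redeploying it.

```python
import functools
import logging

# Hypothetical logger name; any name works as long as callers agree on it.
trace_log = logging.getLogger("app.trace")


def traced(func):
    """Log entry and exit of func, but only while tracing is switched on."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Cheap check, so the decorator costs little while tracing is off.
        if not trace_log.isEnabledFor(logging.DEBUG):
            return func(*args, **kwargs)
        trace_log.debug("enter %s args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            trace_log.debug("raise %s", func.__name__, exc_info=True)
            raise
        trace_log.debug("leave %s result=%r", func.__name__, result)
        return result
    return wrapper


@traced
def order_total(prices, discount=0.0):
    # Hypothetical domain operation, used only to demonstrate the decorator.
    return sum(prices) * (1.0 - discount)


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    order_total([10.0, 20.0])                # tracing off: nothing is logged
    trace_log.setLevel(logging.DEBUG)        # switch tracing on at run time
    order_total([10.0, 20.0], discount=0.5)  # entry and exit are now logged
```

In a real system the switch would more likely be flipped from configuration or an administrative tool than from code, but the cost profile is the same: almost nothing while tracing is off, and detailed information within minutes when it is needed.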

So it's important to be careful and not think "You Aren't Going to Need It" (YAGNI) too often when it comes to operational mechanisms. Applying YAGNI here can make the mechanism far too expensive to add if (or rather when) you need it. Remember, YAGNI assumes that the cost of adding something is pretty much the same now and later, in which case you can always wait until you really need it. When the cost is low now and high later, and there's a good chance you will need it, you should make a different decision.

Some Examples of Operational Mechanisms

Here I have listed a few operational mechanisms that should be considered for most enterprise-scale applications:

Tracing
As we just discussed, it's nice to be able to listen to what is going on at the same time as users run scenarios in the system. This is not only a very efficient way of tracking down bugs, but it can also be used for investigating where the bottlenecks are located, for example.
Logging
Errors, warnings, and information messages must be logged. This is extremely important for investigating problems after they have occurred. We can't expect the users to write down the exact messages for us. It might also be that we want to collect information that we don't want to show to the users.
Config
Have you had to recompile old applications just because the database server was switched to a new machine with a new name? I have. Of course, that kind of information should be configurable, and depending on the application, the same may be true for loads of other information.
Performance monitoring
Getting performance monitoring based on Domain Model information and other parts of your application is extremely helpful for tracking down problems, but also for keeping a proactive eye on the system. By doing that, you can easily see that it now takes perhaps 30% longer to execute a certain scenario than it did two months ago.
Security
These days, this one probably doesn't need any further explanation. We obviously need to carefully think through things like authentication and authorization. We also must protect our applications against attacks of different kinds.
Auditing
As one part of the security aspects, it's important to have auditing so that it's possible to check afterwards who did what, and when.
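To make the configuration item in the list above concrete, here is a minimal sketch in Python, under stated assumptions. The section name `database`, the default values, the `load_db_settings` function, and the environment variable `APP_DB_SERVER` are all invented for this illustration. It shows only the idea that the database server name lives outside the compiled code, so moving the database to a new machine means editing a setting rather than rebuilding the application.

```python
import configparser
import os

# Hypothetical defaults, used when the configuration says nothing else.
DEFAULTS = {"server": "localhost", "name": "orders"}


def load_db_settings(ini_text, environ=os.environ):
    """Return database settings from INI text; the environment may override."""
    parser = configparser.ConfigParser()
    parser.read_dict({"database": DEFAULTS})  # start from the defaults
    parser.read_string(ini_text)              # values in the text override them
    settings = dict(parser["database"])
    # APP_DB_SERVER is a made-up variable name for this sketch.
    settings["server"] = environ.get("APP_DB_SERVER", settings["server"])
    return settings


if __name__ == "__main__":
    ini = "[database]\nserver = db01.example.test\n"
    print(load_db_settings(ini))
```

The order of precedence here (defaults, then file, then environment) is one reasonable choice, not the only one. What matters is that operations staff can change the value without involving a compiler.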

It's Not Just Our Fault

In the defense of developers, I know I have asked operational people several times about their requirements regarding operational mechanisms, and they haven't said very much. I guess they haven't been spoiled with a lot of support from the applications.

That said, an appealing way of dealing with this is, if you can, to get someone from the operations side involved early on to act explicitly as a stakeholder in the system, so that you create the operational mechanisms that are really needed. The ordinary customer of the system isn't a good source of requirements here. The operational mechanisms are typical examples of non-functional requirements, and ordinary customers won't normally add much there.

Flexibility in your mechanisms might be important because different customers use different operational platforms. There are standards such as Windows Management Instrumentation (WMI), but if you build a framework for this, it's wise to build in flexibility so you can easily switch to different output formats for the logging, for example. One customer uses CA Unicenter, another uses Microsoft Operations Manager (MOM), yet another might use some product that won't understand WMI, and so on.
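One way to get that kind of flexibility is to keep the logging calls in the application fixed and choose the output format from configuration. The following Python sketch, built on the standard `logging` module, assumes a hypothetical registry called `FORMATTERS` and a hypothetical `build_logger` function. It does not talk to WMI, MOM, or Unicenter; an adapter for such a product would be one more entry in the registry.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


# Hypothetical registry: format name -> formatter factory.
FORMATTERS = {
    "text": lambda: logging.Formatter("%(levelname)s %(name)s: %(message)s"),
    "json": JsonFormatter,
}


def build_logger(name, output_format, stream):
    """Create a logger whose output format is chosen by configuration."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(FORMATTERS[output_format]())
    logger = logging.getLogger(name)
    logger.handlers.clear()   # replace any handler from an earlier call
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep output confined to the chosen handler
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger("orders", "json", buffer)
    log.warning("disk space low on %s", "db01")
    print(buffer.getvalue(), end="")
```

The application code only ever calls `log.warning(...)` and the like. Which monitoring product ends up consuming the output is a deployment decision made per customer.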