Don't Forget About Operations
Not too long ago, I was talking to a team at a large Swedish company. I talked, for example, about the Valhalla framework and how it looked at that particular point in time.
They asked me how we had dealt with operational mechanisms, such as logging, configuration, security, and so on. When I told them that we hadn't added that yet, they first went quiet and then they started laughing out loud. They said they had spent years on those aspects in their own framework, and we hadn't even started thinking about them.
Luckily, I could defend myself to some extent. We had been thinking quite a lot about it, but we wanted to set the core parts of the framework before adding the operational mechanisms. After all, the core parts influence how the mechanism should look. I could also direct them to my last book [Nilsson NED] where I talked a lot in the initial chapters about mechanisms like those (such as tracing, logging, and configuration).
An Example of When a Mechanism Is Needed
Why are the operational aspects important? Let's take an example. Assume an application that is in production lacks tracing. (This isn't just fictional. I know that this operational aspect is forgotten pretty often. Even though I have been talking myself blue in the face about this for the last few years, I myself have old applications in production without built-in tracing.) When a weird problem occurs that doesn't reveal much about itself in the error log, the cause is very hard to find and the problem is very hard to solve.
No Tracing in Place
You could always add tracing at that particular point in time, but it would probably take you a couple of days at least. If the problem is serious, the customer will expect you to find and solve the problem in less time than a couple of days.
A common (and most often pretty inefficient) way to approach this is to make ad-hoc changes and, after each change, cross your fingers and hope that the problem is gone.
What you probably do instead is add ad-hoc tracing here and there. It will make your code much uglier, and it will take some time before you track down the problem. The next time there is another problem, very little has changed. You will be back at square one.
It might also be possible to run a debugger in the production environment. However, this has problems of its own: you might interfere too much with other systems, or you might have obfuscated the code with some tool so that it's hard to debug.
It's also risky to change the code in production, even if the change is as small as adding tracing. Not a big risk, but it's there.
If you have the possibility of using Aspect-Oriented Programming (AOP), it might not take more than a few minutes to add tracing afterward. We will discuss AOP quite a lot in Chapter 10, "Design Techniques to Embrace."
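To give a rough feeling for why this can be so quick, here is a minimal sketch in Python. It uses a plain decorator as a stand-in for an aspect; it is not an AOP framework and not the technique shown in Chapter 10. The logger name "trace", the decorator traced, and the business function calculate_discount are all hypothetical names made up for the illustration.

```python
import functools
import logging

# Hypothetical logger name, used only for this illustration.
trace_log = logging.getLogger("trace")


def traced(func):
    """Wrap func so that entry, exit, and exceptions are written to trace_log."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        trace_log.debug("enter %s args=%r kwargs=%r", func.__qualname__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Record the failure with its traceback, then let it propagate unchanged.
            trace_log.exception("error in %s", func.__qualname__)
            raise
        trace_log.debug("exit %s result=%r", func.__qualname__, result)
        return result

    return wrapper


@traced
def calculate_discount(order_total, customer_level):
    # Hypothetical business logic; note that the body contains no tracing code.
    return order_total * (0.1 if customer_level == "gold" else 0.0)
```

The point of the sketch is that the tracing concern lives in one place and is attached from the outside, so the business code itself stays untouched.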
Tracing in Place
If you have a working tracing solution in place, you know how efficient it can be for finding and solving the problem. The days-long delay is gone, and you are on the way to tracking down the problem in minutes.
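One way to picture "tracing in place" is tracing that ships with the application but stays switched off until somebody needs it. The following is a minimal sketch in Python under that assumption. The setting name APP_TRACE_LEVEL, the logger name "app.trace", and the function place_order are hypothetical and exist only for the illustration.

```python
import logging

# Hypothetical mapping from a configuration value to a logging level.
LEVELS = {
    "OFF": logging.CRITICAL + 1,  # above every standard level, so nothing is emitted
    "ERROR": logging.ERROR,
    "INFO": logging.INFO,
    "DEBUG": logging.DEBUG,
}

tracer = logging.getLogger("app.trace")  # hypothetical logger name


def configure_tracing(settings):
    """Set the trace level from a settings mapping (for example os.environ).

    Unknown or missing values fall back to OFF, so a typo in the
    configuration never turns on verbose tracing by accident.
    """
    name = settings.get("APP_TRACE_LEVEL", "OFF").upper()
    tracer.setLevel(LEVELS.get(name, LEVELS["OFF"]))
    return tracer.level


def place_order(order_id):
    # The trace call is always in the code; it costs little while tracing is off.
    tracer.debug("place_order start order_id=%s", order_id)
    status = "accepted"  # stand-in for the real work
    tracer.debug("place_order done order_id=%s status=%s", order_id, status)
    return status
```

With something like this in place, turning on tracing for a production problem is a configuration change rather than a code change, for example by calling configure_tracing(os.environ) at startup and setting APP_TRACE_LEVEL to DEBUG.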
So it's important to be careful and not think "You Aren't Going to Need It" (YAGNI) too often when it comes to operational mechanisms. Applying YAGNI too readily here will cost too much when you have to add the mechanism later, if (or rather when) you need it. Remember, the idea with YAGNI is that the cost of adding something is pretty much the same now and later, in which case you can always wait until you really need it. When the cost is low now and high later, and there's a good chance you will need it, you should make a different decision.
Some Examples of Operational Mechanisms
Here I have listed a few operational mechanisms that can be considered for most enterprise-scale applications:
It's Not Just Our Fault
In the defense of developers, I know I have asked operational people several times about their requirements regarding operational mechanisms, and they haven't said very much. I guess they haven't been spoiled with a lot of support from the applications.
That said, an appealing way of dealing with this is, if you can, to get some resources from the operations side early on to act explicitly as stakeholders in the system, so that you create the operational mechanisms that are really needed. The ordinary customer of the system isn't a good source of requirements here. The operational mechanisms are typical examples of non-functional requirements, and ordinary customers won't normally add much there.
Flexibility in your mechanisms might be important because different customers use different operational platforms. There are standards such as Windows Management Instrumentation (WMI), but if you build a framework for this, it's wise to build in flexibility so you can easily switch to different output formats for the logging, for example. One customer uses CA Unicenter, another uses Microsoft Operations Manager (MOM), yet another might use some product that won't understand WMI, and so on.
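As a minimal sketch of that kind of flexibility, the Python code below separates what is logged from how it is formatted and where it is written. It does not emit WMI events or the format of any particular operations product; the two formatters are simple stand-ins, and every name in it (LogEvent, PlainTextFormatter, JsonFormatter, OperationsLog) is hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class LogEvent:
    """The information the application wants to report, independent of format."""
    severity: str
    source: str
    message: str


class PlainTextFormatter:
    """Stand-in for one customer's expected format."""

    def format(self, event):
        return f"[{event.severity}] {event.source}: {event.message}"


class JsonFormatter:
    """Stand-in for another customer's expected format."""

    def format(self, event):
        return json.dumps(asdict(event), sort_keys=True)


class OperationsLog:
    """Application-facing logging API; formatter and writer are injected."""

    def __init__(self, formatter, write):
        self._formatter = formatter
        self._write = write  # any callable taking one string, such as print

    def log(self, severity, source, message):
        self._write(self._formatter.format(LogEvent(severity, source, message)))
```

The application only ever calls OperationsLog.log, so supporting another operations platform means adding one more formatter (or writer) and choosing it in configuration, for example OperationsLog(JsonFormatter(), print) for one customer and OperationsLog(PlainTextFormatter(), print) for another.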