COMING TO TERMS
Exceptions and Error Handling
When designing an interface, architects naturally concentrate on documenting how resources work in the nominal case, when everything goes according to plan. The real world, of course, is far from nominal, and a well-designed system must take appropriate action in the face of undesired circumstances. What happens when a resource is called with parameters that make no sense? What happens when the resource requires more memory, but the allocation request fails because there isn't any more? What happens when a resource never returns, because it has fallen victim to a process deadlock? What happens when the software is supposed to read the value of a sensor, but the sensor has failed and either isn't responding or is responding with gibberish?
Terminating the program on the spot seldom qualifies as "appropriate action." More desirable alternatives, depending on the situation, include various combinations of the following:
- Returning a status indicator: an integer code, or even a message, that reports on the resource's execution: what, if anything, went wrong and what the result was.
- Retrying, if the offending condition is considered transient. The program might retry indefinitely or up to a preset number of times, at which point it returns a status indicator.
- Computing partial results or entering a degraded mode of operation.
- Attempting to correct the problem, perhaps by using default or fallback values or alternative resources.
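Several of these alternatives can be combined in a single resource. The Python sketch below is illustrative only: `Status`, `TransientFault`, `read_with_retry`, and the fake sensor are hypothetical names invented for this example, not part of any real API. It retries a condition considered transient up to a preset number of times, then falls back to a default value and returns a status indicator that reports the degraded result.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    OK = 0        # nominal result
    DEGRADED = 1  # retries exhausted, fallback value used


class TransientFault(Exception):
    """Hypothetical error for a condition worth retrying."""


@dataclass
class Result:
    status: Status
    value: float
    attempts: int


def read_with_retry(read_sensor: Callable[[], float],
                    max_attempts: int = 3,
                    fallback: float = 0.0) -> Result:
    """Retry a transient fault up to max_attempts, then fall back."""
    for attempt in range(1, max_attempts + 1):
        try:
            return Result(Status.OK, read_sensor(), attempt)
        except TransientFault:
            continue  # considered transient: try again
    # Retries exhausted: report a degraded result with a default value.
    return Result(Status.DEGRADED, fallback, max_attempts)


def make_flaky_sensor(failures: int, reading: float) -> Callable[[], float]:
    """Build a fake sensor that fails `failures` times, then succeeds."""
    remaining = [failures]

    def read() -> float:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TransientFault("no response")
        return reading

    return read
```

Whichever combination is chosen, the retry limit, the fallback value, and the meaning of each status would be documented as part of the resource's effects.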
These are all reasonable actions that a resource can take in the presence of undesired circumstances. If a resource is designed to take any of these actions, that should simply be documented as part of the effects of that resource. But many times, something else is appropriate. The resource can, in effect, throw up its hands and report that an error condition existed and that it was unable to do its job. This is where old-fashioned programs would print an error message and terminate. Today, they often raise an exception, which allows execution to continue and perhaps accomplish useful work.
Making a strong distinction between detecting an error condition and handling it provides greater flexibility in taking corrective action. The right place to fix a problem raised by a resource is usually in the actor that invoked it, not in the resource itself. The resource detects the problem; the actor handles it. If we're in development, handling it might mean terminating with an error message so the bug can be tracked down and fixed. Perhaps the actor made the mistake because one of its own resources was used incorrectly by another actor. In that case, the actor might handle the exception by raising an exception of its own, bubbling the responsibility back along the invocation chain until the actor ultimately responsible is notified.
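The split between detection and handling can be sketched in Python. Everything here is hypothetical and chosen only for illustration: a `Logger` element whose `write` resource detects that it was never opened, an intermediate actor `make_report` that passes responsibility upward by raising its own exception, and a `main_actor` that is ultimately responsible and takes corrective action.

```python
class NotInitializedError(Exception):
    """Raised by the resource: it detects the problem but does not fix it."""


class ReportFailedError(Exception):
    """Raised by the intermediate actor to pass responsibility upward."""


class Logger:
    """Hypothetical element whose `write` resource requires `open` first."""

    def __init__(self) -> None:
        self._opened = False
        self.lines = []  # lines written so far

    def open(self) -> None:
        self._opened = True

    def write(self, line: str) -> None:
        if not self._opened:
            # Detection only: the resource cannot know the right remedy.
            raise NotInitializedError("write() called before open()")
        self.lines.append(line)


def make_report(logger: Logger, text: str) -> None:
    """Intermediate actor: its own caller handed it an unusable logger."""
    try:
        logger.write(text)
    except NotInitializedError as exc:
        # Bubble responsibility back along the invocation chain.
        raise ReportFailedError("caller supplied an unopened logger") from exc


def main_actor() -> str:
    """Actor ultimately responsible: it handles the problem by fixing it."""
    logger = Logger()
    try:
        make_report(logger, "status: nominal")
    except ReportFailedError:
        logger.open()  # corrective action, then try again
        make_report(logger, "status: nominal")
    return logger.lines[0]
```

The `raise ... from exc` form keeps the original exception attached as the cause, so the actor that finally handles the problem can still see where it was first detected.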
Modern programming languages provide facilities for raising exceptions and assigning handlers. Programming language reference manuals take a language-oriented view in classifying the world of exceptions. The C++ programming language, for instance, has built-in exception classes dealing with memory allocation failure, process failure, tasking failures, and the like. Those are exceptions that the compiled program is likely to encounter from the operating system.
But many other things can go wrong during execution of software, and it is incumbent on the architect to say what they are. An architecture-oriented classification of exceptions is summarized in Figure 7.2. In the context of an element's interface, exception conditions are one of the following:
- Errors on the part of an actor invoking the resource.
  - An actor sent incorrect or illegal information to the resource, perhaps calling a method with a parameter of the wrong type. Such a type error will normally be detected by the compiler, and an exception is not necessary unless types can change dynamically, in which case things aren't so clear-cut. If your compiler does not generate code to do runtime type checking, associating an exception with the resource is the prudent thing to do. Other exceptions of this variety describe a parameter with an illegal or out-of-bounds value. Division by zero is the classic example of this, with array-bounds violations a close runner-up. Other examples are a string that has the wrong syntax or length; a pair of parameters defining a range in which the minimum exceeds the maximum; an uninitialized variable passed as input; and a set that contains a duplicate member.
  - The element is in the wrong state for the requested resource. The element entered the improper state as a result of a previous action or lack of a previous action on the part of an actor. An example of the latter is invoking a resource before the element's initialization method has been called.
- Software or hardware events that result in a violation of the element's assumptions about its environment.
  - A hardware or software error occurred that prevented the resource from successfully executing. Processor failures, inability to allocate more memory, and memory faults are examples of this kind of exception.
  - The element is in the wrong state for the requested resource. The element's improper state was brought about by an event that occurred in the environment of the element, outside the control of the actor requesting the resource. An example is trying to read from a sensor or write to a storage device that has been taken offline by the system's human operator.
Figure 7.2. A classification of exceptions associated with a resource on an element's interface
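One way to carry this classification into code is an exception class hierarchy with the same shape. The Python sketch below is an assumption made for illustration; the class names, `check_range`, and `responsible_party` are hypothetical and do not come from the figure.

```python
class ResourceError(Exception):
    """Root of every exception documented on the interface."""


class ActorError(ResourceError):
    """Error on the part of the actor invoking the resource."""


class IllegalInformationError(ActorError):
    """The actor sent incorrect or illegal information."""


class ActorStateError(ActorError):
    """Wrong state, caused by the actor's action or inaction."""


class EnvironmentViolation(ResourceError):
    """An event violated the element's assumptions about its environment."""


class ExecutionFault(EnvironmentViolation):
    """A hardware or software error prevented execution."""


class EnvironmentStateError(EnvironmentViolation):
    """Wrong state, caused by an event outside the actor's control."""


def check_range(minimum: float, maximum: float) -> tuple:
    """Hypothetical resource taking a pair of parameters that define a range."""
    if minimum > maximum:
        raise IllegalInformationError("minimum exceeds maximum")
    return (minimum, maximum)


def responsible_party(exc: ResourceError) -> str:
    """Say which side of the interface the classification points to."""
    return "actor" if isinstance(exc, ActorError) else "environment"
```

With a hierarchy like this, an actor can catch `ActorError` to find its own mistakes and treat `EnvironmentViolation` separately, for instance by retrying or switching to a degraded mode.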
Exceptions and effects produce a three-way division of the state space for every resource on an interface.
- First, effects promise what will happen in a certain portion of the state space, what Parnas (Parnas and Wuerges, 2001) has called the competence set of the program: the set of states in which it is competent to carry out its function. A resource invoked in a state that is a member of its competence set will execute as promised in the interface document.
- Second, exceptions specify the semantics in a different region of the state space, corresponding to error conditions that the architect has had the foresight to anticipate. If a resource is invoked in a state corresponding to an exception, the effects are simply that the exception is raised. (Remember, handling the exception is not in the purview of the resource but in the actor that invoked it. Raising the exception gives the actor the chance to do so.) We'll call this set of states the exception set.
- Third, there is everything else: the region of the state space where what happens is completely undefined if a resource is invoked. The architect may not know what happens there and may never even have considered the possibility. We'll call this set of states the failure set; we could as well have called it the cross-your-fingers-and-hope-for-the-best set. The behavior may be unpredictable and hence difficult to re-create and therefore eliminate, or it may be depressingly predictable: a very ungraceful software crash.
In a perfect world, the architect squeezes the failure set to nothingness by moving failure states to the competence set by expanding the statement of effects, or to the exception set by creating more exceptions. An equally valid approach is to make a convincing argument that the program cannot possibly get into a state in the failure set.
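The three-way division can be made concrete with a small Python sketch. The `mean` resource, its documentation, and the `region` classifier are all hypothetical, written only to show how the competence, exception, and failure sets partition the inputs of one resource.

```python
import math
from enum import Enum


class Region(Enum):
    COMPETENCE = "competence set"
    EXCEPTION = "exception set"
    FAILURE = "failure set"


class EmptyInputError(ValueError):
    """The one exception documented for `mean`."""


def mean(values):
    """Effects: returns the arithmetic mean of a non-empty list of
    finite floats.
    Exceptions: raises EmptyInputError if the list is empty.
    Nothing else is documented.
    """
    if len(values) == 0:
        raise EmptyInputError("no values")
    return sum(values) / len(values)


def region(values) -> Region:
    """Classify an input against the documentation of `mean`."""
    # The empty list must be tested first: all() is True for no elements.
    if isinstance(values, list) and len(values) == 0:
        return Region.EXCEPTION
    if (isinstance(values, list)
            and all(isinstance(v, float) and math.isfinite(v)
                    for v in values)):
        return Region.COMPETENCE
    return Region.FAILURE
```

In this sketch, a list containing NaN and a string argument both land in the failure set: the first quietly returns NaN and the second crashes with a `TypeError` that the interface never mentioned. Squeezing the failure set would mean either documenting those cases as effects or adding checks that raise documented exceptions for them.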
For example, suppose that element E needs to have complete control of a shared device during the execution of resource R on interface I. If the architect wasn't sure that this would always be the case when R was invoked, he or she would either (1) specify what the element would do if the device was already in use (return immediately with a failure code, retry a set number of times, or wait a set period) or (2) define an exception for R that reported the condition back to the actor and made it the actor's responsibility to sort out. But perhaps the architect is certain that the device will never be in use, because element E is the only element that uses it. So the architect doesn't define behavior for the resource to account for that condition and doesn't define an exception for it, either. This puts the condition in the resource's failure set, but the architect can make a convincing argument that doing so is safe.
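The first two choices in this example might look as follows in Python. This is a sketch under stated assumptions: a `threading.Lock` stands in for the shared device, and `ElementE`, the status codes, and `DeviceBusyError` are hypothetical names. Under option (1) the busy condition becomes part of R's documented effects; under option (2) it becomes a documented exception.

```python
import threading

OK = 0            # hypothetical status code: R did its job
DEVICE_BUSY = -1  # hypothetical status code: device already in use


class DeviceBusyError(Exception):
    """Option (2): the condition is reported back to the actor."""


class ElementE:
    def __init__(self) -> None:
        # The lock is a stand-in for exclusive control of the device.
        self._device = threading.Lock()

    def r_with_status(self) -> int:
        """Option (1): return immediately with a failure code."""
        if not self._device.acquire(blocking=False):
            return DEVICE_BUSY
        try:
            return OK  # real work with the device would go here
        finally:
            self._device.release()

    def r_with_exception(self) -> None:
        """Option (2): raise and leave the remedy to the actor."""
        if not self._device.acquire(blocking=False):
            raise DeviceBusyError("shared device already in use")
        try:
            pass  # real work with the device would go here
        finally:
            self._device.release()
```

The third choice, leaving the condition in the failure set, would correspond to a version of R with no busy check at all, which is acceptable only when backed by the argument that no other element can ever hold the device.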