Dealing with error conditions is probably the hardest part of the development effort. Errors fall into at least two categories: conditions that arise in the normal operation of the program, and failures in the environment in which the program is operating. I prefer the term deviation for the first category: a departure from straightforward processing that can occur during normal program operation.

Most use case logic is straightforward: the user does this, the system responds with that. In the normal course of processing, the system sometimes needs to deviate from this straightforward path. For example, a CustomerID might be entered that does not match any of the IDs in the set of Customers. This could occur because the CustomerID was input incorrectly, or because the Customer was deleted after not renting for several years. If the collection of customers is kept on a server, other possible causes include a network failure or a server failure.

The first set of causes for a CustomerID not being found are deviations that can occur during normal processing. A correction mechanism can be suggested to the user (e.g., reenter the ID), though user action might not solve the problem. [*] The second set of causes (network or server failure) are errors, not deviations. They should not occur during normal operation. However, if the server or network were known to be unreliable, they could be handled as deviations.
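The distinction can be sketched in code. The following is a minimal Java illustration, not something the text prescribes: all class and method names (`CustomerStore`, `find`, `CustomerStoreUnavailableException`) are hypothetical, and the unreachable server is simulated with a flag. An unknown ID is treated as a deviation and is part of the return type; an unreachable store is treated as an error and is signaled separately.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical names throughout; the text does not define this API.
class Customer {
    final String id;
    final String name;

    Customer(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Error: the environment failed (for example, the server is unreachable).
class CustomerStoreUnavailableException extends RuntimeException {
    CustomerStoreUnavailableException(String message) {
        super(message);
    }
}

class CustomerStore {
    private final Map<String, Customer> customers = new HashMap<>();
    private boolean reachable = true;

    void add(Customer customer) {
        customers.put(customer.id, customer);
    }

    // Simulates a server or network failure for the sake of the example.
    void setReachable(boolean reachable) {
        this.reachable = reachable;
    }

    // Deviation: an unknown ID is an expected outcome, so it is expressed in
    // the return type. Error: an unreachable store raises an exception.
    Optional<Customer> find(String customerId) {
        if (!reachable) {
            throw new CustomerStoreUnavailableException("customer store is unreachable");
        }
        return Optional.ofNullable(customers.get(customerId));
    }
}
```

A caller that receives an empty `Optional` can suggest a correction such as reentering the ID, while the exception travels up to whatever level deals with environmental failures.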
Deviations should be dealt with at an appropriate level. The methods closest to where the deviation occurs often have the most information regarding what actions the user can take. If opening a nonexistent file signals an error, the caller of the open method usually knows the file's purpose and can add information regarding what might occur in the absence of that file. For example, suppose the file the method was opening was a configuration file. If the configuration file does not exist, the method might choose to use default settings. If the configuration file is absolutely required by the program, the method can signal an error.

Errors also should be dealt with at an appropriate level. There are two types of errors: fatal errors and nonfatal errors. Fatal errors are conditions for which further processing is probably futile. Examples of fatal errors include "out of memory" and "out of disk space." The user level is usually the place to deal with these errors, because the internal code cannot correct them.

Nonfatal errors are conditions for which the program can continue operation, albeit in a reduced capacity. An example is the inability to contact a service over a primary network. The methods in the level on which this error occurs should attempt contact over a backup network, instead of passing the error up to a higher level. If enough nonfatal errors occur, they can add up to a fatal error. For example, if both the primary and backup networks go down, a fatal error should be signaled.
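The primary and backup network example might look like the following Java sketch. The `Network` interface, `ServiceClient`, and `ServiceUnreachableException` are hypothetical names introduced only for illustration. The nonfatal failure of the primary route is handled at the level where it occurs, and only the failure of both routes is escalated as a fatal error.

```java
import java.io.IOException;

// Hypothetical abstraction over one network route to a service.
interface Network {
    String contact(String request) throws IOException;
}

// Fatal error: neither route works, so further processing is probably futile.
class ServiceUnreachableException extends RuntimeException {
    ServiceUnreachableException(String message, Throwable cause) {
        super(message, cause);
    }
}

class ServiceClient {
    private final Network primary;
    private final Network backup;

    ServiceClient(Network primary, Network backup) {
        this.primary = primary;
        this.backup = backup;
    }

    String send(String request) {
        try {
            return primary.contact(request);
        } catch (IOException primaryFailure) {
            // Nonfatal: handle it at this level by trying the backup route.
            try {
                return backup.contact(request);
            } catch (IOException backupFailure) {
                // Both routes failed: escalate to a fatal error, keeping
                // both underlying failures for diagnosis.
                backupFailure.addSuppressed(primaryFailure);
                throw new ServiceUnreachableException(
                        "primary and backup networks are both down", backupFailure);
            }
        }
    }
}
```

Callers of `send` never see a primary-network failure on its own; they see either a result or the fatal condition.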
Whether deviations are signaled using return codes or exceptions is a matter of preference. If they are reported using exceptions, they should be classified into their own hierarchy to differentiate them from exceptions for unexpected conditions. If all possible deviations are coded as just regular exceptions, it becomes difficult to separate the expected from the unexpected.

Some languages, notably Java, divide exceptions into checked and unchecked exceptions. Checked exceptions are listed in the declaration of the method. The caller of the method must explicitly handle all checked exceptions by either catching them or passing them back to its caller. Unchecked exceptions are not listed, and the caller might not even be aware that they are thrown. Unchecked exceptions are typically used for conditions that should cause termination, such as the inability to connect to a database.

3.9.1. Failure Distance

A large spread between the point where an error is introduced and the point where it is noticed makes the error harder to debug. For example, suppose an object reference is set to a null value. If this value is used to refer to an object, a program exception usually occurs. For example:

    String reference = NULL;
    // A few lines of code
    reference.get_length();

If the distance between the setting of the reference and its use is small, the problem is relatively easy to detect. If the distance is within a single method, a compiler can often identify the problem and issue a warning. However, if the reference is set in one method and is not used until many methods later, all the intervening methods have to be examined for bugs. The sooner the error is detected, the easier it is to correct.

The same concept of distance applies to the development process. The sooner an error is found, the easier it can be to fix. If abstract data types (ADTs) are used extensively, many errors can be detected at compile time in languages that support static type checking.
For example, with a method such as:

    get_abbreviation_for_state(String state);

any string can be passed to the method. With:

    enumeration State {Arkansas, Alaska, ...}
    get_abbreviation_for_state(State aState)

the compiler will signal an error if anything other than a State is passed.
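A Java-flavored sketch of the typed version follows. It assumes a hypothetical `StateInput.parse` helper that is not part of the text, and it lists only the two states named above. The helper converts raw text into a `State` at the point where the text enters the program, so an invalid name fails there instead of many methods later.

```java
import java.util.Locale;

// Only two states are listed, mirroring the text; a real enumeration
// would list all of them.
enum State {
    ARKANSAS("AR"),
    ALASKA("AK");

    private final String abbreviation;

    State(String abbreviation) {
        this.abbreviation = abbreviation;
    }

    String abbreviation() {
        return abbreviation;
    }
}

// Hypothetical helper for the boundary where user input enters the program.
class StateInput {
    // An unknown name fails here, with the IllegalArgumentException thrown
    // by valueOf, which keeps the failure distance short.
    static State parse(String rawName) {
        return State.valueOf(rawName.trim().toUpperCase(Locale.ROOT));
    }
}
```

Once the value is a `State`, every method that accepts it is protected by the compiler, and only the single conversion point needs a runtime check.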
3.9.2. User Messages

Messages reported to the user from deviations and errors should be meaningful to the user. They should include as much information as possible regarding how to work around or correct the error. Failures can be categorized by what the failure means and how the user might react to it, and the user message should designate the category of the failure. For example, permanent failures imply that a user who tries the same operation again will get the same error. Transient failures suggest that the user might attempt the operation again immediately and might complete it successfully. Temporary failures require some undetermined period of time before they are cleared up.
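One way to carry the category into the message is sketched below in Java. The `FailureCategory` enumeration, the `UserMessage` class, and the wording of the advice are all assumptions made for illustration; the text names the three categories but does not prescribe any of this.

```java
// Hypothetical categories based on the three kinds of failure described above.
enum FailureCategory {
    PERMANENT("Repeating the same operation will produce the same error."),
    TRANSIENT("You can try the operation again right away."),
    TEMPORARY("Please try again later.");

    private final String advice;

    FailureCategory(String advice) {
        this.advice = advice;
    }

    String advice() {
        return advice;
    }
}

class UserMessage {
    // Combines what happened with advice that reflects the failure category.
    static String format(FailureCategory category, String whatHappened) {
        return whatHappened + " " + category.advice();
    }
}
```

For instance, `UserMessage.format(FailureCategory.TRANSIENT, "The rental could not be saved.")` produces a message that tells the user both what failed and that an immediate retry is reasonable.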
Implementation-related messages (such as a stack trace) should be captured, but not necessarily displayed to the user. You need to provide the means for developers to see the details. Otherwise, you cannot diagnose and fix problems, particularly intermittent ones.

3.9.3. Assertions

Assertions are statements about conditions that must be true while a program is executing. Some developers disable assertions when a program is used for production. However, if an assertion should be true during testing, it also ought to be true during production. Assertions should be removed only if there is a measured performance penalty.

During testing, a failed assertion usually causes the program to exit. For many applications, that behavior is appropriate. The user can be informed with a friendly terminating error message. For other applications, such as a server, that behavior might not be acceptable. In that case, assertion failures should be reported immediately and logged.
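The two behaviors can be sketched with a small helper. This is a hypothetical `Check` class written for illustration, not the Java `assert` keyword (which is disabled unless the JVM is started with `-ea`). It always logs a failed assertion with its stack trace, then either throws or continues depending on a policy chosen by the application.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper that stays active in production.
class Check {
    enum OnFailure { TERMINATE, REPORT }

    private static final Logger LOG = Logger.getLogger(Check.class.getName());

    private final OnFailure policy;
    private int failures = 0;

    Check(OnFailure policy) {
        this.policy = policy;
    }

    void that(boolean condition, String description) {
        if (condition) {
            return;
        }
        failures++;
        AssertionError failure = new AssertionError(description);
        // Always log, so developers can see the details and the stack trace.
        LOG.log(Level.SEVERE, "assertion failed: " + description, failure);
        if (policy == OnFailure.TERMINATE) {
            // The top level can catch this, show a friendly message, and exit.
            throw failure;
        }
        // REPORT: a server keeps running after reporting the failure.
    }

    int failureCount() {
        return failures;
    }
}
```

A desktop application might construct `new Check(Check.OnFailure.TERMINATE)`, while a server might use `REPORT` and monitor `failureCount()` or the log.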