Requirements for Error Handling | Comprehensive VB .NET Debugging

Although what developers aim to achieve with their error handling might seem obvious, it's worthwhile to take some time to understand the subtleties of the process. There's far more to error handling than just catching every exception and displaying a message box to the user. Before you can build a general error-handling framework, you need to understand exactly what the requirements are. This section of the chapter discusses the requirements in three areas, based on the three groups of people involved in using and supporting a typical application.

End User Requirements

Remember that the end user is rarely interested in the precise details of a software problem or indeed why the problem happened. He just wants to use your software to accomplish something, and a detailed error message that's helpful to a developer is likely to be useless to the end user. This perspective is alien to most developers, which is why they should be careful about how they communicate a software malfunction to a user. Bearing this in mind, here's a list of ideas for you to think about when writing error-handling code that affects your end user:

Stop the task that the software is trying to perform. Trying to continue with that task after an application error is likely to destabilize your application and possibly make the problem worse.
It's usually better for your application to give no answer than to give a potentially wrong answer. A wrong answer will mislead or confuse the user, especially when he doesn't see a direct link between the error and the answer.
If at all possible, keep your application stable after an error. This normally means rolling everything back to a known safe state, a task that's made easier when you use the Try Catch Finally construct that I discuss later. Try to avoid closing your application, as this can be very annoying for a user. Instead, try to structure your code so that it can recover after most errors, at least long enough for the user to complete her task. Unfortunately, as you'll see later, there are a few exceptions that will kill your application completely.
Don't lose or corrupt the end user's data. Nothing infuriates a user more effectively than losing information that he has spent some time compiling and entering into your application. Saving the user's data after an error can be tricky because the data in memory may have been corrupted by the error and you don't want to risk overwriting the valid information that's already stored. If necessary, save the data in memory to some temporary store to avoid disrupting information saved previously.
Users don't read anything! Study after study has shown that the majority of users simply don't bother to read that carefully crafted error message that you present to explain the problem that your application encountered. Think of an end user as similar in some ways to your boss. She doesn't want to hear about a problem; she wants to see a solution or workaround.
If you need to describe the problem to the end user, you should do so in that user's terms and language. Avoid obscure technical jargon, abbreviations, and acronyms.
Following on from the previous point, try to suggest a solution or workaround to your end user. For example, rather than just presenting an error message saying that your application can't connect to its database, create an e-mail message that explains the problem and is already addressed to the application support group. Then give your user the ability to transmit this e-mail by simply clicking a button. He'll appreciate your helpfulness and come to think of your application as polite and helpful, even when it goes wrong.
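
As a minimal sketch of the "stop the task, roll back, and explain in plain language" ideas above, the following VB .NET code snapshots a hypothetical Order object before changing it and restores the snapshot if the task fails. The Order class, the TryAddItems function, and the message wording are all assumptions made for illustration. The generic collections also assume a newer compiler than the first VB .NET releases.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical in-memory data that the user is working on.
Public Class Order
    Public Items As New List(Of String)
    Public IsBusy As Boolean
End Class

Public Module SafeTaskSketch

    ' Tries to add every item. If any item is invalid, the whole task stops
    ' and the order is restored to the state it had before the task began.
    ' Returns Nothing on success, or a plain-language message for the user.
    Public Function TryAddItems(ByVal currentOrder As Order, ByVal newItems As IEnumerable(Of String)) As String
        ' Copy of the known safe state, taken before anything is changed.
        Dim snapshot As New List(Of String)(currentOrder.Items)
        currentOrder.IsBusy = True
        Try
            For Each item As String In newItems
                If String.IsNullOrEmpty(item) Then
                    Throw New ArgumentException("Item name is missing.")
                End If
                currentOrder.Items.Add(item)
            Next
            Return Nothing
        Catch ex As ArgumentException
            ' Give no answer rather than a partly wrong one: roll back.
            currentOrder.Items = snapshot
            Return "Some items could not be added, so none were added. Your existing order is unchanged."
        Finally
            ' Cleanup runs whether or not the task succeeded.
            currentOrder.IsBusy = False
        End Try
    End Function

End Module
```

The returned message deliberately avoids exception names and stack traces. Those details belong in a log for support staff and developers, not in front of the end user.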

Once you've written code to help your users recover from application errors, you should monitor your applications in production to make sure that the information provided is as useful and helpful as possible. You'll often have to add specific advice and workarounds over time to cope with the most common problems experienced by your users.

Operations Support Requirements

The next group of people that you want to help with solving your application's problems is the first-level support department. These are the people tasked with solving the users' day-to-day software problems and with passing on any problems that they can't solve to application or maintenance developers. If they're given the right application support, these people can be invaluable for dealing with most of the routine problems without troubling the developers. Here are some ideas for handling application errors in a way that helps the operational support staff:

End users are notorious for failing to supply sufficient information about application problems. With some users, a bald statement that your application isn't working, never worked, and resembles a pile of rat droppings may be all the information that the support person receives. If your application actively records copious details about every problem or exception in a central location accessible to the support staff, you can make their job of investigation much easier.
Think about proactively notifying the support staff when your software experiences a problem, without waiting for the user to complain. For example, sending a message via an instant messaging system is ideal for this task. Doing this provides a dual benefit. First, the allocated support person has a head start in tackling the problem and is aware of the situation even before being contacted by the user. Second, the user will be less unhappy if he knows that a support person is already aware of the problem and may even be investigating it.
Provide the support staff with an escalation path for resolving the problem if it proves to be intractable. Typically, one or more developers own each component in a software application. The application can inform support staff about who is likely to be the expert in the context of the specific component that's experiencing the error. This can result in quicker and more effective problem resolution.
Try to avoid flooding the support staff with the same problem many times within a short period. For instance, an application that monitors files within a network folder might find that somebody has changed some file or folder permissions in such a way that the application can no longer view the folder contents effectively. In this case, churning out an error message every time that the security exception is raised, maybe once a second or more, can become very annoying very quickly for the support staff. Either the software should recognize that the same exception is occurring ad infinitum and suppress its own error output for a while, or you could provide a way in which a support person can suppress a specific error message temporarily. Otherwise, the error message becomes like an irritating car alarm that wakes you up during the night and resolutely refuses to switch itself off.
Providing a complete audit trail of the application's actions leading up to an error and information about its current status can be invaluable in helping a support person to isolate a problem. This could involve writing a trace log of the application's major actions and also of the user's actions before the error occurred.
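
The idea of suppressing a repeated error for a while can be sketched as follows. This is only one possible approach, and the ErrorThrottle class, its ShouldReport method, and the notion of a string key identifying "the same problem" are hypothetical names introduced for this example. The current time is passed in as a parameter so that the behavior is easy to test.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical helper that decides whether an error should be reported
' to support staff or suppressed because it was reported recently.
' Not thread-safe as written; add locking if several threads share it.
Public Class ErrorThrottle
    Private ReadOnly _quietPeriod As TimeSpan
    Private ReadOnly _lastReported As New Dictionary(Of String, DateTime)

    Public Sub New(ByVal quietPeriod As TimeSpan)
        _quietPeriod = quietPeriod
    End Sub

    ' errorKey identifies the problem, for example the exception type
    ' combined with the folder path that could not be read.
    Public Function ShouldReport(ByVal errorKey As String, ByVal currentTime As DateTime) As Boolean
        Dim last As DateTime
        If _lastReported.TryGetValue(errorKey, last) Then
            If currentTime.Subtract(last) < _quietPeriod Then
                Return False   ' Same problem, still inside the quiet period.
            End If
        End If
        _lastReported(errorKey) = currentTime
        Return True
    End Function
End Class
```

Because suppressed occurrences don't reset the timer, a problem that keeps recurring is reported again once per quiet period instead of disappearing from view altogether.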

To get some ideas for the sort of information that support staff find useful, please refer to the "Useful Diagnostic Information" section at the beginning of Chapter 6.

Developer Requirements

Maintenance developers don't want friendly, reassuring error messages or any soft padding that hides the raw details of a problem. They need to see clear signposts to an error, along with a mass of technical information to fall back on if the signposts don't point in the correct direction. These are some of the points you should think about when writing error handling and diagnostics code that produces information viewed by other developers:

Try to keep your error handling and cleanup code together in a location separate from the main application logic. This separation of the application logic from the error handling and cleanup code makes it easier for a support developer to understand what happens in the event of a problem and makes it more likely that the cleanup code will always be executed consistently. The Try Catch Finally construct, which I discuss in detail later in this chapter, is ideal for maintaining this separation.
Pay as much attention to your error-handling and cleanup code as you do to the application logic. Does your error-handling code leave your application in a valid business state? Have you tried passing bad arguments to your methods so that you can step through your exception logic? Have you stepped through all of your exception code with the debugger? Do your unit and regression tests exercise all reasonable paths, especially exception paths, in your code?
If you're throwing an exception, add specific information to the exception to make it easier for the developer calling your code to determine the exact problem. For instance, instead of just reporting that an invalid parameter was passed, explain which parameter was involved and list the valid values for that parameter. One trick some developers use is to throw a custom exception that displays a complete description of a parameter's usage if the value of the parameter is null, thus providing built-in documentation for the programmer using the method. You may not want to go this far, but it's worth creating your own exception classes so that you can add custom information to your exceptions. Once again, I discuss this in detail later in the chapter.
Document every exception that your component explicitly throws. If you look at the .NET Framework documentation, you can see that nearly every method's exceptions are documented in this fashion. This practice helps developers to understand your component and its methods. You should take special care to document any custom exceptions that can be thrown.
Always provide an alternative code path or method in your component that allows a developer to check for a condition rather than rely on an exception being thrown. For instance, in a component that reads from a disk file, have a separate method that allows the developer to check whether the end of the file has been reached. This method will be in addition to any method that throws an EOF exception at the end of the file. This gives developers using your component more choices for handling certain situations. More choices are good in this context because many developers have fixed views about how components should work and fixed patterns in their use of components.
Make sure that you perform adequate cleanup before you throw an exception. The developer receiving the exception is entitled to assume that there are no side effects when a method throws an exception.
Don't swallow an exception unless you understand that exception in detail and have dealt with any side effects that the exception might have on the code in the call stack above your method. For example, contrary to what the documentation states, the System.IO.File.Exists method swallows all exceptions and always returns either true or false. This means that if your code sees this method returning false, you don't know whether it was because the file didn't exist or because there was some other problem, for instance a security exception. Don't follow Microsoft's example! Instead, be kind to developers calling your methods and avoid hiding what's really happening.
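
Two of the points above, adding specific information to an exception and offering a way to check a condition without relying on an exception, can be illustrated together. The sketch below is an assumption-laden example and not a prescribed design: InvalidDiscountException, the Pricing module, and the discount limits are all invented for illustration.

```vbnet
Imports System

' Hypothetical custom exception that tells the caller which parameter
' was wrong and what the valid range is.
Public Class InvalidDiscountException
    Inherits ArgumentOutOfRangeException

    Public ReadOnly MinAllowed As Decimal
    Public ReadOnly MaxAllowed As Decimal

    Public Sub New(ByVal paramName As String, ByVal actualValue As Decimal, ByVal lowest As Decimal, ByVal highest As Decimal)
        MyBase.New(paramName, actualValue, String.Format("Discount must be between {0} and {1} percent.", lowest, highest))
        MinAllowed = lowest
        MaxAllowed = highest
    End Sub
End Class

Public Module Pricing
    Public Const MinDiscount As Decimal = 0D
    Public Const MaxDiscount As Decimal = 50D

    ' Alternative code path: callers can test the condition up front
    ' instead of waiting for an exception.
    Public Function IsValidDiscount(ByVal percent As Decimal) As Boolean
        Return percent >= MinDiscount AndAlso percent <= MaxDiscount
    End Function

    ' Validates before doing any work, so there are no side effects
    ' to clean up when the exception is thrown.
    Public Function ApplyDiscount(ByVal price As Decimal, ByVal percent As Decimal) As Decimal
        If Not IsValidDiscount(percent) Then
            Throw New InvalidDiscountException("percent", percent, MinDiscount, MaxDiscount)
        End If
        Return price * (100D - percent) / 100D
    End Function
End Module
```

A caller who prefers not to handle exceptions can call Pricing.IsValidDiscount first. A caller who does catch InvalidDiscountException can read ParamName, ActualValue, MinAllowed, and MaxAllowed to find out exactly what went wrong.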

Once again, the "Useful Diagnostic Information" section at the beginning of Chapter 6 can give you some ideas about the sort of information that other developers need to support and work with your application.

Logging Exceptions

In a production environment, unhandled exceptions or any exceptions that reach an application's boundary should always be logged if you want to understand your application's problems. This exception information can then be viewed by operators and developers, and can be analyzed individually or as a whole in order to gain a clearer picture of your application's behavior over time.

Later in this chapter I go into detail on which exceptions should be logged and how to log them, but in this section I want to explain some of the possible locations for storing these exceptions. There are at least three possible locations where exceptions can be stored for later analysis:

The Windows event log: The Windows event log is a proven, reliable place to store information, although it's not available on Windows 98. It has facilities for log file management, and .NET provides Framework classes such as System.Diagnostics.EventLog to make the event log easy to update and maintain. Because the operating system and the CLR also use the event log, it's easy to reconcile system and application events. The only major problem is that it can be difficult to combine the event logs from multiple machines, although Application Center Server from Microsoft has this facility.
A SQL Server database: The major benefit of using a central database for logging exceptions is that you can store all of the exception information for all of your applications in a single location, and you can then analyze it using standard SQL stored procedures. The drawback is that you're introducing an extra point of failure, in that the database might not be available. To avoid losing exception information when using this technique, you need to use the Windows event log when the database isn't available, but this entails monitoring two specific locations for exception information.
A custom log: This is rarely a better choice than the first two options, because you need to allow concurrent access to the log file, you must have a process that manages the log file size, and you need to develop tools to view and analyze the exception information.
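
The fallback arrangement described for the database option, where exceptions go to the Windows event log when the database isn't available, might be structured along the following lines. This is a sketch under stated assumptions: FallbackExceptionLogger is a hypothetical class, and the two stores are supplied as delegates so that the example stays self-contained. In practice the primary delegate might insert a row into a logging table and the fallback delegate might wrap a call to System.Diagnostics.EventLog.WriteEntry.

```vbnet
Imports System

' Hypothetical logger that writes to a primary store and uses a
' second store only when the primary one fails.
Public Class FallbackExceptionLogger
    Private ReadOnly _primary As Action(Of String)
    Private ReadOnly _fallback As Action(Of String)

    Public Sub New(ByVal primary As Action(Of String), ByVal fallback As Action(Of String))
        _primary = primary
        _fallback = fallback
    End Sub

    Public Sub Log(ByVal ex As Exception)
        ' Timestamp plus the full exception text, including the stack trace.
        Dim entry As String = String.Format("{0:u} {1}", DateTime.UtcNow, ex.ToString())
        Try
            _primary.Invoke(entry)
        Catch primaryFailure As Exception
            ' Keep the original problem and note why the primary store failed,
            ' so that neither piece of information is lost.
            _fallback.Invoke(entry & Environment.NewLine & "Primary log unavailable: " & primaryFailure.Message)
        End Try
    End Sub
End Class
```

Catching the general Exception type is a deliberate choice here, because a logger of last resort should not let a storage failure hide the error it was asked to record. An exception thrown by the fallback store itself still propagates to the caller.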

Now that you have a fair taste of the requirements that any error-handling scheme has to meet, I can discuss how errors and exceptions are handled in VB .NET.