Tracing and instrumentation refer to the process of adding code to your application to locate and diagnose errors, identify application status, and measure performance. Instrumentation is about adding the tracing code, and tracing is about directing and
In a way, this process is similar to the use of black boxes in commercial and many private aircraft. Black boxes consist of a Flight Data Recorder (FDR) and a Cockpit Voice Recorder (CVR) that together record a large amount of information about the state of an aircraft and the behavior of its flight crew. Since their implementation, they have been invaluable in identifying and diagnosing nearly all of the problems that have caused airplane incidents and crashes.
You can think of adding tracing code as the equivalent of creating an FDR for your application. This is important because monitoring, supporting, and maintaining a software system,
This chapter looks at useful types of diagnostic information, who is likely to use this information, how to use the diagnostic tools available in .NET, and how to control the data that is generated. It deals in detail with the following subjects:
Types of diagnostic information that are useful to record
Tracing in VB .NET, including recording, listening, and configuration
Tracing with ASP .NET
Using custom performance counters with Performance Monitor
Tracing in ASP .NET applications is covered in Chapter 9, which deals with debugging ASP .NET applications.
Before you look at how to use the tracing tools provided by .NET, you need to understand who will be using the information provided by the tracing code and how they will use that information. The four most common stakeholders are the application's end users, the application's support team, the application's testing team, and the application's development team.
An application's end users need to know about anything that could adversely affect their use of the application. These people need status information from an application that helps them to
Identify application status and failures
Identify application mistakes that might affect business processes
Work around certain application problems
Some applications have the luxury of a dedicated support team, whereas others have to make do with the support available from the general help desk. However support is handled, a
Identify the application's status and behaviors
Diagnose application errors and identify the causes
Fix or work around the straightforward problems
Pass diagnostic information to developers for problems that can't be
Assist end users with performing certain application
Improve an application's availability to its end users
An application development team has to identify the causes of application errors, fix an application when it goes wrong, and
Diagnose application errors and identify the causes
Identify and understand end
Identify application performance bottlenecks and reliability "hot spots"
Improve an application's reliability and availability
Provide further tracing code to isolate and identify
As you can see from the
Failure data : Exceptions, errors, warnings, crashes, and assertion failures
Timing data : Performance, communications, queries, and lengthy operations
Statistical data : Traffic
Debugging data : User actions, program state, and program data
It is the job of a developer to include tracing code to emit all of this data, and it is quite an art to identify exactly which information will be useful for solving a wide variety of potential problems. At the very least, I suggest that your tracing code be capable of recording each of the following situations:
Every program crash and the call stack that led to the crash
Every exception and the call stack that led to the exception
Every assertion failure (I discuss assertions in the
Every "Page not found" (404) error
Every component or page time-out
Timing data for every lengthy software operation
Timing data for every business-critical software operation
Timing data for every database query
Timing data for any "real-time" communication with another application
Status data recording the results of application health checks
Status data reporting any communication problems between
Records of all messages involving communication between components
Traffic volumes, especially any large
Usage patterns, especially any significant changes in usage
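Two of the items above, recording an exception with its call stack and timing a lengthy operation, can be sketched with the standard System.Diagnostics classes. This is only a minimal illustration under stated assumptions: RunMonthEndReport is a hypothetical stand-in for a real operation, the "Failure" and "Timing" category names are made up for the example, and Stopwatch is only available from version 2.0 of the .NET Framework onward. Trace calls are compiled in only when the TRACE constant is defined, which is the usual project default.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Threading

Module TracingSketch

    ' Hypothetical stand-in for a lengthy, business-critical operation.
    Private Sub RunMonthEndReport()
        Thread.Sleep(50)
        Throw New InvalidOperationException("Report data is incomplete")
    End Sub

    Sub Main()
        ' Send trace output to the console so the sketch is easy to try out.
        Trace.Listeners.Add(New TextWriterTraceListener(Console.Out))
        Trace.AutoFlush = True

        Dim timer As Stopwatch = Stopwatch.StartNew()
        Try
            RunMonthEndReport()
        Catch ex As Exception
            ' ex.ToString() includes the exception type, message, and call stack.
            Trace.WriteLine(ex.ToString(), "Failure")
        Finally
            ' Timing data is recorded whether the operation succeeded or failed.
            timer.Stop()
            Trace.WriteLine("RunMonthEndReport took " & timer.ElapsedMilliseconds.ToString() & " ms", "Timing")
        End Try
    End Sub

End Module
```

Writing the timing line in the Finally block means that a failed operation still leaves a duration behind, which is often when that figure matters most.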
Note that having tracing code that records this information doesn't mean that all of the tracing code needs to be activated all of the time. As you'll see later in the section titled "Step 5: Trace Control at Runtime," it's possible to use an application's configuration file at runtime to control which information is actually
Of course, you also need to record information that pinpoints the exact location and type of problem. This might include some
Component
Machine name, IP, and maybe its geographic location if
Type of environment (e.g., dev, integration test, user test, live, or contingency)
Whether support is required and the support group to be contacted
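One way to attach this kind of location information is to prefix every trace message with it. The sketch below is an assumption-laden illustration rather than a prescribed format: FormatWithContext, the bracketed layout, and the "OrderService" and "user test" values are all hypothetical, and only the machine name is read from the runtime (through Environment.MachineName).

```vbnet
Imports System
Imports System.Diagnostics

Module TraceContextSketch

    ' Hypothetical helper: prefixes a message with the machine, environment
    ' type, and component that produced it.
    Function FormatWithContext(ByVal component As String, _
                               ByVal environmentType As String, _
                               ByVal message As String) As String
        Return String.Format("[{0}] [{1}] [{2}] {3}", _
                             Environment.MachineName, environmentType, component, message)
    End Function

    Sub Main()
        ' Send trace output to the console so the sketch is easy to try out.
        Trace.Listeners.Add(New TextWriterTraceListener(Console.Out))
        Trace.AutoFlush = True

        ' Illustrative values only; a real application would supply its own.
        Trace.WriteLine(FormatWithContext("OrderService", "user test", "Database query timed out"))
    End Sub

End Module
```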
The bottom line is that all of this diagnostic information helps you to improve your application's availability through better and faster understanding of problems. In the case of a large distributed application where problems can be especially hard to locate and diagnose, this information can make the difference between your application being viable and not
No matter what tracing tools you use, there are some design decisions that you need to make. Here are a few rules of thumb that have proven to be useful for many applications, even those built before .NET arrived:
Centralize tracing code : Create reusable tracing classes or components that standardize logging information and provide tracing facilities without bothering developers with unnecessary details.
Centralize diagnostic information : Place all diagnostic information in one central location (such as a database) to allow easier monitoring and statistical analysis.
Minimize performance degradation : Use trace control mechanisms to ensure that you don't emit diagnostic information unless it's needed and to
Analyze failures and keep metrics : To improve the service
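The first and third rules of thumb can be combined in a single small class: one central place that every developer calls, with a TraceSwitch deciding whether a message is emitted at all. The following is a rough sketch, not a recommended design. AppTrace, WriteError, WriteVerbose, and the switch name "General" are hypothetical names chosen for the example. On the .NET Framework the level of a TraceSwitch is normally read from the switches section of the application's configuration file; here it is set in code only to keep the example self-contained.

```vbnet
Imports System
Imports System.Diagnostics

' Hypothetical central tracing class; all names are illustrative.
Public NotInheritable Class AppTrace

    Private Shared ReadOnly GeneralSwitch As New TraceSwitch("General", "Controls application-wide tracing")

    Private Sub New()
    End Sub

    ' Exposes the switch level so the demo can set it in code.
    Public Shared Property Level() As TraceLevel
        Get
            Return GeneralSwitch.Level
        End Get
        Set(ByVal value As TraceLevel)
            GeneralSwitch.Level = value
        End Set
    End Property

    ' Emits the message only when the switch allows error-level output.
    Public Shared Sub WriteError(ByVal message As String)
        If GeneralSwitch.TraceError Then Trace.WriteLine(message, "Error")
    End Sub

    ' Emits the message only when the switch is set to Verbose.
    Public Shared Sub WriteVerbose(ByVal message As String)
        If GeneralSwitch.TraceVerbose Then Trace.WriteLine(message, "Verbose")
    End Sub

End Class

Module AppTraceDemo

    Sub Main()
        ' Send trace output to the console so the sketch is easy to try out.
        Trace.Listeners.Add(New TextWriterTraceListener(Console.Out))
        Trace.AutoFlush = True

        AppTrace.Level = TraceLevel.Error
        AppTrace.WriteError("Payment gateway unreachable")   ' emitted at this level
        AppTrace.WriteVerbose("Entering CalculateTotals")    ' suppressed at this level
    End Sub

End Module
```

Note that the switch check here only avoids the cost of writing the message. If building the message text is itself expensive, the caller would need to check the switch before constructing it.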