Tracing


Terminology can mean different things to different people, and the word "tracing" is no different. The word "tracing" is used in this book to mean any code that proactively publishes information for purposes of diagnostics, a practice also commonly referred to as instrumentation. The data points and measurements published are determined up front, while a developer writes code, and tracing is usually activated by some configuration or application settings. Debugging, on the other hand, enables you to step into a program's execution to examine state and information in an interactive fashion. The two technologies are highly complementary.

The fundamental premise underpinning tracing is that developers often know how, when, and why something might go wrong in their program. Often these potential problems end up as either comments or "poor man's tracing" (i.e., Console.WriteLines), neither of which is very effective for diagnosing errors. Console.WriteLines often get removed, commented out, or #ifdefed out before development is complete. Removing these eliminates an invaluable source of pre- and postcondition and invariant checks that tend to come in very handy should the program fail. The tracing APIs provide abstractions so that this type of tracing can be preserved, toggled on and off as needed to eliminate unnecessary runtime costs, and controlled by the developers, testers, users, or administrators of the built piece of software.
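To make the contrast concrete, here is a minimal sketch (with hypothetical names) of upgrading a throwaway Console.WriteLine to a Debug call that stays in the source but costs nothing in release builds:

using System.Diagnostics;

class Checkout
{
    static void Main()
    {
        Process(42.50m);
    }

    static void Process(decimal total)
    {
        // A Console.WriteLine here would likely be deleted before shipping.
        // Debug.WriteLine conveys the same information during development,
        // yet the call is compiled out of non-DEBUG builds automatically.
        Debug.WriteLine("Processing checkout, total = " + total);
        // ... actual work ...
    }
}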

Furthermore, a tracing log captures historical information that can provide great insight into what may have gone wrong and when. Some programs won't fail until hundreds of instructions after the actual bug: data got corrupted, the program missed it, and it made the mistake of continuing to execute. This is sometimes called limping along, and most programs don't survive it. If relying solely on interactive debugging, capturing such historical data could be a very laborious process of single-stepping through a program and watching variables change as it executes (the exception doesn't get thrown until some indeterminate time later). Many developers will respond by adding WriteLines and other such hooks that ultimately get removed after the problem has been diagnosed. What a waste! Generally, the closer to the real bug you capture information, the easier the bug will be to locate and fix later on.

Trying to figure out every place where tracing information would be useful should your application fail is admittedly a bit of a guessing game. Overtracing can pollute your trace log, lengthening the process of analyzing it upon failure; conversely, undertracing can leave you scratching your head if the tracing code missed the crucial data about what failed and why. Aspect Oriented Programming (AOP) is often heralded as solving this problem, at least to some degree. It enables tracing to be inserted into your program in a systematic manner without a developer touching any code in the program, for example "log every method call and its arguments," "log every object allocation," and so forth. Unfortunately, there isn't a widely adopted AOP framework for the .NET Framework as there is for Java. A quick Internet search should turn up some experiments but, as of the time of writing, not much that is appropriate for enterprise use.

In this section, we'll take a tour of the base tracing architecture, starting with the core types you need to work with regardless of whether you're doing an assert, informational trace, or writing to the Event Log. Then we'll move on to some specific abstractions and functions that sit on top of this infrastructure. We will then look at how tracing can be configured in a deployed environment, enabling testers, users, and administrators to turn certain tracing output on and off as needed. Lastly, we'll discuss how you might want to extend the tracing infrastructure to support your own custom tracing.

Why Tracing and Not Exceptions?

Before diving into the details of individual tracing components, let's first address a common question: Why use tracing instead of the exceptions subsystem? The simple answer is that you should use both; they are highly complementary. Once a problem is recognized at runtime, you will likely end up throwing an exception, either on purpose or accidentally, depending on whether your code is able to detect the corruption. If you do know about the problem, you should throw an exception deliberately.

But you still need to create trace information for these types of failures! Consider why. Tracing gives you a history of interactions between components and data inside your application leading up to a failure. If all you have is an exception message and a stack trace, it's likely going to involve a lot of investigative work to reproduce the problem. And you'll likely have lost important information about the incidents leading up to the failure.

Moreover, precisely what happens with an exception is up to your immediate caller. A problem called overcatching is widespread — where callers accidentally (or even intentionally) catch more than they are able to reasonably handle — often leading to an exception being swallowed. That is, the caller ignores the failure and proceeds as normal. This will probably result in a new exception sometime later on, at which point the original reason for the failure has been completely lost. Having trace information handy mitigates the damage.

Exceptions are still, of course, the primary mechanism with which to communicate errors. Throwing an exception to your caller enables them to determine their own fate. Your documentation should tell potential callers under what conditions the exception might be thrown and the recommended actions to take. One such action might be to enable a certain trace listener to gather detailed information about the conditions leading to the failure. A large percentage of the time, the exception will go uncaught, to be logged at the application's main loop and presented to the user, likely followed by termination of the application. For long-running server-side applications, it's probable that the application will restart the main loop or make a serious attempt at recovery.

Throughout all of this, having detailed tracing information is invaluable. It saves you work because you'll have a great starting point when a failure is noticed. And you can even point out worrisome behavior in the absence of failures. Without the right trace information, you're left with laborious hands-on debugging and probably a bit of after-the-fact tracing needing to be injected into your code. Tracing makes this process more palatable.

Tracing Architecture

Nearly all of the tracing features discussed in the pages to come are located in the System.Diagnostics namespace in the System.dll assembly. Any exceptions to this will be noted. This section provides a high-level overview of the core infrastructure that makes tracing work, from the entrypoint tracing API, through the listeners, to the end result of the trace.

A trace typically begins life as a single method call to a source. Consider the following Customer class that has been instrumented to recognize when names are null or empty:

class Customer
{
    static TraceSource customerTraceSource = new TraceSource("Customer");

    public string Name;

    public Customer(string name)
    {
        if (name == null || name == string.Empty)
            throw new ArgumentNullException("name");
        this.Name = name;
        // Trace.WriteLine has no format-string overload (the two-string
        // overload treats the second argument as a category), so format first:
        Trace.WriteLine(string.Format("Instantiated a new customer {0}", this.ToString()));
    }

    public override string ToString()
    {
        // Note: passing this.ToString() as the detail message here would
        // recurse infinitely; report the raw field instead.
        Debug.Assert(this.Name != null && this.Name != string.Empty,
            "Expected customer name to be non-null here",
            "Name: " + this.Name);
        if (Name == null)
            customerTraceSource.TraceEvent(TraceEventType.Warning, 1000,
                "Customer data might be corrupt (null name)");
        return Name;
    }
}

In the Customer constructor, we have a runtime check to ensure that we never instantiate a customer with a null or empty name. Our Trace.WriteLine call simply logs that a new Customer got created. Then in our ToString method we first Debug.Assert that the name is non-null and nonempty; if this assert fails, we know the customer data was corrupted somehow. (It turns out that in this case, our Name field is public, so that's a likely suspect! Real-world scenarios would probably be more difficult to track down.) Because the Assert gets compiled out of non-DEBUG builds, we've also determined that this error is critical enough to log it in release builds, too. We use the TraceSource class to do that.

This demonstration used all three types of trace entrypoints we cover in this chapter: TraceSource, Trace, and Debug. A real program probably wouldn't use all three in this manner: Having three types of traces, some of which are redundant, all in close proximity to each other sure is confusing. Nonetheless, the calls to Trace.WriteLine, Debug.Assert, and customerTraceSource.TraceEvent are all sources of tracing and will be discussed further below.

Tracing Components

A high-level picture of the .NET Framework's tracing architecture is depicted in Figure 12-1.

Figure 12-1: .NET Framework tracing architecture.

When a program decides to publish tracing information, it does so by calling a trace function on a trace source API. This API then formats and passes the supplied information to zero or more listeners (instances of TraceListener) that have been attached to the source. (If you're working with a TraceSource, there is an optional Switch that filters messages before handing data to listeners. It is able to prevent trace events from hitting the rest of the system.) Each listener consults its optional Filter by calling its ShouldTrace method. If no filter has been defined or ShouldTrace returns true, the information gets committed to the trace destination.

Your application can programmatically attach listeners itself, or it can initialize listeners, switches, and filters from the application's configuration file (which happens to be the more flexible approach). Programmers can bypass sources and interact directly with the listener APIs themselves, although this is an atypical practice.
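Here is a minimal sketch of that pipeline wired up programmatically. The source name and levels are hypothetical; TraceSource, SourceSwitch, ConsoleTraceListener, and EventTypeFilter are the stock types:

using System.Diagnostics;

class TracingSetup
{
    static void Main()
    {
        // Create a named source; the name is what configuration keys off.
        TraceSource source = new TraceSource("OrderPipeline");

        // The switch decides which event levels get past the source at all.
        source.Switch = new SourceSwitch("OrderPipelineSwitch");
        source.Switch.Level = SourceLevels.Warning;

        // Attach a listener; its optional filter is consulted per event.
        ConsoleTraceListener listener = new ConsoleTraceListener();
        listener.Filter = new EventTypeFilter(SourceLevels.Error);
        source.Listeners.Add(listener);

        source.TraceEvent(TraceEventType.Error, 1, "Passes switch and filter");
        source.TraceEvent(TraceEventType.Warning, 2, "Passes switch, dropped by filter");
        source.Flush();
    }
}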

Sources of Tracing

Being the gateway into the entire tracing architecture, the source APIs have the intent of abstracting away all of the complexity behind them. This includes the mechanisms used to filter, route, and store information. We'll discuss available source types in detail in a moment, but first here's a quick list of them with a summary of when you would choose one over the other:

  • TraceSource: For most enterprise instrumentation purposes, TraceSource is the most capable, rich, and pluggable. It is a welcome addition in version 2.0. TraceSource additionally enables you to log different categories of messages, each of which may be filtered and stored independently, meaning that you don't have to worry about using different APIs depending on the situation (e.g., Debug versus Trace).

    TraceSource is not application-wide as is the case with the other tracing APIs we discuss later; this enables you to maintain separate individual component logs. Most production applications use TraceSource for its richness. With that said, calls to its APIs remain in default release application configurations (unlike Debug), so you do incur the slight overhead of making tracing method calls in your application regardless of whether any listeners have been configured.

  • Debug: This class is useful for two primary things. First, it offers DEBUG-only Assert functions to check conditions that your program assumes to be true and that, if false, represent a bug. These are useful during development and testing to "prove" the absence of buggy program conditions but are not normally used in actual production applications. Second, for those times you want to print out quick-and-dirty debug information, the Debug class has an array of simple Write and WriteLine methods.

    Debug is a good replacement for Console.WriteLines during development because — unlike Console calls — Debug function calls are configurable and automatically turned off in non-DEBUG builds. You probably wouldn't want these types of checks enabled in your release builds (e.g., as TraceSource and Trace calls are), so this type gives a lightweight way to instrument your application. In fact, this is one reason why you might end up using both the Debug and TraceSource classes inside the same application.

  • Trace: The Trace class offers the same interface as Debug, including Assert and Write and WriteLine methods. The only difference here is that calls to these methods remain in most release builds. Often you will want a number of trace hooks to remain in the production application so that you can use configuration to turn them on and off once it's been deployed. Realistically, the only reason you'd want to use the Trace class over TraceSource is for release-enabled asserts (TraceSource doesn't have an Assert API) and for cases where you just need quick-and-dirty tracing. You don't have to worry about managing instances, for example, because the Trace class is application-wide. For more customizability, flexibility, and power, select TraceSource instead.

An instance of each of these sources can have zero to many TraceListeners attached to it. Remember: Debug and Trace are application-wide (AppDomain-wide really), but TraceSource can have an arbitrary number of instances created. Because TraceSource and Trace calls are present in release builds (using VS's default release configurations), you will probably want to use configuration to suppress output until an administrator or user chooses to turn it on. This can be used to send the developer or support personnel detailed trace logs to aid in debugging when an application fails. We'll see how to do that later in this chapter.
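Configuration files are covered later in the chapter; in the meantime, here is a hedged sketch of the programmatic equivalent: flipping a source's switch at runtime. The source name mirrors the earlier Customer example:

using System.Diagnostics;

class SwitchToggleDemo
{
    static TraceSource customerTraceSource =
        new TraceSource("Customer", SourceLevels.Error);

    static void Main()
    {
        // Suppress all output from this source until support asks for logs.
        customerTraceSource.Switch.Level = SourceLevels.Off;
        customerTraceSource.TraceInformation("Dropped: the switch is Off");

        // Turn everything on without rebuilding the application; a config
        // file accomplishes the same thing declaratively, as shown later.
        customerTraceSource.Switch.Level = SourceLevels.All;
        customerTraceSource.TraceInformation("Published: the switch is All");
    }
}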

Trace Listeners

Each TraceListener is responsible for doing something interesting with the input, usually filtering based on configuration information and writing it to persistent storage. Several listener implementations are included in the .NET Framework: DefaultTraceListener, which writes to debug output (integrating nicely with VS); TextWriterTraceListener, which uses a Stream or TextWriter (with the out-of-the-box console-based ConsoleTraceListener); and EventLogTraceListener, which writes to the Windows Event Log. Each implementation enables simple pluggable filtering with the TraceFilter class.

The abstract base class TraceListener is designed for extensibility, and in many cases people choose to write custom listeners. If a listener doesn't exist for your target storage medium, or if you need to perform some sort of custom processing, you can subclass TraceListener, implement a few methods, and you're good to go. One obvious example of when this might be useful is writing trace information to a custom database table. We show how to do that later on; a minimal sketch of the general shape appears below.
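As a hedged illustration (and not the database example referred to above), the only members a subclass must override are Write and WriteLine. The hypothetical listener below prefixes each entry with a timestamp:

using System;
using System.Diagnostics;

// Hypothetical listener, for illustration only: prefixes entries
// with a timestamp and writes them to the console.
class TimestampedTraceListener : TraceListener
{
    public override void Write(string message)
    {
        Console.Write("[{0:u}] {1}", DateTime.Now, message);
    }

    public override void WriteLine(string message)
    {
        Console.WriteLine("[{0:u}] {1}", DateTime.Now, message);
    }
}

Attaching it is a single line, for example Trace.Listeners.Add(new TimestampedTraceListener());.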

As mentioned above, you can attach a filter to a listener. A TraceFilter is a type that tells the listener whether a given entry should be logged or simply dropped on the floor. Filters can be dynamically altered through configuration files, enabling you to turn on and off entire segments of tracing functionality at runtime. For example, you can switch off a single component's trace information, turn off nonerror messages, and so forth. We'll look at precisely how to do this shortly.
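The stock EventTypeFilter handles simple severity-based filtering. For anything fancier, you can subclass TraceFilter and override its single ShouldTrace method. Here is a hedged sketch that favors a hypothetical "Billing" source:

using System.Diagnostics;

// Hypothetical filter: lets everything from the "Billing" source
// through, but drops sub-Warning events from all other sources.
class BillingBiasedFilter : TraceFilter
{
    public override bool ShouldTrace(TraceEventCache cache, string source,
        TraceEventType eventType, int id, string formatOrMessage,
        object[] args, object data1, object[] data)
    {
        if (source == "Billing")
            return true;
        // Critical=1, Error=2, Warning=4: lower values are more severe.
        return eventType <= TraceEventType.Warning;
    }
}

A listener picks it up via its Filter property, for example listener.Filter = new BillingBiasedFilter();.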

Using the Tracing Sources

Now that you've seen the overview of the tracing infrastructure, let's drill down into the front end of the architecture. I use the term trace source to refer to any of the tracing APIs you use to initiate a trace. This includes the TraceSource, Debug, and Trace classes.

Rich Tracing (TraceSource)

The TraceSource class is the most capable of the three tracing APIs we'll discuss in this chapter. First and foremost, unlike the Debug and Trace types (which offer only static methods), you have to work with actual instances of TraceSource. This is useful when you want to segregate output from individual components. If this is undesirable or more work than you're looking for, just create a single static instance and share it across your entire application. Also note that the TraceSource APIs are conditionally compiled out when the TRACE symbol is not present. Visual Studio enables this symbol on the default release builds, but you can do it by hand via configuration or the C# compiler's /define:TRACE switch.

When instantiating a new TraceSource, you must supply a name. This, like many other tracing components, enables you to cross-reference the instance from within a configuration file. A constructor also exists that takes a SourceLevels value indicating the switch level to use by default. If no level is supplied, the TraceSource defaults to Off, meaning that no tracing information will actually be published. Other valid values include Error, Warning, and Information. Refer to the section below on the SourceSwitch API for more details on how filtering works.

For example:

internal static TraceSource myTraceSource =
    new TraceSource("myTraceSource", SourceLevels.Error);

This snippet of code sets up a new static TraceSource that is configured to log Error-level information.

Once you have a TraceSource instance in hand, there are three primary APIs (with a few overloads each) with which you will publish trace information:

  • TraceEvent is the most general function on the type, enabling you to supply a TraceEventType, unique numeric identifier, and a string message. TraceEventType is used for filtering and gets logged in the underlying store; it represents the category of message. Popular values are Critical, Error, Warning, Information, and Verbose.

  • TraceData is very much like TraceEvent, with the exception that it takes any object as the argument instead of a string message. This is useful for logging arbitrary instance data, and is basically the same thing as doing a null check and calling ToString on the argument.

  • Lastly, TraceInformation is a shortcut to the TraceEvent function, with the TraceEventType.Information value passed for the eventType argument, and a 0 for the id argument. It's a convenient shortcut for general-purpose informative logging.

Here are some brief examples of how you might use these functions:

myTraceSource.TraceEvent(TraceEventType.Error, 10000,
    "Customer name is not valid {0}", customer.Name);
myTraceSource.TraceData(TraceEventType.Error, 10001, customer);
myTraceSource.TraceInformation("Customer successfully loaded from DB");

You'll see shortly how you can use the TraceEventType value to filter traces.

Asserting Program Invariants

The most barebones form of tracing is an assert. It is noted separately because it's available on both the Debug and Trace classes, and honestly because it's so common that it warrants its own section. (If you don't use asserts religiously in your programs, you should.) Asserts have been used in systems programming for a long time — including C++ and Java — and are useful for a few key scenarios:

  • Validating that invariants hold at specific points of execution in your program. An invariant is a Boolean condition that must be true for the logic you've written to remain correct. If somehow this condition were to become false, unpredictable things would happen. Because of the runtime cost of checking these invariants — there are often many — sprinkling asserts throughout your code enables you to keep them during testing against DEBUG builds but to remove them from your production builds. Furthermore, asserts that survive thorough testing without firing become demonstrated properties of your code; a primary goal during testing should be to eliminate all assert failures.

  • As a variant on the first bullet, method pre- and postconditions are a form of invariant. A precondition is an invariant your method assumes and that must be true for it to work correctly. Postconditions are the opposite: they are conditions callers can assume hold after making a call to a method. Most of the time, you will guard these things with standard if-checks that throw exceptions upon failure. However, for private or internal methods, checking these in the same manner as invariants can prove quite useful (see the sketch after this list). If you're comfortable with your level of test coverage, this can have the same benefits as invariants. Be cautious: if the methods are public, or if there's a simple code path that just flows publicly provided data to your method, your code might end up executing with a set of assumptions that aren't true. When in doubt, guard execution with an exception.

  • Determining code coverage for hard-to-hit code paths. During development, you'll often want unconditional asserts to fire just to make sure that you are successfully exercising a code path. For infrastructure components that have lots of layers on top, this can be instrumental during development, but these asserts often get removed after that.
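Here is a minimal sketch of the pre- and postcondition style from the second bullet, using hypothetical names:

using System.Diagnostics;

static class PriceMath
{
    // Internal helper: callers inside this assembly are trusted to have
    // validated inputs, so DEBUG-only precondition checks suffice here.
    internal static decimal ApplyDiscount(decimal price, decimal percent)
    {
        Debug.Assert(price >= 0, "Precondition: price must be non-negative");
        Debug.Assert(percent >= 0 && percent <= 100,
            "Precondition: percent must fall within [0, 100]");

        decimal result = price - (price * percent / 100);

        Debug.Assert(result >= 0 && result <= price,
            "Postcondition: discounting must never raise the price");
        return result;
    }
}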

As hinted at, exceptions are usually not appropriate for these types of situations. There is a runtime cost associated with the check, and because these errors often represent a real program bug, you'll want to find and fix them during testing. With that said, if you're not confident that an invariant is impossible to break, and you know that breaking it might lead to security holes or data corruption, guaranteeing that execution cannot continue (i.e., by throwing an exception) is well worth the cost. As with everything, this is a judgment call you will have to make.

Asserts in the .NET Framework are conditionally compiled into builds based on whether you're using the Debug or Trace class. We discuss the mechanics of this process below. To summarize: using VS's defaults for Release Configuration, Debug asserts will show up only in Debug builds, and Trace asserts will show up in debug and release builds. This means that you need to aggressively test against whatever build the asserts appear in to discover bugs, since your program may continue silently executing in the face of failed asserts after they get conditionally compiled out.
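The mechanics rest on ConditionalAttribute: Debug's methods are marked [Conditional("DEBUG")] and Trace's are marked [Conditional("TRACE")], so the compiler removes the entire call, argument evaluation included, when the corresponding symbol is undefined. A small demonstration (the in-file #define stands in for the project settings):

#define TRACE   // VS defines TRACE for both Debug and Release by default
// #define DEBUG   // VS defines DEBUG only for Debug builds

using System.Diagnostics;

class ConditionalDemo
{
    static void Main()
    {
        int[] items = new int[0];
        // Both conditions below are false. Only the Trace.Assert fires
        // here, because DEBUG is not defined in this compilation:
        Debug.Assert(items.Length > 0, "items must not be empty");
        Trace.Assert(items.Length > 0, "items must not be empty");
    }
}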

Using the Assert APIs

The assert APIs are located in the System.dll assembly, under the System.Diagnostics namespace. There's not much to them. The static classes Debug and Trace both expose a set of Assert function overloads, each of which takes a Boolean condition as the first argument and an optional message to publish upon failure. You may additionally supply a more detailed message describing the situation.

using System;
using System.Diagnostics;

class Program
{
    static void Main()
    {
        F(0, 0);
        F(0, 1);
    }

    static void F(int x, int y)
    {
        Debug.Assert(x == y, "x does not equal y");
        // Do some work...
    }
}

This code snippet is a bit silly. Asserts of course ordinarily perform more useful error checking. But this code at least illustrates the behavior of Debug.Assert. If F's arguments x and y do not have the same value upon entrance into the function, the assert will notice this and trigger the failure logic. We can also supply a more detailed error message as the third argument, for example:

Debug.Assert(x == y, "x does not equal y",
    string.Format(
        "This method assumes that x and y are equal, but they are not (x: {0}, " +
        "y: {1}); continuing execution could result in data corruption.",
        x, y));

A more realistic situation might be checking the x and y fields of a type that has a known invariant of, say, y being greater than x. If the type is supposed to be in charge of ensuring this invariant holds, our code might decide to guard against it in case this invariant somehow gets broken.
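A hedged sketch of that scenario, with a hypothetical Range type whose documented invariant is y > x:

using System;
using System.Diagnostics;

class Range
{
    private int x, y;

    public Range(int x, int y)
    {
        // Public surface area: guard with an exception, not an assert.
        if (y <= x)
            throw new ArgumentException("y must be greater than x");
        this.x = x;
        this.y = y;
    }

    public int Width()
    {
        // The constructor established the invariant; this assert catches
        // any code path that silently broke it afterward.
        Debug.Assert(y > x, "Range invariant broken: y must exceed x",
            string.Format("x: {0}, y: {1}", x, y));
        return y - x;
    }
}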

Assert Failures

When an assert fails, it defers to a general failure-handling routine to react to it. This routine is the Fail method on the Debug or Trace class, which accepts strings to detail the error (the Assert method supplies these automatically). With the default configuration, these types use the DefaultTraceListener, which responds by popping up a UI box that contains the assert's failure message as well as a full stack trace detailing the path leading up to the assert. This message box provides three options: Abort, Retry, or Ignore. An example of this dialog — resulting from executing the above program — is shown in Figure 12-2.

Figure 12-2: An assert dialog.

What happens after an assert fails depends on the answer to this choice:

  • Abort will shut down the application immediately by calling Environment.Exit.

  • Retry launches the installed debugger (or if you're already in it, breaks execution). It enables you to Edit & Continue to fix the bug, or examine the state of variables to try to determine the cause for the failed assert.

  • Ignore will resume execution as if the assert didn't fail. Responding this way can obviously result in ensuing corruption or a plethora of additional asserts. You are essentially choosing to continue executing although invariants are known to be broken.

If you are running code inside the VS debugger, you will also see the text for the assert failure in your Output window, assuming that you have the "Show output from" dropdown set to include Debug. Other debuggers likewise see this output in a format depending on how they choose to represent debug output. This magic happens through the use of the System.Diagnostics.Debugger class.
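For unattended applications, the modal dialog is often unwelcome. DefaultTraceListener exposes an AssertUiEnabled property for exactly this; here is a sketch (the log file name is hypothetical):

using System.Diagnostics;

class AssertDialogSetup
{
    static void Main()
    {
        // Route assert failures to a log file instead of a modal dialog;
        // handy for services and other unattended processes.
        DefaultTraceListener def =
            (DefaultTraceListener)Debug.Listeners["Default"];
        def.AssertUiEnabled = false;
        def.LogFileName = "asserts.log";

        Debug.Assert(false, "This lands in asserts.log, not a dialog");
    }
}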



