Miscellaneous Debugging Techniques | The Windows 2000 Device Driver Book: A Guide for Programmers (2nd Edition)

Often the main problem in correcting driver bugs is just getting enough information to make an accurate diagnosis. This section presents a variety of techniques that may help.

Leaving Debug Code in the Driver

In general, it is a good idea to leave debugging code in place, even after the driver is ready for release. That way, it can be reused if the driver must be modified at some later date. Conditional compilation makes this easy.

The BUILD utility defines a compile-time symbol called DBG that can be used to conditionally add debugging code to a driver. In the checked BUILD environment, DBG has a value of 1; in the free environment, it has a value of 0. Several of the macros described below use this symbol to suppress the generation of extraneous debugging code in free versions of drivers. When adding debugging code to a driver, it should be wrapped in #if DBG and #endif directives.
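As a minimal sketch of this wrapping, the fragment below compiles as an ordinary user-mode program. ValidateLength and g_rejectedRequests are hypothetical names invented for illustration, and DBG is defined locally only when the build environment has not already supplied it. In a driver, the diagnostic output would go through a debug print routine rather than fprintf.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: BUILD normally defines DBG (1 = checked, 0 = free).
 * It is defined here only so the sketch compiles on its own. */
#ifndef DBG
#define DBG 1
#endif

#if DBG
/* Debug-only bookkeeping; this variable does not exist in a free build. */
unsigned long g_rejectedRequests = 0;
#endif

/* Hypothetical helper: returns 0 if the length is acceptable, -1 otherwise. */
int ValidateLength(long length)
{
    if (length < 0) {
#if DBG
        /* Extra diagnostics that add no code to the free build. */
        g_rejectedRequests++;
        fprintf(stderr, "ValidateLength: bad length %ld\n", length);
#endif
        return -1;
    }
    return 0;
}
```

The return value is identical in both builds; only the bookkeeping and the message depend on DBG.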

Catching Incorrect Assumptions

As in real life, making unfounded assumptions in kernel-mode drivers is a dangerous practice. For example, assuming that some function argument will always be non-NULL, or that a piece of code is only called at a specific IRQL, can lead to disaster if these expectations are not met.

To catch unforeseen conditions that could lead to driver failure, two things must be done. First, the explicit assumptions made by code must be documented. Second, the assumptions must be verified at runtime. The ASSERT and ASSERTMSG macros help with both these tasks. They have the following syntax:

 ASSERT( Expression );
 ASSERTMSG( Message, Expression );

If Expression evaluates to FALSE, ASSERT writes a message to WinDbg's command window. The message contains the source code of the failing expression, plus the filename and line number of where the ASSERT macro was called. It then provides the option of taking a breakpoint at the point of the ASSERT, ignoring the assertion failure, or terminating the process or thread in which the assertion occurred.

ASSERTMSG exhibits the same behavior, except that it includes the text of the Message argument with its output. The Message argument is just a simple string. Unlike the debug print functions described earlier, ASSERTMSG does not allow printf-style substitutions.

It should be noted that both assertion macros compile conditionally and disappear altogether in free builds of the driver. This means it is a very bad idea to put any executable code in the Expression argument.
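The hazard can be illustrated with a small user-mode sketch. SKETCH_ASSERT is a hypothetical stand-in, not a DDK macro; it is defined the way an assertion macro behaves in a free build, where the expression is discarded and never evaluated.

```c
#include <assert.h>

/* Stand-in that mimics an assertion macro in a free build:
 * the expression is thrown away without being evaluated.
 * SKETCH_ASSERT is a hypothetical name, not a DDK macro. */
#define SKETCH_ASSERT(exp) ((void)0)

/* WRONG: the increment lives inside the assertion, so it vanishes
 * together with the assertion in a free build. */
int CountRequestWrong(int pending)
{
    SKETCH_ASSERT(++pending > 0);
    return pending;
}

/* RIGHT: do the work unconditionally, then assert on the result. */
int CountRequestRight(int pending)
{
    ++pending;
    SKETCH_ASSERT(pending > 0);
    return pending;
}
```

Called with 0, the first function returns 0 because the increment was compiled away, while the second returns 1. A driver written in the first style would behave differently in checked and free builds.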

Also, the underlying function used by these macros, RtlAssert, is a no-op in the free version of Windows 2000 itself. So, to see any assertion failures, a checked build of a driver must be run under a checked version of Windows 2000.

Finally, a warning is in order. The checked build of Windows 2000 crashes with a KMODE_EXCEPTION_NOT_HANDLED error if an assertion fails and the Kernel's debug client is not enabled. If the debug client is enabled, but there is no debugger on the other end of the serial line, the target machine simply hangs if an assertion fails. Recovery can be attempted by starting WinDbg on the host machine, but the text of the assertion that failed is lost.

Using Bugcheck Callbacks

A bugcheck callback is an optional driver routine that is called by the kernel when the system begins to crash. These routines provide a convenient way to capture debugging information at the time of crash. They can also be used to place hardware in a known state before the system goes away. They work as follows:

1. In DriverEntry, use KeInitializeCallbackRecord to set up a KBUGCHECK_CALLBACK_RECORD structure. The space for this opaque structure must be nonpaged and must be left untouched until it is released with a call to KeDeregisterBugCheckCallback.
2. Also in DriverEntry, call KeRegisterBugCheckCallback to request notification when a bugcheck occurs. The arguments to this function include the bugcheck callback record, the address of a callback routine, the address and size of the driver-defined crash buffer, and a string that is used to identify this driver's crash buffer. As with the bugcheck-callback record, memory for the driver's crash buffer must be nonpaged and left untouched until the driver calls KeDeregisterBugCheckCallback.
3. Call KeDeregisterBugCheckCallback in a driver's Unload routine to disconnect from the bugcheck notification mechanism.
4. If a bugcheck occurs, the system calls the driver's bugcheck-callback routine and passes it the address and size of the driver's crash buffer. The job of the Callback routine is to fill the crash buffer with any information that would not otherwise end up in the dump file (like the contents of device registers).
5. When analyzing a crash dump with WinDbg, use the !bugdump command to view the contents of the crash buffer.

There are some restrictions on what a bugcheck callback is allowed to do. When it runs, the Callback routine cannot allocate any system resources (like memory). It also cannot use spinlocks or any other synchronization mechanism. It is allowed to call kernel routines that don't violate these restrictions, as well as the HAL functions that access device registers.
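The body of such a callback might look like the following sketch. It is written so that it compiles in user mode: the ULONG and PVOID typedefs are stand-ins for the DDK types, and CRASH_INFO, SketchReadRegister, and SketchBugCheckCallback are hypothetical names. Registration through KeInitializeCallbackRecord and KeRegisterBugCheckCallback is not shown. The routine takes the buffer address and length described above, allocates nothing, acquires no locks, and only copies data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-mode stand-ins for DDK types so the sketch compiles by itself. */
typedef uint32_t ULONG;
typedef void    *PVOID;

/* Hypothetical crash-buffer layout chosen by the driver. */
typedef struct _CRASH_INFO {
    ULONG signature;     /* marks the buffer as filled in */
    ULONG statusReg;
    ULONG controlReg;
} CRASH_INFO;

/* Hypothetical register snapshot. A real driver would read the hardware
 * through HAL routines such as READ_PORT_ULONG or READ_REGISTER_ULONG. */
ULONG SketchReadRegister(int index)
{
    static const ULONG fakeRegisters[2] = { 0x00000080u, 0x0000001Fu };
    return fakeRegisters[index];
}

/* Shaped like a bugcheck callback: receives the crash buffer and its size. */
void SketchBugCheckCallback(PVOID Buffer, ULONG Length)
{
    CRASH_INFO *info = (CRASH_INFO *)Buffer;

    if (info == NULL || Length < sizeof(CRASH_INFO))
        return;                      /* never write past the buffer */

    info->statusReg  = SketchReadRegister(0);
    info->controlReg = SketchReadRegister(1);
    info->signature  = 0x48535243u;  /* arbitrary marker, written last */
}
```

Writing the signature last is a design choice in this sketch: when the dump is examined, a missing marker suggests the callback never finished.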

Catching Memory Leaks

A memory leak is one of the harder kinds of driver pathology. Drivers that allocate pool space and then forget to release it may just degrade system performance over time, or they can lead to actual system crashes. Using the Windows 2000 built-in pool-tagging mechanism can help determine if a driver leaks memory. Here is how it works.

1. Replace calls to ExAllocatePool with ExAllocatePoolWithTag calls. The extra four-byte tag argument to this function is used to mark the block of memory allocated by the driver.
2. Run the driver under the checked build of Windows 2000. Keeping track of pool pages is an expensive activity, so it only works under the checked version of the OS. Optionally, the GFLAGS utility, supplied with the Platform SDK, can be used to enable the feature for the retail version of Windows 2000.
3. When analyzing a crash, or when a driver stops at a breakpoint, use the !poolused or !poolfind commands in WinDbg to examine the state of the pool areas. These commands sort the pool areas by tag value and display various memory statistics for each tag.

One easy way to use pool tagging is to replace the ExAllocatePool function with ExAllocatePoolWithTag inside of conditional compilation directives. This way, tagging can be enabled and disabled with little effort. Even better, a driver macro can be used for all pool allocations, with conditional compilation selecting the macro's definition. (A preprocessor directive cannot appear inside a macro body, so the #if must surround the #define.) For example:

 #if DBG==1
 #define ALLOCATE_POOL( type, size ) \
           ExAllocatePoolWithTag( (type), (size), 'DCBA' )
 #else
 #define ALLOCATE_POOL( type, size ) \
           ExAllocatePool( (type), (size) )
 #endif

The tag argument to ExAllocatePoolWithTag consists of four case-sensitive ANSI characters. Because of the byte-ordering phenomena of little-endian machines, the tag must be specified as characters in reverse order. Hence, the DCBA tag in the example becomes ABCD in the pool tag display.
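The reversal can be checked with a short user-mode sketch. It avoids the multi-character constant itself, whose value is implementation-defined, and instead builds the same 32-bit value with shifts. The assumption is that this mirrors how the Microsoft compiler packs a four-character constant, with the first character written landing in the most significant byte. SKETCH_TAG and SketchTagToText are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Builds the 32-bit value that the text writes as 'DCBA'.
 * Assumption: the first character ends up in the most significant byte. */
#define SKETCH_TAG(c1, c2, c3, c4)                         \
    (((uint32_t)(c1) << 24) | ((uint32_t)(c2) << 16) |     \
     ((uint32_t)(c3) << 8)  |  (uint32_t)(c4))

/* Renders a tag in the order a little-endian machine stores its bytes:
 * least significant byte first. */
void SketchTagToText(uint32_t tag, char text[5])
{
    int i;
    for (i = 0; i < 4; i++)
        text[i] = (char)((tag >> (8 * i)) & 0xFF);
    text[4] = '\0';
}
```

Passing SKETCH_TAG('D', 'C', 'B', 'A') to SketchTagToText yields the string "ABCD", which matches the order shown in the pool tag display.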

In this example, the same tag value is used for all the allocations made by a single driver. For some situations, it may be appropriate to use different tag values for different kinds of data structures, or for allocations made by different parts of a driver. These kinds of strategies may help identify memory leaks caused by a driver.

The POOLMON utility that ships with the DDK allows dynamic observation of pool tags without the need for WinDbg. This command-line utility runs on the target machine and it outputs a continuously updated display of the pool tags. The tool is also supplied with the Windows 2000 Resource Kit.

Using Counters, Bits, and Buffers

There is no question that interactive driver debugging is a wonderful feature. Unfortunately, some bugs are time-dependent, and they disappear when breakpoints or single-stepping are used. This section presents several techniques that may help under these circumstances.

SANITY COUNTERS

Pairs of counters can be used to perform several kinds of sanity checks in a driver. For example, they might count how many IRPs arrive at a driver and how many are sent to IoCompleteRequest. Or, in a higher-level driver, the number of IRPs allocated versus the number released could be tracked. Checks like these can help find subtle inconsistencies in the behavior of a driver. The only disadvantage of sanity counters is that they do not necessarily pinpoint the location of the problem.

Implementing a counter is very simple. Declare a ULONG variable within the Device Extension for each counter, and then add appropriate code to increment the counters throughout the driver. As with all debugging support, it is a good idea to wrap sanity-counter code in conditional compilation statements that depend on the DBG symbol.
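A minimal user-mode sketch of one counter pair follows. COUNTER_EXTENSION, COUNT_EVENT, IRPS_OUTSTANDING, SketchDispatch, and SketchComplete are all hypothetical names; the two functions merely stand in for a Dispatch routine and for the code path that calls IoCompleteRequest. DBG is defined locally only if the build environment has not supplied it.

```c
#include <assert.h>

#ifndef DBG
#define DBG 1   /* assumption: normally supplied by BUILD */
#endif

typedef unsigned long ULONG;

/* Hypothetical Device Extension holding one pair of sanity counters. */
typedef struct _COUNTER_EXTENSION {
#if DBG
    ULONG irpsReceived;    /* bumped on entry to a Dispatch routine */
    ULONG irpsCompleted;   /* bumped just before IoCompleteRequest */
#endif
    int   placeholder;     /* stands in for the driver's real fields */
} COUNTER_EXTENSION;

#if DBG
#define COUNT_EVENT(pDE, field)  ((pDE)->field++)
#define IRPS_OUTSTANDING(pDE)    ((pDE)->irpsReceived - (pDE)->irpsCompleted)
#else
#define COUNT_EVENT(pDE, field)  ((void)0)
#define IRPS_OUTSTANDING(pDE)    (0UL)
#endif

/* Stand-ins for a Dispatch routine and a completion path. */
void SketchDispatch(COUNTER_EXTENSION *pDE) { COUNT_EVENT(pDE, irpsReceived); }
void SketchComplete(COUNTER_EXTENSION *pDE) { COUNT_EVENT(pDE, irpsCompleted); }
```

After three simulated arrivals and two completions, IRPS_OUTSTANDING reports 1. The plain increment used here is not atomic; a driver that updates a counter from several processors at once would probably want an interlocked increment instead.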

A somewhat ambitious plan would be to write a WinDbg extension command to display all of a driver's counters. As a simple alternative, a driver can force a bugcheck after it has collected enough data and simply use a bugcheck callback to save the counter values.

EVENT BITS

Another useful technique is to keep a collection of bit flags that track the occurrence of significant events in a driver. Each bit represents one specific event, and when that event happens, a driver sets the corresponding bit. Where sanity counters track global driver behavior, event bits provide information about what parts of code have executed.

One of the design decisions for event bits is whether to clear the event variable during DriverEntry, during the AddDevice or Dispatch routines, or when processing begins on each new IRP. Each of these options provides useful information in different situations.
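The sketch below shows one possible arrangement, using the per-IRP clearing policy. The bit assignments, EVENT_EXTENSION, the three macros, and SketchProcessIrp are hypothetical; the function simulates an IRP whose interrupt arrives but whose DPC never runs.

```c
#include <assert.h>

typedef unsigned long ULONG;

/* Hypothetical event assignments: one bit per interesting code path. */
#define EVT_DISPATCH_ENTERED  0x00000001UL
#define EVT_START_IO          0x00000002UL
#define EVT_ISR_FIRED         0x00000004UL
#define EVT_DPC_RAN           0x00000008UL

typedef struct _EVENT_EXTENSION {
    ULONG eventBits;
} EVENT_EXTENSION;

#define SET_EVENT(pDE, bit)     ((pDE)->eventBits |= (bit))
#define EVENT_SEEN(pDE, bit)    (((pDE)->eventBits & (bit)) != 0)
#define CLEAR_EVENTS(pDE)       ((pDE)->eventBits = 0)

/* Simulates one IRP that reaches the ISR but whose DPC never runs. */
void SketchProcessIrp(EVENT_EXTENSION *pDE)
{
    CLEAR_EVENTS(pDE);                      /* per-IRP clearing policy */
    SET_EVENT(pDE, EVT_DISPATCH_ENTERED);
    SET_EVENT(pDE, EVT_START_IO);
    SET_EVENT(pDE, EVT_ISR_FIRED);
    /* EVT_DPC_RAN is deliberately not set in this simulation. */
}
```

Inspecting eventBits afterward shows the value 0x7, which points at the stretch of code between the ISR and the DPC as the place to look.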

TRACE BUFFERS

The problem with event bits and counters is that they do not provide information about the sequence of execution of code. As an alternative, a simple tracing mechanism can be added that makes entries in a special buffer as different parts of a driver execute.

Trace buffers can be very useful for tracking down unexpected interactions in asynchronous or full-duplex drivers. On the downside, this extra information is not free. Trace buffers use more CPU time than counters or event bits, and the added overhead can change the timing enough to hide a time-sensitive bug.

Implementing a trace buffer mechanism takes more work than the other techniques already presented. Here are the basic steps to follow.

1. Add trace buffer data structures to the driver. Normally, the structures should appear in the Device Extension so that tracing can occur on a device-by-device basis. Occasionally, there may be merit in providing a global buffer that traces the entire driver.
2. Define a macro to make entries in the trace buffer. As with other debug code, it is a good idea to bracket the trace macro with conditional compilation statements.
3. Insert calls to the trace macro at various strategic places in the driver.
4. Write a debugger extension to dump the contents of the trace buffer.

The trace buffer is just an array coupled with a counter that keeps track of the next free slot. The following code fragment illustrates the structure of the basic trace buffer:

 typedef struct _DEVICE_EXTENSION {
      :
 #if DBG==1
      ULONG traceCount;
      ULONG traceBuffer[ TRACE_BUFFER_SIZE ];
 #endif
      :
 } DEVICE_EXTENSION, *PDEVICE_EXTENSION;

Again, depending upon the data being sought, the traceCount field can be initialized once in the DriverEntry routine, or each time an IRP arrives.

Adding entries to the buffer is just a matter of storing an item in the array and incrementing the counter. The code fragment below demonstrates how to implement a basic trace macro.

 #if DBG==1
 /* do { } while (0) makes the macro behave as a single statement. */
 #define DRVTRACE( pDE, Tag )                        \
     do {                                            \
         if ((pDE)->traceCount >= TRACE_BUFFER_SIZE) \
             (pDE)->traceCount = 0;                  \
         (pDE)->traceBuffer[ (pDE)->traceCount++ ] = \
             (ULONG) (Tag);                          \
     } while (0)
 #else
 #define DRVTRACE( pDE, Tag )
 #endif
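Once the buffer wraps, its entries are no longer in chronological order, so whatever dumps it has to start at the oldest slot. The user-mode sketch below shows that walk. TRACE_EXTENSION and SketchDumpTrace are hypothetical names, the buffer is kept tiny to make the wraparound visible, and the routine assumes the buffer started out zeroed and that trace tags are never zero.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long ULONG;
#define TRACE_BUFFER_SIZE 4      /* tiny on purpose, to show wraparound */

typedef struct _TRACE_EXTENSION {
    ULONG traceCount;                        /* next slot to write */
    ULONG traceBuffer[TRACE_BUFFER_SIZE];    /* zero means "never written" */
} TRACE_EXTENSION;

/* Same wrap-then-store logic as the DRVTRACE macro in the text. */
#define DRVTRACE(pDE, Tag)                                      \
    do {                                                        \
        if ((pDE)->traceCount >= TRACE_BUFFER_SIZE)             \
            (pDE)->traceCount = 0;                              \
        (pDE)->traceBuffer[(pDE)->traceCount++] = (ULONG)(Tag); \
    } while (0)

/* Copies the entries to out[] from oldest to newest and returns how many
 * were copied. Assumes a zeroed buffer at start and nonzero tags. */
size_t SketchDumpTrace(const TRACE_EXTENSION *pDE, ULONG out[TRACE_BUFFER_SIZE])
{
    size_t copied = 0;
    size_t start = pDE->traceCount % TRACE_BUFFER_SIZE;  /* oldest slot */
    size_t i;

    for (i = 0; i < TRACE_BUFFER_SIZE; i++) {
        ULONG entry = pDE->traceBuffer[(start + i) % TRACE_BUFFER_SIZE];
        if (entry != 0)
            out[copied++] = entry;
    }
    return copied;
}
```

Tracing the tags 1 through 6 into this four-entry buffer leaves it holding 5, 6, 3, 4 with traceCount equal to 2, and the dump routine returns them as 3, 4, 5, 6. A debugger extension could apply the same walk to the memory it reads from the target.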

Notice that this implementation ignores all the synchronization issues that can arise when DRVTRACE is used from multiple IRQL levels (potentially on multiple CPUs). Since the whole purpose of using trace buffers is to catch errors that are sensitive to timing, putting synchronization mechanisms into DRVTRACE would probably render it useless.

One solution is to call DRVTRACE only from places in a driver where synchronization will not be a problem. For example, when calling DRVTRACE from DPC routines, synchronization is inherently handled as part of the larger structure of the driver itself. Similarly, if it is called from an ISR and a SyncCritSection routine, synchronization is already guaranteed. If these restrictions cannot be met, explicit synchronization must be added to DRVTRACE.
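One lightweight form of explicit synchronization, offered here as an alternative rather than as the technique the text has in mind, is to claim each slot with an atomic increment instead of taking a lock. The user-mode sketch below uses C11 atomic_fetch_add as a stand-in for a kernel interlocked increment (which returns the incremented value rather than the previous one, so a driver version would subtract one). ATOMIC_TRACE and SketchAtomicTrace are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

#define TRACE_BUFFER_SIZE 8u     /* a power of two keeps wraparound consistent */

typedef struct _ATOMIC_TRACE {
    atomic_uint  traceCount;                      /* total entries ever made */
    unsigned int traceBuffer[TRACE_BUFFER_SIZE];
} ATOMIC_TRACE;

/* Each caller atomically claims a distinct slot number, then fills it.
 * atomic_fetch_add returns the value held before the addition. */
void SketchAtomicTrace(ATOMIC_TRACE *t, unsigned int tag)
{
    unsigned int slot = atomic_fetch_add(&t->traceCount, 1u);
    t->traceBuffer[slot % TRACE_BUFFER_SIZE] = tag;
}
```

Two callers can never claim the same slot, and the cost is a single atomic operation per entry. The sketch does not guarantee that a claimed slot has been filled by the time the buffer is read, so an entry in progress at the moment of a crash may still hold an older tag.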
