Often the main problem in correcting driver bugs is just getting enough information to make an accurate diagnosis. This section presents a variety of techniques that may help.

Leaving Debugged Code in the Driver

In general, it is a good idea to leave debugging code in place, even after the driver is ready for release. That way, it can be reused if the driver must be modified at some later date. Conditional compilation makes this easy. The BUILD utility defines a compile-time symbol called DBG that can be used to conditionally add debugging code to a driver. In the checked BUILD environment, DBG has a value of 1; in the free environment, it has a value of 0. Several of the macros described below use this symbol to suppress the generation of extraneous debugging code in free versions of drivers. Debugging code added to a driver should be wrapped in #if DBG and #endif directives.

Catching Incorrect Assumptions

As in real life, making unfounded assumptions in kernel-mode drivers is a dangerous practice. For example, assuming that some function argument will always be non-NULL, or that a piece of code is only called at a specific IRQL, can lead to disaster if these expectations are not met. To catch unforeseen conditions that could lead to driver failure, two things must be done. First, the explicit assumptions made by code must be documented. Second, the assumptions must be verified at runtime. The ASSERT and ASSERTMSG macros help with both these tasks. They have the following syntax:

    ASSERT( Expression );
    ASSERTMSG( Message, Expression );

If Expression evaluates to FALSE, ASSERT writes a message to WinDbg's command window. The message contains the source code of the failing expression, plus the filename and line number where the ASSERT macro was called. It then provides the option of taking a breakpoint at the point of the ASSERT, ignoring the assertion failure, or terminating the process or thread in which the assertion occurred.
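The pattern of documenting an assumption and checking it only in checked builds can be sketched in portable C. This is a user-mode illustration, not the DDK's implementation: SKETCH_ASSERT, ReportAssertFailure, SumBuffer, and g_assertFailures are hypothetical names, DBG is defaulted to 1 only so the fragment compiles on its own, and the failure handler merely logs and counts instead of offering a debugger breakpoint.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumption: in a real driver the BUILD utility supplies DBG.
   It is defaulted here only so this sketch compiles stand-alone. */
#ifndef DBG
#define DBG 1
#endif

/* Hypothetical failure counter, kept so the behavior is observable. */
static int g_assertFailures = 0;

/* Hypothetical stand-in for the debugger message: logs the failing
   expression with its file and line, then counts the failure. */
static void ReportAssertFailure(const char *expr, const char *file, int line)
{
    g_assertFailures++;
    fprintf(stderr, "Assertion failed: %s (%s, line %d)\n", expr, file, line);
}

#if DBG
#define SKETCH_ASSERT(expr) \
    ((expr) ? (void)0 : ReportAssertFailure(#expr, __FILE__, __LINE__))
#else
/* Free build: the check compiles away entirely. */
#define SKETCH_ASSERT(expr) ((void)0)
#endif

/* Hypothetical routine whose assumptions are stated as assertions. */
static int SumBuffer(const unsigned char *buffer, size_t length,
                     unsigned long *sum)
{
    size_t i;

    /* Documented assumptions, verified at runtime in checked builds. */
    SKETCH_ASSERT(buffer != NULL);
    SKETCH_ASSERT(sum != NULL);

    /* Release-safe validation still runs in both build types. */
    if (buffer == NULL || sum == NULL)
        return -1;

    *sum = 0;
    for (i = 0; i < length; i++)
        *sum += buffer[i];
    return 0;
}
```

The assertions state what the routine expects, while the ordinary NULL check keeps the free build safe once the assertions have compiled away.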
ASSERTMSG exhibits the same behavior, except that it includes the text of the Message argument with its output. The Message argument is just a simple string. Unlike the debug print functions described earlier, ASSERTMSG does not allow printf-style substitutions.

It should be noted that both assertion macros compile conditionally and disappear altogether in free builds of the driver. This means it is a very bad idea to put any executable code in the Expression argument. Also, the underlying function used by these macros, RtlAssert, is a no-op in the free version of Windows 2000 itself. So, to see any assertion failures, a checked build of a driver must be run under a checked version of Windows 2000.

Finally, a warning is in order. The checked build of Windows 2000 crashes with a KMODE_EXCEPTION_NOT_HANDLED error if an assertion fails and the Kernel's debug client is not enabled. If the debug client is enabled, but there is no debugger on the other end of the serial line, the target machine simply hangs if an assertion fails. Recovery can be attempted by starting WinDbg on the host machine, but the text of the assertion that failed is lost.

Using Bugcheck Callbacks

A bugcheck callback is an optional driver routine that is called by the kernel when the system begins to crash. These routines provide a convenient way to capture debugging information at the time of the crash. They can also be used to place hardware in a known state before the system goes away. They work as follows:
There are some restrictions on what a bugcheck callback is allowed to do. When it runs, the Callback routine cannot allocate any system resources (like memory). It also cannot use spinlocks or any other synchronization mechanism. It is allowed to call kernel routines that don't violate these restrictions, as well as the HAL functions that access device registers.

Catching Memory Leaks

A memory leak is one of the harder driver pathologies to diagnose. Drivers that allocate pool space and then forget to release it may just degrade system performance over time, or they can lead to actual system crashes. Using the Windows 2000 built-in pool-tagging mechanism can help determine if a driver leaks memory. Here is how it works.
One easy way to use pool tagging is to replace the ExAllocatePool function with ExAllocatePoolWithTag inside conditional compilation directives. This way, tagging can be enabled and disabled with little effort. Even better, a driver macro can be used for all pool allocations. Because preprocessor directives cannot appear inside a macro body, the conditional compilation directives select between two definitions of the macro. For example:

    #if DBG==1
    #define ALLOCATE_POOL( type, size ) \
        ExAllocatePoolWithTag( (type), (size), 'DCBA' )
    #else
    #define ALLOCATE_POOL( type, size ) \
        ExAllocatePool( (type), (size) )
    #endif

The tag argument to ExAllocatePoolWithTag consists of four case-sensitive ANSI characters. Because little-endian machines store the least significant byte first, the tag must be specified as characters in reverse order. Hence, the DCBA tag in the example becomes ABCD in the pool tag display.

In this example, the same tag value is used for all the allocations made by a single driver. For some situations, it may be appropriate to use different tag values for different kinds of data structures, or for allocations made by different parts of a driver. These kinds of strategies may help identify memory leaks caused by a driver.

The POOLMON utility that ships with the DDK allows dynamic observation of pool tags without the need for WinDbg. This command-line utility runs on the target machine and outputs a continuously updated display of the pool tags. The tool is also supplied with the Windows 2000 Resource Kit.

Using Counters, Bits, and Buffers

There is no question that interactive driver debugging is a wonderful feature. Unfortunately, some bugs are time-dependent, and they disappear when breakpoints or single-stepping are used. This section presents several techniques that may help under these circumstances.

SANITY COUNTERS

Pairs of counters can be used to perform several kinds of sanity checks in a driver. For example, they might count how many IRPs arrive at a driver and how many are sent to IoCompleteRequest.
Or, in a higher-level driver, the number of IRPs allocated versus the number released could be tracked. Checks like these can help find subtle inconsistencies in the behavior of a driver. The only disadvantage of sanity counters is that they do not necessarily pinpoint the location of the problem.

Implementing a counter is very simple. Declare a ULONG variable within the Device Extension for each counter, and then add appropriate code to increment the counters throughout the driver. As with all debugging support, it is a good idea to wrap sanity-counter code in conditional compilation statements that depend on the DBG symbol.

A somewhat ambitious plan would be to write a WinDbg extension command to display all of a driver's counters. As a simple alternative, a driver can force a bugcheck after it has collected enough data and simply use a bugcheck callback to save the counter values.

EVENT BITS

Another useful technique is to keep a collection of bit flags that track the occurrence of significant events in a driver. Each bit represents one specific event, and when that event happens, a driver sets the corresponding bit. Whereas sanity counters track global driver behavior, event bits provide information about which parts of the code have executed.

One of the design decisions for event bits is whether to clear the event variable during DriverEntry, during the AddDevice or Dispatch routines, or when processing begins on each new IRP. Each of these options provides useful information in different situations.

TRACE BUFFERS

The problem with event bits and counters is that they do not provide information about the sequence in which code executes. As an alternative, a simple tracing mechanism can be added that makes entries in a special buffer as different parts of a driver execute. Trace buffers can be very useful for tracking down unexpected interactions in asynchronous or full-duplex drivers. On the downside, this extra information is not free.
Trace buffers use more CPU time than counters or event bits, and this overhead can itself perturb time-sensitive bugs. Implementing a trace buffer mechanism takes more work than the other techniques already presented. Here are the basic steps to follow.
The trace buffer is just an array coupled with a counter that keeps track of the next free slot. The following code fragment illustrates the structure of the basic trace buffer:

    typedef struct _DEVICE_EXTENSION {
        :
    #if DBG==1
        ULONG traceCount;
        ULONG traceBuffer[ TRACE_BUFFER_SIZE ];
    #endif
        :
    } DEVICE_EXTENSION, *PDEVICE_EXTENSION;

Again, depending upon the data being sought, the traceCount field can be initialized once in the DriverEntry routine, or each time an IRP arrives. Adding entries to the buffer is just a matter of storing an item in the array and incrementing the counter. The code fragment below demonstrates how to implement a basic trace macro. The do/while wrapper makes the macro expand to a single statement, so it is safe to use inside an unbraced if or else.

    #if DBG==1
    #define DRVTRACE( pDE, Tag ) \
        do { \
            if ((pDE)->traceCount >= TRACE_BUFFER_SIZE) \
                (pDE)->traceCount = 0; \
            (pDE)->traceBuffer[ (pDE)->traceCount++ ] = \
                (ULONG) (Tag); \
        } while (0)
    #else
    #define DRVTRACE( pDE, Tag )
    #endif

Notice that this implementation ignores all the synchronization issues that can arise when DRVTRACE is used at multiple IRQLs (potentially on multiple CPUs). Since the whole purpose of using trace buffers is to catch errors that are sensitive to timing, putting synchronization mechanisms into DRVTRACE would probably render it useless.

One solution is to call DRVTRACE only from places in a driver where synchronization will not be a problem. For example, when calling DRVTRACE from DPC routines, synchronization is inherently handled as part of the larger structure of the driver itself. Similarly, if it is called from an ISR and a SyncCritSection routine, synchronization is already guaranteed. If these restrictions cannot be met, explicit synchronization must be added to DRVTRACE.
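Because the buffer wraps, the entries are not stored in chronological order once traceCount has been reset. The following user-mode sketch shows one way the buffer might be read back oldest-first. It rests on several assumptions that are not part of the original design: the buffer is zero-initialized, the tag value 0 is reserved to mean an unused slot, TRACE_BUFFER_SIZE is set to 4 for brevity, ULONG is a stand-in typedef, and TraceSnapshot is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: normally supplied by the BUILD environment. */
#ifndef DBG
#define DBG 1
#endif

#define TRACE_BUFFER_SIZE 4   /* small value chosen for illustration */

typedef uint32_t ULONG;       /* user-mode stand-in for the DDK type */

typedef struct _DEVICE_EXTENSION {
    int otherFields;          /* placeholder for the driver's real fields */
#if DBG
    ULONG traceCount;                      /* next free slot */
    ULONG traceBuffer[TRACE_BUFFER_SIZE];  /* 0 means "unused" here */
#endif
} DEVICE_EXTENSION, *PDEVICE_EXTENSION;

#if DBG
/* Store a tag, wrapping to slot 0 when the buffer is full. */
#define DRVTRACE(pDE, Tag)                                          \
    do {                                                            \
        if ((pDE)->traceCount >= TRACE_BUFFER_SIZE)                 \
            (pDE)->traceCount = 0;                                  \
        (pDE)->traceBuffer[(pDE)->traceCount++] = (ULONG)(Tag);     \
    } while (0)

/* Hypothetical helper: copies the recorded tags into out[] from
   oldest to newest and returns how many were copied. After a wrap,
   the oldest entry sits at the next free slot, so the walk starts
   there and skips slots that still hold the reserved value 0. */
static size_t TraceSnapshot(const DEVICE_EXTENSION *pDE, ULONG *out)
{
    size_t start = pDE->traceCount % TRACE_BUFFER_SIZE;
    size_t n = 0;
    size_t i;

    for (i = 0; i < TRACE_BUFFER_SIZE; i++) {
        ULONG entry = pDE->traceBuffer[(start + i) % TRACE_BUFFER_SIZE];
        if (entry != 0)
            out[n++] = entry;
    }
    return n;
}
#else
#define DRVTRACE(pDE, Tag) ((void)0)
#endif
```

Like DRVTRACE itself, this helper does no locking, so it is only meaningful when nothing else is writing to the buffer, for example when examining a snapshot after the fact.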