Basic Crash Dump Analysis


If OCA fails to identify a resolution or you are unable to submit the crash to OCA (if, for example, you have a crash dump file generated by Windows 2000, which doesn't support OCA), the alternative is to analyze the crash yourself. As mentioned earlier, when you load a crash dump file, Windbg and Kd both execute the same analysis engine used by OCA, and the basic analysis can sometimes pinpoint the problem. So you might be fortunate and have the crash solved by the automatic analysis. If not, there are some straightforward techniques you can try.

This section explains how to perform basic crash analysis steps, followed by tips on leveraging the Driver Verifier (which is introduced in Chapter 7) to catch buggy drivers when they corrupt the system so that a crash dump analysis pinpoints them.

Notmyfault

You can use the Notmyfault utility from http://www.sysinternals.com/windowsinternals to generate the crashes described here. Notmyfault consists of an executable named Notmyfault.exe and a driver named Myfault.sys. When you run the Notmyfault executable, it loads the driver and presents the dialog box shown in Figure 14-7, which allows you to crash the system in various ways or to cause the driver to leak paged pool. The crash types offered represent the ones most commonly seen by Microsoft Product Support Services. Selecting an option and clicking the Do Bug button causes the executable to tell the driver, by using the DeviceIoControl Windows API, which type of bug to trigger. Note that you should execute Notmyfault crashes on a test system or in a virtual machine because there is a small risk that memory it corrupts will be written to disk and result in file or disk corruption.

Figure 14-7. Notmyfault


Note

The names of the Notmyfault executable and driver highlight the fact that user-mode code cannot directly cause the system to crash. The Notmyfault executable can cause a crash only by loading a driver that performs an illegal operation for it in kernel mode.


The most straightforward Notmyfault crash to debug is the one caused by selecting the High IRQL Fault (Kernelmode) option and clicking the Do Bug button. This causes the driver to allocate a page of paged pool, free the pool, raise the IRQL above DPC/dispatch level, and then touch the page it has freed. (See Chapter 3 for more information on IRQLs.) If that doesn't cause a crash, the driver continues by reading memory past the end of the page until it crashes the system by accessing invalid pages. In doing so, the driver performs several illegal operations:

  1. It references memory that doesn't belong to it.

  2. It references paged pool at an IRQL that's DPC/dispatch level or higher, which is illegal because page faults are not permitted when the processor IRQL is DPC/dispatch level or higher.

  3. When it goes past the end of the memory that it had allocated, it tries to reference memory that is potentially invalid.

The reason the first page reference might not cause a crash is that it won't generate a page fault if the page that the driver frees remains in the system working set. (See Chapter 7 for information on the system working set.)
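The reasoning above can be captured in a toy model. This is only a sketch of the rule as the text states it, not of how the memory manager is implemented; the function name and return strings are hypothetical, and the value 2 for DPC/dispatch level is the x86 value.

```python
# Toy model only: the real decision is made by the kernel's page-fault handling.
DISPATCH_LEVEL = 2  # DPC/dispatch IRQL on x86 Windows


def touch_pageable(irql: int, resident: bool) -> str:
    """Return the outcome of touching a pageable page at the given IRQL."""
    if resident:
        # The page is still in the working set, so no page fault occurs
        # and the illegal access goes unnoticed.
        return "no fault"
    if irql >= DISPATCH_LEVEL:
        # Page faults are not permitted at this IRQL: the system crashes.
        return "bug check 0xD1"
    # Below DPC/dispatch level the fault can be handled normally.
    return "page fault serviced"


print(touch_pageable(0x1C, resident=True))
print(touch_pageable(0x1C, resident=False))
```

The first call models the case in which the freed page remains in the system working set, which is why Myfault.sys keeps reading further until it reaches a page that faults.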

When you load a crash generated with this bug into Kd, its analysis displays something like this:

Microsoft (R) Windows Debugger Version 6.3.0011.2
Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [c:\windows\memory.dmp]
Kernel Summary Dump File: Only kernel address space is available

Symbol search path is: srv*c:\symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows 2000 Kernel Version 2195 UP Free x86 compatible
Product: LanManNt, suite: Enterprise Terminal Server
Kernel base = 0x80400000 PsLoadedModuleList = 0x8046a4c0
Debug session time: Mon Apr 05 18:28:44 2004
System Uptime: 0 days 0:09:20.105
Loading Kernel Symbols
................................................................................
...............
Loading unloaded module list
No unloaded module list present
Loading User Symbols
..........
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck D1, {bead1800, 1c, 0, ec1fb357}

*** ERROR: Module load completed but symbols could not be loaded for myfault.sys
*** WARNING: Unable to verify checksum for NotMyfault.exe
*** ERROR: Module load completed but symbols could not be loaded for NotMyfault.exe
Probably caused by : myfault.sys ( myfault+357)

Followup: MachineOwner
---------

kd>

The first thing to note is that Kd reports errors trying to load symbols for Myfault.sys and Notmyfault.exe. These are expected because the symbol files for them are not on the symbol-file path (which is configured to point at the Microsoft symbol server). You'll see similar errors for third-party drivers and executables that do not ship with the operating system.

The analysis text itself is terse, showing the numeric stop code and bug check parameters followed by a "probably caused by" line that shows the analysis engine's best guess at the offending driver. In this case it's on the mark and points directly at Myfault.sys, so there's no need for manual analysis.
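If you save the debugger output to a log and want to pull out these two pieces of information, a short script is enough. The following is a minimal sketch that assumes the summary lines look exactly like the ones shown above; the regular expressions and the function name parse_summary are illustrative, not part of the debugger.

```python
import re

# The two summary lines, copied from the basic analysis output above.
SUMMARY = """\
BugCheck D1, {bead1800, 1c, 0, ec1fb357}
Probably caused by : myfault.sys ( myfault+357)
"""

BUGCHECK_RE = re.compile(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}")
CAUSE_RE = re.compile(r"Probably caused by\s*:\s*(\S+)")


def parse_summary(text):
    """Extract the stop code, its four parameters, and the blamed module."""
    bug = BUGCHECK_RE.search(text)
    cause = CAUSE_RE.search(text)
    return {
        # All numbers in the summary are hexadecimal.
        "stop_code": int(bug.group(1), 16) if bug else None,
        "params": [int(p, 16) for p in bug.group(2).split(",")] if bug else [],
        "module": cause.group(1) if cause else None,
    }


result = parse_summary(SUMMARY)
print(hex(result["stop_code"]), [hex(p) for p in result["params"]], result["module"])
```

For the sample dump this reports stop code 0xd1, the four bug check parameters, and myfault.sys as the module named by the analysis engine.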

The "Followup" line is not generally useful except within Microsoft, where the debugger looks for the module name in the Triage.ini file that's located within the Triage directory of the Debugging Tools for Windows installation directory. The Microsoft-internal version of that file lists the developer or group responsible for handling crashes in a specific driver, and the debugger displays their name in the Followup line when appropriate.

Verbose Analysis

Even though the basic analysis of the Notmyfault crash identifies the faulty driver, you should always have the debugger execute a verbose analysis by entering the command:

!analyze -v

The first obvious difference between the verbose and default analysis is the description of the stop code and its parameters. Following is the output of the command when executed on the same dump:

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: bead1800, memory referenced
Arg2: 0000001c, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: ec1fb357, address which referenced memory

This saves you the trouble of opening the help file to find the same information, and the text sometimes suggests troubleshooting steps, an example of which you'll see in the next section on advanced crash dump analysis.
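To make the meaning of the four parameters concrete, the sketch below turns them into a one-line description, using only the argument meanings printed by the verbose analysis for stop code D1. The function name describe_d1 is hypothetical.

```python
def describe_d1(arg1: int, arg2: int, arg3: int, arg4: int) -> str:
    """Describe the parameters of a DRIVER_IRQL_NOT_LESS_OR_EQUAL (D1) stop.

    arg1: memory referenced
    arg2: IRQL at the time of the reference
    arg3: 0 = read operation, 1 = write operation
    arg4: address of the instruction that referenced the memory
    """
    operation = "read" if arg3 == 0 else "write"
    return (
        f"{operation} of address 0x{arg1:08x} at IRQL 0x{arg2:x} "
        f"by instruction at 0x{arg4:08x}"
    )


# Parameters from the sample dump: {bead1800, 1c, 0, ec1fb357}
print(describe_d1(0xBEAD1800, 0x1C, 0, 0xEC1FB357))
# prints: read of address 0xbead1800 at IRQL 0x1c by instruction at 0xec1fb357
```

Note that the low bits of the fourth parameter (357) match the myfault+357 offset reported on the "probably caused by" line.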

The other potentially useful information in a verbose analysis is the stack trace of the thread that was executing on the crashing processor at the time of the crash. Here's what it looks like for the same dump:

STACK_TEXT:
WARNING: Stack unwind information not available. Following frames may be wrong.
b7e6dc34 804ac5de ff7c4040 beb4bf68 ff80a9c8 myfault+0x357
b7e6dd00 804a8f1e 00000070 00000000 00000000 nt!IopXxxControlFile+0x5e4
b7e6dd34 80461691 00000070 00000000 00000000 nt!NtDeviceIoControlFile+0x28
b7e6dd34 77f96be2 00000070 00000000 00000000 nt!KiSystemService+0xc4
0012f4a0 77e84c9b 00000070 00000000 00000000 ntdll!ZwDeviceIoControlFile+0xb
0012f504 004017c3 00000070 83360018 00000000 KERNEL32!DeviceIoControl+0x100
000200ac 00000000 00000000 00000000 00000000 NotMyfault+0x17c3

The preceding stack shows that the Notmyfault executable image, shown at the bottom, invoked the DeviceIoControl function in Kernel32.dll, which in turn invoked ZwDeviceIoControlFile in Ntdll.dll, and so on, until finally the system crashed while executing an instruction in the Myfault image. A stack trace like this can be useful because crashes sometimes occur when one driver passes another driver data that is improperly formatted or corrupt, or parameters that are illegal. The driver that receives the invalid data might cause the crash and get the blame in an analysis, even though the stack reveals that another driver was involved. In this sample trace, no driver other than myfault is listed. (The module "nt" is Ntoskrnl.)
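When a trace is long, it can help to reduce it to the list of modules involved. The sketch below assumes each frame line ends with a symbol of the form module!function+offset (symbols loaded) or module+offset (no symbols), as in the sample trace; the function name and the OS_MODULES set are illustrative assumptions, not debugger features.

```python
import re

# The frames from the sample STACK_TEXT output.
STACK_TEXT = """\
b7e6dc34 804ac5de ff7c4040 beb4bf68 ff80a9c8 myfault+0x357
b7e6dd00 804a8f1e 00000070 00000000 00000000 nt!IopXxxControlFile+0x5e4
b7e6dd34 80461691 00000070 00000000 00000000 nt!NtDeviceIoControlFile+0x28
b7e6dd34 77f96be2 00000070 00000000 00000000 nt!KiSystemService+0xc4
0012f4a0 77e84c9b 00000070 00000000 00000000 ntdll!ZwDeviceIoControlFile+0xb
0012f504 004017c3 00000070 83360018 00000000 KERNEL32!DeviceIoControl+0x100
000200ac 00000000 00000000 00000000 00000000 NotMyfault+0x17c3
"""

# Hypothetical allow-list of operating system modules for this example.
OS_MODULES = {"nt", "ntdll", "KERNEL32"}


def modules_on_stack(stack_text):
    """Return module names in stack order (most recent first), without repeats."""
    modules = []
    for line in stack_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip headers, warnings, and blank lines
        # The module name ends at '!' (symbols) or '+' (no symbols).
        module = re.split(r"[!+]", fields[-1], maxsplit=1)[0]
        if module not in modules:
            modules.append(module)
    return modules


found = modules_on_stack(STACK_TEXT)
print(found)
print([m for m in found if m not in OS_MODULES])
```

For the sample trace, the only non-operating-system modules are myfault and NotMyfault, which agrees with the observation that no other driver is involved.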

If the driver singled out by an analysis is unfamiliar to you, use the lm (list modules) command to look at the driver's version information. Add the k (kernel modules) and v (verbose) options along with the m (match) option followed by the name of the driver and a wildcard:

kd> lm kv m myfault*
start    end        module name
f224d000 f224dbe0   myfault    (deferred)
    Image path: \??\C:\WINNT\system32\drivers\myfault.sys
    Timestamp:        Thu Apr 29 14:53:12 2004 (40915D28)
    Checksum:         00010090
    ImageSize:        00000BE0
    File version:     2.0.0.0
    Product version:  2.0.0.0
    File flags:       0 (Mask 3F)
    File OS:          40004 NT Win32
    File type:        3.7 Driver
    File date:        00000000.00000000
    Translations:     0409.04b0
    CompanyName:      Sysinternals
    ProductName:      Sysinternals Myfault
    InternalName:     myfault.sys
    OriginalFilename: myfault.sys
    ProductVersion:   2.0
    FileVersion:      2.0
    FileDescription:  Crash Test Driver
    LegalCopyright:   Copyright (C) M. Russinovich 2002-2004

In addition to using the description to identify the purpose of a driver, you can also use the file and product version numbers to see whether the version installed is the most up-to-date version available. (You can do this by checking the vendor Web site, for instance.) If version information isn't present (because it might have been paged out of physical memory at the time of the crash), look at the driver image file's properties in Explorer on the system that crashed.
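The hexadecimal value in parentheses on the Timestamp line is the image's link time expressed as seconds since January 1, 1970 (UTC), so it offers another way to judge how old a driver build is. The following sketch converts it; the function name link_time is hypothetical.

```python
from datetime import datetime, timezone


def link_time(hex_stamp: str) -> datetime:
    """Convert a hexadecimal image timestamp to a UTC datetime."""
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=timezone.utc)


# Timestamp value from the lm output for myfault.sys.
print(link_time("40915D28").isoformat())
# prints: 2004-04-29T19:53:12+00:00
```

The debugger output shows 14:53:12 for the same value, presumably because it rendered the time in a local time zone five hours behind UTC.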



    Microsoft Windows Internals
    Microsoft Windows Internals (4th Edition): Microsoft Windows Server 2003, Windows XP, and Windows 2000
    ISBN: 0735619174
    Year: 2004
    Pages: 158
