Analyzing a Crash Dump
When a crash occurs, Windows 2000 can save the state of the system in a dump file. To enable this feature, the System Control Panel applet (Advanced tab, Startup and Recovery) must be configured. Crash dumps allow an immediate reboot of the system without losing the state of memory at the moment of failure. This section explains how to analyze a system crash dump.
Goals of the Analysis
With WinDbg and a crash dump file, the state of the failed system can be examined. It is possible to find out almost as much information as if it were still running or if a live debugger were attached at the moment of failure. This kind of forensic pathology can help develop a convincing explanation of what led to the crash. Some of the questions that should be asked during the analysis include
- Which drivers were executing at the time of the crash?
- Which driver was responsible for the crash?
- What was the sequence of events leading to the crash?
- What operation was the driver trying to perform when the system crashed?
- What were the contents of the Device Extension?
- What Device object was it working with?
Starting the Analysis
To begin the analysis, the crash dump file must be obtained. If WinDbg is available on the target system (the system that crashed), the file is present wherever the Control Panel configuration specified (e.g., WINNT\Memory.DMP). If the system that crashed is at a remote site, the dump file must be transported. Since dump file sizes range from large to very large, be sure to use appropriate techniques for transport (e.g., compression and/or CD-R media).
On the analyzing machine, invoke WinDbg. Then choose the menu option File and select Open Crash Dump. Choose the dump file name to open. After the dump file loads, information is displayed as shown in the following excerpt:
Kernel Debugger connection established for D:\WINNT\MEMORY.DMP
Kernel Version 2195 Free loaded @ ffffffff80400000
Bugcheck 0000001e : c0000005 f17a123f 00000000 00000000
Stopped at an unexpected exception: code=80000003 addr=8045249c
Hard coded breakpoint hit
...
Module Load: CRASHER.SYS (symbol loading deferred)
The initial information reveals the same STOP message information from the original blue screen. The bugcheck code is 0x1E, signifying KMODE_EXCEPTION_NOT_HANDLED. The second bugcheck argument is the address where the problem occurred (0xF17A123F). To see where this instruction falls within the source code, choose Edit, then Goto Address and enter the address from the bugcheck information. If symbol information is located (don't forget to set the symbol path from the Options dialog of the View menu), the source file is opened. For this example, a function that purposefully generates an unhandled exception, TryToCrash, is displayed with the cursor placed on the line of code that was executing. The screen shot of Figure 17.2 displays this remarkably helpful feature.
Figure 17.2. Crash dump analysis screen shot.
The first parameter for bugcheck 0x1E is the unhandled exception code, 0xC0000005. This signifies an access violation, which is not surprising given the code of TryToCrash. Dereferencing a NULL pointer is never a great idea.
Do not be misled by the message about the unexpected exception with code 0x80000003. This is just the breakpoint used by KeBugCheck itself to halt the system, so it has no significance.
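The decoding described above can be sketched in a few lines of Python. This is only an illustration, not part of WinDbg: the function names `parse_bugcheck` and `describe` are hypothetical, and the lookup tables contain only the codes discussed in this section.

```python
import re

# Illustrative tables only: just the values discussed in the text.
BUGCHECK_NAMES = {0x1E: "KMODE_EXCEPTION_NOT_HANDLED"}
STATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0x80000003: "STATUS_BREAKPOINT",
}


def parse_bugcheck(line):
    """Split a 'Bugcheck <code> : <p1> <p2> <p3> <p4>' line into integers."""
    match = re.match(r"\s*Bugcheck\s+([0-9a-fA-F]+)\s*:\s*(.+)", line)
    if match is None:
        raise ValueError("not a bugcheck line: %r" % line)
    code = int(match.group(1), 16)
    params = [int(word, 16) for word in match.group(2).split()]
    return code, params


def describe(line):
    """Return a one-line summary of a bugcheck line."""
    code, params = parse_bugcheck(line)
    text = "bugcheck 0x%X (%s)" % (code, BUGCHECK_NAMES.get(code, "unknown"))
    if code == 0x1E and len(params) >= 2:
        # For 0x1E the first parameter is the exception code and the
        # second is the address at which the exception occurred.
        name = STATUS_NAMES.get(params[0], "0x%08X" % params[0])
        text += ", exception %s at 0x%08X" % (name, params[1])
    return text


print(describe("Bugcheck 0000001e : c0000005 f17a123f 00000000 00000000"))
```

Run against the excerpt shown earlier, the sketch reports the bugcheck name, the exception name, and the faulting address that is entered in the Goto Address dialog.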
Tracing the Stack
The stack trace is one of the most important steps in analyzing a crash dump. The stack state at the time of the crash is a record of the calls made from the oldest frame (at the stack bottom) to the crash point itself (at the top).
Unfortunately, finding the right stack to trace is often quite involved, because systems operate with many threads, each with its own context (which includes a private stack). At the time of an unhandled exception (as in the example), control is transferred to a system routine that switches to a safe context. (After all, the unhandled exception could have been caused by a corrupt stack. Further processing within that context would be unsafe.) The Windows 2000 kernel routine that performs the unhandled exception processing for most driver operations is PspUnhandledExceptionInSystemThread.
HIGH IRQL CRASHES
If the system crashed while it was running at or above DISPATCH_LEVEL IRQL, a straightforward stack trace is in order.
To obtain a stack trace from a crash dump under analysis by WinDbg, the Call Stack option can be selected from the View menu, or the k command can be used directly. To continue the example, the following is displayed:
> k
f79a6678 8045251c f79a66a0 8045cc77 f79a66a8 NTOSKRNL!PspUnhandledExceptionInSystemThread+0x18
f79a6ddc 80465b62 80418ada 80000001 00000000 NTOSKRNL!PspSystemThreadStartup+0x7a (EBP)
00000000 00000000 00000000 00000000 00000000 NTOSKRNL!KiThreadStartup+0x16 (No FPO)
>
Each line shows the address of the stack frame, the return address of the function, and the first three arguments passed to the function. (The kb stack backtrace command can be used instead of k to better format the display.)
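The column layout just described can be made concrete with a small Python sketch that splits one stack trace line into its parts. The `Frame` tuple and `parse_frame` function are hypothetical names chosen for illustration, and the sketch assumes each frame occupies a single line of text.

```python
from collections import namedtuple

# Hypothetical container for one line of stack trace output.
Frame = namedtuple("Frame", "frame_addr return_addr args symbol")


def parse_frame(line):
    """Parse '<frame> <return> <arg1> <arg2> <arg3> <symbol...>'."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("expected five hex columns and a symbol: %r" % line)
    numbers = [int(field, 16) for field in fields[:5]]
    # Anything after the fifth column is the symbol, possibly followed
    # by an annotation such as "(EBP)" or "(No FPO)".
    symbol = " ".join(fields[5:])
    return Frame(numbers[0], numbers[1], tuple(numbers[2:5]), symbol)


frame = parse_frame(
    "f79a6678 8045251c f79a66a0 8045cc77 f79a66a8 "
    "NTOSKRNL!PspUnhandledExceptionInSystemThread+0x18"
)
print(frame.symbol)
print("first argument: 0x%08x" % frame.args[0])
```

For the top frame of the example, the first argument comes out as 0xf79a66a0.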
CRASHES BELOW DISPATCH_LEVEL
As demonstrated, the function PspUnhandledExceptionInSystemThread indeed was called to handle the NULL pointer dereference, but there is no obvious linkage back to the faulty driver code itself.
The first input parameter to PspUnhandledExceptionInSystemThread is a pointer to a structure that contains the exception and context records. The !exr and !cxr extension commands can be used to format and display these vital records.
After the !cxr command executes, the very useful !kb command displays a stack trace using the context of the last !cxr command. This should be the call stack that was in context at the time of the unhandled exception.
> dd f79a66a0 l 2 ; this is a lowercase L, not a 1
0xF79A66A0 f79a6b28 f79a6780
> !exr f79a6b28
Exception Record @ F79A6B28:
ExceptionAddress: f17a123f (TryToCrash+0xf)
ExceptionCode: c0000005
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000000
Parameter[1]: 00000000
> !cxr f79a6780
CtxFlags: 00010017
eax=00000001 ebx=00000000 ecx=01000100 edx=f79a6dcc esi=e1eba118 edi=fcdb8c38
eip=f17a123f esp=f79a6bf0 ebp=f79a6bf4 iopl=0 nv up ei pl zr na po nc
vip=0 vif=0
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00210246
0000123F
> !kb
ChildEBP RetAddr Args to Child
f79a6bf4 f17a11b5 00000001 00000000 7cdb3738 CRASHER!TryToCrash+0xf
f79a6c78 f17a102e fcdb3750 00000000 804a43c4 CRASHER!CreateDevice+0x162
f79a6c90 804a4431 fcdb3750 fcd67000 f79f2d08 CRASHER!DriverEntry+0x2e
f79a6d58 804d9281 0000035c fcd67000 f79f2d08 NTOSKRNL!_NtSetInformationFile@20+0x5a0
f79a6d78 80418b9f f79f2d08 00000000 00000000 NTOSKRNL!_NtSetInformationFile@20+0x7e1
f79f2d58 80461691 00f1f784 00000000 00000000 NTOSKRNL!_ExpWorkerThread@4+0xae
f79f2d58 77f9a31a 00f1f784 00000000 00000000 NTOSKRNL!_KiSystemService+0xc4
f79a6bec 01000100 f79a6c78 f17a11b5 00000001 +0xffffffff
00f1f794 00000000 00000000 00000000 00000000 +0xffffffff
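The two parameters shown by !exr carry additional detail for an access violation. The following Python sketch interprets them under an assumption that is not stated in this chapter: for exception code 0xC0000005, the first parameter is taken to be the access type (0 for a read, 1 for a write) and the second the address that was touched. The function name is hypothetical.

```python
ACCESS_VIOLATION = 0xC0000005

# Assumed meaning of the first parameter for an access violation.
ACCESS_KINDS = {0: "read", 1: "write"}


def describe_access_violation(exception_code, parameters):
    """Summarize the parameters of an access violation exception record."""
    if exception_code != ACCESS_VIOLATION:
        raise ValueError("not an access violation: 0x%08X" % exception_code)
    if len(parameters) < 2:
        raise ValueError("an access violation carries two parameters")
    kind = ACCESS_KINDS.get(
        parameters[0], "unknown access type %d" % parameters[0])
    return "%s of address 0x%08X" % (kind, parameters[1])


# Values copied from the !exr output in the example.
print(describe_access_violation(0xC0000005, [0x00000000, 0x00000000]))
```

Under that assumption, the record in the example describes a read of address zero, which is consistent with a NULL pointer dereference in TryToCrash.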
Indirect Methods of Investigation
If a driver was not the direct cause of the crash, it still cannot be ruled out as an indirect cause. Perhaps a device DMA operation scribbled into memory. To analyze such situations, considerable information must be gathered. This can involve creativity and imagination (a.k.a. snooping and patience).
FINDING I/O REQUESTS
A good starting point is to identify any IRPs that the driver was processing at the time of the crash. Begin by obtaining a list of the active IRPs on the entire system with the !irpfind command.
> !irpfind
Searching NonPaged pool (8090c000 : 8131e000) for Tag: Irp
8097c008 Thread 8094d900 current stack belongs to \Driver\Crasher
8097dec8 Thread 8094dda0 current stack belongs to \FileSystem\Ntfs
809861a8 Thread 8094dda0 current stack belongs to \Driver\symc810
809864e8 Thread 80951ba0 current stack belongs to \Driver\Mouclass
80986608 Thread 80951ba0 current stack belongs to \Driver\Kbdclass
80986728 Thread 8094dda0 current stack belongs to \Driver\symc810
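On a busy system the !irpfind listing can be long. If the output is saved to a text file, a short script can pick out the entries for one driver. The Python sketch below assumes the line layout shown in the listing; `irps_for_driver` is a hypothetical helper, not a debugger command.

```python
import re

# Matches lines such as:
#   8097c008 Thread 8094d900 current stack belongs to \Driver\Crasher
IRP_LINE = re.compile(
    r"^\s*([0-9a-fA-F]{8})\s+Thread\s+([0-9a-fA-F]{8})\s+"
    r"current stack belongs to\s+(\S+)\s*$"
)


def irps_for_driver(listing, driver):
    """Return (irp_address, thread_address) pairs owned by one driver."""
    found = []
    for line in listing.splitlines():
        match = IRP_LINE.match(line)
        if match and match.group(3).lower() == driver.lower():
            found.append((int(match.group(1), 16), int(match.group(2), 16)))
    return found


SAMPLE = r"""
Searching NonPaged pool (8090c000 : 8131e000) for Tag: Irp
8097c008 Thread 8094d900 current stack belongs to \Driver\Crasher
8097dec8 Thread 8094dda0 current stack belongs to \FileSystem\Ntfs
809861a8 Thread 8094dda0 current stack belongs to \Driver\symc810
"""

for irp, thread in irps_for_driver(SAMPLE, r"\Driver\Crasher"):
    print("IRP %08x on thread %08x" % (irp, thread))
```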
From this list, select the IRP belonging to the driver under test. Then the !irp command is used to format the specific IRP.
> !irp 8097c008
Irp is active with 1 stacks 1 is current
No Mdl System buffer = ff593d88 Thread 80987da0: Irp stack trace.
cmd flg cl Device File Completion-Context
> 4 0 1 809d50d0 00000000 00000000-00000000 pending
\Driver\Crasher
Args: 0000000C 00000000 00000000 00000000
The cmd field shows the major function, and the Args field displays the Parameters union of the I/O stack location. The flg and cl fields show the stack location flags and control bits, defined in NTDDK.H.
For this example, the IRP major function code is 4, signifying IRP_MJ_WRITE, with a Parameters.Write.Length of 12 (0xC). Further, no completion routine is associated with the IRP and it has been marked pending at the time of the crash.
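The same reading of the stack location can be expressed as a Python sketch. The tables are deliberately partial, and their numeric values are assumed to match the IRP_MJ_xxx and SL_xxx definitions in the DDK headers; `decode_stack_location` is a hypothetical name.

```python
# Partial table of major function codes (values assumed from the DDK headers).
IRP_MAJOR = {
    0x00: "IRP_MJ_CREATE",
    0x02: "IRP_MJ_CLOSE",
    0x03: "IRP_MJ_READ",
    0x04: "IRP_MJ_WRITE",
    0x0E: "IRP_MJ_DEVICE_CONTROL",
}

# Control bits of an I/O stack location (values assumed from the DDK headers).
SL_CONTROL = [
    (0x01, "SL_PENDING_RETURNED"),
    (0x20, "SL_INVOKE_ON_CANCEL"),
    (0x40, "SL_INVOKE_ON_SUCCESS"),
    (0x80, "SL_INVOKE_ON_ERROR"),
]


def decode_stack_location(cmd, cl, args):
    """Describe the cmd, cl, and Args fields printed by !irp."""
    major = IRP_MAJOR.get(cmd, "major function 0x%02X" % cmd)
    bits = [name for mask, name in SL_CONTROL if cl & mask]
    text = "%s, control: %s" % (major, " | ".join(bits) or "none")
    if cmd in (0x03, 0x04) and args:
        # For reads and writes, the first Args value is the transfer length.
        verb = "read" if cmd == 0x03 else "write"
        text += ", %s length %d" % (verb, args[0])
    return text


# Values copied from the !irp output in the example.
print(decode_stack_location(0x04, 0x01, [0x0000000C, 0, 0, 0]))
```

With the values from the example, the sketch reports a write of 12 bytes with only the pending bit set, matching the interpretation given in the text.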
There is a system buffer associated with the IRP (at location 0xFF593D88), which can be examined with the dd command or the Memory option in the View menu. This device is performing buffered I/O.
To examine the Device object the IRP was sent to, use the !devobj command on the address specified by the IRP.
> !devobj 809d50d0
Device object is for:
Crash0 \Driver\Crasher DriverObject ff53e1d0
Current Irp 8097c008 RefCount 1 Type 00000022 DevExt ff58bc58
DeviceQueue:
The Device Extension can also be dumped using the dd command. Later in this chapter, a WinDbg extension that makes the Device Extension easier to display is demonstrated.
Of course, the IRP may not yield as much information as the stack trace, but it does reveal some possibly relevant information. For example, the IRP reveals that the driver was performing buffered I/O and that the request was passed to the Start I/O routine, since it was marked as pending. Detective work does not always yield a quick path to the truth.
EXAMINING PROCESSES
Sometimes, it is helpful to know what processes were running on a system at the time of a crash. This can help spot patterns of system usage or even specific user programs that trigger a driver to fail. For general information, the !process command is used.
> !process 0 0
**** NT ACTIVE PROCESS DUMP ****
PROCESS 80a02a60 Cid: 0002 Peb: 00000000 ParentCid: 0000
DirBase: 00006e05 ObjectTable: 80a03788 TableSize: 150.
Image: System
PROCESS 80986f40 Cid: 0012 Peb: 7ffde000 ParentCid: 0002
DirBase: 000bd605 ObjectTable: 8098fce8 TableSize: 38.
Image: smss.exe
PROCESS 80958020 Cid: 001a Peb: 7ffde000 ParentCid: 0012
DirBase: 0008b205 ObjectTable: 809782a8 TableSize: 150.
Image: csrss.exe
PROCESS 80955040 Cid: 0020 Peb: 7ffde000 ParentCid: 0012
DirBase: 00112005 ObjectTable: 80955ce8 TableSize: 54.
Image: winlogon.exe
PROCESS 8094fce0 Cid: 0026 Peb: 7ffde000 ParentCid: 0020
DirBase: 00055005 ObjectTable: 80950cc8 TableSize: 222.
Image: services.exe
PROCESS 8094c020 Cid: 0029 Peb: 7ffde000 ParentCid: 0020
DirBase: 000c4605 ObjectTable: 80990fe8 TableSize: 110.
Image: lsass.exe
PROCESS 809258e0 Cid: 0044 Peb: 7ffde000 ParentCid: 0026
DirBase: 001e5405 ObjectTable: 80925c68 TableSize: 70.
Image: SPOOLSS.EXE
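The Cid and ParentCid columns in the listing above are enough to reconstruct which process started which. The following Python sketch parses saved !process output and lists the children of a given process. It assumes the layout shown in the listing, with identifiers printed in hexadecimal; the function names are hypothetical.

```python
import re

PROCESS_LINE = re.compile(
    r"PROCESS\s+([0-9a-fA-F]+)\s+Cid:\s+([0-9a-fA-F]+)\s+"
    r"Peb:\s+[0-9a-fA-F]+\s+ParentCid:\s+([0-9a-fA-F]+)"
)
IMAGE_LINE = re.compile(r"Image:\s+(\S+)")


def parse_processes(text):
    """Return one dict per PROCESS entry in saved !process output."""
    processes = []
    current = None
    for line in text.splitlines():
        match = PROCESS_LINE.search(line)
        if match:
            current = {
                "address": int(match.group(1), 16),
                "cid": int(match.group(2), 16),
                "parent": int(match.group(3), 16),
                "image": None,
            }
            processes.append(current)
            continue
        match = IMAGE_LINE.search(line)
        if match and current is not None:
            current["image"] = match.group(1)
    return processes


def children_of(processes, parent_cid):
    """Return the image names of processes started by parent_cid."""
    return [p["image"] for p in processes if p["parent"] == parent_cid]


SAMPLE = """
PROCESS 80a02a60 Cid: 0002 Peb: 00000000 ParentCid: 0000
DirBase: 00006e05 ObjectTable: 80a03788 TableSize: 150.
Image: System
PROCESS 80986f40 Cid: 0012 Peb: 7ffde000 ParentCid: 0002
DirBase: 000bd605 ObjectTable: 8098fce8 TableSize: 38.
Image: smss.exe
PROCESS 80958020 Cid: 001a Peb: 7ffde000 ParentCid: 0012
DirBase: 0008b205 ObjectTable: 809782a8 TableSize: 150.
Image: csrss.exe
PROCESS 80955040 Cid: 0020 Peb: 7ffde000 ParentCid: 0012
DirBase: 00112005 ObjectTable: 80955ce8 TableSize: 54.
Image: winlogon.exe
"""

print(children_of(parse_processes(SAMPLE), 0x12))
```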
For more information, the flags value given as the second argument can be raised to increase the level of verbosity. (The first argument selects the process; a value of 0 lists every process.)
> !process 0 7
**** NT ACTIVE PROCESS DUMP ****
PROCESS fb667a00 Cid: 0002 Peb: 00000000 ParentCid: 0000
DirBase: 00030000 ObjectTable: e1000f88 TableSize: 112.
Image: System
VadRoot fb666388 Clone 0 Private 4. Modified 9850. Locked 0.
FB667BBC MutantState Signalled OwningThread 0
Token e10008f0
ElapsedTime 15:06:36.0338
UserTime 0:00:00.0000
KernelTime 0:00:54.0818
QuotaPoolUsage[PagedPool] 1480
Working Set Sizes (now,min,max) (3, 50, 345)
PeakWorkingSetSize 118
VirtualSize 1 Mb
PeakVirtualSize 1 Mb
PageFaultCount 992
MemoryPriority BACKGROUND
BasePriority 8
CommitCharge 8
THREAD fb667780 Cid 2.1 Teb: 00000000 Win32Thread: 80144900 WAIT:
(WrFreePage) KernelMode Non-Alertable
80144fc0 SynchronizationEvent
Not impersonating
Owning Process fb667a00
WaitTime (seconds) 32278
Context Switch Count 787
UserTime 0:00:00.0000
KernelTime 0:00:21.0821
Start Address Phase1Initialization (0x801aab44)
Initial Sp fb26f000 Current Sp fb26ed00
Priority 0 BasePriority 0 PriorityDecrement 0 DecrementCount 0
ChildEBP RetAddr Args to Child
fb26ed18 80118efc c0502000 804044b0 00000000 KiSwapThread+0xb5
fb26ed3c 801289d9 80144fc0 00000008 00000000 KeWaitForSingleObject+0x1c2
For multithreaded processes, this form of the !process command lists thread information, including objects on which they might be waiting. It also provides information about the I/O requests issued by a given thread, which may help in resolving deadlock conditions.