Using Crash Troubleshooting Tools

The crash generated in the preceding section with Notmyfault's High IRQL Fault (Kernel-mode) option poses no challenge for the debugger's automated analysis. Unfortunately, most crashes are not so easy to analyze, and some are impossible to debug as they stand. Several levels of configuration, each costing more system performance than the last, might turn crashes that cannot be analyzed into ones that can. If the crashes generated after you configure a level and reboot don't reveal the cause, try the next level.

  1. If you consider one or more drivers likely sources of the crashes, because they were introduced into the system relatively recently, were recently updated, or are implicated by the circumstances of the crash, enable them for verification using the Driver Verifier and check all the verification options except low resources simulation.

  2. Enable the same level of verification as in level 1 on all unsigned drivers in the system. Or, if you are running Windows 2000 where the Driver Verifier doesn't distinguish between signed and unsigned drivers, enable it on all non-Microsoft drivers.

  3. Enable the same verification as in level 1 on all drivers in the system. To maintain reasonable performance, you may want to divide the drivers into groups, enabling the Driver Verifier on one group at a time between reboots.
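
The levels above can be sketched as a small helper that computes the option mask and builds a Driver Verifier command line. This is a minimal sketch, not a definitive procedure: the bit values are the ones documented for the verifier.exe /flags switch and should be treated as assumptions to confirm for your Windows version (Windows 2000 supports fewer options), and the helper names are hypothetical.

```python
# Driver Verifier option bits as documented for "verifier /flags".
# Assumption: confirm these values against the documentation for your Windows version.
SPECIAL_POOL = 0x01
FORCE_IRQL_CHECKING = 0x02
LOW_RESOURCES_SIMULATION = 0x04
POOL_TRACKING = 0x08
IO_VERIFICATION = 0x10
DEADLOCK_DETECTION = 0x20
ENHANCED_IO_VERIFICATION = 0x40
DMA_VERIFICATION = 0x80

ALL_OPTIONS = 0xFF


def level_flags():
    """All verification options except low resources simulation."""
    return ALL_OPTIONS & ~LOW_RESOURCES_SIMULATION


def driver_groups(drivers, group_size):
    """Split a driver list into groups so one group is verified per reboot (level 3)."""
    return [drivers[i:i + group_size] for i in range(0, len(drivers), group_size)]


def verifier_command(drivers, flags):
    """Build a verifier.exe command line for the given drivers (hypothetical helper)."""
    return "verifier /flags 0x{:X} /driver {}".format(flags, " ".join(drivers))


suspects = ["myfault.sys"]
print(verifier_command(suspects, level_flags()))
```

For level 3, driver_groups can feed verifier_command one group at a time, with a reboot between groups.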

Obviously, before you spend time and energy making system configuration changes and analyzing crashes, you should ensure that your system's kernel and drivers are the most recent available by using the services of Windows Update and third-party driver support sites.

Note

If your system becomes unbootable because the Driver Verifier detects a driver error and crashes the system, then start in safe mode (where verification is disabled), run the Driver Verifier, and delete verification settings.


The following sections demonstrate how the Driver Verifier can make impossible-to-debug crashes into ones that you can solve. You should also refer to the Debugging Tools help file, which has tutorials on advanced debugging techniques.

Buffer Overrun and Special Pool

By far the most common source of crashes on Windows is pool corruption. Pool corruption usually occurs when a driver suffers from a buffer overrun or buffer underrun bug that causes it to overwrite data past either the end or start of a buffer it has allocated from paged or nonpaged pool. The Executive's pool-tracking structures reside on either side of a pool buffer and separate buffers from each other. These bugs, therefore, cause corruption to the pool tracking structures, to buffers owned by other drivers, or to both. The crashes caused by pool corruption are virtually impossible to debug because the system crashes when corrupted data is referenced, not when the corruption occurs.
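
To illustrate why such a crash is so far removed from its cause, here is a toy model of a pool in which each buffer is preceded by a tracking header. The layout, the header contents, and the ToyPool class are hypothetical simplifications, not the Executive's real structures. The overrun itself raises no error, and the damage is visible only when the neighboring header is inspected later.

```python
HEADER = b"POOLHDR!"          # stand-in for a pool-tracking structure (hypothetical layout)
HEADER_SIZE = len(HEADER)


class ToyPool:
    """A flat byte array in which every buffer is preceded by a header."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.next_free = 0

    def allocate(self, length):
        start = self.next_free
        self.memory[start:start + HEADER_SIZE] = HEADER
        self.next_free = start + HEADER_SIZE + length
        return start + HEADER_SIZE      # address of the caller's buffer

    def write(self, address, data):
        # No bounds checking, just like a raw pointer write in a driver.
        self.memory[address:address + len(data)] = data

    def header_intact(self, buffer_address):
        start = buffer_address - HEADER_SIZE
        return bytes(self.memory[start:buffer_address]) == HEADER


pool = ToyPool(256)
buggy = pool.allocate(16)     # buffer owned by the buggy driver
victim = pool.allocate(16)    # buffer owned by an unrelated driver

pool.write(buggy, b"A" * 24)  # 8-byte overrun: nothing fails at this point

# The damage surfaces only when something later inspects the victim's header.
print(pool.header_intact(buggy), pool.header_intact(victim))
```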

Note

To assist in catching these difficult corruptions, Windows XP Service Pack 2 and later perform pool-block tail checking at all times. Thus, buffer overruns are likely to cause an immediate BAD_POOL_HEADER crash.


You can generate a pool corruption crash by running Notmyfault and selecting the Buffer Overflow bug. This causes Myfault to allocate a buffer and then overwrite the 40 bytes following the buffer. There can be a significant delay between the time you click the Do Bug button and the time a crash occurs, and you might even have to generate pool usage by exercising applications before a crash occurs, which highlights the distance between a corruption and its effect on system stability. An analysis of the resultant crash almost always reports Ntoskrnl or another driver as being the likely cause, which demonstrates the usefulness of a verbose analysis with its description of the stop code:

DRIVER_CORRUPTED_EXPOOL (c5)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is
caused by drivers that have corrupted the system pool. Run the driver
verifier against any new (or suspect) drivers, and if that doesn't turn up
the culprit, then use gflags to enable special pool.
Arguments:
Arg1: 4f4f4f53, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000001, value 0 = read operation, 1 = write operation
Arg4: 80467139, address which referenced memory

The advice in the description is to run the Driver Verifier against any new or suspect drivers or to use Gflags to enable special pool. Both accomplish the same thing: to have the system detect a potential corruption when it occurs and crash the system in a way that makes the automated analysis point at the driver causing the corruption.

If the Driver Verifier's special pool option is enabled, verified drivers use special pool, rather than paged or nonpaged pool, for any allocations they make for buffers slightly less than a page in size. A buffer allocated from special pool is sandwiched between two invalid pages and by default is aligned against the top of the page. The special pool routines also fill the unused portions of the page in which the buffer resides with a random pattern. Figure 14-8 depicts a special pool allocation.

Figure 14-8. Special pool buffer allocation


The system detects any buffer overrun of less than a page in size at the time of the overrun because it causes a page fault on the invalid page following the buffer. The fill pattern serves as a signature that catches buffer underruns at the time the driver frees a buffer, because the integrity of the pattern placed there at the time of allocation will have been compromised.
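
The two detection paths can be modeled in a few lines. This is a sketch under stated assumptions: the page size, the fill byte, and the SpecialPoolBuffer class are hypothetical, and an exception stands in for the page fault on the invalid neighboring pages.

```python
PAGE_SIZE = 4096
FILL = 0xA5   # hypothetical fill byte; the real pattern is chosen by the system


class AccessViolation(Exception):
    """Stands in for the page fault raised by touching an invalid page."""


class SpecialPoolBuffer:
    def __init__(self, length):
        assert 0 < length <= PAGE_SIZE
        self.page = bytearray([FILL]) * PAGE_SIZE
        self.start = PAGE_SIZE - length      # buffer ends exactly at the page end
        self.length = length

    def write(self, offset, data):
        """Write relative to the buffer start, as a driver would."""
        first = self.start + offset
        last = first + len(data)
        if first < 0 or last > PAGE_SIZE:
            # Touching the invalid page before or after: caught immediately.
            raise AccessViolation("write outside the page at offset %d" % offset)
        self.page[first:last] = data

    def free(self):
        """Check the pattern in front of the buffer; returns True if it is intact."""
        return all(b == FILL for b in self.page[:self.start])


buf = SpecialPoolBuffer(100)
buf.write(0, b"x" * 100)        # in bounds
try:
    buf.write(0, b"x" * 101)    # one byte past the end of the buffer
    overrun_caught = False
except AccessViolation:
    overrun_caught = True

buf.write(-4, b"oops")          # small underrun lands in the pattern, no fault
underrun_caught = not buf.free()
print(overrun_caught, underrun_caught)
```

In this model the overrun fails at the moment of the write, while the underrun goes unnoticed until free checks the pattern, which mirrors the timing described above.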

To see how the use of special pool causes a crash that the analysis engine easily diagnoses, run the Driver Verifier Manager. On Windows 2000 systems, go to the Settings tab, type myfault.sys in the edit box at the bottom of the page where it allows you to specify additional drivers, select the special pool check box, apply the changes, exit the Driver Verifier Manager, and reboot. On Windows XP and Windows Server 2003, choose the Create Custom Settings (For Code Developers) option on the first page of the wizard, choose Select Individual Settings From A Full List on the second, and select Special Pool. Choose the Select Drivers From A List option on the subsequent page, and on the page that lists drivers type myfault.sys into the File Find dialog box after pressing the button to add unloaded drivers. (You do not have to find myfault.sys in the File Find dialog box; just enter its name.) Then check the myfault.sys driver, exit the wizard, and reboot.

When you run Notmyfault and cause a buffer overflow, the system will immediately crash and the analysis of the dump reports this:

DRIVER_PAGE_FAULT_BEYOND_END_OF_ALLOCATION (d6)
N bytes of memory was allocated and more than N bytes are being referenced.
This cannot be protected by try-except.
When possible, the guilty driver's name (Unicode string) is printed on
the bugcheck screen and saved in KiBugCheckDriver.
Arguments:
Arg1: beb50000, memory referenced
Arg2: 00000001, value 0 = read operation, 1 = write operation
Arg3: ec3473f1, if non-zero, the address which referenced memory.
Arg4: 00000000, (reserved)

Special pool made an elusive bug into one that instantly reveals itself and makes the analysis trivial.

Code Overwrite and System Code Write Protection

A driver with a bug that causes corruption or misinterpretation of its own data structures can reference memory the driver doesn't own when it interprets corrupted data as a memory pointer value. The target of the pointer can be anything in the virtual address space, including data belonging to other drivers, invalid memory, or the code of other drivers or the kernel. As with buffer overruns, by the time that corruption is detected and the system crashes it's usually impossible to identify the driver that caused the corruption. Enabling special pool increases the chance of catching wild-pointer bugs, but it does not catch code corruption.

When you run Notmyfault and select the Code Overwrite option, the Myfault driver corrupts the entry point to the NtReadFile kernel function. One of two things will happen at this point: if your system is running Windows 2000 and has 127 MB or less of physical memory, or is running Windows XP or Windows Server 2003 and has 255 MB or less of physical memory, you'll get a crash for which an analysis points at Myfault.sys. The stop code description that a verbose analysis prints tells you that Myfault attempted to write to read-only memory:

ATTEMPTED_WRITE_TO_READONLY_MEMORY (be)
An attempt was made to write to readonly memory. The guilty driver is on the
stack trace (and is typically the current instruction pointer).
When possible, the guilty driver's name (Unicode string) is printed on
the bugcheck screen and saved in KiBugCheckDriver.
Arguments:
Arg1: 804bb7fd, Virtual address for the attempted write.
Arg2: 004bb121, PTE contents.
Arg3: b804db60, (reserved)
Arg4: 0000000b, (reserved)

However, if you have Windows 2000 with more than 127 MB of memory, or Windows XP or Windows Server 2003 with more than 255 MB of memory, you'll get a different type of crash because the attempt to corrupt the memory isn't caught. Because NtReadFile is a commonly executed system service that is used by the Win32 subsystem to read keyboard and mouse input, the system will almost immediately crash as a thread attempts to execute the corrupted code and generates an illegal instruction fault. The analysis of crashes generated with this bug is always wrong, but it might vary, with Win32k.sys and Ntoskrnl.exe commonly being the analyzer's best guess as to what's responsible. The bug check description for these crashes is:

KMODE_EXCEPTION_NOT_HANDLED (1e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: 80461885, The address that the exception occurred at
Arg3: 00000000, Parameter 0 of the exception
Arg4: 00000000, Parameter 1 of the exception

The reason for the different behaviors on different configurations relates to a mechanism introduced into Windows 2000 called system code write protection. Table 14-2 shows on which configurations system code write protection is enabled by default.

Table 14-2. System Code Write Protection Configurations

                                        Windows 2000    Windows XP and Windows Server 2003
System Code Write Protection Disabled   RAM > 127 MB    RAM > 255 MB


If system code write protection is enabled, the memory manager maps Ntoskrnl.exe, the HAL, and boot drivers using standard physical pages (4 KB on x86 and x64, and 8 KB on IA64). Because the granularity of protection in an image is the standard page size, the memory manager can write-protect code pages so that an attempt to modify them generates an access fault (as seen in the first crash). However, when system code write protection is disabled, which is the default on Windows 2000 with more than 127 MB of RAM and on Windows XP or Windows Server 2003 with more than 255 MB of RAM, the memory manager uses large pages (4 MB on x86, 2 MB on x64, and 16 MB on IA64) to map Ntoskrnl.exe. In that case it cannot protect code, because code and data might reside on the same large page.
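
The defaults in Table 14-2 reduce to a simple threshold check. The sketch below encodes only what the table states; the function and dictionary names are hypothetical.

```python
# RAM thresholds (in MB) above which system code write protection is
# disabled by default, taken from Table 14-2.
DISABLED_ABOVE_MB = {
    "Windows 2000": 127,
    "Windows XP": 255,
    "Windows Server 2003": 255,
}


def write_protection_enabled_by_default(os_name, ram_mb):
    """Return True if Ntoskrnl.exe is mapped with standard, protectable pages by default."""
    return ram_mb <= DISABLED_ABOVE_MB[os_name]


for os_name, ram_mb in [("Windows 2000", 128), ("Windows XP", 128), ("Windows XP", 512)]:
    print(os_name, ram_mb, write_protection_enabled_by_default(os_name, ram_mb))
```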

If system code write protection is off and crash analysis reports unlikely causes for a crash, or you suspect code corruption, you should enable it. Verifying at least one driver with the Driver Verifier is the easiest way to enable it. You can also enable it manually by adding two registry values under HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management. First, raise the amount of RAM at which the memory manager switches from standard pages to large pages for mapping Ntoskrnl.exe to an effectively infinite value by creating a DWORD value called LargePageMinimum and setting it to 0xFFFFFFFF. Then add another DWORD value named EnforceWriteProtection and set it to 1. You must reboot for the changes to take effect.
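
One way to apply the two values is through a registry script. The following sketch only generates the text of such a script; the reg_file_text helper is hypothetical, and the assumption that you would save the output and import it with Regedit should be checked against your system before you rely on it.

```python
KEY = (r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control"
       r"\Session Manager\Memory Management")

VALUES = {
    "LargePageMinimum": 0xFFFFFFFF,     # effectively never map Ntoskrnl.exe with large pages
    "EnforceWriteProtection": 1,
}


def reg_file_text(key, values):
    """Render DWORD values as the text of a version 5.00 registry script."""
    lines = ["Windows Registry Editor Version 5.00", "", "[%s]" % key]
    for name, value in values.items():
        lines.append('"%s"=dword:%08x' % (name, value))
    return "\r\n".join(lines) + "\r\n"


print(reg_file_text(KEY, VALUES))
```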

Note

When the debugger has access to the image files included in a crash dump, the analysis internally executes the !chkimg debugger command to verify that a copy of an image in a crash dump matches the on-disk image and reports any differences. Note that !chkimg will always report discrepancies in Ntoskrnl.exe if you've enabled the Driver Verifier.
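
The idea behind such a comparison can be sketched as a byte-level diff of two copies of an image. This illustrates the concept only and makes no claim about how !chkimg is implemented; the differing_ranges function and the sample data are hypothetical.

```python
def differing_ranges(on_disk, in_dump):
    """Return (start, end) byte ranges, end exclusive, where the two copies differ."""
    ranges = []
    start = None
    for offset, (a, b) in enumerate(zip(on_disk, in_dump)):
        if a != b and start is None:
            start = offset                      # a mismatching run begins here
        elif a == b and start is not None:
            ranges.append((start, offset))      # the run ended just before this offset
            start = None
    if start is not None:
        ranges.append((start, min(len(on_disk), len(in_dump))))
    return ranges


clean = bytes(range(16))
patched = bytearray(clean)
patched[4:7] = b"\xcc\xcc\xcc"   # simulate three overwritten code bytes
print(differing_ranges(clean, patched))
```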


    Microsoft Windows Internals
    Microsoft Windows Internals (4th Edition): Microsoft Windows Server 2003, Windows XP, and Windows 2000
    ISBN: 0735619174
    Year: 2004
    Pages: 158
