The Command Window

Although WinDBG has numerous windows, such as the Call Stack window, that show you specific information, nearly all the work you'll do when debugging .NET applications will be in the Command window. In this section, I will cover the types of commands, how to get help, and the commands to view all modules and symbols currently loaded. Regardless of whether you're looking at minidumps or doing the wildest live debugging, you'll always need to get help and look at modules.

WinDBG has three types of commands: regular commands, meta commands (also called dot commands), and extension commands. These commands are generally described in the following ways. Regular commands control the debuggee. For example, walking the native stack, viewing thread information, and viewing memory are regular commands. Meta commands mostly control the debugger and the act of debugging. For example, creating log files, attaching to processes, and writing dump files are meta commands. Extension commands are where the action is; they are commands that dig into the debuggee and perform analysis on situations or states. Examples of extension commands include handle dumping, critical section analysis, and crash analysis. All the commands in SOS are extension commands.

The regular and meta commands are case insensitive, whereas extension commands are traditionally all lowercase. To keep consistent in this chapter, I will show all commands in their lowercase form when discussing them.
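As a rough illustration of the three command types, the following Python sketch guesses a command's type from its first character. It assumes the convention visible in this chapter's examples (meta commands start with a dot, extension commands with an exclamation point, everything else is a regular command); the function name classify_command is made up for this sketch and is not part of WinDBG.

```python
def classify_command(command: str) -> str:
    """Guess the WinDBG command type from the first character.

    Assumption: meta commands begin with '.', extension commands with '!',
    and anything else is treated as a regular command.
    """
    text = command.strip()
    if not text:
        raise ValueError("empty command")
    if text.startswith("."):
        return "meta"
    if text.startswith("!"):
        return "extension"
    return "regular"


if __name__ == "__main__":
    # Commands taken from this chapter.
    for cmd in ("lm", ".reload /f", "!sym noisy", "~2r"):
        print(f"{cmd:12} -> {classify_command(cmd)}")
```

The thread and process prefixes (~ and |) fall through to "regular" here, which matches how the chapter treats them.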

Getting Help

When you're staring at the blinking cursor at the bottom of the Command window wondering what command you need, it's time to turn to the Help. If you just need a tip on what a regular command name is or what its syntax is, the ? (Command Help) command will bring up a couple of pages of listings so that you can see information about the various regular commands. Some of the regular commands support passing -? as a parameter, so you can get quick help on their parameters. You'll have to use trial and error to find out which ones support -?. For meta commands, use .help (Meta-command Help) to see the quick listing. Help for extension commands depends on the order in which extensions are loaded, which I'll discuss later. However, because all your .NET debugging will be with the SOS extension, use !help to see help on SOS.

Probably the most important command is the .hh (Open HTML Help File) meta command. Passing any command type as a parameter to .hh will open the Debugger.chm help file to the Index tab with the specified command highlighted. Simply press Enter to see the help information for that command. I hope that in a future version of WinDBG, the development team will fix the .hh command so that it opens to the help topic of the specified command automatically.

When looking at the Help for a command in Debugger.chm, pay careful attention to the Environment section that appears with each command. The table in that section tells you the situations in which WinDBG can run the command. Obviously, for user-mode debugging, the Modes field will need to identify user mode. Nearly all the user-mode commands work during live debugging in addition to while looking at minidumps.

One thing that's not very clear in the help for any of the commands is why there is a complete lack of consistency when it comes to parameters you can pass to commands. Some commands take parameters you delimit with a hyphen, some take parameters that you delimit with a slash, and others take parameters that have no delimiters at all. Pay close attention to the documentation for how to specify parameters for any given command.

Ensuring That Correct Symbols Are Loaded

As I mentioned in Chapter 2, Visual Studio 2005 has drastically improved the symbol reporting so you can fix symbol loading problems more easily. WinDBG has similar support for symbol problems but requires a little more manual work to make it happen. One thing that sets WinDBG apart is that it employs lazy symbol loading, whereas Visual Studio always loads symbols when a module comes into the address space. Lazy loading makes sense if you remember that WinDBG is designed to debug the complete 40+ million lines of code of the Windows operating systems. If WinDBG loaded all the symbols, you'd quickly run out of memory in the address space!

Whenever the Command window is active, in other words, you're stopped in the debugger, the lm (List Loaded Modules) command will display the list of modules and their corresponding symbol files. As an example, I loaded a console program called SimpleConsoleApp.exe that called Trace.WriteLine and stopped after the trace statement completed. In the "Exceptions and Events" section later in the chapter, you'll see exactly how I was able to stop inside my .NET application running on the Windows XP Professional Tablet PC Edition 32-bit version. WinDBG stopped at the trace statement. Issuing the lm command shows the following output:

0:000> lm
start    end        module name
00400000 00408000   SimpleConsoleApp   (deferred)
5d090000 5d127000   comctl32_5d090000   (deferred)
69c30000 6a1d2000   System_Xml_ni   (deferred)
76390000 763ad000   IMM32      (deferred)
773d0000 774d2000   comctl32   (deferred)
774e0000 7761d000   ole32      (deferred)
77c10000 77c68000   msvcrt     (deferred)
77d40000 77dd0000   USER32     (deferred)
77dd0000 77e6b000   ADVAPI32   (deferred)
77e70000 77f01000   RPCRT4     (deferred)
77f10000 77f56000   GDI32      (deferred)
77f60000 77fd6000   SHLWAPI    (deferred)
78800000 78840000   mscoree    (deferred)
78850000 788a2000   mscorjit   (deferred)
788b0000 79336000   mscorlib_ni   (deferred)
796c0000 79c01000   mscorwks   (deferred)
7a430000 7a50a000   System_Configuration_ni   (deferred)
7a560000 7ac76000   System_ni   (deferred)
7c370000 7c409000   MSVCR80    (deferred)
7c800000 7c8f4000   KERNEL32   (pdb symbols)             \\Symbols\OSSymbols\kernel32.pdb\FB3...DF2\kernel32.pdb
7c900000 7c9b0000   ntdll      (pdb symbols)             \\Symbols\OSSymbols\ntdll.pdb\365...C02\ntdll.pdb
7c9c0000 7d1d4000   shell32    (deferred)
Unloaded modules:
60340000 60348000   culture.dll

Note that I trimmed down the two GUID values in the PDB file locations for Kernel32.pdb and Ntdll.pdb.

The preceding output shows that I have symbols loaded for Kernel32.dll and Ntdll.dll; the rest are marked as "(deferred)" because WinDBG hasn't had a reason to load them. If I were to attempt to walk the native call stack at this point, WinDBG would load the symbols for the modules with addresses on the stack.

You can probably guess what the output of the lm command tells you, but I want to mention two quick tidbits. The first is that the images whose names end in "_ni" are .NET binaries that have been NGen'd, that is, compiled to native images. You'll generally see only the Common Language Runtime (CLR) and Framework Class Library (FCL) assemblies NGen'd. If you're thinking that running NGen across your binaries sounds like a great idea to speed up your application, it won't help unless your application meets certain requirements, mainly that your assemblies are in the Global Assembly Cache (GAC) and you're sharing code across App Domains. See the MSDN Magazine February 2006 CLR Inside Out column by Claudio Caldato (http://msdn.microsoft.com/msdnmag/issues/06/02/CLRInsideOut/).

The unloaded modules report from the lm command is a list of modules that have been loaded through the Windows native API function LoadLibrary and freed with a call to FreeLibrary. In my simple example, you can see that Culture.dll, which is a DLL that's in the Framework directory, has been loaded and unloaded once.

To force a symbol load, the ld (Load Symbols) command does the trick. The ld command takes only a module name on the command line, so to force the loading of symbols for SimpleConsoleApp.exe, I'd issue ld SimpleConsoleApp and get the following output:

0:000> ld SimpleConsoleApp
*** WARNING: Unable to verify checksum for SimpleConsoleApp.exe
Symbols loaded for SimpleConsoleApp

WinDBG is very particular about symbols and tells you about anything that could potentially be wrong with the symbols. .NET binaries do not set the native Portable Executable (PE) checksum header field at all. Before the debuggers started relying on GUID values to uniquely identify symbol files, they used timestamps along with the checksum value. WinDBG is being a little old-school with the checksum warning, and because there's no way to set it in .NET applications, train yourself to ignore it.

The .reload (Reload Module) command tells WinDBG to reload all symbols. You'll always want to use the /f option with .reload so you ensure that WinDBG reloads the symbols. If you want to reload just a single module, specify that module after the /f. For example, to reload the symbols for Kernel32.dll, you'd issue the command .reload /f kernel32.dll. If you are working with an application that has many modules, .reload /f can run for a long while. WinDBG has gotten better about indicating that it's working instead of appearing to hang; the Command window entry area will show *BUSY* in the left active process and thread area. To abort the symbol loading or any other long-running command, press Ctrl+Break. You might have to press it numerous times, but WinDBG will eventually stop the command.

You can also verify proper symbol loading through the lm command. After forcing all symbols to load, the output of the lm command shows the following (I folded the last item on each line to fit the width of the page and truncated the GUID values. I also clipped out the middle portion of the output to avoid repetitive duplication and thus boredom on your part.):

0:000> lm
start    end        module name
00400000 00408000   SimpleConsoleApp C (private pdb symbols)
          c:\Dev\3Book\Disk\SimpleConsoleApp\bin\Debug\SimpleConsoleApp.pdb
5d090000 5d127000   comctl32_5d090000   (pdb symbols)
          \\Symbols\OSSymbols\comctl32.pdb\738...CB2\comctl32.pdb
69c30000 6a1d2000   System_Xml_ni C (pdb symbols)
          \\Symbols\OSSymbols\System.Xml.pdb\A6B...C61\System.Xml.pdb
76390000 763ad000   IMM32      (pdb symbols)
          \\Symbols\OSSymbols\imm32.pdb\2C1...162\imm32.pdb
773d0000 774d2000   comctl32   (pdb symbols)
          \\Symbols\OSSymbols\MicrosoftWindowsCommon-Controls-6.0.2600.2180-
          comctl32.pdb\C45...401\MicrosoftWindowsCommon-Controls-
          6.0.2600.2180-comctl32.pdb
. . .
7c900000 7c9b0000   ntdll      (pdb symbols)
          \\Symbols\OSSymbols\ntdll.pdb\365...C02\ntdll.pdb
7c9c0000 7d1d4000   shell32    (pdb symbols)
          \\Symbols\OSSymbols\shell32.pdb\290...6D2\shell32.pdb
Unloaded modules:
60340000 60348000   culture.dll

Those module names followed by a "C" indicate symbols that don't have the checksums set in the module, which as I explained, is any .NET assembly that is loaded. An octothorpe (#) following a module indicates symbols that don't match between the symbol file and the executable. (Yes, you can use the .symopt command to set WinDBG to load the closest symbols, even if they're not correct.) In the preceding example, life is good and all the symbols match.

Those symbols listed as having a type of "(pdb symbols)" indicate that they are stripped symbols that are loaded. Because all the native symbols from the operating system are stripped of their private type information and source and line tables, seeing "(pdb symbols)" means you have correct symbols. If you look at the output closely, you'll see another symbol type, "(private pdb symbols)" next to the SimpleConsoleApp module. For the modules you build, both managed and native, you'll see "(private pdb symbols)" indicating that you have all the private data accessible in the PDB file. As I mentioned earlier, just because WinDBG shows that your .NET assembly has all the private info doesn't mean that you can single-step through your .NET source code. If you have native DLLs that you built with full PDB symbols, you will see "(private pdb symbols)" reported for those DLL symbol loads also.

If, after loading symbols and issuing the lm command, you see anything other than "(pdb symbols)" or "(private pdb symbols)" after the module, symbols for that module were not loaded. Keep in mind that some modules loaded in your process may be from third parties, so you may not have symbols for them. To get a closer look at exactly which modules you have loaded, pass the v option to lm to see the detailed version information for all modules loaded. If you want to narrow down the verbose display to a single module, combine the v option with m to match a single module.

0:000> lm v m mscoree
start    end        module name
79000000 79045000   mscoree    (deferred)
    Image path: C:\WINDOWS\system32\mscoree.dll
    Image name: mscoree.dll
    Timestamp:        Fri Sep 23 07:30:38 2005 (4333E75E)
    CheckSum:         00045512
    ImageSize:        00045000
    File version:     2.0.50727.42
    Product version:  2.0.50727.42
    File flags:       0 (Mask 3F)
    File OS:          4 Unknown Win32
    File type:        2.0 Dll
    File date:        00000000.00000000
    Translations:     0409.04b0
    CompanyName:      Microsoft Corporation
    ProductName:      Microsoft® .NET Framework
    InternalName:     mscoree.dll
    OriginalFilename: mscoree.dll
    ProductVersion:   2.0.50727.42
    FileVersion:      2.0.50727.42 (RTM.050727-4200)
    FileDescription:  Microsoft .NET Runtime Execution Engine
    LegalCopyright:   © Microsoft Corporation. All rights reserved.
    Comments:         Flavor=Retail
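The rule of thumb for reading lm output (anything other than "(pdb symbols)" or "(private pdb symbols)" means symbols are not loaded) is easy to automate if you capture the Command window to a log file. The following Python sketch is only an illustration under that assumption: the function name and regular expression are mine, and the sample lines are adapted from the listings in this section, with each module and its PDB path on a single line, the way the first lm listing prints them.

```python
import re

# Symbol states that mean symbols really are loaded, per the rule of thumb.
LOADED_STATES = {"pdb symbols", "private pdb symbols"}

# start address, end address, module name, optional "C" or "#" flag,
# then the symbol state in parentheses. Anything after that is ignored.
MODULE_LINE = re.compile(
    r"^(?P<start>[0-9a-f`]+)\s+(?P<end>[0-9a-f`]+)\s+"
    r"(?P<module>\S+)\s+(?:(?P<flag>[C#])\s+)?\((?P<state>[^)]*)\)",
    re.IGNORECASE,
)


def modules_without_symbols(lm_output: str) -> list[str]:
    """Return module names whose state is not a loaded-symbol state."""
    missing = []
    for line in lm_output.splitlines():
        match = MODULE_LINE.match(line.strip())
        if match and match.group("state") not in LOADED_STATES:
            missing.append(match.group("module"))
    return missing


SAMPLE = r"""
start    end        module name
00400000 00408000   SimpleConsoleApp C (private pdb symbols)   c:\Dev\3Book\Disk\SimpleConsoleApp\bin\Debug\SimpleConsoleApp.pdb
77c10000 77c68000   msvcrt     (deferred)
7c900000 7c9b0000   ntdll      (pdb symbols)    \\Symbols\OSSymbols\ntdll.pdb\365...C02\ntdll.pdb
Unloaded modules:
60340000 60348000   culture.dll
"""

if __name__ == "__main__":
    print(modules_without_symbols(SAMPLE))
```

Header lines and the unloaded modules report do not match the pattern, so they are skipped rather than reported as missing symbols.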

To see exactly where WinDBG is loading symbols and why, the extension command !sym from Dbghelp.dll, which is automatically loaded into WinDBG, offers the noisy option. To look at all your symbol loading issues, run the following commands in order:

.reload
!sym noisy
ld *

The .reload command will unload all unused symbols first before you attempt to reload them. The output in the Command window shows you exactly what process the WinDBG symbol engine goes through to find and load the symbols. Armed with the output, you should be able to solve any possible symbol-loading problem that you'll encounter. To turn off the noisy output, issue the !sym quiet command.

Processes and Threads

With the symbol story behind us, I can now turn to the various means of getting processes running under WinDBG. Like Visual Studio, WinDBG can debug any number of disparate processes at a time. What makes WinDBG a little more interesting is that you have better control over debugging processes spawned from a process being debugged.

Debugging Child Processes

If you look back at the Open Executable dialog box in Figure 6-4, you'll notice that the very bottom of the dialog box has a Debug Child Processes Also check box. By selecting it, you're telling WinDBG that you also want to debug any processes started by debuggees. If you forget to select that check box when opening a process, you can use the .childdbg (Debug Child Processes) command to change the option on the fly. By itself, .childdbg will tell you the current state. Issuing a .childdbg 1 command will turn on debugging child processes. Issue .childdbg 0 to turn it off.

To show you some of the multiple process and thread options, in the next section, I'll provide some of the output resulting from debugging the command prompt, Cmd.exe, and choosing to debug child processes also. After I get Cmd.exe loaded up and executing, I'll start Notepad.exe. If you follow the same steps and have child debugging enabled, as soon as you start Notepad.exe, WinDBG will stop at the loader breakpoint for Notepad.exe. It makes sense that WinDBG stopped Notepad.exe, but that also stops Cmd.exe because both processes are now sharing the debugger loop. The debugger loop is the thread in the debugger looping between the WaitForDebugEvent and ContinueDebugEvent APIs as events happen in the debugger.

To see the processes that are currently running in the UI, choose Processes And Threads from the View menu. You'll see a layout similar to that in Figure 6-5. In the Processes And Threads window, the processes are the root nodes, with each process's threads as its children. The numbers next to Cmd.exe, 000:C1C, are the WinDBG process number followed by the Win32 process ID. In Cmd.exe, the thread 000:E04 indicates the WinDBG thread ID and the Win32 thread ID. The WinDBG process and thread numbers are unique the entire time WinDBG is running. That means there can never be another process number 1 until I restart WinDBG. The WinDBG process and thread numbers are important because they are used to set per-process and native per-thread breakpoints and can be used as modifiers to various commands.

Figure 6-5. The Processes And Threads window

Viewing Processes and Native Threads in the Command Window

As with anything in WinDBG, if WinDBG displays it in a window, there's a Command window command to get the same information. To view the processes being debugged, the | (Process Status) command does the trick. The output for the two processes shown in Figure 6-5 is as follows:

1:001> |
   0   id: c1c   create   name: cmd.exe
.  1   id: ff4   child   name: notepad.exe

The dot in the far left column indicates the active process, meaning that any commands you execute will be working on that process. The other interesting field is the one that tells how the process came to run under the debugger. "Create" means WinDBG created the process, and "child" indicates a process that was spawned by a parent process.

The overloaded s command (|s for Set Current Process and ~s for Set Current Thread) does the work of changing which process is active. You can also use the Processes And Threads window and double-click the process you'd like to make active. The bold font indicates the active process. When using the s command, you need to specify the process as a prefix to the command. For example, to switch from the second process to the first, you'd issue |0s. To quickly see which process is active, look at the numbers to the left of the Command window input line. As you swap between the processes, you'll see the numbers update. When I switched to the first process using the Cmd.exe and Notepad.exe examples and issued the | command again, the output looked a little different:

0:000> |
.  0   id: c1c   create   name: cmd.exe
#  1   id: ff4   child   name: notepad.exe

The difference is the octothorpe in front of the Notepad.exe process. The octothorpe indicates the process that caused the exception to stop in WinDBG. Because Notepad.exe is sitting at its loader breakpoint, the exception was a breakpoint. If the active process is the one that had the exception, the dot overrides the octothorpe display.

Viewing native threads is almost identical to viewing processes. I'm going to let Notepad.exe start, so I'll press F5 in WinDBG (alternatively, I could issue the g (Go) command in the Command window). When Notepad.exe appears, I'll open the Open dialog box by choosing Open from the File menu because it creates a bunch of threads, and in WinDBG, I'll press Ctrl+Break to break into the debugger. If you do the same and have the Processes And Threads window open, you should see that Notepad.exe has multiple threads in it and Cmd.exe has two threads.

The ~ (Thread Status) command shows the active threads in the current process. Switching to the Notepad.exe process and issuing the ~ command creates the following output on Windows XP Tablet PC Edition:

1:001> ~
.  1  Id: ff4.b10 Suspend: 1 Teb: 7ffdd000 Unfrozen
   2  Id: ff4.fc0 Suspend: 1 Teb: 7ffdc000 Unfrozen
   3  Id: ff4.eec Suspend: 1 Teb: 7ffdb000 Unfrozen
   4  Id: ff4.e48 Suspend: 1 Teb: 7ffda000 Unfrozen
   5  Id: ff4.d58 Suspend: 1 Teb: 7ffd9000 Unfrozen
   6  Id: ff4.f0c Suspend: 1 Teb: 7ffd8000 Unfrozen
   7  Id: ff4.b78 Suspend: 1 Teb: 7ffd7000 Unfrozen
   8  Id: ff4.e34 Suspend: 1 Teb: 7ffd6000 Unfrozen

As with the | command, the ~ command uses a dot to indicate the current thread and an octothorpe to signify the thread that either caused the exception or was active when the debugger attached. The WinDBG thread number is the next displayed item. As with process numbers, there will be only one thread number 2 for the life of the WinDBG instance. Next come the ID values, which are the Win32 process ID followed by the thread ID. The suspend count is a little confusing. A suspend count of 1 indicates the thread is suspended because you are doing live debugging and are stopped in the debugger. A suspend count of 0 will be shown if you are doing noninvasive debugging, which I'll talk about later in the chapter. If the suspend count is greater than 1, that means there have been multiple calls to the SuspendThread API done on that thread. After the suspend count is the linear address of the Thread Environment Block (TEB) for the thread. The TEB is the same as the Thread Information Block (TIB), where the Windows operating systems store the state of the native thread in memory. Finally, Unfrozen indicates whether you've used the ~f (Freeze Thread) command to freeze a thread. (Freezing a thread from the debugger is akin to calling SuspendThread on that thread from your program. You'll stop that thread from executing until it is unfrozen.)
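To make the field-by-field description of the ~ output concrete, here is a minimal Python sketch that splits one line of that output into the pieces just described. It is an illustration only: the ThreadStatus type and parse_thread_line function are hypothetical names, and I am assuming that the WinDBG thread number and suspend count are printed in decimal while the IDs and TEB address are hexadecimal.

```python
import re
from dataclasses import dataclass
from typing import Optional

# marker (. or #), WinDBG thread number, pid.tid, suspend count, TEB, state
THREAD_LINE = re.compile(
    r"^\s*(?P<marker>[.#])?\s*(?P<number>\d+)\s+"
    r"Id:\s+(?P<pid>[0-9a-f]+)\.(?P<tid>[0-9a-f]+)\s+"
    r"Suspend:\s+(?P<suspend>\d+)\s+"
    r"Teb:\s+(?P<teb>[0-9a-f`]+)\s+"
    r"(?P<state>\w+)",
    re.IGNORECASE,
)


@dataclass
class ThreadStatus:
    marker: Optional[str]   # "." current thread, "#" event thread, or None
    number: int             # WinDBG thread number (assumed decimal)
    process_id: int         # Win32 process ID (hex in the output)
    thread_id: int          # Win32 thread ID (hex in the output)
    suspend_count: int
    teb: int                # address of the Thread Environment Block
    frozen: bool


def parse_thread_line(line: str) -> ThreadStatus:
    match = THREAD_LINE.match(line)
    if match is None:
        raise ValueError(f"not a thread status line: {line!r}")
    return ThreadStatus(
        marker=match.group("marker"),
        number=int(match.group("number")),
        process_id=int(match.group("pid"), 16),
        thread_id=int(match.group("tid"), 16),
        suspend_count=int(match.group("suspend")),
        # 64-bit addresses are printed with a backtick in the middle.
        teb=int(match.group("teb").replace("`", ""), 16),
        frozen=match.group("state").lower() == "frozen",
    )


if __name__ == "__main__":
    print(parse_thread_line(".  1  Id: ff4.b10 Suspend: 1 Teb: 7ffdd000 Unfrozen"))
```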

A command will work on the current thread by default, but sometimes you'll want to see information about a different thread. For example, to see the registers of a different thread, you use the thread modifier in front of the r (Registers) command: ~2r. If you have multiple processes open, you can also apply the process modifier to the commands. The command |0~0r shows the registers for the first process and first thread no matter which process and thread are active.

One trick with the ~ command that is not documented is that if you issue ~ followed by a thread number, WinDBG will display the thread's starting address, the priority, and the priority class. If you issue ~*, you'll see the detailed data for all threads.

0:002> ~0
   0  Id: f84.a0c Suspend: 1 Teb: 000007ff`fffde000 Unfrozen
      Start: 11000000`00016a3e
      Priority: 0  Priority class: 32

Creating Processes from the Command Window

Now that you know how to view processes and threads, I can move into some of the more advanced tricks that you can perform to get processes started under WinDBG. When stopped in WinDBG, the .create (Create Process) command lets you start up any arbitrary process on the machine. This is extremely helpful when you need to debug multiple sides of a COM+ or other cross-process application. The main parameters to .create are the complete path to the process to start and any command-line parameters to that process. As when you start any process, it's best to put the path and process name in quotation marks to avoid issues with spaces. The following code shows the use of the .create command to start Solitaire on one of my development machines:

.create "c:\windows\system32\sol.exe"

After pressing Enter, WinDBG indicates that the process will be created on the next execution. What that means is that WinDBG must allow the native debugger loop to spin over in order to handle the process creation notification. WinDBG has already made the CreateProcess API call, but the debugger hasn't seen it yet. By pressing F5, you will release the debug loop. The create process notification comes through, and WinDBG will stop on the loader breakpoint. If you use the | command to view the processes, WinDBG shows any processes started with .create marked as "create" as if you started the session with that process.

Attaching to and Detaching from Processes in the Command Window

If a process is already running on the machine and you want to debug it, the .attach (Attach to Process) command does the trick. The .attach command requires the process ID in order to perform the attach. If you have physical access to the machine the process is running on, you can look up the process ID with Task Manager, but for remote debugging, that's a little hard to do. Fortunately, the WinDBG developers thought of everything and added the .tlist (List Process IDs) command to list the running processes on the machine. If you're debugging Win32 services, use the -v parameter to .tlist to see which services are running in which processes. The output of the .tlist command looks like the following:

0n3364 C:\WINDOWS\system32\sol.exe
 0n496 C:\Program Files\Windows NT\Pinball\PINBALL.EXE
0n3348 C:\WINDOWS\system32\inkball.exe

When I first saw the output, I thought there was a bug in the command and somebody accidentally typed "0n" instead of "0x." However, I've since learned that 0n is the debugger's prefix for decimal numbers in the same way 0x is its prefix for hexadecimal.

Once you have the decimal process ID for the process, you'll pass it as the parameter to .attach (ensuring that you use the 0n prefix or it won't work). As it does when creating processes, WinDBG will say something about the attach occurring on the next execution, so you'll need to press F5 to let the debugger loop spin. From that point on, you're debugging the process you attached to. The only difference is that the | command will report the process as "attach" in its output.
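To see why the 0n prefix matters, here is a small Python sketch of how a number is interpreted with and without it. It assumes the debugger's default radix of 16, which is why a bare process ID is read as hexadecimal; parse_debugger_number is a hypothetical helper for illustration, not a WinDBG API, and it handles only the 0n and 0x prefixes.

```python
def parse_debugger_number(text: str, default_radix: int = 16) -> int:
    """Interpret a number the way the prefixes discussed above suggest.

    Assumptions: "0n" forces decimal, "0x" forces hexadecimal, and anything
    else is read in the default radix (16 unless changed).
    """
    lowered = text.strip().lower()
    if lowered.startswith("0n"):
        return int(lowered[2:], 10)
    if lowered.startswith("0x"):
        return int(lowered[2:], 16)
    return int(lowered, default_radix)


if __name__ == "__main__":
    pid = parse_debugger_number("0n3364")        # the Solitaire entry from .tlist
    print(pid, hex(pid))                         # the same ID in decimal and hex
    print(parse_debugger_number("3364"))         # without 0n: read as hex 3364
```

Without the prefix, 3364 is taken as a hexadecimal value, which is a completely different process ID from the one .tlist reported. The hexadecimal form is also what the | command displays in its id field.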

To let a process you're debugging run free outside the debugger once again, use the .detach (Detach from Process) command. Because it works only on the current process, you'll need to switch to the process you want to detach from before you execute the .detach command. At any point, you can reattach to the process to do full debugging.

If you look at the help for the .attach command or carefully at the Attach to Process dialog box when you start WinDBG, you'll see a reference to a noninvasive attach. This type of attach was originally put in for supporting operating systems older than Windows XP and Microsoft Windows Server 2003. On prior operating systems, once you had a debugger attached to a process, that debugger was attached to that process for life. If the debugger shut down, so did the debuggee.

The noninvasive attach calls the SuspendThread API on the target process threads, and WinDBG just reads memory from the suspended process. You aren't debugging the process; you are examining it. Now that we are on later operating systems that support true detaching while debugging, the need for the noninvasive attach has lessened. However, when we get to the ADPlus discussion later in this chapter, you'll see that the noninvasive attach is used when ADPlus is doing a hang mode configuration. That's so ADPlus runs on Microsoft Windows 2000 and prior operating systems.

The WinDBG noninvasive attach allows me to discuss one of my favorite debugging tricks. If you're doing pure managed debugging in Visual Studio and you reach a point at which you want to load SOS and look at the heap an object is in, you're completely out of luck. If you run into that scenario, you can break into the process in the Visual Studio debugger. With the debuggee stopped in the debugger, start WinDBG and noninvasively attach to the suspended debuggee. Then you can perform any informational command that WinDBG offers, such as !handle, and even load SOS. Now you have the best of both worlds even if you forgot to start mixed mode debugging from the beginning. When you want to get back to debugging the process with Visual Studio, you'll need to use the q (Quit) command in WinDBG to end the noninvasive attach.

Walking the Native Stack

Because WinDBG is a native-only debugger, it's important to look at the native stack so you have a fighting chance of seeing how your .NET application is working with the operating system. To get that native call stack, the k (Display Stack Backtrace) command with one of its modifiers does the trick. The most useful modifier is P, which will show any native function parameters and their types for those modules for which you have private .pdb files. Because parameter information is part of the private native data in a .pdb file, you'll see them only for your native DLLs, but that can be a lifesaver for seeing what's going on with your interop code.

In the following example, I stopped a very simple .NET program on the 64-bit version of Windows XP as it made a call to the Windows API function, OutputDebugString, which DefaultTraceListener calls whenever you use Trace.Write*. You will see slightly different output if you try the same operation on a 32-bit operating system.

0:000> kP
Child-SP          RetAddr           Call Site
00000000`0012e990 00000000`78d9fb19 KERNEL32!RaiseException+0x5c
00000000`0012ea60 00000000`78d9f743 KERNEL32!OutputDebugStringA+0x76
00000000`0012ed60 00000000`75ecce24 KERNEL32!OutputDebugStringW+0x42
00000000`0012edb0 00000000`794769e5 mscorwks!DoNDirectCall__PatchGetThreadCall+0x78
00000000`0012ee50 00000000`79476b75 System_ni+0x2269e5
00000000`0012ef20 00000000`79476c11 System_ni+0x226b75
00000000`0012ef80 00000000`79467089 System_ni+0x226c11
00000000`0012efc0 00000000`1a7501cd System_ni+0x217089
00000000`0012f050 00000000`75ecf422 0x1a7501cd
00000000`0012f080 00000000`75d9cb5a mscorwks!CallDescrWorker+0x82
00000000`0012f0d0 00000000`75d9afd3 mscorwks!CallDescrWorkerWithHandler+0xca
00000000`0012f170 00000000`75cf099a mscorwks!MethodDesc::CallDescr+0x1b3
00000000`0012f3b0 00000000`75e56775 mscorwks!ClassLoader::RunMain+0x22e
00000000`0012f610 00000000`75e2ebe8 mscorwks!Assembly::ExecuteMainMethod+0xb9
00000000`0012f900 00000000`75e6a523 mscorwks!SystemDomain::ExecuteMainMethod+0x3f0
00000000`0012feb0 00000000`75e78205 mscorwks!ExecuteEXE+0x47
00000000`0012ff00 00000000`7401a726 mscorwks!CorExeMain+0xb1
00000000`0012ff50 00000000`78d5965c mscoree!CorExeMain+0x46
00000000`0012ff80 00000000`00000000 KERNEL32!BaseProcessStart+0x29

When reading the output, note that the k command shows the module and function separated by an exclamation point. The line "KERNEL32!OutputDebugStringA+0x76" indicates that the module is Kernel32.dll, and the function is OutputDebugStringA. The "+0x76" is the offset into the function where the address lies. Because the native stack walk needs either the .pdb files from Microsoft or your native .pdb files to work correctly, you should generally see offsets less than 0x100 when symbols are good (though there may be a few larger offsets that are still correct). The native stack walk simply looks for the closest symbols, and if .pdb files are missing for a particular module, you may see extremely large offsets on the functions. I just wanted to give you some advance warning of what happens when you look at your own native stack walks.
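The module!function+offset format and the 0x100 rule of thumb can be sketched in a few lines of Python. This is only an illustration: CallSite, parse_call_site, and looks_suspicious are hypothetical names, and the 0x100 limit is the rough guideline from the text, so a flagged frame deserves a second look rather than a verdict.

```python
import re
from typing import NamedTuple, Optional


class CallSite(NamedTuple):
    module: Optional[str]
    function: Optional[str]
    offset: int


# module, optional "!function", optional "+0xoffset"
CALL_SITE = re.compile(
    r"^(?P<module>[^!+\s]+)(?:!(?P<function>[^+\s]+))?"
    r"(?:\+0x(?P<offset>[0-9a-f]+))?$",
    re.IGNORECASE,
)


def parse_call_site(text: str) -> CallSite:
    """Split the Call Site column of k output into its parts."""
    text = text.strip()
    if text.lower().startswith("0x"):
        # A bare address: the frame is not in any known module.
        return CallSite(None, None, 0)
    match = CALL_SITE.match(text)
    if match is None:
        raise ValueError(f"unrecognized call site: {text!r}")
    offset = int(match.group("offset") or "0", 16)
    return CallSite(match.group("module"), match.group("function"), offset)


def looks_suspicious(site: CallSite, limit: int = 0x100) -> bool:
    """Flag frames with no function name or an offset above the limit."""
    return site.function is None or site.offset > limit


if __name__ == "__main__":
    for text in ("KERNEL32!OutputDebugStringA+0x76",
                 "System_ni+0x2269e5",
                 "0x1a7501cd"):
        site = parse_call_site(text)
        print(text, "->", "check" if looks_suspicious(site) else "ok")
```

Frames such as System_ni+0x2269e5 have no function name at all, which is what a module without loaded symbols looks like in the stack walk.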

If you have good symbols for both your application and the operating system, which you had better have after reading this book, you'll get good call stacks out of the k command. However, you will see lines in the call stack output that say, "WARNING: Frame IP not in any known module. Following frames may be wrong." That means that the k command is forced to take some guesses at the stack and may be incorrect because the native stack walking code knows nothing of .NET. Other warning notes, such as "WARNING: Unable to verify checksum for module_name," are benign and are output by the symbol loading portions of WinDBG. You'll also see lines that start with "ERROR: Module load completed but symbols could not be loaded for module_name." If the module is one of the native image modules (its module name ends with "_ni"), you can ignore that message. As we've seen, WinDBG knows how to deal with those.

The good news is that the k command does go to heroic lengths to attempt the stack walk. In the previous example, you can see that it picks up somewhere in a module called MScorwks.dll, which is the main Common Language Runtime (CLR) DLL.

At the very bottom of the stack walk is BaseProcessStart in Kernel32.dll. For the main thread of the application, this is the equivalent of going back to the beginning of time. If this happened to be a call stack for a thread other than the main thread, the call stack would go back to BaseThreadStart. If you have a call stack all the way to BaseProcessStart or BaseThreadStart, the odds are excellent that the native call stack is perfect. Unfortunately, on x86, it's relatively rare that you'll see your mixed managed and native call stacks going back that far because of the various native calling conventions and the requirement of Frame Pointer Omission data from the .pdb file. On x64, the call stacks will always walk back to Base*Start because there's only a single calling convention. If you do have cases in which your call stacks don't go back to Base*Start, keep in mind that WinDBG stops at 50 calls, so if you have a deeper stack, pass a number after the kP command to indicate how far you want to go. It's rare that you have more than 250 items on the stack, so I'm in the habit of issuing the command kP 250 to ensure that I get the full stack. Although we can't always walk native stacks back to the beginning of time, the good news is that all your managed stacks will walk correctly even if you don't have symbols. Such is the beauty of full metadata in the .NET world.

Before moving on to exceptions and events, you should know two additional tricks for looking at native stacks. The first is that if you want to look at the call stack for a particular thread that is not the current thread, you can prefix the k command with the thread. For example, to see the native stack for the fifth thread, you'd issue the ~4kP command. (Remember, the thread numbers start at zero.) The second is that instead of typing each thread manually to see its stack, you can use the special prefix ~*e, which tells the debugger to execute the command following the e on each thread. Thus, issuing the ~*ekP command walks all the threads' call stacks in one fell swoop.

In discussing the k command, I've been assuming that you do not have a corrupted stack register; otherwise, stack walking could not occur. The native stack can also be wrong if you don't have the .pdb file for some of the binaries. If you're dealing with bad native stack issues, the secret killer debugging trick is to use the dps (Display Words and Symbols) command with the stack register as the memory to analyze. For all CPUs, you'll pass the pseudo register, $csp, which will use the appropriate stack register for the operating "bitness." The command, dps $csp, will treat the values on the stack as addresses to look up in the native symbol tables. It's similar to dumping memory, but instead, it will attempt the symbol lookup for every address found. The dps command remembers the last address it displayed, so to continue up the stack, issue the dps command with no parameters.

The 'p' in dps stands for pointer and uses the architecture-specific pointer size when looking up values. The cousin commands, dds and dqs, treat pointers as double word and quad word size respectively.

If the stack pointer register is pointing to 0, dps obviously won't do you much good, but if you think you know where the stack lies for the thread, it can be a lifesaver to get you going. The following shows the dps $csp command output when I randomly stopped in an application:

0:004> dps $csp
00000000 042bec28  00000000 77d6cfbb KERNEL32!WaitForMultipleObjectsEx+0x1cf
00000000 042bec30  00000000 02770b10
00000000 042bec38  00000642 7f4d530e mscorwks!ClrFlsGetValue+0xe
00000000 042bec40  00000642 787af1d0 mscorlib_ni+0x7af1d0
00000000 042bec48  00000000 042bee40
00000000 042bec50  00000000 00000000
00000000 042bec58  00000642 7f4f4874 mscorwks!EETypeHashTable::FindItem+0x44
00000000 042bec60  00000642 78826dd8 mscorlib_ni+0x826dd8
00000000 042bec68  00000000 042becd0
00000000 042bec70  00000000 00000001
00000000 042bec78  00000000 00000000
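Output like this is easy to post-process once it has been saved to a log file. The following is a minimal Python sketch, not part of WinDBG, that picks out the stack slots for which dps resolved a symbol. The function name parse_dps and the regular expression are assumptions based only on the line layout shown above; Python is used here purely as a neutral scripting language.

```python
import re

# One dps row: stack address, value stored there, optional symbol.
# Each 64-bit number appears as two 8-digit halves separated by a
# space or a backtick (an assumption based on the layout shown above).
ROW = re.compile(
    r"^\s*([0-9a-f]{8})[ `]([0-9a-f]{8})\s+"
    r"([0-9a-f]{8})[ `]([0-9a-f]{8})(?:\s+(\S.*?))?\s*$",
    re.IGNORECASE,
)

def parse_dps(lines):
    """Return (stack_address, value, symbol_or_None) for each dps row."""
    rows = []
    for line in lines:
        m = ROW.match(line)
        if m is None:
            continue  # prompt lines, blank lines, and so on
        address = int(m.group(1) + m.group(2), 16)
        value = int(m.group(3) + m.group(4), 16)
        rows.append((address, value, m.group(5)))
    return rows

sample = """\
0:004> dps $csp
00000000 042bec28  00000000 77d6cfbb KERNEL32!WaitForMultipleObjectsEx+0x1cf
00000000 042bec30  00000000 02770b10
00000000 042bec38  00000642 7f4d530e mscorwks!ClrFlsGetValue+0xe
""".splitlines()

# Show only the slots that look like return addresses into known code.
for address, value, symbol in parse_dps(sample):
    if symbol:
        print(f"{address:#x}: {symbol}")
```

Slots without a symbol are kept in the returned list on purpose, because on a damaged stack the plain data values between the symbols are often what reveal where a frame begins.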

Exceptions and Events

The operating system knows when a debuggee is running under a debugger, and whenever the debuggee performs a specific set of actions, the operating system suspends all the threads in the debuggee and notifies the debugger that a debugging event occurred. Without knowing anything about debuggers, you can probably guess that the set of debugging events is: process create, process exit, thread create, thread exit, module load, module unload, a call to the OutputDebugString API, and an exception occurring. By the way, the exceptions are Windows exceptions and are reported through Structured Exception Handling (SEH). .NET 2.0 implements exceptions internally with SEH, but that's not to say that the internals won't change in future releases.

Most debuggers, including Visual Studio, allow you to perform actions only when various exceptions occur; WinDBG gives you much more control. For example, if you're doing live debugging, and you want to stop when a particular module is first loaded into memory, you can do so. Where WinDBG gets even more interesting is that it allows you to associate any WinDBG commands, including SOS commands, with events so that you can do extremely powerful live debugging.

Although you may not be doing a lot of live debugging of your .NET applications with WinDBG, the extra event and exception handling will pay for themselves many times over. In the last chapter, I discussed the improved Exceptions dialog box in Visual Studio, in which you could have the debugger stop each time your application throws an exception. The one problem is that, as you saw, Visual Studio reports the exceptions with a debugger dialog box, and you have to physically click the OK button to continue the execution.

What would be a lot better, especially when testing your code, is if you could have the debugger see that you had an exception, automatically perform common operations, and continue execution with no input required. That's exactly what you can do with WinDBG's exception and event handling. For example, if you assigned the command kP;gc to .NET exceptions, you'd get the native call stack in the Command window. The semicolon is a command separator, so it acts as a press of the Enter key. The gc command continues execution based on how you were executing, such as native code single stepping.

In the Command window, the sx, sxd, sxe, sxi, sxn (Set Exceptions) commands do all the work, and I'll talk about them more in a minute. However, the easy way to set how you want debugging event handling is through the Event Filters dialog box, which you can get to by choosing Event Filters in the Debug menu and is shown in Figure 6-6. To access the Event Filters dialog box, the debuggee will need to be stopped in WinDBG.

Figure 6-6. The Event Filters dialog box

Even with a dialog box to help you, it's still a little confusing to figure out what happens with an exception because WinDBG uses some odd terminology in the sx* commands and the Event Filters dialog box. The Execution section near the lower right corner of the dialog box indicates how you want WinDBG to handle the exception when it is first thrown. Table 6-1 explains the meanings of the values in the Execution group box.

Table 6-1. Exception Break Status
Status	Description
Enabled	When the exception or event occurs, the target breaks into the debugger.
Disabled	On the first chance, the debugger does not break on the exception or event (but reports it in the Command window). If it goes unhandled and comes back as a second chance, execution halts and the target breaks into the debugger.
Output	When the exception or event occurs, it doesn't break into the debugger. However, a message informing the user of this exception is displayed.
Ignore	When the exception or event occurs, the debugger ignores it. No message is displayed.
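The four rows of Table 6-1 can be read as a small decision function. The Python sketch below is only a model of the table, not of WinDBG's implementation. Reading the table's two occurrences as the first-chance and second-chance notifications explained later in this section is an interpretation, as is the assumption that Enabled also reports a message; the name on_exception is hypothetical.

```python
# Status names follow Table 6-1. Treating the two occurrences as
# first-chance and second-chance notifications is an interpretation.
def on_exception(status, chance):
    """Return (breaks_into_debugger, shows_message) for one notification."""
    if status == "enabled":
        return (True, True)   # message on break is an assumption
    if status == "disabled":
        return (chance == "second", True)
    if status == "output":
        return (False, True)
    if status == "ignore":
        return (False, False)
    raise ValueError(f"unknown status: {status}")

# Print the whole decision table for a quick side-by-side comparison.
for status in ("enabled", "disabled", "output", "ignore"):
    for chance in ("first", "second"):
        breaks, message = on_exception(status, chance)
        print(f"{status:8} {chance:6} break={breaks} message={message}")
```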

You can ignore the Continue section in the lower right corner. It's important only when you want different handling on breakpoint, single-step, and invalid-handle exceptions. If you add your own structured exception handling (SEH) errors to the list, leave the Continue option at the default, Not Handled. That way, any time the exception comes through WinDBG, WinDBG will properly pass the exception directly back to the debuggee. You don't want the debugger "eating" exceptions other than those it caused, such as a breakpoint or a single step.

After selecting a particular exception, the most important button in the dialog box is the Commands button. The name alone should give you a hint about what it does. Clicking the Commands button brings up the Filter Command dialog box shown in Figure 6-7. The first edit control is misnamed and should be labeled First-Chance Exception.

Figure 6-7. Filter Command dialog box

The terms first chance and second chance are from the native debugging side of the house, but they still apply to .NET debugging. The difference between these two terms has to do with when a debugger sees the exception and what happens to that exception.

If your application is running along and it encounters an exception, the operating system debugging API suspends the debuggee (your application) at the instruction that caused the exception. The debugger is notified that an exception has occurred in the debuggee and that it's the first chance the debugger has to look at it, hence the name first chance. The other way to think about this is that the first chance exception is when your code throws the exception.

The debugger looks at the exception and has two decision paths. If the debugger caused the exception in the debuggee, it has to undo the changes it made to the debuggee's state so it doesn't corrupt the debuggee. If you're wondering why a debugger would be messing with the debuggee's state, it's actually quite common; the canonical example is a native breakpoint. When handed the first chance exception, the debugger looks at the address and type of the exception. If the address matches the location where the debugger sets a breakpoint, and the exception is a breakpoint, the debugger does its magic to make the breakpoint disappear and stops in the debugger for the developer. Because the debugger is "eating" the exception, there's no way the debuggee can ever see that a breakpoint occurred.

The other option the debugger has with the first chance exception is, after looking at it, to proclaim, "This isn't mine," and hand it back to the debuggee. The operating system restarts the debuggee to let the exception be treated normally in the context of the debuggee's exception-handling code. If the debuggee has exception handling, and a catch block handles the exception, the debuggee continues execution from that catch block. The debugger reports that it saw the exception so you'll know a handled exception occurred.

If the exception handed back to the debuggee causes the exception handling to unwind to the final protecting exception handler set by Windows, it's an unhandled exception and the application crashes. At that point, the operating system again suspends the debuggee process and notifies the debugger that the exception is the second chance the debugger has had to see the exception. With the application crashed, the debugger will stop and show the origin location of the exception.

To set up WinDBG to execute a command whenever a .NET exception occurs, open the Event Filters dialog box and look for CLR Exception about halfway down the list. Select CLR Exception, and then click the Commands button to specify what will be executed whenever any CLR exception occurs.

When a CLR exception does occur, you will see a message like the following in the Command window:

(d20.c1c): CLR exception - code e0434f4d (first chance)

The string in parentheses is the Windows process and thread ID of the process and thread having the CLR exception. The value 0xE0434F4D is the SEH exception for .NET exceptions. If you stare at that value long enough, you may see an interesting value encoded in it. I'll leave it as an exercise for the reader to use WinDBG's .formats (Show Number Formats) command to see the hidden secret meaning.

If you want to stop on each .NET exception in WinDBG, you've seen how you can use the Event Filters dialog box to set the Execution to Enabled for CLR Exception to stop in the debugger whenever they are triggered. To do the same in the Command window, you'll use the sxe variant of the sx command, which enables the exception, and pass the exception value as the parameter to the command: sxe clr. The sx family of commands has predefined values for the common exceptions, which is why you can use clr. If you wanted to be hard-core, you could use the equivalent command sxe e0434f4d.

If you're really feeling hard-core and looking at the Event Filters dialog box will take up too much of your precious time, you can also specify the command string you want to execute when the exception occurs. To do the same action at the command line that we did earlier, telling WinDBG to walk the stack and continue when any CLR exception occurs, you can use the command sxe -c "kp;gc" clr.

If you've looked closely at the Event Filters dialog box in Figure 6-6, you may have caught a glimpse of an event type called CLR Notification Exception nestled under the CLR Exception event. The name sounds interesting, and my first thought was that it had something to do with the Managed Debugging Assistants (MDA) I talked about in Chapter 4. My assumption was wrong; it has nothing to do with MDA. It turns out that the CLR notifications are undocumented, and I've never gotten them to work. However, the SOS !bpmd command uses them to assist in setting breakpoints.

Stopping on Trace Statements

One other exception type that comes in extremely handy when doing live debugging is the strangely named Debuggee output. That's an odd way of saying trace statement. The previous "Walking the Native Stack" section used examples where I stopped on a trace statement, and what I want to show you now is how I did that. The debuggee output has a predefined event code of out, so if you simply issue an sxe out command, you'll stop whenever your managed application calls Trace.Write* and the DefaultTraceListener calls the native OutputDebugString API function. As you'll see when we get deep into SOS, it is possible to set breakpoints, but it's quite painful. Cheating and using sxe out to stop in WinDBG on your trace statements makes life much easier.

Those of you working on large projects are probably shaking your head right now at the thought of stopping in WinDBG every 1.5 seconds as a trace statement whizzes by. One of the super advanced tricks in my debugging bag is to take advantage of a little known feature of sxe out that lets you specify the trace statement string you want WinDBG to break on when it sees that string from one of your trace statements. If, for example, you wanted to stop on the trace statement when you open a database, you can specify the string to stop on after the sxe out.

There are some limitations to what WinDBG can take as the string to monitor. It can't contain any spaces, nor can it have any colons in it. The good news is that WinDBG supports enough wildcard matching that you should be able to piece together a pattern that matches exactly the trace statement you want.

Suppose the trace statement you want to stop on is something like "Opening database: {0}" (the {0} indicates the position where you'd put in the database name). In that case, your command in WinDBG would look like the following: sxe out:Opening?database*. The question mark (?) takes care of the space, and the trailing asterisk (*) indicates that you'll take zero or more characters on the end of the string, which also covers the colon. For exact details, search for "String Wildcard Syntax" in the WinDBG Help.
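If you want to check a pattern before trying it in the debugger, the two wildcards used above are simple to model. The following Python sketch is an approximation under stated assumptions: it handles only ? (exactly one character) and * (zero or more characters), ignores the other constructs WinDBG's string wildcard syntax offers, and compares case-sensitively, which may not match the debugger's behavior. The helper name matches is hypothetical.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a pattern using only ? and * into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")        # exactly one character
        elif ch == "*":
            parts.append(".*")       # zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def matches(pattern, text):
    """True if the whole trace string matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(text) is not None

# The database name "Northwind" is made up for illustration.
print(matches("Opening?database*", "Opening database: Northwind"))
print(matches("Opening?database*", "Closing database: Northwind"))
```

The pattern has to cover the entire string in this model, which is why the trailing asterisk is needed to absorb the colon and the database name.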

Commands for Controlling WinDBG

Since I've talked about controlling what happens on exceptions in your application, it's a good time to talk about some of the useful commands you can use to control WinDBG itself. I've already covered some of the important meta commands (also known as dot commands) in discussing how to get help (.hh) and symbol handling (.reload). What I want to talk about now are a few of the meta commands that will come in handy in your day-to-day debugging battles.

The simplest, yet extremely useful command is .cls (Clear Screen). This allows you to clear the Command window so that you can start fresh. Because WinDBG can fill the Command window with a tremendous amount of information, which takes memory to store, it's good to clean the slate occasionally.

If you want to clear just a portion of the Command window, you can highlight the portion you want to delete and select Clear Command Output from the Edit menu. If nothing is selected, the effect is the same as issuing the .cls command. Right-clicking the Command window title bar brings up a shortcut menu that also has the Clear Command Output item. No matter how you decide you want to clear the Command window, make sure it is what you want to do because there's no undo capability, so once the output is gone, it's gone forever.

Another extremely useful command is .shell (Command Shell), which allows you to start up a Command Prompt console window from the debugger and redirect output to the Command window. Debugging on the same machine the debuggee is running on and pressing Alt+Tab might be an easier approach, but the beauty of .shell is that you get the output in the debugger even when doing remote debugging with the Command Prompt console running on the remote machine. You can also use the .shell command to run a single external program, redirect its output, and return to the Command window. After issuing a .shell command, the Command window input line says INPUT>, indicating that the Command Prompt console window is waiting for input. To close the Command Prompt and return to the Command window, use either the MS-DOS exit command, or preferably, the .shell_quit (Quit Command Prompt) command because it will terminate the Command Prompt even when the window is frozen.

The .shell command has a couple of very interesting options. The first is -x, which spawns the process you want to start completely detached from WinDBG. To see why that matters, consider what happens without it: if you issue the command .shell notepad, WinDBG assumes that the program is a console-based program, so the Command window switches to the INPUT> prompt, ready to display the standard I/O. Because Notepad.exe is a GUI program, the only action you can perform is .shell_quit.

If you specify the -x option before the command to execute (.shell -x notepad), WinDBG will just do a normal CreateProcess API call on the command to start so it runs cleanly and stays running even after you quit WinDBG. For those of you who want to be true alpha geeks, the fact that WinDBG now easily supports process creation means that you no longer have to use Explorer as your Windows shell; WinDBG will work perfectly well.

The second option to the .shell command, -ci, allows you to execute a series of WinDBG commands and redirect their output to a program. For example, if you whip up a Perl script to parse the output of the k command, you can pass the output like so: .shell -ci "~*ekp" perl.exe parseKcommand.pl.
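As an illustration of the kind of script you might hang off -ci, here is a minimal Python sketch that tallies how often each module appears in stack output. It assumes only that symbols take the module!function form seen in the earlier listings; the function name count_modules and the script file name mentioned afterward are hypothetical.

```python
import re
from collections import Counter

# A symbol such as mscorwks!ClrFlsGetValue+0xe starts with "module!".
MODULE = re.compile(r"\b([A-Za-z_][\w.]*)!")

def count_modules(lines):
    """Count how often each module name appears in debugger output."""
    counts = Counter()
    for line in lines:
        counts.update(MODULE.findall(line))
    return counts

# Illustrative input only; a real run would read the debugger's output.
sample = [
    "KERNEL32!WaitForMultipleObjectsEx+0x1cf",
    "mscorwks!ClrFlsGetValue+0xe",
    "mscorwks!EETypeHashTable::FindItem+0x44",
]
for module, hits in count_modules(sample).most_common():
    print(module, hits)
```

Passing sys.stdin to count_modules instead of the sample list would turn this into a filter that could stand in for the Perl script above, for example saved as count_modules.py and named on the .shell -ci command line.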

In the Exceptions and Events section, I talked about how you could have WinDBG execute commands when exceptions occurred in your program. If you have a long-running process, and there are numerous exceptions triggering, you can easily lose yourself in a ton of text in the Command window. This is especially important if you are chaining many commands together. To make your life easier, the .echo (Echo Comment) command will output a string of your choosing into the Command window when executed. That way it will be much easier to find key locations in the execution transcript.

The .wtitle (Set Window Title) command is one of those commands that at first glance does not seem that useful, but it turns out to be a command you use all the time. When you are dealing with multiple dump files from the same application, you end up opening many of them at the same time to compare states and values. However, unless you are a master at divining the exact location of each WinDBG instance on the screen, it quickly gets confusing which window you are looking at or about to Alt+Tab to. By using .wtitle to set the WinDBG title, you'll speed up your multiple instance usage tremendously.

Keeping a log of your trials and tribulations inside WinDBG, especially so you can look at data after the fact or send the transcript to someone else, is a major key to getting great help from others on your team. WinDBG can log everything that goes to its Command window in a myriad of ways. To open a log file, use the .logopen (Open Log File) command. If you want to have the date and time appended to the name of the log file, pass the /t option before the file name passed to .logopen. To have WinDBG automatically name the file for you based on the name of the process you're debugging, pass /d as the only parameter to .logopen.

If you want to append to an existing log file instead of opening a new one, use the .logappend (Append Log File) command. Of course, to close a log file at any time, use the .logclose (Close Log File) command. A new feature of WinDBG is that you can also select Open/Close Log File on the Edit menu to do the same operations.

The last meta command I'll share with you is one that's part of my secret debugging tricks. When writing error handling, you usually know that by the time you're executing the error handling, your process is in serious trouble. You also know that if you hit a particular piece of error handling, 9 times out of 10, you're probably going to look at specific variable values or the call stack, or you will want to record specific information. What I've always wanted was a way to code the debugger commands I would normally execute directly into my error handling. By doing that, the commands would execute, enabling the maintenance programmers and me to debug a problem faster. My idea was that since trace statement calls go through the debugger, you could embed the commands into a trace statement. You'd tell the debugger what to look for at the front of the trace statement text, and anything after it would be the commands to execute.

What I've just described is exactly how WinDBG's .ocommand (Expect Commands from Target) command works. You call .ocommand, identifying the string prefix to look for at the front of any trace statement calls. If the prefix is present, WinDBG will execute the rest of the text as a command string. Obviously, you'll want to be careful with the string you use, or WinDBG could go nuts trying to execute trace statement calls all through your programs. I like to use WINDBGCMD: as my string. I love this command and sprinkle WinDBG command strings all over my programs!

When using .ocommand, if you don't follow the command string with ";gc", WinDBG stops when the command ends. In the following code, I ensure that the command string ends with ";gc" so that execution continues. To get the commands to execute, I issue .ocommand WINDBGCMD: before the program starts. Note that in the trace statement, you can use any commands you would enter at the command line. The following shows a quick example of a trace statement passing a command to WinDBG. I'll discuss the full usage of the .dump (Create Dump File) command in the next section.

try
{
    // Some work takes place...
}
catch ( FileNotFoundException ex )
{
    // The WINDBGCMD: prefix must match the string given to .ocommand.
    Trace.WriteLine ( "WINDBGCMD: .dump /ma /u FNFE.dmp;kP;gc" );
}
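The shape of the trace text is worth spelling out: the prefix, a space, the commands joined by semicolons, and a final gc so that execution resumes. The Python sketch below only assembles such a string and does not talk to a debugger. The function name windbg_trace_text and its parameters are hypothetical, and the space after the prefix simply follows the example above.

```python
def windbg_trace_text(commands, prefix="WINDBGCMD:", resume=True):
    """Build trace text that .ocommand will run as debugger commands."""
    parts = list(commands)
    if resume and (not parts or parts[-1] != "gc"):
        parts.append("gc")  # without gc the debugger stays stopped
    return f"{prefix} " + ";".join(parts)

print(windbg_trace_text([".dump /ma /u FNFE.dmp", "kP"]))
```

Centralizing the string in one helper like this makes it harder to forget the trailing gc, which is the mistake that leaves a test run sitting silently in the debugger.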

Dump File Handling

We've been discussing the meta commands you'll be using while running WinDBG, but the most important meta command is .dump (Create Dump File). Because dump files, mainly known as minidumps, are so vital to tracking down those production-only problems, I want to spend some time discussing what they are and exactly what you'll need to do to get the best dumps for .NET development.

In the previous chapter, I discussed how to create and load minidumps from Visual Studio. You're probably wondering why I'm recommending in this chapter that you use a much tougher-to-use tool rather than the nice-and-easy Visual Studio. If you're dealing with minidumps created on your development machine during coding and testing, Visual Studio is absolutely easier to use, and in fact, that's what I use in those situations. However, for dumps from production systems, WinDBG, though more painful to use, offers the power and flexibility you need to look at the toughest problems. This will become much more apparent when we get to loading and using SOS.

What Is a Minidump?

A minidump is a snapshot of the application at a point in time and is akin to core dumps from other operating systems. There are kernel-mode minidumps that the operating system writes when a device driver or other kernel-mode component has an unhandled exception. For more information about kernel-mode debugging and minidumps, see the excellent document kernel_debugging_tutorial.doc that's in the WinDBG installation directory.

A user-mode minidump contains the state of the application at a particular point in time. That point can be at the instant of a crash or at any time you used ADPlus to create a minidump of a running process. For those of you who have seen minidumps in the wild, you know the term minidump is somewhat of an oxymoron. It's normal for a usable minidump of a large ASP.NET system to be more than 500 MB. Minidumps that size are a little tough to send as an e-mail attachment.

You're probably scratching your head and wondering why a "mini" dump can be so big and why isn't there a format called full dump that captures everything? There actually is a format for a dump called full dump, and the Help on the .dump command mentions that you can still create them. The full dump format is an older format and has less information than the minidump format. Consequently, it's far better to use the minidump format and put all information into it, even though that makes the created minidump file actually larger than a full dump.

For our purposes, there are only two types of minidumps: a basic minidump and a full-memory minidump. A basic minidump contains essentially only two pieces of information. The first is the version information for each of the loaded modules. The second set of data consists of all the pages of memory necessary to walk the native call stack of each thread in the system.

Given the small amount of data collected, the corresponding file size of a basic minidump is quite small. For a C# console program that does nothing but call Trace.WriteLine, the basic minidump on Windows XP Professional x64 Edition is 80 KB. In the native debugging world, a basic minidump is nearly all you need.

A full-memory minidump starts out with the same information as a basic minidump but adds every allocated and committed virtual page of memory. That means that you're not only including the current working set, which consists of pages currently in memory, but also all the pages that are swapped out. That's how the numbers add up big in full-memory minidumps. Saving a full-memory minidump of the same C# console program now yields 88.9 MB.

When it comes to .NET applications, SOS needs the full-memory minidump to be fully effective. If you step back and think about why, it makes sense. A .NET application is Just in Time (JIT) compiled, and data is in the garbage-collected heap. Without those chunks of memory available in the minidump, there's nothing for SOS to process to tell you about the .NET portions of the process. If all you can get is a basic minidump, you should use a new feature of SOS for .NET 2.0: it can at least walk the call stack and show you the managed methods on the stack.

Creating Minidumps in WinDBG

When doing live debugging with WinDBG, you can create a minidump at any time with the .dump command. All options you pass to the .dump command are passed before the file name for the dump. To specify creating a full-memory minidump, you'll pass the /ma option. As a safety precaution, the .dump command will not overwrite an existing file. To overwrite an existing file, use the /o option. To ensure a unique file name, add the /u option to have the date, time, and process ID appended to the specified file name so you can reexecute the command and keep a consistent first part of the output file name.

If you're going to be saving off numerous minidumps from a single debugging session, you'll want to make sure to use the comment option, /c, to have your comment printed to the Command window when you open the minidump. The last .dump command option worth mentioning is the /b switch, which compresses the minidump into a .cab file. If space is an issue on a machine, it could be helpful. However, WinDBG writes the minidump to temporary storage before compressing into a .cab file. When using /b, you see the message "Creating a cab file can take a VERY VERY long time" (sic) in the Command window, so it may be a good excuse for a stretch break.
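Because every option has to come before the file name, it can help to see the command assembled piece by piece. The Python sketch below only builds the command text from the options discussed above; it does not run anything. The function name dump_command and its parameter names are hypothetical, and quoting the comment after /c is an assumption about how you would type it.

```python
def dump_command(file_name, full_memory=True, overwrite=False,
                 unique=False, comment=None, cab=False):
    """Assemble a .dump command; every option precedes the file name."""
    options = []
    if full_memory:
        options.append("/ma")            # full-memory minidump
    if overwrite:
        options.append("/o")             # replace an existing file
    if unique:
        options.append("/u")             # date, time, and process ID in the name
    if cab:
        options.append("/b")             # compress into a .cab file
    if comment is not None:
        options.append(f'/c "{comment}"')  # quoting is an assumption
    return " ".join([".dump", *options, file_name])

print(dump_command("FNFE.dmp", unique=True))
```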

Opening Minidumps in WinDBG

Creating minidumps is nice; opening them is even more important. Before I discuss how to open them, I need to remind you to ensure that you have the _NT_EXECUTABLE_IMAGE_PATH environment variable set to the same value as your _NT_SYMBOL_PATH environment variable. Because the minidumps do not have the actual binary files in them, WinDBG will need to know where it can find the exact versions of the files referenced in the minidump so it can load the appropriate symbols. Hence, you need to tell WinDBG that they are in your Symbol Server.

The easy way to open a minidump is to start WinDBG and on the File menu, click Open Crash Dump or press the keyboard accelerator key (Ctrl+D). If you want to start WinDBG and have it open a minidump immediately, add the -z option to WinDBG's command line followed by the name of the minidump file. In an ode to ease of use, WinDBG amazingly supports drag and drop from Windows Explorer.

Once you have the minidump open, look immediately at the dump type information, which is output to the Command window right after the WinDBG version and copyright:

Loading Dump File [C:\Dev\MyDump.dmp]
User Mini Dump File with Full Memory: Only application data is available

You can see from the output that the dump opened above is a full memory minidump, so it's completely usable by SOS. If the minidump type string says anything else, your minidump does not have sufficient information in it for full SOS usage. As I mentioned earlier, the SOS stack walking command will work with .NET 2.0, so if the minidump is the only one you have, at least try the SOS commands to see if you can get any information from the dump.
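When you have a directory full of dumps and saved logs, the same check can be scripted. The Python sketch below looks for the dump type line in text captured from the Command window and reports whether it mentions full memory. It relies only on the wording shown in the output above; the function name is_full_memory_dump is hypothetical.

```python
def is_full_memory_dump(banner_lines):
    """True if the dump type line reports a full-memory minidump."""
    for line in banner_lines:
        if "Mini Dump File" in line:
            return "with Full Memory" in line
    return False  # no dump type line found at all

banner = [
    r"Loading Dump File [C:\Dev\MyDump.dmp]",
    "User Mini Dump File with Full Memory: Only application data is available",
]
print(is_full_memory_dump(banner))
```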

If you've opened a dump that was written as the result of a Windows SEH crash, in other words, a native code crash, the first command you may want to execute is .ecxr (Display Exception Context Record), which reports the state of the application at the time of the crash. This command tells WinDBG to use the exception record context, which is the data structure that contains the state of the application at the time of the crash. If the minidump was written from inside the crashing application, always issue the .ecxr command. If an external process, such as a debugger, wrote the minidump, the exception record context and the context record are one and the same, and you don't need to use .ecxr.

It never hurts to issue the .ecxr command when opening a minidump because the debugger will ignore the command if it's not needed. If you do run the .ecxr command, you'll see a new register set and call stack appear if you have the Call stack window open. Additionally, the k command will report on each use that it's starting at the exception record context for stack walking.

Once the minidump file is open, it's as if you've stopped the application at that location, so you can issue all the WinDBG and SOS commands you could want. Obviously, because you're looking at a snapshot of the application in time, you can't run any debugging actions, such as native breakpoints and single stepping. The great news is that all the wonderful informational commands are right there for you to figure out exactly what went wrong in the application.

The last trick I want to mention about opening minidumps helps in a situation in which you have multiple minidumps to look at. After you've opened the first minidump as described earlier, you can open other minidumps in the same instance of the debugger by using the .opendump (Open Dump Files) command. This command is extremely helpful when you have minidumps of two related processes, such as Internet Information Server (IIS) and one of the Microsoft ASP.NET worker processes. It's also nice to look at similar dumps of the same process at the same time.

Extremely Useful Extension Commands

You would think that after this many pages I would finally be up to discussing SOS in all its glory. If I jumped into SOS now, I'd be cheating you out of some extraordinarily powerful commands that WinDBG offers. These commands can truly make the difference between having full-memory minidumps simply chewing up tons of disk space and actually fixing the bug.

Even though this is a whole book on .NET debugging, none of you are writing pure managed applications. You have to interact with Windows somehow to communicate with the user, access files, or touch the network. Additionally, many of you are in environments where you have a significant investment in COM components or native DLLs that you'll be using until you retire. In this section, I want to cover some WinDBG commands that make seeing the interaction with the native side of your application much easier.

All of the commands I'm going to mention here are extension commands, which, like SOS, live in DLLs that provide additional functionality to WinDBG. WinDBG comes with a standard set of extensions, which you can see by running the .chain (List Debugger Extensions) command, so you don't need to download or install anything else to take advantage of these commands. Because the WinDBG team is responsible for supporting these commands, they've even gone to the trouble of automatically loading them so they are always at the ready.

The !analyze Command

The first command, !analyze, has almost mystical capabilities. If you're thinking: "Hey, with a name like that, could !analyze just tell me where my bugs are?" you're going down the right track. It has its limitations, but it's the first command you need to run when you open a minidump of an application that crashed or experienced deadlocks on the native side. You can use !analyze with live debugging, but it's not nearly as useful.

The !analyze command takes several parameters, but you'll use only three. Always pass the first one, -v, because it sets the output to verbose mode. Without -v, the output is not worthwhile. The second option, -hang, attempts to perform deadlock analysis. Before using -hang, switch to one of the threads that you think is deadlocked, and run !analyze -hang -v from that thread. The final option, -f, is one that you'll rarely use. Its purpose is to force the analysis to run even if there's not a crash. When doing live debugging, it can come in handy every once in a while.

To show the power of the !analyze command, I whipped up a very simple managed application that crashes inside a native C++ DLL function. Of course, with .NET 2.0, you'll see the new AccessViolationException exception in your exception handler, but before the exception handler runs, you'll get the ubiquitous Windows crash message. When the crash message appeared, I attached to the crashed program with WinDBG and created a full-memory minidump. Instead of showing you all the output at once, I'll display the output and provide a running commentary on what you're looking at so it makes more sense:

0:000> !analyze -v
***************************************************************************
*                                                                         *
*                        Exception Analysis                               *
*                                                                         *
***************************************************************************
FAULTING_IP:
CrashinDll!BigMistakeToCallMe+1c [c:\junk\cruft\crashme\crashindll\crashindll.cpp @ 40]
00000000`1b0b108c 66c7005000       mov     word ptr [rax],0x50

The first part of the !analyze -v output is the big banner indicating that exception analysis is running, which is something I wish the WinDBG team would take out. The first piece of information displayed is the address information about the crash itself. Because !analyze will load the necessary symbols, you may see symbol loading information between the banner and the FAULTING_IP text. As you can see from the example above, good symbols were loaded, and the analysis is pointing directly to the C++ native source line that crashed:

EXCEPTION_RECORD:  ffffffffffffffff -- (.exr ffffffffffffffff)
ExceptionAddress: 000000001b0b108c (CrashinDll!BigMistakeToCallMe+0x000000000000001c)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000001
   Parameter[1]: 0000000000000000
Attempt to write to address 0000000000000000

The exception record is the data structure filled in by the operating system SEH when an application has an unhandled native exception. The !analyze -v command displays the data structure so you can see the exact cause of the crash. If the output does not include a textual description of the exception code, you can look the code up in WINNT.H. If the exception code is not defined there, it may be a software exception raised by a direct call to the Windows API RaiseException, which you'll see at the top of the stack walk:

DEFAULT_BUCKET_ID:  APPLICATION_FAULT
PROCESS_NAME:  CrashMe.exe
ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".
WRITE_ADDRESS:  0000000000000000
BUGCHECK_STR:  ACCESS_VIOLATION

The next section shows more detailed information about the crash. If you were to write a quick tool to parse the output of the !analyze command, these are the fields that would give it a general description of the crash.

MANAGED_STACK:
(TransitionMU)
000000000023EDF0 000000001AF6076C CrashMe!DomainBoundILStubClass.IL_STUB(Int32)+0x9c
(TransitionMU)
000000000023EED0 000000001AF608A8 CrashMe!SecurityILStubClass.IL_STUB(Int32)+0x58
(TransitionMU)
000000000023EF80 000000001AF6060F CrashMe!CrashMe.DoSomeWork.Fum(Int32)+0x2f
(TransitionMU)
000000000023EFC0 000000001AF605A4 CrashMe!CrashMe.DoSomeWork.Fo(Int32)+0x34
(TransitionMU)
000000000023EFF0 000000001AF60534 CrashMe!CrashMe.DoSomeWork.Fi(Int32)+0x34
(TransitionMU)
000000000023F020 000000001AF604C4 CrashMe!CrashMe.DoSomeWork.Fee(Int32)+0x34
(TransitionMU)
000000000023F050 000000001AF603DE CrashMe!CrashMe.Program.Main(System.String[])+0xfe
(TransitionUM)

Wow! The MANAGED_STACK output is certainly a very nice surprise from the !analyze -v command. Whereas the stack walk k command lacks any knowledge about managed code, the !analyze -v command seems to have some nice smarts built in to give us more information about what's going on in our applications. SOS has a command to walk the call stack, but the !analyze command is not using SOS and has the logic built in. We can all keep our fingers crossed that Microsoft will put these same smarts into the k command in future versions of WinDBG.

LAST_CONTROL_TRANSFER:  from 0000000075ecce24 to 000000001b0b108c

This indicates the last call on the stack.

(Note: For wrapping, the upper DWORD on the 64-bit addresses was removed.)
STACK_TEXT:
0023ed40 75ecce24 : 0000000a 00000000 74968000 00000000 : CrashinDll!BigMistakeToCallMe+0x1c [c:\junk\cruft\crashme\crashindll\crashindll.cpp @ 40]
0023ed50 1af6076c : 0000000a 1a960fd0 7c897ef6 00000000 : mscorwks!DoNDirectCall__PatchGetThreadCall+0x78
0023edf0 1af608a8 : 0000000a 75c746a0 00226ae8 00000001 : 0x1af6076c
0023eed0 1af6060f : 0000000a 0000000a 0023ef0e 0023ef80 : 0x1af608a8
0023ef80 1af605a4 : 02071f48 0000000a 0023ef0e 0023ef80 : 0x1af6060f
0023efc0 1af60534 : 02071f48 0000000a 0023ef0e 0023ef80 : 0x1af605a4
0023eff0 1af604c4 : 02071f48 0000000a 0023ef0e 0023ef80 : 0x1af60534
0023f020 1af603de : 02071f48 0000000a 0023ef0e 0023ef80 : 0x1af604c4
0023f050 75ecf422 : 02071e00 00000003 fffffffe 00285fc8 : 0x1af603de
0023f0b0 75d9cb5a : 0000001d fffffffe 00000000 00000000 : mscorwks!CallDescrWorker+0x82
0023f100 75d9afd3 : 0023f238 00000001 00000001 00000000 : mscorwks!CallDescrWorkerWithHandler+0xca
0023f1a0 75cf099a : 00000001 1a960e90 00000000 749dd912 : mscorwks!MethodDesc::CallDescr+0x1b3
0023f3e0 75e56775 : 00000000 00000000 fffffffe 75d409bc : mscorwks!ClassLoader::RunMain+0x22e
0023f640 75e2ebe8 : 0023fc80 0023fc98 0027d918 0025ec40 : mscorwks!Assembly::ExecuteMainMethod+0xb9
0023f930 75e6a523 : 00000000 00000000 00000000 75eabc4a : mscorwks!SystemDomain::ExecuteMainMethod+0x3f0
0023fee0 75e78205 : 00000000 0023e060 00000000 00000000 : mscorwks!ExecuteEXE+0x47
0023ff30 7401a726 : ffffffff 00270ba0 00000000 00000000 : mscorwks!CorExeMain+0xb1
0023ff80 78d5965c : 75be0000 00000000 00000000 0023ffd8 : mscoree!CorExeMain+0x46
0023ffb0 00000000 : 7401a6e0 00000000 00000000 00000000 : kernel32!BaseProcessStart+0x29

The STACK_TEXT section shows the usual stack trace. The output is identical to that of the kb command, which shows the stack trace and the first four parameters to the function for x64 versions and the first three for x86 versions. If you compare the addresses in the stack trace that don't show function names to the managed stack shown in the MANAGED_STACK section, you'll see that they match up. You'll see this matching only on x64 versions because there is only a single native calling convention, so it's much easier to walk the stack on x64 versions than on x86:

FOLLOWUP_IP:
CrashinDll!BigMistakeToCallMe+1c [c:\junk\cruft\crashme\crashindll\crashindll.cpp @ 40]
00000000`1b0b108c 66c7005000       mov     word ptr [rax],0x50
SYMBOL_STACK_INDEX:  0
FOLLOWUP_NAME:  MachineOwner
SYMBOL_NAME:  CrashinDll!BigMistakeToCallMe+1c
MODULE_NAME:  CrashinDll
IMAGE_NAME:  CrashinDll.dll
DEBUG_FLR_IMAGE_TIMESTAMP:  42c31c8a
STACK_COMMAND:  .ecxr ; kb

The FOLLOWUP_IP section is always the same value as the address that crashed. The rest of the fields are there to identify the crashing module, symbol, and the image timestamp of the crashing module. The STACK_COMMAND section shows the command used to produce the STACK_TEXT section, and as the WinDBG documentation says, you can use the same command in your own exploration of the crash.

FAILURE_BUCKET_ID:  X64_ACCESS_VIOLATION_CrashinDll!BigMistakeToCallMe+1c
BUCKET_ID:  X64_ACCESS_VIOLATION_CrashinDll!BigMistakeToCallMe+1c
Followup: MachineOwner

The last part of the !analyze -v output shows the bucket information. A bucket is the identifier that the !analyze -v command calculates to uniquely identify the fault behind the crash. Using these buckets, you can build up a database of all the crashes reported and determine which crashes you're seeing most often, so you can apply your fixing effort to the more common crashes.
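As a rough illustration of the bucket database idea, the following Python sketch tallies the FAILURE_BUCKET_ID field across several saved !analyze -v outputs. It is written in Python rather than .NET only to keep it short, and the function names are made up for this example. It assumes you have already captured each output as text, for instance from a debugger log file.

```python
import re
from collections import Counter

# Matches the FAILURE_BUCKET_ID line as it appears in !analyze -v output.
# Anchoring at the start of the line keeps the plain BUCKET_ID line out.
BUCKET_RE = re.compile(r"^FAILURE_BUCKET_ID:\s*(\S+)", re.MULTILINE)


def failure_bucket(analyze_output):
    """Return the FAILURE_BUCKET_ID value, or None if the field is missing."""
    match = BUCKET_RE.search(analyze_output)
    return match.group(1) if match else None


def tally_buckets(outputs):
    """Count how many saved !analyze -v outputs fall into each bucket."""
    counts = Counter()
    for text in outputs:
        bucket = failure_bucket(text)
        if bucket is not None:
            counts[bucket] += 1
    return counts


if __name__ == "__main__":
    # Two copies of the bucket lines from the crash discussed above.
    sample = (
        "FAILURE_BUCKET_ID:  X64_ACCESS_VIOLATION_CrashinDll!BigMistakeToCallMe+1c\n"
        "BUCKET_ID:  X64_ACCESS_VIOLATION_CrashinDll!BigMistakeToCallMe+1c\n"
    )
    for bucket, count in tally_buckets([sample, sample]).most_common():
        print(count, bucket)
```

Sorting with most_common puts the crashes you see most often at the top, which is the ordering you would want when deciding where to spend your fixing effort.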

The Followup and the earlier FOLLOWUP_NAME fields are for using the !analyze command with a special file called Triage.ini. With this file, you assign ownership to modules and functions so you'll know whom to yell at when your application crashes. The !analyze documentation discusses how to use the Triage.ini file.

The !handle Command

Those of us from native Windows backgrounds are paranoid about ensuring that our handles are closed. Since a handle from an API such as CreateEvent is an opaque reference to an actual chunk of real memory, you have to be extremely cognizant of ensuring that you appropriately close handles you've opened. If you don't close that handle, you leak memory and system resources.

In .NET, classes such as EventWaitHandle do a good job of hiding the actual handle from the developer. However, if you're passing handles from .NET code to native code, there is potential to leak them. Fortunately, WinDBG supports a great command to let you see exactly which handles are open in your process at any given time: !handle.

Issuing !handle with no parameters will produce the following output:

0:000> !handle
Handle 254
  Type            File
Handle 2a8
  Type            Section
. . .
Handle 3c0
  Type            Directory
Handle 3c4
  Type            Desktop
Handle 3c8
  Type            Event
. . .
87 Handles
Type             Count
Event             41
Section           4
File              11
Port              1
Directory         2
Mutant            2
WindowStation     2
Semaphore         2
Key               7
Thread            4
Desktop           1
IoCompletion      9
KeyedEvent        1

The first part of the output shows the value of each handle and its type. The final portion shows the number of handles currently open in the process and the number of each type. If you suspect that you have a native handle leak, keep an eye on the handle count. If the overall count keeps going up, the count for the leaking type will be going up with it.
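If you save the !handle output at two points in time, comparing the summary tables by hand gets tedious. The Python sketch below shows one way to do the comparison. The function names are invented for this example, and it assumes the summary table has the "Type" and "Count" header shown above, with one type per line.

```python
def parse_handle_summary(handle_output):
    """Return {type: count} from the summary table at the end of !handle output."""
    counts = {}
    in_summary = False
    for line in handle_output.splitlines():
        fields = line.split()
        if fields == ["Type", "Count"]:
            # Everything after this header is a "TypeName Count" row.
            in_summary = True
        elif in_summary and len(fields) == 2 and fields[1].isdigit():
            counts[fields[0]] = int(fields[1])
    return counts


def growing_types(before, after):
    """Return {type: increase} for every handle type whose count went up."""
    return {
        handle_type: after[handle_type] - before.get(handle_type, 0)
        for handle_type in after
        if after[handle_type] > before.get(handle_type, 0)
    }


if __name__ == "__main__":
    # Two made-up snapshots; only the Event count changes between them.
    first = "87 Handles\nType             Count\nEvent             41\nFile              11\n"
    second = "90 Handles\nType             Count\nEvent             44\nFile              11\n"
    print(growing_types(parse_handle_summary(first), parse_handle_summary(second)))
    # prints {'Event': 3}
```

A type that grows between every pair of snapshots is the first place to look for the leak.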

The wonderful book Microsoft Windows Internals, Fourth Edition: Microsoft Windows Server 2003, Windows XP, and Windows 2000, by David Solomon and Mark Russinovich (Microsoft Press, 2004), explains all the different handle types that you'll see with the !handle command. Most are self-explanatory, but I need to mention that a Directory is a kernel-mode Windows Object directory entry, and a Mutant is a mutex.

To dig into more information about a specific handle, pass two additional values to !handle. The first is the handle value itself, and the second is f. The second parameter is a bit field indicating the additional fields you want to see, but you'll always pass f to see everything.

If you want to see detailed information about all the handles of a specific type, pass zero for the handle value, f for all data, and the handle type as the third parameter. For example, !handle 0 f Event will show the detailed information for all Event handles in the process.

The following example shows the display of the detailed data for two handle values: the first is for an event, and the second is for a registry key.

0:003> !handle 2b4 f
Handle 2b4
  Type            Event
  Attributes      0
  GrantedAccess   0x1f0003:
         Delete,ReadControl,WriteDac,WriteOwner,Synch
         QueryState,ModifyState
  HandleCount     2
  PointerCount    5
  Name            \BaseNamedObjects\EventerUniqueID_3504
  Object Specific Information
    Event Type Manual Reset
    Event is Waiting
0:003> !handle 2f8 f
Handle 2f8
  Type            Key
  Attributes      0
  GrantedAccess   0x20019:
         ReadControl
         QueryValue,EnumSubKey,Notify
  HandleCount     2
  PointerCount    3
  Name            \REGISTRY\MACHINE\SOFTWARE\MICROSOFT\Fusion\PublisherPolicy\Default
  Object Specific Information
    Key last write time:  21:07:38. 6/20/2005
    Key name Default

As you looked through the output, you probably saw some very interesting items related to those handles. The name of the object is the most important field because it's what allows you to easily identify the handle as it relates to your code. By seeing the name of a specific event in addition to its signal state, you now have a fighting chance to see if that's the event you are deadlocking on.

Of course, you won't see a handle name unless you gave the handle one. If you issue !handle 0 f Event in your application, you'll see that nearly all the events in your process have the completely descriptive name of <none>, which indicates that they have no name. By the way, only mutexes, semaphores, and events have optional names. Other handle types, such as registry keys and window stations, have the name of the opened key or station.

An unnamed handle is not a bug; the handle name is an optional value passed to the constructors of the EventWaitHandle, Mutex, and Semaphore classes. The reason the name is optional is that if you don't specify the name, the scope of the handle is limited to the process. If you name the handle, its scope becomes global to the machine.

If you don't think that's a problem, consider the following scenario: You have a process in which you name an event handle Foo that you use to block one thread execution until another thread is finished with some work. Everything runs great if only one process is running. If two of the same processes are running and both happen to have threads waiting for the Foo event, when that event is signaled, both threads in the two separate processes will stop waiting. That's one serious bug!

If you're going to name your handles and you want them to have just process scope, you'll need to add a unique identifier to the name. Because the process ID is guaranteed to be unique as long as the process is running, I always append the current process ID to the name string to ensure different names for multiple running instances of the application.
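The naming scheme is simple enough to sketch in a few lines. The example below is Python rather than .NET, and the helper name and the underscore separator are assumptions made for illustration. In a .NET application you would pass the resulting string as the name argument to the EventWaitHandle, Mutex, or Semaphore constructor.

```python
import os


def process_scoped_name(base_name, pid=None):
    """Append the process ID so each running process gets its own object name."""
    if pid is None:
        pid = os.getpid()  # default to the ID of the current process
    return f"{base_name}_{pid}"


if __name__ == "__main__":
    # With an explicit ID the result is predictable.
    print(process_scoped_name("EventerUniqueID", 3504))  # prints EventerUniqueID_3504
    # Without one, the name depends on the process running this script.
    print(process_scoped_name("SemaphoreUniqueID"))
```

Two instances of the same application produce different names this way, so a thread in one process never wakes up on an event signaled by the other.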

The following truncated output shows the unique name I gave to a Semaphore class my application created:

0:003> !handle 0 f Semaphore
Handle 270
  Type            Semaphore
  Attributes      0
  GrantedAccess   0x1f0003:
         Delete,ReadControl,WriteDac,WriteOwner,Synch
         QueryState,ModifyState
  HandleCount     2
  PointerCount    4
  Name            \BaseNamedObjects\SemaphoreUniqueID_3504
  Object Specific Information
    Semaphore Count 0
    Semaphore Limit 1

Because naming an event exposes the event outside the process, I must also mention that tools such as Process Explorer, which I discussed in Chapter 4, can see those values. Depending on your application's security requirements, that could expose data you may not want others to see. At a minimum, you'll want to name your handles in Debug builds so you can track down problems more easily.

If you're looking for a project that will make your debugging life easier, what we need is a quick tool that will allow us to search for a named handle value so we don't have to read through thousands of lines of !handle output. You could write the tool as a WinDBG extension, but you'd have to do that in native C++, and it's certainly an adventure to get extensions working and debugged. A better approach would be to write a .NET console application that reads the standard output so you could pump the output of !handle 0 f into the tool with the previously mentioned .shell -ci command.
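To give a feel for how small such a tool can be, here is a minimal sketch of the search logic. It is written in Python instead of as a .NET console application, purely to keep the example compact, and the function names are invented for this example. It assumes the !handle 0 f layout shown earlier: each handle starts with an unindented "Handle" line and carries an indented "Name" field.

```python
import re
import sys

HANDLE_START = re.compile(r"^Handle\s+([0-9a-fA-F]+)\s*$")
SUMMARY_START = re.compile(r"^\d+\s+handles\b", re.IGNORECASE)
NAME_LINE = re.compile(r"^\s*Name\s+(.*\S)\s*$")


def split_handles(handle_output):
    """Yield (handle_value, detail_lines) for each 'Handle xxx' block."""
    value, lines = None, []
    for line in handle_output.splitlines():
        start = HANDLE_START.match(line)
        if start:
            if value is not None:
                yield value, lines
            value, lines = start.group(1), []
        elif SUMMARY_START.match(line):
            # The "N Handles" summary ends the last handle block.
            if value is not None:
                yield value, lines
            value, lines = None, []
        elif value is not None:
            lines.append(line)
    if value is not None:
        yield value, lines


def find_named_handles(handle_output, wanted):
    """Return [(handle_value, name)] for names containing `wanted`, ignoring case."""
    hits = []
    for value, lines in split_handles(handle_output):
        for line in lines:
            name = NAME_LINE.match(line)
            if name and wanted.lower() in name.group(1).lower():
                hits.append((value, name.group(1)))
    return hits


if __name__ == "__main__":
    if len(sys.argv) == 2:
        # Read the piped debugger output and print each matching handle.
        for value, name in find_named_handles(sys.stdin.read(), sys.argv[1]):
            print(f"Handle {value}  {name}")
    else:
        print("usage: pass the text to search for as the only argument", file=sys.stderr)
```

Assuming the script is saved under a name such as find_handle.py (a hypothetical name) and Python is on the path, you could drive it from the debugger with something like .shell -ci "!handle 0 f" python find_handle.py EventerUniqueID.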

Other Extension Commands

With the big commands !analyze and !handle out of the way, I want to mention a few other extension commands that you have at your disposal to look at various items in your application or how you're interacting with the system.

The !runaway command isn't what you do when faced with debugging a nasty problem with WinDBG; it's the command that will show you the thread times for a process. If your application is chewing up more CPU time than you expect, !runaway will show you the thread that's doing the chewing. If you pass f as the parameter to !runaway, you'll see User Mode Time, Kernel Mode Time, and Elapsed Time, which is the amount of time elapsed since the thread was created. The following shows the !runaway output for a simple application:

0:003> !runaway f
 User Mode Time
  Thread       Time
   0:8d0       0 days 0:00:00.015
   3:d58       0 days 0:00:00.000
   2:db4       0 days 0:00:00.000
   1:9d8       0 days 0:00:00.000
 Kernel Mode Time
  Thread       Time
   0:8d0       0 days 0:00:00.015
   3:d58       0 days 0:00:00.000
   2:db4       0 days 0:00:00.000
   1:9d8       0 days 0:00:00.000
 Elapsed Time
  Thread       Time
   0:8d0       0 days 1:28:21.642
   1:9d8       0 days 1:28:16.673
   2:db4       0 days 1:28:16.658
   3:d58       0 days 0:17:20.285
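With only four threads you can read this output at a glance, but with hundreds of threads a little parsing helps. The following Python sketch ranks threads by combined user and kernel time. The function names are made up for this example, and it assumes the "days h:mm:ss.fff" time format shown above.

```python
import re

SECTION = re.compile(r"^\s*(User Mode|Kernel Mode|Elapsed) Time\s*$")
THREAD = re.compile(
    r"^\s*(\d+):([0-9a-fA-F]+)\s+(\d+) days (\d+):(\d\d):(\d\d)\.(\d+)\s*$"
)


def parse_runaway(runaway_output):
    """Return {section: {thread_number: seconds}} from !runaway output."""
    times, section = {}, None
    for line in runaway_output.splitlines():
        header = SECTION.match(line)
        if header:
            section = header.group(1)
            times[section] = {}
            continue
        row = THREAD.match(line)
        if row and section:
            number, _tid, days, hours, minutes, seconds, frac = row.groups()
            total = (int(days) * 86400 + int(hours) * 3600
                     + int(minutes) * 60 + int(seconds) + float("0." + frac))
            times[section][int(number)] = total
    return times


def busiest_threads(times):
    """Rank threads by user plus kernel time, busiest first."""
    user = times.get("User Mode", {})
    kernel = times.get("Kernel Mode", {})
    totals = {n: user.get(n, 0.0) + kernel.get(n, 0.0)
              for n in set(user) | set(kernel)}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = (
        " User Mode Time\n  Thread       Time\n"
        "   0:8d0       0 days 0:00:00.015\n   3:d58       0 days 0:00:00.000\n"
        " Kernel Mode Time\n  Thread       Time\n"
        "   0:8d0       0 days 0:00:00.015\n   3:d58       0 days 0:00:00.000\n"
    )
    for number, seconds in busiest_threads(parse_runaway(sample)):
        print(f"thread {number}: {seconds:.3f} seconds of CPU time")
```

The thread number in the first column is the one you would use to switch to that thread in the debugger before looking at its stack.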

The !token command displays the detailed information about a security token to make your security programming easier. Always use the -n option to !token to see the friendly names for security groups, unless you have values such as S-1-5-21-603047887-89138312-1407646538-1004 memorized.

To find all the Token handles in your process, first issue the !handle 0 f Token command to list them. That will show you output like the following, which you can see does not show you much at all about the token:

Handle 490
  Type            Token
  Attributes      0
  GrantedAccess   0xc:
         None
         Impersonate,Query
  HandleCount     2
  PointerCount    3
  Name            <none>
  Object Specific Information
    Auth Id    0 : 0x3e4
    Type       Impersonation
    Imp Level  Identification
1 handles of type Token

Pass the Token handle value you're interested in to the !token command, and you'll see output like the following. One very nice feature of !token is that if you don't specify a Token handle to display, it defaults to the thread token so you can see the impersonation state of the thread.

1:013> !token -n 490
TS Session ID: 0
User: S-1-5-20 (Well Known Group: NT AUTHORITY\NETWORK SERVICE)
Groups:
  00 S-1-5-20 (Well Known Group: NT AUTHORITY\NETWORK SERVICE)
     Attributes - Mandatory Default Enabled
  01 S-1-1-0 (Well Known Group: localhost\Everyone)
     Attributes - Mandatory Default Enabled
  02 S-1-5-21-603047887-89138312-1407646538-1004 (Alias: TIMON\IIS_WPG)
     Attributes - Mandatory Default Enabled
  03 S-1-5-32-559 (Alias: BUILTIN\Performance Log Users)
     Attributes - Mandatory Default Enabled
  04 S-1-5-32-545 (Alias: BUILTIN\Users)
     Attributes - Mandatory Default Enabled
  05 S-1-5-6 (Well Known Group: NT AUTHORITY\SERVICE)
     Attributes - Mandatory Default Enabled
  06 S-1-5-11 (Well Known Group: NT AUTHORITY\Authenticated Users)
     Attributes - Mandatory Default Enabled
  07 S-1-5-15 (Well Known Group: NT AUTHORITY\This Organization)
     Attributes - Mandatory Default Enabled
  08 S-1-2-0 (Well Known Group: localhost\LOCAL)
     Attributes - Mandatory Default Enabled
  09 S-1-5-5-0-54414 (no name mapped)
     Attributes - Mandatory Default Enabled LogonId
  10 S-1-5-32-545 (Alias: BUILTIN\Users)
     Attributes - Mandatory Default Enabled
Primary Group: S-1-5-20 (Well Known Group: NT AUTHORITY\NETWORK SERVICE)
Privs:
  00 0x00000001e Unknown Privilege         Attributes - Enabled Default
  01 0x00000001d SeImpersonatePrivilege    Attributes - Enabled Default
  02 0x000000017 SeChangeNotifyPrivilege   Attributes - Enabled Default
Auth ID: 0:3e4
Impersonation Level: Identification
TokenType: Impersonation

Before I leave the cool extension commands and the native side of WinDBG as a whole, I need to mention that although I've covered a good deal of interesting features and tricks in WinDBG, I have certainly not covered everything. The more you read the WinDBG documentation, the better you'll be able to learn WinDBG and SOS. You were probably wondering if I was ever going to get to SOS, but with the background on WinDBG under your belt, we can finally start looking at the managed pieces of your application with SOS!