Debugging IL | Advanced .NET Programming

If you do start writing IL code, you'll want to debug it. And even if you have compiled code from a high-level language, you may want to step through the corresponding IL code for debugging purposes or in order to understand the code better. Debugging IL code is not hard to do using VS.NET, although you need to do a bit of preparatory work to set up the required VS.NET solution. I'll illustrate how it works by using the Max sample from the last section.

Debugging with VS.NET

The first thing you need to do is assemble the Max.il IL source code file and generate a program database file. You do this by supplying the /debug flag to ilasm.exe:

 C:\AdvDotNet\ILIntro>ilasm max.il /debug

Besides the executable assembly, you'll also find that a program database file, max.pdb, has been created. A .pdb file contains information that links the executable code to the source code. In general, when a debugger attaches itself to a process, it looks for the presence of this file, and loads it if present. It is the .pdb file that tells the debugger which source code instructions correspond to which executable instructions, and which variable names correspond to which addresses in memory, thus allowing you to debug by setting breakpoints and examining variables using the original source code.

Now you should launch VS.NET and choose the menu options to open an existing solution. Navigate to the folder in which the newly compiled max.exe file is located, and select max.exe as the solution to debug - yes, VS.NET really will let you do that. It will accept .sln files and .exe files as valid solutions.

Next, open the max.il file as an individual file (not as a solution). You can now set breakpoints in this file in the normal way, and then hit the F5 key or use the main menu to start debugging. VS.NET will recognize from the .pdb file that the max.il file is the source code for the executable it is required to run. You'll also find that when you tell VS.NET to start debugging, it will prompt you to create a new .sln solution file that describes this relationship. The next time you want to debug this IL file, you can just open this .sln file directly (although you'll still have to compile from the command line if you make changes to the .il file).

The really great thing about it is that, besides the source window, VS.NET provides the Disassembly window, which lets you see the actual native code that the JIT compiler generates from the IL.

It is also possible to use the Watch window to track the values of variables. If variables have been named, then you can use the supplied names to examine them. However, in the Max() method of the Max sample, neither the local variables nor the arguments have been given names. In this situation, we can type in the names V_0, V_1, etc., for the local variables (V_0 = local 0, V_1 = local 1, etc.) and A_0, A_1, etc. for the arguments. VS.NET will automatically match these pseudo-names up to the appropriate variable or argument. The following screenshot shows the situation after the first stloc.0 command in the Max() method has been executed, with 34 and 25 input as the two numbers:

The two arguments have the correct values, and the first argument has just been stored in local variable 0.
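If you would rather see real names in the Watch window, you can name the arguments and locals in the IL source. The following is a minimal sketch of that idea, not the Max sample itself: the assembly name NamedMax, the names first, second, and result, and the body of Max() are all my own choices for illustration. Assuming the file is assembled with ilasm /debug, these names should be usable in place of A_0, A_1, and V_0.

```il
// Hypothetical variant of the Max sample with named arguments and locals.
.assembly extern mscorlib {}
.assembly NamedMax {}
.module NamedMax.exe

.method static int32 Max(int32 first, int32 second) cil managed
{
    .maxstack 2
    .locals init (int32 result)   // local 0 is now called "result"

    ldarg.0                       // push first
    stloc.0                       // result = first
    ldarg.1                       // push second
    ldloc.0                       // push result
    ble.s    Done                 // if second <= result, keep result
    ldarg.1                       // otherwise push second
    stloc.0                       // result = second
Done:
    ldloc.0                       // return result
    ret
}

.method static void Main() cil managed
{
    .entrypoint
    .maxstack 2
    ldc.i4.s 34                   // same inputs as in the walkthrough above
    ldc.i4.s 25
    call     int32 Max(int32, int32)
    call     void [mscorlib]System.Console::WriteLine(int32)
    ret
}
```

Naming affects only the source and the debugging information; the instructions still address arguments and locals by index, which is why ldarg.0 and stloc.0 are unchanged.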

Debugging IL Compiled From High-Level Languages

The above technique works fine for IL that you have written yourself, but there are times when you might have written some code in a high-level language that you need to debug. Unfortunately, the Disassembly window in VS.NET for high-level language projects shows the source code and the native assembly, but not the IL code - so if we want to debug at the IL level, we'll need to fool VS.NET into thinking that the IL is the source code. One way to do this is to take advantage of the ildasm.exe/ilasm.exe round-tripping facility (this will only work in C# and VB, not in C++, because round-tripping doesn't work for assemblies that contain unmanaged code). To do this, you simply compile the code as normal. Then once you have the compiled assembly, you can disassemble it using ildasm.exe:

 ildasm MyProject.exe /out:MyProject.il

Applying the /out flag to ildasm.exe causes it to disassemble the IL and send the output to the named file instead of running its normal user interface. Now that you have an IL source file, you can proceed as before: use ilasm to assemble the IL source specifying the /debug option, then start VS.NET specifying the executable as the solution.

Other Debuggers: CorDbg

Besides VS.NET, there are two other debuggers that you might wish to try out: DbgClr.exe and CorDbg.exe.

DbgClr.exe is essentially a cut-down version of the debugger that comes with VS.NET, though with a few modifications. DbgClr can be a convenient tool to use to debug IL code, but, compared to VS.NET, it has fewer features. We won't consider it further here. If you do want to try it out, you can find it in the FrameworkSDK\GuiDebug subfolder of your VS.NET installation folder.

CorDbg.exe is a command-line debugger designed to debug managed source code (either IL code or high-level language code). One thing that may make CorDbg.exe particularly interesting to advanced .NET developers is that it comes complete with its (unmanaged C++) source code, so if you are interested you can find out how it works, or use it as a basis for writing your own debuggers. You can find the source code in the Framework SDK\Tool Developers Guide subfolder of your VS.NET installation folder - or you can just run cordbg directly by typing in cordbg at the VS.NET command prompt. As far as ease of use goes, a command-line tool is never going to match the facilities offered by VS.NET, but CorDbg does have one plus: you can explicitly tell it to work in optimized mode, allowing you to debug optimized JIT-compiled code. We will use this facility in Chapter 6 to examine the optimized native assembly generated by the JIT compiler. We don't have space to go into full details of cordbg here, but I'll give you enough information to get you started.

To debug with cordbg, you need an assembly that has been compiled with the /debug option to generate a .pdb file, either from a high-level language or from IL assembly. Then you simply type in cordbg <AssemblyName>.exe at the command prompt.

 C:\AdvDotNet\ILIntro>cordbg max.exe
 Microsoft (R) Common Language Runtime Test Debugger Shell Version 1.0.3705.0
 Copyright (C) Microsoft Corporation 1998-2001. All rights reserved.
 (cordbg) run max.exe
 Process 3976/0xf88 created.
 Warning: couldn't load symbols for c:\windows\microsoft.net\framework\v1.0.3705\mscorlib.dll
 [thread 0x5ac] Thread created.
 020:   ldstr   "Input First number."

Cordbg will stop on the first instruction - notice from the above output that it has automatically used the .pdb file to locate the source code. Don't worry about the warning about not being able to load certain symbols. You'll always get that unless you've installed the debug version of the CLR, but it won't stop you debugging your own code.

You can show the surrounding source code with the sh command, specifying how many lines around the current location you wish to see:

 C:\AdvDotNet\ILIntro>cordbg max.exe
 Microsoft (R) Common Language Runtime Test Debugger Shell Version 1.0.3705.0
 Copyright (C) Microsoft Corporation 1998-2001. All rights reserved.
 (cordbg) run max.exe
 Process 3976/0xf88 created.
 Warning: couldn't load symbols for c:\windows\microsoft.net\framework\v1.0.3705\mscorlib.dll
 [thread 0x5ac] Thread created.
 (cordbg) sh 4
 016:   {
 017:   .maxstack 2
 018:   .locals init (int32, int32)
 019:   .entrypoint
 020:*  ldstr   "Input First number."
 021:   call    void [mscorlib]System.Console::WriteLine(string)
 022:   call    string [mscorlib]System.Console::ReadLine()
 023:   call    int32 [mscorlib]System.Int32::Parse(string)
 024:   stloc.0

The asterisk (*) indicates the current point of execution.

You can then set a breakpoint at the appropriate line using the br command, then use the go command to continue execution to a breakpoint:

 (cordbg) br 22
 Breakpoint #1 has bound to E:\IL\max.exe.
 #1      C:\AdvDotNet\ILIntro\max.il:22 Main+0xa(il) [active]
 (cordbg) go
 Input First number.
 break at #1     C:\AdvDotNet\ILIntro\max.il:22 Main+0xa(il) [active]
 022:   call string [mscorlib]System.Console::ReadLine()

You can also display the values of variables with the p command, or display the contents of machine registers with the reg command. However, we'll leave our introduction to cordbg there, since full details of the various cordbg commands can be found on MSDN.

Compile-Time Errors in IL

Debugging code is all well and good, but depends on your being able to run the program in the first place. In a high-level language, we would expect a program to run if there are no compile-time errors. But if you're used to coding in high-level languages then you'll find the types of build-time errors you get in IL are somewhat different, so it's worth spending a couple of pages discussing the categories of error you can get from IL source code.

Even in high-level languages, the categorization of build-time errors does depend to some extent on the language and environment. For example, C++ developers will be familiar with both compile and link errors. In a typical link error, the code is syntactically correct but refers to a method or class for which the correct library has not been made known to the compiler. The same type of error can occur in C# or VB.NET if you fail to reference a required assembly, but in this case this is considered a straight compile-time error, since these languages don't have separate compile and link phases (these compilers query assembly metadata for referenced types as they compile, whereas C++ simply throws in external references to be resolved later by the linker).

The types of errors that can occur in IL source code that will prevent the code from running are as follows (note that for IL code our definition of compile-time extends to JIT-compilation time):

Nature of error	Symptom	Usual Technique to Identify Error
Syntax error	ilasm.exe refuses to assemble the IL source code.	Examine the .il file for the error.
Invalid program	InvalidProgramException thrown as soon as the JIT compiler tries to compile the method that contains the error.	Run peverify on the binary assembly.
Unverifiable program	For fully trusted code there are no symptoms: the program will run fine. But if the code is being executed in certain partially trusted contexts (such as from the Internet or a network drive), the JIT compiler will throw an exception when it encounters the error. You can use peverify to check whether an assembly contains unverifiable code.	Run peverify on the binary assembly.

The table indicates that a tool called peverify is useful for helping to track down many of these errors. peverify.exe is a command-line program that we'll examine shortly. First I'll just review the three categories of errors listed above.

There is one other possible error condition that we haven't mentioned: the condition where one of the tokens in the instruction stream that refers to a method, field, or type, etc., turns out not to be a valid token that can be resolved. Because different implementations of the CLI may in principle find it more convenient to trap this error at different stages in the execution process, there are no specifications in the ECMA standard about when such errors should be caught. It may happen at JIT compilation time, or it may occur through an exception thrown when code actually tries to access that token. However, this type of error is very easy to understand, and we won't consider it further.

Syntax Errors

A syntax error is conceptually the simplest to understand. It occurs if there is something wrong with your IL source file that prevents ilasm.exe from understanding it. For example, the following code contains the non-existent instruction ldsomestr (which presumably should be ldstr):

    ldsomestr "Hello, World"   // error. Should be ldstr
    call      void [mscorlib]System.Console::WriteLine(string)
    ret

Syntax errors manifest themselves pretty clearly when you run ilasm, since ilasm will simply refuse to assemble the code.

Problems that will give rise to an error on assembly include (but are not limited to):

A non-existent IL instruction
Failing to provide an .entrypoint method in an executable assembly
Some problem with the structure of the file (such as omitting an opening or closing brace)

Invalid Program

An invalid program is an assembly that contains errors in the binary IL code that prevent the JIT compiler from being able to understand the code. The C#, VB, and C++ compilers have of course been thoroughly tested, and so should never emit an assembly that contains invalid code; however, if you are using a third-party compiler, an invalid program error might arise if the compiler is buggy.

What might surprise you, however, is that it is perfectly possible for ilasm.exe to generate invalid code. The reason is that ilasm only performs the minimum of checks that it needs to perform its task - in other words, that the .il file is structurally sound and all the commands in it can individually be understood. However, the IL definition also lays down various other requirements for code to be valid; in particular, it must be possible for the JIT compiler to determine the state of the stack at any given execution point. Not only that, but certain instructions (such as ret) require the stack to be in certain states. Checking these kinds of constraints is a non-trivial task, since it requires working through every possible path of execution of the code to see what types would be placed on the evaluation stack for each path, and ilasm.exe does not perform these checks. This means that if your code contains these kinds of error, then this will only be detected when you attempt to execute (or peverify, or ngen) the code.

The question of what constitutes an invalid program is a subtle one. We'll point out some of the ways this can happen now, but we'll postpone a full discussion of how the compiler checks that a program is valid until later in the book. For the time being, I just want to make sure you know what the different types of error are, in case you encounter them when writing IL assembly code.

The following code provides an example of a program that successfully assembles but is nevertheless invalid. This code is downloadable as the Invalid sample:

 .assembly extern mscorlib {}
 .assembly Invalid {}
 .module Invalid.exe
 .method static void Main() cil managed
 {
    .entrypoint
    .maxstack 2
    ldc.i4   47
    ldc.r8   32.4
    mul
    call     void [mscorlib]System.Console::WriteLine(int32)
    ldc.i4   52
    ret
 }

There are actually two problems in this code, either one of which is sufficient by itself to render the code invalid. In the first place, this code leaves a value on the evaluation stack when the method returns. In this code, the evaluation stack will be empty immediately after the call to Console.WriteLine(). We then load an integer onto it, and return. However, the IL definition requires that a method that returns void must leave an empty evaluation stack when it returns.

The other problem is that we have attempted to multiply an integer by a floating-point number. Look at the first instructions in the Main() method. We start by loading the value 47 as a four-byte integer onto the evaluation stack; then we load the value 32.4 as an eight-byte floating-point number, and we try to multiply these numbers together. Now the IL mul command is quite happy to perform either integer or floating-point arithmetic, but it must be presented with consistent data types. It cannot multiply one data type (an integer) by another data type (a float). And recall that, unlike some high-level languages, IL never performs any implicit conversions for you (other than widening or narrowing when loading one- or two-byte numeric types to or from the evaluation stack). If you want to cast data types, you must explicitly use a conv.* instruction.
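For comparison, here is one way the two problems could be repaired. This is a sketch under my own assumptions rather than a sample from the download: the assembly name Valid is hypothetical, and I have assumed that the intention was to print the product as a floating-point number, so the integer is converted with conv.r8 and the float64 overload of Console.WriteLine() is called. The stray ldc.i4 52 is simply dropped so that the stack is empty at the ret.

```il
// Hypothetical corrected counterpart of the Invalid sample.
.assembly extern mscorlib {}
.assembly Valid {}
.module Valid.exe

.method static void Main() cil managed
{
    .entrypoint
    .maxstack 2
    ldc.i4   47
    conv.r8                // explicit conversion: int32 -> float64
    ldc.r8   32.4
    mul                    // both operands are now floating point
    call     void [mscorlib]System.Console::WriteLine(float64)
    ret                    // evaluation stack is empty here
}
```

If you did want an integer result instead, the conversion would go the other way: apply conv.i4 to the floating-point operand (or to the product) and keep the int32 overload of WriteLine().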

If we try to run the invalid code presented above, you'll normally see a dialog offering a choice of debuggers.

There is little point using either of the debuggers listed in the dialog since they would not be able to show you any executable code as the JIT compiler hasn't actually managed to JIT-compile any of your code! So the easiest option is to click No, in which case you'll get a message that tells you roughly where the error occurred, but doesn't give you much more information:

 Unhandled Exception: System.InvalidProgramException: Common Language Runtime detected an invalid program.
   at Main()

Unfortunately the statement 'invalid program' doesn't help very much in identifying the error. If we want to gain more information about errors without resorting to debugging the code, the easiest way is to use the peverify tool, as I'll discuss soon.

Typical examples of problems that will give rise to invalid code include:

Performing an arithmetic operation on incompatible types
Code that places more items on the evaluation stack than was specified in the .maxstack directive
Leaving data of the wrong type on the evaluation stack when a method returns

Unverifiable Code

You'll no doubt be aware that the CLR imposes so-called type-safety checks, which are designed to identify code that might be able to perform dangerous operations by accessing memory outside the areas of memory specifically allocated to store that program's data. Unverifiable code is code that fails the CLR's type-safety checks. We will discuss type safety and the verification algorithms in more detail in Chapter 3, but for now we'll just point out that a program will fail type safety if it contains certain potentially dangerous IL instructions or sequences of IL instructions.

Typical examples of issues that will cause your code to fail verifiability include:

Attempting to pass the wrong types to a method call
Using certain IL opcodes that are formally regarded by the CLR as unverifiable, or which are regarded as unverifiable in certain conditions, when those conditions are true
Any instruction that treats an item on the evaluation stack as if it were the wrong type
Any code that de-references unmanaged pointers
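To make the last item in the list concrete, the following sketch de-references an unmanaged pointer. It is my own illustration, not one of the book's samples: the assembly name Unverifiable and the local name number are hypothetical. The code takes the address of a local, converts the managed pointer to a native unsigned integer with conv.u, and then reads through it with ldind.i4. I would expect this to assemble and to run under full trust, but peverify should flag the ldind.i4 because the item on the stack is no longer a managed pointer.

```il
// Hypothetical sample: valid IL that is expected to fail verification.
.assembly extern mscorlib {}
.assembly Unverifiable {}
.module Unverifiable.exe

.method static void Main() cil managed
{
    .entrypoint
    .maxstack 1
    .locals init (int32 number)
    ldc.i4   47
    stloc.0                // number = 47
    ldloca.s number        // managed pointer to the local
    conv.u                 // now just a native unsigned int: an unmanaged pointer
    ldind.i4               // read an int32 through the unmanaged pointer
    call     void [mscorlib]System.Console::WriteLine(int32)
    ret
}
```

Removing the conv.u instruction leaves a managed pointer on the stack for ldind.i4, which is the verifiable way to read the same value.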

Obviously, I've made sure that the IL samples I've written for this book (other than the Invalid sample) are verifiable. The VB compiler will always generate verifiable code, while the C# compiler will always generate verifiable code unless you have declared any C# unsafe blocks. The C++ compiler at the time of writing cannot generate verifiable code. Code written in C++ will always fail verifiability.

The interesting thing is that, although failing type safety is formally considered an error, the JIT compiler is still able in principle to convert the IL to native executable, and the code will still execute provided it has the security permission SkipVerification. That is why the table I presented earlier indicated that there might not be any symptoms for unverifiable code. Indeed, there may on occasions be reasons why you want your code to be unverifiable (for example if some operation can be performed more efficiently than would have been the case using only verifiable code). Precisely which code has the SkipVerification permission will depend on your security policy. By default when you install .NET, assemblies run from your local machine do have this permission, but assemblies run from the Internet or a network share do not. We'll examine those issues in more detail in Chapter 12.

If you do attempt to execute code that does not have SkipVerification permission, and which fails verifiability, then you'll see something like this:

 Unhandled Exception: System.Security.VerificationException: Operation could destabilize the runtime.
   at Main()

Using Peverify

peverify has been described as the compiler-writer's best friend. It is a tool that examines an assembly and will report any problems that would cause the code either to be invalid or to fail verifiability. The great thing about it is that it will report all such errors - it doesn't terminate after it hits the first error, unlike the JIT compiler. The two main uses of peverify are to check code that you've written in IL and assembled using ilasm.exe, and for developers writing compilers, who can use peverify to check that their compiler is emitting verifiable code.

It's very simple to run peverify - you simply type peverify followed by the name of the assembly to be checked. If we run peverify on the invalid.exe sample, this is the result:

 C:\AdvDotNet\ILIntro>peverify invalid.exe
 Microsoft (R) .NET Framework PE Verifier  Version 1.0.3705.0
 Copyright (C) Microsoft Corporation 1998-2001. All rights reserved.
 [IL]: Error: [C:\AdvDotNet\ILIntro\invalid.exe : <Module>::Main] [offset 0x0000000E] [opcode mul] Int32 Double Non-compatible types on the stack.
 [IL]: Error: [C:\AdvDotNet\ILIntro\invalid.exe : <Module>::Main] [offset 0x00000019] [opcode ret] Stack must be empty on return from a void function.
 2 Errors Verifying invalid.exe

If you write your own IL source code, it's a good idea to always run peverify on the generated assembly.