Intermediate Language (IL) | Professional .NET Framework 2.0 (Programmer to Programmer)

Common Intermediate Language (a.k.a. CIL, or more commonly just IL) is the lingua franca of the CLR. Managed programs and libraries are comprised of metadata whose job it is to (1) describe the physical and logical CTS components and abstractions of your code, and (2) represent the code that comprises those components in a way that the CLR may inspect and deeply understand, the latter of which utilizes IL. IL is an assembly-like language into which all managed languages are compiled. For example, the compilers for C#, VB, and C++/CLI transform the source language into metadata, which contains IL instructions to represent the executable portion.

The CLR understands this metadata and IL, and uses the information contained within it to load and run the program's code. Execution does not happen by interpreting the IL at runtime; rather, it occurs by compiling the IL into native code and executing that instead. By default, this compilation occurs lazily at runtime when the code is needed, hence the name Just in Time (JIT); alternatively, you can generate native images ahead of time using a technology called NGen, saving the initial cost of jitting the code and enabling static optimization of the layout of native images. We will see a bit more on these technologies shortly and in detail in Chapter 4.

Example IL: "Hello, World!"

To illustrate how metadata and IL represent a managed program written in C#, consider a very small snippet of C# code:

 using System;
 class Program
 {
     static void Main()
     {
         Console.WriteLine("Hello, World!");
     }
 }

This is a canonical "Hello, World!" example, which, when run, simply prints the text Hello, World! to the console standard output. The C# compiler compiles the above program into the binary form of the following textual IL:

 .assembly HelloWorld {}
 .assembly extern mscorlib {}
 .class Program extends [mscorlib]System.Object
 {
     .method static void Main() cil managed
     {
         .entrypoint
         .maxstack 1
         ldstr "Hello, World!"
         call void [mscorlib]System.Console::WriteLine(string)
         ret
     }
 }

The textual IL format used here is an easier-to-read representation of the actual binary format in which compiler-emitted code is stored. Of course, the C# compiler emits and the CLR executes binary-formatted IL. We will use the text format in this chapter for illustrative purposes. Seldom is it interesting to stop to consider the binary layout of assemblies, although we will note interesting aspects as appropriate.

Deconstructing the Example

There's a bit of information in the sample IL shown above. There are two kinds of .assembly directives — one to define the target of compilation, the HelloWorld assembly, and the other to indicate that our program depends on the external library mscorlib. mscorlib defines all of the core data types that nearly all managed programs depend on; we will assume a basic familiarity with it throughout the book, but detail these data types in Chapter 5.

There are also .class and .method directives in the IL, whose job it is to identify the CTS abstractions in our program; if we had created any interfaces, fields, properties, or other types of abstractions, you'd see those in their respective locations, too. These bits of metadata describe to the CLR the structure of and operations exposed by data types, and are used by compilers to determine what legal programs they can create at compile time.

Inside the .method directive, you will find a few additional directives, for example an .entrypoint, indicating the CLR loader should begin execution with this method (when dealing with an EXE), and .maxstack, indicating the maximum number of items that the evaluation stack will ever contain during execution of this function. Each directive can have a number of arguments and pseudo-custom attributes (keywords) associated with them. But they are not actual IL instructions representing code; they are bits of metadata representing the components which compose our program.

Everything else inside the .method directive's block is the actual IL implementing our program's executable behavior. The method body consists of three statements, each of which involves an instruction (sometimes called an opcode): ldstr, call, and ret. Each instruction consumes and produces some state on the execution stack, and can optionally take a set of input arguments. Instructions differ in the stack state and arguments with which they work, and in the side effects (if any) that result from their execution. When run, the CLR will compile this code into its jitted native code counterpart, and execute that. Any references to other DLLs will result in loading them and jitting their code as needed.

Assembling and Disassembling IL

The ilasm.exe and ildasm.exe tools that ship with the Framework (ilasm.exe in the redist, ildasm.exe in the SDK) enable you to compile textual IL into its binary .NET assembly format and disassemble .NET assemblies into textual IL, respectively. They are indispensable tools for understanding the inner workings of the CLR and are great companions to this chapter.

Stack-Based Abstract Machine

IL is stack-based. Programs in the language work by pushing and popping operands onto and off the stack; each instruction is defined by the way it transforms the contents of the stack. Many instructions involve side effects and may take additional arguments, too, but the conventional way to communicate into and out of an instruction is via the stack. This is in contrast to many physical machines whose execution relies on a combination of register and stack manipulation. The stack is sometimes further qualified by calling it the logical execution stack in order to differentiate it from the physical stack, a segment of memory managed by the OS and used for method calling.

Example IL: The Add Instruction

To illustrate the stack-based nature of IL, consider the following example. It uses the IL add instruction. add pops two numbers off the stack and pushes a single number back on top, representing the result of adding the two popped numbers. We often describe instructions by the transformations they perform on the current stack state, called a stack transition diagram. This highlights the stack state the instruction expects and what state it leaves behind after execution.

add's stack transition is: ..., op1, op2 → ..., sum. To the left of the arrow is the state of the evaluation stack prior to executing the instruction (in this case, two numbers op1 and op2), and to the right is the state after execution (in this case, the number sum). The top of the stack is the rightmost value on either side of the arrow, and the ... indicates that zero or more items can already exist on the stack, that they are uninteresting to the instruction, and that the instruction leaves them unchanged.

The following IL might be emitted by a high-level compiler for the statement 3 + 5, for example:

 ldc.i4 3
 ldc.i4 5
 add

This sequence of IL starts off by loading two integers, 3 and 5, onto the stack (using the ldc instruction, more on that later). The add instruction then is invoked; internally, it pops 3 and 5 off the stack, adds them, and then leaves behind the result, 8. This transformation can be written as ..., 3, 5 → ..., 8, and is graphically depicted in Figure 3-1. Usually a real program would do some loading of fields or calling methods to obtain the numeric values, followed by a store to some memory location, such as another local or field variable.

Figure 3-1: Example of stack-based addition (3 + 5).
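
The stack transition for add can be modeled in a few lines. What follows is a Python sketch of the logical evaluation stack only, not a description of how the CLR executes IL. The run function and the tuple-based program format are illustrative assumptions, and int32 overflow is not modeled.

```python
def run(program):
    """Execute (opcode, operand) pairs against a logical evaluation stack."""
    stack = []
    for opcode, operand in program:
        if opcode == "ldc.i4":
            # ... -> ..., num
            stack.append(operand)
        elif opcode == "add":
            # ..., op1, op2 -> ..., sum
            op2 = stack.pop()
            op1 = stack.pop()
            stack.append(op1 + op2)
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    return stack


# The IL sequence for 3 + 5 shown above.
result = run([("ldc.i4", 3), ("ldc.i4", 5), ("add", None)])
print(result)  # [8]
```

Whatever was on the stack before the sequence ran would still sit beneath the 8 afterward, which is what the leading ... in the transition expresses.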

While the execution stack is a nice abstraction, the JIT compiled native code is more efficient. If it knows the address of a value — for example, if it's relative to the current stack pointer — there will be as little copying of the arguments as possible. For instance, the add IL instruction will likely move the values into the respective registers and perform an add instruction in the underlying physical machine's instruction set. It might omit a store to another location if it uses the value right away and knows it won't need it later on. The final result differs based on the instruction and undocumented implementation details, of course, but the high-level point is that the stack is a logical representation, not the physical representation. The JIT manages physical representation for you.

Some instructions take arguments in addition to reading values on the stack. For example, many of the constant-loading instructions require that you pass an argument representing the literal constant to load onto the stack. Similarly, many instructions take integer arguments that represent metadata tokens. The call instruction is a perfect example. Notice that in the "Hello, World!" example above, we passed the method reference (methodref) void [mscorlib]System.Console::WriteLine(string), which actually compiles into an integer token in the binary format. The instruction uses this information, but it is not pushed onto or popped off of the stack; it is passed directly as an argument.

Register-Based Machines

Stack-based machines are much easier for programmers to reason about than register-based machines. They enable the programmer and compiler writer to think at a higher level and in terms of a single storage abstraction. Each instruction can be defined simply by the arguments and stack operands it consumes, and the output that is left on the stack afterward. Less context is required to rationalize the state of the world.

Register-based machines, on the other hand, are often more complex. This is primarily due to the plethora of implementation details that must be considered when emitting code, much like explicit memory management in many C-based languages. For illustration purposes, consider a few such complications:

There is only a finite number of registers on a machine, which means that code must assign and manage them intelligently. Management of registers is a topic whose study can span an entire advanced undergraduate Computer Science course and is not easy to do correctly. For example, if we run out of registers, we might have to use the machine's stack (if one exists and if we've reserved enough space) or write back to main memory. Conversely, the logical stack is infinite and is managed by the CLR (even the case when we run out of stack space).
Instructions for register-based machines often use different registers for input and output. One instruction might read from R0 and R1 and store its result back in R0, whereas another might read from R0 …R3, yet store its result in R4, for example, leaving the input registers unchanged. The intricacies must be understood well by compiler authors and are subject to differ between physical machines. All IL instructions, on the other hand, always work with the top n elements of the stack and modify it in some well-defined way.
Different processors offer more or fewer registers than others and have subtle semantic and structural differences in the instruction set. Using as many registers as possible at any given moment is paramount to achieving efficiency. Unfortunately, if managed compilers generated code that tries to intelligently manage registers, it would complicate the CLR's capability to optimize for the target machine. And it's highly unlikely compiler authors would do better than the JIT Compiler does today.

With that said, a simple fact of life is that most target machines do use registers. The JIT Compiler takes care of optimizing and managing the use of these registers, using a combination of register and the machine's stack to store and share items on the logical IL stack. Abstracting away this problem through the use of a stack enables the CLR to more efficiently manage and optimize storage.

Binary Instruction Size

Most instructions take up 1 byte worth of space in binary IL. Some instructions take up 2 bytes, however, because a single byte offers only 256 possible encodings; these are indicated by a special leading marker byte 0xFE. Many instructions also take arguments serialized in the instruction stream in addition to the inputs on the stack, consuming even more space. This topic is mostly uninteresting to managed code developers but can be useful for compiler authors.

As an example, br is encoded as the single-byte opcode 0x38 followed by a 4-byte target jump offset. Thus, a single br instruction will take up 5 (1 + 4) bytes total in the IL body. To combat code bloat, many instructions offer short form variants to save on space; this is particularly true with instructions whose ordinary range of input is smaller than the maximum it can accept. For example, br.s is a variant of br that takes a single-byte target instead of a 4-byte one, meaning that it can squeeze into 2 bytes of total space. Nearly all branches are to sections of code close to the jump itself, meaning that br.s can be used for most scenarios. All good compilers optimize this usage when possible.

Consider the add sequence we saw above. As shown above (using ldc.i4 with arguments), it would consume 11 bytes to encode the IL. That's because the generic ldc.i4 instruction consumes 1 byte of space, plus 4 additional bytes for the argument (meaning 5 bytes each). However, an intelligent compiler can optimize this sequence using the shorthand ldc.i4.3 and ldc.i4.5 instructions. Each consumes only a single byte, takes no argument, and compresses the total IL stream to only 3 bytes. The resulting program looks as follows:

 ldc.i4.3
 ldc.i4.5
 add

The IL byte encoding for both is shown in Figure 3-2.

Figure 3-2: IL stream representation of add programs.
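
The byte counts for the two add programs can be checked by building both streams by hand. This is a Python sketch rather than real assembler output. The opcode values (0x20 for ldc.i4, 0x19 for ldc.i4.3, 0x1B for ldc.i4.5, 0x58 for add) and the little-endian operand layout are taken from the ECMA-335 opcode table, not from the text above, and the helper names are hypothetical.

```python
import struct

# Opcode bytes as listed in the ECMA-335 opcode table (assumption: not given in the text).
LDC_I4 = 0x20
LDC_I4_3 = 0x19
LDC_I4_5 = 0x1B
ADD = 0x58


def ldc_i4(num):
    """Encode the general form: 1 opcode byte plus a 4-byte little-endian int32."""
    return bytes([LDC_I4]) + struct.pack("<i", num)


def hex_dump(data):
    return " ".join(f"{b:02x}" for b in data)


long_form = ldc_i4(3) + ldc_i4(5) + bytes([ADD])
short_form = bytes([LDC_I4_3, LDC_I4_5, ADD])

print(len(long_form), hex_dump(long_form))    # 11 20 03 00 00 00 20 05 00 00 00 58
print(len(short_form), hex_dump(short_form))  # 3 19 1b 58
```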

A Word on Type Tracking

IL is a typed language. Each instruction consumes and produces state of well-defined types, often dependent on the values of its arguments (e.g., those that accept a method or type token). In isolation, an instruction might not have a type, but when combined with legal sequences of IL, it does. The verifier is responsible for tracking such things to prove type safety (or the absence thereof). It will detect and report any violations of the type system rules.

peverify.exe is a useful tool in the .NET Framework SDK that permits you to inspect the verifiability of an assembly's code. Running it against a managed assembly will report violations, along with the guilty line numbers, of any of the CLR's type safety rules. This utility is a compiler writer's best friend. If you're a user of somebody else's compiler, you can use this to determine whether a program crash is due to a compiler bug (or you intentionally stepping outside of the bounds of the CLR's type system, for example using C++/CLI). Verifiability and the general notion of type safety were both discussed in more detail in Chapter 2.

Exploring the Instruction Set

There are over 200 IL instructions available. Most managed code developers can get by without deep knowledge of IL, still becoming quite productive in a higher-level language such as C# or VB. But a firm understanding of some of the most important instructions can prove instrumental in understanding how the CLR operates. With this rationale in mind, this section will walk through some important categories of instructions. Feel free to skim through it. My recommendation, of course, is to try to understand the details of most, as it will help you to understand how the platform is executing your code. Because of the large number of instructions, some will intentionally be omitted here; please refer to Appendix A for a complete guide to the entire instruction set.

Loading and Storing Values

To perform any interesting operation, we first have to get some data onto the stack. There are a few pieces of data you might want to load, including constant values, data local to a method activation frame such as locals and arguments, fields stored in your program's objects, and various bits of metadata held by the runtime. The reverse is also important. That is, once you've manipulated the data on the stack, you will often want to store it somewhere so that it can be accessed later on. This section explores the mechanisms the CLR provides to do both.

It's also worth considering for a moment what it means to load and store something onto or from the stack. A core difference between reference and value types is the way values are copied around as opaque bytes instead of object references to the GC heap. Loading a value onto the stack results in a bitwise copy of the value's contents; this means that if you are loading from a local slot, a field reference, or some other location, any updates to the structure are not visible unless the original location is updated with them. Objects, however, are always accessed through a reference. So, for example, if an object is modified through a reference loaded onto the stack, the reference itself needn't be saved back to its original location; all accesses occurred via a reference to the shared object on the heap.

As an example, consider this code:

 ldarg.0 // load the 'this' pointer
 ldfld int32 MyType::myField
 ldstr "Some string object"

Loading an object's field of type int32 and a string literal will result in two very different things on the stack: a sequence of 32 bits representing the value of the integer, and a sequence of 32 or 64 bits (depending on whether you're on a 32-bit or 64-bit machine) representing the value of the object's reference, or 0 to represent null. If the value type itself had updatable fields, it would have to be stored back into myField after these updates; conversely, if an object was used instead, it would not need to be restored because the updates occurred through the reference to the shared object.

Seldom do you need to think about IL at this low level; your compiler does it all for you. The C# compiler, for example, ensures that the address of the value is used for modifications when possible, for example using the ld*a instructions. But understanding this point will help to solidify your understanding of reference and value types.
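
The copy-versus-reference distinction can be imitated outside the CLR. The Python sketch below is only an analogy: Pair, load_value, and load_reference are hypothetical names, and a one-element list stands in for a storage location such as a local slot or a field.

```python
import copy
from dataclasses import dataclass


@dataclass
class Pair:
    """Stand-in for a type with one updatable field."""
    first: int


def load_value(slot):
    # Loading a value: the stack receives a copy of the contents.
    return copy.copy(slot[0])


def load_reference(slot):
    # Loading an object: the stack receives a reference to the shared instance.
    return slot[0]


value_slot = [Pair(1)]   # models a location holding a value-type instance
object_slot = [Pair(1)]  # models a location holding an object reference

on_stack = load_value(value_slot)
on_stack.first = 42
value_before_store = value_slot[0].first  # still 1: only the copy changed
value_slot[0] = on_stack                  # explicit store back to the location
value_after_store = value_slot[0].first   # now 42

load_reference(object_slot).first = 42
object_after_update = object_slot[0].first  # 42 without any store back

print(value_before_store, value_after_store, object_after_update)  # 1 42 42
```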

Constants

We've already seen a couple of instances of constant usage. The "Hello, World!" example above loaded a literal string using the ldstr instruction, and the addition example loaded 4-byte integers using two variants of the ldc instruction. For obvious reasons, the discussion of constants involves only loads and no corresponding store instruction(s). The various types and their interesting members mentioned only in passing here, such as String, Int32, Double, and so forth, are described in Chapter 5.

Strings are simple data structures representing self-describing sequences of characters. ldstr loads a reference to one onto the stack. It takes an argument representing the string to load onto the stack in the form of a metadata token into the assembly's string table, which must be generated by the compiler and is composed of all of the unique strings in a binary. Executing ldstr doesn't modify any prior stack state and simply pushes a reference to the heap-allocated String object onto the existing stack. It allocates memory as necessary. Because of string interning, two ldstrs in the same program that use strings with identical characters will be shared. For example, this code:

 ldstr "Some random string"
 ldstr "Some random string"
 ceq

will evaluate to true (ceq leaves 1 on the stack), indicating that both references point to the same String object. Even if the two ldstr instructions take place in entirely different components of the program, the references will be identical.

Similarly, the ldc instruction loads a numeric constant onto the stack and offers a number of variants based on the data type. The table below shows each. The convenient shorthand instructions exist for common constants, helping to make the size of programs smaller. We already saw an example above of how this can be used to reduce IL footprint.


Instruction	Argument	Description
ldc.i4	num (int32)	Pushes the num argument onto the stack as a 4-byte integer, that is int32.
ldc.i4.s	num (int8)	Pushes the num argument (which is a single signed byte) onto the stack as an int32. This shortens the number of bytes an ldc.i4 instruction and its argument consume.
ldc.i4.0…ldc.i4.8	n/a	Pushes num in ldc.i4.num onto the stack as an int32. There is a version offered for numbers 0 through 8.
ldc.i4.m1	n/a	Pushes -1 onto the stack as an int32.
ldc.i8	num (int64)	Pushes the num argument onto the stack as an 8-byte integer, that is, int64.
ldc.r4	num (float32)	Pushes the num argument onto the stack as a CLR floating point value.
ldc.r8	num (float64)	Pushes the num argument onto the stack as a CLR floating point value.

Lastly, the null constant can be used in place of object references, an instance of which can be loaded onto the stack using the ldnull instruction.
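
A compiler choosing among the ldc.i4 variants in the table might use logic along these lines. This is a Python sketch with a hypothetical function name. It assumes that the ldc.i4.s operand is a signed byte, as ECMA-335 defines it, and that each short opcode occupies a single byte.

```python
def shortest_ldc_i4(num):
    """Return (instruction text, encoded size in bytes) for loading an int32 constant."""
    if num == -1:
        return ("ldc.i4.m1", 1)        # dedicated opcode, no argument
    if 0 <= num <= 8:
        return (f"ldc.i4.{num}", 1)    # dedicated opcodes for 0 through 8
    if -128 <= num <= 127:
        return (f"ldc.i4.s {num}", 2)  # opcode + 1-byte signed argument
    return (f"ldc.i4 {num}", 5)        # opcode + 4-byte argument


for constant in (5, -1, 100, 1000):
    print(constant, shortest_ldc_i4(constant))
```

For the constants 3 and 5 used earlier, this selects the two single-byte forms ldc.i4.3 and ldc.i4.5.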

Arguments and Locals

When a function is called, an activation frame is logically constructed that contains all arguments supplied for the function's formal parameters in addition to slots for all local data allocations. This frame is allocated by the JIT compiled code on the OS stack. Rather than referencing offsets into the physical stack in order to access arguments and locals — which is precisely what the native code generated must do — we simply refer to individual items by their 0-based sequence number.

Much like the ldc instruction discussed above, both the ldarg and ldloc instructions have shorthand variants. They each have an ordinary ldarg or ldloc form, which accepts an unsigned int16 as an argument representing the index of the item to load. Similarly, they have ldarg.s and ldloc.s versions, each of which takes a single-byte unsigned integer (instead of a 2-byte one), saving some space in cases where the index is 255 or less, which is highly likely. Lastly, each has a shorter ldarg.num and ldloc.num version, where num can be from 0 to 3, avoiding the need to pass an argument altogether. Note that for instance methods, the ldarg.0 instruction will load the this pointer, a reference to the target of the invocation.

Of course, both ldarg and ldloc have counterparts that store some state on the stack into a storage location: starg and stloc. Both offer single-byte-index short forms (starg.s and stloc.s), and stloc additionally has stloc.num shortcuts, where num, once again, is an integer from 0 to 3; starg has no such numbered shortcuts. These instructions pop off the top of the stack and store it in a target location. For this operation to be verifiable, clearly the top of the stack must be of a type compatible with the storage destination.
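
The way arguments and locals are addressed by 0-based index can be modeled as follows. This is a Python sketch of a logical activation frame. The run_method function and its program format are illustrative assumptions, and no type checking or verification is attempted.

```python
def run_method(program, args, local_count):
    """Run (opcode, operand) pairs with an argument list and indexed local slots."""
    stack = []
    local_slots = [0] * local_count  # locals start out zeroed in this model
    for opcode, operand in program:
        if opcode == "ldarg":
            stack.append(args[operand])
        elif opcode == "ldloc":
            stack.append(local_slots[operand])
        elif opcode == "stloc":
            local_slots[operand] = stack.pop()  # pops the top of the stack
        elif opcode == "add":
            op2 = stack.pop()
            op1 = stack.pop()
            stack.append(op1 + op2)
        elif opcode == "ret":
            return stack.pop() if stack else None
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    return None


# Roughly: int sum = a + b; return sum;
program = [
    ("ldarg", 0),
    ("ldarg", 1),
    ("add", None),
    ("stloc", 0),
    ("ldloc", 0),
    ("ret", None),
]
print(run_method(program, [3, 5], local_count=1))  # 8
```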

The CLR supports so-called byrefs. A byref enables you to pass the address of data in an activation frame — either an argument or local — to other functions. This ability is supported only so long as the original frame in which they reside is still active when they are accessed. To share data that survives a frame, you must use the GC heap. To support the byref feature, there exist ldarga, ldarga.s, ldloca, and ldloca.s instructions. Each takes an index argument (the .s versions take only a single byte) and will push a managed pointer to the desired item in the activation frame. Refer to details later in this section for storing to an address through a managed pointer and coverage in Chapter 2 of the method passing mechanisms supported in the CTS.

Fields

Accessing the fields of objects or values is a common operation. For this, there are two instructions: ldfld and ldsfld. The former is used for instance fields and thus expects a target on the top of the stack in the form of an object reference, value, or a pointer; the latter is for static fields, and therefore does not need a target. (Technically, you can pass a null as the object reference to ldfld and use it to access a static field, but ldsfld was developed to avoid the additional ldnull instruction.) Each takes a field token defining which field we are accessing. The result of executing the instruction is that the value of the field remains on the top of the stack. In other words, ldfld's stack transition is ..., target → ..., value, and ldsfld's is ... → ..., value.

You can also use the stfld and stsfld instructions to store values on the top of the stack into instance and static fields. stfld expects two things on the stack (the target object and the value) and, just like the load version, accepts a single argument, a field token. Its stack transition is ..., target, value → .... stsfld only expects a single item on the stack (the new value to store) and similarly takes a token as input. Its stack transition is ..., value → ....

These instructions take into account the accessibility of the target field, so attempting to access a private field from another class, for example, will result in a FieldAccessException. Similarly, if you attempt to access a nonexistent field, a MissingFieldException will be thrown.

Lastly, much like the ability to load an address for local variables described above, you can load the address to a field. ldflda works much like ldfld does, except that instead of loading the value of the field, it loads the address to the field in the form of a managed or native pointer (depending on the type the field refers to). A static-field instruction is also provided, named ldsflda.

Indirect Loads and Stores

We've seen a few ways to load pointers to data rather than the data itself. For example, ldarga can be used to load an argument's address for byref scenarios, and ldflda can be used to refer to an object's field. What if you wanted to use that address to access or manipulate a data structure's contents? The ldind and stind instructions do precisely that, standing for "load indirect" and "store indirect," respectively. They expect a managed or native pointer on the stack that refers to the target, and dereference it in order to perform the load or store.

ldind and stind both have subtle variations depending on the type of data being accessed. Verifiability ensures that you only use the correct variants when accessing specific types of components in the CLR. These variations are specified using a .<type> after the instruction, that is, ldind.<type> and stind.<type>, where <type> indicates the type of the data and is one of the following values: i1 (int8), i2 (int16), i4 (int32), i8 (int64), r4 (float32), r8 (float64), i (native int), ref (object reference). ldind also permits the values u1 (unsigned int8), u2 (unsigned int16), u4 (unsigned int32), u8 (unsigned int64); stind performs the necessary coercions to the value on the stack to store in an unsigned target.

Basic Operations

Some basic operations are provided that all modern instruction sets must provide. These include arithmetic, bitwise, and comparison operations. Because of their simplicity, general purposefulness, and elementary nature, we'll only mention them in passing:

Arithmetic: Addition (add), subtraction (sub), multiplication (mul), division (div). There are also various overflow and unsigned variants of these instructions. Each pops the two top items off the stack, performs the arithmetic operation on them, and then pushes the result back onto the stack. The remainder (rem) instruction computes the remainder resulting from the division of items on the top of the stack, also called modulus (e.g., % in C#). These instructions work with both integral and floating point values. The neg instruction pops a single number off the stack, and pushes back its negation.
Bitwise operations: Binary and, or, xor and unary not. There are also shift operations for shifting left (shl) and right (with sign propagation [shr] and without [shr.un]).
Comparisons: Compare equal (ceq), compare greater than (cgt), compare less than (clt). Each of these pops two items off the top of the stack, and leaves behind either 1 or 0 to indicate that the condition was true or false, respectively. You'll see shortly that the various branch instructions offer convenient ways to perform things like greater than or equal to checks.

Please refer to Appendix A for more complete coverage of these instructions.
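
The comparison instructions can be modeled in the same way as add. The Python sketch below uses hypothetical names and handles integers only. It also shows one way to express a greater-than-or-equal check without a dedicated instruction, by comparing the result of clt with 0.

```python
def run(program):
    """Execute (opcode, operand) pairs; comparisons leave 1 or 0 on the stack."""
    stack = []
    for opcode, operand in program:
        if opcode == "ldc.i4":
            stack.append(operand)
        elif opcode in ("ceq", "cgt", "clt"):
            value2 = stack.pop()
            value1 = stack.pop()
            if opcode == "ceq":
                outcome = value1 == value2
            elif opcode == "cgt":
                outcome = value1 > value2
            else:
                outcome = value1 < value2
            stack.append(1 if outcome else 0)
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    return stack


def greater_or_equal(a, b):
    # a >= b computed as "not (a < b)": clt, then compare that result with 0.
    program = [("ldc.i4", a), ("ldc.i4", b), ("clt", None), ("ldc.i4", 0), ("ceq", None)]
    return run(program)[0]


print(run([("ldc.i4", 3), ("ldc.i4", 5), ("clt", None)]))  # [1]
print(greater_or_equal(7, 7), greater_or_equal(3, 5))      # 1 0
```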

Control Flow and Labels

All control flow in IL utilizes branch instructions, of which several variants exist. Each branch instruction takes a destination argument indicated by a signed offset from the instruction following the branch. Each branch-style instruction has an ordinary version that takes a 4-byte signed integer for its destination, and also a short version (suffixed by .s) that takes only a single-byte signed integer. In most cases, the branch uses a predicate based on the top of the stack to determine whether the branch occurs or not. If it doesn't occur, control falls through the instruction.

The simplest of all of these is an unconditional branch, represented by the br instruction. For example, an infinite while loop might look like either of these programs (each line is a separate program):

 br.s -2         // offset version
 LOOP: br.s LOOP // label version

Because the branch target is calculated from the point immediately after the branch instruction, we jump backward 2 bytes (br.s -2 takes up 2 bytes). If it were a br instead of br.s, it would be 5 bytes. And then the CLR executes the br.s instruction again, ad infinitum.

Labels are often used in textual IL to make calculation of offsets like this easier. In reality, they are just an easier-to-work-with notation which compilers patch up to an offset in the resulting binary IL. For example, ilasm.exe will transform any references to labels into binary offsets in the resulting code. The second line is an example of a label version of this loop, using LOOP as the label we jump to; its binary encoding is identical to the first line.
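
The patching of labels into offsets can be sketched as a two-pass process. This is a Python illustration, not the algorithm ilasm.exe actually uses. The resolve_labels function, the tuple format, and the small SIZES table (covering only br, br.s, and nop) are assumptions made for the example.

```python
# Encoded size in bytes of each instruction, including its operand.
SIZES = {"br": 5, "br.s": 2, "nop": 1}


def resolve_labels(lines):
    """lines: (label or None, opcode, target label or None) tuples.

    Returns (opcode, offset or None) pairs, where each offset is measured from
    the instruction following the branch.
    """
    # Pass 1: record the byte position of every instruction and label.
    label_positions = {}
    starts = []
    position = 0
    for label, opcode, _ in lines:
        if label is not None:
            label_positions[label] = position
        starts.append(position)
        position += SIZES[opcode]

    # Pass 2: replace each label reference with a relative offset.
    resolved = []
    for (_, opcode, target), start in zip(lines, starts):
        if target is None:
            resolved.append((opcode, None))
        else:
            next_instruction = start + SIZES[opcode]
            resolved.append((opcode, label_positions[target] - next_instruction))
    return resolved


print(resolve_labels([("LOOP", "br.s", "LOOP")]))  # [('br.s', -2)]
print(resolve_labels([("LOOP", "br", "LOOP")]))    # [('br', -5)]
```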

There are also brtrue (or its alias brinst) and brfalse (or one of its aliases, brnull, brzero) instructions, which take a single value on the top of the stack and jump if it is true or false, respectively. These can be used to implement a C# if statement, for example:

 Foo f = /*...*/;
 if (f.SomeMethod())
 {
     // Body of code if true.
 }
 else
 {
     // Body of code if false.
 }
 // Code after if stmt.

Using labels, this could be compiled to the following IL:

 ldloc.0 // Assume 'f' is stored as a local in slot #0.
 call instance bool Foo::SomeMethod()
 brfalse FALSE
 // Body of code if true.
 br.s AFTER
 FALSE:
 // Body of code if false.
 AFTER:
 // Code after if stmt.
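
The control flow of this if/else pattern can be traced with a small model. In the Python sketch below, the "note" pseudo-instruction, the run function, and the use of list indexes instead of byte offsets for labels are all simplifying assumptions. The call is replaced by a stand-in that pushes 1 or 0.

```python
def run(program, some_method_result):
    """Trace which bodies execute for a given result of the stand-in method call."""
    labels = {operand: index
              for index, (opcode, operand) in enumerate(program)
              if opcode == "label"}
    stack, trace, pc = [], [], 0
    while pc < len(program):
        opcode, operand = program[pc]
        pc += 1  # by default, control falls through to the next instruction
        if opcode == "call":
            stack.append(1 if some_method_result else 0)  # stand-in for Foo::SomeMethod()
        elif opcode == "brfalse":
            if stack.pop() == 0:
                pc = labels[operand]
        elif opcode == "br":
            pc = labels[operand]
        elif opcode == "note":
            trace.append(operand)
        elif opcode != "label":
            raise ValueError(f"unsupported opcode: {opcode}")
    return trace


PROGRAM = [
    ("call", "Foo::SomeMethod"),
    ("brfalse", "FALSE"),
    ("note", "true body"),
    ("br", "AFTER"),
    ("label", "FALSE"),
    ("note", "false body"),
    ("label", "AFTER"),
    ("note", "after if"),
]

print(run(PROGRAM, True))   # ['true body', 'after if']
print(run(PROGRAM, False))  # ['false body', 'after if']
```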

The remaining branch instructions are all very similar. They each take two items off the top of the stack, compare them in some fashion, and then branch to the target if the comparison holds. Their stack transition is ..., value1, value2 → ..., and they are branch on equal (beq), branch on not equal or unordered (bne.un), branch on greater than (bgt and bgt.un), branch on greater than or equal to (bge and bge.un), branch on less than (blt and blt.un), and branch on less than or equal to (ble and ble.un). Each of these has a short version, that is, br.s, brfalse.s, beq.s, and so forth, which can be used when the target of the jump fits in a single signed byte (an offset from -128 to 127 bytes) and, as expected, consumes less space.
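
Because the short forms carry a single-byte signed offset, the choice between the two encodings comes down to a range check. The Python sketch below uses a hypothetical helper name and assumes the offset is already known and measured from the instruction following the branch.

```python
def pick_branch_form(mnemonic, offset):
    """Return (instruction name, total encoded size in bytes) for a branch."""
    if -128 <= offset <= 127:
        return (mnemonic + ".s", 2)  # 1-byte opcode + 1-byte signed offset
    return (mnemonic, 5)             # 1-byte opcode + 4-byte signed offset


print(pick_branch_form("br", -2))       # ('br.s', 2)
print(pick_branch_form("beq", 127))     # ('beq.s', 2)
print(pick_branch_form("brfalse", 300)) # ('brfalse', 5)
```

In a real assembler the offset itself depends on the sizes chosen for the instructions between the branch and its target, so this check typically has to be repeated until the layout stops changing.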

Allocating and Initializing

In order for instances of reference and value types to be used from your program, they must first be allocated and initialized. The process differs depending on which type you are dealing with. Reference types are always initialized using the newobj instruction, which ends up invoking one of its class's constructors to initialize state. Value types, on the other hand, can use the initobj instruction instead, which zeroes out its state and avoids any constructor invocation overhead.

newobj takes an argument representing the constructor method token to invoke. It also expects n items on the stack, where n is the number of parameters the target constructor expects. In other words, the stack transition diagram is ..., arg1, ..., argN → ..., obj. It will allocate a new instance of the target type, zero out its state, invoke the constructor, passing the new instance as the this pointer and the constructor arguments from the stack, and then push the newly initialized instance onto the stack. In the case of value types, the bits are copied onto the stack, while reference types result in a managed reference to the object on the GC heap. This example snippet of code constructs a new System.Exception object:

    ldstr "A catastrophic failure has occurred!"
    newobj instance void [mscorlib]System.Exception::.ctor(string)
    // Right here we have a new Exception object on the stack.

initobj is useful for constructing new value types without invoking a constructor. It can also be used to set a location containing a reference type pointer to null, although the former is a much more common use. initobj expects a pointer on the top of the stack that refers to the destination to be initialized and takes a type metadata token as an argument representing the target's type.
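As a rough illustration of the two initialization paths, consider the C# sketch below. The Point struct and InitDemo class are hypothetical names invented for this example. C# compilers typically translate new Point() on a struct without an explicit parameterless constructor into initobj, while new Point(3, 4) runs the constructor; the exact IL emitted is compiler-dependent, so treat this as an approximation rather than a guarantee.

```csharp
using System;

// Hypothetical value type used only for illustration.
struct Point
{
    public int X;
    public int Y;

    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }
}

static class InitDemo
{
    // No constructor runs here; the storage is simply zeroed.
    public static Point Zeroed()
    {
        Point p = new Point();
        return p;
    }

    // The two-argument constructor runs and initializes the fields.
    public static Point Constructed()
    {
        return new Point(3, 4);
    }

    static void Main()
    {
        Point a = Zeroed();
        Point b = Constructed();
        Console.WriteLine(a.X + "," + a.Y); // 0,0
        Console.WriteLine(b.X + "," + b.Y); // 3,4
    }
}
```

Inspecting the compiled output with ildasm.exe is the reliable way to confirm which instruction your compiler chose.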

Boxing and Unboxing

Values are sequences of bits composing state. They lack self-description information — that is, a method table pointer — which objects on the heap make available. This has advantages, namely that values have less overhead. But there are clear disadvantages; for example, often we'd like to either pass values around to methods that expect System.Objects or perhaps to make an invocation on a method inherited from Object or ValueType. To do that on the CLR, you need something whose structure has a method-table pointer as the first DWORD, as explained in Chapter 2.

Boxing a value with the box instruction allocates a new data structure on the GC heap to hold the value's data, copies the bits from the stack to that, and leaves a reference to it behind. This data structure also has a method-table pointer, meaning that it can then be used as described elsewhere. box expects a value on the top of the stack and takes a type token argument representing the type of the value. Its stack transition is ..., value → ..., obj.

Unboxing with the unbox instruction does the reverse, that is, it will copy boxed data into an unboxed storage location. There are two variants of the unbox operation: unbox and unbox.any, the latter of which has been added in 2.0 and is used by C# exclusively over the other. unbox leaves behind a pointer to the unboxed data structure, usually computed simply as an interior pointer to the boxed value on the heap, which can then be accessed indirectly, for example using ldind. unbox.any, on the other hand, copies the actual value found inside the boxed instance to the stack and can be used against reference types (necessary when dealing with generics), which equates to just loading a reference to the object.

There is an additional facet to the above description. A new feature in 2.0 called a nullable value type enables the wrapping of any value in a Nullable<T> data structure. The result gives ordinary values null semantics. Compilers such as C# treat instances of Nullable<T> in a way that permits programmers to realize null semantics, for example, when comparing an instance for nullability. When something is boxed, however, its runtime type becomes opaque. Thus, boxing a Nullable<T> that represents null (i.e., HasValue == false) results in a null reference. Otherwise, a boxed T is left behind. The converse is also true: a null reference or boxed T may be unboxed into a Nullable<T>.
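The boxing rules for Nullable<T> can be observed directly from C#. The following is a minimal sketch; NullableBoxingDemo and its helper methods are hypothetical names chosen for this example.

```csharp
using System;

static class NullableBoxingDemo
{
    // Boxes a Nullable<int>; the result is either null or a boxed Int32.
    public static object Box(int? value)
    {
        return value;
    }

    // Unboxes a null reference or a boxed Int32 back into a Nullable<int>.
    public static int? Unbox(object boxed)
    {
        return (int?)boxed;
    }

    static void Main()
    {
        object none = Box(null);
        object some = Box(42);

        Console.WriteLine(none == null);                  // True
        Console.WriteLine(some.GetType() == typeof(int)); // True

        Console.WriteLine(Unbox(none).HasValue);          // False
        Console.WriteLine(Unbox(some).Value);             // 42
    }
}
```

Note that the boxed object reports its type as Int32, not Nullable<Int32>; the wrapper does not survive boxing.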

Calling and Returning from Methods

Calling a method is achieved using one of a few instructions: call, callvirt, and calli. Each one has its own distinct purpose and semantics. call and callvirt are used to make direct method invocations against a target method, using either static or virtual dispatch, respectively. calli is used to make a method call through a function pointer, hence its name "call indirect."

Both call and callvirt are supplied a method metadata token as an argument and expect to see a full set of method call arguments on the top of the stack in left-to-right order. In other words, they have a transition diagram much like newobj, i.e. ..., arg1, ..., argN → ..., retval. The number of arguments popped off depends on the method metadata token supplied as an argument to the instruction. The first item pushed onto the stack must be the object that is the target of the invocation for instance (hasthis) methods, which is then accessed with the ldarg.0 instruction from within the target method's body. Static methods instead use the 0th argument as their first real argument. And the retval result pushed onto the stack can be absent in cases where a method with a void return type has been called. Of course, to be verifiable, all arguments passed to the method must be polymorphically compatible with the expected parameter types.

Note

Notice that arguments to a method are pushed in left-to-right order. Anybody familiar with the Visual C++ and Win32 calling conventions (cdecl, stdcall, fastcall, and thiscall) will notice that this is the exact opposite ordering. These conventions use right-to-left ordering on the stack. Thankfully the JIT is responsible for reordering items, by placing them into the correct registers (some calling conventions pass some arguments in registers) or by pushing them onto the physical stack in the correct order.

The previous description was instruction agnostic. That is, it didn't differentiate between ordinary and virtual calls. The only difference is that callvirt performs a virtual method dispatch, which uses the runtime type of the this pointer to select the most-derived override. We described the process of selecting the proper override in Chapter 2.
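To see virtual dispatch selecting the most-derived override, consider this small C# sketch. Animal, Dog, and DispatchDemo are hypothetical types made up for the example; the call in Call is expected to compile to a callvirt against Animal::Speak.

```csharp
using System;

class Animal
{
    public virtual string Speak() { return "..."; }
}

class Dog : Animal
{
    public override string Speak() { return "Woof"; }
}

static class DispatchDemo
{
    // The static type of 'a' is Animal, but the override is chosen
    // from the runtime type of the object it refers to.
    public static string Call(Animal a)
    {
        return a.Speak();
    }

    static void Main()
    {
        Console.WriteLine(Call(new Animal())); // ...
        Console.WriteLine(Call(new Dog()));    // Woof
    }
}
```

The method token at the call site is the same in both cases; only the runtime type of the this pointer differs.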

Indirect Calls

The calli instruction stands for "call indirect," and can be used to call a method through a function pointer. This pointer might have been obtained using a ldftn or ldvirtftn instruction, both of which accept a method metadata token and return a pointer to its code, through native code interop, or perhaps from a constructed delegate, for example.

The pointer to the method to invoke must be on the very top of the stack; that is, it is pushed last. Beneath it, like the other call instructions, calli expects the this pointer for the method invocation to be first on the stack for instance methods, followed by the arguments laid out in left-to-right order. In other words, the stack transition is ..., arg1, ..., argN, ftn → ..., retval. To ensure type safety, a call-site description token must be passed as an argument, which the runtime uses to ensure that the items on the stack match, although it can't ensure at runtime that the target is actually expecting these items. If you mismatch the pointer and description, a failure will occur at runtime (hopefully, unless you end up accidentally corrupting some memory instead).

Returning from a Method

Inside of a method's implementation, a return instruction ret must always be present to exit back to the caller. For methods with a non-void return type, it takes a single value from the top of the stack and returns it to the caller. A ret is required even if the return type of the method is void, although in that case no return value is pushed onto the stack prior to executing it. In all cases (after popping the return value in the case of non-void return types) the stack must be empty. IL that leaves additional state on the stack at a return indicates a bug in the compiler that produced it; as a user of such a language, you rarely need to worry about such things, although peverify.exe can be useful for diagnosing the error.

Tail Calls

A tail call is a commonly used term in functional languages (e.g., LISP, ML, Haskell), where recursion is usually preferred rather than iteration (as in Algol-derived languages, e.g., C, C++, C#). Recursion is simply a way to make repeated invocations to the same method, using modified values each time and a base case to terminate the recursive call chain. This piece of C# code demonstrates the difference:

    /* Iterative */
    void f(int n)
    {
        for (int i = n; i > 0; i--)
            Console.WriteLine(i);
    }

    /* Recursive */
    void f(int n)
    {
        if (n == 0) return;
        Console.WriteLine(n);
        f(n - 1);
    }

Each prints a descending sequence of numbers, although in different manners; the iterative version uses a for loop, while the recursive version calls itself and terminates when n == 0. In languages where working with functions is more natural than introducing somewhat awkward C-style loop structures, this technique is very commonplace.

Another example of a recursive algorithm might make this clearer. Writing a factorial computation is often taught using the following algorithm in C# as well as functional languages:

    int fact(int n)
    {
        return fact(n, 1);
    }

    int fact(int n, int v)
    {
        if (n > 0)
            return fact(n - 1, n * v);
        else
            return v;
    }

One problem with recursion, as you might have noticed, is that the call stack is continuing to grow with every new function call — keeping around any temporary data on each stack frame — versus iteration which runs in constant stack space. This is simply a byproduct of the way calls to functions are made, not necessarily a result of inherent properties of the algorithm. But this means that the fact function, as written above, will run out of stack space when supplied with large values of n.

Tail calls enable recursive code to run in constant space although the call stack is logically growing. When the stack is empty immediately after the function call or when the only value is used as the caller's own return value (for non-void return types), a tail call can be made. The fact function above satisfies these criteria. A tail call is indicated by the tail. prefix in IL; if a tail. is found just prior to a call, callvirt, or calli the CLR can reuse the current stack frame, overwriting the arguments just before making the call. This can be much more efficient in examples like those above but is usually a compiler-specific feature — seldom will you worry about it in user code.

Interestingly, C# does not implement tail calls; iteration is more natural for its users, and therefore the compiler writers haven't made supporting them a priority. Most functional language compilers, such as F#, do, however.
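Because the C# compiler does not emit the tail. prefix, the effect of a tail call can only be approximated by hand in C#. The sketch below places the two-argument fact function from the text next to a loop that performs the same frame reuse manually: overwrite the arguments, then jump back to the top. TailCallDemo, FactRecursive, and FactLoop are hypothetical names, and the loop is an illustration of the idea, not a description of what the CLR generates.

```csharp
using System;

static class TailCallDemo
{
    // Tail-recursive form from the text: the recursive call is the last
    // thing the method does, and its result is returned unchanged.
    public static int FactRecursive(int n, int v)
    {
        if (n > 0)
            return FactRecursive(n - 1, n * v);
        else
            return v;
    }

    // What reusing the frame amounts to: overwrite the arguments and
    // jump back to the top instead of pushing a new frame.
    public static int FactLoop(int n, int v)
    {
        while (n > 0)
        {
            v = n * v;
            n = n - 1;
        }
        return v;
    }

    static void Main()
    {
        Console.WriteLine(FactRecursive(5, 1)); // 120
        Console.WriteLine(FactLoop(5, 1));      // 120
    }
}
```

The recursive version consumes one stack frame per step unless the runtime happens to optimize the call, whereas the loop always runs in constant stack space.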

Constrained Calls

A topic we skirted above is how both ordinary and virtual method calls occur when the target is an unboxed value. We now know that value type instances are simply a sequence of bits that we interpret a certain way. Unlike objects on the heap, there is no easily accessible, self-describing method table. This makes resolving virtual methods based on type identity impossible. And furthermore, passing a value type as the this pointer to a method defined on System.Object won't result in the correct behavior, because it is expecting a reference to an object which has a type identity structure.

This has the consequence of requiring the boxing of value types in order to make calls to methods defined on Object, ValueType, or Enum, and to make any virtual calls. If a compiler knows the type of the target of the call, this is easy to do; it just has to know the special rules, and it will insert the box instructions at the necessary locations in the IL. But in the case of generics, the compiler might not know the type when it generates the IL. For example, consider this C# example:

    static string GetToString<T>(T input)
    {
        return input.ToString();
    }

How does the compiler know whether to box input prior to emitting a callvirt to System.Object's virtual method ToString? It doesn't until T has been supplied, which isn't known when C# compiles the above code. Thus was born the constrained. prefix. It takes care of the relevant details. A constrained call essentially does the following:

If the target of the constrained call is a reference type, simply call it using the this pointer passed in to the call.
If the target of the constrained call is a value, but the value type has defined its own version of (has overridden) the method, simply call it on the value without boxing. If the method calls its base version, it will have to box it.
Otherwise, the target is a value type that has not overridden the method, so the implementation lives on Object, ValueType, or Enum. The CLR boxes the value and makes the call on the boxed instance.

So in the example above, the compiler can simply emit the constrained. prefix just prior to the callvirt to ToString. This may or may not result in the boxing of the input based on the type of T.
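The three cases can be exercised from C# with the GetToString method shown earlier. In this sketch, Celsius, Plain, and ConstrainedDemo are hypothetical names introduced for illustration. Whether a box actually occurs is decided by the runtime and is not visible in the program's output; the comments state what the rules above predict.

```csharp
using System;

// Value type that overrides ToString: can be called without boxing.
struct Celsius
{
    public int Degrees;
    public Celsius(int degrees) { Degrees = degrees; }
    public override string ToString() { return Degrees + " C"; }
}

// Value type that does not override ToString: falls back to the
// inherited implementation, which requires a boxed instance.
struct Plain
{
}

static class ConstrainedDemo
{
    public static string GetToString<T>(T input)
    {
        return input.ToString();
    }

    static void Main()
    {
        Console.WriteLine(GetToString("text"));          // text
        Console.WriteLine(GetToString(new Celsius(21))); // 21 C
        Console.WriteLine(GetToString(new Plain()));     // Plain
    }
}
```

The last line prints the type name because the inherited ValueType implementation of ToString returns the name of the runtime type.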

Nonvirtual Calls to Virtual Methods

As we saw above, there are two primary instructions for making direct method calls: call and callvirt. It actually is possible to make a call to a virtual method without using the callvirt instruction. This might be surprising, but consider a few examples. First, in an overridden method in a subclass, developers often will want to call the base class's implementation; this is done using the base keyword in C#, and is compiled as a call to the base class's method. But it's virtual! Clearly, emitting a callvirt to the base class's method would be incorrect: virtual dispatch would select the subclass's override again, leading to infinite recursion.

You will also see code like this in C++ rather frequently:

    using namespace System;

    ref class A
    {
    public:
        virtual void f() { Console::WriteLine("A::f"); }
    };

    ref class B : public A
    {
    public:
        virtual void f() override { Console::WriteLine("B::f"); }
    };

    int main()
    {
        B b;
        b.f();
        b.A::f();
    }

That last line b.A::f() looks a little strange, but it uses the scoping operator <type>:: to bypass normal dynamic virtual dispatch, and instead make a direct call to A's implementation of f. If compiled on the CLR, this too is implemented as a call to a virtual method.

Unfortunately, some authors of class libraries implicitly rely on security through inheritance. That is, they assume that because they've overridden a base class's method, the only way to call that method on an instance of their class is with a virtual call. This enables them to check invariants, perform security checks, and carry out any other validation before allowing the call to occur. To preserve this (to some extent), a change was made in 2.0 of the CLR to make nonvirtual calls to some virtual methods fail verification. This means that untrusted code cannot use the idiom shown above, but fully trusted code can.

Notice that I said "some virtual methods" in the preceding paragraph. The CLR still permits the ordinary "call to base" pattern, as C# and other languages use it quite extensively. What 2.0 now prohibits is nonvirtual calls to virtual methods on types entirely outside of the caller's type hierarchy. The verifier implements this by ensuring that the caller and callee are equivalent. In the C++ example above, it would now fail verification because main is not defined on class A or B.

Type Identity

There are two related instructions that perform a runtime type identity check: castclass and isinst. They are used to inspect the runtime type identity of an object on the top of the stack using its method table. Values must be boxed prior to passing them through these instructions.

castclass doesn't modify the item on the top of the stack at all. It simply takes a metadata type token, and checks that the item is compatible with this type. If the check succeeds, the type tracking for the IL stream is patched up so that the item can be treated as an instance of the checked type; otherwise, an InvalidCastException is generated by the runtime. Compatible in this case means the instance is of an identical type or a derived type; similarly, if the runtime type is B[] and the type token is A[], and if B can be cast to A, this check succeeds; lastly, if the runtime type is T and the type token is Nullable<T>, the check also succeeds. If the type token is a reference type and the instance is null, the check will succeed, because null is a valid instance of any reference type.

isinst is very similar in semantics to castclass. The only difference is in its response to an incompatible item. Rather than throwing an InvalidCastException, it leaves null behind on the stack. Notice that the use of null to indicate failure here means that checking the type identity of a null reference will technically succeed (e.g., null is a valid instance of System.String), but code inspecting its result can't differentiate between success and failure.
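The compatibility rules for both instructions can be probed from C#, where a cast expression maps to castclass and the as operator maps to isinst. The sketch below is illustrative only; TypeCheckDemo and its methods are hypothetical names. It covers array covariance, the null case, and the two different failure responses.

```csharp
using System;

static class TypeCheckDemo
{
    // 'as' maps to isinst: yields null when the check fails.
    public static object[] AsObjectArray(object o)
    {
        return o as object[];
    }

    // A cast maps to castclass: throws InvalidCastException on failure.
    public static object[] CastToObjectArray(object o)
    {
        return (object[])o;
    }

    static void Main()
    {
        object strings = new string[] { "a", "b" };
        object ints = new int[] { 1, 2 };

        // string[] is compatible with object[] because string derives from object.
        Console.WriteLine(CastToObjectArray(strings).Length); // 2

        // int[] is not compatible with object[]; isinst leaves null behind.
        Console.WriteLine(AsObjectArray(ints) == null);       // True

        // A null reference passes castclass for any reference type.
        Console.WriteLine(CastToObjectArray(null) == null);   // True

        try
        {
            CastToObjectArray(ints);
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("cast failed");                 // cast failed
        }
    }
}
```

Note that AsObjectArray returns null both for an incompatible object and for a null input, which is exactly the ambiguity described above.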

C# Is, As, and Casts (Language Feature)

C# uses the isinst instruction to implement the is and as keywords.

    object o = /*...*/;

    string s1 = o as string;
    if (s1 != null)
    {
        // Can work with 's1' as a valid string here.
    }

    bool b = o is string;
    if (b)
    {
        // Can cast 'o' to string without worry, etc.
    }

    string s2 = (string)o;
    // Can work with 's2' here; InvalidCastException results if it's not a string.

as just emits an isinst and pushes the result as its return value; is does nearly the same thing but compares the result to null and leaves behind a bool value on the stack resulting from the comparison. castclass is used when performing a cast (assuming that no explicit conversion operator has been supplied) and results in an InvalidCastException if the cast fails.

The following IL corresponds to the above C#:

    // C#: object o = /*...*/;
    // Assume 'o' is in local slot #0.

    // C#: string s1 = o as string;
    ldloc.0
    isinst     [mscorlib]System.String
    stloc.1
    // 's1' is stored in local slot #1 as a 'System.String.'

    // C#: bool b = o is string;
    ldloc.0
    isinst     [mscorlib]System.String
    ldnull
    cgt.un
    stloc.2
    // 'b' is stored in local slot #2 as a 'System.Boolean.'
    // (Control flow logic omitted.)

    // C#: string s2 = (string)o;
    ldloc.0
    castclass  [mscorlib]System.String
    stloc.3
    // 's2' is stored in local slot #3 as a 'System.String.'

Based on this example, we can briefly summarize how it will execute: if o's runtime type is System.String, then s1 will refer to that instance, b will be true, and the cast will succeed; otherwise, s1 will be null, b will be false, and an InvalidCastException will be generated by the cast.

Arrays

Arrays are unlike other collections of data in the runtime in that special IL instructions exist to access their elements and properties. Your compiler does not emit calls to methods and properties on the System.Array class, as it would for, say, System.Collections.Generic.List<T>, but rather emits specialized IL to deal with arrays in a more efficient manner.

First, new arrays are allocated using the newarr instruction; the System.Array class also permits dynamic allocation of arrays without using IL, but we defer discussion of those features to Chapter 6. newarr pops an integer off the top of the stack, representing the number of elements the array will hold. It also takes a type metadata token as an argument to indicate the type of each element. The act of creating a new array also zeroes out the array's memory, meaning that for value types each element will be the default value for that type (e.g., 0 or false), and for reference types each element will be null.

The following C# and corresponding IL demonstrates this:

    // C#: int[] a = new int[100];
    ldc.i4 0x64
    newarr [mscorlib]System.Int32

Of course, once you have an instance of an array, you'll want to access its length, and load and store elements from and to the array. There are dedicated instructions for each of these operations. ldlen pops a reference to an array off the stack and leaves behind the number of elements it contains. ldelem takes an array reference and an integer index into the array on the stack, plus a type token representing the type of element expected from the array; it extracts that element and places it onto the stack. An ldelema instruction is also available that loads a managed pointer to a specific element in an array. Lastly, stelem takes an array reference, an integer index, and an object or value to store into the array. It also takes a type token. There are a number of variants of both ldelem and stelem (i.e., ldelem.<type> and stelem.<type>) that don't require a type token, but they are omitted here for brevity.
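The C# sketch below touches each of these array operations in turn. ArrayDemo and SumOfSquares are hypothetical names, and the instruction named in each comment is the one a C# compiler would be expected to use for that expression (usually a typed variant such as ldelem.i4 for int elements); check the output of ildasm.exe if the exact opcode matters.

```csharp
using System;

static class ArrayDemo
{
    // Allocates an array, stores into it, and reads it back; each step
    // corresponds to one of the array instructions described above.
    public static int SumOfSquares(int count)
    {
        int[] a = new int[count];          // newarr
        for (int i = 0; i < a.Length; i++) // ldlen
        {
            a[i] = i * i;                  // stelem variant
        }

        int sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            sum += a[i];                   // ldelem variant
        }
        return sum;
    }

    static void Main()
    {
        int[] fresh = new int[100];
        Console.WriteLine(fresh.Length);     // 100
        Console.WriteLine(fresh[0]);         // 0 (memory is zeroed)

        string[] names = new string[3];
        Console.WriteLine(names[0] == null); // True

        Console.WriteLine(SumOfSquares(4));  // 14
    }
}
```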