Working with Objects and Value Type Instances

In this section I am going to introduce objects and value types. We will cover how IL treats the differences between value and reference types, instantiating objects, methods, fields, properties, and constructors - in other words, all the usual basic aspects of classes and objects, but treated from an IL point of view. Although most of the concepts will be the same as you are used to in C# or VB, we'll see that IL and the CLR do throw up a few surprises.

Strictly speaking, in the context of managed code, the term "object" specifically means an instance of a reference type. However, in practice the term is often also used informally to indicate a value type instance, and in keeping with common usage this book will do so too when the context is clear.

We're going to work in this section by gradually developing an application based on a clock. The clock is represented initially by a value type, called Clock. Later on, we'll convert Clock to a reference type to see how this affects our code, but for now we'll stick with a value type to keep things simple. We'll use this type to gradually introduce the various object-based operations: declaring fields and instance methods, constructors, instantiating objects, and calling instance methods.

Instance Fields

We'll start off by presenting the first version of our Clock type. It's very simple: it just contains one unsigned int8 (System.Byte) field that indicates the time of day in hours. Crude, but when you're dealing with a low-level language like IL, believe me, this struct will easily be sufficient to show all the main programming features. Here's the ILAsm file, Clock.il:

 .assembly extern mscorlib {}
 .assembly Clock
 {
    .ver 1:0:1:0
 }
 .module Clock.dll
 .namespace Wrox.AdvDotNet.ClockSample
 {
    .class public ansi auto sealed Clock extends [mscorlib]System.ValueType
    {
       .field public unsigned int8 Hours
    }
 }

This code immediately tells us how to declare a field in IL. So far there's not really anything new apart from the syntax. Notice, however, that we've not declared the field explicitly as either instance or static. When declaring members of classes, the default is for items to be instance members unless you explicitly say they are static - this is in common with most high-level languages. Our code also rather clearly illustrates a pretty bad programming practice, in that we've declared a public field, but that's just temporary. As soon as we've covered how to declare properties, we'll make this field private.

In the last chapter we used only one file for each sample to keep things simple. In this chapter, we'll try to inject a bit more realism into the samples by having the Clock class itself in one assembly and the client code to test Clock in another. This arrangement also gives us the flexibility to test our IL code using clients written in high-level languages - we'll make use of this later on in the chapter.

We assemble Clock.il like this:

 ilasm Clock.il /dll

Now we'll create a separate file, TestClock.il, which will contain the Main() method. This is where things get interesting. Here's the file, containing a Main() method which instantiates a Clock instance as a local variable, sets its Hours field to 6, and displays the value of the field to make sure everything is working properly. In this assembly I've specifically indicated the version of the Clock assembly we wish to reference. That's because, for added realism, we'll keep the assembly name the same but increase its version number as we develop the Clock sample.

 // I've compacted the .assembly extern Clock directive on to one line for
 // simplicity, but I could have spread it over several lines if I wished.
 .assembly extern mscorlib {}
 .assembly extern Clock { .ver 1:0:1:0 }
 .assembly TestClock
 {
    .ver 1:0:1:0
 }
 .module TestClock.exe
 .namespace Wrox.AdvDotNet.ClockSample
 {
    .class EntryPoint extends [mscorlib]System.Object
    {
       .method static void Main() cil managed
       {
          .maxstack 2
          .locals init (valuetype
                         [Clock]Wrox.AdvDotNet.ClockSample.Clock clock)
          .entrypoint
          // Set Hours to 6
          ldloca.s clock
          ldc.i4.6
          stfld    unsigned int8 [Clock]
                                 Wrox.AdvDotNet.ClockSample.Clock::Hours
          ldstr    "Hours are "
          call     void [mscorlib]System.Console::Write(string)
          ldloca.s clock
          ldfld    unsigned int8 [Clock]
                                 Wrox.AdvDotNet.ClockSample.Clock::Hours
          // Type and method names are case-sensitive: System.Console::WriteLine
          call     void [mscorlib]System.Console::WriteLine(int32)
          ret
       }
    }
 }

Notice that in this sample the final call to Console.WriteLine() passes in an int32, even though we loaded an unsigned int8 onto the stack. That's fine because shorter integers are always promoted when they are pushed onto the evaluation stack: ldfld zero-extends the unsigned int8 value to an int32. Because any number that can be stored in an unsigned int8 can also be stored in an int32, there are no overflow or sign issues. We had to pass in int32 because there is no overload of Console.WriteLine() that takes unsigned int8 (System.Byte).

This sample is the first time that we've declared a local variable that isn't a primitive type represented by a keyword in IL. The declaration in this case must give the full specification of the type. Also, since this type is a value type, its declaration must be prefixed by the word valuetype (for a reference type, as we'll see later, we use class instead of valuetype):

 .locals init (valuetype [Clock]Wrox.AdvDotNet.ClockSample.Clock clock)

In this sample I've given the local variable the name clock, but as always naming variables is optional. My real reason for naming this one is actually so I can refer to it more easily in the text!

Notice that the reference to the Clock type includes the assembly name in square brackets - we need to do this because Clock is defined in a separate assembly. This name, by the way, is case-sensitive.

That's actually all we need to do to have the type available as a local variable - it really is as simple as that. Because we've explicitly indicated the init flag for local variables, clock will automatically be initialized by being zeroed out. Note that there has not been any constructor called (or even defined) for Clock. For performance reasons, the .NET Framework won't call constructors on value types unless the IL code explicitly tells it to.

Now we need to set the Hours field of clock to the integer 6. To do this, we need a new instruction, stfld. stfld expects to pop two items from the evaluation stack: a reference to the object for which we are going to set a field, and the actual value to be written to the field:


..., address, value →...

For reference types, the reference would be an object reference, but since this is a value type we need a managed pointer instead - which we can get by using the ldloca.s instruction. In the following code I've explicitly named the variable to be loaded, but I could supply the index instead (ldloca.s 0), and ilasm.exe will replace the variable name with the index in the emitted IL anyway. ldloca works just like the ldloc.* instructions in this regard:

 // Set Hours to 6
 ldloca.s clock
 ldc.i4.6
 stfld    unsigned int8 [Clock]Wrox.AdvDotNet.ClockSample.Clock::Hours

Notice that stfld also takes an argument - a token indicating the type and field we are interested in. As with all tokens, in ILAsm this is a rather long string, but in the IL emitted by ilasm.exe it will simply be a four-byte integer (token) that indexes into the appropriate entry in the module metadata.

Retrieving the field value so we can display it involves another instruction, ldfld, which pretty much does the reverse of stfld: it loads a field onto the evaluation stack. Like stfld, ldfld needs a token identifying the field as an argument and expects to find the address of the object concerned on the stack as either a managed or unmanaged pointer or an object reference:


..., address →..., value

This all means that we can retrieve and display the field with this code:

 ldloca.s clock     // Could write ldloca.s 0 here instead
 ldfld    unsigned int8 [Clock]Wrox.AdvDotNet.ClockSample.Clock::Hours
 call     void [mscorlib]System.Console::WriteLine(int32)

Although we won't be using them here, it's worth noting a few related instructions:

ldflda is like ldfld, but retrieves the field's address as a managed pointer, instead of retrieving the field's value. You can use it if you need to get the address in order to pass the field by reference to another method.
ldsfld and stsfld work respectively like ldfld and stfld, but are intended for static fields - which means that they don't need any object address on the stack. ldsfld simply pushes the value of the field onto the stack and stsfld pops the value off the stack into the relevant field, in both cases without making other changes to the stack.
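The following sketch puts stsfld, ldsfld and ldflda side by side in one self-contained file. As with the rest of the chapter, it assumes the .NET Framework version of ilasm, which resolves mscorlib, and you would assemble it with something like ilasm FieldDemo.il. The names FieldDemo, Counter, Total, Count, Bump and addr are hypothetical and exist only for this illustration.

```ilasm
// Hypothetical demo of static-field access and taking a field's address
.assembly extern mscorlib {}
.assembly FieldDemo {}
.module FieldDemo.exe

.class public sequential ansi sealed Counter extends [mscorlib]System.ValueType
{
   .field public static int32 Total     // static: accessed with ldsfld/stsfld
   .field public int32 Count            // instance: accessed with ldfld/stfld/ldflda
}

.class public EntryPoint extends [mscorlib]System.Object
{
   // Increments the int32 that 'addr' points to (a by-reference parameter)
   .method static void Bump(int32& addr) cil managed
   {
      .maxstack 3
      ldarg.0          // address to write back to
      ldarg.0
      ldind.i4         // current value at that address
      ldc.i4.1
      add
      stind.i4         // store the incremented value through the pointer
      ret
   }

   .method static void Main() cil managed
   {
      .entrypoint
      .maxstack 2
      .locals init (valuetype Counter c)
      // Static field: no object address is needed on the stack
      ldc.i4.s 10
      stsfld   int32 Counter::Total
      ldsfld   int32 Counter::Total
      call     void [mscorlib]System.Console::WriteLine(int32)
      // ldflda pushes a managed pointer to c.Count, so Bump updates it in place
      ldloca.s c
      ldflda   int32 Counter::Count
      call     void EntryPoint::Bump(int32&)
      ldloca.s c
      ldfld    int32 Counter::Count
      call     void [mscorlib]System.Console::WriteLine(int32)
      ret
   }
}
```

If all goes as intended, this prints 10 (the static field) and then 1. The second value is 1 because Bump increments c.Count through the managed pointer that ldflda supplied, rather than working on a copy.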

To compile the TestClock.il file (after compiling Clock using ilasm Clock.il /dll), type in this command:

 ilasm TestClock.il

Unlike the compilers for many high-level languages, ilasm.exe doesn't need you to supply a reference to Clock.dll at the command prompt. It simply emits a reference to the Clock assembly based on the .assembly extern declaration in TestClock.il, and the runtime locates Clock.dll when the program runs.

Defining Instance Methods and Properties

Now we've got a basic handle on how to manipulate fields of value types, we'll have a look at improving our Clock struct by wrapping that public field up in a property. This means that we'll be able to kill two birds with one stone as far as going over IL syntax is concerned: we'll cover both declaring instance methods and declaring properties. This is because in IL the get and set accessors of properties are actually declared as methods, and a separate .property directive links these methods to indicate that they constitute a property. The code for this sample is also contained in files called Clock.il and TestClock.il, but if you download the sample code you'll find the files for this sample in the Clock2 folder. In general, successive Clock samples in this chapter are numbered sequentially upwards.

I've also kept the same namespace name as the previous example, Wrox.AdvDotNet.ClockSample. Since this is really a development of existing code, rather than a completely new sample, it makes more sense simply to increase the version number of the assembly, as I noted earlier. Since we are making a substantial change to the public interface of the type, we will change the major version number from 1 to 2.

 .assembly Clock
 {
    .ver 2:0:1:0
 }

Next, we change the declaration of the field to make it private. While we're at it, we'll change its name from Hours to hours to keep with the normal camel-casing convention for private fields, and to free up the name Hours for the property. We'll start with the method declaration for the get accessor. That means adding this code to the Clock struct:

    .class public ansi auto sealed Clock extends [mscorlib]System.ValueType
    {
       .field private unsigned int8 hours
       .method specialname public instance unsigned int8 get_Hours()
                                                             cil managed
       {
          ldarg.0
          ldfld   unsigned int8 Wrox.AdvDotNet.ClockSample.Clock::hours
          ret
       }

We emphasize that this is so far essentially a normal method. The main difference in its declaration from the methods we've defined up to now is that instead of declaring the method as static, we've declared it as instance.

There is another new keyword in the above code - specialname. And this keyword is only present because we intend to use this method as a property accessor. specialname in this context is there to inform developer tools (such as VS.NET and the C++, C#, and VB compilers) that this method actually forms part of a property and should therefore be treated using the property syntax. In general, the purpose of specialname is to indicate that an item may be of significance to developer tools - though how the tools interpret it in a given context is up to them.

There is, however, one crucial thing you must remember when invoking the method: since this is an instance method, it has an extra hidden parameter - the address of the object against which it has been called. You never need to worry about the extra parameter in high-level languages because the compilers take care of it for you, but in IL you need to take it into account. Hence, although get_Hours() is declared without any parameters, when invoked it will always expect one parameter - and inside the method, ldarg.0 will load the quantity which in VB you think of as the Me reference and which to C++/C# people is the this reference. If there had been any explicit parameters, you would need to remember to index those parameters starting at 1 instead of 0. Bearing all that in mind, we can see that the code presented above simply loads up the hours field and returns its value.
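To make the argument numbering concrete, here is a sketch of an instance method that takes one explicit parameter. The method AddHours, the helper GetHours and the assembly name are hypothetical. The type is a stripped-down, self-contained relative of Clock defined in the same file, not the chapter's Clock assembly. Inside AddHours, ldarg.0 loads the hidden this pointer, which is a managed pointer because Clock is a value type, and ldarg.1 loads delta.

```ilasm
// Hypothetical demo: argument 0 is 'this', explicit parameters start at 1
.assembly extern mscorlib {}
.assembly InstanceArgsDemo {}
.module InstanceArgsDemo.exe

.class public sequential ansi sealed Clock extends [mscorlib]System.ValueType
{
   .field private unsigned int8 hours

   // Adds 'delta' hours, wrapping round at 24
   .method public instance void AddHours(unsigned int8 delta) cil managed
   {
      .maxstack 3
      ldarg.0                 // 'this': the instance whose field we will store to
      ldarg.0
      ldfld    unsigned int8 Clock::hours
      ldarg.1                 // first explicit parameter: delta
      add
      ldc.i4.s 24
      rem.un
      stfld    unsigned int8 Clock::hours   // int32 result truncated to unsigned int8
      ret
   }

   .method public instance unsigned int8 GetHours() cil managed
   {
      .maxstack 1
      ldarg.0
      ldfld    unsigned int8 Clock::hours
      ret
   }
}

.class public EntryPoint extends [mscorlib]System.Object
{
   .method static void Main() cil managed
   {
      .entrypoint
      .maxstack 2
      .locals init (valuetype Clock clock)
      ldloca.s clock
      ldc.i4.s 20
      call     instance void Clock::AddHours(unsigned int8)   // 0 + 20 = 20
      ldloca.s clock
      ldc.i4.6
      call     instance void Clock::AddHours(unsigned int8)   // (20 + 6) rem 24 = 2
      ldloca.s clock
      call     instance unsigned int8 Clock::GetHours()
      call     void [mscorlib]System.Console::WriteLine(int32)
      ret
   }
}
```

Starting from a zeroed clock, adding 20 hours and then 6 hours should print 2. Note that each call is preceded by ldloca.s, which supplies the hidden first argument.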

Now for the set accessor. This is an instance method that also takes an explicit parameter. We shall name the parameter value in accordance with usual practice for property set accessors. However, since value is an ILAsm keyword, and would hence cause a syntax error if used by itself as a variable name, we enclose the name in single quotes. This is ILAsm's equivalent of preceding a variable name with the @ sign if the variable name clashes with a keyword in C#, or of enclosing the name in square brackets in VB.

       .method specialname public instance void set_Hours(
                                                unsigned int8 'value')
       {
          ldarg.0
          ldarg.1
          stfld   unsigned int8 Wrox.AdvDotNet.ClockSample.Clock::hours
          ret
       }

set_Hours() uses the stfld command to copy the parameter to the hours field.

So far, all we've got is two methods that happen to have been tagged specialname and happen to follow the same signature and naming convention that you'd expect for property accessors. To actually have metadata placed in the assembly that marks these methods as property accessors, we need to add the following code to the Clock struct:

       .property instance unsigned int8 Hours()
       {
          .get instance unsigned int8 get_Hours()
          .set instance void set_Hours(unsigned int8 'value')
       }

This code formally declares to the .NET runtime that there is a property called Hours and that these two methods should be interpreted as accessors for this property. Notice that the .property directive does not itself have public or private or any other accessibility. Whether you can get/set a property depends upon the accessibility of the relevant accessor method. To be CLS-compliant, the two accessibilities have to be the same, however.

Although we now have a property definition, you should be aware that this definition is only actually useful in two situations:

It means that if reflection is used to examine the Clock type, Hours and its accessor methods will correctly be reported as a property.
High-level languages that use a special syntax for properties (and that includes C#, MC++, and VB) will be able to use their own property syntax for invoking the accessor methods. (In order to achieve this, the high-level language compilers can look for both the .property directive and the specialname tag on the accessor methods).
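A quick way to see the first of these points, and also that a property only has the accessors you declare, is to query the metadata through reflection. The sketch below declares a hypothetical read-only variant of the property, with a .get but no .set, and asks System.Reflection whether the property can be read and written. The assembly name and the local info are invented for the illustration. The .get directive uses the full Type::method form, which is how ildasm displays it.

```ilasm
// Hypothetical demo: a .property with only a .get is read-only to reflection
.assembly extern mscorlib {}
.assembly ReadOnlyPropertyDemo {}
.module ReadOnlyPropertyDemo.exe

.class public sequential ansi sealed Clock extends [mscorlib]System.ValueType
{
   .field private unsigned int8 hours

   .method public specialname instance unsigned int8 get_Hours() cil managed
   {
      .maxstack 1
      ldarg.0
      ldfld    unsigned int8 Clock::hours
      ret
   }

   // No .set directive, so the property has no setter
   .property instance unsigned int8 Hours()
   {
      .get instance unsigned int8 Clock::get_Hours()
   }
}

.class public EntryPoint extends [mscorlib]System.Object
{
   .method static void Main() cil managed
   {
      .entrypoint
      .maxstack 2
      .locals init (class [mscorlib]System.Reflection.PropertyInfo info)
      // info = typeof(Clock).GetProperty("Hours")
      ldtoken  Clock
      call     class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(
                  valuetype [mscorlib]System.RuntimeTypeHandle)
      ldstr    "Hours"
      callvirt instance class [mscorlib]System.Reflection.PropertyInfo
                  [mscorlib]System.Type::GetProperty(string)
      stloc.0
      ldloc.0
      callvirt instance bool [mscorlib]System.Reflection.PropertyInfo::get_CanRead()
      call     void [mscorlib]System.Console::WriteLine(bool)
      ldloc.0
      callvirt instance bool [mscorlib]System.Reflection.PropertyInfo::get_CanWrite()
      call     void [mscorlib]System.Console::WriteLine(bool)
      ret
   }
}
```

This should print True followed by False. Type.GetProperty(string) searches public members, and it finds Hours because its get accessor is public.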

However, as far as the .NET runtime is concerned, the accessors remain normal methods. We can see this when we examine the new version of the Main() method in the new version of the TestClock.il file, which invokes these methods. The following code completes the Clock2 sample. It does exactly the same thing as the Clock1 sample, but now uses the property accessors instead of the field to set and read the hour stored in the Clock instance. The only changes are that the stfld and ldfld instructions have been replaced by calls to the accessor methods:

 .method static void Main() cil managed
 {
    .maxstack 2
    .locals init (valuetype [Clock]Wrox.AdvDotNet.ClockSample.Clock clock)
    .entrypoint
    // Initialize
    ldloca.s clock
    ldc.i4   6
    call     instance void [Clock]Wrox.AdvDotNet.ClockSample.Clock::
                                                   set_Hours(unsigned int8)
    ldstr    "Hours are "
    call     void [mscorlib]System.Console::Write(string)
    ldloca.s clock
    call     instance unsigned int8 [Clock]
                              Wrox.AdvDotNet.ClockSample.Clock::get_Hours()
    call     void [mscorlib]System.Console::WriteLine(int32)
    ret
 }

Initialization and Instance Constructors

Now we're going to extend our Clock value type to add a couple of constructors to it: one that doesn't take any parameters (this type of constructor is often called a default constructor), and which initializes the hours to 12 (we'll assume midday is a suitable default value), and one that takes an unsigned int8 parameter indicating the initial hour.

In this chapter I'm going to focus on instance constructors - we're not going to worry about static constructors. So bear in mind that when I refer to constructors, I'm normally talking specifically about instance constructors.

Constructors, especially constructors of value types, are one area where high-level languages very often impose rules or syntax of their own that don't reflect the underlying mechanism in .NET. So you may find you need to forget quite a bit of what you've learned about constructors in your high-level language.

So how do constructors work in .NET? Firstly, a couple of points that apply to constructors in general, irrespective of whether we are dealing with value types or reference types:

Instance constructors are methods that have the name .ctor and return void (static constructors are called .cctor). They also need to be decorated with two flags - specialname, which we've already encountered, and a new flag, rtspecialname. Other than this, constructors are treated syntactically as normal methods and can be called whenever you want, not just at object initialization time, though in most cases it's not good programming practice to invoke them at any other time, and doing so may make your code unverifiable.
You can define as many different constructor overloads as you wish.
For reference types, you ought always to call the base class constructor from every possible code pathway inside the constructor. Failing to do so will cause the program to fail verification because of the risk of having uninitialized fields inherited from the ancestor types (although the constructor will still constitute valid code). Although the formal requirement is merely that every code pathway should call the base constructor, in practice the best solution is almost always to do this as the first thing in a constructor, before you do anything else. Note that this requirement is for reference types only. For value types, invoking the base constructor is not only unnecessary but is pointless: the base type is always System.ValueType, which doesn't contain any fields to initialize!
It is illegal to define a constructor as virtual.
Constructors of reference types are always invoked automatically when a new object is instantiated. Constructors of value types, however, are never automatically invoked. They are only invoked if you explicitly call them from your IL code.
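The last point in that list is easy to demonstrate. In the sketch below, Point (a value type) and Box (a reference type) are hypothetical types whose default constructors just print a message. Declaring a Point local does not run anything. newobj on Box runs Box's constructor, which calls the System.Object constructor first, as reference types should. Point's constructor runs only because Main calls it explicitly.

```ilasm
// Hypothetical demo: value-type constructors run only when explicitly called
.assembly extern mscorlib {}
.assembly CtorInvocationDemo {}
.module CtorInvocationDemo.exe

.class public sequential ansi sealed Point extends [mscorlib]System.ValueType
{
   .field public int32 x
   .method public specialname rtspecialname instance void .ctor() cil managed
   {
      ldstr "Point::.ctor called"
      call  void [mscorlib]System.Console::WriteLine(string)
      ret
   }
}

.class public auto ansi Box extends [mscorlib]System.Object
{
   .method public specialname rtspecialname instance void .ctor() cil managed
   {
      ldarg.0
      call  instance void [mscorlib]System.Object::.ctor()   // base constructor first
      ldstr "Box::.ctor called"
      call  void [mscorlib]System.Console::WriteLine(string)
      ret
   }
}

.class public EntryPoint extends [mscorlib]System.Object
{
   .method static void Main() cil managed
   {
      .entrypoint
      .maxstack 1
      // p exists (zeroed by 'init') but no Point constructor has run
      .locals init (valuetype Point p, class Box b)
      newobj   instance void Box::.ctor()      // newobj always runs a constructor
      stloc.1
      ldloca.s p
      call     instance void Point::.ctor()    // runs only because we ask for it
      ret
   }
}
```

The expected output is "Box::.ctor called" followed by "Point::.ctor called". No message is printed when the Point local is declared.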

Having learned the principles of constructors in .NET, let's briefly review a couple of the gotchas that will catch you out if you just blindly assume the rules for constructors are what your favorite high-level language would have you believe:

Many high-level languages prevent you from declaring a constructor return type. In IL, constructors must be specifically given their true return type - void.
Some high-level languages such as C# will not let you define a parameterless constructor for value types. This is a restriction of the language, not the .NET runtime. In IL there is no syntactical problem about declaring such a constructor, although you should think carefully before defining one. There are occasionally times when a default constructor may come in useful, but as we'll see later, default value-type constructors can cause some subtle run-time bugs that you'll need to take care to avoid (I stress this only applies to value-type constructors, not to constructors of reference types, nor to constructors that take parameters).
Many high-level languages automatically insert IL code in a constructor to call the base class constructor. For example, C# will automatically insert a call to the base class default constructor as the first item of code in a constructor, unless you supply a constructor initializer in your code, indicating that some other constructor should be called instead. This is useful because it forces good programming practice. As we've seen, IL itself doesn't put such stringent restrictions on code in constructors.

Adding a Default Constructor

That's the theory - now for the code. First we'll add a default (that is to say, parameterless) constructor to the Clock type (this will constitute the Clock3 sample in the code download). I know I hinted that doing this can be a bad idea for value types, but we still need to see the syntax for declaring a default constructor. Besides, we'll soon be converting Clock to a reference type, and then the default constructor will be important.

To add the constructor, we add this code inside the Clock definition:

 .method public specialname rtspecialname instance void .ctor()
 {
    ldarg.0
    ldc.i4.s 12
    stfld    unsigned int8 Wrox.AdvDotNet.ClockSample.Clock::hours
    ret
 }

Now for invoking the constructor. We need to add the following code to Main() to have the Clock initialized to its default value of 12 instead of explicitly initializing the hours field:

       .method static void Main() cil managed
       {
          .maxstack 2
          .locals init (valuetype [Clock]Wrox.AdvDotNet.ClockSample.Clock clock)
          .entrypoint
          // Initialize
          ldloca.s clock
          call     instance void [Clock]Wrox.AdvDotNet.ClockSample.Clock::.ctor()
          ldstr    "Hours are "
          call     void [mscorlib]System.Console::Write(string)
          ldloca.s clock
          call     instance unsigned int8 [Clock]
                               Wrox.AdvDotNet.ClockSample.Clock::get_Hours()
          call     void [mscorlib]System.Console::WriteLine(int32)
          ret
       }

This code emphasizes that constructors of value types always have to be explicitly invoked if you want them to be executed.

Adding a Constructor that Takes Parameters

Finally, having seen how to write a parameterless constructor, we'll write one that takes one parameter that determines the initial time. So we'll add the following code (this will form the Clock4 sample):

 .method public specialname rtspecialname instance void .ctor(
                                                        unsigned int8 hours)
 {
    ldarg.0
    ldarg.1
    stfld   unsigned int8 Wrox.AdvDotNet.ClockSample.Clock::hours
    ret
 }

Adding this code is all we strictly need to do to implement this constructor. However, we now have two constructors that separately initialize the hours field. Normally, in this situation, good programming practice dictates that we should keep the initialization code in one place and have the constructors call each other. So we'll modify the code for the default constructor as follows:

 .method public specialname rtspecialname instance void .ctor()
 {
    ldarg.0
    ldc.i4.s  12
    call      instance void Wrox.AdvDotNet.ClockSample.Clock::.ctor(
                                                             unsigned int8)
    ret
 }

Lastly, we'll modify the code for Main() to call the one-parameter constructor (for simplicity, the test harness will only test this constructor):

       .method static void Main() cil managed
       {
          .maxstack 2
          .locals init (valuetype [Clock]Wrox.AdvDotNet.ClockSample.Clock clock)
          .entrypoint
          // Initialize
          ldloca.s clock
          ldc.i4   9
          call     instance void [Clock]Wrox.AdvDotNet.ClockSample.Clock::
                                                         .ctor(unsigned int8)
          ldstr    "Hours are "
          call     void [mscorlib]System.Console::Write(string)
          ldloca.s clock
          call     instance unsigned int8 [Clock]
                               Wrox.AdvDotNet.ClockSample.Clock::get_Hours()
          call     void [mscorlib]System.Console::WriteLine(int32)
          ret
       }

Value Type Initialization in C#

We are now going to see how the principles we've been discussing translate to a high-level language by examining how C# deals internally with the initialization of value types. In the process we'll see why default constructors for value types can cause problems, especially for clients in high-level languages.

Let's quickly write a C# program to consume the Clock value type. Here's the file: it's called CShTestClock.cs:

 using System;
 namespace Wrox.AdvDotNet.ClockSample
 {
    class EntryPoint
    {
       static void Main()
       {
          Clock clock = new Clock();
          Console.WriteLine(clock.Hours);
       }
    }
 }

To start off with, we place this file in the same folder as the Clock4 sample files (the most recent sample in which Clock has constructors), compile it using the /r flag to reference the Clock.dll assembly, and run it.

 C:\AdvDotNet\ILDeeper>csc cshtestclock.cs /r:clock.dll
 Microsoft (R) Visual C# .NET Compiler version 7.00.9466
 for Microsoft (R) .NET Framework version 1.0.3705
 Copyright (C) Microsoft Corporation 2001. All rights reserved.
 C:\AdvDotNet\ILDeeper>cshtestclock
 12

There are no surprises here. The parameterless Clock constructor has been invoked, so the clock has been initialized to 12.

Now let's do the same thing, but this time placing the C# source file in the folder that contains the first Clock sample, for which Clock did not have any constructors. Interestingly, despite the fact that we have used a constructor syntax in C# against a struct that has no constructors, the code compiles fine. However, with no constructors, the Hours field clearly won't get initialized to 12. In fact, it turns out to be initialized to zero:

 C:\AdvDotNet\ILDeeper>csc cshtestclock.cs /r:clock.dll
 Microsoft (R) Visual C# .NET Compiler version 7.00.9466
 for Microsoft (R) .NET Framework version 1.0.3705
 Copyright (C) Microsoft Corporation 2001. All rights reserved.
 C:\AdvDotNet\ILDeeper>cshtestclock
 0

Evidently the C# compiler is up to something behind the scenes. And an examination of the emitted IL using ildasm shows up what's going on:

 .maxstack 1
 .locals init (valuetype [Clock]Wrox.AdvDotNet.ClockSample.Clock V_0)
 IL_0000: ldloca.s V_0
 IL_0002: initobj [Clock]Wrox.AdvDotNet.ClockSample.Clock
 IL_0008: ldloca.s V_0
 IL_000a: ldfld  unsigned int8 [Clock]
                 Wrox.AdvDotNet.ClockSample.Clock::Hours
 IL_000f: call   void [mscorlib]System.Console::WriteLine(int32)
 IL_0014: ret

Don't worry about the IL_* labels attached to each instruction. That's an artefact of ildasm: ildasm labels each instruction with a string indicating the relative offset in bytes of the instruction compared to the start of the method, in case you need the information, and to make it easier to see where the targets of branch instructions are.

The key is in that initobj instruction. We haven't encountered initobj yet, but its purpose is to initialize a value type by zeroing out all its fields. It requires the top item of the stack to be a reference (normally a managed pointer) to the value type instance to be initialized, and takes a token indicating the type as an argument (the argument is used to determine how much memory the type occupies and therefore needs to be zeroed).

The C# compiler detects whether a value type to be instantiated has a default constructor defined. If it does, it compiles the new() operator to invoke the constructor. If it doesn't, the new() operator calls initobj instead. There is a possible small performance hit to the extent that we are initializing the object twice - once through the init flag on the .locals directive, and once through the initobj command. We could avoid this hit by coding in IL directly, though it's undocumented whether the JIT compiler would detect this and optimize it away anyway.
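When writing IL by hand you can use initobj directly. The sketch below uses a minimal copy of the Clock1 struct defined in the same file, and the assembly name is hypothetical. It sets Hours to 7 and then zeroes the whole instance in place with initobj. That is the instruction the text above describes C# emitting for new Clock() when Clock has no default constructor.

```ilasm
// Hypothetical demo: initobj zeroes every field of a value-type instance
.assembly extern mscorlib {}
.assembly InitobjDemo {}
.module InitobjDemo.exe

.class public sequential ansi sealed Clock extends [mscorlib]System.ValueType
{
   .field public unsigned int8 Hours
}

.class public EntryPoint extends [mscorlib]System.Object
{
   .method static void Main() cil managed
   {
      .entrypoint
      .maxstack 2
      .locals init (valuetype Clock clock)
      ldloca.s clock
      ldc.i4.7
      stfld    unsigned int8 Clock::Hours          // clock.Hours = 7
      // Address of the instance, plus a type token so initobj knows its size
      ldloca.s clock
      initobj  Clock
      ldloca.s clock
      ldfld    unsigned int8 Clock::Hours
      call     void [mscorlib]System.Console::WriteLine(int32)
      ret
   }
}
```

The program should print 0. The value 7 written earlier has been overwritten, which shows that initobj resets an existing instance rather than only initializing fresh storage.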

The Problem with Default Value Type Constructors

Now we are in a position to see exactly why default constructors for value types can cause problems. The problem is that it is quite possible for value types to be instantiated without any constructor being invoked - unlike the case for reference types, which, as we'll see soon, simply cannot be instantiated without having one constructor executed. That means that if you write a value type that depends on its constructor always having been executed, you risk the code breaking.
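Array creation is one concrete way a value type gets instantiated without any constructor running. The sketch below uses hypothetical names and a Clock-like struct whose default constructor sets the hours to 12. It creates a three-element array with newarr, which is also the instruction C#'s new Clock[3] compiles to, and then reads the first element.

```ilasm
// Hypothetical demo: newarr zeroes value-type elements without calling .ctor
.assembly extern mscorlib {}
.assembly NoCtorDemo {}
.module NoCtorDemo.exe

.class public sequential ansi sealed Clock extends [mscorlib]System.ValueType
{
   .field private unsigned int8 hours

   // Sets the hours to 12, but only when something actually calls it
   .method public specialname rtspecialname instance void .ctor() cil managed
   {
      .maxstack 2
      ldarg.0
      ldc.i4.s 12
      stfld    unsigned int8 Clock::hours
      ret
   }

   .method public instance unsigned int8 GetHours() cil managed
   {
      .maxstack 1
      ldarg.0
      ldfld    unsigned int8 Clock::hours
      ret
   }
}

.class public EntryPoint extends [mscorlib]System.Object
{
   .method static void Main() cil managed
   {
      .entrypoint
      .maxstack 2
      ldc.i4.3
      newarr   Clock                    // three zeroed Clock instances
      ldc.i4.0
      ldelema  Clock                    // managed pointer to element 0
      call     instance unsigned int8 Clock::GetHours()
      call     void [mscorlib]System.Console::WriteLine(int32)
      ret
   }
}
```

The output should be 0 rather than 12. newarr zeroes the elements but does not run Clock's default constructor on any of them, so code that relies on that constructor having run would be wrong here.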

Related to this is a risk of version-brittleness. If you compile C# code, for example, against a value type that has no default constructor, the client code will initialize the instance using initobj. If you subsequently modify the value type to add a constructor, you'll have existing client code that initializes the object in the 'wrong' way.

Another issue is that the usual expectation among .NET developers is that the statement SomeValueType x = new SomeValueType(); will initialize x by zeroing it out. If you supply a default constructor that does something else, you're running against the expectations of developers, which clearly means bugs are more likely in their code. On the other hand, if you are aware of these issues and prepared to work around them, you may feel that there is justification for defining a default constructor for some value type in certain situations (for example, if your type has the access level nested assembly and will only be used from your own code).

Working through the emitted IL code in this way demonstrates the potential for gaining a deeper understanding of what is going on in your high-level language if you are able to read the IL emitted by the compiler.

Instantiating Reference Objects

Now we've seen how value types are instantiated and initialized, we'll move on to examine instantiation of reference types. So, for the next sample (Clock5), let's change Clock to a reference type that derives from System.Object. To do this, we need to change its definition:

    .class public ansi auto Clock extends [mscorlib]System.Object
    {
       .field private unsigned int8 hours

We will also need to modify the one-parameter constructor so that it calls the System.Object constructor - recall that I mentioned earlier that constructors of reference types must call a base class constructor. This is the new one-parameter constructor for Clock:

 .method public specialname rtspecialname instance void .ctor(
                                                        unsigned int8 hours)
 {
    ldarg.0
    call    instance void [mscorlib]System.Object::.ctor()
    ldarg.0
    ldarg.1
    stfld   unsigned int8 Wrox.AdvDotNet.ClockSample.Clock::hours
    ret
 }

Notice that the call to the base class constructor is the first thing we do. If we'd initialized the hours field first, the code would still pass verification - but in most cases it's not good programming practice to do anything else before calling the base class constructor: you risk the possibility of manipulating fields inherited from the base class that have not been initialized.

The default constructor, which simply invokes the one-parameter constructor, is completely unchanged.

 .method public specialname rtspecialname instance void .ctor()
 {
    ldarg.0
    ldc.i4.s 12
    call     instance void Wrox.AdvDotNet.ClockSample.Clock::.ctor(
                                                             unsigned int8)
    ret
 }

Notice that this constructor doesn't invoke the base constructor directly, but does so indirectly, via the call to the one-parameter constructor. The verification process is nevertheless able to detect that the base constructor is invoked, so the code is still verifiable.
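Putting these pieces together, the reference-type Clock.il might look something like the following. This is a sketch rather than the downloadable Clock5 file: the get_Hours() accessor and the Hours property are assumed to have the same shape as in the earlier value-type samples, and the assembly directives are carried over from the first version of Clock.il.

```ilasm
.assembly extern mscorlib {}
.assembly Clock { .ver 1:0:1:0 }
.module Clock.dll

.namespace Wrox.AdvDotNet.ClockSample
{
   // Reference type: derives from System.Object and is no longer sealed
   .class public ansi auto Clock extends [mscorlib]System.Object
   {
      .field private unsigned int8 hours

      // One-parameter constructor: call the base constructor first
      .method public specialname rtspecialname instance void .ctor(unsigned int8)
      {
         ldarg.0
         call     instance void [mscorlib]System.Object::.ctor()
         ldarg.0
         ldarg.1
         stfld    unsigned int8 Wrox.AdvDotNet.ClockSample.Clock::hours
         ret
      }

      // Default constructor: reaches Object::.ctor() indirectly
      .method public specialname rtspecialname instance void .ctor()
      {
         ldarg.0
         ldc.i4.s 12
         call     instance void Wrox.AdvDotNet.ClockSample.Clock::.ctor(unsigned int8)
         ret
      }

      // Accessor and property (assumed to match the earlier samples)
      .method public specialname instance unsigned int8 get_Hours()
      {
         ldarg.0
         ldfld    unsigned int8 Wrox.AdvDotNet.ClockSample.Clock::hours
         ret
      }

      .property instance unsigned int8 Hours()
      {
         .get instance unsigned int8 Wrox.AdvDotNet.ClockSample.Clock::get_Hours()
      }
   }
}
```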

Now for the Main() method in the TestClock.il file. For this test, we invoke the default constructor:

    .class EntryPoint extends [mscorlib]System.Object
    {
       .method static void Main() cil managed
       {
          .maxstack 2
          .locals init (class [Clock]Wrox.AdvDotNet.ClockSample.Clock)
          .entrypoint
          newobj   instance void [Clock]Wrox.AdvDotNet.ClockSample.Clock::.ctor()
          stloc.0
          ldstr    "Hours are "
          call     void [mscorlib]System.Console::Write(string)
          ldloc.0
          call     instance unsigned int8 [Clock]Wrox.AdvDotNet.ClockSample.Clock::get_Hours()
          call     void [mscorlib]System.Console::WriteLine(int32)
          ret
       }
    }

The first difference is that in our definition of the type of the local variable that will store the object reference, we prefix the name of the type with class rather than with valuetype. This informs the JIT compiler that it only needs to reserve space for a reference rather than for the actual object. Interestingly, this program would work equally well if we declared the local variable as simply being of type object, although for obvious type-safety reasons, it's better practice to supply as much information as we can about the type in the IL code:

 .locals init (object)      // This would work too

More specifically, the advantage of declaring the type explicitly is that if there were a bug in your code that caused the wrong type of object to be instantiated, this could potentially be detected and an exception raised earlier - when the object reference is first stored in the local variable rather than the first time a method on it is called.

The next difference comes when we instantiate the object, and this is the crucial one. For a value type, the space to hold the instance was allocated in the local variables table at the start of the method, when we declared the local variable. For a reference type, we need a new instruction to instantiate the object on the managed heap: newobj. newobj takes one argument, which must be a token for the constructor you wish to call. We don't need a separate token to indicate the type, since that information can be deduced from the constructor we are invoking. Any parameters required by the constructor are popped off the stack, and a reference to the newly created object is pushed onto the stack:

..., parameter1, ..., parameterN → ..., object

In our case, since we are calling the default constructor, no parameters will be popped from the stack. All we need to do once we've instantiated the object is store the reference to it using stloc.0:

    newobj   instance void [Clock]Wrox.AdvDotNet.ClockSample.Clock::.ctor()
    stloc.0

There is one more change we need to make to our code. When loading the object reference onto the evaluation stack in order to call its get_Hours() method, we use ldloc.0 instead of ldloca.s 0:

    ldloc.0
    call     instance unsigned int8 [Clock]Wrox.AdvDotNet.ClockSample.Clock::get_Hours()

The reason for this change is that invoking an instance method requires a reference to the object to be on the stack. Previously, the local variable held the object itself when it was a value type, which meant we needed to load its address using ldloca.s in order to obtain a managed pointer suitable for use as the reference. Now, however, the local variable already contains an object reference. Using ldloca.s would result in the evaluation stack containing an unusable managed pointer to a reference. ldloc.0 is sufficient to put the correct data on the stack.

Finally, just for the sake of completeness, we'll show how we would need to change the code to instantiate the Clock instance using its one-parameter constructor (this code is downloadable as the Clock6 sample). The only change we will need to make is to ensure the value to be passed to the constructor is on the stack prior to executing newobj, and, obviously, ensuring that the argument to newobj is the correct constructor token:

    .maxstack 2
    .locals init (class [Clock]Wrox.AdvDotNet.ClockSample.Clock)
    .entrypoint
    ldc.i4.s 6      // To initialize hours to 6
    newobj   instance void [Clock]Wrox.AdvDotNet.ClockSample.Clock::.ctor(unsigned int8)
    stloc.0

Virtual Methods

In this section we'll examine how to declare, override, and invoke virtual methods in IL.

Declaring Virtual Methods

Declaring and overriding virtual methods works the same way in principle in IL as in higher-level languages, although IL gives you somewhat more freedom in how you define and invoke the methods. We'll demonstrate this through a new sample, Clock7, in which we override the ToString() method in our Clock class to return the time in hours followed by "O'Clock" (such as "6 O'Clock"). Here's our override:

    .method public virtual hidebysig string ToString()
    {
       ldstr   "{0} O'Clock"
       ldarg.0
       ldflda  unsigned int8 Wrox.AdvDotNet.ClockSample.Clock::hours
       call    instance string unsigned int8::ToString()
       call    string string::Format(string, object)
       ret
    }

This method uses the two-parameter overload of the String.Format() method to format the returned string. Recall that the first parameter to this overload of String.Format() is a format string, and the second parameter is an object whose string value will be inserted into the format string. The above code is roughly equivalent to this C# code:

    public override string ToString()
    {
       return String.Format("{0} O'Clock", this.hours.ToString());
    }

There are a couple of points to note in our implementation of ToString(): the way we have invoked unsigned int8.ToString() (or equivalently, System.Byte.ToString()) using a managed pointer, and the virtual and hidebysig attributes in the method declaration.

As far as calling Byte.ToString() is concerned, notice that we have used the ldflda instruction to load the address of the hours field onto the evaluation stack. That's important because when calling an instance method, the this pointer for the method must be on the stack. When you call methods of value types in general, and of the CLR primitive types such as Byte, Int32, and Int64 in particular, the methods expect this to be a managed pointer (not an object reference). This is good for performance, since it means there is no need, for example, to box the value. To call Byte.ToString(), you just push the address of the unsigned int8 to be converted onto the evaluation stack and invoke the method. That unsigned int8 can be literally anywhere: a field, a local variable, an argument, and so on.
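To illustrate that the unsigned int8 can live anywhere, here is a small standalone sketch that calls Byte.ToString() once on a local variable (pushing its address with ldloca.s) and once on a method argument (pushing its address with ldarga.s). The assembly name ByteToString and the FromArgument() helper are hypothetical, invented for this example; run as an exe, it should print 7 and then 9.

```ilasm
.assembly extern mscorlib {}
.assembly ByteToString {}        // hypothetical sample name
.module ByteToString.exe

.class EntryPoint extends [mscorlib]System.Object
{
   // Hypothetical helper: converts its unsigned int8 argument in place
   .method static string FromArgument(unsigned int8 b) cil managed
   {
      ldarga.s 0                 // managed pointer to argument 0
      call     instance string unsigned int8::ToString()
      ret
   }

   .method static void Main() cil managed
   {
      .maxstack 1
      .locals init (unsigned int8)
      .entrypoint
      // Value held in a local variable: pass the local's address as 'this'
      ldc.i4.s 7
      stloc.0
      ldloca.s 0
      call     instance string unsigned int8::ToString()
      call     void [mscorlib]System.Console::WriteLine(string)
      // Value held in an argument: the helper uses ldarga.s instead
      ldc.i4.s 9
      call     string EntryPoint::FromArgument(unsigned int8)
      call     void [mscorlib]System.Console::WriteLine(string)
      ret
   }
}
```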

Now let's look at the virtual and hidebysig attributes.

There are no surprises in the virtual keyword. It's exactly equivalent to virtual in C++, to Overridable or Overrides in VB, and to virtual or override in C#. Notice, however, that ILAsm doesn't have separate keywords such as override/Overrides to indicate whether or not the method is overriding an existing base-class method, as C# and VB do.

hidebysig is another of those flags that actually has no effect whatsoever as far as the runtime is concerned, but is there to provide extra information to compilers and developer tools for high-level languages. Its meaning is subtle (and provides one example of the fine degree of control that IL allows you), but the C# and VB compilers will routinely put it in the IL code they emit, so you need to be familiar with it. hidebysig tells us that this method - if it is to hide any method - should be interpreted by compilers as only hiding methods that have the same name and signature. If we omitted hidebysig, the method would hide any method that has the same name. In high-level languages, you never get a choice about hidebysig behavior: C# always uses it. VB uses it for Overrides methods but not for Shadows methods. MC++ never uses it. These days, hidebysig seems to be regarded as a better object-oriented approach, but MC++ does not use it because the ANSI C++ standard uses hide by name.

Before we go on to see how to invoke virtual methods, we'll quickly list some other flags related to override behavior that might be useful. For these flags, the only new thing you have to learn is the ILAsm keywords - there's no difference in meaning from the high-level language equivalents:

IL Keyword	C# Equivalent	VB Equivalent	MC++ Equivalent	Meaning
newslot	new	Shadows	new	This method takes a new slot in the vtable. It does not override any base class methods even if there are base methods that have the same name.
final	sealed	NotOverridable	__sealed	It is not permitted to further override this method.
abstract	abstract	MustOverride	__abstract	No method body supplied. This method must be overridden in non-abstract derived classes.
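As a sketch of how these keywords appear in declarations, the following Shape and Square classes use abstract, final, and newslot. They are hypothetical classes invented for illustration and are not part of the Clock samples.

```ilasm
.assembly extern mscorlib {}
.assembly Shapes {}              // hypothetical assembly
.module Shapes.dll

.namespace Wrox.AdvDotNet.Hypothetical
{
   .class public abstract auto ansi Shape extends [mscorlib]System.Object
   {
      // abstract: no body; newslot: introduces a new vtable slot
      .method public hidebysig newslot abstract virtual instance string Describe() cil managed
      {
      }

      .method family specialname rtspecialname instance void .ctor() cil managed
      {
         ldarg.0
         call     instance void [mscorlib]System.Object::.ctor()
         ret
      }
   }

   .class public auto ansi Square extends Wrox.AdvDotNet.Hypothetical.Shape
   {
      // final: overrides Shape::Describe() but cannot be overridden again
      .method public hidebysig virtual final instance string Describe() cil managed
      {
         ldstr    "Square"
         ret
      }

      // newslot: a fresh slot, so this does NOT override Object::ToString()
      .method public hidebysig newslot virtual instance string ToString() cil managed
      {
         ldstr    "hidden"
         ret
      }

      .method public specialname rtspecialname instance void .ctor() cil managed
      {
         ldarg.0
         call     instance void Wrox.AdvDotNet.Hypothetical.Shape::.ctor()
         ret
      }
   }
}
```

Because Square.ToString() is declared newslot, a callvirt on [mscorlib]System.Object::ToString() against a Square instance would still reach Object.ToString(); only a callvirt that names Square::ToString() reaches the new method.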

Invoking Virtual Methods

In order to invoke a virtual method, you will not normally use the IL call instruction. Rather, you will use a new instruction, callvirt. The syntax of callvirt is identical to that of call: it takes a token as an argument that indicates the method to be called. However, callvirt cannot be used to call static methods, which means it always requires a this reference for the method to be on the evaluation stack. callvirt also does a bit more work than call. Firstly, it checks that the this reference it pops off the evaluation stack is not null, and throws a NullReferenceException if it is. Secondly, it calls the method using the method table (vtable) of the object on the stack. In other words, it treats the method as a virtual method: you take the small performance hit of an extra level of indirection to locate the method to be invoked, but gain the security of knowing you are calling the appropriate method for the given object.

Here's the code in TestClock.il to display the value of the clock variable:

    .method static void Main() cil managed
    {
       .maxstack 2
       .locals init (class [Clock]Wrox.AdvDotNet.ClockSample.Clock)
       .entrypoint
       // Initialize
       newobj   instance void [Clock]Wrox.AdvDotNet.ClockSample.Clock::.ctor()
       stloc.0
       ldstr    "The time is "
       call     void [mscorlib]System.Console::Write(string)
       ldloc.0
       callvirt instance string [Clock]Wrox.AdvDotNet.ClockSample.Clock::ToString()
       call     void [mscorlib]System.Console::WriteLine(string)
       ret
    }

One interesting feature of IL is that each time you invoke a method, you choose whether to invoke it virtually or non-virtually. In high-level languages, you don't always get that choice (though in C++ you can use the ClassName::MethodName() syntax to indicate which override you need). But in IL, you can always choose to call a method using call or callvirt. Both of these instructions can be used to call either virtual or non-virtual methods. To see the difference more clearly, let's change the above code to this:

    ldloc.0
    callvirt instance string [mscorlib]System.Object::ToString()

Here we invoke Object.ToString(). However, because we're using callvirt, naming System.Object as the class makes no difference to which method runs. The method to be invoked is found by looking up string ToString() in the method table of the object referenced on the evaluation stack, which of course identifies Clock.ToString() as the method to invoke. On the other hand, this code:

    ldloc.0
    call     instance string [mscorlib]System.Object::ToString()

will cause Object.ToString() to be invoked, even though the top of the evaluation stack contains a reference to a Clock instance.
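The following sketch puts both calls into one Main() method so you can compare them side by side. It assumes the Clock7 assembly, with the default constructor still setting hours to 12, so the first line written should be "12 O'Clock" (from Clock.ToString()) and the second the full type name Wrox.AdvDotNet.ClockSample.Clock (from Object.ToString()).

```ilasm
.assembly extern mscorlib {}
.assembly extern Clock {}
.assembly TestClock {}
.module TestClock.exe

.class EntryPoint extends [mscorlib]System.Object
{
   .method static void Main() cil managed
   {
      .maxstack 1
      .locals init (class [Clock]Wrox.AdvDotNet.ClockSample.Clock)
      .entrypoint
      newobj   instance void [Clock]Wrox.AdvDotNet.ClockSample.Clock::.ctor()
      stloc.0
      // Virtual dispatch: Clock's method table selects Clock.ToString()
      ldloc.0
      callvirt instance string [mscorlib]System.Object::ToString()
      call     void [mscorlib]System.Console::WriteLine(string)
      // Non-virtual call: Object.ToString() runs whatever the runtime type
      ldloc.0
      call     instance string [mscorlib]System.Object::ToString()
      call     void [mscorlib]System.Console::WriteLine(string)
      ret
   }
}
```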

Incidentally, although there is a performance hit associated with using callvirt rather than call, it is very small, and the C# team decided it was worth paying on just about every method call. If you examine code generated by the C# compiler, you'll see that callvirt is used almost exclusively, even for non-virtual methods. The reason? The extra robustness you gain because callvirt checks for a null reference. The Microsoft C# compiler works on the philosophy that this extra check is worth the slight performance loss.
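For example, a C#-style call to the non-virtual get_Hours() accessor from the earlier TestClock.il listing would use callvirt rather than call, purely to get the null check. This is an illustrative fragment written by hand, not captured compiler output:

```ilasm
// Local 0 holds a Clock reference (possibly null)
ldloc.0
// get_Hours() is not virtual, so callvirt resolves to the same method
// that call would; the difference is that a null reference throws
// NullReferenceException here, before get_Hours() is entered
callvirt instance unsigned int8 [Clock]Wrox.AdvDotNet.ClockSample.Clock::get_Hours()
call     void [mscorlib]System.Console::WriteLine(int32)
```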

Boxing and Unboxing

Boxing value types is fairly easy in IL: you simply use the box instruction. For example, if we had an int32 stored in the local variable with index 0 and wished to box it, we could use this code:

    ldloc.0
    box      int32

As usual, we've used the int32 keyword as a shorthand for the type. Obviously, for non-primitive types you'd have to indicate the type explicitly:

    ldloc.0   // This is a Color instance
    box      [System.Drawing]System.Drawing.Color

box will leave an object reference on the stack. The stack transition diagram looks like this:

..., value → ..., object ref

Unboxing is quite simple too. You use the unbox instruction, which expects a token identifying the type, pops an object reference off the stack, and returns a managed pointer to the data:

   unbox  int32

unbox has this stack transition diagram:

..., object ref → ..., managed pointer

Notice that the diagram for unbox isn't the reverse of that for box. Where box takes the value-type instance, unbox leaves an address on the stack - a managed pointer. And this reflects an important point we need to understand. Unboxing is not really the opposite of boxing. Boxing involves copying data, but unboxing does not. Boxing up a value type means taking its value - which might be located on the stack or inline on the heap if the object is a member of a reference type - and creating a boxed object on the managed heap and copying the contents of the value type into the boxed object. Because of this, it's common for high-level language programmers to assume that unboxing means extracting the value from the boxed object and copying it back onto the stack. But it doesn't mean that - all that unboxing involves is figuring out the address of the data on the managed heap and returning a managed pointer to that data. The difference can be seen from this diagram:


It really comes down to the difference we indicated earlier between an object reference and a managed pointer. Recall that we said that whereas a managed pointer points to the first field in an object, an object reference contains the address of the object's method table pointer. So when you unbox an object, all you are effectively doing is adding a few bytes (OK, let's be specific: four bytes in version 1 of the .NET Framework) to the object reference and interpreting the result as a managed pointer.
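One way to convince yourself that unbox copies nothing is to write through the managed pointer it returns. In this hypothetical UnboxNoCopy sketch, the stind.i4 stores straight into the boxed object, so printing the box afterwards should show 99 rather than 12. It is meant as a runtime experiment; it is not intended as an example of verifiable code.

```ilasm
.assembly extern mscorlib {}
.assembly UnboxNoCopy {}         // hypothetical sample name
.module UnboxNoCopy.exe

.class EntryPoint extends [mscorlib]System.Object
{
   .method static void Main() cil managed
   {
      .maxstack 2
      .locals init (object)
      .entrypoint
      // Box the value 12 and keep the object reference in local 0
      ldc.i4.s 12
      box      int32
      stloc.0
      // unbox yields a managed pointer INTO the boxed object;
      // storing through it overwrites the boxed data in place
      ldloc.0
      unbox    int32
      ldc.i4.s 99
      stind.i4
      // Console.WriteLine(object) calls ToString() on the boxed int32
      ldloc.0
      call     void [mscorlib]System.Console::WriteLine(object)
      ret
   }
}
```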

The confusion typically arises from this kind of code:

    // C# code
    int someNumber = 12;
    object boxedNumber = someNumber;   // Box the integer
    // Do some processing on object
    int copy = (int)boxedNumber;       // Unbox and copy the integer

In this code, the statement in which we declare and initialize the boxedNumber variable is correctly described as boxing the integer. Later we take a copy of the boxed value - and this procedure is often loosely described by C# developers as unboxing the value. In fact, this last line of code unboxes and copies the value.

So after all that, suppose you want to unbox a value and take a local copy of it in IL. How do you do it? The answer is that you go back to the ldind.* instructions to dereference the managed pointer, and so retrieve the actual value of the data. The following code snippet shows you how to do that:

    // Top of stack contains an object reference to a boxed int32.
    // We need to store the value of the int32 in local variable 0.
    unbox    int32
    ldind.i4
    stloc.0
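Applied to the earlier C# snippet, the whole sequence looks roughly like the following, with the three C# variables mapped onto local variables 0, 1, and 2. This is a hand-written sketch in the style of a version 1 compiler; compilers from .NET 2.0 onward normally emit the single instruction unbox.any int32 in place of the unbox/ldind.i4 pair.

```ilasm
.locals init (int32 someNumber, object boxedNumber, int32 copy)
// int someNumber = 12;
ldc.i4.s 12
stloc.0
// object boxedNumber = someNumber;  (box: copies the int32 onto the heap)
ldloc.0
box      int32
stloc.1
// int copy = (int)boxedNumber;      (unbox gives an address; ldind.i4 copies)
ldloc.1
unbox    int32
ldind.i4
stloc.2
```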

Finally, before we go on to present a sample that illustrates boxing, I want to warn you about a potential catch. The following code, which I've called the BoxWrong sample, loads an integer, boxes it, then uses System.Int32.ToString() to convert its value to a string and display it:

    .method static void Main() cil managed
    {
       .maxstack 1
       .entrypoint
       ldstr     "The number is "
       call      void [mscorlib]System.Console::Write(string)
       ldc.i4.s  -45
       box       int32
       call      instance string int32::ToString()
       call      void [mscorlib]System.Console::WriteLine(string)
       ret
    }

If you think that this code will display the value -45, you're in for a shock:

    C:\AdvDotNet\ILDeeper\BoxWrong Sample>boxwrong
    The number is 2041972160

So what's gone wrong? The answer is quickly revealed if we run peverify on the code:

    C:\AdvDotNet\ILDeeper\BoxWrong Sample>peverify boxwrong.exe

    Microsoft (R) .NET Framework PE Verifier  Version 1.0.3705.0
    Copyright (C) Microsoft Corporation 1998-2001. All rights reserved.

    [IL]: Error: [c:\AdvDotNet\ILDeeper\boxwrong sample\boxwrong.exe : Wrox.AdvDotNet.BoxWrong.EntryPoint::Main] [offset 0x00000011] [opcode call] [found [box]value class 'System.Int32'] [expected address of Int32] Unexpected type on the stack.
    1 Errors Verifying boxwrong.exe

The problem is that we are calling a method of a value type with an object reference to the boxed value on the stack, rather than a managed pointer. Remember I said earlier that you should use a managed pointer, not an object reference, to invoke members of value types. The same applies to boxed value types. Because int32 is a value type, its ToString() implementation expects the first parameter passed in to be a managed pointer to the int32 that needs to be converted. If the method had to do additional processing to figure out whether what it had been given was actually an object reference, performance would be pretty badly affected. In the case of the above code, Int32.ToString() will dereference the object reference, which (at least for .NET version 1) will lead to the value (an address) stored in the object's method table pointer, and convert that to a string. Not surprisingly, the result isn't very meaningful.

Fortunately, this program is very easily corrected by inserting an unbox command. The corrected code is downloadable as the BoxRight sample.

    ldc.i4.s  -45
    box       int32   // Assuming we need to box for some other reason
    unbox     int32
    call      instance string int32::ToString()
    call      void [mscorlib]System.Console::WriteLine(string)

And the moral is: if you're calling a method defined in a value type, give it a managed pointer to the value, not an object reference to a boxed value type.

Note that the BoxRight sample is based on the supposition that we have some good reason for wanting to box our int32, presumably connected with something else we intend to do to it, for example passing it to a method that expects an object as a parameter. As the code stands, it would be a lot more efficient not to box at all, but to store the integer in a local variable and load its address using ldloca. But then I couldn't demonstrate boxing to you.
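For comparison, here is a sketch of that more efficient version, which never boxes: the integer goes into local variable 0, and ldloca.s supplies the managed pointer that Int32.ToString() expects. The assembly name NoBox is hypothetical; the program should print "The number is -45".

```ilasm
.assembly extern mscorlib {}
.assembly NoBox {}               // hypothetical sample name
.module NoBox.exe

.class EntryPoint extends [mscorlib]System.Object
{
   .method static void Main() cil managed
   {
      .maxstack 1
      .locals init (int32)
      .entrypoint
      ldstr    "The number is "
      call     void [mscorlib]System.Console::Write(string)
      // Store the value, then pass the local's address as the 'this' pointer
      ldc.i4.s -45
      stloc.0
      ldloca.s 0
      call     instance string int32::ToString()
      call     void [mscorlib]System.Console::WriteLine(string)
      ret
   }
}
```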