Introducing IL Assembly

IL itself has a binary format. Just as with native assembly language, an IL instruction is actually stored in its containing assembly as a binary number (an opcode), which means that it would be pretty pointless to use a text editor to read a file containing IL. However, just as with native executable code, an assembly language has been defined for IL that consists of textual mnemonic codes to represent the IL commands. This language is known as IL assembly - though since IL assembly is a long name, you'll often hear it conveniently, albeit not strictly accurately, referred to just as IL, or sometimes as ILAsm or as IL source code. For example, the IL instruction to add two numbers together is the opcode 0x58 (88 in decimal - in this book we follow the usual practice of prefixing hexadecimal numbers with 0x), but this instruction is represented in IL assembly by the string add.

For obvious reasons, we'll use IL assembly rather than the raw binary IL in this book. Strictly speaking, then, we will be teaching you IL assembly rather than straight IL. However, because IL assembly instructions correspond one-to-one with IL instructions, for all practical purposes you will be learning IL as well.

In this chapter we won't worry too much about the actual format of how the opcodes are represented in assemblies - we'll deal with that issue in Chapter 4, in which we examine assembly format in more detail. We will mention, though, that keeping file size small was one of the main design priorities for the binary format. Hence most opcodes occupy just one byte, although some more rarely used ones occupy two bytes. Quite a few of the instructions also take arguments - numbers, in most cases occupying anything from one to four bytes, that follow the opcode in the assembly and provide more information about the instruction. For example, the call instruction, which invokes a method, is followed by a metadata token - a number that indexes into the metadata for the module and can serve to identify the method to be invoked.
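To make the byte-level picture concrete, here is a minimal Python sketch of how a handful of instructions might be laid out in binary. The opcode values are the standard ones for these instructions, but the table is only a tiny subset, the helper name encode is hypothetical, and the two metadata token values are made up purely for illustration.

```python
import struct

# Mnemonic -> (opcode bytes, size in bytes of the inline argument).
# A tiny subset of the instruction set, for illustration only.
OPCODES = {
    "add":   (b"\x58", 0),      # one-byte opcode, no argument
    "ret":   (b"\x2a", 0),
    "ldstr": (b"\x72", 4),      # followed by a 4-byte metadata token
    "call":  (b"\x28", 4),      # followed by a 4-byte metadata token
    "ceq":   (b"\xfe\x01", 0),  # one of the rarer two-byte opcodes
}

def encode(mnemonic, argument=None):
    """Return the bytes for one instruction (hypothetical helper)."""
    opcode, arg_size = OPCODES[mnemonic]
    if arg_size == 0:
        return opcode
    # The argument follows the opcode directly, stored little-endian.
    return opcode + struct.pack("<I", argument)

# Made-up token values: real ones are assigned when the metadata is built.
STRING_TOKEN = 0x70000001
METHOD_TOKEN = 0x0A000001

body = (encode("ldstr", STRING_TOKEN)
        + encode("call", METHOD_TOKEN)
        + encode("ret"))

print(" ".join(f"{b:02x}" for b in body))
# prints: 72 01 00 00 70 28 01 00 00 0a 2a
```

Under these assumptions, three instructions take only eleven bytes, which gives a feel for how compact the format is.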

Of course, the .NET runtime can't actually execute ILAsm: .NET assemblies contain straight IL. So if we start writing ILAsm code, we'll need some way to convert it to IL. Fortunately, Microsoft has provided a command-line tool for this purpose, an IL assembler, which confusingly is also called ilasm - you run the assembler by running the file ilasm.exe.

The dual use of the name ilasm is unfortunate. In this book, when we use the name as a shorthand for IL assembly, we'll always capitalize it ILAsm, as opposed to ilasm for the assembler tool. Also, don't confuse IL Assembly (IL source code stored in text files, usually with the extension .il) with the term assembly (the binary .exe or .dll file that contains assembled IL).

In some ways, this idea of assembling ILAsm code looks like the same process as compiling a higher-level language, but the difference is that the IL assembly process is far simpler, being little more than a substitution of the appropriate binary code for each mnemonic (although the generation of the metadata in the correct format is more complex). Not only that, but in the case of IL, there is a disassembler tool to perform the reverse process - a command-line tool called ildasm.exe, which converts from IL to IL assembly. You've probably already used ildasm to examine the metadata in assemblies that you've compiled from high-level languages. In this book, we'll be using it to generate ILAsm files.
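The claim that assembling and disassembling are essentially table lookups can be illustrated with a short Python sketch of the reverse direction. This is not how ildasm.exe is implemented: the helper name disassemble is hypothetical, the table covers only four single-byte opcodes, and the input bytes use made-up metadata token values.

```python
import struct

# Opcode byte -> (mnemonic, size in bytes of the inline argument).
# A tiny, single-byte-opcode subset, for illustration only.
MNEMONICS = {
    0x58: ("add", 0),
    0x2A: ("ret", 0),
    0x72: ("ldstr", 4),
    0x28: ("call", 4),
}

def disassemble(code):
    """Turn raw IL bytes into a list of text lines (hypothetical helper)."""
    lines = []
    pos = 0
    while pos < len(code):
        mnemonic, arg_size = MNEMONICS[code[pos]]
        pos += 1
        if arg_size:
            # The argument is a little-endian number following the opcode.
            (token,) = struct.unpack_from("<I", code, pos)
            pos += arg_size
            lines.append(f"{mnemonic} 0x{token:08X}")
        else:
            lines.append(mnemonic)
    return lines

# Example bytes with made-up metadata tokens.
raw = bytes.fromhex("7201000070" "280100000a" "2a")
for line in disassemble(raw):
    print(line)
# prints:
# ldstr 0x70000001
# call 0x0A000001
# ret
```

The sketch prints raw token numbers; the real disassembler goes further and uses the metadata to replace each token with the string or method signature it identifies.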

The following diagram summarizes the various languages, and the tools available to convert between them. The shaded boxes show the languages, while the arrows and plain boxes indicate the tools that convert code between the languages:

[Figure: the languages and the tools that convert between them]

A Hello World IL Program

Since the days of C programming back in the 1970s, it's been traditional for the first program we write when learning a new language to display the phrase Hello World at the command line. We're not going to break with that tradition in this book, so we'll start off with a HelloWorld example - an IL program that displays Hello, World. Unfortunately, Visual Studio .NET does not offer any intrinsic support for writing programs directly in IL assembly, so we will have to fall back on a plain text editor, such as Notepad. We therefore open Notepad and type the following code into it:

 // HelloWorld.il
 // This is our first IL Program!
 .assembly extern mscorlib {}
 .assembly HelloWorld
 {
    .ver 1:0:1:0
 }
 .module HelloWorld.exe
 .method static void Main() cil managed
 {
    .maxstack 1
    .entrypoint
    ldstr   "Hello, World"
    call    void [mscorlib]System.Console::WriteLine(string)
    ret
 }

If you want to try this out, you can either type in this code, or you can download the code from the Wrox Press web site - all the samples in this book are available for download. If you type in this file, give it the name HelloWorld.il. Then we can "compile" - or perhaps "assemble" is a better term - the file into an assembly using the ilasm tool - in general you do this by typing ilasm <FileName> at the command prompt. In our case, this produces an assembly called HelloWorld.exe, which we can now run. The following output shows what happens when we assemble and run the program at the command prompt:

 C:\AdvDotNet\ILIntro>ilasm HelloWorld.il
 Microsoft (R) .NET Framework IL Assembler.  Version 1.0.3705.0
 Copyright (C) Microsoft Corporation 1998-2001. All rights reserved.
 Assembling 'HelloWorld.il' , no listing file, to EXE --> 'HelloWorld.EXE'
 Source file is ANSI
 Assembled global method Main
 Creating PE file
 Emitting members:
 Global  Methods: 1;
 Writing PE file
 Operation completed successfully
 C:\AdvDotNet\ILIntro>HelloWorld
 Hello, World

If you don't supply any options to ilasm, it produces an executable assembly with the same name as the .il file, but with the .exe extension. In our case we get a file called HelloWorld.exe, which is of course the file that gets executed when we type HelloWorld. If you actually need a DLL, you just specify the /dll flag when running ilasm:

 ilasm /dll MyLibrary.il 

There is no command-line flag to specify a Windows (as opposed to a console) .exe - to generate a Windows application, you need to specify that in the IL code inside the .il file. You do this using the .subsystem directive, specifying the value 2 for a Windows application:

 .assembly MyAssembly
 {
    .ver 1:0:1:0
 }
 .module MyAssembly.exe
 .subsystem 0x00000002
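The subsystem value ends up in the header of the executable file that is produced, where the Windows loader reads it: 2 marks a Windows (GUI) application and 3 a console one. The following Python sketch shows one way you might check that field, assuming the standard PE file layout. The names read_subsystem and fake_image are hypothetical, and fake_image fabricates just enough of a header for the demonstration rather than a real executable.

```python
import struct

# Subsystem values defined by the Windows PE file format.
SUBSYSTEMS = {2: "Windows GUI", 3: "Windows console"}

def read_subsystem(image):
    """Return the Subsystem field from the bytes of a PE file."""
    # Offset 0x3C of the DOS header holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    # A 4-byte signature and a 20-byte COFF header precede the optional
    # header; the Subsystem field sits at offset 68 within that header.
    optional_header = pe_offset + 4 + 20
    (subsystem,) = struct.unpack_from("<H", image, optional_header + 68)
    return subsystem

def fake_image(subsystem):
    """Build a zero-filled stand-in header (hypothetical, demo only)."""
    image = bytearray(0x200)
    struct.pack_into("<I", image, 0x3C, 0x80)   # where the PE signature lives
    image[0x80:0x84] = b"PE\0\0"
    struct.pack_into("<H", image, 0x80 + 24 + 68, subsystem)
    return bytes(image)

print(SUBSYSTEMS[read_subsystem(fake_image(2))])
# prints: Windows GUI
```

To inspect a real assembly, you would pass the bytes read from the .exe file to read_subsystem instead of the fabricated image.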

Now let's have a look at that IL source code. As you can see, the syntax bears some similarity to C#/C++ syntax in several aspects:

  • Curly braces are used to delimit regions of code. HelloWorld.il uses curly braces to mark the beginning and end of a method, but in general they can be used to group together any instructions for readability.

  • Excess whitespace is also ignored, which allows us to indent the code to make it easier for us to read.

  • The // syntax for single-line comments is supported. IL also supports the /* ... */ syntax for comments that are spread over multiple lines or over just a part of a line. Any text that follows a // on the same line or that comes between a /* and the following */ is completely ignored by the assembler, exactly as in C++ or C#.

On the other hand, IL terminates instructions by white space, not by semi-colons as in C-style languages. In general you'll probably want to place statements on separate lines as it makes the code a lot easier to read.

We'll now work through the code in a bit more detail. At this stage, we're not expecting to understand everything about it - just enough to get a rough idea what is going on.

The first uncommented line is the .assembly directive. This directive is qualified by the keyword extern:

 .assembly extern mscorlib {} 

.assembly extern is used to indicate other assemblies that will be referenced by the code in this assembly. You can have as many .assembly extern directives as you wish, but you must explicitly name all the assemblies that will be directly referenced. Strictly speaking, we don't need to declare mscorlib - mscorlib.dll is such an important assembly that ilasm.exe will assume you want to use it and supply the directive for you anyway, but we've included it here just to make it explicit.

Next we come to another .assembly directive, but this one is not marked as extern:

 .assembly HelloWorld
 {
    .ver 1:0:1:0
 }

This directive instructs the assembler to insert an assembly manifest, which means that the file produced will be a complete assembly (as opposed to a module that will later be incorporated into an assembly). The name of the assembly will be the string following the .assembly directive. The curly braces can contain other information you wish to specify that should go in the assembly manifest (such as the public key or the version number of the assembly) - for now we simply supply the version using the .ver directive. Since this is the first version of this assembly that we've written, we've gone for version 1:0:1:0 - major version 1, minor version 0, build 1, revision 0.
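As a small illustration of how the four colon-separated fields of a .ver value map onto major, minor, build, and revision, here is a Python sketch. The names AssemblyVersion and parse_ver are hypothetical and are not part of any .NET tool.

```python
from typing import NamedTuple

class AssemblyVersion(NamedTuple):
    major: int
    minor: int
    build: int
    revision: int

def parse_ver(text):
    """Parse the argument of a .ver directive, for example '1:0:1:0'."""
    parts = text.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected four fields, got {len(parts)}")
    return AssemblyVersion(*(int(part) for part in parts))

version = parse_ver("1:0:1:0")
print(version)
# prints: AssemblyVersion(major=1, minor=0, build=1, revision=0)
```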

You'll notice that .assembly is one of a number of items in the HelloWorld.il file that are preceded by a dot. This dot indicates that the term is a directive: IL distinguishes between statements, which are actual instructions to be executed when the program is run (for example, ldstr, which loads a string), and directives, which supply extra information about the program. While statements map directly into IL instructions and are part of the formal definition of IL, directives are simply a convenient way of supplying information to the ilasm.exe assembler about the structure of the file and the metadata that should be written to the assembly. There are also some keywords, such as static in the HelloWorld example above, which serve to modify a directive, but which do not have a preceding dot.
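The leading-dot rule is simple enough to express in a few lines. The Python sketch below sorts the lines of HelloWorld.il into directives and instructions by looking only at the first token of each line. It is a deliberately crude rule for illustration, not a real parser, and the helper name classify is hypothetical.

```python
SOURCE = """\
.assembly extern mscorlib {}
.assembly HelloWorld
{
   .ver 1:0:1:0
}
.module HelloWorld.exe
.method static void Main() cil managed
{
   .maxstack 1
   .entrypoint
   ldstr   "Hello, World"
   call    void [mscorlib]System.Console::WriteLine(string)
   ret
}
"""

def classify(line):
    """Label one source line by its first token (hypothetical helper)."""
    stripped = line.strip()
    if not stripped or stripped in ("{", "}") or stripped.startswith("//"):
        return None  # blank lines, lone braces, and comments carry no label
    first = stripped.split()[0]
    kind = "directive" if first.startswith(".") else "instruction"
    return kind, first

labelled = [c for c in map(classify, SOURCE.splitlines()) if c is not None]
directives = [token for kind, token in labelled if kind == "directive"]
instructions = [token for kind, token in labelled if kind == "instruction"]

print(directives)
print(instructions)
# prints:
# ['.assembly', '.assembly', '.ver', '.module', '.method', '.maxstack', '.entrypoint']
# ['ldstr', 'call', 'ret']
```

Only the three names in the second list correspond to opcodes in the assembled file; everything in the first list shapes the metadata and file structure instead.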

The next directive is .module; this declares a module, and indicates the name of the file in which this module should be stored:

 .module HelloWorld.exe 

Note that we supply the file extension for a module, since this is a file name, in contrast to the .assembly directive, which is followed by the assembly name rather than a file name. All assemblies contain at least one module. Since this file will assemble to a whole assembly, there's strictly speaking no need to explicitly declare the module here - ilasm.exe will assume a module with the same name as the assembly if none is specified - but we've put the directive in for completeness.

The next line contains the .method directive, which you won't be surprised to learn instructs the assembler to start a new method definition:

 .method static void Main() cil managed 

What might surprise you if your .NET programming has been entirely in C# is that the method declaration does not come inside a class. C# requires all methods to be inside a class, but that is a requirement of the C# language, not of .NET. In IL, just as in C++, there is no problem with defining global methods. The CLR has to be able to support global methods in order to support languages such as C++ or FORTRAN.

There are a couple of extra keywords here that provide more information about the nature of the Main() method. It is a static method, and it returns void. static in IL usually has the same meaning as in C# and C++ (or the Shared keyword in VB) - it indicates that a method is not associated with any particular object and does not take a this (VB.NET Me) reference as an implicit parameter. IL syntax requires that global methods should also be marked as static, in contrast to many high-level languages, which often only require the static keyword for static methods that are defined inside a class. The method name in IL assembly is followed by parentheses, which contain the details of any parameters expected by the method (but for our Main() method there are no parameters). The string cil managed following the method signature indicates that this method will contain IL code. This is important because .NET also allows methods that contain native executable code instead of IL code.

Once we get into the method definition, there are two more assembler directives before we come to any actual instructions. The .entrypoint directive tells the assembler that this method is the entry point to the program - the method at which execution starts. ilasm.exe will flag an error if it can't find a method marked as the entry point (unless of course you have used the /dll flag to indicate you are assembling to a DLL). We won't worry about the .maxstack directive for now.

Now onto the code:

    ldstr   "Hello, World"
    call    void [mscorlib]System.Console::WriteLine(string)
    ret

It shouldn't be too hard to guess what's going on here. We start off by loading the string, "Hello, World". (We'll examine exactly what we mean by loading the string soon; for now we'll say that this causes a reference to the string to be placed in a special area of memory known as the evaluation stack.) Then we call the Console.WriteLine() method with which you'll be familiar from high-level programming. Finally, the ret command exits the Main() method, and hence (since this is the .entrypoint method) the program altogether.

The above IL assembly code has exactly the same effect as this C# code:

 Console.WriteLine("Hello, World");
 return;

However, the syntax for calling the method is a bit different in IL. In C#, as in most high-level languages, you specify the name of the method, followed by the names of the variables to be passed to the method. In IL, by contrast, you simply supply the complete signature of the method - including the full name, parameter list, and return type of the method. What is actually passed as the parameters to the method at run time will be whatever data is sitting on top of the evaluation stack, which is why we loaded the string onto the stack before calling the method.
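The way a call picks up its arguments from the evaluation stack can be modelled in a few lines of Python. This is a toy model under stated assumptions, not the CLR's actual mechanism: the function run is hypothetical, it understands only the three instructions in HelloWorld, and the parameter count is supplied by hand rather than read from a method signature.

```python
def run(instructions):
    """Interpret a tiny subset of IL using an explicit evaluation stack."""
    stack = []    # the evaluation stack
    output = []   # lines the program would write to the console
    for opcode, operand in instructions:
        if opcode == "ldstr":
            stack.append(operand)       # push a reference to the string
        elif opcode == "call":
            name, param_count = operand
            # The arguments are whatever currently sits on top of the stack;
            # popping reverses them, so restore the original order.
            args = [stack.pop() for _ in range(param_count)][::-1]
            if name == "System.Console::WriteLine":
                output.append(args[0])
            else:
                raise ValueError(f"unknown method {name}")
        elif opcode == "ret":
            break                       # leave the method
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return output, stack

program = [
    ("ldstr", "Hello, World"),
    ("call", ("System.Console::WriteLine", 1)),
    ("ret", None),
]

output, stack = run(program)
print(output)
print(stack)
# prints:
# ['Hello, World']
# []
```

The empty stack at the end reflects the fact that the call consumed the string reference that ldstr had pushed.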

Hopefully the Hello, World program has given you a small flavor of the way IL works. We'll now leave HelloWorld behind and go on to examine the principles behind how IL works in more detail - in particular including that evaluation stack.



Advanced  .NET Programming
ISBN: 1861006292
Year: 2002
Pages: 124
