Coding using Reflection.Emit | Advanced .NET Programming

In most cases, Reflection.Emit is going to be the preferred technique for creating assemblies. Not only is it more efficient, but the programming model is simpler and - provided you have no need for any source code files - more flexible. (If you need to see the emitted code in text form, you can always use ildasm.exe to disassemble it.) Also, because you are effectively writing IL directly into the new assembly, the emitted code is not confined to the subset of CLR features that any one high-level language happens to expose. The only proviso is that - obviously - you won't get far with the Reflection.Emit classes unless you are familiar with IL.

With Reflection.Emit, the newly emitted assembly is not initially created as a file. Rather, you specify an application domain in which to create the assembly. The assembly is then created in memory, in the virtual address space for that application domain. You can then do one of two things with the new assembly:

Save it to a file
Execute it, without ever saving it

Obviously if you don't save the assembly to a file, it will cease to exist as soon as the application domain is unloaded, and so won't be available for any other programs to use. Such an assembly is known as a transient assembly.

You'll start off the operation of creating a transient assembly like this:

    // appDomain is a reference to the AppDomain the assembly is to be created in
    // assemblyName is an AssemblyName object that gives the name of an assembly
    AssemblyBuilder assembly = appDomain.DefineDynamicAssembly(assemblyName,
                                   AssemblyBuilderAccess.Run);

This code uses an enumeration, System.Reflection.Emit.AssemblyBuilderAccess, which defines what you want to do with the assembly. Possible values are Run, Save, and RunAndSave. RunAndSave is the most flexible - it means you'll be able to execute the assembly as soon as you've finished building it, and you'll be able to save it to file too.

The above code starts the whole process off. The next thing to do is add a module to the assembly:

    ModuleBuilder module = assembly.DefineDynamicModule("MainModule");

The string passed to DefineDynamicModule gives the name of the module. There's also a two-parameter version of this method, which is used if you want to save the assembly, and which takes as a second parameter the name of the file to save this module to.
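Current .NET (.NET Core and .NET 5 and later) no longer offers AppDomain.DefineDynamicAssembly(). The file-saving overloads used in this chapter, such as the two-parameter DefineDynamicModule(), are also .NET Framework features. The following sketch shows the same two steps on such a runtime. It uses the static AssemblyBuilder.DefineDynamicAssembly() method instead and assumes a C# 9 top-level-statements program. The assembly name TransientDemo is arbitrary.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// The assembly's identity; "TransientDemo" is an arbitrary example name.
var assemblyName = new AssemblyName("TransientDemo");

// AssemblyBuilderAccess.Run: the assembly exists only in memory and can be
// executed but not saved, so it is a transient assembly.
AssemblyBuilder assembly =
    AssemblyBuilder.DefineDynamicAssembly(assemblyName, AssemblyBuilderAccess.Run);

// One-parameter overload: the module gets a name but no file name.
ModuleBuilder module = assembly.DefineDynamicModule("MainModule");

Console.WriteLine($"{assembly.GetName().Name} is dynamic: {assembly.IsDynamic}");
```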

The next step is to define your classes, etc. that will go in the module. The following code creates a class called MyClass, which is derived from System.Object.

    TypeBuilder myClass = module.DefineType("MyClass", TypeAttributes.Class,
                                            typeof(System.Object));

Once you have a type, you'll want to add members to it. There are various TypeBuilder methods defined to do this, with names like DefineMethod(), DefineConstructor(), DefineEvent(), and DefineField(). For example, to add a method to the class you might do this:

    MethodBuilder myMethod = myClass.DefineMethod("GroovyMethod",
        MethodAttributes.Public | MethodAttributes.Virtual,
        typeof(float), new Type[] { typeof(int), typeof(int) });

Defining a method is more complex than most operations because you have a lot more information to supply. The first parameter to DefineMethod() gives the name of the method. The second parameter is a System.Reflection.MethodAttributes flag enumeration which specifies such things as whether the method should be public, private, protected, etc., and whether it should be virtual. The third parameter is the return type, and the final parameter is a Type[] array that lists the types of the arguments this method will take. So the above line of code will create a method with the IL equivalent of this signature:

 public virtual float GroovyMethod(int, int)

Adding code to the method is the point at which you need to start getting your hands dirty with IL. The starting point is the MethodBuilder.GetILGenerator() method. This returns an ILGenerator object, which can be used to add code to the method through its Emit() method:

    ILGenerator ilStream = myMethod.GetILGenerator();
    ilStream.Emit(OpCodes.Ldarg_1);   // first int argument
    ilStream.Emit(OpCodes.Ldarg_2);   // second int argument
    ilStream.Emit(OpCodes.Add);       // add them
    ilStream.Emit(OpCodes.Conv_R4);   // convert the sum to float
    ilStream.Emit(OpCodes.Ret);

This code adds the following IL to the method:

    ldarg.1
    ldarg.2
    add
    conv.r4
    ret

The emitted method here simply adds the two arguments, converts the result to a float, and returns it. (Note that the arguments are indexed as 1 and 2, since argument 0 is the this reference.)
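Here is a minimal runnable sketch that puts the fragments so far together. It builds MyClass, emits the GroovyMethod body shown above, and calls the method. It assumes .NET 5 or later with C# top-level statements, so it uses the static AssemblyBuilder.DefineDynamicAssembly() instead of an AppDomain. It also marks MyClass public and gives it an explicit default constructor so that Activator.CreateInstance() can instantiate it. The assembly name GroovyDemo is arbitrary.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(
    new AssemblyName("GroovyDemo"), AssemblyBuilderAccess.Run);
ModuleBuilder module = assembly.DefineDynamicModule("MainModule");

// A public class derived from System.Object, with a public parameterless constructor.
TypeBuilder myClass = module.DefineType("MyClass",
    TypeAttributes.Class | TypeAttributes.Public, typeof(object));
myClass.DefineDefaultConstructor(MethodAttributes.Public);

// public virtual float GroovyMethod(int, int)
MethodBuilder myMethod = myClass.DefineMethod("GroovyMethod",
    MethodAttributes.Public | MethodAttributes.Virtual,
    typeof(float), new Type[] { typeof(int), typeof(int) });

ILGenerator ilStream = myMethod.GetILGenerator();
ilStream.Emit(OpCodes.Ldarg_1);   // first int argument (argument 0 is 'this')
ilStream.Emit(OpCodes.Ldarg_2);   // second int argument
ilStream.Emit(OpCodes.Add);       // int + int
ilStream.Emit(OpCodes.Conv_R4);   // convert the sum to float
ilStream.Emit(OpCodes.Ret);

// Bake the type, create an instance, and call the emitted method via reflection.
Type myNewType = myClass.CreateType()!;
object instance = Activator.CreateInstance(myNewType)!;
float result = (float)myNewType.GetMethod("GroovyMethod")!
    .Invoke(instance, new object[] { 2, 3 })!;
Console.WriteLine($"GroovyMethod(2, 3) = {result}");
```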

The ILGenerator.Emit() method deserves a bit more analysis. It has many overloads, but the first argument is always the IL instruction to be added. This is represented by a System.Reflection.Emit.OpCode value, which you normally obtain from the System.Reflection.Emit.OpCodes class. OpCodes is designed to represent the IL opcodes and has a large number of static read-only fields of type OpCode, one for each of the different codes. In fact, the class implements only static members - you cannot instantiate it. (There is also a static method, OpCodes.TakesSingleByteArgument(), that can be used to obtain information about the arguments required by different opcodes.) The names of the fields of this class are basically the same as the mnemonics for the corresponding IL opcodes, with a couple of modifications to conform to normal .NET naming conventions: each dot-separated part of the mnemonic starts with an uppercase letter, and the dots are replaced by underscores. Hence OpCodes.Ldarg_1 represents the IL opcode ldarg.1, OpCodes.Conv_R4 represents conv.r4, and so on.
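To see the naming convention and the operand information for yourself, the sketch below prints the Name and OperandType of a few OpCode values and calls OpCodes.TakesSingleByteArgument(). It assumes .NET 5 or later with C# top-level statements.

```csharp
using System;
using System.Reflection.Emit;

// Each OpCodes field is an OpCode value: Name gives the IL mnemonic, and
// OperandType describes the inline argument (if any) the instruction expects.
OpCode[] samples = { OpCodes.Ldarg_1, OpCodes.Conv_R4, OpCodes.Ldarg_S, OpCodes.Ldc_I4_0 };
foreach (OpCode op in samples)
    Console.WriteLine($"{op.Name,-9} operand type: {op.OperandType}");

// The static helper: true for opcodes whose inline argument is a single byte.
bool shortForm = OpCodes.TakesSingleByteArgument(OpCodes.Ldarg_S);
Console.WriteLine($"ldarg.s takes a single-byte argument: {shortForm}");
```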

I mentioned there are other overloads of ILGenerator.Emit(). This is to take account of the fact that many opcodes require one or more arguments of different types - the other ILGenerator.Emit() overloads take additional parameters that specify the opcode arguments to be added to the IL stream. We'll see some of these overloads in action in the examples later in the chapter.
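As a small illustration of these extra-argument overloads, this sketch emits a static method that returns 42. It uses Emit(OpCode, int) for ldc.i4 and Emit(OpCode, sbyte) for the short form ldc.i4.s. The cast to sbyte matters. A plain int literal would select the int overload and write four operand bytes after an opcode that expects one, which is exactly the kind of mismatch Emit() does not check for. The type name Answers and the assembly name are arbitrary, and .NET 5 or later with top-level statements is assumed.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(
    new AssemblyName("OverloadDemo"), AssemblyBuilderAccess.Run);
ModuleBuilder module = assembly.DefineDynamicModule("MainModule");
TypeBuilder answers = module.DefineType("Answers",
    TypeAttributes.Class | TypeAttributes.Public);

// public static int Get() { return 40 + 2; }
MethodBuilder getMethod = answers.DefineMethod("Get",
    MethodAttributes.Public | MethodAttributes.Static, typeof(int), Type.EmptyTypes);
ILGenerator il = getMethod.GetILGenerator();
il.Emit(OpCodes.Ldc_I4, 40);          // Emit(OpCode, int): 4-byte inline operand
il.Emit(OpCodes.Ldc_I4_S, (sbyte)2);  // Emit(OpCode, sbyte): 1-byte inline operand
il.Emit(OpCodes.Add);
il.Emit(OpCodes.Ret);

Type answersType = answers.CreateType()!;
int value = (int)answersType.GetMethod("Get")!.Invoke(null, null)!;
Console.WriteLine($"Get() returned {value}");
```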

Once you've added all the members, IL instructions and so on to a type, you complete the process of defining the type, and make sure that it is added to the module, like this:

 Type myNewType = myClass.CreateType();

In other words, you call the TypeBuilder.CreateType() method. You can think of everything up to this point as simply telling the type builder what code and metadata you want in the type when it's created. CreateType() is the method that does the work and actually creates the type. The neat thing is that it returns a System.Type reference to a Type object that describes the newly created type. This is a fully working type reference. Provided you created the assembly as a Run or RunAndSave assembly (that is, you indicated in the AppDomain.DefineDynamicAssembly() call that it was to be run), you can start manipulating the type straight away. For example, you can instantiate it:

    // paramsList is an object[] array that gives the parameters to
    // be supplied to the MyClass constructor
    object myNewObject = Activator.CreateInstance(myNewType, paramsList);
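Here is a minimal sketch of the CreateType()/Activator.CreateInstance() sequence, assuming .NET 5 or later and top-level statements. MyClass is given only a parameterless constructor, so paramsList is an empty array here. TypeBuilder.IsCreated() shows the builder's state before and after the call.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(
    new AssemblyName("CreateTypeDemo"), AssemblyBuilderAccess.Run);
ModuleBuilder module = assembly.DefineDynamicModule("MainModule");

TypeBuilder myClass = module.DefineType("MyClass",
    TypeAttributes.Public | TypeAttributes.Class, typeof(object));
myClass.DefineDefaultConstructor(MethodAttributes.Public);

Console.WriteLine($"Created before CreateType(): {myClass.IsCreated()}");
Type myNewType = myClass.CreateType()!;
Console.WriteLine($"Created after CreateType():  {myClass.IsCreated()}");

// An empty parameter list selects the parameterless constructor.
object[] paramsList = Array.Empty<object>();
object myNewObject = Activator.CreateInstance(myNewType, paramsList)!;
Console.WriteLine(myNewObject.GetType().FullName);
```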

If, on the other hand, you just want to save the assembly, you can do this:

 assembly.Save("MyNewAssembly.dll");

One other neat thing you'll discover is this: the Reflection.Emit classes are not only based on the .NET type system, but in many cases are directly derived from corresponding classes in the System.Reflection namespace. For example, you'll no doubt be familiar with the System.Reflection.Assembly class, which represents an assembly. The System.Reflection.Emit.AssemblyBuilder class is derived from Assembly, and additionally implements methods that allow you to create a new assembly instead of loading an existing one. Similarly, TypeBuilder is derived from System.Type, while ModuleBuilder is derived from System.Reflection.Module. The same pattern applies to most of the Reflection classes that represent items in an assembly. The beauty of this model is that it permits inline use of transient assemblies: the AssemblyBuilder and related classes that you use to create a new assembly already contain all the properties, etc. you need to query information about the assembly and its contained types.
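You can confirm these relationships with ordinary reflection. The sketch below assumes .NET 5 or later and top-level statements. It checks each builder class against its System.Reflection counterpart, then queries a freshly defined AssemblyBuilder through a plain Assembly reference. On current runtimes TypeBuilder reaches Type through TypeInfo, and IsAssignableFrom() takes that intermediate class into account.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Each builder class paired with the System.Reflection class it extends.
var pairs = new (Type Builder, Type Base)[]
{
    (typeof(AssemblyBuilder),    typeof(Assembly)),
    (typeof(ModuleBuilder),      typeof(Module)),
    (typeof(TypeBuilder),        typeof(Type)),
    (typeof(MethodBuilder),      typeof(MethodInfo)),
    (typeof(ConstructorBuilder), typeof(ConstructorInfo)),
    (typeof(FieldBuilder),       typeof(FieldInfo)),
};
foreach (var (builder, baseType) in pairs)
    Console.WriteLine($"{builder.Name} derives from {baseType.Name}: {baseType.IsAssignableFrom(builder)}");

// Because AssemblyBuilder is an Assembly, ordinary reflection queries work on it.
AssemblyBuilder assemblyBuilder = AssemblyBuilder.DefineDynamicAssembly(
    new AssemblyName("QueryDemo"), AssemblyBuilderAccess.Run);
Assembly asAssembly = assemblyBuilder;
Console.WriteLine($"{asAssembly.GetName().Name}, dynamic: {asAssembly.IsDynamic}");
```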

Creating a Saved Executable Assembly

We are now going to present an example, called EmitHelloWorld, which uses the Reflection.Emit classes to create an executable assembly that contains the usual Hello, World! Main() method. The example will generate an assembly containing this IL code (as viewed in ildasm.exe):

    .method public static void Main() cil managed
    {
      .entrypoint
      // Code size       11 (0xb)
      .maxstack  1
      IL_0000:  ldstr      "Hello, World!"
      IL_0005:  call       void [mscorlib]System.Console::WriteLine(string)
      IL_000a:  ret
    }

The code for the example looks like this:

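    using System;
    using System.Reflection;
    using System.Reflection.Emit;
    using System.Threading;

    public static void Main()
    {
       // Identity of the new assembly
       AssemblyName assemblyName = new AssemblyName();
       assemblyName.Name = "HelloWorld";
       assemblyName.Version = new Version("1.0.1.0");
       // AssemblyBuilderAccess.Save and AssemblyBuilder.Save() are .NET Framework features
       AssemblyBuilder assembly = Thread.GetDomain().
                 DefineDynamicAssembly(assemblyName, AssemblyBuilderAccess.Save);
       ModuleBuilder module;
       // The module goes into HelloWorld.exe, the same file as the assembly
       module = assembly.DefineDynamicModule("MainModule", "HelloWorld.exe");
       // Main() is a global method: public, static, returns void, no parameters
       MethodBuilder mainMethod = module.DefineGlobalMethod("Main",
          MethodAttributes.Static | MethodAttributes.Public, typeof(void),
          Type.EmptyTypes);
       // Locate Console.WriteLine(string) for the call instruction
       Type[] writeLineParams = { typeof(string) };
       MethodInfo writeLineMethod = typeof(Console).GetMethod("WriteLine",
                                                              writeLineParams);
       ILGenerator constructorIL = mainMethod.GetILGenerator();
       constructorIL.Emit(OpCodes.Ldstr, "Hello, World!");
       constructorIL.Emit(OpCodes.Call, writeLineMethod);
       constructorIL.Emit(OpCodes.Ret);
       // Bake the global function, make it the entry point, save as a console EXE
       module.CreateGlobalFunctions();
       assembly.SetEntryPoint(mainMethod, PEFileKinds.ConsoleApplication);
       assembly.Save("HelloWorld.exe");
    }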

The first thing we do in this code is to define the identity for the assembly. An assembly identity is represented by the System.Reflection.AssemblyName class. Next we use the Thread.GetDomain() method to retrieve a reference to the application domain in which the current (main) thread of execution is running, and ask the application domain to create a new assembly, specifying that the new assembly is to be saved to file:

    AssemblyName assemblyName = new AssemblyName();
    assemblyName.Name = "HelloWorld";
    assemblyName.Version = new Version("1.0.1.0");
    AssemblyBuilder assembly = Thread.GetDomain().
              DefineDynamicAssembly(assemblyName, AssemblyBuilderAccess.Save);

From this, we create a module, and define a method. For such a simple application as we are creating, no types need to be defined - all we need is to set the Main() method up as a global method. This is done with the ModuleBuilder.DefineGlobalMethod() method:

    MethodBuilder mainMethod = module.DefineGlobalMethod("Main",
         MethodAttributes.Static | MethodAttributes.Public, typeof(void),
         Type.EmptyTypes);

The four parameters passed to DefineGlobalMethod() are, respectively, the method name, the attributes, the return type (void in this case), and a Type[] array giving the parameter types. We pass the special static field Type.EmptyTypes for the final parameter to indicate that the method will not take any parameters.

The next step will be to define the IL instruction stream for this method. Before we do that, we need a bit of preliminary work. The IL instruction stream is going to contain a call command to call the Console.WriteLine() method. So before we start, we need a MethodInfo object that describes this method, from which Reflection.Emit can generate the metadata token that the call instruction refers to. We can use the Type.GetMethod() method to obtain it:

    Type[] writeLineParams = { typeof(string) };
    MethodInfo writeLineMethod = typeof(Console).GetMethod("WriteLine",
                                                           writeLineParams);

Notice that because here we are simply retrieving a MethodInfo reference that describes an existing method, we can use the GetMethod() method that is implemented by TypeBuilder's base type, Type - this statement is exactly the same as you would see in normal reflection calls.
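The Type[] argument is what makes this lookup work. Console.WriteLine() has many overloads, so asking for it by name alone is ambiguous. This sketch shows both cases and assumes .NET 5 or later with top-level statements.

```csharp
using System;
using System.Reflection;

// With a parameter list, GetMethod() selects the WriteLine(string) overload.
Type[] writeLineParams = { typeof(string) };
MethodInfo writeLineMethod = typeof(Console).GetMethod("WriteLine", writeLineParams)!;
Console.WriteLine(writeLineMethod);
Console.WriteLine($"static: {writeLineMethod.IsStatic}");

// Without a parameter list, GetMethod() cannot choose between the overloads.
bool ambiguous = false;
try
{
    typeof(Console).GetMethod("WriteLine");
}
catch (AmbiguousMatchException)
{
    ambiguous = true;
}
Console.WriteLine($"name-only lookup ambiguous: {ambiguous}");
```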

Now we can write out the instruction stream.

    ILGenerator constructorIL = mainMethod.GetILGenerator();
    constructorIL.Emit(OpCodes.Ldstr, "Hello, World!");
    constructorIL.Emit(OpCodes.Call, writeLineMethod);
    constructorIL.Emit(OpCodes.Ret);

The Reflection.Emit classes automatically work out the required .maxstack size and write it into the method's header, so we don't need to worry about that.

The ldstr command is emitted by a two-parameter overload of ILGenerator.Emit(). The second parameter is simply a string, and this method will automatically cause the string to be added to the metadata as a string literal. Similarly, another overload of Emit() can emit the call command. This overload takes a MethodInfo reference as the second parameter, to identify the method to be called. Internally, this Emit() overload will add an appropriate MethodRef entry to the metadata and construct the metadata token that will be inserted into the IL stream as the argument to the call opcode.

Unfortunately, although it should be obvious that certain overloads of ILGenerator.Emit() will only generate correct IL if used with certain opcodes, these methods don't appear to perform any checking on the opcodes they have been passed. This means that it's very easy to use the overloads of this method to write out an instruction that has an argument type that isn't appropriate to the opcode - for example emitting an ldc.i4.0 command (which doesn't take an argument) and putting a method token as an argument! Obviously, the JIT compiler won't be able to make any sense of the resultant instruction stream since it'll interpret the 'argument' as more IL opcodes!
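Since Emit() does not check operands for you, one defensive option is a small wrapper that compares the opcode's OperandType with the kind of operand being supplied. EmitMethodChecked() below is a hypothetical helper of my own and is not part of Reflection.Emit. It covers only the method-token case. The sketch assumes .NET 5 or later and top-level statements, and the Greeter type name is arbitrary.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical helper (not part of Reflection.Emit): refuse to emit a
// method-token operand unless the opcode actually expects one.
static void EmitMethodChecked(ILGenerator il, OpCode op, MethodInfo method)
{
    if (op.OperandType != OperandType.InlineMethod)
        throw new ArgumentException($"{op.Name} does not take a method operand ({op.OperandType})");
    il.Emit(op, method);
}

AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(
    new AssemblyName("CheckedEmitDemo"), AssemblyBuilderAccess.Run);
TypeBuilder tb = assembly.DefineDynamicModule("MainModule").DefineType("Greeter");
MethodBuilder mb = tb.DefineMethod("Hello",
    MethodAttributes.Public | MethodAttributes.Static, typeof(void), Type.EmptyTypes);
ILGenerator il = mb.GetILGenerator();
MethodInfo writeLine = typeof(Console).GetMethod("WriteLine", new[] { typeof(string) })!;

il.Emit(OpCodes.Ldstr, "Hello from checked IL");
EmitMethodChecked(il, OpCodes.Call, writeLine);   // accepted: call takes a method token

// ldc.i4.0 takes no operand, so the helper rejects it before anything is written.
bool rejected = false;
try { EmitMethodChecked(il, OpCodes.Ldc_I4_0, writeLine); }
catch (ArgumentException e) { rejected = true; Console.WriteLine(e.Message); }

il.Emit(OpCodes.Ret);
tb.CreateType()!.GetMethod("Hello")!.Invoke(null, null);
```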

Finally, we need to do a bit of finishing off. The final three statements in the example create our global Main() function, set it up as the entry point method for the assembly, make sure it's an executable assembly (as opposed to a DLL) that will be emitted, and finally actually save the assembly:

    module.CreateGlobalFunctions();
    assembly.SetEntryPoint(mainMethod, PEFileKinds.ConsoleApplication);
    assembly.Save("HelloWorld.exe");

ModuleBuilder.CreateGlobalFunctions() does for global functions what TypeBuilder.CreateType() does for types: it finishes the job of writing the methods and accompanying metadata to the assembly.
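AssemblyBuilder.Save() and AppDomain.DefineDynamicAssembly() are .NET Framework features, so on .NET 5 and later the example can't be reproduced as written. The sketch below adapts it for those runtimes. It keeps the same global Main() method and IL but builds a Run-only assembly instead. After CreateGlobalFunctions() it looks up the baked method with ModuleBuilder.GetMethod() and invokes it. The console redirection is there only to make the output checkable.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Reflection.Emit;

AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(
    new AssemblyName("HelloWorld") { Version = new Version("1.0.1.0") },
    AssemblyBuilderAccess.Run);
ModuleBuilder module = assembly.DefineDynamicModule("MainModule");

// Global (module-level) method: public static void Main()
MethodBuilder mainMethod = module.DefineGlobalMethod("Main",
    MethodAttributes.Static | MethodAttributes.Public, typeof(void), Type.EmptyTypes);
MethodInfo writeLineMethod = typeof(Console).GetMethod("WriteLine", new[] { typeof(string) })!;

ILGenerator mainIL = mainMethod.GetILGenerator();
mainIL.Emit(OpCodes.Ldstr, "Hello, World!");
mainIL.Emit(OpCodes.Call, writeLineMethod);
mainIL.Emit(OpCodes.Ret);

// Bake the global function; after this the module can be queried for it.
module.CreateGlobalFunctions();
MethodInfo bakedMain = module.GetMethod("Main")!;

// Capture what the emitted code writes, then echo it to the real console.
var captured = new StringWriter();
TextWriter original = Console.Out;
Console.SetOut(captured);
bakedMain.Invoke(null, null);
Console.SetOut(original);
Console.Write(captured.ToString());
```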

Notice that the string we've passed as a file name to the Save() method is the same as the file name we specified for the module. That ensures the module will be placed in the same file as the assembly itself - as you'd normally expect for a prime module. It might seem odd that we are specifying the same file name twice, but the distinction is important, since it's possible that we might want to create a multifile assembly - in which case some modules would be placed in different files from the main assembly file. Hence we separately specify the file name of the assembly and the file name of the module.

Creating and Running a DLL Assembly

Our second and final Reflection.Emit example, the EmitClass example, is similar to the previous example, but illustrates creating a DLL assembly. In this case, we'll define a class with a constructor and member function instead of a global function, which will somewhat affect the code in the example. We will also both save the assembly and instantiate the class defined in it.

The class we want to create will be called Utilities. It is a very simple class, but it will suffice for our purposes. The class contains an instance string member field, which contains the name of each instance. The value of this field is supplied on construction, and the class overrides Object.ToString() to return this field. The IL emitted, as viewed in ildasm.exe, looks like this:

    .class public auto ansi Utilities
           extends [mscorlib]System.Object
    {
      .field privatescope string a$PST04000001
      .method public virtual instance string
              ToString() cil managed
      {
        // Code size       7 (0x7)
        .maxstack  1
        IL_0000:  ldarg.0
        IL_0001:  ldfld      string Utilities::a$PST04000001
        IL_0006:  ret
      } // end of method Utilities::ToString

      .method public specialname rtspecialname
              instance void .ctor(string name) cil managed
      {
        // Code size       14 (0xe)
        .maxstack  4
        IL_0000:  ldarg.0
        IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
        IL_0006:  ldarg.0
        IL_0007:  ldarg.1
        IL_0008:  stfld      string Utilities::a$PST04000001
        IL_000d:  ret
      } // end of method Utilities::.ctor

    }

Notice the strange name, a$PST04000001, of the field containing the object's name. This is not the real name of the field. The actual name is simply a, but when disassembling, ildasm.exe always appends a string to the names of privatescope members. The string consists of $PST followed by the member's metadata token, here 0x04000001. This avoids ambiguities if the file needs to be reassembled: privatescope members are referenced only by token, so several of them can legitimately share the same name. As for the name a itself, as we saw in Chapter 12, there are good security-related reasons for a field's name not to be meaningful.

For the benefit of anyone who isn't yet comfortable with reading such a long snippet of IL, I'll add that the C# equivalent of this code is:

    public class Utilities
    {
       private string a;
       public override string ToString() { return a; }
       public Utilities(string name) { a = name; }
    }

In more detail, I can now say that the example creates an assembly containing the Utilities class. It then instantiates a Utilities object and calls its ToString() method - just to test that it works. Having done all that, it saves the assembly.

Here is the code for the Main() method in the example:

    using System;
    using System.Reflection;
    using System.Reflection.Emit;
    using System.Threading;

    public static void Main()
    {
       AssemblyName assemblyName = new AssemblyName();
       assemblyName.Name = "Utilities";
       assemblyName.Version = new Version("1.0.1.0");
       // RunAndSave: we both use the new type and save it to Utilities.dll
       AssemblyBuilder assembly = Thread.GetDomain().
          DefineDynamicAssembly(assemblyName, AssemblyBuilderAccess.RunAndSave);
       ModuleBuilder module;
       module = assembly.DefineDynamicModule("MainModule", "Utilities.dll");
       TypeBuilder utilsTypeBldr = module.DefineType("Utilities",
          TypeAttributes.Class | TypeAttributes.Public, typeof(System.Object));
       FieldBuilder nameFld = utilsTypeBldr.DefineField("a", typeof(string),
          FieldAttributes.PrivateScope);

       // public override string ToString() { return a; }
       MethodBuilder toStringMethod = utilsTypeBldr.DefineMethod("ToString",
          MethodAttributes.Public | MethodAttributes.Virtual, typeof(string),
          Type.EmptyTypes);
       ILGenerator toStringIL = toStringMethod.GetILGenerator();
       toStringIL.Emit(OpCodes.Ldarg_0);
       toStringIL.Emit(OpCodes.Ldfld, nameFld);
       toStringIL.Emit(OpCodes.Ret);

       // public Utilities(string name) { a = name; }
       Type[] constructorParamList = { typeof(string) };
       ConstructorInfo objectConstructor = typeof(System.Object).
                                              GetConstructor(new Type[0]);
       ConstructorBuilder constructor = utilsTypeBldr.DefineConstructor(
          MethodAttributes.Public, CallingConventions.Standard,
          constructorParamList);
       ILGenerator constructorIL = constructor.GetILGenerator();
       constructorIL.Emit(OpCodes.Ldarg_0);
       constructorIL.Emit(OpCodes.Call, objectConstructor);
       constructorIL.Emit(OpCodes.Ldarg_0);
       constructorIL.Emit(OpCodes.Ldarg_1);
       constructorIL.Emit(OpCodes.Stfld, nameFld);
       constructorIL.Emit(OpCodes.Ret);

       // Create the type, test it, then save the assembly
       Type utilsType = utilsTypeBldr.CreateType();
       object utils = Activator.CreateInstance(utilsType, new object[] {
                      "New Object!" });
       object name = utilsType.InvokeMember("ToString",
                                   BindingFlags.InvokeMethod, null, utils, null);
       Console.WriteLine("ToString() returned: " + (string)name);
       assembly.Save("Utilities.dll");
    }

This code starts off in much the same way as the previous example. But once we have the module, instead of creating a global function, it uses the ModuleBuilder.DefineType() method to create the Utilities type, followed by the TypeBuilder.DefineField() method to create the member field:

    TypeBuilder utilsTypeBldr = module.DefineType("Utilities",
                                   TypeAttributes.Class | TypeAttributes.Public,
                                   typeof(System.Object));
    FieldBuilder nameFld = utilsTypeBldr.DefineField("a", typeof(string),
                                   FieldAttributes.PrivateScope);

We similarly define a method using TypeBuilder.DefineMethod() and a constructor using TypeBuilder.DefineConstructor(), and use the same techniques we demonstrated in the previous example to add IL code to these members.

When we've done all this, we use TypeBuilder.CreateType() to actually create the type in the assembly and, at the same time, return a System.Type reference to it. This Type reference can then be passed to the Activator.CreateInstance() and Type.InvokeMember() methods to instantiate a Utilities object and call its ToString() method:

    Type utilsType = utilsTypeBldr.CreateType();
    object utils = Activator.CreateInstance(utilsType, new object[] {
                   "New Object!" });
    object name = utilsType.InvokeMember("ToString", BindingFlags.InvokeMethod,
                                         null, utils, null);

One point to watch out for is that, even though TypeBuilder is derived from Type, I have used the Type reference utilsType, which is returned from TypeBuilder.CreateType(), when instantiating the object and invoking methods. Although it looks syntactically correct to use the utilsTypeBldr variable instead of utilsType in these methods, doing so won't work here. This is because the type doesn't actually exist until you call TypeBuilder.CreateType() - which would make it hard to create an instance of it! Using the returned Type object is safe because if you have that object then you can be certain that the type exists. Indeed, Activator.CreateInstance() has been implemented to check that it hasn't been passed a TypeBuilder reference, and will raise an exception if it finds one.
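On .NET 5 or later, RunAndSave and Save() are not available. For those runtimes, here is the same Utilities type built as a Run-only assembly and exercised as described above. It is a sketch under that assumption. The last line checks that the Type returned by CreateType() is not itself a TypeBuilder.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(
    new AssemblyName("Utilities"), AssemblyBuilderAccess.Run);
ModuleBuilder module = assembly.DefineDynamicModule("MainModule");

TypeBuilder utilsTypeBldr = module.DefineType("Utilities",
    TypeAttributes.Class | TypeAttributes.Public, typeof(object));
FieldBuilder nameFld = utilsTypeBldr.DefineField("a", typeof(string),
    FieldAttributes.PrivateScope);

// public override string ToString() { return a; }
MethodBuilder toStringMethod = utilsTypeBldr.DefineMethod("ToString",
    MethodAttributes.Public | MethodAttributes.Virtual, typeof(string), Type.EmptyTypes);
ILGenerator toStringIL = toStringMethod.GetILGenerator();
toStringIL.Emit(OpCodes.Ldarg_0);
toStringIL.Emit(OpCodes.Ldfld, nameFld);
toStringIL.Emit(OpCodes.Ret);

// public Utilities(string name) { a = name; }  (calls Object's constructor first)
ConstructorInfo objectConstructor = typeof(object).GetConstructor(Type.EmptyTypes)!;
ConstructorBuilder constructor = utilsTypeBldr.DefineConstructor(
    MethodAttributes.Public, CallingConventions.Standard, new[] { typeof(string) });
ILGenerator constructorIL = constructor.GetILGenerator();
constructorIL.Emit(OpCodes.Ldarg_0);
constructorIL.Emit(OpCodes.Call, objectConstructor);
constructorIL.Emit(OpCodes.Ldarg_0);
constructorIL.Emit(OpCodes.Ldarg_1);
constructorIL.Emit(OpCodes.Stfld, nameFld);
constructorIL.Emit(OpCodes.Ret);

// Use the Type returned by CreateType(), not the TypeBuilder itself.
Type utilsType = utilsTypeBldr.CreateType()!;
object utils = Activator.CreateInstance(utilsType, new object[] { "New Object!" })!;
object name = utilsType.InvokeMember("ToString", BindingFlags.InvokeMethod,
    null, utils, null)!;
Console.WriteLine("ToString() returned: " + (string)name);
Console.WriteLine($"utilsType is a TypeBuilder: {utilsType is TypeBuilder}");
```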

When using a dynamically created assembly, we have to use reflection-based methods to invoke members on the types so defined - because our original assembly does not have the necessary embedded metadata to be able to use these types directly. Thus there is going to be a performance hit whenever execution flow crosses the boundary from the old assembly to the new one. You can minimize the impact of this by making sure the new assembly has methods that perform a large amount of processing, and making sure the 'interface' between the two assemblies isn't too chatty. In other words, have a few calls across the boundary that perform lots of processing each, rather than lots of calls that each perform only a little processing. These are just the same performance considerations that apply to the managed-unmanaged code boundary or to the crossing of application domains, although the performance hit when using reflection is likely to be greater.

The final action in our example is to save the assembly for future use.

 assembly.Save("Utilities.dll");

When developing code that uses Reflection.Emit, it's well worth regularly running peverify.exe on the dynamically emitted assemblies - just to make sure that your code does generate correct and valid IL.

Obviously, once the new assembly is saved, any future code that you write that depends on this assembly will be able to reference the saved assembly's metadata in the normal way, so won't need to use reflection to access its types.

Be aware that this action is completely independent of the fact that we have already used the Utilities.dll assembly. Remember that you have a complete choice - you can use the assembly from your code, or save it, or do both.