Customizing How Assemblies Are Loaded Using Only Managed Code | Customizing the Microsoft .NET Framework Common Language Runtime

By writing the runcocoon.exe host in unmanaged code, you were able to implement an assembly loading manager that enabled you to customize completely how the CLR loads assemblies. However, as you've seen, there were several new concepts to learn and a considerable amount of code to write. You can also customize the CLR assembly loading process to some degree by writing completely in managed code. The amount of customization available is less than what you can achieve by writing an assembly loading manager, but if what you need to accomplish can be done from within managed code, this approach can save you some time and effort.

In this section, you'll rewrite the runcocoon.exe host in managed code. Doing so gives you a good chance to contrast the amount of customization available between an unmanaged CLR host and a managed extensible application. The extensible application, called runcocoonm.exe, will provide the same basic functionality that the unmanaged host did. That is, it will run applications encased in .cocoon files. You'll invoke the application's entry point just as you did in runcocoon.exe and load the application's assemblies out of the cocoon file instead of having the CLR find them. Although on the surface the functionality you'll be providing is the same, there are several subtle differences in the way the two programs work. Understanding these differences can help you decide which approach best meets your needs.

Before I describe how runcocoonm.exe is implemented, take a look at the pieces you need to build it. To start with, let me revisit the requirements I set for the cocoon deployment model:

Assemblies must be loaded from formats other than standard executable files on disk. In this case, the assemblies must be loaded out of your custom deployment format, an OLE structured storage file.
Assemblies must be loaded from a location other than the application's base directory, the global assembly cache, or from locations described by codebases.
The assemblies contained in the cocoon are the exact assemblies to be loaded when the application is run. The presence of external version policy won't cause you to load a different assembly.

In the unmanaged implementation, these requirements were satisfied by writing an assembly loading manager. In managed code, you achieve a similar effect by using some of the managed methods and events on the System.AppDomain and the System.Reflection.Assembly classes. Specifically, the ability to load assemblies from alternate formats is provided by the versions of Assembly.Load and AppDomain.Load that enable you to specify an array of bytes containing the assembly you'd like to load. The ability to load assemblies from locations in which the CLR wouldn't normally find them is provided by an event on System.AppDomain called AssemblyResolve. The third requirement, the ability to circumvent default version policy, isn't directly provided in managed code. This is one of the primary limitations in what a managed program can do, as you'll see in a bit.

The next few sections describe how the Assembly.Load(byte[]...) and AppDomain.Load(byte[]...) methods and the AppDomain.AssemblyResolve event work. Once you see how to implement the pieces, you bring them together in writing the runcocoonm.exe sample.

The Load(byte[]...) Methods

In runcocoon.exe, you returned assemblies from the cocoon to the CLR by returning a pointer to an IStream interface from IHostAssemblyStore::ProvideAssembly. You can achieve the same effect from within managed code by passing the assembly you'd like to load as a managed byte array to AppDomain.Load and Assembly.Load. The following partial class definitions show the versions of Load that accept a byte array as input:

   public sealed class AppDomain : MarshalByRefObject, _AppDomain,
      IEvidenceFactory
   {
      // ...
      public Assembly Load(byte[] rawAssembly)
      public Assembly Load(byte[] rawAssembly,
                           byte[] rawSymbolStore)
      public Assembly Load(byte[] rawAssembly,
                           byte[] rawSymbolStore,
                           Evidence securityEvidence)
      // ...
   }

   public class Assembly : IEvidenceFactory, ICustomAttributeProvider,
      ISerializable
   {
      // ...
      static public Assembly Load(byte[] rawAssembly)
      static public Assembly Load(byte[] rawAssembly,
                                  byte[] rawSymbolStore)
      static public Assembly Load(byte[] rawAssembly,
                                  byte[] rawSymbolStore,
                                  Evidence securityEvidence)
      // ...
   }

As you can see by these definitions, both Assembly.Load and AppDomain.Load also enable you to pass a byte array containing the debugging file. In the unmanaged implementation, you accomplished this by returning an IStream pointer to the debugging file from IHostAssemblyStore::ProvideAssembly. I'm going to skip the Evidence parameter for now and leave it for the discussion of security in Chapter 10.

On the surface, the forms of Load that accept a byte array and IHostAssemblyStore::ProvideAssembly provide the same functionality: they both enable you to load an assembly from any store you choose. However, using the managed Load method to achieve this is much less efficient. To understand why, you need to take a high-level look at how memory is used by the CLR when it loads an assembly for execution. Before the CLR can run the code in an assembly, it reads the contents of the assembly into memory, verifies that it is well formed, and builds several internal data structures. All of this is done in a heap allocated by a component of the CLR called the class loader. Because these heaps hold only native CLR data structures and not managed objects, I refer to them as unmanaged heaps. In contrast, every process in which managed code is run has a heap where the managed objects are stored. This is the heap that is managed by the CLR garbage collector. I call this heap the managed heap. When an assembly is loaded from an IStream*, the CLR calls IStream::Read to pull the contents of the assembly into an unmanaged heap where it can be verified and then executed. Because the contents of the assembly can be directly loaded into an unmanaged heap, loading from an IStream* is very efficient, as shown in Figure 8-8.

Figure 8-8. Assembly loading from IStream*

Loading an assembly with the managed Load method is less efficient because extra copies of the assembly must be made in memory before the CLR can execute it. The Load method takes an array of managed byte objects. Because these objects are managed, they must live in the managed heap. However, the CLR ultimately needs a copy of the bytes in an unmanaged heap to run them. As a result, an extra copy is made by the CLR to move the assembly's contents from the managed heap to an unmanaged heap. In the cocoon scenario, the case is even worse. You start by reading the contents of a stream into unmanaged memory. From there, you marshal those bytes to the managed heap so Load can be called. The CLR implementation of Load then copies the bytes back to unmanaged memory again! So, you've made two full copies of the assembly before it can be executed. This situation is shown in Figure 8-9.

Figure 8-9. Assembly loaded from managed byte array (byte[])
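The hop from unmanaged memory to the managed heap can be sketched in isolation. The following is a minimal illustration, not code from the sample: the class name CopyPathSketch and its methods are hypothetical, and a small in-memory buffer stands in for the bytes an unmanaged reader would have pulled from a stream. It shows only the copy that your own code makes; the further copy the CLR makes inside Load is not visible from managed code.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper used only for illustration.
public static class CopyPathSketch
{
    // Copies 'size' bytes from an unmanaged buffer into a new managed array.
    // This is the "unmanaged memory -> managed heap" copy described above.
    public static byte[] ToManagedArray(IntPtr unmanagedBuffer, int size)
    {
        byte[] managed = new byte[size];
        Marshal.Copy(unmanagedBuffer, managed, 0, size);
        return managed;
    }

    // Simulates the whole path: place bytes in unmanaged memory (as an
    // unmanaged stream reader would), then marshal them to a managed array.
    public static byte[] RoundTrip(byte[] source)
    {
        IntPtr buffer = Marshal.AllocHGlobal(source.Length);
        try
        {
            Marshal.Copy(source, 0, buffer, source.Length);
            return ToManagedArray(buffer, source.Length);
        }
        finally
        {
            // The unmanaged buffer is no longer needed once the copy exists.
            Marshal.FreeHGlobal(buffer);
        }
    }

    public static void Main()
    {
        // Arbitrary sample bytes standing in for an assembly image.
        byte[] original = { 0x4D, 0x5A, 0x90, 0x00 };
        byte[] copy = RoundTrip(original);
        Console.WriteLine(BitConverter.ToString(copy));
    }
}
```

The managed array returned here is what you would hand to Assembly.Load, at which point the CLR copies the contents once more into one of its own unmanaged heaps.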

In addition to the fact that multiple memory copies must be made to prepare an assembly for execution, the Load method is also less efficient because it doesn't provide a way for the caller to assign a unique identifier to the assembly. Recall that the CLR uses a unique identifier internally to prevent loading the same assembly multiple times. When loading an assembly from a file, the fully qualified filename is used as this unique identifier. When loading an assembly returned from IHostAssemblyStore::ProvideAssembly, the host creates a unique identity and returns it in the pAssemblyId parameter. Because there is no way to specify a unique identifier for an assembly loaded from a managed byte array, the CLR has no way to tell whether the same assembly is being loaded multiple times, so it must treat each call to Load as a separate assembly. As a result, much more memory is used than would be in scenarios when one assembly is loaded multiple times.
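One way to observe this behavior is to load the same bytes twice and compare the results. The sketch below is an illustration under assumptions: the class name DuplicateLoadSketch is hypothetical, and the program is assumed to run from a file on disk so that Assembly.Location is not empty.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical helper used only for illustration.
public static class DuplicateLoadSketch
{
    // Loads the same image twice and reports whether the CLR handed back
    // two separate Assembly objects rather than reusing the first one.
    public static bool LoadsAreDistinct(byte[] rawAssembly)
    {
        Assembly first = Assembly.Load(rawAssembly);
        Assembly second = Assembly.Load(rawAssembly);
        return !object.ReferenceEquals(first, second);
    }

    public static void Main()
    {
        // Assumption: the executing assembly was loaded from a file, so its
        // Location property names a readable path.
        string path = Assembly.GetExecutingAssembly().Location;
        byte[] raw = File.ReadAllBytes(path);

        Console.WriteLine("Distinct loads: " + LoadsAreDistinct(raw));
    }
}
```

Both Assembly objects report the same FullName, yet the types they define are not interchangeable, which is one more reason to cache the result of Load yourself if the same assembly can be requested more than once.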

Despite its limitations, Load is still commonly used to load assemblies from custom formats. The reason, of course, is that it's far easier to use than implementing an entire assembly loading manager. If you need to load an assembly from something other than a standard PE file, try using Load first. If you find that the performance is inadequate for your scenario, you can always go back and reimplement part of your application in unmanaged code to take advantage of an assembly loading manager.

The AssemblyResolve Event

The CLR raises the AppDomain.AssemblyResolve event when it cannot resolve a reference to an assembly. Managed programs can load assemblies from locations in which the CLR wouldn't normally find them by providing a handler for this event.

The key difference between resolving assemblies by handling the AssemblyResolve event and by implementing an assembly loading manager is that the AssemblyResolve event is raised after the CLR has failed to locate an assembly, whereas the assembly loading manager (specifically IHostAssemblyStore::ProvideAssembly) is called before the CLR even starts looking. This difference has huge implications in that it prevents you from building an application model that is completely isolated from the way the CLR applies version policy and loads assemblies by default. As an example, consider how the cocoon deployment model is affected by this difference in behavior. Consider the case in which one of the assemblies contained in the cocoon file is also present in the global assembly cache. Because the CLR would look in the GAC first, the assembly would be found there and the event would never be raised, so you'd never have the chance to load the assembly out of the cocoon. Furthermore, consider the case in which version policy is present on the system for an assembly contained in the cocoon. Because policy is evaluated as part of the CLR normal resolution process, this could cause a different version of the assembly to be loaded, again without you ever getting the chance to affect this. So, as you can see, although the AssemblyResolve event does enable you to load an assembly from a location in which the CLR wouldn't normally look, it doesn't provide the same level of customization that you can achieve by writing an assembly loading manager in unmanaged code.

To use the AppDomain.AssemblyResolve event, you simply create a delegate of type System.ResolveEventHandler and add it to the application domain's list of handlers for the event as shown in the following code snippet:

class ResolveClass
{
   static Assembly AssemblyResolveHandler(Object sender, ResolveEventArgs e)
   {
      // Locate or create an assembly depending on your scenario and return
      // it.
      Assembly asm = ...
      return asm;
   }

   static void Main(string[] args)
   {
      // ...

      // Set up the delegate for the assembly resolve event.
      Thread.GetDomain().AssemblyResolve += new
         ResolveEventHandler(ResolveClass.AssemblyResolveHandler);

      //...
   }
}

As you can see, the AssemblyResolve event takes as input an object of type ResolveEventArgs and returns an instance of an Assembly. ResolveEventArgs has a public property called Name that contains the string name of the assembly the CLR could not locate. Upon return from the event handler, the CLR checks the assembly it has been given to make sure it has the identity given in the Name property. As long as the assembly you return has the correct identity, you're free to take whatever steps you need in your event handler to find the assembly.
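A small experiment makes the timing of the event concrete. The sketch below is illustrative only: the class ResolveSketch, its in-memory Store dictionary, and the assembly name Hypothetical.Missing.Assembly are assumptions introduced for the example. The handler records each name it is asked for, so you can see that it is consulted only for references the CLR could not resolve on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

// Hypothetical helper used only for illustration.
public static class ResolveSketch
{
    // Hypothetical in-memory store: simple assembly name -> raw image bytes.
    public static readonly Dictionary<string, byte[]> Store =
        new Dictionary<string, byte[]>(StringComparer.OrdinalIgnoreCase);

    // Every simple name the CLR asked the handler to resolve.
    public static readonly List<string> Requested = new List<string>();

    // Extracts the simple name ("Foo") from a display name such as
    // "Foo, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null".
    public static string GetSimpleName(string displayName)
    {
        return new AssemblyName(displayName).Name;
    }

    public static Assembly OnAssemblyResolve(object sender, ResolveEventArgs e)
    {
        string simpleName = GetSimpleName(e.Name);
        Requested.Add(simpleName);

        byte[] raw;
        if (Store.TryGetValue(simpleName, out raw))
            return Assembly.Load(raw);

        // Returning null tells the CLR this handler cannot help.
        return null;
    }

    // Returns true if the CLR (with or without the handler) found the assembly.
    public static bool TryLoad(string displayName)
    {
        try
        {
            Assembly.Load(displayName);
            return true;
        }
        catch (FileNotFoundException)
        {
            return false;
        }
    }

    public static void Main()
    {
        AppDomain.CurrentDomain.AssemblyResolve += OnAssemblyResolve;

        // The core library is found by normal resolution, so the handler
        // is never asked about it.
        TryLoad(typeof(object).Assembly.FullName);

        // This name does not exist anywhere, so the handler is consulted.
        TryLoad("Hypothetical.Missing.Assembly, Version=1.0.0.0, " +
                "Culture=neutral, PublicKeyToken=null");

        Console.WriteLine("Handler was asked for: " +
            string.Join(", ", Requested.ToArray()));
    }
}
```

Parsing the requested name with AssemblyName rather than splitting the string by hand keeps the handler correct whether the CLR passes a simple name or a fully qualified one.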

The Runcocoonm Sample

Now that you've seen how the AssemblyResolve event and the Load(byte[]...) methods work, it's easy to put them together to implement the managed version of the cocoon host. You'll create a delegate to handle the AssemblyResolve event and add it to the default application domain's list of handlers. Then, just as you did in the unmanaged implementation, you'll use the CocoonHostRuntime assembly to invoke the main entry point for the application contained in the cocoon. The CLR will fail to find the assemblies in the cocoon, so it will raise the AssemblyResolve event and your handler will get called. In your handler, you'll use the ResolveEventArgs.Name property to determine which assembly you need to load. You pull that assembly out of the cocoon file as a managed array of bytes and call Assembly.Load.

Because the cocoon files are OLE structured storage files, you need some intermediate layer that reads the assembly using the structured storage interfaces (specifically IStorage and IStream) and then returns the contents as a series of bytes. I've written an unmanaged helper DLL called cocoonreader.dll that performs this task. The code in the runcocoonm.exe program uses the CLR Platform Invoke services to call cocoonreader.dll. The architecture of runcocoonm.exe is shown in Figure 8-10.

Figure 8-10. Runcocoonm architecture

The code for the complete program is shown in the following listings. Listing 8-4 contains the code for cocoonreader.dll, and Listing 8-5 shows the code for runcocoonm.exe.

Listing 8-4. Cocoonreader.dll

//
// Cocoonreader.cpp: Contains utilities used by runcocoonm.exe to read
// assemblies out of OLE structured storage cocoon files.
//

#include "stdafx.h"

// Opens a cocoon file given a name. Each call to CocoonOpenCocoon must
// be matched by a call to CocoonCloseCocoon.
extern "C" __declspec(dllexport) HRESULT CocoonOpenCocoon(
   LPWSTR pszCocoonName, IStorage **pRootStorage)
{
   return StgOpenStorage(pszCocoonName, NULL, STGM_READ | STGM_DIRECT |
      STGM_SHARE_EXCLUSIVE, NULL, 0, pRootStorage);
}

// Closes the cocoon file by releasing the cocoon's root storage.
extern "C" __declspec(dllexport) HRESULT CocoonCloseCocoon(
   IStorage *pRootStorage)
{
   if (pRootStorage) pRootStorage->Release();
   return S_OK;
}

// Opens a stream within a cocoon given a name. Each call to CocoonOpenStream
// must be matched by a call to CocoonCloseStream.
extern "C" __declspec(dllexport) HRESULT CocoonOpenStream(
   IStorage *pRootStorage, LPWSTR pszStreamName, IStream **pStream)
{
   return pRootStorage->OpenStream(pszStreamName, 0, STGM_READ | STGM_DIRECT |
      STGM_SHARE_EXCLUSIVE, 0, pStream);
}

// Closes a stream.
extern "C" __declspec(dllexport) HRESULT CocoonCloseStream(IStream *pStream)
{
   if (pStream) pStream->Release();
   return S_OK;
}

// Returns the size of a stream in bytes.
extern "C" __declspec(dllexport) HRESULT CocoonGetStreamSize(IStream *pStream,
   DWORD *pSize)
{
   assert(pStream);

   // Get the statistics for the stream - which includes the size.
   // STATFLAG_NONAME skips the name string, which the caller would
   // otherwise have to free with CoTaskMemFree.
   STATSTG stats;
   HRESULT hr = pStream->Stat(&stats, STATFLAG_NONAME);
   if (FAILED(hr)) return hr;

   // Return the size.
   *pSize = stats.cbSize.LowPart;
   return S_OK;
}

// Returns the contents of the stream. The caller is responsible
// for allocating and freeing the memory pointed to by pBytes.
extern "C" __declspec(dllexport) HRESULT CocoonGetStreamBytes(
   IStream *pStream, BYTE *pBytes)
{
   assert(pStream);

   // Get the number of bytes to read.
   STATSTG stats;
   HRESULT hr = pStream->Stat(&stats, STATFLAG_NONAME);
   if (FAILED(hr)) return hr;
   DWORD dwSize = stats.cbSize.LowPart;

   // Read from the stream.
   DWORD dwBytesRead = 0;
   hr = pStream->Read(pBytes, dwSize, &dwBytesRead);
   if (FAILED(hr)) return hr;
   assert(dwSize == dwBytesRead);

   return S_OK;
}

Listing 8-5. Runcocoonm.exe

using System;
using System.Runtime.InteropServices;
using System.Reflection;
using System.Threading;
using CocoonRuntime;

namespace RunCocoonM
{
   class CCocoonHost
   {
      // Import the definitions for the helper routines from
      // cocoonreader.dll.
      [DllImport("CocoonReader.dll", CharSet=CharSet.Unicode)]
      public static extern int CocoonOpenCocoon(string cocoonName,
         ref IntPtr pCocoon);

      [DllImport("CocoonReader.dll", CharSet=CharSet.Unicode)]
      public static extern int CocoonCloseCocoon(IntPtr pCocoon);

      [DllImport("CocoonReader.dll", CharSet=CharSet.Unicode)]
      public static extern int CocoonOpenStream(IntPtr pCocoon,
         string streamName, ref IntPtr pStream);

      [DllImport("CocoonReader.dll", CharSet=CharSet.Unicode)]
      public static extern int CocoonCloseStream(IntPtr pStream);

      [DllImport("CocoonReader.dll", CharSet=CharSet.Unicode)]
      public static extern int CocoonGetStreamSize(IntPtr pStream,
         ref int size);

      [DllImport("CocoonReader.dll", CharSet=CharSet.Unicode)]
      public static extern int CocoonGetStreamBytes(IntPtr pStream,
         IntPtr streamBytes);

      static Assembly AssemblyResolveHandler(Object sender, ResolveEventArgs e)
      {
         // Get the name of the assembly you need to resolve from the
         // event args. You want just the simple text name. If the name
         // is fully qualified, you want just the portion before the
         // comma.
         string simpleAssemblyName;
         int commaIndex = e.Name.IndexOf(',');
         if (commaIndex == -1)
            simpleAssemblyName = e.Name;
         else
            simpleAssemblyName = e.Name.Substring(0, commaIndex);

         // Retrieve the cocoon from the application domain property.
         IntPtr pCocoon = (IntPtr) Thread.GetDomain().GetData("Cocoon");

         // Open the stream for the assembly.
         IntPtr pStream = IntPtr.Zero;
         CocoonOpenStream(pCocoon, simpleAssemblyName, ref pStream);

         // Call the helper DLL to get the number of bytes in the stream
         // you're about to read. You need the size so you can allocate
         // the correct number of bytes in the managed array.
         int size = 0;
         CocoonGetStreamSize(pStream, ref size);

         // Allocate enough memory to hold the contents of the entire
         // stream.
         IntPtr pBytes = Marshal.AllocHGlobal(size);

         // Read the assembly from the cocoon.
         CocoonGetStreamBytes(pStream, pBytes);

         // Copy the bytes from unmanaged memory into your managed byte
         // array. You need the bytes in this format to call
         // Assembly.Load.
         byte[] assemblyBytes = new byte[size];
         Marshal.Copy(pBytes, assemblyBytes, 0, size);

         // Free the unmanaged memory.
         Marshal.FreeHGlobal(pBytes);

         // Close the stream.
         CocoonCloseStream(pStream);

         // Load the assembly from the byte array and return it.
         return Assembly.Load(assemblyBytes, null, null);
      }

      static string GetTypeNameString()
      {
         // Retrieve the cocoon from the application domain property.
         IntPtr pCocoon = (IntPtr) Thread.GetDomain().GetData("Cocoon");

         // Open the "_entryPoint" stream.
         IntPtr pStream = IntPtr.Zero;
         CocoonOpenStream(pCocoon, "_entryPoint", ref pStream);

         // Get the size of the stream containing the main type name.
         // You need to know the size so you can allocate the correct
         // amount of space to hold the name.
         int size = 0;
         CocoonGetStreamSize(pStream, ref size);

         // Allocate enough space to hold the main type name.
         IntPtr pBytes = Marshal.AllocHGlobal(size);

         // Read the main type name from the cocoon.
         CocoonGetStreamBytes(pStream, pBytes);

         // Copy the stream's contents from unmanaged memory into a
         // managed character array - then create a string from the
         // character array. The size is in bytes and each character
         // occupies two bytes, so copy size/2 characters; copying
         // size characters would read past the end of the buffer.
         int charCount = size / 2;
         char[] typeNameChars = new char[charCount];
         Marshal.Copy(pBytes, typeNameChars, 0, charCount);
         string typeName = new string(typeNameChars, 0, charCount);

         // Free the unmanaged memory.
         Marshal.FreeHGlobal(pBytes);

         // Close the "_entryPoint" stream.
         CocoonCloseStream(pStream);

         return typeName;
      }

      [STAThread]
      static void Main(string[] args)
      {
         // Make sure the name of a cocoon file was passed on the
         // command line.
         if (args.Length != 1)
         {
            Console.WriteLine("Usage: RunCocoonM <cocoon file>");
            return;
         }

         // Open the cocoon file and store a pointer to it in
         // an application domain property. You need this value in
         // your AssemblyResolve event handler.
         IntPtr pCocoon = IntPtr.Zero;
         CocoonOpenCocoon(args[0], ref pCocoon);
         Thread.GetDomain().SetData("Cocoon", pCocoon);

         // Strip off the .cocoon from the command-line argument to get
         // the name of the assembly within the cocoon that contains the
         // Main method.
         int dotIndex = args[0].IndexOf('.');
         string assemblyName = args[0].Substring(0, dotIndex);

         // Get name of type containing the application's Main method
         // from the cocoon.
         string typeName = CCocoonHost.GetTypeNameString();

         // Set up the delegate for the assembly resolve event.
         Thread.GetDomain().AssemblyResolve += new
            ResolveEventHandler(CCocoonHost.AssemblyResolveHandler);

         // Use CocoonHostRuntime to invoke Main.
         CocoonDomainManager cdm = new CocoonDomainManager();
         cdm.Run(assemblyName, typeName);

         // Close the cocoon file.
         CocoonCloseCocoon(pCocoon);
      }
   }
}

Supporting Multifile Assemblies

Managed programs can support multifile assemblies by handling the Assembly.ModuleResolve event. This event is raised by the CLR if it cannot find one of an assembly's modules at run time, just as the AssemblyResolve event is raised if the file containing the assembly's manifest cannot be found.

Because modules are always part of an assembly, handlers for the ModuleResolve event are always associated with a particular assembly, not with the application domain as the AssemblyResolve event is. To register for the ModuleResolve event, create a delegate of type ModuleResolveEventHandler and add it to the appropriate assembly's list of handlers. The following code shows a handler for the ModuleResolve event being registered for the currently executing assembly:

class ResolveClass
{
   static Module ModuleResolveHandler(Object sender, ResolveEventArgs e)
   {
      // Locate or create the module depending on your scenario and
      // return it.
      Module m = ...
      return m;
   }

   static void Main(string[] args)
   {
      // ...

      // Set up the delegate for the module resolve event.
      Assembly currentAssembly = Assembly.GetExecutingAssembly();
      currentAssembly.ModuleResolve += new
         ModuleResolveEventHandler(ResolveClass.ModuleResolveHandler);

      //...
   }
}

The ModuleResolve event takes as input an object of type ResolveEventArgs and returns an instance of a System.Reflection.Module. ResolveEventArgs is the same type of class that was passed to the AssemblyResolve event. In this case, the Name property contains the string name of the module the CLR could not locate.

The most common way to get an instance of Module to return from the ModuleResolve event is to load one from an array of bytes using the LoadModule method on System.Reflection.Assembly. (The other way to get a module is to create one dynamically using the classes in System.Reflection.Emit.) LoadModule has two overloads, one that enables you to specify debugging information in addition to the module itself and one that does not:

public Module LoadModule(String moduleName,
                         byte[] rawModule)

public Module LoadModule(String moduleName,
                         byte[] rawModule,
                         byte[] rawSymbolStore)

When calling LoadModule, you must pass the name of the module in the moduleName parameter. The CLR uses this string to identify which module is being loaded by checking the name against the list of modules stored in the assembly's manifest. If you pass a moduleName that cannot be found in the manifest, the CLR throws an exception of type System.ArgumentException.
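Putting the event and LoadModule together, a handler can pass the name it receives straight through as the moduleName argument. The following is a sketch under stated assumptions rather than part of the sample: the class ModuleResolveSketch and the "modules" directory are hypothetical, the sender argument is assumed to be the assembly that raised the event, and the code targets the .NET Framework, where multifile assemblies are supported.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical helper used only for illustration.
public static class ModuleResolveSketch
{
    // Hypothetical directory holding the raw module files.
    public static string ModuleDirectory = "modules";

    // Returns the module's bytes, or null if no such file exists.
    public static byte[] ReadModuleBytes(string moduleName)
    {
        string path = Path.Combine(ModuleDirectory, moduleName);
        return File.Exists(path) ? File.ReadAllBytes(path) : null;
    }

    public static Module OnModuleResolve(object sender, ResolveEventArgs e)
    {
        // Assumption: the sender is the assembly whose module is missing.
        Assembly owner = (Assembly)sender;

        byte[] raw = ReadModuleBytes(e.Name);
        if (raw == null)
            return null; // Nothing to offer; the CLR reports the failure.

        // e.Name is the module name the CLR is looking for, so passing it
        // through keeps moduleName consistent with the request.
        return owner.LoadModule(e.Name, raw);
    }

    public static void Main()
    {
        // Register the handler on the assembly whose modules may be missing.
        Assembly current = Assembly.GetExecutingAssembly();
        current.ModuleResolve += OnModuleResolve;

        // ... code that touches types defined in the assembly's other modules.
    }
}
```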