Manifest | Inside Microsoft .NET IL Assembler

Manifest

The metadata that describes an assembly and its modules is referred to as a manifest. The manifest carries the following information:

Identity, including a simple textual name, an assembly version number, an optional culture if the assembly contains localized managed resources, and an optional public key if the assembly is strong-named. This information is defined in two metadata tables: Module and Assembly (in the prime module only).
Contents, including types and managed resources exposed by this assembly for external use and the location of these types and resources. The metadata tables that contain this information are ExportedType (in the prime module only) and ManifestResource.
Dependencies, including other (external) assemblies this assembly references and, in the case of a multimodule assembly, other modules of the same assembly. You can find the dependency information in these metadata tables: AssemblyRef, ModuleRef, and File.
Requested permissions, specific to the assembly as a whole. More specific requested permissions might also be defined for certain types (classes) and methods. This information is defined in the DeclSecurity metadata table. (Chapter 14, “Security Attributes,” describes requested permissions and their declaration.)
Custom attributes, specific to the manifest components. Custom attributes provide additional information used by compilers and other tools. The common language runtime recognizes a limited number of custom attributes. Custom attributes are defined in the CustomAttribute metadata table. (Refer to Chapter 13, “Custom Attributes,” for more information on this topic.)

The diagram in Figure 5-1 shows the mutual references that take place between the metadata tables constituting the manifest.

Figure 5-1 Mutual references between the manifest’s metadata tables.

Assembly Metadata Table and Declaration

The Assembly metadata table contains at most one record, which appears in the prime module’s metadata. The table has the following column structure:

HashAlgId (4-byte unsigned integer) The ID of the hash algorithm used in this assembly to hash the files. The value must be one of the CALG_* values defined in the header file Wincrypt.h. The default hash algorithm is CALG_SHA (a.k.a. CALG_SHA1) (0x8004). ECMA specifications consider this algorithm to be standard, offering the best widely available technology for file hashing.
MajorVersion (2-byte unsigned integer) The major version of the assembly.
MinorVersion (2-byte unsigned integer) The minor version of the assembly.
BuildNumber (2-byte unsigned integer) The build number of the assembly.
RevisionNumber (2-byte unsigned integer) The revision number of the assembly.
Flags (4-byte unsigned integer) Assembly flags indicating limitations on running different versions of this assembly side by side.
PublicKey (offset in the #Blob stream) A binary object representing a public encryption key for a strong-named assembly.
Name (offset in the #Strings stream) The assembly name, which must be nonempty and must not contain a path or a filename extension.
Locale (offset in the #Strings stream) The culture (formerly known as locale) name, such as en-US (American English) or fr-CA (Canadian French). The culture name must match one of hundreds of culture names “known” to the runtime through the .NET Framework class library, but this validity rule is rather meaningless: to use a culture, the specific language support must be installed on the target machine. If the language support is not installed, it doesn’t matter whether the culture is “known” to the runtime.

In ILAsm, the Assembly is declared in the following way:

   .assembly <flags> <name> { <assemblyDecl>* }

where <flags> ::=

   <none>       // No limitations on side-by-side running of the assembly
   noappdomain  // No side-by-side running within one AppDomain
   noprocess    // No side-by-side running within one process
   nomachine    // No side-by-side running on the same machine

and <assemblyDecl> ::=

   .hash algorithm <int32>               // Set hash algorithm ID
   .ver <int32>:<int32>:<int32>:<int32>  // Set version numbers
   .publickey = ( <bytes> )              // Set public encryption key
   .locale <quotedString>                // Set assembly culture
   <securityDecl>                        // Set requested permissions
   <customAttrDecl>                      // Define custom attribute(s)

In this declaration, <int32> denotes an integer number, at most 4 bytes in size. The notation <bytes> represents a sequence of two-digit hexadecimal numbers, each representing 1 byte; this form, bytearray, is often used in ILAsm to represent binary objects of arbitrary size. Finally, <quotedString> denotes, in general, a composite quoted string—that is, a construct such as "ABC"+"DEF"+"GHI". The concatenation with the plus sign is useful for defining very long strings, although in this case we don’t need concatenation for strings such as en-US or nl-BE.
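To make the <bytes> and version notations concrete, here is a minimal Python sketch (not part of ILAsm or of any Microsoft tool) that converts a bytearray such as ( 01 0A FF ) into raw bytes and splits a .ver value into its four components. The function names parse_bytearray and parse_version are hypothetical, and the range check simply mirrors the 2-byte size of the version columns in the Assembly table.

```python
import re

_HEX_BYTE = re.compile(r"[0-9A-Fa-f]{2}")


def parse_bytearray(text):
    """Convert an ILAsm-style byte list such as '( 01 0A FF )' to bytes."""
    inner = text.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    tokens = inner.split()
    for token in tokens:
        # Each element must be exactly two hexadecimal digits (1 byte).
        if not _HEX_BYTE.fullmatch(token):
            raise ValueError("not a two-digit hexadecimal number: %r" % token)
    return bytes(int(token, 16) for token in tokens)


def parse_version(text):
    """Split 'major:minor:build:revision' into a tuple of four integers."""
    parts = text.split(":")
    if len(parts) != 4:
        raise ValueError("a version needs exactly four components")
    numbers = tuple(int(part) for part in parts)
    for number in numbers:
        # The version columns are 2-byte unsigned integers.
        if not 0 <= number <= 0xFFFF:
            raise ValueError("version component out of range: %d" % number)
    return numbers
```

For instance, parse_version("1:3:2:1") yields the tuple (1, 3, 2, 1), and a component such as 70000 is rejected because it does not fit in 2 bytes.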


In addition to the three flags related to side-by-side execution, three more, which are not relevant to the discussion at hand, are available. One indicates whether the assembly holds a full public key. This flag is never set explicitly; rather, it is set when a PublicKey entry is defined. The other two flags, EnableJITcompileTracking and DisableJITcompileOptimizer, are related to the debug mode of the JIT (just-in-time) compiler and are set at module load time.

AssemblyRef Metadata Table and Declaration

The AssemblyRef (assembly reference) metadata table defines the external dependencies of an assembly or a module. Both prime and nonprime modules can—and do, as a rule—contain this table. The only assembly that does not depend on any other assembly, and hence has an empty AssemblyRef table, is Mscorlib.dll, the root assembly of the .NET Framework class library.

The column structure of the AssemblyRef table is as follows:

MajorVersion (2-byte unsigned integer) The major version of the assembly.
MinorVersion (2-byte unsigned integer) The minor version of the assembly.
BuildNumber (2-byte unsigned integer) The build number of the assembly.
RevisionNumber (2-byte unsigned integer) The revision number of the assembly.
Flags (4-byte unsigned integer) Assembly reference flags, which indicate whether the assembly reference holds a full unhashed public key or a “surrogate” (public key token).
PublicKeyOrToken (offset in the #Blob stream) A binary object representing a public encryption key for a strong-named assembly or a token of this key. A key token is an 8-byte representation of a hashed public key.
Name (offset in the #Strings stream) A referenced assembly name, which must be nonempty and must not contain a path or a filename extension.
Locale (offset in the #Strings stream) The culture name.
HashValue (offset in the #Blob stream) A binary object representing a hash of the metadata of the referenced assembly’s prime module. Because this value is ignored by the loader in the first release of the common language runtime, it can safely be omitted.

In ILAsm, an AssemblyRef is declared in the following way:

   .assembly extern <name> { <assemblyRefDecl>* }

where <assemblyRefDecl> ::=

   .ver <int32>:<int32>:<int32>:<int32>  // Set version numbers
   .publickey = ( <bytes> )              // Set public encryption key
   .publickeytoken = ( <bytes> )         // Set public encryption key token
   .locale <quotedString>                // Set assembly locale
   .hash = ( <bytes> )                   // Set hash value
   <customAttrDecl>                      // Define custom attribute(s)

As you might have noticed, ILAsm does not provide a way to set the flags in the AssemblyRef declaration. The explanation is simple: the only flag relevant to an AssemblyRef is the flag indicating whether the AssemblyRef carries a full unhashed public encryption key, and this flag is set only when the .publickey directive is used.

When referencing a strong-named assembly, you are required to specify .publickeytoken (or .publickey, which is rarely used in AssemblyRefs) and .ver. The only exception to this rule among the strong-named assemblies is Mscorlib.dll.
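The text describes a key token only as an 8-byte representation of a hashed public key. The Python sketch below assumes the commonly documented derivation, which this section does not spell out: hash the public key blob with SHA-1, take the last 8 bytes of the digest, and reverse their order. The function names public_key_token and as_ilasm_bytes are hypothetical helpers; the second one prints a blob in the bytearray form expected by .publickeytoken.

```python
import hashlib


def public_key_token(public_key):
    """Derive an 8-byte token from a public key blob.

    Assumption: SHA-1 digest, last 8 bytes, in reversed order.
    """
    digest = hashlib.sha1(public_key).digest()
    return digest[-8:][::-1]


def as_ilasm_bytes(blob):
    """Format a blob as an ILAsm bytearray, for example '( 01 AB )'."""
    return "( " + " ".join("%02X" % byte for byte in blob) + " )"
```

A declaration line could then be assembled as ".publickeytoken = " + as_ilasm_bytes(public_key_token(key)), where key holds the full public key bytes.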

If .locale is not specified, the referenced assembly is presumed to be “culture-neutral.”

An interesting situation arises when we need to use two or more versions of the same assembly side by side. An assembly is identified by its name, version, public key (or its token), and culture. It would be extremely cumbersome to list all these identifications every time we reference an assembly: “I want to call method Bar of class Foo from assembly SomeOtherAssembly, and I want the version number such-and-such, the culture nl-BE, and ...” Of course, if we didn’t need to use different versions side by side, we could simply refer to an assembly by name.

ILAsm provides an AssemblyRef aliasing mechanism to deal with such situations. The AssemblyRef declaration can be extended as shown here:

   .assembly extern <name> as <alias> { <assemblyRefDecl>* }

And whenever we need to reference this assembly, we can use its <alias>, as seen in this example:

   .assembly extern SomeOtherAssembly as OldSomeOther
     { .ver 1:1:1:1 }
   .assembly extern SomeOtherAssembly as NewSomeOther
     { .ver 1:3:2:1 }
       call int32 [OldSomeOther]Foo::Bar(string)
       call int32 [NewSomeOther]Foo::Bar(string)

The alias is not a part of metadata. Rather, it is simply a language tool, needed to identify a particular AssemblyRef among several same-name AssemblyRefs. IL Disassembler generates aliases for AssemblyRefs whenever it finds same-name AssemblyRefs in the module metadata.

The Loader in Search of Assemblies

When we define an AssemblyRef in the metadata, we expect the loader to find exactly this assembly and load it into the application domain. Let’s have a look at the process of finding an external assembly and binding it to the referencing application.

Given an AssemblyRef, the process of binding to that assembly is influenced by these factors:

The application base (AppBase), which is a URL to the referencing application location (that is, to the directory in which your application is located). For executables, this is the directory containing the EXE file. For Web applications, the AppBase is the root directory of the application as defined by the Web server.
Version policies specified by the application, by the publisher of the shared assembly being referenced, or by the administrator.
Any additional search path information given in the application configuration file.
Any code base (CodeBase) locations provided in the configuration files by the application, the publisher, or the administrator. The CodeBase is a URL to the location of the referenced external assembly.
Whether the reference is to a shared assembly with a strong name or to a private assembly.

As illustrated in Figure 5-2, the loader performs the following steps to locate a referenced assembly:

1. Initiate the binding. Basically, this means taking the relevant AssemblyRef record from the metadata and seeing what it holds: its external assembly name, whether it is strong-named, whether culture is specified, and so on.
2. Apply the version policies, which are statements made by the application, by the publisher of the shared assembly being referenced, or by the administrator. These statements are contained in XML configuration files and simply redirect references to a particular version (or set of versions) of an assembly to a different version.

The .NET Framework retrieves its configuration from a set of configuration files. Each file represents settings that have different scopes. For example, the configuration file supplied with the installation of the common language runtime has settings that can affect all applications that use that version of the runtime. The configuration file supplied with an application has settings that affect only that one application.
3. Check the CodeBase. Now that the common language runtime knows which version of the assembly it is looking for, it begins the process of locating it. If the CodeBase has been supplied (in the same XML configuration file), it points the runtime directly at the executable to load; otherwise, the runtime needs to look in the AppBase and the global assembly cache (GAC), as described in step 4. If the executable specified by the CodeBase matches the assembly reference, the process of finding the assembly is complete, and the external assembly can be loaded. In fact, even if the executable specified by the CodeBase does not match the reference, the common language runtime stops searching. In this case, of course, the search is considered a failure, and no assembly load follows.
4. Check the GAC or the AppBase or both. If the CodeBase hasn’t been supplied, the remainder of the process depends on whether the referenced assembly is private or strong-named.

If the reference is to a private assembly, the process probes the AppBase. The probing involves consecutive searching in the directories defined by the AppBase, the private binary path (binpath) from the same XML configuration file, the culture of the referenced assembly, and its name. The AppBase plus directories specified in the binpath form a set of root directories {<root_k>, k = 1, ..., N}. If the AssemblyRef specifies the culture, the search is performed in directories <root_k>/<culture> and then in <root_k>/<culture>/<name>; otherwise, the directories <root_k> and then <root_k>/<name> are searched. When searching for a private assembly, the process ignores the version numbers. If the assembly is not found by probing, the binding fails.

If the assembly is strong-named, the process first looks in the global assembly cache. If the strong-named assembly is not found in the GAC, the process probes the AppBase as just described, and in this case it also checks the version numbers.
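The probing order for the AppBase can be illustrated with a small Python sketch. It is only a model of the directory list described above, built on two assumptions that the text leaves open: the binpath entries are treated as subdirectories of the AppBase, and both candidate directories of one root are tried before moving on to the next root. The function name probe_directories is hypothetical.

```python
def probe_directories(app_base, bin_paths, name, culture=None):
    """List the directories probed for an assembly, in search order."""
    base = app_base.rstrip("/")
    # The AppBase itself plus every binpath entry form the root directories.
    roots = [base] + [base + "/" + path.strip("/") for path in bin_paths]
    candidates = []
    for root in roots:
        # With a culture: <root>/<culture>, then <root>/<culture>/<name>.
        # Without one:    <root>, then <root>/<name>.
        start = root + "/" + culture if culture else root
        candidates.append(start)
        candidates.append(start + "/" + name)
    return candidates
```

Calling probe_directories("C:/App", ["bin"], "Util", "nl-BE") lists C:/App/nl-BE, C:/App/nl-BE/Util, C:/App/bin/nl-BE, and C:/App/bin/nl-BE/Util, in that order.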

Figure 5-2 Searching for a referenced assembly.
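The overall search can be condensed into a short Python sketch. It models only the decision order described in this section and is not the loader's actual code: the dictionaries redirects and code_bases stand in for the XML configuration files, and the callables verify, gac_lookup, and probe are hypothetical placeholders for work the runtime does internally.

```python
def locate_assembly(ref, redirects, code_bases, verify, gac_lookup, probe):
    """Return the location of the referenced assembly, or None on failure.

    ref is a dict with the keys 'name', 'version', and 'strong_named'.
    """
    # Apply the version policies: a reference may be redirected
    # to a different version.
    version = redirects.get((ref["name"], ref["version"]), ref["version"])
    target = dict(ref, version=version)

    # Check the CodeBase: if one is supplied, the search ends here,
    # whether or not the executable matches the reference.
    url = code_bases.get((target["name"], version))
    if url is not None:
        return url if verify(url, target) else None

    # Strong-named assemblies are looked up in the GAC first and then
    # probed with a version check; private ones are probed without it.
    if target["strong_named"]:
        found = gac_lookup(target)
        if found is not None:
            return found
        return probe(target, check_version=True)
    return probe(target, check_version=False)
```

Note how a supplied CodeBase short-circuits everything that follows: a mismatch returns None without ever consulting the GAC or the AppBase.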

Module Metadata Table and Declaration

The Module metadata table contains a single record that provides the identification of the current module. The column structure of the table is as follows:

Generation (2-byte unsigned integer) Used only at run time, in edit-and-continue mode.
Name (offset in the #Strings stream) The module name, which is the same as the name of the executable file with its extension but without a path. The length should not exceed 512 characters, counting the zero terminator.
Mvid (offset in the #GUID stream) A globally unique identifier, assigned to the module as it is generated.
EncId (offset in the #GUID stream) Used only at run time, in edit-and-continue mode.
EncBaseId (offset in the #GUID stream) Used only at run time, in edit-and-continue mode.

Because only one entry of the Module record can be set explicitly (the Name entry), the module declaration in ILAsm is quite simple:

   .module <name>

ModuleRef Metadata Table and Declaration

The ModuleRef metadata table contains descriptors of other modules referenced in the current module. The set of “other modules” includes subsets of both managed and unmanaged modules.

The relevant managed modules are the other modules of the current assembly. In ILAsm, they should be declared explicitly, and their declarations should be paired with File declarations (discussed in the following section).

The unmanaged modules described in the ModuleRef table are simply unmanaged DLLs containing methods called from the current module using the platform invocation mechanism—P/Invoke, discussed in Chapter 15, “Managed and Unmanaged Code Interoperation.” These ModuleRef records should not be paired with File records. They need not be explicitly declared in ILAsm because in ILAsm the DLL name is part of the P/Invoke specification.

A ModuleRef record contains only one entry, the Name entry, which is an offset in the #Strings stream. The ModuleRef declaration in ILAsm is not much more sophisticated than the declaration of Module:

   .module extern <name>

As in the case of Module, <name> in ModuleRef is the name of the executable file with its extension but without a path, not exceeding 512 characters.
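The two naming rules shared by Module and ModuleRef (no path, and at most 512 characters including the zero terminator) are easy to check mechanically. The following Python sketch shows one way to do it; the function name is_valid_module_name is hypothetical, and treating both the forward slash and the backslash as path separators is an assumption.

```python
MAX_NAME_LENGTH = 512  # counts the zero terminator


def is_valid_module_name(name):
    """Check a module name against the rules given for Module and ModuleRef."""
    # The name is a bare filename with its extension: no path component.
    if "/" in name or "\\" in name:
        return False
    # One character of the limit is reserved for the zero terminator.
    return len(name) + 1 <= MAX_NAME_LENGTH
```

A name such as MyModule.dll passes, whereas bin\MyModule.dll or a name of 512 characters does not.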

File Metadata Table and Declaration

The File metadata table describes other files of the same assembly that are referenced in the current module. In single-module assemblies, this table is empty. The table has the following column structure:

Flags (4-byte unsigned integer) Binary flags characterizing the file. In this version, this entry is mostly reserved for future use; the only flag currently defined is File contains no metadata (0x00000001). This flag indicates that the file in question is not a managed PE file but rather a pure resource file.
Name (offset in the #Strings stream) The filename, subject to the same rules as the names in Module and ModuleRef. This is the only occurrence of data duplication in the metadata model: the File name matches the name used in the ModuleRef with which this File record is paired. However, because the names in both records are not physical strings but rather offsets in the string heap, the data might not actually be duplicated; instead, both records might reference the same string in the heap.
HashValue (offset in the #Blob stream) The blob representing the hash of the file, used to authenticate the files in a multifile assembly. Even in a strong-named assembly, the strong name signature resides only in the prime module and covers only the prime module. Nonprime modules in an assembly are authenticated by their hash values.

The File declaration in ILAsm looks like the following:

   .file <flag> <name>  .hash = ( <bytes> )

where <flag> ::=

   <none>        // The file is a managed PE file
   nometadata    // The file is a pure resource file

If the hash value is not explicitly specified, the ILAsm compiler finds the named file and computes the hash value using the hash algorithm specified in the Assembly declaration.
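If you prefer to supply the hash yourself, the value can be precomputed outside the assembler. The Python sketch below assumes the default algorithm, CALG_SHA1, applied to the complete contents of the file, and emits a File declaration in the form shown above. The function name file_declaration is hypothetical; in practice the contents would be read from the file on disk.

```python
import hashlib


def file_declaration(name, contents, nometadata=False):
    """Build a .file declaration with an explicit SHA-1 hash value."""
    # Assumption: the hash covers the complete file contents.
    digest = hashlib.sha1(contents).digest()
    hex_bytes = " ".join("%02X" % byte for byte in digest)
    flag = "nometadata " if nometadata else ""
    return ".file %s%s .hash = ( %s )" % (flag, name, hex_bytes)
```

If the Assembly declaration specifies a different hash algorithm, the hashlib call has to change accordingly.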

The File declaration can also have a .entrypoint clause, as shown in this example:

   .file MainClass.dll .hash = ( 01 02 03 04 05 06 ... ) .entrypoint

This sort of File declaration can occur only in the prime module and only when the entry point method is defined in a nonprime module of the assembly. This clause of the File declaration does not affect the metadata, but it puts the appropriate file token in the EntryPointToken entry of the common language runtime header. See Chapter 3, “The Structure of a Managed Executable File,” for details about EntryPointToken and the runtime header.

The prime module of an assembly, especially a runnable application (EXE), must have a valid token in the EntryPointToken field of the common language runtime header; and this token must be either a Method token, if the entry point method is defined in the prime module, or a File token. In the latter case, the loader loads the relevant module and inspects its common language runtime header, which must contain a valid Method token in the EntryPointToken field.
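The check on EntryPointToken can be sketched in Python. The sketch relies on the standard layout of metadata tokens, which is not restated in this section: the most significant byte identifies the table (0x06 for Method, 0x26 for File), and the remaining three bytes hold the record index. The function name classify_entry_point is hypothetical.

```python
METHOD_TABLE = 0x06  # token type of a Method record
FILE_TABLE = 0x26    # token type of a File record


def classify_entry_point(token):
    """Tell whether an EntryPointToken refers to a method or to a file."""
    table = (token >> 24) & 0xFF
    record_index = token & 0x00FFFFFF
    if record_index == 0:
        raise ValueError("nil token: no entry point is specified")
    if table == METHOD_TABLE:
        return "method"  # entry point is defined in the prime module
    if table == FILE_TABLE:
        return "file"    # entry point is defined in a nonprime module
    raise ValueError("EntryPointToken must be a Method or File token")
```

When the result is "file", the same check applies once more to the runtime header of the referenced module, where only a Method token is acceptable.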

Managed Resource Metadata and Declaration

A resource is any nonexecutable data that is logically deployed as a part of an application. The data can take any number of forms such as strings, images, persisted objects, and so on. As Chapter 3 described, resources can be either managed or unmanaged (platform-specific). These two kinds of resources have different formats and are accessed using managed and unmanaged APIs, respectively.

An application often must be customized for different cultures. A culture is a set of preferences based on a user’s language, sublanguage, and cultural conventions. In the .NET Framework, the culture is described by the CultureInfo class from the .NET Framework class library. A culture is used to customize operations such as formatting dates and numbers, sorting strings, and so on.

You might also need to customize an application for different countries or regions. A region defines a set of standards for a particular country or region of the world. In the .NET Framework, the class library describes a region using the RegionInfo class. A region is used to customize operations such as formatting currency symbols.

Localization of an application is the process of combining the application’s executable code with the application’s resources that have been customized for specific cultures. Although a culture and a region together constitute a locale, localization is not concerned with customizing an application to a specific region. The .NET Framework and the common language runtime do not support localization of component metadata, instead relying solely on the managed resources for this task.

The .NET Framework uses a hub-and-spoke model for packaging and deploying resources. The hub is the main assembly, which contains the nonlocalizable executable code and the resources for a single culture (referred to as the neutral culture). The neutral culture is the fallback culture for the application. Each spoke connects to a satellite assembly that contains the resources for a single culture. Satellite assemblies do not contain code.

The advantages of this model are obvious. First, resources for new cultures can be added incrementally after an application is deployed. Second, an application needs to load only those satellite assemblies that contain the resources needed for a particular run.

The resources used in or exposed by an assembly can reside in one of the following locations:

In separate resource file(s) in the same assembly. Each resource file can contain one or more resources. The metadata descriptors of such files carry the nometadata flag.
Embedded in managed modules of the same assembly.
In another (external) assembly.

Because the resource data is not directly used or validated by the deployment subsystem or the loader, it can be of any kind.

All resource data embedded in a managed PE file resides in a contiguous block inside the .text section. The Resources data directory in the common language runtime header provides the relative virtual address (RVA) and size of embedded managed resources. Each individual resource is preceded by a 4-byte unsigned integer holding the resource’s length in bytes. Figure 5-3 shows the layout of embedded managed resources.

Figure 5-3 The layout of embedded managed resources.
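The length-prefixed layout lends itself to a short Python sketch. It assumes that the 4-byte length is stored in little-endian order, as is usual in PE files, and it ignores any alignment padding between resources; pack_resources and read_resource are hypothetical helper names.

```python
import struct


def pack_resources(blobs):
    """Lay out resources back to back, each preceded by its 4-byte length."""
    segment = bytearray()
    offsets = []
    for blob in blobs:
        offsets.append(len(segment))  # offset within the segment, not an RVA
        segment += struct.pack("<I", len(blob))
        segment += blob
    return bytes(segment), offsets


def read_resource(segment, offset):
    """Return the resource data that starts at the given segment offset."""
    (length,) = struct.unpack_from("<I", segment, offset)
    start = offset + 4
    end = start + length
    if end > len(segment):
        raise ValueError("resource runs past the end of the segment")
    return segment[start:end]
```

Packing the two resources b"abc" and b"hello" places them at offsets 0 and 7, and read_resource(segment, 7) returns b"hello".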

The ManifestResource metadata table, describing the managed resources, has the following column structure:

Offset (4-byte unsigned integer) Location of the resource within the managed resource segment to which the Resources data directory of the common language runtime header points. This is not an RVA; rather, it is an offset within the managed resource segment.
Flags (4-byte unsigned integer) Binary flags indicating whether the managed resource is public (accessible from outside the assembly) or private (accessible from within the current assembly only).
Name (offset in the #Strings stream) Nonempty name of the resource, unique within the assembly.
Implementation (coded token of type Implementation) Token of the respective AssemblyRef record if the resource resides in another assembly or of the respective File record if the resource resides in another file of the current assembly. If the resource is embedded in the current module, this entry is set to 0. If the resource is imported from another assembly, the offset need not be specified; the loader will ignore it.

ILAsm syntax for the declaration of a managed resource is as follows:

   .mresource <flag> <name> { <mResourceDecl>* }

where <flag> ::= public | private and <mResourceDecl> ::=

   .assembly extern <alias>   // Resource is imported from another
                              // assembly
   .file <name> at <int32>    // Resource resides in another
                              // file of this assembly;
                              // <int32> is the offset
   <customAttrDecl>           // Define custom attribute for this resource

The default flag value is private.

The directives .assembly extern and .file in the context of a managed resource declaration refer to the resource’s Implementation entry and are mutually exclusive. If Implementation references the AssemblyRef or File before it has been declared, the ILAsm compiler will diagnose an error.

If the Implementation entry is empty, the resource is presumed embedded in the current module. In this case, the ILAsm compiler creates the PE file, loads the resource from the file according to the resource’s name, and writes it into the .text section of the PE file, automatically setting the Offset entry of the ManifestResource record. When the IL Disassembler disassembles a PE file into a text file, the embedded managed resources are saved into binary files named after these resources, which allows the ILAsm compiler to easily pick them up if the PE file needs to be reassembled.

ILAsm does not offer any language constructs to address the managed resources because IL lacks the means to do so. Managed APIs provided by the .NET Framework class library—specifically, the System.Resources.ResourceManager class—are used to load and manipulate managed resources.

ExportedType Metadata Table and Declaration

The ExportedType metadata table contains information about the public classes (visible outside the assembly) that are declared in nonprime modules of the assembly. Only the prime module’s manifest can carry this table.

This table is needed because the loader expects the prime module of an assembly to hold information about all classes exported by the assembly. The union of the classes defined in the prime module and those in the ExportedType table gives the loader the full picture.

On the other hand, the intersection of the classes defined in the prime module and those in the ExportedType table must be nil. As a result, the ExportedType table can be nonempty only in the prime module of a multimodule assembly.
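The union and intersection rules can be stated in a few lines of Python. This is merely an illustration with sets of full class names, not a model of how the loader stores its data; the function name exported_classes is hypothetical.

```python
def exported_classes(prime_module_classes, exported_type_names):
    """Combine both sources into the full set of classes the assembly exports."""
    prime = set(prime_module_classes)
    exported = set(exported_type_names)
    # A class is either defined in the prime module or listed in the
    # ExportedType table, never both.
    overlap = prime & exported
    if overlap:
        raise ValueError("listed in both places: %s" % sorted(overlap))
    return prime | exported
```

For a single-module assembly the second argument is simply empty, and the result equals the set of public classes of the prime module.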

The ExportedType table has the following column structure:

Flags (4-byte unsigned integer) Binary flags indicating accessibility of the exported type. The flags we are interested in are public and nested public; other accessibility flags—identical to the class accessibility flags discussed in Chapter 6, “Namespaces and Classes,”—are syntactically admissible but are not used to define true exported types. Other flags can be present in pseudo-ExportedTypes only, which the loader can use to resolve unscoped type references in multimodule assemblies.

Some explanation is in order. Any time a type (class) is referenced in a module, the resolution scope should be provided to indicate where the referenced class is defined (in the current module, in another module of this assembly, or in another assembly). If the resolution scope is not provided, the referenced type should be declared in the current module. However, if this type cannot be found in the module referencing it, and if the manifest of the prime module carries a same-name pseudo-ExportedType record indicating where the type is actually defined, the loader is nevertheless able to resolve the type reference. None of the current Microsoft managed compilers, including the ILAsm compiler, uses this rather bizarre technique.
TypeDefId (4-byte unsigned integer) An uncoded token referring to a record of the TypeDef table of the module where the exported class is defined. This is the only occasion in the entire metadata model in which a module’s metadata contains an explicit value of a metadata token from another module. This token is used as something of a hint for the loader and can be omitted without any ill effects. If the token is supplied, the loader retrieves the specific TypeDef record from the respective module’s metadata and checks the full name of ExportedType against the full name of TypeDef. If the names match, the loader has found the class it was looking for; if the names do not match, or if the token was not supplied in the first place, the loader finds the needed TypeDef by its full name. My advice: never specify a TypeDefId token explicitly when programming in ILAsm. This shortcut works only for automatic tools such as the Assembly Linker (AL) and only under certain circumstances.
TypeName (offset in the #Strings stream) Exported type’s name; must be nonempty.
TypeNamespace (offset in the #Strings stream) Exported type’s namespace; can be empty. Class names and namespaces are discussed in Chapter 6.
Implementation (coded token of type Implementation) Token of the File record indicating the file of the assembly where the exported class is defined or the token of another ExportedType, if the current one is nested in another one.

The exported types are declared in ILAsm as follows:

   .class extern <flag> <namespace>.<name> { <expTypeDecl>* }

where <flag> ::= public | nested public and <expTypeDecl> ::=

   .file <name>                      // File where exported class is defined
   .class extern <namespace>.<name>  // Enclosing exported type
   .class <int32>                    // Set TypeDefId explicitly
   <customAttrDecl>                  // Define custom attribute for this ExportedType

The directives .file and .class extern define the Implementation entry and are mutually exclusive. As in the case of the .mresource declaration, the File or ExportedType must be declared before being referenced by the Implementation entry.

It is fairly obvious that if Implementation is specified as .class extern, we are dealing with a nested exported type, and Flags must be set to nested public. Conversely, if Implementation is specified as .file, we are dealing with a top-level unnested class, and Flags must be set to public.
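This pairing of Flags and Implementation can be expressed as a tiny consistency check. The Python sketch below uses the ILAsm keywords as plain strings; the function name is_consistent_exported_type is hypothetical, and the check covers only true exported types, not the pseudo-ExportedTypes mentioned earlier.

```python
# Which flag each kind of Implementation requires for a true exported type.
REQUIRED_FLAG = {
    ".file": "public",                 # top-level class defined in a file
    ".class extern": "nested public",  # class nested in another exported type
}


def is_consistent_exported_type(flag, implementation):
    """Check that the accessibility flag agrees with the Implementation kind."""
    if implementation not in REQUIRED_FLAG:
        raise ValueError("Implementation must be .file or .class extern")
    return REQUIRED_FLAG[implementation] == flag
```

Thus the combination of nested public with .file is reported as inconsistent, as is public with .class extern.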