Metadata

Metadata about the overall assembly and modules is called the manifest. Some of the macro information assembled in the manifest includes the simple name, version number, external references, module name, and public key of the assembly. A portion of the manifest is created from the assembly attributes found in the AssemblyInfo.cs file of a Microsoft Visual Studio .NET C# project. This is a partial listing of a typical AssemblyInfo.cs file:

 using System.Reflection;
 using System.Runtime.CompilerServices;
 using System.Runtime.InteropServices;

 // General Information about an assembly is controlled through the following
 // set of attributes. Change these attribute values to modify the information
 // associated with an assembly.
 [assembly: AssemblyTitle("WindowsApplication4")]
 [assembly: AssemblyDescription("")]
 [assembly: AssemblyConfiguration("")]
 [assembly: AssemblyCompany("")]
 [assembly: AssemblyProduct("WindowsApplication4")]
 [assembly: AssemblyCopyright("Copyright © 2005")]
 [assembly: AssemblyTrademark("")]
 [assembly: AssemblyCulture("")]
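Some of the manifest information described above can be read back at run time through reflection. The following is a minimal sketch, not part of the original listing; the ManifestViewer class and its Describe method are hypothetical names chosen for illustration.

```csharp
using System;
using System.Reflection;

class ManifestViewer {
    // Formats the simple name and version recorded in the manifest.
    public static string Describe(Assembly assembly) {
        AssemblyName name = assembly.GetName();
        return name.Name + ", Version=" + name.Version;
    }

    static void Main() {
        Assembly assembly = typeof(ManifestViewer).Assembly;
        Console.WriteLine(Describe(assembly));

        // The token is empty (or null) when the assembly is not strong-named.
        byte[] token = assembly.GetName().GetPublicKeyToken();
        Console.WriteLine("Public key token bytes: {0}",
            token == null ? 0 : token.Length);

        // External references recorded in the manifest.
        foreach (AssemblyName reference in assembly.GetReferencedAssemblies()) {
            Console.WriteLine("References: {0}", reference.FullName);
        }

        // Modules that make up the assembly.
        foreach (Module module in assembly.GetModules()) {
            Console.WriteLine("Module: {0}", module.Name);
        }
    }
}
```

The version printed depends on the AssemblyVersion attribute, if any, that the project supplies.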

Metadata also chronicles the microdata of the assembly, such as types, methods, and attributes. Metadata paints a portrait of each type, including the type name, methods of the type, parameters of each method of the type, each field of the type, and further details related to the loading and executing of that type at run time. Types are probably the most important construct in a .NET application, and metadata about types is used throughout the life cycle of a managed application. Here are a couple of examples. At startup, metadata is used to identify the entry point method where the program starts executing. During program execution, when a class is first touched, an EECLASS is built ostensibly from metadata to represent that type to the just-in-time compiler. The EECLASS is an important component of the just-in-time process. The EECLASS is further described in Chapter 13, "Advanced Debugging."
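The portrait of a type that metadata paints can be viewed through reflection. The sketch below is an illustration under assumed names: Sample and TypePortrait are hypothetical classes, and reflection is only one of several ways to read this information.

```csharp
using System;
using System.Reflection;

class Sample {
    private int m_Count = 0;
    public int Add(int left, int right) { return left + right + m_Count; }
}

class TypePortrait {
    // Counts the parameters that metadata records for a public method.
    public static int ParameterCount(Type type, string methodName) {
        MethodInfo method = type.GetMethod(methodName);
        return method.GetParameters().Length;
    }

    static void Main() {
        Type type = typeof(Sample);
        BindingFlags flags = BindingFlags.Instance | BindingFlags.Public |
            BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

        Console.WriteLine("Type: {0}", type.FullName);
        foreach (FieldInfo field in type.GetFields(flags)) {
            Console.WriteLine("  Field: {0} {1}", field.FieldType.Name, field.Name);
        }
        foreach (MethodInfo method in type.GetMethods(flags)) {
            Console.WriteLine("  Method: {0}", method.Name);
            foreach (ParameterInfo parameter in method.GetParameters()) {
                Console.WriteLine("    Parameter: {0} {1}",
                    parameter.ParameterType.Name, parameter.Name);
            }
        }

        // The entry point is also recorded in metadata; it is null for a library.
        MethodInfo entry = typeof(TypePortrait).Assembly.EntryPoint;
        if (entry != null) {
            Console.WriteLine("Entry point: {0}.{1}",
                entry.DeclaringType.FullName, entry.Name);
        }
    }
}
```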

To extend either manifest or type-related metadata, employ attributes. Attributes are the adjectives of a managed application and extend the description of an assembly, class, method, field, or other target. Attributes are recorded as metadata and extend the axiomatic metadata of an assembly. In addition, the Microsoft .NET Framework class library (FCL) offers predefined-custom and pseudo-custom attributes. Obsolete and StructLayout attributes are examples of predefined attributes. Serializable is an example of a pseudo-custom attribute. The Obsolete attribute marks an entity as deprecated, whereas the StructLayout attribute stipulates the memory layout of fields in the context of unmanaged memory. The latter attribute is essential when passing a managed type to an unmanaged function or application programming interface (API). You can augment the predefined attributes with programmer-defined custom attributes, where the limit is only your imagination. Applying a version number to a class, assigning the name of the responsible developer or team to a class, and identifying design documents used to architect an application are some ways to exploit custom attributes.
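As an illustration of the ideas above, the following sketch applies the Obsolete and StructLayout attributes and defines one programmer-defined custom attribute. OwnerAttribute, Order, Point, and AttributeDemo are hypothetical names invented for this example; they do not come from the FCL.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical custom attribute naming the developer or team responsible for a class.
[AttributeUsage(AttributeTargets.Class)]
class OwnerAttribute : Attribute {
    private readonly string m_Name;
    public OwnerAttribute(string name) { m_Name = name; }
    public string Name { get { return m_Name; } }
}

// Requests that fields be laid out in declaration order for unmanaged callers.
[StructLayout(LayoutKind.Sequential)]
struct Point {
    public int X;
    public int Y;
}

[Owner("Team A")]
class Order {
    [Obsolete("Use Submit instead.")]
    public void Send() { }
    public void Submit() { }
}

class AttributeDemo {
    // Returns the owner recorded on a type, or null when none is present.
    public static string OwnerOf(Type type) {
        object[] attributes = type.GetCustomAttributes(typeof(OwnerAttribute), false);
        return attributes.Length == 0 ? null : ((OwnerAttribute)attributes[0]).Name;
    }

    // Reports whether a public method carries the Obsolete attribute.
    public static bool IsObsolete(Type type, string methodName) {
        return type.GetMethod(methodName).IsDefined(typeof(ObsoleteAttribute), false);
    }

    static void Main() {
        Point point = new Point();
        point.X = 3;
        point.Y = 4;
        Console.WriteLine("Point ({0}, {1}) sequential: {2}",
            point.X, point.Y, typeof(Point).IsLayoutSequential);
        Console.WriteLine("Owner of Order: {0}", OwnerOf(typeof(Order)));
        Console.WriteLine("Send obsolete: {0}", IsObsolete(typeof(Order), "Send"));
    }
}
```

The C# compiler already lays out structs sequentially by default, so the StructLayout attribute here mainly documents intent.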

Metadata persisted to an assembly is organized as a nonhierarchical but relational database of cross-referencing tables. The metadata database has many tables that can, and often do, reference each other. However, no parent-child relationship between tables is ever implied. Each categorization of data is maintained in a separate table, such as the TypeDef and MethodDef tables. Types alone are stored in the TypeDef table. Each record of the TypeDef table represents a type. If there were six types in the assembly, there would be six records or rows in the TypeDef table. Methods of all types are stored in the MethodDef table. Each row of the MethodDef table describes a method. The TypeDef table references the MethodDef table to link types to member methods. The MethodList column of the TypeDef table has record indexes (RIDs) into the MethodDef table. Extending this model, the MethodDef table has a ParamList column, which is an index to the method's parameters in the Param table.
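The link between the TypeDef and MethodDef tables can be sketched with a toy in-memory model. This is not how the run time stores metadata; TypeDefRow and TableModel are hypothetical names, and real rows carry many more columns. The sketch assumes the convention from the CLI specification that a type's methods form a contiguous run starting at its MethodList RID and ending just before the MethodList RID of the next TypeDef row.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in for a TypeDef row.
class TypeDefRow {
    public string Name;
    public int MethodList;   // one-based RID of the type's first MethodDef row

    public TypeDefRow(string name, int methodList) {
        Name = name;
        MethodList = methodList;
    }
}

class TableModel {
    // Returns the method names owned by the type at the given one-based RID.
    public static List<string> MethodsOf(TypeDefRow[] typeDefs,
                                         string[] methodDefs, int typeRid) {
        int first = typeDefs[typeRid - 1].MethodList;
        // The run ends at the next type's MethodList, or at the end of MethodDef.
        int end = typeRid < typeDefs.Length
            ? typeDefs[typeRid].MethodList
            : methodDefs.Length + 1;

        List<string> names = new List<string>();
        for (int rid = first; rid < end; rid++) {
            names.Add(methodDefs[rid - 1]);   // RIDs are one-based
        }
        return names;
    }

    static void Main() {
        string[] methodDefs = { "Main", "DisplayCreateTime", "get_Time" };
        TypeDefRow[] typeDefs = {
            new TypeDefRow("Starter", 1),
            new TypeDefRow("ZClass", 2)
        };
        foreach (string name in MethodsOf(typeDefs, methodDefs, 2)) {
            Console.WriteLine("ZClass method: {0}", name);
        }
    }
}
```

The ParamList column of the MethodDef table can be modeled the same way against the Param table.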

Metadata tables are assigned unique table identifiers, which are 1-byte unsigned integers. For example, the table identifier for the TypeDef table is 2, whereas 6 identifies the MethodDef table. Metadata tables reserved for the run time are not published and not assigned an external table identifier for the RID. Table 10-1 lists some of the popular metadata tables.

Table 10-1: Metadata Tables
Table Name	Table ID	Table Description
Assembly	0x20	Data related to assembly
Field	0x04	Fields (data member) of types
MethodDef	0x06	Methods (member functions) of types
NestedClass	0x29	Type definitions for nested types
Param	0x08	Parameters of methods
Property	0x17	Properties of types
TypeDef	0x02	Type definitions of types in current assembly
TypeRef	0x01	References to types defined outside this module

Metadata tables are collections of records and columns. A metadata table contains a certain type of data, and each record is an instance of that type. Columns represent specific data on each instance, and each column contains a constant or an index. An index references another table or a heap; a metadata token is one example of an index. Metadata tokens are used as metadata pointers, allowing tables to cross-reference each other. Metadata tables can be optimized (compressed) or not optimized. For the purpose of this book, it is assumed that metadata is optimized. Metadata that is not optimized requires intermediate tables for ordered access between tables.

Tokens

Metadata tokens cross-reference other metadata tables and heaps. Tokens are 4-byte unsigned integers and a combination of the table identifier and RID. As shown in Figure 10-1, the high byte is the table identifier, and the lower 3 bytes are the RID. A token into the Field table might be 04000002. The token refers to the second row of the Field table. The RID is one-based, not zero-based. Because tokens are padded with zeros, the run time might optimize them. Metadata tokens are probably the most public manifestation of metadata. You will repeatedly see metadata tokens over the next few chapters.
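The token layout can be checked with a little bit arithmetic. The sketch below splits the sample Field token into its table identifier and RID, and then prints tokens that reflection reports through the MetadataToken property. TokenDemo and its members are hypothetical names used only for this illustration.

```csharp
using System;
using System.Reflection;

class TokenDemo {
    private int m_First = 1;
    private int m_Second = 2;

    public int Sum() { return m_First + m_Second; }

    // High byte of a token: the table identifier.
    public static int TableId(int token) {
        return (token >> 24) & 0xFF;
    }

    // Lower three bytes of a token: the one-based RID.
    public static int Rid(int token) {
        return token & 0x00FFFFFF;
    }

    static void Main() {
        int sample = 0x04000002;   // second row of the Field table
        Console.WriteLine("Table 0x{0:X2}, RID {1}", TableId(sample), Rid(sample));

        // Tokens assigned to a type and its fields in this assembly.
        Type type = typeof(TokenDemo);
        Console.WriteLine("{0}: 0x{1:X8}", type.Name, type.MetadataToken);
        BindingFlags flags = BindingFlags.Instance | BindingFlags.NonPublic;
        foreach (FieldInfo field in type.GetFields(flags)) {
            Console.WriteLine("{0}: 0x{1:X8}", field.Name, field.MetadataToken);
        }
    }
}
```

The first line of output is "Table 0x04, RID 2". The remaining tokens depend on how the compiler orders rows, but the type token begins with 02 (TypeDef) and the field tokens begin with 04 (Field).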

Figure 10-1: Layout of a metadata token

In addition to other tables, metadata tables reference metadata heaps. Records of metadata tables hold fixed-length metadata information. Variable-length data is stored in one of the metadata heaps. Method signatures are variable length and typical of content found on the Blob heap.

Metadata Heaps

The four metadata heaps are as follows: String, Userstring, Blob, and GUID.

The String heap is an array of null-terminated strings. Namespace, type, field, and method names, as well as other identifiers, are stored on the String heap.
User-defined strings are not placed on the String heap but instead reside on the Userstring heap. Unlike the String heap, its entries are length-prefixed Unicode (UTF-16) strings rather than null-terminated strings. String literals from your program are stored on this heap.
The Blob heap is a binary heap composed of length-prefixed data, such as default values, method signatures, and field signatures.
The GUID heap is an array of globally unique identifiers (GUIDs). Yes, this is obvious. You might remember GUIDs from COM as 16-byte unique identifiers assigned to almost everything—most notably, class identifiers (CLSIDs) are assigned to class factories. What kind of GUID is stored on the GUID heap? The GUID heap contains module version identifiers (MVIDs).
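The MVID of a module can be read through reflection with the Module.ModuleVersionId property. The following is a small sketch; MvidDemo and GetMvid are hypothetical names.

```csharp
using System;
using System.Reflection;

class MvidDemo {
    // Returns the module version identifier of the module that defines the type.
    public static Guid GetMvid(Type type) {
        return type.Module.ModuleVersionId;
    }

    static void Main() {
        Console.WriteLine("MVID: {0}", GetMvid(typeof(MvidDemo)));
    }
}
```

The value printed differs from one build to the next, so it is useful for telling apart two compilations of the same source.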

Streams

Physically, metadata tables and heaps are persisted in streams as part of an assembly. Six possible streams, including streams for each metadata heap, are available in .NET. There are also two mutually exclusive streams, optimized and nonoptimized, which are reservoirs of metadata tables. Metadata tables are optimized or not optimized. There is no concept of partially optimized metadata tables. If the metadata tables are optimized, the optimized stream is present. Otherwise, the nonoptimized stream is available. Therefore, a managed application has at most five streams. Table 10-2 provides a complete list of the metadata streams.

Table 10-2: Metadata Streams
Name	Description
#~	Optimized or compressed metadata tables
#-	Nonoptimized metadata tables
#Blob	Physical repository of the blob heap
#GUID	Physical repository of the GUID heap
#Strings	Physical repository of the String heap
#US	Physical repository of the Userstring heap

Metadata Validation

Managed execution is largely dependent on metadata. Improperly formed metadata could cause a managed application to fail unceremoniously. An assembly with bad metadata is like a house built on quicksand. Loading a class, just-in-time compilation, code access security, and other run-time operations depend on robust data. Metadata validation tests the correctness of metadata and is enacted preemptively, preventing applications with inferior metadata from being executed. Preventing application crashes manifested by improper metadata enforces code isolation.

Several tests are performed to validate metadata. Here is a short list:

Cross-references between tables are validated.
Offsets into metadata heaps are validated.
Metadata tables must have a valid number of rows. For example, the Assembly table is allowed one row.
Metadata tables cannot have duplicate rows.
Several more tests are enacted to certify metadata.
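The flavor of these tests can be conveyed with a toy sketch. The checks below are deliberately simplified and are not the rules the run time or PEVerify actually applies (real validation has exceptions, such as null references). MetadataChecks and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

class MetadataChecks {
    // A cross-reference is plausible when the one-based RID falls inside the target table.
    public static bool IsValidRid(int rid, int rowCount) {
        return rid >= 1 && rid <= rowCount;
    }

    // A heap offset is plausible when it falls inside the heap.
    public static bool IsValidHeapOffset(int offset, int heapSize) {
        return offset >= 0 && offset < heapSize;
    }

    // Reports whether any row appears more than once.
    public static bool HasDuplicates(string[] rows) {
        Dictionary<string, bool> seen = new Dictionary<string, bool>();
        foreach (string row in rows) {
            if (seen.ContainsKey(row)) {
                return true;
            }
            seen[row] = true;
        }
        return false;
    }

    static void Main() {
        string[] typeRows = { "Starter", "ZClass", "ZStruct" };
        Console.WriteLine("RID 4 valid in a 3-row table: {0}",
            IsValidRid(4, typeRows.Length));
        Console.WriteLine("Offset 10 valid in a 64-byte heap: {0}",
            IsValidHeapOffset(10, 64));
        Console.WriteLine("Duplicate rows: {0}", HasDuplicates(typeRows));
    }
}
```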

Developers can request metadata validation on demand with the PEVerify and Intermediate Language Disassembler (ILDASM) tools. Both tools are included in the .NET Framework software development kit (SDK).

PEVerify submits an assembly for metadata validation and Microsoft intermediate language (MSIL) verification and then reports the results. (MSIL verification is discussed in Chapter 11, "MSIL Programming.") This is the basic syntax for PEVerify:

PEVerify assemblyname

PEVerify validates the metadata of assemblyname. If metadata validation is successful, MSIL verification is applied next. MSIL verification is skipped if the metadata validation fails. If validation fails, execution is not viable. This removes a compelling reason to conduct MSIL verification. PEVerify offers a variety of optional arguments, including the capability to force MSIL verification even when the metadata validation fails.

Table 10-3 lists some of the PEVerify arguments.

Table 10-3: PEVerify Options
Argument	Description
/break=errorcount	Aborts verification when errors exceed errorcount.
/clock	Collects data and reports duration of verification and validation tests.
/help	Help information on parameters.
/ignore=errorcode1, errorcode2, errorcoden	Ignores listed error codes.
/il	Conducts MSIL verification. When you use this command, if metadata validation is also required it must be requested explicitly.
/md	Conducts metadata validation. If MSIL verification and metadata validation are jointly desired, MSIL verification should be requested explicitly.
/?	Same as the /help argument.

The following is a simple Hello World application, which is compiled to hello.exe. It is a minimal application, in which not much can go wrong. PEVerify will confirm this.

 using System;

 class Starter {
     static void Main() {
         Console.WriteLine("Hello, World!");
     }
 }

The following shows the result of running PEVerify on hello.exe with the /il and /clock options. Because the /md option is omitted, metadata validation is skipped.

 c:\>peverify /il /clock hello.exe

 All Classes and Methods in hello.exe Verified.
 Timing: Total run     125 msec
         IL Ver.cycle  125 msec
         IL Ver.pure   93 msec

The cycle and pure times for MSIL verification are listed. Pure time is the duration of the verification itself, whereas cycle time also includes the startup and shutdown processes.

ILDASM is a .NET tool that performs validation and can browse and display the metadata of an assembly. ILDASM inspects an assembly using reflection and presents the results in a window, console, or file.

ILDASM Tool

ILDASM, which is a .NET disassembler and metadata browser, is a popular tool for developers. It proffers an internal representation of an assembly, which includes the metadata and MSIL code of an assembly in a variety of formats. ILDASM exercises reflection to inspect an assembly. The core syntax of ILDASM requires only an assembly name, which opens ILDASM and displays the metadata of the assembly:

ildasm assemblyname

The following simple application is a basic .NET application that references a library. The simple application has a ZClass and ZStruct type, whereas the dynamic-link library (DLL) publishes the YClass type.

 using System;

 namespace Donis.CSharpBook {

     interface IA {
     }

     struct ZStruct {
     }

     class Starter {
         public static void Main() {
             YClass obj1 = new YClass();
             obj1.DisplayCreateTime();
             ZClass obj2 = new ZClass();
             obj2.DisplayCreateTime();
         }
     }

     class ZClass : IA {
         public enum Flag {
             aflag,
             bflag
         }

         public event EventHandler AEvent = null;

         public void DisplayCreateTime() {
             Console.WriteLine("ZClass created at " + m_Time);
         }

         private string m_Time = DateTime.Now.ToLongTimeString();

         public string Time {
             get {
                 return m_Time;
             }
         }
     }
 }

Figure 10-2 is a view of Simple.exe from ILDASM. ILDASM displays a hierarchical object graph with an icon for each element of the application.

Figure 10-2: Simple.exe displayed in ILDASM

Some icons are collapsible and expandable, as indicated by a + or - symbol. The Assembly icon expands to show the details of the loaded assembly, the Namespace icon expands to show the members of the namespace, and so on. You can drill down the object graph from the assembly to the class members. An icon depicts each item category of the graph. Table 10-4 describes each icon and the action performed when you double-click it.

Table 10-4: Elements of ILDASM
Icon Descriptions	Action
Assembly	Shows elements of the assembly
Class	Shows members of a class
Enum	Shows members of enum type
Event	Views metadata and MSIL code of event
Field	Views metadata of field
Interface	Shows members of interface
Manifest	Views attributes of an assembly
Method	Views metadata and MSIL code of method
Namespace	Shows members of the namespace
Property	Views metadata and MSIL code of property
Static Field	Views metadata of static field
Static Method	Views metadata and MSIL code of static method
Value Type	Shows members of a value type

Some elements are displayed twice. For example, a property is presented as itself and separately as accessor and mutator methods.

ILDASM has a variety of command-line options. Table 10-5 lists these parameters.

Table 10-5: ILDASM Options
ILDASM Option	Description
Out	Renders metadata and MSIL to a text file.
Text	Renders metadata and related MSIL to console.
HTML	Combines with the out option to display metadata and MSIL in an HTML format.
RTF	Renders metadata and MSIL in Rich Text Format.
Bytes	Shows MSIL code with opcodes and related bytes.
Raweh	Shows exception handling clauses in raw form instead of the labeled try and catch form.
Tokens	Shows metadata tokens.
Source	Shows MSIL interlaced with commented source code; for this command, the source code and debug file must be accessible.
Linenum	Inserts line directives into an output stream that matches source code to MSIL. This command requires the debug file.
Visibility	Disassembles only members with the stated visibility: pub (public), pri (private), fam (family), asm (assembly), faa (family and assembly), foa (family or assembly), and psc (private scope).
Pubonly	Disassembles only public elements; short notation for visibility=pub.
QuoteAllNames	Brackets all identifiers in single quotes.
NOCA	Excludes custom attributes.
CAVerbal	Displays blob information of custom attributes in symbolic form and not binary.
NOBAR	Do not display progress bar.
UTF8	Renders output file in UTF8 (default ANSI).
UNICODE	Renders output file in UNICODE.
NOIL	Suppresses disassembly of MSIL code in the output.
Forward	Generates forward class declarations in the Class Structure Declaration section.
TypeList	Displays list of types.
Headers	Includes DOS, PE, COFF, CLR, and metadata header information.
Item	Disassembles a particular class or method.
Stats	Displays statistical information on file, PE Header, CLR Header, and metadata.
ClassList	Provides a commented list of classes with attributes.
All	Combination of the Headers, Bytes, Stats, ClassList, and Tokens options.
Metadata	Displays specific information related to metadata.
Objectfile	Shows metadata of a library file.

The user interface of ILDASM presents the same choices as the command-line options. The following command line is typical. It disassembles simple.exe and outputs the resulting metadata, MSIL, metadata tokens, and source code in the simple.il file.

 ildasm /out=simple.il /source /tokens simple.exe

The source option of the preceding command interlaces source code in between MSIL code. The source code is commented. Associating MSIL to source code is invaluable when debugging.

The tokens generated by the tokens option are also commented. The disassembly created by ILDASM is a valid MSIL program that can be recompiled (which is the reason for the il extension, as in simple.il). The disassembly can be reassembled with the ILASM compiler, which compiles MSIL code. The newly assembled assembly is functionally equivalent to the original assembly.

Some ILDASM options impede the creation of a full disassembly. When a partial disassembly is requested, ILDASM issues a warning, which prevents you from attempting to use a partial disassembly as a full disassembly. One limitation is that partial disassemblies cannot be reassembled using ILASM. The following command creates a partial disassembly:

 ildasm /out=simple.il /item=Donis.CSharpBook.ZClass simple.exe

The preceding command targets only the ZClass type of the simple.exe assembly. Because other types are omitted from the disassembly, it is not complete. For this reason, a warning is added to the output file. Following is a partial listing of the output file with the embedded warning:

 //  Microsoft (R) .NET Framework IL Disassembler.  Version 2.0.50601.0
 //  Microsoft Corporation. All rights reserved.

 // warning : THIS IS A PARTIAL DISASSEMBLY, NOT SUITABLE FOR RE-ASSEMBLING

 .class private auto ansi beforefieldinit Donis.CSharpBook.ZClass
        extends [mscorlib]System.Object
        implements Donis.CSharpBook.IA
 {
   .field private class [mscorlib]System.EventHandler AEvent
   .field private string m_Time
   .method public hidebysig specialname instance void

This is the final example of ILDASM and command-line options. This command produces counts that profile the metadata, validates the metadata, and persists the results to the simple.txt file:

 ildasm /metadata=csv /metadata=validate /out=simple.txt simple.exe