Heaps and Tables
Logically, metadata is represented as a set of named streams, each stream representing a category of metadata. These streams are divided into two types: metadata heaps and metadata tables.
Heaps
A metadata heap is a storage of trivial structure, holding a contiguous sequence of items. Heaps are used in metadata to store strings and binary objects. There are three kinds of metadata heaps:
String heap This type of heap contains zero-terminated character strings, encoded in UTF-8. The strings follow each other immediately. Because the first byte of the heap is always 0, the first string in the heap is an empty string. The last byte of the heap must be 0 as well.
GUID heap This type of heap contains 16-byte binary objects, immediately following each other. Because the size of the binary objects is fixed, length parameters or terminators are not needed.
Blob heap This type of heap contains binary objects of arbitrary size. Each binary object is preceded by its length (in compressed form). Binary objects are aligned on 4-byte boundaries.
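As a minimal sketch of how the two simpler heaps can be read, the following Python helpers pull a zero-terminated string out of a string heap and split a GUID heap into 16-byte entries. The function names and the sample bytes are hypothetical, and the use of `bytes_le` assumes the GUIDs are stored in the usual Windows in-memory layout, which the text does not state.

```python
import uuid
from typing import List


def read_heap_string(heap: bytes, offset: int) -> str:
    """Return the zero-terminated UTF-8 string starting at `offset`."""
    end = heap.index(b"\x00", offset)  # ValueError if the terminator is missing
    return heap[offset:end].decode("utf-8")


def split_guid_heap(heap: bytes) -> List[uuid.UUID]:
    """Split a GUID heap into its fixed-size 16-byte entries."""
    if len(heap) % 16 != 0:
        raise ValueError("GUID heap size must be a multiple of 16")
    # Assumption: Windows GUID byte layout (first three fields little-endian).
    return [uuid.UUID(bytes_le=heap[i:i + 16]) for i in range(0, len(heap), 16)]


# Made-up string heap: first and last bytes are 0, strings follow each other.
sample_strings = b"\x00Module\x00Main\x00"
print(repr(read_heap_string(sample_strings, 0)))  # the empty string at offset 0
print(read_heap_string(sample_strings, 1))
print(read_heap_string(sample_strings, 8))
```

Because the first byte of the heap is 0, offset 0 always resolves to the empty string, which makes it a convenient "no name" value.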
The length compression formula is fairly simple. If the length (which is an unsigned integer) is 0x7F or less, it is represented as 1 byte; if the length is greater than 0x7F but no larger than 0x3FFF, it is represented as a 2-byte unsigned integer with the senior bit set. Otherwise, it is represented as a 4-byte unsigned integer with two senior bits set. Table 4-1 summarizes this formula.
Value Range | Compressed Size | Compressed Value |
0 to 0x7F | 1 byte | <value> |
0x80 to 0x3FFF | 2 bytes | 0x8000 + <value> |
0x4000 to 0x1FFFFFFF | 4 bytes | 0xC0000000 + <value> |
This compression formula is widely used in metadata. Of course, the compression works only for numbers not exceeding 0x1FFFFFFF (536,870,911), but this limitation isn’t a problem because the compression is usually applied to such values as lengths and counts.
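The compression formula can be sketched in a few lines of Python. The function names are hypothetical, and one detail is an assumption: the text does not spell out the byte order of the 2-byte and 4-byte forms, so the sketch writes the most significant byte first, which keeps the marker bits in the first byte where a reader can test them.

```python
from typing import Tuple


def compress_uint(value: int) -> bytes:
    """Encode an unsigned integer using the 1-, 2-, or 4-byte compressed form."""
    if not 0 <= value <= 0x1FFFFFFF:
        raise ValueError("value cannot be compressed")
    if value <= 0x7F:
        return bytes([value])
    if value <= 0x3FFF:
        return (0x8000 | value).to_bytes(2, "big")   # senior bit set
    return (0xC0000000 | value).to_bytes(4, "big")   # two senior bits set


def decompress_uint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, number of bytes consumed) for the integer at `offset`."""
    first = data[offset]
    if first & 0x80 == 0:
        return first, 1
    if first & 0xC0 == 0x80:
        return int.from_bytes(data[offset:offset + 2], "big") & 0x3FFF, 2
    if first & 0xE0 == 0xC0:
        return int.from_bytes(data[offset:offset + 4], "big") & 0x1FFFFFFF, 4
    raise ValueError("invalid compressed integer")


def read_blob(heap: bytes, offset: int) -> bytes:
    """Read one blob: a compressed length followed by that many bytes."""
    length, consumed = decompress_uint(heap, offset)
    start = offset + consumed
    return heap[start:start + length]


print(compress_uint(0x7F).hex())    # 7f
print(compress_uint(0x80).hex())    # 8080
print(compress_uint(0x4000).hex())  # c0004000
```

`read_blob` shows why the length prefix matters: a blob heap entry can be located from its offset alone, with no terminator needed.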
General Metadata Header
A general metadata header consists of a storage signature and a storage header. The storage signature has the following structure:
Type | Field | Description |
DWORD | lSignature | “Magic” signature for physical metadata, currently 0x424A5342 |
WORD | iMajorVersion | Major version (1 for the first release of the common language runtime) |
WORD | iMinorVersion | Minor version (1 for the first release of the common language runtime) |
DWORD | iExtraData | Reserved; set to 0 |
DWORD | iLength | Length of the version string |
BYTE[ ] | iVersionString | Version string |
The storage header follows the storage signature, aligned on a 4-byte boundary. Its structure is simple:
Type | Field | Description |
BYTE | fFlags | Reserved; set to 0 |
BYTE | | [padding] |
WORD | iStreams | Number of streams |
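The two structures above can be read with Python's `struct` module, as in the following sketch. The names `MetadataRoot` and `parse_metadata_root` are hypothetical. The sketch assumes little-endian fields (as elsewhere in a PE file), a version string that may be padded with zero bytes, and input that starts exactly at the storage signature.

```python
import struct
from typing import NamedTuple


class MetadataRoot(NamedTuple):
    major: int
    minor: int
    version: str
    flags: int
    stream_count: int
    stream_headers_offset: int  # where the array of stream headers begins


def parse_metadata_root(data: bytes) -> MetadataRoot:
    # Storage signature: lSignature, iMajorVersion, iMinorVersion,
    # iExtraData, iLength (16 bytes in total), then the version string.
    signature, major, minor, _extra, length = struct.unpack_from("<IHHII", data, 0)
    if signature != 0x424A5342:
        raise ValueError("not a metadata signature")
    pos = 16
    version = data[pos:pos + length].split(b"\x00", 1)[0].decode("utf-8")
    # The storage header is aligned on a 4-byte boundary.
    pos = (pos + length + 3) & ~3
    flags, _padding, stream_count = struct.unpack_from("<BBH", data, pos)
    return MetadataRoot(major, minor, version, flags, stream_count, pos + 4)


# Made-up sample: a 12-byte version field and five streams.
sample = (struct.pack("<IHHII", 0x424A5342, 1, 1, 0, 12)
          + b"v1.0.3705\x00\x00\x00"
          + struct.pack("<BBH", 0, 0, 5))
print(parse_metadata_root(sample))
```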
The storage header is followed by an array of stream headers. The structure of a stream header looks like this:
Type | Field | Description |
DWORD | iOffset | Offset in the file for this stream |
DWORD | iSize | Size of the stream in bytes |
char[16] | rcName | Name of the stream; a zero-terminated ANSI string no longer than seven characters |
Six named streams can be present in the metadata:
#Strings A string heap containing the names of metadata items (class names, method names, field names, and so on). The stream does not contain literal constants defined or referenced in the methods of the module.
#Blob A blob heap containing internal metadata binary objects, such as default values. This stream does not contain binary objects defined in the methods of the module.
#GUID A GUID heap containing all sorts of globally unique identifiers.
#US A blob heap containing user-defined strings. This stream contains string constants defined in the user code. The strings are kept in Unicode encoding. This stream’s most interesting characteristic is that the user strings can be explicitly addressed by the IL code (with the ldstr instruction). In addition, because it is actually a blob heap, the #US heap can store not only Unicode strings but any binary object, which opens some intriguing possibilities.
#~ A compressed (optimized) metadata stream. This stream contains an optimized system of metadata tables.
#- An uncompressed (unoptimized) metadata stream. This stream contains an unoptimized system of metadata tables, including the intermediate lookup tables (pointer tables).
The streams #~ and #- are mutually exclusive: the metadata structure of the module is either optimized or unoptimized; it cannot be both at the same time.
If no items are stored in a stream, the stream is absent (null), and the iStreams field of the storage header is correspondingly reduced. At least three streams are guaranteed to be present: a metadata stream (#~ or #-), a string stream (#Strings), and a GUID stream (#GUID). Metadata items must be present in at least minimal configuration in even the most trivial module, and these metadata items must have names and GUIDs.
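The presence rules above can be expressed as a small consistency check. This is only an illustrative sketch: `check_stream_set` is a hypothetical helper that takes the stream names already read from the stream headers and returns a list of rule violations.

```python
from typing import Iterable, List

KNOWN_STREAMS = {"#Strings", "#Blob", "#GUID", "#US", "#~", "#-"}


def check_stream_set(names: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the rules are satisfied."""
    present = set(names)
    problems = []
    for name in sorted(present - KNOWN_STREAMS):
        problems.append("unknown stream " + name)
    if "#~" in present and "#-" in present:
        problems.append("#~ and #- are mutually exclusive")
    if "#~" not in present and "#-" not in present:
        problems.append("no metadata stream (#~ or #-)")
    for required in ("#Strings", "#GUID"):
        if required not in present:
            problems.append("missing " + required)
    return problems


print(check_stream_set(["#~", "#Strings", "#GUID"]))   # []
print(check_stream_set(["#~", "#-", "#Strings"]))
```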
Figure 4-3 illustrates the general structure of metadata. In Figure 4-4, you can see the way streams are referenced by other streams as well as by external “consumers” such as metadata APIs and the IL code.
Figure 4-3 The general structure of metadata.
Figure 4-4 Stream referencing.
Metadata Table Streams
The metadata streams #~ and #- begin with the following header:
Size | Field | Description |
4 bytes | Reserved | Reserved; set to 0. |
1 byte | Major | Major version of the table schema (1 for the first release of the common language runtime). |
1 byte | Minor | Minor version of the table schema (0 for the first release of the common language runtime). |
1 byte | Heaps | Binary flags indicating the offset sizes to be used within the heaps. A 4-byte unsigned integer offset is indicated by 0x01 for a string heap, 0x02 for a GUID heap, and 0x04 for a blob heap. If a flag is not set, the respective heap offset is presumed to be a 2-byte unsigned integer. A #- stream can also have special flags set: flag 0x20, indicating that the stream contains only changes made during an edit-and-continue session, and flag 0x80, indicating that the metadata might contain items marked as deleted. |
1 byte | Rid | Bit count of the maximal record index to all tables of the metadata; calculated at run time (during the metadata stream initialization). |
8 bytes | MaskValid | Bit vector of present tables, each bit representing one table (1 if present). |
8 bytes | Sorted | Bit vector of sorted tables, each bit representing a respective table (1 if sorted). |
This header is followed by a sequence of 4-byte unsigned integers indicating the number of records in each table marked 1 in the MaskValid bit vector.
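A sketch of reading this header and the record counts that follow it is shown below. The function name and the returned dictionary keys are hypothetical, fields are assumed to be little-endian, and the input is assumed to start at the first byte of the #~ or #- stream.

```python
import struct
from typing import Dict


def parse_table_stream_header(data: bytes) -> Dict[str, object]:
    # Reserved (4), Major (1), Minor (1), Heaps (1), Rid (1),
    # MaskValid (8), Sorted (8): 24 bytes in total.
    _reserved, major, minor, heaps, rid, mask_valid, sorted_mask = \
        struct.unpack_from("<IBBBBQQ", data, 0)
    pos = 24
    row_counts = {}
    for table in range(64):
        if (mask_valid >> table) & 1:      # table is present
            (row_counts[table],) = struct.unpack_from("<I", data, pos)
            pos += 4
    return {
        "version": (major, minor),
        "string_offset_size": 4 if heaps & 0x01 else 2,
        "guid_offset_size": 4 if heaps & 0x02 else 2,
        "blob_offset_size": 4 if heaps & 0x04 else 2,
        "rid": rid,
        "sorted": sorted_mask,
        "row_counts": row_counts,          # table index -> number of records
        "tables_offset": pos,              # first byte after the counts
    }


# Made-up sample: tables 0, 1, 2, and 6 present with 1, 3, 2, and 5 records.
sample = struct.pack("<IBBBBQQ", 0, 1, 0, 0x05, 1, 0x47, 0) \
    + struct.pack("<4I", 1, 3, 2, 5)
print(parse_table_stream_header(sample)["row_counts"])
```

Only tables whose bit is set in MaskValid contribute a count, so the counts array has as many entries as there are 1 bits in the vector.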
Like any database, metadata has a schema. The schema is a system of descriptors of metadata tables and columns—in this sense, it is “meta-metadata.” A schema is not a part of metadata, nor is it an attribute of a managed PE file. Rather, a metadata schema is an attribute of the common language runtime and is hard-coded. It should not change in the future unless there’s a major overhaul of the runtime.
Each metadata table has the following descriptors:
Type | Field | Description |
pointer | pColDefs | Pointer to an array of column descriptors |
BYTE | cCols | Number of columns in the table |
BYTE | iKey | Index of the key column |
WORD | cbRec | Size of a record in the table |
Column descriptors, to which the pColDefs fields of table descriptors point, have the following structure:
Type | Field | Description |
BYTE | Type | Code of the column’s type |
BYTE | oColumn | Offset of the column |
BYTE | cbColumn | Size of the column in bytes |
Type, the first field of a column descriptor, is especially interesting. The metadata schema of the first release of the common language runtime identifies the following codes for column types:
0 to 63 | Column holds the record index (RID) to another table; the specific value indicates which table. The width of the column is defined by the Rid field of the metadata stream header. |
64 to 95 | Column holds a coded token referencing another table; the specific value indicates the type of coded token. Tokens are references carrying the indexes of both the table and the record being referenced. The table being addressed and the index of the record are defined by the coded token value. |
96 | Column holds a 2-byte signed integer. |
97 | Column holds a 2-byte unsigned integer. |
98 | Column holds a 4-byte signed integer. |
99 | Column holds a 4-byte unsigned integer. |
100 | Column holds a 1-byte unsigned integer. |
101 | Column holds an offset in the string heap (the #Strings stream). |
102 | Column holds an offset in the GUID heap (the #GUID stream). |
103 | Column holds an offset in the blob heap (the #Blob stream). |
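The column type codes lend themselves to a simple classifier. The sketch below uses hypothetical function names and kind labels; it maps a code to its kind and, where the size follows directly from the code or from the Heaps flags of the stream header, to a size in bytes. Widths of RID and coded token columns depend on more than the code itself, so the sketch deliberately returns `None` for them.

```python
from typing import Optional

FIXED_SIZES = {96: 2, 97: 2, 98: 4, 99: 4, 100: 1}
FIXED_KINDS = {96: "int16", 97: "uint16", 98: "int32", 99: "uint32", 100: "uint8"}
HEAP_FLAGS = {101: 0x01, 102: 0x02, 103: 0x04}   # string, GUID, blob
HEAP_KINDS = {101: "string_offset", 102: "guid_offset", 103: "blob_offset"}


def column_kind(code: int) -> str:
    """Classify a column type code."""
    if 0 <= code <= 63:
        return "rid"            # the code itself is the target table
    if 64 <= code <= 95:
        return "coded_token"
    if code in FIXED_KINDS:
        return FIXED_KINDS[code]
    if code in HEAP_KINDS:
        return HEAP_KINDS[code]
    raise ValueError("unknown column type code %d" % code)


def column_size(code: int, heaps_flags: int) -> Optional[int]:
    """Size in bytes, or None when it cannot be derived from the code alone."""
    if code in FIXED_SIZES:
        return FIXED_SIZES[code]
    if code in HEAP_FLAGS:
        return 4 if heaps_flags & HEAP_FLAGS[code] else 2
    column_kind(code)           # raises for codes outside the known ranges
    return None


print(column_kind(2), column_kind(64), column_kind(101))
print(column_size(101, 0x01), column_size(103, 0x01))
```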
The metadata schema defines 44 tables. Given the range of RID type codes, the common language runtime definitely has room for growth. At the moment, the following tables are defined:
Module The current module descriptor.
TypeRef Class reference descriptors.
TypeDef Class or interface definition descriptors.
FieldPtr A class-to-fields lookup table, which does not exist in optimized metadata (#~ stream).
Field Field definition descriptors.
MethodPtr A class-to-methods lookup table, which does not exist in optimized metadata (#~ stream).
Method Method definition descriptors.
ParamPtr A method-to-parameters lookup table, which does not exist in optimized metadata (#~ stream).
Param Parameter definition descriptors.
InterfaceImpl Interface implementation descriptors.
MemberRef Member (field or method) reference descriptors.
Constant Constant value descriptors that map the default values stored in the #Blob stream to respective fields, parameters, and properties.
CustomAttribute Custom attribute descriptors.
FieldMarshal Field or parameter marshaling descriptors for managed/unmanaged interoperations.
DeclSecurity Security descriptors.
ClassLayout Class layout descriptors that hold information about how the loader should lay out respective classes.
FieldLayout Field layout descriptors that specify the offset or sequencing of individual fields.
StandAloneSig Stand-alone signature descriptors. Signatures per se are used in two capacities: as composite signatures of local variables of methods, and as parameters of the call indirect (calli) IL instruction.
EventMap A class-to-events mapping table. This is not an intermediate lookup table, and it does exist in optimized metadata.
EventPtr An event-map-to-events lookup table, which does not exist in optimized metadata (#~ stream).
Event Event descriptors.
PropertyMap A class-to-properties mapping table. This is not an intermediate lookup table, and it does exist in optimized metadata.
PropertyPtr A property-map-to-properties lookup table, which does not exist in optimized metadata (#~ stream).
Property Property descriptors.
MethodSemantics Method semantics descriptors that hold information about which method is associated with a specific property or event and in what capacity.
MethodImpl Method implementation descriptors.
ModuleRef Module reference descriptors.
TypeSpec Type specification descriptors.
ImplMap Implementation map descriptors used for the platform invocation (P/Invoke) type of managed/unmanaged code interoperation.
FieldRVA Field-to-data mapping descriptors.
ENCLog Edit-and-continue log descriptors that hold information about what changes have been made to specific metadata items during in-memory editing. This table does not exist in optimized metadata (#~ stream).
ENCMap Edit-and-continue mapping descriptors. This table does not exist in optimized metadata (#~ stream).
Assembly The current assembly descriptor, which should appear only in prime module metadata.
AssemblyProcessor This table is unused in the first release of the runtime.
AssemblyOS This table is unused in the first release of the runtime.
AssemblyRef Assembly reference descriptors.
AssemblyRefProcessor This table is unused in the first release of the runtime.
AssemblyRefOS This table is unused in the first release of the runtime.
File File descriptors that contain information about other files in the current assembly.
ExportedType Exported type descriptors that contain information about public classes exported by the current assembly, which are declared in other modules of the assembly. Only the prime module of the assembly should carry this table.
ManifestResource Managed resource descriptors.
NestedClass Nested class descriptors that provide mapping of nested classes to their respective enclosing classes.
TypeTyPar Reserved for future use.
MethodTyPar Reserved for future use.
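To connect this list with the MaskValid bit vector of the metadata stream header, the sketch below translates a bit vector into table names. It assumes that the order of the list above matches the table numbering, so that bit 0 stands for Module, bit 1 for TypeRef, and so on; the text does not state this explicitly, and `present_tables` is a hypothetical helper.

```python
from typing import List

# Assumption: list position equals table index and MaskValid bit position.
TABLE_NAMES = [
    "Module", "TypeRef", "TypeDef", "FieldPtr", "Field", "MethodPtr",
    "Method", "ParamPtr", "Param", "InterfaceImpl", "MemberRef", "Constant",
    "CustomAttribute", "FieldMarshal", "DeclSecurity", "ClassLayout",
    "FieldLayout", "StandAloneSig", "EventMap", "EventPtr", "Event",
    "PropertyMap", "PropertyPtr", "Property", "MethodSemantics", "MethodImpl",
    "ModuleRef", "TypeSpec", "ImplMap", "FieldRVA", "ENCLog", "ENCMap",
    "Assembly", "AssemblyProcessor", "AssemblyOS", "AssemblyRef",
    "AssemblyRefProcessor", "AssemblyRefOS", "File", "ExportedType",
    "ManifestResource", "NestedClass", "TypeTyPar", "MethodTyPar",
]


def present_tables(mask_valid: int) -> List[str]:
    """Names of the tables whose bit is set in the MaskValid vector."""
    return [name for index, name in enumerate(TABLE_NAMES)
            if (mask_valid >> index) & 1]


print(len(TABLE_NAMES))          # 44
print(present_tables(0x47))      # bits 0, 1, 2, and 6
```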
The structural aspects of the various tables and their validity rules are discussed in later chapters, along with the corresponding ILAsm constructs.