Heaps and Tables

Logically, metadata is represented as a set of named streams, each stream representing a category of metadata. These streams are divided into two types: metadata heaps and metadata tables.

Heaps

A metadata heap is a storage area of trivial structure that holds a contiguous sequence of items. Heaps are used in metadata to store strings and binary objects. There are three kinds of metadata heaps:

  • String heap  This type of heap contains zero-terminated character strings, encoded in UTF-8. The strings follow each other immediately. Because the first byte of the heap is always 0, the first string in the heap is an empty string. The last byte of the heap must be 0 as well.

  • GUID heap  This type of heap contains 16-byte binary objects, immediately following each other. Because the size of the binary objects is fixed, length parameters or terminators are not needed.

  • Blob heap  This type of heap contains binary objects of arbitrary size. Each binary object is preceded by its length (in compressed form). Binary objects are aligned on 4-byte boundaries.

    The length compression formula is fairly simple. If the length (an unsigned integer) is 0x7F or less, it is represented as 1 byte. If the length is greater than 0x7F but no larger than 0x3FFF, it is represented as a 2-byte unsigned integer with the senior (most significant) bit set. If it is larger than 0x3FFF, it is represented as a 4-byte unsigned integer with the two senior bits set. Table 4-1 summarizes this formula.

Table 4-1  The Length Compression Formula for the Blob

Value Range            Compressed Size    Compressed Value
0 to 0x7F              1 byte             <value>
0x80 to 0x3FFF         2 bytes            0x8000 | <value>
0x4000 to 0x1FFFFFFF   4 bytes            0xC0000000 | <value>

Note: This compression formula is widely used in metadata. It works only for numbers not exceeding 0x1FFFFFFF (536,870,911), but this limitation isn’t a problem because the compression is usually applied to values such as lengths and counts.
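The formula in Table 4-1 can be sketched in a few lines of Python. The function names `compress_length` and `decompress_length` are illustrative and not part of any runtime API, and the sketch assumes the compressed value is stored most significant byte first, which the text above does not spell out.

```python
from typing import Tuple


def compress_length(value: int) -> bytes:
    """Encode an unsigned integer with the 1/2/4-byte scheme of Table 4-1."""
    if value < 0 or value > 0x1FFFFFFF:
        raise ValueError("value cannot be compressed")
    if value <= 0x7F:
        return bytes([value])
    if value <= 0x3FFF:
        # Senior bit set; stored most significant byte first (assumption).
        return (0x8000 | value).to_bytes(2, "big")
    # Two senior bits set.
    return (0xC0000000 | value).to_bytes(4, "big")


def decompress_length(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a compressed integer at data[pos]; return (value, bytes_used)."""
    first = data[pos]
    if first & 0x80 == 0:
        return first, 1
    if first & 0xC0 == 0x80:
        return int.from_bytes(data[pos:pos + 2], "big") & 0x3FFF, 2
    if first & 0xE0 == 0xC0:
        return int.from_bytes(data[pos:pos + 4], "big") & 0x1FFFFFFF, 4
    raise ValueError("invalid compressed integer")
```

Because the leading bits of the first byte identify the encoding (0 for 1 byte, 10 for 2 bytes, 110 for 4 bytes), the decoder only has to inspect one byte to know how many bytes to read.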

General Metadata Header

A general metadata header consists of a storage signature and a storage header. The storage signature has the following structure:

Type      Field            Description
DWORD     lSignature       “Magic” signature for physical metadata, currently 0x424A5342
WORD      iMajorVersion    Major version (1 for the first release of the common language runtime)
WORD      iMinorVersion    Minor version (1 for the first release of the common language runtime)
DWORD     iExtraData       Reserved; set to 0
DWORD     iLength          Length of the version string
BYTE[ ]   iVersionString   Version string
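As a rough illustration of the storage signature layout, the following Python sketch reads the fields in the order listed above. The function name, the little-endian byte order, the handling of zero padding inside the version string, and the choice to measure the 4-byte alignment from the start of the signature are all assumptions of this sketch. The sample bytes are built by hand and the version text is only an example.

```python
import struct

STORAGE_MAGIC = 0x424A5342  # lSignature value from the table above


def parse_storage_signature(data: bytes, base: int = 0):
    """Return (fields, offset_of_storage_header) for a signature at `base`."""
    # DWORD lSignature, WORD iMajorVersion, WORD iMinorVersion,
    # DWORD iExtraData, DWORD iLength: 16 bytes, little-endian (assumed).
    signature, major, minor, extra, length = struct.unpack_from("<IHHII", data, base)
    if signature != STORAGE_MAGIC:
        raise ValueError("storage signature not found")
    raw = data[base + 16:base + 16 + length]
    if len(raw) != length:
        raise ValueError("truncated version string")
    # iLength may include padding zeros, so cut at the first zero byte (assumed).
    version = raw.split(b"\0", 1)[0].decode("ascii")
    # The storage header follows on a 4-byte boundary (measured from `base`).
    next_offset = base + ((16 + length + 3) & ~3)
    fields = {"major": major, "minor": minor, "extra": extra, "version": version}
    return fields, next_offset


# Hand-built sample: 16 fixed bytes plus a 12-byte version string field.
sample = struct.pack("<IHHII", STORAGE_MAGIC, 1, 1, 0, 12) + b"v1.0.3705\0\0\0"
fields, next_offset = parse_storage_signature(sample)
print(fields["version"], next_offset)  # prints: v1.0.3705 28
```

Stored in little-endian order, the signature value 0x424A5342 appears in the file as the four ASCII characters BSJB.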

The storage header follows the storage signature, aligned on a 4-byte boundary. Its structure is simple:

Type    Field       Description
BYTE    fFlags      Reserved; set to 0
BYTE    [padding]
WORD    iStreams    Number of streams

The storage header is followed by an array of stream headers. The structure of a stream header looks like this:

Type       Field     Description
DWORD      iOffset   Offset in the file for this stream
DWORD      iSize     Size of the stream in bytes
char[16]   rcName    Name of the stream; a zero-terminated ANSI string no longer than seven characters
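The storage header and the stream header array can be walked with a short loop. The sketch below follows the layouts exactly as tabulated above, including the fixed 16-byte rcName field. That fixed width, the little-endian byte order, and the names `StreamHeader` and `parse_stream_headers` are assumptions of this sketch; files written by real compilers may store a shorter, padded name field, so treat it as an illustration of the tables rather than a general-purpose reader.

```python
import struct
from typing import List, NamedTuple


class StreamHeader(NamedTuple):
    offset: int  # iOffset
    size: int    # iSize
    name: str    # rcName, without the zero terminator


def parse_stream_headers(data: bytes, pos: int) -> List[StreamHeader]:
    """Read the storage header at `pos`, then the stream headers after it."""
    # BYTE fFlags, BYTE padding, WORD iStreams (little-endian assumed).
    _flags, _padding, stream_count = struct.unpack_from("<BBH", data, pos)
    pos += 4
    headers = []
    for _ in range(stream_count):
        # DWORD iOffset, DWORD iSize, char[16] rcName (fixed width, per the table).
        offset, size, raw_name = struct.unpack_from("<II16s", data, pos)
        name = raw_name.split(b"\0", 1)[0].decode("ascii")
        headers.append(StreamHeader(offset, size, name))
        pos += 24
    return headers


# Hand-built sample with two streams; offsets and sizes are made up.
blob = struct.pack("<BBH", 0, 0, 2)
blob += struct.pack("<II16s", 0x6C, 0x100, b"#~")
blob += struct.pack("<II16s", 0x16C, 0x80, b"#Strings")
headers = parse_stream_headers(blob, 0)
```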

Six named streams can be present in the metadata:

  • #Strings  A string heap containing the names of metadata items (class names, method names, field names, and so on). The stream does not contain literal constants defined or referenced in the methods of the module.

  • #Blob  A blob heap containing internal metadata binary objects, such as default values. This stream does not contain binary objects defined in the methods of the module.

  • #GUID  A GUID heap containing all sorts of globally unique identifiers.

  • #US  A blob heap containing user-defined strings. This stream contains string constants defined in the user code. The strings are kept in Unicode encoding. This stream’s most interesting characteristic is that the user strings can be explicitly addressed by the IL code (with the ldstr instruction). In addition, because it is actually a blob heap, the #US heap can store not only Unicode strings but any binary object, which opens some intriguing possibilities.

  • #~  A compressed (optimized) metadata stream. This stream contains an optimized system of metadata tables.

  • #-  An uncompressed (unoptimized) metadata stream. This stream contains an unoptimized system of metadata tables, including the intermediate lookup tables (pointer tables).

    Note: The streams #~ and #- are mutually exclusive: the metadata structure of the module is either optimized or unoptimized; it cannot be both at the same time.

If no items are stored in a stream, the stream is absent (null), and the iStreams field of the storage header is correspondingly reduced. At least three streams are guaranteed to be present: a metadata stream (#~ or #-), a string stream (#Strings), and a GUID stream (#GUID). Metadata items must be present in at least minimal configuration in even the most trivial module, and these metadata items must have names and GUIDs.
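Names in the #Strings stream are laid out as described for string heaps earlier: zero-terminated UTF-8 strings packed back to back, with a zero byte at the start and at the end. A minimal sketch of fetching one string by its byte offset follows; `read_heap_string` and the sample heap contents are hypothetical.

```python
def read_heap_string(heap: bytes, offset: int) -> str:
    """Return the zero-terminated UTF-8 string starting at `offset`."""
    if not 0 <= offset < len(heap):
        raise ValueError("offset outside the heap")
    # bytes.index raises ValueError if no terminator follows the offset.
    end = heap.index(b"\0", offset)
    return heap[offset:end].decode("utf-8")


# Hand-built heap: empty string at offset 0, then two names.
sample_heap = b"\0<Module>\0Main\0"
names = [read_heap_string(sample_heap, off) for off in (0, 1, 10)]
```

Because the first byte of the heap is 0, offset 0 always yields the empty string, which makes it a convenient value for "no name".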

Figure 4-3 illustrates the general structure of metadata. In Figure 4-4, you can see the way streams are referenced by other streams as well as by external “consumers” such as metadata APIs and the IL code.

Figure 4-3 The general structure of metadata.

Figure 4-4 Stream referencing.

Metadata Table Streams

The metadata streams #~ and #- begin with the following header:

Size      Field       Description
4 bytes   Reserved    Reserved; set to 0.
1 byte    Major       Major version of the table schema (1 for the first release of the common language runtime).
1 byte    Minor       Minor version of the table schema (0 for the first release of the common language runtime).
1 byte    Heaps       Binary flags indicating the offset sizes to be used within the heaps. A 4-byte unsigned integer offset is indicated by 0x01 for a string heap, 0x02 for a GUID heap, and 0x04 for a blob heap. If a flag is not set, the respective heap offset is presumed to be a 2-byte unsigned integer. A #- stream can also have special flags set: flag 0x20, indicating that the stream contains only changes made during an edit-and-continue session, and flag 0x80, indicating that the metadata might contain items marked as deleted.
1 byte    Rid         Bit count of the maximal record index to all tables of the metadata; calculated at run time (during the metadata stream initialization).
8 bytes   MaskValid   Bit vector of present tables, each bit representing one table (1 if present).
8 bytes   Sorted      Bit vector of sorted tables, each bit representing a respective table (1 if sorted).

This header is followed by a sequence of 4-byte unsigned integers indicating the number of records in each table marked 1 in the MaskValid bit vector.
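To make the header layout concrete, here is a hedged Python sketch that reads the fixed 24-byte header and the record counts that follow it. The function name is illustrative. The sketch assumes little-endian fields, that bits in MaskValid are numbered from the least significant bit, and that the record counts appear in ascending bit order; none of these details is stated explicitly above.

```python
import struct


def parse_table_stream_header(data: bytes) -> dict:
    """Parse the header of a #~ or #- stream plus the per-table record counts."""
    # 4-byte reserved, Major, Minor, Heaps, Rid (1 byte each),
    # then two 8-byte bit vectors: 24 bytes in total.
    _reserved, major, minor, heaps, rid, mask_valid, sorted_mask = struct.unpack_from(
        "<IBBBBQQ", data, 0
    )
    # Bit i set means table i is present (bits counted from the least significant).
    present = [i for i in range(64) if (mask_valid >> i) & 1]
    counts = struct.unpack_from("<%dI" % len(present), data, 24)
    return {
        "schema_version": (major, minor),
        "rid": rid,
        # Heap offsets are 4 bytes wide when the flag is set, otherwise 2.
        "string_offset_size": 4 if heaps & 0x01 else 2,
        "guid_offset_size": 4 if heaps & 0x02 else 2,
        "blob_offset_size": 4 if heaps & 0x04 else 2,
        "row_counts": dict(zip(present, counts)),
        "sorted_tables": [i for i in range(64) if (sorted_mask >> i) & 1],
    }


# Hand-built sample: tables 0, 1, 2 and 6 present, with made-up record counts.
sample = struct.pack("<IBBBBQQ", 0, 1, 0, 0x05, 1, 0b1000111, 0)
sample += struct.pack("<4I", 1, 5, 3, 7)
info = parse_table_stream_header(sample)
```

The number of 4-byte counts equals the number of bits set in MaskValid, so the table data starts at byte 24 plus four times that number.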

Like any database, metadata has a schema. The schema is a system of descriptors of metadata tables and columns—in this sense, it is “meta-metadata.” A schema is not a part of metadata, nor is it an attribute of a managed PE file. Rather, a metadata schema is an attribute of the common language runtime and is hard-coded. It should not change in the future unless there’s a major overhaul of the runtime.

Each metadata table has the following descriptors:

Type      Field      Description
pointer   pColDefs   Pointer to an array of column descriptors
BYTE      cCols      Number of columns in the table
BYTE      iKey       Index of the key column
WORD      cbRec      Size of a record in the table

Column descriptors, to which the pColDefs fields of table descriptors point, have the following structure:

Type   Field      Description
BYTE   Type       Code of the column’s type
BYTE   oColumn    Offset of the column
BYTE   cbColumn   Size of the column in bytes

Type, the first field of a column descriptor, is especially interesting. The metadata schema of the first release of the common language runtime identifies the following codes for column types:

Code       Description
0 to 63    Column holds the record index (RID) to another table; the specific value indicates which table. The width of the column is defined by the Rid field of the metadata stream header.
64 to 95   Column holds a coded token referencing another table; the specific value indicates the type of coded token. Tokens are references carrying the indexes of both the table and the record being referenced. The table being addressed and the index of the record are defined by the coded token value.
96         Column holds a 2-byte signed integer.
97         Column holds a 2-byte unsigned integer.
98         Column holds a 4-byte signed integer.
99         Column holds a 4-byte unsigned integer.
100        Column holds a 1-byte unsigned integer.
101        Column holds an offset in the string heap (the #Strings stream).
102        Column holds an offset in the GUID heap (the #GUID stream).
103        Column holds an offset in the blob heap (the #Blob stream).
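The column type codes above fall into three groups: RID columns, coded token columns, and fixed kinds. A small, purely illustrative Python helper (the name `describe_column_type` is made up for this sketch) shows how a reader of the schema might classify a code:

```python
FIXED_COLUMN_TYPES = {
    96: "2-byte signed integer",
    97: "2-byte unsigned integer",
    98: "4-byte signed integer",
    99: "4-byte unsigned integer",
    100: "1-byte unsigned integer",
    101: "offset in the string heap (#Strings)",
    102: "offset in the GUID heap (#GUID)",
    103: "offset in the blob heap (#Blob)",
}


def describe_column_type(code: int) -> str:
    """Translate a column type code into a short description."""
    if 0 <= code <= 63:
        # The code itself says which table the RID refers to.
        return f"RID into table {code}"
    if 64 <= code <= 95:
        return f"coded token (type code {code})"
    try:
        return FIXED_COLUMN_TYPES[code]
    except KeyError:
        raise ValueError(f"unknown column type code {code}") from None
```

For example, `describe_column_type(101)` reports a string heap offset, while any code from 0 through 63 is reported as a RID into the table with that number.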

The metadata schema defines 44 tables. Given the range of RID type codes, the common language runtime definitely has room for growth. At the moment, the following tables are defined:

  • Module  The current module descriptor.

  • TypeRef  Class reference descriptors.

  • TypeDef  Class or interface definition descriptors.

  • FieldPtr  A class-to-fields lookup table, which does not exist in optimized metadata (#~ stream).

  • Field  Field definition descriptors.

  • MethodPtr  A class-to-methods lookup table, which does not exist in optimized metadata (#~ stream).

  • Method  Method definition descriptors.

  • ParamPtr  A method-to-parameters lookup table, which does not exist in optimized metadata (#~ stream).

  • Param  Parameter definition descriptors.

  • InterfaceImpl  Interface implementation descriptors.

  • MemberRef  Member (field or method) reference descriptors.

  • Constant  Constant value descriptors that map the default values stored in the #Blob stream to respective fields, parameters, and properties.

  • CustomAttribute  Custom attribute descriptors.

  • FieldMarshal  Field or parameter marshaling descriptors for managed/unmanaged interoperations.

  • DeclSecurity  Security descriptors.

  • ClassLayout  Class layout descriptors that hold information about how the loader should lay out respective classes.

  • FieldLayout  Field layout descriptors that specify the offset or sequencing of individual fields.

  • StandAloneSig  Stand-alone signature descriptors. Signatures per se are used in two capacities: as composite signatures of local variables of methods, and as parameters of the call indirect (calli) IL instruction.

  • EventMap  A class-to-events mapping table. This is not an intermediate lookup table, and it does exist in optimized metadata.

  • EventPtr  An event-map-to-events lookup table, which does not exist in optimized metadata (#~ stream).

  • Event  Event descriptors.

  • PropertyMap  A class-to-properties mapping table. This is not an intermediate lookup table, and it does exist in optimized metadata.

  • PropertyPtr  A property-map-to-properties lookup table, which does not exist in optimized metadata (#~ stream).

  • Property  Property descriptors.

  • MethodSemantics  Method semantics descriptors that hold information about which method is associated with a specific property or event and in what capacity.

  • MethodImpl  Method implementation descriptors.

  • ModuleRef  Module reference descriptors.

  • TypeSpec  Type specification descriptors.

  • ImplMap  Implementation map descriptors used for the platform invocation (P/Invoke) type of managed/unmanaged code interoperation.

  • FieldRVA  Field-to-data mapping descriptors.

  • ENCLog  Edit-and-continue log descriptors that hold information about what changes have been made to specific metadata items during in-memory editing. This table does not exist in optimized metadata (#~ stream).

  • ENCMap  Edit-and-continue mapping descriptors. This table does not exist in optimized metadata (#~ stream).

  • Assembly  The current assembly descriptor, which should appear only in prime module metadata.

  • AssemblyProcessor  This table is unused in the first release of the runtime.

  • AssemblyOS  This table is unused in the first release of the runtime.

  • AssemblyRef  Assembly reference descriptors.

  • AssemblyRefProcessor  This table is unused in the first release of the runtime.

  • AssemblyRefOS  This table is unused in the first release of the runtime.

  • File  File descriptors that contain information about other files in the current assembly.

  • ExportedType  Exported type descriptors that contain information about public classes exported by the current assembly, which are declared in other modules of the assembly. Only the prime module of the assembly should carry this table.

  • ManifestResource  Managed resource descriptors.

  • NestedClass  Nested class descriptors that provide mapping of nested classes to their respective enclosing classes.

  • TypeTyPar  Reserved for future use.

  • MethodTyPar  Reserved for future use.
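As a sketch of how this list relates to the MaskValid bit vector in the stream header, the following Python snippet names the tables whose bits are set. It assumes that the bit position of a table equals its position in the list above, counting from 0 at the least significant bit; the text does not state this mapping, so treat it as an assumption. The helper name `tables_present` is hypothetical.

```python
TABLE_NAMES = [
    "Module", "TypeRef", "TypeDef", "FieldPtr", "Field", "MethodPtr",
    "Method", "ParamPtr", "Param", "InterfaceImpl", "MemberRef", "Constant",
    "CustomAttribute", "FieldMarshal", "DeclSecurity", "ClassLayout",
    "FieldLayout", "StandAloneSig", "EventMap", "EventPtr", "Event",
    "PropertyMap", "PropertyPtr", "Property", "MethodSemantics", "MethodImpl",
    "ModuleRef", "TypeSpec", "ImplMap", "FieldRVA", "ENCLog", "ENCMap",
    "Assembly", "AssemblyProcessor", "AssemblyOS", "AssemblyRef",
    "AssemblyRefProcessor", "AssemblyRefOS", "File", "ExportedType",
    "ManifestResource", "NestedClass", "TypeTyPar", "MethodTyPar",
]


def tables_present(mask_valid: int) -> list:
    """List the names of tables whose bit is set in a MaskValid value."""
    # Assumption: bit i (from the least significant bit) maps to TABLE_NAMES[i].
    return [name for i, name in enumerate(TABLE_NAMES) if (mask_valid >> i) & 1]


# A made-up MaskValid value with bits 0, 1, 2 and 6 set.
example = tables_present(0b1000111)
```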

The structural aspects of the various tables and their validity rules are discussed in later chapters, along with the corresponding ILAsm constructs.



Inside Microsoft .NET IL Assembler
ISBN: 0735615470
Year: 2005
Pages: 147
Authors: SERGE LIDIN
