Overview of Metadata Physical Layout in PE Files | The Common Language Infrastructure Annotated Standard (Microsoft .NET Development Series)

Section 23 of Partition II describes how the metadata is laid out in PE files. If you look at a dump of a file, the sections relating to the PE file format are described in section 24, and the metadata sections are described in section 23.

Metadata can be stored in either a code or a data segment. From the point of view of the operating system, it is simply a part of that segment. The data is mapped in or not mapped in, depending on whether the segment is to be loaded into memory. At execution time, the metadata is designed to be read-only, and never contains executable code. Presently, Microsoft compilers store the metadata in the code segment, where it is executable but read-only, rather than in its own isolated segment. The standard does not require this, however, and implementations should be prepared to find the metadata anywhere.

The on-disk representation of metadata is stored in streams representing the metadata tables and heaps. It reflects the logical layout described in sections 21 and 22 of Partition II.

The standard deliberately describes only those parts of the format that are intended to be portable, that is, the parts used for code that has been completely written and is ready for execution. The standard describes the format needed to transfer such a program. Microsoft compilers typically integrate into a build environment that produces intermediate versions that are not ready for direct execution. These contain additional metadata streams and do not always contain the streams that are described in the standard. In particular, there is a so-called "hard-optimized" metadata format that some compilers use when they produce debugging output or output that is intended to go through a linker. That part of the file format is evolving rapidly, is not portable, and is not intended to be standardized.

As the standard says, metadata is stored in two kinds of structures: tables (arrays of records) and heaps. The tables, which are the physical representation of the logical metadata tables, are stored together in a single stream designated "#~". These tables and their schemata are described in detail in section 21 of Partition II.
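To make the idea of named streams concrete, here is a minimal sketch that lists the streams declared in a metadata root. The field layout it assumes (a "BSJB" signature, a padded version string, then one offset/size/name header per stream) is taken from the physical layout section of the standard rather than from the text above, and list_streams, _padded, and the sample bytes are hypothetical names and data invented for illustration.

```python
import struct
from typing import Dict, Tuple

METADATA_SIGNATURE = 0x424A5342  # the bytes "BSJB" read as a little-endian integer


def list_streams(root: bytes) -> Dict[str, Tuple[int, int]]:
    """Map each stream name to (offset, size), both relative to the metadata root."""
    signature, _major, _minor, _reserved, version_len = struct.unpack_from("<IHHII", root, 0)
    if signature != METADATA_SIGNATURE:
        raise ValueError("not a metadata root")
    pos = 16 + version_len  # skip the fixed fields and the padded version string
    _flags, count = struct.unpack_from("<HH", root, pos)
    pos += 4
    streams = {}
    for _ in range(count):
        offset, size = struct.unpack_from("<II", root, pos)
        pos += 8
        end = root.index(b"\x00", pos)  # the name is null-terminated ASCII
        name = root[pos:end].decode("ascii")
        pos = (end + 1 + 3) & ~3  # the name field is padded to a 4-byte boundary
        streams[name] = (offset, size)
    return streams


def _padded(name: bytes) -> bytes:
    """Null-terminate a name and pad it to a multiple of 4 bytes."""
    raw = name + b"\x00"
    return raw + b"\x00" * (-len(raw) % 4)


# Hypothetical metadata root declaring two streams (offsets and sizes are made up).
_version = _padded(b"v1.0")
sample_root = (
    struct.pack("<IHHII", METADATA_SIGNATURE, 1, 1, 0, len(_version))
    + _version
    + struct.pack("<HH", 0, 2)
    + struct.pack("<II", 0x40, 0x100) + _padded(b"#~")
    + struct.pack("<II", 0x140, 0x20) + _padded(b"#Strings")
)

streams = list_streams(sample_root)
```

A reader that finds "#~" in the result knows where the table stream starts. A file produced by an intermediate build step may list other, nonstandard stream names instead, which is why the sketch returns whatever names it finds rather than insisting on a fixed set.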

There are four heaps in any module: String, Blob, Userstring, and Guid.

The String heap stores identifiers for the tables. For example, the names of types and their members are stored in the String heap.
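As a rough illustration, a lookup in the String heap can be sketched as follows. The sketch assumes the encoding given in the standard (null-terminated UTF-8 strings addressed by byte offset, with an empty string at offset 0); read_string and the sample heap contents are hypothetical.

```python
def read_string(heap: bytes, offset: int) -> str:
    """Return the null-terminated UTF-8 string that starts at a byte offset."""
    if not 0 <= offset < len(heap):
        raise IndexError(f"offset {offset} is outside the heap")
    end = heap.find(b"\x00", offset)
    if end == -1:
        raise ValueError("string is not null-terminated")
    return heap[offset:end].decode("utf-8")


# Hypothetical heap: an empty string at offset 0, followed by three identifiers.
sample_heap = b"\x00<Module>\x00Program\x00Main\x00"

type_name = read_string(sample_heap, 10)    # "Program"
method_name = read_string(sample_heap, 18)  # "Main"
```

A table row would hold the offsets 10 and 18 in its name columns; the heap itself carries no record of which table or column refers to which string.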

The Blob heap stores blobs: chunks of binary data of a known size that are associated with metadata. A blob is accessed using the offset into the Blob heap at which it starts. The only way to understand the format of a blob is to know where that offset came from. For example, in the metadata table for member definitions, one of the columns is the signature of the member. That will be an offset into the Blob heap; the format for that blob will be the MemberDefSignature.
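The "known size" of a blob comes from a length prefix stored in front of its data. The following sketch reads a blob at a given offset, assuming the compressed length encoding defined in the standard (one, two, or four bytes, selected by the high bits of the first byte). The function names and the sample bytes are hypothetical, and the sketch makes no attempt to interpret the blob contents, since that depends on which table column supplied the offset.

```python
from typing import Tuple


def read_compressed_length(heap: bytes, offset: int) -> Tuple[int, int]:
    """Decode the length prefix at offset; return (blob length, prefix size)."""
    first = heap[offset]
    if first & 0x80 == 0x00:  # 0xxxxxxx: length fits in 7 bits
        return first, 1
    if first & 0xC0 == 0x80:  # 10xxxxxx: 14-bit length, big-endian
        return ((first & 0x3F) << 8) | heap[offset + 1], 2
    if first & 0xE0 == 0xC0:  # 110xxxxx: 29-bit length, big-endian
        length = (
            ((first & 0x1F) << 24)
            | (heap[offset + 1] << 16)
            | (heap[offset + 2] << 8)
            | heap[offset + 3]
        )
        return length, 4
    raise ValueError("invalid blob length prefix")


def read_blob(heap: bytes, offset: int) -> bytes:
    """Return the raw bytes of the blob that starts at a byte offset."""
    length, prefix_size = read_compressed_length(heap, offset)
    start = offset + prefix_size
    if start + length > len(heap):
        raise ValueError("blob runs past the end of the heap")
    return heap[start:start + length]


# Hypothetical heap: an empty blob at offset 0, a 3-byte blob at 1, a 2-byte blob at 5.
sample_heap = b"\x00" + b"\x03\x20\x00\x01" + b"\x02\x06\x08"

signature_bytes = read_blob(sample_heap, 1)
```

Nothing in the heap says that the blob at offset 1 is a signature. A reader only knows how to parse those three bytes because the offset was taken from a signature column.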

The Userstring heap stores strings specified by a programmer, such as double-quoted strings in source code.

The Guid heap is an array of GUIDs, each 16 bytes wide. Its first element is numbered 1, its second 2, and so on.

The first three heaps (String, Blob, and Userstring) are byte arrays, so valid indexes into them are byte offsets such as 0, 23, 25, or 39. At present, only a single GUID is used in the CLI. It had been expected that GUIDs would be used more widely, but that has not been the case.
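The Guid heap's element numbering, as opposed to the byte offsets used by the other three heaps, can be sketched like this. The sketch assumes that each 16-byte entry uses the usual Windows GUID byte order (which is what bytes_le in Python's uuid module expects) and that index 0 never names an entry; read_guid and the sample GUID are hypothetical.

```python
import uuid

GUID_SIZE = 16  # each entry is 16 bytes wide


def read_guid(heap: bytes, index: int) -> uuid.UUID:
    """Return the GUID with the given 1-based index."""
    if index < 1 or index * GUID_SIZE > len(heap):
        raise IndexError(f"no GUID with index {index}")
    start = (index - 1) * GUID_SIZE  # index 1 maps to byte offset 0
    return uuid.UUID(bytes_le=heap[start:start + GUID_SIZE])


# Hypothetical heap holding a single GUID, as is typical at present.
sample_guid = uuid.UUID("12345678-1234-5678-1234-567812345678")
sample_heap = sample_guid.bytes_le

first_guid = read_guid(sample_heap, 1)
```

Because every entry has the same width, the index counts elements rather than bytes, so index 2 would refer to bytes 16 through 31 if a second GUID were present.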