The physical on-disk representation of metadata is a direct reflection of the logical representation described in Partition II, sections 21 and 22. That is, data is stored in streams representing the metadata tables and heaps. The main complication is that, where the logical representation abstracts away the number of bytes needed for indexing into tables and columns, the physical representation has to handle this explicitly by defining how logical metadata heaps and tables map onto their physical representations. Unless stated otherwise, all binary values are stored in little-endian format.

23.1 Fixed Fields

Complete CLI components (metadata and CIL instructions) are stored in a subset of the current Portable Executable (PE) file format (see Partition II, section 24). Because of this heritage, some of the fields in the physical representation of metadata have fixed values. When writing these fields, they shall be set to the value indicated; on reading they may be ignored.

23.2 File Headers

23.2.1 Metadata Root

The root of the physical metadata starts with a magic signature and several bytes of version and other miscellaneous information, followed by a count and an array of stream headers, one for each stream that is present. The actual encoded tables and heaps are stored in the streams, which immediately follow this array of headers.
The Version string shall be "Standard CLI 2002" for any file that is intended to be executed on any conforming implementation of the CLI, and all conforming implementations of the CLI shall accept files that use this version string. Other strings shall be used when the file is restricted to a vendor-specific implementation of the CLI. Future versions of this standard shall specify different strings, but they shall begin "Standard CLI". Other standards that specify additional functionality shall specify their own specific version strings beginning with "Standard". Vendors that provide implementation-specific extensions shall provide a version string that does not begin with "Standard".

23.2.2 Stream Header

A stream header gives the name, position, and length of a particular table or heap. Note that the length of a stream header structure is not fixed, but depends on the length of its name field (a variable-length, null-terminated string).
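As a rough illustration of why the header length varies, the sketch below walks an array of stream headers in Python. It assumes a layout that the text above does not spell out: a 4-byte offset, a 4-byte size, then the null-terminated ASCII name padded with zero bytes to the next 4-byte boundary. The names `StreamHeader`, `read_stream_header`, and `read_stream_headers` are hypothetical helpers for this example, not part of any standard API.

```python
import struct
from typing import List, NamedTuple, Tuple


class StreamHeader(NamedTuple):
    offset: int  # byte offset of the stream (assumed relative to the metadata root)
    size: int    # length of the stream in bytes
    name: str    # stream name, e.g. "#Strings" or "#~"


def read_stream_header(buf: bytes, pos: int) -> Tuple[StreamHeader, int]:
    """Parse one stream header at buf[pos:]; return it and the position after it."""
    # Assumed layout: two little-endian 32-bit integers, then the name.
    offset, size = struct.unpack_from("<II", buf, pos)
    name_start = pos + 8
    name_end = buf.index(b"\x00", name_start)  # position of the null terminator
    name = buf[name_start:name_end].decode("ascii")
    # Assumption: the name field (terminator included) is padded to a multiple of 4 bytes.
    name_len = name_end - name_start + 1
    padded_len = (name_len + 3) & ~3
    return StreamHeader(offset, size, name), name_start + padded_len


def read_stream_headers(buf: bytes, pos: int, count: int) -> List[StreamHeader]:
    """Read `count` consecutive stream headers starting at buf[pos:]."""
    headers = []
    for _ in range(count):
        header, pos = read_stream_header(buf, pos)
        headers.append(header)
    return headers
```

Because each header's size depends on its name, the headers have to be read sequentially; the position of the next header is only known once the current name has been scanned.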
Both logical tables and heaps are stored in streams. There are five possible kinds of streams: a stream header with name "#Strings" that points to the physical representation of the String heap where identifier strings are stored; a stream header with name "#US" that points to the physical representation of the Userstring heap; a stream header with name "#Blob" that points to the physical representation of the Blob heap; a stream header with name "#GUID" that points to the physical representation of the Guid heap; and a stream header with name "#~" that points to the physical representation of a set of tables (see Partition II, section 22).
Each kind of stream may occur at most once; that is, a metadata file may not contain two "#US" streams, or five "#Blob" streams. A stream need not be present if it is empty. The following sections describe the structure of each kind of stream in more detail.

23.2.3 #Strings Heap

The stream of bytes pointed to by a "#Strings" header is the physical representation of the logical string heap. The physical heap may contain garbage; that is, it may contain parts that are unreachable from any of the tables, but parts that are reachable from a table shall contain a valid null-terminated UTF-8 string. When the #Strings heap is present, the first entry is always the empty string (i.e., \0).

23.2.4 #US and #Blob Heaps

The stream of bytes pointed to by a "#US" or "#Blob" header is the physical representation of the logical Userstring or Blob heap, respectively. Both these heaps may contain garbage, as long as any part that is reachable from any of the tables contains a valid "blob." Individual blobs are stored with their length encoded in the first few bytes:
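The length prefix can be decoded with a few bit tests. The sketch below is a minimal Python illustration, assuming a compressed-length scheme in which a leading 0 bit marks a 1-byte length (7 bits), leading bits 10 mark a 2-byte length (14 bits), and leading bits 110 mark a 4-byte length (29 bits), with multi-byte lengths stored most significant byte first. That scheme and the helper names `decode_blob_length` and `read_blob` are assumptions made for illustration and should be checked against the standard's length table.

```python
from typing import Tuple


def decode_blob_length(heap: bytes, pos: int) -> Tuple[int, int]:
    """Return (blob length, number of prefix bytes) for the blob starting at heap[pos]."""
    first = heap[pos]
    if (first & 0x80) == 0x00:
        # Assumed form 0bbbbbbb: 7-bit length in a single byte.
        return first, 1
    if (first & 0xC0) == 0x80:
        # Assumed form 10bbbbbb x: 14-bit length in two bytes.
        return ((first & 0x3F) << 8) | heap[pos + 1], 2
    if (first & 0xE0) == 0xC0:
        # Assumed form 110bbbbb x y z: 29-bit length in four bytes.
        length = (
            ((first & 0x1F) << 24)
            | (heap[pos + 1] << 16)
            | (heap[pos + 2] << 8)
            | heap[pos + 3]
        )
        return length, 4
    raise ValueError("invalid blob length prefix 0x%02X" % first)


def read_blob(heap: bytes, index: int) -> bytes:
    """Return the payload of the blob stored at byte offset `index` in the heap."""
    length, prefix_size = decode_blob_length(heap, index)
    start = index + prefix_size
    return heap[start:start + length]
```

Under these assumptions, index 0 of a heap that begins with the byte 0x00 decodes to a zero-length blob.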
The first entry in both these heaps is the empty "blob" that consists of the single byte 0x00. Strings in the #US (Userstring) heap are encoded using 16-bit Unicode encodings. The count on each string is the number of bytes (not characters) in the string. Furthermore, there is an additional terminal byte (so all byte counts are odd, not even). This final byte holds the value 1 if and only if any UTF-16 character within the string has any bit set in its top byte, or its low byte is any of the following: 0x01–0x08, 0x0E–0x1F, 0x27, 0x2D, 0x7F. Otherwise, it holds 0. The 1 signifies Unicode characters that require handling beyond that normally provided for 8-bit encoding sets.

23.2.5 #GUID Heap

The "#GUID" header points to a sequence of 128-bit GUIDs. There might be unreachable GUIDs stored in the stream.

23.2.6 #~ Stream

The "#~" stream contains the actual physical representations of the logical metadata tables (see Partition II, section 21). A "#~" stream has the following top-level structure:
The HeapSizes field is a bit vector that encodes how wide indices into the various heaps are. If bit 0 is set, indices into the "#Strings" heap are 4 bytes wide; if bit 1 is set, indices into the "#GUID" heap are 4 bytes wide; bit 2 is not used; if bit 3 is set, indices into the "#Blob" heap are 4 bytes wide. Conversely, if the HeapSizes bit for a particular heap is not set, indices into that heap are 2 bytes wide.
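A reader typically turns HeapSizes into a small lookup of index widths before decoding any table. The following Python sketch does that, taking the bit positions exactly as stated in the paragraph above. The function name `heap_index_widths` and the mask constants are hypothetical; other editions of the standard may assign the "#Blob" flag to a different bit, so the masks are kept as named constants that can be adjusted.

```python
from typing import Dict

# Bit masks as described in the text above (assumption: bit numbering starts at 0).
STRINGS_WIDE = 1 << 0  # indices into "#Strings" are 4 bytes wide
GUID_WIDE = 1 << 1     # indices into "#GUID" are 4 bytes wide
BLOB_WIDE = 1 << 3     # indices into "#Blob" are 4 bytes wide (bit 2 is unused)


def heap_index_widths(heap_sizes: int) -> Dict[str, int]:
    """Map each heap name to the width in bytes of an index into it."""
    return {
        "#Strings": 4 if heap_sizes & STRINGS_WIDE else 2,
        "#GUID": 4 if heap_sizes & GUID_WIDE else 2,
        "#Blob": 4 if heap_sizes & BLOB_WIDE else 2,
    }
```

For instance, a HeapSizes value of 0 yields 2-byte indices for all three heaps.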
The Valid field is a 64-bit-wide bit vector that has a specific bit set for each table that is stored in the stream; the mapping of tables to indices is given at the start of Partition II, section 21. For example, when the DeclSecurity table is present in the logical metadata, bit 0x0e should be set in the Valid vector. It is illegal to include non-existent tables in Valid, so all bits above 0x2b shall be zero.

The Rows array contains the number of rows for each of the tables that are present. When decoding physical metadata to logical metadata, the number of 1's in Valid indicates the number of elements in the Rows array.

A crucial aspect in the encoding of a logical table is its schema. The schema for each table is given in Partition II, section 21. For example, the table with assigned index 0x02 is a TypeDef table, which, according to its specification in Partition II, section 21.34, has the following columns: 4-byte-wide flags, index into the String heap, another index into the String heap, index into the TypeDef or TypeRef table, index into the Field table, index into the MethodDef table.

The physical representation of a table with schema (C0,…,Cn−1) with n rows consists of the concatenation of the physical representation of each of its rows. The physical representation of a row with schema (C0,…,Cn−1) is the concatenation of the physical representation of each of its elements. The physical representation of a row cell e at a column with type C is defined as follows:
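To make the interplay of Valid, Rows, and a table schema concrete, the Python sketch below decodes the row counts and then computes the width of one TypeDef row. Several details are assumptions not stated in the text above: each Rows element is a 4-byte little-endian integer; a simple table index is 2 bytes wide when the target table has fewer than 2^16 rows and 4 bytes otherwise; and the "TypeDef or TypeRef" column is a coded index with 2 tag bits over the TypeDef, TypeRef, and TypeSpec tables. The table numbers and all function names are likewise illustrative.

```python
import struct
from typing import Dict, Iterable, List

# Assumed table indices (see the table list in Partition II, section 21).
TYPEREF, TYPEDEF, FIELD, METHODDEF, TYPESPEC = 0x01, 0x02, 0x04, 0x06, 0x1B


def present_tables(valid: int) -> List[int]:
    """List the table indices whose bit is set in Valid, in ascending order."""
    if valid >> 0x2C:
        raise ValueError("bits above 0x2b shall be zero in Valid")
    return [i for i in range(64) if (valid >> i) & 1]


def read_row_counts(valid: int, rows_bytes: bytes) -> Dict[int, int]:
    """Pair each present table with its row count (assumed 4 bytes per count)."""
    tables = present_tables(valid)
    counts = struct.unpack_from("<%dI" % len(tables), rows_bytes, 0)
    return dict(zip(tables, counts))


def simple_index_width(rows: Dict[int, int], table: int) -> int:
    """Assumed rule: 2 bytes if the table has fewer than 2**16 rows, else 4."""
    return 2 if rows.get(table, 0) < (1 << 16) else 4


def coded_index_width(rows: Dict[int, int], tables: Iterable[int], tag_bits: int) -> int:
    """Assumed rule: the tag bits reduce the range available to a 2-byte index."""
    largest = max(rows.get(t, 0) for t in tables)
    return 2 if largest < (1 << (16 - tag_bits)) else 4


def typedef_row_width(rows: Dict[int, int], string_width: int) -> int:
    """Width in bytes of one TypeDef row, following the column list in the text."""
    return (
        4                                                            # flags
        + string_width                                               # first String heap index
        + string_width                                               # second String heap index
        + coded_index_width(rows, (TYPEDEF, TYPEREF, TYPESPEC), 2)   # TypeDef or TypeRef
        + simple_index_width(rows, FIELD)                            # Field table index
        + simple_index_width(rows, METHODDEF)                        # MethodDef table index
    )
```

With the row widths known, the byte offset of row i (counting from zero) within a table is simply i times the row width, since rows are concatenated without padding.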