23 Metadata Physical Layout


The physical on-disk representation of metadata is a direct reflection of the logical representation described in Partition II, sections 21 and 22. That is, data is stored in streams representing the metadata tables and heaps. The main complication is that, whereas the logical representation abstracts away the number of bytes needed for indexing into tables and columns, the physical representation has to handle this explicitly by defining how logical metadata heaps and tables map to their physical representations.

Unless stated otherwise, all binary values are stored in little-endian format.

23.1 Fixed Fields

Complete CLI components (metadata and CIL instructions) are stored in a subset of the current Portable Executable (PE) file format (see Partition II, section 24). Because of this heritage, some of the fields in the physical representation of metadata have fixed values. When writing these fields, they shall be set to the value indicated; on reading they may be ignored.

23.2 File Headers

23.2.1 Metadata Root

The root of the physical metadata starts with a magic signature, several bytes of version and other miscellaneous information, followed by a count and an array of stream headers, one for each stream that is present. The actual encoded tables and heaps are stored in the streams, which immediately follow this array of headers.

| Offset | Size | Field | Description |
|---|---|---|---|
| 0 | 4 | Signature | Magic signature for physical metadata: 0x424A5342 |
| 4 | 2 | MajorVersion | Major version, 1 (ignore on read) |
| 6 | 2 | MinorVersion | Minor version, 1 (ignore on read) |
| 8 | 4 | Reserved | Reserved, always 0 (see Partition II, section 21) |
| 12 | 4 | Length | Length of version string in bytes, say m (<= 255), rounded up to a multiple of four |
| 16 | m | Version | UTF8-encoded version string of length m (see below) |
| 16+m | | | Padding to next 4-byte boundary, say x |
| x | 2 | Flags | Reserved, always 0 |
| x+2 | 2 | Streams | Number of streams, say n |
| x+4 | | StreamHeaders | Array of n StreamHdr structures |

The Version string shall be "Standard CLI 2002" for any file that is intended to be executed on any conforming implementation of the CLI, and all conforming implementations of the CLI shall accept files that use this version string. Other strings shall be used when the file is restricted to a vendor-specific implementation of the CLI. Future versions of this standard shall specify different strings, but they shall begin "Standard CLI". Other standards that specify additional functionality shall specify their own specific version strings beginning with "Standard". Vendors that provide implementation-specific extensions shall provide a version string that does not begin with "Standard".
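As an illustration of the root layout, the following is a minimal Python sketch that reads the fixed fields and the stream count. The function name `parse_metadata_root` and the returned dictionary keys are hypothetical. The sketch also assumes that the Length field already holds the padded byte count of the Version field, so that Flags follows immediately after it.

```python
import struct

METADATA_SIGNATURE = 0x424A5342  # magic signature from the metadata root layout


def parse_metadata_root(data: bytes):
    """Return (fields, offset of the StreamHeaders array) for a metadata root."""
    # "<" selects little-endian with no implicit padding: 4 + 2 + 2 + 4 + 4 bytes.
    signature, major, minor, _reserved, length = struct.unpack_from("<IHHII", data, 0)
    if signature != METADATA_SIGNATURE:
        raise ValueError("bad metadata signature: 0x%08X" % signature)
    # Assumption: `length` is the version string length rounded up to a
    # multiple of four, so the string is followed by \0 padding bytes.
    version = data[16:16 + length].rstrip(b"\x00").decode("utf-8")
    flags, stream_count = struct.unpack_from("<HH", data, 16 + length)
    fields = {
        "major": major,      # may be ignored on read
        "minor": minor,      # may be ignored on read
        "version": version,
        "flags": flags,
        "streams": stream_count,
    }
    return fields, 16 + length + 4
```

The returned offset is where the array of stream headers begins, relative to the start of the metadata root.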

23.2.2 Stream Header

A stream header gives the name, position, and length of a particular table or heap. Note that the length of a stream header structure is not fixed, but depends on the length of its name field (a variable-length, null-terminated string).

| Offset | Size | Field | Description |
|---|---|---|---|
| 0 | 4 | Offset | Memory offset to start of this stream from start of the metadata root (see Partition II, section 23.2.1) |
| 4 | 4 | Size | Size of this stream in bytes; shall be a multiple of 4 |
| 8 | | Name | Name of the stream as null-terminated, variable-length array of ASCII characters, padded to the next 4-byte boundary with \0 characters |

Both logical tables and heaps are stored in streams. There are five possible kinds of streams: a stream header with name "#Strings" that points to the physical representation of the String heap where identifier strings are stored; a stream header with name "#US" that points to the physical representation of the Userstring heap; a stream header with name "#Blob" that points to the physical representation of the Blob heap; a stream header with name "#GUID" that points to the physical representation of the Guid heap; and a stream header with name "#~" that points to the physical representation of a set of tables (see Partition II, section 22).

ANNOTATION

Implementation-Specific (Microsoft): Some compilers store metadata in a "#-" stream, which holds an uncompressed, or non-optimized, representation of metadata tables; this includes extra metadata "pointer" tables. Such PE files do not form part of this International Standard.


Each kind of stream may occur at most once; that is, a metadata file may not contain two "#US" streams, or five "#Blob" streams. Streams need not be there if they are empty.
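Because the Name field is variable-length, stream headers have to be walked one at a time. The sketch below shows one way to do that in Python; `parse_stream_headers` is a hypothetical helper, and it assumes the headers start at a 4-byte-aligned offset so that padding can be computed from the name length alone.

```python
import struct


def parse_stream_headers(data: bytes, offset: int, count: int):
    """Read `count` stream headers starting at `offset`.

    Returns (list of (name, stream_offset, size), offset just past the array).
    """
    headers = []
    seen = set()
    for _ in range(count):
        stream_offset, size = struct.unpack_from("<II", data, offset)
        name_start = offset + 8
        name_end = data.index(b"\x00", name_start)  # position of the terminator
        name = data[name_start:name_end].decode("ascii")
        if name in seen:
            # Each kind of stream may occur at most once.
            raise ValueError("duplicate stream: " + name)
        seen.add(name)
        # Name plus terminator, rounded up to a multiple of four bytes.
        padded_len = (name_end - name_start + 1 + 3) & ~3
        headers.append((name, stream_offset, size))
        offset = name_start + padded_len
    return headers, offset
```

Each `stream_offset` is relative to the start of the metadata root, not to the start of the file.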

The next sections will describe the structure of each kind of stream in more detail.

23.2.3 #Strings Heap

The stream of bytes pointed to by a "#Strings" header is the physical representation of the logical string heap. The physical heap may contain garbage; that is, it may contain parts that are unreachable from any of the tables, but parts that are reachable from a table shall contain a valid null-terminated UTF8 string. When the #Strings heap is present, the first entry is always the empty string (i.e., \0).
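Reading an entry from this heap amounts to scanning forward to the next \0 byte. A minimal Python sketch, with `read_string` as a hypothetical helper name and `index` taken to be a byte offset into the heap:

```python
def read_string(heap: bytes, index: int) -> str:
    """Return the null-terminated UTF-8 string that starts at byte `index`."""
    end = heap.index(b"\x00", index)  # raises ValueError if no terminator follows
    return heap[index:end].decode("utf-8")
```

With this reading, index 0 always yields the empty string, since the heap starts with a single \0 byte.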

23.2.4 #US and #Blob Heaps

The stream of bytes pointed to by a "#US" or "#Blob" header is the physical representation of the logical Userstring or Blob heap, respectively. Both these heaps may contain garbage, as long as any part that is reachable from any of the tables contains a valid "blob." Individual blobs are stored with their length encoded in the first few bytes. In the bit patterns below, bs denotes the remaining low-order bits of the first byte:

  • If the first byte of the "blob" is 0bs (top bit 0, followed by the 7 bits bs), then the rest of the "blob" contains the (bs) bytes of actual data.

  • If the first 2 bytes of the "blob" are 10bs (top bits 10, followed by the 6 bits bs) and x, then the rest of the "blob" contains the ((bs << 8) + x) bytes of actual data.

  • If the first 4 bytes of the "blob" are 110bs (top bits 110, followed by the 5 bits bs), x, y, and z, then the rest of the "blob" contains the ((bs << 24) + (x << 16) + (y << 8) + z) bytes of actual data.

The first entry in both these heaps is the empty "blob" that consists of the single byte 0x00.
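The three length forms can be decoded by inspecting the top bits of the first byte. The following Python sketch illustrates this; `decode_blob_length` and `read_blob` are hypothetical names, and `offset` is taken to be a byte offset into the heap.

```python
def decode_blob_length(heap: bytes, offset: int):
    """Return (data length, number of length-prefix bytes) for the blob at `offset`."""
    first = heap[offset]
    if (first & 0x80) == 0x00:  # 0bs: 7-bit length in one byte
        return first, 1
    if (first & 0xC0) == 0x80:  # 10bs x: 14-bit length in two bytes
        return ((first & 0x3F) << 8) | heap[offset + 1], 2
    if (first & 0xE0) == 0xC0:  # 110bs x y z: 29-bit length in four bytes
        x, y, z = heap[offset + 1], heap[offset + 2], heap[offset + 3]
        return ((first & 0x1F) << 24) | (x << 16) | (y << 8) | z, 4
    raise ValueError("invalid blob length prefix: 0x%02X" % first)


def read_blob(heap: bytes, offset: int) -> bytes:
    """Return the data bytes of the blob that starts at `offset`."""
    length, prefix = decode_blob_length(heap, offset)
    start = offset + prefix
    return heap[start:start + length]
```

Reading at offset 0 returns an empty byte string, matching the empty "blob" that opens both heaps.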

Strings in the #US (Userstring) heap are encoded using 16-bit Unicode encodings. The count on each string is the number of bytes (not characters) in the string. Furthermore, there is an additional terminal byte (so all byte counts are odd, not even). This final byte holds the value 1 if and only if any UTF16 character within the string has any bit set in its top byte, or its low byte is any of the following: 0x01–0x08, 0x0E–0x1F, 0x27, 0x2D, 0x7F. Otherwise, it holds 0. The 1 signifies Unicode characters that require handling beyond that normally provided for 8-bit encoding sets.
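The rule for the terminal byte can be sketched in Python as follows. The helper name `user_string_payload` is hypothetical, the listed low-byte values are read as the ranges 0x01 to 0x08 and 0x0E to 0x1F plus the three single values, and the length prefix that precedes the payload in the heap is deliberately left out.

```python
# Low-byte values that force the terminal byte to 1 (ranges are an
# interpretation of the list in the text above).
SPECIAL_LOW_BYTES = set(range(0x01, 0x09)) | set(range(0x0E, 0x20)) | {0x27, 0x2D, 0x7F}


def user_string_payload(text: str) -> bytes:
    """Return the UTF-16 (little-endian) bytes of `text` plus the terminal byte."""
    encoded = text.encode("utf-16-le")
    flag = 0
    # Walk the 16-bit code units: low byte first, then top byte.
    for i in range(0, len(encoded), 2):
        low, top = encoded[i], encoded[i + 1]
        if top != 0 or low in SPECIAL_LOW_BYTES:
            flag = 1
            break
    return encoded + bytes([flag])
```

The payload length is always odd, which is the byte count that the blob-style length prefix would then record.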

23.2.5 #GUID Heap

The "#GUID" header points to a sequence of 128-bit GUIDs. There might be unreachable GUIDs stored in the stream.
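Since every entry has the same size, the stream can be split into 16-byte chunks. A minimal Python sketch, with `split_guid_heap` as a hypothetical helper that returns the raw bytes of each GUID without interpreting their internal byte order:

```python
GUID_SIZE = 16  # 128 bits


def split_guid_heap(heap: bytes):
    """Split a #GUID stream into a list of raw 16-byte GUID values."""
    if len(heap) % GUID_SIZE != 0:
        raise ValueError("#GUID stream size is not a multiple of 16 bytes")
    return [heap[i:i + GUID_SIZE] for i in range(0, len(heap), GUID_SIZE)]
```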

23.2.6 #~ Stream

The "#~" streams contain the actual physical representations of the logical metadata tables (see Partition II, section 21). A "#~" stream has the following top-level structure:

| Offset | Size | Field | Description |
|---|---|---|---|
| 0 | 4 | Reserved | Reserved, always 0 |
| 4 | 1 | MajorVersion | Major version of table schemata, always 1 |
| 5 | 1 | MinorVersion | Minor version of table schemata, always 0 |
| 6 | 1 | HeapSizes | Bit vector for heap sizes |
| 7 | 1 | Reserved | Reserved, always 1 |
| 8 | 8 | Valid | Bit vector of present tables, let n be the number of bits that are 1 |
| 16 | 8 | Sorted | Bit vector of sorted tables |
| 24 | 4*n | Rows | Array of n 4-byte unsigned integers indicating the number of rows for each present table |
| 24+4*n | | Tables | The sequence of physical tables |

The HeapSizes field is a bit vector that encodes how wide indices into the various heaps are. If bit 0 is set, indices into the "#Strings" heap are 4 bytes wide; if bit 1 is set, indices into the "#GUID" heap are 4 bytes wide; if bit 2 is set, indices into the "#Blob" heap are 4 bytes wide. Conversely, if the HeapSizes bit for a particular heap is not set, indices into that heap are 2 bytes wide.

| Bit mask | Description |
|---|---|
| 0x01 | Size of "#Strings" stream >= 2^16. |
| 0x02 | Size of "#GUID" stream >= 2^16. |
| 0x04 | Size of "#Blob" stream >= 2^16. |
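Using the mask values 0x01, 0x02, and 0x04 from the table above, the index width for each heap can be derived as in this Python sketch. The name `heap_index_widths` and the dictionary keys are hypothetical.

```python
# Mask values taken from the table above; the keys are illustrative labels.
HEAP_SIZE_MASKS = {"strings": 0x01, "guid": 0x02, "blob": 0x04}


def heap_index_widths(heap_sizes: int) -> dict:
    """Map each heap to the byte width (2 or 4) of indices into it."""
    return {
        heap: 4 if heap_sizes & mask else 2
        for heap, mask in HEAP_SIZE_MASKS.items()
    }
```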

The Valid field is a 64-bit-wide bit vector that has a specific bit set for each table that is stored in the stream; the mapping of tables to indices is given at the start of Partition II, section 21. For example, when the DeclSecurity table is present in the logical metadata, bit 0x0e should be set in the Valid vector. It is illegal to include non-existent tables in Valid, so all bits above 0x2b shall be zero.

The Rows array contains the number of rows for each of the tables that are present. When decoding physical metadata to logical metadata, the number of 1's in Valid indicates the number of elements in the Rows array.
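To make the relationship between Valid and Rows concrete, here is a Python sketch that reads the fixed 24-byte header of a "#~" stream and pairs each set bit in Valid with its row count. The function name `parse_tables_header` and the dictionary keys are hypothetical; the sketch assumes the Rows entries appear in ascending order of table index.

```python
import struct


def parse_tables_header(stream: bytes):
    """Return (header fields, offset of the first physical table)."""
    # 4 + 1 + 1 + 1 + 1 + 8 + 8 = 24 bytes, little-endian, no padding.
    (_reserved, major, minor, heap_sizes, _reserved2,
     valid, sorted_mask) = struct.unpack_from("<IBBBBQQ", stream, 0)
    # Table indices whose bit is set in Valid, lowest index first.
    present = [i for i in range(64) if (valid >> i) & 1]
    counts = struct.unpack_from("<%dI" % len(present), stream, 24)
    header = {
        "major": major,
        "minor": minor,
        "heap_sizes": heap_sizes,
        "valid": valid,
        "sorted": sorted_mask,
        "rows": dict(zip(present, counts)),  # table index -> row count
    }
    return header, 24 + 4 * len(present)
```

For a stream in which only tables 0x00, 0x02, and 0x0e are present, `header["rows"]` has exactly three entries and the tables begin at offset 36.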

A crucial aspect in the encoding of a logical table is its schema. The schema for each table is given in Partition II, section 21. For example, the table with assigned index 0x02 is a TypeDef table, which, according to its specification in Partition II, section 21.34, has the following columns: 4-byte-wide flags, index into the String heap, another index into the String heap, index into the TypeDef or TypeRef table, index into the Field table, index into the MethodDef table.

The physical representation of a table with schema (C_0, …, C_{n-1}) and m rows consists of the concatenation of the physical representation of each of its rows. The physical representation of a row with schema (C_0, …, C_{n-1}) is the concatenation of the physical representation of each of its elements. The physical representation of a row cell e at a column with type C is defined as follows:

  • If e is a constant, it is stored using the number of bytes as specified for its column type C (e.g., a 2-byte bitmask of type PropertyAttributes).

  • If e is an index into the Guid heap, Blob heap, or String heap, it is stored using the number of bytes as defined in the HeapSizes field.

  • If e is a simple index into a table with index i, it is stored using 2 bytes if table i has fewer than 2^16 rows, otherwise it is stored using 4 bytes.

  • If e is a coded index (see Partition II, section 21) that points into table t_i out of n possible tables t_0, …, t_{n-1}, then it is stored as e << (log n) | tag{t_0, …, t_{n-1}}[t_i], where log n is the number of tag bits (the base-2 logarithm of n, rounded up). It uses 2 bytes if the maximum number of rows of tables t_0, …, t_{n-1} is less than 2^(16 - (log n)), and 4 bytes otherwise. The family of finite maps tag{t_0, …, t_{n-1}} is defined below. Note that decoding a physical row requires the inverse of this mapping. (For example, the Parent column of the Constant table indexes a row in the Field, Param, or Property tables. The actual table is encoded into the low 2 bits of the number, using the values: 0 => Field, 1 => Param, 2 => Property. The remaining bits hold the actual row number being indexed. For example, a value of 0x321 indexes row number 0xC8 in the Param table.)
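The coded-index example from the Constant table can be reproduced with a few lines of Python. This is a sketch only: the function names are hypothetical, and the tag assignment (0 => Field, 1 => Param, 2 => Property, in 2 bits) is the one quoted in the text above.

```python
HAS_CONSTANT_TABLES = ("Field", "Param", "Property")  # tags 0, 1, 2
HAS_CONSTANT_TAG_BITS = 2


def encode_coded_index(row, table, tables=HAS_CONSTANT_TABLES,
                       tag_bits=HAS_CONSTANT_TAG_BITS):
    """Pack a row number and its target table into one coded index value."""
    return (row << tag_bits) | tables.index(table)


def decode_coded_index(value, tables=HAS_CONSTANT_TABLES,
                       tag_bits=HAS_CONSTANT_TAG_BITS):
    """Inverse mapping: return (table name, row number)."""
    tag = value & ((1 << tag_bits) - 1)
    if tag >= len(tables):
        raise ValueError("unused tag value: %d" % tag)
    return tables[tag], value >> tag_bits


def simple_index_width(row_count):
    """Byte width of a simple index into a table with `row_count` rows."""
    return 2 if row_count < 2 ** 16 else 4
```

Decoding 0x321 gives tag 1 (Param) and row 0x321 >> 2 = 0xC8, which agrees with the example in the text.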

TypeDefOrRef: 2 bits to encode tag

| Table | Tag |
|---|---|
| TypeDef | 0 |
| TypeRef | 1 |
| TypeSpec | 2 |

HasConstant: 2 bits to encode tag

| Table | Tag |
|---|---|
| FieldDef | 0 |
| ParamDef | 1 |
| Property | 2 |

HasCustomAttribute: 5 bits to encode tag

| Table | Tag |
|---|---|
| MethodDef | 0 |
| FieldDef | 1 |
| TypeRef | 2 |
| TypeDef | 3 |
| ParamDef | 4 |
| InterfaceImpl | 5 |
| MemberRef | 6 |
| Module | 7 |
| Permission | 8 |
| Property | 9 |
| Event | 10 |
| StandAloneSig | 11 |
| ModuleRef | 12 |
| TypeSpec | 13 |
| Assembly | 14 |
| AssemblyRef | 15 |
| File | 16 |
| ExportedType | 17 |
| ManifestResource | 18 |

HasFieldMarshall: 1 bit to encode tag

| Table | Tag |
|---|---|
| FieldDef | 0 |
| ParamDef | 1 |

HasDeclSecurity: 2 bits to encode tag

| Table | Tag |
|---|---|
| TypeDef | 0 |
| MethodDef | 1 |
| Assembly | 2 |

MemberRefParent: 3 bits to encode tag

| Table | Tag |
|---|---|
| Not used | 0 |
| TypeRef | 1 |
| ModuleRef | 2 |
| MethodDef | 3 |
| TypeSpec | 4 |

HasSemantics: 1 bit to encode tag

| Table | Tag |
|---|---|
| Event | 0 |
| Property | 1 |

MethodDefOrRef: 1 bit to encode tag

| Table | Tag |
|---|---|
| MethodDef | 0 |
| MemberRef | 1 |

MemberForwarded: 1 bit to encode tag

| Table | Tag |
|---|---|
| FieldDef | 0 |
| MethodDef | 1 |

Implementation: 2 bits to encode tag

| Table | Tag |
|---|---|
| File | 0 |
| AssemblyRef | 1 |
| ExportedType | |

CustomAttributeType: 3 bits to encode tag

| Table | Tag |
|---|---|
| Not used | 0 |
| Not used | 1 |
| MethodDef | 2 |
| MemberRef | 3 |
| Not used | 4 |

ResolutionScope: 2 bits to encode tag

| Table | Tag |
|---|---|
| Module | 0 |
| ModuleRef | 1 |
| AssemblyRef | 2 |
| TypeRef | 3 |
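Putting the width rules together, the size of a physical row follows from the row counts and the heap index widths. The Python sketch below computes the width of a coded index and then the row size of the TypeDef table, whose columns are listed earlier in this section. All function names are hypothetical, row counts are keyed by table name purely for illustration, and only three of the coded index kinds are included.

```python
# (tag bits, tables that can be referenced), taken from the tag tables above.
CODED_INDEXES = {
    "TypeDefOrRef": (2, ("TypeDef", "TypeRef", "TypeSpec")),
    "HasConstant": (2, ("FieldDef", "ParamDef", "Property")),
    "ResolutionScope": (2, ("Module", "ModuleRef", "AssemblyRef", "TypeRef")),
}


def simple_index_width(table, row_counts):
    """2 bytes if the table has fewer than 2^16 rows, otherwise 4."""
    return 2 if row_counts.get(table, 0) < 2 ** 16 else 4


def coded_index_width(kind, row_counts):
    """2 bytes if every candidate table fits in 16 - tag_bits bits, otherwise 4."""
    tag_bits, tables = CODED_INDEXES[kind]
    largest = max(row_counts.get(t, 0) for t in tables)
    return 2 if largest < 2 ** (16 - tag_bits) else 4


def typedef_row_size(row_counts, string_index_width):
    """Bytes per TypeDef row: flags, two String indices, one coded, two simple."""
    return (
        4                                              # 4-byte-wide flags
        + 2 * string_index_width                       # two String heap indices
        + coded_index_width("TypeDefOrRef", row_counts)
        + simple_index_width("Field", row_counts)      # index into Field table
        + simple_index_width("MethodDef", row_counts)  # index into MethodDef table
    )
```

A table absent from `row_counts` is treated as having zero rows, which matches the rule that tables not flagged in Valid are not stored.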



The Common Language Infrastructure Annotated Standard (Microsoft. NET Development Series)
Year: 2002
Pages: 121
