RIDs and Tokens

Record identifiers (RIDs) and tokens are unsigned integer values used for indexing the records in metadata tables. RIDs are simple indexes, applicable only to an explicitly specified table, whereas tokens also carry information identifying the metadata tables they reference.

RIDs

A RID is a record identifier, which is simply a one-based row number in the table containing the record. The range of valid RIDs stretches from 1 to the record count of the addressed table, inclusive. RIDs are used in metadata internally only; metadata emission and retrieval APIs do not use RIDs as parameters.

The RID column type codes (0 to 63) serve as zero-based table indexes. Thus the type of the column identifies the referenced table, while the value of the table cell identifies the referenced record. This works fine as long as we know that a particular column always references one particular table and no other. Now, if only we could combine the RID with the table identification.

Tokens

Actually, we can. The combined identification entity, referred to as a token, is used in all metadata APIs and in all IL instructions. A token is a 4-byte unsigned integer whose senior byte carries a zero-based table index (the same as the internal metadata RID type). The remaining 3 bytes are left for the RID.
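The layout just described can be sketched in a few lines of Python. The helper names (make_token, token_table, token_rid) are hypothetical and chosen for illustration only; they are not part of any metadata API.

```python
RID_MASK = 0x00FFFFFF   # the lower 3 bytes hold the RID
TYPE_SHIFT = 24         # the senior byte holds the zero-based table index


def make_token(table_index: int, rid: int) -> int:
    """Combine a table index and a RID into a 4-byte token."""
    if not 0 <= table_index <= 0xFF:
        raise ValueError("table index must fit in one byte")
    if not 0 <= rid <= RID_MASK:
        raise ValueError("RID must fit in three bytes")
    return (table_index << TYPE_SHIFT) | rid


def token_table(token: int) -> int:
    """Extract the table index (token type) from a token."""
    return (token >> TYPE_SHIFT) & 0xFF


def token_rid(token: int) -> int:
    """Extract the RID from a token."""
    return token & RID_MASK


tok = make_token(0x02, 5)   # record 5 of the table with index 2
print(f"{tok:#010x}")       # 0x02000005
```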

There is a significant difference between token types and internal metadata RID types, however: whereas internal RID types cover all metadata tables, the token types are defined for only a limited subset of the tables, as noted in Table 4-2.

Table 4-2  Token Types and Their Referenced Tables

Token Type            Value (RID Type << 24)    Referenced Table
mdtModule             0x00000000                Module
mdtTypeRef            0x01000000                TypeRef
mdtTypeDef            0x02000000                TypeDef
mdtFieldDef           0x04000000                Field
mdtMethodDef          0x06000000                Method
mdtParamDef           0x08000000                Param
mdtInterfaceImpl      0x09000000                InterfaceImpl
mdtMemberRef          0x0A000000                MemberRef
mdtCustomAttribute    0x0C000000                CustomAttribute
mdtPermission         0x0E000000                DeclSecurity
mdtSignature          0x11000000                StandAloneSig
mdtEvent              0x14000000                Event
mdtProperty           0x17000000                Property
mdtModuleRef          0x1A000000                ModuleRef
mdtTypeSpec           0x1B000000                TypeSpec
mdtAssembly           0x20000000                Assembly
mdtAssemblyRef        0x23000000                AssemblyRef
mdtFile               0x26000000                File
mdtExportedType       0x27000000                ExportedType
mdtManifestResource   0x28000000                ManifestResource

The 24 tables that do not have associated token types are not intended to be accessed from “outside,” through metadata APIs or from IL code. These tables (excluding the TypeTyPar and MethodTyPar tables, which are reserved for future use) are of an auxiliary or intermediate nature and should be accessed indirectly only, through the references contained in the “exposed” tables, which have associated token types.

The validity of these tokens can be defined simply: a valid token has a type from Table 4-2 and a valid RID, that is, a RID in the range from 1 to the record count of the table identified by that type.
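As a minimal sketch of this rule, the check below takes the token types from Table 4-2 and a mapping from table index to record count. The function name and the record_counts dictionary are assumptions made for the example; a real metadata reader would obtain the counts from the table stream header.

```python
# Senior bytes of the token types listed in Table 4-2.
TOKEN_TABLES = {
    0x00, 0x01, 0x02, 0x04, 0x06, 0x08, 0x09, 0x0A, 0x0C, 0x0E,
    0x11, 0x14, 0x17, 0x1A, 0x1B, 0x20, 0x23, 0x26, 0x27, 0x28,
}


def is_valid_table_token(token: int, record_counts: dict) -> bool:
    """Return True if the token type is known and the RID is in range."""
    table = (token >> 24) & 0xFF
    rid = token & 0x00FFFFFF
    if table not in TOKEN_TABLES:
        return False
    # Valid RIDs run from 1 to the record count of the table, inclusive.
    return 1 <= rid <= record_counts.get(table, 0)


# Hypothetical module with 3 TypeDef records and 10 Method records.
counts = {0x02: 3, 0x06: 10}
print(is_valid_table_token(0x02000003, counts))  # True
print(is_valid_table_token(0x02000004, counts))  # False: RID past the end
print(is_valid_table_token(0x03000001, counts))  # False: no token type 0x03
```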

An additional token type, quite different from the types listed in Table 4-2, is mdtString (0x70000000). Tokens of this type are used to refer to the user-defined Unicode strings stored in the #US stream.

Both the type component and the RID component of user-defined string tokens differ from those of metadata table tokens. The type component of a user-defined string token (0x70) has nothing to do with column types (the maximal column type is 103 = 0x67), which is not surprising, considering that no column type corresponds to an offset in the #US stream. Because metadata tables never reference the user-defined strings, it’s not necessary to define a column type for the strings. In addition, the RID component of a user-defined string token does not represent a RID because no table is being referenced. Instead, the 3 lower bytes of a user-defined string token hold an offset in the #US stream.

The definition of the validity of a user-defined string token is more complex. The RID component is valid if it is greater than 0 and if the string it defines starts at a 4-byte boundary and is fully contained within the #US stream. The last condition is checked in the following way: The bytes at the offset specified by the RID component of the token are interpreted as the compressed length of the string. (Don’t forget that the #US stream is a blob heap.) If the sum of the offset and the size of compressed length brings us to a 4-byte boundary, and if this sum plus the calculated length are within the #US stream size, everything is fine and the token is valid.
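The containment part of this check might look as follows. This is a sketch only: it assumes the usual compressed unsigned integer encoding of blob lengths (1, 2, or 4 bytes, selected by the leading bits of the first byte), it does not model the alignment condition mentioned above, and the function names are hypothetical.

```python
def read_compressed_length(data: bytes, offset: int):
    """Return (length, size_of_length_field), or None if it cannot be read."""
    if offset >= len(data):
        return None
    first = data[offset]
    if first & 0x80 == 0x00:                 # 0xxxxxxx: 1-byte length
        return first, 1
    if first & 0xC0 == 0x80:                 # 10xxxxxx: 2-byte length
        if offset + 2 > len(data):
            return None
        return ((first & 0x3F) << 8) | data[offset + 1], 2
    if first & 0xE0 == 0xC0:                 # 110xxxxx: 4-byte length
        if offset + 4 > len(data):
            return None
        return (((first & 0x1F) << 24) | (data[offset + 1] << 16)
                | (data[offset + 2] << 8) | data[offset + 3]), 4
    return None                              # not a valid length prefix


def user_string_is_contained(token: int, us_stream: bytes) -> bool:
    """Check that an mdtString token points at a string inside the stream."""
    if (token >> 24) != 0x70:
        return False
    offset = token & 0x00FFFFFF
    if offset == 0:
        return False
    header = read_compressed_length(us_stream, offset)
    if header is None:
        return False
    length, size = header
    return offset + size + length <= len(us_stream)


# Hypothetical stream: empty entry, then "Hi" in UTF-16 plus a trailing byte.
stream = b"\x00" + b"\x05" + "Hi".encode("utf-16-le") + b"\x00"
print(user_string_is_contained(0x70000001, stream))  # True
print(user_string_is_contained(0x70000002, stream))  # False
```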

Coded Tokens

The discussion thus far has focused on the “external” form of tokens. You have every right to suspect that the “internal” form of tokens, used inside the metadata, is different—and it is.

Why can’t the external form also be used as internal? Because the external tokens are huge. Imagine, 4 bytes for each token, when we fight for each measly byte, trying to squeeze the metadata into as small a footprint as possible. (Bandwidth! Don’t forget about the bandwidth!) Compression? Alas, because of the type component occupying the senior byte, external tokens represent very large unsigned integers and thus cannot be efficiently compressed, even though their middle bytes are full of zeros. We need a fresh approach.

The internal encoding of tokens is based on a simple idea: A column must be given a token type only if it might reference several tables. (Columns referencing only one table have a respective RID type.) But any such column certainly does not need to reference all the tables.

So our first task is to identify which group of tables each such column might reference and form a set of such groups. Let’s assign each group a number, which will be a coded token type of the column. Because coded token types occupy a range from 64 to 95, we can define up to 32 groups.

Now, every group contains two or more table types. Let’s enumerate them within the group and see how many bits we will need for this enumeration. This bit count will be a characteristic of the group and hence of the respective coded token type. The number assigned to a table within the group is called a tag.

This tag plays a role roughly equivalent to that of the type component of an external token. But, unwilling to once again create large tokens full of zeros, we will this time put the tag not in the most significant bits of the token but rather in the least significant bits. Then let’s left-shift the RID n bits and add the left-shifted RID to the tag, where n is the bit width of the tag. Now we’ve got a coded token.
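The shift-and-add construction can be illustrated with a short sketch. The function names are hypothetical, and the tag width n is passed in explicitly rather than looked up in a schema.

```python
def encode_coded_token(rid: int, tag: int, tag_bits: int) -> int:
    """Place the tag in the low tag_bits bits and the RID above it."""
    if not 0 <= tag < (1 << tag_bits):
        raise ValueError("tag does not fit in the given bit width")
    return (rid << tag_bits) | tag


def decode_coded_token(coded: int, tag_bits: int):
    """Split a coded token back into (rid, tag)."""
    return coded >> tag_bits, coded & ((1 << tag_bits) - 1)


# RID 5 with tag 1 in a group whose tag is 2 bits wide.
coded = encode_coded_token(5, 1, 2)
print(hex(coded))                     # 0x15
print(decode_coded_token(coded, 2))   # (5, 1)
```

The result is a small integer that fits comfortably in 2 bytes, whereas an external token for the same record would carry its type in the senior byte of a 4-byte value.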

What about the coded token size? We know which metadata tables form each group, and we know the record count of each table, so we know the maximal possible RID within the group. Say, for example, that we would need m bits to encode the maximal RID. If we can fit the maximal RID (m bits) and the tag (n bits) into a 2-byte unsigned integer (16 bits), we win, and the coded token size for this group will be 2 bytes. If we can’t, we are out of luck and will have to use 4-byte coded tokens for this group. No, we won’t even consider 3 bytes—it’s unbecoming.

To summarize, a coded token type has the following attributes:

  • Number of referenced tables (part of the schema)

  • Array of referenced table IDs (part of the schema)

  • Tag bit width (part of the schema, derived from the number of referenced tables)

  • Coded token size, either 2 or 4 bytes (computed at the metadata opening time from the tag width and the maximal record count among the referenced tables)
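These attributes can be modeled with a small data structure. The sketch below is an illustration under stated assumptions: the class name, the field names, and the record_counts mapping are invented for the example, and the tag width is derived as the number of bits needed to enumerate the referenced tables.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class CodedTokenType:
    name: str
    tables: Tuple[int, ...]  # referenced table IDs; the position is the tag

    @property
    def tag_bits(self) -> int:
        # Bits needed to number len(tables) entries: 2 -> 1, 3..4 -> 2, 5..8 -> 3
        return max(1, (len(self.tables) - 1).bit_length())

    def size(self, record_counts: Dict[int, int]) -> int:
        """Coded token size in bytes, given the record count of each table."""
        max_rid = max(record_counts.get(t, 0) for t in self.tables)
        rid_bits = max_rid.bit_length()
        # Two bytes suffice only if the RID bits and the tag bits fit in 16 bits.
        return 2 if rid_bits + self.tag_bits <= 16 else 4


# A group of three tables (IDs 0x02, 0x01, 0x1B), so the tag is 2 bits wide.
group = CodedTokenType("TypeDefOrRef", (0x02, 0x01, 0x1B))
print(group.tag_bits)                                   # 2
print(group.size({0x02: 100, 0x01: 16383, 0x1B: 5}))    # 2
print(group.size({0x02: 100, 0x01: 16384, 0x1B: 5}))    # 4
```

With a 2-bit tag, 14 bits remain for the RID in a 2-byte coded token, so the size switches to 4 bytes as soon as any referenced table holds more than 16383 records.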

Table 4-3 lists the twelve coded token types defined in the metadata schema of the first release of the common language runtime.

Table 4-3  Coded Token Types

Coded Token Type                                           Tag

TypeDefOrRef (64): 3 referenced tables, tag size 2
    TypeDef                                                0
    TypeRef                                                1
    TypeSpec                                               2

HasConstant (65): 3 referenced tables, tag size 2
    Field                                                  0
    Param                                                  1
    Property                                               2

HasCustomAttribute (66): 19 referenced tables, tag size 5
    Method                                                 0
    Field                                                  1
    TypeRef                                                2
    TypeDef                                                3
    Param                                                  4
    InterfaceImpl                                          5
    MemberRef                                              6
    Module                                                 7
    DeclSecurity                                           8
    Property                                               9
    Event                                                  10
    StandAloneSig                                          11
    ModuleRef                                              12
    TypeSpec                                               13
    Assembly                                               14
    AssemblyRef                                            15
    File                                                   16
    ExportedType                                           17
    ManifestResource                                       18

HasFieldMarshal (67): 2 referenced tables, tag size 1
    Field                                                  0
    Param                                                  1

HasDeclSecurity (68): 3 referenced tables, tag size 2
    TypeDef                                                0
    Method                                                 1
    Assembly                                               2

MemberRefParent (69): 5 referenced tables, tag size 3
    TypeDef                                                0
    TypeRef                                                1
    ModuleRef                                              2
    Method                                                 3
    TypeSpec                                               4

HasSemantics (70): 2 referenced tables, tag size 1
    Event                                                  0
    Property                                               1

MethodDefOrRef (71): 2 referenced tables, tag size 1
    Method                                                 0
    MemberRef                                              1

MemberForwarded (72): 2 referenced tables, tag size 1
    Field                                                  0
    Method                                                 1

Implementation (73): 3 referenced tables, tag size 2
    File                                                   0
    AssemblyRef                                            1
    ExportedType                                           2

CustomAttributeType (74): 5 referenced tables, tag size 3
    TypeRef                                                0
    TypeDef                                                1
    Method                                                 2
    MemberRef                                              3
    String                                                 4

ResolutionScope (75): 4 referenced tables, tag size 2
    Module                                                 0
    ModuleRef                                              1
    AssemblyRef                                            2
    TypeRef                                                3

Note: The coded token type range (64 to 95) provides room to add another twenty types in the future, should it ever become necessary.

Coded tokens are part of metadata’s internal affairs. The ILAsm compiler, like all other compilers, never deals with coded tokens. Compilers and other tools read and emit metadata through the metadata import and emission APIs, either directly or through managed wrappers provided in the .NET Framework class library—System.Reflection for metadata import and System.Reflection.Emit for metadata emission. The metadata APIs automatically convert standard 4-byte tokens to and from coded tokens. IL code also uses only standard 4-byte tokens.
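To give a feel for the kind of conversion involved, here is a minimal sketch for a single group, TypeDefOrRef from Table 4-3. It is not the runtime's actual code; the function names are hypothetical and the group is hard-coded for brevity.

```python
# TypeDefOrRef: tags 0, 1, 2 map to the TypeDef, TypeRef, and TypeSpec tables.
TYPE_DEF_OR_REF = (0x02, 0x01, 0x1B)
TAG_BITS = 2


def token_to_coded(token: int) -> int:
    """Convert a standard 4-byte token to a TypeDefOrRef coded token."""
    table = (token >> 24) & 0xFF
    rid = token & 0x00FFFFFF
    if table not in TYPE_DEF_OR_REF:
        raise ValueError("table is not a member of this group")
    tag = TYPE_DEF_OR_REF.index(table)
    return (rid << TAG_BITS) | tag


def coded_to_token(coded: int) -> int:
    """Convert a TypeDefOrRef coded token back to a standard token."""
    tag = coded & ((1 << TAG_BITS) - 1)
    rid = coded >> TAG_BITS
    if tag >= len(TYPE_DEF_OR_REF):
        raise ValueError("tag value is not used by this group")
    return (TYPE_DEF_OR_REF[tag] << 24) | rid


coded = token_to_coded(0x1B000003)       # TypeSpec, RID 3
print(coded)                             # 14
print(f"{coded_to_token(coded):#010x}")  # 0x1b000003
```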

Nonetheless, the preceding definitions are useful to us for two reasons. First, we will need them when we discuss individual metadata tables in later chapters. Second, these definitions provide a good hint about the nature of relationships between the metadata tables.



Inside Microsoft .NET IL Assembler
ISBN: 0735615470
EAN: 2147483647
Year: 2005
Pages: 147
Authors: SERGE LIDIN
