22.1 Bitmasks and Flags
This section explains the flags and bitmasks used in the metadata tables.
22.1.1 Values for AssemblyHashAlgorithm
22.1.2 Values for AssemblyFlags
In portable programs, the Retargetable (0x100) bit shall be set on all references to assemblies specified in this standard.
22.1.3 Values for Culture
Note on RFC 1766 locale names: a typical string would be "en-US". The first part ("en" in the example) uses ISO 639 characters ("Latin-alphabet characters in lowercase. No diacritical marks or modified characters are used"). The second part ("US" in the example) uses ISO 3166 characters (similar to ISO 639, but uppercase). In other words, the familiar ASCII characters a-z and A-Z, respectively. However, while RFC 1766 recommends that the first part be lowercase and the second part uppercase, it allows mixed case. Therefore, the validation rule checks only that Culture is one of the strings in the list above, but the check is totally case-blind, where case-blind is the familiar fold on values less than U+0080.
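As an illustration only, the case-blind check described above can be sketched in Python. The function names and the three-entry culture list are hypothetical stand-ins; the sketch assumes that "the familiar fold" means mapping A-Z onto a-z and leaving every character at or above U+0080 untouched.

```python
def ascii_fold(text: str) -> str:
    """Lowercase only the ASCII letters A-Z; leave every other character alone."""
    return "".join(
        chr(ord(ch) + 0x20) if "A" <= ch <= "Z" else ch
        for ch in text
    )


def is_valid_culture(candidate: str, allowed: list) -> bool:
    """Case-blind membership test against the permitted culture strings."""
    folded_allowed = {ascii_fold(name) for name in allowed}
    return ascii_fold(candidate) in folded_allowed


# Hypothetical subset of the permitted culture strings, for demonstration.
SAMPLE_CULTURES = ["en-US", "fr-FR", "ja-JP"]

print(is_valid_culture("EN-us", SAMPLE_CULTURES))  # True
print(is_valid_culture("en-GB", SAMPLE_CULTURES))  # False
```

A hand-written fold is used here rather than str.lower() because the latter also changes characters outside the ASCII range, which the rule above does not call for.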
22.1.4 Flags for Events (EventAttributes)
22.1.5 Flags for Fields (FieldAttributes)
22.1.6 Flags for Files (FileAttributes)
22.1.7 Flags for ImplMap (PInvokeAttributes)
22.1.8 Flags for ManifestResource (ManifestResourceAttributes)
22.1.9 Flags for Methods (MethodAttributes)
22.1.10 Flags for Methods (MethodImplAttributes)
22.1.11 Flags for MethodSemantics (MethodSemanticsAttributes)
22.1.12 Flags for Params (ParamAttributes)
22.1.13 Flags for Properties (PropertyAttributes)
22.1.14 Flags for Types (TypeAttributes)
22.1.15 Element Types Used in Signatures
The following table lists the values for ELEMENT_TYPE constants. These are used extensively in metadata signature blobs (see Partition II, section 22.2).
22.2 Blobs and Signatures
The word "signature" is conventionally used to describe the type info for a function or method; that is, the type of each of its parameters, and the type of its return value. Within metadata, the word "signature" is also used to describe the type info for fields, properties, and local variables. Each Signature is stored as a (counted) byte array in the Blob heap. There are six kinds of Signatures, as follows:
The value of the leading byte of a Signature "blob" indicates what kind of Signature it is. This section defines the binary "blob" format for each kind of Signature. Note that Signatures are compressed before being stored into the Blob heap (described below) by compressing the integers embedded in the signature. The maximum encodable integer is 29 bits long, 0x1FFFFFFF. The compression algorithm used is as follows (bit 0 is the least significant bit):
NOTE The table below shows several examples. The first column gives a value, expressed in familiar (C-like) hex notation. The second column shows the corresponding, compressed result, as it would appear in a PE file, with successive bytes of the result lying at successively higher byte offsets within the file. (This is the opposite order from how regular binary integers are laid out in a PE file.)
Thus, the most significant bits (the first ones encountered in a PE file) of a "compressed" field can reveal whether it occupies 1, 2, or 4 bytes, as well as its value. For this to work, the "compressed" value, as explained above, is stored in big-endian order, with the most significant byte at the smallest offset within the file.
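As an illustration only, the compression scheme can be sketched in Python. The function names are hypothetical. The sketch assumes the usual cut-offs for the three forms: values up to 0x7F take one byte (top bit 0), values up to 0x3FFF take two bytes (top bits 10), and values up to 0x1FFFFFFF take four bytes (top bits 110).

```python
from typing import Tuple


def compress_uint(value: int) -> bytes:
    """Encode an unsigned integer in the 1-, 2-, or 4-byte compressed form."""
    if not 0 <= value <= 0x1FFFFFFF:
        raise ValueError("value does not fit in 29 bits")
    if value <= 0x7F:
        return bytes([value])                          # 0xxxxxxx
    if value <= 0x3FFF:
        return (0x8000 | value).to_bytes(2, "big")     # 10xxxxxx xxxxxxxx
    return (0xC0000000 | value).to_bytes(4, "big")     # 110xxxxx + 3 bytes


def decompress_uint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one compressed integer; return (value, number of bytes consumed)."""
    first = data[offset]
    if first & 0x80 == 0x00:
        return first, 1
    if first & 0xC0 == 0x80:
        raw = int.from_bytes(data[offset:offset + 2], "big")
        return raw & 0x3FFF, 2
    if first & 0xE0 == 0xC0:
        raw = int.from_bytes(data[offset:offset + 4], "big")
        return raw & 0x1FFFFFFF, 4
    raise ValueError("leading byte is not a valid compressed-integer prefix")


for sample in (0x03, 0x7F, 0x80, 0x2E57, 0x3FFF, 0x4000, 0x1FFFFFFF):
    encoded = compress_uint(sample)
    assert decompress_uint(encoded) == (sample, len(encoded))
    print(hex(sample), "->", encoded.hex())
```

Because the bytes are emitted most significant first, a reader needs only the leading byte to know how many bytes to consume.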
Signatures make extensive use of constant values called ELEMENT_TYPE_xxx (see Partition II, section 22.1.15). In particular, signatures include two modifiers:
ELEMENT_TYPE_BYREF: this element is a managed pointer [see the annotation to Partition I, section 8.9.2]. This modifier can only occur in the definition of Param (Partition II, section 22.2.10) or RetType (Partition II, section 22.2.11). It shall not occur within the definition of a Field (Partition II, section 22.2.4).
ELEMENT_TYPE_PTR: this element is an unmanaged pointer (see Partition I [section 8.9.2]). This modifier can occur in the definition of Param (Partition II, section 22.2.10) or RetType (Partition II, section 22.2.11) or Field (Partition II, section 22.2.4).
22.2.1 MethodDefSig
A MethodDefSig is indexed by the Method.Signature column. It captures the signature of a method or global function. The syntax chart for a MethodDefSig is:
This chart uses the following abbreviations (see Partition II, section 14.3):
HASTHIS = 0x20, used to encode the keyword instance in the calling convention
EXPLICITTHIS = 0x40, used to encode the keyword explicit in the calling convention
DEFAULT = 0x0, used to encode the keyword default in the calling convention
VARARG = 0x5, used to encode the keyword vararg in the calling convention
The first byte of the Signature holds bits for HASTHIS, EXPLICITTHIS, and calling convention DEFAULT or VARARG. These are OR'd together.
ParamCount is an integer that holds the number of parameters (0 or more). It can be any number between 0 and 0x1FFFFFFF. The compiler compresses it too (see Partition II, Metadata Validation [section 3]) before storing it into the "blob". (ParamCount counts just the method parameters; it does not include the method's return type.)
The RetType item describes the type of the method's return value (see Partition II, section 22.2.11).
The Param item describes the type of each of the method's parameters. There shall be ParamCount instances of the Param item (see Partition II, section 22.2.10).
22.2.2 MethodRefSig
A MethodRefSig is indexed by the MemberRef.Signature column. This provides the call-site Signature for a method. Normally, this call-site Signature shall match exactly the Signature specified in the definition of the target method. For example, if a method Foo is defined that takes two uint32's and returns void, then any call site shall index a signature that takes exactly two uint32's and returns void. In this case, the syntax chart for a MethodRefSig is identical with that for a MethodDefSig (see Partition II, section 22.2.1).
The Signature at a call site differs from that at its definition only for a method with the VARARG calling convention. In this case, the call-site Signature is extended to include info about the extra VARARG arguments (for example, corresponding to the "..." in C syntax).
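As an illustration only, the blob for the Foo example (two uint32 parameters, returning void) can be assembled in Python following the MethodDefSig layout of section 22.2.1. The helper names are hypothetical, and the sketch handles only parameter and return types that occupy a single ELEMENT_TYPE byte. It assumes the values ELEMENT_TYPE_VOID = 0x01 and ELEMENT_TYPE_U4 = 0x09 from the element-type table, and the 1-, 2-, and 4-byte compressed-integer forms of section 22.2.

```python
HASTHIS = 0x20
EXPLICITTHIS = 0x40
DEFAULT = 0x00
VARARG = 0x05

# Assumed values from the ELEMENT_TYPE table (section 22.1.15).
ELEMENT_TYPE_VOID = 0x01
ELEMENT_TYPE_U4 = 0x09


def compress_uint(value: int) -> bytes:
    """Compressed-integer encoding (1, 2, or 4 bytes, big-endian)."""
    if not 0 <= value <= 0x1FFFFFFF:
        raise ValueError("value does not fit in 29 bits")
    if value <= 0x7F:
        return bytes([value])
    if value <= 0x3FFF:
        return (0x8000 | value).to_bytes(2, "big")
    return (0xC0000000 | value).to_bytes(4, "big")


def method_def_sig(first_byte: int, ret_type: int, param_types: list) -> bytes:
    """Build: first byte, compressed ParamCount, RetType, then each Param.

    Only single-byte element types are supported in this sketch.
    """
    blob = bytearray([first_byte])
    blob += compress_uint(len(param_types))   # ParamCount excludes the return type
    blob.append(ret_type)
    blob += bytes(param_types)
    return bytes(blob)


static_foo = method_def_sig(DEFAULT, ELEMENT_TYPE_VOID,
                            [ELEMENT_TYPE_U4, ELEMENT_TYPE_U4])
instance_foo = method_def_sig(HASTHIS | DEFAULT, ELEMENT_TYPE_VOID,
                              [ELEMENT_TYPE_U4, ELEMENT_TYPE_U4])
print(static_foo.hex())    # 0002010909
print(instance_foo.hex())  # 2002010909
```

For a non-VARARG method, a call site would carry exactly the same bytes in its MethodRefSig.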
The syntax chart for this case is:
This chart uses the following abbreviations (see Partition II, section 14.3):
HASTHIS = 0x20, used to encode the keyword instance in the calling convention
EXPLICITTHIS = 0x40, used to encode the keyword explicit in the calling convention
DEFAULT = 0x0, used to encode the keyword default in the calling convention
VARARG = 0x5, used to encode the keyword vararg in the calling convention
SENTINEL = 0x41 (see Partition II, section 22.1.15), used to encode "..." in the parameter list
The Param item describes the type of each of the method's parameters. There shall be ParamCount instances of the Param item. This starts just like the MethodDefSig for a VARARG method (see Partition II, section 22.2.1). But then a SENTINEL token is appended, followed by extra Param items to describe the extra VARARG arguments. Note that the ParamCount item shall indicate the total number of Param items in the Signature, before and after the SENTINEL byte (0x41).
In the unusual case that a call site supplies no extra arguments, the signature shall not include a SENTINEL (this is the route shown by the lower arrow that bypasses SENTINEL and goes to the end of the MethodRefSig definition).
22.2.3 StandAloneMethodSig
A StandAloneMethodSig is indexed by the StandAloneSig.Signature column. It is typically created as preparation for executing a calli instruction. It is similar to a MethodRefSig, in that it represents a call-site signature, but its calling convention may specify an unmanaged target (the calli instruction invokes either managed or unmanaged code). Its syntax chart is:
This chart uses the following abbreviations (see Partition II, section 14.3):
HASTHIS for 0x20
EXPLICITTHIS for 0x40
DEFAULT for 0x0
VARARG for 0x5
C for 0x1
STDCALL for 0x2
THISCALL for 0x3
FASTCALL for 0x4
SENTINEL for 0x41 (see Partition II, section 22.1.15 and Partition II, section 14.3)
This is the most complex of the various method signatures. Two separate charts have been combined into one in this diagram, using shading to distinguish between them. Thus, for the following calling conventions: DEFAULT (managed), STDCALL, THISCALL, and FASTCALL (unmanaged), the signature ends just before the SENTINEL item (these are all non-vararg signatures). However, for the managed and unmanaged vararg calling conventions: VARARG (managed) and C (unmanaged), the signature can include the SENTINEL and final Param items (they are not required, however). These options are indicated by the shading of boxes in the syntax chart.
22.2.4 FieldSig
A FieldSig is indexed by the Field.Signature column or by the MemberRef.Signature column (in the case where it specifies a reference to a field, not a method, of course). The Signature captures the field's definition. The field may be a static or instance field in a class, or it may be a global variable. The syntax chart for a FieldSig looks like this:
This chart uses the following abbreviations:
FIELD for 0x6
CustomMod is defined in Partition II, section 22.2.7. Type is defined in Partition II, section 22.2.12.
22.2.5 PropertySig
A PropertySig is indexed by the Property.Type column. It captures the type information for a Property: essentially, the signature of its getter method:
Note that the signatures of the getter and setter are related precisely as follows:
The syntax chart for a PropertySig looks like this:
This chart uses the following abbreviations:
PROPERTY for 0x8
Type specifies the type returned by the getter method for this property. Type is defined in Partition II, section 22.2.12. Param is defined in Partition II, section 22.2.10.
ParamCount is an integer that holds the number of index parameters in the getter method (0 or more) (see Partition II, section 22.2.1). (ParamCount counts just the method parameters; it does not include the base type of the Property.)
22.2.6 LocalVarSig
A LocalVarSig is indexed by the StandAloneSig.Signature column. It captures the type of all the local variables in a method. Its syntax chart is:
This chart uses the following abbreviations:
LOCAL_SIG for 0x7, used for the .locals directive (see Partition II, section 14.4.1.3)
BYREF for ELEMENT_TYPE_BYREF (see Partition II, section 22.1.15)
Constraint is defined in Partition II, section 22.2.9. Type is defined in Partition II, section 22.2.12.
Count is an unsigned integer that holds the number of local variables. It can be any number between 1 and 0xFFFE. There shall be Count instances of the Type in the LocalVarSig.
22.2.7 CustomMod
The CustomMod (custom modifier) item in Signatures has a syntax chart like this:
This chart uses the following abbreviations:
The CMOD_OPT or CMOD_REQD value is compressed (see Partition II, section 22.2). The CMOD_OPT or CMOD_REQD is followed by a metadata token that indexes a row in the TypeDef table or the TypeRef table. However, these tokens are encoded and compressed; see Partition II, section 22.2.8 for details.
If the custom modifier is tagged CMOD_OPT, then any importing compiler can freely ignore it entirely. Conversely, if the custom modifier is tagged CMOD_REQD, any importing compiler shall "understand" the semantic implied by this custom modifier in order to reference the surrounding Signature.
22.2.8 TypeDef or Ref Encoded
These items are compact ways to store a TypeDef or TypeRef token in a Signature (see Partition II, section 22.2.12). Consider a regular TypeRef token, such as 0x01000012. The top byte of 0x01 indicates that this is a TypeRef token (see Partition V for a list of the supported metadata token types). The lower 3 bytes (0x000012) index row number 0x12 in the TypeRef table. The encoded version of this TypeRef token is made up as follows:
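As an illustration only, the encoding of the token 0x01000012 can be sketched in Python. The function names are hypothetical. The sketch assumes that the table is recorded in the two least significant bits (0 for TypeDef, 1 for TypeRef), that the row number is shifted left by two bits and OR'd with that tag, and that the result is then compressed as described in section 22.2. It also assumes the token-type bytes 0x02 for TypeDef and 0x01 for TypeRef.

```python
# Token-type byte -> 2-bit table tag (assumed values, see the lead-in).
TABLE_TAGS = {
    0x02: 0,  # TypeDef
    0x01: 1,  # TypeRef
}


def compress_uint(value: int) -> bytes:
    """Compressed-integer encoding (1, 2, or 4 bytes, big-endian)."""
    if not 0 <= value <= 0x1FFFFFFF:
        raise ValueError("value does not fit in 29 bits")
    if value <= 0x7F:
        return bytes([value])
    if value <= 0x3FFF:
        return (0x8000 | value).to_bytes(2, "big")
    return (0xC0000000 | value).to_bytes(4, "big")


def encode_typedef_or_ref(token: int) -> bytes:
    """Pack a TypeDef or TypeRef token into its compact signature form."""
    table = token >> 24              # top byte selects the table
    row = token & 0x00FFFFFF         # lower 3 bytes are the row number
    if table not in TABLE_TAGS:
        raise ValueError("token is neither a TypeDef nor a TypeRef")
    return compress_uint((row << 2) | TABLE_TAGS[table])


print(encode_typedef_or_ref(0x01000012).hex())  # 49
```

Here 0x12 shifted left by two bits gives 0x48, and OR-ing in the TypeRef tag gives 0x49, which is below 0x80 and therefore compresses to one byte.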
So, instead of the original, regular TypeRef token value of 0x01000012, requiring 4 bytes of space in the Signature "blob," this TypeRef token is encoded as a single byte.
22.2.9 Constraint
The Constraint item in Signatures currently has only one possible value, ELEMENT_TYPE_PINNED (see Partition II, section 22.1.15), which specifies that the target type is pinned in the runtime heap, and will not be moved by the actions of garbage collection.
A Constraint can only be applied within a LocalVarSig (not a FieldSig). The Type of the local variable shall either be a reference type (in other words, it points to the actual variable; for example, an Object or a String), or it shall include the BYREF item. The reason is that local variables are allocated on the runtime stack; they are never allocated from the runtime heap. So unless the local variable points at an object allocated in the GC heap, pinning makes no sense.
22.2.10 Param
The Param (parameter) item in Signatures has this syntax chart:
This chart uses the following abbreviations:
CustomMod is defined in Partition II, section 22.2.7. Type is defined in Partition II, section 22.2.12.
22.2.11 RetType
The RetType (return type) item in Signatures has this syntax chart:
RetType is identical to Param except for one extra possibility, that it can include the type VOID. This chart uses the following abbreviations:
22.2.12 Type
Type is encoded in signatures as follows (I1 is an abbreviation for ELEMENT_TYPE_I1, etc.; see Partition II, section 22.1.15):
Type ::=
    BOOLEAN | CHAR | I1 | U1 | I2 | U2 | I4 | U4 | I8 | U8 | R4 | R8 | I | U
    | VALUETYPE TypeDefOrRefEncoded
    | CLASS TypeDefOrRefEncoded
    | STRING
    | OBJECT
    | PTR CustomMod* VOID
    | PTR CustomMod* Type
    | FNPTR MethodDefSig
    | FNPTR MethodRefSig
    | ARRAY Type ArrayShape (general array, see Partition II, section 22.2.13)
    | SZARRAY CustomMod* Type (single-dimensional, zero-based array, i.e., vector)
22.2.13 ArrayShape
An ArrayShape has the following syntax chart:
Rank is an integer (stored in compressed form; see Partition II, section 22.2) that specifies the number of dimensions in the array (shall be 1 or more). NumSizes is a compressed integer that says how many dimensions have specified sizes (it shall be 0 or more). Size is a compressed integer specifying the size of that dimension; the sequence starts at the first dimension, and goes on for a total of NumSizes items. Similarly, NumLoBounds is a compressed integer that says how many dimensions have specified lower bounds (it shall be 0 or more). And LoBound is a compressed integer specifying the lower bound of that dimension; the sequence starts at the first dimension and goes on for a total of NumLoBounds items. None of the dimensions in these two sequences can be skipped, but the number of specified dimensions can be less than Rank.
Here are a few examples, all for element type int32:
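As an illustration only, the ArrayShape layout (Rank, NumSizes, Size items, NumLoBounds, LoBound items) can be sketched in Python. The helper names are hypothetical. The sketch assumes ELEMENT_TYPE_ARRAY = 0x14 and ELEMENT_TYPE_I4 = 0x08 from the element-type table, restricts itself to non-negative lower bounds, and uses the bracketed bounds notation [lo...hi] purely as a label for each case.

```python
ELEMENT_TYPE_I4 = 0x08      # assumed value from section 22.1.15
ELEMENT_TYPE_ARRAY = 0x14   # assumed value from section 22.1.15


def compress_uint(value: int) -> bytes:
    """Compressed-integer encoding (1, 2, or 4 bytes, big-endian)."""
    if not 0 <= value <= 0x1FFFFFFF:
        raise ValueError("value does not fit in 29 bits")
    if value <= 0x7F:
        return bytes([value])
    if value <= 0x3FFF:
        return (0x8000 | value).to_bytes(2, "big")
    return (0xC0000000 | value).to_bytes(4, "big")


def array_shape(rank: int, sizes: list, lo_bounds: list) -> bytes:
    """Encode Rank, NumSizes, Size*, NumLoBounds, LoBound* in that order."""
    if rank < 1:
        raise ValueError("Rank shall be 1 or more")
    if len(sizes) > rank or len(lo_bounds) > rank:
        raise ValueError("cannot specify more dimensions than Rank")
    blob = bytearray(compress_uint(rank))
    blob += compress_uint(len(sizes))
    for size in sizes:
        blob += compress_uint(size)
    blob += compress_uint(len(lo_bounds))
    for bound in lo_bounds:
        blob += compress_uint(bound)
    return bytes(blob)


def int32_array_type(rank: int, sizes: list, lo_bounds: list) -> bytes:
    """ARRAY Type ArrayShape with element type int32."""
    prefix = bytes([ELEMENT_TYPE_ARRAY, ELEMENT_TYPE_I4])
    return prefix + array_shape(rank, sizes, lo_bounds)


# [1...2, 6...8]: rank 2, sizes 2 and 3, lower bounds 1 and 6.
print(int32_array_type(2, [2, 3], [1, 6]).hex())  # 140802020203020106
# [,,,,,,]: rank 7 with no sizes or lower bounds specified.
print(int32_array_type(7, [], []).hex())          # 1408070000
```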
NOTE Definitions can nest, since the Type may itself be an array.
22.2.14 TypeSpec
The signature in the Blob heap indexed by a TypeSpec token has the following format:
TypeSpecBlob ::=
    PTR CustomMod* VOID
    | PTR CustomMod* Type
    | FNPTR MethodDefSig
    | FNPTR MethodRefSig
    | ARRAY Type ArrayShape
    | SZARRAY CustomMod* Type
For compactness, the ELEMENT_TYPE_ prefixes have been omitted from this list. So, for example, PTR is shorthand for ELEMENT_TYPE_PTR (see Partition II, section 22.1.15). Note that a TypeSpecBlob does not begin with a calling-convention byte, so it differs from the various other signatures that are stored into metadata.
22.2.15 Short-Form Signatures
The general specification for signatures leaves some leeway in how to encode certain items. For example, it appears legal to encode a String as either
Only the short form is valid. The following table shows which short forms should be used in place of each long-form item. (As usual, for compactness, the ELEMENT_TYPE_ prefixes have been omitted here, so VALUETYPE is short for ELEMENT_TYPE_VALUETYPE.)
NOTE Arrays shall be encoded in signatures using one of ELEMENT_TYPE_ARRAY or ELEMENT_TYPE_SZARRAY. There is no long form involving a TypeRef to System.Array.
22.3 Custom Attributes
A Custom Attribute has the following syntax chart:
All binary values are stored in little-endian format (except PackedLen items, which are used only as counts for the number of bytes to follow in a UTF8 string). If there are no fields, parameters, or properties specified, the entire attribute may be represented as an empty blob.
CustomAttrib starts with a Prolog: an unsigned int16, with value 0x0001.
Next comes a description of the fixed arguments for the constructor method. Their number and type are found by examining that constructor's MethodDef; this info is not repeated in the CustomAttrib itself. As the syntax chart shows, there can be zero or more FixedArgs. (Note that VARARG constructor methods are not allowed in the definition of Custom Attributes.)
Next is a description of the optional "named" fields and properties. This starts with NumNamed: an unsigned int16 giving the number of "named" properties or fields that follow. Note that NumNamed shall always be present. If its value is zero, there are no "named" properties or fields to follow (and of course, in this case, the CustomAttrib shall end immediately after NumNamed). In the case where NumNamed is non-zero, it is followed by NumNamed repeats of NamedArgs.
The format for each FixedArg depends upon whether that argument is single, or an SZARRAY; this is shown in the upper and lower paths, respectively, of the syntax chart [above]. So each FixedArg is either a single Elem, or NumElem repeats of Elem. (SZARRAY is the single byte 0x1D and denotes a vector, that is, a single-dimension array with a lower bound of zero.)
NumElem is an unsigned int32 specifying the number of elements in the SZARRAY, or 0xFFFFFFFF to indicate that the value is null.
An Elem takes one of three forms:
Val is the binary value for a simple type. A bool is a single byte with value 0 (false) or 1 (true); char is a two-byte Unicode character; and the others have their obvious meaning.
A NamedArg is simply a FixedArg (discussed above) preceded by information to identify which field or property it represents.
FIELD is the single byte 0x53.
PROPERTY is the single byte 0x54.
If the parameter kind is a boxed simple value type (bool, char, float32, float64, int8, int16, int32, int64, unsigned int8, unsigned int16, unsigned int32, or unsigned int64), then FieldOrPropType is immediately preceded by a byte containing the value 0x51.
The FieldOrPropType shall be exactly one of: ELEMENT_TYPE_BOOLEAN, ELEMENT_TYPE_CHAR, ELEMENT_TYPE_I1, ELEMENT_TYPE_U1, ELEMENT_TYPE_I2, ELEMENT_TYPE_U2, ELEMENT_TYPE_I4, ELEMENT_TYPE_U4, ELEMENT_TYPE_I8, ELEMENT_TYPE_U8, ELEMENT_TYPE_R4, ELEMENT_TYPE_R8, ELEMENT_TYPE_STRING, or the constant 0x50 (for an argument of type Type). A single-dimensional, zero-based array is specified as a single byte 0x1D followed by the FieldOrPropType of the element type. (See Partition II, section 22.1.15.)
The FieldOrPropName is the name of the field or property, stored as a SerString (defined above).
The SerString used to encode an argument of type Type includes the full type name, followed optionally by the assembly where it is defined, its version, culture, and public key token. If the assembly name is omitted, the CLI looks first in this assembly, and then in the assembly named mscorlib.
For example, consider the Type string "Ozzy.OutBack.Kangaroo+Wallaby, MyAssembly" for a class "Wallaby" nested within class "Ozzy.OutBack.Kangaroo", defined in the assembly "MyAssembly".
22.4 Marshalling Descriptors
A marshalling descriptor is like a signature: it's a "blob" of binary data. It describes how a field or parameter (which, as usual, covers the method return, as parameter number 0) should be marshalled when calling to or from unmanaged code via PInvoke dispatch.
(The ilasm syntax marshal can be used to create a marshalling descriptor, as can the pseudo custom attribute MarshalAsAttribute; see Partition II, section 20.2.1.) Note that a conforming implementation of the CLI need only support marshalling of the types specified earlier; see Partition II, section 14.5.5.
Marshalling descriptors make use of constants named NATIVE_TYPE_xxx. Their names and values are listed in the following table:
The "blob" has the following format: MarshalSpec ::= NativeInstrinsic | ARRAY ArrayElemType ParamNum ElemMult NumElem
NativeInstrinsic ::= BOOLEAN | I1 | U1 | I2 | U2 | I4 | U4 | I8 | U8 | R4 | R8 | CURRENCY | BSTR | LPSTR | LPWSTR | LPTSTR | INT | UINT | FUNC | LPVOID
For compactness, the NATIVE_TYPE_ prefixes have been omitted in the above lists. So, for example, ARRAY is shorthand for NATIVE_TYPE_ARRAY.
NumElem is an integer (compressed as described in Partition II, section 22.2) that specifies how many elements are in the array.
ArrayElemType ::= NativeInstrinsic | BOOLEAN | I1 | U1 | I2 | U2 | I4 | U4 | I8 | U8 | R4 | R8 | LPSTR | INT | UINT | FUNC | LPVOID
ParamNum is an integer (compressed as described in Partition II, section 22.2) specifying the parameter in the method call that provides the number of elements in the array (see below).
ElemMult is an integer (compressed as described in Partition II, section 22.2) that says by what factor to multiply (see below).
NOTE For example, in the method declaration:
    Foo (int ar1[], int size1, byte ar2[], int size2)
The ar1 parameter might own a row in the FieldMarshal table, which indexes a MarshalSpec in the Blob heap with the format:
    ARRAY MAX 2 1 0
This says the parameter is marshalled to a NATIVE_TYPE_ARRAY. There is no additional info about the type of each element (signified by that NATIVE_TYPE_MAX). The value of ParamNum is 2, which indicates that parameter number 2 in the method (the one called "size1") will specify the number of elements in the actual array; let's suppose its value on a particular call is 42. The value of ElemMult is 1. The value of NumElem is 0. The calculated total size, in bytes, of the array is given by the formula:
    if ParamNum == 0
        SizeInBytes = NumElem * sizeof (elem)
    else
        SizeInBytes = ( @ParamNum * ElemMult + NumElem ) * sizeof (elem)
    endif
The syntax "@ParamNum" is used here to denote the value passed in for parameter number ParamNum; it would be 42 in this example. The size of each element is calculated from the metadata for the ar1 parameter in Foo's signature: an ELEMENT_TYPE_I4 (see Partition II, section 22.1.15) of size 4 bytes.
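As an illustration only, the size formula can be evaluated in Python for the Foo example. The function name and the dictionary used to hold the call's argument values (keyed by parameter number, with ar1 as parameter 1 and size1 as parameter 2) are hypothetical conveniences of this sketch.

```python
def array_size_in_bytes(param_num: int, elem_mult: int, num_elem: int,
                        elem_size: int, arg_values: dict) -> int:
    """Apply the SizeInBytes formula for a NATIVE_TYPE_ARRAY descriptor.

    arg_values maps a parameter number to the value passed for it on this
    particular call (the "@ParamNum" of the formula).
    """
    if param_num == 0:
        return num_elem * elem_size
    return (arg_values[param_num] * elem_mult + num_elem) * elem_size


# MarshalSpec "ARRAY MAX 2 1 0": ParamNum = 2, ElemMult = 1, NumElem = 0.
# On this call size1 (parameter number 2) is 42, and each int32 is 4 bytes.
size = array_size_in_bytes(param_num=2, elem_mult=1, num_elem=0,
                           elem_size=4, arg_values={2: 42})
print(size)  # 168
```

With ParamNum set to 0 the runtime argument is ignored and only the fixed NumElem count contributes to the size.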