Custom Attribute Value Encoding


The Value blob of a custom attribute might contain two categories of data: encoded argument values of the instance constructor, and additional encoded name/value pairs specifying the initialization values of the fields and properties of the custom attribute class.

The Value blob encoding is based on serialization type codes enumerated in CorSerializationType in the CorHdr.h file. The serialization codes for the primitive types, strings, and vectors are the same as the respective ELEMENT_TYPE_* codes—that is, ELEMENT_TYPE_BOOLEAN, and so on, as described in Chapter 7, “Primitive Types and Signatures.” Additional serialization codes include TYPE (0x50), TAGGED_OBJECT (0x51), FIELD (0x53), PROPERTY (0x54), and ENUM (0x55). All the constant names include the prefix SERIALIZATION_TYPE_, which I omit because of my inherent laziness.
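For readers who prefer to see the codes side by side, here is a small Python sketch that collects the values named above. The class name SerializationType is my own choice for illustration and only mirrors the CorHdr.h enumeration; the list is limited to the codes mentioned in this section.

```python
from enum import IntEnum

class SerializationType(IntEnum):
    """Subset of the SERIALIZATION_TYPE_* codes named in this section."""
    BOOLEAN = 0x02        # same value as ELEMENT_TYPE_BOOLEAN
    TYPE = 0x50
    TAGGED_OBJECT = 0x51
    FIELD = 0x53
    PROPERTY = 0x54
    ENUM = 0x55

# Look a code up by value when reading a blob byte by byte.
print(SerializationType(0x53).name)  # FIELD
```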

The encoded blob begins with the prolog, which is always the 2-byte value 0x0001. The prolog is followed by the values of the constructor arguments. The size and byte layout of these values are inferred from the constructor's signature, and multibyte values are stored in little-endian order. For example, the value 0x1234 supplied as an argument of type int32 is encoded as the following sequence of bytes:

0x34 0x12 0x00 0x00
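The same bytes can be produced with a few lines of Python. This is only a sketch using the standard struct module; the names PROLOG and encode_int32 are illustrative and do not belong to any runtime API.

```python
import struct

# The 2-byte prolog 0x0001, written little-endian: 01 00
PROLOG = struct.pack("<H", 0x0001)

def encode_int32(value: int) -> bytes:
    """Encode an int32 constructor argument as 4 little-endian bytes."""
    return struct.pack("<i", value)

blob = PROLOG + encode_int32(0x1234)
print(blob.hex(" "))  # 01 00 34 12 00 00
```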

If the argument is a vector, its encoding begins with a 4-byte element count, followed by the element values. For example, a vector of the three unsigned int16 values 0x1122, 0x3344, and 0x5566 is encoded as follows:

0x03 0x00 0x00 0x00 0x22 0x11 0x44 0x33 0x66 0x55
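A vector encoder follows directly from this description. The sketch below handles only unsigned int16 elements, and the function name encode_uint16_vector is hypothetical.

```python
import struct

def encode_uint16_vector(values) -> bytes:
    """Encode a vector of unsigned int16 values.

    Layout: a 4-byte little-endian element count,
    then each element as 2 little-endian bytes.
    """
    count = struct.pack("<I", len(values))
    elements = b"".join(struct.pack("<H", v) for v in values)
    return count + elements

encoded = encode_uint16_vector([0x1122, 0x3344, 0x5566])
print(encoded.hex(" "))  # 03 00 00 00 22 11 44 33 66 55
```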

If the argument is a string, its encoding begins with the compressed string length, followed by the string itself in UTF-8 encoding, without the terminating 0 byte. The length compression formula was discussed in Table 4-1. For example, the string Common Language Runtime is encoded as the following byte sequence, with the leading byte (0x17) representing the string length (23 bytes):

0x17 0x43 0x6F 0x6D 0x6D 0x6F 0x6E 0x20 0x4C 0x61 0x6E 0x67 0x75 0x61 0x67 0x65 0x20 0x52 0x75 0x6E 0x74 0x69 0x6D 0x65
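The string encoding can be sketched in Python as follows. The compression rule used in compress_length reflects my understanding of the compressed unsigned integer scheme (1, 2, or 4 bytes, with the size flagged in the high bits of the first byte); check it against Table 4-1 before relying on it. Both function names are illustrative.

```python
def compress_length(n: int) -> bytes:
    """Compress an unsigned length into 1, 2, or 4 bytes (assumed scheme)."""
    if n < 0x80:
        return bytes([n])                         # 0xxxxxxx
    if n < 0x4000:
        return (n | 0x8000).to_bytes(2, "big")    # 10xxxxxx xxxxxxxx
    if n < 0x20000000:
        return (n | 0xC0000000).to_bytes(4, "big")  # 110xxxxx + 3 more bytes
    raise ValueError("length too large to compress")

def encode_string(s: str) -> bytes:
    """Compressed byte length, then UTF-8 bytes, with no terminating 0."""
    data = s.encode("utf-8")
    return compress_length(len(data)) + data

encoded = encode_string("Common Language Runtime")
print(hex(encoded[0]), len(encoded))  # 0x17 24
```

Note that the length counts UTF-8 bytes, not characters, so the two differ for non-ASCII strings.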

If the argument is an object reference to a boxed primitive value type (bool, char, one of the integer types, or one of the floating-point types), the encoding consists of a 1-byte primitive type code, followed by the value itself.
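To make this concrete, here is a Python sketch of a boxed-primitive encoder. The type codes are the ELEMENT_TYPE_* values for the primitive types; the table name PRIMITIVES, the function name encode_boxed, and the choice of type-name strings as keys are my own assumptions for illustration.

```python
import struct

# type name -> (ELEMENT_TYPE_* code, struct format for the value)
PRIMITIVES = {
    "bool":    (0x02, "<?"),
    "char":    (0x03, "<H"),   # one 2-byte UTF-16 code unit
    "int8":    (0x04, "<b"),
    "uint8":   (0x05, "<B"),
    "int16":   (0x06, "<h"),
    "uint16":  (0x07, "<H"),
    "int32":   (0x08, "<i"),
    "uint32":  (0x09, "<I"),
    "int64":   (0x0A, "<q"),
    "uint64":  (0x0B, "<Q"),
    "float32": (0x0C, "<f"),
    "float64": (0x0D, "<d"),
}

def encode_boxed(type_name: str, value) -> bytes:
    """Encode a boxed primitive: 1-byte type code, then the value."""
    code, fmt = PRIMITIVES[type_name]
    if type_name == "char":
        value = ord(value)  # struct needs the numeric code unit
    return bytes([code]) + struct.pack(fmt, value)

print(encode_boxed("int32", 0x1234).hex(" "))  # 08 34 12 00 00
```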

Finally, if the argument is a type (class), its encoding is similar to that of a string, with the type’s fully qualified name playing the role of the string constant. The rules of the fully qualified type name formatting applied in the custom attribute blob encoding are those of Reflection, which differ from IL assembly language (ILAsm) conventions. The full class name is formed in Reflection and ILAsm almost identically, except for the separator symbols that denote the class nesting. ILAsm notation uses a forward slash:

MyNamespace.MyEnclosingClass/MyNestedClass

whereas the Reflection standard uses a plus sign:

MyNamespace.MyEnclosingClass+MyNestedClass

A greater difference, however, lies in the way the resolution scope is designated. In ILAsm, the resolution scope is expressed as the external assembly's alias in square brackets preceding the full class name. In Reflection notation, the resolution scope is specified after the full class name, comma-separated from it. In addition, the concept of the external assembly alias is specific to ILAsm, and Reflection does not recognize it. Thus, if the version, public key token, or culture must be specified, it is done explicitly as a part of the resolution scope specification. The following is an ILAsm example:

.assembly extern OtherAssembly as OtherAsm2
{
    .ver 1:2:3:4
    .publickeytoken = (01 02 03 04 05 06 07 08)
    .locale "fr-CA"
}

[OtherAsm2]MyNamespace.MyEnclosingClass/MyNestedClass

In contrast, here is a Reflection example:

MyNamespace.MyEnclosingClass+MyNestedClass, OtherAssembly, Version=1.2.3.4, PublicKeyToken=0102030405060708, Culture=fr-CA

According to Reflection conventions, the resolution scope specification can be omitted if the referenced class is defined in the current assembly or in Mscorlib.dll. In ILAsm, as you know, the resolution scope is omitted only if the class is defined in the current module.
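The translation from ILAsm notation to Reflection notation can be sketched as a small Python helper. The function name to_reflection_name and its parameters are hypothetical; the attribute order follows the Reflection example above, and the helper assumes the caller has already resolved the ILAsm alias to the real assembly name and its attributes.

```python
def to_reflection_name(ilasm_class_name, assembly=None, version=None,
                       public_key_token=None, culture=None):
    """Build a Reflection-style type name from ILAsm-style pieces.

    ilasm_class_name: full class name with '/' as the nesting separator
    assembly:         real assembly name (not the ILAsm alias), or None
                      to omit the resolution scope
    version:          tuple of four integers, e.g. (1, 2, 3, 4)
    public_key_token: the token as a bytes object
    culture:          culture string, e.g. "fr-CA"
    """
    name = ilasm_class_name.replace("/", "+")  # nesting separator
    if assembly is None:
        return name
    parts = [name, assembly]
    if version is not None:
        parts.append("Version=" + ".".join(str(n) for n in version))
    if public_key_token is not None:
        parts.append("PublicKeyToken=" + public_key_token.hex())
    if culture is not None:
        parts.append("Culture=" + culture)
    return ", ".join(parts)

print(to_reflection_name(
    "MyNamespace.MyEnclosingClass/MyNestedClass",
    assembly="OtherAssembly",
    version=(1, 2, 3, 4),
    public_key_token=bytes.fromhex("0102030405060708"),
    culture="fr-CA",
))
```

The resulting string is what would then be encoded in the blob in the same way as a string argument.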

The byte sequence representing the prolog and the constructor arguments is followed by the 2-byte count of the name/value pairs. A name/value pair specifies which particular field or property must be initialized to a certain value.

The name/value pair encoding begins with the serialization code of the target: FIELD or PROPERTY. The next byte is the serialization code of the target type, which is limited to the primitive types, string, and TYPE. After the target type comes the name of the target, encoded the same way a string argument would be: the compressed length, followed by the string itself in UTF-8 encoding, without the 0 terminator. Immediately after the target name is the target initialization value, encoded similarly to the arguments. For example, the name/value pair initializing a field (0x53) of type bool (0x02) named Inherited (length 0x09) to true (0x01) is encoded as this byte sequence:

0x53 0x02 0x09 0x49 0x6E 0x68 0x65 0x72 0x69 0x74 0x65 0x64 0x01 
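Putting the pieces together, the following Python sketch builds this name/value pair and then a complete blob for an attribute whose constructor takes no arguments. All function names are illustrative, only bool-typed targets are handled, and encode_name deliberately rejects names of 0x80 bytes or more so that the 1-byte compressed length is sufficient.

```python
import struct

FIELD, PROPERTY = 0x53, 0x54   # serialization codes of the target kind
BOOLEAN = 0x02                 # serialization code of the target type

def encode_name(name: str) -> bytes:
    """Compressed length plus UTF-8 bytes (short names only in this sketch)."""
    data = name.encode("utf-8")
    if len(data) >= 0x80:
        raise ValueError("sketch handles only 1-byte compressed lengths")
    return bytes([len(data)]) + data

def encode_bool_pair(target: int, name: str, value: bool) -> bytes:
    """Target kind, target type, target name, then the value."""
    return bytes([target, BOOLEAN]) + encode_name(name) + struct.pack("<?", value)

def encode_blob(ctor_args: bytes, pairs) -> bytes:
    """Prolog, constructor arguments, 2-byte pair count, then the pairs."""
    return (struct.pack("<H", 0x0001) + ctor_args
            + struct.pack("<H", len(pairs)) + b"".join(pairs))

pair = encode_bool_pair(FIELD, "Inherited", True)
print(pair.hex(" "))  # 53 02 09 49 6e 68 65 72 69 74 65 64 01

blob = encode_blob(b"", [pair])
print(blob[:4].hex(" "))  # 01 00 01 00
```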



Inside Microsoft .NET IL Assembler
ISBN: 0735615470
EAN: 2147483647
Year: 2005
Pages: 147
Authors: SERGE LIDIN
