Primitive Types in the Common Language Runtime

All types have to be defined somewhere. The Microsoft .NET Framework class library defines hundreds of types, and other assemblies build their own types based on the types defined in the class library. Some of the types defined in the class library are recognized by the common language runtime as primitive types and are given special encoding in the signatures. This is done only for the sake of performance—theoretically, the signatures could have been built from type tokens only, given that every type is defined somewhere and hence has a token. But resolving all these tokens simply to find that they reference trivial items such as a 4-byte integer or a Boolean value can hardly be considered a sensible way to work in the runtime.

Primitive Data Types

The term primitive data types refers to the types defined in the .NET Framework class library that are given specific individual type codes to be used in signatures. Because all these types are defined in the assembly Mscorlib and all belong to the namespace System, I have omitted the prefix [mscorlib]System when supplying the class library type name for a type.

The individual type codes are defined in the enumeration CorElementType in the header file CorHdr.h. The names of all these codes begin with ELEMENT_TYPE_, which I have either omitted in this chapter or abbreviated as E_T_.

Table 7-1 describes primitive data types and their respective ILAsm notation.

Table 7-1  Primitive Data Types Defined in the Runtime 

| Code | Constant Name | .NET Framework Type Name | ILAsm Notation | Comments |
|------|---------------|--------------------------|----------------|----------|
| 0x01 | VOID | Void | void | |
| 0x02 | BOOLEAN | Boolean | bool | Single-byte value, true = 1, false = 0 |
| 0x03 | CHAR | Char | char | 2-byte unsigned integer, representing a Unicode character |
| 0x04 | I1 | SByte | int8 | Signed 1-byte integer, the same as char in C/C++ |
| 0x05 | U1 | Byte | unsigned int8 | Unsigned 1-byte integer |
| 0x06 | I2 | Int16 | int16 | Signed 2-byte integer |
| 0x07 | U2 | UInt16 | unsigned int16 | Unsigned 2-byte integer |
| 0x08 | I4 | Int32 | int32 | Signed 4-byte integer |
| 0x09 | U4 | UInt32 | unsigned int32 | Unsigned 4-byte integer |
| 0x0A | I8 | Int64 | int64 | Signed 8-byte integer |
| 0x0B | U8 | UInt64 | unsigned int64 | Unsigned 8-byte integer |
| 0x0C | R4 | Single | float32 | 4-byte floating-point |
| 0x0D | R8 | Double | float64 | 8-byte floating-point |
| 0x16 | TYPEDBYREF | TypedReference | typedref | Typed reference, carrying both reference to a type and information identifying the referenced type |
| 0x18 | I | IntPtr | native int | Pointer-size integer; size dependent on the underlying platform, hence use of the keyword native |
| 0x19 | U | UIntPtr | native unsigned int | Pointer-size unsigned integer |
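To make the idea of "individual type codes" concrete, here is a minimal Python sketch of a lookup from ILAsm notation to the one-byte code used in signatures. The codes are copied from Table 7-1; the names PRIMITIVE_TYPES and primitive_code are hypothetical helpers invented for this illustration and are not part of any runtime API.

```python
# Element type codes copied from Table 7-1 (ELEMENT_TYPE_ prefix omitted).
PRIMITIVE_TYPES = {
    "void": 0x01,
    "bool": 0x02,
    "char": 0x03,
    "int8": 0x04,
    "unsigned int8": 0x05,
    "int16": 0x06,
    "unsigned int16": 0x07,
    "int32": 0x08,
    "unsigned int32": 0x09,
    "int64": 0x0A,
    "unsigned int64": 0x0B,
    "float32": 0x0C,
    "float64": 0x0D,
    "typedref": 0x16,
    "native int": 0x18,
    "native unsigned int": 0x19,
}


def primitive_code(ilasm_name: str) -> int:
    """Return the one-byte signature code for an ILAsm primitive type name."""
    # Collapse repeated whitespace so "unsigned   int8" is accepted too.
    key = " ".join(ilasm_name.split())
    return PRIMITIVE_TYPES[key]


print(hex(primitive_code("int32")))       # 0x8
print(hex(primitive_code("native int")))  # 0x18
```

A single byte per primitive type is the whole point of the special encoding: no token has to be resolved to learn that a signature item is a 4-byte integer.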

Data Pointer Types

Two data pointer types are defined in the common language runtime: the managed pointer, which is a reference, and the unmanaged pointer, which is a pointer in the conventional sense. The difference is that a managed pointer is managed by the runtime’s garbage collection subsystem and stays valid even if the referenced item is moved in memory during the process of garbage collection, whereas an unmanaged pointer can be safely used only in association with “unmovable” items.

Neither pointer type has any meaning per se; each must be followed by its base type, which is the type to which the pointer points. As derivatives of base types, the pointer types have no corresponding types defined in the .NET Framework class library and cannot be boxed. Table 7-2 describes the two pointer types and their ILAsm notations.

Table 7-2  Pointer Types Defined in the Runtime

| Code | Constant Name | ILAsm Notation | Comments |
|------|---------------|----------------|----------|
| 0x0F | PTR | <type>* | Unmanaged pointer to <type> |
| 0x10 | BYREF | <type>& | Managed pointer to <type> |

Note: Although ILAsm notation places the pointer sign after the pointed type, in signatures E_T_PTR and E_T_BYREF always precede the pointed type.
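The reversal between ILAsm notation and signature order can be sketched in a few lines of Python. The two pointer codes come from Table 7-2 and the int32 code from Table 7-1; the function encode_pointer and its base_codes argument are hypothetical names used only for this illustration.

```python
E_T_PTR = 0x0F    # unmanaged pointer, from Table 7-2
E_T_BYREF = 0x10  # managed pointer, from Table 7-2
E_T_I4 = 0x08     # int32, from Table 7-1


def encode_pointer(ilasm_type: str, base_codes: dict) -> bytes:
    """Encode a type such as "int32*" or "int32&" as signature bytes."""
    prefix = []
    name = ilasm_type.strip()
    # ILAsm writes * and & after the base type; peel them off right to left
    # so that the outermost pointer code comes first in the signature.
    while name.endswith(("*", "&")):
        prefix.append(E_T_PTR if name[-1] == "*" else E_T_BYREF)
        name = name[:-1].rstrip()
    return bytes(prefix + [base_codes[name]])


codes = {"int32": E_T_I4}
print(encode_pointer("int32*", codes).hex())   # 0f08
print(encode_pointer("int32&", codes).hex())   # 1008
print(encode_pointer("int32**", codes).hex())  # 0f0f08
```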

Pointers of both types are subject to standard pointer arithmetic: an integer can be added to or subtracted from a pointer, resulting in a pointer; and one pointer can be subtracted from another, resulting in an integer value. The difference between pointer arithmetic in, say, C/C++ and in IL (intermediate language) is that in IL—and hence in ILAsm—the increments and decrements of pointers are always specified in bytes, regardless of the size of the item the pointer represents.

C/C++:

      long L, *pL=&L;

      pL += 4; // pL is incremented by 4*sizeof(long) = 16 bytes

ILAsm:

      .locals init(int32 L, int32& pL)
      ldloca L   // Load pointer to L on stack
      stloc pL   // pL = &L

      ldloc pL   // Load pL on stack
      ldc.i4 4   // Load 4 on stack
      add
      stloc pL   // pL += 4, pL is incremented by 4 bytes

By the same token—now, this is just a common expression. I’m not referring to metadata tokens. (I think I’d better be extra careful with phrases like “by the same token” or “token of appreciation” in this book.) In the same way, the delta of two pointers in IL is always expressed in bytes, not in the items pointed at.
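The two arithmetic rules can be compared side by side with plain integers standing in for addresses. This is only an arithmetic sketch in Python: the address 0x1000 and the function names are made up for the illustration, and the item size of 4 bytes matches the int32 and 4-byte long used in the listings above.

```python
def c_advance(address: int, count: int, item_size: int) -> int:
    """C/C++ rule: p += count moves by count * sizeof(*p) bytes."""
    return address + count * item_size


def il_advance(address: int, count: int) -> int:
    """IL rule: add applied to a pointer moves it by exactly count bytes."""
    return address + count


def il_delta_in_items(p1: int, p2: int, item_size: int) -> int:
    """IL subtraction yields bytes; divide by the item size to get items."""
    return (p1 - p2) // item_size


base = 0x1000  # hypothetical address of L
print(c_advance(base, 4, 4) - base)           # 16
print(il_advance(base, 4) - base)             # 4
print(il_delta_in_items(base + 16, base, 4))  # 4
```

In other words, to step over four int32 items in IL, the code itself has to multiply by the item size before the add instruction.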

Using unmanaged pointers in IL is not considered nice. Because C-style pointer arithmetic gives unlimited access to anybody for anything, IL code that dereferences unmanaged pointers is deemed unverifiable and can be run only from a local drive with run-time code verification disabled.

Managed pointers are tamed, domesticated pointers, fully owned by the common language runtime type control and the garbage collection subsystem. These pointers dwell in a safe but not too spacious corral, fenced along the following lines:

  • Managed pointers are always references to an item in existence—a field, an array element, a local variable, a method argument.

  • Managed pointer types can be used only for method attributes—local variables, parameters, or a return type.

  • Array elements and fields cannot have managed pointer types. Local variables and method parameters can, and it is not a simple coincidence that all these items are stack-allocated.

  • Managed pointers that point to “managed memory” (the garbage collector heap, which contains object instances and arrays) cannot be converted to unmanaged pointers.

  • Managed pointers that don’t point to the garbage collector heap can be converted to unmanaged pointers, but such conversion renders the IL code unverifiable.

  • The underlying type of a managed pointer cannot be another pointer, but it can be an object reference.

Managed pointers are different from object references. In Chapter 6, “Namespaces and Classes,” which described boxing and unboxing of the value types, we saw that it takes boxing to create an object reference to a value type. Using a simple reference—that is, a managed pointer—is not enough.

The difference is that an object reference points to the method table of an object, whereas a managed pointer points to the value (data) part of the item. When you take a managed pointer to an instance of a value type, you address the data part. You can have only this much because instances of value types, not being objects, have no method tables.

When you box a value type instance, you create an object, a class instance with its own method table and data part copied from the value type instance. This object is represented by an object reference.

Function Pointer Types

Chapter 6 briefly described the use of managed function pointers and compared them with delegate types. Managed function pointers are represented by type E_T_FNPTR, which is indicated by the value 0x1B and doesn’t have a .NET Framework type associated.

Just like a data pointer type, a function pointer type does not exist by itself and must be followed by the full signature of the function to which it points. (Method signatures are discussed later in this chapter; see “Signatures.”)

The ILAsm notation for a function pointer is as follows:

   <call_conv> <return_type> * (<type>[,<type>*])

where <call_conv> is a calling convention, <return_type> is the return type, and the <type> sequence in the parentheses is the argument list. You’ll find more details in the “Signatures” section.

Vectors and Arrays

The common language runtime recognizes two types of arrays: vectors and multidimensional arrays, as described in Table 7-3. Vectors are single-dimensional arrays with a zero lower bound. Multidimensional arrays, which I’ll refer to as arrays, can have more than one dimension and nonzero lower bounds. Neither of these two types of arrays has a respective .NET Framework type associated.

Table 7-3  Arrays Supported in the Runtime

| Code | Constant Name | ILAsm Notation | Comments |
|------|---------------|----------------|----------|
| 0x1D | SZARRAY | <type>[ ] | Vector of <type> |
| 0x14 | ARRAY | <type>[<bounds> [,<bounds>*] ] | Array of <type> |

All vectors and arrays are objects (class instances) derived from the abstract class [mscorlib]System.Array. This is a very peculiar class; in fact, it is a construct known as a generic.

Vector encoding is very simple: E_T_SZARRAY followed by the encoding of the underlying type, which can be anything except void. The size of the vector is not part of the encoding. Because arrays and vectors are object references, it is not enough to simply declare an array—you must create an instance of it, using the instruction newarr for a vector or calling an array constructor. It is at that point that the size of the vector or array instance is specified.

Array encoding is more sophisticated:

   E_T_ARRAY<underlying_type><rank><num_sizes><size1>...<sizeN>
            <num_lower_bounds><lower_bound1>...<lower_boundM>

where the following is true:

   <underlying_type> cannot be void
   <rank> is the number of array dimensions (K > 0)
   <num_sizes> is the number of specified sizes for dimensions (N <= K)
   <sizen> is an unsigned integer specifying the size (n = 1,...,N)
   <num_lower_bounds> is the number of specified lower bounds (M <= K)
   <lower_boundm> is a signed integer specifying the lower bound (m = 1,...,M)

All the above unsigned integer values are compressed according to the length compression formula discussed in Chapter 4, “Metadata Tables Organization.” To save you a trip three chapters back, I will repeat this formula in Table 7-4.

Table 7-4  The Length Compression Formula for Unsigned Integers

| Value Range | Compressed Size | Compressed Value |
|-------------|-----------------|------------------|
| 0 to 0x7F | 1 byte | <value> |
| 0x80 to 0x3FFF | 2 bytes | 0x8000 + <value> |
| 0x4000 to 0x1FFFFFFF | 4 bytes | 0xC0000000 + <value> |
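The formula in Table 7-4 translates almost directly into code. The Python sketch below is an illustration rather than the runtime's implementation; the function name compress_unsigned is hypothetical, and the byte order (most significant byte first) is not stated in this section but is how the ECMA-335 standard lays out compressed integers.

```python
def compress_unsigned(value: int) -> bytes:
    """Compress an unsigned integer according to Table 7-4."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value <= 0x7F:
        return value.to_bytes(1, "big")
    if value <= 0x3FFF:
        return (0x8000 + value).to_bytes(2, "big")
    if value <= 0x1FFFFFFF:
        return (0xC0000000 + value).to_bytes(4, "big")
    raise ValueError("value is too large to be compressed")


print(compress_unsigned(0x03).hex())    # 03
print(compress_unsigned(0x80).hex())    # 8080
print(compress_unsigned(0x3FFF).hex())  # bfff
print(compress_unsigned(0x4000).hex())  # c0004000
```

The top bits of the first byte tell a reader how many bytes follow: 0 for one byte, 10 for two, and 110 for four.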

Signed integer values (lower bound values) are compressed according to a different compression procedure. First the signed integer is encoded as an unsigned integer by taking the absolute value of the original integer, shifting it left by 1 bit, and setting the least significant bit according to the most significant (sign) bit of the original value. Then compression is applied according to the formula shown in Table 7-4.
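The first step of the signed procedure, as described in the paragraph above, can be sketched as follows. The function name fold_signed is hypothetical. Note also that this follows the wording of this text literally; the published ECMA-335 standard describes signed compression in terms of a two's-complement rotation, which produces different bytes for negative values, so this sketch should be treated as an illustration of the description here and not as a reference encoder.

```python
def fold_signed(value: int) -> int:
    """Fold a signed integer into an unsigned one, as described in the text.

    The absolute value is shifted left by one bit, and the least
    significant bit records the sign (1 for negative, 0 otherwise).
    """
    sign_bit = 1 if value < 0 else 0
    return (abs(value) << 1) | sign_bit


print(fold_signed(0))   # 0
print(fold_signed(3))   # 6
print(fold_signed(-3))  # 7
```

The folded value is then compressed exactly like any other unsigned integer, using the formula from Table 7-4.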

If size and/or the lower bound for a dimension are not specified, they are not presumed to be 0; rather, they are marked as not specified. The specification of size and lower bound cannot have “holes”—that is, if you have an array of rank 5 and want to specify size (or lower bound) for its third dimension, you must specify size (or lower bound) for the first and second dimensions as well.
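Putting the pieces together, an array type encoding might be assembled as in the following Python sketch. It makes several simplifying assumptions: encode_array and fold_signed are hypothetical names, the sign folding follows the description in this text, and every value is required to fit in a single compressed byte so that the full compression formula need not be repeated here. Passing sizes and lower bounds as lists covering only the leading dimensions mirrors the "no holes" rule.

```python
E_T_ARRAY = 0x14  # from Table 7-3
E_T_I4 = 0x08     # int32, from Table 7-1


def fold_signed(value: int) -> int:
    """Signed-to-unsigned step as described in the text above."""
    return (abs(value) << 1) | (1 if value < 0 else 0)


def encode_array(elem_code: int, rank: int, sizes=(), lower_bounds=()) -> bytes:
    """Encode an E_T_ARRAY type.

    sizes and lower_bounds describe the first N and M dimensions only;
    the remaining dimensions are left unspecified.
    """
    if rank < 1:
        raise ValueError("rank must be positive")
    if len(sizes) > rank or len(lower_bounds) > rank:
        raise ValueError("more sizes or lower bounds than dimensions")
    if any(s < 0 for s in sizes):
        raise ValueError("sizes are unsigned")
    fields = [rank, len(sizes), *sizes,
              len(lower_bounds), *(fold_signed(b) for b in lower_bounds)]
    if any(f > 0x7F for f in fields):
        raise ValueError("this sketch only handles one-byte compressed values")
    return bytes([E_T_ARRAY, elem_code, *fields])


# Rank 1, nothing specified
print(encode_array(E_T_I4, 1).hex())                             # 1408010000
# Rank 1, lower bound 2, size 4
print(encode_array(E_T_I4, 1, sizes=[4], lower_bounds=[2]).hex())  # 14080101040104
# Rank 2, zero lower bounds, sizes unspecified
print(encode_array(E_T_I4, 2, lower_bounds=[0, 0]).hex())        # 14080200020000
```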

An array specification in ILAsm looks like this:

   <type>[<bounds>[,<bounds>*]]

where

   <bounds> ::= [<lower_bound>] ... [<upper_bound>]

The following is an example:

   int32[...,...]   // Two-dimensional array with undefined lower bounds
                    // and sizes
   int32[2...5]     // One-dimensional array with lower bound 2 and size 4
   int32[0...,0...] // Two-dimensional array with zero lower bounds
                    // and undefined sizes

If neither lower bound nor upper bound is specified for a dimension in a multidimensional array declaration, the ellipsis can be omitted. Thus int32[...,...] and int32[,] mean the same: a two-dimensional array with no lower bounds or sizes specified.

This omission does not work in the case of single-dimensional arrays, however. The notation int32[ ] indicates a vector (<E_T_SZARRAY><E_T_I4>), and int32[...] indicates an array of rank 1 whose lower bound and size are undefined (<E_T_ARRAY><E_T_I4><1><0><0>).
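The relationship between the bounds notation (a lower bound, an ellipsis written as three dots, and an upper bound) and the lower bound and size stored in the encoding can be illustrated with a small parser. This Python sketch covers only the forms shown in this section; parse_bounds and parse_array_suffix are hypothetical names, None stands for "not specified", and forms not discussed here (such as an upper bound without a lower bound) are rejected instead of guessed at.

```python
def parse_bounds(text: str):
    """Return (lower_bound, size) for one dimension; None means unspecified."""
    spec = "".join(text.split())  # drop all whitespace
    if spec == "":
        return (None, None)       # ellipsis omitted, nothing specified
    if "..." not in spec:
        raise ValueError("form not covered by this sketch: " + text)
    lower_txt, upper_txt = spec.split("...", 1)
    lower = int(lower_txt) if lower_txt else None
    upper = int(upper_txt) if upper_txt else None
    if upper is None:
        return (lower, None)
    if lower is None:
        raise ValueError("upper bound without lower bound is not covered")
    if upper < lower:
        raise ValueError("upper bound is below lower bound")
    return (lower, upper - lower + 1)


def parse_array_suffix(inside: str):
    """Parse the text between [ and ] of an ILAsm array or vector type."""
    if inside.strip() == "":
        return "vector"           # int32[ ] is a vector, not a rank-1 array
    return [parse_bounds(dim) for dim in inside.split(",")]


print(parse_array_suffix(""))          # vector
print(parse_array_suffix("..."))       # [(None, None)]
print(parse_array_suffix(","))         # [(None, None), (None, None)]
print(parse_array_suffix("2...5"))     # [(2, 4)]
print(parse_array_suffix("0...,0...")) # [(0, None), (0, None)]
```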

The common language runtime treats multidimensional arrays and vectors of vectors (of vectors, and so on) completely differently. The specifications int32[,] and int32[ ][ ] result in different type encoding, are created differently, and are laid out differently when created:

  • int32[,]  This specification has the encoding <E_T_ARRAY><E_T_I4><2><0><0>, is created by a single call to an array constructor, and is laid out as a contiguous two-dimensional array of int32.

  • int32[ ][ ]  This specification has the encoding <E_T_SZARRAY><E_T_SZARRAY><E_T_I4>, is created by a series of newarr instructions, and is laid out as a vector of vector references, each pointing to a contiguous vector of int32, with no guarantee regarding the location of each vector. Vectors of vectors are useful for describing jagged arrays, when the size of the second dimension varies depending on the first dimension index.
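The difference in layout can be mimicked in Python, with a flat list standing in for the contiguous block and a list of lists standing in for the vector of vector references. This is only an analogy: the row-major index formula in flat_index is an assumption made for the illustration (the text above says only that the layout is contiguous), and all names are hypothetical.

```python
def flat_index(i: int, j: int, cols: int) -> int:
    """Position of element [i, j] in a contiguous block (row-major assumed)."""
    return i * cols + j


# Analogue of int32[,] with 2 rows and 3 columns:
# one allocation holding all 6 items side by side.
rect = [0] * (2 * 3)
rect[flat_index(1, 2, 3)] = 7
print(rect)    # [0, 0, 0, 0, 0, 7]

# Analogue of int32[ ][ ]: an outer vector whose items refer to
# separately created inner vectors, which may differ in length.
jagged = [[0] * n for n in (1, 3, 2)]
jagged[1][2] = 7
print(jagged)  # [[0], [0, 0, 7], [0, 0]]
```

The outer loop that builds jagged corresponds to the series of newarr instructions: one for the outer vector and one for each inner vector.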

Modifiers

Four built-in common language runtime types, described in Table 7-5, do not denote any specific data or pointer type but rather are used as modifiers of data and pointer types. None of these modifiers have a respective .NET Framework type associated.

Table 7-5  Custom Modifiers Defined in the Runtime

| Code | Constant Name | ILAsm Notation | Comments |
|------|---------------|----------------|----------|
| 0x1F | CMOD_REQD | modreq( <class_ref> ) | Required C modifier |
| 0x20 | CMOD_OPT | modopt( <class_ref> ) | Optional C modifier |
| 0x41 | SENTINEL | ... | Start of optional arguments in a vararg method call |
| 0x45 | PINNED | pinned | Marks a local variable as unmovable by the garbage collector |

The modifiers modreq and modopt indicate that the item to which they are attached—an argument, a return type, or a field, for example—must be treated in some special way. These modifiers are followed by TypeDef or TypeRef tokens, and the classes corresponding to these tokens indicate the special way the item is to be handled.

The tokens following modreq and modopt are compressed according to the following algorithm. As you might remember, an uncoded (external) metadata token is a 4-byte unsigned integer, which has the token type in its senior byte and a record index (RID) in its 3 lower bytes. It so happens that the tokens appearing in the signatures and hence requiring compression are of three types only: TypeDef, TypeRef, or TypeSpec. (See “Signatures” later in this chapter for information about TypeSpecs.) Because of that, only 2 bits, rather than a whole byte, are required for the token type: 00 denotes TypeDef, 01 is used for TypeRef, and 10 for TypeSpec. The token compression procedure resembles the procedure used to compress the signed integers: the RID part of the token is shifted left by 2 bits, and the 2-bit type encoding is placed in the least significant bits. The result is compressed just as any unsigned integer would be, according to the formula shown earlier in Table 7-4.
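The token compression just described can be sketched in Python as follows. The 2-bit type codes are those given above; the token type bytes (0x02 for TypeDef, 0x01 for TypeRef, 0x1B for TypeSpec) are not stated in this section and are taken from the standard metadata table numbering. The names compress_token and compress_unsigned are hypothetical, and the most-significant-byte-first order of multi-byte results is likewise an assumption drawn from the ECMA-335 standard.

```python
# Token type byte -> 2-bit code used in signatures.
TOKEN_TAGS = {
    0x02: 0b00,  # TypeDef
    0x01: 0b01,  # TypeRef
    0x1B: 0b10,  # TypeSpec
}


def compress_unsigned(value: int) -> bytes:
    """Unsigned compression per Table 7-4."""
    if 0 <= value <= 0x7F:
        return value.to_bytes(1, "big")
    if 0x80 <= value <= 0x3FFF:
        return (0x8000 + value).to_bytes(2, "big")
    if 0x4000 <= value <= 0x1FFFFFFF:
        return (0xC0000000 + value).to_bytes(4, "big")
    raise ValueError("value cannot be compressed")


def compress_token(token: int) -> bytes:
    """Compress a 4-byte TypeDef, TypeRef, or TypeSpec token."""
    token_type = token >> 24    # senior byte: token type
    rid = token & 0x00FFFFFF    # 3 lower bytes: record index
    if token_type not in TOKEN_TAGS:
        raise ValueError("only TypeDef, TypeRef, and TypeSpec can be compressed")
    coded = (rid << 2) | TOKEN_TAGS[token_type]
    return compress_unsigned(coded)


print(compress_token(0x02000001).hex())  # 04
print(compress_token(0x01000012).hex())  # 49
print(compress_token(0x1B000040).hex())  # 8102
```

For typical assemblies with a few thousand type records, a 4-byte token therefore shrinks to one or two bytes in a signature.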

The modifiers modreq and modopt are used primarily by tools other than the common language runtime, such as compilers or program analyzers. The modreq modifier indicates that the modifier must be taken into account, whereas modopt indicates that the modifier is optional and can be ignored. The ILAsm compiler does not use these modifiers for its internal purposes.

The only use of the modreq and modopt modifiers recognized by the common language runtime is when these modifiers are applied to return types or parameters of methods subject to managed/unmanaged marshaling. For example, to specify that a managed method must have the cdecl calling convention when it is marshaled as unmanaged, we can use the following modifier attached to the method’s return type:

modopt([mscorlib]System.Runtime.InteropServices.CallConvCdecl)

When used in the context of managed/unmanaged marshaling, the modreq and modopt modifiers are equivalent.

Although the modreq and modopt modifiers have no effect on the managed types of the items to which they are attached, signatures with and without these modifiers are considered different. The same is true for signatures differing only in classes referenced by these modifiers.

The sentinel modifier (...) was introduced in Chapter 1, “Simple Sample,” when we analyzed the declaration and calling of methods with a variable-length argument list (vararg methods). (See “Method Declaration.”) A sentinel signifies the beginning of optional arguments supplied for a vararg method call. This modifier can appear in only one context: at the call site, because the optional parameters of a vararg method are not specified when such a method is declared. The runtime treats a sentinel appearing in any other context as an error. The method arguments at the call site can contain only one sentinel, and the sentinel is used only if optional arguments are supplied:

   // Declaration of vararg method; mandatory parameters only:
   .method public static vararg int32 Print(string Format)
   {
      ...
   }

   // Calling vararg method with two optional arguments:
   call vararg int32 Print(string, ..., int32, int32)

   // Calling vararg method without optional arguments:
   call vararg int32 Print(string)

The pinned modifier is applicable to the method’s local variables only. Its use means that the local variable cannot be relocated by the garbage collector and must stay put throughout the method execution. If a local variable is “pinned,” it is safe to convert a managed pointer to this variable to an unmanaged pointer and then to dereference this unmanaged pointer, because the unmanaged pointer is guaranteed to still be valid when it is dereferenced:

   .locals init(int32 A, int32 pinned B, int32* pA, int32* pB)
   ldloca A
   stloc pA      // pA = &A
   ldloca B
   stloc pB      // pB = &B

   ldloc pA
   ldc.i4 123
   stind.i4      // *pA=123; unsafe, A could have been moved
   ldloc pB
   ldc.i4 123
   stind.i4      // *pB=123; safe, B is pinned and cannot move

Native Types

When managed code calls unmanaged methods or exposes managed fields to unmanaged code, it is sometimes necessary to provide specific information about how the managed types should be marshaled to and from the unmanaged types. The unmanaged types recognizable by the common language runtime are referred to as native, and they are listed in CorHdr.h in the enumeration CorNativeType. All constants in this enumeration have names that begin with NATIVE_TYPE_; for purposes of this discussion, I have omitted this part of the names or abbreviated it as N_T_. The same constants are also listed in the .NET Framework class library in the enumeration System.Runtime.InteropServices.UnmanagedType.

Some of the native types are obsolete and are ignored by the runtime interoperability subsystem. But since these native types are not retired altogether, ILAsm must have ways to denote them—and since ILAsm denotes these types, I cannot help but list obsolete types along with others, all of which you’ll find in Table 7-6.

Table 7-6  Native Types Defined in the Runtime 

| Code | Constant Name | .NET Framework Type Name | ILAsm Notation | Comments |
|------|---------------|--------------------------|----------------|----------|
| 0x01 | VOID | | void | Obsolete and thus should not be used; recognized by ILAsm but ignored by the runtime interoperability subsystem |
| 0x02 | BOOLEAN | Bool | bool | 4-byte Boolean value; true = nonzero, false = 0 |
| 0x03 | I1 | I1 | int8 | Signed 1-byte integer |
| 0x04 | U1 | U1 | unsigned int8 | Unsigned 1-byte integer |
| 0x05 | I2 | I2 | int16 | Signed 2-byte integer |
| 0x06 | U2 | U2 | unsigned int16 | Unsigned 2-byte integer |
| 0x07 | I4 | I4 | int32 | Signed 4-byte integer |
| 0x08 | U4 | U4 | unsigned int32 | Unsigned 4-byte integer |
| 0x09 | I8 | I8 | int64 | Signed 8-byte integer |
| 0x0A | U8 | U8 | unsigned int64 | Unsigned 8-byte integer |
| 0x0B | R4 | R4 | float32 | 4-byte floating-point |
| 0x0C | R8 | R8 | float64 | 8-byte floating-point |
| 0x0D | SYSCHAR | | syschar | Obsolete |
| 0x0E | VARIANT | | variant | Obsolete |
| 0x0F | CURRENCY | Currency | currency | Currency value |
| 0x10 | PTR | | * | Obsolete; use native int |
| 0x11 | DECIMAL | | decimal | Obsolete |
| 0x12 | DATE | | date | Obsolete |
| 0x13 | BSTR | BStr | bstr | Unicode Visual Basic style string |
| 0x14 | LPSTR | LPStr | lpstr | Pointer to a zero-terminated ANSI string |
| 0x15 | LPWSTR | LPWStr | lpwstr | Pointer to a zero-terminated Unicode string |
| 0x16 | LPTSTR | LPTStr | lptstr | Pointer to a zero-terminated ANSI or Unicode string, depending on platform |
| 0x17 | FIXEDSYSSTRING | ByValTStr | fixed sysstring [<size>] | Fixed-system string of size <size> bytes; applicable to field marshaling only |
| 0x18 | OBJECTREF | | objectref | Obsolete |
| 0x19 | IUNKNOWN | IUnknown | iunknown | IUnknown interface pointer |
| 0x1A | IDISPATCH | IDispatch | idispatch | IDispatch interface pointer |
| 0x1B | STRUCT | Struct | struct | C-style structure, for marshaling the formatted managed types |
| 0x1C | INTF | Interface | interface | Interface pointer |
| 0x1D | SAFEARRAY | SafeArray | safearray <variant_type> | Safe array of type <variant_type> |
| 0x1E | FIXEDARRAY | ByValArray | fixed array [<size>] | Fixed-size array, of size <size> bytes |
| 0x1F | INT | IntPtr | int | Signed pointer-size integer |
| 0x20 | UINT | UIntPtr | unsigned int | Unsigned pointer-size integer |
| 0x21 | NESTEDSTRUCT | | nested struct | Obsolete; use struct |
| 0x22 | BYVALSTR | VBByRefStr | byvalstr | Visual Basic style string in a fixed-length buffer |
| 0x23 | ANSIBSTR | AnsiBStr | ansi bstr | ANSI Visual Basic style string |
| 0x24 | TBSTR | TBStr | tbstr | bstr or ansi bstr, depending on the platform |
| 0x25 | VARIANTBOOL | VariantBool | variant bool | 2-byte Boolean; true = -1, false = 0 |
| 0x26 | FUNC | FunctionPtr | method | Function pointer |
| 0x28 | ASANY | AsAny | as any | Object; type defined at run time |
| 0x2A | ARRAY | LPArray | <n_type> [<sizes>] | Fixed-size array of a native type <n_type> |
| 0x2B | LPSTRUCT | LPStruct | lpstruct | Pointer to a C-style structure |
| 0x2C | CUSTOMMARSHALER | CustomMarshaler | custom (<class_str>, <cookie_str>) | Custom marshaler |
| 0x2D | ERROR | Error | error | Maps int32 to VT_HRESULT |

The <sizes> parameter in the ILAsm notation for ARRAY, shown in Table 7-6, can be empty or can be formatted as <size> + <size_param_number>:

   <sizes> ::= <empty>
             | <size>
             | + <size_param_number>
             | <size> + <size_param_number>

If <sizes> is empty, the size of the native array is derived from the size of the managed array being marshaled.

The <size> parameter specifies the native array size in array items. The zero-based method parameter number <size_param_number> indicates which of the method parameters specifies the size of the native array. The total size of the native array is <size> plus the additional size specified by the method parameter that is indicated by <size_param_number>.
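The size rules for a native array can be summarized in a short Python sketch. It models only the arithmetic described above, not the marshaler itself: native_array_length is a hypothetical name, args stands for the values of the method parameters at the time of the call, and managed_array stands for the managed array being marshaled.

```python
def native_array_length(managed_array, args, size=None, size_param_number=None):
    """Compute the native array size, in items, from a <sizes> specification."""
    if size is None and size_param_number is None:
        # Empty <sizes>: take the size from the managed array itself.
        return len(managed_array)
    total = size or 0
    if size_param_number is not None:
        # Zero-based index of the method parameter holding the extra size.
        total += args[size_param_number]
    return total


data = [10, 20, 30]
call_args = (data, 5)  # hypothetical call: parameter 0 is the array, parameter 1 a count

print(native_array_length(data, call_args))                                  # 3
print(native_array_length(data, call_args, size=10))                         # 10
print(native_array_length(data, call_args, size_param_number=1))             # 5
print(native_array_length(data, call_args, size=10, size_param_number=1))    # 15
```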

A custom marshaler declaration (shown in Table 7-6) has two parameters, both of which are quoted strings. The <class_str> parameter is the name of the class representing the custom marshaler, using the string conventions of Reflection.Emit. The <cookie_str> parameter is an argument string (cookie) passed to the custom marshaler at run time. This string identifies the form of the marshaling required, and its notation is specific to the custom marshaler.

Variant Types

Variant types are defined in the enumeration VARENUM in the Wtypes.h file, which is distributed with Microsoft Visual Studio. Not all variant types are applicable as safe array types, according to Wtypes.h, but ILAsm provides notation for all of them nevertheless, as shown in Table 7-7. It might look strange, considering that variant types appear in ILAsm only in the context of safe array specification, but we should not forget that one of ILAsm’s principal applications is the generation of test programs, which contain known, preprogrammed errors.

Table 7-7  Variant Types Defined in the Runtime 

| Code | Constant Name | Applicable to Safe Array? | ILAsm Notation |
|------|---------------|---------------------------|----------------|
| 0x00 | VT_EMPTY | No | <empty> |
| 0x01 | VT_NULL | No | null |
| 0x02 | VT_I2 | Yes | int16 |
| 0x03 | VT_I4 | Yes | int32 |
| 0x04 | VT_R4 | Yes | float32 |
| 0x05 | VT_R8 | Yes | float64 |
| 0x06 | VT_CY | Yes | currency |
| 0x07 | VT_DATE | Yes | date |
| 0x08 | VT_BSTR | Yes | bstr |
| 0x09 | VT_DISPATCH | Yes | idispatch |
| 0x0A | VT_ERROR | Yes | error |
| 0x0B | VT_BOOL | Yes | bool |
| 0x0C | VT_VARIANT | Yes | variant |
| 0x0D | VT_UNKNOWN | Yes | iunknown |
| 0x0E | VT_DECIMAL | Yes | decimal |
| 0x10 | VT_I1 | Yes | int8 |
| 0x11 | VT_UI1 | Yes | unsigned int8 |
| 0x12 | VT_UI2 | Yes | unsigned int16 |
| 0x13 | VT_UI4 | Yes | unsigned int32 |
| 0x14 | VT_I8 | No | int64 |
| 0x15 | VT_UI8 | No | unsigned int64 |
| 0x16 | VT_INT | Yes | int |
| 0x17 | VT_UINT | Yes | unsigned int |
| 0x18 | VT_VOID | No | void |
| 0x19 | VT_HRESULT | No | hresult |
| 0x1A | VT_PTR | No | * |
| 0x1B | VT_SAFEARRAY | No | safearray |
| 0x1C | VT_CARRAY | No | carray |
| 0x1D | VT_USERDEFINED | No | userdefined |
| 0x1E | VT_LPSTR | No | lpstr |
| 0x1F | VT_LPWSTR | No | lpwstr |
| 0x24 | VT_RECORD | Yes | record |
| 0x40 | VT_FILETIME | No | filetime |
| 0x41 | VT_BLOB | No | blob |
| 0x42 | VT_STREAM | No | stream |
| 0x43 | VT_STORAGE | No | storage |
| 0x44 | VT_STREAMED_OBJECT | No | streamed_object |
| 0x45 | VT_STORED_OBJECT | No | stored_object |
| 0x46 | VT_BLOB_OBJECT | No | blob_object |
| 0x47 | VT_CF | No | cf |
| 0x48 | VT_CLSID | No | clsid |
| 0x1000 | VT_VECTOR | Yes | <v_type> vector |
| 0x2000 | VT_ARRAY | Yes | <v_type> [ ] |
| 0x4000 | VT_BYREF | Yes | <v_type> & |
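The last three entries of Table 7-7 take a <v_type> operand, which suggests that they modify a base variant type instead of standing alone. The Python sketch below assumes, as in COM's VARENUM, that they are bit flags combined with a base type by a bitwise OR and that the base type occupies the low 12 bits (the VT_TYPEMASK value 0x0FFF from Wtypes.h). The constants are copied from Table 7-7; the function split_variant_type is a hypothetical helper.

```python
VT_I4 = 0x03
VT_BSTR = 0x08
VT_VECTOR = 0x1000
VT_ARRAY = 0x2000
VT_BYREF = 0x4000
VT_TYPEMASK = 0x0FFF  # assumption: mask for the base type, as in Wtypes.h


def split_variant_type(vt: int):
    """Split a combined variant type into (base_type, [modifier names])."""
    base = vt & VT_TYPEMASK
    flags = [name
             for name, bit in (("vector", VT_VECTOR),
                               ("array", VT_ARRAY),
                               ("byref", VT_BYREF))
             if vt & bit]
    return base, flags


print(hex(VT_ARRAY | VT_I4))                   # 0x2003
print(split_variant_type(VT_ARRAY | VT_I4))    # (3, ['array'])
print(split_variant_type(VT_BYREF | VT_BSTR))  # (8, ['byref'])
```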



Inside Microsoft .NET IL Assembler
ISBN: 0735615470
Year: 2005
Pages: 147
Authors: SERGE LIDIN
