Data Marshaling

All thunks and wrappers provide data conversions between managed and unmanaged data types, which is referred to as marshaling. Marshaling information is kept in the FieldMarshal metadata table, which is described in Chapter 8, “Fields and Data Constants.” The marshaling information can be associated with Field and Param metadata records.

Blittable Types

One significant subset of managed data types directly corresponds to unmanaged types, requiring no data conversion on managed and unmanaged code boundaries. These types, which are referred to as blittable, include pointers (not references), function pointers, signed and unsigned integer types, and floating-point types. Formatted value types (the value types having sequential or explicit class layout) that contain only blittable elements are also blittable.

The nonblittable managed data types that might require conversion during marshaling because of different or ambiguous unmanaged representation are as follows:

  • bool (1-byte, true = 1, false = 0) can be converted either to native type bool (4-byte, true = 1, false = 0) or to variant bool (2-byte, true = 0xFFFF, false = 0).

  • char (Unicode character, unsigned 2-byte integer) can be converted either to int8 (an ANSI character) or to unsigned int16 (a Unicode character).

  • string (class System.String) can be converted either to an ANSI or a Unicode zero-terminated string (an array of characters) or to bstr (a Unicode Visual Basic style string).

  • object (class System.Object) can be converted either to a structure or to an interface pointer.

  • class can be converted either to an interface pointer or, if the class is a delegate, to a function pointer.

  • valuetype (nonblittable) is converted to a structure with a fixed layout.

  • An array and a vector can be converted to a safe array or a C-style array.
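
To make the ambiguity concrete, here is a minimal sketch of the two unmanaged representations of bool listed above. It is written in Python only to show the byte layouts. The function names are hypothetical, little-endian byte order is assumed, and the rule that any nonzero unmanaged value maps back to true is an assumption of the sketch, not a statement about the runtime's marshaler.

```python
import struct

def marshal_bool(value: bool, native_type: str) -> bytes:
    """Return little-endian unmanaged bytes for a managed bool (illustrative only)."""
    if native_type == "bool":
        # Native bool: 4 bytes, true = 1, false = 0
        return struct.pack("<i", 1 if value else 0)
    if native_type == "variant bool":
        # Variant bool: 2 bytes, true = 0xFFFF, false = 0
        return struct.pack("<H", 0xFFFF if value else 0)
    raise ValueError(f"unsupported native type: {native_type}")

def unmarshal_bool(data: bytes) -> bool:
    """Assumption of this sketch: any nonzero byte means true."""
    return any(data)
```

The same managed value therefore yields b"\x01\x00\x00\x00" in one case and b"\xff\xff" in the other, which is why the marshaler must be told which unmanaged type is expected.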

The references (managed pointers) are marshaled as unmanaged pointers. The managed objects and interfaces are references in principle, so they are marshaled as unmanaged pointers as well. Consequently, references to the objects and interfaces (class IFoo&) are marshaled as double pointers (IFoo**).

In/Out Parameters

The method parameter flags in and out can be (but are not necessarily) taken into account by the marshaler. When that happens, the marshaler can optimize the process by abandoning the marshaling in one direction. By default, parameters passed by reference (including references to objects but excluding the objects themselves) are presumed to be in/out parameters, whereas parameters passed by value (including the objects, even though managed objects are in principle references) are presumed to be in parameters. There are two exceptions to this rule. The [mscorlib]System.Text.StringBuilder class is always marshaled as in/out. Classes and arrays of blittable types that can be pinned can be marshaled in both directions even when passed by value, provided that the in and out flags are explicitly specified.

Considering that managed objects don’t necessarily stay in one place and can be moved any time the garbage collector does its job, it is vital to ensure that the arguments of an unmanaged call don’t wander around while the call is in progress. This can be accomplished in the following two ways:

  • Pin the object for the duration of the call, preventing the garbage collector from moving it. This is done for the instances of formatted, blittable classes that have fixed layout in memory, invariant to managed or unmanaged code.

  • Allocate some unmovable memory. If the parameter has an in flag, marshal the data from the argument to this unmovable memory. Call the method, passing this memory as the argument. If the parameter has an out flag, marshal this memory back to the original argument upon completion of the call.
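
The second approach can be sketched as follows. This is a Python model, not the runtime's implementation: a ctypes buffer stands in for the unmovable memory, a bytearray stands in for the managed argument, and call_with_copy and to_upper are hypothetical names introduced for illustration.

```python
import ctypes

def call_with_copy(managed: bytearray, native_fn, is_in: bool, is_out: bool) -> None:
    """Pass `managed` to native_fn through a separate native buffer."""
    size = len(managed)
    # Zero-filled native memory, separate from the managed argument
    buf = ctypes.create_string_buffer(size)
    if is_in:
        # in flag: copy the argument into the native memory before the call
        ctypes.memmove(buf, bytes(managed), size)
    native_fn(buf, size)
    if is_out:
        # out flag: copy the native memory back into the argument after the call
        managed[:] = buf.raw

def to_upper(buf, size) -> None:
    """Hypothetical stand-in for an unmanaged function that edits its buffer in place."""
    for i in range(size):
        buf[i] = buf[i].upper()
```

With both flags set, the argument b"abc" comes back as b"ABC". With only the in flag, the callee's changes are dropped. With only the out flag, the callee never sees the original contents. This is exactly the saving the marshaler makes when it abandons marshaling in one direction.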

The ILAsm syntax for explicit marshaling definition of fields and method parameters is described in Chapter 8 and in Chapter 9, “Methods.” Chapter 7, “Primitive Types and Signatures,” discusses the native types used in explicit marshaling definitions. Rather than reviewing that information here, let’s discuss some interesting marshaling cases instead.

String Marshaling

String marshaling is defined in at least three places: a string conversion flag of a TypeDef (ansi, unicode, or autochar), a similar flag of a P/Invoke implementation map, and, explicitly, in marshal( ) clauses.

As method arguments, managed strings (instances of the System.String class) can be marshaled as the following native types:

  • lpstr, a pointer to a zero-terminated ANSI string

  • lpwstr, a pointer to a zero-terminated Unicode string

  • lptstr, a pointer to a zero-terminated ANSI or Unicode string, depending on the platform

  • bstr, a Unicode Visual Basic style string with a prepended length

  • ansi bstr, an ANSI Visual Basic style string with a prepended length

  • tbstr, an ANSI or Unicode Visual Basic style string, depending on the platform

The COM wrappers marshal the string arguments as lpstr, lpwstr, or bstr only. Other unmanaged string types are not COM-compatible.
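
The difference between the zero-terminated and the length-prefixed forms is easiest to see in the bytes themselves. The sketch below is a Python illustration with hypothetical function names. It uses ASCII as a stand-in for the ANSI code page and lays out a bstr as a 4-byte byte count, the UTF-16 characters, and a terminating zero character. In a real bstr the pointer handed to the callee addresses the first character, with the length sitting just before it.

```python
import struct

def to_lpstr(s: str) -> bytes:
    """Zero-terminated single-byte string (ASCII stands in for ANSI here)."""
    return s.encode("ascii") + b"\x00"

def to_lpwstr(s: str) -> bytes:
    """Zero-terminated UTF-16 little-endian string."""
    return s.encode("utf-16-le") + b"\x00\x00"

def to_bstr(s: str) -> bytes:
    """Length-prefixed UTF-16 string: byte count, characters, terminator."""
    payload = s.encode("utf-16-le")
    # The prefix counts payload bytes and excludes the terminator
    return struct.pack("<I", len(payload)) + payload + b"\x00\x00"
```

For the string "Hi", to_lpstr gives 3 bytes, to_lpwstr gives 6 bytes, and to_bstr gives 10 bytes beginning with the length 4.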

At times, a string buffer must be passed to an unmanaged method in order to be filled with some particular contents. Passing a string by value does not work in this case because the called method cannot modify the string contents. Passing the string by reference does not initialize the buffer to the required length. The solution, then, is to pass not a string (an instance of System.String) but rather an instance of System.Text.StringBuilder, initialized to the required length:

.method public static pinvokeimpl("user32.dll" stdcall)
    int32 GetWindowText(int32 hndl,
                        class [mscorlib]System.Text.StringBuilder s,
                        int32 nMaxLen) { }

.method public static string GetWText(int32 hndl) {
   .locals init(class [mscorlib]System.Text.StringBuilder sb)
   ldc.i4 1024 // Buffer size
   newobj instance void
       [mscorlib]System.Text.StringBuilder::.ctor(int32)
   stloc.0
   ldarg.0     // Load hndl on stack
   ldloc.0     // Load StringBuilder instance on stack
   ldc.i4 1024 // Buffer size again
   call int32 GetWindowText(int32,
               class [mscorlib]System.Text.StringBuilder,
               int32)
   pop         // Discard the result
   ldloc.0     // Load StringBuilder instance (filled in) on stack
   call instance string
         [mscorlib]System.Text.StringBuilder::ToString()
   ret
}

The string fields of the value types are marshaled as lpstr, lpwstr, lptstr, bstr, or fixed sysstring[<size>], which is a fixed-length array of ANSI or Unicode characters, depending on the string conversion flag of the field’s parent TypeDef.
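
As a rough picture of what fixed sysstring means for the unmanaged structure, the following Python sketch declares two ctypes structures with an inline character array of a hypothetical size 8, one with 1-byte (ANSI) and one with 2-byte (Unicode) elements. The structure and field names are invented for illustration, and the zero padding of unused elements is simply what ctypes does, not a claim about the marshaler.

```python
import ctypes

NAME_LEN = 8  # hypothetical <size> of the fixed sysstring

class RecordAnsi(ctypes.Structure):
    # The string is stored inline as NAME_LEN one-byte characters
    _fields_ = [("id", ctypes.c_int32), ("name", ctypes.c_char * NAME_LEN)]

class RecordUnicode(ctypes.Structure):
    # The string is stored inline as NAME_LEN two-byte characters
    _fields_ = [("id", ctypes.c_int32), ("name", ctypes.c_uint16 * NAME_LEN)]

def make_unicode_record(rec_id: int, text: str) -> RecordUnicode:
    """Fill the inline array with UTF-16 code units; unused elements stay zero."""
    data = text.encode("utf-16-le")
    units = [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]
    if len(units) > NAME_LEN:
        raise ValueError("text does not fit into the fixed-length field")
    return RecordUnicode(rec_id, (ctypes.c_uint16 * NAME_LEN)(*units))
```

The structure size depends on the character width (12 bytes versus 20 bytes here), which is why the string conversion flag of the parent TypeDef matters for the layout.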

Object Marshaling

The fields and method parameters of an object type are marshaled as struct (converted to a COM-style variant), interface (converted to IDispatch if possible and otherwise to IUnknown), iunknown (converted to IUnknown), or idispatch (converted to IDispatch). The default marshaling is as struct.

When an object is marshaled as struct to a COM variant, the type of the variant can be explicitly set by those object types that implement the [mscorlib]System.IConvertible interface. The types that do not implement this interface are marshaled to and from variants as shown in Table 15-1. All listed types belong to the System namespace.

Table 15-1  Marshaling of Managed Objects to and from COM Variants

Type of Object Marshaled to Variant       COM Variant Type                Type of Object Marshaled from Variant
----------------------------------------  ------------------------------  --------------------------------------
Null reference                            VT_EMPTY                        Null reference
DBNull                                    VT_NULL                         DBNull
Runtime.InteropServices.ErrorWrapper      VT_ERROR                        UInt32
Reflection.Missing                        VT_ERROR with E_PARAMNOTFOUND   UInt32
Runtime.InteropServices.DispatchWrapper   VT_DISPATCH                     __ComObject or null reference if the variant value is null
Runtime.InteropServices.UnknownWrapper    VT_UNKNOWN                      __ComObject or null reference if the variant value is null
Runtime.InteropServices.CurrencyWrapper   VT_CY                           Decimal
Boolean                                   VT_BOOL                         Boolean
SByte                                     VT_I1                           SByte
Byte                                      VT_UI1                          Byte
Int16                                     VT_I2                           Int16
UInt16                                    VT_UI2                          UInt16
Int32                                     VT_I4                           Int32
UInt32                                    VT_UI4                          UInt32
Int64                                     VT_I8                           Int64
UInt64                                    VT_UI8                          UInt64
Single                                    VT_R4                           Single
Double                                    VT_R8                           Double
Decimal                                   VT_DECIMAL                      Decimal
DateTime                                  VT_DATE                         DateTime
String                                    VT_BSTR                         String
IntPtr                                    VT_INT                          Int32
UIntPtr                                   VT_UINT                         UInt32
Array                                     VT_ARRAY                        Array

If you wonder why, for example, System.Int16 and System.Boolean should be used instead of int16 and bool, respectively, I should remind you that our discussion concerns the conversion of the objects.

When a managed object is passed to unmanaged code by reference, the marshaler creates a new variant and copies the contents of the object reference into this variant. The unmanaged code is free to tinker with the variant contents, and these changes are propagated back to the referenced object when the method call is completed. If the type of the variant has been changed within the unmanaged code, the back-propagation of the changes can result in a change of the object type, so you might find yourself with a different type of object after the call. The same story happens (in reverse order) when unmanaged code calls a managed method, passing a variant by reference: the type of the variant can be changed during the call.

The variant can contain a pointer to its value rather than the value itself. (In this case, the variant has its type flag VT_BYREF set.) Such a “reference variant,” passed to the managed code by value, is marshaled to a managed object, and the marshaler automatically dereferences the variant contents and retrieves the actual value. Despite its reference type, the variant is nonetheless passed by value, so any changes made to the object in the managed code are not propagated back to the original variant.

If a “reference variant” is passed to the managed code by reference, it is marshaled to an object reference, with the marshaler dereferencing the variant contents and copying the value into a newly constructed managed object. But in this case, the changes made in the managed code are propagated back to the unmanaged code only if they did not lead to a change in the variant type. If the changes did affect the variant type, the marshaler throws an InvalidCastException.
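
The by-value and by-reference rules for variants can be modeled in a few lines. The sketch below is a Python illustration, not the marshaler's code: Variant, vt_of, pass_by_value, and pass_by_reference are hypothetical names, only two variant types are modeled, a one-element list stands in for the pointer held by a VT_BYREF variant, and TypeError stands in for InvalidCastException. The numeric VT_ constants are the real values from the COM headers.

```python
from dataclasses import dataclass

VT_I4, VT_BSTR, VT_BYREF = 3, 8, 0x4000

@dataclass
class Variant:
    vt: int
    value: object  # for VT_BYREF: a one-element list standing in for a pointer

def vt_of(obj) -> int:
    """Map a managed-side value to a base variant type (two types only)."""
    if isinstance(obj, int):
        return VT_I4
    if isinstance(obj, str):
        return VT_BSTR
    raise TypeError("type not modeled in this sketch")

def to_managed(v: Variant):
    """Dereference a reference variant; otherwise take the value directly."""
    return v.value[0] if v.vt & VT_BYREF else v.value

def pass_by_value(v: Variant, managed_fn) -> None:
    # The managed code works on a copy; nothing is written back
    managed_fn(to_managed(v))

def pass_by_reference(v: Variant, managed_fn) -> None:
    result = managed_fn(to_managed(v))
    if v.vt & VT_BYREF:
        # A reference variant accepts the change only if its type is unchanged
        if vt_of(result) != (v.vt & ~VT_BYREF):
            raise TypeError("stand-in for InvalidCastException")
        v.value[0] = result
    else:
        # A plain variant passed by reference may change its type
        v.vt, v.value = vt_of(result), result
```

Passing Variant(VT_I4 | VT_BYREF, [41]) by reference to a function that adds 1 leaves [42] behind, passing it by value leaves [41], and a function that returns a string triggers the error.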

Class Marshaling

Managed classes are always marshaled by COM wrappers as interfaces. Every managed class can be seen as implementing an implicit interface that contains all nonprivate members of the class.

When a type library is generated from an assembly, a class interface and a coclass are produced for each accessible managed class. The class interface is marked as a default interface for the coclass.

A CCW generated by the common language runtime for each instance of the exposed managed class also implements other interfaces not explicitly implemented by the class. In particular, a CCW automatically implements IUnknown and IDispatch.

When an interop assembly is generated from a type library, the coclasses of the type library are converted to the managed classes. The member sets of these classes are defined by the default interfaces of the coclasses.

An RCW generated by the runtime for a specific instance of a COM class represents this instance and not a specific interface exposed by this instance. Hence, an RCW must implement all interfaces exposed by the COM object. This means that the identity of the COM object itself must be determined by one of its interfaces because COM objects are not passed as method arguments but their interfaces are. In order to do this, the runtime queries the passed interface for IProvideClassInfo2. If this interface is unavailable, the runtime queries the passed interface for IProvideClassInfo. If either of the interfaces is available, the runtime obtains the CLSID (class identifier) of the COM class exposing the interface (by calling the IProvideClassInfo2::GetGUID() or IProvideClassInfo::GetClassInfo() method) and uses it to retrieve full information about the COM class from the registry. If this action sequence fails, the runtime instantiates a generic wrapper, System.__ComObject.
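
The lookup sequence just described can be summarized in a short sketch. This is a Python model under loud assumptions: the COM object is reduced to a dictionary from interface name to a callable that returns a CLSID string, the registry is a dictionary from CLSID to a managed class name, and resolve_wrapper_class is a hypothetical function, not a runtime API.

```python
GENERIC_WRAPPER = "System.__ComObject"

def resolve_wrapper_class(interfaces: dict, registry: dict) -> str:
    """Pick the managed class for an RCW, falling back to the generic wrapper."""
    # Prefer IProvideClassInfo2; fall back to IProvideClassInfo
    get_clsid = interfaces.get("IProvideClassInfo2") or interfaces.get("IProvideClassInfo")
    if get_clsid is not None:
        # Use the CLSID to look up the class information
        managed_class = registry.get(get_clsid())
        if managed_class is not None:
            return managed_class
    # Any failure along the way ends with the generic wrapper
    return GENERIC_WRAPPER
```

An object exposing neither interface, or one whose CLSID is not registered, ends up wrapped in the generic class.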

Array Marshaling

Unmanaged arrays can be either C-style arrays of fixed or variable length or COM-style safe arrays. Both kinds of arrays are marshaled to managed vectors, with the unmanaged element type of the array marshaled to the respective managed element type of the vector. For example, a safe array of BSTR is marshaled to string[].

The rank and bound information carried by a safe array is lost in the transition. If this information is vital for correct interfacing, manual intervention is required again: the interop assembly produced from the COM type library must be disassembled, the array definitions must be manually edited, and the assembly must be reassembled. For example, if a three-dimensional safe array of BSTR is marshaled as string[], the respective type must be manually edited to string[0...,0...,0...] in order to restore the rank of the array.

C-style arrays can have either a fixed length or a length specified by another parameter of the method. Both values, the length and the length parameter’s zero-based ordinal, can be specified for the marshaler so that a vector of appropriate size can be allocated. The ILAsm syntax for specifying the array length is described in Chapter 7. For example:

// Fixed array length
.method public static pinvokeimpl("unmanaged.dll" stdcall)
   void Foo(string[] marshal(bstr[128]) StrArray) {}

// Array length is specified by arrLen (parameter #1)
.method public static pinvokeimpl("unmanaged.dll" stdcall)
   void Boo(string[] marshal(bstr[+1]) StrArray, int32 arrLen) {}

// Base length is 128, additional length specified by moreLen
.method public static pinvokeimpl("unmanaged.dll" stdcall)
   void Goo(int32 moreLen, string[] marshal(bstr[128+0]) StrArray) {}
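
How the two values combine into an element count can be shown with a little arithmetic. The Python sketch below is only an illustration of the comments in the listing above. The function marshaled_length is hypothetical, and it assumes that the effective length is the base length plus the value of the argument at the given zero-based ordinal, with a missing part counting as zero.

```python
def marshaled_length(base, size_param_index, args) -> int:
    """Element count for marshal(type[base + index]); either part may be None."""
    # Value of the length parameter, looked up by its zero-based ordinal
    extra = args[size_param_index] if size_param_index is not None else 0
    return (base or 0) + extra
```

For Foo the result is always 128. For Boo called with arrLen equal to 10 it is 10. For Goo called with moreLen equal to 5 it is 133.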

The managed vectors and arrays can be marshaled as safe arrays or as C-style arrays. Marshaling as safe arrays preserves the rank and boundary information of the managed arrays. This information is lost when the managed arrays are marshaled as C-style arrays. Vectors of vectors—for example, int32[][]—cannot be marshaled.

Delegate Marshaling

Delegates are marshaled as interfaces by COM wrappers and as unmanaged function pointers by P/Invoke thunks. The type library Mscorlib.tlb defines the _Delegate interface, which represents delegates in the COM world. This interface exposes the DynamicInvoke method, which allows the COM code to call a delegated managed method.

Marshaling a delegate as an unmanaged function pointer represents a certain risk. Because the common language runtime does not count this as a live reference to the delegate, the delegate might be destroyed by the garbage collector before the call to the unmanaged method is completed. The calling managed code must take steps to ensure the delegate’s survival for the duration of the method call.



Inside Microsoft .NET IL Assembler
ISBN: 0735615470
EAN: 2147483647
Year: 2005
Pages: 147
Authors: SERGE LIDIN
