Data Marshaling
All thunks and wrappers provide data conversions between managed and unmanaged data types, a process referred to as marshaling. Marshaling information is kept in the FieldMarshal metadata table, which is described in Chapter 8, “Fields and Data Constants.” The marshaling information can be associated with Field and Param metadata records.
Blittable Types
One significant subset of managed data types directly corresponds to unmanaged types, requiring no data conversion on managed and unmanaged code boundaries. These types, which are referred to as blittable, include pointers (not references), function pointers, signed and unsigned integer types, and floating-point types. Formatted value types (the value types having sequential or explicit class layout) that contain only blittable elements are also blittable.
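As a minimal sketch, a formatted value type like the following would be blittable, because it has sequential layout and every field is itself blittable. The type name Point and its field names are hypothetical.

```il
// Hypothetical blittable value type: sequential layout, blittable fields only
.class public sequential ansi sealed Point
       extends [mscorlib]System.ValueType
{
  .field public int32 x
  .field public int32 y
  .field public float64 weight
}
```

Adding a field of type bool or string to such a type would make it nonblittable.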
The nonblittable managed data types that might require conversion during marshaling because of different or ambiguous unmanaged representation are as follows:
bool (1-byte, true = 1, false = 0) can be converted either to native type bool (4-byte, true = 1, false = 0) or to variant bool (2-byte, true = 0xFFFF, false = 0).
char (Unicode character, unsigned 2-byte integer) can be converted either to int8 (an ANSI character) or to unsigned int16 (a Unicode character).
string (class System.String) can be converted either to an ANSI or a Unicode zero-terminated string (an array of characters) or to bstr (a Unicode Visual Basic style string).
object (class System.Object) can be converted either to a structure or to an interface pointer.
class can be converted either to an interface pointer or, if the class is a delegate, to a function pointer.
valuetype (nonblittable) is converted to a structure with a fixed layout.
An array and a vector can be converted to a safe array or a C-style array.
The references (managed pointers) are marshaled as unmanaged pointers. The managed objects and interfaces are references in principle, so they are marshaled as unmanaged pointers as well. Consequently, references to the objects and interfaces (class IFoo&) are marshaled as double pointers (IFoo**).
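The choice between the alternative unmanaged representations listed above can be made explicitly with marshal( ) clauses. The following sketch assumes a hypothetical library unmanaged.dll and hypothetical function names. Only the native type names (bool, variant bool, int8, unsigned int16) are ILAsm keywords.

```il
// Hypothetical P/Invoke declarations choosing the unmanaged representation explicitly
.method public static pinvokeimpl("unmanaged.dll" stdcall)
    void SetFlags(bool marshal(bool) nativeFlag,           // 4-byte native bool
                  bool marshal(variant bool) comFlag) {}   // 2-byte variant bool

.method public static pinvokeimpl("unmanaged.dll" stdcall)
    void PutChars(char marshal(int8) ansiChar,             // ANSI character
                  char marshal(unsigned int16) wideChar) {} // Unicode character

// A managed reference is marshaled as an unmanaged pointer (int32& becomes int*)
.method public static pinvokeimpl("unmanaged.dll" stdcall)
    void Increment(int32& val) {}
```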
In/Out Parameters
The method parameter flags in and out can be (but are not necessarily) taken into account by the marshaler. When they are, the marshaler can optimize the process by skipping the marshaling in one direction. By default, parameters passed by reference (including references to objects but excluding the objects themselves) are presumed to be in/out parameters, whereas parameters passed by value (including the objects, even though managed objects are in principle references) are presumed to be in parameters. There are two exceptions to this rule. The [mscorlib]System.Text.StringBuilder class is always marshaled as in/out. Classes and arrays of blittable types, which can be pinned, can be marshaled in both directions even when passed by value, provided the in and out flags are explicitly specified.
Considering that managed objects don’t necessarily stay in one place and can be moved any time the garbage collector does its job, it is vital to ensure that the arguments of an unmanaged call don’t wander around while the call is in progress. This can be accomplished in the following two ways:
Pin the object for the duration of the call, preventing the garbage collector from moving it. This is done for the instances of formatted, blittable classes that have fixed layout in memory, invariant to managed or unmanaged code.
Allocate some unmovable memory. If the parameter has an in flag, marshal the data from the argument to this unmovable memory. Call the method, passing this memory as the argument. If the parameter has an out flag, marshal this memory back to the original argument upon completion of the call.
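The in and out flags are written in ILAsm as [in] and [out] before the parameter type. The declarations below are a sketch only: the library unmanaged.dll and the function names are hypothetical, and whether the marshaler actually skips one direction depends on the parameter type.

```il
// Data needs to travel to the unmanaged side only
.method public static pinvokeimpl("unmanaged.dll" stdcall)
    void Consume([in] int32& val) {}

// Data needs to travel back to the managed side only
.method public static pinvokeimpl("unmanaged.dll" stdcall)
    void Produce([out] int32& result) {}

// An array of a blittable type passed by value, explicitly marked for
// two-way marshaling
.method public static pinvokeimpl("unmanaged.dll" stdcall)
    void Fill([in][out] int32[] data, int32 count) {}
```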
The ILAsm syntax for explicit marshaling definition of fields and method parameters is described in Chapter 8 and in Chapter 9, “Methods.” Chapter 7, “Primitive Types and Signatures,” discusses the native types used in explicit marshaling definitions. Rather than reviewing that information here, let’s discuss some interesting marshaling cases instead.
String Marshaling
String marshaling is defined in at least three places: a string conversion flag of a TypeDef (ansi, unicode, or autochar), a similar flag of a P/Invoke implementation map, and, explicitly, in marshal( ) clauses.
As method arguments, managed strings (instances of the System.String class) can be marshaled as the following native types:
lpstr, a pointer to a zero-terminated ANSI string
lpwstr, a pointer to a zero-terminated Unicode string
lptstr, a pointer to a zero-terminated ANSI or Unicode string, depending on the platform
bstr, a Unicode Visual Basic style string with a prepended length
ansi bstr, an ANSI Visual Basic style string with a prepended length
tbstr, an ANSI or Unicode Visual Basic style string, depending on the platform
The COM wrappers marshal the string arguments as lpstr, lpwstr, or bstr only. Other unmanaged string types are not COM-compatible.
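The following sketch shows two of the places where string marshaling can be specified for a method: explicit marshal( ) clauses and the string conversion flag of the P/Invoke implementation map. The library unmanaged.dll and the function names are hypothetical.

```il
// Explicit marshal( ) clauses on the parameters
.method public static pinvokeimpl("unmanaged.dll" stdcall)
    void TakeAnsi(string marshal(lpstr) s) {}

.method public static pinvokeimpl("unmanaged.dll" stdcall)
    void TakeWide(string marshal(lpwstr) s) {}

.method public static pinvokeimpl("unmanaged.dll" stdcall)
    void TakeBstr(string marshal(bstr) s) {}

// No marshal( ) clause: the unicode flag of the implementation map
// selects the string conversion
.method public static pinvokeimpl("unmanaged.dll" unicode stdcall)
    void TakeDefault(string s) {}
```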
At times, a string buffer must be passed to an unmanaged method in order to be filled with some particular contents. Passing a string by value does not work in this case because the called method cannot modify the string contents. Passing the string by reference does not initialize the buffer to the required length. The solution, then, is to pass not a string (an instance of System.String) but rather an instance of System.Text.StringBuilder, initialized to the required length:
.method public static pinvokeimpl("user32.dll" stdcall)
    int32 GetWindowText(int32 hndl,
                        class [mscorlib]System.Text.StringBuilder s,
                        int32 nMaxLen) { }

.method public static string GetWText(int32 hndl)
{
  .locals init(class [mscorlib]System.Text.StringBuilder sb)
  ldc.i4 1024   // Buffer size
  newobj instance void [mscorlib]System.Text.StringBuilder::.ctor(int32)
  stloc.0
  ldarg.0       // Load hndl on stack
  ldloc.0       // Load StringBuilder instance on stack
  ldc.i4 1024   // Buffer size again
  call int32 GetWindowText(int32,
                           class [mscorlib]System.Text.StringBuilder,
                           int32)
  pop           // Discard the result
  ldloc.0       // Load StringBuilder instance (filled in) on stack
  call instance string [mscorlib]System.Text.StringBuilder::ToString()
  ret
}
The string fields of the value types are marshaled as lpstr, lpwstr, lptstr, bstr, or fixed sysstring[<size>], which is a fixed-length array of ANSI or Unicode characters, depending on the string conversion flag of the field’s parent TypeDef.
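For illustration, a value type with string fields might be declared as follows. The type and field names are hypothetical. The ansi flag on the type means that the fixed sysstring field would be treated as an array of ANSI characters.

```il
// Hypothetical value type with explicitly marshaled string fields
.class public sequential ansi sealed DeviceInfo
       extends [mscorlib]System.ValueType
{
  // Embedded fixed-length character array, 32 characters
  .field public string marshal(fixed sysstring[32]) Name
  // Pointer to a zero-terminated Unicode string
  .field public string marshal(lpwstr) Description
  .field public int32 Id
}
```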
Object Marshaling
The fields and method parameters of an object type are marshaled as struct (converted to a COM-style variant), interface (converted to IDispatch if possible and otherwise to IUnknown), iunknown (converted to IUnknown), or idispatch (converted to IDispatch). The default marshaling is as struct.
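The four options map directly to ILAsm native type keywords. The declaration below is a sketch with a hypothetical library and function name, shown only to illustrate the marshal( ) clauses.

```il
// Hypothetical declaration showing the four ways to marshal an object
.method public static pinvokeimpl("unmanaged.dll" stdcall)
    void TakeObjects(object marshal(struct) asVariant,       // COM-style variant
                     object marshal(interface) asInterface,  // IDispatch if possible, else IUnknown
                     object marshal(iunknown) asUnknown,     // IUnknown
                     object marshal(idispatch) asDispatch) {} // IDispatch
```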
When an object is marshaled as struct to a COM variant, the type of the variant can be explicitly set by those object types that implement the [mscorlib]System.IConvertible interface. The types that do not implement this interface are marshaled to and from variants as shown in Table 15-1. All listed types belong to the System namespace.
Type of object marshaled to variant | COM variant type | Type of object marshaled from variant |
Null reference | VT_EMPTY | Null reference |
DBNull | VT_NULL | DBNull |
Runtime.InteropServices.ErrorWrapper | VT_ERROR | UInt32 |
Reflection.Missing | VT_ERROR with E_PARAMNOTFOUND | UInt32 |
Runtime.InteropServices.DispatchWrapper | VT_DISPATCH | __ComObject or null reference if the variant value is null |
Runtime.InteropServices.UnknownWrapper | VT_UNKNOWN | __ComObject or null reference if the variant value is null |
Runtime.InteropServices.CurrencyWrapper | VT_CY | Decimal |
Boolean | VT_BOOL | Boolean |
SByte | VT_I1 | SByte |
Byte | VT_UI1 | Byte |
Int16 | VT_I2 | Int16 |
UInt16 | VT_UI2 | UInt16 |
Int32 | VT_I4 | Int32 |
UInt32 | VT_UI4 | UInt32 |
Int64 | VT_I8 | Int64 |
UInt64 | VT_UI8 | UInt64 |
Single | VT_R4 | Single |
Double | VT_R8 | Double |
Decimal | VT_DECIMAL | Decimal |
DateTime | VT_DATE | DateTime |
String | VT_BSTR | String |
IntPtr | VT_INT | Int32 |
UIntPtr | VT_UINT | UInt32 |
Array | VT_ARRAY | Array |
If you wonder why, for example, System.Int16 and System.Boolean should be used instead of int16 and bool, respectively, I should remind you that our discussion concerns the conversion of the objects.
When a managed object is passed to unmanaged code by reference, the marshaler creates a new variant and copies the contents of the object reference into this variant. The unmanaged code is free to tinker with the variant contents, and these changes are propagated back to the referenced object when the method call is completed. If the type of the variant has been changed within the unmanaged code, the back-propagation of the changes can result in a change of the object type, so you might find yourself with a different type of object after the call. The same story happens (in reverse order) when unmanaged code calls a managed method, passing a variant by reference: the type of the variant can be changed during the call.
The variant can contain a pointer to its value rather than the value itself. (In this case, the variant has its type flag VT_BYREF set.) Such a “reference variant,” passed to the managed code by value, is marshaled to a managed object, and the marshaler automatically dereferences the variant contents and retrieves the actual value. Despite its reference type, the variant is nonetheless passed by value, so any changes made to the object in the managed code are not propagated back to the original variant.
If a “reference variant” is passed to the managed code by reference, it is marshaled to an object reference, with the marshaler dereferencing the variant contents and copying the value into a newly constructed managed object. But in this case, the changes made in the managed code are propagated back to the unmanaged code only if they did not lead to a change in the variant type. If the changes did affect the variant type, the marshaler throws an InvalidCastException.
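The possible change of object type when a managed object is passed by reference can be checked from the managed side. The following is a rough sketch under stated assumptions: unmanaged.dll and its Tweak function are hypothetical, and Tweak is assumed to receive a pointer to a variant that it may overwrite with a value of a different type.

```il
// Hypothetical unmanaged function taking a variant by reference
.method public static pinvokeimpl("unmanaged.dll" stdcall)
    void Tweak(object& marshal(struct) v) {}

// Returns true if the object is no longer a boxed Int32 after the call
.method public static bool TypeChanged()
{
  .locals init(object obj)
  ldc.i4 42
  box [mscorlib]System.Int32
  stloc.0                       // obj = boxed Int32
  ldloca obj                    // Load address of obj (object&)
  call void Tweak(object&)
  ldloc.0
  isinst [mscorlib]System.Int32 // Null if obj is not a boxed Int32
  ldnull
  ceq                           // 1 if the type has changed
  ret
}
```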
Class Marshaling
Managed classes are always marshaled by COM wrappers as interfaces. Every managed class can be seen as implementing an implicit interface that contains all nonprivate members of the class.
When a type library is generated from an assembly, a class interface and a coclass are produced for each accessible managed class. The class interface is marked as a default interface for the coclass.
A CCW generated by the common language runtime for each instance of the exposed managed class also implements other interfaces not explicitly implemented by the class. In particular, a CCW automatically implements IUnknown and IDispatch.
When an interop assembly is generated from a type library, the coclasses of the type library are converted to the managed classes. The member sets of these classes are defined by the default interfaces of the coclasses.
An RCW generated by the runtime for a specific instance of a COM class represents this instance and not a specific interface exposed by this instance. Hence, an RCW must implement all interfaces exposed by the COM object. This means that the identity of the COM object itself must be determined by one of its interfaces because COM objects are not passed as method arguments but their interfaces are. In order to do this, the runtime queries the passed interface for IProvideClassInfo2. If this interface is unavailable, the runtime queries the passed interface for IProvideClassInfo. If either of the interfaces is available, the runtime obtains the CLSID (class identifier) of the COM class exposing the interface by calling the IProvideClassInfo2::GetGUID() or IProvideClassInfo::GetClassInfo() method, and uses it to retrieve full information about the COM class from the registry. If this action sequence fails, the runtime instantiates a generic wrapper, System.__ComObject.
Array Marshaling
Unmanaged arrays can be either C-style arrays of fixed or variable length or COM-style safe arrays. Both kinds of arrays are marshaled to managed vectors, with the unmanaged element type of the array marshaled to the respective managed element type of the vector. For example, a safe array of BSTR is marshaled to string[].
The rank and bound information carried by a safe array is lost in the transition. If this information is vital for correct interfacing, manual intervention is required again: the interop assembly produced from the COM type library must be disassembled, the array definitions must be manually edited, and the assembly must be reassembled. For example, if a three-dimensional safe array of BSTR is marshaled as string[], the respective type must be manually edited to string[0...,0...,0...] in order to restore the rank of the array.
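The manual edit might look roughly like this in the disassembled interop assembly. The interface name INameTable and the method name SetNames are hypothetical, and the exact flags emitted for an imported interface can differ.

```il
.class interface public abstract import INameTable
{
  // As produced from the type library, the rank is lost:
  //   SetNames([in] string[] marshal(safearray bstr) names)
  // After manual editing, the three-dimensional rank is restored:
  .method public newslot abstract virtual instance void
      SetNames([in] string[0...,0...,0...] marshal(safearray bstr) names)
      runtime managed internalcall { }
}
```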
C-style arrays can have either a fixed length or a length specified by another parameter of the method. Both values, the length and the length parameter’s zero-based ordinal, can be specified for the marshaler so that a vector of appropriate size can be allocated. The ILAsm syntax for specifying the array length is described in Chapter 7. For example:
// Fixed array length
.method public static pinvokeimpl("unmanaged.dll" stdcall)
    void Foo(string[] marshal(bstr[128]) StrArray) {}

// Array length is specified by arrLen (parameter #1)
.method public static pinvokeimpl("unmanaged.dll" stdcall)
    void Boo(string[] marshal(bstr[+1]) StrArray, int32 arrLen) {}

// Base length is 128, additional length specified by moreLen
.method public static pinvokeimpl("unmanaged.dll" stdcall)
    void Goo(int32 moreLen, string[] marshal(bstr[128+0]) StrArray) {}
The managed vectors and arrays can be marshaled as safe arrays or as C-style arrays. Marshaling as safe arrays preserves the rank and boundary information of the managed arrays. This information is lost when the managed arrays are marshaled as C-style arrays. Vectors of vectors—for example, int32[][]—cannot be marshaled.
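As a sketch of the two options, the same kind of managed data could be declared for marshaling as a safe array or as a C-style array. The function names are hypothetical, and unmanaged.dll is reused from the examples above.

```il
// Two-dimensional managed array marshaled as a safe array of 4-byte integers.
// Rank and bounds travel with the safe array.
.method public static pinvokeimpl("unmanaged.dll" stdcall)
    void TakeSafeArray(int32[0...,0...] marshal(safearray int32) matrix) {}

// Managed vector marshaled as a C-style array. The length must be supplied
// separately, here by the count parameter (parameter #1).
.method public static pinvokeimpl("unmanaged.dll" stdcall)
    void TakeCArray(int32[] marshal(int32[+1]) data, int32 count) {}
```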
Delegate Marshaling
Delegates are marshaled as interfaces by COM wrappers and as unmanaged function pointers by P/Invoke thunks. The type library Mscorlib.tlb defines the _Delegate interface, which represents delegates in the COM world. This interface exposes the DynamicInvoke method, which allows the COM code to call a delegated managed method.
Marshaling a delegate as an unmanaged function pointer represents a certain risk. Because the common language runtime does not count this as a live reference to the delegate, the delegate might be destroyed by the garbage collector before the call to the unmanaged method is completed. The calling managed code must take steps to ensure the delegate’s survival for the duration of the method call.
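One way to keep the delegate alive, sketched below, is to hold it in a local variable and call System.GC::KeepAlive after the unmanaged call returns. The delegate type Callback, the callback method OnEvent, and the unmanaged function RunWithCallback are hypothetical. This is one possible approach, not the only one.

```il
// Hypothetical delegate type
.class public sealed Callback extends [mscorlib]System.MulticastDelegate
{
  .method public specialname rtspecialname instance void
      .ctor(object obj, native int method) runtime managed { }
  .method public virtual instance int32
      Invoke(int32 code) runtime managed { }
}

// Hypothetical unmanaged function that calls back through a function pointer
.method public static pinvokeimpl("unmanaged.dll" stdcall)
    void RunWithCallback(class Callback cb) {}

// Managed method to be called back
.method public static int32 OnEvent(int32 code)
{
  ldarg.0
  ret
}

.method public static void CallUnmanaged()
{
  .locals init(class Callback cb)
  ldnull                         // Static method, so no target object
  ldftn int32 OnEvent(int32)
  newobj instance void Callback::.ctor(object, native int)
  stloc.0
  ldloc.0
  call void RunWithCallback(class Callback)
  ldloc.0                        // Use the delegate after the call so that
  call void [mscorlib]System.GC::KeepAlive(object) // it stays reachable
  ret
}
```

If the unmanaged code stores the function pointer for use after the call returns, the delegate must be kept reachable for longer, for example in a static field.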