Thunks and Wrappers

The interoperation between managed and unmanaged code requires the common language runtime to build special interface elements that provide the target identification and necessary data conversion, or marshaling. These runtime-generated interface elements are referred to as thunks, or stubs, in interoperation with “traditional” unmanaged code; in COM interoperation, they are referred to as wrappers.

P/Invoke Thunks

In order to build a client thunk for managed code to call unmanaged code, the common language runtime needs the following information:

  • The name of the module exporting the unmanaged method—for example, Kernel32.dll

  • The exported method’s name or ordinal in the export table of the unmanaged module

  • Binary flags reflecting specifics of how the unmanaged method is called and marshaled

All these items constitute the metadata item known as an implementation map, discussed in the following section.

The binary flag values and the respective IL assembly language (ILAsm) keywords are as follows:

  • nomangle (0x0001)  The exported method’s name must be matched literally.

  • ansi (0x0002)  The method parameters of type string must be marshaled as ANSI zero-terminated strings unless explicitly specified otherwise.

  • unicode (0x0004)  The method parameters of type string must be marshaled as Unicode strings.

  • autochar (0x0006)  The method parameters of type string must be marshaled as ANSI or Unicode strings, depending on the underlying platform.

  • lasterr (0x0040)  The native method supports last-error querying through the Win32 API GetLastError.

  • winapi (0x0100)  The native method uses the calling convention standard for the underlying platform.

  • cdecl (0x0200)  The native method uses the C/C++ style calling convention, and the call stack is cleaned up by the caller.

  • stdcall (0x0300)  The native method uses the standard Win32 API calling convention, and the call stack is cleaned up by the callee.

  • thiscall (0x0400)  The native method uses the C++ member method (non-vararg) calling convention. The call stack is cleaned up by the callee, and on x86 the instance pointer (this) is passed in the ECX register rather than on the stack.

  • fastcall (0x0500)  The arguments are passed to the native method in registers when possible.
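
Because autochar (0x0006) is the combination of the ansi and unicode bits, and the calling conventions occupy consecutive values from 0x0100 to 0x0500, these flags are best read as bit fields rather than as independent bits. The following Python sketch (Python serves only as a neutral illustration; the function and constant names are hypothetical) decodes a MappingFlags value into ILAsm keywords. The field masks 0x0006 and 0x0700 are inferred from the values listed above, not quoted from a specification.

```python
# Sketch: decode an ImplMap MappingFlags value into ILAsm keywords.
# Flag values come from the list above; the two field masks are
# inferred from those values (assumption, not quoted from a spec).
NOMANGLE = 0x0001
LASTERR = 0x0040
CHARSET_MASK = 0x0006    # ansi / unicode / autochar field
CALLCONV_MASK = 0x0700   # winapi ... fastcall field

CHARSET = {0x0002: "ansi", 0x0004: "unicode", 0x0006: "autochar"}
CALLCONV = {0x0100: "winapi", 0x0200: "cdecl", 0x0300: "stdcall",
            0x0400: "thiscall", 0x0500: "fastcall"}


def decode_mapping_flags(flags):
    """Return the ILAsm keywords encoded in a MappingFlags value."""
    keywords = []
    if flags & NOMANGLE:
        keywords.append("nomangle")
    charset = flags & CHARSET_MASK
    if charset:
        keywords.append(CHARSET[charset])
    if flags & LASTERR:
        keywords.append("lasterr")
    callconv = flags & CALLCONV_MASK
    if callconv:
        # 0x0600 and 0x0700 are not listed above and raise KeyError here.
        keywords.append(CALLCONV[callconv])
    return keywords


print(decode_mapping_flags(0x0344))  # ['unicode', 'lasterr', 'stdcall']
```

Testing the 0x0002 bit on its own would misreport autochar as ansi, which is why the sketch masks out the whole two-bit field before looking it up.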

The name of the exported method can be replaced with the method’s ordinal in the unmanaged module’s export table. The ordinal is specified as a decimal number, preceded by the # character—for example, #10.

If the specified name is a regular name rather than an ordinal, it is matched to the entries of the Export Name table of the unmanaged module. If the nomangle flag is set, the name is matched literally. Otherwise, things get more interesting.

Let’s suppose, for example, that the name is specified as Hello. If the strings are marshaled as ANSI and the Export Name table does not contain Hello, the P/Invoke mechanism tries to find HelloA. If the strings are marshaled as Unicode, the P/Invoke mechanism looks for HelloW first; only if HelloW is not found does P/Invoke look for Hello. If it still can’t find a match, it tries the mangled name _Hello@N, where N is the decimal representation of the size, in bytes, of the method’s destination buffer, that is, the buffer holding all method parameters. For example, if method Hello has two 4-byte parameters (either integer or floating-point), the mangled name is _Hello@8. Because this kind of name mangling is characteristic only of stdcall functions, the P/Invoke mechanism will not find the exported method if the calling convention is different and the name is mangled in some other way.
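
The probing order can be modeled in a few lines. This Python sketch is a simplified model, not the runtime's implementation: the export table is a plain dictionary from name to ordinal, autochar is assumed to be already resolved to ansi or unicode, and rounding each argument up to 4 bytes is an assumption based on the 32-bit x86 stack rather than something stated above. The ordinal form (#10) described earlier is handled as well.

```python
# Simplified model of P/Invoke export-name probing (not runtime code).
# exports: hypothetical dict mapping export name -> ordinal.
# charset: "ansi" or "unicode" (autochar assumed already resolved).


def candidate_names(name, charset, arg_sizes, nomangle):
    """Yield the export names tried, in order."""
    if nomangle:
        yield name                  # matched literally, nothing else
        return
    if charset == "unicode":
        yield name + "W"
        yield name
    else:
        yield name
        yield name + "A"
    # stdcall decoration _Name@N. Assumption: on 32-bit x86 each
    # argument occupies a multiple of 4 bytes on the stack.
    n = sum((size + 3) // 4 * 4 for size in arg_sizes)
    yield f"_{name}@{n}"


def resolve(exports, name, charset="ansi", arg_sizes=(), nomangle=False):
    """Return the matched export name or ordinal, or None."""
    if name.startswith("#"):
        ordinal = int(name[1:])     # "#10" -> ordinal 10
        return ordinal if ordinal in exports.values() else None
    for candidate in candidate_names(name, charset, arg_sizes, nomangle):
        if candidate in exports:
            return candidate
    return None


exports = {"HelloW": 1, "_Hello@8": 2}
print(resolve(exports, "Hello", "unicode"))             # HelloW
print(resolve(exports, "Hello", "ansi", (4, 4)))        # _Hello@8
print(resolve(exports, "Hello", "ansi", (4, 4), True))  # None
print(resolve(exports, "#2"))                           # 2
```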

The thunk is perceived by the managed code as simply another method, and hence it must be declared as any method would be. The presence of the pinvokeimpl flag in the respective Method record signals to the runtime that this method is indeed a client thunk and not a true managed method. You have already encountered the following declaration of a P/Invoke thunk in Chapter 1, “Simple Sample”:

.method public static pinvokeimpl("msvcrt.dll" cdecl)
        vararg int32 sscanf(string,int8*) cil managed { }

The parameters within the parentheses of the pinvokeimpl clause represent the implementation map data. The string marshaling flag is not specified, and the marshaling defaults to ANSI. The method name need not be specified because it is the same as the declared thunk name. If you want to use sscanf but would rather call it Foo (sscanf is such a reptilian name!), you could declare the thunk as follows:

.method public static pinvokeimpl("msvcrt.dll" as "sscanf" cdecl)
        vararg int32 Foo(string,int8*) cil managed { }
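
A small hypothetical helper that assembles the clause text makes the optional parts explicit: the module name is required, the as "name" part is needed only when the exported name differs from the thunk name, and the flag keywords follow. The helper only produces text and assumes nothing about the ILAsm grammar beyond the two declarations above.

```python
def pinvokeimpl_clause(module, flags=(), import_name=None, method_name=None):
    """Build the text of an ILAsm pinvokeimpl(...) clause (hypothetical helper).

    The `as "..."` part is emitted only when the exported name differs
    from the managed method name, as in the sscanf/Foo examples.
    """
    parts = [f'"{module}"']
    if import_name is not None and import_name != method_name:
        parts.append(f'as "{import_name}"')
    parts.extend(flags)
    return "pinvokeimpl(" + " ".join(parts) + ")"


print(pinvokeimpl_clause("msvcrt.dll", ["cdecl"], "sscanf", "sscanf"))
# pinvokeimpl("msvcrt.dll" cdecl)
print(pinvokeimpl_clause("msvcrt.dll", ["cdecl"], "sscanf", "Foo"))
# pinvokeimpl("msvcrt.dll" as "sscanf" cdecl)
```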

Because the unmanaged method resides somewhere else and the thunk is generated by the runtime, the Method record of a “true” P/Invoke thunk has its RVA entry set to 0.

Implementation Map Metadata and Validity Rules

The implementation map metadata resides in the ImplMap metadata table. A record in this table has four entries:

  • MappingFlags (unsigned 2-byte integer)  Binary flags, which were described in the previous section. The validity mask is 0x0747.

  • MemberForwarded (coded token of type MemberForwarded) An index to the Method table, identifying the Method record of the P/Invoke thunk. This must be a valid index. The indexed method must have the pinvokeimpl and static flags set. The token type MemberForwarded can, in principle, index the Field table as well; but the first release of the common language runtime does not implement the P/Invoke mechanism for fields, and ILAsm syntax does not permit you to specify pinvokeimpl( ) in field definitions.

  • ImportName (offset in the #Strings stream)  The name of the unmanaged method as it is defined in the export table of the unmanaged module. The name must be nonempty and fewer than 1024 bytes long in UTF-8 encoding.

  • ImportScope (record index [RID] to the ModuleRef table)  The index of the ModuleRef record containing the name of the unmanaged module. It must be a valid RID.
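
The validity rules above can be expressed as a checking function. In this Python sketch the row layout follows the four entries just listed, but the Method table is replaced by a hypothetical dictionary of flag keywords, and ImportName is held as a string rather than as a #Strings offset. The MemberForwarded decoding (one tag bit: 0 for Field, 1 for MethodDef) follows the coded-index encoding of the ECMA-335 metadata specification and is background, not taken from this section.

```python
from dataclasses import dataclass

VALID_FLAGS_MASK = 0x0747


def decode_member_forwarded(coded):
    """Split a MemberForwarded coded index (ECMA-335: 1 tag bit,
    0 = Field, 1 = MethodDef) into (table, RID)."""
    return ("MethodDef" if coded & 1 else "Field"), coded >> 1


@dataclass
class ImplMapRow:
    mapping_flags: int
    member_forwarded: int   # coded index
    import_name: str        # in a real file: offset into #Strings
    import_scope: int       # RID into the ModuleRef table (1-based)


def validate(row, method_flags, moduleref_count):
    """Return a list of rule violations; empty means the row passes.

    method_flags: hypothetical dict of Method RID -> set of flag keywords.
    """
    errors = []
    if row.mapping_flags & ~VALID_FLAGS_MASK:
        errors.append("MappingFlags outside the validity mask 0x0747")
    table, rid = decode_member_forwarded(row.member_forwarded)
    if table != "MethodDef":
        errors.append("P/Invoke is not implemented for fields")
    elif rid not in method_flags:
        errors.append("MemberForwarded is not a valid Method index")
    elif not {"pinvokeimpl", "static"} <= method_flags[rid]:
        errors.append("indexed method lacks pinvokeimpl or static")
    if not 0 < len(row.import_name.encode("utf-8")) < 1024:
        errors.append("ImportName must be nonempty and under 1024 bytes")
    if not 1 <= row.import_scope <= moduleref_count:
        errors.append("ImportScope is not a valid ModuleRef RID")
    return errors


methods = {1: {"public", "static", "pinvokeimpl"}}
row = ImplMapRow(0x0202, (1 << 1) | 1, "sscanf", 1)  # ansi cdecl, Method RID 1
print(validate(row, methods, moduleref_count=1))     # []
```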

IJW Thunks

IJW thunks, similar in structure and function to “true” P/Invoke thunks, are created without the implementation map information. The information regarding the identity of the unmanaged method is not needed because the method is embedded in the same PE file and can be identified by its relative virtual address (RVA). IJW thunks cannot have an RVA value of 0, as opposed to P/Invoke thunks, which must have an RVA value of 0.

The calling convention of the unmanaged method is defined by the thunk signature rather than by the binary flags of the implementation map. The IJW thunk signature usually carries the modifier modopt or modreq, for example, modopt([mscorlib]System.Runtime.CompilerServices.CallConvCdecl). The string marshaling default is ansi.

To distinguish IJW thunks from P/Invoke thunks, the loader first looks at the implementation flags; IJW thunk declarations should have the flags native and unmanaged set. If the loader doesn’t see these flags, it presumes that this is a “true” P/Invoke thunk and tries to find its implementation map. If the map is not found, the loader realizes that this is an IJW thunk after all and proceeds accordingly. That’s why I said that the native and unmanaged flags should be set, rather than that they must be set: the loader will discover the truth even without these flags, but not before it tries to find the implementation map and fails.
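
The loader's decision sequence, together with the RVA rule given earlier, can be sketched as follows. This is a Python model of the described behavior, not loader code; impl_flags and has_impl_map are hypothetical stand-ins for the Method record and the ImplMap lookup. The returned lookup count shows the extra work incurred when native and unmanaged are omitted.

```python
def classify_thunk(impl_flags, has_impl_map, rva):
    """Classify a pinvokeimpl method following the loader sequence above.

    impl_flags: set of implementation-flag keywords (hypothetical model).
    has_impl_map: callable returning True if an ImplMap record exists.
    Returns (kind, map_lookups).
    """
    lookups = 0
    if {"native", "unmanaged"} <= impl_flags:
        kind = "IJW"                  # flags say so: no map search needed
    else:
        lookups += 1                  # presume P/Invoke and look for a map
        kind = "P/Invoke" if has_impl_map() else "IJW"
    # RVA rule: P/Invoke thunks must have RVA 0, IJW thunks must not.
    if (kind == "P/Invoke") != (rva == 0):
        raise ValueError(f"{kind} thunk with RVA {rva:#x}")
    return kind, lookups


print(classify_thunk({"native", "unmanaged", "preservesig"},
                     lambda: False, 0x106F))                  # ('IJW', 0)
print(classify_thunk({"preservesig"}, lambda: False, 0x106F))  # ('IJW', 1)
print(classify_thunk({"cil", "managed"}, lambda: True, 0))     # ('P/Invoke', 1)
```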

The following is a typical example of an IJW thunk declaration; it is a snippet from a disassembly of an MC++-generated mixed-code PE file:

.method public static pinvokeimpl(/* No map */)
        unsigned int32 _mainCRTStartup() native unmanaged preservesig
{
    .entrypoint
    .custom instance void [mscorlib]
        System.Security.SuppressUnmanagedCodeSecurityAttribute::.ctor()
        = ( 01 00 00 00 )
    // Embedded native code
    // Disassembly of native methods is not supported
    // Managed TargetRVA = 0x106f
}  // End of global method _mainCRTStartup

As you can see, a thunk can be declared as an entry point, and custom attributes and security attributes can be assigned to it. In these respects, a thunk has the same privileges as any other method.

As you can also see, neither the IL Disassembler nor ILAsm can handle the embedded native code. The mixed-code PE files, employing the IJW interoperation, cannot be round-tripped (disassembled and reassembled).

COM Callable Wrappers

Classic COM objects are allocated from the standard operating system heap and contain internal reference counters. The COM objects must self-destruct when they are not referenced any more—in other words, when their reference counters reach 0.

Managed objects are allocated from the common language runtime internal heap, which is controlled by the garbage collection subsystem (the GC heap). Managed objects don’t have internal reference counters. Instead, the runtime traces all the object references, and the GC automatically destroys unreferenced objects. But the references can be traced only if the objects are being referenced by managed code. Hence, it would be a bad idea to allow unmanaged COM clients to access managed objects directly.

Instead, for each managed object, the runtime creates a COM callable wrapper, which serves as a proxy for the object. Because a CCW is not subject to the GC mechanism, it can be referenced from unmanaged code without causing any ill effects.

In addition to lifetime control of the managed object, a CCW provides data marshaling for method calls and handles managed exceptions, converting them to HRESULT returns, which is standard for COM. If, however, a managed method is designed to return HRESULT (in the form of unsigned int32) rather than throw exceptions, it must have the implementation flag preservesig set. In this case, the method signature is exported exactly as defined.
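
As an analogy only, the exception-to-HRESULT conversion can be modeled with a Python decorator. Python functions stand in for managed methods, HResultError is a hypothetical exception carrying an HRESULT (much as .NET exceptions carry an HResult value), and E_FAIL is a placeholder for any other exception. The transformed signature, in which the return value moves to an out-parameter, is modeled as a tuple.

```python
S_OK = 0x00000000
E_FAIL = 0x80004005   # placeholder for exceptions that carry no HRESULT


class HResultError(Exception):
    """Hypothetical managed exception carrying an HRESULT."""
    def __init__(self, hresult):
        super().__init__(f"HRESULT {hresult:#010x}")
        self.hresult = hresult


def ccw_method(method):
    """Toy CCW for a method without preservesig: the COM caller gets an
    HRESULT, and the managed return value travels as an out-parameter
    (modeled here as the second item of a tuple)."""
    def wrapper(*args):
        try:
            return S_OK, method(*args)
        except HResultError as e:
            return e.hresult, None
        except Exception:
            return E_FAIL, None
    return wrapper


@ccw_method
def divide(a, b):
    return a // b


print(divide(7, 2))          # (0, 3)
print(hex(divide(1, 0)[0]))  # 0x80004005
```

A method marked preservesig would bypass this wrapping and hand its own HRESULT straight to the COM caller.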

The runtime carefully maintains a one-to-one relationship between a managed object and its CCW, not allowing an alternative CCW to be created. This guarantees that all interfaces of the same object relate to the same IUnknown and that the interface queries are consistent.
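
The two properties of a CCW described above, a reference count that decides whether the managed object must stay reachable and a strict one-to-one mapping between object and wrapper, can be sketched as follows. This is a toy Python model, not runtime code: CCW and get_ccw are hypothetical names, and the cache never evicts entries, so the sketch demonstrates identity and counting only, not garbage collection.

```python
class CCW:
    """Toy COM callable wrapper: a COM-style reference count in front of
    a managed object (Python objects stand in for both sides)."""

    def __init__(self, target):
        self.target = target
        self.refcount = 0

    def add_ref(self):
        self.refcount += 1
        return self.refcount

    def release(self):
        self.refcount -= 1
        return self.refcount

    @property
    def keeps_target_alive(self):
        # While COM clients hold references, the wrapper must keep the
        # managed object reachable, because the GC cannot see them.
        return self.refcount > 0


_ccw_by_object = {}   # id(managed object) -> its one and only CCW


def get_ccw(obj):
    """Return the single CCW for obj, creating it only on first request."""
    # id() is a shortcut for this sketch; the cached wrapper holds obj,
    # so its id cannot be reused while the entry exists.
    key = id(obj)
    if key not in _ccw_by_object:
        _ccw_by_object[key] = CCW(obj)
    return _ccw_by_object[key]


obj = object()
first, second = get_ccw(obj), get_ccw(obj)
print(first is second)   # True
first.add_ref()
second.add_ref()
print(first.refcount)    # 2
```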

Any CCW generated by the runtime implements IDispatch for late binding. For early binding, which is done directly through the v-table, the runtime must generate the type information in a form consumable by COM clients—namely, in the form of a COM type library. The Microsoft .NET Framework SDK includes the type library exporting utility TlbExp.exe, which generates an accompanying COM type library for any specified assembly. Another tool, RegAsm.exe, also included in the .NET Framework SDK, registers the types exposed by an assembly as COM classes and generates the type library.

When managed classes and their members are exposed to COM, their exposed names might differ from the originals. First, the type library exporters consider all names that differ only in case to be a single form—for example, Hello, hello, HELLO, and hElLo are exported as Hello. Second, classes are exported by name only, without the namespace part, except in the case of a name collision. If a collision exists—if, for example, an assembly has classes A.B.IHello and C.D.IHello defined—the classes are exported by their full names, with underscores replacing the dots: A_B_IHello, C_D_IHello.
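
The two renaming rules can be sketched in Python. Both functions are hypothetical illustrations; in particular, which spelling survives case unification is an assumption here (the first one encountered), since the text only shows that Hello, hello, HELLO, and hElLo end up as a single name.

```python
from collections import Counter


def unify_case(names):
    """Collapse names that differ only in case to a single spelling.
    Assumption: the first spelling encountered wins."""
    chosen = {}
    for name in names:
        chosen.setdefault(name.lower(), name)
    return [chosen[name.lower()] for name in names]


def export_class_names(full_names):
    """Map full managed class names to exported COM names: the simple
    name, or, on a collision, the full name with '.' replaced by '_'."""
    simple = [name.rsplit(".", 1)[-1] for name in full_names]
    counts = Counter(simple)
    return {full: (short if counts[short] == 1 else full.replace(".", "_"))
            for full, short in zip(full_names, simple)}


print(unify_case(["Hello", "hello", "HELLO", "hElLo"]))
# ['Hello', 'Hello', 'Hello', 'Hello']
print(export_class_names(["A.B.IHello", "C.D.IHello", "E.IWorld"]))
# {'A.B.IHello': 'A_B_IHello', 'C.D.IHello': 'C_D_IHello', 'E.IWorld': 'IWorld'}
```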

Other COM parameters characterizing the CCW for each class are defined by the COM interoperability custom attributes, listed in the section “Custom Attribute Classification” in Chapter 13, “Custom Attributes.” Because all information pertinent to exposing managed classes as COM servers is defined through custom attributes, ILAsm does not have any linguistic constructs specific to this aspect of the interoperation.

Runtime Callable Wrappers

A runtime callable wrapper is created by the common language runtime as a proxy of a classic COM object that the managed code wants to consume. The reasons for creating an RCW are roughly the same as those for creating a CCW: the managed objects know nothing about reference counting and expect their counterparts to belong to the GC heap. An RCW is allocated from the GC heap and caches the reference-counted interface pointers to a single COM object. In short, from the runtime point of view, an RCW is a “normal” managed server; and from the COM point of view, an RCW is a “normal” COM client. So everyone’s happy.

An RCW is created when a COM object is instantiated—for example, by a newobj instruction. There are two approaches to binding to the COM classes: early binding, which requires a so-called interop assembly, and late binding by name, which is performed through Reflection methods.

An interop assembly is a managed assembly either produced from a COM type library by means of running the utility TlbImp.exe (included in the .NET Framework SDK) or, at run time, produced by calling methods of the class [mscorlib]System.Runtime.InteropServices.TypeLibConverter. From the point of view of the managed code, the interop assembly is simply another assembly, all classes of which happen to carry the import flag. This flag is the signal for the runtime to instantiate an RCW every time it is commanded to instantiate such a class.

Late binding through Reflection works in much the same way as IDispatch does, but it has nothing to do with the interface itself. The COM classes that implement IDispatch can be early-bound as well. Neither is late binding restricted to imported classes only. “Normal” managed types can also be late-bound by using the same mechanism.

Late binding is achieved by consecutive calls to the [mscorlib]System.Type::GetTypeFromProgID and [mscorlib]System.Activator::CreateInstance methods, followed when necessary by calls to the [mscorlib]System.Type::InvokeMember method. For example, if you want to instantiate a COM class Bar residing in the COM library Foo.dll and then call its Baz method, which takes no arguments and returns an integer, you could write the following code:

.locals init (class [mscorlib]System.Type Typ,
              object Obj,
              int32 Ret)
// Typ = Type::GetTypeFromProgID("Foo.Bar");
ldstr "Foo.Bar"
call class [mscorlib]System.Type
     [mscorlib]System.Type::GetTypeFromProgID(string)
stloc Typ
// Obj = Activator::CreateInstance(Typ);  (a static method)
ldloc Typ
call object [mscorlib]System.Activator::CreateInstance(
     class [mscorlib]System.Type)
stloc Obj
// Ret = (int)Typ->InvokeMember("Baz", BindingFlags::InvokeMethod,
//                              NULL, Obj, NULL);
ldloc Typ
ldstr "Baz"
ldc.i4 0x100   // System.Reflection.BindingFlags::InvokeMethod
ldnull         // Reflection.Binder: not needed
ldloc Obj
ldnull         // Parameter array: not needed
callvirt instance object [mscorlib]System.Type::InvokeMember(string,
              valuetype [mscorlib]System.Reflection.BindingFlags,
              class [mscorlib]System.Reflection.Binder,
              object,
              object[])
unbox valuetype [mscorlib]System.Int32   // address of the boxed int32
ldind.i4                                 // load the int32 value itself
stloc Ret

An RCW converts the HRESULT returns of COM methods to managed exceptions. The only problem with this is that the RCW throws exceptions only for failing HRESULT values, so subtleties such as S_FALSE go unnoticed. The only way to deal with this situation is to set the implementation flag preservesig on the methods that might return S_FALSE and revert their signatures to the original form.
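
The S_FALSE problem is easy to see in a model of the RCW's conversion. In this Python sketch COMError, rcw_call, and the find method are hypothetical; failed() mirrors the C FAILED() macro, which treats an HRESULT as failing when its high (severity) bit is set.

```python
S_OK, S_FALSE = 0x00000000, 0x00000001
E_FAIL = 0x80004005


class COMError(Exception):
    """Stand-in for the exception an RCW throws on a failing HRESULT."""
    def __init__(self, hresult):
        super().__init__(f"HRESULT {hresult:#010x}")
        self.hresult = hresult


def failed(hr):
    # Like FAILED() in C: a 32-bit HRESULT fails if its high (severity)
    # bit is set. S_FALSE (1) is a success code.
    return (hr & 0x80000000) != 0


def rcw_call(com_method, *args):
    """Toy RCW call without preservesig: a failing HRESULT becomes an
    exception; any success code, S_OK or S_FALSE, is discarded."""
    hr, result = com_method(*args)
    if failed(hr):
        raise COMError(hr)
    return result


def find(key):  # hypothetical COM method returning (HRESULT, value)
    return (S_OK, 42) if key == "answer" else (S_FALSE, None)


print(rcw_call(find, "answer"))  # 42
print(rcw_call(find, "other"))   # None: S_FALSE was silently dropped
```

With preservesig set, the managed caller would receive the HRESULT itself and could tell S_OK from S_FALSE.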

Another problem arises when the COM method has a variable-length array as one parameter and the array length as another. The type library carries no information about which parameter is the length, and the runtime is thus unable to marshal the array correctly. In this case, the signature of the method must be modified to include explicit marshaling information.
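
The underlying issue is that a native pointer carries no element count. The following sketch uses Python's ctypes to stand in for the native side; marshal_array is a hypothetical helper mirroring the idea behind the SizeParamIndex field of MarshalAsAttribute. The exact ILAsm marshaling syntax for the edited signature is not shown here.

```python
import ctypes

# A native int array as an unmanaged callee might receive it:
# an `int *items` plus a separate `int count` parameter.
native = (ctypes.c_int * 4)(10, 20, 30, 40)
items = ctypes.cast(native, ctypes.POINTER(ctypes.c_int))

# The pointer itself carries no element count, so a marshaler cannot
# size the managed copy from it alone.


def marshal_array(args, array_index, size_param_index):
    """Copy a native array into a list, taking the element count from
    another argument (the idea behind MarshalAs's SizeParamIndex)."""
    count = args[size_param_index]
    return args[array_index][0:count]


print(marshal_array((items, 3), 0, 1))  # [10, 20, 30]
```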

Yet another problem requiring manual intervention involves unions with overlapped reference types. Perfectly legal in the unmanaged world, such unions are outlawed in managed code. Therefore, these unions are converted into value types with .pack and .size parameters specified but without the member fields.
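
Python's ctypes can show why only the size and alignment survive the conversion. NativeUnion is a hypothetical example of an integer overlapping a pointer (the pointer standing in for a reference type), and using the union's natural alignment as the .pack value is an assumption of this sketch.

```python
import ctypes


class NativeUnion(ctypes.Union):
    # An int overlapping a pointer: legal in C. If the pointer were a
    # managed object reference, the GC could not tell whether the
    # shared bytes currently hold a reference or a number.
    _fields_ = [("number", ctypes.c_int),
                ("pointer", ctypes.c_void_p)]


def opaque_layout(native_type):
    """Return (pack, size) for a field-less replacement type.
    Assumption: the union's natural alignment serves as the .pack value."""
    return ctypes.alignment(native_type), ctypes.sizeof(native_type)


pack, size = opaque_layout(NativeUnion)
Opaque = ctypes.c_ubyte * size   # same size, no named members
print(pack, size)                # values depend on the platform
```

The printed values depend on the platform's pointer size.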

The manual intervention mentioned usually involves disassembling the interop assembly, editing the text, and reassembling it. Because the interop assemblies don’t contain embedded native code, this operation can easily be performed.



Inside Microsoft .NET IL Assembler
ISBN: 0735615470
EAN: 2147483647
Year: 2005
Pages: 147
Authors: SERGE LIDIN
