Defining Literal Binary Data

In this section I'm going to show you a useful technique in IL that allows us to embed hard-coded binary data in an assembly. Such data is specifically placed in a section of the PE file known as the .sdata section - it's not placed with the module's metadata. (All assemblies follow the PE file format - we'll examine this issue in Chapter 4.) In general, this technique is most useful in your own IL code for arbitrary binary data (blobs). However, to demonstrate the technique, I'm going to use embedded native strings as the data, and develop the previous PInvoke() sample into an application that displays a message box, but passes in unmanaged ANSI strings instead of relying on the platform invoke mechanism to marshal managed strings.

In general, embedding unmanaged strings in an assembly is not a technique I'd recommend if you are writing your own IL code, because it will make your code more complex and therefore harder to debug. In some situations, doing this can give a marginal performance improvement: for example, if you are passing ANSI strings to unmanaged code, storing the strings as ANSI strings saves the cost of the Unicode-to-ANSI marshaling conversion. In any case, using embedded strings saves the initial string copy involved in a ldstr. The performance benefit is likely to be marginal, though, and in most cases code robustness is more important. On the other hand, if you port unmanaged C++ code to the CLR, the C++ compiler will need to use this technique in order to cope with any unmanaged strings in the original C++ code. (Since this is compiler-generated code, maintainability is not an issue - the code will definitely be correct!) It's therefore quite likely that if you work with C++, you will see embedded unmanaged strings in your assembly - which is why I've chosen strings as the data type to illustrate the embedded blob technique.

To embed and access the data requires a few new keywords. The easiest way to see how to do it is to examine some of the code that we will use in the sample:

 .class public explicit sealed AnsiString extends [mscorlib]System.ValueType
 {
    .size 13
 }
 .data HelloWorldData = bytearray(48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 0)
 .field public static valuetype Wrox.AdvDotNet.DataDemo.AnsiString HelloWorld at HelloWorldData

There are three stages here:

  • We define a placeholder value type that we can use to hold data. For this example, the new type is called AnsiString.

  • We reserve space for our data in the .sdata section of the PE file using the .data directive. The above code indicates that the name HelloWorldData will be used to refer to this data.

  • We declare a global variable called HelloWorld, of type AnsiString, and indicate that the location of this variable is in the .sdata section, at the HelloWorldData address. This is an unusual variable declaration: no new memory is allocated for the variable, nor is the variable initialized. We just indicate that this existing and pre-initialized block of memory is to be interpreted as forming the HelloWorld instance. This effectively means that the variable name, HelloWorld, has for all practical purposes simply become a reference into the .sdata section.

When we define the AnsiString type, we don't define any member fields or methods. Instead, we use the .size directive to indicate how many bytes each instance of the type should occupy:

 .class public explicit sealed AnsiString extends [mscorlib]System.ValueType
 {
    .size 13
 }

This means that the type is assumed to take up 13 bytes (and if we ever instantiate it without using the at keyword, the CLR will reserve 13 bytes of memory for it). We've chosen 13 because that's how many bytes are needed to hold the string "Hello, World" as a C-style string - 12 characters plus a terminating zero.

The .data directive, which initializes the block of memory in the .sdata section, specifies the data to be placed in that memory as a byte array:

 .data HelloWorldData = bytearray(48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 0) 

The bytearray keyword simply indicates that we are explicitly specifying the numerical value to be placed in each byte. The values are given in hexadecimal, though without the prefix 0x that you would normally expect. It might not look obvious that my bytearray contains "Hello, World", but I promise you it does. Notice the trailing zero, which native API functions will interpret as indicating the end of a string.
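If you'd rather not take the byte values on trust, a few lines of script can check them. The following is a minimal sketch in Python, not part of the sample itself; the helper names to_il_bytearray and from_il_bytearray are hypothetical, and it assumes the string contains only ASCII characters.

```python
def to_il_bytearray(text: str) -> str:
    """Format text as a bytearray(...) list: one hex value per byte,
    no 0x prefix, followed by the terminating zero."""
    raw = text.encode("ascii") + b"\x00"
    return "bytearray(" + " ".join(format(b, "x") for b in raw) + ")"


def from_il_bytearray(directive: str) -> bytes:
    """Decode the hex values inside bytearray(...) back into raw bytes."""
    inner = directive[directive.index("(") + 1 : directive.rindex(")")]
    return bytes(int(token, 16) for token in inner.split())


encoded = to_il_bytearray("Hello, World")
decoded = from_il_bytearray("bytearray(48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 0)")

print(encoded)       # the list to paste after ".data HelloWorldData ="
print(decoded)       # the raw bytes, including the trailing zero
print(len(decoded))  # must match the .size of the placeholder type
```

Decoding the list used in this section gives the 12 characters of "Hello, World" followed by a single zero byte, 13 bytes in all, which is where the .size 13 comes from.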

Now we can present the code for the sample. First here's our new definition of the MessageBox() wrapper:

 .method public static pinvokeimpl("user32.dll" winapi) int32 MessageBox(
      native int hWnd, int8* text, int8* caption, unsigned int32 type)
 {
 }

This is what I think is one of the amazing things about P/Invoke: it's the same underlying native method, but a different definition for our wrapper - and it will still marshal over correctly. Instead of defining the second and third parameters as strings, we've defined them as int8* - our first ever use of an unmanaged pointer in IL in this book. Why int8*? Because int8 is the equivalent of an unmanaged ANSI character. C-style strings are, as we've said, basically pointers to sets of these characters, so what the native MessageBoxA() method is expecting is actually a pointer to an 8-bit integer (the first character of the string).
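To make the "pointer to an 8-bit integer" idea concrete, here is a small sketch using Python's ctypes module. It is only an illustration of how native code walks a C-style string; the names blob, text_ptr, and read_c_string are hypothetical, and nothing here calls the real MessageBox() function.

```python
import ctypes

# Stand-in for the embedded data: 12 characters plus a terminating zero.
# create_string_buffer appends the zero byte for us.
blob = ctypes.create_string_buffer(b"Hello, World")

# Reinterpret the buffer as a pointer to 8-bit integers (an int8*).
text_ptr = ctypes.cast(blob, ctypes.POINTER(ctypes.c_int8))


def read_c_string(ptr) -> bytes:
    """Read bytes starting at ptr until the terminating zero is reached."""
    out = bytearray()
    i = 0
    while ptr[i] != 0:
        out.append(ptr[i] & 0xFF)  # int8 is signed; mask back to 0..255
        i += 1
    return bytes(out)


print(ctypes.sizeof(blob))      # total bytes, including the zero
print(text_ptr[0])              # the first character, as an integer
print(read_c_string(text_ptr))  # the characters before the zero
```

The pointer itself carries no length information: the receiving code simply starts at the first byte and keeps reading until it meets a zero.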

Now for the rest of the code. First the data definitions: we're actually defining two lots of data, represented by variables HelloWorld and Hello, to store the two strings that will be passed to MessageBox(). Notice that both sets of data are padded out to 13 bytes, the size of the AnsiString type:

 .namespace Wrox.AdvDotNet.DataDemo
 {
    .class public explicit sealed AnsiString
                                 extends [mscorlib]System.ValueType
    {
       .size 13
    }
    .data HelloWorldData = bytearray(48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 0)
    .data HelloData = bytearray(48 65 6c 6c 6f 0 0 0 0 0 0 0 0)
    .field public static valuetype Wrox.AdvDotNet.DataDemo.AnsiString
                                               HelloWorld at HelloWorldData
    .field public static valuetype Wrox.AdvDotNet.DataDemo.AnsiString Hello
                                                          at HelloData
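The padding is harmless because a C-style string ends at the first zero byte. The sketch below, in Python, pads a short string out to the fixed size and shows what native code would see; the names SIZE, pad_blob, and c_string_view are hypothetical helpers for illustration, and the strings are assumed to be plain ASCII.

```python
SIZE = 13  # assumed to match the .size of the AnsiString placeholder type


def pad_blob(text: str, size: int = SIZE) -> bytes:
    """Encode text as ASCII and zero-pad it to exactly size bytes."""
    raw = text.encode("ascii")
    if len(raw) + 1 > size:
        raise ValueError("no room left for the terminating zero")
    return raw.ljust(size, b"\x00")


def c_string_view(blob: bytes) -> bytes:
    """Return the bytes a C function would read: everything before the first zero."""
    return blob.split(b"\x00", 1)[0]


hello = pad_blob("Hello")
hello_world = pad_blob("Hello, World")

print(" ".join(format(b, "x") for b in hello))  # values for the HelloData list
print(len(hello), len(hello_world))             # both blobs have the same length
print(c_string_view(hello))                     # the extra zeros are never read
```

Both blobs come out at the same length, so one placeholder type can describe either of them, while the string that native code reads from each is still just the text before the first zero.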

Now for the Main() method:

    .method static void Main() cil managed
    {
       .maxstack 4
       .entrypoint
       ldc.i4.0
       ldsflda  valuetype Wrox.AdvDotNet.DataDemo.AnsiString HelloWorld
       ldsflda  valuetype Wrox.AdvDotNet.DataDemo.AnsiString Hello
       ldc.i4.1
       call     int32 MessageBox(native int, int8*, int8*, unsigned int32)
       pop
       ret
    }
 }

This looks similar to the Main() method for the previous sample to the extent that all we are doing is loading the four parameters for MessageBox() onto the evaluation stack then calling MessageBox(). The second and third parameters are different though - instead of loading strings using ldstr, we are loading the addresses of the two ANSI strings stored in the .sdata section. We do this using ldsflda. Recall that ldsflda loads the address of a static field, as a managed pointer. Since ldsflda needs no information other than the token supplied in the argument to identify the field, it doesn't pop any data off the stack.

When run, this sample will do exactly the same thing as the previous sample. On Windows 9x, the new sample will run slightly faster. However, there is a price (beyond the greater complexity of the code): there's no way that we are ever going to get this sample through a type-safety check. Just look at what we are doing. We load two items onto the stack as managed pointers to AnsiString structs, then we pass them off as if they were unmanaged int8*s when invoking the method! The JIT compiler will compile the code - it's valid code, since the type on the stack is defined at each point. But because we then interpret the types on the stack incorrectly in the call instruction, the code will fail type safety, so if you run peverify on the sample, you'll get quite a few type-safety failure messages.



Advanced .NET Programming
ISBN: 1861006292
Year: 2002
Pages: 124
