Code Retention

The sample code presented in the previous chapter is tight. If you don’t believe me, carry out a simple experiment: write a similar application in your favorite high-level Microsoft .NET language, compile it to an executable—and make sure it runs!—disassemble the executable, and compare the result to the sample offered here. Now let’s try to make the code tighter yet.

First, given what you now know about field mapping and value types as placeholders, we don’t need to continue employing this technique. If sscanf accepts string as the first argument, it can just as well accept string as the second argument too. Second, we can use (and discuss) certain “shortcuts” in the IL instruction set.

Let’s have a look at our simple sample with slight modifications (source file Simple1.il). The portions of interest are marked with the comment CHANGE!.

//----------- Program header
.assembly extern mscorlib { }
.assembly OddOrEven  { }
.module OddOrEven.exe
//----------- Class declaration
.namespace Odd.or {
    .class public auto ansi Even extends [mscorlib]System.Object {
//----------- Field declaration
        .field public static int32 val
//----------- Method declaration
        .method public static void check( ) cil managed {
            .entrypoint
            .locals init (int32 Retval)
        AskForNumber:
            ldstr "Enter a number"
            call void [mscorlib]System.Console::WriteLine(string)
            call string [mscorlib]System.Console::ReadLine()
            ldstr "%d" // CHANGE!
            ldsflda int32 Odd.or.Even::val
            call vararg int32 sscanf(string,string,...,int32*) // CHANGE!
            stloc.0 // CHANGE!
            ldloc.0 // CHANGE!
            brfalse.s Error  // CHANGE!
            ldsfld int32 Odd.or.Even::val
            ldc.i4.1  // CHANGE!
            and
            brfalse.s ItsEven  // CHANGE!
            ldstr "odd!"
            br.s PrintAndReturn  // CHANGE!
        ItsEven:
            ldstr "even!"
            br.s PrintAndReturn  // CHANGE!
        Error:
            ldstr "How rude!"
        PrintAndReturn:
            call void [mscorlib]System.Console::WriteLine(string)
            ldloc.0  // CHANGE!
            brtrue.s AskForNumber  // CHANGE!
            ret
        } // End of method
    } // End of class
} // End of namespace
//----------- Calling unmanaged code
.method public static pinvokeimpl("msvcrt.dll" cdecl)
    vararg int32 sscanf(string,string) cil managed { }

The program header, class declaration, field declaration, and method header look exactly the same. The first change comes within the method body, where the loading of the address of the global field Format is replaced with the loading of a metadata string constant, ldstr "%d". As noted earlier, we can abandon defining and using an ANSI string constant as the second argument of the call to sscanf in favor of using a metadata string constant (internally represented in Unicode), relying on the marshaling mechanism provided by P/Invoke to do the necessary conversion work.
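To make "the necessary conversion work" a little more concrete, here is a minimal Python sketch of the byte-level idea. It is not the P/Invoke marshaler itself. The helper name to_ansi_c_string and the choice of code page cp1252 are assumptions made for illustration; the real marshaler uses the system's ANSI code page.

```python
# Conceptual sketch only: this models the conversion, not the CLR marshaler.

def to_ansi_c_string(text, code_page="cp1252"):
    """Return a null-terminated single-byte copy of text.

    The code page is an assumption; the real marshaler uses the
    system ANSI code page.
    """
    return text.encode(code_page) + b"\x00"

fmt = "%d"
as_unicode = fmt.encode("utf-16-le")  # two bytes per character (UTF-16)
as_ansi = to_ansi_c_string(fmt)       # what a char* consumer such as sscanf expects

print(as_unicode.hex(" "))
print(as_ansi.hex(" "))
```

The point of the sketch is only that the two representations differ, so something has to translate between them on the way to the unmanaged function.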

Because we are no longer using an ANSI string constant, the declarations of the global field Format, the placeholder value type used as the type of this field, and the data to which the field was mapped are omitted. As you’ve undoubtedly noticed, there is no need to explicitly declare a metadata string constant in IL assembly language (ILAsm)—the mere mention of such a constant in the source code is enough for the ILAsm compiler to automatically emit this metadata item.

Having thus changed the nature of the second argument of our call to sscanf, we need to modify the signature of the sscanf P/Invoke thunk so that necessary marshaling can be provided. Hence the changes in the signature of sscanf, both in the method declaration and at the call site.

Another set of changes results from replacing the local variable loading/storing instructions ldloc Retval and stloc Retval with the instructions ldloc.0 and stloc.0, respectively. IL defines special operation codes for loading/storing the first four local variables on the list, numbered 0 to 3. We gain here because the canonical form of the instruction (ldloc Retval) compiles into the operation code (ldloc) followed by a 2-byte unsigned integer indexing the local variable (in this case 0), whereas the instructions ldloc.n compile into single operation codes.

You might also notice that all branching instructions (br, brfalse, brtrue) in the method check are replaced with the short forms of these instructions (br.s, brfalse.s, brtrue.s). A standard (long) form of an instruction compiles into an operation code followed by a 4-byte parameter (in the case of branching instructions, offset from the current position), whereas a short form compiles into an operation code followed by a 1-byte parameter. This limits the range of branching to maximums of 128 bytes backward and 127 bytes forward from the current point in the IL stream, but in this case we can safely afford to switch to short forms because our method is rather small.
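The size difference and the range limit can be sketched in a few lines of Python. The opcode bytes 0x38 (br) and 0x2B (br.s) are taken from the CIL opcode table; the function names fits_short and encode_br are hypothetical helpers, not part of any tool. The offset passed in is assumed to be already computed.

```python
import struct

BR, BR_S = 0x38, 0x2B  # long and short unconditional branch opcodes

def fits_short(offset):
    """True if the offset fits the signed 1-byte parameter of a short branch."""
    return -128 <= offset <= 127

def encode_br(offset, short):
    """Encode an unconditional branch as raw little-endian bytes."""
    if short:
        if not fits_short(offset):
            raise ValueError("target out of range for br.s: %d" % offset)
        return struct.pack("<Bb", BR_S, offset)  # opcode + signed 1-byte offset
    return struct.pack("<Bi", BR, offset)        # opcode + signed 4-byte offset

print(len(encode_br(10, short=True)), len(encode_br(10, short=False)))
```

Each branch switched to its short form therefore saves 3 bytes, at the cost of the range check that encode_br performs before packing.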

Short forms that take an integer or unsigned integer parameter are defined for several families of IL instructions, including those that load and store local variables and arguments, load integer constants, and branch. So even if we declare more than four local variables, we could still save a few bytes by using the instructions ldloc.s and stloc.s instead of ldloc and stloc, as long as the index of a local variable does not exceed 255.
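As a rough illustration of the three ways to load a local variable, the following Python sketch picks the most compact form for a given index and reports its encoded size in bytes. The function name ldloc_form is a hypothetical helper. The sizes assume a one-byte opcode for ldloc.n and ldloc.s and a two-byte opcode followed by a 2-byte index for the long form.

```python
def ldloc_form(index):
    """Return (mnemonic, size_in_bytes) for the most compact ldloc form."""
    if not 0 <= index <= 0xFFFF:
        raise ValueError("local index must fit in a 2-byte unsigned integer")
    if index <= 3:
        return ("ldloc.%d" % index, 1)   # dedicated single-byte opcode
    if index <= 255:
        return ("ldloc.s %d" % index, 2)  # opcode + 1-byte index
    return ("ldloc %d" % index, 4)        # 2-byte opcode + 2-byte index

for i in (0, 3, 4, 255, 256):
    print(ldloc_form(i))
```

The same three-tier choice applies to stloc, stloc.s, and stloc.n.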

The high-level language compilers, emitting the IL code, automatically estimate the ranges and choose whether a long form or a short form of the instruction should be used in each particular case. The ILAsm compiler, of course, does nothing of the sort. If you specify a long or short instruction, the compiler takes it at face value—you are the boss, and you are supposed to know better. But if you specify a short branching instruction and place the target label out of range, the ILAsm compiler will diagnose an error.

Once, a colleague of mine came to me complaining that the ILAsm compiler obviously could not compile the code the IL Disassembler (ILDASM) produced. The disassembler and the compiler are supposed to work in absolute concert, so I was quite startled by this discovery. A short investigation uncovered the grim truth. In an effort to work out a special method for automatic test program generation, my colleague was compiling the initial programs written in Visual C# .NET and Microsoft Visual Basic .NET, disassembling the resulting executables, inserting test-specific ILAsm segments, and reassembling the modified code into new executables. The methods in the initial executables, produced by Visual C# .NET and Visual Basic .NET compilers, were rather small, so the compilers were emitting the short branching instructions, which, of course, were shown in the disassembly as they were. And every time my colleague’s automatic utility inserted enough additional ILAsm code between a short branching instruction and its destination, the branching instruction, figuratively speaking, kissed its target label good-bye.

One more change to note in the sample: the instruction ldc.i4 1 was replaced with ldc.i4.1. The logic here is the same as in the case of replacing ldloc Retval with ldloc.0: using a shortcut operation code to get rid of a 4-byte integer parameter. The shortcuts ldc.i4.n exist for n from 0 to 8, and (-1) can be loaded using the operation code ldc.i4.m1. The short form of the ldc.i4 instruction—ldc.i4.s—works for the integers in the byte range (from -128 to 127).
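The rules in this paragraph amount to a small decision table, sketched below in Python. The function name ldc_i4_form is a hypothetical helper; the sizes assume a one-byte opcode followed by no parameter, a 1-byte parameter, or a 4-byte parameter.

```python
def ldc_i4_form(value):
    """Return (mnemonic, size_in_bytes) for the most compact way to load value."""
    if not -2**31 <= value < 2**31:
        raise ValueError("value does not fit in int32")
    if value == -1:
        return ("ldc.i4.m1", 1)               # dedicated opcode for -1
    if 0 <= value <= 8:
        return ("ldc.i4.%d" % value, 1)       # dedicated opcodes for 0..8
    if -128 <= value <= 127:
        return ("ldc.i4.s %d" % value, 2)     # opcode + 1-byte parameter
    return ("ldc.i4 %d" % value, 5)           # opcode + 4-byte parameter

for v in (1, -1, 9, -128, 128):
    print(ldc_i4_form(v))
```

For the constant 1 used in our sample, the sketch selects ldc.i4.1, which is one byte instead of five.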

Now copy the source file Simple1.il from the companion CD, compile it with the console command ilasm simple1 into an executable (Simple1.exe), and ensure that it runs exactly as Simple.exe does. Then disassemble both executables side by side, using console commands ildasm simple.exe /bytes and ildasm simple1.exe /bytes. (The /bytes option makes the disassembler show the actual byte values constituting the IL flow.) Find the check methods in the tree views of both instances of ILDASM, and double-click them to open disassembly windows, in which you can compare the two implementations of the same method to see whether the code retention worked.
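Before looking at the disassembly, you can estimate what to expect. The Python tally below counts the substitutions marked CHANGE! in the listing and multiplies each by the bytes it saves. It assumes that the earlier version of the method used the long form at every such site, as described above. The names SUBSTITUTIONS and bytes_saved are hypothetical. The replacement of ldsflda with ldstr is left out because both are an opcode followed by a 4-byte token, so the method body size is unaffected by that change.

```python
# (description, long_size, short_size, occurrences in method check)
SUBSTITUTIONS = [
    ("stloc Retval -> stloc.0", 4, 1, 1),
    ("ldloc Retval -> ldloc.0", 4, 1, 2),
    ("br/brfalse/brtrue -> short forms", 5, 2, 5),
    ("ldc.i4 1 -> ldc.i4.1", 5, 1, 1),
]

def bytes_saved(substitutions):
    """Sum the per-site savings over all occurrences."""
    return sum((long_size - short_size) * count
               for _, long_size, short_size, count in substitutions)

total = bytes_saved(SUBSTITUTIONS)
print(total)  # 28
```

If the byte dumps of the two check methods differ by some other amount, that is a hint that one of the assumptions above does not hold for your copy of the sources.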



Inside Microsoft .NET IL Assembler
ISBN: 0735615470
EAN: 2147483647
Year: 2005
Pages: 147
Authors: SERGE LIDIN
