Chapter 9: Basic Structures of Visual C++ .NET 2003 Inline Assembler | Visual C++ Optimization with Assembly Code


This chapter focuses on how to use the C++ .NET 2003 inline assembler to optimize applications. The inline assembler is a very effective tool for improving program performance, and Microsoft has included it in the development environment. This review of the inline assembler concerns only Intel Pentium processors; however, the technique of using an inline assembler can be successfully applied to other types of processors as well.

In the early stages of software and hardware development, the low-level facilities available in C allowed users to control a personal computer with high efficiency. The MS-DOS operating system permitted user applications to completely control a personal computer. The combination of assembly language and C gave developers new possibilities for writing high-efficiency programs.

The situation changed as the Windows operating system developed. A program could still use an assembler to control hardware directly, but only in Windows 95/98/Me. More advanced operating systems, such as Windows NT/2000/XP, substantially limit what user programs can do to control the operating system and the PC hardware directly.

The inline assemblers for high-level languages (HLLs) developed in the mid-1990s were perceived more as a tribute to the past than as a serious tool for software development. However, HLLs, despite their powerful libraries, did not generate code that was as efficient as necessary. A new generation of processors required new approaches to optimization. After Windows NT/2000/XP appeared, the problem of real-time applications became more acute. This prompted leading software vendors, such as Microsoft, Borland, IBM, and Intel, to improve the inline assemblers of their HLLs.

An inline assembler is also a very effective tool for loop optimization, data processing, and the implementation of high-efficiency mathematical calculations. Algorithms and stand-alone functions written in assembly language are widely used in Windows device drivers and system services. Note that modern HLL compilers (not only C++ .NET) do not use 100% of the latest processors' capabilities. A comprehensive use of a processor's capabilities can be achieved only with assembly language.

Discussions about whether an assembler is useful for developers who write their applications in an HLL have ceased. It has become clear that an assembler is an integral part of the toolset and one of the basic tools for improving the performance of HLL applications.

A comparison between an inline assembler and a stand-alone IA-32 assembler, such as MASM, does not settle the question of which is more effective. The advantage of a stand-alone assembler is that it minimizes the use of computer resources (memory and processor time). Separately assembled modules (object files) are also very useful when the same algorithms must be reused in other programs.

The C++ .NET inline assembler does not allow the user to create separate modules to be used in other applications, though a user can do this by writing dynamic link libraries (DLL) with the inline assembly functions included. The inline assembler is closely integrated with the development environment, which provides many advantages.

The inline assembler doesn't require separate assembly and link steps, which makes it more convenient than a stand-alone assembler. Inline assembly code can use any C++ variable or function name, which allows easy integration with C++ code. Assembly code can also be mixed with C++ statements, which allows the user to perform tasks that would be cumbersome or impossible in C++ alone.

The main disadvantage of this tool is its tight coupling with the C++ .NET compiler, which complicates debugging the application while tuning it for optimal performance.

The Visual C++ .NET development environment includes built-in support for assembly programming. Any assembler code included in a C++ program must be introduced by the _asm keyword; a group of instructions is enclosed in braces. The following code is a simple _asm block:

 _asm {
        mov EAX, val1
        sub EAX, EBX
      }

You can also write _asm at the beginning of each assembly instruction:

 _asm mov EAX, val1
 _asm sub EAX, EBX

Because the _asm keyword is also a statement separator, you can write several assembly instructions on the same line:

 _asm mov EAX, val1  _asm sub EAX, EBX

The examples above generate the same code, but the first style, with the _asm block enclosed in braces, has certain advantages. First, the braces clearly separate the assembly code from the C++ code and avoid repetition of the _asm keyword, which also prevents ambiguities. Second, when you want to put a C++ statement on the same line as the _asm block, you must enclose the block in braces. Finally, because the text in braces has the same format as MASM text, it is easy to cut text from existing MASM source files and paste it into a C++ source module.

The C++ .NET development environment offers a convenient way to use inline assembly code in macros. To write such a macro, follow these steps:

Enclose the _asm block in braces.
Write the _asm keyword at the beginning of each assembly instruction.
Use the old style for comments (/* the comment */) instead of the assembler style (;) or the one-line variant (// the comment).

The following code fragment illustrates how to create a macro that outputs a data byte to the printer's parallel port:

 #define PORTIO378 __asm      \
 /* Port output */            \
 {                            \
    __asm mov AL, 0x3         \
    __asm mov DX, 0x378       \
    __asm out DX, AL          \
 }

This macro can be rewritten on a single line as follows (note that the out instruction takes the port number in the 16-bit DX register):

 #define PORTIO378 {_asm mov AL, 0x3 _asm mov DX, 0x378 _asm out DX, AL}

A macro written in assembler can accept one or several parameters. Unlike an ordinary C++ macro, however, an assembly macro does not return a result, so it is impossible to use such macros in C++ expressions. Be careful when using assembly macros with parameters. For example, expanding such a macro inside a function declared as _fastcall can be complicated and can interfere with the results, because that function receives its own arguments in registers.

In C++, separate subroutines are called functions; they are explored in further detail below.

Programmers using MASM may ask: How does the C++ .NET development environment support syntax of MASM?

Many MASM constructions, such as the data definition directives DB, DW, DD, DQ, and DF, or the operators DUP and THIS, are not supported. The inline assembler also does not support the structure and record facilities STRUC, RECORD, WIDTH, and MASK.

Assembly operators such as LENGTH, SIZE, and TYPE have a limited use in applications developed in Visual C++ .NET. They cannot be combined with the DUP operator, because DB, DW, DD, DQ, and DF cannot be used for data definition. However, they can be used to determine the sizes of C++ variables as follows:

LENGTH returns the number of elements in an array, or 1 for non-array variables.
SIZE returns the total size of a C or C++ variable in bytes.
TYPE returns the size of a variable in bytes. If the variable is an array, this operator returns the size of a single array element.

For example, suppose that the program uses an array of 20 integers declared like this:

 int iarray[20];

The result of using these operators appears in Table 9.1.

Table 9.1: Correspondence of assembly operators to C++ operators
_asm operator	Analog in C++	Value
LENGTH iarray	sizeof (iarray) / sizeof (iarray [0])	20
SIZE iarray	sizeof (iarray)	80
TYPE iarray	sizeof (iarray [0])	4
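
The C++ column of Table 9.1 can be checked without any assembler. The following is a minimal sketch in portable C++; the helper names lengthOf, sizeOf, and typeOf are hypothetical and were chosen only to mirror the three operators. The values 80 and 4 in the table assume a 4-byte int, as on 32-bit Windows.

```cpp
#include <cstddef>

// Portable C++ counterparts of the inline-assembler operators in Table 9.1.
// The helper names are hypothetical; they are not part of any library.

// LENGTH: number of elements in the array.
template <typename T, std::size_t N>
std::size_t lengthOf(const T (&)[N]) { return N; }

// SIZE: total size of the array in bytes.
template <typename T, std::size_t N>
std::size_t sizeOf(const T (&)[N]) { return N * sizeof(T); }

// TYPE: size of a single array element in bytes.
template <typename T, std::size_t N>
std::size_t typeOf(const T (&)[N]) { return sizeof(T); }

// The array used in the text.
int iarray[20];
```

With a 4-byte int, lengthOf(iarray), sizeOf(iarray), and typeOf(iarray) give 20, 80, and 4, matching the table row by row.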

As in MASM, comments in an _asm block are separated from instructions by a semicolon, as shown in the following example:

 _asm {
        mov EAX, val1 ; Comment for the first line
        sub EAX, EBX  ; Comment for the second line
 }

Since inline-assembly instructions are interleaved with C++ statements, they can refer to structures and variables used in C++ .NET. That is why various C++ elements can be used in the _asm block:

Symbols, including labels, variables, and function names
Constants, including characters and strings
Macros and preprocessor instructions
Comments in the C and C++ styles (/* */ and //)
typedef names, normally used with PTR and TYPE operators or for accessing union or structure elements

Within an assembly block, you can write integer constants in either C++ or assembly-language notation. For example, the space character can be written as 0x20 or 20h.

Constants can also be defined with the #define directive. Such a definition is valid both in the _asm block and in the C++ program.

Before continuing our discussion of inline assembly tools, we will illustrate the details mentioned above. We will use an assembly macro to calculate the difference between two integers (Listing 9.1).

Listing 9.1: Using a macro with the inline assembler

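 // USE_ASM_MACRO_IN_C.cpp: Defines the entry point for the console
 // application.
 #include "stdafx.h"
 #include <stdio.h>

 // x1 = x1 - x2. There must be no space between the macro name and the
 // opening parenthesis; otherwise the parameters are not recognized.
 #define sub2(x1, x2) {_asm mov EAX, x1 _asm sub EAX, x2 _asm mov x1, EAX}

 int _tmain(int argc, _TCHAR* argv[])
 {
     int i1 = 357;
     int i2 = 672;
     sub2(i1, i2);
     printf("i1 - i2 = %d\n", i1);
     getchar();
     return 0;
 }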

The application window is shown in Fig. 9.1.

Fig. 9.1: Application window demonstrating how to use an assembly macro
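
The macro in Listing 9.1 expands to a brace-enclosed statement, which is why it can modify i1 but cannot appear inside an expression. The sketch below shows the same shape in portable C++, with the three _asm instructions replaced by a single C++ statement. SUB2 and sub2Value are hypothetical names introduced for this illustration only; the wrapper function is one possible way to obtain a value that can be used in expressions.

```cpp
// Portable stand-in for the sub2 macro of Listing 9.1: the same brace-enclosed
// shape, with the _asm instructions replaced by one C++ statement.
// Like the original, it is a statement and yields no value.
#define SUB2(x1, x2) { (x1) = (x1) - (x2); }

// Hypothetical wrapper: unlike the macro, a function returns a value,
// so it can be used inside C++ expressions.
inline int sub2Value(int x1, int x2)
{
    return x1 - x2;
}
```

Writing SUB2(i1, i2); with i1 = 357 and i2 = 672 leaves -315 in i1, whereas an expression such as sub2Value(i1, i2) + 1 is possible only with the function form.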

C++ operators have special features when they appear in assembly blocks. C++-specific operators cannot be applied directly, and some operators have a completely different meaning in the assembler than in C++. For example, in C++ the square brackets operator [] declares the size of an array or accesses an element by its index, and the index is scaled automatically by the element size. In the inline assembler, the same operator adds an unscaled byte offset to the address of a variable. Misunderstanding such operators leads to difficult-to-locate errors in the program.

The following example shows both the correct and incorrect use of square brackets. To illustrate this tip, we will develop a C++ .NET application.

We will place three Edit controls, a Button control, and three Label controls for the static text onto the main application form. The main program will include a button handler that contains the inline assembler code and displays the results of the calculations. We have an integer array with 5 elements. We need to replace the element with index 3 with the arbitrarily chosen integer value -115. The replacement will be implemented in two ways (in both cases, the _asm block is used).

We will use the CString variables necessary for formatting a result and displaying it in the edit boxes, and link those variables to the edit boxes. We will assign the iOrigin variable to Edit1 (label Original); iAsmCorr to Edit2 (label Correct); and iAsmWrong to Edit3 (label Wrong). When the button is pressed, the corresponding edit boxes are filled with the elements of the source array (Edit1), the elements of the correctly converted array (Edit2), and the elements of the incorrectly converted array (Edit3). The source code of the button handler is shown in Listing 9.2.

Listing 9.2: Using the square brackets in the _asm block through a button handler

 . . .
 #include <string.h>
 #define NUM_BYTES 4   // size of one int element, in bytes
 . . .
 void CUSING_OPERATORS_BRACKETSDlg::OnBnClickedButton1()
 {
   // TODO: Add your control notification handler code here.
   int arr[5] = {4, 0, 9, -7, 50};
   int arrw[5], arrc[5];
   memcpy(arrw, arr, NUM_BYTES * 5);   // copy to be modified incorrectly
   memcpy(arrc, arr, NUM_BYTES * 5);   // copy to be modified correctly
   int* parr = arr;
   int isize = sizeof(arr) / 4;        // number of elements
   int cnt;
   CString stmp;
   stmp.Empty();
   for (cnt = 0; cnt < isize; cnt++)
   {
     stmp.Format("%d", *parr);
     iOrigin = iOrigin + " " + stmp;
     parr++;
   }
   parr = arrc;
   _asm {
         mov EAX, -115
         mov arrc[3 * TYPE int], EAX   // byte offset 12: element with index 3
       }
   stmp.Empty();
   for (cnt = 0; cnt < isize; cnt++)
   {
     stmp.Format("%d", *parr);
     iAsmCorr = iAsmCorr + " " + stmp;
     parr++;
   }
   parr = arrw;
   _asm {
         mov EAX, -115
         mov arrw[3], EAX              // byte offset 3: WRONG
       }
   stmp.Empty();
   for (cnt = 0; cnt < isize; cnt++)
   {
     stmp.Format("%d", *parr);
     iAsmWrong = iAsmWrong + " " + stmp;
     parr++;
   }
   UpdateData(FALSE);
 }

The application window is represented in Fig. 9.2.

Fig. 9.2: Application window demonstrating the correct and incorrect use of C++ operators

Let us analyze the button handler code. At the start of the handler, two memcpy calls create two copies of the source array:

 memcpy(arrw, arr, NUM_BYTES * 5);
 memcpy(arrc, arr, NUM_BYTES * 5);

Since memcpy copies bytes, its last parameter is the size of the array in bytes. The prototype of memcpy is defined in string.h; therefore, this header file is included in the declaration section:

 #include <string.h>

The button handler has three for loops that fill the iOrigin, iAsmCorr, and iAsmWrong buffers with the values to be displayed in the edit boxes. Consider in detail what happens to the arrc and arrw arrays when we attempt to replace their fourth element. The instructions that store the value -115 in the arrc array are shown here:

 parr = arrc;                      // Point parr at the first element
 _asm {
    mov EAX, -115                  // Move the value into the EAX register
    mov arrc[3 * TYPE int], EAX    // Store EAX at byte offset 12, that is,
                                   // in the element with index 3 (correct!)
 }

Now, the 4th element of the arrc array is -115. The situation is different for the arrw array. The 4th element is not the one that is written when the following commands are executed:

 parr = arrw;
 _asm {
    mov EAX, -115
    mov arrw[3], EAX   // THE WRONG COMMAND!
 }

Note that after this fragment executes, four bytes of memory are overwritten starting at byte offset 3 from the beginning of the array, not at the element with index 3. The byte at offset 3 is the last byte of the first element, and the bytes at offsets 4 to 6 are the first three bytes of the second element. As a result, the contents of the first two elements of the array are destroyed (see Fig. 9.2).
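
The difference between a byte offset and an element index can be reproduced in portable C++ by writing through a byte pointer. The sketch below is only an illustration of the addressing rule, not the code of the application; storeAtByteOffset is a hypothetical helper, and the description of which elements are damaged assumes a 4-byte int.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: stores `value` at an unscaled byte offset from the
// start of the array, which is how `mov arr[offset], EAX` addresses memory
// in an _asm block.
void storeAtByteOffset(int* arr, std::size_t count,
                       std::size_t byteOffset, int value)
{
    // Refuse to write past the end of the array.
    assert(byteOffset + sizeof(int) <= count * sizeof(int));
    std::memcpy(reinterpret_cast<unsigned char*>(arr) + byteOffset,
                &value, sizeof(int));
}
```

Calling storeAtByteOffset(arrc, 5, 3 * sizeof(int), -115) replaces exactly the element with index 3, as mov arrc[3 * TYPE int], EAX does. Calling storeAtByteOffset(arrw, 5, 3, -115) leaves the element with index 3 untouched and damages the first two elements instead, as mov arrw[3], EAX does.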

Similar situations can occur while developing applications with the C++ .NET inline assembler. It is necessary to trace carefully all conversions in such programs.

As mentioned, you can use any C++ symbols inside an _asm block, but there are some limitations:

Any assembly command can refer to only one C++ symbol (variable, function, or label). Several symbols can appear in one command only as operands of LENGTH, TYPE, and SIZE expressions.
Functions referenced in an _asm block must be declared in advance; otherwise, the compiler will not be able to distinguish a reference to the function from a label.
C++ symbols whose names coincide with MASM reserved words (such as instruction names) cannot be used in assembly blocks.
Structure and union tags are not recognized in _asm blocks.

The most valuable feature of the C++ .NET inline assembler is its ability to recognize and use C++ variables. Suppose that in the C++ module, an inline assembler is used and the val1 and val2 variables are declared. In that case, the following reference in _asm block will be correct:

 _asm {
        mov EAX, val1
        add EAX, val2
 }

C++ functions return their result to the caller through the return operator. For example, the following function (named MulInts) returns the value i1 * i2 + 100 (Listing 9.3).

Listing 9.3: The function returning results through a return operator

 int CReturnValueinregisterEAXwithinlineassemblerDlg::MulInts(int i1,
                                                                int i2)
 {
   int valMul;
   _asm {
         mov EAX, i1
         mov EBX, i2
         mul EBX            // EDX:EAX = EAX * EBX
         xchg EAX, EDX      // low 32 bits of the product are now in EDX
         add EDX, 100
         mov valMul, EDX
   }
   return valMul;
 }

The inline assembler allows the user to avoid the return operator by using the EAX register instead. The same function MulInts with few changes is illustrated below (Listing 9.4).

Listing 9.4: The function returning result in the EAX register

 int CReturnValueinregisterEAXwithinlineassemblerDlg::MulInts(int i1,
                                                                int i2)
 {
   _asm {
         mov EAX, i1
         mov EBX, i2
         mul EBX            // low 32 bits of the product remain in EAX
         add EAX, 100       // the result is returned in EAX
   }
 }

Even though the function does not return a result through return, the compiler does not generate an error; at most, it issues a warning about the missing return statement.
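
When experimenting with Listings 9.3 and 9.4, it can help to have a plain C++ reference to compare against. The following sketch assumes only what the listings show: mul produces a 64-bit product in EDX:EAX, and only the low 32 bits are kept before 100 is added. The name mulIntsReference is hypothetical.

```cpp
#include <cstdint>

// Portable reference for what Listings 9.3 and 9.4 compute (hypothetical name).
// The product and the addition are done in unsigned 32-bit arithmetic, so the
// result wraps modulo 2^32 exactly as the low half in EAX does.
std::int32_t mulIntsReference(std::int32_t i1, std::int32_t i2)
{
    std::uint32_t low = static_cast<std::uint32_t>(i1) *
                        static_cast<std::uint32_t>(i2);   // low 32 bits of i1 * i2
    low += 100u;                                          // add EAX, 100
    return static_cast<std::int32_t>(low);
}
```

For small operands the result is simply i1 * i2 + 100; for example, the arguments 3 and 4 give 112.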

While developing inline assembly code, there is no need to preserve the EBX, ESI, and EDI registers yourself. If the _asm block uses these registers, the compiler saves them on entry to the function and automatically restores them before the function returns. Keep in mind that numerous calls of such functions can decrease program performance to some degree.

If your program uses the assembly command cld or std, you must restore the direction flag to its original value before you exit from the function.

Often, it is necessary to use C++ .NET library functions in _asm blocks or macros. Such a combination of assembly commands and library functions can both reduce the code size and increase application performance. To use these options effectively, you should clearly understand the interface between the inline assembler and the C++ .NET standard functions. We will show step by step how such an interface can be built. Take a standard C++ function, for example, printf. The main program is developed as a console application. It consists of practically a single assembly block that subtracts two integers and outputs the result on the screen using the printf function.

The application is shown in Listing 9.5.

Listing 9.5: Using the printf library function in the assembly block

 // CALL_C_FUNC_IN_INLINEASM.cpp : Defines the entry point for the console
 // application.
 #include "stdafx.h"
 #include <stdio.h>

 int _tmain(int argc, _TCHAR* argv[])
 {
     int i1, i2, ires;
     char c1[] = "Result of subtraction = %d\n";
     while (true)
     {
         printf("\nEnter i1: ");
         scanf("%d", &i1);
         printf("Enter i2: ");
         scanf("%d", &i2);
         _asm {
               mov EAX, i1
               sub EAX, i2
               mov ires, EAX
               push ires        // second argument: the value to print
               lea EAX, c1
               push EAX         // first argument: address of the format string
               call printf
               add ESP, 8       // the caller removes both arguments
         }
     }
     return 0;
 }

A window of the working application is shown in Fig. 9.3.

Fig. 9.3: Application window illustrating a call of C++ function from the assembly block

The first three lines of the _asm { } block are straightforward. The printf call requires passing parameters to the function. The equivalent C++ call would look like this:

 printf("Result of subtraction = %d\n", ires);

The printf function requires two parameters here: the address of the string and the ires variable. Since function arguments are passed via the stack, simply push the needed arguments (an integer value and a string pointer) before calling the function. The arguments are pushed in reverse order and come off the stack in the desired order.

The fragment of code that implements the printf call is as follows:

 push ires
 lea EAX, c1
 push EAX
 call printf

C++ .NET projects use the _cdecl calling convention by default. This means that the caller (here, the _asm block) must clear the stack. The command add ESP, 8 is used for this purpose, because two 4-byte arguments were pushed. This is very important: if you forget to clear the stack, the application is likely to crash. The same thing will happen if the number of bytes removed from the stack differs from the number pushed onto it. If you mark the add ESP, 8 command as a comment (with //) and recompile the project, you will see an error message generated by the debugger after running the application (Fig. 9.4).

Fig. 9.4: Message from the debugger when the stack is not cleared

It is also important to note that it is not necessary to change the name of the printf function (for example, by adding a leading underscore), as you must when you work with separately assembled procedures.
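
The bookkeeping behind add ESP, 8 can be followed in a toy model. The sketch below is not how the processor stack is actually accessed from C++; it models the stack as a vector of 4-byte slots, omits the return address, and uses hypothetical names throughout (ToyStack, calleeSub, callSubCdecl). It only illustrates the _cdecl rule described above: arguments are pushed right to left, and the caller removes them after the call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of the stack as a sequence of 4-byte slots; back() is the top.
struct ToyStack {
    std::vector<int> slots;

    // Counterpart of `push value`.
    void push(int value) { slots.push_back(value); }

    // Counterpart of `add ESP, bytes`: discards whole 4-byte slots.
    void addEsp(std::size_t bytes)
    {
        assert(bytes % 4 == 0 && bytes / 4 <= slots.size());
        slots.resize(slots.size() - bytes / 4);
    }

    // Argument n as the callee sees it (0 is the first declared parameter).
    int arg(std::size_t n) const { return slots[slots.size() - 1 - n]; }
};

// A _cdecl-style callee: it reads its arguments but leaves them on the stack.
int calleeSub(const ToyStack& stack)
{
    return stack.arg(0) - stack.arg(1);
}

// The caller pushes right to left, calls, and then cleans up the stack itself.
int callSubCdecl(ToyStack& stack, int a, int b)
{
    stack.push(b);                  // last argument is pushed first
    stack.push(a);                  // first argument ends up on top
    int result = calleeSub(stack);
    stack.addEsp(8);                // two 4-byte arguments
    return result;
}
```

After callSubCdecl returns, the model stack is as deep as it was before the call; leaving out the addEsp(8) line would leave two stale slots behind, which corresponds to the situation reported in Fig. 9.4.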

The following example is more complicated. The program reads two integers as strings, converts them to integer format using the atoi library function (ASCII to integer), and multiplies them. The result is displayed on the screen. The strings s1[16] and s2[16] hold the text representations of the integers i1 and i2. All operations are implemented in an assembly block. The source code is shown in Listing 9.6.

Listing 9.6: Using C++ functions atoi and printf within an assembly block

 // ATOI_INLINE.cpp: Defines the entry point for the console application.
 #include "stdafx.h"
 #include <stdio.h>
 #include <stdlib.h>

 int _tmain(int argc, _TCHAR* argv[])
 {
     int i1, i2, ires;
     char s1[16], s2[16];
     char c1[] = " i1 * i2 = %d\n ";
     while (true)
     {
         printf(" \nEnter i1: ");
         scanf("%15s", s1);      // at most 15 characters fit in s1
         printf(" Enter i2: ");
         scanf("%15s", s2);
         _asm {
               lea EAX, s1
               push EAX
               call atoi
               add ESP, 4
               mov i1, EAX
               lea EAX, s2
               push EAX
               call atoi
               add ESP, 4
               mov i2, EAX
               mov EAX, i1
               mov EBX, i2
               imul EBX          // EDX:EAX = EAX * EBX
               mov ires, EAX
               push ires
               lea EAX, c1
               push EAX
               call printf
               add ESP, 8
         }
     }
     return 0;
 }

The application window is represented in Fig. 9.5.

Fig. 9.5: Window of the application demonstrating the use of the library functions atoi and printf

We will focus on the key points of this program. The atoi function has the following syntax:

 int atoi(const char* str);

where the str parameter is a pointer to the string. The function returns an integer value. Note that atoi accepts a single parameter: the address of the string.

The assembly variant of the function call (for the s1 string and i1 variable) looks like this:

 lea  EAX, s1
 push EAX
 call atoi
 add  ESP, 4
 mov  i1, EAX

The function passes a result in the usual way, through the EAX register. Further, the content of EAX is stored in the i1 variable. Similar operations are implemented with variables s2 and i2 .

Multiplication is executed by a block of commands:

 mov  EAX, i1
 mov  EBX, i2
 imul EBX

The printf function outputs results in the same way. The stack is cleared with the add ESP, n command, where n is the total number of bytes needed for storing parameters.
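
For comparison, the computation performed by the _asm block in Listing 9.6 can be written in a few lines of ordinary C++. This is a minimal sketch under the assumption that both strings hold valid decimal integers whose product fits in an int; multiplyDecimalStrings is a hypothetical name.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper mirroring the _asm block of Listing 9.6.
int multiplyDecimalStrings(const char* s1, const char* s2)
{
    assert(s1 && s2);            // atoi must not receive a null pointer
    int i1 = std::atoi(s1);      // lea EAX, s1 / push EAX / call atoi / add ESP, 4
    int i2 = std::atoi(s2);      // the same sequence for s2
    return i1 * i2;              // mov EAX, i1 / mov EBX, i2 / imul EBX
}
```

The C++ compiler generates the argument pushes and the stack cleanup itself, which is exactly the work that the assembly version spells out by hand.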

One more aspect of using the C++ .NET library functions deserves emphasis. The console applications above were created as Win32 projects with the Console application option. If you choose the Empty project option instead, you must manually include all header files needed for the project.

The next example is the most complex. It shows a technique for using the inline assembler in dialog-based applications. We will develop an application with a dialog window using the C++ .NET application wizard. The application should display the result of calculating (X1 - X2) * (X1 + X2) on the screen. We will place three Edit controls on the main form of the application. Two edit boxes accept the integers X1 and X2, and the third edit box displays the result of the calculation (X1MULX2). We will also place three Static Text controls and a Button control. The final result is displayed when the button is clicked with the left mouse button.

Now, we will link the variables X1, X2, and X1MULX2 to the Edit controls. We also need two global integer variables, I1 and I2 (declared near the top of Listing 9.7). For intermediate calculations, the following functions should be developed with the inline assembler:

Add2ints: calculates X1 + X2
Sub2ints: calculates X1 - X2
Imul2: calculates (X1 - X2) * (X1 + X2)

Listing 9.7: Complex example of using the inline assembler

 // CALL_FROM_INLINEASMDlg.cpp: implementation file
 #include "stdafx.h"
 #include "CALL_FROM_INLINEASM.h"
 #include "CALL_FROM_INLINEASMDlg.h"
 #include ".\call_from_inlineasmdlg.h"
 #ifdef _DEBUG
 #define new DEBUG_NEW
 #endif

 int I1, I2;
 ...
 int CCALL_FROM_INLINEASMDlg::Add2ints(int i1, int i2)
 {
   _asm {
         mov EAX, i1
         add EAX, i2
   }
   // return 0;
 }

 int CCALL_FROM_INLINEASMDlg::Sub2ints(int i1, int i2)
 {
   _asm {
         mov EAX, i1
         sub EAX, i2
   }
   // return 0;
 }

 int CCALL_FROM_INLINEASMDlg::Imul2(void)
 {
   _asm {
         push I2
         push I1
         call Add2ints
         mov EDX, EAX      // EDX = I1 + I2
         push I2
         push I1
         call Sub2ints
         mov EBX, EAX      // EBX = I1 - I2
         mov EAX, EDX
         imul EBX          // EAX = (I1 + I2) * (I1 - I2)
   }
   // return 0;
 }

 void CCALL_FROM_INLINEASMDlg::OnBnClickedButton1()
 {
   // TODO: Add your control notification handler code here
   UpdateData(TRUE);
   I1 = X1;
   I2 = X2;
   X1MULX2 = Imul2();
   UpdateData(FALSE);
 }

After the wizard generates the function skeletons, comment out return 0 (see the commented lines in Listing 9.7), because these functions return their results in the EAX register. Consider how the calls of the Add2ints and Sub2ints functions are implemented in the Imul2 function. Parameters are passed via the stack as usual, but here it is not necessary to use the add ESP, n command to clear the stack. As member functions, Add2ints and Sub2ints use the thiscall convention by default, under which the code that the compiler generates at the end of the called function removes the arguments from the stack. Therefore, including the add ESP, n command after the call command would corrupt the stack and crash the program. The source code of the application is shown in Listing 9.7.

A window of the working application is shown in Fig. 9.6.

Fig. 9.6: Window of the application using the inline assembler
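
A plain C++ version of the three functions is convenient for checking the values that the dialog displays. The sketch below uses free functions with lower-case names (add2ints, sub2ints, imul2), which are assumptions made for this illustration; the application itself implements them as dialog class members with inline assembler.

```cpp
// Plain C++ counterparts of the three functions in Listing 9.7.
// Free functions with hypothetical lower-case names are used here so that
// the sketch does not depend on the dialog class.
int add2ints(int i1, int i2) { return i1 + i2; }

int sub2ints(int i1, int i2) { return i1 - i2; }

// (x1 - x2) * (x1 + x2), built from the two helpers as in Imul2.
int imul2(int x1, int x2)
{
    return sub2ints(x1, x2) * add2ints(x1, x2);
}
```

For example, the inputs X1 = 7 and X2 = 3 should produce 40 in the result box.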

This concludes our consideration of the Visual C++ .NET inline-assembly capabilities. The following chapters focus on practical aspects of programming with the inline assembler.