Main Optimization Points

So, what parts of the program code, created in the C++ .NET environment, can be optimized with the help of assembly language?

Loops and Conditional Jumps

Loops and conditional jumps are structures such as WHILE, DO WHILE, IF ELSE, and SWITCH CASE. You can replace them easily with their equivalents in assembly language. This is often worthwhile, especially if you have a set of similar calculations repeated many times. By saving several instructions in each iteration of such a loop, you can achieve a considerable gain in application performance. This applies equally to independently compiled modules and to the assembly blocks and functions used in the C++ .NET environment.

Assembly language also allows you to optimize the calculation algorithms inside WHILE and DO WHILE loops. If it takes only a few assembly commands to implement such a calculation block, then the best way is to use the built-in (inline) assembler. In this case, the use of separate modules will not produce the required effect, and it can even slow down the program. A separate assembler module must contain complete functions, each of which executes a prologue and an epilogue every time it is called. This means that every time the program calls a function from a separate module, certain registers must be saved on the stack and then restored. For small programs, the delay will be tolerable, but for more serious applications it can be considerable.
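As a minimal sketch of the built-in assembler approach, the function below sums an integer array. The name sum_array, the labels, and the register choices are illustrative assumptions, not code from the book. The __asm block is compiled only by 32-bit Visual C++; on any other compiler a plain C++ loop stands in so that the example still builds.

```cpp
// Sum n 32-bit integers.
// Assumption: sum_array is a hypothetical helper used only for illustration.
int sum_array(const int* data, int n) {
    int total = 0;
#if defined(_MSC_VER) && defined(_M_IX86)
    // Built-in assembler: the loop lives inside the C++ function, so no
    // separate call, prologue, or epilogue is paid per invocation of the block.
    __asm {
        mov  esi, data        ; pointer to the first element
        mov  ecx, n           ; loop counter
        xor  eax, eax         ; running total
        test ecx, ecx
        jle  sum_done         ; skip the loop when n is zero or negative
    sum_loop:
        add  eax, [esi]       ; accumulate the current element
        add  esi, 4           ; advance to the next 32-bit element
        dec  ecx
        jnz  sum_loop         ; one conditional jump per iteration
    sum_done:
        mov  total, eax
    }
#else
    // Portable stand-in with the same result.
    for (int i = 0; i < n; ++i) {
        total += data[i];
    }
#endif
    return total;
}
```

Whether the hand-written loop beats the compiler's own output depends on the compiler and the optimization settings, so it is worth measuring both variants before keeping either.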

The IF ELSE structures are not easy to optimize, but there are certain techniques that can be helpful. For example, instead of calculating two jump conditions, you can use one condition and one assignment operator.
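One possible reading of this technique, sketched in C++ under the assumption that both branches only assign a value: give the variable its default value unconditionally and keep a single condition that overwrites it. The function names are hypothetical.

```cpp
// Classical IF ELSE: in assembly this typically needs a conditional jump
// to select a branch plus an unconditional jump to skip the other one.
int flag_if_else(int value, int threshold) {
    int flag;
    if (value > threshold) {
        flag = 1;
    } else {
        flag = 0;
    }
    return flag;
}

// One assignment plus one condition: the default is stored first,
// and a single test decides whether to overwrite it.
int flag_default_then_test(int value, int threshold) {
    int flag = 0;                 // unconditional assignment
    if (value > threshold) {      // the only condition
        flag = 1;
    }
    return flag;
}
```

Both functions return the same result for every input; the second form simply maps onto fewer jump instructions when translated by hand.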

Later in this chapter, these issues will be addressed in more detail with practical examples: see the sections Optimizing Loop Calculations and Optimizing Conditional Jumps.

Mathematical Calculations

Considerable optimization potential for C++ .NET applications lies in improving mathematical operations. In loop calculations, a frequently repeated fragment of the program code can be implemented in assembly language, and there are often several ways to do this. Integer operations are usually easy to translate into assembly language, while for floating-point operations this task is much more complicated. In optimizing mathematical operations, an important role belongs to the mathematical coprocessor, or FPU (Floating-Point Unit), which performs operations on floating-point numbers.

The FPU provides the system with additional mathematical calculation power, but does not replace any of the CPU commands. Commands such as add, sub, mul, and div are still performed by the CPU, while the FPU takes over the additional, more efficient arithmetic commands. The developer may view a system with a coprocessor as a single processor with a larger set of commands.

For more detail and practical examples on using the FPU, see the Optimizing Mathematical Calculations section in this chapter.

Processor-Level Optimization (Using the SIMD Technologies)

A special role in the optimization process belongs to the SIMD (Single Instruction, Multiple Data) technologies. These are implemented in such extensions as MMX (MultiMedia Extensions) for integers and SSE (Streaming SIMD Extensions) for floating-point numbers. They make it possible to process several operands simultaneously. These technologies appeared quite recently, in the latest generations of processors. To use them successfully, the developer should know the processor architecture and the command set, and also have a clear understanding of how to use particular functional units of the processor (registers, the cache, the command pipeline, the arithmetic logic unit, the floating-point unit, etc.). For more detail on the main aspects of using MMX and SSE, see the Using SIMD Technologies (MMX, SSE) section in this chapter.
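To make "several operands simultaneously" concrete, here is a small sketch that adds two float arrays four elements at a time using the SSE intrinsics from xmmintrin.h. The function name add_arrays and the SKETCH_HAS_SSE macro are assumptions made for this illustration; on a target without SSE the scalar loop handles everything.

```cpp
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>   // SSE intrinsics
#define SKETCH_HAS_SSE 1
#endif

// out[i] = a[i] + b[i] for i in [0, n).
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#ifdef SKETCH_HAS_SSE
    // One SSE instruction adds four packed single-precision values.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar tail (or the whole array when SSE is unavailable).
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Unaligned loads and stores are used so the sketch makes no assumption about how the caller allocated the arrays; aligned variants can be faster when 16-byte alignment is guaranteed.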

The early processor models had a rather simple architecture. They included a small set of assembly commands and operated on a limited set of registers. These limitations were serious obstacles to using assembly language for developing serious applications. Assembly language was mostly used for accessing computer hardware resources and for creating hardware drivers.

The processor-level optimization of the program code lets you improve the performance of both the high-level language applications and the assembly procedures themselves. Developers working with high-level languages are often unaware of this optimization method; however, it can provide virtually unlimited possibilities. Those who develop assembly programs and procedures do sometimes make use of the features of the new processor models.

Also note that even the earlier models of the Intel processors include some additional commands. Though rarely used by developers, these commands can help you increase the efficiency of your program code.

The processor commands that copy and move multibyte data arrays require fewer processor cycles than the classical commands of this type. Beginning with the MMX generation, processors add complex commands that combine several functions previously performed by separate commands. There is now a considerably larger set of commands for bit operations. These commands are complex as well, allowing you to perform several operations simultaneously. The options provided by these commands will be covered in Chapter 10, where we will explore the built-in tools of the high-level languages.

As already explained, great optimization potential depends upon correct use of the features of the processor's hardware architecture. These are quite complicated matters, requiring you to know how data is processed and how processor commands are executed at the hardware level. This area contains virtually unlimited potential for program optimization.

Naturally, processor-level optimization has its own peculiarities. For instance, if your program must run on systems with processors of several generations, then you should optimize it based on the features common to all of those processors.

Like the high-level languages, all modern assembly development tools come with an integrated debugger. Although such a debugger can offer you a somewhat lower level of service as compared to the high-level languages, its features are satisfactory for analyzing the program code. Since assembly language is the closest to the machine language, the advantages of the new generations of processors bring their immediate results to assembly programming.

Optimizing High-level Language Applications

Optimizing high-level language programs by using assembly language is somewhat labor-intensive. However, according to various estimates, it can increase application performance by anywhere from 3-4 to 14-17 percent.

To improve the high-level language programs, you can either use the assembly code in certain fragments of the program or implement the calculation algorithms in assembly language completely.

In practical work, assembly optimization is efficient for the following tasks:

  • Optimizing the loop calculations.

  • Optimizing the processing of large amounts of data by using special string commands which actually let you process both character and numeric data.

  • Optimizing the mathematical calculations. It is important to note that an increase in performance is achieved by using both the common mathematical operations and the Floating-Point Unit (FPU) commands. A special place belongs to the SIMD technology, which can increase application performance several-fold.
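The string-command item in the list above can be sketched as follows: the movsd string command, repeated with the rep prefix, copies a block of 32-bit integers even though the command family is named after strings. The helper name copy_ints is hypothetical, and the __asm block is compiled only by 32-bit Visual C++; other compilers take the std::memcpy path, which produces the same result.

```cpp
#include <cstddef>
#include <cstring>

// Copy count 32-bit integers from src to dst (the ranges must not overlap).
// Assumption: copy_ints is an illustrative helper, not part of any library.
void copy_ints(int* dst, const int* src, std::size_t count) {
#if defined(_MSC_VER) && defined(_M_IX86)
    __asm {
        mov  esi, src        ; source pointer
        mov  edi, dst        ; destination pointer
        mov  ecx, count      ; number of doublewords to move
        cld                  ; process addresses in ascending order
        rep  movsd           ; copy ecx doublewords from [esi] to [edi]
    }
#else
    // Portable stand-in for the same block copy.
    std::memcpy(dst, src, count * sizeof(int));
#endif
}
```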

In addition to the options noted above, it is also extremely important to combine assembly and high-level languages correctly. This largely depends on the developer's experience and background. Even in such a complicated field as assembly optimization, there are certain empirical rules that can be applied more or less successfully.

And yet, simply replacing, say, a WHILE loop in a C++ program with assembly code may give you no increase in performance at all. The same is true of the assembly analogs of other high-level language operators. In most cases, an increase in performance can be achieved only if you analyze the code fragment thoroughly. For example, assembly language gives you several different ways to implement the WHILE loop. The implementations you choose for your projects may differ from the classical calculation patterns covered in assembly language manuals. To optimize a C++ code fragment using assembly language, it is not enough to know the assembly commands and their syntax. The most important thing is to have a clear idea of how these commands work in different combinations; otherwise, the use of assembly language may give you no gain at all!
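To illustrate that the same WHILE loop can be laid out in more than one way, the sketch below writes it in two forms in plain C++. The first tests the condition at the top of every iteration; the second checks once up front and then counts down with the test at the bottom, which is the shape a hand-written dec/jnz loop takes. Both function names are hypothetical, and which form is faster depends on the compiler and processor.

```cpp
// Top-tested form: compare the index with n before every iteration.
int sum_top_tested(const int* a, int n) {
    int sum = 0;
    int i = 0;
    while (i < n) {
        sum += a[i];
        ++i;
    }
    return sum;
}

// Bottom-tested, count-down form: one guard before the loop, then a single
// decrement-and-test closes each iteration.
int sum_bottom_tested(const int* a, int n) {
    int sum = 0;
    if (n > 0) {
        const int* p = a;
        int remaining = n;
        do {
            sum += *p++;
        } while (--remaining != 0);
    }
    return sum;
}
```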

Later in this chapter, we will analyze ways to build highly efficient calculation programs in assembly language. Here, we will not concern ourselves with optimizing the C++ .NET 2003 high-level structures; this is a separate field that will be covered in Chapter 4. Here, we will focus on the basic principles of optimal assembly programming.



Visual C++ Optimization with Assembly Code
ISBN: 193176932X
EAN: 2147483647
Year: 2003
Pages: 50
Authors: Yury Magda
