Code Optimization | The Assembly Programming Master Book

In the previous few sections, I tried to explain to you how to restore the program source code on the basis of the disassembled listing. Now, you can evaluate and conclude whether I have succeeded when accomplishing this task. This can be achieved by comparing the restored code to the source code. You'll immediately notice that I mistakenly used the DO loop instead of the WHILE loop, which was used in the source code. However, if you compile and build the resulting program, it will work in the same way as the original one. The difference between the texts of the two programs (both are simple) is because the code is optimized by the translator. Because of this optimization, it is principally impossible to restore the source code using the executable code. However, it is still possible to get the program code that correctly describes the program's algorithm. This means that code analysis allows you to understand the idea of the program and what operations it executes. This is the main problem solved in this chapter.

Now, consider a small and trivial C program.

Listing 25.5: A small C program

 void main () {        int i;        char a[10];        char b[11];        for(i=0;  i<10; i++)        {               a[i]>=  'a';               *(b+i) =  'a';               printf("%c  %c\n", a[i], b[i]);        }        ExitProcess (0); }

Now, consider the Assembly code of this program obtained using Borland C++ 5.0.

Listing 25.6: Disassembled code of the program in Listing 25.5 compiled using Borland C++ 5.0

 .text:00401108 _main proc near ; DATA XREF:  .data:0040A0B8 .text:00401108 .text:00401108 var__18    = BYTE PTR -18H .text:00401108 var_C     = BYTE PTR -0CH .text:00401108 argc      = DWORD PTR 8 .text:00401108 argv      = DWORD PTR 0CH .text:00401108 envp      = DWORD PTR 10H .text:00401108 .text:00401108        PUSH  EBP .text:00401109        MOV   EBP, ESP .text:0040110B        ADD   ESP, 0FFFFFFE8H .text:0040110E        PUSH  EBX .text:0040110F        XOR   EBX, EBX .text:00401111 .text:00401111 loc_401111: ; CODE XREF:  _main+30 .text:00401111        MOV   [EBP+EBX+VAR_C], 61H .text:00401116        MOV   [EBP+EBX+VAR_18],  61H .text:0040111B        MOVSX EAX, [EBP+EBX+VAR_C] .text:00401120        PUSH  EAX .text:00401121        Movsx EDX, [EBP+EBX+VAR_C] .text:00401126        PUSH  EDX      ; CHAR .text: 00401127       PUSH  OFFSET ACC ; __VA_ARGS .text:0040112C        CALL  _PRINTF .text:00401131        ADD   ESP, 0CH .text:00401134        INC   EBX .text:00401135        CMP   EBX, 0AH .text:00401138        J1    Short Loc_401111 .text:0040113A        PUSH  0 ; uExitCode .text:0040113C        CALL  ExitProcess .text:00401141        POP   EBX .text:00401142        MOV   ESP, EBP .text:00401144        POP   EBP .text:00401145        RETN .text:00401145 _main     ENDP

By the way, the ExitProcess (0) function was introduced into the program text for a quick search of the required fragments in a debugger or in the disassembled code. Even with the naked eye, you'll notice that there was no optimization. Using this text, it doesn't make it difficult to restore the original C program. Now, consider the code optimized by Visual C++ 7.0.

Listing 25.7: Disassembled code of the program from Listing 25.5 compiled using Visual C++ 7.0

 .text:00401000 sub_401000 proc near ;  CODE XREF:  start+AF .text:00401000        PUSH  ESI .text:00401001        MOV  ESI,  OAH .text:00401006 .text:00401006 loc_401006:  ; CODE XREF:  sub_401000+18 .text:00401006        PUSH  61H .text:00401008        PUSH  61H .text:0040100A        PUSH  OFFSET ACC ; "%C  %C\N" .text:0040100F        CALL  SUB_401030 ; PRINTF .text:00401014        ADD   ESP,  0CH .text:00401017        DEC   ESI .text:00401018        JNZ   SHORT LOC_401006 .text:0040101A        PUSH  0  ;  uExitCode .text:0040101C        CALL  DS:ExitProcess .text:00401022        POP   ESI .text:00401023        RETN .text:00401023 sub_401000   ENDP

I guess that the code presented in Listing 25.7 will surprise you. But what's surprising here? Consider the code of the original C program. Two arrays of characters specified in the source code are not needed. The Visual C++ compiler immediately noticed this and modified the code so that it became impossible to restore the original code despite all efforts. Naturally, the optimization became possible only because these arrays are used in a limited area.

Now, it is time to consider some methods of optimization that might be useful when writing programs in Assembly language.

Speed versus Size

When dealing with the CPU, the topic of speed versus size is important. For example, the MOV EAX, EBX command executes faster than XCHG EAX, EBX ; however, its code is longer. Being aware of this feature, you can either set the size of your program or make it execute faster. For example, such operations as MUL are often replaced by other commands, such as SHL , and DIV operations are commonly replaced by SHR . Doing so can considerably increase the program's performance but increase its size. The interesting point is that arithmetical operations can also be carried out using the LEA command. Contemporary translators actively use this feature. Thus, the MUL command isn't encountered in the translated code as frequently as it might seem at first glance (based on the program's source code). Generally, it is a good practice to investigate properties of specific commands because you'll discover lots of interesting facts. For example, unsigned modulo-4 division is carried out as follows : AND EAX, 0003h an original approach, isn't it?

Optimizing Conditional Jumps

As you'll detect, there are some reserves . For example, the conditional check can be organized to minimize the number of jumps. This will make the program fragment under consideration run faster. Assume that you have the following code fragment:

 ...           CMP   EAX, 100           JB    L1           Fragment 1           JMP   L2     L1:           Fragment 2     L2:

You know, however, that the EAX register most often contains the values smaller than 100. Consequently, this fragment can be replaced by the following:

 ...           CMP   EAX,  100           JNB   L1           Fragment  2           JMP   L2     L1:           Fragment  1     L2:

Optimizing Procedure Calls

Consider the following code fragment:

 P1  PROC            .            .            .            CALL  P2            RET     P1  ENDP            .            .            .     P2  PROC            .            .            .            RET     P2  ENDP

This fragment can be replaced by a mote efficient one. This approach can often be encountered when viewing program code using a debugger:

 P1  PROC            .            .            .            JMP   P2     P1  ENDP            .            .            .     P2  PROC            .            .            .            RET      P2 ENDP

The code becomes shorter and faster; however, it becomes less understandable. To conclude this description of optimization techniques, I'd like to recommend that you read a book [12], in which you'll find more information on this topic.