Another factor that considerably influences application performance is the use of conditional jumps. In general, our recommendation (applicable to all processor types) is to avoid conditional jumps that depend on flags wherever possible.
Why is that so? The latest generations of Pentium processors include branch prediction hardware, which guesses the direction of a conditional jump before its condition has been evaluated. When the guess is wrong, the processor must discard the commands it fetched speculatively and refill its pipeline, which costs many clock cycles. Sometimes you can achieve the necessary result without conditional jumps at all: in some cases, certain bit manipulations suffice.
As an example, consider the task of finding the absolute value of a signed number. We will look at two variants of the program code: one that uses a conditional jump and one that does not. The number is stored in the EAX register.
The following code fragment finds the absolute value of a number in the traditional way, using a conditional jump command (Listing 1.10).
. . .
    cmp  EAX, 0
    jge  ex
    neg  EAX
ex:
. . .
The following fragment performs the same operation without using the jge command (Listing 1.11).
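. . .
    cdq              ; EDX = 0 if EAX >= 0, or 0FFFFFFFFh if EAX < 0
    xor  EAX, EDX    ; inverts all bits of EAX if it was negative
    sub  EAX, EDX    ; adds 1 if EAX was negative, completing the negation
. . .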
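To make the bit manipulation in Listing 1.11 easier to follow, here is a minimal C sketch of the same idea. The function names are hypothetical and are not part of the original listings. The sketch builds the sign mask with unsigned arithmetic instead of relying on cdq, and whether a given compiler translates it into jump-free machine code is an assumption you should check in the generated assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring cdq / xor / sub.
   The result is unsigned so that the magnitude of INT32_MIN is representable. */
uint32_t branchless_abs(int32_t x)
{
    uint32_t ux   = (uint32_t)x;
    uint32_t mask = 0u - (ux >> 31);  /* 0 if x >= 0, 0xFFFFFFFF if x < 0 (the role of EDX after cdq) */
    return (ux ^ mask) - mask;        /* xor EAX, EDX followed by sub EAX, EDX */
}

/* Usage example: a few spot checks. */
void branchless_abs_selfcheck(void)
{
    assert(branchless_abs(7) == 7u);
    assert(branchless_abs(-5) == 5u);
    assert(branchless_abs(0) == 0u);
}
```

When the number is negative, the mask consists of all ones, so the XOR inverts every bit and the subtraction of the mask adds 1, which is exactly two's complement negation. When the number is non-negative, the mask is zero and both operations leave the value unchanged.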
For calculations of this kind, the carry flag (CF) is extremely helpful. Consider one more example whose traditional implementation contains conditional jumps. Suppose you need to compare two integers (i1 and i2, treated as unsigned in the listings that follow) and set i1 equal to i2 if i2 < i1. To make this clear, here is the corresponding C++ statement:
if (i2 < i1) i1 = i2;
The classical variant of the assembly code for implementing this task is shown in Listing 1.12.
. . .
; Stores the value of the i1 variable in the EAX register
    mov  EAX, DWORD PTR I1
; Compares the contents of the EAX register with the i2 value
    cmp  EAX, DWORD PTR I2
; If i1 >= i2 (unsigned comparison), makes i1 equal to i2
    jae  set_i2_to_i1
    jmp  ex
set_i2_to_i1:
; Loads the value of i2 into EAX (i2 itself is left unchanged)
    mov  EAX, DWORD PTR I2
; Stores the contents of EAX in the i1 variable
    mov  DWORD PTR I1, EAX
ex:
. . .
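The implementation of the same algorithm without conditional jumps looks more elegant and typically runs faster (Listing 1.13).
. . .
    mov  EAX, DWORD PTR I1
    mov  EDX, DWORD PTR I2
    sub  EDX, EAX             ; EDX = i2 - i1; CF = 1 if i2 < i1 (unsigned)
    sbb  ECX, ECX             ; ECX = 0FFFFFFFFh if CF = 1, otherwise 0
    and  ECX, EDX             ; ECX = i2 - i1 if i2 < i1, otherwise 0
    add  EAX, ECX             ; EAX = i2 if i2 < i1, otherwise i1
    mov  DWORD PTR I1, EAX
. . .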
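The borrow-mask technique of Listing 1.13 can be sketched in C as follows. The function names are hypothetical, and the sketch assumes unsigned 32-bit operands, because the carry flag reflects an unsigned borrow. A compiler may or may not translate the comparison into an sbb command, so treat this as an illustration of the arithmetic rather than a guaranteed jump-free implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring Listing 1.13:
   returns i2 if i2 < i1 (unsigned comparison), otherwise i1. */
uint32_t branchless_min_u32(uint32_t i1, uint32_t i2)
{
    uint32_t diff = i2 - i1;                   /* sub EDX, EAX (wraps around on borrow) */
    uint32_t mask = 0u - (uint32_t)(i2 < i1);  /* sbb ECX, ECX: all ones if a borrow occurred */
    return i1 + (diff & mask);                 /* and ECX, EDX followed by add EAX, ECX */
}

/* Usage example: a few spot checks. */
void branchless_min_u32_selfcheck(void)
{
    assert(branchless_min_u32(5u, 3u) == 3u);
    assert(branchless_min_u32(3u, 5u) == 3u);
    assert(branchless_min_u32(4u, 4u) == 4u);
}
```

The key observation is that i1 + (i2 - i1) equals i2 even when the subtraction wraps around, so adding the masked difference selects one of the two values without any jump.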
In the next example, we select one of the two numbers according to the following pseudo-code:
if (i1 != 0) i1 = i2; else i1 = i3;
The classical solution using conditional jump commands can look like this (Listing 1.14).
. . .
; Stores the contents of the i1, i2, and i3 variables
; in the EAX, EDX, and ECX registers, respectively
    mov  EAX, DWORD PTR I1
    mov  EDX, DWORD PTR I2
    mov  ECX, DWORD PTR I3
; i1 = 0?
    cmp  EAX, 0
; No, i1 = i2
    jne  set_i2_to_i1
; Yes, i1 = i3
    mov  EAX, ECX
    jmp  ex
set_i2_to_i1:
    mov  EAX, EDX
ex:
; Stores the result in i1, as the pseudo-code requires
    mov  DWORD PTR I1, EAX
. . .
Now, you can see a fragment of the source code that does not use the conditional jump operators (Listing 1.15).
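. . .
    mov  EAX, DWORD PTR I1
    mov  EDX, DWORD PTR I2
    mov  ECX, DWORD PTR I3
    cmp  EAX, 1               ; CF = 1 only if i1 = 0
    sbb  EAX, EAX             ; EAX = 0FFFFFFFFh if i1 = 0, otherwise 0
    xor  ECX, EDX             ; ECX = i3 xor i2
    and  EAX, ECX             ; EAX = i3 xor i2 if i1 = 0, otherwise 0
    xor  EAX, EDX             ; EAX = i3 if i1 = 0, otherwise i2
    mov  DWORD PTR I1, EAX    ; stores the result in i1, as the pseudo-code requires
. . .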
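The mask-and-XOR selection used in Listing 1.15 can be written as a short C sketch. The function names are hypothetical, and the operands are assumed to be unsigned 32-bit values. As with any C code, the compiler decides whether the comparison becomes a jump, so this sketch only demonstrates the logic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring Listing 1.15:
   returns i2 if i1 != 0, otherwise i3. */
uint32_t branchless_select(uint32_t i1, uint32_t i2, uint32_t i3)
{
    uint32_t mask = 0u - (uint32_t)(i1 == 0);  /* cmp EAX, 1 / sbb EAX, EAX: all ones if i1 is 0 */
    return ((i2 ^ i3) & mask) ^ i2;            /* xor ECX, EDX / and EAX, ECX / xor EAX, EDX */
}

/* Usage example: a few spot checks. */
void branchless_select_selfcheck(void)
{
    assert(branchless_select(1u, 10u, 20u) == 10u);
    assert(branchless_select(0u, 10u, 20u) == 20u);
}
```

The selection works because (i2 xor i3) xor i2 equals i3, while 0 xor i2 equals i2. The mask therefore decides which of the two values survives the final XOR.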
It is important to note that the overall performance of the application also depends on the way in which such a code fragment interacts with the rest of the program.
If you cannot do without conditional jumps, you can try optimizing the code fragment by arranging its branches properly. To see what this means, consider the following example. Suppose you have a fragment that performs the following sequence of commands:
. . .
    test EAX, EAX
    jz   label_A
. . .
    <branch 1 commands>
. . .
    jmp  label_B
label_A:
. . .
    <branch 2 commands>
. . .
label_B:
. . .
Suppose the commands of Branch 2 are executed much more frequently than those of Branch 1. In this case, performance is hindered by the delays caused by resetting and refilling the processor pipeline after mispredicted jumps. The frequent path requires taking a forward conditional jump, and when the processor has no history for a jump, it typically assumes that a forward jump is not taken. We will try organizing the branches in another way:
. . .
    test EAX, EAX
    jnz  label_A
. . .
    <branch 2 commands>
. . .
    jmp  label_B
label_A:
. . .
    <branch 1 commands>
. . .
label_B:
. . .
In this case, the branch prediction unit makes many more correct predictions, which improves the performance of this code fragment. The optimization of loop calculations cannot be reduced to the examples given above. To solve problems of this kind, it is necessary to have a clear idea of the principles of processor operation and its main functional units.
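In high-level code, the same idea of keeping the frequent path on the fall-through side can be expressed as a hint to the compiler. The following C sketch assumes GCC or Clang, which provide the __builtin_expect built-in function. The LIKELY and UNLIKELY macros and the sum_nonnegative function are hypothetical names introduced only for illustration. The hint is advisory, so the compiler is free to lay out the code differently.

```c
#include <assert.h>

/* Assumption: GCC or Clang. Other compilers fall back to the plain condition. */
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Hypothetical example: sums the non-negative entries of an array and
   counts the negative ones, which are assumed to be rare. */
long sum_nonnegative(const int *values, int count, int *negative_count)
{
    long total = 0;
    for (int i = 0; i < count; ++i) {
        if (UNLIKELY(values[i] < 0)) {
            ++*negative_count;      /* rare path */
        } else {
            total += values[i];     /* frequent path */
        }
    }
    return total;
}

/* Usage example: one spot check. */
void sum_nonnegative_selfcheck(void)
{
    int data[] = { 1, 2, -3, 4 };
    int negatives = 0;
    assert(sum_nonnegative(data, 4, &negatives) == 7);
    assert(negatives == 1);
}
```

The hint changes only the layout of the generated code, not its result, so the function returns the same values with or without it.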