Optimizing Conditional Jumps

Another factor that has a considerable influence upon application performance is the use of conditional jumps. In general, our recommendation ( applicable to all processor types) is to avoid using conditional jumps that involve flags.

Why is that so? It is because the latest generations of Pentium processors include microprogram tools for predicting program branching, and they operate in a specific way. Sometimes, you can achieve the necessary result even without using conditional jumps: In some cases, you can do with certain bit manipulations.

Eliminating Conditional Jumps

As an example, we will consider the task of finding the absolute value of a signed number. We will consider two variants of the program code ”one using conditional jumps and one not. The number is stored in the EAX register.

Here, you can see a code fragment that finds the absolute value of a number in a traditional way, by using the conditional jump commands (Listing 1.10).

Listing 1.10: Finding the absolute value of a number by using conditional jumps

 . . .   cmp EAX, 0   jge ex   neg EAX ex:  . . .

The following fragment performs the same operation without using the jge command (Listing 1.11).

Listing 1.11: Finding the absolute value of a number without using conditional jumps

 . . .   cdq   xor EAX, EDX   sub EAX, EDX  . . .

For calculations of this kind, the CF carrying flag is extremely helpful. We will consider one more example, which contains conditional jump operators in its traditional implementation. Suppose you need to compare two integers ( i1 and i2 ) and set i1 equal to i2 if i2 < i1 . To make it clear, we will represent it by the corresponding C++ operator:

 if (b < a) a = b

The classical variant of the assembly code for implementing this task is shown in Listing 1.12.

Listing 1.12: A fragment of assembly code that evaluates the if (b < a) a = b expression by using conditional jumps

 . . . ; Stores the value of the i1 variable in the EAX register      mov  EAX, DWORD PTR I1  ; Compares the contents of the EAX register with the i2 value      cmp  EAX, DWORD PTR I2  ; if i1 > i2, makes i1 equal to i2      jae  set_i2_to_i1      jmp  ex    set_i2_to_i1:  ; Swaps the contents of EAX and i2      xchg EAX, DWORD PTR I2  ; Stores the contents of EAX in the i1 variable      mov  DWORD PTR I1, EAX    ex:      . . .

The implementation of the same algorithm without using conditional jumps looks more elegant and appears faster in performance (Listing 1.13).

Listing 1.13: A fragment of assembly code that evaluates the if (b < a) a = b expression without using conditional jumps

 . . .  mov  EAX, DWORD PTR I1  mov  EDX, DWORD PTR I2  sub  EDX, EAX  sbb  ECX, ECX  and  ECX, EDX  add  EAX, ECX  mov  DWORD PTR I1, EAX    . . .

In the next example, we select one of the two numbers according to the following pseudo-code:

 if (i1!= 0) i1 = i2;  else i1 = i3;

The classical solution using conditional jump commands can look like this (Listing 1.14).

Listing 1.14: An implementation of the if (i1 != 0) i1 = i2; else i1 = i3 algorithm with using the conditional jump commands

 . . . ; Stores the contents of the i1  i3 variables  ; in the EAX, EDX, ECX registers respectively     mov EAX, DWORD PTR I1     mov EDX, DWORD PTR I2     mov ECX, DWORD PTR I3  ; i1 = 0?     cmp EAX, 0  ; No, i1 = i2     jne set_i2_to_i1  ; Yes, i1 = i3     mov EAX, ECX     jmp ex  set_i2_to_i1:     mov EAX, EDX  ex:     mov DWORD PTR I3, EAX  . . .

Now, you can see a fragment of the source code that does not use the conditional jump operators (Listing 1.15).

Listing 1.15: An implementation of the if (i1 != 0) i1 = i2; else i1 = i3 algorithm without using the conditional jump commands

 . . .  mov EAX, DWORD PTR I1  mov EDX, DWORD PTR I2  mov ECX, DWORD PTR I3  cmp EAX, 1  sbb EAX, EAX  xor ECX, EDX  and EAX, ECX  xor EAX, EDX  mov DWORD PTR I3, EAX  . . .

It is important to note that the overall performance of the application also depends on the way in which such a code fragment interacts with the rest of the program.

Optimizing Program Branching

If you cannot do without using conditional jumps, you can try optimizing the code fragment by establishing the proper program branching. To see what this means, we will consider the following example. Suppose you have a fragment performing the following sequence of commands:

 . . .   test  EAX,EAX    jz    label_A      . . .      <branch 1 commands>      . . .    jmp  label_B  label_A:      . . .      <branch 2 commands>      . . .  label_B:      . . .

Suppose the commands of Branch 1 are executed much more frequently than those of Branch 2. In this case, the performance will be hindered by the delays caused by resetting and reinitializing the processor branching block. This is because the jump to another branch of the code is often unpredictable. We will try organizing the loop in another way:

 ...    test EAX, EAX     jnz   label_A        . . .        <branch 2 commands>        . . .     jmp label_B  label_A:        . . .        <branch 1 commands>        . . .  label_B:        . . .

In this case, the branching block will have a much greater number of right hits, and this will increase the performance on this code fragment. The optimization of loop calculations cannot be reduced to the examples given above. To solve problems of this kind, it is necessary to have a clear idea of the principles of processor operation and its main functional units.