26.6. Recoding in a Low-Level Language | Code Complete: A Practical Handbook of Software Construction, Second Edition

< Free Open Study >

One long-standing piece of conventional wisdom that shouldn't be left unmentioned is the advice that when you run into a performance bottleneck, you should recode in a low-level language. If you're coding in C++, the low-level language might be assembler. If you're coding in Python, the low-level language might be C. Recoding in a lowlevel language tends to improve both speed and code size. Here is a typical approach to optimizing with a low-level language:

Write 100 percent of an application in a high-level language.
Fully test the application, and verify that it's correct.
If performance improvements are needed after that, profile the application to identify hot spots. Since about 5 percent of a program usually accounts for about 50 percent of the running time, you can usually identify small pieces of the program as hot spots.
Cross-Reference

For details on the phenomenon of a small percentage of a program accounting for most of its run time, see "The Pareto Principle" in Section 25.2.
Recode a few small pieces in a low-level language to improve overall performance.

Whether you follow this well-beaten path depends on how comfortable you are with low-level languages, how well-suited the problem is to low-level languages, and on your level of desperation. I got my first exposure to this technique on the Data Encryption Standard program I mentioned in the previous chapter. I had tried every optimization I'd ever heard of, and the program was still twice as slow as the speed goal. Recoding part of the program in assembler was the only remaining option. As an assembler novice, about all I could do was make a straight translation from a high-level language to assembler, but I got a 50 percent improvement even at that rudimentary level.

Suppose you have a routine that converts binary data to uppercase ASCII characters. The next example shows the Delphi code to do it:

Delphi Example of Code That's Better Suited to Assembler

procedure HexExpand(    var source: ByteArray;    var target: WordArray;    byteCount: word ); var    index: integer;    lowerByte: byte;    upperByte: byte;    targetIndex: integer; begin    targetIndex := 1;    for index := 1 to byteCount do begin       target[ targetIndex ] := ( (source[ index ] and $F0) shr 4 ) + $41;       target[ targetIndex+1 ] := (source[ index ] and $0f) + $41;       targetIndex := targetIndex + 2;    end; end;

Although it's hard to see where the fat is in this code, it contains a lot of bit manipulation, which isn't exactly Delphi's forte. Bit manipulation is assembler's forte, however, so this code is a good candidate for recoding. Here's the assembler code:

Example of a Routine Recoded in Assembler

procedure HexExpand(    var source;    var target;    byteCount : Integer );    label    EXPAND;    asm          MOV   ECX,byteCount      // load number of bytes to expand          MOV   ESI,source         // source offset          MOV   EDI,target         // target offset          XOR   EAX,EAX            // zero out array offset    EXPAND:          MOV   EBX,EAX            // array offset          MOV   DL,[ESI+EBX]       // get source byte          MOV   DH,DL              // copy source byte          AND   DH,$F              // get msbs          ADD   DH,$41             // add 65 to make upper case          SHR   DL,4               // move lsbs into position          AND   DL,$F              // get lsbs          ADD   DL,$41             // add 65 to make upper case          SHL   BX,1               // double offset for target array offset          MOV   [EDI+EBX],DX       // put target word          INC   EAX                // increment array offset          LOOP  EXPAND             // repeat until finished    end;

Rewriting in assembler in this case was profitable, resulting in a time savings of 41 percent. It's logical to assume that code in a language that's more suited to bit manipulation C++, for instance would have less to gain than Delphi code would. Here are the results:

Language	High-Level Time	Assembler Time	Time Savings
C++	4.25	3.02	29%
Delphi	5.18	3.04	41%

The "before" picture in these measurements reflects the two languages' strengths at bit manipulation. The "after" picture looks virtually identical, and it appears that the assembler code has minimized the initial performance differences between Delphi and C++.

The assembler routine shows that rewriting in assembler doesn't have to produce a huge, ugly routine. Such routines are often quite modest, as this one is. Sometimes assembler code is almost as compact as its high-level-language equivalent.

A relatively easy and effective strategy for recoding in assembler is to start with a compiler that generates assembler listings as a byproduct of compilation. Extract the assembler code for the routine you need to tune, and save it in a separate source file. Using the compiler's assembler code as a base, hand-optimize the code, checking for correctness and measuring improvements at each step. Some compilers intersperse the high-level-language statements as comments in the assembler code. If yours does, you might keep them in the assembler code as documentation.

cc2e.com/2672

Checklist: Code-Tuning Techniques

Improve Both Speed and Size

Substitute table lookups for complicated logic.
Jam loops.
Use integer instead of floating-point variables.
Initialize data at compile time.
Use constants of the correct type.
Precompute results.
Eliminate common subexpressions.
Translate key routines to a low-level language.

Improve Speed Only

Stop testing when you know the answer.
Order tests in case statements and if-then-else chains by frequency.
Compare performance of similar logic structures.
Use lazy evaluation.
Unswitch loops that contain if tests.
Unroll loops.
Minimize work performed inside loops.
Use sentinels in search loops.
Put the busiest loop on the inside of nested loops.
Reduce the strength of operations performed inside loops.
Change multiple-dimension arrays to a single dimension.
Minimize array references.
Augment data types with indexes.
Cache frequently used values.
Exploit algebraic identities.
Reduce strength in logical and mathematical expressions.
Be wary of system routines.
Rewrite routines inline.

< Free Open Study >