26.4. Expressions


Much of the work in a program is done inside mathematical or logical expressions. Complicated expressions tend to be expensive, so this section looks at ways to make them cheaper.

Cross-Reference

For more information on expressions, see Section 19.1, "Boolean Expressions."


Exploit Algebraic Identities

You can use algebraic identities to replace costly operations with cheaper ones. For example, the following expressions are logically equivalent:

not a and not b
not (a or b)

If you choose the second expression instead of the first, you can save a not operation.
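In C++, for example, the two forms look like this (a minimal sketch; a and b stand in for any boolean expressions):

bool bothFalse_Straight( bool a, bool b ) {
   return !a && !b;       // two not operations and an and
}

bool bothFalse_Tuned( bool a, bool b ) {
   return !( a || b );    // one not and an or: the same result, by De Morgan's law
}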

Although the savings from avoiding a single not operation are probably inconsequential, the general principle is powerful. Jon Bentley describes a program that tested whether sqrt(x) < sqrt(y) (1982). Since sqrt(x) is less than sqrt(y) exactly when x is less than y (for nonnegative x and y), you can replace the first test with x < y. Given the cost of the sqrt() routine, you'd expect the savings to be dramatic, and they are. Here are the results:

Language        Straight Time    Code-Tuned Time    Time Savings    Performance Ratio
C++             7.43             0.010              99.9%           750:1
Visual Basic    4.59             0.220              95%             20:1
Python          4.21             0.401              90%             10:1
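The transformation itself is a one-line change. Here's a minimal C++ sketch of the before and after (the function names are illustrative), assuming nonnegative x and y:

#include <cmath>

bool lessBySqrt( double x, double y ) {
   return std::sqrt( x ) < std::sqrt( y );   // straight version: two sqrt() calls
}

bool lessDirect( double x, double y ) {
   return x < y;                             // tuned version: no sqrt() at all
}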


Use Strength Reduction

As mentioned earlier, strength reduction means replacing an expensive operation with a cheaper one. Here are some possible substitutions:

  • Replace multiplication with addition.

  • Replace exponentiation with multiplication.

  • Replace trigonometric routines with their trigonometric identities.

  • Replace long long integers with longs or ints (but watch for performance issues associated with using native-length vs. non-native-length integers).

  • Replace floating-point numbers with fixed-point numbers or integers.

  • Replace double-precision floating points with single-precision numbers.

  • Replace integer multiplication-by-two and division-by-two with shift operations, as shown in the sketch after this list.
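Here's the sketch promised in the last item, a minimal C++ illustration: for nonnegative integers, shifting left by one bit multiplies by two, and shifting right by one bit divides by two. Most modern compilers make this particular substitution automatically, so measure before assuming it helps.

unsigned int doubleOf( unsigned int n ) {
   return n << 1;   // same result as n * 2
}

unsigned int halfOf( unsigned int n ) {
   return n >> 1;   // same result as n / 2
}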

Suppose you have to evaluate a polynomial. If you're rusty on polynomials, they're the things that look like Ax² + Bx + C. The letters A, B, and C are coefficients, and x is a variable. General code to evaluate an nth-order polynomial looks like this:

Visual Basic Example of Evaluating a Polynomial
value = coefficient( 0 )
For power = 1 To order
   value = value + coefficient( power ) * x^power
Next

If you're thinking about strength reduction, you'll look at the exponentiation operator with a jaundiced eye. One solution would be to replace the exponentiation with a multiplication on each pass through the loop, which is analogous to the strength-reduction case a few sections ago in which a multiplication was replaced with an addition. Here's how the reduced-strength polynomial evaluation would look:

Visual Basic Example of a Reduced-Strength Method of Evaluating a Polynomial
value = coefficient( 0 )
powerOfX = x
For power = 1 To order
   value = value + coefficient( power ) * powerOfX
   powerOfX = powerOfX * x
Next

This produces a noticeable advantage if you're working with second-order polynomials (that is, polynomials in which the highest-power term is squared) or higher-order polynomials:

Language        Straight Time    Code-Tuned Time    Time Savings    Performance Ratio
Python          3.24             2.60               20%             1:1
Visual Basic    6.26             0.160              97%             40:1


If you're serious about strength reduction, you still won't care for those two floating-point multiplications. The strength-reduction principle suggests that you can further reduce the strength of the operations in the loop by accumulating powers rather than multiplying them each time:

Visual Basic Example of Further Reducing the Strength Required to Evaluate a Polynomial
value = 0
For power = order To 1 Step -1
   value = ( value + coefficient( power ) ) * x
Next
value = value + coefficient( 0 )

This method eliminates the extra powerOfX variable and replaces the two multiplications in each pass through the loop with one. The results:

Language        Straight Time    First Optimization    Second Optimization    Savings over First Optimization
Python          3.24             2.60                  2.53                   3%
Visual Basic    6.26             0.16                  0.31                   -94%


This is a good example of theory not holding up very well to practice. The code with reduced strength seems like it should be faster, but it isn't. One possibility is that decrementing a loop by 1 instead of incrementing it by 1 in Visual Basic hurts performance, but you'd have to measure that hypothesis to be sure.

Initialize at Compile Time

If you're using a named constant or a magic number in a routine call and it's the only argument, that's a clue that you could precompute the number, put it into a constant, and avoid the routine call. The same principle applies to multiplications, divisions, additions, and other operations.

I once needed to compute the base-two logarithm of an integer, truncated to the nearest integer. The system didn't have a log-base-two routine, so I wrote my own. The quick and easy approach was to use this fact:

log_base( x ) = log( x ) / log( base )

Given this identity, I could write a routine like this one:

C++ Example of a Log-Base-Two Routine Based on System Routines
unsigned int Log2( unsigned int x ) {
   return (unsigned int) ( log( x ) / log( 2 ) );
}

Cross-Reference

For details on binding variables to their values, see Section 10.6, "Binding Time."


This routine was really slow, and because the value of log(2) never changed, I replaced log(2) with its computed value, 0.69314718, like this:

C++ Example of a Log-Base-Two Routine Based on a System Routine and a Constant
const double LOG2 = 0.69314718;
...
unsigned int Log2( unsigned int x ) {
   return (unsigned int) ( log( x ) / LOG2 );
}

Since log() tends to be an expensive routine, much more expensive than type conversions or division, you'd expect that cutting the calls to the log() function by half would cut the time required for the routine by about half. Here are the measured results:

Language    Straight Time    Code-Tuned Time    Time Savings
C++         9.66             5.97               38%
Java        17.0             12.3               28%
PHP         2.45             1.50               39%


In this case, the educated guess about the relative importance of the division and type conversions and the estimate of 50 percent were pretty close. Considering the unpredictability of the results described in this chapter, the accuracy of my prediction in this case proves only that even a blind squirrel finds a nut occasionally.

Be Wary of System Routines

System routines are expensive and provide accuracy that's often wasted. Typical system math routines, for example, are designed to put an astronaut on the moon within ±2 feet of the target. If you don't need that degree of accuracy, you don't need to spend the time to compute it either.

In the previous example, the Log2() routine returned an integer value but used a floating-point log() routine to compute it. That was overkill for an integer result, so after my first attempt, I wrote a series of integer tests that were perfectly accurate for calculating an integer log2. Here's the code:

C++ Example of a Log-Base-Two Routine Based on Integers
unsigned int Log2( unsigned int x ) {
   if ( x < 2 ) return 0;
   if ( x < 4 ) return 1;
   if ( x < 8 ) return 2;
   if ( x < 16 ) return 3;
   if ( x < 32 ) return 4;
   if ( x < 64 ) return 5;
   if ( x < 128 ) return 6;
   if ( x < 256 ) return 7;
   if ( x < 512 ) return 8;
   if ( x < 1024 ) return 9;
   ...
   if ( x < 2147483648 ) return 30;
   return 31;
}

This routine uses integer operations, never converts to floating point, and blows the doors off both floating-point versions:

Language    Straight Time    Code-Tuned Time    Time Savings    Performance Ratio
C++         9.66             0.662              93%             15:1
Java        17.0             0.882              95%             20:1
PHP         2.45             3.45               -41%            2:3


Most of the so-called "transcendental" functions are designed for the worst case; that is, they convert to double-precision floating point internally even if you give them an integer argument. If you find one in a tight section of code and don't need that much accuracy, give it your immediate attention.

Another option is to take advantage of the fact that a right-shift operation is the same as dividing by two. The number of times you can divide a number by two and still have a nonzero value is the same as the log2 of that number. Here's how code based on that observation looks:

C++ Example of an Alternative Log-Base-Two Routine Based on the Right-Shift Operator

unsigned int Log2( unsigned int x ) {
   unsigned int i = 0;
   while ( ( x = ( x >> 1 ) ) != 0 ) {
      i++;
   }
   return i;
}


To non-C++ programmers, this code is particularly hard to read. The complicated expression in the while condition is an example of a coding practice you should avoid unless you have a good reason to use it.

This routine takes roughly 3.5 times as long as the longer version above, executing in 2.4 seconds rather than 0.66 seconds. But it's faster than the first approach, and it adapts easily to 32-bit, 64-bit, and other environments.
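Adapting the shift-loop version to another word size is mostly a matter of changing the parameter type. Here's a minimal 64-bit C++ sketch (the name Log2_64 is illustrative, not from the original):

#include <cstdint>

unsigned int Log2_64( std::uint64_t x ) {
   unsigned int i = 0;
   while ( ( x >>= 1 ) != 0 ) {
      i++;
   }
   return i;
}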

This example highlights the value of not stopping after one successful optimization. The first optimization earned a respectable 30 to 40 percent savings but had nowhere near the impact of the second or third optimizations.


Use the Correct Type of Constants

Use named constants and literals that are the same type as the variables they're assigned to. When a constant and its related variable are different types, the compiler has to do a type conversion to assign the constant to the variable. A good compiler does the type conversion at compile time so that it doesn't affect run-time performance.

A less advanced compiler or an interpreter generates code for a run-time conversion, so you might be stuck. Here are some differences in performance between the initializations of a floating-point variable x and an integer variable i in two cases. In the first case, the initializations look like this:

x = 5
i = 3.14

and require type conversions, assuming x is a floating-point variable and i is an integer. In the second case, they look like this:

x = 3.14
i = 5

and don't require type conversions. Here are the results, and the variation among compilers is once again notable:

Language        Straight Time    Code-Tuned Time    Time Savings    Performance Ratio
C++             1.11             0.000              100%            not measurable
C#              1.49             1.48               <1%             1:1
Java            1.66             1.11               33%             1.5:1
Visual Basic    0.721            0.000              100%            not measurable
PHP             0.872            0.847              3%              1:1
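In C++, for instance, keeping each literal's type matched to its variable's type is all it takes (a minimal sketch):

double x = 3.14;   // floating-point literal for a floating-point variable
int    i = 5;      // integer literal for an integer variable
// By contrast, "double x = 5;" and "int i = 3.14;" each force a conversion,
// and the second one also truncates the value.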


Precompute Results

A common low-level design decision is the choice of whether to compute results on the fly or compute them once, save them, and look them up as needed. If the results are used many times, it's often cheaper to compute them once and look them up the rest of the time.

This choice manifests itself in several ways. At the simplest level, you might compute part of an expression outside a loop rather than inside. An example of this appeared earlier in the chapter. At a more complicated level, you might compute a lookup table once when program execution begins, using it every time thereafter, or you might store results in a data file or embed them in a program.

In a space-wars video game, for example, the programmers initially computed gravity coefficients for different distances from the sun. The computation for the gravity coefficients was expensive and affected performance. The program recognized relatively few distinct distances from the sun, however, so the programmers were able to precompute the gravity coefficients and store them in a 10-element array. The array lookup was much faster than the expensive computation.
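The pattern might look like the following C++ sketch. The names and the coefficient formula here are hypothetical stand-ins, not the game's actual code:

#include <cstddef>

const std::size_t DISTANCE_BANDS = 10;          // the few distinct distances the game recognizes
double gravityCoefficient[ DISTANCE_BANDS ];    // filled in once at startup

// Stand-in for the game's expensive physics computation.
double ComputeGravityCoefficient( std::size_t band ) {
   return 1.0 / ( ( band + 1 ) * ( band + 1 ) );   // hypothetical formula
}

// Pay the computation cost once per band at program startup.
void InitGravityTable() {
   for ( std::size_t band = 0; band < DISTANCE_BANDS; band++ ) {
      gravityCoefficient[ band ] = ComputeGravityCoefficient( band );
   }
}

// Thereafter, every use is a cheap array lookup.
double GravityFor( std::size_t band ) {
   return gravityCoefficient[ band ];
}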

Cross-Reference

For more on using data in tables instead of complex logic, see Chapter 18, "Table-Driven Methods."


Suppose you have a routine that computes payment amounts on automobile loans. The code for such a routine would look like this:

Java Example of a Complex Computation That Could Be Precomputed
double ComputePayment(
   long loanAmount,
   int months,
   double interestRate
   ) {
   return loanAmount /
      (
      ( 1.0 - Math.pow( ( 1.0 + ( interestRate / 12.0 ) ), -months ) ) /
      ( interestRate / 12.0 )
      );
}

The formula for computing loan payments is complicated and fairly expensive. Putting the information into a table instead of computing it each time would probably be cheaper.

How big would the table be? The widest-ranging variable is loanAmount. The variable interestRate might range from 5 percent through 20 percent by quarter points, but that's only 61 distinct rates. months might range from 12 through 72, but that's only 61 distinct periods. loanAmount could conceivably range from $1000 through $100,000, which is more entries than you'd generally want to handle in a lookup table.

Most of the computation doesn't depend on loanAmount, however, so you can put the really ugly part of the computation (the denominator of the larger expression) into a table that's indexed by interestRate and months. You recompute the loanAmount part each time:

Java Example of Precomputing a Complex Computation
double ComputePayment(
   long loanAmount,
   int months,
   double interestRate
   ) {
   int interestIndex =       <-- 1
      (int) Math.round( ( interestRate - LOWEST_RATE ) * GRANULARITY * 100.00 );
   return loanAmount / loanDivisor[ interestIndex ][ months ];
}

(1)The new variable interestIndex is created to provide a subscript into the loanDivisor array.

In this code, the hairy calculation has been replaced with the computation of an array index and a single array access. Here are the results of that change:

Language    Straight Time    Code-Tuned Time    Time Savings    Performance Ratio
Java        2.97             0.251              92%             10:1
Python      3.86             4.63               -20%            1:1


Depending on your circumstances, you would need to precompute the loanDivisor array at program initialization time or read it from a disk file. Alternatively, you could initialize it to 0, compute each element the first time it's requested, store it, and look it up each time it's requested subsequently. That would be a form of caching, discussed earlier.
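That compute-on-first-use variant might look like the following C++ sketch. The dimensions and names are illustrative, and a divisor of 0 is used to mean "not computed yet":

#include <cmath>

const int RATE_COUNT  = 61;   // 5 percent through 20 percent by quarter points
const int MONTH_COUNT = 73;   // allows direct indexing by month for 12 through 72

double loanDivisor[ RATE_COUNT ][ MONTH_COUNT ];   // globals start zero-initialized

double Divisor( int interestIndex, int months, double interestRate ) {
   // compute each element the first time it's requested; reuse it afterward
   if ( loanDivisor[ interestIndex ][ months ] == 0.0 ) {
      double monthlyInterest = interestRate / 12.0;
      loanDivisor[ interestIndex ][ months ] =
         ( 1.0 - std::pow( 1.0 + monthlyInterest, -months ) ) / monthlyInterest;
   }
   return loanDivisor[ interestIndex ][ months ];
}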

You don't have to create a table to take advantage of the performance gains you can achieve by precomputing an expression. Code similar to the code in the previous examples raises the possibility of a different kind of precomputation. Suppose you have code that computes payments for many loan amounts, as shown here:

Java Example of a Second Complex Computation That Could Be Precomputed
double ComputePayments(
   int months,
   double interestRate
   ) {
   for ( long loanAmount = MIN_LOAN_AMOUNT; loanAmount < MAX_LOAN_AMOUNT;
      loanAmount++ ) {
      payment = loanAmount / (
         ( 1.0 - Math.pow( 1.0 + ( interestRate / 12.0 ), -months ) ) /
         ( interestRate / 12.0 )
         );
      ...       <-- 1
   }
}

(1)The following code would do something with payment here; for the purposes of this example, it doesn't matter what.

Even without precomputing a table, you can precompute the complicated part of the expression outside the loop and use it inside the loop. Here's how it would look:

Java Example of Precomputing the Second Complex Computation
double ComputePayments(
   int months,
   double interestRate
   ) {
   double divisor = ( 1.0 - Math.pow( 1.0 + ( interestRate / 12.0 ), -months ) ) /       <-- 1
      ( interestRate / 12.0 );       <-- 1
   for ( long loanAmount = MIN_LOAN_AMOUNT; loanAmount <= MAX_LOAN_AMOUNT;
      loanAmount++ ) {
      payment = loanAmount / divisor;
      ...
   }
}

(1)Here's the part that's precomputed.

This is similar to the techniques suggested earlier of putting array references and pointer dereferences outside a loop. The results for Java in this case are comparable to the results of using the precomputed table in the first optimization:

Language    Straight Time    Code-Tuned Time    Time Savings    Performance Ratio
Java        7.43             0.24               97%             30:1
Python      5.00             1.69               66%             3:1


Python improved here, but not in the first optimization attempt. Many times when one optimization does not produce the desired results, a seemingly similar optimization will work as expected.

Optimizing a program by precomputation can take several forms:

  • Computing results before the program executes, and wiring them into constants that are assigned at compile time

  • Computing results before the program executes, and hard-coding them into variables used at run time

  • Computing results before the program executes, and putting them into a file that's loaded at run time

  • Computing results once, at program startup, and then referencing them each time they're needed

  • Computing as much as possible before a loop begins, minimizing the work done inside the loop

  • Computing results the first time they're needed, and storing them so that you can retrieve them when they're needed again

Eliminate Common Subexpressions

If you find an expression that's repeated several times, assign it to a variable and refer to the variable rather than recomputing the expression in several places. The loan-calculation example has a common subexpression that you could eliminate. This is the original code:

Java Example of a Common Subexpression
payment = loanAmount / (
      ( 1.0 - Math.pow( 1.0 + ( interestRate / 12.0 ), -months ) ) /
      ( interestRate / 12.0 )
   );

In this sample, you can assign interestRate/12.0 to a variable that is then referenced twice rather than computing the expression twice. If you have chosen the variable name well, this optimization can improve the code's readability at the same time that it improves performance. This is the revised code:

Java Example of Eliminating a Common Subexpression
monthlyInterest = interestRate / 12.0;
payment = loanAmount / (
      ( 1.0 - Math.pow( 1.0 + monthlyInterest, -months ) ) /
      monthlyInterest
   );

The savings in this case don't seem impressive:

Language    Straight Time    Code-Tuned Time    Time Savings
Java        2.94             2.83               4%
Python      3.91             3.94               -1%


It appears that the Math.pow() routine is so costly that it overshadows the savings from subexpression elimination. Or possibly the subexpression is already being eliminated by the compiler. If the subexpression were a bigger part of the cost of the whole expression or if the compiler optimizer were less effective, the optimization might have more impact.


