# 26.4. Expressions

Much of the work in a program is done inside mathematical or logical expressions. Complicated expressions tend to be expensive, so this section looks at ways to make them cheaper.

#### Exploit Algebraic Identities

You can use algebraic identities to replace costly operations with cheaper ones. For example, the following expressions are logically equivalent:

```
not a and not b
not (a or b)
```

If you choose the second expression instead of the first, you can save a not operation.
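The equivalence is De Morgan's law, and it's easy to confirm for every combination of operands. A minimal check in Python:

```python
# Verify De Morgan's identity for all four boolean combinations:
# (not a) and (not b) is the same as not (a or b).
for a in (False, True):
    for b in (False, True):
        assert ((not a) and (not b)) == (not (a or b))
```

The second form evaluates one `or` and one `not` instead of two `not`s and an `and`.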

Although the savings from avoiding a single not operation are probably inconsequential, the general principle is powerful. Jon Bentley describes a program that tested whether sqrt(x) < sqrt(y) (1982). Since sqrt(x) is less than sqrt(y) only when x is less than y, you can replace the first test with x < y. Given the cost of the sqrt() routine, you'd expect the savings to be dramatic, and they are. Here are the results:

| Language | Straight Time | Code-Tuned Time | Time Savings | Performance Ratio |
| --- | --- | --- | --- | --- |
| C++ | 7.43 | 0.010 | 99.9% | 750:1 |
| Visual Basic | 4.59 | 0.220 | 95% | 20:1 |
| Python | 4.21 | 0.401 | 90% | 10:1 |
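The transformation works because sqrt() is monotonic for nonnegative arguments. A minimal Python sketch of the before-and-after tests (the function names are illustrative, not from Bentley's program):

```python
import math

def slow_less(x, y):
    # Original test: two expensive square-root calls.
    return math.sqrt(x) < math.sqrt(y)

def fast_less(x, y):
    # Tuned test: sqrt() is monotonic for nonnegative inputs,
    # so sqrt(x) < sqrt(y) exactly when x < y.
    return x < y

# The two tests agree everywhere in the nonnegative range.
assert all(slow_less(x, y) == fast_less(x, y)
           for x in range(50) for y in range(50))
```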

#### Use Strength Reduction

As mentioned earlier, strength reduction means replacing an expensive operation with a cheaper one. Here are some possible substitutions:

• Replace exponentiation with multiplication.

• Replace trigonometric routines with their trigonometric identities.

• Replace long long integers with long or int variables (but watch for performance issues associated with using native-length vs. non-native-length integers).

• Replace floating-point numbers with fixed-point numbers or integers.

• Replace double-precision floating points with single-precision numbers.

• Replace integer multiplication-by-two and division-by-two with shift operations.
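The last substitution is easy to check. In Python, where >> is an arithmetic shift, a right shift by one matches floor division by two, and a left shift by one matches multiplication by two:

```python
# Strength reduction: shifts in place of multiply/divide by two.
for n in range(0, 1025):
    assert n >> 1 == n // 2    # division by two as a right shift
    assert n << 1 == n * 2     # multiplication by two as a left shift
```

In compiled languages a good optimizer often makes this substitution for you, so measure before hand-tuning it.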

Suppose you have to evaluate a polynomial. If you're rusty on polynomials, they're the things that look like Ax^2 + Bx + C. The letters A, B, and C are coefficients, and x is a variable. General code to evaluate an nth-order polynomial looks like this:

##### Visual Basic Example of Evaluating a Polynomial
```
value = coefficient( 0 )
For power = 1 To order
   value = value + coefficient( power ) * x^power
Next
```

If you're thinking about strength reduction, you'll look at the exponentiation operator with a jaundiced eye. One solution would be to replace the exponentiation with a multiplication on each pass through the loop, which is analogous to the strength-reduction case a few sections ago in which a multiplication was replaced with an addition. Here's how the reduced-strength polynomial evaluation would look:

##### Visual Basic Example of a Reduced-Strength Method of Evaluating a Polynomial
```
value = coefficient( 0 )
powerOfX = x
For power = 1 To order
   value = value + coefficient( power ) * powerOfX
   powerOfX = powerOfX * x
Next
```

This produces a noticeable advantage if you're working with second-order polynomials—that is, polynomials in which the highest-power term is squared—or higher-order polynomials:

| Language | Straight Time | Code-Tuned Time | Time Savings | Performance Ratio |
| --- | --- | --- | --- | --- |
| Python | 3.24 | 2.60 | 20% | 1:1 |
| Visual Basic | 6.26 | 0.160 | 97% | 40:1 |

If you're serious about strength reduction, you still won't care for those two floating-point multiplications. The strength-reduction principle suggests that you can further reduce the strength of the operations in the loop by accumulating powers rather than multiplying them each time:

##### Visual Basic Example of Further Reducing the Strength Required to Evaluate a Polynomial
```
value = 0
For power = order To 1 Step -1
   value = ( value + coefficient( power ) ) * x
Next
value = value + coefficient( 0 )
```

This method eliminates the extra powerOfX variable and replaces the two multiplications in each pass through the loop with one. The results:

| Language | Straight Time | First Optimization | Second Optimization | Savings over First Optimization |
| --- | --- | --- | --- | --- |
| Python | 3.24 | 2.60 | 2.53 | 3% |
| Visual Basic | 6.26 | 0.16 | 0.31 | -94% |

This is a good example of theory not holding up very well to practice. The code with reduced strength seems like it should be faster, but it isn't. One possibility is that decrementing a loop by 1 instead of incrementing it by 1 in Visual Basic hurts performance, but you'd have to measure that hypothesis to be sure.
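Incidentally, the fully reduced loop is the classic Horner's-method form of polynomial evaluation. A short Python check, with made-up coefficients, confirms that it agrees with the straightforward exponentiation version:

```python
def poly_straight(coefficient, x):
    # Straightforward evaluation: one exponentiation per term.
    return sum(coefficient[p] * x ** p for p in range(len(coefficient)))

def poly_horner(coefficient, x):
    # Reduced-strength (Horner) form: one multiply and one add per coefficient.
    value = 0.0
    for power in range(len(coefficient) - 1, 0, -1):
        value = (value + coefficient[power]) * x
    return value + coefficient[0]

coeffs = [3.0, -2.0, 5.0, 1.0]    # represents x^3 + 5x^2 - 2x + 3
assert abs(poly_straight(coeffs, 2.0) - poly_horner(coeffs, 2.0)) < 1e-9
```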

#### Initialize at Compile Time

If you're using a named constant or a magic number in a routine call and it's the only argument, that's a clue that you could precompute the number, put it into a constant, and avoid the routine call. The same principle applies to multiplications, divisions, additions, and other operations.

I once needed to compute the base-two logarithm of an integer, truncated to the nearest integer. The system didn't have a log-base-two routine, so I wrote my own. The quick and easy approach was to use this fact:

```
log_base(x) = log(x) / log(base)
```

Given this identity, I could write a routine like this one:

##### C++ Example of a Log-Base-Two Routine Based on System Routines
```
unsigned int Log2( unsigned int x ) {
   return (unsigned int) ( log( x ) / log( 2 ) );
}
```

Cross-Reference

For details on binding variables to their values, see Section 10.6, "Binding Time."

This routine was really slow, and because the value of log(2) never changed, I replaced log(2) with its computed value, 0.69314718, like this:

##### C++ Example of a Log-Base-Two Routine Based on a System Routine and a Constant
```
const double LOG2 = 0.69314718;
...
unsigned int Log2( unsigned int x ) {
   return (unsigned int) ( log( x ) / LOG2 );
}
```

Since log() tends to be an expensive routine—much more expensive than type conversions or division—you'd expect that cutting the calls to the log() function by half would cut the time required for the routine by about half. Here are the measured results:

| Language | Straight Time | Code-Tuned Time | Time Savings |
| --- | --- | --- | --- |
| C++ | 9.66 | 5.97 | 38% |
| Java | 17.0 | 12.3 | 28% |
| PHP | 2.45 | 1.50 | 39% |

In this case, the educated guess about the relative importance of the division and type conversions and the estimate of 50 percent were pretty close. Considering the unpredictability of the results described in this chapter, the accuracy of my prediction in this case proves only that even a blind squirrel finds a nut occasionally.

#### Be Wary of System Routines

System routines are expensive and provide accuracy that's often wasted. Typical system math routines, for example, are designed to put an astronaut on the moon within ±2 feet of the target. If you don't need that degree of accuracy, you don't need to spend the time to compute it either.

In the previous example, the Log2() routine returned an integer value but used a floating-point log() routine to compute it. That was overkill for an integer result, so after my first attempt, I wrote a series of integer tests that were perfectly accurate for calculating an integer log2. Here's the code:

##### C++ Example of a Log-Base-Two Routine Based on Integers
```
unsigned int Log2( unsigned int x ) {
   if ( x < 2 ) return 0;
   if ( x < 4 ) return 1;
   if ( x < 8 ) return 2;
   if ( x < 16 ) return 3;
   if ( x < 32 ) return 4;
   if ( x < 64 ) return 5;
   if ( x < 128 ) return 6;
   if ( x < 256 ) return 7;
   if ( x < 512 ) return 8;
   if ( x < 1024 ) return 9;
   ...
   if ( x < 2147483648 ) return 30;
   return 31;
}
```

This routine uses integer operations, never converts to floating point, and blows the doors off both floating-point versions:

| Language | Straight Time | Code-Tuned Time | Time Savings | Performance Ratio |
| --- | --- | --- | --- | --- |
| C++ | 9.66 | 0.662 | 93% | 15:1 |
| Java | 17.0 | 0.882 | 95% | 20:1 |
| PHP | 2.45 | 3.45 | -41% | 2:3 |

Most of the so-called "transcendental" functions are designed for the worst case—that is, they convert to double-precision floating point internally even if you give them an integer argument. If you find one in a tight section of code and don't need that much accuracy, give it your immediate attention.

Another option is to take advantage of the fact that a right-shift operation is the same as dividing by two. The number of times you can divide a number by two and still have a nonzero value is the same as the log2 of that number. Here's how code based on that observation looks:

##### C++ Example of an Alternative Log-Base-Two Routine Based on the Right-Shift Operator
```
unsigned int Log2( unsigned int x ) {
   unsigned int i = 0;
   while ( ( x = ( x >> 1 ) ) != 0 ) {
      i++;
   }
   return i;
}
```

To non-C++ programmers, this code is particularly hard to read. The complicated expression in the while condition is an example of a coding practice you should avoid unless you have a good reason to use it.

This routine takes about 350 percent longer than the longer version above, executing in 2.4 seconds rather than 0.66 seconds. But it's faster than the first approach, and it adapts easily to 32-bit, 64-bit, and other environments.

This example highlights the value of not stopping after one successful optimization. The first optimization earned a respectable 30-40 percent savings but had nowhere near the impact of the second or third optimizations.

#### Use the Correct Type of Constants

Use named constants and literals that are the same type as the variables they're assigned to. When a constant and its related variable are different types, the compiler has to do a type conversion to assign the constant to the variable. A good compiler does the type conversion at compile time so that it doesn't affect run-time performance.

A less advanced compiler or an interpreter generates code for a run-time conversion, so you might be stuck. Here are some differences in performance between the initializations of a floating-point variable x and an integer variable i in two cases. In the first case, the initializations look like this:

```
x = 5
i = 3.14
```

and require type conversions, assuming x is a floating-point variable and i is an integer. In the second case, they look like this:

```
x = 3.14
i = 5
```

and don't require type conversions. Here are the results, and the variation among compilers is once again notable:

| Language | Straight Time | Code-Tuned Time | Time Savings | Performance Ratio |
| --- | --- | --- | --- | --- |
| C++ | 1.11 | 0.000 | 100% | not measurable |
| C# | 1.49 | 1.48 | <1% | 1:1 |
| Java | 1.66 | 1.11 | 33% | 1.5:1 |
| Visual Basic | 0.721 | 0.000 | 100% | not measurable |
| PHP | 0.872 | 0.847 | 3% | 1:1 |

#### Precompute Results

A common low-level design decision is the choice of whether to compute results on the fly or compute them once, save them, and look them up as needed. If the results are used many times, it's often cheaper to compute them once and look them up the rest of the time.

This choice manifests itself in several ways. At the simplest level, you might compute part of an expression outside a loop rather than inside. An example of this appeared earlier in the chapter. At a more complicated level, you might compute a lookup table once when program execution begins, using it every time thereafter, or you might store results in a data file or embed them in a program.

In a space-wars video game, for example, the programmers initially computed gravity coefficients for different distances from the sun. The computation for the gravity coefficients was expensive and affected performance. The program recognized relatively few distinct distances from the sun, however, so the programmers were able to precompute the gravity coefficients and store them in a 10-element array. The array lookup was much faster than the expensive computation.
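The same pattern can be sketched in Python. The formula and names below are invented for illustration; the point is the one-time computation feeding a cheap array lookup:

```python
# Hypothetical "expensive" gravity-coefficient computation.
def gravity_coefficient(distance):
    return 6.674e-11 / (distance * distance)

# Only a few distinct distances occur in the game, so compute each
# coefficient once, up front...
DISTANCES = [1.0e8 * (i + 1) for i in range(10)]
GRAVITY_TABLE = [gravity_coefficient(d) for d in DISTANCES]

# ...and replace every later computation with a table lookup.
def gravity_at(index):
    return GRAVITY_TABLE[index]

assert gravity_at(3) == gravity_coefficient(DISTANCES[3])
```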

Cross-Reference

For more on using data in tables instead of complex logic, see Chapter 18, "Table-Driven Methods."

Suppose you have a routine that computes payment amounts on automobile loans. The code for such a routine would look like this:

##### Java Example of a Complex Computation That Could Be Precomputed
```
double ComputePayment(
   long loanAmount,
   int months,
   double interestRate
   ) {
   return loanAmount /
      (
      ( 1.0 - Math.pow( ( 1.0 + ( interestRate / 12.0 ) ), -months ) ) /
      ( interestRate / 12.0 )
      );
}
```

The formula for computing loan payments is complicated and fairly expensive. Putting the information into a table instead of computing it each time would probably be cheaper.

How big would the table be? The widest-ranging variable is loanAmount. The variable interestRate might range from 5 percent through 20 percent by quarter points, but that's only 61 distinct rates. months might range from 12 through 72, but that's only 61 distinct periods. loanAmount could conceivably range from $1,000 through $100,000, which is more entries than you'd generally want to handle in a lookup table.

Most of the computation doesn't depend on loanAmount , however, so you can put the really ugly part of the computation (the denominator of the larger expression) into a table that's indexed by interestRate and months . You recompute the loanAmount part each time:

##### Java Example of Precomputing a Complex Computation
```
double ComputePayment(
   long loanAmount,
   int months,
   double interestRate
   ) {
   int interestIndex = (int)                                         // (1)
      Math.round( ( interestRate - LOWEST_RATE ) * GRANULARITY * 100.00 );
   return loanAmount / loanDivisor[ interestIndex ][ months ];
}
```

(1) The new variable interestIndex is created to provide a subscript into the loanDivisor array.

In this code, the hairy calculation has been replaced with the computation of an array index and a single array access. Here are the results of that change:

| Language | Straight Time | Code-Tuned Time | Time Savings | Performance Ratio |
| --- | --- | --- | --- | --- |
| Java | 2.97 | 0.251 | 92% | 10:1 |
| Python | 3.86 | 4.63 | -20% | 1:1 |

Depending on your circumstances, you would need to precompute the loanDivisor array at program initialization time or read it from a disk file. Alternatively, you could compute each element the first time it's requested, store it, and look it up each time it's requested subsequently. That would be a form of caching, discussed earlier.

You don't have to create a table to take advantage of the performance gains you can achieve by precomputing an expression. Code similar to the code in the previous examples raises the possibility of a different kind of precomputation. Suppose you have code that computes payments for many loan amounts, as shown here:

##### Java Example of a Second Complex Computation That Could Be Precomputed
```
double ComputePayments(
   int months,
   double interestRate
   ) {
   for ( long loanAmount = MIN_LOAN_AMOUNT; loanAmount < MAX_LOAN_AMOUNT;
      loanAmount++ ) {
      payment = loanAmount / (
         ( 1.0 - Math.pow( 1.0 + ( interestRate / 12.0 ), -months ) ) /
         ( interestRate / 12.0 )
         );
      ...                                                            // (1)
   }
}
```

(1) The code would do something with payment here; for this example's point, it doesn't matter what.

Even without precomputing a table, you can precompute the complicated part of the expression outside the loop and use it inside the loop. Here's how it would look:

##### Java Example of Precomputing the Second Complex Computation
```
double ComputePayments(
   int months,
   double interestRate
   ) {
   double divisor = ( 1.0 - Math.pow( 1.0 + ( interestRate / 12.0 ), -months ) ) /  // (1)
      ( interestRate / 12.0 );                                                      // (1)
   for ( long loanAmount = MIN_LOAN_AMOUNT; loanAmount <= MAX_LOAN_AMOUNT;
      loanAmount++ ) {
      payment = loanAmount / divisor;
      ...
   }
}
```

(1) Here's the part that's precomputed.

This is similar to the techniques suggested earlier of putting array references and pointer dereferences outside a loop. The results for Java in this case are comparable to the results of using the precomputed table in the first optimization:

| Language | Straight Time | Code-Tuned Time | Time Savings | Performance Ratio |
| --- | --- | --- | --- | --- |
| Java | 7.43 | 0.24 | 97% | 30:1 |
| Python | 5.00 | 1.69 | 66% | 3:1 |

Python improved here, but not in the first optimization attempt. Many times when one optimization does not produce the desired results, a seemingly similar optimization will work as expected.

Optimizing a program by precomputation can take several forms:

• Computing results before the program executes, and wiring them into constants that are assigned at compile time

• Computing results before the program executes, and hard-coding them into variables used at run time

• Computing results before the program executes, and putting them into a file that's loaded at run time

• Computing results once, at program startup, and then referencing them each time they're needed

• Computing as much as possible before a loop begins, minimizing the work done inside the loop

• Computing results the first time they're needed, and storing them so that you can retrieve them when they're needed again
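The last form on the list, computing a result the first time it's needed and storing it for later requests, can be sketched as a simple cache in Python (the names are invented for illustration):

```python
cache = {}

def expensive(n):
    # Stand-in for a costly computation.
    return sum(i * i for i in range(n))

def cached_expensive(n):
    # Compute the result on the first request only...
    if n not in cache:
        cache[n] = expensive(n)
    # ...and retrieve the stored value on every later request.
    return cache[n]

assert cached_expensive(100) == expensive(100)
assert 100 in cache    # stored for subsequent lookups
```

This trades memory for speed, which pays off only when the same inputs recur often enough to amortize the storage.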

#### Eliminate Common Subexpressions

If you find an expression that's repeated several times, assign it to a variable and refer to the variable rather than recomputing the expression in several places. The loan-calculation example has a common subexpression that you could eliminate. This is the original code:

##### Java Example of a Common Subexpression
```
payment = loanAmount / (
   ( 1.0 - Math.pow( 1.0 + ( interestRate / 12.0 ), -months ) ) /
   ( interestRate / 12.0 )
   );
```

In this sample, you can assign interestRate/12.0 to a variable that is then referenced twice rather than computing the expression twice. If you have chosen the variable name well, this optimization can improve the code's readability at the same time that it improves performance. This is the revised code:

##### Java Example of Eliminating a Common Subexpression
```
monthlyInterest = interestRate / 12.0;
payment = loanAmount / (
   ( 1.0 - Math.pow( 1.0 + monthlyInterest, -months ) ) /
   monthlyInterest
   );
```

The savings in this case don't seem impressive:

| Language | Straight Time | Code-Tuned Time | Time Savings |
| --- | --- | --- | --- |
| Java | 2.94 | 2.83 | 4% |
| Python | 3.91 | 3.94 | -1% |

It appears that the Math.pow() routine is so costly that it overshadows the savings from subexpression elimination. Or possibly the subexpression is already being eliminated by the compiler. If the subexpression were a bigger part of the cost of the whole expression or if the compiler optimizer were less effective, the optimization might have more impact.
