26.2. Loops | Code Complete: A Practical Handbook of Software Construction, Second Edition

< Free Open Study >

Because loops are executed many times, the hot spots in a program are often inside loops. The techniques in this section make the loop itself faster.

Cross-Reference

For other details on loops, see Chapter 16, "Controlling Loops."

Unswitching

Switching refers to making a decision inside a loop every time it's executed. If the decision doesn't change while the loop is executing, you can unswitch the loop by making the decision outside the loop. Usually this requires turning the loop inside out, putting loops inside the conditional rather than putting the conditional inside the loop. Here's an example of a loop before unswitching:

C++ Example of a Switched Loop

for ( i = 0; i < count; i++ ) {    if ( sumType == SUMTYPE_NET ) {       netSum = netSum + amount[ i ];    }    else {       grossSum = grossSum + amount[ i ];    } }

In this code, the test if ( sumType == SUMTYPE_NET ) is repeated through each iteration, even though it'll be the same each time through the loop. You can rewrite the code for a speed gain this way:

C++ Example of an Unswitched Loop

if ( sumType == SUMTYPE_NET ) {    for ( i = 0; i < count; i++ ) {       netSum = netSum + amount[ i ];    } } else {    for ( i = 0; i < count; i++ ) {       grossSum = grossSum + amount[ i ];    } }

Note

This code fragment violates several rules of good programming. Readability and maintenance are usually more important than execution speed or size, but in this chapter the topic is performance, and that implies a tradeoff with the other objectives. As in the last chapter, you'll see examples of coding practices here that aren't recommended in other parts of this book.

This is good for about a 20 percent time savings:

Language	Straight Time	Code-Tuned Time	Time Savings
C++	2.81	2.27	19%
Java	3.97	3.12	21%
Visual Basic	2.78	2.77	<1%
Python	8.14	5.87	28%

A hazard distinct to this case is that the two loops have to be maintained in parallel. If count changes to clientCount, you have to remember to change it in both places, which is an annoyance for you and a maintenance headache for anyone else who has to work with the code.

This example also illustrates a key challenge in code tuning: the effect of any specific code tuning is not predictable. The code tuning produced significant improvements in three of the four languages but not in Visual Basic. To perform this specific optimization in this specific version of Visual Basic would produce less maintainable code without any offsetting gain in performance. The general lesson is that you must measure the effect of each specific optimization to be sure of its effect no exceptions.

Jamming

Jamming, or "fusion," is the result of combining two loops that operate on the same set of elements. The gain lies in cutting the loop overhead from two loops to one. Here's a candidate for loop jamming:

Visual Basic Example of Separate Loops That Could Be Jammed

For i = 0 to employeeCount - 1    employeeName( i ) = "" Next ... For i = 0 to employeeCount - 1    employeeEarnings( i ) = 0 Next

When you jam loops, you find code in two loops that you can combine into one. Usually, that means the loop counters have to be the same. In this example, both loops run from 0 to employeeCount - 1, so you can jam them:

Visual Basic Example of a Jammed Loop

For i = 0 to employeeCount - 1    employeeName( i ) = ""    employeeEarnings( i ) = 0 Next

Here are the savings:

Language	Straight Time	Code-Tuned Time	Time Savings
C++	3.68	2.65	28%
PHP	3.97	2.42	32%
Visual Basic	3.75	3.56	4%
Note: Benchmarked for the case in which employeeCount equals 100.

As before, the results vary significantly among languages.

Loop jamming has two main hazards. First, the indexes for the two parts that have been jammed might change so that they're no longer compatible. Second, you might not be able to combine the loops easily. Before you combine the loops, make sure they'll still be in the right order with respect to the rest of the code.

Unrolling

The goal of loop unrolling is to reduce the amount of loop housekeeping. In Chapter 25, a loop was completely unrolled and 10 lines of code were shown to be faster than 3. In that case, the loop that went from 3 to 10 lines was unrolled so that all 10 array accesses were done individually.

Although completely unrolling a loop is a fast solution and works well when you're dealing with a small number of elements, it's not practical when you have a large number of elements or when you don't know in advance how many elements you'll have. Here's an example of a general loop:

Java Example of a Loop That Can Be Unrolled

 i = 0;       <-- 1 while ( i < count ) {    a[ i ] = i;    i = i + 1; }

(1)Normally, you'd probably use a for loop for a job like this, but to optimize, you'd have to convert to a while loop. For clarity, a while loop is shown here.

To unroll the loop partially, you handle two or more cases in each pass through the loop instead of one. This unrolling hurts readability but doesn't hurt the generality of the loop. Here's the loop unrolled once:

Java Example of a Loop That's Been Unrolled Once

 i = 0; while ( i < count - 1 ) {    a[ i ] = i;    a[ i + 1 ] = i + 1;    i = i + 2; } if ( i == count ) {       <-- 1    a[ count - 1 ] = count - 1;| }       <-- 1

(1)These lines pick up the case that might fall through the cracks if the loop went by twos instead of by ones.

The technique replaced the original a[ i ] = i line with two lines, and i is incremented by 2 rather than by 1. The extra code after the while loop is needed when count is odd and the loop has one iteration left after the loop terminates.

When five lines of straightforward code expand to nine lines of tricky code, the code becomes harder to read and maintain. Except for the gain in speed, its quality is poor. Part of any design discipline, however, is making necessary tradeoffs. So, even though a particular technique generally represents poor coding practice, specific circumstances might make it the best one to use.

Here are the results of unrolling the loop:

Language	Straight Time	Code-Tuned Time	Time Savings
C++	1.75	1.15	34%
Java	1.01	0.581	43%
PHP	5.33	4.49	16%
Python	2.51	3.21	-27%
Note: Benchmarked for the case in which count equals 100.

A gain of 16 to 43 percent is respectable, although again you have to watch out for hurting performance, as the Python benchmark shows. The main hazard of loop unrolling is an off-by-one error in the code after the loop that picks up the last case.

What if you unroll the loop even further, going for two or more unrollings? Do you get more benefit if you unroll a loop twice?

Java Example of a Loop That's Been Unrolled Twice

i = 0; while ( i < count - 2 ) {    a[ i ] = i;    a[ i + 1 ] = i+1;    a[ i + 2 ] = i+2;    i = i + 3; } if ( i <= count - 1 ) {    a[ count - 1 ] = count - 1; } if ( i == count - 2 ) {    a[ count -2 ] = count - 2; }

Here are the results of unrolling the loop the second time:

Language	Straight Time	Double Unrolled Time	Time Savings
C++	1.75	1.01	42%
Java	1.01	0.581	43%
PHP	5.33	3.70	31%
Python	2.51	2.79	-12%
Note: Benchmarked for the case in which count equals 100.

The results indicate that further loop unrolling can result in further time savings, but not necessarily so, as the Java measurement shows. The main concern is how Byzantine your code becomes. When you look at the previous code, you might not think it looks incredibly complicated, but when you realize that it started life a couple of pages ago as a five-line loop, you can appreciate the tradeoff between performance and readability.

Minimizing the Work Inside Loops

One key to writing effective loops is to minimize the work done inside a loop. If you can evaluate a statement or part of a statement outside a loop so that only the result is used inside the loop, do so. It's good programming practice, and in some cases it improves readability.

Suppose you have a complicated pointer expression inside a hot loop that looks like this:

C++ Example of a Complicated Pointer Expression Inside a Loop

for ( i = 0; i < rateCount; i++ ) {    netRate[ i ] = baseRate[ i ] * rates->discounts->factors->net; }

In this case, assigning the complicated pointer expression to a well-named variable improves readability and often improves performance.

C++ Example of Simplifying a Complicated Pointer Expression

quantityDiscount = rates->discounts->factors->net; for ( i = 0; i < rateCount; i++ ) {    netRate[ i ] = baseRate[ i ] * quantityDiscount; }

The extra variable, quantityDiscount, makes it clear that the baseRate array is being multiplied by a quantity-discount factor to compute the net rate. That wasn't at all clear from the original expression in the loop. Putting the complicated pointer expression into a variable outside the loop also saves the pointer from being dereferenced three times for each pass through the loop, resulting in the following savings:

Language	Straight Time	Code-Tuned Time	Time Savings
C++	3.69	2.97	19%
C#	2.27	1.97	13%
Java	4.13	2.35	43%
Note: Benchmarked for the case in which rateCount equals 100.

Except for the Java compiler, the savings aren't anything to crow about, implying that during initial coding you can use whichever technique is more readable without worrying about the speed of the code until later.

Sentinel Values

When you have a loop with a compound test, you can often save time by simplifying the test. If the loop is a search loop, one way to simplify the test is to use a sentinel value, a value that you put just past the end of the search range and that's guaranteed to terminate the search.

The classic example of a compound test that can be improved by use of a sentinel is the search loop that checks both whether it has found the value it's seeking and whether it has run out of values. Here's the code:

C# Example of Compound Tests in a Search Loop

 found = FALSE; i = 0; while ( ( !found ) && ( i < count ) ) {       <-- 1    if ( item[ i ] == testValue ) {       found = TRUE;    }    else {       i++;    } } if ( found ) {    ...

(1)Here's the compound test.

In this code, each iteration of the loop tests for !found and for i < count. The purpose of the !found test is to determine when the desired element has been found. The purpose of the i < count test is to avoid running past the end of the array. Inside the loop, each value of item[] is tested individually, so the loop really has three tests for each iteration.

In this kind of search loop, you can combine the three tests so that you test only once per iteration by putting a "sentinel" at the end of the search range to stop the loop. In this case, you can simply assign the value you're looking for to the element just beyond the end of the search range. (Remember to leave space for that element when you declare the array.) You then check each element, and if you don't find the element until you find the one you stuck at the end, you know that the value you're looking for isn't really there. Here's the code:

C# Example of Using a Sentinel Value to Speed Up a Loop

 // set sentinel value, preserving the original value initialValue = item[ count ]; item[ count ] = testValue;       <-- 1 i = 0; while ( item[ i ] != testValue ) {    i++; } // check if value was found if ( i < count ) {    ...

(1)Remember to allow space for the sentinel value at the end of the array.

When item is an array of integers, the savings can be dramatic:

Language	Straight Time	Code-Tuned Time	Time Savings	Performance Ratio
C#	0.771	0.590	23%	1.3:1
Java	1.63	0.912	44%	2:1
Visual Basic	1.34	0.470	65%	3:1
Note: Search is of a 100-element array of integers.

The Visual Basic results are particularly dramatic, but all the results are good. When the kind of array changes, however, the results also change. When item is an array of single-precision floating-point numbers, the results are as follows:

Language	Straight Time	Code-Tuned Time	Time Savings
C#	1.351	1.021	24%
Java	1.923	1.282	33%
Visual Basic	1.752	1.011	42%
Note: Search is of a 100-element array of 4-byte floating-point numbers.

As usual, the results vary significantly.

The sentinel technique can be applied to virtually any situation in which you use a linear search to linked lists as well as arrays. The only caveats are that you must choose the sentinel value carefully and that you must be careful about how you put the sentinel value into the data structure.

Putting the Busiest Loop on the Inside

When you have nested loops, think about which loop you want on the outside and which you want on the inside. Following is an example of a nested loop that can be improved:

Java Example of a Nested Loop That Can Be Improved

for ( column = 0; column < 100; column++ ) {    for ( row = 0; row < 5; row++ ) {       sum = sum + table[ row ][ column ];    } }

The key to improving the loop is that the outer loop executes much more often than the inner loop. Each time the loop executes, it has to initialize the loop index, increment it on each pass through the loop, and check it after each pass. The total number of loop executions is 100 for the outer loop and 100 * 5 = 500 for the inner loop, for a total of 600 iterations. By merely switching the inner and outer loops, you can change the total number of iterations to 5 for the outer loop and 5 * 100 = 500 for the inner loop, for a total of 505 iterations. Analytically, you'd expect to save about (600 505) / 600 = 16 percent by switching the loops. Here's the measured difference in performance:

Language	Straight Time	Code-Tuned Time	Time Savings
C++	4.75	3.19	33%
Java	5.39	3.56	34%
PHP	4.16	3.65	12%
Python	3.48	3.33	4%

The results vary significantly, which shows once again that you have to measure the effect in your particular environment before you can be sure your optimization will help.

Strength Reduction

Reducing strength means replacing an expensive operation such as multiplication with a cheaper operation such as addition. Sometimes you'll have an expression inside a loop that depends on multiplying the loop index by a factor. Addition is usually faster than multiplication, and if you can compute the same number by adding the amount on each iteration of the loop rather than by multiplying, the code will typically run faster. Here's an example of code that uses multiplication:

Visual Basic Example of Multiplying a Loop Index

For i = 0 to saleCount - 1    commission( i ) = (i + 1) * revenue * baseCommission * discount Next

This code is straightforward but expensive. You can rewrite the loop so that you accumulate multiples rather than computing them each time. This reduces the strength of the operations from multiplication to addition.

Visual Basic Example of Adding Rather Than Multiplying

incrementalCommission = revenue * baseCommission * discount cumulativeCommission = incrementalCommission For i = 0 to saleCount - 1    commission( i ) = cumulativeCommission    cumulativeCommission = cumulativeCommission + incrementalCommission Next

Multiplication is expensive, and this kind of change is like a manufacturer's coupon that gives you a discount on the cost of the loop. The original code incremented i each time and multiplied it by revenue * baseCommission * discount first by 1, then by 2, then by 3, and so on. The optimized code sets incrementalCommission equal to revenue * baseCommission * discount. It then adds incrementalCommission to cumulativeCommission on each pass through the loop. On the first pass, it's been added once; on the second pass, it's been added twice; on the third pass, it's been added three times; and so on. The effect is the same as multiplying incrementalCommission by 1, then by 2, then by 3, and so on, but it's cheaper.

The key is that the original multiplication has to depend on the loop index. In this case, the loop index was the only part of the expression that varied, so the expression could be recoded more economically. Here's how much the rewrite helped in some test cases:

Language	Straight Time	Code-Tuned Time	Time Savings
C++	4.33	3.80	12%
Visual Basic	3.54	1.80	49%
Note: Benchmark performed with saleCount equals 20. All computed variables are floating point.

< Free Open Study >