16.2. Controlling the Loop | Code Complete: A Practical Handbook of Software Construction, Second Edition

What can go wrong with a loop? Any answer would have to include incorrect or omitted loop initialization, omitted initialization of accumulators or other variables related to the loop, improper nesting, incorrect termination of the loop, forgetting to increment a loop variable or incrementing the variable incorrectly, and indexing an array element from a loop index incorrectly.

You can forestall these problems by observing two practices. First, minimize the number of factors that affect the loop. Simplify! Simplify! Simplify! Second, treat the inside of the loop as if it were a routine: keep as much of the control as possible outside the loop. Explicitly state the conditions under which the body of the loop is to be executed. Don't make the reader look inside the loop to understand the loop control. Think of a loop as a black box: the surrounding program knows the control conditions but not the contents.

C++ Example of Treating a Loop as a Black Box

 while ( !inputFile.EndOfFile() && moreDataAvailable ) {  }

Cross-Reference

If you use the while ( true )-break technique described earlier, the exit condition is inside the black box. Even if you use only one exit condition, you lose the benefit of treating the loop as a black box.

What are the conditions under which this loop terminates? Clearly, all you know is that either inputFile.EndOfFile() becomes true or moreDataAvailable becomes false.

Entering the Loop

Use these guidelines when entering a loop:

Enter the loop from one location only. A variety of loop-control structures allows you to test at the beginning, middle, or end of a loop. These structures are rich enough to allow you to enter the loop from the top every time. You don't need to enter at multiple locations.

Put initialization code directly before the loop. The Principle of Proximity advocates putting related statements together. If related statements are strewn across a routine, it's easy to overlook them during modification and to make the modifications incorrectly. If related statements are kept together, it's easier to avoid errors during modification.

Keep loop-initialization statements with the loop they're related to. If you don't, you're more likely to cause errors when you generalize the loop into a bigger loop and forget to modify the initialization code. The same kind of error can occur when you move or copy the loop code into a different routine without moving or copying its initialization code. Putting initializations away from the loop (in the data-declaration section or in a housekeeping section at the top of the routine that contains the loop) invites initialization troubles.
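As a minimal sketch of the proximity guideline, the routine below initializes its accumulator on the line directly before the loop that uses it. The routine name SumPositive and its parameter are hypothetical, chosen only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical routine: adds up the positive entries of a vector.
int SumPositive( const std::vector<int> &values ) {
   // ... unrelated work elsewhere in the routine would go here ...

   // The accumulator is initialized immediately before its loop, so
   // moving or copying the loop tends to carry the initialization along.
   int sum = 0;
   for ( std::size_t index = 0; index < values.size(); index++ ) {
      if ( values[ index ] > 0 ) {
         sum = sum + values[ index ];
      }
   }
   return sum;
}
```

If the loop were later wrapped in an outer loop, the initialization would sit right where the person making the change needs to see it.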

Cross-Reference

For more on limiting the scope of loop variables, see "Limit the scope of loop-index variables to the loop itself" later in this chapter.

Use while( true ) for infinite loops. You might have a loop that runs without terminating, for example, a loop in firmware such as a pacemaker or a microwave oven. Or you might have a loop that terminates only in response to an event (an "event loop"). You could code such an infinite loop in several ways. Faking an infinite loop with a statement like for i = 1 to 99999 is a poor choice because the specific loop limits muddy the intent of the loop; 99999 could be a legitimate value. Such a fake infinite loop can also break down under maintenance.

The while( true ) idiom is considered a standard way of writing an infinite loop in C++, Java, Visual Basic, and other languages that support comparable structures. Some programmers prefer to use for( ;; ), which is an accepted alternative.
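The following sketch shows one way an event loop might use the while( true ) idiom. The Event type, the RunEventLoop routine, and the use of a std::queue as the event source are all assumptions made for illustration; a real event loop would normally block while waiting for the next event.

```cpp
#include <queue>

// Hypothetical event type for this sketch.
enum class Event { Tick, Quit };

// Handles events until a Quit event arrives and returns the number of
// Tick events handled.
int RunEventLoop( std::queue<Event> &pending ) {
   int ticksHandled = 0;
   while ( true ) {
      // A real event loop would wait here for input. This sketch treats
      // an empty queue as a request to quit so that it always returns.
      Event next = Event::Quit;
      if ( !pending.empty() ) {
         next = pending.front();
         pending.pop();
      }
      if ( next == Event::Quit ) {
         break;
      }
      ticksHandled++;
   }
   return ticksHandled;
}
```

The loop header makes no claim about how many iterations will occur, which is the point: the only way out is the event itself.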

Prefer for loops when they're appropriate. The for loop packages loop-control code in one place, which makes for easily readable loops. One mistake programmers commonly make when modifying software is changing the loop-initialization code at the top of a while loop but forgetting to change related code at the bottom. In a for loop, all the relevant code is together at the top of the loop, which makes correct modifications easier. If you can use the for loop appropriately instead of another kind of loop, do it.

Don't use a for loop when a while loop is more appropriate. A common abuse of the flexible for loop structure in C++, C#, and Java is haphazardly cramming the contents of a while loop into a for loop header. The following example shows a while loop crammed into a for loop header:

C++ Example of a while Loop Abusively Crammed into a for Loop Header

// read all the records from a file
for ( inputFile.MoveToStart(), recordCount = 0; !inputFile.EndOfFile(); recordCount++ ) {
   inputFile.GetRecord();
}

The advantage of C++'s for loop over for loops in other languages is that it's more flexible about the kinds of initialization and termination information it can use. The weakness inherent in such flexibility is that you can put statements into the loop header that have nothing to do with controlling the loop.

Reserve the for loop header for loop-control statements: statements that initialize the loop, terminate it, or move it toward termination. In the example just shown, the inputFile.GetRecord() statement in the body of the loop moves the loop toward termination, but the recordCount statements don't; they're housekeeping statements that don't control the loop's progress. Putting the recordCount statements in the loop header and leaving the inputFile.GetRecord() statement out is misleading; it creates the false impression that recordCount controls the loop.

If you want to use the for loop rather than the while loop in this case, put the loop-control statements in the loop header and leave everything else out. Here's the right way to use the loop header:

C++ Example of Logical if Unconventional Use of a for Loop Header

recordCount = 0;
for ( inputFile.MoveToStart(); !inputFile.EndOfFile(); inputFile.GetRecord() ) {
   recordCount++;
}

The contents of the loop header in this example are all related to control of the loop. The inputFile.MoveToStart() statement initializes the loop, the !inputFile.EndOfFile() statement tests whether the loop has finished, and the inputFile.GetRecord() statement moves the loop toward termination. The statements that affect recordCount don't directly move the loop toward termination and are appropriately not included in the loop header. The while loop is probably still more appropriate for this job, but at least this code uses the loop header logically. For the record, here's how the code looks when it uses a while loop:

C++ Example of Appropriate Use of a while Loop

// read all the records from a file
inputFile.MoveToStart();
recordCount = 0;
while ( !inputFile.EndOfFile() ) {
   inputFile.GetRecord();
   recordCount++;
}

Processing the Middle of the Loop

The following subsections describe handling the middle of a loop:

Use { and } to enclose the statements in a loop. Use code brackets every time. They don't cost anything in speed or space at run time, they help readability, and they help prevent errors as the code is modified. They're a good defensive-programming practice.

Avoid empty loops. In C++ and Java, it's possible to create an empty loop, one in which the work the loop is doing is coded on the same line as the test that checks whether the work is finished. Here's an example:

C++ Example of an Empty Loop

while ( ( inputChar = dataFile.GetChar() ) != CharType_Eof ) {
   ;
}

In this example, the loop is empty because the while expression includes two things: the work of the loop (inputChar = dataFile.GetChar()) and a test for whether the loop should terminate (inputChar != CharType_Eof). The loop would be clearer if it were recoded so that the work it does is evident to the reader:

C++ Example of an Empty Loop Converted to an Occupied Loop

do {
   inputChar = dataFile.GetChar();
} while ( inputChar != CharType_Eof );

The new code takes up three full lines rather than one line and a semicolon, which is appropriate since it does the work of three lines rather than that of one line and a semicolon.

Keep loop-housekeeping chores at either the beginning or the end of the loop. Loop-housekeeping chores are expressions like i = i + 1 or j++, expressions whose main purpose isn't to do the work of the loop but to control the loop. The housekeeping is done at the end of the loop in this example:

C++ Example of Housekeeping Statements at the End of a Loop

nameCount = 0;
totalLength = 0;
while ( !inputFile.EndOfFile() ) {
   // do the work of the loop
   inputFile >> inputString;
   names[ nameCount ] = inputString;
   ...
   // prepare for next pass through the loop--housekeeping
   nameCount++;       // <-- 1
   totalLength = totalLength + inputString.length();       // <-- 1
}

(1) Here are the housekeeping statements.

As a general rule, the variables you initialize before the loop are the variables you'll manipulate in the housekeeping part of the loop.

Make each loop perform only one function. The mere fact that a loop can be used to do two things at once isn't sufficient justification for doing them together. Loops should be like routines in that each one should do only one thing and do it well. If it seems inefficient to use two loops where one would suffice, write the code as two loops, comment that they could be combined for efficiency, and then wait until benchmarks show that the section of the program poses a performance problem before changing the two loops into one.
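Here is a small sketch of the one-function-per-loop guideline. The Summary structure and the Summarize routine are hypothetical names, and the routine assumes that its input is not empty.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch.
struct Summary {
   int total;
   int largest;
};

// Assumes values contains at least one element.
Summary Summarize( const std::vector<int> &values ) {
   // first loop: compute the total
   int total = 0;
   for ( std::size_t index = 0; index < values.size(); index++ ) {
      total = total + values[ index ];
   }

   // second loop: find the largest value
   // (This loop could be combined with the one above for efficiency
   // if benchmarks ever show that this routine is a bottleneck.)
   int largest = values[ 0 ];
   for ( std::size_t index = 1; index < values.size(); index++ ) {
      if ( values[ index ] > largest ) {
         largest = values[ index ];
      }
   }

   return Summary{ total, largest };
}
```

Each loop can be read, tested, and changed without thinking about the other one.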

Cross-Reference

For more on optimization, see Chapter 25, "Code-Tuning Strategies," and Chapter 26, "Code-Tuning Techniques."

Exiting the Loop

These subsections describe handling the end of a loop:

Assure yourself that the loop ends. This is fundamental. Mentally simulate the execution of the loop until you are confident that, in all circumstances, it ends. Think through the nominal cases, the endpoints, and each of the exceptional cases.

Make loop-termination conditions obvious. If you use a for loop and don't fool around with the loop index and don't use a goto or break to get out of the loop, the termination condition will be obvious. Likewise, if you use a while or repeat-until loop and put all the control in the while or repeat-until clause, the termination condition will be obvious. The key is putting the control in one place.

Don't monkey with the loop index of a for loop to make the loop terminate. Some programmers jimmy the value of a for loop index to make the loop terminate early. Here's an example:

Java Example of Monkeying with a Loop Index

for ( int i = 0; i < 100; i++ ) {
   // some code
   ...
   if ( ... ) {
      i = 100;       // <-- 1
   }
   // more code
   ...
}

(1) Here's the monkeying.

The intent in this example is to terminate the loop under some condition by setting i to 100, a value that's larger than the end of the for loop's range of 0 through 99. Virtually all good programmers avoid this practice; it's the sign of an amateur. When you set up a for loop, the loop counter is off limits. Use a while loop to provide more control over the loop's exit conditions.
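One way to follow that advice is sketched below: the early-exit condition moves into the while clause, so nothing inside the body has to tamper with a counter. The routine CountLeadingNonNegatives and its flag are hypothetical and stand in for whatever condition the elided if test checks.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical routine: counts the values that appear before the
// first negative value.
int CountLeadingNonNegatives( const std::vector<int> &values ) {
   std::size_t index = 0;
   bool negativeFound = false;

   // Both exit conditions are stated in one place, in the while clause.
   while ( index < values.size() && !negativeFound ) {
      if ( values[ index ] < 0 ) {
         negativeFound = true;
      }
      else {
         index++;
      }
   }
   return static_cast<int>( index );
}
```

A reader can tell from the while clause alone that the loop stops either at the end of the data or at the first negative value.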

Avoid code that depends on the loop index's final value. It's bad form to use the value of the loop index after the loop. The terminal value of the loop index varies from language to language and implementation to implementation. The value is different when the loop terminates normally and when it terminates abnormally. Even if you happen to know what the final value is without stopping to think about it, the next person to read the code will probably have to think about it. It's better form and more self-documenting if you assign the final value to a variable at the appropriate point inside the loop.

This code misuses the index's final value:

C++ Example of Code That Misuses a Loop Index's Terminal Value

for ( recordCount = 0; recordCount < MAX_RECORDS; recordCount++ ) {
   if ( entry[ recordCount ] == testValue ) {
      break;
   }
}
// lots of code
...
if ( recordCount < MAX_RECORDS ) {       // <-- 1
   return( true );
}
else {
   return( false );
}

(1) Here's the misuse of the loop index's terminal value.

In this fragment, the second test for recordCount < MAX_RECORDS makes it appear that the loop is supposed to loop through all the values in entry[] and return true if it finds the one equal to testValue and false otherwise. It's hard to remember whether the index gets incremented past the end of the loop, so it's easy to make an off-by-one error. You're better off writing code that doesn't depend on the index's final value. Here's how to rewrite the code:

C++ Example of Code That Doesn't Misuse a Loop Index's Terminal Value

found = false;
for ( recordCount = 0; recordCount < MAX_RECORDS; recordCount++ ) {
   if ( entry[ recordCount ] == testValue ) {
      found = true;
      break;
   }
}
// lots of code
...
return( found );

This second code fragment uses an extra variable and keeps references to recordCount more localized. As is often the case when an extra boolean variable is used, the resulting code is clearer.

Consider using safety counters. A safety counter is a variable you increment each pass through a loop to determine whether a loop has been executed too many times. If you have a program in which an error would be catastrophic, you can use safety counters to ensure that all loops end. This C++ loop could profitably use a safety counter:

C++ Example of a Loop That Could Use a Safety Counter

do {
   node = node->Next;
   ...
} while ( node->Next != NULL );

Here's the same code with the safety counters added:

C++ Example of Using a Safety Counter

safetyCounter = 0;
do {
   node = node->Next;
   ...
   safetyCounter++;       // <-- 1
   if ( safetyCounter >= SAFETY_LIMIT ) {
      Assert( false, "Internal Error: Safety-Counter Violation." );       // <-- 1
   }
   ...
} while ( node->Next != NULL );

(1) Here's the safety-counter code.

Safety counters are not a cure-all. Introduced into the code one at a time, safety counters increase complexity and can lead to additional errors. Because they aren't used in every loop, you might forget to maintain safety-counter code when you modify loops in parts of the program that do use them. If safety counters are instituted as a projectwide standard for critical loops, however, you learn to expect them and the safety-counter code is no more prone to produce errors later than any other code is.

Exiting Loops Early

Many languages provide a means of causing a loop to terminate in some way other than completing the for or while condition. In this discussion, break is a generic term for break in C++, C, and Java; for Exit-Do and Exit-For in Visual Basic; and for similar constructs, including those simulated with gotos in languages that don't support break directly. The break statement (or equivalent) causes a loop to terminate through the normal exit channel; the program resumes execution at the first statement following the loop.

The continue statement is similar to break in that it's an auxiliary loop-control statement. Rather than causing a loop exit, however, continue causes the program to skip the rest of the loop body and continue executing at the beginning of the next iteration of the loop. A continue statement is shorthand for an if-then clause that would prevent the rest of the loop from being executed.

Consider using break statements rather than boolean flags in a while loop. In some cases, adding boolean flags to a while loop to emulate exits from the body of the loop makes the loop hard to read. Sometimes you can remove several levels of indentation inside a loop and simplify loop control just by using a break instead of a series of if tests.

Putting multiple break conditions into separate statements and placing them near the code that produces the break can reduce nesting and make the loop more readable.
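The sketch below contrasts the two styles on the same task. The routines, the zero sentinel, and the rule that a negative value is invalid are all assumptions invented for this illustration; both routines are intended to return the same result.

```cpp
#include <cstddef>
#include <vector>

// Flag-based version: the exits are emulated with a boolean flag,
// which pushes the real work two levels deeper.
int SumUntilStopWithFlag( const std::vector<int> &values ) {
   int sum = 0;
   bool done = false;
   std::size_t index = 0;
   while ( index < values.size() && !done ) {
      if ( values[ index ] == 0 ) {          // hypothetical sentinel
         done = true;
      }
      else {
         if ( values[ index ] < 0 ) {        // hypothetical invalid value
            done = true;
         }
         else {
            sum = sum + values[ index ];
            index++;
         }
      }
   }
   return sum;
}

// break-based version: each exit sits next to the test that causes it.
int SumUntilStopWithBreak( const std::vector<int> &values ) {
   int sum = 0;
   std::size_t index = 0;
   while ( index < values.size() ) {
      if ( values[ index ] == 0 ) {          // hypothetical sentinel
         break;
      }
      if ( values[ index ] < 0 ) {           // hypothetical invalid value
         break;
      }
      sum = sum + values[ index ];
      index++;
   }
   return sum;
}
```

The break version trades the black-box property for flatter code, so whether it is the better choice depends on how much nesting the flags would otherwise add.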

Be wary of a loop with a lot of breaks scattered through it. A loop's containing a lot of breaks can indicate unclear thinking about the structure of the loop or its role in the surrounding code. A proliferation of breaks raises the possibility that the loop could be more clearly expressed as a series of loops rather than as one loop with many exits.

According to an article in Software Engineering Notes, the software error that brought down the New York City phone systems for 9 hours on January 15, 1990, was due to an extra break statement (SEN 1990):

C++ Example of Erroneous Use of a break Statement Within a do-switch-if Block

do {
   ...
   switch
      ...
      if () {
         ...
         break;       // <-- 1
         ...
      }
      ...
} while ( ... );

(1) This break was intended for the if but broke out of the switch instead.

Multiple breaks don't necessarily indicate an error, but their existence in a loop is a warning sign, a canary in a coal mine that's gasping for air instead of singing as loud as it should be.

Use continue for tests at the top of a loop. A good use of continue is for moving execution past the body of the loop after testing a condition at the top. For example, if the loop reads records, discards records of one kind, and processes records of another kind, you could put a test like this one at the top of the loop:

Pseudocode Example of a Relatively Safe Use of continue

while ( not eof( file ) ) do
   read( record, file )
   if ( record.Type <> targetType ) then
      continue
   -- process record of targetType
   ...
end while

Using continue in this way lets you avoid an if test that would effectively indent the entire body of the loop. If, on the other hand, the continue occurs toward the middle or end of the loop, use an if instead.
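A C++ rendering of the same idea might look like the sketch below. The Record structure, its fields, and the SumAmountsOfType routine are hypothetical; an in-memory vector stands in for the file of records.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record layout for this sketch.
struct Record {
   int type;
   int amount;
};

// Adds up the amounts of the records whose type matches targetType.
int SumAmountsOfType( const std::vector<Record> &records, int targetType ) {
   int total = 0;
   for ( std::size_t recordIdx = 0; recordIdx < records.size(); recordIdx++ ) {
      // test at the top of the loop: discard records of other types
      if ( records[ recordIdx ].type != targetType ) {
         continue;
      }

      // process record of targetType
      total = total + records[ recordIdx ].amount;
   }
   return total;
}
```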

Use the labeled break structure if your language supports it. Java supports use of labeled breaks to prevent the kind of problem experienced with the New York City telephone outage. A labeled break can be used to exit a for loop, an if statement, or any block of code enclosed in braces (Arnold, Gosling, and Holmes 2000).

Here's a possible solution to the New York City telephone code problem, with the programming language changed from C++ to Java to show the labeled break:

Java Example of a Better Use of a Labeled break Statement Within a do-switch-if Block

do {
   ...
   switch
      ...
      CALL_CENTER_DOWN:
      if () {
         ...
         break CALL_CENTER_DOWN;       // <-- 1
         ...
      }
      ...
} while ( ... );

(1) The target of the labeled break is unambiguous.

Use break and continue only with caution. Use of break eliminates the possibility of treating a loop as a black box. Limiting yourself to only one statement to control a loop's exit condition is a powerful way to simplify your loops. Using a break forces the person reading your code to look inside the loop for an understanding of the loop control. That makes the loop more difficult to understand.

Use break only after you have considered the alternatives. You don't know with certainty whether continue and break are virtuous or evil constructs. Some computer scientists argue that they are a legitimate technique in structured programming; some argue that they aren't. Because you don't know in general whether continue and break are right or wrong, use them, but only with a fear that you might be wrong. It really is a simple proposition: if you can't defend a break or a continue, don't use it.

Checking Endpoints

A single loop usually has three cases of interest: the first case, an arbitrarily selected middle case, and the last case. When you create a loop, mentally run through the first, middle, and last cases to make sure that the loop doesn't have any off-by-one errors. If you have any special cases that are different from the first or last case, check those too. If the loop contains complex computations, get out your calculator and manually check the calculations.

Willingness to perform this kind of check is a key difference between efficient and inefficient programmers. Efficient programmers do the work of mental simulations and hand calculations because they know that such measures help them find errors.

Inefficient programmers tend to experiment randomly until they find a combination that seems to work. If a loop isn't working the way it's supposed to, the inefficient programmer changes the < sign to a <= sign. If that fails, the inefficient programmer changes the loop index by adding or subtracting 1. Eventually the programmer using this approach might stumble onto the right combination or simply replace the original error with a more subtle one. Even if this random process results in a correct program, it doesn't result in the programmer's knowing why the program is correct.

You can expect several benefits from mental simulations and hand calculations. The mental discipline results in fewer errors during initial coding, in more rapid detection of errors during debugging, and in a better overall understanding of the program. The mental exercise means that you understand how your code works rather than guessing about it.
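As a small worked illustration of checking the first, middle, and last cases, consider the sketch below. The SumRange routine is hypothetical, and it assumes that first <= last and that last is a valid position in values.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical routine: sums the elements in positions first through
// last, inclusive. Assumes first <= last < values.size().
int SumRange( const std::vector<int> &values, std::size_t first, std::size_t last ) {
   int sum = 0;
   // First case:  index == first, so values[ first ] is included.
   // Middle case: index advances by one each pass, so nothing is skipped.
   // Last case:   the test is <=, so values[ last ] is included and the
   //              loop stops before touching values[ last + 1 ].
   for ( std::size_t index = first; index <= last; index++ ) {
      sum = sum + values[ index ];
   }
   return sum;
}
```

Walking through a four-element vector { 10, 20, 30, 40 } by hand gives 10 for the range 0 through 0, 50 for 1 through 2, and 100 for 0 through 3, which is the kind of hand calculation the text recommends before trusting the loop.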

Using Loop Variables

Here are some guidelines for using loop variables:

Use ordinal or enumerated types for limits on both arrays and loops. Generally, loop counters should be integer values. Floating-point values don't increment well. For example, you could add 1.0 to 26,742,897.0 and get 26,742,897.0 instead of 26,742,898.0. If this incremented value were a loop counter, you'd have an infinite loop.
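The effect can be demonstrated with a short sketch. It assumes that float is an IEEE 754 single-precision type, which has a 24-bit significand, so it uses 16,777,216 (2 to the 24th power) as the test value rather than the figure quoted above. The routine name IncrementIsLost is hypothetical.

```cpp
// Returns true if adding 1.0f to value produces the same value again.
// Assumes float is IEEE 754 single precision.
bool IncrementIsLost( float value ) {
   // Storing the sum in a float rounds it to the nearest representable
   // value, which can be the value we started with.
   float incremented = value + 1.0f;
   return incremented == value;
}
```

Under that assumption, IncrementIsLost( 16777216.0f ) is true because 16,777,217 cannot be represented and the sum rounds back down, whereas IncrementIsLost( 1000.0f ) is false. A loop that counted upward with such a variable would stop making progress once it reached the larger value.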

Cross-Reference

For details on naming loop variables, see "Naming Loop Indexes" in Section 11.2.

Use meaningful variable names to make nested loops readable. Arrays are often indexed with the same variables that are used for loop indexes. If you have a one-dimensional array, you might be able to get away with using i, j, or k to index it. But if you have an array with two or more dimensions, you should use meaningful index names to clarify what you're doing. Meaningful array-index names clarify both the purpose of the loop and the part of the array you intend to access.

Here's code that doesn't put this principle to work; it uses the meaningless names i, j, and k instead:

Java Example of Bad Loop Variable Names

for ( int i = 0; i < numPayCodes; i++ ) {
   for ( int j = 0; j < 12; j++ ) {
      for ( int k = 0; k < numDivisions; k++ ) {
         sum = sum + transaction[ j ][ i ][ k ];
      }
   }
}

What do you think the array indexes in transaction mean? Do i, j, and k tell you anything about the contents of transaction? If you had the declaration of transaction, could you easily determine whether the indexes were in the right order? Here's the same loop with more readable loop variable names:

Java Example of Good Loop Variable Names

for ( int payCodeIdx = 0; payCodeIdx < numPayCodes; payCodeIdx++ ) {
   for ( int month = 0; month < 12; month++ ) {
      for ( int divisionIdx = 0; divisionIdx < numDivisions; divisionIdx++ ) {
         sum = sum + transaction[ month ][ payCodeIdx ][ divisionIdx ];
      }
   }
}

What do you think the array indexes in transaction mean this time? In this case, the answer is easier to come by because the variable names payCodeIdx, month, and divisionIdx tell you a lot more than i, j, and k did. The computer can read the two versions of the loop equally easily. People can read the second version more easily than the first, however, and the second version is better since your primary audience is made up of humans, not computers.

Use meaningful names to avoid loop-index cross-talk. Habitual use of i, j, and k can give rise to index cross-talk: using the same index name for two different purposes. Take a look at this example:

C++ Example of Index Cross-Talk

for ( i = 0; i < numPayCodes; i++ ) {       // <-- 1
   // lots of code
   ...
   for ( j = 0; j < 12; j++ ) {
      // lots of code
      ...
      for ( i = 0; i < numDivisions; i++ ) {       // <-- 2
         sum = sum + transaction[ j ][ i ][ k ];
      }
   }
}

(1) i is used first here….
(2) …and again here.

The use of i is so habitual that it's used twice in the same nesting structure. The second for loop controlled by i conflicts with the first, and that's index cross-talk. Using more meaningful names than i, j, and k would have prevented the problem. In general, if the body of a loop has more than a couple of lines, if it might grow, or if it's in a group of nested loops, avoid i, j, and k.

Limit the scope of loop-index variables to the loop itself. Loop-index cross-talk and other uses of loop indexes outside their loops are such a significant problem that the designers of Ada decided to make for loop indexes invalid outside their loops; trying to use one outside its for loop generates an error at compile time.

C++ and Java implement the same idea to some extent: they allow loop indexes to be declared within a loop, but they don't require it. In the earlier example that searched entry[] for testValue, the recordCount variable could be declared inside the for statement, which would limit its scope to the for loop, like this:

C++ Example of Declaring a Loop-Index Variable Within a for loop

for ( int recordCount = 0; recordCount < MAX_RECORDS; recordCount++ ) {
   // looping code that uses recordCount
}

In principle, this technique should allow creation of code that redeclares recordCount in multiple loops without any risk of misusing the two different recordCounts. That usage would give rise to code that looks like this:

C++ Example of Declaring Loop-Indexes Within for loops and Reusing Them Safely (Maybe!)

for ( int recordCount = 0; recordCount < MAX_RECORDS; recordCount++ ) {
   // looping code that uses recordCount
}
// intervening code
for ( int recordCount = 0; recordCount < MAX_RECORDS; recordCount++ ) {
   // additional looping code that uses a different recordCount
}

This technique is helpful for documenting the purpose of the recordCount variable; however, don't rely on your compiler to enforce recordCount's scope. Section 6.3.3.1 of The C++ Programming Language (Stroustrup 1997) says that recordCount should have a scope limited to its loop. When I checked this functionality with three different C++ compilers, however, I got three different results:

The first compiler flagged recordCount in the second for loop for multiple variable declarations and generated an error.
The second compiler accepted recordCount in the second for loop but allowed it to be used outside the first for loop.
The third compiler allowed both usages of recordCount and did not allow either one to be used outside the for loop in which it was declared.

As is often the case with more esoteric language features, compiler implementations can vary.

How Long Should a Loop Be?

Loop length can be measured in lines of code or depth of nesting. Here are some guidelines:

Make your loops short enough to view all at once. If you usually look at loops on your monitor and your monitor displays 50 lines, that puts a 50-line restriction on you. Experts have suggested a loop-length limit of one page. When you begin to appreciate the principle of writing simple code, however, you'll rarely write loops longer than 15 or 20 lines.

Limit nesting to three levels. Studies have shown that the ability of programmers to comprehend a loop deteriorates significantly beyond three levels of nesting (Yourdon 1986a). If you're going beyond that number of levels, make the loop shorter (conceptually) by breaking part of it into a routine or simplifying the control structure.

Cross-Reference

For details on simplifying nesting, see Section 19.4, "Taming Dangerously Deep Nesting."

Move loop innards of long loops into routines. If the loop is well designed, the code on the inside of a loop can often be moved into one or more routines that are called from within the loop.
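A minimal sketch of this refactoring follows. The routine names SumOfRow and SumOfTable are hypothetical; the inner loop that would otherwise be nested inside the outer one has been moved into its own routine.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical routine holding what used to be the innards of the loop.
int SumOfRow( const std::vector<int> &row ) {
   int rowTotal = 0;
   for ( std::size_t columnIdx = 0; columnIdx < row.size(); columnIdx++ ) {
      rowTotal = rowTotal + row[ columnIdx ];
   }
   return rowTotal;
}

// The outer loop is now short enough to take in at a glance.
int SumOfTable( const std::vector< std::vector<int> > &table ) {
   int tableTotal = 0;
   for ( std::size_t rowIdx = 0; rowIdx < table.size(); rowIdx++ ) {
      tableTotal = tableTotal + SumOfRow( table[ rowIdx ] );
   }
   return tableTotal;
}
```

The same move also removes one level of nesting from the calling routine.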

Make long loops especially clear. Length adds complexity. If you write a short loop, you can use riskier control structures such as break and continue, multiple exits, complicated termination conditions, and so on. If you write a longer loop and feel any concern for your reader, you'll give the loop a single exit and make the exit condition unmistakably clear.
