26.1. Logic | Code Complete: A Practical Handbook of Software Construction, Second Edition

< Free Open Study >

Much of programming consists of manipulating logic. This section describes how to manipulate logical expressions to your advantage.

Cross-Reference

For other details on using statement logic, see Chapters 14 19.

Stop Testing When You Know the Answer

Suppose you have a statement like

if ( 5 < x ) and ( x < 10 ) then ...

Once you've determined that x is greater than 5, you don't need to perform the second half of the test.

Some languages provide a form of expression evaluation known as "short-circuit evaluation," which means that the compiler generates code that automatically stops testing as soon as it knows the answer. Short-circuit evaluation is part of C++'s standard operators and Java's "conditional" operators.

Cross-Reference

For more on short-circuit evaluation, see "Knowing How Boolean Expressions Are Evaluated" in Section 19.1.

If your language doesn't support short-circuit evaluation natively, you have to avoid using and and or, adding logic instead. With short-circuit evaluation, the code above changes to this:

if ( 5 < x ) then    if ( x < 10 ) then ...

The principle of not testing after you know the answer is a good one for many other kinds of cases as well. A search loop is a common case. If you're scanning an array of input numbers for a negative value and you simply need to know whether a negative value is present, one approach is to check every value, setting a negativeFound variable when you find one. Here's how the search loop would look:

C++ Example of Not Stopping After You Know the Answer

negativeInputFound = false; for ( i = 0; i < count; i++ ) {    if ( input[ i ] < 0 ) {       negativeInputFound = true;    } }

A better approach would be to stop scanning as soon as you find a negative value. Any of these approaches would solve the problem:

Add a break statement after the negativeInputFound = true line.
If your language doesn't have break, emulate a break with a goto that goes to the first statement after the loop.
Change the for loop to a while loop, and check for negativeInputFound as well as for incrementing the loop counter past count.
Change the for loop to a while loop, put a sentinel value in the first array element after the last value entry, and simply check for a negative value in the while test. After the loop terminates, see whether the position of the first found value is in the array or one past the end. Sentinels are discussed in more detail later in the chapter.

Here are the results of using the break keyword in C++ and Java:

Language	Straight Time	Code-Tuned Time	Time Savings
C++	4.27	3.68	14%
Java	4.85	3.46	29%

Note

(1) Times in this and the following tables in this chapter are given in seconds and are meaningful only for comparisons across rows of each table. Actual times will vary according to the compiler, compiler options used, and the environment in which each test is run. (2) Benchmark results are typically made up of several thousand to many million executions of the code fragments to smooth out the sample-to-sample fluctuations in the results. (3) Specific brands and versions of compilers aren't indicated. Performance characteristics vary significantly from brand to brand and version to version. (4) Comparisons among results from different languages aren't always meaningful because compilers for different languages don't always offer comparable code-generation options. (5) The results shown for interpreted languages (PHP and Python) are typically based on less than 1% of the test runs used for the other languages. (6) Some of the "time savings" percentages might not be exactly reproducible from the data in these tables due to rounding of the "straight time" and "code-tuned time" entries.

The impact of this change varies a great deal depending on how many values you have and how often you expect to find a negative value. This test assumed an average of 100 values and assumed that a negative value would be found 50 percent of the time.

Order Tests by Frequency

Arrange tests so that the one that's fastest and most likely to be true is performed first. It should be easy to drop through the normal case, and if there are inefficiencies, they should be in processing the uncommon cases. This principle applies to case statements and to chains of if-then-elses.

Here's a Select-Case statement that responds to keyboard input in a word processor:

Visual Basic Example of a Poorly Ordered Logical Test

Select inputCharacter    Case "+", "="       ProcessMathSymbol( inputCharacter )    Case "0" To "9"       ProcessDigit( inputCharacter )    Case ",", ".", ":", ";", "!", "?"       ProcessPunctuation( inputCharacter )    Case " "       ProcessSpace( inputCharacter )    Case "A" To "Z", "a" To "z"       ProcessAlpha( inputCharacter )    Case Else       ProcessError( inputCharacter ) End Select

The cases in this case statement are ordered in something close to the ASCII sort order. In a case statement, however, the effect is often the same as if you had written a big set of ifthen-elses, so if you get an "a" as an input character, the program tests whether it's a math symbol, a punctuation mark, a digit, or a space before determining that it's an alphabetic character. If you know the likely frequency of your input characters, you can put the most common cases first. Here's the reordered case statement:

Visual Basic Example of a Well-Ordered Logical Test

Select inputCharacter    Case "A" To "Z", "a" To "z"       ProcessAlpha( inputCharacter )    Case " "       ProcessSpace( inputCharacter )    Case ",", ".", ":", ";", "!", "?"       ProcessPunctuation( inputCharacter )    Case "0" To "9"       ProcessDigit( inputCharacter )    Case "+", "="       ProcessMathSymbol( inputCharacter )    Case Else       ProcessError( inputCharacter ) End Select

Because the most common case is usually found sooner in the optimized code, the net effect will be the performance of fewer tests. Following are the results of this optimization with a typical mix of characters:

Language	Straight Time	Code-Tuned Time	Time Savings
C#	0.220	0.260	-18%
Java	2.56	2.56	0%
Visual Basic	0.280	0.260	7%
Note: Benchmarked with an input mix of 78 percent alphabetic characters, 17 percent spaces, and 5 percent punctuation symbols.

The Microsoft Visual Basic results are as expected, but the Java and C# results are not as expected. Apparently that's because of the way switch-case statements are structured in C++ and Java because each value must be enumerated individually rather than in ranges, the C++ and Java code doesn't benefit from the optimization as the Visual Basic code does. This result underscores the importance of not following any optimization advice blindly specific compiler implementations will significantly affect the results.

You might assume that the code generated by the Visual Basic compiler for a set of ifthen-elses that perform the same test as the case statement would be similar. Take a look at those results:

Language	Straight Time	Code-Tuned Time	Time Savings
C#	0.630	0.330	48%
Java	0.922	0.460	50%
Visual Basic	1.36	1.00	26%

The results are quite different. For the same number of tests, the Visual Basic compiler takes about five times as long in the unoptimized case, four times in the optimized case. This suggests that the compiler is generating different code for the case approach than for the if-then-else approach.

The improvement with if-then-elses is more consistent than it was with the case statements, but that's a mixed blessing. In C# and Visual Basic, both versions of the case statement approach are faster than both versions of the if-then-else approach, whereas in Java both versions are slower. This variation in results suggests a third possible optimization, described in the next section.

Compare Performance of Similar Logic Structures

The test described above could be performed using either a case statement or if-thenelses. Depending on the environment, either approach might work better. Here is the data from the preceding two tables reformatted to present the "code-tuned" times comparing if-then-else and case performance:

Language	case	if-then-else	Time Savings	Performance Ratio
C#	0.260	0.330	-27%	1:1
Java	2.56	0.460	82%	6:1
Visual Basic	0.260	1.00	258%	1:4

These results defy any logical explanation. In one of the languages, case is dramatically superior to if-then-else, and in another, if-then-else is dramatically superior to case. In the third language, the difference is relatively small. You might think that because C# and Java share similar syntax for case statements, their results would be similar, but in fact their results are opposite each other.

This example clearly illustrates the difficulty of performing any sort of "rule of thumb" or "logic" to code tuning there is simply no reliable substitute for measuring results.

Substitute Table Lookups for Complicated Expressions

In some circumstances, a table lookup might be quicker than traversing a complicated chain of logic. The point of a complicated chain is usually to categorize something and then to take an action based on its category. As an abstract example, suppose you want to assign a category number to something based on which of three groups Groups A, B, and C it falls into:

Cross-Reference

For details on using table lookups to replace complicated logic, see Chapter 18, "Table-Driven Methods."

This complicated logic chain assigns the category numbers:

C++ Example of a Complicated Chain of Logic

if ( ( a && !c ) || ( a && b && c ) ) {    category = 1; } else if ( ( b && !a ) || ( a && c && !b ) ) {    category = 2; } else if ( c && !a && !b ) {    category = 3; } else {    category = 0; }

You can replace this test with a more modifiable and higher-performance lookup table:

C++ Example of Using a Table Lookup to Replace Complicated Logic

 // define categoryTable static int categoryTable[ 2 ][ 2 ][ 2 ] = {       <-- 1    // !b!c !bc b!c bc        0, 3, 2, 2, // !a        1, 2, 1, 1 // a }; ... category = categoryTable[ a ][ b ][ c ];

(1)This table definition is somewhat difficult to understand. Any commenting you can do to make table definitions readable helps.

Although the definition of the table is hard to read, if it's well documented it won't be any harder to read than the code for the complicated chain of logic was. If the definition changes, the table will be much easier to maintain than the earlier logic would have been. Here are the performance results:

Language	Straight Time	Code-Tuned Time	Time Savings	Performance Ratio
C++	5.04	3.39	33%	1.5:1
Visual Basic	5.21	2.60	50%	2:1

Use Lazy Evaluation

One of my former roommates was a great procrastinator. He justified his laziness by saying that many of the things people feel rushed to do simply don't need to be done. If he waited long enough, he claimed, the things that weren't important would be procrastinated into oblivion and he wouldn't waste his time doing them.

Lazy evaluation is based on the principle my roommate used. If a program uses lazy evaluation, it avoids doing any work until the work is needed. Lazy evaluation is similar to just-in-time strategies that do the work closest to when it's needed.

Suppose, for example, that your program contains a table of 5000 values, generates the whole table at startup time, and then uses it as the program executes. If the program uses only a small percentage of the entries in the table, it might make more sense to compute them as they're needed rather than all at once. Once an entry is computed, it can still be stored for future reference (otherwise known as "cached").

< Free Open Study >