10.4. Scope | Code Complete: A Practical Handbook of Software Construction, Second Edition


"Scope" is a way of thinking about a variable's celebrity status: how famous is it? Scope, or visibility, refers to the extent to which your variables are known and can be referenced throughout a program. A variable with limited or small scope is known in only a small area of a program: a loop index used in only one small loop, for instance. A variable with large scope is known in many places in a program: a table of employee information that's used throughout a program, for instance.

Different languages handle scope in different ways. In some primitive languages, all variables are global. You therefore don't have any control over the scope of a variable, and that can create a lot of problems. In C++ and similar languages, a variable can be visible to a block (a section of code enclosed in curly brackets), a routine, a class (and possibly its derived classes), or the whole program. In Java and C#, a variable can also be visible to a package or namespace (a collection of classes).

The following sections provide guidelines that apply to scope.

Localize References to Variables

The code between references to a variable is a "window of vulnerability." In the window, new code might be added, inadvertently altering the variable, or someone reading the code might forget the value the variable is supposed to contain. It's always a good idea to localize references to variables by keeping them close together.

The idea of localizing references to a variable is pretty self-evident, but it's an idea that lends itself to formal measurement. One method of measuring how close together the references to a variable are is to compute the "span" of a variable. Here's an example:

Java Example of Variable Span

a = 0;
b = 0;
c = 0;
a = b + c;

In this case, two lines come between the first reference to a and the second, so a has a span of two. One line comes between the two references to b, so b has a span of one, and c has a span of zero. Here's another example:

Java Example of Spans of One and Zero

a = 0;
b = 0;
c = 0;
b = a + 1;
b = b / c;

In this case, there is one line between the first reference to b and the second, for a span of one. There are no lines between the second reference to b and the third, for a span of zero.
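The counting rule used in these two examples can be sketched in code. The following Java fragment is only an illustration under stated assumptions: the class SpanCalculator and its methods are hypothetical names, not part of the original text, and the sketch assumes you already have the line numbers of each reference to a variable in ascending order.

```java
import java.util.Arrays;

// Hypothetical helper (not from the book): computes spans from the line
// numbers on which a variable is referenced.
class SpanCalculator {

    // Returns the span between each pair of successive references, that is,
    // the number of lines that lie strictly between them.
    static int[] spans(int[] referenceLines) {
        if (referenceLines.length < 2) {
            return new int[0];
        }
        int[] result = new int[referenceLines.length - 1];
        for (int i = 1; i < referenceLines.length; i++) {
            result[i - 1] = referenceLines[i] - referenceLines[i - 1] - 1;
        }
        return result;
    }

    // Returns the average span, or 0.0 for a variable referenced fewer
    // than two times.
    static double averageSpan(int[] referenceLines) {
        return Arrays.stream(spans(referenceLines)).average().orElse(0.0);
    }

    public static void main(String[] args) {
        // In the second example, b is referenced on lines 2, 4, and 5.
        int[] linesForB = {2, 4, 5};
        System.out.println(Arrays.toString(spans(linesForB)));  // [1, 0]
        System.out.println(averageSpan(linesForB));              // 0.5
    }
}
```

Averaging the individual spans gives a single number per variable, which is the "average span" used in the discussion of live time that follows.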

Keep Variables "Live" for as Short a Time as Possible

A concept that's related to variable span is variable "live time," the total number of statements over which a variable is live. A variable's life begins at the first statement in which it's referenced; its life ends at the last statement in which it's referenced.

Unlike span, live time isn't affected by how many times the variable is used between the first and last times it's referenced. If the variable is first referenced on line 1 and last referenced on line 25, it has a live time of 25 statements. If those are the only two lines in which it's used, it has an average span of 23 statements. If the variable were used on every line from line 1 through line 25, it would have an average span of 0 statements, but it would still have a live time of 25 statements. Figure 10-1 illustrates both span and live time.

Figure 10-1. "Long live time" means that a variable is live over the course of many statements. "Short live time" means it's live for only a few statements. "Span" refers to how close together the references to a variable are.

As with span, the goal with respect to live time is to keep the number low, to keep a variable live for as short a time as possible. And as with span, the basic advantage of maintaining a low number is that it reduces the window of vulnerability. You reduce the chance of incorrectly or inadvertently altering a variable between the places in which you intend to alter it.

A second advantage of keeping the live time short is that it gives you an accurate picture of your code. If a variable is assigned a value in line 10 and not used again until line 45, the very space between the two references implies that the variable is used between lines 10 and 45. If the variable is assigned a value in line 44 and used in line 45, no other uses of the variable are implied, and you can concentrate on a smaller section of code when you're thinking about that variable.

A short live time also reduces the chance of initialization errors. As you modify a program, straight-line code tends to turn into loops and you tend to forget initializations that were made far away from the loop. By keeping the initialization code and the loop code closer together, you reduce the chance that modifications will introduce initialization errors.

A short live time makes your code more readable. The fewer lines of code a reader has to keep in mind at once, the easier your code is to understand. Likewise, the shorter the live time, the less code you have to keep on your screen when you want to see all the references to a variable during editing and debugging.

Finally, short live times are useful when splitting a large routine into smaller routines. If references to variables are kept close together, it's easier to refactor related sections of code into routines of their own.

Measuring the Live Time of a Variable

You can formalize the concept of live time by counting the number of lines between the first and last references to a variable (including both the first and last lines). Here's an example with live times that are too long:

Java Example of Variables with Excessively Long Live Times

 1   // initialize all variables
 2   recordIndex = 0;
 3   total = 0;
 4   done = false;
     ...
26   while ( recordIndex < recordCount ) {
27   ...
28      recordIndex = recordIndex + 1;       <-- 1
        ...
64   while ( !done ) {
        ...
69      if ( total > projectedTotal ) {       <-- 2
70         done = true;       <-- 3

(1)Last reference to recordIndex.
(2)Last reference to total.
(3)Last reference to done.

Here are the live times for the variables in this example:

recordIndex	( line 28 - line 2 + 1 ) = 27
total	( line 69 - line 3 + 1 ) = 67
done	( line 70 - line 4 + 1 ) = 67
Average Live Time	( 27 + 67 + 67 ) / 3 ≈ 54

The example has been rewritten below so that the variable references are closer together:

Java Example of Variables with Good, Short Live Times

     ...
25   recordIndex = 0;       <-- 1
26   while ( recordIndex < recordCount ) {
27   ...
28      recordIndex = recordIndex + 1;
        ...
62   total = 0;       <-- 2
63   done = false;       <-- 2
64   while ( !done ) {
        ...
69      if ( total > projectedTotal ) {
70         done = true;

(1)Initialization of recordIndex is moved down from line 2.
(2)Initializations of total and done are moved down from lines 3 and 4.

Here are the live times for the variables in this example:

recordIndex	( line 28 - line 25 + 1 ) = 4
total	( line 69 - line 62 + 1 ) = 8
done	( line 70 - line 63 + 1 ) = 8
Average Live Time	( 4 + 8 + 8 ) / 3 ≈ 7

Intuitively, the second example seems better than the first because the initializations for the variables are performed closer to where the variables are used. The measured difference in average live time between the two examples is significant: An average of 54 vs. an average of 7 provides good quantitative support for the intuitive preference for the second piece of code.
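The arithmetic behind these two averages can be checked with a short sketch. The Java fragment below is an illustration only: the class LiveTimeCalculator and its methods are hypothetical names, and the line numbers are the ones quoted in the two live-time tables above.

```java
// Hypothetical helper (not from the book): computes live time from the line
// numbers of the first and last references to a variable.
class LiveTimeCalculator {

    // Live time counts both the first and the last line, hence the "+ 1".
    static int liveTime(int firstReferenceLine, int lastReferenceLine) {
        return lastReferenceLine - firstReferenceLine + 1;
    }

    // Returns the arithmetic mean of the given live times (0.0 if none).
    static double averageLiveTime(int... liveTimes) {
        if (liveTimes.length == 0) {
            return 0.0;
        }
        int sum = 0;
        for (int liveTime : liveTimes) {
            sum += liveTime;
        }
        return (double) sum / liveTimes.length;
    }

    public static void main(String[] args) {
        // First example: initializations at the top of the routine.
        double before = averageLiveTime(
            liveTime(2, 28),    // recordIndex
            liveTime(3, 69),    // total
            liveTime(4, 70));   // done
        // Second example: initializations moved next to the loops.
        double after = averageLiveTime(
            liveTime(25, 28),   // recordIndex
            liveTime(62, 69),   // total
            liveTime(63, 70));  // done
        System.out.println("before: " + Math.round(before)
            + ", after: " + Math.round(after));  // before: 54, after: 7
    }
}
```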

General Guidelines for Minimizing Scope

Here are some specific guidelines you can use to minimize scope:

Initialize variables used in a loop immediately before the loop rather than back at the beginning of the routine containing the loop. Doing this improves the chance that when you modify the loop, you'll remember to make corresponding modifications to the loop initialization. Later, when you modify the program and put another loop around the initial loop, the initialization will work on each pass through the new loop rather than on only the first pass.

Cross-Reference

For details on initializing variables close to where they're used, see Section 10.3, "Guidelines for Initializing Variables," earlier in this chapter.
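The effect of putting a second loop around an existing loop can be made concrete with a small sketch. The Java fragment below is a hypothetical example, not taken from the original text: the class LoopInitialization, its two methods, and the row-total scenario are all assumptions chosen for illustration.

```java
import java.util.Arrays;

// Hypothetical example (not from the book): summing each row of a table.
class LoopInitialization {

    // Initialization far from the loop: rowTotal is set once at the top of
    // the routine, so once the outer loop is added it is never reset and
    // each row's total wrongly includes the rows before it.
    static int[] rowTotalsInitializedAtTop(int[][] table) {
        int rowTotal = 0;
        int[] totals = new int[table.length];
        for (int row = 0; row < table.length; row++) {
            for (int value : table[row]) {
                rowTotal += value;
            }
            totals[row] = rowTotal;
        }
        return totals;
    }

    // Initialization immediately before the loop: rowTotal is reset on
    // every pass through the outer loop.
    static int[] rowTotalsInitializedAtLoop(int[][] table) {
        int[] totals = new int[table.length];
        for (int row = 0; row < table.length; row++) {
            int rowTotal = 0;
            for (int value : table[row]) {
                rowTotal += value;
            }
            totals[row] = rowTotal;
        }
        return totals;
    }

    public static void main(String[] args) {
        int[][] table = { {1, 2}, {3, 4} };
        System.out.println(Arrays.toString(rowTotalsInitializedAtTop(table)));   // [3, 10]
        System.out.println(Arrays.toString(rowTotalsInitializedAtLoop(table)));  // [3, 7]
    }
}
```

In the second routine the initialization also sits inside the narrowest block that needs the variable, so its scope and its live time shrink together.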

Don't assign a value to a variable until just before the value is used. You might have experienced the frustration of trying to figure out where a variable was assigned its value. The more you can do to clarify where a variable receives its value, the better. Languages like C++ and Java support variable initializations like these:

Cross-Reference

For more on this style of variable declaration and definition, see "Ideally, declare and define each variable close to where it's first used" in Section 10.3.

C++ Example of Good Variable Declarations and Initializations

int receiptIndex = 0;
float dailyReceipts = TodaysReceipts();
double totalReceipts = TotalReceipts( dailyReceipts );

Group related statements. The following examples show a routine for summarizing daily receipts and illustrate how to put references to variables together so that they're easier to locate. The first example violates this principle:

Cross-Reference

For more details on keeping related statements together, see Section 14.2, "Statements Whose Order Doesn't Matter."

C++ Example of Using Two Sets of Variables in a Confusing Way

void SummarizeData(...) {
   ...
   GetOldData( oldData, &numOldData );       <-- 1
   GetNewData( newData, &numNewData );         |
   totalOldData = Sum( oldData, numOldData );  |
   totalNewData = Sum( newData, numNewData );  |
   PrintOldDataSummary( oldData, totalOldData, numOldData );
   PrintNewDataSummary( newData, totalNewData, numNewData );
   SaveOldDataSummary( totalOldData, numOldData );
   SaveNewDataSummary( totalNewData, numNewData );       <-- 1
   ...
}

(1)Statements using two sets of variables.

Note that, in this example, you have to keep track of oldData, newData, numOldData, numNewData, totalOldData, and totalNewData all at once: six variables for just this short fragment. The next example shows how to reduce that number to only three elements within each block of code:

C++ Example of Using Two Sets of Variables More Understandably

void SummarizeData( ... ) {
   GetOldData( oldData, &numOldData );       <-- 1
   totalOldData = Sum( oldData, numOldData );  |
   PrintOldDataSummary( oldData, totalOldData, numOldData );
   SaveOldDataSummary( totalOldData, numOldData );       <-- 1
   ...
   GetNewData( newData, &numNewData );       <-- 2
   totalNewData = Sum( newData, numNewData );  |
   PrintNewDataSummary( newData, totalNewData, numNewData );
   SaveNewDataSummary( totalNewData, numNewData );       <-- 2
   ...
}

(1)Statements using oldData.
(2)Statements using newData.

When the code is broken up, the two blocks are each shorter than the original block and individually contain fewer variables. They're easier to understand, and if you need to break this code out into separate routines, the shorter blocks with fewer variables will promote better-defined routines.

Break groups of related statements into separate routines. All other things being equal, a variable in a shorter routine will tend to have smaller span and live time than a variable in a longer routine. By breaking related statements into separate, smaller routines, you reduce the scope that the variable can have.
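One way this might look in practice is sketched below in Java. This is a loose, hypothetical rendering of the data-summary example rather than the original code: the class DataSummarizer, the summarize routine, and the string-based summary are assumptions made so that the sketch is self-contained.

```java
// Hypothetical Java rendering (not from the book) of the data-summary
// example, with each group of related statements moved into its own routine.
class DataSummarizer {

    // The count and total for one data set live only inside this routine.
    static String summarize(String label, int[] data) {
        int total = 0;
        for (int value : data) {
            total += value;
        }
        return label + ": " + data.length + " items, total " + total;
    }

    // The caller no longer tracks separate counts and totals for both sets.
    static String[] summarizeData(int[] oldData, int[] newData) {
        return new String[] {
            summarize("old", oldData),
            summarize("new", newData)
        };
    }

    public static void main(String[] args) {
        int[] oldData = {1, 2, 3};
        int[] newData = {10, 20};
        for (String line : summarizeData(oldData, newData)) {
            System.out.println(line);
        }
    }
}
```

Because total is declared inside summarize, no statement in the calling routine can alter it by accident, which is the reduction in scope this guideline describes.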

Begin with most restricted visibility, and expand the variable's scope only if necessary. Part of minimizing the scope of a variable is keeping it as local as possible. It is much more difficult to reduce the scope of a variable that has had a large scope than to expand the scope of a variable that has had a small scope. In other words, it's harder to turn a global variable into a class variable than it is to turn a class variable into a global variable. It's harder to turn a protected data member into a private data member than vice versa. For that reason, when in doubt, favor the smallest possible scope for a variable: local to a specific loop, local to an individual routine, then private to a class, then protected, then package (if your programming language supports that), and global only as a last resort.

Cross-Reference

For more on global variables, see Section 13.3, "Global Data."
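The order of preference described in this guideline might look as follows in Java. The sketch is hypothetical throughout: the class ReceiptLedger and all of its members are invented for illustration and are not part of the original text.

```java
// Hypothetical class (not from the book) showing the scopes to consider
// in Java, from narrowest to widest.
class ReceiptLedger {
    // Private: visible only inside this class.
    private int receiptCount = 0;
    // Protected: also visible to subclasses (and, in Java, to the package).
    protected double runningTotal = 0.0;
    // Package-private (no modifier): visible throughout the package.
    int batchSize = 10;
    // Public static: effectively global data, so a last resort.
    public static int ledgersCreated = 0;

    ReceiptLedger() {
        ledgersCreated++;
    }

    double add(double[] receipts) {
        double batchTotal = 0.0;                     // local to this routine
        for (int i = 0; i < receipts.length; i++) {  // i is local to this loop
            batchTotal += receipts[i];
        }
        receiptCount += receipts.length;
        runningTotal += batchTotal;
        return batchTotal;
    }

    int receiptCount() {
        return receiptCount;
    }

    public static void main(String[] args) {
        ReceiptLedger ledger = new ReceiptLedger();
        System.out.println(ledger.add(new double[] {1.5, 2.5}));  // 4.0
        System.out.println(ledger.receiptCount());                // 2
    }
}
```

Widening any of these later is a one-word change, whereas narrowing one means first finding every piece of code that has come to depend on the wider visibility.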

Comments on Minimizing Scope

Many programmers' approach to minimizing variables' scope depends on their views of the issues of "convenience" and "intellectual manageability." Some programmers make many of their variables global because global scope makes variables convenient to access and the programmers don't have to fool around with parameter lists and class-scoping rules. In their minds, the convenience of being able to access variables at any time outweighs the risks involved.

Other programmers prefer to keep their variables as local as possible because local scope helps intellectual manageability. The more information you can hide, the less you have to keep in mind at any one time. The less you have to keep in mind, the smaller the chance that you'll make an error because you forgot one of the many details you needed to remember.

Cross-Reference

The idea of minimizing scope is related to the idea of information hiding. For details, see "Hide Secrets (Information Hiding)" in Section 5.3.

The difference between the "convenience" philosophy and the "intellectual manageability" philosophy boils down to a difference in emphasis between writing programs and reading them. Maximizing scope might indeed make programs easy to write, but a program in which any routine can use any variable at any time is harder to understand than a program that uses well-factored routines. In such a program, you can't understand only one routine; you have to understand all the other routines with which that routine shares global data. Such programs are hard to read, hard to debug, and hard to modify.

Consequently, you should declare each variable to be visible to the smallest segment of code that needs to see it. If you can confine the variable's scope to a single loop or to a single routine, great. If you can't confine the scope to one routine, restrict the visibility to the routines in a single class. If you can't restrict the variable's scope to the class that's most responsible for the variable, create access routines to share the variable's data with other classes. You'll find that you rarely, if ever, need to use naked global data.

Cross-Reference

For details on using access routines, see "Using Access Routines Instead of Global Data" in Section 13.3.
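As a rough illustration of sharing data through access routines instead of naked global data, consider the following Java sketch. The classes ErrorLog and ReportPrinter are hypothetical and exist only for this example.

```java
// Hypothetical example (not from the book): the data stays private to one
// class, and other classes reach it only through access routines.
class ErrorLog {
    // Not a naked global: no code outside this class can assign to it.
    private static int errorCount = 0;

    static void recordError() {
        errorCount++;
    }

    static int errorCount() {
        return errorCount;
    }
}

class ReportPrinter {
    // Reads the shared value through the access routine only.
    static String summary() {
        return "errors so far: " + ErrorLog.errorCount();
    }

    public static void main(String[] args) {
        ErrorLog.recordError();
        System.out.println(summary());
    }
}
```

Every change to errorCount goes through recordError, so there is a single place to add validation or logging if the rules for the data change later.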
