13.3. Global Data


Global variables are accessible anywhere in a program. The term is also sometimes used sloppily to refer to variables with a broader scope than local variables, such as class variables that are accessible anywhere within a class. But accessibility anywhere within a single class does not by itself mean that a variable is global.

Cross-Reference

For details on the differences between global data and class data, see "Class data mistaken for global data" in Section 5.3.

Most experienced programmers have concluded that using global data is riskier than using local data. Most experienced programmers have also concluded that access to data from several routines is pretty useful.

Even if global variables don't always produce errors, however, they're hardly ever the best way to program. The rest of this section fully explores the issues involved.

Common Problems with Global Data

If you use global variables indiscriminately or you feel that not being able to use them is restrictive, you probably haven't caught on to the full value of information hiding and modularity yet. Modularity, information hiding, and the associated use of well-designed classes might not be revealed truths, but they go a long way toward making large programs understandable and maintainable. Once you get the message, you'll want to write routines and classes with as little connection as possible to global variables and the outside world.

People cite numerous problems in using global data, but the problems boil down to a small number of major issues:

Inadvertent changes to global data You might change the value of a global variable in one place and mistakenly think that it has remained unchanged somewhere else. Such a problem is known as a "side effect." In the following example, theAnswer is a global variable:

Visual Basic Example of a Side-Effect Problem

    theAnswer = GetTheAnswer()                      ' theAnswer is a global variable.
    otherAnswer = GetOtherAnswer()                  ' GetOtherAnswer() changes theAnswer.
    averageAnswer = (theAnswer + otherAnswer) / 2   ' averageAnswer is wrong.

You might assume that the call to GetOtherAnswer() doesn't change the value of theAnswer; if it does, the average in the third line will be wrong. And, in fact, GetOtherAnswer() does change the value of theAnswer, so the program has an error to be fixed.

Bizarre and exciting aliasing problems with global data "Aliasing" refers to calling the same variable by two or more different names. This happens when a global variable is passed to a routine and then used by the routine both as a global variable and as a parameter. Here's a routine that uses a global variable:

Visual Basic Example of a Routine That's Ripe for an Aliasing Problem

    Sub WriteGlobal( ByRef inputVar As Integer )
       inputVar = 0
       globalVar = inputVar + 5
       MsgBox( "Input Variable: " & Str( inputVar ) )
       MsgBox( "Global Variable: " & Str( globalVar ) )
    End Sub

Here's the code that calls the routine with the global variable as an argument:

Visual Basic Example of Calling the Routine with an Argument, Which Exposes an Aliasing Problem

WriteGlobal( globalVar )

Since inputVar is initialized to 0 and WriteGlobal() adds 5 to inputVar to get globalVar, you'd expect globalVar to be 5 more than inputVar. But here's the surprising result:

The Result of the Aliasing Problem in Visual Basic

    Input Variable: 5
    Global Variable: 5

The subtlety here is that globalVar and inputVar are actually the same variable! Since globalVar is passed into WriteGlobal() by the calling routine, it's referenced or "aliased" by two different names. The effect of the MsgBox() lines is thus quite different from the one intended: they display the same variable twice, even though they refer to two different names.

Re-entrant code problems with global data Code that can be entered by more than one thread of control is becoming increasingly common. Multithreaded code creates the possibility that global data will be shared not only among routines, but among different copies of the same program. In such an environment, you have to make sure that global data keeps its meaning even when multiple copies of a program are running. This is a significant problem, and you can avoid it by using techniques suggested later in this section.

Code reuse hindered by global data To use code from one program in another program, you have to be able to pull it out of the first program and plug it into the second. Ideally, you'd be able to lift out a single routine or class, plug it into another program, and continue merrily on your way.

Global data complicates the picture. If the class you want to reuse reads or writes global data, you can't just plug it into the new program. You have to modify the new program or the old class so that they're compatible. If you take the high road, you'll modify the old class so that it doesn't use global data. If you do that, the next time you need to reuse the class you'll be able to plug it in with no extra fuss. If you take the low road, you'll modify the new program to create the global data that the old class needs to use. This is like a virus; not only does the global data affect the original program, but it also spreads to new programs that use any of the old program's classes.

Uncertain initialization-order issues with global data The order in which data is initialized among different "translation units" (files) is not defined in some languages, notably C++. If the initialization of a global variable in one file uses a global variable that was initialized in a different file, all bets are off on the second variable's value unless you take explicit steps to ensure the two variables are initialized in the right sequence.

This problem is solvable with a workaround that Scott Meyers describes in Effective C++, Item #47 (Meyers 1998). But the trickiness of the solution is representative of the extra complexity that using global data introduces.

Modularity and intellectual manageability damaged by global data The essence of creating programs that are larger than a few hundred lines of code is managing complexity. The only way you can intellectually manage a large program is to break it into pieces so that you only have to think about one part at a time. Modularization is the most powerful tool at your disposal for breaking a program into pieces.

Global data pokes holes in your ability to modularize. If you use global data, can you concentrate on one routine at a time? No. You have to concentrate on one routine and every other routine that uses the same global data. Although global data doesn't completely destroy a program's modularity, it weakens it, and that's reason enough to try to find better solutions to your problems.

Reasons to Use Global Data

Data purists sometimes argue that programmers should never use global data, but most programs use "global data" when the term is broadly construed. Data in a database is global data, as is data in configuration files such as the Windows registry. Named constants are global data, just not global variables.

Used with discipline, global variables are useful in several situations:

Preservation of global values Sometimes you have data that applies conceptually to your whole program. This might be a variable that reflects the state of a program (for example, interactive vs. command-line mode, or normal vs. error-recovery mode). Or it might be information that's needed throughout a program (for example, a data table that every routine in the program uses).

Emulation of named constants Although C++, Java, Visual Basic, and most modern languages support named constants, some languages such as Python, Perl, Awk, and UNIX shell script still don't. You can use global variables as substitutes for named constants when your language doesn't support them. For example, you can replace the literal values 1 and 0 with the global variables TRUE and FALSE set to 1 and 0, or you can replace 66 as the number of lines per page with LINES_PER_PAGE = 66. It's easier to change code later when this approach is used, and the code tends to be easier to read. This disciplined use of global data is a prime example of the distinction between programming in vs. programming into a language, which is discussed more in Section 34.4, "Program into Your Language, Not in It."

Cross-Reference

For more details on named constants, see Section 12.7, "Named Constants."

Emulation of enumerated types You can also use global variables to emulate enumerated types in languages that don't support enumerated types directly, such as Python before version 3.4 added the enum module.

Streamlining use of extremely common data Sometimes you have so many references to a variable that it appears in the parameter list of every routine you write. Rather than including it in every parameter list, you can make it a global variable. However, in cases in which a variable seems to be accessed everywhere, it rarely is. Usually it's accessed by a limited set of routines you can package into a class with the data they work on. More on this later.

Eliminating tramp data Sometimes you pass data to a routine or class merely so that it can be passed to another routine or class. For example, you might have an error-processing object that's used in each routine. When the routine in the middle of the call chain doesn't use the object, the object is called "tramp data." Use of global variables can eliminate tramp data.

Use Global Data Only as a Last Resort

Before you resort to using global data, consider a few alternatives:

Begin by making each variable local and make variables global only as you need to Make all variables local to individual routines initially. If you find they're needed elsewhere, make them private or protected class variables before you go so far as to make them global. If you finally find that you have to make them global, do it, but only when you're sure you have to. If you start by making a variable global, you'll never make it local, whereas if you start by making it local, you might never need to make it global.

Distinguish between global and class variables Some variables are truly global in that they are accessed throughout a whole program. Others are really class variables, used heavily only within a certain set of routines. It's OK to access a class variable any way you want to within the set of routines that use it heavily. If routines outside the class need to use it, provide the variable's value by means of an access routine. Don't access class values directly as if they were global variables even if your programming language allows you to. This advice is tantamount to saying "Modularize! Modularize! Modularize!"

Use access routines Creating access routines is the workhorse approach to getting around problems with global data. More on that in the next section.

Using Access Routines Instead of Global Data

Anything you can do with global data, you can do better with access routines. The use of access routines is a core technique for implementing abstract data types and achieving information hiding. Even if you don't want to use a full-blown abstract data type, you can still use access routines to centralize control over your data and to protect yourself against changes.

Advantages of Access Routines

Using access routines has multiple advantages:

You get centralized control over the data. If you discover a more appropriate implementation of the structure later, you don't have to change the code everywhere the data is referenced. Changes don't ripple through your whole program. They stay inside the access routines.
You can ensure that all references to the variable are barricaded. If you push elements onto the stack with statements like stack.array[ stack.top ] = newElement, you can easily forget to check for stack overflow and make a serious mistake. If you use access routines (for example, PushStack( newElement )), you can write the check for stack overflow into the PushStack() routine. The check will be done automatically every time the routine is called, and you can forget about it.
Cross-Reference

For more details on barricading, see Section 8.5, "Barricade Your Program to Contain the Damage Caused by Errors."
You get the general benefits of information hiding automatically. Access routines are an example of information hiding, even if you don't design them for that reason. You can change the interior of an access routine without changing the rest of the program. Access routines allow you to redecorate the interior of your house and leave the exterior unchanged so that your friends still recognize it.
Cross-Reference

For details on information hiding, see "Hide Secrets (Information Hiding)" in Section 5.3.
Access routines are easy to convert to an abstract data type. One advantage of access routines is that you can create a level of abstraction that's harder to do when you're working with global data directly. For example, instead of writing code that says if lineCount > MAX_LINES, an access routine allows you to write code that says if PageFull(). This small change documents the intent of the if lineCount test, and it does so in the code. It's a small gain in readability, but consistent attention to such details makes the difference between beautifully crafted software and code that's just hacked together.

How to Use Access Routines

Here's the short version of the theory and practice of access routines: Hide data in a class. Declare that data by using the static keyword or its equivalent to ensure only a single instance of the data exists. Write routines that let you look at the data and change it. Require code outside the class to use the access routines rather than working directly with the data.

For example, if you have a global status variable g_globalStatus that describes your program's overall status, you can create two access routines: globalStatus.Get() and globalStatus.Set(), each of which does what it sounds like it does. Those routines access a variable hidden within the class that replaces g_globalStatus. The rest of the program can get all the benefit of the formerly global variable by accessing globalStatus.Get() and globalStatus.Set().

If your language doesn't support classes, you can still create access routines to manipulate the global data, but you'll have to enforce restrictions on the use of the global data through coding standards in lieu of built-in programming language enforcement.

Cross-Reference

Restricting access to global variables even when your language doesn't directly support that is an example of programming into a language vs. programming in a language. For more details, see Section 34.4, "Program into Your Language, Not in It."

Here are a few detailed guidelines for using access routines to hide global variables when your language doesn't have built-in support:

Require all code to go through the access routines for the data A good convention is to require all global data to begin with the g_ prefix, and to further require that no code access a variable with the g_ prefix except that variable's access routines. All other code reaches the data through the access routines.

Don't just throw all your global data into the same barrel If you throw all your global data into a big pile and write access routines for it, you eliminate the problems of global data but you miss out on some of the advantages of information hiding and abstract data types. As long as you're writing access routines, take a moment to think about which class each global variable belongs in and then package the data and its access routines with the other data and routines in that class.

Use locking to control access to global variables Similar to concurrency control in a multiuser database environment, locking requires that before the value of a global variable can be used or updated, the variable must be "checked out." After the variable is used, it's checked back in. During the time it's in use (checked out), if some other part of the program tries to check it out, the lock/unlock routine displays an error message or fires an assertion.

This description of locking ignores many of the subtleties of writing code to fully support concurrency. For that reason, simplified locking schemes like this one are most useful during the development stage. Unless the scheme is very well thought out, it probably won't be reliable enough to be put into production. When the program is put into production, the code is modified to do something safer and more graceful than displaying error messages. For example, it might log an error message to a file when it detects multiple parts of the program trying to lock the same global variable.

Cross-Reference

For details on planning for differences between developmental and production versions of a program, see "Plan to Remove Debugging Aids" in Section 8.6 and Section 8.7, "Determining How Much Defensive Programming to Leave in Production Code."

This sort of development-time safeguard is fairly easy to implement when you use access routines for global data, but it would be awkward to implement if you were using global data directly.

Build a level of abstraction into your access routines Build access routines at the level of the problem domain rather than at the level of the implementation details. That approach buys you improved readability as well as insurance against changes in the implementation details.

Compare the pairs of statements in Table 13-1:

Table 13-1. Accessing Global Data Directly and Through Access Routines
Direct Use of Global Data	Use of Global Data Through Access Routines
node = node.next	account = NextAccount( account )
node = node.next	employee = NextEmployee( employee )
node = node.next	rateLevel = NextRateLevel( rateLevel )
event = eventQueue[ queueFront ]	event = HighestPriorityEvent()
event = eventQueue[ queueBack ]	event = LowestPriorityEvent()

In the first three examples, the point is that an abstract access routine tells you a lot more than a generic structure. If you use the structure directly, you do too much at once: you show both what the structure itself is doing (moving to the next link in a linked list) and what's being done with respect to the entity it represents (getting the next account, employee, or rate level). This is a big burden to put on a simple data-structure assignment. Hiding the information behind abstract access routines lets the code speak for itself and makes the code read at the level of the problem domain, rather than at the level of implementation details.

Keep all accesses to the data at the same level of abstraction If you use an access routine to do one thing to a structure, you should use an access routine to do everything else to it too. If you read from the structure with an access routine, write to it with an access routine. If you call InitStack() to initialize a stack and PushStack() to push an item onto the stack, you've created a consistent view of the data. If you pop the stack by writing value = array[ stack.top ], you've created an inconsistent view of the data. The inconsistency makes it harder for others to understand the code. Create a PopStack() routine instead of writing value = array[ stack.top ].

Cross-Reference

Using access routines for an event queue suggests the need to create a class. For details, see Chapter 6, "Working Classes."

In the example pairs of statements in Table 13-1, the two event-queue operations occurred in parallel. Inserting an event into the queue would be trickier than either of the two operations in the table, requiring several lines of code to find the place to insert the event, adjust existing events to make room for the new event, and adjust the front or back of the queue. Removing an event from the queue would be just as complicated. During coding, the complex operations would be put into routines and the others would be left as direct data manipulations. This would create an ugly, nonparallel use of the structure. Now compare the pairs of statements in Table 13-2:

Table 13-2. Parallel and Nonparallel Uses of Complex Data
Nonparallel Use of Complex Data	Parallel Use of Complex Data
event = EventQueue[ queueFront ]	event = HighestPriorityEvent()
event = EventQueue[ queueBack ]	event = LowestPriorityEvent()
AddEvent( event )	AddEvent( event )
eventCount = eventCount - 1	RemoveEvent( event )

Although you might think that these guidelines apply only to large programs, access routines have shown themselves to be a productive way of avoiding the problems of global data. As a bonus, they make the code more readable and add flexibility.

How to Reduce the Risks of Using Global Data

In most instances, global data is really class data for a class that hasn't been designed or implemented very well. In a few instances, data really does need to be global, but accesses to it can be wrapped with access routines to minimize potential problems. In a tiny number of remaining instances, you really do need to use global data. In those cases, you might think of following the guidelines in this section as getting shots so that you can drink the water when you travel to a foreign country: they're kind of painful, but they improve the odds of staying healthy.

Develop a naming convention that makes global variables obvious You can avoid some mistakes just by making it obvious that you're working with global data. If you're using global variables for more than one purpose (for example, as variables and as substitutes for named constants), make sure your naming convention differentiates among the types of uses.

Cross-Reference

For details on naming conventions for global variables, see "Identify global variables" in Section 11.4.

Create a well-annotated list of all your global variables Once your naming convention indicates that a variable is global, it's helpful to indicate what the variable does. A list of global variables is one of the most useful tools that someone working with your program can have.

Don't use global variables to contain intermediate results If you need to compute a new value for a global variable, assign the global variable the final value at the end of the computation rather than using it to hold the result of intermediate calculations.

Don't pretend you're not using global data by putting all your data into a monster object and passing it everywhere Putting everything into one huge object might satisfy the letter of the law by avoiding global variables, but it's pure overhead, producing none of the benefits of true encapsulation. If you use global data, do it openly. Don't try to disguise it with obese objects.
