13.2. Pointers | Code Complete: A Practical Handbook of Software Construction, Second Edition


Pointer usage is one of the most error-prone areas of modern programming, to such an extent that modern languages like Java, C#, and Visual Basic don't provide a pointer data type. Using pointers is inherently complicated, and using them correctly requires that you have an excellent understanding of your compiler's memory-management scheme. Many common security problems, especially buffer overruns, can be traced back to erroneous use of pointers (Howard and LeBlanc 2003).

Even if your language doesn't require you to use pointers, a good understanding of pointers will help your understanding of how your programming language works. A liberal dose of defensive programming practices will help even further.

Paradigm for Understanding Pointers

Conceptually, every pointer consists of two parts: a location in memory and a knowledge of how to interpret the contents of that location.

Location in Memory

The location in memory is an address, often expressed in hexadecimal notation. An address on a 32-bit processor would be a 32-bit value, such as 0x0001EA40. The pointer itself contains only this address. To use the data the pointer points to, you have to go to that address and interpret the contents of memory at that location. If you were to look at the memory in that location, it would be just a collection of bits. It has to be interpreted to be meaningful.

Knowledge of How to Interpret the Contents

The knowledge of how to interpret the contents of a location in memory is provided by the base type of the pointer. If a pointer points to an integer, what that really means is that the compiler interprets the memory location given by the pointer as an integer. Of course, you can have an integer pointer, a string pointer, and a floating-point pointer all pointing at the same memory location. But only one of the pointers interprets the contents at that location correctly.

In thinking about pointers, it's helpful to remember that memory doesn't have any inherent interpretation associated with it. It is only through use of a specific type of pointer that the bits in a particular location are interpreted as meaningful data.

Figure 13-1 shows several views of the same location in memory, interpreted in several different ways.

Figure 13-1. The amount of memory used by each data type is shown by double lines

In each of the cases in Figure 13-1, the pointer points to the location containing the hex value 0x0A. The number of bytes used beyond the 0A depends on how the memory is interpreted. The way memory contents are used also depends on how the memory is interpreted. (It also depends on what processor you're using, so keep that in mind if you try to duplicate these results on your Desktop Cray.) The same raw memory contents can be interpreted as a string, an integer, a floating-point number, or anything else; it all depends on the base type of the pointer that points to the memory.
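As a rough illustration of the same idea, the sketch below reads one block of bytes under two different interpretations. The helper names are hypothetical, the sketch assumes a 32-bit IEEE 754 float, and the exact values you see depend on your processor.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers: read the same four bytes under two interpretations.
// std::memcpy is used instead of casting the pointer, because dereferencing
// a pointer of the wrong type is undefined behavior in C++.
std::uint32_t BytesAsInteger(const unsigned char *bytes) {
    std::uint32_t value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}

float BytesAsFloat(const unsigned char *bytes) {
    static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");
    float value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}

// Store a float's bytes, then interpret those same bytes as an integer.
std::uint32_t BitsOfFloat(float value) {
    unsigned char bytes[sizeof(float)];
    std::memcpy(bytes, &value, sizeof bytes);
    return BytesAsInteger(bytes);
}
```

On a platform that uses IEEE 754 floats, BitsOfFloat( 1.0f ) yields 0x3F800000. The bits are identical in both views; only the interpretation differs.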

General Tips on Pointers

With many types of defects, locating the error is the easiest part of dealing with the error and correcting it is the hard part. Pointer errors are different. A pointer error is usually the result of a pointer's pointing somewhere it shouldn't. When you assign a value to a bad pointer variable, you write data into an area of memory you shouldn't. This is called "memory corruption." Sometimes memory corruption produces horrible, fiery system crashes; sometimes it alters the results of a calculation in another part of the program; sometimes it causes your program to skip routines unpredictably; and sometimes it doesn't do anything at all. In the last case, the pointer error is a ticking time bomb, waiting to ruin your program five minutes before you show it to your most important customer. Symptoms of pointer errors tend to be unrelated to causes of pointer errors. Thus, most of the work in correcting a pointer error is locating the cause.

Working with pointers successfully requires a two-pronged strategy. First, avoid installing pointer errors in the first place. Pointer errors are so difficult to find that extra preventive measures are justified. Second, detect pointer errors as soon after they are coded as possible. Symptoms of pointer errors are so erratic that extra measures to make the symptoms more predictable are justified. Here's how to achieve these key goals:

Isolate pointer operations in routines or classes Suppose you use a linked list in several places in a program. Rather than traversing the list manually each place it's used, write access routines such as NextLink(), PreviousLink(), InsertLink(), and DeleteLink(). By minimizing the number of places in which pointers are accessed, you minimize the possibility of making careless mistakes that spread throughout your program and take forever to find. Because the code is then relatively independent of data-implementation details, you also improve the chance that you can reuse it in other programs. Writing routines for pointer allocation is another way to centralize control over your data.

Declare and define pointers at the same time Assigning a variable its initial value close to where it is declared is generally good programming practice, and it's all the more valuable when working with pointers. Here is an example of what not to do:

C++ Example of Bad Pointer Initialization

Employee *employeePtr;
// lots of code
...
employeePtr = new Employee;

Even if this code works correctly initially, it's error-prone under modification because a chance exists that someone will try to use employeePtr between the point where the pointer is declared and the time it's initialized. Here's a safer approach:

C++ Example of Good Pointer Initialization

// lots of code
...
Employee *employeePtr = new Employee;

Delete pointers at the same scoping level as they were allocated Keep allocation and deallocation of pointers symmetric. If you use a pointer within a single scope, call new to allocate and delete to deallocate the pointer within the same scope. If you allocate a pointer inside a routine, deallocate it inside a sister routine. If you allocate a pointer inside an object's constructor, deallocate it inside the object's destructor. A routine that allocates memory and then expects its client code to deallocate the memory manually creates an inconsistency that is ripe for error.
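A minimal sketch of the constructor/destructor pairing might look like the following. The SampleBuffer class and its instance counter are hypothetical and exist only to make the symmetry visible.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class: the constructor allocates and the destructor deallocates,
// so client code never has to remember to free the buffer.
class SampleBuffer {
public:
    explicit SampleBuffer(std::size_t count)
        : m_samples(new double[count]()), m_count(count) {
        ++LiveCounter();
    }
    ~SampleBuffer() {
        delete[] m_samples;   // matches the new[] in the constructor
        --LiveCounter();
    }
    // Copying is disabled so that two objects never own the same memory.
    SampleBuffer(const SampleBuffer &) = delete;
    SampleBuffer &operator=(const SampleBuffer &) = delete;

    std::size_t Count() const { return m_count; }
    double &At(std::size_t index) { return m_samples[index]; }

    static int LiveBuffers() { return LiveCounter(); }

private:
    // Bookkeeping for the illustration only.
    static int &LiveCounter() {
        static int count = 0;
        return count;
    }
    double *m_samples;
    std::size_t m_count;
};
```

Because allocation and deallocation live in the same class, a caller that creates a SampleBuffer in a block gets the memory back when the block ends, with no manual delete.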

Check pointers before using them Before you use a pointer in a critical part of your program, make sure the memory location it points to is reasonable. For example, if you expect memory locations to be between StartData and EndData, you should take a suspicious view of a pointer that points before StartData or after EndData. You'll have to determine what the values of StartData and EndData are in your environment. You can set this up to work automatically if you use pointers through access routines rather than manipulate them directly.
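One way such a range check could be wrapped in an access routine is sketched below. The pool, the bounds, and the routine names are assumptions made for the illustration; in a real program StartData and EndData would come from your environment.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical pool standing in for the region between StartData and EndData.
static int g_dataPool[256];
static int *const StartData = g_dataPool;
static int *const EndData = g_dataPool + 256;   // one past the last element

// std::less gives a well-defined ordering even for pointers into different objects.
bool IsPointerInDataRange(const int *pointer) {
    std::less<const int *> before;
    return pointer != nullptr
        && !before(pointer, StartData)
        && before(pointer, EndData);
}

// Access routine: every read goes through the reasonableness check.
int ReadData(const int *pointer) {
    assert(IsPointerInDataRange(pointer) && "pointer outside StartData..EndData");
    return *pointer;
}
```

Client code that calls ReadData() rather than dereferencing pointers directly gets the check on every access without repeating it.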

Check the variable referenced by the pointer before using it Sometimes you can perform reasonableness checks on the value the pointer points to. For example, if you're supposed to be pointing to an integer value between 0 and 1000, you should be suspicious of values over 1000. If you're pointing to a C++-style string, you might be suspicious of strings with lengths greater than 100. This can also be done automatically if you work with pointers through access routines.

Use dog-tag fields to check for corrupted memory A "tag field" or "dog tag" is a field you add to a structure solely for the purpose of error checking. When you allocate a variable, put a value that should remain unchanged into its tag field. When you use the structure, and especially when you delete the memory, check the tag field's value. If the tag field doesn't have the expected value, the data has been corrupted.

When you delete the pointer, corrupt the field so that if you accidentally try to free the same pointer again, you'll detect the corruption. For example, let's say that you need to allocate 100 bytes:

First, new 104 bytes, 4 bytes more than requested.
Set the first 4 bytes to a dog-tag value, and then return a pointer to the memory that starts after that.
When the time comes to delete the pointer, check the tag.
If the tag is OK, set it to 0 or some other value that you and your program recognize as an invalid tag value. You don't want the value to be mistaken for a valid tag after the memory has been freed. Set the data to 0, 0xCC, or some other non-random value for the same reason.
Finally, delete the pointer.

Putting a dog tag at the beginning of the memory block you've allocated allows you to check for redundant attempts to deallocate the memory block without needing to maintain a list of all the memory blocks you've allocated. Putting the dog tag at the end of the memory block allows you to check for overwriting memory beyond the location that was supposed to be used. You can use tags at the beginning and the end of the block to accomplish both objectives.

You can use this approach in concert with the reasonableness check suggested earlier: checking that the pointers are between StartData and EndData. To be sure that a pointer points to a reasonable location, rather than checking for a probable range of memory, check to see that the pointer is in the list of allocated pointers.

You could check the tag field just once before you delete the variable. A corrupted tag would then tell you that sometime during the life of that variable its contents were corrupted. The more often you check the tag field, however, the closer to the root of the problem you will detect the corruption.
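The allocation steps listed above can be sketched roughly as follows. The tag values and routine names are hypothetical, and the sketch uses malloc and free rather than new and delete so that the header arithmetic stays simple. It also pads the header to the platform's maximum alignment instead of using exactly 4 bytes, so that the caller's data stays properly aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical tag values; any values your program can recognize will do.
const std::uint32_t VALID_TAG = 0xD06F00Du;
const std::uint32_t FREED_TAG = 0;

// Header size, padded so the memory returned to the caller stays aligned.
const std::size_t HEADER_SIZE = alignof(std::max_align_t);

// Allocate the requested size plus a header, and write the tag into the header.
void *TaggedAllocate(std::size_t size) {
    unsigned char *block =
        static_cast<unsigned char *>(std::malloc(HEADER_SIZE + size));
    if (block == nullptr) return nullptr;
    std::memcpy(block, &VALID_TAG, sizeof VALID_TAG);
    return block + HEADER_SIZE;
}

bool IsTagValid(const void *data) {
    const unsigned char *block =
        static_cast<const unsigned char *>(data) - HEADER_SIZE;
    std::uint32_t tag;
    std::memcpy(&tag, block, sizeof tag);
    return tag == VALID_TAG;
}

// Returns false, and frees nothing, if the tag is not the expected value.
bool TaggedFree(void *data, std::size_t size) {
    if (data == nullptr || !IsTagValid(data)) return false;
    unsigned char *block = static_cast<unsigned char *>(data) - HEADER_SIZE;
    std::memcpy(block, &FREED_TAG, sizeof FREED_TAG);  // invalidate the tag
    std::memset(data, 0xCC, size);                     // overwrite the data
    std::free(block);
    return true;
}
```

Checking the tag of a block that has already been freed reads memory the program no longer owns, so the double-free detection is a debugging aid that works in practice on many systems rather than a guarantee of the language.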

Add explicit redundancies An alternative to using a tag field is to use certain fields twice. If the data in the redundant fields doesn't match, you know memory has been corrupted. This can result in a lot of overhead if you manipulate pointers directly. If you isolate pointer operations in routines, however, it adds duplicate code in only a few places.
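As a small sketch of the idea, the hypothetical record below stores one field twice and relies on an access routine to keep the copies in sync. The type and routine names are invented for the example.

```cpp
#include <cassert>

// Hypothetical record that stores its key field twice.
struct AccountRecord {
    long balance;
    long balanceCopy;   // redundant copy, kept in sync by SetBalance()
};

// The only routine that is supposed to write to the record.
void SetBalance(AccountRecord *record, long balance) {
    record->balance = balance;
    record->balanceCopy = balance;
}

// A mismatch means something wrote to the record without using SetBalance().
bool IsRecordIntact(const AccountRecord *record) {
    return record->balance == record->balanceCopy;
}
```

The duplicate assignment appears only inside SetBalance(), which is the "few places" benefit of isolating pointer operations in routines.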

Use extra pointer variables for clarity By all means, don't skimp on pointer variables. The point is made elsewhere that a variable shouldn't be used for more than one purpose. This is especially true for pointer variables. It's hard enough to figure out what someone is doing with a linked list without having to figure out why one genericLink variable is used over and over again or what pointer->next->last->next is pointing at. Consider this code fragment:

C++ Example of Traditional Node Insertion Code

void InsertLink(
   Node *currentNode,
   Node *insertNode
   ) {
   // insert "insertNode" after "currentNode"
   insertNode->next = currentNode->next;
   insertNode->previous = currentNode;
   if ( currentNode->next != NULL ) {
      currentNode->next->previous = insertNode;   // (1)
   }
   currentNode->next = insertNode;
}

(1) This line is needlessly difficult.

This is traditional code for inserting a node in a linked list, and it's needlessly hard to understand. Inserting a new node involves three objects: the current node, the node currently following the current node, and the node to be inserted between them. The code fragment explicitly acknowledges only two objects: insertNode and currentNode. It forces you to figure out and remember that currentNode->next is also involved. If you tried to diagram what is happening without the node originally following currentNode, you would get something like this:

A better diagram would identify all three objects. It would look like this:

Here's code that explicitly references all three of the objects involved:

C++ Example of More Readable Node-Insertion Code

void InsertLink(
   Node *startNode,
   Node *newMiddleNode
   ) {
   // insert "newMiddleNode" between "startNode" and "followingNode"
   Node *followingNode = startNode->next;
   newMiddleNode->next = followingNode;
   newMiddleNode->previous = startNode;
   if ( followingNode != NULL ) {
      followingNode->previous = newMiddleNode;
   }
   startNode->next = newMiddleNode;
}

This code fragment has an extra line of code, but without the first fragment's currentNode->next->previous, it's easier to follow.

Simplify complicated pointer expressions Complicated pointer expressions are hard to read. If your code contains expressions like p->q->r->s.data, think about the person who has to read the expression. Here's a particularly egregious example:

C++ Example of a Pointer Expression That's Hard to Understand

for ( rateIndex = 0; rateIndex < numRates; rateIndex++ ) {
   netRate[ rateIndex ] = baseRate[ rateIndex ] * rates->discounts->factors->net;
}

Complicated expressions like the pointer expression in this example make for code that has to be figured out rather than read. If your code contains a complicated expression, assign it to a well-named variable to clarify the intent of the operation. Here's an improved version of the example:

C++ Example of Simplifying a Complicated Pointer Expression

quantityDiscount = rates->discounts->factors->net;
for ( rateIndex = 0; rateIndex < numRates; rateIndex++ ) {
   netRate[ rateIndex ] = baseRate[ rateIndex ] * quantityDiscount;
}

With this simplification, not only do you get a gain in readability, but you might also get a boost in performance from simplifying the pointer operation inside the loop. As usual, you'd have to measure the performance benefit before you bet any folding money on it.

Draw a picture Code descriptions of pointers can get confusing. It usually helps to draw a picture. For example, a picture of the linked-list insertion problem might look like the one shown in Figure 13-2.

Figure 13-2. An example of a picture that helps us think through the steps involved in relinking pointers

Cross-Reference

Diagrams such as the one in Figure 13-2 can become part of the external documentation of your program. For details on good documentation practices, see Chapter 32, "Self-Documenting Code."

Delete pointers in linked lists in the right order A common problem in working with dynamically allocated linked lists is freeing the first pointer in the list first and then not being able to get to the next pointer in the list. To avoid this problem, make sure that you have a pointer to the next element in a list before you free the current one.
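A minimal sketch of deleting a singly linked list in a safe order follows. The ListNode type and the DeleteList() routine are hypothetical names chosen for the example.

```cpp
#include <cassert>

// Hypothetical node type for a singly linked list.
struct ListNode {
    int value;
    ListNode *next;
};

// Frees every node in the list and returns the number of nodes freed.
int DeleteList(ListNode *head) {
    int freedCount = 0;
    ListNode *current = head;
    while (current != nullptr) {
        ListNode *following = current->next;  // save the link before freeing
        delete current;
        current = following;
        ++freedCount;
    }
    return freedCount;
}
```

The key line is the one that copies current->next into following. Once current has been deleted, its next field can no longer be read safely.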

Allocate a reserve parachute of memory If your program uses dynamic memory, you need to avoid the problem of suddenly running out of memory, leaving your user and your user's data lost in RAM space. One way to give your program a margin of error is to preallocate a memory parachute. Determine how much memory your program needs to save work, clean up, and exit gracefully. Allocate that amount of memory at the beginning of the program as a reserve parachute, and leave it alone. When you run out of memory, free the reserve parachute, clean up, and shut down.
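One possible shape for a memory parachute is sketched below. The class and routine names are hypothetical, and the reserve size is something you would have to determine for your own program. Note also that some operating systems don't commit memory until it is touched, so treat this as a starting point rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical reserve: allocated at startup, released only when memory runs out.
class MemoryParachute {
public:
    explicit MemoryParachute(std::size_t reserveBytes)
        : m_reserve(new (std::nothrow) char[reserveBytes]) {}
    ~MemoryParachute() { Release(); }
    MemoryParachute(const MemoryParachute &) = delete;
    MemoryParachute &operator=(const MemoryParachute &) = delete;

    bool IsPacked() const { return m_reserve != nullptr; }

    // Free the reserve so that cleanup code has memory to work with.
    void Release() {
        delete[] m_reserve;
        m_reserve = nullptr;
    }

private:
    char *m_reserve;
};

// Sketch of an allocation that opens the parachute on failure. A null return
// tells the caller to save work, clean up, and shut down.
int *AllocateOrOpenParachute(std::size_t count, MemoryParachute &parachute) {
    int *block = new (std::nothrow) int[count];
    if (block == nullptr) {
        parachute.Release();
    }
    return block;
}
```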

Shred your garbage Pointer errors are hard to debug because the point at which the memory the pointer points to becomes invalid is not deterministic. Sometimes the memory contents will look valid long after the pointer is freed. Other times, the memory will change right away. You can make such errors show up more consistently by overwriting a memory block with junk data just before you deallocate it:

C++ Example of Forcing Deallocated Memory to Contain Junk Data

memset( pointer, GARBAGE_DATA, MemoryBlockSize( pointer ) );
delete pointer;

Of course, this technique requires that you maintain a list of pointer sizes that can be retrieved with the MemoryBlockSize() routine, which I'll discuss later.

Set pointers to null after deleting or freeing them A common type of pointer error is the "dangling pointer," use of a pointer that has been delete'd or free'd. One reason pointer errors are hard to detect is that sometimes the error doesn't produce any symptoms. By setting pointers to null after freeing them, you don't change the fact that you can read data pointed to by a dangling pointer. But you do ensure that writing data to a dangling pointer produces an error. It will probably be an ugly, nasty, disaster of an error, but at least you'll find it instead of someone else finding it.

The code preceding the delete operation in the previous example could be augmented to handle this, too:

C++ Example of Setting a Pointer to Null After Deleting It

memset( pointer, GARBAGE_DATA, MemoryBlockSize( pointer ) );
delete pointer;
pointer = NULL;

Check for bad pointers before deleting a variable One of the best ways to ruin a program is to delete() or free() a pointer after it has already been delete'd or free'd. Unfortunately, few languages detect this kind of problem.

Setting freed pointers to null also allows you to check whether a pointer is set to null before you use it or attempt to delete it again; if you don't set freed pointers to null, you won't have that option. That suggests another addition to the pointer deletion code:

C++ Example of Asserting That a Pointer Is Not Null Before Deleting It

ASSERT( pointer != NULL, "Attempting to delete null pointer." );
memset( pointer, GARBAGE_DATA, MemoryBlockSize( pointer ) );
delete pointer;
pointer = NULL;

Keep track of pointer allocations Keep a list of the pointers you have allocated. This allows you to check whether a pointer is in the list before you dispose of it. Here's an example of how the standard pointer deletion code could be modified to include that:

C++ Example of Checking Whether a Pointer Has Been Allocated

ASSERT( pointer != NULL, "Attempting to delete null pointer." );
if ( IsPointerInList( pointer ) ) {
   memset( pointer, GARBAGE_DATA, MemoryBlockSize( pointer ) );
   RemovePointerFromList( pointer );
   delete pointer;
   pointer = NULL;
}
else {
   ASSERT( FALSE, "Attempting to delete unallocated pointer." );
}

Write cover routines to centralize your strategy for avoiding pointer problems As you can see from this example, you can end up with quite a lot of extra code each time a pointer is new'd or delete'd. Some of the techniques described in this section are mutually exclusive or redundant, and you wouldn't want to have multiple, conflicting strategies in use in the same code base. For example, you don't need to create and check dog-tag values if you're maintaining your own list of valid pointers.

You can minimize programming overhead and reduce the chance of errors by creating cover routines for common pointer operations. In C++, you could use these two routines:

SAFE_NEW This routine calls new to allocate the pointer, adds the new pointer to a list of allocated pointers, and returns the newly allocated pointer to the calling routine. It can also check for a null return from new (aka an "out-of-memory" error) in this one place only, which simplifies error processing in other parts of your program.
SAFE_DELETE This routine checks to see whether the pointer passed to it is in the list of allocated pointers. If it is in the list, it sets the memory the pointer pointed at to garbage values, removes the pointer from the list, calls C++'s delete operator to deallocate the pointer, and sets the pointer to null. If the pointer isn't in the list, SAFE_DELETE displays a diagnostic message and stops the program.

Implemented here as a macro, the SAFE_DELETE routine looks like this:

C++ Example of Putting a Wrapper Around Pointer Deletion Code

#define SAFE_DELETE( pointer ) { \
   ASSERT( pointer != NULL, "Attempting to delete null pointer."); \
   if ( IsPointerInList( pointer ) ) { \
      memset( pointer, GARBAGE_DATA, MemoryBlockSize( pointer ) ); \
      RemovePointerFromList( pointer ); \
      delete pointer; \
      pointer = NULL; \
   } \
   else { \
      ASSERT( FALSE, "Attempting to delete unallocated pointer." ); \
   } \
}

In C++, this routine will delete individual pointers, but you would also need to implement a similar SAFE_DELETE_ARRAY routine to delete arrays.
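The SAFE_DELETE macro relies on IsPointerInList(), MemoryBlockSize(), and RemovePointerFromList(), which the text names but doesn't show. The sketch below is one hypothetical way to back them with a map from address to block size, along with a possible SAFE_NEW written as a function template. It is an assumption about how such a list might be kept, not the implementation the text has in mind, and it makes no attempt at thread safety.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical registry: maps each allocated address to its size in bytes.
std::unordered_map<const void *, std::size_t> &AllocatedPointers() {
    static std::unordered_map<const void *, std::size_t> pointers;
    return pointers;
}

bool IsPointerInList(const void *pointer) {
    return AllocatedPointers().count(pointer) != 0;
}

// Returns 0 for a pointer that is not in the list.
std::size_t MemoryBlockSize(const void *pointer) {
    auto entry = AllocatedPointers().find(pointer);
    return entry == AllocatedPointers().end() ? 0 : entry->second;
}

void RemovePointerFromList(const void *pointer) {
    AllocatedPointers().erase(pointer);
}

// One possible SAFE_NEW, written as a function template rather than a macro.
// Plain new reports failure by throwing std::bad_alloc, so there is no null
// return to check in this version.
template <typename T>
T *SafeNew() {
    T *pointer = new T();
    AllocatedPointers()[pointer] = sizeof(T);
    return pointer;
}
```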

Cross-Reference

For details on planning to remove code used for debugging, see "Plan to Remove Debugging Aids" in Section 8.6.

By centralizing memory handling in these two routines, you can also make SAFE_NEW and SAFE_DELETE behave differently in debug mode vs. production mode. For example, when SAFE_DELETE detects an attempt to free a null pointer during development, it might stop the program, but during production it might simply log an error and continue processing.

You can easily adapt this scheme to calloc and free in C and to other languages that use pointers.

Use a nonpointer technique Pointers are harder than average to understand, they're error-prone, and they tend to require machine-dependent, unportable code. If you can think of an alternative to using a pointer that works reasonably, save yourself a few headaches and use it instead.

C++-Pointer Pointers

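C++ introduces some specific wrinkles related to using pointers and references. The following guidelines apply to using pointers in C++. The first, illustrated by the example below, is to pass objects that a routine may modify through pointers and objects that it only reads through const references: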

C++ Example of Passing Parameters by Reference and by Value

void SomeRoutine(
   const LARGE_OBJECT &nonmodifiableObject,
   LARGE_OBJECT *modifiableObject
);

This approach provides the additional benefit of providing a syntactic differentiation within the called routine between objects that are supposed to be treated as modifiable and those that aren't. In a modifiable object, the references to members will use the object->member notation, whereas for nonmodifiable objects references to members will use object.member notation.

The limitation of this approach is the difficulty of propagating const references. If you control your own code base, it's good discipline to use const whenever possible (Meyers 1998), and you should be able to declare pass-by-value parameters as const references. For library code or other code you don't control, you'll run into problems using const routine parameters. The fallback position is still to use references for read-only parameters but not declare them const. With that approach, you won't realize the full benefits of the compiler checking for attempts to modify nonmodifiable arguments to a routine, but you'll at least give yourself the visual distinction between object->member and object.member.

Use auto_ptrs If you haven't developed the habit of using auto_ptrs, get into the habit! By deleting memory automatically when the auto_ptr goes out of scope, auto_ptrs avoid many of the memory-leakage problems associated with regular pointers. In Scott Meyers's More Effective C++, Item #9 contains a good discussion of auto_ptr (Meyers 1996).

Get smart about smart pointers Smart pointers are a replacement for regular pointers or "dumb" pointers (Meyers 1996). They operate similarly to regular pointers, but they provide more control over resource management, copy operations, assignment operations, object construction, and object destruction. The issues involved are specific to C++. More Effective C++, Item #28, contains a complete discussion.
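In current C++, std::auto_ptr is no longer available: it was deprecated in C++11 and removed in C++17, and std::unique_ptr provides the same delete-on-scope-exit behavior. The sketch below uses std::unique_ptr with a hypothetical TrackedEmployee class whose instance counter exists only to make the automatic cleanup observable.

```cpp
#include <cassert>
#include <memory>

// Hypothetical class that counts live instances so the cleanup is observable.
class TrackedEmployee {
public:
    TrackedEmployee() { ++LiveCount(); }
    ~TrackedEmployee() { --LiveCount(); }
    static int &LiveCount() {
        static int count = 0;
        return count;
    }
};

// The object is deleted automatically when employeePtr goes out of scope,
// even if the routine returns early or an exception propagates.
int CountWhileInScope() {
    std::unique_ptr<TrackedEmployee> employeePtr(new TrackedEmployee);
    return TrackedEmployee::LiveCount();
}
```

There is no delete in CountWhileInScope(), yet the live count returns to its previous value as soon as the routine exits.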

C-Pointer Pointers

Here are a few tips on using pointers that apply specifically to the C language:

Use explicit pointer types rather than the default type C lets you use char or void pointers for any type of variable. As long as the pointer points, the language doesn't really care what it points at. If you use explicit types for your pointers, however, the compiler can give you warnings about mismatched pointer types and inappropriate dereferences. If you don't, it can't. Use the specific pointer type whenever you can.

The corollary to this rule is to use explicit type casting when you have to make a type conversion. For example, in this fragment, it's clear that a variable of type NODE_PTR is being allocated:

C Example of Explicit Type Casting

NodePtr = (NODE_PTR) calloc( 1, sizeof( NODE ) );

Avoid type casting Avoiding type casting doesn't have anything to do with going to acting school or getting out of always playing "the heavy." It has to do with avoiding squeezing a variable of one type into the space for a variable of another type. Type casting turns off your compiler's ability to check for type mismatches and therefore creates a hole in your defensive-programming armor. A program that requires many type casts probably has some architectural gaps that need to be revisited. Redesign if that's possible; otherwise, try to avoid type casts as much as you can.

Follow the asterisk rule for parameter passing You can pass an argument back from a routine in C only if you have an asterisk (*) in front of the argument in the assignment statement. Many C programmers have difficulty determining when C allows a value to be passed back to a calling routine. It's easy to remember that, as long as you have an asterisk in front of the parameter when you assign it a value, the value is passed back to the calling routine. Regardless of how many asterisks you stack up in the declaration, you must have at least one in the assignment statement if you want to pass back a value. For example, in the following fragment, the value assigned to parameter isn't passed back to the calling routine because the assignment statement doesn't use an asterisk:

C Example of Parameter Passing That Won't Work

void TryToPassBackAValue( int *parameter ) {
   parameter = SOME_VALUE;
}

Here, the value assigned to parameter is passed back because parameter has an asterisk in front of it:

C Example of Parameter Passing That Will Work

void TryToPassBackAValue( int *parameter ) {
   *parameter = SOME_VALUE;
}
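The difference is easiest to see from the caller's side. The sketch below is written so that it also compiles as C++. Because assigning an integer directly to a pointer is something current compilers diagnose, the version without the asterisk repoints the pointer at a different variable instead. The routine names and the value of SOME_VALUE are assumptions for the illustration.

```cpp
#include <cassert>

const int SOME_VALUE = 42;   // hypothetical value for the illustration

// Changes only the routine's local copy of the pointer; the caller sees nothing.
void AssignWithoutAsterisk(int *parameter) {
    static int localValue = SOME_VALUE;
    parameter = &localValue;
    (void)parameter;   // silence "set but not used" warnings
}

// Writes through the pointer, so the caller's variable changes.
void AssignWithAsterisk(int *parameter) {
    *parameter = SOME_VALUE;
}
```

After a call to AssignWithoutAsterisk( &result ), the caller's result is unchanged. After a call to AssignWithAsterisk( &result ), it holds SOME_VALUE.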

Use sizeof() to determine the size of a variable in a memory allocation It's easier to use sizeof() than to look up the size in a manual, and sizeof() works for structures you create yourself, which aren't in the manual. Because it's calculated at compile time, sizeof() doesn't carry a performance penalty. It's portable: recompiling in a different environment automatically changes the value calculated by sizeof(). And it requires little maintenance since you can change types you have defined and allocations will be adjusted automatically.
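A brief sketch of the maintenance benefit follows. The NODE layout and the AllocateNodes() routine are hypothetical. The allocation is written with a C++ cast so that the example compiles as C++, but the calloc call itself is the same in C.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical node type; adding a field later needs no change to the allocation.
struct NODE {
    int value;
    NODE *next;
};
typedef NODE *NODE_PTR;

// sizeof( NODE ) is evaluated at compile time, so the request always matches
// the current definition of the type.
NODE_PTR AllocateNodes(std::size_t count) {
    return static_cast<NODE_PTR>(std::calloc(count, sizeof(NODE)));
}
```

If NODE later gains a field, AllocateNodes() requests the larger size after a recompile, with no hand-maintained byte count to update.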
