Chapter 10: Memory Leaks and Their Debugging | Memory as a Programming Concept in C and C++

Overview

Classification of the causes of memory leaks. Tracing memory leaks in C programs using location reporting and allocation/deallocation information-gathering versions of the C allocators and deallocators. Tracing memory leaks in C++ programs: overloading the operators new and delete and the problems it causes. Techniques for location tracing. Counting objects in C++. Smart pointers as a remedy for memory leaks caused by the undetermined ownership problem.

As mentioned previously, I do not like the terms "memory leaks" or "leaking memory". They somehow put the onus on memory, as if it were the memory's inadequacy that caused the problem. Every time I hear a project manager or a student explain in a grave tone of voice that "the project is delayed because we have memory leaking", I feel like retorting "OK, find a better memory that doesn't leak". In truth, it's not the memory but rather the program that is inadequate. We should be talking about leaking programs, not about leaking memory. In this chapter we will classify the most common problems leading to memory leaks and discuss how to identify and locate them. We will start with trivial and obvious problems and proceed to more subtle ones that are harder to deal with.

The first class of memory leaks is called orphaned allocation and is characterized by allocation of a memory segment whose address is not preserved for later deallocation. This is seldom intentional and usually results from insufficient knowledge of the C or C++ language. We present a few classic examples with strings (though any kind of object could be involved). First,

 char* p;
 ...
 p = malloc(strlen(s)+1);
 p = s;

The memory for a copy of the string s is allocated, yet instead of the string being copied into it, the pointer p is merely set to point to the same place as the pointer s. The address of the new segment is lost, so even if free(p) is subsequently invoked, it does not free the segment previously allocated. The program may seem to work (at least p seems to be the "correct" string), but trying to free p and then s may induce a crash: both pointers refer to the same location, so the second call attempts to free a segment that is no longer allocated, and some systems (Linux with the GNU C library, for instance) terminate a program that does so. A similar situation holds for

 char* p;
 ...
 p = malloc(4);
 p = "abc";
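For contrast, here is a minimal sketch of what both fragments presumably intended: allocate the segment and copy the characters into it, instead of reassigning the pointer. The helper name copy_string is hypothetical. Because the sketch uses C++ headers, the result of malloc() needs a cast.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: copy a C string into a freshly allocated segment.
// The caller becomes the owner of the result and must release it with free().
char* copy_string(const char* s) {
    char* p = static_cast<char*>(std::malloc(std::strlen(s) + 1));
    if (p == nullptr) {
        return nullptr;          // allocation failed; nothing to free
    }
    std::strcpy(p, s);           // copy the characters; p keeps its own address
    return p;
}
```

A caller writes p = copy_string(s); and later free(p);. This releases exactly the segment that was allocated and leaves s untouched.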

The next class of memory leaks is referred to as the hidden allocation problem. It is characterized by a call to another function to deliver an object without realizing that the object has been created on the heap. As an example, consider the standard function strdup() (specified by POSIX and, since C23, by ISO C as well). It duplicates a string and returns a pointer to it. The duplicated string is created on the heap; if it is not deallocated later, the memory leaks. However, shouldn't all C/C++ programmers know what a standard function such as strdup() does? Of course, but do they all know everything that standard functions do? And how about their knowledge of the particular operating system being used? Very often we need the services of external procedures, in particular system calls. For instance, consider the UNIX system call

 struct group *getgrnam(const char *name)

Is the structure group allocated on the heap or somewhere else (say, in the kernel of the operating system)? Should it be deallocated by the user when no longer needed? Or consider the following Windows example:

 HBRUSH br;
 ...
 br = CreateSolidBrush(RGB(255,255,255));

Here br is a handle to a brush. We created a solid brush of a certain color and captured the reference to it in br. Is the brush an object on the heap? Do we need to deallocate it? A diligent and thorough programmer would find answers to these questions, but often the questions are simply not asked. It's almost as if the programmer is behaving according to the principle "I am not responsible for what I have not done explicitly". The consequences of implicit actions are too often ignored.
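The answers come from each function's documentation. The following is a minimal sketch, assuming a POSIX system, for the first two cases. strdup() returns a heap copy that the caller owns and must free(). POSIX, by contrast, states that getgrnam() may return a pointer to a static area that a later call overwrites, so the caller must not free it and should copy out whatever it needs to keep. (For the Windows brush, Microsoft's documentation says to call DeleteObject() once the brush is no longer needed.) The helper names length_of_copy and group_name_copy are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>
#include <grp.h>      // POSIX: getgrnam(), struct group
#include <string.h>   // POSIX: strdup()

// strdup() hides a heap allocation: the copy belongs to the caller.
std::size_t length_of_copy(const char* s) {
    char* copy = strdup(s);
    if (copy == nullptr) {
        return 0;                    // allocation failed
    }
    std::size_t n = std::strlen(copy);
    std::free(copy);                 // omit this and the copy leaks
    return n;
}

// getgrnam() may return a pointer to a static area overwritten by later
// calls: do not free it; copy out the fields you need instead.
std::string group_name_copy(const char* name) {
    const struct group* g = getgrnam(name);
    if (g == nullptr) {
        return std::string();        // no such group, or an error occurred
    }
    return std::string(g->gr_name);  // std::string owns (and frees) its copy
}
```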

The third class of memory leaks is a close relative of the previous one of hidden allocation; we call it the undetermined ownership problem. When discussing resources and their allocation, it is customary to refer to the entity responsible for "releasing" an unneeded resource as the "owner" of the resource. Thus the owner of a memory segment is (as far as we are concerned) the module that is responsible for deallocating it. The undetermined ownership problem is characterized by a module obtaining a memory segment without realizing that it is now the owner and therefore responsible for its deallocation. A typical scenario runs along the following lines. A module A requests a module B to provide an object. The object is created dynamically and handed over to A, so it is no longer owned by B. But the programmer of module A is not aware of the ownership issue - thinking perhaps that the object will be deallocated by some other module somewhere else, since the programmer is just "using" the object here. This confusion is usually compounded by a more complex path of the object; for example, an object might have been created by a module D, which passed it to C, which passed it to B, which passed it to A. Hence the programmer responsible for the design and/or coding of A may not even know the object's origin, let alone its intended future or who should deallocate it. The farther apart the place where a dynamic object is created and the place where it is used, the more likely it is that the object will be forgotten and never deallocated. Because objects are sometimes - for the sake of efficiency - passed in static "containers" (e.g., a function returns a pointer to a string that was not created dynamically but was stored in a static buffer), the problem cannot be avoided by following some simple rule like "always deallocate the object you received if you do not pass it along".
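The last point can be made concrete. In this sketch, two hypothetical functions have identical signatures. One returns a pointer into a static buffer, which must not be freed. The other returns a heap segment, which the caller owns and must free. Only documentation tells the caller which rule applies.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical: returns a pointer into a static buffer. The caller must NOT
// free it, and the next call overwrites the contents.
char* format_id_static(int id) {
    static char buf[32];
    std::snprintf(buf, sizeof buf, "ID-%d", id);
    return buf;
}

// Hypothetical: returns a freshly allocated segment. The caller becomes the
// owner and MUST free it, or the segment leaks.
char* format_id_heap(int id) {
    char* buf = static_cast<char*>(std::malloc(32));
    if (buf != nullptr) {
        std::snprintf(buf, 32, "ID-%d", id);
    }
    return buf;
}
```

Each simple rule fails here. "Always free what you receive" would pass the static buffer to free(), and "never free" would leak every result of format_id_heap().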

The fourth class of memory leaks, which is specific to C++, consists of those caused by an incorrect or missing destructor (as discussed in Chapter 8). We call this the insufficient destructor problem. It arises when an object has some dynamic parts that are created by constructor(s) or other means yet are not explicitly deallocated in the destructor. If an explicit destructor is omitted, the implicitly generated default destructor destroys only the pointer members themselves, not the segments they point to, and so it is insufficient for the task.
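Here is a minimal sketch of the remedy for a class whose only dynamic part is a character array. Label and the counter live_labels are hypothetical; the counter exists only to make a leak observable. The explicit destructor releases the array that the implicit one would leave behind.

```cpp
#include <cassert>
#include <cstring>

int live_labels = 0;   // hypothetical counter, used only to observe leaks

class Label {
public:
    explicit Label(const char* s) : text_(new char[std::strlen(s) + 1]) {
        std::strcpy(text_, s);
        ++live_labels;
    }
    // The implicit destructor would destroy only the pointer text_, not the
    // array it points to; this explicit destructor releases the array.
    ~Label() {
        delete[] text_;
        --live_labels;
    }
    // Copying is disabled to keep the sketch short (see the next problem).
    Label(const Label&) = delete;
    Label& operator=(const Label&) = delete;

    const char* c_str() const { return text_; }

private:
    char* text_;
};
```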

Similar to the insufficient destructor problem is the fifth class of memory leaks (again, particular to C++ and detailed in Chapter 8), known as the incorrect assignment problem. When new values are assigned to the members of an object, any member that points to a dynamically allocated segment must have that segment explicitly deallocated before the member receives a new value. Most often this situation occurs when an explicit assignment operator is missing and the default memberwise assignment is used instead: it simply overwrites the pointer, orphaning the old segment.
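The following sketches a correct assignment operator for a similar hypothetical class Name, with live_names again an illustrative counter. It allocates the new segment before releasing the old one. If new[] throws, the object therefore keeps its previous value. This ordering is one common choice; the copy-and-swap idiom is another.

```cpp
#include <cassert>
#include <cstring>

int live_names = 0;   // hypothetical counter of live character segments

class Name {
public:
    explicit Name(const char* s) : text_(duplicate(s)) {}
    Name(const Name& other) : text_(duplicate(other.text_)) {}
    ~Name() { release(text_); }

    // The default memberwise assignment would copy only the pointer: the old
    // segment would be orphaned and two objects would share one segment.
    Name& operator=(const Name& other) {
        if (this != &other) {
            char* fresh = duplicate(other.text_);  // may throw; *this unchanged
            release(text_);                        // then release the old segment
            text_ = fresh;
        }
        return *this;
    }

    const char* c_str() const { return text_; }

private:
    static char* duplicate(const char* s) {
        char* p = new char[std::strlen(s) + 1];
        std::strcpy(p, s);
        ++live_names;
        return p;
    }
    static void release(char* p) {
        delete[] p;
        --live_names;
    }

    char* text_;
};
```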

The sixth class of memory leaks is also specific to C++; we call it the exception-unsafe code problem. It can be described as the following scenario:

 void doit()
 {
    TYPE *t = new TYPE;
    ...
    ...     // the code here can throw an exception
    ...     // that is caught outside of doit()
    ...
    delete t;
 }

If an exception is thrown by the code in the middle of the function doit() then the fully formed dynamic object that t points to will not be deallocated, because dynamic objects can be explicitly deallocated only by delete or delete[] (though delete or delete[] themselves may be called implicitly by a destructor).

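Exceptions thrown in a constructor are not a problem as far as the object's own parts are concerned. All members and base-class subobjects that have been fully constructed up to the throw point are automatically destroyed while the system stack is unwinding. If the object was being created by new, its memory is released as well. In this case a memory leak could occur only if some raw memory had been obtained in the constructor body and held through a plain pointer, for instance by using the C allocators (or new). Automatic objects are always destroyed when they go out of scope, no matter what the reason. Thus, they will also be destroyed during the stack unwinding.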
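This behavior can be observed with two hypothetical classes. When Whole's constructor throws, its fully constructed members a and b are destroyed, and Whole's own destructor is never run. The runtime also releases the memory obtained by new Whole. The counters are illustrative only.

```cpp
#include <cassert>
#include <stdexcept>

int parts_destroyed = 0;    // hypothetical counter of Part destructor calls
int wholes_destroyed = 0;   // hypothetical counter of Whole destructor calls

struct Part {
    ~Part() { ++parts_destroyed; }
};

struct Whole {
    Part a;   // constructed before the constructor body runs
    Part b;
    Whole() {
        throw std::runtime_error("Whole() failed");
    }
    // Never called for a Whole whose constructor threw, because that object
    // was never fully constructed.
    ~Whole() { ++wholes_destroyed; }
};
```

The usage below catches the exception and checks which destructors ran:

```cpp
try {
    Whole* w = new Whole;   // the memory for *w is released automatically
    delete w;               // not reached
} catch (const std::runtime_error&) {
}
// now parts_destroyed == 2 and wholes_destroyed == 0
```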

The real problem is with exceptions in destructors: if an exception is thrown in a destructor, the deallocation that was to take place after the throw point does not happen. However, exceptions in destructors can be prevented by adhering to a simple principle: do not use any code that can throw an exception in a destructor. No matter how careful we are in the design of constructors and the destructor, the function doit() in the previous code will still leak memory if an exception is thrown.
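The following is a minimal sketch that makes the leak in doit() observable. It uses a counting stand-in for TYPE and a hypothetical middle_part() in place of the elided code. For contrast, it also shows doit_owned(), which uses the standard C++11 std::unique_ptr so that the object is deleted during stack unwinding. This is one common remedy, not the only possible one.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

int live_objects = 0;   // hypothetical counter of live TYPE objects

struct TYPE {
    TYPE() { ++live_objects; }
    ~TYPE() { --live_objects; }
};

// Stands in for "the code here can throw an exception".
void middle_part(bool fail) {
    if (fail) throw std::runtime_error("thrown in the middle of doit()");
}

// The pattern from the text: a throw skips the delete, so *t leaks.
void doit_raw(bool fail) {
    TYPE* t = new TYPE;
    middle_part(fail);
    delete t;
}

// The owning smart pointer deletes *t both on normal return and during
// stack unwinding, so no explicit delete (and no try/catch) is needed.
void doit_owned(bool fail) {
    std::unique_ptr<TYPE> t(new TYPE);
    middle_part(fail);
}
```

Calling doit_raw(true) inside a try block leaves live_objects at 1 after the exception is caught, whereas doit_owned(true) leaves the count unchanged.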

Exception-safe code requires a design that is centered on the safety issue from the outset. Exception safety is an important programming issue in C++, and many excellent texts are available that deal with the whole range of its aspects. Let us simply mention that, in its abstract form, "exception safety" refers to making sure that no resources have leaked and that certain invariants are preserved when an exception is thrown. In broader, colloquial terms: after an exception has occurred, a thorough housecleaning is performed that brings the system back to the state before the "offending" module was activated. Later in the chapter we will discuss some possible solutions, but for now let us illustrate how insidious the problem can be. If you think that the offending code can easily be modified along lines like this:

 void doit()
 {
    TYPE *t = new TYPE;
    try {
       ...   // the code here can throw an exception
       ...   // that used to be caught outside of doit()
    }
    catch(...) {
       ...
       delete t;
       throw;   // re-throw the same exception so it can be caught
                // at the original site outside of doit()
    }
    ...
    delete t;
 }

then beware of the same problem in a much more innocuous setting. The order of evaluation of subexpressions in an expression (or of arguments to a function) is not specified by the language, in order to allow some flexibility in the design of compilers. Thus an exception thrown by a TYPE2 constructor in the call

 void doit(TYPE1*,TYPE2*);    // prototype
 ...
 doit(new TYPE1,new TYPE2);   // call

cannot be rectified, because we do not know whether the object of TYPE1 has been constructed yet; and even if it has, no pointer to it is available that could be used to delete it. This is just poorly designed code. Another example:

 class COMP_NUMB {   //complex numbers
    ...
    friend COMP_NUMB& operator+(COMP_NUMB&,COMP_NUMB&);
    ...
    ...
 };//end class COMP_NUMB
 ...
 p=p+ (*(new COMP_NUMB(2,3)) + *(new COMP_NUMB(3,5)));

If new COMP_NUMB(3,5) throws an exception, we do not know whether COMP_NUMB(2,3) has been created yet.

In neither of the previous two examples can we rectify the problem by placing an extra try{} and catch(){} block around the code to provide the opportunity for housecleaning. Instead, such code should be designed differently in the first place.
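Here is one possible redesign of both examples, sketched under the assumption that C++14 (for std::make_unique) is available. For doit(), each allocation goes in its own statement and is handed immediately to an owning std::unique_ptr, and the callee accepts owning pointers. For the complex numbers, the simplest redesign is to avoid the heap altogether: pass values, with operator+ taking const references and returning by value. TYPE1, TYPE2 and the counter live are stand-ins, and call_doit is a hypothetical name.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

int live = 0;   // hypothetical counter of live TYPE1/TYPE2 objects

struct TYPE1 {
    TYPE1() { ++live; }
    ~TYPE1() { --live; }
};

struct TYPE2 {
    explicit TYPE2(bool fail) {
        if (fail) throw std::runtime_error("TYPE2() failed");
        ++live;
    }
    ~TYPE2() { --live; }
};

// The callee takes ownership explicitly; both objects die when it returns.
void doit(std::unique_ptr<TYPE1> a, std::unique_ptr<TYPE2> b) {
    (void)a;
    (void)b;   // ... work with *a and *b here ...
}

void call_doit(bool fail) {
    // One allocation per statement, each owned immediately: if TYPE2's
    // constructor throws, a already owns the TYPE1 object and deletes it.
    auto a = std::make_unique<TYPE1>();
    auto b = std::make_unique<TYPE2>(fail);
    doit(std::move(a), std::move(b));
}

// Complex numbers held by value: no heap objects, so nothing can leak.
struct COMP_NUMB {
    double re, im;
    COMP_NUMB(double r, double i) : re(r), im(i) {}
};

COMP_NUMB operator+(const COMP_NUMB& x, const COMP_NUMB& y) {
    return COMP_NUMB(x.re + y.re, x.im + y.im);
}
```

With these definitions the original expression becomes p = p + (COMP_NUMB(2,3) + COMP_NUMB(3,5)); with p declared as COMP_NUMB p(1,1). It creates only temporaries that are destroyed automatically, even if an exception interrupts the evaluation.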

Finally, the last class of memory leaks is labeled the external component problem. It is quite conceivable that a memory leak occurs in an external component we are using (e.g., some operating system function or class, or some commercial software for distributed computing, or some database system, etc. - just search the Internet to see how common this actually is). With the complexity of software rapidly increasing, more and more software systems use components that come from external sources and not from where the software system is being developed.

Let us summarize our classification of memory leaks by stating to what language each type applies, identifying the usual cause, and assessing how difficult it is to rectify the problem.

Orphaned allocation (C and C++) - caused by poorly developed programming skills; can easily be rectified when detected and located.
Hidden allocation (C and C++) - caused by insufficient knowledge of the modules being used; can be rectified when detected and located, though changes in the code could be significant.
Undetermined ownership (C and C++) - caused by poor design and integration; can be rectified when detected and located, though changes in the code could be significant.
Insufficient destructor (C++) - caused by poor design and/or poor programming skills; can be relatively easily rectified when detected and located (it's the detection that is difficult).
Incorrect assignment (C++) - caused by poor design and/or poor programming skills; can be relatively easily rectified when detected and located (it's the detection that is difficult).
Exception-unsafe code (C++) - caused by poor design and/or poor programming skills; rectification requires a significant redesign of the code (this problem is difficult to detect and localize).
External component (C and C++) - caused by tough luck; not too much you can do about it except contact the external source or work around the problem.

In the following we will turn our attention to tracing memory leaks resulting from C allocators and from C++ operators. By my own reckoning, the undetermined ownership problem is the most prevalent cause of memory leaks in C-based software, whereas in C++ software the undetermined ownership, insufficient destructor, and incorrect assignment problems (in that order) are the most prevalent causes. I believe that there is a lot of exception-unsafe code out there, but the memory leaks associated with such software manifest themselves rarely, so finding them is crucial only for fail-safe and mission-critical applications. The only remedy for memory leaks associated with exception-unsafe code is to redesign the software, which of course is beyond the scope of this book. Similarly, memory leaks associated with the insufficient destructor and incorrect assignment problems are evidence of poor programming and must be rectified by providing or fixing the destructors and/or assignment operators. This is also outside the scope of this book, yet we hope that the material in Chapter 8 provides sufficient understanding of these issues to help the reader avoid the leaks. At the end of this chapter we shall deal with the undetermined ownership problem by discussing a possible C++ remedy in the form of so-called safe or smart pointers.