Tracing Memory Leaks in C

Allocation of memory using the C allocators malloc(), calloc(), and realloc() was covered in Chapter 4. The C approach to allocation presents a slight problem because the allocation is not type-safe; for instance, p = malloc(sizeof(NODE)) will yield a totally different result than p = malloc(sizeof(NODE*)), an easy error to make. On the other hand, all C allocators are standard functions and hence are not tied to the compiler, by which we mean that, unlike in C++, the compiler need not be aware that any allocation or deallocation is taking place. It is thus extremely simple to use replacement versions of these functions that can provide much more than just the allocation/deallocation. This is an invaluable tool for tracing memory leaks in C programs. For simplicity we will discuss only malloc() and free(), though the same remarks apply to calloc() and realloc(). The basic idea is to have malloc() record what was allocated, who (i.e., which piece of the code) requested the allocation, and when it was allocated; likewise, we use free() to keep track of what has been deallocated, by whom, and when. We want to do this with the least effort and without significantly modifying the program.

The simplest approach utilizes the ANSI C preprocessor. If we merely add to our program the preprocessing directive

 #define malloc(size) debug_malloc(__FILE__,__LINE__,size) 

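then (for example) an occurrence of p = malloc(sizeof(NODE)); in the source file source.c on line 321 will be expanded prior to compilation to p = debug_malloc("source.c",321,sizeof(NODE)); (the preprocessor substitutes the file name and line number but leaves the sizeof expression for the compiler to evaluate). We link with the program our debugging version of malloc() using the prototype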

 void* debug_malloc(const char* src,int line,size_t size) 

Besides allocating the requested amount of memory (through a call to ordinary malloc()), debug_malloc() can record the information in a log file and/or keep it in a private data structure that is accessible only by debug_malloc() and debug_free(). Similarly, we can add the preprocessing directive

 #define free(ptr) debug_free(__FILE__,__LINE__,ptr) 

and, for instance, an occurrence of free(p); in the source file source.c on line 457 will be expanded to debug_free("source.c",457,p);. Our debugging version of free() with prototype

 void debug_free(const char* src,int line,void* ptr) 

must also be linked to the program; this version can log the information (in the same log file as debug_malloc() or in a separate log) and/or remove the information from the private data structure. Of course, debug_free() deallocates the requested memory using the ordinary free().
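The two debugging functions and the two directives can be sketched in a single file as follows. This is only a minimal illustration under our own assumptions: the fixed-size table, the names MAX_TRACKED, struct alloc_rec, and debug_live_count(), and the choice of stderr as the log are hypothetical and not prescribed by the text. Note that the macros are defined after the wrappers, so that the calls inside debug_malloc() and debug_free() still reach the ordinary malloc() and free().

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical private bookkeeping: a fixed-size table of live blocks. */
#define MAX_TRACKED 1024

struct alloc_rec {
    void       *ptr;   /* what was allocated */
    size_t      size;  /* how much */
    const char *src;   /* which source file asked */
    int         line;  /* which line asked */
};

static struct alloc_rec table[MAX_TRACKED];
static size_t live_count = 0;

void *debug_malloc(const char *src, int line, size_t size)
{
    void *p = malloc(size);  /* ordinary malloc(): macro not defined yet */
    fprintf(stderr, "%s:%d malloc(%lu) = %p\n",
            src, line, (unsigned long)size, p);
    if (p != NULL && live_count < MAX_TRACKED) {
        table[live_count].ptr  = p;
        table[live_count].size = size;
        table[live_count].src  = src;
        table[live_count].line = line;
        live_count++;
    }
    return p;
}

void debug_free(const char *src, int line, void *ptr)
{
    size_t i;
    fprintf(stderr, "%s:%d free(%p)\n", src, line, ptr);
    for (i = 0; i < live_count; i++) {
        if (table[i].ptr == ptr) {
            /* Forget the block: move the last record into its slot. */
            table[i] = table[live_count - 1];
            live_count--;
            break;
        }
    }
    free(ptr);               /* ordinary free(): macro not defined yet */
}

/* Number of blocks currently recorded as allocated but not yet freed. */
size_t debug_live_count(void)
{
    return live_count;
}

/* From here on, every malloc()/free() in the source is redirected. */
#define malloc(size) debug_malloc(__FILE__, __LINE__, size)
#define free(ptr)    debug_free(__FILE__, __LINE__, ptr)
```

In a real program the wrappers would more likely live in their own source file and the two directives in a header included by every other file; the order of the log entries then answers the "when" question.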

We can also have statistics on what remains undeallocated printed out or logged at program termination. The least intrusive method is to add (to the beginning of main() of our program) an atexit registration of a function that prints or logs the statistics; the ANSI C function atexit() provides the registration. A function registered with atexit() will be automatically executed by the ANSI C function exit() used to terminate programs, unless the program uses some other means of termination (e.g., _exit() under UNIX) that bypasses atexit.
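A registration of this kind might look like the sketch below. The counter outstanding_blocks and the functions note_alloc(), note_free(), report_leaks(), and install_leak_report() are hypothetical names chosen for illustration; we assume the debugging allocator and deallocator call note_alloc() and note_free() respectively.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical counter kept up to date by the debugging allocators. */
long outstanding_blocks = 0;

void note_alloc(void) { outstanding_blocks++; }  /* on each allocation   */
void note_free(void)  { outstanding_blocks--; }  /* on each deallocation */

/* Runs automatically when the program terminates through exit(). */
static void report_leaks(void)
{
    fprintf(stderr, "blocks still allocated at exit: %ld\n",
            outstanding_blocks);
}

/* To be called once at the beginning of main().
   Returns 0 on success, exactly as atexit() does. */
int install_leak_report(void)
{
    return atexit(report_leaks);
}
```

Since returning from main() is equivalent to calling exit(), the report is also produced when the program simply falls off the end of main().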

By examining the log or the exit statistics after program execution, we can determine whether all dynamically allocated memory has been deallocated, that is, whether memory is leaking. Information concerning what has not been deallocated and which part of the code requested the offending allocation can help determine if there really is a memory leak.

If debug_malloc() and/or debug_free() log the information in a file, it is prudent to execute fflush(log) after each entry to the log so that the log will be current if a crash occurs. (If our program spawns multiple processes or executes in a multithreaded fashion then the logging becomes more complicated, because some form of file locking must be provided in order to prevent two processes or two threads from writing into the log simultaneously; we will touch upon this topic in Chapter 11.) As Figure 10.1 illustrates, our approach to tracing memory leaks requires just small changes to the program.
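The flushing discipline can be illustrated with a small logging helper. The names set_alloc_log() and log_event() and the layout of the log line are our own assumptions; the sketch is single-threaded and provides no locking.

```c
#include <stdio.h>

/* Hypothetical log stream shared by the debugging allocators. */
static FILE *log_fp = NULL;

/* Select the stream that receives the log entries. */
void set_alloc_log(FILE *fp)
{
    log_fp = fp;
}

/* Write one entry and flush at once, so that the entry reaches the
   file even if the program crashes immediately afterwards. */
void log_event(const char *what, const char *src, int line, void *ptr)
{
    if (log_fp == NULL)
        return;
    fprintf(log_fp, "%s %s:%d %p\n", what, src, line, ptr);
    fflush(log_fp);
}
```

A debugging allocator would call, for example, log_event("malloc", src, line, p) right after the allocation.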

Figure 10.1: Modifying a C program in order to trace memory leaks

The localization provided by __FILE__ and __LINE__ may not be sufficient. Consider a service function doit() that is called in a large number of lines of the code of the program. We may determine that the leaks occur in doit() based on the data passed to doit() by its caller. But which caller is it? There is a solution, but it must be built into the coding practice; it cannot be just "magically added on" when we need to trace memory leaks as we did with debug_malloc() and debug_free(). It is always prudent in serious software development (especially if network-based) to require programmers to denote entry into modules in some standard way and so enable extensive logging in test versions of the system. For instance, we may use something like

 void doit(....)
 {
   TRACE(doit)
   ...
   ...
   ...
   RETURN
 }

where TRACE is defined through a macro to set some global variable with the unique designation of the function doit() upon its activation. This can provide run-time localization of the log entry. Similarly, RETURN is defined as a macro as needed. Logs with such entries are easier to read and examine than mere references to lines in the source files. If desired for more detailed debugging (as a compilation option), the macro TRACE is defined to actually stack the function references during execution and the macro RETURN is defined to pop the stacked function references, thus providing the means to trace a whole thread of execution for debugging. For a production build, RETURN is simply defined as return (a more detailed treatment of this topic may be found in the following section on tracing memory leaks in C++ programs).
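One possible shape of the stacking variant of these macros, for functions returning void, is sketched below. The names trace_stack, trace_depth, TRACE_MAX, current_function(), log_site(), last_logged, and the NO_TRACE compilation switch are hypothetical; the text fixes only the names TRACE and RETURN.

```c
#include <stdio.h>

#define TRACE_MAX 64                      /* hypothetical depth limit */

const char *trace_stack[TRACE_MAX];       /* stacked function names   */
int trace_depth = 0;                      /* current nesting depth    */

/* Name of the function currently executing, as recorded by TRACE. */
const char *current_function(void)
{
    if (trace_depth == 0)
        return "(none)";
    if (trace_depth > TRACE_MAX)
        return "(too deep)";
    return trace_stack[trace_depth - 1];
}

#ifdef NO_TRACE                           /* production build */
#define TRACE(name)
#define RETURN return;
#else                                     /* test build: push and pop */
#define TRACE(name) { if (trace_depth < TRACE_MAX) \
                          trace_stack[trace_depth] = #name; \
                      trace_depth++; }
#define RETURN      { trace_depth--; return; }
#endif

/* Stand-in for a log entry made by a debugging allocator. */
const char *last_logged = "(none)";

void log_site(void)
{
    last_logged = current_function();
    fprintf(stderr, "log entry made while in %s()\n", last_logged);
}

void doit(void)
{
    TRACE(doit)
    log_site();
    RETURN
}
```

Every exit path of a traced function must go through RETURN, otherwise the stack of names gets out of step with the actual calls; functions that return a value would need a second macro taking the value as an argument.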

There are many reasons for not using macros. The most important reason is that they may change the "common sense" semantics of the code; for example, #define add(a,b) subtract(a,b) (purposefully an extreme example) will confuse everyone reading the program, since they would naturally think that add(a,b) in the code actually adds the values. However, enabling detection of leaks is one of the few cases for which I recommend using macros to alter the code. Using macros to define TRACE() and RETURN for tracing (as illustrated previously) is similarly justified. It does not confuse anybody because it is not used in any context other than designating the entry or exit of a function. Macros provide a speedy and efficient way to handle certain tasks, and they can be of great help if used sparingly and cautiously.

The approach described here works well for C programs developed in-house, but is not very helpful if the system includes external C-based object code where the leak may have occurred. All calls to malloc() and free() in the object code linked with our program will be linked with the system malloc() and free(), not our debugging versions. In this situation we must use some replacement versions of malloc() and free(), say rmalloc() and rfree(). We also need intimate knowledge of the compiler being used.

First we need a new function malloc() that does essentially the same job as debug_malloc() but gets its location information from global variables (rather than arguments) and uses rmalloc() for the actual allocation. Second, we now need a new debug_malloc() function: it has the same prototype as the previous version but now merely sets the global variables for malloc() and calls malloc(). Similarly, our new deallocator free() does essentially the same job as debug_free() did previously, getting its location information from the global variables and using rfree() for the actual deallocation. The new debug_free() sets the global variables and calls free(). Prior to each call to a function from the offending object code, we add instructions setting the localization global variables accordingly. We link with our program the object code of our malloc() and free() (avoiding the standard malloc() and the standard free(); this is where we need a good knowledge of the C compiler we are using), debug_malloc(), debug_free(), .... Thus we have a program in which every call to malloc() in the part programmed by us is a call to debug_malloc(), which sets the location global variables and calls our version of malloc(). Every call to malloc() within the external object code is now a call to our version of malloc(), and likewise for free(). As before, we can log or store the information about allocation and deallocation and use it to determine if (and where) memory leaks have occurred. See Figure 10.2.
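The division of labor among these functions can be sketched as follows. To keep the sketch portable we make several assumptions that depart from a real setup: the functions that would be named malloc() and free() are called replacement_malloc() and replacement_free() here (actually substituting them for the standard ones depends on the compiler and linker), rmalloc() and rfree() are mere stand-ins that forward to the standard allocator, and the globals loc_src, loc_line, last_src, last_line, and the macro SET_LOCATION() are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>

/* Location globals: set by debug_malloc()/debug_free(), or by hand
   just before a call into the external object code. */
const char *loc_src  = "(unknown)";
int         loc_line = 0;

/* Last location seen by the replacement allocator (kept only so the
   sketch can be checked). */
const char *last_src  = "";
int         last_line = 0;

/* Stand-ins for rmalloc()/rfree(); a real setup would use an
   independent allocator here. */
static void *rmalloc(size_t size) { return malloc(size); }
static void  rfree(void *ptr)     { free(ptr); }

/* Would be named malloc() and linked instead of the standard one. */
void *replacement_malloc(size_t size)
{
    void *p = rmalloc(size);
    last_src  = loc_src;
    last_line = loc_line;
    fprintf(stderr, "%s:%d malloc(%lu) = %p\n",
            loc_src, loc_line, (unsigned long)size, p);
    return p;
}

/* Would be named free() and linked instead of the standard one. */
void replacement_free(void *ptr)
{
    last_src  = loc_src;
    last_line = loc_line;
    fprintf(stderr, "%s:%d free(%p)\n", loc_src, loc_line, ptr);
    rfree(ptr);
}

/* The new debug versions only set the globals and delegate. */
void *debug_malloc(const char *src, int line, size_t size)
{
    loc_src  = src;
    loc_line = line;
    return replacement_malloc(size);
}

void debug_free(const char *src, int line, void *ptr)
{
    loc_src  = src;
    loc_line = line;
    replacement_free(ptr);
}

/* Placed just before each call into the external object code. */
#define SET_LOCATION() (loc_src = __FILE__, loc_line = __LINE__)
```

An allocation made inside the external code is then logged with the location of the call that entered that code, which is the best localization available without its source.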

Figure 10.2: Modifying a C program that has an external object code in order to trace memory leaks

There are many public-domain or commercial versions of malloc() and free() available. Many of those have debugging features along the lines described here. The decision of whether to obtain them or to use your own debugging versions of malloc() and free() depends on many factors outside the scope of this book. In any case, understanding how the debugging feature works is essential to a proper use of the debugging versions of malloc() and free().



Memory as a Programming Concept in C and C++
ISBN: 0521520436
Year: 2003
Pages: 64
