A helpful way to put all of these tools in perspective is to consider what additional information they develop beyond that of a traditional procedural compiler. The Chomsky hierarchy categorizes some of the types of information that such a compiler can develop. We have summarized the types of errors that such information can uncover. In addition, procedural compilers develop symbol tables and control-flow and data-flow information on a per-procedure basis.
Whole-program compilers are relatively rare. The author led the development of the first multilanguage, interprocedurally optimizing compiler available as a commercially supported product over a decade ago [LMSS91]. Only a few such compilers have been developed since then. See the bibliography for additional information on compiler techniques for interprocedural analysis.
The additional compile time required for interprocedural analysis is usually justified for compute-intensive applications. A side benefit of such compilers, however, is that they develop symbol tables and control-flow and data-flow information on a whole-program basis. The users of the aforementioned compiler liked it as much for its additional error-checking capabilities as for its optimizing ability!
CodeSurfer does both intraprocedural and interprocedural pointer-target analysis. Static pointer-target analysis associates each pointer variable with the names of the variables whose addresses it may hold. CodeSurfer doesn’t do any type of constant propagation. Constant propagation uses control-flow and data-flow information to determine that a variable must have a particular constant value at a specific point in the program. PC-lint does both intraprocedural and interprocedural value tracking, which includes both constant propagation and pointer-target analysis. Pointer-target analysis makes it possible to catch a variety of misuses of dynamic memory statically.
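To make the two analyses concrete, here is a minimal sketch of each on a toy program representation, written in Python. The statement encodings, the function names, and the `NAC` marker are all assumptions made for illustration; nothing here reflects how CodeSurfer or PC-lint is actually implemented.

```python
# Toy illustrations of constant propagation and pointer-target analysis.
# The statement encodings and all names here are hypothetical.

NAC = "not-a-constant"  # marker: the variable has no single known value


def merge(env_a, env_b):
    """Join the facts from two control-flow paths at a merge point."""
    merged = {}
    for var in set(env_a) | set(env_b):
        a = env_a.get(var, NAC)
        b = env_b.get(var, NAC)
        merged[var] = a if a == b else NAC
    return merged


def propagate(block, env):
    """Constant propagation over straight-line assignments.

    Each statement is (target, operand) or (target, left, "+", right),
    where an operand is an int literal or a variable name.
    """
    env = dict(env)

    def value(operand):
        if isinstance(operand, int):
            return operand
        return env.get(operand, NAC)

    for stmt in block:
        if len(stmt) == 2:
            target, operand = stmt
            env[target] = value(operand)
        else:
            target, left, _op, right = stmt
            lhs, rhs = value(left), value(right)
            env[target] = NAC if NAC in (lhs, rhs) else lhs + rhs
    return env


def pointer_targets(statements):
    """Flow-insensitive pointer-target analysis.

    Statements are ("addr", p, x) for p = &x and ("copy", p, q) for p = q.
    Returns a map from each pointer to the set of variables it may point to.
    """
    targets = {}
    changed = True
    while changed:  # iterate until no target set grows
        changed = False
        for kind, dst, src in statements:
            new = {src} if kind == "addr" else targets.get(src, set())
            current = targets.setdefault(dst, set())
            if not new <= current:
                current |= new
                changed = True
    return targets


# if (c) { x = 4; y = 1; } else { x = 4; y = 2; }  z = x + 1;
then_env = propagate([("x", 4), ("y", 1)], {})
else_env = propagate([("x", 4), ("y", 2)], {})
after_if = propagate([("z", "x", "+", 1)], merge(then_env, else_env))

# p = &a; q = &b; p = q;
pts = pointer_targets(
    [("addr", "p", "a"), ("addr", "q", "b"), ("copy", "p", "q")]
)
```

In this sketch, `x` and `z` are known constants after the branch while `y` is not, and `p` may point to either `a` or `b`. A tool that knows a pointer's possible targets can then ask, for example, whether any of them is a block that has already been freed.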
Splint does error checking beyond that of procedural compilers by using annotations supplied by the user. These annotations, when provided, make it possible to perform strict type checking, to enforce information hiding of user-defined types, and to check the side effects of calling functions. PC-lint also employs supplemental information when it’s provided by the user for similar purposes.
Insure++, BoundsChecker, and Purify all track the status of memory blocks and verify the correctness of arguments to API calls. They differ in what they modify to do so: Insure++ uses modified source code, BoundsChecker uses modified intermediate code, and Purify uses modified object code.
Insure++ and BoundsChecker use calls embedded in the program to record information relating pointer variables and addresses of memory. Other embedded calls check that information to validate the status of the pointer variables and make sure that operations on the memory addresses are valid.
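The division of labor between recording calls and checking calls can be sketched as follows. This is a simplified model in Python, not the actual runtime of either tool; the class name `PointerTracker` and its `record_*` and `check_access` methods are hypothetical stand-ins for whatever calls the tools embed.

```python
# Hypothetical model of pointer-aware instrumentation: record_* calls are
# assumed to be embedded where pointers change, check_access where they are used.

class PointerTracker:
    def __init__(self):
        self.blocks = {}    # base address -> {"size": int, "live": bool}
        self.pointers = {}  # pointer variable name -> base address of its block

    def record_alloc(self, name, base, size):
        """Embedded after `name = malloc(size)`."""
        self.blocks[base] = {"size": size, "live": True}
        self.pointers[name] = base

    def record_copy(self, dst, src):
        """Embedded after `dst = src`; both names now refer to one block."""
        self.pointers[dst] = self.pointers[src]

    def record_free(self, name):
        """Embedded after `free(name)`."""
        self.blocks[self.pointers[name]]["live"] = False

    def check_access(self, name, offset):
        """Embedded before `name[offset]`; returns a diagnostic or None."""
        if name not in self.pointers:
            return f"{name}: uninitialized pointer"
        block = self.blocks[self.pointers[name]]
        if not block["live"]:
            return f"{name}: access to freed block"
        if not 0 <= offset < block["size"]:
            return f"{name}: offset {offset} outside block of size {block['size']}"
        return None


tracker = PointerTracker()
tracker.record_alloc("p", base=0x1000, size=8)
tracker.record_copy("q", "p")
in_bounds = tracker.check_access("p", 7)
overrun = tracker.check_access("p", 8)
tracker.record_free("p")
dangling = tracker.check_access("q", 0)
```

Because the model knows which block each pointer variable belongs to, it can report that `q` dangles after the block is freed through `p`, and it can name the variable involved in the diagnostic.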
Purify uses calls embedded in the program to record whether memory locations are writable (allocated) and readable (initialized). Other embedded calls check that information to make sure that operations on the memory addresses are valid. Unlike Insure++ and BoundsChecker, Purify has no concept of pointer variables, since it works without high-level-language source code. It tracks memory usage purely in terms of addresses.
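A rough model of this address-only bookkeeping is shown below. It is a sketch under simplifying assumptions: the `ShadowMemory` class and its method names are invented for illustration, and the caller passes the block size to `on_free`, whereas a real tool would look it up in its own allocation records.

```python
# Hypothetical model of address-based tracking: two status flags per address,
# with no knowledge of which pointer variable produced an address.

class ShadowMemory:
    def __init__(self):
        self.writable = set()  # addresses that are currently allocated
        self.readable = set()  # addresses that are allocated and initialized

    def on_alloc(self, base, size):
        self.writable.update(range(base, base + size))

    def on_free(self, base, size):
        freed = set(range(base, base + size))
        self.writable -= freed
        self.readable -= freed

    def on_write(self, addr):
        if addr not in self.writable:
            return f"write to unallocated address {addr:#x}"
        self.readable.add(addr)  # a successful write initializes the location
        return None

    def on_read(self, addr):
        if addr not in self.writable:
            return f"read of unallocated address {addr:#x}"
        if addr not in self.readable:
            return f"read of uninitialized address {addr:#x}"
        return None


shadow = ShadowMemory()
shadow.on_alloc(0x2000, 4)
uninit = shadow.on_read(0x2000)
ok_write = shadow.on_write(0x2000)
ok_read = shadow.on_read(0x2000)
past_end = shadow.on_write(0x2004)
shadow.on_free(0x2000, 4)
stale = shadow.on_read(0x2000)
```

Every diagnostic in this model mentions only an address. Mapping that address back to a variable or source line is a separate step that depends on whatever debugging information is available.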
mpatrol uses special versions of libraries to track the status of memory blocks. Like Purify, it can insert special buffers around dynamically allocated memory so that there is a record of aberrant pointer behavior at runtime.
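The idea of guard buffers can be illustrated with a toy heap in Python. The `GuardedHeap` class, the guard size, and the fill byte are all assumptions chosen for this sketch; they are not mpatrol's actual interface or defaults.

```python
# Hypothetical model of guard buffers around heap blocks. The guard size and
# fill byte are arbitrary choices for this sketch.

GUARD_SIZE = 4
GUARD_BYTE = 0xAA


class GuardedHeap:
    def __init__(self, capacity):
        self.memory = bytearray(capacity)
        self.next_free = 0
        self.sizes = {}  # user address -> requested size

    def alloc(self, size):
        """Reserve guard + block + guard and return the block's address."""
        start = self.next_free
        user = start + GUARD_SIZE
        end = user + size + GUARD_SIZE
        if end > len(self.memory):
            raise MemoryError("toy heap exhausted")
        guard = bytes([GUARD_BYTE]) * GUARD_SIZE
        self.memory[start:user] = guard
        self.memory[user + size:end] = guard
        self.next_free = end
        self.sizes[user] = size
        return user

    def check(self, user):
        """Report which guard buffers around a block were overwritten."""
        size = self.sizes[user]
        guard = bytes([GUARD_BYTE]) * GUARD_SIZE
        damaged = []
        if self.memory[user - GUARD_SIZE:user] != guard:
            damaged.append("underrun")
        if self.memory[user + size:user + size + GUARD_SIZE] != guard:
            damaged.append("overrun")
        return damaged


heap = GuardedHeap(64)
p = heap.alloc(8)
clean = heap.check(p)
heap.memory[p + 8] = 0  # simulates a write one byte past the end of the block
damaged = heap.check(p)
```

The stray write itself goes undetected at the moment it happens. The damaged guard pattern is the record it leaves behind, which a later check (for example, when the block is freed) can report.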
Insure++, BoundsChecker, and Purify all check the correctness of arguments to API calls at runtime. If the APIs that they check were all written in a language with strict typing, it wouldn’t be necessary to check their arguments at runtime.
Unfortunately, many of these APIs are written in C, and so these tools are useful for catching API errors. The information needed from a strictly typed language to provide full static checking would include the following:
A true Boolean type that couldn’t be converted to and from integers
Types derived from integers with ranges of allowable values
Enumeration types that couldn’t be converted to and from integers
Character types that couldn’t be converted to and from integers
Array types composed from the types listed above with specified bounds
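The following Python sketch shows the kinds of constraints that several of these types would express. Python enforces them at runtime rather than at compile time, so this is only an analog of what a strictly typed language would reject statically. The names `Color`, `Percent`, `BoundedArray`, and `rejected` are invented for illustration.

```python
from enum import Enum


class Color(Enum):
    """Enumeration whose members are not integers (unlike a C enum)."""
    RED = 1
    GREEN = 2


class Percent:
    """Integer-derived type restricted to the range 0..100."""
    LOW, HIGH = 0, 100

    def __init__(self, value):
        # Reject Booleans explicitly: Python's bool is a subclass of int.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError("Percent requires an integer")
        if not self.LOW <= value <= self.HIGH:
            raise ValueError(f"{value} outside {self.LOW}..{self.HIGH}")
        self.value = value


class BoundedArray:
    """Fixed-length array whose elements must all be of one declared type."""

    def __init__(self, element_type, length):
        self.element_type = element_type
        self.items = [None] * length

    def store(self, index, item):
        if not isinstance(item, self.element_type):
            raise TypeError("wrong element type")
        if not 0 <= index < len(self.items):
            raise IndexError("index outside declared bounds")
        self.items[index] = item


def rejected(action):
    """Return the name of the exception an action raises, or None."""
    try:
        action()
    except (TypeError, ValueError, IndexError) as err:
        return type(err).__name__
    return None


levels = BoundedArray(Percent, 3)
levels.store(0, Percent(50))
results = [
    rejected(lambda: Percent(True)),               # Boolean used as an integer
    rejected(lambda: Percent(101)),                # value outside the range
    rejected(lambda: levels.store(3, Percent(1))), # index outside the bounds
    rejected(lambda: levels.store(1, 7)),          # raw integer as an element
    rejected(lambda: Color.RED + 1),               # enumeration used as a number
]
```

Each rejected call corresponds to an argument error that a C API would accept silently, which is why the runtime tools have to check for it.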