Chapter 9 discussed debugging tactics that can be used with the tools that every programmer has available. These include a text editor and a language translator, either a compiler, interpreter, or assembler. In addition, most programmers have access to an interactive debugger, either at the high-level-language or assembly-language level. This section, and the following one, complement that chapter. They discuss advanced debugging techniques that can only be done with the aid of sophisticated tools.
We want to find bugs beyond those that can be identified by compilers. Compilers use semantic analysis augmented with symbol tables and control-flow and data-flow information to find the defects described previously. To find additional bugs, it’s necessary to develop additional information about the behavior of the program.
There are two main approaches to developing this information. The first approach is static analysis. Static analysis derives information prior to execution, usually at compile time.
There are two techniques used in static analysis: rule evaluation and symbolic execution. Rule evaluation first matches parts of the representation of the program against representations of known problematic program fragments. It executes a set of instructions for each match. Symbolic execution first creates an abstract representation of some aspect of the computation. It then evaluates each part of that representation in an order derived from the application.
Symbolic execution can be flow sensitive or insensitive. Flow-sensitive execution means that the flow of control of the original program is simulated in some fashion. Flow-insensitive execution means that the simulation is sequential, without the possibility of conditional or iterative execution. Flow-sensitive symbolic execution is normally more expensive, both in time and space.
The other approach to developing extra information about the behavior of a program is dynamic analysis. Dynamic analysis derives information at application runtime.
Dynamic information can be collected by modifying the source files, modifying the object files, linking in different libraries, modifying the application executable, or running a separate process to monitor the behavior of the original.
There are several reasons why static analysis may be more effective than dynamic analysis in diagnosing a problem. Static analysis evaluates all code in an application. It can find defects in code not exercised by a particular test run. Actually running an application may take significant resources in a production environment. Sometimes those resources aren’t available to developers. Static analysis can be done offline in a development environment.
On the other hand, static evaluation isn’t without its drawbacks. Static analysis requires source code, which normally excludes system and third-party libraries from the analysis. Static analysis often takes more time than dynamic analysis.
Splint is an open-source tool for statically checking C programs. It is available for download from www.splint.org. Splint is a derivative of LCLint, which was developed by MIT and DEC SRC [GH93].
It can be used as a substitute for UNIX™ lint. By adding annotations to programs, it can perform stronger checks than any standard lint can. Annotations are stylized comments that document assumptions about functions, variables, arguments, and types.
The following problems, among others, can be detected by Splint with just source code:
Variables used before being assigned
Function return values that are ignored
Execution paths with no return
Switch cases that fall through
Apparent infinite loops
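Two of these defects can be seen in a single small function. The function name `kind` below is hypothetical; the sketch shows a switch case that falls through and a variable (`result`) that would be flagged as used before assignment if the `default` branch were missing.

```c
/* A sketch of defects Splint reports from source alone. The missing
   break in case 1 means kind(1) falls through and returns 20, not the
   10 the author presumably intended. */
int kind(int x) {
    int result;          /* flagged if any path could read it unassigned */
    switch (x) {
    case 1:
        result = 10;     /* missing break: falls through to case 2 */
    case 2:
        result = 20;
        break;
    default:
        result = 0;
        break;
    }
    return result;
}
```

A caller that discards the return value of `kind` would also be reported under the "function return values that are ignored" check.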
The following problems, among others, can be detected by Splint with annotation information:
Dereferencing pointers with possible null values
Using storage that is undefined or partly undefined
Returning storage that is undefined or partly defined
Using deallocated storage
Inconsistent modification of caller visible states
Unexpected aliasing or data-sharing errors
Inconsistent use of global variables
Violations of information hiding
Undefined program behavior due to evaluation order, incomplete logic, infinite loops, statements with no effect, and so on
Problematic uses of macros
Special comments, called annotations, are used to provide extra information about types, variables, and functions. These comments start and end with the “at” sign (@).
Splint provides several hundred command-line options to control its error checking:
Global options control initialization.
Message format options control message display.
Mode selectors provide coarse control of checking.
Checking options select checks and the classes of reported messages.
Global options are used on the command line or in initialization files. The other options can be used in control comments in the source as well.
Splint detects null pointer dereferences by analyzing pointers at procedure interface boundaries. If this checking is turned on, the program must protect all dereferences of possibly null pointers with either guard statements or annotations that declare the safety of various pointer constructs.
Splint detects local variable references that may occur before a valid value is assigned to the variable. Unless references are annotated, the storage referred to by global variables, function arguments, and function return values must be defined before and after function calls.
Instead of treating user-defined enumerations as integers, Splint treats each enumeration as a distinct type. Similarly, instead of treating char data as integers, Splint treats it as a distinct type. Splint also provides a Boolean type that is treated differently from integers, making it possible to check for common errors in control-flow statements.
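A minimal sketch of the distinct-type idea, using hypothetical names `color` and `is_warm`:

```c
#include <stdbool.h>

typedef enum { RED, GREEN, BLUE } color;

/* Because Splint treats 'color' and Boolean values as distinct from int,
   passing a raw int where a color is expected, or writing the classic
   "if (c = RED)" assignment slip in a condition, would draw a warning. */
bool is_warm(color c) {
    return c == RED;    /* comparison, not assignment */
}
```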
Splint detects memory leaks and the use of invalid pointers with annotations that document the assumptions about function interfaces, pointer variables, type definitions, and structure fields. It uses the concept of deallocation obligations, which occur at certain points in the program, to detect deallocation problems. If the assumptions about pointers are completely documented, the tool can assure the programmer that some memory management errors never occur. In contrast, dynamic tools can only provide the assurance that those errors don’t occur when executing specific test cases.
While C++ provides built-in language features for object-oriented programming, those who must use C can also benefit from some of its features through the use of Splint. It detects when the representation of an abstract type is exposed, which occurs when a user of an object has a pointer to storage that is part of an object instance. Not only can it identify these types of information-hiding violations, but it can also detect other related problems such as modifications of string literals.
Splint takes the concept of the prototype in ANSI C to its logical conclusion. It enables the programmer to specify what global variables a function uses and what side effects it can cause. It also provides a general-purpose facility for describing predicates that must be true on entry to or exit from a function.
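As a sketch of these extended prototypes, Splint's `/*@globals@*/` and `/*@modifies@*/` clauses can be attached after a function's parameter list; the function and variable names below are hypothetical. Since the clauses are comments, the code remains plain C.

```c
static int call_count = 0;

/* The clauses declare that bump reads and writes the global call_count;
   Splint would report any undeclared global access or modification. */
int bump(void) /*@globals call_count@*/ /*@modifies call_count@*/
{
    call_count += 1;
    return call_count;
}
```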
Under what circumstances does it make sense to use Splint? First, Splint only handles C source code, not C++ or Java. There are still many large programs written in C. In some cases, there is a large prior investment in code. In other cases, a C++ compiler isn’t available for the target hardware. In still other cases, the C++ development environment may not be competitive with the C development environment in some respect.
The second constraint on Splint usage is the investment in annotations. If the program will be in use for a relatively long time, the extra effort required to add annotations will pay off in quicker diagnosis of defects. If there will be more than one person doing maintenance on the program, the explanatory benefit of the annotations can be significant.
The third constraint on Splint usage is the relative priority of security. If the application being checked will be in wide public usage in a way that could be used to attack system security, the investment in annotation is easily justified. Splint is supported by a research group that specializes in security issues. It is particularly effective at finding certain kinds of problems that are commonly exploited by hackers.
CodeSurfer is a product of GrammaTech for statically analyzing C programs. It is available on Windows™ platforms, Solaris™, and Linux™.
A slice is a collection of all the code that contributes to the computation of a value. A slice can be computed strictly from static data-flow graphs, or it can be constrained by actual statements executed. Research going back twenty years shows that programmers think in terms of slices when they debug [We82b].
CodeSurfer analyzes the application as a whole to generate the following information:
Data predecessors are those assignments whose values may be used by a statement.
Control predecessors are the control statements that may affect whether a statement is executed.
Data successors are those program elements that may use the values assigned by a statement.
Control successors are the statements whose execution depends on control-flow choices made by a statement.
Backward slicing shows all the program elements that may affect a specified statement.
Forward slicing shows all the program elements that may be affected by executing a specified statement.
Chopping shows all the ways one set of program elements affects another set of program elements.
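These relationships can be illustrated on a small hypothetical function. A backward slice on the value of `avg` at the return keeps only the starred lines; the `negatives` bookkeeping contributes nothing to the criterion and would be excluded.

```c
/* Backward slice on 'avg' at the return statement: */
int average(const int *a, int n) {
    int sum = 0;                      /* (*) data predecessor of avg   */
    int negatives = 0;                /*     not in the slice on avg   */
    for (int i = 0; i < n; i++) {     /* (*) control predecessor       */
        sum += a[i];                  /* (*) data predecessor of avg   */
        if (a[i] < 0) {
            negatives++;              /*     affects only 'negatives'  */
        }
    }
    (void)negatives;                  /* silence unused-value warnings */
    return (n > 0) ? sum / n : 0;     /* (*) the slice criterion       */
}
```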
CodeSurfer doesn’t currently perform dynamic slicing, which is sometimes called dicing. This would require the use of an execution profile.
The results of analysis can be used both from an interactive tool (CodeSurfer) and through an API for the Scheme programming language. The interactive tool supports queries of the developed analyses from the perspective of selected variables, selected variables used in particular program elements, and selected functions. You can create complex queries through the use of a set calculator.
The interactive tool creates a number of graphs that you can navigate. This navigation is done in a manner analogous to using a Web browser. The links between program elements aren’t inserted into program text. Instead, they’re collected into property sheets, which you can activate by clicking the appropriate program element.
Interprocedural control-flow and data-flow analyses are essential to compute useful slices. CodeSurfer does both. To understand C programs, it’s also essential to do pointer-target analysis, which CodeSurfer does as well [LMSS91].
You can increase the performance of pointer-target analysis, as well as the other dependency analysis algorithms, by selecting settings that generate less-precise results. Pointer-target analysis in CodeSurfer is insensitive to flow of control within a procedure and to the order of function invocations among procedures. Flow-sensitive analysis would yield more precise results.
CodeSurfer doesn’t currently apply the results of its dependence analysis to do interprocedural constant (or value) propagation [MS93]. This analysis could be fed back into control-flow analysis, resulting in more precise slices.
CodeSurfer doesn’t currently apply the results of its dependence analysis to do interprocedural array subscript analysis. This analysis would result in more precise slices.
Chapter 7 describes a strategy for using a slice browser. First, you use a backward slice to identify those program elements that are related to a given statement. Then you recursively investigate those elements looking for the source of the problem. Once you have identified a change to fix the problem, you use a forward slice from the point of the change to identify those program elements that will be affected by your proposed change.
At the time of this writing, CodeSurfer only supports the C language. Hopefully, by the time you read this book, it will also support C++. Currently, the vendor says that 100,000 lines of code is a reasonable limit on the size of the application that CodeSurfer can handle. Hopefully, by the time you read this book, this limit will have been increased.
There are some limitations on its effectiveness. Currently, CodeSurfer doesn’t represent the dependencies introduced by the following features of ANSI C:
System calls such as exec and abort that terminate an application
This means that if your problem is related to the use of these features, CodeSurfer won’t identify them as dependencies of the statement where you identified the problem.
PC-lint is a product of Gimpel Software for statically checking C and C++ programs. It is available on all Windows™ platforms. The corresponding tool, FlexeLint, is available on UNIX™, Linux™, and other popular operating systems.
PC-lint provides the traditional rule-based lint features. It also implements value tracking both within and across function boundaries. It provides the means for C programmers to perform strong type checking through the use of typedef.
The most recent version of PC-lint has more than eight hundred error messages. It also provides more than one hundred command-line options, so you have complete control over the messages you wish to see.
PC-lint employs a number of different methods for identifying errors or likely errors. It uses control-flow and data-flow analysis to find the following errors, which are normally found by optimizing compilers:
Uninitialized simple and aggregate variables
Unused variables and functions
Variables that are assigned, but not used
Code that is unreachable
PC-lint compares expressions to common error patterns to find the following types of errors:
Likely problems with operator precedence
Constant inputs to control-flow statements
Empty statements in problematic places
Undefined order of evaluation for expressions
Insufficient or excessive initializers
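The precedence pattern is worth a concrete sketch. In the hypothetical function below, `&` binds more loosely than `==`, so the intended mask test is miscomputed; PC-lint matches exactly this shape of expression.

```c
/* Flagged pattern: parses as flags & (0x4 == 0x4), i.e. flags & 1 */
int bad_mask(int flags) {
    return flags & 0x4 == 0x4;
}

/* The intended test, with explicit parentheses */
int good_mask(int flags) {
    return (flags & 0x4) == 0x4;
}
```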
PC-lint uses language-specific analyses to find several dozen different errors in C++ code that aren’t detected by some C++ compilers.
PC-lint uses analysis of the numerical types and precision of expressions to find the following types of errors:
Loss of precision in expressions
Mixing signed and unsigned integer expressions
Overflow evaluating constant expressions
Constant expressions that reduce to zero
Unsigned comparisons with zero
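The signed/unsigned mixing in this list has a deceptively quiet failure mode, sketched below with a hypothetical function. The signed operand is implicitly converted to unsigned, so a negative argument compares as a very large value.

```c
#include <stdbool.h>

/* Flagged pattern: n is implicitly converted to unsigned before the
   comparison, so is_small(-1, 100u) is false, not true as a reader
   might expect. A comparison like "u >= 0" for unsigned u would be
   flagged for the opposite reason: it is always true. */
bool is_small(int n, unsigned int limit) {
    return n < limit;
}
```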
PC-lint uses a scan of preprocessor macros to find the following types of errors:
Using an expression as a macro parameter that isn’t parenthesized
Using an expression with side effects as a macro parameter that is repeated
Macros that are unparenthesized expressions
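Both macro problems can be sketched in a few lines; the macro and function names are hypothetical.

```c
/* Flagged: the body is not parenthesized, so operator precedence
   rearranges the expansion. */
#define BAD_SQUARE(x) x * x

/* The safe form parenthesizes both the body and each parameter use. */
#define SQUARE(x) ((x) * (x))

int demo_bad(void) {
    return BAD_SQUARE(1 + 2);   /* expands to 1 + 2 * 1 + 2, which is 5 */
}

int demo_good(void) {
    return SQUARE(1 + 2);       /* ((1 + 2) * (1 + 2)), which is 9 */
}
```

Passing an expression with side effects, such as `SQUARE(i++)`, would trip the repeated-parameter check, since the increment executes twice.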
PC-lint uses an analysis of language constructs that only exist at compile time to find the following types of errors:
Unused macros, typedefs, declarations, structs, unions, enums, classes, and templates
Unused header files
Externals that can be declared static
Declarations that can be removed from header files
PC-lint uses procedural and interprocedural value tracking to find the following types of errors:
Boolean expressions that always evaluate to the same result
Dereferencing a null pointer
Passing a null pointer to library functions
Dereferencing a pointer that doesn’t point to valid memory
Failure to deallocate dynamic memory
Incorrect deallocation of dynamic memory
In PC-lint, value tracking associates a set of possible values with local variables, arguments, and class data members. These values can be bound by assignments, argument passing, and returned results. Not only are the values associated with the name, but so are the source code locations that contributed to generating each value.
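A sketch of interprocedural value tracking, with hypothetical function names: the tool follows NULL into the value set of the pointer returned by `lookup`, and so would report an unguarded dereference in any caller.

```c
#include <stddef.h>

/* NULL enters the value set of the returned pointer here */
static const char *lookup(int key) {
    return (key == 1) ? "one" : NULL;
}

/* Without the NULL check, value tracking would flag the dereference
   of s, since one value that can reach it is NULL. */
char first_char(int key) {
    const char *s = lookup(key);
    if (s == NULL) {
        return '?';
    }
    return s[0];
}
```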
While C++ provides built-in language features for object-oriented programming, those who must use C can also benefit from some of its features through the use of PC-lint. PC-lint provides features that allow the C programmer to use strong typing. Thus, relationships between typedef references are examined strictly by name, rather than by equivalence of implementation. Command-line options are provided to control the nature of type checking, to add array subscripts as special types, and to create an inheritance hierarchy of types.
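A minimal sketch of name-based strong typing, using hypothetical unit types: both typedefs are `int` underneath, so a C compiler accepts any mixture, but under PC-lint's strong-type options an expression mixing `meters` and `feet` would draw a warning.

```c
/* Distinct by name, identical in implementation */
typedef int meters;
typedef int feet;

/* Same strong type on both operands: accepted. Adding a 'feet' value
   to a 'meters' value here would be flagged under strong type checking,
   even though plain C sees only int + int. */
meters add_meters(meters a, meters b) {
    return a + b;
}
```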
The best way to use a static checking tool is by continually keeping your source code clean of those messages reported by the tool that you know are more likely to indicate bugs. Some reports from these tools have such a low yield of real bugs that you’re better off ignoring them. On the other hand, for those reports that you know have a meaningful possibility of indicating a real bug, you should investigate them immediately and resolve the problem.
If you’re starting a new project and plan to use PC-lint, decide which of its messages you will be concerned about, turn them on, and resolve them frequently. Some projects do a nightly build of their software, and this is an excellent time to run a static analyzer. First thing in the morning, you can resolve all of the reports, in addition to any bugs turned up by nightly testing. Don’t let a backlog develop. The inevitable result of accumulating a backlog is that you will turn off the messages and eventually stop using the tool altogether.
If you’re doing maintenance of an existing project and have decided to use PC-lint at this point in the project life cycle, it’s important to schedule time to resolve all the reports this tool will generate. Once you have decided which reports you will be concerned about, resolve all of them and follow a zero-tolerance policy so they don’t creep back in.
In either case, it’s instructive to keep a record of the percentage of reports that actually indicate a real defect. Even for the most important messages, this will still be only a modest fraction of the total. Even if only 5 percent of the messages point to a problem, that could translate to dozens of defects that weren’t being found by your other methods of testing.