14.8 Review

A grammar is a set of rules that defines how to generate valid sentences in a language. Noam Chomsky defined a hierarchy of increasingly complex grammars and corresponding methods for recognizing the sentences that those grammars generate. Regular grammars can describe the valid words in most high-level programming languages, Fortran and COBOL excepted. Similarly, context-free grammars can describe the valid sentences in most high-level programming languages, with the same two exceptions. Rather than use context-sensitive grammars, most compilers verify the context-sensitive aspects of a program with either a hand-coded program or a combination of a hand-coded program and an attribute grammar.
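The difference between the two grammar classes can be sketched in C. This is only an illustration, and the function names are hypothetical: `is_identifier` recognizes a C-style identifier, which a regular grammar can describe, using a fixed amount of state, while `is_balanced` recognizes nested parentheses, a context-free construct, and needs a counter that grows with the nesting depth.

```c
#include <ctype.h>
#include <stdbool.h>

/* Regular language: a letter or underscore followed by letters, digits,
   or underscores. A two-state finite automaton is enough. */
bool is_identifier(const char *s)
{
    if (!(isalpha((unsigned char)*s) || *s == '_'))
        return false;                /* also rejects the empty string */
    for (s++; *s != '\0'; s++) {
        if (!(isalnum((unsigned char)*s) || *s == '_'))
            return false;
    }
    return true;
}

/* Context-free language: balanced parentheses. The depth counter stands
   in for the stack of a pushdown automaton. Characters other than
   parentheses are ignored. */
bool is_balanced(const char *s)
{
    long depth = 0;
    for (; *s != '\0'; s++) {
        if (*s == '(') {
            depth++;
        } else if (*s == ')') {
            if (depth == 0)
                return false;        /* a close with no matching open */
            depth--;
        }
    }
    return depth == 0;               /* every open must have been closed */
}
```

No fixed number of states can recognize arbitrarily deep nesting, which is why the second function needs unbounded memory and the first does not.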

The more information a compiler has, the more defects it can find. The types of syntactic and semantic errors that a compiler can find depend on the language being compiled. The semantic analyzer performs context-sensitive analysis by referring to additional data structures as it examines its representation of the program. These additional data structures include symbol tables, control-flow information, and data-flow information, each at either the procedure or whole-program level.

Static analysis evaluates all code in an application. It can find defects in code not exercised by a particular test run. Static analysis requires source code and is likely to take more time than dynamic analysis.

Splint is a tool for statically checking C programs. Special comments called annotations provide extra information about types, variables, and functions. When a program contains annotations, Splint can perform stronger checks than any standard lint-like tool. Splint provides several hundred command-line options to control its error checking.

Splint detects memory leaks, the use of invalid pointers, and null pointer dereferences. It gives C programmers a way to check object-oriented implementations by reporting violations of information hiding. It provides for complete type checking of Boolean, enum, and char data types. It extends the notion of C prototypes by enabling the programmer to identify global variables and side effects. Splint is particularly useful in identifying the kinds of defects in C programs that attackers commonly exploit in network applications.
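As a rough illustration of how annotations carry extra information, the sketch below uses two Splint annotations, `null` and `only`, on a hypothetical function. Because annotations are ordinary C comments, the code compiles unchanged with any C compiler. The function names are invented for this example, and the exact messages Splint would produce are not reproduced here.

```c
#include <stdlib.h>
#include <string.h>

/* null: the result may be NULL, so callers must test it before use.
   only: the caller receives the sole reference to the storage and is
   obliged to release it. */
/*@null@*/ /*@only@*/ char *copy_string(const char *src)
{
    char *dst = malloc(strlen(src) + 1);
    if (dst == NULL)
        return NULL;
    strcpy(dst, src);
    return dst;
}

size_t copied_length(const char *src)
{
    char *copy = copy_string(src);
    size_t n;

    if (copy == NULL)   /* omit this test and the dereference below
                           is a possible null pointer dereference */
        return 0;
    n = strlen(copy);
    free(copy);         /* omit this call and the storage leaks */
    return n;
}
```

The annotations state the interface contract once, at the declaration, so a checker can verify each call site against it without analyzing the body of `copy_string` again.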

PC-lint/FlexeLint is a tool for statically checking C and C++ programs. It uses several types of analysis to detect potential problems. Like other lint-type programs, it uses control-flow analysis, data-flow analysis, and the comparison of expressions to known problematic patterns. It performs special analyses of preprocessor macros and of all programming language constructs that don’t correspond to generated code or data. One of its strong features is intraprocedural and interprocedural value tracking. It also provides features for C programmers to get some of the benefits of object-oriented programming.

CodeSurfer is a slice browsing tool for statically analyzing C programs. A slice is a collection of all the code that contributes to the computation of a value. CodeSurfer does both intraprocedural and interprocedural control and data-flow analysis, as well as pointer-target analysis. CodeSurfer provides the following analysis results in an interactive tool: data predecessors, control predecessors, data successors, control successors, backward slicing, forward slicing, and chopping. The interactive tool enables you to view these analyses from the perspective of specific variables, specific variables used in particular program elements, and specific functions.
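A small hand-worked example may help make the idea of a slice concrete. In the hypothetical function below, the lines marked `[S]` form the backward slice on `sum` at the return statement. The marking was done by hand for illustration and is not CodeSurfer output. The second function is the slice extracted as code in its own right.

```c
/* Computes two results; only some statements affect the returned sum. */
int sum_and_product(const int *a, int n, int *product_out)
{
    int sum = 0;                  /* [S] */
    int product = 1;
    int i;

    for (i = 0; i < n; i++) {     /* [S] controls how often sum changes */
        sum += a[i];              /* [S] */
        product *= a[i];
    }
    *product_out = product;
    return sum;                   /* [S] slicing criterion */
}

/* The backward slice on sum: every statement that contributes to the
   returned value, and nothing else. */
int sum_only(const int *a, int n)
{
    int sum = 0;
    int i;

    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```

If `sum` has the wrong value, the defect must lie in a marked statement or in the inputs `a` and `n`; the statements that compute `product` can be ruled out.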

Dynamic analysis can include system and third-party libraries, since source code isn’t required. Dynamic analysis evaluates only the code that is executed. While this may mean that it takes less time than static analysis, it also means you need a comprehensive test suite to get the full benefit from it.
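The dependence on test coverage can be shown with a small sketch. The function below contains a deliberate defect in one branch, and hand-inserted flags (a hypothetical stand-in for the probes a real tool would add) record which branches have run. A test suite that never passes a value above 100 never executes the defective branch, so no runtime check can observe it.

```c
#include <stdbool.h>

/* One flag per branch, set when that branch executes. */
static bool branch_hit[3];

int clamp_percent(int v)
{
    if (v < 0) {
        branch_hit[0] = true;
        return 0;
    }
    if (v > 100) {
        branch_hit[1] = true;
        return 10;              /* deliberate defect: should be 100 */
    }
    branch_hit[2] = true;
    return v;
}

/* Number of branches that no run so far has executed. */
int unexecuted_branches(void)
{
    int i, missed = 0;

    for (i = 0; i < 3; i++) {
        if (!branch_hit[i])
            missed++;
    }
    return missed;
}
```

After calls with 50 and -5, `unexecuted_branches()` reports one branch that was never run, and that branch is the one holding the defect.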

Insure++ is a tool for dynamically checking C and C++ programs. Insure++ detects pointer-reference problems, memory-leak problems, miscellaneous pointer problems, and library API problems. It is used in place of your normal compiler. It parses your source code, generates revised source code that contains monitoring calls, and passes this code to your compiler. When you run your program, the runtime part of the tool checks data values and memory references for correctness.

Purify is a tool for dynamically checking C and C++ programs. It finds references to heap and stack memory that are unallocated or uninitialized. Purify works by modifying the object code used to build your application. It inserts instructions in front of every instruction that accesses data from memory. These instructions call functions that track the state of memory addresses in terms of allocation and initialization. When unallocated addresses are read or written or uninitialized addresses are read, Purify signals an error. After the object code of an application is processed by Purify, it can be linked in the normal way. When the application is executed, the Purify runtime library will report errors that it finds.
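The state tracking described above can be sketched as a toy model. This is not Purify's implementation: it tracks a small array of byte offsets rather than real addresses, and every name in it is hypothetical. It does show the three states and the rule that reading or writing unallocated memory, or reading uninitialized memory, is an error.

```c
#include <stddef.h>

#define ARENA_SIZE 64

typedef enum { UNALLOCATED, ALLOCATED_UNINIT, INITIALIZED } ByteState;
typedef enum { ACCESS_OK, ERR_UNALLOCATED, ERR_UNINITIALIZED } AccessResult;

/* One state per tracked byte. Static storage is zero-initialized, so
   every byte starts out UNALLOCATED. */
static ByteState shadow[ARENA_SIZE];

void shadow_alloc(size_t off, size_t len)
{
    size_t i;
    for (i = off; i < off + len && i < ARENA_SIZE; i++)
        shadow[i] = ALLOCATED_UNINIT;
}

void shadow_free(size_t off, size_t len)
{
    size_t i;
    for (i = off; i < off + len && i < ARENA_SIZE; i++)
        shadow[i] = UNALLOCATED;
}

/* Called before a store: writing is legal only to allocated bytes,
   and it makes the byte initialized. */
AccessResult check_write(size_t off)
{
    if (off >= ARENA_SIZE || shadow[off] == UNALLOCATED)
        return ERR_UNALLOCATED;
    shadow[off] = INITIALIZED;
    return ACCESS_OK;
}

/* Called before a load: reading is legal only from initialized bytes. */
AccessResult check_read(size_t off)
{
    if (off >= ARENA_SIZE || shadow[off] == UNALLOCATED)
        return ERR_UNALLOCATED;
    if (shadow[off] == ALLOCATED_UNINIT)
        return ERR_UNINITIALIZED;
    return ACCESS_OK;
}
```

In this model, the inserted instructions correspond to a call to `check_read` or `check_write` placed before each memory access.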

mpatrol is a library for dynamically checking C and C++ programs on a wide variety of platforms. It can generate a readable log file, summary profile, and history trace of dynamic-memory operations. It includes a complete set of replacements for those C and C++ library functions that allocate and manipulate memory. It also provides a complete API for controlling and examining the behavior of the memory allocator while it’s running. It provides multiple methods for diagnosing common memory-allocation and -manipulation problems.
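The general idea of replacing allocation functions with versions that record what they do can be sketched as follows. These wrappers are hypothetical and are not the mpatrol API; they only count allocations and optionally write a log line for each operation.

```c
#include <stdio.h>
#include <stdlib.h>

static unsigned long alloc_count;   /* successful allocations so far */
static unsigned long free_count;    /* non-null frees so far */

/* Allocates like malloc, counts the allocation, and logs it when a
   log stream is supplied. */
void *tracked_malloc(size_t size, FILE *log)
{
    void *p = malloc(size);

    if (p != NULL)
        alloc_count++;
    if (log != NULL)
        fprintf(log, "malloc(%lu) -> %p\n", (unsigned long)size, p);
    return p;
}

/* Releases like free, counts the release, and logs it when a log
   stream is supplied. */
void tracked_free(void *p, FILE *log)
{
    if (p != NULL)
        free_count++;
    if (log != NULL)
        fprintf(log, "free(%p)\n", p);
    free(p);
}

/* Allocations not yet released; nonzero at exit suggests a leak. */
unsigned long outstanding_allocations(void)
{
    return alloc_count - free_count;
}
```

Calling `outstanding_allocations()` at program exit gives a crude leak check, and passing `stderr` as the log stream produces a readable trace of each operation.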

All of these tools are valuable for detecting bugs. If you’re programming in C or C++, we strongly recommend that you get both a static-analysis tool and a dynamic-analysis tool. If you’re programming in C and can’t afford commercial tools, get Splint and mpatrol. If you’re programming in C++ and have a tools budget, get one of the dynamic-analysis tools first and then consider adding one of the static-analysis tools.