13.6 Building debugging infrastructure

13.6.1 Augmented data structures

If you know that you will have to debug a program you’re developing, consider augmenting your data structures with information that will make the debugging task easier. Several different types of data can be added.

One way to augment data structures is to provide identification information that connects the data structure to the user input or the outside world. Compiler developers normally insert the source-file name, line number, and column number into the internal representations the compiler builds while parsing a program. This enables the compiler to generate error messages that relate back to the element of the user program that is erroneous.

Optimizing compilers normally use additional representations, such as control-flow graphs, derived from the representation generated during parsing. The source-file, line, and column information must be consistently transferred to these intermediate representations. When a problem occurs in the compiler, the compiler writer can more easily trace it from the compiler’s data structures back to the user program, and can often create a minimal test case.
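As a rough illustration of carrying identification information from one representation to the next, consider the following Python sketch. The names SourceLoc, AstNode, CfgNode, lower, and report are hypothetical, and the flat list of lowered operations only stands in for a real control-flow graph.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class SourceLoc:
    """Where a construct came from in the user's program."""
    file: str
    line: int
    column: int

    def __str__(self) -> str:
        return f"{self.file}:{self.line}:{self.column}"


@dataclass
class AstNode:
    """Parser output: every node remembers its origin."""
    kind: str
    loc: SourceLoc
    children: List["AstNode"] = field(default_factory=list)


@dataclass
class CfgNode:
    """Derived representation: the location is copied, not dropped."""
    op: str
    loc: SourceLoc


def lower(node: AstNode) -> List[CfgNode]:
    """Flatten the tree in post-order, carrying each location along."""
    out: List[CfgNode] = []
    for child in node.children:
        out.extend(lower(child))
    out.append(CfgNode(op=node.kind, loc=node.loc))
    return out


def report(message: str, loc: SourceLoc) -> str:
    """Format a diagnostic that points back at the user's source."""
    return f"{loc}: error: {message}"


tree = AstNode("assign", SourceLoc("demo.c", 12, 5),
               [AstNode("divide", SourceLoc("demo.c", 12, 9))])
cfg = lower(tree)
print(report("division by zero", cfg[0].loc))
```

Because the location travels with every derived node, a diagnostic raised late in processing can still name the file, line, and column of the offending construct.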

Programs that process discrete transactions can include timestamp information in data structures that are related to the transactions. Timestamps can be collected in several ways. Sometimes the most valuable information is the time of the original transaction. In other situations, it’s useful to collect a series of timestamps that identify each occasion when a data structure was updated. In still others, only the time of the most recent update is useful.
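The three timestamp choices can be combined in one record, as in the following sketch. The Transaction class and its fields are hypothetical. The clock is passed in as a parameter only so that the example can be exercised with predictable times.

```python
import time


class Transaction:
    """A transaction record augmented with timestamps for debugging."""

    def __init__(self, account, amount, clock=time.time):
        self._clock = clock
        self.account = account
        self.amount = amount
        self.created_at = clock()   # time of the original transaction
        self.update_times = []      # one entry per update

    def update(self, amount):
        """Change the amount and record when the change happened."""
        self.amount = amount
        self.update_times.append(self._clock())

    @property
    def last_updated(self):
        """Time of the most recent update, or creation time if never updated."""
        return self.update_times[-1] if self.update_times else self.created_at


txn = Transaction("acct-1", 10.0)
txn.update(12.5)
print(txn.created_at, txn.update_times, txn.last_updated)
```

Keeping the whole series costs memory that grows with each update. If only the most recent update matters, a single field is enough.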

Another way to augment data structures is to provide structural redundancy. Linked lists and B-trees can be augmented to include redundant structural information. Research into augmented data structures was motivated by the desire to build fault tolerance into software. The bibliography contains several references on this subject.

When data structures have been afflicted by memory corruption, debugging can be very difficult. Data structures that contain sufficient information to detect corruption, and even to correct the problem automatically, make this task much easier. Designers of such robust structures take care to minimize both the extra space and extra time required to use them.
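The following sketch shows one simple form of structural redundancy: a linked list that stores back pointers and an element count, both of which could be derived from the forward links alone. It is not taken from any particular scheme in the literature, and the names Node, CheckedList, and check are hypothetical. It only detects damage; it makes no attempt to repair it.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None
        self.prev = None    # redundant: derivable from the forward links


class CheckedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.count = 0      # redundant: derivable by walking the list

    def append(self, value):
        node = Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node
        self.count += 1
        return node

    def check(self):
        """Return the inconsistencies found; an empty list means none."""
        problems = []
        seen = 0
        prev = None
        node = self.head
        # Bound the walk by the stored count so a cycle cannot loop forever.
        while node is not None and seen <= self.count:
            if node.prev is not prev:
                problems.append(f"node {seen}: back pointer disagrees with forward link")
            prev = node
            node = node.next
            seen += 1
        if node is not None:
            problems.append("forward chain is longer than the stored count (or cyclic)")
        elif seen != self.count:
            problems.append(f"stored count {self.count} but walked {seen} nodes")
        if prev is not self.tail:
            problems.append("tail pointer does not match the last node reached")
        return problems


lst = CheckedList()
a, b, c = lst.append(1), lst.append(2), lst.append(3)
before = lst.check()
b.next = None               # simulate a stray write that truncates the chain
after = lst.check()
print(before, after)
```

The extra pointer per node and the extra walk in check are the space and time costs that designers of robust structures try to keep small.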

13.6.2 Augmented procedures

If you know that you will have to debug a program you’re developing, consider augmenting your procedures with code you can use to help with debugging. Chapter 9 begins with the following tactics, all of which involve augmenting procedures:

Display variable values.
Display execution messages.
Display procedure arguments.

The refined versions of these tactics all include variants that place the output generation under the control of either a preprocessing statement or a conditional statement that is evaluated at runtime. The conditional statement can test either a state variable or whether a command-line option was used.
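The runtime variant might look like the following sketch, in which a command-line option sets a state variable that every output statement tests. Python has no preprocessor, so only the runtime form is shown. The names DEBUG, trace, average, and the --debug option are hypothetical.

```python
import argparse
import sys

DEBUG = False   # state variable tested at runtime


def trace(message):
    """Write a debugging message only when debugging output is enabled."""
    if DEBUG:
        print(f"[debug] {message}", file=sys.stderr)


def average(values):
    trace(f"average: enter, values={values!r}")     # procedure arguments
    total = sum(values)
    trace(f"average: total={total}")                # variable value
    result = total / len(values)
    trace(f"average: exit, result={result}")        # execution message
    return result


def main(argv):
    """Parse the command line, set the state variable, and do the work."""
    global DEBUG
    parser = argparse.ArgumentParser()
    parser.add_argument("--debug", action="store_true")
    parser.add_argument("numbers", type=float, nargs="+")
    args = parser.parse_args(argv)
    DEBUG = args.debug
    return average(args.numbers)


print(main(["--debug", "2", "4", "9"]))
```

Sending the debugging output to the standard error stream keeps it separate from the program’s normal output.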

Besides building in the ability to display variable values, execution points, and procedure arguments, you can also consider adding the ability to store important values. In the simplest case, you may want just to assign the intermediate results of a complex calculation to a variable. This may simplify using an interactive debugger.

When debugging programs that run for a long time, such as a database system or an operating system kernel, it can be useful to accumulate a log of actions performed by the program. Normally, this log is written to a memory area for efficiency reasons. You may want to inspect the log while the program is running. To do so, you can attach an interactive debugger to the running program and display the contents of the log area. Alternatively, the logging code can provide calls that flush the log to disk when the area is full or upon request.
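A minimal sketch of such a log follows. The MemoryLog class is hypothetical, and an in-memory text stream stands in for the disk file so that the example is self-contained.

```python
import io
from collections import deque


class MemoryLog:
    """Accumulate log entries in memory; write them out when full or on request."""

    def __init__(self, capacity, sink):
        self.capacity = capacity
        self.sink = sink            # any object with write(); a disk file in practice
        self.entries = deque()

    def record(self, action):
        """Add one entry, flushing first if the memory area is full."""
        if len(self.entries) >= self.capacity:
            self.flush()
        self.entries.append(action)

    def flush(self):
        """Write every buffered entry to the sink and empty the buffer."""
        while self.entries:
            self.sink.write(self.entries.popleft() + "\n")


sink = io.StringIO()
log = MemoryLog(capacity=3, sink=sink)
for step in ["open db", "begin txn", "update row 7", "commit"]:
    log.record(step)
print(repr(sink.getvalue()), list(log.entries))
```

After the loop, the first three entries have been flushed because the area filled up, and the fourth is still in memory. Calling log.flush() writes it out on request. The entries attribute is also what you would display from an interactive debugger attached to the running program.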

13.6.3 Specialized interactive debuggers

If you know that you will have to debug a program with complex data structures, consider implementing a specialized debugger. You invoke the specialized debugger by executing a procedure while running the application under a standard interactive debugger. The special debugger takes commands from the keyboard and executes procedures that implement the commands.

The Convex Application Compiler [LM94] created a Program Data Base (PDB) to represent all of the information that it collected or deduced about an application. The PDB contained a modest number of root objects, such as a program-wide symbol table and a procedure call graph. A graph that represented the “contains” relationship for objects in the PDB was more than ten levels deep for some of the root objects. The program had five phases, each of which read from and wrote to the PDB.

The Convex Application Compiler included a specialized internal debugger. Programmers typically debugged this compiler by running one or more phases of the compiler and then inspecting the PDB. At each level, the programmer could select any data member of the active object. This would reveal the data members of the selected member, and so the entire database could be traversed recursively. For members that were container objects (usually vectors), the programmer could select any item in the container.
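The style of traversal described above can be sketched as a tiny command interpreter. This is not the Convex debugger, whose commands the text does not describe. The commands ls, cd, and up, and the classes Procedure and ProgramDatabase, are hypothetical. The commands are passed as a list here; an interactive version would read them from the keyboard.

```python
def members(obj):
    """Names that can be selected on the active object."""
    if isinstance(obj, (list, tuple)):
        return [str(i) for i in range(len(obj))]
    if isinstance(obj, dict):
        return list(obj)
    return sorted(vars(obj)) if hasattr(obj, "__dict__") else []


def select(obj, name):
    """Select a container item by index or key, or a data member by name."""
    if isinstance(obj, (list, tuple)):
        return obj[int(name)]
    if isinstance(obj, dict):
        return obj[name]
    return getattr(obj, name)


def run_commands(root, commands):
    """Execute 'ls', 'cd <member>', and 'up'; return the lines displayed."""
    stack = [root]              # path from the root to the active object
    output = []
    for command in commands:
        verb, _, arg = command.partition(" ")
        if verb == "ls":
            output.append(" ".join(members(stack[-1])))
        elif verb == "cd":
            stack.append(select(stack[-1], arg))
        elif verb == "up":
            if len(stack) > 1:
                stack.pop()
        else:
            output.append(f"unknown command: {command}")
    return output


class Procedure:
    def __init__(self, name, callees):
        self.name = name
        self.callees = callees


class ProgramDatabase:
    def __init__(self, symbols, call_graph):
        self.symbols = symbols
        self.call_graph = call_graph


database = ProgramDatabase(
    {"main": "procedure", "x": "variable"},
    [Procedure("main", ["helper"]), Procedure("helper", [])],
)
session = run_commands(database, ["ls", "cd call_graph", "ls", "cd 0", "ls"])
print(session)
```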

Each object that was a part of the database was required to have several standard methods. The sketch method would display the contents of the object without recursively traversing the contents of the contained objects. Pointers were simply shown as hexadecimal numbers. The dump method would recursively display the object and all the contained objects, using the most human-readable form possible. For example, enumeration constants were shown with literal tags, rather than integer values.
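The difference between the two methods can be illustrated with the following sketch. The classes SymbolKind, Symbol, and SymbolTable are hypothetical, and Python’s id() value stands in for a pointer.

```python
from enum import Enum


class SymbolKind(Enum):
    VARIABLE = 1
    PROCEDURE = 2


class Symbol:
    def __init__(self, name, kind):
        self.name = name
        self.kind = kind

    def sketch(self):
        """Show this object's own members only."""
        return f"Symbol(name={self.name!r}, kind={self.kind.name})"

    def dump(self, indent=0):
        """A leaf object: the dump is the sketch, indented."""
        return " " * indent + self.sketch()


class SymbolTable:
    def __init__(self, symbols):
        self.symbols = symbols

    def sketch(self):
        """Shallow view: contained objects appear only as hexadecimal numbers."""
        refs = ", ".join(hex(id(s)) for s in self.symbols)
        return f"SymbolTable(symbols=[{refs}])"

    def dump(self, indent=0):
        """Recursive view: display this object and every contained object."""
        lines = [" " * indent + "SymbolTable"]
        lines.extend(s.dump(indent + 2) for s in self.symbols)
        return "\n".join(lines)


table = SymbolTable([Symbol("x", SymbolKind.VARIABLE),
                     Symbol("main", SymbolKind.PROCEDURE)])
print(table.sketch())
print(table.dump())
```

The dump output shows the enumeration members by their tags, VARIABLE and PROCEDURE, rather than as the integers 1 and 2.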

13.6.4 Assertions

An assertion is a test that confirms a condition that should be true. If the test is true, nothing happens. If the test is false, a message is printed, and the program stops. Assertions prevent programs from executing when the assumptions they make aren’t valid.

Some people practice contractual programming. This refers not to how the programmer is paid, but to the relationship between calling procedures and called procedures. The language Eiffel has built-in features that support the concept of the contract. The calling procedure promises that certain conditions will be true when it makes a call. The called procedure promises that certain conditions will be true when it has completed.

There is no real need for language features to support assertions, since a simple conditional test and I/O statement provide the necessary support. You can assert the assumptions a procedure makes before it starts work and assert the conditions it guarantees to be true before it returns. Be careful not to include any code in an assertion that causes side effects, such as assignments or input/output. If you include these, and turn off assertion checking, the behavior of your program will change in mystifying ways.
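The following sketch builds an assertion from a conditional test and an I/O statement, and then uses it to check the assumptions of a procedure on entry and its guarantees before it returns. The names require and int_sqrt are hypothetical. The last few lines illustrate the side-effect hazard with Python’s built-in assert statement, which is skipped entirely when the interpreter runs with the -O option.

```python
import math
import sys


def require(condition, message):
    """Hand-rolled assertion: print a message and stop if the test is false."""
    if not condition:
        print(f"assertion failed: {message}", file=sys.stderr)
        raise SystemExit(1)


def int_sqrt(n):
    """Return the largest integer whose square does not exceed n."""
    # Assumption the procedure makes before it starts work.
    require(isinstance(n, int) and n >= 0,
            "int_sqrt: n must be a non-negative integer")
    root = math.isqrt(n)
    # Condition the procedure guarantees before it returns.
    require(root * root <= n < (root + 1) * (root + 1),
            "int_sqrt: result out of range")
    return root


print(int_sqrt(17))

pending = [3, 2, 1]
# Risky: the pop() would run only while assertions are enabled.
#     assert pending.pop() == 1
# Safer: do the work unconditionally, then assert on the result.
item = pending.pop()
assert item == 1
```

With the risky form, turning off assertion checking would leave the item on the list, and the program would behave differently for no visible reason.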