8.2 Stabilize the program

Most works on debugging recommend a process often called stabilizing the problem. The goal of stabilization is reproducibility. Defects that can’t be stabilized can’t be fixed. Stabilizing a defect can be the hardest part of the debugging task.

The first step in stabilizing a bug is to run the defective program several times. If you don’t always get an undesired behavior, you will need to modify the program or the environment in which you’re running it until you do. Defects of this type may terminate prematurely on one run, execute indefinitely on another, and so on.

Even getting an undesired behavior each time is no guarantee that you don’t need to stabilize at this level. It can just mean that you haven’t run the program enough times for a nondeterministic defect to show itself, or that you haven’t run it in the right environment or on the right platform.
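The repeated runs described above can be automated. The following is a minimal sketch, assuming the defective program can be launched from a command line; the names classify_run and tally_runs, the run count, and the timeout are illustrative choices, not part of any standard tool.

```python
import subprocess
import sys
from collections import Counter

def classify_run(cmd, timeout_s=5.0):
    """Run cmd once and return a coarse label for how the run ended."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The run exceeded the time limit: possibly an infinite execution.
        return "timeout"
    if result.returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        return f"signal {-result.returncode}"
    return f"exit {result.returncode}"

def tally_runs(cmd, runs=20, timeout_s=5.0):
    """Run cmd several times and count how often each outcome occurs."""
    return Counter(classify_run(cmd, timeout_s) for _ in range(runs))

if __name__ == "__main__":
    # Stand-in for the defective program: a child process that exits with 3.
    demo = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(tally_runs(demo, runs=5))
```

A tally with more than one label suggests the defect is not yet stabilized. A tally with a single label is encouraging, but it may only mean the run count is too low.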

At this point, if you do get different behaviors on different runs, several hypotheses should suggest themselves. These types of bugs often involve memory corruption, data-structure violations, or problems with the environment or platform.

There is one circumstance in which it may not be possible to cause the undesirable behavior on each execution: when your application has multiple independent streams of control. Parallel execution can occur at the thread level or the process level. If you know that the application has parallel execution, then you may need to be satisfied with increasing the frequency with which the problem occurs. When a problem is caused by nondeterminism, it’s often impossible to make it manifest on every run.
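When the goal is to raise the frequency of a failure rather than to guarantee it, it helps to measure that frequency under each candidate configuration. The sketch below assumes you can wrap one run of the program in a function that returns True on failure; failure_rate, pick_most_failing, and the configuration names are hypothetical.

```python
def failure_rate(run_once, runs=200):
    """Return the fraction of runs in which run_once() reports a failure.

    run_once is any zero-argument callable that returns True when the
    undesired behavior occurred on that run.
    """
    failures = sum(1 for _ in range(runs) if run_once())
    return failures / runs

def pick_most_failing(configs, runs=200):
    """Return (name, rate) for the configuration that fails most often.

    configs maps a descriptive name to a run_once callable.
    """
    rates = {name: failure_rate(fn, runs) for name, fn in configs.items()}
    return max(rates.items(), key=lambda item: item[1])

if __name__ == "__main__":
    # Stand-ins for real runs: one setup never fails, the other always does.
    configs = {"2 threads": lambda: False, "16 threads": lambda: True}
    print(pick_most_failing(configs, runs=10))
```

In practice each callable would launch the program with a different thread count, system load, or input size. The configuration with the highest rate is the one to carry into the next stabilization step.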

The second step applies if you had to make changes to get an undesired behavior on each run: run the defective program several times again. If the same undesired behavior occurs with each run, you can move on to the next step. If you don’t always get the same undesired behavior, you will need to modify the program or its environment again.
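Checking for the same undesired behavior on each run can be sketched by reducing every run to a signature and comparing signatures. This is an illustration under the assumption that exit status and captured output characterize the behavior; run_signature and is_stable are hypothetical helper names.

```python
import hashlib
import subprocess
import sys

def run_signature(cmd, timeout_s=10.0):
    """Reduce one run of cmd to a short string describing its behavior."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout"
    # Hash stdout and stderr together so that runs are cheap to compare.
    payload = result.stdout + b"\0" + result.stderr
    digest = hashlib.sha256(payload).hexdigest()[:12]
    return f"{result.returncode}:{digest}"

def is_stable(cmd, runs=10, timeout_s=10.0):
    """Return True if every run of cmd produced the same signature."""
    signatures = {run_signature(cmd, timeout_s) for _ in range(runs)}
    return len(signatures) == 1

if __name__ == "__main__":
    # Stand-in for a defective program that fails the same way every time.
    demo = [sys.executable, "-c", "print('boom'); raise SystemExit(2)"]
    print(is_stable(demo, runs=5))
```

Output that legitimately varies between runs, such as timestamps, process IDs, or memory addresses, would have to be filtered out before hashing, or this check will report instability that is not caused by the defect.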

Once you have completed this task, the same group of hypotheses should suggest themselves. These types of bugs often involve memory corruption, data-structure violations, or problems with the environment or platform.

Now that you’re getting the same undesired behavior with each execution of the program, you would be wise to try another level of stabilization. Try running the offending program in a different environment. By environment, we mean things like user account, environment variables, registry settings, system load, and so forth. A different environment still means on the same platform, by which we mean operating system and hardware architecture.
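Of the environment factors listed above, environment variables are the easiest to vary from a script. The following sketch runs the same command with variables overridden or removed; run_with_env is a hypothetical helper, and APP_MODE is an invented variable used only for the demonstration.

```python
import os
import subprocess
import sys

def run_with_env(cmd, overrides=None, remove=(), timeout_s=10.0):
    """Run cmd with a modified copy of the current environment.

    overrides: mapping of variable names to the values to set.
    remove:    names of variables to delete before running.
    Returns (return code, captured standard output).
    """
    env = dict(os.environ)          # copy, so this process is unaffected
    for name in remove:
        env.pop(name, None)
    env.update(overrides or {})
    result = subprocess.run(
        cmd, env=env, capture_output=True, text=True, timeout=timeout_s
    )
    return result.returncode, result.stdout

if __name__ == "__main__":
    # Stand-in program whose behavior depends on one environment variable.
    demo = [sys.executable, "-c",
            "import os; print(os.environ.get('APP_MODE', 'unset'))"]
    print(run_with_env(demo, overrides={"APP_MODE": "debug"}))
    print(run_with_env(demo, remove=["APP_MODE"]))
```

Comparing the results of a run under the submitter’s settings with a run under the developer’s settings narrows down which variable, if any, the failure depends on. User accounts, registry settings, and system load need other means and are outside this sketch.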

Under what circumstances is this type of stabilization helpful? If a problem occurs on the submitter’s account, but not on the developer’s account, a hypothesis of initialization errors, data-structure violations, or memory problems is a good place to start. The same hypotheses are also indicated if a program fails when running standalone but works correctly when running under a debugger.

Finally, it’s sometimes useful to try running the offending program on a different platform. Of course, if the program relies on specific features of the operating system or hardware, this may not be possible. Even if a completely different platform is out of the question, you may be able to use related platforms, such as different versions of the same operating system. For example, you might not be able to take a program developed on a version of UNIX™ and run it on Windows™, but you might be able to run it on Linux™.

Under what circumstances is this type of stabilization helpful? If you run the program on a different platform and notice that the discrepancies in floating-point output change, a hypothesis of value corruption due to different floating-point hardware or libraries is reasonable. If the application is dependent on vendor-provided versions of industry-standard libraries, and you see differences in behavior from one platform to another, a hypothesis of problems in other people’s software is warranted.
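To judge whether floating-point output really differs between platforms, it is usually better to compare values within a tolerance than to compare printed text. A minimal sketch, assuming the numeric results from each platform have already been parsed into lists; float_discrepancies and the default tolerances are illustrative.

```python
import math

def float_discrepancies(reference, candidate, rel_tol=1e-9, abs_tol=0.0):
    """List the positions where two runs' numeric outputs disagree.

    Returns a list of (index, reference value, candidate value) for every
    pair that is not equal within the given tolerances.
    """
    if len(reference) != len(candidate):
        raise ValueError("outputs have different lengths")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(reference, candidate))
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
    ]

if __name__ == "__main__":
    platform_a = [1.0, 2.0, 3.0]      # made-up results from one platform
    platform_b = [1.0, 2.001, 3.0]    # made-up results from another
    print(float_discrepancies(platform_a, platform_b))
```

Differences in the last few digits are common across floating-point hardware and math libraries. Discrepancies well beyond the tolerance are the ones that support the value-corruption hypothesis.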

Stabilization is one of the first things we do when diagnosing a bug. If done correctly, it can provide you with a number of qualified working hypotheses.