Professor Solomon isn’t a professor, nor is he a detective. His advice on how to find lost objects is nonetheless in the best tradition of Sherlock Holmes and Lord Peter Wimsey. Those fictional detectives spend little of their literary lives looking for lost objects. Real detectives, however, are often hired to find missing persons and sometimes missing objects as well.
Most of Professor Solomon’s book is devoted to his twelve principles for finding lost objects:
Don’t look for it.
It’s not lost—you are.
Remember the three c’s.
It’s where it’s supposed to be.
Look for domestic drift.
You’re looking right at it.
The camouflage effect.
Think back.
Look once, look well.
The eureka zone.
Tail thyself.
It wasn’t you.
Our approach to applying the professor’s method is to make an analogy between lost objects and unknown causes of defects. The lost object is instead an action or lack of action occurring in an unknown location in a program. The visibility of an object is instead the understanding of how a piece of code causes the symptom.
Professor Solomon explains the somewhat cryptic advice of his first principle, “Don’t look for it,” with the following suggestion: “Wait until you have some idea where to look.”
Professor Solomon’s approach fits perfectly with ours. After all, we advocate debugging by thinking, not debugging by playing around in an interactive debugger or debugging by hacking source code. Professor Solomon advocates finding things by thinking.
The first thing not to do when you have found or received evidence of a defect is to start looking at source code. The second thing not to do is to start running the application and looking at output.
Here are some things you should do when you first start thinking about a bug:
Make a list of criteria you will use to qualify modules or procedures for or disqualify them from investigation.
Make a list of similar defects you have seen before, particularly in this application.
Make a list of root causes of those defects and choose the ones that are possible hypotheses.
Once you have created the first list, you can start applying your criteria and eliminating parts of the application from further consideration. Once you have created the second list, you can select those parts that remain that have demonstrated similar problems in the past. Once you have created the third list, you can start thinking about how to collect information from these parts to prove or disprove your hypotheses.
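One way to picture the first two lists is as a filter followed by a ranking. The sketch below is only an illustration under assumed names: the `Module` record, its two criteria fields, and the sample modules are hypothetical, and real criteria would come from the symptom you are investigating.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    touches_output: bool    # hypothetical criterion: writes the bad value
    changed_recently: bool  # hypothetical criterion: modified since the last good build
    past_defects: int       # similar defects seen in this module before


def shortlist(modules):
    """Apply the criteria list, then order the survivors by defect history."""
    qualified = [m for m in modules if m.touches_output or m.changed_recently]
    return sorted(qualified, key=lambda m: m.past_defects, reverse=True)


modules = [
    Module("parser", False, False, 4),
    Module("report", True, False, 1),
    Module("pricing", False, True, 3),
]
candidates = [m.name for m in shortlist(modules)]
print(candidates)
```

The parser is dropped despite its history because it meets neither criterion. The remaining modules are then ordered by how often similar defects appeared in them.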
Professor Solomon expands on his second principle with the following qualification: “There are no missing objects. Only unsystematic searchers.”
Lost objects are objects whose location is currently unknown to you, not objects whose location is unknowable. In the same way, a bug is a behavior that is as yet unexplained. It isn’t impossible to explain it; it’s just not yet explained.
We suggest a number of different ways to search systematically for the causes of defects. To a surprising degree, it doesn’t matter which system you use, so much as whether you use any system at all.
A systematic searcher can tell you what he or she has already searched, what he or she is currently searching, and what remains to be searched. Some people search systematically, but have never put their thought processes into words. You can assess whether you’re searching systematically by writing down the details of your search. When you read the journal at a later time, you should see the system you’re using, even if you weren’t conscious of it at the time.
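A search journal does not need to be elaborate. The following sketch, with a hypothetical `SearchJournal` class and made-up procedure names, records the three things a systematic searcher can report: what has been searched, what is being searched, and what remains.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SearchJournal:
    remaining: list                               # locations not yet examined
    current: Optional[str] = None                 # location under examination
    searched: list = field(default_factory=list)  # (location, finding) pairs

    def start(self, location):
        if self.current is not None:
            raise RuntimeError(f"finish {self.current!r} first")
        self.remaining.remove(location)  # raises ValueError if it was never planned
        self.current = location

    def finish(self, finding):
        self.searched.append((self.current, finding))
        self.current = None

    def status(self):
        return {
            "searched": [location for location, _ in self.searched],
            "current": self.current,
            "remaining": list(self.remaining),
        }


journal = SearchJournal(remaining=["read_input", "compute_total", "format_report"])
journal.start("format_report")
journal.finish("formatting verified; the value is already wrong on entry")
journal.start("compute_total")
print(journal.status())
```

Keeping the findings alongside the locations is a deliberate choice: reading the journal later should show why each location was ruled out, not just that it was visited.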
“To find a lost object, you must be in the proper frame of mind.”
There are several aspects of comfort when programming. Your eyes should be comfortable looking at the display. Your body should be comfortable sitting in the chair. The posture you must take to reach the keyboard and mouse should be comfortable from the chair. The work area should be comfortable to rest your hands on as you type or move the mouse.
Why does Professor Solomon recommend getting comfortable before tackling a difficult search? The most obvious reason is that the physical tension caused by squinting, sitting in unnatural positions, and so forth translates into mental tension. This tension isn’t conducive to thinking.
Young programmers, particularly students, are more likely to think that they’re immune to ergonomic problems. This is foolish. There are people who have been programming since their early teens who have done permanent damage to their hands by their mid-twenties.
If a bug has never put you in a frantic state of mind, either you haven’t been programming long or you’re a rare individual. Whether it’s a live customer demonstration or a project deadline, bugs manifest themselves at the most inconvenient times. You won’t be effective in finding a difficult bug if you’re in a state of panic.
There are several ways to get yourself back into the right frame of mind. Perhaps you or your spouse have learned breathing exercises as a part of a natural childbirth course. They are very effective outside the delivery room as well. Some people also find prayer or meditation an effective means of relaxing.
You can also listen to music that is relaxing while you debug. Just because you enjoy a particular genre of music doesn’t mean it will relax you. For example, if you like symphonic music, composers who wrote prior to 1800 will probably be the most helpful. If you like popular music, some New Age music may prove helpful. You want composers who soothe emotions, rather than stir them.
When a bug shows itself in code that you thought you had thoroughly tested, it can shake your confidence. When you spend fruitless hours trying to track it down, this feeling is only compounded.
If you use the methods suggested in this book, you can restore your confidence. First, you’re using methods that have proven effective for thousands of programmers. Second, you need to put the bug that is currently tormenting you into perspective. Finding the cause of a defect isn’t as hard as finding the answer to life, the universe, and everything. It is also likely that, no matter how terrible this problem may seem to you now, a year or five years from now you won’t even remember it.
“Believe it or not, things are often right where they’re supposed to be.”
To apply this concept, we have to stand it on its head. We aren’t supposed to be putting any bugs in our software, so we certainly don’t have a standard place to put them.
So, now the question becomes, where is a defect not supposed to be? Here we have to work backward.
The defect shouldn’t be in the code that shows the defective value. So, start by verifying the code that formats or writes the output that is incorrect. You may trust system libraries that do the conversion, but check the control inputs you provide to those libraries. You can’t determine where the problem is until you can trust the code that is indicating a problem exists.
The defect shouldn’t be in the code that computes the value. So, you must verify the code that calculates the values. Do you own a calculator or hand-held computer? You should. They are quite valuable in manually checking complex calculations.
The defect shouldn’t be in the code that reads in the values that are used as input to the calculations. So, you must verify the code that reads and converts the input, and then verify the input values themselves. No paranoia goes unrewarded.
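The backward order of verification can be sketched as three small checks, run from the output stage toward the input stage. Everything here is hypothetical: the three stage functions stand in for real application code, and a truncation defect has been planted in `read_input` so that the checks have something to find.

```python
def read_input(text):
    """Parse whitespace-separated prices (planted defect: truncates decimals)."""
    return [int(float(token)) for token in text.split()]


def compute_total(prices):
    return sum(prices)


def format_total(total):
    return f"{total:.2f}"


def check_backward():
    """Verify the output stage first, then the computation, then the input."""
    return [
        # Output: feed the formatter a value whose rendering is known.
        ("format", format_total(1234.5) == "1234.50"),
        # Computation: compare against a figure worked out by hand.
        ("compute", compute_total([1.0, 2.0, 3.5]) == 6.5),
        # Input: confirm that known text converts to the expected values.
        ("input", read_input("1 2 3.5") == [1.0, 2.0, 3.5]),
    ]


def first_untrusted_stage():
    for stage, ok in check_backward():
        if not ok:
            return stage
    return None


print(first_untrusted_stage())
```

Each check exercises one stage in isolation with values that can be confirmed by hand, so a failure points at that stage rather than at something upstream of it.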
“Many objects do have a designated or customary place where they are kept. But the reality is that they aren’t always returned there. Instead, they are left wherever last used.”
The customary place for a bug to occur is the last place that was modified. The place where an incorrect value is created is often not the place where it is observed.
Defective values tend to drift down the data-flow graph. A data-flow graph is a collection of nodes and arcs in which the nodes are places where variables are assigned or used, and the arcs show the relationship between the place where a variable is assigned and the places where the assigned value is subsequently used.
To find the source of a defective value that has drifted down the data-flow graph, work backward from the manifestation to the definitions. The difficulty of performing this analysis depends on the scope of the variable. If it’s a local variable on the stack, your search can be more limited. If it’s a global variable, you may have to review many procedures to develop a graph that shows the chain of values.
There are several ways to develop a data-flow graph. If you have a compiler or tool that generates cross-reference tables, it will do much of the dirty work for you. Failing that, a simple text search with a tool like the UNIX™ command grep can help you identify the places where a variable is assigned or used. A slicing tool, explained in Chapter 14, is most helpful.
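As a rough illustration of what a cross-reference table provides, the sketch below lists the lines where a variable is assigned and the lines where it is used. It handles Python source only, through the standard `ast` module, and it ignores scope, so it is closer to a careful text search than to a real data-flow analysis. The helper name `defs_and_uses` and the sample program are made up for this example.

```python
import ast


def defs_and_uses(source, name):
    """Return (assignment_lines, use_lines) for one variable name."""
    defs, uses = [], []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id == name:
            if isinstance(node.ctx, ast.Store):
                defs.append(node.lineno)
            else:
                # Load contexts, and the rare Del context, are counted as uses.
                uses.append(node.lineno)
    return sorted(defs), sorted(uses)


SAMPLE = """\
total = 0
for price in prices:
    total = total + price
print(total)
"""
print(defs_and_uses(SAMPLE, "total"))
```

Working backward from the `print` on line 4, the candidate definitions are lines 1 and 3. One limitation to keep in mind: an augmented assignment such as `total += price` is reported only as an assignment, although it also reads the old value.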
“It is possible to look directly at a missing object and not see it. This is due to the agitated state of mind that often accompanies a misplacement. Go back and look again. Return to your armchair and get calm.”
It is possible to look right at a problem in code and not see it. This happens because you confuse what you know the code is supposed to do with what it is actually doing. One way to overcome this blindness is to explain what you think the code is doing to someone else. This method has been discussed previously.
A second way to break out of your mental model is to hand-execute the code you suspect to determine whether it does what you think it does. We form mental models when we read code, and these mental models aren’t always correct. Write down the line number of each statement as it is encountered and the value of each variable as it is assigned. This method has been recommended in the past as a way to check code and is called desk checking. Here we’re using it not to determine the correctness of a program, but to understand the behavior of a code segment we know to be incorrect.
Another way to break out of the mental model you have formed is to run a detailed trace of the code in question. In such a trace, the line number of each executable statement is printed as it’s encountered. For each executable statement, the values stored to memory and the names of the variables assigned are also printed. Such traces can generate a lot of output. They can show you that statements aren’t being executed in the order you thought or that the values being generated aren’t the values you thought.
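Where the language provides a tracing hook, a detailed trace takes only a few lines to set up. The sketch below uses Python's `sys.settrace` to record, before each line of one function executes, the line's offset from the `def` statement and the local variables at that moment. The `trace_lines` helper and the `average` function are hypothetical examples.

```python
import sys


def trace_lines(func, *args):
    """Run func and record (line offset, local variables) before each line runs."""
    records = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace other functions
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            records.append((offset, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return result, records


def average(values):
    total = 0
    for v in values:
        total += v
    return total / len(values)


result, records = trace_lines(average, [2, 4])
for offset, local_vars in records:
    print(offset, local_vars)
```

Even for this tiny function the trace runs to several lines, which illustrates the volume such traces can produce. Restricting the tracer to a single function is one way to keep the output manageable.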
If you know that a given piece of code must be the source of a problem, but after a thorough investigation, you’re unable to identify the source, perhaps you should just write the code over again. There is a point at which it’s more cost effective to rewrite a piece of code than it is to stare at it. The smaller the code segment in question, the quicker you come to that point.
“Your object may be right where you thought it was—but it has become hidden from view. Be sure to check under anything that could be covering your object, having inadvertently been placed on top of it.”
You may have identified a particular code segment as the cause of a problem, yet been unable to see the problem source. There are a number of programming constructs that can place a problem out of view:
Complex language constructs (C++, Ada, PL/I)
Some languages, such as C, C++, and PL/I, are normally used with a preprocessing phase. Preprocessor macros can obscure the source of a problem. If you’re sure a given source file contains a problem, consider reading the output of the preprocessor. Most compilers that apply preprocessors provide a command-line option to generate this output, even if the preprocessor has been integrated into the compiler. There are publicly available tools that will selectively preprocess a file, which will reduce the amount of code you will have to review. When you apply preprocessing selectively, you can control which files will be included and which macro definitions will be expanded.
All high-level languages have procedures that can be invoked, whether they’re called subroutines, functions, or methods. Most modern languages have exception-handling facilities, and even C has the primitive setjmp/longjmp facility. Procedure invocation and exception handling are both very powerful programming facilities, but they can obscure the cause of a problem because they introduce nonlocal flow of control.
If a code segment you suspect has procedure invocations, you have two choices. You can turn off the calls, perhaps by commenting them out or using a preprocessor command, and determine whether the problem still exists.
Or, you can go through the process of determining that all of the code executed by the called procedure isn’t the problem. You can take a similar approach to code invoked by exception handlers. These can involve quite a bit of effort to verify, since the handler may not even be visible in the code you suspect.
As a last resort, you can look at the assembly code or bytecode generated by the compiler you’re using. While this is very tedious, it’s sometimes necessary to see what is really going on in a code segment you know is the source of a problem.
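For languages that compile to bytecode, the listing is often one library call away. As an illustration in Python, the standard `dis` module prints the bytecode for a function. The `scale` function and the `opnames` helper are hypothetical, and the exact instruction names differ between interpreter versions, so the sketch does not assume any particular listing.

```python
import dis


def scale(x, factor=2):
    return x * factor + 1


def opnames(func):
    """List the bytecode operation names the compiler produced for func."""
    return [instruction.opname for instruction in dis.get_instructions(func)]


dis.dis(scale)         # human-readable listing; opcodes vary by Python version
print(opnames(scale))  # the same information as a plain list
```

Reading the listing shows the order in which operands are loaded and operations applied, which is the kind of detail that source code can hide.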
“You were there when the object was put down—was left in an obscure location—was consigned to oblivion. You were there—because you did it! So you must have a memory—however faint—of where this happened.”
These comments obviously only apply to code for which you’re solely responsible. You can think back much more easily if you have made a record of your actions.
There are several automatic ways you can keep track of what you did. There are also at least three levels of history you may want to track.
Changes to a single file during an editing session occur with high frequency. Source files can remain open across compilations in both command-line environments and integrated development environments (IDEs).
Some commercial editing tools and IDEs keep an “undo” history in a separate file, which you may want to archive. Source code is freely available for the most popular command-line editors: GNU Emacs is distributed by the GNU project, and open-source implementations of vi, such as Vim, are also available. If you want to track changes at this level, one way to do so is to use a modified version of your editor. This special version writes the “undo” history to a file, either by a special command or at regular intervals.
Changes that you save when you complete editing of a given source file occur with medium frequency. If you develop in a command-line environment, make good use of the history features of the shell you’re using. Keep the maximum history that the shell allows. Memory is cheap compared with the cost of redoing work. This history tells you which files you edited last, and therefore which are the most logical places to look for a newly introduced bug.
You can extend the benefit of this history even further. Change your logout procedure so it appends your history to a separate file. This will make it possible to recall actions taken in previous logon sessions. This history will contain, among other things, both editor invocations and execution of whatever source control system you’re using. There should be a correlation between edits and checkins, which can be verified automatically with a tool that processes your long-term history.
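A tool that correlates edits with checkins can be quite small. The sketch below makes several assumptions that may not match your environment: the saved history holds one command per line, the editors are invoked as `vi` or `emacs`, and checkins are done with the RCS `ci` command. The function name and the sample history are hypothetical.

```python
import shlex

EDITORS = {"vi", "emacs"}  # assumed editor command names


def edited_but_not_checked_in(history_lines):
    """Return files edited since their last RCS check-in, in order of first edit."""
    pending = []
    for line in history_lines:
        words = shlex.split(line)
        if not words:
            continue
        # Treat every non-option argument as a file name.
        files = [word for word in words[1:] if not word.startswith("-")]
        if words[0] in EDITORS:
            for name in files:
                if name not in pending:
                    pending.append(name)
        elif words[0] == "ci":  # RCS check-in
            pending = [name for name in pending if name not in files]
    return pending


history = [
    "vi parser.c",
    "make",
    "ci -l parser.c",
    "vi report.c pricing.c",
    "ci -l pricing.c",
    "emacs parser.c",
]
print(edited_but_not_checked_in(history))
```

Files that appear in the result were edited after their most recent checkin, or were never checked in at all, which makes them natural first suspects for a newly introduced bug.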
If you’re developing with an IDE, use the feature that lists recently opened files on the File menu. Some systems provide you with the ability to set the number of files listed there. Set this value to the maximum allowed.
The least frequently changed level of history is source-code checkins. You should be using a source control system of some sort. The use of source control systems is discussed later in this chapter.
“Once you’ve checked a site, do not go back and check again. No matter how promising a site—if the object wasn’t there the first time, it won’t be there the second. Assuming, of course, that your first check was thorough.”
At each potential problem location, use all relevant tools to identify the source of the problem. See Chapter 14 for some suggestions for tools you might not be using.
If you’re going to ensure that you don’t visit a code segment more than once, it’s important to keep a record of where you have looked so far. This might seem trivial when you’re tracking down bugs that take only ten or fifteen minutes to resolve. It is much more important when you are working on a bug that takes ten or fifteen days to diagnose.
There are several ways to keep a record of your search. You can put a comment in each procedure as you investigate it. You can make a handwritten list or type your list into an editor screen or a hand-held computer.
You can even use a voice-activated recorder. This has the advantage of not requiring you to remove your hands from the keyboard. Of course, if you work in an office with cubicles, your coworkers might not appreciate your new technique.
If you don’t know how to describe an investigation verbally, watch a television program that shows physicians handling medical emergencies. They have an excellent protocol for describing what they see as they encounter it.
“The majority of lost objects are right where you figure. Others, however, are in the immediate vicinity of that place. They have undergone a displacement.”
Physical locality isn’t a primary concern when debugging software. Displacements of problems are more likely to occur temporally than spatially. Here is a list of items that are likely to be temporally displaced from the source of a problem:
Suspect those variables that were most recently modified before the problem became visible.
Suspect those statements that were most recently executed in the current procedure before the problem became visible.
Suspect those procedures that are closest on the call stack to the procedure where the problem became visible.
Physical displacement of problems is mostly a concern when using languages that provide arbitrary pointer manipulation, such as C and C++. Here is a list of items that are likely to be spatially displaced from the source of a problem:
Suspect references to variables in the same heterogeneous storage construct as the variable in which the problem became visible. This applies to constructs such as struct in C, class in C++, or COMMON in Fortran.
Suspect references to variables that are on the stack at the same time as the variable that manifests an incorrect value. These variables are chiefly the local variables in the same procedure but can also include the arguments.
Suspect references to storage allocated on the heap at about the same time as the variable in which the problem became visible.
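Spatial displacement within a heterogeneous storage construct can be demonstrated without writing C, because Python's `ctypes` module lays out structures the way a C compiler would. In the sketch below, the `Record` structure and its field names are hypothetical. An unchecked copy aimed at the 4-byte `name` field spills into the neighboring `count` field, so the symptom appears in a variable that the faulty statement never mentions.

```python
import ctypes


class Record(ctypes.Structure):
    # Mirrors a C declaration such as: struct { char name[4]; int32_t count; }
    _fields_ = [("name", ctypes.c_char * 4), ("count", ctypes.c_int32)]


def overrun_demo():
    record = Record()
    record.count = 7
    # An unchecked 8-byte copy into the 4-byte name field, as a careless
    # memcpy might do in C. The copy stays inside the 8-byte structure,
    # but its last 4 bytes land in count.
    ctypes.memmove(ctypes.addressof(record), b"ABCDEFGH", ctypes.sizeof(record))
    return record.count


print(overrun_demo())
```

The value printed is no longer 7. Anyone inspecting only the statements that assign `count` would find nothing wrong, which is why references to the neighboring fields deserve suspicion.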
“If you still haven’t found your object, it may be time to Recreate the Crime. Remove your thinking cap and don your detective’s cap. You are about to follow your own trail.”
Earlier in this chapter, we recommended keeping the history of your work on several levels. If you want to be able to “tail yourself,” you need the information provided by a revision control system.
If you want to be a productive debugger, be a revision control fanatic. Check in everything you work on, and do so on a daily or even hourly basis.
If you are a part of a group programming project that uses a heavy-duty commercial source control system, you may want to use a lightweight source control system to keep track of your own work. Group projects generally want you to check things in only when they’re in a stable state. This is laudable but contradictory to the philosophy of using a source control system to enable you to “tail yourself.” You want to be able to compare source files that you know don’t work, not just the ones that you have determined are stable.
The simplest source control methodology is to make a separate directory at regular intervals, and copy your source files into that directory.
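The directory-copy approach can be automated in a few lines. This is a minimal sketch, assuming C source files in a single flat directory. The `snapshot` function, its parameters, and the directory names are hypothetical, and the demonstration runs inside a temporary directory so that it touches nothing else.

```python
import shutil
import tempfile
import time
from pathlib import Path


def snapshot(source_dir, archive_dir, pattern="*.c"):
    """Copy matching files into a new timestamped directory and return its path."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(archive_dir) / stamp
    target.mkdir(parents=True, exist_ok=False)  # refuse to overwrite a snapshot
    for path in sorted(Path(source_dir).glob(pattern)):
        shutil.copy2(path, target / path.name)  # copy2 keeps modification times
    return target


with tempfile.TemporaryDirectory() as work:
    src = Path(work) / "src"
    src.mkdir()
    (src / "main.c").write_text("int main(void) { return 0; }\n")
    saved = snapshot(src, Path(work) / "snapshots")
    saved_names = [p.name for p in saved.iterdir()]
print(saved_names)
```

Because the directory name carries a timestamp to the second, two snapshots taken within the same second will collide. The sketch fails loudly in that case rather than silently merging them.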
To go beyond this, you should use one of the good source control systems that are publicly available.
The Revision Control System (RCS) [Ti85] has been used on UNIX™ systems for nearly two decades, and it’s perfectly adequate for projects with a small number of programmers. It doesn’t handle multisite projects, nor does it have a concept of multifile modules.
The Concurrent Versions System (CVS) [Be90] supports multiple programmers working on a single source base, multiple directories of source, and distributed development across a wide-area network. Both RCS and CVS are open-source systems available from the GNU project.
While debugging, each time you make a change to test a hypothesis, mark the changed code with comments. This also means that you won’t delete code, just render it nonoperational. You can do this by commenting it out, putting it under the control of a conditional compilation directive, or putting it under the control of a control-flow statement that is never executed.
You can’t see changes that cause a problem if you can’t track the deltas. Here are some tags you can use to indicate tentative changes:
// comment out deleted code
// D
// A: added code
// C: changed code
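Once tentative changes carry tags like these, a short script can list them all before you decide which to keep. The sketch below is one possible scanner, not a standard tool. It assumes `//` comments, a colon after the A and C tags, and a D tag followed by a space, a colon, or the end of the line. The sample C fragment is made up.

```python
import re

# "// A:" and "// C:" need the colon; "// D" may stand alone.
TAG = re.compile(r"//\s*(?:([AC]):|(D)(?=[\s:]|$))")
LABELS = {"A": "added", "C": "changed", "D": "deleted"}


def tentative_changes(source):
    """Return (line number, kind of change) for every tagged line."""
    found = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = TAG.search(line)
        if match:
            tag = match.group(1) or match.group(2)
            found.append((number, LABELS[tag]))
    return found


SAMPLE = """\
int total = 0;
// D total = total + tax;
total = total + tax * rate;  // A: added code
limit = 10;                  // C: changed code
// Compute the report
"""
print(tentative_changes(SAMPLE))
```

The ordinary comment on the last line is not reported, because the letter after `//` is not followed by a colon. A comment that happens to begin with a lone `D` would be a false positive, so treat the output as a list of places to review.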
“When all else has failed, explore the possibility that your object hasn’t been misplaced. Rather, it has been misappropriated. Perhaps someone you know has borrowed or moved the object you are looking for.”
After you have made a heroic effort to diagnose your problem, it’s reasonable to consider whether your code may not be the culprit. Problems that show up in applications can be caused by libraries and middleware, compilers, and operating systems. The developers of these systems are mere mortals too, and they have their share of bugs. On the other hand, they also typically have the benefit of large numbers of users to exercise their code and work out the bugs. This is why you should suspect your code first.
If you want to get help with a problem in other people’s software, it’s important to create a minimal test case that demonstrates the problem. The downside to having all those users is that software vendors receive many bogus defect reports from those same users. The natural human tendency is to look at the problems that are easiest to diagnose, and those are the ones with the shortest test cases. You are far more likely to get a rapid response from a responsible software vendor with a well-structured test case than by pressure tactics.
A well-designed test case is self-contained, prints a simple pass/fail message, and takes up less than a page of code. When we submit such defect reports, we take great care to double- and triple-check the behavior with which we’re unhappy. There are few things more embarrassing than submitting a bogus bug report and displaying your ignorance in public.
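The shape of such a test case is more important than its content. In the sketch below, the built-in `sorted` function is only a stand-in for the vendor routine you suspect, and the expected value is one you would have confirmed against the vendor's documentation. The function name is hypothetical.

```python
def behaves_as_documented():
    """Return True when the routine under suspicion behaves as documented."""
    observed = sorted(["b", "A", "c"], key=str.lower)  # stand-in for the vendor call
    expected = ["A", "b", "c"]                         # taken from the documentation
    return observed == expected


verdict = "PASS" if behaves_as_documented() else "FAIL"
print(verdict)
```

The case depends on nothing outside itself, prints a single word, and fits on a fraction of a page. If it prints PASS when you expected FAIL, that is the moment to recheck your own understanding before filing the report.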