The Debugging Process | Debugging Applications for Microsoft® .NET and Microsoft Windows® (Pro-Developer)

Finally, let's start talking about hands-on debugging by discussing the debugging process. Determining a process that works for all bugs, even "freak" bugs (bugs that come out of the blue and don't make any sense), was a bit challenging. But by drawing on my own experiences and by talking to my colleagues about their experiences, I eventually came up with a debugging approach that all great developers intuitively follow but that less experienced (or just poorer) developers often don't find obvious.

As you'll see, this debugging process doesn't take a rocket scientist to implement. The hard part is making sure you start with this process every time you debug. Here are the nine steps involved in the debugging approach that I recommend:

Step 1: Duplicate the bug

Step 2: Describe the bug

Step 3: Always assume that the bug is yours

Step 4: Divide and conquer

Step 5: Think creatively

Step 6: Leverage tools

Step 7: Start heavy debugging

Step 8: Verify that the bug is fixed

Step 9: Learn and share

Depending on your bug, you can skip some steps entirely because the problem and the location of the problem are entirely obvious. You must always start with Step 1 and get through Step 2. At any point between Step 3 and Step 7, however, you might figure out the solution and be able to fix the bug. In those cases, after you fix the bug, skip to Step 8 to verify and test the fix. Figure 1-1 illustrates the steps of the debugging process.

Figure 1-1 The debugging process

Step 1: Duplicate the Bug

The most critical step in the process is the first one: duplicating the bug. This is sometimes difficult, or even impossible, but if you can't duplicate a bug, you probably can't eliminate it. When trying to duplicate a bug, you might need to go to extremes. I had one bug in my code that I couldn't duplicate just by running the program. I had an idea of the data conditions that might cause it, however, so I ran the program under the debugger and entered the data I needed to duplicate the bug directly into memory. It worked. If you're dealing with a synchronization problem, you might need to take steps such as loading the same tasks so that you can duplicate the state in which the bug occurred.

Once you've duplicated the bug by using one set of steps, you should evaluate whether you can duplicate the bug through a different set of steps. You can get to some bugs via one code path only, but you can get to other bugs through multiple paths. The idea is to try to see the behavior from all possible angles. By duplicating the bug from multiple paths, you have a much better sense of the data and boundary conditions that are causing the problems. Additionally, as we all know, some bugs can mask other bugs. The more ways you can find to duplicate a bug, the better off you'll be.
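One way to keep track of which routes reach a bug is to script each candidate route and record what happens. The following is only a sketch: `try_repro_paths`, `average`, and the three sample routes are hypothetical names invented for illustration, and it assumes the bug surfaces as a raised exception.

```python
from typing import Callable, Dict, Optional


def try_repro_paths(paths: Dict[str, Callable[[], object]]) -> Dict[str, Optional[str]]:
    """Run each named reproduction route and record the failure it triggers, if any."""
    results: Dict[str, Optional[str]] = {}
    for name, steps in paths.items():
        try:
            steps()
        except Exception as exc:  # the bug reproduced via this route
            results[name] = f"{type(exc).__name__}: {exc}"
        else:
            results[name] = None  # this route did not reach the bug
    return results


def average(values):
    """Hypothetical buggy function: fails when given no values."""
    return sum(values) / len(values)


report = try_repro_paths({
    "empty input": lambda: average([]),
    "all values filtered out": lambda: average([v for v in [1, 2] if v > 5]),
    "normal input": lambda: average([1, 2, 3]),
})
```

In this made-up case two of the three routes hit the same failure, which points at the shared data condition (an empty sequence) rather than at anything specific to either route.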

Even if you can't duplicate the bug, you should still log it into your bug tracking system. If I have a bug that I can't duplicate, I always log it into the system anyway, but I leave a note that says I couldn't duplicate it. That way, if another engineer is responsible for that section of the code, he or she at least has an idea that something is amiss. When logging a bug that you can't re-create, you need to be as descriptive as possible. If the description is good enough, that information might be sufficient for you or another engineer to solve the problem eventually.

Step 2: Describe the Bug

If you were a typical engineering student in college, you probably concentrated on your math and engineering classes and barely passed your writing classes. In the real world, your writing skills are almost more important than your engineering skills because you need to be able to describe your bugs, both verbally and in writing. When faced with a tough bug, you should always stop right after you duplicate it and describe it. Ideally, you would do this in your bug tracking system, even if it's your responsibility to debug the bug, but talking it out is also useful. The main reason for describing the bug is that it often helps you fix it. I can't remember how many times I've been able to figure out a bug just by describing it to someone.

That "someone" doesn't even have to be a human. My cat, as it turns out, is an excellent debugger, and she has helped me solve a number of nasty bugs when I talked to her about them. For those bugs my cat couldn't solve, describing them to her gave me good practice for when I presented them to my human colleagues.

Of course, your colleagues can help you out only if you're able to describe your bugs so that they can understand them. That is why having strong communication skills is so important.

Step 3: Always Assume That the Bug Is Yours

In all the years I've been in software development, only a small percentage of the bugs I've seen were the result of the compiler or the operating system. If you have a bug, the odds are excellent that it's your fault, and you should always assume, and hope, that it is. If the bug is in your code, at least you can fix it; if it's in your compiler or the operating system, you have bigger problems. You should eliminate any possibility that the bug is in your code before spending time looking for it elsewhere.

Step 4: Divide and Conquer

If you've duplicated your bug and described it well, you have started a hypothesis about the bug and have an idea of where it's hiding. In this step, you start firming up and testing your hypothesis. To test the hypothesis, you can sometimes start by doing a little light debugging in the debugger. Light debugging involves checking states and variable values, not slogging through the code groping and guessing for a solution. If your hypothesis doesn't pan out in a few minutes, stop for a moment and reassess the situation. You've learned a little more about the bug, so now you can reevaluate your hypothesis and try again.

Debugging is like a binary search algorithm. You're trying to find where the bug is, and on each iteration through your different hypotheses, you are, hopefully, eliminating the sections of the program where the bug is not. As you continue to look, you eliminate more and more of the program until you can box the bug into a section of code. As you continue to develop your hypothesis and learn more about the bug, you can update your bug description to reflect the new information.
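The binary-search analogy can be taken literally when the suspect region is an ordered sequence, such as a list of input records or processing stages. Here is a minimal sketch, assuming a hypothetical `fails_with` check that reports whether the bug shows up when only a leading portion of the sequence is used, and assuming the failure is monotonic (once it appears, adding more items never makes it go away). Real bugs are often messier than this.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad_index(items: Sequence[T], fails_with: Callable[[Sequence[T]], bool]) -> int:
    """Return the index of the first item whose inclusion makes the bug appear.

    Assumes fails_with(items[:n]) is False for small n and True from some
    point on. Returns -1 if the full sequence does not fail at all.
    """
    if not fails_with(items):
        return -1
    lo, hi = 0, len(items)  # invariant: items[:lo] passes, items[:hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails_with(items[:mid]):
            hi = mid  # the culprit is somewhere in the first half
        else:
            lo = mid  # the first half is clean; eliminate it
    return hi - 1


# Hypothetical example: the bug appears as soon as a negative record is processed.
records = [3, 8, 1, -4, 7]
culprit = first_bad_index(records, lambda prefix: any(x < 0 for x in prefix))
```

Each pass through the loop throws away half of the remaining candidates, which is the same elimination described above, just applied to data instead of to hypotheses.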

When I'm in this step, I generally try out three to five solid hypotheses before moving on to the next step. The idea is to have a reason for running the debugger. Ideally, you can test your hypothesis without running the debugger and still prove or disprove it.

Step 5: Think Creatively

If the bug you're trying to eliminate is one of those nasty ones that happens only on certain machines or is hard to duplicate, start looking at the bug from different perspectives. This is the step in which you should start thinking about DLL version mismatches, operating system differences, problems with your program's binaries or its installation, and other external factors.
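A quick way to check for version mismatches is to collect a list of module versions from a machine where the program works and from one where it fails, and then compare them. The sketch below assumes you have already gathered each list into a dictionary of module name to version string; `diff_manifests` and the DLL names are hypothetical and only illustrate the comparison.

```python
from typing import Dict, Optional, Tuple


def diff_manifests(
    good: Dict[str, str], bad: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return the modules whose version differs between two machines.

    Each value is (version_on_good, version_on_bad); None means the module
    is missing on that machine.
    """
    differences: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    for name in sorted(set(good) | set(bad)):
        pair = (good.get(name), bad.get(name))
        if pair[0] != pair[1]:
            differences[name] = pair
    return differences


# Hypothetical manifests from a working machine and a failing one.
working = {"renderer.dll": "2.1.0", "codec.dll": "1.4.2", "helper.dll": "3.0.0"}
failing = {"renderer.dll": "2.1.0", "codec.dll": "1.3.9"}
mismatches = diff_manifests(working, failing)
```

Anything that shows up in the result is a candidate external factor worth ruling out before you go back into the code.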

A technique that sometimes works wonders for me is to walk away from the problem for a day or two. You can sometimes focus so intensely on a problem that you lose sight of the forest for the trees and start missing obvious clues. By walking away from the bug, you give your subconscious a chance to work on the problem for a while.

Step 6: Leverage Tools

I've never understood why some companies let their engineers spend weeks searching for a bug when spending a thousand dollars for error detection, performance, and code-coverage tools would help them find the current bug—and bugs they will encounter in the future—in minutes.

Before I do any heavy debugging, I always run my code through Compuware NuMega's BoundsChecker/SmartCheck (an error detection tool), TrueTime (a performance tool), and TrueCoverage (a code-coverage tool). I have better things to do with my time than play in the debugger, and so do you. Granted, because I helped write each of these products, I'm a little biased toward them; but other companies, such as Rational Software and MuTek Solutions, have products with functionality similar to NuMega's. The point is that if you're not using a third-party tool to help you debug your products, you're spending more time debugging than you need to be.

For those of you who are unfamiliar with these types of tools, let me explain what each of them does. An error detection tool looks for invalid memory accesses, invalid parameters to system APIs and COM interfaces, memory leaks, and resource leaks, among other things. A performance tool helps you track down where your application is slow, which is invariably somewhere other than where you think it is. A code-coverage tool shows you the source lines not executed when you run your program. Code-coverage information is helpful because if you're looking for a bug, you want to look for it only in lines that are executing.
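To make the code-coverage idea concrete, here is a toy line tracer. It is not how the commercial tools mentioned above work internally; it is a minimal sketch using Python's standard `sys.settrace` hook, and `executed_lines` and `classify` are hypothetical names chosen for illustration.

```python
import sys
from typing import Callable, Set


def executed_lines(func: Callable, *args) -> Set[int]:
    """Call func(*args) and return the line offsets (from its def line) that ran."""
    code = func.__code__
    hit: Set[int] = set()

    def tracer(frame, event, arg):
        if frame.f_code is code:
            if event == "line":
                hit.add(frame.f_lineno - code.co_firstlineno)
            return tracer  # keep tracing lines inside the target function
        return None  # ignore every other function

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return hit


def classify(value):
    if value < 0:
        return "negative"
    return "non-negative"


covered = executed_lines(classify, 5)
```

With the input 5, the `return "negative"` line (offset 2) never runs, so it is absent from `covered`. If the bug you are chasing shows up with that input, that line cannot be where it lives.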

Step 7: Start Heavy Debugging

I differentiate heavy debugging from the light debugging I mentioned in Step 4 by what you're doing in the debugger. When you're doing light debugging, you're just looking at a few states and a couple of variables. In contrast, when you're doing heavy debugging, you're spending a good deal of time exploring your program's operation. It is during the heavy debugging stage that you want to use the debugger's advanced features. Your goal is to let the debugger do as much of the heavy lifting as possible. Chapter 5 discusses the debugger's advanced features.

Just as when you're doing light debugging, when you're doing heavy debugging, you should have an idea of where you think your bug is before you start using the debugger, and then use the debugger to prove or disprove your hypothesis. Never sit in the debugger and just poke around.

Also, when you're doing heavy debugging, remember to regularly review the changes you made in the debugger to fix the bug. This double-checking is especially important in the later stages of the project, when you need to be careful not to destabilize the code base.

If you set up your project correctly and follow the debugging steps in this chapter and the recommendations in Chapter 2, you won't have to spend much time doing heavy debugging.

Step 8: Verify That the Bug Is Fixed

When you think you've finally fixed the bug, the next step in the debugging process is to test, test, and retest the fix. Did I also mention that you need to test the fix? If the bug is in an isolated module on a line of code called once, testing the fix is easy. However, if the fix is in a core module, especially one that handles your data structures and the like, you need to be very careful that your fix doesn't cause problems or have side effects in other parts of the project.

When testing your fix, especially in critical code, you should verify that it works with all data conditions, good and bad. Nothing is worse than a fix for one bug that causes two other bugs. If you do make a change in a critical module, you should let the rest of the team know that you made the change. That way, they can be on the lookout for any ripple effects as well.
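Testing "all data conditions, good and bad" is easier to do consistently when the cases are written down as a table and run together. The following is a sketch only: `parse_percent` stands in for whatever function you just fixed, and the case lists are invented examples of boundary, valid, and invalid inputs.

```python
from typing import List


def parse_percent(text: str) -> int:
    """Hypothetical fixed function: accept '0' through '100', reject everything else."""
    value = int(text.strip())  # raises ValueError on non-numeric input
    if not 0 <= value <= 100:
        raise ValueError(f"percentage out of range: {value}")
    return value


GOOD_CASES = {"0": 0, "100": 100, " 42 ": 42}  # boundaries plus a typical value
BAD_CASES = ["-1", "101", "", "abc", "4.5"]    # just outside the range, empty, malformed


def verify_fix() -> List[str]:
    """Return a description of every case that misbehaves; an empty list means the fix holds."""
    problems: List[str] = []
    for text, expected in GOOD_CASES.items():
        if parse_percent(text) != expected:
            problems.append(f"good input {text!r} gave the wrong result")
    for text in BAD_CASES:
        try:
            parse_percent(text)
        except ValueError:
            continue  # rejected, as it should be
        problems.append(f"bad input {text!r} was accepted")
    return problems
```

Keeping the table around after the fix goes in also gives the rest of the team something to rerun if they suspect a ripple effect later.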

Debugging War Story

Where Did the Integration Go?

The Battle

One of the developers I worked with at NuMega thought he'd found a great bug in NuMega's Visual C++ Integrated Development Environment (VC IDE) integration because it didn't work on his machine.

For those of you who are unfamiliar with NuMega's VC IDE integration, let me provide a little background information. NuMega's software products integrate with the VC IDE—and have for a number of years. This integration allows NuMega's windows, toolbars, and menus to appear seamlessly inside the VC IDE.

The Outcome

This developer spent a couple of hours using SoftICE, a kernel debugger, exploring the bug. After a while, he had set breakpoints all over the operating system. Finally, he found his "bug." He noticed that when he started the VC IDE, CreateProcess was being called with "\\R2D2\VCommon\MSDev98\Bin\MSDEV.EXE" instead of the "C:\VSCommon\MSDev98\Bin\MSDEV.EXE" he thought it should be. In other words, instead of running the VC IDE from his local machine (C:\VSCommon\MSDev98\Bin\MSDEV.EXE), he was running it from his old machine (\\R2D2\VCommon\MSDev98\Bin\MSDEV.EXE). How did this happen?

The developer had just gotten a new machine and had installed the full NuMega VC IDE integration for the products. To get it set up faster, he had copied his desktop links (LNK files) from his old machine, which had been installed without VC IDE integration, to his new machine by dragging them with the mouse. When you drag LNK files, the paths stored inside them update to reflect the location of the original link. Therefore, he was always starting the VC IDE from his desktop icon LNK, which had the pointer to the old machine, instead of running the VC IDE from his new machine. He'd been running it from his old machine all along.

The Lesson

The developer went about debugging the problem wrong by jumping right in with a kernel debugger instead of attempting to duplicate the problem in multiple ways. In Step 1 of the debugging process, "Duplicate the bug," I recommend that you try to duplicate the bug in multiple ways so that you can be assured that you're looking at the right bug, not just multiple bugs masking and compounding each other. If this developer had followed Step 5, "Think creatively," he would have been better off because he would have thought about the problem first instead of plunging right in.
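A sanity check on where a program is actually being launched from would have caught this much sooner. As a small illustration, the sketch below flags a Windows path that points at a network share rather than a local drive. The helper name `is_network_path` is hypothetical, and the check only recognizes UNC paths of the \\server\share form; a mapped network drive letter would slip past it.

```python
from pathlib import PureWindowsPath


def is_network_path(path: str) -> bool:
    """Return True if a Windows path is a UNC share rather than a local drive."""
    drive = PureWindowsPath(path).drive  # e.g. 'C:' or '\\\\server\\share'
    return drive.startswith("\\\\")


# The two paths from the war story above.
launched = r"\\R2D2\VCommon\MSDev98\Bin\MSDEV.EXE"
expected = r"C:\VSCommon\MSDev98\Bin\MSDEV.EXE"
launched_is_remote = is_network_path(launched)
expected_is_remote = is_network_path(expected)
```

`PureWindowsPath` parses the string without touching the file system, so the check behaves the same on any machine.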

Step 9: Learn and Share

Each time you fix a "good" bug (that is, one that was challenging to find and fix), you should take the time to quickly summarize what you learned. I like to record my good bugs in a journal so that I can later see what I did right in finding and fixing the problem. More important, I also want to learn what I did wrong so that I can learn to avoid dead ends when debugging and solve bugs faster. You learn the most about development when you're debugging, so you should take every opportunity to learn from it.

One of the most important steps you can take after fixing a good bug is to share the information you learned fixing the bug with your colleagues, especially if the bug is project specific. This information will help your coworkers the next time they need to eliminate a similar bug.

Final Debugging Process Secret

I'd like to share one final debugging secret with you: the debugger can answer all your debugging questions as long as you ask it the right questions. Again, I'm suggesting that you need to have a hypothesis in mind—something you want to prove or disprove—before the debugger can help you. Sometimes I even write out my hypothesis before I ever touch the debugger to ensure that I have a purpose each time I use it.

Remember that the debugger is just a tool, like a screwdriver. It does only what you tell it to do. The real debugger is the software in your hardware cranium.