The Debugging Process | Debugging Applications for Microsoft® .NET and Microsoft Windows® (Pro-Developer)

Finally, let's start talking about hands-on debugging by discussing the debugging process. Determining a process that works for all bugs, even "freak" bugs (bugs that come out of the blue and don't make any sense), was a bit challenging. But by drawing on my own experiences and by talking to my colleagues about their experiences, I eventually came up with a debugging approach that all great developers intuitively follow but that less experienced (or just poorer) developers often don't find obvious.

As you'll see, this debugging process doesn't take a rocket scientist to implement. The hard part is making sure you start with this process every time you debug. Here are the nine steps involved in the debugging approach that I recommend:

Step 1: Duplicate the bug
Step 2: Describe the bug
Step 3: Always assume that the bug is yours
Step 4: Divide and conquer
Step 5: Think creatively
Step 6: Leverage tools
Step 7: Start heavy debugging
Step 8: Verify that the bug is fixed
Step 9: Learn and share

Depending on your bug, you can skip some steps entirely because the problem and the location of the problem are entirely obvious. You must always start with Step 1 and get through Step 2. At any point between Step 3 and Step 7, however, you might figure out the solution and be able to fix the bug. In those cases, after you fix the bug, skip to Step 8 to verify and test the fix. Figure 1-1 illustrates the steps of the debugging process.

Figure 1-1: The debugging process

Step 1: Duplicate the Bug

The most critical step in the debugging process is the first one: duplicating the bug. This is sometimes difficult, or even impossible, but if you can't duplicate a bug, you probably can't eliminate it. When trying to duplicate a bug, you might need to go to extremes. I had one bug in my code that I couldn't duplicate just by running the program. I had an idea of the data conditions that might cause it, however, so I ran the program under the debugger and entered the data I needed to duplicate the bug directly into memory. It worked. If you're dealing with a synchronization problem, you might need to take steps such as loading the same tasks so that you can duplicate the state in which the bug occurred.

At this point you're probably thinking, "Well, duh! Of course the first thing you do is duplicate the bug. If I could duplicate it all the time, I wouldn't need your book!" It all depends on your definition of "duplicatability." My definition is duplicating the bug on a single machine once in a 24-hour period. That's sufficient for my company to come in to work on it. Why? Simple. If you can get it on one machine, you can throw 30 machines at it and get the bug duplicated 30 times. The big mistake people make with duplicating the bug is to not get as many machines as possible into the mix. If you have 30 people to manually punch keys for you, that's great. However, a valuable effort would be to automate the user interface to drive the bug out into the open. You can use either the Tester application from Chapter 17 or a commercial automated regression testing tool.
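As a rough illustration of letting automation do the key punching, the sketch below reruns one scenario until it fails and reports which run hit the bug. The names `run_until_failure` and `make_scenario` are hypothetical, and the stand-in scenario fails on a fixed call only so that the example is deterministic; a real driver would exercise your application's user interface.

```python
def run_until_failure(scenario, max_runs):
    """Call scenario() up to max_runs times.

    Returns (run_number, exception) for the first failing run,
    or None if every run passed.
    """
    for run_number in range(1, max_runs + 1):
        try:
            scenario()
        except Exception as exc:
            return run_number, exc
    return None


def make_scenario(fail_on_call):
    """Hypothetical stand-in for the automated steps; fails on one specific call."""
    calls = 0

    def scenario():
        nonlocal calls
        calls += 1
        if calls == fail_on_call:
            raise RuntimeError(f"bug reproduced on call {calls}")

    return scenario


# One "machine" repeating the same steps until the bug shows up.
outcome = run_until_failure(make_scenario(fail_on_call=7), max_runs=1000)
```

Running the same loop on many machines at once is what turns a once-a-day bug into one you can study.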

Once you've duplicated the bug by using one general set of steps, you should evaluate whether you can duplicate the bug through a different set of steps. You can get to some bugs via one code path only, but you can get to other bugs through multiple paths. The idea is to try to see the behavior from all possible angles. By duplicating the bug from multiple paths, you have a much better sense of the data and boundary conditions that are causing the problems. Additionally, as we all know, some bugs can mask other bugs. The more ways you can find to duplicate a bug, the better off you'll be.

Even if you can't duplicate the bug, you should still log it into your bug tracking system. If I have a bug that I can't duplicate, I always log it into the system anyway, but I leave a note that says I couldn't duplicate it. That way, if another engineer is responsible for that section of the code, she at least has an idea that something is amiss. When logging a bug that you can't duplicate, you need to be as descriptive as possible. If the description is good enough, it might be sufficient for you or another engineer to solve the problem eventually. A good description is especially important because you can correlate various non-reproducible bug reports, enabling you to start seeing patterns in the bug's behavior.
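To make the idea of correlating reports concrete, here is a minimal sketch that counts which details recur across several non-reproducible reports. The function name `shared_details` and the report fields (`os`, `action`, `locale`) are assumptions made up for the example, not the schema of any particular bug tracking system.

```python
from collections import Counter


def shared_details(reports, min_reports=2):
    """Return the (field, value) pairs that appear in at least min_reports reports."""
    counts = Counter()
    for report in reports:
        # set() keeps one report from counting the same detail twice
        counts.update(set(report.items()))
    return {detail: n for detail, n in counts.items() if n >= min_reports}


# Hypothetical descriptions pulled from three reports nobody could duplicate.
reports = [
    {"os": "Windows XP", "action": "save", "locale": "en-US"},
    {"os": "Windows 2000", "action": "save", "locale": "de-DE"},
    {"os": "Windows XP", "action": "print", "locale": "de-DE"},
]
patterns = shared_details(reports)
```

The more descriptive each report is, the more fields there are to line up, which is why a careful description pays off even when you can't duplicate the bug.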

Step 2: Describe the Bug

If you were a typical engineering student in college, you probably concentrated on your math and engineering classes and barely passed your writing classes. In the real world, your writing skills are almost more important than your engineering skills because you need to be able to describe your bugs, both verbally and in writing. When faced with a tough bug, you should always stop right after you duplicate it and describe it. Ideally, you do this in your bug tracking system, even if it's your responsibility to debug the bug, but talking it out is also useful. The main reason for describing the bug is that doing so often helps you fix it. I can't remember how many times another engineer's description helped me look at a bug in a different way.

Step 3: Always Assume That the Bug Is Yours

In all the years I've been in software development, only a minuscule percentage of the bugs I've seen were the result of the compiler or the operating environment. If you have a bug, the odds are excellent that it's your fault, and you should always assume and hope that it is. If the bug is in your code, at least you can fix it; if it's in your compiler or the operating environment, you have bigger problems. You should eliminate any possibility that the bug is in your code before spending time looking for it elsewhere.

Step 4: Divide and Conquer

If you've duplicated your bug and described it well, you have started a hypothesis about the bug and have an idea of where it's hiding. In this step, you start firming up and testing your hypothesis. The important thing to remember here is the paraphrased line from the movie Star Wars: "Use the source, Luke!" Read the source code, and desk-check what you think is happening against what the code really does. Reading the code will force you to take the extra time to look at the problem. Starting with the state of the machine at the time of the crash or problem, work through the various scenarios that could cause you to get to that section of code. If your hypothesis of what went wrong doesn't pan out, stop for a moment and reassess the situation. You've learned a little more about the bug, so now you can reevaluate your hypothesis and try again.

Debugging is like a binary search algorithm. You're trying to find the bug, and with each iteration through your different hypotheses, you are, hopefully, eliminating the sections of the program where the bug is not. As you continue to look, you eliminate more and more of the program until you can box the bug into a section of code. As you continue to develop your hypothesis and learn more about the bug, you can update your bug description to reflect the new information. When I'm in this step, I generally try out three to five solid hypotheses before moving on to the next step. If you feel you're getting close, you can do a little "light" debugging in this step to do final verification of the hypothesis. By light, I mean double-checking states and variable values, not slogging through looking at everything.
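The binary search analogy can be taken literally when you have an ordered list of candidates, such as a series of builds or a growing input file. The sketch below is one way to do that; `first_bad` and the `is_bad` callback are hypothetical names, and the search assumes that every candidate after the first bad one is also bad.

```python
def first_bad(candidates, is_bad):
    """Return the index of the first candidate for which is_bad() is true.

    Assumes all good candidates come before all bad ones.
    Returns None when no candidate is bad.
    """
    low, high = 0, len(candidates)
    while low < high:
        middle = (low + high) // 2
        if is_bad(candidates[middle]):
            high = middle        # the first bad candidate is here or earlier
        else:
            low = middle + 1     # everything up to middle is eliminated
    return low if low < len(candidates) else None


# Hypothetical example: builds 0..99, where the bug first appears in build 37.
builds = list(range(100))
culprit = first_bad(builds, lambda build: build >= 37)
```

Each check throws away half of what remains, so 100 candidates need at most seven checks. Testing a hypothesis against source code is rarely this tidy, but the payoff is the same: every answer, even a "no," shrinks the area left to search.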

Step 5: Think Creatively

If the bug you're trying to eliminate is one of those nasty ones that happens only on certain machines or is hard to duplicate, start looking at the bug from different perspectives. This is the step in which you should start thinking about version mismatches, operating system differences, problems with your program's binaries or its installation, and other external factors.
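One low-tech way to hunt for such external factors is to write down the same facts about a machine that works and a machine that fails, and then compare them. The sketch below assumes you have collected those facts into dictionaries by hand or by script; the function name, keys, and values shown are all invented for illustration.

```python
def environment_differences(working, failing):
    """Return {key: (working_value, failing_value)} for every fact that differs."""
    differences = {}
    for key in sorted(set(working) | set(failing)):
        good, bad = working.get(key), failing.get(key)
        if good != bad:
            differences[key] = (good, bad)
    return differences


# Hypothetical facts gathered from two machines.
working_machine = {"os": "Windows XP SP1", "app_build": "2.0.17", "helper_dll": "5.1.0"}
failing_machine = {"os": "Windows XP SP1", "app_build": "2.0.17", "helper_dll": "5.0.2"}

suspects = environment_differences(working_machine, failing_machine)
```

Anything that shows up in the result is a candidate for your next hypothesis; a missing key on one side appears as `None`.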

A technique that sometimes works wonders for me is to walk away from the problem for a day or two. You can sometimes focus so intensely on a problem that you lose sight of the forest for the trees and start missing obvious clues. By walking away from the bug, you give your subconscious a chance to work on the problem for a while. I'm sure everyone reading this book has solved a bug on the way home from work. Of course, walking away from that bug might be difficult if the bug is the one holding up shipment and your boss is breathing down your neck.

At several companies I've worked at, the highest priority interrupt has been something called "Bug Talk." That means you are totally stumped and need to talk the bug over with someone. The idea is that you can walk into a person's office and present the problem on a white board. I don't know how many times I've walked into someone's office, uncapped the marker, touched the marker on the board, and solved my problem without even saying a word. Just getting your mind prepared to present the problem helps you get past the individual tree you're staring at and lets you see the whole forest. When you pick a person to do a Bug Talk with, you should pick someone other than the colleagues you're working very closely with on the same section of the project. That way, you can ensure your Bug Talk partner isn't making the same assumptions you are about the problem.

What's interesting is the "someone" doesn't even have to be a human. My cats, as it turns out, are excellent debuggers, and they have helped me solve a number of really nasty bugs. After rounding them up, I draw the problem out on my white board and let them work their magic. Of course, the day I was doing this without having taken a shower and wearing nothing but shorts was a little difficult to explain to the UPS delivery guy standing at my door.

The one person you should always avoid doing Bug Talks with is your spouse or significant other. For some reason, the fact that you're having a relationship with that person means there's a built-in problem. Of course, you've probably already seen this when you try to describe that bug and the person's eyes glaze over and he or she nearly passes out.

Step 6: Leverage Tools

I've never understood why some companies let their engineers spend weeks searching for a bug when spending a thousand dollars for error detection, performance, and code-coverage tools would help them find the current bug—and bugs they will encounter in the future—in minutes.

Several companies, such as Compuware and Rational, make excellent tools for both managed and native code. I always run my code through their tools before I tackle the heavy debugging step. Since native code bugs are always harder to find than managed code bugs, the tools are much more important for native code. From Compuware NuMega you have BoundsChecker (an error detection tool), TrueTime (a performance tool), and TrueCoverage (a code-coverage tool). Rational makes Purify (error detection), Quantify (performance), and PureCoverage (code coverage). The point is that if you're not using a third-party tool to help you debug your products, you're spending more time debugging than you need to.

For those of you who are unfamiliar with these types of tools, let me explain what each of them does. An error detection tool looks for invalid memory accesses, invalid parameters to system APIs and COM interfaces, memory leaks, and resource leaks, among other things. A performance tool helps you track down where your application is slow; that spot is invariably somewhere other than where you think it is. A code-coverage tool shows you the source lines not executed when you run your program. Code-coverage information is helpful because if you're looking for a bug, you want to look for it only in lines that are executing.
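To give a feel for what a code-coverage tool reports, here is a toy sketch built on Python's `sys.settrace` hook. It is an illustration only and says nothing about how the commercial tools named above are implemented; `executed_line_offsets` and `classify` are hypothetical names.

```python
import sys


def executed_line_offsets(func, *args):
    """Run func(*args) and return the line offsets (from its def line) that executed."""
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        # Only follow the frame that belongs to the function under study.
        if frame.f_code is not code:
            return None
        if event == "line":
            hit.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)   # always restore whatever tracer was active
    return hit


def classify(n):              # offset 0
    if n < 0:                 # offset 1
        return "negative"     # offset 2
    return "non-negative"     # offset 3


only_positive_input = executed_line_offsets(classify, 5)
with_negative_input = executed_line_offsets(classify, -5)
never_run_by_first_test = with_negative_input - only_positive_input
```

The line that returns `"negative"` never runs when the only input is positive, so a test suite made up of positive inputs tells you nothing about it.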

Step 7: Start Heavy Debugging

I differentiate heavy debugging from the light debugging I mentioned in Step 4 by what you're doing in the debugger. When you're doing light debugging, you're just looking at a few states and a couple of variables. In contrast, when you're doing heavy debugging, you're spending a good deal of time exploring your program's operation. It is during the heavy debugging stage that you want to use the debugger's advanced features. Your goal is to let the debugger do as much of the heavy lifting as possible. Chapters 6 through 8 discuss the various debuggers' advanced features.

Just as when you're doing light debugging, when you're doing heavy debugging, you should have an idea of where you think your bug is before you start using the debugger, and then use the debugger to prove or disprove your hypothesis. Never sit in the debugger and just poke around. In fact, I strongly encourage you to actually write out your hypothesis before you ever fire up the debugger. That will help you keep completely focused on exactly what you're trying to accomplish.

Also, when you're doing heavy debugging, remember to regularly review changes you made to fix the bug in the debugger. I like to have two machines set up side by side at this stage. That way I can work at fixing the bug on one machine and use the other machine to run the same code with normal condition cases. The idea is to always double-check and triple-check any changes so you're not destabilizing the normal operation of your product. I'll give you some career advice and let you know that your boss really hates it when you check in code to fix a bug and your product handles only weird boundary conditions and no longer handles the normal operation case.

If you set up your project correctly and follow the debugging steps in this chapter and the recommendations in Chapter 2, you hopefully won't have to spend much time doing heavy debugging.

Step 8: Verify That the Bug Is Fixed

When you think you've finally fixed the bug, the next step in the debugging process is to test, test, and retest the fix. Did I also mention that you need to test the fix? If the bug is in an isolated module on a line of code called once, testing the fix is easy. However, if the fix is in a core module, especially one that handles your data structures and the like, you need to be very careful that your fix doesn't cause problems or have side effects in other parts of the project.

When testing your fix, especially in critical code, you should verify that it works with all data conditions, good and bad. Nothing is worse than a fix for one bug that causes two other bugs. If you do make a change in a critical module, you should let the rest of the team know that you made the change. That way, they can be on the lookout for any ripple effects as well.
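A simple way to cover good and bad data conditions is a table of cases that you rerun after every change. The sketch below assumes a hypothetical function, `parse_percentage`, that has just been fixed; the cases are examples, not an exhaustive list.

```python
def parse_percentage(text):
    """Hypothetical function that was just fixed: parse '0'..'100' into an int."""
    value = int(text.strip())
    if not 0 <= value <= 100:
        raise ValueError(f"percentage out of range: {value}")
    return value


# Normal operation and boundary values that must keep working.
GOOD_CASES = [("0", 0), ("100", 100), (" 42 ", 42)]
# Bad data that must be rejected with ValueError.
BAD_CASES = ["-1", "101", "", "abc"]


def check_fix():
    """Return a list of failure descriptions; an empty list means every case passed."""
    failures = []
    for text, expected in GOOD_CASES:
        try:
            actual = parse_percentage(text)
        except ValueError as exc:
            failures.append(f"{text!r} was rejected: {exc}")
            continue
        if actual != expected:
            failures.append(f"{text!r} gave {actual!r}, expected {expected!r}")
    for text in BAD_CASES:
        try:
            parse_percentage(text)
        except ValueError:
            continue
        failures.append(f"{text!r} should have been rejected")
    return failures


failures = check_fix()
```

Keeping the normal cases in the same table as the weird boundary cases is the point: a fix that handles the odd input but breaks the everyday one shows up immediately.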

Debugging War Story: Where Did the Integration Go?

The Battle

One of the developers I worked with at NuMega thought he'd found a great bug in NuMega's Visual C++ Integrated Development Environment (VC IDE) integration because it didn't work on his machine. For those of you who are unfamiliar with NuMega's VC IDE integration, let me provide a little background information. NuMega's software products integrate with the VC IDE—and have for a number of years. This integration allows NuMega's windows, toolbars, and menus to appear seamlessly inside the VC IDE.

The Outcome

This developer spent a couple of hours using SoftICE, a kernel debugger, exploring the bug. After a while, he had set breakpoints all over the operating system. Finally, he found his "bug." He noticed that when he started the VC IDE, CreateProcess was being called with the \\R2D2\VSCommon\MSDev98\Bin\MSDEV.EXE path instead of the C:\VSCommon\MSDev98\Bin\MSDEV.EXE path he thought it should be called with. In other words, instead of running the VC IDE from his local machine (C:\VSCommon\MSDev98\Bin\MSDEV.EXE), he was running it from his old machine (\\R2D2\VSCommon\MSDev98\Bin\MSDEV.EXE). How did this happen?

The developer had just gotten a new machine and had installed the full NuMega VC IDE integration for the products. To get it set up faster, he copied his desktop shortcuts (LNK files) from his old machine, where the products were installed without VC IDE integration, to his new machine by dragging them with the mouse. When you drag shortcuts, the internal paths update to reflect the location of the original target. Since he was always starting the VC IDE from his desktop shortcut, which was pointing to his old machine, he'd been running the VC IDE from his old machine all along.

The Lesson

The developer went about debugging the problem in the wrong way by just jumping right in with a kernel debugger instead of attempting to duplicate the problem in multiple ways. In Step 1 of the debugging process, "Duplicate the Bug," I recommended that you try to duplicate the bug in multiple ways so that you can be assured you're looking at the right bug, not just multiple bugs masking and compounding one another. If this developer had followed Step 5, "Think Creatively," he would have been better off because he would have thought about the problem first instead of plunging right in.
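A quick check of where the program was really being launched from would have exposed this problem in seconds. The sketch below uses Python's `pathlib.PureWindowsPath` to tell a network (UNC) path from a local one, using the two paths from this story; the helper name `is_network_path` is hypothetical.

```python
from pathlib import PureWindowsPath


def is_network_path(path_text):
    """Return True when a Windows path is a UNC path of the form \\\\server\\share\\..."""
    # For UNC paths the "drive" is the \\server\share prefix.
    return PureWindowsPath(path_text).drive.startswith("\\\\")


expected_target = r"C:\VSCommon\MSDev98\Bin\MSDEV.EXE"
actual_target = r"\\R2D2\VSCommon\MSDev98\Bin\MSDEV.EXE"

launched_from_network = is_network_path(actual_target)
launched_locally = not is_network_path(expected_target)
```

Because `PureWindowsPath` only parses the text and never touches the file system, the check works on any machine, including one that can't reach the share.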

Step 9: Learn and Share

Each time you fix a "good" bug (that is, one that was challenging to find and fix), you should take the time to quickly summarize what you learned. I like to record my good bugs in a journal so that I can later see what I did right in finding and fixing the problem. More important, I also want to learn what I did wrong so that I can learn to avoid dead ends when debugging and solve bugs faster. You learn the most about development when you're debugging, so you should take every opportunity to learn from it.

One of the most important steps you can take after fixing a good bug is to share with your colleagues the information you learned while fixing the bug, especially if the bug is project-specific. This information will help your coworkers the next time they need to eliminate a similar bug.

Final Debugging Process Secret

I'd like to share one final debugging secret with you: the debugger can answer all your debugging questions as long as you ask it the right ones. Again, I'm suggesting that you need to have a hypothesis in mind (something you want to prove or disprove) before the debugger can help you. As I recommended earlier in Step 7, I write out my hypothesis before I ever touch the debugger to ensure that I have a purpose each time I use it.

Remember that the debugger is just a tool, like a screwdriver. It does only what you tell it to do. The real debugger is the software in your hardware cranium.