Bugs and Debugging


Bugs are cool! Well, actually, finding bugs is cool. The coolest of all bugs are those that you find before the customer sees your product. Discovering those prerelease bugs means that you're doing your job and producing a higher quality product. Having your customer find the bug is the opposite of cool.

Compared to other engineering fields, software engineering is an anomaly in two ways. First, software engineering is a new and somewhat immature branch of engineering. Second, users have come to accept bugs in our products, particularly in PC software. Although they grudgingly resign themselves to bugs, they're still not happy when they find them.

You need to care about bugs because ultimately they cost your business in two ways. In the short term, customers contact you for help, forcing you to spend your time and money sustaining the current product while your competitors are working on their next version. In the long term, the invisible hand of economics kicks in and customers just start buying alternatives to your buggy product. As software begins to be delivered more as a service than as a capital investment, the pressure for higher quality software will increase. Very soon, your users will be able to switch among software products from various vendors just by moving from one Web site to another. This boon for users will mean less job security for you and me if our products are buggy, and more incentive for all of us to create high-quality products.

What Are Bugs?

Before you can start debugging, you need a definition of bugs. My definition of a bug is "anything that causes a user pain." I classify bugs into the following categories:

  • Inconsistent user interfaces
  • Unmet expectations
  • Poor performance
  • Crashes or data corruption

Inconsistent User Interfaces

Inconsistent user interfaces, though not the most serious type of bug, are annoying. One of the reasons for the success of Microsoft Windows is that all Windows applications generally behave the same way. When an application deviates from the Windows standard, it becomes a burden for the user. A small example of this nonstandard, irksome behavior is the Find accelerators in Microsoft Outlook. In every other English-language Windows application on the planet, Ctrl+F brings up the Find dialog box so that you can find text in the current window. In Outlook, however, Ctrl+F forwards the open message. Even after many years of using Outlook, I can never remember to use the F4 key to find text in the currently open message.
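To make the convention concrete, here's a minimal Win32 sketch of wiring Ctrl+F to a Find command through an accelerator table. This is my own illustration, not code from this chapter; IDM_FIND is a hypothetical command ID, and a real application would pass the table to TranslateAccelerator in its message loop so the command reaches the window procedure.

// Minimal sketch: map Ctrl+F to a Find command, matching standard
// Windows behavior. IDM_FIND is a hypothetical WM_COMMAND ID.
#include <windows.h>

#define IDM_FIND 0x0065

int main()
{
    ACCEL accel[] =
    {
        { FCONTROL | FVIRTKEY, 'F', IDM_FIND },   // Ctrl+F -> Find
    };

    HACCEL hAccel = CreateAcceleratorTable(accel,
                                           static_cast<int>(ARRAYSIZE(accel)));
    if (NULL != hAccel)
    {
        // In a real program, the message loop would call
        // TranslateAccelerator(hWnd, hAccel, &msg) before dispatching
        // so that IDM_FIND arrives as a WM_COMMAND message.
        DestroyAcceleratorTable(hAccel);
    }
    return 0;
}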

You can solve problems with inconsistent user interfaces by following the recommendations in the book Microsoft Windows User Experience (Microsoft Press, 1999). At the time of this writing, a previous version of the book, The Windows Interface Guidelines for Software Design, also appeared on the Microsoft Developer Network (MSDN). If either book doesn't address your particular issue, look for another Microsoft application that does something similar to what you're trying to achieve and follow its model.

Unmet Expectations

Not meeting the user's expectations is one of the hardest bugs to solve. This bug usually occurs right at the beginning of a project, when the company doesn't do sufficient research on what the real customer needs. In both types of shops—shrinkwrap (those writing software for sale) and Information Technology (IT) (those writing in-house applications)—the cause of this bug comes down to communication problems.

In general, development teams don't communicate directly with their product's customers, so they aren't learning what the users need. Ideally, all members of the engineering team should be visiting customer sites so that they can see how the customers use their product. Watching over a customer's shoulder as your product is being used can be an eye-opening experience. Additionally, this experience will give you the insight you need to properly interpret what customers are asking your product to do. If you do get to talk to customers, make sure you speak with as many as possible so that you can get input from across a wide spectrum.

In addition to customer visits, another good idea is to have the engineering team review the support call summaries and support e-mails. This feedback will allow the engineering team to see the problems that the users are having, without any filtering applied.

Another aspect of this kind of bug is the situation in which the user's level of expectation has been raised higher than the product can deliver. This inflation of user expectations is the classic result of too much hype, and you must resist misrepresenting your product's capabilities at all costs. When users don't get what they anticipated from a product, they tend to feel that the product is even buggier than it really is. The rule for avoiding this situation is to never promise what you can't deliver and to always deliver what you promise.

Poor Performance

Users are very frustrated by bugs that cause the application to slow down when it encounters real-world data. Invariably, improper testing is the root of all poor performance bugs—however great the application might have looked in development, the team failed to test it with anything approaching real-world volumes. One project I worked on, NuMega's BoundsChecker 3.0, had this bug with its original FinalCheck technology. That version of FinalCheck inserted additional debugging and contextual information directly into the source code so that BoundsChecker could better report errors. Unfortunately, we failed to sufficiently test the FinalCheck code on larger real-world applications before we released BoundsChecker 3.0. As a result, more users than we cared to admit couldn't use that feature. We completely rewrote the FinalCheck feature in subsequent releases, but because of the performance problems in the original version, many users never tried it again, even though it was one of the product's most powerful and useful features.

You tackle poor performance bugs in two ways. First, make sure you determine your application's performance requirements up front. To know whether you have a performance problem, you need a goal to measure against. An important part of performance planning is keeping baseline performance numbers. If your application starts missing those numbers by 10 percent or more, you need to stop and determine why your performance dropped and take steps to correct the problem. Second, make sure you test your applications against as close to real-world scenarios as possible—and that you do this as early in the development cycle as you can.
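As a rough illustration of keeping baseline numbers, here's a minimal sketch of an automated check that times an operation and fails when it misses the baseline by 10 percent or more. The operation and the baseline value are assumed placeholders; a real project would load its baselines from stored test results and drive the code with real-world data volumes.

// Minimal sketch: compare an operation's elapsed time against a stored
// baseline and flag a regression of 10 percent or more.
#include <chrono>
#include <cstdio>
#include <cstdlib>

// Hypothetical operation under test.
static void ProcessLargeDataSet()
{
    volatile long sum = 0;
    for (long i = 0; i < 10000000; ++i)
        sum += i;
}

int main()
{
    const double baselineMs = 25.0;           // assumed baseline from a previous run
    const double allowedMs  = baselineMs * 1.10;  // fail at a 10 percent miss

    auto start = std::chrono::steady_clock::now();
    ProcessLargeDataSet();
    auto stop = std::chrono::steady_clock::now();

    double elapsedMs =
        std::chrono::duration<double, std::milli>(stop - start).count();

    std::printf("Elapsed: %.2f ms (baseline %.2f ms)\n", elapsedMs, baselineMs);

    if (elapsedMs > allowedMs)
    {
        std::fprintf(stderr, "Performance regression: stop and find out why.\n");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}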

Crashes or Data Corruption

Crashes and data corruption bugs are what most developers and users think of when they think of a bug. Users might be able to work around the types of bugs just described, but crashes stop them dead—which is why the majority of this book concentrates on solving these extreme problems. In addition, crashes and data corruption bugs are the most common type of bug. As we all know, some of these bugs are easy to solve, and others are almost impossible. The main point to remember about crashes and data corruption bugs is that you should never ship a product if you know it has one of these bugs in it.

Process Bugs and Solutions

Although shipping software without bugs is possible—given enough attention to detail—I've shipped enough products to know that most teams haven't reached that level of software development maturity. Bugs are a fact of life in this business. However, you can minimize the number of bugs your applications have. That is what teams that ship high-quality products—and there are many out there—do. The reasons for bugs generally fall into the following process categories:

  • Short or impossible deadlines
  • The "code first, think later" approach
  • Misunderstood requirements
  • Engineer ignorance or improper training
  • Lack of commitment to quality

Short or Impossible Deadlines

We've all been part of development teams for which "management" has set a deadline that was determined by either a tarot card reader or, if that was too expensive, a Magic 8-Ball. Although we'd like to believe that managers are responsible for most unrealistic schedules, more often than not, they aren't to blame. Engineers' work estimates are usually the basis of the schedule, and sometimes engineers underestimate how long it will take them to develop a solid product. Whether an unrealistic ship date is the fault of management or engineering or both, the bottom line is that a schedule that's impossible to meet leads to cut corners and a lower quality product.

I've been fortunate enough to work on several teams that have shipped software on time. In all cases, the development team truly owned the schedule, and we were good at determining realistic ship dates. To figure out realistic ship dates, we based our dates on a feature set. If the company found the proposed ship date unacceptable, we cut features to move up the date. In addition, everyone on the development team agreed to the schedule before we presented it to management. That way, the team's credibility was on the line to finish the product on time. Interestingly, besides shipping on time, these products were some of the highest quality products that I've ever worked on.

The "Code First, Think Later" Approach

My friend Peter Ierardi coined the term "code first, think later" to describe the all-too-common situation in which an engineering team starts programming before they start thinking. Every one of us is guilty of this approach to an extent. Playing with compilers, writing code, and debugging is the fun stuff; it's why we got interested in this business in the first place. Very few of us like to sit down and write documents that describe what we're going to do.

If you don't write these documents, however, you'll start to run into bugs. Instead of stopping and thinking about how to avoid bugs in the first place, you'll start tweaking the code as you go to work around the bugs. As you might imagine, this tactic will compound the problem because you'll introduce more and more bugs into an already unstable code base. If you find yourself saying, "We've got too big an investment in this code base to change now," you have a symptom of the "code first, think later" syndrome.

Fortunately, the solution to this problem is simple: plan your projects. Some very good books have been written about requirements gathering and project planning. I cite them in Appendix B, and I highly recommend that you read them. Although it isn't very sexy and is generally a little painful, up-front planning is vital to eliminating bugs.

Misunderstood Requirements

Proper planning also minimizes one of the biggest bug causers in development: feature creep. Feature creep—the tacking on of features not originally planned—is a symptom of poor planning and inadequate requirements gathering. Adding last-minute features, whether in response to competitive pressure, as a developer's pet feature, or on the whim of management, causes more bugs in software than almost anything else.

Software engineering is an extremely detail-oriented business. The more details you hash out and solve before you start coding, the less you leave to chance. The only way to achieve proper attention to detail is to plan your milestones and the implementation for your projects. Of course, this doesn't mean that you need to go completely overboard and generate thousands of pages of documentation describing what you're going to do.

One of the best design documents I ever created for a product was simply a series of paper drawings, or "paper prototypes," of the user interface. Based on research and on the teachings of Jared Spool and his company, User Interface Engineering, my team drew the user interface and worked through each user scenario completely. In doing so, we had to focus on the requirements for the product and figure out exactly how the users were going to perform their tasks. In the end, we knew exactly what we were going to deliver and, more important, so did everyone else in the company. If a question about what was supposed to happen in a given scenario arose, we pulled out the paper prototypes and worked through the scenario again.

Even though you might do all the planning in the world, you have to really understand your products' requirements to implement them properly. At one company where I worked—mercifully, for less than a year—the requirements for the product seemed very simple and straightforward. As it turned out, however, most of the team members didn't understand the customers' needs well enough to figure out what the product was supposed to do. The company made the classic mistake of drastically increasing engineering head count but failing to train the new engineers sufficiently. Consequently, even though the team planned out everything to extremes, the product shipped several years late and the market rejected it.

There were two large mistakes on this project. The first was that the company wasn't willing to take the time to thoroughly explain the customers' needs to the engineers who were new to the problem domain, even though some of us begged for the training. The second mistake was that many of the engineers, both old and new, didn't care to learn more about the problem domain. As a result, the team kept changing direction each time marketing and sales reexplained the requirements. The code base was so unstable that it took months to get even the simplest user scenarios to work without crashing.

Very few companies train their engineers in their problem domain at all. Although many of us have college degrees in engineering, we generally don't know much about how customers will use our products. If companies would spend adequate time up front helping their engineers understand the problem domain, they could eliminate many bugs caused by misunderstood requirements.

The fault isn't just with the company, though. Engineers must make the commitment to learn the problem domain as well. Some engineers like to think they're building tools that enable a solution so that they can maintain their separation from the problem domain. As engineers, we're responsible for solving the problem, not merely enabling a solution!

An example of enabling a solution is a situation in which you design a user interface that, although it technically works, doesn't match the way the user works. Another example of enabling a solution is building your application in such a way that it solves the user's short-term problem but doesn't move forward to accommodate the user's changing business needs.

When solving the user's problem rather than just enabling a solution, you, the engineer, become as knowledgeable as you can about the problem domain so that your software product becomes an extension of the user. The best engineers are not those who can twiddle bits but those who can solve a user's problem.

Engineer Ignorance or Improper Training

Another significant cause of bugs is that developers don't understand the operating system, the language, or the technology their projects use. Unfortunately, few engineers are willing to admit this deficiency and seek training. Instead, they cover up their lack of knowledge and, unintentionally, introduce avoidable bugs.

In many cases, however, this ignorance isn't a personal failing so much as a fact of life in modern software development. So many layers and interdependencies are involved in developing software these days that no one person can be expected to know the ins and outs of every operating system, language, and technology. There's nothing wrong with admitting that you don't know something. In fact, if a team is healthy, acknowledging the strengths and limitations of each member works to the team's advantage. By cataloging the skills their developers have and don't have, the team can get the maximum benefit from their training dollars. By addressing each developer's weaknesses, the team will be better able to adjust to unforeseen circumstances and, in turn, broaden the whole team's skill set.

The team can also schedule development time more accurately when team members are willing to admit what they don't know. If a team member needs to learn about a new technology in order to implement some part of the application but isn't given enough time, the schedule will almost certainly slip.

I'll have more to say about what skills and knowledge are critical for developers to know in the section "Prerequisites to Debugging" later in the chapter.

Lack of Commitment to Quality

The final reason that bugs exist in projects is, in my opinion, the most serious. Every company and every engineer I've ever talked to has told me that they are committed to quality. Unfortunately, some companies and engineers lack the real commitment that quality requires. If you've ever worked at a company that was committed to quality or with an engineer who was, you certainly know it. They both feel a deep pride in what they are producing and are willing to spend the effort on all parts of development, not just the sexy parts. For example, instead of getting all wrapped up in the minutiae of an algorithm, they pick a simpler algorithm and spend their time working out how best to test it. The customer doesn't buy algorithms, after all; the customer buys high-quality products. Companies and individuals with a real commitment to quality exhibit many of the same characteristics: careful up-front planning, personal accountability, solid quality control, and excellent communication abilities. Many companies and individuals go through the motions of the big software development tasks (that is, scheduling, coding, and so on), but only those who pay attention to the details ship on time with high quality.

A good example of a commitment to quality comes from my first annual review at NuMega. One of the key parts of the review was the number of bugs I had logged against the product. I was stunned to discover that NuMega evaluated this statistic as part of my performance review: even though tracking bugs is a vital part of maintaining a product's quality, no other company I had worked at had ever checked something so obvious. The developers know where the bugs are, but they must be given an incentive to enter those bugs into the bug tracking system. NuMega found the trick. When I learned about the bug count entry part of my review, you'd better believe I logged everything I found, no matter how trivial. With all the technical writers, quality engineers, development engineers, and managers engaged in healthy competition to log the most bugs, few surprise bugs slipped through the cracks. More important, we had a realistic idea of where we stood on a project at any given time.

When I was a development manager, I followed a ritual that I'm sure fostered a commitment to quality: each team member had to agree that the product was ready to go at every milestone. If any person on the team didn't feel that the product was ready, it didn't ship. I'd rather fix a minor bug and suffer through another complete day of testing than send out something the team wasn't proud of. Not only did this ritual ensure that everyone on the team thought that the quality was there, but it also gave everyone on the team a stake in the outcome. An interesting phenomenon that I noticed was that team members never got the chance to stop the release for someone else's bug; the bug's owner always beat them to it.

A company's commitment to quality sets the tone for the entire development effort. That commitment starts with the hiring process and extends through the final quality assurance on the release candidate. Every company says that it wants to hire the best people, but few companies are willing to offer salaries and benefits that will draw them. In addition, some companies aren't willing to provide the tools and equipment that engineers need to produce high-quality products. Unfortunately, too many companies resist spending $500 on a tool that will solve a nasty crash bug in minutes but are willing to blow many thousands of dollars to pay their developers to flounder around for weeks trying to solve that same bug.

If you do find yourself in an organization that suffers from a lack of commitment to quality, you'll find that there's no easy way to turn a company into a quality-conscious organization overnight. If you're a manager, you can set the direction and tone for the engineers working for you and work with upper management to lobby for extending a commitment to quality across the organization. If you're an engineer, you can work to make your code the most robust and extensible on the project so that you set an example for others.

Planning for Debugging

Now that we've gone over the types and origins of bugs and you have some ideas about how to avoid or solve them, it's time to start thinking about the process of debugging. Although many people start thinking about debugging only when they crash during the coding phase, you should think about it right from the beginning, in the requirements phase. The more you plan your projects up front, the less time—and money—you'll spend debugging them later.

As I mentioned earlier in the chapter, feature creep can be a bane to your project. More often than not, unplanned features introduce bugs and wreak havoc on a product. This doesn't mean that your plans must be cast in stone, however. Sometimes you must change or add a feature to a product to be competitive or to better meet the user's needs. The key point to remember is that before you change your code, you need to determine—and plan for—exactly what will change. And keep in mind that adding a feature doesn't affect just the code; it also affects testing, documentation, and sometimes even marketing messages. When revising your production schedule, a general rule to follow is that the time it takes to add or remove a feature grows exponentially the further along the production cycle you are.

In his excellent book Code Complete (Microsoft Press, 1993, pp. 25-26), Steve McConnell discusses the cost of fixing a bug. Fixing a bug during the requirements and planning phases costs very little. As the product progresses, however, the cost of fixing a bug rises exponentially, as does the cost of debugging, much the same way costs rise when you add or remove features along the way.

Planning for debugging goes together with planning for testing. As you plan, you need to look for different ways to speed up and improve both processes. One of the best precautions you can take is to write file data dumpers and validators for internal data structures as well as for binary files, if appropriate. If your project reads and writes data to a binary file, you should automatically schedule someone to write a testing program that dumps the data in a readable format to a text file. The dumper should also validate the data and check all interdependencies in the binary file. This step will make both your testing and your debugging easier.
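To give you an idea of what I mean, the following is a minimal sketch of such a dumper and validator for a made-up binary format consisting of a header and fixed-size records. The format, field names, and validation rule are purely illustrative assumptions; the point is that the tool dumps the data in readable form and checks the interdependencies so that both testing and debugging get easier.

// Minimal sketch: dump a hypothetical binary file to readable text and
// validate its internal consistency.
#include <cstdint>
#include <cstdio>

#pragma pack(push, 1)
struct RecordHeader
{
    uint32_t magic;        // expected to be "DUMP" (0x504D5544 little-endian)
    uint32_t recordCount;  // number of Record structures that follow
};

struct Record
{
    uint32_t id;
    uint32_t parentId;     // 0, or the id of an earlier record
};
#pragma pack(pop)

int main(int argc, char* argv[])
{
    if (argc != 2)
    {
        std::fprintf(stderr, "usage: dumper <binary file>\n");
        return 1;
    }

    std::FILE* file = std::fopen(argv[1], "rb");
    if (file == nullptr)
    {
        std::fprintf(stderr, "cannot open %s\n", argv[1]);
        return 1;
    }

    RecordHeader header;
    if (std::fread(&header, sizeof(header), 1, file) != 1 ||
        header.magic != 0x504D5544)
    {
        std::fprintf(stderr, "invalid or truncated header\n");
        std::fclose(file);
        return 1;
    }

    std::printf("records: %u\n", header.recordCount);

    int errors = 0;
    for (uint32_t i = 0; i < header.recordCount; ++i)
    {
        Record rec;
        if (std::fread(&rec, sizeof(rec), 1, file) != 1)
        {
            std::fprintf(stderr, "file ends early at record %u\n", i);
            ++errors;
            break;
        }

        // Dump the record in readable form.
        std::printf("  record %u: id=%u parentId=%u\n", i, rec.id, rec.parentId);

        // Validate the interdependency: a parent must refer to an earlier id.
        if (rec.parentId != 0 && rec.parentId >= rec.id)
        {
            std::fprintf(stderr, "  record %u: bad parentId %u\n", i, rec.parentId);
            ++errors;
        }
    }

    std::fclose(file);
    return (errors == 0) ? 0 : 1;
}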

By properly planning for debugging, you minimize the time spent in your debugger, and this is your goal. You might think such advice sounds strange coming from a book on debugging, but the idea is to try to avoid bugs in the first place. If you build sufficient debugging code into your applications, that code—not the debugger—should tell you where the bugs are. I'll cover the issues concerning debugging code more in Chapter 3.
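As a small taste of the kind of debugging code I mean, here's a minimal sketch of a debug-build assertion macro that reports the failing expression, file, and line so the code itself points at the bug before you ever reach for the debugger. This is an illustration, not the Chapter 3 code; the macro name and the Divide example are placeholders.

// Minimal sketch: a debug-build-only check that identifies bad state at
// the point where it first appears.
#include <cstdio>
#include <cstdlib>

#ifdef _DEBUG
#define VERIFY_STATE(expr)                                              \
    do                                                                  \
    {                                                                   \
        if (!(expr))                                                    \
        {                                                               \
            std::fprintf(stderr, "Assertion failed: %s (%s:%d)\n",      \
                         #expr, __FILE__, __LINE__);                    \
            std::abort();                                               \
        }                                                               \
    } while (0)
#else
#define VERIFY_STATE(expr) ((void)0)
#endif

// Example use: validate parameters at the top of every function so bad
// data is caught at the point of entry, not three call levels later.
int Divide(int numerator, int denominator)
{
    VERIFY_STATE(denominator != 0);
    return numerator / denominator;
}

int main()
{
    return (Divide(10, 2) == 5) ? 0 : 1;
}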


