Bugs and Debugging


Bugs are cool! They help you learn the most about how things work. We all got into this business because we like to learn, and tracking down bugs is the ultimate learning experience. I don't know how many times I've had nearly every programming book I own open and spread out across my office looking for a good bug. It feels just plain great to find and fix those bugs! Of course, the coolest bugs are those that you find before the customer sees your product. That means that you have to do your job to find those bugs before your customers do. Having your customers find them is extremely uncool.

Compared with other engineering fields, software engineering is an anomaly in two ways. First, software engineering is a new and somewhat immature branch of engineering compared with other forms of engineering that have been around for a while, such as structural and electrical engineering. Second, users have come to accept bugs in our products, particularly in PC software. Although they grudgingly resign themselves to bugs on PCs, they're still not happy when they find them. Interestingly enough, those same customers would never tolerate a bug in a nuclear reactor design or a piece of medical hardware. With PC software becoming more a part of people's lives, the free ride that the software engineering field has enjoyed is nearly over. I don't doubt that the liability laws that apply to other engineering disciplines will eventually cover software engineering also.

You need to care about bugs because ultimately, they are costly to your business. In the short term, customers contact you for help, forcing you to spend your time and money sustaining the current product while your competitors work on their next versions. In the long term, the invisible hand of economics kicks in, and customers just start buying alternatives to your buggy product. Software is now more of a service than a capital investment, so the pressure for higher-quality software will increase. With every application supporting Extensible Markup Language (XML) for input and output, your users are almost able to switch among software products from various vendors just by moving from one Web site to another. This boon for users will mean less job security for you and me if our products are buggy and more incentive to create high-quality products. Let me phrase this another way: the buggier your product, the more likely you are to have to look for a new job. If there's anything that engineers hate, it's going through the job-hunting process.

What Are Bugs?

Before you can start debugging, you need a definition of "bugs." My definition of a bug is "anything that causes a user pain." I classify bugs into the following categories:

  • Crashes and hangs

  • Poor performance and scalability

  • Incorrect results

  • Security exploits

  • Inconsistent user interfaces

  • Unmet expectations

Crashes and Hangs

Crashes and hangs are what most developers and users think of when they think of a bug. Users might be able to work around other types of bugs I'll be describing, but obviously, crashes and hangs stop them dead, which is why the majority of this book concentrates on solving these extreme problems and finding ways to test them out of your code. As we all know, some of these bugs are easy to solve, and others are almost impossible. The main point to remember about crash and hang bugs is that you should never ship a product if you know it has one of these bugs in it. Some may argue that you can ship with that rare crash or hang problem, but if you know about it and can duplicate it, even if it's hard to duplicate, you need to fix it.

Fortunately, Microsoft .NET eliminates many of the nasty and bizarre crash problems we all spent countless late nights tracking down when developing native code. Of course, if you're using native components in your application, they still have the power to reach up and scramble the Common Language Runtime (CLR) internals at any time, so we are not free of crashes yet. In the .NET world, unhandled exceptions are the bane of the end user's existence. With the wonderful Exception class as the root of all errors, we have a clean way to figure out where a problem originated, and with a small bit of code that I'll discuss later in the book, you can easily build a super-smart error-reporting system much like Microsoft's Windows Error Reporting, which has contributed magnificently to improving Microsoft Windows and Microsoft Office.
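To make that concrete, here's a minimal sketch of hooking process-wide unhandled exceptions in C#. The AppDomain.UnhandledException event and its argument types are real .NET APIs; the console logging is just a placeholder for the kind of error-reporting system discussed later in the book.

```csharp
using System;

static class Program
{
    static void Main()
    {
        // Hook the AppDomain-wide event so anything that escapes every
        // catch block is at least logged before the process dies.
        AppDomain.CurrentDomain.UnhandledException += OnUnhandledException;

        throw new InvalidOperationException("Simulated crash");
    }

    static void OnUnhandledException(object sender, UnhandledExceptionEventArgs e)
    {
        // ExceptionObject is typed as object because non-CLS exceptions
        // can also end up here; check before casting.
        Exception ex = e.ExceptionObject as Exception;
        if (ex != null)
        {
            // ToString() includes the type, message, and full stack trace.
            Console.Error.WriteLine("Unhandled exception: " + ex);
        }
    }
}
```

In a real reporting system, the handler would write a minidump or upload the exception details rather than print them, but the hook point is the same.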

Poor Performance and Scalability

Users are very frustrated by bugs that cause the application to slow down when it encounters real-world data. Invariably, improper testing is the root of all poor performance bugs: however great the application might have looked in development, the team failed to test it with anything approaching real-world volumes or scenarios. One project I worked on years ago, NuMega's BoundsChecker 3.0, had this bug with its original FinalCheck technology. That version of FinalCheck inserted additional debugging and contextual information directly into the source code so that BoundsChecker could better report errors. Unfortunately, we failed to sufficiently test the FinalCheck code on larger real-world applications before we released BoundsChecker 3.0. As a result, more users than we cared to admit couldn't use that feature. We completely rewrote the FinalCheck feature in subsequent releases, but because of the performance problems in the original version, many users never tried it again, even though it was one of the product's most powerful and useful features. Interestingly enough, we released BoundsChecker 3.0 in 1995, and eleven years later (at least four eons in Internet time) I still have people telling me that they still haven't used FinalCheck because of one bad experience!

From the project management level, a major change of thinking needs to take place when it comes to performance. Performance can never be an afterthought; it's a feature in its own right. A major mistake I've seen in my consulting work is that very few developers have performance numbers that they have to meet. By having those numbers, you have a goal to meet to avoid the poor performance pitfall. If you don't have a performance number, I'll give you a patented, guaranteed way of setting a level of performance: don't make it any slower than the last version. Now the question is what that number actually is, so you'll have to go out and measure the previous version's performance. If you're working on new development, you'll need to start with those numbers from the requirements phase.

Once you have some performance numbers, you have to work at monitoring them. If your application starts missing those numbers by 10 percent or more, you need to stop and determine why your performance dropped and take steps to correct the problem. Also, make sure that you test your applications against scenarios that are as close to the real world as possibleand do this as early in the development cycle as you can.
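Here's a hedged sketch of what such an automated check might look like in C#. System.Diagnostics.Stopwatch is a real .NET 2.0 class, but the baseline number, the ProcessRealWorldDataSet method, and where you'd run the check are all assumptions for illustration.

```csharp
using System;
using System.Diagnostics;

class PerfCheck
{
    // Hypothetical baseline, in milliseconds, measured from the previous
    // release; the 10 percent threshold matches the rule of thumb above.
    const double BaselineMs = 500.0;

    static void Main()
    {
        Stopwatch timer = Stopwatch.StartNew();
        ProcessRealWorldDataSet(); // hypothetical operation under test
        timer.Stop();

        double elapsedMs = timer.Elapsed.TotalMilliseconds;
        if (elapsedMs > BaselineMs * 1.10)
        {
            Console.Error.WriteLine(
                "PERF REGRESSION: {0:F0} ms against a baseline of {1:F0} ms",
                elapsedMs, BaselineMs);
        }
    }

    static void ProcessRealWorldDataSet()
    {
        // Stand-in for loading and crunching a real-world data set.
    }
}
```

Wired into a nightly build, a check like this turns a slow performance drift into a loud, immediate failure instead of a surprise at ship time.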

Here's one common question I continually get from developers: "Where can I get those real-world data sets so that I can do performance testing?" The answer is to talk to your customers. It never hurts to ask whether you can get their data sets so that you can do your testing. If a customer is concerned about privacy issues, take a look at writing a program that will change sensitive information. You can let the customer run that program and ensure that the changes hide sufficient sensitive information so that the customer feels comfortable giving you the data.

A good friend of mine related a story about the time she was working for a company doing financial data applications for private banking companies. When the development team asked the major client for some real-world data to test against, they got quite a shock. The data was live and completely unobfuscated! Her initial thought was to start doing some data mining for all men under 30 years old worth over $10 million. She immediately deleted all instances of the data and asked the client for obfuscated data.
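A scrubbing program doesn't have to be elaborate. The following is a tiny illustrative sketch; the comma-separated record layout and the choice of which fields to fake are hypothetical, so you'd adapt both to the customer's actual data format.

```csharp
using System;
using System.IO;

// Reads customer records, replaces the sensitive fields with generated
// values, and writes a scrubbed copy safe to hand to the dev team.
class Scrubber
{
    static void Main(string[] args)
    {
        Random rng = new Random();
        using (StreamReader input = new StreamReader(args[0]))
        using (StreamWriter output = new StreamWriter(args[1]))
        {
            string line;
            while ((line = input.ReadLine()) != null)
            {
                // Hypothetical layout: name,accountNumber,balance.
                string[] fields = line.Split(',');
                fields[0] = "Customer" + rng.Next(100000);  // fake name
                fields[1] = rng.Next(100000000).ToString(); // fake account
                // The balance stays untouched so data volumes and value
                // ranges remain realistic for performance testing.
                output.WriteLine(string.Join(",", fields));
            }
        }
    }
}
```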

It should come as no shock to anyone reading this book, but in the modern world of .NET, performance problems can be a major part of your debugging challenges. Although .NET does a great job of eliminating many of the annoying development challenges we faced in the native C++ days, the cost of that ease is that it's harder to see into the "black box" of the Common Language Runtime (CLR) itself.

Incorrect Results

This type of bug is very subtle and potentially malicious. If you're working on an application for a bank, a bad calculation in the middle of a huge set of data can have a ripple effect that can lead to data that looks correct but costs the bank serious money. Although we have debuggers and performance tools to track down crashes and data corruption, there's very little in the way of tools for finding incorrect result problems. To find incorrect results, you're going to need to write code that checks the results.

A perfect example of double-checking all results is the Microsoft Office Excel recalculation engine. In debug builds, the normal, highly optimized calculation engine does its work, and after it finishes, a second debugging-only engine checks that all values are correct from the first engine. You need to have that test code because the alternative is to manually check all the outputs from your code. If you thought writing some of the code was tedious, spend weeks manually calculating data!
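You can apply the same double-checking pattern on a much smaller scale with debug-only verification code. The following is a minimal sketch: OptimizedSum and NaiveSum are hypothetical stand-ins for a tuned production algorithm and its obviously correct reference implementation, and Debug.Assert compiles away in release builds, so the check costs nothing in production.

```csharp
using System.Diagnostics;

static class Calculator
{
    public static decimal Sum(decimal[] values)
    {
        decimal result = OptimizedSum(values); // the fast production path

        // In debug builds only, recompute the answer with a trivially
        // simple reference implementation and assert that the two agree.
        Debug.Assert(result == NaiveSum(values),
            "Optimized and reference sums disagree");

        return result;
    }

    static decimal OptimizedSum(decimal[] values)
    {
        // Stand-in for the real, highly tuned algorithm.
        decimal total = 0;
        foreach (decimal v in values)
        {
            total += v;
        }
        return total;
    }

    static decimal NaiveSum(decimal[] values)
    {
        // Deliberately dumb and obviously correct.
        decimal total = 0;
        for (int i = 0; i < values.Length; i++)
        {
            total += values[i];
        }
        return total;
    }
}
```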

Security Exploits

You almost can't look at the news these days without seeing another story about data theft or a security hole in some company's Web site. Everyone in an organization, from the receptionist on up to the CEO, has security at the top of his or her worry list. No matter how great your application performs or looks, a single SQL injection attack can change you from hero to loser in fifteen seconds.

Just like performance and scalability, security can't be an afterthought; it has to be treated as a full feature right from the requirements phase. You have to have someone on the team who's responsible for the security of the product. However, don't make the mistake of randomly assigning someone as the security czar for the project on top of his or her other duties. This role has to have the time to plan security testing and threat modeling for the project as a whole. As Michael Howard and David LeBlanc say in the seminal Writing Secure Code, 2nd Edition (Microsoft Press, 2003), "Secure products are quality products."
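The single most effective defense against SQL injection is never building SQL text from user input; use parameters instead. Here's a minimal sketch using ADO.NET's SqlCommand; the GetEmail method, table, and column names are hypothetical.

```csharp
using System.Data.SqlClient;

static class CustomerLookup
{
    // Hypothetical lookup; the table and column names are assumptions.
    public static string GetEmail(SqlConnection connection, string customerName)
    {
        // Never concatenate user input into SQL text. A parameter is
        // treated strictly as data, so input such as "' OR 1=1 --"
        // cannot rewrite the query.
        using (SqlCommand cmd = new SqlCommand(
            "SELECT Email FROM Customers WHERE Name = @name", connection))
        {
            cmd.Parameters.AddWithValue("@name", customerName);
            return (string)cmd.ExecuteScalar();
        }
    }
}
```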

Inconsistent User Interfaces

Inconsistent user interfaces, though not the most serious type of bug, are annoying. One reason for the success of the Windows operating system is that all Windows-based applications generally behave the same way. When an application deviates from the Windows standard, it becomes a burden for the user. An excellent example of this nonstandard, irksome behavior is in the Find accelerators in Microsoft Office Outlook. In every other English-language Windows-based application on the planet, Ctrl+F brings up the Find dialog box so that you can find text in the current window. In Office Outlook, however, Ctrl+F forwards the open message, which I consider a bug. Even after many years of using Outlook, I can never remember to use the F4 key to find text in the currently open message. Maybe if we all file bug reports about the incorrect accelerator key, Microsoft will finally fix this glaring problem.

With client applications, it's easy to solve problems with inconsistent user interfaces by following the recommendations in the Windows Vista User Experience Guidelines, available at http://msdn.microsoft.com/windowsvista/building/ux/default.aspx. If the guidelines don't address a particular issue, look for another Microsoft application that does something similar to what you're trying to achieve and follow its model. Microsoft seems to have infinite resources and unlimited time; if you take advantage of their extensive research, solving consistency problems won't be so expensive.

If you're working on Web front ends, life is much more difficult because there's no standard for user interface display. As we've all experienced from the user perspective, it's quite difficult to get a good user interface (UI) in a Web browser. For developing strong Web client UIs, I can recommend two books. The first is the standard bible on Web design, Jakob Nielsen's Designing Web Usability: The Practice of Simplicity (New Riders Press, 2000). The second is an outstanding small book that you should give to any self-styled usability experts on your team who couldn't design their way out of a wet paper bag (such as any executive who wants to do the UI but has never used a computer): Steve Krug's Don't Make Me Think! A Common Sense Approach to Web Usability (New Riders Press, 2000). Whatever you do for your Web UI, keep in mind that not all your users will have 100-MB-per-second pipes for their browsers, so keep your UI simple and avoid lots of fluff that takes forever to download. When doing research on great Web clients, User Interface Engineering (www.uie.com) found that approaches such as the UI on CNN.com worked best with all users. A simple set of clean links with information grouped under clean sections lets users find what they're looking for better than anything else.

Another excellent way to handle usability testing comes from the excellent mind of Joel Spolsky, whose blog all software developers absolutely must subscribe to. In his "The Joel Test: 12 Steps to Better Code" entry (http://www.joelonsoftware.com/articles/fog0000000043.html), Step 12 is "Do you do hallway usability testing?" The idea is to grab five people walking down the hall and force them to use the code you just wrote. According to Joel, you'll find 95 percent of your usability problems immediately.

Unmet Expectations

Not meeting the user's expectations is one of the hardest bugs to solve. This bug usually occurs right at the beginning of a project when the company doesn't do sufficient research on what the real customer needs. In both types of shops, shrink-wrap (those writing software for sale) and information technology (or IT, those writing in-house applications), the cause of this bug comes down to communication problems.

In general, development teams don't communicate directly with their product's customers, so they aren't learning what the users need. Ideally, all members of the engineering team should be visiting customer sites so that they can see how the customers use their product. Watching over a customer's shoulder as your product is being used can be an eye-opening experience. Additionally, this experience will give you the insight you need to properly interpret what customers are asking your product to do. If you do get to talk to customers, make sure you speak with as many as possible so that you can get input from across a wide spectrum. In fact, I would strongly recommend that you stop reading right now and go schedule a customer meeting. I can't say it strongly enough: the more you talk with customers, the better an engineer you'll be.

In addition to customer visits, another good idea is to have the engineering team review the support call summaries and support e-mail messages. This feedback allows the engineering team to see the problems users are having without any filtering applied, and it can lead to all sorts of interesting ideas, including tools to help diagnose problems.

Another aspect of this kind of bug is the situation in which the user's level of expectation has been raised higher than the product can deliver. This inflation of user expectations is the classic result of too much hype, and you must resist misrepresenting your product's capabilities at all costs. When users don't get what they anticipated from a product, they tend to feel that the product is even buggier than it really is. The rule for avoiding this situation is to never promise what you can't deliver and to always deliver what you promise.

Process Bugs and Solutions

Although shipping software without bugs is theoretically possible (provided you give enough attention to detail and almost infinite time), I've shipped enough products to know that most companies would go out of business if they tried that. Bugs are a fact of life in this business, even at places like the National Aeronautics and Space Administration (NASA) Software Engineering Laboratory, which is considered the most bug-free development shop in the world. However, you can minimize the number of bugs your applications have. That is what teams that ship high-quality products (and there are many out there) do. The reasons for bugs generally fall into the following process categories:

  • Short or impossible deadlines

  • The "Code First, Think Later" approach

  • Misunderstood requirements

  • Engineer ignorance or improper training

  • Lack of commitment to quality

Short or Impossible Deadlines

We've all been part of development teams for which "management" has set a deadline that was determined by either a tarot card reader or, if that was too expensive, throwing a dart at the calendar. Although we'd like to believe that managers are responsible for most unrealistic schedules, more often than not, they aren't to blame. Engineers' work estimates are usually the basis of the schedule, and sometimes engineers underestimate how long it will take them to develop a solid product. Engineers are funny people. They are introverted but almost always very positive thinkers. Given a task, they believe down to their bones that they can make the computer stand up and dance. If their manager comes to them and says that they have to add an XML transform to the application, the average engineer says "Sure, boss! It'll be three days." Of course, that engineer might not even know how to spell "XML," but he'll know it'll take three days. The big problem is that engineers and managers don't take into account the learning time necessary to make a feature happen. In the section "Scheduling Time for Building Debugging Systems" in Chapter 2, I'll cover some of the rules that you should take into account when scheduling. Whether an unrealistic ship date is the fault of management or engineering or both, the bottom line is that a schedule that's impossible to meet leads to cut corners and a lower-quality product.

I've been fortunate enough to work on several teams that have shipped software on time. In each case, the development team truly owned the schedule, and we were good at determining realistic ship dates. To figure out realistic ship dates, we based our dates on a feature set. If the company found the proposed ship date unacceptable, we cut features to move up the date. In addition, everyone on the development team agreed to the schedule before we presented it to management. That way, the team's credibility was on the line to finish the product on time. Interestingly, besides shipping on time, these products were some of the highest-quality products I've ever worked on.

The "Code First, Think Later" Approach

My friend Peter Ierardi coined the term "Code First, Think Later" to describe the all-too-common situation in which an engineering team starts programming before they start thinking. Every one of us is guilty of this approach to an extent. Playing with compilers, writing code, and debugging is the fun stuff; it's why we got interested in this business in the first place. Very few of us like to sit down and write documents with UML diagrams that describe what we're going to do.

If you don't write these documents, however, you'll start to run into bugs. Instead of stopping and thinking about how to avoid bugs in the first place, you'll start tweaking the code as you go along to work around the bugs. As you might imagine, this tactic compounds the problem because you'll introduce more and more bugs into an already unstable code base. The company I work for goes around the world helping debug the nastiest problems that developers encounter. Unfortunately, many times we are brought in to help solve corruption or performance problems, and there's nothing we can do because the problems are fundamentally architectural. When we bring the problems to the management who hired us and tell them it's going to take a partial rewrite to fix them, we sometimes hear, "We've got too big an investment in this code base to change it now." That's a sure sign of a company that has fallen into the "Code First, Think Later" trap. In our reports on such engagements, we simply list "CFTL" as the reason we were unable to help.

Fortunately, the solution to this problem is simple: plan your projects. Some very good books have been written about requirement gathering and project planning. Although it is not every engineer's idea of a good time and is generally a little painful, up-front planning is vital to eliminating bugs.

One of the big complaints I got on previous versions of this book was that I recommended that you plan your projects but didn't tell you how to do it. That complaint is perfectly valid, and I want to make sure that I address it now. The only problem is that I really don't know how because I'm certainly not a project management guru. Now you're wondering if I'm doing the bad author thing and leaving it as an exercise to the reader. Read on, and I'll tell you what planning tactics have worked for me as an engineer and frontline manager. You'll still want to read books and articles on project management, but I hope my tactics provide you with some ideas to handle the battles all us engineers face when planning software.

If you read my bio at the end of the book, you'll notice that I didn't get started in the software business until I was in my late 20s and that it's really my second career. My first career was to jump out of airplanes and hunt down the enemy, as I was a paratrooper and Green Beret in the United States Army. If that's not preparation for the software business, I don't know what is! Of course, if you meet me now, you'll see just a short fat guy with a pasty green glow, a result of sitting in front of a monitor too much. However, I really used to be a man. I really did!

Being a Green Beret taught me how to plan. When you're planning a special operations mission and the odds are fairly high that you could die, you are extremely motivated to do the best planning possible. When planning one of those operations, the Army puts the whole team in isolation. At Fort Bragg, North Carolina, the home of Special Forces, there are special areas where they actually lock the team away to plan the mission. The key to the whole planning process was what we called "what if-ing yourself to death." We'd sit around and think about scenarios. What happens if we're supposed to parachute in, we pass the point of no return, and the Air Force can't find the drop zone? What happens if we have casualties before we jump? What happens if we hit the ground and can't find the guerilla commander we're supposed to meet? What happens if the guerilla commander we're supposed to meet has more people with him than he's supposed to? What happens if we're ambushed? We'd spend forever thinking up questions and devising the answers to these questions before ever leaving isolation. The idea was to have every contingency planned out so that nothing was left to chance. Trust me: when there's a good chance you might die when doing your job, you want to know all the variables and account for them. I tell this story to illustrate the truth of a statement I heard long ago in the Army: plans are worthless, but planning is everything. We worked so hard to look at the problem from every conceivable angle that we were able to adapt quickly when the unexpected happened.

When I got into the software business, that's the kind of planning I was accustomed to doing. The first time I sat in a meeting and said, "What if Bob dies before we get through the requirements phase?" everyone got quite nervous, so now I phrase questions with a less morbid spin, like "What if Bob wins the lottery and calls in rich before we get through the requirements phase?" However, the idea is still the same. Find all the areas of doubt and confusion in your plans and address them. It's not easy to do and will drive weaker engineers crazy, but the key issues will always pop out if you drill down enough. For example, in the requirements phase, you'll be asking questions such as, "What if our requirements aren't what the user wants?" Such questions will prompt you to budget time and money to find out if those requirements are what you need to be addressing. In the design phase, you'll be asking questions like, "What if our performance isn't good enough?" Such questions will make you remember to sit down and set your performance goals and start planning how you're going to achieve those goals by testing against real-world scenarios. Planning is much easier if you can get all the issues on the table. Just be thankful that your life doesn't depend on shipping software on time!

If you've ever done some reading on risk analysis, the "what if" approach might sound very familiar because risk analysis is the fancy term for my "what if" approach. In risk analysis, you're trying to find all the issues and, more importantly, assign levels to those risks so you can determine the likelihood of those issues coming back to hurt you. By finding the issues that will cost the most or have the highest probability, you can deal with them before they blow up in your face. Every project has its own peculiar risk issues, but unless you have an idea of what they are, you're never going to get the job done.
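A common way to assign those levels is to score each risk's probability and cost and then rank by their product, often called the risk exposure. The following sketch shows the arithmetic with entirely made-up numbers.

```csharp
using System;

class Risk
{
    public string Name;
    public double Probability; // 0.0 to 1.0
    public double CostInDays;  // schedule impact if it happens

    public Risk(string name, double probability, double costInDays)
    {
        Name = name;
        Probability = probability;
        CostInDays = costInDays;
    }

    // Classic risk exposure: probability times cost.
    public double Exposure
    {
        get { return Probability * CostInDays; }
    }
}

class RiskList
{
    static void Main()
    {
        Risk[] risks =
        {
            new Risk("Requirements misunderstood", 0.30, 40),
            new Risk("Performance goal missed",    0.20, 25),
            new Risk("Key engineer wins lottery",  0.05, 60),
        };

        // Attack the largest exposures first.
        Array.Sort(risks, delegate(Risk a, Risk b)
        {
            return b.Exposure.CompareTo(a.Exposure);
        });

        foreach (Risk r in risks)
        {
            Console.WriteLine("{0,-28} exposure = {1:F1} days", r.Name, r.Exposure);
        }
    }
}
```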

Debugging War Story: Severe CFTL

The Battle

A client called us in because they had a big performance problem and the ship date was fast approaching. One of the first things we ask for when we start on these emergency problems is a 15-minute architectural overview so that we can get up to speed on the terminology and get an idea of how the project fits together. The client hustled in one of the architects, and he started the explanation on the whiteboard.

Normally, these circle-and-arrow sessions take 10 to 15 minutes. However, this architect was still going strong 45 minutes later, and I was getting confused because I needed more than a roadmap to keep up. I finally admitted that I was totally lost and asked again for the 10-minute system overview. I didn't need to know everything; I just needed to know the high points. The architect started again and in 15 minutes was only about 25 percent through the system!

The Outcome

This was a large COM system, and at about this point I started to figure out what the performance problem was. Evidently, some architect on the team had become enamored with COM. He didn't just sip from a glass of COM Kool-Aid; he immediately started guzzling from the 55-gallon drum of COM. In what I later guessed was a system that needed 8 to 10 main objects, this team had over 80! To give you an idea how ridiculous this was, it was as if every character in a string was a COM object. This thing was over-engineered and completely under-thought. It was the classic case in which the architects had zero hands-on experience.

After about half a day, I finally got the manager off to the side and said that there wasn't much we could do for performance because the overhead of COM itself was the problem. He was none too happy to hear this and immediately blurted out this infamous phrase: "We've got too big an investment in this code to change now!" Unfortunately, with their existing architecture, we couldn't do much to effect a performance boost.

The Lesson

This project suffered from several major problems right from the beginning. First, team members handed over the complete design to non-implementers. Second, they immediately started coding when the plan came down from on high. There was absolutely no thought other than to code this thing up and code it up now. It was the classic "Code First, Think Later" problem preceded by "No-Thought Design." I can't stress this enough: you have to get realistic technology assessments and plan your development before you ever turn on the computer.


Misunderstood Requirements

Proper planning also minimizes one of the biggest bug causers in development: feature creep. Feature creep, the tacking on of features not originally planned, is a symptom of poor planning and inadequate requirements gathering. Adding last-minute features, whether in response to competitive pressure, as a developer's pet feature, or on the whim of management, causes more bugs in software than almost anything else.

Software engineering is an extremely detail-oriented business. The more details you hash out and solve before you start coding, the fewer you leave to chance. The only way to achieve proper attention to detail is to plan your milestones and the implementation for your projects. Of course, this doesn't mean that you need to go completely overboard and generate thousands of pages of documentation describing what you're going to do.

One of the best design documents I ever created for a product was simply a series of paper drawings, or paper prototypes, of the user interface. Based on research and on the teachings of Jared Spool and his company, User Interface Engineering, my team drew the user interface and worked through each user scenario completely. In doing so, we had to focus on the requirements for the product and figure out exactly how the users were going to perform their tasks. In the end, we knew exactly what we were going to deliver, and more importantly, so did everyone else in the company. If a question about what was supposed to happen in a given scenario arose, we pulled out the paper prototypes and worked through the scenario again.

Even though you might do all the planning in the world, you have to really understand your product's requirements to implement them properly. At one company where I worked (mercifully, for less than a year), the requirements for the product seemed very simple and straightforward. As it turned out, however, most of the team members didn't understand the customers' needs well enough to figure out what the product was supposed to do. The company made the classic mistake of drastically increasing the number of engineers but failing to train the new engineers sufficiently. Consequently, even though the team planned everything to extremes, the product shipped several years late, and the market rejected it.

There were two large mistakes on this project. The first was that the company wasn't willing to take the time to thoroughly explain the customers' needs to the engineers who were new to the problem domain, even though some of us begged for the training. The second mistake was that many of the engineers, both old and new, didn't care to learn more about the problem domain. As a result, the team kept changing direction each time marketing and sales reexplained the requirements. The code base was so unstable that it took months to get even the simplest user scenarios to work without crashing.

Very few companies train their engineers in their problem domain at all. Although many of us have college degrees in engineering, we generally don't know much about how customers will use our products. If companies spent adequate time up front helping their engineers understand the problem domain, they could eliminate many bugs caused by misunderstood requirements.

The fault isn't just with the companies, though. Engineers must also make the commitment to learn the problem domain. Some engineers like to think that they're building tools that enable a solution so that they can maintain their separation from the problem domain. As engineers, we're responsible for solving the problem, not merely enabling a solution!

An example of enabling a solution is a situation in which you design a user interface that technically works but doesn't match the way the user works. Another example of enabling a solution is building your application in such a way that it solves the user's short-term problem but doesn't move forward to accommodate the user's changing business needs.

When solving the user's problem rather than just enabling a solution, you, as the engineer, become as knowledgeable as you can about the problem domain so that your software product becomes an extension of the user. The best engineers are not those who can twiddle bits but those who can solve a user's problem.

Engineer Ignorance or Improper Training

Another significant cause of bugs results from developers who don't understand the operating system, the language, or the technology their projects use. Unfortunately, few engineers are willing to admit this deficiency and seek training. Instead, they cover up their lack of knowledge and, unintentionally, introduce avoidable bugs.

In many cases, however, this ignorance isn't a personal failing so much as a fact of life in modern software development. So many layers and interdependencies are involved in developing software these days that no one person can be expected to know the ins and outs of every operating system, language, and technology. There's nothing wrong with admitting that you don't know something. It's not a sign of weakness, and it won't take you out of the running to be the office's alpha geek. In fact, if a team is healthy, acknowledging the strengths and limitations of each member works to the team's advantage. By cataloging the skills their developers have and don't have, the team can get the maximum advantage from their training dollars. By shoring up every developer's weaknesses, the team will be better able to adjust to unforeseen circumstances and, in turn, broaden the whole team's skill set. The team can also schedule development time more accurately when team members are willing to admit what they don't know. You can build in time for learning and create a much more realistic schedule if team members are candid about the gaps in their knowledge.

The best way to learn about a technology is to do something with that technology. Years ago, when NuMega sent me off to learn about Microsoft Visual Basic so that we could write products for Visual Basic developers, I laid out a schedule for what I was going to learn, and my boss was thrilled. The idea was to develop an application that insulted you, appropriately called "The Insulter." Version 1 was a simple form with a single button that, when pressed, popped up a random insult from the list of hard-coded insults. The second version read insults from a database and allowed you to add new insults by using a form. The third version connected to the company's Microsoft Exchange server and allowed you to e-mail insults to others in the company. My manager was very happy to see how and what I was going to do to learn the technology. All your manager really cares about is being able to tell his boss what you're doing day to day. If you give your manager that information, you'll be his favorite employee. When I had my first encounter with Microsoft .NET, I simply dusted off the Insulter idea, and it became Insulter .NET!

I'll have more to say about what skills and knowledge are critical for developers to have in the section "Prerequisites to Debugging" later in this chapter.

Lack of Commitment to Quality

The final reason that bugs exist in projects is, in my opinion, the most serious. Every company and every engineer I've ever talked to has claimed to be committed to quality. Unfortunately, some companies and engineers lack the real commitment that quality requires. If you've ever worked at a company that was committed to quality or with an engineer who was, you certainly know it. They both exude a deep pride in what they are producing and are willing to spend the effort on all parts of development, not on just the sexy parts. For example, instead of getting all wrapped up in the minutia of an algorithm, they pick a simpler algorithm and spend their time working on how best to test that algorithm. The customer doesn't buy algorithms, after all; the customer buys high-quality products. Companies and individuals with a real commitment to quality exhibit many of the same characteristics: careful up-front planning, personal accountability, solid quality control, and excellent communication abilities. Many companies and individuals go through the motions of the big software development tasks (that is, scheduling, coding, and so on), but only those who pay attention to the details ship high-quality products on time.

A good example of a commitment to quality is when I had my first monthly review at NuMega. First off, I was astounded that I was getting a review that quickly when normally you have to beg for any feedback from your managers. One of the key parts of the review was to record how many bugs I had logged against the product. I was stunned to discover that NuMega would evaluate this statistic as part of my performance review, however, because even though tracking bugs is a vital part of maintaining a product's quality, no other company I had worked at had ever checked something so obvious. The developers know where the bugs are, but they must be given an incentive to enter those bugs into the bug tracking system. NuMega found the trick. When I learned about the bug count entry part of my review, you'd better believe that I logged everything I found, no matter how trivial. With all the technical writers, quality engineers, development engineers, and managers engaged in healthy competition to log the most bugs, few surprise bugs slipped through the cracks. More important, we had a realistic idea of where we stood on a project at any given time.

Two additional examples from the engineering side come from the first two editions of this book. The first edition's companion CD had over 2.5 MB of source code. The second edition, which covers both native and .NET debugging, had over 6.9 MB of source code (and that wasn't compiled code, it was just the source code!). That's a huge amount of code, and I'm happy to say many times more than what you get with most books. What many people don't realize is that I spent nearly 60 percent of the time on both books just testing the code. People get really excited when they find a bug in the Bugslayer's code, and the last thing I want is one of those "Gotcha! I found a bug in the Bugslayer!" e-mails.

In both books, I failed in my goal of zero bugs: the first edition had five bugs reported, and the second edition had 13. With the phenomenal testing tools that are part of Microsoft Visual Studio 2005, I'm striving extremely hard for zero bugs in this edition. The classic software book Software Reliability: Measurement, Prediction, Application, by John D. Musa, Anthony Iannino, and Kazuhira Okumoto (McGraw-Hill Book Company, 1987), states that the average code contains one bug per ten lines. What's the secret to producing that much code and keeping the bug counts down? It's simple: testing the heck out of it. In fact, it's such a big topic that I'm dedicating an entire follow-up book to testing, called, appropriately enough, Testing .NET 2.0 Applications.

As a development manager, I follow a ritual that I'm sure fosters a commitment to quality: each team member has to agree that the product is ready to go at every milestone. If any person on the team doesn't feel that the product is ready, it doesn't ship. I'd rather fix a minor bug and suffer through another complete day of testing than send out something the team wasn't proud of. Not only does this ritual ensure that everyone on the team believes the quality was there, but it also gives everyone on the team a stake in the outcome. An interesting phenomenon I noticed was that team members never stop the release for someone else's bug; the bug's owner always beats them to it.

A company's commitment to quality sets the tone for the entire development effort. That commitment starts with the hiring process and extends through the final quality assurance on the release candidate. Every company says that it wants to hire the best people, but few companies are willing to offer salaries and benefits that will draw them. In addition, some companies aren't willing to provide the tools and equipment that engineers need to produce high-quality products. Unfortunately, too many companies resist spending $500 on a tool that will solve a nasty crash bug in minutes but are willing to spend many thousands of dollars to pay their developers to flounder around for weeks trying to solve that same bug.

A company also shows its commitment to quality when it does the hardest thing to do in businessfire people who are not living up to the standards the organization set. When building a great team full of people on the right side of the bell curve, you have to work to keep them there. We've all seen the person whose chief job seems to be stealing oxygen but who keeps getting raises and bonuses like yours even though you're killing yourself and working late nights and sometimes weekends to make the product happen. The result is good people quickly realizing that the effort isn't worth it. They start slacking off or, worse yet, looking for other jobs.

When I was a project manager on one project, I dreaded doing it, but I fired someone two days before Christmas. I knew that people on the team were feeling that this one individual wasn't working up to standards. If they came back from the Christmas holiday with that person still there, I'd start losing the team we had worked so hard to build. I had been documenting the person's poor performance for quite a while, so I had the proper reasons for proceeding. Trust me, I would rather have been shot at again in the Army than fire that person. It would have been much easier to let it ride, but my commitment was to my team and to the company to do the quality job I had been hired to do. It was better to go through that upheaval than to have anyone turn off and stop performing. I agonized over every firing, but I had to do it. A commitment to quality is extremely difficult and will mean that you'll have to do things that will keep you up at night, but that's what it takes to ship great software and take care of your people.

If you do find yourself in an organization that suffers from a lack of commitment to quality, you'll find that there's no easy way to turn a company into a quality-conscious organization overnight. If you're a manager, you can set the direction and tone for the engineers working for you and work with upper management to lobby for extending a commitment to quality across the organization. If you're an engineer, you can work to make your code the most robust and extensible on the project so that you set an example for others.

Planning for Debugging

Now that we've gone over the types and origins of bugs, and you have some ideas about how to avoid or solve them, it's time to start thinking about the process of debugging. Although many people start thinking about debugging only when they have a crash during the coding phase, you should think about it right from the beginning, in the requirements phase. The more you plan your projects up front, the less time (and money) you'll spend debugging them later.

As I mentioned earlier in the chapter, feature creep can be a bane to your project. More often than not, unplanned features introduce bugs and wreak havoc on a product. This doesn't mean that your plans must be cast in stone, however. Sometimes you must change or add a feature to a product to be competitive or to better meet the user's needs. The key point to remember is that before you change your code, you need to determine, and plan for, exactly what will change. And keep in mind that adding a feature doesn't affect only the code; it also affects testing, documentation, and sometimes even marketing messages. When revising your production schedule, a general rule to follow is that the time it takes to add or remove a feature grows exponentially the further along the production cycle you are.

In Steve McConnell's excellent book Code Complete, 2nd Edition (Microsoft Press, 2004, pp. 29-30), he discusses the costs of fixing a bug. To fix a bug during the requirements and planning phases costs very little. As the product progresses, however, the cost of fixing a bug rises exponentially, as does the cost of debugging, much the same as if you add or remove features along the way.

Planning for debugging goes together with planning for testing. As you plan, you need to look for different ways to speed up and improve both processes. One of the best precautions you can take is to write file data dumpers and validators for internal data structures and for binary files, if appropriate. If your project reads and writes data to a binary file, you should automatically assign someone to write a testing program that dumps the data in a readable format to a text file. The dumper should also validate the data and check all interdependencies in the binary file. This step will make both your testing and your debugging easier.
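As an illustration, here's the skeleton of such a dumper for a made-up binary format (a magic number, a record count, then fixed-size records). The format, field names, and invariants are all assumptions you'd replace with your file's actual layout.

```csharp
using System;
using System.IO;

// Dumps a hypothetical binary format (4-byte magic number, record
// count, then fixed-size records) to readable text, validating the
// data and its interdependencies as it goes.
class BinDump
{
    const uint ExpectedMagic = 0x4D594150; // hypothetical signature

    static void Main(string[] args)
    {
        using (BinaryReader reader = new BinaryReader(File.OpenRead(args[0])))
        {
            uint magic = reader.ReadUInt32();
            if (magic != ExpectedMagic)
            {
                throw new InvalidDataException("Bad magic number");
            }

            int count = reader.ReadInt32();
            Console.WriteLine("Records: {0}", count);

            for (int i = 0; i < count; i++)
            {
                int id = reader.ReadInt32();
                double value = reader.ReadDouble();

                // Check whatever invariants the format promises.
                if (id <= 0)
                {
                    Console.Error.WriteLine("Record {0}: invalid id {1}", i, id);
                }
                Console.WriteLine("  [{0}] id={1} value={2}", i, id, value);
            }
        }
    }
}
```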

By properly planning for debugging, you minimize the time spent in your debugger, which is your goal. You might think such advice sounds strange coming from a book on debugging, but the idea is to try to avoid bugs in the first place. If you build sufficient debugging code into your applications, that codenot the debuggershould tell you where the bugs are. I'll cover the issues concerning debugging code more in Chapter 3.



