Planning for Adequate Testing: How Much Is Enough?

Test coverage is the percentage of all the known tests that were actually attempted and completed. It can be used to answer the question, "How much did we test?" To calculate what percentage of the whole you have accomplished, you must have some idea how big it is. We use a test inventory to answer the question, "How big is it?"
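For example, once an inventory exists, the coverage calculation itself is simple. The following is a minimal sketch in Python, assuming the inventory is kept as a mapping of test names to statuses; the names and statuses shown are purely illustrative:

    # Minimal sketch: coverage as the share of the known inventory that was
    # actually attempted and completed.
    def test_coverage(inventory: dict[str, str]) -> float:
        """Percentage of known tests whose status is 'completed'."""
        if not inventory:
            return 0.0
        completed = sum(1 for status in inventory.values() if status == "completed")
        return 100.0 * completed / len(inventory)

    # Hypothetical inventory; the entries are illustrative only.
    inventory = {
        "login with valid password": "completed",
        "login with expired password": "completed",
        "print preview on network printer": "not run",
        "import 10,000-row spreadsheet": "blocked",
    }
    print(f"Test coverage: {test_coverage(inventory):.0f}%")  # prints 50%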

Unfortunately, most test coverage in commercial software today seems to be estimated by gut feel. When the question is asked, "Did we test enough?", the answer is often along the lines of, "Well, we haven't found any bugs for a while; it's probably okay." This is part of the I-feel-lucky approach.

To give an objective and quantitative answer to the "Did we test enough?" question, a tester must first have made an estimate of how much there is to test. In my workshops, I ask how many testers build a list of all the things there are to test in a given system. Perhaps 1 tester in 30 reports regularly building such a list. Most testers report that they just start testing. If no one measures how much there is to test, then no one will ever know how much of it was tested. This is the "ignorance-is-bliss" corollary to the I-feel-lucky approach.

These testers always know that they did not test everything; but since they do not know exactly how much they did not test, they can rationalize, at least temporarily, that they tested enough. Even though enough testing is more a function of testing the right things than of testing everything, we must still have some idea how big the project is. This topic is discussed in detail later in this chapter (see Using Historical Data in Estimating Effort).

Planning for an Adequate Test Effort

Studies of production problems that I conducted in 1993 and 1994 showed that 80 to 90 percent of the bugs found in production had already been encountered during testing. About 60 percent of these bugs were difficult to reproduce, and the test effort did not have sufficient resources to track them down and get them fixed. Test coverage was sufficient, but the test effort was not. As mentioned in an earlier chapter, one of the myths in the software industry is that we always fix every bug we find. While this is generally true in safety-critical systems, it is not the case in most commercial software. A test effort does not usually fail because bugs went undetected; it fails because an unacceptable number of known bugs were shipped with the product.

In many cases, the main reason these bugs are hard to reproduce is that they only exist in certain environments; that is, they are not in the software being tested but in other supporting software in the system. How can a software maker be held responsible for bugs in someone else's screen driver or printer driver? On the other hand, how is a normal user, or even an average tester, supposed to recognize that the reason a machine sometimes locks up when he or she selects a graphing function in the word processor is a defect in a certain screen driver?

Another reason that some bugs are hard to reproduce is that the systems on which the software runs are large, multiuser systems. These are not finite state machines; our software systems are complex, event-driven societies of interacting applications. For example, the tester is listening to her favorite CD on her PC while using her LAN-connected Internet browser to do research for the paper she is writing in her word processor, and both the word processor and the browser are open on the screen. In addition to the operating system, a variety of drivers, such as the mouse, video, memory, fax, and CD drivers, are running concurrently in the windowing environment. The system locks up and the screen goes black when the fax tries to answer an incoming call. Duplicating, or even completely documenting, the exact state of such a system at the moment the fatal event occurs is generally impossible when the system in question is a single standalone machine, let alone when it is part of a network.

Testers alone cannot hope to catalog and control today's complex environments; development must take a proactive approach to making the product testable. Studies have shown that a significant percentage of production problems have been hard to reproduce because there were insufficient diagnostics in the environment, or because misleading or erroneous error messages were displayed when the bugs occurred. Good defensive programming and rich environmental diagnostics are required to isolate these bugs. In some cases, the best fix available is simply a notice that explains a bug that might occur. In any case, if the test effort fails to get enough bugs fixed, it will be judged a failure.
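To make the point about environmental diagnostics concrete, the sketch below is a hypothetical illustration, not a prescription from this chapter: it records a snapshot of the runtime environment whenever a failure occurs, so that a hard-to-reproduce bug at least arrives with some context attached.

    # Hypothetical sketch: record an environment snapshot alongside any failure.
    import json
    import platform
    import sys
    import traceback
    from datetime import datetime, timezone

    def environment_snapshot() -> dict:
        """Collect the environmental facts a bug report rarely includes."""
        return {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "os": platform.platform(),
            "python": sys.version,
            "machine": platform.machine(),
        }

    def run_with_diagnostics(action, *args, **kwargs):
        """Run a callable; on failure, report the traceback plus the environment."""
        try:
            return action(*args, **kwargs)
        except Exception:
            report = {
                "traceback": traceback.format_exc(),
                "environment": environment_snapshot(),
            }
            # In a real product this would go to a log file or crash reporter.
            print(json.dumps(report, indent=2), file=sys.stderr)
            raise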

Just as the developer's creation is the code that performs the functions of the product, the tester's creation is the set of tests that demonstrate the reliability of the product. Testers want to help create the most bug-free product possible; their frustration is that they usually cannot fix the bugs they find. The developers must be persuaded to do it, and in some cases those developers work for a third party. Sound test methods and good measurements are critical: they are the basis of all persuasive efforts with development and management, as well as of the testers' professional credibility. Without good measurements to give management a clear idea of the importance and possible impact of these hard-to-reproduce bugs, it will be difficult to convince them to spend money, and possibly slip delivery dates, to hunt those bugs down.

The point here is that test coverage alone is not enough to ensure a successful test effort. There are two parts to the successful test effort: adequate test coverage and an adequate test effort. Both parts must be considered in the planning process.

The most effective way to determine whether the test coverage is adequate is to measure it. Worksheets are an excellent tool for accumulating the measurements and applying factors of safety to the various tests and related tasks. I use worksheets both for estimating the resource requirements of the test effort and for keeping track of the actual counts during the test effort. I also use them to measure the actual performance of the test effort after the product has been deployed. This performance measurement provides the basis for the factor of safety that I will apply to correct the worksheet estimates for the next test effort.
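A minimal sketch of such a worksheet calculation follows. It assumes a simple inventory in which each item carries an estimated effort in hours, and it derives the factor of safety from how a previous test effort actually performed; all names and numbers here are hypothetical.

    # Minimal worksheet sketch: corrected estimate = raw estimate * factor of safety.
    from dataclasses import dataclass

    @dataclass
    class InventoryItem:
        name: str
        estimated_hours: float

    def factor_of_safety(actual_hours: float, estimated_hours: float) -> float:
        """Ratio of what the last effort really took to what was estimated."""
        return actual_hours / estimated_hours

    def corrected_estimate(items: list[InventoryItem], safety: float) -> float:
        """Raw worksheet total, scaled by the factor of safety."""
        return sum(item.estimated_hours for item in items) * safety

    # Example: the previous release took 600 hours against a 400-hour estimate,
    # so a factor of safety of 1.5 is applied to the new inventory.
    inventory = [
        InventoryItem("installation paths", 40.0),
        InventoryItem("data conversion", 60.0),
        InventoryItem("report generation", 80.0),
    ]
    safety = factor_of_safety(actual_hours=600, estimated_hours=400)
    print(f"Corrected estimate: {corrected_estimate(inventory, safety):.0f} hours")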

The remainder of this chapter deals with building the test inventory, which is the basis of the worksheet. Each of the next three chapters deals with techniques necessary to flesh out the worksheet and the test inventory. I discuss techniques for automating the creation and maintenance of the worksheet, as well as the calculations on the worksheet, throughout.


