8.8. Using Software Testing Effectively

Software testing is often the most difficult area of software development for a project manager to put in place because, unlike other disciplines of software engineering, many people have a negative attitude toward it. Some people think that good programmers simply don't write bugs, so if they hire good programmers, then they don't need testers. Others only see software testing as something that holds back development. However, there are many organizations where software testing is used as an effective tool for reducing project schedules and increasing user satisfaction. If a project manager works to combat the negative attitudes about testing, he can achieve these results on his own projects.

Commonly, people (mistakenly) think about software testing in one of two ways. Sometimes they expect testers to be "all-powerful." They should be able to catch every single bug in the software so that there will never be a user error or complaint, ever. To someone with this mindset, any complaint that comes from a user is seen as a failure of the software testers. On the other hand, people sometimes consider software testers to be little more than bean counters, whose job is to simply look for typos and for "insignificant" errors that the programmers might have missed in their "exhaustive" testing. Paradoxically, some people can even hold both of these misconceptions simultaneously. In other words, the expectations put on software testers are not only impossible to meet, but often contradictory.

When people have one of these common misunderstandings about what it is that software testers do, it generally leads to an even bigger misunderstanding of what quality is. They do not think about quality as how far a product has deviated from its specifications. Instead, they think about it as some sort of theoretical limit that would be really great to achieve, but could never actually be accomplished in practice.

Much of this confusion about quality and software testing comes from the fact that software testers do not add anything visible to the software; they "just" run it and report any problems. The actual mechanics of testing can seem repetitive, even pointless, to somebody who does not really understand their purpose. The software appears to be built, yet these testers keep running the same tests over and over on the same product, keeping the organization from releasing it. It falls to the project manager to defend the quality of the software by keeping the test activities from being shortchanged.

8.8.1. Understand What Testers Do

Testers are not a special breed of person. Software testing is a skill; it is a discipline of software engineering, just like programming, design, requirements analysis, and project management. Yet many people think that someone needs to be a certain kind of person in order to be a successful tester. They feel that being an effective software tester is purely a matter of disposition, rather than an acquired skill. It is very common for good testers to constantly hear their peers say things like, "I can't understand how you could do what you do" and "I could never do that to people."

Even highly respected thinkers in software engineering tout the virtues of having software tested by people who don't know anything about software engineering at all. It has even been suggested that software testing involves no skill and could be done by temps, college students, smart teenagers, and retirees. The thinking goes that anyone who is qualified to be a good tester would not want to be one, because the job is somehow so horrible that anyone who would be really good at it would opt for programming instead. This is ridiculous. Yet it is by far the most common opinion about software testing in the industry.

To understand what testers do, it's important to debunk certain popular myths. The first myth is that anyone off the street can test software. This is no truer than it would be of any other software engineering discipline. Testing requires training, skill, experience, and an understanding of the requirements and design of the application under test. A software requirements specification would be essentially unreadable to, say, a college student or temp with no software engineering background. How would such a person then be able to design a test strategy to verify it?

Many people mistakenly believe that software testing is a stepping-stone to programming. This is complete nonsense. The skills are completely different, and a good programmer will not learn any more about programming by testing software than a tester would by programming. They are different jobs with different skills. Nobody suggests that a programmer should first be a requirements analyst or project manager before beginning to program. But for some reason, many people suggest exactly this about those who work in quality assurance. All of those jobs have different skill sets, and all are necessary in order for a team to develop software. A good project manager will help her organization see that no one of those jobs is more important than any other.

Another myth is that testers have to be "nasty." The thinking behind this is that testers must enjoy criticizing other people about minute details. (This is also often what people mean when they say that testers should be "detail-oriented." Shouldn't everyone involved in software development be equally detail-oriented? Details can make or break the project.) In truth, testers do not need to be nasty; rather, they should be scientifically minded. Each test case is really an experiment: if the user interacts with the software in this way, does it produce the desired results? The testers really want the software to pass, and would prefer to move on to the next test rather than have to enter a defect.
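
To make the "experiment" framing concrete, here is a minimal sketch in Python. The discount_price function and its rules are hypothetical stand-ins for an application's behavior; each test states a hypothesis about what the software should do and checks whether the software confirms it.

    import unittest

    # Hypothetical function under test: applies a percentage discount to a price.
    def discount_price(price, percent):
        if not (0 <= percent <= 100):
            raise ValueError("percent must be between 0 and 100")
        return round(price * (1 - percent / 100.0), 2)

    class DiscountTest(unittest.TestCase):
        """Each test is an experiment: given this input, does the software
        produce the result the specification predicts?"""

        def test_ten_percent_discount(self):
            # Hypothesis: a 10% discount on 20.00 yields 18.00.
            self.assertEqual(discount_price(20.00, 10), 18.00)

        def test_impossible_discount_rejected(self):
            # Hypothesis: a discount over 100% is rejected, not applied.
            with self.assertRaises(ValueError):
                discount_price(20.00, 150)

    if __name__ == "__main__":
        unittest.main()

A passing test, like a confirmed hypothesis, lets the tester move on; a failing one is simply a result to report, not an accusation.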

The most common myth is that testers want to keep the organization from releasing software, and want to criticize the programmers who built it. It is common to talk about a "battle" between programming and QA, in which there is a lot of tension and competition between the two groups in the organization. This myth is highly counterproductive, because it drives a wedge between software engineers working toward the same goal. A good tester simply wants the truth about the software to be known, so that responsible and accurate decisions can be made about its health. Testers do not want to keep the software from being released; they just want the stakeholders and managers to make an informed decision about releasing it.

In other words, testers don't want to be mean to the programmers; they want to help them make the code better, so that everyone can be proud of the product. There is no mystery to what motivates a good software tester. Testers want to do a good job and take pride in a well-engineered product, just like all other software engineers. It's their job to catch defects; if they miss defects, they look like lousy testers. They aren't hired to abuse programmers; they're hired to "abuse" software, and they do it with the software's best interests in mind. Software testers certainly do not want to make specific comments about individual programmers. In fact, when they are testing, they rarely even know who wrote the code that broke. All a tester wants to do is report that taking a certain action with the software does not yield the results she expected. It's up to management to decide how to handle that.

By and large, these myths are not true. However, there are some people who were attracted to software testing specifically because of these myths and, as a result, turn them into self-fulfilling prophecies. There's a tiny minority of people in the software testing world who really do enjoy being nasty to programmers. Often, these people do not have much skill in software testing. However, these people represent the exception and not the rule. Treating all software testers based on this stereotype is always counterproductive.

8.8.2. Divide Quality Tasks Efficiently

When looked at from a high level, it seems that testing tasks can be easily divided between testers and programmers. The programmers are responsible for making sure that the software does what they intended it to do; the testers are responsible for making sure that the software does what the users and stakeholders intended it to do. However, when testing is actually under way, grey areas of responsibility often emerge between programming and QA that can be a source of contention inside the team. It is up to the project manager to make sure that tasks are distributed according to what is efficient, not according to how people define their jobs.

Many people mistakenly believe that software testers can test "everything" in an application. When a defect is found after the software is released, they immediately look to the QA team to explain why it was not caught. Sometimes it is true that QA should have caught a defect. If a behavior is written into the specification or design but the software does not properly implement it, the testers should have caught the discrepancy. If there are test cases that were supposed to exercise the behavior and were marked as "passed," then it is reasonable to assume that a tester made a mistake. But it is unreasonable to think that every client complaint, every crash, or every "bug" should be caught by QA. Not only is it unreasonable, it is simply impossible. The job of a software tester is to verify that the software meets its requirements. If there is a core behavior that users will expect and that stakeholders need, the only way for the tester to know about it is for a requirements analyst or designer (or a project manager, if necessary!) to write down exactly how the software should behave. It is unreasonable to expect the tester to come up with that behavior independently.

Quality is everyone's responsibility. Some programmers look at QA as a sort of "quality dumping ground" where all quality tasks can be relegated. Study after study, book after book, and, most importantly, practical experience show that this approach fails every time. Software testers just can't tack quality onto a product at the end of the project. Quality must be planned in from the beginning, and every project team member needs to do his or her part to make sure that the defects are caught.

Consider the example of a defect that is found under very complicated conditions: for example, it may occur in only one environment with a specific set of data. In some organizations, rules put in place by management require that the software testers spend days researching what might be causing the problem: reconstructing scenarios, attempting to reproduce it in different environments, trying to recreate corrupted data... all of which is highly time-consuming. This is a very inefficient use of engineering time if a programmer could simply have loaded the software into a debugger in the same environment where the tester originally found the defect, recreated the problem using the same test data, and fixed it on the spot.

Another example of a grey area is a critical crash that makes an important feature of the software essentially impossible to use, and that is immediately apparent on installation. For example, the crash may occur as soon as the feature is launched. This is a common argument between programmers and testers: the tester immediately rejects the build, but the programmer may insist that it was perfectly reasonable to send such a broken build to QA. What the programmer may not realize is that the tester may have spent several hours setting up the test environment (imaging machines, reconfiguring the network, installing database software, etc.), only to abort the test within five minutes. And, because the install has already been done, the entire environment will have to be set up again. This is clearly an inefficient use of time. The programmer could have taken a few minutes to compile the build and launch the software; he would have found the problem immediately. This is why it is important to have quality-related activities like unit tests and smoke tests in place that are always performed (and passed) by the programmer before the build is turned over to the testers.
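
As an illustration, here is a minimal sketch of the kind of smoke test a programmer might run before handing a build over. The APP_COMMAND and the myapp.py entry point are hypothetical stand-ins for the real build's launch command.

    import subprocess
    import sys
    import unittest

    # Hypothetical launch command for the application under test; substitute
    # the real build's entry point.
    APP_COMMAND = [sys.executable, "myapp.py", "--version"]

    class SmokeTest(unittest.TestCase):
        """A smoke test asks only one question: does the build start and
        respond at all? It complements, but does not replace, the full
        test suite."""

        def test_application_launches(self):
            # If the build crashes on launch, this fails in seconds,
            # before a tester spends hours setting up a test environment.
            result = subprocess.run(APP_COMMAND, capture_output=True, timeout=30)
            self.assertEqual(result.returncode, 0,
                             "build crashed or exited abnormally on launch")

    if __name__ == "__main__":
        unittest.main()

Running a check like this takes the programmer minutes and can save the test team the hours of environment setup described above.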

Decisions about who is responsible for quality tasks should not be settled by arguments over job boundaries. It should not be about what programmers or testers feel should or should not be in their job descriptions. Instead, the decision should be made based on what is most efficient. If a task will take two hours for a programmer but three days for a tester, then clearly that task should be assigned to the programmer. This is why the project manager needs a solid understanding of what the testers do, and of what's involved in setting up the test environment. That way, if there is an argument about whether a programmer should take on a certain quality task, the project manager can demonstrate why it is much more efficient for the programmer to do so, or choose to assign the task to the tester, if that's what makes sense. The most important part of this is for the project manager to make the decisions that have the smallest impact on the schedule.

8.8.3. Manage Time Pressure

The most common complaint about testing that project managers hear from senior managers is that the testers are wasting time, when a build could be released today. It is often very frustrating for a senior manager to hold a CD or see a web site that looks complete, only to be told that he has to wait for the testers to finish their jobs. The features seem to work, the software seems to do what it needs to do, and everything looks correct. Can't the software be rolled out now?

It is possible that even the very first build that gets created could be ready to be shipped before a single test is executed. In fact, this is what the QA team is hoping for! If the team did an excellent job of defect prevention through requirements gathering, unit testing, inspections, and reviews, then it is possible that not a single defect will be found that is serious enough to prevent the software from shipping. But it is very unlikely for this to be the case; few (if any) experienced testers have seen this. It is much more likely that the tests will uncover a defect that the senior managers decide must be fixed before the software is shipped. This is why testing tasks are necessary, and why the project manager must fight for time to finish testing the software.

Testing tasks cannot be cut any more than programming tasks can. Yet, when the project is late, senior managers and project stakeholders will often put a great deal of pressure on the project manager to do exactly that. Sometimes they will insist that regression tests be replaced with smoke tests (or cut entirely); other times, they will cut out environments, performance tests, or other activities that are critical to ensuring the software functions. It is the project manager's responsibility to resist this and assure the quality of the software.

One way that software quality is often compromised through time pressure is by putting pressure on testers to make overly aggressive estimates, and then to work overtime to meet those estimates. This happens in testing more than it happens anywhere else in a software organization, simply by virtue of the fact that testing activities are the last ones performed on the project, and the fact that few people really understand what it is that testers do. The key to avoiding this is for a project manager to use a repeatable estimation process (see Chapter 3), to resist the urge to cut quality tasks from the schedule, and to constantly stand up for the fact that the time that was estimated is what will actually be needed to test the software.

For example, if it's known that one iteration of regression tests takes three weeks, and the stakeholders choose during triage to repair defects and cut a new build, then it must be made clear to them that they are making the decision to spend three more weeks testing the software. There are two options at that point: either release the software as is, or bite the bullet and take the three weeks. One effective way to explain this is to point out that had the last iteration of regression testing not been done, the team would not even know about the defects that they want to fix, so not testing is a big risk.

Another common misguided request is that testers cut out any test cases that are not seen as representative of typical user behavior. For example, in the test case example in Table 8-2, the five bullet points in the requirement being tested require at least five different test cases. A manager putting pressure on the tester may insist that only one of the tests for that requirement be executed. Yet that manager would never go to the programmer and suggest that they only implement one of the five bullets. (This is especially likely with negative cases and boundary cases.) The bottom line is that if a feature is important enough to build, then it's important enough to make sure that it works properly.
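
To illustrate (Table 8-2 itself is not reproduced here), here is a minimal sketch using a hypothetical requirement: a user ID must be 1 to 16 alphanumeric characters. Cutting every test but the "typical" one would leave the boundary and negative cases, where defects often hide, unverified.

    import unittest

    # Hypothetical requirement (standing in for a Table 8-2 bullet): a user
    # ID must be 1 to 16 characters, letters and digits only.
    def is_valid_user_id(user_id):
        return 1 <= len(user_id) <= 16 and user_id.isalnum()

    class UserIdRequirementTest(unittest.TestCase):
        """One requirement, several test cases: typical, boundary, and
        negative. All of them verify behavior the requirement demands."""

        def test_typical_case(self):
            self.assertTrue(is_valid_user_id("alice42"))

        def test_boundary_cases(self):
            self.assertTrue(is_valid_user_id("a"))           # minimum length
            self.assertTrue(is_valid_user_id("a" * 16))      # maximum length

        def test_negative_cases(self):
            self.assertFalse(is_valid_user_id(""))           # below minimum
            self.assertFalse(is_valid_user_id("a" * 17))     # above maximum
            self.assertFalse(is_valid_user_id("bad name!"))  # illegal characters

    if __name__ == "__main__":
        unittest.main()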

The project manager must fight for the estimated time frame. If the estimates are overly aggressive and the testers need to work overtime to meet their own goal, that's fine. The testers should not have agreed to the inaccurate estimates. However, it's important not to routinely undercut the tester. It's very common for a manager under pressure to require, for example, that the testers come in on weekends to meet a deadline, or to arbitrarily cut down the amount of time that it takes to run a regression test. This is unfair. It's no different than telling a programmer who needs three weeks to build a feature that she only has two. Something will give and, in either of these cases, it will lead to problems with the software.

8.8.4. Gather Metrics

Metrics are statistics gathered over the course of the software project in order to identify accomplishments and areas for improvement. With a good metrics program, it's possible to compare projects to one another, regardless of size or scope. The ability to make these comparisons will help a project manager make consistent judgments and set standards for project teams across multiple projects.

It is not difficult to gather most simple metrics. The information needed to calculate them is available in the defect tracking system and the project schedule. (In Chapter 4, valuable earned value metrics were gathered and discussed.) In addition to earned value, there are several other metrics that can be useful to a project manager. The project manager should work with the senior management of the organization to determine what is worth measuring.

All metrics should be taken on an organization-wide level. It is important that the numbers that are gathered are used for improving the organization, and not for rewarding or penalizing individual software engineers. It is tempting, for example, to make part of an annual bonus dependent on the team reducing certain defect measurements. In practice, that is an effective way to make sure that defects do not get reported or tracked properly. Here is a list of metrics commonly used in software projects:

  • The Defects per Project Phase metric provides a general comparison of defects found in each project phase. It requires the person entering a defect in the defect-tracking database to record the phase in which the defect was found, and it requires that defects from document reviews be included and prioritized in the same manner as all other defects. (For example, defects found during an SRS inspection would be classified as found during the requirements phase.) The metric is usually shown as a bar graph, with the project phases on the X-axis and the number of defects found in each phase on the Y-axis, broken down into one sub-bar per priority. (The sketch after this list shows how counts like these might be computed.)

  • The Mean Defect Discovery Rate metric tracks the number of defects found, weighted by effort, on a day-by-day basis over the course of the project. This is a standard and useful tool for determining whether a specific project is following a projected defect discovery rate (based on previous projects and industry averages). The rate should slow down as the project progresses, so that far more defects are found at the beginning of software testing than at the end. If this metric remains constant or, even worse, increases over the course of several test iterations, it could mean that there are serious scope or requirements problems, or that the programmers did not fully understand what the software was supposed to do.

  • The Defect Resolution Rate metric tracks the time taken to resolve defects, from the time that they are entered until the time that they are closed. It is calculated by dividing the number of defects that are not yet closed by the average time it takes to close a defect. This rate can be used to predict release dates by extrapolating from the currently open defects.

  • Defect Age is used to measure how effective the review and inspection process is. A defect that was introduced in the scope or requirements phase but not discovered until testing is far more costly to fix than it would have been had an inspection caught it early, so the lower the mean defect age, the better. To capture the defect age metric, the root cause, or the project phase in which each defect was introduced, must be determined during triage and entered into the defect tracking database. (This is referred to as root cause analysis.) An age can then be assigned to each defect based on the phase in which it was discovered and the phase in which it was introduced. By comparing the average defect age for the entire project with the average age of defects introduced in a specific phase, the project manager can identify which inspections and reviews are most effective, and introduce better inspection practices to target the costliest defects.

  • The Percentage of Engineering Effort metric compares how engineering effort (in man-hours) and actual calendar time break down across project phases. It measures the average number of man-hours spent in each phase; some phases consume a lot of calendar time but do not require many man-hours. These measurements are useful for determining where additional training, staff, or process improvement is needed, and for gauging whether effort is being used efficiently over the course of the project schedule.

  • Defect Density measures the number of defects per KLOC (thousand lines of code). This is one of the most common metrics used in software engineering, and it is often used to determine whether the software is ready for release. A test plan may specify a maximum defect density in its acceptance criteria: for example, the software may not contain any critical or high-priority defects, and must contain fewer than a specific number of low-priority defects per KLOC.
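
As a concrete illustration, here is a minimal sketch (in Python) of how a few of these metrics might be computed from a defect-tracking export. The field names, phases, and figures are hypothetical; a real tool's export format will differ.

    from collections import Counter

    # Hypothetical export from a defect-tracking system; real field names
    # and phase lists will vary by tool and by organization.
    defects = [
        {"found_in": "requirements", "introduced_in": "requirements", "priority": "high"},
        {"found_in": "testing",      "introduced_in": "requirements", "priority": "high"},
        {"found_in": "testing",      "introduced_in": "coding",       "priority": "low"},
    ]

    PHASES = ["requirements", "design", "coding", "testing"]

    # Defects per Project Phase: the raw counts behind the bar graph.
    per_phase = Counter(d["found_in"] for d in defects)
    for phase in PHASES:
        print(f"{phase:>12}: {per_phase[phase]} defect(s)")

    # Defect Age: phases elapsed between introduction and discovery
    # (lower is better, since late discoveries are costlier to fix).
    ages = [PHASES.index(d["found_in"]) - PHASES.index(d["introduced_in"])
            for d in defects]
    print(f"mean defect age: {sum(ages) / len(ages):.2f} phases")

    # Defect Density: defects per KLOC (thousand lines of code).
    kloc = 42.0  # assumed size of the code base, in thousands of lines
    print(f"defect density: {len(defects) / kloc:.2f} defects per KLOC")

Because the same export drives all of these numbers, gathering the metrics costs very little once defect-tracking data is entered consistently, which is another reason to keep the metrics organization-wide rather than punitive.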


Note: More information on software testing can be found in Testing Computer Software by Cem Kaner, Jack Falk, and Hung Quoc Nguyen (Wiley, 1999). More information on software metrics can be found in Software Metrics: Establishing a Company-wide Program by Robert Grady and Deborah Caswell (Prentice Hall, 1987).

