Defect prevention aims to learn from defects found so far on the project and to prevent defects in the rest of the project. As discussed in Chapter 5, defect prevention activities are usually done twice in a project: once when about 20% of the modules have been coded and unit tested, and again when 50% of the modules have been coded and unit tested. The main tasks of defect prevention are to perform Pareto analysis to identify the main defect types, perform causal analysis to identify the causes of defects, and identify solutions to attack the causes. Here we discuss how these tasks are performed in a project.
A common statistical technique used for analyzing causes, Pareto analysis is one of the primary tools for quality management.7,8 It is also sometimes called the 80-20 rule: 80% of the problems come from 20% of the possible sources. In software it can mean that 80% of the defects stem from 20% of the root causes or that 80% of the defects are found in 20% of the code.
The first step in defect prevention is to draw a Pareto chart from the defect data. The number of defects found of each type is computed from the defect data and plotted as a bar chart in decreasing order. Along with the bar chart, another curve is plotted on the same graph showing the cumulative number of defects as you move from the defect types on the left of the x-axis to those on the right. The Pareto chart makes it immediately clear, in visual as well as quantitative terms, which are the main types of defects and which types together account for 80% to 85% of the total defects. Instead of plotting the number of defects, you can plot a weighted sum by assigning different weights to different types of defects.
The overall procedure for doing the Pareto analysis is as follows:
1. List all the defects identified so far.
2. Calculate the total number of defects by type.
3. Sort defects by type in descending order of number of defects.
4. Calculate the percentage of each defect type with respect to the total number of defects detected.
5. Identify the defect types that together cause about 80% of the total defects.
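The five steps above can be sketched in a few lines of Python. The defect log below is illustrative, reconstructed from category counts only (it borrows the numbers later shown in Table 11.7); the 80% line is a conventional cutoff, not a hard rule.

```python
from collections import Counter

def pareto_analysis(defect_log):
    """Tally defects by type (step 2), sort in descending order (step 3),
    and compute each type's percentage and cumulative share (steps 4-5)."""
    counts = Counter(defect_log)
    total = sum(counts.values())
    rows, cumulative = [], 0.0
    for dtype, n in counts.most_common():
        pct = 100.0 * n / total
        cumulative += pct
        rows.append((dtype, n, round(pct, 1), round(cumulative, 1)))
    return rows

# Illustrative defect log, reconstructed from category counts only
log = (["logic"] * 19 + ["standards"] * 17 + ["redundant code"] * 11
       + ["ui"] * 8 + ["architecture"] * 2)
for dtype, n, pct, cum in pareto_analysis(log):
    print(f"{dtype:15} {n:3} {pct:5.1f}% {cum:6.1f}%")
```

Reading down the cumulative column shows where the 80% line falls; the types above that line become the targets for defect prevention.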
For example, consider the Pareto chart of the defect data for the ACE project shown in Figure 11.5. In this project, features are being added to an existing system. The defect data for all previous enhancements was used for this analysis. As you can see, logic defects are the most numerous, followed by user interface defects and standards defects. Defects in these three categories together account for more than 88% of the total defects, and the defects in the top two categories account for more than 75%. Clearly, the target for defect prevention should be the top two or three categories.
The Pareto chart helps to identify the main types of defects that have been found in the project so far and are likely to be found in the rest of the project unless action is taken. These defects can be treated as "effects" that you want to minimize in the future. To reduce these defects, you must find their main causes and then try to eliminate them. A cause-effect (CE) diagram can be used to determine the causes of the observed effects.7,8 For example, the cause-effect diagram can be used to determine the main causes for the high number of GUI defects (or logic defects) in the ACE project. The main purpose of the CE diagram is to graphically represent the relationship between an effect and its various possible causes. Understanding the causes helps to identify solutions to eliminate them.
The first step in building a cause-effect diagram is to identify the effect to be analyzed. In the ACE example, the effect could be "too many GUI errors." To identify the causes, you first establish some major categories of causes. For manufacturing, these major causes often are manpower, machines, methods, materials, measurement, and environment. For causal analysis at Infosys, the standard set of major causes of defects is process, people, technology, and training (training is separated from people because it shows up very often). The main structure of the diagram shows the effect as a box on the right; a straight horizontal line extends from the box, and an angular line for each major cause connects to the main line.
To analyze the causes, the key is to ask, "Why does this cause produce this effect?" for each of the major causes. The answers to these questions become the subcauses and are represented as short horizontal lines joining the line representing the major cause. Then the same question is asked for the causes just identified. This "Why-Why-Why" process is repeated until all the root causes have been identified, that is, the causes for which asking "Why" no longer makes sense. When all the causes are marked in the diagram, the final picture looks like a fish-bone structure; hence, the cause-effect diagram is also called a fish-bone diagram, or an Ishikawa diagram after its inventor.
The main steps in drawing a cause-effect diagram are as follows8:
1. Clearly define the problem (the effect) to be studied. For defect prevention, it typically is "too many defects of type X."
2. Draw an arrow from left to right with a box containing the effect drawn at the head. This is the backbone of the diagram.
3. Determine the major categories of causes. These could be the standard categories or some variation to suit the problem.
4. Write these major categories in boxes and connect them with diagonal arrows to the backbone. These form the major bones of the diagram.
5. Brainstorm for the subcauses of the major causes by asking repeatedly, for each major cause, "Why does this major cause produce the effect?"
6. Add the subcauses to the diagram clustered around the bone of the major cause. If necessary, further subdivide these causes. Stop when no worthwhile answer to the question can be found.
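As a data structure, the diagram the steps above produce is just a tree: one effect, a bone per major cause category, and subcauses hanging off each bone. The sketch below encodes the Infosys categories and a few subcauses that the ACE analysis (discussed next) identified; a real diagram would carry the full brainstormed list.

```python
# A cause-effect (fishbone) diagram represented as a nested structure.
# Categories follow the Infosys convention; the subcauses listed are a
# sample from the ACE analysis, not the complete set.
fishbone = {
    "effect": "too many logic/GUI/standards defects",
    "causes": {
        "people": ["oversight (incomplete attention)", "lack of technical skills"],
        "process": ["standards not comprehensively documented",
                    "people not aware of standards"],
        "technology": ["unclear specifications", "technical problems of tools"],
        "training": ["lack of training"],
    },
}

def print_fishbone(diagram):
    """Render the diagram as an indented text outline (one bone per category)."""
    print("EFFECT:", diagram["effect"])
    for category, subcauses in diagram["causes"].items():
        print(f"  {category}:")
        for cause in subcauses:
            print(f"    - {cause}")

print_fishbone(fishbone)
```

Each level of nesting corresponds to one round of asking "Why?"; deeper subcauses would simply be another layer of lists.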
When the fishbone diagram is finished, you have identified all the causes of the effect under study. However, most likely the initial fishbone diagram will have too many causes. Clearly, some of the causes have a greater impact than others. Hence, before completing the root cause analysis, you identify the top few causes, largely through discussion. For defect prevention, you can conduct this entire exercise for the top one or two categories of defects found in the Pareto analysis.
Figure 11.6 shows the fish-bone diagram for the ACE project. In this analysis, causes of the three major types of defects were discussed in one brainstorming session. Hence, our effect was "too many logic/GUI/standards defects." When we asked the question, "Why do people cause too many logic or GUI or standards defects?" we identified some of the (almost obvious) reasons: lack of training, oversight (that is, incomplete attention), and lack of technical skills. Similarly, when we asked, "Why do processes cause too many logic/GUI/standards defects?" the answers were "standards not comprehensively documented" and "people not aware of standards." For technology the causes were "unclear specifications" and "technical problems of tools." The brainstorming sessions for the causal analysis generated many more causes. After listing all the suggestions made during the meeting, the defect prevention team prioritized them by considering each of the defects and identifying its causes. The causes that show up most frequently are the high-priority ones. They are shown in Figure 11.6.
So far we have discussed how to identify the types of frequently occurring defects and their root causes. The next phase is to take action to reduce the occurrence of defects.
The basic paradigm is the adage "An ounce of prevention is worth a pound of cure." With defect prevention, you are not trying to "cure" the software of defects; instead, you are taking preventive actions so that the software does not "fall sick" from defects. Common prevention actions are creating or improving checklists, holding training programs and reviews, and using a specific tool. Sometimes, of course, you must take drastic actions such as changing the process or the technology.
The solutions, like the cause-effect analysis, are developed through a brainstorming session. Hence, these two steps are often done in the same session. This is how it is done at Infosys.
The preventive solutions are designated as action items that someone must perform. Hence, the implementation of the solutions is the key. Unless the solutions are implemented, they are of no use. At Infosys, along with the solution, the person responsible for implementing it is also specified. These action items are then added to the detailed schedule of tasks for the project, and their implementation is tracked like other tasks. Table 11.6 shows the root causes and the preventive actions developed for the ACE project. The proposed preventive actions are self-explanatory. They were scheduled in the MSP schedule of the project.
An important part of implementing these solutions is to see whether they are having the desired effect of reducing the injection of defects and thereby reducing the rework effort. Further analysis of defects found after the solutions have been implemented can give insight into this question. Generally, the next analysis for defect prevention can be used for this purpose. In addition to tracking the impact, such follow-up analysis has a tremendous reinforcing effect. Seeing the benefits convinces people as nothing else does. Hence, in addition to implementation, the impact of implementation should also be analyzed.
Now let's look at the DP process for the ACIC case study. Defect data after the first construction iteration were analyzed, and the frequency of the various types of defects is shown in Table 11.7. Figure 11.7 shows the Pareto chart for the defect data.
The main purpose of defect prevention activities is to reduce the defect injection rate. In the first iteration, the ACIC project manager knew that at least 57 defects were injected. From the effort data, he calculated the defect injection rate for the build phase as 0.33 defects per person-hour. As per the plan, about 70% of the defects were expected to be injected in the build activity, whose estimated effort was about 110 person-days (excluding the estimate for the rework effort). That is, as per the quality and effort plan, the defect injection rate during coding was expected to be around 0.1 defects per person-hour. But after the first iteration, the defect injection rate was more than three times that much! Clearly, defect prevention activities were needed to achieve the target.
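The rate comparison is simple arithmetic. The build effort of the first iteration is not stated explicitly, so the 172 person-hours below is an assumed figure chosen to be consistent with the 0.33 defects/person-hour cited; the planned rate of 0.1 comes from the quality and effort plan.

```python
# Sketch of the injection-rate check described above.
defects_injected = 57      # defects known to be injected in iteration 1
build_effort_hours = 172   # ASSUMED person-hours for the build phase
planned_rate = 0.1         # planned rate from the quality/effort plan

actual_rate = defects_injected / build_effort_hours
print(f"actual rate: {actual_rate:.2f} defects/person-hour")   # 0.33
print(f"ratio to plan: {actual_rate / planned_rate:.1f}x")     # 3.3x
```

Any actual rate well above the planned 0.1 defects/person-hour signals that defect prevention actions are needed to meet the quality target.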
Table 11.6. Root Causes and Proposed Solutions for the ACE Project

Root Cause | Preventive Actions | Assigned To | Implementation Date
Standards not followed | Do a group reading of the standards (after they have been updated). Ensure that standards are followed in the mock projects done. | All | 15/12/00
Standards/checklists not documented well | Do a group review of the standards with an expert from outside and then update the standards. | xxxx | Next week
Oversight (incomplete attention) | Effective self-review; rigorous code reviews | All | Immediate effect
Unclear/incorrect specifications | Specification reviews | All | Immediate effect
Lack of training | Every new entrant will do a mock project, whose code will be reviewed and tested thoroughly. A detailed specification and test plan will be made for the same. | xxxx | 29/12/00
Technical problems | Create awareness in people about the problems with the tools and how to avoid them. Write a BOK on this and make it available. | |
Lack of technical skills | Document a BOK on topics like Sheridan grids, recordsets, Active Reports. | xxxx | 31/01/01
To reduce the defect injection rate significantly, the project manager decided to tackle the top three categories of defects: logic, standards, and redundant code. A brainstorming session was held to identify the root causes and possible preventive actions. The regular procedure for brainstorming was followed. First, all the possible causes that anyone suggested were listed, and then the ones identified as the main culprits were separated out. For these causes, possible preventive actions were discussed and finally agreed on. Table 11.8 shows the final result of the causal analysis meeting, namely, the main root causes and the preventive actions to be implemented. Many of these preventive actions became schedulable activities and were added to the project schedule and then later executed (those assigned to "self" were monitored informally).
Table 11.7. Summary of Defect Data after First Iteration, ACIC Project
Defect Type | Number of Defects |
Logic | 19 |
Standards | 17 |
Redundant code | 11 |
UI | 8 |
Architecture | 2 |
Total | 57 |
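A quick arithmetic check on Table 11.7 shows why the top three categories (logic, standards, and redundant code) are the natural defect prevention targets: together they account for more than 80% of the 57 defects.

```python
# Category counts from Table 11.7 (ACIC project, first iteration)
counts = {"logic": 19, "standards": 17, "redundant code": 11,
          "ui": 8, "architecture": 2}
total = sum(counts.values())
top3 = counts["logic"] + counts["standards"] + counts["redundant code"]
share = 100 * top3 / total

print(total)             # 57, matching the table's total row
print(round(share, 1))   # 82.5, the top-three share in percent
```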
The preventive actions given in the table are proposals by the team members; the project manager had to ascertain whether they gave the desired result. Whether these measures succeeded in reducing the defect injection rate could be checked only through the defect data.
The defect prevention activities were performed after the first construction iteration was done. Because the ACIC project had three such iterations, the defect injection rate after the next two iterations was also computed. Figure 11.8 shows the result of the analysis done after the other two iterations. This chart clearly shows the impact of implementing the preventive actions on the defect injection rate: It fell from more than 0.33 to less than 0.1!
Table 11.8. Root Causes and Preventive Actions for the ACIC Project
Defect Type (Number of Defects) | Root Cause | Preventive Action | Assigned To |
Standards (17) | Lack of programming experience | Training. | Self |
| Oversight | Developers should read the coding standards carefully and adhere to them strictly. | Self |
| Lack of understanding of program specs use and need | (i) Come up with a method to generate program specs from Rational Rose. (ii) Prepare a checklist for reviewing program specs. (iii) Prepare guidelines for writing program specs. | xxxx |
| Coding standards not updated | Update coding standards and prepare a document listing the applicable project-specific UI standards. | xxxx |
Redundant Code (11) | Lack of understanding of language | Training. | xxxx |
| Lack of understanding of object model and database | (i) Training on database structure. (ii) Developer should go through the object model thoroughly. | Session on DB to be taken by xxxxx |
| Lack of understanding of existing code | Group to discuss in a meeting and finalize the set of general method calls and identify where they should be called from. | Team |
| Lack of understanding of table model | Understand the functionality of table model and dependency on Table Selection and inform the team about it. | xxxx |
Logic (19) | Lack of understanding of existing code | Arrange code reading sessions. | Self |
| Lack of programming experience | Training. | xxxx |
| Lack of understanding of sequence diagram representations | Give training in Rational Rose. | xxxx |
| Lack of understanding of database and associated processes. | Same as earlier. | xxxx |
| Lack of understanding of object model | Same as earlier. | Self |
| Oversight | Self-testing by programmer should be made more thorough. A session on how to test a small part of code to be taken. | Training by xxxx |
| Lack of understanding of use cases | Developers will do a requirement walkthrough. | Self |
| Lack of understanding of business rules | (i) Developer to refer to the matrix available that deals with various rules. (ii) Developer to review use cases of earlier application for better understanding of business rules. | Self |
| Lack of understanding of defect | Follow-up with the reviewer should be taken by the owner of the defect. An attempt shall be made to reduce any existing communication gaps by more frequent follow-ups of the issues with the team/member concerned. | Team |
Reduction in defect injection implies that there are fewer defects to be detected and fixed. Hence, a successful defect prevention activity should lead to a reduction in the rework effort that follows testing. Figure 11.9 shows the rework effort in the three iterations. (This rework effort is obtained from the WAR because there is a separate code for rework, and the program and module are also specified.) The rework effort after the first construction iteration was about 16% of the total effort for that iteration. It fell to about 5% and 3% in the next two construction iterations. The effort spent on the causal analysis itself was a few hours for data analysis plus a short brainstorming meeting, a small investment in defect prevention.