The process database is a permanent repository of the process performance data from projects; it can be used for project planning, estimation, analysis of productivity and quality, and other purposes. The PDB consists of data from completed projects, with each project providing one data record. To populate the PDB, data must be collected, analyzed, and then organized for entry. Here we focus on how the data are represented in the PDB at Infosys; Chapter 7 explains how the data are collected.
To use the information in the PDB during planning, project managers often find information about similar projects particularly useful. To allow for similarity checking, you should capture in the PDB general information about the project, such as languages used, platforms, databases used, tools used, size, and effort. With this type of information, a project manager can search and find information on all projects that, for example, focused on a particular application domain, used a particular database management system (DBMS) or language, or targeted a specific platform.
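A similarity query of this kind can be sketched as a simple filter over PDB records. The record fields and values below are hypothetical illustrations, not the actual Infosys schema.

```python
# Sketch of a similarity query over PDB records.
# Field names and project data here are hypothetical.
pdb = [
    {"name": "Synergy", "language": "Java", "dbms": "DB2", "platform": "Windows NT"},
    {"name": "ProjX", "language": "C++", "dbms": "Oracle", "platform": "Unix"},
]

def find_similar(records, **criteria):
    """Return the records matching every given field/value pair."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]

print(find_similar(pdb, language="Java", dbms="DB2"))  # matches only Synergy
```

A real PDB would typically sit in a relational database, where the same query is a `SELECT` with a `WHERE` clause over the general-characteristics table.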
To help in project planning, you should capture data about the effort, defects, schedule, risk, and so on. If the total effort spent in a project is known, along with the size and distribution of effort in different phases, this data can be used for estimating effort in a new project.
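One simple way to use such data, sketched below with illustrative figures (not actual PDB values), is a size-proportional estimate: derive productivity from a similar completed project and apply it to the estimated size of the new one.

```python
# Sketch: size-proportional effort estimate from a similar completed project.
# All figures are illustrative, not taken from an actual PDB record.
past_effort_hours = 3000.0   # total effort of a similar completed project
past_size_fp = 400.0         # its size in function points
new_size_fp = 500.0          # estimated size of the new project

productivity = past_size_fp / past_effort_hours   # FP per person-hour
estimated_effort = new_size_fp / productivity     # person-hours

print(round(estimated_effort))  # 3750
```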
Thus, the data captured in the PDB at Infosys can be classified as follows:
- Project characteristics
- Project schedule
- Project effort
- Size
- Defects
Data on project characteristics consists of the project name, the names of the project manager and module leaders (so that they can be contacted for further information or clarification), the business unit (to permit analysis based on business unit), the process deployed (to allow separate analyses of different processes), the application domain, the hardware platform, the languages used, the DBMS used, a brief statement of the project goals, information about project risks, the duration of the project, and team size.
The schedule data is primarily the project's expected and actual start and end dates. The data on project effort includes data on the initial estimated effort and the total actual effort, and the distribution of the actual effort among various stages, such as project initiation, requirements management, design, build, unit testing, and other phases. Chapter 7 discusses how to capture the effort data.
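The phase-wise effort distribution of a completed project can be used to break a new total-effort estimate into phase estimates. The sketch below uses illustrative phase names and figures, not actual PDB data.

```python
# Sketch: distribute a new total-effort estimate across phases using the
# phase-wise effort distribution of a completed project (illustrative figures).
past_phase_effort = {"design": 400, "build": 1200, "unit testing": 200, "others": 200}
total_past = sum(past_phase_effort.values())  # 2000 person-hours

new_total_estimate = 3000  # person-hours, from top-down estimation
new_phase_estimate = {
    phase: new_total_estimate * effort / total_past
    for phase, effort in past_phase_effort.items()
}

print(new_phase_estimate["build"])  # 1800.0
```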
The size of the software developed may be measured in lines of code (LOC), in the number of simple, medium, and complex programs, or in a combination of these measures. Even if function points are not used for estimation, representing the final size in function points gives a uniform metric for productivity; this figure is usually obtained by converting the measured size in LOC to function points using published conversion tables. The size of the final system in function points is also captured.
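The conversion amounts to a table lookup of LOC-per-FP factors by language. The factors below are illustrative placeholders, not the published values a real PDB would use.

```python
# Sketch: converting measured LOC to function points via a language
# conversion table. The LOC-per-FP factors are illustrative placeholders,
# not published conversion values.
loc_per_fp = {"Java": 55, "COBOL": 100}

def loc_to_fp(loc: int, language: str) -> float:
    """Approximate size in function points from LOC for a given language."""
    return loc / loc_per_fp[language]

print(round(loc_to_fp(8250, "Java")))  # 150
```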
The data on defects includes the number of defects found in the various defect detection activities, and the number of defects injected in different stages. Hence, you record the number of defects of different origins found in requirements review, design review, code review, unit testing, and other phases. Chapter 7 explains how projects record defect data.
In addition, notes are recorded, including notes on estimation (for example, the criteria used for classifying programs as simple, medium, or complex) and notes on risk management (for example, how risk perception changed during the project).
Let's look at a sample PDB entry for a project, referred to here by the pseudonym Synergy. The Synergy project built an application that was a precursor to the one developed in the case study (the ACIC project); the case study refers to this PDB entry during planning.
Data for the four major tables are shown (the example uses expressive names, although the actual database uses codes for the various phases and quality activities). In this example, the data are fairly complete; in other situations the data may not be. Such incomplete data cannot always be discarded, because the information may still be useful; hence, it may also be captured in the PDB.
Table 2.1 gives the general information on the project, including start and end dates (estimated and actual), estimated effort (actual effort is not kept in this table because it can be computed from the effort table), peak team size, information about the risks, tools used, and other items. In addition, other information, such as details about the client, is stored in this table.
Table 2.1. General Data about a Project

General Characteristics

| Field Name | Value for Synergy |
|---|---|
| ProcessCategory | Development |
| LifeCycle | Full |
| BusinessDomain | Brokerage/Finance |
| ProcessTailoringNotes | Added group review for high-impact documents. First program of each developer was group reviewed. |
| PeakTeamSize | 12 |
| ToolsUsed | VSS for document CM, VAJ for source code |
| EstimatedStart | 20 Jan 2000 |
| EstimatedFinish | 5 May 2000 |
| EstimatedEffortHrs | 3,106 |
| EstimationNotes | Use case point approach was one method used for estimation. |
| ActualStart | 20 Jan 2000 |
| ActualFinish | 5 May 2000 |
| First Risk | Working through link on customer DB |
| Second Risk | Additional requirements |
| Third Risk | Attrition |
| RiskNotes | Worked in shifts; agreed to take enhancements after acceptance of this product; team building exercises were done. |
The second table captures the information about effort. For different stages in the process, it includes data on the effort spent in the activity and the effort spent in rework after the task. Rework effort is captured because it helps in calculating and understanding the cost of quality. Table 2.2 shows the Synergy effort data in person-hours. Estimated effort for the phases is also given. (The total effort spent in life-cycle stages is 2,950 person-hours, and in review, 223 person-hours; the total estimated effort is 3,012 person-hours.)
The third table contains information about defects. It is desirable to know not only when the defect was detected but also when it was injected. Hence, you should record the number of defects found for each injection stage and detection stage combination. The detection stages consist of various reviews and testing, whereas the injection stages involve requirements, design, and coding. If you can separate the defects detected by stage according to their injection stages, then you can compute removal efficiencies of the defect detection stages. This information can be useful for identifying potential improvement areas. Table 2.3 shows the defect data for the Synergy project.
Table 2.2. Effort Data

Effort by Stage (person-hours)

| Stage | Task Effort | Review Effort | Estimated Effort |
|---|---|---|---|
| Requirements analysis | 0 | 0 | 0 |
| Design | 414 | 32 | 367 |
| Coding | 1147 | 76 | 1182 |
| Independent unit testing | 156 | 74 | 269 |
| Integration testing | 251 | 30 | 180 |
| Acceptance testing and installation | 183 | 0 | 175 |
| Project management | 237 | 8 | 357 |
| Configuration management | 30 | 3 | 38 |
| Project-specific training | 200 | 0 | 218 |
| Others | 332 | 0 | 226 |
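The totals quoted for this project can be checked directly by summing the columns of Table 2.2:

```python
# Column totals from Table 2.2 (person-hours).
effort = {  # stage: (task effort, review effort, estimated effort)
    "Requirements analysis": (0, 0, 0),
    "Design": (414, 32, 367),
    "Coding": (1147, 76, 1182),
    "Independent unit testing": (156, 74, 269),
    "Integration testing": (251, 30, 180),
    "Acceptance testing and installation": (183, 0, 175),
    "Project management": (237, 8, 357),
    "Configuration management": (30, 3, 38),
    "Project-specific training": (200, 0, 218),
    "Others": (332, 0, 226),
}
task_total = sum(t for t, _, _ in effort.values())
review_total = sum(r for _, r, _ in effort.values())
estimated_total = sum(e for _, _, e in effort.values())

print(task_total, review_total, estimated_total)  # 2950 223 3012
```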
The final table contains information about the size of the project. Different languages may be used in a project, so this table may have multiple entries. Multiple units of size may also be used, so the table captures the unit. Generally, if the size is given in LOC, size in function points can also be computed by using conversion tables as needed. This information is used to calculate productivity in terms of function points. Because productivity depends on more than size alone, other factors, such as the operating system and hardware used, are also captured. Table 2.4 shows the values for this table for Synergy.
Table 2.3. Defect Data for the Synergy Project

| Injected In | Requirements Review | Design Review | Code Review | Unit Testing | System Testing | Acceptance Testing |
|---|---|---|---|---|---|---|
| Requirements | 0 | 0 | 0 | 1 | 1 | 0 |
| Design | | 14 | 3 | 1 | 0 | 0 |
| Coding | | | 21 | 48 | 17 | 6 |
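One common definition of a stage's removal efficiency is the number of defects it removes divided by the number of defects present when it begins. The sketch below computes this from an injection-by-detection matrix like Table 2.3, under the simplifying assumption that all injected defects are eventually detected somewhere.

```python
# Sketch: defect removal efficiency per detection stage, computed from an
# injection-stage x detection-stage matrix (data from Table 2.3). Assumes,
# as an approximation, that every injected defect is eventually detected.
detected = {  # injection stage -> defects found per detection stage
    #              ReqRev DesRev CodeRev  UT  ST  AT
    "requirements": [0,    0,     0,      1,  1,  0],
    "design":       [0,   14,     3,      1,  0,  0],
    "coding":       [0,    0,    21,     48, 17,  6],
}
# Detection-stage index at which each injection stage's defects first exist.
first_stage = {"requirements": 0, "design": 1, "coding": 2}

def removal_efficiency(stage_index: int) -> float:
    removed_here = sum(row[stage_index] for row in detected.values())
    present = sum(
        sum(row) - sum(row[:stage_index])   # injected minus already removed
        for inj, row in detected.items()
        if first_stage[inj] <= stage_index  # only defects already injected
    )
    return removed_here / present

# Code review removes 24 of the 98 defects present at its start.
print(round(removal_efficiency(2), 2))  # 0.24
```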
Table 2.4. Size Data for the Synergy Project

Size

| LangCode | OSCode | DBMSCode | HWCode | MeasureCode | ActualSize |
|---|---|---|---|---|---|
| Java | Windows | | PC | LOC | 8,082 |
| Persistent Builder | Windows NT | DB2 | Client MC | LOC | 12,185 |
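With the component sizes from Table 2.4 and the effort totals from Table 2.2, overall productivity in function points per person-hour can be sketched as below. The LOC-per-FP conversion factors here are illustrative placeholders, not the published values that would actually be applied.

```python
# Sketch: overall productivity in FP per person-hour for Synergy.
# The LOC-per-FP factors are illustrative placeholders only.
components = [("Java", 8082), ("Persistent Builder", 12185)]  # from Table 2.4
loc_per_fp = {"Java": 50, "Persistent Builder": 50}           # illustrative

total_fp = sum(loc / loc_per_fp[lang] for lang, loc in components)
total_effort_hours = 2950 + 223  # task + review effort, from Table 2.2

productivity = total_fp / total_effort_hours
print(round(productivity, 3))  # 0.128 FP per person-hour (with these factors)
```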