THE PROCESS DATABASE

2.1 THE PROCESS DATABASE

2.1.1 Contents of the PDB

2.1.2 A Sample Entry

The process database is a permanent repository of the process performance data from projects; it can be used for project planning, estimation, analysis of productivity and quality, and other purposes.² The PDB consists of data from completed projects, with each project providing one data record. As you can imagine, to populate the PDB, data must be collected, analyzed, and then organized for entry. Here we focus on how the data are represented in the PDB at Infosys; Chapter 7 explains how the data are collected.

During planning, project managers often find PDB information about similar projects particularly useful. To allow for similarity checking, the PDB should capture general information about each project, such as the languages, platforms, databases, and tools used, along with size and effort. With this type of information, a project manager can search for all projects that, for example, focused on a particular application domain, used a particular database management system (DBMS) or language, or targeted a specific platform.
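As a rough illustration of this kind of similarity search, the following sketch filters a small in-memory list of records. The `ProjectRecord` class, its field names, and the two sample projects are hypothetical; they are not the actual PDB schema at Infosys.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectRecord:
    """Hypothetical, simplified record holding only general project characteristics."""
    name: str
    business_domain: str
    languages: set = field(default_factory=set)
    dbms: set = field(default_factory=set)
    platforms: set = field(default_factory=set)


def find_similar(records, domain=None, language=None, dbms=None, platform=None):
    """Return the records that match every criterion supplied (None means "ignore")."""
    matches = []
    for rec in records:
        if domain is not None and rec.business_domain != domain:
            continue
        if language is not None and language not in rec.languages:
            continue
        if dbms is not None and dbms not in rec.dbms:
            continue
        if platform is not None and platform not in rec.platforms:
            continue
        matches.append(rec)
    return matches


# Made-up records used only to exercise the search.
pdb = [
    ProjectRecord("ProjectA", "Finance", {"Java"}, {"DB2"}, {"Windows"}),
    ProjectRecord("ProjectB", "Retail", {"C++"}, {"Oracle"}, {"Unix"}),
]

similar = find_similar(pdb, language="Java", dbms="DB2")
print([rec.name for rec in similar])  # ['ProjectA']
```

Criteria are combined with a logical AND, so adding a criterion can only narrow the result. A real repository would more likely express the same query against a database.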

To help in project planning, you should capture data about the effort, defects, schedule, risk, and so on. If the total effort spent in a project is known, along with its size and the distribution of effort across phases, these data can be used to estimate effort for a new project.
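One simple way to reuse such data, sketched below, is to scale a past project's effort linearly with size and keep its phase distribution. Linear scaling, the function name `estimate_effort`, and all the numbers are assumptions made for illustration; this is not presented as the estimation procedure used at Infosys.

```python
def estimate_effort(new_size, past_size, past_phase_effort):
    """Scale a past project's effort linearly to a new size.

    Returns (total_effort, effort_by_phase) in the same unit as the past data.
    """
    past_total = sum(past_phase_effort.values())
    if past_size <= 0 or past_total <= 0:
        raise ValueError("past size and past effort must be positive")
    # Total effort grows in proportion to size (the key assumption of this sketch).
    new_total = new_size * past_total / past_size
    # Each phase keeps the same share of the total that it had in the past project.
    by_phase = {
        phase: new_total * hours / past_total
        for phase, hours in past_phase_effort.items()
    }
    return new_total, by_phase


# Hypothetical past project: size 100 (any consistent unit), 1,000 person-hours.
past_effort = {"design": 200, "build": 500, "testing": 300}
total, by_phase = estimate_effort(150, 100, past_effort)
print(total)     # 1500.0
print(by_phase)  # {'design': 300.0, 'build': 750.0, 'testing': 450.0}
```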

Thus, the data captured in the PDB at Infosys can be classified as follows:

Project characteristics

Project schedule

Project effort

Size

Defects

Data on project characteristics consist of the project name, the names of the project manager and module leaders (so that they can be contacted for further information or clarification), the business unit (to permit analysis based on business unit), the process deployed (to allow separate analyses of different processes), the application domain, the hardware platform, the languages used, the DBMS used, a brief statement of the project goals, information about project risks, the duration of the project, and the team size.

The schedule data are primarily the project's expected and actual start and end dates. The data on project effort include the initial estimated effort, the total actual effort, and the distribution of the actual effort among various stages, such as project initiation, requirements management, design, build, unit testing, and other phases. Chapter 7 discusses how to capture the effort data.

The size of the software developed may be expressed in lines of code (LOC), in the number of simple, medium, or complex programs, or in a combination of these measures. Even if function points are not used for estimation, you can obtain a uniform productivity metric by representing the final size in function points. This size is usually obtained by converting the measured size of the software in LOC to function points using published conversion tables.³ The size of the final system in function points is therefore also captured.
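The conversion described here can be sketched as a table lookup followed by a division. The ratios in `LOC_PER_FP` are placeholders chosen for easy arithmetic, not values from any published conversion table, and the language names and function names are likewise hypothetical.

```python
# Placeholder LOC-per-function-point ratios, for illustration only.
# Real ratios would come from a published conversion table.
LOC_PER_FP = {"LanguageA": 50.0, "LanguageB": 100.0}


def loc_to_function_points(loc_by_language, loc_per_fp=LOC_PER_FP):
    """Convert measured LOC per language into a single size in function points."""
    total_fp = 0.0
    for language, loc in loc_by_language.items():
        if language not in loc_per_fp:
            raise KeyError(f"no conversion ratio for {language!r}")
        total_fp += loc / loc_per_fp[language]
    return total_fp


def productivity(function_points, effort_person_hours):
    """Uniform productivity metric: function points per person-hour."""
    return function_points / effort_person_hours


size_fp = loc_to_function_points({"LanguageA": 5000, "LanguageB": 10000})
print(size_fp)                      # 200.0
print(productivity(size_fp, 1000))  # 0.2
```

Because each language is converted with its own ratio, projects written in different languages can be compared on the same function point scale.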

The data on defects include the number of defects found in the various defect detection activities and the number of defects injected in different stages. Hence, you record the number of defects of each origin found in requirements review, design review, code review, unit testing, and other phases. Chapter 7 explains how projects record defect data.

In addition, notes are recorded, including notes on estimation (for example, the criteria used for classifying programs as simple, medium, or complex) and notes on risk management (for example, how risk perception changed during the project).

Let's look at a sample PDB entry for a project, which we will refer to with the pseudonym Synergy. In the Synergy project an application was built that formed the precursor to that of the case study (the ACIC project). The case study will refer to this PDB entry during planning.

Data for the four major tables are shown (the example uses expressive names, but codes are used in the actual database for various phases and quality activities). In this example, the data are fairly complete; in other situations, however, the data may not be complete. Such data cannot always be discarded because the information may still be useful.⁴ Hence, such data may also be captured in the PDB.

Table 2.1 gives the general information on the project, including start and end dates (estimated and actual), estimated effort (actual effort is not put in this table because it can be computed from the effort table), peak team size, information about the risk, tools used, and other items. In addition, other information (for example, about the client) is stored in this table.

Table 2.1. General Data about a Project
General Characteristics
Field Name	Value for Synergy
ProcessCategory	Development
LifeCycle	Full
BusinessDomain	Brokerage/Finance
ProcessTailoringNotes	Added group review for high-impact documents. First program of each developer was group reviewed.
PeakTeamSize	12
ToolsUsed	VSS for document CM, VAJ for source code
EstimatedStart	20 Jan 2000
EstimatedFinish	5 May 2000
EstimatedEffortHrs	3,106
EstimationNotes	Use case point approach was one method used for estimation.
ActualStart	20 Jan 2000
ActualFinish	5 May 2000
First Risk	Working through link on customer DB
Second Risk	Additional requirements
Third Risk	Attrition
RiskNotes	Worked in shifts; agreed to take enhancements after acceptance of this product; team building exercises were done.

The second table captures the information about effort. For different stages in the process, it includes data on the effort spent in the activity and the effort spent in rework after the task. Rework effort is captured because it helps in calculating and understanding the cost of quality. Table 2.2 shows the Synergy effort data in person-hours. Estimated effort for the phases is also given. (The total effort spent in life-cycle stages is 2,950 person-hours, and in review, 223 person-hours; the total estimated effort is 3,012 person-hours.)

The third table contains information about defects. It is desirable to know not only when the defect was detected but also when it was injected. Hence, you should record the number of defects found for each injection stage and detection stage combination. The detection stages consist of various reviews and testing, whereas the injection stages involve requirements, design, and coding. If you can separate the defects detected by stage according to their injection stages, then you can compute removal efficiencies of the defect detection stages. This information can be useful for identifying potential improvement areas. Table 2.3 shows the defect data for the Synergy project.
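To make the removal efficiency computation concrete, here is a minimal sketch. It assumes that the removal efficiency of a detection stage is the number of defects found in that stage divided by the number of defects present when the stage runs, and it approximates "present" using only defects that were eventually detected. The stage ordering and the defect counts are hypothetical and are not the Synergy data.

```python
INJECTION_STAGES = ["Requirements", "Design", "Coding"]
DETECTION_STAGES = ["Requirements review", "Design review", "Code review",
                    "Unit testing", "System testing", "Acceptance testing"]
# Number of injection stages already finished when each detection stage runs
# (an assumption about the order of activities in the process).
COMPLETED = {"Requirements review": 1, "Design review": 2, "Code review": 3,
             "Unit testing": 3, "System testing": 3, "Acceptance testing": 3}


def removal_efficiencies(defects):
    """defects maps (injection_stage, detection_stage) -> number of defects."""
    result = {}
    for k, stage in enumerate(DETECTION_STAGES):
        eligible = set(INJECTION_STAGES[:COMPLETED[stage]])
        found = sum(n for (inj, det), n in defects.items() if det == stage)
        # Present at this stage: already injected, and not caught by an earlier stage.
        present = sum(n for (inj, det), n in defects.items()
                      if inj in eligible and DETECTION_STAGES.index(det) >= k)
        result[stage] = found / present if present else None
    return result


# Hypothetical counts for each injection/detection combination.
sample = {
    ("Requirements", "Requirements review"): 5,
    ("Requirements", "Design review"): 2,
    ("Requirements", "System testing"): 3,
    ("Design", "Design review"): 10,
    ("Design", "Unit testing"): 5,
    ("Coding", "Code review"): 40,
    ("Coding", "Unit testing"): 30,
    ("Coding", "System testing"): 8,
    ("Coding", "Acceptance testing"): 2,
}

for stage, eff in removal_efficiencies(sample).items():
    print(f"{stage}: {eff:.2f}")
```

In this made-up data, requirements review finds 5 of the 10 requirements defects that are eventually detected, so its efficiency is 0.50. A stage with a low value is a candidate improvement area.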

Table 2.2. Effort Data
Effort by Stage (person-hours)
Stage	Task Effort	Review Effort	Estimated Effort
Requirements analysis	0	0	0
Design	414	32	367
Coding	1147	76	1182
Independent unit testing	156	74	269
Integration testing	251	30	180
Acceptance testing and installation	183	0	175
Project management	237	8	357
Configuration management	30	3	38
Project-specific training	200	0	218
Others	332	0	226
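The totals quoted for Table 2.2 can be checked with a few lines of code. The sketch below copies the figures from the table; the dictionary layout and variable names are illustrative choices and do not reflect how the PDB stores the data.

```python
# (task effort, review effort, estimated effort) in person-hours, from Table 2.2.
EFFORT = {
    "Requirements analysis": (0, 0, 0),
    "Design": (414, 32, 367),
    "Coding": (1147, 76, 1182),
    "Independent unit testing": (156, 74, 269),
    "Integration testing": (251, 30, 180),
    "Acceptance testing and installation": (183, 0, 175),
    "Project management": (237, 8, 357),
    "Configuration management": (30, 3, 38),
    "Project-specific training": (200, 0, 218),
    "Others": (332, 0, 226),
}

total_task = sum(task for task, _, _ in EFFORT.values())
total_review = sum(review for _, review, _ in EFFORT.values())
total_estimated = sum(estimated for _, _, estimated in EFFORT.values())
print(total_task, total_review, total_estimated)  # 2950 223 3012

# Share of the total task effort spent in each stage, as a percentage.
for stage, (task, _, _) in EFFORT.items():
    print(f"{stage}: {100 * task / total_task:.1f}%")
```

The per-stage shares are the kind of effort distribution that can later be reused when planning a similar project.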

The final table contains information about the size of the project. Different languages may be used in a project, so this table may have multiple entries. Multiple units of size may also be used, so the table captures the unit. Generally, if the size is given in LOC, the size in function points can also be computed by using conversion tables as needed. This information is used to calculate productivity in terms of function points. Because size is a critical factor in determining productivity, other factors that affect how size relates to effort, such as the operating system and hardware used, are also captured. Table 2.4 shows the values for this table for Synergy.

Table 2.4. Size Data for the Synergy Project
Size
LangCode	OSCode	DBMSCode	HWCode	MeasureCode	ActualCodeSize
Java	Windows		PC	LOC	8,082
Persistent Builder	Windows NT	DB2	Client MC	LOC	12,185

Table 2.3. Defect Data for the Synergy Project

Requirement Review	Design Review	Code Review	Unit Testing	System Testing	Acceptance Testing

Requirements