
4.2 EFFORT ESTIMATION

At Infosys, estimation generally takes place after analysis. That is, when a project manager estimates the effort, the requirements are well understood. The business processes are organized to support this approach. For example, the requirement phase is sometimes executed as a separate project from the software development project.

At Infosys, multiple estimation approaches have been proposed, some of which are discussed here. A project manager can choose whichever estimation approach suits the nature of the work. Sometimes a project manager may estimate using multiple methods, either to validate the estimate from the primary method or to reduce risk, particularly when past data from similar projects are limited.

4.2.1 The Bottom-up Estimation Approach

Because the types of projects undertaken at Infosys vary substantially, the bottom-up approach is preferred and recommended. The company employs a task unit approach,1 although some of the limitations of this strategy have been overcome through the use of past data and the process capability baseline (see Chapter 2).

In the task unit approach, the project manager first divides the software under development into major programs (or units). Each program unit is then classified as simple, medium, or complex based on certain criteria. For each complexity class, the project manager defines a standard effort for coding and self-testing (together called the build effort). This standard build effort can be based on past data from a similar project, on available internal guidelines, or on some combination of the two.

Once the number of units in the three categories of complexity is known and the estimated build effort for each program is selected, the total effort for the build phase of the project is known. From the build effort, the effort required for the other phases and activities is determined as a percentage of the coding effort. From the process capability baseline or the process database, the distribution of effort in a project is known. The project manager uses this distribution to determine the effort for other phases and activities. From these estimates, the total effort for the project is obtained.

This approach lends itself to a judicious mixture of experience and data. If suitable data are not available (for example, if you're launching a new type of project), you can estimate the build effort by experience after you analyze the project and when you know the various program units. With this estimate available, you can obtain the estimate for other activities by working with the effort distribution data obtained from past projects. This strategy even accounts for activities that are sometimes difficult to enumerate early but do consume effort; in the effort distribution for a project, the "other" category is frequently used to handle miscellaneous tasks.

The procedure for estimation can be summarized as the following sequence of steps:

1.       Identify programs in the system and classify them as simple, medium, or complex (S/M/C). As much as possible, use either the provided standard definitions or definitions from past projects.

2.       If a project-specific baseline exists, get the average build effort for S/M/C programs from the baseline.

3.       If a project-specific baseline does not exist, use project type, technology, language, and other attributes to look for similar projects in the process database. Use data from these projects to define the build effort of S/M/C programs.

4.       If no similar project exists in the process database and no project-specific baseline exists, use the average build effort for S/M/C programs from the general process capability baseline.

5.       Use project-specific factors to refine the build effort for S/M/C programs.

6.       Get the total build effort using the build effort of S/M/C programs and the counts for them.

7.       Using the effort distribution given in the capability baseline or for similar projects given in the process database, estimate the effort for other tasks and the total effort.

8.       Refine the estimates based on project-specific factors.

This procedure uses the process database and process capability baseline, which are discussed in Chapter 2. As mentioned earlier, if many projects of a type are being executed, you can build a project-specific capability baseline. Such baselines are similar to the general baselines but use only data from specific projects. These baselines have been found to be the best for predicting effort for another project of that type. Hence, for estimation, their use is preferred.

Because many factors can affect the effort required for a project, it is essential that estimates account for project-specific factors. Instead of classifying parameters into different levels and then determining the effect on the effort requirement, the approach outlined here lets the project manager determine the impact of project-specific factors on the estimate. Project managers can make this adjustment using their experience, the experience of the team members, or data from projects found in the process database.

Note that this method of classifying programs into a few categories and using an average build effort for each category is followed for overall estimation. In detailed scheduling, however, in which the project manager assigns each unit to a team member for coding and budgets time for the activity, the characteristics of each unit are taken into account to allot it more or less time than the average.
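
To make the procedure concrete, here is a minimal sketch of the bottom-up calculation in Python. All of the numbers (program counts, per-program build efforts, and the effort distribution) are illustrative placeholders, not Infosys baseline data.

```python
# Bottom-up (task unit) estimation sketch.
# All numbers here are illustrative placeholders, not baseline data.

# Step 1: programs in the system, classified as simple/medium/complex
program_counts = {"simple": 12, "medium": 8, "complex": 5}

# Steps 2-5: average build effort per program (person-days), taken from a
# project-specific baseline, similar projects, or the general capability
# baseline, and refined for project-specific factors
build_effort_per_program = {"simple": 2.0, "medium": 5.0, "complex": 9.0}

# Step 6: total build effort
build_effort = sum(program_counts[c] * build_effort_per_program[c]
                   for c in program_counts)

# Step 7: effort for the other phases, derived from the effort distribution
# (percentages of total effort) in the capability baseline; here the build
# phase is assumed to account for 40% of the total effort.
effort_distribution = {
    "requirements": 0.10, "design": 0.15, "build": 0.40,
    "testing": 0.20, "project management": 0.10, "other": 0.05,
}
total_effort = build_effort / effort_distribution["build"]
phase_efforts = {phase: total_effort * fraction
                 for phase, fraction in effort_distribution.items()}

print(f"Build effort: {build_effort:.0f} person-days")
print(f"Total effort: {total_effort:.0f} person-days")
for phase, effort in phase_efforts.items():
    print(f"  {phase:20s}{effort:6.1f} person-days")
```

Step 8's refinement would then adjust individual numbers up or down based on the project manager's judgment of project-specific factors.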

4.2.2 The Top-Down Estimation Approach

Like any top-down approach, the Infosys approach starts with an estimate of the size of the software in function points. The function points can be counted using standard function point counting rules. Alternatively, if the size estimate is known in terms of LOC, it can be converted into function points.

In addition to the size estimate, a top-down approach requires an estimate of productivity. The basic approach is to start with productivity levels of similar projects (data for which is available in the process database) or with standard productivity figures (data for which is available in the process capability baseline), and then to adjust those levels, if needed, to suit the project. The productivity estimate is then used to calculate the overall effort estimate. From the overall effort estimate, estimates for the various phases are derived by using the percentage distributions. (These distributions, as in the bottom-up approach, are obtained from the process database or the capability baseline.)

To summarize, the overall approach for top-down estimation involves the following steps:

1.       Get the estimate of the total size of the software in function points.

2.       Using the productivity data from the project-specific capability baseline, from the general process capability baseline, or from similar projects, fix the productivity level for the project.

3.       Obtain the overall effort estimate from the productivity and size estimates.

4.       Use effort distribution data from the process capability baselines or similar projects to estimate the effort for the various phases.

5.       Refine the estimates, taking project-specific factors into consideration.

Like the bottom-up estimation, the top-down approach allows the estimates to be refined using project-specific factors. This allowance, without actually defining these factors, acknowledges that each project is unique and may have some characteristics that do not exist in other projects. It may not be possible to enumerate these characteristics or formally model their effects on productivity. Hence, it is left to the project manager to decide which factors should be considered and how they will affect the project.
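
A similar sketch for the top-down calculation follows; again, the size, productivity, and distribution figures are placeholders chosen only to show the arithmetic, not data from the process database.

```python
# Top-down estimation sketch; size, productivity, and distribution values
# are illustrative placeholders, not baseline data.

size_fp = 400           # step 1: estimated size in function points
productivity = 12.0     # step 2: function points per person-month, fixed
                        # from the baseline or from similar projects

# Step 3: overall effort estimate
total_effort_pm = size_fp / productivity            # person-months

# Step 4: phase-wise estimates from the effort distribution
effort_distribution = {
    "requirements": 0.10, "design": 0.15, "build": 0.40,
    "testing": 0.20, "project management": 0.10, "other": 0.05,
}
phase_efforts = {phase: total_effort_pm * fraction
                 for phase, fraction in effort_distribution.items()}

# Step 5: refinement for project-specific factors is a judgment call,
# e.g. a 10% uplift for an unfamiliar technology
adjusted_effort_pm = total_effort_pm * 1.10

print(f"Overall estimate: {total_effort_pm:.1f} person-months")
print(f"After adjustment: {adjusted_effort_pm:.1f} person-months")
```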

4.2.3 The Use Case Points Approach

The use case points approach employed at Infosys is based on the approach from Rational and is similar to the function point method. It can be applied if use cases are used for requirement specification. The basic steps in this approach are as follows.

1.       Classify each use case as simple, medium, or complex. The basis of this classification is the number of transactions in a use case, including secondary scenarios. A transaction is defined as an atomic set of activities that is either performed entirely or not at all. A simple use case has three or fewer transactions, a medium use case has four to seven transactions, and a complex use case has more than seven transactions. A simple use case is assigned a factor of 5, a medium use case a factor of 10, and a complex use case a factor of 15. Table 4.1 gives this classification and the factors.

2.       Obtain the total unadjusted use case points (UUCPs) as a weighted sum of factors for the use cases in the application. That is, for each of the three complexity classes, first obtain the product of the number of use cases of a particular complexity and the factor for that complexity. The sum of the three products is the number of UUCPs for the application.

3.       Adjust the raw UUCP to reflect the project's complexity and the experience of the people on the project. To do this, first compute the technical complexity factor (TCF) by reviewing the factors given in Table 4.2 and rating each factor from 0 to 5. A rating of 0 means that the factor is irrelevant for this project; 5 means it is essential. For each factor, multiply its rating by its weight from the table and add these numbers to get the TFactor. Obtain the TCF using this equation:

TCF = 0.6 + (0.01 * TFactor)

Table 4.1. Use Case Complexity and Factors

Use Case Type    Description                 Factor
Simple           3 or fewer transactions     5
Medium           4 to 7 transactions         10
Complex          More than 7 transactions    15

Table 4.2. Technical Factors and Weights

Sequence Number   Factor                                          Weight
1                 Distributed system                              2
2                 Response or throughput performance objectives   1
3                 End-user efficiency (online)                    1
4                 Complex internal processing                     1
5                 Code must be reusable                           1
6                 Easy to install                                 0.5
7                 Easy to use                                     0.5
8                 Portable                                        2
9                 Easy to change                                  1
10                Concurrent                                      1
11                Includes special security features              1
12                Provides direct access for third parties        1
13                Special user training facilities required       1

4.       Similarly, compute the environment factor (EF) by going through Table 4.3 and rating each factor from 0 to 5. For experience-related factors, 0 means no experience in the subject, 5 means expert, and 3 means average. For motivation, 0 means no motivation on the project, 5 means high motivation, and 3 means average. For the stability of requirements, 0 means extremely unstable requirements, 5 means unchanging requirements, and 3 means average. For part-time workers, 0 means no part-time technical staff, 5 means all part-time staff, and 3 means average. For programming language difficulty, 0 means easy-to-use programming language, 5 means very difficult programming language, and 3 means average. The weighted sum gives the EFactor, from which the EF is obtained by the following equation:

Table 4.3. Environmental Factors for Team and Weights

Sequence Number   Factor                            Weight
1                 Familiar with Internet process    1.5
2                 Application experience            0.5
3                 Object-oriented experience        1
4                 Lead analyst capability           0.5
5                 Motivation                        1
6                 Stable requirements               2
7                 Part-time workers                 -1
8                 Difficult programming language    -1

EF = 1.4 + (-0.03 * EFactor)

5.       Using these two factors, compute the final use case points (UCP) as follows:

UCP = UUCP * TCF * EF

For effort estimation, assign, on average, 20 person-hours per UCP for the entire life cycle. This gives a rough estimate, which can be refined as follows. Count how many factors are rated less than 3 and how many are rated greater than 3. If the number of factors with a value less than 3 is small, 20 person-hours per UCP is suitable; if there are many, use 28 person-hours per UCP. In other words, the range is 20 to 28 person-hours per UCP, and the project manager can decide which value to use depending on the various factors.
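
The entire use case points calculation is easy to automate. The following sketch implements the steps above using the factors and weights from Tables 4.1 through 4.3; the refinement rule for choosing between 20 and 28 person-hours per UCP is interpreted here as counting the environmental factors rated below 3, and both that choice of factors and the threshold are assumptions.

```python
# Use case points sketch, following the steps and tables above.

USE_CASE_FACTOR = {"simple": 5, "medium": 10, "complex": 15}   # Table 4.1
TECH_WEIGHTS = [2, 1, 1, 1, 1, 0.5, 0.5, 2, 1, 1, 1, 1, 1]     # Table 4.2
ENV_WEIGHTS = [1.5, 0.5, 1, 0.5, 1, 2, -1, -1]                 # Table 4.3


def use_case_points(counts, tech_ratings, env_ratings):
    """counts: use cases per complexity class; ratings: 0-5, in table order."""
    uucp = sum(counts[c] * USE_CASE_FACTOR[c] for c in counts)
    tfactor = sum(r * w for r, w in zip(tech_ratings, TECH_WEIGHTS))
    efactor = sum(r * w for r, w in zip(env_ratings, ENV_WEIGHTS))
    tcf = 0.6 + 0.01 * tfactor
    ef = 1.4 + (-0.03 * efactor)
    return uucp * tcf * ef


def effort_person_hours(ucp, env_ratings, threshold=2):
    """Rough effort estimate: 20 person-hours per UCP, or 28 if several
    environmental factors are unfavourable (interpreted here as rated
    below 3); the choice of factors and the threshold are assumptions."""
    unfavourable = sum(1 for rating in env_ratings if rating < 3)
    return ucp * (28 if unfavourable > threshold else 20)
```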

4.2.4 Effectiveness of the Overall Approach

The common way to analyze the effectiveness of an estimation approach is to see how the estimated effort compares with the actual effort. As discussed earlier, this comparison gives only a general idea of the accuracy of the estimates; it does not indicate how optimal the estimates are. To gain that information, you must study the effects of estimates on programmers (for example, whether they were "stretched" or were "underutilized"). Nevertheless, a comparison of actual effort expended and estimated effort does give an idea of the effectiveness of the estimation method.

For completed projects, as discussed in Chapter 3, the process database includes information on the estimated effort as well as the actual effort. Figure 4.1 shows the scatter plot of estimated effort and actual effort for some of the completed development projects.

Figure 4.1. Actual versus estimated effort


As the plot shows, the estimation approach works quite well; most of the data points are close to the 45-degree line in the graph (if all estimates matched the actual effort, all points would fall on the 45-degree line). The data also show that more than 50% of the projects are within 25% of the estimated effort. Nevertheless, the data indicate that the estimates are usually lower than the actual effort; note that most of the points are above the 45-degree line rather than below it. That is, people tend to underestimate more often than they overestimate, a tendency that afflicts the software industry in general. On average, the actual effort was 25% higher than the estimate. Overall, although there is room for improvement, the estimation approach is reasonably effective.
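
The comparison behind Figure 4.1 amounts to computing, for each completed project, the deviation of the actual effort from the estimate. A small sketch, using made-up (estimated, actual) pairs rather than real process database entries:

```python
# Sketch of the estimate-versus-actual comparison behind Figure 4.1.
# The (estimated, actual) effort pairs below are made-up sample data,
# not entries from the process database.

projects = [(120, 150), (300, 310), (80, 110), (500, 560), (200, 190)]

errors = [(actual - estimated) / estimated for estimated, actual in projects]
within_25_percent = sum(1 for e in errors if abs(e) <= 0.25)

print(f"Projects within 25% of the estimate: {within_25_percent}/{len(projects)}")
print(f"Average deviation from the estimate: {100 * sum(errors) / len(errors):.0f}%")
```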

4.2.5 Effort Estimate of the ACIC Project

Here we illustrate the estimation approach by showing its application on the ACIC project. Two other examples can be found in my earlier book.10 The ACIC project employs the use-case-driven approach; hence, the main decomposition is in terms of use cases rather than modules. To classify the use cases, the project manager applied the classification criteria given earlier. Table 4.4 lists the 26 use cases along with their complexity.

To estimate the build effort for the different types of use cases, the ACIC project manager used data from the Synergy project, whose process database entry is given in Chapter 3. The Synergy project had 21 simple, 11 medium, and 8 complex use cases. The detailed build data for these use cases was used to estimate the average build efforts. (The total build effort was about 143 person-days. With average build efforts of 1 person-day, 5 person-days, and 8 person-days, respectively, for simple, medium, and complex use cases, the total comes to 140, reasonably close to the actual effort.) Table 4.5 shows the average build effort for each type of use case and the total build effort.

Table 4.4. Use Cases in the ACIC Project

Use Case Number   Description                                             Complexity
1                 Navigate Screen                                         Complex
2                 Update Personal Details                                 Medium
3                 Add Address                                             Medium
4                 Update Address                                          Complex
5                 Delete Address                                          Complex
6                 Add Telephone Number                                    Medium
7                 Update Telephone Number                                 Complex
8                 Delete Telephone Number                                 Complex
9                 Add E-mail                                              Medium
10                Update E-mail                                           Medium
11                Delete E-mail                                           Medium
12                Update Employment Details of a Party                    Medium
13                Update Financial Details of a Party                     Medium
14                Update Details of an Account                            Medium
15                Maintain Activities of an Account                       Complex
16                Maintain Memos of an Account                            Simple
17                View History of Party Details                           Complex
18                View History of Account Details                         Complex
19                View History of Option Level and Service Options        Simple
20                View History of Activities and Memos                    Simple
21                View History of Roles                                   Complex
22                View Account Details                                    Simple
23                View Holdings of an Account                             Complex
24                View Pending Orders of an Account                       Complex
25                Close/Reactivate Account                                Simple
26                Make Intelligent Update to Business Partners of ACIC    Complex

To estimate the effort distribution among the stages, the project manager used the distribution found in the Synergy project. Because the earlier project did not have a requirements phase, the distribution had to be modified. Table 4.6 gives the estimate for each phase and for the total.

Table 4.5. Build Effort for the ACIC Project

Use Case Type        Effort per Use Case (person-days)   Number of Units   Total Build Effort (person-days)
Simple use cases     1                                   5                 5
Medium use cases     5                                   9                 45
Complex use cases    8                                   12                96
Total                                                                      146
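
The calibration and the build estimate can be cross-checked with a few lines of arithmetic; the counts below come from the text and from Tables 4.4 and 4.5.

```python
# Cross-check of the build-effort calibration and the ACIC build estimate.
avg_build = {"simple": 1, "medium": 5, "complex": 8}     # person-days per use case

# Synergy: 21 simple, 11 medium, 8 complex use cases; actual build effort ~143.
synergy_counts = {"simple": 21, "medium": 11, "complex": 8}
synergy_check = sum(synergy_counts[c] * avg_build[c] for c in synergy_counts)
print(synergy_check)    # 140, reasonably close to the actual 143 person-days

# ACIC: 5 simple, 9 medium, 12 complex use cases (from Table 4.4).
acic_counts = {"simple": 5, "medium": 9, "complex": 12}
acic_build = sum(acic_counts[c] * avg_build[c] for c in acic_counts)
print(acic_build)       # 146 person-days, the build figure in Table 4.5
```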

In this project, in addition to estimating in this bottom-up manner, the project manager employed the use case point methodology. As described earlier, the UUCPs are first determined from the use cases by assigning 5 points to each simple use case, 10 points to each medium use case, and 15 points to each complex use case. The numbers of simple, medium, and complex use cases were 5, 9, and 12, respectively, so this translates to

UUCP = 5 * 5 + 9 * 10 + 12 * 15 = 295

To take the various factors into account, the ACIC project manager first rated the factors related to the complexity of the technology and obtained the technical complexity factor. He chose the following ratings (in the order given in Table 4.2): 4, 3, 5, 3, 4, 5, 5, 0, 4, 1, 2, 0, and 5, resulting in a TFactor of 40 (8 + 3 + 5 + 3 + 4 + 2.5 + 2.5 + 0 + 4 + 1 + 2 + 0 + 5) and a TCF of 1.0. Next, he computed the environmental factor. He assigned the following ratings to the environmental factors: 3, 1, 3, 4, 5, 5, 0, and 3; the resulting EFactor was 22 (4.5 + 0.5 + 3 + 2 + 5 + 10 + 0 - 3), giving an EF of 0.74. From these, he calculated the total use case points as

Table 4.6. Estimated Effort for the ACIC Project

Activity                   Estimated Effort (person-days)   % of Total Effort
Requirements               50                               10
Design                     60                               12
Build                      146                              29
Integration testing        35                               7
Regression testing         10                               2
Acceptance testing         30                               6
Project management         75                               15
Configuration management   16                               3
Training                   50                               10
Others                     40                               6
Estimated effort           501                              100

UCP = 295 * 1.0 * 0.74 = 218.3

Using the standard effort figure of 20 person-hours per UCP, he got the effort estimate as

218 * 20 = 4,360 person-hours = 499 person-days (at 8.75 hrs/day)

or

513 person-days (at 8.5 hrs/day)

These estimates were amazingly close to the earlier estimate, increasing the confidence of the project manager in the estimation. (As it turns out, the estimates for this project were indeed highly accurate, as you will see in the closure report given in Chapter 12. Furthermore, at all the milestones, the effort overrun, as compared to planned, was minuscule, as you will see in a milestone analysis given in Chapter 11.)
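
For reference, plugging the ACIC ratings into the use case points formulas closely reproduces the figures quoted above (the weights are those from Tables 4.2 and 4.3):

```python
# ACIC use case point figures, recomputed from the ratings in the text.
counts = {"simple": 5, "medium": 9, "complex": 12}
uucp = 5 * counts["simple"] + 10 * counts["medium"] + 15 * counts["complex"]  # 295

tech_ratings = [4, 3, 5, 3, 4, 5, 5, 0, 4, 1, 2, 0, 5]          # Table 4.2 order
tech_weights = [2, 1, 1, 1, 1, 0.5, 0.5, 2, 1, 1, 1, 1, 1]
tcf = 0.6 + 0.01 * sum(r * w for r, w in zip(tech_ratings, tech_weights))  # 1.0

env_ratings = [3, 1, 3, 4, 5, 5, 0, 3]                          # Table 4.3 order
env_weights = [1.5, 0.5, 1, 0.5, 1, 2, -1, -1]
ef = 1.4 - 0.03 * sum(r * w for r, w in zip(env_ratings, env_weights))     # 0.74

ucp = uucp * tcf * ef                                            # about 218
effort_hours = round(ucp) * 20                                   # 4,360 person-hours
print(ucp, effort_hours, effort_hours / 8.75, effort_hours / 8.5)
```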

In this project, as mentioned earlier, the iterative process of RUP was used. Because the phases of design, analysis, and build were spread over many iterations, a phase-wise effort estimate, by itself, would not have provided a direct input for planning. For planning, the project manager had to estimate the effort for the various iterations. To obtain this, he started with the overall estimate as determined earlier. The estimate for requirements was broken into project initiation and inception phases. The effort for design, build, and test was broken into elaboration and construction, based on the use cases chosen in the various iterations and the guidelines given in the RUP methodology. The project management, CM, and other costs remained the same. Table 4.7 shows the distribution of effort by iterations.

Table 4.7. Distribution of Effort by Iterations in the ACIC Project

Iteration                         Estimated Effort (person-days)   % of Total Effort
Project initiation                25                               5
Inception phase                   24                               5
Elaboration phase: Iteration 1    45                               9
Elaboration phase: Iteration 2    34                               7
Construction phase: Iteration 1   27                               5
Construction phase: Iteration 2   24                               5
Construction phase: Iteration 3   21                               4
Transition phase                  110                              22
Project closure                   10                               2
Project management                75                               15
Configuration management          16                               3
Training                          50                               10
Others                            40                               8
Total estimated effort            501                              100

 


