Productivity Metrics

As stated in the preface, productivity metrics are outside the scope of this book. Software productivity is a complex subject that deserves a much more complete treatment than a brief discussion in a book that focuses on quality and quality metrics. For non-OO projects, much research has been done in assessing and measuring productivity, and there are a number of well-known books in the literature; for example, see Jones's work (1986, 1991, 1993, 2000). For OO projects, relatively little research on productivity metrics has been conducted and published. Because this chapter is on OO metrics in general, we include a brief discussion of productivity metrics.

Metrics like lines of code per hour, function points per person-month (PM), number of classes per person-year (PY) or person-month, number of methods per PM, average person-days per class, or even hours per class and average number of classes per developer have been proposed or reported in the literature for OO productivity (Card and Scalzo, 1999; Chidamber et al., 1997; IBM OOTC, 1993; Lorenz and Kidd, 1994). Despite the differences in units of measurement, these metrics all measure the same concept of productivity: the number of units of output per unit of effort. In OO development, the unit of output is the class or the method, and the common units of effort are the PY and the PM. Among the many variants of productivity metrics, number of classes per PY and number of classes per PM are perhaps the most frequently used.
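Because these variants differ only in their units, converting among them is straightforward. The following minimal sketch in Python is our own illustration, not from the original text; the function names and the simple twelve-PMs-per-PY conversion are our assumptions.

```python
# Sketch: computing common class-based OO productivity variants from raw counts.
# Assumes 1 person-year (PY) = 12 person-months (PM), and that "classes"
# counts all classes (abstract and concrete, key and support).

def classes_per_py(num_classes: float, effort_py: float) -> float:
    """Classes produced per person-year of effort."""
    return num_classes / effort_py

def classes_per_pm(num_classes: float, effort_py: float) -> float:
    """Classes produced per person-month of effort (1 PY = 12 PM)."""
    return num_classes / (effort_py * 12)

# Example: Project A from Table 12.5 (5,741 classes, 100 PY).
print(round(classes_per_py(5741, 100), 1))  # 57.4 classes per PY
print(round(classes_per_pm(5741, 100), 1))  # 4.8 classes per PM
```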

Let us look at some actual data. For the five IBM projects discussed earlier, data on project size in terms of number of classes were available (Table 12.2). We also tracked the total PYs for each project, from design, through development, to the completion of testing. We did not have effort data for Project E because it was a joint project with an external alliance. The numbers of classes per PY thus calculated for these projects are shown in Table 12.5. The numbers ranged from 2.8 classes per PM to 6 classes per PM. The average across the projects was 4.4 classes per PM, with a standard deviation of 1.1. The dispersion of the distribution was small in view of the fact that these were separate projects with different development teams, albeit all developed in the same organization. The high number of classes per PM for Project B may be related to the small number of methods per class (3 methods per class) for that project, as discussed earlier. It is also significant to note that the differences between the C++ projects and the Smalltalk projects were small.

Lorenz and Kidd (1994) show data on the average number of person-days per class for four Smalltalk projects and two C++ projects, in histogram format. From the histograms, we estimate that the person-days per class for the four Smalltalk projects were 7, 6, 2, and 8, and for the two C++ projects, about 23 and 35. The Smalltalk data seem close to those of the IBM projects: assuming roughly 20 working days per PM, an average of 5 to 6 person-days per class translates to about 4 classes per PM. The C++ projects amounted to about one PM or more per class.

Table 12.5. Productivity in Terms of Number of Classes per PY for Five OO Projects

                    Project A   Project B   Project C   Project D        Project E        Project F
                    (C++)       (C++)       (C++)       (IBM Smalltalk)  (OTI Smalltalk)  (Digitalk Smalltalk)
Number of Classes   5,741       2,513       3,000       100              566              492
PY                  100         35          90          2                na               10
Classes per PY      57.4        71.8        33.3        50               na               49.2
Classes per PM      4.8         6           2.8         4.2              na               4.1
Methods per PM      8           18          20          71               na               86
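The summary statistics quoted above can be rechecked directly from the table. The following is a minimal sketch of ours, not the author's calculation; note that the quoted standard deviation of 1.1 lies between the population and sample estimates, which differ noticeably for only five data points.

```python
import statistics

# Classes per PM for Projects A, B, C, D, and F (Project E excluded: no effort data).
classes_per_pm = [4.8, 6, 2.8, 4.2, 4.1]

print(round(statistics.mean(classes_per_pm), 1))    # 4.4
print(round(statistics.pstdev(classes_per_pm), 2))  # 1.04, population standard deviation
print(round(statistics.stdev(classes_per_pm), 2))   # 1.16, sample standard deviation
```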

Lorenz and Kidd (1994) list the pertinent factors affecting these differences, including user interface versus model class, abstract versus concrete class, key versus support class, framework versus framework-client class, and immature versus mature class. For example, they observe that key classes, classes that embody the "essence" of the business domain, normally take more time to develop and require more interactions with domain experts. Framework classes are powerful but are not easy to develop and require more effort. Mature classes typically have more methods but require less development time. Therefore, without a good understanding of the projects and a consistent operational definition, it is difficult to make valid comparisons across projects or organizations.

It should be noted that all the IBM projects discussed here were systems software: part of an operating system, related to an operating system, or part of a development environment. The architecture and subsystem design were firmly in place. Therefore, the classes of these projects may belong more to the mature-class category. Data on classes shown in the tables include all classes, whether abstract or concrete, key or support.

In a recent assessment of OO productivity, we looked at data from two OO projects at two IBM sites that were developing middleware related to business frameworks and Web servers. Their productivity numbers are shown in Table 12.6. The productivity numbers for these two projects were much lower than those discussed earlier. They certainly reflect the difficulty of designing and implementing framework-related classes, versus the more mature classes related to operating systems. The effort data in the table include the end-to-end effort from architecture through design, development, and test. If we confine the measurement to development and test and exclude the effort related to design and architecture, the metric values increase to the following:

Web server: 2.6 classes per PM, 4.5 methods per PM

Framework: 1.9 classes per PM, 14.8 methods per PM

Table 12.6. Productivity Metrics for Two OO Projects

             Classes (C++)   Methods (C++)   Total PMs   Classes per PM   Methods per PM
Web Server   598             1,029           318         1.9              3.2
Framework    3,215           24,670          2,608       1.2              9.5
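The derived columns of Table 12.6 follow directly from the raw counts. As a quick check (a sketch of ours; the variable names are assumptions for the example):

```python
# Sketch: recomputing the derived columns of Table 12.6 from the raw counts.
projects = {
    "Web Server": {"classes": 598, "methods": 1029, "pms": 318},
    "Framework":  {"classes": 3215, "methods": 24670, "pms": 2608},
}

for name, p in projects.items():
    print(name,
          round(p["classes"] / p["pms"], 1),  # 1.9 and 1.2 classes per PM
          round(p["methods"] / p["pms"], 1))  # 3.2 and 9.5 methods per PM
```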

The IBM OOTC's rule of thumb for effort estimation (at the early design stage of a project) is one to three PMs per business class, or key class (West, 1999). In Lorenz and Kidd's (1994) definition, a key class is a class that is central to the business domain being automated, one whose absence would cause great difficulties in developing and maintaining the system. Since the ratio of key classes to support classes, or to the total classes in a project, is not known, it is difficult to relate this one-to-three-PMs-per-key-class guideline to the numbers discussed above.

In summary, we have attempted to evaluate some empirical OO productivity data in terms of number of classes per PM. From the preceding discussion, we have the following tentative values as a stake in the ground for OO project effort estimation (a rough calculator applying these values is sketched after the list):

  • For project estimate at the early design phase, 1 to 3 PM per business class (or one-third of a class to one class per PM)
  • For framework-related projects, about 1.5 classes per PM
  • For mature class projects for systems software, about 4 classes per PM
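To show how these stakes in the ground might be used, here is a minimal sketch in Python. It is our own illustration, not an established estimation tool; the category names and the use of the midpoint of the 1-to-3-PM range are assumptions made for the example.

```python
# Sketch: rough effort estimation from the tentative productivity values above.
# Rates are in classes per PM; the early-design rule of thumb of 1 to 3 PMs
# per business class is represented by its midpoint (2 PMs per class).

CLASSES_PER_PM = {
    "early_design_business_class": 0.5,  # midpoint of 1/3 to 1 class per PM
    "framework": 1.5,
    "mature_systems_software": 4.0,
}

def estimate_effort_pm(num_classes: int, project_type: str) -> float:
    """Estimated effort in person-months for a given class count."""
    return num_classes / CLASSES_PER_PM[project_type]

# Example: 50 key classes at the early design stage -> about 100 PMs.
print(estimate_effort_pm(50, "early_design_business_class"))  # 100.0
```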

Further studies and the accumulation of empirical findings are definitely needed to establish the robustness of such OO productivity metrics. A drawback of OO metrics is that there are no rules for converting them to lines-of-code metrics or to function point metrics. As such, comparisons cannot be made between OO projects described by OO metrics and projects outside the OO paradigm. According to Jones (2002), function point metrics work well with OO projects. Among the clients of Software Productivity Research, Inc. (SPR), those who are interested in comparing OO productivity and quality levels to those of procedural projects all use function point metrics (Jones, 2002). The function point could eventually be the link between OO and non-OO metrics. Because there are variations in the function and responsibility of classes and methods, some studies have begun to use the number of function points as a weighting factor when counting classes and methods.

Finally, as a side note, regardless of whether it is classes per PM for OO projects or LOC per PY and function points per PM for procedural languages, these productivity metrics are two-dimensional: output and effort. The productivity concept in software, especially at the project level, however, is three-dimensional: output (size or function of deliverables), effort, and time. Because the tradeoff between time and effort is not linear, the dimension of time must be addressed. If quality is included as yet another variable, the productivity concept becomes four-dimensional. Assuming quality is held constant, or that quality criteria can be established as part of the requirements for the deliverable, we can avoid the confusion of mixing productivity and quality, and productivity remains a three-dimensional concept. As Figure 12.2 shows, if any two of the three dimensions are held constant, a change in the third dimension is a statement of productivity. For example, if effort (resources) and development time are fixed, then the more output (function) a project produces, the more productive the project team is. Likewise, if resources and output (required functions) are fixed, then the faster the team delivers, the more productive it is.

Figure 12.2. Dimensions of the Productivity Concept


It appears, then, that two-dimensional metrics are not really adequate for measuring software productivity. Based on a large body of empirical data, Putnam and Myers (1992) derived the software productivity index (PI), which takes all three dimensions of productivity into account. For the output dimension, the PI equation still uses LOC, and therefore the index is subject to all the shortcomings associated with LOC, which are well documented in the literature (Jones, 1986, 1991, 1993, 2000). The index is nevertheless more robust than the two-dimensional metrics because (1) it includes time in its calculation, (2) there is a coefficient in the formula that calibrates for the effects of project size, and (3) after the calculation is done, a categorization process translates the raw productivity parameter (which is a huge number) into the final productivity index (PI), so the impact of variations in the LOC data is reduced. Putnam and associates also provide values of PI by type of software, based on a large body of empirical data on industry projects. Therefore, the calculated PI value of a project can be compared to the industry average for its type of software.
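To make the shape of this calculation concrete, consider the following minimal sketch. The software equation form (size = process productivity x (effort/B)^(1/3) x time^(4/3), with size in LOC, effort in person-years, time in years, and B a size-dependent special skills factor) follows Putnam and Myers (1992), but the B value and the PI lookup thresholds below are illustrative placeholders of ours, not the published calibration, which must be taken from Putnam and Myers's tables.

```python
import bisect

def productivity_parameter(size_loc: float, effort_py: float,
                           time_years: float, b: float = 0.39) -> float:
    """Raw process productivity from Putnam's software equation, solved for
    the productivity parameter: size = P * (effort/B)**(1/3) * time**(4/3).
    B (the special skills factor) is size-dependent; 0.39 is a placeholder."""
    return size_loc / ((effort_py / b) ** (1 / 3) * time_years ** (4 / 3))

# ILLUSTRATIVE ONLY: ascending parameter thresholds mapped to PI 1, 2, 3, ...
# The published table (Putnam and Myers, 1992) must be used in practice.
PI_THRESHOLDS = [754, 987, 1220, 1597, 1974, 2584, 3194, 4181, 5186, 6765]

def productivity_index(param: float) -> int:
    """Categorize the huge raw parameter into a small-integer PI."""
    return bisect.bisect_right(PI_THRESHOLDS, param)

# Example: 100,000 LOC delivered with 40 PY of effort over 1.5 years.
p = productivity_parameter(100_000, 40, 1.5)
print(round(p), productivity_index(p))
```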

For procedural programming, function point productivity metrics are regarded as better than LOC-based productivity metrics. However, the time dimension still needs to be addressed, and the same is true for the OO productivity metrics. Applying Putnam's PI approach to function point and OO metrics would likely produce better and more adequate productivity metrics. This, however, requires more research with a large body of empirical data in order to establish equations equivalent to the LOC-based PI equation.
