4.5 MODELS AND TOOLS REVISITED

To apply a set of principles, which operate at the conceptual level, you have to develop an operational process model, and that model should be founded upon a logical view of the process. This is a fancy way of saying that we start with a set of ideas, the principles; we use these to put together an approach; and we build a set of mechanisms that we can use on a day-to-day basis within that approach.

There are basically two strategies for cost estimation, and they are generally seen as competing. Fortunately, more and more people are realizing that the two can work harmoniously within a single cost estimation process. The two strategies can be termed Model Based and Technique Based.

Looking first at the model-based approach, we find a profusion of models to choose from, and we may be forgiven for asking "where did they all come from?" or perhaps "why are there so many?"

Models are really a scientific answer to the problem of cost estimation, and they appeal to many individuals within the IT industry. The size of the potential market is one reason why there are so many models: basically, everyone would like a piece of the action. It is also true that most, if not all, of the models are really nothing more than variations on a theme. So how does a model come about? There are two approaches you can adopt, both well recognized and accepted within the world of mathematical modeling, which is what we are really talking about here.

The first approach goes something like this. First, collect a lot of data about a lot of projects. Next, investigate the relationships between input and output variables, for example size and complexity as inputs and cost and duration as outputs, and use these relationships to construct a model. Finally, validate your model on a different set of data by comparing actuals to the model's predictions.
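
To make this first approach concrete, here is a minimal sketch, in Python, of fitting the classic Effort = a * Size ^ b relationship to historical project data. The project figures are invented purely for illustration; a real calibration exercise would of course use your own data.

import math

# (size in KLOC, effort in person-months) for completed projects - invented figures
history = [(10, 24), (25, 68), (50, 150), (80, 260), (120, 410)]

# Fit Effort = a * Size^b by ordinary least squares on the log-log form:
# log(Effort) = log(a) + b * log(Size)
xs = [math.log(s) for s, _ in history]
ys = [math.log(e) for _, e in history]
n = len(history)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
a = math.exp(mean_y - b * mean_x)

print("Fitted model: Effort = %.2f * Size^%.2f" % (a, b))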

The second approach is only slightly different. First, produce a hypothesis that you believe models the reality; this will usually be in the form of an input/output model. Next, apply your model to a data set to test the hypothesis it contains, and modify the model in the light of these experimental results. Finally, validate your revised model on a different set of data, again comparing actuals to predictions. That final validation step is often missed out, which is one reason why these models need such careful calibration.
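
The validation step itself is easy to mechanize. The sketch below, again with invented figures, compares actuals to predictions on a hold-out set using MMRE (Mean Magnitude of Relative Error), one common yardstick in the cost estimation literature; the fitted constants are assumed to come from an exercise like the one above.

def predict(size, a=2.5, b=1.05):
    # Hypothetical fitted model: Effort = a * Size ^ b
    return a * size ** b

holdout = [(15, 40), (60, 190)]  # (size in KLOC, actual effort) - invented

errors = [abs(actual - predict(size)) / actual
          for size, actual in holdout]
mmre = sum(errors) / len(errors)
print("MMRE on hold-out data: %.0f%%" % (mmre * 100))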

What you do when you have your model then depends upon your own personality and preferences. Some people take the very admirable step of placing their models in the public domain. The best known of these are Barry Boehm (COCOMO II), Boehm et al (1), and Larry Putnam (SLIM), Putnam (1). Of course, once a model is in the public domain it becomes fair game for everyone to adopt, package, or shoot down in flames.

Another approach is to package the model yourself and then present it as a black-box solution to the cost estimation problem. I would say the best-known individual in this category is Capers Jones (CHECKPOINT in the US, CHECKMARK in the UK). Now this really irritates some people, but I must admit that I can see the commercial sense of the approach. After all, the individuals who do go public come in for a great deal of criticism, sometimes unjustified, sometimes justified.

Which leads me to an interesting point. Within Software Metrics there are a number of gurus, some of whom are best known for their work in the cost modeling area, and I have heard most of them speak at some time. One thing has impressed me: they are fanatical only about improving processes that they see as needing improvement, not about their own particular ideas, ideas that they have very often moved on from anyway. Howard Rubin, Barry Boehm, Larry Putnam, Capers Jones, and the others in their league all recognize that they have provided only partial solutions, and they actively welcome constructive criticism that can be used to enhance their original ideas. It is in this direction that we should focus our activities and energy, rather than on the rather pointless arguments about whose model is best, arguments that the developers of the models themselves studiously refrain from joining.

Having made my plea for sanity, I think we should get back to the topic and think about the basic construction of these models.

Essentially, the models apply mathematical operations, such as multiplication by a constant, to an estimated size, usually expressed in Lines of Code, although some models derived from and driven by size in terms of Function Point scores are available. Note that many of the packages that claim to be driven by Function Point scores are in fact only converting those scores to Lines of Code values using built-in tables.
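
A sketch of what such a package is doing internally might look like the following. The expansion ratios here are invented for the example; published backfiring tables, such as those associated with Capers Jones, vary by source and by language version.

# Illustrative Function Point to LOC expansion ratios - not real published values
LOC_PER_FP = {"COBOL": 100, "C": 128, "Ada": 71}

def fp_to_kdsi(function_points, language):
    # Convert a Function Point count to thousands of lines of code,
    # the size measure most LOC-driven models actually consume
    return function_points * LOC_PER_FP[language] / 1000.0

print(fp_to_kdsi(350, "COBOL"))  # 35.0 KDSI fed into a LOC-driven model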

I would like to talk a little more about the COCOMO cost estimation model, as I find it to be one of the most widely used. In my opinion, COCOMO is a very typical cost estimation model, but in some respects it is also one of the more sophisticated.

In the original model there were three levels, but I would like to consider the basic model here. This takes the form of a formula:

Effort = a * Size ^ b

where Size is expressed in thousands of delivered source instructions, or KDSI, and a and b are constants. COCOMO, being quite sophisticated, recognizes three types of development environment and provides a different variation of the basic model for each. The Organic Environment is essentially that of a small-scale, non-bureaucratic project, and for this environment the model takes the form:

Effort = 2.4 * Size ^ 1.05

The Embedded Environment is the opposite of the Organic in that it relates to a very bureaucratic, tightly controlled and formal organization. For this environment the model takes the form:

Effort = 3.6 * Size ^ 1.2

The so-called Semi-Detached Environment is one that falls between the two extremes, and for this the model takes the form:

Effort = 3.0 * Size ^ 1.12

Various other elements of the model can then be used to modify the basic result; these elements are often termed "cost drivers." For further information the reader is referred to Boehm et al (1).
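
Pulling the three formulae together, a minimal sketch of the basic COCOMO calculation might look like this. The optional adjustment factor stands in for the cost drivers, which in the full intermediate model enter as the product of the individual driver multipliers.

# The three basic COCOMO modes, with (a, b) constants as given above
MODES = {
    "organic":       (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded":      (3.6, 1.20),
}

def cocomo_effort(kdsi, mode, adjustment=1.0):
    # Effort in person-months for a size in KDSI; adjustment is a
    # stand-in for the combined effect of the cost drivers
    a, b = MODES[mode]
    return a * kdsi ** b * adjustment

for mode in MODES:
    print("%-13s: %6.1f person-months" % (mode, cocomo_effort(50, mode)))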

I make no apology for referring back to what is, by today's standards, an old model, defined some years ago by Boehm, Boehm (1), and further developed since then. Why do I not apologize? Simply this: do you or your organization have anything better today? For most readers the answer will be "no."

There is currently a great deal of debate about the validity and usefulness of the cost drivers used to modify the basic formulae, which can range from the experience of the project team to the layout of the office. Two camps can be identified with respect to cost drivers: one feels that the basic set of cost drivers that comes with most models is insufficient; the other believes that too many cost drivers are being considered. Personally, I feel more affinity for the second view, and this seems to be borne out by research done as part of the ESPRIT MERMAID project, Kitchenham (1). The results of this research, which is based upon a statistical analysis of both newly collected data from a number of sources and the data sets originally used by the developers of some of the better-known cost models, indicate that the drivers are not orthogonal. In simple terms, the same thing is being addressed by more than one driver. When you use the drivers to adjust the base estimate you are effectively double counting, or perhaps canceling out effects, depending upon how you answer specific questions.
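
A toy example, with invented multiplier values, shows how a lack of orthogonality turns into double counting: two drivers that both really measure team experience each inflate the base estimate, so the experience penalty is applied twice.

base_effort = 100.0             # person-months from the base model
analyst_capability = 1.20       # penalty for an inexperienced team (invented)
applications_experience = 1.15  # largely the same underlying factor (invented)

adjusted = base_effort * analyst_capability * applications_experience
print(adjusted)  # 138.0 - a 38% penalty where either driver alone meant 15-20%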

It is also interesting to note that locally developed cost models, that is, models developed from data within a single environment or installation, tend to have no more than five or six adjusting drivers, and these are often different to the drivers provided by the publicly or commercially available models, which I will call "generalized cost models."

This may indicate why there is a group that seems to be looking for more drivers. My feeling is that there is a level of dissatisfaction with the drivers already identified within the generalized models, and that this dissatisfaction exists because users feel that other drivers are needed for their own sites or application environments. It is not so much that more drivers are needed as that different drivers are needed.


