10.3 MODEL DEFINITION OR GOALS, QUESTIONS, METRICS

What I intend to do here is discuss the derivation of metrics or measures and then support that discussion with a couple of examples. Remember where we are: we have identified requirements for information and metrics-based techniques linked to specific customer groupings.

These perhaps range from generalities such as "improve the cost estimation process" to very specific requirements such as "provide information to the Systems Development Director that quantifies the achieved productivity of all product development teams on a quarterly basis." The related primary customers could be the project managers (because they are the people who are fed up with getting blamed for poor estimates) and the Systems Development Director respectively.

You have also been talking to people in the organization to derive these requirements so you have a great asset available to you: personal contacts.

In terms of Basili and Rombach's Goal, Question, Metric paradigm, Rombach (1), you have the goals defined. You also have the means by which you can resolve the questions. Before you do this, do ensure that the goals you have identified are measurement goals. Individuals will often throw a general goal at you that is not suitable as a requirement within a metrics program. For example, a manager may claim that his goal is to increase development productivity by 25% within two years. Fine, as a strategic goal for that manager this is acceptable, but it does not help a metrics program at all. Unfortunately, Software Metrics do not do managers' jobs for them, and they need to realize this.

Acceptable measurement requirements within this manager's strategic goal may include, "provide a mechanism by which the current and future productivity levels can be assessed," followed by "provide metrics-based techniques that can reduce rework within development."

This is not to say that measurement programs will not include some direct benefits to the development process, but only some parts of a Software Metrics program can do this. Examples include metrics-based techniques that can be used by engineers and managers on a day-by-day basis to directly improve intermediate or final deliverables; and, measurements that aim to identify the areas within generic products that are causing the majority of the problems, the idea being that these can be rewritten to reduce rework or testing effort. The other components of a measurement initiative, for example improving cost estimation and project control or providing general management information, will not directly improve things. What these components can do is enable managers to manage more effectively and it is this which improves the process.

So, be careful. Do not get forced into a situation where you have to shoulder what are essentially management responsibilities for development; you have quite enough to do just providing them with the tools they need to do their job.

Now, assuming that you have been careful about defining the requirements that have been placed against the metrics program, you can move on to the next step in model definition, requirement classification.

If you look at the various requirements you have identified you will notice similarities between them. These similarities will enable you to classify your requirements into a small number of major categories. A typical set of such macro-requirements was given in the previous chapter and is reproduced below:

  • "Improve cost, size and duration estimation."

  • "Improve project control procedures."

  • "Provide management information about performance, covering productivity and achieved quality."

  • "Address the prediction of quality attributes prior to release."

  • "Address the assessment of designs and requirements."

Generally speaking the provision of information about achieved quality will be broken down into a number of specific "-ities."

The reason for this grouping of requirements is to enable the satisfaction of as many individual requirements as possible in the shortest time. In almost every case that I have seen, the number of customers and individual requirements that a Software Metrics program is expected to address far exceeds the available resources if those requirements are addressed individually. The only solution that appears to work effectively is to offer generic solutions to these requirements within the organization, which can be tailored to meet the needs of specific customers.

One question that this approach raises is related to the customer set that was identified earlier. What happens to these individuals and groups? Each customer who is linked to a generic or macro-requirement becomes a "viewpoint authority" for that requirement.

To explain this concept let us consider the case of cost estimation improvement. Various functional groups within the organization may have expressed this as a requirement and these could include:

  • Project managers who have to prepare estimates. For now we will assume that this is one customer group currently using the same techniques and operating under the same constraints. While this is probably a simplification of reality it still enables us to illustrate the point.

  • Marketing, who have to prepare bids for work.

  • The Product Manager who wishes to see better estimation.

  • The Systems Development Director who is fed up with other directors telling him that the department is costing money by failing to meet promises to customers.

  • The external customers themselves.

Each of these groups has a vested interest in improving the cost estimation process and each will have input regarding the requirements for that improvement. It makes no sense at all to try to satisfy these individual requirements separately when a single, what we can call generic, solution will satisfy the majority of them. What we must realize is that each group will have a view of the requirement and the solution that is different from that of the other groups. In this sense, they all have a viewpoint on the requirement. We should also realize that each group may have subgroups within it and to reconcile these it is a good idea to identify or nominate a viewpoint authority. This individual is expected to express the needs of the group and to represent their interests during the process of satisfying those requirements.

There may well be conflict between the various viewpoint authorities and this can be resolved by identifying a customer authority who is the individual with the final say in resolving such conflicts. The ideal candidate for the role of customer authority is the primary customer for the specific requirement solution you have already identified.

One other advantage that comes from requirements grouping is that you can more easily use other people's solutions. You already have the concept of tailoring solutions for the various viewpoints but you can equally take an external solution, such as a cost estimation package that runs on a PC, and tailor its use for the organization. Requirements grouping helps because it makes the identification of external solutions easier.

Having carried out this requirements grouping you can start to go through the Question or modeling task of metrics definition.

Almost all requirements laid against a Software Metrics program relate to attributes of the development process or its products. In fact, the recognition of this was a fundamental step forward in the use of Software Metrics. Current thinking seems to have added in attributes that relate to Resources as well. This is more a large subgroup of attributes within the process/product split, but such a view can be useful. Do not fall into the trap of thinking that this solves all your problems or that this breakdown will automatically lead to metrics; it will not. Nor should you waste time trying to categorize metrics as product or process measures; it provides little added value.

What can be categorized is the requirement, but this is addressed by identifying the attribute of interest within that requirement — for example, the productivity of the development process as a process-oriented requirement.

Now for any attribute within a requirement, there are three items that need to be considered. The first is to determine what is meant by that attribute; in other words, you will need to define the attribute. It may be that this has already been done when you were considering the set of initial definitions during the last stage, but if it has not been addressed it is vital that it is done now. You need to be very clear about what is meant. The attributes that seem to typify the problem at the moment are "maintainability," "enhanceability" and "extendability." Any of these apparently innocuous terms can be used by different people to refer to the same attribute, and individuals, including myself, will argue strongly for their own preference.

I am sure that the debate within the industry will continue for many years and may never be resolved to the satisfaction of everyone but you cannot afford to wait for such agreement. The approach I have adopted with a fair degree of success in the past is to define particular attributes within an organization. Even this can involve discussion and debate but it is practical to drive definitions for attributes, perhaps agreed and endorsed by the Metrics Coordination Group, in a fairly short space of time.

As far as the various quality "-ities" are concerned, some definitions are available from International Standard 9126 (IS9126), and these can be used as the foundation of your own definitions.

Having defined the attribute you must now realize that there are two areas of interest for such an attribute. First, you may wish to monitor or assess the attribute. Alternatively, you may wish to apply prediction to the attribute. An example will clarify the problem. Software system reliability is an attribute that concerns many organizations. Reliability can be defined as:

"A set of attributes that bear on the capability of software to maintain its level of performance under stated conditions for a stated period of time"
(IS9126, 5th January 1990)

or my own definition, prepared before I had sight of the ISO definition:

"The capability of a software component to perform its functions to a level specified in the requirement, or defaulting to 100%, within a given environment."

There may be a requirement in an organization to monitor the reliability of its software products in the field over time or across build releases. This information allows declining reliability to be addressed before it becomes a disaster or, if reliability is increasing, it can be used to help market the revenue-earning products.

Most organizations are also interested in building reliability into their products. This implies that conditions that exist within designs, for example, have an effect on the reliability of the delivered product. So the organization may wish to measure the extent of coupling and cohesion of components in design to predict the field reliability. It may even be as simple as using faults discovered and corrected during testing as a prediction of the number of faults that will be discovered by users or customers.
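
To make the predictive use a little more concrete, here is a minimal sketch of the simple approach just mentioned. It assumes, purely for illustration, that a historical ratio of field faults to test faults is available from earlier releases; the function name and the figures are hypothetical.

  def predict_field_faults(test_faults: int, field_to_test_ratio: float) -> float:
      """Predict user-detected faults from faults found and corrected in testing.

      The ratio is assumed to come from earlier releases of comparable products,
      e.g. 120 field faults against 600 test faults gives a ratio of 0.2.
      """
      return test_faults * field_to_test_ratio

  # Hypothetical release: 450 faults corrected in testing, historical ratio of 0.2
  print(predict_field_faults(450, 0.2))  # -> 90.0 predicted field faults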

The approaches used to satisfy these two demands are very different, and I suggest that you treat the monitoring or assessment and the prediction of an attribute as separate requirements within the metrics program.

Refining the requirements in this way, through attribute definition and determining whether you are required to monitor or predict, will enable you to identify the processes you need to consider and, hence, the process products to which you will apply measurement.

The next step is to derive a model that will satisfy the requirement. To do this you need to identify what can be called the characteristics of the process or product that affect the attribute you are interested in. Modeling by its very nature is a human activity and is therefore somewhat subjective. Again, the best advice I can give you is to talk to the people who are best placed to answer the question, "what affects this attribute?" In other words talk to the people who are involved in the process.

Identifying the characteristics of interest may cause you to do still more definition, for example what is meant by size or complexity, but please remember to be pragmatic and to listen carefully. If you have doubts about presenting a definition, then call it a 'working definition.' You are not looking for perfection — you are looking for something that will enable you to satisfy a real and important requirement.

Once you have obtained a set of characteristics you can start to combine these in the form of a model. In this context, a model is a mathematical combination of characteristics resulting in a function for which the dependent variable provides information about the attribute of interest. The model can, and perhaps should, include a technique built around the analysis of that function.
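
Purely as an illustration of the shape such a model takes, the sketch below combines two hypothetical characteristics into a single dependent variable; the characteristics and the ratio form are assumptions chosen for illustration, not a recommended model.

  def attribute_indicator(rework_effort_hours: float, product_size: float) -> float:
      """Toy model: combine two characteristics into one dependent variable.

      Both characteristics and the ratio form are illustrative assumptions;
      a real model would use characteristics identified with your own staff,
      and a technique (e.g. trend analysis) built around the resulting function.
      """
      return rework_effort_hours / product_size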

As I have said, modeling is a human activity and also, by definition, includes an element of simplification. Do not try to form the "perfect model" or you will be firmly on the road to frustration and despair. The metrics you end up with are unlikely to tell the whole story about any attribute; the real question is, do they give you more information than you have now and is that information of practical use to the organization? This simplification may mean that you ignore some of the characteristics identified. Remember that simpler models are probably better than more complicated ones.

If you find that your models are not sufficiently sophisticated to be able to supply you with the information you require, perhaps because a characteristic is more important than you assumed, then CHANGE THE MODEL! Do not fall into the trap of believing that your models are perfect or, in some way "right," just because you have spent time and effort developing them. The great power of models is that they can be incrementally developed and extended.

Your model, or at least the component of the model that is a mathematical function, gives you the composite metric that you will use to measure a particular attribute.

By looking at the components of the composite metric you can identify the base metrics, the raw data, that you will need to collect to provide information about a particular attribute. All that remains is to implement the use of the base and composite metrics and then to assess the results. The implementation step is the difficult one.

So as you can see, deriving the metrics is easy! Well, perhaps you still have some doubts, so let us try it with a couple of examples.

Suppose that we have a requirement from a product development team leader for a single measure to monitor the performance of software development. I have phrased this to ask for a single measure to deliberately simplify things as I wish to illustrate the modeling technique, not provide you with a new metric. In fact we are going to end up with a very old metric in this case! Such a requirement may come from a number of team leaders and similar requirements may also come from other customer groups such as senior management who wish to monitor performance across a number of teams. For the sake of illustration I am going to just talk about this single requirement source, a manager responsible for a number of teams, although the steps and principles are the same.

This requirement can be classed as a business entity metric because it relates to a number of projects for which the team has responsibility. All that this classification implies is that we can look at what other organizations are doing in the same area but, of course, to be able to see that we need to have some idea of what that area is. We can also identify the customer authority and the viewpoints for this requirement. The prime candidate for the role of customer authority would seem to be the manager as the request has come from him. Are there any other viewpoints in this rather simple example? There most certainly is at least one other viewpoint: the engineers who will be contributing to the measure! Depending on the size of the team, at least one viewpoint authority should be identified within the group. This individual will act as their spokesperson although others should also be talked to.

We now need to identify and define the attribute of interest. Development performance can be defined as:

"The effectiveness of carrying out software development activities within a given environment."

Effectiveness can, in turn, be defined as the degree to which a desired effect is produced. An effect is anything brought about by a cause, a result. I think it is time to go and talk to our customer and viewpoint authorities.

We may well find that the result they are aiming for is to deliver quality software at minimum cost. Discussion may also reveal that, of the two main elements within this target, they feel that quality is important and that they are taking continuous steps to improve this, but what they actually want to show is that they are also improving their ability to reduce costs. This could lead to a refinement in our basic measurement requirement to monitor development performance, the more refined version being:

"Supply a single high level indicator of development performance that describes the effectiveness of production."

Now we are getting to the nub of the requirement: we are talking about the effectiveness of production, and that sounds like productivity to me. Well, I did say it was going to be an old metric!

Productivity can be defined as the degree to which economic value is produced.

What we now have is an identified attribute and a definition of what we mean by that attribute. Armed with this we can start to look around the industry, and other industries, to see how they measure "productivity." We can also go back and talk to our customer and viewpoint authorities again to ask the question, "what is it that affects productivity, what do we put in and what do we get out?" In other words, what are the characteristics that affect productivity?

With many examples that you will come across there are two ways of answering this question about characteristics. One is to model them internally, the other is to find out what the "state of the art" or "accepted wisdom" is in other organizations and industries. For the purposes of this example, we will have a look at the first approach.

Productivity would seem to have two elements: what we put in and what we get out, in other words what the software development process consumes and what it delivers. We put in people and time, which we can call 'effort,' together with equipment, managerial support and organizational support covering things like the personnel department, accommodation and travel. Of these cost elements you will almost certainly find that the most significant is effort. You will also probably find that your accountants have techniques by which they can apportion the other cost elements down to effort costs to give a loaded salary for any individual or group. This simplifies things because we can, perhaps, just use the effort cost as the "what do we put in" component of a productivity measure.
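
As a minimal sketch of that simplification, assuming the accountants can supply a loading factor that folds the other cost elements into effort (the factor and figures shown are hypothetical):

  def loaded_effort_cost(person_hours: float, hourly_rate: float, loading_factor: float = 1.8) -> float:
      """Effort cost with overheads (equipment, accommodation, support) apportioned in.

      loading_factor is whatever your accountants advise; 1.8 is illustrative only.
      """
      return person_hours * hourly_rate * loading_factor

  print(loaded_effort_cost(person_hours=2400, hourly_rate=35.0))  # hypothetical quarter for one team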

Now what do we get out of software development? One way of looking at this is to say that we deliver a software system to the market and what we get out is the revenue from that system. This is complicated in two ways: if the team is involved in enhancement engineering then we may have problems determining the revenue from a particular build, and many of the factors that determine revenue are beyond the control of the software development function, falling instead to, for example, marketing. Remember that we wish to keep it as simple as possible, so take a look at the process for which software development does have responsibility. What is the product of that process? It would seem to be a software system including code, object or source, and documentation.

Now, what are the characteristics that contribute to what we get out? These could include the amount and detail of the documentation that is required; the size of the code and the complexity of that code. Talking to our engineers we may find that the documentation is relatively standard across projects and if this is the case we may decide to factor out this element. What about the code? We have now got involved in looking at how we are going to combine our aspects but you will often find that this overlap between characteristic identification and combination happens.

As far as the code is concerned, you may see the size of the system as being important and the complexity being equally important but, again, talking to the engineers and the team leaders may indicate that complexity between projects is fairly constant, especially in an enhancement situation. So in this case, could we simply use code size? You may decide to check this out by looking for a relationship between code size and development effort.
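
One way to check this out, sketched below, is a simple correlation between code size and development effort across completed projects; it assumes you already hold size and effort figures for a handful of projects, and the sample data is invented for illustration.

  from statistics import correlation  # Python 3.10+

  # Hypothetical historical data: (lines of code, person hours) per completed project
  projects = [(12000, 950), (8000, 700), (20000, 1600), (5000, 420), (15000, 1150)]

  sizes = [size for size, _ in projects]
  efforts = [effort for _, effort in projects]

  r = correlation(sizes, efforts)
  print(f"Correlation between code size and effort: {r:.2f}")
  # A strong correlation supports using size alone; a weak one suggests that
  # complexity or other characteristics cannot be factored out so easily.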

Time pressure, or more correctly varying time pressure, on projects is also seen as having a major impact on effort. If time pressure is applied and the project must be completed by a very tight deadline then, providing scope or requirements are not reduced, the only variable that can be adjusted is effort. Despite the argument that says throwing effort at a problem will not reduce duration, the reality is that this is the only option available and that large projects can be delivered relatively quickly by using more staff. The point to remember is that the relationship is not linear, in other words if a project is required in half the time then you cannot simply double the team size. You may need to quadruple the size of the team and add further resources just to tackle the communication problem. You will probably pay for this approach to system development in terms of reduced productivity.

However, you will often find that the time pressure operating on projects is relatively constant within, and this is important, a single environment, for example a tool development group or a specific product group handling ongoing releases. Again this effect can often be factored out. In this sense we can justifiably simplify our productivity model.

Alternatively, you may have looked at what other organizations are doing and reached the same conclusion, productivity is normally expressed as:

Productivity = Work Product / Product Cost

A ratio is used because we wish to compare what we get out with what we put in.

Generally speaking, productivity within software development is expressed as:

Productivity = Project Size / Project Effort

This is your basic productivity model and it is this that will be used to derive the composite and base metrics. Essentially, this means that we need to change the generalized model components, project size and effort, to specific, defined measures.

The obvious candidates for project size are lines of code or "Function Points," really the unit-less number that results from the application of Function Point Analysis to a requirement specification, high level design or produced system.

Both of these measures can be defined in such a way as to cater to enhancement projects. For Lines of Code you could consider the added, deleted and changed lines and the same principle can be applied to Function Point Analysis.
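
A minimal sketch of the Lines of Code side of this is given below; whether changed and deleted lines are weighted the same as added lines is a local decision, and equal weights are assumed here purely for illustration.

  def enhancement_size_loc(added: int, changed: int, deleted: int) -> int:
      """Size of an enhancement project counted in lines of code touched.

      Equal weighting of added, changed and deleted lines is an assumption;
      some organizations weight changed or deleted lines differently.
      """
      return added + changed + deleted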

For effort we may choose engineering or person days, months or even years. My own recommendation for an effort measure would be to use person hours. This gets rid of many of the differences and debates about what comprises a person day. Assume that we decide to use Lines of Code and person hours as the components of our composite metric giving:

Productivity = Lines of Code / Person Hours
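
A minimal sketch of this composite and its base metrics might look like the following; the project figures are invented, and the definition of a counted line of code is whatever your organization has agreed.

  def productivity_loc_per_hour(lines_of_code: int, person_hours: float) -> float:
      """Composite metric: Lines of Code / Person Hours.

      Base metrics: lines_of_code, counted to your agreed definition, and
      person_hours, the booked effort for the work producing that code.
      """
      return lines_of_code / person_hours

  # Hypothetical project: 18,500 lines of code for 2,400 person hours
  print(productivity_loc_per_hour(18500, 2400))  # roughly 7.7 LOC per person hour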

We are almost there now. The base metrics or items of raw data that need to be collected almost drop out from this composite. There is one very important step that needs to be addressed: we have to define our base metrics in such a way as to make them meaningful, and their collection feasible, within our organization. This is true of almost every metric that you will use within the program. Yet again you will need to talk to the people who are or who will be involved in the program to make sure that what you propose is practical, but once you have defined the base metrics you have completed your metric derivation.

Well, almost! You do need to ensure that what you have defined is practical and this is another area where pilot exercises can be extremely useful. You also need to consider the analysis and feedback of the data you have collected. Do not fall into the trap of many metrics initiatives that end up with masses of data that is not being used by the organization. I will have more to say about these subjects later. You also have to consider the implementation of the processes that will facilitate the collection, analysis and feedback of data. Again we will address this topic later.

Let us try another example of the derivation technique. This time I will cut down the discussion to the bare bones. Our requirement is to provide information to senior management about the field quality of software products. Discussion indicates that our customer is the Systems Development Director and he is interested in reliability as perceived by the customer. Reliability is seen as the key quality factor in this case. The requirement can be classified as a Business Entity Metric.

The attribute, reliability, can be defined as before, "the capability of a software component," in this case the delivered system within the population of products, "to perform its functions to a level specified in the requirement... within a given environment."

What are the characteristics that affect reliability? What is it about the product that we deliver that can affect reliability? Big systems tend to be less reliable than small ones so product size could be considered. The complexity of the product may be important and the degree of use of the product could also be a possible attribute. Other possibilities include the time pressure applied to the development project, the degree of testing, the extent to which other verification and validation techniques were used, etc.

We also need to consider the product characteristics that indicate the level of reliability. An obvious candidate is the time between failures, and please note that the traditional "Mean Time Between Failures" may not be appropriate as the use of a mean average depends upon the frequency distribution of time between failures. Another possibility is the number of system crashes over a given period of time or the number of user-raised faults. Notice that I am using faults here rather than defects. Bear in mind that the customer will raise defect reports but that some of those defects will be duplicated or result from a lack of understanding of, say, the documentation.

This information itself can be important for other measures, for example for some usability measures, but I suggest that reliability is a function of the validated, non-duplicate defects, the faults detected by the user or customer.

Trying to combine all of these aspects would result in a very complex metric and, probably, a very frustrated metrics team member, so keep it simple. Talk to the managers and if possible to the users and pick on the most important aspects. I suggest that field reliability of a product can be modeled by:

Product Reliability = User Detected Faults / (Product Size * Usage)

What this model says is that quality or reliability is a function of the validated "pain" being suffered by the customer to the extent that he or she will complain about it; how big the product is, which I include so that I can make more meaningful comparisons between products; and, the degree to which that product is used. This may work for a single product but the requirement is for a business entity metric that will be used by a senior manager. To meet this requirement I suggest that an average Product Reliability be derived from a sample of Product Reliability figures taken over a period of months. The type of average depends on the frequency distribution. For a normal or Gaussian distribution the mean average is the most appropriate, but for skewed distributions median or mode averages are more appropriate. I strongly suggest that you talk to a statistician about this or at least consult statistical text books. Both sources of information can explain the reasons for this more ably than myself.
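
A minimal sketch of that averaging step is shown below. It uses a crude skewness check to decide between the mean and the median; the skewness measure and the threshold are illustrative assumptions, and as the text says, a statistician should confirm the choice for your own data.

  from statistics import mean, median, stdev

  def average_product_reliability(sample: list[float]) -> float:
      """Average a sample of Product Reliability figures.

      Uses the mean for roughly symmetric samples and the median for skewed ones.
      Pearson's second skewness coefficient and the 0.5 threshold are illustrative
      assumptions, not a statistical recommendation.
      """
      m, med, s = mean(sample), median(sample), stdev(sample)
      skew = 3 * (m - med) / s if s else 0.0
      return med if abs(skew) > 0.5 else m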

We now have our model and some components of our composite metric. User-detected faults I can live with in my composite; product size I can define as Lines of Code or, more preferably in my view, Function Points; and usage I can define as the number of sites on which we support the product.

So my composite metric becomes:

Reliability is the average of the product reliabilities for a sample at a given time where product reliability is defined as:

Product Reliability = User Detected Faults / (Product Function Points * Sites Supported)

To get our base metrics we have to define the components of field reliability in more concrete terms. In this case I am tempted to simply amend the definitions to say that user detected faults are accumulated over a three-month period while the other two components, i.e., product function points and sites supported, are a snapshot, most likely taken at the point of delivery or, maybe, at the end of the three-month period.
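
Putting the composite and its base metrics together, a minimal sketch might look like this; the product figures are invented, and the three-month fault window and snapshot definitions are those just described.

  def product_reliability(user_detected_faults: int, function_points: int, sites_supported: int) -> float:
      """Composite metric: User Detected Faults / (Product Function Points * Sites Supported).

      Base metrics: faults accumulated over the agreed three-month window;
      function points and sites supported taken as a snapshot.
      """
      return user_detected_faults / (function_points * sites_supported)

  # Hypothetical product: 18 validated faults in the quarter, 1,200 function points, 40 sites
  print(product_reliability(18, 1200, 40))  # lower is better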

And that is it! At this point I reach for my tin hat. Whenever you define a metric to meet a specific requirement you can guarantee that someone, somewhere will try to shoot it full of holes. There are various reasons for this, some valid, some not so valid. Nobody with any sense would claim that any of the metrics outlined above are perfect.

I claim that they satisfy the requirements in a pragmatic way but in all honesty I cannot say that the field reliability measure tells me everything I want to know about the attribute. It cannot, because I have identified many characteristics that contribute to or affect field reliability and then I have ignored some of them! There may be good reasons for this. For example, I may feel that I have no way of assessing the complexity of the product at the full system level. What should I do? Should I devote possibly a great deal of time trying to solve the problem of system complexity before I provide a metric, or should I be satisfied with a coarser measure that gives me a reasonable indication of field reliability?

Look at who the customer is. Will a senior manager be interested in every nuance, every aspect of the attribute? I suggest not. A senior manager requires an indication of reliability levels so that trends can be identified and possibly so that meaningful targets can be set. Will they be meaningful? Well, remember that you have not done this in isolation. You have been talking to engineers and project managers while you have been deriving the measures so some degree of acceptance should be present. If you aim for perfection you will not hit the target for many years. Satisfying the requirement is what you should aim for and you should do this as quickly and simply as possible. Some claim that this is perfection anyway.

Now there may be valid reasons for people taking pot shots at the metrics. It may be that you have got it wrong or that you have, at least, not covered all the bases. In the case of field reliability we are talking about faults rather than defects but when a customer raises a defect you may not know if it is a duplicate or an invalid defect or an indication of a true fault. In some ways this is an implementation problem but it should be given some consideration now. If you define user detected faults to be the sum of all non-duplicated and validated defect reports raised and closed during a particular period of time together with all reports raised but still open, you will have some that should not have been included because they will turn out to be duplicates or invalid. You may feel that this is more acceptable than simply ignoring the open defects because you may have some clever individuals who realize that by never closing defect reports they improve reliability. This is not going to please the customer. So, if you go for the first option, try it out for a couple of months to see how many of the defects actually turn out to be "no fault" or duplicated reports. I suggest that the number will be insignificant especially as you may be able to spot the duplicates as they are raised.
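
A minimal sketch of that first option, counting closed, validated, non-duplicate reports plus every report still open in the period, is given below; the record layout is hypothetical and would follow whatever your defect-tracking system actually holds.

  def count_user_detected_faults(defect_reports: list[dict]) -> int:
      """Count faults per the first option discussed above.

      Includes closed reports that were validated and are not duplicates,
      plus every report still open (some of which will later prove invalid).
      Each report is assumed to look like:
      {"status": "closed", "validated": True, "duplicate": False}
      """
      count = 0
      for report in defect_reports:
          if report["status"] == "open":
              count += 1
          elif report["validated"] and not report["duplicate"]:
              count += 1
      return count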

Other criticisms may be more fundamental or important. Listen! If they are valid criticisms then change the model and the metrics but do watch out for the constant critics who you will never satisfy.

So, to summarize the steps in metric derivation as I have carried it out on many occasions, and to relate them to the GQM paradigm, Rombach (1), on which my approach is firmly based, the steps are as follows (a small sketch of the structure follows the list).

  • GOALS: Identify the initial requirement and associated customers. Refine this to a specific requirement with an identified customer set. Classify the requirement within the high-level requirements of the metrics program.

  • QUESTIONS: Identify the attribute that the requirement addresses. Define the attribute. Find out what others are doing. Identify characteristics of the product or process that contribute to or affect the attribute of interest. Derive a model by combining aspects or through the adoption of a technique.

  • METRICS: From the model derive a composite metric by associating measurement units or scales with the aspects in that model. From these scales identify the base metrics that will need to be collected.
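
Purely as an illustration of how the three levels hang together, the structure below records the field reliability example from this section in Goal, Question, Metric form; the wording of the entries is mine, not a prescribed format.

  gqm_record = {
      "goal": "Provide senior management with information about the field quality "
              "of software products, i.e. reliability as perceived by the customer.",
      "questions": [
          "What characteristics of the delivered product indicate its reliability?",
          "How do we compare products of different sizes and levels of usage?",
      ],
      "metrics": {
          "composite": "User Detected Faults / (Product Function Points * Sites Supported)",
          "base": [
              "user detected faults (three-month window)",
              "product function points (snapshot)",
              "sites supported (snapshot)",
          ],
      },
  }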

Then all you have to do is implement the metric. Now that is the hard part but we will talk about that later!


