5.3 INFORMATION FLOW METRIC

The other set of metrics I would like to place under the ADM umbrella is generally known as "Information Flow" metrics. At the conceptual level Information Flow metrics are not difficult to understand; it is when you come to apply them that the fun can start. Having said that, a pragmatic approach, as always, works wonders.

Information Flow metrics are founded upon the following premise. All but the most simple systems consist of components, and it is the work that these components do and how they are fitted together that influence the complexity of a system. If a component has to do numerous discrete tasks it is said to lack "cohesion." If it passes information to, and/or accepts information from, many other components within the system it is said to be highly "coupled." Systems theory tells us that components that are highly coupled and that lack cohesion tend to be less reliable and less maintainable than those that are loosely coupled and that are cohesive.

Sometimes definitions of terms like cohesion and coupling help so I present the following as working definitions of those terms:

Cohesion	The degree to which a component performs a single function.
Coupling	The term used to describe the degree of linkage between one component and others in the same system.

Now, what is a "component?"

Component

Any element identified by decomposing a (software) system into its constituent parts.

This systems view maps to software systems extremely easily as most engineers today use, or are at least familiar with, top-down design techniques that produce a hierarchical view of system components. Even the more modern "middle out" or rapid engineering design approaches produce this structured type of deliverable, provided documentation is produced and maintained. Here again, Information Flow metrics can be used.

Information Flow metrics model the degree of cohesion and coupling for a particular system component. How that model is constructed can justifiably range from the simple to the complex. I intend to start with the most simple representation of Information Flow metrics to illustrate the basic concepts, how to derive information using the metrics and how to use that information. I will then expand this basic IF model.

Just before I do this I would like to put the credit for these metrics and the approaches I outline here where it is due. In terms of applying Information Flow metrics to software systems, the pioneering work was done by Henry and Kafura, Henry (1). They looked at the UNIX operating system and found a strong association between the Information Flow metrics and the level of maintainability ascribed to components by programmers. Other individuals who tried to apply these principles did find difficulties in using the Henry and Kafura approach. Further work was done in the UK by Professor Darrell Ince and Martin Shepperd, Ince (1), among others, which resulted in a more practical IF model. This work was complemented by Barbara Kitchenham, Kitchenham (2), who addressed the same problem and who also presented a clear approach to the question of interpretation.

The author had the good fortune of having the assistance of Ince, Shepperd and Kitchenham when he was first attempting to use these metrics. What is presented here is a distillation of that assistance.

Information Flow metrics are applied to the components of a system design. Figure 5.5 shows a fragment of such a design and for component A we can define three measures, but remember that these are the most simple models of IF.

Figure 5.5: Aspects of Complexity

The first measure is "FAN IN." This is simply a count of the number of other components that can call, or pass control to, component A.

The second is "FAN OUT." This is the number of components that are called by component A.

The third measure is derived from the first two by using the following formula. We will call this measure the INFORMATION FLOW index of module A, abbreviated to IF(A):

IF(A) = (FAN IN(A) * FAN OUT(A))^2

The formula includes a power component to, as most texts on Information Flow metrics put it, "model the non-linear nature of complexity." The assumption is that if something is more complex than something else then it is much more complex rather than just a little bit more complex. Given that assumption we could raise to a power three or four or whatever we want but on the principle that the simpler the model the better, then two is a good enough choice. From my point of view, raising to two makes it easier, as you will see, to pick out the potential bad guys. That is a good enough reason and I will leave it to the purists to worry about the finer detail.
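To make the effect of the power term concrete, here is a minimal sketch of the formula in Python. The component names and their FAN IN and FAN OUT counts are invented purely for illustration, and the `power` parameter is my own addition so that the choice of two can be varied.

```python
def information_flow(fan_in, fan_out, power=2):
    """Simple IF index: (FAN IN * FAN OUT) raised to `power` (two in the text)."""
    return (fan_in * fan_out) ** power


# Hypothetical components mapped to (FAN IN, FAN OUT); not taken from Figure 5.5.
components = {"A": (3, 2), "B": (1, 1), "C": (2, 4)}

if_values = {
    name: information_flow(fan_in, fan_out)
    for name, (fan_in, fan_out) in components.items()
}
print(if_values)  # {'A': 36, 'B': 1, 'C': 64}
```

Without the power term the three products would be 6, 1 and 8. Squaring stretches them to 36, 1 and 64, which is what makes the potential bad guys easier to pick out.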

Information Flow metrics can be applied to any functional decomposition of a software system. Examples of these include structure charts, Data Flow Diagrams and SDL Block diagrams. Obviously you may have to tailor your terminology to suit the notation being used. For example, in a Data Flow Diagram you do not have "calls;" instead you have data flows between processes. The principle is the same. One of the easiest applications I have come across is to use Information Flow metrics on the hierarchical tree that forms a basis of some configuration management systems. This is a good example of the synergy that can sometimes be found to operate within software engineering.

Given that Information Flow metrics apply to these forms of functional decomposition they come into play from as early as the high-level design stage and serve a useful purpose right the way down to low-level design when you can start to use McCabe metrics.

Given your functional decomposition you will notice that there is one additional attribute possessed by each component, namely its level in the decomposition. The following is a step-by-step guide to deriving these most simple of Information Flow metrics.

1. Note the level of each component in the system design.
2. For each component, count the number of calls to that component; this is the FAN IN of that component. Some organizations allow more than one component at the highest level in the design, so for components at the highest level, which should have a FAN IN of 0, assign a FAN IN of 1. Also note that a simple model of FAN IN can penalize reused components. As a pragmatic rule, if a component calls no other components, its FAN IN is greater than seven and it is deemed to be "small" (this last point requires the discretion of the designer), then assign a FAN IN of one.
3. For each component, count the number of calls from that component; this is the FAN OUT of that component. For components that call no others, assign a FAN OUT value of one.
4. Calculate the IF value for each component using the formula above.
5. Sum the IF values for all components within each level. I will call this the LEVEL SUM.
6. Sum the IF values for the total system design. I will call this the SYSTEM SUM.

Which brings us to the analysis phase, so to continue:
7. For each level, rank the components in that level according to FAN IN, FAN OUT and IF value. Three histograms or line plots should be prepared for each level.
8. Plot the LEVEL SUM values for each level using a histogram or line plot.
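The derivation steps can be sketched in a few lines of Python, which also shows how readily the calculation can be automated. This is only an illustration under stated assumptions: the design fragment, the `small` set (standing in for the designer's judgment about which components are "small") and the function name are all hypothetical, and FAN IN is counted as the number of distinct calling components.

```python
from collections import defaultdict

# Hypothetical design fragment: component -> (level, components it calls).
DESIGN = {
    "main":   (1, ["parse", "report"]),
    "parse":  (2, ["read", "log"]),
    "report": (2, ["format", "log"]),
    "read":   (3, []),
    "format": (3, []),
    "log":    (3, []),
}


def simple_if_metrics(design, small=frozenset(), reuse_limit=7):
    """Return (per-component rows, LEVEL SUMs, SYSTEM SUM) for the simple model."""
    # Invert the call lists to find the distinct callers of each component.
    callers = defaultdict(set)
    for name, (_level, callees) in design.items():
        for callee in callees:
            callers[callee].add(name)

    top_level = min(level for level, _callees in design.values())
    rows, level_sum = {}, defaultdict(int)
    for name, (level, callees) in design.items():
        fan_in = len(callers[name])
        raw_fan_out = len(set(callees))
        # Top-level components that nothing calls get a FAN IN of one.
        if level == top_level and fan_in == 0:
            fan_in = 1
        # Pragmatic reuse rule: small, heavily called leaf components get one.
        if raw_fan_out == 0 and fan_in > reuse_limit and name in small:
            fan_in = 1
        # Components that call nothing get a FAN OUT of one.
        fan_out = max(raw_fan_out, 1)
        if_value = (fan_in * fan_out) ** 2
        rows[name] = {"level": level, "fan_in": fan_in,
                      "fan_out": fan_out, "if": if_value}
        level_sum[level] += if_value
    return rows, dict(level_sum), sum(level_sum.values())


rows, level_sum, system_sum = simple_if_metrics(DESIGN)
print(level_sum, system_sum)  # {1: 4, 2: 8, 3: 6} 18
```

The `rows` dictionary holds everything needed for the per-level rankings, and `level_sum` is the data for the LEVEL SUM plot.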

This may sound like a great deal of work but for most commercial systems that I have come across, provided you have the documentation, this data can be derived and the analysis done within one engineering day. If your systems are larger than the ones I have seen then it will obviously take longer but remember that once done it is very easy to keep up to date. Depending upon your environment you may even be able to automate the calculations.

Having got the information, you now need to do something with it. You must realize that, for Information Flow metrics, there are no absolute values of good or bad. Information Flow metrics are relative indicators. This means that the values for your system may be higher than those for a system I have but this does not mean that your system is worse. Nor does a high metric value guarantee that a component will be unreliable and unmaintainable. It is only that it will probably be less reliable and less maintainable than its fellows.

The rub is that in most systems, less reliable and less maintainable means that it is potentially going to cost you significant amounts of money to fix and enhance. Potentially it could even be a nightmare component.

A nightmare component is the one that the system administrator has nightmares about because he or she knows that if anyone touches that component, the whole system is going to crash and it will take weeks to fix because Fred designed it and Fred was weird. Fred also left five years ago!

So the strength of Information Flow metrics is not in the numbers themselves but in how you use the information.

As a guide, the 25% of components with the highest scores for FAN IN, FAN OUT and IF values should be investigated. Now in practice you may well find that you have a certain number of modules that stick out like sore thumbs, especially on the IF values. If this group is more or less than the 25% guide then do not worry about it; concentrate on those that seem to be odd according to the metric values rather than following any 25% rule slavishly.
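As a starting point for that investigation, the 25% guide can be sketched as a simple ranking. The IF values below are invented for illustration and the function name is hypothetical; the same function can be applied to FAN IN and FAN OUT values. Remember that the list it returns is a prompt for inspection, not a verdict.

```python
import math


def top_fraction(values, fraction=0.25):
    """Return the names with the highest scores, covering `fraction` of all components."""
    ranked = sorted(values, key=lambda name: values[name], reverse=True)
    count = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:count]


# Hypothetical IF values for eight components.
if_values = {"main": 4, "parse": 36, "report": 4, "read": 1,
             "format": 1, "log": 16, "db": 144, "ui": 9}

print(top_fraction(if_values))  # ['db', 'parse']
```

In this invented data set "db" stands well clear of the rest, which is the sore-thumb pattern to look for before worrying about where exactly the 25% line falls.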

High FAN IN values indicate modules that lack cohesion. It may well be that you have not broken out the functions to a great enough degree. Basically, these components are called often because they are doing more than one job.

High levels of FAN OUT also indicate a lack of cohesion or missed levels of abstraction. Here you stopped design before design was finished and this is reflected in the high number of calls from the component.

Generally speaking, FAN OUT appears to be a better indicator of problem modules than FAN IN but it is early days yet and I would not wish to discount FAN IN.

High IF values indicate highly coupled components. You need to look at these components in terms of FAN IN and FAN OUT to see how to reduce the complexity level. Sometimes you may hit a "traffic center." This is a component where, for whatever reasons, you have a high IF value but cannot improve things. Switching components in telecommunication systems often exhibit this. Here you have a potential problem area which, if it is also a large component, may be very error-prone. If you cannot reduce the complexity then at least make sure that you test that component thoroughly.

Looking at the LEVEL SUM plot of values you should see a fairly smooth curve showing controlled growth in Information Flow across the levels. Sudden increases in these values across levels can indicate a missed level of abstraction within the general design. For systems where the design has fewer than ten levels, a simple count of components at each level seems to work equally well.
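If you prefer numbers to eyeballing a plot, one rough way to look for sudden increases is to compare each LEVEL SUM with that of the level above it. The sketch below is only an assumption about how that might be done: the LEVEL SUM values are invented, and the `JUMP` threshold is an arbitrary figure of my own that you would need to tune for your own designs, since there are no absolute values of good or bad here. It also assumes that no LEVEL SUM is zero.

```python
def level_growth(level_sums):
    """Ratio of each level's LEVEL SUM to that of the level directly above it."""
    levels = sorted(level_sums)
    return {
        lower: level_sums[lower] / level_sums[upper]
        for upper, lower in zip(levels, levels[1:])
    }


# Hypothetical LEVEL SUM values for a four-level design.
level_sums = {1: 4, 2: 16, 3: 40, 4: 600}

growth = level_growth(level_sums)
JUMP = 5.0  # arbitrary, illustrative threshold; not a recommended value
suspects = [level for level, ratio in growth.items() if ratio > JUMP]
print(growth, suspects)  # {2: 4.0, 3: 2.5, 4: 15.0} [4]
```

Here level 4 grows fifteenfold over level 3, so the design between those two levels is where I would look first for a missed level of abstraction.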

The final item of information you have is the SYSTEM SUM value. This gives you an overall complexity rating for the design in terms of Information Flow metrics. Most presentations on this topic will say that this number can be used to assess alternative design proposals. At which point you often get wry chuckles from the practitioners in the audience who feel they never have enough time to develop one design let alone alternatives. My sympathies have always been with the practitioners but let me just state that I have come across a number of teams in different organizations who do prepare alternative designs at this kind of level for enhancement projects. Information Flow metrics give them the opportunity to increase confidence in the choice they eventually make by quantifying aspects of complexity. Score one for the so-called practitioners who discount this as impossible!

We have looked at the most simple form of Information Flow metrics but the original proposals put forward by Henry and Kafura were more sophisticated than the control flow based variant discussed above. As I said earlier, Ince, Shepperd and Kitchenham have done a great deal of work to help in the practical application of Henry and Kafura's pioneering proposals and it is a distillation of that work that I will now summarize into the more sophisticated IF model. You should also realize that this is a model and it will need to be tailored to your own organization's design mechanisms if it is to be used. Such a tailoring process should not take more than two days for counting rule derivation and documentation of these rules provided you use a well-defined design notation and use a competent engineer who knows that notation.

The only difference between the simple and the sophisticated Information Flow models lies in the definition of FAN IN and FAN OUT.

For a component A let:

a	=	the number of components that call A.
b	=	the number of parameters passed to A from components higher in the hierarchy.
c	=	the number of parameters passed to A from components lower in the hierarchy.
d	=	the number of data elements read by component A.

Then:

FAN IN(A) = a + b + c + d

Also let:

e	=	the number of components called by A.
f	=	the number of parameters passed from A to components higher in the hierarchy.
g	=	the number of parameters passed from A to components lower in the hierarchy.
h	=	the number of data elements written to by A.

Then:

FAN OUT(A) = e + f + g + h
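The two sums can be sketched as a small Python record, one per component. The field names are my own labels for the counts a to h, and the numbers in the example are invented; the counting rules behind each field would still have to be tailored to your own design notation.

```python
from dataclasses import dataclass


@dataclass
class ComponentCounts:
    """Raw counts for one component; the letters follow the definitions in the text."""
    callers: int             # a: components that call this component
    params_from_higher: int  # b: parameters received from higher components
    params_from_lower: int   # c: parameters received from lower components
    data_read: int           # d: data elements read
    callees: int             # e: components called by this component
    params_to_higher: int    # f: parameters passed to higher components
    params_to_lower: int     # g: parameters passed to lower components
    data_written: int        # h: data elements written to

    @property
    def fan_in(self):
        return (self.callers + self.params_from_higher
                + self.params_from_lower + self.data_read)

    @property
    def fan_out(self):
        return (self.callees + self.params_to_higher
                + self.params_to_lower + self.data_written)

    @property
    def information_flow(self):
        # Same IF formula as in the simple model.
        return (self.fan_in * self.fan_out) ** 2


# Hypothetical counts for a component A.
a = ComponentCounts(callers=2, params_from_higher=3, params_from_lower=1,
                    data_read=2, callees=3, params_to_higher=1,
                    params_to_lower=4, data_written=1)
print(a.fan_in, a.fan_out, a.information_flow)  # 8 9 5184
```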

Other than those changes to the basic definitions the derivation, analysis and interpretation remain the same. I must say that my advice to any organization starting to apply Information Flow metrics would be to build up confidence by using the simpler form. If these work for your organization then leave it at that. If and only if the simpler form fails in your environment — in other words, you are confident that no significant relationship exists between the simple measures and the levels of reliability and maintainability — only then spend the effort to tailor and pilot the more sophisticated form.

You can be encouraged by the fact that there have been a number of experimental validations of Information Flow metrics that seem to support the claims made for them. Programming groups that have been introduced to Information Flow metrics have been able to make use of them and also report benefits in the area of design quality control and system management. They seem to work but there does seem to be some reluctance in the industry as a whole to make use of Information Flow metrics. Perhaps one reason is that managers feel they are a bit "techie." Perhaps others feel that they are not yet ready to use sophisticated techniques like Information Flow metrics. I hope that this brief explanation of the measures has shown that they are a practical and pragmatic method of ensuring quality.
