Preface

This is a book about software measurement. It is not a book about software metrics. There are many of those books. The world is populated with metricians who are eager to build new rulers. The act of applying the metrics is called software measurement. Measuring software, however, will produce data, not information. We must have a reason to measure. We need to know what the data produced by the measurement process really tell us about the software. And we will need to know the essential difference between measuring the complexity of the software application and the complexity of the programming language metaphor.

Measurement, it seems, is a necessary condition for a scientific discipline but not a sufficient condition. Researchers and practitioners need training in the use of this measurement data to validate the theories of computer science. That is what motivated me to write this book.

There are three fundamental questions that remain unanswered after all these years of software development and measurement. First, exactly how do you get the measurement data? Second, how do you convert the data from the measurement process into information that you can use to manage the software development process? Third, how do you manage all of the data? These are the fundamental issues that this book was written to resolve.

Let us start with the first question: how do you get data? I have worked with members of the IEEE 982.1 Standards Committee on the Dictionary of Measures to Produce Reliable Software. Some of the measurement data relate to the measurement of faults. Very well. What, exactly, is a fault? Where is there a standard for recognizing faults in C code? How will I count faults? The committee produced a glib document in which faults figured heavily but never got around to telling us what a fault is.

Looking further, we discover that the basis of the software science measures is operators and operands. N1, for example, is the total number of operators in a program. Very well. Exactly what is an operator? You might well have some C code and want to count the operators in it. You will not learn how to do this from IEEE 982.1. What you need is a list of things in C that are operators and things that are operands. Oddly, the National Institute of Standards and Technology resolutely maintains its independence from software measurement standards. You will learn in this book what operators and operands are and how to count them. This book contains a standard for C code measurement. It is included for its value in showing just what such a standard should look like. Once you have seen such a standard for one programming language, it is easy to generalize it to other language contexts. In early England, a foot was the length of the current king's foot. Big king, big foot. Little king, little foot. Once the English finally nailed down the standard for a foot, commerce was made much easier.
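To make the counting question concrete, consider a minimal sketch (in Python) of Halstead-style token counting. This is not the standard presented in this book; it assumes a deliberately toy rule, namely that a short list of C symbols and keywords are operators and that identifiers and constants are operands, and it ignores real complications such as multi-character operators like == and the dual roles of tokens like *.

    import re
    from collections import Counter

    # A deliberately toy operator list; a real standard must enumerate every
    # operator in the language, including multi-character ones such as "==".
    C_OPERATORS = {'=', '+', '-', '*', '/', ';', '(', ')', 'if', 'while', 'return'}

    def halstead_counts(code):
        # Split C text into identifiers, integer constants, and single symbols.
        tokens = re.findall(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]", code)
        operators = Counter(t for t in tokens if t in C_OPERATORS)
        operands = Counter(t for t in tokens if t not in C_OPERATORS)
        N1, N2 = sum(operators.values()), sum(operands.values())  # total counts
        n1, n2 = len(operators), len(operands)                    # unique counts
        return N1, N2, n1, n2

    print(halstead_counts("a = b + c * 2;"))   # -> (4, 4, 4, 4)

Even this toy version makes the point: until someone fixes the list of operators, two tools can measure the same code and report different numbers.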

Finally, the astonishing volumes of data that result from the measurement process lead to the death of many software measurement programs. If we have not thought out in advance just how we are going to use measurement data, we will literally be overwhelmed. We might, for example, be working with a software system consisting of 10,000 modules with a total of one million lines of code. Typically, the developers of such a system will reside at many different sites remote from one another. On a good day in this software system, perhaps as many as 100 of the program modules will change. Every time a module is changed, it should be remeasured. Each time it is measured, we might collect as many as 30 different data points. That is 3,000 data points per day that have to go somewhere. Further, if we are monitoring the activity of the software during test (and we certainly should), the monitoring activity may produce more than a gigabyte of data per second. If the software is tested for several hundred hours, we will drown in the data long before the test activity is complete. Or we will simply not measure the test outcomes and hope for the best (which is neither science nor sound engineering practice). This book will show how data can be reduced to manageable quanta of information, and it will address the management of the measurement data.
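A quick back-of-the-envelope computation, taking 300 hours as the "several hundred hours" of test, shows the scale of the problem:

    # Daily static-measurement volume for the system described above.
    modules_changed_per_day = 100
    metrics_per_module = 30
    print(modules_changed_per_day * metrics_per_module)    # 3000 data points/day

    # Dynamic (test) measurement volume at roughly 1 gigabyte per second.
    bytes_per_second = 10**9
    test_hours = 300                                       # assumed figure
    total_bytes = bytes_per_second * test_hours * 3600
    print(total_bytes / 10**15)                            # ~1.08 petabytes

Roughly a petabyte of raw test data is far beyond anything that can be inspected by hand; it must be reduced to summary information as it is produced.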

The second question, how do we convert data to information, is central to the failure of many measurement programs. Many software development organizations have sought to do engineering with their software processes. The act of measurement is central to this engineering process. Their first thought is that to get measurements, one simply buys measurement tools. That is a mistake. If we take the time to reminisce, we will recall the days in first and second grade when we were taught how to use a ruler and a thermometer. It took quite a bit of teaching for those measurement skills to be acquired. A measurement tool, in and of itself, produces only data. We can imagine giving a micrometer to an aborigine from deepest New Guinea. He would probably use this device to decorate his nose or his ear. The act of measurement is alien to him. So it is also with the software developer. No one has ever shown him or her how to use measurement tools. Developers run the measurement tools and decorate their offices with the pictures those tools produce, just as the aborigine would decorate himself with the micrometer. These pictures are measurement data. They have no meaning to the software developer, just as the micrometer has no meaning to the aborigine.

There is probably no harm done in the aborigine putting the micrometer through a hole in his nose. There is very often great damage done when software developers are led to inappropriate conclusions about the data they have obtained on projects they have measured. For example, if we track fault data very carefully by developer, we will find that the best developers contribute a disproportionate number of faults to the code. It seems reasonable, then, that we should sack the most experienced people so that our total fault count will drop. If we look further into the problem, the reason for this observation becomes obvious. We give novice developers easy jobs. We give experienced developers heinous problems to solve. If the novices were working on the same problems, their fault introduction rate would be very much higher than that of the experienced developers. W. Edwards Deming often maintained that people without training in statistics should not be allowed to possess data, because data can be so easily misused. This book is designed to teach software developers what the tools measure and how the data they produce can be converted into meaningful and valid information. The statistics necessary to perform this conversion will be supplied.

Measurement process issues are also given a fair amount of attention throughout the book. Over the years, I have discovered that if measurement processes are not automated, they simply do not occur. Thus, a heavy bias is given to automating and hiding the measurement activity itself. When, for example, a program module is checked out of a source control system, updated, and placed back in the code repository, the source control system should invoke a measurement tool that measures the code and records the data in the measurement database, unseen by the developer. The second process feature that we will address is the improvement of the measurement process itself. Once the measurement processes are in place and everything is simmering on the back burner, the measurement tools can be refined and new tools can be added to the process.
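As a minimal sketch of this kind of automation, the following Python script could be invoked by a source control system's post-check-in hook. The measurement tool name (cmetric), its JSON output format, and the database layout are all assumptions for illustration, not tools presented in this book:

    # Invoked after a check-in: measure the changed module and record the
    # results, unseen by the developer. "cmetric" is a hypothetical tool
    # assumed to print a JSON object of metric-name/value pairs.
    import datetime, json, sqlite3, subprocess, sys

    def record_measurements(module_path, db_path="measurement.db"):
        raw = subprocess.run(["cmetric", module_path],
                             capture_output=True, text=True, check=True).stdout
        metrics = json.loads(raw)
        con = sqlite3.connect(db_path)
        con.execute("CREATE TABLE IF NOT EXISTS measurements "
                    "(module TEXT, taken_at TEXT, metric TEXT, value REAL)")
        taken_at = datetime.datetime.now().isoformat()
        con.executemany("INSERT INTO measurements VALUES (?, ?, ?, ?)",
                        [(module_path, taken_at, name, float(value))
                         for name, value in metrics.items()])
        con.commit()
        con.close()

    if __name__ == "__main__":
        record_measurements(sys.argv[1])    # module path passed by the hook

Because the hook runs on every check-in, the measurement database stays current without the developer ever issuing a measurement command.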

To build better software technology we will certainly need to do better engineering. Our engineering processes are limited by our measurement technology. The Industrial Revolution was not the result of the steam engine, as is the popular belief. The Industrial Revolution was the direct result of precision manufacturing, which was only possible with precision measurement.

Underlying the development of any software system is a software process. This process must be monitored continuously, just like any other manufacturing process. Steel mills are heavily instrumented. Every part of the steel-making process is continuously monitored and measured; so, too, should software processes. We should measure each change to the software specifications, design, and code. We should monitor both the activity of the test process and the activity of the code as it is executing. Finally, the software systems produced by this process should be continuously monitored when they are deployed.

Computer security and reliability concerns dominate our world. Millions of dollars are being spent trying to secure software systems. The problem is that these systems are essentially running out of control. There are no monitoring systems built into the operating software to watch what it is doing when it is running. Novice hackers very easily hijack these unmonitored systems. A system is easily hacked when no one is minding the store. We do not have a computer security problem. We do have a software control problem. If suitable instrumentation is placed into the systems that we develop, their activity can be monitored in real time. We can detect code that is being attacked. We can detect code that is likely to fail. We can block the hacker and his inroads into the system if we are monitoring our software systems. We can fix a software system before it breaks if we are monitoring that software system. We just have to know what to watch for and how to instrument properly.

I have used this material in a graduate-level course on software engineering measurement. It presumes the level of academic preparation of a typical computer science student. Most of these students have had at least some background in statistics. Where I know that background to be missing, I attempt to fill in the holes. There is also an appendix on statistics that should be useful in this regard. Where possible, I have created simple examples to show some of the measurement concepts. I have also integrated data and observations from my years of measurement practice in industry.

My overall objectives in creating this opus are to (1) develop in the reader an understanding of software measurement, (2) foster a more precise use of software measurement in the computer science and software engineering literature, (3) demonstrate how one can develop simple experiments for the empirical validation of theoretical research, and (4) show how measurement data can be made into meaningful information.

One of the most inappropriate terms ever used for a discipline was the term "computer science." There has been, historically, practically no science in computer science. This is not surprising, in that the majority of the early pioneers in computer software were drawn from mathematics and philosophy. These pioneers had very little or no training in scientific methodology. They were very good at creating mathematical systems. They had neither the skills nor the training nor the interest in the empirical validation of theoretical constructs. As a result, the field of computer science has much theory and little or no real science. The programming language C++ is an example of this folly. What would ever possess a rational human being to construct a safety-critical software system around a programming language with an ambiguous grammar and no standard semantics?

Modern physical sciences are based primarily on the observation and measurement of physical phenomena. Theories survive only insofar as they can be validated empirically. A considerable amount of effort is devoted to the act of measurement in the training of a new physical scientist. In chemistry, for example, one of the first courses that a potential chemist will take is a course in quantitative analysis. The objective of this course is to train the incipient chemist in basic laboratory measurement techniques. A student in computer science, on the other hand, is typically never contaminated with any course that will permit measurement and assessment of any abstract constructs.

The training of a modern computer scientist is very similar to the training of a theological student in a seminary. Students are exposed to courses such as Dogma 1 (Structured Programming), Dogma 2 (Object-Oriented Programming), Dogma 3 (The Lambda Calculus), etc. The basis of theological study is faith. Theological students are never offered a shred of scientific evidence that any of the constructs they are asked to learn have proven useful in any context. That is not what theology is about. Computer science students are never offered a shred of empirical evidence that any of the constructs that we present to them are valid. They are taught to believe that structured programming is somehow good, that object-oriented design is really great. The use of the words "good" or "great" suggests to a scientist that there must be some criterion measure (or measures) that can be established that would permit structured code (whatever that might be) to be measured against code that is unstructured (whatever that might be). A scientist would set about to define criterion measure(s) for goodness and then conduct experiments to test for the superiority (or lack thereof) of structured programming or the clear benefits derived from object-oriented design. In the particular case of structured programming, it is interesting to note that if the criterion measure is, for example, runtime efficiency, structured code will generally execute much more slowly than unstructured code on problems of substantial complexity. We pay a price for program structure.

It is important to understand the limitations of our software developers as they graduate from college. Above all, they are not trained in science. I am very familiar with this educational system, having taught computer science for more than 30 years. Odds are that typical computer science students will never have designed or conducted an experiment of any type, or collected any data about a program or its operation. They will never have been introduced to the notion of measurement. It would never occur to them that a good algorithm might lead to very poor system performance. They will have had no training in statistics or in the interpretation of scientific data. They cannot do science and, as a result, they cannot recognize good science.

This book is the product of many years of work in the software measurement business. While there are quite a few books on the subject of software metrics, there are none about the measurement activity itself. There seems to be a great deal of interest in the creation of new metrics for software development. Anyone, it seems, with a computer and a word processor can become an expert in the area. Little science, however, is generated by all this activity. This book discusses the conduct of scientific inquiry in the development and deployment of software systems.


