1.4 Reasonable and Attainable Goals for Software Measurement

A child must learn to crawl before walking, and to walk before running. So it will be for an incipient software measurement program: we must learn to crawl first. At the outset, the complexity of the measurement problem space appears astonishingly large. There are many products, such as requirements, design, and code. Each of these products is produced by a process, in a development environment, by people. It is very difficult to measure a software process; this is not a good place to start. It is very dangerous to measure people, because such measurement data are so easily misused. Until we have gained some real sophistication in measuring people and process, we will have little success in trying to measure aspects of the software development environment.

Our objective is not simply to do software measurement; we must learn to build a reliable software measurement process based on valid software measurement tools. If we try to do too much too soon, we will likely fail. Basically, software engineering measurement is not a resource issue; it is a commitment issue. We are going to build a measurement process that will generate great volumes of data. These data must be converted to information that can be used in the software development decision-making process. The principle is a simple one. Even a small amount of measurement data that can be converted to useful information is better than large volumes of measurement data that have little or no information value. We must begin with simple tools and focus most of our attention on the measurement and management processes. We must remember that we learned how to measure distances in the first grade with a very crude ruler. Our teachers did not give us micrometers to learn measurement. Ultimately, we learned that our rulers could be used to quantify size attributes of the objects in our environment. These rulers, in fact, had utility beyond their obvious use as bludgeons that we could use on our classmates.

A very simple rule for the software engineering measurement process is that we must learn to walk before we run. Let us work out a very simple process to begin a measurement program within our software development organization.

At the outset we will focus our efforts on building our first measurement tool. We will identify a very small set of metrics that we know will yield useful information about our software development process. There is no quality in quantity. A small working set of good metrics will yield far more information than a large set of metrics of questionable validity. In Chapter 5 we describe a very simple set of such metrics. We will not convene a panel of metrics experts to tell us what to do. We must learn this ourselves. That is part of the process. It turns out that the processes surrounding the measurement activity are far more important, initially, than the measurements themselves. We will build, or cause to be built, a simple measurement tool that will collect the necessary metrics we have decided to incorporate in our initial working set. We will then validate that the tool does exactly what it is supposed to do.
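To make this concrete, here is a minimal sketch of what such a first measurement tool might look like. The particular metrics it collects (total, blank, comment, and source lines for a C-like module) are illustrative assumptions only, not the validated working set described in Chapter 5.

```python
# Minimal sketch of a first measurement tool. The working set of metrics
# here (simple line counts) is an illustrative assumption.
from pathlib import Path

def measure_module(path: str) -> dict:
    """Collect a small working set of size metrics for one C-like source module."""
    total = blank = comment = 0
    for line in Path(path).read_text(errors="replace").splitlines():
        total += 1
        stripped = line.strip()
        if not stripped:
            blank += 1
        elif stripped.startswith(("//", "/*", "*")):
            comment += 1
    return {
        "module": Path(path).name,
        "total_lines": total,
        "blank_lines": blank,
        "comment_lines": comment,
        "source_lines": total - blank - comment,
    }

if __name__ == "__main__":
    import sys
    print(measure_module(sys.argv[1]))
```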

Now the real fun begins. A modern software system is a very complex organism. It is rapidly evolving: new pieces are being added continually, existing components are being changed almost constantly, and dysfunctional code is removed from time to time. Our measurement of the system must be timely; we must be able to know its measured attributes as they are right now. We must figure out a way to integrate the measurement of this code into the development process. It is clear that if we are forced to run our measurement tool as a separate measurement exercise, the measurement process is destined to fail. It must occur automatically and invisibly.

Let us observe that essentially all modern software systems are placed under some type of configuration control system, such as SCCS or RCS, at their inception. Whenever a developer wishes to change a module, it must be checked out first. After the changes are made, the module can then be checked back into the system. This is a very good place to insert the measurement tool. We can simply trap the check-in to the configuration control system and measure the changed module as it enters the archive. We will have integrated our measurement tool into the existing process so that it occurs automatically. It is not necessary that the developer know anything about this process; it is part of the software manufacturing process.
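Because SCCS and RCS provide no built-in hook mechanism, one common way to trap the check-in is to wrap the check-in command itself. The sketch below assumes an RCS-style ci command, a measure module containing the tool sketched above, and a hypothetical store_metrics interface to the measurement database; all of these names are assumptions for illustration.

```python
# A sketch of trapping the check-in step: a thin wrapper installed in place
# of the RCS "ci" command (or invoked from a site check-in script). The
# wrapper approach itself is an assumption, since RCS/SCCS have no hooks.
import subprocess
import sys

from measure import measure_module        # the measurement tool sketched above (assumed module name)
from measurement_db import store_metrics  # hypothetical database interface

REAL_CI = "/usr/bin/ci"  # path to the real RCS check-in command (an assumption)

def main() -> int:
    args = sys.argv[1:]
    # Measure every file argument before it is absorbed into the archive.
    for arg in args:
        if not arg.startswith("-"):
            try:
                store_metrics(measure_module(arg))
            except OSError:
                pass  # measurement must never block the check-in itself
    # Delegate to the real check-in so the developer sees no difference.
    return subprocess.call([REAL_CI, *args])

if __name__ == "__main__":
    sys.exit(main())
```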

It is clear that if there is a lot of code churn, there will be a lot of measurement activity. This measurement activity will create a lot of data. This data must go to some repository. The next logical step in the measurement process is to design a measurement database that will capture this data as it is generated. There are any number of database management systems that are well suited for this task. We will design our measurement tool to interface with the appropriate database management tool to update our measurement database. Now the data are generated automatically and placed into a repository automatically. Measurement will happen without human intervention.
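A minimal sketch of such a repository follows, using SQLite as a stand-in for whatever database management system is actually chosen; the table and column names are illustrative assumptions.

```python
# Sketch of the measurement repository. SQLite and the module_metrics
# layout are illustrative assumptions, not a prescribed design.
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS module_metrics (
    module        TEXT    NOT NULL,
    measured_at   TEXT    NOT NULL,   -- UTC timestamp of the check-in
    total_lines   INTEGER NOT NULL,
    blank_lines   INTEGER NOT NULL,
    comment_lines INTEGER NOT NULL,
    source_lines  INTEGER NOT NULL
)
"""

def store_metrics(metrics: dict, db_path: str = "measurement.db") -> None:
    """Append one measurement record; called automatically at check-in time."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(SCHEMA)
        conn.execute(
            "INSERT INTO module_metrics VALUES (?, ?, ?, ?, ?, ?)",
            (
                metrics["module"],
                datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S"),
                metrics["total_lines"],
                metrics["blank_lines"],
                metrics["comment_lines"],
                metrics["source_lines"],
            ),
        )
```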

The next step in the measurement process is the conversion of measurement data to information that can be used by the managers of the software development process. It is clear that our database system will probably have a simple query language such as SQL (structured query language) that will allow us to summarize the information in the database. We now have at our disposal a system that allows us to see the changes being made to the code base in a very timely fashion. We could know, for example, how many lines of code had been added or changed in the last five minutes. The query language capability of the database is an invaluable aid in the transformation of data to information.
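As a sketch, assuming the illustrative module_metrics table above, a single query can summarize the check-in activity of the last few minutes. It reports the current size of each module checked in recently; computing the exact number of lines added or changed would require differencing successive measurements.

```python
# Sketch: turn raw measurement data into timely information with one query.
import sqlite3

def recent_churn(db_path: str = "measurement.db", minutes: int = 5):
    """Summarize check-in activity per module over the last few minutes."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute(
            """
            SELECT module,
                   COUNT(*)          AS checkins,
                   SUM(source_lines) AS source_lines
            FROM module_metrics
            WHERE measured_at >= datetime('now', ?)
            GROUP BY module
            ORDER BY source_lines DESC
            """,
            (f"-{minutes} minutes",),
        ).fetchall()

for module, checkins, lines in recent_churn():
    print(f"{module}: {checkins} check-ins, {lines} source lines")
```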

We now have crossed some of the most difficult hurdles in the software measurement process. We now have a software measurement system. It will invisibly chug away at its measurement task. The data will go to a measurement database where it can be converted to information. We accept the fact that measurement is an integral and continuous part of the software development process.

There are any number of directions we can now take to begin fleshing out our measurement database. Perhaps the next logical step is to accumulate information on software quality. We will probably begin by interfacing our problem tracking system with our measurement database. All failure events, as they are assigned tracking numbers, will be recorded in our database. As the failures are traced to specific faults in specific code modules, the fault data are also stored in our measurement database. New code and code changes are attributable to people, so it is appropriate that we begin to collect data on exactly who is making changes to the code base. As each code element, subsystem, and system is tested, the code activity for the test suite can be recorded in the measurement database. We can then know where the problems in the code are and how well we have tested for them. We know that each code module implements a specific design element, and each design element, in turn, implements a specific requirement. It is appropriate that we also manage these design and requirement specifications in our database. Each test case exercises a specific functional system requirement, so test cases should also be managed by our measurement system. Now, for the first time, we are in a position to begin the engineering and testing of our software systems. We will then have the beginnings of a primitive engineering measurement system.
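A sketch of how the measurement database might grow to relate these entities follows; every table and column name is an illustrative assumption about the data model, not a prescribed schema.

```python
# Sketch of an extended data model linking failures, faults, modules,
# developers, requirements, design elements, and test cases. All names
# are illustrative assumptions.
import sqlite3

QUALITY_SCHEMA = """
CREATE TABLE IF NOT EXISTS failure (
    failure_id  INTEGER PRIMARY KEY,   -- tracking number from the problem tracking system
    reported_at TEXT
);
CREATE TABLE IF NOT EXISTS fault (
    fault_id   INTEGER PRIMARY KEY,
    failure_id INTEGER REFERENCES failure(failure_id),
    module     TEXT,                   -- code module the fault was traced to
    changed_by TEXT                    -- developer who made the associated change
);
CREATE TABLE IF NOT EXISTS requirement (
    requirement_id TEXT PRIMARY KEY
);
CREATE TABLE IF NOT EXISTS design_element (
    design_id      TEXT PRIMARY KEY,
    requirement_id TEXT REFERENCES requirement(requirement_id),
    module         TEXT                -- code module that implements this element
);
CREATE TABLE IF NOT EXISTS test_case (
    test_id        TEXT PRIMARY KEY,
    requirement_id TEXT REFERENCES requirement(requirement_id)
);
"""

with sqlite3.connect("measurement.db") as conn:
    conn.executescript(QUALITY_SCHEMA)
```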

Now begins the most important part of the entire measurement process. We are going to institutionalize the process of measurement process improvement. The more we really learn about software development processes, people, products, and environments, the more questions we will have for which we seek answers. That is the essence of science: each new thing we learn will create a flurry of new questions. For example, software faults are clearly inserted into the code by developers for reasons that can be known. Some faults arise from the complexity of data structures; other faults are attributable to the complexity of control flow. We should be able to measure the code base and determine, module by module, where these types of faults are likely to be found. If we cannot make good predictions of where faults might be found with the data we have at our disposal, then we can design experiments to find new metrics that will allow better fault prediction. We can then test and validate our new metric candidates. The software measurement tool can then be modified to incorporate the newly validated metrics. This tool can be modified without disrupting the measurement process; no one will know that a change has occurred. We have institutionalized the measurement process; it is part of the system.
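As a very rough sketch of this kind of analysis, assuming the illustrative tables above, we might begin by checking how strongly a candidate metric is associated with the fault count per module; a real validation study would of course require proper experimental design, not a single correlation.

```python
# Sketch: a first look at whether a candidate metric (here, module size)
# is associated with observed faults. The join assumes the illustrative
# module_metrics and fault tables sketched earlier.
import sqlite3
from statistics import mean

def metric_fault_correlation(db_path: str = "measurement.db") -> float:
    """Pearson correlation between module size and fault count, as a first look."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            """
            SELECT MAX(m.source_lines)        AS size,    -- latest known size proxy
                   COUNT(DISTINCT f.fault_id) AS faults
            FROM module_metrics AS m
            LEFT JOIN fault AS f ON f.module = m.module
            GROUP BY m.module
            """
        ).fetchall()
    if len(rows) < 2:
        return 0.0
    xs = [float(size) for size, _ in rows]
    ys = [float(faults) for _, faults in rows]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5 if var_x and var_y else 0.0
```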

Next, we might well observe that there are sources of unexplained variation among programmers with respect to certain types of software faults. We will now design experiments to understand why programmers make certain types of errors. The data, in turn, will permit us to assign developers to tasks where they can be most productive. In short, the measurement database allows us to manage our resources.


