The Quest for Silver Bullets


"No silver bullets!" claims almost any "serious" book on software engineering. This is too easy: people are being cheap.

In almost any discipline there are silver bullets. "Do not smoke!" "Align the digits!" "One concept, one sentence!" Just to cite a few.

So in the following sections, I will be unfashionable and try to determine a few silver bullets for measuring XP and agile development.

Silver Bullet 1: Do Not Be Measure-Obsessed or Measure-Depressed; Just Set Your Goals

There is a kind of syllogism: with measures you get numbers; with numbers you deal with math; in math everything is precise; therefore, to get anything sensible from a measurement effort, you need to be 100% precise. False!

You will never obtain 100% precision in your measurements, nor do you want it: it is too expensive, and you do not need it. Think of the speedometer: it may easily show a 10% error. Still, you can use it to drive safely and to avoid tickets.

This is not to advocate sloppiness. It is to prevent a sense of impotence or, worse, the imposition of a measurement mania that ends up either rejecting any measure or artificially creating numbers to please the boss.

The good news is that in most cases, what you have or what you can get with a reasonable effort is enough to quantitatively assess your XP practices.

Moreover, you cannot measure everything, nor would it be useful. Vic Basili has coined a cool acronym to describe how and what to measure: GQM (Goal, Question, Metrics).[1] That is, first determine your measurement goal (or goals): say, to get fit. Then ask yourself questions about how to measure your goal: "What is my weight?" "What size is my waist?" Last, determine the metrics that can answer your questions and how to collect them: pounds are measurable with a scale, and inches with a tape measure.

[1] There are several papers on the GQM. A nice one was written by Yasuhiro Mashiko and Victor R. Basili [Basili+1997].
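To make the goal-question-metric chain concrete, here is a tiny sketch in Python using the get-fit example above. The class names and fields are my own illustration, not part of any standard GQM tooling.

```python
from dataclasses import dataclass, field

# Illustrative GQM structure: every metric must answer a question,
# and every question must refine a stated goal.

@dataclass
class Metric:
    name: str
    unit: str
    how_to_collect: str

@dataclass
class Question:
    text: str
    metrics: list = field(default_factory=list)

@dataclass
class Goal:
    statement: str
    questions: list = field(default_factory=list)

goal = Goal("Get fit", questions=[
    Question("What is my weight?",
             metrics=[Metric("weight", "pounds", "bathroom scale")]),
    Question("What size is my waist?",
             metrics=[Metric("waist size", "inches", "tape measure")]),
])

# Walking the tree top-down keeps the measurement effort tied to the goal.
for q in goal.questions:
    for m in q.metrics:
        print(f"{goal.statement}: {q.text} -> {m.name} ({m.unit}, {m.how_to_collect})")
```

Anything you cannot hang on this tree is a candidate for not being measured at all.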

Do not measure more: you would just become confused, and, worse, the power of your statistical inferences may drop. Statisticians call this the "principle of parsimony."

Silver Bullet 2: Collect the Information You Already Have and Make the Best Possible Use of It

Measuring an XP project does not require setting up a measurement framework. You may already have sources of information around you. Places to look include the following:

  • Accounting department

  • Defect database

  • Customer service center

  • Configuration management system

Accounting Department

Often, the accounting department of your company collects information on projects for billing customers, tracking resources, writing off certain expenses from taxes, and so on. The information available can also be quite detailed, especially if your company uses Activity Based Costing.

In certain cases, you may be surprised to find that your company knows with a fair amount of precision who developed each portion of the code, who fixed a bug, or how much time was spent developing a file or a class. Using this information, you could try to build models of your development speed, linking time to classes and classes to user stories. Note that this does not contradict collective code ownership. The fact that everyone owns the code does not imply that we should not record who (or what pair) worked on a certain day on a certain piece of code. We are dealing with shared responsibility, not with alienated responsibility.
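As a toy sketch of such a model, suppose accounting records give hours per class and the team's own task breakdown links classes to user stories; all data and names below are invented for illustration.

```python
# Hypothetical inputs: hours per class (from accounting) and the
# class-to-story mapping (from the team's story/task breakdown).
hours_per_class = {
    "Order": 12.0, "Invoice": 8.0, "Customer": 6.0, "Report": 10.0,
}
classes_per_story = {
    "checkout": ["Order", "Invoice"],
    "reporting": ["Customer", "Report"],
}

# Estimated effort per user story: sum the hours of its classes.
effort_per_story = {
    story: sum(hours_per_class[c] for c in classes)
    for story, classes in classes_per_story.items()
}
print(effort_per_story)  # {'checkout': 20.0, 'reporting': 16.0}

# Development speed: stories delivered per hour of recorded effort.
total_hours = sum(effort_per_story.values())
speed = len(effort_per_story) / total_hours
```

A model this crude is still enough to sanity-check the team's velocity estimates against what accounting already knows.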

Defect Database

It is a good practice during final testing for the testing group to store the defect information for faulty modules in a defect database. We hope that there will not be too many defects to fix, thanks to the continuous integration effort and the systematic testing in place. Still, there will be a few.

Such a defect database serves two main purposes: (1) It provides precise feedback to the development group about where fixes are required; (2) it tracks the defect discovery rate, thus helping predict the residual defects in the system and when the system will be ready for deployment to the customer.

It is common to find in such a database the operating context for the defects, the possible pieces of code causing the defects, an estimation of the severity and the priority of the defects, the time of finding the defects, and, sometimes, the time and the effort to resolve the defects. Such a database contains a wealth of information useful for measuring the overall XP project.
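As a sketch of purpose (2), weekly discovery counts pulled from such a database can be extrapolated to guess when discoveries will tail off. The counts below and the exponential-decay assumption are illustrative only.

```python
import math

# Hypothetical defects found per week during final testing.
# The fit below assumes the counts decay roughly exponentially.
weekly_defects = [40, 28, 19, 14, 9, 7]

# Fit log(count) = a + b*week by ordinary least squares (pure stdlib).
xs = list(range(1, len(weekly_defects) + 1))
ys = [math.log(c) for c in weekly_defects]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx

# Project forward until fewer than one defect per week is expected
# (only meaningful when b < 0, i.e., counts really are decaying).
week = n
while math.exp(a + b * (week + 1)) >= 1.0:
    week += 1
print(f"discovery rate expected to drop below 1/week after week {week}")
```

When the projected tail is still far away, the system is not ready for deployment, whatever the schedule says.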

Customer Service Center

It is quite common for software companies to have a customer service center or department. Usually, such an entity collects information on failures of systems from customers and passes it on to the developer as a "failure record." Typically, the failure record contains the scenarios of the failure, its criticality, a timestamp, and a suggested priority.

This is a wealth of data to be used in measurement. We can easily deduce the mean time to failure, which is required to build reliability growth models, which in turn are useful in debugging and in deciding when the system is ready to ship. With minimal additional effort, we can determine the defectiveness of the overall system, which is essential in setting and verifying quality goals. This can also be used to compare the quality of different teams: say, one using XP and one not using XP.
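The mean time to failure falls straight out of the failure-record timestamps; a minimal sketch, with invented timestamps standing in for real service-center records:

```python
from datetime import datetime

# Hypothetical failure timestamps from customer service records.
failure_times = [
    datetime(2002, 3, 1, 9, 0),
    datetime(2002, 3, 3, 14, 30),
    datetime(2002, 3, 8, 11, 0),
    datetime(2002, 3, 10, 8, 15),
]

# Mean time to failure = average gap between consecutive failures.
gaps = [(later - earlier).total_seconds() / 3600.0  # hours
        for earlier, later in zip(failure_times, failure_times[1:])]
mttf_hours = sum(gaps) / len(gaps)
print(f"mean time to failure: {mttf_hours:.1f} hours")
```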

Often, the developers add to the failure record the files or even the classes that were modified as a result of the failure. If so, advanced statistical models can be built relating internal properties of the code to defects. Best practices on code design could be derived. The need for refactoring could be assessed.

Sometimes, details on the time spent fixing the bugs are also available. In this case more detailed models can be built for both development speed and fixing speed. The cost of nonquality can be precisely assessed, making the case for or against pair programming, test before code, and other supposedly useful XP practices.

Configuration Management System

As soon as development cannot be done by a single person, the problem of tracking the development effort becomes significant. Most development teams use configuration management systems. There are very simple systems and very sophisticated ones.

But even in the simplest ones, important information is tracked: the number of versions of each managed entity, usually the file. It has been empirically verified that the number of revisions usually correlates highly with the number of defects. And here we have a situation similar to the one discussed earlier.

We know that if A is correlated to B with r^2 = z and B is correlated to C with r^2 = w, the correlation between A and C may end up having an r^2 as low as the product z × w. But still we have some information that may be useful to build models of quality and so on.
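A small numeric illustration of this weakening effect, with synthetic data: A predicts B only partially and B predicts C only partially, so the A-C link comes out near the product of the two r^2 values.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

def pearson_r2(xs, ys):
    """Squared Pearson correlation coefficient, pure stdlib."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return (cov * cov) / (vx * vy)

a = [random.gauss(0, 1) for _ in range(1000)]
b = [x + random.gauss(0, 1) for x in a]   # B partially determined by A
c = [x + random.gauss(0, 1) for x in b]   # C partially determined by B

z, w = pearson_r2(a, b), pearson_r2(b, c)
print(f"r2(A,B)={z:.2f}  r2(B,C)={w:.2f}  r2(A,C)={pearson_r2(a, c):.2f}")
# r2(A,C) lands near z*w, well below either individual link.
```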

I am sure that if you think carefully, you can find many other places where useful data is stored in your company. You just have to sit down and think.

Silver Bullet 3: Automate Whenever Possible (the Hackystat Model)

Murphy's Law states, "If something can go wrong, it will!" We should always keep this in mind. (I think that Kent got his inspiration from Murphy when he created XP. Kent, is this correct?) Being lightweight is a way to minimize what can go wrong.

Now that we know the major sources of information from which to build statistical models, we should question the integrity of the data we have collected. Data reported by the configuration management system is only a rough approximation of the defects discovered in the system, but it also has the highest integrity: there is no easy way to alter it, and it is very unlikely that someone would alter it accidentally.

Data from accounting and customer service centers or from defect databases may have some problems because it is collected by humans, who may have many reasons to alter it. We know that when we are pressed because, say, the deadline was a week ago and we are working 24/7 to fix the last thousand defects, the last thing we care about is entering data on what we are doing. Therefore, it would be advisable to use devices for automatically collecting such data. Of course, if we implement XP, this will never happen, but we are humans after all.

Even though there is no way to monitor the activities with 200% precision, an interesting new generation of tools for process control could be exploited. I find particularly enlightening the approach taken by Phil Johnson in the development of Hackystat [Johnson2001].

He started from the consideration that the Personal Software Process would dramatically increase developers' productivity if people would record their data accurately. As we know, this is not the case. However, for most software development activities, people use tools, and from the analysis of what they are doing in the tool, it is possible to determine with a fair degree of precision the activity they are accomplishing. Hackystat takes advantage of this fact, using plug-ins for extracting from the tools what people are doing.
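In the same spirit, the idea can be sketched in a few lines: instrument the tool so that events are logged as a side effect of normal work. The event names and log format below are invented; real Hackystat sensors are plug-ins for specific tools.

```python
import time

# A heavily simplified, hypothetical "sensor": each tool action is
# timestamped and appended to a log, with no effort from the developer.
activity_log = []

def sensor(event_name):
    """Wrap a tool action so every invocation is logged automatically."""
    def wrap(fn):
        def wrapped(*args, **kwargs):
            activity_log.append((time.time(), event_name))
            return fn(*args, **kwargs)
        return wrapped
    return wrap

@sensor("file.save")
def save_file(path, text):
    pass  # the real editor's save action would go here

@sensor("test.run")
def run_tests():
    pass  # the real test runner would go here

save_file("Order.java", "...")
run_tests()
print([name for _, name in activity_log])  # ['file.save', 'test.run']
```

The developer never types a timesheet entry; the activity record accumulates as a by-product of using the tool.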

The good news is that most existing tools are customizable, so the development of such plug-ins is not "mission impossible." Moreover, such customization can be often performed in Java, which is well known inside the XP community. Sometimes, bridges between Java and COM are required.

Silver Bullet 4: There Are Lies, Damned Lies, and Statistics (Disraeli)

Now that we have a lot of data available, we need to process it and try to develop meaningful conclusions.

We know that by trying really hard, we can find some subtle statistic that makes the data look like what we would expect. Very likely, though, such a statistic cannot be correctly applied, because its working hypotheses are far from satisfied.

Now, even at the risk of losing some significant conclusion, I recommend that nonstatisticians apply simple and general tests, which are robust to the shape of the data distribution and lead to understandable conclusions.

The field of statistics offers some of these tests, such as the Mann-Whitney U test, the Kolmogorov-Smirnov test, the Spearman rank correlation test, and many others. The problem is that such tests are often deemed too simple to be covered in university statistics courses, so we may not know about them. To this end, I recommend starting with a good book on statistics for social scientists, which usually assumes readers have no university math background. Personally, I like the one by Aron and Aron [Aron+2001]. Then, to move on to a nonparametric reference, you can have a look at the book by Siegel and Castellan [Siegel+1998]. While writing this chapter, I found some interesting and well-formatted information at the URL http://www.cas.lancs.ac.uk/glossary_v1.1/nonparam.html.
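To show how little machinery such a test needs, here is a minimal pure-Python sketch of the Mann-Whitney U statistic; the sample data is invented, and for real work you would use a statistics package, which also gives you the p-value.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return the Mann-Whitney U statistic for two independent samples."""
    pooled = sorted(sample_a + sample_b)
    # Assign average ranks so tied values are handled correctly.
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        i = j
    rank_sum_a = sum(ranks[v] for v in sample_a)
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    return min(u_a, n_a * n_b - u_a)  # conventional U: smaller of the two

# Hypothetical defect-fix times (hours) for an XP team vs. a non-XP team.
xp, non_xp = [2, 3, 3, 5], [4, 6, 7, 9]
print(mann_whitney_u(xp, non_xp))  # 1.0
```

A small U relative to its tabled critical value suggests the two teams' fix times differ, with no assumption that the times are normally distributed.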



Extreme Programming Perspectives
ISBN: 0201770059
Year: 2005
Pages: 445
