Chapter 19. The Refresh Problem | The Software Development Edge: Essays on Managing Successful Projects

Chapter 19. The Refresh Problem

As a group, we spend proportionately way too much time talking about new software development. Although those are the projects people prefer to work on (the "green field" development we hear so much about), the plain fact of the matter is that software lives for a very long time. As a result, we spend a lot of time maintaining and upgrading legacy code. It is not uncommon for systems' lives to be measured in decades, not years. Now one might ask, in a field that changes so rapidly, why does software hang around for so long?

There are a few reasons. First, initial development costs are usually much higher than forecast, so the "sunk cost" of a newly deployed system is already more than was anticipated at the start of the project. At each subsequent decision point, the cost of an update must be weighed against the cost of a replacement, either by acquisition or by rewrite. These update costs are usually small compared with the cost of replacement, and minor upgrades to functionality can usually be accommodated if the original architecture was robust. In addition, error rates go through a relatively predictable pattern.

Let's talk about error rates a bit more. Early in the lifecycle of a new software product, reported error rates are usually relatively high. These are the errors that have escaped detection during testing and have been discovered by the first wave of customers using the product. We could compare these bugs to the "infant mortality" failures in hardware. These are discovered and flushed from the system relatively quickly.

Then, just like hardware, we enter into a relatively long period when there are few bugs or failures. In the hardware world, we attribute this to parts that have been burned in; in software, it corresponds to a mature product whose usage patterns are repeated fairly often. Most of the code that is being exercised is the same code, over and over again; remember that many parts of a software system are infrequently called (rare error cases, for example).

Finally, we enter into an "end-of-life" period. In hardware, this corresponds to parts failing through fatigue. It is almost the mirror image of the infant mortality problem; these are the critters that are dying of old age. In software, we see a similar problem: In older systems, bug rates go up again. There are two reasons for this:

First, statistically infrequent paths begin to be explored in the code, and a bug that may have been present from Day 1 is triggered because some unusual combination of circumstances finally arose.
Second, the cumulative effects of years of maintenance take their toll. All the patches and fixes put in by generations of maintenance programmers have so degraded the initial architecture that "code rot" is taking over. Make one more fix and the whole house of cards may come tumbling down around you. It is at this point that a replacement system makes more technical and economic sense than even one additional upgrade cycle.
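The three phases described above (early "infant mortality" bugs, a long quiet period, and an end-of-life rise) trace what hardware engineers often call a bathtub curve. The sketch below is only an illustration of that shape: the function name `bug_rate` and every parameter value are assumptions chosen to make the curve visible, not measurements from any real system.

```python
import math

def bug_rate(age_years, infant=8.0, infant_decay=2.0, steady=1.0,
             wear=0.05, wear_growth=0.35):
    """Illustrative reported-bug rate (bugs per month) at a given system age.

    All parameters are hypothetical; they only shape the curve.
    """
    # Defects that escaped testing and are found by the first wave of customers.
    early = infant * math.exp(-infant_decay * age_years)
    # Rarely exercised paths finally being hit, plus accumulated "code rot".
    late = wear * (math.exp(wear_growth * age_years) - 1)
    # A constant background rate covers the long, mature middle period.
    return early + steady + late

if __name__ == "__main__":
    for age in (0, 1, 5, 10, 15, 20):
        print(f"year {age:2d}: {bug_rate(age):6.2f} bugs/month")
```

With these made-up parameters the rate starts high, falls to a low plateau within a couple of years, and climbs again as the system ages, which is the pattern the text describes.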

The total lifecycle cost of most software systems is large when compared to the initial development cost, yet it is rarely even estimated when new systems are proposed. The right cost metric for any new system is its total lifecycle cost, and its absence at proposal time is another manifestation of the immaturity of our discipline.
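To make the gap between development cost and lifecycle cost concrete, here is a minimal sketch of the arithmetic. The function `lifecycle_cost`, its `growth` parameter, and the dollar figures are all hypothetical, invented purely for illustration; a real estimate would need actual maintenance data.

```python
def lifecycle_cost(initial, annual_maintenance, years, growth=0.0):
    """Initial development cost plus maintenance summed over the service life.

    `growth` is an assumed yearly fractional increase in maintenance cost,
    standing in for the rising effort of patching an aging system.
    """
    total = initial
    yearly = annual_maintenance
    for _ in range(years):
        total += yearly
        yearly *= 1 + growth
    return total

if __name__ == "__main__":
    # Hypothetical system: 1,000,000 to build, 200,000 a year to maintain.
    flat = lifecycle_cost(1_000_000, 200_000, years=20)
    rising = lifecycle_cost(1_000_000, 200_000, years=20, growth=0.05)
    print(f"flat maintenance:   {flat:,.0f}")
    print(f"rising maintenance: {rising:,.0f}")
```

Under these assumed numbers, twenty years of flat maintenance alone costs four times the original development, and any yearly growth in maintenance effort widens the gap further.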