SOFTWARE COMPLEXITY AND DISASTERS

There is a long history of software problems that have led to serious disasters, and the cost has been astronomical. Below is a list of some of the better-known failures; the figures in parentheses are the estimated costs of each project. A recurring theme in this list is that these large projects were extremely complex to design, test, construct, implement, and manage.

  • 1960— The first successful U.S. Corona spy satellite mission was launched after 12 previous failures due to software problems. (Cost too large to calculate)

  • 1962— The United States launched Ranger 3 to land scientific instruments on the Moon, but the probe missed its target by some 22,000 miles due to software problems. ($14 million)

  • 1962— Mariner I was launched for Venus, veered off course within seconds, and was ordered destroyed. It was later found that a single hyphen from the computer launch code was missing. ($16 million)

  • 1981— U.S. Air Force Communications and Control Software exceeded estimated development costs by a factor of 10. ($3.2 million)

  • 1987–1993— Attempt to build an integrated car and license software management system in California failed. ($44 million)

  • 1992— The London Ambulance system had to be scrapped because it had not been tested sufficiently before introduction. Lost emergency calls and duplicate dispatches were just a few of the problems encountered. ($50 million)

  • 1993— Integration of SABRE reservation system with other online systems failed. ($162 million)

  • 1995— The opening of the new Denver International Airport was delayed for over nine months due to software problems in the baggage-handling system. (Cost not disclosed)

  • 1996— The Ariane 5 rocket, launched by the European Space Agency, was destroyed shortly after takeoff. The cause? A failure in its Ada flight software. ($500 million)

  • 1997— All development on California's SACSS system was stopped after it exceeded its budget. Eight alternative solutions were later considered. ($312 million)

  • 2000— The Mars Polar Lander had software problems with metric conversion that led to the total loss of the spacecraft. It crashed into the surface of Mars. ($165 million)

It is estimated that software failures cost industry over $100 billion in the year 2000. Some observers say that this figure is conservative and that the actual figure is much higher.

Software used in life-critical equipment should always be rigorously tested before release, because failures can have dire consequences. Take the case of the Panamanian x-ray disaster of 2001, in which an x-ray machine incorrectly computed the dosage rates and exposure for patients. Twenty-eight people were overexposed, and three died. The survivors are likely to develop "serious complications, which in some cases may ultimately prove fatal," according to the FDA.[1]

There have also been several near misses. For example, in March 1997, the three-man crew of Soyuz TM-24 barely escaped two potentially catastrophic software flaws during the return to Earth. First, after separating from its propulsion module, the command module was nearly rammed by the jettisoned unit when its control computer fired the wrong set of pointing rockets. Moments later, the command module's autopilot lined it up for atmospheric entry in precisely the wrong direction: nose first rather than heat shield first. Manual intervention fixed that problem. Yet even at the height of the shuttle-Mir U.S.-Russian space partnership, there is no indication that the Russians shared news of either flaw with NASA.

The rest of the descent appeared to go as planned, and the parachutes and soft-landing engines did their job. As in about half of all Soyuz landings, the landing module wound up on its side, probably pulled over by a gust of wind in its parachute just at touchdown.

The three men, who knew they were far off course, were able to open the hatch themselves and get out, as it's a much easier drop to the ground when the capsule is on its side. They then waited two hours to be spotted by a search plane, and several hours more for the arrival of the first helicopter. This is not what you would call a smooth and predictable landing.

The complexity of large software systems cannot be overemphasized. We simply do not have the rigorous testing and deployment mechanisms that can manage this complex environment. More efficient automated tools are needed to break down the complexity and manage it. Where automated tools do not exist, we need to manage the complexity with self-managing systems.

One would hope that we would have learned our lessons already, but it appears that we have not. Until we do, software disasters will continue to haunt us.

Autonomic Computing
ISBN: 013144025X
Year: 2004
Pages: 254
Authors: Richard Murch