Interlude: The $500 Million Exception

On June 4, 1996, the maiden flight of the European Space Agency's Ariane 5 launcher ended after about 40 seconds, with the rocket veering off its flight path, breaking up, and exploding. The total (uninsured) cost of the crash was reported to be in the region of $500 million!

The detailed report that ensued from the accident noted that the failure of the flight was caused by the complete loss of guidance and attitude information approximately 37 seconds after the start of the main engine ignition sequence. This loss of information was in turn caused by specification and design errors in the software of the inertial reference system. The report concluded that the problem was a technical one, and not one caused by incompetence or managerial negligence.

The actual coding problem at the source of the accident was an unhandled exception in an Ada program that caused the inertial reference system to shut down. The program attempted to convert a 64-bit floating-point value into a 16-bit signed integer, but the floating-point number was larger than the integer could represent (a 16-bit signed integer tops out at 32,767). In the Ada software, this resulted in an Operand Error exception. When the inertial reference system failed, it transmitted diagnostic data to the launcher's main computer, where it was interpreted as flight data and used for flight control calculations. This in turn caused a rapid change in attitude, which caused the launcher to disintegrate due to aerodynamic forces.

Although similar conversion instructions in the vicinity had been protected with error handling, this specific instruction had not been protected, with the idea of enhancing performance. The reasoning was that the potential result was physically limited, a theory that was true with the Ariane 4 launcher but proved false with the much higher acceleration and horizontal velocity of the Ariane 5 rocket.

VB .NET, of course, has similar functionality. The equivalent of an Ada Operand Error exception in .NET is a System.OverflowException, and any unhandled exception will cause the VB .NET program in which it occurs to shut down.
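As a rough illustration of the same kind of conversion in VB .NET, the sketch below narrows a Double to a 16-bit Short and catches the resulting System.OverflowException. The module name OverflowDemo and the helper TryConvertToInt16 are hypothetical names invented for this example, and the sketch assumes the default compiler setting in which integer overflow checks are left switched on.

```vb
Imports System

Module OverflowDemo

    ' Hypothetical helper (not from the Ariane software or the .NET
    ' Framework): converts a Double to a 16-bit signed integer and reports
    ' failure instead of letting the exception propagate.
    Function TryConvertToInt16(ByVal value As Double, ByRef result As Short) As Boolean
        Try
            ' CShort throws System.OverflowException when the rounded value
            ' lies outside -32768..32767 (assuming overflow checks are on).
            result = CShort(value)
            Return True
        Catch ex As OverflowException
            result = 0
            Return False
        End Try
    End Function

    Sub Main()
        Dim converted As Short

        ' A value that fits comfortably in 16 bits.
        If TryConvertToInt16(1200.0, converted) Then
            Console.WriteLine("Converted: " & converted.ToString())
        End If

        ' A value too large for a 16-bit signed integer.
        If Not TryConvertToInt16(40000.0, converted) Then
            Console.WriteLine("Out of range for a 16-bit signed integer")
        End If
    End Sub

End Module
```

Without the Try...Catch block, the second conversion would raise an unhandled OverflowException and end the program, which is the VB .NET analogue of the failure mode described above.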

The error was really a reuse error, in that the software module for the Ariane 5 launcher was reused from 10-year-old software used for Ariane 4. The original specification did actually imply that the result should always fit into a 16-bit integer without explicitly stating it, but this implied requirement was nowhere to be found in the code.

I'm not sure what the moral of this story is. Peter van der Linden, the C language guru, suggests that developers of mission-critical software such as that used on aircraft and on space missions should have the privilege of accompanying the software on its first operational flight, thus having a major incentive to debug the program properly when things go wrong. Obviously, writing "safeware" is very difficult. Some people have suggested that the use of assertions might have picked up the problem at the core of the Ariane crash during the testing phase. Others point to the fact that using assertions in real-time systems can hide timing problems such as race conditions and deadlocks, which might then occur when the assertions are removed during the release build.
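To make the assertion argument concrete, here is a minimal sketch of how an implied range requirement could be written down as an assertion in VB .NET. The module name AssertDemo and the function ToInt16Checked are hypothetical and purely illustrative. Debug.Assert calls are compiled only when the DEBUG constant is defined, so the check disappears from a typical release build, which is exactly the trade-off mentioned above.

```vb
Imports System
Imports System.Diagnostics

Module AssertDemo

    ' Hypothetical conversion routine; the name and the range check are
    ' illustrative and are not taken from the Ariane software.
    Function ToInt16Checked(ByVal value As Double) As Short
        ' Makes the implied requirement explicit. The call is compiled only
        ' when the DEBUG constant is defined, so release builds omit it.
        Debug.Assert(value >= Short.MinValue AndAlso value <= Short.MaxValue, _
                     "Value does not fit in a 16-bit signed integer")
        Return CShort(value)
    End Function

    Sub Main()
        Console.WriteLine(ToInt16Checked(1200.0))
    End Sub

End Module
```

The assertion is deliberately a little stricter than CShort itself, which rounds before it checks the range; for documenting an assumption during testing, erring on the strict side seems the safer choice.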




Comprehensive VB .NET Debugging
ISBN: 1590590503
Year: 2003
Pages: 160
Authors: Mark Pearce
