N-Version Programming for Reliability


N-Version Programming (NVP) has been proposed as a method of providing fault tolerance in software. In practice it requires the independent preparation of multiple versions of a software component for some function or application. All the versions are executed in parallel in the application environment, each receiving the same inputs and each producing the required outputs. The N outputs are presented to a selection algorithm that chooses the final application output. If all outputs are expected to be identical, the selection algorithm is simply majority decision logic.

This programming model is clearly derived from hardware risk analysis, and a large body of experience applies it to hardware systems. In fact, the first electronic digital computer in the U.S., the ENIAC, built at the University of Pennsylvania from 1942 to 1946, used two interesting techniques to provide its fabled reliability. When J. Presper Eckert and John Mauchly designed the machine, consisting of 18,000 vacuum tubes, they were warned by electronic reliability engineers that it would not run for more than 30 seconds without a failure. Eckert, the engineer on the team, chose the military's rugged 6SN7 as the vacuum tube type and ran the tubes with 5.0 volts on the filaments rather than the design value of 6.3 volts. He then designed each logic gate in the machine to independently produce three results and used majority decision logic to accept the two that agreed as the correct answer (that is, 1 bit in the case of a logic gate). This was a very conservative design, but Mauchly and Eckert knew that if the first computer were unreliable, it would set back their effort to demonstrate the utility of electronic computing.[20] NVP is this same idea applied at the subroutine, component, or method level in software rather than at the bit level in hardware.
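As a minimal sketch of this selection step (the version functions and names below are illustrative, not taken from any particular NVP framework), majority decision logic over N exact-match outputs can be written as:

    from collections import Counter

    def version_a(x): return x * x        # stand-in for one independently built version
    def version_b(x): return x ** 2       # a second, differently coded version
    def version_c(x): return x * x + 1    # a third version with an injected fault

    def majority_vote(outputs):
        """Return the value produced by a majority of the N versions,
        or signal that no consensus exists."""
        value, count = Counter(outputs).most_common(1)[0]
        if count > len(outputs) // 2:
            return value
        raise RuntimeError("no majority among the N versions")

    versions = (version_a, version_b, version_c)
    print(majority_vote([v(7) for v in versions]))  # 49: the faulty version is outvoted

Here three is the smallest useful N: a single faulty version can always be outvoted, but two coincident faults defeat the voter.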

If numerical results at the function level in software result from the N versions, they may be expected to differ slightly, because different numerical methods may have been used. In this case the decision algorithm might be programmed to select the median value or, more conservatively, the median of only those answers that differ from one another by less than a given epsilon value. It may occur, however, that no decision to output a result can be made, because all N versions of the program fail or for some reason produce answers that cannot be resolved in the decision logic. NVP's inability to come to a consensus may also result from the use of finite-precision or rational arithmetic.[21] All the theorems of real analysis were proven only for the real numbers, not the rational numbers. Although the rationals are dense in the reals, and a lot of work has been done in constructive real analysis, situations do arise where this seemingly theoretical distinction is important. Every numerical analyst knows that you cannot use Laplace's determinant method to solve linear systems on a computer for this reason. Many who have done a lot of linear algebra have computed left inverses to a matrix that turn out not to be right inverses. Whenever finite-precision arithmetic is used, the result of a sequence of computations depends on the order of the computations and the actual arithmetic algorithms used by the hardware, so two correct versions can land on opposite sides of a comparison threshold and then diverge at a branch. This is known as the consistent comparison problem. It can be shown to occur in simple three-version control systems, exactly the kind of application you would expect to benefit from software redundancy.[22]
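A sketch of such a decision algorithm for numeric outputs, assuming the answers are floating-point values and an application-specific epsilon is supplied (both are illustrative choices, not prescribed by NVP itself):

    def select_numeric(outputs, epsilon):
        """Median-with-tolerance decision logic for N numeric results.
        Returns a consensus value, or signals that none can be reached."""
        ranked = sorted(outputs)
        median = ranked[len(ranked) // 2]
        # Conservative variant: keep only answers within epsilon of the median.
        agreeing = [x for x in ranked if abs(x - median) < epsilon]
        if len(agreeing) > len(outputs) // 2:
            return agreeing[len(agreeing) // 2]   # median of the agreeing cluster
        raise RuntimeError("no consensus within tolerance")

    # Three versions compute sqrt(2) by different numerical methods.
    print(select_numeric([1.4142135, 1.4142136, 1.4142138], epsilon=1e-5))

The RuntimeError branch is the case described above: the decision logic must be able to report that no output can be produced.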
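The consistent comparison problem itself is easy to demonstrate. In the hypothetical fragment below, two versions compute the same sum in different orders and then compare it to a threshold; on IEEE 754 doubles they take different branches even though both are arithmetically reasonable:

    threshold = 0.6
    a = (0.1 + 0.2) + 0.3   # 0.6000000000000001 in IEEE 754 double precision
    b = 0.1 + (0.2 + 0.3)   # 0.6 as represented
    print(a > threshold, b > threshold)   # True False: the versions diverge at the branch

Each version is individually defensible; the disagreement appears only when their outputs are voted against each other.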

Advantages of NVP

NVP has some advantages for critical applications that must never be allowed to fail:[23]

  • Independent design and construction of functionally similar programs is more costly but does not add to system complexity.

  • Because the versions are designed independently, any failure to deliver trustworthy operation tends to occur independently in each version, with very low risk of coincident failures.

  • As Very Large-Scale Integration (VLSI) circuits (chips) become larger and more complex, it is more difficult to check out all possible failure modes. Different ways of computing the same critical function tend to reduce this risk.

  • Verification and validation time for the multiple versions is reduced because multiplicity does not add to complexity.

  • Given an effective specification for multiple versions, the programs can be job-shopped to a variety of experts at competitive rates.

Disadvantages of NVP

NVP's disadvantages or challenges are fewer but well worth careful consideration:

  • An unambiguous, precise specification is a sine qua non for contracting functional requirements to a diverse group of program version suppliers.

  • The hope for NVP as a software reliability enhancement is based on the faith that software designed and implemented in different ways will not produce coincident errors at the decision point.

  • The cost of NVP is a linear multiple of the cost of single-version software. Economy suggests that NVP be applied only to the critical paths in the program.



