At the turn of the century, when IBM reorganized its disparate processor divisions into the eServer brand, each of the four server teams got to choose a letter to precede the word "Series." The old S/390 division chose the letter "z," for near-zero downtime, to reflect its focus on designing the processors for continuous availability.
The design of the zSeries processor addresses single points of failure in several ways. All levels of memory have a type of ECC (error correction code) that ensures that single-bit (and often multiple-bit) errors are corrected "on the fly." The instruction processing unit itself has two identical sets of logic executing every instruction. If the results do not agree, the instruction is retried. If the processor itself fails, a spare CPU is automatically "configured in" to replace it. In fact, a number of types of CPU failures, for example a failure in the "adder logic," are completely transparent to the software. The replacement processor simply receives a copy of all the current registers and begins executing again at the point where the other processor failed.
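The duplicated execution, retry, and sparing described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a model of the actual hardware: the names `Cpu`, `good_adder`, `stuck_adder`, and `run`, the toy "add to accumulator" program, and the retry count of one are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Adder = Callable[[int, int], int]


def good_adder(x: int, y: int) -> int:
    return x + y


def stuck_adder(x: int, y: int) -> int:
    # Hypothetical permanent fault: the low bit of the result is stuck at 1.
    return (x + y) | 1


@dataclass
class Cpu:
    name: str
    unit_a: Adder  # first copy of the execution logic
    unit_b: Adder  # second, identical copy
    registers: Dict[str, int] = field(
        default_factory=lambda: {"pc": 0, "acc": 0}
    )


def run(program: List[int], active: Cpu, spares: List[Cpu],
        retries: int = 1) -> Tuple[int, str, List[str]]:
    """Add each operand to the accumulator, checking every step twice."""
    log: List[str] = []
    while active.registers["pc"] < len(program):
        operand = program[active.registers["pc"]]
        acc = active.registers["acc"]

        # Execute on both units; accept the result only if they agree.
        result = None
        for _ in range(retries + 1):
            a = active.unit_a(acc, operand)
            b = active.unit_b(acc, operand)
            if a == b:
                result = a
                break

        if result is None:
            # Retries exhausted: bring in a spare, copy the register state,
            # and resume at the instruction that failed.
            if not spares:
                raise RuntimeError("no spare CPU left")
            spare = spares.pop(0)
            spare.registers = dict(active.registers)
            log.append(
                f"{active.name} -> {spare.name} at pc={spare.registers['pc']}"
            )
            active = spare
            continue

        # Commit the agreed result and advance.
        active.registers["acc"] = result
        active.registers["pc"] += 1
    return active.registers["acc"], active.name, log


if __name__ == "__main__":
    faulty = Cpu("cpu0", good_adder, stuck_adder)
    spare = Cpu("spare0", good_adder, good_adder)
    print(run([2, 4, 6], faulty, [spare]))
```

In this sketch the program never observes the failure: the register state is committed only after both units agree, so the spare picks up from the last agreed state and the final sum is the same as on a healthy processor.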
There are times when an almost perfectly running system simply needs additional hardware resources to avoid impacting availability. The zSeries family has some interesting solutions in this arena as well. In 17.2, "Simple server hardware consolidation," there was a brief discussion of Capacity Upgrade on Demand and the ability to "configure in" additional main storage. In addition, the I/O adapters are hot-pluggable, so they can be added or replaced while the system is running.
In other words, Linux on the mainframe inherits all of the availability characteristics that come with zSeries. Some find the risk of hardware failure in a zSeries machine so low that they consider a redundant zSeries machine only during disaster recovery discussions.