I need to introduce some terminology before we can continue to discuss RAID:
Since a disk array involves multiple physical disks, I use the term logical volume to refer to a single volume composed of multiple disks, as distinct from any one physical disk.
The Mean Time Between Failures (MTBF) refers to the average time between failures of a component. For example, the MTBF of a given disk drive might be specified at 500,000 hours, meaning that, on average, a disk of that model will run for half a million hours before failing. Since this is nothing more than a statistical measure, an individual disk drive might well fail fifty (or one million) hours into service. These component-based measurements can be applied to the whole system; for example, a 9 TB array composed of one thousand 9 GB disks with an MTBF of 1.2 million hours each would be expected to have an aggregate MTBF of 1,200 hours (about fifty days), assuming the disks fail independently of one another.
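The aggregate figure follows from the usual assumption that disks fail independently and at a constant rate, so failure rates add and the array's MTBF is the per-disk MTBF divided by the number of disks. A minimal Python sketch of that arithmetic (the function name `aggregate_mtbf_hours` is mine, chosen only for illustration):

```python
def aggregate_mtbf_hours(disk_mtbf_hours: float, disk_count: int) -> float:
    """Expected time until the first failure among identical disks.

    Assumption: disks fail independently at a constant rate, so the
    individual failure rates (1 / MTBF) simply add up.
    """
    if disk_count < 1:
        raise ValueError("disk_count must be at least 1")
    return disk_mtbf_hours / disk_count


# The example from the text: one thousand disks at 1.2 million hours each.
hours = aggregate_mtbf_hours(1_200_000, 1000)
print(f"{hours:,.0f} hours, or {hours / 24:.0f} days")
# prints "1,200 hours, or 50 days"
```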
The Mean Time To Data Loss (MTTDL) gives the amount of time that a system can operate before it suffers a failure sufficient to cause data loss. If all the components operate independently, with no redundancy between them, the MTTDL can be no better than the MTBF of the least reliable component; however, designers go to great lengths to increase the MTTDL even though the MTBF of each component stays the same. For example, a mirrored pair of disks won't suffer data loss unless both drives fail, which is much less likely than one disk failing. It is also critical to note that a high MTTDL does not imply continuous availability: a host controller failure might render a mirrored disk pair inaccessible, but the data could still be completely intact.
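To put a rough number on "much less likely," a commonly used approximation for a mirrored pair is MTTDL ≈ MTBF² / (2 × repair time), where the repair time is how long it takes to replace a failed disk and resynchronize the mirror. The sketch below assumes independent failures at a constant rate and a repair time that is tiny compared to the MTBF. The 24-hour repair time and the function name are my own assumptions for illustration, not figures from any particular product.

```python
def mirrored_pair_mttdl_hours(disk_mtbf_hours: float, repair_hours: float) -> float:
    """Approximate MTTDL of a two-disk mirror.

    Data is lost only if the surviving disk fails while the first
    failed disk is still being replaced and resynchronized.
    Assumptions: independent failures, constant failure rate, and
    repair_hours much smaller than disk_mtbf_hours.
    """
    if repair_hours <= 0:
        raise ValueError("repair_hours must be positive")
    return disk_mtbf_hours ** 2 / (2 * repair_hours)


single_disk = 500_000                      # MTBF of one disk, in hours
mirror = mirrored_pair_mttdl_hours(single_disk, repair_hours=24)
print(f"mirror MTTDL is about {mirror / single_disk:,.0f} times the single-disk MTBF")
```

Under this model, halving the repair time doubles the MTTDL, which is one reason the time needed to replace a failed disk matters so much.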
The Mean Time To Data Inaccessibility (MTTDI) is not formally defined; however, it is increasingly important to be able to always access your data, not just protect it from loss. The MTTDI gives the amount of time that a system can operate before it suffers a failure sufficient to make data unavailable.
While we won't discuss it here, the Mean Time To Repair (MTTR) is often an important metric, and certainly something that should be monitored. As we'll see, with some disk array schemes the loss of a disk can result in degraded performance and susceptibility to data loss; knowing how long it takes your system to recover from a failed disk (including the time spent physically replacing the failed disk) can be quite useful.