Disk System Fault Tolerance | Network+ Study Guide

No hard disk lasts forever; every hard disk will eventually fail. The most common problem is a complete hard-disk failure (also known as a hard-disk crash). When this happens, the data stored on that disk is generally irretrievable. Therefore, if you want your data to be accessible 90 to 100 percent of the time (as with warm and hot sites), you need to use some method of disk fault tolerance. Typically, disk fault tolerance is achieved through disk management technologies such as mirroring, duplexing, and striping with parity, each of which provides some level of data protection. As with other methods of fault tolerance, disk fault tolerance means that a disk system is able to recover from an error condition of some kind.

The methods that provide fault tolerance for hard-disk systems include:

Mirroring
Duplexing
Data striping (with parity)
Redundant array of independent (or inexpensive) disks (RAID)

Understanding Disk Volumes

Before you read about the various methods of providing fault tolerance for disk systems, you should know about one important concept: volumes. When you install a new hard disk into a computer and prepare it for use, the network operating system (NOS) sets up the disk so that data can be stored on it, in a process known as formatting. Once this is done, the NOS can access the disk. Before the NOS can store data on the disk, however, it must set up what is known as a volume. A volume, for all practical purposes, is a named chunk of disk space. This chunk can exist on part of a disk, can exist on all of a disk, or can span multiple disks. Volumes provide a way of organizing disk storage.

Disk Mirroring

Mirroring a drive means designating a hard-disk drive in the computer as a mirror, or duplicate, of another, specified drive. The two drives are attached to a single disk controller. This disk fault tolerance feature is provided by most network operating systems. When the NOS writes data to the specified drive, the same data is also written to the drive designated as the mirror. If the first drive fails, the mirror drive is already online, and because it holds a duplicate of the information contained on the specified drive, the users won’t know that a disk drive in the server has failed. The NOS notifies the administrator that the failure has occurred. The downside is that if the disk controller fails, neither drive is available. Figure 9.1 shows how disk mirroring works.

Figure 9.1: Disk mirroring

The drives do not need to be identical, but it helps if they are. Each drive must have at least as much free space as the mirror you want to create, so the size of the mirror is limited by the drive with less free space. For example, suppose you have two 4GB drives; one has 3GB free, and the other has 2GB free. The largest mirror you can create is 2GB.
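
The behavior described above can be sketched in a few lines of Python. This is only a toy model, not how any particular NOS implements mirroring: the class name MirroredPair, the function max_mirror_size_gb, and the use of in-memory dictionaries as "drives" are all assumptions made for illustration.

```python
class MirroredPair:
    """Toy model of a mirrored pair: two in-memory 'drives' holding the same blocks."""

    def __init__(self):
        self.drives = [{}, {}]          # block number -> data, one dict per drive
        self.failed = [False, False]    # failure flag per drive

    def write(self, block, data):
        # Every write goes to each drive that is still working.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                drive[block] = data

    def read(self, block):
        # Read from the first working drive; the mirror covers a single failure.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                return drive[block]
        raise IOError("both drives have failed; restore from backup")


def max_mirror_size_gb(free_a_gb, free_b_gb):
    # The mirror can be no larger than the smaller amount of free space.
    return min(free_a_gb, free_b_gb)


pair = MirroredPair()
pair.write(0, b"payroll")
pair.failed[0] = True                # simulate the first drive failing
print(pair.read(0))                  # still readable from the mirror
print(max_mirror_size_gb(3, 2))      # the 3GB-free / 2GB-free example above
```

Note that the sketch has no notion of a disk controller. In real mirroring, both drives share one controller, which remains a single point of failure.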

Note

Mirroring is an implementation of RAID level 1, which is discussed in detail later in this chapter.

Disk Duplexing

As with mirroring, duplexing saves data to a mirror drive. In fact, the only major difference between duplexing and mirroring is that duplexing uses two separate disk controllers (one for each disk). Thus, duplexing provides not only a redundant disk, but a redundant controller and data cable as well. Duplexing provides fault tolerance even if one of the controllers fails. Figure 9.2 shows a duplexed disk system. Compare this with Figure 9.1. Notice that there is now an extra disk controller in the system.

Figure 9.2: Disk duplexing

Note

Duplexing is also an implementation of RAID level 1.
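
To make the difference between mirroring and duplexing concrete, the following sketch checks whether any copy of the data is still reachable after a failure. The function data_available and the list-based wiring description are hypothetical names chosen for this example; the only assumption is that a disk is usable when both the disk and the controller it is attached to are working.

```python
def data_available(disk_ok, controller_ok, disk_to_controller):
    """Return True if at least one healthy disk sits behind a healthy controller."""
    return any(
        disk_ok[d] and controller_ok[disk_to_controller[d]]
        for d in range(len(disk_ok))
    )


# Mirroring: both disks are attached to the single controller (controller 0).
mirroring_wiring = [0, 0]
# Duplexing: each disk has its own controller.
duplexing_wiring = [0, 1]

# Both disks are healthy, but controller 0 has failed.
print(data_available([True, True], [False], mirroring_wiring))        # mirroring
print(data_available([True, True], [False, True], duplexing_wiring))  # duplexing

# One disk has failed and all controllers are healthy: both setups survive.
print(data_available([False, True], [True], mirroring_wiring))
print(data_available([False, True], [True, True], duplexing_wiring))
```

With a failed controller, the mirrored pair loses access to both disks, while the duplexed pair can still reach the disk on the surviving controller.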

Disk Striping

From a performance point of view, writing data to a single drive is slow. When three drives are configured as a single volume without striping, information must fill the first drive before it can go to the second and fill the second before filling the third, so only one drive is working at a time. If you configure that volume to use disk striping, you will see a definite performance gain. Disk striping breaks up the data to be saved to disk into small portions and writes consecutive portions to different disks, in small areas called stripes, so that all of the disks can work at the same time. These stripes maximize performance because all of the read/write heads are working constantly. Figure 9.3 shows an example of striping data across multiple disks. Notice that the data is broken into sections and that each section is sequentially written to a separate disk.

Figure 9.3: How disk striping works

Striping data across multiple disks improves only performance; it does not improve fault tolerance. To add fault tolerance to disk striping, it is necessary to use parity. Disk striping is also known as RAID level 0.
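
The contrast between a volume that simply spans three drives and a striped volume comes down to where each block of data lands. The sketch below is illustrative only: the function names are hypothetical, and it assumes a stripe unit of exactly one block, whereas real systems use a configurable stripe size.

```python
def spanned_location(block, blocks_per_disk):
    """Spanned volume: fill disk 0 completely, then disk 1, then disk 2."""
    return block // blocks_per_disk, block % blocks_per_disk


def striped_location(block, num_disks):
    """Striped volume: consecutive blocks rotate across all of the disks."""
    return block % num_disks, block // num_disks


# Each function returns (disk number, position on that disk).
for block in range(6):
    print(block, spanned_location(block, 100), striped_location(block, 3))
```

For the first six blocks, the spanned volume places everything on disk 0, while the striped volume spreads the same blocks evenly over disks 0, 1, and 2. Neither layout stores a second copy of anything, which is why striping alone adds no fault tolerance.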

Parity Information

Parity, as it relates to disk fault tolerance, is a general term for the fault tolerance information computed for each chunk of data written to a disk. This parity information can be used to reconstruct missing data should a disk fail. Striping can use parity or not, but if the striping technology doesn’t use parity, you won’t gain any fault tolerance. When using striping with parity, the parity information is computed for each block and written to the drive.

The advantage to using parity with striping is gaining fault tolerance. If any part of the data gets lost or destroyed, the information can be rebuilt from the parity information. The downside to using parity is that computing and writing parity information reduces the total performance of a disk system that uses striping. The parity information also reduces the total amount of free disk space.
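
The text above does not say how parity is computed. A common approach, assumed here, is a bytewise exclusive OR (XOR) across the data blocks in a stripe. The sketch below uses a hypothetical helper named xor_blocks to show both computing parity and rebuilding a lost block from the survivors.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks, each stored on a different disk.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)            # parity block, stored on a fourth disk

# Simulate losing the disk that held data[1].
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)      # XOR of everything that is left

print(rebuilt == data[1])
```

The example also shows both costs mentioned above: the parity block takes up space that could have held data, and every write requires an extra computation and an extra block to be written.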

Redundant Array of Inexpensive (or Independent) Disks (RAID)

RAID is a technology that uses an array of less expensive hard disks instead of one enormous hard disk and provides several methods for writing to those disks to ensure redundancy. Those methods are described as levels, and each level is designed for a specific purpose:

RAID 0 (Commonly Used) This method is the fastest because all read/write heads are constantly being used without the burden of parity or duplicate data being written. A system using this method has multiple disks, and the information to be stored is striped across the disks in blocks without parity. This RAID level only improves performance; it does not provide fault tolerance.

RAID 1 (Commonly Used) This level uses two hard disks, one mirrored to the other (commonly known as mirroring; duplexing is also an implementation of RAID 1). This is the most basic level of disk fault tolerance. If the first hard disk fails, the second automatically takes over. No parity or error-checking information is stored. Rather, each drive has duplicate information of the other. If both drives fail, a new drive must be installed and configured, and the data must be restored from a backup.

RAID 2 At this level, individual bits are striped across multiple disks. One or more drives in this configuration are dedicated to storing error-correcting information (RAID 2 uses a Hamming code rather than simple parity). If any data drive (a drive in this configuration that does not hold error-correcting information) fails, the data on that drive can be rebuilt from the error-correcting information stored on the dedicated drives. At least three disk drives are required in this configuration. This is not a commonly used implementation.

RAID 3 At this level, data is striped across multiple hard drives, and one drive is dedicated to storing parity. The main difference from RAID 2 is that the data is striped in bytes, not bits. The advantage of this configuration is that more data is written and read in one operation, increasing overall disk performance.

RAID 4 This level is similar to RAID 3 (striping with a dedicated parity drive), except that data is striped in blocks, which facilitates fast reads from one drive. RAID 4 is the same as RAID 0, with the addition of a parity drive. This is not a popular implementation.

RAID 5 (Commonly Used) At this level, the data and parity are striped across several drives. Because no single drive holds all of the parity information, this level avoids the bottleneck of a dedicated parity drive and allows for fast reads and reasonably fast writes. The parity information for data on one disk is stored with the data on another disk, so if any one disk fails, the drive can be replaced and its data can be rebuilt from the data and parity stored on the other drives. This works well if one disk fails. If more than one disk fails, however, the data will need to be recovered from backup media. A minimum of three disks is required. Five or more disks are most often used.
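
For the three commonly used levels, the trade-off between usable space and fault tolerance can be summarized with a little arithmetic. The sketch below is a study aid, not a sizing tool: the function raid_summary is a hypothetical name, it assumes all disks are the same size, and it treats RAID 1 as exactly two disks, as described above.

```python
def raid_summary(level, num_disks, disk_size_gb):
    """Return (usable_gb, disk_failures_tolerated) for RAID 0, 1, or 5."""
    if level == 0:
        if num_disks < 2:
            raise ValueError("RAID 0 stripes across multiple disks")
        # No parity and no duplicate data: all space is usable, nothing is protected.
        return num_disks * disk_size_gb, 0
    if level == 1:
        if num_disks != 2:
            raise ValueError("RAID 1 as described here uses two disks")
        # Each drive holds a full copy, so only one disk's worth is usable.
        return disk_size_gb, 1
    if level == 5:
        if num_disks < 3:
            raise ValueError("RAID 5 requires a minimum of three disks")
        # Parity consumes the equivalent of one disk, spread across all of them.
        return (num_disks - 1) * disk_size_gb, 1
    raise ValueError("only RAID 0, 1, and 5 are modeled in this sketch")


for level, disks in [(0, 3), (1, 2), (5, 3), (5, 5)]:
    usable, tolerated = raid_summary(level, disks, 100)
    print(f"RAID {level} with {disks} x 100GB: {usable}GB usable, "
          f"survives {tolerated} disk failure(s)")
```

The numbers show one reason five or more disks are often used with RAID 5: the parity overhead stays at one disk's worth of space, so it becomes a smaller fraction of the total as disks are added.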

Note

There are other levels of RAID, including RAID 53, 6, 7, and 10, but because they aren’t covered on the exam, we won’t discuss them here.