Fully fault-tolerant systems use fault-tolerant disk arrays, such as RAID, or storage area networks (SANs) to prevent the loss of data. In this lesson you’ll learn how to design a fault-tolerant data storage system by using RAID or SANs.
One method you can use to protect data is RAID. RAID provides fault tolerance by implementing data redundancy. With data redundancy, a computer writes data to more than one disk, which protects the data in the event of a single hard disk failure. You can implement RAID fault tolerance as either a software or hardware solution. The software implementation is available in Windows 2000 Server.
Although the data is available and current in a fault-tolerant system, you should still make backups to protect the information on hard disks from erroneous deletions, fire, theft, or other disasters. Disk fault tolerance isn’t an alternative to a backup strategy with offsite storage, which is the best insurance for recovering lost or damaged data.
If you experience the loss of a hard disk due to mechanical or electrical failure and haven’t implemented fault tolerance, your only option for recovering the data on the failed drive is to replace the hard disk and restore your data from a backup. However, the loss of access to the data while you replace the hard disk and restore your data can translate into lost time and money.
To store large amounts of data, you can use a SAN to make data available in the event of a disaster. A SAN provides fault tolerance on a large scale.
With software implementations of RAID, there’s no fault tolerance following a failure until the fault is repaired. If a second fault occurs before the data lost from the first fault is regenerated, you can recover the data only by restoring it from a backup.
Windows 2000 Server supports two software implementations of RAID that provide fault tolerance: mirrored volumes (RAID-1) and striped volumes with parity (RAID-5). In Windows 2000, you can create new RAID volumes only on Windows 2000 dynamic disks.
When you upgrade Windows NT 4 to Windows 2000, any existing mirror sets or stripe sets with parity are retained. Windows 2000 provides limited support for these fault tolerance sets, allowing you to manage and delete them.
A mirrored volume uses the Windows 2000 Server fault tolerance driver (Ftdisk.sys) to write the same data to a volume on each of two physical disks simultaneously, as shown in Figure 3.2. Each volume is considered a member of the mirrored volume. Implementing a mirrored volume helps to ensure the survival of data in the event that one member of the mirrored volume fails.
Figure 3.2 - Mirrored volume
A mirrored volume can contain any partition, including the boot or system partition; however, both disks in a mirrored volume must be Windows 2000 dynamic disks.
Performance on Mirrored Volumes
Mirrored volumes can enhance read performance because the fault tolerance driver reads from both members of the volume at once. There can be a slight decrease in write performance because the fault tolerance driver must write to both members. When one member of a mirrored volume fails, performance returns to normal because the fault tolerance driver works with only a single partition.
Because disk space usage is only 50 percent (two members for one set of data), mirrored volumes can be expensive.
Deleting a mirrored volume will delete all the information stored on both disks.
If the same disk controller controls both physical disks in a mirrored volume and the disk controller fails, neither member of the mirrored volume is accessible. You can install a second controller in the computer so that each disk in the mirrored volume has its own controller. This arrangement, called disk duplexing, can protect the mirrored volume against both controller failure and hard disk failure. Some hardware implementations of disk duplexing use two or more channels on a single disk controller card.
Disk duplexing reduces bus traffic and potentially improves read performance. Disk duplexing is a hardware enhancement to a Windows 2000 mirrored volume and requires no additional software configuration.
Windows 2000 Server also supports fault tolerance through striped volumes with parity. Parity is a mathematical method of determining the number of odd and even bits in a number or series of numbers, which you can use to reconstruct data if one number in a sequence of numbers is lost.
In a RAID-5 volume, Windows 2000 achieves fault tolerance by adding a parity-information stripe to each disk partition in the volume, as shown in Figure 3.3. If a single disk fails, Windows 2000 can use the data and parity information on the remaining disks to reconstruct the data that was on the failed disk.
Figure 3.3 - RAID-5 parity-information stripes
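The reconstruction described above can be sketched in a few lines. This is a minimal illustration that assumes parity is computed as a bytewise XOR across the blocks of a stripe, which is the usual RAID-5 scheme; the lesson doesn't spell out the exact calculation Windows 2000 performs, and the helper names `xor_blocks` and `rebuild_missing` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of equally sized byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Rebuild the one missing block of a stripe from the surviving blocks.

    Because XOR is its own inverse, XOR-ing the surviving data blocks with
    the parity block yields the block that was on the failed disk.
    """
    return xor_blocks(surviving_blocks)


# One stripe across three disks: two data blocks plus one parity block.
data_a = b"CUSTOMER"
data_b = b"ORDER-42"
parity = xor_blocks([data_a, data_b])

# Simulate losing the disk that held data_b and rebuild it.
recovered = rebuild_missing([data_a, parity])
```

The same call rebuilds any single lost block, data or parity, which is why a second failure before regeneration completes can't be recovered this way.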
Because of the parity calculation, write operations on a RAID-5 volume are slower than on a mirrored volume. However, RAID-5 volumes provide better read performance than mirrored volumes, especially with multiple controllers, because data is distributed among multiple drives. If a disk fails, however, the read performance on a RAID-5 volume slows while Windows 2000 Server reconstructs the data for the failed disk by using parity information.
RAID-5 volumes have a cost advantage over mirrored volumes because disk usage is optimized. The more disks you have in the RAID-5 volume, the lower the relative cost of the redundant parity stripe. Table 3.2 shows how the proportion of space required for parity decreases as you add 2-gigabyte (GB) disks to the RAID-5 volume.
Table 3.2 RAID-5 Redundancy
|Number of Disks||Disk Space Used||Available Disk Space||Redundancy|
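The relationship behind the table's columns can be worked out directly. The sketch below assumes that a RAID-5 volume gives up the equivalent of one disk to parity, as stated in Table 3.3; the function name `raid5_usage` is hypothetical, and the printed figures are computed here rather than copied from the original table.

```python
def raid5_usage(num_disks, disk_size_gb=2):
    """Return (total_gb, available_gb, redundancy_percent) for a RAID-5 volume."""
    if not 3 <= num_disks <= 32:
        raise ValueError("software RAID-5 needs 3 to 32 disks")
    total = num_disks * disk_size_gb
    available = (num_disks - 1) * disk_size_gb  # one disk's worth holds parity
    redundancy = 100 / num_disks
    return total, available, redundancy


for n in (3, 4, 5):
    total, available, redundancy = raid5_usage(n)
    print(f"{n} disks: {total} GB used, {available} GB available, "
          f"{redundancy:.0f} percent redundancy")
```

With 2-GB disks, the redundancy share falls from about 33 percent with three disks to 20 percent with five.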
Software-level RAID-5 volumes are subject to some restrictions. First, a RAID-5 volume requires a minimum of 3 drives and allows a maximum of 32 drives. Second, a software-level RAID-5 volume can’t contain the boot or system partition.
The Windows 2000 operating system isn’t aware of RAID implementations in hardware. Therefore, the restrictions that apply to software-level RAID don’t apply to hardware-level RAID configurations.
RAID-1 volumes and RAID-5 volumes provide different levels of fault tolerance. Deciding which option to implement depends on the level of protection you require and the cost of hardware. The major differences between RAID-1 and RAID-5 volumes are performance and cost. Table 3.3 describes some differences between software-level RAID-1 and RAID-5.
Table 3.3 Comparing RAID-1 and RAID-5
|Mirrored Volumes (RAID-1)||Striped Volumes with Parity (RAID-5)|
|Supports file allocation table (FAT) and NT file system (NTFS)||Supports FAT and NTFS|
|Can protect system or boot partition||Can’t protect system or boot partition|
|Requires two hard disks||Requires a minimum of 3 hard disks and allows a maximum of 32 hard disks|
|Has a higher cost per megabyte||Has a lower cost per megabyte|
|50 percent used for redundancy||Equivalent of one physical drive used for redundancy|
|Has good write performance||Has moderate write performance|
|Has good read performance||Has excellent read performance|
|Uses less system memory||Requires more system memory|
Generally, mirrored volumes offer read and write performance comparable to that of single disks. RAID-5 volumes offer better read performance than mirrored volumes, especially with multiple controllers, because data is distributed among multiple drives. However, the need to calculate parity information requires more computer memory, which can slow write performance.
Mirroring makes only 50 percent of the total disk space available for data, so its cost per MB is higher than that of disks without mirroring. RAID-5 uses 33 percent of the total disk space for parity information when you use the minimum number of hard disks (three). With RAID-5, disk utilization improves as you increase the number of hard disks.
In a hardware solution, the disk controller interface handles the creation and regeneration of redundant information. Some hardware vendors implement RAID data protection directly in their hardware, as with disk array controller cards. Because these methods are vendor specific and bypass the fault tolerance software drivers of the operating system, they offer performance improvements over software implementations of RAID. In addition, hardware implementations of RAID usually include extra features, such as additional fault-tolerant RAID configurations, hot swapping of failed hard disks, hot sparing for online failover, and dedicated cache memory for improved performance.
The level of RAID supported in a hardware implementation depends on the hardware manufacturer.
Consider the following points when deciding whether to use a software or hardware implementation of RAID:
With hardware RAID, mirrored volumes can be striped across multiple disks. This configuration is often referred to as RAID-10: RAID-1 mirroring and RAID-0 striping. Unlike RAID-0, RAID-10 is a fault-tolerant RAID configuration because each disk in the stripe is also mirrored. RAID-10 improves disk I/O by performing read and write operations across the stripe.
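The way RAID-10 combines striping with mirroring can be illustrated with a small model. This is a sketch under stated assumptions: logical blocks are striped round-robin across the mirrored pairs, and pair p consists of disks 2p and 2p + 1. Real controllers choose their own layout, and the function names are hypothetical.

```python
def raid10_locate(block, num_pairs):
    """Map a logical block to (pair index, block offset within the pair)."""
    return block % num_pairs, block // num_pairs


def raid10_survives(failed_disks, num_pairs):
    """Return True if every mirrored pair still has at least one working disk.

    Disks are numbered so that pair p consists of disks 2*p and 2*p + 1.
    """
    failed = set(failed_disks)
    return all(not {2 * p, 2 * p + 1} <= failed for p in range(num_pairs))


# Four disks arranged as two mirrored pairs.
print(raid10_locate(5, num_pairs=2))            # block 5 lands on pair 1
print(raid10_survives([0, 3], num_pairs=2))     # one disk lost from each pair
print(raid10_survives([2, 3], num_pairs=2))     # both disks of pair 1 lost
```

In this model the array tolerates one failure per pair but not the loss of both members of the same pair.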
A storage area network (SAN) is an elaborate network composed of one or more storage systems, each capable of providing terabytes of disk storage capacity at very high transfer rates. Most SANs use Fibre Channel technology and are capable of providing I/O throughputs in the gigabits-per-second (Gbps) range (100 to 200 megabytes per second [MBps] or higher). A SAN also allows for flexible configurations, is very scalable, and ensures high availability for mission-critical data storage.
SANs can improve performance for many applications that move large amounts of data between multiple servers over the network: Network resources are freed up for other transactions, and bulk data transfers are performed on the SAN at a much faster rate by utilizing the SAN Fibre Channel network. For example, before implementing SANs, one organization maintained a large sales database that performed five 70-GB transfers over the network per weekend and incurred 24 hours of planned downtime. With the SAN architecture, the same operation takes only two to three hours and the network isn’t used.
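As a rough check on the figures in this example, the ideal time to move five 70-GB data sets at the Fibre Channel rates quoted earlier can be estimated. The sketch assumes 1 GB = 1,000 MB and ignores protocol, disk, and application overhead, so it gives only a lower bound; the function name `transfer_hours` is hypothetical.

```python
def transfer_hours(total_gb, throughput_mb_per_s):
    """Return the ideal transfer time in hours, ignoring all overhead.

    Assumes 1 GB = 1,000 MB; throughput is in megabytes per second.
    """
    seconds = total_gb * 1000 / throughput_mb_per_s
    return seconds / 3600


weekend_gb = 5 * 70  # five 70-GB transfers
for rate in (100, 200):
    print(f"{rate} MBps: {transfer_hours(weekend_gb, rate):.2f} hours")
```

The result is under an hour at either rate, so the two to three hours reported for the real operation presumably includes overhead that this estimate leaves out.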
Key elements common to different kinds of hardware-specific SANs include the following:
The hardware components that make up a SAN are similar to those of a network with storage elements but vary depending on the type of SAN being implemented and the hardware vendor. Host servers require Fibre Channel interfaces, known as host bus adapters (HBAs), to interface with the SAN. Storage components such as tape drives, disk drives, RAID controllers, hubs, switches, storage processors, disk enclosures, and arrays make up the SAN itself.
In recent years there’s been more and more interest in using SANs to provide fault-tolerant large-scale storage for both files and network applications. However, because of the large initial cost of implementing a SAN, SANs are generally used only in networks that need more than 100 to 200 GB of storage capacity.
You can use a SAN to centralize your data storage and simplify administration of backups and restores. SANs remove the storage function from general-purpose servers onto a high-speed network specifically designed for moving large amounts of data. This process provides the following advantages:
If you can justify the cost, the SAN will allow you to build a storage solution with far greater scalability than multiple arrays. In addition, the cost of ownership is lower due to centralized management and higher availability of the storage within the SAN. For example, when you use a SAN with your Microsoft Exchange mailbox servers, you can back up and restore your mailbox data much faster, which in turn helps fulfill service level agreements (SLAs) and maximizes the number of users that one server can host.
Although a deployment can be expensive, a SAN solution could be preferable because the long-term total cost of ownership (TCO) might be lower than the cost of maintaining many small arrays. Consider the following advantages of a SAN solution:
Hardware vendors implement most SAN solutions. Find a SAN provider who will help you with the process of designing, installing, and maintaining your SAN and discuss your storage needs with them. They’ll then be able to configure the SAN to offer you the best combination of performance, security, and storage group and tracking log distribution.
Before you invest in a SAN, calculate the cost of your current storage solution in terms of hardware and administrative resources and evaluate the company’s need for dependable storage. Then calculate whether moving to a SAN would provide a greater overall cost and reliability benefit than maintaining multiple arrays would.
In the past, SANs have been implemented by using a dedicated direct connection or a Fibre Channel arbitrated loop. Newer Fibre Channel fabric switches provide much higher levels of throughput and allow administrators to design SANs that minimize or eliminate any single points of failure.
At minimum, a switched FC SAN includes the following:
Note that each edge switch is connected to both core switches.
When implementing a SAN, double all devices except the core: use two Fibre Channel adapters in each server, two edge switches in each LAN, two edge switches for the SAN disk farm, and two edge switches for the SAN backup device.
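One way to verify that a doubled design really has no single point of failure is to model the cabling as a graph and remove each device in turn. The sketch below uses a hypothetical topology (one server with two adapters, two edge switches on each side, and two core switches); the device names and the helper functions are illustrative assumptions, not part of any vendor tool.

```python
from collections import deque


def reachable(links, start, goal, removed):
    """Breadth-first search from start to goal, skipping the removed device."""
    if removed in (start, goal):
        return False
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbor in links.get(node, ()):
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def single_points_of_failure(links, start, goal):
    """Return devices whose individual failure disconnects start from goal."""
    devices = set(links) - {start, goal}
    return sorted(d for d in devices if not reachable(links, start, goal, d))


def undirected(pairs):
    """Build an adjacency map from a list of (device, device) cables."""
    links = {}
    for a, b in pairs:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links


# Hypothetical topology: every edge switch is cabled to both core switches.
cables = [
    ("server", "hba1"), ("server", "hba2"),
    ("hba1", "edge_a"), ("hba2", "edge_b"),
    ("edge_a", "core1"), ("edge_a", "core2"),
    ("edge_b", "core1"), ("edge_b", "core2"),
    ("core1", "farm_edge_a"), ("core1", "farm_edge_b"),
    ("core2", "farm_edge_a"), ("core2", "farm_edge_b"),
    ("farm_edge_a", "disk_farm"), ("farm_edge_b", "disk_farm"),
]
redundant = undirected(cables)
print(single_points_of_failure(redundant, "server", "disk_farm"))
```

For the doubled topology the list is empty. Dropping the second adapter from the cable list makes the remaining adapter and its edge switch show up as single points of failure.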
You might have to make several decisions when designing a fault-tolerant data storage system. First, you should decide whether to use RAID or a SAN. If you use RAID, you’ll need to decide whether to use a hardware implementation or a software implementation. If you use a software implementation, you must decide whether to use RAID-1 or RAID-5. Table 3.4 describes each of these strategies.
Table 3.4 Storage Strategies
A SAN is a good strategy to use if you need more than 100 to 200 GB of storage capacity. Although the initial cost of implementing a SAN is large, the long-term TCO might be lower than that of RAID. In addition, the cost and management of the storage within the SAN can be kept to a minimum while providing high availability. If the cost of a SAN can’t be justified, use a RAID configuration.
A hardware implementation can offer performance improvements over software implementations of RAID, and hardware can sometimes support hot swapping and hot sparing. However, a hardware implementation is more expensive than a software implementation, and your equipment options might be limited to a specific vendor. Software fault tolerance is cheaper, but if you have a drive failure, downtime is required to replace the failed drive.
RAID-5 volumes have a lower cost per MB and better read performance than RAID-1 volumes. However, RAID-1 volumes can protect system or boot partitions and have better write performance. You can eliminate any single point of failure when implementing RAID-1 by using disk duplexing.
If you need more than 100 to 200 GB of storage capacity and you can justify the expense, you should use a SAN to provide fault-tolerant storage. Otherwise, you should use RAID. A hardware implementation of RAID is preferable to a software implementation if you’re willing to make the investment and if you can work within the limits of vendor specifications. If you decide to use a software implementation, use RAID-1 for applications that require high availability and don’t require a lot of disk space. You should use RAID-5 for environments with mostly read operations and occasional write operations. When implementing RAID, use disk duplexing.
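The recommendations above amount to a short decision sequence, which can be written out as a function. This is only a sketch: the function and parameter names are hypothetical, and the 200-GB threshold is an assumption taken from the upper end of the "100 to 200 GB" range in the text.

```python
def recommend_storage(capacity_gb, san_cost_justified,
                      hardware_raid_affordable, read_mostly):
    """Return a storage strategy following the lesson's recommendations.

    The 200-GB threshold is an assumption based on the upper end of the
    "100 to 200 GB" range given in the text.
    """
    if capacity_gb > 200 and san_cost_justified:
        return "SAN"
    if hardware_raid_affordable:
        return "hardware RAID"
    if read_mostly:
        return "software RAID-5"
    return "software RAID-1"


print(recommend_storage(500, True, True, False))
print(recommend_storage(80, False, False, True))
print(recommend_storage(80, False, False, False))
```

A real design decision would weigh more factors, such as whether the boot or system partition needs protection, which software RAID-5 can't provide.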
The Tailspin Toys company maintains a relational database that contains customer information. The database is stored on a dedicated Windows 2000 Server computer that’s configured with SQL Server 2000. RAID-1 and RAID-5 are used to ensure fault tolerance. Figure 3.4 shows how the logical partitions, logical disks, and physical disks are set up on the server.
Figure 3.4 - A fault-tolerant disk configuration in a Windows 2000 Server computer
The operating system is stored on one mirror set (partition C:), and the database log files are stored on a second mirror set (partition D:). Partition E:, a RAID-5 volume, contains the database files. The log files are kept on partition D:, separate from partition E:, because they’re write-intensive and RAID-1 is better suited to write-intensive operations. The database files are kept on partition E: because RAID-5 is better suited to large sequential reads and to large databases where reads occur more often than writes.
One method of protecting data is to use RAID. You can implement RAID fault tolerance as either a software or hardware solution. The software implementation is available in Windows 2000 Server, which supports RAID-1 and RAID-5. RAID-1 uses the fault tolerance driver (Ftdisk.sys) to write the same data to a volume on each of two physical disks simultaneously. You can install a second controller in the computer so that each disk in the mirrored volume has its own controller. In a RAID-5 volume, Windows 2000 achieves fault tolerance by adding a parity-information stripe to each disk partition in the volume. Generally, RAID-1 volumes offer read and write performance comparable to that of single disks. RAID-5 volumes offer better read performance than mirrored volumes but poorer write performance. RAID-5 is also less expensive to implement than RAID-1.
Some hardware vendors implement RAID data protection directly in their hardware, as with disk array controller cards. Hardware solutions are more expensive than software solutions, but performance is better and you can replace a disk without shutting down the computer.
To store large amounts of data, you can use a SAN to make data available in the event of a disaster. A SAN provides fault-tolerant storage on a large scale. However, a SAN is expensive to implement. If you need more than 100 to 200 GB of storage capacity and you can justify the expense, you should use a SAN to provide fault-tolerant storage. Otherwise, you should use RAID.