Lesson 2: Designing Data Storage for High Availability | MCSE Training Kit (Exam 70-226): Designing Highly Available Web Solutions with Microsoft Windows 2000 Server Technologies (MCSE Training Kits)

Fully fault-tolerant systems use fault-tolerant disk arrays, such as RAID, or storage area networks (SANs) to prevent the loss of data. In this lesson you’ll learn how to design a fault-tolerant data storage system by using RAID or SANs.

After this lesson, you will be able to

Design a data storage system that uses RAID to provide fault tolerance
Incorporate a SAN into your data storage system to provide fault tolerance

Estimated lesson time: 30 minutes

Disk Fault Tolerance

One method you can use to protect data is RAID. RAID provides fault tolerance by implementing data redundancy. With data redundancy, a computer writes data to more than one disk, which protects the data in the event of a single hard disk failure. You can implement RAID fault tolerance as either a software or hardware solution. The software implementation is available in Windows 2000 Server.

Although the data is available and current in a fault-tolerant system, you should still make backups to protect the information on hard disks from erroneous deletions, fire, theft, or other disasters. Disk fault tolerance isn’t an alternative to a backup strategy with offsite storage, which is the best insurance for recovering lost or damaged data.

If you experience the loss of a hard disk due to mechanical or electrical failure and haven’t implemented fault tolerance, your only option for recovering the data on the failed drive is to replace the hard disk and restore your data from a backup. However, the loss of access to the data while you replace the hard disk and restore your data can translate into lost time and money.

To store large amounts of data, you can use a SAN to make data available in the event of a disaster. A SAN provides fault tolerance on a large scale.

Software Implementations of RAID

With software implementations of RAID, there’s no fault tolerance following a failure until the fault is repaired. If a second fault occurs before the data lost from the first fault is regenerated, you can recover the data only by restoring it from a backup.

Windows 2000 Server supports two software implementations of RAID that provide fault tolerance: mirrored volumes (RAID-1) and striped volumes with parity (RAID-5). In Windows 2000, you can create new RAID volumes only on Windows 2000 dynamic disks.

When you upgrade Windows NT 4 to Windows 2000, any existing mirror sets or stripe sets with parity are retained. Windows 2000 provides limited support for these fault tolerance sets, allowing you to manage and delete them.

Mirrored Volumes (RAID-1)

A mirrored volume uses the Windows 2000 Server fault tolerance driver (Ftdisk.sys) to write the same data to a volume on each of two physical disks simultaneously, as shown in Figure 3.2. Each volume is considered a member of the mirrored volume. Implementing a mirrored volume helps to ensure the survival of data in the event that one member of the mirrored volume fails.

Figure 3.2 - Mirrored volume

A mirrored volume can contain any partition, including the boot or system partition; however, both disks in a mirrored volume must be Windows 2000 dynamic disks.

Performance on Mirrored Volumes

Mirrored volumes can enhance read performance because the fault tolerance driver reads from both members of the volume at once. There can be a slight decrease in write performance because the fault tolerance driver must write to both members. When one member of a mirrored volume fails, performance returns to normal because the fault tolerance driver works with only a single partition.

Because disk space usage is only 50 percent (two members for one set of data), mirrored volumes can be expensive.

Deleting a mirrored volume will delete all the information stored on both disks.
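The read and write behavior of a mirrored volume can be illustrated with a small model. The following is a minimal sketch, not the Windows 2000 fault tolerance driver: the class name MirroredVolume and its methods are hypothetical, and each member is represented by an in-memory dictionary rather than a disk.

```python
class MirroredVolume:
    """Toy model of a two-member mirror; each member maps block number -> data."""

    def __init__(self):
        self.members = [{}, {}]
        self.failed = [False, False]

    def write(self, block, data):
        # Every write goes to each healthy member, which is why writes
        # can be slightly slower than on a single disk.
        for index, member in enumerate(self.members):
            if not self.failed[index]:
                member[block] = data

    def read(self, block):
        # A read can be served by any healthy member.
        for index, member in enumerate(self.members):
            if not self.failed[index]:
                return member[block]
        raise IOError("both members failed; restore from backup")

    def fail(self, index):
        # Simulate the loss of one member (hypothetical helper).
        self.failed[index] = True


volume = MirroredVolume()
volume.write(0, b"customer record")
volume.fail(0)            # first disk is lost
print(volume.read(0))     # data is still served by the second member
```

If the second member is also marked as failed, the read raises an error, which corresponds to the case where the only remaining option is to restore from a backup.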

Disk Duplexing

If the same disk controller controls both physical disks in a mirrored volume and the disk controller fails, neither member of the mirrored volume is accessible. You can install a second controller in the computer so that each disk in the mirrored volume has its own controller. This arrangement, called disk duplexing, can protect the mirrored volume against both controller failure and hard disk failure. Some hardware implementations of disk duplexing use two or more channels on a single disk controller card.

Disk duplexing reduces bus traffic and potentially improves read performance. Disk duplexing is a hardware enhancement to a Windows 2000 mirrored volume and requires no additional software configuration.

Striped Volumes with Parity (RAID-5)

Windows 2000 Server also supports fault tolerance through striped volumes with parity. Parity is a mathematical technique that records whether the number of set bits across a series of numbers is odd or even. You can use this information to reconstruct the data if one number in the series is lost.

In a RAID-5 volume, Windows 2000 achieves fault tolerance by adding a parity-information stripe to each disk partition in the volume, as shown in Figure 3.3. If a single disk fails, Windows 2000 can use the data and parity information on the remaining disks to reconstruct the data that was on the failed disk.

Figure 3.3 - RAID-5 parity-information stripes
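The reconstruction described above can be sketched in a few lines. This example assumes that parity is computed as a byte-wise exclusive OR (XOR), which is the usual way RAID-5 parity is calculated; the lesson itself does not name the operation. The block contents and the function name xor_blocks are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks that would sit on three different disks in one stripe.
data_blocks = [b"WEB1", b"SQL2", b"LOG3"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data_blocks)

# Simulate losing the second disk: XOR the surviving blocks with the parity.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_blocks[1])  # True: the lost block is recovered
```

The same calculation also shows why a second failure is not recoverable: with two blocks missing, the surviving blocks and the parity no longer determine either one.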

Because of the parity calculation, write operations on a RAID-5 volume are slower than on a mirrored volume. However, RAID-5 volumes provide better read performance than mirrored volumes, especially with multiple controllers, because data is distributed among multiple drives. If a disk fails, however, the read performance on a RAID-5 volume slows while Windows 2000 Server reconstructs the data for the failed disk by using parity information.

RAID-5 volumes have a cost advantage over mirrored volumes because disk usage is optimized. The more disks you have in the RAID-5 volume, the smaller the share of total disk space that the parity stripe consumes. Table 3.2 shows how the percentage of space required for parity decreases as 2-gigabyte (GB) disks are added to the RAID-5 volume.

Table 3.2 RAID-5 Redundancy

Number of Disks	Total Disk Space	Usable Disk Space	Redundancy
3	6 GB	4 GB	33 percent
4	8 GB	6 GB	25 percent
5	10 GB	8 GB	20 percent
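The figures in Table 3.2 follow from a simple rule: the equivalent of one disk holds parity, so the redundancy is one divided by the number of disks. The sketch below reproduces the table rows under that rule. The function name raid5_capacity is hypothetical, and the range check reflects the 3-to-32 disk limit that this lesson gives for software RAID-5.

```python
def raid5_capacity(disk_count, disk_size_gb):
    """Return (total_gb, usable_gb, redundancy_percent) for a RAID-5 volume."""
    if not 3 <= disk_count <= 32:
        raise ValueError("software RAID-5 needs between 3 and 32 disks")
    total_gb = disk_count * disk_size_gb
    usable_gb = (disk_count - 1) * disk_size_gb  # one disk's worth holds parity
    redundancy_percent = round(100 / disk_count)
    return total_gb, usable_gb, redundancy_percent


for disks in (3, 4, 5):
    print(disks, raid5_capacity(disks, 2))
```

By the same rule, a mirrored volume built from two disks always has 50 percent redundancy, regardless of disk size.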

Software RAID-5 volumes are subject to some restrictions. First, a RAID-5 volume requires a minimum of 3 drives and supports a maximum of 32 drives. Second, a software-level RAID-5 volume can’t contain the boot or system partition.

The Windows 2000 operating system isn’t aware of RAID implementations in hardware. Therefore, the restrictions that apply to software-level RAID don’t apply to hardware-level RAID configurations.

RAID-1 Volumes vs. RAID-5 Volumes

RAID-1 volumes and RAID-5 volumes provide different levels of fault tolerance. Deciding which option to implement depends on the level of protection you require and the cost of hardware. The major differences between RAID-1 and RAID-5 volumes are performance and cost. Table 3.3 describes some differences between software-level RAID-1 and RAID-5.

Table 3.3 Comparing RAID-1 and RAID-5

Mirrored Volumes (RAID-1)	Striped Volumes with Parity (RAID-5)
Supports file allocation table (FAT) and NT file system (NTFS)	Supports FAT and NTFS
Can protect system or boot partition	Can’t protect system or boot partition
Requires two hard disks	Requires a minimum of 3 hard disks and allows a maximum of 32 hard disks
Has a higher cost per megabyte	Has a lower cost per megabyte
50 percent used for redundancy	Equivalent of one physical drive used for redundancy
Has good write performance	Has moderate write performance
Has good read performance	Has excellent read performance
Uses less system memory	Requires more system memory

Generally, mirrored volumes offer read and write performance comparable to that of single disks. RAID-5 volumes offer better read performance than mirrored volumes, especially with multiple controllers, because data is distributed among multiple drives. However, the need to calculate parity information requires more computer memory, which can slow write performance.

Mirroring uses only 50 percent of the available disk space, so it’s more expensive in cost per MB than disks without mirroring. RAID-5 uses 33 percent of the available disk space for parity information when you use the minimum number of hard disks (three). With RAID-5, disk utilization improves as you increase the number of hard disks.

Hardware Implementations of RAID

In a hardware solution, the disk controller interface handles the creation and regeneration of redundant information. Some hardware vendors implement RAID data protection directly in their hardware, as with disk array controller cards. Because these methods are vendor specific and bypass the fault tolerance software drivers of the operating system, they offer performance improvements over software implementations of RAID. In addition, hardware implementations of RAID usually include extra features, such as additional fault-tolerant RAID configurations, hot swapping of failed hard disks, hot sparing for online failover, and dedicated cache memory for improved performance.

The level of RAID supported in a hardware implementation depends on the hardware manufacturer.

Consider the following points when deciding whether to use a software or hardware implementation of RAID:

Hardware fault tolerance is more expensive than software fault tolerance.
Hardware fault tolerance generally provides faster disk I/O than software fault tolerance.
Hardware fault tolerance solutions might limit equipment options to a single vendor.
Hardware fault tolerance solutions might implement hot swapping of hard disks to allow for replacement of a failed hard disk without shutting down the computer and hot sparing so that a failed disk is automatically replaced by an online spare.

With hardware RAID, mirrored volumes can be striped across multiple disks. This configuration is often referred to as RAID-10: RAID-1 mirroring and RAID-0 striping. Unlike RAID-0, RAID-10 is a fault-tolerant RAID configuration because each disk in the stripe is also mirrored. RAID-10 improves disk I/O by performing read and write operations across the stripe.
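The fault tolerance of RAID-10 can be illustrated with a short sketch. The disk numbering used here (disks 2n and 2n+1 form mirrored pair n) and both function names are assumptions made for illustration; actual controllers choose their own layout.

```python
def raid10_pair_for_block(block, pair_count):
    """Striping: consecutive logical blocks rotate across the mirrored pairs."""
    return block % pair_count


def raid10_survives(failed_disks, pair_count):
    """Data survives as long as no mirrored pair has lost both of its disks."""
    failed = set(failed_disks)
    for pair in range(pair_count):
        first, second = 2 * pair, 2 * pair + 1  # hypothetical disk numbering
        if first in failed and second in failed:
            return False
    return True


print(raid10_survives([0, 3], pair_count=2))  # True: one disk lost from each pair
print(raid10_survives([2, 3], pair_count=2))  # False: both disks of one pair lost
```

The sketch shows that a RAID-10 set can survive more than one disk failure, provided the failed disks belong to different mirrored pairs.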

SANs

A storage area network (SAN) is an elaborate network composed of one or more storage systems, each capable of providing terabytes of disk storage capacity at very high transfer rates. Most SANs use Fibre Channel technology and are capable of providing I/O throughputs in the gigabits-per-second (Gbps) range (100 to 200 megabytes per second [MBps] or higher). A SAN also allows for flexible configurations, is very scalable, and ensures high availability for mission-critical data storage.

SANs can improve performance for many applications that move large amounts of data between multiple servers over the network: Network resources are freed up for other transactions, and bulk data transfers are performed on the SAN at a much faster rate by utilizing the SAN Fibre Channel network. For example, before implementing SANs, one organization maintained a large sales database that performed five 70-GB transfers over the network per weekend and incurred 24 hours of planned downtime. With the SAN architecture, the same operation takes only two to three hours and the network isn’t used.
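A rough calculation shows why the improvement in this example is plausible. The sketch below takes 1 GB as 1,000 MB, assumes a sustained rate equal to the 100 to 200 MBps quoted earlier, and treats the 24 hours of planned downtime as the window in which the transfers had to finish. All three are simplifying assumptions, and the function name transfer_hours is hypothetical.

```python
def transfer_hours(total_gb, rate_mb_per_s):
    """Hours needed to move total_gb at a sustained rate, taking 1 GB as 1,000 MB."""
    seconds = total_gb * 1000 / rate_mb_per_s
    return seconds / 3600


total_gb = 5 * 70  # five 70-GB transfers per weekend

for rate in (100, 200):
    print(f"{rate} MBps: {transfer_hours(total_gb, rate):.2f} hours")

# Average rate implied by moving the same data within a 24-hour window.
implied_lan_rate = total_gb * 1000 / (24 * 3600)
print(f"implied LAN average: {implied_lan_rate:.1f} MBps")
```

At raw Fibre Channel speeds the 350 GB would move in about half an hour to an hour, so the reported two to three hours leaves room for processing and other overhead. The 24-hour window, by contrast, corresponds to an average of only about 4 MBps.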

Key elements common to different kinds of hardware-specific SANs include the following:

Externalized storage: Storage that isn’t installed for private single server access
Centralized storage: Storage that can be centrally located, managed, and controlled
Remote clustering: Storage that enables single server and multiserver access

The hardware components that make up a SAN are similar to those of a network with storage elements but vary depending on the type of SAN being implemented and the hardware vendor. Host servers require Fibre Channel interfaces, known as host bus adapters (HBAs), to interface with the SAN. Storage components such as tape drives, disk drives, RAID controllers, hubs, switches, storage processors, disk enclosures, and arrays make up the SAN itself.

In recent years there’s been more and more interest in using SANs to provide fault-tolerant large-scale storage for both files and network applications. However, because of the large initial cost of implementing a SAN, SANs are generally used only in networks that need more than 100 to 200 GB of storage capacity.

You can use a SAN to centralize your data storage and simplify administration of backups and restores. SANs remove the storage function from general-purpose servers onto a high-speed network specifically designed for moving large amounts of data. This process provides the following advantages:

Optimal server rack space created by moving disk arrays out of the rack
Increased security created by storing data in a separate network not vulnerable to currently known types of attacks
LAN-free backups provided by keeping backup traffic off the data network

If you can justify the cost, the SAN will allow you to build a storage solution with far greater scalability than multiple arrays. In addition, the cost of ownership is lower due to centralized management and higher availability of the storage within the SAN. For example, when you use a SAN with your Microsoft Exchange mailbox servers, you can back up and restore your mailbox data much faster, which in turn helps fulfill service level agreements (SLAs) and maximizes the number of users that one server can host.

Although a deployment can be expensive, a SAN solution could be preferable because the long-term total cost of ownership (TCO) might be lower than the cost of maintaining many small arrays. Consider the following advantages of a SAN solution:

If you currently have multiple arrays managed by multiple administrators, centralized administration of all storage could allow administrators to be available for other tasks.
No other single solution has the potential to offer the comprehensive and flexible reliability that a vendor-supported SAN provides.

Hardware vendors implement most SAN solutions. Find a SAN provider who will help you with the process of designing, installing, and maintaining your SAN and discuss your storage needs with them. They’ll then be able to configure the SAN to offer you the best combination of performance, security, and storage group and tracking log distribution.

Before you invest in a SAN, calculate the cost of your current storage solution in terms of hardware and administrative resources and evaluate the company’s need for dependable storage. Then calculate whether moving to a SAN would provide a greater overall cost and reliability benefit than maintaining multiple arrays would.

SAN Connectivity

In the past, SANs have been implemented by using a dedicated direct connection or a Fibre Channel–arbitrated loop. Newer Fibre Channel fabric switches provide much higher levels of throughput and allow administrators to design SANs that minimize or eliminate any single points of failure.

At minimum, a switched Fibre Channel (FC) SAN includes the following:

Interconnected switches or switches cascaded via E-ports
Several switches at the edge—one switch for each LAN connected to the SAN
An FC interface in each server, which is connected to its local SAN switch
A switch for the SAN disk farm, which is connected to both core switches
A switch for the SAN backup device, which is connected to both core switches

Note that each edge switch is connected to both core switches.

Eliminating Points of Failure

When implementing a SAN, double all devices except the core: use two Fibre Channel adapters in each server, two edge switches in each LAN, two edge switches for the SAN disk farm, and two edge switches for the SAN backup device.
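One way to check a design against this rule is to model the SAN as a graph and test whether removing any single device cuts a server off from the disk farm. The sketch below does this with a breadth-first search. The device names and both topologies are hypothetical, and the check covers only the devices between the server and the disk farm, not the endpoints themselves.

```python
from collections import deque


def connected(links, start, goal, removed):
    """Breadth-first search from start to goal, skipping the removed device."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbor in links.get(node, ()):
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def single_points_of_failure(edges, start, goal):
    """Return the devices whose loss alone cuts start off from goal."""
    links = {}
    for a, b in edges:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    devices = set(links) - {start, goal}
    return sorted(d for d in devices if not connected(links, start, goal, d))


# Minimal design: one adapter and one edge switch per role, two core switches.
minimal = [
    ("server", "hba1"), ("hba1", "lan_edge1"),
    ("lan_edge1", "core1"), ("lan_edge1", "core2"),
    ("core1", "disk_edge1"), ("core2", "disk_edge1"),
    ("disk_edge1", "disk_farm"),
]

# Doubled design: two adapters, two LAN edge switches, two disk edge switches.
doubled = minimal + [
    ("server", "hba2"), ("hba2", "lan_edge2"),
    ("lan_edge2", "core1"), ("lan_edge2", "core2"),
    ("core1", "disk_edge2"), ("core2", "disk_edge2"),
    ("disk_edge2", "disk_farm"),
]

print(single_points_of_failure(minimal, "server", "disk_farm"))
print(single_points_of_failure(doubled, "server", "disk_farm"))
```

In the minimal design, the adapter and both edge switches are reported as single points of failure, while the two core switches are not. In the doubled design the list is empty.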

Making a Decision

You might have to make several decisions when designing a fault-tolerant data storage system. First, you should decide whether to use RAID or a SAN. If you use RAID, you’ll need to decide whether to use a hardware implementation or a software implementation. If you use a software implementation, you must decide whether to use RAID-1 or RAID-5. Table 3.4 describes each of these strategies.

Table 3.4 Storage Strategies

Strategy	Description
SAN/RAID	A SAN is a good strategy to use if you need more than 100 to 200 GB of storage capacity. Although the initial cost of implementing a SAN is large, the long-term TCO might be lower than RAID. In addition, the cost and management of the storage within the SAN can be kept to a minimum while providing high availability. If the cost of a SAN can’t be justified, use a RAID configuration.
Hardware/software RAID	A hardware implementation can offer performance improvements over software implementations of RAID, and hardware can sometimes support hot swapping and hot sparing. However, a hardware implementation is more expensive than a software implementation, and your equipment options might be limited to a specific vendor. Software fault tolerance is cheaper, but if you have a drive failure, downtime is required to replace the failed drive.
RAID-1/RAID-5	RAID-5 volumes have a lower cost per MB and better read performance than RAID-1 volumes. However, RAID-1 volumes can protect system or boot partitions and have better write performance. You can eliminate any single point of failure when implementing RAID-1 by using disk duplexing.

Recommendations

If you need more than 100 to 200 GB of storage capacity and you can justify the expense, you should use a SAN to provide fault-tolerant storage. Otherwise, you should use RAID. A hardware implementation of RAID is preferable to a software implementation if you’re willing to make the investment and if you can work within the limits of vendor specifications. If you decide to use a software implementation, use RAID-1 for applications that require high availability and don’t require a lot of disk space. You should use RAID-5 for environments with mostly read operations and occasional write operations. When implementing RAID, use disk duplexing.
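These recommendations can be condensed into a small decision function. This is a simplified sketch: the function name and parameters are hypothetical, the lower bound of the 100 to 200 GB range is used as the threshold by assumption, and a real design decision would weigh more factors than the ones shown.

```python
def recommend_storage(capacity_gb, san_cost_justified, hardware_raid_ok,
                      mostly_reads=False):
    """Return the storage strategy suggested by the lesson's recommendations."""
    # The lesson gives a range (100 to 200 GB); using 100 GB is an assumption.
    if capacity_gb > 100 and san_cost_justified:
        return "SAN"
    # Hardware RAID is preferred when the investment and vendor limits are acceptable.
    if hardware_raid_ok:
        return "hardware RAID"
    # Software RAID-5 suits mostly-read workloads with occasional writes.
    if mostly_reads:
        return "software RAID-5"
    # Otherwise use mirroring, ideally with disk duplexing.
    return "software RAID-1"


print(recommend_storage(500, san_cost_justified=True, hardware_raid_ok=True))
print(recommend_storage(50, san_cost_justified=False, hardware_raid_ok=False,
                        mostly_reads=True))
```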

Example: RAID Configuration for Tailspin Toys

The Tailspin Toys company maintains a relational database that contains customer information. The database is stored on a dedicated Windows 2000 Server computer that’s configured with SQL Server 2000. RAID-1 and RAID-5 are used to ensure fault tolerance. Figure 3.4 shows how the logical partitions, logical disks, and physical disks are set up on the server.

Figure 3.4 - A fault-tolerant disk configuration in a Windows 2000 Server computer

The operating system is stored on one mirror set (partition C:), and the database log files are stored on a second mirror set (partition D:). Partition E:, a RAID-5 volume, contains the database files. The log files on Partition D: are kept separate from the database files on Partition E: because the log files are write-intensive and RAID-1 is better suited to write-intensive operations. The database files are placed on RAID-5 because it is better suited to large sequential reads and to large databases where reads occur more often than writes.

Lesson Summary

One method of protecting data is to use RAID. You can implement RAID fault tolerance as either a software or hardware solution. The software implementation is available in Windows 2000 Server, which supports RAID-1 and RAID-5. RAID-1 uses the fault tolerance driver (Ftdisk.sys) to write the same data to a volume on each of two physical disks simultaneously. You can install a second controller in the computer so that each disk in the mirrored volume has its own controller. In a RAID-5 volume, Windows 2000 achieves fault tolerance by adding a parity-information stripe to each disk partition in the volume. Generally, RAID-1 volumes offer read and write performance comparable to that of single disks. RAID-5 volumes offer better read performance than mirrored volumes but poorer write performance. RAID-5 also has a lower cost per megabyte than RAID-1.

Some hardware vendors implement RAID data protection directly in their hardware, as with disk array controller cards. Hardware solutions are more expensive than software solutions, but performance is generally better, and solutions that support hot swapping let you replace a failed disk without shutting down the computer.

To store large amounts of data, you can use a SAN to make data available in the event of a disaster. A SAN provides fault-tolerant storage on a large scale. However, a SAN is expensive to implement. If you need more than 100 to 200 GB of storage capacity and you can justify the expense, you should use a SAN to provide fault-tolerant storage. Otherwise you should use RAID.