Disk Arrays

The most common hardware malfunction is probably a hard disk failure. Even though hard disks have become more reliable over time, they are still subject to failure, especially during their first month or so of use. They are also subject to both catastrophic and degenerative failures caused by power problems. Fortunately, disk arrays have become the norm for most servers, and good fault-tolerant RAID systems are available in Windows 2000 Server and RAID-specific hardware supported by Windows 2000.

The choice of software or hardware RAID, and the particulars of how you configure your RAID system, can significantly affect the cost of your servers. To make an informed choice for your environment and needs, you must understand the trade-offs and the differences in fault tolerance, speed, configurability, and so on.

Hardware vs. Software

RAID can be implemented at the hardware level, using RAID controllers, or at the software level, either by the operating system or by a third-party add-on. Windows 2000 supports both hardware RAID and its own software RAID.

Hardware RAID implementations require specialized controllers and cost much more than an equal level of software RAID. But for that extra price, you get faster, more flexible, and more fault-tolerant RAID. When compared to the software RAID provided in Windows 2000 Server, a good hardware RAID controller supports more levels of RAID, on-the-fly reconfiguration of the arrays, hot-swap and hot-spare drives (discussed later in this chapter), and dedicated caching of both reads and writes.

The Windows 2000 Server software RAID requires that you convert your disks to dynamic disks. The disks will no longer be available to other operating systems, although this really shouldn't be a problem in a production environment. However, you should consider carefully whether you want to convert your boot disk to a dynamic disk. Dynamic disks can be more difficult to access if a problem occurs, and the Windows 2000 setup and installation program provides only limited support. For maximum fault tolerance, we recommend using hardware mirroring on your boot drive; if you do use software mirroring, make sure that you create the required fault-tolerant boot floppy disk and test it thoroughly before you need it. (See Chapter 33.)

RAID Levels for Fault Tolerance

Except for level 0, RAID is a mechanism for storing sufficient information on a group of hard disks such that even if one hard disk in the group fails, no information is lost. Some RAID arrangements go even further, providing protection in the event of multiple hard disk failures. The more common levels of RAID and their appropriateness in a fault-tolerant environment are shown in Table 35-1.

Table 35-1. RAID levels and their fault tolerance

Level | Number of Disks* | Speed | Fault Tolerance | Description
0 | N | +++ | --- | Striping alone. Not fault-tolerant, but provides the fastest read and write performance.
1 | 2N | + | ++ | Mirror or duplex. Slightly faster read than a single disk, but no gain during write operations. Failure of any single disk causes no loss of data and a minimal performance hit.
3 | N+1 | ++ | + | Byte-level parity. Data is striped across multiple drives at the byte level, with the parity information written to a single dedicated drive. Reads are much faster than with a single disk, but writes are slightly slower than with a single disk because parity information must be generated and written to a single disk. Failure of any single disk causes no loss of data but can cause a significant loss of performance.
4 | N+1 | ++ | + | Block-level parity with a dedicated parity disk. Similar to RAID-3 except that data is striped at the block level.
5 | N+1 | + | ++ | Interleaved block-level parity. Parity information is distributed across all drives. Reads are much faster than with a single disk, but writes are significantly slower. Failure of any single disk causes no loss of data but results in a major reduction in performance.
0+1 (also known as level 10) | 2N | +++ | ++ | Striped mirrored disks. Data is striped across multiple mirrored disks. Failure of any one disk causes no data loss and no speed loss. Failure of a second disk could result in data loss. Faster than a single disk for both reads and writes.
Other | Varies | +++ | +++ | Array of RAID arrays. Different hardware vendors have different proprietary names for this RAID concept. Excellent read and write performance. Failure of any one drive results in no loss of performance and continued redundancy.

* In the Number of Disks column, N refers to the number of hard disks required to hold the original copy of the data. The plus and minus symbols show relative improvement or deterioration compared to a system using no version of RAID. The scale peaks at three symbols.

When choosing the RAID level to use for a given application or server, consider the following factors:

  • Intended use: Will this application be primarily read-intensive, such as file serving, or will it be predominantly write-intensive, such as a transactional database?
  • Fault tolerance: How critical is this data, and how much can you afford to lose?
  • Availability: Does this server or application need to be available at all times, or can you afford to reboot it or otherwise take it offline for brief periods?
  • Performance: Is this application or server heavily used, with large amounts of data being transferred to and from it, or is this server or application less I/O intensive?
  • Cost: Are you on a tight budget for this server or application, or is the cost of data loss or unavailability the primary driving factor?

You need to evaluate each of these factors when you decide which type of RAID to use for a server or portion of a server. No one answer fits all cases, but the final answer will require you to carefully weigh each of these factors and balance them against your situation and your needs. The following sections take a closer look at each factor and how it weighs in the overall decision-making process.

Intended Use

The intended use, and the kind of disk access associated with that use, plays an important role in determining the best RAID level for your application. Think about how write-intensive the application is and whether the manner in which the application uses the data is more sequential or random. Is your application a three-square-meals-a-day kind of application, with relatively large chunks of data being read or written at a time, or is it more of a grazer or nibbler, reading and writing little bits of data from all sorts of different places?

If your application is relatively write-intensive, avoid software RAID if possible, and avoid RAID-5 unless other considerations force you to use it. With RAID-5, any application in which writes make up more than 50 percent of disk operations is likely to be at least somewhat slower, if not much slower, than it would be on a single disk. You can mitigate this to some extent by using more but smaller drives in your array and by using a hardware controller with a large cache to off-load the parity processing as much as possible. RAID-1, in either a mirror or duplex configuration, provides a high degree of fault tolerance with no significant penalty during write operations, which makes it a good choice for the Windows 2000 system disk.

If your application is primarily read-intensive, and the data is stored and referenced sequentially, RAID-3 or RAID-4 may be a good choice. Because the data is striped across many drives, you have parallel access to it, improving your throughput. And since the parity information is stored on a single drive, rather than dispersed across the array, sequential read operations don't have to skip over the parity information and are therefore faster. However, write operations will be substantially slower, and the single parity drive can become an I/O bottleneck.

If your application is primarily read-intensive and not necessarily sequential, RAID-5 is an obvious choice. It provides a good balance of speed and fault tolerance, and the cost is substantially less than RAID-1. Disk accesses are evenly distributed across multiple drives, and no one drive has the potential to be an I/O bottleneck. However, writes will require calculation of the parity information and the extra write of that parity, slowing write operations down significantly.
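The parity cost described above can be made concrete. The following is a minimal sketch, assuming the usual bytewise XOR parity (the text does not name the scheme); the function names are hypothetical and are not part of any Windows 2000 or controller API.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def stripe_parity(data_blocks: Sequence[bytes]) -> bytes:
    """Parity block for one stripe: the XOR of all its data blocks."""
    return reduce(xor_blocks, data_blocks)


def parity_after_small_write(old_data: bytes, new_data: bytes,
                             old_parity: bytes) -> bytes:
    """New parity after overwriting a single data block.

    The array must read the old data and old parity, then write the new
    data and new parity: four disk operations for one logical write.
    """
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)


# One stripe of three 2-byte data blocks (illustrative values only).
stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = stripe_parity(stripe)  # parity.hex() == "ee22"
```

The read-modify-write sequence in `parity_after_small_write` is the reason small random writes are slow on RAID-5, and it is the work a caching hardware controller tries to hide.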

If your application provides other mechanisms for data recovery or uses large amounts of temporary storage that doesn't require fault tolerance, a simple RAID-0, with no fault tolerance but fast reads and writes, is a possibility.
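The guidance in this section can be summarized as a rule of thumb. The sketch below is only an illustration of the access-pattern reasoning above; the function name and parameters are hypothetical, and it deliberately ignores the other factors (fault tolerance, availability, performance under failure, and cost) that the final decision must also weigh.

```python
def suggest_raid_level(write_intensive: bool, sequential: bool,
                       needs_fault_tolerance: bool = True) -> str:
    """Suggest a RAID level from the access pattern alone."""
    if not needs_fault_tolerance:
        # Temporary or otherwise recoverable data: striping alone is fastest.
        return "RAID-0"
    if write_intensive:
        # Mirroring avoids the parity penalty on writes.
        return "RAID-1 or RAID-0+1"
    if sequential:
        # A dedicated parity drive keeps sequential reads uninterrupted.
        return "RAID-3 or RAID-4"
    # Read-intensive, random access: distributed parity balances the load.
    return "RAID-5"
```

For example, `suggest_raid_level(write_intensive=False, sequential=False)` returns "RAID-5", matching the recommendation for read-intensive, non-sequential workloads.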

Fault Tolerance

Carefully examine the fault tolerance of each of the possible RAID choices for your intended use. All RAID levels, except RAID-0, provide some degree of fault tolerance, but the effect of a failure, and the ability to recover from subsequent failures, can be different.

If a drive in a RAID-1 mirror or duplex array fails, a full, complete, exact copy of the data remains. Access to your data or application is unimpeded, and performance degradation is minimal, although you will lose the benefit gained on read operations of being able to read from either disk. Until the failed disk is replaced, however, you will have no fault tolerance on the remaining disk.

In a RAID-3 or RAID-4 array, if one of the data disks fails, a significant performance degradation will occur since the missing data needs to be reconstructed from the parity information. Also, you'll have no fault tolerance until the failed disk is replaced. If it is the parity disk that fails, you'll have no fault tolerance until it is replaced, but also no performance degradation.

In a RAID-5 array, the loss of any disk will result in a significant performance degradation, and your fault tolerance will be gone until you replace the failed disk. Once you replace the disk, you won't return to fault tolerance until the entire array has a chance to rebuild itself, and performance will be seriously degraded during the rebuild process.
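To see why a failed disk in a parity array hurts performance, consider how the missing data is recovered. This is a minimal sketch, assuming bytewise XOR parity and a single stripe held in memory; the helper names are hypothetical.

```python
from functools import reduce
from typing import Sequence


def xor_all(blocks: Sequence[bytes]) -> bytes:
    """Bytewise XOR across any number of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks: Sequence[bytes]) -> bytes:
    """Recover the failed disk's block from every surviving block
    in the stripe (remaining data blocks plus the parity block)."""
    return xor_all(surviving_blocks)


# A stripe of three data blocks plus its parity block.
data = [b"RAID", b"-5 r", b"ocks"]
parity = xor_all(data)

# Suppose the disk holding data[1] fails.
survivors = [data[0], data[2], parity]
recovered = rebuild_missing(survivors)  # equals data[1]
```

Every read that touches the failed disk requires reading the matching block from all surviving disks, and a rebuild repeats this for every stripe in the array, which is why performance stays degraded until the rebuild finishes.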

RAID systems that are arrays of arrays can provide for multiple failure tolerance. These arrays provide for multiple levels of redundancy and are appropriate for mission-critical applications that must be able to withstand the failure of more than one drive in an array.

REAL WORLD  Multiple Disk Controllers Provide Increased Fault Tolerance
Spending the money for a hardware RAID system will increase your overall fault tolerance, but it can still leave a single point of failure in your disk subsystem: the disk controller itself. While failures of the disk controller are certainly less common, they do happen. Many hardware RAID systems are based on a single multiple-channel controller—certainly a better choice than those based on a single-channel controller, but an even better solution is a RAID system based on multiple identical controllers. In these systems, the failure of a single disk controller is not catastrophic but simply an annoyance. In RAID-1 this technique is known as duplexing, but it is also common with many of the proprietary arrays of arrays that are available from server vendors and in the third-party market.

Availability

All levels of RAID, except RAID-0, provide higher availability than a single drive. However, if availability is expanded to also include the overall performance level during failure mode, some RAID levels provide definite advantages over others. Specifically, RAID-1, mirroring/duplexing, provides enhanced availability when compared to RAID levels 3, 4, and 5 during failure mode. There is no performance degradation when compared to a single disk if one half of a mirror fails, while a RAID-5 array will have substantially compromised performance until the failed disk is replaced and the array is rebuilt.

In addition, RAID systems that are based on an array of arrays can provide higher availability than RAID levels 1 through 5. Running on multiple controllers, these arrays are able to tolerate the failure of more than one disk and the failure of one of the controllers, providing protection against the single point of failure inherent in any single-controller arrangement. RAID-1 that uses duplexed disks running on different controllers—as opposed to RAID-1 that uses mirroring on the same controller—also provides this additional protection and improved availability.

Hot-swap drives and, especially, hot-spare drives (discussed later in this chapter) can further improve availability in critical environments. By providing for automatic failover and rebuilding, hot spares can reduce your exposure to catastrophic failure and provide for maximum availability.

Performance

The relative performance of each RAID level depends on the intended use. The best compromise for many situations is arguably RAID-5, but you should be suspicious of that compromise if your application is fairly write-intensive. Especially for relational database data and index files where the database is moderately or highly write-intensive, the performance hit of using RAID-5 can be substantial. A better alternative is RAID-0+1 (called RAID-10 by some vendors).

Whatever level of RAID you choose for your particular application, it will benefit from using more small disks rather than a few large disks. The more drives contributing to the stripe of the array, the greater the benefit of parallel reading and writing you'll be able to realize—and your array's overall speed will improve.
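The benefit of more drives in the stripe can be illustrated with a simplified model. The sketch below assumes plain round-robin striping with one block per stripe unit and no parity; real controllers use configurable stripe sizes, and the function names are hypothetical.

```python
from typing import Tuple


def locate_block(logical_block: int, num_disks: int) -> Tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk)."""
    return logical_block % num_disks, logical_block // num_disks


def disks_touched(first_block: int, count: int, num_disks: int) -> int:
    """Number of distinct disks that serve a run of consecutive blocks."""
    blocks = range(first_block, first_block + count)
    return len({locate_block(b, num_disks)[0] for b in blocks})
```

In this model an 8-block request on a 2-disk stripe is served by 2 disks working in parallel, while the same request on an 8-disk stripe is spread across all 8.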

Cost

The delta in cost between RAID configurations is primarily the cost of drives, potentially including the cost of additional array enclosures because more drives are required for a particular level of RAID. RAID-1, either duplexing or mirroring, is the most expensive of the conventional RAID levels, since it requires at least 33 percent more raw disk space for a given amount of net storage space than other RAID levels.
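The 33 percent figure is the best case for mirroring: it corresponds to the smallest parity array, where N = 2. A short sketch of the arithmetic, using the disk counts from Table 35-1 (the function names are hypothetical):

```python
def raw_disks(level: str, n: int) -> int:
    """Disks required to hold N disks' worth of data, per Table 35-1."""
    if level == "0":
        return n
    if level in ("1", "0+1"):
        return 2 * n
    if level in ("3", "4", "5"):
        return n + 1
    raise ValueError("unsupported RAID level: " + level)


def mirror_premium_percent(n: int) -> float:
    """Extra raw disk space RAID-1 needs compared with RAID-5
    for the same net capacity, as a percentage."""
    return (raw_disks("1", n) / raw_disks("5", n) - 1) * 100
```

With N = 2, mirroring needs 4 disks against 3 for RAID-5, about 33 percent more. With N = 4 the comparison is 8 disks against 5, or 60 percent more, so the premium grows as the array grows.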

Another consideration is that RAID levels that include mirroring/duplexing must use drives in pairs. Therefore, it's more difficult (and more expensive) to add on to an array if you need additional space on the array. A net 18-GB RAID-0+1 array, comprising four 9-GB drives, requires four more 9-GB drives to double in size, a somewhat daunting prospect if your array cabinet has bays for only six drives, for example. A net 18-GB RAID-5 array, however, can be doubled in size simply by adding two more 9-GB drives, for a total of five drives.
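The expansion arithmetic in this example can be checked with a few lines. This is a sketch only, assuming identical drives and the 2N and N+1 disk counts from Table 35-1; the function name is hypothetical.

```python
def drives_to_add(level: str, drive_gb: int,
                  current_net_gb: int, target_net_gb: int) -> int:
    """Additional drives needed to grow an array to a target net size."""

    def drives_for(net_gb: int) -> int:
        n = -(-net_gb // drive_gb)  # ceiling division: data drives needed
        if level == "0+1":
            return 2 * n  # every data drive has a mirror
        if level == "5":
            return n + 1  # one drive's worth of parity
        raise ValueError("unsupported RAID level: " + level)

    return drives_for(target_net_gb) - drives_for(current_net_gb)


# Doubling a net 18-GB array built from 9-GB drives:
mirrored_extra = drives_to_add("0+1", 9, 18, 36)  # 4 more drives (4 -> 8)
parity_extra = drives_to_add("5", 9, 18, 36)      # 2 more drives (3 -> 5)
```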

Hot-Swap and Hot-Spare Disk Systems

Hardware RAID systems can provide for both hot-swap and hot-spare capabilities. A hot-swap disk system allows failed hard disks to be removed and a replacement disk inserted into the array without powering down the system or rebooting the server. When the new drive is inserted, it is automatically recognized and either will be automatically configured into the array or can be manually configured into it. Additionally, many hot-swap RAID systems allow you to add hard disks into empty slots dynamically, automatically or manually increasing the size of the RAID volume on the fly without a reboot.

A hot-spare RAID configuration uses one or more additional, preconfigured disks to automatically replace a failed disk. These systems usually don't support hot-swapping, so the failed disk can't be removed until the system can be powered down, but full fault tolerance is maintained by having the hot spare available.



Microsoft Windows 2000 Server Administrator's Companion, Vol. 1
ISBN: 1572318198
Year: 2000
Pages: 366