Overview of Common RAID Levels | Microsoft SQL Server 2000 Administrator's Companion

The main characteristic of a RAID array is that two or more physical disk drives are combined to form a logical disk drive, which appears to Windows 2000 (and Performance Monitor) as one physical disk drive. A logical disk drive can hold many hundreds of gigabytes, even though 100-GB disk drives don't exist (yet!).

Most of the RAID levels that will be described here use data striping. Data striping combines the data from two or more disks into one larger RAID logical disk, which is accomplished by placing the first piece of data on the first disk, the second piece of data on the second disk, and so on. These pieces are known as stripes, or chunks. The size of the stripe is determined by the controller. Some controllers allow you to configure the stripe size, whereas other controllers have a fixed stripe size.

The individual piece of data on each disk is referred to as a stripe, or chunk, but the combination of all of the chunks across all disk drives is also referred to as the stripe, as shown in Figure 5-8.

Thus, the term stripe can be used to describe the piece of data on a specific disk drive, as in the disk stripe, or to refer to the set of related data, as in the RAID stripe. Keep this in mind as you read this chapter and others that refer to RAID.

The RAID level identifies the configuration type and therefore the characteristics of a RAID array other than internal or external logic. One of the most important of these characteristics is fault tolerance. Fault tolerance is the ability of a RAID system to continue to function after a disk drive has failed. Fault tolerance is the primary purpose of RAID controllers. Because your data is valuable, you must protect it against a disk failure. In this section, you will learn about the most common RAID levels: how they work, what fault tolerance they provide, and how quickly they perform. There are other RAID levels that are rarely used; only the most popular ones will be mentioned.

Figure 5-8. RAID stripes.

RAID 0

RAID 0 is the most basic RAID level, offering disk striping only. A chunk is created on each disk drive, and the controller defines the size of the chunk. As Figure 5-9 illustrates, a round-robin method is used to distribute the data to each chunk of each disk in the RAID 0 array to create a large logical disk.

Figure 5-9. RAID 0.

Although RAID 0 is considered a RAID level, technically, there is no redundancy at this level. Because there is no redundancy, there is no fault tolerance. If any disk fails in a RAID 0 array, all data is lost. The loss of one disk would be equivalent to losing every fourth word in this book. With this portion of the data missing, the array is useless.
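
The round-robin placement described above can be sketched in a few lines of Python. This is only an illustration, not how any particular controller is implemented: the function names `locate_chunk` and `chunks_lost` are hypothetical, and the sketch assumes chunks are numbered from 0 and dealt out to the disks in strict rotation.

```python
def locate_chunk(logical_chunk: int, num_disks: int) -> tuple[int, int]:
    """Map a logical chunk number to (disk index, chunk index on that disk).

    Assumes strict round-robin placement starting at disk 0.
    """
    disk = logical_chunk % num_disks      # which disk receives this chunk
    offset = logical_chunk // num_disks   # how far down that disk it lands
    return disk, offset


def chunks_lost(failed_disk: int, num_disks: int, total_chunks: int) -> list[int]:
    """List the logical chunks that disappear when one disk fails."""
    return [
        chunk
        for chunk in range(total_chunks)
        if locate_chunk(chunk, num_disks)[0] == failed_disk
    ]


# Layout of the first eight chunks on a hypothetical four-disk RAID 0 array.
layout = [locate_chunk(chunk, 4) for chunk in range(8)]

# Losing the last disk removes every fourth chunk of the logical disk.
lost = chunks_lost(failed_disk=3, num_disks=4, total_chunks=12)
```

With four disks, the lost chunks are spread evenly through the logical disk, which is why the remaining data is unusable.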

RAID 0 Recommendations

RAID 0 is not normally recommended for storing SQL Server data files. Because the data in the database is so important to your business, losing that data could be devastating. Because a RAID 0 array does not protect you against a disk failure, you shouldn't use it for any critical system component, such as the operating system, a transaction log, or database files.

NOTE
A disk drive spins at a high rate and operates at a high temperature. Because the disk is a mechanical component, it eventually will fail. Thus, it is important to protect SQL Server data files from that failure by creating a fault-tolerant system.

RAID 1

RAID 1 is the most basic fault-tolerant RAID level. RAID 1, also known as mirroring, duplicates your data disk. As Figure 5-10 shows, the duplicate contains all of the information that exists on the original disk. In the event of a disk failure, the mirror takes over; thus, you lose no data. Because all the data is held on one disk (and its mirror), no striping is involved. Because RAID 1 uses the second disk drive to duplicate the first disk, the total space of the RAID 1 volume is equivalent to the space of one disk drive. Thus, RAID 1 is costly in that you must double the number of disks and you get no additional disk space in return, but you do get a high level of fault tolerance.

Figure 5-10. RAID 1.

For a RAID 1 volume, an I/O operation is not considered complete until the controller has written data to both disk drives. Until that happens, a "fault" (disk failure) cannot be tolerated without loss of data. Once that data has been written to both disk drives, the data can be recovered in the event of a failure in either disk. This means that if writing the data to one disk takes longer than writing the same data to the other disk, the overall latency will equal the greater of the two latencies.

Writing to both disks also reduces the performance of the logical disk drive. When calculating how many I/O operations go to the disk drives in the array, you must multiply the number of writes by two. Reads occur on only one disk. The two disks might complete the same write at different speeds because their heads might be in different positions, so one seek might take longer than the other. The heads can end up in different positions because of a RAID 1 performance feature known as split seeks.

Split seeks allow the disks in a RAID 1 volume to read data independently of each other. Split seeks are possible because reads occur on only one disk of the volume at a time. Most controller manufacturers support split seeks. Split seeks increase performance because the I/O load is distributed to two disks instead of one. However, because the disk heads are operating independently and because they both must perform the write, the overall write latency is the longer latency between the two disks.
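
As a rough illustration of the RAID 1 arithmetic above, the following Python sketch estimates the physical I/O load on each disk of a mirror and the latency of a mirrored write. The function names are hypothetical, and the even split of reads between the two disks is an assumption about how split seeks balance the load, not a guarantee made by any controller.

```python
def raid1_ios_per_disk(reads: float, writes: float) -> float:
    """Estimate physical I/Os per second on each disk of a two-disk mirror.

    Every logical write becomes two physical writes (one per disk).
    Assumption: split seeks share the reads evenly between the disks.
    """
    physical_ios = reads + 2 * writes
    return physical_ios / 2


def raid1_write_latency(latency_disk_a: float, latency_disk_b: float) -> float:
    """A mirrored write completes only when the slower disk has finished."""
    return max(latency_disk_a, latency_disk_b)


# Hypothetical workload: 100 reads/sec and 50 writes/sec to the logical disk.
per_disk = raid1_ios_per_disk(reads=100, writes=50)

# Hypothetical write latencies in milliseconds for the two disks.
latency = raid1_write_latency(8.0, 11.5)
```

In this example each disk handles 100 physical I/Os per second, and the write takes as long as the slower of the two disks.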

RAID 1 Recommendations

RAID 1 offers a high degree of fault tolerance and high performance. RAID 1 is a great solution when one disk drive can hold all of the data. Some recommendations for using RAID 1 are as follows:

Use RAID 1 for the disk that contains your operating system because rebuilding it takes so much time. RAID 1 is a good choice also because the operating system can usually fit on one disk.
Use RAID 1 for the transaction log. Typically, the SQL Server transaction log can fit on one disk drive. In addition, the transaction log performs mostly sequential writes. Only rollback operations cause reads from the transaction log. Thus, you can achieve a high rate of performance by isolating the transaction log on its own RAID 1 volume.
Use write caching on RAID 1 volumes. Because a RAID 1 write will not be complete until both writes have been done, performance of writes can be improved through the use of a write cache. When using a write cache, be sure that it is battery-backed.

As you will see later in this chapter, you can use other fault-tolerant solutions if more than one disk is required. RAID 1 is great when fault tolerance is required and one disk is sufficient.

RAID 5

RAID 5 is a fault-tolerant RAID level that uses parity to protect data. Each RAID stripe creates parity information on one disk in the stripe. Along with the other disks in the RAID stripe, this parity information can be used to re-create the data on any of the other disk drives in the stripe. Thus, a RAID 5 array can tolerate the loss of one disk drive in the array. The parity information is rotated among the various disk drives in the array, as Figure 5-11 shows.

Figure 5-11. RAID 5.

The advantage of RAID 5 is that the usable space in this RAID level is equal to that of n - 1 disk drives, where n is the number of disk drives in the array. Thus, a RAID 5 array made up of 10 disk drives will have the space of 9 disks, making RAID 5 an economical, fault-tolerant choice.

Unfortunately, there are performance penalties associated with RAID 5. Maintaining the parity information requires additional overhead. When data is written to a RAID 5 array, both the target disk stripe and the parity stripe must be read, the parity must be calculated, and then both stripes must be written out. Therefore, a RAID 5 write actually incurs four physical I/O operations, as you will see.

RAID 5 Parity Explained

In RAID 5, parity is created on the data in each stripe across all of the disk drives. A parity bit is an additional bit that, when computed over a set of bits, allows any one missing bit in the set to be determined from the others. This parity bit is created by adding up all of the other bits and determining which value the parity bit must contain to make the total either even or odd. The parity bit, along with all of the remaining bits, can be used to determine the value of a missing bit.

Let's look at an example of how parity works. For this example, we will consider a RAID 5 system with five disk drives. Each disk drive is essentially made up of a number of bits, starting from the first part of the stripe on the disk and ending at the end part of the stripe on the disk. The parity bit is based on the bits from each disk drive.

In this example, we will consider the parity to be even; thus, all of the bits, including the parity bit, must add up to an even number. If the first bit on the first disk drive is 0, the first bit on the second drive is 1, the first bit on the third drive is 1, and the first bit on the fourth drive is 1, the parity must be 1 in order for these bits to add up to an even number, as Table 5-2 shows.

Table 5-2. An example of RAID parity

Disk 1 Bit 1	Disk 2 Bit 1	Disk 3 Bit 1	Disk 4 Bit 1	Disk 5 Parity bit	Sum of bits
0	1	1	1	1	4 (even)

So think of the parity as being created on single bits. Even though the disk stripe contains many bits, you make the data recoverable by creating a parity on the single bits.

As you can see from Table 5-2, the parity is actually created on individual bits in the stripes. Even though the disk drives are broken up into chunks or stripe pieces that might be 64 KB or larger, the parity can be created only at the bit level, as shown here. Parity is actually calculated with a more sophisticated algorithm than that just described.

So let's say, for example, that Disk 3 fails. In this case, the parity bit plus the bits from the other disk drives can be used to recover the missing bit from Disk 3 because they must all add up to an even number.
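
The recovery of the Disk 3 bit can be sketched in Python using the values from Table 5-2. Even parity over a set of bits is the same as the exclusive OR (XOR) of those bits, so the sketch uses XOR for both creating the parity and rebuilding the missing bit. The function names are hypothetical, and the sketch works on single bits only; a real controller applies the same idea to whole stripes.

```python
from functools import reduce
from operator import xor


def even_parity(bits: list[int]) -> int:
    """Return the bit that makes the total number of 1s even."""
    return reduce(xor, bits, 0)


def recover_missing(surviving_bits: list[int], parity_bit: int) -> int:
    """Rebuild the one missing bit from the survivors plus the parity bit."""
    return reduce(xor, surviving_bits, parity_bit)


# First bit on Disks 1 through 4, as in Table 5-2.
data_bits = [0, 1, 1, 1]
parity_bit = even_parity(data_bits)

# Disk 3 fails: only the bits from Disks 1, 2, and 4 can still be read.
surviving = [data_bits[0], data_bits[1], data_bits[3]]
recovered = recover_missing(surviving, parity_bit)
```

Here `parity_bit` is 1, matching the table, and `recovered` equals the bit that was stored on Disk 3.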

Creating the Parity

As you have seen in this section, the RAID 5 parity is created by finding the sum of the same bits on all of the drives in the RAID 5 array and then creating a parity bit so that the result is even. Well, as you might imagine, it is impractical for an array controller to read all of the data from all of the drives each time an I/O operation occurs. This would be inefficient and slow.

When a RAID 5 array is created, the data is initially zeroed out, and the parity bit is created. You then have a set of RAID 5 disk drives with no data but with a full set of parity bits.

From this point on, whenever data is written to a disk drive, both the data disk and the parity disk must first be read. The new data is compared with the old data, and if the data for a particular bit has changed, the parity for that bit must be changed. This is accomplished with an exclusive OR (XOR) operation. Thus, only the data disk and the parity disk, not all of the disks in the array, need to be read. Once this operation has been completed, both disk drives must be written out because the parity operation works on entire stripes. Therefore, for each write to a RAID 5 volume, four physical I/O operations are incurred: two reads (one from data and one from parity) and two writes (back to data and back to parity). But with a RAID 5 array, the parity is distributed, so this load should be balanced among all the disk drives in the array.
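
The read-modify-write sequence can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each chunk is modeled as a small integer so that XOR acts on all of its bits at once, and the function name `updated_parity` and the sample values are hypothetical.

```python
def updated_parity(old_data: int, new_data: int, old_parity: int) -> int:
    """Compute the new parity from only the old data, new data, and old parity.

    XOR of old and new data marks the bits that changed; XOR with the old
    parity flips exactly those parity bits.
    """
    return old_parity ^ old_data ^ new_data


def full_parity(chunks: list[int]) -> int:
    """Recompute parity by reading every data chunk (the slow alternative)."""
    parity = 0
    for chunk in chunks:
        parity ^= chunk
    return parity


# Hypothetical data chunks on four data disks, four bits each.
stripe = [0b1010, 0b0110, 0b1111, 0b0001]
parity = full_parity(stripe)

# Write new data to the third disk.
old_data = stripe[2]                                     # read 1: data disk
new_data = 0b0101
new_parity = updated_parity(old_data, new_data, parity)  # read 2: parity disk
stripe[2] = new_data                                     # write 1: data disk
parity = new_parity                                      # write 2: parity disk
```

After the update, `parity` matches what `full_parity(stripe)` would return, even though the other three disks were never read.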

RAID 5 Recommendations

Because of the additional I/O operations incurred by RAID 5 writes, this RAID level is recommended for disk volumes that are used mostly for reading. Because the parity is distributed among the various disks in the array, all disks are used for read operations. Because of this characteristic, the following is recommended:

Use RAID 5 on volumes that are read-only or mostly read. Any disk volume that does more than 10 percent writes is not a good candidate for RAID 5.
Use write caching on RAID 5 volumes. Because a RAID 5 write will not be completed until two reads and two writes have occurred, the response time of writes can be improved through the use of a write cache. (When using a write cache, be sure that it is battery-backed.) However, the write cache is not a cure for overdriving your disk drives. You must still stay within the capacity of those disks.

As you can see, RAID 5 is economical, but you pay a performance price. You will see later in this chapter how high that price can be.
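
To get a feel for the write penalty, the following Python sketch estimates the physical I/O load on each disk of a RAID 5 array. The function name and the workload figures are hypothetical, and the sketch assumes that the distributed parity spreads the load evenly across all of the disks.

```python
def raid5_ios_per_disk(reads: float, writes: float, num_disks: int) -> float:
    """Estimate physical I/Os per second on each disk of a RAID 5 array.

    Each logical write costs four physical I/Os (two reads, two writes).
    Assumption: the load is balanced evenly across the disks.
    """
    physical_ios = reads + 4 * writes
    return physical_ios / num_disks


# The same 1000 logical I/Os per second on 10 disks, with two write ratios.
mostly_read = raid5_ios_per_disk(reads=900, writes=100, num_disks=10)
half_writes = raid5_ios_per_disk(reads=500, writes=500, num_disks=10)
```

With 10 percent writes each disk handles 130 physical I/Os per second; with 50 percent writes the figure rises to 250, although the logical workload has not grown.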

RAID 10

RAID 10 is a combination of RAID 0 and RAID 1. RAID 10 involves mirroring a disk stripe. Each disk will have a duplicate, but each disk will contain only a part of the data, as Figure 5-12 illustrates. This level offers the fault tolerance of RAID 1 and the convenience and performance advantages of RAID 0.

Figure 5-12. RAID 10.

As with RAID 1, each RAID 10 write operation will incur two physical I/O operations, one to each disk in the mirror. Thus, when calculating the number of I/O operations per disk, you must multiply the writes by two. As with RAID 1, the RAID 10 I/O operation is not considered completed until both writes have been done; thus, the write latency might be increased. But, as with RAID 1, most controllers support split seeks with RAID 10.

RAID 10 offers a high degree of fault tolerance. In fact, the array can survive even if more than one disk fails. Of course, the loss of both sides of the mirror cannot be tolerated. If the mirror is split across disk cabinets, the loss of an entire cabinet can be tolerated.
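
The failure rule for RAID 10 can be expressed as a short Python check. This is a sketch only: the function name, the disk numbering, and the two-cabinet layout are assumptions made for illustration.

```python
def raid10_survives(failed_disks: set[int], mirror_pairs: list[tuple[int, int]]) -> bool:
    """Return True if no mirror pair has lost both of its disks."""
    return not any(
        first in failed_disks and second in failed_disks
        for first, second in mirror_pairs
    )


# Hypothetical eight-disk array: disks 0-3 sit in cabinet A, disks 4-7 in
# cabinet B, and each disk is mirrored to its partner in the other cabinet.
pairs = [(0, 4), (1, 5), (2, 6), (3, 7)]

cabinet_a_lost = raid10_survives({0, 1, 2, 3}, pairs)   # whole cabinet fails
both_sides_lost = raid10_survives({1, 5}, pairs)        # one pair fails fully
```

Losing all of cabinet A leaves one good copy in every pair, so the array survives; losing disks 1 and 5 removes both copies of one part of the data, so it does not.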

RAID 10 Recommendations

RAID 10 offers high performance and a high degree of fault tolerance. RAID 10 should be used when a large volume is required and more than 10 percent of the I/O operations are writes. RAID 10 recommendations include the following:

Use RAID 10 whenever the array experiences more than 10 percent writes. RAID 5 does not perform as well as RAID 10 with a large number of writes.
Use RAID 10 when performance is critical. Because RAID 10 supports split seeks, you get premium performance.
Use write caching on RAID 10 volumes. Because a RAID 10 write will not be completed until both writes have been done, performance of writes can be improved through the use of a write cache. Write caching is safe only when used in conjunction with battery-backed caches.

RAID 10 is the best fault-tolerant solution in terms of protection and performance, but it comes at a cost. You must purchase twice the number of disks that are necessary with RAID 0. If your volume is mostly read, RAID 5 might be acceptable.
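
The disk cost of each level can be compared with a small Python sketch. The function name is hypothetical, the sketch assumes equal-sized disks, and the minimum disk counts it enforces (two for RAID 1, three for RAID 5, an even number of at least four for RAID 10) are the usual conventions rather than figures stated in this section.

```python
def usable_disks(raid_level: int, num_disks: int) -> int:
    """Return how many disks' worth of space a RAID volume provides."""
    if raid_level == 0:
        return num_disks                  # striping only, no redundancy
    if raid_level == 1:
        if num_disks != 2:
            raise ValueError("RAID 1 is modeled here as a two-disk mirror")
        return 1                          # the mirror adds no space
    if raid_level == 5:
        if num_disks < 3:
            raise ValueError("RAID 5 needs at least three disks")
        return num_disks - 1              # one disk's worth holds parity
    if raid_level == 10:
        if num_disks < 4 or num_disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, four or more")
        return num_disks // 2             # half of the disks are mirrors
    raise ValueError(f"unsupported RAID level: {raid_level}")


# Usable space from 10 physical disks under each striped level.
capacities = {level: usable_disks(level, 10) for level in (0, 5, 10)}
```

For 10 disks this gives the space of 10 disks with RAID 0, 9 with RAID 5, and 5 with RAID 10, which is the cost trade-off described above.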