Introduction to RAID


RAID appeared on the scene in 1988, out of the work of David Patterson, Garth Gibson, and Randy Katz at the University of California at Berkeley. As part of a graduate class in computer science, they considered the different ways in which small, inexpensive hard drives (the kind used in personal computers such as the Macintosh) could be configured to improve performance, redundancy, and other beneficial features. The resulting paper, titled "The Case for Redundant Arrays of Inexpensive Disks," argued that a collection of small disks operating together can outperform a single large disk, even one operating at breakneck speed, because the shorter seek paths and larger number of heads enhance performance.

The original Berkeley prototype, RAID-I, was implemented in software and was followed by RAID-II, a higher-performance hardware implementation with redundancy. (These prototype names bear no relationship to the numbered RAID levels we use now.) You could pull a disk out of RAID-II while the drives were up and running, which was novel at the time for small systems. (Many of these ideas had preexisting counterparts in the mainframe world.) The idea took off, and different types of RAID were proposed and named.

It wasn't long before people realized that although a single array of the small hard drives of the era was "inexpensive," multiple arrays of such drives were not. So researchers in the field changed the I in RAID to stand for independent instead of inexpensive. You can find both definitions in use today.

RAID leads us logically to the concept of storage virtualization, which we consider at the end of this chapter, in the section "Storage Virtualization." RAID as an underpinning storage virtualization tool gives a modern server powerful methods for managing data and can improve operation, availability, and performance immeasurably.

Table 11.3 lists the most popular RAID levels discussed in this chapter.

Table 11.3. Popular RAID Levels

RAID Level | Type                                      | Requires | Fault Tolerance                        | Usage
RAID 0     | Striped                                   | 2 drives | No                                     | Graphics, games, non-server applications
RAID 1     | Mirror or duplexing                       | 2 drives | Can survive loss of one mirror         | Financial and other high-availability applications
RAID 4     | Independent disks with shared parity disk | 4 drives | Good fault tolerance but slow rebuilds | Filers
RAID 5     | Striped with distributed parity blocks    | 3 drives | Can survive loss of one disk           | High-performance applications requiring high availability
RAID 10    | Mirrored with striping                    | 4 drives | Can survive the loss of a mirror       | High-performance and fault-tolerant applications, but costly


JBODs: Concatenation

As described previously, JBOD is a method for obtaining a larger volume size out of a group of smaller-sized drives. This scheme concatenates the disks so that the addressing scheme extends or spans over the drives in the array, creating a virtual volume, as shown in Figure 11.5.

Figure 11.5. JBOD concatenates disks to form larger volumes. The letters on the disks indicate that partitions can be written on any disk and that these partitions are independent of (have no relationship to) any others.


JBOD places no restrictions on the size of the drive, its type, its speed, or the bus that it's on. If your operating system can see the disk, you can include it in a JBOD. So this arrangement lets you put to use whatever disks you have on hand.
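To make the spanning idea concrete, here is a minimal sketch (in Python, with made-up disk sizes) of how a concatenated volume might translate a logical block address into a member disk and an offset on that disk. The names and sizes are illustrative assumptions, not part of any particular JBOD implementation.

```python
# A minimal sketch of JBOD-style concatenation: logical blocks span the member
# disks end to end, so the members can all be different sizes. The sizes below
# are hypothetical and chosen only for illustration.

DISK_SIZES = [1000, 2500, 1800]   # capacity of each member disk, in blocks

def jbod_map(logical_block):
    """Return (disk_index, block_on_disk) for a block in the spanned volume."""
    for disk, size in enumerate(DISK_SIZES):
        if logical_block < size:
            return disk, logical_block
        logical_block -= size     # skip past this disk and keep going
    raise ValueError("block is beyond the end of the concatenated volume")

print(jbod_map(0))      # (0, 0): the very first block lives on the first disk
print(jbod_map(1000))   # (1, 0): one block past disk 0 rolls over to disk 1
print(jbod_map(3500))   # (2, 0): and so on across the remaining members
```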

JBOD isn't RAID, but it does require the use of a controller. Given that most any controller offers RAID of some kind, most people opt for a RAID solution instead of relying on a JBOD methodology. In Windows, for example, you could use the Disk Management snap-in of the MMC (formerly called the Disk Administrator) to create a JBOD. Creating a JBOD in the Disk Manager is just like creating individual partitions on multiple drives that your system can access. To create a JBOD, you do the following:

1. Attach the various drives to your controller and then boot your system.

2. Open the Disk Manager and right-click the first drive in your JBOD. Then select the Initialize Disk command.

3. If a wizard appears, select the size of the partition you want to create, select its file type, and set a drive label. If a wizard does not appear, right-click an unallocated section of the disk, select the Format command, and then specify the parameters of the partition you wish to create.

4. Repeat step 3 to initialize the other disks in the JBOD. Then create the partitions you want.

JBOD doesn't get recommended very often, but the fact that it is both cheap and easy to implement makes it popular in certain environments. With JBOD, when a disk fails, you can still recover the files on your working disks. However, to reestablish the volume, you need to restore from backup.

RAID 0: Striping, No Parity

RAID has come to be codified into several different levels, which are different configurations or techniques for creating data structures. Let's first look at RAID 0, which is the favorite disk structure for PC gamers who want to inexpensively achieve higher disk performance.

RAID 0 takes any number of disks (let's call that number Dn) and combines those disks into a single container or logical structure. The container is then formatted so that the data is "striped" across all the disks sequentially, as shown in Figure 11.6. As data is written, the disk head writes one set of blocks on one drive, then the head of the second disk writes the next set of blocks on the second drive, and so forth. At the end of the last drive in the array, the data writing continues on the first drive, picking up at the set of blocks that follows the previous set. RAID 0 is sometimes referred to as a striped set.

Figure 11.6. How striping works in a RAID 0 implementation. Notice that there is now a consecutive sequence of lettering, indicating that data is written in small sections, switching from disk to disk as those sections get filled. Striping improves performance by bringing more disk heads into play.
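The striping order just described amounts to simple modular arithmetic. The following sketch (with an assumed stripe-unit size and disk count, not values from the chapter) shows how a RAID 0 layout might decide which disk, and which on-disk block, a logical block lands on.

```python
# A minimal sketch of RAID 0 address mapping: stripe units rotate across the
# disks in order. Stripe size and disk count are hypothetical.

STRIPE_BLOCKS = 128   # blocks per stripe unit
NUM_DISKS = 4         # Dn, the number of disks in the striped set

def raid0_map(logical_block):
    """Return (disk_index, block_on_disk) for a logical block in the array."""
    stripe_unit = logical_block // STRIPE_BLOCKS     # which stripe unit overall
    offset = logical_block % STRIPE_BLOCKS           # position inside that unit
    disk = stripe_unit % NUM_DISKS                   # units rotate disk to disk
    unit_on_disk = stripe_unit // NUM_DISKS          # units already on that disk
    return disk, unit_on_disk * STRIPE_BLOCKS + offset

# Consecutive stripe units land on consecutive disks, then wrap to the first.
for block in range(0, 5 * STRIPE_BLOCKS, STRIPE_BLOCKS):
    print(block, raid0_map(block))
```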


Note

Because RAID 0 must stripe across similar-sized areas of a disk, most RAID 0 implementations require that you use drives of the same size to create the array. This isn't always the case, though; some HBAs or disk managers allow you to create the same-sized partition on each drive and stripe across that. Whenever possible, opt for the solution that gives you the maximum amount of flexibility. Because the hard drive of today is the dinosaur of tomorrow, you want to be able to move up in capacity without having to swap out all your drives at once. So it pays to look carefully for the more forgiving RAID implementations.


For a small file, you might find that the data is on one single drive, but in most cases, you will find that the data is actually spread out across two or more drives, maybe even all the drives. For multifile operations, particularly in a multitasked environment, the data is actually spread out randomly among all the drives, and some operations can approach n times the performance of a single drive. Sometimes people use RAID 0 simply to combine a collection of smaller disks into one larger virtual disk.

RAID 0 imposes a slight performance penalty for write operations because the controller has to manage the move from one disk to another. When it comes to read operations, however, RAID 0 gives you a considerable performance enhancement because you can have multiple heads reading data on multiple disks at the same time, with each of them concurrently sending that data to the host controller. The smaller and more numerous the reads, the more performance is boosted. For large data files that are written sequentially across all the disks, RAID 0 doesn't offer much of a performance boost, even for read operations, because the file still has to be read sequentially, which negates the advantage of having more drive heads accessing the data. RAID 0 is helpful when you have a large NFS server with multiple disks or where the operating system limits you to a smaller number of drive letters, such as the 24-letter limit in Windows.

RAID 0 is the lowest form of RAID, and some people argue that it isn't RAID at all. There's no redundant data being written, and there's absolutely no data protection. You can't pull a disk out of RAID 0 because all disks are part of the data structure, although some RAID systems let you add a disk to RAID 0. In fact, because more disks participate in a single data structure, the chance that some drive in the array will fail is roughly n times that of a single drive (the array's effective MTBF is divided by n). The likelihood that a cluster on some drive will fail is multiplied in the same way, and so forth.
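A quick back-of-the-envelope calculation makes the point. The failure probability below is an assumed figure for illustration only, not data from the chapter.

```python
# A minimal sketch of why a striped set is more fragile than one disk: with n
# independent drives, the chance that at least one fails in a given period is
# roughly n times higher. The 3% annual failure rate is purely an assumption.

n = 4        # drives in the RAID 0 set
p = 0.03     # assumed annual failure probability of a single drive

p_array = 1 - (1 - p) ** n     # probability that at least one of the n drives fails
print(round(p_array, 4))       # about 0.1147, close to the n * p estimate of 0.12
```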

If you implement RAID 0 and run into any kind of problem at all, you had better be well backed up because you might have to re-create your RAID set and your volume to get back in business. RAID 0 is really "AID," or simply an association of independent disks, because it has no redundancy. Still, RAID 0 at least offers some performance enhancements, and for that reason it is considered RAID where JBOD is not.

RAID 1: Mirroring

RAID 1 is the simplest form of data redundancy you can have, and it is often used for small, entry-level RAID systems. In a RAID 1 configuration, your data is written in two places at essentially the same time, which is where the term mirror comes from. When you have a problem with one disk, you can break the mirror and switch over to the second copy on the other disk, which is presumably still functioning normally. When your hardware problem is fixed, usually because you've replaced the malfunctioning drive, you can add the first volume back and rebuild the mirror.
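As a rough illustration of that behavior, here is a minimal sketch of the mirroring idea: every write lands on both members, and a read can be served by whichever member is still healthy. Real controllers do this at the block level; the in-memory lists here are purely illustrative.

```python
# A toy model of a two-member mirror: writes go to both members, and reads can
# be satisfied from either one, so the set survives the loss of a single member.

class Raid1Mirror:
    def __init__(self, num_blocks):
        self.members = [[None] * num_blocks, [None] * num_blocks]

    def write(self, block, data):
        for member in self.members:          # the same data lands on both disks
            member[block] = data

    def read(self, block, failed_member=None):
        for index, member in enumerate(self.members):
            if index != failed_member:       # skip a member that has failed
                return member[block]
        raise IOError("both mirror members are unavailable")

mirror = Raid1Mirror(16)
mirror.write(3, b"payroll record")
print(mirror.read(3, failed_member=0))       # still readable after losing member 0
```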

Caution

No version of RAID protects you from software errors; that's what backup and snapshots are for. You need to keep in mind that mirroring is only one form of redundancy, and it won't protect you if the nature of your hardware failure damages each drive in the array (fire or flood damage, for example). Mirroring your data is not a substitute for backing up data to some form of removable storage that can be securely stored.


Many dual-channel SCSI HBAs are specifically used to implement mirrored structures. A mirror can be made of part of a disk (perhaps a volume), or a mirror can be the simple duplication of one whole drive to another. In the former case, the mirror is almost always created and managed by either the operating system or a program running under the operating system. In the latter case, block copying is faster when done in hardware at the HBA.

RAID 1 by itself provides little performance enhancement for either read or write operations. Depending on how mirroring is implemented and the robustness of your controller, most often mirroring doesn't affect your server's performance at all. Most mirroring implementations done in hardware are fast enough that they don't have to buffer or cache content written to the second disk. Because you want your mirrored disk to be valid, there is some reading of the second disk to determine whether it has the same data at the same place as the first disk. Usually this data validation can be done in the background.

With RAID 1 you get full data protection against drive or array failure, albeit at a high price. If you have two HBAs, one connected to each side of the mirror (an architecture called duplexing), RAID 1 also gives you protection against HBA failure. The cost of this protection is that you have duplicated an entire set of drives and perhaps a controller as well. On larger systems, plain RAID 1 is rarely implemented by itself because more attractive options are available with the same collection of storage system elements, as you will see when we discuss RAID 1+0 in a moment.

Business Continuance Volumes

In server technology, it is essential to protect a corporate database that is the core asset of your business. When that database resides on a volume on a storage server, you can create a duplicate volume and then use that duplicate volume to create other volumes if you like. EMC pioneered this approach as a data protection technique on its enterprise-class Symmetrix storage server line, and it called the mirrored or cloned volumes Business Continuance Volumes (BCVs). Figure 11.7 shows the principle behind mirroring and BCVs.

Figure 11.7. A mirrored volume and BCVs.


BCVs offer several important advantages. The primary advantage is that you can seamlessly switch over to a BCV when you have a problem with your primary data set, with no downtime that is perceptible to your customers. However, that isn't the only advantage that a BCV offers.

When you split off a BCV from the original data set, you have stopped synchronizing it with the primary volume. You have created a snapshot of your volume in time. Considering that a large volume can take many hours, even days, to back up to tape, you now have a data structure that you can use to back up to a tape library offline. Here's where things get really interesting. If you have a duplicate storage server and create a BCV on it, when you split that BCV off, you then have a test bed you can use for development work. More often than not, a large corporation will add identical storage servers connected by a LAN (a SAN, actually) or a WAN so that if either the server or the site fails, the duplicate can take over the operation.

A fully loaded, high-end Symmetrix server can represent a $1 million to $2 million investment for a multi-terabyte system, so the cost of the EMC servers is very often more than the cost of the buildings that surround them. The price of redundancy can be high, but the price of not being redundant can be incalculable.

This technique of BCVs isn't only for the Exxon-Mobils of the world. Regardless of the size of your server, if you can mirror a volume, you can create a BCV. So it pays to think in these terms even if you can't make the solution as seamless as it is in larger enterprise environments.

RAID 10 (or 1+0): Mirroring with Striping

RAID 10, also called RAID 1+0, combines the RAID techniques you've just learned about in the two previous sections in a combination that some people describe as either nested or stacked RAID levels. RAID 10 gives you the redundancy of a mirrored volume or drive along with the performance benefit of a striped array.

Because RAID 1+0 is nothing more than RAID 1 and RAID 0 combined, you will find that all RAID HBAs, from the cheapest ones you can find to the most expensive ones, offer RAID 1+0. Many people refer to this RAID level as RAID 10 or mirroring with striping to reflect the order in which the RAID levels are applied.

Because RAID 10 is both high performance and fully redundant, this particular configuration is recommended for high-transaction-volume applications where you need performance and fast failover to a redundant data set should something fail on the first array. RAID 10 is one of the most popular RAID levels implemented. Large database servers, messaging servers, and webservers often implement RAID 10 for their disk arrays.

You can think of RAID 1+0 as being formed by applying to your physical disks a first layer of RAID (mirroring, in this case). Then you overlay the second RAID level, which stripes data across the mirrored array.
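The layering just described can be sketched as two mapping steps: a striping step that picks a mirrored pair, and a mirroring step that duplicates the write to both members of that pair. The stripe size and pair count below are assumptions for illustration.

```python
# A minimal sketch of RAID 1+0 layering: mirrored pairs are built first, then
# data is striped across the pairs. Four physical disks (0-3) form two pairs.

STRIPE_BLOCKS = 128
MIRRORED_PAIRS = 2          # disks (0, 1) form one pair, disks (2, 3) the other

def raid10_map(logical_block):
    """Return (pair, member_disks, block_on_disk) for a logical block."""
    stripe_unit = logical_block // STRIPE_BLOCKS
    offset = logical_block % STRIPE_BLOCKS
    pair = stripe_unit % MIRRORED_PAIRS                 # RAID 0 layer: pick a pair
    unit_on_pair = stripe_unit // MIRRORED_PAIRS
    member_disks = (pair * 2, pair * 2 + 1)             # RAID 1 layer: both members
    return pair, member_disks, unit_on_pair * STRIPE_BLOCKS + offset

# Every write goes to both disks in the selected pair; reads can use either.
print(raid10_map(0))
print(raid10_map(STRIPE_BLOCKS))
```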

RAID 0+1: Striping with Mirroring

You might ask what the difference is between RAID 1+0 (or RAID 10) and the level of RAID you would create if you striped first and then applied a mirroror if, indeed, there is any difference at all. There is a difference, and this other RAID level is referred to as RAID 0+1 or striping with mirroring. (You usually don't see RAID 0+1 abbreviated as RAID 01 because there is concern that people will get this abbreviation mixed up with RAID 1.)

Figure 11.8 illustrates the difference between RAID 10 and RAID 0+1. In this example, you have a number of disks (Dn) arranged in two equal-sized mirrored volumes. Consider what happens when a drive fails. In RAID 10, you would break the mirror, replace the disk, and rebuild the mirror. A RAID 10 array can survive both the loss of one mirror (any number of disks) and disk failures in both arrays, as long as the two failed disks are not both members of the same mirrored pair. That is, in a four-disk array where A1B1 is mirrored to A2B2, and the As and Bs have the same data, you could fail A1 and B2 or B1 and A2, but not A1 and A2 or B1 and B2.

Figure 11.8. RAID 10 versus RAID 0+1. In this figure, the top two disks are A1 and A2, and the bottom two disks are B1 and B2.


Now let's consider RAID 0+1, where first you stripe and then you mirror. When you lose a disk in this configuration, one of your RAID 0 sets has been lost. When you break the mirror and add your new disk, the remaining disks in the stripe no longer correspond to the disks in the other striped mirror. The result is that you actually need to start from scratch and either rebuild the complete stripe from your working drives in the damaged mirror (plus the new one) or, as is often the case, start with a complete set of new drives that match the damaged set being replaced. In this circumstance, RAID 0+1 has to write more data than RAID 1+0.

Writing data is strictly a mechanical process, and except with very large volumes, you may not care if you have to rebuild the entire stripe. However, while RAID 10 can survive the loss of disks in both arrays, RAID 0+1 cannot. When you lose drives on both sides of the mirror in RAID 0+1, you can no longer rely on RAID to get you going again, and you have to reestablish your array from backups (and invariably some new data is lost). Nothing is perfect in this world, but RAID 10 is a little more perfect than RAID 0+1.
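You can check the failure combinations for the four-disk example yourself. The sketch below enumerates every two-disk failure and reports which layout survives it, using the same A1/A2 and B1/B2 labels as above.

```python
# A minimal sketch comparing two-disk failures in a four-disk set. In RAID 10,
# A1/A2 and B1/B2 are mirrored pairs striped together; in RAID 0+1, A1+B1 is a
# stripe mirrored against the A2+B2 stripe.
from itertools import combinations

DISKS = ("A1", "B1", "A2", "B2")
MIRROR_PAIRS = ({"A1", "A2"}, {"B1", "B2"})     # the pairs used by RAID 10
STRIPE_SIDES = ({"A1", "B1"}, {"A2", "B2"})     # the two stripes used by RAID 0+1

def raid10_survives(failed):
    # Fails only if both members of the same mirrored pair are lost.
    return not any(pair <= failed for pair in MIRROR_PAIRS)

def raid01_survives(failed):
    # Fails as soon as both stripe sides have lost at least one member.
    return any(not (side & failed) for side in STRIPE_SIDES)

for combo in combinations(DISKS, 2):
    failed = set(combo)
    print(combo, "RAID 10 survives:", raid10_survives(failed),
          "RAID 0+1 survives:", raid01_survives(failed))
```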

RAID 5: Striping with Parity

RAID 5 is the third most popular RAID level used in the industry. RAID 5 performs block-level striping across the disk set of a volume and writes some redundant information, called parity data, that lets you reconstruct missing data if a disk fails. The parity data is written across all the disks so that the array can survive a single disk failure and still be rebuilt. RAID 5 doesn't offer quite the performance of RAID 0 because of the overhead of reading and writing parity data, but it provides redundancy that you don't find in RAID 0, and it does so without having to duplicate the volume as part of a mirror, as RAID 1 does. Thus, if you like, you can think of RAID 5 as poor man's RAID 10.

Let's look a little more closely at how RAID 5 works and what parity is all about. You need three or more disks to create a RAID 5 volume, as shown in Figure 11.9. Data is written in blocks, which are subdivided into sectors; the number of blocks is determined by the capacity of the disk, and the number of sectors per block is a variable that you can define (usually 256 or fewer sectors). As each block of data is written to disk, the RAID 5 algorithm calculates a parity block that corresponds to the data and then writes the parity block on the same stripe, but not on the same disk. If stripe n in a three-disk RAID 5 volume has the parity block on disk 1, then stripe n+1 has the parity block on disk 2 and stripe n+2 places the parity block on disk 3, so that by stripe n+3 the parity block returns to disk 1 to start the cycle over, creating a distributed parity block arrangement.

Figure 11.9. How RAID 5 works. In this three-disk array, data is written in small sections on sequential disks, with parity data alternating on disks from stripe to stripe.
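The parity itself is typically a byte-wise XOR of the data blocks in a stripe, and the rotation described above is simple modular arithmetic. Here is a minimal sketch of both ideas; the three-disk layout matches the example, while the stripe contents are made up.

```python
# A minimal sketch of RAID 5's two key ingredients: an XOR parity block per
# stripe, and a parity location that rotates from disk to disk per stripe.

NUM_DISKS = 3

def xor_blocks(blocks):
    """Byte-wise XOR of equal-sized blocks; used as the stripe's parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def parity_disk_for_stripe(stripe):
    """Rotate parity: stripe 0 on disk 0, stripe 1 on disk 1, and so on, wrapping."""
    return stripe % NUM_DISKS

stripe_data = [b"AAAA", b"BBBB"]       # two data blocks in a three-disk stripe
print(xor_blocks(stripe_data))         # the parity block for that stripe
print([parity_disk_for_stripe(s) for s in range(6)])   # 0, 1, 2, 0, 1, 2
```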


RAID 5 imposes some overhead on write operations because in an n-disk system, 1/n of the data that is written isn't going to be anything you can use, except in the hopefully rare case in which you need to rebuild a RAID 5 array. RAID 5 requires at least three disks because if one disk of a two-disk RAID 5 array failed, you would lose half of your data set. The third disk in the RAID 5 set provides the extra redundancy you need to make the system work. To really make RAID 5 perform, you want to have more than three disks. Although there is a write operation penalty, RAID 5 doesn't impose a read operation penalty on your disk I/O because the parity information is ignored during normal reads. You still have the same benefit of multiple heads on multiple disks reading data at the same time.

Upon a read operation, RAID 5 performs a cyclic redundancy check (CRC) calculation to see whether the data is valid. CRC is an algorithm that computes a checksum from the data it reads. When an error is detected, RAID 5 can read the parity block in that stripe, locate the sector with the error, and use the parity data to reconstruct the sector with the incorrect checksum. This process usually occurs on-the-fly, without your noticing it (though it will post an appropriate error message). If a whole disk fails, RAID 5 can rebuild the missing disk from the data contained in the remaining disks, and it can even do so automatically to a hot spare. In most implementations, RAID 5 arrays can continue to service requests even while rebuilding a failed disk, although at a considerably slower rate because the disk heads are busy with the rebuild. RAID 5 can sustain the loss of a single drive without data loss.
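The rebuild step relies on the same XOR property used to write the parity in the first place: XORing the surviving blocks of a stripe with that stripe's parity block reproduces the missing block. A minimal sketch with made-up stripe contents:

```python
# A minimal sketch of rebuilding one lost block in a RAID 5 stripe using XOR.

def xor_blocks(blocks):
    """Byte-wise XOR of equal-sized blocks (same helper as in the parity sketch)."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

data_blocks = [b"AAAA", b"BBBB", b"CCCC"]      # data blocks in one stripe
parity = xor_blocks(data_blocks)               # written to that stripe's parity disk

lost_index = 1                                 # pretend the disk holding "BBBB" failed
survivors = [blk for i, blk in enumerate(data_blocks) if i != lost_index]
rebuilt = xor_blocks(survivors + [parity])     # XOR survivors with the parity block

assert rebuilt == data_blocks[lost_index]      # the missing block is recovered
print(rebuilt)
```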

RAID 5 can be intelligent. Some RAID 5 systems have predictive functions that can determine whether a disk is likely to fail and initiate a rebuild based on that information. Data such as inaccessible clusters, hot spots on the disk, and other factors can be an indication of an impending disk failure. That predictive capability can be put to good use. In some instances, RAID 5 systems can analyze disk activity, figure out where your hot spots are, and move data around to lessen the stress on that area of the disk as well as to optimize data locations to improve disk head access. It's no wonder that RAID 5 is so popular. It really has a lot of things going for it.

RAID 5 works best when it is used with 4 to 15 drives. Beyond 15 drives, RAID 5's performance drops, and the risk of data loss increases to unacceptable levels. RAID 5 gives you a slight write penalty (less so with more drives), enough redundancy to recover from a disk failure, and read performance similar to what you can achieve with RAID 0. One of the main attractions of RAID 5 is that you get some of the benefits of mirroring (RAID 1) without having to duplicate the entire disk set of the array.

There are two dual-level, or stacked, RAID 5 arrays in use today. The first is called RAID 1.5 or RAID 15. Here, the first array is mirrored to a second, and each array is written and striped with distributed parity. With RAID 15 you get the performance of reading from two drives at a time (one in each mirror) as well as the performance of being able to write continuously (as with RAID 1). RAID 15 is considered to offer high performance, especially for streaming and sequential reads, fast writes, and redundancy. The cost of RAID 15 is high, even greater than that of RAID 10. Most people who can afford RAID 15 opt for RAID 10 instead in order to achieve faster write performance.

The second type of stacked RAID 5 is RAID 50. With RAID 50, a drive can fail in each of the RAID 5 arrays contained in the set without bringing the volume down. RAID 50 offers enhanced write performance, but as with all RAID levels that can rebuild an array on-the-fly, you see significant performance degradation while a rebuild is taking place. RAID 50 is also expensive to build: you need the additional drives for striping, and you give up the capacity of a parity drive in each RAID 5 array. A similar RAID 0+5 type of array is also defined, but although both of these RAID levels are secure and perform well, neither is in wide use.

Nonstandard Single-Level RAID Arrays

A few additional single-level RAID array types have been defined and find occasional use in the industry. Let's take a brief look at them before moving on to nonstandard dual-level RAID and proprietary RAID level definitions described later in this chapter, in the section "Miscellaneous RAID Types."

RAID 2: Bit-Level Striping with Redundancy

RAID 2 is similar to RAID 0 in that it uses a striping technique, but instead of striping blocks, RAID 2 stripes data at the bit level. RAID 2 includes redundancy (which RAID 0 doesn't) by applying a Hamming code to determine the validity of the data contained in the array. In RAID 2 the RAID controller synchronizes the data set so that it can write to all disks in the array simultaneously. RAID 2 is not currently used in any vendor's products.

RAID 3: Byte-Level Striping with Parity

RAID 3 is similar to RAID 5 in that it stripes data, but it does so at a byte level and uses a dedicated parity disk, which reduces processing somewhat. You need at least three drives and a reasonably powerful RAID controller to get good performance out of RAID 3. You won't find RAID 3 created in software because it requires a hardware implementation to obtain reasonable performance.

RAID 3 combines striping with parity, which imparts a performance penalty on any random disk activity (both read and write operations). Thus you don't find RAID 3 used in high-performance transactional systems. Like RAID 5, RAID 3 can survive the loss of a drive in the array, and most implementations also support hot-swaps and automatic rebuilds. Unlike with other RAID levels, rebuilding a lost drive in the array doesn't impart a large performance penalty; however, it does take a long time to rebuild the missing drive because all data across the array must be rebuilt.

RAID 3 is rather similar to RAID 4, and most people implementing this type of solution opt for RAID 4.

RAID 4: Block-Level Synchronized Striping with Parity

RAID 4 is another RAID level that does striping, but it does so at a block level, with a dedicated parity disk. RAID 4 synchronizes data across all disks in the array and offers outstanding read performance for long sequential read operations. Thus RAID 4 finds favor in streaming media applications, or when large files, such as prepress files, are used. In this regard, RAID 4 is very similar to RAID 3 and has the same benefits and penalties. Figure 11.10 illustrates the difference between RAID 3 and 4.

Figure 11.10. RAID levels 3 and 4.


Along with excellent read transaction and aggregate transfer rates, RAID 4 loses little disk capacity to parity information storage. The disadvantages are that (like RAID 3) RAID 4 sustains very poor write transaction and aggregate transfer rates, and its block read transfer rate isn't much better than what you would achieve with a single disk. Rebuilds impose a low performance penalty on this type of array but take a long time to complete.

Some fairly large arrays are written with RAID 4, even though RAID 4 controllers are rather difficult to design. The most well-known proponent of RAID 4 is Network Appliance, which has pioneered the use of this RAID level in its large FAS series filers. (See www.netapp.com/tech_library/3001.html for a detailed description of how RAID 4 is applied.)

RAID 6: Independent Disks with Dual Parity

RAID 6 is very similar to RAID 5, but it adds an extra method of distributed parity, called a Reed-Solomon parity scheme. In RAID 6 the data is striped by blocks across the array in the same manner as in RAID 5, but a second set of parity data is then written across all the drives. This RAID level requires the equivalent of two additional drives to hold the parity information (as shown in Figure 11.11), but it provides both outstanding fault tolerance and high performance. RAID 6 can sustain the simultaneous failure of two drives.

Figure 11.11. A RAID 6 array.


RAID 6 may come into wider use over time, but at the moment, RAID 6 is found on very few RAID controllers, and those are usually on the expensive end of the cost spectrum because the logic needed to compute the additional parity information is hard to engineer and requires a high-performance design. However, with a fast controller, it is possible to use RAID 6 to approach RAID 5 performance with much better fault tolerance, and still at a much lower cost than mirroring allows. To get the most out of a RAID 6 array, you have to deploy it with more drives than you would a RAID 5 array. At a minimum, you need three drives, as shown in Figure 11.11, but performance will be slow. Most RAID 6 implementations choose to use a minimum of four or five drives so that there is less of a performance and disk usage penalty for the redundant RAID information.
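The dual parity can be sketched as two per-stripe syndromes: P is the same XOR parity RAID 5 uses, and Q weights each data block with Galois-field arithmetic so that any two missing blocks can be solved for. The field polynomial and generator below follow a commonly documented convention and are assumptions for illustration; recovery from a double failure is omitted.

```python
# A simplified sketch of RAID 6 P and Q syndrome computation over GF(2^8).
# The reduction polynomial 0x11d and generator 2 are common conventions; real
# controllers may differ in layout and in how they recover from two failures.

def gf_mul(a, b, poly=0x11d):
    """Multiply two bytes in the finite field GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return result

def raid6_syndromes(data_bytes):
    """Return (P, Q) for one byte position across the data disks.
    P is plain XOR parity; Q XORs each disk's byte multiplied by g**i, g = 2."""
    p, q, g_i = 0, 0, 1
    for byte in data_bytes:
        p ^= byte
        q ^= gf_mul(g_i, byte)
        g_i = gf_mul(g_i, 2)      # advance to the next power of the generator
    return p, q

print(raid6_syndromes([0x11, 0x22, 0x33, 0x44]))
```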

Miscellaneous RAID Types

The vast majority of RAID implementations that you will encounter use one of the levels described previously. A few other types of RAID do exist, though:

  • RAID 7 This RAID level is a proprietary design of Storage Computer Corporation (see www.storage.com/metadot/index.pl). It is similar to RAID 3 and 4, but it adds a data cache, which improves performance somewhat.

  • RAID S or Parity RAID This is a proprietary RAID array scheme that appears in EMC's Symmetrix storage servers. To create RAID S, you create a volume on each physical hard drive and then combine the volumes together to create parity information.

  • Matrix RAID The term Matrix RAID was coined as part of the Intel ICH6R chipset BIOS. Matrix RAID partitions a pair of drives so that part of each drive is assigned to a RAID 0 volume and the rest to a RAID 1 volume. You get the performance of RAID 0 for data you can afford to lose, along with redundancy for the data on the RAID 1 portion. Matrix RAID is really aimed at small, two-disk workstations and home users. Because the RAID 0 portion of the configuration cannot survive a disk failure, some argue that it isn't really RAID.

Software Versus Hardware RAID

You may think of RAID as a hardware feature because it uses HBAs and disk drives. Oftentimes, RAID doesn't even require your server's operating system to create or manage it because that process is handled by an ASIC on the controller. But RAID isn't really a hardware feature: It is essentially a software feature. When you use hardware RAID, the software for RAID is embedded in an ASIC and presented as part of the controller's BIOS. An ASIC (application-specific integrated circuit) is a chip that contains custom logic that supports a particular application.

Software RAID implements the various RAID functions by using your server's CPUs in place of a RAID controller's ASIC. There is some performance penalty involved in doing so, but RAID is a low-level disk function and doesn't require much processing in order to implement. Today's CPUs are so powerful that software RAID is often faster or at least no slower than the RAID you can achieve with dedicated hardware. When we talk about hardware RAID, in essence we are talking about using a controller that performs RAID offloading.

The best-known example of software RAID is the Veritas Volume Manager, which is available for both Windows and Sun servers. Light versions of the Veritas Volume Manager have appeared in Windows for a while now. Windows 2000 had it in the form of the Microsoft Logical Disk Manager (LDM), which is now the Disk Management snap-in of the MMC. Nearly every operating system offers some form of software RAID. The Linux Software RAID tools (see www.tldp.org/HOWTO/Software-RAID-HOWTO.html) and Solstice DiskSuite let you add software RAID to Linux and Solaris, respectively.



