RAID | The Complete E-Commerce Book, Second Edition: Design, Build & Maintain a Successful Web-based Business

RAID, which is an acronym for “redundant array of independent (or inexpensive) disks,” should be a part of all but the smallest websites. By purchasing a good UPS you’ve protected your site against power problems. Now you need to protect your website against data problems and drive failure. That’s where RAID comes in. Although some may feel that a RAID system is too expensive, give it serious thought.

A RAID system links two or more hard drives so that the RAID management software presents them as a single large virtual drive. Arranged this way, the drives can store data redundantly, which improves storage reliability and provides fault tolerance.

A basic RAID system includes RAID functionality built into a controller and two or more hard drives. While RAID software can implement RAID in a server without a special drive controller, the efficiency and performance leave much to be desired. Don’t use it.

RAID can be found in many different configurations and in just as many price ranges:

A floor-standing cabinet.
A complete system in one full-size drive bay.
A self-contained system with its own redundant power supplies, etc.

In RAID Levels 3 and 5, drives can be hot-swapped (you can change drives without shutting down the server) and the RAID controller and reconstruction software will automatically rebuild any lost data.

RAID Levels

Although there are many different RAID levels, the most common ones used for website operation are listed below.

RAID-0 divides each file into blocks and distributes these among multiple disks in a process called “disk striping.” This provides high performance since more than one disk is read or written simultaneously: a file can be read with one revolution of four disks as opposed to four revolutions of one disk. Unfortunately, RAID-0 lacks the one key feature you expect from a RAID subsystem: data redundancy, and hence fault tolerance. When you read that a system supports RAID-0, it really means that although the system has disk striping, it isn’t truly RAID at all. RAID-0 is used for high-performance situations such as video editing and is generally not used for web servers, unless a high-performance database must also be put online.
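To make disk striping concrete, here is a minimal Python sketch. It is an illustration only, not how any real controller is written: the function names stripe and unstripe, the in-memory lists standing in for disks, and the tiny block size are all assumptions made for this example.

```python
def stripe(data: bytes, num_disks: int, block_size: int) -> list[list[bytes]]:
    """Split data into fixed-size blocks and deal them round-robin to the disks."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        block_number = offset // block_size
        disks[block_number % num_disks].append(block)
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the data by taking one block from each disk in turn."""
    blocks = []
    longest = max(len(disk) for disk in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                blocks.append(disk[row])
    return b"".join(blocks)


# Hypothetical example: a 16-byte "file" striped over four disks in 2-byte blocks.
disks = stripe(b"ABCDEFGHIJKLMNOP", num_disks=4, block_size=2)
restored = unstripe(disks)
```

Note that every disk holds a unique part of the file. If any one of the four lists is lost, the file cannot be reassembled, which is exactly the lack of fault tolerance described above.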

Figure 11: RAID-0 — Data is divided and striped across multiple drives. RAID-0 is typically used to increase a system’s performance, but this type of RAID offers no data protection.

Figure 12: RAID-1 — Data is completely copied or “mirrored” onto a second disk. This type of RAID offers good data protection, although this is expensive for a large website.

RAID-1 is the easiest and, for a small website, can be one of the least expensive ways to protect your website’s data from a hard drive failure. With RAID-1, as the data is written it is simultaneously copied or mirrored onto a second disk which is connected to a common disk controller and, voilà! You have data redundancy. RAID-1 is considered to be the most common, secure, and reliable form of RAID. However, because every drive must be duplicated, your costs can become considerable as your need for data storage increases. At such a point you would look for another storage method such as RAID-3 or RAID-5.

Before going further into RAID technology, we should explain parity data, a type of Error Correcting Code (ECC) that avoids the cost of duplicating disk drives in their entirety. Parity comes from a method of transmitting binary data in which an extra bit (the “parity bit”) is added to each group of bits. If parity is to be odd, the parity bit is set to one or zero so that the total number of ones in the group, parity bit included, is odd. If parity is to be even, the parity bit is set so that the total number of ones is even. If a group later turns up with the wrong number of ones, an error has been detected.
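The parity rule can be sketched in a few lines of Python. This is only an illustration of the idea described above. The function names parity_bit and parity_ok are made up for the example, and the bits are written as a text string for readability.

```python
def parity_bit(bits: str, odd: bool = False) -> int:
    """Return the parity bit for a group of bits given as a string of 0s and 1s."""
    ones = bits.count("1")
    if odd:
        # Odd parity: the total number of ones, parity bit included, must be odd.
        return 0 if ones % 2 == 1 else 1
    # Even parity: the total number of ones, parity bit included, must be even.
    return ones % 2


def parity_ok(bits_with_parity: str, odd: bool = False) -> bool:
    """Check a group of bits that already has its parity bit appended."""
    ones = bits_with_parity.count("1")
    return ones % 2 == (1 if odd else 0)


group = "1011001"                         # four ones
sent = group + str(parity_bit(group))     # even parity appends a 0
damaged = "0" + sent[1:]                  # first bit flipped in transit
```

Here parity_ok(sent) is true and parity_ok(damaged) is false, so the single flipped bit is detected. A lone parity bit cannot say which bit changed; RAID can go further and repair the data because it also knows which drive has failed.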

This explanation probably appears as clear as mud, but stay with me and hopefully it will become clearer.

With parity-based RAID, when a drive fails, its data can be recovered by combining the data on the still-functioning drives with the parity data. In the simplest arrangement, the parity data sits on a dedicated parity drive. The RAID system can then re-create the contents of the failed drive.

This is similar to how one solves for a missing variable in an equation. For example, take 3+5=8, where “3” stands for the data on one drive, “5” for the data on another drive, and “8” for the parity information stored on a third drive. If the drive storing “3” fails, you can recalculate it by solving for X in the equation X+5=8, so X=3.
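In practice, parity RAID generally uses the bitwise exclusive-OR (XOR) operation rather than addition, but the idea is the same. The sketch below is an illustration, not any vendor's implementation. The helper name xor_blocks and the three-byte “drives” are assumptions made for the example.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


# Two hypothetical data drives, each holding three bytes.
drive_a = bytes([3, 200, 17])
drive_b = bytes([5, 14, 255])

# The parity drive stores the XOR of the two data drives.
parity = xor_blocks(drive_a, drive_b)

# Suppose drive_a fails: XOR the surviving drive with the parity to get it back.
recovered_a = xor_blocks(drive_b, parity)
```

Because XOR undoes itself, recovered_a is identical to drive_a. The same calculation would recover drive_b if that drive had failed instead.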

Or, to state it in very simple terms, parity-based systems calculate a result from the data in two drives and store that result on a third drive. The result can later be used to reconstruct what was on either of the other two drives, should one of them fail. Now we can continue with our RAID discussion.

RAID-3 stripes data across multiple disks one byte at a time. Parity is also calculated byte-by-byte and stored on an extra “parity drive.” All drives have synchronized rotation. When a drive fails, data is rebuilt transparently in the background from the remaining functioning drives as the system continues to operate.

RAID-5 is the most popular high-end RAID technique used today. RAID-5 stripes data at the sector or block level across a minimum of three drives. It also provides error correction by striping parity information along with the data evenly over the drive set. This results in excellent performance and good fault tolerance, but it still lags behind the performance found with RAID-1 disk mirroring. Most of the high-end, pre-configured RAID set-ups are RAID-5. A RAID-5 system with preconfigured drives, RAID-5 software, adapter cards, and the necessary cables is easy to purchase and to set up.

Figure 13: RAID-3 — Data is striped across multiple disks one byte at a time. Parity is also calculated byte-by-byte and stored on an extra “parity drive.” All drives have synchronized rotation. Although RAID-3 offers a good data redundancy system, RAID-5 is better.

Figure 14: RAID-5 — Data is striped across multiple drives in large, sector-sized blocks. Drives spin independently. Parity information is striped along with the data. RAID-5 is the most popular RAID configuration.
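The way RAID-5 spreads parity over the whole drive set can be sketched as follows. This is a simplified illustration under stated assumptions: the function names raid5_layout and rebuild_drive are invented for the example, the parity position simply moves one drive to the left on each stripe, and real controllers may use different rotation schemes.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def raid5_layout(blocks: list[bytes], num_drives: int) -> list[list[bytes]]:
    """Return stripes; each stripe holds one block per drive, one of them parity."""
    per_stripe = num_drives - 1                  # data blocks in each stripe
    block_len = len(blocks[0])
    stripes = []
    for s, start in enumerate(range(0, len(blocks), per_stripe)):
        data = blocks[start:start + per_stripe]
        # Pad a short final stripe with zero-filled blocks.
        data += [bytes(block_len)] * (per_stripe - len(data))
        parity = xor_blocks(data)
        # Assumed rotation: parity starts on the last drive and moves left.
        parity_drive = (num_drives - 1 - s) % num_drives
        stripes.append(data[:parity_drive] + [parity] + data[parity_drive:])
    return stripes


def rebuild_drive(stripes: list[list[bytes]], failed: int) -> list[bytes]:
    """Recompute every block of the failed drive from the surviving drives."""
    return [
        xor_blocks([blk for d, blk in enumerate(stripe) if d != failed])
        for stripe in stripes
    ]


# Hypothetical example: four one-byte blocks spread over three drives.
stripes = raid5_layout([b"A", b"B", b"C", b"D"], num_drives=3)
lost = rebuild_drive(stripes, failed=0)
```

In this example the first stripe keeps its parity on the third drive and the second stripe keeps it on the second drive, so no single drive does all of the parity work. Rebuilding drive 0 returns the blocks A and C that were stored on it.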

RAID-10 is really just a combination of RAID-1 and RAID-0, i.e., mirroring and disk striping. While expensive, RAID-10 is a reliable, comprehensive RAID set-up and should be considered when operating a large, full-blown e-commerce website.
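The cost differences between the levels come down largely to how much of the purchased disk space is left for data. The following sketch works through that arithmetic. It assumes identical drives, treats RAID-1 as a set in which every drive holds the same copy, and treats RAID-10 as striping across mirrored pairs. The function name usable_capacity is made up for the example.

```python
def usable_capacity(level: str, num_drives: int, drive_gb: float) -> float:
    """Estimate usable space in GB for a set of identical drives."""
    if level == "RAID-0":
        # Striping only: every drive holds unique data.
        return num_drives * drive_gb
    if level == "RAID-1":
        # Mirroring: all drives hold the same data.
        return drive_gb
    if level in ("RAID-3", "RAID-5"):
        # One drive's worth of space goes to parity.
        if num_drives < 3:
            raise ValueError("parity RAID needs at least three drives")
        return (num_drives - 1) * drive_gb
    if level == "RAID-10":
        # Striping across mirrored pairs: half the drives are copies.
        if num_drives < 4 or num_drives % 2:
            raise ValueError("RAID-10 needs an even number of drives, four or more")
        return (num_drives // 2) * drive_gb
    raise ValueError(f"unknown RAID level: {level}")


# Hypothetical example: four 100 GB drives under each scheme.
for level in ("RAID-0", "RAID-5", "RAID-10"):
    print(level, usable_capacity(level, 4, 100.0), "GB usable")
```

With four 100 GB drives, RAID-0 leaves 400 GB, RAID-5 leaves 300 GB, and RAID-10 leaves 200 GB, which is why RAID-10 is the expensive choice for a given amount of storage.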

Some kind of RAID storage is necessary for your website and there are certainly all types and levels available. To learn more about RAID systems, read Richard Grigonis’ book, Fault Resilient PCs (Miller Freeman).