Section 2.2. My Boss Insists on Real-Time Backups | Linux Annoyances for Geeks: Getting the Most Flexible System in the World Just the Way You Want It

2.2. My Boss Insists on Real-Time Backups

There are two key reasons to do backups: data redundancy in case of local hardware failure and data availability in case of disaster. Appropriate RAID arrays can ensure the availability of your data in case of hardware failure. Careful network backups can help keep your data available in case of natural or man-made disaster. Naturally, this is an administrative function beyond the capabilities of regular users.

High-speed network backups saved the data from a number of financial firms after the tragedies of September 11, 2001. Without those backups, a lot more financial data would have been lost, and I suspect the subsequent economic declines might have been much worse.

Many standard books and documents tell you how to back up your system while it's down and unavailable to users. But Linux is increasingly being used as a server in environments where downtime is considered a sin. When used with removable hard drives, RAID does not require downtime. And the removable hard drives can be sent to safe locations.

Unfortunately, hardware RAID solutions are more expensive than software RAID, and they go beyond the packages that are included with most Linux distributions. There are other high-capacity/high-availability commercial solutions, such as Red Hat Cluster Manager and SUSE Heartbeat. But if you're stuck and need to configure a real-time backup using just the software available with a Linux distribution, consider software RAID, as described in this annoyance.

2.2.1. RAID Basics

As software RAID in Linux has limits, it's useful to review some basic characteristics of RAID. Linux supports four different levels of software RAID:

RAID 0: RAID 0 doesn't offer data redundancy, but it does support faster data transfer. It allows you to combine the space from two or more approximately equal-sized disks or partitions into one volume. Once created, you can mount a single directory such as / or /usr onto that volume. To take full advantage of RAID 0, you should combine partitions from separate drives on separate controllers; performance is enhanced because data can be transferred through both controllers simultaneously. However, if any disk in the array fails, the information from all RAID 0 disks in the volume is lost.
RAID 1: RAID 1 mirrors data onto two disks or partitions of approximately equal size. When you write a file to a RAID 1 mirror, you're writing to both drives. Writes are therefore slower than to a standard hard disk. However, if one disk fails, a complete backup of the data is available on the other disk. You can replace either of the disks, if one fails or if you just want to move a backup to a remote location. Once a disk is replaced, the data from the other disk in the array is copied to the new disk. In short, RAID 1 supports robustness through redundancy.
RAID 4 and RAID 5: RAID 4 and RAID 5 allow you to configure three or more disks together in an array. Data is striped across multiple disks. Parity information supports recovery if a disk fails. On RAID 4, the parity information is stored on a single disk. On RAID 5, the parity information is distributed among all disks in the array. Recovery uses the parity data. Both forms support robustness, as with RAID 1, using less space.
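To make the space trade-off between these levels concrete, here is a minimal sketch that computes usable capacity for an array of equal-sized members. The function name raid_usable and the 80 GB figures are illustrative assumptions, not taken from any RAID tool.

```shell
#!/bin/sh
# Usable capacity of an array built from N equal-sized members.
# usage: raid_usable LEVEL N SIZE   (SIZE in any unit, such as GB)
# raid_usable is a hypothetical helper written for this example.
raid_usable() {
    level=$1; n=$2; size=$3
    case "$level" in
        0)   echo $(( n * size )) ;;        # striping only, no redundancy
        1)   echo "$size" ;;                # every member holds a full copy
        4|5) echo $(( (n - 1) * size )) ;;  # one member's worth of space holds parity
        *)   echo "unsupported level: $level" >&2; return 1 ;;
    esac
}

raid_usable 0 2 80   # two 80 GB partitions striped
raid_usable 1 2 80   # two 80 GB partitions mirrored
raid_usable 5 3 80   # three 80 GB partitions with distributed parity
```

The three calls print 160, 80, and 160: the three-disk RAID 5 array gives up one disk's worth of space to parity, while the RAID 1 mirror gives up half of its total.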

There are other levels of RAID supported by Linux. For example, there is partial support for RAID 6, which stores a second set of parity information compared to RAID 5 (and therefore can tolerate failures of two disks in the array). RAID 10 is a combination of RAID 0 and RAID 1, a striped volume built on two RAID 1 arrays.

Until the development of Serial ATA (SATA) and multidisk Parallel ATA (PATA) IDE hard disks (and controllers), effective use of RAID was limited to SCSI drives.

Naturally, if you want a real-time backup, you're looking for some implementation of a RAID 1 array. While hardware RAID is not dependent on the operating system, there is an excellent introduction to how to use it with Linux in the DPT Hardware RAID HOWTO at http://www.ram.org/computing/linux/dpt_raid.html.

RAID supports the use of spare disks. For example, if you have a spare disk on a RAID 1 array, you can remove one of the disks in the array, and the data is then mirrored automatically onto the spare.
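With the mdadm tool covered later in this annoyance, swapping a spare into a mirror might look like the following sketch. The partition /dev/hde5 and the helper names run and swap_in_spare are assumptions made for illustration. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 and run as root to apply them for real.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# swap_in_spare ARRAY SPARE OLD  (hypothetical helper)
# Adds SPARE to ARRAY, then retires OLD so the mirror rebuilds onto SPARE.
swap_in_spare() {
    array=$1; spare=$2; old=$3
    run mdadm "$array" --add "$spare"     # joins as a spare while the array is healthy
    run mdadm "$array" --fail "$old"      # mark the outgoing member as faulty
    run mdadm "$array" --remove "$old"    # detach it so the disk can be pulled
}

# /dev/hde5 is a hypothetical third partition, not one named in the text.
swap_in_spare /dev/md0 /dev/hde5 /dev/hdc5
```

Once the old member is marked faulty, the md driver starts copying data onto the spare. Wait for that resynchronization to finish before relying on the new mirror.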

2.2.2. Tools for Software RAID

Software RAID is configured through the operating system. Unlike hardware RAID, there is no dedicated hardware controller for the disk array. You can control a software RAID array with commands and configuration files.

While software RAID takes more work than hardware RAID, it can be surprisingly efficient because there is no potential bottleneck at the non-RAID hardware controller. But when you configure a software RAID array, be careful. Make sure to configure partitions on different physical drives. Otherwise, you can't get the benefit of fast access through different controllers. Furthermore, if you've configured more than one RAID 1 or 4/5 partition on a single physical drive and it fails, you'll lose all data in that array.

Before you can use software RAID, your system must meet two requirements:

The mdadm or raidtools package must be installed (raidtools is obsolete on some distributions).
The RAID multidevice (md) module must be supported by the kernel. This is easy to verify: just check kernel settings in the associated config-`uname -r` file in the /boot directory, or check for the availability of /proc/mdstat.
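The kernel check can be scripted along these lines. This is a sketch that assumes your distribution stores the kernel configuration as /boot/config-`uname -r` and that md support appears there as CONFIG_BLK_DEV_MD. The function name md_support and its output labels are made up for this example.

```shell
#!/bin/sh
# md_support [CONFIG_FILE] [MDSTAT_FILE]  (hypothetical helper)
# Prints "built-in", "module", "mdstat-only", or "missing".
md_support() {
    config=${1:-/boot/config-$(uname -r)}
    mdstat=${2:-/proc/mdstat}
    if [ -r "$config" ] && grep -q '^CONFIG_BLK_DEV_MD=y' "$config"; then
        echo "built-in"
    elif [ -r "$config" ] && grep -q '^CONFIG_BLK_DEV_MD=m' "$config"; then
        echo "module"
    elif [ -e "$mdstat" ]; then
        echo "mdstat-only"   # driver is loaded, but the config file does not confirm it
    else
        echo "missing"
    fi
}

md_support
```

A result of "module" means the driver may still need to be loaded before /proc/mdstat appears.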

A couple other packages may be useful:

SUSE's scsirastools package is designed to administer and monitor SCSI disks in a RAID 1 array. For other distributions, the source code is available from http://scsirastools.sourceforge.net.
Starting with Fedora Core 3, Red Hat includes the dmraid package, which can detect and help you manage software RAID arrays. The dmraid package is also available in the Debian Etch (testing) distribution.

2.2.3. Typical RAID 1 Configuration

A Linux software RAID 1 array keeps an exact copy of the files from one partition, such as /dev/hda5, on a second partition, such as /dev/hdc5. It's important to place each partition in a RAID 1 array on separate hard disks. While details vary, this is essentially how to set up the /home directory on a RAID 1 array:

Back up the information on /home.
Configure the partitions on the two or more hard disks that you're planning to use for the RAID 1 array.
If you're going to use the existing partition with the /home directory as one of the partitions in the RAID array, unmount it.
Make sure you have a backup of your /home directory, as you're about to destroy the data on the production partition.
Run fdisk to change the target partitions to the Linux RAID autodetect partition type (fd).
Edit the /etc/raidtab configuration file. The default SUSE raidtools package includes sample configuration files that you can use in the /usr/share/doc/packages/raidtools directory. As you may have a different distribution, I show here a sample configuration file, which assumes that you have two IDE hard drives:
```
# Sample raid-1 configuration
raiddev                 /dev/md0
raid-level              1
nr-raid-disks           2
nr-spare-disks          0
chunk-size              4
device                  /dev/hda5
raid-disk               0
device                  /dev/hdc5
raid-disk               1
```
Once you've configured /etc/raidtab, you can initialize the array with the mkraid /dev/md0 command.
Alternatively, if you use the mdadm package, you can create the same array with the following command, which creates RAID device /dev/md0, at RAID level 1, with the two RAID partition devices shown:
```
mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 /dev/hda5 \
    /dev/hdc5
```
Format the device with the appropriate command, such as mke2fs for ext2 or ext3 formatted filesystems.
To test the result, mount the RAID device on an empty directory. Check the result in your /etc/mtab file and with the df command.
Now you can configure the RAID device, in this case, /dev/md0, in /etc/fstab, with the directory of your choice.
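The final steps (format, test mount, and /etc/fstab entry) can be sketched as follows. The test directory /mnt/raidtest, the choice of ext3, and the helper names are assumptions. The script prints the commands instead of running them unless DRY_RUN=0 is set, because mke2fs destroys any data on the device.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# fstab_line DEVICE MOUNTPOINT FSTYPE  (hypothetical helper)
# Prints a tab-separated /etc/fstab entry with default options.
fstab_line() {
    printf '%s\t%s\t%s\tdefaults\t1 2\n' "$1" "$2" "$3"
}

# finish_array DEVICE TESTDIR  (hypothetical helper)
finish_array() {
    dev=$1; testdir=$2
    run mke2fs -j "$dev"          # -j adds a journal, which gives ext3
    run mkdir -p "$testdir"
    run mount "$dev" "$testdir"
    run grep "$dev" /etc/mtab     # confirm that the mount was recorded
    run df -h "$testdir"
    run umount "$testdir"
}

finish_array /dev/md0 /mnt/raidtest   # /mnt/raidtest is a hypothetical empty directory
fstab_line /dev/md0 /home ext3        # add the printed line to /etc/fstab by hand
```

Printing the fstab entry rather than appending it keeps the script from touching /etc/fstab; review the line before you add it.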

2.2.4. Networking RAID 1

You can set up one of the mirrors in a RAID array in a remote location. Naturally, this requires a dedicated high-speed connection. Details would take up another full book. There are several techniques for real-time networked RAID mirrors, based on the Enhanced Network Block Device (ENBD). With an ENBD, you can configure a remote partition from a RAID array so that it appears local on your computer. It's an academic research project funded in part by Realm Software; the home page is http://www.it.uc3m.es/ptb/enbd.