18.5. Objective 5: Maintain an Effective Data Backup Strategy

Regardless of how careful we are or how robust our hardware might be, it is highly likely that data will sometimes be lost. Though fatal system problems are rare, accidentally deleted files and mistakes made with mv or cp are common. Routine system backup is essential to avoid losing precious data. There are many reasons to routinely back up your systems:
All of these reasons for creating a backup strategy could be summarized as insurance. Far too much time and effort goes into a computer system to allow random incidents to force repeated work.

18.5.1. Backup Concepts and Strategies

Most backup strategies involve copying data between at least two locations. At a prescribed time, data is transferred from a source medium (such as a hard disk) to some form of backup medium. Backup media are usually removable and include tapes, floppy disks, Zip disks, and so on. These media are relatively inexpensive, compact, and easy to store off-site. On the other hand, they are slow relative to hard disk drives.

18.5.1.1. Backup types

Backups are usually run in one of three general forms:

Full backup: a complete copy of the selected files, whether or not they have changed since an earlier backup.
Differential backup: a copy of only those files that have changed since the last full backup.
Incremental backup: a copy of only those files that have changed since the last backup of any type.
Typically, a full backup is coupled with a series of either differential backups or incremental backups, but not both. For example, a full backup could be run once per week with six daily differential backups on the remaining days. Using this scheme, a restoration is possible from the full backup media and the most recent differential backup media. Using incremental backups in the same scenario, the full backup media and all incremental backup media would be required to restore the system. The choice between the two is mainly a trade-off between restore effort (an incremental scheme requires more media to be read during a restore) and backup time and size (each differential backup repeats every change made since the last full backup, so it grows through the week and takes longer, particularly on heavily used systems).

For large organizations that require retention of historical data, a backup scheme longer than a week is created. Incremental or differential backup media are retained for a few weeks, after which the tapes are overwritten and reused. Full backup media are retained for an extended period, perhaps permanently. At the very least, one full backup from each month should be retained for a year or more. A backup scheme such as this is called a media rotation scheme, because media are continually written, retained for a defined period, and then reused. The media themselves are said to belong to a media pool, which defines the monthly full, the weekly full, and differential or incremental media assignments, as well as when media can be reused. When media with full backups are removed from the pool for long-term storage, new media join the pool, keeping the size of the pool constant. Media may also be removed from the pool if your organization chooses to limit the number of times media may be used, on the assumption that reliability goes down as the number of passes through a tape mechanism increases. Your organization's data storage requirements dictate the complexity of your backup scheme.
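The three backup types can be illustrated with GNU tar's --listed-incremental (-g) option, which records the state of the files in a "snapshot" file and, on later runs, archives only what changed since that snapshot was written. The following is a minimal sketch, assuming GNU tar and writing to regular tarfiles rather than tape; the function names and the full.snar/incr.snar/diff.snar file layout are hypothetical. A differential run is given a throwaway copy of the full backup's snapshot, so it always compares against the full backup, while the incremental chain keeps updating its own snapshot.

```shell
#!/bin/sh
# Sketch: full, differential, and incremental backups of a directory tree,
# written to regular tarfiles with GNU tar's --listed-incremental option.
# Function names and the DEST file layout are hypothetical.

# full_backup SRC DEST: level-0 dump of everything under SRC.
full_backup() {
    # Deleting the snapshot files forces tar to archive every file.
    rm -f "$2/full.snar" "$2/incr.snar"
    tar --create --file="$2/full.tar" \
        --listed-incremental="$2/full.snar" -C "$1" . || return 1
    # The incremental chain starts from the state recorded by this dump.
    cp "$2/full.snar" "$2/incr.snar"
}

# differential_backup SRC DEST NAME: files changed since the last FULL dump.
# tar updates the snapshot it is given, so it is handed a throwaway copy
# and full.snar keeps describing the full dump.
differential_backup() {
    cp "$2/full.snar" "$2/diff.snar" || return 1
    tar --create --file="$2/$3.tar" \
        --listed-incremental="$2/diff.snar" -C "$1" .
    status=$?
    rm -f "$2/diff.snar"
    return $status
}

# incremental_backup SRC DEST NAME: files changed since the previous full or
# incremental dump; incr.snar is updated in place after every run.
incremental_backup() {
    tar --create --file="$2/$3.tar" \
        --listed-incremental="$2/incr.snar" -C "$1" .
}
```

To restore, you would extract full.tar and then either the most recent differential archive or every incremental archive in order, passing --listed-incremental=/dev/null so that tar treats them as incremental archives. Be aware that in this mode GNU tar also deletes files that were absent from a directory when that archive was made, so such restores belong in a scratch location.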
On systems in which many people frequently update mission-critical data, a conservative and detailed backup scheme is essential. For casual-use systems, such as desktop PCs, only a basic backup scheme, if any, is needed.

18.5.1.2. Backup verification

To be effective, backup media must be capable of yielding a successful restoration of files. To ensure this, a backup scheme must also include some kind of backup verification in which recently written backup media are tested for successful restore operations. This could take the form of a comparison of files after the backup, an automated restoration of a select group of files on a periodic basis, or even a random audit of media on a recurring basis. However the verification is performed, it must prove that the media, tape drives, and backup software will deliver a restored system. Proof that your backups are solid and reliable ensures that they will be useful in case of data loss.

18.5.2. Device Files

Before discussing actual backup procedures, a word on so-called device files is necessary. When performing backup operations to tape and other removable media, you must specify the device using its device file. These files are stored in /dev, and the kernel associates each one with the device driver that controls the corresponding device. Archiving programs that use the device files therefore need no knowledge of how to make the device work. Here are some typical device files you may find on Linux systems:
These names are just examples. The names on your system will be hardware- and distribution-specific.
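To see which tape device files exist on a particular system, a short check like the following may help. It is a sketch that assumes the Linux SCSI tape (st) driver naming used elsewhere in this section: /dev/st0 rewinds the tape when it is closed, and the same name with an n prefix, /dev/nst0, leaves the tape where it is. Other drivers and distributions may use different names, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: report SCSI tape device files and whether each one rewinds on close.
# Assumes the Linux st driver convention: /dev/stN rewinds when closed, and
# the same name prefixed with "n" (/dev/nstN) does not.

list_tape_devices() {
    found=0
    for dev in /dev/st[0-9]* /dev/nst[0-9]*; do
        # Skip unmatched glob patterns and anything that is not a character device.
        [ -c "$dev" ] || continue
        found=1
        case ${dev#/dev/} in
            n*) echo "$dev: nonrewinding" ;;
            *)  echo "$dev: rewinding" ;;
        esac
    done
    [ "$found" -eq 1 ] || echo "no SCSI tape device files found"
}
```

Running list_tape_devices on a machine with one SCSI tape drive would typically report both /dev/st0 and /dev/nst0 (plus any mode variants such as /dev/st0a), which is why the same drive can be addressed in two ways.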
18.5.3. Using tar and mt

The tar (tape archive) program is used to recursively read files and directories and then write them onto a tape or into a file. Along with the data goes detailed information on the files and directories copied, including modification times, owners, modes, and so on. This makes tar much better for archiving than simply making a copy, because the restored data has all of the properties of the original.

The tar utility stores and extracts files from an archive file known as a tarfile (conventionally named with a .tar extension). Since tape drives and other storage devices in Linux are viewed by the system as files, one type of tarfile is a device file, such as /dev/st0 (SCSI tape drive 0). However, nothing prevents using regular files with tar. This is common practice and a convenient way to distribute complete directory hierarchies as a single file.

During restoration of files from a tape with multiple archives, the need arises to position the tape to the archive that holds the necessary files. To accomplish this control, use the mt command. (The name comes from "magnetic tape.") The mt command accepts a set of simple operations, each of which directs the tape drive to perform a particular action, such as rewind (return to the beginning of the tape) or fsf (skip forward over a given number of filemarks).
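Because tar treats a regular file exactly like a tape device, its handling of file attributes can be tried out without any tape hardware. This sketch, assuming GNU tar and using hypothetical function names, archives a directory into a regular tarfile, lists it, and extracts it elsewhere so that the restored file's mode and modification time can be compared with the original. The -p option asks tar to restore permissions exactly instead of applying the current umask (root gets this behavior by default).

```shell
#!/bin/sh
# Sketch: use a regular file as the tarfile (instead of /dev/st0), then list
# and extract it. Function names are hypothetical; nothing here touches a tape.

make_archive() {      # make_archive SRCDIR TARFILE
    tar -cf "$2" -C "$1" .
}

list_archive() {      # list_archive TARFILE: verbose listing (modes, owners, mtimes)
    tar -tvf "$1"
}

restore_archive() {   # restore_archive TARFILE DESTDIR
    mkdir -p "$2"
    # -p restores permission bits exactly as recorded in the archive.
    tar -xpf "$1" -C "$2"
}
```

For example, make_archive /etc /tmp/etc.tar followed by restore_archive /tmp/etc.tar /tmp/etc-copy produces a copy whose files keep their recorded modes and modification times (ownership is also restored when extracting as root).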
18.5.4. Backup Operations

Using tar or mt interactively for routine system backups can become tedious. It is common practice to create backup scripts called by cron to execute the backups for you. This leaves the administrator or operator with the duty of providing correct media and examining logs. This section describes a basic backup configuration using tar, mt, and cron.

18.5.4.1. What should I back up?

It's impossible to describe exactly what to back up on your system. If you have enough time and media, complete backups of everything are safest. However, much of the data on a Linux system, such as commands, libraries, and manpages, doesn't change routinely and probably won't need to be saved often. Making a full backup of the entire system makes sense after you have installed and configured your system. Once you've created a backup of your system, there are some directories that you should routinely back up:
Of course, this list is just a start, as each system will have different backup requirements.

18.5.4.2. An example backup script

#!/bin/sh
# Options can be passed in either with the TAROPTIONS
# environment variable or as options on the command line.
TAROPTIONS=${TAROPTIONS:-""}

# Uncomment this block if you want tar to be verbose
# when this script is run interactively.
#if [ -t 0 ]; then
#    TAROPTIONS="$TAROPTIONS --verbose"
#fi

TAROPTIONS="$TAROPTIONS $*"

# The tape device can be passed in the TARGET
# environment variable. The default is /dev/st0.
TARGET=${TARGET:-"/dev/st0"}

die () {
    echo "$*" >&2
    exit 1
}

# Runs on every exit (see trap below) and reports tar's status.
exitmessage () {
    status=$?
    if [ $status -ne 0 ]; then
        echo "tar returned with exit value $status." >&2
    elif [ -t 0 ] && [ -t 2 ]; then
        echo "tar completed successfully." >&2
    fi
    echo "Finished $(date)."
    exit $status
}
trap exitmessage EXIT

label="$(hostname) $(date '+%A %x')"
echo "$label"
echo "Started $(date)."

cd / || die "Failed to change to root directory"

# This is a good time to turn compression on.
# Unfortunately, that's somewhat specific to
# particular tape drive models.
# mt -f "$TARGET" compression on

mt -f "$TARGET" rewind || die "Failed to rewind tape"

# --one-file-system keeps tar from descending into other mounted
# filesystems, so each filesystem to be saved is listed explicitly.
# GNU tar applies --exclude only to names that follow it, so the
# exclusions come first. They are examples of things you might not
# want backed up or that cause problems when they are backed up
# (like the /dev entries listed).
tar --one-file-system -c -V "$label" -f "$TARGET" --totals \
    --exclude=.journal \
    --exclude=lost+found \
    --exclude=./dev/gpmctl \
    --exclude=./dev/log \
    $TAROPTIONS \
    . \
    ./boot \
    ./usr \
    ./var \
    ./opt \
    ./home

18.5.4.3. Verifying tar archives

Keeping tape drives clean and using fresh media lay a solid foundation for reliable backups. In addition to those preventive measures, you'll want to routinely verify your backups to ensure that everything ran smoothly. Verification is important on many levels. Clearly, it is important to ensure that the data is correctly recorded.
Beyond that, you should also verify that the tape drives and the backup commands function correctly during restoration. Proper file restoration techniques should be established and tested during normal operations, before tragedy strikes and places your operation into an emergency situation.

You can verify the contents of a tar archive by simply listing its contents. For example, suppose a backup has been made of the /etc directory using the following command:

# tar cvzf /dev/st0 /etc

After the backup is complete, the tape drive rewinds. The archive can then be verified immediately by reviewing the contents with the -t option. Because the archive was written with gzip compression, the z option is given again; GNU tar cannot always detect compression on its own when reading from a tape:

# tar tzf /dev/st0

This command lists the contents of the archive so that you can verify the contents of the tarfile. Additionally, any errors that may prevent tar from reading the tape are displayed at this time. If there are multiple archives on the tape, they can be verified in sequence using the nonrewinding device file:

# tar tf /dev/nst0
# mt -f /dev/nst0 fsf 1
# tar tf /dev/nst0
# mt -f /dev/st0 rewind

While this verification tells you that the tapes are readable, it does not tell you that the data being read is identical to that in the filesystem. If your backup device supports them, GNU tar offers two options that may be useful to you: --verify (-W), which reads the archive back after writing it (it cannot be combined with compression), and --compare (-d, also spelled --diff), which compares an existing archive against the filesystem. However, comparisons of files on the backup media against the live filesystem may yield confusing results if your files are changing constantly. In this situation, it may be necessary to select for comparison specific files that you are certain will not change after they are backed up. You would probably restore those files to a temporary directory and compare them manually, outside of tar. If it is necessary to compare an entire archive, be aware that doing so roughly doubles the time required to complete the combined backup and verify operation.

18.5.4.4. File restoration

Restoring files from a tar archive is simple.
However, you must exercise caution regarding exactly where you place the restored files in the filesystem. In some cases, you may be restoring only one or two files, which may be safely written to their original locations if you're sure the versions on tape are the ones you need. However, restoring entire directories to their original locations on a running system can be disastrous, resulting in changes being made to the system without warning as files are overwritten. For this reason, it is common practice to restore files to a different location and move those files you need into the directories where you want them.

Reusing a previous example, suppose a backup has been made of the /etc directory:

# tar cvzf /dev/st0 /etc

To restore the /etc/hosts file from this archive, the following commands can be used:

# cd /tmp
# tar xzf /dev/st0 etc/hosts

The first command puts our restore operation out of harm's way by switching to the /tmp directory. (The directory selected could be anywhere, such as a home directory or scratch partition.) The second command extracts the specified file from the archive. Note that the file to extract is specified without the leading slash. This file specification will match the one originally written to the medium by tar, which strips the slash to prevent overwriting the files upon restore. tar searches the archive for the specified file, creates the etc directory under /tmp, and then creates the final file: /tmp/etc/hosts. This file should then be examined by the system administrator and moved to the appropriate place in the filesystem only after its contents have been verified.
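The restore-then-examine procedure described above can be wrapped in two small functions: one extracts a single member into a fresh scratch directory, and the other shows how the restored copy differs from the live file before anything is moved into place. This is a sketch under stated assumptions: the function names are hypothetical, the archive is assumed to be gzip-compressed (matching the tar cvzf example), and the optional ROOT argument exists only so the live tree can be somewhere other than / (for testing).

```shell
#!/bin/sh
# Sketch: restore one member of a gzip-compressed tar archive into a scratch
# directory, then compare it with the live copy before overwriting anything.
# ARCHIVE may be a tape device such as /dev/st0 or a regular file.
# Function names and the optional ROOT argument are hypothetical.

# restore_to_scratch ARCHIVE MEMBER: prints the scratch directory used.
restore_to_scratch() {
    scratch=$(mktemp -d) || return 1
    # MEMBER is given without a leading slash (e.g. etc/hosts), as tar stored it.
    tar -xzf "$1" -C "$scratch" "$2" || return 1
    echo "$scratch"
}

# compare_with_live SCRATCHDIR MEMBER [ROOT]: unified diff of live vs restored.
# Returns 0 when the files are identical, 1 when they differ.
compare_with_live() {
    diff -u "${3:-}/$2" "$1/$2"
}
```

For example, dir=$(restore_to_scratch /dev/st0 etc/hosts) followed by compare_with_live "$dir" etc/hosts shows exactly what would change if $dir/etc/hosts were moved over /etc/hosts.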
To restore the entire /etc directory, simply specify that directory:

# tar xzf /dev/st0 etc

To restore the .bash_profile file for user jdean from a second archive on the same tape, use mt before using tar:

# cd /tmp
# mt -f /dev/nst0 fsf 1
# tar xzf /dev/st0 home/jdean/.bash_profile

In this example, the nonrewinding tape device file is used with mt to skip forward over the first archive. This leaves the tape positioned before the second archive, where it is ready for tar to perform its extraction. As before, the member name is given without its leading slash. Because tar then reads through the rewinding device /dev/st0, the tape rewinds when tar finishes.
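The positioning steps above generalize to any archive on the tape: rewind, skip forward over one filemark per preceding archive, and extract. The function below is a sketch of that sequence, not a standard tool; the function name and the TAPE variable are hypothetical, it assumes the Linux nonrewinding device /dev/nst0 so the position set by mt survives until tar opens the device, and it assumes the archives were written with gzip compression as in the examples above.

```shell
#!/bin/sh
# Sketch: extract MEMBER from the Nth archive (counting from 1) on a tape.
# Uses the nonrewinding device so mt's positioning is kept between commands.
# The function name and the TAPE variable are hypothetical.

TAPE=${TAPE:-/dev/nst0}

extract_from_archive() {   # extract_from_archive N MEMBER
    n=$1
    member=$2
    mt -f "$TAPE" rewind || return 1
    if [ "$n" -gt 1 ]; then
        # Skip the filemarks that end archives 1 .. N-1.
        mt -f "$TAPE" fsf $((n - 1)) || return 1
    fi
    # MEMBER is given without a leading slash, as tar stored it.
    tar -xzf "$TAPE" "$member"
    status=$?
    mt -f "$TAPE" rewind   # leave the tape at the beginning when done
    return $status
}
```

Run from /tmp, extract_from_archive 2 home/jdean/.bash_profile performs the same restore as the example above and then returns the tape to its starting point.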
You should also know how to use the mt command to position a tape to extract the correct archive.