18.5. Objective 5: Maintain an Effective Data Backup Strategy

Regardless of how careful we are or how robust our hardware might be, data will sometimes be lost. Though fatal system problems are rare, accidentally deleted files and mistakes made with mv or cp are common. Routine system backup is essential to avoid losing precious data.

There are many reasons to routinely back up your systems:

  • Protection against disk failures

  • Protection against accidental file deletion and corruption

  • Protection against disasters, such as fire, water, or vandalism

  • Retention of historical data

  • Creation of multiple copies of data, with one or more copies stored at off-site locations for redundancy

All of these reasons for creating a backup strategy could be summarized as insurance. Far too much time and effort goes into a computer system to allow random incidents to force repeated work.

18.5.1. Backup Concepts and Strategies

Most backup strategies involve copying data between at least two locations. At a prescribed time, data is transferred from a source medium (such as a hard disk) to some form of backup medium. Backup media are usually removable and include tapes, floppy disks, Zip disks, and so on. These media are relatively inexpensive, compact, and easy to store off-site. On the other hand, they are slow relative to hard disk drives.

18.5.1.1. Backup types

Backups are usually run in one of three general forms:


Full backup

A full, or complete, backup saves all of the files on your system. Depending on circumstances, "all files" may mean all files on the system, all files on a physical disk, all files on a single partition, or all files that cannot be recovered from original installation media. Depending on the size of the drive being backed up, a full backup can take hours to complete.


Differential backup

Saves only files that have been modified or created since the last full backup. Compared to full backups, differentials are relatively fast because of the reduced number of files written to the backup media. A typical differential scheme would include full backup media plus the latest differential media. Intermediate differential media are superseded by the latest and can be recycled.


Incremental backup

Saves only files that have been modified or created since the last backup of any kind, including the last incremental backup. These backups are also relatively fast. A typical incremental scheme would include full backup media plus the entire series of subsequent incremental media. All incremental media are required to reconstruct changes to the filesystem since the last full backup.

Typically, a full backup is coupled with a series of either differential backups or incremental backups, but not both. For example, a full backup could be run once per week with six daily differential backups on the remaining days. Using this scheme, a restoration is possible from the full backup media and the most recent differential backup media. Using incremental backups in the same scenario, the full backup media and all incremental backup media would be required to restore the system. The choice between the two is mainly a trade-off between media consumption (an incremental scheme must retain every medium in the series, whereas superseded differential media can be recycled) and backup time (each differential backup takes longer than the corresponding incremental backup, particularly on heavily used systems).
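The difference between the two schemes can be tried out on ordinary files. The following is a minimal sketch, assuming GNU tar, whose -N option (described later in this section) accepts a filename beginning with ./ in place of a date and then uses that file's modification time. The directory, archive, and stamp file names are hypothetical, and the sleep calls exist only to keep the timestamps clearly apart.

```shell
#!/bin/sh
# Sketch: differential vs. incremental archives driven by timestamp files.
# Assumes GNU tar; all file names here are hypothetical.
work=$(mktemp -d)
mkdir "$work/data"
cd "$work" || exit 1

echo one > data/a.txt
sleep 1

# "Sunday": full backup. The stamp file records when it ran.
touch stamp.full
tar -cf full.tar data
sleep 1

echo two > data/b.txt
sleep 1

# "Monday": the first backup after the full one is the same in both schemes.
touch stamp.mon
tar -cf mon.tar -N ./stamp.full data 2>/dev/null
sleep 1

echo three > data/c.txt

# "Tuesday", differential: everything newer than the last FULL backup.
tar -cf tue-diff.tar -N ./stamp.full data 2>/dev/null
# "Tuesday", incremental: everything newer than the last backup of ANY kind.
tar -cf tue-incr.tar -N ./stamp.mon data 2>/dev/null

# Directories are always recorded, so list regular files only.
diff_list=$(tar -tf tue-diff.tar | grep -v '/$' | sort)
incr_list=$(tar -tf tue-incr.tar | grep -v '/$' | sort)
echo "Tuesday differential:"; echo "$diff_list"
echo "Tuesday incremental:";  echo "$incr_list"

cd / && rm -rf "$work"
```

Under these assumptions, the Tuesday differential should hold both b.txt and c.txt, while the Tuesday incremental holds only c.txt. That is why a restore from incrementals needs Monday's archive as well.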

For large organizations that require retention of historical data, a backup scheme longer than a week is created. Incremental or differential backup media are retained for a few weeks, after which the tapes are reformatted and reused. Full backup media are retained for an extended period, perhaps permanently. At the very least, one full backup from each month should be retained for a year or more.

A backup scheme such as this is called a media rotation scheme, because media are continually written, retained for a defined period, and then reused. The media themselves are said to belong to a media pool, which defines the monthly full, the weekly full, and differential or incremental media assignments, as well as when media can be reused. When media with full backups are removed from the pool for long-term storage, new media join the pool, keeping the size of the pool constant. Media may also be removed from the pool if your organization chooses to limit the number of uses media are allowed, assuming that reliability goes down as the number of passes through a tape mechanism increases.

Your organization's data storage requirements dictate the complexity of your backup scheme. On systems in which many people frequently update mission-critical data, a conservative and detailed backup scheme is essential. For casual-use systems, such as desktop PCs, only a basic backup scheme is needed, if one is needed at all.

18.5.1.2. Backup verification

To be effective, backup media must be capable of yielding a successful restoration of files. To ensure this, a backup scheme must also include some kind of backup verification in which recently written backup media are tested for successful restore operations. This could take the form of a comparison of files after the backup, an automated restoration of a select group of files on a periodic basis, or even a random audit of media on a recurring basis. However the verification is performed, it must prove that the media, tape drives, and programming will deliver a restored system. Proof that your backups are solid and reliable ensures that they will be useful in case of data loss.

18.5.2. Device Files

Before discussing actual backup procedures, a word on so-called device files is necessary. When performing backup operations to tape and other removable media, you must specify the device using its device file. These files are stored in /dev, and the kernel maps each one to the device driver that controls the corresponding device. Archiving programs that use the device files need no knowledge of how to make the device work. Here are some typical device files you may find on Linux systems:


/dev/st0

First SCSI tape drive


/dev/ft0

First floppy-controller tape drive, such as Travan drives


/dev/fd0

First floppy disk drive


/dev/hdd

The slave device on the second IDE controller in a system, which could be an ATAPI Zip or other removable disk

These names are just examples. The names on your system will be hardware- and distribution-specific.

Did I Rewind That Tape?

When using tape drives, the kernel driver for devices such as /dev/st0 and /dev/ft0 automatically sends a rewind command after any operation. However, there may be times when rewinding the tape is not desirable. Since the archive program has no knowledge of how to send special instructions to the device, a nonrewinding device file exists that instructs the driver to omit the rewind instruction. These files have a leading n added to the filename. For example, the nonrewinding device file for /dev/st0 is /dev/nst0. When using nonrewinding devices, the tape is left at the location just after the last operation by the archive program. This allows the addition of more archives to the same tape.


18.5.3. Using tar and mt

The tar (tape archive) program is used to recursively read files and directories and then write them onto a tape or into a file. Along with the data goes detailed information on the files and directories copied, including modification times, owners, modes, and so on. This makes tar much better for archiving than simply making a copy, because the restored data has all of the properties of the original.

The tar utility stores and extracts files from an archive file known as a tarfile (conventionally named with a .tar extension). Since tape drives and other storage devices in Linux are viewed by the system as files, one type of tarfile is a device file, such as /dev/st0 (SCSI tape drive 0). However, nothing prevents using regular files with tar. This is common practice and a convenient way to distribute complete directory hierarchies as a single file.
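Because a tarfile can be a regular file, the basic create, list, and extract cycle can be tried without any tape hardware. The sketch below builds a small hypothetical directory tree in a scratch location; the -C option, which is not covered in this section, tells tar to change to the named directory before it archives or extracts.

```shell
#!/bin/sh
# Sketch: use a regular file as the tarfile. All names are hypothetical.
work=$(mktemp -d)
mkdir -p "$work/project/docs" "$work/unpacked"
echo "hello" > "$work/project/README"
echo "notes" > "$work/project/docs/notes.txt"

# Create: archive the tree, with member names relative to $work.
tar -cf "$work/project.tar" -C "$work" project

# List: show what the tarfile contains.
listing=$(tar -tf "$work/project.tar")
echo "$listing"

# Extract: unpack the whole hierarchy somewhere else.
tar -xf "$work/project.tar" -C "$work/unpacked"
restored=$(cat "$work/unpacked/project/docs/notes.txt")
echo "restored contents: $restored"

rm -rf "$work"
```

The same three modes apply unchanged when the argument to -f is a device file such as /dev/st0.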

During restoration of files from a tape with multiple archives, the need arises to position the tape to the archive that holds the necessary files. To accomplish this control, use the mt command. (The name comes from "magnetic tape.") The mt command uses a set of simple instructions that directs the tape drive to perform a particular action.


Syntax

 tar [options] files 


Description

Archive or restore files. tar recursively creates archives of files and directories, including file properties. It requires at least one basic mode option to specify the operational mode.

Most tar options will work with or without a leading -.


Basic mode options


-c

Create a new tarfile.


-t

List the contents of a tarfile.


-x

Extract files from a tarfile.


Frequently used options


-f tarfile

Unless tar is using standard I/O, use the -f option with tar to specify the tarfile. This might be simply a regular file or it may be a device such as /dev/st0.


-v

Verbose mode. By default, tar runs silently. When -v is specified, tar reports each file as it is transferred.


-w

Interactive mode. In this mode, tar asks for confirmation before archiving or restoring files. This option is useful only for small archives.


-z

Enable compression. When using -z, data is filtered through the gzip compression program prior to being written to the tarfile, saving additional space. The savings can be substantial, at times better than an order of magnitude depending on the data being compressed. An archive created using the -z option should also be listed and extracted with -z; older versions of tar will not recognize a compressed file as a valid archive without it. (Recent GNU tar releases can detect gzip compression automatically when reading, but specifying -z is always safe.) Tarfiles created with this option often have the .tar.gz or .tgz file extension.


-N date

Store only files newer than the date specified. This option can be used to construct an incremental or differential backup scheme.


-V label

Adds a label to the tar archive. A label is handy if you find an unmarked tape or poorly named tar file.

Don't forget quotes around label if it includes spaces or special characters.
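Several of these options are commonly combined. The following sketch, which assumes GNU tar (the -V option is a GNU extension), writes a gzip-compressed, labeled archive to a regular file and then lists it verbosely so that the label is shown. The directory name and label text are hypothetical.

```shell
#!/bin/sh
# Sketch: compressed, labeled archive written to a regular file.
# Assumes GNU tar; directory and label names are hypothetical.
work=$(mktemp -d)
mkdir "$work/etc-copy"
echo "127.0.0.1 localhost" > "$work/etc-copy/hosts"

# Quote the label because it contains spaces.
label="etc backup $(date '+%Y-%m-%d')"

# -c create, -z gzip, -f tarfile, -V label
tar -czf "$work/etc.tar.gz" -V "$label" -C "$work" etc-copy

# -t list, -z gunzip, -v verbose; the label appears as a volume header entry.
listing=$(tar -tzvf "$work/etc.tar.gz")
echo "$listing"

rm -rf "$work"
```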


Example 1

Create an archive on SCSI tape of the /etc directory, reporting progress:

 # tar cvf /dev/st0 /etc
 tar: Removing leading '/' from absolute path names in the archive
 etc/
 etc/hosts
 etc/csh.cshrc
 etc/exports
 etc/group
 etc/host.conf
 etc/hosts.allow
 etc/hosts.deny
 etc/motd
 ...

Note the message indicating that tar will strip the leading slash from /etc for the filenames in the archive. This is done to protect the filesystem from accidental restores to /etc from this archive, which could be disastrous.


Example 2

List the contents of the tar archive on SCSI tape 0:

 # tar tf /dev/st0
 ...


Example 3

Extract the entire contents of the tar archive on SCSI tape 0, reporting progress:

 # tar xvf /dev/st0
 ...


Example 4

Extract only the /etc/hosts file:

 # tar xvf /dev/st0 etc/hosts
 etc/hosts

Note that the leading slash is omitted in the file specification (etc/hosts), to match the archive with the stripped slash as noted earlier.
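The slash-stripping behavior is easy to observe with a regular file as the tarfile. In this sketch, a scratch directory created by mktemp stands in for /etc, and the conf directory and archive name are hypothetical. The member name passed to tar on extraction is the absolute path with its leading slash removed.

```shell
#!/bin/sh
# Sketch: tar stores absolute paths without the leading slash.
# The scratch directory and file names are hypothetical.
work=$(mktemp -d)
mkdir "$work/conf" "$work/restore"
echo "127.0.0.1 localhost" > "$work/conf/hosts"

# Archive by absolute path; tar warns on stderr that it strips the slash.
tar -cf "$work/conf.tar" "$work/conf" 2>/dev/null

# Member names as stored in the archive.
names=$(tar -tf "$work/conf.tar")
echo "$names"

# The name to request is the absolute path minus its leading slash.
member="${work#/}/conf/hosts"

# Extract that single member relative to a scratch directory.
tar -xf "$work/conf.tar" -C "$work/restore" "$member"
if [ -f "$work/restore/$member" ]; then
    restored_ok=yes
else
    restored_ok=no
fi
echo "restored: $restored_ok"

rm -rf "$work"
```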


Example 5

Create a compressed archive of root's home directory on a floppy:

 # tar cvzf /dev/fd0 -V "root home dir" /root
 tar: Removing leading '/' from absolute path names in the archive
 root/
 root/lost+found/
 root/.Xdefaults
 root/.bash_logout
 root/.bash_profile
 root/.bashrc
 root/.cshrc
 root/.tcshrc
 ...
 tar (grandchild): Cannot write to /dev/fd0: No space left on device
 tar (grandchild): Error is not recoverable: exiting now

As you can see from reading the error messages, there isn't enough room on the floppy, despite compression. In this case, try storing the archive to an ATAPI Zip drive:

 # tar cvzf /dev/hdd -V "root home dir" /root
 ...

As mentioned earlier, tape drives have more than one device file. A tape drive's nonrewinding device file allows you to write to the tape without sending a rewind instruction. This allows you to use tar again on the same tape, writing another archive to the medium. The number of archives written is limited only by the available space on the tape.

Often multiple archives are written on a single tape to accomplish a backup strategy for multiple computers, multiple disks, or some other situation in which segmenting the backup makes sense. One thing to keep in mind when constructing backups to large media such as tape is the reliability of the medium itself. If an error occurs while tar is reading the tape during a restore operation, it may become confused and give up. This may prevent a restore of anything located beyond the bad section of tape. Segmenting the backup into pieces may enable you to position the tape beyond the bad section to the next archive, where tar would work again. In this way, a segmented backup could help shield you from possible media errors.

See the tar info or manpage for full details.


Syntax

 mt [-h] [-f device_file] operation [count] 


Description

Control a tape drive. The tape drive is instructed to perform the specified operation once, unless count is specified.


Frequently used options


-h

Print usage information, including operation names, and exit.


-f device_file

Specify the device file; if omitted, the default is used, as defined in the header file /usr/include/sys/mtio.h. The typical default is /dev/tape.

18.5.4. Backup Operations

Using tar or mt interactively for routine system backups can become tedious. It is common practice to create backup scripts called by cron to execute the backups for you. This leaves the administrator or operator with the duty of providing correct media and examining logs. This section describes a basic backup configuration using tar, mt, and cron.

18.5.4.1. What should I back up?

It's impossible to describe exactly what to back up on your system. If you have enough time and media, complete backups of everything are safest. However, much of the data on a Linux system, such as commands, libraries, and manpages, doesn't change routinely and probably won't need to be saved often. Making a full backup of the entire system makes sense after you have installed and configured your system. Once you've created a backup of your system, there are some directories that you should routinely back up:


/etc

Most of the system configuration files for a Linux system are stored in /etc, which should be backed up regularly.


/home

User files are stored in /home. On multiuser systems, /home can be quite large.


/var/log

If you have security or operational concerns, it may be wise to save log files stored in /var/log.


/var/spool/mail

If you use email hosted locally, the mail files are stored in /var/spool/mail and should be retained.


/var/spool/at and /var/spool/cron

Users' at and crontab files are stored in /var/spool/at and /var/spool/cron, respectively. These directories should be retained if these services are available to your users.

Of course, this list is just a start, as each system will have different backup requirements.
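Since not every system has every one of these directories, a backup script can check which ones are present before handing them to tar. The sketch below uses a hypothetical helper, existing_dirs, and a scratch tree that stands in for the root filesystem, so it can be run safely as an ordinary user.

```shell
#!/bin/sh
# Sketch: archive only those candidate directories that actually exist.
# existing_dirs is a hypothetical helper; the scratch tree stands in for /.

# Print each argument that names an existing directory, one per line.
existing_dirs() {
    for dir in "$@"; do
        [ -d "$dir" ] && printf '%s\n' "$dir"
    done
    return 0
}

root=$(mktemp -d)
mkdir -p "$root/etc" "$root/home" "$root/var/log"
echo "demo" > "$root/etc/hosts"
cd "$root" || exit 1

# Relative names, as a script would use after "cd /".
targets=$(existing_dirs etc home var/log var/spool/mail \
                        var/spool/at var/spool/cron)
echo "backing up:"; echo "$targets"

# Word splitting of $targets is intended; these names contain no spaces.
tar -cf "${root}.tar" $targets
archived=$(tar -tf "${root}.tar")

cd / && rm -rf "$root" "${root}.tar"
```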

18.5.4.2. An example backup script

 #!/bin/sh

 # Options can be passed in either with the TAROPTIONS
 # environment variable or as options on the command line.
 TAROPTIONS=${TAROPTIONS:-""}

 # Uncomment this block if you want tar to be verbose
 # when this script is run interactively.
 #if [ -t 0 ]; then
 #    TAROPTIONS="$TAROPTIONS --verbose"
 #fi

 TAROPTIONS="$TAROPTIONS $*"

 # The tape device can be passed in the TARGET
 # environment variable.  The default is /dev/st0.
 TARGET=${TARGET:-"/dev/st0"}

 # Print a message on standard error and abort.
 die() {
     message="$*"
     echo "$message" >&2
     exit 1
 }

 # Report the exit status when the script ends.
 exitmessage() {
     status=$?
     if [ $status -ne 0 ]; then
         echo "tar returned with exit value $status." >&2
     elif [ -t 0 -a -t 2 ]; then
         echo "tar completed successfully." >&2
     fi
     echo "Finished `date`."
     exit $status
 }

 trap exitmessage EXIT

 label="`hostname` `date '+%A %x'`"
 echo "$label"
 echo "Started `date`."

 cd / || die "Failed to change to root directory"

 # This is a good time to turn compression on.
 # Unfortunately, that's somewhat specific to
 # particular tape drive models.
 # mt -f $TARGET compression on

 mt -f $TARGET rewind || die "Failed to rewind tape"

 # --one-file-system keeps tar from crossing mount points,
 # so each filesystem to be saved is listed explicitly.
 # (Older GNU tar releases spelled this option -l; in
 # current releases, -l means --check-links instead.)
 # The --exclude options give examples of things you
 # might not want backed up or that cause problems when
 # they are backed up (like the /dev entries listed).
 tar -V "$label" --one-file-system -cf $TARGET --totals $TAROPTIONS \
     --exclude=.journal                 \
     --exclude=lost+found               \
     --exclude=./dev/gpmctl             \
     --exclude=./dev/log                \
     .           \
     ./boot      \
     ./usr       \
     ./var       \
     ./opt       \
     ./home

18.5.4.3. Verifying tar archives

Keeping tape drives clean and using fresh media lay a solid foundation for reliable backups. In addition to those preventive measures, you'll want to routinely verify your backups to ensure that everything ran smoothly. Verification is important on many levels. Clearly, it is important to ensure that the data is correctly recorded. Beyond that, you should also verify that the tape drives and the backup commands function correctly during restoration. Proper file restoration techniques should be established and tested during normal operations, before tragedy strikes and places your operation into an emergency situation.

You can verify the contents of a tar archive by simply listing its contents. For example, suppose a backup has been made of the /etc directory using the following command:

 # tar cvzf /dev/st0 /etc 

After the backup is complete, the tape drive rewinds. The archive can then be verified immediately by reviewing the contents with the -t option:

 # tar tzf /dev/st0

This command lists the contents of the archive so that you can verify the contents of the tarfile. Additionally, any errors that may prevent tar from reading the tape are displayed at this time. If there are multiple archives on the tape, they can be verified in sequence using the nonrewinding device file:

 # tar tf /dev/nst0
 # mt -f /dev/nst0 fsf 1
 # tar tf /dev/nst0
 # mt -f /dev/st0 rewind

While this verification tells you that the tapes are readable, it does not tell you that the data being read is identical to that in the filesystem. The tar utility offers two options, --verify (-W) and --compare (-d), that may be useful if your backup device supports them. However, comparisons of files on the backup media against the live filesystem may yield confusing results if your files are changing constantly. In this situation, it may be necessary to select for comparison specific files that you are certain will not change after they are backed up. You would probably restore those files to a temporary directory and compare them manually, outside of tar. If it is necessary to compare an entire archive, be aware that doing so doubles the time required to complete the combined backup and verify operation.
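The effect of a changing filesystem on a comparison can be seen with a small experiment. This sketch assumes GNU tar, whose -d (--compare) mode reports differences between archive members and the files on disk and returns a nonzero exit status when any are found. The directory and file names are hypothetical.

```shell
#!/bin/sh
# Sketch: compare an archive against the live files with tar -d.
# Assumes GNU tar; directory and file names are hypothetical.
work=$(mktemp -d)
mkdir "$work/data"
echo "original" > "$work/data/report.txt"

tar -cf "$work/data.tar" -C "$work" data

# Compare immediately after the backup: nothing has changed yet.
if tar -df "$work/data.tar" -C "$work"; then
    clean_status=0
else
    clean_status=$?
fi
echo "status right after backup: $clean_status"

# Simulate a file that changes after the backup has been written.
echo "edited after the backup" > "$work/data/report.txt"

if diff_report=$(tar -df "$work/data.tar" -C "$work" 2>&1); then
    changed_status=0
else
    changed_status=$?
fi
echo "status after the edit: $changed_status"
echo "$diff_report"

rm -rf "$work"
```

The second comparison fails even though the backup itself is perfectly good, which is why busy files are better checked by restoring them to a temporary directory.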

18.5.4.4. File restoration

Restoring files from a tar archive is simple. However, you must exercise caution regarding exactly where you place the restored files in the filesystem. In some cases, you may be restoring only one or two files, which may be safely written to their original locations if you're sure the versions on tape are the ones you need. However, restoring entire directories to their original locations on a running system can be disastrous, resulting in changes being made to the system without warning as files are overwritten. For this reason, it is common practice to restore files to a different location and move those files you need into the directories where you want them.

Reusing a previous example, suppose a backup has been made of the /etc directory:

 # tar cvzf /dev/st0 /etc 

To restore the /etc/hosts file from this archive, the following commands can be used:

 # cd /tmp
 # tar xzf /dev/st0 etc/hosts

The first command puts our restore operation out of harm's way by switching to the /tmp directory. (The directory selected could be anywhere, such as a home directory or scratch partition.) The second command extracts the specified file from the archive. Note that the file to extract is specified without the leading slash. This file specification will match the one originally written to the medium by tar, which strips the slash to prevent overwriting the files upon restore. tar will search the archive for the specified file, create the etc directory under /tmp, and then create the final file: /tmp/etc/hosts. This file should then be examined by the system administrator and moved to the appropriate place in the filesystem only after its contents have been verified.
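The same restore-then-inspect routine can be rehearsed with a regular file in place of the tape. In the sketch below, the live and scratch directories are hypothetical stand-ins for / and /tmp, and cmp is used as a simple check of the restored copy before it is put back.

```shell
#!/bin/sh
# Sketch: restore one file into a scratch directory, inspect it,
# and only then copy it into place. All paths are hypothetical.
work=$(mktemp -d)
mkdir -p "$work/live/etc" "$work/scratch"
echo "127.0.0.1 localhost" > "$work/live/etc/hosts"

# Back up the stand-in etc directory with relative member names.
tar -czf "$work/etc.tar.gz" -C "$work/live" etc

# Simulate accidental damage to the live file.
echo "garbage" > "$work/live/etc/hosts"

# Restore out of harm's way; no leading slash on the member name.
cd "$work/scratch" || exit 1
tar -xzf "$work/etc.tar.gz" etc/hosts

# Examine the restored copy before replacing the live file.
if cmp -s etc/hosts "$work/live/etc/hosts"; then
    verdict="identical"
else
    verdict="differs"
    cp -p etc/hosts "$work/live/etc/hosts"
fi
final=$(cat "$work/live/etc/hosts")
echo "restored copy $verdict from live file; live file now: $final"

cd / && rm -rf "$work"
```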

To restore the entire /etc directory, simply specify that directory:

 # tar xzf /dev/st0 etc 

To restore the .bash_profile file for user jdean from a second archive on the same tape, use mt before using tar:

 # cd /tmp
 # mt -f /dev/nst0 fsf 1
 # tar xzf /dev/st0 home/jdean/.bash_profile

In this example, the nonrewinding tape device file is used with mt to skip forward over the first archive. This leaves the tape positioned before the second archive, where it is ready for tar to perform its extraction.

On the Exam

This Objective on system backup isn't specific about particular commands or techniques. However, tar is among the most common methods in use for simple backup schemes.


You should also know how to use the mt command to position a tape to extract the correct archive.



LPI Linux Certification in a Nutshell
ISBN: 0596005288
Year: 2004
Pages: 257