Using `tar`

The lowest common denominator of tape backups in the Linux and UNIX world is tar, the same program that's used to create archives for grouping multiple files into a single file for easier storage or transmission over the Internet. In fact, the name tar stands for tape archive. You can use tar as part of a backup procedure for a network, using either a client-initiated or a server-initiated strategy. Similar procedures apply to many other common Linux backup programs, such as cpio and dump, but the commands differ in their details, so you'll need to look up how each program handles network-related options, particularly for client-initiated backups. I cover tar here both because it's the lowest common denominator and because it's used by several other tools, such as smbtar and AMANDA.

Basic `tar` Features

The tar utility is extremely powerful and supports a large number of options. These options come in two forms: commands and qualifiers. Commands tell tar what to do: for instance, create an archive, list the contents of an archive, or extract files from an archive. Qualifiers modify the action of commands. They're used to specify the device or file tar uses, to limit the files that might be backed up, to compress the resulting archive with gzip or bzip2, and so on. When running tar, the basic syntax is:

 tar  command  [  qualifiers  ]  filenames

The filenames you specify are often directory names, possibly including the root directory (/). When you specify a directory name, tar backs up all the files and subdirectories in that directory.

Tables 17.1 and 17.2 list some of the more common tar commands and qualifiers. These are only a sample, however, particularly for qualifiers. You should consult the tar man page for information on more options.

Table 17.1. Common `tar` Commands

Command	Abbreviation	Purpose
`--create`	`c`	Creates an archive.
`--concatenate`	`A`	Adds a `tar` file to an existing archive.
`--append`	`r`	Adds ordinary files to an existing archive.
`--update`	`u`	Adds ordinary files that are newer than those in the existing archive.
`--diff` or `--compare`	`d`	Compares archived files to those on disk.
`--list`	`t`	Displays contents of an archive.
`--extract` or `--get`	`x`	Copies files out of an archive.

Table 17.2. Common `tar` Qualifiers

Qualifier	Abbreviation	Purpose
`--absolute-paths`	`P`	Keeps the leading `/` on filenames.
`--bzip2`	`I`	Passes the archive through `bzip2` . (Not available on older versions of `tar` .)
`--directory` `dir`	`C`	Changes to the specified directory before acting.
`--exclude` `file`	(none)	Blocks `file` from being backed up.
`--exclude-from` `file`	`X`	Blocks all files listed in `file` from being backed up.
`--file [` `host` `:]` `file`	`f`	Performs backup using `file` on `host` as the archive file. (The `host` option is used in client-initiated network backups.)
`--gzip` or `--ungzip`	`z`	Passes the archive through `gzip` or `gunzip` .
`--listed-incremental=` `file`	`g`	Creates or uses an incremental backup file.
`--multi-volume`	`M`	Processes a multi-tape archive.
`--one-file-system`	`l`	Backs up or restores just one filesystem.
`--same-permissions` or `--preserve-permissions`	`p`	Preserves all username and permission information.
`--tape-length` `N`	`L`	Specifies the length of a tape in kilobytes; used in conjunction with `--multi-volume` .
`--verbose`	`v`	Displays filenames as they're processed.
`--verify`	`W`	Compares original files to archive immediately after writing it.

As an example of these options in use, suppose a computer has a SCSI tape drive, which can be accessed as /dev/st0 or /dev/nst0. You could back up the /home directory of this computer, displaying the filenames as they're backed up, with the following command (tar records ownership and permissions in the archive without any extra option):

 #  tar --create --verbose --file /dev/st0 /home

The abbreviations shown in Tables 17.1 and 17.2 allow for a somewhat more succinct variant of this command:

 #  tar cvf /dev/st0 /home
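You can try the same commands without a tape drive by pointing --file at an ordinary disk file. The following is a minimal sketch rather than a production procedure; the scratch directory and the names data, hello.txt, and test.tar are made up for illustration, and in a real backup /dev/st0 would take the place of the archive file.

```shell
# Scratch area; every name below is an illustrative assumption.
work=$(mktemp -d)
mkdir -p "$work/data" "$work/restore"
echo "sample" > "$work/data/hello.txt"

# Create an archive of the data directory (c = --create, f = --file).
# --directory (C) makes tar change into $work first, so stored names are relative.
tar --create --file "$work/test.tar" --directory "$work" data

# List the archive's contents (t = --list, v = --verbose).
tar --list --verbose --file "$work/test.tar"

# Extract into a different directory (x = --extract).
tar --extract --file "$work/test.tar" --directory "$work/restore"
```

Using the long option names makes the role of each command and qualifier explicit; the single-letter abbreviations from Tables 17.1 and 17.2 behave the same way.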

A few tar options deserve special discussion. These are --one-file-system, --same-permissions, --listed-incremental, and --verify. The --one-file-system option is particularly useful for backups because Linux systems may include virtual filesystems (such as /proc), removable media, and perhaps even regular filesystems that should not be backed up. Using --one-file-system keeps tar from crossing into any filesystem mounted below the directories you specify, so when you use this option, you should list all the partitions you want to back up. Alternatively, you could omit --one-file-system and use --exclude or --exclude-from to explicitly block directories such as /proc from being backed up.
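The exclusion approach can be sketched on a scratch directory tree. This example assumes nothing about your real system: the tree, its stand-in proc directory, and the archive name tree.tar are all hypothetical, and --one-file-system has no visible effect here because the scratch tree lives on a single filesystem.

```shell
# Build a small stand-in tree; all names are illustrative assumptions.
work=$(mktemp -d)
mkdir -p "$work/tree/home" "$work/tree/proc"
echo "keep" > "$work/tree/home/file.txt"
echo "skip" > "$work/tree/proc/fake"

# --exclude takes a pattern and must appear before the names it applies to.
# --one-file-system would additionally stop tar at any mount point below tree/.
tar --create --one-file-system --exclude='tree/proc' \
    --file "$work/tree.tar" --directory "$work" tree

# Show what was actually archived.
listing=$(tar --list --file "$work/tree.tar")
echo "$listing"
```

The listing should show the home subdirectory and its file but nothing under the excluded directory.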

The --same-permissions option is particularly important when restoring system files because tar otherwise sometimes loses certain permissions, particularly those that are not allowed by the current umask value. This option matters when restoring files, not when backing them up.
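The effect of the umask on a restore can be seen with a small experiment. This sketch uses --preserve-permissions, the synonym listed in Table 17.2, and assumes GNU tar run as an ordinary user (when run as root, GNU tar preserves permissions by default); the file names are made up for illustration.

```shell
# Scratch area; all names are illustrative assumptions.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/plain" "$work/preserved"
echo "data" > "$work/src/shared.txt"
chmod 666 "$work/src/shared.txt"      # world-writable on purpose

tar --create --file "$work/perm.tar" --directory "$work/src" shared.txt

(
    # Restrictive umask, confined to this subshell.
    umask 077

    # Plain extraction: an ordinary user gets the mode filtered by the umask.
    tar --extract --file "$work/perm.tar" --directory "$work/plain"

    # With the qualifier (p), the archived mode is restored as recorded.
    tar --extract --preserve-permissions \
        --file "$work/perm.tar" --directory "$work/preserved"
)

# Compare the two restored copies.
ls -l "$work/plain/shared.txt" "$work/preserved/shared.txt"
```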

The --listed-incremental option creates or uses a file that records information on the files that tar backs up. The first time the program is run with this option, the specified file is created and all files are backed up. Subsequent uses of this option cause only files that have been added or changed since the last backup to be backed up. This allows tar to create a partial backup, which is much smaller than the regular full backup. Many administrators perform full backups every week or month, and partial backups on a daily basis. This provides good protection against disaster with minimal effort. (When restoring incremental backups, though, you may find files you've intentionally deleted have been restored, because the increment procedure doesn't mark files deleted since the last backup as deleted.) In a network environment, you may want to rotate which machines receive full backups on any given day; for instance, machine1 on Monday, machine2 on Tuesday, and so on.
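A full backup followed by a partial one might look like the following sketch. It assumes GNU tar, writes to disk files instead of tape, and uses made-up names (data, backup.snar, full.tar, incr.tar); the sleep commands only keep the file timestamps clearly apart from the backup times in this quick test.

```shell
# Scratch area; all names are illustrative assumptions.
work=$(mktemp -d)
mkdir -p "$work/data"
echo "first" > "$work/data/old.txt"
sleep 1

# First run: the snapshot file does not exist yet, so this is a full backup.
tar --create --listed-incremental="$work/backup.snar" \
    --file "$work/full.tar" --directory "$work" data

sleep 1
echo "second" > "$work/data/new.txt"

# Second run with the same snapshot file: only new or changed files are stored.
tar --create --listed-incremental="$work/backup.snar" \
    --file "$work/incr.tar" --directory "$work" data

# The partial archive should name new.txt but not old.txt.
incr_listing=$(tar --list --file "$work/incr.tar")
echo "$incr_listing"
```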

Finally, --verify is intended to check the accuracy of your backup. The verify pass will increase backup time substantially, but it may be worthwhile, particularly if your tape drive doesn't include its own verify feature. (Most mid-range and high-end drives do include verification in hardware, often referred to as read-after-write.) Any verification performed using --verify or on a second pass using the --diff command is likely to turn up some false alarms, because Linux systems are constantly active, so some files are likely to change between the backup and verify passes. Log files, files in /tmp, spool files such as mail and printer queues, and perhaps user files are particularly likely to change. If you only see a few changes in files that might reasonably have changed during the backup, there's no cause for alarm. If you see changes in other files, particularly in static files such as the contents of /usr, then it's possible that your tape, tape drive, or network connections are at fault.
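A second-pass check with --diff can be rehearsed on a disk file. In this sketch (GNU tar assumed; the names data, report.txt, and check.tar are hypothetical), a file is deliberately changed after the backup to show what a reported difference looks like.

```shell
# Scratch area; all names are illustrative assumptions.
work=$(mktemp -d)
mkdir -p "$work/data"
echo "original" > "$work/data/report.txt"

tar --create --file "$work/check.tar" --directory "$work" data

# Simulate a file that changes after the backup was written.
echo "changed after backup" > "$work/data/report.txt"

# --diff names each file that differs and exits with a nonzero status.
if tar --diff --file "$work/check.tar" --directory "$work"; then
    verify_result="identical"
else
    verify_result="differences found"
fi
echo "$verify_result"
```

Checking the exit status, as done here, is convenient in scripts; when you run the command by hand, the list of differing files is usually what you care about.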

Most modern tape drives support built-in compression, so there's no need to use the --bzip2 or --gzip options. Indeed, these options are potentially dangerous even on the low-end drives that lack compression features. The reason is that tar uses gzip or bzip2 to compress an entire archive, not individual files. If an error occurs when reading back a compressed archive, tar won't be able to recover, so all the data in the archive after that point will be lost. Tape drives' built-in compression algorithms are more robust against such errors; in the event of an error, you're likely to lose a file or two, but not the entire archive. Some backup programs don't use compression in this way, and so are more robust against errors. For instance, the commercial BRU (http://www.tolisgroup.com) package uses file-by-file compression when compression is enabled.

Testing Local `tar` and Tape Functions

When setting up a backup server, you should test basic backup functions locally before introducing the network into the equation. Local backups are invariably simpler than are network backups, so if you know that local backups work, you can be reasonably confident in attributing problems with network backups to the network configuration. In addition, it's important to remember to back up the backup server itself; like any other computer, it can fail, and if it fails without a backup, the rest of your network will be at risk.

The most basic local test is to try backing up using a command like the one presented in the previous section. The trickiest part of this is determining the correct device file to use. Four device files are common for mid-range and high-end tape devices: /dev/st0, /dev/nst0, /dev/ht0, and /dev/nht0. The first two refer to SCSI tape drives, and the second two refer to EIDE/ATAPI devices. The files whose names begin with n are nonrewinding devices; when an operation completes, the driver leaves the tape positioned where the operation ended, so you can place multiple backups on a single tape. Device filenames without the leading n refer to rewinding devices, which automatically rewind the tape after every operation. Note that this is a characteristic of the device file, not of the hardware; every tape device has both a rewinding and a nonrewinding device file. If you have multiple tape drives, the second will have a filename that ends in 1 instead of 0, the third's filename will end in 2, and so on.
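As a quick aid when hunting for the right device file, a loop such as the following reports which of the four common names exist as character devices. It is only a sketch: the function name list_tape_devices is made up, and the presence of a device file doesn't guarantee that a drive is attached or that the driver is loaded.

```shell
# Report which of the common tape device files exist on this system.
# -c tests for a character device, which is what tape device files are.
list_tape_devices() {
    for dev in /dev/st0 /dev/nst0 /dev/ht0 /dev/nht0; do
        if [ -c "$dev" ]; then
            echo "$dev: present"
        else
            echo "$dev: not found"
        fi
    done
}

list_tape_devices
```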

There are a few exotic hardware types that use other device filenames. For instance, some older tape drives interfaced through the floppy port and used device filenames like /dev/qft0 and /dev/nqft0. Such drives are very low in capacity and slow by today's standards, and so are unsuitable for network backups. Other drives use specialized interface hardware. Check the Linux kernel configuration for drivers for such boards.

If you have problems with a local backup, check your device hardware and check the drivers for the device. SCSI drives need both basic SCSI support and SCSI tape support enabled. Likewise, EIDE/ATAPI drives need both EIDE support and EIDE/ATAPI tape support. Be sure to check your ability to both back up and restore data; try using a small test directory, then a larger one. Use a verify function to confirm that your data are being recovered correctly.

Particularly if you want to place multiple backups on a single tape, the mt utility may be useful. This tool lets you control the tape drive, setting options such as its built-in compression and moving among various backup sets stored on the tape.

NOTE

The mt man page refers to backup sets as files, and tar documentation often does the same. Think of the tape as a hard disk without a filesystem; your backups are really just tar files stored sequentially on the tape, hence this terminology.

You may want to experiment with tar and mt to place multiple backups on a tape using a nonrewinding tape device. The basic syntax for mt is as follows:

 mt [-f  device  ]  operation  [  count  ] [  arguments  ]

The operation is a command such as fsf (forward space files), bsf (backward space files), rewind (rewind tape), or datcompression (set compression; send an argument of 0 to disable compression or anything else to enable it). For instance, the following string of commands creates two backups and then verifies them:

 #  tar cvplf /dev/nst0 testdir-1/
 #  tar cvplf /dev/nst0 testdir-2/
 #  mt -f /dev/nst0 rewind
 #  tar df /dev/nst0 testdir-1/
 #  mt -f /dev/nst0 fsf 1
 #  tar df /dev/nst0 testdir-2/

Most of these commands should be followed by tape activity. The first two tar commands will show the names of the files being backed up, and the last two tar commands will show the names of any files that differ between the original and the backup. The second mt command is needed when reading back the archives, but not when creating them.

Performing a Client-Initiated Backup

A client-initiated backup using tar requires that the client have a tar program and that the backup server be running an appropriate server program to grant the client's tar program access to the tape device. There's little special that you must do on the client side, aside from changing the tar commands from those described earlier. The backup server's configuration isn't the standard one in most Linux distributions, though, so you'll have to reconfigure the backup server.

Client-Initiated Network Configurations

The --file option shown in Table 17.2 takes a filename as an option. This may be a regular disk file, a device file that corresponds to a tape device, or a path to a network resource. In this final case, the backup server must be running the rshd daemon (which is often called in.rshd ). This daemon allows a remote system to execute commands on the system on which the server runs. The tar program uses this ability to pass the tar file it creates to a device file on the backup server. The rshd server comes with most Linux systems and is usually run from a super server. An /etc/inetd.conf entry to handle this server might resemble the following:

 shell  stream  tcp  nowait  root  /usr/sbin/tcpd  /usr/sbin/in.rshd -h

If your system uses xinetd, you would need to create an equivalent entry in /etc/xinetd.conf, or a dedicated startup file in /etc/xinetd.d, as described in Chapter 4, Starting Servers. An xinetd configuration probably wouldn't call TCP Wrappers (/usr/sbin/tcpd), but in either case, the security provided by TCP Wrappers or directly by xinetd is important. The rshd daemon relies almost exclusively on the caller's IP address for security. Although TCP Wrappers and xinetd provide similar access control mechanisms, the redundancy on this matter can be important in case of a security bug in rshd.

Although IP addresses are the strongest type of access control used by rshd , the server also uses usernames to control remote access in order to prevent ordinary users from running dangerous programs with undue authority on the server. Ordinarily, rshd won't accept commands from root on any remote system. The -h parameter to rshd , demonstrated in the preceding inetd.conf entry, changes this default. This is extremely important because backups of system files must ordinarily be run with root privileges in order to back up sensitive system files and all user files, depending upon your system's user file permissions. If you omit -h , ordinary users will be able to perform backups to the server, but only if the permissions for the device file on the server allow this. (Most distributions don't allow ordinary users to access tape device files in any meaningful way.)

WARNING

The -h option to rshd is broken or disabled on some systems, in which case this procedure won't work. You may be able to use SSH instead: run an SSH server on the backup server, and link ssh on the backup client to the rsh name so that tar calls ssh to do the network transfer. This has security advantages even for systems on which rshd works as described. It will only work if you configure SSH to accept logins without requiring password authentication, though, as described in Chapter 13, Maintaining Remote Login Servers.
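As an alternative to linking ssh to the rsh name, GNU tar accepts a --rsh-command qualifier that names the remote shell program to use. The following is a hedged sketch, not a tested procedure for your site: the function name remote_backup and the DRY_RUN switch are invented for illustration, /usr/bin/ssh is an assumed path, and the approach assumes that the backup server provides the rmt helper that tar runs on the remote side. By default the function only prints the command it would run.

```shell
# Hypothetical settings; buserver and /dev/st0 follow this chapter's examples.
BACKUP_HOST=${BACKUP_HOST:-buserver}
TAPE_DEVICE=${TAPE_DEVICE:-/dev/st0}
DRY_RUN=${DRY_RUN:-1}     # 1 = only print the command; 0 = actually run it

remote_backup() {
    # "$@" holds the directories to back up; rebuild it as the full command.
    set -- tar --create --verbose --rsh-command=/usr/bin/ssh \
        --file "$BACKUP_HOST:$TAPE_DEVICE" "$@"
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

remote_backup /home /var
```

Set DRY_RUN=0 only after confirming that key-based SSH logins to the backup server work without a password prompt.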

Because of the security issues surrounding rshd and its required configuration, the best configuration for a client-initiated backup server of this type is to dedicate a computer to this function. Such a computer need not be very powerful, aside from having a tape backup unit and a fast network connection. It should be protected from the Internet at large by a firewall, and ideally it shouldn't contain any vital data or run servers aside from rshd and any others needed for its configuration.

Performing the Backup

Once you've set up a backup server, you can perform backups with it. To do so, you must insert a tape into the backup server's tape drive and issue a command similar to the following on the backup client:

 #  tar cvlpf buserver:/dev/st0 /home /var /

This command backs up the /home , /var , and / directories on the current system to the rewinding tape device on buserver , and excludes any mounted filesystems other than those explicitly specified. If the three specified directories are the only ones on the computer, this command performs a complete network backup of the client.

You can use the same type of addressing with mt as you can with tar to specify a network backup device. For instance, mt -f buserver:/dev/nst0 rewind will rewind the tape in buserver's tape drive.

In sum, performing a client-initiated network backup using tar is very much like performing a local backup using tar . You must add the name of the backup server to the device specification, but otherwise the commands used are identical. The extra effort goes into configuring the backup server system.

Performing a Server-Initiated Backup

Server-initiated backups, as described earlier, have the advantage of allowing a central server to control the scheduling of backups. This type of setup places the bulk of the configuration details on the backup client, which must run an appropriate network server package. This section describes using the Network Filesystem (NFS) server, as covered in Chapter 8, File Sharing via NFS, to perform network backups. Once the client is configured, the actual backup operation is much like a local one, although you must mount the backup client's export on the backup server system in order to perform the backup.

NOTE

It's possible to use a file-sharing protocol other than NFS for network backups. In fact, the upcoming section, "Using smbmount ," describes using smbmount to back up Windows file shares. For backing up a Linux system in this way, a protocol that preserves Linux file ownership and permission information is a practical necessity; hence, NFS is a good choice.

Server-Initiated Network Configurations

You should read Chapter 8 to learn how to configure a Linux computer to export specified filesystems. To perform a complete backup of a system, you must configure that system to allow the backup server to mount all of its important disk filesystems. You can omit /proc , removable media you don't want to back up, and so on. Ordinarily, you'll configure the backup client to export all its hard disk partitions.

For backup purposes, the backup client may export all directories with read-only access; the backup server doesn't need to write to these directories. If you need to restore data, though, you'll need to change this configuration to allow write access to the relevant directories. Alternatively, you could use some more convoluted method of restoring data, such as restoring it to a directory on the backup server, which you can then export for the backup client to read; or you could use a client-initiated restore if you configure the backup server appropriately.

One potentially dangerous requirement of a server-initiated backup configuration is that the backup server's root user must have full root access rights on the backup client; in other words, you must use the no_root_squash option when you define exports. Without this option, the backup server won't be able to read many important system files, and perhaps not many users' files, either. This requirement allows miscreants with local network access or who can spoof the backup server's address to read all the files on the backup client, and even modify those files if you export client directories using read-write mode. For this reason, you should protect the backup server and all its clients with a good firewall to minimize the risk of outside access, and carefully monitor logs for evidence of tampering or other abuse.

As an example of a configuration, consider a client with three partitions that should be backed up: /home , /var , and / (root). You can export these filesystems by creating appropriate /etc/exports entries. If the backup server is called buserver , these entries might resemble the following:

 /home  buserver(ro,no_root_squash)
 /var   buserver(ro,no_root_squash)
 /      buserver(ro,no_root_squash)

If you need to restore files, you'll have to change the ro to rw and restart the NFS server. Another challenge, particularly at restore time, is keeping file ownership intact. If the backup specifies that a file is owned by, say, jbrown , and if this name doesn't map appropriately onto a correct UID, then the ownership of the file may be lost or mangled. As a general rule, it's simplest if the UIDs associated with specific users are the same on both the client and the server at both backup and restore time.
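One way to spot UID mismatches before they cause trouble is to compare the account files of the two systems. The sketch below assumes you've copied each machine's /etc/passwd to a scratch location; the function name compare_uids and the sample entries are made up for illustration.

```shell
# Print every user whose UID differs between two passwd-format files.
# Fields are colon-separated: the name is field 1 and the UID is field 3.
compare_uids() {
    awk -F: '
        NR == FNR { uid[$1] = $3; next }      # first file: remember each UID
        ($1 in uid) && uid[$1] != $3 {        # second file: report mismatches
            printf "%s: %s vs %s\n", $1, uid[$1], $3
        }
    ' "$1" "$2"
}

# Illustration with two small made-up files.
work=$(mktemp -d)
printf '%s\n' 'jbrown:x:1001:100::/home/jbrown:/bin/bash' \
              'asmith:x:1002:100::/home/asmith:/bin/bash' > "$work/client.passwd"
printf '%s\n' 'jbrown:x:1005:100::/home/jbrown:/bin/bash' \
              'asmith:x:1002:100::/home/asmith:/bin/bash' > "$work/server.passwd"

compare_uids "$work/client.passwd" "$work/server.passwd"
```

Any line printed names an account whose files could end up with the wrong owner after a restore.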

Performing the Backup

The backup commands are just like those described earlier, but you must first mount the backup client's exports on the backup server system. For instance, suppose the backup client is called buclient , and a mount point called /mnt/client exists for holding its backup directories. You might then mount and back up its files by issuing commands like the following:

 #  mount -t nfs -o soft buclient:/ /mnt/client
 #  mount -t nfs -o soft buclient:/var /mnt/client/var
 #  mount -t nfs -o soft buclient:/home /mnt/client/home
 #  cd /mnt/client
 #  tar cvlf /dev/st0 home var ./

NOTE

The preceding sequence assumes that the backup client's NFS server does not export mounted subdirectories. If the NFS server does export mounted subdirectories, you only need the first mount command.

One point to note about this particular backup sequence is that it uses cd to change into the main mount point for the backup client computer. Thus, the view in this directory is of the backup client's directory tree. The tar command backs up the individual mount points in this directory tree, but omits the complete path. The result is a tape that includes no references to the /mnt/client mount point. Files on this tape may be restored by mounting the target partition at the same mount point or elsewhere and moving into the mounted directory to do the restore. It's also possible to back up with a command like the following:

 #  tar cvlf /dev/st0 /mnt/client/home /mnt/client/var /mnt/client

Such a command includes references to the /mnt/client directory (or, more precisely, mnt/client, missing the leading /, unless you use the --absolute-paths qualifier). Such a backup can therefore only be restored if the target system is mounted in the same way as at backup, or at least in a directory that includes a mnt/client subdirectory of its own. Restores lacking such a directory tree will create one, possibly on the backup server machine rather than the backup client.
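The difference between the two approaches can be checked on a scratch tree before committing anything to tape. In this sketch the mnt/client directory is an ordinary directory standing in for the NFS mount point, and the names jbrown, notes.txt, relative.tar, and target are hypothetical.

```shell
# Scratch tree standing in for the mounted client; names are assumptions.
work=$(mktemp -d)
mkdir -p "$work/mnt/client/home/jbrown" "$work/target"
echo "notes" > "$work/mnt/client/home/jbrown/notes.txt"

# Archive from inside the mount point, so stored names start at "home/".
( cd "$work/mnt/client" && tar --create --file "$work/relative.tar" home )

# Inspect the stored names; none should mention mnt/client.
relative_names=$(tar --list --file "$work/relative.tar")
echo "$relative_names"

# Because no mount-point prefix is stored, the files can be restored anywhere.
tar --extract --file "$work/relative.tar" --directory "$work/target"
```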

WARNING

One potentially serious drawback of this type of server-initiated backup is that the backup process may stall if the backup client goes offline during the process. The -o soft mount option used in the preceding example allows the NFS client on the backup server to return errors to tar , which may be preferable to a hung backup process.
