The one task that is overlooked by a majority of users who operate their own personal systems, whether running Linux or any other operating system, is backing up and archiving necessary data and resources. Yet, this same task is thrust upon most system administrators and operators who manage multiuser servers. So, before moving ahead, you should understand what the terms backup and archival really mean. Do they refer to the same process, or are they two completely different tasks? For most people, the two terms are synonymous because they are often discussed together in books, magazines, and other documentation. In reality, however, they have different connotations and purposes.
A backup is a copy of a resource kept elsewhere in case the existing resource fails or becomes corrupted in some way; it allows you to get back up and running quickly. For example, you would typically want to back up the operating system and all of the configuration files, so you can quickly recover in case of a fatal system crash. Backups are typically optimized for speed of recovery of a system or application. On the other hand, an archive is a copy of a resource that you may need, but you don't want it cluttering up your main system. The resource could be an old file that you don't use anymore, or it could be a snapshot of a frequently changing resource that was taken at a particular point in time. You would typically want the ability to retrieve portions of data, such as a specific file or an exact system log from a certain date, from an archive.
Backing up and archiving are usually mentioned together in the same breath, mainly because one common practice involves taking backups and archiving them. However, confusing the two tasks will lead to the requirements for both processes being poorly served. And so, this chapter considers the purpose and process of each task separately.
Linux comes with a vast assortment of applications and utilities to back up and archive resources quickly and easily. There are tools available for bundling individual files and directories, mirroring disk partitions and filesystems, and performing network backups. However, to keep things simple, we'll use the tar application for a majority of this section. Some of the other useful utilities and tools are discussed briefly at the end of the section.
The tar application, which stands for tape archive, is a traditional UNIX tool for bundling and distributing files. It is best suited for single-user systems or systems with a small amount of data, because it lacks the sophisticated features of more complex backup and archival applications. tar works by taking a specified set of files and serializing them into one big stream that consists of the file headers (name, owner, file creation date) and contents. You can store this data stream in a file on a filesystem or removable media, or even directly on a magnetic tape. One of the main advantages of tar is its support across many operating systems, ranging from Linux to Windows and the Mac OS. For example, the highly popular WinZip application for Microsoft Windows and the StuffIt application for both Windows and Mac OS can deserialize tar files without any problem. The implications of this type of portability are huge; you can recover resources from your saved backups and archives on other operating systems, even if you are having problems with your Linux system.
As discussed previously, there is a distinct difference in process and purpose between backing up and archiving. Backups are generally used for disaster recovery. In a typical computing environment, you can run into all types of unfortunate situations where you lose precious data, but having a good-quality backup (a copy of a resource from a previous point in time) will help you get back up and running with ease. Here are a few of these situations.
You could accidentally delete a file critical to the performance of the system, such as a configuration file or even the kernel itself. Imagine what would happen if you deleted the critical /etc/passwd file. No one would be allowed to log in to the system. But, if you could quickly recover this important file from a previous backup, you would be back in business in no time. Actually, this is one of the more practical reasons why you should not be logged in as the system administrator (root user) on a regular basis; you will have the necessary privileges to modify or remove just about any file on the system.
If, for example, you are the main system administrator for a corporation that deals with daily financial transactions, it is paramount for the systems to be running all of the time. What if you experience a serious hardware failure that wipes out all of your financial applications? If you had meticulously backed up the most recent versions of the applications, their license keys, and configuration, you could get the system working properly with only a minimum loss of business. On the other hand, had you not backed up this information, you would have to find all of these applications from the original distributions, install and configure them on the system, and make sure all of the patches are up to date. This could take a very long time, which would translate into a serious loss of business.
These are just examples of two types of failures. Unfortunately, there are plenty of other less dramatic instances that you could experience, where you could save precious time and money by making sure you back up all of your critical applications and files. And so, this section discusses various issues related to backups, including the following:
What data needs to be backed up?
What medium should you use for backups?
How do you create and verify backups?
How often do you need to back up your data?
This is a very hard question to answer; it depends highly on your needs and requirements. On a typical Linux system, you will find the operating system, its configuration and log files, any special applications that you install, their licenses and configuration, and users' personal data. Of course, this is a very simplistic view; you probably have a lot more information and data that is specific to your needs. But, based on this list and your specific requirements, you need to ask yourself just one question: What resources do I need to get the system up and running quickly in case of a disaster of some sort?
Most likely, your answer included the operating system, configuration files, and all of the specially installed applications and their related resources. However, if you have not updated the original operating system with patches and other updates to core applications, you don't need to back it up; you can simply install it from the original distribution and restore the system configuration from saved backups. On the other hand, if you have made a lot of changes to the original system, you will save a lot of time and unnecessary headache by simply backing up the entire operating system, including all of the configurations.
Your specially installed applications are a different matter altogether. Because these applications and their related data are critical either for your business or for your everyday needs, you definitely should back them up. Why go through the hassle of installing all of these applications and their patches from the original medium when you can simply restore them from backups quickly and easily?
In addition, you should also consider the speed with which you can recover these resources when needed. For example, if you have access to a high-speed network attached storage (NAS) server with an extremely large storage capacity, you might consider even backing up critical resources that you can install from their original distribution. This will allow you to restore everything in one fell swoop very quickly without the hassle of dealing with different media.
And, finally, other resources such as users' personal data and server log files are important, but are not critical for the running of the system. Sure, it is very useful to know who accessed your Web server at what time (which you can see from the Apache Web server log file), but if you lose a few days' worth of data, it will not significantly hurt you. Having said that, however, these resources should be archived on a periodic basis; see the next section on archiving for more information. Your users will not be too thrilled if they lose all of their files.
We have talked about what you need to back up, but what medium do you use to actually store these resources? Various types of media are available for backing up data, everything from floppy disks and magnetic tapes to CD-ROM, DVD, and hard disks, as well as network storage. The following table illustrates some of the advantages and disadvantages of each type of media.
Removable disks (Zip, USB Flash): Low capacity, not supported on all operating systems, more expensive than floppy disks, and only somewhat reliable.

Magnetic tape: Inexpensive, high capacity; ideal for unattended backups. However, tapes are sensitive to heat and electromagnetic fields, relatively slow, and the drives are expensive.

CD-ROM/DVD: Inexpensive and reasonably fast, and DVD has high capacity. However, CD-ROM has low capacity, and there are multiple DVD standards.

Hard disk/Network storage: Very fast, no media to load, and relatively inexpensive. However, backups stored in the system are vulnerable to the same risks as the system itself.
You need to assess the cost and benefits of what medium to choose based on the answers to the following three questions:
How long will you be storing the backups?
How frequently do you envision accessing the backups and restoring data?
What are you willing to pay for the medium and its associated writing device?
You should typically lean toward optical media, such as CD-ROM or DVD, if you are intent on storing backups for a long period of time. On the other hand, all magnetic media are sensitive to environmental factors, including heat and electromagnetic fields, and typically have a much shorter shelf life than optical media. Optical media also tend to be much less expensive than all other media, costing only pennies for nearly 700MB of storage capacity.
However, hard disks and dedicated network storage servers are ideal for recovering data quickly from saved backups. You should stay away from storing backups in an internal hard disk that is physically located in the same machine as the operating system, as you run the risk of the same problem wiping out the system and the backup. A better solution might include setting up a standalone disk array, formatted as VFAT or NTFS, that can be used by all Linux machines in your network to store backups using SAMBA (see Chapter 9 for more information). In addition, by formatting this disk array as VFAT or NTFS, you can access the same disk array with most versions of the Windows operating system, as well. Needless to say, while hard disks and network storage servers provide great performance, they are also much more expensive than all other media.
Now that you have an idea of what backups are, what types of resources you should back up, and where you should back them up to, it is time to look at some examples. In this section, you learn more about tar , especially how to create an archive and restore from an existing archive.
Let's start with a simple example:
# tar -cWPhf etc-200403101335.tar /etc
The tar application will recursively iterate through all the files and subdirectories within /etc and create an archive named etc-200403101335.tar in the current directory. The meaning of the word archive here is different from the purpose and process of archiving, as defined at the top of this section. The -c and -W switches ask tar to create an archive and verify it after creation, storing the final archive in the file specified by the -f switch. If an error occurs during the creation of a backup file (so that the file becomes corrupted in some way), that backup file is as good as useless. If you've taken the care to generate regular and systematic backups, and you find that a backup is corrupted when you ultimately turn to it in need, you're unlikely to be very happy. Therefore, it's a good idea to check the validity of each backup at the time you create it, so that you can rest assured that the backup file will not let you down if you ever need to use it.
The -P switch forces tar to keep the leading / (slash) in filenames; all files will be stored as /etc/ instead of the default etc/. Not specifying this switch would cause tar to fail during the verification phase, because it would not be able to find files starting with etc/ in the current directory.
We did not discuss the very important -h switch in the previous paragraph. By default, tar archives symbolic links (also known as shortcuts on other platforms) as is, which is not much help if you were to lose the original. The -h switch, on the other hand, forces tar to archive the original files pointed to by the symbolic links.
And finally, you might be wondering why we named our archive etc-200403101335.tar . It is very important for you to choose a proper naming convention for your backups because doing so allows you to distinguish one backup from another. This one uses the folder name followed by the date and time (March 10, 2004, 1:35 PM), which allows you to see the date of the backup simply by looking at the archive name.
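If you script your backups, the timestamp can be generated automatically rather than typed by hand. Here is a minimal sketch; the etc- prefix and the YYYYMMDDHHMM format are simply the convention used in this section:

```shell
# Build an archive name such as etc-200403101335.tar from the current time
stamp=$(date +%Y%m%d%H%M)   # YYYYMMDDHHMM, e.g., 200403101335
archive="etc-${stamp}.tar"
echo "$archive"
```

You could then pass "$archive" straight to tar, as in tar -cWPhf "$archive" /etc.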
What if you want to see the files that tar is archiving? Simple, add a -v (verbose) switch to the preceding command, like so:
# cd /
# tar -cvWhf etc-200403101339.tar /etc
tar: Removing leading `/' from member names
etc/
etc/sysconfig/
etc/sysconfig/network-scripts/
etc/sysconfig/network-scripts/ifup-aliases
...
Verify etc/shadow
Verify etc/gshadow
Verify etc/lilo.conf.anaconda
Verify etc/fstab.REVOKE
As tar archives each file, you will see it listed on your screen. You can add the -v switch to any tar command to enable verbose output. Also notice how tar removed the leading / (slash) from the file names, because the -P switch was not specified.
How do you restore files from a tar archive? You can simply do the following:
$ tar -xvf etc-200403101339.tar
etc/
etc/sysconfig/
etc/sysconfig/network-scripts/
etc/sysconfig/network-scripts/ifup-aliases
...
The tar application creates a directory called etc in your current working directory, along with all the files and subdirectories that were archived. We use the -f switch to specify the tar archive from which to restore, the -x switch to enable extraction (restoration) mode, and the -v switch to enable verbose output.
If you don t want to extract all the files, or are not sure what files are contained in an archive, use the -tf switches first to look at the files in the archive:
$ tar -tvf etc-200403101339.tar
Then, you can extract a specific file, like so:
$ tar -f etc-200403101339.tar -vx etc/shadow
By default, tar overwrites existing files with files from the archive. This is typically not a problem unless you perform the extraction or restoration in the directory where the original file or directory resides. For example, if you were to invoke the following set of commands, the file etc/shadow from the archive would overwrite the original file /etc/shadow :
# cd /
# tar -f etc-200403101339.tar -vx etc/shadow
You can use the -k switch to prevent tar from overwriting existing files:
$ tar -f etc-200403101339.tar -vxk etc/shadow
The tar application has other options, such as the ability to compress archives, to include files from multiple directories, and to exclude certain files. Here is another example:
# tar -cvPhzf mysystem-200403121800.tar.gz /etc /usr/local /home
Here, tar will include configuration files from /etc, custom applications from /usr/local, and users' personal data from /home in a gzip-compressed archive, mysystem-200403121800.tar.gz. No verification will be performed, because the -W switch is not compatible with compression.
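Although -W cannot verify a compressed archive, you can still sanity-check one after creating it. The following sketch uses a scratch directory under /tmp, so the paths are purely illustrative; gzip -t validates the compression layer, and tar -tzf confirms the archive can be read from end to end (note that neither compares contents against the original files the way -W does):

```shell
# Create a small compressed archive in a scratch area
mkdir -p /tmp/demo-w/data
echo "sample" > /tmp/demo-w/data/file.txt
tar -czf /tmp/demo-w/data.tar.gz -C /tmp/demo-w data

# Check the gzip layer, then make sure the archive index is readable
gzip -t /tmp/demo-w/data.tar.gz
tar -tzf /tmp/demo-w/data.tar.gz > /dev/null && echo "archive readable"
```

GNU tar also offers a -d (diff) switch that compares archive members against the files on disk, which is the closest equivalent to -W for compressed archives.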
You should make it a habit to store all of your specially used and custom-built applications in a separate directory hierarchy, so it is easy to back up these resources quickly without having to remember a long list of disparate directories. The /usr/local directory is quite popular for this purpose among UNIX users, developers, and administrators.
You can restore this compressed archive by using the -z switch again:
$ tar -zxvf mysystem-200403121800.tar.gz
Over the last several years, there has been a great explosion in the number of CD writers installed in personal computers, as prices have dropped dramatically across the board. This is terrific for us: we can use a CD writer to back up our data and resources to a CD-ROM, providing an inexpensive and stable backup solution.
CD writers come with many interfaces: parallel port, IDE, SCSI, USB, and FireWire. Because IDE writers are by far the most widely used, we will deal with them exclusively in this section; a similar procedure applies for all other types of writers, as well. Linux uses the ide-cd kernel module to communicate with IDE CD writers; this module comes with most recent Linux 2.6 distributions. See if you have the module installed by using the following:
# find /lib/modules/`uname -r` -name 'ide-cd.ko' -print
/lib/modules/2.6.1-1.65/kernel/drivers/ide/ide-cd.ko
Kernel modules and this command syntax are described in great detail in Chapter 12. The kernel will load the module on demand at boot time when it detects the presence of the CD writer.
Burning a CD-ROM involves two distinct steps: first, create an ISO9660 image, and then burn that image to the disc. We will use the mkisofs command to create the image:
# mkisofs -o mysystem-200403121800.iso \
    -V MySystem-2004-03-10-1800 \
    -J -R -v mysystem-200403121800.tar.gz
INFO: UTF-8 character encoding detected by locale settings.
Assuming UTF-8 encoded filenames on source filesystem,
Use -input-charset to override.
mkisofs 2.01a27 (i686-pc-linux-gnu)
Writing: Initial Padblock Start Block 0
Done with: Initial Padblock Block(s) 16
...
Total translation table size: 0
Total rockridge attributes bytes: 269
Total directory bytes: 0
Path table size(bytes): 10
Done with: The File(s) Block(s) 11909
Writing: Ending Padblock Start Block 11940
Done with: Ending Padblock Block(s) 28
Max brk space used 6024
11968 extents written (23 MB)
The -o switch specifies the ISO9660 image name, the -J switch enables Joliet filenames for Windows compatibility, the -R switch preserves filenames and permissions, the -V switch specifies the volume identification, and the -v switch enables verbose output.
If you cannot get the mkisofs application to work, you may not have it installed on your Linux system. In that case, simply insert either the DVD or the first CD-ROM from the distribution, wait a few moments, and invoke the following commands:
# rpm -hvi /mnt/cdrom/Fedora/RPMS/mkisofs-2.01-0.a27.3.i386.rpm
# rpm -hvi /mnt/cdrom/Fedora/RPMS/cdrecord-2.01-0.a27.3.i386.rpm
You will need the cdrecord application to actually burn the image to disc, as you will see next.
Before you burn the ISO9660 image to disc, you can view its contents by mounting it to a filesystem directory via a loopback device, like so:
# mount -t iso9660 mysystem-200403121800.iso /mnt/cdrom -o loop
# ls -l /mnt/cdrom
total 1245
-rw-r--r--  1 root  root  1274545 Mar 12 18:01 mysystem-200403121800.tar.gz
This mounts the image in the /mnt/cdrom directory. You can treat the image like a regular filesystem, using commands such as ls and cat to view individual directories and files. When you are finished checking the contents, you can unmount the image, like so:
# umount /mnt/cdrom
Now that the image has been tested, you can burn it onto a CD-ROM with the following:
# cdrecord -scanbus dev=ATAPI
Cdrecord-Clone 2.01a27-dvd (i686-pc-linux-gnu) Copyright (C) 1995-2004 Jörg Schilling
Note: This version is an unofficial (modified) version with DVD support
Note: and therefore may have bugs that are not present in the original.
...
Using libscg version 'schily-0.8'.
scsibus0:
        0,0,0     0) 'LG      ' 'CD-RW CED-8080B ' '1.07' Removable CD-ROM
        0,1,0     1) *
        0,2,0     2) *
        0,3,0     3) *
        0,4,0     4) *
        0,5,0     5) *
        0,6,0     6) *
        0,7,0     7) *
# cdrecord -v -eject -speed=8 dev=ATAPI:0,0,0 mysystem-200403121800.iso
The cdrecord command with the -scanbus argument returns a list of all the CD-ROM devices attached to the system. You pass the device number from the list to the cdrecord command, in addition to the speed and the ISO9660 image, to burn the image to the disc; the -eject switch forces the CD writer to eject the tray after the burn is finished.
Later on, if you want to restore a file from a backup stored on the CD-ROM, you would mount the disc and extract the necessary file or files directly from it, like so:
# mount /dev/cdrom /mnt/cdrom
$ tar -zxvf /mnt/cdrom/mysystem-200403121800.tar.gz
Or, you can extract a single file:
$ tar -zf /mnt/cdrom/mysystem-200403121800.tar.gz -vx /etc/shadow
Having considered how to create and store backups, you now need to think about how often to perform them. The frequency with which a certain resource should be backed up depends on several issues, namely the following:
How often does the resource change, making the old backup outdated?
How much data can you afford to lose?
For example, financial institutions cannot afford to lose any data, so transaction data must be backed up as the transactions occur, for replay later if necessary. Unfortunately, this can be expensive, and in many other situations, it is possible and optimal to back up daily or weekly. Fortunately, frequent backups don't necessarily have to consume a lot of space, because most backup applications support incremental backups. An incremental backup is different from a full, or complete, backup in that it archives only those files that have been added or changed since the last full backup.
Finally, how do you create incremental tar archives that contain files that have only been added or modified since a specific date? Luckily, the newer versions of tar have an -N switch, which allows you to do this:
# tar -cvWPhf etc-200403121830.tar /etc -N 01/01/04
This will archive files that have been added or modified after January 1, 2004. As you can see from all of these examples, tar is a very flexible and powerful archival tool, but one that is not designed to archive arbitrary files located in various directories throughout the system. For that purpose, you use the cpio application, which is briefly explained toward the end of this section.
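Besides -N, newer (GNU) versions of tar support snapshot-based incremental backups through the --listed-incremental switch, which tracks file state in a snapshot file instead of relying on a cutoff date. A small sketch in a scratch directory (all paths here are illustrative):

```shell
# Level 0: full backup; tar records the state of every file in the snapshot
mkdir -p /tmp/demo-incr/src
echo "one" > /tmp/demo-incr/src/a.txt
tar -cPf /tmp/demo-incr/full.tar \
    --listed-incremental=/tmp/demo-incr/snapshot /tmp/demo-incr/src

# Later, a new file appears...
echo "two" > /tmp/demo-incr/src/b.txt

# Incremental run: only the new file (plus directory metadata) is archived
tar -cPf /tmp/demo-incr/incr.tar \
    --listed-incremental=/tmp/demo-incr/snapshot /tmp/demo-incr/src
tar -tf /tmp/demo-incr/incr.tar
```

Because the snapshot file is updated on every run, each incremental archive picks up exactly what changed since the previous run, without you having to supply a date.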
But now, we will look at the process of archiving resources and information, a process that has an entirely different purpose than backups.
People are creatures of habit; they like to keep things around whether they need them or not. Take a good look around your Linux system. You will most likely find a ton of resources that you may no longer need, including old log files, copies of previous system configuration files, source code for applications that you are no longer using, or outdated personal data. Obviously, you don't want to go out and delete all of these resources on a whim, because you might really need them at a later time. For example, what if you need to determine the Web server utilization rate for the last three years to plan for the future? You would need to take a look at all of the old Web server log files during that time period to determine this information. Or, what if you want to see how you built a certain application with an older set of development tools, so you could use the same procedure with an updated version of the tools? You certainly would not want to spend the time to rediscover the techniques that you followed.
On the flip side, deleting these types of resources not only removes clutter from your system, but also frees up valuable disk space that you could use to store other more important resources. You will be surprised at just how much space all of these older resources consume. The answer to this dilemma is to archive the data and information in a safe place, away from the main system. That way, you can always retrieve any of the information when needed, while keeping your system clean and up-to-date.
Remember again the distinction between backups and archiving. Backups allow you to recover critical resources so you can get up and running quickly in case of a disaster. Archiving data serves another purpose completely, namely long-term storage of data and resources that you may need at a later time. In fact, a common practice among system administrators involves archiving old backups. As you back up necessary information on a periodic basis, the older backups are no longer significant or entirely useful. But there is a chance, however slim, that you could need older versions of these resources at some time.
We have briefly mentioned a list of possible resources that are prime candidates for archiving, namely old or outdated log files, configuration files, source code for special applications, and personal data. This list is just the tip of the iceberg; as you look around, you will find other files that can safely be archived as well.
How do you determine what to archive? Unfortunately, there is no single answer that will cover every possible scenario and environment. But, here are a few examples that illustrate some of the resources you should or should not archive. For example, corporate entities may be required by law to archive financial, legal, or medical records for a particular period of time; a copy of the corporate financial accounts can be archived at a particular date to read-only media. On the other hand, you may not want to archive source code to applications that you can easily download from online repositories, such as Perl or Python (see Chapter 12 for more information on managing your system with Perl).
However, a reasonably efficient overall technique is to use the find command to determine all files that have not been modified in more than six months and archive them, deleting them from the system in the process, if necessary. If you use this automated technique, along with a manual list of special files, you will be assured of archiving a broad range of data and resources.
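The archive-then-delete technique can be sketched as follows. This demo works on a scratch directory and fakes an old timestamp with touch -d; the -mtime, -delete, and tar -T switches assume GNU findutils, coreutils, and tar:

```shell
# A scratch directory with one "old" and one "new" file
mkdir -p /tmp/demo-old
echo "old" > /tmp/demo-old/old.log
touch -d "400 days ago" /tmp/demo-old/old.log
echo "new" > /tmp/demo-old/new.log

# Archive everything not modified in roughly six months (180+ days),
# feeding the file list from find to tar via -T -
find /tmp/demo-old -type f -mtime +180 -print \
    | tar -czPf /tmp/demo-old-archive.tar.gz -T -

# ...and then remove the archived originals to reclaim disk space
find /tmp/demo-old -type f -mtime +180 -delete
```

Only after verifying that the archive is readable should you run the deletion step; reversing the order risks losing files that never made it into the archive.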
Because backup and archival are similar and related processes, you can use much the same criteria used for assessing media for backups and apply them to archiving, as well; namely, the following:
The length of time you need to store the archive
The frequency with which you are likely to access the archive
Whether to create a permanent archive or a temporary one
The price you are willing to pay for the media
You should especially consider the first and last points in the preceding list. Because archives are typically stored for a longer period of time than up-to-date backups, you can have a number of archives on your hands at any given time. So cost is a very important factor. Optical media, such as CD-ROM and DVD, are ideal for the archiving process because they fit both of these criteria very well. However, you should feel free to look through the table of media in the backup section and decide for yourself what media best fits your needs and requirements.
You can use tar in much the same manner as before; the main difference is that you will be using it to archive data as opposed to backing it up.
Here is a slightly more advanced example that uses the find command in conjunction with the xargs command to pass a list of files that have not been accessed in six months or more from the users' home directories to tar for archival:
# find /home -atime +180 -print | \
    xargs tar -cvPhzf home-180-200403101335.tar.gz
The find command finds all files in /home that have not been accessed in 180 days or more, using the -atime switch. Those filenames are then passed to the xargs command via a pipe; notice the vertical bar (|) character. The xargs command invokes tar, with our standard set of switches, passing these file names as arguments. See Chapter 13 for more examples that illustrate find and xargs in action.
It's worth taking a little time to learn to use the find command well. With mastery of the find command's options, you can develop a very powerful and effective backup and archival strategy; see the manual page for find for more information on the various options.
For simplicity, we've exclusively used tar as the application of choice for both simple backups and archiving. However, you can use other useful applications, namely cpio and dump. The cpio archiving application allows you to easily specify disparate files and directories to back up or archive by naming them on the command line. On the other hand, dump makes it easy for you to back up entire filesystems, although the resulting file is far from portable, even across other UNIX platforms.
Early versions of tar had their share of limitations. For example:
They could not create archives that spanned multiple (tape) volumes.
They could not handle all types of files.
They could not compress the archives on the fly.
They could not handle bad areas on a tape.
Over time, tar has improved greatly, and these issues are no longer relevant. However, the limitations of early versions of tar were also the motivation behind the development of the cpio archiving tool. At this point in time, there are few differences in functionality between tar and cpio , except for the syntax, so you should use the one that you prefer.
Let's look at an example of cpio in action:
# find /etc -print | cpio -vo > etc-200403101335.cpio
cpio: /etc: truncating inode number
/etc
cpio: /etc/sysconfig: truncating inode number
/etc/sysconfig
cpio: /etc/sysconfig/network-scripts: truncating inode number
/etc/sysconfig/network-scripts
cpio: /etc/sysconfig/network-scripts/ifup-aliases: truncating inode number
/etc/sysconfig/network-scripts/ifup-aliases
...
Here, the find command finds a list of all the files and subdirectories in /etc , and they are passed to the cpio command, which creates the archive on the filesystem. The -o switch asks cpio to create the archive, while the -v (verbose) switch gives you verbose output (just as it did for the tar command).
As you can see, passing filenames in this manner to cpio allows you great flexibility in selecting what files to archive and what not to archive. For example, take a look at this:
# find /home -name '*.p[lm]' -print | cpio -vo > perl-200403151335.cpio
cpio: /home/dwilliams/lib/CGI/Lite.pm: truncating inode number
/home/dwilliams/lib/CGI/Lite.pm
/home/gundavaram/bin/ckmail.pl
cpio: /home/mellett/lib/Tie/Handle.pm: truncating inode number
/home/mellett/lib/Tie/Handle.pm
...
Here, we ask find to locate all Perl application files that end with either a .pl or .pm extension, and send that list to cpio .
You've learned how to create cpio archives, but not yet how to restore them. In fact, it is a rather easy process. Here's an example that restores data from a cpio file:
# cpio -vi < perl-200403151335.cpio
cpio: /home/dwilliams/lib/CGI/Lite.pm not created: newer or same age version exists
cpio: /home/gundavaram/bin/ckmail.pl not created: newer or same age version exists
cpio: /home/mellett/lib/Tie/Handle.pm not created: newer or same age version exists
11 blocks
As you can see, by default cpio does not overwrite an existing file that is newer than, or the same age as, the archived copy. If the existing file is older than the archived file, cpio overwrites that file with the archived version. You can change this behavior with the --unconditional switch:
# cpio -vi --unconditional < perl-200403151335.cpio
Or better yet, if you want to simply restore the archive to another location, use the following syntax:
# cpio -vi -d --no-absolute-filenames --unconditional --force-local < perl-200403151335.cpio
What if you don't want to extract all of the files, but only a single file? First, you can use the --list option to get a list of all the files in the archive:
# cpio -vi --list < perl-200403151335.cpio
-rw-rw-r--   1 dwillia  eng         0 Mar 14 03:07 /home/dwilliams/lib/CGI/Lite.pm
-rwx------   1 gundava  eng      5112 Mar 12 03:35 /home/gundavaram/bin/ckmail.pl
-rw-rw-r--   1 mellett  eng         0 Mar 14 03:07 /home/mellett/lib/Tie/Handle.pm
11 blocks
Then, to extract a specific file, pass its name as a pattern on the command line (if you have many files to pull out, you can instead list the patterns in a text file and point cpio at it with the -E switch):

# cpio -vi '/home/gundavaram/bin/ckmail.pl' < perl-200403151335.cpio
We hope this brief introduction to cpio has given you a chance to learn more about this powerful archival solution. In summary, tar and cpio have similar functionality, although the manner in which you specify the files to archive differs. The final backup application to be discussed in this chapter is dump, which is quite different from either tar or cpio.
What if you want to back up an entire partition or filesystem incrementally, complete with the correct ownership, permissions, and creation and modification dates? You could certainly use tar or cpio, but they would be far from efficient, because you would have to determine yourself what has been modified and when. dump, by contrast, understands the layout of the filesystem, though it works only with certain filesystem types, notably ext2 and ext3. Unfortunately, the dump format is not at all portable: with tar and cpio, there is a good chance that another UNIX system can read and process their archives, but with dump there is little chance at all. As a result, dump is sensible for backups, but not for archiving files.
In addition, dump is designed to perform incremental backups, supporting up to ten backup levels. When you use dump, you assign each backup a level from 0 to 9. The strategy is to perform a full, or complete, backup first; this is referred to as a level 0 backup. Then, periodically, you perform incremental backups at higher levels. If you are curious as to how dump keeps track of modified files, take a look at the /etc/dumpdates configuration file, which looks like the following:
/dev/hda2 0 Wed Mar 10 13:30:00 2004
/dev/hda2 9 Thu Mar 11 13:30:00 2004
An incremental backup at a given level archives everything modified since the most recent backup at any lower level. For example, if you later back up at level 9, dump archives the files that have been modified since the last backup at level 8 or lower. With this type of strategy in place, you can recover the entire system from just a few sets of backups: the full backup plus a series of incremental backups.
Let's look at an example:
# dump -0 /dev/hda2 -u -f /data/backup/hda2-200403101430-0.dmp
  DUMP: Date of this level 0 dump: Wed Mar 10 14:30:00 2004
  DUMP: Dumping /dev/hda2 (/home) to /data/backup/hda2-200403101430-0.dmp
  DUMP: Added inode 8 to exclude list (journal inode)
  DUMP: Added inode 7 to exclude list (resize inode)
  DUMP: Label: /home
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 40720 tape blocks.
  DUMP: Volume 1 started with block 1 at: Wed Mar 10 14:30:03 2004
  DUMP: dumping (Pass III) [directories]
  DUMP: dumping (Pass IV) [regular files]
  DUMP: Closing /data/backup/hda2-200403101430-0.dmp
  DUMP: Volume 1 completed at: Wed Mar 10 14:30:36 2004
  DUMP: Volume 1 43190 tape blocks (42.18MB)
  DUMP: Volume 1 took 0:00:30
  DUMP: Volume 1 transfer rate: 1439 kB/s
  DUMP: 43190 tape blocks (42.18MB) on 1 volume(s)
  DUMP: finished in 30 seconds, throughput 1439 kBytes/sec
  DUMP: Date of this level 0 dump: Wed Mar 10 14:30:00 2004
  DUMP: Date this dump completed: Wed Mar 10 14:30:36 2004
  DUMP: Average transfer rate: 1439 kB/s
  DUMP: DUMP IS DONE
This performs a level 0 backup of the /dev/hda2 partition to the specified file. The -u option is very important because it asks dump to update the /etc/dumpdates file. It's also important to remember to label your backups properly, whether you are archiving your data onto hard disk or onto other media; this allows you to find your backups quickly if disaster ever strikes.
Like tar, dump supports compression on the fly. By specifying either the -j (bzlib) switch or the -z (zlib) switch, followed by a compression level from 1 to 9, you can enable compression. Here is an example:
# dump -0 /dev/hda2 -u -z9 -f /data/backup/hda2-200403111430-0.dmp
  DUMP: Date of this level 0 dump: Thu Mar 11 14:30:00 2004
  DUMP: Dumping /dev/hda2 (/home) to /data/backup/hda2-200403111430-0.dmp
  DUMP: Added inode 8 to exclude list (journal inode)
  DUMP: Added inode 7 to exclude list (resize inode)
  DUMP: Compressing output at compression level 9 (zlib)
  ...
  DUMP: Wrote 44940kB uncompressed, 29710kB compressed, 1.513:1
  DUMP: DUMP IS DONE
After a full backup, you should typically perform a series of incremental backups, like so:
# dump -9 /dev/hda2 -u -z9 -f /data/backup/hda2-200403121430-9.dmp
  DUMP: Date of this level 9 dump: Fri Mar 12 14:30:00 2004
  DUMP: Date of last level 0 dump: Thu Mar 11 14:30:00 2004
  DUMP: Dumping /dev/hda2 (/home) to /data/backup/hda2-200403121430-9.dmp
  ...
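A schedule like this is usually driven from a script or cron job. The sketch below picks the dump level from the day of the week; the device and backup directory are placeholders, and the dump command is only echoed here because a real run must be done as root:

```shell
# Hypothetical weekly schedule: a full (level 0) dump on Sundays and
# level 9 incrementals on all other days.
case $(date +%u) in          # %u: day of week, 1 = Monday .. 7 = Sunday
    7) level=0 ;;            # weekly full backup
    *) level=9 ;;            # daily incremental
esac

# Timestamped, level-tagged filename, matching the naming convention
# used in the examples above.
stamp=$(date +%Y%m%d%H%M)
echo "would run: dump -$level /dev/hda2 -u -z9 -f /data/backup/hda2-$stamp-$level.dmp"
```

Encoding the device, timestamp, and level in the filename makes it easy to find the right dump later, in line with the labeling advice above.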
You can do a partial restore, selecting a set of specific files, or a full restore. To do a full restore, however, you first need to create and mount the target filesystem, like so:
# mke2fs /dev/hda2
# mount /dev/hda2 /home
# cd /home
# restore -rvf /data/backup/hda2-200403101430-0.dmp
Verify tape and initialize maps
Input is from file/pipe
Input block size is 32
Dump   date: Wed Mar 10 14:30:00 2004
Dumped from: the epoch
Level 0 dump of /home on localhost.localdomain:/dev/hda2
Label: /home
Begin level 0 restore
Initialize symbol table.
Extract directories from tape
Calculate extraction list.
Make node ./lost+found
Make node ./dwilliams
Make node ./dwilliams/.gnome2
Make node ./dwilliams/.gnome2/accels
Make node ./dwilliams/.gnome2/share
...
This restores the dumped archive. The -r switch asks restore to rebuild the filesystem, the -v switch turns on verbose messages, and the -f option specifies the dump file. In turn, you could restore the incremental backups in much the same manner. You need to make sure that you restore the dumps in the order they were created, starting with the lowest-level dump, typically level 0.
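If you keep to the hda2-<timestamp>-<level>.dmp naming convention used in this section, the correct replay order can be derived from the filenames themselves. The sketch below builds a sample set of names in a temporary directory and only echoes the restore commands, since a real run needs root and a freshly prepared filesystem:

```shell
# Hypothetical dump files following the hda2-<timestamp>-<level>.dmp
# convention; the names and dates are illustrative.
backups=$(mktemp -d)
touch "$backups/hda2-200403121430-9.dmp" \
      "$backups/hda2-200403101430-0.dmp" \
      "$backups/hda2-200403111430-9.dmp"

cd "$backups"
# Sort on the level field first, then on the timestamp within a level,
# so the level 0 dump is restored before any incrementals.
ordered=$(ls hda2-*.dmp | sort -t- -k3,3 -k2,2)
for f in $ordered; do
    echo "would run: restore -rvf $f"
done
```

With this ordering, the full backup is always replayed first, followed by the incrementals from oldest to newest.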
In summary, backing up and archiving data and resources is critical for recovering quickly from possible disasters. Linux comes with a vast assortment of applications and tools to make this a convenient process. There are tools for mirroring partitions and files, for performing network backups (such as Amanda), and for backing up to CD-ROM. Once you decide on a backup and archival strategy, you should investigate these tools to determine which application is best suited to your requirements.
Next, let s shift gears a bit and talk about building applications from source code. Say, for example, you find a really powerful backup or archival application. After looking at its documentation, you realize, to your disappointment, that it does not come with a pre-built binary package for Linux, but only source code. How do you go about building it? Proceed to the next section to find out more.