7.3. rdiff-backup

rdiff-backup is a program written in Python and C that uses the same rolling-checksum algorithm as rsync. Although rdiff-backup and rsync are similar and use the same algorithm, they share no code and must be installed separately. When backing up, both rsnapshot and rdiff-backup create a mirror of the source directory. For both, the current backup is simply a copy of the source, ready to be copied and verified like an ordinary directory. And both can be used over ssh in either push or pull mode.

The most important conceptual differences between rsync snapshots and rdiff-backup lie in how they store older backups and how they store file metadata. An rsync-snapshot system basically stores older backups as complete copies of the source. As mentioned earlier in the chapter, thanks to clever use of hard links, these copies are quick to create and usually take up far less disk space than unlinked copies would. However, every distinct version of every file in the backup is stored as a separate copy of that file. For instance, if you add one line to a file or change a file's permissions, the archive stores that entire file twice. This is especially troublesome with logfiles, which grow by a small amount very frequently.

rdiff-backup, on the other hand, does not keep complete copies of older files in the backup archive. Instead, it stores only the compressed differences between current files and their older versions, called diffs or deltas. For a logfile, rdiff-backup would not keep a separate copy of the older, slightly shorter log. Instead, it would save to the archive a delta file containing the information "the older version is the current version without the last few lines." These deltas are often much smaller than a full copy of the older file. When a file has changed completely, the delta is about the same size as the older version, although it is still compressed.
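To make the idea of a reverse delta concrete, here is a minimal Python sketch. It works on lists of text lines using the standard difflib module, which is an assumption made purely for readability: rdiff-backup itself uses librsync's binary, rsync-style deltas, and the function names and op format below are hypothetical. The delta records how to rebuild the older version from the current one, so for a logfile that has only grown, it shrinks to a single "copy the first N lines" instruction.

```python
# Sketch of a *reverse* delta: instructions that rebuild an older file
# version from the newer one. Illustrative only: rdiff-backup uses
# librsync's binary deltas, not difflib or this hypothetical op format.
import difflib
import zlib


def make_reverse_delta(newer, older):
    """Return ops that rebuild `older` (a list of lines) from `newer`.

    ("copy", i, j)   -> reuse newer[i:j]
    ("insert", rows) -> literal lines that exist only in the older version
    """
    ops = []
    matcher = difflib.SequenceMatcher(a=newer, b=older, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", older[j1:j2]))
        # tag == "delete": lines found only in the newer version are simply
        # not copied, so the delta needs no entry for them.
    return ops


def apply_reverse_delta(newer, ops):
    """Rebuild the older version by replaying the ops against `newer`."""
    older = []
    for op in ops:
        if op[0] == "copy":
            older.extend(newer[op[1]:op[2]])
        else:
            older.extend(op[1])
    return older


def compressed_size(obj):
    """Rough size in bytes after zlib compression, for comparison only."""
    return len(zlib.compress(repr(obj).encode()))


if __name__ == "__main__":
    # A logfile that grew by three lines since the previous backup.
    old_log = [f"2024-05-01 12:00:{i % 60:02d} event {i}\n" for i in range(1000)]
    new_log = old_log + ["late event A\n", "late event B\n", "late event C\n"]

    delta = make_reverse_delta(new_log, old_log)
    print(delta)  # one copy instruction covering the first 1000 lines
    assert apply_reverse_delta(new_log, delta) == old_log
    print("full old copy:", compressed_size(old_log), "bytes compressed")
    print("reverse delta:", compressed_size(delta), "bytes compressed")
```

difflib's matcher can take quadratic time in the worst case, so this sketch only suits small illustrative inputs; a rolling checksum is what lets rsync-style tools find matching blocks efficiently in large binary files.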
When an rdiff-backup archive holds multiple versions of a file, the program stores a series of deltas. Each one contains instructions for constructing an earlier version of the file from a later one. When restoring, rdiff-backup starts with the current version and applies the deltas one after another, from newest to oldest, until it reaches the requested version.

Besides storing older versions as deltas instead of copies, rdiff-backup also stores the (compressed) metadata of all files in the backup archive. Metadata is data associated with a file that describes the file's actual data; examples include ownership, permissions, modification time, and file length. This metadata takes up little space because it is generally very compressible. Newer versions go further and store only deltas of the metadata, for even more space efficiency.

At the cost of some disk space, storing metadata separately has several uses. First, data loss is avoided even if the destination filesystem does not support all the features of the source filesystem. For instance, ownership can be preserved even without root access, and Linux filesystems with symbolic links, device files, and ACLs can be backed up to a Windows filesystem. You don't have to examine the details of each filesystem to know that the backup will work. Second, with metadata stored separately, rdiff-backup is less disk-intensive on the backup server: when backing up, it does not need to traverse the mirror's directory structure to determine which files have changed. Third, metadata such as SHA-1 checksums can be used to verify the integrity of backups.

7.3.1. Advantages

Here are some advantages of using rdiff-backup instead of an rsync script or rsnapshot:
7.3.2. Disadvantages

Let's be honest: rdiff-backup has some disadvantages, too:
7.3.3. Quick Start

Here's a basic but complete example of how to use rdiff-backup to back up and restore a directory. Suppose the directory to be backed up is called <source>, and we want our archive directory to be called <destination>:

$ rdiff-backup source destination

This command backs up the <source> directory into <destination>. If you look in <destination>, you'll see that it is just like <source> but also contains a directory called <destination>/rdiff-backup-data, where the metadata and deltas are stored. The rdiff-backup-data directory is laid out in a fairly straightforward way (all information is either in possibly gzipped text files or in deltas readable with the rdiff utility), but we don't have the space to go into the data format here.

The first time you run this command, it creates the <destination> and <destination>/rdiff-backup-data directories. On subsequent runs, it sees that <destination> exists and makes an incremental backup instead. For daily backups, no special switches are necessary.

Suppose you accidentally delete the file <source>/foobar and want to restore it from backups. Either of these commands will do it:

$ cp -a destination/foobar source
$ rdiff-backup -r now destination/foobar source/foobar

The first command works because <destination>/foobar is a mirror of <source>/foobar, so you can use cp or any other utility to restore it. The second command uses the -r switch, which tells rdiff-backup to enter restore mode and restore the specified file as it was at the given time; the last argument is the path of the file to be restored. In the example, now is specified, meaning the most recent version of the file. rdiff-backup accepts a large variety of time formats.

Now suppose you realize you deleted the important file <source>/foobar a week ago and want to restore it. You can't use cp to restore it because the file is no longer present in <destination> in its original form (in this case, it is stored gzipped in the <destination>/rdiff-backup-data directory).
However, the -r syntax still works; you just give it 7D for seven days:

$ rdiff-backup -r 7D destination/foobar source/foobar

Finally, suppose that the <destination> directory is getting too big and you need to delete older backups to save disk space. This command deletes backup information more than one year old:

$ rdiff-backup --remove-older-than 1Y destination

Just like rsync, rdiff-backup allows the source or destination directory (or both) to be on a remote computer. For example, to back up the local directory <source> to the <destination> directory on the computer host.net, use the command:

$ rdiff-backup source user@host.net::destination

This works as long as rdiff-backup is installed on both computers and host.net accepts ssh connections. The earlier commands also work if user@host.net::<destination> is substituted for <destination>.

7.3.4. Windows, Mac OS X, and the Future

Although rdiff-backup was originally developed under Linux for Unix-style systems, newer versions have features that are useful to Windows and Mac users. For instance, rdiff-backup can back up case-sensitive filesystems, as well as files whose names contain colons (:), to Windows filesystems. rdiff-backup also supports Mac resource forks and Finder information, and it is easy to install on Mac OS X because it is included in the Fink distribution. Unfortunately, rdiff-backup is a bit trickier to install natively under Windows; currently, cygwin is probably the easiest way.

Future development of rdiff-backup may consist mostly of making sure that newer features, such as full Mac OS X support, are as stable as the core Unix support, and of adding support for new filesystem features as they emerge. For more information on rdiff-backup, including full documentation and a pointer to the mailing list, see the rdiff-backup project home page at http://rdiff-backup.nongnu.org/.
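The interval strings passed to -r and --remove-older-than in the Quick Start (7D, 1Y) follow a simple scheme described in the rdiff-backup man page: a number followed by s, m, h, D, W, M, or Y, with pairs allowed to be concatenated, a month always counted as 30 days and a year as 365 days. The Python sketch below converts such strings into seconds and into a cutoff time. It is an illustration, not rdiff-backup's own parser; it ignores the tool's other time formats (such as now or absolute dates), and the older_than helper is hypothetical.

```python
# Sketch: convert rdiff-backup interval strings such as "7D" or "1Y" into
# seconds, following the man page rules (units s, m, h, D, W, M, Y; pairs
# may be concatenated; month = 30 days, year = 365 days). Not rdiff-backup's
# own parser; other time formats ("now", absolute dates) are not handled.
import re
import time
from typing import Iterable, List, Optional

UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 3600,
    "D": 86400,
    "W": 7 * 86400,
    "M": 30 * 86400,
    "Y": 365 * 86400,
}
_INTERVAL = re.compile(r"(?:\d+[smhDWMY])+")  # whole string must be pairs
_PAIR = re.compile(r"(\d+)([smhDWMY])")       # one number/unit pair


def interval_to_seconds(spec: str) -> int:
    """Length in seconds of an interval string like '7D' or '1h78m'."""
    if not _INTERVAL.fullmatch(spec):
        raise ValueError(f"not an interval string: {spec!r}")
    return sum(int(n) * UNIT_SECONDS[unit] for n, unit in _PAIR.findall(spec))


def cutoff_time(spec: str, now: Optional[float] = None) -> float:
    """The moment that lies `spec` before `now` (default: current time)."""
    if now is None:
        now = time.time()
    return now - interval_to_seconds(spec)


def older_than(backup_times: Iterable[float], spec: str,
               now: Optional[float] = None) -> List[float]:
    """Hypothetical helper: backup timestamps that fall before the cutoff,
    i.e. roughly which older increments --remove-older-than would target."""
    limit = cutoff_time(spec, now)
    return [t for t in backup_times if t < limit]


if __name__ == "__main__":
    for spec in ("7D", "1Y", "3W2D10h7s"):
        print(spec, "=", interval_to_seconds(spec), "seconds")
```

Requiring the whole string to match before summing the pairs means a typo such as 7d (lowercase d is not a unit) raises an error instead of being silently ignored.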