Securely Transfer and Back Up Files


rsync -v

rsync is one of the coolest, most useful programs ever invented, and many people (like me!) rely on it every day. What does it do? Its uses are myriad (here we go again into "you could write a book about this command!"), but let's focus on one very powerful, necessary feature: its capability to back up files effectively and securely, with a minimum of network traffic.

Let's say you intend to back up 2GB of files every night from a machine named coleridge (username: sam) to another computer named wordsworth (username: will). Without rsync, you're looking at a transfer of 2GB every single night, a substantial amount of traffic, even on a fast network connection. With rsync, however, you might be looking at a transfer that will take a few moments at most. Why? Because when rsync backs up those 2GB, it transfers only the differences between all the files that make up those 2GB of data. If only a few hundred kilobytes changed in the past 24 hours, that's all that rsync transfers. If instead it was 100MB, that's what rsync copies over. Either way, it's much less than 2GB.

Here's a command that, run from coleridge, transfers the entire content of the documents directory to a backup drive on wordsworth. Look at the command, look at the results, and then you can walk through what those options mean (the command is given first with long options instead of single letters for readability, and then with single letters, if available, for comparison).


$ rsync --verbose --progress --stats --recursive --times --perms --links --compress --rsh=ssh --delete /home/sam/documents/ will@wordsworth:/media/backup/documents
$ rsync -v --progress --stats -r -t -p -l -z -e ssh --delete /home/sam/documents/ will@wordsworth:/media/backup/documents


Of course, you could also run the command this way, if you wanted to combine all the options:


$ rsync -vrtplze ssh --progress --stats --delete /home/sam/documents/ will@wordsworth:/media/backup/documents


Upon running rsync using any of the methods listed, you'd see something like this:

building file list ... 107805 files to consider
deleting clientele/Linux_Magazine/do_it_yourself/13/gantt_chart.txt~
deleting Security/diebold_voting/black_box_voting/bbv_chapter-9.pdf
deleting E-commerce/Books/20050811 eBay LIL ABNER DAILIES 6 1940.txt
Security/electronic_voting/diebold/black_box_voting/bbv_chapter-9.pdf
legal_issues/free_speech/Timeline A history of free speech.txt
E-commerce/2005/Books/20050811 eBay LIL ABNER DAILIES 6 1940.txt
connectivity/connectivity_info.txt
[Results greatly truncated for length]
Number of files: 107805
Number of files transferred: 120
Total file size: 6702042249 bytes
Total transferred file size: 15337159 bytes
File list size: 2344115
Total bytes sent: 2345101
Total bytes received: 986
sent 2345101 bytes  received 986 bytes  7507.48 bytes/sec
total size is 6702042249  speedup is 2856.69


Take a look at those results. rsync first builds a list of all files that it must consider (107,805 in this case) and then deletes any files on the target (wordsworth) that no longer exist on the source (coleridge). In this example, three files are deleted: a backup file (the ~ is a giveaway on that one) from an article for Linux Magazine, a PDF on electronic voting, and then a text receipt for a purchased book.

After deleting files, rsync copies over any new files in full and, for files that already exist on the target, just the parts that have changed, which is part of what makes rsync so slick. In this case, four of the copied files are shown (the listing is truncated). It turns out that the PDF was actually moved to a new subdirectory, but to rsync it's a new file, so it's copied over in its entirety.

The same is true for the text receipt. The A history of free speech.txt file is an entirely new file, so it's copied over to wordsworth as well.

After listing the changes it made, rsync gives you some information about the transfer as a whole. 120 files were transferred, amounting to 15337159 bytes (about 14.6MB) out of 6702042249 bytes (around 6.2GB). Other data points are contained in the summation, but those are the key ones.
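The summary figures can be checked with a little arithmetic. The following is only a sketch: it assumes awk is available and that the speedup figure is the total file size divided by the bytes sent plus the bytes received, which is consistent with the numbers rsync printed in this run.

```shell
#!/bin/sh
# Figures copied from the rsync summary shown earlier in this section.
total_size=6702042249      # "Total file size"
transferred=15337159       # "Total transferred file size"
sent=2345101               # "Total bytes sent"
received=986               # "Total bytes received"

# Speedup: size of the whole tree relative to what actually crossed the wire.
speedup=$(awk -v t="$total_size" -v s="$sent" -v r="$received" \
    'BEGIN { printf "%.2f", t / (s + r) }')
echo "speedup: $speedup"

# Convert the byte counts to binary megabytes and gigabytes.
awk -v x="$transferred" 'BEGIN { printf "transferred: %.1f MB\n", x / (1024 * 1024) }'
awk -v t="$total_size"  'BEGIN { printf "total: %.1f GB\n", t / (1024 * 1024 * 1024) }'
```

The first line of output should match the speedup of 2856.69 that rsync reported.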

Now let's look at what you asked your computer to do. The head and tail of the command are easy to understand: the command rsync at the start, then options, and then the source directory you're copying from (/home/sam/documents/, found on coleridge), followed by the target directory you're copying to (/media/backup/documents, found on wordsworth). Before going on to examine the options, you need to focus on the way the source and target directories are designated because there's a catch in there that will really hurt if you don't watch it.

You want to copy the contents of the documents directory found on coleridge, but not the directory itself, and that's why you use documents/ and not documents. The slash after documents in /home/sam/documents/ tells rsync that you want to copy the contents of that directory into the documents directory found on wordsworth; if you instead used documents, you'd copy the directory and its contents, resulting in /media/backup/documents/documents on wordsworth.

Note

The slash is only important on the source directory; it doesn't matter whether you use a slash on the target directory.
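If you'd like to see the trailing-slash rule in action without touching a real backup, here is a small local experiment. It is a sketch meant to be saved and run as a script; the scratch directory and the names a and b are made up for the demonstration, and no network is involved.

```shell
#!/bin/sh
# Local demonstration of the trailing-slash rule; nothing leaves this machine.
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

work=$(mktemp -d)                          # scratch directory (hypothetical layout)
mkdir -p "$work/src/documents" "$work/a" "$work/b"
echo "hello" > "$work/src/documents/note.txt"

# With the slash: the contents of documents land in a/documents.
rsync -r "$work/src/documents/" "$work/a/documents"

# Without the slash: the directory itself is copied, giving b/documents/documents.
rsync -r "$work/src/documents" "$work/b/documents"

# List what ended up where, relative to the scratch directory.
(cd "$work" && find a b -type f)
```

The listing should show note.txt directly under a/documents, but one level deeper, under b/documents/documents, for the run without the slash.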


There's one option that wasn't listed previously but is still a good idea to include when you're figuring out how your rsync command will be structured: -n (or --dry-run). If you include that option, rsync runs, but doesn't actually delete or copy anything. This can be a lifesaver if your choices would have resulted in the deletion of important files. Before committing to your rsync command, especially if you're including the --delete option, do yourself a favor and perform a dry run first!
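To get a feel for what a dry run does and doesn't do, you can try it on throwaway directories first. This sketch is meant to be saved and run as a script; the directory and file names are invented for the demonstration.

```shell
#!/bin/sh
# Local demonstration of -n (--dry-run) combined with --delete.
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

work=$(mktemp -d)                              # scratch directory (hypothetical)
mkdir -p "$work/src" "$work/backup"
echo "keep me" > "$work/src/current.txt"
echo "old"     > "$work/backup/stale.txt"      # exists only on the target

# -n reports what --delete would remove and what would be copied,
# but changes nothing on disk.
rsync -n -v -r --delete "$work/src/" "$work/backup"

# The stale file is still there, and current.txt has not been copied yet.
ls "$work/backup"
```

Run the same command again without -n, and only then is stale.txt deleted and current.txt copied.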

Now on to the options you used. The -v (or --verbose) option, coupled with --progress, orders rsync to tell you in detail what it's doing at all times. You saw that in the results shown earlier in this section, in which rsync tells you what it's deleting and what it's copying. If you're running rsync via an automated script, you don't need this option, although it doesn't hurt; if you're running rsync interactively, this is an incredibly useful display of information to have in front of you because you can see what's happening.

The metadata you saw at the end of rsync's results (the information about the number and size of files transferred, as well as other interesting data) appeared because you included the --stats option. Again, if you're scripting rsync, this isn't needed, but it's sure nice to see if you're running the program manually.

You've seen the -r (or --recursive) option many other times with other commands, and it does here what it does everywhere else. Instead of stopping in the current directory, it tunnels down through all subdirectories, affecting everything in its path. Because you want to copy the entire documents directory and all of its contents, you want to use -r.

The -t (or --times) option makes rsync transfer the files' modification times along with the files. If you don't include this option, the copies on the target are stamped with the time of the transfer, so rsync's quick check (which compares size and modification time) can't recognize them as unchanged, and the next time you run the command every file is processed again. This is probably not the behavior you want, as it undercuts the features that make rsync so useful, so be sure to include -t.
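You can watch the effect of -t on a scratch copy. The sketch below is meant to be saved and run as a script; the directory names are invented, and the source file is deliberately given an old timestamp so the difference is easy to spot.

```shell
#!/bin/sh
# Local demonstration of -t (--times).
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

work=$(mktemp -d)                              # scratch directory (hypothetical)
mkdir -p "$work/src"
echo "data" > "$work/src/file.txt"
touch -t 200701011200 "$work/src/file.txt"     # give the source an old, known mtime

rsync -r    "$work/src/" "$work/no_times"      # copy is stamped with the current time
rsync -r -t "$work/src/" "$work/with_times"    # copy keeps the source's mtime

ls -l "$work/src/file.txt" "$work/no_times/file.txt" "$work/with_times/file.txt"
```

In the listing, the copy made with -t carries the same date as the source, while the copy made without it shows the time you ran the script.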

Permissions were discussed in Chapter 7, "Ownerships and Permissions," and here they reappear. The -p (or --perms) option tells rsync to update permissions on any files found on the target so they match what's on the source. It's part of making the backup as accurate as possible, so it's a good idea.
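Here is a small local experiment showing -p updating the permissions of a file that already exists on the target. It is a sketch meant to be saved and run as a script, with invented directory names.

```shell
#!/bin/sh
# Local demonstration of -p (--perms) on a file that is already backed up.
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

work=$(mktemp -d)                                # scratch directory (hypothetical)
mkdir -p "$work/src"
echo "data" > "$work/src/file.txt"
chmod 644 "$work/src/file.txt"
rsync -r -t -p "$work/src/" "$work/backup"       # initial copy, mode 644

chmod 600 "$work/src/file.txt"                   # tighten permissions on the source

rsync -r -t "$work/src/" "$work/backup"          # without -p the old mode stays
before=$(ls -l "$work/backup/file.txt" | cut -c1-10)

rsync -r -t -p "$work/src/" "$work/backup"       # with -p the mode is updated
after=$(ls -l "$work/backup/file.txt" | cut -c1-10)

echo "without -p: $before   with -p: $after"
```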

When a soft link is found on the source, the -l (or --links) option re-creates the link on the target. Instead of copying the actual file, which is obviously not what the creator of a soft link intended, the link to the file is copied over, again preserving the original state of the source.
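To confirm that -l re-creates a soft link rather than copying the file it points to, try it on a scratch directory. This sketch is meant to be saved and run as a script; the file and link names are invented.

```shell
#!/bin/sh
# Local demonstration of -l (--links).
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

work=$(mktemp -d)                              # scratch directory (hypothetical)
mkdir -p "$work/src"
echo "real content" > "$work/src/report.txt"
ln -s report.txt "$work/src/latest"            # relative soft link inside the source

rsync -r -l "$work/src/" "$work/backup"

# In the listing, latest should appear as a link pointing at report.txt.
ls -l "$work/backup"
```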

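Even over a fast connection, it's usually a good idea to use the -z (or --compress) option, as rsync then compresses file data in transit (traditionally with zlib, which uses the same deflate algorithm as gzip). On a slow connection, this option is all but mandatory; on a fast connection, it usually saves you that much more time, although on a very fast local network the CPU cost of compressing can cancel out the gain.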

In the name of security, you're using the -e ssh (or --rsh=ssh) option, which tells rsync to tunnel all of its traffic using ssh. Easy file transfers, and secure as well? Sign me up!
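The value given to -e is a whole command, so it can carry ssh options of its own. The sketch below wraps the backup in a shell function; the port number and key file are placeholders invented for illustration, not values from this section, so substitute your own.

```shell
#!/bin/sh
# Hypothetical values: this port and key file are placeholders only.
SSH_PORT=2222
SSH_KEY="$HOME/.ssh/backup_key"

# Everything inside the quotes after -e is the remote-shell command rsync runs.
backup_documents() {
    rsync -v -r -t -p -l -z --delete \
        -e "ssh -p $SSH_PORT -i $SSH_KEY" \
        /home/sam/documents/ will@wordsworth:/media/backup/documents
}

# Nothing runs until you call the function:
#   backup_documents
```

Defining the function is harmless; the transfer happens only when you call backup_documents, and adding -n to the rsync line first is a sensible precaution.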

Note

If you're using ssh, why didn't you have to provide a password? Because you used the technique displayed in "Securely Log In to Another Machine Without a Password" to remove the need to do so.


We've saved the most dangerous for last: --delete. If you're creating a mirror of your files, you obviously want that mirror to be as accurate as possible. That means deleted files on the source need to be deleted on the target as well. But that also means you can accidentally blow away stuff you wanted to keep. If you're going to use the --delete option (and you probably will), be sure to use the -n (or --dry-run) option that was discussed at the beginning of your look at rsync's options.

rsync has many other options and plenty of other ways to use the command (the man page identifies eight ways it can be used, which is impressive), but the setup discussed in this section will definitely get you started. Open up man rsync on your terminal, or search Google for "rsync tutorial," and you'll find a wealth of great information. Get to know rsync: When you yell out "Oh no!" upon deleting a file, but then follow it with "Whew! It's backed up using rsync!" you'll be glad you took the time to learn this incredibly versatile and useful command.

Tip

If you want to be really safe with your data, set up rsync to run with a regular cron job. For instance, create a file titled backup.sh (~/bin is a good place for it) and type in the command you've been using:



$ rsync --verbose --progress --stats --recursive --times --perms --links --compress --rsh=ssh --delete /home/sam/documents/ will@wordsworth:/media/backup/documents


Use chmod to make the file executable:

$ chmod 744 /home/scott/bin/backup.sh 


Then add the following lines to a file named cronfile (I put mine in my ~/bin directory as well):

# backup documents every morning at 3:05 am
05 03 * * * /home/scott/bin/backup.sh


The first line is a comment explaining the purpose of the job, and the second line tells cron to automatically run /home/scott/bin/backup.sh every night at 3:05 a.m.
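If the five time fields look cryptic, this little sketch splits the entry apart and labels each piece. The field order (minute, hour, day of month, month, day of week) is the standard crontab layout; the variable names are just for illustration.

```shell
#!/bin/sh
entry='05 03 * * * /home/scott/bin/backup.sh'

set -f            # turn off filename expansion so each * stays literal
set -- $entry     # split the entry on whitespace into $1 through $6
set +f

minute=$1; hour=$2; day_of_month=$3; month=$4; day_of_week=$5; job=$6

echo "minute:       $minute"
echo "hour:         $hour"
echo "day of month: $day_of_month"
echo "month:        $month"
echo "day of week:  $day_of_week"
echo "command:      $job"
```

An asterisk means "every," so this entry reads: at minute 05 of hour 03, on every day of every month, whatever the day of the week.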

Now add the job to cron:

$ crontab /home/scott/bin/cronfile 


Now you don't need to worry about your backups ever again. It's all automated for you; just make sure you leave your computers on overnight!

(For more on cron, see man cron or "Newbie: Intro to cron" at www.unixgeeks.org/security/newbie/unix/cron-1.html.)
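Because nobody is watching at 3:05 a.m., you may prefer a variant of backup.sh that drops --progress and keeps a log instead. What follows is one possible sketch, not the only way to do it: it writes such a script to ./backup.sh (or to the path in BACKUP_SCRIPT, a variable invented here), and the log location is likewise an assumption.

```shell
#!/bin/sh
# Write a cron-friendly variant of backup.sh. The file location and the log
# path are assumptions for illustration; the rsync command is the one used
# throughout this section, minus --progress.
script=${BACKUP_SCRIPT:-./backup.sh}

cat > "$script" <<'EOF'
#!/bin/sh
# Append everything rsync prints, plus its exit status, to a log file.
log="$HOME/backup.log"
{
    echo "=== backup started: $(date) ==="
    rsync --verbose --stats --recursive --times --perms --links --compress \
        --rsh=ssh --delete \
        /home/sam/documents/ will@wordsworth:/media/backup/documents
    echo "=== backup finished, rsync exit status: $? ==="
} >> "$log" 2>&1
EOF

chmod 744 "$script"

# Check the generated script for syntax errors without running it.
sh -n "$script" && echo "wrote $script"
```

Creating the file does not start a transfer. An exit status of 0 in the log means rsync completed without errors; anything else is worth a look.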



Linux Phrasebook
ISBN: 0672328380
Year: 2007
Pages: 288
