Recipe 10.7. Backing Up Files

Credit: Anand Pillai, Tiago Henriques, Mario Ruggier

Problem

You want to make frequent backup copies of all files you have modified within a directory tree, so that further changes won't accidentally obliterate some of your editing.

Solution

Version-control systems, such as RCS, CVS, and SVN, are very powerful and useful, but sometimes a simple script that you can easily edit and customize can be even handier. The following script checks for new files to back up in a tree that you specify. Run the script periodically to keep your backup copies up to date.

import sys, os, shutil, filecmp

MAXVERSIONS = 100

def backup(tree_top, bakdir_name='bakdir'):
    for dir, subdirs, files in os.walk(tree_top):
        # ensure each directory has a subdir called bakdir
        backup_dir = os.path.join(dir, bakdir_name)
        if not os.path.exists(backup_dir):
            os.makedirs(backup_dir)
        # stop any recursing into the backup directories
        subdirs[:] = [d for d in subdirs if d != bakdir_name]
        for file in files:
            filepath = os.path.join(dir, file)
            destpath = os.path.join(backup_dir, file)
            # check existence of previous versions
            for index in range(MAXVERSIONS):
                backup = '%s.%2.2d' % (destpath, index)
                if not os.path.exists(backup): break
            if index > 0:
                # no need to backup if file and last version are identical
                old_backup = '%s.%2.2d' % (destpath, index-1)
                try:
                    if os.path.isfile(old_backup
                       ) and filecmp.cmp(filepath, old_backup, shallow=False):
                        continue
                except OSError:
                    pass
            try:
                shutil.copy(filepath, backup)
            except OSError:
                pass

if __name__ == '__main__':
    # run backup on the specified directory (default: the current directory)
    try: tree_top = sys.argv[1]
    except IndexError: tree_top = '.'
    backup(tree_top)
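The numbering scheme the script relies on may be easier to follow in isolation. The sketch below is not part of the recipe: the helper name next_version_path is hypothetical, and it only mirrors the inner loop that looks for the first unused two-digit suffix, starting at .00.

```python
import os
import tempfile

def next_version_path(destpath, max_versions=100):
    """Return (index, path) for the first unused versioned backup name.

    Hypothetical helper mirroring the recipe's inner loop. If every slot
    is already taken, the last slot is returned (and would be overwritten).
    """
    for index in range(max_versions):
        candidate = '%s.%2.2d' % (destpath, index)
        if not os.path.exists(candidate):
            break
    return index, candidate

if __name__ == '__main__':
    bakdir = tempfile.mkdtemp()
    destpath = os.path.join(bakdir, 'notes.txt')
    print(next_version_path(destpath))    # first free slot ends in .00
    open(destpath + '.00', 'w').close()   # pretend one backup already exists
    print(next_version_path(destpath))    # next free slot ends in .01
```

An index greater than zero means an earlier version exists, which is the case in which the script compares the file against that version before copying.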

Discussion

Although version-control systems are more powerful, this script can be useful in development work. I often customize it, for example, to keep backups only of files with certain extensions (or, when that's handier, of all files except those with certain extensions); it suffices to add an appropriate test at the very start of the for file in files loop, such as:

        name, ext = os.path.splitext(file)
        if ext not in ('.py', '.txt', '.doc'): continue

This snippet first uses function splitext from the standard library module os.path to extract the file extension (starting with a period) into local variable ext, then conditionally executes statement continue, which passes to the next leg of the loop, unless the extension is one of a few that happen to be the ones of interest in the current subtree.
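The opposite variant mentioned above (all files except those with certain extensions) can be handled by the same kind of test. The following is a minimal sketch, not the author's code: the function name wanted and its include and exclude parameters are assumptions made for illustration, as is the choice to compare extensions case-insensitively.

```python
import os

def wanted(filename, include=None, exclude=None):
    """Decide from its extension whether a file should be backed up.

    `include` and `exclude` are hypothetical parameters holding extensions
    such as ('.py', '.txt'). If `include` is given, only those extensions
    pass; otherwise everything passes except the extensions in `exclude`.
    """
    # Lowercasing is an assumption of this sketch, so '.PY' matches '.py'.
    ext = os.path.splitext(filename)[1].lower()
    if include is not None:
        return ext in include
    if exclude is not None:
        return ext not in exclude
    return True
```

Inside the for file in files loop, a line such as `if not wanted(file, exclude=('.pyc', '.o')): continue` would then skip the unwanted files.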

Other potentially useful variants include backing files up to some other subtree (potentially on a removable drive, which has some clear advantages for backup purposes) rather than the current one, compressing the files that are being backed up (look at standard library module gzip for this purpose), and more refined ones yet. However, rather than complicating function backup by offering all of these variants as options, I prefer to copy the entire script to the root of each of the various subtrees of interest, and customize it with a little simple editing. While this strategy would be a very bad one for any kind of complicated, highly reusable production-level code, it is reasonable for a simple, straightforward system administration utility such as the one in this recipe.
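As a rough illustration of two of these variants, the sketch below maps a file to the same relative location under a separate backup root and writes a gzip-compressed copy there. The names mirrored_path and gzip_copy, and the dest_root parameter, are hypothetical and do not appear in the recipe.

```python
import gzip
import os
import shutil

def mirrored_path(tree_top, dest_root, filepath):
    """Map a file under tree_top to the same relative path under dest_root."""
    relative = os.path.relpath(filepath, tree_top)
    return os.path.join(dest_root, relative)

def gzip_copy(filepath, destpath):
    """Write a gzip-compressed copy of filepath to destpath + '.gz'."""
    dest_dir = os.path.dirname(destpath)
    if dest_dir and not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    compressed = destpath + '.gz'
    # Stream the data so that large files are not read into memory at once.
    with open(filepath, 'rb') as src:
        with gzip.open(compressed, 'wb') as dst:
            shutil.copyfileobj(src, dst)
    return compressed
```

Note that the recipe's filecmp.cmp test compares raw file contents, so it would need adapting before it could detect an unchanged file against a compressed backup.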

Worthy of note in this recipe's implementation is the use of function os.walk, a generator from the standard Python library's module os, which makes it very simple to iterate over all or most of a filesystem subtree, with no need for such subtleties as recursion or callbacks, just a straightforward for statement. To avoid backing up the backups, this recipe uses one advanced feature of os.walk: the second one of the three values that os.walk yields at each step through the loop is a list of subdirectories of the current directory. We can modify this list in place, removing some of the subdirectory names it contains. When we perform such an in-place modification, os.walk does not recurse through the subdirectories whose names we removed. The following steps deal only with the subdirectories whose names are left in. This subtle but useful feature of os.walk is one good example of how a generator can receive information from the code that's iterating on it, to affect details of the iteration being performed.
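The pruning technique can be tried on its own. The sketch below is illustrative only (walk_without is a hypothetical name); it builds a small throwaway tree and shows that directories named bakdir are never visited. The technique depends on os.walk's default top-down order, in which a directory is yielded before its subdirectories.

```python
import os
import tempfile

def walk_without(tree_top, skip_name):
    """Yield each directory under tree_top, never descending into skip_name."""
    for dirpath, subdirs, files in os.walk(tree_top):
        # Slice assignment mutates the very list object that os.walk holds;
        # rebinding the name (subdirs = [...]) would not affect the walk.
        subdirs[:] = [d for d in subdirs if d != skip_name]
        yield dirpath

if __name__ == '__main__':
    top = tempfile.mkdtemp()
    os.makedirs(os.path.join(top, 'src', 'bakdir'))
    os.makedirs(os.path.join(top, 'bakdir', 'old'))
    for visited in walk_without(top, 'bakdir'):
        print(os.path.relpath(visited, top))
```

Only the top directory and src are visited; neither bakdir directory, nor anything beneath them, is reached.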

See Also

Documentation of standard library modules os, shutil, and gzip in the Library Reference and Python in a Nutshell.



Python Cookbook
ISBN: 0596007973
Year: 2004
Pages: 420