The next three sections conclude this chapter by exploring a handful of additional utilities for processing directories (a.k.a. "folders") on your computer with Python. They present directory copy, deletion, and comparison scripts that demonstrate system tools at work. All of these were born of necessity, are generally portable among all Python platforms, and illustrate Python development concepts along the way.
Some of these scripts do something too unusual for the visitor module's classes we applied in earlier sections of this chapter, and so require more custom solutions (e.g., we can't remove directories we intend to walk through). Most have platform-specific equivalents too (e.g., drag-and-drop copies), but the Python utilities shown here are portable, easily customized, callable from other scripts, and surprisingly fast.
5.6.1 A Python Tree Copy Script
My CD writer sometimes does weird things. In fact, copies of files with odd names can be totally botched on the CD, even though other files show up in one piece. That's not necessarily a show-stopper -- if just a few files are trashed in a big CD backup copy, I can always copy the offending files to floppies one at a time. Unfortunately, Windows drag-and-drop copies don't play nicely with such a CD: the copy operation stops and exits the moment the first bad file is encountered. You only get as many files as were copied up to the error, but no more.
There may be some magical Windows setting to work around this feature, but I gave up hunting for one as soon as I realized that it would be easier to code a copier in Python. The cpall.py script in Example 5-20 is one way to do it. With this script, I control what happens when bad files are found -- skipping over them with Python exception handlers, for instance. Moreover, this tool works with the same interface and effect on other platforms. It seems to me, at least, that a few minutes spent writing a portable and reusable Python script to meet a need is a better investment than looking for solutions that work on only one platform (if at all).
Example 5-20. PP2E\System\Filetools\cpall.py
#########################################################
# Usage: "python cpall.py dir1 dir2".
# Recursive copy of a directory tree.  Works like a
# unix "cp -r dirFrom/* dirTo" command, and assumes
# that dirFrom and dirTo are both directories.  Was
# written to get around fatal error messages under
# Windows drag-and-drop copies (the first bad file
# ends the entire copy operation immediately), but
# also allows you to customize copy operations.
# May need more on Unix--skip links, fifos, etc.
#########################################################

import os, sys
verbose = 0
dcount = fcount = 0
maxfileload = 100000
blksize = 1024 * 8

def cpfile(pathFrom, pathTo, maxfileload=maxfileload):
    """
    copy file pathFrom to pathTo, byte for byte
    """
    if os.path.getsize(pathFrom) <= maxfileload:
        bytesFrom = open(pathFrom, 'rb').read()   # read small file all at once
        open(pathTo, 'wb').write(bytesFrom)       # need b mode on Windows
    else:
        fileFrom = open(pathFrom, 'rb')           # read big files in chunks
        fileTo   = open(pathTo,   'wb')           # need b mode here too
        while 1:
            bytesFrom = fileFrom.read(blksize)    # get one block, less at end
            if not bytesFrom: break               # empty after last chunk
            fileTo.write(bytesFrom)

def cpall(dirFrom, dirTo):
    """
    copy contents of dirFrom and below to dirTo
    """
    global dcount, fcount
    for file in os.listdir(dirFrom):                      # for files/dirs here
        pathFrom = os.path.join(dirFrom, file)
        pathTo   = os.path.join(dirTo,   file)            # extend both paths
        if not os.path.isdir(pathFrom):                   # copy simple files
            try:
                if verbose > 1: print 'copying', pathFrom, 'to', pathTo
                cpfile(pathFrom, pathTo)
                fcount = fcount+1
            except:
                print 'Error copying', pathFrom, 'to', pathTo, '--skipped'
                print sys.exc_type, sys.exc_value
        else:
            if verbose: print 'copying dir', pathFrom, 'to', pathTo
            try:
                os.mkdir(pathTo)                          # make new subdir
                cpall(pathFrom, pathTo)                   # recur into subdirs
                dcount = dcount+1
            except:
                print 'Error creating', pathTo, '--skipped'
                print sys.exc_type, sys.exc_value

def getargs():
    try:
        dirFrom, dirTo = sys.argv[1:]
    except:
        print 'Use: cpall.py dirFrom dirTo'
    else:
        if not os.path.isdir(dirFrom):
            print 'Error: dirFrom is not a directory'
        elif not os.path.exists(dirTo):
            os.mkdir(dirTo)
            print 'Note: dirTo was created'
            return (dirFrom, dirTo)
        else:
            print 'Warning: dirTo already exists'
            if dirFrom == dirTo or (hasattr(os.path, 'samefile') and
                                    os.path.samefile(dirFrom, dirTo)):
                print 'Error: dirFrom same as dirTo'
            else:
                return (dirFrom, dirTo)

if __name__ == '__main__':
    import time
    dirstuple = getargs()
    if dirstuple:
        print 'Copying...'
        start = time.time()
        apply(cpall, dirstuple)
        print 'Copied', fcount, 'files,', dcount, 'directories',
        print 'in', time.time() - start, 'seconds'
This script implements its own recursive tree traversal logic, and keeps track of both the "from" and "to" directory paths as it goes. At every level, it copies over simple files, creates directories in the "to" path, and recurs into subdirectories with "from" and "to" paths extended by one level. There are other ways to code this task (e.g., other cpall variants on the book's CD change the working directory along the way with os.chdir calls), but extending paths on descent works well in practice.
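The path-extending recursion just described can be sketched compactly in current Python 3 (the listing above is Python 2). This is only an illustration, not the book's code: the name copy_tree is made up here, shutil.copyfile stands in for cpfile, the target directory is assumed to exist already, and error handling is left out so the recursion itself stays visible.

```python
import os
import shutil


def copy_tree(dir_from, dir_to):
    """Copy the contents of dir_from into the existing directory dir_to.

    Hypothetical helper for illustration; returns a (files, dirs) tuple
    counting what was copied.
    """
    files = dirs = 0
    for name in os.listdir(dir_from):
        path_from = os.path.join(dir_from, name)   # extend both paths
        path_to = os.path.join(dir_to, name)       # by the same name
        if os.path.isdir(path_from):
            os.mkdir(path_to)                      # mirror the subdirectory
            sub_files, sub_dirs = copy_tree(path_from, path_to)
            files += sub_files
            dirs += sub_dirs + 1                   # count this subdirectory too
        else:
            shutil.copyfile(path_from, path_to)    # plain byte-for-byte copy
            files += 1
    return files, dirs
```

Returning the counts instead of updating globals is a design choice of this sketch; it makes the function easier to call from other scripts and to test.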
Notice this script's reusable cpfile function -- just in case there are multigigabyte files in the tree to be copied, it uses a file's size to decide whether it should be read all at once or in chunks (remember, the file read method without arguments really loads the whole file into an in-memory string). Also note that this script creates the "to" directory if needed, but assumes it is empty when a copy starts up; be sure to remove the target directory before copying a new tree to its name (more on this in the next section).
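As a rough illustration of that size test, here is the same idea in Python 3. It is a sketch under assumptions rather than a replacement for cpfile: the names copy_file, MAX_FILE_LOAD, and BLOCK_SIZE are invented here (the two values are taken from Example 5-20), and the with statement, which closes both files explicitly, is a change from the listing.

```python
import os

MAX_FILE_LOAD = 100000      # same threshold as Example 5-20
BLOCK_SIZE = 1024 * 8       # same block size as Example 5-20


def copy_file(path_from, path_to,
              max_file_load=MAX_FILE_LOAD, block_size=BLOCK_SIZE):
    """Copy path_from to path_to byte for byte; return the bytes written.

    Hypothetical helper for illustration only.
    """
    small = os.path.getsize(path_from) <= max_file_load
    total = 0
    with open(path_from, 'rb') as file_from, open(path_to, 'wb') as file_to:
        if small:
            data = file_from.read()                  # small file: one read
            file_to.write(data)
            total = len(data)
        else:
            while True:
                block = file_from.read(block_size)   # a full block, less at end
                if not block:                        # empty bytes means end of file
                    break
                file_to.write(block)
                total += len(block)
    return total
```

Passing a small max_file_load forces the chunked branch, which is a convenient way to exercise both paths on the same test file.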
Here is a big book examples tree copy in action on Windows; pass in the name of the "from" and "to" directories to kick off the process, and run a rm shell command (or similar platform-specific tool) to delete the target directory first:
C:\temp>rm -rf cpexamples

C:\temp>python %X%\system\filetools\cpall.py examples cpexamples
Note: dirTo was created
Copying...
Copied 1356 files, 118 directories in 2.41999995708 seconds

C:\temp>fc /B examples\System\Filetools\cpall.py cpexamples\System\Filetools\cpall.py
Comparing files examples\System\Filetools\cpall.py and cpexamples\System\Filetools\cpall.py
FC: no differences encountered
This run copied a tree of 1356 files and 118 directories in 2.4 seconds on my 650 MHz Windows 98 laptop (the built-in time.time call can be used to query the system time in seconds). It runs a bit slower if programs like MS Word are open on the machine, and may run arbitrarily faster or slower for you. Still, this is at least as fast as the best drag-and-drop I've timed on Windows.
So how does this script work around bad files on a CD backup? The secret is that it catches and ignores file exceptions, and keeps walking. To copy all the files that are good on a CD, I simply run a command line like this:
C:\temp>python %X%\system\filetools\cpall.py g:\PP2ndEd\examples\PP2E cpexamples
Because the CD is addressed as "G:" on my Windows machine, this is the command-line equivalent of drag-and-drop copying from an item in the CD's top-level folder, except that the Python script will recover from errors on the CD and get the rest. In general, cpall can be passed any absolute directory path on your machine -- even ones that mean devices like CDs. To make this go on Linux, pass the directory where your CD is mounted (often something like /mnt/cdrom); a device file such as /dev/cdrom must be mounted before its files can be listed.
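The catch-and-keep-going idea can be isolated in a few lines. The following Python 3 sketch is an assumption-laden illustration, not the cpall code: the function name copy_files_skipping_errors is hypothetical, shutil.copyfile stands in for cpfile, and only OSError is caught (cpall's bare except clauses catch everything).

```python
import os
import shutil


def copy_files_skipping_errors(names, dir_from, dir_to):
    """Try to copy each named file; return (copied, skipped) name lists.

    Hypothetical helper for illustration only.
    """
    copied, skipped = [], []
    for name in names:
        try:
            shutil.copyfile(os.path.join(dir_from, name),
                            os.path.join(dir_to, name))
        except OSError as exc:                 # unreadable or missing file
            print('Error copying', name, '--skipped:', exc)
            skipped.append(name)               # remember it, then move on
        else:
            copied.append(name)
    return copied, skipped
```

Collecting the skipped names, rather than only printing them, leaves a list of files to retry or copy by hand afterward.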
5.6.2 Recoding Copies with a Visitor-Based Class
When I first wrote the cpall script just discussed, I couldn't see a way that the visitor class hierarchy we met earlier would help -- two directories needed to be traversed in parallel (the original and the copy), and visitor is based on climbing one tree with os.path.walk. There seemed no easy way to keep track of where the script is in the copy directory.
The trick I eventually stumbled onto is to not keep track at all. Instead, the script in Example 5-21 simply replaces the "from" directory path string with the "to" directory path string, at the front of all directory and pathnames passed in from os.path.walk. The results of the string replacements are the paths that the original files and directories are to be copied to.
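The path replacement can be tried on its own, without copying anything. The sketch below is Python 3 and makes several substitutions that should be read as assumptions: os.walk replaces os.path.walk (which no longer exists in Python 3), os.path.relpath replaces the length-based slicing used in the example, and the names map_to_target and planned_copies are invented for illustration.

```python
import os


def map_to_target(path, from_dir, to_dir):
    """Swap the leading from_dir of path for to_dir (hypothetical helper)."""
    relative = os.path.relpath(path, from_dir)    # part of path below from_dir
    if relative == os.curdir:                     # path is from_dir itself
        return to_dir
    return os.path.join(to_dir, relative)


def planned_copies(from_dir, to_dir):
    """Yield (kind, source, target) for everything found under from_dir.

    kind is 'd' for a directory and 'f' for a file; nothing is copied.
    """
    for dirpath, dirnames, filenames in os.walk(from_dir):
        yield 'd', dirpath, map_to_target(dirpath, from_dir, to_dir)
        for name in filenames:
            source = os.path.join(dirpath, name)
            yield 'f', source, map_to_target(source, from_dir, to_dir)
```

Because the walk only ever visits the "from" tree, the "to" paths are computed rather than tracked, which is the point of the trick.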
Example 5-21. PP2E\System\Filetools\cpall_visitor.py
###########################################################
# Use: "python cpall_visitor.py fromDir toDir"
# cpall, but with the visitor classes and os.path.walk;
# the trick is to do string replacement of fromDir with
# toDir at the front of all the names walk passes in;
# assumes that the toDir does not exist initially;
###########################################################

import os
from PP2E.PyTools.visitor import FileVisitor
from cpall import cpfile, getargs
verbose = 1

class CpallVisitor(FileVisitor):
    def __init__(self, fromDir, toDir):
        self.fromDirLen = len(fromDir) + 1
        self.toDir = toDir
        FileVisitor.__init__(self)
    def visitdir(self, dirpath):
        toPath = os.path.join(self.toDir, dirpath[self.fromDirLen:])
        if verbose: print 'd', dirpath, '=>', toPath
        os.mkdir(toPath)
        self.dcount = self.dcount + 1
    def visitfile(self, filepath):
        toPath = os.path.join(self.toDir, filepath[self.fromDirLen:])
        if verbose: print 'f', filepath, '=>', toPath
        cpfile(filepath, toPath)
        self.fcount = self.fcount + 1

if __name__ == '__main__':
    import sys, time
    fromDir, toDir = sys.argv[1:3]
    if len(sys.argv) > 3: verbose = 0
    print 'Copying...'
    start = time.time()
    walker = CpallVisitor(fromDir, toDir)
    walker.run(startDir=fromDir)
    print 'Copied', walker.fcount, 'files,', walker.dcount, 'directories',
    print 'in', time.time() - start, 'seconds'
This version accomplishes roughly the same goal as the original, but has made a few assumptions to keep code simple -- the "to" directory is assumed to not exist initially, and exceptions are not ignored along the way. Here it is copying the book examples tree again on Windows:
C:\temp>rm -rf cpexamples

C:\temp>python %X%\system\filetools\cpall_visitor.py examples cpexamples -quiet
Copying...
Copied 1356 files, 119 directories in 2.09000003338 seconds

C:\temp>fc /B examples\System\Filetools\cpall.py cpexamples\System\Filetools\cpall.py
Comparing files examples\System\Filetools\cpall.py and cpexamples\System\Filetools\cpall.py
FC: no differences encountered
Despite the extra string slicing going on, this version runs just as fast as the original. For tracing purposes, this version also prints all the "from" and "to" copy paths during the traversal, unless you pass in a third argument on the command line, or set the script's verbose variable to 0:
C:\temp>python %X%\system\filetools\cpall_visitor.py examples cpexamples
Copying...
d examples => cpexamples
f examples\autoexec.bat => cpexamples\autoexec.bat
f examples\cleanall.csh => cpexamples\cleanall.csh
...more deleted...
d examples\System => cpexamples\System
f examples\System\System.txt => cpexamples\System\System.txt
f examples\System\more.py => cpexamples\System\more.py
f examples\System\reader.py => cpexamples\System\reader.py
...more deleted...
Copied 1356 files, 119 directories in 2.31000006199 seconds