Visitor: Walking Trees Generically | Larger System Examples II

5.5 Visitor: Walking Trees Generically

Armed with the portable search_all script from Example 5-10, I was able to better pinpoint files to be edited, every time I changed the book examples tree structure. At least initially, I ran search_all to pick out suspicious files in one window, and edited each along the way by hand in another window.

Pretty soon, though, this became tedious too. Manually typing filenames into editor commands is no fun, especially when the number of files to edit is large. The search for "Part2" shown earlier returned 74 files, for instance. Since there are at least occasionally better things to do than manually start 74 editor sessions, I looked for a way to automatically run an editor on each suspicious file.

Unfortunately, search_all simply prints results to the screen. Although that text could be intercepted and parsed, a more direct approach that spawns edit sessions during the search may be easier, though it could require major changes to the tree search script as currently coded. At this point, two thoughts came to mind.

First, I knew it would be easier in the long run to add features to a general directory searcher as external components, rather than by changing the original script. Because editing files was just one possible extension (what about automating text replacements too?), a more generic, customizable, and reusable search component seemed the way to go.

Second, after writing a few directory walking utilities, it became clear that I was rewriting the same sort of code over and over again. Traversals could be even further simplified by wrapping common details for easier reuse. The os.path.walk tool helps, but its use tends to foster redundant operations (e.g., directory name joins), and its function-object-based interface doesn't quite lend itself to customization the way a class can.

Of course, both goals point to using an OO framework for traversals and searching. Example 5-11 is one concrete realization of these goals. It exports a general FileVisitor class that mostly just wraps os.path.walk for easier use and extension, as well as a generic SearchVisitor class that generalizes the notion of directory searches. By itself, SearchVisitor simply does what search_all did, but it also opens up the search process to customization -- bits of its behavior can be modified by overloading its methods in subclasses. Moreover, its core search logic can be reused everywhere we need to search; simply define a subclass that adds search-specific extensions.
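Because os.path.walk no longer exists in Python 3, readers on a current Python might sketch the same two-class design on top of os.walk, as below. This is an illustrative Python 3 sketch, not the book's code. It keeps the hook names and counters of Example 5-11 (visitdir, visitfile, visitmatch, fcount, dcount, scount). The quiet default handlers, the no-op visitmatch (the original pauses with raw_input), and the UTF-8 decoding with errors='replace' are all assumptions.

```python
import os


class FileVisitor:
    """Visit every directory and non-directory file at and below a root.

    Python 3 sketch built on os.walk; subclasses override visitdir/visitfile.
    """

    def __init__(self, data=None, listonly=False):
        self.context = data          # arbitrary data for subclasses (e.g., search key)
        self.listonly = listonly
        self.fcount = 0
        self.dcount = 0

    def run(self, startdir=os.curdir):
        # os.walk yields (dirpath, subdirnames, filenames) for each directory
        for dirpath, _subdirs, filenames in os.walk(startdir):
            self.visitdir(dirpath)
            for fname in filenames:
                self.visitfile(os.path.join(dirpath, fname))

    def visitdir(self, dirpath):
        self.dcount += 1             # quiet default: count only

    def visitfile(self, filepath):
        self.fcount += 1             # quiet default: count only


class SearchVisitor(FileVisitor):
    """Call visitmatch for each file whose text contains self.context."""

    skipexts = ('.gif', '.exe', '.pyc', '.o', '.a')   # assumed binary types

    def __init__(self, key, listonly=False):
        super().__init__(key, listonly)
        self.scount = 0

    def visitfile(self, filepath):
        super().visitfile(filepath)
        if self.listonly or os.path.splitext(filepath)[1] in self.skipexts:
            return
        # errors='replace' keeps undecodable bytes from aborting the walk
        with open(filepath, encoding='utf-8', errors='replace') as f:
            text = f.read()
        if self.context in text:
            self.scount += 1
            self.visitmatch(filepath, text)

    def visitmatch(self, filepath, text):
        pass                         # hook: subclasses decide what a match means
```

Usage follows the original pattern: make a SearchVisitor (or a subclass that overrides visitmatch), call its run method with a root directory, and read scount and fcount afterward.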

Example 5-11. PP2E\PyTools\visitor.py

#############################################################
# Test: "python ..\..\PyTools\visitor.py testmask [string]".
# Uses OOP, classes, and subclasses to wrap some of the 
# details of using os.path.walk to walk and search; testmask 
# is an integer bitmask with 1 bit per available selftest;
# see also: visitor_edit/replace/find/fix*/.py subclasses,
# and the fixsitename.py client script in Internet\Cgi-Web;
#############################################################

import os, sys, string
listonly = 0

class FileVisitor:
    """
    visits all non-directory files below startDir;
    override visitfile to provide a file handler
    """
    def __init__(self, data=None, listonly=0):
        self.context = data
        self.fcount = 0
        self.dcount = 0
        self.listonly = listonly
    def run(self, startDir=os.curdir):                 # default start='.'
        os.path.walk(startDir, self.visitor, None)
    def visitor(self, data, dirName, filesInDir):      # called for each dir
        self.visitdir(dirName)                         # do this dir first
        for fname in filesInDir:                       # do non-dir files
            fpath = os.path.join(dirName, fname)       # fnames have no path
            if not os.path.isdir(fpath):
                self.visitfile(fpath)
    def visitdir(self, dirpath):                       # called for each dir
        self.dcount = self.dcount + 1                  # override or extend me
        print dirpath, '...'
    def visitfile(self, filepath):                     # called for each file
        self.fcount = self.fcount + 1                  # override or extend me
        print self.fcount, '=>', filepath              # default: print name

class SearchVisitor(FileVisitor):
    """
    search files at and below startDir for a string
    """
    skipexts = ['.gif', '.exe', '.pyc', '.o', '.a']    # skip binary files
    def __init__(self, key, listonly=0):
        FileVisitor.__init__(self, key, listonly)
        self.scount = 0
    def visitfile(self, fname):                        # test for a match
        FileVisitor.visitfile(self, fname)
        if not self.listonly:
            if os.path.splitext(fname)[1] in self.skipexts:
                print 'Skipping', fname
            else:
                text = open(fname).read( )
                if string.find(text, self.context) != -1:
                    self.visitmatch(fname, text)
                    self.scount = self.scount + 1
    def visitmatch(self, fname, text):                 # process a match
        raw_input('%s has %s' % (fname, self.context)) # override me lower


# self-test logic
dolist = 1
dosearch = 2 # 3=do list and search
donext = 4 # when next test added

def selftest(testmask):
    if testmask & dolist:
        visitor = FileVisitor( )
        visitor.run('.')
        print 'Visited %d files and %d dirs' % (visitor.fcount, visitor.dcount)

    if testmask & dosearch:
        visitor = SearchVisitor(sys.argv[2], listonly)
        visitor.run('.')
        print 'Found in %d files, visited %d' % (visitor.scount, visitor.fcount)

if __name__ == '__main__':
    selftest(int(sys.argv[1]))                         # e.g., 3 = dolist | dosearch

This module primarily serves to export classes for external use, but it does something useful when run standalone too. If you invoke it as a script with a single argument "1", it makes and runs a FileVisitor object, and prints an exhaustive listing of every file and directory at and below the place you are at when the script is invoked (i.e., ".", the current working directory):

C:\temp>python %X%\PyTools\visitor.py 1 
. ...
1 => .\autoexec.bat
2 => .\cleanall.csh
3 => .\echoEnvironment.pyw
4 => .\Launcher.py
5 => .\Launcher.pyc
6 => .\Launch_PyGadgets.py
7 => .\Launch_PyDemos.pyw
 ...more deleted...
479 => .\Gui\Clock\plotterGui.py
480 => .\Gui\Clock\plotterText.py
481 => .\Gui\Clock\plotterText1.py
482 => .\Gui\Clock\__init__.py
.\Gui\gifs ...
483 => .\Gui\gifs\frank.gif
484 => .\Gui\gifs\frank.note
485 => .\Gui\gifs\gilligan.gif
486 => .\Gui\gifs\gilligan.note
 ...more deleted...
1352 => .\PyTools\visitor_fixnames.py
1353 => .\PyTools\visitor_find_quiet2.py
1354 => .\PyTools\visitor_find.pyc
1355 => .\PyTools\visitor_find_quiet1.py
1356 => .\PyTools\fixeoln_one.doc.txt
Visited 1356 files and 119 dirs

If you instead invoke this script with a "2" as its first argument, it makes and runs a SearchVisitor object, using the second argument as the search key. This form is equivalent to running the search_all.py script we met earlier; it pauses for an Enter key press after each matching file is reported (the "has Part3" lines here):

C:\temp\examples>python %X%\PyTools\visitor.py 2 Part3 
. ...
1 => .\autoexec.bat
2 => .\cleanall.csh
.\cleanall.csh has Part3 
3 => .\echoEnvironment.pyw
4 => .\Launcher.py
.\Launcher.py has Part3 
5 => .\Launcher.pyc
Skipping .\Launcher.pyc
6 => .\Launch_PyGadgets.py
7 => .\Launch_PyDemos.pyw
8 => .\LaunchBrowser.out.txt
9 => .\LaunchBrowser.py
10 => .\Launch_PyGadgets_bar.pyw
11 => .\makeall.csh
.\makeall.csh has Part3 
...
 ...more deleted
...
1353 => .\PyTools\visitor_find_quiet2.py
1354 => .\PyTools\visitor_find.pyc
Skipping .\PyTools\visitor_find.pyc
1355 => .\PyTools\visitor_find_quiet1.py
1356 => .\PyTools\fixeoln_one.doc.txt
Found in 49 files, visited 1356

Technically, passing this script a first argument "3" runs both a FileVisitor and a SearchVisitor (two separate traversals are performed). The first argument is really used as a bitmask to select one or more supported self-tests -- if a test's bit is on in the binary value of the argument, the test will be run. Because 3 is 011 in binary, it selects both a search (010) and a listing (001). In a more user-friendly system we might want to be more symbolic about that (e.g., check for "-search" and "-list" arguments), but bitmasks work just as well for this script's scope.
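To make the bitmask arithmetic concrete, here is a small Python 3 sketch. The flag values match visitor.py's dolist, dosearch, and donext; the helper selected and the test names it returns are hypothetical. Each & yields a nonzero value only when that flag's bit is on in the mask.

```python
# Flag values as in visitor.py's self-test logic; 'next' is a placeholder test
dolist, dosearch, donext = 1, 2, 4


def selected(testmask):
    """Return the (hypothetical) names of self-tests enabled by testmask."""
    names = []
    for name, bit in (('list', dolist), ('search', dosearch), ('next', donext)):
        if testmask & bit:           # nonzero only when this test's bit is on
            names.append(name)
    return names


for mask in (1, 2, 3, 5):
    print(mask, format(mask, '03b'), selected(mask))
# 1 001 ['list']
# 2 010 ['search']
# 3 011 ['list', 'search']
# 5 101 ['list', 'next']
```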

Text Editor War and Peace

In case you don't know, vi, the editor named in the visitor_edit.py script, is a Unix text editor; it's available for Windows too, but is not standard there. If you run this script, you'll probably want to change its editor setting on your machine. For instance, "emacs" should work on Linux, and "edit" or "notepad" should work on all Windows boxes.

These days, I tend to use an editor I coded in Python (PyEdit), so I'll leave the editor wars to more politically minded readers. In fact, changing the script to assign editor either of these ways:

 editor = r'python Gui\TextEditor\textEditor.pyw'
 editor = r'start Gui\TextEditor\textEditor.pyw'

will open the matched file in a pure and portable Python text editor GUI -- one coded in Python with the Tkinter interface, which runs on all major GUI platforms, and which we'll meet in Chapter 9. If you read about the start command in Chapter 3, you know that the first editor setting pauses the traversal while the editor runs, but the second does not (you'll get as many PyEdit windows as there are matched files).

This may fail, however, for very long file and directory names (remember, os.system has a command-line length limit, unlike os.spawnv). Moreover, the path to the textEditor.pyw program may vary depending on where you are when you run visitor_edit.py (i.e., the CWD). There are ways around this latter problem:

Prefixing the script's path string with the value of the PP2EHOME shell variable, fetched with os.environ; with the standard book setup scripts, PP2EHOME gives the absolute root directory, from which the editor script's path can be found.

Prefixing the path with sys.path[0] and a '../' to exploit the fact that the first import directory is always the script's home directory (see Section 2.7 in Chapter 2).

Windows shortcuts or Unix links to the editor script from the CWD.

Searching for the script naively with Launcher.findFirst or guessLocation, described near the end of Chapter 4.
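The first two workarounds might be combined as in the following Python 3 sketch. The helper name editor_path, the assumption that PP2EHOME names the examples root, and the Gui/TextEditor/textEditor.pyw layout beneath it are illustrative. When the variable is unset, the function falls back to the parent of the running script's directory (sys.path[0]), assuming the script lives one level below the root, as visitor_edit.py does in PyTools.

```python
import os
import sys

# Assumed location of the editor script relative to the examples root
EDITOR_RELPATH = os.path.join('Gui', 'TextEditor', 'textEditor.pyw')


def editor_path(scriptdir=None, relpath=EDITOR_RELPATH):
    """Return an absolute path to the editor script, independent of the CWD.

    Uses $PP2EHOME if set (assumed to name the examples root); otherwise
    assumes the running script sits one level below the root, e.g., in PyTools.
    """
    root = os.environ.get('PP2EHOME')
    if not root:
        if scriptdir is None:
            scriptdir = sys.path[0] or os.curdir   # home dir of the running script
        root = os.path.join(scriptdir, os.pardir)  # go up one level to the root
    return os.path.abspath(os.path.join(root, relpath))
```

A command string built from the result may still need quoting if the path contains spaces.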

But these are all beyond the scope of a sidebar on text editor politics.

5.5.1 Editing Files in Directory Trees

Now, after genericizing tree traversals and searches, it's an easy step to add automatic file editing in a brand-new, separate component. Example 5-12 defines a new EditVisitor class that simply customizes the visitmatch method of the SearchVisitor class, to open a text editor on the matched file. Yes, this is the complete program -- it needs to do something special only when visiting matched files, and so need provide only that behavior; the rest of the traversal and search logic is unchanged and inherited.

Example 5-12. PP2E\PyTools\visitor_edit.py

###############################################################
# Use: "python PyTools\visitor_edit.py string".
# add auto-editor start up to SearchVisitor in an external
# component (subclass), not in-place changes; this version 
# automatically pops up an editor on each file containing the
# string as it traverses; you can also use editor='edit' or 
# 'notepad' on windows; 'vi' and 'edit' run in console window;
# editor=r'python Gui\TextEditor\textEditor.pyw' may work too;
# caveat: we might be able to make this smarter by sending
# a search command to go to the first match in some editors; 
###############################################################

import os, sys, string
from visitor import SearchVisitor
listonly = 0

class EditVisitor(SearchVisitor):
    """
    edit files at and below startDir having string
    """
    editor = 'vi'                                      # ymmv
    def visitmatch(self, fname, text):
        os.system('%s %s' % (self.editor, fname))

if __name__ == '__main__':
    visitor = EditVisitor(sys.argv[1], listonly)
    visitor.run('.')
    print 'Edited %d files, visited %d' % (visitor.scount, visitor.fcount)

When we make and run an EditVisitor, a text editor is started with the os.system command-line spawn call, which usually blocks its caller until the spawned program finishes. On my machines, each time this script finds a matched file during the traversal, it starts up the vi text editor within the console window where the script was started; exiting the editor resumes the tree walk.
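In modern Python, the subprocess module is the usual replacement for os.system in a visitmatch like this one. The sketch below uses a hypothetical open_in_editor helper that takes the editor command as an argument list. subprocess.call waits for the editor to exit, as os.system does, while subprocess.Popen returns immediately, much like the start command discussed in the sidebar. Passing the filename as a separate list item also avoids shell quoting problems with names that contain spaces.

```python
import subprocess


def open_in_editor(editor_argv, fname, wait=True):
    """Start an editor on fname (hypothetical helper, not from visitor_edit.py).

    editor_argv is the command as a list, such as ['vi'] or ['notepad'].
    With wait true, block until the editor exits and return its exit status,
    as os.system does; otherwise return the running Popen object at once.
    """
    argv = list(editor_argv) + [fname]   # fname stays one argument, spaces and all
    if wait:
        return subprocess.call(argv)
    return subprocess.Popen(argv)
```

An EditVisitor's visitmatch could then call open_in_editor([self.editor], fname), assuming editor names a single program.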

Let's find and edit some files. When run as a script, we pass this program the search string as a command argument (here, the string "-exec" is the search key, not an option flag). The root directory is always passed to the run method as ".", the current run directory. Traversal status messages show up in the console as before, but each matched file now automatically pops up in a text editor along the way. Here, the editor is started eight times:

C:\...\PP2E>python PyTools\visitor_edit.py -exec 
1 => .\autoexec.bat
2 => .\cleanall.csh
3 => .\echoEnvironment.pyw
4 => .\Launcher.py
5 => .\Launcher.pyc
Skipping .\Launcher.pyc
 ...more deleted...
1340 => .\old_Part2\Basics\unpack2.py
1341 => .\old_Part2\Basics\unpack2b.py
1342 => .\old_Part2\Basics\unpack3.py
1343 => .\old_Part2\Basics\__init__.py
Edited 8 files, visited 1343

This, finally, is the exact tool I was looking for to simplify global book examples tree maintenance. After major changes to things like shared modules and file and directory names, I run this script on the examples root directory with an appropriate search string, and edit any files it pops up as needed. I still need to change files by hand in the editor, but that's often safer than blind global replacements.

5.5.2 Global Replacements in Directory Trees

But since I brought it up: given a general tree traversal class, it's easy to code a global search-and-replace subclass too. The SearchVisitor subclass in Example 5-13, ReplaceVisitor, customizes the visitmatch method to globally replace any appearances of one string with another, in all text files at and below a root directory. It also collects the names of all files that were changed in a list, just in case you wish to go through and verify the automatic edits applied (a text editor could be automatically popped up on each changed file, for instance).

Example 5-13. PP2E\PyTools\visitor_replace.py

################################################################
# Use: "python PyTools\visitor_replace.py fromStr toStr".
# does global search-and-replace in all files in a directory
# tree--replaces fromStr with toStr in all text files; this
# is powerful but dangerous!! visitor_edit.py runs an editor
# for you to verify and make changes, and so is much safer;
# use CollectVisitor to simply collect a list of matched files;
################################################################

import os, sys, string
from visitor import SearchVisitor
listonly = 0

class ReplaceVisitor(SearchVisitor):
    """
    change fromStr to toStr in files at and below startDir;
    files changed available in obj.changed list after a run
    """
    def __init__(self, fromStr, toStr, listonly=0):
        self.changed = []
        self.toStr = toStr
        SearchVisitor.__init__(self, fromStr, listonly)
    def visitmatch(self, fname, text):
        fromStr, toStr = self.context, self.toStr
        text = string.replace(text, fromStr, toStr)
        open(fname, 'w').write(text)
        self.changed.append(fname)

if __name__ == '__main__':
    if raw_input('Are you sure?') == 'y':
        visitor = ReplaceVisitor(sys.argv[1], sys.argv[2], listonly)
        visitor.run(startDir='.')
        print 'Visited %d files' % visitor.fcount
        print 'Changed %d files:' % len(visitor.changed)
        for fname in visitor.changed: print fname

To run this script over a directory tree, go to the directory to be changed and run the following sort of command line, with "from" and "to" strings. On my current machine, doing this on a 1354-file tree and changing 75 files along the way takes roughly six seconds of real clock time when the system isn't particularly busy:

C:\temp\examples>python %X%/PyTools/visitor_replace.py Part2 SPAM2 
Are you sure?y 
. ...
1 => .\autoexec.bat
2 => .\cleanall.csh
3 => .\echoEnvironment.pyw
4 => .\Launcher.py
5 => .\Launcher.pyc
Skipping .\Launcher.pyc
6 => .\Launch_PyGadgets.py
 ...more deleted...
1351 => .\PyTools\visitor_find_quiet2.py
1352 => .\PyTools\visitor_find.pyc
Skipping .\PyTools\visitor_find.pyc
1353 => .\PyTools\visitor_find_quiet1.py
1354 => .\PyTools\fixeoln_one.doc.txt
Visited 1354 files
Changed 75 files:
.\Launcher.py
.\LaunchBrowser.out.txt
.\LaunchBrowser.py
.\PyDemos.pyw
.\PyGadgets.py
.\README-PP2E.txt
 ...more deleted...
.\PyTools\search_all.out.txt
.\PyTools\visitor.out.txt
.\PyTools\visitor_edit.py

[to delete, use an empty toStr]
C:\temp\examples>python %X%/PyTools/visitor_replace.py SPAM ""

This is both wildly powerful and dangerous. If the string to be replaced is something that can show up in places you didn't anticipate, you might just ruin an entire tree of files by running the ReplaceVisitor object defined here. On the other hand, if the string is something very specific, this object can obviate the need to automatically edit suspicious files. For instance, we will use this approach to automatically change web site addresses in HTML files in Chapter 12; the addresses are likely too specific to show up in other places by chance.
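One way to blunt the danger is to make a dry run the default: report which files would change, and rewrite them only when asked, saving a .bak copy of each original first. This Python 3 sketch is an assumption, not a feature of visitor_replace.py; the replace_in_tree function, its .bak convention, and its skip list are illustrative. Files are read and written with newline='' so line endings survive, and errors='surrogateescape' lets undecodable bytes round-trip unchanged.

```python
import os
import shutil


def replace_in_tree(root, fromstr, tostr, apply=False,
                    skipexts=('.gif', '.exe', '.pyc', '.o', '.a', '.bak')):
    """Return files under root containing fromstr; rewrite them only if apply.

    With apply=False (the default) nothing on disk is modified.
    """
    changed = []
    for dirpath, _subdirs, filenames in os.walk(root):
        for fname in filenames:
            if os.path.splitext(fname)[1] in skipexts:
                continue
            path = os.path.join(dirpath, fname)
            with open(path, encoding='utf-8', errors='surrogateescape',
                      newline='') as f:
                text = f.read()
            if fromstr not in text:
                continue
            changed.append(path)
            if apply:
                shutil.copy2(path, path + '.bak')      # keep the original
                with open(path, 'w', encoding='utf-8',
                          errors='surrogateescape', newline='') as f:
                    f.write(text.replace(fromstr, tostr))
    return changed
```

Running it once without apply gives a list to inspect (or to hand to an editor) before committing to the change.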

5.5.3 Collecting Matched Files in Trees

The scripts so far search and replace in directory trees, using the same traversal code base (module visitor). Suppose, though, that you just want to get a Python list of the files in a directory tree that contain a string. You could run a search and parse its output for match messages. It's much simpler, though, to knock off another SearchVisitor subclass that collects the list along the way, as in Example 5-14.
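When a class is more machinery than a caller needs, the same collection can also be written as a plain generator over os.walk in Python 3. This is an alternative design, not the book's CollectVisitor: files_containing and its skip list (copied from SearchVisitor) are illustrative. Because the generator prints nothing, callers see only the results.

```python
import os

SKIPEXTS = ('.gif', '.exe', '.pyc', '.o', '.a')   # copied from SearchVisitor.skipexts


def files_containing(root, key, skipexts=SKIPEXTS):
    """Yield paths of non-binary files at and below root whose text contains key.

    Nothing is printed; undecodable bytes are replaced rather than raising.
    """
    for dirpath, _subdirs, filenames in os.walk(root):
        for fname in filenames:
            if os.path.splitext(fname)[1] in skipexts:
                continue
            path = os.path.join(dirpath, fname)
            with open(path, encoding='utf-8', errors='replace') as f:
                if key in f.read():
                    yield path
```

A caller can take the whole list with list(files_containing('.', '-exec')) or stop early after the first hit.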

Example 5-14. PP2E\PyTools\visitor_collect.py

#################################################################
# Use: "python PyTools\visitor_collect.py searchstring".
# CollectVisitor simply collects a list of matched files, for
# display or later processing (e.g., replacement, auto-editing);
#################################################################

import os, sys, string
from visitor import SearchVisitor

class CollectVisitor(SearchVisitor):
    """
    collect names of files containing a string;
    run this and then fetch its obj.matches list
    """
    def __init__(self, searchstr, listonly=0):
        self.matches = []
        SearchVisitor.__init__(self, searchstr, listonly)
    def visitmatch(self, fname, text):
        self.matches.append(fname)

if __name__ == '__main__':
    visitor = CollectVisitor(sys.argv[1])
    visitor.run(startDir='.')
    print 'Found these files:'
    for fname in visitor.matches: print fname

CollectVisitor is just tree search again, with a new kind of specialization -- collecting files, instead of printing messages. This class is useful from other scripts that mean to collect a matched files list for later processing; it can be run by itself as a script too:

C:\...\PP2E>python PyTools\visitor_collect.py -exec 
...
 ...more deleted...
...
1342 => .\old_Part2\Basics\unpack2b.py
1343 => .\old_Part2\Basics\unpack3.py
1344 => .\old_Part2\Basics\__init__.py
Found these files:
.\package.csh
.\README-PP2E.txt
.\readme-old-pp1E.txt
.\PyTools\cleanpyc.py
.\PyTools\fixeoln_all.py
.\System\Processes\output.txt
.\Internet\Cgi-Web\fixcgi.py

5.5.3.1 Suppressing status messages

Here, the items in the collected list are displayed at the end -- all the files containing the string "-exec". Notice, though, that traversal status messages are still printed along the way (in fact, I deleted about 1600 lines of such messages here!). In a tool meant to be called from another script, that may be an undesirable side effect; the calling script's output may be more important than the traversal's.

We could add mode flags to SearchVisitor to turn off status messages, but that makes it more complex. Instead, the following two files show how we might go about collecting matched filenames without letting any traversal messages show up in the console, all without changing the original code base. The first, shown in Example 5-15, simply takes over and copies the search logic, without print statements. It's a bit redundant with SearchVisitor, but only in a few lines of mimicked code.

Example 5-15. PP2E\PyTools\visitor_collect_quiet1.py

##############################################################
# Like visitor_collect, but avoid traversal status messages
##############################################################

import os, sys, string
from visitor import FileVisitor, SearchVisitor

class CollectVisitor(FileVisitor):
    """
    collect names of files containing a string, silently;
    """
    skipexts = SearchVisitor.skipexts
    def __init__(self, searchStr):
        self.matches = []
        self.context = searchStr
    def visitdir(self, dname): pass
    def visitfile(self, fname):
        if (os.path.splitext(fname)[1] not in self.skipexts and
            string.find(open(fname).read( ), self.context) != -1):
            self.matches.append(fname)

if __name__ == '__main__':
    visitor = CollectVisitor(sys.argv[1])
    visitor.run(startDir='.')
    print 'Found these files:'
    for fname in visitor.matches: print fname

When this class is run, only the contents of the matched filenames list show up at the end; no status messages appear during the traversal. Because of that, this form may be more useful as a general-purpose tool used by other scripts:

C:\...\PP2E>python PyTools\visitor_collect_quiet1.py -exec
Found these files:
.\package.csh
.\README-PP2E.txt
.\readme-old-pp1E.txt
.\PyTools\cleanpyc.py
.\PyTools\fixeoln_all.py
.\System\Processes\output.txt
.\Internet\Cgi-Web\fixcgi.py

A more interesting and less redundant way to suppress printed text during a traversal is to apply the stream redirection tricks we met in Chapter 2. Example 5-16 sets sys.stdout to a NullOut object that throws away all printed text for the duration of the traversal (its write method does nothing).

The only real complication with this scheme is that there is no good place to insert a restoration of sys.stdout at the end of the traversal; instead, we code the restore in the __del__ destructor method, and require clients to delete the visitor to resume printing as usual. An explicitly called method would work just as well, if you prefer less magical interfaces.

Example 5-16. PP2E\PyTools\visitor_collect_quiet2.py

##############################################################
# Like visitor_collect, but avoid traversal status messages
##############################################################

import os, sys, string
from visitor import SearchVisitor

class NullOut:
    def write(self, line): pass

class CollectVisitor(SearchVisitor):
    """
    collect names of files containing a string, silently
    """
    def __init__(self, searchstr, listonly=0):
        self.matches = []
        self.saveout, sys.stdout = sys.stdout, NullOut( )
        SearchVisitor.__init__(self, searchstr, listonly)
    def __del__(self):
        sys.stdout = self.saveout
    def visitmatch(self, fname, text):
        self.matches.append(fname)

if __name__ == '__main__':
    visitor = CollectVisitor(sys.argv[1])
    visitor.run(startDir='.')
    matches = visitor.matches
    del visitor
    print 'Found these files:'
    for fname in matches: print fname

When this script is run, output is identical to the prior run -- just the matched filenames at the end. Perhaps better still, why not code and debug just one verbose CollectVisitor utility class, and require clients to wrap calls to its run method in the redirect.redirect function we wrote back in Example 2-10?

>>> from PP2E.PyTools.visitor_collect import CollectVisitor
>>> from PP2E.System.Streams.redirect import redirect
>>> walker = CollectVisitor('-exec') # object to find '-exec'
>>> output = redirect(walker.run, ('.',), '')  # function, args, input
>>> for line in walker.matches: print line # print items in list
...
.\package.csh
.\README-PP2E.txt
.\readme-old-pp1E.txt
.\PyTools\cleanpyc.py
.\PyTools\fixeoln_all.py
.\System\Processes\output.txt
.\Internet\Cgi-Web\fixcgi.py

The redirect call employed here resets standard input and output streams to file-like objects for the duration of any function call; because of that, it's a more general way to suppress output than recoding every outputter. Here, it has the effect of intercepting (and hence suppressing) printed messages during a walker.run('.') traversal. They really are printed, but show up in the string result of the redirect call, not on the screen:

>>> output[:60]
'. ...\0121 => .\\autoexec.bat\0122 => .\\cleanall.csh\0123 => .\\echoEnv'

>>> import string
>>> len(output), len(string.split(output, '\n'))  # bytes, lines
(67609, 1592)

>>> walker.matches
['.\\package.csh', '.\\README-PP2E.txt', '.\\readme-old-pp1E.txt', 
'.\\PyTools\\cleanpyc.py', '.\\PyTools\\fixeoln_all.py',
'.\\System\\Processes\\output.txt', 
'.\\Internet\\Cgi-Web\\fixcgi.py']

Because redirect saves printed text in a string, it may be less appropriate than the two quiet CollectVisitor variants for functions that generate much output. Here, for example, 67,609 bytes of output was queued up in an in-memory string (see the len call results); such a buffer may or may not be significant in some applications.
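Python 3.4 and later provide this kind of temporary stream swap in the standard library as contextlib.redirect_stdout. In this sketch, noisy_walk is a hypothetical stand-in for walker.run. Its printed lines accumulate in an io.StringIO buffer, and measuring that buffer shows the memory cost just described.

```python
import contextlib
import io


def noisy_walk(n):
    """Hypothetical stand-in for a traversal that prints status lines."""
    for i in range(1, n + 1):
        print(i, '=>', 'file%d.txt' % i)
    return n


buf = io.StringIO()
with contextlib.redirect_stdout(buf):      # print() now writes into buf
    result = noisy_walk(3)
output = buf.getvalue()                    # everything printed, held in memory
print(len(output), len(output.splitlines()), result)   # prints: 45 3 3
```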

In more general terms, redirecting sys.stdout to dummy objects as done here is a simple way to turn off outputs (and is equivalent to the Unix notion of redirecting output to file /dev/null -- a file that discards everything sent to it). For instance, we'll pull this trick out of the bag again in the context of server-side Internet scripting, to prevent utility status messages from showing up in generated web page output streams.[10]

[10] For the impatient: see commonhtml.runsilent in the PyMailCgi system presented in Chapter 13. It's a variation on redirect.redirect that discards output as it is printed (instead of retaining it in a string), returns the return value of the function called (not the output string), and lets exceptions pass via a try/finally statement (instead of catching and reporting them with a try/except). It's still redirection at work, though.
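A wrapper along the lines the footnote describes might look like the following Python 3 sketch. The name run_silent and its details are assumptions, not the PyMailCgi code. Written text is discarded as it arrives instead of being kept, the called function's return value is passed back, and try/finally restores sys.stdout whether the function returns or raises, so exceptions propagate to the caller.

```python
import sys


class NullOut:
    """File-like sink that throws away everything written to it."""

    def write(self, text):
        return len(text)        # report the text as written, but keep nothing

    def flush(self):
        pass


def run_silent(func, *args, **kwargs):
    """Call func(*args, **kwargs) with printed output discarded.

    Returns func's result; exceptions propagate after stdout is restored.
    """
    saved = sys.stdout
    sys.stdout = NullOut()
    try:
        return func(*args, **kwargs)
    finally:
        sys.stdout = saved      # runs on normal return and on exceptions
```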

5.5.4 Recoding Fixers with Visitors

Be warned: once you've written and debugged a class that knows how to do something useful like walking directory trees, it's easy for it to spread throughout your system utility libraries. Of course, that's the whole point of code reuse. For instance, very soon after writing the visitor classes presented in the prior sections, I recoded both the fixnames_all.py and fixeoln_all.py directory walker scripts listed earlier in Example 5-6 and Example 5-4, respectively, to use visitor instead of proprietary tree-walk logic (they both originally used find.find). Example 5-17 combines the original convertLines function (to fix end-of-lines in a single file) with visitor's tree walker class, to yield an alternative implementation of the line-end converter for directory trees.

Example 5-17. PP2E\PyTools\visitor_fixeoln.py

##############################################################
# Use: "python visitor_fixeoln.py todos|tounix".
# recode fixeoln_all.py as a visitor subclass: this version
# uses os.path.walk, not find.find to collect all names first;
# limited but fast: if os.path.splitext(fname)[1] in patts:
##############################################################

import visitor, sys, fnmatch, os
from fixeoln_dir import patts
from fixeoln_one import convertEndlines

class EolnFixer(visitor.FileVisitor):
    def visitfile(self, fullname):                 # match on basename
        basename = os.path.basename(fullname)      # to make result same
        for patt in patts:                         # else visits fewer
            if fnmatch.fnmatch(basename, patt):
                convertEndlines(self.context, fullname)
                self.fcount = self.fcount + 1      # could break here
                                                   # but results differ
if __name__ == '__main__':
    walker = EolnFixer(sys.argv[1])
    walker.run( )
    print 'Files matched (converted or not):', walker.fcount

As we saw in Chapter 2, the standard fnmatch module performs Unix shell-like filename matching; this script uses it to match names to the previous version's filename patterns (simply looking for filename extensions after a "." is simpler, but not as general):
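The difference between the two matching styles is easiest to see side by side. In this Python 3 sketch, the pattern list is made up for illustration (the real one is imported from fixeoln_dir.patts). Shell-style patterns can match whole names and prefixes, while an extension test sees only what follows the last dot.

```python
import fnmatch
import os

PATTS = ['*.py', '*.txt', 'Makefile*', 'README*']   # hypothetical pattern list
EXTS = ['.py', '.txt']                             # extension-only equivalent


def classify(name):
    """Return (matched by a shell-style pattern, matched by extension)."""
    by_patt = any(fnmatch.fnmatch(name, patt) for patt in PATTS)
    by_ext = os.path.splitext(name)[1] in EXTS
    return by_patt, by_ext


for name in ['visitor.py', 'notes.txt', 'Makefile.win', 'README', 'logo.gif']:
    print(name, classify(name))
```

Only the patterns catch Makefile.win and README. Note that fnmatch.fnmatch applies os.path.normcase, so matching is case-insensitive on Windows; fnmatch.fnmatchcase is the case-sensitive variant.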

C:\temp\examples>python %X%/PyTools/visitor_fixeoln.py tounix 
. ...
Changing .\echoEnvironment.pyw
Changing .\Launcher.py
Changing .\Launch_PyGadgets.py
Changing .\Launch_PyDemos.pyw
 ...more deleted...
Changing .\PyTools\visitor_find.py
Changing .\PyTools\visitor_fixnames.py
Changing .\PyTools\visitor_find_quiet2.py
Changing .\PyTools\visitor_find_quiet1.py
Changing .\PyTools\fixeoln_one.doc.txt
Files matched (converted or not): 1065

C:\temp\examples>python %X%/PyTools/visitor_fixeoln.py tounix 
 ...more deleted...
.\Extend\Swig\Shadow ...
. ...
.\EmbExt\Exports ...
.\EmbExt\Exports\ClassAndMod ...
.\EmbExt\Regist ...
.\PyTools ...
Files matched (converted or not): 1065

If you run this script and the original fixeoln_all.py on the book examples tree, you'll notice that this version visits two fewer matched files. This simply reflects the fact that fixeoln_all also collects and skips over two directory names for its patterns in the find.find result (both called "Output"). In all other ways, this version works the same way even when it could do better -- adding a break statement after the convertEndlines call here avoids visiting files that appear redundantly in the original's find results lists.
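The suggested break simply stops testing patterns once one has matched. The following Python 3 sketch uses a hypothetical fix_file helper and a convert callback that stands in for convertEndlines. It expresses the same short-circuit with any(), so a file that matches several patterns is converted only once.

```python
import fnmatch
import os


def fix_file(fullname, patts, convert):
    """Call convert(fullname) at most once if the basename matches any pattern.

    Hypothetical helper: convert stands in for convertEndlines. any() stops
    at the first matching pattern, the same effect as the suggested break.
    Returns True if the file matched.
    """
    basename = os.path.basename(fullname)
    if any(fnmatch.fnmatch(basename, patt) for patt in patts):
        convert(fullname)
        return True
    return False
```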

The first command here takes roughly six seconds on my computer, and the second takes about four (there are no files to be converted). That's faster than the eight- and six-second figures for the original find.find-based version of this script, but they differ in amount of output, and benchmarks are usually much more subtle than you imagine. Most of the real clock time is likely spent scrolling text in the console, not doing any real directory processing. Since both are plenty fast for their intended purposes, finer-grained performance figures are left as exercises.

The script in Example 5-18 combines the original convertOne function (to rename a single file or directory) with the visitor's tree walker class, to create a directory tree-wide fix for uppercase filenames. Notice that we redefine both file and directory visitation methods here, as we need to rename both.
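One subtlety when renaming directories during a walk is that a renamed directory's old path may still be needed afterward to reach its contents. One way to sidestep this in Python 3 is to walk bottom-up with os.walk(topdown=False), so every entry inside a directory is renamed before the directory itself. The sketch assumes a simple lowercase-everything policy with no prompting (the book's convertOne decides per name), and lowercase_tree is a hypothetical name.

```python
import os


def lowercase_tree(root, listonly=False):
    """Rename names at and below root to lowercase, deepest entries first.

    Returns (oldpath, newpath) pairs; with listonly true, nothing is renamed.
    The root directory's own name is left alone.
    """
    renamed = []
    # topdown=False yields each directory only after all of its subdirectories,
    # so a parent is never renamed while paths inside it are still pending
    for dirpath, subdirs, filenames in os.walk(root, topdown=False):
        siblings = set(subdirs) | set(filenames)
        for name in subdirs + filenames:
            lower = name.lower()
            if lower == name or lower in siblings:   # no change, or a name clash
                continue
            old, new = os.path.join(dirpath, name), os.path.join(dirpath, lower)
            if not listonly:
                os.rename(old, new)
            renamed.append((old, new))
    return renamed
```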

Example 5-18. PP2E\PyTools\visitor_fixnames.py

###############################################################
# recode fixnames_all.py name case fixer with the Visitor class
# note: "from fixnames_all import convertOne" doesn't help at 
# top-level of the fixnames class, since it is assumed to be a
# method and called with extra self argument (an exception);
###############################################################

from visitor import FileVisitor

class FixnamesVisitor(FileVisitor):
    """
    check filenames at and below startDir for uppercase
    """
    import fixnames_all
    def __init__(self, listonly=0):
        FileVisitor.__init__(self, listonly=listonly)
        self.ccount = 0
    def rename(self, pathname):
        if not self.listonly:
            convertflag = self.fixnames_all.convertOne(pathname)
            self.ccount = self.ccount + convertflag
    def visitdir(self, dirname):
        FileVisitor.visitdir(self, dirname)
        self.rename(dirname)
    def visitfile(self, filename):
        FileVisitor.visitfile(self, filename)
        self.rename(filename)

if __name__ == '__main__':
    walker = FixnamesVisitor( )
    walker.run( )
    allnames = walker.fcount + walker.dcount
    print 'Converted %d files, visited %d' % (walker.ccount, allnames)

This version is run like the original find.find-based version, fixnames_all, but visits one more name (the top-level root directory), and there is no initial delay while filenames are collected on a list -- we're using os.path.walk again, not find.find. It's also close to the original os.path.walk version of this script, but is based on a class hierarchy, not direct function callbacks:

C:\temp\examples>python %X%/PyTools/visitor_fixnames.py 
 ...more deleted...
303 => .\__init__.py
304 => .\__init__.pyc
305 => .\Ai\ExpertSystem\holmes.tar
306 => .\Ai\ExpertSystem\TODO
Convert dir=.\Ai\ExpertSystem file=TODO? (y|Y) 
307 => .\Ai\ExpertSystem\__init__.py
308 => .\Ai\ExpertSystem\holmes\cnv
309 => .\Ai\ExpertSystem\holmes\README.1ST
Convert dir=.\Ai\ExpertSystem\holmes file=README.1ST? (y|Y) 
 ...more deleted...
1353 => .\PyTools\visitor_find.pyc
1354 => .\PyTools\visitor_find_quiet1.py
1355 => .\PyTools\fixeoln_one.doc.txt
Converted 1 files, visited 1474

Both of these fixer scripts work roughly the same as the originals, but because the directory walking logic lives in just one file (visitor.py), it only needs to be debugged once. Moreover, improvements in that file will automatically be inherited by every directory-processing tool derived from its classes. Even when coding system-level scripts, reuse and reduced redundancy pay off in the end.

5.5.5 Fixing File Permissions in Trees

Just in case the preceding visitor-client sections weren't quite enough to convince you of the power of code reuse, another piece of evidence surfaced very late in this book project. It turns out that copying files off a CD using Windows drag-and-drop makes them read-only in the copy. That's less than ideal for the book examples directory on the enclosed CD (see http://examples.oreilly.com/python2) -- you must copy the directory tree onto your hard drive to be able to experiment with program changes (naturally, files on CD can't be changed in place). But if you copy with drag-and-drop, you may wind up with a tree of over 1000 read-only files.

Since drag-and-drop is perhaps the most common way to copy off a CD on Windows, I needed a portable and easy-to-use way to undo the read-only setting. Asking readers to make these all writable by hand would be impolite, to say the least. Writing a full-blown install system seemed like overkill. Providing different fixes for different platforms would double or triple the complexity of the task.

A much better solution is the Python script in Example 5-19, which can be run in the root of the copied examples directory to repair the damage of a read-only drag-and-drop operation. It specializes the traversal implemented by the FileVisitor class once again, this time to run an os.chmod call on every file and directory visited along the way.

Example 5-19. PP2E\PyTools\fixreadonly-all.py

#!/usr/bin/env python
###############################################################
# Use: python PyTools\fixreadonly-all.py 
# run this script in the top-level examples directory after
# copying all examples off the book's CD-ROM, to make all 
# files writeable again--by default, copying files off the 
# CD with Windows drag-and-drop (at least) creates them as 
# read-only on your hard drive; this script traverses entire 
# dir tree at and below the dir it is run in (all subdirs);
###############################################################

import os
from PP2E.PyTools.visitor import FileVisitor # os.path.walk wrapper
listonly = 0                                 # set to 1 to list names only

class FixReadOnly(FileVisitor):
    def __init__(self, listonly=0): 
        FileVisitor.__init__(self, listonly=listonly) 
    def visitDir(self, dname):
        FileVisitor.visitDir(self, dname)    # let base class count/trace it
        if self.listonly:
            return
        os.chmod(dname, 0777)                # rwx for owner, group, others
    def visitfile(self, fname): 
        FileVisitor.visitfile(self, fname)   # let base class count/trace it
        if self.listonly:
            return
        os.chmod(fname, 0777)

if __name__ == '__main__':
    # don't run auto if clicked
    go = raw_input('This script makes all files writeable; continue?') 
    if go != 'y':
        raw_input('Canceled - hit enter key')
    else:
        walker = FixReadOnly(listonly)
        walker.run()
        print 'Visited %d files and %d dirs' % (walker.fcount, walker.dcount)

As we saw in Chapter 2, the built-in os.chmod call changes the permission settings on an external file (here, to 0777: read, write, and execute permission for the owner, the group, and everyone else). Because os.chmod and the FileVisitor's operations are portable, this same script will work to set permissions in an entire tree on both Windows and Unix-like platforms; on Windows, os.chmod honors only the write-permission bit (it sets or clears the read-only flag), which is exactly the setting that needs undoing here. Notice that the script asks whether you really want to proceed when it first starts up, just in case someone accidentally clicks the file's name in a file explorer GUI. Also note that Python must be installed before this script can be run to make files writable; that seems a fair assumption to make of users about to change Python scripts.

C:\temp\examples>python PyTools\fixreadonly-all.py 
This script makes all files writeable; continue?y 
. ...
1 => .\autoexec.bat
2 => .\cleanall.csh
3 => .\echoEnvironment.pyw
 ...more deleted...
1352 => .\PyTools\visitor_find.pyc
1353 => .\PyTools\visitor_find_quiet1.py
1354 => .\PyTools\fixeoln_one.doc.txt
Visited 1354 files and 119 dirs
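The blanket 0777 mode grants every permission to everyone, which is more than undoing a read-only copy strictly requires. A more conservative variant, sketched below in Python 3 under the assumption that restoring the owner's write bit is enough, reads each path's current mode and ORs in stat.S_IWRITE, leaving all other bits alone. The names make_writable and fix_readonly_tree are hypothetical, and for brevity the sketch walks with os.walk directly instead of subclassing FileVisitor; its listonly argument imitates the script's dry-run flag.

```python
import os
import stat


def make_writable(path, listonly=False):
    """Add owner write permission to path, keeping its other mode bits.

    Returns True if the path was changed (or, with listonly, would be).
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & stat.S_IWRITE:
        return False                       # already writable by its owner
    if not listonly:
        os.chmod(path, mode | stat.S_IWRITE)
    return True


def fix_readonly_tree(root, listonly=False):
    """Make every directory and file at and below root owner-writable.

    Returns the number of paths changed (or that would be changed).
    """
    changed = 0
    for dirpath, subdirs, files in os.walk(root):
        changed += make_writable(dirpath, listonly)
        for fname in files:
            changed += make_writable(os.path.join(dirpath, fname), listonly)
    return changed
```

For example, `fix_readonly_tree('.', listonly=True)` reports how many paths would change without touching any of them. Since already-writable paths are skipped, a second real run reports 0.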
