7.5. Visitor: Walking Trees Generically

Armed with the portable search_all script from Example 7-10, I was able to better pinpoint files to be edited every time I changed the book examples tree structure. At least initially, in one window I ran search_all to pick out suspicious files and edited each along the way by hand in another window.

Pretty soon, though, this became tedious too. Manually typing filenames into editor commands is no fun, especially when the number of files to edit is large; the search for "Part2" shown earlier returned 74 files, for instance. Since I occasionally have better things to do than manually start 74 editor sessions, I looked for a way to automatically run an editor on each suspicious file.

Unfortunately, search_all simply prints results to the screen. Although that text could be intercepted and parsed, a more direct approach that spawns edit sessions during the search would be easier to use, though it might require major changes to the tree search script as currently coded. At this point, two thoughts came to mind.

First, I knew it would be easier in the long run to be able to add features to a general directory searcher as external components, not by changing the original script. Because editing files was just one possible extension (what about automating text replacements too?), a more generic, customizable, and reusable search component seemed the way to go.

Second, after writing a few directory walking utilities, it became clear that I was rewriting the same sort of code over and over again. Traversals could be even further simplified by wrapping common details for easier reuse. The os.path.walk tool helps, but its use tends to foster redundant operations (e.g., directory name joins), and its function-object-based interface doesn't quite lend itself to customization the way a class can.

Of course, both goals point to using an object-oriented framework for traversals and searching. Example 7-11 is one concrete realization of these goals. It exports a general FileVisitor class that mostly just wraps os.path.walk for easier use and extension, as well as a generic SearchVisitor class that generalizes the notion of directory searches. By itself, SearchVisitor simply does what search_all did, but it also opens up the search process to customization; bits of its behavior can be modified by overloading its methods in subclasses. Moreover, its core search logic can be reused everywhere we need to search. Simply define a subclass that adds search-specific extensions. As is usual in programming, once you repeat tactical tasks often enough, they tend to inspire this kind of strategic thinking.

Example 7-11. PP3E\PyTools\visitor.py

    ##########################################################################
    # Test: "python ..\..\PyTools\visitor.py testmask [string]".  Uses OOP,
    # classes, and subclasses to wrap some of the details of os.path.walk
    # usage to walk and search; testmask is an integer bitmask with 1 bit
    # per available selftest; see also: visitor_edit/replace/find/fix*/.py
    # subclasses, and the fixsitename.py client script in Internet\Cgi-Web;
    ##########################################################################

    import os, sys
    listonly = False

    class FileVisitor:
        """
        visits all nondirectory files below startDir;
        override visitfile to provide a file handler
        """
        def __init__(self, data=None, listonly=False):
            self.context  = data
            self.fcount   = 0
            self.dcount   = 0
            self.listonly = listonly
        def run(self, startDir=os.curdir):                  # default start='.'
            os.path.walk(startDir, self.visitor, None)
        def visitor(self, data, dirName, filesInDir):       # called for each dir
            self.visitdir(dirName)                          # do this dir first
            for fname in filesInDir:                        # do non-dir files
                fpath = os.path.join(dirName, fname)        # fnames have no path
                if not os.path.isdir(fpath):
                    self.visitfile(fpath)
        def visitdir(self, dirpath):                        # called for each dir
            self.dcount += 1                                # override or extend me
            print dirpath, '...'
        def visitfile(self, filepath):                      # called for each file
            self.fcount += 1                                # override or extend me
            print self.fcount, '=>', filepath               # default: print name

    class SearchVisitor(FileVisitor):
        """
        search files at and below startDir for a string
        """
        skipexts = ['.gif', '.exe', '.pyc', '.o', '.a']     # skip binary files
        def __init__(self, key, listonly=False):
            FileVisitor.__init__(self, key, listonly)
            self.scount = 0
        def visitfile(self, fname):                         # test for a match
            FileVisitor.visitfile(self, fname)
            if not self.listonly:
                if os.path.splitext(fname)[1] in self.skipexts:
                    print 'Skipping', fname
                else:
                    text = open(fname).read()
                    if text.find(self.context) != -1:
                        self.visitmatch(fname, text)
                        self.scount += 1
        def visitmatch(self, fname, text):                     # process a match
            raw_input('%s has %s' % (fname, self.context))     # override me lower

    # self-test logic
    dolist   = 1
    dosearch = 2    # 3=do list and search
    donext   = 4    # when next test added

    def selftest(testmask):
        if testmask & dolist:
           visitor = FileVisitor()
           visitor.run('.')
           print 'Visited %d files and %d dirs' % (visitor.fcount, visitor.dcount)
        if testmask & dosearch:
           visitor = SearchVisitor(sys.argv[2], listonly)
           visitor.run('.')
           print 'Found in %d files, visited %d' % (visitor.scount, visitor.fcount)

    if __name__ == '__main__':
        selftest(int(sys.argv[1]))    # e.g., 5 = dolist | dorename
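For readers working in Python 3, where os.path.walk is no longer available, the same FileVisitor idea can be sketched on top of os.walk. What follows is an illustrative port rather than the book's listing: the print calls use Python 3 syntax, and the traversal loop lives in run instead of a separate visitor callback. Because os.walk already separates files from subdirectories, the os.path.isdir test is not needed here.

```python
import os

class FileVisitor:
    """Visit all non-directory files at and below startDir (Python 3 sketch)."""

    def __init__(self, data=None, listonly=False):
        self.context = data
        self.fcount = 0
        self.dcount = 0
        self.listonly = listonly

    def run(self, startDir=os.curdir):
        # os.walk yields one (dirpath, subdirnames, filenames) tuple per directory
        for dirpath, subdirs, files in os.walk(startDir):
            self.visitdir(dirpath)                  # do this directory first
            for fname in files:                     # names only, so join the path
                self.visitfile(os.path.join(dirpath, fname))

    def visitdir(self, dirpath):                    # override or extend me
        self.dcount += 1
        print(dirpath, '...')

    def visitfile(self, filepath):                  # override or extend me
        self.fcount += 1
        print(self.fcount, '=>', filepath)
```

Subclasses can override visitfile and visitdir exactly as with the original class; only the traversal engine underneath has changed.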

This module primarily serves to export classes for external use, but it does something useful when run standalone too. If you invoke it as a script with a single argument, 1, it makes and runs a FileVisitor object and prints an exhaustive listing of every file and directory at and below the place you are at when the script is invoked (i.e., ".", the current working directory):

    C:\temp>python %X%\PyTools\visitor.py 1
    . ...
    1 => .\autoexec.bat
    2 => .\cleanall.csh
    3 => .\echoEnvironment.pyw
    4 => .\Launcher.py
    5 => .\Launcher.pyc
    6 => .\Launch_PyGadgets.py
    7 => .\Launch_PyDemos.pyw
    ...more deleted...
    479 => .\Gui\Clock\plotterGui.py
    480 => .\Gui\Clock\plotterText.py
    481 => .\Gui\Clock\plotterText1.py
    482 => .\Gui\Clock\__init__.py
    .\Gui\gifs ...
    483 => .\Gui\gifs\frank.gif
    484 => .\Gui\gifs\frank.note
    485 => .\Gui\gifs\gilligan.gif
    486 => .\Gui\gifs\gilligan.note
    ...more deleted...
    1352 => .\PyTools\visitor_fixnames.py
    1353 => .\PyTools\visitor_find_quiet2.py
    1354 => .\PyTools\visitor_find.pyc
    1355 => .\PyTools\visitor_find_quiet1.py
    1356 => .\PyTools\fixeoln_one.doc.txt
    Visited 1356 files and 119 dirs

If you instead invoke this script with a 2 as its first argument, it makes and runs a SearchVisitor object using the second argument as the search key. This form is equivalent to running the search_all.py script we met earlier; it pauses for an Enter key press after each matching file is reported (lines in bold font here):

    C:\temp\examples>python %X%\PyTools\visitor.py 2 Part3
    . ...
    1 => .\autoexec.bat
    2 => .\cleanall.csh
    .\cleanall.csh has Part3
    3 => .\echoEnvironment.pyw
    4 => .\Launcher.py
    .\Launcher.py has Part3
    5 => .\Launcher.pyc
    Skipping .\Launcher.pyc
    6 => .\Launch_PyGadgets.py
    7 => .\Launch_PyDemos.pyw
    8 => .\LaunchBrowser.out.txt
    9 => .\LaunchBrowser.py
    10 => .\Launch_PyGadgets_bar.pyw
    11 => .\makeall.csh
    .\makeall.csh has Part3
    ...
    ...more deleted ...
    1353 => .\PyTools\visitor_find_quiet2.py
    1354 => .\PyTools\visitor_find.pyc
    Skipping .\PyTools\visitor_find.pyc
    1355 => .\PyTools\visitor_find_quiet1.py
    1356 => .\PyTools\fixeoln_one.doc.txt
    Found in 49 files, visited 1356

Technically, passing this script a first argument of 3 runs both a FileVisitor and a SearchVisitor (two separate traversals are performed). The first argument is really used as a bit mask to select one or more supported self-tests; if a test's bit is on in the binary value of the argument, the test will be run. Because 3 is 011 in binary, it selects both a search (010) and a listing (001). In a more user-friendly system, we might want to be more symbolic about that (e.g., check for -search and -list arguments), but bit masks work just as well for this script's scope.
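The bit-mask test can be seen in isolation in the short sketch below. The constant values match the dolist, dosearch, and donext settings in the module, but the selected_tests helper and the names it returns are hypothetical, added only to make the bitwise logic visible.

```python
DOLIST = 1      # binary 001
DOSEARCH = 2    # binary 010
DONEXT = 4      # binary 100, reserved for a future test

def selected_tests(testmask):
    """Return the names of the self-tests whose bits are set in testmask."""
    names = []
    if testmask & DOLIST:       # nonzero only if the low bit is on
        names.append('list')
    if testmask & DOSEARCH:
        names.append('search')
    if testmask & DONEXT:
        names.append('next')
    return names

print(selected_tests(3))        # ['list', 'search']
```

A mask of 1 selects only the listing, 2 only the search, and 3 both, since each `&` test inspects one bit independently of the others.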

Text Editor War and Peace

In case you don't know, the vi setting used in the visitor_edit.py script is a Unix text editor; it's available for Windows too but is not standard there. If you run this script, you'll probably want to change its editor setting on your machine. For instance, "emacs" should work on Linux, and "edit" or "notepad" should work on all Windows boxes.

These days, I tend to use an editor I coded in Python (PyEdit), so I'll leave the editor wars to more politically minded readers. In fact, changing the script to assign editor in either of these ways:

    editor = r'python Gui\TextEditor\textEditorNoConsole.pyw'
    editor = r'start  Gui\TextEditor\textEditorNoConsole.pyw'

will open the matched file in a pure and portable Python text editor GUI, one coded in Python with the Tkinter interface, which runs on all major GUI platforms and which we'll meet in Chapter 12. If you read about the start command in Chapter 5, you know that the first editor setting pauses the traversal while the editor runs, but the second does not (you'll get as many PyEdit windows as there are matched files).

This may fail, however, for very long file and directory names (remember, os.system has a length limit, unlike os.spawnv). Moreover, the path to the textEditor.pyw program may vary depending on where you are when you run visitor_edit.py (i.e., the CWD). There are four ways around this latter problem:

  • Prefixing the script's path string with the value of the PP3EHOME shell variable, fetched with os.environ; with the standard book setup scripts, PP3EHOME gives the absolute root directory, from which the editor script's path can be found

  • Prefixing the path with sys.path[0] and a '../' to exploit the fact that the first import directory is always the script's home directory (see Chapter 3)

  • Windows shortcuts or Unix links to the editor script from the CWD

  • Searching for the script naïvely with Launcher.findFirst or guessLocation, described near the end of Chapter 6

But these are all beyond the scope of a sidebar on text editor politics.


7.5.1. Editing Files in Directory Trees

Now, after genericizing tree traversals and searches, it's an easy step to add automatic file editing in a brand-new, separate component. Example 7-12 defines a new EditVisitor class that simply customizes the visitmatch method of the SearchVisitor class to open a text editor on the matched file. Yes, this is the complete program. It needs to do something special only when visiting matched files, and so it needs to provide only that behavior; the rest of the traversal and search logic is unchanged and inherited.

Example 7-12. PP3E\PyTools\visitor_edit.py

    ###############################################################
    # Use: "python PyTools\visitor_edit.py string".
    # add auto-editor startup to SearchVisitor in an external
    # component (subclass), not in-place changes; this version
    # automatically pops up an editor on each file containing the
    # string as it traverses; you can also use editor='edit' or
    # 'notepad' on Windows; 'vi' and 'edit' run in console window;
    # editor=r'python Gui\TextEditor\textEditor.py' may work too;
    # caveat: we might be able to make this smarter by sending
    # a search command to go to the first match in some editors;
    ###############################################################

    import os, sys
    from visitor import SearchVisitor
    listonly = False

    class EditVisitor(SearchVisitor):
        """
        edit files at and below startDir having string
        """
        editor = 'vi'  # ymmv
        def visitmatch(self, fname, text):
            os.system('%s %s' % (self.editor, fname))

    if __name__ == '__main__':
        visitor = EditVisitor(sys.argv[1], listonly)
        visitor.run('.')
        print 'Edited %d files, visited %d' % (visitor.scount, visitor.fcount)

When we make and run an EditVisitor, a text editor is started with the os.system command-line spawn call, which usually blocks its caller until the spawned program finishes. On my machines, each time this script finds a matched file during the traversal, it starts up the vi text editor within the console window where the script was started; exiting the editor resumes the tree walk.
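As an aside, the blocking behavior described here can also be had without building a shell command string. The sketch below uses the standard subprocess module in Python 3; it is not the book's code, and the open_in_editor name and its list-style editor argument are assumptions made for illustration. Passing arguments as a list sidesteps quoting problems when a matched filename contains spaces.

```python
import subprocess

def open_in_editor(editor_argv, fname):
    """
    Run an editor on fname and wait for it to exit; return its exit status.
    editor_argv is a list such as ['vi'] or ['notepad'] (hypothetical
    settings); fname is appended as the final command-line argument.
    """
    # A list of arguments bypasses the shell, so fname needs no quoting.
    return subprocess.call(list(editor_argv) + [fname])
```

A visitmatch override could call open_in_editor(['vi'], fname) in place of the os.system line; the traversal would still pause until the editor exits.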

Let's find and edit some files. When run as a script, we pass this program the search string as a command argument (here, the string -exec is the search key, not an option flag). The root directory is always passed to the run method as ".", the current run directory. Traversal status messages show up in the console as before, but each matched file now automatically pops up in a text editor along the way. Here, the editor is started eight times:

    C:\...\PP3E>python PyTools\visitor_edit.py -exec
    1 => .\autoexec.bat
    2 => .\cleanall.csh
    3 => .\echoEnvironment.pyw
    4 => .\Launcher.py
    5 => .\Launcher.pyc
    Skipping .\Launcher.pyc
    ...more deleted...
    1340 => .\old_Part2\Basics\unpack2.py
    1341 => .\old_Part2\Basics\unpack2b.py
    1342 => .\old_Part2\Basics\unpack3.py
    1343 => .\old_Part2\Basics\__init__.py
    Edited 8 files, visited 1343

This, finally, is the exact tool I was looking for to simplify global book examples tree maintenance. After major changes to things such as shared modules and file and directory names, I run this script on the examples root directory with an appropriate search string and edit any files it pops up as needed. I still need to change files by hand in the editor, but that's often safer than blind global replacements.

7.5.2. Global Replacements in Directory Trees

But since I brought it up, given a general tree traversal class, it's easy to code a global search-and-replace subclass too. The SearchVisitor subclass in Example 7-13, ReplaceVisitor, customizes the visitmatch method to globally replace any appearances of one string with another, in all text files at and below a root directory. It also collects the names of all files that were changed in a list just in case you wish to go through and verify the automatic edits applied (a text editor could be automatically popped up on each changed file, for instance).

Example 7-13. PP3E\PyTools\visitor_replace.py

    ################################################################
    # Use: "python PyTools\visitor_replace.py fromStr toStr".
    # does global search-and-replace in all files in a directory
    # tree--replaces fromStr with toStr in all text files; this
    # is powerful but dangerous!! visitor_edit.py runs an editor
    # for you to verify and make changes, and so is much safer;
    # use CollectVisitor to simply collect a list of matched files;
    ################################################################

    import sys
    from visitor import SearchVisitor
    listonly = False

    class ReplaceVisitor(SearchVisitor):
        """
        change fromStr to toStr in files at and below startDir;
        files changed available in obj.changed list after a run
        """
        def __init__(self, fromStr, toStr, listonly=False):
            self.changed = []
            self.toStr   = toStr
            SearchVisitor.__init__(self, fromStr, listonly)
        def visitmatch(self, fname, text):
            fromStr, toStr = self.context, self.toStr
            text = text.replace(fromStr, toStr)
            open(fname, 'w').write(text)
            self.changed.append(fname)

    if __name__ == '__main__':
        if raw_input('Are you sure?') == 'y':
            visitor = ReplaceVisitor(sys.argv[1], sys.argv[2], listonly)
            visitor.run(startDir='.')
            print 'Visited %d files'  % visitor.fcount
            print 'Changed %d files:' % len(visitor.changed)
            for fname in visitor.changed: print fname

To run this script over a directory tree, go to the directory to be changed and run the following sort of command line with "from" and "to" strings. On my current machine, doing this on a 1,354-file tree and changing 75 files along the way takes roughly six seconds of real clock time when the system isn't particularly busy.

    C:\temp\examples>python %X%/PyTools/visitor_replace.py Part2 SPAM2
    Are you sure?y
    . ...
    1 => .\autoexec.bat
    2 => .\cleanall.csh
    3 => .\echoEnvironment.pyw
    4 => .\Launcher.py
    5 => .\Launcher.pyc
    Skipping .\Launcher.pyc
    6 => .\Launch_PyGadgets.py
    ...more deleted...
    1351 => .\PyTools\visitor_find_quiet2.py
    1352 => .\PyTools\visitor_find.pyc
    Skipping .\PyTools\visitor_find.pyc
    1353 => .\PyTools\visitor_find_quiet1.py
    1354 => .\PyTools\fixeoln_one.doc.txt
    Visited 1354 files
    Changed 75 files:
    .\Launcher.py
    .\LaunchBrowser.out.txt
    .\LaunchBrowser.py
    .\PyDemos.pyw
    .\PyGadgets.py
    .\README-PP3E.txt
    ...more deleted...
    .\PyTools\search_all.out.txt
    .\PyTools\visitor.out.txt
    .\PyTools\visitor_edit.py

    [to delete, use an empty toStr]
    C:\temp\examples>python %X%/PyTools/visitor_replace.py SPAM ""

This is both wildly powerful and dangerous. If the string to be replaced can show up in places you didn't anticipate, you might just ruin an entire tree of files by running the ReplaceVisitor object defined here. On the other hand, if the string is something very specific, this object can obviate the need to edit suspicious files by hand. For instance, we will use this approach to automatically change web site addresses in HTML files in Chapter 16; the addresses are likely too specific to show up in other places by chance.
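One way to take some of the danger out of a global replacement is to preview it first and keep a copy of anything that gets changed. The following Python 3 sketch is an assumption-laden illustration, not part of the book's code: the replace_in_file name, the dry_run flag, and the '.bak' backup extension are all hypothetical choices.

```python
import shutil

def replace_in_file(fname, fromStr, toStr, dry_run=True, backup_ext='.bak'):
    """
    Replace fromStr with toStr in one text file and return the number of
    occurrences found. With dry_run=True (the default) the file is only
    inspected; otherwise a backup copy is saved before the file is rewritten.
    """
    with open(fname, newline='') as f:          # newline='' keeps line ends as is
        text = f.read()
    count = text.count(fromStr)
    if count and not dry_run:
        shutil.copyfile(fname, fname + backup_ext)      # keep the original
        with open(fname, 'w', newline='') as f:
            f.write(text.replace(fromStr, toStr))
    return count
```

A visitmatch method could call this once with dry_run=True to report what would change, and again with dry_run=False after the report has been reviewed.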

7.5.3. Collecting Matched Files in Trees

The scripts so far search and replace in directory trees, using the same traversal code base (the visitor module). Suppose, though, that you just want to get a Python list of files in a directory containing a string. You could run a search and parse the output messages for "found" messages. It is much simpler, though, to knock off another SearchVisitor subclass that collects the list along the way, as in Example 7-14.

Example 7-14. PP3E\PyTools\visitor_collect.py

    #################################################################
    # Use: "python PyTools\visitor_collect.py searchstring".
    # CollectVisitor simply collects a list of matched files, for
    # display or later processing (e.g., replacement, auto-editing);
    #################################################################

    import sys
    from visitor import SearchVisitor

    class CollectVisitor(SearchVisitor):
        """
        collect names of files containing a string;
        run this and then fetch its obj.matches list
        """
        def __init__(self, searchstr, listonly=False):
            self.matches = []
            SearchVisitor.__init__(self, searchstr, listonly)
        def visitmatch(self, fname, text):
            self.matches.append(fname)

    if __name__ == '__main__':
        visitor = CollectVisitor(sys.argv[1])
        visitor.run(startDir='.')
        print 'Found these files:'
        for fname in visitor.matches: print fname

CollectVisitor is just a tree search again, with a new kind of specialization: collecting files instead of printing messages. This class is useful from other scripts that mean to collect a matched files list for later processing; it can be run by itself as a script too:

    C:\...\PP3E>python PyTools\visitor_collect.py -exec
    ...
    ...more deleted...
    ...
    1342 => .\old_Part2\Basics\unpack2b.py
    1343 => .\old_Part2\Basics\unpack3.py
    1344 => .\old_Part2\Basics\__init__.py
    Found these files:
    .\package.csh
    .\README-PP3E.txt
    .\readme-old-pp1E.txt
    .\PyTools\cleanpyc.py
    .\PyTools\fixeoln_all.py
    .\System\Processes\output.txt
    .\Internet\Cgi-Web\fixcgi.py

7.5.3.1. Suppressing status messages

Here, the items in the collected list are displayed at the end: all the files containing the string -exec. Notice, though, that traversal status messages are still printed along the way (in fact, I deleted about 1,600 lines of such messages here!). In a tool meant to be called from another script, that may be an undesirable side effect; the calling script's output may be more important than the traversal's.

We could add mode flags to SearchVisitor to turn off status messages, but that makes it more complex. Instead, the following two files show how we might go about collecting matched filenames without letting any traversal messages show up in the console, all without changing the original code base. The first, shown in Example 7-15, simply takes over and copies the search logic, without print statements. It's a bit redundant with SearchVisitor, but only in a few lines of mimicked code.

Example 7-15. PP3E\PyTools\visitor_collect_quiet1.py

    ##############################################################
    # Like visitor_collect, but avoid traversal status messages
    ##############################################################

    import os, sys
    from visitor import FileVisitor, SearchVisitor

    class CollectVisitor(FileVisitor):
        """
        collect names of files containing a string, silently;
        """
        skipexts = SearchVisitor.skipexts
        def __init__(self, searchStr):
            self.matches = []
            self.context = searchStr
        def visitdir(self, dname): pass
        def visitfile(self, fname):
            if (os.path.splitext(fname)[1] not in self.skipexts and
                open(fname).read().find(self.context) != -1):
                self.matches.append(fname)

    if __name__ == '__main__':
        visitor = CollectVisitor(sys.argv[1])
        visitor.run(startDir='.')
        print 'Found these files:'
        for fname in visitor.matches: print fname

When this class is run, only the contents of the matched filenames list show up at the end; no status messages appear during the traversal. Because of that, this form may be more useful as a general-purpose tool used by other scripts:

    C:\...\PP3E>python PyTools\visitor_collect_quiet1.py -exec
    Found these files:
    .\package.csh
    .\README-PP3E.txt
    .\readme-old-pp1E.txt
    .\PyTools\cleanpyc.py
    .\PyTools\fixeoln_all.py
    .\System\Processes\output.txt
    .\Internet\Cgi-Web\fixcgi.py
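When the caller wants nothing but the list, the same silent collection can also be packaged as a plain function. The Python 3 sketch below is only an illustration of that idea: collect_matches is a hypothetical name, os.walk stands in for os.path.walk, and errors='replace' is an added assumption so that undecodable bytes in a file do not abort the search.

```python
import os

SKIPEXTS = ['.gif', '.exe', '.pyc', '.o', '.a']     # same list as SearchVisitor

def collect_matches(startDir, key, skipexts=SKIPEXTS):
    """Return paths of files at and below startDir whose text contains key."""
    matches = []
    for dirpath, subdirs, files in os.walk(startDir):
        for fname in files:
            path = os.path.join(dirpath, fname)
            if os.path.splitext(path)[1] in skipexts:
                continue                            # skip binary files, silently
            with open(path, errors='replace') as f:
                if key in f.read():
                    matches.append(path)
    return matches
```

A client script would then write something like `for fname in collect_matches('.', '-exec'): ...` and see no traversal messages at all.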

A more interesting and less redundant way to suppress printed text during a traversal is to apply the stream redirection tricks we met in Chapter 3. Example 7-16 sets sys.stdout to a NullOut object that throws away all printed text for the duration of the traversal (its write method does nothing). We could also use the StringIO module we met in Chapter 3 for this purpose, but it's overkill here; we don't need to retain printed text.

The only real complication with this scheme is that there is no good place to insert a restoration of sys.stdout at the end of the traversal; instead, we code the restore in the __del__ destructor method and require clients to delete the visitor to resume printing as usual. An explicitly called method would work just as well, if you prefer less magical interfaces.

Example 7-16. PP3E\PyTools\visitor_collect_quiet2.py

    ##############################################################
    # Like visitor_collect, but avoid traversal status messages
    ##############################################################

    import sys
    from visitor import SearchVisitor

    class NullOut:
        def write(self, line): pass

    class CollectVisitor(SearchVisitor):
        """
        collect names of files containing a string, silently
        """
        def __init__(self, searchstr, listonly=False):
            self.matches = []
            self.saveout, sys.stdout = sys.stdout, NullOut()
            SearchVisitor.__init__(self, searchstr, listonly)
        def __del__(self):
            sys.stdout = self.saveout
        def visitmatch(self, fname, text):
            self.matches.append(fname)

    if __name__ == '__main__':
        visitor = CollectVisitor(sys.argv[1])
        visitor.run(startDir='.')
        matches = visitor.matches
        del visitor
        print 'Found these files:'
        for fname in matches: print fname

When this script is run, output is identical to the prior run: just the matched filenames at the end. Perhaps better still, why not code and debug just one verbose CollectVisitor utility class, and require clients to wrap calls to its run method in the redirect.redirect function we wrote in Example 3-10?

    >>> from PP3E.PyTools.visitor_collect import CollectVisitor
    >>> from PP3E.System.Streams.redirect import redirect
    >>> walker = CollectVisitor('-exec')                   # object to find '-exec'
    >>> output = redirect(walker.run, ('.',), '')          # function, args, input
    >>> for line in walker.matches: print line             # print items in list
    ...
    .\package.csh
    .\README-PP3E.txt
    .\readme-old-pp1E.txt
    .\PyTools\cleanpyc.py
    .\PyTools\fixeoln_all.py
    .\System\Processes\output.txt
    .\Internet\Cgi-Web\fixcgi.py

The redirect call employed here resets standard input and output streams to file-like objects for the duration of any function call; because of that, it's a more general way to suppress output than recoding every outputter. Here, it has the effect of intercepting (and hence suppressing) printed messages during a walker.run('.') traversal. They really are printed, but show up in the string result of the redirect call, not on the screen:

    >>> output[:60]
    '. ...\n1 => .\\autoexec.bat\n2 => .\\cleanall.csh\n3 => .\\echoEnv'
    >>> len(output), len(output.split('\n'))           # bytes, lines
    (67609, 1592)
    >>> walker.matches
    ['.\\package.csh', '.\\README-PP3E.txt', '.\\readme-old-pp1E.txt',
    '.\\PyTools\\cleanpyc.py', '.\\PyTools\\fixeoln_all.py',
    '.\\System\\Processes\\output.txt', '.\\Internet\\Cgi-Web\\fixcgi.py']

Because redirect saves printed text in a string, it may be less appropriate than the two quiet CollectVisitor variants for functions that generate much output. Here, for example, 67,609 bytes of output were queued up in an in-memory string (see the len call results); such a buffer may or may not be significant in most applications.
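For comparison, capturing printed text in a string takes only a few lines in Python 3 with the standard contextlib.redirect_stdout tool and io.StringIO. The sketch below is not the redirect.redirect function from Example 3-10; capture_output is a hypothetical helper that handles output only (no input stream) and returns the called function's result along with the captured text.

```python
import contextlib
import io

def capture_output(func, *args):
    """Call func(*args); return (result, text printed to stdout meanwhile)."""
    buffer = io.StringIO()                          # in-memory text buffer
    with contextlib.redirect_stdout(buffer):        # restored even on exceptions
        result = func(*args)
    return result, buffer.getvalue()
```

As with redirect, everything printed during the call is held in memory, so the same caution about functions that generate much output applies.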

In more general terms, redirecting sys.stdout to dummy objects as done here is a simple way to turn off outputs (and is the equivalent to the Unix notion of redirecting output to the file /dev/null, a file that discards everything sent to it). For instance, we'll pull this trick out of the bag again in the context of server-side Internet scripting, to prevent utility status messages from showing up in generated web page output streams.[*]

[*] For the impatient: see commonhtml.runsilent in the PyMailCGI system presented in Chapter 17. It's a variation on redirect.redirect that discards output as it is printed (instead of retaining it in a string), returns the return value of the function called (not the output string), and lets exceptions pass via a try/finally statement (instead of catching and reporting them with a try/except). It's still redirection at work, though.
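The three properties listed in that footnote (discard output as printed, return the function's own result, let exceptions pass through a try/finally) can be sketched in a few lines of Python 3. This is not the commonhtml.runsilent code itself, which appears later in the book; run_silently and its NullOut helper are hypothetical stand-ins written only to show the shape of the technique.

```python
import sys

class NullOut:
    """A write-only stream that throws away everything sent to it."""
    def write(self, text): pass
    def flush(self): pass

def run_silently(func, *args, **kwargs):
    """Call func with stdout discarded; return its result, pass exceptions."""
    saved = sys.stdout
    sys.stdout = NullOut()
    try:
        return func(*args, **kwargs)        # the function's own return value
    finally:
        sys.stdout = saved                  # restored on return or exception
```

Because the restore sits in a finally clause, there is no need for a __del__ method or an explicit cleanup call by the client.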

7.5.4. Recoding Fixers with Visitors

Be warned: once you've written and debugged a class that knows how to do something useful like walking directory trees, it's easy for it to spread throughout your system utility libraries. Of course, that's the whole point of code reuse. For instance, very soon after writing the visitor classes presented in the prior sections, I recoded both the fixnames_all.py and the fixeoln_all.py directory walker scripts listed earlier in Examples 7-6 and 7-4, respectively, to use visitor rather than proprietary tree-walk logic (they both originally used find.find). Example 7-17 combines the original convertEndlines function (to fix end-of-lines in a single file) with visitor's tree walker class, to yield an alternative implementation of the line-end converter for directory trees.

Example 7-17. PP3E\PyTools\visitor_fixeoln.py

    ##############################################################
    # Use: "python visitor_fixeoln.py todos|tounix".
    # recode fixeoln_all.py as a visitor subclass: this version
    # uses os.path.walk, not find.find to collect all names first;
    # limited but fast: if os.path.splitext(fname)[1] in patts:
    ##############################################################

    import visitor, sys, fnmatch, os
    from fixeoln_dir import patts
    from fixeoln_one import convertEndlines

    class EolnFixer(visitor.FileVisitor):
        def visitfile(self, fullname):                        # match on basename
            basename = os.path.basename(fullname)             # to make result same
            for patt in patts:                                # else visits fewer
                if fnmatch.fnmatch(basename, patt):
                    convertEndlines(self.context, fullname)
                    self.fcount += 1                          # could break here
                                                              # but results differ
    if __name__ == '__main__':
        walker = EolnFixer(sys.argv[1])
        walker.run()
        print 'Files matched (converted or not):', walker.fcount

As we saw in Chapter 4, the built-in fnmatch module performs Unix shell-like filename matching; this script uses it to match names to the previous version's filename patterns (simply looking for filename extensions after a "." is simpler, but not as general):

    C:\temp\examples>python %X%/PyTools/visitor_fixeoln.py tounix
    . ...
    Changing .\echoEnvironment.pyw
    Changing .\Launcher.py
    Changing .\Launch_PyGadgets.py
    Changing .\Launch_PyDemos.pyw
    ...more deleted...
    Changing .\PyTools\visitor_find.py
    Changing .\PyTools\visitor_fixnames.py
    Changing .\PyTools\visitor_find_quiet2.py
    Changing .\PyTools\visitor_find_quiet1.py
    Changing .\PyTools\fixeoln_one.doc.txt
    Files matched (converted or not): 1065

    C:\temp\examples>python %X%/PyTools/visitor_fixeoln.py tounix
    ...more deleted...
    .\Extend\Swig\Shadow ...
    .\ ...
    .\EmbExt\Exports ...
    .\EmbExt\Exports\ClassAndMod ...
    .\EmbExt\Regist ...
    .\PyTools ...
    Files matched (converted or not): 1065

If you run this script and the original fixeoln_all.py on the book examples tree, you'll notice that this version visits two fewer matched files. This simply reflects the fact that fixeoln_all also collects and skips over two directory names for its patterns in the find.find result (both called "Output"). In all other ways, this version works the same way even when it could do better; adding a break statement after the convertEndlines call here avoids visiting files that appear redundantly in the original's find results lists.
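The effect of the missing break is easy to see with a small experiment. In the Python 3 sketch below, the pattern list is hypothetical (the real one is imported from fixeoln_dir and is not shown in this section), and count_pattern_hits is an illustrative helper: a file whose basename matches more than one pattern is counted once per pattern unless the loop stops at the first match.

```python
import fnmatch
import os

def count_pattern_hits(fullname, patts, stop_at_first=False):
    """Count how many patterns in patts match the basename of fullname."""
    basename = os.path.basename(fullname)       # match on name, not full path
    hits = 0
    for patt in patts:
        if fnmatch.fnmatch(basename, patt):
            hits += 1
            if stop_at_first:                   # the optional break discussed above
                break
    return hits

patts = ['*.py', '*.txt', 'read*']              # hypothetical pattern list
```

With these patterns, a file named readme.txt matches both '*.txt' and 'read*', so it is counted twice without the break and once with it.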

The second command here takes roughly two-thirds as long as the first to finish on my computer (there are no files to be converted). That's roughly 33 percent faster than the original find.find-based version of this script, but they differ in the amount of output, and benchmarks are usually much subtler than you imagine. Most of the real clock time is likely spent scrolling text in the console, not doing any real directory processing. Since both are plenty fast for their intended purposes, finer-grained performance figures are left as exercises.

The script in Example 7-18 combines the original convertOne function (to rename a single file or directory) with the visitor's tree walker class, to create a directory tree-wide fix for uppercase filenames. Notice that we redefine both file and directory visitation methods here, as we need to rename both.

Example 7-18. PP3E\PyTools\visitor_fixnames.py

    ###############################################################
    # recode fixnames_all.py name case fixer with the Visitor class
    # note: "from fixnames_all import convertOne" doesn't help at
    # top level of the fixnames class, since it is assumed to be a
    # method and called with extra self argument (an exception);
    ###############################################################

    from visitor import FileVisitor

    class FixnamesVisitor(FileVisitor):
        """
        check filenames at and below startDir for uppercase
        """
        import fixnames_all
        def __init__(self, listonly=False):
            FileVisitor.__init__(self, listonly=listonly)
            self.ccount = 0
        def rename(self, pathname):
            if not self.listonly:
                convertflag = self.fixnames_all.convertOne(pathname)
                self.ccount += convertflag
        def visitdir(self, dirname):
            FileVisitor.visitdir(self, dirname)
            self.rename(dirname)
        def visitfile(self, filename):
            FileVisitor.visitfile(self, filename)
            self.rename(filename)

    if __name__ == '__main__':
        walker = FixnamesVisitor()
        walker.run()
        allnames = walker.fcount + walker.dcount
        print 'Converted %d files, visited %d' % (walker.ccount, allnames)

This version is run like the original find.find-based version, fixnames_all, but visits one more name (the top-level root directory), and there is no initial delay while filenames are collected on a list, because we're using os.path.walk again, not find.find. It's also close to the original os.path.walk version of this script but is based on a class hierarchy, not direct function callbacks:

    C:\temp\examples>python %X%/PyTools/visitor_fixnames.py
    ...more deleted...
    303 => .\__init__.py
    304 => .\__init__.pyc
    305 => .\Ai\ExpertSystem\holmes.tar
    306 => .\Ai\ExpertSystem\TODO
    Convert dir=.\Ai\ExpertSystem file=TODO? (y|Y)
    307 => .\Ai\ExpertSystem\__init__.py
    308 => .\Ai\ExpertSystem\holmes\cnv
    309 => .\Ai\ExpertSystem\holmes\README.1ST
    Convert dir=.\Ai\ExpertSystem\holmes file=README.1ST? (y|Y)
    ...more deleted...
    1353 => .\PyTools\visitor_find.pyc
    1354 => .\PyTools\visitor_find_quiet1.py
    1355 => .\PyTools\fixeoln_one.doc.txt
    Converted 1 files, visited 1474

Both of these fixer scripts work roughly the same way as the originals, but because the directory-walking logic lives in just one file (visitor.py), it needs to be debugged only once. Moreover, improvements in that file will automatically be inherited by every directory-processing tool derived from its classes. Even when coding system-level scripts, reuse and reduced redundancy pay off in the end.
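For readers on Python 3, where os.path.walk no longer exists, the same tree-wide rename can be sketched with os.walk. This is a minimal, non-interactive sketch and not the book's visitor or convertOne code: the function name lower_uppercase_names and the rule "rename only names whose letters are all uppercase" are assumptions made for illustration. It walks bottom-up so that a directory is renamed only after everything inside it has been handled.

```python
import os

def lower_uppercase_names(root, listonly=False):
    """Rename all-uppercase file and directory names below root to lowercase.

    Hypothetical helper for illustration; returns (visited, converted) counts.
    """
    visited = converted = 0
    # Walk bottom-up so a directory is renamed only after its contents
    # have already been processed.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            visited += 1
            if not name.isupper():          # mixed or lowercase: leave alone
                continue
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, name.lower())
            if os.path.exists(new) and not os.path.samefile(old, new):
                continue                    # a different file already has that name
            if not listonly:
                os.rename(old, new)
                converted += 1
    return visited, converted
```

Calling lower_uppercase_names('.', listonly=True) only counts names, much like the listonly flag used by the visitor classes in this section.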

7.5.5. Fixing File Permissions in Trees

Just in case the preceding visitor-client sections weren't quite enough to convince you of the power of code reuse, another piece of evidence surfaced very late in this book project. It turns out that copying files off a CD using Windows drag-and-drop sometimes makes them read-only in the copy. That's less than ideal for the book examples distribution if it is obtained on CD; you must copy the directory tree onto your hard drive to be able to experiment with program changes (naturally, files on CD can't be changed in place). But if you copy with drag-and-drop, you may wind up with a tree of more than 1,000 read-only files.

The book CD use cases described for this and some other examples in this chapter are something of historic artifacts today. As mentioned in the Preface, as of this third edition, the book's examples are made available on the Web instead of on an enclosed CD.

The Web is more pervasive today and allows for much more dynamic updates. However, even though the book CD is a vestige of the past, the examples which were originally coded to manage it still apply to other types of CDs and so are generally useful tools.


Since drag-and-drop is perhaps the most common way to copy off a CD on Windows, I needed a portable and easy-to-use way to undo the read-only setting. Asking readers to make all of these writable by hand would be impolite, to say the least. Writing a full-blown install system seemed like overkill. Providing different fixes for different platforms doubles or triples the complexity of the task.

Much better, the Python script in Example 7-19 can be run in the root of the copied examples directory to repair the damage of a read-only drag-and-drop operation. It specializes the traversal implemented by the FileVisitor class again, this time to run an os.chmod call on every file and directory visited along the way.

Example 7-19. PP3E\PyTools\fixreadonly-all.py

    #!/usr/bin/env python
    ###########################################################################
    # Use: python PyTools\fixreadonly-all.py
    # run this script in the top-level examples directory after copying all
    # examples off the book's CD-ROM, to make all files writable again--by
    # default, copying files off the CD with Windows drag-and-drop (at least)
    # may create them as read-only on your hard drive; this script traverses
    # entire directory tree at and below the dir it is run in (all subdirs);
    ###########################################################################

    import os
    from PP3E.PyTools.visitor import FileVisitor          # os.path.walk wrapper
    listonly = False

    class FixReadOnly(FileVisitor):
        def __init__(self, listonly=0):
            FileVisitor.__init__(self, listonly=listonly)
        def visitdir(self, dname):                        # lowercase name, as in FileVisitor
            FileVisitor.visitdir(self, dname)             # count and report the directory
            if self.listonly:
                return
            os.chmod(dname, 0777)
        def visitfile(self, fname):
            FileVisitor.visitfile(self, fname)            # count and report the file
            if self.listonly:
                return
            os.chmod(fname, 0777)

    if __name__ == '__main__':
        # don't run auto if clicked
        go = raw_input('This script makes all files writeable; continue?')
        if go != 'y':
            raw_input('Canceled - hit enter key')
        else:
            walker = FixReadOnly(listonly)
            walker.run()
            print 'Visited %d files and %d dirs' % (walker.fcount, walker.dcount)

As we saw in Chapter 3, the built-in os.chmod call changes the permission settings on an external file (here, to 0777: global read, write, and execute permissions). Because os.chmod and the FileVisitor's operations are portable, this same script will work to set permissions in an entire tree on both Windows and Unix-like platforms. Notice that it asks whether you really want to proceed when it first starts up, just in case someone accidentally clicks the file's name in an explorer GUI. Also note that Python must be installed before this script can be run in order to make files writable; that seems a fair assumption to make about users who are about to change Python scripts.

    C:\temp\examples>python PyTools\fixreadonly-all.py
    This script makes all files writeable; continue?y
    . ...
    1 => .\autoexec.bat
    2 => .\cleanall.csh
    3 => .\echoEnvironment.pyw
    ...more deleted...
    1352 => .\PyTools\visitor_find.pyc
    1353 => .\PyTools\visitor_find_quiet1.py
    1354 => .\PyTools\fixeoln_one.doc.txt
    Visited 1354 files and 119 dirs
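The same repair can be sketched for Python 3 without the visitor module. The version below is an illustration under stated assumptions, not the book's script: make_tree_writable is a hypothetical name, it uses os.walk rather than os.path.walk, and it adds the owner-write bit to each existing mode instead of forcing 0777, so other permission bits are left as they were.

```python
import os
import stat

def make_tree_writable(root):
    """Add the owner-write bit to every file and directory at and below root.

    Hypothetical helper for illustration; returns (files, dirs) counts.
    """
    fcount = dcount = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # dirpath covers root itself and then each subdirectory in turn
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            mode = os.stat(path).st_mode
            os.chmod(path, mode | stat.S_IWUSR)   # keep other bits unchanged
        dcount += 1
        fcount += len(filenames)
    return fcount, dcount
```

On Windows, os.chmod affects only the read-only flag, which is exactly the setting that drag-and-drop copies leave behind. The sketch does not handle broken symbolic links, for which os.stat raises an error.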

7.5.6. Changing Unix Executable Path Lines

Finally, the following script does something a bit more specialized: it uses the visitor classes to replace the "#!" lines at the top of all scripts in a directory tree (this line gives the path to the Python interpreter on Unix-like machines). It's easy to do this with the visitor_replace script of Example 7-13 that we coded earlier. For example, run something like this to replace all #!/usr/bin/python lines with #!\Python24\python:

    C:\...\PP3E>python PyTools\visitor_replace.py
                        #!/usr/bin/python #!\Python24\python

Lots of status messages scroll by unless redirected to a file. visitor_replace does a simple global search-and-replace operation on all nonbinary files in an entire directory tree. It's also a bit naïve: it won't change other "#!" line patterns that mention python (e.g., you'll have to run it again to change #!/usr/local/bin/python), and it might change occurrences besides those on a first line. That probably won't matter, but if it does, it's easy to write your own visitor subclass to be more accurate.

When run, the script in Example 7-20 converts all "#!" lines in all script files in an entire tree. It changes every first line that starts with "#!" and names "python" to a line you pass in on the command line or assign in the script, like this:

    C:\...\PP3E>python PyTools\visitor_poundbang.py #!\MyPython24\python
    Are you sure?y
    . ...
    1 => .\__init__.py
    2 => .\PyDemos2.pyw
    3 => .\towriteable.py
    ...
    1474 => .\Integrate\Mixed\Exports\ClassAndMod\output.prog1
    1475 => .\Integrate\Mixed\Exports\ClassAndMod\setup-class.csh
    Visited 1475 files and 133 dirs, changed 190 files
    .\towriteable.py
    .\Launch_PyGadgets.py
    .\Launch_PyDemos.pyw
    ...
    C:\...\PP3E>type .\Launch_PyGadgets.py
    #!\MyPython24\python
    ###############################################
    # PyGadgets + environment search/config first
    ...

This script caught and changed 190 files (more than visitor_replace), so there must be other "#!" line patterns lurking in the examples tree besides #!/usr/bin/python.

Example 7-20. PP3E\PyTools\visitor_poundbang.py

    ##########################################################################
    # change all "#!...python" source lines at the top of scripts to either
    # commandline arg or changeToDefault, in all files in all dirs at and
    # below the dir where run; could skip binary filename extensions too,
    # but works ok; this version changes all #! first lines that name python,
    # and so is more accurate than a simple visitor_replace.py run;
    ##########################################################################

    """
    Run me like this, to convert all scripts in the book examples tree,
    and redirect/save messages to a file:
    C:\...\PP3E>python PyTools\visitor_poundbang.py
                               #!\MyPython24\python > out.txt
    """

    import sys
    from PP3E.PyTools.visitor import FileVisitor    # reuse the walker classes
    changeToDefault = '#!\Python24\python'          # used if no cmdline arg

    class PoundBangFixer(FileVisitor):
        def __init__(self, changeTo=changeToDefault):
            FileVisitor.__init__(self)
            self.changeTo = changeTo
            self.clist    = []
        def visitfile(self, fullname):
            FileVisitor.visitfile(self, fullname)
            try:
                lines = open(fullname, 'r').readlines()
                if (len(lines) > 0        and
                    lines[0][0:2] == '#!' and                # or lines[0].startswith()
                    'python' in lines[0]                     # or lines[0].find() != -1
                    ):
                    lines[0] = self.changeTo + '\n'
                    open(fullname, 'w').writelines(lines)
                    self.clist.append(fullname)
            except:
                print 'Error translating %s -- skipped' % fullname
                print '...', sys.exc_info()

    if __name__ == '__main__':
        if raw_input('Are you sure?') != 'y': sys.exit()
        if len(sys.argv) == 2: changeToDefault = sys.argv[1]
        walker = PoundBangFixer(changeToDefault)
        walker.run()
        print 'Visited %d files and %d dirs,' % (walker.fcount, walker.dcount),
        print 'changed %d files' % len(walker.clist)
        for fname in walker.clist: print fname
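The first-line test at the heart of this script can also be sketched on its own in Python 3. The helper names fix_poundbang and fix_tree below are hypothetical, and the sketch departs from Example 7-20 in two labeled ways: it walks with os.walk instead of the visitor classes, and it reads files in binary mode so that non-text files are simply left alone rather than raising decoding errors.

```python
import os

def fix_poundbang(path, change_to):
    """Rewrite the first line of path if it is a '#!' line that names python.

    Returns True if the file was changed. Hypothetical helper for illustration.
    """
    try:
        with open(path, 'rb') as f:
            first = f.readline()
            rest = f.read()
    except OSError:
        return False                        # unreadable file: skip it
    if not (first.startswith(b'#!') and b'python' in first):
        return False
    # keep the file's existing line ending
    ending = b'\r\n' if first.endswith(b'\r\n') else b'\n'
    with open(path, 'wb') as f:
        f.write(change_to.encode('utf-8') + ending + rest)
    return True

def fix_tree(root, change_to):
    """Apply fix_poundbang to every file at and below root; return changed paths."""
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if fix_poundbang(path, change_to):
                changed.append(path)
    return changed
```

As in the book's version, only the first line is ever touched, so occurrences of "#!" elsewhere in a file are not affected.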

7.5.7. Summary: Counting Source Lines Four Ways

We've seen a few techniques for scanning directory trees in this book so far. To summarize and contrast, this section briefly lists four scripts that count the number of lines in all program source files in an entire tree. Each script uses a different directory traversal scheme, but returns the same result.

I counted 41,938 source lines of code (SLOC) in the book examples distribution with these scripts, as of November 2001 (for the second edition of this book). Study these scripts' code for more details. They don't count everything (e.g., they skip makefiles), but are comprehensive enough for ballpark figures. Here's the output for the visitor class version when run on the root of the book examples tree; the root of the tree to walk is passed in as a command-line argument, and the last output line is a dictionary that keeps counts for the specific file-type extensions in the tree:

    C:\temp>python wcall_visitor.py %X%
    ...lines deleted...
    C:\PP2ndEd\examples\PP3E\Integrate\Mixed\Exports\ClassAndMod\cinterface.py
    C:\PP2ndEd\examples\PP3E\Integrate\Mixed\Exports\ClassAndMod\main-table.c
    Visited 1478 files and 133 dirs
    --------------------------------------------------------------------------------
    Files=> 903 Lines=> 41938
    {'.c': 46, '.cgi': 24, '.html': 41, '.pyw': 11, '.cxx': 2, '.py': 768,
     '.i': 3,  '.h': 8}

The first version, listed in Example 7-21, counts lines using the standard library's os.path.walk call, which we met in Chapter 4 (using os.walk would be similar, but we would replace the callback function with a for loop, and subdirectories and files would be segregated into two lists of names).

Example 7-21. PP3E\PyTools\wcall.py

    ##################################################################
    # count lines in all source files in tree; os.path.walk version
    ##################################################################

    import os, sys

    allLines = allFiles = 0
    allExts  = ['.py', '.pyw', '.cgi', '.html', '.c', '.cxx', '.h', '.i']
    allSums = dict.fromkeys(allExts, 0)

    def sum(dir, file, ext):
        global allFiles, allLines
        print file
        fname = os.path.join(dir, file)
        lines = open(fname).readlines()
        allFiles += 1                                     # or all = all + 1
        allLines += len(lines)
        allSums[ext] += 1

    def wc(ignore, dir, fileshere):
        for file in fileshere:
            for ext in allExts:
                if file.endswith(ext):                    # or f[-len(e):] == e
                    sum(dir, file, ext)
                    break

    if __name__ == '__main__':
        os.path.walk(sys.argv[1], wc, None)               # cmd arg=root dir
        print '-'*80
        print 'Files=>', allFiles, 'Lines=>', allLines
        print allSums
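The os.walk alternative mentioned before this listing might look like the following Python 3 sketch. It is not one of the book's four scripts: count_lines and SOURCE_EXTS are names chosen here for illustration, and the function returns its totals instead of printing them. The callback becomes a for loop, and because os.walk reports subdirectories separately in dirnames, only real files are ever tested against the extension list.

```python
import os

SOURCE_EXTS = ('.py', '.pyw', '.cgi', '.html', '.c', '.cxx', '.h', '.i')

def count_lines(root, exts=SOURCE_EXTS):
    """Return (files, lines, per-extension file counts) for source files below root."""
    all_files = all_lines = 0
    all_sums = dict.fromkeys(exts, 0)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:                 # subdirectories are in dirnames
            for ext in exts:
                if name.endswith(ext):
                    # binary mode: count newline-delimited lines without decoding
                    with open(os.path.join(dirpath, name), 'rb') as f:
                        all_lines += sum(1 for _ in f)
                    all_files += 1
                    all_sums[ext] += 1
                    break
    return all_files, all_lines, all_sums
```

A call such as count_lines(sys.argv[1]) would supply the same three values that the script above prints at the end of its run.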

Counting with the find module we wrote at the end of Chapter 4, as in Example 7-22, is noticeably simpler, though we must wait for the list of files to be collected.

Example 7-22. PP3E\PyTools\wcall_find.py

    ###################################################################
    # count lines in all source files in tree; find file list version
    ###################################################################

    import sys
    from wcall import allExts
    from PP3E.PyTools.find import find

    allLines = allFiles = 0
    allSums  = dict.fromkeys(allExts, 0)

    def sum(fname, ext):
        global allFiles, allLines
        print fname
        lines = open(fname).readlines()
        allFiles += 1
        allLines += len(lines)
        allSums[ext] += 1

    for file in find('*', sys.argv[1]):
        for ext in allExts:
            if file.endswith(ext):
                sum(file, ext)
                break

    print '-'*80
    print 'Files=>', allFiles, 'Lines=>', allLines
    print allSums

The prior script collected all source files in the tree with find and manually checked their extensions; the next script (Example 7-23) uses the pattern-matching capability in find to collect only source files in the result list.

Example 7-23. PP3E\PyTools\wcall_find_patt.py

    ##################################################################
    # count lines in all source files in tree; find patterns version
    ##################################################################

    import sys
    from wcall import allExts
    from PP3E.PyTools.find import find

    allLines = allFiles = 0
    allSums  = dict.fromkeys(allExts, 0)

    def sum(fname, ext):
        global allFiles, allLines
        print fname
        lines = open(fname).readlines()
        allFiles += 1
        allLines += len(lines)
        allSums[ext] += 1

    for ext in allExts:
        files = find('*' + ext, sys.argv[1])
        for file in files:
            sum(file, ext)

    print '-'*80
    print 'Files=>', allFiles, 'Lines=>', allLines
    print allSums

And finally, Example 7-24 is the SLOC counting logic refactored to use the visitor class framework we wrote in this chapter; OOP adds a bit more code here, but this version is more accurate (if a directory name happens to have a source-like extension, the prior versions will incorrectly tally it). More importantly, by using OOP:

  • We get the superclass's walking logic for free, including a directory counter.

  • We have a self-contained package of names that supports multiple independent instances and can be used more easily in other contexts.

  • We can further customize this operation because it is a class.

  • We will automatically inherit any changes made to visitor in the future.

Even in the systems tools domains, strategic thinking can pay off eventually.

Example 7-24. PP3E\PyTools\wcall_visitor.py

    ##################################################################
    # count lines in all source files in tree; visitor class version
    ##################################################################

    import sys
    from wcall import allExts
    from PP3E.PyTools.visitor import FileVisitor

    class WcAll(FileVisitor):
        def __init__(self):
            FileVisitor.__init__(self)
            self.allLines = self.allFiles = 0
            self.allSums  = dict.fromkeys(allExts, 0)
        def sum(self, fname, ext):
            print fname
            lines = open(fname).readlines()
            self.allFiles += 1
            self.allLines += len(lines)
            self.allSums[ext] += 1
        def visitfile(self, filepath):
            self.fcount += 1
            for ext in allExts:
                if filepath.endswith(ext):
                    self.sum(filepath, ext)
                    break

    if __name__ == '__main__':
        walker = WcAll()
        walker.run(sys.argv[1])
        print 'Visited %d files and %d dirs' % (walker.fcount, walker.dcount)
        print '-'*80
        print 'Files=>', walker.allFiles, 'Lines=>', walker.allLines
        print walker.allSums
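To make the benefits listed above concrete, here is a self-contained Python 3 sketch of the same idea. The visitor module of Example 7-11 is not reproduced in this section, so MiniVisitor below is a hypothetical stand-in that supplies only the pieces used here (run, visitdir, visitfile, fcount, and dcount) on top of os.walk. PyOnlyWc is likewise an invented subclass, and moving the extension list into a class attribute is a choice of this sketch, made so that a subclass can customize the counter by overriding a single name.

```python
import os

class MiniVisitor:
    """Hypothetical stand-in for FileVisitor: walks a tree and counts names."""
    def __init__(self):
        self.fcount = self.dcount = 0
    def run(self, startdir=os.curdir):
        for dirpath, dirnames, filenames in os.walk(startdir):
            self.visitdir(dirpath)
            for name in filenames:
                self.visitfile(os.path.join(dirpath, name))
    def visitdir(self, dirpath):
        self.dcount += 1
    def visitfile(self, filepath):
        self.fcount += 1

class WcAll(MiniVisitor):
    """Count lines in files whose names end in one of self.exts."""
    exts = ('.py', '.pyw', '.cgi', '.html', '.c', '.cxx', '.h', '.i')
    def __init__(self):
        MiniVisitor.__init__(self)
        self.allLines = self.allFiles = 0
        self.allSums = dict.fromkeys(self.exts, 0)
    def visitfile(self, filepath):
        MiniVisitor.visitfile(self, filepath)      # superclass keeps fcount
        for ext in self.exts:
            if filepath.endswith(ext):
                with open(filepath, 'rb') as f:
                    self.allLines += sum(1 for _ in f)
                self.allFiles += 1
                self.allSums[ext] += 1
                break

class PyOnlyWc(WcAll):
    """Customization by subclassing: tally Python source files only."""
    exts = ('.py', '.pyw')
```

Because all state lives on the instance, a WcAll and a PyOnlyWc can be run over the same tree without interfering with each other, which the global counters of the function-based versions do not allow.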




Programming Python
ISBN: 0596009259
EAN: 2147483647
Year: 2004
Pages: 270
Authors: Mark Lutz
