7.3. Fixing DOS Filenames

The heart of the prior script was findFiles, a function that knows how to portably collect matching file and directory names in an entire tree, given a list of filename patterns. It doesn't do much more than the built-in find.find call, but it can be augmented for our own purposes. Because this logic was bundled up in a function, though, it automatically becomes a reusable tool.
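To make the idea concrete for readers on Python 3, here is a minimal sketch of a findFiles-style collector. It assumes os.walk and fnmatch in place of find.find. The name find_files and its start parameter are hypothetical, and this is not the book's implementation. As the text describes, it returns one list of full paths per pattern, and it includes directory names as well as file names.

```python
import fnmatch
import os


def find_files(patterns, start='.'):
    """Hypothetical findFiles-style collector (not the book's code):
    return one list of matching paths per pattern, covering both
    directory and file names at and below start."""
    if isinstance(patterns, str):          # accept a lone pattern string too
        patterns = [patterns]
    results = []
    for pattern in patterns:
        matches = []
        # top-down walk: a directory's name is seen before its contents
        for dirpath, dirnames, filenames in os.walk(start):
            for name in sorted(dirnames + filenames):
                if fnmatch.fnmatch(name, pattern):
                    matches.append(os.path.join(dirpath, name))
        results.append(matches)
    return results
```

Because the walk is top-down, a directory's name lands in the result list before its contents. That is the same ordering the case-sensitivity footnote at the end of this section warns about.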

For example, the next script imports findFiles and applies it to collect all filenames in a directory tree, using the filename pattern * (which matches everything). I use this script to fix a legacy problem in the book's examples tree. The names of some files created under MS-DOS were made all uppercase; for example, spam.py became SPAM.PY somewhere along the way. Because case is significant both in Python and on some platforms, an import statement such as import spam will sometimes fail for uppercase filenames.

To repair the damage everywhere in the thousand-file examples tree, I wrote and ran Example 7-6. It works like this: for every filename in the tree, it checks whether the name is all uppercase. If so, it asks the console user whether the file should be renamed with the os.rename call. To make this easy, it also comes up with a reasonable default for most new names: the old one in all-lowercase form.
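The uppercase test and the default-name rule can be sketched in Python 3 as follows. The function names all_upper and default_new_name are hypothetical. The sketch deliberately swaps the book's character loop for str.isupper. That method is True only when every cased character is uppercase and at least one cased character exists, so it also rejects letterless names such as '.'. The header comment in Example 7-6 notes that the book's allUpper accepts such names.

```python
def all_upper(name):
    """Hypothetical Python 3 take on allUpper: True if the name contains
    at least one letter and no lowercase letters."""
    # str.isupper() requires at least one cased character, so names made
    # only of digits and punctuation (e.g. '.') are rejected
    return name.isupper()


def default_new_name(name):
    """Default suggestion offered to the user: the old name in lowercase."""
    return name.lower()


for name in ['SPAM.PY', 'README.1ST', '__init__.py', '.', 'TODO']:
    if all_upper(name):
        print(name, '->', default_new_name(name))
```

Run as is, this prints SPAM.PY -> spam.py, README.1ST -> readme.1st, and TODO -> todo, each on its own line. It skips __init__.py (lowercase letters) and '.' (no letters at all).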

Example 7-6. PP3E\PyTools\fixnames_all.py

    ##########################################################################
    # Use: "python ..\..\PyTools\fixnames_all.py".
    # find all files with all uppercase names at and below the current
    # directory ('.'); for each, ask the user for a new name to rename the
    # file to; used to catch old uppercase filenames created on MS-DOS
    # (case matters, when importing Python module files); caveats: this
    # may fail on case-sensitive machines if directory names are converted
    # before their contents--the original dir name in the paths returned by
    # find may no longer exist; the allUpper heuristic also fails for
    # odd filenames that are all non-alphabetic (ex: '.');
    ##########################################################################

    import os, string
    listonly = False

    def allUpper(name):
        for char in name:
            if char in string.lowercase:    # any lowercase letter disqualifies
                return 0                    # else all upper, digit, or special
        return 1

    def convertOne(fname):
        fpath, oldfname = os.path.split(fname)
        if allUpper(oldfname):
            prompt = 'Convert dir=%s file=%s? (y|Y)' % (fpath, oldfname)
            if raw_input(prompt) in ['Y', 'y']:
                default  = oldfname.lower()
                newfname = raw_input('Type new file name (enter=%s): ' % default)
                newfname = newfname or default
                newfpath = os.path.join(fpath, newfname)
                os.rename(fname, newfpath)
                print 'Renamed: ', fname
                print 'to:      ', str(newfpath)
                raw_input('Press enter to continue')
                return 1
        return 0

    if __name__ == '__main__':
        patts = "*"                              # inspect all filenames
        from fixeoln_all import findFiles        # reuse finder function
        matches = findFiles(patts)

        ccount = vcount = 0
        for matchlist in matches:                # list of lists, one per pattern
            for fname in matchlist:              # fnames are full directory paths
                print vcount+1, '=>', fname      # includes names of directories
                if not listonly:
                    ccount += convertOne(fname)
                vcount += 1
        print 'Converted %d files, visited %d' % (ccount, vcount)

As before, the findFiles function returns a list of simple filename lists, representing the expansion of all patterns passed in (here, just one result list, for the wildcard pattern *).[*] For each file and directory name in the result, this script's convertOne function prompts for name changes. A combination of os.path.split and os.path.join calls portably attaches the new filename to the old directory path. Here is a renaming session in progress on Windows:

[*] Interestingly, using the string '*' for the patterns list works the same way as using the list ['*'] here. This is only because a single-character string is a sequence whose only item is that same character. Compare the results of map(find.find, '*') with map(find.find, ['*']) interactively to verify.

    C:\temp\examples>python %X%\PyTools\fixnames_all.py
    Using Python find
    1 => .\.cshrc
    2 => .\LaunchBrowser.out.txt
    3 => .\LaunchBrowser.py
    ...
    ...more deleted...
    ...
    218 => .\Ai
    219 => .\Ai\ExpertSystem
    220 => .\Ai\ExpertSystem\TODO
    Convert dir=.\Ai\ExpertSystem file=TODO? (y|Y)n
    221 => .\Ai\ExpertSystem\__init__.py
    222 => .\Ai\ExpertSystem\holmes
    223 => .\Ai\ExpertSystem\holmes\README.1ST
    Convert dir=.\Ai\ExpertSystem\holmes file=README.1ST? (y|Y)y
    Type new file name (enter=readme.1st):
    Renamed:  .\Ai\ExpertSystem\holmes\README.1st
    to:       .\Ai\ExpertSystem\holmes\readme.1st
    Press enter to continue
    224 => .\Ai\ExpertSystem\holmes\README.2ND
    Convert dir=.\Ai\ExpertSystem\holmes file=README.2ND? (y|Y)y
    Type new file name (enter=readme.2nd): readme-more
    Renamed:  .\Ai\ExpertSystem\holmes\README.2nd
    to:       .\Ai\ExpertSystem\holmes\readme-more
    Press enter to continue
    ...
    ...more deleted...
    ...
    1471 => .\todos.py
    1472 => .\tounix.py
    1473 => .\xferall.linux.csh
    Converted 2 files, visited 1473
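The path handling in convertOne boils down to "split off the old name, join on the new one." Here is a minimal Python 3 sketch of just that step. The helper name renamed_path is hypothetical and does not appear in the book's script.

```python
import os


def renamed_path(old_path, new_name):
    """Hypothetical helper mirroring convertOne's path handling: keep the
    directory part of old_path and attach new_name to it."""
    dirpart, _oldname = os.path.split(old_path)   # (directory part, last name)
    return os.path.join(dirpart, new_name)         # joined with the local separator


old = os.path.join('Ai', 'ExpertSystem', 'holmes', 'README.2ND')
print(renamed_path(old, 'readme-more'))
```

On Windows this prints Ai\ExpertSystem\holmes\readme-more. On Unix-like systems the same call yields forward slashes, which is the portability the os.path calls buy.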

This script could simply convert every all-uppercase name to an all-lowercase equivalent automatically, but that's potentially dangerous (some names might require mixed case). Instead, it asks for input during the traversal and shows the results of each renaming operation along the way.

7.3.1. Rewriting with os.path.walk

Notice, though, that the pattern-matching power of the find.find call goes completely unused in this script. Because the script visits every file in the tree anyway, the os.path.walk interface we studied in Chapter 4 would work just as well. It would also avoid any initial pause while a filename list is being collected (that pause is negligible here but may be significant for larger trees). Example 7-7 is an equivalent version of this script that does its tree traversal with the callback-based walk model.

Example 7-7. PP3E\PyTools\fixnames_all2.py

    ###########################################################################
    # Use: "python ..\..\PyTools\fixnames_all2.py".
    # same, but use the os.path.walk interface, not find.find; to make this
    # work like the simple find version, puts off visiting directories until
    # just before visiting their contents (find.find lists dir names before
    # their contents); renaming dirs here can fail on case-sensitive platforms
    # too--walk keeps extending paths containing old dir names;
    ###########################################################################

    import os
    listonly = False
    from fixnames_all import convertOne

    def visitname(fname):
        global ccount, vcount
        print vcount+1, '=>', fname
        if not listonly:
            ccount += convertOne(fname)
        vcount += 1

    def visitor(myData, directoryName, filesInDirectory):  # called for each dir
        visitname(directoryName)                           # do dir we're in now,
        for fname in filesInDirectory:                     # and non-dir files here
            fpath = os.path.join(directoryName, fname)     # fnames have no dirpath
            if not os.path.isdir(fpath):
                visitname(fpath)

    ccount = vcount = 0
    os.path.walk('.', visitor, None)
    print 'Converted %d files, visited %d' % (ccount, vcount)

This version does the same job but visits one extra name (the topmost root directory itself). It may also visit directories in a different order, because os.listdir returns names in arbitrary order. Both versions run in similar time for the examples directory tree on my computer.[*] We'll revisit this script, as well as the fixeoln line-end fixer, in the context of a general tree-walker class hierarchy later in this chapter.
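os.path.walk no longer exists in Python 3. Here is a minimal sketch of the same visiting order built on os.walk, assuming a list-only run like listonly = True, so nothing is renamed. The function name visit_order is hypothetical. The sketch sorts names so the order is repeatable rather than arbitrary.

```python
import os


def visit_order(start='.'):
    """Hypothetical Python 3 sketch of Example 7-7's traversal: visit each
    directory itself, then the non-directory files directly inside it.
    Only records paths; it renames nothing."""
    visited = []
    for dirpath, dirnames, filenames in os.walk(start):   # top-down by default
        visited.append(dirpath)                 # the directory we're in now,
        for fname in sorted(filenames):         # then its plain files
            visited.append(os.path.join(dirpath, fname))
        dirnames.sort()                         # in-place sort fixes subdir order
    return visited
```

Like Example 7-7, the first entry is the root directory itself, which accounts for the one extra visit. Sorting dirnames in place works because a top-down os.walk recurses into whatever the caller leaves in that list, in that order.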

[*] A very subtle thing: both versions of this script might fail on platforms where case matters if they rename directories along the way. Suppose a directory is renamed before its contents have been visited (e.g., a directory SPAM renamed to spam). Later references to the directory's contents using the old name (e.g., SPAM/filename) will then no longer be valid on case-sensitive platforms. This can happen in the find.find version, because directories can and do show up in the result list before their contents. It can also happen in the os.path.walk version, because the prior directory path (with original directory names) keeps being extended at each level of the tree. I use this script only on Windows (DOS), so I haven't been bitten by this in practice. Possible workarounds include ordering find result lists, walking trees in a bottom-up fashion, making two distinct passes for files and directories, queuing up directory names on a list to be renamed later, or simply not renaming directories at all. All of these are complex enough to be delegated to the realm of reader experiments (see the newer os.walk walker in Chapter 4 for bottom-up traversal options). As a rule of thumb, changing a tree's names or structure while it is being walked is a risky venture.
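To suggest what the bottom-up workaround might look like, here is a minimal non-interactive sketch using os.walk(topdown=False) in Python 3. The names rename_bottom_up, choose_name, and lower_if_all_upper are hypothetical. It assumes automatic lowercasing is acceptable, which the interactive script above deliberately avoids. Because children are reported before their parents, every path handed back by the walk still uses original names at the time it is used.

```python
import os


def rename_bottom_up(start, choose_name):
    """Hypothetical sketch: rename names bottom-up so a directory's contents
    are handled before the directory itself. choose_name(name) returns a
    new name, or None to leave it alone. Returns (old, new) path pairs."""
    done = []
    for dirpath, dirnames, filenames in os.walk(start, topdown=False):
        # dirpath still holds original names: its ancestors are renamed
        # only later, when their own parents are processed
        for name in filenames + dirnames:
            new = choose_name(name)
            if not new or new == name:
                continue
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new)
            # don't clobber a different existing file (POSIX rename would
            # silently replace it); a case-only rename on a case-insensitive
            # filesystem names the same file, so it is allowed through
            if os.path.exists(new_path) and not os.path.samefile(old_path, new_path):
                continue
            os.rename(old_path, new_path)
            done.append((old_path, new_path))
    return done


def lower_if_all_upper(name):
    """Default policy: lowercase names that are entirely uppercase."""
    return name.lower() if name.isupper() else None
```

The start directory itself is never renamed, since it never appears in any parent's listing. The exists/samefile check is a conservative choice. A real tool would probably report skipped names rather than pass over them silently.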




Programming Python
ISBN: 0596009259
EAN: 2147483647
Year: 2004
Pages: 270
Authors: Mark Lutz
