C.8 Chapter 8

Describing a directory. There are several solutions to this exercise, naturally. One simple solution is:

    import os, stat

    def describedir(start):
        def describedir_helper(arg, dirname, files):
            """ Helper function for describing directories """
            print "Directory %s has files:" % dirname
            for file in files:
                # find the full path to the file (directory + filename)
                fullname = os.path.join(dirname, file)
                if os.path.isdir(fullname):
                    # if it's a directory, say so; no need to find the size
                    print '  ' + file + ' (subdir)'
                else:
                    # find out the size, and print the info.
                    size = os.stat(fullname)[stat.ST_SIZE]
                    print '  ' + file + ' size=' + str(size)
        # Start the 'walk'.
        os.path.walk(start, describedir_helper, None)

which uses the walk function in the os.path module, and works just fine:

    >>> import describedir
    >>> describedir.describedir('testdir')
    Directory testdir has files:
      describedir.py size=939
      subdir1 (subdir)
      subdir2 (subdir)
    Directory testdir\subdir1 has files:
      makezeros.py size=125
      subdir3 (subdir)
    Directory testdir\subdir1\subdir3 has files:
    Directory testdir\subdir2 has files:
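The os.path.walk function exists only in Python 2; Python 3 removed it in favor of os.walk, a generator that yields a (dirname, subdirs, files) tuple for every directory. The following is a minimal Python 3 sketch of the same exercise, not the book's solution. It sorts the entries and lists subdirectories before files, so its output order differs from the session above, and it returns the printed lines so they can be checked.

```python
import os
import stat


def describedir(start):
    """Print every directory under start with its subdirectories and file sizes.

    Returns the printed lines as a list (handy for testing).
    """
    lines = []
    # os.walk visits start first, then each subdirectory (top-down by default)
    for dirname, subdirs, files in os.walk(start):
        # sorting subdirs in place also makes os.walk descend in sorted order
        subdirs.sort()
        lines.append("Directory %s has files:" % dirname)
        for name in subdirs:
            # a directory: say so, no need to find the size
            lines.append("  %s (subdir)" % name)
        for name in sorted(files):
            fullname = os.path.join(dirname, name)
            # index the stat result with ST_SIZE, as in the Python 2 version
            size = os.stat(fullname)[stat.ST_SIZE]
            lines.append("  %s size=%d" % (name, size))
    for line in lines:
        print(line)
    return lines
```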

Note that you could have found the size of each file with len(open(fullname, 'rb').read()), but this works only when you have read access to all the files, and it is quite inefficient, since it reads each file's entire contents into memory just to count the bytes. The stat call in the os module returns all kinds of useful information in a tuple, and the stat module defines names (such as ST_SIZE) that make it unnecessary to remember the order of the elements in that tuple. See the Library Reference for details.
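In Python 3 (and later Python 2 releases), os.stat returns a stat_result object that can still be indexed with the stat module's constants but also exposes each field as a named attribute such as st_size, and os.path.getsize wraps the same call. A small Python 3 sketch comparing the three, using a scratch file created with tempfile purely for illustration:

```python
import os
import stat
import tempfile

# Create a small scratch file so there is something to stat.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"12345")
    path = f.name

info = os.stat(path)
size_by_index = info[stat.ST_SIZE]      # old style: index with a stat constant
size_by_attr = info.st_size             # named attribute for the same field
size_by_helper = os.path.getsize(path)  # convenience wrapper around os.stat
is_dir = stat.S_ISDIR(info.st_mode)     # decode the file type from st_mode

print(size_by_index, size_by_attr, size_by_helper)  # prints: 5 5 5
os.remove(path)
```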

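Modifying the prompt. The key to this exercise is to remember that the ps1 and ps2 attributes of the sys module can be anything, including a class instance with a __repr__ or __str__ method. When a prompt is not a string, the interpreter converts it with str() each time it is about to read a new interactive command, so the prompt text can change from one command to the next. For example: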

    import sys, os

    class MyPrompt:
        def __init__(self, subprompt='>>> '):
            self.lineno = 0
            self.subprompt = subprompt
        def __repr__(self):
            self.lineno = self.lineno + 1
            return os.getcwd() + '%d' % (self.lineno) + self.subprompt

    sys.ps1 = MyPrompt()
    sys.ps2 = MyPrompt('... ')

This code works as shown. Use the -i option of the Python interpreter, which runs the script and then enters interactive mode, so that the new prompts are in place as soon as the session starts:

    h:\David\book> python -i modifyprompt.py
    h:\David\book1>>> x = 3
    h:\David\book2>>> y = 3
    h:\David\book3>>> def foo():
    h:\David\book3...     x = 3          # the secondary prompt is supported
    h:\David\book3...
    h:\David\book4>>> import os
    h:\David\book5>>> os.chdir('..')
    h:\David6>>>                         # note the prompt changed!
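Python 3 has no classic classes, so the same idea is most directly expressed by defining __str__, which is what the interpreter calls. In CPython both sys.ps1 and sys.ps2 appear to be converted once at the start of each command, which would explain why the two continuation lines of foo in the session above both show 3 even though each prompt object keeps its own counter. The following is a minimal Python 3 sketch; run it with python -i to see the prompts, since outside an interactive session assigning sys.ps1 has no visible effect.

```python
import os
import sys


class MyPrompt:
    """A prompt whose text is recomputed each time the interpreter asks for it."""

    def __init__(self, subprompt='>>> '):
        self.lineno = 0
        self.subprompt = subprompt

    def __str__(self):
        # Count how many times this prompt object has been converted to text.
        self.lineno += 1
        return '%s%d%s' % (os.getcwd(), self.lineno, self.subprompt)


sys.ps1 = MyPrompt()
sys.ps2 = MyPrompt('... ')
```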

Avoiding regular expressions. This program is long and tedious, but not especially complicated. See if you can understand how it works. Whether this is easier for you than regular expressions depends on many factors, such as your familiarity with regular expressions and your comfort with the functions in the string module. Use whichever type of programming works for you.

    import string

    file = open('pepper.txt')
    text = file.read()
    paragraphs = string.split(text, '\n\n')

    def find_indices_for(big, small):
        # return the starting positions of every occurrence of small in big
        indices = []
        cum = 0
        while 1:
            index = string.find(big, small)
            if index == -1:
                return indices
            indices.append(index+cum)
            big = big[index+len(small):]
            cum = cum + index + len(small)

    def fix_paragraphs_with_word(paragraphs, word):
        lenword = len(word)
        for par_no in range(len(paragraphs)):
            p = paragraphs[par_no]
            wordpositions = find_indices_for(p, word)
            if wordpositions == []:
                continue        # word not in this paragraph; try the next one
            # work backward, so a replacement doesn't shift the positions
            # of the occurrences still to be examined
            wordpositions.reverse()
            for start in wordpositions:
                # look for the next 'pepper' after this occurrence of the word
                indexpepper = string.find(p, 'pepper', start)
                if indexpepper == -1:
                    continue
                if string.strip(p[start+lenword:indexpepper]) != '':
                    # something other than whitespace in between!
                    continue
                where = indexpepper+len('pepper')
                if p[where:where+len('corn')] == 'corn':
                    # it's immediately followed by 'corn'!
                    continue
                if string.find(p, 'salad', where) == -1:
                    # it's not followed by 'salad'
                    continue
                # Finally! we get to do a change!
                p = p[:start] + 'bell' + p[start+lenword:]
            paragraphs[par_no] = p      # change mutable argument!

    fix_paragraphs_with_word(paragraphs, 'red')
    fix_paragraphs_with_word(paragraphs, 'green')
    for paragraph in paragraphs:
        print paragraph+'\n'

We won't repeat the output here; it's the same as that of the regular expression solution.
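The same rule carries over to Python 3 string methods, where str.find accepts a start position, so find_indices_for no longer needs to slice and re-offset the string. The following is a minimal sketch under that assumption; fix_paragraph and its replacement parameter are hypothetical names introduced here, and it works on one paragraph at a time, applying the same tests as the corrected program above.

```python
def find_indices_for(big, small):
    """Return the start of every non-overlapping occurrence of small in big."""
    indices = []
    index = big.find(small)
    while index != -1:
        indices.append(index)
        # resume the search just past this match instead of slicing big
        index = big.find(small, index + len(small))
    return indices


def fix_paragraph(p, word, replacement='bell'):
    """Replace word with replacement wherever it is followed, after only
    whitespace, by 'pepper' (but not 'peppercorn'), with 'salad' later on."""
    # walk backward so earlier positions stay valid after each replacement
    for start in reversed(find_indices_for(p, word)):
        indexpepper = p.find('pepper', start + len(word))
        if indexpepper == -1:
            continue
        if p[start + len(word):indexpepper].strip() != '':
            continue  # something other than whitespace in between
        where = indexpepper + len('pepper')
        if p.startswith('corn', where):
            continue  # it's 'peppercorn'
        if p.find('salad', where) == -1:
            continue  # no 'salad' later in the paragraph
        p = p[:start] + replacement + p[start + len(word):]
    return p
```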

Wrapping a text file with a class. This one is surprisingly easy, if you understand classes and the split function in the string module. The following is a version that has one little twist over and above what we asked for:

    import string

    class FileStrings:
        def __init__(self, filename=None, data=None):
            if data is None:
                self.data = open(filename).read()
            else:
                self.data = data
            self.paragraphs = string.split(self.data, '\n\n')
            self.lines = string.split(self.data, '\n')
            self.words = string.split(self.data)
        def __repr__(self):
            return self.data
        def paragraph(self, index):
            return FileStrings(data=self.paragraphs[index])
        def line(self, index):
            return FileStrings(data=self.lines[index])
        def word(self, index):
            return self.words[index]

This solution, when applied to the file pepper.txt, gives:

    >>> from FileStrings import FileStrings
    >>> bigtext = FileStrings('pepper.txt')
    >>> print bigtext.paragraph(0)
    This is a paragraph that mentions bell peppers multiple times.  For
    one, here is a red Pepper and dried tomato salad recipe.  I don't like to use green peppers in my salads as much because they have a harsher flavor.
    >>> print bigtext.line(0)
    This is a paragraph that mentions bell peppers multiple times.  For
    >>> print bigtext.line(-4)
    aren't peppers, they're chilies, but would you rather have a good cook
    >>> print bigtext.word(-4)
    botanist

How does it work? The constructor simply reads the whole file into one big string (the instance attribute data) and then splits it according to the various criteria, keeping the results of the splits in instance attributes that are lists of strings. The paragraph and line accessor methods wrap the data they return in a new FileStrings object (word returns a plain string). This isn't required by the assignment, but it's nice because it means you can chain the operations, so that to find out what the last word of the third line of the third paragraph is, you can just write:

    >>> print bigtext.paragraph(2).line(2).word(-1)
    cook
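The chaining works because paragraph and line each return a fresh FileStrings built from a slice of the data. A minimal Python 3 sketch of the same class, assuming str.split in place of the string module and __str__ in place of __repr__; the file is read inside a with statement so it is closed promptly, and the sample text in the test is made up for illustration.

```python
class FileStrings:
    """Wrap a text (from a file or a string) and index it by paragraph, line, or word."""

    def __init__(self, filename=None, data=None):
        if data is None:
            with open(filename) as f:
                data = f.read()
        self.data = data
        self.paragraphs = data.split('\n\n')
        self.lines = data.split('\n')
        self.words = data.split()  # split on any run of whitespace

    def __str__(self):
        return self.data

    def paragraph(self, index):
        # wrapped again, so .line() and .word() can be chained
        return FileStrings(data=self.paragraphs[index])

    def line(self, index):
        return FileStrings(data=self.lines[index])

    def word(self, index):
        return self.words[index]
```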