Chapter 8. Working with Files

CONTENTS
  •  Simple File Operations
  •  Common File Methods
  •  Putting It All Together: The Address Book Example
  •  The Full address3.py Code
  •  Persisting Objects with pickle
  •  pickle and the Address Book Application
  •  Summary

Terms in This Chapter

  • Class instance

  • Comment

  • Data persistence

  • Data structure

  • Document string

  • Dump

  • File

  • File object

  • File pointer

  • for, if, and while statements

  • for loop

  • Java Virtual Machine

  • Method

  • Mode

  • newline

  • Object instance

  • pickle module

  • Scaffolding

  • try except

  • try finally

  • while loop

I've written programs that read and write data from files in Java, Delphi, Visual Basic, C++, and C. None have such a straightforward and easy approach to dealing with files as Python's. In this chapter we'll be covering file input and output, which is important for saving data. First we'll cover simple I/O; later we'll deal with persisting class instances.

Writing a program that lives only in memory is good for illustration, but eventually you'll need the program to save its data to a file. Do you remember our address book example in Chapter 4? What happened to the addresses when the program ended? Gone, and that isn't good. We need a way to save addresses so that we can use them again. Files fit the bill.

Simple File Operations

Before we begin our tour of files, we'll create a directory, c:\dat, where we'll put the data file we'll be working with throughout the chapter.

In Python, file I/O is built into the language, so working with files doesn't require an API or library. You just make a call to the intrinsic (built-in) open() function, which returns a file object. The typical form of open() is open(filename, mode). If you leave out mode, the file is opened for reading ("r").

Say, for example, that you want to open a file in c:\dat and write "Hello World" to it. Here's how to do that. (Start up the interactive interpreter and follow along.)

>>> file = open("c:\\dat\\hello.txt", "w")
>>> file.write("Hello World ")
>>> file.write("Hello Mars ")
>>> file.write("Hello Jupiter ")
>>> file.close()

Now, using your favorite editor, open up c:\dat\hello.txt. You should see this:

Hello World Hello Mars Hello Jupiter
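For comparison, here is a minimal sketch of the same steps in Python 3, which is newer than the Python and Jython versions this chapter uses. It assumes a temporary directory (from the standard tempfile module) in place of c:\dat, and it uses the with statement, added in later versions of Python, which closes the file automatically when the block ends, even if an error occurs:

```python
import os
import tempfile

# A temporary directory stands in for c:\dat so the sketch runs anywhere.
data_dir = tempfile.mkdtemp()
path = os.path.join(data_dir, "hello.txt")

# "with" closes the file when the block ends, even if an exception is raised.
with open(path, "w") as f:
    f.write("Hello World ")
    f.write("Hello Mars ")
    f.write("Hello Jupiter ")

with open(path) as f:   # no mode given, so the file is opened for reading
    contents = f.read()

print(contents)
```

Nothing separates the three greetings except the trailing spaces we wrote; write() adds no newline of its own.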

Determining the File Object Type

To see the type of a file object, enter

>>> file = open("\\dat\\test.dat","w")
>>> type(file)

Under Jython you get

<jclass org.python.core.PyFile at 2054104961>

Under Python (a.k.a. CPython) you get

<type 'file'>
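Later versions changed this again. In Python 3 there is no single file type: open() returns a class from the standard io module, and which class you get depends on the mode. A small sketch (Python 3; the file name is arbitrary and lives in a temporary directory):

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.dat")

with open(path, "w") as f:
    text_type = type(f)      # text mode: a text layer over a buffered file
with open(path, "rb") as f:
    binary_type = type(f)    # binary read mode: a buffered reader of bytes

print(text_type)     # <class '_io.TextIOWrapper'>
print(binary_type)   # <class '_io.BufferedReader'>
```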

File Modes

The Python file object supports the following modes:

  • Write "w", "wb"

  • Read "r", "rb"

  • Read and write "r+", "r+b"

  • Append "a", "ab"

The "b" appended to the mode signifies binary, which is necessary under Windows and Macintosh operating systems to work with binary files. You don't need binary mode with Jython because its code executes in the context of the Java Virtual Machine (JVM), which in this case is like a virtual operating system sitting atop the base operating systems, making them behave alike. You don't need binary mode, either, if you're running CPython under UNIX.
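The following sketch (Python 3; the file name is made up) shows how the write, append, and read-write modes treat existing data, and what binary mode returns. One difference from the Python of this book: in Python 3, binary mode also changes the type of data you get back (bytes instead of str), so "b" matters on every platform, not just Windows and the Macintosh.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w") as f:      # "w" creates the file, or empties an existing one
    f.write("first\n")
with open(path, "a") as f:      # "a" adds to the end of whatever is already there
    f.write("second\n")
with open(path, "r+") as f:     # "r+" reads and writes; the file must already exist
    f.write("FIRST")            # starts at position 0, overwriting "first"

with open(path) as f:
    text = f.read()             # text mode: a str, newlines normalized to "\n"
with open(path, "rb") as f:
    raw = f.read()              # binary mode: bytes, exactly as stored on disk

print(repr(text))   # 'FIRST\nsecond\n'
```

On Windows, raw contains "\r\n" line endings while text does not; on UNIX the two hold the same characters.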

Persisted Data

Writing to files without reading them doesn't do you much good unless you're creating a report of some kind. If you want to persist data so you can use it later, you have to be able to read it back into the program. Here's how we read the data that we wrote in the first example (please follow along):

>>> file = open("\\dat\\hello.txt", "r")
>>> file.read()
'Hello World Hello Mars Hello Jupiter '
>>> file.close()

The read() method returns a string representing the contents of the file.

Common File Methods

The Python file object supports the methods shown in the following list:

  • read() read in data from the file

  • readline() read a line of data from a file

  • readlines() read all the lines in the file and return them as a list of strings

  • write() write data to a file

  • writelines() write a list (or other sequence) of strings to a file; no newlines are added

  • seek() move to a certain point in a file

  • tell() determine the current location in the file

This is only a partial list. You can read about other file object methods in the Python documentation.
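Here is a short sketch of readline(), readlines(), writelines(), and seek() working together, written for Python 3 with a made-up file in a temporary directory. Note that writelines() writes the strings exactly as given, so each one has to carry its own "\n".

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")

with open(path, "w") as f:
    # writelines() adds no separators; the "\n" characters are ours.
    f.writelines(["line1\n", "line2\n", "line3\n"])

with open(path) as f:
    first = f.readline()     # one line, newline included
    rest = f.readlines()     # all remaining lines, as a list of strings
    f.seek(0)                # rewind to the beginning of the file
    everything = f.read()    # the whole file as one string

print(repr(first))   # 'line1\n'
print(rest)          # ['line2\n', 'line3\n']
```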

A Little Review

You can get a list of methods that a file object supports by using the dir() command like this:

>>> file = open("\\dat\\hello.txt")
>>> dir(file)
['close', 'closed', 'fileno', 'flush', 'isatty', 'mode', 'name', 'read', 'readinto', 'readline', 'readlines', 'seek', 'softspace', 'tell', 'truncate', 'write', 'writelines']

dir() lists the public members of an object: its methods, such as read and write, as well as attributes such as closed and mode.

write() and readline()

In our c:\dat\hello.txt example, we wrote three strings. Then we used a text editor to read that file and saw that all three strings were on the same line. Of course, this is a very short file, but if all of the items we wrote to a standard-size file were written on the same line, the file would be hard to read and parse. To avoid this, we can write a newline character (\n) at the end of each line to denote that the text that follows begins a new line.

Let's continue with some examples that demonstrate writing data on separate lines. Follow along in the interactive interpreter.

First we write three lines to a file like this:

>>> fname = "c:\\dat\\data.txt"
>>> f = open(fname, "w")      # open the file in write mode
>>> f.write("line1 \n")       # write out a line of text;
>>>                           # the \n signifies newline
>>> f.write("line2 ")         # write out some text
>>> f.write(" still line2")   # write out some more text
>>> f.write(" \n")            # write out a newline character
>>> f.write("line3 \n")       # write out a line of text
>>> f.close()                 # close the file
>>> f = None                  # set the reference to None

Text written without a newline character stays on the current line. You can see this by opening the file (c:\dat\data.txt) and comparing each line of text with the code that created it. For example, these three calls together produce a single line:

f.write("line2 ")         # write out some text
f.write(" still line2")   # write out some more text
f.write(" \n")            # write out a newline character

Now let's reopen our file in read mode and read each line individually with the readline() method.

>>> f = open(fname, "r")   # open the file in read mode
>>> line = f.readline()    # read one line in and store it in line
>>> line                   # line contains the first line we
>>>                        # wrote. Note "\n" = "\012"
'line1 \012'
>>> print line             # print the line. Note that the newline
>>>                        # is still attached to the line
line1 

>>> f.readline()           # read the second line
'line2  still line2 \012'
>>> f.readline()           # read the third line
'line3 \012'
>>> f.close()              # close the file

Notice that there are fewer readline() calls than there were write() calls. readline() doesn't care how a line was written: it reads characters until it reaches a newline and returns everything up to and including that newline character.
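The following Python 3 sketch recreates data.txt in a temporary directory (standing in for c:\dat) and reads it back line by line. It relies on one more property of readline(): at the end of the file it returns an empty string, while a blank line inside the file comes back as "\n", so an empty result reliably means there is nothing left to read. Later versions of Python also let you loop over a file object directly, one line per iteration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as f:
    f.write("line1 \n")
    f.write("line2 ")
    f.write(" still line2")
    f.write(" \n")
    f.write("line3 \n")

# Classic loop: stop when readline() returns "" (end of file).
lines = []
with open(path) as f:
    while True:
        line = f.readline()
        if line == "":
            break
        lines.append(line)

# Newer shortcut: iterating over a file object yields one line at a time.
with open(path) as f:
    same_lines = [line for line in f]

print(lines)   # ['line1 \n', 'line2  still line2 \n', 'line3 \n']
```

Five write() calls, three lines: only the "\n" characters decide where lines end.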

readlines()

The readlines() method (note plural) reads all of the lines in the file and puts them in a list of strings. To illustrate, we'll read all of the lines in the file at once with the following interactive session:

>>> f = open(fname, "r")    # reopen the file in read mode
>>> lines = f.readlines()   # read in all the lines at once
>>> lines                   # display the list
['line1 \012', 'line2  still line2 \012', 'line3 \012']
>>> for line in lines:      # print out each line
...     print line
...
line1 

line2  still line2 

line3 

>>> f.close()

Getting Rid of \n

You may want to dispose of the newline character when you read in a line. Here's the not-very-Pythonic way of doing this:

>>> f = open("\\dat\\data.txt")   # reopen the file again
>>> line = f.readline()           # read in the line
>>> line                          # before removing the newline character
'line1 \012'
>>> line = line[0:len(line)-1]    # chop off the newline character
>>> line                          # after the newline character is gone
'line1 '
>>> f.close()                     # close the file

Here's the more Pythonic way:

>>> f = open("\\dat\\data.txt")   # reopen the file
>>> line = (f.readline())[:-1]    # read the line and chop
>>>                               # the newline character off
>>> line
'line1 '

We're doing two things at once with this call: readline() is evaluated inside the parentheses, and then the [] operator with slice notation is applied to the string it returns.

line = (f.readline())[:-1]   # read the line and chop
                             # the newline character off

In case you didn't catch what we just did, here's the same thing in slow motion, reading the second line, with a few more code steps for clarity.

>>> line2 = f.readline()   # read in line2
>>> line2 = line2[:-1]     # using slice notation, assign to line2
>>>                        # everything from the first character
>>>                        # up to but not including the last
>>>                        # character
>>> line2                  # display line2
'line2  still line2 '

For a review of slice notation, go back to Chapters 1, 2 and 3.
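Both versions assume every line ends with a newline. The last line of a file might not (some editors don't add a final newline), and then [:-1] removes a real character instead. A hedged alternative, written for Python 3, checks first; the helper name chop is made up for this sketch:

```python
def chop(line):
    """Hypothetical helper: drop one trailing newline, but only if there is one."""
    if line.endswith("\n"):
        return line[:-1]
    return line

last_line = "line3"                  # e.g. a final line saved without "\n"
sliced = last_line[:-1]              # "line": the slice removed a real character
chopped = chop(last_line)            # "line3": left alone
normal = chop("line1 \n")            # "line1 ": newline removed, space kept
stripped = "line1 \n".rstrip("\n")   # "line1 ": the string-method alternative

print(sliced, chopped, normal, stripped)
```

rstrip("\n") is shorter but removes every trailing newline character, not just one; for a single line returned by readline() the two give the same result.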

read()

Let's start a new example to show how to use read(). First we'll open the file c:\dat\read.txt in write mode and write the hex digits 0 through F to it. (Don't worry about hex for now; just follow along.)

>>> f = open("\\dat\\read.txt", "w")
>>> f.write("0123456789ABCDEF")
>>> f.close()

That's the setup. Now we'll demonstrate the different ways to use read().

>>> f = open("\\dat\\read.txt", "r")
>>> f.read(3)    # read the first three characters
'012'
>>> f.read(3)    # read the next three characters in the file
'345'
>>> f.read(4)    # read the next four characters in the file
'6789'
>>> f.read()     # read the rest of the file
'ABCDEF'

We can see that read(size) reads at most the specified number of characters from the file (fewer if the end of the file comes first). Calling read() with no argument reads the rest of the file, and each call starts reading where the previous one left off.
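The same sequence of calls in Python 3, using a temporary file instead of c:\dat\read.txt, plus the common pattern of reading a file in fixed-size pieces until read() returns an empty string (the chunk size of 5 is arbitrary):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "read.txt")
with open(path, "w") as f:
    f.write("0123456789ABCDEF")

with open(path) as f:
    # Each call continues where the previous one stopped.
    chunks = [f.read(3), f.read(3), f.read(4), f.read()]
    at_end = f.read()        # nothing left: read() returns ""

# Reading in fixed-size pieces until read() returns "":
pieces = []
with open(path) as f:
    while True:
        piece = f.read(5)
        if not piece:
            break
        pieces.append(piece)

print(chunks)   # ['012', '345', '6789', 'ABCDEF']
print(pieces)   # ['01234', '56789', 'ABCDE', 'F']
```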

tell() and seek()

We just saw that the file object keeps track of where we left off reading in a file. What if we want to move to a previous location? For that we need the tell() and seek() methods. Here's an example that continues our read() example:

>>> f.seek(0)   # reset the file pointer to 0
>>> f.read()    # read in the whole file
'0123456789ABCDEF'
>>> f.tell()    # see where the file pointer is
16
>>> f.seek(8)   # move to the middle of the file
>>> f.tell()    # see where the file pointer is
8
>>> f.read()    # read from the middle of the file to the end
'89ABCDEF'

The first line moves the file pointer back to the beginning, and the second reads in the whole file, which leaves the pointer at the end. The third line uses tell() to report where the pointer is (16), and then seek() moves the pointer to the middle of the file. Again, tell() reports the new location. To demonstrate that read() picks up from the file pointer's location, we read the rest of the file and display it.
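A Python 3 version of the session, sketched in binary mode on purpose: for files opened in text mode, Python 3's documentation only guarantees seek() with 0 or with a value previously returned by tell(), whereas binary mode accepts any byte offset. The sketch also shows seek()'s optional second argument, which here measures the offset from the end of the file instead of the beginning:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "read.txt")
with open(path, "wb") as f:
    f.write(b"0123456789ABCDEF")

with open(path, "rb") as f:
    whole = f.read()          # the pointer ends up after the last byte
    end = f.tell()            # 16
    f.seek(8)                 # jump to byte offset 8 (from the start)
    middle = f.tell()         # 8
    rest = f.read()           # b'89ABCDEF'
    f.seek(-3, os.SEEK_END)   # 3 bytes before the end of the file
    tail = f.read()           # b'DEF'

print(end, middle, rest, tail)
```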

For Beginners: Try It Out

If you're sitting there staring at the book and wondering what I'm talking about, you need to do three things:

  1. Enter the last example in the interactive interpreter, and experiment with tell(), seek(), and read().

  2. Open the file with your favorite text editor, and count the characters in it.

  3. Move around the file, and read various characters. To read a single character, call the read() method with 1 as its argument.

If you still don't get it, don't worry; we'll cover this more in the next section.

Putting It All Together: The Address Book Example

In Chapter 4, we used an address book application to demonstrate the if, for, and while statements. We're bringing it back to review the concepts we've learned so far.

name = "   "
while (name != ""):
    name = raw_input("Enter in the name: ")
    if (name == ""):
        continue
    elif (name == "quit"):
        break
    phone_number = raw_input("Enter in the phone number: ")
    address_line1 = raw_input("Enter address line one: ")
    if (address_line1 == ""):
        continue
    address_line2 = raw_input("Enter address line two: ")
    address_line3 = raw_input("Enter address line three: ")
    #do something useful with the addresses
else:
    print("send the email to Rick")
    #emailAddresses("rick_m_hightower@emailcompany.com")

Organization

The first thing we want to do is organize our program to store the data structures, so we'll create a class (the code is from address1.py).

class Address:
    def __init__(self, name="", phone_number="", address_lines=[]):
        self.__name = name
        self.__phone_number = phone_number
        self.__lines = address_lines
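A caution about the default value address_lines=[] in this constructor: Python evaluates a default once, when the def statement runs, so every Address created without explicit address lines shares the same list object. The chapter's code never modifies that list in place, so it works, but any later code that did (an append, say) would change every such instance at once. The usual remedy is a default of None. A Python 3 sketch, with a trimmed-down class that uses public attribute names and a hypothetical helper, shared_default, that demonstrates the problem:

```python
def shared_default(item, bucket=[]):
    # The same list object is reused on every call that omits bucket.
    bucket.append(item)
    return bucket

print(shared_default("a"))   # ['a']
print(shared_default("b"))   # ['a', 'b']: the "empty" default is no longer empty


class Address:
    """Trimmed-down illustration, not the chapter's class."""
    def __init__(self, name="", phone_number="", address_lines=None):
        if address_lines is None:
            address_lines = []   # a fresh list for every instance
        self.name = name
        self.phone_number = phone_number
        self.lines = address_lines


a, b = Address("A"), Address("B")
a.lines.append("123 Main St")
print(b.lines)   # []: each instance got its own list
```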

The Address class represents an entry in our address book. Later we'll add a save() method so we can write an Address's contents out to a file. Before we do that, though, we need to change the address program to store the addresses in instances of Address, and we need to keep those instances in some collection object; I picked a dictionary. (I also added comments to make the flow of the program clear for those of you who forgot Chapter 4.) Here's the amended code (from address1.py):

dict = {}      # to hold the addresses, i.e., to hold
               # instances of the Address class
name = "   "
while (name != ""):
    # Get the name of the person
    name = raw_input("Enter in the name: ")
    # if the name is empty, continue: the while test then fails,
    # so the loop ends and the else clause runs
    # else if the name equals quit then break out of the loop
    if (name == ""):
        continue
    elif (name == "quit"):
        break
    # Get the phone number for the person
    phone_number = raw_input("Enter in the phone number: ")
    # Get the address of the person
    # If the first address line is blank then continue
    # with entering in the next address
    # Otherwise gather the other two address lines
    address_line1 = raw_input("Enter address line one: ")
    if (address_line1 == ""):
        # Create an address object and store
        # it in the dictionary
        address = Address(name, phone_number, [])
        dict[name] = address
        continue
    address_line2 = raw_input("Enter address line two: ")
    address_line3 = raw_input("Enter address line three: ")
    # Create an address object and store it in the dictionary
    address = Address(name, phone_number,
                      [address_line1, address_line2, address_line3])
    dict[name] = address
    #do something useful with the addresses
else:
    print("send the email to Rick")

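The code consists of a while loop that gathers addresses from the user. If the user enters a blank name, continue sends control back to the while test; because name is now empty, the test fails, the loop ends, and the else clause runs. Entering quit breaks out of the loop, which skips the else clause. If the user enters a blank first address line, the remaining address lines are skipped and the entry is stored with no address lines. Once all fields have been gathered, we construct an instance of the Address class and store it in the dictionary under the person's name. As an exercise, try running the program from address1.py.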
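Because the behavior of continue, break, and the else clause decides whether the addresses ever get saved, here is a small Python 3 sketch of the same loop structure. The function run and its list of scripted answers are hypothetical stand-ins for raw_input; the function returns the names it collected and whether the else clause ran:

```python
def run(inputs):
    """Replay the address loop with scripted answers instead of raw_input."""
    answers = iter(inputs)
    names = []
    else_ran = False
    name = " "
    while name != "":
        name = next(answers)
        if name == "":
            continue          # back to the while test, which is now false
        elif name == "quit":
            break             # leaves the loop and skips the else clause
        names.append(name)
    else:
        else_ran = True       # runs only when the while test becomes false
    return names, else_ran

print(run(["Rick", "Missy", ""]))   # (['Rick', 'Missy'], True)
print(run(["Rick", "quit"]))        # (['Rick'], False)
```

In other words, a blank name is the way to finish and reach the else clause, while quit leaves without running it.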

File Support

The next thing we do is provide file support for our program by adding a save() method to the Address class. save() takes a file object as an argument and writes the members of the class instance to that file. The following, from address2.py, shows Address.save:

class Address:
    ...
    ...
    def save(self, file):
        file.write(self.__name + "\n")
        file.write(self.__phone_number + "\n")
        # if there are address lines then write them out
        # to the file; otherwise write "false"
        if(len(self.__lines)>0):
            file.write("true" + "\n")
            file.write(`self.__lines` + "\n")
        else:
            file.write("false" + "\n")

As you can see, save writes all of the members of the class. Because the address lines are optional, the code first writes a flag line: if there are address lines, it writes "true" and then the address lines themselves, using the string representation `self.__lines`; if there aren't any, it writes "false". Without the flag, code reading the file couldn't tell whether the line after the phone number holds address lines or the next person's name.

For Beginners: repr()

Remember the repr() built-in function from Chapter 6? It returns the string representation of an object in a form that, for objects such as lists of strings, can be turned back into the object with the eval() built-in function. Thus, when we do this:

file.write(`self.__lines` + "\n")

we get a string representation of the built-in list object. Note that repr() is called when you use back quotes, so `self.__lines` is equivalent to repr(self.__lines).

repr() is essential for reading the list back in from the file. We'll review it with this interactive session:

>>> list = ["item1", "item2", "item3"]   # create list
>>> list                  # show the contents of the list
['item1', 'item2', 'item3']
>>> string = `list`       # return string representation of the list
>>> type(string)          # show that this is a string
<type 'string'>
>>> string                # print the string to the console
"['item1', 'item2', 'item3']"
>>> list2 = eval(string)  # use the eval() function to create
>>>                       # the object from the string
>>> list2                 # show that the newly created object
>>>                       # is a list like the other list
['item1', 'item2', 'item3']
>>> list == list2
1
>>> type(list2)
<jclass org.python.core.PyList at 161274>
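One caution about this technique: eval() evaluates any expression it is given, so a damaged or deliberately edited data file could make the program run arbitrary code. Python 2.6 and later include ast.literal_eval(), which rebuilds only literal values such as strings, numbers, lists, and dictionaries. It is a substitute for what the chapter's code does, not part of it; a Python 3 sketch:

```python
import ast

lines = ["123 Main St", "Antioch Ca, 95432", "Line 3"]
text = repr(lines)                 # the string save() would write to the file
restored = ast.literal_eval(text)  # rebuilds the list from its repr
print(restored == lines)           # True

# Anything that is not a plain literal is refused instead of executed.
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True
print(rejected)                    # True
```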

Writing to the File

The next thing we add is the ability to write each address out to the file (this example is from address2.py).

dict = {}  #to hold the addresses
name = "   "
while (name != ""):
    ...
    ...
else:
    #Open up the file for writing
    file = open("\\dat\\address2.txt", "w")
    #write each address in the dictionary to the file
    for address in dict.values():
        address.save(file)
    file.close()   # flush the data to disk and release the file

In the else clause of the while loop, we write out the addresses that we collected in the dictionary. First we open c:\dat\address2.txt for writing. Then we use a for loop to iterate through the address objects in dict (the dictionary object), calling save() on each one and passing it the file object we opened.

Let's run our program (address2.py), enter a few addresses, and look at the output from the file; we can compare it to the source code that generated it.

Wait a minute. There's no use writing items to a file unless we're going to read them back, so let's first add another method, read(), to our class. Here's the code (from address3.py, Address.read) showing how to read in an address instance:

def read(self, file):
    """
    The read method reads fields of the instance from a file.
    The read method has one argument. The file argument holds
    the reference to the input file object.
    """
    # Read in the name field.
    self.__name = (file.readline())[:-1]
    # Read in the phone number field.
    self.__phone_number = (file.readline())[:-1]
    # Check to see if the address lines are present.
    # If the lines are present, read in the address
    # lines string. Use the eval() function to recreate
    # the address lines.
    if(file.readline()[:-1]) == "true":
        string = (file.readline()[:-1])
        self.__lines = eval(string)

Scaffolding Code

It's often useful to develop what's known as scaffolding code to test the code we're writing. One good reason to do so is that testing code in an interactive session can be cumbersome. The while loop we used above to gather address information isn't a good place for code testing, so we'll add methods that do nothing but test if a piece of functionality is working. Any method that starts with the word test is a scaffolding method.

Here we're reading back the fields one by one. We read in the list that contains the three lines in one call to readline(). Then we use eval to recreate the object. (If this idea seems weird, take another look at the repr() notebox.)
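Scaffolding doesn't have to touch the disk. The sketch below is a Python 3 adaptation of the chapter's save()/read() file format, not the book's address3.py: it uses public attribute names for brevity, repr() in place of backquotes (which Python 3 removed), ast.literal_eval() in place of eval(), and io.StringIO, an in-memory object with the same write(), readline(), and seek() methods as a file, so the round trip can be checked without creating any files:

```python
import ast
import io

class Address:
    """Python 3 sketch of the chapter's save/read record format."""
    def __init__(self, name="", phone_number="", address_lines=None):
        self.name = name
        self.phone_number = phone_number
        self.lines = address_lines if address_lines is not None else []

    def save(self, file):
        file.write(self.name + "\n")
        file.write(self.phone_number + "\n")
        if self.lines:
            file.write("true\n")                    # flag: address lines follow
            file.write(repr(self.lines) + "\n")
        else:
            file.write("false\n")                   # flag: no address lines

    def read(self, file):
        self.name = file.readline()[:-1]
        self.phone_number = file.readline()[:-1]
        if file.readline()[:-1] == "true":
            self.lines = ast.literal_eval(file.readline()[:-1])

buffer = io.StringIO()   # behaves like a file opened for reading and writing
Address("Rick Hightower", "920-555-1212",
        ["123 Main St", "Antioch Ca, 95432", "Line 3"]).save(buffer)
Address("Mary Hightower", "520-555-1212").save(buffer)

buffer.seek(0)           # rewind before reading back
first, second = Address(), Address()
first.read(buffer)
second.read(buffer)
print(second.name, second.lines)   # Mary Hightower []
```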

Now that we have both the read() and save() methods for Address, let's create some functions to test them. The following code writes four addresses to a file, then reads the first three back and prints them for display:

def test():
    test_write()
    test_read()

def test_write():
    file = open("\\dat\\testaddr.txt", "w")
    address = Address("Rick Hightower", "920-555-1212",
                      ["123 Main St", "Antioch Ca, 95432", "Line 3"])
    address.save(file)
    address = Address("Missy Hightower", "920-555-1212",
                      ["123 Main St", "Denver Co, 80202", "Line 3"])
    address.save(file)
    address = Address("Martha Hightower", "602-555-1212",
                      ["123 Main St", "Denver Co, 80202", "Line 3"])
    address.save(file)
    address = Address("Mary Hightower", "520-555-1212", [])
    address.save(file)
    file.close()   # make sure everything is written before test_read runs

def test_read():
    file = open("\\dat\\testaddr.txt", "r")   # the file test_write created
    for index in range(0,3):
        address = Address()
        address.read(file)
        print address
    file.close()

If you try to run this code without adding a __str__ method to the Address class, you'll get some very uninteresting and somewhat useless output. Therefore, we'll add __str__ to Address to display a pretty print string representation of the address object.

class Address:
    ...
    ...
    def __str__(self):
        text = self.__name + "\n"
        text = text + self.__phone_number + "\n"
        for line in self.__lines:
            text = text + line + "\n"
        return text

Having a meaningful string representation of a class instance can make debugging very easy.

When we run our test, we get the following output. Eyeball it for correctness.

>>> from address3 import *
>>> test()
Rick Hightower
920-555-1212
123 Main St
Antioch Ca, 95432
Line 3

Missy Hightower
920-555-1212
123 Main St
Denver Co, 80202
Line 3

Martha Hightower
602-555-1212
123 Main St
Denver Co, 80202
Line 3


This is what the testaddr.txt file looks like:

Rick Hightower
920-555-1212
true
['123 Main St', 'Antioch Ca, 95432', 'Line 3']
Missy Hightower
920-555-1212
true
['123 Main St', 'Denver Co, 80202', 'Line 3']
Martha Hightower
602-555-1212
true
['123 Main St', 'Denver Co, 80202', 'Line 3']
Mary Hightower
520-555-1212
false

Writing Out and Reading the File

Of course, now that we've added and tested our reading and writing, we need to put this functionality in getAddresses. We've already added writing, but it isn't good enough yet: we need to be able to write out a whole dictionary of addresses, the writing code does no error checking, and the main code is getting pretty long. We also still have to add reading to getAddresses. So we'll create two functions, one that reads in a dictionary of address instances and one that writes one out, and we'll add the necessary try...except and try...finally blocks to the code.

def readAddresses(filename, dict):
    """
    Read in a dictionary of addresses.
    This method takes two arguments as follows:
        filename   holds the filename of the file to read in address
                   instances from
        dict       holds reference to the dictionary object that
                   this function adds addresses to
    """
    file = None  #to hold reference to the input file object
    # Use try..finally block to work with the file
    # If you can't work with the file for any reason close it
    try:
        # try to read in the addresses from the file
        # if you can't read the addresses then print an
        # error message
        try:
            file = open(filename,"r")          #open the file in read mode
            strLength=(file.readline())[:-1]   # read in the length
            length = int(strLength)            # convert length to an int
            #read in the addresses from 0 to length
            for index in range(0, length):
                address=Address()
                address.read(file)
                dict[address.name()]=address
        except Exception, error:
            print error.__class__.__name__
            print error
    finally:
        if not (file is None): file.close()

def writeAddresses(filename, dict):
    """
    Write the addresses instances in a dictionary to a file.
    writeAddresses has two arguments as follows:
        filename        holds the name of the output file
        dict            holds the dictionary of address instances
    """
    file=None  #to hold the output file object
    #try..finally: try to write the instances to a file.
    #if all else fails then close the file object
    try:
        # Write the address instances in dict to a file
        # specified by filename.
        # If there are any errors then print an error message
        try:
            file = open(filename, "w")
            length = str(len(dict))      # determine the number of
                                         # address instances in dict
            file.write(length + "\n")    # write the length of dict to a
                                         # file.
            # for each address in dict.values write out the
            # address instance to the file object
            for address in dict.values():
                address.save(file)
        except Exception,error:
            print error.__class__.__name__
            print error
    finally:
        if(file):file.close()

To make readAddresses() and writeAddresses() workable, we'll add the following method to the Address class:

def name(self):
    return self.__name

readAddresses() uses name() to get the key under which each address is stored in the dictionary.

I added many comments to the previous methods. Make it a habit to read them and any document strings in the code listings. Consider document strings and comments as part of the text of this book. Also, be sure to amply comment your own code.
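The nested try blocks exist because, in the Python versions this book targets, a single try statement could have except clauses or a finally clause but not both; Python 2.5 lifted that restriction. Python 2.6 and later also provide the with statement, which closes the file automatically. The following Python 3 sketch uses with in place of the try...finally block and keeps the same count-line-then-records layout. The function names write_names and read_names, and the one-name-per-record format, are simplifications made up for this example, and the sketch catches only the errors it expects rather than every Exception:

```python
import os
import tempfile

def write_names(filename, names):
    """Write a count line, then one name per line."""
    try:
        with open(filename, "w") as file:   # closed automatically, like finally
            file.write(str(len(names)) + "\n")
            for name in names:
                file.write(name + "\n")
    except OSError as error:                # Python 3 spelling of "except X, error"
        print(type(error).__name__, error)

def read_names(filename):
    """Read the count line, then that many names; report errors and carry on."""
    names = []
    try:
        with open(filename) as file:
            length = int(file.readline()[:-1])
            for _ in range(length):
                names.append(file.readline()[:-1])
    except (OSError, ValueError) as error:  # missing file or a bad count line
        print(type(error).__name__, error)
    return names

path = os.path.join(tempfile.mkdtemp(), "names.txt")
write_names(path, ["Rick Hightower", "Missy Hightower"])
print(read_names(path))               # ['Rick Hightower', 'Missy Hightower']
print(read_names(path + ".missing"))  # reports FileNotFoundError, then prints []
```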

Testing

So far we've added read and write functionality to the Address class and added functionality for reading in an entire dictionary of addresses and for writing an entire dictionary of address instances. Now we need to update getAddresses() to use all of these functions. Before we do that, though, we need to test readAddresses() and writeAddresses(), which means more scaffolding code.

def test_read_write():
    """
    Scaffolding code to test reading and writing dictionaries
    of addresses.
    """
    # populate a dictionary object with some sample data
    dict={}
    address = Address("Rick Hightower", "920-555-1212",
                      ["123 Main St", "Antioch Ca, 95432", "Line 3"])
    dict["Rick Hightower"] = address
    address = Address("Missy Hightower", "920-555-1212",
                      ["123 Main St", "Denver Co, 80202", "Line 3"])
    dict["Missy Hightower"]=address
    address = Address("Martha Hightower", "602-555-1212",
                      ["123 Main St", "Phoenix, AZ, 85226", "Line 3"])
    dict["Martha Hightower"]=address
    address = Address("Mary Hightower", "520-555-1212", [])
    dict["Mary Hightower"]=address
    fname="\\dat\\testaddr.txt" #to hold output filename
    # write the dictionary full of addresses out to the file
    print "calling writeAddresses"
    writeAddresses(fname, dict)
    # read in the dictionary full of addresses back from
    # the file we just wrote it to
    dict_in = {}                      #to hold input addresses
    print "calling readAddresses"
    readAddresses(fname, dict_in)
    #show that the input matches the output
    print "Input"
    print dict_in
    print "Output"
    print dict
    #return whether these equal each other
    return dict==dict_in

Essentially, our scaffolding code populates a Python dictionary with address instances, writes the dictionary to a file, and reads it back into a second dictionary. Next it displays the input and output dictionaries so that you can check them visually. Finally it returns whether the input dictionary is equal to the output dictionary. As the code stands, everything works except that last comparison: dict == dict_in comes out false, because Address instances don't yet know how to compare themselves to one another.

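For the last line to work, we need to add a __cmp__ method to our Address class. When Python compares two dictionaries, it compares the values stored under each key, and here those values are Address instances; without __cmp__, two instances are equal only if they are the same object. Our __cmp__ compares the instances' __dict__ attributes, which hold all of their fields, so two addresses with the same name, phone number, and address lines compare equal. Here's how we add __cmp__ to the Address class: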

def __cmp__(self,other):
    """
    Compares one address instance to another.
    If the address instances are equal the __cmp__ returns 0.
    If the address instances are not equal then we return a
    non-zero value.
    """
    # To implement this all we do is compare the
    # dictionaries of the two instances
    # The __dict__ member of the instance holds all
    # of the instance fields in a dictionary
    return cmp(self.__dict__ ,other.__dict__)
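Python 3 removed both __cmp__ and the cmp() built-in, so the same idea is expressed with __eq__ there. A sketch under that assumption, using simplified public attribute names rather than the chapter's; because a Python 3 class that defines __eq__ without __hash__ becomes unhashable, it also keeps the chapter's hash-by-name approach:

```python
class Address:
    """Python 3 sketch: equality by field values instead of by identity."""
    def __init__(self, name="", phone_number="", address_lines=None):
        self.name = name
        self.phone_number = phone_number
        self.lines = list(address_lines or [])

    def __eq__(self, other):
        if not isinstance(other, Address):
            return NotImplemented            # let Python try the other operand
        return self.__dict__ == other.__dict__   # compare every instance field

    def __hash__(self):
        return hash(self.name)   # equal addresses have equal names, so equal hashes

written = {"Rick": Address("Rick", "920-555-1212", ["123 Main St"])}
read_back = {"Rick": Address("Rick", "920-555-1212", ["123 Main St"])}
print(written == read_back)   # True: dictionary equality compares values with ==
```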

__cmp__ and Equality versus Default Object Identity

If we don't define __cmp__, comparing two objects doesn't compare their contents. To prove this, let's do a small interactive session.

First we define a simple class.

>>> class class1:
...     var="hi"
...

Then we create two instances.

>>> instance = class1()
>>> instance2 = class1()

The instances have the same values, yet when we compare them they aren't equal (a 0 value equals false).

>>> instance == instance2
0

Without __cmp__, the default comparison checks for object identity. Thus, if we make another variable refer to the first instance and then test for equality, we get true (a 1 value).

>>> instance3 = instance
>>> instance3 == instance
1

In other words, when __cmp__ isn't defined, testing for equality is the same as testing for identity, so the following is equivalent to the code above:

>>> instance3 is instance
1

We can use scaffolding to retest the code whenever we add fields to the Address class or whenever we change our reading and writing. The following interactive session demonstrates the use of scaffolding code to test the reading and writing of dictionaries of addresses:

>>> from address3 import *
>>> test_read_write()
calling writeAddresses
calling readAddresses
Input
{'Rick Hightower': <address3.Address instance at 60188>, 'Martha Hightower': <address3.Address instance at 5fe4c>, 'Missy Hightower': <address3.Address instance at 5ffec>, 'Mary Hightower': <address3.Address instance at 5fc7c>}
Output
{'Mary Hightower': <address3.Address instance at 5ea58>, 'Martha Hightower': <address3.Address instance at 5e9dc>, 'Missy Hightower': <address3.Address instance at 5e950>, 'Rick Hightower': <address3.Address instance at 5e890>}
1

As you can see, our code passed because the function returned true (1). This means that the two dictionaries have the same keys and that each address read back from the file is equal to the address stored under the same key in the original dictionary.

The Full address3.py Code

Now it seems that we have everything we need to run our program, but first let's see all of address3.py, including the scaffolding code.

class Address:
    """This class represents an entry in an address book"""

    #Constructor
    def __init__(self, name="", phone_number="", address_lines=[]):
        """ The constructor takes three arguments:
              name to hold the person's name                  (string)
              phone_number to hold the person's phone number  (string)
              address_lines to hold the address of the person (list)
        """
        # assign the name, phone_number and address_lines to the
        # instance variables
        # __name, __phone_number, and __lines
        self.__name=name
        self.__phone_number=phone_number
        self.__lines=address_lines

    #Methods
    def save(self, file):
        """
        The save method saves the instance out to a file.
        The save method has one argument.
              file        holds the reference to the output file object.
        """
        # write the __name and __phone_number instance variables
        # to the file. each variable is on its own line.
        file.write(self.__name + "\n")
        file.write(self.__phone_number + "\n")
        # if there are address lines then write them out to the file
        # Since these lines are optional, write "true" if they are
        # present and "false" if they are not present.
        if(len(self.__lines)>0):
            file.write("true" + "\n")
            file.write(`self.__lines` + "\n")
        else :
            file.write("false\n")

    def read(self, file):
        """
        The read method reads fields of the instance from a file.
        The read method has one argument.
              file        holds the reference to the input file object.
        """
        # Read in the name field.
        self.__name = (file.readline())[:-1]
        # Read in the phone number field.
        self.__phone_number = (file.readline())[:-1]
        # Check to see if the address lines are present.
        # If lines are present, read in the address lines string.
        # Use the eval statement to recreate the address lines.
        if(file.readline()[:-1]) == "true":
            string = (file.readline()[:-1])
            self.__lines=eval(string)

    def name(self): return self.__name

    def __cmp__(self, other):
        """
        Compares one address instance to another.
        If the address instances are equal the __cmp__ returns 0.
        If the address instances are not equal then we return a non-zero value.
        """
        # to implement this, __cmp__ compares the dictionaries of
        # two instances. The __dict__ holds all of the members.
        return cmp(self.__dict__ ,other.__dict__)

    def __hash__(self): return hash(self.__name)

    def __str__(self):
        str = self.__name + "\n"
        str = str + self.__phone_number + "\n"
        for line in self.__lines:
            str = str + line + "\n"
        return str

def getAddresses():
    dict = {}  #to hold the addresses
    # Call read addresses to get the dictionary of addresses
    # from the file
    readAddresses("c:\\dat\\addressbook.txt", dict)
    name = " "
    while (name != ""):
        # Get the name of the person
        name = raw_input("Enter in the name: ")
        # an empty name ends the loop: continue jumps back to the
        # while test, which then fails
        # if the name equals quit then break out of the loop
        if (name == ""):
            continue
        elif (name == "quit"):
            break
        #Get the phone number for the person
        phone_number = raw_input("Enter in the phone number: ")
        # Get the address of the person
        address_line1 = raw_input("Enter address line one: ")
        # If the first address line is blank then continue with
        # entering in the next address
        # Otherwise gather the other two address lines
        if (address_line1 == ""):
            # Create an address object and store it in the
            # dictionary
            address = Address(name, phone_number, [])
            dict[name]=address
            continue
        address_line2 = raw_input("Enter address line two: ")
        address_line3 = raw_input("Enter address line three: ")
        # Create an address object and store it in the dictionary
        lines = [address_line1, address_line2, address_line3]
        address = Address(name, phone_number, lines)
        dict[name]=address
    # Write the addresses we created and the ones that
    # we read back to the file
    writeAddresses("c:\\dat\\addressbook.txt",dict)
    test_read()

def test_write():
    file = open("\\dat\\testaddr.txt", "w")
    address = Address("Rick Hightower", "920-555-1212", ["123 Main St", "Antioch Ca, 95432", "Line 3"])
    address.save(file)
    address = Address("Missy Hightower", "920-555-1212", ["123 Main St", "Denver Co, 80202", "Line 3"])
    address.save(file)
    address = Address("Martha Hightower", "602-555-1212", ["123 Main St", "Denver Co, 80202", "Line 3"])
    address.save(file)
    address = Address("Mary Hightower", "520-555-1212", [])
    address.save(file)
    file.close()    # close the file so the data is flushed to disk

def test_read():
    file = open("\\dat\\testaddr.txt", "r")
    for index in range(0,3):
        address = Address()
        address.read(file)
        print address
    file.close()

def test_equal():
    address1 = Address("Missy Hightower", "920-555-1212", ["123 Main St", "Denver Co, 80202", "Line 3"])
    address2 = Address("Missy Hightower", "920-555-1212", ["123 Main St", "Denver Co, 80202", "Line 3"])
    return address1==address2
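The save() and read() methods above use a simple line-oriented record format. Each record holds the name, the phone number, a "true"/"false" flag, and, when the flag is "true", the repr of the address-line list. The listing itself is Jython 2 code. As a minimal sketch for modern Python 3, the hypothetical functions save_record() and read_record() below write and read that same format, with an in-memory io.StringIO standing in for a file. One deliberate swap: ast.literal_eval replaces eval, so a tampered file can only produce Python literals, not run arbitrary code.

```python
import ast
import io

def save_record(f, name, phone_number, lines):
    """Write one address in the chapter's line-oriented format (hypothetical helper)."""
    f.write(name + "\n")
    f.write(phone_number + "\n")
    if len(lines) > 0:
        f.write("true\n")
        f.write(repr(lines) + "\n")  # e.g. "['123 Main St', 'Line 3']"
    else:
        f.write("false\n")

def read_record(f):
    """Read one address written by save_record(); returns (name, phone, lines)."""
    name = f.readline()[:-1]          # [:-1] strips the trailing newline
    phone_number = f.readline()[:-1]
    lines = []
    if f.readline()[:-1] == "true":
        # literal_eval accepts only Python literals, unlike eval
        lines = ast.literal_eval(f.readline()[:-1])
    return name, phone_number, lines

buf = io.StringIO()
save_record(buf, "Rick Hightower", "920-555-1212", ["123 Main St", "Antioch Ca, 95432"])
save_record(buf, "Mary Hightower", "520-555-1212", [])
buf.seek(0)  # rewind before reading
first = read_record(buf)
second = read_record(buf)
print(first)
print(second)
```

Because read_record() consumes exactly the lines save_record() wrote, several records can sit back to back in one file.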

Using getAddresses()

We'll fire up an interactive session as an exercise, and enter three addresses using the getAddresses() function.

>>> from address3 import getAddresses
>>> getAddresses()
Enter in the name: Rick Hightower
Enter in the phone number: 925-555-1212
Enter address line one:
Enter in the name: Kiley Hightower
Enter in the phone number: 925-555-1212
Enter address line one:
Enter in the name: Whitney Hightower
Enter in the phone number: 925-555-1212
Enter address line one: 123 Main St.
Enter address line two: Antioch, CA
Enter address line three: line 3
Enter in the name:
>>> getAddresses()
Enter in the name: Jenna Paul
Enter in the phone number: 925-555-1255
Enter address line one: 125 Main St.
Enter address line two: Antioch, CA
Enter address line three: line 3
Enter in the name: quit

Here are the contents of c:\dat\addressbook.txt after the session (the leading 4 matches the number of addresses stored):

4
Whitney Hightower
925-555-1212
true
['123 Main St.', 'Antioch, CA', 'line 3']
Jenna Paul
925-555-1255
true
['125 Main St.', 'Antioch, CA', 'line 3']
Rick Hightower
925-555-1212
false
Kiley Hightower
925-555-1212
false
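The writeAddresses() function isn't listed in this section. This sketch therefore assumes that the leading count is the number of records that follow. Under that assumption, the hypothetical read_book() function below parses text in this format back into a dictionary keyed by name. It is written for Python 3 and uses ast.literal_eval in place of eval.

```python
import ast
import io

# The file contents from the session above, one field per line.
LISTING = """4
Whitney Hightower
925-555-1212
true
['123 Main St.', 'Antioch, CA', 'line 3']
Jenna Paul
925-555-1255
true
['125 Main St.', 'Antioch, CA', 'line 3']
Rick Hightower
925-555-1212
false
Kiley Hightower
925-555-1212
false
"""

def read_book(f):
    """Parse a count line followed by that many address records (hypothetical)."""
    book = {}
    count = int(f.readline())          # int() ignores the trailing newline
    for _ in range(count):
        name = f.readline().rstrip("\n")
        phone = f.readline().rstrip("\n")
        lines = []
        if f.readline().rstrip("\n") == "true":
            lines = ast.literal_eval(f.readline().rstrip("\n"))
        book[name] = (phone, lines)
    return book

book = read_book(io.StringIO(LISTING))
print(sorted(book))
```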

Persisting Objects with pickle

I've got good news for you. There's a much easier way to persist Python objects than the one we've just used: the pickle module.

Let's redefine our Address class and take away its ability to write itself to a file. Then we'll use it to demonstrate pickle. This example is from address4.Address.

class Address:
    """This class represents an entry in an address book"""

    #Constructor
    def __init__(self, name="", phone_number="", address_lines=[]):
        """The constructor takes three arguments:
              name to hold the person's name                  (string)
              phone_number to hold the person's phone number  (string)
              address_lines to hold the address of the person (list)
        """
        # Assign the name, phone_number and address_lines to
        # the instance variables
        # __name, __phone_number, and __lines.
        self.__name=name
        self.__phone_number=phone_number
        self.__lines=address_lines

    #Methods
    def name(self): return self.__name

    def __cmp__(self,other):
        """
        Compares one address instance to another.
        If the address instances are equal the __cmp__ returns 0.
        If the address instances are not equal then we return
        a non-zero value.
        """
        # To implement this, all we do is compare the
        # dictionaries of the two instances;
        # the __dict__ member of the instance holds
        # all of the instance fields in a dictionary
        return cmp(self.__dict__ ,other.__dict__)

    def __hash__(self): return hash(self.__name)

    def __str__(self):
        str = self.__name + "\n"
        str = str + self.__phone_number + "\n"
        for line in self.__lines:
            str = str + line + "\n"
        return str

We haven't added anything new to the class; we've just taken out its read() and save() methods. Now let's use the pickle module to write an instance of this class to a file and read it back.
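First, one note for readers following along in modern Python 3. __cmp__ and the built-in cmp() exist only in Python 2 (and Jython 2.x). A minimal Python 3 equivalent defines __eq__ instead. This sketch keeps the chapter's idea of comparing the instances' __dict__ dictionaries and hashing on the name. It is an adaptation, not the book's code, and it also swaps the shared [] default for None.

```python
class Address:
    """An address book entry (Python 3 sketch of the chapter's class)."""

    def __init__(self, name="", phone_number="", address_lines=None):
        # Use None as the default so instances don't share one list.
        self.__name = name
        self.__phone_number = phone_number
        self.__lines = address_lines if address_lines is not None else []

    def name(self):
        return self.__name

    def __eq__(self, other):
        # Equal when every instance field matches, as __cmp__ did.
        if not isinstance(other, Address):
            return NotImplemented
        return self.__dict__ == other.__dict__

    def __hash__(self):
        # Defining __eq__ would otherwise make instances unhashable.
        return hash(self.__name)

    def __str__(self):
        text = self.__name + "\n" + self.__phone_number + "\n"
        for line in self.__lines:
            text = text + line + "\n"
        return text

a = Address("Tony Scalise", "555-555-3699")
b = Address("Tony Scalise", "555-555-3699")
print(a == b, a is b)  # True False
```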

Import the address4 and pickle modules.

>>> import address4
>>> import pickle

Create an instance of Address, and print it to the screen.

>>> address = address4.Address("Tony Scalise", "555-555-3699")
>>> print address
Tony Scalise
555-555-3699

Open a file for outputting the address.

>>> dump_file = open("c:\\dat\\pickle.txt","w")

Call the pickle.dump() function, and close the file object.

>>> pickle.dump(address,dump_file)
>>> dump_file.close()

Read the file back in and print it out to the screen to show that it's the same.

>>> #open the file for reading
>>> address_file = open("c:\\dat\\pickle.txt", "r")
>>> #load the address instance from the file
>>> address2 = pickle.load(address_file)
>>> #print the address instance to the screen
>>> print address2
Tony Scalise
555-555-3699

Test to make sure that the values are equal but that this isn't the same object.

>>> address == address2     # the objects are equal
1
>>> address is address2     # the objects are not the same object
0
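The same dump/load round trip can be sketched in Python 3. Two details are assumptions about your environment rather than the chapter's. First, Python 3's pickle requires files opened in binary mode ("wb"/"rb"). Second, this sketch writes to a temporary directory instead of c:\dat. A plain dictionary stands in for the Address instance so the example is self-contained.

```python
import os
import pickle
import tempfile

address = {"name": "Tony Scalise", "phone_number": "555-555-3699", "lines": []}

path = os.path.join(tempfile.mkdtemp(), "pickle.txt")
with open(path, "wb") as dump_file:      # binary mode is required in Python 3
    pickle.dump(address, dump_file)

with open(path, "rb") as address_file:   # the with block closes the file
    address2 = pickle.load(address_file)

print(address2 == address)  # True: same values
print(address2 is address)  # False: a new object
```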

Editing a pickle File

At this point you may be wondering what the file created with the pickle module looks like. In the default text mode, it's just a text file, so we can even edit it to change its values. Here's the original file:

(iaddress4
Address
p0
(dp1
S'_Address__name'
p2
S'Tony Scalise'
p3
sS'_Address__lines'
p4
(lp5
sS'_Address__phone_number'
p6
S'555-555-3699'
p7
sb.

With your favorite text editor, change 'Tony Scalise' to 'Kelly Pagleochini' (or any name you like), and change the phone number from '555-555-3699' to '555-555-2577'.

(iaddress4
Address
p0
(dp1
S'_Address__name'
p2
S'Kelly Pagleochini'
p3
sS'_Address__lines'
p4
(lp5
sS'_Address__phone_number'
p6
S'555-555-2577'
p7
sb.

Now, in an interactive session, we'll show how this changes the object instance.

First close the file and reopen it.

>>> address_file.close()                               # close the file
>>> address_file = open("c:\\dat\\pickle.txt", "r")   # reopen it

Then read in the instance using the pickle.load() function.

>>> #load the new instance from the file
>>> address3 = pickle.load(address_file)
>>> print address3  # print the new instance out.
Kelly Pagleochini
555-555-2577
>>> address_file.close()
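You can see why hand-editing works by making the same edit on the pickled bytes. In a protocol 0 (text) pickle, each string is written as text that ends in a newline rather than behind a length prefix. As a result, replacing a value with a longer one still loads cleanly. This Python 3 sketch uses a dictionary rather than an Address instance. Note that Python 3 writes strings with the V opcode instead of S, so the bytes won't match the listing above exactly.

```python
import pickle

record = {"name": "Tony Scalise", "phone": "555-555-3699"}
text = pickle.dumps(record, 0)          # protocol 0: the text format

# Make the same edits as in the text editor, directly on the bytes.
edited = text.replace(b"Tony Scalise", b"Kelly Pagleochini")
edited = edited.replace(b"555-555-3699", b"555-555-2577")

record2 = pickle.loads(edited)
print(record2)
```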

I don't recommend that you edit the files that pickle dumps, but it's nice to know that you can. It comes in handy when you're debugging an application, and it's especially useful for viewing dumped files. Are you wondering what S, sS, p0, p1, and the like mean? They're opcodes that pickle's loader executes. For example, S introduces a string, pN stores the preceding object in the pickle's memo under the number N, and s adds a key/value pair to a dictionary (so sS is an s followed by a new S). To learn more about their exact meanings, refer to the pickle module in the Python Library Reference.
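If you'd rather not decode opcodes by hand, the standard pickletools module can disassemble a pickle and name each opcode. Here is a quick Python 3 sketch on a small dictionary (again, Python 3 uses V rather than S for strings in protocol 0):

```python
import io
import pickle
import pickletools

data = pickle.dumps({"name": "Tony Scalise"}, 0)   # text-mode pickle
print(data)

out = io.StringIO()
pickletools.dis(data, out)     # one line per opcode: MARK, DICT, PUT, UNICODE, SETITEM, STOP
listing = out.getvalue()
print(listing)
```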

Writing Out Objects with pickle

With the pickle module you can do more than just write out class instances to a file; you can write out any Python object as well.

Open a new file for writing.

>>> file = open("c:\\dat\\pickle2.txt", "w")

Create a dictionary, populating it with a string, an integer, a float, a list, and another dictionary, and write it to a file.

>>> dict = {}
>>> dict["string"] = "string"          #add a string to the dictionary
>>> dict["int"]= 1                     #add an int to the dictionary
>>> dict["float"]=1.11                 #add a float to the dictionary
>>> dict["list"]=[1,2,3]               #add a list to the dictionary
>>> dict["dict"]={"martha":"miguel"}   #add a dictionary to the dictionary
>>> pickle.dump(dict,file)             #write the dictionary to the file
>>> file.close()

The output looks like this:

(dp0
S'int'        <- here is the int item's key
p1
I1            <- here is the int item's value
sS'string'    <- here is the string item's key
p2
g2            <- here is the string item's value, fetched from memo slot 2
sS'dict'      <- here is the dictionary item's key
p3
(dp4
S'martha'
p5
S'miguel'
p6
ssS'float'    <- here is the float item's key
p7
F1.11         <- here is the float item's value
sS'list'      <- here is the list item's key
p8
(lp9
I1
aI2
aI3
as.

Of course, we can read back the dictionary object.

Open the file for reading, and read in the pickled dictionary.

>>> #open the file for reading
>>> file = open("c:\\dat\\pickle2.txt","r")
>>> #load the dictionary object from the file
>>> dict2 = pickle.load(file)

We can see that the dictionary read in from the file is equal to the dictionary written to it, but the two are not the same object.

>>> dict2 == dict      # test for equality
1
>>> dict2 is dict      # test: see if the dictionaries are the same object.
0
>>> file.close()
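Loading a pickle builds brand-new objects all the way down, not just at the top level, so a pickle round trip behaves like a deep copy. Here is a short Python 3 sketch using the chapter's dictionary. It is renamed d to avoid shadowing the built-in dict, and an io.BytesIO stands in for the file.

```python
import io
import pickle

d = {"string": "string", "int": 1, "float": 1.11,
     "list": [1, 2, 3], "dict": {"martha": "miguel"}}

buf = io.BytesIO()          # an in-memory stand-in for the file
pickle.dump(d, buf)
buf.seek(0)
d2 = pickle.load(buf)

print(d2 == d, d2 is d)                                  # True False
print(d2["list"] is d["list"], d2["dict"] is d["dict"])  # False False

d2["list"].append(4)        # changing the loaded copy...
print(d["list"])            # ...leaves the original alone: [1, 2, 3]
```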

For efficiency, you can write a pickled object as a binary instead of a text image:

>>> file = open("c:\\dat\\pickle3.bin","w")
>>> pickle.dump(dict,file,1)
>>> file.close()
>>> file = open("c:\\dat\\pickle3.bin","r")
>>> dict3 = pickle.load(file)
>>> file.close()
>>> dict==dict3, dict is dict3
(1, 0)

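The above session is a lot like the one before it. The main difference is the third argument to the pickle.dump() function: we passed it a true (nonzero) value to request binary mode. Note that this session opens the file in text mode ("w" and "r"). On Windows, text mode translates line endings, which is why the bytes 0D 0A appear after 1.11 in the dump; to keep binary pickles intact, open their files with "wb" and "rb". Looking at this file with a text editor won't do you much good, because you need a utility that displays binary files. Here's what it looks like. (You may not know how to read it, but you can at least see that it's much harder to read and edit than the text-mode file. The pickle ends at byte 2E, the "." that marks the end of every pickle; the bytes after it are not part of the pickled data.)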

0FBD:0100  7D 71 00 28 55 03 69 6E-74 71 01 4B 01 55 06 73   }q.(U.intq.K.U.s
0FBD:0110  74 72 69 6E 67 71 02 68-02 55 04 64 69 63 74 71   tringq.h.U.dictq
0FBD:0120  03 7D 71 04 55 06 6D 61-72 74 68 61 71 05 55 06   .}q.U.marthaq.U.
0FBD:0130  6D 69 67 75 65 6C 71 06-73 55 05 66 6C 6F 61 74   miguelq.sU.float
0FBD:0140  71 07 46 31 2E 31 31 0D-0A 55 04 6C 69 73 74 71   q.F1.11..U.listq
0FBD:0150  08 5D 71 09 28 4B 01 4B-02 4B 03 65 75 2E 00 02   .]q.(K.K.K.eu...
0FBD:0160  75 08 F7 06 23 D3 00 04-74 04 81 CD 00 40 C6 06   u...#...t....@..
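Here is a sketch of the binary session in Python 3, assuming a temporary directory in place of c:\dat. In Python 3 the third argument to pickle.dump() is the protocol number, so passing 1 still selects the old binary format. The files are opened with "wb" and "rb" so that no newline translation can touch the bytes.

```python
import os
import pickle
import tempfile

d = {"string": "string", "int": 1, "float": 1.11,
     "list": [1, 2, 3], "dict": {"martha": "miguel"}}

path = os.path.join(tempfile.mkdtemp(), "pickle3.bin")
with open(path, "wb") as f:     # "b": bytes are written untouched
    pickle.dump(d, f, 1)        # third argument: protocol 1 (binary)

with open(path, "rb") as f:
    d3 = pickle.load(f)         # load() detects the protocol itself

print(d == d3, d is d3)         # True False
```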

Pickling an Object to a String

In addition to reading and writing files, you can pickle objects to and from strings. Instead of the dump() and load() functions, you use dumps() and loads() (the trailing s stands for string), as follows.

Dump the dictionary into a text string.

>>> string = pickle.dumps(dict)

Dump the dictionary into a binary string.

>>> bin_string = pickle.dumps(dict,1)

Load the dictionary from the text string.

>>> dict4 = pickle.loads(string)

Load the dictionary from the binary string.

>>> dict5 = pickle.loads(bin_string)

Check to see if the loaded dictionaries are equal to the dumped dictionary.

>>> dict == dict4, dict == dict5
(1, 1)

Check to see if the loaded dictionaries have the same identity as the dumped dictionary.

>>> dict is dict4, dict is dict5
(0, 0)
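In Python 3, dumps() returns a bytes object rather than a str, even for the text protocol 0, and loads() accepts those bytes. Here is a minimal Python 3 sketch of the same session:

```python
import pickle

d = {"string": "string", "int": 1, "float": 1.11,
     "list": [1, 2, 3], "dict": {"martha": "miguel"}}

text_pickle = pickle.dumps(d, 0)    # protocol 0, the text format
bin_pickle = pickle.dumps(d, 1)     # protocol 1, the binary format

d4 = pickle.loads(text_pickle)
d5 = pickle.loads(bin_pickle)

print(type(text_pickle).__name__, type(bin_pickle).__name__)  # bytes bytes
print(d == d4, d == d5)     # True True
print(d is d4, d is d5)     # False False
```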

Earlier we said that binary mode is more efficient than text mode. The question is how much. We can find out by comparing the size of the binary string to the size of the text string from the last example.

>>> len(string), len(bin_string)
(129, 93)

In this example, the binary string is about 28 percent smaller than the text string (93 bytes versus 129); put the other way, the text string is nearly 40 percent larger. Size isn't the only gain, though. In text mode, pickle has to convert values such as numbers to and from their text representations; in binary mode, less conversion is necessary.
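The exact sizes depend on the Python version and the protocol, so don't expect 129 and 93 on a modern interpreter. This Python 3 sketch measures the chapter's dictionary under every protocol your interpreter supports. It prints each size relative to protocol 0, using the same percentage arithmetic as above.

```python
import pickle

d = {"string": "string", "int": 1, "float": 1.11,
     "list": [1, 2, 3], "dict": {"martha": "miguel"}}

sizes = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    sizes[protocol] = len(pickle.dumps(d, protocol))

base = sizes[0]
for protocol, size in sizes.items():
    change = 100.0 * (size - base) / base   # negative means smaller than text
    print("protocol %d: %d bytes (%+.0f%% vs. protocol 0)" % (protocol, size, change))
```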

pickle and the Address Book Application

Now that we have a handle on the pickle module, we can change our address book program to use it. Here's the listing for address4. The parts of interest are the readAddresses() and writeAddresses() functions, which now use pickle:

class Address:
    """This class represents an entry in an address book"""

    #Constructor
    def __init__(self, name="", phone_number="", address_lines=[]):
        """ The constructor takes three arguments:
              name to hold the person's name                  (string)
              phone_number to hold the person's phone number  (string)
              address_lines to hold the address of the person (list)
        """
        # assign the name, phone_number and address_lines to the
        # instance variables
        # __name, __phone_number, and __lines
        self.__name=name
        self.__phone_number=phone_number
        self.__lines=address_lines

    #Methods
    def name(self): return self.__name

    def __cmp__(self,other):
        """
        Compares one address instance to another.
        If the address instances are equal the __cmp__ returns 0.
        If the address instances are not equal then we return a non-zero
        value.
        """
        # to implement this all we do is compare the dictionaries of
        # the two instances. the __dict__ member of the instance holds all
        # of the instance fields in a dictionary.
        return cmp(self.__dict__ ,other.__dict__)

    def __hash__(self): return hash(self.__name)

    def __str__(self):
        str = self.__name + "\n"
        str = str + self.__phone_number + "\n"
        for line in self.__lines:
            str = str + line + "\n"
        return str

def readAddresses(filename):
    """
    Read in a dictionary of addresses.
    This function takes one argument as follows:
          filename     holds the filename of the file to read in
                       address instances from
    """
    import pickle    # after the docstring, so the docstring is kept
    file = None    #to hold reference to the input file object
    # try..finally to work with the file
    # if you can't work with the file for any reason close it
    try:
        # try to read in the addresses from the file
        # if you can't read addresses then print an error message
        try:
            # binary mode, so binary pickles aren't altered by
            # newline translation
            file = open(filename,"rb")
            dict = pickle.load(file)
            return dict
        except Exception, error:
            print error.__class__.__name__
            print error
            return {}
    finally:
        if not (file is None): file.close()

def writeAddresses(filename, dict, bin=0):
    """
    Write the address instances in a dictionary to a file.
    writeAddresses has three arguments as follows:
          filename     holds the name of the output file
          dict         holds the dictionary of address instances
          bin          whether to use binary mode or not for the pickler
    """
    import pickle    # after the docstring, so the docstring is kept
    file=None    #to hold the output file object
    # try..finally: try to write the instances to a file.
    # if all else fails then close the file object
    try:
        # Write the address instances in dict to a file
        # specified by filename.
        # If there are any errors then print an error message
        try:
            # binary mode works for both text and binary pickles
            file = open (filename, "wb")
            pickle.dump(dict, file, bin)
        except Exception,error:
            print error.__class__.__name__
            print error
    finally:
        if(file):file.close()

def test_read_write():
    """
    Scaffolding code to test reading and writing dictionaries of
    addresses.
    """
    # populate a dictionary object with some sample data
    dict={}
    address = Address("Rick Hightower", "920-555-1212", ["123 Main St", "Antioch Ca, 95432", "Line 3"])
    dict["Rick Hightower"] = address
    address = Address("Missy Hightower", "920-555-1212", ["123 Main St", "Denver Co, 80202", "Line 3"])
    dict["Missy Hightower"]=address
    address = Address("Martha Hightower", "602-555-1212", ["123 Main St", "Phoenix, AZ, 85226", "Line 3"])
    dict["Martha Hightower"]=address
    address = Address("Mary Hightower", "520-555-1212", [])
    dict["Mary Hightower"]=address
    fname="c:\\dat\\testaddr.dat" #to hold output filename
    #write the dictionary full of addresses out to the file
    print "calling writeAddresses"
    writeAddresses(fname, dict)
    #read in the dictionary full of addresses back from the
    # file we just wrote it to
    print "calling readAddresses"
    dict_in = readAddresses(fname) #to hold input addresses
    #show that the input matches the output
    print "Input"
    print dict_in
    print "Output"
    print dict
    #return whether these equal each other
    return dict==dict_in

def test_equal():
    address1 = Address("Missy Hightower", "920-555-1212", ["123 Main St", "Denver Co, 80202", "Line 3"])
    address2 = Address("Missy Hightower", "920-555-1212", ["123 Main St", "Denver Co, 80202", "Line 3"])
    return address1==address2

Notice that the pickle version is much shorter than the original address3.py. Shorter is better: less code to write means less code to maintain.
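For comparison, here's how the two file functions might look in Python 3. It's a sketch, not the book's code. The names read_addresses() and write_addresses() are hypothetical, with statements close the files automatically in place of the try...finally blocks, and the files are opened in binary mode, which Python 3's pickle requires. Like readAddresses(), read_addresses() reports the error and returns an empty dictionary if the file can't be read.

```python
import os
import pickle
import tempfile

def read_addresses(filename):
    """Return the pickled dictionary stored in filename, or {} on failure."""
    try:
        with open(filename, "rb") as f:
            return pickle.load(f)
    except Exception as error:
        print(error.__class__.__name__, error)
        return {}

def write_addresses(filename, addresses, protocol=0):
    """Pickle the addresses dictionary to filename (protocol 0 is text)."""
    try:
        with open(filename, "wb") as f:
            pickle.dump(addresses, f, protocol)
    except Exception as error:
        print(error.__class__.__name__, error)

# Plain tuples stand in for Address instances in this quick check.
book = {"Mary Hightower": ("520-555-1212", [])}
fname = os.path.join(tempfile.mkdtemp(), "testaddr.dat")
write_addresses(fname, book)
print(read_addresses(fname) == book)    # True

missing = read_addresses(os.path.join(tempfile.mkdtemp(), "missing.dat"))
print(missing)                          # reports FileNotFoundError, then {}
```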

Summary

Python lets you read and write files. The Python file object, which is built into the language, supports the following methods:

  • read(): read data from a file

  • readline(): read one line of data from a file

  • readlines(): read all the lines in the file and return them as a list of strings

  • write(): write a string to a file

  • writelines(): write a sequence of strings (typically lines) to a file

  • seek(): move to a given position in the file

  • tell(): return the current position in the file
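Here is a quick tour of those methods on a scratch file, as a Python 3 sketch. The temporary file path is an assumption standing in for c:\dat. In text mode, treat the value returned by tell() as a bookmark to pass back to seek() rather than as a byte count.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "hello.txt")

f = open(path, "w")
f.write("Hello World\n")                           # write() writes one string
f.writelines(["Hello Mars\n", "Hello Jupiter\n"])  # writelines() adds no newlines itself
f.close()

f = open(path, "r")
first = f.readline()        # 'Hello World\n'
position = f.tell()         # where the next read will start
rest = f.readlines()        # ['Hello Mars\n', 'Hello Jupiter\n']
f.seek(0)                   # back to the beginning
everything = f.read()       # the whole file as one string
f.seek(position)            # jump back to the saved position
again = f.readline()        # 'Hello Mars\n'
f.close()

print(repr(first), rest, repr(again))
```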

Python makes working with files straightforward. In addition, its pickle and cPickle modules can persist objects to a file or a string, which speeds the development of classes whose data must be saved.

In this chapter, we expanded on the address book example from Chapter 4. We also covered such issues as using __str__ and __repr__ with class instances. If you followed along with our expansion of the address book program to read and write files, you reviewed a lot of the first seven chapters of the book.



Python Programming with the Java™ Class Libraries: A Tutorial for Building Web and Enterprise Applications with Jython
ISBN: 0201616165
Year: 2001
Pages: 25
