9.3. File Built-in Methods

Once open() has completed successfully and returned a file object, all subsequent access to the file transpires with that "handle." File methods come in four different categories: input, output, movement within a file (which we will call "intra-file motion"), and miscellaneous. A summary of all file methods can be found in Table 9.3. We will now discuss each category.

9.3.1. Input

The read() method is used to read bytes directly into a string, reading at most the number of bytes indicated. If no size is given (the default value is set to integer -1) or size is negative, the file will be read to the end.

The readline() method reads one line of the open file (reads all bytes until a line-terminating character like NEWLINE is encountered). The line, including termination character(s), is returned as a string. Like read(), readline() takes an optional size argument, which, if not provided, defaults to -1, meaning read until the line-ending characters (or EOF) are found. If size is given, an incomplete line may be returned when the line exceeds size bytes.

The readlines() method does not return a string like the other two input methods. Instead, it reads all (remaining) lines and returns them as a list of strings. Its optional argument, sizehint, is a hint on the maximum size desired in bytes. If provided and greater than zero, approximately sizehint bytes in whole lines are read (perhaps slightly more to round up to the next buffer size) and returned as a list.

In Python 2.1, a new type of object was introduced to iterate efficiently over a set of lines from a file: the xreadlines object (found in the xreadlines module). Calling file.xreadlines() was equivalent to xreadlines.xreadlines(file). Instead of reading all the lines in at once, xreadlines() read in chunks at a time, and thus was optimal for use with for loops in a memory-conscious way.
However, with the introduction of iterators and the new file iteration in Python 2.3, it was no longer necessary to have an xreadlines() method: calling it is the same as using iter(file), and in a for loop it is replaced by for eachLine in file. Easy come, easy go.
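The three input methods described above can be compared side by side. The following is a minimal sketch, not taken from the original text: the temporary file, its contents, and the variable names are all invented for illustration, and the code sticks to calls that behave the same way under Python 2 and Python 3.

```python
import os
import tempfile

# Create a small throwaway file to read from (hypothetical contents).
fd, path = tempfile.mkstemp()
os.close(fd)
f = open(path, 'w')
f.write('alpha\nbeta\ngamma\n')
f.close()

f = open(path, 'r')
first_three = f.read(3)        # at most 3 characters are returned
rest_of_line = f.readline()    # remainder of the current line, NEWLINE included
partial = f.readline(2)        # size caps the result, so the line is incomplete
remaining = f.readlines()      # list of all the lines that are left
at_eof = f.read()              # nothing left to read, so an empty string
f.close()
os.remove(path)
```

Each call picks up where the previous one stopped, which is why readlines() here returns only the lines (and the partial line) that had not yet been consumed.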
Another odd bird is the readinto() method, which reads the given number of bytes into a writable buffer object, the same type of object returned by the unsupported buffer() built-in function. (Since buffer() is not supported, neither is readinto().)

9.3.2. Output

The write() built-in method has the opposite functionality of read() and readline(). It takes a string that can consist of one or more lines of text data or a block of bytes and writes the data to the file.

The writelines() method is the counterpart of readlines(): it takes a list of strings and writes them out to a file. Line termination characters are not inserted between each line, so if desired, they must be added to the end of each line before writelines() is called.

Note that there is no "writeline()" method since it would be equivalent to calling write() with a single line string terminated with a NEWLINE character.

Core Note: Line separators are preserved
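To illustrate how line separators are handled on the way out and on the way back in, here is a minimal sketch. It is an illustration under stated assumptions rather than part of the original text: the temporary file and the word list are hypothetical.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

words = ['one', 'two', 'three']   # hypothetical data

# writelines() adds nothing between items, so append the NEWLINE ourselves.
f = open(path, 'w')
f.writelines([w + '\n' for w in words])
f.close()

# readlines() hands the line terminators back untouched.
f = open(path, 'r')
lines = f.readlines()
f.close()

# Stripping the terminator is left to the caller.
stripped = [line.rstrip('\n') for line in lines]
os.remove(path)
```

Had the NEWLINE characters been left off before the call to writelines(), the file would contain the single line onetwothree.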
9.3.3. Intra-file Motion

The seek() method (analogous to the fseek() function in C) moves the file pointer to different positions within the file. The offset in bytes is given along with a relative offset location, whence. A value of 0, the default, indicates distance from the beginning of a file (a position measured from the beginning of a file is also known as the absolute offset), a value of 1 indicates movement from the current location in the file, and a value of 2 indicates that the offset is from the end of the file. If you have used fseek() as a C programmer, the values 0, 1, and 2 correspond directly to the constants SEEK_SET, SEEK_CUR, and SEEK_END, respectively. Use of the seek() method comes into play when opening a file for read and write access.

tell() is a complementary method to seek(); it tells you the current location in the file, in bytes from the beginning of the file.

9.3.4. File Iteration

Going through a file line by line is simple:

    for eachLine in f:
        :

Inside this loop, you are welcome to do whatever you need to with eachLine, representing a single line of the text file (which includes the trailing line separators).

Before Python 2.2, the best way to read in lines from a file was using file.readlines() to read in all the data, giving the programmer the ability to free up the file resource as quickly as possible. If that was not a concern, then programmers could call file.readline() to read in one line at a time. For a brief time, file.xreadlines() was the most efficient way to read in a file.

Things all changed in 2.2 when Python introduced iterators and file iteration. In file iteration, file objects became their own iterators, meaning that users could now iterate through lines of a file using a for loop without having to call read*() methods. Alternatively, the iterator next method, file.next(), could be called as well to read in the next line in the file.
As with all other iterators, Python will raise StopIteration when no more lines are available.
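The iteration behavior just described can be sketched as follows. This is an illustrative example, not code from the original text: the temporary file and variable names are hypothetical, and the built-in next() function (available from Python 2.6 onward) is used in place of the file.next() method so that the sketch also runs under Python 3.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
f = open(path, 'w')
f.write('first\nsecond\n')
f.close()

# A for loop asks the file object for one line at a time.
f = open(path, 'r')
seen = []
for eachLine in f:
    seen.append(eachLine)
f.close()

# Driving the iterator by hand exposes the StopIteration signal
# that the for loop normally handles for us.
f = open(path, 'r')
line1 = next(f)
line2 = next(f)
try:
    next(f)
    exhausted = False
except StopIteration:
    exhausted = True
f.close()
os.remove(path)
```

Both approaches yield the same lines, trailing separators included; the for loop simply stops quietly when StopIteration is raised.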
So remember, if you see code like the following, it is the "old way of doing it," and you can safely remove the call to readlines():

    for eachLine in f.readlines():
        :

File iteration is more efficient, and the resulting Python code is easier to write (and read). Those of you new to Python now are getting all the great new features and do not have to worry about the past.

9.3.5. Others

The close() method completes access to a file by closing it. The Python garbage collection routine will also close a file when the file object's reference count has decreased to zero. One way this can happen is when only one reference exists to a file, say, fp = open(...), and fp is reassigned to another file object before the original file is explicitly closed. Good programming style suggests closing the file before reassignment to another file object. It is possible to lose output data that is buffered if you do not explicitly close a file.

The fileno() method passes back the file descriptor of the open file. This is an integer that can be used in lower-level operations such as those featured in the os module, e.g., os.read().

Rather than waiting for the (contents of the) output buffer to be written to disk, calling the flush() method will cause the contents of the internal buffer to be written (or flushed) to the file immediately. isatty() is a Boolean built-in method that returns True if the file is a tty-like device and False otherwise. The truncate() method truncates the file to the size at the current file position or the given size in bytes.

9.3.6. File Method Miscellany

We will now reprise our first file example from Chapter 2:

    filename = raw_input('Enter file name: ')
    f = open(filename, 'r')
    allLines = f.readlines()
    f.close()
    for eachLine in allLines:
        print eachLine,    # suppress print's NEWLINE

We originally described how this program differs from most standard file access in that all the lines are read ahead of time before any display to the screen occurs.
Obviously, this is not advantageous if the file is large. In that case, it may be a good idea to go back to the tried-and-true way of reading and displaying one line at a time using a file iterator:

    filename = raw_input('Enter file name: ')
    f = open(filename, 'r')
    for eachLine in f:
        print eachLine,
    f.close()

Core Note: Line separators and other file system inconsistencies
We would also like to remind you that the comma placed at the end of the print statement is to suppress the NEWLINE character that print normally adds at the end of output. The reason is that every line from the text file already contains a NEWLINE. readline() and readlines() do not strip off any whitespace characters in your line (see the exercises). If we omitted the comma, then your text file display would be double-spaced: one NEWLINE that is part of the input and another added by the print statement.

File objects also have a truncate() method, which takes one optional argument, size. If it is given, then the file will be truncated to, at most, size bytes. If you call truncate() without passing in a size, it will default to the current location in the file. For example, if you just opened the file and call truncate(), your file will be effectively emptied, truncated to zero bytes, because upon opening a file, the "read head" is on byte 0, which is what tell() returns.

Before moving on to the next section, we will show two more examples, the first highlighting output to files (rather than input), and the second performing both file input and output as well as using the seek() and tell() methods for file positioning.

    import os

    filename = raw_input('Enter file name: ')
    fobj = open(filename, 'w')
    while True:
        aLine = raw_input("Enter a line ('.' to quit): ")
        if aLine != ".":
            # raw_input() drops the NEWLINE, so add a line separator back
            fobj.write('%s%s' % (aLine, os.linesep))
        else:
            break
    fobj.close()

Here we ask the user for one line at a time and send each line out to the file. Our call to the write() method must contain a NEWLINE because raw_input() does not preserve it from the user input. Because it may not be easy to generate an end-of-file character from the keyboard, the program uses the period ( . ) as its end-of-file character, which, when entered by the user, will terminate input and close the file.
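Returning briefly to truncate(), whose two calling forms were described earlier in this section, the following minimal sketch shows both of them along with the "freshly opened file" case. It is an illustration only: the temporary file, its contents, and the variable names are hypothetical, and os.path.getsize() is used merely as a convenient way to observe the file size.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path, 'w+')
f.write('0123456789')
f.flush()                      # push buffered data out before measuring
size_before = os.path.getsize(path)

f.truncate(4)                  # explicit size: keep only the first 4 bytes
size_explicit = os.path.getsize(path)

f.seek(2, 0)                   # move to absolute offset 2
f.truncate()                   # no size given: cut at the current position
size_at_position = os.path.getsize(path)
f.close()

# A freshly opened file starts at position 0, so truncate() empties it.
f = open(path, 'r+')
start = f.tell()
f.truncate()
f.close()
size_emptied = os.path.getsize(path)
os.remove(path)
```

The sizes observed go from 10 to 4 to 2 and finally to 0, matching the rule that truncate() without an argument uses whatever tell() would report.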
The second example opens a file for read and write, creating the file from scratch (after perhaps truncating an already existing file). After writing data to the file, we move around within the file using seek(). We also use the tell() method to show our movement.

    >>> f = open('/tmp/x', 'w+')
    >>> f.tell()
    0
    >>> f.write('test line 1\n')    # add 12-char string [0-11]
    >>> f.tell()
    12
    >>> f.write('test line 2\n')    # add 12-char string [12-23]
    >>> f.tell()                    # tell us current file location (end)
    24
    >>> f.seek(-12, 1)              # move back 12 bytes
    >>> f.tell()                    # to beginning of line 2
    12
    >>> f.readline()
    'test line 2\012'
    >>> f.seek(0, 0)                # move back to beginning
    >>> f.readline()
    'test line 1\012'
    >>> f.tell()                    # back to line 2 again
    12
    >>> f.readline()
    'test line 2\012'
    >>> f.tell()                    # at the end again
    24
    >>> f.close()                   # close file

Table 9.3 lists all the built-in methods for file objects.
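The interactive session in this section exercises whence values 0 and 1 only. As a complement, here is a minimal sketch that also seeks relative to the end of the file (whence 2). It is not from the original text: the temporary file and variable names are hypothetical, and the file is opened in binary mode with byte strings (Python 2.6 or later) because Python 3 rejects nonzero relative seeks on text-mode files.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# Binary mode keeps byte offsets exact on every platform.
f = open(path, 'w+b')
f.write(b'test line 1\ntest line 2\n')   # 24 bytes in total

f.seek(0, 2)                   # whence 2: offset 0 from the end of the file
end_pos = f.tell()

f.seek(-12, 2)                 # 12 bytes before the end: start of line 2
line2_pos = f.tell()
line2 = f.readline()

f.seek(-24, 1)                 # whence 1: back 24 bytes from the current spot
start_pos = f.tell()
line1 = f.readline()
f.close()
os.remove(path)
```

Seeking to offset 0 with whence 2 followed by tell() is also a common way to learn the size of an open file.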