9.3. File Built-in Methods

Once open() has completed successfully and returned a file object, all subsequent access to the file transpires with that "handle." File methods come in four different categories: input, output, movement within a file, which we will call "intra-file motion," and miscellaneous. A summary of all file methods can be found in Table 9.3. We will now discuss each category.

9.3.1. Input

The read() method reads bytes directly into a string, reading at most the number of bytes indicated by its optional size argument. If no size is given (the default value is -1) or size is negative, the file is read to the end.
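As a minimal sketch of this behavior, the following reads a small scratch file first with a size and then without one. The temporary file and the variable names are our own assumptions for illustration. Binary mode and b'' literals are used only so that the byte counts are the same on every platform and so that the snippet runs on both Python 2.6+ and Python 3.

```python
import os
import tempfile

# Create a throwaway scratch file (hypothetical data, for illustration only).
fd, path = tempfile.mkstemp()
os.close(fd)
f = open(path, 'wb')
f.write(b'abcdefghij')
f.close()

f = open(path, 'rb')
first = f.read(4)    # at most 4 bytes: b'abcd'
rest = f.read()      # no size given: everything up to the end of the file
at_eof = f.read()    # already at the end: an empty string comes back
f.close()
os.remove(path)
```

An empty string from read() is the usual signal that the end of the file has been reached.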

The readline() method reads one line of the open file (reads all bytes until a line-terminating character like NEWLINE is encountered). The line, including termination character(s), is returned as a string. Like read(), it takes an optional size argument, which, if not provided, defaults to -1, meaning read until the line-ending characters (or EOF) are found. If size is given, an incomplete line may be returned when the line is longer than size bytes.
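The effect of the size argument can be sketched as follows. The scratch file and its contents are assumptions made for this example, and binary mode is used only to keep the byte counts platform independent.

```python
import os
import tempfile

# Throwaway scratch file holding two short lines (hypothetical data).
fd, path = tempfile.mkstemp()
os.close(fd)
f = open(path, 'wb')
f.write(b'first line\nsecond\n')
f.close()

f = open(path, 'rb')
partial = f.readline(5)    # the line is longer than 5 bytes: b'first'
remainder = f.readline()   # rest of the same line, terminator included
second = f.readline()      # the next full line
eof = f.readline()         # an empty string signals end-of-file
f.close()
os.remove(path)
```

Note that the second call picks up where the first one stopped rather than skipping to the next line.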

The readlines() method does not return a string like the other two input methods. Instead, it reads all (remaining) lines and returns them as a list of strings. Its optional argument, sizhint, is a hint on the maximum size desired in bytes. If provided and greater than zero, approximately sizhint bytes in whole lines are read (perhaps slightly more to round up to the next buffer size) and returned as a list.
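Here is a small sketch of readlines(), assuming a scratch file of three lines that we create ourselves. Because the size hint is only approximate and is subject to buffer rounding, the number of lines returned for a small hint varies between Python versions, so the example makes no claim about that count.

```python
import os
import tempfile

# Throwaway scratch file with three lines (hypothetical data).
fd, path = tempfile.mkstemp()
os.close(fd)
f = open(path, 'wb')
f.write(b'one\ntwo\nthree\n')
f.close()

f = open(path, 'rb')
head = f.readline()        # consume the first line
remaining = f.readlines()  # only the lines not yet read, as a list
f.seek(0)                  # back to the beginning
hinted = f.readlines(1)    # tiny hint: whole lines only, count is implementation dependent
f.close()
os.remove(path)
```

Whatever the hint, every element of the returned list is a complete line with its terminator attached.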

In Python 2.1, a new type of object was used to efficiently iterate over a set of lines from a file: the xreadlines object (found in the xreadlines module). Calling file.xreadlines() was equivalent to xreadlines.xreadlines(file). Instead of reading all the lines in at once, xreadlines() read the file in chunks, and thus was well suited to for loops in a memory-conscious way. However, with the introduction of iterators and file iteration in Python 2.2, the xreadlines() method was no longer necessary (it was deprecated in Python 2.3): it is the same as using iter(file) and, in a for loop, is replaced by for eachLine in file. Easy come, easy go.

Another odd bird is the readinto() method, which reads the given number of bytes into a writable buffer object, the same type of object returned by the unsupported buffer() built-in function. (Since buffer() is not supported, neither is readinto().)

9.3.2. Output

The write() built-in method does the opposite of read() and readline(). It takes a string that can consist of one or more lines of text data or a block of bytes and writes the data to the file.

The writelines() method operates on a list just like readlines(), but takes a list of strings and writes them out to a file. Line termination characters are not inserted between each line, so if desired, they must be added to the end of each line before writelines() is called.

Note that there is no "writeline()" method since it would be equivalent to calling write() with a single line string terminated with a NEWLINE character.
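The following sketch shows what happens with and without explicit line terminators. The scratch file and the list of words are assumptions made for this example.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()   # throwaway scratch file
os.close(fd)
words = ['alpha', 'beta', 'gamma']

# Without terminators, the strings run together in the file.
f = open(path, 'w')
f.writelines(words)
f.close()
f = open(path, 'r')
run_together = f.read()         # 'alphabetagamma'
f.close()

# Adding the terminator ourselves yields one line per string.
f = open(path, 'w')
f.writelines([word + '\n' for word in words])
f.close()
f = open(path, 'r')
separated = f.readlines()
f.close()
os.remove(path)
```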

Core Note: Line separators are preserved

When reading lines in from a file using file input methods like read() or readlines(), Python does not remove the line termination characters. Removing them is up to the programmer. For example, the following code is fairly common to see in Python code:

f = open('myFile', 'r')
data = [line.strip() for line in f.readlines()]
f.close()


Similarly, output methods like write() or writelines() do not add line terminators for the programmer... you have to do it yourself before writing the data to the file.
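One detail worth keeping in mind is that strip() removes all leading and trailing whitespace, not just the line terminator. The sketch below, using made-up sample lines, contrasts it with rstrip('\n'), which is one possible alternative when indentation must be preserved.

```python
# Sample lines as they might come back from readlines() (hypothetical data).
raw_lines = ['  indented\n', 'plain\n', 'no terminator']

stripped = [line.strip() for line in raw_lines]            # drops the indentation too
newline_only = [line.rstrip('\n') for line in raw_lines]   # removes only the NEWLINE
```

Which of the two is appropriate depends on whether surrounding whitespace is meaningful in your data.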


9.3.3. Intra-file Motion

The seek() method (analogous to the fseek() function in C) moves the file pointer to different positions within the file. The offset in bytes is given along with a relative offset location, whence. A value of 0, the default, indicates distance from the beginning of a file (note that a position measured from the beginning of a file is also known as the absolute offset), a value of 1 indicates movement from the current location in the file, and a value of 2 indicates that the offset is from the end of the file. If you have used fseek() as a C programmer, the values 0, 1, and 2 correspond directly to the constants SEEK_SET, SEEK_CUR, and SEEK_END, respectively. Use of the seek() method comes into play when opening a file for read and write access.

tell() is a complementary method to seek(); it tells you the current location within the file, in bytes from the beginning of the file.
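The three whence values can be sketched as follows, assuming a ten-byte scratch file of our own making. Binary mode is used because Python 3 only permits nonzero relative seeks on binary files; the same calls work on Python 2.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()   # throwaway scratch file
os.close(fd)
f = open(path, 'w+b')           # read and write access
f.write(b'0123456789')

f.seek(0)                       # whence defaults to 0: absolute offset
at_start = f.tell()             # 0

f.seek(-3, 2)                   # whence 2: three bytes before the end
near_end = f.tell()             # 7
tail = f.read()                 # b'789'

f.seek(2, 0)                    # absolute position 2
f.seek(3, 1)                    # whence 1: three bytes forward from there
middle = f.tell()               # 5
digit = f.read(1)               # b'5'
f.close()
os.remove(path)
```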

9.3.4. File Iteration

Going through a file line by line is simple:

for eachLine in f:
    :


Inside this loop, you are welcome to do whatever you need to with eachLine, representing a single line of the text file (which includes the trailing line separators).

Before Python 2.2, the best way to read in lines from a file was using file.readlines() to read in all the data, giving the programmer the ability to free up the file resource as quickly as possible. If that was not a concern, then programmers could call file.readline() to read in one line at a time. For a brief time, file.xreadlines() was the most efficient way to read in a file.

Things all changed in 2.2 when Python introduced iterators and file iteration. In file iteration, file objects became their own iterators, meaning that users could now iterate through lines of a file using a for loop without having to call read*() methods. Alternatively, the iterator's next method, file.next(), could be called to read in the next line in the file. As with all other iterators, Python will raise StopIteration when no more lines are available.

So remember, if you see this type of code, this is the "old way of doing it," and you can safely remove the call to readlines().

for eachLine in f.readlines():
    :


File iteration is more efficient, and the resulting Python code is easier to write (and read). Those of you new to Python now are getting all the great new features and do not have to worry about the past.
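A short sketch of file iteration follows. The scratch file is an assumption for the example, and we call the built-in next() function (available since Python 2.6) instead of the file.next() method so that the snippet also runs on Python 3.

```python
import os
import tempfile

# Throwaway scratch file with three lines (hypothetical data).
fd, path = tempfile.mkstemp()
os.close(fd)
f = open(path, 'w')
f.write('a\nb\nc\n')
f.close()

f = open(path, 'r')
first = next(f)                      # the file object is its own iterator
rest = [eachLine for eachLine in f]  # iteration resumes after the first line
try:
    next(f)                          # nothing left to read
    exhausted = False
except StopIteration:
    exhausted = True
f.close()
os.remove(path)
```

Each line still carries its trailing line separator, just as with readline().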

9.3.5. Others

The close() method completes access to a file by closing it. The Python garbage collection routine will also close a file when the file object's reference count has decreased to zero. One way this can happen is when only one reference exists to a file, say, fp = open(...), and fp is reassigned to another file object before the original file is explicitly closed. Good programming style suggests closing the file before reassignment to another file object. It is possible to lose output data that is buffered if you do not explicitly close a file.
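One common way to make sure close() is always called, even if a write fails, is a try-finally block. This is only a sketch of that idiom, not something the text above prescribes; the scratch file and the closed attribute check are our own additions for illustration.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()   # throwaway scratch file
os.close(fd)

f = open(path, 'w')
try:
    f.write('important data\n')
finally:
    f.close()                   # runs whether or not write() raised an exception
closed_after = f.closed         # True once the file has been closed

f = open(path, 'r')
try:
    contents = f.read()         # the buffered data reached the file
finally:
    f.close()
os.remove(path)
```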

The fileno() method returns the file descriptor of the open file. This is an integer that can be used in lower-level operations such as those featured in the os module, e.g., os.read().

Rather than waiting for the (contents of the) output buffer to be written to disk, calling the flush() method will cause the contents of the internal buffer to be written (or flushed) to the file immediately. isatty() is a Boolean built-in method that returns True if the file is a tty-like device and False otherwise. The truncate() method truncates the file to the size at the current file position or the given size in bytes.
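These methods can be sketched together as follows. The scratch file is an assumption, and os.fstat() is simply one os-module call that accepts a file descriptor; we use it here to confirm that the flushed data has reached the file.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()   # throwaway scratch file
os.close(fd)

f = open(path, 'w')
f.write('buffered text')        # 13 characters, possibly still in the buffer
f.flush()                       # push the buffer out to the file now
descriptor = f.fileno()         # integer file descriptor
size_on_disk = os.fstat(descriptor).st_size
is_terminal = f.isatty()        # a regular file is not a tty-like device
f.close()
os.remove(path)
```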

9.3.6. File Method Miscellany

We will now reprise our first file example from Chapter 2:

filename = raw_input('Enter file name: ')
f = open(filename, 'r')
allLines = f.readlines()
f.close()
for eachLine in allLines:
    print eachLine,  # suppress print's NEWLINE


We originally described how this program differs from most standard file access in that all the lines are read ahead of time before any display to the screen occurs. Obviously, this is not advantageous if the file is large. In that case, it may be a good idea to go back to the tried-and-true way of reading and displaying one line at a time using a file iterator:

filename = raw_input('Enter file name: ')
f = open(filename, 'r')
for eachLine in f:
    print eachLine,
f.close()


Core Note: Line separators and other file system inconsistencies

One of the inconsistencies of operating systems is the line separator character that their file systems support. On POSIX (Unix family or Mac OS X) systems, the line separator is the NEWLINE ( \n ) character. For old MacOS, it is the RETURN ( \r ), and DOS and Win32 systems use both ( \r\n ). Check your operating system to determine what your line separator(s) are.

Other differences include the file pathname separator (POSIX uses "/", DOS and Windows use "\", and the old MacOS uses ":"), the separator used to delimit a set of file pathnames, and the denotations for the current and parent directories.

These inconsistencies are an irritation when creating applications that run on all three platforms (and more if more architectures and operating systems are supported). Fortunately, the designers of the os module in Python have thought of this for us. The os module has five attributes that you may find useful. They are listed in Table 9.2.

Table 9.2. OS Module Attributes to Aid in Multi-platform Development

os Module Attribute    Description
-------------------    -----------
linesep                String used to separate lines in a file
sep                    String used to separate file pathname components
pathsep                String used to delimit a set of file pathnames
curdir                 String name for current working directory
pardir                 String name for parent (of current working directory)


Regardless of your platform, these variables will be set to the correct values when you import the os module: One less headache to worry about.
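As a brief sketch of how these attributes might be used, the following builds pathnames without hard-coding any separator. The directory names are made up for the example.

```python
import os

parts = ['usr', 'local', 'bin']                       # hypothetical path components
joined = os.sep.join(parts)                           # components joined by the platform separator
search_path = os.pathsep.join(['first', 'second'])    # a set of pathnames, as in a PATH variable
up_one = os.path.join(os.curdir, os.pardir)           # parent of the current directory
terminator = os.linesep                               # '\n' on POSIX, '\r\n' on Windows
```

In practice os.path.join() is usually preferred over joining with os.sep by hand; for simple relative components like these, both give the same result.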


We would also like to remind you that the comma placed at the end of the print statement is to suppress the NEWLINE character that print normally adds at the end of output. This is needed because every line from the text file already contains a NEWLINE: readline() and readlines() do not strip off any whitespace characters in your line (see exercises). If we omitted the comma, then your text file display would be double-spaced, with one NEWLINE that is part of the input and another added by the print statement.

File objects also have a truncate() method, which takes one optional argument, size. If it is given, then the file will be truncated to, at most, size bytes. If you call truncate() without passing in a size, it will default to the current location in the file. For example, if you just opened the file and call truncate(), your file will be effectively deleted, truncated to zero bytes, because upon opening a file, the "read head" is on byte 0, which is what tell() returns.
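Both forms of truncate() can be sketched as follows, assuming a ten-byte scratch file that we create for the purpose. Binary mode is used only to keep the byte counts platform independent.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()   # throwaway scratch file
os.close(fd)
f = open(path, 'w+b')
f.write(b'0123456789')

f.seek(4)
f.truncate()                    # no size: cut the file at the current position
f.seek(0)
kept = f.read()                 # b'0123'

f.truncate(2)                   # explicit size: at most 2 bytes remain
f.seek(0)
shorter = f.read()              # b'01'
f.close()
os.remove(path)
```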

Before moving on to the next section, we will show two more examples, the first highlighting output to files (rather than input), and the second performing both file input and output as well as using the seek() and tell() methods for file positioning.

import os

filename = raw_input('Enter file name: ')
fobj = open(filename, 'w')
while True:
    aLine = raw_input("Enter a line ('.' to quit): ")
    if aLine != ".":
        # raw_input() drops the NEWLINE, so add a line separator ourselves
        fobj.write('%s%s' % (aLine, os.linesep))
    else:
        break
fobj.close()


Here we ask the user for one line at a time and send each line out to the file. Our call to the write() method must supply a line terminator because raw_input() does not preserve the NEWLINE from the user input. Because it may not be easy to generate an end-of-file character from the keyboard, the program uses the period ( . ) as its end-of-file character, which, when entered by the user, will terminate input and close the file.

The second example opens a file for read and write, creating the file from scratch (after perhaps truncating an already existing file). After writing data to the file, we move around within the file using seek(). We also use the tell() method to show our movement.

>>> f = open('/tmp/x', 'w+')
>>> f.tell()
0
>>> f.write('test line 1\n')  # add 12-char string [0-11]
>>> f.tell()
12
>>> f.write('test line 2\n')  # add 12-char string [12-23]
>>> f.tell()                  # tell us current file location (end)
24
>>> f.seek(-12, 1)            # move back 12 bytes
>>> f.tell()                  # to beginning of line 2
12
>>> f.readline()
'test line 2\012'
>>> f.seek(0, 0)              # move back to beginning
>>> f.readline()
'test line 1\012'
>>> f.tell()                  # back to line 2 again
12
>>> f.readline()
'test line 2\012'
>>> f.tell()                  # at the end again
24
>>> f.close()                 # close file


Table 9.3 lists all the built-in methods for file objects.

Table 9.3. Methods for File Objects

File Object Method                 Operation
------------------                 ---------
file.close()                       Closes file
file.fileno()                      Returns integer file descriptor (FD) for file
file.flush()                       Flushes internal buffer for file
file.isatty()                      Returns True if file is a tty-like device and False otherwise
file.next()[a]                     Returns the next line in the file [similar to file.readline()] or raises StopIteration if no more lines are available
file.read(size=-1)                 Reads size bytes of file, or all remaining bytes if size is not given or is negative, and returns them as a string
file.readinto(buf, size)[b]        Reads size bytes from file into buffer buf (unsupported)
file.readline(size=-1)             Reads and returns one line from file (includes line-ending characters), either one full line or a maximum of size characters
file.readlines(sizhint=0)          Reads and returns all lines from file as a list (includes all line termination characters); if sizhint is given and > 0, whole lines are returned consisting of approximately sizhint bytes (could be rounded up to next buffer's worth)
file.xreadlines()[c]               Meant for iteration, returns lines in file read as chunks in a more efficient way than readlines()
file.seek(off, whence=0)           Moves to a location within file, off bytes offset from whence (0 == beginning of file, 1 == current location, or 2 == end of file)
file.tell()                        Returns current location within file
file.truncate(size=file.tell())    Truncates file to at most size bytes, the default being the current file location
file.write(str)                    Writes string str to file
file.writelines(seq)               Writes seq of strings to file; seq should be an iterable producing strings; prior to 2.2, it was just a list of strings


[a] New in Python 2.2.

[b] New in Python 1.5.2 but unsupported.

[c] New in Python 2.1 but deprecated in Python 2.3.



Core Python Programming (2nd Edition)
ISBN: 0132269937
EAN: 2147483647
Year: 2004
Pages: 334
Authors: Wesley J Chun
