10.3. File Objects

As mentioned in "Organization of This Chapter" on page 215, file is a built-in type in Python and the single most common way for your Python programs to read or write data. With a file object, you can read and/or write data to a file as seen by the underlying operating system. Python reacts to any I/O error related to a file object by raising an instance of built-in exception class IOError. Errors that cause this exception include open failing to open or create a file, calls to a method on a file object to which that method doesn't apply (e.g., calling write on a read-only file object, or calling seek on a nonseekable file), and I/O errors diagnosed by a file object's methods. This section covers file objects, as well as the important issue of making temporary files.
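The following sketch shows two of the failure cases just listed, an open that cannot find its file and a write call on a read-only file object, each surfacing as IOError. The function name collect_io_errors is hypothetical. On Python 3, IOError is an alias of OSError, and the read-only write raises io.UnsupportedOperation, a subclass of OSError, so the same except clauses still catch both.

```python
import os
import shutil
import tempfile

def collect_io_errors(dirpath):
    """Trigger two failures described above; return a label for each IOError caught."""
    caught = []
    # 1. open fails: mode 'r' requires the file to exist already.
    try:
        open(os.path.join(dirpath, 'no_such_file.txt'), 'r')
    except IOError:
        caught.append('open failed')
    # 2. a method that doesn't apply: write on a read-only file object.
    path = os.path.join(dirpath, 'exists.txt')
    open(path, 'w').close()              # create an empty file
    f = open(path, 'r')
    try:
        f.write('data')
    except IOError:
        caught.append('write on read-only file')
    finally:
        f.close()
    return caught

workdir = tempfile.mkdtemp()
try:
    print(collect_io_errors(workdir))    # ['open failed', 'write on read-only file']
finally:
    shutil.rmtree(workdir)
```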

10.3.1. Creating a File Object with open

To create a Python file object, call the built-in open with the following syntax:

 open(filename, mode='r', bufsize=-1) 

open opens the file named by plain string filename, which denotes any path to a file. open returns a Python file object f, which is an instance of the built-in type file. Currently, calling file directly is like calling open, but you should call open, which may become a factory function in some future release of Python. If you explicitly pass a mode string, open can also create filename if the file does not already exist (depending on the value of mode, as we'll discuss in a moment). In other words, despite its name, open is not just for opening existing files: it can also create new ones.

10.3.1.1. File mode

mode is a string that indicates how the file is to be opened (or created). mode can be:


'r'

The file must already exist, and it is opened in read-only mode.


'w'

The file is opened in write-only mode. The file is truncated and overwritten if it already exists, or created if it does not exist.


'a'

The file is opened in write-only mode. The file is kept intact if it already exists, and the data you write is appended to what's already in the file. The file is created if it does not exist. Calling f.seek on the file is innocuous, but it does not change where data is written: every write goes to the end of the file.


'r+'

The file must already exist and is opened for both reading and writing, so all methods of f can be called.


'w+'

The file is opened for both reading and writing, so all methods of f can be called. The file is truncated and overwritten if it already exists, or created if it does not exist.


'a+'

The file is opened for both reading and writing, so all methods of f can be called. The file is kept intact if it already exists, and the data you write is appended to what's already in the file. The file is created if it does not exist. Calling f.seek on the file has no effect if the next I/O operation on f writes data but works normally if the next I/O operation on f reads data.
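A minimal sketch contrasting 'w', 'a', and 'r+': it records the file's contents after each step. The helper names read_all and mode_snapshots are illustrative, not part of any library.

```python
import os
import shutil
import tempfile

def read_all(path):
    f = open(path, 'r')              # 'r': the file must already exist
    try:
        return f.read()
    finally:
        f.close()

def mode_snapshots(path):
    """Return the file's contents after each step, to contrast the modes."""
    snapshots = []
    f = open(path, 'w')              # 'w': created if missing, truncated if present
    f.write('first\n')
    f.close()
    snapshots.append(read_all(path))

    f = open(path, 'a')              # 'a': existing data kept, writes appended
    f.write('second\n')
    f.close()
    snapshots.append(read_all(path))

    f = open(path, 'r+')             # 'r+': read and write, no truncation
    f.write('FIRST')                 # overwrites the first five characters in place
    f.close()
    snapshots.append(read_all(path))

    f = open(path, 'w')              # 'w' again: previous contents are discarded
    f.close()
    snapshots.append(read_all(path))
    return snapshots

workdir = tempfile.mkdtemp()
try:
    # ['first\n', 'first\nsecond\n', 'FIRST\nsecond\n', '']
    print(mode_snapshots(os.path.join(workdir, 'demo.txt')))
finally:
    shutil.rmtree(workdir)
```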

10.3.1.2. Binary and text modes

The mode string may also have any of the values just explained followed by a b or t. b denotes binary mode, while t denotes text mode. When the mode string has neither b nor t, the default is text mode (i.e., 'r' is like 'rt', 'w' is like 'wt', and so on).

On Unix, there is no difference between binary and text modes. On Windows, when a file is open in text mode, '\n' is returned each time the string that is the value of os.linesep (the line termination string) is encountered while the file is being read. Conversely, a copy of os.linesep is written each time you write '\n' to the file.

This widespread convention, originally developed in the C language, lets you read and write text files on any platform without worrying about the platform's line-separation conventions. However, except on Unix-like platforms, you do have to know (and tell Python, by passing the proper mode argument to open) whether a file is binary or text. In this chapter, for simplicity, I use \n to refer to the line-termination string, but remember that the string is in fact os.linesep in files on the filesystem, translated to and from \n in memory only for files opened in text mode.
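To see the translation at work, one can write text in text mode and read the raw bytes back in binary mode. This sketch (the function on_disk_bytes is hypothetical) expects each '\n' to appear on disk as os.linesep, which on Unix-like systems is just '\n'. It uses tempfile.mkstemp, covered later in this section, only to get a scratch path; the .encode('ascii') call makes the expected value a byte string so the comparison also works on Python 3, where binary reads return bytes.

```python
import os
import tempfile

def on_disk_bytes(text):
    """Write text in text mode, then reread the raw bytes in binary mode."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        f = open(path, 'w')      # text mode: each '\n' is written as os.linesep
        f.write(text)
        f.close()
        f = open(path, 'rb')     # binary mode: bytes come back untranslated
        raw = f.read()
        f.close()
    finally:
        os.unlink(path)
    return raw

expected = 'a\nb\n'.replace('\n', os.linesep).encode('ascii')
assert on_disk_bytes('a\nb\n') == expected
```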

Python also supports universal newlines, which let you open a text file for reading in mode 'U' (or, equivalently, 'rU') when you don't know how line separators are encoded in the file. This is useful, for example, when you share text files across a network between machines with different operating systems. Mode 'U' takes any of '\n', '\r', and '\r\n' as a line separator, and translates any line separator to '\n'.

10.3.1.3. Buffering

bufsize is an integer that denotes the buffer size you're requesting for the file. When bufsize is less than 0, the operating system's default is used. Normally, this default is line buffering for files that correspond to interactive consoles and some reasonably sized buffer, such as 8,192 bytes, for other files. When bufsize equals 0, the file is unbuffered; the effect is as if the file's buffer were flushed every time you write anything to the file. When bufsize equals 1, the file is line-buffered, which means the file's buffer is flushed every time you write \n to the file. When bufsize is greater than 1, the file uses a buffer of about bufsize bytes, rounded up to some reasonable amount. On some platforms, you can change the buffering for files that are already open, but there is no cross-platform way to do this.
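A minimal sketch, using a hypothetical helper visible_size, of how bufsize affects when written bytes reach the operating system. It passes bufsize as the third positional argument, since Python 3 renamed that parameter buffering, and uses binary mode because Python 3 accepts a bufsize of 0 only for binary files. It checks sizes only after the data must have reached the OS (immediately when unbuffered, after flush otherwise), since the default buffer size is platform-dependent.

```python
import os
import tempfile

def visible_size(bufsize, data=b'hello'):
    """Write data through a file opened with the given bufsize and return
    the size the OS reports for the file before it is closed."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    f = open(path, 'wb', bufsize)    # third positional argument: bufsize
    try:
        f.write(data)
        if bufsize != 0:
            f.flush()                # buffered file: hand the buffer to the OS now
        return os.path.getsize(path)
    finally:
        f.close()
        os.unlink(path)

print(visible_size(0))               # unbuffered: 5 bytes visible at once
print(visible_size(-1))              # default buffering, after flush: 5
```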

10.3.1.4. Sequential and nonsequential access

A file object f is inherently sequential (i.e., a stream of bytes). When you read from a file, you get bytes in the sequential order in which they're present in the file. When you write to a file, the bytes you write are put in the file in the order in which you write them.

To allow nonsequential access, each built-in file object keeps track of its current position (the position on the underlying file where the next read or write operation will start transferring data). When you open a file, the initial position is at the start of the file. Any call to f.write on a file object f opened with a mode of 'a' or 'a+' always sets f's position to the end of the file before writing data to f. When you read or write n bytes on file object f, f's position advances by n. You can query the current position by calling f.tell and change the position by calling f.seek, which are both covered in the next section.
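This sketch tracks a file's position with f.tell through a write, a fresh open, and a read, and then checks that in append mode a write lands at the end even right after seeking to the start. It uses binary mode so positions are plain byte counts; position_demo is an illustrative name, and the b'' literals need Python 2.6 or later.

```python
import os
import tempfile

def position_demo(path):
    f = open(path, 'wb')
    f.write(b'abcdef')
    pos_after_write = f.tell()       # 6: advanced by the 6 bytes written
    f.close()

    f = open(path, 'rb')
    start = f.tell()                 # 0: a file opened for reading starts at the beginning
    f.read(2)
    after_read = f.tell()            # 2: advanced by the 2 bytes read
    f.close()

    f = open(path, 'a+b')
    f.seek(0)                        # move to the start...
    f.write(b'XY')                   # ...but in append mode the write goes to the end
    f.seek(0)
    contents = f.read()
    f.close()
    return pos_after_write, start, after_read, contents

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    result = position_demo(path)     # (6, 0, 2, b'abcdefXY') on Python 3
finally:
    os.unlink(path)
```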

10.3.2. Attributes and Methods of File Objects

A file object f supplies the attributes and methods documented in this section.

close

f.close( )

Closes the file. You can call no other method on f after f.close. Multiple calls to f.close are allowed and innocuous.

closed

closed

f.closed is a read-only attribute that is True if f.close( ) has been called; otherwise, False.
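A short sketch of close and closed. The text says only that no other method may be called after close; CPython reports such a call by raising ValueError, which the sketch catches.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
f = open(path, 'w')
states = [f.closed]              # False: still open
f.close()
f.close()                        # a second close is allowed and does nothing
states.append(f.closed)          # True
try:
    f.write('too late')          # no other method may be called after close
except ValueError:
    states.append('ValueError')
os.unlink(path)
print(states)                    # [False, True, 'ValueError']
```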

encoding

encoding

f.encoding is a read-only attribute that is either None, if I/O on f uses the system default encoding, or a string that names the encoding in use. (Encodings are covered in "Unicode" on page 198.) In practice, this attribute is set only on the stdin, stdout, and stderr attributes of module sys (covered in stdin, stdout, stderr on page 171) when they refer to terminals.

flush

f.flush( )

Requests that f's buffer be written out to the operating system so that the file as seen by the system has the exact contents that Python's code has written. Depending on the platform and the nature of f's underlying file, f.flush may not be able to ensure the desired effect.

isatty

f.isatty( )

Returns True if f's underlying file is an interactive terminal; otherwise, False.

fileno

f.fileno( )

Returns an integer, which is the file descriptor of f's file at operating system level. File descriptors are covered in "File and Directory Functions of the os Module" on page 242.

mode

mode

f.mode is a read-only attribute that is the value of the mode string used in the open call that created f.

name

name

f.name is a read-only attribute that is the value of the filename string used in the open call that created f.
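A sketch reading the descriptive attributes and methods above (name, mode, isatty, fileno) from an ordinary disk file. It also hands the descriptor to os.isatty, the os-level counterpart of f.isatty, to show that fileno connects a file object to os functions.

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
f = open(path, 'w')
try:
    info = {
        'name': f.name,                            # the filename string passed to open
        'mode': f.mode,                            # the mode string passed to open
        'isatty': f.isatty(),                      # a disk file is not a terminal
        'fileno_is_int': isinstance(f.fileno(), int),
    }
    # The OS-level descriptor works with os functions such as os.isatty.
    same_answer = os.isatty(f.fileno()) == f.isatty()
finally:
    f.close()
    os.unlink(path)
```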

newlines

newlines

f.newlines is a read-only attribute useful for text files opened for "universal-newlines reading." f.newlines may be one of the strings '\n', '\r', or '\r\n' (when that string is the only kind of line separator met so far while reading f); a tuple, whose items are the different kinds of line separators met so far; or None, when no line separators have been met yet while reading f, or when f was not opened in mode 'U'.

read

f.read(size=-1)

Reads up to size bytes from f's file and returns them as a string. read reads and returns less than size bytes if the file ends before size bytes are read. When size is less than 0, read reads and returns all bytes up to the end of the file. read returns an empty string if the file's current position is at the end of the file or if size equals 0.
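The size rules of read, shown on a small binary file; read_demo is a hypothetical helper. Binary mode keeps sizes equal to byte counts (in text mode on Python 3, size counts characters instead).

```python
import os
import tempfile

def read_demo(data):
    """Return the results of successive read calls on a file holding data."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    f = open(path, 'wb')
    f.write(data)
    f.close()
    f = open(path, 'rb')
    try:
        return [f.read(3),       # up to 3 bytes
                f.read(0),       # size 0: always an empty string
                f.read(10),      # fewer than 10 bytes remain, so fewer come back
                f.read()]        # size < 0 (the default) at end of file: empty
    finally:
        f.close()
        os.unlink(path)

result = read_demo(b'abcdefg')   # [b'abc', b'', b'defg', b''] on Python 3
```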

readline

f.readline(size=-1)

Reads and returns one line from f's file, up to the end of line (\n), included. If size is greater than or equal to 0, readline reads no more than size bytes. In this case, the returned string might not end with \n. \n might also be absent if readline reads up to the end of the file without finding \n. readline returns an empty string if the file's current position is at the end of the file or if size equals 0.

readlines

f.readlines(size=-1)

Reads and returns a list of all remaining lines in f's file, each a string ending in \n (except possibly the last one, if the file does not end with \n). If size>0, readlines stops and returns the list after collecting data for a total of about size bytes rather than reading all the way to the end of the file.
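A sketch of readline and readlines on a file whose last line has no trailing '\n'; line_demo is a hypothetical name. Note that readline(2) stops mid-line, and readlines picks up wherever the previous read left off.

```python
import os
import tempfile

def line_demo(text):
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    f = open(path, 'w')
    f.write(text)
    f.close()
    f = open(path, 'r')
    try:
        first = f.readline()         # whole first line, '\n' included
        partial = f.readline(2)      # at most 2 characters: no '\n' at the end
        rest = f.readlines()         # the remaining lines, as a list
        at_eof = f.readline()        # at end of file: empty string
    finally:
        f.close()
        os.unlink(path)
    return first, partial, rest, at_eof

# ('one\n', 'tw', ['o\n', 'three'], '')
print(line_demo('one\ntwo\nthree'))
```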

seek

f.seek(pos, how=0)

Sets f's current position to the signed integer byte offset pos away from a reference point. how indicates the reference point. When how is 0, the reference is the start of the file; when it is 1, the reference is the current position; and when it is 2, the reference is the end of the file. In Python 2.5, module os has attributes named SEEK_SET, SEEK_CUR, and SEEK_END, with values of 0, 1, and 2, respectively. They are usable instead of the bare integer constants to obtain greater readability when calling this method.

When f is opened in text mode, f.seek may set the current position in unexpected ways, due to the implied translations between os.linesep and \n. This troublesome effect does not occur on Unix platforms, when f is opened in binary mode, or when you call f.seek with a pos that is the result of a previous call to f.tell and with how equal to 0. When f is opened in mode 'a' or 'a+', all data written to f is appended to the data that is already in f, regardless of any calls to f.seek.
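The three reference points of seek, followed by the safe text-mode pattern of passing a value from tell back to seek with how equal to 0. The byte offsets rely on binary mode; os.SEEK_CUR and os.SEEK_END need Python 2.5 or later, as the text notes.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
f = open(path, 'wb')
f.write(b'0123456789')
f.close()

f = open(path, 'rb')                 # binary mode: offsets are exact byte counts
try:
    f.seek(3)                        # how=0 (os.SEEK_SET): 3 bytes from the start
    first_two = f.read(2)            # b'34'; position is now 5
    f.seek(2, os.SEEK_CUR)           # 2 bytes past the current position: 7
    one_byte = f.read(1)             # b'7'; position is now 8
    f.seek(-2, os.SEEK_END)          # 2 bytes before the end: 8
    last_two = f.read()              # b'89'
    end = f.tell()                   # 10
finally:
    f.close()

f = open(path, 'r')                  # text mode
try:
    mark = f.tell()                  # a position obtained from tell...
    first_read = f.readline()
    f.seek(mark, 0)                  # ...is safe to pass back to seek with how=0
    second_read = f.readline()
finally:
    f.close()
    os.unlink(path)
```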

softspace

softspace

f.softspace is a read-write bool attribute used internally by the print statement (covered in "The print Statement" on page 256) to keep track of its own state. A file object neither alters nor interprets softspace in any way: it just lets the attribute be freely read and written, and print takes care of the rest.

tell

f.tell( )

Returns f's current position, an integer offset in bytes from the start of the file.

truncate

f.truncate([size])

Truncates f's file. When size is present, truncates the file to be at most size bytes. When size is absent, uses f.tell( ) as the file's new size.
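A sketch of both forms of truncate on a file opened with 'r+b'; truncate_demo is a hypothetical helper. It records the size reported by os.fstat after each call, then returns those sizes with the bytes that remain.

```python
import os
import tempfile

def truncate_demo():
    """Apply truncate with and without a size; return (sizes, remaining bytes)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    f = open(path, 'wb')
    f.write(b'abcdefgh')                         # 8 bytes
    f.close()
    sizes = []
    f = open(path, 'r+b')
    try:
        f.truncate(6)                            # explicit size: keep at most 6 bytes
        sizes.append(os.fstat(f.fileno()).st_size)
        f.seek(4)
        f.truncate()                             # no size: cut at f.tell(), here 4
        sizes.append(os.fstat(f.fileno()).st_size)
    finally:
        f.close()
    f = open(path, 'rb')
    try:
        return sizes, f.read()
    finally:
        f.close()
        os.unlink(path)

result = truncate_demo()    # ([6, 4], b'abcd') on Python 3
```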

write

f.write(s)

Writes the bytes of string s to the file.

writelines

f.writelines(lst)

Like:

 for line in lst: f.write(line) 

It does not matter whether the strings in iterable lst are lines: despite its name, method writelines just writes each of the strings to the file, one after the other.
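Because writelines adds nothing between the strings, callers must supply any line terminators themselves. This sketch (writelines_demo is an illustrative name) also passes a generator, since any iterable of strings is accepted.

```python
import os
import tempfile

def writelines_demo(strings):
    """Write strings with writelines and return what ended up in the file."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    f = open(path, 'w')
    try:
        f.writelines(strings)        # writes each string as-is; adds no '\n'
    finally:
        f.close()
    f = open(path, 'r')
    try:
        return f.read()
    finally:
        f.close()
        os.unlink(path)

print(repr(writelines_demo(['a', 'b', 'c\n'])))            # 'abc\n'
print(repr(writelines_demo(str(i) for i in range(3))))     # '012'
```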


10.3.3. Iteration on File Objects

A file object f, open for text-mode reading, is also an iterator whose items are the file's lines. Thus, the loop:

 for line in f: 

iterates on each line of the file. Due to buffering issues, interrupting such a loop prematurely (e.g., with break), or calling f.next( ) instead of f.readline( ), leaves the file's position set to an arbitrary value. If you want to switch from using f as an iterator to calling other reading methods on f, be sure to set the file's position to a known value by appropriately calling f.seek. On the plus side, a loop directly on f has very good performance, since these specifications allow the loop to use internal buffering to minimize I/O without taking up excessive amounts of memory even for huge files.
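The sketch below follows the advice above: it breaks out of a loop over f, then calls f.seek(0) before switching to f.readline. The function find_then_reread and the sample file contents are made up for illustration.

```python
import os
import tempfile

def find_then_reread(path, word):
    """Return the first line containing word, plus the file's first line."""
    f = open(path, 'r')
    try:
        found = None
        for line in f:               # f is an iterator over its own lines
            if word in line:
                found = line
                break                # leaves f's position unpredictable
        f.seek(0)                    # reset to a known position...
        header = f.readline()        # ...before using other reading methods
        return found, header
    finally:
        f.close()

fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
f = open(path, 'w')
f.write('header\nalpha\nbeta\ngamma\n')
f.close()
try:
    result = find_then_reread(path, 'beta')   # ('beta\n', 'header\n')
finally:
    os.unlink(path)
```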

10.3.4. File-Like Objects and Polymorphism

An object x is file-like when it behaves polymorphically to a file, meaning that a function (or some other part of a program) can use x as if x were a file. Code using such an object (known as the client code of the object) typically receives the object as an argument or gets it by calling a factory function that returns the object as the result. For example, if the only method that client code calls on x is x.read( ), without arguments, then all x needs to supply in order to be file-like for that code is a method read that is callable without arguments and returns a string. Other client code may need x to implement a larger subset of file methods. File-like objects and polymorphism are not absolute concepts: they are relative to demands placed on an object by some specific client code.

Polymorphism is a powerful aspect of object-oriented programming, and file-like objects are a good example of polymorphism. A client-code module that writes to or reads from files can automatically be reused for data residing elsewhere, as long as the module does not break polymorphism by the dubious practice of type testing. When we discussed the built-ins type and isinstance in type on page 157 and isinstance on page 163, I mentioned that type testing is often best avoided, since it blocks the normal polymorphism that Python otherwise supplies. Sometimes, you may have no choice. For example, the marshal module (covered in "The marshal Module" on page 278) demands real file objects. Therefore, when your client code needs to use marshal, your code must deal with real file objects, not just file-like ones. However, such situations are rare. Most often, to support polymorphism in your client code, all you have to do is avoid type testing.

You can implement a file-like object by coding your own class (as covered in Chapter 5) and defining the specific methods needed by client code, such as read. A file-like object fl need not implement all the attributes and methods of a true file object f. If you can determine which methods client code calls on fl, you can choose to implement only that subset. For example, when fl is only going to be written, fl doesn't need "reading" methods, such as read, readline, and readlines.
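A minimal sketch of polymorphism through a file-like object. UpperCaseSink is a hypothetical class that supplies only write, which is all the client function write_report (also hypothetical) calls; because write_report never type-tests its argument, it works unchanged with a real file object.

```python
import os
import tempfile

class UpperCaseSink(object):
    """Write-only file-like object: supplies just the method the client calls."""
    def __init__(self):
        self.chunks = []

    def write(self, s):
        self.chunks.append(s.upper())

    def getvalue(self):
        return ''.join(self.chunks)

def write_report(out, items):
    """Client code: calls only out.write and never type-tests out."""
    for item in items:
        out.write('%s\n' % item)

# The same client code works on a real file...
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
f = open(path, 'w')
write_report(f, ['spam', 'eggs'])
f.close()
f = open(path, 'r')
from_file = f.read()                 # 'spam\neggs\n'
f.close()
os.unlink(path)

# ...and on the file-like object.
sink = UpperCaseSink()
write_report(sink, ['spam', 'eggs'])
from_sink = sink.getvalue()          # 'SPAM\nEGGS\n'
```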

When you implement a writable file-like object fl, make sure that fl.softspace can be read and written, and don't alter or interpret softspace in any way, if you want fl to be usable by print (covered in "The print Statement" on page 256). Note that this behavior is the default when you write fl's class in Python. You need to take specific care only when fl's class overrides special method __setattr__, or otherwise controls access to its instances' attributes (e.g., by defining __slots__), as covered in Chapter 5. In particular, if your new-style class defines __slots__, then one of the slots must be named softspace if you want instances of your class to be usable as destinations of print statements.

If the main reason you want a file-like object instead of a real file object is to keep the data in memory, use modules StringIO and cStringIO, covered in "The StringIO and cStringIO Modules" on page 229. These modules supply file-like objects that hold data in memory and behave polymorphically to file objects to a wide extent.
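A sketch of an in-memory file-like object used by client code that only iterates over lines. It imports cStringIO when available (Python 2) and falls back to io.StringIO, an assumption about running on Python 3, where the StringIO and cStringIO modules no longer exist. count_lines is a hypothetical helper.

```python
try:
    from cStringIO import StringIO   # Python 2, as described in this section
except ImportError:
    from io import StringIO          # assumption: Python 3 equivalent

def count_lines(f):
    """Client code that only iterates over f, so any file-like object works."""
    n = 0
    for line in f:
        n += 1
    return n

buf = StringIO()                     # an in-memory file opened for writing
buf.write('alpha\n')
buf.write('beta\n')
text = buf.getvalue()                # everything written so far: 'alpha\nbeta\n'
print(count_lines(StringIO(text)))   # 2
```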

10.3.5. The tempfile Module

The tempfile module lets you create temporary files and directories in the most secure manner afforded by your platform. Temporary files are often an excellent solution when you're dealing with an amount of data that might not comfortably fit in memory, or when your program needs to write data that another process will then use.

The order of the parameters for the functions in this module is a bit confusing: to make your code more readable, always call these functions with named-argument syntax. Module tempfile exposes the following functions.

mkstemp

mkstemp(suffix=None, prefix=None, dir=None, text=False)

Securely creates a new temporary file, readable and writable only by the current user, not executable, not inherited by subprocesses; returns a pair (fd, path), where fd is the file descriptor of the temporary file (as returned by os.open, covered in open on page 231) and string path is the absolute path to the temporary file. You can optionally pass arguments to specify strings to use as the start (prefix) and end (suffix) of the temporary file's filename, and the path to the directory in which the temporary file is created (dir); if you want to use the temporary file as a text file, you should explicitly pass the argument text=True. Ensuring that the temporary file is removed when you're done using it is up to you. Here is a typical usage example that creates a temporary text file, closes it, passes its path to another function, and finally ensures the file is removed:

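    import tempfile, os
    fd, path = tempfile.mkstemp(suffix='.txt', text=True)
    try:
        os.close(fd)          # close the descriptor; only the path is needed
        use_filepath(path)    # use_filepath: any function that needs the file's path
    finally:
        os.unlink(path)       # remove the file even if use_filepath raises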
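If you want to write to the file immediately rather than close the descriptor, one option is to wrap fd in a file object with os.fdopen (which TemporaryFile also uses, as described below). In this sketch, closing the file object also closes fd, so there is no separate os.close; the file is still removed explicitly.

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.txt', text=True)
try:
    f = os.fdopen(fd, 'w')           # wrap the descriptor; f now owns fd
    try:
        f.write('hello\n')
    finally:
        f.close()                    # also closes fd
    f = open(path, 'r')              # reopen by path to check the contents
    try:
        contents = f.read()          # 'hello\n'
    finally:
        f.close()
finally:
    os.unlink(path)                  # removal is still up to you
```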

mkdtemp

mkdtemp(suffix=None, prefix=None, dir=None)

Securely creates a new temporary directory that is readable, writable, and searchable only by the current user, and returns the absolute path to the temporary directory. The optional arguments suffix, prefix, and dir have the same meaning as for function mkstemp. Ensuring that the temporary directory is removed when you're done using it is your program's responsibility. Here is a typical usage example that creates a temporary directory, passes its path to another function, and finally ensures the directory is removed together with all of its contents:

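    import tempfile, shutil
    path = tempfile.mkdtemp()
    try:
        use_dirpath(path)     # use_dirpath: any function that needs the directory's path
    finally:
        shutil.rmtree(path)   # remove the directory and everything in it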

TemporaryFile

TemporaryFile(mode='w+b', bufsize=-1, suffix=None, prefix=None, dir=None)

Creates a temporary file with mkstemp (passing to mkstemp the optional arguments suffix, prefix, and dir), makes a file object from it with os.fdopen as covered in fdopen on page 254 (passing to fdopen the optional arguments mode and bufsize), and returns the file object (or a file-like wrapper around it). The temporary file is removed as soon as the file object is closed (implicitly or explicitly). For greater security, the temporary file has no name on the filesystem, if your platform allows that (Unix-like platforms do; Windows doesn't).
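A sketch of a typical use: spill data to a TemporaryFile, rewind, and read it back. It leaves mode and bufsize at their defaults (the default mode is 'w+b', so it writes byte strings); on Python 3 the bufsize parameter is named buffering. spill_and_reread is a hypothetical name.

```python
import tempfile

def spill_and_reread(chunks):
    """Write chunks to an anonymous temporary file, then read them all back."""
    f = tempfile.TemporaryFile()     # default mode 'w+b': read and write, binary
    try:
        for chunk in chunks:
            f.write(chunk)
        f.seek(0)                    # rewind before reading what was written
        return f.read()
    finally:
        f.close()                    # closing removes the temporary file

data = spill_and_reread([b'ab', b'cd'])   # b'abcd' on Python 3
```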

NamedTemporaryFile

NamedTemporaryFile(mode='w+b', bufsize=-1, suffix=None, prefix=None, dir=None)

Like TemporaryFile, except that the temporary file does have a name on the filesystem. Use the name attribute of the file or file-like object to access that name. Some platforms, mainly Windows, do not allow the file to be opened again: therefore, the usefulness of the name is limited if you want to ensure your program works cross-platform. If you need to pass the temporary file's name to another program that opens the file, use function mkstemp, instead of NamedTemporaryFile, to guarantee correct cross-platform behavior. Of course, when you choose to use mkstemp, you do have to take care to ensure the file is removed when you're done with it.
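A sketch checking that a NamedTemporaryFile has a visible name while it is open and disappears once closed. It never reopens the file by name, which is the part that is not portable to Windows.

```python
import os
import tempfile

f = tempfile.NamedTemporaryFile(suffix='.dat')   # default mode 'w+b'
try:
    f.write(b'payload')
    f.flush()
    name = f.name                    # a real path on the filesystem
    exists_while_open = os.path.exists(name)
finally:
    f.close()                        # closing also removes the file
exists_after_close = os.path.exists(name)
```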





Python in a Nutshell, Second Edition (In a Nutshell)
ISBN: 0596100469
Year: 2004
Pages: 192
Authors: Alex Martelli
