9.2. File Built-in Functions [open() and file()]

As the key to opening file doors, the open() [and file()] built-in function provides a general interface to initiate the file input/output (I/O) process. The open() BIF returns a file object if the file is opened successfully; otherwise it raises an error. When a failure occurs, Python raises an IOError exception (we will cover errors and exceptions in the next chapter). The basic syntax of the open() built-in function is:

file_object = open(file_name, access_mode='r', buffering=-1)
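As a minimal sketch of the failure case described above, the snippet below tries to open a file that cannot exist, then opens the same name with 'w'. The helper try_open() and the file name are hypothetical, made up for illustration. The code catches IOError, which runs on Python 2.6+ and on Python 3, where IOError is kept as an alias of OSError.

```python
import os
import tempfile

def try_open(file_name, access_mode='r'):
    """Hypothetical helper: return (file_object, None) on success or (None, error) on failure."""
    try:
        return open(file_name, access_mode), None
    except IOError as e:  # open() reports failure by raising IOError
        return None, e

# A hypothetical file name inside a brand-new, empty directory, so it cannot exist yet
missing = os.path.join(tempfile.mkdtemp(), 'no_such_file.txt')

f_read, err_read = try_open(missing)         # access_mode defaults to 'r': the file must exist
print(f_read)                                # None
print(type(err_read).__name__)               # exact class name varies by Python version

f_write, err_write = try_open(missing, 'w')  # 'w' creates the file, so this succeeds
f_write.close()
print(err_write)                             # None
```

Returning the error instead of letting it propagate is only a convenience for the demonstration. Real code usually handles the exception where it can do something useful about it.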


The file_name is a string containing the name of the file to open. It can be a relative or an absolute (full) pathname. The optional access_mode argument is also a string, consisting of a set of flags indicating which mode to open the file with. Generally, files are opened with the modes 'r', 'w', or 'a', representing read, write, and append, respectively. A 'U' mode also exists for universal NEWLINE support (see below).

Any file opened with mode 'r' or 'U' must already exist. Any file opened with 'w' is truncated first if it exists, and then the file is (re)created. Any file opened with 'a' is opened for append: all writes go to the end of the file, even if you seek elsewhere in the meantime. If the file does not exist, it is created, just as if you had opened it in 'w' mode. If you are a C programmer, these are the same file open modes used for the C library function fopen().
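The rules for 'w' and 'a' can be checked with a short sketch on a scratch file (the file name is hypothetical). It shows that 'a' creates a missing file, that a write in 'a' mode lands at end-of-file even after seek(0), and that 'w' truncates whatever was there before.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')  # hypothetical scratch file

# 'a' on a missing file creates it, just like 'w'
f = open(path, 'a')
f.write('first\n')
f.close()

# 'a' always writes at end-of-file, even after seeking back to the start
f = open(path, 'a')
f.seek(0)
f.write('second\n')
f.close()

f = open(path)              # no access_mode given, so it defaults to 'r'
after_append = f.read()
f.close()
print(repr(after_append))   # 'first\nsecond\n'

# 'w' truncates the existing file before writing
f = open(path, 'w')
f.write('third\n')
f.close()

f = open(path)
after_write = f.read()
f.close()
print(repr(after_write))    # 'third\n'
```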

There are other modes supported by fopen() that will work with Python's open(). These include the '+' for read-write access and 'b' for binary access. One note regarding the binary flag: 'b' is antiquated on all Unix systems that are POSIX-compliant (including Linux) because they treat all files as binary files, including text files. Here is an entry from the Linux manual page for fopen(), from which the Python open() function is derived:

The mode string can also include the letter "b" either as a last character or as a character between the characters in any of the two-character strings described above. This is strictly for compatibility with ANSI C3.159-1989 ("ANSI C") and has no effect; the "b" is ignored on all POSIX conforming systems, including Linux. (Other systems may treat text files and binary files differently, and adding the "b" may be a good idea if you do I/O to a binary file and expect that your program may be ported to non-Unix environments.)

You will find a complete list of file access modes, including the use of 'b' if you choose to use it, in Table 9.1. If access_mode is not given, it defaults to 'r'.

Table 9.1. Access Modes for File Objects

File Mode     Operation
r             Open for read
rU or U[a]    Open for read with universal NEWLINE support (PEP 278)
w             Open for write (truncate if necessary)
a             Open for append (always works from EOF, create if necessary)
r+            Open for read and write
w+            Open for read and write (see w above)
a+            Open for read and write (see a above)
rb            Open for binary read
wb            Open for binary write (see w above)
ab            Open for binary append (see a above)
rb+           Open for binary read and write (see r+ above)
wb+           Open for binary read and write (see w+ above)
ab+           Open for binary read and write (see a+ above)

[a] New in Python 2.3.
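To make the read-write and binary rows of Table 9.1 concrete, here is a small sketch using 'wb+', 'rb+', and 'rb' on a scratch file whose name is hypothetical. The seek(0) after writing rewinds the file so the data just written can be read back through the same file object, and 'rb+' overwrites bytes in place without truncating the rest of the file.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'table_demo.bin')  # hypothetical scratch file

# 'wb+': create/truncate, then both write and read in binary mode
f = open(path, 'wb+')
f.write(b'hello world')
f.seek(0)                 # rewind before reading back what was just written
whole = f.read()
f.close()

# 'rb+': the file must already exist; overwrite bytes in place, no truncation
f = open(path, 'rb+')
f.seek(6)                 # position at the start of "world"
f.write(b'WORLD')
f.close()

# 'rb': plain binary read to check the result
f = open(path, 'rb')
updated = f.read()
f.close()
```

Opening the same file with 'wb+' instead of 'rb+' in the second step would have emptied it first, which is exactly the difference the "(see w above)" and "(see r+ above)" notes in the table point to.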

The other optional argument, buffering, indicates the type of buffering to perform when accessing the file. A value of 0 means no buffering, a value of 1 signals line buffering, and any value greater than 1 requests buffered I/O with the given value as the buffer size. If buffering is omitted or negative, the system default is used: line buffering for a teletype (tty-like) device and normal buffering for everything else. Under normal circumstances, buffering is not given, so the system default applies.
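The effect of buffering can be observed by reading a file through a second, independent file object while the first one is still open. In this sketch (scratch file name hypothetical), an unbuffered writer's data is visible at once, while a writer using the default buffering is flushed explicitly before checking. It uses binary mode because Python 3 refuses buffering=0 for text files.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'buffer_demo.bin')  # hypothetical scratch file

def contents(p):
    """Read the whole file through a separate file object."""
    f = open(p, 'rb')
    data = f.read()
    f.close()
    return data

# buffering=0: no buffering, so each write() goes straight to the OS
w = open(path, 'wb', 0)
w.write(b'abc')
unbuffered_view = contents(path)   # the second reader already sees the data
w.close()

# buffering omitted: the system default; data may sit in the buffer until flush()
w = open(path, 'wb')
w.write(b'xyz')
w.flush()                          # push the buffered data out to the OS
flushed_view = contents(path)
w.close()
```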

Here are some examples for opening files:

fp = open('/etc/motd')         # open file for read
fp = open('test', 'w')         # open file for write
fp = open('data', 'r+')        # open file for read/write
fp = open(r'c:\io.sys', 'rb')  # open binary file for read


9.2.1. The file() Factory Function

The file() built-in function came into being in Python 2.2, during the unification of types and classes. At that time, many built-in types that did not have associated built-in functions were given factory functions to create instances of those objects, e.g., dict(), bool(), file(), etc., to go along with those that did, e.g., list(), str(), etc.

Both open() and file() do exactly the same thing and one can be used in place of the other. Anywhere you see references to open(), you can mentally substitute file() without any side effects whatsoever.

Throughout the Python 2 series, open() and file() exist side by side and perform exactly the same thing (file() was removed in Python 3). Generally, the accepted style is to use open() for reading and writing files, while file() is best used when you want to show that you are dealing with file objects, e.g., if isinstance(f, file).

9.2.2. Universal NEWLINE Support (UNS)

In an upcoming Core Note sidebar, we describe how certain attributes of the os module can help you navigate files across different platforms, which terminate lines with different endings, i.e., \n, \r, or \r\n. Well, the Python interpreter has to do the same thing, too; the most critical place is when importing modules. Wouldn't it be nicer if you could just have Python treat all files the same way?

That is the whole point of UNS, introduced in Python 2.3 and spurred by PEP 278. When you use the 'U' flag to open a file, all line separators (or terminators) are returned by any file input method, i.e., the read*() methods, as a NEWLINE character (\n), regardless of what the line-endings are. (The 'rU' mode is also supported to correlate with the 'rb' option.) This feature also supports files that mix several types of line-endings. A file.newlines attribute tracks the types of line separation characters "seen."

If the file has just been opened and no line-endings have been seen yet, file.newlines is None. After the first line, it is set to the terminator of the first line, and if another type of line-ending is seen, file.newlines becomes a tuple containing each type seen. Note that UNS applies only to reading text files; there is no equivalent handling of file output.
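A minimal sketch of UNS at work: write a file whose three lines end in \r\n, \r, and \n (written in binary mode so no translation happens on the way out), then read it back with 'rU' and inspect file.newlines before and after reading. The file name is hypothetical. The fallback to plain 'r' is an assumption made so the sketch also runs on Python 3.11 and later, which no longer accept 'U' and apply the same newline translation in ordinary text mode. The tuple in file.newlines is not guaranteed to list the endings in the order they appeared, so the example sorts it.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'mixed_endings.txt')  # hypothetical scratch file

# Three lines with three different terminators: \r\n, \r, and \n
f = open(path, 'wb')
f.write(b'one\r\ntwo\rthree\n')
f.close()

try:
    f = open(path, 'rU')   # universal NEWLINE support
except ValueError:
    f = open(path, 'r')    # assumption: Python 3.11+ rejects 'U'; text mode translates anyway
before = f.newlines        # nothing read yet
text = f.read()
after = f.newlines         # every type of line-ending seen during the read
f.close()

print(repr(text))          # 'one\ntwo\nthree\n'
print(before)              # None
print(sorted(after))
```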

UNS is turned on by default when Python is built. If you do not wish to have this feature, you can disable it by using the --without-universal-newlines switch when running Python's configure script. If you must manage the line-endings yourself, then check out the Core Note and use those os module attributes!



Core Python Programming (2nd Edition)
ISBN: 0132269937
Year: 2004
Pages: 334
Authors: Wesley J Chun