10.4. Auxiliary Modules for File I/O

File objects supply the minimal functionality needed for file I/O. Some auxiliary Python library modules, however, offer convenient supplementary functionality that makes I/O easier in several important cases.

10.4.1. The fileinput Module

The fileinput module lets you loop over all the lines in a list of text files. Performance is good, comparable to the performance of direct iteration on each file, since fileinput uses buffering to minimize I/O. You can therefore use module fileinput for line-oriented file input whenever you find the module's rich functionality convenient, with no worries about performance. The input function is the key function of module fileinput, and the module also provides a FileInput class whose methods support the same functionality as the module's functions.

close

close( )

Closes the whole sequence so that iteration stops and no file remains open.

FileInput

class FileInput(files=None, inplace=False, backup='', bufsize=0)

Creates and returns an instance f of class FileInput. Arguments are the same as for fileinput.input, and methods of f have the same names, arguments, and semantics as functions of module fileinput. f also supplies a method readline, which reads and returns the next line. You can use class FileInput explicitly when you want to nest or mix loops that read lines from more than one sequence of files.
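As a minimal sketch of mixing two independent sequences, the following pairs up lines from two files by looping over one FileInput instance while calling readline on another. The helper make_file and the file contents are hypothetical, created only to keep the example self-contained.

```python
import fileinput
import os
import tempfile


def make_file(text):
    """Hypothetical helper: write text to a new temporary file, return its path."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(fd, 'w') as f:
        f.write(text)
    return path


names_path = make_file('alpha\nbeta\n')
values_path = make_file('1\n2\n')

# Each instance keeps its own state, unlike the module-level functions,
# which all operate on one shared FileInput instance.
names = fileinput.FileInput(files=[names_path])
values = fileinput.FileInput(files=[values_path])

pairs = []
for name in names:
    value = values.readline()  # '' once the second sequence is exhausted
    pairs.append((name.rstrip('\n'), value.rstrip('\n')))

names.close()
values.close()
os.remove(names_path)
os.remove(values_path)
```

Because neither instance is the module's global state, the two sequences advance independently of each other.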

filelineno

filelineno( )

Returns the number of lines read so far from the file now being read. For example, returns 1 if the first line has just been read from the current file.

filename

filename( )

Returns the name of the file being read, or None if no line has been read yet.

input

input(files=None, inplace=False, backup='', bufsize=0)

Returns the sequence of lines in the files, suitable for use in a for loop. files is a sequence of filenames to open and read one after the other, in order. Filename '-' means standard input (sys.stdin). If files is a string, it's a single filename to open and read. If files is None, input uses sys.argv[1:] as the list of filenames. If the sequence of filenames is empty, input reads sys.stdin.

The sequence object that input returns is an instance of class FileInput; that instance is also the global state of module fileinput, so all other functions of module fileinput operate on the same shared state. Each function of module fileinput corresponds directly to a method of class FileInput.

When inplace is false (the default), input just reads the files. When inplace is true, input moves each file being read (except standard input) to a backup file and redirects standard output (sys.stdout) to write to a new file with the same path as the original one of the file being read. This way, you can simulate overwriting files in-place. If backup is a string that starts with a dot, input uses backup as the extension of the backup files and does not remove the backup files. If backup is an empty string (the default), input uses .bak and deletes each backup file as the input files are closed.

bufsize is the size of the internal buffer that input uses to read lines from the input files. If bufsize is 0, input uses a buffer of 8,192 bytes.
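The in-place behavior of input can be sketched as follows: the loop upper-cases every line of a file, and whatever print writes during the loop lands in the replacement file. The temporary file, its contents, and the '.orig' backup extension are assumptions chosen for illustration.

```python
import fileinput
import os
import tempfile

# Create a small throwaway file to rewrite.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('first\nsecond\n')

# While the loop runs, sys.stdout is redirected to the new file at `path`,
# and the original content is kept in `path + '.orig'`.
for line in fileinput.input(files=[path], inplace=True, backup='.orig'):
    print(line.rstrip('\n').upper())
fileinput.close()

with open(path) as f:
    rewritten = f.read()
with open(path + '.orig') as f:
    original = f.read()

os.remove(path)
os.remove(path + '.orig')
```

Passing an explicit backup extension keeps the backup file around, which is a reasonable safety net while developing a filter of this kind.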

isfirstline

isfirstline( )

Returns True or False, just like filelineno( )==1.

isstdin

isstdin( )

Returns True if the current file being read is sys.stdin; otherwise, False.

lineno

lineno( )

Returns the total number of lines read since the call to input.

nextfile

nextfile( )

Closes the file being read so that the next line to read is the first one of the next file.
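The module-level functions described in this section can be combined as in the following sketch, which records the overall and per-file line numbers while looping over two files and skips the rest of the first file partway through. The helper make_file and the file contents are hypothetical.

```python
import fileinput
import os
import tempfile


def make_file(text):
    """Hypothetical helper: write text to a new temporary file, return its path."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(fd, 'w') as f:
        f.write(text)
    return path


paths = [make_file('a1\na2\na3\n'), make_file('b1\nb2\n')]

records = []
for line in fileinput.input(files=paths):
    records.append((fileinput.lineno(),       # lines read since input was called
                    fileinput.filelineno(),   # lines read from the current file
                    fileinput.isfirstline(),
                    line.rstrip('\n')))
    if line.startswith('a2'):
        fileinput.nextfile()  # skip the rest of the first file
fileinput.close()

for p in paths:
    os.remove(p)
```

Note that the skipped line 'a3' is never read, so it does not count toward lineno.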


10.4.2. The linecache Module

The linecache module lets you read a given line (specified by number) from a file with a given name, keeping an internal cache so that if you read several lines from a file, it's faster than opening and examining the file each time. Module linecache exposes the following functions.

checkcache

checkcache( )

Ensures that the module's cache holds no stale data and reflects what's on the filesystem. Call checkcache when the files you're reading may have changed on the filesystem to ensure that future calls to getline return updated information.

clearcache

clearcache( )

Drops the module's cache so that the memory can be reused for other purposes. Call clearcache when you know you don't need to perform any more reading for a while.

getline

getline(filename, lineno)

Reads and returns line number lineno (the first line is 1, not 0 as is usual in Python) from the text file named filename, including the trailing \n. For any error, getline does not raise exceptions but rather returns the empty string ''. If filename is not found, getline looks for the file in the directories listed in sys.path.

getlines

getlines(filename)

Reads and returns all lines from the text file named filename, as a list of strings, each including the trailing \n. For any error, getlines does not raise exceptions but rather returns the empty list []. If filename is not found, getlines looks for the file in the directories listed in sys.path.
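The functions of module linecache can be exercised together as in this sketch, which reads single lines by number, shows the error-free return values, and calls checkcache after the file changes on disk. The temporary file and its contents are assumptions made for illustration.

```python
import linecache
import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('one\ntwo\nthree\n')

second = linecache.getline(path, 2)       # line numbers start at 1
missing = linecache.getline(path, 99)     # out of range: no exception
all_lines = linecache.getlines(path)
no_file = linecache.getlines(path + '.does-not-exist')

# Change the file on disk, then ask the module to drop stale entries.
with open(path, 'w') as f:
    f.write('uno\ndos\n')
linecache.checkcache()
updated = linecache.getline(path, 1)

linecache.clearcache()
os.remove(path)
```

Without the call to checkcache, getline could keep returning the lines cached from the first version of the file.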


10.4.3. The struct Module

The struct module lets you pack binary data into a string, and unpack the bytes of such a string back into the data they represent. Such operations are useful for many kinds of low-level programming. Most often, you use module struct to interpret data records from binary files that have some specified format, or to prepare records to write to such binary files. The module's name comes from C's keyword struct, which is usable for related purposes. On any error, functions of module struct raise exceptions that are instances of exception class struct.error, the only class the module supplies.

Module struct relies on struct format strings, which are plain strings with a specific syntax. The first character of a format string specifies byte order, size, and alignment of packed data:


@

Native byte order, native data sizes, and native alignment for the current platform; this is the default if the first character is none of the characters listed here (note that format P in Table 10-1 is available only for this kind of struct format string)


=

Native byte order for the current platform, but standard size and alignment


<

Little-endian byte order (like Intel platforms); standard size and alignment


>, !

Big-endian byte order (network standard); standard size and alignment

Table 10-1. Format characters for struct

Character  C type           Python type     Standard size
B          unsigned char    int             1 byte
b          signed char      int             1 byte
c          char             str (length 1)  1 byte
d          double           float           8 bytes
f          float            float           4 bytes
H          unsigned short   int             2 bytes
h          signed short     int             2 bytes
I          unsigned int     long            4 bytes
i          signed int       int             4 bytes
L          unsigned long    long            4 bytes
l          signed long      int             4 bytes
P          void*            int             N/A
p          char[ ]          String          N/A
s          char[ ]          String          N/A
x          padding byte     no value        1 byte


Standard sizes are indicated in Table 10-1. Standard alignment means no forced alignment, with explicit padding bytes used if needed. Native sizes and alignment are whatever the platform's C compiler uses. Native byte order is either little-endian or big-endian, depending on the platform.
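The effect of the first character of a format string can be sketched as follows. The example assumes a Python 3 interpreter, where packed data is a bytes object rather than a plain string; the value 258 (0x0102) is chosen only because its two bytes differ.

```python
import struct

# Byte order: the same value packed as an unsigned short.
little = struct.pack('<H', 258)    # low byte first
big = struct.pack('>H', 258)       # high byte first
network = struct.pack('!H', 258)   # same as '>'

# Size and alignment: one signed char followed by one signed int.
standard_size = struct.calcsize('<bi')  # 1 + 4 bytes, no padding
native_size = struct.calcsize('@bi')    # platform-dependent, may include padding
default_size = struct.calcsize('bi')    # no first character means '@'
```

On many platforms native_size is 8 rather than 5, because the C compiler aligns the int on a 4-byte boundary; the exact value depends on the platform.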

After the optional first character, a format string is made up of one or more format characters, each optionally preceded by a count (an integer represented by decimal digits). (The format characters are shown in Table 10-1.) For most format characters, the count means repetition (e.g., '3h' is exactly the same as 'hhh'). When the format character is s or p (i.e., a string), the count is not a repetition, but rather the total number of bytes occupied by the string. Whitespace can be freely and innocuously used between formats, but not between a count and its format character.

Format s means a fixed-length string as long as its count (the Python string is truncated, or padded with copies of the null character '\0', if needed). Format p means a Pascal-like string: the first byte is the number of significant characters, and the characters start from the second byte. The count is the total number of bytes, including the length byte.
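The counting rules for ordinary formats and for s and p can be sketched as follows. The example assumes a Python 3 interpreter, where s and p take and return bytes objects; the sample values are arbitrary.

```python
import struct

# For most formats the count is a repetition, and whitespace between
# formats is ignored.
repeated = struct.pack('<3h', 1, 2, 3)
spelled_out = struct.pack('<h h h', 1, 2, 3)

# For s the count is the total field width in bytes.
padded = struct.pack('5s', b'hi')        # padded with null bytes
truncated = struct.pack('3s', b'hello')  # cut to fit

# For p the first byte holds the length, so at most count-1 bytes of data fit.
pascal = struct.pack('5p', b'hi')
pascal_cut = struct.pack('3p', b'hello')
(round_trip,) = struct.unpack('5p', pascal)
```

Unpacking a p field returns only the significant bytes, while unpacking an s field returns the full width including any padding.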

Module struct supplies the following functions.

calcsize

calcsize(fmt)

Returns the size in bytes of the structure corresponding to struct format string fmt.

pack

pack(fmt, *values)

Packs the given values according to struct format string fmt and returns the resulting string. values must match, in number and type, the values required by fmt.

unpack

unpack(fmt, s)

Unpacks binary string s according to struct format string fmt and returns a tuple of values. len(s) must be equal to struct.calcsize(fmt).
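A typical use of these three functions is reading and writing fixed-size records in a binary file, as in the following sketch. The record layout (an unsigned int id, an 8-byte name, and a double score) and the sample data are hypothetical, and the code assumes a Python 3 interpreter, where packed data is a bytes object.

```python
import os
import struct
import tempfile

RECORD_FORMAT = '<I8sd'  # hypothetical layout: id, 8-byte name, score
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)

# Write two records to a throwaway binary file.
fd, path = tempfile.mkstemp(suffix='.bin')
with os.fdopen(fd, 'wb') as f:
    f.write(struct.pack(RECORD_FORMAT, 1, b'alice', 2.5))
    f.write(struct.pack(RECORD_FORMAT, 2, b'bob', 4.0))

# Read the file back one record at a time.
records = []
with open(path, 'rb') as f:
    while True:
        chunk = f.read(RECORD_SIZE)
        if len(chunk) < RECORD_SIZE:
            break
        ident, raw_name, score = struct.unpack(RECORD_FORMAT, chunk)
        # Strip the null padding that format s added when packing.
        records.append((ident, raw_name.rstrip(b'\0').decode('ascii'), score))
os.remove(path)

# A string of the wrong length makes unpack raise struct.error.
try:
    struct.unpack(RECORD_FORMAT, b'too short')
    length_error = False
except struct.error:
    length_error = True
```

Using an explicit byte-order character such as '<' keeps the record size and layout the same on every platform, which matters when the file is shared between machines.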





Python in a Nutshell, Second Edition (In a Nutshell)
ISBN: 0596100469
EAN: 2147483647
Year: 2004
Pages: 192
Authors: Alex Martelli