3.9. Standard Streams
The sys module is also the place where the standard input, output, and error streams of your Python programs live:
>>> for f in (sys.stdin, sys.stdout, sys.stderr): print f
...
<open file '<stdin>', mode 'r' at 762210>
<open file '<stdout>', mode 'w' at 762270>
<open file '<stderr>', mode 'w' at 7622d0>
The standard streams are simply preopened Python file objects that are automatically connected to your program's standard streams when Python starts up. By default, all of them are tied to the console window where Python (or a Python program) was started. Because the print statement and the raw_input function are really nothing more than user-friendly interfaces to the standard output and input streams, using them is similar to using stdout and stdin in sys directly:
>>> print 'hello stdout world'
hello stdout world
>>> sys.stdout.write('hello stdout world' + '\n')
hello stdout world
>>> raw_input('hello stdin world>')
hello stdin world>spam
'spam'
>>> print 'hello stdin world>',; sys.stdin.readline( )[:-1]
hello stdin world>eggs
'eggs'
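To make the relationship concrete, here is a minimal sketch of print-like and raw_input-like helpers built on nothing but a stream's write and readline methods. The names show and ask are hypothetical, the code uses Python 3 syntax so that it runs on current interpreters, and in-memory io.StringIO objects stand in for the console.

```python
import io
import sys

def show(text, stream=None):
    """Roughly what print does: write the text plus a newline."""
    if stream is None:
        stream = sys.stdout
    stream.write(text + '\n')

def ask(prompt, instream=None, outstream=None):
    """Roughly what raw_input does: write a prompt, read one line."""
    if instream is None:
        instream = sys.stdin
    if outstream is None:
        outstream = sys.stdout
    outstream.write(prompt)
    line = instream.readline()
    if not line:                      # empty string means end-of-file
        raise EOFError('end of input')
    return line.rstrip('\n')          # drop the end-of-line character

# Demonstration with in-memory streams standing in for the console
fake_in = io.StringIO('spam\n')
fake_out = io.StringIO()
reply = ask('hello stdin world>', fake_in, fake_out)
show('got ' + reply, fake_out)
```

Called without stream arguments, both helpers fall back to sys.stdout and sys.stdin, which is the default behavior the text describes.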
3.9.1. Redirecting Streams to Files and Programs
Technically, standard output (and print) text appears in the console window where a program was started, standard input (and raw_input) text comes from the keyboard, and standard error text is used to print Python error messages to the console window. At least that's the default. It's also possible to redirect these streams both to files and to other programs at the system shell, as well as to arbitrary objects within a Python script. On most systems, such redirections make it easy to reuse and combine general-purpose command-line utilities.
3.9.1.1. Redirecting streams to files
Redirection is useful for things like canned (precoded) test inputs: we can apply a single test script to any set of inputs by simply redirecting the standard input stream to a different file each time the script is run. Similarly, redirecting the standard output stream lets us save and later analyze a program's output; for example, testing systems might compare the saved standard output of a script with a file of expected output to detect failures.
Although it's a powerful paradigm, redirection turns out to be straightforward to use. For instance, consider the simple read-evaluate-print loop program in Example 3-6.
Example 3-6. PP3E\System\Streams\teststreams.py
As usual, the interact function here is automatically executed when this file is run, not when it is imported. By default, running this file from a system command line makes the standard streams appear where you typed the Python command. The script simply reads numbers until it reaches end-of-file in the standard input stream (on Windows, end-of-file is usually the two-key combination Ctrl-Z; on Unix, type Ctrl-D instead):
C:\...\PP3E\System\Streams>python teststreams.py
Hello stream world
Enter a number>12
12 squared is 144
Enter a number>10
10 squared is 100
Enter a number>
Bye
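The listing for Example 3-6 is not reproduced here. As a rough sketch only, and not the original file, a function that behaves like the session above might be coded as follows. It uses Python 3 syntax (input and the print function in place of raw_input and the print statement); the message strings are taken from the transcript, and everything else is an assumption.

```python
def interact():
    """Read numbers from standard input until end-of-file, printing squares."""
    print('Hello stream world')
    while True:
        try:
            reply = input('Enter a number>')      # reads sys.stdin
        except EOFError:                          # Ctrl-Z/Ctrl-D or end of file
            break
        num = int(reply)
        print('%d squared is %d' % (num, num ** 2))
    print('Bye')
```

Calling interact( ) under the usual `if __name__ == '__main__':` test at the bottom of the file would make it run when the file is executed as a script but not when it is imported.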
But on both Windows and Unix-like platforms, we can redirect the standard input stream to come from a file with the < filename shell syntax. Here is a command session in a DOS console box on Windows that forces the script to read its input from a text file, input.txt. It's the same on Linux, but replace the DOS type command with a Unix cat command:
C:\...\PP3E\System\Streams>type input.txt
8
6

C:\...\PP3E\System\Streams>python teststreams.py < input.txt
Hello stream world
Enter a number>8 squared is 64
Enter a number>6 squared is 36
Enter a number>Bye
Here, the input.txt file automates the input we would normally type interactively; the script reads from this file rather than from the keyboard. Standard output can be similarly redirected to go to a file with the > filename shell syntax. In fact, we can combine input and output redirection in a single command:
C:\...\PP3E\System\Streams>python teststreams.py < input.txt > output.txt

C:\...\PP3E\System\Streams>type output.txt
Hello stream world
Enter a number>8 squared is 64
Enter a number>6 squared is 36
Enter a number>Bye
This time, the Python script's input and output are both mapped to text files, not to the interactive console session.
3.9.1.2. Chaining programs with pipes
On Windows and Unix-like platforms, it's also possible to send the standard output of one program to the standard input of another using the | shell character between two commands. This is usually called a "pipe" operation because the shell creates a pipeline that connects the output and input of two commands. Let's send the output of the Python script to the standard more command-line program's input to see how this works:
C:\...\PP3E\System\Streams>python teststreams.py < input.txt | more
Hello stream world
Enter a number>8 squared is 64
Enter a number>6 squared is 36
Enter a number>Bye
Here, teststreams's standard input comes from a file again, but its output (written by print statements) is sent to another program, not to a file or window. The receiving program is more, a standard command-line paging program available on Windows and Unix-like platforms. Because Python ties scripts into the standard stream model, though, Python scripts can be used on both ends. One Python script's output can always be piped into another Python script's input:
C:\...\PP3E\System\Streams>type writer.py
print "Help! Help! I'm being repressed!"
print 42

C:\...\PP3E\System\Streams>type reader.py
print 'Got this" "%s"' % raw_input( )
import sys
data = sys.stdin.readline( )[:-1]
print 'The meaning of life is', data, int(data) * 2

C:\...\PP3E\System\Streams>python writer.py | python reader.py
Got this" "Help! Help! I'm being repressed!"
The meaning of life is 42 84
This time, two Python programs are connected. Script reader gets input from script writer; both scripts simply read and write, oblivious to stream mechanics. In practice, such chaining of programs is a simple form of cross-program communications. It makes it easy to reuse utilities written to communicate via stdin and stdout in ways we never anticipated. For instance, a Python program that sorts stdin text could be applied to any data source we like, including the output of other scripts. Consider the Python command-line utility scripts in Examples 3-7 and 3-8 that sort and sum lines in the standard input stream.
Example 3-7. PP3E\System\Streams\sorter.py
The last command here connects three Python scripts by standard streams: the output of each prior script is fed to the input of the next via pipeline shell syntax.
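The listings for Examples 3-7 and 3-8 and the session that ran them are not reproduced here. As a rough sketch only, and not the original files, filters of this kind might look like the following. The code uses Python 3 syntax, and the function names sort_lines and add_lines are hypothetical; each takes its input and output streams as arguments so that the same logic can be exercised on in-memory text, while defaulting to the real standard streams.

```python
import io
import sys

def sort_lines(instream=None, outstream=None):
    """Read every input line at once, sort them, and write them out."""
    instream = sys.stdin if instream is None else instream
    outstream = sys.stdout if outstream is None else outstream
    lines = instream.readlines()          # the whole stream, as a list
    lines.sort()
    outstream.writelines(lines)

def add_lines(instream=None, outstream=None):
    """Sum one integer per input line, reading a line at a time."""
    instream = sys.stdin if instream is None else instream
    outstream = sys.stdout if outstream is None else outstream
    total = 0
    while True:
        line = instream.readline()
        if not line:                      # empty string means end-of-file
            break
        total += int(line)
    outstream.write('%d\n' % total)

# Chain the two filters in memory, much as a shell pipe chains two scripts
source = io.StringIO('123\n000\n999\n042\n')
middle = io.StringIO()
sort_lines(source, middle)
middle.seek(0)                            # rewind so the next stage can read
result = io.StringIO()
add_lines(middle, result)
```

Run as separate scripts that call these functions with no arguments, the two stages could be connected at the shell with the | syntax instead of the in-memory buffer used here.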
3.9.1.3. Coding alternatives
A few coding pointers here: if you look closely, you'll notice that sorter reads all of stdin at once with the readlines method, but adder reads one line at a time. If the input source is another program, some platforms run programs connected by pipes in parallel. On such systems, reading line by line works better if the data streams being passed about are large because readers don't have to wait until writers are completely finished to get busy processing data. Because raw_input just reads stdin, the line-by-line scheme used by adder can always be coded with sys.stdin too:
C:\...\PP3E\System\Streams>type adder2.py
import sys
sum = 0
while True:
    line = sys.stdin.readline( )
    if not line: break
    sum += int(line)
print sum
This version utilizes the fact that int allows the digits to be surrounded by whitespace (readline returns a line including its \n, but we don't have to use [:-1] or rstrip( ) to remove it for int). In fact, we can use Python's more recent file iterators to achieve the same effect: the for loop, for example, automatically grabs one line each time through when we iterate over a file object directly (more on file iterators in the next chapter):
C:\...\PP3E\System\Streams>type adder3.py
import sys
sum = 0
for line in sys.stdin: sum += int(line)
print sum
Changing sorter to read line by line this way may not be a big performance boost, though, because the list sort method requires that the list already be complete. As we'll see in Chapter 20, manually coded sort algorithms are likely to be much slower than the Python list sorting method.
Interestingly, these two scripts can also be coded in a much more compact fashion in Python 2.4 by using the new sorted function, list comprehensions, and file iterators. The following work the same way as the originals:
C:\...\PP3E\System\Streams>type sorter24.py
import sys
for line in sorted(sys.stdin): print line,

C:\...\PP3E\System\Streams>type adder24.py
import sys
print sum(int(line) for line in sys.stdin)
The latter of these employs a generator expression, which is much like a list comprehension, but results are returned one at a time, not in a physical list. The net effect is space optimization.
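The difference between the two forms can be seen side by side in a short sketch. The function names here are hypothetical, the code uses Python 3 syntax, and an io.StringIO object stands in for sys.stdin.

```python
import io

def total_with_list(stream):
    # List comprehension: builds the complete list of integers in memory,
    # then hands that list to sum
    return sum([int(line) for line in stream])

def total_with_generator(stream):
    # Generator expression: produces one integer at a time on demand,
    # so no intermediate list is ever built
    return sum(int(line) for line in stream)

text = '1\n2\n3\n'
list_total = total_with_list(io.StringIO(text))
gen_total = total_with_generator(io.StringIO(text))
```

Both calls compute the same total; only the amount of memory held while summing differs, which matters mostly when the input stream is large.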
3.9.1.4. Redirected streams and user interaction
At the start of the last section, we piped teststreams.py output into the standard more command-line program with a command similar to this one:
C:\...\PP3E\System\Streams>python teststreams.py < input.txt | more
But since we already wrote our own "more" paging utility in Python near the start of this chapter, why not set it up to accept input from stdin too? For example, if we change the last three lines of the more.py file listed earlier in this chapter to this...
if __name__ == '__main__':               # when run, not when imported
    if len(sys.argv) == 1:               # page stdin if no cmd args
        more(sys.stdin.read( ))
    else:
        more(open(sys.argv[1]).read( ))  # else page the file named on cmdline
...it almost seems as if we should be able to redirect the standard output of teststreams.py into the standard input of more.py:
C:\...\PP3E\System\Streams>python teststreams.py < input.txt | python ..\more.py
Hello stream world
Enter a number>8 squared is 64
Enter a number>6 squared is 36
Enter a number>Bye
This technique generally works for Python scripts. Here, teststreams.py takes input from a file again. And, as in the last section, one Python program's output is piped to another's input: the more.py script in the parent (..) directory.
3.9.1.5. Reading keyboard input
But there's a subtle problem lurking in the preceding more.py command. Really, chaining worked there only by sheer luck: if the first script's output is long enough that more has to ask the user if it should continue, the script will utterly fail. The problem is that the augmented more.py uses stdin for two disjointed purposes. It reads a reply from an interactive user on stdin by calling raw_input, but now it also accepts the main input text on stdin. When the stdin stream is really redirected to an input file or pipe, we can't use it to input a reply from an interactive user; it contains only the text of the input source. Moreover, because stdin is redirected before the program even starts up, there is no way to know what it meant prior to being redirected in the command line.
If we intend to accept input on stdin and use the console for user interaction, we have to do a bit more. Example 3-9 shows a modified version of the more script that pages the standard input stream if called with no arguments but also makes use of lower-level and platform-specific tools to converse with a user at a keyboard if needed.
Example 3-9. PP3E\System\moreplus.py
Most of the new code in this version shows up in its getreply function. The file's isatty method tells us whether stdin is connected to the console; if it is, we simply read replies on stdin as before. Unfortunately, there is no portable way to input a string from a console user independent of stdin, so we must wrap the non-stdin input logic of this script in a sys.platform test:
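The listing for Example 3-9 is not reproduced here. As a hedged sketch of the idea only, and not the original getreply, the logic might be coded as follows in Python 3 syntax. The use of msvcrt.getwche on Windows and the /dev/tty device file on Unix-like systems are assumptions about which platform tools to call, and the fallback reply of 'y' when no console can be opened is an arbitrary choice made for this sketch.

```python
import sys

def getreply(prompt='?'):
    """Read a reply from the user even if stdin is redirected."""
    if sys.stdin.isatty():                    # stdin is the console,
        return input(prompt)                  # so read the reply from it
    sys.stdout.write(prompt)                  # stdin is a file or pipe
    sys.stdout.flush()
    if sys.platform.startswith('win'):        # Windows: read a key directly
        import msvcrt
        key = msvcrt.getwche()                # echoes the key pressed
        sys.stdout.write('\n')
        return key
    try:
        with open('/dev/tty') as console:     # Unix-like: controlling terminal
            return console.readline().rstrip('\n')
    except OSError:
        return 'y'                            # assumption: no console, keep going
```

The isatty test comes first so that the common interactive case never touches platform-specific code.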
Of course, we have to add such extra logic only to scripts that intend to interact with console users and take input on stdin. In a GUI application, for example, we could instead pop up dialogs, bind keyboard-press events to run callbacks, and so on (we'll meet GUIs in Chapter 8).
Armed with the reusable getreply function, though, we can safely run our moreplus utility in a variety of ways. As before, we can import and call this module's function directly, passing in whatever string we wish to page:
>>> from moreplus import more
>>> more(open('System.txt').read( ))
This directory contains operating system interface examples. Many of the examples in this unit appear elsewhere in the examples distribution tree, because they are actually used to manage other programs. See the README.txt files in the subdirectories here for pointers.
Also as before, when run with a command-line argument, this script interactively pages through the named file's text:
C:\...\PP3E\System>python moreplus.py System.txt
This directory contains operating system interface examples. Many of the examples in this unit appear elsewhere in the examples distribution tree, because they are actually used to manage other programs. See the README.txt files in the subdirectories here for pointers.

C:\...\PP3E\System>python moreplus.py moreplus.py
#############################################################
# split and interactively page a string, file, or stream of
# text to stdout; when run as a script, page stdin or file
# whose name is passed on cmdline; if input is stdin, can't
# use it for user reply--use platform-specific tools or GUI;
#############################################################
import sys, string
def getreply( ):
?n
But now the script also correctly pages text redirected into stdin from either a file or a command pipe, even if that text is too long to fit in a single display chunk. On most shells, we send such input via redirection or pipe operators like these:
C:\...\PP3E\System>python moreplus.py < moreplus.py
#############################################################
# split and interactively page a string, file, or stream of
# text to stdout; when run as a script, page stdin or file
# whose name is passed on cmdline; if input is stdin, can't
# use it for user reply--use platform-specific tools or GUI;
#############################################################
import sys, string
def getreply( ):
?n

C:\...\PP3E\System>type moreplus.py | python moreplus.py
#############################################################
# split and interactively page a string, file, or stream of
# text to stdout; when run as a script, page stdin or file
# whose name is passed on cmdline; if input is stdin, can't
# use it for user reply--use platform-specific tools or GUI;
#############################################################
import sys, string
def getreply( ):
?n
This works the same way on Linux, but, again, use the cat command rather than type. Finally, piping one Python script's output into this script's input now works as expected, without botching user interaction (and not just because we got lucky):
......\System\Streams>python teststreams.py < input.txt | python ..\moreplus.py
Hello stream world
Enter a number>8 squared is 64
Enter a number>6 squared is 36
Enter a number>Bye
Here, the standard output of one Python script is fed to the standard input of another Python script located in the parent directory: moreplus.py reads the output of teststreams.py.
All of the redirections in such command lines work only because scripts don't care what standard input and output really are: interactive users, files, or pipes between programs. For example, when run as a script, moreplus.py simply reads stream sys.stdin; the command-line shell (e.g., DOS on Windows, csh on Linux) attaches such streams to the source implied by the command line before the script is started. Scripts use the preopened stdin and stdout file objects to access those sources, regardless of their true nature.
And for readers keeping count, we have run this single more pager script in four different ways: by importing and calling its function, by passing a filename command-line argument, by redirecting stdin to a file, and by piping a command's output to stdin. By supporting importable functions, command-line arguments, and standard streams, Python system tools code can be reused in a wide variety of modes.
3.9.2. Redirecting Streams to Python Objects
All of the previous standard stream redirections work for programs written in any language that hooks into the standard streams and rely more on the shell's command-line processor than on Python itself. Command-line redirection syntax like < filename and | program is evaluated by the shell, not by Python. A more Pythonesque form of redirection can be done within scripts themselves by resetting sys.stdin and sys.stdout to file-like objects.
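The main trick behind this mode is that anything that looks like a file in terms of methods will work as a standard stream in Python. The object's interface (sometimes called its protocol), and not the object's specific datatype, is all that matters. That is, any object that provides file-like read methods can stand in for sys.stdin, and any object that provides file-like write methods can stand in for sys.stdout.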
Because print and raw_input simply call the write and readline methods of whatever objects sys.stdout and sys.stdin happen to reference, we can use this technique to both provide and intercept standard stream text with objects implemented as classes.
Such plug-and-play compatibility is usually called polymorphism; i.e., it doesn't matter what an object is, and it doesn't matter what its interface does, as long as it provides the expected interface. This liberal approach to datatypes accounts for much of the conciseness and flexibility of Python code. Here, it provides a way for scripts to reset their own streams. Example 3-10 shows a utility module that demonstrates this concept.
Example 3-10. PP3E\System\Streams\redirect.py
This module defines two classes that masquerade as real files, one standing in for an output file and one standing in for an input file.
The redirect function at the bottom of this file combines these two objects to run a single function with input and output redirected entirely to Python class objects. The passed-in function to run need not know or care that its print statements, raw_input calls and stdin and stdout method calls, are talking to a class rather than to a real file, pipe, or user.
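The listing for Example 3-10 is not reproduced here. A minimal sketch of such a module, written in Python 3 syntax and not to be taken as the original code, might look like this. The class names Output and Input and the redirect(function, args, input) call pattern follow the surrounding text; the method bodies and the small echo_squares function used to exercise them are assumptions.

```python
import sys

class Output:
    """Stand-in for an output file: collects written text in a string."""
    def __init__(self):
        self.text = ''
    def write(self, string):
        self.text += string
    def flush(self):
        pass                                  # nothing is buffered here

class Input:
    """Stand-in for an input file: serves lines from a string."""
    def __init__(self, text=''):
        self.text = text
    def readline(self):
        eoln = self.text.find('\n')
        if eoln == -1:                        # no newline left: return the rest
            line, self.text = self.text, ''
        else:
            line, self.text = self.text[:eoln + 1], self.text[eoln + 1:]
        return line

def redirect(function, args, text):
    """Run function(*args) with stdin and stdout bound to the classes above."""
    saved = sys.stdin, sys.stdout
    sys.stdin, sys.stdout = Input(text), Output()
    try:
        function(*args)
        return sys.stdout.text                # everything the function printed
    finally:
        sys.stdin, sys.stdout = saved         # always restore the real streams

def echo_squares():
    """Hypothetical function to redirect; reads numbers, prints squares."""
    while True:
        try:
            num = int(input('Enter a number>'))
        except EOFError:
            break
        print('%d squared is %d' % (num, num ** 2))

output = redirect(echo_squares, (), '4\n5\n')
```

Neither class inherits from a real file type; providing write and readline is enough, because those are the only methods that print and input call here.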
To demonstrate, import and run the interact function at the heart of the teststreams script of Example 3-6 that we've been running from the shell (to use the redirection utility function, we need to deal in terms of functions, not files). When run directly, the function reads from the keyboard and writes to the screen, just as if it were run as a program without redirection:
C:\...\PP3E\System\Streams>python
>>> from teststreams import interact
>>> interact( )
Hello stream world
Enter a number>2
2 squared is 4
Enter a number>3
3 squared is 9
Enter a number
>>>
Now, let's run this function under the control of the redirection function in redirect.py and pass in some canned input text. In this mode, the interact function takes its input from the string we pass in ('4\n5\n6\n', three lines with explicit end-of-line characters), and the result of running the function is a string containing all the text written to the standard output stream:
>>> from redirect import redirect
>>> output = redirect(interact, ( ), '4\n5\n6\n')
>>> output
'Hello stream world\nEnter a number>4 squared is 16\nEnter a number>5 squared is 25\nEnter a number>6 squared is 36\nEnter a number>Bye\n'
The result is a single, long string containing the concatenation of all text written to standard output. To make this look better, we can split it up with the string object's split method:
>>> for line in output.split('\n'): print line
...
Hello stream world
Enter a number>4 squared is 16
Enter a number>5 squared is 25
Enter a number>6 squared is 36
Enter a number>Bye
Better still, we can reuse the more.py module we saw earlier in this chapter; it's less to type and remember, and it's already known to work well:
>>> from PP3E.System.more import more
>>> more(output)
Hello stream world
Enter a number>4 squared is 16
Enter a number>5 squared is 25
Enter a number>6 squared is 36
Enter a number>Bye
This is an artificial example, of course, but the techniques illustrated are widely applicable. For example, it's straightforward to add a GUI interface to a program written to interact with a command-line user. Simply intercept standard output with an object such as the Output class shown earlier and throw the text string up in a window. Similarly, standard input can be reset to an object that fetches text from a graphical interface (e.g., a popped-up dialog box). Because classes are plug-and-play compatible with real files, we can use them in any tool that expects a file. Watch for a GUI stream-redirection module named guiStreams in Chapter 11.
3.9.3. The StringIO Module
The prior section's technique of redirecting streams to objects proved so handy that a standard library module, StringIO, now automates the task. It provides an object that maps a file object interface to and from in-memory strings. For example:
>>> from StringIO import StringIO
>>> buff = StringIO( )                   # save written text to a string
>>> buff.write('spam\n')
>>> buff.write('eggs\n')
>>> buff.getvalue( )
'spam\neggs\n'

>>> buff = StringIO('ham\nspam\n')       # provide input from a string
>>> buff.readline( )
'ham\n'
>>> buff.readline( )
'spam\n'
>>> buff.readline( )
''
As in the prior section, instances of StringIO objects can be assigned to sys.stdin and sys.stdout to redirect streams for raw_input and print and can be passed to any code that was written to expect a real file object. Again, in Python, the object interface, not the concrete datatype, is the name of the game:
>>> from StringIO import StringIO
>>> import sys
>>> buff = StringIO( )
>>> temp = sys.stdout
>>> sys.stdout = buff
>>> print 42, 'spam', 3.141              # or print >> buff, ...
>>> sys.stdout = temp                    # restore original stream
>>> buff.getvalue( )
'42 spam 3.141\n'
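The save-and-restore steps in this session can be wrapped in a small helper so that the original stream comes back even if the redirected code raises an exception. The following is a sketch under a few assumptions: it uses Python 3 syntax, where the StringIO class is found in the io module rather than in a module of its own, and the names capture and greet are hypothetical.

```python
import io
import sys

def capture(function, *args):
    """Call function(*args) and return whatever it printed, as a string."""
    buff = io.StringIO()
    saved = sys.stdout
    sys.stdout = buff                 # print now writes to the buffer
    try:
        function(*args)
    finally:
        sys.stdout = saved            # restore even if function raises
    return buff.getvalue()

def greet(name):
    print('hello', name)

text = capture(greet, 'world')
```

The try/finally pair is the main design point: without it, an exception in the called function would leave sys.stdout pointing at the buffer, and later prints would silently vanish into it.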
3.9.4. Capturing the stderr Stream
We've been focusing on stdin and stdout redirection, but stderr can be similarly reset to files, pipes, and objects. This is straightforward within a Python script. For instance, assigning sys.stderr to another instance of a class such as Output or a StringIO object in the preceding section's example allows your script to intercept text written to standard error too.
Python itself uses standard error for error message text (and the IDLE GUI interface intercepts it and colors it red by default). However, no higher-level tools for standard error do what print and raw_input( ) do for the output and input streams. If you wish to print to the error stream, you'll want to call sys.stderr.write( ) explicitly or read the next section for a print statement trick that makes this a bit simpler.
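Both ideas, writing to the error stream explicitly and intercepting it with an object, fit in a few lines. This is a sketch in Python 3 syntax; the warn function and its message format are hypothetical.

```python
import io
import sys

def warn(message):
    """Write a message to the standard error stream explicitly."""
    sys.stderr.write('warning: ' + message + '\n')

saved = sys.stderr
sys.stderr = io.StringIO()            # intercept error text in memory
try:
    warn('disk nearly full')
    captured = sys.stderr.getvalue()
finally:
    sys.stderr = saved                # restore the real error stream
```

Outside the try block, warn writes to the console's error stream as usual.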
Redirecting standard error from a shell command line is a bit more complex and less portable. On most Unix-like systems, we can usually capture stderr output by using shell-redirection syntax of the form command > output 2>&1. This may not work on some flavors of Windows platforms, though, and can even vary per Unix shell; see your shell's manpages for more details.
3.9.5. Redirection Syntax in Print Statements
Because resetting the stream attributes to new objects was so popular, as of Python 2.0 the print statement is also extended to include an explicit file to which output is to be sent. A statement of the form:
print >> file, stuff # file is an object, not a string name
prints stuff to file instead of to stdout. The net effect is similar to simply assigning sys.stdout to an object, but there is no need to save and restore in order to return to the original output stream (as shown in the section on redirecting streams to objects). For example:
import sys
print >> sys.stderr, 'spam' * 2
will send text to the standard error stream object rather than sys.stdout for the duration of this single print statement only. The next normal print statement (without >>) prints to standard output as usual.
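The print >> file form is Python 2 syntax. In Python 3, where print is a function, the same per-call redirection is spelled with the file keyword argument. A brief sketch, with an in-memory buffer used for the first call so that its effect can be inspected:

```python
import io
import sys

buff = io.StringIO()
print('spam' * 2, file=buff)          # goes to buff for this call only
print('eggs', file=sys.stderr)        # goes to the standard error stream
print('ham')                          # goes to standard output as usual
```

As with the >> form, nothing needs to be saved or restored, because sys.stdout itself is never reassigned.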
3.9.6. Other Redirection Options
Earlier in this chapter, we studied the built-in os.popen function, which provides a way to redirect another command's streams from within a Python program. As we saw, this function runs a shell command line (e.g., a string we would normally type at a DOS or csh prompt) but returns a Python file-like object connected to the command's input or output stream.
Because of that, the os.popen tool is another way to redirect streams of spawned programs, and it is a cousin to the techniques we just met: its effect is much like the shell | command-line pipe syntax for redirecting streams to programs (in fact, its name means "pipe open"), but it is run within a script and provides a file-like interface to piped streams. It's similar in spirit to the redirect function, but it's based on running programs (not calling functions), and the command's streams are processed in the spawning script as files (not tied to class objects). That is, os.popen redirects the streams of a program that a script starts instead of redirecting the streams of the script itself.
By passing in the desired mode flag, we redirect a spawned program's input or output stream to a file object in the calling script:
C:\...\PP3E\System\Streams>type hello-out.py
print 'Hello shell world'

C:\...\PP3E\System\Streams>type hello-in.py
input = raw_input( )
open('hello-in.txt', 'w').write('Hello ' + input + '\n')

C:\...\PP3E\System\Streams>python
>>> import os
>>> pipe = os.popen('python hello-out.py')         # 'r' is default--read stdout
>>> pipe.read( )
'Hello shell world\n'
>>> pipe = os.popen('python hello-in.py', 'w')
>>> pipe.write('Gumby\n')                          # 'w'--write to program stdin
>>> pipe.close( )                                  # \n at end is optional
>>> open('hello-in.txt').read( )
'Hello Gumby\n'
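The read side of this session can be tried without creating any script files by spawning a command that most shells already provide. This sketch assumes that an echo command is available in the system shell, which is true of both DOS and Unix-like shells, and it uses Python 3, where os.popen remains available.

```python
import os

pipe = os.popen('echo Hello shell world')     # 'r' is the default mode
text = pipe.read()                            # the command's standard output
status = pipe.close()                         # None if the command exited with status 0
```

The command string is handed to the shell as is, so anything we could type at a prompt, including redirection and pipe syntax, can appear in it.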
The popen call is also smart enough to run the command string as an independent process on platforms that support such a notion. It accepts an optional third argument that can be used to control buffering of written text.
Additional popen-like tools in the Python library allow scripts to connect to more than one of a command's streams. For instance, the os.popen2 call returns file objects for hooking into both a command's input and output streams:
childStdin, childStdout = os.popen2('python hello-in-out.py')
childStdin.write(input)
output = childStdout.read( )
os.popen3 is similar, but it returns a third pipe for connecting to standard error as well. A related call, os.popen4, returns two pipe file objects; it's like os.popen3, but the output and error streams are tied together into a single pipe:
childStdin, childStdout, childStderr = os.popen3('python hello-in-out.py')
childStdin, childStdout_and_err = os.popen4('python hello-in-out.py')
The os.popen2/3/4 variants work much like os.popen, but they connect additional streams and accept an optional second argument that specifies text or binary-mode data (t or b; more on the distinction in the next chapter).
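The os.popen2, os.popen3, and os.popen4 calls were later deprecated and are not available in Python 3, where the standard subprocess module covers the same ground. As a hedged sketch of the os.popen2 pattern in that style, the following connects to both the input and output streams of a child program. The one-line child program and the name run_filter are illustrative assumptions, not the hello-in-out.py script named above.

```python
import subprocess
import sys

# Hypothetical child program: copies its stdin to stdout in uppercase
CHILD = 'import sys; sys.stdout.write(sys.stdin.read().upper())'

def run_filter(text):
    """Send text to a child's stdin and return what it writes to stdout."""
    child = subprocess.Popen([sys.executable, '-c', CHILD],
                             stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE,
                             universal_newlines=True)   # text-mode pipes
    output, _ = child.communicate(text)   # write, close stdin, read to EOF
    return output

reply = run_filter('spam\n')
```

The communicate call closes the child's input after writing, so a child that reads to end-of-file is not left waiting; adding stderr=subprocess.PIPE would hook up the error stream in the manner of os.popen3.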
The os.popen calls are also Python's portable equivalent of Unix-like shell syntax for redirecting the streams of spawned programs. The Python versions also work on Windows, though, and are the most platform-neutral way to launch another program from a Python script. The command-line strings you pass to them may vary per platform (e.g., a directory listing requires an ls on Unix but a dir on Windows), but the call itself works on all major Python platforms.
On Unix-like platforms, the combination of the calls os.fork, os.pipe, os.dup, and some os.exec variants can be used to start a new independent program with streams connected to the parent program's streams. As such, it's another way to redirect streams and a low-level equivalent to tools such as os.popen.
As of this writing, the os.fork call does not work on the standard version of Python for Windows, however, because it is too much at odds with that system's process model. See Chapter 5 for more on all of these calls, especially its section on pipes, as well as its sidebar on Cygwin, a third-party package that includes a library for use on Windows that adds Unix calls such as fork and a version of Python that contains such tools.
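To give a feel for the low-level route before Chapter 5, here is a minimal Unix-only sketch that starts a program with its standard output connected to a pipe read by the parent. It is written in Python 3 syntax, the name spawn_reader is hypothetical, and it uses os.dup2, a close relative of the os.dup call mentioned above, to attach the pipe to the child's stdout descriptor.

```python
import os
import sys

def spawn_reader(args):
    """Start the program in args; return (pid, file reading its stdout)."""
    readfd, writefd = os.pipe()
    pid = os.fork()
    if pid == 0:                               # in the child process
        try:
            os.close(readfd)                   # child does not read the pipe
            os.dup2(writefd, 1)                # make the pipe the child's stdout
            os.close(writefd)
            os.execvp(args[0], args)           # replace child with the program
        finally:
            os._exit(127)                      # reached only if exec failed
    os.close(writefd)                          # parent keeps only the read end
    return pid, os.fdopen(readfd)

if hasattr(os, 'fork'):                        # skip on platforms without fork
    pid, pipe = spawn_reader([sys.executable, '-c', 'print(6 * 7)'])
    text = pipe.read()                         # read until the child closes stdout
    pipe.close()
    os.waitpid(pid, 0)                         # collect the child's exit status
```

This is roughly the work that os.popen does for us in read mode, which is why the higher-level call is usually the better choice when it suffices.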
In the next chapter, we'll continue our survey of Python system interfaces by exploring the tools available for processing files and directories. Although we'll be shifting focus somewhat, we'll find that some of what we've learned here will already begin to come in handy as general system-related tools. Spawning shell commands, for instance, provides ways to inspect directories, and the file interface we will expand on in the next chapter is at the heart of the stream processing techniques we have studied here.