
5.6. Pipes

Pipes, another cross-program communication device, are made available in Python with the built-in os.pipe call. Pipes are unidirectional channels that work something like a shared memory buffer, but with an interface resembling a simple file on each of two ends. In typical use, one program writes data on one end of the pipe, and another reads that data on the other end. Each program sees only its end of the pipe and processes it using normal Python file calls.

Within the operating system, though, pipes are much more than a shared buffer. For instance, calls to read a pipe will normally block the caller until data becomes available (i.e., is sent by the program on the other end) instead of returning an end-of-file indicator. Because of such properties, pipes are also a way to synchronize the execution of independent programs.
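To get a feel for the raw interface before any forking is involved, here is a minimal sketch that writes to a pipe and reads the data back within a single process. It assumes Python 3 on a Unix-like system (so pipe data is bytes, unlike the Python 2 listings in this section), and the function name pipe_roundtrip is made up for illustration.

```python
import os

def pipe_roundtrip(message: bytes) -> bytes:
    """Write message into a fresh pipe and read it back in the same process."""
    read_fd, write_fd = os.pipe()       # (read end, write end) file descriptors
    try:
        # The operating system queues the bytes inside the pipe; a short
        # message fits easily, so this write does not block.
        os.write(write_fd, message)
        # Data is already waiting, so this read returns immediately.
        return os.read(read_fd, len(message))
    finally:
        os.close(read_fd)
        os.close(write_fd)

if __name__ == '__main__':
    print(pipe_roundtrip(b'Spam 000'))
```

If the os.read call were issued before anything had been written, it would block; that blocking behavior is what lets pipes synchronize separate processes.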

5.6.1. Anonymous Pipe Basics

Pipes come in two flavors: anonymous and named. Named pipes (sometimes called fifos) are represented by a file on your computer. Anonymous pipes exist only within processes, though, and are typically used in conjunction with process forks as a way to link parent and spawned child processes within an application; parent and child converse over shared pipe file descriptors. Because named pipes are really external files, the communicating processes need not be related at all (in fact, they can be independently started programs).

Since they are more traditional, let's start with a look at anonymous pipes. To illustrate, the script in Example 5-16 uses the os.fork call to make a copy of the calling process as usual (we met forks earlier in this chapter). After forking, the original parent process and its child copy speak through the two ends of a pipe created with os.pipe prior to the fork. The os.pipe call returns a tuple of two file descriptors (the low-level file identifiers we met earlier) representing the input and output sides of the pipe. Because forked child processes get copies of their parents' file descriptors, writing to the pipe's output descriptor in the child sends data back to the parent on the pipe created before the child was spawned.

Example 5-16. PP3E\System\Processes\pipe1.py

    import os, time

    def child(pipeout):
        zzz = 0
        while 1:
            time.sleep(zzz)                          # make parent wait
            os.write(pipeout, 'Spam %03d' % zzz)     # send to parent
            zzz = (zzz+1) % 5                        # goto 0 after 4

    def parent():
        pipein, pipeout = os.pipe()                  # make 2-ended pipe
        if os.fork() == 0:                           # copy this process
            child(pipeout)                           # in copy, run child
        else:                                        # in parent, listen to pipe
            while 1:
                line = os.read(pipein, 32)           # blocks until data sent
                print 'Parent %d got "%s" at %s' % (os.getpid(), line, time.time())

    parent()

If you run this program on Linux (pipe is now available on Windows, but fork is not), the parent process waits for the child to send data on the pipe each time it calls os.read. It's almost as if the child and parent act as client and server here: the parent starts the child and waits for it to initiate communication.^[*] Just to tease, the child keeps the parent waiting one second longer between messages with time.sleep calls, until the delay has reached four seconds. When the zzz delay counter hits 005, it rolls back down to 000 and starts again:

^[*] We will clarify the notions of "client" and "server" in the Internet programming part of this book. There, we'll communicate with sockets (which are very roughly like bidirectional pipes for networks), but the overall conversation model is similar. Named pipes (fifos), described later, are a better match to the client/server model because they can be accessed by arbitrary, unrelated processes (no forks are required). But as we'll see, the socket port model is generally used by most Internet scripting protocols.

    [mark@toy]$ python pipe1.py
    Parent 1292 got "Spam 000" at 968370008.322
    Parent 1292 got "Spam 001" at 968370009.319
    Parent 1292 got "Spam 002" at 968370011.319
    Parent 1292 got "Spam 003" at 968370014.319
    Parent 1292 got "Spam 004Spam 000" at 968370018.319
    Parent 1292 got "Spam 001" at 968370019.319
    Parent 1292 got "Spam 002" at 968370021.319
    Parent 1292 got "Spam 003" at 968370024.319
    Parent 1292 got "Spam 004Spam 000" at 968370028.319
    Parent 1292 got "Spam 001" at 968370029.319
    Parent 1292 got "Spam 002" at 968370031.319
    Parent 1292 got "Spam 003" at 968370034.319

If you look closely, you'll see that when the child's delay counter hits 004, the parent ends up reading two messages from the pipe at once; the child wrote two distinct messages, but they were close enough in time to be fetched as a single unit by the parent. Really, the parent blindly asks to read, at most, 32 bytes each time, but it gets back whatever text is available in the pipe (when it becomes available). To distinguish messages better, we can mandate a separator character in the pipe. An end-of-line makes this easy, because we can wrap the pipe descriptor in a file object with os.fdopen and rely on the file object's readline method to scan up through the next \n separator in the pipe. Example 5-17 implements this scheme.

Example 5-17. PP3E\System\Processes\pipe2.py

    # same as pipe1.py, but wrap pipe input in stdio file object
    # to read by line, and close unused pipe fds in both processes

    import os, time

    def child(pipeout):
        zzz = 0
        while 1:
            time.sleep(zzz)                          # make parent wait
            os.write(pipeout, 'Spam %03d\n' % zzz)   # send to parent
            zzz = (zzz+1) % 5                        # roll to 0 at 5

    def parent():
        pipein, pipeout = os.pipe()                  # make 2-ended pipe
        if os.fork() == 0:                           # in child, write to pipe
            os.close(pipein)                         # close input side here
            child(pipeout)
        else:                                        # in parent, listen to pipe
            os.close(pipeout)                        # close output side here
            pipein = os.fdopen(pipein)               # make stdio input object
            while 1:
                line = pipein.readline()[:-1]        # blocks until data sent
                print 'Parent %d got "%s" at %s' % (os.getpid(), line, time.time())

    parent()

This version has also been augmented to close the unused end of the pipe in each process (e.g., after the fork, the parent process closes its copy of the output side of the pipe written by the child); programs should close unused pipe ends in general. Running with this new version returns a single child message to the parent each time it reads from the pipe, because they are separated with markers when written:

    [mark@toy]$ python pipe2.py
    Parent 1296 got "Spam 000" at 968370066.162
    Parent 1296 got "Spam 001" at 968370067.159
    Parent 1296 got "Spam 002" at 968370069.159
    Parent 1296 got "Spam 003" at 968370072.159
    Parent 1296 got "Spam 004" at 968370076.159
    Parent 1296 got "Spam 000" at 968370076.161
    Parent 1296 got "Spam 001" at 968370077.159
    Parent 1296 got "Spam 002" at 968370079.159
    Parent 1296 got "Spam 003" at 968370082.159
    Parent 1296 got "Spam 004" at 968370086.159
    Parent 1296 got "Spam 000" at 968370086.161
    Parent 1296 got "Spam 001" at 968370087.159
    Parent 1296 got "Spam 002" at 968370089.159
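The difference between the two reading styles can be reproduced without forking at all. The following sketch assumes Python 3 on a Unix-like system, and its function names (read_in_chunks and read_by_line) are invented for this illustration. It queues two messages in a pipe and then reads them back both ways: a single os.read call fetches whatever is waiting, while a file object wrapped around the descriptor splits the stream at each newline.

```python
import os

def read_in_chunks(messages):
    """Queue every message, then issue one os.read call, as pipe1.py does."""
    read_fd, write_fd = os.pipe()
    for msg in messages:
        os.write(write_fd, msg)            # no separator between messages
    os.close(write_fd)
    chunk = os.read(read_fd, 32)           # up to 32 bytes of whatever is queued
    os.close(read_fd)
    return chunk

def read_by_line(messages):
    """Queue newline-terminated messages, then read them back line by line."""
    read_fd, write_fd = os.pipe()
    for msg in messages:
        os.write(write_fd, msg + b'\n')    # newline marks the end of a message
    os.close(write_fd)                     # last write end closed: reader sees EOF
    with os.fdopen(read_fd, 'r') as pipein:
        return [line.rstrip('\n') for line in pipein]

if __name__ == '__main__':
    print(read_in_chunks([b'Spam 004', b'Spam 000']))
    print(read_by_line([b'Spam 004', b'Spam 000']))
```

The second function also hints at why closing unused pipe ends matters: the line loop ends only because every write descriptor for the pipe has been closed, which is what lets the reader detect end-of-file.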

5.6.2. Bidirectional IPC with Pipes

Pipes normally let data flow in only one direction: one side is input, one is output. What if you need your programs to talk back and forth, though? For example, one program might send another a request for information and then wait for that information to be sent back. A single pipe can't generally handle such bidirectional conversations, but two pipes can. One pipe can be used to pass requests to a program and another can be used to ship replies back to the requestor.^[*]

^[*] This really does have real-world applications. For instance, I once added a GUI interface to a command-line debugger for a C-like programming language by connecting two processes with pipes. The GUI ran as a separate process that constructed and sent commands to the existing debugger's input stream pipe and parsed the results that showed up in the debugger's output stream pipe. In effect, the GUI acted like a programmer typing commands at a keyboard. By spawning command-line programs with streams attached by pipes, systems can add new interfaces to legacy programs. We'll see a simple example of this sort of structure in Chapter 11.

The module in Example 5-18 demonstrates one way to apply this idea to link the input and output streams of two programs. Its spawn function forks a new child program and connects the input and output streams of the parent to the output and input streams of the child. That is:

When the parent reads from its standard input, it is reading text sent to the child's standard output.
When the parent writes to its standard output, it is sending data to the child's standard input.

The net effect is that the two independent programs communicate by speaking over their standard streams.

Example 5-18. PP3E\System\Processes\pipes.py

    #############################################################################
    # spawn a child process/program, connect my stdin/stdout to child process's
    # stdout/stdin--my reads and writes map to output and input streams of the
    # spawned program; much like os.popen2 plus parent stream redirection;
    #############################################################################

    import os, sys

    def spawn(prog, *args):                       # pass progname, cmdline args
        stdinFd  = sys.stdin.fileno()             # get descriptors for streams
        stdoutFd = sys.stdout.fileno()            # normally stdin=0, stdout=1

        parentStdin, childStdout  = os.pipe()     # make two IPC pipe channels
        childStdin,  parentStdout = os.pipe()     # pipe returns (inputfd, outputfd)
        pid = os.fork()                           # make a copy of this process
        if pid:
            os.close(childStdout)                 # in parent process after fork:
            os.close(childStdin)                  # close child ends in parent
            os.dup2(parentStdin,  stdinFd)        # my sys.stdin copy  = pipe1[0]
            os.dup2(parentStdout, stdoutFd)       # my sys.stdout copy = pipe2[1]
        else:
            os.close(parentStdin)                 # in child process after fork:
            os.close(parentStdout)                # close parent ends in child
            os.dup2(childStdin,  stdinFd)         # my sys.stdin copy  = pipe2[0]
            os.dup2(childStdout, stdoutFd)        # my sys.stdout copy = pipe1[1]
            args = (prog,) + args
            os.execvp(prog, args)                 # new program in this process
            assert False, 'execvp failed!'        # os.exec call never returns here

    if __name__ == '__main__':
        mypid = os.getpid()
        spawn('python', 'pipes-testchild.py', 'spam')     # fork child program

        print 'Hello 1 from parent', mypid                # to child's stdin
        sys.stdout.flush()                                # subvert stdio buffering
        reply = raw_input()                               # from child's stdout
        sys.stderr.write('Parent got: "%s"\n' % reply)    # stderr not tied to pipe!

        print 'Hello 2 from parent', mypid
        sys.stdout.flush()
        reply = sys.stdin.readline()
        sys.stderr.write('Parent got: "%s"\n' % reply[:-1])

The spawn function in this module does not work on Windows (remember that fork isn't yet available there today). In fact, most of the calls in this module map straight to Unix system calls (and may be arbitrarily terrifying at first glance to non-Unix developers). We've already met some of these (e.g., os.fork), but much of this code depends on Unix concepts we don't have time to address well in this text. But in simple terms, here is a brief summary of the system calls demonstrated in this code:

os.fork: Copies the calling process as usual and returns the child's process ID in the parent process only.
os.execvp: Overlays a new program in the calling process; it's just like the os.execlp used earlier but takes a tuple or list of command-line argument strings (collected with the *args form in the function header).
os.pipe: Returns a tuple of file descriptors representing the input and output ends of a pipe, as in earlier examples.
os.close(fd): Closes the descriptor-based file fd.
os.dup2(fd1,fd2): Copies all system information associated with the file named by the file descriptor fd1 to the file named by fd2.

In terms of connecting standard streams, os.dup2 is the real nitty-gritty here. For example, the call os.dup2(parentStdin,stdinFd) essentially assigns the parent process's stdin file to the input end of one of the two pipes created; all stdin reads will henceforth come from the pipe. By connecting the other end of this pipe to the child process's copy of the stdout stream file with os.dup2(childStdout,stdoutFd), text written by the child to its stdout winds up being routed through the pipe to the parent's stdin stream.
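The effect of os.dup2 is easier to see in isolation. The sketch below is an illustration only, not part of the spawn module: it assumes Python 3 on a Unix-like system, and capture_fd1 is a made-up name. It temporarily makes descriptor 1 (normally stdout) refer to the write end of a pipe, writes to descriptor 1 with a raw os.write call to sidestep file-object buffering, restores the original stdout, and then reads back what landed in the pipe.

```python
import os

def capture_fd1(text: bytes) -> bytes:
    """Point descriptor 1 at a pipe, write to it, and return what arrived."""
    read_fd, write_fd = os.pipe()
    saved_stdout = os.dup(1)           # keep a copy of the real stdout
    try:
        os.dup2(write_fd, 1)           # descriptor 1 now names the pipe's write end
        os.write(1, text)              # a raw write to "stdout" lands in the pipe
    finally:
        os.dup2(saved_stdout, 1)       # put the original stdout back
        os.close(saved_stdout)
        os.close(write_fd)
    data = os.read(read_fd, len(text))
    os.close(read_fd)
    return data

if __name__ == '__main__':
    print(capture_fd1(b'routed through the pipe'))
```

In spawn, nothing is restored afterward: the redirection is meant to last for the rest of each process's life, and in the child it survives the os.execvp call because open descriptors are inherited by the new program.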

To test this utility, the self-test code at the end of the file spawns the program shown in Example 5-19 in a child process and reads and writes standard streams to converse with it over two pipes.

Example 5-19. PP3E\System\Processes\pipes-testchild.py

    import os, time, sys

    mypid     = os.getpid()
    parentpid = os.getppid()
    sys.stderr.write('Child %d of %d got arg: %s\n' %
                                    (mypid, parentpid, sys.argv[1]))
    for i in range(2):
        time.sleep(3)              # make parent process wait by sleeping here
        input = raw_input()        # stdin tied to pipe: comes from parent's stdout
        time.sleep(3)
        reply = 'Child %d got: [%s]' % (mypid, input)
        print reply                # stdout tied to pipe: goes to parent's stdin
        sys.stdout.flush()         # make sure it's sent now or else process blocks

Here is our test in action on Linux; its output is not incredibly impressive to read, but it represents two programs running independently and shipping data back and forth through a pipe device managed by the operating system. This is even more like a client/server model (if you imagine the child as the server). The text in square brackets in this output went from the parent process to the child and back to the parent again, all through pipes connected to standard streams:

    [mark@toy]$ python pipes.py
    Child 797 of 796 got arg: spam
    Parent got: "Child 797 got: [Hello 1 from parent 796]"
    Parent got: "Child 797 got: [Hello 2 from parent 796]"

5.6.2.1. Deadlocks, flushes, and unbuffered streams

The two processes of the prior section's example engage in a simple dialog, but it's already enough to illustrate some of the dangers lurking in cross-program communications. First of all, notice that both programs need to write to stderr to display a message; their stdout streams are tied to the other program's input stream. Because processes share file descriptors, stderr is the same in both parent and child, so status messages show up in the same place.

More subtly, note that both parent and child call sys.stdout.flush after they print text to the stdout stream. Input requests on pipes normally block the caller if no data is available, but it seems that this shouldn't be a problem in our example because there are as many writes as there are reads on the other side of the pipe. By default, though, sys.stdout is buffered, so the printed text may not actually be transmitted until some time in the future (when the stdio output buffers fill up). In fact, if the flush calls are not made, both processes will get stuck waiting for input from the other, input that is sitting in a buffer and is never flushed out over the pipe. They wind up in a deadlock state, both blocked on raw_input calls waiting for events that never occur.

Keep in mind that output buffering is really a function of the system libraries used to access pipes, not of the pipes themselves (pipes do queue up output data, but they never hide it from readers!). In fact, it occurs in this example only because we copy the pipe's information over to sys.stdout, a built-in file object that uses stdio buffering by default. However, such anomalies can also occur when using other cross-process tools, such as the popen2 and popen3 calls introduced in Chapter 3.
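The distinction between library buffering and the pipe itself can be observed directly. The sketch below assumes Python 3 on a Unix-like system, where file objects do their own buffering in the io module rather than through C stdio, though the visible effect is much the same. The helper names are invented here. It uses select.select with a zero timeout to ask whether a read on the pipe would find data, without blocking.

```python
import os
import select

def readable(fd):
    """Return True if a read on fd would find data (or EOF) right now."""
    ready, _, _ = select.select([fd], [], [], 0)    # timeout 0: poll only
    return bool(ready)

def buffered_write_demo():
    """Write through a buffered file object; check the pipe before and after flush."""
    read_fd, write_fd = os.pipe()
    out = os.fdopen(write_fd, 'w')          # default buffering for a non-terminal
    out.write('Hello from parent\n')
    before = readable(read_fd)              # text is still in the file object's buffer
    out.flush()                             # push the buffer into the pipe
    after = readable(read_fd)
    out.close()
    os.close(read_fd)
    return before, after

def unbuffered_write_demo():
    """Write through an unbuffered file object; data reaches the pipe at once."""
    read_fd, write_fd = os.pipe()
    out = os.fdopen(write_fd, 'wb', 0)      # buffering=0 requires binary mode in Python 3
    out.write(b'Hello from parent\n')
    arrived = readable(read_fd)
    out.close()
    os.close(read_fd)
    return arrived

if __name__ == '__main__':
    print(buffered_write_demo())
    print(unbuffered_write_demo())
```

In the buffered case, a reader blocked on the other end would keep waiting until the flush, even though the writer's call to write has long since returned.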

In general terms, if your programs engage in a two-way dialog like this, there are at least three ways to avoid buffer-related deadlock problems:

As demonstrated in this example, manually flushing output pipe streams by calling the file flush method is an easy way to force buffers to be cleared.
It's possible to use pipes in unbuffered mode. Either use low-level os module calls to read and write pipe descriptors directly, or (on most systems) pass a buffer size argument of zero to os.fdopen to disable stdio buffering in the file object used to wrap the descriptor. For fifos, described in the next section, do the same for open.
Simply use the -u Python command-line flag to turn off buffering for the sys.stdout stream (or equivalently, set your PYTHONUNBUFFERED environment variable to a nonempty value).

The last technique merits a few more words. Try this: delete all the sys.stdout.flush calls in Example 5-18 and Example 5-19 (the files pipes.py and pipes-testchild.py) and change the parent's spawn call in pipes.py to this (i.e., add a -u command-line argument):

 spawn('python', '-u', 'pipes-testchild.py', 'spam')

Then start the program with a command line like this: python -u pipes.py. It will work as it did with the manual stdout flush calls, because stdout will be operating in unbuffered mode.
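The same flush and -u techniques can be tried with the standard library's subprocess module, which is not used by this section's examples. The following is a sketch under stated assumptions rather than a replacement for the spawn function: it assumes Python 3, the child program is a small inline script invented for the demonstration, and converse is a hypothetical helper name. The parent flushes after each message, and the child is started with -u so that its replies are not held in an output buffer.

```python
import subprocess
import sys

# Inline child program (made up for this sketch): echo each input line back.
CHILD_CODE = r'''
import sys
while True:
    line = sys.stdin.readline()
    if not line:                      # empty string means EOF on stdin
        break
    print('Child got: [%s]' % line.rstrip('\n'))
'''

def converse(messages):
    """Send each message to a child process and collect one reply per message."""
    child = subprocess.Popen(
        [sys.executable, '-u', '-c', CHILD_CODE],    # -u: unbuffered child stdout
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
    replies = []
    for msg in messages:
        child.stdin.write(msg + '\n')
        child.stdin.flush()           # without this, both sides may wait forever
        replies.append(child.stdout.readline().rstrip('\n'))
    child.stdin.close()               # EOF ends the child's loop
    child.wait()
    child.stdout.close()
    return replies

if __name__ == '__main__':
    for reply in converse(['Hello 1 from parent', 'Hello 2 from parent']):
        print('Parent got: "%s"' % reply)
```

Dropping both the flush call and the -u flag is an easy way to reproduce the deadlock described earlier: each side then sits in readline waiting for text the other has not yet sent.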

We'll revisit the effects of unbuffered output streams in Chapter 11, when we code a GUI that displays the output of a non-GUI program by reading it over a pipe in a thread. Deadlock in general, though, is a bigger problem than we have space to address here; on the other hand, if you know enough that you want to do IPC in Python, you're probably already a veteran of the deadlock wars. See also the sidebar below on the pty module and Pexpect package for related tools.

More on Stream Buffering: pty and Pexpect

On Unix-like platforms, you may also be able to use the Python pty standard library module to force another program's standard output to be unbuffered, especially if it's not a Python program and you cannot change its code.

Technically, default buffering for stdout is determined by whether the underlying file descriptor refers to a terminal. This occurs in the stdio library and cannot be controlled by the spawning program. In general, output to terminals is line buffered, and output to nonterminals (including files, pipes, and sockets) is fully buffered. This policy is used for efficiency.

The pty module essentially fools the spawned program into thinking it is connected to a terminal so that only one line is buffered for stdout. The net effect is that each newline flushes the prior line, which is typical of interactive programs and what you need if you wish to grab each piece of the printed output as it is produced.
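The terminal test that drives this buffering policy can be checked from Python. This small sketch assumes Python 3 on a Unix-like platform where pseudo-terminals are available, and terminal_check is a name made up for the illustration. It compares a pipe's write end with the slave end of a pseudo-terminal created by pty.openpty.

```python
import os
import pty

def terminal_check():
    """Report whether a pipe's write end and a pty's slave end look like terminals."""
    pipe_in, pipe_out = os.pipe()
    master_fd, slave_fd = pty.openpty()      # pseudo-terminal descriptor pair
    try:
        # A program whose stdout is pipe_out sees a non-terminal;
        # one whose stdout is slave_fd sees a terminal.
        return os.isatty(pipe_out), os.isatty(slave_fd)
    finally:
        for fd in (pipe_in, pipe_out, master_fd, slave_fd):
            os.close(fd)

if __name__ == '__main__':
    print(terminal_check())
```

A program whose standard output is attached to the slave descriptor would therefore get line buffering from its stdio library, while its parent reads the output from the master descriptor.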

Note, however, that the pty module is not required for this role when spawning Python scripts with pipes: simply use the -u Python command-line flag or manually call sys.stdout.flush( ) in the spawned program. The pty module is also not available on all Python platforms today.

The Pexpect package, a pure-Python equivalent of the Unix expect program, uses pty to add additional functionality and to handle interactions that bypass standard streams (e.g., password inputs). See the Python library manual for more on pty, and search the Web for Pexpect.

5.6.3. Named Pipes (Fifos)

On some platforms, it is also possible to create a pipe that exists as a file. Such files are called named pipes (or, sometimes, fifos) because they behave just like the pipes created within the previous section's programs but are associated with a real file somewhere on your computer, external to any particular program.

Once a named pipe file is created, processes read and write it using normal file operations. Fifos are unidirectional streams. In typical operation, a server program reads data from the fifo, and one or more client programs write data to it. But a set of two fifos can be used to implement bidirectional communication just as we did for anonymous pipes in the prior section.

Because fifos reside in the filesystem, they are longer-lived than in-process anonymous pipes and can be accessed by programs started independently. The unnamed, in-process pipe examples thus far depend on the fact that file descriptors (including pipes) are copied to child processes' memory. That makes it difficult to use anonymous pipes to connect programs started independently. With fifos, pipes are accessed instead by a filename visible to all programs running on the computer, regardless of any parent/child process relationships.

Because of that, fifos are better suited as general IPC mechanisms for independent client and server programs. For instance, a perpetually running server program may create and listen for requests on a fifo that can be accessed later by arbitrary clients not forked by the server. In a sense, fifos are an alternative to the socket interface we'll meet in the next part of this book, but fifos do not directly support remote network connections, are not available on as many platforms, and are accessed using the standard file interface instead of the more unique socket port numbers and calls we'll study later.

In Python, named pipe files are created with the os.mkfifo call, available today on Unix-like platforms but not on all flavors of Windows (though this call is also available in Cygwin Python on Windows; see the earlier sidebar). This creates only the external file, though; to send and receive data through a fifo, it must be opened and processed as if it were a standard file. Example 5-20 is a derivation of the pipe2.py script listed earlier. It is written to use fifos rather than anonymous pipes.

Example 5-20. PP3E\System\Processes\pipefifo.py

    ###############################################################
    # named pipes; os.mkfifo not available on Windows 95/98/XP
    # (without Cygwin); no reason to fork here, since fifo file
    # pipes are external to processes--shared fds are irrelevant;
    ###############################################################

    import os, time, sys
    fifoname = '/tmp/pipefifo'                       # must open same name

    def child():
        pipeout = os.open(fifoname, os.O_WRONLY)     # open fifo pipe file as fd
        zzz = 0
        while 1:
            time.sleep(zzz)
            os.write(pipeout, 'Spam %03d\n' % zzz)
            zzz = (zzz+1) % 5

    def parent():
        pipein = open(fifoname, 'r')                 # open fifo as stdio object
        while 1:
            line = pipein.readline()[:-1]            # blocks until data sent
            print 'Parent %d got "%s" at %s' % (os.getpid(), line, time.time())

    if __name__ == '__main__':
        if not os.path.exists(fifoname):
            os.mkfifo(fifoname)                      # create a named pipe file
        if len(sys.argv) == 1:
            parent()                                 # run as parent if no args
        else:                                        # else run as child process
            child()

Because the fifo exists independently of both parent and child, there's no reason to fork here. The child may be started independently of the parent as long as it opens a fifo file by the same name. Here, for instance, on Linux the parent is started in one xterm window and then the child is started in another. Messages start appearing in the parent window only after the child is started and begins writing messages onto the fifo file:

    [mark@toy]$ python pipefifo.py
    Parent 657 got "Spam 000" at 968390065.865
    Parent 657 got "Spam 001" at 968390066.865
    Parent 657 got "Spam 002" at 968390068.865
    Parent 657 got "Spam 003" at 968390071.865
    Parent 657 got "Spam 004" at 968390075.865
    Parent 657 got "Spam 000" at 968390075.867
    Parent 657 got "Spam 001" at 968390076.865
    Parent 657 got "Spam 002" at 968390078.865

    [mark@toy]$ file /tmp/pipefifo
    /tmp/pipefifo: fifo (named pipe)
    [mark@toy]$ python pipefifo.py -child
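For a version that can be run in one step, the sketch below passes a few lines through a fifo with the writer and reader in the same program. It assumes Python 3 on a Unix-like system. A thread stands in for the independently started writer program purely for convenience, and the names fifo_demo and demofifo are invented for the example. The fifo is created in a temporary directory and removed afterward.

```python
import os
import tempfile
import threading

def fifo_demo(messages):
    """Pass messages through a named pipe from a writer thread to the caller."""
    workdir = tempfile.mkdtemp()
    fifoname = os.path.join(workdir, 'demofifo')    # hypothetical fifo name
    os.mkfifo(fifoname)                             # creates the file only

    def writer():
        # Opening a fifo for writing blocks until a reader opens it too.
        with open(fifoname, 'w') as pipeout:
            for msg in messages:
                pipeout.write(msg + '\n')
        # Leaving the with block flushes and closes: the reader then sees EOF.

    thread = threading.Thread(target=writer)
    thread.start()
    try:
        # Opening for reading blocks until the writer has opened its end.
        with open(fifoname, 'r') as pipein:
            received = [line.rstrip('\n') for line in pipein]
    finally:
        thread.join()
        os.remove(fifoname)
        os.rmdir(workdir)
    return received

if __name__ == '__main__':
    print(fifo_demo(['Spam 000', 'Spam 001', 'Spam 002']))
```

Unlike pipefifo.py, this reader stops at end-of-file, which arrives once the writer closes its end of the fifo; a long-running server would instead reopen the fifo or keep a writer attached.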