5.6. Pipes

Pipes, another cross-program communication device, are made available in Python with the built-in os.pipe call. Pipes are unidirectional channels that work something like a shared memory buffer, but with an interface resembling a simple file on each of two ends. In typical use, one program writes data on one end of the pipe, and another reads that data on the other end. Each program sees only its end of the pipe and processes it using normal Python file calls.

Within the operating system, though, pipes are much more than simple files. For instance, calls to read a pipe will normally block the caller until data becomes available (i.e., is sent by the program on the other end) instead of returning an end-of-file indicator. Because of such properties, pipes are also a way to synchronize the execution of independent programs.

5.6.1. Anonymous Pipe Basics

Pipes come in two flavors: anonymous and named. Named pipes (sometimes called fifos) are represented by a file on your computer. Anonymous pipes exist only within processes, though, and are typically used in conjunction with process forks as a way to link parent and spawned child processes within an application; parent and child converse over shared pipe file descriptors. Because named pipes are really external files, the communicating processes need not be related at all (in fact, they can be independently started programs).

Since they are more traditional, let's start with a look at anonymous pipes. To illustrate, the script in Example 5-16 uses the os.fork call to make a copy of the calling process as usual (we met forks earlier in this chapter). After forking, the original parent process and its child copy speak through the two ends of a pipe created with os.pipe prior to the fork. The os.pipe call returns a tuple of two file descriptors (the low-level file identifiers we met earlier) representing the input and output sides of the pipe.
Because forked child processes get copies of their parents' file descriptors, writing to the pipe's output descriptor in the child sends data back to the parent on the pipe created before the child was spawned.

Example 5-16. PP3E\System\Processes\pipe1.py
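The listing itself does not appear at this point, so what follows is only a minimal sketch of the technique just described, not the book's pipe1.py. It assumes Python 3 on a Unix-like system (os.fork is required). The names child_writer and parent_reader, and the count and delay parameters, are hypothetical and exist only so that the sketch terminates instead of looping forever.

```python
import os
import time

MESSAGE_SIZE = len('Spam 000')            # every message is exactly 8 bytes


def child_writer(pipeout, count, delay):
    """Write `count` messages to the pipe, pausing longer before each one."""
    zzz = 0
    for _ in range(count):
        time.sleep(zzz * delay)           # keep the parent waiting
        os.write(pipeout, ('Spam %03d' % zzz).encode())
        zzz = (zzz + 1) % 5               # roll back to 000 after 004


def parent_reader(count=5, delay=0.0):
    """Fork a child and collect whatever each os.read call returns."""
    pipein, pipeout = os.pipe()           # create the pipe BEFORE forking
    pid = os.fork()
    if pid == 0:                          # in the child copy
        try:
            child_writer(pipeout, count, delay)
        finally:
            os._exit(0)                   # never fall back into parent code
    chunks = []
    received = 0
    while received < count * MESSAGE_SIZE:
        data = os.read(pipein, 32)        # blocks until data is available
        if not data:                      # defensive: end-of-file
            break
        received += len(data)
        chunks.append(data.decode())
        print('Parent %d got "%s" at %s' % (os.getpid(), chunks[-1], time.time()))
    os.close(pipein)
    os.close(pipeout)
    os.waitpid(pid, 0)                    # reap the child
    return chunks


if __name__ == '__main__':
    parent_reader(count=6, delay=1.0)
```

Because os.read returns whatever bytes are in the pipe at the moment, two messages written close together may come back as a single chunk; only the concatenation of all chunks is predictable.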
If you run this program on Linux (os.pipe is now available on Windows, but os.fork is not), the parent process waits for the child to send data on the pipe each time it calls os.read. It's almost as if the child and parent act as client and server here: the parent starts the child and waits for it to initiate communication.[*] Just to tease, the child keeps the parent waiting one second longer between messages with time.sleep calls, until the delay has reached four seconds. When the zzz delay counter hits 005, it rolls back down to 000 and starts again:
    [mark@toy]$ python pipe1.py
    Parent 1292 got "Spam 000" at 968370008.322
    Parent 1292 got "Spam 001" at 968370009.319
    Parent 1292 got "Spam 002" at 968370011.319
    Parent 1292 got "Spam 003" at 968370014.319
    Parent 1292 got "Spam 004Spam 000" at 968370018.319
    Parent 1292 got "Spam 001" at 968370019.319
    Parent 1292 got "Spam 002" at 968370021.319
    Parent 1292 got "Spam 003" at 968370024.319
    Parent 1292 got "Spam 004Spam 000" at 968370028.319
    Parent 1292 got "Spam 001" at 968370029.319
    Parent 1292 got "Spam 002" at 968370031.319
    Parent 1292 got "Spam 003" at 968370034.319

If you look closely, you'll see that when the child's delay counter hits 004, the parent ends up reading two messages from the pipe at once; the child wrote two distinct messages, but they were close enough in time to be fetched as a single unit by the parent. Really, the parent blindly asks to read, at most, 32 bytes each time, but it gets back whatever text is available in the pipe (when it becomes available).

To distinguish messages better, we can mandate a separator character in the pipe. An end-of-line makes this easy, because we can wrap the pipe descriptor in a file object with os.fdopen and rely on the file object's readline method to scan up through the next \n separator in the pipe. Example 5-17 implements this scheme.

Example 5-17. PP3E\System\Processes\pipe2.py
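As a minimal sketch of the separator scheme (again, not the book's pipe2.py listing), the code below wraps the read end in a file object with os.fdopen and reads one newline-terminated message at a time. It assumes Python 3 on a Unix-like system; the function names and the count and delay parameters are hypothetical, added so that the sketch ends once the child has finished.

```python
import os
import time


def line_child(pipeout, count, delay):
    """Write `count` newline-terminated messages to the pipe."""
    zzz = 0
    for _ in range(count):
        time.sleep(zzz * delay)
        os.write(pipeout, ('Spam %03d\n' % zzz).encode())   # \n is the separator
        zzz = (zzz + 1) % 5


def line_parent(count=6, delay=0.0):
    """Fork a child and return its messages, one list item per line."""
    pipein, pipeout = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(pipein)                  # child only writes
        try:
            line_child(pipeout, count, delay)
        finally:
            os._exit(0)
    os.close(pipeout)                     # parent only reads
    messages = []
    with os.fdopen(pipein) as pipe_file:  # file object around the descriptor
        while True:
            line = pipe_file.readline()   # blocks up through the next \n
            if not line:                  # empty string: all writers closed
                break
            messages.append(line[:-1])    # strip the separator
            print('Parent %d got "%s" at %s' % (os.getpid(), line[:-1], time.time()))
    os.waitpid(pid, 0)
    return messages


if __name__ == '__main__':
    line_parent(count=6, delay=1.0)
```

Closing the parent's copy of the write end matters here: readline reports end-of-file only after every process holding the write end has closed it.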
This version has also been augmented to close the unused end of the pipe in each process (e.g., after the fork, the parent process closes its copy of the output side of the pipe written by the child); programs should close unused pipe ends in general. Running with this new version returns a single child message to the parent each time it reads from the pipe, because they are separated with markers when written:

    [mark@toy]$ python pipe2.py
    Parent 1296 got "Spam 000" at 968370066.162
    Parent 1296 got "Spam 001" at 968370067.159
    Parent 1296 got "Spam 002" at 968370069.159
    Parent 1296 got "Spam 003" at 968370072.159
    Parent 1296 got "Spam 004" at 968370076.159
    Parent 1296 got "Spam 000" at 968370076.161
    Parent 1296 got "Spam 001" at 968370077.159
    Parent 1296 got "Spam 002" at 968370079.159
    Parent 1296 got "Spam 003" at 968370082.159
    Parent 1296 got "Spam 004" at 968370086.159
    Parent 1296 got "Spam 000" at 968370086.161
    Parent 1296 got "Spam 001" at 968370087.159
    Parent 1296 got "Spam 002" at 968370089.159

5.6.2. Bidirectional IPC with Pipes

Pipes normally let data flow in only one direction: one side is input, one is output. What if you need your programs to talk back and forth, though? For example, one program might send another a request for information and then wait for that information to be sent back. A single pipe can't generally handle such bidirectional conversations, but two pipes can. One pipe can be used to pass requests to a program and another can be used to ship replies back to the requestor.[*]
The module in Example 5-18 demonstrates one way to apply this idea to link the input and output streams of two programs. Its spawn function forks a new child program and connects the input and output streams of the parent to the output and input streams of the child. That is:
The net effect is that the two independent programs communicate by speaking over their standard streams.

Example 5-18. PP3E\System\Processes\pipes.py
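The two-pipe idea can be sketched as follows. This is a simplified variation under stated assumptions, not the book's pipes.py: the child's standard streams are tied to the pipes with os.dup2, but the parent keeps its own stdin and stdout and talks to the child through explicit file objects instead. The names spawn_child, converse, and CHILD_CODE are hypothetical, and the child program is passed inline with python -c so that the sketch is self-contained. It assumes Python 3 on a Unix-like system.

```python
import os
import sys

# Hypothetical stand-in for a separate child script: echo each input line.
CHILD_CODE = r'''
import sys
while True:
    line = sys.stdin.readline()
    if not line:
        break
    print('Child got: [%s]' % line.rstrip())
    sys.stdout.flush()          # push the reply through the pipe right away
'''


def spawn_child(prog, *args):
    """Start prog with its stdin and stdout connected to two new pipes."""
    to_child_r, to_child_w = os.pipe()        # parent -> child
    from_child_r, from_child_w = os.pipe()    # child -> parent
    pid = os.fork()
    if pid == 0:                              # in the child
        os.close(to_child_w)
        os.close(from_child_r)
        os.dup2(to_child_r, 0)                # child's stdin now reads pipe 1
        os.dup2(from_child_w, 1)              # child's stdout now writes pipe 2
        try:
            os.execvp(prog, (prog,) + args)   # replace this process
        finally:
            os._exit(127)                     # reached only if exec fails
    os.close(to_child_r)                      # parent closes the child's ends
    os.close(from_child_w)
    return pid, os.fdopen(from_child_r, 'r'), os.fdopen(to_child_w, 'w')


def converse(messages):
    """Send each message to the child and collect one reply line per message."""
    pid, reader, writer = spawn_child(sys.executable, '-c', CHILD_CODE)
    replies = []
    for message in messages:
        writer.write(message + '\n')
        writer.flush()                        # without this, both sides may wait forever
        replies.append(reader.readline().rstrip('\n'))
    writer.close()                            # child sees end-of-file and exits
    reader.close()
    os.waitpid(pid, 0)
    return replies


if __name__ == '__main__':
    for reply in converse(['Hello 1', 'Hello 2']):
        print('Parent got: "%s"' % reply)
```

Keeping the parent's own streams untouched makes the sketch easier to run and debug; redirecting them with os.dup2 as well, as the text describes, follows the same pattern.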
The spawn function in this module does not work on Windows (remember that fork is not available there). In fact, most of the calls in this module map straight to Unix system calls (and may be arbitrarily terrifying at first glance to non-Unix developers). We've already met some of these (e.g., os.fork), but much of this code depends on Unix concepts we don't have time to address well in this text. In simple terms, though, here is a brief summary of the system calls demonstrated in this code:
In terms of connecting standard streams, os.dup2 is the real nitty-gritty here. For example, the call os.dup2(parentStdin,stdinFd) essentially assigns the parent process's stdin file to the input end of one of the two pipes created; all stdin reads will henceforth come from the pipe. By connecting the other end of this pipe to the child process's copy of the stdout stream file with os.dup2(childStdout,stdoutFd), text written by the child to its stdout winds up being routed through the pipe to the parent's stdin stream.

To test this utility, the self-test code at the end of the file spawns the program shown in Example 5-19 in a child process and reads and writes standard streams to converse with it over two pipes.

Example 5-19. PP3E\System\Processes\pipes-testchild.py
Here is our test in action on Linux; its output is not incredibly impressive to read, but it represents two programs running independently and shipping data back and forth through a pipe device managed by the operating system. This is even more like a client/server model (if you imagine the child as the server). The text in square brackets in this output went from the parent process to the child and back to the parent again, all through pipes connected to standard streams:

    [mark@toy]$ python pipes.py
    Child 797 of 796 got arg: spam
    Parent got: "Child 797 got: [Hello 1 from parent 796]"
    Parent got: "Child 797 got: [Hello 2 from parent 796]"

5.6.2.1. Deadlocks, flushes, and unbuffered streams

The two processes of the prior section's example engage in a simple dialog, but it's already enough to illustrate some of the dangers lurking in cross-program communications. First of all, notice that both programs need to write to stderr to display a message; their stdout streams are tied to the other program's input stream. Because processes share file descriptors, stderr is the same in both parent and child, so status messages show up in the same place.

More subtly, note that both parent and child call sys.stdout.flush after they print text to the stdout stream. Input requests on pipes normally block the caller if no data is available, but it seems that this shouldn't be a problem in our example because there are as many writes as there are reads on the other side of the pipe. By default, though, sys.stdout is buffered, so the printed text may not actually be transmitted until some time in the future (when the stdio output buffers fill up). In fact, if the flush calls are not made, both processes will get stuck waiting for input from the other: input that is sitting in a buffer and is never flushed out over the pipe. They wind up in a deadlock state, both blocked on raw_input calls waiting for events that never occur.
Keep in mind that output buffering is really a function of the system libraries used to access pipes, not of the pipes themselves (pipes do queue up output data, but they never hide it from readers!). In fact, it occurs in this example only because we copy the pipe's information over to sys.stdout, a built-in file object that uses stdio buffering by default. However, such anomalies can also occur when using other cross-process tools, such as the popen2 and popen3 calls introduced in Chapter 3. In general terms, if your programs engage in a two-way dialog like this, there are at least three ways to avoid buffer-related deadlock problems:
The last technique merits a few more words. Try this: delete all the sys.stdout.flush calls in Example 5-18 and Example 5-19 (the files pipes.py and pipes-testchild.py) and change the parent's spawn call in pipes.py to this (i.e., add a -u command-line argument):

    spawn('python', '-u', 'pipes-testchild.py', 'spam')

Then start the program with a command line like this:

    python -u pipes.py

It will work as it did with the manual stdout flush calls, because stdout will be operating in unbuffered mode. We'll revisit the effects of unbuffered output streams in Chapter 11, when we code a GUI that displays the output of a non-GUI program by reading it over a pipe in a thread. Deadlock in general, though, is a bigger problem than we have space to address here; on the other hand, if you know enough that you want to do IPC in Python, you're probably already a veteran of the deadlock wars. See also the sidebar below on the pty module and Pexpect package for related tools.
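The buffering effect discussed in this section can be observed within a single process. The sketch below is illustrative only and assumes Python 3.5 or later on a Unix-like system (for os.set_blocking); the names read_available and observe_buffering are hypothetical. It writes one line through a file object wrapped around a pipe and checks what a reader could see before and after an explicit flush.

```python
import os


def read_available(fd):
    """Return the bytes currently in the pipe, or b'' if there are none."""
    try:
        return os.read(fd, 64)
    except BlockingIOError:               # nonblocking read found an empty pipe
        return b''


def observe_buffering(line_buffered=False):
    """Return (bytes visible before flush, bytes visible after flush)."""
    pipein, pipeout = os.pipe()
    os.set_blocking(pipein, False)        # so an empty pipe does not hang us
    # buffering=1 requests line buffering in text mode; -1 is the default,
    # which is block buffering for a pipe (it is not a terminal)
    writer = os.fdopen(pipeout, 'w', buffering=1 if line_buffered else -1)
    writer.write('hello\n')
    before = read_available(pipein)       # is the line in the pipe yet?
    writer.flush()
    after = read_available(pipein)
    writer.close()
    os.close(pipein)
    return before, after


if __name__ == '__main__':
    print('default buffering:', observe_buffering())
    print('line buffering:   ', observe_buffering(line_buffered=True))
```

With default buffering the text stays in the writer's buffer until flush is called, which is exactly the state in which two conversing processes can block forever waiting on each other. With line buffering the text reaches the pipe as soon as the newline is written.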
5.6.3. Named Pipes (Fifos)

On some platforms, it is also possible to create a pipe that exists as a file. Such files are called named pipes (or, sometimes, fifos) because they behave just like the pipes created within the previous section's programs but are associated with a real file somewhere on your computer, external to any particular program. Once a named pipe file is created, processes read and write it using normal file operations. Fifos are unidirectional streams. In typical operation, a server program reads data from the fifo, and one or more client programs write data to it. But a set of two fifos can be used to implement bidirectional communication just as we did for anonymous pipes in the prior section.

Because fifos reside in the filesystem, they are longer-lived than in-process anonymous pipes and can be accessed by programs started independently. The unnamed, in-process pipe examples thus far depend on the fact that file descriptors (including pipes) are copied to child processes' memory. That makes it difficult to use anonymous pipes to connect programs started independently. With fifos, pipes are accessed instead by a filename visible to all programs running on the computer, regardless of any parent/child process relationships.

Because of that, fifos are better suited as general IPC mechanisms for independent client and server programs. For instance, a perpetually running server program may create and listen for requests on a fifo that can be accessed later by arbitrary clients not forked by the server. In a sense, fifos are an alternative to the socket interface we'll meet in the next part of this book, but fifos do not directly support remote network connections, are not available on as many platforms, and are accessed using the standard file interface instead of the more specialized socket port numbers and calls we'll study later.
In Python, named pipe files are created with the os.mkfifo call, available today on Unix-like platforms but not on all flavors of Windows (though this call is also available in Cygwin Python on Windows; see the earlier sidebar). This creates only the external file, though; to send and receive data through a fifo, it must be opened and processed as if it were a standard file. Example 5-20 is a derivation of the pipe2.py script listed earlier. It is written to use fifos rather than anonymous pipes.

Example 5-20. PP3E\System\Processes\pipefifo.py
Because the fifo exists independently of both parent and child, there's no reason to fork here. The child may be started independently of the parent as long as it opens a fifo file by the same name. Here, for instance, on Linux the parent is started in one xterm window and then the child is started in another. Messages start appearing in the parent window only after the child is started and begins writing messages onto the fifo file:

    [mark@toy]$ python pipefifo.py
    Parent 657 got "Spam 000" at 968390065.865
    Parent 657 got "Spam 001" at 968390066.865
    Parent 657 got "Spam 002" at 968390068.865
    Parent 657 got "Spam 003" at 968390071.865
    Parent 657 got "Spam 004" at 968390075.865
    Parent 657 got "Spam 000" at 968390075.867
    Parent 657 got "Spam 001" at 968390076.865
    Parent 657 got "Spam 002" at 968390078.865

    [mark@toy]$ file /tmp/pipefifo
    /tmp/pipefifo: fifo (named pipe)
    [mark@toy]$ python pipefifo.py -child
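The fifo technique can be sketched roughly as follows. This is not the book's pipefifo.py listing: the names fifo_writer, fifo_reader, and fifo_demo are hypothetical, the fifo is created in a temporary directory rather than at /tmp/pipefifo, and the writer sends a fixed number of messages so that the sketch terminates. It assumes Python 3 on a Unix-like system. The fork in fifo_demo is used only to exercise both ends from one script; the reader and writer functions could equally run in two independently started programs that agree on the filename.

```python
import os
import tempfile


def fifo_writer(path, count):
    """Open the fifo by name and write `count` newline-terminated messages."""
    with open(path, 'w') as fifo:         # blocks until some reader opens it
        for i in range(count):
            fifo.write('Spam %03d\n' % (i % 5))
            fifo.flush()                  # send each message immediately


def fifo_reader(path):
    """Open the fifo by name and return every line until the writer closes."""
    with open(path, 'r') as fifo:         # blocks until some writer opens it
        return [line.rstrip('\n') for line in fifo]


def fifo_demo(count=3):
    """Create a fifo, write it from a forked child, and read it in the parent."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, 'pipefifo')
    os.mkfifo(path)                       # creates the file only; no data flows yet
    pid = os.fork()
    if pid == 0:
        try:
            fifo_writer(path, count)
        finally:
            os._exit(0)
    try:
        messages = fifo_reader(path)
        os.waitpid(pid, 0)
    finally:
        os.remove(path)                   # a fifo outlives its users until removed
        os.rmdir(tmpdir)
    return messages


if __name__ == '__main__':
    for message in fifo_demo(count=6):
        print('Parent %d got "%s"' % (os.getpid(), message))
```

Each open call blocks until the other end has also been opened, which matches the behavior shown in the session above: nothing appears on the reading side until a writer starts.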