Pipes

	Network Programming with Perl, by Lincoln D. Stein

	Chapter 2. Processes, Pipes, and Signals

Network programming is all about interprocess communication (IPC). One process exchanges data with another. Depending on the application, the two processes may be running on the same machine, may be running on two machines on the same segment of a local area network, or may be halfway across the world from each other. The two processes may be related to each other (for example, one may have been launched under the control of the other), or they may have been written decades apart by different authors for different operating systems.

The simplest form of IPC that Perl offers is the pipe. A pipe is a filehandle that connects the current script to the standard input or standard output of another process. Pipes are fully implemented on UNIX, VMS, and Microsoft Windows ports of Perl, and implemented on the Macintosh only in the MPW environment.

Opening a Pipe

The two-argument form of open() is used to open pipes. As before, the first argument is the name of a filehandle chosen by you. The second argument, however, is a program and all its arguments, either preceded or followed by the pipe symbol (|). The command should be entered exactly as you would type it in the operating system's default shell. On UNIX machines this is the Bourne shell (sh), and on Microsoft Windows systems it is the DOS/NT command shell. You may specify the full path to the command, for example /usr/bin/ls, or rely on the PATH environment variable to find the command for you.

If the pipe symbol precedes the program name, then the filehandle is opened for writing and everything written to the filehandle is sent to the standard input of the program. If the pipe symbol follows the program, then the filehandle is opened for reading, and everything read from the filehandle is taken from the program's standard output.

For example, in UNIX the command ls -l returns a listing of the files in the current directory. By passing an argument of "ls -l |" to open(), we can open a pipe to read from the command:

 open (LSFH, "ls -l |") or die "Can't open ls -l: $!";
 while (my $line = <LSFH>) {
     print "I saw: $line";    # $line already ends with a newline
 }
 close LSFH;

This fragment simply echoes each line produced by the ls -l command. In a real application, you'd want to do something more interesting with the information.

As an example of an output pipe, the UNIX wc -lw command counts the lines (option -l) and words (option -w) of the text sent to it on standard input. This code fragment opens a pipe to the command, writes a few lines of text to it, and then closes the pipe. When the program runs, the line and word counts produced by wc are printed in the command window:

 open (WC, "| wc -lw") or die "Can't open wordcount: $!";
 print WC "This is the first line.\n";
 print WC "This is another line.\n";
 print WC "This is the last line.\n";
 print WC "Oops. I lied.\n";
 close WC;

The IO::File module supports pipes too. When its new() method is given a single argument, that argument is handed to Perl's two-argument open():

 use IO::File;
 my $wc = IO::File->new("| wc -lw") or die "Can't open wordcount: $!";

Using Pipes

Let's look at a complete functional example (Figure 2.2). The program whos_there.pl opens up a pipe to the UNIX who command and counts the number of times each user is logged in. It produces a report like this one:

Figure 2.2. A script to open a pipe to the who command

 % whos_there.pl
     jsmith 9
        abu 5
     lstein 1
    palumbo 1

This indicates that users "jsmith" and "abu" are logged in 9 and 5 times, respectively, while "lstein" and "palumbo" are each logged in once. The users are sorted in descending order of the number of times they are logged in. This is the sort of script that might be used by an administrator of a busy system to watch usage.

Lines 1-3: Initialize script. We turn on strict syntax checking with use strict. This catches mistyped variables, inappropriate use of globals, failure to quote strings, and other potential errors. We create a local hash %who to hold the set of logged-in users and the number of times they are logged in.

Line 4: Open pipe to who command. We call open() on a filehandle named WHOFH, using "who |" as the second argument. If the open() call fails, we die with an error message.

Lines 5-8: Read the output of the who command. We read and process the output of who one line at a time. Each line of who looks like this:
 jsmith pts/23 Aug 12 10:26 (cranshaw.cshl.org) 
The fields are the username, the name of the terminal he's using, the date he logged in, and the address of the remote machine he logged in from (this format will vary slightly from one dialect of UNIX to another). We use a pattern match to extract the username, and we tally the names into the %who hash in such a way that the usernames become the keys, and the number of times each user is logged in becomes the value.

The <WHOFH> loop will terminate at the EOF, which in the case of pipes occurs when the program at the other end of the pipe exits or closes its standard output.

Lines 9-11: Print out the results. We sort the keys of %who in descending order of the number of times each user has logged in, and print out each username and login count. The printf() format used here, "%10s %d\n", tells printf() to format its first argument as a string right-justified in a field 10 characters wide, to print a space, and then to print the second argument as a decimal integer.

Line 12: Close the pipe. We are done with the pipe now, so we close() it. If an error is detected during close, we print out a warning.
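The listing in Figure 2.2 is not reproduced here. What follows is a sketch reconstructed from the walkthrough above. It is not the book's listing, and its line numbers do not match the figure.

The sketch assumes a UNIX who command whose first field is the username. It also moves the counting loop into a helper subroutine, tally_users(). That name is hypothetical, introduced here so the tallying can be checked without a live who command.

```perl
#!/usr/bin/perl
# whos_there.pl (sketch): count how many times each user is logged in.
use strict;
use warnings;

# Hypothetical helper: tally the first field of each line read from $fh.
sub tally_users {
    my $fh = shift;
    my %count;
    while (my $line = <$fh>) {
        next unless $line =~ /^(\S+)/;   # username is the first field
        $count{$1}++;
    }
    return %count;
}

open(WHOFH, "who |") or die "Can't open who: $!";
my %who = tally_users(\*WHOFH);

# Most frequent users first; %10s right-justifies each name in 10 columns.
foreach my $user (sort { $who{$b} <=> $who{$a} } keys %who) {
    printf "%10s %d\n", $user, $who{$user};
}

close(WHOFH) or warn "close failed: $! (exit status ", $? >> 8, ")\n";
```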

With pipes, the open() and close() functions are enhanced slightly to provide additional information about the subprocess. When opening a pipe, open() returns the process ID (PID) of the command at the other end of the pipe. This is a unique nonzero integer that can be used to monitor and control the subprocess with signals (which we discuss in detail later in the Handling Signals section). You can store this PID, or you can ignore its special meaning and treat the return value from open() as a Boolean flag.

When closing a pipe, the close() call is enhanced to place the exit code from the subprocess in the special global variable $? . Contrary to most Perl conventions, $? is zero if the command succeeded, and nonzero on an error. The perlvar POD page has more to say about the exit code, as does the section Handling Child Termination in Chapter 10.
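Here is a small sketch of both enhancements, assuming a Bourne-compatible /bin/sh. The command string is purely illustrative: it prints one line and exits with status 3.

- Because the command contains a shell metacharacter (;), Perl runs it through the shell. The PID that open() returns therefore belongs to that shell process.
- The exit value sits in the high byte of $?, hence the shift by 8.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# open() on a pipe returns the PID of the process at the other end.
my $pid = open(CHILD, "echo hello; exit 3 |")
    or die "Can't open pipe: $!";
print "Subprocess PID: $pid\n";

while (my $line = <CHILD>) {
    print "Subprocess said: $line";
}

# close() waits for the subprocess and stores its wait status in $?.
my $closed_ok = close(CHILD);
my $exit_code = $? >> 8;        # exit value lives in the high byte
print "close() returned ", ($closed_ok ? "true" : "false"),
      "; exit code $exit_code\n";   # prints: close() returned false; exit code 3
```

Note that close() itself returns false here. For a piped open, a nonzero exit status counts as failure, and in that case $! is left at 0.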

Another aspect of close() is that when closing a write pipe, the close() call will block until the process at the other end has finished all its work and exited. If you close a read pipe before reading to the EOF, the program at the other end will get a PIPE signal (see The PIPE Signal) the next time it tries to write to standard output.
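The blocking behavior of close() on a write pipe is easy to observe. In this sketch, sleep 2 stands in for a slow subprocess that never reads its input. Fractional timing comes from the core Time::HiRes module. close() should take roughly two seconds to return, because it waits for the subprocess to exit.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);       # time() with fractional seconds

open(SLEEPER, "| sleep 2") or die "Can't open pipe: $!";
print SLEEPER "nobody is reading this\n";

my $start = time;
close(SLEEPER);                 # blocks until sleep exits
my $elapsed = time - $start;
printf "close() took %.1f seconds\n", $elapsed;
```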

Pipes Made Easy: The Backtick Operator

Perl's backtick operator (`) is an easy way to create a one-shot pipe for reading a program's output. The backtick acts like the double-quote operator, except that whatever is contained between the backticks is interpreted as a command to run. For example:

 $ls_output = `ls`;

This will run the ls (directory listing) command, capture its output, and assign the output to the $ls_output scalar.

Internally, Perl opens a pipe to the indicated command, reads everything it prints to standard output, closes the pipe, and returns the command output as the operator result. The result typically ends with a newline, which can be removed with chomp().
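Here is a short sketch of capturing output with backticks, assuming a UNIX shell with echo. It also shows two related facts:

- Backticks set $? just as close() does for a piped open.
- In list context, backticks return one element per line of output rather than a single string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $greeting = `echo hello world`;   # scalar context: all output in one string
chomp($greeting);                     # remove the trailing newline
print "Got: '$greeting'\n";           # prints: Got: 'hello world'

# Backticks leave the command's wait status in $?.
my $status = $? >> 8;
print "Exit code: $status\n";         # prints: Exit code: 0

my @lines = `echo one; echo two`;    # list context: one element per line
print "Captured ", scalar(@lines), " lines\n";   # prints: Captured 2 lines
```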

Just like double quotes, backticks interpolate scalar variables and arrays. For example, we can create a variable containing the arguments to pass to ls like this:

 $arguments = '-l -F';
 $ls_output = `ls $arguments`;

The command's standard error is not redirected by backticks. If the subprocess writes any diagnostic or error messages, they will be intermingled with your program's diagnostics. On UNIX systems, you can use the Bourne shell's output redirection system to combine the subprocess's standard error with its standard output like this:

 $ls_output = `ls 2>&1`;

Now $ls_output will contain both the standard error and the standard output of the command.
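To see the difference, this sketch runs ls on a path assumed not to exist; /no/such/directory is a placeholder. Without redirection, the error message goes straight to the script's own standard error and the captured string is empty. With 2>&1, the message lands in the captured string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $missing = '/no/such/directory';   # placeholder path assumed not to exist

# Without redirection, ls's error goes to our own standard error,
# so nothing is captured.
my $plain = `ls $missing`;

# With 2>&1, the shell merges standard error into standard output.
my $combined = `ls $missing 2>&1`;

print "Captured without 2>&1: ", length($plain), " bytes\n";
print "Captured with 2>&1: $combined";
```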

Pipes Made Powerful: The pipe() Function

A powerful but slightly involved way to create a pipe is with Perl's built-in pipe() function. pipe() creates a pair of filehandles: one for reading and one for writing. Everything written to the one filehandle can be read from the other.

$result = pipe (READHANDLE,WRITEHANDLE)

Open a pair of filehandles connected by a pipe. The first argument is the name of a filehandle to read from, and the second is a filehandle to write to. If successful, pipe() returns a true result code.

Why is pipe() useful? It is commonly used in conjunction with the fork() function to create a parent-child pair that can exchange data. The parent process keeps one filehandle and closes the other, while the child does the opposite. The parent and child processes can then communicate across the pipe as they work in parallel.

A short example will illustrate the power of this technique. Given a positive integer, the facfib.pl script calculates its factorial and the value of its position in the Fibonacci series. To take advantage of modern multiprocessing machines, these calculations are performed in two subprocesses so that both calculations proceed in parallel. The script uses pipe() to create filehandles that the child processes can use to communicate their findings to the parent process that launched them. When we run this program, we may see results like this:

 % facfib.pl 8
 factorial(1) => 1
 factorial(2) => 2
 factorial(3) => 6
 factorial(4) => 24
 factorial(5) => 120
 fibonacci(1) => 1
 factorial(6) => 720
 fibonacci(2) => 1
 factorial(7) => 5040
 fibonacci(3) => 2
 factorial(8) => 40320
 fibonacci(4) => 3
 fibonacci(5) => 5
 fibonacci(6) => 8
 fibonacci(7) => 13
 fibonacci(8) => 21

The results from the factorial and Fibonacci calculation overlap because they are occurring in parallel.

Figure 2.3 shows how this program works.

Figure 2.3. Using pipe() to create linked filehandles

Lines 1-3: Initialize script. We turn on strict syntax checking and recover the command-line argument. If no argument is given, we default to 10.

Line 4: Create linked filehandles. We create a linked pair of filehandles with pipe(). READER will be used by the main (parent) process to read results from the children, which will use WRITER to write their results.

Lines 5-10: Create first child process. We call fork() to clone the current process. In the parent process, fork() returns the nonzero PID of the child process; in the child process, it returns numeric 0. If the result of fork() is 0, we know we are the child process. We close the READER filehandle because we don't need it. We select() WRITER, making it the default filehandle for output, and turn on autoflush mode by setting $| to a true value. This is necessary to ensure that the parent process gets our messages as soon as we write them.

We now call the factorial() subroutine with the integer argument from the command line. After this, the child process is done with its work, so we exit() . Our copy of WRITER is closed automatically.

Lines 11-16: Create the second child process. Back in the parent process, we invoke fork() again to create a second child process. This one, however, calls the fibonacci() subroutine rather than factorial().

Lines 17-19: Process messages from children. In the parent process, we close WRITER because we no longer need it. Closing it also matters because READER cannot see EOF while any copy of WRITER, including the parent's own, remains open. We read from READER one line at a time and print out the results, which include lines issued by both children. Reading from READER returns undef when the last child has finished and closed its WRITER filehandle, sending us an EOF. We could close() READER and check the result code, or let Perl close the filehandle when we exit, as we do here.

Lines 20-25: The factorial() subroutine. We calculate the factorial of the subroutine argument in a straightforward iterative way. For each step of the calculation, we print out the intermediate result. Because WRITER has been made the default filehandle with select(), each print() statement enters the pipe, where it is ultimately read by the parent process.

Lines 26-34: The fibonacci() subroutine. This is identical to factorial() except for the calculation itself.
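The Figure 2.3 listing is not reproduced here. The sketch below is reconstructed from the walkthrough above, so its line numbers will not match the figure exactly. Two small additions are ours:

- The parent keeps the lines it reads in @results.
- The parent reaps the children with wait() before finishing.

```perl
#!/usr/bin/perl
# facfib.pl (sketch): factorials and Fibonacci numbers computed in two
# child processes that report back to the parent through one pipe.
use strict;
use warnings;

my $arg = shift || 10;
pipe(READER, WRITER) or die "Can't open pipe: $!";

# First child: factorials
my $pid = fork;
die "Can't fork: $!" unless defined $pid;
if ($pid == 0) {
    close READER;                 # the child only writes
    select(WRITER); $| = 1;       # WRITER becomes default output, autoflushed
    factorial($arg);
    exit 0;
}

# Second child: Fibonacci numbers
$pid = fork;
die "Can't fork: $!" unless defined $pid;
if ($pid == 0) {
    close READER;
    select(WRITER); $| = 1;
    fibonacci($arg);
    exit 0;
}

# Parent: read until every copy of WRITER has been closed
close WRITER;
my @results;
while (my $line = <READER>) {
    print $line;
    push @results, $line;         # kept so the output can be inspected
}
1 while wait() != -1;             # reap both children

sub factorial {
    my $target = shift;
    my $result = 1;
    for my $n (1 .. $target) {
        $result *= $n;
        print "factorial($n) => $result\n";
    }
    return $result;
}

sub fibonacci {
    my $target = shift;
    my ($prev, $curr) = (0, 1);
    for my $n (1 .. $target) {
        print "fibonacci($n) => $curr\n";
        ($prev, $curr) = ($curr, $prev + $curr);
    }
    return $prev;                 # fibonacci($target)
}
```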

Instead of merely echoing its children's output, we could have the parent do something more useful with the information. We use a variant of this technique in Chapter 14 to implement a preforked Web server. The parent Web server manages possibly hundreds of children, each of which is responsible for processing incoming Web requests. To tune the number of child processes to the incoming load, the parent monitors the status of the children via messages that they send through a pipe. It launches more children under conditions of high load and kills excess children when the load is low.

The pipe() function can also be used to create a filehandle connected to another program in much the way that piped open() does. We don't use this technique elsewhere, but the general idea is as follows. The parent process calls fork(). The child process reopens either STDIN or STDOUT onto one of the paired filehandles and then calls exec() on the desired program with arguments. Here's the idiom:

 pipe(READER, WRITER) or die "pipe no good: $!";
 my $child = fork();
 die "Can't fork: $!" unless defined $child;
 if ($child == 0) {                  # child process
     close READER;                   # child doesn't need this
     open(STDOUT, ">&WRITER")        # STDOUT now goes to WRITER
         or die "Can't dup WRITER: $!";
     exec $cmd, $args;
     die "exec failed: $!";
 }
 close WRITER;                       # parent doesn't need this

At the end of this code, READER will be attached to the standard output of the command named $cmd , and the effect is almost exactly identical to this code:

 open (READER, "$cmd $args |") or die "pipe no good: $!";

Bidirectional Pipes

Both piped open() and pipe() create unidirectional filehandles. If you want to both read from and write to another process, you're out of luck. In particular, this sensible-looking syntax does not work:

 open(FH, "| $cmd |");

One way around this is to call pipe() twice, creating two pairs of linked filehandles. One pair is used for writing from parent to child, and the other for child to parent, rather like a two-lane highway. We won't go into this technique, but it's what the standard IPC::Open2 and IPC::Open3 modules do to create a set of filehandles attached to the STDIN, STDOUT, and STDERR of a subprocess.
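For the common case of talking to a single filter program, IPC::Open2 wraps the two-pipe technique in one call. This sketch assumes a UNIX sort command.

sort is a safe partner because it reads all of its input before writing anything. Closing the writing handle before reading therefore avoids a deadlock that can arise with other programs: both sides can end up blocked, each waiting on the other, once the pipe buffers fill.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Open2;

# open2() returns the child's PID and autovivifies the two handles:
# $from_sort reads the child's STDOUT, $to_sort writes to its STDIN.
my $pid = open2(my $from_sort, my $to_sort, 'sort');

print $to_sort "$_\n" for qw(pear apple orange banana);
close($to_sort);                 # EOF tells sort its input is complete

my @sorted = <$from_sort>;       # sort writes only after reading all input
close($from_sort);
waitpid($pid, 0);                # reap the child

chomp(@sorted);
print "@sorted\n";               # prints: apple banana orange pear
```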

A more elegant way to create a bidirectional pipe is with the socketpair() function. This creates two linked filehandles like pipe() does, but instead of being a one-way connection, both filehandles are read/write. Data written into one filehandle comes out the other one, and vice versa. Because the socketpair() function involves the same concepts as the socket() function used for network communications, we defer our discussion of it until Chapter 4.

Distinguishing Pipes from Plain Filehandles

You will occasionally need to test a filehandle to see if it is opened on a file or a pipe. Perl's filehandle tests make this possible (Table 2.1).

Table 2.1. Perl's Filehandle Tests

Test	Description
`-p`	Filehandle is a pipe.
`-t`	Filehandle is opened on a terminal.
`-S`	Filehandle is a socket.

If a filehandle is opened on a pipe, the -p test will return true:

 print "I've got a pipe!\n" if -p FILEHANDLE;

The -t and -S file tests can distinguish other special types of filehandle. If a filehandle is opened on a terminal (the command-line window), then -t will return true. Programs can use this to test STDIN to see if the program is being run interactively or has its standard input redirected from a file:

 print "Running in batch mode, confirmation prompts disabled.\n"
     unless -t STDIN;

The -S test detects whether a filehandle is opened on a network socket (introduced in Chapter 3):

 print "Network active.\n" if -S FH;

There are more than a dozen other file test operators that can give you a file's size, modification date, ownership, and other information. See the perlfunc POD page for details.
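Here is a sketch that applies -p to a read pipe and to an ordinary temporary file, and -t to STDIN. The temporary file is created with the core File::Temp module. The pipe result should be true and the file result false; the -t result depends on how the script is run.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A read pipe from a command
open(PIPE, "echo hello |") or die "Can't open pipe: $!";
my $pipe_is_pipe = -p PIPE ? 1 : 0;

# An ordinary temporary file, removed automatically at exit
my ($file_fh, $file_name) = tempfile(UNLINK => 1);
my $file_is_pipe = -p $file_fh ? 1 : 0;

print "PIPE is ",      ($pipe_is_pipe ? "" : "not "), "a pipe\n";
print "temp file is ", ($file_is_pipe ? "" : "not "), "a pipe\n";
print "STDIN is ",     (-t STDIN ? "" : "not "), "a terminal\n";

close(PIPE);
```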

The Dreaded PIPE Error

When your script is reading from a filehandle opened on a pipe, and the program at the other end either exits or simply closes its end of the pipe, your program will receive an EOF on the filehandle. What happens in the opposite case, when your script is writing to a pipe and the program at the other end terminates prematurely or closes its end of the connection?

To find out, we can write two short Perl scripts. One, named write_ten.pl , opens up a pipe to the second program and attempts to write ten lines of text to it. The script checks the result code from print() , and bumps up a variable named $count whenever print() returns a true result. When write_ten.pl is done, it displays the contents of $count , indicating the number of lines that were successfully written to the pipe. The second program, named read_three.pl , reads three lines of text from standard input and then exits.

The two scripts are shown in Figures 2.4 and 2.5. Of note is that write_ten.pl puts the pipe into autoflush mode so that each line of text is sent down the pipe immediately, rather than being buffered locally. write_ten.pl also sleep()s for one second after writing each line of text, giving read_three.pl a chance to report that the text was received. Together, these steps make it easier for us to see what is happening. When we run write_ten.pl we see the following:

Figure 2.4. The write_ten.pl script writes ten lines of text to a pipe

Figure 2.5. The read_three.pl script reads three lines of text from standard input

 % write_ten.pl
 Writing line 1
 Read_three got: This is line number 1
 Writing line 2
 Read_three got: This is line number 2
 Writing line 3
 Read_three got: This is line number 3
 Writing line 4
 Broken pipe
 %

Everything works as expected through line three, at which point read_three.pl exits. When write_ten.pl attempts to write the fourth line of text, the script crashes with a Broken pipe error. The statement that prints out the number of lines successfully passed to the pipe is never executed.

When a program attempts to write to a pipe and no program is reading at the other end, this results in a PIPE exception. This exception, in turn, results in a PIPE signal being delivered to the writer. By default this signal results in the immediate termination of the offending program. The same error occurs in network applications when the sender attempts to transmit data to a remote program that has exited or has stopped receiving.
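Signal handlers are the subject of the next section. As a preview of why the signal matters, here is a sketch in the spirit of write_ten.pl; it is a variant, not the book's listing. It sets $SIG{PIPE} to 'IGNORE'. With the signal ignored, writing to a pipe that has no reader fails instead of killing the script: print() returns false and $! is set to EPIPE, so the final count does get printed.

Two details of the sketch:

- A one-line perl -e program stands in for read_three.pl.
- The list form of piped open ('|-' followed by the command and its arguments, available on UNIX in Perl 5.8 and later) sidesteps shell quoting.

```perl
#!/usr/bin/perl
# Variant of write_ten.pl (sketch): PIPE is ignored, so writing to a pipe
# whose reader has exited fails with EPIPE instead of killing the script.
use strict;
use warnings;
use IO::Handle;                 # for autoflush()
use Errno qw(EPIPE);

$SIG{PIPE} = 'IGNORE';          # ignore PIPE rather than die from it
STDOUT->autoflush(1);           # keep our messages in order with the reader's

# Stand-in for read_three.pl: read three lines from STDIN, then exit.
my $reader = 'for (1 .. 3) { defined(my $l = <STDIN>) or last; '
           . 'print "Read_three got: $l" }';
open(my $pipe, '|-', $^X, '-e', $reader) or die "Can't open pipe: $!";
$pipe->autoflush(1);            # send each line down the pipe immediately

my ($count, $saw_epipe) = (0, 0);
for my $i (1 .. 10) {
    print "Writing line $i\n";
    if (print $pipe "This is line number $i\n") {
        $count++;
    } else {
        $saw_epipe = 1 if $! == EPIPE;
        print "Write failed: $!\n";
    }
    sleep 1;                    # give the reader a chance to report
}
close($pipe);                   # result ignored: the reader is long gone
print "Wrote $count lines\n";
```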

To deal effectively with PIPE , you must install a signal handler, and this brings us to the next major topic.
