Section 14.5. Processes as Filehandles | Learning Perl, 5th Edition

14.5. Processes as Filehandles

So far, we've been looking at ways to deal with synchronous processes, where Perl stays in charge, launches a command, (usually) waits for it to finish, and possibly grabs its output. But Perl can also launch a child process that stays alive, communicating^[†] with Perl on an ongoing basis until the task is complete.

^[†] Via pipes or whatever your operating system provides for interprocess communication.

The syntax for launching a concurrent (parallel) child process is to put the command as the "filename" for an open call and to precede or follow the command with a vertical bar, which is the "pipe" character. For that reason, this is often called a piped open:

     open DATE, "date|" or die "cannot pipe from date: $!";
     open MAIL, "|mail merlyn" or die "cannot pipe to mail: $!";

In the first example, with the vertical bar on the right, the command is launched with its standard output connected to the DATE filehandle opened for reading, similar to the way that the command date | your_program would work from the shell. In the second example, with the vertical bar on the left, the command's standard input is connected to the MAIL filehandle opened for writing, similar to what happens with the command your_program | mail merlyn. In either case, the command launches and continues independently of the Perl process.^[*] The open fails if the child process cannot be created. If the command doesn't exist or exits erroneously, this will (generally) not be seen as an error when opening but as an error when closing. We'll get to that in a moment.
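Here's a small sketch of that open-succeeds, close-fails behavior. It assumes a Unix-like system with the standard false command, which does nothing but exit with status 1. The open has no trouble launching the command; the failure shows up only when close waits for the child and finds a nonzero exit status. On such a system it should report that close returned false, with exit code 1.

```perl
use strict;
use warnings;

# Assumes a Unix-like system: "false" always exits with status 1.
open FALSE, "false|" or die "cannot pipe from false: $!";
my @lines = <FALSE>;     # false prints nothing, so this list is empty
my $ok = close FALSE;    # waits for the child; returns false on nonzero exit
my $code = $? >> 8;      # the exit code lives in the high byte of $?
print "open succeeded, close returned ", ($ok ? "true" : "false"),
  ", exit code $code\n";
```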

^[*] If the Perl process exits before the command is complete, a command that's been reading will see end-of-file, while a command that's been writing will get a "broken pipe" error signal on the next write by default.

For all intents and purposes, the rest of the program doesn't know, doesn't care, and would have to work hard to figure out that this is a filehandle opened on a process rather than on a file. So, to get data from a filehandle opened for reading, we'll do the normal read:

     my $now = <DATE>;

To send data to the mail process (waiting for the body of a message to deliver to merlyn on standard input), a simple print-with-a-filehandle will do:

     print MAIL "The time is now $now"; # presume $now ends in newline

In short, you can pretend that these filehandles are hooked up to magical files, one that contains the output of the date command and one that will automatically be mailed by the mail command.
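To try both directions without actually sending mail, here's a sketch that reads from date as before but substitutes tr a-z A-Z for mail merlyn. That substitution is our own assumption, made so the example is safe to run: tr simply uppercases whatever we print to it and writes the result to its standard output (which it inherits from Perl). It assumes a Unix-like system with date and tr on the PATH.

```perl
use strict;
use warnings;

# Reading: the date command's standard output feeds the DATE handle.
open DATE, "date|" or die "cannot pipe from date: $!";
my $now = <DATE>;
close DATE;
die "date: nonzero exit of $?" if $?;

# Writing: tr stands in for mail here (an assumption, so no email is sent);
# it uppercases whatever we print to it.
open UPPER, "|tr a-z A-Z" or die "cannot pipe to tr: $!";
print UPPER "The time is now $now";   # $now already ends in a newline
close UPPER;
die "tr: nonzero exit of $?" if $?;
```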

If a process is connected to a filehandle that is open for reading and the process exits, the filehandle returns end-of-file, just like reading up to the end of a normal file. Conversely, when you close a filehandle open for writing to a process, the process sees end-of-file. So, to finish sending the email, close the handle:

     close MAIL;
     die "mail: nonzero exit of $?" if $?;
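Closing is what actually delivers that end-of-file, and the sort command makes this easy to see, since it can't print anything until it has read all of its input. In this sketch (assuming a Unix-like sort on the PATH), nothing sorted appears before the close; after it, the three words come out in the order apple, fig, pear.

```perl
use strict;
use warnings;

# sort can't print anything until it has read all of its input, so its
# output appears only once we close the handle and it sees end-of-file.
open SORT, "|sort" or die "cannot pipe to sort: $!";
print SORT "$_\n" for qw(pear apple fig);
print STDERR "three lines written; nothing sorted yet\n";
close SORT;   # sends end-of-file, then waits for sort to exit
die "sort: nonzero exit of $?" if $?;
```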

If you close a filehandle attached to a process, Perl waits for the process to complete so it can get the process's exit status. The exit status is then available in the $? variable (reminiscent of the same variable in the Bourne Shell) and is the same kind of number as the value returned by the system function: zero for success and nonzero for failure. Each new exited process overwrites the previous value though, so save it quickly if you want it. (The $? variable also holds the exit status of the most recent system or backquoted command, if you're curious.)
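Here's a sketch of why saving $? promptly matters. It assumes a POSIX sh, and the command sh -c 'exit 3' is just a stand-in for any command that fails. It also decodes the saved value the way perlvar describes: the exit code is $? >> 8, and the low seven bits ($? & 127) hold the number of the signal that killed the process, if any. It should report a saved exit code of 3 with signal 0, and a current $? of 0 after the later system call.

```perl
use strict;
use warnings;

# sh -c 'exit 3' stands in for any command that exits with a failure status.
open FAIL, "sh -c 'exit 3'|" or die "cannot pipe from sh: $!";
my @output = <FAIL>;   # no output expected
close FAIL;            # waits for the child; $? now holds its status
my $saved = $?;        # save it right away

system "true";         # any later child process overwrites $?
my $after = $?;

printf "saved: exit code %d, signal %d\n", $saved >> 8, $saved & 127;
printf "after system: \$? is %d\n", $after;
```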

The processes are synchronized like a pipelined command. If you try to read when no data is available, your process is suspended (without consuming additional CPU time) until the sending program writes more. Similarly, if a writing process gets ahead of the reading process, the writer is held up until the reader starts to catch up. There's a buffer in between (its size depends on the system; on modern Linux it's 64KB) so they don't have to stay in lockstep.

Why use processes as filehandles? Well, it's the only easy way to write to a process based on the results of a computation. If you're only reading, backquotes can be easier to manage unless you want to have the results as they come in.
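To see the difference, here's a sketch using a hypothetical slow command (a small sh loop that prints a line about once a second) in place of a real long-running program. With backquotes, Perl gets all three lines at once, after roughly three seconds; with the piped open, each line arrives as soon as it's written, roughly a second apart. It assumes a POSIX sh with a sleep command.

```perl
use strict;
use warnings;

$| = 1;   # flush our own output immediately, even when STDOUT isn't a terminal

# Hypothetical slow producer: three lines, with a one-second pause before each.
my $cmd = q{sh -c 'for i in 1 2 3; do sleep 1; echo line $i; done'};

# Backquotes: nothing comes back until the command has finished.
my $start = time();
my @all = `$cmd`;
printf "backquotes: %d lines after %d seconds\n", scalar(@all), time() - $start;

# Piped open: each line is returned as soon as the command writes it.
$start = time();
open SLOW, "$cmd|" or die "cannot pipe from sh: $!";
my $count = 0;
while (<SLOW>) {
  $count++;
  printf "after %d seconds: %s", time() - $start, $_;
}
close SLOW;
die "sh: nonzero exit of $?" if $?;
```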

For example, the Unix find command locates files based on their attributes, and it can take a while if used on a fairly large number of files (such as starting from the root directory). You can put a find command inside backquotes, but it's often nicer to see the results as they are found:

     open F, "find / -atime +90 -size +1000 -print|" or die "fork: $!";
     while (<F>) {
       chomp;
       # -s gives the size in bytes; -A gives the age in days since last access
       printf "%s size %dK last accessed %d days ago\n",
         $_, (1023 + -s $_)/1024, -A $_;
     }

The find command in the previous example is looking for all the files not accessed within the past 90 days and larger than 1,000 blocks. (These are good candidates to move to longer-term storage.) While find is searching, Perl can wait. As each file is found, Perl responds to the incoming name and displays some information about that file for further research. Had this been written with backquotes, we wouldn't have seen any output until the find command had finished. It's comforting to see that it's actually doing the job before it's done.