Concurrent Clients

	Network Programming with Perl, by Lincoln D. Stein
	Chapter 5. The IO::Socket API

This chapter concludes by introducing a topic that is one of the central issues of Part III, the problem of concurrency.

A Gab Client, First Try

To motivate this discussion, let's write a simple client that can be used for interactive conversations with line-oriented servers. This program will connect to a designated host and port, and simply act as a direct conduit between the remote machine and the user. Everything the user types is transmitted across the wire to the remote host, and everything we receive from the remote host is echoed to standard output.

We can use this client to talk to the echo servers from this chapter and the last, or to talk with other line-oriented servers on the Internet. Common examples include FTP, SMTP, and POP3 servers.

We'll call this client gab1.pl, because it is the first in a series of such clients. A simple (but incorrect) implementation of the gab client is shown in Figure 5.6.

Figure 5.6. An incorrect implementation of a gab client

Lines 1–6: Initialize module. We load IO::Socket as before and recover the desired remote hostname and port number from the command-line arguments.

Lines 7–8: Create socket. We create the socket using IO::Socket::INET->new() or, if unsuccessful, die with an error message.

Lines 9–21: Enter main loop. We enter a loop. Each time through the loop we read one line from the socket and print it to standard output. Then we read a line of input from the user and print it to the socket.

Because the remote server uses CRLF pairs to end its lines, but the user types conventional newlines, we need to keep setting and resetting $/. The easiest way to do this is to place the code that reads a line from the socket in a little block, and to localize the $/ variable (with local), so that its current value is saved on entry into the block and restored on exit. Within the block, we set $/ to CRLF.

If we get an EOF from either the user or the server, we leave the loop by calling last.
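Since Figure 5.6 is not reproduced as text here, the following is a minimal sketch of a client that behaves as just described. It is not the book's listing, and its line numbers do not match the walkthrough above. The names gab_loop() and main(), the filehandle parameters, and the decision to do nothing when run without arguments are assumptions of this sketch, made so that the loop can be tried on its own.

```perl
#!/usr/bin/perl
# gab1-style sketch (deliberately flawed): strictly alternate between one
# line from the server and one line from the user.
use strict;
use warnings;
use IO::Socket;
use Socket qw(:DEFAULT :crlf);    # :crlf provides $CRLF ("\015\012")

# Copy lines in lockstep. $in and $out are parameters only so the loop
# can be exercised without a terminal (an assumption of this sketch).
sub gab_loop {
    my ($socket, $in, $out) = @_;
    while (1) {
        my $from_server;
        {
            local $/ = $CRLF;                 # CRLF line endings inside this block only
            $from_server = <$socket>;
            chomp $from_server if defined $from_server;
        }                                     # $/ reverts to its previous value here
        last unless defined $from_server;     # EOF from the server
        print {$out} $from_server, "\n";

        my $from_user = <$in>;
        last unless defined $from_user;       # EOF from the user
        chomp $from_user;
        print {$socket} $from_user, $CRLF;
    }
}

sub main {
    my ($host, $port) = @_;
    die "Usage: gab1.pl host port\n" unless defined $host && defined $port;
    my $socket = IO::Socket::INET->new("$host:$port") or die $@;
    gab_loop($socket, \*STDIN, \*STDOUT);
}

# Hypothetical convenience: connect only when arguments were supplied.
main(@ARGV) if @ARGV;
```

Note that the EOF test sits outside the inner block. A bare block counts as a loop that runs once, so an unlabeled last placed inside it would leave only the block, not the surrounding while loop.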

At first, this straightforward script seems to work. For example, this transcript illustrates a session with an FTP server. The first thing we see on connecting with the server is its welcome banner (message code 220). We type in the FTP USER command, giving the name "anonymous," and get an acknowledgment. We then provide a password with PASS, and get another acknowledgment. Everything seems to be going smoothly.

    % gab1.pl phage.cshl.org ftp
    220 phage.cshl.org FTP server ready.
    USER anonymous
    331 Guest login ok, send your complete e-mail address as password.
    PASS jdoe@nowhere.com
    230 Guest login ok, access restrictions apply.

Unfortunately, things don't last that way for long. The next thing we try is the HELP command, which is supposed to print a multiline summary of FTP commands. This doesn't go well. We get the first line of the expected output, and then the script stops, waiting for us to type the next command. We type another HELP, and get the second line of the output from the first HELP command. We type QUIT, and get the third line of the HELP command.

    HELP
    214-The following commands are recognized (* =>'s unimplemented).
    HELP
       USER    PORT    STOR    MSAM*   RNTO    NLST    MKD     CDUP
    QUIT
       PASS    PASV    APPE    MRSQ*   ABOR    SITE    XMKD    XCUP
    QUIT
       ACCT*   TYPE    MLFL*   MRCP*   DELE    SYST    RMD     STOU
    QUIT
    ...

Clearly the script has gotten out of synch. As it is written, it can deal with only the situation in which a single line of input from the user results in a single line of output from the server. Having no way of dealing with multiline output, it can't catch up with the response to the HELP command.

What if we changed the line that reads from the server to something like this?

    while ($from_server = <$socket>) {
      chomp $from_server;
      print $from_server,"\n";
    }

Unfortunately, this just makes matters worse. Now the script hangs after it reads the first line from the server. The FTP server is waiting for us to send it a command, but the script is waiting for another line from the server and hasn't yet asked us for input, a situation known as deadlock.

In fact, none of the straightforward rearrangements of the read and print order fixes this problem. We either get out of synch or get hopelessly deadlocked.

A Gab Client, Second Try

What we need to do is decouple the process of reading from the remote host from the process of reading from the user. In fact, we need to isolate the two tasks in concurrent but independent processes that won't keep blocking each other the way the naive implementation of the gab client did.

On UNIX and Windows systems, the easiest way to accomplish this task is to use the fork() function to create two copies of the script. The parent process will be responsible for copying data from standard input to the remote host, while the child will be responsible for the flow of data in the other direction. Unfortunately, Macintosh users do not have access to this call. A good but somewhat more complex solution that avoids the call to fork() is discussed in Chapter 12, Multiplexed Operations.

As it turns out, the simple part of the script is connecting to the server, forking a child, and having each process copy data across the network. The hard part is to synchronize the two processes so that they both quit gracefully when the session is done. Otherwise, there is a chance that one process may continue to run after the other has exited.

There are two scenarios for terminating the connection. In the first scenario, the remote server initiates the process by closing its end of the socket. In this case, the child process receives an EOF when it next tries to read from the server and calls exit(). It somehow has to signal to the parent that it is done. In the second scenario, the user closes standard input. The parent process detects EOF when it reads from STDIN and has to inform the child that the session is done.

On UNIX systems, there is a built-in way for children to signal parents that they have exited. The CHLD signal is sent automatically to a parent whenever one of its subprocesses has died (or has stopped or resumed; we discuss this in more detail in Chapter 10). For the parent process to detect that the remote server has closed the connection, it merely has to install a CHLD handler that calls exit(). When the child process detects that the server has closed the connection, the child will exit, generating a CHLD signal. The parent's signal handler is invoked, and the parent process now exits too.
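The CHLD mechanism can be tried in isolation with a small sketch like the one below. It assumes a UNIX system. The handler here sets a flag instead of calling exit(), which is a change made purely so that the effect can be observed; in the gab client the handler would simply exit. The variable names are illustrative.

```perl
use strict;
use warnings;
use POSIX ();    # for _exit()

my $child_exited = 0;

# In the gab client this handler would call exit(); the sketch records
# the signal in a flag instead so that it can be inspected afterwards.
$SIG{CHLD} = sub { $child_exited = 1 };

my $pid = fork();
die "Can't fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child: stands in for the process that has just seen EOF from the server.
    POSIX::_exit(0);
}

# Parent: sleep() returns early when a signal arrives, so this loop ends
# shortly after the child dies.
sleep 1 until $child_exited;

waitpid($pid, 0);    # collect the child's exit status
print "parent $$ was notified that child $pid exited\n";
```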

The second scenario, in which the user closes STDIN, is a bit more complicated. One easy way is for the parent just to kill() its child after standard input has closed. There is, however, a problem with this. Just because the user has closed standard input doesn't mean that the server has finished sending output back to us. If we kill the child before it has received and processed all the pending information from the server, we may lose some information.

The cleaner way to do this is shown in Figure 5.7. When the parent process gets an EOF from standard input, it closes its end of the socket, thereby sending the server an end-of-file condition. The server detects the EOF, and closes its end of the connection, thereby propagating the EOF back to the child process. The child process exits, generating a CHLD signal. The parent intercepts this signal, and exits itself.

Figure 5.7. Closing a connection in a forked client

The beauty of this is that the child doesn't see the EOF until after it has finished processing any queued server data. This guarantees that no data is lost. In addition, the scheme works equally well when the termination of the connection is initiated by the server. The risk of this scheme is that the server may not cooperate and close its end of the connection when it receives an EOF. However, most servers are well behaved in this respect. If you encounter one that isn't, you can always kill both the parent and the child by pressing the interrupt key.

There is one subtle aspect to this scheme. The parent process can't simply close() its copy of the socket in order to send an EOF to the remote host. There is a second copy of the socket in the child process, and the operating system won't actually close a filehandle until its last copy is closed. The solution is for the parent to call shutdown(1) on the socket, forcefully closing it for writing. This sends EOF to the server without interfering with the socket's ability to continue to read data coming in the other direction. This strategy is implemented in Figure 5.8, in a script named gab2.pl.
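The half-close can be observed without a network connection. The sketch below is an illustration under stated assumptions, not part of gab2.pl: a socketpair() stands in for the client and server ends, and a duplicated filehandle ($child_copy, a name invented here) stands in for the copy of the socket held by the forked child. Because that second copy stays open, a plain close() of $client would not deliver EOF to the server end; shutdown() with an argument of 1 does.

```perl
use strict;
use warnings;
use Socket;        # AF_UNIX, SOCK_STREAM, PF_UNSPEC
use IO::Handle;    # autoflush()

# A connected pair of local sockets plays the roles of client and server.
socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$client->autoflush(1);
$server->autoflush(1);

# A duplicate descriptor stands in for the forked child's copy of the socket.
open(my $child_copy, '<&', $client) or die "dup: $!";

# "Parent": send a final command, then close the socket for writing only.
print {$client} "QUIT\r\n";
shutdown($client, 1) or die "shutdown: $!";

# "Server": readline in list context returns only once EOF has been seen,
# so reaching the next statement shows that the EOF was delivered.
my @requests = <$server>;

# The other direction is still open: the server can reply and then close.
print {$server} "221 Goodbye.\r\n";
close $server;

# "Child": the pending reply is still readable through the second copy.
my $reply = <$child_copy>;
print "server saw ", scalar(@requests), " line(s) before EOF\n";
print "child copy read: $reply";
```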

Figure 5.8. A working implementation of a gab client

Lines 1–7: Initialize module. We turn on strict syntax checking, load IO::Socket, and fetch the host and port from the command line.

Line 8: Create the socket. We create the connected socket in exactly the same way as before.

Lines 9–10: Call fork. We call fork(), storing the result in the variable $child. Recall that if successful, fork() duplicates the current process. In the parent process, fork() returns the PID of the child; in the child process, fork() returns numeric 0.

In case of error, fork() returns undef. We check for this and exit with an error message.
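The three possible return values of fork() can be seen in a short sketch, assuming a UNIX system. The exit status of 7 is arbitrary and chosen only so that the parent has something recognizable to report; the variable names are illustrative.

```perl
use strict;
use warnings;
use POSIX ();    # for _exit()

my $status;
my $child = fork();
die "Can't fork: $!" unless defined $child;    # undef means fork() failed

if ($child) {
    # Parent: $child holds the child's PID, which is never zero.
    waitpid($child, 0);
    $status = $? >> 8;                         # high byte holds the exit status
    print "parent $$ reaped child $child with exit status $status\n";
} else {
    # Child: fork() returned 0.
    POSIX::_exit(7);                           # arbitrary status for illustration
}
```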

Lines 11–15: Parent process copies from standard input to socket. The rest of the script is divided into halves. One half is the parent process, and is responsible for reading lines from standard input and writing to the server; the other half is the child, which is responsible for reading lines from the server and writing them to standard output.

In the parent process, $child is nonzero. For the reasons described earlier, we set up a signal handler for the CHLD signal. This handler simply calls exit(). We then call the user_to_host() subroutine, which copies user data from standard input to the socket.

When standard input is closed, user_to_host() returns. We call the socket's shutdown() method, closing it for writing. Now we go to sleep indefinitely, awaiting the expected CHLD signal that will terminate the process.

Lines 16–19: Child process copies from socket to standard output. In the child process, we call host_to_user() to copy data from the socket to standard output. This subroutine will return when the remote host closes the socket. We don't do anything special after that except to warn that the remote host has closed the connection. We allow the script to exit normally and let the operating system generate the CHLD signal.

Lines 20–26: The user_to_host() subroutine. This subroutine is responsible for copying lines from standard input to the socket. Our loop reads a line from standard input, removes the newline, and then prints to the socket, appending a CRLF to the end. We return when standard input is closed.

Lines 27–34: The host_to_user() subroutine. This subroutine is almost the mirror image of the previous one. The only difference is that we set the $/ input record separator global to CRLF before reading from the socket. Notice that there's no reason to localize $/ in this case because changes made in the child process won't affect the parent. When we've read the last line from the socket, we return.
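Since Figure 5.8 is not reproduced as text here, the following sketch puts the pieces of the walkthrough together. It follows the description above but is not the book's listing, and its line numbers do not match. Wrapping the top-level code in main() and skipping the connection when no arguments are given are assumptions of this sketch, made so that the two copying subroutines can be loaded and tried separately.

```perl
#!/usr/bin/perl
# gab2-style sketch: the parent copies user input to the server while a
# forked child copies server output to the user.
use strict;
use warnings;
use IO::Socket;
use Socket qw(:DEFAULT :crlf);    # :crlf provides $CRLF ("\015\012")

sub main {
    my ($host, $port) = @_;
    die "Usage: gab2.pl host port\n" unless defined $host && defined $port;
    my $socket = IO::Socket::INET->new("$host:$port") or die $@;

    my $child = fork();
    die "Can't fork: $!" unless defined $child;

    if ($child) {
        # Parent: exit as soon as the child reports that it has finished.
        $SIG{CHLD} = sub { exit 0 };
        user_to_host($socket);
        $socket->shutdown(1);     # send EOF to the server; reading is unaffected
        sleep;                    # wait indefinitely for the CHLD signal
    } else {
        # Child: returns once the server closes its end of the connection.
        host_to_user($socket);
        warn "Connection closed by foreign host.\n";
    }
}

# Copy lines from standard input to the socket, ending each with CRLF.
sub user_to_host {
    my $s = shift;
    while (my $line = <STDIN>) {
        chomp $line;
        print {$s} $line, $CRLF;
    }
}

# Copy CRLF-terminated lines from the socket to the current output handle.
sub host_to_user {
    my $s = shift;
    $/ = $CRLF;                   # not localized: only the child calls this
    while (my $line = <$s>) {
        chomp $line;
        print $line, "\n";
    }
}

# Hypothetical convenience: connect only when arguments were supplied.
main(@ARGV) if @ARGV;
```

Invoked as gab2.pl phage.cshl.org ftp, a script along these lines would be expected to behave like the session shown later in this section, provided the server closes its end of the connection when it sees EOF.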

You may wonder why the parent goes to sleep rather than simply exiting after it has called shutdown() on its copy of the socket. The reason is purely esthetic. As soon as the parent exits, the user will see the command-line prompt reappear. However, the child may still be actively reading from the socket and writing to standard output. The child's output will intermingle in an ugly way with whatever the user is doing at the command line. By sleeping until the child exits, the parent avoids this behavior.

You may also wonder about the call to exit() in the CHLD signal handler. While this is a problematic construction on Windows platforms because it causes crashes, the sad fact is that the Windows port of Perl does not generate or receive CHLD signals when a child process dies, so this issue is moot. To terminate gab2.pl on Windows platforms, press the interrupt key.

When we try to connect to an FTP server using the revised script, the results are much more satisfactory. Multiline results now display properly, and there is no problem of synchronization or deadlocking.

    % gab2.pl phage.cshl.org ftp
    220 phage.cshl.org FTP server ready.
    USER anonymous
    331 Guest login ok, send your complete e-mail address as password.
    PASS ok@better.now
    230 Guest login ok, access restrictions apply.
    HELP
    214-The following commands are recognized (* =>'s unimplemented).
       USER    PORT    STOR    MSAM*   RNTO    NLST    MKD     CDUP
       PASS    PASV    APPE    MRSQ*   ABOR    SITE    XMKD    XCUP
       ACCT*   TYPE    MLFL*   MRCP*   DELE    SYST    RMD     STOU
       SMNT*   STRU    MAIL*   ALLO    CWD     STAT    XRMD    SIZE
       REIN*   MODE    MSND*   REST    XCWD    HELP    PWD     MDTM
       QUIT    RETR    MSOM*   RNFR    LIST    NOOP    XPWD
    214 Direct comments to ftp-bugs@phage.cshl.org
    QUIT
    221 Goodbye.
    Connection closed by foreign host.

This client is suitable for talking to many line-oriented servers, but there is one Internet service that you cannot successfully access via this client: the Telnet remote login service itself. This is because Telnet servers initially exchange some binary protocol information with the client before starting the conversation. If you attempt to use this client to connect to a Telnet port (port 23), you will just see some funny characters and then a pause as the server waits for the client to complete the protocol handshake. The Net::Telnet module (Chapter 6) provides a way to talk to Telnet servers.
