	Network Programming with Perl By Lincoln D. Stein

	Chapter 17. TCP Urgent Data

A Travesty Server

We now have all the ingredients necessary to write a client/server pair that does something useful with urgent data. This server implements "travesty," a Markov chain algorithm that analyzes a text document and generates a new document that preserves all the word-pair (tuple) frequencies of the original. The result is a completely incomprehensible document that has an eerie similarity to the writing style of the original. For example, here's an excerpt from the text generated after running the previous chapter through the travesty algorithm:

It initiates an EWOULDBLOCK error. The urgent data signal. This may be several such messages from different children. The parent will start a new document that explains the problem in %STATUS. Just before the urgent data is because this version can handle up to the EWOULDBLOCK error constant. The last two versions of interrupts now shows this pattern: Each time through, sysread () is called the "thundering herd" phenomenon . More seriously, however, some operating systems may not already at its maximum.

The results of running Ernest Hemingway through the wringer are similarly amusing. Oddly, James Joyce's later works seem to be entirely unaffected by this translation.

The client/server pair in this example divides the work in the classical manner. The client runs the user interface. It prompts the user for commands to load text files into the analyzer, generate the travesty, and reset the word frequency tables. The server does the heavy lifting, constructing the Markov model from uploaded files and generating travesties of arbitrary length.

TCP urgent data is useful in this application because it frequently takes longer for the server to analyze the word tuple frequencies in an uploaded text file than for the client to upload it. The user may wish to abort the upload midway, in which case the client must send the server an urgent signal to stop processing the file and to ignore all data sent from the time the user interrupted the process.

Conversely, once the tuple frequency tables are created, the server has the ability to generate travesty text far faster than the network can transfer it. We would like the user to be able to interrupt the incoming text stream, again by issuing an urgent data signal.

The client/server pair requires three external modules in addition to the standard ones: Sockatmark, which we have already seen; Text::Travesty, the travesty generator; and IO::Getline, the nonblocking replacement for Perl's getline() function, which we developed in Chapter 13 (Figure 13.2). In this case we won't be using IO::Getline for its nonblocking features, but for its ability to clear its internal line buffer when the flush() method is called.

The Text::Travesty Module

The travesty algorithm is encapsulated in a small module named Text::Travesty. Its source code listing is in Appendix A; it may also be available on CPAN. It is adapted from a small demo application that comes in the eg/ directory of the Perl distribution. Like other modules in this book, it is object-oriented. You start by creating a new Text::Travesty object with Text::Travesty->new():

 $t = Text::Travesty->new;

You then call add() one or more times to analyze the word tuple frequencies in a section of text:

 $t->add($text);

Once the text is analyzed, you can generate a travesty with calls to generate() or pretty_text():

 $travesty = $t->generate(1000);
 $wrapped  = $t->pretty_text(2000);

Both methods take a numeric argument that indicates the length of the generated travesty, measured in words. The difference between the two methods is that generate() creates unwrapped raw text, while pretty_text() invokes Perl's Text::Wrap module to create nicely indented and wrapped paragraphs.

The words() method returns the number of unique words in the frequency tables. reset() clears the tables and readies the object to receive fresh data to analyze:

 $word_count = $t->words;
 $t->reset;
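
To make the idea behind add() and generate() concrete, here is a toy word-pair Markov chain. It is a minimal sketch and not the Text::Travesty source: the package name ToyTravesty, its internal data layout, and the restart-on-dead-end rule are all assumptions made for illustration. Only the method names mirror the interface described above.

```perl
use strict;
use warnings;

# ToyTravesty is a hypothetical stand-in, NOT the Text::Travesty source.
package ToyTravesty;

sub new   { my $class = shift; return bless { next => {}, words => {} }, $class }
sub reset { my $self = shift; %{ $self->{next} } = (); %{ $self->{words} } = (); return }
sub words { my $self = shift; return scalar keys %{ $self->{words} } }

# Record, for every adjacent word pair, each word that followed it.
sub add {
    my ($self, $text) = @_;
    my @w = split ' ', $text;
    $self->{words}{$_}++ for @w;
    for my $i (0 .. $#w - 2) {
        push @{ $self->{next}{"$w[$i] $w[$i+1]"} }, $w[$i + 2];
    }
    return;
}

# Walk the chain: pick a random successor of the current pair, then slide.
sub generate {
    my ($self, $count) = @_;
    my @pairs = keys %{ $self->{next} };
    return '' unless @pairs;
    my @out;
    my ($w1, $w2) = split ' ', $pairs[rand @pairs];
    while (@out < $count) {
        my $succ = $self->{next}{"$w1 $w2"};
        unless ($succ) {                       # dead end: restart from a random pair
            ($w1, $w2) = split ' ', $pairs[rand @pairs];
            next;
        }
        my $w3 = $succ->[rand @$succ];
        push @out, $w3;
        ($w1, $w2) = ($w2, $w3);
    }
    return join ' ', @out;
}

package main;

my $t = ToyTravesty->new;
$t->add("the cat sat on the mat and the cat ran to the mat");
my $travesty = $t->generate(20);
print "$travesty\n";
```

Because a pair that occurs often contributes more successors to its list, frequent word sequences are chosen more often, which is what gives the output its resemblance to the source text.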

The Travesty Server Design

In addition to its main purpose of showing the handling of urgent data, the travesty server illustrates a number of common design motifs in client/server communications.

The server is line oriented. After receiving an incoming connection, it issues a welcome banner and then enters a loop in which it reads a line from the socket, parses the command, and takes the appropriate action. The following commands are recognized:

DATA Prepare to receive text data to analyze. The server reads an indefinite amount of incoming information, terminating when it sees a dot (".") on a line by itself. The text is passed to Text::Travesty to construct frequency tables.

RESET Reset the travesty frequency tables to empty.

GENERATE <word_count> Generate a travesty from the stored frequency tables. word_count is a positive integer specifying the length of the travesty to generate. The server indicates the end of the travesty by sending a dot on a line by itself.

BYE Terminate the connection.
The server responds to each command by sending a response line like the following:
```
 205 Travesty reset 
```
The initial three-digit result code is what the client pays attention to. The human-readable text is designed for remote debugging.

As an additional aid to debugging, the server uses CRLF pairs to terminate all incoming commands and outgoing responses. This makes the server compatible with Telnet and other common network clients.
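
The line format described above can be picked apart with a couple of small helpers. This is a sketch only: parse_response() and parse_command() are hypothetical names, and upper-casing the verb is an assumption for illustration, since the text does not say whether the server treats commands case-insensitively.

```perl
use strict;
use warnings;

my $CRLF = "\015\012";

# Split a server response such as "205 Travesty reset\r\n" into code and text.
sub parse_response {
    my $line = shift;
    $line =~ s/$CRLF\z//;
    return $line =~ /^(\d{3})\s+(.*?)\s*$/ ? ($1, $2) : ();
}

# Split a command line into an upper-cased verb and an optional argument.
sub parse_command {
    my $line = shift;
    $line =~ s/$CRLF\z//;
    my ($verb, $arg) = $line =~ /^(\w+)\s*(.*)$/ or return;
    return (uc $verb, $arg);
}

my @response = parse_response("205 Travesty reset$CRLF");
my @command  = parse_command("GENERATE 100$CRLF");
print "code=$response[0] text=$response[1]\n";
print "verb=$command[0] arg=$command[1]\n";
```

Keeping the numeric code separate from the text means a client can branch on the code alone and pass the rest through to the user unchanged.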

Figure 17.6 lists the server application.

Figure 17.6. Travesty server

Lines 1-12: Load modules and initialize signal handlers. The travesty server follows the familiar accept-and-fork architecture. In addition to the usual networking packages, we load Fcntl to get access to the F_SETOWN constant, along with the Text::Travesty, IO::Getline, and Sockatmark modules. Recall that the last of these adds the atmark() method to the IO::Socket class. We also define a constant, DEBUG, which enables debugging messages, and a global to hold the IO::Getline object.

After loading the required modules, we set up two signal handlers. The CHLD handler is the usual one used in accept-and-fork servers. We initially tell Perl to ignore URG signals. We'll reenable them in the places where they have meaning, during the uploading and downloading of large data streams.

Lines 13-26: Create listening socket and enter accept loop. The server creates a listening socket and enters its accept() loop. Each incoming connection spawns a child that runs the handle_connection() subroutine. After handle_connection() terminates, the child dies.

Lines 27-49: The handle_connection() subroutine. handle_connection() is responsible for managing the Text::Travesty object, reading client commands from the socket, and handing the command off to the appropriate subroutine. We begin by calling fcntl() to set the owner of the socket so that the process can receive urgent signals. If this is successful, we set the line termination character to the CRLF pair using local to dynamically scope the change in the $/ global variable to the current block and all subroutines it invokes.
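
The two setup steps just described, claiming ownership of the socket and localizing $/, can be tried in isolation. The sketch below is not the Figure 17.6 code: it uses a local socketpair() as a stand-in for an accepted TCP connection (urgent data itself is a TCP feature and is not exercised here), and read_command() is a hypothetical helper name.

```perl
use strict;
use warnings;
use Fcntl;       # exports F_SETOWN and F_GETOWN
use Socket;      # exports AF_UNIX, SOCK_STREAM, PF_UNSPEC

# A local socket pair stands in for an accepted TCP connection.
socketpair(my $sock, my $peer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# Make this process the owner of the socket, so that on a TCP socket
# the kernel would deliver SIGURG to it.
fcntl($sock, F_SETOWN, $$) or die "Can't set owner: $!";
my $owner = fcntl($sock, F_GETOWN, 0);

# Read one CRLF-terminated command. The change to $/ is dynamically
# scoped: it is undone automatically when the subroutine returns.
sub read_command {
    my $fh = shift;
    local $/ = "\015\012";
    my $line = <$fh>;
    chomp $line if defined $line;    # chomp removes the whole CRLF pair
    return $line;
}

syswrite($peer, "RESET\015\012");
my $cmd = read_command($sock);
print "owner=$owner command=$cmd\n";
```

Because local saves and restores the previous value, code outside the subroutine continues to see the default newline terminator.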

We now create a new Text::Travesty object and an IO::Getline wrapper for the socket. Recall from Chapter 13 that IO::Getline has nonblocking behavior by default. In this application, we don't use its nonblocking features, so we turn blocking back on after creating the wrapper. The IO::Getline wrapper is global to the package so as to allow the URG handler to find it; since this server uses a different process to service each incoming connection, this use of a global won't cause problems.

Having finished our initialization, we write our welcome banner to the client, using result code 200. Notice that the IO::Getline module accepts all the object methods of IO::Socket, including syswrite(). This makes the code easier to read than would calling the getline object's handle() method each time to recover the underlying socket.

The remainder of the handle_connection() code is the command-processing loop. Each time through the loop, we read a line, parse it, and take the appropriate action. The BYE command is handled directly in the loop, and the others are passed to an appropriate subroutine. If a command isn't recognized, the server issues a 500 error.

Lines 50-65: The analyze_file() subroutine. The analyze_file() subroutine processes uploaded data. It accepts a Text::Travesty object, reinitializes it by calling its reset() method, and then transmits a 201 message, which prompts the remote host to upload some text data.

We're now going to accept uploaded data from the client by calling $gl->getline() repeatedly until we encounter a line consisting of a dot, or until we are interrupted by an URG signal.

To terminate the loop cleanly, we wrap it inside an eval{} block and create an URG handler that is local to the block. If an urgent signal comes in, the handler calls the subroutine do_urgent() and then dies. Because die() is called within an eval{}, its effect is to terminate the eval{} block and continue execution at the first statement after the eval{}.

Before exiting, we transmit a code 202 message giving the number of unique words we processed, regardless of whether the upload was interrupted. Notice that we treat interrupted file transfers just as if the uploaded file ended early. We leave the travesty generator in whatever state it happened to be in when the URG signal was received. Because the travesty generator is not affected by the analysis of a partial file, this causes no harm and might be construed as a feature. Another application might want to reset itself to a known state.
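
The eval{} and local handler pattern can be demonstrated without a network connection. In this sketch, which is an illustration and not the analyze_file() code, the script sends itself an URG signal with kill() to simulate urgent data arriving partway through an upload; the array of lines and the variable names are assumptions.

```perl
use strict;
use warnings;

my @lines = map { "line $_" } 1 .. 10;    # stand-in for uploaded text
my (@seen, $interrupted);

eval {
    # The handler exists only inside this block; die() unwinds to the eval.
    local $SIG{URG} = sub { $interrupted = 1; die "urgent\n" };
    for my $line (@lines) {
        last if $line eq '.';
        push @seen, $line;
        kill URG => $$ if @seen == 3;     # simulate urgent data mid-upload
    }
};
die $@ if $@ && $@ ne "urgent\n";         # re-throw anything unexpected

printf "processed %d lines%s\n", scalar @seen,
    $interrupted ? ' (interrupted)' : '';
```

Execution resumes at the statement after the eval{}, so the code that reports how much was processed runs whether or not the loop was interrupted.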

Lines 66-88: The make_travesty() subroutine. The make_travesty() subroutine is responsible for generating the travesty text and transmitting it to the client. Its arguments are the Text::Travesty object and the size of the travesty to generate. We first check that the travesty object is not empty; if it is, we return with an error message. Otherwise, we transmit a code 203 message indicating that the travesty text will follow.

We're going to transmit the mangled text now. As in the previous subroutine, we enter an I/O loop wrapped in an eval{}, and again install a local URG handler that runs do_urgent() and dies. If the socket enters urgent mode, our download loop is terminated immediately. This time, however, our URG handler also sets a local variable named $abort to true. The loop calls the travesty object's pretty_text() method to generate up to 500 words, replaces newline characters with the CRLF sequence, and writes out the resulting text. At the end of the loop, we transmit a lone dot.

If the transmission was aborted, we must tell the client to discard data left in the socket stream. We do this by sending an urgent data byte back to the client using this idiom:

 if ($abort) {
   warn "make_travesty() aborted\n" if DEBUG;
   $gl->send('!',MSG_OOB);
 }
 }

Again, notice that the send() method is passed by IO::Getline to the underlying IO::Socket object.
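
The chunked transmission loop might look something like the following. This is a sketch under stated assumptions and not the make_travesty() code: send_travesty() and to_wire() are hypothetical names, the $next_chunk callback stands in for the travesty object's pretty_text() method, and an in-memory filehandle replaces the socket. The URG handling and the abort path are left out.

```perl
use strict;
use warnings;

my $CRLF = "\015\012";

# Turn locally formatted text into wire format with CRLF line endings.
sub to_wire {
    my $text = shift;
    $text =~ s/\r?\n/$CRLF/g;
    return $text;
}

# Send $total words in chunks of at most 500, then finish with a lone dot.
# $next_chunk is a hypothetical callback standing in for pretty_text().
sub send_travesty {
    my ($out, $total, $next_chunk) = @_;
    while ($total > 0) {
        my $words = $total > 500 ? 500 : $total;
        print {$out} to_wire($next_chunk->($words));
        $total -= $words;
    }
    print {$out} ".$CRLF";
    return;
}

# An in-memory filehandle stands in for the client socket.
my $buffer = '';
open my $out, '>', \$buffer or die "open: $!";
my @requests;
send_travesty($out, 1200,
    sub { push @requests, $_[0]; return "chunk of $_[0] words\n" });
close $out;
print "requested chunks: @requests\n";
```

Generating in modest chunks keeps the loop responsive: an interrupt can take effect between chunks instead of after the whole travesty has been produced.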

Lines 89-93: The reset_travesty() subroutine. reset_travesty() calls the travesty object's reset() method and transmits a message acknowledging that the word frequency tables have been cleared.

Lines 94-108: The do_urgent() signal handler. do_urgent() is the signal handler responsible for emptying the internal read buffer when an urgent data byte is received. We recover the socket from the global IO::Getline object and invoke sysread() in a tight loop until the socket's atmark() method returns true. This discards any and all data up to the urgent byte.

We then invoke recv() to read the urgent data itself. The exact contents of the urgent data have no particular meaning to this application, so we ignore it. When this is done, we clear out any of the remaining data in the IO::Getline object's internal buffer by calling its flush() method. The end result of these manipulations is that all unread data transmitted up to and including the urgent data byte is discarded.
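
The discard-to-mark sequence can be exercised over a loopback TCP connection. The sketch below is not the do_urgent() code from the figure: drain_to_mark() is a hypothetical name, it assumes a Perl whose IO::Socket already provides atmark() (otherwise Sockatmark or IO::Sockatmark supplies it), it waits for the urgent data with select() instead of an URG signal, and it omits the IO::Getline flush() step. It also assumes loopback networking is available.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;
use Socket qw(MSG_OOB);

# Throw away normal data up to the urgent mark, then read the urgent byte.
sub drain_to_mark {
    my $sock = shift;
    my $discarded = 0;
    until ($sock->atmark) {
        my $n = sysread($sock, my $junk, 1024);
        last unless $n;                    # EOF or error: stop looping
        $discarded += $n;
    }
    $sock->recv(my $oob, 1, MSG_OOB);      # the byte's value carries no meaning
    return ($discarded, $oob);
}

# Loopback demonstration: one process plays both ends of the connection.
my $listen = IO::Socket::INET->new(
    Listen => 1, LocalAddr => '127.0.0.1', LocalPort => 0, Proto => 'tcp',
) or die "listen: $!";
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1', PeerPort => $listen->sockport, Proto => 'tcp',
) or die "connect: $!";
my $server = $listen->accept or die "accept: $!";

$client->send("stale data\015\012");       # data the receiver should discard
$client->send('!', MSG_OOB);               # the urgent byte
$client->send("fresh\015\012");            # data sent after the interruption

# Urgent data shows up as an "exceptional condition" in select().
my @urgent = IO::Select->new($server)->has_exception(5);
die "no urgent data seen" unless @urgent;

my ($discarded, $oob) = drain_to_mark($server);
sysread($server, my $rest, 1024);
print "discarded $discarded bytes, urgent byte '$oob'\n";
```

Data sent after the urgent byte stays in the stream and is read normally once the mark has been passed.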

The Travesty Client

Now we look at the client (Figure 17.7). It is slightly more complex than the server because it has to receive commands from the user, forward them to the server, and interpret the server's status codes appropriately.

Figure 17.7. Travesty client

Lines 1-9: Load modules. We turn on strict checking (use strict) and load the required networking modules, including the Sockatmark module developed in this chapter. We also make STDOUT nonbuffered so that the user's command prompt appears immediately.

Lines 10-12: Set up globals. The $HOST and $PORT globals contain the remote hostname and port number to use. If not provided on the command line, they default to reasonable values. Two other globals are used by the script. $gl contains the IO::Getline object that wraps the connected socket, and $quit_now contains a flag that indicates that the program should exit. Both are global so that they can be accessed by signal handlers.

Lines 13-15: Set up default signal handlers. We set up some signal handlers. The QUIT signal, ordinarily generated from the keyboard by ^\, is used to terminate the program. INT, however, is a bit more interesting. Each time the handler executes, it increments the $quit_now global by one. If the variable reaches 2 or higher, the program exits. Otherwise, the handler prints "Press ^C again to exit." The result is that to terminate the program, the user must press the interrupt key twice without intervening commands. This prevents the user from quitting the program when she intended to interrupt output. The URG handler is set to run the do_urgent() subroutine, which we will examine later.
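
The press-twice-to-quit behavior is easy to see in a few lines. In this sketch the script interrupts itself with kill(), and the handler records what it would do instead of printing and calling exit(); the @log array is an assumption made so the sequence can be inspected.

```perl
use strict;
use warnings;

my $quit_now = 0;
my @log;

# Stand-in for the client's INT handler. It records its decision; the
# real handler would print the warning or exit the program.
$SIG{INT} = sub {
    if (++$quit_now >= 2) { push @log, 'exit'; return }
    push @log, 'Press ^C again to exit';
};

kill INT => $$;     # first ^C: warning only
kill INT => $$;     # second ^C with no command in between: exit
$quit_now = 0;      # what the command loop does after each command
kill INT => $$;     # the counter was reset, so this counts as a first press

print "$_\n" for @log;
```

Resetting the counter after every command is what restricts the exit to two consecutive interrupts.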

Lines 16-18: Create connected socket. We try to create an IO::Socket handle connected to the remote host. If successful, we use fcntl() to set the socket's owner to the current process ID so that we receive URG signals.

Lines 19-22: Create IO::Getline wrapper. We create a new IO::Getline wrapper on the socket, turn blocking behavior back on, and immediately look for the welcome banner from the host by pattern matching for the 200 result code. If no result code is present, we die with an appropriate error message.

Lines 23-36: Command loop. We now enter the program's main command loop. Each time through the loop, we print a command prompt (">") and read a line of user input from standard input. We parse the command and call the appropriate subroutine. User commands are:

analyze - Upload and analyze a text file
generate NNNN - Generate NNNN words of travesty
reset - Reset frequency tables
bye - Quit the program
goodbye - Quit the program

The command loop's continue{} block sets $quit_now to 0, resetting the global INT counter.

Lines 37-60: The do_analyze() subroutine. The do_analyze() subroutine is called to upload a text file to the server for analysis. The subroutine receives a file path as its argument and tries to open it using IO::File. If the file can't be opened, we issue a warning and return. Otherwise, we send the server the DATA command and read the response line. If the response matches the expected 201 result code, we proceed. Otherwise, we echo the response to standard error and return.

We now begin to upload the text file to the server. As in the server code, the upload is done in an eval{} block, but in this case it is the INT signal that we catch. Before entering the block, we set a local variable $abort to false. Within the block we create a local INT handler that prints a warning, sets $abort to true, and dies, causing the eval{} block to terminate. By declaring the handler local, we temporarily replace the original INT handler, and restore it automatically when the eval{} block is finished. Within the block itself we read from the text file one line at a time and send each line to the server. When the file is finished, we send the server a "." character.

After finishing the loop, we check the $abort variable. If it is true, then the transfer was interrupted prematurely when the user hit the interrupt key. We need to alert the server to this fact so that it can ignore any data that we've sent it that it hasn't processed yet. This is done by sending the server 1 byte of urgent data.

The last step is to read the response line from the server and print the number of unique words successfully processed.

Lines 61-67: Handle the reset and bye commands. The do_reset() subroutine sends a RESET command to the server and checks the result code. do_bye() sends a BYE command to the server, but in this case does not check the result code because the program is about to exit anyway.

Lines 68-90: The do_get() subroutine. The do_get() subroutine is called when the user chooses to generate a travesty from a previously uploaded file. We receive an argument consisting of the number of words of travesty to generate, which we pass on to the server in the form of a GENERATE command. We then read the response from the server and proceed only if it is the expected 203 "travesty follows" code.

We are now ready to read the travesty from the server. The logic is similar to the do_analyze() subroutine. We set the local variable $abort to a false value and enter a loop that is wrapped in an eval{}. For the duration of the loop, the default INT handler is replaced with one that increments $abort and dies, terminating the eval{} block. The loop accepts lines from the server, removes the CRLF pairs with chomp(), and prints them to standard output with proper newlines. The loop terminates normally when it encounters a line consisting of one dot.

After the loop is done, we check the $abort variable for abnormal termination. If it is set to a true value, then we send the server an urgent data byte, telling it to stop transmission. Recall that this also results in the server sending back an urgent data byte to indicate the point at which transmission was halted.
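
The normal, uninterrupted path through the download loop can be sketched as follows. This is an illustration, not the do_get() code: read_until_dot() is a hypothetical name, the INT handling is omitted, and an in-memory filehandle stands in for the connected socket.

```perl
use strict;
use warnings;

# Read lines up to, but not including, a lone dot; return them without CRLF.
sub read_until_dot {
    my $fh = shift;
    local $/ = "\015\012";            # chomp() now strips the CRLF pair
    my @lines;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        last if $line eq '.';
        push @lines, $line;
    }
    return @lines;
}

# An in-memory filehandle stands in for the connected socket.
my $wire = join '', map { "$_\015\012" }
    'It initiates an error.', 'The urgent data signal.', '.',
    '205 Travesty reset';
open my $fh, '<', \$wire or die "open: $!";

my @body = read_until_dot($fh);
print "$_\n" for @body;               # printed with local newlines
```

Anything that follows the lone dot, such as the response to the next command, is left unread in the stream.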

Lines 91-104: The do_urgent() subroutine. The do_urgent() subroutine handles URG signals and is identical to the subroutine of the same name in the server. It discards everything in the socket up to and including the urgent data byte and resets the contents of the IO::Getline object.

Lines 105-113: Print the program usage. print_usage() provides a terse command summary that is displayed whenever the user types an unrecognized command.

Testing the Travesty Server

To test the travesty client/server, I launched the server on one machine and the client on another, in both cases leaving the DEBUG constant true so that I could see debugging messages.

For the first test, I uploaded the file ch17.txt with the analyze command and waited for the upload to complete. I then issued the command generate 100 in order to generate 100 words of travesty:

 % trav_cli.pl prego.lsjs.org
 > analyze /home/lstein/docs/ch17.txt
 analyzing...processed 2658 words
 > generate 100
 Summary This will be blocked in flock() until the process receives
 a signal to the top of the preforking server that you can provide
 an optional timeout value to return if no events occur within a
 designated period. The handles() method returns a nonempty list.
 At the very top of the program simply terminates with an error
 message to the named pipe (also known as an "event"). Each child
 process IDs. Its keys are the children. Only one of its termination.
 However in an EWOULDBLOCK error. The urgent data containing the
 character "!"

The next step was to test that I could interrupt uploads. I ran the analyze command again, but this time hit the interrupt key before the analysis was complete:

 > analyze /home/lstein/docs/ch17.txt
 analyzing...interrupted!...processed 879 words

The message indicates that only 879 of 2,658 unique words were processed this time, confirming that the upload was aborted prematurely. Meanwhile, on the server's side of the connection, the server's do_urgent() URG handler emitted the following debug messages as it discarded all data through to the urgent pointer:

 command = DATA
 discarding 1024 bytes
 discarding 1024 bytes
 discarding 1024 bytes
 discarding 1024 bytes
 discarding 531 bytes
 reading 1 byte of urgent data

The final test was to confirm that I could interrupt travesty generation. I issued the command generate 20000 to generate a very long 20,000-word travesty, then hit the interrupt key as soon as text started to appear.

 > reset
 reset successful
 > analyze /home/lstein/docs/ch17.txt
 analyzing...processed 2658 words
 > generate 20000
 to the segment has already been created by a series of possible
 .ph file paths. If none succeeds, it dies: Figure 7.4: This
 preforking server won't actually close it until all the data bound
 for the status hash, and DEBUG is a simple solution is to copy the
 contents of the socket or STDIN will be inhibited until the process
 receives an INT or TERM signal handlers are parent-specific. So we
 don't want to do this while there is significant complexity lurking
 under the surface. TCP urgent data. Otherwise the [interrupted]
 discarding 1024 bytes of data
 discarding 1024 bytes of data
 discarding 855 bytes of data
 reading 1 byte of urgent data

As expected, the transmission was interrupted and the client's URG signal handler printed out a series of debug messages as it discarded data leading up to the server's urgent data.

The IO::Sockatmark Module

Because of the difficulty in using h2ph to generate the .ph files required by the Sockatmark.pm module, I have recently written a C-language extension module named IO::Sockatmark. It is available on CPAN and on this book's companion site. If you encounter problems getting the pure-Perl version of Sockatmark.pm to work, I suggest you replace it with IO::Sockatmark. You will need a C compiler to do this.

The modifications required to use IO::Sockatmark in the "travesty" code examples are very minor. In Figures 17.6 and 17.7, simply change:

 use Sockatmark;

to:

 use IO::Sockatmark;