It's easiest to understand how a preforked server works by contrasting it with an accept-and-fork server. As you recall from Chapter 6, accept-and-fork servers spend most of their time blocking in accept(), waiting for a new incoming connection. When the connection comes in, the parent wakes up just long enough to call fork() and pass the connected socket to its child. After forking, the child process goes on to handle the connection, while the parent process goes back to waiting in accept(). The core of an accept-and-fork server is these lines of code:

    while (my $c = $socket->accept) {
        my $child = fork;
        die unless defined $child;
        if ($child == 0) {       # in child process
            handle_connection($c);
            exit 0;
        }
        close $c;                # in parent process
    }

This technique works well under typical conditions, but it can be a problem for heavily loaded servers. Here, connections come in so rapidly that the overhead of the fork() call has a noticeable impact, and the server may not be able to keep up with incoming connections. This is particularly the case for Web server applications, which process many short requests that arrive in rapid-fire succession.

A common solution to this problem is a technique called preforking. As the name implies, preforking servers fork() themselves multiple times soon after launch. Each forked child calls accept() individually, handles the incoming connection completely, and then goes back to waiting on accept(). Each child may continue to run indefinitely or may exit after processing a predetermined number of requests. The original parent process, meanwhile, acts as a supervisor for the whole process, forking off new children when old ones die and shutting down all the children when the time comes to terminate. At its heart, a preforking server looks like this:

    for (1..PREFORK_CHILDREN) {
        next if fork;            # parent process
        do_child($socket);       # child process
        exit 0;                  # child never loops
    }

    sub do_child {
        my $socket = shift;
        my $connection_count = 0;
        while (my $c = $socket->accept) {
            handle_connection($c);
            close $c;
            # optionally exit after a preset number of requests,
            # as described above:
            last if ++$connection_count >= MAX_REQUEST;
        }
    }

The main loop forks a number of children, passing the listening socket to each one. Each child process calls accept() on the socket and handles the connection. That's it in a nutshell, but many details make implementing a preforking server more complex than this. The parent process has to wait on its children and launch new ones when they die; it has to shut down its children gracefully when the time comes to terminate; and signal handlers must be written carefully so that signals intended for the parent don't get handled by the children and vice versa. The server gets more complicated still if you want it to adapt itself dynamically to the network, maintaining fewer children when incoming traffic is light and more children when the traffic is heavy. The next sections take you through the evolution of a preforking server from a simple but functional version to a reasonably complex beast.

A Web Server

For the purposes of illustration, we will write a series of Web servers. These servers will respond to requests for static files only and recognize only a handful of file extensions. Although limited, the final product will be a fully functional server that you can communicate with through any standard Web browser. Each version of the server contains a few subroutines that handle the interaction with the client by implementing a portion of the HTTP core protocol. Since they're invariant, we'll put these subroutines together into a module called Web.
We discussed the HTTP protocol from the client's point of view in Chapters 9 and 12. When a browser connects to the server, it sends an HTTP request consisting of a request method (typically "GET") and the URL it wishes to fetch. This may be followed by optional header fields; the whole request is then terminated by two carriage-return/linefeed (CRLF) pairs. The server reads the request and translates the URL into the path to a physical file somewhere on the filesystem. If the file exists and the client is allowed to fetch it, then the server sends a brief header followed by the file contents. The header begins with a numeric status code indicating the success or failure of the request, followed by optional fields describing the nature of the document that follows. The header is separated from the file contents by another pair of CRLF sequences. The HEAD request is treated in a similar fashion, but instead of returning the entire document, the server returns just the header information.
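For concreteness, here is roughly what a minimal exchange looks like on the wire. Each line ends in a CRLF pair, and the blank lines are the extra CRLFs that terminate the request and the header (the exact headers and byte count here are invented for illustration):

    GET /index.html HTTP/1.0

    HTTP/1.0 200 OK
    Content-length: 1024
    Content-type: text/html

    ...1,024 bytes of file contents...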
Figure 15.1 lists the Web module.

Figure 15.1. Core Web server routines

Lines 1-8: Module setup The module declares the handle_connection() and docroot() functions for export. The former is the main entry point for Web transaction handling. The latter is used to set the location of the "document root," the physical directory that corresponds to the URL "/".

Lines 9-10: Declare global variables Our only global variable is $DOCUMENT_ROOT, which contains the path to the physical directory that corresponds to the topmost URL at the site. All files served by the Web server will reside under this directory. We default to /home/www/htdocs, but your script can call docroot() to change this location. Like many line-oriented network protocols, HTTP terminates its lines with the CRLF sequence. For readability, we define a $CRLF global that contains the correct character sequence.

Lines 11-32: The handle_connection() subroutine Most of the work happens in handle_connection(), which takes a connected socket as its argument and handles the entire HTTP transaction. The first part of the subroutine reads the request by setting the line-end character ($/) to "$CRLF$CRLF" and invoking the <> operator.

Lines 16-19: Process request The next section processes the request. It attempts first to parse out the topmost line and extract the requested URL. If the request method isn't GET or HEAD, or if the protocol the browser is using isn't HTTP/1.0 or HTTP/1.1, then the function sends an error message to the browser by calling a subroutine named invalid_request() and returns. Otherwise, it calls the lookup_file() subroutine to try to open the requested file for reading. If lookup_file() is successful, it returns a three-element list that contains an open filehandle, the type of the file, and its length. Otherwise, it returns an empty list, and handle_connection() calls not_found() to send an appropriate error message to the browser.

Another exceptional condition that the subroutine needs to deal with is the case of the browser requesting a URL that ends in a directory name rather than a filename. Such URLs must end with a slash, or else relative links in HTML documents, such as ../service_info.html, won't work correctly. If the browser requests a URL that ends in a directory and the URL has no terminating slash, then lookup_file() reports this case by returning a file type of "directory." In this eventuality, the server calls a function named redirect() to tell the browser to reissue its request using the URL with a slash appended.

Lines 20-24: Print header If the requested document was opened successfully, handle_connection() produces a simple HTTP header by sending a status line with a result code of 200, followed by headers indicating the length and type of the document. This is terminated by a CRLF pair. A real Web server would send other information as well, such as the name of the server software, the current date and time, and the modification time of the requested file.

Lines 25-32: If the request was HEAD, then we're finished and we exit from the routine. Otherwise, we copy the contents of the filehandle to the socket using a tight while() loop. When the entire file has been copied to the socket, we close its filehandle and return.

Lines 33-48: The lookup_file() subroutine The lookup_file() subroutine is responsible for translating a requested URL into a physical file path, gathering some information about the selected file, and opening it, if possible. The subroutine is also responsible for making sure that the browser doesn't try to play malicious tricks with the URL, such as incorporating double dots into the path in order to move into a part of the filesystem that it doesn't have permission to access.

Lines 35-39: Process URL lookup_file() begins by turning the URL into a physical path by prepending the contents of $DOCUMENT_ROOT to the URL. We then do some cleanup on the URL. For example, the path may contain a query string (a "?" followed by text) and possibly an HTML fragment (a "#" followed by text). We strip out this information. The path may terminate with a slash, indicating that it is a directory. In this case, we append index.html to the end of the path in order to retrieve the automatic "welcome page." The last bit of path cleanup is to prevent the remote user from tricking us into retrieving files outside the document root space by inserting relative path elements (such as "..") into the URL. We defeat this by refusing to process paths that contain relative elements.

Line 40: Handle directory requests Now we need to deal with requests for paths that end in directory names (without the terminating slash). In this case, we must alert the caller of the fact so that it can generate a redirect. We apply the -d directory test operator to the path; if the operator returns true, we return a phony document type of "directory" to the caller.

Lines 41-45: Determine MIME type and size of document The next part of the subroutine determines the MIME type of the requested document. A real Web server would have a long lookup table of file extensions; we look for HTML, GIF, and JPEG files only and default to text/plain for anything else. The routine now retrieves the size of the requested file in bytes by calling stat(). Perl already called stat() internally when it processed the -d file test, so there isn't any reason to repeat the system call. The idiom stat(_) retrieves the buffered status information from that earlier invocation, saving a small amount of CPU time. The file may not exist, in which case stat() returns undef.

Lines 46-48: Open document The last step is to open the file by calling IO::File->new(). There is another hidden trap here if the remote user includes shell metacharacters (such as ">" or "|") in the URL. Instead of calling new() with a single argument, which would pass these metacharacters to the shell for processing, we call new() with two arguments: the filename and the file mode ("<" for read). This inhibits metacharacter processing and avoids our inadvertently launching a subprocess or clobbering a file if we're passed a maliciously crafted URL. If new() fails, we return undef. Otherwise, the function returns a three-element list of the open filehandle, the file type, and the file length.
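To make the flow of lookup_file() concrete, here is a minimal sketch along the lines just described. It assumes the $DOCUMENT_ROOT global and that IO::File is loaded; the actual code in Figure 15.1 may differ in its details:

    sub lookup_file {
        my $url  = shift;
        my $path = $DOCUMENT_ROOT . $url;
        $path =~ s/\?.*$//;                       # strip the query string
        $path =~ s/\#.*$//;                       # strip the HTML fragment
        $path .= 'index.html' if $path =~ m!/$!;  # directory URL: add welcome page
        return if $path =~ m!\.\./!;              # refuse relative path tricks
        return (undef, 'directory') if -d $path;  # caller will issue a redirect
        my $type = 'text/plain';                  # default MIME type
        $type = 'text/html'  if $path =~ /\.html?$/i;
        $type = 'image/gif'  if $path =~ /\.gif$/i;
        $type = 'image/jpeg' if $path =~ /\.jpe?g$/i;
        my $length = (stat(_))[7];                # reuse status buffered by -d
        my $fh = IO::File->new($path, '<') or return;  # two-arg new: no shell
        return ($fh, $type, $length);
    }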
Lines 49-66: The redirect() function The redirect() function is responsible for sending a redirection message to the browser. It's called when the browser asks for a URL that ends in a directory name but has no terminal slash. The ultimate goal of the function is to transmit a document like this one:

    HTTP/1.0 301 Moved permanently
    Location: http://192.168.2.1:8080/service_records/
    Content-type: text/html

    <HTML>
    <HEAD><TITLE>301 Moved</TITLE></HEAD>
    <BODY><H1>Moved</H1>
    <P>The requested document has moved
    <A HREF="http://192.168.2.1:8080/service_records/">here</A>.</P>
    </BODY>
    </HTML>

The important parts of the document are the status code, 301 for "moved permanently," and the Location field, which gives the full URL where the document can be found. The remainder of the document produces a human-readable page for the benefit of some (extremely old) browsers that don't recognize the redirect command. The logic of redirect() is very straightforward. We recover the IP address of the server host and the listening port by calling the connected socket's sockhost() and sockport() methods. We then generate an appropriate document based on these values. This version of redirect() suffers the minor esthetic deficiency of replacing the name of the server host with its dotted IP address. You could fix this by calling gethostbyaddr() (Chapter 3) to turn this address into a hostname, probably caching the result in a global for performance.

Lines 67-93: The invalid_request() and not_found() subroutines The invalid_request() and not_found() functions are very similar. invalid_request() returns a status code of 400, which is the blanket code for "bad request." This is followed by a little HTML document that explains the problem in human-readable terms. not_found() is similar but has a status code of 404, used when the requested document is not available.

Lines 94-98: The docroot() subroutine The docroot() subroutine either returns the current value of $DOCUMENT_ROOT or changes it if an argument is provided.

Serial Web Server

This first version of the Web server is very simple (Figure 15.2). It consists of a single accept() loop that handles requests serially.

Figure 15.2. The baseline server handles requests serially

I used this "baseline" server to verify that the Web module was working properly. After creating the socket, the server enters an accept() loop. Each time through the loop it calls the Web module's handle_connection() to handle the request. If you run this server and point your favorite Web browser at port 8080 of the host, you'll see that it is perfectly capable of fetching HTML files and following links. However, pages with multiple inline images will be slow to display, because the browser tries to open a new connection for each image but the Web server can handle connections only in a serial fashion.

Accept-and-Fork Web Server

The next step up in complexity is a conventional forking server (Figure 15.3). This version uses the Daemon module developed in Chapter 14 to do some of the common tasks of a network daemon, including autobackgrounding, writing its PID into a file, and rerouting warn() and die() so that error messages appear in the system log.
The Daemon module also automatically installs a CHLD signal handler so that we don't have to worry about reaping terminated children.

Figure 15.3. A forking Web server

Daemon won't work on Win32 systems because it makes various UNIX-specific calls. Appendix A lists a simple DaemonDebug module, which has the same interface calls as Daemon but doesn't autobackground, open the syslog, or make other UNIX-specific calls. Instead, the process remains in the foreground and writes its error and debugging messages to standard error. In the following code examples, just replace "Daemon" with "DaemonDebug" and everything should work fine on Win32 systems. You might do this on UNIX systems as well if you want the server to remain in the foreground or you are having problems getting the Sys::Syslog module to work.

We've looked at accept-and-fork servers before, but we do things a bit differently in this one, so we'll step through it.

Lines 1-7: Load modules We load the standard IO::* modules, Daemon, and Web. The latter two modules must be installed in the current directory or somewhere else in your Perl @INC path.

Line 8: Define constants We choose a filename for the PID file used by Daemon. After autobackgrounding, this file will contain the PID of the server process.

Line 9: Declare globals The $DONE global variable is used to flag the main loop to exit.

Line 10: Install signal handlers We create a handler for INT and TERM that bumps up the $DONE variable, causing the main loop to exit. During initialization, Daemon installs a CHLD handler as well.

Lines 11-14: Create listening socket We create a listening IO::Socket::INET object in the usual way.

Line 15: Create IO::Select object We create an IO::Select object containing the socket for use in the main accept loop. The rationale for this will be explained in a moment.

Lines 16-18: Initialize server We call the Daemon module's init_server() routine to create the PID file for the server, autobackground, and initialize logging.

Lines 19-30: Main accept loop We enter a loop in which we call accept(), fork off a child to handle the connection, and continue looping. The loop terminates only when the INT or TERM interrupt handler sets the $DONE global to true. The problem with this strategy is that the loop spends most of its time blocking in the call to accept(), making it likely that the termination signal will be received during this system call. However, accept() is one of the slow I/O calls that is automatically restarted when interrupted by a signal. Although $DONE is set to true, the server accepts one last incoming connection before it realizes that it's time to quit. We would prefer that the server exit immediately.

In previous versions of the forking server we have either (1) let the interrupt handler kill the server immediately or (2) used IO::Socket's timeout mechanism to make accept() interruptible. For variety, this version of the server uses a different strategy. Rather than block in accept(), we block in a call to IO::Select->can_read(). Unlike the slow I/O calls, select() is not automatically restarted. When the INT or TERM signal is received, the can_read() method is interrupted and returns undef. We detect this and return to the top of the loop, where the change in $DONE is detected. If, instead, can_read() returns true, then we know we have an incoming connection. We go on to call the socket object's accept() method. If this is successful, then we call the launch_child() function exported by the Daemon module.

Recall that launch_child() is a wrapper around fork() that launches children in a signal-safe manner and updates a package global containing the PIDs of all active children. launch_child() can take a number of arguments, including a callback to be invoked when the child is reaped. In this case, we're not interested in handling that event, so we pass no arguments. If launch_child() returns a child PID of 0, then we know we are in the child process. We close our copy of the listening socket and call the Web module's handle_connection() method on the connected socket. Otherwise, we are the parent. We close our copy of the connected socket and continue looping.
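Putting those pieces together, the heart of the main loop looks something like the following sketch. It is hypothetical (the real Figure 15.3 code may differ in detail) and assumes $DONE, the Web module's handle_connection(), and the Daemon module's launch_child():

    my $listen_socket = IO::Socket::INET->new(LocalPort => 8080,
                                              Listen    => 20,
                                              Reuse     => 1) or die $@;
    my $selector = IO::Select->new($listen_socket);
    while (!$DONE) {
        next unless $selector->can_read;  # a signal interrupts this; retest $DONE
        my $c = $listen_socket->accept or next;
        my $child = launch_child();       # signal-safe fork from Daemon
        if ($child == 0) {                # in child process
            close $listen_socket;
            handle_connection($c);
            exit 0;
        }
        close $c;                         # in parent process
    }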
The subjective performance of the accept-and-fork server is significantly better than the serial version, particularly when handling pages with inline images.

Preforking Web Server, Version 1

The next version of our server (Figure 15.4) is not much more complex. After opening the listen socket, the server forks a preset number of child processes. Having done its job, the parent process exits, leaving each child process to run a serial accept() loop. The total number of simultaneous connections that the server can handle is limited by the number of forked children.

Figure 15.4. Preforking Web server, version 1

Lines 1-6: Load modules We load the IO::* modules, Daemon, and Web.

Lines 6-7: Define constants In addition to the PIDFILE constant needed by the init_server() routine, we declare PREFORK_CHILDREN to be the number of child server processes we will fork.

Lines 8-11: Create listening socket We create the listening socket in the usual way.

Lines 12-13: Initialize the server We call the Daemon module's init_server() function to autobackground the server, set up logging, and create the PID file. The server will actually exit soon after this, and the PID file will disappear; this problem will be fixed in the next iteration of the server.

Lines 14-15: Prefork children We call our make_new_child() subroutine PREFORK_CHILDREN times to spawn the required number of children. The main server process then exits, leaving the children to run the show.

Lines 16-20: make_new_child() subroutine The make_new_child() subroutine calls the Daemon module's launch_child() function to do a signal-safe fork. If launch_child() returns a PID, we know we are in the parent process and return. Otherwise, we are the child, so we run the do_child() subroutine. When do_child() returns, we exit.

Lines 22-40: do_child() subroutine Each child runs what is essentially a serial accept() loop. We call $socket->accept() in a loop, handle the incoming connection, and then wait for the next incoming request.

When you run this version of the server, it returns to the command line after all the children are forked. If you run the ps command on UNIX, or the Process Manager program on Windows (assuming you have a newer version of Perl that supports fork on Windows), you will see five identical Perl processes corresponding to the five server children. The subjective performance of this server is about the same as that of the forking server. The differences show up only when the server is heavily loaded with multiple incoming connections, at which point the fact that the server can't handle more than PREFORK_CHILDREN connections simultaneously becomes noticeable.

Preforking Web Server, Version 2

Although the first version of the preforking server works adequately, it has some problems. One is that the parent process abandons its children after spawning them.
This means that if a child crashes or is killed deliberately by an external signal, there's no way to launch a new child to take its place. On the flip side, there currently isn't an easy way to terminate the server: each child has to be killed by hand by discovering its PID and sending it an INT or TERM signal (or using the Process Manager to terminate the task on Win32 platforms). The solution to this problem is for the parent to install signal handlers to take the appropriate actions when a child dies or the parent receives a termination signal. After launching the first set of children, the parent remains active until it receives the signal to terminate.

A second problem is more subtle. When multiple processes try to accept() on the same socket, they are all put to sleep until an incoming connection becomes available. When a connection finally does come in, all the processes wake up simultaneously and compete to complete the accept(). Even under the best of circumstances, this can put a strain on the operating system because of the large number of processes becoming active at once and competing for a limited pool of system resources. This is called the "thundering herd" phenomenon. The problem is made worse by the fact that some operating systems, Solaris in particular, forbid multiple processes from calling accept() on the same socket. If they try to do so, accept() returns an error. So the preforking server does not work at all on these systems.

Fortunately, a simple strategy will solve both the thundering herd problem and the multiple accept() error: serialize the call to accept() so that only one child process can call it at any given time. The idea is to make the processes compete for access to a low-overhead system resource, typically an advisory lock on a file, before they can call accept(). The process that gets the lock is allowed to call accept(), after which it releases the lock. The result is that one process is blocked in accept(), while all the rest are put to sleep until the lock becomes available.

In this example, we use the flock() system call to serialize accept(). This system call allows a process to obtain an advisory lock on an opened file. If one process holds a lock on the file and another process tries to obtain its own lock, the second process blocks in flock() until the first lock is released. Once a process has obtained the lock, no other process can obtain it until the lock is released. Our strategy is to create and maintain a temporary lock file to use for flock() serialization. Each child will attempt to lock the file before calling accept() and release the lock immediately afterward. The effect is to protect the call to accept() so that only one process can call it at any time; the others are blocked in flock(), waiting for the lock to become available. We discussed the syntax of flock() in the Chapter 14 section Direct Logging to a File. Conveniently enough, we don't have to create a separate lock file, because we can use our PID file for this purpose. On entry to the do_child() subroutine, we call IO::File's open() method to open the PID file, using the O_RDONLY flag to open it in a read-only fashion.

In this version of the preforking Web server, we make the necessary modifications to serialize accept() and to relaunch child processes to replace exited ones. We also arrange for the parent process to kill its children cleanly when it exits. Figure 15.5 shows the server with both sets of modifications in place.
Figure 15.5. This preforking server serializes accept() and relaunches new children to replace old ones

Lines 1-7: Import modules We import the Fcntl module in addition to those we imported in earlier versions. This module exports several constants we need to perform file locking and unlocking.

Lines 8-11: Define constants In addition to PREFORK_CHILDREN and PIDFILE, we define a MAX_REQUEST constant. This constant determines the number of transactions each child will handle before it exits. By setting this to a low value, you can watch children exit and the parent spawn new ones to replace them. We also define DEBUG, which can be set to generate verbose log messages.

Lines 12-13: Declare global variables $CHILD_COUNT is updated to reflect the number of children active at any given time. $DONE is used as before to flag the parent server that it is time to exit.

Line 14: Signal handlers The INT and TERM handlers process requests to terminate. As before, we will rely on the Daemon module to install a handler for CHLD.

Lines 15-20: Create listening socket, initialize server We create the listening socket and call the Daemon module's init_server() routine to write the PID file and go into the background.

Lines 21-24: Main loop We now enter a loop in which we launch PREFORK_CHILDREN children and then go to sleep until a signal is received. As we will see, each call to make_new_child() increments the $CHILD_COUNT global by one when it creates a child, and the CHLD callback routine decrements $CHILD_COUNT each time a child dies. The effect of the loop is to wait until CHLD or another signal is received and then to call make_new_child() as many times as necessary to bring the number of children up to the limit set by PREFORK_CHILDREN. This continues indefinitely until the parent server receives an INT or TERM signal and sets $DONE to true.

Lines 25-27: Kill children and exit When the main loop is finished, we kill all the children by calling the Daemon module's kill_children() subroutine. The essence of this routine is the line of code:

    kill TERM => keys %CHILDREN;

where %CHILDREN is a hash containing the PIDs of the active children launched by launch_child(). kill_children() waits until the last child has died before terminating.

Lines 28-37: make_new_child() subroutine As in the last version, the make_new_child() subroutine is invoked to create a new server child process. One change from the previous version is that when we call the launch_child() subroutine, we pass it a reference to a subroutine to be invoked whenever Daemon reaps the child. In this case, our callback is cleanup_child(), which decrements the $CHILD_COUNT global by one. The other new feature is that after the parent launches a new child, it increments $CHILD_COUNT by one. Together, these changes allow $CHILD_COUNT to reflect an accurate count of active child processes.

Lines 38-52: do_child() subroutine The do_child() subroutine, which runs each child's accept() loop, is modified to serialize accepts. On entry to the subroutine, we open the PID file read-only, creating a filehandle that we can use for locking. Before each call to accept(), we call flock() on the filehandle with an argument of LOCK_EX to gain an exclusive lock. We then release this lock following accept() by calling flock() again with the LOCK_UN argument. After accepting the connection, we call the Web module's handle_connection() routine as before.
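In outline, the serialized do_child() just described looks something like this sketch (it assumes the PIDFILE and MAX_REQUEST constants and the Web module's handle_connection(); the real Figure 15.5 code differs in detail):

    use Fcntl qw(:flock O_RDONLY);

    sub do_child {
        my $socket = shift;
        my $lock = IO::File->new(PIDFILE, O_RDONLY)
            or die "Can't open lock file: $!";
        my $cycles = MAX_REQUEST;
        while ($cycles-- > 0) {
            flock($lock, LOCK_EX) or die "flock: $!";  # wait for our turn
            my $c = $socket->accept;
            flock($lock, LOCK_UN);                     # let a sibling accept
            next unless $c;
            handle_connection($c);
            close $c;
        }
    }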
Lines 53-56: cleanup_child() subroutine This subroutine is called by the Daemon module's CHLD handler, which is invoked after reaping an exited child; consequently, the subroutine is invoked within an interrupt. We recover the child PID, which is passed to us by the Daemon module, but we don't do anything with that information in this version of the server. We just decrement $CHILD_COUNT by one to flag the main loop that a child has died.

If you have a version of the ps or top routines that can show the system call that each process is executing, you can see the difference between the nonserialized and the serialized versions of the server. On my Linux system, top shows the following for the nonserialized version of the server:

      PID SIZE WCHAN      STAT %CPU %MEM TIME COMMAND
    15300 2560 tcp_parse  S    0.0  4.0  0:00 web_prefork1.pl
    15301 2560 tcp_parse  S    0.0  4.0  0:00 web_prefork1.pl
    15302 2560 tcp_parse  S    0.0  4.0  0:00 web_prefork1.pl
    15303 2560 tcp_parse  S    0.0  4.0  0:00 web_prefork1.pl
    15304 2560 tcp_parse  S    0.0  4.0  0:00 web_prefork1.pl

There are five children, and each one (as indicated by the WCHAN column) is in a system call named tcp_parse. This routine is presumably called by accept() while waiting for an incoming connection. In contrast, the latest version of the preforking server shows a different profile:

      PID SIZE WCHAN       STAT %CPU %MEM TIME COMMAND
    15313 2984 pause       S    0.0  4.6  0:00 web_prefork2.pl
    15314 2980 flock_lock  S    0.0  4.6  0:00 web_prefork2.pl
    15315 2980 tcp_parse   S    0.0  4.6  0:00 web_prefork2.pl
    15316 2980 flock_lock  S    0.0  4.6  0:00 web_prefork2.pl
    15317 2980 flock_lock  S    0.0  4.6  0:00 web_prefork2.pl
    15318 2980 flock_lock  S    0.0  4.6  0:00 web_prefork2.pl

The process at the top of the list (PID 15313) is the parent. top shows it in pause because that's the system call invoked by sleep(). The other five processes (15314-15318) are the children. Only one of them is performing an accept(); the others are blocked in the flock_lock system call. As the children process incoming connections, they take turns, with never more than one calling accept() at any given time.

An Adaptive Preforking Server

A limitation of the previous versions of the preforking Web server is that if the number of incoming connections exceeds the number of children available to handle them, the excess connections wait in the incoming TCP queue until one of the children becomes available to call accept(). The accept-and-fork servers of Chapters 10 and 14 don't have this behavior; they just launch new children as necessary to handle incoming requests.

The last two versions of the preforking server that we consider are adaptive ones. The parent keeps track of which children are idle and which are busy handling connections. If the number of idle children drops below a level called the "low water mark," the parent launches new children to raise the number. If the number of idle children exceeds a level called the "high water mark," the parent kills the excess idle ones. This strategy ensures that there are always a few idle children ready to handle incoming connections, but not so many that system resources are wasted.

The main challenge for an adaptive server is the communication between the children and their parent. In previous versions, the only communication between child and parent was the automatic CHLD signal sent to the parent when a child died.
This was sufficient to keep track of the number of active children, but it is inadequate for our current needs, in which the child must pass descriptive information about its activities. There are two common solutions to this problem. One is for the parent and children to send messages via a filehandle. The other is to use shared memory so that the parent and child processes share a Perl variable; when the variable is changed in a child process, the changes become visible in the parent as well. In this section, we show an example of an adaptive preforking server that uses a pipe for child-to-parent communications. We'll look at the shared memory solution in the next section.

Chapter 2 demonstrated how unidirectional pipes created with the pipe() call can be used by a set of child processes to send messages to their common parent (see the section Creating Pipes with the pipe() Function). The same technique is ideal in this application. At startup time, the adaptive server creates a pipe using pipe():

    pipe(CHILD_READ, CHILD_WRITE);

This creates two handles. CHILD_WRITE will be used by the children to write status messages, and CHILD_READ will be used by the parent to receive them. Each time we fork a new child process, the new child closes CHILD_READ and keeps a copy of CHILD_WRITE.

The format of the status messages is simple. They consist of the child's PID, whitespace, the current status, and a newline:

    2209 busy

The status may be any of the strings "idle," "busy," and "done." The child issues the "idle" status just before calling accept() and "busy" just after accepting a new connection. The child announces that it is "done" when it has processed its maximum number of connections and is about to exit. The parent reads the messages in a loop, parsing them and keeping a global named %STATUS up to date. Each time a child's status changes, the parent counts the busy and idle children and, if necessary, launches new children or kills old ones to keep the number of idle processes in the desired range.

We want the parent's read loop to be interruptible by signals so that we can kill the server. Before the server exits, it kills each remaining child so that everything exits cleanly. Similarly, we arrange for the child processes' accept() loop to be interruptible so that the child exits immediately when it receives a termination signal from its parent.

At any time, there is a single active CHILD_READ filehandle in the parent and multiple CHILD_WRITE filehandles in the children. You might well wonder what prevents messages from the children being garbled as they are intermingled. This design works because of a particular characteristic of the pipe implementation: provided that messages are below a certain size threshold, write operations on pipes are atomic. A message written to a pipe by one process is guaranteed not to interrupt a message written by another. This ensures that messages written into the pipe come out intact at the other end and are not garbled with data from writes performed by other processes. The size limit on atomic messages is controlled by the operating system constant PIPE_BUF, available in the header file limits.h. This varies from system to system, but 512 bytes is generally a safe value.
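Here is a stripped-down sketch of this message protocol in isolation. The status strings and the "PID status\n" format follow the text, but the surrounding scaffolding is invented for illustration:

    pipe(CHILD_READ, CHILD_WRITE) or die "pipe: $!";

    defined(my $pid = fork) or die "fork: $!";
    if ($pid == 0) {                        # child process
        close CHILD_READ;
        syswrite CHILD_WRITE, "$$ idle\n";  # each write is atomic (< PIPE_BUF)
        syswrite CHILD_WRITE, "$$ busy\n";
        syswrite CHILD_WRITE, "$$ done\n";
        exit 0;
    }

    close CHILD_WRITE;                      # parent process
    my %STATUS;
    while (sysread(CHILD_READ, my $buffer, 4096)) {
        for my $msg (split "\n", $buffer) { # one read may hold several messages
            my ($child, $status) = split ' ', $msg;
            if ($status eq 'done') { delete $STATUS{$child} }
            else                   { $STATUS{$child} = $status }
            print "child $child => $status\n";
        }
    }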
Figure 15.6 shows the code for the adaptive server.

Figure 15.6. Preforking server using a pipe for interprocess communication

Lines 1-8: Load modules We bring in the standard IO::* modules, Fcntl, and our own Daemon and Web modules.

Lines 9-14: Define constants We define several new constants. HI_WATER_MARK and LO_WATER_MARK define the maximum and minimum number of idle servers, respectively. They are set deliberately low in this example to make it easy to watch the program work. DEBUG is a constant indicating whether to print debugging information.

Lines 15-16: Declare globals The $DONE flag causes the server to exit when set to true. The %STATUS hash contains child status information. As in the previous example, the child PIDs form the keys of the hash, while the status information forms the values.

Line 17: Interrupt handlers We install a handler for INT and TERM that sets the $DONE flag to true, ultimately causing the server to exit. Recall also that the Daemon module automatically handles the CHLD signal by reaping children and maintaining a list of child PIDs in the %CHILDREN global.

Lines 18-21: Create socket We create a listening socket in the usual way.

Lines 22-24: Create pipe We create a unidirectional pipe with the pipe() call and add the CHILD_READ end of the pipe to an IO::Select set for use in the main loop. We will discuss the rationale for using IO::Select momentarily.

Lines 25-26: Initialize server We call the Daemon module's init_server() routine to create the PID file for the server, autobackground, and initialize logging.

Lines 27-28: Prefork children We call our internal make_new_child() subroutine to fork the specified number of child server processes.

Line 29: Main loop The main loop of the server runs until $DONE is set to true in a signal handler. Each time through the loop, the server waits for a status change message from a child or a signal. To keep the number of idle children between the low and high water marks, it updates the contents of %STATUS and runs the code that we have seen previously for launching or killing children.

Lines 30-42: Process messages from the pipe Looking at the main loop in more detail, we want to read status lines from the CHILD_READ filehandle using sysread(). However, we can't simply let the parent block in the I/O call, because we want to be able to terminate when we receive a TERM signal or notification that one of the child processes has died; sysread(), like the other slow I/O calls, is automatically restarted by Perl after interruption by a signal. The easiest solution to this problem is again to use select() to wait for the pipe to become readable, because select() is not automatically restarted. We call the IO::Select object's can_read() method to wait for the pipe to become ready and then invoke sysread() to read its current contents into a buffer. The data read may contain one message or several, depending on how active the children are. We split the data into individual messages on the newline character and parse the messages. If the child's status is "done," we delete its PID from the %STATUS global. Otherwise, we update the global with the child's current status code.

Lines 43-52: Launch or kill children After updating %STATUS, we collect the list of idle children by using grep() to filter the %STATUS hash for those children whose status is set to "idle." If the number of idle children is lower than LO_WATER_MARK, we call make_new_child() as many times as required to bring the child count up to the desired level. If the number of idle children exceeds HI_WATER_MARK, we politely tell the excess children to quit by sending them a HUP ("hangup") signal. As we will see later, each child has a HUP handler that causes it to terminate after finishing its current connection. This is better than terminating the child immediately, because it avoids breaking a Web session that is in progress. When we tally the idle children, we sort them numerically by process ID, causing older excess children to be killed preferentially. This is probably unnecessary, but it might be useful if the child processes are leaking memory.
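The bookkeeping just described boils down to a few lines. This sketch assumes the %STATUS hash, the water-mark constants, and the make_new_child() subroutine described above:

    my @idle = sort { $a <=> $b } grep { $STATUS{$_} eq 'idle' } keys %STATUS;
    if (@idle < LO_WATER_MARK) {
        make_new_child() for 1 .. LO_WATER_MARK - @idle;     # launch more
    } elsif (@idle > HI_WATER_MARK) {
        my @goners = @idle[0 .. @idle - HI_WATER_MARK - 1];  # oldest first
        kill HUP => @goners;                                 # ask them to quit
    }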
Lines 54-70: Termination When the main loop is done, we log a warning and call the kill_children() subroutine defined in Daemon. kill_children() sends each child a TERM and then waits for each one to exit. When the subroutine returns, we log a second message and exit.

Lines 58-67: make_new_child() subroutine make_new_child() is invoked to create a new child process. We invoke the Daemon module's launch_child() function to fork a new child in a signal-safe manner. When we call launch_child(), we pass it a code reference to a callback routine that will be invoked immediately after the child is reaped. The callback, cleanup_child(), is responsible for keeping %STATUS up to date even if the child exits abnormally. launch_child() returns the PID of the child in the parent process and numeric 0 in the child process. In the former case, we simply log a debugging message. In the latter, we close the CHILD_READ filehandle, because we no longer need it, and run our Web server routines by calling do_child(). When do_child() is finished, we exit.

Lines 68-91: do_child() subroutine At its heart, this routine does exactly what the previous version of do_child() did. It serializes on the lock file using flock(), calls the listening socket's accept() method, and passes the connected socket to the Web module's handle_connection() function. The main differences from the previous version are that (1) it handles HUP signals sent to it by the parent by shutting down gracefully, and (2) it writes status messages to the CHILD_WRITE filehandle.

Lines 70-73: Initialize subroutine and start accept() loop When we enter the do_child() routine, we open the lock file and initialize the $cycles variable as before. We then install a handler for HUP that sets the local variable $done to true. Our accept loop exits when $done becomes true or we have processed the maximum number of transactions. At the top of the accept() loop, we write a status message containing our process ID (stored in $$) and the "idle" status message.

Lines 76-83: Lock and call accept() The rationale for the next bit of code is a bit subtle. We call flock() and then accept() as before. However, what happens if the HUP signal from the parent comes in while we're in one or the other of those calls? The HUP handler executes and sets $done to true, but since Perl restarts slow system calls automatically, we will not notice the change in $done until we have received an incoming connection, processed it, and returned to the top of the accept loop. We cannot handle this by interposing an interruptible select() between the calls to flock() and accept(), because the HUP might just as easily come while we are blocked in the flock() call, and flock() is also restartable. Instead, we wrap the calls to flock() and accept() in an eval{} block. At the top of the block we install a new local HUP handler, which bumps up $done and dies, forcing the entire eval{} block to terminate when the HUP signal is received. We test the value returned by the block, and if it is undefined, we return to the top of the loop, where the change in $done will be detected.
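Inside the child's accept loop, the protected section looks something like this sketch (variable names follow the text; the real Figure 15.6 code may differ slightly):

    my $c = eval {
        local $SIG{HUP} = sub { $done++; die "hup\n" };  # aborts either blocking call
        flock($lock, LOCK_EX) or die "flock: $!";
        my $connection = $socket->accept;
        flock($lock, LOCK_UN);
        $connection;                  # value of the eval{} block
    };
    next unless $c;  # eval died on HUP (or accept failed): retest $done at loop top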
Lines 84-91: Handle connection If the eval{} block runs to completion, then we have accepted a new incoming connection. We send a "busy" message to the parent via CHILD_WRITE and call the handle_connection() subroutine. After the loop terminates, we write a "done" message to the parent, close all our open filehandles, and exit.

Lines 92-95: cleanup_child() subroutine cleanup_child() is the callback routine invoked when the reap_child() subroutine defined in Daemon successfully receives notification that a child has died. We receive the child's PID on the subroutine stack and delete it from %STATUS. This handles the case of a child dying before it has had a chance to write its "done" status to the pipe.

When we run the adaptive preforking server with the DEBUG option set to a true value, we see messages from the parent whenever it launches a new child (including the three preforked children at startup time), processes a status change message, or kills an excess child. We see messages from the children whenever they call accept() or terminate. Notice how the parent killed a child when the number of idle processes exceeded the high water mark.

    Jun 21 10:46:19 pesto prefork_pipe.pl[7195]: launching child 7196
    Jun 21 10:46:19 pesto prefork_pipe.pl[7195]: launching child 7201
    Jun 21 10:46:20 pesto prefork_pipe.pl[7195]: launching child 7202
    Jun 21 10:46:19 pesto prefork_pipe.pl[7196]: child 7196: calling accept()
    Jun 21 10:46:20 pesto prefork_pipe.pl[7195]: 7201=>idle 7202=>idle 7196=>idle
    Jun 21 10:46:38 pesto prefork_pipe.pl[7195]: 7201=>idle 7202=>idle 7196=>busy
    Jun 21 10:46:38 pesto prefork_pipe.pl[7202]: child 7202: calling accept()
    Jun 21 10:46:41 pesto prefork_pipe.pl[7195]: 7201=>idle 7202=>idle 7196=>idle
    Jun 21 10:46:42 pesto prefork_pipe.pl[7196]: child 7196: calling accept()
    Jun 21 10:46:42 pesto prefork_pipe.pl[7195]: 7201=>idle 7202=>busy 7196=>idle
    Jun 21 10:46:49 pesto prefork_pipe.pl[7195]: 7201=>idle 7202=>busy 7196=>busy
    Jun 21 10:46:49 pesto prefork_pipe.pl[7201]: child 7201: calling accept()
    Jun 21 10:46:56 pesto prefork_pipe.pl[7195]: launching child 7230
    Jun 21 10:46:56 pesto prefork_pipe.pl[7217]: child 7217: calling accept()
    Jun 21 10:46:56 pesto prefork_pipe.pl[7195]: 7217=>idle 7201=>busy 7202=>busy 7196=>busy 7230=>idle
    Jun 21 10:47:08 pesto prefork_pipe.pl[7195]: 7217=>busy 7201=>busy 7202=>busy 7196=>busy 7230=>idle
    Jun 21 10:47:08 pesto prefork_pipe.pl[7230]: child 7230: calling accept()
    Jun 21 10:47:09 pesto prefork_pipe.pl[7195]: launching child 7243
    Jun 21 10:47:09 pesto prefork_pipe.pl[7195]: 7217=>busy 7201=>idle 7202=>idle 7243=>idle 7196=>idle 7230=>idle
    Jun 21 10:47:29 pesto prefork_pipe.pl[7195]: killed 1 children
    Jun 21 10:48:54 pesto prefork_pipe.pl[7196]: child 7196: calling accept()
    Jun 21 10:48:54 pesto prefork_pipe.pl[7230]: child 7230 done
    Jun 21 10:50:18 pesto prefork_pipe.pl[7195]: Termination received, killing children

As written, there is a potential bug in the parent code. The parent process reads from CHILD_READ in maximum chunks of 4,096 bytes rather than in a line-oriented fashion. If the children are very active and the parent very slow, it might happen that more than 4,096 bytes of messages could accumulate and the last message get split between two reads. Although this is unlikely (4,096 bytes is sufficient for 400 messages given an average size of 10 bytes per message), you might consider buffering these reads in a string variable and explicitly checking for partial reads that don't terminate in a newline, as in the sketch below.
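Such a buffering scheme might look like this sketch (a hypothetical refinement, not part of Figure 15.6; $selector is the IO::Select object from the main loop):

    my $pending = '';
    while (!$DONE) {
        next unless $selector->can_read;           # interrupted by a signal
        last unless sysread(CHILD_READ, my $chunk, 4096);
        $pending .= $chunk;
        while ($pending =~ s/^(.+)\n//) {          # consume complete lines only
            my ($pid, $status) = split ' ', $1;
            if ($status eq 'done') { delete $STATUS{$pid} }
            else                   { $STATUS{$pid} = $status }
        }
        # a partial message, if any, stays in $pending for the next read
    }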
An Adaptive Preforking Server Using Shared Memory

Last, we'll look at the same server implemented using shared memory. All modern versions of UNIX support a shared memory facility that allows processes to read and write to the same segment of memory. This allows them to share variables and other data structures. Shared memory also includes a locking facility that allows one process to gain temporary exclusive access to the memory region to avoid race conditions in which two processes try to modify the same memory segment simultaneously.

While Perl gives you access to the low-level shared memory calls via shmget(), shmread(), shmwrite(), and shmctl(), the IPC::Shareable module provides a high-level tied interface to the shared memory facility. Once you declare a scalar or hash variable tied to IPC::Shareable, its contents can be shared with any other Perl process. IPC::Shareable can be downloaded from CPAN. It requires the Storable module to be installed and will install it automatically for you if you use the CPAN shell.

Here's the idiom for placing a hash in shared memory:

    tie %H, 'IPC::Shareable', 'Test', {create    => 1,
                                       destroy   => 1,
                                       exclusive => 1,
                                       mode      => 0666};

The first argument gives the name of the variable to tie, in this case %H. The second is the name of the IPC::Shareable module. The third argument is a "glue" ID that will identify this variable to the processes that will share it. This can be an integer or any string of up to four letters. In this example we use a glue ID of Test. The last argument is a hash reference containing options to pass to IPC::Shareable. There are a variety of options, but the most frequent are create, destroy, exclusive, and mode. The create option causes the shared memory segment to be created if it doesn't exist already. It is often used in conjunction with exclusive to cause the tie() to fail if the segment already exists, and with destroy to arrange for the shared memory segment to be destroyed automatically when the process exits. Finally, mode specifies an octal access mode for the shared memory segment. It functions like file modes, where 0666 is the most liberal, allowing any process to read and write the memory segment, and 0600 is the most conservative, making the shared variable accessible only to processes that share the same user ID.

Multiple processes can tie hashes to the same memory segment, provided that they have sufficient access privileges. In the typical case of a parent that must share data with multiple children, the parent first creates the shared memory segment using the create, destroy, and exclusive options. Each child then ties its own variable to the same glue ID. The children are not responsible for creating or destroying the shared memory, so they don't pass options to tie():

    tie %my_copy, 'IPC::Shareable', 'Test';

After a hash variable is tied, all changes made to the variable by one process are seen immediately by all others. You can store scalar variables, objects, and references into the values of a shared hash, but not filehandles or subroutine references. However, there are certain subtleties to storing complex objects into shared hashes; see the IPC::Shareable documentation for all the caveats.
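Here is a minimal demonstration of two processes sharing a hash. The 'Test' glue ID follows the example above; note that the child exits via POSIX::_exit() to keep its inherited copy of the tie from destroying the segment, a pitfall discussed in the final notes at the end of this section:

    use IPC::Shareable;
    use POSIX ();

    my %H;
    tie %H, 'IPC::Shareable', 'Test',
        {create => 1, destroy => 1, exclusive => 1, mode => 0600};

    defined(my $pid = fork) or die "fork: $!";
    if ($pid == 0) {                   # child: tie its own copy, no create/destroy
        my %my_copy;
        tie %my_copy, 'IPC::Shareable', 'Test';
        $my_copy{status} = 'busy';     # immediately visible to the parent
        POSIX::_exit(0);               # skip destructors; see the final notes
    }
    waitpid($pid, 0);
    print "child said it is $H{status}\n";   # prints "busy"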
If multiple processes try to modify the same shared variable simultaneously, odd things can happen. Even something as simple as $H{'key'}++ is a bit risky, because the ++ operation occurs internally in several steps: the current value is fetched, incremented, and stored back into the hash. If another process tries to modify the value before ++ has finished executing, its changes will be overwritten. The simple solution is to lock the hash before performing a multistep update and unlock it when you finish. Here's the idiom:

    tied(%H)->shlock;
    $H{'key'}++;
    tied(%H)->shunlock;

The tied() method returns a reference to an object that is maintained internally by IPC::Shareable. It has just two public methods: shlock() and shunlock(). The first method locks the variable so that it can't be accessed by other processes, and the second reverses the lock. (These methods have no direct relationship to the lock() function used in threading or the flock() function used earlier in this chapter to serialize accept().) Scalar variables can also be tied to shared memory using a similar interface. Tied arrays are currently not supported.

A new version of the adaptive preforking Web server written to take advantage of IPC::Shareable is shown in Figure 15.7.

Figure 15.7. An adaptive preforking server using shared memory

Lines 1-8: Load modules We load the same modules as before, plus the IPC::Shareable module.

Lines 9-15: Define constants We define a new constant, SHM_GLUE, which contains the key that parent and children will use to identify the shared memory segment.

Lines 16-17: Declare globals We declare $DONE and %STATUS, which have the same significance as in the previous example. The major difference is that %STATUS is tied to shared memory and updated directly by the children, rather than kept up to date by the parent.

Lines 18-19: Install signal handlers We install TERM and INT handlers that set the $DONE flag to true, causing the server to terminate. We also intercept the ALRM signal with a handler that does absolutely nothing. As you will see, the parent spends most of its time in the sleep() call, waiting for one of its children to send it an ALRM to tell it that the contents of %STATUS have changed. We must install a handler for ALRM to override the default action of terminating the program completely.

Lines 20-25: Create socket, initialize server We create a listening socket and call the Daemon module's init_server() routine in the usual way.

Lines 26-28: Tie %STATUS We tie %STATUS to shared memory, using options that cause the shared memory to be created with restrictive access modes and to be destroyed automatically when the parent exits. If the memory segment already exists when tie() is called, the call will fail. This may happen if another program chose the same ID value for a shared memory segment or if the server crashed abnormally, leaving the memory allocated. In the latter case, you may have to delete the shared memory manually using a tool provided by your operating system. On Linux systems, the command to remove a shared memory segment is ipcrm. The contents of %STATUS are identical to those in the last example: its keys are the PIDs of children, and its values are their status strings.

Lines 29-30: Prefork children We prefork some children by calling make_new_child() the required number of times.

Lines 31-43: Status loop As the children process incoming connections, they will update %STATUS, and the changes will be visible to the parent process immediately. But it would be woefully inefficient to do a busy loop over %STATUS looking for changes. Instead, we rely on the children to tell us when %STATUS has changed, by waiting for a signal to arrive. The two signals we expect to get are ALRM, sent by a child when it changes %STATUS, and CHLD, sent by the operating system when a child dies for whatever reason. We enter a loop that terminates when $DONE becomes true. At the top of the loop, we call sleep(), which puts the process to sleep until some signal is received. When sleep() returns, we process %STATUS exactly as before, launching new children and killing old ones to keep the number of idle children between the low and high water marks.
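The resulting parent loop is tiny. This sketch assumes the $DONE flag and the same child-management logic shown for the pipe version:

    $SIG{ALRM} = sub {};   # empty handler: its only job is to interrupt sleep()
    while (!$DONE) {
        sleep;             # until a child's ALRM or the operating system's CHLD
        # ...count idle children in %STATUS and launch or HUP children
        #    to stay between LO_WATER_MARK and HI_WATER_MARK, as before...
    }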
Lines 44-47: Termination When the main loop is done, we call Daemon's kill_children() to terminate any running children, print out some diagnostic messages, and exit.

Lines 48-56: make_new_child() subroutine This subroutine is the same as the one used in the first version of the adaptive server, except that it no longer does pipe management. As in the earlier version, we call the Daemon module's launch_child() subroutine with a callback to cleanup_child().

Lines 57-83: do_child() subroutine do_child() runs the accept() loop for each child, accepting and processing incoming connections from clients. On entry to the subroutine, we tie a local variable named %status to the shared memory segment identified by SHM_GLUE. Because we expect that the segment has already been created by the parent, we do not use the create or exclusive flags this time. If the variable cannot be tied, the child exits with an error message. We set up the lock file for serialization and enter an accept() loop. Each time the status of the child changes, we write its new status directly into the %status variable and notify the parent that the variable has changed by sending the parent an ALRM signal. The idiom looks like this:

    $status{$$} = 'idle';
    kill ALRM => getppid();

In other respects do_child() is identical to the earlier version, including its use of an eval{} block to intercept and handle HUP signals gracefully.

Lines 84-87: cleanup_child() subroutine cleanup_child() is called by the Daemon module's reap_child() subroutine to handle a child that has just been reaped. We delete the child's PID from %STATUS. This ensures that %STATUS is kept up to date even if the child has terminated prematurely.

Some final notes on this server: I initially attempted to use the same tied %STATUS variable for both the parent and children, allowing the children to inherit %STATUS through the fork. This turned out to be a disaster, because IPC::Shareable deallocated the shared memory segment whenever any of the children exited. A little investigation revealed that the destroy flag was being inherited along with the rest of the shared variable. One could probably fix this by hacking into IPC::Shareable's internal structure and manually deactivating the destroy flag. However, there's no guarantee that the internal structure won't change at some later date. Some posters to the comp.lang.perl.modules newsgroup have warned that IPC::Shareable is not entirely stable, and although I have not encountered problems with it, you might want to stick with the simpler pipe implementation on production systems.