One of the many features that make Perl an attractive programming language is its built-in support for TCP/IP programming. Perl and TCP/IP mix especially well because Perl's powerful text processing capabilities are very helpful in dealing with text-based Internet protocols like SMTP, NNTP, and HTTP. Perl's support for network programming is so complete that you can write any conceivable type of Internet network application in it. Anything that you can express in C can also be expressed in Perl. You can write a web server or a news client from the ground up if you like. You can write a DNS server. You could even rewrite sendmail. The necessary capabilities are all there.

Don't write low-level code when you can use modules instead

While it's certainly possible to write network applications from the ground up, you should consider using existing modules to support your efforts. For example, if you want to fetch a Web page, the following will suffice:

    use LWP::Simple;
    $page = get 'http://www.effectiveperl.com/';

(Yes, that's right: two lines!) Of course, you could always start out with calls to socket, bind, and connect, bone up on HTTP, and so on, but I think you'll agree that this is easier. HTTP is particularly well served by Perl modules, but there are also modules for working with FTP, NNTP, SMTP, and many other Internet protocols and standards. If you want to learn more about Perl's Internet modules, you should begin by looking at libwww-perl (the World Wide Web library, also called LWP) and libnet (a collection of Internet protocol modules).

When you do write low-level networking code, don't use anachronisms

Of course, your application may be one where you are forced to write low-level networking code. For example, you might be working on a CGI script that connects to and exchanges data with a server application (possibly also written in Perl) via TCP/IP. Networking or "sockets" code isn't easy to understand the first time you encounter it.
To avoid starting from scratch, you will be tempted to look for an example to use as a starting point. This is a good idea, but you should be careful to work from an up-to-date example. Many of the older examples of sockets code in Perl that are floating around the net have various limitations, inefficiencies, and/or bugs. Let's discuss some of these problems, and how to avoid them.

First, you should always use the Socket module, or perhaps IO::Socket. [2] Among other things, the Socket module defines constants for protocol numbers and the like that will be correct for your environment. Older code may contain something like this:

    $pf_inet = 2;
    $sock_stream = 1;
    $tcp_proto = 6;

These hard-coded values worked for many programmers for a long time, but once large numbers of people started using Perl on Solaris (System V) machines, network applications written in this manner began failing with mysterious "protocol not supported" messages. These values do not work on all operating systems. The right way to write this code is to use the functions defined by the Socket module:
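A minimal sketch of that approach follows. It is an illustration, not the original listing: the variable names mirror the older example, and the fallback to IPPROTO_TCP is an added assumption for systems that lack a protocols database.

```perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM IPPROTO_TCP);

# Ask the Socket module for the platform's values instead of hard-coding them.
my $pf_inet     = PF_INET;
my $sock_stream = SOCK_STREAM;

# In scalar context, getprotobyname returns the protocol number.
# Assumption: fall back to Socket's IPPROTO_TCP if the lookup fails.
my $tcp_proto = getprotobyname('tcp');
$tcp_proto = IPPROTO_TCP unless defined $tcp_proto;

# The values are then passed straight to socket().
socket(my $sock, $pf_inet, $sock_stream, $tcp_proto)
    or die "socket: $!";
close($sock);
```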
Thus, we get the constant for the inet domain from the "constant" function PF_INET, the constant for the stream type from the function SOCK_STREAM, and the protocol number from the function getprotobyname.

Another thing you may see in older code is the use of the pack operator to create the binary addresses that the various sockets functions require:

The old way:
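Older code of this kind typically looks something like the sketch below. The template string, port, and loopback address are illustrative assumptions rather than the original listing. The layout it hard-codes matches Linux, but not every system.

```perl
use strict;
use warnings;

# Hand-rolled sockaddr_in layout: 16-bit address family, 16-bit port in
# network byte order, 4 address bytes, 8 bytes of padding. Not portable.
my $sockaddr_template = 'S n a4 x8';

my $af_inet = 2;                          # hard-coded, as old code did
my $port    = 2001;
my $ip      = pack('C4', 127, 0, 0, 1);   # 127.0.0.1 as four raw bytes

my $packed_addr = pack($sockaddr_template, $af_inet, $port, $ip);
```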
The Socket module defines functions that do this for you in a more readable and maintainable way:

The new way:
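As a minimal sketch (the port and address are again illustrative assumptions), the Socket functions inet_aton and sockaddr_in build the same kind of binary address without any knowledge of its layout, and sockaddr_in can also take it apart again:

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa sockaddr_in);

my $port = 2001;
my $ip   = inet_aton('127.0.0.1') or die "bad address\n";

# In scalar context, sockaddr_in packs a port and address.
my $packed_addr = sockaddr_in($port, $ip);

# In list context, it unpacks them again.
my ($got_port, $got_ip) = sockaddr_in($packed_addr);
print "port $got_port, host ", inet_ntoa($got_ip), "\n";
# prints "port 2001, host 127.0.0.1"
```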
Server applications are often written so that they spawn child processes to handle incoming connections. Any time you create child processes you need to do something to ensure that they do not become "zombies." One way to do this is to set up a SIGCHLD (child died) signal handler in the parent process. Whenever a child process exits, control transfers to the signal handler, which should then call wait to get rid of the zombie. You may have seen a variety of versions of this code, but one reasonably safe version looks something like this:

This is a slightly overblown signal handler:
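A handler along these lines, following the pattern shown in the perlipc documentation, might look like the sketch below. The $reaped counter and the forked child are additions for demonstration only and are not part of the handler idiom itself.

```perl
use strict;
use warnings;

my $reaped = 0;    # demonstration only: counts children collected

sub REAPER {
    my $pid = wait;             # collect the exited child
    $reaped++ if $pid > 0;
    $SIG{CHLD} = \&REAPER;      # reinstall, for old System V signal semantics
}
$SIG{CHLD} = \&REAPER;

# Demonstration: fork a child that exits at once, then wait for the handler.
defined(my $child = fork()) or die "fork: $!";
exit 0 if $child == 0;          # child process ends here

my $tries = 0;
sleep 1 until $reaped or ++$tries >= 5;   # sleep returns early on a signal
```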
It isn't necessary (or recommended) to reinstall the handler within the handler subroutine itself so long as you are on a BSD system or one that is POSIX-compliant. Nowadays the news here is likely to be good. Try the following:

    use Config;
    print "handlers stay put\n" if $Config{d_sigaction} eq "define";

You should skip reinstalling the handler if you believe your scripts will be run only on systems with POSIX signals (generally a safe bet):

    sub REAPER { wait }
    $SIG{CHLD} = \&REAPER;

Or, even more succinctly:
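The more succinct form is presumably an anonymous subroutine assigned directly to the signal slot; a sketch under that assumption:

```perl
use strict;
use warnings;

# Anonymous handler: reap one child per SIGCHLD and do nothing else.
$SIG{CHLD} = sub { wait };
```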
The reason to avoid the assignment to %SIG within the handler is that Perl does not yet have "safe" interrupts and may not have them for some time to come. [3] So long as this is true, the less that goes into a signal handler, the better. You may have considered using:
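The shortcut in question is presumably the 'IGNORE' disposition. That is an assumption based on the surrounding discussion, and it is shown here only so that it can be recognized in existing code:

```perl
use strict;
use warnings;

# Ask the system to discard SIGCHLD instead of handling it.
$SIG{CHLD} = 'IGNORE';
```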
Please don't. It works on some System V machines, but you will be experiencing "Night of the Living <defunct>" on other platforms.

An example

Let's develop a pair of simple TCP/IP applications. We will write a server called psd that will run the ps command locally. The result will be returned to a client called rps. If you have to write both a client and a server, it's usually easiest to start writing the server, because you can probably test it using telnet as a client. Here is a bare-bones, slightly buggy first cut at psd:

psd: A ps daemon
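A first cut along these lines could be sketched as follows. This is not the original listing: the function names psd_listen and psd_serve_one are hypothetical, the command is a parameter so that something other than ps can be substituted, and the IPPROTO_TCP fallback is an added assumption. Like the version described in the text, it handles one connection at a time and does nothing about address reuse.

```perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM INADDR_ANY IPPROTO_TCP sockaddr_in);

# Create a TCP socket listening on $port (0 asks the OS for any free port).
sub psd_listen {
    my ($port) = @_;
    my $proto = getprotobyname('tcp');
    $proto = IPPROTO_TCP unless defined $proto;   # assumption: fallback
    socket(my $server, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
    bind($server, sockaddr_in($port, INADDR_ANY))    or die "bind: $!";
    listen($server, 1)                               or die "listen: $!";
    return $server;
}

# Accept one connection, send it the output of @cmd (default: ps), hang up.
sub psd_serve_one {
    my ($server, @cmd) = @_;
    @cmd = ('ps') unless @cmd;
    accept(my $client, $server) or die "accept: $!";
    open(my $ps, '-|', @cmd) or die "can't run $cmd[0]: $!";
    print {$client} $_ while <$ps>;
    close($ps);
    close($client);
}

# Typical use (blocks forever, so it is left commented out here):
#   my $server = psd_listen(2001);
#   print "psd listening to port 2001\n";
#   psd_serve_one($server) while 1;
```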
You can test this version of psd by running it in the background from the command line, then telnet-ing to the assigned port:

    % psd &
    [1] 29321
    psd listening to port 2001
    % telnet localhost 2001
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    PID TT S TIME COMMAND
    10582 pts/7 S 0:01 -tcsh
    . . . blah blah blah . . .
    Connection closed by foreign host.
    %

There are a couple of problems with this code that we ought to fix. If you start this version of psd, connect to it at least once, then kill it (with Control-C), and then try to restart it immediately, it may die with an error message along the lines of "address already in use." If you wait a while, though, it will run fine. What is happening is that one or more closed connections in the TIME_WAIT state (a perfectly normal condition) are preventing the call to bind from succeeding, because bind will not by default allow more than one socket to use the same name (address and port number) at the same time. After a few minutes, the closed connections time out completely and the name becomes available for reuse again.

Another problem is that this server will accept only a single connection at a time. The easy and customary way to have a server accept multiple simultaneous connections is to spawn a new child process to handle each incoming connection.

We will get back to the server shortly. Let's take a look at our client, rps:

rps: A remote ps client
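A client of this kind might be sketched as below. The function name rps_fetch is hypothetical and the IPPROTO_TCP fallback is an added assumption. The sketch simply connects, reads until the server closes the connection, and returns what it received.

```perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM IPPROTO_TCP inet_aton sockaddr_in);

# Connect to psd on $host:$port and return everything it sends back.
sub rps_fetch {
    my ($host, $port) = @_;
    my $ip = inet_aton($host) or die "unknown host: $host\n";
    my $proto = getprotobyname('tcp');
    $proto = IPPROTO_TCP unless defined $proto;   # assumption: fallback
    socket(my $psd, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
    connect($psd, sockaddr_in($port, $ip))        or die "connect: $!";
    my @lines = <$psd>;     # read until the server closes the connection
    close($psd);
    return join '', @lines;
}

# Typical use, assuming psd is listening on port 2001:
#   my $remote_host = shift || 'localhost';
#   print rps_fetch($remote_host, 2001);
```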
You can now use rps instead of telnet to talk to psd (kind of a mouthful of Unix, isn't it?):

    % psd &
    [2] 29678
    psd1 listening to port 2001
    % rps localhost
    PID TT S TIME COMMAND
    10582 pts/7 S 0:01 -tcsh
    . . . blah blah blah . . .
    %

Extra features are nice, up to a point, so let's add one. Let's allow the user to specify a ps option as an argument on the command line, which rps will pass on to psd. First, at the top of the file, add:

    use FileHandle;

Then, before my $remote_host, add:

    my $option = (@ARGV && $ARGV[0] =~ /^-/) ? shift : '';

Next, we have to send the option to the server. This will require changes to both rps and psd. The change to rps is simple. Before print while <PSD>, insert the following:

    PSD->autoflush(1);    # prettier than select(PSD) and $| = 1
    print PSD "$option\n";

We want to make sure that the option string we are sending actually gets sent rather than sitting in an output buffer; otherwise the server will hang waiting for it. Here's psd, rewritten to incorporate an option string sent from rps and to support multiple connections:

psd: A revised ps daemon
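The revised server could be sketched roughly as follows. This is an illustration, not the original listing: the function names are hypothetical, the rule in clean_option (a dash followed by letters only) is one assumed way of tidying the option, and the IPPROTO_TCP fallback is an added assumption.

```perl
use strict;
use warnings;
use Socket qw(PF_INET AF_INET SOCK_STREAM SOL_SOCKET SO_REUSEADDR
              INADDR_ANY IPPROTO_TCP inet_ntoa sockaddr_in);

$SIG{CHLD} = sub { wait };    # reap finished children so none become zombies

# Keep only a dash followed by letters; anything else becomes ''.
sub clean_option {
    my ($raw) = @_;
    return (defined($raw) && $raw =~ /^\s*(-[A-Za-z]+)\s*$/) ? $1 : '';
}

# Listening socket with address reuse enabled and a backlog of 5.
sub psd_listen {
    my ($port) = @_;
    my $proto = getprotobyname('tcp');
    $proto = IPPROTO_TCP unless defined $proto;   # assumption: fallback
    socket(my $server, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
    setsockopt($server, SOL_SOCKET, SO_REUSEADDR, pack('l', 1))
        or die "setsockopt: $!";
    bind($server, sockaddr_in($port, INADDR_ANY)) or die "bind: $!";
    listen($server, 5) or die "listen: $!";
    return $server;
}

# Accept connections forever, forking one child per client.
sub psd_serve {
    my ($server) = @_;
    while (1) {
        my $paddr = accept(my $client, $server);
        unless ($paddr) {
            next if $!{EINTR};    # accept was interrupted by SIGCHLD
            die "accept: $!";
        }
        my ($port, $ip) = sockaddr_in($paddr);
        my $name = gethostbyaddr($ip, AF_INET) || inet_ntoa($ip);
        print "connection from $name\n";

        defined(my $pid = fork()) or die "fork: $!";
        if ($pid == 0) {                      # child: talk to this client
            $SIG{CHLD} = 'DEFAULT';           # let close() collect ps itself
            my $option = clean_option(scalar <$client>);
            open(my $ps, '-|', 'ps', ($option ? $option : ()))
                or die "can't run ps: $!";
            print {$client} $_ while <$ps>;
            close($ps);
            exit 0;
        }
        close($client);                       # parent: back to accept
    }
}

# Typical use (blocks forever, so it is left commented out here):
#   psd_serve(psd_listen(2001));
```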
We've added a call to setsockopt that allows us to establish multiple connections on the same socket. This also will put an end to the TIME_WAIT behavior you may have observed before. The second parameter to listen has been increased to 5, which will allow us to have up to five connections queued up at once, that is, five connections that haven't yet been answered by accept. [4]
We now print out the hostname of the connecting machine and then fork a child process. The child process, which executes the code inside the first block of the if statement, reads the option sent by the client and tidies it up so that nothing bad will happen if someone sends an option like '; rm *'. There is also a SIGCHLD handler so that we don't create zombies.

Obviously, you can go a lot farther in network programming than this simple example does. Perl supports all of the Unix networking features accessible from C, including other TCP/IP features (e.g., UDP) and Unix domain networking. As I pointed out earlier, Perl is especially convenient for dealing with text-based protocols because of its string handling and pattern matching features.

In any event, as you embark on your next network programming project, remember to check the CPAN to see whether what you need has already been written. The code that you need may be there already, and if it is, it is likely to be reasonably well thought out and implemented.