8.3 Turning CGIs Into mod_perl Programs

One immediate benefit of mod_perl is that it caches the compiled form of CGI programs. They are compiled once and thereafter executed from memory, so they run substantially faster. We can use this benefit to execute our current CGI programs as they are written. The good news is that the CGI programs do not need to be modified; all it takes is an Apache configuration change.

There are two approaches to using mod_perl to run CGI programs: Apache::Registry and Apache::PerlRun.

8.3.1 `Apache::Registry`

The beauty of the Apache::Registry module is that it takes plain-Jane CGI programs, like those written in Chapter 7, and automatically makes them mod_perl programs by compiling each program only the first time it is called and caching the compiled code for subsequent executions. In other words, CGI programs become something like function calls, because the code is already compiled and ready to execute. The only thing that needs to be done to apply the Apache::Registry module to the CGI programs is to alter the Apache configuration.

To illustrate that existing CGI programs can be run under mod_perl, the first step, as usual, is to create a new directory to hold these CGI programs, /var/www/perl:

 #  mkdir /var/www/perl
 #  chmod a+rx /var/www/perl
 #  cd /var/www/perl

Copy the CGI programs written in Chapter 7 into the new directory:

 #  cp ../cgi-bin/* .

Configure Apache to run these programs using mod_perl by adding this to the httpd.conf file:

 Alias /perl/ /var/www/perl/
 <Location /perl>
   SetHandler            perl-script
   PerlHandler           Apache::Registry
   PerlSendHeader        On
   Options               +ExecCGI
 </Location>

Now if Apache sees a URI starting with /perl/, it will look for that file in /var/www/perl/. Then, with the <Location> directive, Apache knows that anything accessed under /perl should be handled by the perl-script handler; in other words, executed by the Perl interpreter built into Apache. Apache::Registry is set to be the module that handles requests for files in this part of the document tree. Because this module is applied to this directory, each program within it is compiled as the body of a Perl subroutine and then executed. Each Apache child process compiles the program the first time that child executes it and caches the result for later executions. If the file is changed on disk, each child process recompiles it the next time it executes the program and caches the new version.
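The compile-once, recompile-on-change behavior described above can be pictured with a small standalone Perl sketch. This is not the Apache::Registry source: the names run_cached, %cache, and $compile_count are hypothetical, and the sketch only assumes that a script's text can be wrapped in an anonymous subroutine and that the file's modification time signals a change.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical per-process cache: path => { mtime => ..., code => ... }
my %cache;
my $compile_count = 0;

# Run the script at $path, compiling it only when it is new or has changed.
sub run_cached {
    my ($path) = @_;
    my $mtime = (stat $path)[9];
    die "cannot stat $path: $!\n" unless defined $mtime;

    my $entry = $cache{$path};
    if (!$entry || $entry->{mtime} != $mtime) {
        open my $in, '<', $path or die "cannot open $path: $!\n";
        my $source = do { local $/; <$in> };
        close $in;

        # Wrap the script text in the body of an anonymous subroutine.
        my $code = eval "sub {\n$source\n}";
        die "compile error in $path: $@" unless $code;

        $compile_count++;
        $entry = $cache{$path} = { mtime => $mtime, code => $code };
    }
    return $entry->{code}->();
}

# Demonstration with a throwaway script file.
my ($fh, $script) = tempfile(SUFFIX => '.cgi', UNLINK => 1);
print {$fh} qq{return "hello, world!\\n";\n};
close $fh;

my $first  = run_cached($script);   # compiles, then runs
my $second = run_cached($script);   # runs the cached subroutine
print $first, $second;
print "compiled $compile_count time(s)\n";
```

Because %cache lives in one process, every process that runs this code keeps its own copy, which mirrors the way each Apache child compiles a program for itself.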

PerlSendHeader is set to On, which tells mod_perl to intercept anything that looks like a header and turn it into a validly formatted header line. The +ExecCGI option is used by Apache::Registry to make sure you really know what you are doing. It requires that all programs in the directory be executable, an added step you have to take to make sure this works.

Load the changed configuration file by restarting Apache:

 #  /etc/init.d/httpd graceful

Now it is time to try one of the CGI programs. Load one of these URLs into your browser: http://localhost/perl/hello.cgi or www.opensourcewebbook.com/perl/hello.cgi. You should see Figure 8.3.

Figure 8.3. hello, world! with mod_perl

Although Apache::Registry is magic when it comes to existing CGI programs, instantly converting them to mod_perl programs, existing scripts may need to be modified slightly if they are written "sloppily." This is because a "normal" CGI program has a short life span: from when it is requested to when it finishes executing.

This is in contrast to a mod_perl program, which lives as long as the Apache child process that runs it. For instance, using global variables in plain-Jane CGIs is no big deal, because once the program finishes executing, the globals go away. But for a mod_perl CGI program using Apache::Registry, the global variables live as long as the Apache child process lives, which can cause serious bugs (imagine a variable's value persisting from one execution of the CGI to the next when the program was not designed for it).
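The persistence problem can be reproduced outside Apache. In the following minimal sketch, a subroutine stands in for the cached script body and is called once per simulated request; the names sloppy_request, careful_request, and $visits are hypothetical and chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A package global, as a "sloppy" CGI script might use. It is assumed to
# start out undefined and is never reset by the script itself.
our $visits;

# Hypothetical stand-in for the cached script body: compiled once, then
# called once per request for as long as the child process lives.
sub sloppy_request {
    $visits++;                      # value left over from the last call
    return "visit number $visits\n";
}

# The same logic with the variable initialized on every call.
sub careful_request {
    my $visits = 0;                 # fresh lexical for each request
    $visits++;
    return "visit number $visits\n";
}

# Simulate three requests served by the same child process.
my @sloppy  = map { sloppy_request()  } 1 .. 3;
my @careful = map { careful_request() } 1 .. 3;
print "sloppy:  $_" for @sloppy;    # counts 1, 2, 3
print "careful: $_" for @careful;   # always 1
```

Run as a plain CGI, the sloppy version would report 1 every time, because each request starts a new process. Declaring variables with my and initializing them explicitly gives the same result under both models.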

So, the rule of thumb is this: if you use Apache::Registry, your script may need to be modified slightly, but it will run much quicker! Or if you are lazy and don't want to modify the scripts that you have (yes, we understand), see Section 8.3.3, Apache::PerlRun, later in the chapter.

8.3.2 A Speed Example or Two

You may notice that the program runs much faster than it does as a normal CGI program. Here is a little Perl script, called hitit.pl, that makes 1,000 requests to the plain CGI version and 1,000 to the mod_perl version and times how long each batch takes. It makes requests to the localhost server, so there is minimal network lag.

 #!/usr/bin/perl -w
 # hitit.pl

 use LWP::Simple;

 # the URLs to request
 $CGIURL     = 'http://localhost/cgi-bin/hello.cgi';
 $MODPERLURL = 'http://localhost/perl/hello.cgi';

 # initialize to empty string
 $content    = '';

 # the time() function returns the number of seconds
 # since January 1, 1970
 $start_time = time();
 print "Start CGI: ", $start_time, "\n";

 # loop 1000 times
 foreach (1..1000) {
     unless (defined ($content = get($CGIURL))) {
         die "could not get $CGIURL\n";
     }
 }

 $end_time = time();
 print "End CGI: ", $end_time, "\n";
 print "CGI difference: ", $end_time - $start_time, "\n";

 $start_time = time();
 print "Start mod_perl: ", $start_time, "\n";

 foreach (1..1000) {
     unless (defined ($content = get($MODPERLURL))) {
         die "could not get $MODPERLURL\n";
     }
 }

 $end_time = time();
 print "End mod_perl: ", $end_time, "\n";
 print "mod_perl difference: ", $end_time - $start_time, "\n";

This program uses the LWP::Simple module, which provides simple functions for using Perl to interface to the World Wide Web. One of those functions, get(), makes an HTTP request to get the web page that is its argument. Two variables are defined: $CGIURL, the URL for the CGI version of hello.cgi, and $MODPERLURL, the URL for the mod_perl version.

Then, for the CGI version, the time() function is called. This function returns the number of seconds since the Epoch: January 1, 1970. ^[2] After the time is printed, the program loops 1,000 times, calling the CGI program using the get() function from LWP::Simple. The result of the get(), which is the output of the CGI program, is stored in $content. We don't care about the output in this program, so we simply ignore it. Then the end time is computed and printed, and the difference between the start time and end time is reported. This action is repeated for the mod_perl program.

^[2] The beginning of time for Unix, or for us geeks, the beginning of life as we now understand it.

Here is the output of this program on one of our machines (a dual 450MHz server with 512MB of memory):

 $  ./hitit.pl
 Start CGI: 1017758223
 End CGI: 1017758259
 CGI difference: 36
 Start mod_perl: 1017758259
 End mod_perl: 1017758265
 mod_perl difference: 6

Thirty-six seconds for CGI, six seconds for mod_perl. Not bad. Of course, YMMV. ^[3]

^[3] Your Mileage May Vary.
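Because the built-in time() counts whole seconds, a short run can round to 0 or 1. The same start/stop pattern can be given sub-second resolution, assuming the core Time::HiRes module is available. In this minimal sketch the helper name time_loop and the stand-in workload are hypothetical; in hitit.pl the code reference would call get($CGIURL) instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);   # replaces time() with a floating-point version

# Call $code $count times and return the elapsed wallclock seconds.
sub time_loop {
    my ($count, $code) = @_;
    my $start = time();
    $code->() for 1 .. $count;
    return time() - $start;
}

# Stand-in workload so the sketch runs without a web server.
my $elapsed = time_loop(1000, sub { my $s = join '', 1 .. 100 });
printf "1000 iterations took %.4f seconds\n", $elapsed;
```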

Since TMTOWTDI, here is another Perl program that times the execution of 1,000 get() calls for both the CGI and mod_perl URLs. It is called hitit2.pl:

 #!/usr/bin/perl
 # hitit2.pl

 use LWP::Simple;
 use Benchmark;

 # the URLs to request
 $CGIURL     = 'http://localhost/cgi-bin/hello.cgi';
 $MODPERLURL = 'http://localhost/perl/hello.cgi';

 # time 1000 executions of each piece of code
 timethese(1000, {
         CGI     => q{ $content = get($CGIURL); },
         MODPERL => q{ $content = get($MODPERLURL); },
     });

This program uses the Benchmark module, included in the Perl distribution, which provides a function named timethese() that times pieces of code. Here we call timethese() with two arguments: the first is the number of times (1000) that each piece of code will be executed; the second is an anonymous hash whose values are the code to be timed. ^[4] The timethese() function then prints how long it took to execute each piece of code. The Benchmark module simplifies the previous example greatly.

^[4] Anonymous hashes are beyond the scope of this book. Just type the curly braces where you see them, and you will create this interesting and powerful data type. For more information, see the Camel Book [Wall+ 00].

Executing hitit2.pl produces this output:

 $  ./hitit2.pl
 Benchmark: timing 1000 iterations of CGI, MODPERL...
        CGI: 35 wallclock secs (2.70 usr + 1.17 sys = 3.87 CPU) @ 258.40/s (n=1000)
    MODPERL:  6 wallclock secs (2.25 usr + 1.00 sys = 3.25 CPU) @ 307.69/s (n=1000)
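One thing to keep in mind when reading this output: the rate after the @ sign is the iteration count divided by the CPU time used by the benchmarking process itself, not by the wallclock time. That is why the two rates look similar even though the wallclock times differ widely, and why the wallclock column is the one to compare here. The following sketch redoes the arithmetic from the figures above; the hash layout and variable names are hypothetical, for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $n = 1000;   # iterations per test

# Figures copied from the hitit2.pl output above.
my %run = (
    CGI     => { wall => 35, cpu => 3.87 },
    MODPERL => { wall => 6,  cpu => 3.25 },
);

my %rate;
for my $name (sort keys %run) {
    $rate{$name} = {
        cpu  => $n / $run{$name}{cpu},    # what Benchmark prints after "@"
        wall => $n / $run{$name}{wall},   # requests per second as the client saw them
    };
    printf "%-8s %7.2f/s by CPU time, %7.2f/s by wallclock\n",
        "$name:", $rate{$name}{cpu}, $rate{$name}{wall};
}
```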

Remember the programs from Chapter 7 that displayed the client/server information? That information is available to mod_perl too. Try one of these URLs: http://localhost/perl/info4.cgi or www.opensourcewebbook.com/perl/info4.cgi.

You may be curious about the improvement for this program, because it is much more involved than the earlier hello.cgi example. In hitit.pl and hitit2.pl, we changed the two URL variables to the following:

 $CGIURL     = 'http://localhost/cgi-bin/info4.cgi';
 $MODPERLURL = 'http://localhost/perl/info4.cgi';

We then reran both programs on the same computer and got this result:

 $  ./hitit.pl
 Start CGI: 1017758565
 End CGI: 1017758764
 CGI difference: 199
 Start mod_perl: 1017758764
 End mod_perl: 1017758773
 mod_perl difference: 9
 $  ./hitit2.pl
 Benchmark: timing 1000 iterations of CGI, MODPERL...
        CGI: 199 wallclock secs (2.68 usr + 1.23 sys = 3.91 CPU) @ 255.75/s (n=1000)
    MODPERL:   8 wallclock secs (2.39 usr + 1.07 sys = 3.46 CPU) @ 289.02/s (n=1000)

Either way you look at it, that's a huge improvement.

Posted data can be handled with mod_perl just as it was with CGI. To see how, create a directory:

 $  cd /var/www/html
 $  mkdir mod_perl
 $  chmod a+rx mod_perl
 $  cd mod_perl

Insert the following HTML into /var/www/html/mod_perl/nameage.html:

 <html>
 <head>
 <title>Enter Your Name and Age</title>
 </head>
 <body bgcolor="#ffffff">
 <form action="/perl/nameage.cgi" method="post">
 Name: <input type="text" name="yourname">
 <br>
 Age: <input type="text" name="age">
 <br>
 <input type="submit" value="Click to Submit">
 </form>
 </body>
 </html>

This HTML is similar to the example in Chapter 7, but the action= parameter to the <form> tag is different: it points to /perl/nameage.cgi instead of /cgi-bin/nameage.cgi.

Try this example with one of these two URLs: http://localhost/mod_perl/nameage.html or www.opensourcewebbook.com/mod_perl/nameage.html. If you submit the form, you can see that the Apache::Registry mod_perl program handles form data exactly like the CGI version.

Curious about the speed advantage of using mod_perl with CGI programs that handle posted data? We modified hitit.pl and hitit2.pl by changing the URL variables to the following (wrapped here so that it fits on the page):

 $CGIURL     = 'http://localhost/cgi-bin/nameage.cgi?'
             . 'yourname=Ron+Ballard&age=31';
 $MODPERLURL = 'http://localhost/perl/nameage.cgi?'
             . 'yourname=Ron+Ballard&age=31';

Executing the programs produces this:

 $  ./hitit.pl
 Start CGI: 1017759184
 End CGI: 1017759377
 CGI difference: 193
 Start mod_perl: 1017759377
 End mod_perl: 1017759384
 mod_perl difference: 7
 $  ./hitit2.pl
 Benchmark: timing 1000 iterations of CGI, MODPERL...
        CGI: 192 wallclock secs (2.77 usr + 1.23 sys = 4.00 CPU) @ 250.00/s (n=1000)
    MODPERL:   8 wallclock secs (2.43 usr + 1.14 sys = 3.57 CPU) @ 280.11/s (n=1000)

Once again, quite a nice improvement in speed. All that gain with such a simple change!
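To put a number on "quite a nice improvement," the hitit.pl timings reported in this section can be turned into speedup factors. The sketch below simply divides the CGI time by the mod_perl time for each test; the data layout and variable names are hypothetical, and the figures are the ones from the runs shown earlier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# [ program, CGI seconds, mod_perl seconds ] from the hitit.pl runs above
my @results = (
    [ 'hello.cgi',    36, 6 ],
    [ 'info4.cgi',   199, 9 ],
    [ 'nameage.cgi', 193, 7 ],
);

my %speedup;
for my $r (@results) {
    my ($name, $cgi, $modperl) = @$r;
    $speedup{$name} = $cgi / $modperl;
    printf "%-12s %3d s -> %d s  (%.1fx faster)\n",
        $name, $cgi, $modperl, $speedup{$name};
}
```

On these figures the factors come out to roughly 6, 22, and 28, so the more work a program does at startup, the more it gains from being compiled only once.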

8.3.3 `Apache::PerlRun`

Apache::Registry speeds things up because programs are compiled only once, the first time they are executed, but there are other issues besides speed to worry about. As we mentioned, if the program is badly (or sloppily) written, you can still get into trouble.

One typical problem seen in ill-considered mod_perl CGI programs is the use of global variables. After the CGI program is compiled the first time, the program remains in memory inside Apache, and any global variables keep their values each time that instance of httpd runs the program. Usually, this is not such a good idea, because a value left over from one execution can leak into the next.

You might have some legacy CGI programs that are written somewhat sloppily (although we would never expect you to admit it), but you still want to use them and have them run as fast as possible. The solution is Apache::PerlRun. It compiles the script each time it is run, and all variables exist only for the life of the script.

You may think, "Hey, if the program is compiled each time, then how can using Apache::PerlRun be faster than plain-Jane CGI?" The answer to this good question is that because Perl is already within Apache, loading Perl is not necessary. Also, the CGI program can use preloaded modules that we add to script.pl without loading them for each execution of the script.

To apply Apache::PerlRun to a directory of CGI programs, create a directory and add something similar to this to /etc/httpd/conf/httpd.conf:

 Alias /sloppyperl/ /var/www/sloppyperl/
 <Location /sloppyperl>
   SetHandler            perl-script
   PerlHandler           Apache::PerlRun
   PerlSendHeader        On
   Options               +ExecCGI
 </Location>

Now, all CGI scripts under the URI /sloppyperl/ will be executed with Apache::PerlRun, so any ill effects from sloppy programming will be minimized. Better yet, rewrite the sloppy scripts!
