
CGI Programming with Perl

1.2. Introduction to CGI

CGI can do so much because it is so simple. CGI is a very lightweight interface; it is essentially the minimum that the web server needs to provide in order to allow external processes to create web pages. Typically, when a web server gets a request for a static web page, the web server finds the corresponding HTML file on its filesystem. When a web server gets a request for a CGI script, the web server executes the CGI script as another process (i.e., a separate application); the server passes this process some parameters and collects its output, which it then returns to the client just as if it had been fetched from a static file (see Figure 1-1).


Figure 1-1. How a CGI application is executed

So how does the whole interface work? We'll spend the remainder of the book answering this question in more detail, but let's take a basic look now.

Web browsers request dynamic resources such as CGI scripts the same way they request any other resource on the Web: they send a message formatted according to the Hypertext Transfer Protocol, or HTTP. We'll discuss HTTP in Chapter 2, "The Hypertext Transport Protocol". An HTTP request includes a Uniform Resource Locator, or URL, and by looking at the URL, the web server determines which resource to return. Typically, CGI scripts share a common directory, like /cgi, or a filename extension, like .cgi. If the web server recognizes that the request is for a CGI script, it executes the script.

Say you wanted to visit the URL http://www.mikesmechanics.com/cgi/welcome.cgi. Example 1-1 shows the most basic HTTP request your web browser might send.

Example 1-1. Sample HTTP Request

GET /cgi/welcome.cgi HTTP/1.1
Host: www.mikesmechanics.com

This GET request identifies the resource to retrieve as /cgi/welcome.cgi. Assuming our server recognizes all files in the /cgi directory tree as CGI scripts, it understands that it should execute the welcome.cgi script instead of returning its contents directly to the browser.
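The server's decision can be pictured with a short sketch. The helper name looks_like_cgi and its two rules are assumptions made for illustration; a real web server makes this decision from its configuration, not from code like this.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a URL path should be treated as a
# CGI script, using the two conventions mentioned in the text.
sub looks_like_cgi {
    my ($path) = @_;
    return 1 if $path =~ m{^/cgi/};    # lives under the /cgi directory tree
    return 1 if $path =~ m{\.cgi$};    # ends with the .cgi extension
    return 0;
}

for my $path ('/cgi/welcome.cgi', '/index.html') {
    printf "%-20s %s\n", $path,
        looks_like_cgi($path) ? 'execute' : 'return file contents';
}
```

With these rules, /cgi/welcome.cgi would be executed, while /index.html would be returned as a static file.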

CGI programs get their input from standard input (STDIN) and environment variables. These variables contain information such as the identity of the remote host and user, the value of form elements submitted (if any), etc. They also store the server name, the communication protocol, and the name of the software running the server. We'll look at each one of these in more detail in Chapter 3, "The Common Gateway Interface".
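As a rough sketch of these two input channels, the following reads a handful of standard CGI environment variables and then the request body from STDIN. The sub names request_info and request_body are our own, chosen for illustration; the variable names are the standard CGI ones.

```perl
use strict;
use warnings;

# Collect a few of the standard CGI environment variables into a hash.
# Outside a web server most of these are unset, so default to ''.
sub request_info {
    my %info;
    for my $name (qw(REMOTE_HOST REMOTE_ADDR SERVER_NAME SERVER_PROTOCOL
                     SERVER_SOFTWARE REQUEST_METHOD QUERY_STRING)) {
        $info{$name} = defined $ENV{$name} ? $ENV{$name} : '';
    }
    return %info;
}

# Read the request body from STDIN, but only as many bytes as the
# server announced in CONTENT_LENGTH (nothing for a plain GET).
sub request_body {
    my $length = $ENV{CONTENT_LENGTH} || 0;
    my $body   = '';
    read(STDIN, $body, $length) if $length > 0;
    return $body;
}

my %info = request_info();
print "$_ = $info{$_}\n" for sort keys %info;
```

Run from the command line rather than through a web server, this prints mostly empty values, which is a quick reminder that the web server is what populates the environment.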

Once the CGI program starts running, it sends its output back to the web server via standard output (STDOUT). In Perl, this is easy to do because by default, anything you print goes to STDOUT. CGI scripts can either return their own output as a new document or provide a new URL to forward the request elsewhere. To tell the web server which of these it is doing, a CGI script prints a special line formatted like an HTTP header. We'll look at these headers in the next chapter, but here is a sample of what a CGI script returning HTML would output:

Content-type: text/html

CGI scripts actually can return extra header lines if they choose, so to indicate that it has finished sending headers, a CGI script prints a blank line. Finally, if it is outputting a document, it prints the contents of that document, too.
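The two kinds of output can be sketched as follows. The sub names are hypothetical, and the Location header used for forwarding is the standard CGI header for that purpose; treat this as an illustration of the layout (headers, blank line, optional body) rather than a complete script.

```perl
use strict;
use warnings;

# Output for a script that returns its own document:
# a Content-type header, a blank line, then the body.
sub document_response {
    my ($body) = @_;
    return "Content-type: text/html\n\n" . $body;
}

# Output for a script that forwards the request elsewhere:
# a Location header followed by the blank line that ends the headers.
sub redirect_response {
    my ($url) = @_;
    return "Location: $url\n\n";
}

print document_response("<HTML><BODY><P>Hello</P></BODY></HTML>\n");
```

A script would print one or the other, not both; redirect_response("/cgi/list.cgi") produces the forwarding variant.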

The web server takes the output of the CGI script and adds its own HTTP headers before sending it back to the browser of the user who requested it. Example 1-2 shows a sample response that a web browser would receive from the web server.

Example 1-2. Sample HTTP Response

HTTP/1.1 200 OK
Date: Sat, 18 Mar 2000 20:35:35 GMT
Server: Apache/1.3.9 (Unix)
Last-Modified: Wed, 20 May 1998 14:59:42 GMT
ETag: "74916-656-3562efde"
Content-Length: 2000
Content-Type: text/html

<HTML>
<HEAD>
  <TITLE>Welcome to Mike's Mechanics Database</TITLE>
</HEAD>
<BODY BGCOLOR="#ffffff">
  <IMG src="/books/1/64/1/html/2//images/mike.jpg" ALT="Mike's Mechanics">
  <P>Welcome from dyn34.my-isp.net! What will you find here? You'll
    find a list of mechanics from around the country and the type of
    service to expect -- based on user input and suggestions.</P>
  <P>What are you waiting for? Click <A HREF="/cgi/list.cgi">here</A>
    to continue.</P>
  <HR>
  <P>The current time on this server is: Sat Mar 18 10:28:00 2000.</P>
  <P>If you find any problems with this site or have any suggestions,
    please email <A HREF="mailto:webmaster@mikesmechanics.com">
    webmaster@mikesmechanics.com</A>.</P>
</BODY>
</HTML>

The header contains the communication protocol, the date and time of the response, the server name and version, the last time the document was modified, an entity tag used for caching, the length of the response, and the media type of the document -- in this case, a text document formatted with HTML. Headers like these are returned with all responses from web servers, and we'll look at HTTP headers in more detail in the next chapter. However, note that nothing here indicates to the browser whether this response came from the contents of a static HTML file or whether it was generated dynamically by a CGI script. This is as it should be; the browser asked the web server for a resource, and it received a resource. It doesn't care where the document came from or how the web server generated it.
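To make the layout of a response concrete, here is a minimal sketch that splits a raw response into its status line, headers, and body at the first blank line. The sub name split_response and the shortened sample response are assumptions for illustration; this is not how a browser is actually implemented.

```perl
use strict;
use warnings;

# Split a raw HTTP response into status line, headers, and body.
# The headers end at the first blank line.
sub split_response {
    my ($raw) = @_;
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    my ($status, @lines) = split /\r?\n/, $head;
    my %headers;
    for my $line (@lines) {
        # Limit of 2 keeps colons inside values (e.g. dates) intact.
        my ($name, $value) = split /:\s*/, $line, 2;
        $headers{ lc $name } = $value;
    }
    return ($status, \%headers, $body);
}

# A shortened, made-up response in the same shape as the example above.
my $raw = join "\r\n",
    'HTTP/1.1 200 OK',
    'Date: Sat, 18 Mar 2000 20:35:35 GMT',
    'Content-Length: 12',
    'Content-Type: text/html',
    '',
    '<P>Hello</P>';

my ($status, $headers, $body) = split_response($raw);
print "status: $status\n";
print "type:   $headers->{'content-type'}\n";
print "body:   $body\n";
```

Nothing in the parsed result reveals whether the body came from a file or from a script, which is the point made above.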

CGI allows you to generate output that doesn't look any different to the end user than other responses on the Web. This flexibility allows you to generate anything with a CGI script that the web server could get from a file, including HTML documents, plain text documents, PDF files, or even images like PNGs or GIFs. We'll look at how to create dynamic images in Chapter 13, "Creating Graphics on the Fly".

1.2.1. Sample CGI

Let's look at a sample CGI application, written in Perl, that creates the dynamic output we just saw in Example 1-2. This program, shown in Example 1-3, determines where the user is connecting from and then creates a simple HTML document containing this information, along with the current time. In the next several chapters, we'll see how to use various CGI modules to make creating such an application even easier; for now, however, we will keep it straightforward.

Example 1-3. welcome.cgi

#!/usr/bin/perl -wT

use strict;

my $time        = localtime;
my $remote_id   = $ENV{REMOTE_HOST} || $ENV{REMOTE_ADDR};
my $admin_email = $ENV{SERVER_ADMIN};

print "Content-type: text/html\n\n";

print <<END_OF_PAGE;
<HTML>
<HEAD>
  <TITLE>Welcome to Mike's Mechanics Database</TITLE>
</HEAD>
<BODY BGCOLOR="#ffffff">
  <IMG src="/books/1/64/1/html/2//images/mike.jpg" ALT="Mike's Mechanics">
  <P>Welcome from $remote_id! What will you find here? You'll
    find a list of mechanics from around the country and the type of
    service to expect -- based on user input and suggestions.</P>
  <P>What are you waiting for? Click <A HREF="/cgi/list.cgi">here</A>
    to continue.</P>
  <HR>
  <P>The current time on this server is: $time.</P>
  <P>If you find any problems with this site or have any suggestions,
    please email <A HREF="mailto:$admin_email">$admin_email</A>.</P>
</BODY>
</HTML>
END_OF_PAGE

This program is quite simple. It contains only six commands, although the last one is many lines long. Let's take a look at how it works. Because this script is our first and is short, we'll look at it line by line; but as mentioned in the Preface, this book does assume that you are already familiar with Perl. So if you do not know Perl well or if your Perl is a little rusty, you may want to have a Perl reference available to consult as you read this book. We recommend Programming Perl, Third Edition, by Larry Wall, Tom Christiansen, and Jon Orwant (O'Reilly & Associates, Inc.); not only is it the standard Perl tome, but it also has a convenient alphabetical description of Perl's built-in functions.

The first line of the program looks like the top of most Perl scripts. It tells the server to use the program at /usr/bin/perl to interpret and execute this script. You may not recognize the flags, however: the -wT flags tell Perl to turn on warnings and taint checking. Warnings help locate subtle problems that may not generate syntax errors; enabling this is optional, but it is a very helpful feature. Taint checking should not be considered optional: unless you like living dangerously, you should enable this feature with all of your CGI scripts. We will discuss taint checking more in Chapter 8, "Security".

The command use strict tells Perl to enable strict rules for variables, subroutines, and references. If you haven't used this command before, you should get into the habit of using it with your CGI scripts. Like warnings, it helps locate subtle mistakes, such as typos, that might not otherwise generate a syntax error. Furthermore, the strict pragma encourages good programming practices by forcing you to declare variables and reduce the number of global variables. This produces code that is more maintainable. Finally, as we will see in Chapter 17, "Efficiency and Optimization", the strict pragma is essentially required by FastCGI and mod_perl. If you think you might migrate to either of these technologies in the future, you should begin using strict now.
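A small experiment shows the kind of typo that strict catches. The sketch below compiles the same snippet twice with eval, once without strict and once with it; the helper name compile_error and the misspelled variable $tmie are made up for the demonstration.

```perl
use strict;
use warnings;

# Compile a snippet of Perl and return the compile-time error, if any,
# instead of letting it terminate the program.
sub compile_error {
    my ($code) = @_;
    eval $code;
    return $@;
}

# $tmie is a deliberate misspelling of $time.
my $snippet = 'my $time = localtime; my $copy = $tmie; 1;';

my $without = compile_error("no strict; no warnings; $snippet");
my $with    = compile_error("use strict; $snippet");

print "without strict: ", ($without ? "error" : "compiles quietly"), "\n";
print "with strict:    ", ($with    ? "error" : "compiles quietly"), "\n";
```

Without strict, the misspelled name silently becomes a new, empty global variable; with strict, Perl refuses to compile the snippet and reports the undeclared name.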

Now we start the real work. First, we set three variables. The first variable, $time, is set to a string representing the current date and time. The second variable, $remote_id, is set to the identity of the remote machine requesting this page, and we get this information from the environment variables REMOTE_HOST or REMOTE_ADDR. As we mentioned earlier, CGI scripts get all of their information from the web server from environment variables and STDIN. REMOTE_HOST contains the full domain name of the remote machine, but only if reverse domain name lookups have been enabled for the web server -- otherwise, it is blank. In this case, we use REMOTE_ADDR instead, which contains the IP address of the remote machine. The final variable, $admin_email, is set to SERVER_ADMIN, which contains the email address of the server's administrator according to the server's configuration files. These are just a few environment variables available to CGI scripts. We'll review these three in more detail along with the rest in Chapter 3, "The Common Gateway Interface".
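The fallback from REMOTE_HOST to REMOTE_ADDR relies on Perl's || operator, which returns its right-hand side when the left-hand side is empty or undefined. As a sketch, here is the same expression wrapped in a sub (the name remote_id is ours) that takes the environment as a hash reference, so it can be tried with sample data; the IP address is a made-up value from a range reserved for documentation.

```perl
use strict;
use warnings;

# Same expression as in welcome.cgi, but reading from a hash reference
# instead of %ENV so that sample values can be passed in.
sub remote_id {
    my ($env) = @_;
    return $env->{REMOTE_HOST} || $env->{REMOTE_ADDR};
}

# Reverse lookups enabled: the host name is used.
print remote_id({ REMOTE_HOST => 'dyn34.my-isp.net',
                  REMOTE_ADDR => '192.0.2.34' }), "\n";

# Reverse lookups disabled: REMOTE_HOST is blank, so the address is used.
print remote_id({ REMOTE_HOST => '',
                  REMOTE_ADDR => '192.0.2.34' }), "\n";
```

The first call prints dyn34.my-isp.net and the second prints 192.0.2.34.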

As we saw earlier, if a CGI script wants to return a new document, it must first output an HTTP header declaring the type of document it is returning. It does this and prints an additional blank line to indicate that it has finished sending headers. It then prints the body of the document.

Instead of using a print statement to send each line to standard output separately, we use a "here" document, which allows us to print a block of text at once. This is a standard Perl feature that's admittedly a little esoteric; you may not be familiar with this if you have not done other forms of shell programming. This command tells Perl to print all of the following lines until it encounters the END_OF_PAGE token on its own line. It treats the text as if it were enclosed in double quotes, so the variables are evaluated, but double quotes do not need to be escaped. Not only do "here" documents save us from a lot of extra typing, but they also make the program easier to read. However, there are even better ways of outputting HTML, as we'll see in Chapter 5, "CGI.pm", and Chapter 6, "HTML Templates".
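If "here" documents are new to you, this stripped-down sketch isolates the two properties just described: variables are interpolated, and double quotes inside the text need no escaping. The variable name and markup are made up for the example.

```perl
use strict;
use warnings;

my $name = 'Mike';

# Everything up to the END_OF_TEXT line becomes one string. $name is
# interpolated, and the quotes around "greeting" need no backslashes.
my $html = <<END_OF_TEXT;
<P CLASS="greeting">Hello, $name!</P>
END_OF_TEXT

print $html;
```

This prints the paragraph with Mike substituted for $name. Note that the closing token must appear on a line by itself, with no leading whitespace.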

That's all there is to our script, so at this point it exits; the web server adds additional HTTP headers and returns the response to the client as we saw in Example 1-2. This was just a simple example of a CGI script, so don't worry if you have questions or are unsure about a particular detail. As our numerous references to later chapters indicate, we'll spend the rest of the book filling in the details.

1.2.2. Invoking CGI Scripts

CGI scripts have their own URLs, just like HTML documents and other resources on the Web. The server is typically configured to map a particular virtual directory (a directory contained within a URL) to CGI scripts, such as /cgi-bin, /cgi, /scripts, etc. Generally, both the location for CGI scripts on the server's filesystem and the corresponding URL path can be overridden in the server's configuration. We will see how to do this for the Apache web server a little later in Section 1.4.1, "Configuring CGI Scripts".

On Unix, the filesystem differentiates between files that are executable and those that are not. CGI scripts must be executable. Assuming you have a Perl file that you have named my_script.cgi, you would issue the following command from the shell to make the file executable:

chmod 0755 my_script.cgi

Forgetting this step is a common problem. On other operating systems, you may need to change other settings to allow scripts to run. Refer to the documentation for your web server.
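On Unix, the same permission change can also be made and checked from Perl, which has a built-in chmod that takes the same octal mode. The sketch below works on a temporary file standing in for my_script.cgi, so it is safe to run anywhere; the use of File::Temp is our choice for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway file standing in for my_script.cgi.
my ($fh, $script) = tempfile(SUFFIX => '.cgi', UNLINK => 1);
print $fh "#!/usr/bin/perl\n";
close $fh;

# Perl's built-in chmod takes the same octal mode as the shell command.
chmod 0755, $script or die "chmod failed: $!";

# Extract the permission bits from stat and show them in octal.
my $mode = (stat($script))[2] & 07777;
printf "%s has mode %04o\n", $script, $mode;
print "executable by us: ", (-x $script ? "yes" : "no"), "\n";
```

The -x file test is a quick way to confirm that a script is executable before suspecting the web server's configuration.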

