Creating Virtual Documents with CGI | Lan Tutorial With Glossary of Terms: A Complete Introduction to Local Area Networks (Lan Networking Library)

Creating 'Virtual Documents' with CGI

Using the common gateway interface to send 'dynamic documents' to users' Web browsers.

This column is the fifth in a series of tutorials covering Internet and World Wide Web services. I began the series with "Providing Internet and World Wide Web Services," which provided an overview of Internet services. With "HTML and CGI," I began discussing Web servers. This lesson concerns the use of the common gateway interface (CGI), the key technology enabling Web servers to respond to user input.

On the client side, an important feature related to CGI is the HTML forms capability. In the tutorials "HTML and CGI, Part Two" and "CGI and Web Servers," I delved more deeply into the nuts and bolts of how HTML forms pass data to the gateway, discussing the get and post methods of submitting the request (as well as the form data) to the Web server.

In "CGI and Web Servers," I also briefly discussed how CGI programs send responses back to the user. In short, the CGI program uses print statements to send its response to standard output (STDOUT); the gateway then directs the Web server to deliver the output (usually an HTML page) to the user's Web browser. I gave an example of a simple response packet, which contained a basic HTML page titled "Simple CGI Test" with the body text "This is a test."

The example in "CGI and Web Servers" showed you how a response packet might look. Now, let's discuss how you might generate such a response in HTML format. One way to do so is to use the Perl script shown in Listing 1.

The first line of Listing 1 tells the system to run the script with the Perl interpreter, which in this example is located in the /usr/local/bin directory (the interpreter might be in a different location on your server). The second line is a print statement that generates a header line. The server can generate most of the header lines for you, but you must still create the content-type header yourself. The "\n" is an escape sequence that produces a newline. Some operating systems, such as MS-DOS and Windows, use a carriage-return/line-feed combination to mark the end of a line, while others, such as Unix, use a single newline (line-feed) character. How you handle line endings will depend on which operating system your Web server uses.

Note that the print statement for the content-type header has two newline characters at the end of it. The first ends the header line; the second creates a blank line immediately following the header. A blank line must appear between the last header line and the beginning of the document: it is the delimiter that enables the Web server and the browser to tell where the header ends and the document begins. (If you're trying out a CGI program for the first time and it isn't working, a missing blank line is the first thing you should look for.)

The remaining lines, except for the last, are print statements that output the HTML document itself (the opening tags, the title, the body text, and the closing tags) for the Web browser to interpret. The very last line exits the program with a status of 0.
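To make the blank-line rule concrete, here is a minimal Perl sketch. The helper names cgi_response and split_response are hypothetical, not part of any library. The first builds a response the same way Listing 1 does; the second splits a response at the first blank line, roughly the way a server separates the headers from the document, and accepts either newline or carriage-return/line-feed endings.

```perl
use strict;
use warnings;

# Hypothetical helper: build a CGI response from a content type and a
# document. The header line ends with "\n", and the extra "\n" after it
# produces the mandatory blank line.
sub cgi_response {
    my ($content_type, $document) = @_;
    return "Content-type: $content_type\n"    # header line
         . "\n"                               # blank line: end of headers
         . $document;
}

# Hypothetical helper: split a response at the first blank line.
# Accepts "\n" or "\r\n" line endings. If there is no blank line,
# the document part comes back undef.
sub split_response {
    my ($response) = @_;
    my ($headers, $document) = split /\r?\n\r?\n/, $response, 2;
    return ($headers, $document);
}

print cgi_response("text/html", "<HTML><BODY>This is a test.</BODY></HTML>\n");
```

If you remove the lone "\n" line from cgi_response, split_response no longer finds a document at all, which mirrors the missing-blank-line mistake described above.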

Creating Dynamic Documents

Listing 1 illustrates output from a CGI program. The service CGI performs is to redirect the output of print statements (which would normally appear on the terminal) to the Web server. The server, in turn, forwards that output to the user's Web browser as a "virtual HTML document." In this example, only one HTTP header is generated (the content-type header); the Web server fills in the rest. Alternatively, your CGI program can generate all the necessary header lines, in which case the server delivers the output directly to the requesting Web browser without attempting further processing. This second approach is called a nonparsed-header (NPH) script. For most applications, you can let the Web server do the work of creating the headers.
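For comparison, here is a rough sketch of what a nonparsed-header script has to produce. Because the server does no processing on its output, the script must supply the HTTP status line itself and end each header line with a carriage return and line feed, as HTTP expects. The helper name nph_response, the HTTP/1.0 status line, and the Content-length header are illustrative choices. How your server recognizes an NPH script is server-specific (Apache, for example, treats scripts whose file names begin with nph- this way).

```perl
use strict;
use warnings;

# Hypothetical helper for a nonparsed-header (NPH) script. The server
# passes this output straight to the browser, so the script supplies
# the status line and uses HTTP's carriage-return/line-feed endings.
sub nph_response {
    my ($body) = @_;    # assumes plain ASCII, so characters == bytes
    my $crlf = "\r\n";
    return "HTTP/1.0 200 OK$crlf"
         . "Content-type: text/html$crlf"
         . "Content-length: " . length($body) . $crlf
         . $crlf                               # blank line ends the headers
         . $body;
}

binmode STDOUT;    # keep "\r\n" exactly as written on every platform
print nph_response("<HTML><BODY>This is a test.</BODY></HTML>\n");
```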

Listing 1

    #!/usr/local/bin/perl
    print "Content-type: text/html", "\n\n";
    print "<HTML>", "\n";
    print "<HEAD><TITLE>Simple CGI Test</TITLE></HEAD>", "\n";
    print "<BODY>", "\n";
    print "This is a test.", "\n";
    print "</BODY>", "\n";
    print "</HTML>", "\n";
    exit (0);

You've probably noticed that this example creates only "canned" HTML pages. What about taking the user's input, processing it, and producing a response based on the input? The Perl script in Listing 2 does this.

Listing 2

    #!/usr/local/bin/perl
    print "Content-type: text/html", "\n\n";
    print "<HTML>", "\n";
    print "<HEAD><TITLE>Regarding Your Request...</TITLE></HEAD>", "\n";
    $browser_type = $ENV{'HTTP_USER_AGENT'};
    $query_string = $ENV{'QUERY_STRING'};
    print "<BODY>", "\n";
    print "The browser you're using is ", $browser_type, "<BR>\n";
    print "This is the query string that was sent along with your request: ", $query_string, "\n";
    print "</BODY>", "\n";
    print "</HTML>", "\n";
    exit (0);

In this example, I'm assuming three things: that you have created a simple HTML form to submit a request, that the request is submitted via the get method, and that the form contains one or more information fields. (The information fields are consolidated into the query string, which is appended to the URL.) If you're unfamiliar with how HTML forms submit information to a CGI program, please review "HTML and CGI, Part One," "HTML and CGI, Part Two," and "CGI and Web Servers."

The first line of this script calls the Perl interpreter. The second line generates the content-type header, followed by the requisite blank line between the header and the rest of the document. The third and fourth lines create the beginning of the HTML code and the document title.

The fifth line samples the environment variable HTTP_USER_AGENT. When a Web browser sends a request to a Web server, the request typically includes a good deal of information in its headers. One of these items is the browser type (sent in the User-Agent header), which the server makes available to the CGI program in the HTTP_USER_AGENT environment variable. The Perl program shown in Listing 2 creates a variable called $browser_type and loads it with the value it finds in HTTP_USER_AGENT.

The sixth line operates in similar fashion, sampling the environment variable QUERY_STRING and loading the text string it finds into a local variable called $query_string. (Note that QUERY_STRING and $query_string are not the same variable: QUERY_STRING is an environment variable, while $query_string is a variable used only within our Perl script. What they share is the value that we're shuttling between them; otherwise, they are two separate entities.)

The seventh line prints the opening <BODY> tag, while the eighth creates a sentence informing the user which Web browser he or she used in submitting the request. The next line echoes the contents of the query string back to the user, and the remaining lines send the closing HTML tags and exit the program.
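The point about QUERY_STRING and $query_string being separate variables is easy to check. In this small sketch (query_words is a hypothetical name), the script copies the environment variable, edits its own copy, and leaves the environment variable untouched. The // operator, which needs Perl 5.10 or later, supplies an empty string when the variable isn't set, as happens when you run a CGI script from a command line to test it.

```perl
use strict;
use warnings;

# Copy QUERY_STRING into a script variable and turn its plus signs back
# into spaces. Only the copy changes; %ENV keeps the original value.
sub query_words {
    my $query_string = $ENV{'QUERY_STRING'} // '';    # '' if not set
    $query_string =~ tr/+/ /;
    return $query_string;
}

print "Script copy: ", query_words(), "\n";
print "Environment: ", ($ENV{'QUERY_STRING'} // '(not set)'), "\n";
```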

While this program is a good learning tool, it's not very useful in practice, because all the information we want is still packed together into a single environment variable (QUERY_STRING). Recall from the tutorial "HTML and CGI, Part Two" that when a user clicks the Submit button on an HTML form, the browser sends a query string made up of key/value pairs, each written as key=value and separated by ampersands (&). Furthermore, plus signs (+) are substituted for spaces between words, and any other characters that are not acceptable in a URL (or on the shell's command line) are replaced by a percent sign followed immediately by the character's hexadecimal value in the ASCII code. For example, a form with the fields name and city might produce the query string name=Jane+Doe&city=San+Jose%2C+CA, where %2C stands for a comma.
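To see the encoding from the browser's side, here is a simplified Perl sketch of how form fields become a query string. It assumes that letters, digits, and the characters - _ . * pass through unchanged, that spaces become plus signs, and that every other byte becomes a percent sign plus two hexadecimal digits; real browsers' exact rules have varied over time. The names form_encode and build_query_string are hypothetical, and the sketch assumes ASCII input.

```perl
use strict;
use warnings;

# Encode one field the way a form submission does (simplified):
# unsafe characters become %XX, then spaces become '+'.
sub form_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.* ])/sprintf("%%%02X", ord($1))/ge;
    $text =~ tr/ /+/;
    return $text;
}

# Join [name, value] pairs as name=value, separated by '&'.
sub build_query_string {
    my @pairs = @_;
    return join '&',
        map { form_encode($_->[0]) . '=' . form_encode($_->[1]) } @pairs;
}

print build_query_string(['name', 'Jane Doe'], ['city', 'San Jose, CA']), "\n";
```

Running it prints name=Jane+Doe&city=San+Jose%2C+CA. A literal plus sign typed into a field is itself encoded (as %2B), which is why a plus sign in the query string can safely be read as a space.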

When writing a CGI program, you'll have to parse the query string: reversing the URL encoding, breaking the string up into individual variables, and loading the appropriate values into each one. Discussing the details of how to parse strings is beyond the scope of this tutorial; however, if you know your way around one or two programming languages (and I'm assuming you do if you're planning to write CGI programs), you should have a pretty good idea of how to do so. Incidentally, Perl's strength in string manipulation makes it a favorite language for CGI programming. You may also find a library of routines that can parse strings for you: search the Internet for companies selling programming libraries to see whether any libraries of CGI routines are available for your language of choice.
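Here is one way to do that parsing in Perl, as a minimal sketch rather than a complete library. It splits the query string at each ampersand, splits each pair at the first equal sign, and reverses the URL encoding. Note the order: plus signs are turned back into spaces before the %XX codes are decoded, so an encoded plus sign (%2B) survives as a literal +. The function names are hypothetical, and if a field name appears more than once, this sketch keeps only the last value.

```perl
use strict;
use warnings;

# Undo the URL encoding: '+' back to space, then %XX back to a character.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Break a query string into name/value pairs and load them into a hash.
sub parse_query_string {
    my ($query_string) = @_;
    my %fields;
    foreach my $pair (split /&/, $query_string) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;    # skip empty pairs
        $fields{ url_decode($name) } = url_decode($value // '');
    }
    return %fields;
}

my %form = parse_query_string($ENV{'QUERY_STRING'} // '');
foreach my $name (sort keys %form) {
    print "$name = $form{$name}\n";
}
```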

Getting Your Feet Wet

As I said in the tutorial "CGI and Web Servers," the learning curve for CGI is considerably steeper than that for HTML. For that reason, I present this discussion as a general overview of how CGI works rather than as a hands-on tutorial about implementing it. Following are a few tips to help you get started on your journey.

CGI can be tricky to set up. When first testing it, choose the simplest type of CGI program. (The Perl scripts contained in Listings 1 and 2 would be good initial test scripts.) How smoothly your setup goes depends a lot on proper configuration of your Web server and any interpreters (such as Perl) you might be using. Since many of these factors are likely to be server-specific, you should consult the documentation for your server or contact the tech support department at your Web server supplier. If you happen to be using a freeware Web server, you are, of course, much more on your own in terms of support. However, you may find an Internet newsgroup or online forum that covers your Web server, and these groups can be a good source of technical tips. Another option is to hire a Web server consultant to configure your CGI gateway correctly and show you how to pass parameters back and forth between the user's Web browser and your CGI programs.

While the initial setup of CGI can be tricky, the concept of how the technology works is quite simple. The user's browser submits a request to the Web server with either the get or the post method. The gateway passes the user-supplied information to your CGI program, using environment variables (with the post method, the form data itself arrives on the program's standard input instead, and only its length is passed in the CONTENT_LENGTH environment variable). Your program massages the information obtained, adds other information from the server itself (for example, you might have a database that holds product availability data), and spits out to the user a "virtual document" in HTML format.
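Because the form data arrives in a different place for each method, a CGI program that accepts both typically checks the REQUEST_METHOD environment variable first. This sketch (read_form_data is a hypothetical name) returns the raw, still-encoded data either way. It takes the environment and the input handle as arguments so that it can be tried outside a Web server; the result still has to be URL-decoded and split into fields.

```perl
use strict;
use warnings;

# Fetch the raw form data for either submission method.
# get:  the server puts it in QUERY_STRING.
# post: the server sends it on standard input; CONTENT_LENGTH gives its size.
sub read_form_data {
    my ($env, $in) = @_;    # hash ref of environment variables, input handle
    my $method = uc($env->{'REQUEST_METHOD'} // 'GET');
    if ($method eq 'POST') {
        my $length = $env->{'CONTENT_LENGTH'} // 0;
        my $data = '';
        # read() may return fewer bytes than requested, so keep reading.
        while (length($data) < $length) {
            my $got = read($in, $data, $length - length($data), length($data));
            last unless $got;    # 0 at end of input, undef on error
        }
        return $data;
    }
    return $env->{'QUERY_STRING'} // '';
}

my $raw = read_form_data(\%ENV, \*STDIN);
print "Content-type: text/plain\n\n";
print "Raw form data: $raw\n";
```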

You've probably noticed that I've focused all of this discussion on getting information into your CGI program and then getting it back to the user. I've done so because, except for the input and output process, writing a CGI program is not much different from writing any other program. The only difference that you must keep in mind is the nature of Web browser access. Since the user is waiting for something to happen, you must always design your CGI programs to deliver a response in a few seconds' time.
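One way to keep a slow step (a database lookup, say) from leaving the user staring at a blank page is to put a time limit on it. This sketch uses Perl's alarm function, which is available on Unix-style systems. The name run_with_time_limit is hypothetical, and what the program does when the limit expires (for example, sending a "please try again" page) is up to you.

```perl
use strict;
use warnings;

# Run $work (a code ref) but give up after $seconds. Returns the work's
# result, or undef (in scalar context) if the time limit expired.
sub run_with_time_limit {
    my ($seconds, $work) = @_;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        $result = $work->();
        alarm 0;
        1;
    };
    alarm 0;    # make sure no alarm is left pending
    if (!$ok) {
        die $@ unless $@ eq "timeout\n";    # re-throw unrelated errors
        return;
    }
    return $result;
}

my $answer = run_with_time_limit(5, sub { return "in stock" });
print defined $answer ? "Answer: $answer\n" : "Sorry, please try again later.\n";
```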

This tutorial, number 99, by Alan Frank, was originally published in the November 1996 issue of LAN Magazine/Network Magazine.