CGI and Web Servers | Lan Tutorial With Glossary of Terms: A Complete Introduction to Local Area Networks (Lan Networking Library)

Fundamentals of the Common Gateway Interface.

In the tutorial "HTML and CGI, Part II," I discussed HTML forms and how to use them to obtain input from Web users. I also began to describe how to use the common gateway interface (CGI) to turn the information supplied in those forms over to a program running on the Web server that can process the data and produce a response.

In this tutorial I'll provide a basic understanding of how CGI works and how it fits almost hand in glove with HTML forms techniques to make a Web site interactive. But before I proceed, a few disclaimers:

This tutorial is not an in-depth, how-to discussion of CGI. CGI is quite complex and one of the tougher Web server technologies to implement correctly. Complete coverage of the subject would require an entire book.
If you're new to Web servers, you might want to limit your first few Web site projects to using HTML only. The learning curve for HTML is much gentler than that of CGI. Trying to make your first Web site interactive is a little like having an introductory lesson in rock climbing, then deciding to tackle Yosemite's El Capitan as your first climb. There's a lot to explore in HTML, so wait until you have some experience under your belt before embarking on a CGI excursion.
This is not to paint CGI as an impossibly difficult subject: You don't have to be a gonzo C programmer to use CGI. If and when you feel ready for the challenge, you have several programming options from which to choose. A lot of CGI programming can be done with scripting-type languages, such as Perl or Tcl (tool command language). I experimented a bit with Santa Monica, CA-based Quarterdeck's WebServer 1.0, a Web server that runs on Windows 3.x and lets you use MS-DOS batch files as the scripting language. (Although simple to use, the DOS batch language has serious limitations as a CGI scripting language, so I don't recommend using DOS scripts for CGI.)

With so many caveats to consider, why would I even attempt to cover CGI in this brief format? If you run a Web site or are considering doing so, it's useful to be familiar with how CGI works, at least in general terms. For example, if you want to put together an interactive Web site for your organization but are repelled by the idea of programming, you might decide to outsource the development work. Knowing the broad outlines of how CGI works will make you a well-informed decision-maker.

If, by chance, this discussion whets your appetite and you want to learn more about CGI programming (I know, that may be stretching it), I recommend you take a look at the books and Web documents listed under "Resources" at the end of this tutorial.

Which Method?

Disclaimers aside, let's dive into our subject. In the tutorial "HTML and CGI, Part II," I discussed two basic methods by which HTML forms can submit user data to a Web server's Common Gateway Interface: the get method and the post method. Let's pick up where I left off.

The get method passes user data to the gateway by appending the data to the URL of the requested document, after a question mark. (I use the term "document" loosely here, because most CGI programs create an HTML-formatted response on the fly. The response therefore is a dynamically generated page, or virtual document.)

When a user clicks the Submit button in an HTML form, the HTTP request generated by the form contains a string of text known as the query string. This string contains the user data that is submitted to the CGI program (see the tutorial "HTML and CGI, Part II" for an example). The other portion of the request contains the file name of the CGI program to be executed; this program processes the submitted data.

The gateway extracts the query string and places it in an environment variable named, appropriately enough, QUERY_STRING. Your CGI program can then read that environment variable and process its contents. In general, you copy the value into a temporary variable for further processing. Following the example from the tutorial "HTML and CGI, Part II," a user might respond sky blue to the prompt for color and large for size. The value of the QUERY_STRING environment variable would thus be color=sky%20blue&size=large. All the data, including the names of the returned fields and their values, is jumbled together with a cast of weird characters, such as ampersands (&) and percent signs (%). Placing the text string in a temporary variable lets you parse the information into the appropriate variables.
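The division of labor described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular server is written: `split_request_target` is a hypothetical stand-in for the server's job of separating the script path from the query string, and `read_query_string` is the script side, copying the environment variable into a working variable. Servers spell the variable name in uppercase, QUERY_STRING; the path /cgi-bin/example.pl is only an example.

```python
import os
from typing import Mapping


def split_request_target(target: str) -> tuple[str, str]:
    """Split a get request's target into (script path, query string).

    A simplified, hypothetical stand-in for work the server does before
    starting the CGI program; it is not taken from any real server.
    """
    path, _, query = target.partition("?")
    return path, query  # query is "" when the URL has no '?'


def read_query_string(environ: Mapping[str, str] = os.environ) -> str:
    """Script side: copy QUERY_STRING into a working variable."""
    return environ.get("QUERY_STRING", "")


path, query = split_request_target("/cgi-bin/example.pl?color=sky%20blue&size=large")
child_env = {"QUERY_STRING": query}     # what the server hands the program
working = read_query_string(child_env)  # what the program reads back
```

Passing the environment as a parameter lets you try the script side without a server; in a real CGI program you would simply call `read_query_string()` and let it default to the process environment.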

As you can see, parameters are passed to the CGI program as key/value pairs (such as color=sky%20blue or size=large), and these pairs are separated by ampersands. Because spaces and several other characters are not permitted in URLs, these characters are replaced by a percent sign followed by the character's two-digit hexadecimal ASCII value. Thus, a space becomes %20, for example. Browsers submitting forms commonly encode a space as a plus sign instead, so your program should be prepared to decode both + and %20.
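To make the decoding rule concrete, here is a hand-written Python sketch of it. `url_decode` is a hypothetical helper name: it turns each %XX escape back into a byte, treats + as a space (the form-submission convention), leaves a malformed escape alone, and assumes the decoded bytes are UTF-8 text, which is an assumption rather than something CGI guarantees. In practice, Python's standard `urllib.parse.unquote_plus` does essentially the same job.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def url_decode(text: str) -> str:
    """Undo URL encoding: '%XX' becomes the byte 0xXX, '+' becomes a space.

    Decoded bytes are collected first and turned into text at the end,
    assuming UTF-8 (an assumption; older pages may use other charsets).
    """
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        pair = text[i + 1 : i + 3]
        if ch == "%" and len(pair) == 2 and all(c in HEX_DIGITS for c in pair):
            out.append(int(pair, 16))  # two hex digits -> one byte
            i += 3
        elif ch == "+":
            out.append(0x20)  # forms often send a space as '+'
            i += 1
        else:
            out.extend(ch.encode("utf-8"))  # ordinary character, copied as-is
            i += 1
    return out.decode("utf-8", errors="replace")


print(url_decode("sky%20blue"))  # sky blue
print(url_decode("sky+blue"))    # sky blue
print(url_decode("100%25"))      # 100%
```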

The order of the key/value pairs in the query string may vary depending on the user's Web browser, and so can the number of pairs returned. As I pointed out in the tutorial "HTML and CGI, Part II," if the check box for AppleTalk is not checked, no key/value pair is returned for it at all. As a result, your CGI program should not assume that a fixed number or order of key/value pairs will be returned. Once your program has read the QUERY_STRING environment variable into a temporary or working variable, you must parse the string into key/value pairs and convert special characters (such as spaces) from their hexadecimal representation back to their original characters. Split first and decode afterward, so that an encoded ampersand (%26) inside a value is not mistaken for a separator. You can then assign each value to a corresponding variable in your program.
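The parsing steps above might look like this in Python. It is a sketch, not a definitive implementation: `parse_query_string` is a hypothetical name, `appletalk` is an assumed name for the check box mentioned in the text, and decoding is delegated to the standard library's `urllib.parse.unquote_plus`. Because the result is a dictionary looked up by name, neither the order nor the number of pairs matters; a missing field simply falls back to a default.

```python
from urllib.parse import unquote_plus


def parse_query_string(query: str) -> dict[str, str]:
    """Split 'key=value&key=value' into a dictionary of decoded strings.

    Pairs are split on '&' and '=' first and decoded afterwards, so an
    encoded '&' (%26) inside a value cannot be mistaken for a separator.
    No particular number or order of pairs is assumed. If a key repeats,
    the last value wins (a simplification).
    """
    fields: dict[str, str] = {}
    for pair in query.split("&"):
        if not pair:
            continue  # skip empty segments, e.g. from a trailing '&'
        key, _, value = pair.partition("=")
        fields[unquote_plus(key)] = unquote_plus(value)
    return fields


form = parse_query_string("size=large&color=sky%20blue")  # order may vary
color = form.get("color", "")
size = form.get("size", "")
appletalk = form.get("appletalk", "off")  # unchecked box: no pair at all
```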

There are some drawbacks to using the get method. First, it limits the amount of information that can be passed to the CGI program. Because the get method simply appends all the information to the URL, it can create an extremely long URL, and the information you are attempting to send may be truncated along the way. Another drawback is that it is hard to set any rule for a "safe" length, because where the string gets truncated depends on the Web browser and server you use. For these reasons, you may want to use the post method.
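The length problem is easy to see by building the URL a get-method form would request. In this sketch the helper names and the 255-character ceiling are hypothetical placeholders (as the text says, there is no universally safe figure); the fields are encoded with the standard `urllib.parse.urlencode`, which writes spaces as +. On a real site you choose the method in the form itself, so treat `choose_method` as a way to reason about the trade-off, not something a browser does.

```python
from urllib.parse import urlencode

# Hypothetical ceiling: there is no universally safe URL length, so this
# number is only a placeholder to tune for the browsers and server you use.
ASSUMED_MAX_URL_LENGTH = 255


def get_url(script_path: str, fields: dict[str, str]) -> str:
    """Build the URL that a get-method form would request."""
    return script_path + "?" + urlencode(fields)


def choose_method(script_path: str, fields: dict[str, str],
                  limit: int = ASSUMED_MAX_URL_LENGTH) -> str:
    """Return 'get' if the resulting URL fits under the limit, else 'post'."""
    return "get" if len(get_url(script_path, fields)) <= limit else "post"


short_form = {"color": "sky blue", "size": "large"}
long_form = {"comments": "x" * 500}  # e.g. a large text area
print(get_url("/cgi-bin/example.pl", short_form))
print(choose_method("/cgi-bin/example.pl", long_form))
```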

Post Your Messages

If a Web browser uses the post method, the request will look similar to the following:

    POST /cgi-bin/example.pl HTTP/1.0
    Accept: www/source
    Accept: text/html
    Accept: text/plain
    User-Agent:
    Content-type: application/x-www-form-urlencoded
    Content-length: 27

    color=sky%20blue&size=large

Note the Content-type header (application/x-www-form-urlencoded). HTML forms typically use this content type: when the browser builds a request from a form, it sets this type automatically.

With the post method, the server passes the information contained in the submitted form to the CGI program as standard input (STDIN). Your CGI program needs to know how much data to read from STDIN, because it cannot count on an end-of-file marker to tell it where the data ends. Fortunately, the browser's request includes a Content-length header, and when form data is transferred via CGI, the gateway reports that amount in an environment variable named CONTENT_LENGTH. Thus, before it reads from STDIN, your CGI program must first read the CONTENT_LENGTH environment variable.

Although in this example we're transferring only a small amount of data (27 bytes), the advantage of the post method is that it is not constrained by URL length, so it can carry far more data than the get method.
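To show how the pieces of a post request fit together, this Python sketch assembles a minimal version of the example request, leaving out the Accept and User-Agent headers. The function names are hypothetical. It uses lowercase field names to match the get example, encodes spaces as %20 like that example (via `urllib.parse.quote`), and computes Content-length from the encoded body rather than typing it in; for color=sky%20blue&size=large that comes to 27 bytes.

```python
from urllib.parse import quote


def encode_form(fields: list[tuple[str, str]]) -> str:
    """URL-encode key/value pairs, writing spaces as %20."""
    return "&".join(f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in fields)


def build_post_request(path: str, fields: list[tuple[str, str]]) -> bytes:
    """Assemble a minimal HTTP/1.0 post request carrying form data."""
    body = encode_form(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.0\r\n"
        "Content-type: application/x-www-form-urlencoded\r\n"
        f"Content-length: {len(body)}\r\n"  # length of the body only
        "\r\n"  # blank line: headers end, body begins
    )
    return head.encode("ascii") + body


request = build_post_request("/cgi-bin/example.pl",
                             [("color", "sky blue"), ("size", "large")])
print(request.decode("ascii"))
```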

So how does the Web server, and in turn your CGI program, know whether data is being sent via the get method or the post method? The first line of the browser's request specifies the method, and the server sets another environment variable, REQUEST_METHOD, accordingly. However, if you're developing an application using CGI, you'll likely develop both the client and server ends of the application, so you'll know (or will decide) which method both the requester and the server use. For example, if you program your HTML forms to submit data using the get method, you should write your CGI program to accept data using that method.

In his book CGI Programming on the World Wide Web, Shishir Gundavaram shows a short Perl routine that checks the REQUEST_METHOD environment variable, obtains the data from either the QUERY_STRING environment variable or STDIN (depending on whether the request is get or post), and places the information in a local variable named query_string (not to be confused with the QUERY_STRING environment variable). You can then use the data in the query_string local variable without concerning yourself with whether it came in via a get request or a post request.
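The same idea translates readily to Python. This is not Gundavaram's Perl code, only a minimal sketch of the approach described above: `read_form_data` is a hypothetical name, and treating a missing REQUEST_METHOD as get is an assumption made for the sketch. Passing the environment and the input stream as parameters makes it possible to try the function without a Web server.

```python
import io
import os
import sys
from typing import BinaryIO, Mapping, Optional


def read_form_data(environ: Mapping[str, str] = os.environ,
                   stdin: Optional[BinaryIO] = None) -> str:
    """Return the raw, still-encoded form data for a get or a post request.

    get:  the data is in the QUERY_STRING environment variable.
    post: CONTENT_LENGTH bytes of data are waiting on standard input.
    """
    method = environ.get("REQUEST_METHOD", "GET").upper()  # assume get if unset
    if method == "POST":
        if stdin is None:
            stdin = sys.stdin.buffer
        length = int(environ.get("CONTENT_LENGTH") or 0)
        # Read exactly CONTENT_LENGTH bytes instead of waiting for end-of-file.
        raw = stdin.read(length)
        # URL-encoded data is plain ASCII; latin-1 decodes any byte safely.
        return raw.decode("latin-1")
    return environ.get("QUERY_STRING", "")


if __name__ == "__main__":
    # Simulate a post request without a Web server.
    fake_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "27"}
    fake_stdin = io.BytesIO(b"color=sky%20blue&size=large")
    print(read_form_data(fake_env, fake_stdin))
```

The string it returns is still URL-encoded, so the next step is to split it into key/value pairs and decode each one.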

Creating A Virtual Page

Assuming you have the data in your CGI program for processing, how do you generate the response? It's quite simple: You send your response to standard output (STDOUT), which you can usually do with print statements. The gateway then directs the HTTP server (the Web server) to return the response to the user's Web browser as a document, typically an HTML document.

A simple response might look similar to this:

    HTTP/1.0 200 OK
    Date: Friday, 24-May-96 11:09:05 GMT
    Server: NCSA/1.3
    MIME-version: 1.0
    Content-type: text/html
    Content-length:

    <HTML>
    <HEAD><TITLE>Simple CGI Test</TITLE></HEAD>
    <BODY>
    This is a test.
    </BODY>
    </HTML>

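Notice the blank line between the last header line and the beginning of the HTML code. It has to be there, because it serves as the delimiter between the header and the body of the document. In the usual arrangement, your script itself prints only the Content-type header, the blank line, and the HTML; the server adds the status line and headers such as Date and Server. In the tutorial "Creating 'Virtual Documents' with CGI," we'll look at a Perl script that could be used to create this sample virtual page.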
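Here is a minimal Python sketch of a CGI program that produces the sample page. It assumes an ordinary CGI setup in which the script writes only the Content-type header, the blank line, and the HTML, leaving the status line and other headers to the server. `build_response` is a hypothetical helper; it runs the title and message through `html.escape` because, in a real program, that text might include data typed by the user.

```python
import html
import sys


def build_response(title: str, message: str) -> str:
    """Return CGI output: a header line, a blank line, then the HTML page."""
    header = "Content-type: text/html\n"
    body = (
        "<HTML>\n"
        f"<HEAD><TITLE>{html.escape(title)}</TITLE></HEAD>\n"
        "<BODY>\n"
        f"{html.escape(message)}\n"  # escape text that may come from the user
        "</BODY>\n"
        "</HTML>\n"
    )
    # The empty line is the required delimiter between header and body.
    return header + "\n" + body


if __name__ == "__main__":
    sys.stdout.write(build_response("Simple CGI Test", "This is a test."))
```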

Resources

A column, even a series of columns, can serve only to familiarize you with such dense subject matter as CGI programming. But if your interest has been sparked and you would like to learn more about CGI programming, I recommend the following books and Web sites.

CGI Programming on the World Wide Web

by Shishir Gundavaram
O'Reilly & Associates
ISBN: 1-56592-168-2

This book provides excellent coverage of CGI. It includes numerous examples, most of which are written in Perl. It is a good introduction to CGI, and also dives deeper into the subject.

Introduction to CGI/Perl

by Steven Brenner and Edwin Aoki
MIS Press
ISBN: 1-55851-478-3

I came across this reference on a Web page. I haven't read this book, so I can't comment on it, but it's one bookstore possibility.

The WWW Common Gateway Interface 1.1

http://www.ast.cam.ac.uk/~drtr/draft-robinson-www-interface-00.html

This document, written by David Robinson of the University of Cambridge, is an Internet draft describing CGI. The opening paragraphs stress that "it is inappropriate to use Internet drafts as reference material or to cite them other than as 'work in progress.'" So, if you bear in mind that this draft is subject to change or replacement at any time, you will find it has much useful information about the interface in its current state.

The Common Gateway Interface

http://hoohoo.ncsa.uiuc.edu/cgi/

The University of Illinois' National Center for Supercomputing Applications (NCSA) is where Mosaic was developed. The document listed here offers a good introduction to CGI.

Perl for CGI

Practical Extraction and Report Language
http://jumpgate.acadsvcs.wisc.edu/publishing/cgi/perl.html

This document provides information on the Perl scripting language.

This tutorial, number 98, by Alan Frank, was originally published in the October 1996 issue of LAN Magazine/Network Magazine.