Introduction to CGI


Conventional websites made up of static HTML pages are fine in their way. Server-side includes let you embed a lot of useful functionality into regular HTML pages. However, for true web applications (such as e-commerce sites, message boards, databases, and anything where the system tailors its content to the actions of the user), you need a server-side programming environment to handle the user's input and control the requested output. The standard way to do this is CGI, the Common Gateway Interface.

CGI exists as a "mediator" protocol, a layer of the web server that allows you to use HTML forms to take in users' data, which the server (through CGI) feeds in a common format to any program on the server, regardless of what language it's written in. A CGI program can be a Perl script, a compiled C binary, a shell script, or anything else that can be executed by the Apache user (nobody or www). The output of the program is routed back through Apache directly to the web browser. This means that, for example, you can write a CGI program that reads in variables from an HTML form, processes them, opens a pipe to Sendmail to mail the contents of the variables to you, and prints out an HTML response to the user. You'll see how to do this in a simple Perl script in just a moment.

Enabling CGI in Apache

There are two ways to use CGI programs in Apache. The first, cleanest way is with the ScriptAlias directive, which defines a certain directory as containing only CGI programs and maps it to a virtual path (as seen by the web browser):

ScriptAlias /cgi-bin/ "/usr/local/www/cgi-bin/"


This line, which is enabled in httpd.conf by default, tells Apache that the /cgi-bin/ URL (http://stripes.example.com/cgi-bin/) is a designated CGI directory and that everything in it should be treated as a CGI program. If you put anything in this directory that isn't a CGI program, a request for it returns a 500 Server Error code because Apache fails to execute it as a CGI. You can add as many additional ScriptAlias lines as you like. The filesystem path to the CGI directory (for example, /usr/local/www/cgi-bin/) need not even be in the normally web-accessible path; you can point the alias to anywhere in the system that's readable by the nobody (or www) user. This prevents people from being able to look at the contents of the directory or navigate to it from other directory listings. You generally don't want people to be able to access your CGI programs directly, anyway; they're typically called from links or as form actions, which we will discuss shortly.

Note

Note that the trailing slash on /cgi-bin/ is specified. This is another security measure, designed to prevent unauthorized access to the directory listing. A CGI program as Apache sees it through ScriptAlias is the name of the program (for example, script.cgi) that is appended to the ScriptAlias virtual path (for example, /cgi-bin/), so the slash is required in order for the server to construct the proper path (for example, /cgi-bin/script.cgi). If the trailing slash were omitted, the bare directory itself would also be aliased, meaning that a client could issue a request for the /cgi-bin directory and get a listing; you probably don't want that. 301 redirects don't apply here.


Note

Because all files in a directory specified by ScriptAlias are treated as CGI scripts, you don't need to have filename extensions on each of these programs. You can have a program called /cgi-bin/test as well as one called /cgi-bin/hello.cgi; but outside the /cgi-bin/ directory, CGI scripts must have the .cgi extension or whatever extension you have configured using AddHandler, as you will see next.


The other way to enable CGI programs is to use the Options directive to add the ExecCGI option to an area of the server specified by a <Directory> or <Location> block. This is useful for enabling CGI programs in your users' public_html directories, allowing the server to execute programs as CGI based on their filename extensions (for example, .cgi) and whether they're set executable. The following example will turn on CGI execution of all users' executable .cgi files, no matter where in their directories they are:

<Directory /home/*/public_html>
  Options +ExecCGI
  AddHandler cgi-script .cgi
</Directory>


If a CGI file (whether in a ScriptAlias directory or mapped to a handler by extension) can't be executed for any reason, the user will get a 500 Server Error message; this is a generic error condition, one that can be generated by numerous different server-side causes and can't really be used for debugging, other than to know that something is wrong. You can look at Apache's error log to get a more detailed diagnostic message. Here's an example of an error log from an unsuccessful request for a Perl script called blah in /usr/local/www/cgi-bin:

# tail /var/log/httpd-error.log
syntax error at /usr/local/www/cgi-bin/blah line 3, at EOF
Execution of /usr/local/www/cgi-bin/blah aborted due to compilation errors.
[Thu Jan 12 11:55:40 2006] [error] [client 64.2.43.44] Premature end of script headers: /usr/local/www/cgi-bin/blah


The first two lines of output are directly from Perl, exactly the same as if the script had been run on the command line. The third line is Apache telling us that it tried to execute the script, but it quit before printing out any HTTP headers, such as Content-type: (which is required for a valid CGI script). Our task, then, is to make sure the program is written correctly for CGI execution.
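As a minimal sketch of a script that would not trigger this error, the following prints a header block before any other output. It assumes nothing beyond core Perl; the helper name cgi_response is made up for illustration.

```perl
#!/usr/bin/perl -w
use strict;

# Build a complete CGI response: the header line, a blank line that ends
# the header block, and then the body.
sub cgi_response {
    my ($body) = @_;
    return "Content-type: text/html\n\n" . $body;
}

print cgi_response("<HTML><BODY><P>Hello from CGI</P></BODY></HTML>\n");
```

Running the script from the command line before installing it is a quick way to confirm that the first thing it prints is the Content-type: line.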

Writing CGI Programs

The format in which CGI variables are passed to a server-side program is a URL-encoded text string, with each variable separated from its value by an equals sign (=) and from other variables by ampersands (&). When a form is submitted with the POST method, the script receives this string on standard input (STDIN).
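To make the encoding concrete, here is a small sketch that decodes a sample string by hand. The sample data and the helper names url_decode and parse_form_data are assumptions for illustration, not part of any standard module.

```perl
use strict;
use warnings;

# Decode one URL-encoded component: "+" means a space and "%XX" is a
# byte written as two hexadecimal digits.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9a-fA-F]{2})/chr(hex($1))/eg;
    return $text;
}

# Split "name=value&name=value" into a hash of decoded names and values.
sub parse_form_data {
    my ($data) = @_;
    my %form;
    foreach my $pair (split /&/, $data) {
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;    # a field sent without "="
        $form{ url_decode($name) } = url_decode($value);
    }
    return %form;
}

# Hypothetical sample input, as a browser might encode a two-field form.
my %form = parse_form_data('name=Frank+Jones&email=frank%40example.com');
foreach my $key (sort keys %form) {
    print "$key: $form{$key}\n";
}
```

For this sample string, the script prints "email: frank@example.com" followed by "name: Frank Jones". The pairs are split on & before decoding, so an encoded ampersand (%26) inside a value is not mistaken for a separator.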

Perl is the most common language for CGI programming, and the two terms are therefore often (incorrectly) used synonymously. Don't confuse the two: Perl is useful for a great many things besides web programming, and CGI encompasses all conceivable languages, even ones that don't exist yet. Still, the prevalence of Perl in the CGI programming world makes it the object of our attention right now.

Accessing CGI Programs Through Forms

Perl's strengths, as you saw in Chapter 11, "Introduction to Perl Programming," are in text processing and ease of development; this makes it an ideal candidate for situations when you need to read in variables from an HTML form (such as a user's name, email address, mailing address, and comments) and process them into a form you or the system can use. A typical Perl CGI program is invoked as the action of an HTML form, like the following:

<FORM NAME="myform" METHOD="POST" ACTION="/cgi-bin/post2me">


When the user submits this form, all its variables are submitted through the CGI interface to post2me, the Perl program whose job it is to handle them. The first thing this script must do is read in the variables from standard input and format them into an associative array for easy access:

read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
@pairs = split(/&/, $buffer);
foreach $pair (@pairs) {
    ($name, $value) = split(/=/, $pair);
    $value =~ tr/+/ /;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    $value =~ s/~!/ ~!/g;
    $FORM{$name} = $value;
}


This code does some security parsing to prevent malicious form input from being sent to the program. Specifically, if you write a CGI program that prints a user's form input out into an HTML file, a user can embed some malicious HTML code into his input that calls a server-side include that runs some program on the server when the newly written HTML file is accessed. Because any program executed through HTTP requests is run by the unprivileged nobody (or www) user, this usually amounts only to an annoyance. Still, it's a security hole that must be addressed, and this code block disables any potentially malicious HTML tags by inserting spaces and dashes where appropriate.

Note

This method of checking for a "blacklist" of prohibited codes is not sufficient to cover all possible methods of attack. You can put Perl into "taint" mode using the -T flag, which prevents Perl from issuing system calls using any so-called "tainted" variables in the CGI program.
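Under taint mode, the usual way to mark a value as safe is to extract it with a regular expression capture that allows only the characters you expect. The sketch below shows that idea; the helper name untaint_email and the pattern are assumptions, and the pattern is deliberately strict rather than a complete email validator.

```perl
use strict;
use warnings;

# Return a cleaned copy of an email address, or undef if it contains
# anything outside the allowed characters. Under -T, text captured by a
# regular expression group is considered untainted, so the pattern
# should be as narrow as the application permits.
sub untaint_email {
    my ($input) = @_;
    return undef unless defined $input;
    if ($input =~ /\A([\w.+-]+\@[\w.-]+)\z/) {
        return $1;
    }
    return undef;
}

# Hypothetical inputs: a plausible address and a shell-injection attempt.
foreach my $candidate ('frank@example.com', 'frank@example.com; rm -rf /') {
    my $clean = untaint_email($candidate);
    print defined $clean ? "accepted: $clean\n" : "rejected\n";
}
```

The code runs with or without -T. With taint mode enabled, only the value returned by the helper could be passed on to something like a mail command.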


After your script has read in the form input, each form variable is available as a key in the %FORM array; the contents of the HTML input field called email, for instance, are in $FORM{'email'} and can be used however you like.

Next, your script must print out a valid HTTP header. You have two choices here: You can print HTML code to standard output, effectively writing a new HTML page from within the script, or you can redirect the user to a different URL while the script does its work. The former is done with a Content-type: header, which can be any valid MIME type (it's up to the browser to know how to handle it), followed by a double newline, the standard signal for the end of the header block:

print "Content-type: text/html\n\n";


Note

If you use the CGI.pm module (as is very common practice), you can print out the standard HTML content type header using the $q->header method.


Anything printed out by your script after this header is part of the response body, rendered as HTML by the browser. You can use a type of text/plain to force the browser to display it as plain text, or any other type, according to your needs.

The latter method, a redirect, is done with a Location: header and a redirection URL:

print "Location: http://www.somewhereelse.com/path/to/file.html\n\n";


Anything printed after this header vanishes because the browser will have already moved on to this new URL.

Environment variables are also available to CGI programs, and an HTTP connection comes with a great many pieces of interesting information. Some of these include HTTP_REFERER (the referring URL), HTTP_USER_AGENT (the browser the user has), REMOTE_HOST (the user's hostname), and many more. You can see them all by accessing the printenv script, which is included as part of the default Apache installation in the /usr/local/www/cgi-bin directory. You can access it at the URL http://www.example.com/cgi-bin/printenv, substituting your FreeBSD machine's hostname or IP address, as appropriate. Within Perl, your environment variables are accessible as keys in the %ENV array, so you can access the REMOTE_HOST variable as $ENV{'REMOTE_HOST'}.
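As a rough sketch of reading these values yourself, the following script reports a few of the variables named above as plain text. The helper name describe_request is hypothetical, and which variables are actually set depends on the server configuration and the request; REMOTE_HOST in particular may be empty if the server does not look up client hostnames.

```perl
use strict;
use warnings;

# Format selected environment variables as "NAME=value" lines.
# Variables the server did not set are reported as "(not set)".
sub describe_request {
    my ($env, @names) = @_;
    my $report = '';
    foreach my $name (@names) {
        my $value = defined $env->{$name} ? $env->{$name} : '(not set)';
        $report .= "$name=$value\n";
    }
    return $report;
}

print "Content-type: text/plain\n\n";
print describe_request(\%ENV, qw(REMOTE_HOST REMOTE_ADDR HTTP_USER_AGENT HTTP_REFERER));
```

Passing the hash by reference keeps the helper independent of %ENV, so it can be checked from the command line with made-up values.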

Let's look at a simple Perl CGI program, which reads in three variables (name, email, and comment) from an HTML form, mails them to you, and prints a formatted thank-you note to the user. This script is shown in Listing 26.1.

Listing 26.1. A Sample Perl CGI Program: sendcomments.cgi

#!/usr/bin/perl -wT

# Taint mode (-T) refuses to run external programs while PATH is untrusted,
# so set it to a known value before opening the pipe to sendmail.
$ENV{'PATH'} = '/bin:/usr/bin';
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

# Read the posted form data and decode it into %FORM.
read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
@pairs = split(/&/, $buffer);
foreach $pair (@pairs) {
    ($name, $value) = split(/=/, $pair);
    $value =~ tr/+/ /;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    $value =~ s/~!/ ~!/g;
    $FORM{$name} = $value;
}

print "Content-type: text/html\n\n";

# Mail the form contents.
open (MAIL,"| /usr/sbin/sendmail -oi -t");
print MAIL "From: $FORM{'name'} <$FORM{'email'}>\n";
print MAIL "To: you\@your.hostname.com\n";
print MAIL "Subject: Form output\n\n";
print MAIL "$FORM{'name'}, from $ENV{'REMOTE_HOST'} ($ENV{'REMOTE_ADDR'}), has sent you the following comment:\n\n";
print MAIL "$FORM{'comment'}\n";
close (MAIL);

# Print the thank-you page.
print qq^<HTML>\n<HEAD>\n<TITLE>Thank you!</TITLE></HEAD>\n^;
print qq^<BODY><H3>Thank you!</H3>\nThanks for your comments!\n</BODY>\n</HTML>^;

You'll want to tune this script to your own needs: replace the dummy To: header with one that mails to your real email address, making sure to keep the backslash in front of the @ symbol. This sample script doesn't do very much in and of itself, but after some experimentation you'll find that the basic principles we've covered here form the heart of server-side programs you can easily create yourself, from the smallest feedback forms to the largest online databases and e-commerce systems.

Accessing CGI Programs Directly

A CGI program does not have to be called from an HTML form or with the POST method. You can use a direct URL to call a CGI script that doesn't need form variables read into an associative array such as %FORM. Such a URL would look like this:

http://www.example.com/cgi-bin/sysinfo?frank+3

This URL calls the sysinfo program in the /cgi-bin/ directory. Everything after the question mark is known as the query string, and its contents are available to the script as elements of the @ARGV array. Arguments are separated by plus signs (+). The sysinfo program would have the string frank available as $ARGV[0] and the number 3 as $ARGV[1].

Tip

You can also access the entire query string as the environment variable QUERY_STRING, or $ENV{'QUERY_STRING'} in Perl.
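A minimal sketch of a script like sysinfo might pick up its arguments as follows. It uses @ARGV when the server has filled it in and otherwise splits QUERY_STRING itself; the helper name query_args is made up for illustration.

```perl
use strict;
use warnings;

# Split a query string such as "frank+3" into its arguments and decode
# any %XX escapes in each one.
sub query_args {
    my ($query) = @_;
    return () unless defined $query && length $query;
    my @args = split /\+/, $query;
    s/%([0-9a-fA-F]{2})/chr(hex($1))/eg for @args;
    return @args;
}

# Prefer @ARGV when it is populated; otherwise parse QUERY_STRING.
my @args = @ARGV ? @ARGV : query_args($ENV{'QUERY_STRING'});

print "Content-type: text/plain\n\n";
print "argument $_: $args[$_]\n" for 0 .. $#args;
```

Arguments that arrive this way come from the user just as form fields do, so they deserve the same checking before being used in file names or commands.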





FreeBSD 6 Unleashed
ISBN: 0672328755
Year: 2006
Pages: 355
Authors: Brian Tiemann
