11.2 CGI

With Figure 11.1 in mind, CGI refers to the interface between the Web server and the program running on the application server. This application program is usually called a CGI script. ^[3] Roughly speaking, CGI processing works as follows:

A Web server receives an HTTP request message that invokes a CGI script.
The Web server creates a new server-side process to take care of this request.
The server-side process takes the input provided by the browser and passes it to the appropriate application program or CGI script.
The CGI script computes the output and returns it back to the server-side process.
The server-side process returns the CGI script's output to the client.
The server-side process exits and the Web server waits for new incoming HTTP request messages.

Information is exchanged between the server-side process and the CGI script using environment variables together with a mechanism for inter-process communication (e.g., pipes in a UNIX environment). Consequently, a CGI script must be able to read from standard input (i.e., stdin) and write to standard output (i.e., stdout). As long as this requirement is fulfilled, it can be written in any programming or scripting language. In practice, most CGI scripts are written in interpreted scripting languages that are supposed to be fast and easy to use. Examples include Perl, ^[4] the Tool Command Language (Tcl), Java, and Python. ^[5] As of this writing, Perl is by far the most popular and widely deployed language for CGI programming or scripting.
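The exchange described above can be sketched in a few lines of Python. This is a minimal illustration and not a production script: the function name handle_request is hypothetical, and the sketch assumes that the request body is plain ASCII form data, so that the byte count in CONTENT_LENGTH equals the number of characters to read.

```python
#!/usr/bin/env python3
"""Minimal CGI sketch: request data arrives via environment and stdin, the reply goes to stdout."""
import os
import sys


def handle_request(environ, stdin, stdout):
    """Write a plain-text response that echoes the method and the body size."""
    method = environ.get("REQUEST_METHOD", "GET")
    # CONTENT_LENGTH may be missing or empty (e.g., for GET requests).
    length = int(environ.get("CONTENT_LENGTH") or 0)
    # Read only as much as announced; never read until end-of-file.
    body = stdin.read(length) if length > 0 else ""
    # A CGI response consists of header lines, a blank line, and the body.
    stdout.write("Content-Type: text/plain\r\n\r\n")
    stdout.write(f"method={method} bytes_read={len(body)}\n")


if __name__ == "__main__":
    handle_request(os.environ, sys.stdin, sys.stdout)
```

Passing the environment and the two streams as parameters keeps the sketch testable outside a Web server; a real script would simply use os.environ, sys.stdin, and sys.stdout as in the last line.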

The most important environment variables used for CGI programming are summarized in Table 11.1. Note that not all environment variables are set for all HTTP request messages, and that a browser may also send new HTTP headers. If a browser sent a new HTTP header to the Web server, the server (or the server-side process) would package the header into a new CGI environment variable. The name of the environment variable, in turn, would be the header name converted to uppercase and prefixed with HTTP_, and any dash character (-) would be changed to an underscore character (_). The Web server (or the server-side process) need not handle all possible HTTP headers.
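The renaming rule can be written down as a small function. The following Python sketch is illustrative only; the function name cgi_variable_name is hypothetical, and the special treatment of the two content headers is inferred from the fact that CONTENT_TYPE and CONTENT_LENGTH carry no HTTP_ prefix in Table 11.1.

```python
# Headers that have dedicated CGI variables without the HTTP_ prefix.
UNPREFIXED_HEADERS = {"CONTENT-TYPE", "CONTENT-LENGTH"}


def cgi_variable_name(header_name):
    """Map an HTTP request header name to its CGI environment variable name."""
    upper = header_name.upper()
    name = upper.replace("-", "_")  # dashes become underscores
    if upper in UNPREFIXED_HEADERS:
        return name
    return "HTTP_" + name


if __name__ == "__main__":
    for header in ("User-Agent", "Referer", "X-Example-Header"):
        print(header, "->", cgi_variable_name(header))
```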

Table 11.1: CGI Environment Variables (in Alphabetical Order)
Environment Variable	Meaning
AUTH_TYPE	User authentication method used
CONTENT_LENGTH	Length of input data
CONTENT_TYPE	Internet media type of input data
GATEWAY_INTERFACE	CGI version
HTTP_ACCEPT	List of MIME types accepted by the client
HTTP_USER_AGENT	Software and version of browser
HTTP_REFERER	URL of referring document
MOD_PERL	Defined if running under mod_perl
PATH_INFO	URL part after the script identifier
PATH_TRANSLATED	PATH_INFO translated into filesystem
QUERY_STRING	Query string from URL (if present)
REMOTE_ADDR	IP address of the client
REMOTE_HOST	DNS name of the client
REMOTE_IDENT	Remote user identification (unreliable)
REMOTE_USER	Name of the authenticated user
REQUEST_METHOD	HTTP request method (e.g., GET)
SCRIPT_NAME	Virtual path of the script
SERVER_NAME	DNS name of the server
SERVER_PORT	Port number of the server
SERVER_PROTOCOL	Name and version of the protocol
SERVER_SOFTWARE	Server software name and version

In addition to the environment variables summarized in Table 11.1, some SSL/TLS-enabled Web servers also set additional environment variables when SSL or TLS is used. For example, Table 11.2 summarizes the additional environment variables set by an SSL/TLS-enabled Apache Web server (i.e., Apache-SSL or Apache with mod_ssl). Other SSL/TLS-enabled Web servers may set other environment variables. In either case, the environment variables may be used by the CGI scripts to provide security services. For example, a CGI script that provides access to a database with confidential material may abort, unless a certain type of cipher suite is used.

Table 11.2: Some Additional Environment Variables for SSL/TLS (in Alphabetical Order)
Environment Variable	Meaning
HTTPS	Set if HTTPS is being used
HTTPS_CIPHER	SSL/TLS cipherspec
HTTPS_KEYSIZE	Number of bits in the session key
HTTPS_SECRETKEYSIZE	Number of bits in the secret key
SSL_CIPHER	The same as HTTPS_CIPHER
SSL_CLIENT_DN	Distinguished name in client's certificate
SSL_CLIENT_<x509>	Component of client's distinguished name
SSL_CLIENT_I_DN	Distinguished name of issuer of client's certificate
SSL_CLIENT_I_<x509>	Component of client's issuer's distinguished name
SSL_PROTOCOL_VERSION	SSL protocol version
SSL_SERVER_DN	Distinguished name in server's certificate
SSL_SERVER_<x509>	Component of server's distinguished name
SSL_SERVER_I_DN	Distinguished name of issuer of server's certificate
SSL_SERVER_I_<x509>	Component of server's issuer's distinguished name
SSL_SSLEAY_VERSION	Version of the SSLeay library
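As an illustration of how a CGI script might use these variables, the following Python sketch refuses to serve confidential material unless HTTPS is in use and the session key reaches a minimum length. It relies only on the HTTPS and HTTPS_KEYSIZE variables from Table 11.2; the threshold of 128 bits and the function names are assumptions made for the example, and other servers may expose this information under different variable names.

```python
import os

MIN_KEY_BITS = 128  # hypothetical policy threshold, not taken from the text


def connection_is_strong_enough(environ, min_bits=MIN_KEY_BITS):
    """Return True only if HTTPS is in use and the session key is long enough."""
    if environ.get("HTTPS", "").lower() in ("", "off"):
        return False
    try:
        key_bits = int(environ.get("HTTPS_KEYSIZE", ""))
    except ValueError:
        return False  # missing or malformed value: fail closed
    return key_bits >= min_bits


def main():
    if not connection_is_strong_enough(os.environ):
        # "Status" is the CGI header that sets the HTTP status code.
        print("Status: 403 Forbidden\r\nContent-Type: text/plain\r\n\r\n", end="")
        print("A stronger SSL/TLS cipher suite is required.")
        return
    print("Content-Type: text/plain\r\n\r\n", end="")
    print("Confidential material would be served here.")


if __name__ == "__main__":
    main()
```

The check fails closed: if the server does not set the variables at all, the script aborts instead of assuming that the connection is protected.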

According to Table 11.1, the server-side process running on the Web server may provide to the CGI script some information that is encoded in the QUERY_STRING environment variable. This information is usually provided by the user and is the user's sole means for passing input data to the CGI script. It may contain, for example, a list of keywords for a search engine or an SQL expression for use by a database gateway.

In either case, a browser may send a query string to a Web server (or CGI script, respectively) in two different ways:

The browser can append the query string to the CGI script's URL. For example, a resulting URL may look as follows: ^[6]

http://www.esecurity.ch/cgi-bin/do_search?search=eSECURITY+Technologies

This example assumes that the CGI script do_search is installed in the cgi-bin directory of the Web server hosting www.esecurity.ch. In this case, the query string refers to the substring search=eSECURITY+Technologies. Because it is part of the URL, it has to follow the URL syntax rules, such as replacing spaces with the plus character (+). The CGI script, in turn, must reconstruct the query string by examining the environment variable QUERY_STRING. This way of sending query strings to CGI scripts uses the standard HTTP GET method and is typically used by older CGI scripts.
The browser can send the query string using the HTTP POST method.

This method is usually called in response to the user filling out and submitting an HTML form. For example, a simple code segment that includes an HTML form may look as follows:

<FORM ACTION="/cgi-bin/do_search" METHOD=POST>
Search string: <INPUT TYPE="text" NAME="search"><P>
<INPUT TYPE="submit" VALUE="Search">
</FORM>

When this HTML code segment is received by the browser, a corresponding fill-out form is displayed. Figure 11.2 illustrates how this form is displayed using, for example, the Opera browser. If the user typed in the search string "eSECURITY Technologies" and pressed the Search button, the browser would use the HTTP POST method (as indicated by the form's METHOD attribute) to submit the contents of the form to the Web server. The Web server, in turn, would write the following query string to the standard input of the process it just started:

search=eSECURITY+Technologies

The CGI script /cgi-bin/do_search can now read the query string from standard input and process it accordingly.
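Both ways of receiving a query string can be handled by one routine. The following Python sketch is a minimal illustration that uses parse_qs from the standard urllib.parse module; the function name read_form_fields is hypothetical, and the sketch assumes ASCII form data, so that CONTENT_LENGTH bytes correspond to the same number of characters.

```python
import io
from urllib.parse import parse_qs


def read_form_fields(environ, stdin):
    """Return the submitted fields as a dict that maps names to lists of values."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # POST: the query string arrives on standard input.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        # GET: the query string is taken from the environment.
        raw = environ.get("QUERY_STRING", "")
    # parse_qs undoes the URL encoding, e.g., "+" becomes a space.
    return parse_qs(raw)


if __name__ == "__main__":
    get_env = {"REQUEST_METHOD": "GET",
               "QUERY_STRING": "search=eSECURITY+Technologies"}
    print(read_form_fields(get_env, io.StringIO("")))
```

Whichever method is used, the script ends up with the same decoded value, which must still be treated as untrusted user input.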

Figure 11.2: A simple HTML fill-out form displayed using the Opera browser. (© 2002 Opera Software.)

From a security point of view, the HTTP POST method is preferred because the query string does not appear in the requested URL (and hence not in places where URLs are routinely recorded, such as server log files). Note, however, that unless the connection is encrypted, a determined attacker can still eavesdrop on the data traffic and extract any information he or she wants.

In addition, there are many concerns related to the security of CGI scripts. For example, many CGI scripts that had been distributed with Web server software packages in the past were later found to be flawed or buggy. The corresponding security flaws or software bugs could be exploited to attack the machines that hosted the CGI scripts. Fortunately, this problem is no longer relevant, because most Web server software packages are distributed either without CGI scripts or with CGI scripts that are not executable by default (i.e., they are configured with read privileges only). In either case, if a CGI script is found to be flawed or buggy, it must be removed from the Web server as soon as possible (it can also be corrected or replaced with a more secure script that provides the same or a similar functionality).

The administrator of a Web server has to make several decisions with regard to the installation and secure configuration of CGI scripts:

First, he or she has to carefully design the user account used to run the Web server and implement the principle of least privilege. Note that whatever restrictions apply to a Web server also apply to the CGI scripts. For example, if a Web server runs as root on a UNIX system, it can potentially leak the password files. This can be avoided, for example, by using a shadowed password file and running the Web server as a user with only a few privileges (e.g., a user called nobody).
Second, he or she has to decide whether the server uses script-aliased CGI or non-script-aliased CGI.
- Using script-aliased CGI means that a CGI script can only be executed if it is installed in an explicitly configured directory, typically the subdirectory cgi-bin in the root directory of the Web server.
- Using non-script-aliased CGI means that a CGI script can be executed if its filename extension corresponds to the one defined in the server's configuration settings. In this case, it does not really matter where a CGI script is installed and it can also be located in a user's directory.
Having only one directory to look for CGI scripts is better and less error prone. Consequently, script-aliased CGI should be the preferred option (if possible and appropriate).
Third, he or she has to decide what CGI scripts to install. Obviously, he or she should only install CGI scripts that are needed by at least one legitimate user. CGI scripts that are not used by anybody only represent a potential vulnerability to the security of the Web server and should be removed.

In either case, interpreters, shells, and other scripting engines must never be installed in a directory where they may be invoked by a request with user-supplied input data. This is particularly true for the directory that hosts the CGI scripts (i.e., the cgi-bin directory). Unfortunately, there are examples in which software vendors have shipped Web servers with a Perl interpreter installed in the CGI directory (mainly to make it simpler to install and configure CGI scripts written in Perl). This is very dangerous. Imagine, for example, what happens if a Perl interpreter perl.exe and a Perl script search.pl are installed in the CGI directory of the Web site www.victim.com. In this case, any user can invoke the script by simply requesting the following URL:

http://www.victim.com/cgi-bin/perl.exe?search.pl

This is convenient. This configuration, however, not only allows the Perl script search.pl to be executed, but also allows arbitrary Perl commands to be run on the Web server. For example, anybody can request the following URL from the Web server:

http://www.victim.com/cgi-bin/perl.exe?-e+%27unlink+%3C*%3E%27

Following the rules for unescaping URLs, the Web server transforms this expression into the shell command perl -e 'unlink <*>', which represents a Perl command to delete all files in the current directory. Whether the command is successful depends on whether the server's user permissions allow it to perform the delete operations.
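The unescaping step can be reproduced with the standard unquote_plus function from Python's urllib.parse module. The sketch below only decodes the string to show what the interpreter would receive; it does not execute anything.

```python
from urllib.parse import unquote_plus

# Query string taken from the URL in the text.
query = "-e+%27unlink+%3C*%3E%27"

# "+" becomes a space, and %27, %3C, and %3E become ', <, and >.
decoded = unquote_plus(query)

print("perl " + decoded)
```

The decoded string is exactly the argument list of the shell command discussed above, which is why an interpreter must never be reachable through the CGI directory.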

In practice, many security problems occur simply because the Web server administrators and CGI script programmers assume that users behave properly and play by the rules. This means that they often assume that users type in only valid input data, that file names only contain legal characters, that users don't peek at secret CGI parameters contained inside hidden form fields, and similar things. There are, however, many ways in which users may not play by the rules and try to exploit weaknesses or vulnerabilities. An example is given above. Another example crops up in Perl scripts designed to send an e-mail message to an address entered in a fill-out form. In UNIX, it's comparatively easy to do this by opening a pipe to the mail command and printing the body of the e-mail message to this pipe. Assuming that param is a function that extracts named fields from the CGI query string, a Perl script segment may look as follows (the example is taken from [1]):

$address = param('address');
$subject = param('subject');
$message = param('message');
# Open a pipe to the mail command and write the message body to it.
open(MAIL, "| /bin/mail -s $subject $address");
print MAIL $message;
close MAIL;

The script segment first uses param to recover the e-mail address, subject line, and body of the message. It then opens a pipe to the mail command, using the -s flag to specify a subject line and passing the recipient's e-mail address on the command line. The script prints the body of the message to the pipe and closes it. When the pipe is closed, the mail command delivers the message. The script is intended to be called from a fill-out form that may look as follows:

<FORM ACTION="/cgi-bin/handle_mail" METHOD=POST>
To: <INPUT TYPE="text" NAME="address"> <P>
Subject: <INPUT TYPE="text" NAME="subject"> <P>
Message: <TEXTAREA NAME="message" ROWS=5></TEXTAREA> <P>
<INPUT TYPE="submit" VALUE="Send Mail">
</FORM>

If the user typed rolf.oppliger@esecurity.ch into the "To:" field, and Test into the "Subject:" field, the CGI script would run the following command:

/bin/mail -s Test rolf.oppliger@esecurity.ch

In this case, everything works as anticipated and the e-mail message is sent to rolf.oppliger@esecurity.ch. Unfortunately, the script has a problem: it blindly trusts that the e-mail address and subject line supplied by the user are valid. Now consider what happens when a malicious user types the string rolf.oppliger@esecurity.ch; cat /etc/passwd into the e-mail address field. In this case, the shell command the script now executes looks as follows: ^[7]

/bin/mail -s Test rolf.oppliger@esecurity.ch; cat /etc/passwd

The effect of this is to run the anticipated mail command and then execute cat /etc/passwd. This command prints the content of the password file to standard output, which is transferred to the requesting browser. Of course, there's no reason that the same or a similar technique couldn't be used to read the contents of any file on the server host, including HTML documents that are normally protected by access control mechanisms and encrypted in transit through the SSL or TLS protocol. In fact, variants of this exploit can be used to do many (malicious) things on the Web server. Consequently, the most important thing to do from a security point of view is to validate user-supplied input data, and to perform some pattern-matching checks accordingly. If something suspicious is found, the input data must be modified or refused.
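One way to apply this advice is sketched below in Python. The sketch combines two measures: a pattern-matching check that refuses suspicious addresses, and the invocation of the mail command through an argument list so that no shell ever interprets metacharacters such as the semicolon. The regular expression is deliberately conservative and rejects some valid addresses, the function names are hypothetical, and the path /bin/mail is taken from the example above.

```python
import re
import subprocess

# Conservative pattern: starts with a letter or digit, contains no spaces or
# shell metacharacters. It is an assumption for this sketch, not a full
# e-mail address grammar.
ADDRESS_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._%+-]*@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def is_safe_address(address):
    """Return True if the whole string matches the conservative pattern."""
    return ADDRESS_RE.fullmatch(address) is not None


def build_mail_command(address, subject):
    """Validate the input and return the command as a list of arguments."""
    if not is_safe_address(address):
        raise ValueError("refusing suspicious e-mail address")
    if "\n" in subject or "\r" in subject:
        raise ValueError("refusing subject line with line breaks")
    # Each list element is passed to the program as one argument.
    return ["/bin/mail", "-s", subject, address]


def send_mail(address, subject, message):
    """Run the mail command without a shell and feed the body to its stdin."""
    subprocess.run(build_mail_command(address, subject),
                   input=message, text=True, check=True)
```

With this sketch, the malicious input from the example is refused by build_mail_command before any command is started; even if the check were weaker, the string would reach /bin/mail as a single argument and not as a second command.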

Simson Garfinkel and Eugene H. Spafford compiled a list of general principles and rules for safe CGI programming [5]. The principles and rules are summarized in Table 11.3; they should be kept in mind when designing and implementing CGI scripts. In the same book, the authors also provide rules for C, Perl, and Hypertext Preprocessor (PHP) programmers. These rules are not summarized here.

Table 11.3: General Principles and Rules for Safe CGI Programming*
No.	Principle or Rule
1	Carefully design the program before you start.
2	Show the specification to another person.
3	Write and test small sections at a time.
4	Check all values provided by the user.
5	Check arguments that you pass to operating system functions.
6	Check all return codes from system calls.
7	Have internal consistency-checking code.
8	Include lots of logging.
9	Some information should not be logged.
10	Make the critical portion of your program as small and as simple as possible.
11	Read through your code.
12	Always use full pathnames for any filename argument, for both commands and data files.
13	Rather than depending on the current directory, set it yourself.
14	Test your completed program thoroughly.
15	Be aware of race conditions.
16	Don't have your program dump core except during your testing.
17	Do not create files in world-writable directories.
18	Don't place undue reliance on the source IP address in the packets of connections you receive.
19	Include some form of load shedding or load limiting in your server to handle cases of excessive load.
20	Put reasonable time-outs on the real time used by your CGI script while it is running.
21	Put reasonable limits on the CPU time used by your CGI script while it is running.
22	Do not require the user to send a reusable password in plaintext over the network connection to authenticate herself.
23	Have your code reviewed by another competent programmer (or two, or more).
24	Whenever possible, reuse code.
*According to [5].
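Rules 20 and 21 can, for example, be enforced by whatever program launches a script. The following Python sketch is one possible approach under stated assumptions: it works on UNIX systems only, the function names and the default limits are hypothetical, and it uses the standard resource and subprocess modules to cap the CPU time and the real time of a child process.

```python
import resource
import subprocess
import sys


def cpu_limiter(cpu_seconds):
    """Build a hook that caps the CPU time of the child process (UNIX only)."""
    def apply_limit():
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        soft = cpu_seconds
        if hard != resource.RLIM_INFINITY:
            soft = min(cpu_seconds, hard)  # never exceed the hard limit
        # Only the soft limit is changed; the kernel signals the child
        # once it has used that much CPU time.
        resource.setrlimit(resource.RLIMIT_CPU, (soft, hard))
    return apply_limit


def run_with_limits(argv, cpu_seconds=5, wall_seconds=10):
    """Run argv with a CPU limit (rule 21) and a real-time time-out (rule 20)."""
    return subprocess.run(
        argv,
        capture_output=True,
        text=True,
        timeout=wall_seconds,                 # raises TimeoutExpired if exceeded
        preexec_fn=cpu_limiter(cpu_seconds),  # runs in the child before exec
        check=False,
    )


if __name__ == "__main__":
    result = run_with_limits([sys.executable, "-c", "print('ok')"])
    print(result.returncode, result.stdout, end="")
```

In line with rule 12, the first element of argv should be a full pathname; sys.executable is used in the demonstration only because it is guaranteed to exist.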

Last but not least, it is important to note that on some platforms and systems a wrapper may be used to more securely run CGI scripts. Historically, the term wrapper was coined by Wietse Venema for a tool he named TCP wrapper. ^[8] The tool is heavily used on UNIX platforms. It provides some level of access control based on the source and destination of a TCP connection request and logging for successful and unsuccessful connections. More specifically, the TCP wrapper starts a filter program before the requested server process is started, assuming that the connection request is permitted by the access control lists. All messages about connections and connection attempts are logged via the syslog daemon (i.e., syslogd).

Similar to the TCP wrapper, a wrapper may be used to more securely run another program (e.g., a CGI script). The execution of the other program can be made more secure because the wrapper can be configured in a way that fully controls it and changes its permissions accordingly. For example, the suEXEC wrapper can be used on UNIX systems running the Apache Web server (since version 1.2). The wrapper provides the ability to run CGI scripts under user IDs different from the user ID of the calling Web server (normally, when a CGI script executes, it runs as the same user who is running the Web server). Further information about the suEXEC wrapper is available at http://httpd.apache.org/docs/suexec.html. Also, its installation and configuration are further addressed in [4].

^[3] The term script is used because most of these programs are written in a simple scripting language, such as Perl.

^[4] http://www.perl.com

^[5] http://www.python.org

^[6] Note that this URL is a fictitious example only.

^[7] On UNIX systems, the semicolon is a metacharacter used to separate multiple commands.

^[8] The tool can be downloaded from ftp://ftp.porcupine.org/pub/security.