Handling Forms and Scripts | Advanced Linux Networking

Although static content is extremely common and important, it's not the only type of file that a Web server may be called on to handle. Some Web sites are much more dynamic in nature. For instance, Web search sites like Google allow you to enter information in a form, click a button, and receive back a customized Web page that the Web server built just for you. If you need to create such sites, it's vital that you understand how to configure your Web server to handle them appropriately. This topic is complex enough that this section can only provide a basic overview and a few pointers to the Apache options used in configuring such a site. For more information, you'll need to consult more specialized documentation.

Understanding Static Content, Forms, and CGI Scripts

The preceding sections have focused on static content. This term refers to files that aren't unique to particular users and that don't interact with the user in any sophisticated way. Examples of static content include:

HTML files: The Hypertext Markup Language (HTML) is the most common form for textual content on the Web in 2002. HTML files usually have extensions that end in .htm or .html, and they're basically nothing more than plain text files with a few character sequences reserved to indicate special formatting. For instance, <P> marks the start of a paragraph, and </P> marks the end. HTML also provides mechanisms for linking to other documents on the Web (or the Internet generally) by embedding URLs in the text. Users can click on links to read the linked-to files, and some (such as many graphics) can be displayed automatically, depending upon the browser's capabilities and settings. The upcoming section, "HTML and Other Web File Formats," describes HTML in more detail.
Text files: Plain text files usually have .txt extensions. Web servers can deliver plain text files, and Web browsers can display them, albeit without the formatting and links that HTML files make possible.
Graphics files: HTML files frequently include links to graphics files in various formats. These files are also static files. Note that some graphics files include animations. Although these are animated, they still qualify as static content; the word static refers to the data in the file, not to the data's appearance when displayed in a Web browser.
Miscellaneous document files: Web pages sometimes include links to Adobe Portable Document Format (PDF) files, Microsoft Word files, archive files like .zip files or tarballs, and so on. Some browsers can load some of these files into appropriate applications, like a PDF viewer or word processor. Other files, if their links are clicked, can be saved to disk.

These are all examples of static content, and can be served from your DocumentRoot directory, your UserDir directories, or subdirectories of these. When a user requests one of these files, the data flow is largely one way: The client requests a file, and the server delivers that file. To be sure, the request for a file itself contains data, but beyond the data in that request, the data flows in one direction.

Dynamic content, by contrast, is customized for individual users or allows a two-way exchange of data. You've probably encountered dynamic content on the Web before. Examples include:

Web search engines: When you enter a search engine's URL, the Web server delivers a Web page that includes a form in which you can enter a search term. When you click the search button, the term you typed is returned to the Web server, which processes the data in order to create a custom Web page that's delivered to your Web browser.
E-commerce sites: Clicking the "buy" button on a retailer's Web site causes your browser to request a URL that the retailer's Web server uses to register an entry into a "shopping cart." The retailer's Web server and your Web browser coordinate their actions in subsequent interactions to provide you with information on your purchase; provide the retailer with your address, credit card number, and so on; and confirm your purchase. The details vary greatly from one site to another, but at their core, the interactions are similar to those involved in a Web search engine: the Web server creates customized content, and your Web browser returns data to the Web server.
Personalized sites: Some sites provide personalized "logons" that allow the Web server to deliver the content you want to see, rather than generic content. For instance, sites such as Slashdot (http://slashdot.org) allow you to register and provide preferences for the type and amount of data they're to display. These sites usually work by storing a cookie on your computer to uniquely identify you in the future. When you request a Web page, your browser returns the cookie, so the Web server can generate custom content. (E-commerce sites and even search engines may also use cookies.)

These are only a few examples of dynamic Web sites. The possibilities are limited mainly by the Web site designer's imagination. The key difference between dynamic and static sites from a Web server's point of view is that the dynamic sites require creating HTML (or other document formats) on the fly, based on input sent by the user in a previous interaction or in the URL. To do this, several mechanisms may be used:

Web forms: A Web form is a Web page that provides buttons, data entry fields, lists, and other mechanisms for entering data. A search engine usually provides a small text-entry field and a button to click to begin a search. E-commerce sites usually provide a wider array of forms, including text-entry fields and selection lists (for entry of your state, for instance). Web forms are encoded in HTML, which may be generated statically or dynamically. Even fixed Web forms ultimately feed back into a dynamic system.
CGI scripts: The Common Gateway Interface (CGI) is a common tool for interfacing programs that generate dynamic content to a Web server. These programs may be written in just about any language. (In fact, they can be written in compiled languages rather than scripting languages, although scripting languages like Perl are very popular for writing CGI scripts.) The Web server calls the CGI script when the user enters an appropriate URL. The CGI script can then accept input back from the user, call other programs on the Web server computer, and generate an appropriate reply page.
SSIs: Server Side Includes (SSIs) are a basic form of dynamic content. Instead of generating a complete Web page dynamically, as CGI scripts do, SSIs are used to modify a template page. This makes SSIs less flexible than CGI scripts, but they're useful for performing tasks like embedding the current date in an otherwise static Web page.

There are other forms of dynamic content available. In particular, an assortment of alternatives to CGI scripts exist, but CGI scripts remain an extremely popular way to generate dynamic content. Note that CGI scripts may create pages that contain Web forms; the two aren't so much competing forms of dynamic content as they are two aspects to one system for data exchange between client and server.

Setting Script and Form Options

In order to use CGI scripts, you must tell Apache that you want to use them. Apache must be configured to run the CGI script when its filename is provided in a URL, to process the script's output to be sent back to the Web browser, and possibly to receive return data from the Web browser for return to the CGI script for another round. Apache's role is that of a middle man, and fortunately, its configuration is not too difficult. There are two things you must do. First, you must enable CGI features. Second, you must tell Apache which types of incoming requests may be treated as CGI requests.

Adding CGI support involves loading the CGI module, thus:

 LoadModule cgi_module         lib/apache/mod_cgi.so

If your CGI support is compiled into the main Apache binary (and sometimes if you load it as a module), you may need to use AddModule to activate it, thus:

 AddModule mod_cgi.c

Once this is done, Apache has the basic tools it needs to handle CGI scripts. This leaves enabling CGI support for particular files or directories. There are several ways you can accomplish this task:

ScriptAlias: This directive performs two tasks. First, it tells Apache to run CGI scripts within a specific directory. Second, it maps a physical directory to a directory that might be specified in a URL. For instance, ScriptAlias /scripts/ "/home/httpd/cgi-bin/" maps the /home/httpd/cgi-bin directory to the /scripts directory in a URL. With this configuration, a user who enters http://www.threeroomco.com/scripts/test.pl causes the /home/httpd/cgi-bin/test.pl CGI script to run. Many Apache installations include a default configuration along these lines; check your httpd.conf file for one. This configuration also relies upon the presence of the mod_alias module. This module is usually included by default, but you should check this detail if you have problems.
Options +ExecCGI: You can provide the +ExecCGI parameter to the Options directive to enable execution of CGI scripts. You probably should not use this feature as a system-wide option; instead, apply it only to specific subdirectories (within a <Directory> directive).
.htaccess: You can control various types of access to individual directories by placing .htaccess files in those directories. If the file contains an Options +ExecCGI line, Apache will run CGI scripts it finds in the directory. For this configuration to work, though, the httpd.conf file must include an AllowOverride Options line, at least for the directory in question.
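As a rough sketch, the three approaches above might appear in httpd.conf along the following lines. The /home/httpd/html/tools and /home/httpd/html/shared paths are hypothetical, and the AddHandler line is an assumption about a typical setup: outside a ScriptAlias directory, Apache generally also needs to be told which filename extensions denote CGI scripts.

```apache
# Approach 1: ScriptAlias. Requests for /scripts/... run programs
# stored in /home/httpd/cgi-bin/.
ScriptAlias /scripts/ "/home/httpd/cgi-bin/"

# Approach 2: Options +ExecCGI, limited to a single directory.
# (Hypothetical path; AddHandler marks .pl files as CGI scripts.)
<Directory "/home/httpd/html/tools">
    Options +ExecCGI
    AddHandler cgi-script .pl
</Directory>

# Approach 3: allow a .htaccess file in this directory to set Options.
# The .htaccess file itself would then contain: Options +ExecCGI
<Directory "/home/httpd/html/shared">
    AllowOverride Options
</Directory>
```

In practice you would normally pick one of these approaches per directory rather than combining all three.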

WARNING

The Options +ExecCGI and AllowOverride Options methods are both potentially dangerous if applied sloppily, because users may then be able to write scripts that open the system up to security breaches. For this reason, most distributions' default configurations disable these methods in users' directories, and often in other directories as well.

Many distributions' default Apache configurations permit CGI scripts via the ScriptAlias option, often from a directory called /home/httpd/cgi-bin, using the /cgi-bin URL component. This configuration is convenient, because you can drop files in the CGI directory and have them be treated as CGI scripts with little additional fuss. One detail to which you must attend is permissions. In particular, the CGI scripts are scripts. Like other Linux scripts and programs, they must have appropriate permissions to be run. Particularly if you've written a script yourself or downloaded one from a Web or FTP site, you may need to type chmod a+x script-name, where script-name is the script's name, to set its permissions appropriately.
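If you install scripts from a program rather than by hand, the same permission change can be made in code. The following is a minimal Python sketch of the effect of chmod a+x; the function name make_executable is a hypothetical helper, not part of Apache or of any distribution's tools.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for user, group, and others (like chmod a+x).

    Returns the file's resulting permission bits.
    """
    mode = os.stat(path).st_mode
    # OR in the three execute bits without disturbing the existing ones.
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return stat.S_IMODE(os.stat(path).st_mode)
```

For example, make_executable("/home/httpd/cgi-bin/test.pl") would prepare the script from the earlier ScriptAlias example, assuming you have permission to change that file.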

Writing CGI Scripts

CGI scripts, like other scripts, are computer programs. A complete guide to writing them is well beyond the scope of this chapter. This section therefore provides just a few pointers to help get you started if you already know something about scripting. If you need more information, consult the "Dynamic Content with CGI" Web page (http://httpd.apache.org/docs/howto/cgi.html) for a basic introduction, or a book on CGI scripting for more detail.

CGI scripts accept standard input and generate standard output. Therefore, any text that you want your user to see can be output as if to the console, using standard output commands. The trick to creating output is to remember that the user is reading the output on a Web browser. Thus, your CGI script should generate HTML, or occasionally some other format that's friendly to Web browsers. (For instance, you might dynamically create a graphics file.)

Preceding the bulk of the HTML output, your CGI script should generate a content type header that lists the document's MIME type. This should normally resemble the following:

 Content-type: text/html\r\n\r\n

This example specifies a text/html MIME type, which is usually what you want to create. (The \r\n\r\n portion of the line creates two new-lines, which is a necessary part of the specification.) Precisely how you create this output depends upon your scripting language, of course. Bringing this together with a normal script header line and a simple program might produce something like Listing 20.1, which shows a Perl script to display a line of text. If you type this program into a file in a CGI scripting directory, give it execute permissions, and specify its URL in a Web browser, you should see the text Hello, Web appear in the Web browser's window.

Listing 20.1 A simple Perl CGI script

 #!/usr/bin/perl
 print "Content-type: text/html\r\n\r\n";
 print "Hello, Web";
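Because CGI scripts can be written in just about any language, the same header-then-body structure carries over directly. As an illustration only, here is a rough Python equivalent of Listing 20.1; the build_response function name is hypothetical, and the interpreter path on the first line is an assumption about your system.

```python
#!/usr/bin/env python3
import sys


def build_response(body):
    """Return a complete CGI response: header, blank line, then the body."""
    # The header must come first and must end with \r\n\r\n.
    header = "Content-type: text/html\r\n\r\n"
    return header + body


if __name__ == "__main__":
    # Whatever a CGI script writes to standard output goes to the browser.
    sys.stdout.write(build_response("Hello, Web"))
```

Like the Perl version, this file would need execute permissions and a location in a CGI-enabled directory before Apache would run it.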

CGI script input is a bit trickier. Your script may receive input if it has generated output that displays a form on the Web browser. The input to the CGI script from the user's entering data in the form appears as field/value pairs. Each of these pairs uses an equals sign (=) to separate the field name from its returned value, and ampersands (&) separate different field/value pairs. For instance, your CGI script might see input like this:

 city=Oberlin&state=OH&zip=44074

Parsing and using such input is one of the strengths of certain scripting languages, including Perl, hence the popularity of Perl as a CGI scripting language.
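To make the field/value format concrete, here is a minimal Python sketch that splits such input into its parts using the standard urllib.parse module. The function names parse_form and read_raw_input are hypothetical. The sketch also assumes the usual CGI convention that POST data arrives on standard input (with its length in the CONTENT_LENGTH environment variable), whereas GET data arrives in the QUERY_STRING environment variable.

```python
import os
import sys
from urllib.parse import parse_qs


def parse_form(raw):
    """Split 'field=value&field=value' input into a dict of single values."""
    # parse_qs also decodes URL escapes, such as %20 or '+' for a space.
    parsed = parse_qs(raw, keep_blank_values=True)
    # parse_qs returns a list per field; keep the last value for simplicity.
    return {field: values[-1] for field, values in parsed.items()}


def read_raw_input(environ=os.environ, stdin=sys.stdin):
    """Fetch the raw form data that the Web server hands to a CGI script."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # Read only as much as the server says was sent.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length)
    return environ.get("QUERY_STRING", "")
```

Calling parse_form("city=Oberlin&state=OH&zip=44074") yields a dictionary with the keys city, state, and zip, which the script can then use when building its reply page.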

Scripting Security Measures

One of the dangers of using CGI scripts is that you are giving anybody who can reach your Web server with a Web browser the right to run programs on your computer. Of course, this is true of any server, in the sense that outsiders can use the server itself. A CGI-enabled Web server, though, opens the door substantially wider, because every CGI script is a potential security threat. The programmers who write servers usually take great care to ensure that the server doesn't suffer from any security flaws. Even with careful attention to this detail, security problems occasionally do crop up. A Web server's CGI scripting tools are often used by administrators who are not as skilled at programming as are those who write servers, and the results can be disastrous.

Fortunately, there are certain measures you can take to help minimize the risk. Most importantly, you should double-check your User and Group settings in httpd.conf . Apache runs CGI scripts with the permissions specified by these options, so if you use an account with few privileges for this purpose, you minimize the damage that can be done if your script contains a flaw. Ideally, you should create a user and group only for Apache, and configure the account to not accept remote logins by any other means. This isn't a panacea, though; even with limited access, a buggy script could give a miscreant a foothold that could be used to create greater access to the server, when combined with other security problems.
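For instance, the relevant httpd.conf lines might resemble the following fragment. The account and group name apache is an assumption; distributions use a variety of names for this purpose, and the account must already exist on the system.

```apache
# Run Apache, and therefore its CGI scripts, as a dedicated
# low-privilege account. The names here are assumptions.
User apache
Group apache
```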

You can also use existing scripting libraries. This will both ease the development task and reduce the risk that your code contains fundamental security flaws. You can find scripting libraries, such as CGI.pm and CGI::Lite, on scripting Web sites like http://www.cpan.org.

In the event that your Web server is compromised, you should take steps to ensure that it can do minimal damage. For instance, you should disable unnecessary servers, and restrict access from the Web server computer to other computers on your network. Part IV, Network Security and Router Functions, provides information on many general-purpose security measures you can take.