Section 16.4. Climbing the CGI Learning Curve

16.4. Climbing the CGI Learning Curve

Now that we've looked at setup issues, it's time to get into concrete programming details. This section is a tutorial that introduces CGI coding one step at a timefrom simple, noninteractive scripts to larger programs that utilize all the common web page user input devices (what we called widgets in the Tkinter GUI chapters in Part III).

Along the way, we'll also explore the core ideas behind server-side scripting. We'll move slowly at first, to learn all the basics; the next chapter will use the ideas presented here to build up larger and more realistic web site examples. For now, let's work through a simple CGI tutorial, with just enough HTML thrown in to write basic server-side scripts.

16.4.1. A First Web Page

As mentioned, CGI scripts are intimately bound up with HTML, so let's start with a simple HTML page. The file tutor0.html, shown in Example 16-2, defines a bona fide, fully functional web pagea text file containing HTML code, which specifies the structure and contents of a simple web page.

Example 16-2. PP3E\Internet\Web\tutor0.html

 <HTML> <TITLE>HTML 101</TITLE> <BODY> <H1>A First HTML page</H1> <P>Hello, HTML World!</P> </BODY></HTML>

If you point your favorite web browser to the Internet address of this file, you should see a page like that shown in Figure 16-2. This figure shows the Firefox browser at work on the address http://localhost/tutor0.html (type this into your browser's address field), and it assumes that the local web browser described in the prior section is running; other browsers render the page similarly. Since this is a static HTML file, you'll get the same result if you simply click on the file's icon on most platforms.

Figure 16-2. A simple web page from an HTML file

To truly understand how this little file does its work, you need to know something about HTML syntax, Internet addresses, and file permission rules. Let's take a quick first look at each of these topics before we move on to the next example.

16.4.1.1. HTML basics

I promised that I wouldn't teach much HTML in this book, but you need to know enough to make sense of examples. In short, HTML is a descriptive markup language, based on tags items enclosed in <> pairs. Some tags stand alone (e.g., <HR> specifies a horizontal rule). Others appear in begin/end pairs in which the end tag includes an extra slash.

For instance, to specify the text of a level-one header line, we write HTML code of the form <H1>text</H1>; the text between the tags shows up on the web page. Some tags also allow us to specify options. For example, a tag pair like <A href="address">text</A> specifies a hyperlink: pressing the link's text in the page directs the browser to access the Internet address (URL) listed in the href option.

It's important to keep in mind that HTML is used only to describe pages: your web browser reads it and translates its description to a web page with headers, paragraphs, links, and the like. Notably absent are both layout informationthe browser is responsible for arranging components on the pageand syntax for programming logicthere are no if statements, loops, and so on. Also, Python code is nowhere to be found in Example 16-2; raw HTML is strictly for defining pages, not for coding programs or specifying all user interface details.

HTML's lack of user interface control and programmability is both a strength and a weakness. It's well suited to describing pages and simple user interfaces at a high level. The browser, not you, handles physically laying out the page on your screen. On the other hand, HTML does not directly support full-blown GUIs and requires us to introduce CGI scripts (and other technologies) to web sites, in order to add dynamic programmability to otherwise static HTML.

16.4.1.2. Internet addresses (URLs)

Once you write an HTML file, you need to put it somewhere a web browser can reference it. If you are using the locally running Python web server described earlier, this becomes trivial: use a URL of the form http://localhost/file.html to access web pages, and http://localhost/cgi-bin/file.py to name CGI scripts. This is implied by the fact that the web server script serves pages and scripts from the directory in which it is.

On other servers, URLs may be more complex. Like all HTML files, tutor0.html must be stored in a directory on the server machine, from which the resident web server program allows browsers to fetch pages. For example, on the server used for the second edition of this book, the page's file must be stored in or below the public_html directory of my personal home directorythat is, somewhere in the directory tree rooted at /home/lutz/public_html. The complete Unix pathname of this file on the server is:

 /home/lutz/public_html/tutor0.html

This path is different from its PP3E\Internet\Web location in the book's examples distribution, as given in the example file listing's title. When referencing this file on the client, though, you must specify its Internet address, sometimes called a URL, instead. The following URL was used to load the remote page from the server:

 http://starship.python.net/~lutz/tutor0.html

The server maps this URL to the Unix pathname automatically. In general, URL strings like this one are composed as the concatenation of multiple parts:

Protocol name:
http

The protocol part of this URL tells the browser to communicate with the HTTP (i.e., web) server program on the server machine, using the HTTP message protocol. URLs used in browsers can also name different protocolsfor example, ftp:// to reference a file managed by the FTP protocol and server, file:// to reference a file on the local machine, telnet to start a Telnet client session, and so on.

Server machine name and port:
starship.python.net

A URL also names the target server machine's domain name or Internet Protocol (IP) address following the protocol type. Here, we list the domain name of the server machine where the examples are installed; the machine name listed is used to open a socket to talk to the server. As usual, a machine name of localhost (or the equivalent IP address 127.0.0.1) here means the server is running on the same machine as the client.

Optionally, this part of the URL may also explicitly give the socket port on which the server is listening for connections, following a colon (e.g., starship.python.net:8000, or 127.0.0.1:80). For HTTP, the socket is usually connected to port number 80, so this is the default if the port is omitted. See Chapter 13 if you need a refresher on machine names and ports.

File path:
~lutz/tutor0.html

Finally, the URL gives the path to the desired file on the remote machine. The HTTP web server automatically translates the URL's file path to the file's true pathname: on the starship server, ~lutz is automatically translated to the public_html directory in my home directory. When using the Python-coded web server script in Example 16-1, files are mapped to the server's current working directory instead. URLs typically map to such files, but they can reference other sorts of items as well, and may name an executable CGI script to be run when accessed.

Query parameters (used in later examples)

URLs may also be followed by additional input parameters for CGI programs. When used, they are introduced by a ? and are typically separated by & characters. For instance, a string of the form ?name=bob&job=hacker at the end of a URL passes parameters named name and job to the CGI script named earlier in the URL, with values bob and hacker, respectively. As we'll discuss later in this chapter when we explore escaping rules, the parameters may sometimes be separated by ; characters instead, as in ?name=bob;job=hacker, though this form is less common.

These values are sometimes called URL query string parameters and are treated the same as form inputs by scripts. Technically speaking, query parameters may have other structures (e.g., unnamed values separated by +), but we will ignore additional options in this text; more on both parameters and input forms later in this tutorial.

To make sure we have a handle on URL syntax, let's pick apart another example that we will be using later in this chapter. In the following HTTP protocol URL:

 http://localhost:80/cgi-bin/languages.py?language=All

the components uniquely identify a server script to be run as follows:

The server name localhost means the web server is running on the same machine as the client; as explained earlier, this is the configuration we're using for our examples.
Port number 80 gives the socket port on which the web server is listening for connections (port 80 is the default if this part is omitted, so we will usually omit it).
The file path cgi-bin/languages.py gives the location of the file to be run on the server machine, within the directory where the server looks for referenced files.
The query string ?language=All provides an input parameter to the referenced script languages.py, as an alternative to user input in form fields (described later).

Although this covers most URLs you're likely to encounter in the wild, the full format of URLs is slightly richer:

 protocol://networklocation/path;parameters?querystring#fragment

For instance, the fragment part may name a section within a page (e.g., #part1). Moreover, each part can have formats of their own, and some are not used in all protocols. The ;parameters part is omitted for HTTP, for instance (it gives an explicit file type for FTP), and the networklocation part may also specify optional user login parameters for some protocol schemes (its full format is user:password@host:port for FTP and Telnet, but just host:port for HTTP). We used a complex FTP URL in Chapter 14, for example, which included a username and password, as well as a binary file type (the server may guess if no type is given):

 ftp://lutz:password@ftp.rmi.net/filename;type=i

We'll ignore additional URL formatting rules here. If you're interested in more details, you might start by reading the urlparse module's entry in Python's library manual, as well as its source code in the Python standard library. You may also notice that a URL you type to access a page looks a bit different after the page is fetched (spaces become + characters, % characters are added, and so on). This is simply because browsers must also generally follow URL escaping (i.e., translation) conventions, which we'll explore later in this chapter.

16.4.1.3. Using minimal URLs

Because browsers remember the prior page's Internet address, URLs embedded in HTML files can often omit the protocol and server names, as well as the file's directory path. If missing, the browser simply uses these components' values from the last page's address. This minimal syntax works for URLs embedded in hyperlinks and for form actions (we'll meet forms later in this tutorial). For example, within a page that was fetched from the directory dirpath on the server www.server.com, minimal hyperlinks and form actions such as:

 <A HREF="more.html"> <FORM ACTION="next.py"  ...>

are treated exactly as if we had specified a complete URL with explicit server and path components, like the following:

 <A HREF="http://www.server.com/dirpath/more.html"> <FORM ACTION="http://www.server.com/dirpath/next.py"  ...>

The first minimal URL refers to the file more.html on the same server and in the same directory from which the page containing this hyperlink was fetched; it is expanded to a complete URL within the browser. URLs can also employ Unix-style relative path syntax in the file path component. A hyperlink tag like <A HREF="../spam.gif">, for instance, names a GIF file on the server machine and parent directory of the file that contains this link's URL.

Why all the fuss about shorter URLs? Besides extending the life of your keyboard and eyesight, the main advantage of such minimal URLs is that they don't need to be changed if you ever move your pages to a new directory or serverthe server and path are inferred when the page is used; they are not hardcoded into its HTML. The flipside of this can be fairly painful: examples that do include explicit site names and pathnames in URLs embedded within HTML code cannot be copied to other servers without source code changes. Scripts and special HTML tags can help here, but editing source code can be error-prone.

The downside of minimal URLs is that they don't trigger automatic Internet connections when followed offline. This becomes apparent only when you load pages from local files on your computer. For example, we can generally open HTML pages without connecting to the Internet at all by pointing a web browser to a page's file that lives on the local machine (e.g., by clicking on its file icon). When browsing a page locally like this, following a fully specified URL makes the browser automatically connect to the Internet to fetch the referenced page or script. Minimal URLs, though, are opened on the local machine again; usually, the browser simply displays the referenced page or script's source code.

The net effect is that minimal URLs are more portable, but they tend to work better when running all pages live on the Internet. To make them easier to work with, the examples in this book will often omit the server and path components in URLs they contain. In this book, to derive a page or script's true URL from a minimal URL, imagine that the string:

 http://localhost/

appears before the filename given by the URL. Your browser will, even if you don't.

16.4.1.4. HTML file permission constraints

One install pointer before we move on: if you want to use a different server and machine, it may be necessary on some platforms to grant web page files and their directories world-readable permission. That's because they are loaded by arbitrary people over the Web (often by someone named "nobody," who we'll introduce in a moment).

An appropriate chmod command can be used to change permissions on Unix-like machines. For instance, a chmod 755 filename shell command usually suffices; it makes filename readable and executable by everyone, and writable by you only.^[*] These directory and file permission details are typical, but they can vary from server to server. Be sure to find out about the local server's conventions if you upload HTML files to a remote site.

^[*] These are not necessarily magic numbers. On Unix machines, mode 755 is a bit mask. The first 7 simply means that you (the file's owner) can read, write, and execute the file (7 in binary is 111each bit enables an access mode). The two 5s (binary 101) say that everyone else (your group and others) can read and execute (but not write) the file. See your system's manpage on the chmod command for more details.

16.4.2. A First CGI Script

The HTML file we saw in the prior section is just thatan HTML file, not a CGI script. When referenced by a browser, the remote web server simply sends back the file's text to produce a new page in the browser. To illustrate the nature of CGI scripts, let's recode the example as a Python CGI program, as shown in Example 16-3.

Example 16-3. PP3E\Internet\Web\cgi-bin\tutor0.py

 #!/usr/bin/python ####################################################### # runs on the server, prints HTML to create a new page; # url=http://localhost/cgi-bin/tutor0.py ####################################################### print "Content-type: text/html\n" print "<TITLE>CGI 101</TITLE>" print "<H1>A First CGI script</H1>" print "<P>Hello, CGI World!</P>"

This file, tutor0.py, makes the same sort of page as Example 16-2 if you point your browser at itsimply replace .html with .py in the URL, and add the cgi-bin subdirectory name to the path to yield its address, http://localhost/cgi-bin/tutor0.py.

But this time it's a very different kind of animalit is an executable program that is run on the server in response to your access request. It's also a completely legal Python program, in which the page's HTML is printed dynamically, instead of being precoded in a static file. In fact, little is CGI-specific about this Python program; if run from the system command line, it simply prints HTML instead of generating a browser page:

 C:\...\PP3E\Internet\Web\cgi-bin>python tutor0.py Content-type: text/html <TITLE>CGI 101</TITLE> <H1>A First CGI script</H1> <P>Hello, CGI World!</P>

When run by the HTTP server program on a web server machine, however, the standard output stream is tied to a socket read by the browser on the client machine. In this context, all the output is sent across the Internet to your browser. As such, it must be formatted per the browser's expectations.

In particular, when the script's output reaches your browser, the first printed line is interpreted as a header, describing the text that follows. There can be more than one header line in the printed response, but there must always be a blank line between the headers and the start of the HTML code (or other data).

In this script, the first header line tells the browser that the rest of the transmission is HTML text (text/html), and the newline character (\n) at the end of the first print statement generates an extra line feed in addition to the one that the print statement generates itself. The net effect is to insert a blank line after the header line. The rest of this program's output is standard HTML and is used by the browser to generate a web page on a client, exactly as if the HTML lived in a static HTML file on the server.^[*]

^[*] Notice that the script does not generate the enclosing <HEAD> and <BODY> tags included in the static HTML file of the prior section. As mentioned in Chapter 2, strictly speaking, it shouldHTML without such tags is technically invalid. But because all commonly used browsers simply ignore the omission, we'll take some liberties with HTML syntax in this book. If you need to care about such things, consult HTML references for more formal details.

CGI scripts are accessed just like HTML files: you either type the full URL of this script into your browser's address field, or click on the tutor0.py link line in the examples root page of Figure 16-1 (which follows a minimal hyperlink that resolves to the script's full URL). Figure 16-3 shows the result page generated if you point your browser at this script.

Figure 16-3. A simple web page from a CGI script

16.4.2.1. Installing CGI scripts

If you are running the local web server described at the start of this chapter, no extra installation steps are required to make this example work, and you can safely skip most of this section. If you want to put CGI scripts on another server, though, there are a few pragmatic details you may need to know about. This section provides a brief overview of common CGI configuration details for reference.

Like HTML files, CGI scripts are simple text files that you can either create on your local machine and upload to the server by FTP, or write with a text editor running directly on the server machine (perhaps using a Telnet client). However, because CGI scripts are run as programs, they have some unique installation requirements that differ from simple HTML files. In particular, they usually must be stored and named specially, and they must be configured as programs that are executable by arbitrary users. Depending on your needs, CGI scripts also may require help finding imported modules and may need to be converted to the server platform's text file format after being uploaded. Let's look at each install constraint in more depth:

Directory and filename conventions

First, CGI scripts need to be placed in a directory that your web server recognizes as a program directory, and they need to be given a name that your server recognizes as a CGI script. In the local web server we're using in this chapter, scripts need to be placed in a special cgi-bin subdirectory and be named with a .py extension. On the server used for this book's second edition, CGI scripts instead were stored in the user's public_html directory just like HTML files, but they required a filename ending in a .cgi, not a .py. Some servers may allow other suffixes and program directories; this varies widely and can sometimes be configured per server or per user.

Execution conventions

Because they must be executed by the web server on behalf of arbitrary users on the Web, CGI script files may also need to be given executable file permissions to mark them as programs, and be made executable by others. Again, a shell command chmod 0755 filename does the trick on most servers.

Under some servers, CGI scripts also need the special #! line at the top, to identify the Python interpreter that runs the file's code. The text after the #! in the first line simply gives the directory path to the Python executable on your server machine. See Chapter 3 for more details on this special first line, and be sure to check your server's conventions for more details on non-Unix platforms.

Some servers may expect this line, even outside Unix. Most of the CGI scripts in this book include the #! line just in case they will ever be run on Unix-like platforms; under our locally running web server on Windows, this first line is simply ignored as a Python comment.

One subtlety worth noting: as we saw earlier in the book, the special first line in executable text files can normally contain either a hardcoded path to the Python interpreter (e.g., #!/usr/bin/python) or an invocation of the env program (e.g., #!/usr/bin/env python), which deduces where Python lives from environment variable settings (i.e., your $PATH). The env TRick is less useful in CGI scripts, though, because their environment settings are those of the user "nobody" (not your own), as explained in the next paragraph.

Module search path configuration (optional)

Some HTTP servers may run CGI scripts with the username "nobody" for security reasons (this limits the user's access to the server machine). That's why files you publish on the Web must have special permission settings that make them accessible to other users. It also means that some CGI scripts can't rely on the Python module search path to be configured in any particular way. As we've seen, the module path is normally initialized from the user's PYTHONPATH setting and .pth files, plus defaults. But because CGI scripts are run by the user "nobody," PYTHONPATH may be arbitrary when a CGI script runs.

Before you puzzle over this too hard, you should know that this is often not a concern in practice. Because Python usually searches the current directory for imported modules by default, this is not an issue if all of your scripts and any modules and packages they use are stored in your web directory. But if the module lives elsewhere, you may need to modify the sys.path list in your scripts to adjust the search path manually before importsfor instance, with sys.path.append(dirname) calls, index assignments, and so on.

End-of-line conventions (optional)

On some Unix (and Linux) servers, you might also have to make sure that your script text files follow the Unix end-of-line convention (\n), not DOS (\r\n). This isn't an issue if you edit and debug right on the server (or on another Unix machine) or FTP files one by one in text mode. But if you edit and upload your scripts from a PC to a Unix server in a tar file (or in FTP binary mode), you may need to convert end-of-lines after the upload. For instance, the server that was used for the second edition of this text returns a default error page for scripts whose end-of-lines are in DOS format. See Chapter 7 for automated end-of-line converter scripts.

Unbuffered output streams (optional)

Under some servers, the print statement may buffer its output. If you have a long-running CGI script, to avoid making the user wait to see results, you may wish to manually flush your printed text (call sys.stdout.flush( )) or run your Python scripts in unbuffered mode. Recall from Chapter 5 that you can make streams unbuffered by running with the -u command-line flag or by setting your PYTHONUNBUFFERED environment variable to a nonempty value.

To use -u in the CGI world, try using a first line like #!/usr/bin/python -u. In typical usage, output buffering is not usually a factor. On some servers and clients, this may be a resolution for empty reply pages, or premature end-of-script header errorsthe client may time out before the buffered output stream is sent (though more commonly, these cases reflect genuine program errors in your script).

This installation process may sound a bit complex at first glance, but much of it is server-dependent, and it's not bad once you've worked through it on your own. It's only a concern at install time and can usually be automated to some extent with Python scripts run on the server. To summarize, most Python CGI scripts are text files of Python code, which:

Are named according to your web server's conventions (e.g., file.py)
Are stored in a directory recognized by your web server (e.g., cgi-bin/)
Are given executable file permissions if required (e.g., chmod 755 file.py)
May require the special #!pythonpath line at the top for some servers
Configure sys.path only if needed to see modules in other directories
Use Unix end-of-line conventions, if your server rejects DOS format
Flush output buffers if required, or to send portions of the reply periodically

Even if you must use a server machine configured by someone else, most of the machine's conventions should be easy to root out during a normal debugging cycle. As usual, you should consult the conventions for any machine to which you plan to copy these example files.

16.4.2.2. Finding Python on remote servers

One last install pointer: even though Python doesn't have to be installed on any clients in the context of a server-side web application, it does have to exist on the server machine where your CGI scripts are expected to run. If you're running your own server with either the webserver.py script we met earlier, or an open source server such as Apache, this is a nonissue.

But if you are using a web server that you did not configure yourself, you must be sure that Python lives on that machine. Moreover, you need to find where it is on that machine so that you can specify its path in the #! line at the top of your script. If you are not sure if or where Python lives on your server machine, here are some tips:

Especially on Unix systems, you should first assume that Python lives in a standard place (e.g., /usr/local/bin/python): type python in a shell window and see if it works. Chances are that Python already lives on such machines. If you have Telnet access on your server, a Unix find command starting at /usr may help.
If your server runs Linux, you're probably set to go. Python ships as a standard part of Linux distributions these days, and many web sites and Internet Service Providers (ISPs) run the Linux operating system; at such sites, Python probably already lives at /usr/bin/python.
In other environments where you cannot control the server machine yourself, it may be harder to obtain access to an already installed Python. If so, you can relocate your site to a server that does have Python installed, talk your ISP into installing Python on the machine you're trying to use, or install Python on the server machine yourself.

If your ISP is unsympathetic to your need for Python and you are willing to relocate your site to one that is, you can find lists of Python-friendly ISPs by searching http://www.python.org. And if you choose to install Python on your server machine yourself, be sure to check out the freeze tool shipped with the Python source distribution (in the Tools directory). With freeze, you can create a single executable program file that contains the entire Python interpreter, as well as all the standard library modules. Such a frozen interpreter can be uploaded to your web account by FTP in a single step, and it won't require a full-blown Python installation on the server. Also see the public domain Installer and Py2Exe systems, which can similarly produce a frozen Python binary.

16.4.3. Adding Pictures and Generating Tables

Let's get back to writing server-side code. As anyone who's ever surfed the Web knows, web pages usually consist of more than simple text. Example 16-4 is a Python CGI script that prints an <IMG> HTML tag in its output to produce a graphic image in the client browser. This example isn't very Python-specific, but note that just as for simple HTML files, the image file (ppsmall.gif, one level up from the script file) lives on and is downloaded from the server machine when the browser interprets the output of this script to render the reply page.

Example 16-4. PP3E\Internet\Web\cgi-bin\tutor1.py

 #!/usr/bin/python text = """Content-type: text/html <TITLE>CGI 101</TITLE> <H1>A Second CGI script</H1> <HR> <P>Hello, CGI World!</P> <IMG src="/books/2/726/1/html/2/../ppsmall.gif" BORDER=1 ALT=[image]> <HR> """ print text

Notice the use of the triple-quoted string block here; the entire HTML string is sent to the browser in one fell swoop, with the print statement at the end. Be sure that the blank line between the Content-type header and the first HTML is truly blank in the string (it may fail in some browsers if you have any spaces or tabs on that line). If both client and server are functional, a page that looks like Figure 16-4 will be generated when this script is referenced and run.

Figure 16-4. A page with an image generated by tutor1.py

So far, our CGI scripts have been putting out canned HTML that could have just as easily been stored in an HTML file. But because CGI scripts are executable programs, they can also be used to generate HTML on the fly, dynamicallyeven, possibly, in response to a particular set of user inputs sent to the script. That's the whole purpose of CGI scripts, after all. Let's start using this to better advantage now, and write a Python script that builds up response HTML programmatically, listed in Example 16-5.

Example 16-5. PP3E\Internet\Web\cgi-bin\tutor2.py

 #!/usr/bin/python print """Content-type: text/html <TITLE>CGI 101</TITLE> <H1>A Third CGI script</H1> <HR> <P>Hello, CGI World!</P> <table border=1> """ for i in range(5):     print "<tr>"     for j in range(4):         print "<td>%d.%d</td>" % (i, j)     print "</tr>" print """ </table> <HR> """

Despite all the tags, this really is Python codethe tutor2.py script uses triple-quoted strings to embed blocks of HTML again. But this time, the script also uses nested Python for loops to dynamically generate part of the HTML that is sent to the browser. Specifically, it emits HTML to lay out a two-dimensional table in the middle of a page, as shown in Figure 16-5.

Figure 16-5. A page with a table generated by tutor2.py

Each row in the table displays a "row.column" pair, as generated by the executing Python script. If you're curious how the generated HTML looks, select your browser's View Source option after you've accessed this page. It's a single HTML page composed of the HTML generated by the first print in the script, then the for loops, and finally the last print. In other words, the concatenation of this script's output is an HTML document with headers.

16.4.3.1. Table tags

The script in Example 16-5 generates HTML table tags. Again, we're not out to learn HTML here, but we'll take a quick look just so that you can make sense of this book's examples. Tables are declared by the text between <table> and </table> tags in HTML. Typically, a table's text in turn declares the contents of each table row between <tr> and </tr> tags and each column within a row between <td> and </td> tags. The loops in our script build up HTML to declare five rows of four columns each by printing the appropriate tags, with the current row and column number as column values.

For instance, here is part of the script's output, defining the first two rows (to see the full output, run the script standalone from a system command line, or select your browser's View Source option):

 <table border=1> <tr> <td>0.0</td> <td>0.1</td> <td>0.2</td> <td>0.3</td> </tr> <tr> <td>1.0</td> <td>1.1</td> <td>1.2</td> <td>1.3</td> </tr> . . . </table>

Other table tags and options let us specify a row title (<th>), layout borders, and so on. We'll use more table syntax to lay out forms in a uniform fashion later in this tutorial.

16.4.4. Adding User Interaction

CGI scripts are great at generating HTML on the fly like this, but they are also commonly used to implement interaction with a user typing at a web browser. As described earlier in this chapter, web interactions usually involve a two-step process and two distinct web pages: you fill out an input form page and press Submit, and a reply page eventually comes back. In between, a CGI script processes the form input.

16.4.4.1. Submission page

That description sounds simple enough, but the process of collecting user inputs requires an understanding of a special HTML tag, <form>. Let's look at the implementation of a simple web interaction to see forms at work. First, we need to define a form page for the user to fill out, as shown in Example 16-6.

Example 16-6. PP3E\Internet\Web\tutor3.html

 <html> <title>CGI 101</title> <body> <H1>A first user interaction: forms</H1> <hr> <form method=POST action="http://localhost/cgi-bin/tutor3.py">     <P><B>Enter your name:</B>     <P><input type=text name=user>     <P><input type=submit> </form> </body></html>

tutor3.html is a simple HTML file, not a CGI script (though its contents could be printed from a script as well). When this file is accessed, all the text between its <form> and </form> tags generates the input fields and Submit button shown in Figure 16-6.

Figure 16-6. A simple form page generated by tutor3.html

16.4.4.2. More on form tags

We won't go into all the details behind coding HTML forms, but a few highlights are worth underscoring. The following occurs within a form's HTML code:

Form handler action

The form's action option gives the URL of a CGI script that will be invoked to process submitted form data. This is the link from a form to its handler programin this case, a program called tutor3.py in the cgi-bin subdirectory of the locally running server's working directory. The action option is the equivalent of command options in Tkinter buttonsit's where a callback handler (here, a remote handler script) is registered to the browser and server.

Input fields

Input controls are specified with nested <input> tags. In this example, input tags have two key options. The type option accepts values such as text for text fields and submit for a Submit button (which sends data to the server and is labeled "Submit Query" by default). The name option is the hook used to identify the entered value by key, once all the form data reaches the server. For instance, the server-side CGI script we'll see in a moment uses the string user as a key to get the data typed into this form's text field.

As we'll see in later examples, other input tag options can specify initial values (value=X), display-only mode (readonly), and so on. As we'll also see later, other input type option values may transmit hidden data that embeds state information in pages (type=hidden), reinitializes fields (type=reset), or makes multiple-choice buttons (type=checkbox).

Submission method: get and post

Forms also include a method option to specify the encoding style to be used to send data over a socket to the target server machine. Here, we use the post style, which contacts the server and then ships it a stream of user input data in a separate transmission over the socket.

An alternative get style ships input information to the server in a single transmission step by appending user inputs to the query string at the end of the URL used to invoke the script, usually after a ? character. Query parameters were introduced earlier when we met URLs; we will put them to use later in this section.

With get, inputs typically show up on the server in environment variables or as arguments in the command line used to start the script. With post, they must be read from standard input and decoded. Because the get method appends inputs to URLs, it allows users to bookmark actions with parameters for later submission (e.g., a link to a retail site, together with the name of a particular item); post is very generally meant for sending data that is to be submitted once (e.g., comment text).

The get method is usually considered more efficient, but it may be subject to length limits in the operating system and is less secure (parameters may be recorded in server logs, for instance). post can handle larger inputs and may be more secure in some scenarios, but it requires an extra transmission. Luckily, Python's cgi module transparently handles either encoding style, so our CGI scripts don't need to know or care which is used.

Notice that the action URL in this example's form spells out the full address for illustration. Because the browser remembers where the enclosing HTML page came from, it works the same with just the script's filename, as shown in Example 16-7.

Example 16-7. PP3E\Internet\Web\tutor3-minimal.html

 <html> <title>CGI 101</title> <body> <H1>A first user interaction: forms</H1> <hr> <form method=POST action="cgi-bin/tutor3.py">     <P><B>Enter your name:</B>     <P><input type=text name=user>     <P><input type=submit> </form> </body></html>

It may help to remember that URLs embedded in form action tags and hyperlinks are directions to the browser first, not to the script. The tutor3.py script itself doesn't care which URL form is used to trigger itminimal or complete. In fact, all parts of a URL through the script filename (and up to URL query parameters) are used in the conversation between browser and HTTP server, before a CGI script is ever spawned. As long as the browser knows which server to contact, the URL will work.

On the other hand, URLs submitted outside of a page (e.g., typed into a browser's address field or sent to Python's urllib module) usually must be completely specified, because there is no notion of a prior page.

16.4.4.3. Response script

So far, we've created only a static page with an input field. But the Submit button on this page is loaded to work magic. When pressed, it triggers the possibly remote program whose URL is listed in the form's action option, and passes this program the input data typed by the user, according to the form's method encoding style option. On the server, a Python script is started to handle the form's input data while the user waits for a reply on the client, as shown in Example 16-8.

Example 16-8. PP3E\Internet\Web\cgi-bin\tutor3.py

 #!/usr/bin/python ####################################################### # runs on the server, reads form input, prints HTML; # url=http://server-name/cgi-bin/tutor3.py ####################################################### import cgi form = cgi.FieldStorage( )            # parse form data print "Content-type: text/html"        # plus blank line html = """ <TITLE>tutor3.py</TITLE> <H1>Greetings</H1> <HR> <P>%s</P> <HR>""" if not form.has_key('user'):     print html % "Who are you?" else:     print html % ("Hello, %s." % form['user'].value)

As before, this Python CGI script prints HTML to generate a response page in the client's browser. But this script does a bit more: it also uses the standard cgi module to parse the input data entered by the user on the prior web page (see Figure 16-6).

Luckily, this is automatic in Python: a call to the standard library cgi module's FieldStorage class automatically does all the work of extracting form data from the input stream and environment variables, regardless of how that data was passedin a post style stream or in get style parameters appended to the URL. Inputs sent in both styles look the same to Python scripts.

Scripts should call cgi.FieldStorage only once and before accessing any field values. When it is called, we get back an object that looks like a dictionaryuser input fields from the form (or URL) show up as values of keys in this object. For example, in the script, form['user'] is an object whose value attribute is a string containing the text typed into the form's text field. If you flip back to the form page's HTML, you'll notice that the input field's name option was userthe name in the form's HTML has become a key we use to fetch the input's value from a dictionary. The object returned by FieldStorage supports other dictionary operations, toofor instance, the has_key method may be used to check whether a field is present in the input data.

Before exiting, this script prints HTML to produce a result page that echoes back what the user typed into the form. Two string-formatting expressions (%) are used to insert the input text into a reply string, and the reply string into the triple-quoted HTML string block. The body of the script's output looks like this:

 <TITLE>tutor3.py</TITLE> <H1>Greetings</H1> <HR> <P>Hello, King Arthur.</P> <HR>

In a browser, the output is rendered into a page like the one in Figure 16-7.

Figure 16-7. tutor3.py result for parameters in a form

16.4.4.4. Passing parameters in URLs

Notice that the URL address of the script that generated this page shows up at the top of the browser. We didn't type this URL itselfit came from the action tag of the prior page's form HTML. However, nothing is stopping us from typing the script's URL explicitly in our browser's address field to invoke the script, just as we did for our earlier CGI script and HTML file examples.

But there's a catch here: where does the input field's value come from if there is no form page? That is, if we type the CGI script's URL ourselves, how does the input field get filled in? Earlier, when we talked about URL formats, I mentioned that the get encoding scheme tacks input parameters onto the end of URLs. When we type script addresses explicitly, we can also append input values on the end of URLs, where they serve the same purpose as <input> fields in forms. Moreover, the Python cgi module makes URL and form inputs look identical to scripts.

For instance, we can skip filling out the input form page completely, and directly invoke our tutor3.py script by visiting a URL of this form (type this in your browser's address field):

 http://localhost/cgi-bin/tutor3.py?user=Brian

In this URL, a value for the input named user is specified explicitly, as if the user had filled out the input page. When called this way, the only constraint is that the parameter name user must match the name expected by the script (and hardcoded in the form's HTML). We use just one parameter here, but in general, URL parameters are typically introduced with a ? and are followed by one or more name=value assignments, separated by & characters if there is more than one. Figure 16-8 shows the response page we get after typing a URL with explicit inputs.

Figure 16-8. tutor3.py result for parameters in a URL

In fact, HTML forms that specify the get encoding style also cause inputs to be added to URLs this way. Try changing Example 16-6 to use method=GET, and submit the formthe name input in the form shows up as a query parameter in the reply page address field, just like the URL we manually entered in Figure 16-8. Forms can use the post or get style. Manually typed URLs with parameters use get.

Generally, any CGI script can be invoked either by filling out and submitting a form page or by passing inputs at the end of a URL. Although hand-coding parameters in URLs can become difficult for scripts that expect many complex parameters, other programs can automate the construction process.

When CGI scripts are invoked with explicit input parameters this way, it's not too difficult to see their similarity to functions, albeit ones that live remotely on the Net. Passing data to scripts in URLs is similar to keyword arguments in Python functions, both operationally and syntactically. In fact, in Chapter 18 we will meet a system called Zope that makes the relationship between URLs and Python function calls even more literal (URLs become more direct function calls).

Incidentally, if you clear out the name input field in the form input page (i.e., make it empty) and press Submit, the user name field becomes empty. More accurately, the browser may not send this field along with the form data at all, even though it is listed in the form layout HTML. The CGI script detects such a missing field with the dictionary has_key method and produces the page captured in Figure 16-9 in response.

Figure 16-9. An empty name field producing an error page

In general, CGI scripts must check to see whether any inputs are missing, partly because they might not be typed by a user in the form, but also because there may be no form at allinput fields might not be tacked onto the end of an explicitly typed URL. For instance, if we type the script's URL without any parameters at allby omitting the text from the ? and beyond, and visiting http://localhost/cgi-bin/tutor3.py with an explicitly entered URLwe get this same error response page. Since we can invoke any CGI through a form or URL, scripts must anticipate both scenarios.

16.4.4.5. Testing outside browsers with the module urllib

Once we understand how to send inputs to forms as query string parameters at the end of URLs like this, the Python urllib module we met in Chapters 2 and 14 becomes even more useful. Recall that this module allows us to fetch the reply generated for any URL address. When the URL names a simple HTML file, we simply download its contents. But when it names a CGI script, the effect is to run the remote script and fetch its output.

For example, we can trigger the script in Example 16-8 directly, without either going through the tutor3.html web page or typing a URL in a browser's address field:

 C:\Python24>python >>> from urllib import urlopen >>> reply = urlopen('http://localhost/cgi-bin/tutor3.py?user=Brian').read( ) >>> print reply <TITLE>tutor3.py</TITLE> <H1>Greetings</H1> <HR> <P>Hello, Brian.</P> <HR> >>> url   = 'http://localhost/cgi-bin/tutor3.py' >>> conn  = urlopen(url) >>> reply = conn.read( ) >>> print reply <TITLE>tutor3.py</TITLE> <H1>Greetings</H1> <HR> <P>Who are you?</P> <HR>

Recall from Chapter 14 that urllib.urlopen gives us a file object connected to the generated reply stream. Reading this file's output returns the HTML that would normally be intercepted by a web browser and rendered into a reply page.

When fetched directly, the HTML reply can be parsed with Python text processing tools (e.g., string methods like split and find, the re pattern-matching module, or the htmllib HTML parsing module). Extracting text from the reply this way is sometimes informally called screen scrapinga way to use web site content in other programs. Screen scraping is an alternative to more complex web services frameworks, though a brittle one: small changes in the page's format can often break scrapers that rely on it. The reply text can also be simply inspectedurllib allows us to test CGI scripts from the Python interactive prompt or other scripts, instead of a browser.

More generally, this technique allows us to use a server-side script as a sort of function call. For instance, a client-side GUI can call the CGI script and parse the generated reply page. Similarly, a CGI script that updates a database may be invoked programmatically with urllib, outside the context of an input form page. This also opens the door to automated regression testing of CGI scriptswe can invoke scripts on any remote machine, and compare their reply text to the expected output.^[*] We'll see urllib in action again in later examples.

^[*] If your job description includes extensive testing of server-side scripts, you may also want to explore Twill, a relatively new Python-based system that provides a little language for scripting the client-side interface to web applications. Search the Web for details.

Before we move on, here are a few advanced urllib usage notes. First, this module also supports proxies, alternative transmission modes, and more. For instance, proxies are supported transparently with environment variables or system settings, or by passing in an extra argument. Moreover, although it normally doesn't make a difference to Python scripts, it is possible to send parameters in both the get and the put submission modes described earlier with urllib. The get mode, with parameters in the query string at the end of a URL as shown in the prior listing, is used by default. To invoke post, pass parameters in as a separate argument:

 >>> from urllib import urlopen, urlencode >>> params = urlencode({'user': 'Brian'}) >>> params 'user=Brian' >>> >>> print urlopen('http://localhost/cgi-bin/tutor3.py', params).read( ) <TITLE>tutor3.py</TITLE> <H1>Greetings</H1> <HR> <P>Hello, Brian.</P> <HR>

Finally, if your web application depends on client-side cookies (discussed later), see also the newer module, urllib2. This module provides the same file-like urlopen interface for opening and reading from a URL, but it uses the cookielib module to automatically store cookies locally, and later return them to the server. It also supports redirection, authentication, and more; both URL modules also support secure HTTP transmissions. See the Python library manual for details. We'll explore both cookies and urllib2 later in this chapter, and introduce secure HTTP in the next.

16.4.5. Using Tables to Lay Out Forms

Now let's move on to something a bit more realistic. In most CGI applications, input pages are composed of multiple fields. When there is more than one, input labels and fields are typically laid out in a table, to give the form a well-structured appearance. The HTML file in Example 16-9 defines a form with two input fields.

Example 16-9. PP3E\Internet\Web\tutor4.html

 <html> <title>CGI 101</title> <body> <H1>A second user interaction: tables  </H1> <hr> <form method=POST action="cgi-bin/tutor4.py">   <table>     <TR>       <TH align=right>Enter your name:       <TD><input type=text name=user>     <TR>       <TH align=right>Enter your age:       <TD><input type=text name=age>     <TR>       <TD colspan=2 align=center>       <input type=submit value="Send">   </table> </form> </body></html>

The <TH> tag defines a column like <TD>, but also tags it as a header column, which generally means it is rendered in a bold font. By placing the input fields and labels in a table like this, we get an input page like that shown in Figure 16-10. Labels and inputs are automatically lined up vertically in columns, much as they were by the Tkinter GUI geometry managers we met earlier in this book.

Figure 16-10. A form laid out with table tags

When this form's Submit button (labeled "Send" by the page's HTML) is pressed, it causes the script in Example 16-10 to be executed on the server machine, with the inputs typed by the user.

Example 16-10. PP3E\Internet\Web\cgi-bin\tutor4.py

 #!/usr/bin/python ####################################################### # runs on the server, reads form input, prints HTML; # URL http://server-name/cgi-bin/tutor4.py ####################################################### import cgi, sys sys.stderr = sys.stdout                # errors to browser form = cgi.FieldStorage( )            # parse form data print "Content-type: text/html\n"      # plus blank line # class dummy: #     def _ _init_ _(self, s): self.value = s # form = {'user': dummy('bob'), 'age':dummy('10')} html = """ <TITLE>tutor4.py</TITLE> <H1>Greetings</H1> <HR> <H4>%s</H4> <H4>%s</H4> <H4>%s</H4> <HR>""" if not form.has_key('user'):     line1 = "Who are you?" else:     line1 = "Hello, %s." % form['user'].value line2 = "You're talking to a %s server." % sys.platform line3 = "" if form.has_key('age'):     try:         line3 = "Your age squared is %d!" % (int(form['age'].value) ** 2)     except:         line3 = "Sorry, I can't compute %s ** 2." % form['age'].value print html % (line1, line2, line3)

The table layout comes from the HTML file, not from this Python CGI script. In fact, this script doesn't do much newit uses string formatting to plug input values into the response page's HTML triple-quoted template string as before, this time with one line per input field. When this script is run by submitting the input form page, its output produces the new reply page shown in Figure 16-11.

Figure 16-11. Reply page generated by tutor4.py

As usual, we can pass parameters to this CGI script at the end of a URL, too. Figure 16-12 shows the page we get when passing a user and age explicitly in this URL:

 http://localhost/cgi-bin/tutor4.py?user=Joe+Blow&age=30

Figure 16-12. Reply page from tutor4.py for parameters in URL

Notice that we have two parameters after the ? this time; we separate them with &. Also note that we've specified a blank space in the user value with +. This is a common URL encoding convention. On the server side, the + is automatically replaced with a space again. It's also part of the standard escape rule for URL strings, which we'll revisit later.

Although Example 16-10 doesn't introduce much that is new about CGI itself, it does highlight a few new coding tricks worth noting, especially regarding CGI script debugging and security. Let's take a quick look.

16.4.5.1. Converting strings in CGI scripts

Just for fun, the script echoes back the name of the server platform by fetching sys.platform along with the square of the age input field. Notice that the age input's value must be converted to an integer with the built-in int function; in the CGI world, all inputs arrive as strings. We could also convert to an integer with the built-in eval function. Conversion (and other) errors are trapped gracefully in a try statement to yield an error line, instead of letting our script die.

You should never use eval to convert strings that were sent over the Internet, like the age field in this example, unless you can be absolutely sure that the string does not contain even potentially malicious code. For instance, if this example were available on the general Internet, it's not impossible that someone could type a value into the age field (or append an age parameter to the URL) with a value like os.system('rm *'). Given the appropriate context and process permissions, when passed to eval, such a string might delete all the files in your server script directory!

Unless you run CGI scripts in processes with limited permissions and machine access, strings read off the Web can be dangerous to run as code in CGI scripting. You should never pass them to dynamic coding tools like eval and exec, or to tools that run arbitrary shell commands such as os.popen and os.system, unless you can be sure that they are safe. Always use simpler tools for numeric conversion like int and float, which recognize only numbers.

16.4.5.2. Debugging CGI scripts

Errors happen, even in the brave new world of the Internet. Generally speaking, debugging CGI scripts can be much more difficult than debugging programs that run on your local machine. Not only do errors occur on a remote machine, but scripts generally won't run without the context implied by the CGI model. The script in Example 16-10 demonstrates the following two common debugging tricks:

Error message trapping: This script assigns sys.stderr to sys.stdout so that Python error messages wind up being displayed in the response page in the browser. Normally, Python error messages are written to stderr, which generally causes them to show up in the web server's console window or logfile. To route them to the browser, we must make stderr reference the same file object as stdout (which is connected to the browser in CGI scripts). If we don't do this assignment, Python errors, including program errors in our script, never show up in the browser.
Test case mock-up: The dummy class definition, commented out in this final version, was used to debug the script before it was installed on the Net. Besides not seeing stderr messages by default, CGI scripts also assume an enclosing context that does not exist if they are tested outside the CGI environment. For instance, if run from the system command line, this script has no form input data. Uncomment this code to test from the system command line. The dummy class masquerades as a parsed form field object, and form is assigned a dictionary containing two form field objects. The net effect is that form will be plug-and-play compatible with the result of a cgi.FieldStorage call. As usual in Python, object interfaces, not datatypes, are all we must adhere to.

Here are a few general tips for debugging your server-side CGI scripts:

Run the script from the command line: It probably won't generate HTML as is, but running it standalone will detect any syntax errors in your code. Recall that a Python command line can run source code files regardless of their extension: for example, python somescript.cgi works fine.
Assign sys.stderr to sys.stdout as early as possible in your script: This will generally make the text of Python error messages and stack dumps appear in your client browser when accessing the script, instead of the web server's console window or logs. Short of wading through server logs, or manual exception handling, this may be the only way to see the text of error messages after your script aborts.
Mock up inputs to simulate the enclosing CGI context: For instance, define classes that mimic the CGI inputs interface (as done with the dummy class in this script) so that you can view the script's output for various test cases by running it from the system command line.^[*] Setting environment variables to mimic form or URL inputs sometimes helps too (we'll see how later in this chapter).
^[*] This technique isn't unique to CGI scripts, by the way. In Chapter 15, we'll meet systems that embed Python code inside HTML. There is no good way to test such code outside the context of the enclosing system without extracting the embedded Python code (perhaps by using the htmllib HTML parser that comes with Python) and running it with a passed-in mock-up of the API that it will eventually use.
Call utilities to display CGI context in the browser: The CGI module includes utility functions that send a formatted dump of CGI environment variables and input values to the browser, to view in a reply page. For instance, cgi.print_form(form) prints all the input parameters sent from the client, and cgi.test( ) prints environment variables, the form, the directory, and more. Sometimes this is enough to resolve connection or input problems. We'll use some of these in the mailer case study in the next chapter.
Show exceptions you catch, print tracebacks: If you catch an exception that Python raises, the Python error message won't be printed to stderr (that is normal behavior). In such cases, it's up to your script to display the exception's name and value in the response page; exception details are available in the built-in sys module, from sys.exc_info( ). In addition, Python's traceback module can be used to manually generate stack traces on your reply page for errors; tracebacks show source-code lines active when an exception occurred. We'll use this later in the error page in PyMailCGI (Chapter 17).
Add debugging prints: You can always insert tracing print statements in your code, just as in normal Python programs. Be sure you print the content-type header line first, though, or your prints may not show up on the reply page. In the worst case, you can also generate debugging and trace messages by opening and writing to a local text file on the server; provided you access that file later, this avoids having to format the trace messages according to HTML reply stream conventions.
Run it live: Of course, once your script is at least half working, your best bet is likely to start running it live on the server, with real inputs coming from a browser. Running a server locally on your machine, as we're doing in this chapter, can help by making changes go faster as you test.

16.4.6. Adding Common Input Devices

So far, we've been typing inputs into text fields. HTML forms support a handful of input controls (what we'd call widgets in the traditional GUI world) for collecting user inputs. Let's look at a CGI program that shows all the common input controls at once. As usual, we define both an HTML file to lay out the form page and a Python CGI script to process its inputs and generate a response. The HTML file is presented in Example 16-11.

Example 16-11. PP3E\Internet\Web\cgi-bin\tutor5a.html

 <HTML><TITLE>CGI 101</TITLE> <BODY> <H1>Common input devices</H1> <HR> <FORM method=POST action="cgi-bin/tutor5.py">   <H3>Please complete the following form and click Send</H3>   <P><TABLE>     <TR>       <TH align=right>Name:       <TD><input type=text name=name>     <TR>       <TH align=right>Shoe size:       <TD><table>       <td><input type=radio name=shoesize value=small>Small       <td><input type=radio name=shoesize value=medium>Medium       <td><input type=radio name=shoesize value=large>Large       </table>     <TR>       <TH align=right>Occupation:       <TD><select name=job>         <option>Developer         <option>Manager         <option>Student         <option>Evangelist         <option>Other       </select>     <TR>       <TH align=right>Political affiliations:       <TD><table>       <td><input type=checkbox name=language value=Python>Pythonista       <td><input type=checkbox name=language value=Perl>Perlmonger       <td><input type=checkbox name=language value=Tcl>Tcler       </table>     <TR>       <TH align=right>Comments:       <TD><textarea name=comment cols=30 rows=2>Enter text here</textarea>     <TR>       <TD colspan=2 align=center>       <input type=submit value="Send">   </TABLE> </FORM> <HR> </BODY></HTML>

When rendered by a browser, the page in Figure 16-13 appears.

Figure 16-13. Input form page generated by tutor5a.html

This page contains a simple text field as before, but it also has radio buttons, a pull-down selection list, a set of multiple-choice checkbuttons, and a multiple-line text input area. All have a name option in the HTML file, which identifies their selected value in the data sent from client to server. When we fill out this form and click the Send submit button, the script in Example 16-12 runs on the server to process all the input data typed or selected in the form.

Example 16-12. PP3E\Internet\Web\cgi-bin\tutor5.py

 #!/usr/bin/python ####################################################### # runs on the server, reads form input, prints HTML ####################################################### import cgi, sys form = cgi.FieldStorage( )            # parse form data print "Content-type: text/html"        # plus blank line html = """ <TITLE>tutor5.py</TITLE> <H1>Greetings</H1> <HR> <H4>Your name is %(name)s</H4> <H4>You wear rather %(shoesize)s shoes</H4> <H4>Your current job: %(job)s</H4> <H4>You program in %(language)s</H4> <H4>You also said:</H4> <P>%(comment)s</P> <HR>""" data = {} for field in ('name', 'shoesize', 'job', 'language', 'comment'):     if not form.has_key(field):         data[field] = '(unknown)'     else:         if type(form[field]) != list:             data[field] = form[field].value         else:             values = [x.value for x in form[field]]             data[field] = ' and '.join(values) print html % data

This Python script doesn't do much; it mostly just copies form field information into a dictionary called data so that it can be easily inserted into the triple-quoted response template string. A few of its techniques merit explanation:

Field validation

As usual, we need to check all expected fields to see whether they really are present in the input data, using the dictionary has_key method. Any or all of the input fields may be missing if they weren't entered on the form or appended to an explicit URL.

String formatting

We're using dictionary key references in the format string this timerecall that %(name)s means pull out the value for the key name in the data dictionary and perform a to-string conversion on its value.

Multiple-choice fields

We're also testing the type of all the expected fields' values to see whether they arrive as a list rather than the usual string. Values of multiple-choice input controls, like the language choice field in this input page, are returned from cgi.FieldStorage as a list of objects with value attributes, rather than a simple single object with a value.

This script copies simple field values to the dictionary verbatim, but it uses a list comprehension to collect the value fields of multiple-choice selections, and the string join method to construct a single string with an and inserted between each selection value (e.g., Python and Tcl). The script's list comprehension is equivalent to the call map(lambda x: x.value, form[field]).^[*]

^[*] Two forward references are worth noting here. Besides simple strings and lists, later we'll see a third type of form input object, returned for fields that specify file uploads. The script in this example should really also escape the echoed text inserted into the HTML reply to be robust, lest it contain HTML operators. We will discuss escapes in detail later.

When the form page is filled out and submitted, the script creates the response shown in Figure 16-14essentially just a formatted echo of what was sent.

Figure 16-14. Response page created by tutor5.py (1)

16.4.7. Changing Input Layouts

Suppose that you've written a system like that in the prior section, and your users, clients, and significant other start complaining that the input form is difficult to read. Don't worry. Because the CGI model naturally separates the user interface (the HTML input page definition) from the processing logic (the CGI script), it's completely painless to change the form's layout. Simply modify the HTML file; there's no need to change the CGI code at all. For instance, Example 16-13 contains a new definition of the input that uses tables a bit differently to provide a nicer layout with borders.

Example 16-13. PP3E\Internet\Web\tutor5b.html

 <HTML><TITLE>CGI 101</TITLE> <BODY> <H1>Common input devices: alternative layout</H1> <P>Use the same tutor5.py server side script, but change the layout of the form itself.  Notice the separation of user interface and processing logic here; the CGI script is independent of the HTML used to interact with the user/client.</P><HR> <FORM method=POST action="cgi-bin/tutor5.py">   <H3>Please complete the following form and click Submit</H3>   <P><TABLE border cellpadding=3>     <TR>       <TH align=right>Name:       <TD><input type=text name=name>     <TR>       <TH align=right>Shoe size:       <TD><input type=radio name=shoesize value=small>Small           <input type=radio name=shoesize value=medium>Medium           <input type=radio name=shoesize value=large>Large     <TR>       <TH align=right>Occupation:       <TD><select name=job>         <option>Developer         <option>Manager         <option>Student         <option>Evangelist         <option>Other       </select>     <TR>       <TH align=right>Political affiliations:       <TD><P><input type=checkbox name=language value=Python>Pythonista           <P><input type=checkbox name=language value=Perl>Perlmonger           <P><input type=checkbox name=language value=Tcl>Tcler     <TR>       <TH align=right>Comments:       <TD><textarea name=comment cols=30 rows=2>Enter spam here</textarea>     <TR>       <TD colspan=2 align=center>       <input type=submit value="Submit">       <input type=reset  value="Reset">   </TABLE> </FORM> </BODY></HTML>

When we visit this alternative page with a browser, we get the interface shown in Figure 16-15.

Figure 16-15. Form page created by tutor5b.html

Now, before you go blind trying to detect the differences in this and the prior HTML file, I should note that the HTML differences that produce this page are much less important than the fact that the action fields in these two pages' forms reference identical URLs. Pressing this version's Submit button triggers the exact same and totally unchanged Python CGI script again, tutor5.py (Example 16-12).

That is, scripts are completely independent of both the transmission mode (URL query parameters of form fields) and the layout of the user interface used to send them information. Changes in the response page require changing the script, of course, because the HTML of the reply page is still embedded in the CGI script. But we can change the input page's HTML as much as we like without affecting the server-side Python code. Figure 16-16 shows the response page produced by the script this time around.

Figure 16-16. Response page created by tutor5.py (2)

16.4.7.1. Keeping display and logic separate

In fact, this illustrates an important point in the design of larger web sites: if we are careful to keep the HTML and script code separate, we get a useful division of display and logiceach part can be worked on independently, by people with different skill sets. Web page designers, for example, can work on the display layout, while programmers can code business logic.

Although this section's example is fairly small, it already benefits from this separation for the input page. In some cases, the separation is harder to accomplish, because our example scripts embed the HTML of reply pages. With just a little more work, though, we can usually split the reply HTML off into separate files that can also be developed independently of the script's logic. The html string in tutor5.py (Example 16-12), for instance, might be stored in a text file and loaded by the script when run.

In larger systems, tools such as server-side HTML templating languages help make the division of display and logic even easier to achieve. The Zope and Python Server Pages examples we'll meet in Chapter 18, for instance, promote the separation of display and logic by providing reply page description languages that are expanded to include portions generated by separate Python program logic. In a sense, server-side templating languages embed Python in HTMLthe opposite of CGI scripts that embed HTML in Pythonand may provide a cleaner division of labor, provided the Python code is separate components. See Chapter 18 for more details. Similar techniques can be used for separation of layout and login in the GUIs we studied earlier in this book, but they also usually require larger frameworks or models to achieve.

16.4.8. Passing Parameters in Hardcoded URLs

Earlier, we passed parameters to CGI scripts by listing them at the end of a URL typed into the browser's address fieldin the query string parameters part of the URL, after the ?. But there's nothing sacred about the browser's address field. In particular, nothing is stopping us from using the same URL syntax in hyperlinks that we hardcode or generate in web page definitions.

For example, the web page from Example 16-14 defines three hyperlinks (the text between the <A> and </A> tags), which trigger our original tutor5.py script again (Example 16-12), but with three different precoded sets of parameters.

Example 16-14. PP3E\Internet\Web\tutor5c.html

 <HTML><TITLE>CGI 101</TITLE> <BODY> <H1>Common input devices: URL parameters</H1> <P>This demo invokes the tutor5.py server-side script again, but hardcodes input data to the end of the script's URL, within a simple hyperlink (instead of packaging up a form's inputs).  Click your browser's "show page source" button to view the links associated with each list item below. <P>This is really more about CGI than Python, but notice that Python's cgi module handles both this form of input (which is also produced by GET form actions), as well as POST-ed forms; they look the same to the Python CGI script.  In other words, cgi module users are independent of the method used to submit data. <P>Also notice that URLs with appended input values like this can be generated as part of the page output by another CGI script, to direct a next user click to the right place and context; together with type 'hidden' input fields, they provide one way to save state between clicks. </P><HR> <UL> <LI><A href="cgi-bin/tutor5.py?name=Bob&shoesize=small">Send Bob, small</A> <LI><A href="cgi-bin/tutor5.py?name=Tom&language=Python">Send Tom, Python</A> <LI><A href="http://localhost/cgi-bin/tutor5.py?job=Evangelist&comment=spam"> Send Evangelist, spam</A> </UL> <HR></BODY></HTML>

This static HTML file defines three hyperlinksthe first two are minimal and the third is fully specified, but all work similarly (again, the target script doesn't care). When we visit this file's URL, we see the page shown in Figure 16-17. It's mostly just a page for launching canned calls to the CGI script.

Figure 16-17. Hyperlinks page created by tutor5c.html

Clicking on this page's second link creates the response page in Figure 16-18. This link invokes the CGI script, with the name parameter set to "Tom" and the language parameter set to "Python," simply because those parameters and values are hardcoded in the URL listed in the HTML for the second hyperlink. As such, hyperlinks with parameters like this are sometimes known as stateful linksthey automatically direct the next script's operation. The net effect is exactly as if we had manually typed the line shown at the top of the browser in Figure 16-18.

Figure 16-18. Response page created by tutor5.py (3)

Notice that many fields are missing here; the tutor5.py script is smart enough to detect and handle missing fields and generate an unknown message in the reply page. It's also worth pointing out that we're reusing the Python CGI script again. The script itself is completely independent of both the user interface format of the submission page, as well as the technique used to invoke itfrom a submitted form or a hardcoded URL with query parameters. By separating such user interface details from processing logic, CGI scripts become reusable software components, at least within the context of the CGI environment.

The query parameters in the URLs embedded in Example 16-14 were hardcoded in the page's HTML. But such URLs can also be generated automatically by a CGI script as part of a reply page in order to provide inputs to the script that implements a next step in user interaction. They are a simple way for web-based applications to "remember" things for the duration of a session. Hidden form fields, up next, serve some of the same purposes.

16.4.9. Passing Parameters in Hidden Form Fields

Similar in spirit to the prior section, inputs for scripts can also be hardcoded in a page's HTML as hidden input fields. Such fields are not displayed in the page, but are transmitted back to the server when the form is submitted. Example 16-15, for instance, allows a job field to be entered, but fills in name and language parameters automatically as hidden input fields.

Example 16-15. PP3E\Internet\Web\tutor5d.html

 <HTML><TITLE>CGI 101</TITLE> <BODY> <H1>Common input devices: hidden form fields</H1> <P>This demo invokes the tutor5.py server-side script again, but hardcodes input data in the form itself as hidden input fields, instead of as parameters at the end of URL hyperlinks. As before, the text of this form, including the hidden fields, can be generated as part of the page output by another CGI script, to pass data on to the next script on submit; hidden form fields provide another way to save state between pages. </P><HR><p> <form> <form method=post action="cgi-bin/tutor5.py">     <input type=hidden name=name     value=Sue>     <input type=hidden name=language value=Python>     <input type=text   name=job      value="Enter job">     <input type=submit value="Submit Sue"> </form> </p><HR></BODY></HTML>

When Example 16-15 is opened in a browser, we get the input page in Figure 16-19.

Figure 16-19. tutor5d.html input form page

When submitting, we trigger our original tutor5.py script once again (Example 16-12), but some of the inputs have been provided for us as hidden fields. The reply page is captured in Figure 16-20.

Figure 16-20. Response page created by tutor5.py (4)

Here again, we've hardcoded and embedded the inputs in the page's HTML, but such fields can also be generated on the fly as part of the reply from a CGI script. When they are, they serve as inputs for the next page, and so are a sort of memory. To fully understand how and why this is necessary, we need to next take a short diversion into state retention alternatives.