17.2. The PyMailCGI Web Site

In Chapter 15, we built a program called PyMailGUI that implements a complete Python+Tkinter email client GUI (if you didn't read that chapter, you may want to take a quick glance at it now). Here, we're going to do much the same, but on the Web: the system presented in this section, PyMailCGI, is a collection of CGI scripts that implement a simple web-based interface for sending and reading email in any browser. In effect, it is a simple webmail system. Though it is not as powerful as what may be available from your Internet Service Provider (ISP), its scriptability gives you control over its operation and future evolution.

Our goal in studying this system is partly to learn a few more CGI tricks, partly to learn a bit about designing larger Python systems in general, and partly to underscore the trade-offs between systems implemented for the Web (the PyMailCGI server) and systems written to run locally (the PyMailGUI client). This chapter hints at some of these trade-offs along the way and returns to explore them in more depth after the presentation of this system.

17.2.1. Implementation Overview

At the top level, PyMailCGI allows users to view incoming email with the Post Office Protocol (POP) interface and to send new mail by Simple Mail Transfer Protocol (SMTP). Users also have the option of replying to, forwarding, or deleting an incoming email while viewing it. As implemented, anyone can send email from a PyMailCGI site, but to view your email, you generally have to install PyMailCGI on your own computer or web server account, with your own mail server information (due to security concerns described later).

Viewing and sending email sounds simple enough, and we've already coded this a few times in this book. But the required interaction involves a number of distinct web pages, each requiring a CGI script or HTML file of its own. In fact, PyMailCGI is a fairly linear system: in the most complex user interaction scenario, there are six states (and hence six web pages) from start to finish. Because each page is usually generated by a distinct file in the CGI world, that also implies six source files.

Technically, PyMailCGI could also be described as a state machine, though very little state is transferred from state to state. Scripts pass user and message information to the next script in hidden form fields and query parameters, but there are no client-side cookies or server-side databases in the current version. Still, along the way we'll encounter situations where more advanced state retention tools could be an advantage.
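To make the idea of passing state between scripts concrete, here is a minimal sketch of the two mechanisms just mentioned: query parameters appended to a link's URL, and hidden fields embedded in a form. It is not PyMailCGI's own code. The helper names and the `user` and `mnum` parameter names are hypothetical, and the sketch assumes Python 3, where the URL tools live in `urllib.parse`.

```python
from html import escape
from urllib.parse import urlencode

def link_with_state(script, **state):
    """Return a URL that carries state to the next script as query parameters."""
    return script + '?' + urlencode(state)

def hidden_fields(**state):
    """Return HTML hidden input fields that carry state along with a form submit."""
    return '\n'.join(
        '<input type="hidden" name="%s" value="%s">'
        % (escape(name), escape(str(value)))      # escape guards against stray quotes
        for name, value in state.items())

# Hypothetical state: which user is logged in, and which message was picked
url = link_with_state('onViewListLink.py', user='lutz', mnum=3)
fields = hidden_fields(user='lutz', mnum=3)
print(url)
print(fields)
```

Either way, the next script sees the values as ordinary form inputs, so the receiving code need not care which of the two mechanisms delivered them.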

To help keep track of how all of PyMailCGI's source files fit into the overall system, I jotted down the file in Example 17-1 before starting any real programming. It informally sketches the user's flow through the system and the files invoked along the way. You can certainly use more formal notations to describe the flow of control and information through states such as web pages (e.g., dataflow diagrams), but for this simple example, this file gets the job done.

Example 17-1. PP3E\Internet\Web\PyMailCgi\pageflow.txt

 file or script                          creates
 --------------                          -------
 [pymailcgi.html]                         Root window
  => [onRootViewLink.py]                  Pop password window
      => [onViewPswdSubmit.py]            List window (loads all pop mail)
          => [onViewListLink.py]          View Window + pick=del|reply|fwd (fetch)
              => [onViewPageAction.py]    Edit window, or delete+confirm (del)
                  => [onEditPageSend.py]  Confirmation (sends smtp mail)
                      => back to root
  => [onRootSendLink.py]                  Edit Window
      => [onEditPageSend.py]              Confirmation (sends smtp mail)
          => back to root

This file simply lists all the source files in the system, using => and indentation to denote the scripts they trigger.

For instance, links on the pymailcgi.html root page invoke onRootViewLink.py and onRootSendLink.py, both executable scripts. The script onRootViewLink.py generates a password page, whose Submit button in turn triggers onViewPswdSubmit.py, and so on. Notice that both the view and the send actions can wind up triggering onEditPageSend.py to send a new mail; view operations get there after the user chooses to reply to or forward an incoming mail.

In a system such as this, CGI scripts make little sense in isolation, so it's a good idea to keep the overall page flow in mind; refer to this file if you get lost. For additional context, Figure 17-1 shows the overall contents of this site, viewed as directory listings on Windows in a DOS command prompt window.

Figure 17-1. PyMailCGI contents

To install this site, all the files you see here are uploaded to a PyMailCgi subdirectory of your web directory on your server's machine. Besides the page-flow HTML and CGI script files invoked by user interaction, PyMailCGI uses a handful of utility modules:

commonhtml.py: Is a library of HTML tools.
externs.py: Isolates access to modules imported from other places.
loadmail.py: Encapsulates mailbox fetches for future expansion.
secret.py: Implements configurable password encryption.

PyMailCGI also reuses parts of the mailtools module package and mailconfig.py module we wrote in Chapter 14. The former of these is accessible to imports from the PP3E package root, and the latter is copied to the PyMailCgi directory so that it can differ between PyMailGUI and PyMailCGI. The externs.py module is intended to hide these modules' actual locations, in case the install structure varies on some machines.

In fact, this system demonstrates the power of code reuse in a practical way. In this third edition, it gets a great deal of logic for free from the new mailtools package of Chapter 14 (message loading, sending, deleting, parsing, composing, and attachments), even though that package's modules were originally developed for the PyMailGUI program. When it came time to update PyMailCGI six months later, tools for handling complex things such as attachments and message text searches were already in place. See Example 14-21 in Chapter 14 for mailtools source code.

As usual, PyMailCGI also uses a variety of standard library modules: smtplib, poplib, email.*, cgi, urllib, and the like. Thanks to the reuse of both custom and standard library code, this system achieves much in a minimal amount of code. All told, PyMailCGI consists of just 835 lines of new code, including whitespace and comments.

This compares favorably to the 2,200 lines of the PyMailGUI client, but most of this difference owes to the limited functionality in PyMailCGI: there are no local save files, no transfer thread overlap, no message caching, no inbox synchronization tests, no multiple-message selections, and so on. Still, PyMailCGI's code factoring and reuse of existing modules allow it to implement much in a surprisingly small amount of code.

17.2.2. New in This Edition

In this third edition, PyMailCGI has been upgraded to use the new mailtools module package of Chapter 14, employ the PyCrypto package for passwords if it is installed, support viewing and sending message attachments, and run more efficiently.

We'll meet these new features along the way, but the last two of these merit a few words upfront. Attachments are supported in a simplistic but usable fashion and use existing mailtools package code for much of their operation:

For viewing attachments, message parts are split off the message and saved in local files on the server. Message view pages are then augmented with hyperlinks pointing to the temporary files; when clicked, they open in whatever way your web browser opens the selected part's file type.
For sending attachments, we use the HTML upload techniques presented near the end of Chapter 16. Mail edit pages now have file-upload controls, to allow a maximum of three attachments. Selected files are uploaded to the server by the browser with the rest of the page as usual, saved in temporary files on the server, and added to the outgoing mail.
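The attachment-viewing scheme described above can be sketched with the standard library's email package alone. This is an illustration under stated assumptions, not the mailtools code PyMailCGI actually uses: the `save_parts` function, the `partN-` filename prefix, and the demo message are all hypothetical, and the sketch assumes Python 3.

```python
import os
import tempfile
from email.message import EmailMessage

def save_parts(message, savedir):
    """Write each named attachment of a parsed message to savedir; return the paths."""
    paths = []
    for index, part in enumerate(message.walk()):
        if part.is_multipart():
            continue                      # containers hold no data of their own
        filename = part.get_filename()
        if not filename:
            continue                      # unnamed parts (main text) are skipped here
        # Use only the base name so a filename in the mail cannot name another directory
        safe = 'part%d-%s' % (index, os.path.basename(filename))
        path = os.path.join(savedir, safe)
        with open(path, 'wb') as out:
            out.write(part.get_payload(decode=True))   # undo base64 and similar
        paths.append(path)
    return paths

# Demo: build a small message with one attachment, then split it off
demo = EmailMessage()
demo['Subject'] = 'testing'
demo.set_content('main text')
demo.add_attachment(b'spam spam', maintype='application',
                    subtype='octet-stream', filename='notes.bin')
saved = save_parts(demo, tempfile.mkdtemp())
print(saved)
```

A view page could then emit one hyperlink per saved path. Because the files are written to a shared directory on the server, a scheme like this inherits the single-user limitation discussed next.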

Both schemes would fail for multiple simultaneous users, but since PyMailCGI's configuration file scheme (described later in this chapter) already limits it to a single username, this is a reasonable constraint. The links to temporary files generated for attachment viewing also apply only to the last message selected, but this works if the page flow is followed normally. Improving this for a multiuser scenario, as well as adding additional features such as PyMailGUI's local file save and open options, are left as exercises.

For efficiency, this version of PyMailCGI also avoids repeated exhaustive mail downloads. In the prior version, the full text of all messages in an inbox was downloaded every time you visited the list page, and every time you selected a single message to view. In this version, the list page downloads only the header text portion of each message, and only a single message's full text is downloaded when one is selected for viewing.
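The difference between the two download strategies comes down to which POP command is issued. Here is a rough sketch using the standard `poplib` module, whose `top` method fetches a message's headers plus a given number of body lines, and whose `retr` method fetches the full text. The function names are hypothetical, `conn` is assumed to be an already logged-in `poplib.POP3` object, and the code assumes Python 3; PyMailCGI itself goes through the mailtools package rather than code like this.

```python
import poplib                      # conn below is expected to be a poplib.POP3 object
from email.parser import BytesParser

def fetch_headers(conn):
    """List page: download only the header block of every message."""
    count, _size = conn.stat()                       # (message count, mailbox size)
    headers = []
    for num in range(1, count + 1):                  # POP numbers messages from 1
        _resp, lines, _octets = conn.top(num, 0)     # headers plus 0 body lines
        text = b'\r\n'.join(lines)
        headers.append(BytesParser().parsebytes(text, headersonly=True))
    return headers

def fetch_message(conn, num):
    """View page: download the full text of the one selected message."""
    _resp, lines, _octets = conn.retr(num)
    return BytesParser().parsebytes(b'\r\n'.join(lines))
```

Not every POP server implements the optional TOP command, so a more careful version would fall back on `retr` when `top` fails.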

Even so, the list page's headers-only download can be slow if you have many messages in your inbox (I have more than a thousand in one of mine). A better solution would somehow cache mails to limit reloads, at least for the duration of a browser session. For example, we might load headers of only newly arrived messages, and cache headers of mails already fetched, as done in the PyMailGUI client we met in Chapter 15.

Due to the lack of state retention in CGI scripts, though, this would require some sort of server-side database. We might, for instance, store already fetched message headers under a generated key that identifies the session (e.g., with process number and time) and pass that key between pages as a cookie, hidden form field, or URL query parameter. Each page would use the key to fetch cached mail stored directly on the web server, instead of loading it from the email server again. Presumably, loading from a local cache file would be faster than loading from a network connection to the mail server. This would make for an interesting exercise too, if you wish to extend this system on your own, but it would also result in more pages than this chapter has to spend (frankly, I ran out of real estate in this chapter before I ran out of potential enhancements).
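For readers who want to try the exercise, the following sketch shows one way the session-key idea might look, using the standard `shelve` module as the server-side database. Everything here is an assumption made for illustration: the function names, the cache file, and the key format (process number plus time, as suggested above) are not part of PyMailCGI, and the code assumes Python 3.

```python
import os
import shelve
import time

def new_session_key():
    """Generate a key from process number and time (easy to guess; fine for a demo only)."""
    return '%s-%s' % (os.getpid(), time.time())

def save_headers(cachefile, key, headers):
    """Store the headers already fetched for this session on the web server."""
    with shelve.open(cachefile) as db:
        db[key] = headers

def load_headers(cachefile, key):
    """Return the cached headers for this session, or None if nothing is cached."""
    with shelve.open(cachefile) as db:
        return db.get(key)
```

The list page would call `save_headers` once after its download and pass the key to later pages, which would call `load_headers` before deciding whether to contact the mail server at all. A real deployment would also need a harder-to-guess key and some way to expire old entries.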

Carry-On Software

PyMailCGI works as planned and illustrates more CGI and email concepts, but I want to point out a few caveats upfront. The application was initially written during a two-hour layover in Chicago's O'Hare airport (though debugging took a few hours more). I wrote it to meet a specific need: to be able to read and send email from any web browser while traveling around the world teaching Python classes. I didn't design it to be aesthetically pleasing to others and didn't spend much time focusing on its efficiency.

I also kept this example intentionally simple for this book. For example, PyMailCGI doesn't provide nearly as many features as the PyMailGUI program in Chapter 15, and it reloads email more than it probably should. Because of this, its performance can be very poor if you keep your inbox large.

In fact, this system almost cries out for more advanced state retention options. As is, user and message details are passed in generated pages as hidden fields and query parameters, but we could avoid reloading mail by also using the server-side database techniques described in Chapter 16. Such extensions might eventually bring PyMailCGI up to the functionality of PyMailGUI, albeit at some cost in code complexity.

In other words, you should consider this system a work in progress; it's not yet software worth selling. On the other hand, it does what it was intended to do, and you can customize it by tweaking its Python source code, something that can't be said of all software sold.

17.2.3. Presentation Overview

Much of the "action" in PyMailCGI is encapsulated in shared utility modules (especially one called commonhtml.py). The CGI scripts that implement user interaction don't do much by themselves. This architecture was chosen deliberately, to make scripts simple, avoid code redundancy, and implement a common look-and-feel in shared code. But it means you must jump between files to understand how the whole system works.

To make this example easier to digest, we're going to explore its code in two chunks: page scripts first, and then the utility modules. First, we'll study screenshots of the major web pages served up by the system and the HTML files and top-level Python CGI scripts used to generate them. We begin by following a send mail interaction, and then trace how existing email is read, and then processed. Most implementation details will be presented in these sections, but be sure to flip ahead to the utility modules listed later to understand what the scripts are really doing.

I should also point out that this is a fairly complex system, and I won't describe it in exhaustive detail; as in the PyMailGUI chapter (Chapter 15), be sure to read the source code along the way for details not made explicit in the narrative. All of the system's source code appears in this chapter (as well as in the book's examples distribution package), and we will study its key concepts here. But as usual with case studies in this book, I assume that you can read Python code by now and that you will consult the example's source code for more details. Because Python's syntax is so close to executable pseudocode, systems are sometimes better described in Python than in English, once you have the overall design in mind.

17.2.4. Running This Chapter's Examples

The HTML pages and CGI scripts of PyMailCGI can be installed on any web server to which you have access. To keep things simple for this book, though, we're going to use the same policy as in Chapter 16: we'll be running the Python-coded webserver.py script from Example 16-1 locally, on the same machine as the web browser client. As we learned at the start of the prior chapter, that means we'll be using the server domain name "localhost" (or the equivalent IP address, "127.0.0.1") to access this system's pages in our browser, as well as in the urllib module.

Start this server script on your own machine to test-drive the program. Ultimately, this system must generally contact a mail server over the Internet to fetch or send messages, but the web page server will be running locally on your computer.

One minor twist here: PyMailCGI's code is located in a directory of its own, one level down from the webserver.py script. Because of that, we'll start the web server here with an explicit directory and port number in the command line used to launch it:

 C:\...\PP3E\Internet\Web>webserver.py PyMailCgi 8000

Type this sort of command into a command prompt window on Windows or into your system shell prompt on Unix-like platforms. When run this way, the server will listen for URL requests on machine "localhost" and socket port number 8000. It will serve up pages from the PyMailCgi subdirectory one level below the script's location, and it will run CGI scripts located in the PyMailCgi/cgi-bin directory below that. This works because the script changes its current working directory to the one you name when it starts up.
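As a reminder of what such a launch involves, here is a stripped-down sketch of a server startup that takes a directory and port on the command line. It is not the webserver.py listing of Example 16-1. The function names and the defaults (current directory, port 80) are assumptions, and it is written for Python 3, where `CGIHTTPRequestHandler` lives in `http.server` and is deprecated in recent releases.

```python
import os
import sys

def parse_args(argv):
    """Return (webdir, port) from a command line like: webserver.py PyMailCgi 8000"""
    webdir = argv[1] if len(argv) > 1 else '.'       # assumed default: current directory
    port = int(argv[2]) if len(argv) > 2 else 80     # assumed default: standard HTTP port
    return webdir, port

def serve(webdir, port):
    """Serve pages from webdir and run CGI scripts from its cgi-bin subdirectory."""
    # Imported here because this handler class is deprecated in newer Python 3 releases
    from http.server import HTTPServer, CGIHTTPRequestHandler
    os.chdir(webdir)                                 # requests now resolve relative to webdir
    HTTPServer(('', port), CGIHTTPRequestHandler).serve_forever()

print(parse_args(['webserver.py', 'PyMailCgi', '8000']))
# To start serving: serve(*parse_args(sys.argv))
```

The `os.chdir` call is the step that makes the named subdirectory the root of the site: the handler maps URL paths onto the current working directory, so both pymailcgi.html and cgi-bin are found beneath it.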

Subtle point: because we specify a unique port number on the command line this way, it's OK if you simultaneously run another instance of the script to serve up the prior chapter's examples one directory up; that instance will accept connections on port 80, and our new instance will handle requests on port 8000. In fact, you can contact either server from the same browser by specifying the desired server's port number. If you have two instances of the server running in the two different chapters' directories, to access pages and scripts of the prior chapter, use URLs of this form:

 http://localhost/languages.html
 http://localhost/cgi-bin/languages.py?language=All

And to run this chapter's pages and scripts, simply use URLs of this form:

 http://localhost:8000/pymailcgi.html
 http://localhost:8000/cgi-bin/onRootSendLink.py

You'll see that the HTTP and CGI log messages appear in the window of the server you're contacting. For more background on why this works as it does, see the introduction to network socket addresses in Chapter 13 and the discussion of URLs in Chapter 16.

As in Chapter 16, if you do install this example's code on a different server, simply replace the "localhost:8000/cgi-bin" part of the URLs we'll use here, with your server's name, port, and path details. In practice, a system such as PyMailCGI would be much more useful if it were installed on a remote server, to allow mail processing from any web client.^[*]

^[*] One downside to running a local webserver.py script that I noticed during development for this chapter is that on platforms where CGI scripts are run in the same process as the server (including Windows, with the code's workaround), you'll need to stop and restart the server every time you change an imported module. Otherwise, a subsequent import in a CGI script will have no effect: the module has already been imported in the process. This is not an issue on platforms that run the CGI as a separate, new process. The Windows workaround in the code is probably temporary; it is likely that CGIs will eventually be able to be run in separate processes there too.

As with PyMailGUI, also note that you'll have to edit the mailconfig.py module's settings, if you wish to use this system to read your own email. As provided, the email server information is not useful to readers; more on this in a moment.