16.9. Transferring Files to Clients and Servers
It's time to explain a bit of HTML code we've been keeping in the shadows. Did you notice those hyperlinks on the language selector example's main page for showing the CGI script's source code? Normally, we can't see such script source code, because accessing a CGI script makes it execute (we can see only its HTML output, generated to make the new page). The script in Example 16-26, referenced by a hyperlink in the main language.html page, works around that by opening the source file and sending its text as part of the HTML response. The text is marked with <PRE> as preformatted text, and is escaped for transmission inside HTML with cgi.escape.
Example 16-26. PP3E\Internet\Web\cgi-bin\languages-src.py
Here again, the filename is relative to the server's directory for our web server on Windows (see the prior note, and delete the cgi-bin portion of its path on other platforms). When we visit this script on the Web via the hyperlink or a manually typed URL, the script delivers a response to the client that includes the text of the CGI script source file. It appears as in Figure 16-27.
Figure 16-27. Source code viewer page
Note that here, too, it's crucial to format the text of the file with cgi.escape, because it is embedded in the HTML code of the reply. If we don't, any characters in the text that mean something in HTML code are interpreted as HTML tags. For example, the C++ < operator character within this file's text may yield bizarre results if not properly escaped. The cgi.escape utility converts it to the standard sequence &lt; for safe embedding.
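As a minimal sketch of the escaping step just described, the following wraps a file's text in a <PRE> block after escaping it. It assumes a current Python 3, where html.escape from the standard library plays the role that cgi.escape plays in this chapter; the source_page function name and the page layout are made up for illustration and are not the book's script.

```python
import html

def source_page(text, title="Source code"):
    """Return a small HTML page that shows text verbatim (hypothetical helper)."""
    # html.escape turns '<', '>' and '&' into '&lt;', '&gt;' and '&amp;',
    # so the browser displays them instead of treating them as markup.
    escaped = html.escape(text)
    return (
        "<html><head><title>%s</title></head>\n"
        "<body><pre>\n%s\n</pre></body></html>" % (html.escape(title), escaped)
    )

if __name__ == "__main__":
    print(source_page("cout << x;"))
```

Without the escape call, the << in the sample line would reach the browser unchanged and could be parsed as the start of a tag.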
16.9.1. Displaying Arbitrary Server Files on the Client
Almost immediately after writing the languages source code viewer script in the preceding example, it occurred to me that it wouldn't be much more work, and would be much more useful, to write a generic version: one that could use a passed-in filename to display any file on the site. It's a straightforward mutation on the server side; we merely need to allow a filename to be passed in as an input. The getfile.py Python script in Example 16-27 implements this generalization. It assumes the filename is either typed into a web page form or appended to the end of the URL as a parameter. Remember that Python's cgi module handles both cases transparently, so there is no code in this script that notices any difference.
Example 16-27. PP3E\Internet\Web\cgi-bin\getfile.py
This Python server-side script simply extracts the filename from the parsed CGI inputs object, and reads and prints the text of the file to send it to the client browser. Depending on the formatted global variable setting, it sends the file in either plain text mode (using text/plain in the response header) or wrapped up in an HTML page definition (text/html).
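The two reply modes can be sketched roughly as follows. This is not the getfile.py listing; build_reply and its parameters are hypothetical names, and html.escape stands in for cgi.escape on the assumption of a current Python 3. The only protocol detail relied on is that a CGI reply starts with a Content-type header line followed by a blank line.

```python
import html

def build_reply(filename, text, formatted=True):
    """Build a CGI reply for a file's text in HTML or plain text mode."""
    if formatted:
        # HTML mode: escape the text and wrap it in a page definition.
        header = "Content-type: text/html\n"
        body = (
            "<html><title>Getfile response</title>\n"
            "<h1>Source code for: '%s'</h1>\n<hr>\n<pre>%s</pre>\n<hr></html>"
            % (html.escape(filename), html.escape(text))
        )
    else:
        # Plain text mode: send the file's text as-is, with no markup.
        header = "Content-type: text/plain\n"
        body = text
    # The blank line separates the headers from the reply body.
    return header + "\n" + body

if __name__ == "__main__":
    print(build_reply("example.py", "if a < b: print(a)", formatted=True))
```

Keeping the mode choice in one flag mirrors the formatted global described above: the file is read the same way in both cases, and only the header and wrapping differ.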
Both modes (and others) work in general under most browsers, but Internet Explorer doesn't handle the plain text mode as gracefully as Netscape does: during testing, it popped up the Notepad text editor to view the downloaded text, but end-of-line characters in Unix format made the file appear as one long line. (Netscape instead displays the text correctly in the body of the response web page itself.) HTML display mode works more portably with current browsers. More on this script's restricted file logic in a moment.
Let's launch this script by typing its URL at the top of a browser, along with a desired filename appended after the script's name. Figure 16-28 shows the page we get by visiting this URL:
Figure 16-28. Generic source code viewer page
The body of this page shows the text of the server-side file whose name we passed at the end of the URL; once it arrives, we can view its text, cut-and-paste to save it in a file on the client, and so on. In fact, now that we have this generalized source code viewer, we could replace the hyperlink to the script languages-src.py in language.html, with a URL of this form:
For illustration purposes, the main HTML page in Example 16-17 has links to the original source code display script as well as to this URL (less the server name). Really, URLs like these are direct calls (albeit across the Web) to our Python script, with filename parameters passed explicitly. As we've seen, parameters passed in URLs are treated the same as field inputs in forms; for convenience, let's also write a simple web page that allows the desired file to be typed directly into a form, as shown in Example 16-28.
Example 16-28. PP3E\Internet\Web\getfile.html
Figure 16-29 shows the page we receive when we visit this file's URL. We need to type only the filename in this page, not the full CGI script address.
Figure 16-29. Source code viewer selection page
When we press this page's Download button to submit the form, the filename is transmitted to the server, and we get back the same page as before, when the filename was appended to the URL (see Figure 16-28). In fact, the filename will be appended to the URL here, too; the get method in the form's HTML instructs the browser to append the filename to the URL, exactly as if we had done so manually. It shows up at the end of the URL in the response page's address field, even though we really typed it into a form.
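To make the equivalence concrete, here is a small sketch of what a get-style submission amounts to: the form's fields are encoded and appended to the script's address after a ? character, and the server side can parse them back out. It assumes Python 3's urllib.parse; the server address, the filename field name, and the example.txt value are placeholders rather than values taken from the examples.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def form_url(action, fields):
    """Build the URL a browser would request for a get-method form."""
    return action + "?" + urlencode(fields)

def query_fields(url):
    """Recover the field names and values from a URL's query string."""
    return parse_qs(urlsplit(url).query)

if __name__ == "__main__":
    # Hypothetical script address and input field name
    url = form_url("http://localhost/cgi-bin/getfile.py", {"filename": "example.txt"})
    print(url)
    print(query_fields(url))
```

Whether the query string was typed by hand or produced by a form, the script on the server sees the same name and value pairs.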
16.9.1.1. Handling private files and errors
As long as CGI scripts have permission to open the desired server-side file, this script can be used to view and locally save any file on the server. For instance, Figure 16-30 shows the page we're served after asking for the file path PyMailCgi/pymailcgi.html, an HTML text file in another application's subdirectory, nested within the parent directory of this script (we explore PyMailCGI in the next chapter). Users can specify both relative and absolute paths to reach a file; any path syntax the server understands will do.
Figure 16-30. Viewing files with relative paths
More generally, this script will display any file path for which the username under which the CGI script runs has read access. On some servers, this is the user "nobody", a predefined username with limited permissions. Just about every server-side file used in web applications will be accessible, though, or else they couldn't be referenced from browsers in the first place. When running our local web server, every file on the computer can be inspected: C:\Mark\WEBSITE\public_html\index.html works fine when entered in the form of Figure 16-29 on my laptop, for example.
That makes for a flexible tool, but it's also potentially dangerous if you are running a server on a remote machine. What if we don't want users to be able to view some files on the server? For example, in the next chapter, we will implement an encryption module for email account passwords. On our server, it is in fact addressable as PyMailCgi/cgi-bin/secret.py. Allowing users to view that module's source code would make encrypted passwords shipped over the Net much more vulnerable to cracking.
To minimize this potential, the getfile script keeps a list, privates, of restricted filenames, and uses the os.path.samefile built-in to check whether a requested filename path points to one of the names on privates. The samefile call checks to see whether the os.stat built-in returns the same identifying information (device and inode numbers) for both file paths. As a result, pathnames that look different syntactically but reference the same file are treated as identical. For example, on the server used for this book's second edition, the following paths to the encryptor module were different strings, but yielded a true result from os.path.samefile:
Unfortunately, the os.path.samefile call is supported on Unix, Linux, and Macs, but not on Windows. To emulate its behavior in Windows, we expand file paths to be absolute, convert to a common case, and compare:
>>> import os
>>> os.getcwd()
'C:\\PP3E-cd\\Examples\\PP3E\\Internet\\Web'
>>>
>>> x = os.path.abspath('../Web/PYMailCgi/cgi-bin/secret.py').lower()
>>> y = os.path.abspath('PyMailCgi/cgi-bin/secret.py').lower()
>>> z = os.path.abspath('./PYMailCGI/cgi-bin/../cgi-bin/SECRET.py').lower()
>>> x
'c:\\pp3e-cd\\examples\\pp3e\\internet\\web\\pymailcgi\\cgi-bin\\secret.py'
>>> y
'c:\\pp3e-cd\\examples\\pp3e\\internet\\web\\pymailcgi\\cgi-bin\\secret.py'
>>> z
'c:\\pp3e-cd\\examples\\pp3e\\internet\\web\\pymailcgi\\cgi-bin\\secret.py'
>>> x == y, y == z
(True, True)
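Packaged as a reusable check, the same emulation might look like the sketch below. The same_path and is_private names are hypothetical, and the privates list holds only the one path mentioned in the text. Note the built-in assumption: folding case is appropriate on Windows, but on a case-sensitive filesystem it would treat distinct files as the same, so this favors over-restricting rather than under-restricting. Recent Python 3 releases also provide os.path.samefile on Windows, which may make the emulation unnecessary there.

```python
import os

def same_path(path1, path2):
    """Compare two paths after making them absolute and folding case."""
    def normalize(path):
        # abspath also collapses '.' and '..' components.
        return os.path.abspath(path).lower()
    return normalize(path1) == normalize(path2)

def is_private(path, privates):
    """True if path names any file on the restricted list."""
    return any(same_path(path, private) for private in privates)

# Restricted files, relative to the directory the script runs in
privates = ["PyMailCgi/cgi-bin/secret.py"]

if __name__ == "__main__":
    print(is_private("./PYMailCGI/cgi-bin/../cgi-bin/SECRET.py", privates))
    print(is_private("PyMailCgi/pymailcgi.html", privates))
```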
Accessing any of the three paths expanded here generates an error page like that in Figure 16-31.
Figure 16-31. Accessing private files
Notice that bona fide file errors are handled differently. Permission problems and attempts to access nonexistent files, for example, are trapped by a different exception handler clause, and they display the exception's message, fetched using Python's sys.exc_info, to give additional context. Figure 16-32 shows one such error page.
Figure 16-32. File errors display
As a general rule of thumb, file-processing exceptions should always be reported in detail, especially during script debugging. If we catch such exceptions in our scripts, it's up to us to display the details (assigning sys.stderr to sys.stdout won't help if Python doesn't print an error message). The current exception's type, data, and traceback objects are always available in the sys module for manual display.
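A brief sketch of that manual reporting: when a script catches a file exception itself, it can fetch the exception's type and value from sys.exc_info and put them in the reply. The read_or_report function and its message format are illustrative assumptions, not the book's error page.

```python
import sys

def read_or_report(filename):
    """Return a file's text, or a descriptive error message on failure."""
    try:
        with open(filename) as file:
            return file.read()
    except Exception:
        # sys.exc_info() returns (type, value, traceback) for the
        # exception currently being handled.
        exc_type, exc_value = sys.exc_info()[:2]
        return "Error reading file: %s: %s" % (exc_type.__name__, exc_value)

if __name__ == "__main__":
    print(read_or_report("no-such-dir/no-such-file.txt"))
```

In a CGI script the returned message would be escaped and embedded in the reply page, so the user sees why the request failed instead of an empty response.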
16.9.2. Uploading Client Files to the Server
The getfile script lets us view server files on the client, but in some sense, it is a general-purpose file download tool. Although not as direct as fetching a file by FTP or over raw sockets, it serves similar purposes. Users of the script can either cut-and-paste the displayed code right off the web page or use their browser's View Source option to view and cut.
But what about going the other way: uploading a file from the client machine to the server? For instance, suppose you are writing a web-based email system, and you need a way to allow users to upload mail attachments. This is not an entirely hypothetical scenario; we will actually implement this idea in the next chapter, when we develop PyMailCGI.
As we saw in Chapter 14, uploads are easy enough to accomplish with a client-side script that uses Python's FTP support module. Yet such a solution doesn't really apply in the context of a web browser; we can't usually ask all of our program's clients to start up a Python FTP script in another window to accomplish an upload. Moreover, there is no simple way for the server-side script to request the upload explicitly, unless an FTP server happens to be running on the client machine (not at all the usual case). Users can email files separately, but this can be inconvenient, especially for email attachments.
So is there no way to write a web-based program that lets its users upload files to a common server? In fact, there is, though it has more to do with HTML than with Python itself. HTML <input> tags also support a type=file option, which produces an input field, along with a button that pops up a file-selection dialog. The name of the client-side file to be uploaded can either be typed into the control or selected with the popup dialog. The HTML page file in Example 16-29 defines a page that allows any client-side file to be selected and uploaded to the server-side script named in the form's action option.
Example 16-29. PP3E\Internet\Web\putfile.html
One constraint worth noting: forms that use file type inputs must also specify a multipart/form-data encoding type and the post submission method, as shown in this file; get-style URLs don't work for uploading files (adding their contents to the end of the URL doesn't make sense). When we visit this HTML file, the page shown in Figure 16-33 is delivered. Pressing its Browse button opens a file-selection dialog, while Upload sends the file.
Figure 16-33. File upload selection page
On the client side, when we press this page's Upload button, the browser opens and reads the selected file and packages its contents with the rest of the form's input fields (if any). When this information reaches the server, the Python script named in the form action tag is run as always, as listed in Example 16-30.
Example 16-30. PP3E\Internet\Web\cgi-bin\putfile.py
Within this script, the Python-specific interfaces for handling uploaded files are employed. They aren't very new, really; the file comes into the script as an entry in the parsed form object returned by cgi.FieldStorage, as usual; its key is clientfile, the input control's name in the HTML page's code.
This time, though, the entry has additional attributes for the file's name on the client. Moreover, accessing the value attribute of an uploaded file input object will automatically read the file's contents all at once into a string on the server. For very large files, we can instead read line by line (or in chunks of bytes) to avoid overflowing memory space. For illustration purposes, the script implements either scheme: based on the setting of the loadtextauto global variable, it either asks for the file contents as a string, or reads it line by line. In general, the CGI module gives us back objects with the following attributes for file upload controls:
Additional attributes are not used by our script. Files represent a third input field object; as we've also seen, the value attribute is a string for simple input fields, and we may receive a list of objects for multiple-selection controls.
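The difference between the two reading schemes described above can be sketched independently of the cgi module. Here an io.BytesIO object stands in for the uploaded file's file-like object, and save_whole and save_by_lines are hypothetical helper names; the real script works on the entry fetched from the parsed form instead.

```python
import io
import os
import tempfile

def save_whole(upload, destpath):
    """Read the entire upload into memory, then write it out in one step."""
    data = upload.read()
    with open(destpath, "wb") as out:
        out.write(data)
    return len(data)                      # number of bytes stored

def save_by_lines(upload, destpath):
    """Copy the upload one line at a time to limit memory use."""
    count = 0
    with open(destpath, "wb") as out:
        for line in upload:               # file-like objects iterate by line
            out.write(line)
            count += 1
    return count                          # number of lines stored

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as uploaddir:
        dest = os.path.join(uploaddir, "upload.txt")
        print(save_whole(io.BytesIO(b"line one\nline two\n"), dest))
        print(save_by_lines(io.BytesIO(b"line one\nline two\n"), dest))
```

The line-by-line version never holds more than one line in memory, which is the property that matters for very large uploads.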
For uploads to be saved on the server, CGI scripts (run by the user "nobody" on some servers) must have write access to the enclosing directory if the file doesn't yet exist, or to the file itself if it does. To help isolate uploads, the script stores all uploads in whatever server directory is named in the uploaddir global. On one Linux server, I had to give this directory a mode of 777 (universal read/write/execute permissions) with chmod to make uploads work in general. This is a nonissue with the local web server used in this chapter, but your mileage may vary; be sure to check permissions if this script fails.
The script also calls os.chmod to set the permission on the server file such that it can be read and written by everyone. If it is created anew by an upload, the file's owner will be "nobody" on some servers, which means anyone out in cyberspace can view and upload the file. On one Linux server, though, the file will also be writable only by the user "nobody" by default, which might be inconvenient when it comes time to change that file outside the Web (the degree of pain can vary per operation).
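For reference, a permission change of this kind is a one-line call. The sketch below uses mode 0o666 (read and write for owner, group, and others) as an assumed value consistent with "read and written by everyone"; the make_world_writable name is made up, and on Windows os.chmod honors only the read-only flag.

```python
import os
import stat
import tempfile

def make_world_writable(path):
    """Grant read/write permission to everyone and return the new mode bits."""
    os.chmod(path, 0o666)
    return stat.S_IMODE(os.stat(path).st_mode)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as uploaddir:
        path = os.path.join(uploaddir, "upload.txt")
        with open(path, "w") as file:
            file.write("uploaded text\n")
        print(oct(make_world_writable(path)))
```

Opening a file this widely is convenient for a demo but is rarely what a production server wants; a narrower mode is usually the safer default.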
If both client and server do their parts, the CGI script presents us with the response page shown in Figure 16-34, after it has stored the contents of the client file in a new or existing file on the server. For verification, the response gives the client and server file paths, as well as an echo of the uploaded file with a line count (in line-by-line reader mode).
Figure 16-34. Putfile response page
Incidentally, we can also verify the upload with the getfile program we wrote in the prior section. Simply access the selection page to type the pathname of the file on the server, as shown in Figure 16-35.
Figure 16-35. Verifying putfile with getfile: selection
If the file upload is successful, the resulting viewer page we will obtain looks like Figure 16-36. Since the user "nobody" (CGI scripts) was able to write the file, "nobody" should be able to view it as well.
Figure 16-36. Verifying putfile with getfile: response
Notice the URL in this page's address field: the browser translated the / character we typed into the selection page to a %2F hexadecimal escape code before adding it to the end of the URL as a parameter. We met URL escape codes like this earlier in this chapter. In this case, the browser did the translation for us, but the end result is as if we had manually called one of the urllib quoting functions on the file path string.
Technically, the %2F escape code here represents the standard URL translation for characters that are not sent literally in a URL parameter (here, the / path separator), under the default encoding scheme browsers employ. Spaces are usually translated to + characters as well. We can often get away without manually translating most such characters when sending paths explicitly (in typed URLs). But as we saw earlier, we sometimes need to be careful to escape characters (e.g., &) that have special meaning within URL strings with urllib tools.
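The browser's translation can be reproduced by hand. This sketch assumes Python 3, where the urllib quoting functions live in urllib.parse; the file path is an invented example chosen to contain a slash, a space, and an ampersand.

```python
from urllib.parse import quote_plus, unquote_plus

def escape_param(value):
    """Escape a string for use as a URL parameter value."""
    # quote_plus escapes '/' as %2F and '&' as %26, and turns spaces into '+'.
    return quote_plus(value)

if __name__ == "__main__":
    path = "uploads/my file&notes.txt"    # hypothetical server path
    escaped = escape_param(path)
    print(escaped)
    print(unquote_plus(escaped) == path)
```

Escaping the & matters most: left as-is, it would be read as the separator before a second parameter and the filename would be cut short.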
16.9.2.1. Handling client path formats
In the end, the putfile.py script stores the uploaded file on the server within a hardcoded uploaddir directory, under the filename at the end of the file's path on the client (i.e., less its client-side directory path). Notice, though, that the splitpath function in this script needs to do extra work to extract the base name of the file on the right. Some browsers may send up the filename in the directory path format used on the client machine; this path format may not be the same as that used on the server where the CGI script runs. This can vary per browser, but it should be addressed for portability.
The standard way to split up paths, os.path.split, knows how to extract the base name, but only recognizes path separator characters used on the platform on which it is running. That is, if we run this CGI script on a Unix machine, os.path.split chops up paths around a / separator. If a user uploads from a DOS or Windows machine, however, the separator in the passed filename is \, not /. Browsers running on a Macintosh may send a path that is more different still.
To handle client paths generically, this script imports platform-specific, path-processing modules from the Python library for each client it wishes to support, and tries to split the path with each until a filename on the right is found. For instance, posixpath handles paths sent from Unix-style platforms, and ntpath recognizes DOS and Windows client paths. We usually don't import these modules directly since os.path.split is automatically loaded with the correct one for the underlying platform, but in this case, we need to be specific since the path comes from another machine. Note that we could have instead coded the path splitter logic like this to avoid some split calls:
import os

def splitpath(origpath):                            # get name at end
    basename = os.path.split(origpath)[1]           # try server paths
    if basename == origpath:                        # didn't change it?
        if '\\' in origpath:
            basename = origpath.split('\\')[-1]     # try DOS clients
        elif '/' in origpath:
            basename = origpath.split('/')[-1]      # try Unix clients
    return basename
But this alternative version may fail for some path formats (e.g., DOS paths with a drive but no backslashes). As is, both options waste time if the filename is already a base name (i.e., has no directory paths on the left), but we need to allow for the more complex cases generically.
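For comparison, the module-based approach described earlier (trying each platform's path module in turn) can be sketched as follows. This is a reconstruction under assumptions, not the putfile.py listing: it tries only posixpath and ntpath, both of which ship with current Python 3, and it leaves out any Macintosh-specific module.

```python
import ntpath
import posixpath

def splitpath(origpath):
    """Return the base filename of a path in Unix or DOS/Windows format."""
    for pathmodule in (posixpath, ntpath):
        # split returns (directory, basename) using that platform's rules.
        basename = pathmodule.split(origpath)[1]
        if basename != origpath:          # this module found a directory part
            return basename
    return origpath                       # already a bare filename

if __name__ == "__main__":
    for clientpath in ("/usr/home/file.txt", r"C:\docs\file.txt",
                       "C:file.txt", "file.txt"):
        print(clientpath, "=>", splitpath(clientpath))
```

Because ntpath understands drive letters, this version also handles a path such as C:file.txt, the drive-without-backslash case that the string-splitting alternative misses.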
This upload script works as planned, but a few caveats are worth pointing out before we close the book on this example:
If you run into any of these limitations, you will have crossed over into the domain of suggested exercises.
16.9.3. More Than One Way to Push Bits over the Net
Finally, let's discuss some context. We've seen three getfile scripts at this point in the book. The one in this chapter is different from the other two we wrote in earlier chapters, but it accomplishes a similar goal:
Really, the getfile CGI script in this chapter only displays files, but it can be considered a download tool when augmented with cut-and-paste operations in a web browser. Moreover, the CGI- and HTTP-based putfile script here is also different from the FTP-based putfile in Chapter 14, but it can be considered an alternative to both socket and FTP uploads.
The point to notice is that there are a variety of ways to ship files around the Internet: sockets, FTP, and HTTP (web pages) can all move files between computers. Technically speaking, we can transfer files with other techniques and protocols too: Post Office Protocol (POP) email, Network News Transfer Protocol (NNTP) news, Telnet, and so on.
Each technique has unique properties but does similar work in the end: moving bits over the Net. All ultimately run over sockets on a particular port, but protocols like FTP add additional structure to the socket layer, and application models like CGI add both structure and programmability.
In the next chapter, we're going to use what we've learned here to build a substantial application that runs entirely on the Web: PyMailCGI, a web-based email tool that allows us to send and view emails in a browser, process email attachments such as images and audio files, and more. At the end of the day, though, it's mostly just bytes over sockets, with a user interface.