14.2. FTP: Transferring Files over the Net
As we saw in the preceding chapter, sockets see plenty of action on the Net. For instance, the getfile example allowed us to transfer entire files between machines. In practice, though, higher-level protocols are behind much of what happens on the Net. Protocols run on top of sockets, but they hide much of the complexity of the network scripting examples of the prior chapter.
FTP, the File Transfer Protocol, is one of the more commonly used Internet protocols. It defines a higher-level conversation model that is based on exchanging command strings and file contents over sockets. By using FTP, we can accomplish the same task as the prior chapter's getfile script, but the interface is simpler and standard: FTP lets us ask for files from any server machine that supports FTP, without requiring that it run our custom getfile script. FTP also supports more advanced operations such as uploading files to the server, getting remote directory listings, and more.
Really, FTP runs on top of two sockets: one for passing control commands between client and server (port 21), and another for transferring bytes. By using a two-socket model, FTP avoids the possibility of deadlocks (i.e., transfers on the data socket do not block dialogs on the control socket). Ultimately, though, Python's ftplib support module allows us to upload and download files at a remote server machine by FTP, without dealing in raw socket calls or FTP protocol details.
14.2.1. Fetching Files with ftplib
Because the Python FTP interface is so easy to use, let's jump right into a realistic example. The script in Example 14-1 automatically fetches and opens a remote file with Python. More specifically, this Python script does the following:
The download portion will run on any machine with Python and an Internet connection. The opening part works if your playfile.py supports your platform; see Chapter 6 for details, and change as needed.
Example 14-1. PP3E\Internet\Ftp\getone.py
Most of the FTP protocol details are encapsulated by the Python ftplib module imported here. This script uses some of the simplest interfaces in ftplib (we'll see others later in this chapter), but they are representative of the module in general.
To open a connection to a remote (or local) FTP server, create an instance of the ftplib.FTP object, passing in the string name (domain or IP style) of the machine you wish to connect to:
connection = FTP(sitename) # connect to ftp site
Assuming this call doesn't throw an exception, the resulting FTP object exports methods that correspond to the usual FTP operations. In fact, Python scripts act much like typical FTP client programs: just replace commands you would normally type or select with method calls:
connection.login(*userinfo)                 # default is anonymous login
connection.cwd(dirname)                     # xfer 1k at a time to localfile
Once connected, we log in and change to the remote directory from which we want to fetch a file. The login method allows us to pass in a username and password as additional optional arguments to specify an account login; by default, it performs anonymous FTP. Notice the use of the nonpassive flag in this script:
if nonpassive:                              # force active FTP if server requires
    connection.set_pasv(False)
If this flag is set to True, the script will transfer the file in active FTP mode rather than the default passive mode. We'll finesse the details of the difference here (it has to do with which end of the dialog chooses port numbers for the transfer), but if you have trouble doing transfers with any of the FTP scripts in this chapter, try using active mode as a first step. In Python 2.1 and later, passive FTP mode is on by default. Now, fetch the file:
connection.retrbinary('RETR ' + filename, localfile.write, 1024)
Once we're in the target directory, we simply call the retrbinary method to download the target server file in binary mode. The retrbinary call will take a while to complete, since it must download a big file. It gets three arguments:
Because this script creates a local file named localfile of the same name as the remote file being fetched, and passes its write method to the FTP retrieval method, the remote file's contents will automatically appear in a local, client-side file after the download is finished. Observe how this file is opened in wb binary output mode; if this script is run on Windows, we want to avoid automatically expanding any \n bytes into \r\n byte sequences (that happens automatically on Windows when writing files opened in w text mode).
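To make the line-feed point concrete, here is a small sketch (written for Python 3, and not part of the book's examples) that writes the same bytes once in binary mode and once in a text mode that imitates the Windows translation. The function name written_sizes is a hypothetical helper introduced only for this illustration; passing newline="\r\n" to open is used here to reproduce the Windows behavior on any platform.

```python
import os
import tempfile


def written_sizes(data):
    """Write the same ASCII data in binary mode and in Windows-style text mode.

    Returns (binary_size, text_size), the resulting file sizes in bytes.
    """
    with tempfile.TemporaryDirectory() as tmp:
        binary_path = os.path.join(tmp, "binary.out")
        text_path = os.path.join(tmp, "text.out")

        # Binary mode: bytes land on disk exactly as they were received.
        with open(binary_path, "wb") as f:
            f.write(data)

        # Text mode with newline="\r\n" expands every \n to \r\n on output,
        # which is what Windows text mode does by default.
        with open(text_path, "w", encoding="ascii", newline="\r\n") as f:
            f.write(data.decode("ascii"))

        return os.path.getsize(binary_path), os.path.getsize(text_path)


print(written_sizes(b"line one\nline two\n"))   # prints (18, 20)
```

The two extra bytes in the text-mode file are harmless in a text document but would corrupt an image or audio file, which is why the download callback is given the write method of a file opened in wb mode.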
Finally, we call the FTP quit method to break the connection with the server and manually close the local file to force it to be complete before it is further processed (it's not impossible that parts of the file are still held in buffers before the close call):
connection.quit()
localfile.close()
And that's all there is to it: all the FTP, socket, and networking details are hidden behind the ftplib interface module. Here is this script in action on a Windows machine; after the download, the image file pops up in a Windows picture viewer on my laptop, as captured in Figure 14-1:
C:\...\PP3E\Internet\Ftp>python getone.py
Pswd?
Connecting...
Downloading...
Open file?y
Figure 14-1. Image file downloaded by FTP and opened
Notice how the standard Python getpass.getpass is used to ask for an FTP password. Like the raw_input built-in function, this call prompts for and reads a line of text from the console user; unlike raw_input, getpass does not echo typed characters on the screen at all (in fact, on Windows it initially used the low-level direct keyboard interface we met in the stream redirection section of Chapter 3). This is handy for protecting things like passwords from potentially prying eyes. Be careful, though: in the current IDLE GUI, the password is echoed anyhow!
Configure this script's initial assignments for a site and file you wish to fetch, and run this on your machine to see the opened file.[*] The thing to notice is that this otherwise typical Python script fetches information from an arbitrarily remote FTP site and machine. Given an Internet link, any information published by an FTP server on the Net can be fetched by and incorporated into Python scripts using interfaces such as these.
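Pulling together the calls walked through in this section, here is a minimal sketch of a single-file download function. It is not the Example 14-1 listing: it is written for Python 3, the function name fetch_one and the localdir and connect parameters are hypothetical names introduced here (connect exists only so the function can be exercised against a stand-in object when no network is available), and it leaves out the password prompt and the step that opens the fetched file.

```python
import os
from ftplib import FTP


def fetch_one(sitename, dirname, filename, userinfo=(), nonpassive=False,
              localdir=".", connect=FTP):
    """Download one remote file in binary mode and return the local path.

    userinfo is () for anonymous FTP or a (username, password) tuple.
    localdir and connect are illustrative additions, not ftplib features.
    """
    localpath = os.path.join(localdir, filename)
    connection = connect(sitename)              # connect to FTP site
    connection.login(*userinfo)                 # () means anonymous login
    connection.cwd(dirname)                     # go to the remote directory
    if nonpassive:
        connection.set_pasv(False)              # force active-mode FTP
    with open(localpath, "wb") as localfile:    # binary mode: no \n changes
        # retrbinary calls localfile.write once per 1024-byte block received
        connection.retrbinary("RETR " + filename, localfile.write, 1024)
    connection.quit()                           # end the server session
    return localpath
```

A call such as fetch_one('ftp.example.com', '.', 'picture.jpg', ('someuser', password)) would then fetch the file into the current directory; the site, filename, and account here are placeholders, not real settings.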
14.2.2. Using urllib to FTP Files
In fact, FTP is just one way to transfer information across the Net, and there are more general tools in the Python library to accomplish the prior script's download. Perhaps the most straightforward is the Python urllib module: given an Internet address string (a URL, or Universal Resource Locator), this module opens a connection to the specified server and returns a file-like object ready to be read with normal file object method calls (e.g., read, readline).
We can use such a higher-level interface to download anything with an address on the Web: files published by FTP sites (using URLs that start with "ftp://"); web pages and output of scripts that live on remote servers (using "http://" URLs); local files (using "file://" URLs); Gopher server data; and more. For instance, the script in Example 14-2 does the same as the one in Example 14-1, but it uses the general urllib module to fetch the source distribution file, instead of the protocol-specific ftplib.
Example 14-2. PP3E\Internet\Ftp\getone-urllib.py
Don't sweat the details of the URL string used here; it is fairly complex, and we'll explain its structure and that of URLs in general in Chapter 16. We'll also use urllib again in this and later chapters to fetch web pages, format generated URL strings, and get the output of remote scripts on the Web.
Technically speaking, urllib supports a variety of Internet protocols (HTTP, FTP, Gopher, and local files). Unlike ftplib, urllib is generally used for reading remote objects, not for writing or uploading them (though the HTTP and FTP protocols support file uploads). As with ftplib, retrievals must generally be run in threads if blocking is a concern. But the basic interface shown in this script is straightforward. The call:
remotefile = urllib.urlopen(remoteaddr) # returns input file-like object
contacts the server named in the remoteaddr URL string and returns a file-like object connected to its download stream (here, an FTP-based socket). Calling this file's read method pulls down the file's contents, which are written to a local client-side file. An even simpler interface:
urllib.urlretrieve(remoteaddr, filename)
also does the work of opening a local file and writing the downloaded bytes into it, things we do manually in the script as coded. This comes in handy if we want to download a file, but it is less useful if we want to process its data immediately.
Either way, the end result is the same: the desired server file shows up on the client machine. The output is similar to the original version, but we don't try to automatically open this time (I've changed the password in the URL here to protect the innocent):
C:\...\PP3E\Internet\Ftp>getone-urllib.py
Pswd?
Downloading ftp://lutz:firstname.lastname@example.org/lawnlake2-jan-03.jpg;type=i
For more urllib download examples, see the section on HTTP in this chapter, and the server-side examples in Chapter 16. As we'll see in Chapter 16, in bigger terms, tools like urllib.urlopen allow scripts to both download remote files and invoke programs that are located on a remote server machine, and so serve as useful tools for testing and using web sites in Python scripts. In Chapter 16, we'll also see that urllib includes tools for formatting (escaping) URL strings for safe transmission.
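For readers on Python 3, where the urlopen call discussed in this section lives in urllib.request rather than at the top of urllib, the same URL-based download can be sketched as follows. The helper names ftp_url and download are hypothetical, and the host, path, and account in the usage comment are placeholders; the ";type=i" suffix is the binary-transfer marker seen in the URL printed above.

```python
import shutil
from urllib.parse import quote
from urllib.request import urlopen


def ftp_url(site, path, user=None, password=None, binary=True):
    """Build an ftp:// URL, escaping the account details and the path."""
    login = ""
    if user is not None:
        login = quote(user, safe="")
        if password is not None:
            login += ":" + quote(password, safe="")
        login += "@"
    url = "ftp://" + login + site + "/" + quote(path.lstrip("/"))
    return url + (";type=i" if binary else "")   # type=i: binary transfer


def download(url, localname):
    """Copy whatever the URL names (ftp, http, file, ...) to a local file."""
    with urlopen(url) as remotefile, open(localname, "wb") as localfile:
        shutil.copyfileobj(remotefile, localfile)
    return localname


# Example use (placeholder site and account, so left commented out):
# download(ftp_url("ftp.example.com", "pub/photo.jpg", "someuser", "secret"),
#          "photo.jpg")
```

Because download only relies on the file-like object that urlopen returns, the same function also works for "http://" and "file://" addresses, which is the generality this section describes.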
14.2.3. FTP get and put Utilities
When I present the ftplib interfaces in Python classes, students often ask why programmers need to supply the RETR string in the retrieval method. It's a good question: the RETR string is the name of the download command in the FTP protocol, but ftplib is supposed to encapsulate that protocol. As we'll see in a moment, we have to supply an arguably odd STOR string for uploads as well. It's boilerplate code that you accept on faith once you see it, but that begs the question. You could propose a patch to ftplib, but that's not really a good answer for beginning Python students, and it may break existing code (the interface is as it is for a reason).
Perhaps a better answer is that Python makes it easy to extend the standard library modules with higher-level interfaces of our own: with just a few lines of reusable code, we can make the FTP interface look any way we want in Python. For instance, we could, once and for all, write utility modules that wrap the ftplib interfaces to hide the RETR string. If we place these utility modules in a directory on PYTHONPATH, they become just as accessible as ftplib itself, automatically reusable in any Python script we write in the future. Besides removing the RETR string requirement, a wrapper module could also make assumptions that simplify FTP operations into single function calls.
For instance, given a module that encapsulates and simplifies ftplib, our Python fetch-and-play script could be further reduced to the script shown in Example 14-3: essentially just two function calls plus a password prompt.
Example 14-3. PP3E\Internet\Ftp\getone-modular.py
Besides having a much smaller line count, the meat of this script has been split off into a file for reuse elsewhere. If you ever need to download a file again, simply import an existing function instead of copying code with cut-and-paste editing. Changes in download operations would need to be made in only one file, not everywhere we've copied boilerplate code; getfile.getfile could even be changed to use urllib rather than ftplib without affecting any of its clients. It's good engineering.
14.2.3.1. Download utility
So just how would we go about writing such an FTP interface wrapper (he asks, rhetorically)? Given the ftplib library module, wrapping downloads of a particular file in a particular directory is straightforward. Connected FTP objects support two download methods:
We will meet the retrlines method in a later example; the getfile utility module in Example 14-4 always transfers in binary mode with retrbinary. That is, files are downloaded exactly as they were on the server, byte for byte, with the server's line-feed conventions in text files. You may need to convert line feeds after downloads if they look odd in your text editor; see the converter tools in Chapter 7 for pointers.
Example 14-4. PP3E\Internet\Ftp\getfile.py
This module is mostly just a repackaging of the FTP code we used to fetch the image file earlier, to make it simpler and reusable. Because it is a callable function, the exported getfile.getfile here tries to be as robust and generally useful as possible, but even a function this small implies some design decisions. Here are a few usage notes:
Also notice that, despite its name, this module is very different from the getfile.py script we studied at the end of the sockets material in the preceding chapter. The socket-based getfile implemented client and server-side logic to download a server file to a client machine over raw sockets.
The new getfile here is a client-side tool only. Instead of raw sockets, it uses the simpler FTP protocol to request a file from a server; all socket-level details are hidden in the ftplib module's implementation of the FTP client protocol. Furthermore, the server here is a perpetually running program on the server machine, which listens for and responds to FTP requests on a socket, on the dedicated FTP port (number 21). The net functional effect is that this script requires an FTP server to be running on the machine where the desired file lives, but such a server is much more likely to be available.
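As a rough idea of what such a download wrapper can look like, here is a sketch in Python 3. The file, site, dir, and user arguments and the printed messages follow the interactive session shown later in this section; everything else is an assumption made for illustration, including the verbose and refetch flags, the True/False return value, and the connect parameter (which only exists so a stand-in object can replace ftplib.FTP when testing offline). It should not be read as the Example 14-4 listing.

```python
import os
from ftplib import FTP


def getfile(file, site, dir, user=(), verbose=True, refetch=False,
            connect=FTP):
    """Fetch file from dir at site by FTP, in binary mode, into the
    current working directory.

    user is () for anonymous FTP or a (username, password) tuple.
    Returns True if a download happened, False if the file was already here.
    """
    if os.path.exists(file) and not refetch:
        if verbose:
            print(file, "already fetched")
        return False
    if verbose:
        print("Downloading", file)
    connection = connect(site)                  # open the control connection
    connection.login(*user)                     # anonymous if user is ()
    connection.cwd(dir)
    with open(file, "wb") as local:
        # The RETR command string is supplied here, once, for all callers
        connection.retrbinary("RETR " + file, local.write, 1024)
    connection.quit()
    if verbose:
        print("Download done.")
    return True
```

Callers never see the RETR string, and the skip-if-present check means repeated calls for the same file cost nothing unless refetch is set.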
14.2.3.2. Upload utility
While we're at it, let's write a script to upload a single file by FTP to a remote machine. The upload interfaces in the FTP module are symmetric with the download interfaces. Given a connected FTP object, its:
Unlike the download interfaces, both of these methods are passed a file object as a whole, not a file object method (or other function). We will meet the storlines method in a later example. The utility module in Example 14-5 uses storbinary such that the file whose name is passed in is always uploaded verbatim: in binary mode, without line-feed translations for the target machine's conventions. If this script uploads a text file, it will arrive exactly as stored on the machine it came from, client line-feed markers and all.
Example 14-5. PP3E\Internet\Ftp\putfile.py
Notice that for portability, the local file is opened in rb binary mode this time to suppress automatic line-feed character conversions, in case this is run on Windows: if this is binary information, we don't want any bytes that happen to have the value of the \r carriage-return character to mysteriously go away during the transfer.
This script uploads a file you name on the command line as a self-test, but you will normally pass in real remote filename, site name, and directory name strings. Also like the download utility, you may pass a (username, password) tuple to the user argument to trigger nonanonymous FTP mode (anonymous FTP is the default).
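The upload side can be sketched in the same way. This is again a Python 3 illustration under stated assumptions rather than the Example 14-5 code: the verbose flag and the connect parameter are hypothetical additions, and using os.path.basename for the remote name is a choice made here so that a local path can be passed in.

```python
import os
from ftplib import FTP


def putfile(file, site, dir, user=(), verbose=True, connect=FTP):
    """Store a local file in dir at site by FTP, in binary mode.

    user is () for anonymous FTP or a (username, password) tuple.
    """
    remotename = os.path.basename(file)         # drop any local directory part
    if verbose:
        print("Uploading", file)
    with open(file, "rb") as local:             # rb: send bytes untranslated
        remote = connect(site)
        remote.login(*user)
        remote.cwd(dir)
        # storbinary is handed the whole file object, not a method of it
        remote.storbinary("STOR " + remotename, local, 1024)
        remote.quit()
    if verbose:
        print("Upload done.")
```

Note the asymmetry with downloads that the text points out: retrbinary receives a callback such as a file's write method, while storbinary receives the open file object itself and reads from it.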
14.2.3.3. Playing the Monty Python theme song
It's time for a bit of fun. Let's use these scripts to transfer a copy of the Monty Python theme song audio file I have at my web site. First, let's write a module that downloads and plays the sample file, as shown in Example 14-6.
Example 14-6. PP3E\Internet\Ftp\sousa.py
There's not much to this script, because it really just combines two tools we've already coded. We're reusing Example 14-4's getfile to download, and Chapter 6's playfile module (Example 6-16) to play the audio sample after it is downloaded (turn back to that example for more details on the player part of the task). Also notice the last two lines in this file: we can achieve the same effect by passing in the audio filename as a command-line argument to our original script, but it's less direct.
This script will run on any machine with Python, an Internet link, and a recognizable audio player; it works on my Windows laptop with a dial-up or broadband Internet connection, and it plays the music clip in Windows Media Player (if I could insert an audio file hyperlink here to show what it sounds like, I would):
C:\...\PP3E\Internet\Ftp>sousa.py
Pswd?
Downloading sousa.au
Download done.
C:\...\PP3E\Internet\Ftp>sousa.py
Pswd?
sousa.au already fetched
The getfile and putfile modules can be used to move the sample file around, too. Both can either be imported by clients that wish to use their functions, or run as top-level programs to trigger self-tests and command-line usage. Let's run these scripts from a command line and the interactive prompt to see how they work. When run standalone, parameters are passed in the command line and the default file settings are used:
C:\...\PP3E\Internet\Ftp>putfile.py sousa.py
ftp.rmi.net pswd?
Uploading sousa.py
Upload done.
When imported, parameters are passed explicitly to functions:
C:\...\PP3E\Internet\Ftp>python
>>> from getfile import getfile
>>> getfile(file='sousa.au', site='ftp.rmi.net', dir='.', user=('lutz', 'XXX'))
sousa.au already fetched
C:\...\PP3E\Internet\Ftp>del sousa.au
C:\...\PP3E\Internet\Ftp>python
>>> from getfile import getfile
>>> getfile(file='sousa.au', site='ftp.rmi.net', dir='.', user=('lutz', 'XXX'))
Downloading sousa.au
Download done.
>>> from PP3E.System.Media.playfile import playfile
>>> playfile('sousa.au')
14.2.3.4. Adding user interfaces
If you read the preceding chapter, you'll recall that it concluded with a quick look at scripts that added a user interface to a socket-based getfile script, one that transferred files over a proprietary socket dialog instead of over FTP. At the end of that presentation, I mentioned that FTP is a much more generally useful way to move files around because FTP servers are so widely available on the Net. For illustration purposes, Example 14-7 shows a simple mutation of the prior chapter's user interface, implemented as a new subclass of the last chapter's general form builder.
Example 14-7. PP3E\Internet\Ftp\getfilegui.py
If you flip back to the end of the preceding chapter, you'll find that this version is similar in structure to its counterpart there; in fact, it has the same name (and is distinct only because it lives in a different directory). The class here, though, knows how to use the FTP-based getfile module from earlier in this chapter instead of the socket-based getfile module we met a chapter ago. When run, this version also implements more input fields, as in Figure 14-2.
Figure 14-2. FTP getfile input form
Notice that a full file path is entered for the local directory here. Otherwise, the script assumes the current working directory, which changes after each download and can vary depending on where the GUI is launched (e.g., the current directory differs when this script is run by the PyDemos program at the top of the examples tree). When we click this GUI's Submit button (or press the Enter key), the script simply passes the form's input field values as arguments to the getfile.getfile FTP utility function shown earlier in this section. It also posts a pop up to tell us the download has begun (Figure 14-3).
Figure 14-3. FTP getfile info pop up
As currently coded, further download status messages show up in the console window; here are the messages for a successful download, as well as one that failed when I mistyped my password (no, it's not really "xxxxxxxxx"):
C:\...\PP3E\Internet\Ftp>getfilegui.py
Server Name => ftp.rmi.net
User Name? => lutz
Local Dir => c:\temp
File Name => calendar.html
Password? => xxxxxxxx
Remote Dir => .
Download of "calendar.html" has failed:
ftplib.error_perm 530 Login incorrect.
Server Name => ftp.rmi.net
User Name? => lutz
Local Dir => c:\temp
File Name => calendar.html
Password? => xxxxxxxxx
Remote Dir => .
Download of "calendar.html" successful
Given a username and password, the downloader logs into the specified account. To do anonymous FTP instead, leave the username and password fields blank.
Now, to illustrate the threading capabilities of this GUI, start a download of a large file, then start another download while this one is in progress. The GUI stays active while downloads are underway, so we simply change the input fields and press Submit again.
This second download starts and runs in parallel with the first, because each download is run in a thread, and more than one Internet connection can be active at once. In fact, the GUI itself stays active during downloads only because downloads are run in threads; if they were not, even screen redraws wouldn't happen until a download finished.
We discussed threads in Chapter 5, but this script illustrates some practical thread concerns:
We learned about ways to work around the no-GUI rule for threads in Chapter 11, and we will apply such techniques when we explore the PyMailGUI example in the next chapter. To be portable, though, we can't really close the GUI until the active-thread count falls to zero. Here is the sort of output that appears in the console window when two downloads overlap in time (these particular threads overlapped a long time ago):
C:\...\PP3E\Internet\Ftp>python getfilegui.py
User Name? =>
Server Name => ftp.python.org
Local Dir => c:\temp
Password? =>
File Name => python1.5.tar.gz
Remote Dir => pub/python/src
User Name? => lutz
Server Name => starship.python.net
Local Dir => c:\temp
Password? => xxxxxx
File Name => about-pp.html
Remote Dir => public_html/home
Download of "about-pp.html" successful
Download of "python1.5.tar.gz" successful
This example isn't much more useful than a command line-based tool, of course, but it can be easily modified by changing its Python code, and it provides enough of a GUI to qualify as a simple, first-cut FTP user interface. Moreover, because this GUI runs downloads in Python threads, more than one can be run at the same time from this GUI without having to start or restart a different FTP client tool.
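The threading arrangement described here, with each transfer running in its own thread and a count of active threads that must reach zero before it is safe to shut down, can be sketched without any GUI code at all. The class below is a hypothetical illustration in Python 3, not the FtpForm code: the name TransferThreads, its methods, and the results list are all invented for this example, and the action passed in stands for any transfer function, such as a download wrapper.

```python
import threading


class TransferThreads:
    """Run transfer actions in threads and track how many are still active."""

    def __init__(self):
        self.active = 0                  # number of transfers in progress
        self.lock = threading.Lock()     # guards active and results
        self.results = []                # status messages, in finish order

    def start(self, label, action, *args):
        """Start action(*args) in a new thread and return the thread."""
        with self.lock:
            self.active += 1
        thread = threading.Thread(target=self._run, args=(label, action, args))
        thread.start()
        return thread

    def _run(self, label, action, args):
        # Runs in the worker thread: it reports to the console only and
        # never touches GUI objects.
        try:
            action(*args)
            message = 'Download of "%s" successful' % label
        except Exception as exc:
            message = 'Download of "%s" has failed: %s' % (label, exc)
        with self.lock:
            self.results.append(message)
            self.active -= 1
        print(message)
```

A GUI built on this would call start from its Submit handler and refuse to close while active is greater than zero. Because threads finish in whatever order the transfers complete, messages can appear in a different order than the requests were made, as in the console output above.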
While we're in a GUI mood, let's add a simple interface to the putfile utility too. The script in Example 14-8 creates a dialog that starts uploads in threads. It's almost the same as the getfile GUI we just wrote, so there's nothing new to say. In fact, because get and put operations are so similar from an interface perspective, most of the get form's logic was deliberately factored out into a single generic class (FtpForm), so changes need be made in only a single place. That is, the put GUI here is mostly just a reuse of the get GUI, with distinct output labels and transfer methods. It's in a file by itself to make it easy to launch as a standalone program.
Example 14-8. PP3E\Internet\Ftp\putfilegui.py
Running this script looks much like running the download GUI, because it's almost entirely the same code at work. Let's upload some files from the client machine to the server; Figure 14-4 shows the state of the GUI while starting one.
Figure 14-4. FTP putfile input form
And here is the console window output we get when uploading two files in parallel; here again, uploads run in threads, so if we start a new upload before one in progress is finished, they overlap in time:
User Name? => lutz
Server Name => starship.python.net
Local Dir => c:\stuff\website\public_html
Password? => xxxxxx
File Name => about-PP3E.html
Remote Dir => public_html
User Name? => lutz
Server Name => starship.python.net
Local Dir => c:\stuff\website\public_html
Password? => xxxxxx
File Name => about-ppr3e.html
Remote Dir => public_html
Upload of "about-PP3E.html" successful
Upload of "about-ppr2e.html" successful
Finally, we can bundle up both GUIs in a single launcher script that knows how to start the get and put interfaces, regardless of which directory we are in when the script is started, and independent of the platform on which it runs. Example 14-9 shows this process.
Example 14-9. PP3E\Internet\Ftp\PyFtpGui.pyw
When this script is started, both the get and put GUIs appear as distinct, independently run programs; alternatively, we might attach both forms to a single interface. We could get much fancier than these two interfaces, of course. For instance, we could pop up local file selection dialogs, and we could display widgets that give the status of downloads and uploads in progress. We could even list files available at the remote site in a selectable listbox by requesting remote directory listings over the FTP connection. To learn how to add features like that, though, we need to move on to the next section.
14.2.4. Downloading Web Sites (Mirrors)
Once upon a time, I used Telnet to manage my web site at my Internet Service Provider (ISP).[*] Today, as with most personal web sites, I maintain mine on my laptop and transfer its files to and from my ISP as needed. Often, this is a simple matter of one or two files, and it can be accomplished with a command-line FTP client. Sometimes, though, I need an easy way to transfer the entire site. Maybe I need to download to detect files that have become out of sync. Occasionally, the changes are so involved that it's easier to upload the entire site in a single step.
Although there are a variety of ways to approach this task, Python can help here, too: Python scripts that automate the upload and download tasks associated with maintaining my web site on my laptop provide a portable and mobile solution. Because Python FTP scripts will work on any machine with sockets, they can be run on my laptop and on nearly any other computer where Python is installed. Furthermore, the same scripts used to transfer page files to and from my PC can be used to copy ("mirror") my site to another web server as a backup copy, should my ISP experience an outage.
The following two scripts address these needs. The first, downloadflat.py, automatically downloads (i.e., copies) by FTP all the files in a directory at a remote site to a directory on the local machine. I keep the main copy of my web site files on my PC these days, but I use this script in two ways:
More generally, this script (shown in Example 14-10) will download a directory full of files to any machine with Python and sockets, from any machine running an FTP server.
Example 14-10. PP3E\Internet\Ftp\mirror\downloadflat.py
There's not a whole lot that is new to speak of in this script, compared to other FTP examples we've seen thus far. We open a connection with the remote FTP server, log in with a username and password for the desired account (this script never uses anonymous FTP), and go to the desired remote directory. New here, though, are loops to iterate over all the files in local and remote directories, text-based retrievals, and file deletions:
All of this is simpler in action than in words. Here is the command I use to download my entire web site from my ISP server account to my Windows laptop PC, in a single step:
C:\Mark\temp\website>c:\...\PP3E\Internet\Ftp\mirror\downloadflat.py
Password for lutz on home.rmi.net:
Clean local directory first? y
connecting...
deleting local 2004-longmont-classes.html
deleting local 2005-longmont-classes.html
deleting local about-hopl.html
deleting local about-lp-toc.html
deleting local about-lp.html
deleting local about-lp2e.html
...
...lines deleted...
...
deleting local dsc00475.jpg
deleting local dsc00506.jpg
downloading 2004-longmont-classes.html to .\2004-longmont-classes.html as text
downloading 2005-longmont-classes.html to .\2005-longmont-classes.html as text
downloading about-hopl.html to .\about-hopl.html as text
downloading about-lp-toc.html to .\about-lp-toc.html as text
downloading about-lp.html to .\about-lp.html as tex
...
...lines deleted...
...
downloading lp2e-updates.html to .\lp2e-updates.html as text
downloading 109_0137.JPG to .\109_0137.JPG as image
downloading sousa.au to .\sousa.au as audio
downloading sousa.py to .\sousa.py as text
downloading pp2e-cd-dir.txt.gz to .\pp2e-cd-dir.txt.gz as text gzip
downloading wxPython.doc.tgz to .\wxPython.doc.tgz as application gzip
downloading img_0694.jpg to .\img_0694.jpg as image
downloading t250.jpg to .\t250.jpg as image
downloading c3100.gif to .\c3100.gif as image
downloading ipod.gif to .\ipod.gif as image
downloading lp70.jpg to .\lp70.jpg as image
downloading pic23.html to .\pic23.html as text
downloading 2006-longmont-classes.html to .\2006-longmont-classes.html as text
Done: 258 files downloaded.
This may take a few moments to complete, depending on your site's size and your connection speed (it's bound by network speed constraints, and it usually takes roughly five minutes on my current laptop and wireless broadband connection). It is much more accurate and easier than downloading files by hand, though. The script simply iterates over all the remote files returned by the nlst method, and downloads each with the FTP protocol (i.e., over sockets) in turn. It uses text transfer mode for names that imply text data, and binary mode for others.
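The core of this loop, listing remote names with nlst and then picking text or binary transfer from each filename, can be sketched as follows in Python 3. This is an illustration rather than the Example 14-10 code: the function names transfer_mode and download_dir are hypothetical, the use of mimetypes.guess_type to classify names is an assumption suggested by the "as text gzip" style labels in the output above, and connection stands for an already logged-in ftplib.FTP object positioned in the remote directory.

```python
import mimetypes
import os


def transfer_mode(filename):
    """Return "text" for uncompressed text types, "binary" for all else."""
    mimetype, encoding = mimetypes.guess_type(filename)
    maintype = (mimetype or "?/?").split("/")[0]    # "text", "image", ...
    return "text" if maintype == "text" and encoding is None else "binary"


def download_dir(connection, localdir="."):
    """Copy every file in the current remote directory to localdir.

    Assumes a flat directory: names returned by nlst are treated as files.
    Returns the number of files downloaded.
    """
    count = 0
    for remotename in connection.nlst():            # remote names, as strings
        if remotename in (".", ".."):               # some servers list these
            continue
        localpath = os.path.join(localdir, remotename)
        if transfer_mode(remotename) == "text":
            # retrlines strips line ends, so add the local \n back on write
            with open(localpath, "w", encoding=connection.encoding) as f:
                connection.retrlines("RETR " + remotename,
                                     lambda line: f.write(line + "\n"))
        else:
            with open(localpath, "wb") as f:
                connection.retrbinary("RETR " + remotename, f.write)
        count += 1
    return count
```

Text-mode retrieval lets each line be rewritten with the local platform's line-end convention, while compressed text such as a .txt.gz file is classified as binary so that its bytes arrive untouched.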
With the script running this way, I make sure the initial assignments in it reflect the machines involved, and then run the script from the local directory where I want the site copy to be stored. Because the target download directory is usually not where the script lives, I need to give Python the full path to the script file. When run on a server in a Telnet session window, for instance, the execution and script directory paths are different, but the script works the same way.
If you elect to delete local files in the download directory, you may also see a batch of "deleting local..." messages scroll by on the screen before any "downloading..." lines appear: this automatically cleans out any garbage lingering from a prior download. And if you botch the input of the remote site password, a Python exception is raised; I sometimes need to run it again (and type more slowly):
C:\Mark\temp\website>c:\...\PP3E\Internet\Ftp\mirror\downloadflat.py
Password for lutz on home.rmi.net:
Clean local directory first? y
connecting...
Traceback (most recent call last):
  File "c:\...\PP3E\Internet\Ftp\mirror\downloadflat.py", line 27, in ?
    connection.login(remoteuser, remotepass) # login as user/pass...
  File "C:\Python24\lib\ftplib.py", line 362, in login
    if resp == '3': resp = self.sendcmd('PASS ' + passwd)
  File "C:\Python24\lib\ftplib.py", line 241, in sendcmd
    return self.getresp( )
  File "C:\Python24\lib\ftplib.py", line 214, in getresp
    raise error_perm, resp
ftplib.error_perm: 530 Login incorrect.
It's worth noting that this script is at least partially configured by assignments near the top of the file. In addition, the password and deletion options are given by interactive inputs, and one command-line argument is allowed: the local directory name to store the downloaded files (it defaults to ".", the directory where the script is run). Command-line arguments could be employed to universally configure all the other download parameters and options, too, but because of Python's simplicity and lack of compile/link steps, changing settings in the text of Python scripts is usually just as easy as typing words on a command line.[*]
14.2.5. Uploading Web Sites
Uploading a full directory is symmetric to downloading: it's mostly a matter of swapping the local and remote machines and operations in the program we just met. The script in Example 14-11 uses FTP to copy all files in a directory on the local machine on which it runs, up to a directory on a remote machine.
I really use this script too, most often to upload all of the files maintained on my laptop PC to my ISP account in one fell swoop. I also sometimes use it to copy my site from my PC to a mirror machine or from the mirror machine back to my ISP. Because this script runs on any computer with Python and sockets, it happily transfers a directory from any machine on the Net to any machine running an FTP server. Simply change the initial setting in this module as appropriate for the transfer you have in mind.
Example 14-11. PP3E\Internet\Ftp\mirror\uploadflat.py
Similar to the mirror download script, this program illustrates a handful of new FTP interfaces and a set of FTP scripting techniques:
As with the mirror download script, this program simply iterates over all files to be transferred (files in the local directory listing this time), and transfers each in turn, in either text or binary mode, depending on the files' names. Here is the command I use to upload my entire web site from my laptop Windows PC to the remote Unix server at my ISP, in a single step:
C:\Mark\temp\website>c:\...\PP3E\Internet\Ftp\mirror\uploadflat.py
Password for lutz on home.rmi.net:
Clean remote directory first? n
connecting...
uploading .\109_0137.JPG to 109_0137.JPG as image
uploading .\2004-longmont-classes.html to 2004-longmont-classes.html as text
uploading .\2005-longmont-classes.html to 2005-longmont-classes.html as text
uploading .\2006-longmont-classes.html to 2006-longmont-classes.html as text
uploading .\about-hopl.html to about-hopl.html as text
...
...lines deleted...
...
uploading .\visitor_poundbang.py to visitor_poundbang.py as text
uploading .\wcall.py to wcall.py as text
uploading .\wcall_find.py to wcall_find.py as text
uploading .\wcall_find_patt.py to wcall_find_patt.py as text
uploading .\wcall_visitor.py to wcall_visitor.py as text
uploading .\whatsnew.html to whatsnew.html as text
uploading .\whatsold.html to whatsold.html as text
uploading .\wxPython.doc.tgz to wxPython.doc.tgz as application gzip
uploading .\xlate-lp.html to xlate-lp.html as text
uploading .\zaurus0.jpg to zaurus0.jpg as image
uploading .\zaurus1.jpg to zaurus1.jpg as image
uploading .\zaurus2.jpg to zaurus2.jpg as image
uploading .\zoo-jan-03.jpg to zoo-jan-03.jpg as image
uploading .\zopeoutline.htm to zopeoutline.htm as text
Done: 258 files uploaded.
On my current laptop and wireless broadband connection, this process typically takes seven minutes, depending on server load. As with the download script, I usually run this command from the local directory where my web files are kept, and I pass Python the full path to the script. When I run this on a Linux server, it works in the same way, but the paths to the script and my web files directory differ. If you elect to clean the remote directory before uploading, you'll get a bunch of "deleting remote..." messages before the "uploading..." lines here, too:
...
deleting remote uk-3.jpg
deleting remote whatsnew.html
deleting remote whatsold.html
deleting remote xlate-lp.html
deleting remote uploadflat.py
deleting remote ora-lp-france.gif
deleting remote LJsuppcover.jpg
deleting remote sonyz505js.gif
deleting remote pic14.html
...
14.2.6. Refactoring Uploads and Downloads for Reuse
The directory upload and download scripts of the prior two sections work as advertised and, apart from the new mimetypes logic, were all we wrote in the prior edition of this book. If you look at these two scripts long enough, though, their similarities will pop out at you eventually. In fact, they are largely the same: they use identical code to configure transfer parameters, connect to the FTP server, and determine file type. The exact details have been lost to time, but some of this code was certainly copied from one file to the other.
Although such redundancy isn't a cause for alarm if we never plan on changing these scripts, it can be a killer in software projects in general. When you have two copies of identical bits of code, not only is there a danger of them becoming out of sync over time (you'll lose uniformity in user interface and behavior), but you also effectively double your effort when it comes time to change code that appears in both places. Unless you're a big fan of extra work, avoid redundancy wherever possible.
This redundancy is especially glaring when we look at the complex code that uses mimetypes to determine file types. Repeating magic like this in more than one place is almost always a bad idea: not only do we have to remember how it works every time we need the same utility, but it is also a recipe for errors.
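To make the point concrete, here is a minimal sketch of the kind of name-based type test being discussed. It is not the scripts' own code: the function name transfer_kind and its return shape are assumptions chosen for illustration. It relies only on the standard library's mimetypes.guess_type call.

```python
import mimetypes

def transfer_kind(filename):
    """Return (is_text, maintype, encoding) guessed from the file's name.

    Hypothetical helper: files whose type is unknown, or that are
    compressed (encoding is not None), are treated as binary.
    """
    mimetype, encoding = mimetypes.guess_type(filename, strict=False)
    mimetype = mimetype or '?/?'              # unknown extension: not text
    maintype = mimetype.split('/')[0]         # 'text', 'image', 'application'...
    is_text = (maintype == 'text' and encoding is None)
    return is_text, maintype, encoding

for name in ('whatsnew.html', 'zaurus0.jpg', 'wxPython.doc.tgz'):
    print(name, transfer_kind(name))
```

Logic like this is short, but it is subtle enough (the encoding test for compressed files, the unknown-type fallback) that keeping a single copy is worth the effort.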
14.2.6.1. Refactoring with functions
As originally coded, our download and upload scripts comprise top-level script code that relies on global variables. Such a structure is difficult to reuse: code runs immediately on imports, and it's difficult to generalize for varying contexts. Worse, it's difficult to maintain: when you program by cut-and-paste of existing code, you increase the cost of future changes every time you click the Paste button.
To demonstrate how we might do better, Example 14-12 shows one way to refactor (reorganize) the download script. By wrapping its parts in functions, they become reusable in other modules, including our upload program.
Example 14-12. PP3E\Internet\Ftp\mirror\downloadflat_modular.py
Compare this version with the original. This script, and every other in this section, runs the same as the original flat download and upload programs, so we won't repeat their outputs here. Although we haven't changed its behavior, we've modified the script's software structure radically: its code is now a set of tools that can be imported and reused in other programs.
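The general shape of such a function-based structure can be sketched as follows. This is an illustration under stated assumptions rather than the example's actual code: the function names, the attribute names on the configuration object, and the ftp_factory parameter are all hypothetical. The only FTP calls used are the standard ftplib.FTP constructor, login, and cwd.

```python
from types import SimpleNamespace

def configTransfer(site, rdir, user, localdir='.'):
    """Collect transfer settings in one namespace object instead of globals."""
    return SimpleNamespace(remotesite=site, remotedir=rdir,
                           remoteuser=user, localdir=localdir)

def connectFtp(cf, password, ftp_factory):
    """Connect, log in, and change to the remote directory named in cf.

    ftp_factory is any callable that accepts a host name and returns an
    object with login and cwd methods; in real use, pass ftplib.FTP.
    """
    connection = ftp_factory(cf.remotesite)       # connect to the FTP site
    connection.login(cf.remoteuser, password)     # log in as user/password
    connection.cwd(cf.remotedir)                  # cd to the directory to transfer
    return connection
```

Because nothing here runs at import time, an upload script could import and call the same two functions, for instance with `connectFtp(cf, password, ftplib.FTP)`, instead of repeating their logic.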
The refactored upload program in Example 14-13, for instance, is now noticeably simpler, and the code it shares with the download script only needs to be changed in one place if it ever requires improvement.
Example 14-13. PP3E\Internet\Ftp\mirror\uploadflat_modular.py
Not only is the upload script simpler now because it reuses common code, but it will also inherit any changes made in the download module. For instance, the isTextKind function was later augmented with code that adds the .pyw extension to mimetypes tables (this file type is not recognized by default); because it is a shared function, the change is automatically picked up in the upload program, too.
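A change of the sort just described might look like the following sketch. The call mimetypes.add_type is part of the standard library; the MIME type string used for .pyw files is an assumption made for illustration, as is the wrapper function's name.

```python
import mimetypes

def register_pyw(mimetype='text/x-python-win'):
    """Teach the mimetypes tables about .pyw files.

    Assumption: the type string is illustrative; any 'text/...' value
    would make name-based tests classify these files as text.
    """
    mimetypes.add_type(mimetype, '.pyw')

register_pyw()
print(mimetypes.guess_type('calculator.pyw'))
```

Placed in a shared function, a one-line registration like this takes effect for every script that imports that function.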
14.2.6.2. Refactoring with classes
The function-based approach of the last two examples addresses the redundancy issue, but it is perhaps clumsier than it needs to be. For instance, the cf configuration options object in those examples provides a namespace that replaces global variables and breaks cross-file dependencies. Once we start making objects to model namespaces, though, Python's OOP support tends to be a more natural structure for our code. As one last twist, Example 14-14 refactors the FTP code one more time in order to leverage Python's class feature.
Example 14-14. PP3E\Internet\Ftp\mirror\ftptools.py
In fact, this last mutation combines uploads and downloads into a single file, because they are so closely related. As before, common code is factored into methods to avoid redundancy. New here, the instance object itself becomes a natural namespace for storing configuration options (they become self attributes). Study this example's code for more details of the restructuring applied.
Although this file can still be run as a command-line script (pass in a command-line argument to specify "download" or "upload"), its class is really now a package of FTP tools that can be mixed into other programs and reused. Because its code is wrapped in a class, it can be easily customized by redefining its methods; its configuration calls, such as getlocaldir, may be redefined in subclasses for custom scenarios.
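The customization idea can be sketched with a stripped-down stand-in class. Only the method name getlocaldir comes from the text above; the class names, the getcleanall method, and the attribute names are hypothetical, and no FTP traffic is involved.

```python
class FtpToolsSketch:
    """Hypothetical stand-in for a tools class with overridable settings."""

    def getlocaldir(self):
        return '.'                        # default: directory where script runs

    def getcleanall(self):
        return False                      # default: do not delete first

    def configTransfer(self):
        # Settings become attributes of the instance (self) rather than
        # global variables; each is fetched through an overridable method
        self.localdir = self.getlocaldir()
        self.cleanall = self.getcleanall()

class WebsiteTools(FtpToolsSketch):
    """Customize one setting by redefining one method."""

    def getlocaldir(self):
        return 'website'

tools = WebsiteTools()
tools.configTransfer()
print(tools.localdir, tools.cleanall)
```

The subclass changes a single configuration call and inherits everything else, which is the pattern the next section relies on.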
Perhaps most important, using classes optimizes code reusability. Clients of this file can both upload and download directories by simply subclassing or embedding an instance of this class and calling its methods. To see one example of how, let's move on to the next section.
14.2.7. Uploads and Deletes with Subdirectories
Perhaps the biggest limitation of the web site download and upload scripts we just met is that they assume the site directory is flat (hence their names). That is, both transfer simple files only, and neither handles nested subdirectories within the web directory to be transferred.
For my purposes, that's often a reasonable constraint. I avoid nested subdirectories to keep things simple, and I store my home web site as a simple directory of files. For other sites (including one I keep at another machine), site transfer scripts are easier to use if they also automatically transfer subdirectories along the way.
14.2.7.1. Uploading local trees
It turns out that supporting directories on uploads is fairly simple: we need to add only a bit of recursion and remote directory creation calls. The upload script in Example 14-15 extends the version we just saw, to handle uploading all subdirectories nested within the transferred directory. Furthermore, it recursively transfers subdirectories within subdirectories: the entire directory tree contained within the top-level transfer directory is uploaded to the target directory at the remote server.
In terms of its code structure, Example 14-15 is just a customization of the FtpTools class of the prior section; really, we're just adding a method for recursive uploads, by subclassing. As one consequence, we get tools such as parameter configuration, content type testing, and connection and upload code for free here; with OOP, some of the work is done before we start.
Example 14-15. PP3E\Internet\Ftp\mirror\uploadall.py
Like the flat upload script, this one can be run on any machine with Python and sockets and upload to any machine running an FTP server; I run it both on my laptop PC and on other servers by Telnet to upload sites to my ISP.
The crux of the matter in this script is the os.path.isdir test near the top; if this test detects a directory in the current local directory, we create an identically named directory on the remote machine with connection.mkd and descend into it with connection.cwd, and recur into the subdirectory on the local machine (we have to use recursive calls here, because the shape and depth of the tree are arbitrary). Like all FTP object methods, the mkd and cwd methods issue FTP commands to the remote server. When we exit a local subdirectory, we run a remote cwd('..') to climb to the remote parent directory and continue. The rest of the script is roughly the same as the original.
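The recursion just described can be sketched as a standalone function. This is a simplified illustration, not the example's code: the function name, the is_text default, and the returned counts are assumptions. The connection argument is expected to behave like an ftplib.FTP object, whose mkd, cwd, storlines, and storbinary methods are used here.

```python
import os
import ftplib

def upload_tree(connection, localdir,
                is_text=lambda name: name.endswith(('.html', '.htm', '.txt', '.py'))):
    """Upload localdir's files and subdirectories into the current remote
    directory; return (files uploaded, directories uploaded)."""
    fcount = dcount = 0
    for name in sorted(os.listdir(localdir)):
        localpath = os.path.join(localdir, name)
        if os.path.isdir(localpath):
            try:
                connection.mkd(name)              # create remote directory
            except ftplib.error_perm:
                pass                              # assume it already exists
            connection.cwd(name)                  # descend on the remote side
            f, d = upload_tree(connection, localpath, is_text)
            connection.cwd('..')                  # climb back to remote parent
            fcount += f
            dcount += d + 1
        else:
            with open(localpath, 'rb') as fileobj:    # ftplib reads bytes
                if is_text(name):
                    connection.storlines('STOR ' + name, fileobj)
                else:
                    connection.storbinary('STOR ' + name, fileobj)
            fcount += 1
    return fcount, dcount
```

The remote cwd calls bracket each recursive call, so the remote current directory always mirrors the local directory being walked. Treating any error_perm from mkd as "already exists" is a simplification.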
In the interest of space, I'll leave studying this variant in more depth as a suggested exercise. For more context, see the experimental\uploadall-2.py version of this script in the examples distribution; it's similar, but coded so as not to assume that the top-level remote directory already exists.
Here is the sort of output displayed on the console when the upload-all script is run. It's similar to the flat upload (which you might expect, given that it is reusing much of the same code), but notice that it traverses and uploads two nested subdirectories along the way, .\tempdir and .\tempdir\nested:
C:\Mark\temp\website>c:\...\PP3E\Internet\Ftp\mirror\uploadall.py
Password for lutz on home.rmi.net:
connecting...
uploading .\109_0137.JPG to 109_0137.JPG image
uploading .\2004-longmont-classes.html to 2004-longmont-classes.html text
uploading .\2005-longmont-classes.html to 2005-longmont-classes.html text
uploading .\2006-longmont-classes.html to 2006-longmont-classes.html text
...
...lines deleted...
...
uploading .\t615c.jpg to t615c.jpg image
uploading .\talk.html to talk.html text
uploading .\temp.txt to temp.txt text
uploading .\tempdir to tempdir directory created
uploading .\tempdir\index.html to index.html text
uploading .\tempdir\nested to nested directory created
uploading .\tempdir\nested\about-pp.html to about-pp.html text
uploading .\tempdir\nested\calendar.html to calendar.html text
directory exited
uploading .\tempdir\zaurus0.jpg to zaurus0.jpg image
directory exited
uploading .\testicon.jpg to testicon.jpg image
uploading .\testicon_py.html to testicon_py.html text
...
...lines deleted...
...
uploading .\zoo-jan-03.jpg to zoo-jan-03.jpg image
uploading .\zopeoutline.htm to zopeoutline.htm text
Done: 261 files and 2 directories uploaded.
As is, the script of Example 14-15 handles only directory tree uploads; recursive uploads are generally more useful than recursive downloads if you maintain your web sites on your local PC and upload to a server periodically, as I do. To also download (mirror) a web site that has subdirectories, a script must parse the output of a remote listing command to detect remote directories. For the same reason, the recursive upload script was not coded to support the remote directory tree cleanup option of the original; such a feature would require parsing remote listings as well. The next section shows how.
14.2.7.2. Deleting remote trees
One last example of code reuse at work: when I initially tested the prior section's upload-all script, it contained a bug that caused it to fall into an infinite recursion loop, and keep copying the full site into new subdirectories, over and over, until the FTP server kicked me off (not an intended feature of the program!). In fact, the upload got 13 levels deep before being killed by the server; it effectively locked my site until the mess could be repaired.
To get rid of all the files accidentally uploaded, I quickly wrote the script in Example 14-16 in emergency (really, panic) mode; it deletes all files and nested subdirectories in an entire remote tree. Luckily, this was very easy to do given all the reuse that Example 14-16 inherits from the FtpTools superclass. Here, we just have to define the extension for recursive remote deletions. Even in tactical mode like this, OOP can be a decided advantage.
Example 14-16. PP3E\Internet\Ftp\mirror\cleanall.py
Besides again being recursive in order to handle arbitrarily shaped trees, the main trick employed here is to parse the output of a remote directory listing. The FTP nlst call used earlier gives us a simple list of filenames; here, we use dir to also get file detail lines like these:
ftp> dir
...
-rw-r--r-- 1 ftp ftp 10088 Mar 19 19:35 talkmore.html
-rw-r--r-- 1 ftp ftp 8711 Mar 19 19:35 temp.txt
drwxr-xr-x 2 ftp ftp 4096 Mar 19 20:13 tempdir
-rw-r--r-- 1 ftp ftp 6748 Mar 19 19:35 testicon.jpg
-rw-r--r-- 1 ftp ftp 355 Mar 19 19:35 testicon_py.html
This output format is potentially server-specific, so check this on your own server before relying on this script. For my ISP, if the first character of the first item on the line is the character "d", the filename at the end of the line names a remote directory (e.g., tempdir). To parse, the script simply splits on whitespace to extract parts of a line.
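The parsing rule and the recursive deletion it enables can be sketched as follows. This is a simplified illustration under stated assumptions, not the example's code: the function names are hypothetical, the listing format is assumed to match the Unix-style lines shown above, and filenames containing spaces are not handled. The connection argument is expected to behave like an ftplib.FTP object (dir, cwd, delete, and rmd are standard methods).

```python
def parse_dir_line(line):
    """Return (name, is_directory) for one Unix-style listing line."""
    parts = line.split()                     # split on whitespace
    return parts[-1], parts[0].startswith('d')

def clean_tree(connection):
    """Delete everything below the current remote directory.

    Returns (files deleted, directories deleted).
    """
    lines = []
    connection.dir(lines.append)             # collect listing lines first
    fcount = dcount = 0
    for line in lines:
        if not line.strip() or line.startswith('total'):
            continue                         # assumption: skip summary lines
        name, isdir = parse_dir_line(line)
        if name in ('.', '..'):
            continue                         # never recur into self or parent
        if isdir:
            connection.cwd(name)             # descend into remote directory
            f, d = clean_tree(connection)    # empty it recursively
            connection.cwd('..')             # climb back up
            connection.rmd(name)             # remove the now-empty directory
            fcount += f
            dcount += d + 1
        else:
            connection.delete(name)          # remove a simple file
            fcount += 1
    return fcount, dcount
```

Collecting the whole listing before deleting anything keeps the control connection free for the delete and rmd commands that follow.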
The output of our clean-all script in action follows; it shows up in the system console window where the script is run. This reflects a much larger tree than the one uploaded previously:
C:\Mark\temp\website>c:\...\PP3E\Internet\Ftp\mirror\cleanall.py
Password for lutz on home.rmi.net:
connecting...
...
...lines deleted...
...
file t250.jpg
file t615c.jpg
file talk.html
file talkmore.html
directory temp
file 109_0137.JPG
file 2004-longmont-classes.html
file 2005-longmont-classes.html
file 2006-longmont-classes.html
...
...lines deleted...
...
directory exited
file testicon.jpg
file testicon_py.html
...
...lines deleted...
...
file zoo-jan-03.jpg
file zopeoutline.htm
Done: 855 files and 13 directories cleaned.
It is possible to extend this remote tree-cleaner to also download a remote tree with subdirectories. We'll leave this final step as a suggested exercise, though, partly because its dependence on the format produced by server directory listings makes it difficult to make robust, and partly because this use case is less common for me: in practice, I am more likely to maintain a site on my PC and upload to the server than to download a tree.
If you do wish to experiment with a recursive download, though, be sure to consult the script Tools\Scripts\ftpmirror.py in Python's install or source tree for hints. That script attempts to download a remote directory tree by FTP, and allows for various directory listing formats, which we'll skip here in the interest of space. For our purposes, it's time to move on to the next protocol on our tour: Internet email.