
14.2. FTP: Transferring Files over the Net

As we saw in the preceding chapter, sockets see plenty of action on the Net. For instance, the getfile example allowed us to transfer entire files between machines. In practice, though, higher-level protocols are behind much of what happens on the Net. Protocols run on top of sockets, but they hide much of the complexity of the network scripting examples of the prior chapter.

FTP, the File Transfer Protocol, is one of the more commonly used Internet protocols. It defines a higher-level conversation model that is based on exchanging command strings and file contents over sockets. By using FTP, we can accomplish the same task as the prior chapter's getfile script, but the interface is simpler and standard: FTP lets us ask for files from any server machine that supports FTP, without requiring that it run our custom getfile script. FTP also supports more advanced operations such as uploading files to the server, getting remote directory listings, and more.

Really, FTP runs on top of two sockets: one for passing control commands between client and server (port 21), and another for transferring bytes. By using a two-socket model, FTP avoids the possibility of deadlocks (i.e., transfers on the data socket do not block dialogs on the control socket). Ultimately, though, Python's ftplib support module allows us to upload and download files at a remote server machine by FTP, without dealing in raw socket calls or FTP protocol details.

14.2.1. Fetching Files with ftplib

Because the Python FTP interface is so easy to use, let's jump right into a realistic example. The script in Example 14-1 automatically fetches and opens a remote file with Python. More specifically, this Python script does the following:

  1. Downloads an image file (by default) from a remote FTP site

  2. Opens the downloaded file with a utility we wrote earlier in Example 6-16, in Chapter 6

The download portion will run on any machine with Python and an Internet connection. The opening part works if your playfile.py supports your platform; see Chapter 6 for details, and change as needed.

Example 14-1. PP3E\Internet\Ftp\getone.py

    #!/usr/local/bin/python
    ###############################################################
    # A Python script to download and play a media file by FTP.
    # Uses ftplib, the ftp protocol handler which uses sockets.
    # Ftp runs on 2 sockets (one for data, one for control--on
    # ports 20 and 21) and imposes message text formats, but the
    # Python ftplib module hides most of this protocol's details.
    # Note: change to fetch file from a site you have access to.
    ###############################################################

    import os, sys
    from getpass import getpass

    nonpassive  = False                           # force active mode FTP for server?
    filename    = 'lawnlake2-jan-03.jpg'          # file to be downloaded
    dirname     = '.'                             # remote directory to fetch from
    sitename    = 'ftp.rmi.net'                   # FTP site to contact
    userinfo    = ('lutz', getpass('Pswd?'))      # use () for anonymous
    if len(sys.argv) > 1: filename = sys.argv[1]  # filename on command line?

    print 'Connecting...'
    from ftplib import FTP                      # socket-based FTP tools
    localfile  = open(filename, 'wb')           # local file to store download
    connection = FTP(sitename)                  # connect to FTP site
    connection.login(*userinfo)                 # default is anonymous login
    connection.cwd(dirname)                     # xfer 1k at a time to localfile
    if nonpassive:                              # force active FTP if server requires
        connection.set_pasv(False)

    print 'Downloading...'
    connection.retrbinary('RETR ' + filename, localfile.write, 1024)
    connection.quit()
    localfile.close()

    if raw_input('Open file?') in 'Yy':
        from PP3E.System.Media.playfile import playfile
        playfile(filename)

Most of the FTP protocol details are encapsulated by the Python ftplib module imported here. This script uses some of the simplest interfaces in ftplib (we'll see others later in this chapter), but they are representative of the module in general.

To open a connection to a remote (or local) FTP server, create an instance of the ftplib.FTP object, passing in the string name (domain or IP style) of the machine you wish to connect to:

 connection = FTP(sitename)                  # connect to ftp site 

Assuming this call doesn't throw an exception, the resulting FTP object exports methods that correspond to the usual FTP operations. In fact, Python scripts act much like typical FTP client programs: just replace commands you would normally type or select with method calls:

    connection.login(*userinfo)                 # default is anonymous login
    connection.cwd(dirname)                     # xfer 1k at a time to localfile

Once connected, we log in and change to the remote directory from which we want to fetch a file. The login method allows us to pass in a username and password as additional optional arguments to specify an account login; by default, it performs anonymous FTP. Notice the use of the nonpassive flag in this script:

    if nonpassive:                              # force active FTP if server requires
        connection.set_pasv(False)

If this flag is set to True, the script will transfer the file in active FTP mode rather than the default passive mode. We'll finesse the details of the difference here (it has to do with which end of the dialog chooses port numbers for the transfer), but if you have trouble doing transfers with any of the FTP scripts in this chapter, try using active mode as a first step. In Python 2.1 and later, passive FTP mode is on by default. Now, fetch the file:

 connection.retrbinary('RETR ' + filename, localfile.write, 1024) 

Once we're in the target directory, we simply call the retrbinary method to download the target server file in binary mode. The retrbinary call will take a while to complete, since it must download a big file. It gets three arguments:

  1. An FTP command string; here, the string RETR filename, which is the standard format for FTP retrievals.

  2. A function or method to which Python passes each chunk of the downloaded file's bytes; here, the write method of a newly created and opened local file.

  3. A size for those chunks of bytes; here, 1,024 bytes are downloaded at a time, but the default is reasonable if this argument is omitted.
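To make the three arguments concrete, here is a minimal sketch. It is written in Python 3 syntax, unlike this chapter's Python 2 listings, and the names fetch_with_progress and FakeConnection are hypothetical: the fake class stands in for a server so the sketch runs offline. A logged-in ftplib.FTP object could be passed in its place, since it accepts the same retrbinary(command, callback, blocksize) call.

```python
import io

def fetch_with_progress(connection, filename, output, blocksize=1024):
    """Download filename via connection.retrbinary, writing to output.

    Returns the total number of bytes received. The connection argument
    is assumed to be a logged-in ftplib.FTP object (or a stand-in).
    """
    received = 0

    def on_chunk(chunk):                 # called once per block of bytes
        nonlocal received
        received += len(chunk)
        output.write(chunk)

    connection.retrbinary('RETR ' + filename, on_chunk, blocksize)
    return received


class FakeConnection:
    """Hypothetical stand-in that mimics retrbinary for offline runs."""
    def __init__(self, data):
        self.data = data

    def retrbinary(self, command, callback, blocksize=8192):
        # hand the data to the callback one block at a time
        for start in range(0, len(self.data), blocksize):
            callback(self.data[start:start + blocksize])


if __name__ == '__main__':
    buffer = io.BytesIO()
    total = fetch_with_progress(FakeConnection(b'x' * 2500), 'demo.bin', buffer)
    print(total, 'bytes received')
```

Passing a function of our own instead of a file's write method is what makes extras such as byte counts or progress displays possible; the file write is just the simplest callback.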

Because this script opens a local file (assigned to the variable localfile) with the same name as the remote file being fetched, and passes its write method to the FTP retrieval method, the remote file's contents will automatically appear in a local, client-side file after the download is finished. Observe how this file is opened in wb binary output mode; if this script is run on Windows, we want to avoid automatically expanding any \n bytes into \r\n byte sequences (that happens automatically on Windows when writing files opened in w text mode).
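The effect of the file mode can be seen without Windows or a network. The following is a sketch in Python 3 syntax, not part of the book's examples; the function names and the sample bytes are made up for illustration. It relies on the newline='\r\n' argument of Python 3's open to reproduce Windows text-mode output on any platform.

```python
import os
import tempfile

def size_written_binary(path, payload):
    """Write payload in binary mode and report the resulting file size."""
    with open(path, 'wb') as output:
        output.write(payload)
    return os.path.getsize(path)

def size_written_windows_text(path, payload):
    """Write payload through a text-mode file that expands line ends."""
    # newline='\r\n' reproduces Windows text-mode output on any platform;
    # latin-1 maps each byte to one character, so only line ends can change
    with open(path, 'w', encoding='latin-1', newline='\r\n') as output:
        output.write(payload.decode('latin-1'))
    return os.path.getsize(path)

if __name__ == '__main__':
    payload = b'GIF89a\n\x00\x01\n\xff'       # made-up bytes that include \n
    workdir = tempfile.mkdtemp()
    print(len(payload),
          size_written_binary(os.path.join(workdir, 'binary.out'), payload),
          size_written_windows_text(os.path.join(workdir, 'text.out'), payload))
```

The binary file has the same size as the payload, while the text-mode file grows by one byte for each \n it contained, which is enough to corrupt an image or audio file.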

Finally, we call the FTP quit method to break the connection with the server and manually close the local file to force it to be complete before it is further processed (it's not impossible that parts of the file are still held in buffers before the close call):

    connection.quit()
    localfile.close()

And that's all there is to it: all the FTP, socket, and networking details are hidden behind the ftplib interface module. Here is this script in action on a Windows machine; after the download, the image file pops up in a Windows picture viewer on my laptop, as captured in Figure 14-1:

    C:\...\PP3E\Internet\Ftp>python getone.py
    Pswd?
    Connecting...
    Downloading...
    Open file?y

Figure 14-1. Image file downloaded by FTP and opened


Notice how the standard Python getpass.getpass is used to ask for an FTP password. Like the raw_input built-in function, this call prompts for and reads a line of text from the console user; unlike raw_input, getpass does not echo typed characters on the screen at all (in fact, on Windows it initially used the low-level direct keyboard interface we met in the stream redirection section of Chapter 3). This is handy for protecting things like passwords from potentially prying eyes. Be careful, though: in the current IDLE GUI, the password is echoed anyhow!

Configure this script's initial assignments for a site and file you wish to fetch, and run this on your machine to see the opened file.[*] The thing to notice is that this otherwise typical Python script fetches information from an arbitrarily remote FTP site and machine. Given an Internet link, any information published by an FTP server on the Net can be fetched by and incorporated into Python scripts using interfaces such as these.

[*] In the prior edition of this book, the examples in the first part of this chapter were coded to download files from Python's anonymous FTP site, ftp.python.org, so that readers could run them without having to have an FTP account of their own (the examples fetched the Python source distribution, and the sousa audio file). Unfortunately, just weeks before the final draft of this edition was wrapped up, that FTP site was shut down permanently, supposedly. If you want to play with the new examples here, you'll need to find a site to transfer to and from, or check whether ftp.python.org is available again. HTTP from www.python.org still works as before. See the directory defunct in the source tree for the original examples.

14.2.2. Using urllib to FTP Files

In fact, FTP is just one way to transfer information across the Net, and there are more general tools in the Python library to accomplish the prior script's download. Perhaps the most straightforward is the Python urllib module: given an Internet address string (a URL, or Uniform Resource Locator), this module opens a connection to the specified server and returns a file-like object ready to be read with normal file object method calls (e.g., read, readline).

We can use such a higher-level interface to download anything with an address on the Web: files published by FTP sites (using URLs that start with "ftp://"); web pages and output of scripts that live on remote servers (using "http://" URLs); local files (using "file://" URLs); Gopher server data; and more. For instance, the script in Example 14-2 does the same as the one in Example 14-1, but it uses the general urllib module to fetch the image file, instead of the protocol-specific ftplib.

Example 14-2. PP3E\Internet\Ftp\getone-urllib.py

    #!/usr/local/bin/python
    ###################################################################
    # A Python script to download a file by FTP by its URL string.
    # use higher-level urllib instead of ftplib to fetch file;
    # urllib supports FTP, HTTP, and gopher protocols, and local files;
    # urllib also allows downloads of html pages, images, text, etc.;
    # see also Python html/xml parsers for web pages fetched by urllib;
    ###################################################################

    import os, getpass
    import urllib                            # socket-based web tools

    filename = 'lawnlake2-jan-03.jpg'        # remote/local filename
    password = getpass.getpass('Pswd?')

    remoteaddr = 'ftp://lutz:%s@ftp.rmi.net/%s;type=i' % (password, filename)
    print 'Downloading', remoteaddr

    # this works too:
    # urllib.urlretrieve(remoteaddr, filename)

    remotefile = urllib.urlopen(remoteaddr)     # returns input file-like object
    localfile  = open(filename, 'wb')           # where to store data locally
    localfile.write(remotefile.read())
    localfile.close()
    remotefile.close()

Don't sweat the details of the URL string used here; it is fairly complex, and we'll explain its structure and that of URLs in general in Chapter 16. We'll also use urllib again in this and later chapters to fetch web pages, format generated URL strings, and get the output of remote scripts on the Web.

Technically speaking, urllib supports a variety of Internet protocols (HTTP, FTP, Gopher, and local files). Unlike ftplib, urllib is generally used for reading remote objects, not for writing or uploading them (though the HTTP and FTP protocols support file uploads). As with ftplib, retrievals must generally be run in threads if blocking is a concern. But the basic interface shown in this script is straightforward. The call:

 remotefile = urllib.urlopen(remoteaddr)     # returns input file-like object 

contacts the server named in the remoteaddr URL string and returns a file-like object connected to its download stream (here, an FTP-based socket). Calling this file's read method pulls down the file's contents, which are written to a local client-side file. An even simpler interface:

 urllib.urlretrieve(remoteaddr, filename) 

also does the work of opening a local file and writing the downloaded bytes into it, things we do manually in the script as coded. This comes in handy if we want to download a file, but it is less useful if we want to process its data immediately.
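Both interfaces can be tried without an FTP account by pointing them at a local file through a "file://" URL. The sketch below uses Python 3 syntax, where these calls live in urllib.request rather than directly in urllib as in this chapter's Python 2 listings. The copy_by_url helper and the file names are hypothetical, chosen only for the demonstration.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlopen, urlretrieve

def copy_by_url(url, localname):
    """Read everything from url and store it in localname (binary mode)."""
    remotefile = urlopen(url)                  # returns a file-like object
    try:
        with open(localname, 'wb') as localfile:
            localfile.write(remotefile.read())
    finally:
        remotefile.close()

if __name__ == '__main__':
    workdir = tempfile.mkdtemp()
    source = os.path.join(workdir, 'source.bin')
    with open(source, 'wb') as f:
        f.write(b'sample bytes\n')
    url = Path(source).as_uri()                # file:// URL for the local file

    copy_by_url(url, os.path.join(workdir, 'copy1.bin'))     # manual version
    urlretrieve(url, os.path.join(workdir, 'copy2.bin'))     # one-call version
    print(sorted(os.listdir(workdir)))
```

The manual version leaves room to inspect or transform the bytes between the read and the write; the one-call version is shorter when a plain copy is all we need.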

Either way, the end result is the same: the desired server file shows up on the client machine. The output is similar to the original version, but we don't try to automatically open this time (I've changed the password in the URL here to protect the innocent):

    C:\...\PP3E\Internet\Ftp>getone-urllib.py
    Pswd?
    Downloading ftp://lutz:password@ftp.rmi.net/lawnlake2-jan-03.jpg;type=i

For more urllib download examples, see the section on HTTP in this chapter, and the server-side examples in Chapter 16. As we'll see in Chapter 16, in bigger terms, tools like urllib.urlopen allow scripts to both download remote files and invoke programs that are located on a remote server machine, and so serve as useful tools for testing and using web sites in Python scripts. In Chapter 16, we'll also see that urllib includes tools for formatting (escaping) URL strings for safe transmission.
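As a small preview of that escaping, consider what happens when a password itself contains characters such as @, :, or /, which have special meaning in a URL. The sketch below is in Python 3 syntax, where the escaping function is urllib.parse.quote (Python 2 exposes it as urllib.quote). The ftp_url helper and the sample password are invented for illustration.

```python
from urllib.parse import quote

def ftp_url(user, password, site, filename):
    """Build an ftp:// URL, escaping characters that would confuse parsing."""
    return 'ftp://%s:%s@%s/%s;type=i' % (
        quote(user, safe=''),          # safe='' escapes '/' as well
        quote(password, safe=''),
        site,
        quote(filename))

# 'p@ss:w/rd' is a made-up password containing URL-significant characters
print(ftp_url('lutz', 'p@ss:w/rd', 'ftp.rmi.net', 'lawnlake2-jan-03.jpg'))
# prints ftp://lutz:p%40ss%3Aw%2Frd@ftp.rmi.net/lawnlake2-jan-03.jpg;type=i
```

Without the escaping, the extra @ and : in the password would make it ambiguous where the account information ends and the host name begins.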

14.2.3. FTP get and put Utilities

When I present the ftplib interfaces in Python classes, students often ask why programmers need to supply the RETR string in the retrieval method. It's a good question: the RETR string is the name of the download command in the FTP protocol, but ftplib is supposed to encapsulate that protocol. As we'll see in a moment, we have to supply an arguably odd STOR string for uploads as well. It's boilerplate code that you accept on faith once you see it, but that begs the question. You could propose a patch to ftplib, but that's not really a good answer for beginning Python students, and it may break existing code (the interface is as it is for a reason).

Perhaps a better answer is that Python makes it easy to extend the standard library modules with higher-level interfaces of our own: with just a few lines of reusable code, we can make the FTP interface look any way we want in Python. For instance, we could, once and for all, write utility modules that wrap the ftplib interfaces to hide the RETR string. If we place these utility modules in a directory on PYTHONPATH, they become just as accessible as ftplib itself, automatically reusable in any Python script we write in the future. Besides removing the RETR string requirement, a wrapper module could also make assumptions that simplify FTP operations into single function calls.

For instance, given a module that encapsulates and simplifies ftplib, our Python fetch-and-play script could be further reduced to the script shown in Example 14-3: essentially just two function calls plus a password prompt.

Example 14-3. PP3E\Internet\Ftp\getone-modular.py

    #!/usr/local/bin/python
    ################################################################
    # A Python script to download and play a media file by FTP.
    # Uses getfile.py, a utility module which encapsulates FTP step.
    ################################################################

    import getfile
    from getpass import getpass
    filename = 'lawnlake2-jan-03.jpg'

    # fetch with utility
    getfile.getfile(file=filename,
                    site='ftp.rmi.net',
                    dir ='.',
                    user=('lutz', getpass('Pswd?')),
                    refetch=True)

    # rest is the same
    if raw_input('Open file?') in 'Yy':
        from PP3E.System.Media.playfile import playfile
        playfile(filename)

Besides having a much smaller line count, the meat of this script has been split off into a file for reuse elsewhere. If you ever need to download a file again, simply import an existing function instead of copying code with cut-and-paste editing. Changes in download operations would need to be made in only one file, not everywhere we've copied boilerplate code; getfile.getfile could even be changed to use urllib rather than ftplib without affecting any of its clients. It's good engineering.

14.2.3.1. Download utility

So just how would we go about writing such an FTP interface wrapper (he asks, rhetorically)? Given the ftplib library module, wrapping downloads of a particular file in a particular directory is straightforward. Connected FTP objects support two download methods:


retrbinary

This method downloads the requested file in binary mode, sending its bytes in chunks to a supplied function, without line-feed mapping. Typically, the supplied function is a write method of an open local file object, such that the bytes are placed in the local file on the client.


retrlines

This method downloads the requested file in ASCII text mode, sending each line of text to a supplied function with all end-of-line characters stripped. Typically, the supplied function adds a \n newline (mapped appropriately for the client machine), and writes the line to a local file.

We will meet the retrlines method in a later example; the getfile utility module in Example 14-4 always transfers in binary mode with retrbinary. That is, files are downloaded exactly as they were on the server, byte for byte, with the server's line-feed conventions in text files. You may need to convert line feeds after downloads if they look odd in your text editor; see the converter tools in Chapter 7 for pointers.
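Until then, the text-mode counterpart can be sketched in a few lines. This is an illustration in Python 3 syntax rather than one of the book's examples; fetch_text and FakeTextConnection are hypothetical names, and the fake class exists only so the sketch can run without a server. A connected ftplib.FTP object accepts the same retrlines(command, callback) call.

```python
import os
import tempfile

def fetch_text(connection, filename, localname):
    """Fetch filename in text mode, writing one line at a time."""
    with open(localname, 'w') as local:
        # retrlines hands over each line without its end-of-line,
        # so the callback adds '\n' back before writing
        connection.retrlines('RETR ' + filename,
                             lambda line: local.write(line + '\n'))

class FakeTextConnection:
    """Hypothetical stand-in that mimics retrlines for offline runs."""
    def __init__(self, lines):
        self.lines = lines

    def retrlines(self, command, callback):
        for line in self.lines:
            callback(line)

if __name__ == '__main__':
    target = os.path.join(tempfile.mkdtemp(), 'notes.txt')
    fetch_text(FakeTextConnection(['spam', 'eggs']), 'notes.txt', target)
    with open(target) as result:
        print(result.read())
```

Because the local file is opened in text mode here, the '\n' we add is mapped to the client platform's own line-end convention when written.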

Example 14-4. PP3E\Internet\Ftp\getfile.py

    #!/usr/local/bin/python
    #########################################################################
    # Fetch an arbitrary file by FTP.  Anonymous FTP unless you pass a
    # user=(name, pswd) tuple. Self-test FTPs a test file and site.
    #########################################################################

    from ftplib  import FTP          # socket-based FTP tools
    from os.path import exists       # file existence test

    def getfile(file, site, dir, user=(), verbose=True, refetch=False):
        """
        fetch a file by ftp from a site/directory
        anonymous or real login, binary transfer
        """
        if exists(file) and not refetch:
            if verbose: print file, 'already fetched'
        else:
            if verbose: print 'Downloading', file
            local = open(file, 'wb')                # local file of same name
            try:
                remote = FTP(site)                  # connect to FTP site
                remote.login(*user)                 # anonymous=() or (name, pswd)
                remote.cwd(dir)
                remote.retrbinary('RETR ' + file, local.write, 1024)
                remote.quit()
            finally:
                local.close()                       # close file no matter what
            if verbose: print 'Download done.'      # caller handles exceptions

    if __name__ == '__main__':
        from getpass import getpass
        file = 'lawnlake2-jan-03.jpg'
        dir  = '.'
        site = 'ftp.rmi.net'
        user = ('lutz', getpass('Pswd?'))
        getfile(file, site, dir, user)

This module is mostly just a repackaging of the FTP code we used to fetch the image file earlier, to make it simpler and reusable. Because it is a callable function, the exported getfile.getfile here tries to be as robust and generally useful as possible, but even a function this small implies some design decisions. Here are a few usage notes:


FTP mode

The getfile function in this script runs in anonymous FTP mode by default, but a two-item tuple containing a username and password string may be passed to the user argument in order to log in to the remote server in nonanonymous mode. To use anonymous FTP, either don't pass the user argument or pass it an empty tuple, ( ). The FTP object login method allows two optional arguments to denote a username and password, and the function(*args) call syntax in Example 14-4 sends it whatever argument tuple you pass to user (it works like the older apply built-in).


Processing modes

If passed, the last two arguments (verbose, refetch) allow us to turn off status messages printed to the stdout stream (perhaps undesirable in a GUI context) and to force downloads to happen even if the file already exists locally (the download overwrites the existing local file).


Exception protocol

The caller is expected to handle exceptions; this function wraps downloads in a try/finally statement to guarantee that the local output file is closed, but it lets exceptions propagate. If used in a GUI or run from a thread, for instance, exceptions may require special handling unknown in this file.


Self-test

If run standalone, this file downloads an image file again from my web site as a self-test, but the function will normally be passed FTP filenames, site names, and directory names as well.


File mode

This script is careful to open the local output file in wb binary mode to suppress end-line mapping, in case it is run on Windows. As we learned in Chapter 4, it's not impossible that true binary datafiles may have bytes whose value is equal to a \n line-feed character; opening in w text mode instead would make these bytes automatically expand to a \r\n two-byte sequence when written locally on Windows. This is only an issue for portability to Windows (mode w works elsewhere). Again, see Chapter 7 for line-feed converter tools.


Directory model

This function currently uses the same filename to identify both the remote file and the local file where the download should be stored. As such, it should be run in the directory where you want the file to show up; use os.chdir to move to directories if needed. (We could instead assume filename is the local file's name, and strip the local directory with os.path.split to get the remote name, or accept two distinct filename arguments, local and remote.)
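The first alternative mentioned under the directory model, treating the name as a local path and deriving the remote name from it, might look like the following. This is only a sketch in Python 3 syntax, not the book's getfile: the fetch_to function and the RecordingConnection stand-in are hypothetical, and the function assumes it is handed an FTP object that is already connected, logged in, and positioned in the remote directory.

```python
import os
import tempfile

def fetch_to(connection, localpath, blocksize=1024):
    """Download into localpath, asking the server for its base name only."""
    remotename = os.path.split(localpath)[1]     # strip local directory part
    with open(localpath, 'wb') as local:         # closed even on exceptions
        connection.retrbinary('RETR ' + remotename, local.write, blocksize)
    return remotename

class RecordingConnection:
    """Hypothetical stand-in that records commands and returns fixed bytes."""
    def __init__(self, payload):
        self.payload = payload
        self.commands = []

    def retrbinary(self, command, callback, blocksize=8192):
        self.commands.append(command)
        callback(self.payload)

if __name__ == '__main__':
    connection = RecordingConnection(b'audio bytes')
    target = os.path.join(tempfile.mkdtemp(), 'sousa.au')
    print(fetch_to(connection, target), connection.commands)
```

With this arrangement no os.chdir call is needed, which also sidesteps the problem of changing the working directory for the whole process.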

Also notice that, despite its name, this module is very different from the getfile.py script we studied at the end of the sockets material in the preceding chapter. The socket-based getfile implemented client and server-side logic to download a server file to a client machine over raw sockets.

The new getfile here is a client-side tool only. Instead of raw sockets, it uses the simpler FTP protocol to request a file from a server; all socket-level details are hidden in the ftplib module's implementation of the FTP client protocol. Furthermore, the server here is a perpetually running program on the server machine, which listens for and responds to FTP requests on a socket, on the dedicated FTP port (number 21). The net functional effect is that this script requires an FTP server to be running on the machine where the desired file lives, but such a server is much more likely to be available.

14.2.3.2. Upload utility

While we're at it, let's write a script to upload a single file by FTP to a remote machine. The upload interfaces in the FTP module are symmetric with the download interfaces. Given a connected FTP object, its:

  • storbinary method can be used to upload bytes from an open local file object

  • storlines method can be used to upload text in ASCII mode from an open local file object

Unlike the download interfaces, both of these methods are passed a file object as a whole, not a file object method (or other function). We will meet the storlines method in a later example. The utility module in Example 14-5 uses storbinary such that the file whose name is passed in is always uploaded verbatim, in binary mode, without line-feed translations for the target machine's conventions. If this script uploads a text file, it will arrive exactly as stored on the machine it came from, client line-feed markers and all.

Example 14-5. PP3E\Internet\Ftp\putfile.py

    #!/usr/local/bin/python
    ##########################################################
    # Store an arbitrary file by FTP.  Uses anonymous
    # ftp unless you pass in a user=(name, pswd) tuple.
    ##########################################################

    import ftplib                    # socket-based FTP tools

    def putfile(file, site, dir, user=(), verbose=True):
        """
        store a file by ftp to a site/directory
        anonymous or real login, binary transfer
        """
        if verbose: print 'Uploading', file
        local  = open(file, 'rb')               # local file of same name
        remote = ftplib.FTP(site)               # connect to FTP site
        remote.login(*user)                     # anonymous or real login
        remote.cwd(dir)
        remote.storbinary('STOR ' + file, local, 1024)
        remote.quit()
        local.close()
        if verbose: print 'Upload done.'

    if __name__ == '__main__':
        site = 'ftp.rmi.net'
        dir  = '.'
        import sys, getpass
        pswd = getpass.getpass(site + ' pswd?')                # filename on cmdline
        putfile(sys.argv[1], site, dir, user=('lutz', pswd))   # nonanonymous login

Notice that for portability, the local file is opened in rb binary mode this time to suppress automatic line-feed character conversions, in case this is run on Windows: if this is binary information, we don't want any bytes that happen to have the value of the \r carriage-return character to mysteriously go away during the transfer.

This script uploads a file you name on the command line as a self-test, but you will normally pass in real remote filename, site name, and directory name strings. Also like the download utility, you may pass a (username, password) tuple to the user argument to trigger nonanonymous FTP mode (anonymous FTP is the default).
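Unlike getfile, putfile as coded does not protect the local file's close call against exceptions. One possible variation, sketched here in Python 3 syntax, uses a with statement so the file is closed no matter how the upload ends. The store_file function and the FakeUploadConnection class are hypothetical; the fake mimics the storbinary(command, file, blocksize) call so the sketch runs without a server, and a connected ftplib.FTP object could be passed instead.

```python
import os
import tempfile

def store_file(connection, filename, blocksize=1024):
    """Upload filename in binary mode; the local file is always closed."""
    with open(filename, 'rb') as local:          # file object, not a method
        connection.storbinary('STOR ' + filename, local, blocksize)

class FakeUploadConnection:
    """Hypothetical stand-in that keeps uploaded bytes in a dictionary."""
    def __init__(self):
        self.stored = {}

    def storbinary(self, command, fileobj, blocksize=8192):
        name = command.split(' ', 1)[1]          # text after 'STOR '
        blocks = []
        while True:
            block = fileobj.read(blocksize)      # read until end of file
            if not block:
                break
            blocks.append(block)
        self.stored[name] = b''.join(blocks)

if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'upload.bin')
    with open(path, 'wb') as f:
        f.write(b'line one\r\nline two\r\n')
    connection = FakeUploadConnection()
    store_file(connection, path)
    print(len(connection.stored[path]), 'bytes stored')
```

Note how the upload call receives the open file object itself and reads from it, the mirror image of the download calls, which receive a function to call with each block.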

14.2.3.3. Playing the Monty Python theme song

It's time for a bit of fun. Let's use these scripts to transfer a copy of the Monty Python theme song audio file I have at my web site. First, let's write a module that downloads and plays the sample file, as shown in Example 14-6.

Example 14-6. PP3E\Internet\Ftp\sousa.py

    #!/usr/local/bin/python
    #######################################################################
    # Usage: sousa.py.  Fetch and play the Monty Python theme song.
    # This may not work on your system as is: it requires a machine with
    # Internet access, and uses audio filters on Unix and your .au player
    # on Windows.  Configure playfile.py as needed for your platform.
    #######################################################################

    from PP3E.Internet.Ftp.getfile  import getfile
    from PP3E.System.Media.playfile import playfile
    from getpass import getpass

    file = 'sousa.au'                      # default file coordinates
    site = 'ftp.rmi.net'                   # Monty Python theme song
    dir  = '.'
    user = ('lutz', getpass('Pswd?'))

    getfile(file, site, dir, user)         # fetch audio file by FTP
    playfile(file)                         # send it to audio player

    # import os
    # os.system('getone.py sousa.au')      # equivalent command line

There's not much to this script, because it really just combines two tools we've already coded. We're reusing Example 14-4's getfile to download, and Chapter 6's playfile module (Example 6-16) to play the audio sample after it is downloaded (turn back to that example for more details on the player part of the task). Also notice the last two lines in this file: we can achieve the same effect by passing in the audio filename as a command-line argument to our original script, but it's less direct.

This script will run on any machine with Python, an Internet link, and a recognizable audio player; it works on my Windows laptop with a dial-up or broadband Internet connection, and it plays the music clip in Windows Media Player (if I could insert an audio file hyperlink here to show what it sounds like, I would):

    C:\...\PP3E\Internet\Ftp>sousa.py
    Pswd?
    Downloading sousa.au
    Download done.

    C:\...\PP3E\Internet\Ftp>sousa.py
    Pswd?
    sousa.au already fetched

The getfile and putfile modules can be used to move the sample file around, too. Both can either be imported by clients that wish to use their functions, or run as top-level programs to trigger self-tests and command-line usage. Let's run these scripts from a command line and the interactive prompt to see how they work. When run standalone, parameters are passed in the command line and the default file settings are used:

    C:\...\PP3E\Internet\Ftp>putfile.py sousa.py
    ftp.rmi.net pswd?
    Uploading sousa.py
    Upload done.

When imported, parameters are passed explicitly to functions:

    C:\...\PP3E\Internet\Ftp>python
    >>> from getfile import getfile
    >>> getfile(file='sousa.au', site='ftp.rmi.net', dir='.', user=('lutz', 'XXX'))
    sousa.au already fetched

    C:\...\PP3E\Internet\Ftp>del sousa.au

    C:\...\PP3E\Internet\Ftp>python
    >>> from getfile import getfile
    >>> getfile(file='sousa.au', site='ftp.rmi.net', dir='.', user=('lutz', 'XXX'))
    Downloading sousa.au
    Download done.

    >>> from PP3E.System.Media.playfile import playfile
    >>> playfile('sousa.au')

14.2.3.4. Adding user interfaces

If you read the preceding chapter, you'll recall that it concluded with a quick look at scripts that added a user interface to a socket-based getfile script, one that transferred files over a proprietary socket dialog, instead of over FTP. At the end of that presentation, I mentioned that FTP is a much more generally useful way to move files around because FTP servers are so widely available on the Net. For illustration purposes, Example 14-7 shows a simple mutation of the prior chapter's user interface, implemented as a new subclass of the last chapter's general form builder.

Example 14-7. PP3E\Internet\Ftp\getfilegui.py

    ###############################################################################
    # launch FTP getfile function with a reusable form GUI class;  uses os.chdir
    # to goto target local dir (getfile currently assumes that filename has no
    # local directory path prefix);  runs getfile.getfile in thread to allow more
    # than one to be running at once and avoid blocking GUI during downloads;
    # this differs from socket-based getfilegui, but reuses Form;  supports both
    # user and anonymous FTP as currently coded; caveats: the password field is
    # not displayed as stars here, errors are printed to the console instead of
    # shown in the GUI (threads can't touch the GUI on Windows), this isn't 100%
    # thread safe (there is a slight delay between os.chdir here and opening the
    # local output file in getfile) and we could display both a save-as popup for
    # picking the local dir, and a remote dir listing for picking the file to get;
    ###############################################################################

    from Tkinter import Tk, mainloop
    from tkMessageBox import showinfo
    import getfile, os, sys, thread                 # FTP getfile here, not socket
    from PP3E.Internet.Sockets.form import Form     # reuse form tool in socket dir

    class FtpForm(Form):
        def __init__(self):
            root = Tk()
            root.title(self.title)
            labels = ['Server Name', 'Remote Dir', 'File Name',
                      'Local Dir',   'User Name?', 'Password?']
            Form.__init__(self, labels, root)
            self.mutex = thread.allocate_lock()
            self.threads = 0

        def transfer(self, filename, servername, remotedir, userinfo):
            try:
                self.do_transfer(filename, servername, remotedir, userinfo)
                print '%s of "%s" successful'  % (self.mode, filename)
            except:
                print '%s of "%s" has failed:' % (self.mode, filename),
                print sys.exc_info()[0], sys.exc_info()[1]
            self.mutex.acquire()
            self.threads -= 1
            self.mutex.release()

        def onSubmit(self):
            Form.onSubmit(self)
            localdir   = self.content['Local Dir'].get()
            remotedir  = self.content['Remote Dir'].get()
            servername = self.content['Server Name'].get()
            filename   = self.content['File Name'].get()
            username   = self.content['User Name?'].get()
            password   = self.content['Password?'].get()
            userinfo   = ()
            if username and password:
                userinfo = (username, password)
            if localdir:
                os.chdir(localdir)
            self.mutex.acquire()
            self.threads += 1
            self.mutex.release()
            ftpargs = (filename, servername, remotedir, userinfo)
            thread.start_new_thread(self.transfer, ftpargs)
            showinfo(self.title, '%s of "%s" started' % (self.mode, filename))

        def onCancel(self):
            if self.threads == 0:
                Tk().quit()
            else:
                showinfo(self.title,
                         'Cannot exit: %d threads running' % self.threads)

    class FtpGetfileForm(FtpForm):
        title = 'FtpGetfileGui'
        mode  = 'Download'
        def do_transfer(self, filename, servername, remotedir, userinfo):
            getfile.getfile(filename, servername, remotedir, userinfo, 0, 1)

    if __name__ == '__main__':
        FtpGetfileForm()
        mainloop()

If you flip back to the end of the preceding chapter, you'll find that this version is similar in structure to its counterpart there; in fact, it has the same name (and is distinct only because it lives in a different directory). The class here, though, knows how to use the FTP-based getfile module from earlier in this chapter instead of the socket-based getfile module we met a chapter ago. When run, this version also implements more input fields, as in Figure 14-2.

Figure 14-2. FTP getfile input form


Notice that a full file path is entered for the local directory here. Otherwise, the script assumes the current working directory, which changes after each download and can vary depending on where the GUI is launched (e.g., the current directory differs when this script is run by the PyDemos program at the top of the examples tree). When we click this GUI's Submit button (or press the Enter key), the script simply passes the form's input field values as arguments to the getfile.getfile FTP utility function shown earlier in this section. It also posts a pop up to tell us the download has begun (Figure 14-3).

Figure 14-3. FTP getfile info pop up


As currently coded, further download status messages show up in the console window; here are the messages for a successful download, as well as one that failed when I mistyped my password (no, it's not really "xxxxxxxxx"):

    C:\...\PP3E\Internet\Ftp>getfilegui.py
    Server Name     =>      ftp.rmi.net
    User Name?      =>      lutz
    Local Dir       =>      c:\temp
    File Name       =>      calendar.html
    Password?       =>      xxxxxxxx
    Remote Dir      =>      .
    Download of "calendar.html" has failed: ftplib.error_perm 530 Login incorrect.
    Server Name     =>      ftp.rmi.net
    User Name?      =>      lutz
    Local Dir       =>      c:\temp
    File Name       =>      calendar.html
    Password?       =>      xxxxxxxxx
    Remote Dir      =>      .
    Download of "calendar.html" successful

Given a username and password, the downloader logs into the specified account. To do anonymous FTP instead, leave the username and password fields blank.
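The choice between the two login styles comes down to what is passed along as the user information. Here is a minimal sketch of that decision, written for Python 3 rather than the Python 2 used in this book's listings; the helper name make_userinfo is hypothetical and is not part of the getfile module. It relies on the fact that ftplib's login method falls back to the anonymous user when called with no arguments.

```python
def make_userinfo(username, password):
    """Return login arguments: empty for anonymous FTP, else (user, password)."""
    if username and password:
        return (username, password)
    return ()          # both fields must be filled in to log in by name

# With a live connection, the tuple would be unpacked into login():
#     connection = ftplib.FTP(servername)
#     connection.login(*make_userinfo(username, password))
```

Note that, as in the GUI's onSubmit method, a username without a password is treated the same as leaving both fields blank.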

Now, to illustrate the threading capabilities of this GUI, start a download of a large file, then start another download while this one is in progress. The GUI stays active while downloads are underway, so we simply change the input fields and press Submit again.

This second download starts and runs in parallel with the first, because each download is run in a thread, and more than one Internet connection can be active at once. In fact, the GUI itself stays active during downloads only because downloads are run in threads; if they were not, even screen redraws wouldn't happen until a download finished.

We discussed threads in Chapter 5, but this script illustrates some practical thread concerns:

  • This program takes care to not do anything GUI-related in a download thread. At least in the current release on Windows, only the thread that makes GUIs can process them (this is a Windows rule).

  • To avoid killing spawned download threads on some platforms, the GUI must also be careful not to exit while any downloads are in progress. It keeps track of the number of in-progress threads, and just displays a pop up if we try to kill the GUI by pressing the Cancel button while both of these downloads are in progress.
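The second concern, counting active threads under a lock and refusing to exit while any remain, can be sketched apart from the GUI. This is an illustration in Python 3 (using the threading module rather than the Python 2 thread module of the listing), and the TransferTracker class and its method names are assumptions made for this sketch, not code from the example.

```python
import threading

class TransferTracker:
    """Count in-progress worker threads under a lock."""
    def __init__(self):
        self.mutex = threading.Lock()
        self.active = 0

    def start(self, work, *args):
        with self.mutex:                      # count before the thread starts
            self.active += 1
        thread = threading.Thread(target=self._run, args=(work,) + args)
        thread.start()
        return thread

    def _run(self, work, *args):
        try:
            work(*args)
        except Exception as exc:
            print('transfer failed:', exc)    # report on console, not in the GUI
        finally:
            with self.mutex:                  # always decrement, even on errors
                self.active -= 1

    def can_exit(self):
        with self.mutex:
            return self.active == 0

tracker = TransferTracker()
release = threading.Event()
worker = tracker.start(release.wait)          # stand-in for a slow download
busy_during = not tracker.can_exit()          # still running: exit is refused
release.set()                                 # let the "download" finish
worker.join()
idle_after = tracker.can_exit()               # count is back to zero
```

A Cancel handler would call can_exit and show a pop up instead of quitting when it returns false.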

We learned about ways to work around the no-GUI rule for threads in Chapter 11, and we will apply such techniques when we explore the PyMailGUI example in the next chapter. To be portable, though, we can't really close the GUI until the active-thread count falls to zero. Here is the sort of output that appears in the console window when two downloads overlap in time (these particular threads overlapped a long time ago):

    C:\...\PP3E\Internet\Ftp>python getfilegui.py
    User Name?      =>
    Server Name     =>      ftp.python.org
    Local Dir       =>      c:\temp
    Password?       =>
    File Name       =>      python1.5.tar.gz
    Remote Dir      =>      pub/python/src
    User Name?      =>      lutz
    Server Name     =>      starship.python.net
    Local Dir       =>      c:\temp
    Password?       =>      xxxxxx
    File Name       =>      about-pp.html
    Remote Dir      =>      public_html/home
    Download of "about-pp.html" successful
    Download of "python1.5.tar.gz" successful

This example isn't much more useful than a command line-based tool, of course, but it can be easily modified by changing its Python code, and it provides enough of a GUI to qualify as a simple, first-cut FTP user interface. Moreover, because this GUI runs downloads in Python threads, more than one can be run at the same time from this GUI without having to start or restart a different FTP client tool.

While we're in a GUI mood, let's add a simple interface to the putfile utility too. The script in Example 14-8 creates a dialog that starts uploads in threads. It's almost the same as the getfile GUI we just wrote, so there's nothing new to say. In fact, because get and put operations are so similar from an interface perspective, most of the get form's logic was deliberately factored out into a single generic class (FtpForm), so changes need be made in only a single place. That is, the put GUI here is mostly just a reuse of the get GUI, with distinct output labels and transfer methods. It's in a file by itself to make it easy to launch as a standalone program.

Example 14-8. PP3E\Internet\Ftp\putfilegui.py

    ###############################################################
    # launch FTP putfile function with a reusable form GUI class;
    # see getfilegui for notes: most of the same caveats apply;
    # the get and put forms have been factored into a single
    # class such that changes need be made in only one place;
    ###############################################################

    from Tkinter import mainloop
    import putfile, getfilegui

    class FtpPutfileForm(getfilegui.FtpForm):
        title = 'FtpPutfileGui'
        mode  = 'Upload'
        def do_transfer(self, filename, servername, remotedir, userinfo):
            putfile.putfile(filename, servername, remotedir, userinfo, 0)

    if __name__ == '__main__':
        FtpPutfileForm()
        mainloop()

Running this script looks much like running the download GUI, because it's almost entirely the same code at work. Let's upload some files from the client machine to the server; Figure 14-4 shows the state of the GUI while starting one.

Figure 14-4. FTP putfile input form


And here is the console window output we get when uploading two files in parallel; here again, uploads run in threads, so if we start a new upload before one in progress is finished, they overlap in time:

    User Name?      =>      lutz
    Server Name     =>      starship.python.net
    Local Dir       =>      c:\stuff\website\public_html
    Password?       =>      xxxxxx
    File Name       =>      about-PP3E.html
    Remote Dir      =>      public_html
    User Name?      =>      lutz
    Server Name     =>      starship.python.net
    Local Dir       =>      c:\stuff\website\public_html
    Password?       =>      xxxxxx
    File Name       =>      about-ppr3e.html
    Remote Dir      =>      public_html
    Upload of "about-PP3E.html" successful
    Upload of "about-ppr2e.html" successful

Finally, we can bundle up both GUIs in a single launcher script that knows how to start the get and put interfaces, regardless of which directory we are in when the script is started, and independent of the platform on which it runs. Example 14-9 shows this process.

Example 14-9. PP3E\Internet\Ftp\PyFtpGui.pyw

    ################################################################
    # spawn FTP get and put GUIs no matter what dir I'm run from;
    # os.getcwd is not necessarily the place this script lives;
    # could also hardcode a path from $PP3EHOME, or guessLocation;
    # could also do this but need the DOS pop up for status messages:
    # from PP3E.launchmodes import PortableLauncher
    # PortableLauncher('getfilegui', '%s/getfilegui.py' % mydir)()
    ################################################################

    import os, sys
    from PP3E.Launcher import findFirst
    mydir = os.path.split(findFirst(os.curdir, 'PyFtpGui.pyw'))[0]

    if sys.platform[:3] == 'win':
        os.system('start %s/getfilegui.py' % mydir)
        os.system('start %s/putfilegui.py' % mydir)
    else:
        os.system('python %s/getfilegui.py &' % mydir)
        os.system('python %s/putfilegui.py &' % mydir)
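As a point of comparison only, the platform test in the launcher above could be avoided with the standard subprocess module, which starts a child process the same way everywhere. The following is a sketch under assumptions: it is written for Python 3, the function names gui_command and launch_guis are hypothetical, and it does not reproduce the console window that the Windows start command opens for status messages.

```python
import os
import subprocess
import sys

def gui_command(scriptdir, scriptname):
    """Build a command line that runs a script with the current interpreter."""
    return [sys.executable, os.path.join(scriptdir, scriptname)]

def launch_guis(scriptdir, names=('getfilegui.py', 'putfilegui.py')):
    """Start each GUI as an independent process and return the Popen objects."""
    return [subprocess.Popen(gui_command(scriptdir, name)) for name in names]

# Typical use from a launcher file that lives beside the two GUI scripts:
#     mydir = os.path.dirname(os.path.abspath(__file__))
#     launch_guis(mydir)
```

Using the launcher file's own location in place of findFirst is also an assumption here; it works only when the launcher sits in the same directory as the scripts it starts.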

When this script is started, both the get and put GUIs appear as distinct, independently run programs; alternatively, we might attach both forms to a single interface. We could get much fancier than these two interfaces, of course. For instance, we could pop up local file selection dialogs, and we could display widgets that give the status of downloads and uploads in progress. We could even list files available at the remote site in a selectable listbox by requesting remote directory listings over the FTP connection. To learn how to add features like that, though, we need to move on to the next section.

14.2.4. Downloading Web Sites (Mirrors)

Once upon a time, I used Telnet to manage my web site at my Internet Service Provider (ISP).[*] Today, like most people with personal web sites, I maintain mine on my laptop and transfer its files to and from my ISP as needed. Often, this is a simple matter of one or two files, and it can be accomplished with a command-line FTP client. Sometimes, though, I need an easy way to transfer the entire site. Maybe I need to download to detect files that have become out of sync. Occasionally, the changes are so involved that it's easier to upload the entire site in a single step.

[*] The second edition of this book included a tale of woe here about how my ISP forced its users to wean themselves off Telnet access. This seems like a small issue today. Common practice on the Internet has come far in a short time. Come to think of it, so has Python's presence on the Net. When I first found Python in 1992, it was a set of encoded email messages, which users decoded and concatenated and hoped the result worked. Were we ever that young?

Although there are a variety of ways to approach this task, Python can help here, too: scripts that automate the upload and download tasks associated with maintaining my web site on my laptop provide a portable and mobile solution. Because Python FTP scripts will work on any machine with sockets, they can be run on my laptop and on nearly any other computer where Python is installed. Furthermore, the same scripts used to transfer page files to and from my PC can be used to copy ("mirror") my site to another web server as a backup copy, should my ISP experience an outage.

The following two scripts address these needs. The first, downloadflat.py, automatically downloads (i.e., copies) by FTP all the files in a directory at a remote site to a directory on the local machine. I keep the main copy of my web site files on my PC these days, but I use this script in two ways:

  • To download my web site to client machines where I want to make edits, I fetch the contents of the web directory in my account on my ISP's machine.

  • To mirror my site to my account on another server, I run this script periodically on the target machine if it supports Telnet or secure shell; if it does not, I simply download to one machine and upload from there to the target server.

More generally, this script (shown in Example 14-10) will download a directory full of files to any machine with Python and sockets, from any machine running an FTP server.

Example 14-10. PP3E\Internet\Ftp\mirror\downloadflat.py

    #!/bin/env python
    ###############################################################################
    # use FTP to copy (download) all files from a single directory at a remote
    # site to a directory on the local machine; run me periodically to mirror
    # a flat FTP site directory to your ISP account;  set user to 'anonymous'
    # to do anonymous FTP;  we could use try to skip file failures, but the FTP
    # connection is likely closed if any files fail;  we could also try to
    # reconnect with a new FTP instance before each transfer: connects once now;
    # if failures, try setting nonpassive for active FTP, or disable firewalls;
    # this also depends on a working FTP server, and possibly its load policies.
    ###############################################################################

    import os, sys, ftplib
    from getpass   import getpass
    from mimetypes import guess_type

    nonpassive = False                        # passive FTP on by default in 2.1+
    remotesite = 'home.rmi.net'               # download from this site
    remotedir  = '.'                          # and this dir (e.g., public_html)
    remoteuser = 'lutz'
    remotepass = getpass('Password for %s on %s: ' % (remoteuser, remotesite))
    localdir   = (len(sys.argv) > 1 and sys.argv[1]) or '.'
    cleanall   = raw_input('Clean local directory first? ')[:1] in ['y', 'Y']

    print 'connecting...'
    connection = ftplib.FTP(remotesite)                 # connect to FTP site
    connection.login(remoteuser, remotepass)            # login as user/password
    connection.cwd(remotedir)                           # cd to directory to copy
    if nonpassive:                                      # force active mode FTP
        connection.set_pasv(False)                      # most servers do passive

    if cleanall:
        for localname in os.listdir(localdir):          # try to delete all locals
            try:                                        # first to remove old files
                print 'deleting local', localname
                os.remove(os.path.join(localdir, localname))
            except:
                print 'cannot delete local', localname

    count = 0                                           # download all remote files
    remotefiles = connection.nlst()                     # nlst() gives files list
                                                        # dir()  gives full details
    for remotename in remotefiles:
        mimetype, encoding = guess_type(remotename)     # e.g., ('text/plain', 'gzip')
        mimetype  = mimetype or '?/?'                   # may be (None, None)
        maintype  = mimetype.split('/')[0]              # .jpg ('image/jpeg', None)
        localpath = os.path.join(localdir, remotename)
        print 'downloading', remotename, 'to', localpath,
        print 'as', maintype, encoding or ''
        if maintype == 'text' and encoding == None:
            # use ascii mode xfer
            localfile = open(localpath, 'w')
            callback  = lambda line: localfile.write(line + '\n')
            connection.retrlines('RETR ' + remotename, callback)
        else:
            # use binary mode xfer
            localfile = open(localpath, 'wb')
            connection.retrbinary('RETR ' + remotename, localfile.write)
        localfile.close()
        count += 1

    connection.quit()
    print 'Done:', count, 'files downloaded.'

There's not a whole lot that is new to speak of in this script, compared to other FTP examples we've seen thus far. We open a connection with the remote FTP server, log in with a username and password for the desired account (as configured, this script logs in to a named account rather than using anonymous FTP), and go to the desired remote directory. New here, though, are loops to iterate over all the files in local and remote directories, text-based retrievals, and file deletions:


Deleting all local files

This script has a cleanall option, enabled by an interactive prompt. If selected, the script first deletes all the files in the local directory before downloading, to make sure there are no extra files that aren't also on the server (there may be junk here from a prior download). To delete local files, the script calls os.listdir to get a list of filenames in the directory, and os.remove to delete each; see Chapter 4 (or the Python library manual) for more details if you've forgotten what these calls do.

Notice the use of os.path.join to concatenate a directory path and filename according to the host platform's conventions; os.listdir returns filenames without their directory paths, and this script is not necessarily run in the local directory where downloads will be placed. The local directory defaults to the current directory ("."), but can be set differently with a command-line argument to the script.
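The cleanup step can be tried safely on a throwaway directory. The sketch below is written for Python 3 and packages the same os.listdir, os.path.join, and os.remove calls in a function; the name clean_directory and the list of failures it returns are assumptions for illustration, since the script itself simply prints a message for each file.

```python
import os
import tempfile

def clean_directory(dirpath):
    """Try to delete every entry in dirpath; return names that could not be removed."""
    failed = []
    for name in os.listdir(dirpath):            # names only, no directory prefix
        path = os.path.join(dirpath, name)      # so join with the directory
        try:
            os.remove(path)
        except OSError:                         # e.g., a subdirectory or locked file
            failed.append(name)
    return failed

# Demonstrate on a temporary directory holding two files and one subdirectory
with tempfile.TemporaryDirectory() as workdir:
    for name in ('index.html', 'photo.jpg'):
        open(os.path.join(workdir, name), 'w').close()
    os.mkdir(os.path.join(workdir, 'subdir'))
    leftover = clean_directory(workdir)         # files go, the subdirectory stays
    remaining = os.listdir(workdir)
```

Because os.remove deletes files only, a nested directory survives the cleanup, which matches the flat-directory assumption of this script.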


Fetching all remote files

To grab all the files in a remote directory, we first need a list of their names. The FTP object's nlst method is the remote equivalent of os.listdir: nlst returns a list of the string names of all files in the current remote directory. Once we have this list, we simply step through it in a loop, running FTP retrieval commands for each filename in turn (more on this in a minute).

The nlst method is, more or less, like requesting a directory listing with an ls command in typical interactive FTP programs, but Python automatically splits up the listing's text into a list of filenames. We can pass it a remote directory to be listed; by default it lists the current server directory. A related FTP method, dir, produces the lines of an FTP LIST command; by default it prints them to standard output, but it passes each line to a callback function instead if we provide one as its last argument. Its result is like typing a dir command in an FTP session, and its lines contain complete file information, unlike nlst. If you need to know more about all the remote files, collect and parse the lines produced by a dir method call (we'll see how in a later example).
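To make the difference from nlst concrete, here is a small Python 3 sketch that gathers the detailed listing lines by handing the dir method a callback as its last argument. The function names are hypothetical, and the directory test assumes a Unix-style listing format, which FTP servers are not required to produce.

```python
def remote_listing(connection, remotedir='.'):
    """Collect the raw LIST lines for remotedir instead of printing them."""
    lines = []
    connection.dir(remotedir, lines.append)   # last argument is the line callback
    return lines

def looks_like_directory(listline):
    """Assumption: Unix-style listings start directory lines with 'd'."""
    return listline.startswith('d')

# With a live, logged-in ftplib.FTP object named connection:
#     for line in remote_listing(connection):
#         print('dir ' if looks_like_directory(line) else 'file', line)
```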


Retrieving: text versus binary

To keep line-ends in sync with the machines that my web files live on, this script distinguishes between binary and text files. It uses the Python mimetypes module to choose between text and binary transfer modes for each file.

We met mimetypes in Chapter 6 near Example 6-16, where we used it to play media files (see the examples and description there for an introduction). Here, mimetypes is used to decide whether a file is text or binary by guessing from its filename extension. For instance, HTML web pages and simple text files are transferred as text with automatic line-end mappings, and images and tar archives are transferred in raw binary mode.
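The decision itself is small enough to isolate. The following Python 3 sketch applies the same test the script uses; the function name transfer_mode is an assumption made for this illustration.

```python
from mimetypes import guess_type

def transfer_mode(filename):
    """Return 'text' or 'binary' based on the filename's guessed MIME type."""
    mimetype, encoding = guess_type(filename)    # e.g., ('text/html', None)
    maintype = (mimetype or '?/?').split('/')[0]
    if maintype == 'text' and encoding is None:
        return 'text'
    return 'binary'                              # compressed or unknown: play safe

for name in ('index.html', 'photo.jpg', 'notes.txt.gz', 'README'):
    print(name, '=>', transfer_mode(name))
```

A compressed text file such as notes.txt.gz has a text main type but a gzip encoding, so it is sent in binary mode; a name with no recognizable extension yields no type at all and is treated as binary as well.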

Specifically, binary files are pulled down with the retrbinary method we met earlier and a local open mode of wb to suppress line-feed byte mapping: this script may be run on Windows or Unix-like platforms, and we don't want a \n byte embedded in an image to get expanded to \r\n on Windows. We don't use a chunk-size third argument here, though; it defaults to a reasonable size if omitted.

For text files, the script instead uses the retrlines method, passing in a function to be called for each line in the text file downloaded. The text line handler function mostly just writes the line to a local file. But notice that the handler function created by the lambda here also adds a \n line-end character to the end of the line it is passed. Python's retrlines method strips the line-end characters from each line to sidestep platform differences. By adding a \n, the script is sure to add the proper line-end marker character sequence for the local platform on which this script runs (\n or \r\n). For this automapping of the \n in the script to work, of course, we must also open text output files in w text mode, not in wb: the mapping from \n to \r\n on Windows happens when data is written to the file.
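The line handler described above can be exercised without a server. In this Python 3 sketch, an in-memory text buffer stands in for the local output file and a plain list stands in for the lines that retrlines would deliver; the helper name make_line_writer is hypothetical.

```python
import io

def make_line_writer(textfile):
    """Return a callback that writes one line plus a newline to textfile."""
    def write_line(line):
        textfile.write(line + '\n')     # retrlines delivers lines without line ends
    return write_line

buffer = io.StringIO()                  # stands in for open(localpath, 'w')
handler = make_line_writer(buffer)
for line in ['<html>', '</html>']:      # stands in for lines sent by retrlines
    handler(line)

# With a live connection, the same handler would be passed as the callback:
#     connection.retrlines('RETR ' + remotename, handler)
```

A named inner function does the same job as the script's lambda; which to use is a matter of taste.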

All of this is simpler in action than in words. Here is the command I use to download my entire web site from my ISP server account to my Windows laptop PC, in a single step:

    C:\Mark\temp\website>c:\...\PP3E\Internet\Ftp\mirror\downloadflat.py
    Password for lutz on home.rmi.net:
    Clean local directory first? y
    connecting...
    deleting local 2004-longmont-classes.html
    deleting local 2005-longmont-classes.html
    deleting local about-hopl.html
    deleting local about-lp-toc.html
    deleting local about-lp.html
    deleting local about-lp2e.html
    ...
    ...lines deleted...
    ...
    deleting local dsc00475.jpg
    deleting local dsc00506.jpg
    downloading 2004-longmont-classes.html to .\2004-longmont-classes.html as text
    downloading 2005-longmont-classes.html to .\2005-longmont-classes.html as text
    downloading about-hopl.html to .\about-hopl.html as text
    downloading about-lp-toc.html to .\about-lp-toc.html as text
    downloading about-lp.html to .\about-lp.html as text
    ...
    ...lines deleted...
    ...
    downloading lp2e-updates.html to .\lp2e-updates.html as text
    downloading 109_0137.JPG to .\109_0137.JPG as image
    downloading sousa.au to .\sousa.au as audio
    downloading sousa.py to .\sousa.py as text
    downloading pp2e-cd-dir.txt.gz to .\pp2e-cd-dir.txt.gz as text gzip
    downloading wxPython.doc.tgz to .\wxPython.doc.tgz as application gzip
    downloading img_0694.jpg to .\img_0694.jpg as image
    downloading t250.jpg to .\t250.jpg as image
    downloading c3100.gif to .\c3100.gif as image
    downloading ipod.gif to .\ipod.gif as image
    downloading lp70.jpg to .\lp70.jpg as image
    downloading pic23.html to .\pic23.html as text
    downloading 2006-longmont-classes.html to .\2006-longmont-classes.html as text
    Done: 258 files downloaded.

This may take a few moments to complete, depending on your site's size and your connection speed (it's bound by network speed constraints, and it usually takes roughly five minutes on my current laptop and wireless broadband connection). It is much more accurate and easier than downloading files by hand, though. The script simply iterates over all the remote files returned by the nlst method, and downloads each with the FTP protocol (i.e., over sockets) in turn. It uses text transfer mode for names that imply text data, and binary mode for others.

With the script running this way, I make sure the initial assignments in it reflect the machines involved, and then run the script from the local directory where I want the site copy to be stored. Because the target download directory is usually not where the script lives, I need to give Python the full path to the script file. When run on a server in a Telnet session window, for instance, the execution and script directory paths are different, but the script works the same way.

If you elect to delete local files in the download directory, you may also see a batch of "deleting local..." messages scroll by on the screen before any "downloading..." lines appear: this automatically cleans out any garbage lingering from a prior download. And if you botch the input of the remote site password, a Python exception is raised; I sometimes need to run it again (and type more slowly):

    C:\Mark\temp\website>c:\...\PP3E\Internet\Ftp\mirror\downloadflat.py
    Password for lutz on home.rmi.net:
    Clean local directory first? y
    connecting...
    Traceback (most recent call last):
      File "c:\...\PP3E\Internet\Ftp\mirror\downloadflat.py", line 27, in ?
        connection.login(remoteuser, remotepass)         # login as user/pass...
      File "C:\Python24\lib\ftplib.py", line 362, in login
        if resp[0] == '3': resp = self.sendcmd('PASS ' + passwd)
      File "C:\Python24\lib\ftplib.py", line 241, in sendcmd
        return self.getresp()
      File "C:\Python24\lib\ftplib.py", line 214, in getresp
        raise error_perm, resp
    ftplib.error_perm: 530 Login incorrect.

It's worth noting that this script is at least partially configured by assignments near the top of the file. In addition, the password and deletion options are given by interactive inputs, and one command-line argument is allowed: the local directory name to store the downloaded files (it defaults to ".", the directory where the script is run). Command-line arguments could be employed to universally configure all the other download parameters and options, too, but because of Python's simplicity and lack of compile/link steps, changing settings in the text of Python scripts is usually just as easy as typing words on a command line.[*]

[*] To check for version skew after a batch of downloads and uploads, you can run the diffall script we wrote in Chapter 7, Example 7-30. For instance, I find files that have diverged over time due to updates on multiple platforms by comparing the download to a local copy of my web site using a command such as C:\...>c:\...\PP3E\System\Filetools\diffall.py . c:\mark\WEBSITE\public_html. See Chapter 7 for more details on this tool.
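For readers who would rather configure the download from the command line than edit assignments, here is one way it might look. This is a sketch under assumptions: it uses the argparse module of later Python releases (it is not what the script does), it is written for Python 3, and the option names are invented for illustration. The defaults mirror the script's assignments, and the password would still be requested with getpass rather than typed on the command line.

```python
import argparse

def parse_settings(argv):
    """Parse hypothetical command-line options for a flat-directory download."""
    parser = argparse.ArgumentParser(description='Mirror a flat FTP directory')
    parser.add_argument('localdir', nargs='?', default='.',
                        help='local directory for downloaded files')
    parser.add_argument('--site', default='home.rmi.net')
    parser.add_argument('--remotedir', default='.')
    parser.add_argument('--user', default='lutz')
    parser.add_argument('--active', action='store_true',
                        help='use active (nonpassive) FTP')
    parser.add_argument('--clean', action='store_true',
                        help='delete local files before downloading')
    return parser.parse_args(argv)

settings = parse_settings(['--site', 'ftp.example.org', '--clean', 'c:/temp/website'])
print(settings.site, settings.localdir, settings.clean)
```

In a script, the call would be parse_settings(sys.argv[1:]).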

14.2.5. Uploading Web Sites

Uploading a full directory is symmetric to downloading: it's mostly a matter of swapping the local and remote machines and operations in the program we just met. The script in Example 14-11 uses FTP to copy all files in a directory on the local machine on which it runs, up to a directory on a remote machine.

I really do use this script, too, most often to upload all of the files maintained on my laptop PC to my ISP account in one fell swoop. I also sometimes use it to copy my site from my PC to a mirror machine or from the mirror machine back to my ISP. Because this script runs on any computer with Python and sockets, it happily transfers a directory from any machine on the Net to any machine running an FTP server. Simply change the initial settings in this module as appropriate for the transfer you have in mind.

Example 14-11. PP3E\Internet\Ftp\mirror\uploadflat.py

    #!/bin/env python
    ##############################################################################
    # use FTP to upload all files from one local dir to a remote site/directory;
    # e.g., run me to copy a web/FTP site's files from your PC to your ISP;
    # assumes a flat directory upload: uploadall.py does nested directories.
    # see downloadflat.py comments for more notes: this script is symmetric.
    ##############################################################################

    import os, sys, ftplib
    from getpass import getpass
    from mimetypes import guess_type

    nonpassive = False                                  # passive FTP by default
    remotesite = 'home.rmi.net'                         # upload to this site
    remotedir  = '.'                                    # from machine running on
    remoteuser = 'lutz'
    remotepass = getpass('Password for %s on %s: ' % (remoteuser, remotesite))
    localdir   = (len(sys.argv) > 1 and sys.argv[1]) or '.'
    cleanall   = raw_input('Clean remote directory first? ')[:1] in ['y', 'Y']

    print 'connecting...'
    connection = ftplib.FTP(remotesite)                 # connect to FTP site
    connection.login(remoteuser, remotepass)            # log in as user/password
    connection.cwd(remotedir)                           # cd to directory to copy
    if nonpassive:                                      # force active mode FTP
        connection.set_pasv(False)                      # most servers do passive

    if cleanall:
        for remotename in connection.nlst():            # try to delete all remotes
            try:                                        # first to remove old files
                print 'deleting remote', remotename
                connection.delete(remotename)
            except:
                print 'cannot delete remote', remotename

    count = 0                                           # upload all local files
    localfiles = os.listdir(localdir)                   # listdir() strips dir path
                                                        # any failure ends script
    for localname in localfiles:
        mimetype, encoding = guess_type(localname)      # e.g., ('text/plain', 'gzip')
        mimetype  = mimetype or '?/?'                   # may be (None, None)
        maintype  = mimetype.split('/')[0]              # .jpg ('image/jpeg', None)
        localpath = os.path.join(localdir, localname)
        print 'uploading', localpath, 'to', localname,
        print 'as', maintype, encoding or ''
        if maintype == 'text' and encoding == None:
            # use ascii mode xfer
            localfile = open(localpath, 'r')
            connection.storlines('STOR ' + localname, localfile)
        else:
            # use binary mode xfer
            localfile = open(localpath, 'rb')
            connection.storbinary('STOR ' + localname, localfile)
        localfile.close()
        count += 1

    connection.quit()
    print 'Done:', count, 'files uploaded.'

Similar to the mirror download script, this program illustrates a handful of new FTP interfaces and a set of FTP scripting techniques:


Deleting all remote files

Just like the mirror script, the upload begins by asking whether we want to delete all the files in the remote target directory before copying any files there. This cleanall option is useful if we've deleted files in the local copy of the directory on the client: the deleted files would remain on the server-side copy unless we delete all files there first.

To implement the remote cleanup, this script simply gets a listing of all the files in the remote directory with the FTP nlst method, and deletes each in turn with the FTP delete method. Assuming we have delete permission, the directory will be emptied (file permissions depend on the account we logged into when connecting to the server). We've already moved to the target remote directory when deletions occur, so no directory paths need to be prepended to filenames here. Note that nlst may raise an exception for some servers if the remote directory is empty; we don't catch the exception here, but you can simply not select a cleaning if one fails for you.
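If the empty-directory exception is a problem on your server, the cleanup can be wrapped to tolerate it. The Python 3 sketch below is one possible approach, not the script's own code: the function name clean_remote is hypothetical, and treating ftplib.error_perm from nlst as "no files" is an assumption that could also hide a genuine permission error.

```python
import ftplib

def clean_remote(connection):
    """Delete every file nlst reports; return (deleted, failed) name lists."""
    try:
        names = connection.nlst()
    except ftplib.error_perm:       # assumption: server signals "no files" this way
        names = []
    deleted, failed = [], []
    for name in names:
        try:
            connection.delete(name)             # already in the target directory
            deleted.append(name)
        except ftplib.all_errors:               # e.g., no delete permission
            failed.append(name)
    return deleted, failed

# With a live, logged-in ftplib.FTP object named connection:
#     deleted, failed = clean_remote(connection)
```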


Storing all local files

To apply the upload operation to each file in the local directory, we get a list of local filenames with the standard os.listdir call, and take care to prepend the local source directory path to each filename with the os.path.join call. Recall that os.listdir returns filenames without directory paths, and the source directory may not be the same as the script's execution directory if passed on the command line.


Uploading: text versus binary

This script may also be run on both Windows and Unix-like clients, so we need to handle text files specially. Like the mirror download, this script picks text or binary transfer modes by using Python's mimetypes module to guess a file's type from its filename extension; HTML and text files are moved in FTP text mode, for instance. We already met the storbinary FTP object method used to upload files in binary mode: an exact, byte-for-byte copy appears at the remote site.

Text mode transfers work almost identically: the storlines method accepts an FTP command string and a local file (or file-like) object opened in text mode, and simply copies each line in the local file to a same-named file on the remote machine. As usual, if we run this script on Windows, opening the input file in r text mode means that DOS-style \r\n end-of-line sequences are mapped to the \n character as lines are read. When the script is run on Unix and Linux, lines end in a single \n already, so no such mapping occurs. The net effect is that data is read portably, with \n characters to represent end-of-line.

For binary files, we open in rb mode to suppress such automatic mapping everywhere (we don't want bytes in an audio file that happen to have the same value as \r to magically disappear when read on Windows!).[*]

[*] Technically, Python's storlines method automatically sends all lines to the server with \r\n line-end sequences, no matter what it receives from the local file's readline method (\n or \r\n). Because of that, the most important distinctions for uploads are to use rb mode for binary files and the storlines method for text. Consult the module ftplib.py in the Python source library directory for more details.
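The per-file upload logic can be sketched as a function for readers working in Python 3. One difference from the Python 2 listing is worth calling out: in Python 3, ftplib's storlines reads from a file opened in binary mode, so this sketch opens every file with rb and lets the choice of storlines or storbinary carry the text/binary distinction. The function name upload_one and its return value are assumptions for illustration.

```python
import os
from mimetypes import guess_type

def upload_one(connection, localdir, localname):
    """Send one local file to the current remote directory; return the mode used."""
    mimetype, encoding = guess_type(localname)
    maintype = (mimetype or '?/?').split('/')[0]
    localpath = os.path.join(localdir, localname)     # listdir names lack the dir
    with open(localpath, 'rb') as localfile:          # Python 3: bytes for both modes
        if maintype == 'text' and encoding is None:
            connection.storlines('STOR ' + localname, localfile)
            return 'text'
        connection.storbinary('STOR ' + localname, localfile)
        return 'binary'

# With a live, logged-in ftplib.FTP object named connection:
#     for localname in os.listdir(localdir):
#         print(localname, upload_one(connection, localdir, localname))
```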

As with the mirror download script, this program simply iterates over all files to be transferred (files in the local directory listing this time), and transfers each in turn, in either text or binary mode, depending on the files' names. Here is the command I use to upload my entire web site from my laptop Windows PC to the remote Unix server at my ISP, in a single step:

    C:\Mark\temp\website>c:\...\PP3E\Internet\Ftp\mirror\uploadflat.py
    Password for lutz on home.rmi.net:
    Clean remote directory first? n
    connecting...
    uploading .\109_0137.JPG to 109_0137.JPG as image
    uploading .\2004-longmont-classes.html to 2004-longmont-classes.html as text
    uploading .\2005-longmont-classes.html to 2005-longmont-classes.html as text
    uploading .\2006-longmont-classes.html to 2006-longmont-classes.html as text
    uploading .\about-hopl.html to about-hopl.html as text
    ...
    ...lines deleted...
    ...
    uploading .\visitor_poundbang.py to visitor_poundbang.py as text
    uploading .\wcall.py to wcall.py as text
    uploading .\wcall_find.py to wcall_find.py as text
    uploading .\wcall_find_patt.py to wcall_find_patt.py as text
    uploading .\wcall_visitor.py to wcall_visitor.py as text
    uploading .\whatsnew.html to whatsnew.html as text
    uploading .\whatsold.html to whatsold.html as text
    uploading .\wxPython.doc.tgz to wxPython.doc.tgz as application gzip
    uploading .\xlate-lp.html to xlate-lp.html as text
    uploading .\zaurus0.jpg to zaurus0.jpg as image
    uploading .\zaurus1.jpg to zaurus1.jpg as image
    uploading .\zaurus2.jpg to zaurus2.jpg as image
    uploading .\zoo-jan-03.jpg to zoo-jan-03.jpg as image
    uploading .\zopeoutline.htm to zopeoutline.htm as text
    Done: 258 files uploaded.

On my current laptop and wireless broadband connection, this process typically takes seven minutes, depending on server load. As with the download script, I usually run this command from the local directory where my web files are kept, and I pass Python the full path to the script. When I run this on a Linux server, it works in the same way, but the paths to the script and my web files directory differ. If you elect to clean the remote directory before uploading, you'll get a bunch of "deleting remote..." messages before the "uploading..." lines here, too:[*]

[*] Usage note: these scripts are highly dependent on the FTP server functioning properly. For a while, the upload script occasionally had timeout errors when running over my current broadband connection. These errors went away later, when my ISP fixed or reconfigured their server. If you have failures, try running against a different server; connecting and disconnecting around each transfer may or may not help (some servers limit their number of connections).

    ...
    deleting remote uk-3.jpg
    deleting remote whatsnew.html
    deleting remote whatsold.html
    deleting remote xlate-lp.html
    deleting remote uploadflat.py
    deleting remote ora-lp-france.gif
    deleting remote LJsuppcover.jpg
    deleting remote sonyz505js.gif
    deleting remote pic14.html
    ...
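If a server occasionally times out, as the usage note describes, one possible workaround is to retry a failed step a few times before giving up. The helper below is a minimal sketch and is not part of the book's scripts; the name with_retries and its parameters are hypothetical. It relies only on ftplib.all_errors, the tuple of exceptions that ftplib documents for its operations.

```python
import ftplib
import time

def with_retries(action, attempts=3, delay=2.0, errors=ftplib.all_errors):
    """Call action() until it succeeds or attempts run out; re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return action()               # success: hand back the result
        except errors as exc:
            if attempt == attempts:
                raise                     # out of tries: let the caller see it
            print('attempt %d failed (%s); retrying' % (attempt, exc))
            time.sleep(delay)             # give the server a moment to recover
```

A caller would wrap one transfer step in a function or lambda and pass it in. Note that after a real timeout the control connection may be unusable, so a retried action would probably need to reconnect before transferring again; this sketch does not attempt that.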

14.2.6. Refactoring Uploads and Downloads for Reuse

The directory upload and download scripts of the prior two sections work as advertised and, apart from the new mimetypes logic, were all we wrote in the prior edition of this book. If you look at these two scripts long enough, though, their similarities will eventually pop out at you. In fact, they are largely the same: they use identical code to configure transfer parameters, connect to the FTP server, and determine file type. The exact details have been lost to time, but some of this code was certainly copied from one file to the other.

Although such redundancy isn't a cause for alarm if we never plan on changing these scripts, it can be a killer in software projects in general. When you have two copies of identical bits of code, not only is there a danger of them becoming out of sync over time (you'll lose uniformity in user interface and behavior), but you also effectively double your effort when it comes time to change code that appears in both places. Unless you're a big fan of extra work, avoid redundancy wherever possible.

This redundancy is especially glaring when we look at the complex code that uses mimetypes to determine file types. Repeating magic like this in more than one place is almost always a bad idea: not only do we have to remember how it works every time we need the same utility, but it is also a recipe for errors.
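To make the mimetypes logic less magical, here is a minimal standalone sketch of the text-versus-binary decision these scripts make. The helper name is_text_kind and the sample filenames are illustrative only; guesses for unusual extensions can vary with the MIME tables installed on your platform.

```python
from mimetypes import guess_type

def is_text_kind(filename):
    """True if the name looks like uncompressed text, else treat it as binary."""
    mimetype, encoding = guess_type(filename, strict=False)   # allow extras
    maintype = (mimetype or '?/?').split('/')[0]              # 'text', 'image', ...
    return maintype == 'text' and encoding is None            # compressed: binary

if __name__ == '__main__':
    for name in ('f.html', 'f.jpeg', 'f.txt.gz', 'README'):
        guess = guess_type(name, strict=False)
        print('%-10s %-26r text=%s' % (name, guess, is_text_kind(name)))
```

A compressed text file is deliberately treated as binary: its main type is text, but the non-empty encoding means its bytes must not be altered by line-ending translation.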

14.2.6.1. Refactoring with functions

As originally coded, our download and upload scripts comprise top-level script code that relies on global variables. Such a structure is difficult to reuse: code runs immediately on imports, and it's difficult to generalize for varying contexts. Worse, it's difficult to maintain: when you program by cut-and-paste of existing code, you increase the cost of future changes every time you click the Paste button.

To demonstrate how we might do better, Example 14-12 shows one way to refactor (reorganize) the download script. By wrapping its parts in functions, they become reusable in other modules, including our upload program.

Example 14-12. PP3E\Internet\Ftp\mirror\downloadflat_modular.py

    #!/bin/env python
    ##############################################################################
    # use FTP to copy (download) all files from a remote site and directory
    # to a directory on the local machine; this version works the same, but has
    # been refactored to wrap up its code in functions that can be reused by the
    # uploader, and possibly other programs in the future - else code redundancy,
    # which may make the two diverge over time, and can double maintenance costs.
    ##############################################################################

    import os, sys, ftplib
    from getpass   import getpass
    from mimetypes import guess_type, add_type

    defaultSite = 'home.rmi.net'
    defaultRdir = '.'
    defaultUser = 'lutz'

    def configTransfer(site=defaultSite, rdir=defaultRdir, user=defaultUser):
        """
        get upload or download parameters
        uses a class due to the large number
        """
        class cf: pass
        cf.nonpassive = False                 # passive FTP on by default in 2.1+
        cf.remotesite = site                  # transfer to/from this site
        cf.remotedir  = rdir                  # and this dir ('.' means acct root)
        cf.remoteuser = user
        cf.localdir   = (len(sys.argv) > 1 and sys.argv[1]) or '.'
        cf.cleanall   = raw_input('Clean target directory first? ')[:1] in ['y','Y']
        cf.remotepass = getpass(
                        'Password for %s on %s:' % (cf.remoteuser, cf.remotesite))
        return cf

    def isTextKind(remotename, trace=True):
        """
        use mimetype to guess if filename means text or binary
        for 'f.html',  guess is ('text/html', None): text
        for 'f.jpeg'   guess is ('image/jpeg', None): binary
        for 'f.txt.gz' guess is ('text/plain', 'gzip'): binary
        for unknowns,  guess may be (None, None): binary
        mimetype can also guess name from type: see PyMailGUI
        """
        add_type('text/x-python-win', '.pyw')                       # not in tables
        mimetype, encoding = guess_type(remotename, strict=False)   # allow extras
        mimetype  = mimetype or '?/?'                               # type unknown?
        maintype  = mimetype.split('/')[0]                          # get first part
        if trace: print maintype, encoding or ''
        return maintype == 'text' and encoding == None              # not compressed

    def connectFtp(cf):
        print 'connecting...'
        connection = ftplib.FTP(cf.remotesite)           # connect to FTP site
        connection.login(cf.remoteuser, cf.remotepass)   # log in as user/password
        connection.cwd(cf.remotedir)                     # cd to directory to xfer
        if cf.nonpassive:                                # force active mode FTP
            connection.set_pasv(False)                   # most servers do passive
        return connection

    def cleanLocals(cf):
        """
        try to delete all locals files first to remove garbage
        """
        if cf.cleanall:
            for localname in os.listdir(cf.localdir):    # local dirlisting
                try:                                     # local file delete
                    print 'deleting local', localname
                    os.remove(os.path.join(cf.localdir, localname))
                except:
                    print 'cannot delete local', localname

    def downloadAll(cf, connection):
        """
        download all files from remote site/dir per cf config
        ftp nlst() gives files list, dir() gives full details
        """
        remotefiles = connection.nlst()                  # nlst is remote listing
        for remotename in remotefiles:
            localpath = os.path.join(cf.localdir, remotename)
            print 'downloading', remotename, 'to', localpath, 'as',
            if isTextKind(remotename):
                # use text mode xfer
                localfile = open(localpath, 'w')
                def callback(line): localfile.write(line + '\n')
                connection.retrlines('RETR ' + remotename, callback)
            else:
                # use binary mode xfer
                localfile = open(localpath, 'wb')
                connection.retrbinary('RETR ' + remotename, localfile.write)
            localfile.close()
        connection.quit()
        print 'Done:', len(remotefiles), 'files downloaded.'

    if __name__ == '__main__':
        cf = configTransfer()
        conn = connectFtp(cf)
        cleanLocals(cf)          # don't delete if can't connect
        downloadAll(cf, conn)

Compare this version with the original. This script, and every other in this section, runs the same as the original flat download and upload programs, so we won't repeat their outputs here. Although we haven't changed its behavior, we've modified the script's software structure radically: its code is now a set of tools that can be imported and reused in other programs.

The refactored upload program in Example 14-13, for instance, is now noticeably simpler, and the code it shares with the download script only needs to be changed in one place if it ever requires improvement.

Example 14-13. PP3E\Internet\Ftp\mirror\uploadflat_modular.py

    #!/bin/env python
    ##############################################################################
    # use FTP to upload all files from a local dir to a remote site/directory;
    # this version reuses downloader's functions, to avoid code redundancy;
    ##############################################################################

    import os
    from downloadflat_modular import configTransfer, connectFtp, isTextKind

    def cleanRemotes(cf, connection):
        """
        try to delete all remote files first to remove garbage
        """
        if cf.cleanall:
            for remotename in connection.nlst():           # remote dir listing
                try:                                        # remote file delete
                    print 'deleting remote', remotename
                    connection.delete(remotename)
                except:
                    print 'cannot delete remote', remotename

    def uploadAll(cf, connection):
        """
        upload all files to remote site/dir per cf config
        listdir() strips dir path, any failure ends script
        """
        localfiles = os.listdir(cf.localdir)            # listdir is local listing
        for localname in localfiles:
            localpath = os.path.join(cf.localdir, localname)
            print 'uploading', localpath, 'to', localname, 'as',
            if isTextKind(localname):
                # use text mode xfer
                localfile = open(localpath, 'r')
                connection.storlines('STOR ' + localname, localfile)
            else:
                # use binary mode xfer
                localfile = open(localpath, 'rb')
                connection.storbinary('STOR ' + localname, localfile)
            localfile.close()
        connection.quit()
        print 'Done:', len(localfiles), 'files uploaded.'

    if __name__ == '__main__':
        cf = configTransfer()
        conn = connectFtp(cf)
        cleanRemotes(cf, conn)
        uploadAll(cf, conn)

Not only is the upload script simpler now because it reuses common code, but it will also inherit any changes made in the download module. For instance, the isTextKind function was later augmented with code that adds the .pyw extension to mimetypes tables (this file type is not recognized by default); because it is a shared function, the change is automatically picked up in the upload program, too.

14.2.6.2. Refactoring with classes

The function-based approach of the last two examples addresses the redundancy issue, but the resulting scripts are perhaps clumsier than they need to be. For instance, their cf configuration options object provides a namespace that replaces global variables and breaks cross-file dependencies. Once we start making objects to model namespaces, though, Python's OOP support tends to be a more natural structure for our code. As one last twist, Example 14-14 refactors the FTP code one more time in order to leverage Python's class feature.

Example 14-14. PP3E\Internet\Ftp\mirror\ftptools.py

    #!/bin/env python
    ##############################################################################
    # use FTP to download or upload all files in a single directory from/to a
    # remote site and directory;  this version has been refactored to use classes
    # and OOP for namespace and a natural structure;  we could also structure this
    # as a download superclass, and an upload subclass which redefines the clean
    # and transfer methods, but then there is no easy way for another client to
    # invoke both an upload and download;  for the uploadall variant and possibly
    # others, also make single file upload/download code in orig loops methods;
    ##############################################################################

    import os, sys, ftplib
    from getpass   import getpass
    from mimetypes import guess_type, add_type

    # defaults for all clients
    dfltSite = 'home.rmi.net'
    dfltRdir = '.'
    dfltUser = 'lutz'

    class FtpTools:

        # allow these to be redefined
        def getlocaldir(self):
            return (len(sys.argv) > 1 and sys.argv[1]) or '.'

        def getcleanall(self):
            return raw_input('Clean target dir first?')[:1] in ['y','Y']

        def getpassword(self):
            return getpass(
                   'Password for %s on %s:' % (self.remoteuser, self.remotesite))

        def configTransfer(self, site=dfltSite, rdir=dfltRdir, user=dfltUser):
            """
            get upload or download parameters
            from module defaults, args, inputs, cmdline
            anonymous ftp: user='anonymous' pass=emailaddr
            """
            self.nonpassive = False             # passive FTP on by default in 2.1+
            self.remotesite = site              # transfer to/from this site
            self.remotedir  = rdir              # and this dir ('.' means acct root)
            self.remoteuser = user
            self.localdir   = self.getlocaldir()
            self.cleanall   = self.getcleanall()
            self.remotepass = self.getpassword()

        def isTextKind(self, remotename, trace=True):
            """
            use mimetypes to guess if filename means text or binary
            for 'f.html',  guess is ('text/html', None): text
            for 'f.jpeg'   guess is ('image/jpeg', None): binary
            for 'f.txt.gz' guess is ('text/plain', 'gzip'): binary
            for unknowns,  guess may be (None, None): binary
            mimetypes can also guess name from type: see PyMailGUI
            """
            add_type('text/x-python-win', '.pyw')                    # not in tables
            mimetype, encoding = guess_type(remotename, strict=False)# allow extras
            mimetype  = mimetype or '?/?'                            # type unknown?
            maintype  = mimetype.split('/')[0]                       # get 1st part
            if trace: print maintype, encoding or ''
            return maintype == 'text' and encoding == None           # not compressed

        def connectFtp(self):
            print 'connecting...'
            connection = ftplib.FTP(self.remotesite)           # connect to FTP site
            connection.login(self.remoteuser, self.remotepass) # log in as user/pswd
            connection.cwd(self.remotedir)                     # cd to dir to xfer
            if self.nonpassive:                                # force active mode
                connection.set_pasv(False)                     # most do passive
            self.connection = connection

        def cleanLocals(self):
            """
            try to delete all local files first to remove garbage
            """
            if self.cleanall:
                for localname in os.listdir(self.localdir):    # local dirlisting
                    try:                                       # local file delete
                        print 'deleting local', localname
                        os.remove(os.path.join(self.localdir, localname))
                    except:
                        print 'cannot delete local', localname

        def cleanRemotes(self):
            """
            try to delete all remote files first to remove garbage
            """
            if self.cleanall:
                for remotename in self.connection.nlst():      # remote dir listing
                    try:                                       # remote file delete
                        print 'deleting remote', remotename
                        self.connection.delete(remotename)
                    except:
                        print 'cannot delete remote', remotename

        def downloadOne(self, remotename, localpath):
            """
            download one file by FTP in text or binary mode
            local name need not be same as remote name
            """
            if self.isTextKind(remotename):
                localfile = open(localpath, 'w')
                def callback(line): localfile.write(line + '\n')
                self.connection.retrlines('RETR '+ remotename, callback)
            else:
                localfile = open(localpath, 'wb')
                self.connection.retrbinary('RETR '+ remotename, localfile.write)
            localfile.close()

        def uploadOne(self, localname, localpath, remotename):
            """
            upload one file by FTP in text or binary mode
            remote name need not be same as local name
            """
            if self.isTextKind(localname):
                localfile = open(localpath, 'r')
                self.connection.storlines('STOR ' + remotename, localfile)
            else:
                localfile = open(localpath, 'rb')
                self.connection.storbinary('STOR ' + remotename, localfile)
            localfile.close()

        def downloadDir(self):
            """
            download all files from remote site/dir per config
            ftp nlst() gives files list, dir() gives full details
            """
            remotefiles = self.connection.nlst()         # nlst is remote listing
            for remotename in remotefiles:
                localpath = os.path.join(self.localdir, remotename)
                print 'downloading', remotename, 'to', localpath, 'as',
                self.downloadOne(remotename, localpath)
            print 'Done:', len(remotefiles), 'files downloaded.'

        def uploadDir(self):
            """
            upload all files to remote site/dir per config
            listdir() strips dir path, any failure ends script
            """
            localfiles = os.listdir(self.localdir)       # listdir is local listing
            for localname in localfiles:
                localpath = os.path.join(self.localdir, localname)
                print 'uploading', localpath, 'to', localname, 'as',
                self.uploadOne(localname, localpath, localname)
            print 'Done:', len(localfiles), 'files uploaded.'

        def run(self, cleanTarget=lambda:None, transferAct=lambda:None):
            """
            run a complete FTP session
            default clean and transfer are no-ops
            don't delete if can't connect to server
            """
            self.configTransfer()
            self.connectFtp()
            cleanTarget()
            transferAct()
            self.connection.quit()

    if __name__ == '__main__':
        ftp = FtpTools()
        xfermode = 'download'
        if len(sys.argv) > 1:
            xfermode = sys.argv.pop(1)   # get+del 2nd arg
        if xfermode == 'download':
            ftp.run(cleanTarget=ftp.cleanLocals,  transferAct=ftp.downloadDir)
        elif xfermode == 'upload':
            ftp.run(cleanTarget=ftp.cleanRemotes, transferAct=ftp.uploadDir)
        else:
            print 'Usage: ftptools.py ["download" | "upload"] [localdir]'

In fact, this last mutation combines uploads and downloads into a single file, because they are so closely related. As before, common code is factored into methods to avoid redundancy. New here, the instance object itself becomes a natural namespace for storing configuration options (they become self attributes). Study this example's code for more details of the restructuring applied.

Although this file can still be run as a command-line script (pass in a command-line argument to specify "download" or "upload"), its class is really now a package of FTP tools that can be mixed into other programs and reused. Because its code is wrapped in a class, it can be easily customized by redefining its methods; its configuration calls, such as getlocaldir, for example, may be redefined in subclasses for custom scenarios.
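As a rough illustration of this kind of customization, the sketch below overrides the three configuration hooks so that a transfer could run without prompting. To keep it runnable on its own, it uses a hypothetical stand-in class, ConfigBase, that copies only the configuration part of FtpTools; in a real client you would subclass ftptools.FtpTools instead. The BatchConfig name, the sites/home directory, and the FTP_PASSWORD environment variable are likewise assumptions made for the example.

```python
import os

class ConfigBase:
    """Hypothetical stand-in for FtpTools: only the configuration hooks."""
    def getlocaldir(self):
        return '.'
    def getcleanall(self):
        return False
    def getpassword(self):
        raise NotImplementedError('the real class prompts interactively')
    def configTransfer(self, site='home.rmi.net', rdir='.', user='lutz'):
        self.remotesite = site
        self.remotedir  = rdir
        self.remoteuser = user
        self.localdir   = self.getlocaldir()     # hooks are called here, so
        self.cleanall   = self.getcleanall()     # subclass versions win
        self.remotepass = self.getpassword()

class BatchConfig(ConfigBase):
    """Non-interactive variant: fixed directory, no cleaning, no prompts."""
    def getlocaldir(self):
        return os.path.join('sites', 'home')     # assumed local site directory
    def getcleanall(self):
        return False                             # never ask, never clean
    def getpassword(self):
        return os.environ.get('FTP_PASSWORD', '')  # assumed environment variable
```

Because configTransfer calls the hooks through self, the subclass changes where settings come from without touching any of the transfer logic.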

Perhaps most important, using classes optimizes code reusability. Clients of this file can both upload and download directories by simply subclassing or embedding an instance of this class and calling its methods. To see one example of how, let's move on to the next section.

14.2.7. Uploads and Deletes with Subdirectories

Perhaps the biggest limitation of the web site download and upload scripts we just met is that they assume the site directory is flat (hence their names). That is, both transfer simple files only, and neither handles nested subdirectories within the web directory to be transferred.

For my purposes, that's often a reasonable constraint. I avoid nested subdirectories to keep things simple, and I store my home web site as a simple directory of files. For other sites (including one I keep at another machine), site transfer scripts are easier to use if they also automatically transfer subdirectories along the way.

14.2.7.1. Uploading local trees

It turns out that supporting directories on uploads is fairly simple: we need to add only a bit of recursion and remote directory creation calls. The upload script in Example 14-15 extends the version we just saw, to handle uploading all subdirectories nested within the transferred directory. Furthermore, it recursively transfers subdirectories within subdirectories: the entire directory tree contained within the top-level transfer directory is uploaded to the target directory at the remote server.

In terms of its code structure, Example 14-15 is just a customization of the FtpTools class of the prior section; really, we're just adding a method for recursive uploads, by subclassing. As one consequence, we get tools such as parameter configuration, content type testing, and connection and upload code for free here; with OOP, some of the work is done before we start.

Example 14-15. PP3E\Internet\Ftp\mirror\uploadall.py

    #!/bin/env python
    ############################################################################
    # extend the FtpTools class to upload all files and subdirectories from a
    # local dir tree to a remote site/dir; supports nested dirs too, but not
    # the cleanall option (that requires parsing FTP listings to detect remote
    # dirs: see cleanall.py); to upload subdirectories, uses os.path.isdir(path)
    # to see if a local file is really a directory, FTP().mkd(path) to make dirs
    # on the remote machine (wrapped in a try in case it already exists there),
    # and recursion to upload all files/dirs inside the nested subdirectory.
    # see also: uploadall-2.py, which doesn't assume the top remotedir exists.
    ############################################################################

    import os, ftptools

    class UploadAll(ftptools.FtpTools):
        """
        upload an entire tree of subdirectories
        assumes top remote directory exists
        """
        def __init__(self):
            self.fcount = self.dcount = 0

        def getcleanall(self):
            return False  # don't even ask

        def uploadDir(self, localdir):
            """
            for each directory in an entire tree
            upload simple files, recur into subdirectories
            """
            localfiles = os.listdir(localdir)
            for localname in localfiles:
                localpath = os.path.join(localdir, localname)
                print 'uploading', localpath, 'to', localname,
                if not os.path.isdir(localpath):
                    self.uploadOne(localname, localpath, localname)
                    self.fcount += 1
                else:
                    try:
                        self.connection.mkd(localname)
                        print 'directory created'
                    except:
                        print 'directory not created'
                    self.connection.cwd(localname)             # change remote dir
                    self.uploadDir(localpath)                  # upload local subdir
                    self.connection.cwd('..')                  # change back up
                    self.dcount += 1
                    print 'directory exited'

    if __name__ == '__main__':
        ftp = UploadAll()
        ftp.run(transferAct = lambda: ftp.uploadDir(ftp.localdir))
        print 'Done:', ftp.fcount, 'files and', ftp.dcount, 'directories uploaded.'

Like the flat upload script, this one can be run on any machine with Python and sockets and upload to any machine running an FTP server; I run it both on my laptop PC and on other servers by Telnet to upload sites to my ISP.

The crux of the matter in this script is the os.path.isdir test near the top; if this test detects a directory in the current local directory, we create an identically named directory on the remote machine with connection.mkd and descend into it with connection.cwd, and recur into the subdirectory on the local machine (we have to use recursive calls here, because the shape and depth of the tree are arbitrary). Like all FTP object methods, mkd and cwd methods issue FTP commands to the remote server. When we exit a local subdirectory, we run a remote cwd('..') to climb to the remote parent directory and continue. The rest of the script is roughly the same as the original.
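The order of remote commands this recursion produces can be hard to picture. The sketch below mimics the traversal with a hypothetical RecordingConnection class that logs commands instead of talking to a server; its stor method is a simplification invented for the example (a real ftplib.FTP object uses storlines and storbinary), and sorted is added only to make the order predictable.

```python
import os

class RecordingConnection:
    """Hypothetical stand-in for ftplib.FTP: logs commands, sends nothing."""
    def __init__(self):
        self.log = []
    def mkd(self, name):
        self.log.append('MKD ' + name)
    def cwd(self, name):
        self.log.append('CWD ' + name)
    def stor(self, name):                        # simplified upload placeholder
        self.log.append('STOR ' + name)

def upload_tree(connection, localdir):
    """Walk localdir recursively, mirroring each step on the connection."""
    for name in sorted(os.listdir(localdir)):
        path = os.path.join(localdir, name)
        if os.path.isdir(path):
            connection.mkd(name)                 # create remote subdirectory
            connection.cwd(name)                 # descend remotely
            upload_tree(connection, path)        # descend locally
            connection.cwd('..')                 # climb back up remotely
        else:
            connection.stor(name)                # simple file: upload here
```

Run against a local tree holding a.txt, sub/b.txt, and sub/nested/c.txt, the log shows each MKD and CWD pair being balanced by a later CWD to the parent, so the remote current directory always tracks the local one.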

In the interest of space, I'll leave studying this variant in more depth as a suggested exercise. For more context, see the experimental\uploadall-2.py version of this script in the examples distribution; it's similar, but coded so as not to assume that the top-level remote directory already exists.

Here is the sort of output displayed on the console when the upload-all script is run. It's similar to the flat upload (which you might expect, given that it is reusing much of the same code), but notice that it traverses and uploads two nested subdirectories along the way, .\tempdir and .\tempdir\nested:

    C:\Mark\temp\website>c:\...\PP3E\Internet\Ftp\mirror\uploadall.py
    Password for lutz on home.rmi.net:
    connecting...
    uploading .\109_0137.JPG to 109_0137.JPG image
    uploading .\2004-longmont-classes.html to 2004-longmont-classes.html text
    uploading .\2005-longmont-classes.html to 2005-longmont-classes.html text
    uploading .\2006-longmont-classes.html to 2006-longmont-classes.html text
    ...
    ...lines deleted...
    ...
    uploading .\t615c.jpg to t615c.jpg image
    uploading .\talk.html to talk.html text
    uploading .\temp.txt to temp.txt text
    uploading .\tempdir to tempdir directory created
    uploading .\tempdir\index.html to index.html text
    uploading .\tempdir\nested to nested directory created
    uploading .\tempdir\nested\about-pp.html to about-pp.html text
    uploading .\tempdir\nested\calendar.html to calendar.html text
    directory exited
    uploading .\tempdir\zaurus0.jpg to zaurus0.jpg image
    directory exited
    uploading .\testicon.jpg to testicon.jpg image
    uploading .\testicon_py.html to testicon_py.html text
    ...
    ...lines deleted...
    ...
    uploading .\zoo-jan-03.jpg to zoo-jan-03.jpg image
    uploading .\zopeoutline.htm to zopeoutline.htm text
    Done: 261 files and 2 directories uploaded.

As is, the script of Example 14-15 handles only directory tree uploads; recursive uploads are generally more useful than recursive downloads if you maintain your web sites on your local PC and upload to a server periodically, as I do. To also download (mirror) a web site that has subdirectories, a script must parse the output of a remote listing command to detect remote directories. For the same reason, the recursive upload script was not coded to support the remote directory tree cleanup option of the original; such a feature would require parsing remote listings as well. The next section shows how.

14.2.7.2. Deleting remote trees

One last example of code reuse at work: when I initially tested the prior section's upload-all script, it contained a bug that caused it to fall into an infinite recursion loop, and keep copying the full site into new subdirectories, over and over, until the FTP server kicked me off (not an intended feature of the program!). In fact, the upload got 13 levels deep before being killed by the server; it effectively locked my site until the mess could be repaired.

To get rid of all the files accidentally uploaded, I quickly wrote the script in Example 14-16 in emergency (really, panic) mode; it deletes all files and nested subdirectories in an entire remote tree. Luckily, this was very easy to do given all the reuse that Example 14-16 inherits from the FtpTools superclass. Here, we just have to define the extension for recursive remote deletions. Even in tactical mode like this, OOP can be a decided advantage.

Example 14-16. PP3E\Internet\Ftp\mirror\cleanall.py

    #!/bin/env python
    ##############################################################################
    # extend the FtpTools class to delete files and subdirectories from a remote
    # directory tree; supports nested directories too;  depends on the dir()
    # command output format, which may vary on some servers! - see Python's
    # Tools\Scripts\ftpmirror.py for hints;  extend me for remote tree downloads;
    ##############################################################################

    from ftptools import FtpTools

    class CleanAll(FtpTools):
        """
        delete an entire remote tree of subdirectories
        """
        def __init__(self):
            self.fcount = self.dcount = 0

        def getlocaldir(self):
            return None  # irrelevant here

        def getcleanall(self):
            return True  # implied here

        def cleanDir(self):
            """
            for each item in current remote directory,
            del simple files, recur into and then del subdirectories
            the dir() ftp call passes each line to a func or method
            """
            lines = []                                   # each level has own lines
            self.connection.dir(lines.append)            # list current remote dir
            for line in lines:
                parsed  = line.split()                   # split on whitespace
                permiss = parsed[0]                      # assume 'drw... ... filename'
                fname   = parsed[-1]
                if permiss[0] != 'd':                    # simple file: delete
                    print 'file', fname
                    self.connection.delete(fname)
                    self.fcount += 1
                else:                                    # directory: recur, del
                    print 'directory', fname
                    self.connection.cwd(fname)           # chdir into remote dir
                    self.cleanDir()                      # clean subdirectory
                    self.connection.cwd('..')            # chdir remote back up
                    self.connection.rmd(fname)           # delete empty remote dir
                    self.dcount += 1
                    print 'directory exited'

    if __name__ == '__main__':
        ftp = CleanAll()
        ftp.run(cleanTarget=ftp.cleanDir)
        print 'Done:', ftp.fcount, 'files and', ftp.dcount, 'directories cleaned.'

Besides again being recursive in order to handle arbitrarily shaped trees, the main trick employed here is to parse the output of a remote directory listing. The FTP nlst call used earlier gives us a simple list of filenames; here, we use dir to also get file detail lines like these:

    ftp> dir
    ...
    -rw-r--r--   1 ftp      ftp         10088 Mar 19 19:35 talkmore.html
    -rw-r--r--   1 ftp      ftp          8711 Mar 19 19:35 temp.txt
    drwxr-xr-x   2 ftp      ftp          4096 Mar 19 20:13 tempdir
    -rw-r--r--   1 ftp      ftp          6748 Mar 19 19:35 testicon.jpg
    -rw-r--r--   1 ftp      ftp           355 Mar 19 19:35 testicon_py.html

This output format is potentially server-specific, so check this on your own server before relying on this script. For my ISP, if the first character of the first item on the line is character "d", the filename at the end of the line names a remote directory (e.g., tempdir). To parse, the script simply splits on whitespace to extract parts of a line.
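The parsing step can be isolated into a small function for experimenting with your own server's listings. The version below is a sketch that assumes the nine-column Unix-style layout shown above; the name parse_listing_line is hypothetical. Unlike the script's simple split, it limits the number of splits so that a filename containing spaces stays in one piece, which is a variation on the book's code rather than a copy of it.

```python
def parse_listing_line(line):
    """Return (is_directory, filename) for one Unix-style listing line."""
    fields = line.rstrip().split(None, 8)    # 8 splits: the 9th field is the name
    if len(fields) < 9:                      # e.g. a 'total N' summary line
        raise ValueError('unrecognized listing line: %r' % line)
    permissions, filename = fields[0], fields[8]
    return permissions.startswith('d'), filename

if __name__ == '__main__':
    sample = 'drwxr-xr-x   2 ftp      ftp          4096 Mar 19 20:13 tempdir'
    print(parse_listing_line(sample))
```

Servers that emit other formats (Windows-style listings, for instance) would need different parsing, and symbolic links are not handled here, so treat this as a starting point only.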

The output of our clean-all script in action follows; it shows up in the system console window where the script is run. This reflects a much larger tree than the one uploaded previously:

    C:\Mark\temp\website>c:\...\PP3E\Internet\Ftp\mirror\cleanall.py
    Password for lutz on home.rmi.net:
    connecting...
    ...
    ...lines deleted...
    ...
    file t250.jpg
    file t615c.jpg
    file talk.html
    file talkmore.html
    directory temp
    file 109_0137.JPG
    file 2004-longmont-classes.html
    file 2005-longmont-classes.html
    file 2006-longmont-classes.html
    ...
    ...lines deleted...
    ...
    directory exited
    file testicon.jpg
    file testicon_py.html
    ...
    ...lines deleted...
    ...
    file zoo-jan-03.jpg
    file zopeoutline.htm
    Done: 855 files and 13 directories cleaned.

It is possible to extend this remote tree-cleaner to also download a remote tree with subdirectories. We'll leave this final step as a suggested exercise, though, partly because its dependence on the format produced by server directory listings makes it hard to make robust, and partly because this use case is less common for me: in practice, I am more likely to maintain a site on my PC and upload to the server than to download a tree.

If you do wish to experiment with a recursive download, though, be sure to consult the script Tools\Scripts\ftpmirror.py in Python's install or source tree for hints. That script attempts to download a remote directory tree by FTP, and allows for various directory listing formats, which we'll skip here in the interest of space. For our purposes, it's time to move on to the next protocol on our tour: Internet email.
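As a starting point for that experiment, here is one possible shape for a recursive download, following the same pattern as the tree cleaner. It is a rough sketch under loud assumptions: the function name download_tree is hypothetical, listings are assumed to use the nine-column Unix-style format, every file is fetched in binary mode for brevity (the earlier scripts choose text or binary by file type), and symbolic links are not handled. The connection argument is expected to behave like a logged-in ftplib.FTP object, using only its dir, cwd, and retrbinary methods.

```python
import os

def download_tree(connection, localdir):
    """Mirror the current remote directory into localdir, recursively."""
    if not os.path.isdir(localdir):
        os.makedirs(localdir)                    # create local target as needed
    lines = []                                   # each level has its own lines
    connection.dir(lines.append)                 # list current remote directory
    for line in lines:
        fields = line.rstrip().split(None, 8)    # assumes 9-column listings
        if len(fields) < 9:
            continue                             # skip 'total N' and the like
        name = fields[8]
        if name in ('.', '..'):
            continue                             # never recur into self/parent
        localpath = os.path.join(localdir, name)
        if fields[0].startswith('d'):            # remote directory: recur
            connection.cwd(name)
            download_tree(connection, localpath)
            connection.cwd('..')
        else:                                    # simple file: fetch as bytes
            with open(localpath, 'wb') as localfile:
                connection.retrbinary('RETR ' + name, localfile.write)
```

Test any such script against a small scratch directory first, since a server with a different listing format could cause files to be skipped or misnamed.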




Programming Python
ISBN: 0596009259
EAN: 2147483647
Year: 2004
Pages: 270
Authors: Mark Lutz
