There's a directory on a remote FTP server that changes every few weeks. You want to mirror a copy of the directory on your local machine and update your copy every time it changes. You can't use one of the many "mirror" scripts to do this because the directory name contains a timestamp, and you need to do a pattern match to identify the right directory. Net::FTP to the rescue.

Net::FTP is part of the libnet utilities by Graham Barr. In addition to Net::FTP, libnet includes Net::SMTP, Net::NNTP, and Net::POP3, which are discussed in later chapters. When you install the libnet modules, the install script prompts you for various default configuration parameters used by the Net::* modules. These include an FTP firewall proxy and the default mail exchanger for your domain. See the documentation for Net::Config (also part of the libnet utilities) for information on how to override the defaults later.

Net::FTP, like many of the client modules, uses an object-oriented interface. When you first connect to an FTP server, the module returns a Net::FTP object to you. You then use this object to log in, to get directory listings from the server, to transfer files, and to send other commands.

A Net::FTP Example

Figure 6.1 is a simple example that uses Net::FTP to connect to ftp.perl.org and download the file named RECENT from the directory /pub/CPAN/. If the program runs successfully, it creates a file named RECENT in the current directory. This file contains the names of all files recently uploaded to CPAN.

Figure 6.1. Downloading a single file with Net::FTP

Lines 1–5: Initialize
We load the Net::FTP module and define constants for the host to connect to and the file to download.

Line 6: Connect to remote host
We connect to the FTP host by invoking the Net::FTP new() method with the name of the host to connect to. If successful, new() returns a Net::FTP object connected to the remote server. Otherwise, it returns undef, and we die with an error message.
In case of failure, new() leaves a diagnostic error message in $@.

Line 7: Log in to the server
After connecting to the server, we still need to log in by calling the Net::FTP object's login() method with a username and password. In this case, we are using anonymous FTP, so we provide the username "anonymous" and let Net::FTP fill in a reasonable default password. If the login is successful, login() returns a true value. Otherwise, it returns false and we die, using the FTP object's message() method to retrieve the text of the server's last message.

Line 8: Change to remote directory
We invoke the FTP object's cwd() ("change working directory") method to enter the desired directory. If this call fails, we again die with the server's last message.

Line 9: Retrieve the file
We call the FTP object's get() method to retrieve the desired file. If successful, Net::FTP copies the remote file to a local one of the same name in the current directory. Otherwise, we die with an error message.

Lines 10–11: Quit
We call the FTP object's quit() method to close the connection.

FTP and Command-Based Protocols

FTP is an example of a common paradigm for Internet services: the command-based protocol. The interaction between client and server is constrained by a well-defined protocol in which the client issues a single-line command and the server returns a line-oriented response.

Each of the client commands is a short case-insensitive word, possibly followed by one or more arguments. The command is terminated by a CRLF pair. As we saw in Chapter 5, when we used the gab2.pl script to communicate with an FTP server, the client commands in the FTP protocol include USER and PASS, which together are used to log in to the server; HELP, to get usage information; and QUIT, to end the session. Other commands are used to send and retrieve files, obtain directory listings, and so forth.
For example, when the client wishes to log in under the user name "anonymous," it will send this command to the server:

    USER anonymous

Each response from the server to the client consists of one or more CRLF-delimited lines. The first line always begins with a three-digit numeric result code indicating the outcome of the command. This is usually followed by a human-readable message. For example, a successful USER command will result in the following server response:

    331 Guest login ok, send your complete e-mail address as password.

Sometimes a server response will stretch over several lines. In this case, the numeric result code on the first line will be followed immediately by a "-", and the result code will be repeated (without the dash) on the last line. The FTP protocol's response to the HELP command illustrates this:

    HELP
    214-The following commands are recognized (* =>'s unimplemented).
       USER    PORT    STOR    MSAM*   RNTO    NLST    MKD     CDUP
       PASS    PASV    APPE    MRSQ*   ABOR    SITE    XMKD    XCUP
       ACCT*   TYPE    MLFL*   MRCP*   DELE    SYST    RMD     STOU
       SMNT*   STRU    MAIL*   ALLO    CWD     STAT    XRMD    SIZE
       REIN*   MODE    MSND*   REST    XCWD    HELP    PWD     MDTM
       QUIT    RETR    MSOM*   RNFR    LIST    NOOP    XPWD
    214 Direct comments to ftp-bugs@wuarchive.wustl.edu

Commonly the client and server need to exchange large amounts of non-command data. To do this, the client sends a command to warn the server that the data is coming, sends the data, and then terminates the information by sending a lone dot (".") on a line by itself. We will see an example of this in the next chapter when we examine the interaction between an e-mail client and an SMTP server.

Server result codes are arbitrary but generally follow a simple convention. Result codes between 100 and 199 are used for informational messages, while those in the 200–299 range are used to indicate successful completion of a command. Codes in the 300–399 range are used to indicate that the client must provide more information, such as the password that accompanies a username.
Result codes of 400 or greater indicate various errors: the 400–499 codes are used for transient failures, in which the same command may succeed if it is retried later, while codes of 500 and greater are used for permanent failures, such as an unrecognized command.

Because command-based servers are so common, the libnet package comes with a generic building block module called Net::Cmd. The module doesn't actually do anything by itself, but adds functionality to descendants of the IO::Socket module that allows them to communicate easily with this type of network server. Net::FTP, Net::SMTP, Net::NNTP, and Net::POP3 are all derived from Net::Cmd.

The two major methods provided by Net::Cmd objects are command() and response():

$success = $obj->command($command [,@args])
Sends the command indicated by $command to the server, optionally followed by one or more arguments. command() automatically inserts spaces between arguments and appends a CRLF to the end of the command. If the command was delivered successfully, the method returns true.

$status = $obj->response
Fetches and parses the server's response to the last command, returning the most significant digit as the method result. For example, if the server's result code is 331, response() will return 3. It returns undef in case of failure.

Subclasses of Net::Cmd build more sophisticated methods on top of command() and response(). For example, the Net::FTP login() method calls command() twice: once to issue the USER command and again to issue the PASS command. You will not ordinarily call command() and response() yourself, but will use the more specialized (and convenient) methods provided by the subclass. However, command() and response() are available should you need access to functionality that isn't provided by the module.

Several methods provided by Net::Cmd are commonly used by end-user applications. These are code(), message(), and ok():

$code = $obj->code
Returns the three-digit numeric result code from the last response.
$message = $obj->message
Returns the text of the last message from the server. This is particularly useful for diagnosing errors.

$ok = $obj->ok
The ok() method returns true if the last server response indicated success, false otherwise. It returns true if the result code is greater than 0 but less than 400.

The Net::FTP API

We'll now look at the Net::FTP API in greater detail. Net::FTP is a descendant of both IO::Socket and Net::Cmd. As a descendant of IO::Socket, it can be used as a filehandle to communicate directly with the server. For example, you can read from and write to a Net::FTP object with sysread() and syswrite(), although you would probably not want to. As a descendant of Net::Cmd, Net::FTP supports the code(), message(), and ok() methods discussed in the previous section. The FTP protocol's status codes are listed in RFC 959 (see Appendix D).

To the generic methods inherited from its ancestors, Net::FTP adds a large number of specialized methods that support the special features of the FTP protocol. Only the common methods are listed here. See the Net::FTP documentation for the full API.

$ftp = Net::FTP->new($host [,%options])
The new() method creates a Net::FTP object. The mandatory first argument is the domain name of the FTP server you wish to contact. Additional optional arguments are a set of key/value pairs that set options for the session, as shown in Table 6.1. For example, to connect to ftp.perl.org with hash marks enabled and a timeout of 30 seconds, we could use this statement:

    $ftp = Net::FTP->new('ftp.perl.org', Timeout=>30, Hash=>1);

Table 6.1.
Net::FTP->new() Options

Option | Description
Firewall | Name of the FTP proxy to use when your machine is behind certain types of firewalls
BlockSize | Block size of transfers (default 10240)
Port | FTP port to connect to (default 21)
Timeout | Timeout value, in seconds, for various operations (default 120 seconds)
Debug | Debug level; set to greater than zero for verbose debug messages
Passive | Use FTP passive mode for all file transfers; required by some firewalls
Hash | Prints a hash mark to STDERR for each 1024 bytes of data transferred

$success = $ftp->login([$username [,$password [,$account]]])
The login() method attempts to log in to the server using the provided authentication information. If no username is provided, then Net::FTP assumes "anonymous". If neither a username nor a password is provided, then Net::FTP looks up the authentication information in the user's .netrc file. If it is still not found, Net::FTP generates a password of the form "$USER@", where $USER is your login name. The optional $account argument is for use with some FTP servers that require an additional authentication password to gain access to the filesystem after logging in to the server itself. login() returns true if the login was successful, and false otherwise. See the Net::Netrc manual pages for more information on the .netrc file.

$type = $ftp->ascii
Puts the FTP object into ASCII mode. The server automatically performs newline translation during file transfers (ending lines with CRLF on Windows machines, LF on UNIX machines, and CR on Macintoshes). This is suitable for transferring text files. The return value is the previous value of the transfer type, such as "binary." Note: ASCII mode is the default.

$type = $ftp->binary
Puts the FTP object into binary mode. The server will not perform translation. This is suitable for transferring binary files such as images.
$success = $ftp->delete($file)
Deletes the file $file on the server, provided you have sufficient privileges to do this.

$success = $ftp->cwd([$directory])
Attempts to change the current working directory on the remote end to the specified path. If no directory is provided, it will attempt to change to the root directory "/". Relative directories are understood, and you can provide a pathname of ".." to move up one level.

$directory = $ftp->pwd
Returns the full pathname of the current working directory on the remote end.

$success = $ftp->rmdir($directory)
Removes the specified directory, provided you have sufficient privileges to do so.

$success = $ftp->mkdir($directory [,$parents])
Creates a new directory at the indicated path, provided you have sufficient privileges to do so. If $parents is true, Net::FTP attempts to create all missing intermediate directories as well.

@items = $ftp->ls([$directory])
Gets a short-format directory list of all the files and subdirectories in the indicated directory or, if not specified, in the current working directory. In a scalar context, ls() returns a reference to an array rather than the list itself. By default, each member of the returned list consists of just the bare file or directory name. However, since the FTP daemon just passes the argument to the ls command, you are free to pass command-line arguments to ls. For example, this returns a long listing:

    @items = $ftp->ls('-lF');

@items = $ftp->dir([$directory])
Gets a long-format directory list of all the files and subdirectories in the indicated directory or, if not specified, in the current working directory. In a scalar context, dir() returns a reference to an array rather than the list itself. In contrast to ls(), each member of the returned list is a line of a directory listing that provides the file modes, ownerships, and sizes. It is equivalent to calling the ls command with the -lg options.
$success = $ftp->get($remote [,$local [,$offset]])
The get() method retrieves the file named $remote from the FTP server. You may provide a full pathname or one relative to the current working directory. The $local argument specifies the local pathname to store the retrieved file to. If not provided, Net::FTP creates a file with the same name as the remote file in the current directory. You may also pass a filehandle in $local, in which case the contents of the retrieved file are written to that handle. This is handy for sending files to STDOUT:

    $ftp->get('RECENT',\*STDOUT);

The $offset argument can be used to restart an interrupted transmission. It gives a position in the file that the FTP server should seek to before transmitting. Here's an idiom for using it to restart an interrupted transmission:

    my $offset = (stat($file))[7] || 0;
    $ftp->get($file,$file,$offset);

The call to stat() fetches the current size of the local file or, if none exists, 0. This is then used as the offset to get().

$fh = $ftp->retr($filename)
Like get(), the retr() method can be used to retrieve a remote file. However, rather than writing the file to a filehandle or disk file, it returns a filehandle that can be read from to retrieve the file directly. For example, here is how to read the file named RECENT located on a remote FTP server without creating a temporary local file:

    $fh = $ftp->retr('RECENT') or die "can't get file ",$ftp->message;
    print while <$fh>;

$success = $ftp->put($local [,$remote])
The put() method transfers a file from the local host to the remote host. The naming rules for $local and $remote are identical to get(), including the ability to use a filehandle for $local.

$fh = $ftp->stor($filename)
$fh = $ftp->appe($filename)
These two methods initiate file uploads. The file will be stored on the remote server under the name $filename. If the remote server allows the transfer, the method returns a filehandle that can be used to transmit the file contents.
The methods differ in how they handle the case of an existing file with the specified name. The stor() method overwrites the existing file, and appe() appends to it.

$modtime = $ftp->mdtm($file)
The mdtm() method returns the modification time of the specified file, expressed as seconds since the epoch (the same format returned by the stat() function). If the file does not exist or is not a plain file, then this method returns undef. Also be aware that some older FTP servers (such as those from Sun) do not support retrieval of modification times. For these servers mdtm() will return undef.

$size = $ftp->size($file)
Returns the size of the specified file in bytes. If the file does not exist or is not a plain file, then this method returns undef. Also be aware that size() returns undef when talking to older FTP servers that do not support the SIZE command.

A Directory Mirror Script

Using Net::FTP, we can write a simple FTP mirroring script. It recursively compares a local directory against a remote one and copies new or updated files to the local machine, preserving the directory structure. The program preserves file modes in the local copy (but not ownerships) and also makes an attempt to preserve symbolic links.

The script, called ftp_mirror.pl, is listed in Figure 6.2. To mirror a file or directory from a remote server, invoke the script with a command-line argument consisting of the remote server's DNS name, a colon, and the path of the file or directory to mirror. This example mirrors the file RECENT, copying it to the local directory only if it has changed since the last time the file was mirrored:

Figure 6.2.
The ftp_mirror.pl script

    % ftp_mirror.pl ftp.perl.org:/pub/CPAN/RECENT

The next example mirrors the entire contents of the CPAN modules directory, recursively copying the remote directory structure into the current local working directory (don't try this verbatim unless you have a fast network connection and a lot of free disk space):

    % ftp_mirror.pl ftp.perl.org:/pub/CPAN/

The script's command-line options include --user and --pass, to provide a username and password for non-anonymous FTP; --verbose, for verbose status reports; and --hash, to print out hash marks during file transfers.

Lines 1–5: Load modules
We load the Net::FTP module, as well as File::Path and Getopt::Long. File::Path provides the mkpath() routine for creating a subdirectory with all its intermediate parents. Getopt::Long provides functions for managing command-line arguments.

Lines 6–19: Process command-line arguments
We process the command-line arguments, using them to set various global variables. The FTP host and the directory or file to mirror are stored into the variables $HOST and $PATH, respectively.

Lines 20–23: Initialize the FTP connection
We call Net::FTP->new() to connect to the desired host, and login() to log in. If no username and password were provided as command-line arguments, we attempt an anonymous login. Otherwise, we attempt to use the authentication information to log in. After successfully logging in, we set the file transfer type to binary, which is necessary if we want to mirror the remote site exactly, and we turn on hashing if requested.

Lines 24–26: Initiate mirroring
If all has gone well, we begin the mirroring process by calling an internal subroutine do_mirror() with the requested path. When do_mirror() is done, we close the connection politely by calling the FTP object's quit() method and exit.

Lines 27–36: do_mirror() subroutine
The do_mirror() subroutine is the main entry point for mirroring a file or directory.
When first called, we do not know whether the path requested by the user is a file or directory, so the first thing we do is invoke a utility subroutine, find_type(), to make that determination. Given a path on a remote FTP server, find_type() returns a single-character code indicating the type of object the path points to: a "-" for an ordinary file, or a "d" for a directory.

Having determined the type of the object, we split the path into the directory part (the prefix) and the last component of the path (the leaf; either the desired file or directory). We invoke the FTP object's cwd() method to change into the parent of the file or directory to mirror. If the find_type() subroutine indicated that the path is a file, we invoke get_file() to mirror the file. Otherwise, we invoke get_dir().

Lines 37–53: get_file() subroutine
This subroutine is responsible for fetching a file, but only if it is newer than the local copy, if any. After fetching the file, we try to change its mode to match the mode on the remote site. The mode may be provided by the caller; if not, we determine the mode from within the subroutine.

We begin by fetching the modification time and the size of the remote file using the FTP object's mdtm() and size() methods. Remember that these methods might return undef if we are talking to an older server that doesn't support these calls. If the mode hasn't been provided by the caller, we invoke the FTP object's dir() method to generate a directory listing of the requested file, and pass the result to parse_listing(), which splits the directory listing line into a three-element list consisting of the file type, name, and mode.

We now look for a file on the local machine with the same relative path and stat() it, capturing the local file's size and modification time information. We then compare the size and modification time of the remote file to the local copy.
If the files are the same size, and the remote file is as old as or older than the local one, then we don't need to freshen our copy. Otherwise, we invoke the FTP object's get() method to fetch the remote file. After the file transfer is successfully completed, we change the file's mode to match the remote version.

Lines 54–73: get_dir() subroutine, recursive directory mirroring
The get_dir() subroutine is more complicated than get_file() because it must call itself recursively in order to make copies of directories nested within it. Like get_file(), this subroutine is called with the path of the directory and, optionally, the directory mode.

We begin by creating a local copy of the directory in the current working directory if there isn't one already, using mkpath() to create intermediate directories if necessary. We then enter the newly created directory with the chdir() Perl built-in, and change the directory mode if requested.

We retrieve the current working directory at the remote end by calling the FTP object's pwd() method. This path gets stored into a local variable for safekeeping. We now enter the remote copy of the mirror directory using cwd().

We need to copy the contents of the mirrored directory to the local machine. We invoke the FTP object's dir() method to generate a full directory listing. We parse each line of the listing into its type, pathname, and mode using the parse_listing() subroutine. Plain files are passed to get_file(), symbolic links to make_link(), and subdirectories recursively to get_dir().

Having dealt with each member of the directory listing, we put things back the way they were before we entered the subroutine. We call the FTP object's cwd() method to make the saved remote working directory current, and chdir('..') to move up a level in the local directory structure as well.
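The freshness test that get_file() applies to each plain file can be sketched as a small pure function. This is a minimal illustration, not the code from Figure 6.2: the name needs_update() and its argument order are hypothetical, and the choice to fetch whenever the server cannot report a size or modification time is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a remote file should be fetched.
# Arguments: remote size, remote mtime, local size, local mtime.
# Any of them may be undef: mdtm() and size() return undef on older
# servers, and the local values are undef when no local copy exists.
sub needs_update {
    my ($rsize, $rtime, $lsize, $ltime) = @_;

    # No local copy yet, so we always fetch.
    return 1 unless defined $lsize && defined $ltime;

    # Assumption: if the server cannot tell us size or time, fetch anyway.
    return 1 unless defined $rsize && defined $rtime;

    # Current only if the sizes match and the remote is not newer.
    return 0 if $rsize == $lsize && $rtime <= $ltime;
    return 1;
}

# Made-up values: same size, remote copy 500 seconds older than local.
my $verdict = needs_update(1024, 1_000_000, 1024, 1_000_500) ? 'fetch' : 'current';
print "$verdict\n";
```

In a real script the remote values would come from $ftp->size($file) and $ftp->mdtm($file), and the local ones from elements 7 and 9 of stat($file).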
Lines 74–84: find_type() subroutine
find_type() is a not-entirely-satisfactory subroutine for guessing the type of a file or directory given only its path. We would prefer to use the FTP dir() method for this purpose, as in the preceding get_dir() call, but this is unreliable because of slight differences in the way that the directory command works on different servers when you pass it the path to a file versus the path to a directory.

Instead, we test whether the remote path is a directory by trying to cwd() into it. If cwd() fails, we assume that the path is a file. Otherwise, we assume that the path is a directory. Note that by this criterion, a symbolic link to a file is treated as a file, and a symbolic link to a directory is treated as a directory. This is the desired behavior.

Lines 85–92: make_link() subroutine
The make_link() subroutine tries to create a local symbolic link that mirrors a remote link. It works by assuming that the entry in the remote directory listing denotes the source and target of a symbolic link, like this:

    README.html -> index.html

We split the entry into its two components and pass them to the symlink() built-in. Only symbolic links that point to relative targets are created. We don't attempt to link to absolute paths (such as "/CPAN") because this will probably not be valid on the local machine. Besides, it's a security issue.

Lines 93–106: parse_listing() subroutine
The parse_listing() subroutine is invoked by get_dir() to process one line of the directory listing retrieved by Net::FTP->dir(). This subroutine is necessitated by the fact that the vanilla FTP protocol doesn't provide any other way to determine the type or mode of an element in a directory listing. The subroutine parses the directory entry using a regular expression that allows variants of common directory listings.
The file's type code is derived from the first character of the symbolic mode field (e.g., the "d" in drwxr-xr-x), and its mode from the remainder of the field. The filename is whatever follows the date field. The type, name, and mode are returned to the caller, after first converting the symbolic file mode into its numeric form.

Lines 107–122: filemode() subroutine
This subroutine is responsible for converting a symbolic file mode into its numeric equivalent. For example, the symbolic mode rw-r--r-- becomes octal 0644. We treat the setuid or setgid bits as if they were execute bits, because it would be a security risk to create a set-id file locally.

When we run the mirror script in verbose mode on CPAN, the beginning of the output looks like the following:

    % ftp_mirror.pl --verbose ftp.perl.org:/pub/CPAN
    Getting directory CPAN/
    Symlinking CPAN.html -> authors/Jon_Orwant/CPAN.html
    Symlinking ENDINGS -> .cpan/ENDINGS
    Getting file MIRRORED.BY
    Getting file MIRRORING.FROM
    Getting file README
    Symlinking README.html -> index.html
    Symlinking RECENT -> indices/RECENT-print
    Getting file RECENT.html
    Getting file ROADMAP
    Getting file ROADMAP.html
    Getting file SITES
    Getting file SITES.html
    Getting directory authors/
    Getting file 00.Directory.Is.Not.Maintained.Anymore
    Getting file 00upload.howto
    Getting file 00whois.html
    Getting file 01mailrc.txt.gz
    Symlinking Aaron_Sherman -> id/ASHER
    Symlinking Abigail -> id/ABIGAIL
    Symlinking Achim_Bohnet -> id/ACH
    Symlinking Alan_Burlison -> id/ABURLISON
    ...

When we run it again a few minutes later, we see messages indicating that most of the files are current and don't need to be updated:

    % ftp_mirror.pl --verbose ftp.perl.org:/pub/CPAN
    Getting directory CPAN/
    Symlinking CPAN.html -> authors/Jon_Orwant/CPAN.html
    Symlinking ENDINGS -> .cpan/ENDINGS
    Getting file MIRRORED.BY: not newer than local copy.
    Getting file MIRRORING.FROM: not newer than local copy.
    Getting file README: not newer than local copy.
    ...
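The symbolic-to-numeric conversion that filemode() performs can be sketched as follows. This is one possible implementation, not the listing from Figure 6.2. It assumes a nine-character permission string with the type character already stripped. Lowercase "s" and "t" are counted as plain execute bits, as the text describes for set-id files; treating uppercase "S" and "T" as "no execute" is an assumption of this sketch.

```perl
use strict;
use warnings;

# Convert a nine-character symbolic mode such as "rw-r--r--" into a number.
sub filemode {
    my $symbolic = shift;
    return unless length($symbolic) == 9;

    my @chars = split //, $symbolic;
    my $mode  = 0;
    for my $i (0 .. 8) {
        # Position 0 is the owner's read bit (0400), position 8 is
        # the "other" execute bit (0001).
        my $bit = 1 << (8 - $i);

        # r, w, and x set the bit; a lowercase s or t in an execute slot
        # is counted as an ordinary execute bit. The set-id and sticky
        # bits themselves are never set.
        $mode |= $bit if $chars[$i] =~ /^[rwxst]$/;
    }
    return $mode;
}

printf "%04o\n", filemode('rw-r--r--');   # prints 0644
```

The returned number can be handed directly to Perl's chmod() built-in, for example chmod(filemode('rwxr-xr-x'), $file).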
The major weak point of this script is the parse_listing() routine. Because the FTP directory listing format is not standardized, server implementations vary slightly. During development, I tested this script on a variety of UNIX FTP daemons as well as on the Microsoft IIS FTP server. However, this script may well fail with other servers. In addition, the regular expression used to parse directory entries will probably fail on filenames that begin with whitespace.
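To make the fragility concrete, here is a minimal sketch of a listing parser in the spirit of parse_listing(). It is not the regular expression used in ftp_mirror.pl: the pattern is an assumption that covers only UNIX-style "ls -l" lines, and unlike the script's routine it returns the symbolic mode without converting it to a number.

```perl
use strict;
use warnings;

# Parse one UNIX-style long-listing line into (type, name, symbolic mode).
# Returns an empty list if the line is not recognized.
sub parse_listing {
    my $line = shift;
    my ($type, $mode, $name) = $line =~ m{
        ^([a-z-])([a-zA-Z-]{9})      # type character, then nine mode characters
        \s+ .*?                      # link count, owner, group, size (varies)
        [A-Z][a-z]{2} \s+ \d{1,2}    # month and day, e.g. Jan 12
        \s+ (?:\d{1,2}:\d{2}|\d{4})  # either a time or a four-digit year
        \s+ (.+) $                   # whatever follows the date is the name
    }x or return;
    return ($type, $name, $mode);
}

# A made-up sample line for a symbolic link.
my ($type, $name, $mode) =
    parse_listing('lrwxrwxrwx   1 ftp ftp 10 Jan 12  1999 README.html -> index.html');
print "$type $mode $name\n";
```

Because the pattern skips all whitespace after the date, a filename that begins with a space loses its leading whitespace, which is the limitation noted above. DOS-style listings, which some Windows servers can produce, would not match at all.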