The Post Office Protocol

	Network Programming with Perl, by Lincoln D. Stein

	Chapter 8. POP, IMAP, and NNTP

POP3 and IMAP are the two protocols used most to access Internet mail. Both were designed to allow a user to access mail drops on remote machines, and provide methods to list the contents of the user's mailbox, to download mail for viewing, and to delete messages the user is no longer interested in.

POP3 (Post Office Protocol version 3) is the older and simpler of the two. Described in RFC 1725 and STD 53, it provides a straightforward interface for listing, retrieving, and deleting mail held on a remote server. IMAP (Internet Message Access Protocol), described in RFC 2060, adds sophisticated facilities for managing sets of remote and local mailboxes and synchronizing them when the user connects.

We will consider fetching mail from a POP3 server in this section. There are at least two Perl modules on CPAN for dealing with POP3 servers: Mail::POP3Client, written by Sean Dowd, and Net::POP3, by Graham Barr. Both provide essentially the same functionality but they use different APIs. The most important feature difference between the two is that Net::POP3 allows you to save the contents of a mail message to a filehandle, while Mail::POP3Client reads the entire mail message into memory. Because the ability to save to a filehandle makes a big difference when dealing with large e-mails (such as those containing MIME enclosures), I recommend Net::POP3.

Net::POP3 inherits from Net::Cmd, making it similar in style to Net::FTP and Net::SMTP. You begin by creating a new Net::POP3 object connected to the mailbox host. If this is successful, you log in using a username and password, and then invoke various methods to list the contents of the mailbox, retrieve individual messages, and possibly delete the retrieved messages.

Summarizing a POP3 Mailbox

Figure 8.1 shows a small program that will access a user's mailbox on a maildrop machine and print a brief summary of the senders and subject lines of all new messages. The username and mailhost are specified on the command line using the format username@mailbox.host. The program prompts for the password. Appendix A contains the listing for the PromptUtil.pm package.

Figure 8.1. List entries in a user's inbox

Lines 1-6: Load modules. We bring in the Net::POP3 module to contact the remote POP server, and Mail::Header to parse the retrieved mail headers. We also bring in a new home-brewed utility module, PromptUtil, which provides the get_passwd() function, along with a few other user prompting functions.

Lines 6-8: Get username, host, and password. We get the username and host from the command line, and prompt the user to enter his or her password using the get_passwd() function. The latter turns off terminal echo so that the password is not visible on the screen.

Line 9: Connect to mailbox host. We call the Net::POP3 new() method to connect to the indicated host, giving the server 30 seconds in which to respond with the welcome banner. The new() constructor returns a Net::POP3 object.

Lines 10-13: Log in and count messages. We call the POP3 object's login() method to log in with the user's name and password. If the login is successful, it returns the total number of messages in the user's mailbox; if there are no messages in the mailbox, it returns 0E0 ("zero but true"). This value is treated as true in a logical context, so it can be used to test whether the login was successful, and is equal to 0 in a numeric context, so it can be used to count the available messages.

Next we call the POP3 object's last() method to return the number of the last message the user read (0 if none read). We will use this to list the unread messages. Because the message count returned by login() can be 0E0, we add zero to it to convert it into a more familiar number. We then print the total number of old and new messages.
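The dual nature of 0E0 can be checked without contacting a server. The following is a minimal sketch that passes literal values where a real login() result would go; the helper name describe_login() is hypothetical and not part of Net::POP3.

```perl
use strict;
use warnings;

# Interpret a value of the kind login() returns. No server is contacted;
# callers pass in a stand-in value.
sub describe_login {
    my $result = shift;
    return "login failed" unless $result;   # undef is false, "0E0" is true
    my $count = $result + 0;                # numeric context turns "0E0" into 0
    return "logged in, $count message(s)";
}

print describe_login('0E0'), "\n";   # empty mailbox
print describe_login(6),     "\n";   # six messages
print describe_login(undef), "\n";   # failed login
```

The same value therefore serves both as a success flag and as a message count, provided the count is forced into numeric context before it is printed.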

Lines 14-21: Summarize messages. Each message is numbered from 1 to the total number of messages in the mailbox. For each one, we call the POP object's top() method to retrieve the message header as a reference to an array of lines, and pass this to Mail::Header->new() for parsing. We call the parsed header's get() method twice to retrieve the Subject: and From: lines, and pass the sender's address to the clean_from() utility subroutine to clean it up a bit. We then print out the message number, sender's name, and subject.

Line 22: Log out. The POP object's quit() method logs out cleanly.

Lines 23-29: Clean up with the clean_from() subroutine. This subroutine cleans up sender addresses a bit by extracting the sender's name from these three common address formats:
 "Lincoln Stein" <lstein@cshl.org>
 Lincoln Stein <stein@cshl.org>
 lstein@cshl.org (Lincoln Stein)

When we run this program, we get output like this:

 % pop_stats.pl lstein@localhost
 inbox has 6 messages (6 new)
 1 Geoff Winisky             Re: total newbie question
 2 Robin Lofving             Server updates
 3 James W Goldblum          Comments part 2
 4 Jessica Raymond           Statistics on Transaction Security
 5 James W Goldbum           feedback access from each page
 6 The Western Web           The Western Web Newsletter
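The clean_from() subroutine from the walkthrough is the easiest part of the program to try in isolation. The version below is a sketch written for illustration, not the listing from Figure 8.1; it tries one pattern per address format and falls back to returning its argument unchanged.

```perl
use strict;
use warnings;

# Extract a display name from the three common address formats.
# Returns the address unchanged if none of the patterns match.
sub clean_from {
    my $from = shift;
    return $1 if $from =~ /^"([^"]+)"\s+<\S+>/;     # "Name" <addr>
    return $1 if $from =~ /^([^<>]+?)\s+<\S+>/;     # Name <addr>
    return $1 if $from =~ /^\S+\s+\(([^)]+)\)/;     # addr (Name)
    return $from;
}

print clean_from($_), "\n" for
    '"Lincoln Stein" <lstein@cshl.org>',
    'Lincoln Stein <stein@cshl.org>',
    'lstein@cshl.org (Lincoln Stein)';
```

The quoted form is tested first because the unquoted pattern would also match it and would keep the quotation marks in the result.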

Net::POP3 API

The Net::POP3 API is simple. You can log in, log out, list messages, retrieve message headers, retrieve the entire message, and delete messages.

$pop = Net::POP3->new([$host] [,$opt1=>$val1, $opt2=>$val2 ])

The new() method constructs a new Net::POP3 object. The first, optional, argument is the name or IP address of the mailbox host. This may be followed by a series of option/value pairs. If the host is not provided, it will be retrieved from the Net::Config "POP3_hosts" value specified when the libnet module was installed. The options are listed in Table 8.1.

The ResvPort option is used with some POP3 servers that require clients to connect from reserved ports.

If unsuccessful, new() returns undef and $! is set to some error code.

$messages = $pop->login([$username [,$password]])

The login() method attempts to log into the server using the provided username and password. If one or both of the password and username are not given, then login() looks in the user's .netrc file for the authentication information for the specified host.

If successful, login() returns the total number of messages in the user's mailbox. If there are no messages, login() returns the floating-point number 0E0, which will be treated as true when used in a logical context to test whether login was successful, but evaluates to zero when treated in a numeric context to count the number of available messages. If an error occurs, login() returns undef and $pop->message() contains an error message.

If the login fails, you may try again or try to log in using apop(). Some servers close the connection after a number of unsuccessful login attempts. With the exception of quit(), none of the other methods will be accepted until the server accepts the login.

Some POP servers support the APOP command.

$messages = $pop->apop($username,$password)

APOP is similar to a standard login, but instead of sending passwords across the network in the clear, it uses a challenge/response system to authenticate the user without processing cleartext passwords. Unlike login(), apop() does not consult .netrc if the username and password are absent. The value returned from apop() is the same as that from login().

Table 8.1. `Net::POP3->new()` Options

Option	Description	Default
`Port`	Remote port to connect to	POP3(110)
`ResvPort`	Local port to bind to	ephemeral port
`Timeout`	Seconds to wait for a response	120
`Debug`	Turn on verbose debugging	`undef`

Many POP3 servers need special configuration before the APOP command will authenticate correctly. In particular, most UNIX servers need a password file distinct from the system password file.

Once login is successful, you can use a variety of methods to access the mailbox:

$last_msgnum = $pop->last

POP messages are numbered from 1 through the total number of messages in the inbox. At any time, the user may have read one or more messages using the RETR command (see below), but not deleted them from the inbox. last() returns the highest number from the set of retrieved messages, or 0 if no messages have been retrieved. New messages begin at $last_msgnum+1.

Many POP servers store the last-read information between connections; however, a few discard this information.
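As a sketch of how last() and the login() count combine, the helper below returns the numbers of the messages that have not yet been retrieved. The name unread_message_numbers() is hypothetical; $pop is assumed to be a logged-in Net::POP3 object and $total the value returned by login().

```perl
use strict;
use warnings;

# Return the message numbers that have not yet been retrieved.
sub unread_message_numbers {
    my ($pop, $total) = @_;
    my $last = $pop->last || 0;     # 0 when nothing has been read
    $total += 0;                    # convert a possible "0E0" to 0
    return ($last + 1 .. $total);   # empty list when everything is old
}
```

A caller might write `for my $n (unread_message_numbers($pop, $messages)) { ... }` to visit only new mail. On servers that discard the last-read information, every message will appear to be new.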

$arrayref = $pop->get($msgnum [,FILEHANDLE])

Following a successful login, the get() method retrieves the message indicated by its message number, using the POP3 RETR command. It can be called with a filehandle, in which case the contents of the message (both header and body) are written to the filehandle. Otherwise, the get() method returns an array reference containing the lines of the message.
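The filehandle form of get() is what makes Net::POP3 suitable for large messages. A minimal sketch, assuming $pop is a logged-in Net::POP3 object; the helper name save_message() and the choice of a plain file as the destination are illustrative.

```perl
use strict;
use warnings;

# Write one message straight to a file instead of holding it in memory.
sub save_message {
    my ($pop, $msgnum, $path) = @_;
    open my $fh, '>', $path or die "Can't write $path: $!\n";
    my $ok = $pop->get($msgnum, $fh);       # header and body go to $fh
    close $fh or die "Can't close $path: $!\n";
    return $ok;
}
```

A call such as `save_message($pop, 3, 'message3.txt')` would leave the complete message, header included, in the named file.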

$handle = $pop->getfh($msgnum)

This is similar to get(), but the return value is a tied filehandle. Reading from this handle returns the contents of the message. When the handle returns end-of-file, it should be closed and discarded.

$flag = $pop->delete($msgnum)

delete() marks the indicated message for deletion. Marked messages are not removed until the quit() method is called, and can be unmarked by calling reset().

$arrayref = $pop->top($msgnum[,$lines])

The top() method returns the header of the indicated message as a reference to an array of lines. This format is suitable for passing to the Mail::Header->new() method. If the optional $lines argument is provided, then the indicated number of lines of the message body are included.

$hashref = $pop->list

$size = $pop->list($msgnum)

The list() method returns information on the size of mailbox messages. Called without arguments, it returns a hash reference in which the keys are message numbers, and the values are the sizes of the messages, in bytes. Called with a message number, the method returns the size of the indicated message, or if an invalid message number was provided, it returns undef.
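The list() and delete() methods combine naturally. The sketch below marks every message above a size limit for deletion; the helper name mark_oversized() and the idea of a size limit are assumptions made for illustration, and $pop is assumed to be a logged-in Net::POP3 object.

```perl
use strict;
use warnings;

# Mark every message larger than $limit bytes for deletion and return
# the numbers that were marked. Nothing is removed until quit().
sub mark_oversized {
    my ($pop, $limit) = @_;
    my $sizes = $pop->list or return;       # { msgnum => size_in_bytes }
    my @marked;
    for my $msgnum (sort { $a <=> $b } keys %$sizes) {
        next unless $sizes->{$msgnum} > $limit;
        push @marked, $msgnum if $pop->delete($msgnum);
    }
    return @marked;
}
```

Because deletion is deferred, a caller can show the returned list to the user and call reset() if the user has second thoughts.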

($msg_count,$size) = $pop->popstat

popstat() returns a two-element list that consists of the number of undeleted messages in the mailbox and the size of the mailbox in bytes.

$uidl = $pop->uidl([$msgnum])

The uidl() method returns a unique identifier for the given message number. Called without an argument, it returns a hash reference in which the keys are the message numbers for the entire mailbox, and the values are their unique identifiers. This method is intended to help clients track messages across sessions, since the message numbers change as the mailbox grows and shrinks.
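A sketch of how a client might use uidl() to recognize messages it has not seen in an earlier session. The helper name unseen_messages() and the %$seen hash (unique ID to true value, saved by the client between sessions) are assumptions; how that hash is stored is left open.

```perl
use strict;
use warnings;

# Return the message numbers whose unique IDs are not in %$seen.
sub unseen_messages {
    my ($pop, $seen) = @_;
    my $uidl = $pop->uidl or return;        # { msgnum => unique_id }
    return sort { $a <=> $b }
           grep { !$seen->{ $uidl->{$_} } } keys %$uidl;
}
```

Unlike the last() approach, this works even when message numbers shift between sessions, because the comparison is made on the identifiers rather than on the numbers.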

When you call the quit() method, messages marked for deletion are removed unless you reset() first.

$pop->reset

This method resets the mailbox, unmarking the messages marked for deletion.

$pop->quit

The quit() method quits the remote server and disconnects. Any messages marked for deletion are removed from the mailbox.

Retrieving and Processing MIME Messages via POP

To show Net::POP3 in a real-world application, I developed a script called pop_fetch.pl that combines Net::POP3 and MIME::Parser. Figure 8.2 shows a session with this program. After I invoke it with the mailbox name in user@host form, the program prompts me for my login password. The program reports the number of messages in my mailbox, and then displays the date, sender, and subject line of the first, prompting me to read it or skip to the next.

Figure 8.2. A session with pop_fetch.pl

I choose to read the message, causing the program to display the message header and the text part of the body. It then reports that the message has two attachments (technically, two non-text/plain MIME parts). For each one, the program prompts me for the disposition of the attachment. For the first attachment, of type image/jpeg, I choose to view the attachment, causing my favorite image viewer (the XV application, written by John Bradley) to pop up in a new window and show the picture. After I quit the viewer, the script prompts me again for the disposition. This time I choose to save the image under its default name.

The next attachment is a Microsoft Word document. No viewer is defined for this document type, so the prompt only allows the attachment to be saved to disk.

After dealing with the last attachment, the program prompts me to keep or delete the entire message from the inbox, or to quit. I quit. The program then moves on to the next unprocessed message.

The pop_fetch.pl Script

pop_fetch.pl is broken into two parts. The main part, listed in Figure 8.3, handles the user interface. A smaller module named PopParser.pm subclasses Net::POP3 in such a way that messages retrieved from a POP3 mailbox are automatically parsed into MIME::Entity objects.

Figure 8.3. The pop_fetch.pl script

We'll look at pop_fetch.pl first.

Lines 1-6: Activate taint checking and load modules. Since we will be launching external applications (the viewers) based on information from untrusted sources, we need to be careful to check for tainted variables. The -T switch turns on taint checking. (See Chapter 10 for more information.)

We load PopParser and PromptUtil, two modules developed for this application.

Lines 7-11: Define viewers. We define constants for certain external viewers. For example, HTML files are displayed with the command lynx %s, where %s is replaced by the name of the HTML file to view. For variety, some of the viewers are implemented as pipes. For example, the player for MP3 audio files is invoked as mpg123 -, where the - symbol tells the player to take its input from standard input.

At the end of the code walkthrough, we'll discuss replacing this section of code with the standard mailcap facility.

Lines 12-13: Taint check precautions. As explained in more depth in Chapter 10, taint checking will not let us run with an untrusted path or with several other environment variables set. We set PATH to a known, trusted state, and delete four other environment variables that affect the way that commands are processed.
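The precaution takes two statements. In the sketch below the directory list is an assumption to be adjusted to wherever the viewers are installed, and the four variable names are the ones singled out in the perlsec manual page, which are assumed here to be the same four the script removes.

```perl
use strict;
use warnings;

# Put the environment into a known state before running external commands.
$ENV{PATH} = '/bin:/usr/bin:/usr/local/bin';    # assumed viewer locations
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};       # variables that alter shell behavior
```

With taint checking enabled, Perl refuses to run system() or open a pipe until PATH has been set from untainted data in this way.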

Lines 14-20: Recover username and mailbox host. We process the command-line arguments to recover the name of the user and the POP3 host.

The $entity global holds the most recent parsed MIME::Entity object. We make it global so that the script's END{} block can detect it and call its purge() method in case the user quits the program prematurely. This will delete all temporary files from disk. For similar reasons, we intercept the INT signal to exit gracefully if the user hits the interrupt key.

Lines 21-26: Log in to mailbox server. The PopParser.pm module defines a new subclass of Net::POP3 that inherits all the behavior of the base class, but returns parsed MIME::Entity objects from the get() method rather than the raw text of the message. We create a new PopParser object connected to the mailbox host. If this is successful, we call get_passwd() (imported from the PromptUtil module) to get the user's login password.

Next, we authenticate ourselves to the remote host. We don't know a priori whether the server accepts APOP authentication or the less secure cleartext authentication method, so we try them both. If the apop() method fails, then we try login(). If that also fails, we die with an error message.

If login is successful, we print the number of messages returned by the apop() or login() methods. We add 0 to the message count to convert the 0E0 result code into a more user-friendly integer.
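The try-APOP-then-login sequence can be sketched as a small helper. The name authenticate() is hypothetical and the code is not taken from Figure 8.3; $pop is assumed to be a connected Net::POP3 (or PopParser) object.

```perl
use strict;
use warnings;

# Try APOP first, then fall back to a cleartext login.
# Returns the message count as a plain number, or dies if both fail.
sub authenticate {
    my ($pop, $user, $pass) = @_;
    my $messages = $pop->apop($user, $pass)
                || $pop->login($user, $pass)
        or die "Can't log in: ", $pop->message, "\n";
    return $messages + 0;    # "0E0" becomes 0
}
```

The || operator works here because an empty mailbox is reported as the true value 0E0, so only a genuine failure of apop() triggers the fallback.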

Lines 27-38: Enter the main message-processing loop. For each message, we fetch its header by calling the PopParser object's top() method (which is inherited without modification from Net::POP3). The header text is then passed to our print_header() subroutine to display it as a one-line message summary.

We ask the user if he or she wants to read the message, and if so, we call the PopParser object's get() method, which fetches the indicated message, parses it, and returns a MIME::Entity object. This object is passed to our display_entity() subroutine in order to display it and its subparts. When display_entity() is finished, we delete the entity's temporary files by calling its purge() method.

The last step is to ask the user if he or she wants to delete the message from the remote mailbox, and if the answer is affirmative, we call the PopParser's delete() method.

Lines 39-45: print_header() subroutine. The print_header() subroutine takes an array ref containing the header lines returned by $POP->top() and turns it into a one-line summary for display. Although we could have used the Mail::Header module for this purpose, it turned out to be cleaner to parse the header into a hash ourselves using the idiom of the Net::SMTP mail client of Figure 7.2.

The output line contains the date, sender, and subject line, separated by tabs.
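One way to parse a header into a hash by hand is sketched below. It is written for illustration rather than copied from either figure, and it returns the summary instead of printing it so that it can be tested; the name header_summary() is hypothetical.

```perl
use strict;
use warnings;

# Turn the array of header lines returned by top() into a one-line,
# tab-separated summary of date, sender, and subject.
sub header_summary {
    my $lines  = shift;
    my $header = join '', @$lines;
    $header =~ s/\r?\n[ \t]+/ /g;           # unfold continuation lines
    my %field;
    while ($header =~ /^([\w-]+):[ \t]*(.*?)\r?$/mg) {
        # keep the first occurrence of each field, keyed in lowercase
        $field{lc $1} = $2 unless exists $field{lc $1};
    }
    return join "\t", map { defined $field{$_} ? $field{$_} : '' }
                      qw(date from subject);
}
```

Unfolding first matters because long Subject: lines are often continued on a second, indented line, which a line-by-line match would otherwise miss.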

Lines 46-60: display_entity() subroutine. This subroutine is responsible for displaying a MIME::Entity object. It is called recursively to process both the top-level object and each of its subparts (and sub-subparts, if any).

We begin by retrieving the message's mail header as a MIME::Head object. If the header contains a From: field, then we can conclude that it is the top-level entity. We print out the header so that the user can see the sender's name and other fields.

Next we check whether the entity is multipart, by calling its is_multipart() method. If this method returns true, then we call handle_multipart() to prompt the user for each of the parts. Otherwise, we invoke a subroutine called display_part() to display the contents of the entity.

Lines 61-78: The handle_multipart() subroutine. The handle_multipart() subroutine loops through and processes each part of a multipart MIME::Entity object. We begin by calling the entity's parts() method to fetch each of the subparts as a MIME::Entity object. We then call Perl's grep() built-in twice to sort the parts into those that we can display directly and those that are to be treated as attachments that must be displayed using an external application. Since we know how to display only plain text, we sort on the MIME type text/plain.

For each of the text/plain parts, we call the display_part() subroutine to print the message body to the screen. If there are nontext attachments, we prompt the user for permission to display them, and if so, invoke display_entity() recursively on each attachment. This recursive invocation of display_entity() allows for attachments that are themselves multipart messages, such as forwarded e-mails.
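The two-pass grep() can be sketched as follows. The helper name sort_parts() is hypothetical; each element of @parts is assumed to respond to mime_type(), as MIME::Entity objects do.

```perl
use strict;
use warnings;

# Partition the subparts of a multipart entity with two grep() passes.
# Returns two array references: displayable text parts and attachments.
sub sort_parts {
    my @parts       = @_;
    my @text        = grep { $_->mime_type eq 'text/plain' } @parts;
    my @attachments = grep { $_->mime_type ne 'text/plain' } @parts;
    return (\@text, \@attachments);
}
```

A caller would typically write `my ($text, $attachments) = sort_parts($entity->parts);` and then loop over each list in turn. A nested multipart part ends up among the attachments, which is what lets the recursive call handle forwarded messages.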

Lines 79-99: The display_part() subroutine. The display_part() subroutine is invoked to display a single-part MIME::Entity. Depending on the user's wishes, its job is to display, save, or ignore the part.

We begin by retrieving the part's header, MIME type, description, and suggested filename for saving (derived from the Content-Disposition: header, if present). We also recover the part's MIME::Body object by calling its bodyhandle() method. This object gives us access to the body's unencoded content.

If the part's MIME type is text/plain, we do not need an external viewer to display it. We simply call the body object's print() method to print the contents to standard output. Otherwise, we call get_viewer() to return the name of an external viewer that can display this MIME type. We print a summary that contains the part's MIME type, description, and suggested filename, and then prompt the user to view or save the part. Depending on the user's response, we invoke save_body() to save the part's content to disk, or display_body() to launch the external viewer to display it. This continues in a loop until the user chooses "n" to go to the next part.

If no viewer is defined for the part's MIME type, the user's only option is to save the content to disk.

Lines 100-114: The save_body() subroutine. The save_body() subroutine accepts a MIME::Body object and a default filename. It gives the user the opportunity to change the filename, opens the file, and writes the contents of the part to disk.

The most interesting feature of this subroutine is the way that we treat the default filename for the attachment. This filename is derived from the Content-Disposition: header, and as such is untrusted data. Someone who wanted to spoil our day could choose a malicious pathname, such as one that would overwrite a treasured configuration file. For this reason we forbid absolute pathnames and those that contain the ".." relative path component. We also forbid filenames that contain unusual characters such as shell metacharacters. Having satisfied these tests, we extract the filename using a pattern match, thereby untainting it. Perl will now allow us to open the file for writing. We do so and write the attachment's contents to it by calling the MIME::Body object's print() method.
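The filename checks can be isolated into a helper. This is a sketch under assumptions: the name safe_filename() is hypothetical, and the set of permitted characters (word characters, dot, slash, and hyphen) is one reasonable choice rather than the exact set used in Figure 8.3.

```perl
use strict;
use warnings;

# Validate an untrusted filename. Returns an untainted copy if it is
# acceptable, or nothing if it should be refused.
sub safe_filename {
    my $name = shift;
    return unless defined $name && length $name;
    return if $name =~ m{^/};                   # absolute path
    return if $name =~ m{(?:^|/)\.\.(?:/|$)};   # ".." path component
    return unless $name =~ m{^([\w./-]+)$};     # no shell metacharacters
    return $1;                                  # captured text is untainted
}
```

Returning the captured $1 rather than the original string is the important step: under -T, only data extracted through a pattern match is considered clean enough to pass to open().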

Lines 116-128: The display_body() subroutine. The display_body() subroutine is called to launch an external viewer to display an attachment. It is passed a MIME::Body object, and a command to launch an external viewer to display it.

To make this application a bit more interesting, we allow for two types of viewers: those that read the body data from a file on disk and those that read from standard input. The former are distinguished from the latter by containing the symbol %s, which will be replaced by the filename before execution (this is a standard convention in the UNIX mailcap file).

We begin by calling the MIME::Body object's path() method to obtain the path to the temporary file in which the object's data is stored. We then use this in a pattern substitution to replace any occurrence of %s in the viewer command. If the substitution is successful, it returns a true value, and we call system() to invoke the command.

Otherwise, we assume that the viewer will read the data from standard input. In this case, we use open() to open a pipe to the viewer command, and invoke the body object's print() method to print to the pipe filehandle. Before doing this, however, we set the PIPE handler to IGNORE to avoid the program terminating unexpectedly because of a recalcitrant viewer.
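The two delivery paths might look like the following sketch. It is an illustration rather than the code of Figure 8.3: the return values 'file' and 'pipe' are an addition made so the behavior can be observed, and $body is assumed to provide path() and print() as MIME::Body objects do.

```perl
use strict;
use warnings;

# Launch a viewer on a body object. Returns 'file' or 'pipe' to show how
# the data was delivered, or nothing if the pipe could not be opened.
sub display_body {
    my ($body, $viewer) = @_;
    my $path = $body->path;

    # File-based viewer: %s in the command is replaced by the temp file.
    # The path is interpolated unquoted, so it must not contain spaces
    # or shell metacharacters.
    if (defined $path && $viewer =~ s/%s/$path/g) {
        system $viewer;
        return 'file';
    }

    # Pipe-based viewer: feed the content to the command's standard input.
    local $SIG{PIPE} = 'IGNORE';    # a viewer that exits early must not kill us
    open my $pipe, '|-', $viewer or return;
    $body->print($pipe);
    close $pipe;
    return 'pipe';
}
```

Using local on $SIG{PIPE} confines the change to this call, so the rest of the program keeps whatever PIPE handling it had before.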

This subroutine works correctly both for line-oriented applications, such as the Lynx HTML viewer, and for windowing applications, such as XV.

Lines 129-137: The get_viewer() subroutine. get_viewer() is an extremely simple subroutine that uses a pattern match to examine the MIME type of the attachment and selects a hard-coded viewer for it.
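A hard-coded viewer table of this kind can be sketched in a few lines. The lynx and mpg123 commands come from the description above; the xv command line, the constant names, and the exact MIME patterns are assumptions made for illustration.

```perl
use strict;
use warnings;

# Viewer commands. "%s" marks viewers that read a file; commands without
# it read the data from standard input.
use constant HTML_VIEWER  => 'lynx %s';
use constant IMAGE_VIEWER => 'xv %s';       # assumed command line for XV
use constant MP3_PLAYER   => 'mpg123 -';

# Map a MIME type to a viewer command, or return nothing if none is known.
sub get_viewer {
    my $type = shift;
    return HTML_VIEWER  if $type =~ m{^text/html}i;
    return IMAGE_VIEWER if $type =~ m{^image/}i;
    return MP3_PLAYER   if $type =~ m{^audio/(?:mpeg|mp3)}i;   # subtype names assumed
    return;
}
```

An empty return for unknown types is what lets the caller offer only the save option, as happens with the Microsoft Word attachment in the sample session.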

Lines 138-140: END{} block. This script's END{} block takes care of calling any leftover MIME::Entity's purge() method. This deletes temporary files that might be left around if the user interrupted the script's execution unexpectedly.

The PopParser Module

The other main component of the pop_fetch.pl script is the PopParser module, which subclasses Net::POP3 in a way that enables it to parse MIME messages at the same time that it is fetching them. Figure 8.4 shows the code for PopParser.pm.

Figure 8.4. The PopParser module

Lines 1-6: Load modules. We turn on strict checking and load the Net::POP3 and MIME::Parser modules. We use the global @ISA array to tell Perl that PopParser is a subclass of Net::POP3.

Lines 7-15: Override the new() method. We override the Net::POP3 new() method in order to create and initialize a MIME::Parser for later use. We first invoke our parent's new() method to create the basic object and connect to the remote host, create and configure a MIME::Parser object, and store the parser for later use by invoking our parser() accessor method.

Lines 16-21: The parser() method. This method is an accessor for the MIME::Parser object created during the call to new(). If we are called with a parser object on our subroutine stack, we store it among our instance variables. Otherwise, we return the current parser object to the caller.

The way we stash the parser object among our instance variables looks weird, but it is the conventional way to store instance variables in filehandle objects:
 ${*$self}{'pp_parser'} = shift 
What this is doing is referencing a hash in the symbol table that happens to have the same name as our filehandle. We then index into that as if it were a conventionally created hash. We need to store our instance variables this way because Net::POP3 ultimately descends from IO::Handle, which creates and manipulates blessed filehandles, rather than more conventional blessed hash references.
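The idiom is easier to see on a toy class. The sketch below uses a hypothetical TaggedHandle class derived from IO::Handle; like Net::POP3 objects, its instances are blessed glob references, so instance data has to live in the glob's hash slot.

```perl
use strict;
use warnings;

# A toy class whose objects are blessed filehandles, not blessed hashes.
package TaggedHandle;
use parent 'IO::Handle';

# Accessor that keeps its data in the hash slot of the object's own glob.
sub tag {
    my $self = shift;
    ${*$self}{'th_tag'} = shift if @_;
    return ${*$self}{'th_tag'};
}

package main;

my $handle = TaggedHandle->new;     # a blessed glob reference
$handle->tag('inbox');
print $handle->tag, "\n";
```

Prefixing the key with a short class-specific string (th_ here, pp_ in PopParser) is a precaution against colliding with keys that a parent class stores in the same hash.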

Lines 22-30: Override the get() method. The last part of this module overrides the Net::POP3 get() method. We are called with the number of the message to retrieve, which we pass to getfh() to obtain a tied filehandle from which to read the desired message. The returned filehandle is immediately passed to our stored MIME::Parser object to parse the message and return a MIME::Entity object.

The nice thing about the design of the PopParser module is that message retrieval and message parsing occur in tandem, rather than downloading the entire message and parsing it in two steps. This saves considerable time for long messages.
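Putting the pieces of the walkthrough together, a module of this shape might look like the sketch below. It is a reconstruction from the description, not the listing of Figure 8.4: the temporary directory is an assumption, and MIME::Parser (from the MIME-tools distribution) is loaded on demand here rather than at the top of the file.

```perl
package PopParser;
use strict;
use warnings;
use Net::POP3;
our @ISA = ('Net::POP3');

# Connect as Net::POP3 does, then attach a MIME::Parser to the object.
sub new {
    my $class = shift;
    my $self  = $class->SUPER::new(@_) or return;
    require MIME::Parser;
    my $parser = MIME::Parser->new;
    $parser->output_dir($ENV{TMPDIR} || '/tmp');    # assumed location
    $self->parser($parser);
    return $self;
}

# Accessor for the parser, stored in the glob's hash slot.
sub parser {
    my $self = shift;
    ${*$self}{'pp_parser'} = shift if @_;
    return ${*$self}{'pp_parser'};
}

# Fetch a message and parse it as it arrives; returns a MIME::Entity.
sub get {
    my ($self, $msgnum) = @_;
    my $fh = $self->getfh($msgnum) or return;
    return $self->parser->parse($fh);
}

1;
```

Because only new() and get() are overridden, methods such as top(), delete(), and quit() behave exactly as they do in Net::POP3.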

There are a number of useful enhancements one could make to pop_fetch.pl. The one with the greatest impact would be to expand the range and flexibility of the viewers for nontext attachments. The best way to do this would be to provide support for the system /etc/mailcap and per-user .mailcap files, which on UNIX systems map MIME types to external viewers. This would allow the user to install and customize viewers without editing the code. Support for the mailcap system can be found in the Mail::Cap module, which is part of Graham Barr's MailTools package. To use Mail::Cap in the pop_fetch.pl script, replace lines 7 through 11 of Figure 8.3 with these lines:

 use Mail::Cap;
 my $mc = Mail::Cap->new;

This brings in the Mail::Cap module and creates a new Mail::Cap object that we can use to fetch information from the mailcap configuration files.

Replace line 90, which invokes the get_viewer() subroutine, with the equivalent call from Mail::Cap:

 my $viewer = $mc->viewCmd($type);

This takes a MIME type and returns the command to invoke to view it if one is defined.

The last modification is to replace line 97, which invokes the display_body() subroutine to invoke the viewer on the body of an attachment, with the Mail::Cap equivalent:

 $mc->view($type,$body->path);

This call looks up the appropriate view command for the specified MIME type, does any needed string substitutions, and invokes the command using system().

We no longer need the get_viewer() and display_body() subroutines, because Mail::Cap takes care of their functionality. You can delete them.

Other potential enhancements to this script include:

the ability to reply to messages
the ability to list old and new messages and jump directly to messages of interest
a full windowing display using the text-mode Curses module or the graphical Perl/Tk package, both available from CPAN

With a little work, you could turn this script into a full-featured e-mail client!