4.6 POP Tuning Specifics

At this point, we'll cover some specifics for providing high-performance POP service. The POP protocol is defined in RFC 1939 [MR96]. Even though it is possible to configure most POP clients to leave copies of all messages on the email server, a POP server is intended to act as a temporary repository for email that will be periodically downloaded to another machine. While not all users will operate along this model, in most environments a majority of them will.

Even if we restrict our field of interest to Open Source email solutions, the email administrator can choose from many packages. These POP daemons differ in their feature sets, such as whether they support APOP, POP over Transport Layer Security (TLS) [NEW99] (formerly called SSL), and other bells and whistles. Although each package has its own code base, every POP daemon that implements a given feature must perform the same tasks, and there are only so many ways to implement them. Therefore, performance differences between POP daemons due to coding choices tend to be relatively minor, and only the rare POP server ends up being CPU bound. Differences become more apparent in decisions about the layout of the message store and how the daemon interacts with it.

4.6.1 The 7th Edition Message Store

All programs in standard UNIX distributions that interact with a user's mailbox assume that email will be stored in the 7th Edition mailbox format. This format uses a single file per user, with messages concatenated one after another within that file. Some programs and systems use slightly different mechanisms for determining where one message ends and the next begins, but generally the LDA marks the beginning of a new message by writing a line in the mailbox file that begins with "From ". So that a line beginning with "From " in the body of a message is not mistaken for the start of a new message, the LDA escapes it, typically by prefixing it with ">". You can see this by sending yourself email on a UNIX system with a line in the body that begins with "From ". Here is what happens when I send myself such a message:

    From: Nick Christenson <npc@acm.org>
    Message-Id: <200109192258.f8JMwBxD016945@gangofone.com>
    Subject: Test
    To: Nick Christenson <npc@gangofone.com>
    Date: Wed, 19 Sep 2001 15:58:10 -0700 (PDT)

    >From the mountains to the prairies...

    Nick Christenson
    npc@acm.org

When the LDA writes a message to a 7th Edition mailbox, it first writes the "From " line, then the message header and body, and finally a trailing blank line, just to be sure.
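The delivery steps just described can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not any particular LDA's implementation: the function names are hypothetical, the envelope line uses time.asctime() in UTC, and locking (discussed next) is deliberately omitted. It escapes only lines that begin with "From ", as in the example above; some mailbox variants also escape lines that already begin with ">From ".

```python
import time


def mbox_from_line(sender, when):
    """Build the separator line, e.g. 'From npc@acm.org Wed Sep 19 22:58:10 2001'."""
    return "From %s %s\n" % (sender, time.asctime(time.gmtime(when)))


def escape_from_lines(message):
    """Prefix '>' to any line beginning with 'From ' so it is not read as a separator."""
    lines = message.split("\n")
    return "\n".join(">" + line if line.startswith("From ") else line for line in lines)


def append_to_mbox(path, sender, message, when=None):
    """Append one message: 'From ' line, escaped header and body, trailing blank line."""
    if when is None:
        when = time.time()
    text = escape_from_lines(message)
    if not text.endswith("\n"):
        text += "\n"
    # Locking is omitted here; a real LDA must lock the mailbox before writing.
    with open(path, "a", encoding="utf-8", newline="\n") as mbox:
        mbox.write(mbox_from_line(sender, when))
        mbox.write(text)
        mbox.write("\n")  # the trailing blank line
```

Appending the body from the example above would store the line as ">From the mountains to the prairies...", exactly as it appears in the received message.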

The LDA, the POP daemon, and other programs may all wish to modify a mailbox at the same time. Such simultaneous changes must be prevented by some form of locking, and every program that modifies a mailbox must agree to use the same locking mechanism. When selecting email programs, it is vital to understand how they lock mailboxes and messages and to ensure that everything will cooperate so that mailbox corruption cannot occur. This point cannot be stated too strongly.

Two of the most common locking mechanisms are the flock() system call and the creation of a lock file, such as mailboxname.lock. Although either works, flock() does not rely on synchronous filesystem operations and so is more performance friendly. A few UNIX flavors still do not support flock(). On these versions, lockf() or, rarely, fcntl() can be substituted, but flock() is preferred.
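As an illustration of using both mechanisms together, the following hypothetical context manager takes a lock file with an exclusive create (O_CREAT | O_EXCL fails if the file already exists) and then holds an flock() lock on the mailbox itself. It is a sketch that assumes every other program touching the mailbox honors both locks; the retry count and delay are arbitrary, and stale lock files are not cleaned up. On a system without flock(), fcntl.lockf() could be substituted, but again, all programs must agree on the mechanism.

```python
import fcntl
import os
import time
from contextlib import contextmanager


@contextmanager
def locked_mailbox(path, retries=10, delay=1.0):
    """Hold both a lock file (path + '.lock') and an flock() lock on the mailbox."""
    lock_path = path + ".lock"
    for _ in range(retries):
        try:
            # O_EXCL makes creation fail if another process already holds the lock file.
            fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
            break
        except FileExistsError:
            time.sleep(delay)
    else:
        raise TimeoutError("mailbox is locked: " + lock_path)
    os.close(fd)
    try:
        with open(path, "a+b") as mbox:
            # Advisory exclusive lock; blocks until no other flock() holder remains.
            fcntl.flock(mbox.fileno(), fcntl.LOCK_EX)
            try:
                yield mbox
            finally:
                fcntl.flock(mbox.fileno(), fcntl.LOCK_UN)
    finally:
        os.unlink(lock_path)
```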

Let's walk through the steps of what happens on the email server during a typical POP session operating on a 7th Edition mailbox:

The client Mail User Agent (MUA) connects to the pop3 port of the email server (port 110).
The master POP daemon, or inetd as appropriate, receives this connection and spawns a process to handle the session.
The client authenticates its user to the POP daemon. The POP daemon verifies the authentication credentials and authorizes access to a single mailbox. As an example, suppose that the user's name is npc and the mailbox is the file /var/mail/npc.
The POP daemon locks the mailbox, either by creating a /var/mail/npc.lock file or by calling flock(), or both, just to be extra careful.
The POP daemon makes a temporary copy of the mailbox with which to work. Assume this file is named /var/mail/.npc.pop. While creating the temporary mailbox, the POP daemon has the opportunity to scan the mailbox, assembling information about how many distinct messages are present, what their sizes are, who sent each message, and so on.
Once the temporary mailbox is created, the POP daemon unlocks the mailbox.
The POP daemon signifies to the POP client that it is ready to handle requests regarding the mailbox.
The POP client will likely request a list of the messages. It may also request unique identifiers or header information about each message.
If the user intends to leave the messages on the server, then each message will be downloaded individually on demand as it is read, and the POP connection will stay open for the whole reading session. More likely, the client will simply download every message.
In the typical case, all downloaded email will be marked for deletion on the server. In general, however, all, some, or none of the messages in the POP mailbox may be marked for deletion.
The client disconnects.
The POP daemon needs to reconcile the POP session mailbox with the system mailbox, as new email may have arrived or the mailbox may have otherwise been modified. The POP daemon locks the mailbox for the duration of this process.
If no new email has arrived and all messages were deleted in the POP session, then reconciliation is easy: the main mailbox is truncated, the .pop file is unlinked, and the lock is released. Otherwise, the POP daemon constructs a new temporary file containing the modified mailbox. Messages themselves may be modified. For example, the POP daemon may add a Status: header to messages to mark them as old or read. In the reconstructed mailbox, some messages may have been deleted, and some new messages may have arrived. Once the temporary mailbox is constructed, it is renamed to the mailbox name, the .pop file is deleted, and the mailbox is unlocked.
The POP daemon handling the session exits normally.
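The reconciliation at the end of the session (the last few steps above) can be sketched as follows. This is a simplified illustration, not qpopper's or any other daemon's code. It assumes the caller already holds the mailbox lock (a lock file is the natural choice here, since renaming the new mailbox into place does not affect it), that the LDA only appends to the mailbox so anything beyond the mailbox's size at session start is new mail, and that message indexes start at 0. The function names and the .tmp suffix are hypothetical, and adding Status: headers and preserving file permissions are omitted.

```python
import os


def split_mbox(data):
    """Return (offset, length) for each message in a 7th Edition mailbox image."""
    starts, pos = [], 0
    for line in data.split(b"\n"):
        if line.startswith(b"From "):
            starts.append(pos)
        pos += len(line) + 1  # +1 for the newline removed by split()
    ends = starts[1:] + [len(data)]
    return [(s, e - s) for s, e in zip(starts, ends)]


def reconcile(mbox_path, pop_path, session_size, deleted):
    """Merge the session's results back into the system mailbox (caller holds the lock)."""
    with open(pop_path, "rb") as f:
        snapshot = f.read()
    messages = split_mbox(snapshot)
    with open(mbox_path, "rb") as f:
        f.seek(session_size)
        new_mail = f.read()  # assumes the LDA only appends
    if not new_mail and set(range(len(messages))) <= set(deleted):
        # Easy case: everything was deleted and nothing arrived, so just truncate.
        with open(mbox_path, "r+b") as f:
            f.truncate(0)
            os.fsync(f.fileno())
    else:
        tmp_path = mbox_path + ".tmp"  # hypothetical temporary name
        with open(tmp_path, "wb") as out:
            for i, (off, length) in enumerate(messages):
                if i not in deleted:
                    out.write(snapshot[off:off + length])
            out.write(new_mail)
            out.flush()
            os.fsync(out.fileno())
        os.rename(tmp_path, mbox_path)
    os.unlink(pop_path)
```

The truncation branch is the cheap case; the rewrite branch reads and writes every surviving message, which is where most of the I/O in a POP session on a 7th Edition mailbox goes.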

The POP protocol makes no provision for two POP sessions to operate on the same mailbox at the same time. If a POP daemon is active for a given user and someone attempts to open a second POP connection as that user, the second daemon will notice the .pop file and refuse the second session. This behavior is required to ensure that the mailbox does not become corrupted. The .pop file acts as a lease on the mailbox for potentially competing POP daemons. It may be left behind if a POP daemon exits abnormally without having a chance to clean up. Therefore, any .pop file older than a certain age threshold can safely be assumed to be stale and deleted. While a POP daemon remains active, it must periodically refresh this file by updating its mtime timestamp using the utime() call.

Historically, some POP daemons have had bugs whereby a busy daemon might not finish its current operations before the lease on the .pop file expired, leading to mailbox corruption. In most POP daemons, this problem has been fixed by having an alarm handler wake up periodically to refresh the lease regardless of what the daemon is working on. However, a process cannot respond to an alarm signal while it is blocked in an uninterruptible kernel operation, such as waiting for disk I/O. On an unsaturated email server, a process should never go longer than the lease timeout on the .pop file, typically a few minutes, without being able to refresh its lease. Unfortunately, on filesystems that are overwhelmed by the load, this may happen. One must strike a balance between how long a user should wait before being able to reestablish a POP session after a daemon crashes and the risk of mailbox corruption when the server runs extremely slowly, especially because these two events (processes dying abnormally and horribly slow file access) are often correlated. This problem has no simple resolution, but any email server on which this situation might arise under anything short of the most extreme circumstances needs to be tuned or upgraded to avoid this possibility.
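The lease handling described in the last two paragraphs might look like the following. The timeout value and the function names are assumptions for illustration. os.utime(path, None) sets the file's times to the current time, the equivalent of calling utime() with a NULL times argument. The refresher uses a periodic SIGALRM timer; as noted above, the handler cannot run while the process is blocked in the kernel, and Python additionally runs signal handlers only in the main thread.

```python
import os
import signal
import time

LEASE_TIMEOUT = 5 * 60  # seconds; an assumed value standing in for "a few minutes"


def lease_is_stale(pop_path, timeout=LEASE_TIMEOUT, now=None):
    """True if a .pop file exists and has not been refreshed within `timeout` seconds."""
    if now is None:
        now = time.time()
    try:
        mtime = os.stat(pop_path).st_mtime
    except FileNotFoundError:
        return False  # no lease at all, so nothing to break
    return now - mtime > timeout


def refresh_lease(pop_path):
    """Bump the .pop file's mtime (and atime) to the current time."""
    os.utime(pop_path, None)


def start_lease_refresher(pop_path, interval):
    """Refresh the lease every `interval` seconds from a SIGALRM handler."""
    def on_alarm(signum, frame):
        refresh_lease(pop_path)
    signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, interval, interval)
```

The refresh interval should be comfortably shorter than LEASE_TIMEOUT, so that a daemon that is merely slow, rather than dead, keeps its lease.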

If we can be absolutely certain that the only processes that might edit the mailbox will respect the lock represented by the .pop file and that the LDA will only append messages to the mailbox, then we can reduce the I/O that the POP daemon performs. We do so by creating a .pop file but not putting data in it unless the POP daemon needs to edit the mailbox. If all of the messages in a mailbox are typically downloaded and no new email arrives during the session, then most of the time the mailbox can be truncated without having to be rewritten, which can yield large I/O savings. In qpopper (the most commonly used Open Source POP daemon [QPO]), this mode of operation is called SERVER MODE. This technique provides considerable performance advantages, but it is completely incompatible with standard UNIX email readers, such as mailx, elm, or other programs that can perform arbitrary edits of a mailbox and don't check for .pop files.

4.6.2 The maildir Message Store

While the 7th Edition mailbox is the default format on UNIX systems, it is not the only option. After the single file containing all messages, the next most common format represents the mailbox as a directory containing one file per message. Several variations on this theme exist, but the most frequently cited is the qmail format (named after the MTA that popularized it), which the qmail documentation calls the maildir format.

The maildir message store works as follows: Each user has a directory under his or her home directory for receiving email, typically named Maildir. In this book, we want to focus on centralized email servers, so the directory that contains email messages will more likely reside in a centralized message store, such as /var/qmail/npc. This directory contains three subdirectories: tmp, new, and cur. Delivery of a message occurs via the following procedure:

When a new message comes in, a file is created in the tmp subdirectory with the name time.pid.host, where time is the traditional UNIX time returned by the time() library call (the number of seconds since the beginning of 1970), pid is the process's PID, and host is the host name. If this file cannot be created uniquely, the delivery agent sleeps for a while and tries again. For a file name collision to occur, two distinct delivery agents with the same PID would have to try to deliver a message to the same user within the span of one second. As long as the operating system assigns PIDs sequentially, that would require the PID space to wrap around within a single second, and it is difficult to imagine a single email server capable of this sort of performance outside of a contrived example.
Once the file exists in the tmp directory, the message contents are written to this file. An extra "From " is not written at the beginning of the file, nor is an extra blank line added at the end. Lines in the message beginning with "From " do not need to be specially escaped. Once all the data are written out, the program uses fsync() to commit the file to disk.
The file is moved using rename() to the new subdirectory. The file's name is not modified.
The message delivery is completed, and the MTA is informed of a successful delivery.
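A minimal Python sketch of these delivery steps follows. It is not qmail's code: the function name, retry count, and sleep interval are assumptions, and it relies on O_CREAT | O_EXCL to guarantee that the tmp file name is unique. Reporting success to the MTA is left to the caller.

```python
import os
import socket
import time


def maildir_deliver(maildir, message, max_tries=5, retry_delay=2.0):
    """Deliver `message` (bytes) into `maildir`; returns the path under new/."""
    host = socket.gethostname()
    for _ in range(max_tries):
        # Step 1: a unique name of the form time.pid.host in tmp/.
        name = "%d.%d.%s" % (int(time.time()), os.getpid(), host)
        tmp_path = os.path.join(maildir, "tmp", name)
        try:
            fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
            break
        except FileExistsError:
            time.sleep(retry_delay)  # name in use: wait and try again
    else:
        raise RuntimeError("could not create a unique file in tmp/")
    try:
        # Step 2: write the message as-is (no "From " line, no escaping), then fsync().
        view = memoryview(message)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)
    except OSError:
        os.unlink(tmp_path)  # don't leave a partial delivery behind
        raise
    finally:
        os.close(fd)
    # Step 3: rename() into new/ without changing the name.
    new_path = os.path.join(maildir, "new", name)
    os.rename(tmp_path, new_path)
    # Step 4: the caller can now report a successful delivery to the MTA.
    return new_path
```

Because rename() within one filesystem is atomic, a reader scanning new/ sees either no file or a complete, already fsync()ed message, never a partial one.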

As with all commonly used MTAs, there's a great deal to understand about how qmail operates, far too much to describe completely here. Further information can be found at the qmail Web site [QMA] or in books on the subject [BLU00] [SIL01]. However, a few additional notes bear mentioning. Programs such as the POP daemon can use the cur directory as a place to store old messages. A POP daemon that recognizes maildir-formatted mailboxes may end its session by unlinking all messages to be deleted and then using rename() to move all saved messages from the new directory to the cur directory.
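The end-of-session cleanup just described might look like this for a hypothetical POP daemon that tracks message file names. The sketch assumes a deleted message may be in either new or cur, and it keeps each file's name unchanged when moving it to cur. (The maildir conventions also allow an info suffix, such as ":2,", to be appended to names in cur; that is not done here.)

```python
import os


def finish_pop_session(maildir, deleted, kept):
    """Unlink deleted messages, then move kept messages from new/ to cur/."""
    for name in deleted:
        for sub in ("new", "cur"):
            try:
                os.unlink(os.path.join(maildir, sub, name))
                break
            except FileNotFoundError:
                continue  # not in this subdirectory; try the other one
    for name in kept:
        try:
            os.rename(os.path.join(maildir, "new", name),
                      os.path.join(maildir, "cur", name))
        except FileNotFoundError:
            pass  # already in cur/ from an earlier session
```

Each deletion is a separate unlink() and each saved message a separate rename(), which is the metadata cost discussed in the comparison of message store formats.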

Because maildir consumes many more inodes than the 7th Edition format (one for each message file and directory), it is even more important that a filesystem holding a maildir email spool support large numbers of small files. A cautious rule of thumb is to divide the total unformatted capacity of the disk by the expected average message size to get the number of inodes that the filesystem should support. Multiplying this number by 1.5 might not be a bad idea, just to be safe, as the directories themselves will consume additional inodes. The number of inodes available to a filesystem is specified at filesystem creation time as a parameter to the newfs command (or the equivalent command on other systems). Some advanced filesystems can add more inodes after creation, but this feature isn't common. In general, this issue should be carefully planned before the first email message is delivered, as the problem might be difficult to fix later. Running out of inodes can be tricky to diagnose for someone who isn't aware of the issue, because many applications will report that the filesystem is out of space when, in reality, it has plenty of space but no free inodes. Use df -i to inspect inode utilization on a machine's filesystems.
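The rule of thumb is simple arithmetic. For a hypothetical 36 GB disk and an assumed 4,000-byte average message, 36,000,000,000 / 4,000 = 9,000,000 messages, and multiplying by 1.5 gives 13,500,000 inodes. The sketch below encodes that arithmetic and also reads the same inode counts that df -i reports, using os.statvfs(); both function names are hypothetical.

```python
import os


def inodes_needed(capacity_bytes, avg_message_bytes, safety=1.5):
    """Rule of thumb: one inode per average-sized message, times a safety factor."""
    return int(capacity_bytes / avg_message_bytes * safety)


def inode_usage(path):
    """Return (used, total) inodes for the filesystem containing `path`, as df -i does."""
    st = os.statvfs(path)
    return st.f_files - st.f_ffree, st.f_files
```

For the numbers above, inodes_needed(36 * 10**9, 4000) returns 13500000, which can then be compared against the inode count a planned newfs invocation would produce.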

Even though the maildir format arose as part of the qmail email system, nothing about it is incompatible with a sendmail-based email system. If mailboxes are stored in maildir format, whether the MTA is qmail or sendmail, the POP daemon and the LDA will need to understand this format. Two options are to use the qmail-pop3d that comes with the qmail distribution or to patch qpopper to support the maildir format. At least one patch to the venerable but obsolete qpopper version 2.53 is floating around the Internet, as is a patch for CUCIpop. Of course, because sendmail itself never touches the message store, it does not need to understand the message store format; only the LDA (defined by Mlocal in the sendmail.cf file) does. It's possible to use procmail [PRO], which understands the maildir format, as the LDA, or it's straightforward to modify the mail.local LDA that comes with the sendmail distribution.

4.6.3 The Cyrus Message Store

The Cyrus message store also stores one message per file. By default, all messages in the message store are owned by the user cyrus and the group mail. This setup precludes their use by local mail reading programs, such as mailx or elm, without substantial modification of the software and its authentication mechanisms.

The root of the Cyrus message store is set in a master configuration file called /etc/imapd.conf. For example, /var/spool/imap might be a common location for the message store. Personal mailboxes are directories named user/username under the message store root. This mailbox is visible to an IMAP client as the special "INBOX" mailbox. Other mailbox names are appended to this naming convention. Thus, if a personal IMAP mail folder were called book-reviews, it would appear on the filesystem as user/npc/book-reviews under the /var/spool/imap directory, or whatever top-level directory is defined in /etc/imapd.conf.

Each message is stored in the mailbox directory in a file named with a number followed by an ASCII period ("."): 1., 2., 3., and so on. Unlike most other email systems, Cyrus stores its messages in CRLF (wire) format. Several other files in each mailbox directory are used by the Cyrus IMAP software for various purposes:

cyrus.cache This file contains header information for every message in the mailbox. It is a performance optimization for IMAP clients, which frequently request headers from a given mailbox. To satisfy such a request, the Cyrus IMAP daemon can open and read from just one file rather than reading every message file in the mailbox.

cyrus.header This file contains information about the mailbox in which it is stored, including the IMAP Access Control List (ACL), the user-defined IMAP flags for the mailbox, and other information.

cyrus.index This file contains information about the mailbox as a whole as well as about individual messages. The last message number assigned in the mailbox and the date and time when the last message was added are among the global information recorded in this file. For each message, information such as the contents of the Date: header and the message's size is stored here as well.

cyrus.seen This file contains information about which messages have already been read and which messages have arrived since the mailbox was last read.
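The naming conventions described above (user/username directories under the spool root, and numbered message files ending in a period) can be sketched as follows. This is an illustrative simplification: the root directory would really come from /etc/imapd.conf (parsing it is not shown), any additional directory layout options a given Cyrus installation uses are ignored, and the function names are hypothetical. The last function shows the LF-to-CRLF conversion implied by Cyrus storing messages in wire format.

```python
import os
import re

MESSAGE_FILE = re.compile(r"[0-9]+\.")  # matches "1.", "2.", "10.", ...


def cyrus_mailbox_dir(spool_root, user, folder=None):
    """INBOX is <root>/user/<user>; a personal folder is a subdirectory of it."""
    parts = [spool_root, "user", user]
    if folder:
        parts.append(folder)
    return os.path.join(*parts)


def message_numbers(mailbox_dir):
    """Message numbers in ascending order, skipping cyrus.* bookkeeping files."""
    return sorted(int(name[:-1]) for name in os.listdir(mailbox_dir)
                  if MESSAGE_FILE.fullmatch(name))


def to_wire_format(message):
    """Convert LF (or mixed) line endings in `message` (bytes) to CRLF."""
    return message.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
```

With the example above, cyrus_mailbox_dir("/var/spool/imap", "npc", "book-reviews") yields /var/spool/imap/user/npc/book-reviews. In practice, cyradm rather than direct file manipulation is the right tool for managing these directories.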

The cyrus.cache and cyrus.index files need to be updated upon each message delivery. Other information repositories may need to be modified on delivery as well, including a database containing quota information. Cyrus supports a number of optional complex features (single-instance message storage, duplicate message suppression, SIEVE mail filtering support, and so on). A person is not expected to keep this entire system working by making file changes by hand. Rather, a powerful administration client, called cyradm, is available to make this job much easier.

Overall, this message store format is much more complex than either the 7th Edition or maildir format. Consequently, deliveries and simple retrievals are much more CPU- and I/O-intensive than they would be with the more basic formats. Of course, the Cyrus system is tuned for the demands that IMAP clients place on it. No other email package uses this message store format. If a site needs IMAP, then implementing Cyrus and its message store is a fine choice to meet those needs. If IMAP support isn't necessary, then the extra overhead associated with Cyrus is wasted.

There is a lot more to Cyrus than has been covered here. For more information, read the overview.html document that comes bundled with current Cyrus source code distributions in the doc directory. Another excellent source of information is the book Managing IMAP [MM00].

4.6.4 Comparing Message Store Formats

When evaluating message store formats, we'll disregard the Cyrus message store. Either a site will need Cyrus for IMAP support or it won't, and evaluating this criterion should simplify the decision-making process.

Comparing the other two message stores discussed here, the maildir message store offers some compelling advantages. The exclusive creation of the message file in the tmp directory is as close to explicit locking as this protocol gets, yet it remains completely safe on most filesystems. Therefore, lock files are unnecessary and the .pop file timeout problem cannot occur. Further, I/O is reduced because no second copy of the mailbox needs to be made at the beginning of a session. Also, when the POP session starts, the mailbox does not need to be scanned to determine where each message begins and ends and how large it is. Instead, this information is available trivially from each message file.
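Because each message is its own file, the sizes a POP daemon needs for a LIST response come from one stat() per file, with no scan of the mailbox contents. A sketch, assuming messages are numbered in the order they are found in new and then cur; the function names are hypothetical.

```python
import os


def maildir_listing(maildir):
    """Return [(number, name, size)] for messages in new/ then cur/, one stat() each."""
    found = []
    for sub in ("new", "cur"):
        directory = os.path.join(maildir, sub)
        for name in sorted(os.listdir(directory)):
            if name.startswith("."):
                continue  # skip hidden files
            found.append((name, os.stat(os.path.join(directory, name)).st_size))
    return [(i + 1, name, size) for i, (name, size) in enumerate(found)]


def pop_list_response(listing):
    """Format a multi-line POP3 LIST response ("msgno size" lines ending with ".")."""
    lines = ["+OK %d messages" % len(listing)]
    lines += ["%d %d" % (number, size) for number, _, size in listing]
    lines.append(".")
    return "\r\n".join(lines) + "\r\n"
```

Note that st_size counts bytes as stored on disk, which may differ from the size of the CRLF-terminated message the client eventually receives.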

On the other hand, maildir has its disadvantages. If the LDA must enforce mailbox size quotas, a 7th Edition LDA can determine a mailbox's quota status with a single, fairly inexpensive stat() of the file. With maildir, each file in at least two directories needs to be stat()ed, one must ensure that a message isn't double-counted (it might be moved while the quota is being checked), and the results must be added together. These operations are not terribly expensive by themselves, but they add up when a mailbox contains a large number of messages. Also, while the same number of bytes is typically read or written in either case, a POP session operating on a maildir mailbox requires quite a few more (usually synchronous) metadata operations. Each message read requires an open(), and each deleted message requires a separate, usually synchronous unlink(). A filesystem's predictive read-ahead, which doesn't help much for 7th Edition mailbox systems, is even less effective on servers using maildir.
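The contrast in quota checking can be made concrete. In the sketch below (hypothetical function names), the 7th Edition version is one stat(), while the maildir version stat()s every file in new and cur, keys each message by the part of its name before any ":" info suffix so that a message seen in both directories is counted once, and tolerates files that disappear between listdir() and stat() because they were moved or deleted. Scanning new before cur is deliberate, since messages move from new to cur.

```python
import os


def mbox_usage(mbox_path):
    """7th Edition: quota usage is one stat() of the mailbox file."""
    return os.stat(mbox_path).st_size


def maildir_usage(maildir):
    """maildir: stat() every message in new/ and cur/, counting each message once."""
    sizes = {}
    for sub in ("new", "cur"):  # new/ first, because messages move from new/ to cur/
        directory = os.path.join(maildir, sub)
        for name in os.listdir(directory):
            key = name.split(":", 1)[0]  # ignore any ":2,..." info suffix
            if key in sizes:
                continue  # already counted under new/ before it was moved
            try:
                sizes[key] = os.stat(os.path.join(directory, name)).st_size
            except FileNotFoundError:
                continue  # moved or deleted after listdir(); found in cur/ or gone
    return sum(sizes.values())
```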

Moreover, it is not safe to use maildir on a filesystem using Soft Updates without first patching the software. Using Soft Updates is often desirable to reduce the cost of metadata operations, but the combination is unsafe because maildir relies on the rename of the file from the tmp directory to the new directory to indicate a completed delivery. If a server running Soft Updates crashes after the rename occurs, the rename may be rolled back because it wasn't performed synchronously; in that case, the file may sit in tmp forever, as every new process will ignore files in tmp that it did not create. The result is a lost message. Software using maildir could fix this flaw by performing another fsync() on the open file descriptor after the rename occurs, as sendmail has done for queue operations since version 8.10. Indeed, an unofficial patch to qmail 1.03 does just that. For the same reasons, the maildir message store format is not safe to use on filesystems that don't perform metadata updates synchronously, such as ext2fs, ext3fs, and ReiserFS. As with sendmail's operations in the message queue, patches that fsync() the directory after the rename will solve the problem for these filesystems on Linux. Note that this fix is mandatory, not optional. Because these systems perform so many metadata operations, and because running unpatched code on a filesystem using Soft Updates isn't an option, it's usually a good idea to use either NVRAM or a journaling filesystem with a synchronous journal for high-performance maildir systems. Fortunately, either is usually easy to obtain.
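The post-rename fsync() calls described above might be added to a delivery agent roughly as follows. This is a sketch, not the qmail patch or sendmail's code: it fsync()s the still-open message file descriptor after the rename (the fix described for Soft Updates) and then fsync()s the destination directory (the fix described for Linux filesystems). Opening a directory read-only and calling fsync() on it works on Linux and the BSDs, but portability to other systems is an assumption worth checking.

```python
import os


def fsync_directory(path):
    """Flush a directory's entries to disk by fsync()ing a read-only descriptor."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def durable_rename(fd, src, dst):
    """rename() src to dst, then make the rename durable.

    fd is the still-open descriptor of the message file being delivered.
    """
    os.rename(src, dst)
    os.fsync(fd)  # fsync() the open file again after the rename (Soft Updates fix)
    fsync_directory(os.path.dirname(dst) or ".")  # commit new/'s entry (Linux fix)
```

Only after durable_rename() returns should the delivery agent report success to the MTA.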

It's hard to beat the performance of 7th Edition mailboxes if complicated mailbox edits are rare, lock files need not be created, and the system runs in SERVER MODE. In this case, an entire session can occur with only the fsync() after the mailbox truncation needing to be synchronous. If sessions typically delete some but not all messages, however, or if the system cannot be run in SERVER MODE, maildir may be much faster. Also, anywhere except on a Soft Updates system, maildir's locking is generally safer, which is no small comfort. It is entirely possible to run high-performance email systems using either message store format. In the end, the email administrator must decide which format to use based on expected I/O patterns, the kinds of filesystems that are available, and the software used to access the message store.
