4.1 What Happens During Email Reception | sendmail Performance Tuning

Just as we walked through an SMTP relay session in the previous chapter, let's examine what happens when a message is received by an email server and stored in its message store. For the purposes of this example, let's assume that we run sendmail version 8.9 as our MTA in background delivery mode on a typical operating system, we use /usr/libexec/mail.local as our LDA and Qualcomm's qpopper as our POP daemon, and the message store uses the familiar 7th Edition mailbox format [SHO94] to store email. We'll discuss the particulars of various mailbox formats later in this chapter, but for now note that 7th Edition mailbox format is just a fancy name for the default format in which email messages are stored on almost every UNIX or UNIX-like system. Much of the session remains the same:

The originator machine opens an SMTP connection with the server.
The server accepts the connection and spawns a new instance of the sendmail daemon to handle this connection.
The new sendmail process inspects the SMTP envelope to determine whether the message should be accepted by policy.
If the message is acceptable to the gateway, a qf file, which will eventually contain the message's header along with delivery information, is created in the queue, typically the /var/spool/mqueue directory. Next, the remote machine receives permission to send the message itself, and then the gateway creates the df file, which will hold the message's body, also in the queue.
The originator sends the message's header, which the server stores in memory temporarily, followed by the message body. As the body is received by the server, it is buffered and written out to the df file.
Once the originator indicates that the message has been completely sent, the gateway adds its own "Received:" line to the header, modifies the message headers if necessary, and writes out the contents of the qf file. It then calls fsync() on the still-open qf and df file descriptors, closes the files, and acknowledges receipt of the message to the originator.
Another sendmail process is spawned that examines the qf envelope and message headers to determine what to do with the message. It creates an xf file to store any error messages regarding this delivery attempt.
The originator closes the SMTP connection if no other messages will be transferred in this session.
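The durability step in the list above (write both queue files, fsync() them, and only then acknowledge the message) can be sketched in Python. This is a simplified illustration, not sendmail's implementation: the function name queue_message and the queue ID "ABC123" are hypothetical, and a real qf file has a structured format rather than holding the raw header.

```python
import os
import tempfile


def queue_message(queue_dir, queue_id, header, body):
    """Write qf (header) and df (body) files and force both to disk.

    Returning from this function corresponds to the point at which the
    server may acknowledge receipt to the originator.
    """
    qf_path = os.path.join(queue_dir, "qf" + queue_id)
    df_path = os.path.join(queue_dir, "df" + queue_id)
    with open(qf_path, "wb") as qf, open(df_path, "wb") as df:
        df.write(body)    # body is written as it arrives
        qf.write(header)  # header is written once the message is complete
        for f in (qf, df):
            f.flush()               # push Python's buffer to the OS
            os.fsync(f.fileno())    # ask the OS to commit the data to disk
    return qf_path, df_path


if __name__ == "__main__":
    # Hypothetical usage with a throwaway directory standing in for the queue.
    with tempfile.TemporaryDirectory() as spool:
        print(queue_message(spool, "ABC123", b"Subject: test\n", b"Hello.\n"))
```

The two fsync() calls are the expensive part: each one forces the process to wait until the data has reached stable storage.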

Until this point, the session has been identical to that for relaying. Here the tasks performed by the email server diverge.
If the local machine is identified as the final destination for the message, the sendmail process spawns the LDA that will complete the delivery. The LDA is passed the list of local user names for which the message is to be delivered.
The LDA opens a temporary file in which it will store the message. A line beginning with "From " that contains the sender's email address and the current date and time is written to the file. The message is written out to this file, and a blank line is appended to the end of it. The message is passed from the MTA through the LDA to the temporary file.
The LDA opens and then locks the first mailbox on its list of recipients by using flock() (or lockf(), if the system doesn't support flock()) and creating a temporary lock file in the mail spool. For example, if the email user's login name is npc, then the file created would be /var/mail/npc.lock. Acquiring the advisory lock and being able to create the lock file indicate that the LDA can safely proceed with message delivery.
The contents of the temporary file are appended to the first mailbox. The mailbox file is then closed, which also clears the advisory lock, and the lock file is deleted.
The LDA goes on to the next recipient and repeats the last two steps. If this recipient is the last recipient, then the temporary file is unlinked and successful delivery is indicated to the MTA as the LDA exits.
After the LDA reports a successful delivery, the MTA can unlink the qf and df files in the queue.
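The locking and appending steps performed by the LDA can be sketched as follows. This is a minimal Python illustration under stated assumptions, not mail.local's code: the function name deliver is hypothetical, the temporary file is skipped, only one recipient is handled, and the fsync() on the mailbox is assumed from the later remark that two of the three writes use fsync().

```python
import fcntl
import os
import tempfile
import time


def deliver(mailbox_path, sender, message):
    """Append one message to a 7th Edition style mailbox under both locks."""
    lock_path = mailbox_path + ".lock"
    with open(mailbox_path, "ab") as mbox:
        # First safeguard: advisory lock on the open mailbox.
        fcntl.flock(mbox, fcntl.LOCK_EX)
        # Second safeguard: lock file; O_EXCL fails if it already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        os.close(fd)
        try:
            from_line = "From %s %s\n" % (sender, time.asctime())
            mbox.write(from_line.encode("ascii"))
            mbox.write(message)
            mbox.write(b"\n")  # blank line that ends the message
            mbox.flush()
            os.fsync(mbox.fileno())
        finally:
            os.unlink(lock_path)
    # Leaving the "with" block closes the file, which clears the flock.


if __name__ == "__main__":
    # Hypothetical usage: a scratch directory stands in for the mail spool.
    with tempfile.TemporaryDirectory() as spool:
        box = os.path.join(spool, "npc")
        deliver(box, "sender@example.com", b"Subject: test\n\nHello.\n")
        print(open(box, "rb").read().decode("ascii"))
```

A production LDA would also retry when the lock file already exists and would drop privileges appropriately; both are omitted here.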

Some of the details of this transaction are worthy of special mention. First, sendmail writes the message to the queue regardless of whether the destination is local or remote. If a system is running sendmail 8.12 in interactive mode with SuperSafe also set to interactive, then the SMTP session may be held open while the message is passed to the LDA, bypassing writes in the queue, just as in relaying. Second, the LDA writes the message out to a temporary location before appending it to mailboxes. Thus a message will get written three times, including twice with fsync(), to effect a single delivery. From a pure efficiency standpoint, one would hope that the message would have to be written out only once. Third, the LDA uses two mechanisms to ensure that the mailbox is locked, including creating lock files in the spool during message delivery. Many filesystems will perform these additional metadata operations synchronously. Both locking mechanisms act as safeguards against email reading programs that might use only one or the other. For example, if the LDA can be assured that every program which accesses mailboxes respects the flock() advisory locks, then the creation of this lock file would be unnecessary.

In the email relaying procedure list presented in Chapter 3, it was documented that in sendmail versions 8.10 and 8.12 some items in the procedure list changed somewhat. These changes, mostly in the order and nature of queue operations, also apply to email reception. Because the nature of those changes was thoroughly discussed in Chapter 3, they won't be repeated here. Instead, this section focuses on the events that happen beginning with the spawning of the LDA.

4.1.1 The Local Delivery Agent

On the question of efficiency, at the very least it would make sense to modify mail.local so that a message bound for a single recipient would not create the temporary file. This effort requires modification of the LDA source code. It would also be possible to force sendmail to never attempt local delivery to multiple recipients at one time by removing the F=m flag from the local mailer definition in the configuration file. While this step would normally not be advisable, in conjunction with the other modification, it would mean that the temporary file would never need to be written. It's possible to alter sendmail's behavior in this way by adding the following line (note that no "m" appears in the list of flags) to the sendmail.mc file:

 define(`LOCAL_MAILER_FLAGS', `Prn9')

A better approach is to use the MODIFY_MAILER_FLAGS option, which became available in sendmail version 8.10, to remove this flag:

 MODIFY_MAILER_FLAGS(`LOCAL', `-m')

Not having the temporary file written to disk will reduce the total number of writes performed by the email server. For multiple recipients, however, the file must be read from the queue multiple times. After the first copy occurs, the message likely will reside in the operating system's buffer cache, making successive read operations inexpensive; on most systems, the cost of several in-memory data copies (which consume only CPU resources) will be much lower than the cost of a single disk write. Without eliminating the need to write out the temporary file for a single recipient delivery, this approach would require writing the temporary file out for each recipient, which would increase the amount of I/O performed in the filesystem where the temporary files are stored. If the temporary file is stored in a memory-based filesystem and the files are small, then writing out the temporary file or files probably wouldn't be terribly expensive.

If sendmail does not spawn the LDA with multiple recipients, then one separate LDA process must be created using the fork() and exec() system calls for each recipient of the message. While this process will consume CPU resources, UNIX-style operating systems have become quite efficient at spawning new processes (sendmail does it quite often). Again, these actions consume only CPU resources, which are typically abundant on email servers. Besides the extra processes, each time the LDA is invoked, sendmail performs "canonification." Within the context of sendmail, canonification means making sure addresses and headers are complete and correct. In this specific instance, it means confirming that all host names are proper; that is, all exist in DNS, all are fully qualified, and all have a "." or perhaps other character in them if the canonification rules have been redefined or modified in the configuration file. On most systems, this step will require DNS lookups to the name server. If the name server resides on the local host, the cost consists of CPU time, two context switches, and one network round trip across the loopback interface. If the name daemon isn't located on the same server, this approach can be more expensive.

Overall, we might want to know the following: Would it be cheaper for the system to write out a temporary file but send the message from the MTA to the LDA only once, or would it be better to recanonify the headers but save I/O involved in the creation of the temporary file? Unfortunately, no simple answer to this question exists. It depends entirely on the cost of each operation performed in each case on any particular email server. If DNS lookups to the name server are local (and they should be), then almost any amount of CPU overhead that avoids a write to a real disk is a price worth paying as long as the server isn't already CPU bound. If the temporary file write never moves a disk head (because it is written to a memory-based filesystem and the messages are smaller than the amount of RAM available to this filesystem), then writing out temporary copies of the message costs very little.

The default location for the temporary message file created by the LDA is typically /tmp, although the system's mkstemp() library call governs the precise location. On many systems, /tmp is a memory-based filesystem, which means that writes to this filesystem are merely written into the virtual memory system. Thus writing files here will result in disk I/O only if main memory fills and the OS pages the extra data out to swap. Using a memory-based filesystem to store files is a safe procedure as long as all files written to it are truly temporary, as in mail.local's case. The files have no value if the invocation of the LDA that created the file stops running, and consequently no data in these files must be salvaged if the machine crashes. Not writing these files to disk can produce a significant performance savings. On those servers that deal with very large messages, /tmp may be called upon to store a lot of data, filling available memory. If the temporary files will be written, this effort would result in disk operations under any circumstances, because as the system runs out of main memory it will be forced to write the extra data to swap space. Nevertheless, on most machines that support it, using a memory-based filesystem for storing the LDA's temporary files is a good way to mitigate the cost of the extra message write, at least most of the time.
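To see where such a temporary file would land on a particular system, a short Python sketch can help. It uses Python's tempfile.mkstemp(), which is related to but not the same as the C mkstemp() call used by mail.local; the function name write_temp_copy and the "local." prefix are hypothetical.

```python
import os
import tempfile


def write_temp_copy(message, directory=None):
    """Write a message to a uniquely named temporary file and return its path.

    With directory=None, Python chooses the default temporary directory,
    which honors TMPDIR and usually falls back to /tmp on UNIX-like systems.
    """
    fd, path = tempfile.mkstemp(prefix="local.", dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(message)
    return path


if __name__ == "__main__":
    print("default temporary directory:", tempfile.gettempdir())
    path = write_temp_copy(b"Subject: test\n\nHello.\n")
    print("temporary copy written to:", path)
    os.unlink(path)  # the file is truly temporary; remove it when done
```

Whether that directory is memory-based can then be checked with the system's usual mount-listing tools.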

If a memory-based filesystem isn't an option, then turning the LDA's temporary file storage area into a separate filesystem mounted asynchronously may be a good idea. While storing real message data in an asynchronous filesystem presents an unacceptable risk, it isn't necessary for these files to survive a system crash, so the performance gain obtained by making the filesystem asynchronous is worthwhile. In such a case, the temporary file storage area should be a separate filesystem. If mkstemp() creates files in /tmp, and /tmp is just another directory under the root filesystem, then making the / filesystem asynchronous to speed up writing the LDA's temporary message files is not appropriate.

4.1.2 Multiple-Recipient Performance Example

Let's examine this issue in some detail by using the CPU-bound email server that was introduced in Chapter 1. In this test, the target server runs sendmail 8.12.2 in background (default) delivery mode. It uses a very generic configuration file, accomplishing final delivery with the mail.local LDA that came with that sendmail distribution. The /tmp directory, where the LDA will write temporary files, is mounted as a tmpfs filesystem. We bombard 50 test accounts on that server with email messages that are about 1KB in size, each of which is bound to five randomly selected recipients. In the first test, we leave the "m" flag defined as a flag in the local mailer definition. With this setup, we see successful injection of 84 messages per minute (which translates to 420 messages delivered to mailboxes per minute). In the second test, we modify the configuration to remove the "m" flag from the local mailer definition. Under the same circumstances, we see only 50 messages injected per minute (250 messages delivered to mailboxes per minute), or about 60% of the throughput with F=m.
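The arithmetic behind these figures is easy to check. The short Python calculation below uses only the numbers quoted in the test description; the variable names are ours.

```python
RECIPIENTS_PER_MESSAGE = 5

injected_with_m = 84      # messages/minute with F=m set
injected_without_m = 50   # messages/minute with F=m removed

# Each injected message produces one mailbox delivery per recipient.
delivered_with_m = injected_with_m * RECIPIENTS_PER_MESSAGE
delivered_without_m = injected_without_m * RECIPIENTS_PER_MESSAGE

# Throughput without F=m relative to throughput with it.
relative_throughput = injected_without_m / injected_with_m

if __name__ == "__main__":
    print(delivered_with_m, delivered_without_m)
    print("relative throughput: %.1f%%" % (100 * relative_throughput))
```

The ratio works out to roughly 59.5%, which the text rounds to 60%.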

This result shouldn't be surprising, as we already know that the server is CPU bound, not I/O bound. We expect some reduction in throughput due to the extra data copies, canonification, ruleset parsing, and fork()ing if the LDA won't deliver to multiple recipients. Further, because we use tmpfs to store the temporary files, these writes don't cause disk movement. This example is especially contrived, as the LDA wasn't modified to not create temporary files, even in the single delivery case.

Most real-world numbers would be less dramatic. For instance, most email servers probably won't see an average of five recipients per message. At many sites, email is primarily a one-to-one communications medium. However, organizations that use email primarily as a one-to-many or many-to-many communications medium may experience results similar to our test case. The conclusion is that delivering to multiple recipients results in a significant CPU savings over delivering the message to each recipient individually.

4.1.3 LMTP

Even on reasonably sized email servers with well-tuned I/O systems, a system can become CPU bound. In these cases, reducing the number of fork()s performed by the system is certainly a good thing. With an appropriate LDA, this goal might be accomplished by making the LDA become a persistent process, perhaps by multithreading it, and passing messages between the MTA and an LDA via Local Mail Transfer Protocol (LMTP) rather than through the use of fork() and a one-way interprocess communication (IPC) channel, as is traditionally done. Unfortunately, no currently available Open Source solutions behave this way. Even so, using LMTP to communicate between sendmail and the LDA offers some benefits.

LMTP is defined by RFC 2033 [MYE96]. In essence, LMTP is a subset of the SMTP protocol designed to allow mail transport over very short and reliable networks, such as between two processes on the same host or, at the extreme, over a very reliable LAN. The RFC specifically states that this protocol should not be used over a WAN, and I wouldn't expect to ever endorse the use of LMTP between two hosts separated by a router.

The mail.local LDA that comes with the Open Source sendmail package includes support for LMTP. To enable it in the MTA, add FEATURE(`local_lmtp') to the .mc file from which the sendmail.cf is generated. In terms of operational differences, some DSN information will change, and the z and X flags are added to the F= equate in the configuration file. These flags indicate, respectively, that LMTP is supported and that the "hidden dot algorithm," as defined in Section 4.5.2 of RFC 2821, is used to encode lines beginning with a ".". Why might it be beneficial to use LMTP for communication between the MTA and LDA? For common sendmail installations, LMTP permits the elimination of one of the most pervasive and misunderstood email system errors without removing the F=m flag from the sendmail.cf file.

Suppose that sendmail invokes mail.local with three local recipients for a message, and that the first two deliveries of this message succeed but the third one fails due to some temporary problem. The LDA has no way of communicating the details of the failure back to the MTA; it must either signal success or failure for the entire delivery. Taking its delivery obligations seriously, the LDA reports a failure code. As it is also required to err on the side of caution, the MTA must assume that delivery failed for all of the recipients. The next time it tries to deliver the message, it is sent to everyone again. As a consequence, two of the recipients end up with duplicate messages. This duplication takes up I/O capacity, consumes disk space, and annoys the users. By using LMTP, the MTA can learn about the success or failure of delivery to each individual recipient, and it can then attempt redelivery to only those users for whom the initial attempt actually failed.
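The difference between the two reporting styles can be illustrated with a small Python model. It is only a sketch of the retry logic described above, not of sendmail or of the LMTP wire protocol; the function names and the recipients alice, bob, and carol are hypothetical.

```python
def retry_with_exit_status(recipients, succeeded):
    """One status for the whole delivery: any failure means retry everyone."""
    if all(succeeded[r] for r in recipients):
        return []
    return list(recipients)


def retry_with_lmtp(recipients, succeeded):
    """One status per recipient: retry only those whose delivery failed."""
    return [r for r in recipients if not succeeded[r]]


def duplicates(retry_list, succeeded):
    """Recipients who already have the message but would receive it again."""
    return [r for r in retry_list if succeeded[r]]


if __name__ == "__main__":
    recipients = ["alice", "bob", "carol"]
    succeeded = {"alice": True, "bob": True, "carol": False}

    old = retry_with_exit_status(recipients, succeeded)
    new = retry_with_lmtp(recipients, succeeded)
    print("exit status retry list:", old, "duplicates:", duplicates(old, succeeded))
    print("LMTP retry list:", new, "duplicates:", duplicates(new, succeeded))
```

With a single exit status, the two recipients who already hold the message are queued for redelivery along with the one who does not; with per-recipient status, only the failed recipient is retried.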

Because LMTP specifies a protocol in addition to the raw data transfer, we might expect to pay a mild CPU cost as a consequence of enabling LMTP support. In fact, if we return to our CPU-bound email server, we will see that this is, indeed, the case. In both tests, we use sendmail version 8.12.2 with the version of mail.local that ships with that distribution. We configure sendmail to deliver messages using background mode and test it by repeatedly sending one approximately 1KB message to one of 50 randomly selected recipients. In the first test, we use regular IPC delivery. We measure a throughput of about 242 messages/minute. After adding the following line to the .mc file

 FEATURE(`local_lmtp')

and resetting the server, we can deliver about 236 messages/minute, a decrease in throughput of less than 3%. This is a small price to pay to avoid extraneous deliveries.

RFC 2821 specifies that an email message traveling between two hosts should have each line in the message terminated with both a carriage return character and a line feed character. In the notation of the C programming language, this would be denoted by both \r and \n. In the RFCs, it is often referred to as CRLF (usually pronounced "CUR-liff") or "wire" format, as the message has this format when it is sent out over a network wire. UNIX-like operating systems, however, tend to terminate lines in files with just a line feed. Thus, every time a message is received from the Internet, it must be converted from CRLF to UNIX format when it is written to disk; every time it is sent back to another host, each \n is changed to \r\n.
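The two translations can be sketched in a few lines of Python. This is an illustration of the conversion itself, not of how sendmail performs it; the function names to_unix and to_wire are hypothetical.

```python
def to_unix(wire_data: bytes) -> bytes:
    """Convert CRLF ("wire") line endings to bare line feeds for storage."""
    return wire_data.replace(b"\r\n", b"\n")


def to_wire(unix_data: bytes) -> bytes:
    """Convert bare line feeds to CRLF for transmission.

    Any CRLF pairs already present are normalized first so that they are
    not turned into \r\r\n.
    """
    return unix_data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


if __name__ == "__main__":
    received = b"Subject: test\r\n\r\nHello.\r\n"
    stored = to_unix(received)
    print(repr(stored))
    print(repr(to_wire(stored)))
```

Each conversion touches every byte of the message once, which is why the cost is real but small compared with the disk writes discussed earlier.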

If the only processes that access mailboxes on a given email server are the LDA and POP daemon, for example, it might seem tempting to store messages on the server in CRLF format to eliminate the need for these translations. Unfortunately, sendmail makes these translations when it writes the messages into the queue, so if the messages will be stored in the mail spool in CRLF format, the LDA must make the changes when the message is written to the spool, which largely defeats the purpose. The source code modifications required to change sendmail's behavior would definitely be nontrivial. Unless future versions of sendmail support storage of messages in this format, it would probably be wise to not venture down this road, especially given that it would likely result in very small performance gains.