6.3 Other General Strategies

Other general architectural strategies can be undertaken to improve a site's email performance. While these strategies stray a bit from the "making single servers perform better" guideline set down in Chapter 1, some are compelling enough to warrant mention here.

6.3.1 FallbackMX Host

A FallbackMX host is a server set up by an organization to act as a temporary repository for queued email that is being sent from the organization to the rest of the Internet. Assuming the local domain is example.net, one can configure a FallbackMX host in sendmail by adding the following line to the m4 configuration (.mc) file:

 define(`confFALLBACK_MX', `fallback.example.net')

where fallback.example.net is the name of the host used as the FallbackMX host. When an email server looks up the host and domain portion of an email address for delivery, it generates a list of machines to try by querying DNS for MX, A, and CNAME records for that destination (and AAAA records if IPv6 support is enabled). If FallbackMXhost is defined, this host name (or set of host names, if it is an MX record pointing to more than one actual server) is appended to the list. Thus, if the message cannot get through to its intended destination, it is funneled to the fallback machine for later delivery.

It may be easier to understand the function of a FallbackMX host by examining Figures 6.1, 6.2, and 6.3. Under normal circumstances, to move email from our site's outbound email server, example.net, to its destination, example.com, our server makes a direct connection to the destination machine (Figure 6.1). If our server cannot connect to example.com or any of the machines pointed to by example.com's MX records, it will send email bound for that domain to its FallbackMX host, fallback.example.net (Figure 6.2). This host will attempt to send the email on to its true destination; in this case, it cannot do so because of a network outage. Therefore, the messages are queued on this machine until delivery can take place. Once connectivity to example.com is restored, email can be sent there directly by example.net, and fallback.example.net can send the email bound for that destination that resides in its queues (Figure 6.3).

Figure 6.1. Normal email routing.

Figure 6.2. Email diverted to FallbackMX host.

Figure 6.3. Connectivity reestablished.

Using a FallbackMX host keeps the queues on the main server clear. In fact, as long as the fallback machine is accessible and handling SMTP connections, the only messages in the main server's queue should be those currently being processed for the first time. The FallbackMX host, on the other hand, may develop relatively deep queues, but sending email from this machine need not happen especially quickly, as every message in its queues has already failed at least one delivery attempt to its intended destination.

At a site with a FallbackMX host, the main servers responsible for sending email to the Internet need much smaller queues, which makes NVRAM and small solid-state disks more practical, allowing better throughput for the same amount of money. Further, because the queues stay small, these servers may not need a filesystem that performs efficient lookups in large directories, although fast metadata operations remain just as important as ever. The FallbackMX host, in turn, can store its queue on a small or moderately capable RAID system that doesn't need to be blazingly fast. Multiple queue directories are appropriate here, as is a filesystem that handles large directories gracefully. Almost any site that handles enough email volume to justify multiple servers will benefit from creating a FallbackMX host.
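As a rough sketch, the FallbackMX host's .mc file could spread its queue across several directories using sendmail's wildcard queue directory syntax. The path and directory names below are illustrative assumptions, not a recommendation.

 dnl Use every directory matching qdir* (e.g., qdir1, qdir2, ...) as a
 dnl queue directory on the FallbackMX host; the path is hypothetical.
 define(`QUEUE_DIR', `/var/spool/mqueue/qdir*')

The matching directories must already exist with appropriate ownership; sendmail then distributes queued messages among them, which keeps any single directory from growing unmanageably large.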

6.3.2 Spillover Host

At some sites, a single email server (or a small number of them) may become overloaded more than occasionally. In some cases, upgrading the service is not an immediate option. If so, one might want to try the FallbackMX host concept in reverse; that is, set up a host that can handle deep queues to act as an email storage reservoir for the organization when the primary hosts are unavailable. Stated another way, a Spillover host is a machine set up by an organization to temporarily queue email sent from the Internet to that organization when its main servers cannot accept the email that the Internet wants to send them.

To set up this host, configure DNS so that the main server has the highest precedence (lowest number) MX record for a given set of domains. The server may also have some resource limits set (such as MaxDaemonChildren) so that it doesn't become overloaded with incoming SMTP connections. At the same time, a second server would be configured in DNS with a lower-precedence (higher-number) MX record for those same domains. If the higher-precedence servers are not available, the Spillover host would accept email bound for those domains, but it would simply store that email in its queues. In other words, it wouldn't attempt to deliver email to local mailboxes. Once a higher-precedence MX server became available to receive email, it would forward the messages to that server.
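The following sketch shows roughly what this might look like; the host names, the MX preference values, and the MaxDaemonChildren limit are hypothetical assumptions chosen only to illustrate the shape of the configuration.

 ; DNS: the primary server has the highest-precedence (lowest-number) MX record
 example.net.    IN MX  10 mail.example.net.       ; primary, tried first
 example.net.    IN MX  50 spillover.example.net.  ; Spillover host

 dnl .mc fragment on the primary (mail.example.net): cap incoming connections
 define(`confMAX_DAEMON_CHILDREN', `40')

 dnl .mc fragment on spillover.example.net: accept and queue mail for the
 dnl domain without treating it as local (example.net must not appear in
 dnl this machine's local-host-names), so messages are relayed onward to
 dnl the primary rather than delivered to local mailboxes.
 RELAY_DOMAIN(`example.net')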

It may be easier to understand how the Spillover host functions in practice by referring to Figures 6.4, 6.5, and 6.6. In this case, the highest-precedence MX record for the domain example.net points to the machine example.net. A lower-precedence MX record points to spillover.example.net, which is configured to relay email bound for example.net but not to deliver it locally. Under normal circumstances, other email servers on the Internet would send email directly to example.net (Figure 6.4). If this server cannot accept incoming email, email for this domain would instead be sent to spillover.example.net, where it would accumulate in that machine's queues (Figure 6.5). Once connectivity is restored, new email from the outside world and the messages that have accumulated in the Spillover host's queues would be delivered to example.net (Figure 6.6).

Figure 6.4. Normal email routing.

Figure 6.5. Email diverted to Spillover host.

Figure 6.6. Connectivity reestablished.

Even between two sites with good Internet connectivity and high-speed links, the effective bandwidth between them often remains surprisingly limited, and latency can be quite high: on the best of days it might be measured in the tens of milliseconds, and it is often much worse. Therefore, if Server A sends a certain amount of email to Server B, the behavior of the network between them has performance implications. Generally speaking, the amount of disk I/O will be the same regardless of the quality of the network connection; in both cases the data must be read and written to disk, and it doesn't much matter how fast the data arrive. However, the slower the network between the two servers, the longer every sendmail process takes to conclude its task, and each running process consumes system resources. A given server will handle the same total number of messages and megabytes of email each day regardless of the quality of its connection to the rest of the Internet. Consequently, the lower the latency between a server and its peers, the fewer simultaneous processes need to run on that server, and the less memory (and other system resources) will be consumed at any one time. This can make a significant difference in overall system performance.

An email server receiving email from its Spillover host over a low-latency connection can therefore shed some of the load it would bear if it received all of its email over high-latency connections to the Internet at large. A Spillover host also provides the beneficial side effect of moving email off the Internet and onto friendly hosts more swiftly, and not tying up email server capacity and disk space elsewhere on the Internet unnecessarily makes one a good neighbor. A Spillover host typically doesn't provide nearly the performance boost that a FallbackMX host does, but in certain circumstances it can provide a significant benefit.

In many cases, the Spillover host will reside "near" the primary email server that it guards. Nevertheless, one can argue in favor of housing a Spillover host at an off-site location, especially if one's network connection isn't reliable or has limited capacity. Having a dedicated host that is up and able to receive email even when an organization's primary network connection becomes unavailable is good for the Internet, but it also helps keep the primary server from becoming overloaded when connectivity returns. Instead of the entire Internet waiting to deluge the primary server with the email volume that built up during the outage, that email now resides on the off-site Spillover host, which cannot send it to the primary server any faster than a single host can send email. This should allow the primary server to absorb the backlog at a tractable rate. At the same time, the transfer should be quite efficient, as the backlog can be sent over a small number of SMTP connections, which further reduces the effort required to absorb those messages. This situation stands in stark contrast to a FallbackMX host, which should always reside "near" the servers that send email to it, preferably on the same network.

A Spillover host will be used more frequently when the primary server charged with receiving email for a particular organization is overloaded or down. As a consequence, it might not be a bad idea to raise the value for Timeout.queuereturn on the Spillover host. If the primary server stays off the air for an extended period of time, it would usually be beneficial to have as much email as possible eventually get through to its intended recipient.
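For example, the return timeout could be raised in the Spillover host's .mc file. The seven-day value below is purely an illustrative assumption; the appropriate value depends on how long an outage the site wants to ride out.

 dnl Keep undeliverable messages queued longer before bouncing them back
 dnl to their senders (sets Timeout.queuereturn); the value is hypothetical.
 define(`confTO_QUEUERETURN', `7d')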

6.3.3 Smart Host

One last type of additional dedicated server is the "smart host." It is configured by adding the following line to the sendmail .mc file:

 define(`SMART_HOST', `smarthost.example.org')

Originally, the smart host concept was created to allow a set of hosts within an organization that might not be able to route Internet-wide email (for example, because they lack access to Internet DNS information) to send all of their outbound email to a centralized server that can. The same strategy is also useful for reducing the load on a busy email server. With a smart host, outbound email leaves the sending host more quickly, going to a server that is always available and "nearby" in the networking sense. This reduces both the number of concurrent processes running on the sending machine and the number of entries in its queue at any one time, and both factors reduce the load on the server.

6.3.4 Server Parallelism

When it comes to performing computing tasks, it's usually more cost-effective to run a task on two slower computers than on one computer that is twice as fast as either of the two. The trick is that the task must be parallelizable across the two computers, which is not always the case; the fact that not all tasks can be parallelized is primarily what allows vendors to sell very large computer systems. While the economies of parallelism often apply to CPU-bound problems, the cost-effectiveness of parallelizing I/O-bound applications is almost always considerable, and Internet applications such as email relaying are extremely parallelizable, although parallelizing a single-image message store is not nearly as easy.

Therefore, if an email relay is running at capacity, instead of implementing expensive upgrades, it's often a better strategy to buy a second server and split the load. In most cases, we point a domain's MX record at a host's single A record to indicate where email should go for that domain. Splitting the load is trivially accomplished by replacing the MX record pointing to the single host with two MX records that point to the two servers over which the load will be shared. The following example from a BIND zone file for the fictional example.com domain illustrates two equal-precedence MX records for the same zone split between two hosts:

 $TTL 3600000
 example.com.          IN SOA ns1.example.com. postmaster.example.com. (
                          2002012001 ; Serial number
                          10800      ; Refresh
                          3600       ; Retry every hour
                          604800     ; Expire after 1 week
                          86400      ; Minimum TTL of 1 day
                          )
                       IN NS  ns1.example.com.
                       IN NS  ns2.example.com.
 example.com.          IN MX  10 server1.example.com.
 example.com.          IN MX  10 server2.example.com.
 server1.example.com.  IN A   10.5.4.11
 server2.example.com.  IN A   10.5.4.12

According to RFC 2821, when presented with two equal-precedence MX records for the same domain, a sending server should select randomly between them to determine which host to contact. These days, almost all SMTP software behaves this way; at the very least, a large enough percentage of it does so to effectively split the load between multiple servers.

One can also use layer 4 switches to provide a similar level of parallelism. These devices have their relative advantages and disadvantages compared to using multiple MX records, but they often become a good idea once the number of MX records for a single domain exceeds a very modest number. They also have an advantage when a server needs to be taken out of service: DNS records for that server may be cached around the Internet, which can make its removal less transparent, whereas a switch can simply stop directing traffic to it. Further discussion of the issues surrounding these devices is beyond the scope of this book; see Tony Bourke's book [BOU01] for more information.

If a host name is listed in the sendmail.cf file, it will first be checked for MX records. If those records point to multiple hosts, a given sendmail process will contact one of them selected at random. If the host name is enclosed in square brackets, such as [server1.example.org], MX lookups are not performed; instead, the host name or IP address within the brackets is taken literally.
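As a brief illustration, a smart host could be specified either way; aside from the bracketed form, this simply reuses the host name from the text above, and the two lines are alternatives rather than something to use together.

 define(`SMART_HOST', `server1.example.org')    dnl MX records looked up first
 define(`SMART_HOST', `[server1.example.org]')  dnl no MX lookup; connect to this host directly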

Of course, the fact that email services can be parallelized in this manner doesn't mean that one shouldn't tune a single server. In general, keep in mind that if the email operations (or any computing task) can be parallelized, one should assemble the most horsepower per dollar rather than build the fastest machine possible, remembering that support costs as well as equipment costs need to be factored into the financial equations.


