4.8 IMAP Tuning Specifics

Three popular Open Source IMAP servers exist: the University of Washington (UW), Cyrus, and Courier IMAP solutions. Each has its own niche and characteristics that make it the best choice under certain circumstances. In this document, we will only briefly consider the performance implications of these services, as a truly in-depth analysis would require a book of its own. IMAP is a very complicated protocol, much more so than any other aspect of email service, and its intricacies will not be discussed here in any depth. For an understanding of the issues involved in running an Open Source IMAP server, check out Managing IMAP [MM00].

Generally, IMAP access differs from POP access in several ways. First, IMAP processes tend to be comparatively long-lived. While the typical interaction with an email server using POP is to log on, download all new email, and then log off, a typical IMAP session might be to log on at the beginning of the day when one arrives at work, and then log off at the end of the day when one leaves, staying connected the entire time. On a POP server that provides service for 200,000 users, no more than a few thousand POP connections are likely to be active at any one time, and perhaps many fewer. With IMAP, however, a majority of the server's users may be connected at once. This setup creates some serious problems if an IMAP server uses one process per connection. Many operating systems don't even provide 100,000 unique process identification (PID) numbers, much less function adequately when nearly this many processes are active simultaneously. All of the Open Source solutions require one process per open connection, so supporting many tens of thousands of concurrent IMAP users on a single server running any of these Open Source packages simply will not work very well.
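The process-count problem described above comes down to simple arithmetic. The following Python sketch is illustrative only: the connected fractions (1.5% for POP, 60% for IMAP), the PID limit of 32,768, and the reserve of 1,000 PIDs are assumptions chosen to match the rough magnitudes in the text, not measurements from any real site.

```python
# Rough estimate of how many server processes a one-process-per-connection
# daemon needs at peak. All numeric inputs below are illustrative assumptions.


def peak_processes(total_users: int, connected_fraction: float) -> int:
    """Processes needed when each open connection gets its own process."""
    if not 0.0 <= connected_fraction <= 1.0:
        raise ValueError("connected_fraction must be between 0 and 1")
    return round(total_users * connected_fraction)


def fits_pid_space(processes: int, pid_limit: int, reserved: int = 1000) -> bool:
    """True if the processes fit under the PID limit while leaving
    `reserved` PIDs (an assumed figure) for the rest of the system."""
    return processes <= pid_limit - reserved


if __name__ == "__main__":
    users = 200_000
    pid_limit = 32_768  # assumed; check the actual limit on your own system
    for label, fraction in (("POP", 0.015), ("IMAP", 0.6)):
        n = peak_processes(users, fraction)
        print(f"{label}: {n} processes, fits: {fits_pid_space(n, pid_limit)}")
```

Under these assumptions the POP workload needs a few thousand processes and fits comfortably, while the IMAP workload needs well over 100,000 and does not.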

One might decide to adopt a commercial solution based on a multithreaded IMAP server, but a problem still exists. Even though one could theoretically hold millions of simultaneous socket connections open on a single server, each open connection consumes a file descriptor, each session consumes some amount of memory, some practical limit usually restricts the number of concurrent threads an operating system will support, and other tables may become filled or lookups may become so inefficient that the whole system can't run. Therefore, under all but the most extreme circumstances, any single server is unlikely to be able to support more than a few tens of thousands of concurrently active IMAP connections. Consequently, dividing the user base over several servers will be necessary. In any case, both commercial multithreaded IMAP daemons and handling of multiple servers are beyond the scope of this book.
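To make the per-server limits above concrete, here is a minimal Python sketch that takes the tighter of a file descriptor limit and a memory limit and derives how many servers a given number of concurrent sessions would need. The per-session costs (2 descriptors, 512 KB of memory) and the 256 reserved descriptors are hypothetical placeholders; real figures depend on the IMAP server and operating system and should be measured. The `resource` module is part of the Python standard library on Unix systems.

```python
import math
import resource  # Unix-only standard library module


def max_sessions(fd_limit: int, ram_bytes: int,
                 fds_per_session: int = 2,              # assumed
                 bytes_per_session: int = 512 * 1024,   # assumed
                 reserved_fds: int = 256) -> int:       # assumed
    """Sessions one server can hold before the tighter of the two limits
    (file descriptors or memory) is reached."""
    by_fds = max(fd_limit - reserved_fds, 0) // fds_per_session
    by_ram = ram_bytes // bytes_per_session
    return min(by_fds, by_ram)


def servers_needed(concurrent_sessions: int, per_server: int) -> int:
    """Number of servers required to carry the concurrent sessions."""
    if per_server <= 0:
        raise ValueError("per_server must be positive")
    return math.ceil(concurrent_sessions / per_server)


if __name__ == "__main__":
    # Report this process's own descriptor limits for comparison.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"open-file limit for this process: soft={soft}, hard={hard}")

    per_server = max_sessions(fd_limit=65_536, ram_bytes=4 * 1024**3)
    print(f"sessions per server: {per_server}")
    print(f"servers for 120,000 sessions: {servers_needed(120_000, per_server)}")
```

With these placeholder numbers memory, not descriptors, is the binding limit. The point is only that whichever resource runs out first sets the ceiling.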

Aside from the long duration of connections, IMAP differs radically from POP in how long one expects a message to remain on the server. With POP, messages will likely be downloaded to a local machine for storage, processing, editing, and response. With IMAP, all of these actions may take place on the IMAP server itself. As a consequence, the amount of disk space consumed by the average IMAP user will be much larger than that used by the average POP user, often by several orders of magnitude. While message sizes will not change just because of the protocol, the frequency of access to each message will. Even though disk access patterns will remain synchronous, small, and randomly located, a much lower percentage of the total available disk space will be accessed during any given time period. Therefore, fast I/O channels, NVRAM, journaling, and fast metadata updates will all be just as important as they are in the POP case, but disks generally should be larger. In addition to the POP-like workload, they will be carrying a large amount of infrequently accessed data.

In populating RAID systems for IMAP service, mirroring (RAID Level 1) is typically not feasible, as too much disk space will be required for it to be cost-effective. RAID Level 5 or RAID 50 will be the preferred way to go. Also, instead of buying the smallest, fastest disks available, buying the largest, fastest disks will generally yield the best results. Total RAID system storage capacity then becomes important. In the POP case, I/O bandwidth would almost certainly become saturated long before the RAID system was completely populated with disks. With IMAP, it's even more difficult to give hard and fast rules on how much space each user will need; rather, the answer will depend very much on the user's workload, sophistication, environment, and types of messages typically sent in the organization. Count on this, though: No matter how much space one provides, it will eventually get filled. Usage of IMAP message store space is much more like a user's home directory than POP server space. An email server that is deployed with its storage already expanded to the maximum it can hold has no room to grow and will need to be replaced in short order, regardless of the users' access patterns.
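The cost argument against mirroring can be seen by comparing usable capacity for the same set of disks. The sketch below uses the standard capacity formulas for two-way mirroring, RAID 5 (one disk's worth of parity per array), and RAID 50 (several RAID 5 groups striped together). The 12-disk, 73 GB configuration is an assumed example, and the formulas ignore hot spares and file system overhead.

```python
def raid1_usable(disks: int, disk_gb: float) -> float:
    """Usable capacity with two-way mirroring: half of the raw space."""
    if disks < 2 or disks % 2:
        raise ValueError("two-way mirroring needs an even number of disks")
    return disks / 2 * disk_gb


def raid5_usable(disks: int, disk_gb: float) -> float:
    """Usable capacity of one RAID 5 array: one disk's worth goes to parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (disks - 1) * disk_gb


def raid50_usable(groups: int, disks_per_group: int, disk_gb: float) -> float:
    """Usable capacity of RAID 50: a stripe across several RAID 5 groups."""
    if groups < 2:
        raise ValueError("RAID 50 needs at least two RAID 5 groups")
    return groups * raid5_usable(disks_per_group, disk_gb)


if __name__ == "__main__":
    disks, size_gb = 12, 73  # assumed example configuration
    print("RAID 1 :", raid1_usable(disks, size_gb), "GB")
    print("RAID 5 :", raid5_usable(disks, size_gb), "GB")
    print("RAID 50:", raid50_usable(2, disks // 2, size_gb), "GB")
```

For the same twelve disks, mirroring yields half the raw capacity, while RAID 5 and RAID 50 give up only one disk per parity group.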

Another feature of IMAP is that it gives the user a great deal more information about, and more options for handling, the messages stored by the server. Requests for header information from a range of messages, shared mailboxes, and message searching capabilities are just some of the myriad commands supported in IMAP. Many IMAP message stores try to at least partially optimize for these requests, but if a large percentage of a server's user base performs searches over a large number of messages, I/O bandwidth will be exhausted and the machine will run slowly. Not much can be done about this problem, and it makes capacity planning difficult. Expect that even with a stable user population, as time goes on, mailboxes will grow, people will handle more email per capita, and users will increasingly resort to more demanding options on the server. IMAP servers will constantly need to be upgraded over time, and it would be wise to plan for that eventuality up front.
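To give a sense of what these requests look like from the client side, the following sketch uses Python's standard `imaplib` module to fetch selected headers for a range of messages and to run a body search. The helper names, the host, and the credentials are hypothetical, and the choice of header fields is arbitrary. A `BODY` search is the kind of request that can force the server to read every message in the mailbox.

```python
import imaplib


def fetch_header_range(conn, first: int, last: int):
    """Fetch a few header fields for messages first..last in the
    currently selected mailbox, without marking them as read."""
    message_set = f"{first}:{last}"
    status, data = conn.fetch(
        message_set, "(BODY.PEEK[HEADER.FIELDS (FROM SUBJECT DATE)])")
    if status != "OK":
        raise RuntimeError(f"FETCH failed: {status}")
    return data


def search_body(conn, text: str):
    """Return the sequence numbers of messages whose body contains text."""
    if '"' in text or "\\" in text:
        # This sketch does not implement IMAP string quoting rules.
        raise ValueError("text must not contain quotes or backslashes")
    status, data = conn.search(None, "BODY", f'"{text}"')
    if status != "OK":
        raise RuntimeError(f"SEARCH failed: {status}")
    return (data[0] or b"").split()


def demo(host: str, user: str, password: str) -> None:
    """Example session; host and credentials are supplied by the caller."""
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.select("INBOX", readonly=True)
        headers = fetch_header_range(conn, 1, 50)
        matches = search_body(conn, "budget")
        print(len(headers), "response items;", len(matches), "matches")
```

Calling `demo("imap.example.org", "alice", "secret")` would run both requests against a real server; those argument values are placeholders.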

UW IMAP is the reference implementation of the IMAP protocol. It can flexibly be adapted to a wide variety of message store formats, although most often it uses a slightly modified version of the 7th Edition folder format. For smaller servers, UW IMAP performs adequately, but it lacks some features found in other IMAP systems. Due to its relatively poor performance characteristics, this package is rarely used in demanding environments.

Cyrus is probably the most commonly deployed Open Source IMAP solution. It is built around its own message store format, which uses a one-file-per-message layout as described earlier in this chapter. Cyrus comes with its own LDA, called deliver, which replaces mail.local when used with sendmail. Cyrus has reasonably good scalability characteristics, but I don't know of an Open Source-based IMAP server that supports more than 100,000 total users. At such a scale, it's doubtful that more than 10% of the subscriber base is active at any point in time. At the time of this writing, the most current version of the Cyrus IMAP software is 2.1.5, but many sites continue to run the latest release of version 1.6 due to concerns about Cyrus 2's stability. The Cyrus folks have been making real progress in increasing the stability of Cyrus 2 (it's already more stable than several commercially available IMAP servers), so watch progress on this version carefully.

Courier is a complete email solution based on the maildir format. Courier has logged less field time than the other two solutions mentioned here. Nonetheless, quite a few sites are using it. Its performance characteristics seem fairly good, but a head-to-head performance comparison between Courier and Cyrus would clarify this issue. I'm not aware of any really large-scale Courier installations, but that lack doesn't mean that it wouldn't be straightforward to scale.



sendmail Performance Tuning
ISBN: 0321115708
Year: 2005
Pages: 67
