4.7 Message Store Hashing | sendmail Performance Tuning

As mentioned earlier, many filesystems perform a linear search for file or subdirectory names in a directory. With large numbers of users, this search can present a real performance problem in the /var/mail directory. Even using filesystems that perform hashed directory lookups, having hundreds of thousands, or even millions, of files or subdirectories in a single directory will not work well, or at all. For this reason, it's a good idea to subdivide a large message store by hashing it.

Support for a simple way of performing this task is provided in the qpopper distribution. One sets a variable called HASH_SPOOL to a numerical value at compile time, and qpopper will use that value and the mailbox name to construct a set of directories under which a mailbox will be stored. For example, a HASH_SPOOL value of 0 leads to the default mailbox location for the user npc of /var/mail/npc. With HASH_SPOOL set to 2, the mailbox location becomes /var/mail/n/p/npc. Of course, the LDA must be similarly modified.

While this approach generally works fairly well, it has a tendency to bunch up many of the mailboxes. On most very large email systems, the contents of the t/h directory tree might be numerous while the q/x tree contains very little. Of course, this issue may not be a major problem, but whether it's a good idea or not, there exist email systems where a sizable fraction of the email addresses have the same first letter(s).

One general solution to this problem would be to always use the entire user name as the hash; for example, npc's mailbox might be located in /var/mail/n/p/c/mbox. Of course, the directory with this mailbox may contain several subdirectories representing the beginnings of other names, but as long as no programs that operate on mailboxes use single-character names for files, no collisions will occur. While the SMTP standard leaves it up to each local email server to decide whether the user name portion of an email address is case sensitive, sendmail is not case sensitive by default when it comes to locally delivered email. Unless this behavior is explicitly changed, email sent to NPC@example.com and npc@example.com will therefore end up in the same mailbox on a typical sendmail system. In such a case, no more than 50 ASCII characters could be used as part of an email address; thus, including files used by the POP daemon, no directory would ever have more than 55 or so inodes in it, which is a very manageable number.

Such a scheme uses up inodes fairly quickly, so it would be wise to consider this issue before adopting this mechanism. A very conservative estimate can be obtained by multiplying the number of users times the maximum number of characters in a user name and allocating that number of inodes to the system. This approach is conservative because it assumes that no portion of the directory namespace will be reused from one user name to another, which won't be the case.

Although not at all tricky to code into most existing applications, this system has its share of disadvantages. Traversing a large number of directory levels causes a large number of directory lookups. Within the operating system, the function that converts path names to inodes is called namei(). For namei() to resolve the mailbox for the user npc using this hashing scheme, it must sequentially search the contents of the /, var, mail, n, p, and c directories before arriving at the desired mailbox. Much of this information, probably the first few levels of directory lookup, will be stored more or less permanently in the filesystem buffer cache, as these items will be accessed so frequently. On an email system with a very large number of users, more directories will be present than can be cached, so many of these lookups must happen on disk. While these lookup operations aren't expensive, they do consume resources and, more important, cause the disk heads to move. While disk heads are moving, they're not reading or writing, which slows down the aggregate number of disk operations that can be performed.

Another problem with this scheme is the challenge involved in extracting the complete list of mailboxes with mail on the system. It can be done with something like the following:

 #!/bin/sh  find /var/mail -type f -name mbox -print |\      sed 's:^/var/mail/::          s:/mbox$::          s:/::g'

This code is hardly an elegant solution and is much more cumbersome than ls/var/mail/*/*/.

Another solution is to use one or two directory levels (as appropriate) but to run some numerical hashing algorithm over the user name to determine the location of a mailbox. Here's an example of an algorithm that is likely to be effective: Pick a prime number (roughly equal to the square root of the maximal number of users we expect to support on this system). Let's assume we might support 58,000 subscribers. We then might select PRIME=241 as our number of bins, so we create the directories /var/mail/0 through /var/mail/240 inclusive. Now, when it comes time to locate a particular mailbox, we run HASH(username ) % PRIME, where HASH() is some hashing function and % denotes modular division. This function defines the location of each mailbox on the system.

The hashing algorithm could be something well known such as MD5 [RIV92]. We don't need a cryptographically secure hashing algorithm, however, and MD5 is much more computationally expensive than is necessary for these purposes. Take the following C code, for example:

 # define BIGPRIME 241  int  hash(char *username)  {          char *s = username;          int x = 7;          int y = 19;          int prime = BIGPRIME;          unsigned int n;          for (n = (unsigned int) strlen(username); *s != '\0'; s++)                  n=((n << x ^ (n >> y)) ^ *s);          return (n % prime);  }

In this algorithm, a string representing the user name is passed in and an integer representing the hash bin to which that user name corresponds is returned. Internally, prime is the number of hash bins, and x and y are arbitrarily selected small primes.

This algorithm will do almost as good a job of balancing user names over a set of hash bins as MD5 but in a much less computationally expensive manner. With this method, one would need to write a small program that, given a user name, would produce the path to any particular mailbox so that individual mailboxes could be located. This task represents an inconvenience, but not a major one, and ls/var/mail/*/* still provides a complete mailbox list (as long as there aren't too many mailboxes). Essentially, the same mailbox hashing scheme is described in [CBB97].

On email systems with up to, say, 10,000 accounts using a high-performance filesystem, hashing the mailboxes may not be necessary. Even if the message store isn't hashed, putting the .lock files (if necessary) on another disk can reduce the number of files in the spool and, at the same time, spread out the I/O load over more spindles. In fact, because the .lock files don't have any consequence after a reboot, placing them in a tmpfs filesystem would be entirely appropriate. For .pop files, they should reside on the same filesystem as the mailboxes they represent so that they can be replaced atomically with the rename() call. Of course, that doesn't mean that they can't also reside in a different directory. Creating a /var/mail/.pop directory for these files certainly would be acceptable, but make sure that .pop isn't allowed as a valid user name!