1.7 Email System Profiling

More times than I care to remember, I've had a conversation with someone trying to specify an email system that went like the following:

Them: I need to build an email server using Sun equipment that will handle X number of users. What hardware should I buy?

Me: Well, that completely depends.

Them: On what?

Me: On their usage profiles. How many messages does each person send and receive each day? What is the average message size? How fast is your Internet connection? What percentage of your peak day's total traffic occurs during the peak hour?

Them: I don't know.

Me: Then I'm afraid I can't help you very much.

Them: Well, can you give me a rough estimate?

Me: Sure, you'll need a machine somewhere between a Sparc 2 with a 2GB disk and a 30-processor Sun E6500 with an EMC Symmetrix storage system.

When it comes to email servers, it seems that just about the only information anyone can obtain from a prospective client is the number of users. Unfortunately, for performance purposes this figure is just about the least useful metric for evaluating email service load. Getting the information that's really necessary to size a server often seems next to impossible, even from people who really ought to know better.

When all is said and done, the two most important pieces of information to consider regarding the load on a server almost always will be (1) the number of transactions (messages or connections) per second the server has to handle and (2) the average size (in bytes) of each transaction. Further, one really needs to know these data rates only for the busiest 15 minutes of the day. If the server can handle its busiest load, then except under the most unusual circumstances, it should provide acceptable service during the rest of the day.
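As a rough illustration of how the questions in the conversation above turn into these two numbers, here is a minimal sketch in Python. The function name, its parameters, and every input value are hypothetical, chosen only to show the arithmetic; they are not measurements from any real site.

```python
def peak_load(users, msgs_per_user_per_day, avg_msg_bytes,
              peak_fraction, peak_window_seconds):
    """Return (messages per second, bytes per second) for the busiest window."""
    daily_messages = users * msgs_per_user_per_day
    # Portion of the whole day's traffic that lands in the busiest window.
    peak_messages = daily_messages * peak_fraction
    msgs_per_sec = peak_messages / peak_window_seconds
    return msgs_per_sec, msgs_per_sec * avg_msg_bytes


# Hypothetical site: every figure below is made up for illustration.
rate, throughput = peak_load(
    users=10_000,
    msgs_per_user_per_day=30,     # sent plus received
    avg_msg_bytes=10_000,
    peak_fraction=0.05,           # share of the day's traffic in the busiest window
    peak_window_seconds=15 * 60,  # the busiest 15 minutes
)
print(f"{rate:.1f} messages/s, {throughput / 1000:.0f} KB/s")
```

With these made-up inputs the busiest 15 minutes work out to about 17 messages per second at roughly 167 KB per second. Note that the user count is only one of five inputs, and changing any of the other four moves the answer just as much.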

Some secondary considerations shouldn't be ignored, however. The first relates to the overhead for each connection. For example, if an email server provides IMAP service, there will be a fixed overhead for each connected user. If there are a lot of users or if the overhead is considerable (e.g., the server has to spawn a process for each connection), then total overhead might be an issue, even if the vast majority of connections remain idle for extended periods of time and thus contribute very little to the total throughput load. Another secondary performance issue might be related to bottlenecks that are not easily solved. For example, if a site has a fixed-bandwidth network connection to the outside world that is not easily upgraded, and during much of the day that connection is at capacity, then that server may require a substantially larger message queue than is typical; that requirement might need to be reflected in the server's storage capacity. This case might also arise in those (rare, these days) situations where network connectivity is intermittent.
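Both secondary effects come down to simple multiplication, which the following Python sketch shows. It assumes a fixed memory cost per connection and a constant arrival rate while the link is saturated; both are simplifications, and the function names and all of the numbers are hypothetical.

```python
def connection_overhead_bytes(connections, bytes_per_connection):
    """Memory held by connections, whether they are idle or busy."""
    return connections * bytes_per_connection


def queue_backlog_bytes(arrival_bytes_per_sec, link_bytes_per_sec,
                        saturated_seconds):
    """Bytes that pile up in the queue while mail arrives faster than it drains."""
    excess = max(0, arrival_bytes_per_sec - link_bytes_per_sec)
    return excess * saturated_seconds


# Hypothetical figures: 5,000 mostly idle IMAP connections, each served by
# its own process that holds 2 MiB of memory.
overhead = connection_overhead_bytes(5_000, 2 * 1024 * 1024)

# Hypothetical figures: outbound mail arrives at 250,000 bytes/s, the link
# carries 193,000 bytes/s, and it stays saturated for 8 hours.
backlog = queue_backlog_bytes(250_000, 193_000, 8 * 3600)

print(f"connection overhead: {overhead / 2**20:.0f} MiB")
print(f"queue backlog: {backlog / 1e9:.2f} GB")
```

Under these assumptions the idle connections alone hold about 10,000 MiB of memory, and the queue grows by roughly 1.6 GB over the saturated period. Neither figure would show up in a throughput estimate.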

Unfortunately, these considerations are largely irrelevant. Rarely does someone who wants information on building an email service know and provide the key information necessary for sizing the system(s) that will run it. If an organization has this information, it probably doesn't need assistance. In most cases, what needs to be done is to accept the information that is available (the number of users) and find an appropriate profile for their usage patterns from which performance numbers can be drawn. This effort usually involves obtaining copies of the data the email applications log on the server, if possible. Chapter 7 covers obtaining information from these logs in more detail.

One problem is that typical Internet email usage patterns change constantly, which means that not only will the estimates provided here be immediately obsolete, but also the trends for estimating future growth will almost certainly become less valid as time passes. The best substitute for accurate information is a guess based on current information, so readers are strongly encouraged to treat what is presented here with a healthy dose of skepticism and, instead, perform their own research to the best of their ability. This point cannot be stated strongly enough. The numbers presented in the remainder of this chapter should be treated as guesses and over-generalizations, but are better than nothing (although in some cases, perhaps not much).

In the Internet Service Provider (ISP) market, the numbers of messages that a typical subscriber sends and receives per day vary widely. A reasonable estimate is that typical ISP users receive about twice as many messages per day as they send. This difference is largely due to mailing list traffic, opt-in advertising, Web action confirmation, and spam. As to absolute numbers of messages, some direct data are available. In the online newsletter Messaging Online, America Online (AOL) reported that it averaged receiving 3.5 messages per subscriber per day in 1998 and 5.6 messages per subscriber per day in 1999 [M-O00]. From this, we can project that by mid-2002, AOL subscribers may be receiving, on average, as many as 20 email messages per user per day.
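The projection above can be roughly reproduced by assuming that the 1998-to-1999 growth rate simply continues unchanged. That constant-growth assumption is mine, made for the purpose of this sketch, and so is the choice of how many years separate the 1999 figure from mid-2002.

```python
def project(value, annual_growth, years):
    """Compound a per-subscriber figure forward at a constant annual rate."""
    return value * annual_growth ** years


# AOL figures reported in [M-O00]: messages received per subscriber per day.
received_1998 = 3.5
received_1999 = 5.6
growth = received_1999 / received_1998  # about 1.6x per year

# Assumption: the 1999 figure describes a point between mid-1999 and the end
# of 1999, so mid-2002 lies 2.5 to 3 years later.
low = project(received_1999, growth, 2.5)
high = project(received_1999, growth, 3.0)
print(f"mid-2002 estimate: {low:.0f} to {high:.0f} messages per subscriber per day")
```

Under those assumptions the estimate falls between about 18 and 23 messages per subscriber per day, which brackets the figure of 20 given above. A growth rate measured across a single pair of years is a thin basis for extrapolation, so the result deserves the same skepticism as every other number in this section.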

America Online users have often been chastised as being "unsophisticated" compared to the customers of more traditional ISPs. While it does little good to debate this point, it certainly seems that traditionally they receive less email per capita. During 1999, most other ISPs received between 10 and 15 messages per subscriber per day, a number typically two to three times higher than their AOL counterparts. I would expect this gap to narrow over time, but not to vanish, at least for several years. I'd project that during 2002, the typical ISP subscriber will receive 20 to 30 email messages/day.

To date, the numbers for ISPs outside of the United States tend to be far lower than those for their U.S. counterparts. Data for European and Asian ISPs lead me to believe that per capita their subscribers average about one-third the number of messages received by U.S. subscribers. I also expect this gap to close over time. Some folks have used the model that European and Asian Internet loads tend to have the same sorts of numbers as the U.S. loads of two to three years ago. For now, this model seems reasonable except when it comes to wireless messaging, where Europe and especially Asia support much higher loads [UM].

Similar to ISPs are the email portals, such as Hotmail and Yahoo!. In my experience these sites have two types of users: those who use their email just as much as one would use ISP email, and those who hardly ever use it. A reasonable estimate seems to be that message loads average a little less than half of the per capita load of ISPs in the same region.

Of course, many more email servers do duty in a corporate environment than at an ISP. For these servers, profile information is much more variable and depends greatly on the sophistication of the users, the types of documents and storage formats typically in use at the site, other data transport mechanisms that are available, and the general corporate culture that has sprung up concerning email applications. Estimating corporate email usage is a daunting task, and without adequate information about the site in question it would seem to be nearly impossible to generate an accurate model. Despite this problem, site planners and administrators will be asked to make these predictions, so we do what we can.

At least one report, released early in the year 2000 [FON00], estimates that at that time the typical user received about 19 email messages per day and the average message was about 150KB in size. This report predicted that in 12 months, the typical corporate mailbox would receive 34 messages per day and the average size would jump to 286KB. In many environments, especially in the corporate world, it would not be atypical to find that 90% of email is entirely internal; that is, the sender and the recipient of the vast majority of messages use the same email server.
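To see what these figures imply for volume, the short Python sketch below multiplies them out. The message counts and sizes come from the report cited above. The 1,000-user site is hypothetical, and applying the 90% internal share uniformly to byte volume is an assumption.

```python
def daily_kb_per_user(messages_per_day, avg_message_kb):
    """Inbound volume for one mailbox, in KB per day."""
    return messages_per_day * avg_message_kb


early_2000 = daily_kb_per_user(19, 150)  # figures reported in [FON00]
predicted = daily_kb_per_user(34, 286)   # the report's 12-month prediction

users = 1_000             # hypothetical site size
internal_percent = 90     # assumed to apply to bytes as well as to messages

site_total_kb = predicted * users
external_kb = site_total_kb * (100 - internal_percent) // 100

print(f"per user: {early_2000} KB/day, then {predicted} KB/day")
print(f"growth factor: {predicted / early_2000:.1f}x")
print(f"site total: {site_total_kb} KB/day, of which {external_kb} KB/day is external")
```

The per-user volume rises from 2,850 KB to 9,724 KB per day, about 3.4 times as much in a single year, because the message count and the message size grow together. For the hypothetical 1,000-user site, only about a tenth of the predicted volume would ever cross the outside network connection.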

It is worth reemphasizing that these numbers are rough estimates, and that nothing here should serve as a satisfactory substitute for direct research concerning one's actual situation.



sendmail Performance Tuning
ISBN: 0321115708
Year: 2005
Pages: 67