7.6 Store partitioning




Early Exchange servers used systems built around slow processors, limited memory, and slow disks. The system configuration imposed natural limits on mailbox support, and most deployments did not worry about the 16-GB limit that Microsoft then imposed on a Store database. After all, if you only allocated a 10-MB or 20-MB quota per mailbox, you would not hit the 16-GB limit unless the server had to support thousands of mailboxes, and no server of the day could handle that load. There were notable exceptions to the rule, especially enterprise customers who used the extra power of the Alpha processor to scale to thousands of mailboxes. These enterprise deployments ran into the 16-GB limit and demanded change.

Over time, improvements in hardware and software (faster CPUs, better disk controllers, Windows NT 4.0, and especially the vastly superior multithreaded capability of the Exchange 5.5 Store) meant that the 16-GB limit became a scalability bottleneck faster than Microsoft could imagine. The initial response was to remove the limit and allow databases to grow to a theoretical 16-TB size in the Exchange 5.5 Enterprise Edition. However, no one could even think about approaching the 16-TB limit, because of storage costs and (more importantly) the issue of how to handle backup and restore operations for huge email databases. Everyone's mailbox was held in a single database, so if it failed, the administrator had to work out how to get the database back online as quickly as possible. With backup rates running at up to 40 GB/hour and restore rates half as fast, most administrators drew the line at 50 GB and split mailboxes across multiple servers when storage needs increased past this point. The very bravest administrators explored the outer limits, and some databases went past 250 GB, but at this size any database maintenance operation becomes something that you must carefully plan, schedule, and then perform flawlessly.
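To put rough numbers on this reasoning, here is a back-of-the-envelope calculation in Python (a sketch only; the throughput figures are the approximate rates quoted above, not measurements from any particular configuration):

    # Back-of-the-envelope backup and restore windows, using the
    # approximate throughput figures quoted in the text.
    BACKUP_RATE_GB_PER_HOUR = 40.0                           # assumed backup rate
    RESTORE_RATE_GB_PER_HOUR = BACKUP_RATE_GB_PER_HOUR / 2   # restores run ~half as fast

    for size_gb in (50, 250):
        print(f"{size_gb} GB database: "
              f"backup ~{size_gb / BACKUP_RATE_GB_PER_HOUR:.1f} h, "
              f"restore ~{size_gb / RESTORE_RATE_GB_PER_HOUR:.1f} h")

    # 50 GB database: backup ~1.2 h, restore ~2.5 h
    # 250 GB database: backup ~6.2 h, restore ~12.5 h

At 250 GB, a restore window of roughly half a day explains why most administrators preferred to split mailboxes across servers long before reaching that size.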

The result is that the limitations imposed by a single monolithic mailbox database became the single most pressing reason for Microsoft to update the Store architecture. The answer was to partition the Store across multiple databases, collecting the databases into "storage groups" as a convenient unit for management operations.

7.6.1 The advantages of storage groups

Exchange gains many advantages through Store partitioning. First, you can control the size of databases on a server much more easily. Instead of having to move mailboxes to a new server after a database reaches its size limit, you simply create a new Mailbox Store and move mailboxes over to the new Store. An added benefit is that smaller databases lead to faster management operations. Second, management operations happen at the database level rather than at the Store level, meaning that you can work with an individual database (e.g., restore it from backup) without affecting the other databases, all of which stay online and maintain service to users. Because you can back up and restore each database individually while all other databases keep working online, Exchange delivers a better and more resilient service to users. Note that you can run multiple backup operations concurrently when you deploy multiple storage groups; inside a storage group, backup operations proceed serially as the backup application processes each database in turn. Third, you can arrange databases and transaction logs across the available disks and controllers in the storage subsystem to manage I/O most effectively and to protect data from failure. The last point is very important, because I/O management becomes increasingly critical as the number of supported mailboxes scales up on large servers.

Of course, you cannot attain any of these advantages unless you put good system management practices in place and then see that administrators carry them out. Partitioning the Store into 20 databases results in a dramatic increase in complexity for every administrative operation. No improvement in software will ever improve uptime or quality without a matching effort from the people who run the computers. Some administrators who rushed to partition the Store without thinking through the consequences have discovered this to their detriment.

You can define a storage group as an Exchange management entity within the Store process that controls a number of databases through a common set of transaction logs. By default, when you install Exchange, the Store consists of a single storage group that contains the pair of Mailbox and Public Stores you would expect to see on an Exchange 5.5 server, together with newly created streaming files for each Store. In fact, you can argue that the concept of a storage group already exists in Exchange 5.5, because its Store is a single storage group that controls two databases using a common set of transaction logs. Note that the standard edition of Exchange is still restricted to a single storage group and cannot grow databases past the original 16 GB.
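As a mental model, you can picture the hierarchy in code. The sketch below is purely illustrative (the class names are invented, though priv1.edb, pub1.edb, and the E00 log prefix are the familiar defaults):

    from dataclasses import dataclass, field

    @dataclass
    class DatabaseSet:
        """A mailbox or public database plus its streaming file."""
        name: str
        edb_file: str    # rich-text database (.edb)
        stm_file: str    # streaming file (.stm)

    @dataclass
    class StorageGroup:
        """A set of databases that share one transaction log stream."""
        name: str
        log_prefix: str                            # e.g., E00 for the first group
        databases: list[DatabaseSet] = field(default_factory=list)

    # The default configuration after installation: one storage group
    # holding the familiar Mailbox and Public Store pair.
    first_sg = StorageGroup("First Storage Group", "E00", [
        DatabaseSet("Mailbox Store", "priv1.edb", "priv1.stm"),
        DatabaseSet("Public Folder Store", "pub1.edb", "pub1.stm"),
    ])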

7.6.2 Planning storage groups

Theoretically, the internal architecture of the Store permits up to 15 storage groups to be active on a single server, plus a special sixteenth storage group used for recovery operations. You can create up to six database sets in each storage group, where a database set is a mailbox or public database together with its associated streaming file. Simple mathematics therefore suggests that you could deploy up to 90 databases on a single server. One of these will be the default Public Store, so, on a purely theoretical basis, you could partition user mailboxes across 89 Mailbox Stores. If you only support MAPI clients, you could remove the local Public Store, point users to a Public Store on another server, and get to 90 Mailbox Stores. All good theory, but Microsoft discovered that it was not possible to implement a server that hosts 15 storage groups.

Architectures usually exhibit some practical limitations when the time comes to deploy on real-life computers, and the Store is no different. In this case, the major limitation comes from the 32-bit nature of Exchange. Virtual memory is used to mount each database, so the more databases that are in use, the more virtual memory is consumed for this purpose. In addition, the system has to perform more context switches within the Information Store process as Exchange brings more storage groups and databases online. Testing performed during the development of Exchange 2000 demonstrated that the practical limitation is around 20 databases spread across four storage groups, which led Microsoft to impose this limit in the software. In other words, each storage group hosts a maximum of five database sets (while a storage group can support six databases, one slot is reserved for recovery and maintenance operations, such as restores or runs of the ESEUTIL utility). Future versions of Windows (with suitably tweaked versions of Exchange) may support higher virtual memory limits and thus higher numbers of databases. Even the 64-bit version of Windows Server 2003 Enterprise Edition is unlikely to solve the problem until Microsoft upgrades Exchange to become a true 64-bit application. Otherwise, we will be in the same situation as when Microsoft compiled the 32-bit Exchange 5.5 code to run on the 64-bit Alpha platform: The code runs, but it cannot take advantage of the platform's extensions.
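The arithmetic is easy to capture in a small helper. The following sketch (function and constant names are invented for illustration) validates a proposed layout against the practical limits just described:

    MAX_STORAGE_GROUPS = 4        # practical per-server limit
    MAX_DATABASES_PER_GROUP = 5   # sixth slot reserved for recovery work
    MAX_DATABASES_PER_SERVER = MAX_STORAGE_GROUPS * MAX_DATABASES_PER_GROUP  # 20

    def validate_layout(dbs_per_group):
        """Return a list of problems with a proposed storage layout,
        expressed as the number of databases in each storage group."""
        problems = []
        if len(dbs_per_group) > MAX_STORAGE_GROUPS:
            problems.append(f"{len(dbs_per_group)} storage groups exceed the "
                            f"limit of {MAX_STORAGE_GROUPS}")
        for i, count in enumerate(dbs_per_group, start=1):
            if count > MAX_DATABASES_PER_GROUP:
                problems.append(f"storage group {i} has {count} databases; "
                                f"the maximum is {MAX_DATABASES_PER_GROUP}")
        if sum(dbs_per_group) > MAX_DATABASES_PER_SERVER:
            problems.append(f"{sum(dbs_per_group)} databases exceed the server "
                            f"maximum of {MAX_DATABASES_PER_SERVER}")
        return problems

    print(validate_layout([5, 5, 5, 5]))  # [] -- the supported maximum
    print(validate_layout([6, 5, 5, 5]))  # flags the oversized first group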

While it might be disappointing not to be able to probe the outer limits of the Store architecture, 20 databases are quite enough to meet 99.9 percent of today's requirements. To put things into perspective, administrators can now deploy 20 times as many mailbox databases on a server as they could in Exchange 5.5, so properly managed systems should be able to support 20 times as much data. Given that the largest Exchange 5.5 systems support over 200 GB in a single mailbox database, it is conceivable that you could support more than 4 TB of mailbox data on a single server. The server configuration would not be simple: To protect this amount of data, you would deploy a highly resilient cluster with the best storage subsystem you could buy, together with an appropriate backup solution. Relatively few companies will be interested in such a configuration, although it may appeal to some of the larger ISPs that want to deploy very large servers.

Storage groups require some additional planning on clusters. A cluster is composed of virtual Exchange servers, each of which takes control of different resources that run on the physical servers composing the cluster. From a planning perspective, you can consider storage groups a cluster resource and, since clusters usually support large numbers of mailboxes, the majority of clusters deploy multiple storage groups. (To be technically accurate, a cluster supports a Store resource that manages the storage groups.) If a virtual server fails, the cluster attempts to transition the resources that were running on the failed virtual server to another node in the cluster. This cannot happen without multiple storage groups, and the basic rule is that each virtual server supports at least one storage group. In a two-node cluster, each virtual server might support up to four storage groups; in a four-node cluster, each server might support two or three storage groups. Of course, you must factor other considerations into these configurations, such as the version of Windows you run on the server and the ability of your version of Exchange to support different numbers of active nodes in the cluster.

The per-server limit for storage groups exists because all of the active storage groups on a node run under the single Store process. Apart from the obvious problem that a transition may overload a virtual server by transferring too much work to a single system, you should also ensure that a transition would not force a node to exceed the limits for databases or storage groups. For example, in a four-node cluster where three nodes are active and one is passive, each virtual server supports two storage groups. If one server fails, the cluster transitions its two storage groups to the passive server. After the transition, the three active nodes still support two storage groups each, so the cluster still respects the limits for Exchange cluster transitions. If a second server fails, the transition moves the two storage groups from that server to the two remaining servers, which end up with three each. In a doomsday situation, a third server failure will force the cluster to attempt to move three storage groups from the failed server to the one remaining server. This would result in six storage groups on that server, but the Store supports a maximum of four, so the transition stops after a single storage group moves. Users whose mailboxes are in the two stranded storage groups lose service. A failure that knocks three nodes out of a four-node cluster is clearly serious and highly unlikely to occur except in unusual circumstances. However, it does underline the need to think through potential cluster failure scenarios. The advent of eight-node cluster support in Exchange 2003 makes the exercise a little trickier.
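A small simulation makes the failure sequence easier to follow. This is an illustrative sketch under simplifying assumptions (node names are invented, and moving storage groups one at a time to the least-loaded survivor is an approximation of what a cluster transition does):

    MAX_SG_PER_NODE = 4  # per-server storage group limit

    def fail_node(nodes, failed):
        """Move as many storage groups as possible off a failed node,
        one at a time, to the survivor with the most headroom."""
        stranded = nodes.pop(failed)
        while stranded:
            target = min(nodes, key=nodes.get)   # least-loaded survivor
            if nodes[target] >= MAX_SG_PER_NODE:
                print(f"{stranded} storage group(s) stranded: no node has headroom")
                break
            nodes[target] += 1
            stranded -= 1
        return nodes

    # Four-node cluster: three active nodes with two storage groups each,
    # plus one passive node.
    cluster = {"node1": 2, "node2": 2, "node3": 2, "node4": 0}
    for failed in ("node1", "node2", "node3"):
        cluster = fail_node(cluster, failed)
        print(f"after {failed} fails: {cluster}")

The output mirrors the narrative: the first two failures succeed, while the third strands two storage groups because the last node reaches the four-group limit.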

Keep these numbers in mind whenever you plan storage group deployment, so that you do not run the risk of exhausting virtual memory or hitting other restrictions. To reinforce the message, Microsoft hard coded the maximum limits into ESM. You can certainly get around the limitation by writing code that uses the Exchange management objects to create new storage groups, but doing so creates an unsupported system configuration.

Testing continues as new versions of both Exchange and Windows become available, along with different hardware configurations, and the maximum number of storage groups or database sets may change over time, so check Microsoft's Web site and your hardware vendors before making a definitive judgment on any particular configuration. For example, you can find a good Exchange clustering guide at www.microsoft.com/exchange. Plenty of people have experience with low-end systems, but relatively few know how to combine software and hardware into high-end systems. You should, therefore, be cautious when building very large systems and base everything you do on proven experience.

Exchange treats a storage group as a management entity; the Store manages all of the databases within the group according to a common set of properties and policies. While it is nice to have multiple storage groups on a server, each requires separate management as well as its own set of transaction logs. Dividing mailboxes between databases in the same storage group is the first and most effective step in Store partitioning; you only need additional storage groups when you must apply different settings to different sets of databases. For example, you might want to create a separate storage group to host one or more Public Stores that accept inbound newsgroup feeds. Circular logging and database page zeroing are storage group-level settings that you would typically configure differently for databases holding information that expires on a weekly or monthly basis, so the storage group that hosts these databases operates under a different policy from one that includes Mailbox Stores.

7.6.3 Does single-instance storage matter anymore?

Single-instance storage (SIS) was a much-hyped feature of Exchange 4.0 when it shipped in March 1996. SIS means that Exchange keeps a single copy of a message in a database, no matter how many users appear to have a copy of the message in their mailboxes. Individual users access the content through a set of pointers and other properties that allow the same content to take on different identities. For example, one user can file a message in his or her Inbox folder, while another user puts the same message in quite a different folder. The Store maintains an access count for each message, incrementing it by one for each mailbox that shares the content and decrementing it as users delete their pointers to the message. When the count reaches zero, the Store removes the content from the database.
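The mechanism is, in essence, reference counting. The toy model below is a drastic simplification of the real Store tables (all names are invented), but it captures the idea of shared content, per-mailbox pointers, and an access count:

    class SingleInstanceStore:
        """Toy model of single-instance storage: one copy of the content,
        per-mailbox pointers, and a reference count."""

        def __init__(self):
            self.content = {}    # message_id -> (body, reference count)
            self.mailboxes = {}  # mailbox -> {folder: set of message_ids}

        def deliver(self, message_id, body, recipients):
            self.content[message_id] = (body, len(recipients))
            for mbx in recipients:
                self.mailboxes.setdefault(mbx, {}).setdefault("Inbox", set()).add(message_id)

        def move(self, mailbox, message_id, src, dst):
            # Filing the message elsewhere only moves a pointer;
            # the single copy of the content is untouched.
            self.mailboxes[mailbox][src].remove(message_id)
            self.mailboxes[mailbox].setdefault(dst, set()).add(message_id)

        def delete(self, mailbox, folder, message_id):
            self.mailboxes[mailbox][folder].remove(message_id)
            body, refs = self.content[message_id]
            if refs - 1 == 0:
                del self.content[message_id]   # last pointer gone: reclaim content
            else:
                self.content[message_id] = (body, refs - 1)

    store = SingleInstanceStore()
    store.deliver("msg1", "Q3 results attached", ["alice", "bob"])
    store.move("bob", "msg1", "Inbox", "Finance")  # same content, different folder
    store.delete("alice", "Inbox", "msg1")
    store.delete("bob", "Finance", "msg1")         # count hits zero; content removed
    print(store.content)                           # {}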

Exchange is not the first email system to use SIS. ALL-IN-1, a corporate messaging system sold by Digital Equipment Corporation from 1984 onward, used a similar scheme. The major difference between the implementations is that Exchange holds everything (content, pointers, and item properties) within a single database, whereas ALL-IN-1 uses a database for the pointers and properties and individual files for messages and attachments. In both cases, engineers designed SIS into the architecture to reduce the demand for disk space and eliminate redundancy. PC LAN-based systems, such as Microsoft Mail and Lotus cc:Mail, typically deliver separate copies of messages to each mailbox, an approach that is perfectly adequate when a server never has to process more than 50 copies of a message. However, as servers scale up to support hundreds or thousands of mailboxes, creating individual copies imposes a huge drain on system resources and can swamp the ability of the I/O subsystem to handle the workload. Things only get worse as messages and attachments become larger.

There are a number of obvious advantages in a shared message model, for instance:

  • Disk I/O activity is reduced, because the system does not have to create, delete, and otherwise manage multiple physical copies of messages. This is especially important when message content is large, and the average size of messages keeps increasing. Think of the I/O generated to create 100 copies of a 100-KB message; now scale up to 2,500 copies of a message (perhaps one circulated to everyone in a company) that carries a very large attachment. Without single-instance delivery, a server would quickly find itself on its knees due to the I/O activity from a single message sent to a very large distribution list (the sketch after this list puts rough numbers on the saving).

  • Disk space required for message data is reduced, because the Store creates no redundant copies of messages.

The net effect of these points is that the single-instance storage model is easily the most effective and scalable storage mechanism available to high-end messaging servers.
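As promised above, a quick sketch of the I/O arithmetic (the 5-MB attachment size is an assumption chosen for illustration, and one write per stored copy is a deliberate simplification):

    # Rough I/O arithmetic for message fan-out.
    msg_kb, copies = 100, 100
    print(f"{copies} copies of a {msg_kb}-KB message: "
          f"{msg_kb * copies / 1024:.1f} MB written vs. {msg_kb} KB shared")

    attach_mb, big_copies = 5, 2500
    print(f"{big_copies} copies of a {attach_mb}-MB message: "
          f"{attach_mb * big_copies / 1024:.1f} GB written vs. {attach_mb} MB shared")

    # 100 copies of a 100-KB message: 9.8 MB written vs. 100 KB shared
    # 2500 copies of a 5-MB message: 12.2 GB written vs. 5 MB shared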

Apart from making effective use of storage, the single-instance model effectively raises the maximum size of the Store from the physical limit imposed by available disk space toward a higher logical plateau. Charging the size of a message (and its attachments) against the quota of each mailbox creates more apparent logical storage than is physically available. In some respects, this is a smoke-and-mirrors trick, since users believe that their mailboxes occupy more space on a server than is actually used within the database. For example, if you send a 10-KB message to three recipients, then 10 KB is used in the database, but a total of 40 KB of storage is logically occupied (10 KB charged to each recipient and to the sender). In other words, message content is stored once, and a series of pointers allows individual mailboxes to share the single copy. Single-instance storage cannot survive a change to a message's content, such as altering its properties to allow Outlook or OWA to download embedded graphics. Once this happens, the Store creates a separate copy of the message in that user's mailbox, while all other recipients continue to share the original copy.
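In code form, the quota arithmetic from this example is trivial but worth spelling out:

    # Physical vs. logical storage for a 10-KB message sent to three
    # recipients: one shared copy, four quotas charged.
    message_kb = 10
    quotas_charged = 4                       # three recipients plus the sender
    physical_kb = message_kb                 # single shared copy in the database
    logical_kb = message_kb * quotas_charged
    print(f"physical: {physical_kb} KB, logical: {logical_kb} KB")
    # physical: 10 KB, logical: 40 KB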

In the early days of Exchange, anyone who created an Exchange implementation plan worried about SIS. Perhaps this was because analysts considered SIS a major bonus of the Exchange architecture, especially as servers took on the load of the multiple PC LAN post offices that they replaced. However, I think the real reason was the need to conserve hardware resources. Consider that the systems in use in 1996-1997 were much smaller than today; disk space was more restricted and a lot more expensive, and network bandwidth had to be conserved. Now, systems are a lot faster and come equipped with more memory, the software makes better use of features such as multiple CPUs, copious disk space is available, and network bandwidth is cheaper. I have not met many administrators or system designers recently who think about SIS when they assess system design. The world has changed, and Microsoft considerably undermined the feature when it introduced Store partitioning.

What factors influence the SIS or sharing ratio that you see on a server? Here are a few that come to mind.

Messages sent to many users or to large distribution lists increase the ratio, because more users share a single copy of a message. If you can arrange for users who tend to send messages to each other to share a server, you get a higher sharing ratio. Apart from achieving a higher sharing ratio, keeping messages on a server whenever possible reduces network traffic and speeds message delivery. For much the same reason, the sharing ratio tends to be higher on larger servers than on smaller ones.

Messages sent to external Internet recipients tend to be less shareable than internal messages. This is a generalization, but if you look at the messages you send to Internet recipients, you will probably find that the majority go to a single recipient. Incoming Internet messages are usually addressed to a single recipient on the target server, which further reduces the sharing ratio.

Mailboxes transferred between servers with the standard "Move Mailbox" option (in both Exchange 5.5 and Exchange 2000/2003) preserve SIS as much as possible. The Store uses message properties to check whether a message already exists in the Store on the target server; if it does, the Store creates a new pointer. If the message does not exist (because it was never delivered to a mailbox on the target server, or because all copies have since been deleted), the Store creates a new copy of the content. Mailboxes transferred with the ExMerge utility always create new copies of message content; ExMerge does not check for existing content, because it is a simple export-import utility designed to extract data from mailboxes or import it.
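Continuing the toy model from the sketch earlier in this section, the SIS-preserving move logic amounts to the following (invented names; the real Store works with message properties and folder tables, which this sketch omits). ExMerge, by contrast, always takes the else branch:

    def import_message(target, message_id, fetch_content):
        """SIS-preserving import into the toy SingleInstanceStore above:
        reuse existing content where possible, copy it otherwise.
        (Updating the new mailbox's folder pointers is omitted.)"""
        if message_id in target.content:
            body, refs = target.content[message_id]
            target.content[message_id] = (body, refs + 1)      # new pointer only
        else:
            target.content[message_id] = (fetch_content(), 1)  # full new copy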

Exchange servers that run multiple Mailbox Stores have lower sharing ratios than those with a single Mailbox Store. The reason is simple. As soon as you split mailboxes across Stores, you increase the potential that Exchange must deliver any message to multiple databases. The more Stores you have on a server, the lower the overall sharing ratio will be. The implementation of multiple Stores offsets the higher sharing ratio that you tend to see with large servers.

Servers that host connectors tend to have lower sharing ratios than mailbox servers do. Messages passing through connector servers are transient and the servers often do not host many mailboxes, so the sizes of the Mailbox Stores are small and the sharing ratio is low.

It is easy to check the sharing ratio on your server: Exchange provides performance counters for the MSExchangeIS Mailbox object to do the job. A separate counter is available for each Mailbox Store, and there is a counter to track the overall sharing ratio as well. The Store calculates the counter by dividing the total number of entries in the message folder table by the total number of entries in the message table. The Store keeps individual messages as rows in the message table; each message is one row, no matter how many folders the message appears in. The Store holds the folder entries from all mailboxes as rows in the message folder table. Putting this structure together, if Exchange delivers a message to 20 users whose mailboxes are in the same database, the Store only needs to create a single row in the message table for the new message and then update the folder entries for the 20 Inboxes in the message folder table with pointers to the new message. Gradually, as users delete messages, the Store first moves pointers to the Deleted Items folder, and finally removes the data when users empty the Deleted Items folder. This is a simplified description of what happens, because the Store can "hide" messages and keep them for an additional period if a deleted-item retention period is set on the database.
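The calculation itself is a single division. The numbers below are invented, but they are chosen to reproduce the 1.129 ratio shown in Figure 7.15:

    # Sharing ratio as the Store computes it: rows in the message folder
    # table (one per mailbox's view of a message) divided by rows in the
    # message table (one per unique message).
    message_table_rows = 100_000
    message_folder_table_rows = 112_900
    print(f"sharing ratio: {message_folder_table_rows / message_table_rows:.3f}")
    # sharing ratio: 1.129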

Figure 7.15 shows the SIS performance counter on an Exchange 2000 server with a single Mailbox Store. The highlighted figure indicates a low sharing ratio (1.129), meaning that there are, on average, 1.129 references to every message in the Store. This ratio is at the low end of the expected scale and is indicative of a server where many users send messages off the server or to users whose mailboxes are in another Store. High ratios (4.0 or above) usually indicate that users are human packrats who do not delete messages as often as they should. Sharing ratios seen across hundreds of Exchange servers deployed at HP range from approximately 1.2 to 3.5. Anecdotal evidence gathered at conferences and in forums such as the msexchange list maintained at www.swynk.com suggests that you can consider a range of 1.5 to 2.5 as normal.

Figure 7.15: Monitoring SIS on an Exchange 2000 server.

Do not expect real-time updates for these performance counters; it would be unreasonable for Exchange to dedicate valuable resources to having the Store constantly count the messages and folders in a database. The values are unlikely to vary dramatically, unlike counters such as CPU use, so if you want to keep track of the sharing ratio on a server, record the value at a regular interval (weekly or monthly is enough). Watching Performance Monitor report the same ratio for hours on end is very boring.

Exchange also provides a single-instance ratio counter for items in the Public Store. On my Exchange server, the counter reported a figure of 22, implying that each item in a public folder has an average of 22 references. Mailboxes do not exist in a Public Store, so it is hard to imagine how a single item could be referenced more than once unless it is stored in multiple public folders, and attaining an average sharing ratio of 22 would mean an awful lot of cross-indexing between folders. Only 123 instances of public folders exist in this Public Store, and I cannot figure out how the result occurred. There are other occasions when the counters do not make sense. For example, Figure 7.16 proudly reports that a Mailbox Store has attained an average sharing ratio of 0.000, which is a touch on the low side. This value appeared following a cluster transition, when the Store moved some storage groups between two physical nodes, so I assume it is due to a glitch in the cluster transition code.

Figure 7.16: No sharing at all!





