Introduction


Exchange uses a set of databases to store mail and public folder data. At a very basic level, Exchange is simply a highly specialized database management system. Understanding how to optimize your organization's use of Exchange databases and transaction logs will ensure that you get the best performance and reliability from your messaging infrastructure.

Storage Groups

Exchange 2000 introduced the concept of the storage group (SG): a set of databases (each one of which can contain mailbox or public folder data) that share a common set of transaction logs. The impetus behind this design is simple: you can mount or dismount individual databases, which means you can back up, restore, repair, defragment, or otherwise work with one database without affecting users whose mailboxes are in other databases. Exchange 2000 and Exchange Server 2003 Enterprise Edition servers can have up to four SGs, whether they're standalone or clustered, with up to five databases in each SG. The Standard edition of these products allows you to create one MAPI public folder database and one mailbox database (although you can create additional non-MAPI public folder stores, and even multiple SGs, if you like). Each of these databases is limited to 16 GB in size; Enterprise Edition databases don't have set size limits.
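
To make the limits concrete, here's a minimal sketch in Python that models the Enterprise Edition limits just described. Everything in it (class names, fields, error messages) is invented for illustration and has nothing to do with Exchange's actual internals:

    # A toy model of the Exchange 2000/2003 Enterprise Edition limits:
    # up to 4 storage groups per server, up to 5 databases per SG, with
    # every database in a SG sharing one set of transaction logs.
    from dataclasses import dataclass, field

    MAX_STORAGE_GROUPS = 4
    MAX_DATABASES_PER_SG = 5

    @dataclass
    class StorageGroup:
        name: str
        databases: list = field(default_factory=list)

        def add_database(self, db_name: str) -> None:
            if len(self.databases) >= MAX_DATABASES_PER_SG:
                raise ValueError(f"{self.name}: at most "
                                 f"{MAX_DATABASES_PER_SG} databases per SG")
            self.databases.append(db_name)

    @dataclass
    class ExchangeServer:
        storage_groups: list = field(default_factory=list)

        def add_storage_group(self, sg: StorageGroup) -> None:
            if len(self.storage_groups) >= MAX_STORAGE_GROUPS:
                raise ValueError(f"at most {MAX_STORAGE_GROUPS} SGs per server")
            self.storage_groups.append(sg)

    server = ExchangeServer()
    sg = StorageGroup("First Storage Group")
    server.add_storage_group(sg)
    sg.add_database("Mailbox Store (SERVER1)")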

Exchange Server 2003 offers a special additional SG known as a recovery storage group (RSG); the RSG can be used to mount any database from any Exchange 2000 or later server in the organization. Once the database is mounted in the RSG, you can copy mailbox data out of it and into another database, making it easy to set up "life support" messaging during a disaster recovery.

The EDB Database Files

Exchange stores much of its message data in files known as EDB files, after their file extension. These files contain all of the message, folder, and attachment properties that Exchange exposes via MAPI and WebDAV; apart from the messages and mailboxes themselves, there are a large number of indices and tables that are dynamically created and updated to provide improved performance for various types of queries. Exchange has always used a custom database engine known as the Extensible Storage Engine, or ESE. ESE provides a low-level interface that the Information Store service builds on; ESE is mostly concerned with the content and arrangement of the 4 KB pages that make up the individual database files. ESE is optimized for storing MAPI messages, which have an interesting quirk: the set of properties stored for two different messages may be very different, making it hard to efficiently construct a relational database structure to contain them.
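
To see why that quirk frustrates a fixed relational schema, consider this illustrative Python fragment. The property tags are genuine MAPI names, but the dictionaries are just a teaching device, not how ESE actually lays out its pages:

    # Two messages with largely disjoint property sets. A fixed table would
    # need a column for the union of every property any message might carry,
    # mostly NULL; a per-item property bag stores only what's actually there.
    msg_a = {
        "PR_SUBJECT": "Quarterly numbers",
        "PR_SENDER_NAME": "alice",
        "PR_HAS_ATTACH": True,
    }
    msg_b = {
        "PR_SUBJECT": "Lunch?",
        "PR_IMPORTANCE": 1,
        "PR_READ_RECEIPT_REQUESTED": True,
    }

    # The hypothetical relational view: one column per known property.
    for col in sorted(set(msg_a) | set(msg_b)):
        print(f"{col:28} {str(msg_a.get(col)):20} {msg_b.get(col)}")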

Among its other duties, ESE is responsible for implementing transaction logging (more on that in the next section), providing an interface for reading and writing individual database pages, and managing the process of performing backups using the ESE backup API or the Windows Server 2003 Volume Shadow Copy Service (VSS).

The Streaming Database Files

One of the biggest scalability limits in Exchange 5.5 came from Exchange using its own native storage format (the Transport Neutral Encapsulation Format, or TNEF) to store items in the store. Unfortunately, TNEF bears approximately zero resemblance to the Multipurpose Internet Mail Extensions (MIME) standard, the way mail messages are actually transported around on the Internet. That meant that any time a client used SMTP to submit a message to the store, it had to be converted from MIME to TNEF. Worse yet, any time an Internet-protocol client wanted to use POP3, IMAP4, or HTTP to get a message out, it had to be converted back. Worst of all, the content conversion always took place, even if it wasn't necessary! This ate up so much CPU and I/O bandwidth that Microsoft decided to take an alternate approach for Exchange 2000.

The streaming media, or STM, file is that approach. It provides a way for the Exchange components to stream data in on-the-wire formats like MIME, without breaking it up into 4 KB chunks and storing it in an EDB file. That means that a 10 MB PowerPoint presentation or a 25 MB QuickTime movie will be stored with no content conversion and without being split up. Content is no longer converted unless it's absolutely necessary: when a message is received through an Internet protocol, the data is passed through to the STM file, and the native MIME streams are left intact. If another Internet-protocol client attempts to read the data, it is streamed directly out of the native content store without ever being converted. If a MAPI client such as Outlook 2000 attempts to read data that resides in the native content store, the store process converts the MIME stream into a set of MAPI properties on demand and passes the data to the Outlook client. Better yet, this on-demand conversion takes place entirely in memory unless the data is changed as part of the process (for example, if an Outlook user opens that PowerPoint presentation, edits it, and saves it again).
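
The conversion rules boil down to one decision: convert only when the requesting protocol doesn't match the stored format. Here's a hedged Python sketch of that logic; the function and its return strings are invented stand-ins, not Exchange code:

    # On-demand conversion: MIME content stays native in the STM file and is
    # converted to MAPI properties in memory only when a MAPI client asks.
    def read_message(native_format: str, client_protocol: str) -> str:
        internet = {"POP3", "IMAP4", "HTTP"}
        if native_format == "MIME" and client_protocol in internet:
            return "streamed straight from the STM file, no conversion"
        if native_format == "MIME" and client_protocol == "MAPI":
            return "MIME converted to MAPI properties in memory, on demand"
        if native_format == "MAPI" and client_protocol in internet:
            return "MAPI content converted to MIME on the way out"
        return "read from the EDB file, no conversion"

    print(read_message("MIME", "IMAP4"))  # native stream, untouched
    print(read_message("MIME", "MAPI"))   # converted only because it must be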

The store uses a very aggressive online defragmentation process to keep as much contiguous free space as possible in the STM file. Interestingly, STM files are like magnetic poles: they don't exist by themselves. STM files are always paired with EDB files, and the two files share a set of transaction logs, are backed up together, and generally combine to act as a unit. The two are even added together to determine the store size when calculating the 16 GB database size limit in the Standard editions of Exchange.
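
That combined-size rule is easy to get wrong if you check only one file. A quick back-of-the-envelope check in Python, with made-up file sizes:

    # Standard Edition's 16 GB cap applies to the EDB and STM files combined.
    GB = 1024 ** 3
    LIMIT = 16 * GB

    edb_bytes = 11 * GB   # hypothetical priv1.edb
    stm_bytes = 6 * GB    # hypothetical priv1.stm

    total = edb_bytes + stm_bytes
    print(f"store size: {total / GB:.0f} GB of {LIMIT / GB:.0f} GB allowed")
    print("over the limit" if total > LIMIT else "within the limit")
    # Either file alone looks fine; together they're 17 GB and over the cap.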

The Transaction Logs

The fundamental idea behind logging is simple: the logs store copies of all transactions. These stored transactions can be played back later to restore a corrupted database, or even to retry a transaction that didn't complete successfully. Logging is a mainstay of relational database engines because it provides a backup mechanism for transactions to help preserve the ACID properties of the database:

  • Atomicity means that a transaction is a complete unit unto itself. If you withdraw $100 from your bank's ATM, you want the two separate operations of debiting your account $100 and giving you the money to be treated as one indivisible unit (this exact scenario is sketched in code after this list).

  • Consistency means that a transaction takes the database from one valid state to another; if you apply a transaction to a database in a given state, the result will always be the same.

  • Isolation means that a transaction, when applied, should only directly affect its target data; if you alter record A, then there should be no changes to record B.

  • Durability means that the transaction changes are persisted to the database, so they don't disappear when the database is shut down.
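
To make the ATM example concrete, here's a toy Python transaction. None of this is Exchange code; it's just the atomicity idea in miniature:

    class TransactionAborted(Exception):
        pass

    def withdraw(accounts: dict, acct: str, amount: int) -> None:
        snapshot = dict(accounts)      # "before" image, for rollback
        try:
            accounts[acct] -= amount   # operation 1: debit the account
            if accounts[acct] < 0:
                raise TransactionAborted("insufficient funds")
            print(f"dispensing ${amount}")  # operation 2: hand over the cash
        except Exception:
            accounts.clear()
            accounts.update(snapshot)  # roll back: as if nothing happened
            raise

    accounts = {"alice": 250}
    withdraw(accounts, "alice", 100)   # both operations succeed together
    try:
        withdraw(accounts, "alice", 500)
    except TransactionAborted:
        print(f"balance unchanged: ${accounts['alice']}")  # still $150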

Although Exchange treats the logs for a SG as a single entity, they're actually a set of files, each of which is exactly 5 MB (that's 5,242,880 bytes, for you purists) in size, even if there aren't any transactions in it. If you see a log file that's any other size, it's probably corrupted. The IS maintains a current log file for each SG, named with the SG's log prefix (E00.log for the first SG, E01.log for the second, and so on). When the log file fills up, the IS service renames it, using a sequential hexadecimal generation number (the first file is E0000001.log, the second is E0000002.log, and so on). These renamed log files are called generations; the current log file always represents the highest, or most recent, generation. Note that just because a log file is full doesn't mean its transactions have been committed; all commits happen according to a process I'll describe in a moment.
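
The naming and size rules are mechanical enough to express in a few lines of Python. This sketch follows the scheme just described; the helper names are invented:

    import os

    LOG_SIZE = 5 * 1024 * 1024   # every generation is exactly 5,242,880 bytes

    def generation_filename(prefix: str, generation: int) -> str:
        """Five hex digits after the SG prefix: E00 + 00001 -> E0000001.log"""
        return f"{prefix}{generation:05X}.log"

    def looks_suspect(path: str) -> bool:
        """A closed log generation that isn't exactly 5 MB is probably bad."""
        return os.path.getsize(path) != LOG_SIZE

    print(generation_filename("E00", 1))    # E0000001.log
    print(generation_filename("E00", 16))   # E0000010.log (hex, not decimal)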

The log files contain a number of tidbits that are useful if the logs have to be played back during server recovery, including the full path to the database files, information about which generation of log data is in the file, a signature and timestamp for the data, and a signature for the database. This header information enables the store to make sure that each log file is replayed into the correct database file, and to balk if you do something like try to restore files from one machine onto another. Of course, the log files also contain information about the transactions themselves. For each transaction, the log records the type of transaction (i.e., whether the transaction represents a change, a rollback of a previous change, or a commit of a previous change). These transactions record the low-level modifications to individual pages and tables within the database.
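
As an illustration of those replay safeguards, here's a hedged sketch of the kind of checks involved. The dict fields and function are invented; real ESE log headers are binary structures:

    def can_replay(log_header: dict, db_header: dict) -> bool:
        # Refuse a log signed for a different database (e.g., files restored
        # from another machine) or one that's out of generation sequence.
        if log_header["db_signature"] != db_header["signature"]:
            return False
        if log_header["generation"] != db_header["last_generation"] + 1:
            return False
        return True

    log = {"db_signature": "a1b2", "generation": 17}
    db = {"signature": "a1b2", "last_generation": 16}
    print(can_replay(log, db))   # True: right database, right order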

The logging process

Logging transactions is a good way to keep the database unsullied and consistent; however, it carries some performance cost. A simplistic logging mechanism would just log transactions to a file, then periodically inject them into the database. The Exchange logging process is quite a bit smarter; it works like this (a sketch of the whole sequence follows the list):

  1. Something happens (a message arrives, a draft is saved) and a new database transaction is created by the information store. The transaction only reflects data that has changed; for example, if you open a draft message in your mailbox, edit it, and resave it, the transaction will contain only your changes, not the entire draft.

  2. The timestamp on the page that will be changed by the new transaction is updated.

  3. The transaction is logged to the current generation of log file for the service that owns it. Transactions are written to the log file in sequential order. Once the transaction has been logged, the calling entity assumes that it will be properly registered in the database and goes about its business.

  4. The transaction is applied to the version of the store database cached in RAM. The store never records a transaction to the cached database until the transaction has been logged.

  5. When the log file hits its maximum size, the service that owns the log file renames it and creates a new log generation. This log file will stay on disk until it's purged during an online backup.

  6. Exchange copies the transactions from the cached copy in RAM back to the disk version of the database. This so-called "lazy commit" strategy means that at any point in time the "real" database consists of data from the database file on disk, data from the database copy in RAM, and as-yet-uncommitted transactions.
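
Here's the whole sequence as a toy write-ahead log in Python. The class is invented for illustration; the point is the ordering (log first, RAM cache second, disk last) and what recovery has to do after a crash:

    class ToyStore:
        def __init__(self):
            self.log = []     # stands in for the current log generation
            self.cache = {}   # database pages cached in RAM (step 4)
            self.disk = {}    # the database file on disk (step 6)

        def apply(self, page: int, change: str) -> None:
            self.log.append((page, change))   # step 3: log it first
            self.cache[page] = change         # step 4: only then touch RAM

        def lazy_commit(self) -> None:
            self.disk.update(self.cache)      # step 6: flush pages to disk

        def recover(self) -> None:
            # After a crash the cache is gone; replay the log into the disk
            # copy to bring the database back to a consistent state.
            for page, change in self.log:
                self.disk[page] = change

    store = ToyStore()
    store.apply(7, "new message in Inbox")
    assert 7 not in store.disk    # the "real" database spans log, RAM, disk
    store.recover()               # replay makes the disk copy current again
    assert store.disk[7] == "new message in Inbox"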

When a database is dismounted normally (say, by ESM or when the server is shut down normally), any transactions that have been made to the in-RAM copy of the database are committed to the disk version, and the checkpoint file is updated to reflect which transactions have been committed. If the service is shut down abnormally (say, by a power failure), when it restarts, it will scan its inventory of log files and play back any uncommitted transactions from the log files to the database. This means that it's very important not to move, delete, edit, or otherwise disturb the log files until their transactions have been committed.

The checkpoint file

How does the IS track which transactions have been committed? The IS service maintains a checkpoint file for each SG, named with the SG's log prefix (E00.chk for the first SG). Whenever a transaction is committed, the checkpoint file is updated to point to that transaction. The service uses the checkpoint file at startup time; if the file is present, transactions are played back from the checkpoint to the end of the last available log file. The checkpoint file tells the store which transaction log files contain uncommitted transactions and would be needed in case of a crash. If the checkpoint file is missing or damaged, Exchange can scan each log file and check whether its transactions have been committed, but this is much slower than using the checkpoint file.
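
The payoff is easiest to see in code. This Python sketch contrasts recovery with and without a usable checkpoint; the record layout is invented:

    def recover(log_records, checkpoint=None):
        if checkpoint is not None:
            # Fast path: replay only from the checkpoint onward.
            return log_records[checkpoint:]
        # Slow path: no checkpoint, so scan every log record and keep the
        # ones whose transactions were never committed to the database.
        return [r for r in log_records if not r["committed"]]

    logs = [{"txn": i, "committed": i < 3} for i in range(5)]
    print(recover(logs, checkpoint=3))    # replays transactions 3 and 4
    print(recover(logs))                  # same answer, via a full scan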

The reserve logs

Since transaction processing depends on log files, it's fair to wonder what happens if there isn't enough disk space to create a new one. As a last-ditch defense against running out of space, the IS service maintains two reserve log files named res1.log and res2.log. When the current log file fills up and is renamed, if there's not enough space to create a new generation, the store service will use a reserve file instead. If this happens, ESE sends a remote procedure call to the service; when the service gets this special emergency message, it flushes any uncommitted transactions from memory into the reserve log files and then shuts down cleanly. The service also logs an event in the system event log; if your IS service won't start, check the event log to make sure you have adequate free space.
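
A rough sketch of that fallback, with the behavior compressed into one invented Python function:

    def next_log(free_bytes: int, reserves: list) -> str:
        LOG_SIZE = 5 * 1024 * 1024
        if free_bytes >= LOG_SIZE:
            return "new generation created normally"
        if reserves:
            # This is where ESE signals the store, which flushes its
            # uncommitted transactions into the reserve log, shuts down
            # cleanly, and writes an event to the system event log.
            return f"disk full: flushing into {reserves.pop(0)}, shutting down"
        raise RuntimeError("out of log space and out of reserve logs")

    reserves = ["res1.log", "res2.log"]
    print(next_log(10 * 1024 * 1024, reserves))   # plenty of room
    print(next_log(0, reserves))                  # falls back to res1.log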

Circular logging

The normal process of creating logs is appropriate for most servers, because you want a complete record of all transactions that have occurred. However, in some circumstances you might want to keep a reduced set of log data. For example, SMTP bridgehead servers have to have a mounted mailbox database in order to generate NDRs, but there's no reason to accumulate tons of log files for that database since it won't contain any useful data. You might also want to reduce the number of log files generated by activities that would ordinarily generate large numbers of log files, like moving mailboxes from one server to another.

Exchange has long supported an additional logging mode called circular logging, in which only a limited number of log files are kept. After a log's transactions have been committed to the information store, the log file is overwritten. This reduces the disk space required for logging, but when circular logging is enabled you can only recover to the most recent full backup; you can't play back the same range of transactions as you could without circular logging. Microsoft recommends that you leave circular logging disabled on regular servers, enabling it only when you specifically want to reduce the number of log files kept on a server.
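
Circular logging behaves like a fixed-size ring, which a few lines of Python can mimic; the ring size here is arbitrary:

    from collections import deque

    ring = deque(maxlen=4)            # keep only the newest few generations
    for gen in range(1, 10):
        ring.append(f"E00{gen:05X}.log")

    print(list(ring))   # only the 4 most recent generations survive
    # With standard logging, all 9 would stay on disk until an online backup
    # purged them, so you could roll forward to any point in time.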


