Chapter 10 - Monitoring Exchange
Monitoring and Managing Microsoft Exchange 2000 Server
by Mike Daugherty
Digital Press © 2001
You can do some basic monitoring of processes and queues using each Exchange server's Monitoring tab. However, many other key resources should also be monitored regularly. Windows 2000 includes a Performance Monitor that can track these other Windows 2000 and Exchange 2000 resources, such as the number of Internet messages received per minute or the percentage of processor utilization. Performance Monitor can create charts, set alerts, and format reports that help the Exchange administrator measure system performance. Data gathered from counters can be displayed in real time or stored in log files for later analysis.
The usefulness of any monitoring effort depends on creating a baseline measurement while the system is operating effectively and on knowing the limits (both high-end and low-end values) for each parameter that indicate a problem. The values of Windows 2000 Performance Monitor counters can then be examined to determine how Exchange is performing or to track error conditions.
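As a minimal sketch of the baseline idea, assume the counter samples have already been collected as plain numbers. The function names, sample readings, and limits below are illustrative assumptions, not part of Performance Monitor:

```python
from statistics import mean


def build_baseline(samples):
    """Summarize samples taken while the system was operating effectively."""
    return {"low": min(samples), "high": max(samples), "average": mean(samples)}


def outside_limits(value, low_limit, high_limit):
    """Return True when a reading is below the low limit or above the high limit."""
    return value < low_limit or value > high_limit


# Hypothetical % Processor Time readings from a healthy period.
healthy = [22.0, 35.0, 41.0, 30.0, 27.0]
baseline = build_baseline(healthy)
print(baseline)

# Later readings are compared with the chosen low-end and high-end limits.
print(outside_limits(95.0, low_limit=20.0, high_limit=90.0))
print(outside_limits(45.0, low_limit=20.0, high_limit=90.0))
```

Keeping both a low and a high limit matters because an unusually low value (for example, a service that has stopped) can indicate a problem just as an unusually high one can.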
Performance Monitor tracks the value of object counters, where objects include the system's processors, memory, disks, and processes such as the Exchange 2000 processes. Each of these object types has a set of counters from which Performance Monitor can collect data. For example, a LogicalDisk object has counters for % Disk Time, Free Megabytes, and % Free Space.
Some object types have several instances. For example, the system's Processor object type will have multiple instances if a system has multiple processors. The PhysicalDisk object will have one instance for each disk drive. The counters for each instance of an object can be monitored independently.
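Windows commonly writes the object, instance, and counter together as a single counter path of the form \\Computer\Object(Instance)\Counter. The helper below is a hypothetical sketch of composing such paths; the function name and the server name EXCH01 are assumptions made for illustration:

```python
def counter_path(obj, counter, instance=None, computer=None):
    """Compose a counter path from its computer, object, instance, and counter parts."""
    prefix = "\\\\" + computer if computer else ""
    inst = "(" + instance + ")" if instance is not None else ""
    return prefix + "\\" + obj + inst + "\\" + counter


# An object without instances, on the local computer.
print(counter_path("Memory", "Pages/sec"))

# One path per processor instance on a two-processor system.
for cpu in ["0", "1"]:
    print(counter_path("Processor", "% Processor Time", instance=cpu))

# A logical drive on a remote (hypothetical) server.
print(counter_path("LogicalDisk", "% Free Space", instance="C:", computer="EXCH01"))
```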
Most typical monitoring can be done using the Windows 2000 Performance Monitor. Using Performance Monitor, the Exchange administrator can track the functioning of critical objects on an Exchange 2000 server. By carefully watching the monitored objects, the administrator can often detect a minor problem before it progresses into one that will affect users due to server downtime.
The administrator can create Microsoft Management Console files (.MSC) that define specific objects to be monitored. Once created, the MSC file contains all the settings (screen position, chart colors, and so on) required to monitor the object. If alerts have been configured, an alert is sent and the event details are written to the Windows 2000 Event Viewer log when a threshold is exceeded.
The Performance Monitor's chart view provides a continual, real-time overview of system performance. The following procedure can be used to create a Performance Monitor chart:
Start the Performance Monitor from the Windows 2000 Start menu by selecting Programs > Administrative Tools > Performance (Figure 10.12).
Figure 10.12: The Performance Monitor window
Select the View Chart button.
Select the Add button to display the Add Counters dialog box (Figure 10.13).
Figure 10.13: The Add Counters dialog box
Select the Exchange server to be monitored. You can monitor the local system by selecting Use local computer counters. To monitor another server in your network, select Select counters from computer and select the system from the associated drop-down list.
Use the Performance object drop-down list to select an object to monitor. Recommended objects to monitor are listed later in this chapter.
Select the counters and instances to be monitored. The set of available counters varies depending upon the object. Use the Instance list to select which instance of an object is to be monitored. For example, the Processor object type will have multiple instances if a system has multiple processors. You can select All counters to monitor all counters for an object and/or select All instances to monitor all instances. You can select the Explain button to display an explanation of the counter. When you have selected the counters and instances, select Add.
Repeat the previous steps to add additional counters to the chart. Select Close when all desired counters have been added to the chart.
To save these settings for subsequent use, select Save As from the Console menu. Provide a file name in the Save As dialog box. The Microsoft Management Console settings file (.MSC file) will save all Performance Monitor settings.
The Performance Monitor can be used to collect and record data over a period of time. These logs can later be analyzed to identify long-term trends or to troubleshoot problems. The following procedure can be used to create a Performance Monitor log file:
Start the Performance Monitor from the Windows 2000 Start menu by selecting Programs > Administrative Tools > Performance.
Expand the Performance Logs and Alerts item in the MMC tree pane (i.e., the left pane in the window), and then select the Counter Logs item (Figure 10.14).
Figure 10.14: The Counter Logs window
Right-click in the details pane (i.e., the right pane) and select New Log Settings.
Enter a name into the New Log Settings dialog box and select OK to display the log settings dialog box. The title of this dialog box will match the file name you entered into the New Log Settings dialog box (Figure 10.15).
Figure 10.15: The Log Settings dialog box
Select Add to display the Select Counters dialog box (Figure 10.16).
Figure 10.16: The Select Counters dialog box
Select the Exchange server to be monitored. You can monitor the local system by selecting Use local computer counters. To monitor another server in your network, select Select counters from computer and select the system from the associated drop-down list.
Use the Performance object drop-down list to select an object to monitor. Recommended objects to monitor are listed later in this chapter.
Select the counters and instances to be monitored. The set of available counters varies depending upon the object. Use the Instance list to select which instance of an object is to be monitored. For example, the Processor object type will have multiple instances if a system has multiple processors. You can select All counters to monitor all counters for an object and/or select All instances to monitor all instances. You can select the Explain button to display an explanation of the counter. When you have selected the counters and instances, select Add.
Repeat the previous steps to add additional counters. Select Close when all desired counters have been added.
On the General tab, set the Interval value to the desired interval for collecting sample data.
Select the Log Files tab (Figure 10.17).
Figure 10.17: The Log Files tab
Use the Location field to identify the directory where the log file will be stored.
Enter the log file name into the File name field.
You can use the End file names with field and Start numbering at field to append sequence numbers to the end of the log file names.
Select the Schedule tab (Figure 10.18).
Figure 10.18: The Schedule tab
Use the Start log fields to enter the time when logging operations should begin.
Use the Stop log fields to enter the time when logging operations should stop. Logging can be stopped automatically after a specified duration (e.g., 1 day), stopped at a specific time, or stopped manually using the shortcut menu.
When you have entered all information, select OK to create the logging entry.
To save these settings for subsequent use, select Save As from the Console menu. Provide a file name in the Save As dialog box. The Microsoft Management Console settings file (.MSC file) will save all Performance Monitor settings.
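Once a counter log has been collected, it can be summarized offline. The sketch below assumes the log is available as comma-separated text whose first column is a timestamp and whose remaining columns hold one counter each. The column headings and values in SAMPLE_LOG are invented for illustration, and a real log may use longer headings:

```python
import csv
import io
from statistics import mean

# Hypothetical log contents; a blank cell means no sample was recorded.
SAMPLE_LOG = """Time,% Processor Time,Pages/sec
01/15/2001 09:00:00,35.0,12.0
01/15/2001 09:02:00,45.0,18.0
01/15/2001 09:04:00,40.0,
"""


def summarize_log(text):
    """Return the average and maximum of every counter column in the log."""
    reader = csv.DictReader(io.StringIO(text))
    values = {}
    for row in reader:
        for name, cell in row.items():
            if name == reader.fieldnames[0]:
                continue  # skip the timestamp column
            if cell is None or cell.strip() == "":
                continue  # skip missing samples
            values.setdefault(name, []).append(float(cell))
    return {name: {"average": mean(v), "maximum": max(v)} for name, v in values.items()}


summary = summarize_log(SAMPLE_LOG)
for name, stats in summary.items():
    print(name, stats)
```

Summaries like these, produced at regular intervals, are one way to build the long-term trend data mentioned above.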
The Performance Monitor's alert view provides for setting thresholds on counters. When a counter's threshold is exceeded, the date and time of the event are recorded in the Alert window.
The following procedures can be used to specify alerts using Performance Monitor:
Start the Performance Monitor from the Windows 2000 Start menu by selecting Programs > Administrative Tools > Performance.
Expand the Performance Logs and Alerts item in the MMC tree pane (i.e., the left pane in the window), and then select the Alerts item (Figure 10.19).
Figure 10.19: The Alerts window
Right-click in the details pane (i.e., the right pane) and select New Alert Settings.
Enter a name into the New Alert Settings dialog box and select OK to display the alert settings dialog box. The title of this dialog box will match the name you entered into the New Alert Settings dialog box.
On the General tab, select Add to display the Select Counters dialog box (Figure 10.20).
Figure 10.20: The Select Counters dialog box
Select the Exchange server to be monitored. You can monitor the local system by selecting Use local computer counters. To monitor another server in your network, select Select counters from computer and select the system from the associated drop-down list.
Use the Performance object drop-down list to select an object to monitor. Recommended objects to monitor are listed later in this chapter.
Select the counters and instances to be monitored. The set of available counters varies depending upon the object. Use the Instance list to select which instance of an object is to be monitored. For example, the Processor object type will have multiple instances if a system has multiple processors. You can select All counters to monitor all counters for an object and/or select All instances to monitor all instances. You can select the Explain button to display an explanation of the counter. When you have selected the counters and instances, select Add.
Repeat the previous steps to add additional counters. Select Close when all desired counters have been added. The alert settings dialog box will be redisplayed.
Use the Alert when the value is drop-down list to select the condition to test (either Under or Over) and enter the threshold value for issuing the alert into the Limit field (Figure 10.21).
Figure 10.21: The General tab
Use the Interval field and the Units drop-down list to specify the desired interval for sampling data.
Select the Action tab to select the actions to be executed when the threshold value for the alert is reached. You may select as many of the actions as needed.
Select the Log an entry in the application event log check box if you want to log an entry when the threshold value is reached (Figure 10.22). You can view entries in the application event log using the Windows 2000 Event Viewer.
Figure 10.22: The Action tab
Select the Send a network message to check box if you want to send a network alert to a specified workstation. Enter the workstation name into the associated field. The alert message will be delivered only if the workstation is turned on, a user is logged on to the workstation, and the Messenger service is running on the workstation. This type of notification should normally be used only in environments where the network is very reliable, such as with a dedicated monitoring workstation on the same local area network as the monitor.
You can use the Start performance data log check box to start a predefined performance data log.
You can select Run this program to run your own specialized alert program. A special notification application can be used to alert administrators who are not logged on to the network. For example, a notification application can be used to start a pager program to page an Exchange administrator who is not always logged onto the system. Use the Command Line Arguments button to specify the arguments that are to be passed to the special notification application.
Select the Schedule tab (Figure 10.23).
Figure 10.23: The Schedule tab
Use Start scan to enter the time when monitoring operations should begin.
Use Stop scan to enter the time when monitoring operations should stop. Monitoring can be stopped automatically after a specified duration (e.g., 1 day), stopped at a specific time, or stopped manually using the shortcut menu.
When all information has been specified, select OK to create the alert entry.
To save these settings for subsequent use, select Save As from the Console menu. Provide a file name in the Save As dialog box. The Microsoft Management Console settings file (.MSC file) will save all Performance Monitor settings.
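The alert settings described above reduce to a condition (Over or Under), a limit, and a set of actions to run when the test succeeds. A minimal sketch of that logic follows. The names are hypothetical, a Python list stands in for the application event log, and the sample values are illustrative:

```python
def alert_triggered(value, condition, limit):
    """Apply an 'Over' or 'Under' test to a sampled counter value."""
    if condition == "Over":
        return value > limit
    if condition == "Under":
        return value < limit
    raise ValueError("condition must be 'Over' or 'Under'")


def run_alert(counter, value, condition, limit, actions):
    """Call every configured action when the threshold test succeeds."""
    if not alert_triggered(value, condition, limit):
        return False
    message = f"{counter} is {value} ({condition.lower()} limit {limit})"
    for action in actions:
        action(message)
    return True


event_log = []  # stand-in for the application event log

# The first sample is under the limit and fires; the second does not.
run_alert("% Free Space", 18.0, "Under", 25.0, [event_log.append, print])
run_alert("% Free Space", 40.0, "Under", 25.0, [event_log.append, print])
```

Passing the actions as a list mirrors the Action tab, where any combination of actions can be selected for a single alert.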
Windows 2000 and Exchange provide many objects that can be monitored. Monitoring all of the possible objects is unnecessary and, in fact, may adversely affect network and server performance. The objects to monitor most closely are those processes that handle message flow through the system. These are the Message Transfer Agent and the message queues for the connectors. Monitors for other secondary objects can be configured to help diagnose suspected problems.
Several types of counters should be monitored:
The Windows 2000 and hardware resources essential to proper functioning of Exchange 2000 should be carefully monitored. This includes such counters as the percentage of CPU time being used, the amount of free disk space available on key disk volumes, etc.
Another group of counters that must be monitored closely includes those counters that show whether e-mail messages are flowing through the system as expected. These counters are principally the number of messages awaiting processing in various Exchange 2000 queues.
A third set of counters includes those items that indicate how heavily the e-mail system is being used. This includes counters such as the number of currently active e-mail users, the rate at which messages are being processed, etc. Data from these counters should be collected when the e-mail system first becomes operational, and then periodic checks should be made to track the growth of e-mail usage. These counters can be helpful to justify increasing hardware to support increasing workloads.
There are a great number of Exchange counters that do not fall into any of the three categories listed above. These generally do not need to be closely tracked, but they are often useful for troubleshooting problems.
The administrator should create shortcuts for the Microsoft Management Console settings file (.MSC file) on the system that will be used to monitor the Exchange objects. The complete list of Performance Monitor counters is extensive. The following sections contain only those counters relevant to Exchange, and include some recommended Microsoft Management Console settings files.
The Performance Monitor counters in this section are recommended as the foundation set of counters for all Exchange servers. These counters monitor the Windows 2000 and hardware resources essential to proper functioning of Exchange 2000. Additional counters should be added in accordance with the function of the Exchange server. These counters should be combined into a single performance monitor graph with an update interval of about 2 minutes:
Logical Disk
% Disk Time. This counter records the percentage of time a hard drive is either reading or writing. A sustained value above 90% indicates that the hard drive is a performance bottleneck. (The diskperf -yv command must be used to activate disk monitoring.) There is one instance for each logical drive.
% Free Space. Trigger an alert if the amount of free disk space available drops below 25%. Logical disk space counters need to be enabled before they can be monitored. Enabling the logical disk counters is done using the diskperf -yv command on the system being monitored, and then rebooting the system. For more information about using the diskperf command, type diskperf -? at the command prompt. There is one instance for each logical drive, and each instance should be monitored.
Free Megabytes. This is the amount of free space on the transaction log drive. It is especially important to monitor the log drive to ensure that it does not fill up with log files. Log files are removed whenever an online backup is performed. If the log files are not being removed, verify that the backups are being completed successfully.
Memory
% Committed Bytes In Use. This counter is the ratio of Committed Bytes (physical memory in use for which space has been reserved in the paging file) to Commit Limit (determined by the paging file size). Trigger an alert if the use of virtual memory exceeds 80%.
Pages/sec. This counter measures memory paging from/to the virtual memory paging file. A sustained high number of pages/sec indicates the need for additional memory. Brief spikes generally do not indicate a problem and can be ignored.
Paging File
% Usage. The paging file usage should generally remain between 15% and 35%. Usage above 60% usually indicates a problem such as a memory leak or too little RAM. Rebooting the system will provide a short-term, temporary solution. A consistent usage greater than 90% should be considered a critical situation.
PhysicalDisk
Avg. Disk Queue Length. Trigger an alert when the average disk queue exceeds 8.
Process
% Processor Time. The processor time should be monitored for the following instances. (The exact list of processes will vary depending upon the Exchange components that have been installed on the system.)
CCMC: Microsoft Exchange Connector for Lotus cc:Mail
EMSMTA: Microsoft Exchange MTA Stacks
INETINFO: Internet Protocols, including IMAP4 and POP3
MAD: Microsoft Exchange System Attendant
MT: MS Mail Connector Interchange
STORE: Microsoft Exchange Information Store
The % Processor Time is the percentage of elapsed processor time that all of the process threads used. On systems with multiple processors, the maximum counter value is 100% times the number of processors. The processor time for these services should never be at 0% or at the maximum value (i.e., 100% times the number of processors) all of the time. If a process is always at 0%, check Programs > Administrative Tools > Services to verify that the process is running. If a process is always at the maximum value, check the Event Viewer to identify the problem.
Processor
% Processor Time. This counter records the percentage of time the processor is running non-idle threads and is the primary indicator of processor activity. Servers with multiple processors will have an instance (0, 1, 2, and so on) for each processor. Each instance (i.e., processor) should be monitored. An average value below 20% indicates the processor is lightly used or services are down. An average value consistently above 90% indicates that the processor is being overworked. Trigger an alert if the processor utilization exceeds 90%.
Redirector
Bytes Total/sec. This counter measures the bytes per second sent/received by the network redirector. To see if network traffic is a problem, compare the maximum throughput of the network card with the maximum value of this counter.
Network Errors/sec. This measures the number of unexpected network errors received by the redirector.
The exact list of objects and counters will vary depending upon the Exchange components being used on the system. For example, if the cc:Mail Connector software is not being used, then the counters for this component will not be available.
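Several of the recommendations above distinguish a sustained high value from a brief spike. One way to make that distinction is sketched here, under the assumption that "sustained" means a fixed number of consecutive samples over the limit. The window of five samples is an arbitrary choice, not a figure from this chapter, and the readings are invented:

```python
def is_sustained(samples, limit, min_consecutive=5):
    """Return True if at least min_consecutive samples in a row exceed limit."""
    run = 0
    for value in samples:
        run = run + 1 if value > limit else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical % Processor Time readings.
spike = [40, 95, 42, 38, 97, 41, 39, 40]      # isolated peaks only
overload = [88, 92, 95, 96, 93, 94, 91, 60]   # a long run above 90%

print(is_sustained(spike, limit=90))
print(is_sustained(overload, limit=90))
```

The window should be chosen in relation to the update interval; with samples taken every 2 minutes, five consecutive samples cover roughly ten minutes of readings.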
The counters listed in this section are those that show whether e-mail messages are flowing through the system as expected. These counters should be combined into a single performance monitor graph with an update interval of about 2 minutes:
MSExchangeCCMC
Microsoft Exchange MTS-IN. This is the current count of messages awaiting delivery to Microsoft Exchange.
Microsoft Exchange MTS-OUT. This is the current count of messages awaiting delivery to Lotus cc:Mail.
MSExchangeIS
RPC Requests. This is the number of client requests that are currently being processed by the information store.
MSExchangeIS Mailbox
Receive Queue Size. Trigger an alert if the number of messages in the private information store receive queue is greater than 20.
Send Queue Size. Trigger an alert if the number of messages in the private information store send queue is greater than 20.
MSExchangeIS Public
Receive Queue Size. Trigger an alert if the number of messages in the public information store receive queue is greater than 20. This should be set on public folder servers only.
Send Queue Size. Trigger an alert if the number of messages in the public information store send queue is greater than 20. This should be set on public folder servers only.
MSExchangeMTA
Work Queue Length. This counter is the count of messages in the MTA queues awaiting delivery to other servers or awaiting processing by the MTA. The Work Queue Length should increase and decrease between 0 and 50. When messages are stuck in the queue, the counter will remain level or only increase for extended periods. Watch for artificial floors on the MTA queue.
A high number indicates a probable problem. An alert should be triggered if the Work Queue Length is greater than 100.
Divide this value by the Messages/Sec value to get an estimate of the delay that messages experience when delivered or sent.
MSExchangeNMC
Message Queued Inbound. This is the current count of Lotus Notes messages queued at the connector for delivery to Exchange.
Message Queued Outbound. This is the current count of Exchange messages queued at the connector for delivery to Lotus Notes.
MSExchangeSRS
Pending Replication Synchronizations. This counter shows the number of unanswered synchronization requests sent by this server. The synchronization process is complete when the Pending Replication Synchronizations counter and the Remaining Replication Updates counter both reach zero.
Remaining Replication Updates. This counter shows the number of object modifications waiting to be applied to the local server. The synchronization process is complete when the Pending Replication Synchronizations counter and the Remaining Replication Updates counter both reach zero.
SMTP Server
Categorizer Queue Length. This is the current count of messages in the categorizer queue.
Current Messages in Local Delivery. This is the current count of messages that are being processed by a server event sink for local delivery.
Local Queue Length. This is the current count of messages in the local queue.
Local Retry Queue Length. This is the current count of messages in the local retry queue.
Messages Pending Routing. This is the current count of messages that have been categorized but not routed.
Remote Queue Length. This is the current count of messages in the remote queue.
Remote Retry Queue Length. This is the current count of messages in the retry queue for remote delivery.
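The MTA guidance in this list (divide Work Queue Length by Messages/Sec to estimate delay, and watch for a queue that stays level or only grows) can be sketched as follows. The function names and sample readings are illustrative assumptions:

```python
def estimated_delay_seconds(work_queue_length, messages_per_sec):
    """Estimate queueing delay as queue length divided by processing rate."""
    if messages_per_sec <= 0:
        return None  # no throughput, so the delay cannot be estimated
    return work_queue_length / messages_per_sec


def queue_looks_stuck(queue_lengths):
    """Return True when a non-empty queue never shrinks across the samples."""
    if len(queue_lengths) < 2 or min(queue_lengths) == 0:
        return False
    return all(later >= earlier
               for earlier, later in zip(queue_lengths, queue_lengths[1:]))


# 60 queued messages processed at 2 messages per second.
print(estimated_delay_seconds(60, 2.0))

# A queue that only holds level or grows, versus one that drains normally.
print(queue_looks_stuck([12, 12, 15, 15, 19]))
print(queue_looks_stuck([12, 30, 8, 0, 14]))
```

A queue that never drops below some non-zero value over a long series of samples is the "artificial floor" pattern worth investigating.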
The counters listed in this section are those that provide an indicator as to how heavily the e-mail system is being used. Unlike the first two groups of counters, these do not need to be as closely watched. Instead, the administrator should periodically collect data for these counters and compare the collected data to the system baseline created when the e-mail system was first configured. The need to add additional hardware resources can often be justified by the e-mail usage increases shown with these counters:
MSExchangeCCMC
Messages sent to Lotus cc:Mail/hr. This is the rate that messages are being sent to cc:Mail.
Messages sent to Microsoft Exchange/hr. This is the rate that messages are being received from cc:Mail.
MSExchangeIS
Active User Count. This is the number of user connections that have shown some activity in the last 10 minutes.
User Count. This is the number of users connected to the information store.
MSExchangeIS Mailbox
Messages Delivered/min. This counter tracks the rate at which messages are being delivered to the Private Information Store. This includes both messages delivered to the information store by the MTA and those submitted directly to the information store from clients on this server.
This counter should usually be in the range of 10 to 40 messages per minute. If the value is constantly under 5 messages per minute while there are pending items in the MTA queue, then it is possible that the server is under a heavy load or there is a problem with one of the processes. If this value is very high (e.g., greater than 200 messages per minute) for an extended period, there may be a message stuck in the MTA queue.
Messages Sent/min. This counter tracks the rate at which messages are sent from the Information Store to the MTA to be transported to other servers or gateways.
MSExchangeIS Public
Messages Delivered/min. This counter tracks the rate at which messages are being delivered to the Public Information Store. This includes both messages delivered to the information store by the MTA and those submitted directly to the Information Store from clients on this server.
Messages Sent/min. This counter tracks the rate at which messages are sent from the Public Information Store to the MTA to be transported to other servers or gateways.
MSExchangeMSMI
Messages Received/hr. This is the rate that Exchange is receiving MS Mail messages.
Messages Sent/hr. This is the rate that Exchange messages are being sent to MS Mail.
MSExchangeMTA
LAN Receive Bytes/Second. This is the rate that bytes are received over a LAN from MTAs.
LAN Transmit Bytes/Second. This is the rate that bytes are transmitted over a LAN to MTAs.
Messages/Sec. This counter is the running average of the rate that messages are processed. This counter can be used to monitor the message traffic between servers.
MSExchangeNMC
Rate of messages received. This is the rate that Exchange is receiving Lotus Notes messages.
Rate of messages sent. This is the rate that Exchange messages are being sent to Lotus Notes.
MSExchangePCMTA
Messages Received/hr. This is the rate that Exchange is receiving messages.
Messages Sent/hr. This is the rate that messages are being sent by Exchange.
SMTP Server
Messages Delivered/sec. This is the rate that messages are delivered to local mailboxes.
Messages Received/sec. This is the rate that inbound messages are being received.
Messages Sent/sec. This is the rate that outbound messages are being sent.
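Comparing periodically collected usage data with the original baseline comes down to a percentage-growth calculation per counter. The following sketch uses invented baseline and current figures, and the function name is an assumption:

```python
def percent_growth(baseline, current):
    """Return the growth of current over baseline as a percentage."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) * 100.0 / baseline


# Hypothetical readings: when the system first went live, and today.
baseline_usage = {"User Count": 400, "Messages Delivered/min": 20.0}
current_usage = {"User Count": 500, "Messages Delivered/min": 29.0}

growth = {name: percent_growth(baseline_usage[name], current_usage[name])
          for name in baseline_usage}
for name, pct in growth.items():
    print(f"{name}: {pct:+.1f}% versus baseline")
```

Figures of this kind, tracked over several collection periods, give a concrete basis for a request for additional hardware.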
The primary role of the remaining performance monitor counters is to aid with troubleshooting. There are many Exchange-related objects, including:
MSExchange Oledb Events | MSExchangeIS Mailbox |
MSExchange Oledb Resource | MSExchangeIS Public |
MSExchange Web Mail | MSExchangeIS Transport Driver |
MSExchangeAL | MSExchangeMSMI |
MSExchangeCCMC | MSExchangeMTA |
MSExchangeCONF | MSExchangeMTA Connections |
MSExchangeDcsMgr | MSExchangeNMC |
MSExchangeDSAccess Caches | MSExchangePCMTA |
MSExchangeDSAccess Contexts | MSExchangePOP3 |
MSExchangeDSAccess Processes | MSExchangeSA NSPI Proxy |
MSExchangeES | MSExchangeSRS |
MSExchangeGWC | MSExchangeT120 |
MSExchangeIMAP4 | MSExchangeTransport Store Driver |
MSExchangeIpconf | SMTP NTFS Store Driver |
MSExchangeIS | SMTP Server |
Hundreds of counters for these objects can be monitored as needed to diagnose problems. The following is a small sample of the additional objects that may be useful for troubleshooting:
MSExchangeCCMC
DirSynch to Lotus cc:Mail. This counter is the number of directory updates sent to Lotus cc:Mail since the last DirSynch started.
DirSynch to Microsoft Exchange. This counter shows the number of entries updated in the Exchange global address list since the last directory synchronization cycle.
NDRs to Lotus cc:Mail. This counter shows the number of non-delivery reports submitted to Lotus cc:Mail by the connector since the Exchange Connector for Lotus cc:Mail service was started.
NDRs to Microsoft Exchange. This counter shows the number of non-delivery reports submitted to Exchange by the connector since the Exchange Connector for Lotus cc:Mail service was started.
Messages sent to Microsoft Exchange. This is the number of messages sent from Lotus cc:Mail to an Exchange Server since the Exchange Connector for Lotus cc:Mail service was started.
Messages sent to Lotus cc:Mail. This counter shows the number of messages sent from Exchange to Lotus cc:Mail since the Exchange Connector for Lotus cc:Mail service was started.
MSExchangeIS Mailbox
Average Delivery Time. This counter shows the average length of time that the ten most recent messages waited in the Information Store queue before being transferred to the MTA. A high value often indicates an MTA performance problem.
Average Local Delivery Time. This counter shows the average length of time that the ten most recent local delivery messages waited in the Information Store queue before being transferred to a local mailbox. A high value could indicate a Private Information Store performance problem.
Message Recipients Delivered/min. This counter shows a continuous average of the number of messages sent per minute divided by the number of recipients to which the messages were sent. This provides a fairly accurate count of the actual number of deliveries.
MSExchangeIS Public
Average Time for Delivery. This counter shows the average length of time that the ten most recent messages waited in the Public Information Store queue before being transferred to the MTA. A high value often indicates an MTA performance problem.
Average Time for Local Delivery. This counter shows the average length of time that the ten most recent local delivery messages waited in the Public Information Store queue before being transferred to a local mailbox. A high value could indicate a Public Information Store performance problem.
Message Recipients Delivered/min. This counter shows a continuous average of the number of messages sent per minute divided by the number of recipients to which the messages were sent. This provides a fairly accurate count of the actual number of deliveries.
MSExchangeMSMI
Messages Received. This counter measures the number of messages received by Exchange from the Microsoft Mail Connector since MS Mail Connector Interchange was started. If this number is increasing, the connector is receiving mail. If this number is not changing, there could be either no mail to transfer or there could be a problem.
MSExchangePCMTA
LAN/WAN Messages Moved/hour. This is the rate that LAN/WAN messages are moved.
File contentions/hour. The Microsoft Mail Connector MTA, any other Microsoft Mail MTA, and MS Mail clients try to get exclusive read/write access to key post office files. Some number of file contentions is normal. However, too many contentions could indicate a locked file or too much traffic going through the post office.
SMTP Server
Inbound Connections Current. This is the current count of connections to the SMTP service established by other SMTP hosts.
Inbound Connections Total. This is the total number of connections the SMTP service has accepted from other hosts since the service was started.
Outbound Connections Current. This is the current count of connections the SMTP service has established to other SMTP hosts.
Outbound Connections Refused. This is the total number of connections the SMTP service has attempted to other hosts that have been refused since the service was started.
Outbound Connections Total. This is the total count of successful connections that the SMTP service has established since the service was started.
Messages Received Total. This is the total number of inbound messages accepted.
NDRs Generated. This counter shows the total number of non-delivery reports generated for inbound mail.
Messages Sent Total. This is the total number of outbound messages delivered to their destinations.
Message Bytes Received Total. This counter shows the total size of all inbound messages transferred to Exchange.
Message Bytes Sent Total. This counter shows the total size of all outbound messages transferred from Exchange.
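Many of the troubleshooting counters above are running totals since a service started. Comparing two readings of such a total shows whether it is still moving and at what rate. The sketch below uses hypothetical readings and a hypothetical function name:

```python
def rate_per_minute(earlier_total, later_total, elapsed_seconds):
    """Convert two readings of a running total into a per-minute rate."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    if later_total < earlier_total:
        return None  # the total went backwards, e.g. the service was restarted
    return (later_total - earlier_total) * 60.0 / elapsed_seconds


# Two readings of a "Messages Received Total" style counter, 120 seconds apart.
print(rate_per_minute(1500, 1620, 120))   # the total is climbing
print(rate_per_minute(1500, 1500, 120))   # the total is not changing
print(rate_per_minute(1500, 40, 120))     # the total was reset in between
```

A rate of zero is not conclusive on its own: as noted for the MS Mail connector, an unchanged total can mean either that there is no mail to transfer or that there is a problem.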
The Windows 2000 performance monitor MMC files can also be added to the Startup folder using the following procedure:
Open the Taskbar and Start Menu Properties window from the Windows 2000 Start menu by selecting Settings > Taskbar & Start Menu.
Select the Advanced tab.
Select Add.
Enter the path and file name for the performance monitor MMC file using the following format:
mmc filename
where filename is the path and file name of a performance monitor MMC file.
Select Next to display the Select Program Folder window.
Select the Startup folder, and then select Next.
Enter a name to appear on the Startup menu, and then select Finish.
In the Taskbar and Start Menu Properties window, select OK.