Chapter 10 - Monitoring Exchange
Monitoring and Managing Microsoft Exchange 2000 Server
by Mike Daugherty  
Digital Press 2001
 

10.1 Monitoring policies

Regardless of which software tools are used to monitor the Exchange messaging environment, they are just tools. Even the best tools, if used improperly or inconsistently, will not provide the desired services. Discipline, in the form of established monitoring policies, must be combined with proper use of the software tools to provide mission-critical monitoring of the Exchange environment. This section suggests guidelines that those policies should establish.

Most operations teams deploy a single system to monitor the health of the messaging system. However, the team should consider having a second monitoring system at another site in case the primary monitoring site experiences a major outage. If multiple monitoring systems are deployed, it is important to ensure that they all use the same tools. Each monitoring tool collects data using its own specialized process, and the combined impact of several different data collection processes could unduly affect Exchange server performance. Using a standard monitoring tool across the corporation will ensure that the monitoring process does not overly affect the servers. The use of common monitoring tools will also facilitate sharing skills and personnel.

The monitoring policy should define the objects to be monitored, the conditions to be tested, the polling frequency, and the actions to be taken. Two general conditions indicate a situation that requires attention: state changes and threshold exceptions. Monitoring policies should be defined for each of these conditions.
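As a rough illustration, a policy entry and the two kinds of conditions can be sketched in Python. This is only a sketch under stated assumptions: the class name, field names, the "SMTP queue length" object, and the numeric values are all hypothetical and do not come from any particular monitoring product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MonitorPolicy:
    """One policy entry: what to watch, how often, and what to do."""
    object_name: str                   # hypothetical name of the monitored object
    polling_seconds: int               # polling frequency
    action: str                        # action to take when a condition is met
    threshold: Optional[float] = None  # None means "watch state changes only"


def evaluate(policy, previous_state, current_state, value=None):
    """Return the list of conditions that require attention."""
    conditions = []
    # State change: the object moved from one state to another.
    if previous_state != current_state:
        conditions.append(f"state change: {previous_state} -> {current_state}")
    # Threshold exception: a measured value exceeded its configured limit.
    if policy.threshold is not None and value is not None and value > policy.threshold:
        conditions.append(f"threshold exception: {value} > {policy.threshold}")
    return conditions


# Illustrative values only (assumed, not recommended settings).
queue_policy = MonitorPolicy("SMTP queue length", polling_seconds=300,
                             action="notify on-call operator", threshold=100)
print(evaluate(queue_policy, "running", "running", value=250))
```

Keeping the state test and the threshold test separate mirrors the text: each condition can then be given its own policy and its own action.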

Monitoring provides the most immediate, and sometimes the only, indication of a problem and should be done 24 hours per day, 7 days per week.

Once monitor settings have been defined, a baseline set of data should be collected, and the same set of data should continue to be collected at regular intervals. Any changes should be carefully considered and tested before being implemented in the production messaging environment. At times, the process of investigating and solving a specific problem may necessitate that the Exchange administrator monitor an object that is not currently being monitored. In these cases, it is better to start a new monitoring session than to modify one of the permanent monitor sessions that are used to collect data on the production environment.
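One way to put a collected baseline to use is to flag new samples that fall far outside it. The following Python sketch assumes a simple rule (more than three standard deviations from the baseline mean); that rule, the function name, and the sample numbers are illustrative assumptions rather than anything prescribed here.

```python
from statistics import mean, stdev


def deviates_from_baseline(baseline, sample, tolerance=3.0):
    """Return True if `sample` lies more than `tolerance` standard
    deviations away from the mean of the baseline measurements."""
    center = mean(baseline)
    spread = stdev(baseline)  # requires at least two baseline values
    if spread == 0:
        # A perfectly flat baseline: any different value is a deviation.
        return sample != center
    return abs(sample - center) > tolerance * spread


# Hypothetical baseline of a counter collected at regular intervals.
baseline_queue_lengths = [10, 12, 11, 9, 13, 10, 11]
print(deviates_from_baseline(baseline_queue_lengths, 12))  # within normal range
print(deviates_from_baseline(baseline_queue_lengths, 40))  # far outside it
```

Because the comparison only reads the baseline, it can run in a separate, temporary session without touching the permanent monitor settings.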

Messaging system problems will occur. The speed with which a problem can be solved depends largely on whether the Exchange administrator has experienced and solved the same problem previously. When a team of Exchange administrators shares responsibility for managing the messaging environment, it is useful if they pool their experience and knowledge. Therefore, any significant event reported by the monitoring process should be recorded in the Exchange administrators' daily report. The daily report entry should describe the problem and the associated solution so the entire administrative team can learn from the experience. You should review these reports on a regular basis to determine whether there are recurring problems that could be prevented by additional monitoring or by changing certain configurations or procedures.

10.1.1 Alerts and notification

It is a good practice to allocate several workstations with large display terminals for monitoring the Exchange messaging environment. The workstations should only be used to run monitoring and administrative tools. A technician should be responsible for watching these display terminals.

However, the on-duty technician cannot always be monitoring the terminals. Monitoring software uses alerts to notify an Exchange administrator of a situation requiring attention. Each different monitoring software product has its own method to indicate the presence of an alert. The software also usually allows the administrator to assign a priority level to different types of potential problems to differentiate the severity of the problem.

The Exchange administrators will need to consider and define the policies regarding alerts and notification. Exchange monitoring policies should be reviewed quarterly to ensure that they continue to provide the information needed to maintain a high-quality, reliable Exchange messaging service, without placing an undue burden on the servers and network, and without generating excessive unneeded alert conditions. Defining these policies includes addressing the following topics:

  • Who should be notified when an alert is raised? Usually the primary recipient is the on-call Exchange operator. However, there may also be secondary notification distribution lists to be used for operational, technical, management and user notification. The list of recipients also often differs based on the severity of the alert. The recipient list can also differ based on the source of the problem.

  • How should the notification be delivered? Should this be a Windows 2000 alert message delivered to a specific terminal or user? Should the alert be delivered using an e-mail message? Remember that the source of the problem may keep an e-mail notification message from being delivered. Another common delivery method is to use paging software, although this method should probably be reserved for critical alerts only. The mechanism for delivering the notification may vary based on the severity of the problem, the source of the problem, and whether the recipient is the primary recipient (e.g., the on-call operator). Most organizations implement a combination of notification mechanisms using both electronic mail and paging software. Table 10.1 lists the recommended notification mechanisms based on the alert priority.

Table 10.1: Alert Notification

| Alert Priority | Description | Notification |
|---|---|---|
| Urgent | A condition has been detected that requires immediate attention. | Pager: On-call Operator, Backbone Management. E-mail: On-call Operator, Backbone Management, Messaging Service Managers, Level 2 Support, Level 3 Support. |
| Warning | A condition has been detected that suggests a potential or impending problem. This situation should be investigated and corrective action should be initiated before the problem becomes critical. | Pager: None. E-mail: On-call Operator, Backbone Management, Messaging Service Managers, Level 2 Support, Level 3 Support. |
| Informational | An event has occurred that may prove useful in understanding and tracking the behavior of the system, but does not necessarily indicate an error. | Pager: None. E-mail: On-call Operator, Backbone Management. |
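The routing in Table 10.1 can also be written down as a small lookup structure. The Python sketch below transcribes the recipient groups from the table; the dictionary layout and the `recipients_for` helper are hypothetical names chosen for illustration, not part of any monitoring product.

```python
# Recipient groups per alert priority, transcribed from Table 10.1.
FULL_EMAIL_LIST = ["On-call Operator", "Backbone Management",
                   "Messaging Service Managers", "Level 2 Support",
                   "Level 3 Support"]

NOTIFICATION_ROUTES = {
    "Urgent": {
        "pager": ["On-call Operator", "Backbone Management"],
        "email": list(FULL_EMAIL_LIST),
    },
    "Warning": {
        "pager": [],  # no pager notification for warnings
        "email": list(FULL_EMAIL_LIST),
    },
    "Informational": {
        "pager": [],  # no pager notification for informational events
        "email": ["On-call Operator", "Backbone Management"],
    },
}


def recipients_for(priority):
    """Return (channel, recipient) pairs for an alert of the given priority."""
    routes = NOTIFICATION_ROUTES[priority]
    return [(channel, who)
            for channel in ("pager", "email")
            for who in routes[channel]]


for channel, who in recipients_for("Urgent"):
    print(f"{channel}: {who}")
```

Keeping the routes in one table-like structure makes the quarterly policy review easier, since a change to who is paged is a one-line edit.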

Regardless of the notification delivery mechanism, each priority should be clearly and quickly distinguishable. Urgent alerts must be indicated in a manner that demands attention. For example, an Urgent alert may cause an object's icon to turn red and flash with an accompanying audible alarm. A Warning may only cause the object's icon to turn yellow and flash. An Informational alert may only add some small indicator to the icon.

  • What recovery actions should the monitoring software automatically attempt? For example, many monitoring programs can automatically restart Exchange services that have stopped. This automation avoids the need for an immediate human response to the problem. However, if the restarted service continues to crash, it would be unwise to repeatedly execute the automatic restart procedures. Most monitoring software provides mechanisms to avoid this type of problem.

  • What actions should be taken by the on-call operator when an alert is received? Again, this may differ based on the severity and source of the problem.
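The concern about repeatedly restarting a crashing service can be illustrated with a simple limiter that permits only a few automatic restarts within a time window and then defers to the on-call operator. This is a minimal Python sketch; the class name, the limits, and the injectable clock are assumptions for illustration and do not describe how any specific monitoring product implements this safeguard.

```python
import time
from collections import deque


class RestartLimiter:
    """Allow at most `max_restarts` automatic restarts per `window_seconds`."""

    def __init__(self, max_restarts=3, window_seconds=600, clock=time.monotonic):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self.clock = clock      # injectable so the logic can be tested
        self.history = deque()  # timestamps of recent automatic restarts

    def should_restart(self):
        """Return True if another automatic restart is allowed right now."""
        now = self.clock()
        # Forget restart attempts that have aged out of the window.
        while self.history and now - self.history[0] >= self.window_seconds:
            self.history.popleft()
        if len(self.history) >= self.max_restarts:
            return False        # stop retrying; escalate to a person instead
        self.history.append(now)
        return True


# Demonstration with a fake clock so no real waiting is needed.
fake_now = [0.0]
limiter = RestartLimiter(max_restarts=2, window_seconds=60,
                         clock=lambda: fake_now[0])
decisions = [limiter.should_restart() for _ in range(3)]
print(decisions)
```

When `should_restart` returns False, the sensible next step is to raise an alert at a higher priority rather than to keep retrying.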

 


Monitoring and Managing Microsoft Exchange 2000 Server (HP Technologies)
ISBN: 155558232X
Year: 2000
Pages: 113
