3.7. Troubleshooting

Not surprisingly, the MOM 2005 management pack includes 95 event processing rules and 17 performance processing rules that specifically target agents, as well as hundreds of other rules that monitor the other MOM components. MOM is well equipped for self-diagnosis. You will rely on MOM's ability to report on itself in almost all troubleshooting scenarios, and almost all troubleshooting is done in the Operator console.

The main concerns of troubleshooting agents are:

  • Is the agent up or down?

  • If it is up, is the agent providing a heartbeat?

  • If the agent is not providing a heartbeat, when was the last successful contact with the agent?

  • If the agent is up, is it successfully sending event, alert, and performance data to the management server?

  • Is the agent successfully receiving updates from the management server?

These concerns are listed from most critical to least, but an agent has to be doing all of these things successfully to be fully functional. The first place to check the current status of any service is in the State view in the Operator console (see Figure 3-35).
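As a rough illustration, the ordered concerns above can be written as a small triage routine that reports the first check that fails. This is only a sketch: the AgentStatus type and its fields are hypothetical and would be filled in by hand from what the Operator console shows, not read from any MOM API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentStatus:
    # Hypothetical fields, filled in from what you observe in the Operator console.
    is_up: bool
    has_heartbeat: bool
    last_contact: Optional[str]  # time of last heartbeat as shown in the State view
    sending_data: bool
    receiving_updates: bool


def first_failed_concern(status: AgentStatus) -> Optional[str]:
    """Return the most critical failed concern, or None if all checks pass."""
    if not status.is_up:
        return "agent is down"
    if not status.has_heartbeat:
        last = status.last_contact or "unknown"
        return f"no heartbeat (last successful contact: {last})"
    if not status.sending_data:
        return "agent is not sending event, alert, and performance data"
    if not status.receiving_updates:
        return "agent is not receiving updates from the management server"
    return None


if __name__ == "__main__":
    print(first_failed_concern(AgentStatus(True, False, "10:15", True, True)))
```

Because the checks run from most critical to least, the routine stops at the first failure, which mirrors the order in which you would work through the concerns in the console.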

Figure 3-35. The current agent condition is reported in the State view

Here, the focus is on homesqlserver in the results pane. The current state of each machine is reported in the leftmost column and is the worst state of any of the monitored components for that machine. In the MOM agent column, you can see that the state is reported as successful. In the Details pane, the heartbeat component of the overall MOM agent state is also good. If the agent service on homesqlserver were stopped to simulate an agent failure, within a minute the State view would update to reflect the change in the condition of the agent on homesqlserver (see Figure 3-36).
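The rule that a machine's overall state is the worst state of any of its monitored components can be sketched in a few lines. The state names and their ordering below are assumptions made for illustration; they are not taken from the MOM documentation.

```python
from typing import Dict

# Assumed ordering, from least to most severe (illustrative names only).
SEVERITY = ["Success", "Warning", "Error", "Critical Error"]


def rollup_state(component_states: Dict[str, str]) -> str:
    """Return the most severe state among a machine's components."""
    if not component_states:
        raise ValueError("no monitored components")
    return max(component_states.values(), key=SEVERITY.index)


if __name__ == "__main__":
    states = {"MOM Agent": "Critical Error", "SQL Server": "Success"}
    print(rollup_state(states))
```

With this rollup, a single failed component such as the MOM agent is enough to change the state shown for the whole machine in the leftmost column.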

Figure 3-36. State change indicates a missing heartbeat and possibly a failed agent


The agent failure also generates a MOM Agent heartbeat failure alert in the Alerts view. This is the same alert examined in "The Life of a MOM 2005 Alert" section in Chapter 1. From the information in the alert, you know that MOM pinged homesqlserver and got a successful response, but the agent is not responding. On the product knowledge tab, one of the possible solutions is to ensure that the agent service is running on the target computer. To further troubleshoot this, open the Tasks pane while in the State view to keep the focus on homesqlserver and run the Start MOM 2005 Service task (Figure 3-37).
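If you want to confirm the service state outside the console, one option is to query the Windows service controller from a script. The sketch below assumes the agent runs as a Windows service named "MOM"; that name is an assumption, so confirm it on the target computer first. The helper names are hypothetical, and the query function works only on Windows with sufficient rights on the remote machine.

```python
import re
import subprocess
from typing import Optional

# Assumption: the agent service is named "MOM"; verify this on the target computer.
AGENT_SERVICE_NAME = "MOM"


def parse_sc_state(sc_output: str) -> Optional[str]:
    """Extract the state word (for example RUNNING or STOPPED) from `sc query` output."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else None


def query_agent_service(computer: str) -> Optional[str]:
    """Run `sc query` against a remote computer (Windows only) and return the state."""
    result = subprocess.run(
        ["sc", rf"\\{computer}", "query", AGENT_SERVICE_NAME],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_sc_state(result.stdout)
```

For example, query_agent_service("homesqlserver") would return the parsed state, or None if the output contains no STATE line (for instance, when the service name is wrong or the computer cannot be reached).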

The output of the task is returned in the console task output box, and the MOM agent and heartbeat states are both reported as successful again. When you select the target computer's name in the Computer column, the State view also shows the date and time of the last heartbeat, which is when the computer was last contacted (see Figure 3-38).

Figure 3-37. Running the Start MOM 2005 Service task in the tasks pane


Figure 3-38. Date and time of the last heartbeat


Next, to ensure that the agent is running correctly, run the Test End to End Monitoring task in the Tasks pane while the focus is on homemomserver. This task causes the agent to place a specific event in the target server's Application event log (event ID 22078), which then generates an informational alert that reports back to MOM (see Figure 3-39).

To track the status of the task, switch to the Public Views/Task Status folder (see Figure 3-40). The Test End to End Monitoring task generates two events in this folder: event 9897, which states that the task has been scheduled, and event 9898, which states that the task completed successfully.
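The two task events can be read as a simple progression. The sketch below assumes you have collected the event IDs seen in the Task Status folder into a list by hand or from an export; the function name and return strings are hypothetical.

```python
from typing import Iterable

TASK_SCHEDULED_EVENT = 9897  # task has been scheduled
TASK_COMPLETED_EVENT = 9898  # task completed successfully


def task_status(event_ids: Iterable[int]) -> str:
    """Summarize task progress from the event IDs observed so far."""
    seen = set(event_ids)
    if TASK_COMPLETED_EVENT in seen:
        return "completed"
    if TASK_SCHEDULED_EVENT in seen:
        return "scheduled, not yet completed"
    return "not scheduled"


if __name__ == "__main__":
    print(task_status([9897]))
```

A status that stays at "scheduled, not yet completed" suggests that the task reached the management server but the agent has not yet reported back.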

Figure 3-39. Launching the Test End to End Monitoring task


Figure 3-40. Task status tracking


The first four troubleshooting concerns have been addressed, but you still need to know whether the agent is receiving configuration updates correctly. To check this, make a minor change to a processing rule in the Administrator console (point 1 in Figure 3-41) and then select Commit Configuration Change (point 2 in Figure 3-41). In this case, disable a rule in the Management Packs Rule Groups Microsoft Operations Manager Operations Manager 2005 Agent Performance Rules rule group. MOM 2005 will submit a task to update the rules on the affected agents.

Figure 3-41. Disable a rule in the Administrator console


When this happens, event ID 21240 is generated in the Application event log on the target server. You can either watch the Application event log on the target computer or switch to the Public Views/Events container to see the event returned from the target machine (see Figure 3-42).
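The same check can be expressed as a small filter over exported event records: look for event 21240 logged at or after the time the configuration change was committed. This is a sketch under assumptions; the (timestamp, event ID) tuple layout is hypothetical and stands in for whatever export format you use.

```python
from datetime import datetime
from typing import Iterable, Tuple

CONFIG_UPDATE_EVENT_ID = 21240  # logged when the agent receives the rule update


def config_update_received(
    events: Iterable[Tuple[datetime, int]], committed_at: datetime
) -> bool:
    """Return True if event 21240 was logged at or after the commit time."""
    return any(
        event_id == CONFIG_UPDATE_EVENT_ID and logged_at >= committed_at
        for logged_at, event_id in events
    )


if __name__ == "__main__":
    commit = datetime(2005, 6, 1, 10, 0)
    log = [(datetime(2005, 6, 1, 9, 55), 21240), (datetime(2005, 6, 1, 10, 2), 21240)]
    print(config_update_received(log, commit))
```

Comparing against the commit time matters because earlier 21240 events from previous configuration changes would otherwise give a false positive.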




Essential Microsoft Operations Manager
ISBN: 0596009534
Authors: Chris Fox