3.7. Troubleshooting

Not surprisingly, the MOM 2005 management pack includes 95 event processing rules and 17 performance processing rules that specifically target agents, as well as hundreds of other rules that monitor the other MOM components. MOM is well equipped for self-diagnosis. You will rely on MOM's ability to report on itself in almost all troubleshooting scenarios, and almost all troubleshooting is done in the Operator console. The main concerns of troubleshooting agents are:
These concerns are listed from most critical to least, but an agent has to be doing all of these things successfully to be fully functional. The first place to check the current status of any service is the State view in the Operator console (see Figure 3-35). Here, the focus is on homesqlserver in the results pane. The current state of each machine is reported in the leftmost column and is the worst state of any of the monitored components for that machine. In the MOM agent column, you can see that the state is reported as successful. In the Details pane, the heartbeat component of the overall MOM agent state is also good.

Figure 3-35. The current agent condition is reported in the State view

If the agent service on homesqlserver were stopped to simulate an agent failure, within a minute the State view would update to reflect the change in the condition of the agent on homesqlserver (see Figure 3-36).

Figure 3-36. State change indicates a missing heartbeat and possibly a failed agent

The agent failure also generates a MOM Agent heartbeat failure alert in the Alerts view. This is the same alert examined in "The Life of a MOM 2005 Alert" section in Chapter 1. From the information in the alert, you know that MOM pinged homesqlserver and got a successful response, but the agent is not responding. On the product knowledge tab, one of the possible solutions is to ensure that the agent service is running on the target computer. To troubleshoot further, open the Tasks pane while in the State view to keep the focus on homesqlserver, and run the Start MOM 2005 Service task (Figure 3-37). The output of the task is returned in the console task output box, and the MOM agent and heartbeat status are both reported as successful. When you select the target computer's name in the Computer column, the State view also shows the date and time of the last heartbeat, which is when the computer was last contacted (see Figure 3-38).

Figure 3-37. Running the Start MOM 2005 Service task in the tasks pane

Figure 3-38. Date and time of the last heartbeat

Next, to ensure that the agent is running correctly, run the Test End to End Monitoring task in the Tasks pane while the focus is on homemomserver. This task causes the agent to place a specific event in the target server's Application event log (event ID 22078), which then generates an informational alert that reports back to MOM (see Figure 3-39). To track the status of the task, switch to the Public Views/Task Status folder shown in Figure 3-40. The Test End to End Monitoring task generates two events in this console: event 9897, which states that the Test End to End Monitoring task has been scheduled, and event 9898, which states that the task completed successfully.

Figure 3-39. Launching the Test End to End Monitoring task

Figure 3-40. Task status tracking

The first four troubleshooting concerns have been addressed, but you still need to know whether the agent is receiving configuration update information correctly. To check this, make a minor change to a processing rule in the Administrator console (point 1 in Figure 3-41) and then select Commit Configuration Change (point 2 in Figure 3-41). In this case, disable a rule in the Management Packs Rule Groups Microsoft Operations Manager Operations Manager 2005 Agent Performance Rules rule group. MOM 2005 will submit a task to update the rules on the affected agents.

Figure 3-41. Disable a rule in the Administrator console

When this happens, event ID 21240 is generated in the Application event log on the target server. You can then either watch the Application event log on the target computer or switch to the Public Views/Events container to see that event returned from the target machine (see Figure 3-42).
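
The event IDs mentioned in this section (22078, 9897, 9898, and 21240) can serve as a simple checklist once the events have been exported from the Events view or the Application event log. The following is a minimal sketch only, written in Python. It assumes the events have already been exported into (event ID, computer name) pairs by some other means; the `Event` record, the `check_agent` function, and the sample data are hypothetical illustrations and are not part of MOM 2005 or any MOM API.

```python
from typing import Dict, Iterable, NamedTuple


class Event(NamedTuple):
    """One exported event record (hypothetical shape, not a MOM type)."""
    event_id: int
    computer: str


# Event IDs named in this section and what each one indicates.
EXPECTED_EVENTS = {
    9897: "Test End to End Monitoring task scheduled",
    9898: "Test End to End Monitoring task completed",
    22078: "Test event written to the Application event log",
    21240: "Configuration (rule) update processed on the target",
}


def check_agent(events: Iterable[Event], computer: str) -> Dict[str, bool]:
    """Report which of the expected events were seen for one computer."""
    # Computer names are compared case-insensitively.
    seen = {e.event_id for e in events if e.computer.lower() == computer.lower()}
    return {desc: event_id in seen for event_id, desc in EXPECTED_EVENTS.items()}


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        Event(9897, "homesqlserver"),
        Event(9898, "homesqlserver"),
        Event(22078, "homesqlserver"),
    ]
    for description, found in check_agent(sample, "homesqlserver").items():
        status = "found" if found else "MISSING"
        print(f"{status:8} {description}")
```

With the sample data, the configuration update check is reported as missing, which would be the cue to repeat the Commit Configuration Change test and watch for event 21240.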