Working with Alerts, Events, Performance Graphs, and Tasks in the Operator Console(s) | Professional MOM 2005, SMS 2003, and WSUS

At first glance, you may think there is only one Operator Console (OC), the "thick" one accessed via the Start menu, but that's not the case. There is an additional, web-based "thin" flavor of the OC installed on the MOM server; it can be accessed using the URL http://www.yourMomServer:1272/.

While the two are somewhat similar, you will find that the "thin" flavor is really intended for basic monitoring functions whereas the "thick" one is more applicable for in-depth monitoring. The "thick" flavor has Performance Graphs, Diagrams, and My Views, while the "thin" does not contain these features. The biggest difference between the two is accessibility: The "thick" one can be accessed on machines with the MOM client components installed, whereas the "thin" can be accessed from virtually anywhere as it is web-based. We will now cover the two consoles in greater depth.

"Thick" Operator Console

This console is the one most people are aware of after installing the MOM product on a machine. We are not going to go into depth about all of the features of this console, as they have already been explained in Chapter 6, but we do want to highlight its monitoring-related pieces.

When you first start up this console, you will be looking at the default Alert Views (see Figure 13-1). On the left side of the screen is the Alert Views pane, the middle part of your screen contains the Alerts and Alert Details panes, and on the right side of your screen is the Tasks pane. While these panes are populated with alert-related information upon first launching the console, these panes have a greater scope than just alerts. All of the panes are accessible via the View menu.

Figure 13-1

Pane	Definition
Center pane	Used to display the high-level information of the selected entity (for example, Alerts, Events, and so on)
Details pane	Used to display the low-level information of the selected entity (for example, Alerts, Events, and so on)
Navigation pane	Used to explore the various areas of the "thick" OC
Task pane	Used to access the assigned tasks

You can filter alerts (or any other entity for that matter) in the "thick" console via the drop-down labeled Group in the upper-middle portion of the console. When you expand the drop-down, you will be able to choose from the defined computer groups in your MOM installation. After selecting a particular group within the drop-down, all relevant information for the group is displayed.

Note

You can place the computer on which an alert appears into maintenance mode (a state of a monitored machine that does not permit the generation of alerts) by right-clicking on an alert and selecting Put Computer In Maintenance Mode.

Creating Custom Alert Views

In the Alert Views pane you will see entries for All:Alert Views, Alerts, and Service Level Exceptions (SLE), plus an entry for every management pack that has been installed on the MOM server. If you select the All:Alert Views entry, all alerts from your custom alert views (a summation of alerts, if you will) are displayed. It is in this pane that you can modify pre-existing alert views, remove alert views, or create new custom alert views. Try the following quick exercise:

Right-click All:Alert Views in the Alert Views pane and select New Alerts View.
Select Alerts that are not Resolved, and click Next.
Type MyAlertView for the View name and the Description, and click Finish.

Note

The Alerts view and the SLE view are nothing more than just pre-created custom alert views.

Once you have created your alert views, you can further modify their filtering via right-clicking on a particular view and selecting Properties. A modal dialog box with the wizard that was used for the initial creation of the view appears.

Another way of filtering the alerts you see is by expanding one of the MPs in the Alert Views pane and then selecting either Alerts or SLE. Thus, for every MP you have installed, there are the same two pre-created alert views filtered by each MP. (If you dig a little deeper, you will find that this works by the view filtering on a particular management pack's installed computer group.)
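An alert view is, in effect, a saved filter over the alert list. The following Python sketch is not MOM code; it models alerts as plain records (the Alert class and its field names are hypothetical, loosely based on the column names used in this chapter) to illustrate how a view such as MyAlertView, "alerts that are not Resolved," optionally narrowed to one computer group, would select its rows.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    # Hypothetical record; field names loosely mirror the console columns.
    name: str
    computer: str
    resolution_state: str  # for example "New" or "Resolved"


def unresolved_view(alerts, group_members=None):
    """Return alerts that are not Resolved, optionally limited to a computer group."""
    selected = []
    for alert in alerts:
        if alert.resolution_state == "Resolved":
            continue
        if group_members is not None and alert.computer not in group_members:
            continue
        selected.append(alert)
    return selected


alerts = [
    Alert("Disk space low", "SQL01", "New"),
    Alert("Service stopped", "WEB01", "Resolved"),
    Alert("CPU threshold exceeded", "WEB02", "New"),
]

# Equivalent of the MyAlertView exercise: everything that is not Resolved.
my_alert_view = unresolved_view(alerts)

# Equivalent of a per-MP view: the same filter, limited to one computer group.
web_group_view = unresolved_view(alerts, group_members={"WEB01", "WEB02"})

print([a.name for a in my_alert_view])
print([a.name for a in web_group_view])
```

The computer names and the group membership set are invented for the example; in MOM the group membership comes from the computer groups defined in your installation.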

Personalizing Custom Alert Views

In addition to filtering the alerts you see, you can also add/remove columns you wish to see in the Alert Details pane. A table listing the displayed and available columns is shown below. To access this capability, simply right-click on an alert view and select Personalize. Try the following exercise:

Right-click on any alert view and select Personalize.
Remove all pre-existing display columns.
Add Ticket Id, Resolution State, Time of First Event, and Time of Last Event to the display columns list.
You will now see the four columns in your personalized alert view.

Displayed Column (by default)	Description
Severity	Indicates the severity of the alert, such as Service Unavailable or Success.
Maintenance Mode	Indicates whether the alert is in maintenance mode.
Domain	Specifies the domain to which the computer belongs.
Computer	Specifies the computer on which an agent generated the alert.
Time Last Modified	Specifies the date and time that the alert was last changed.
Resolution State	Indicates the status of the resolution process of the alert, such as New or Resolved. The resolution state indicates whether the resolution process has begun.
Time in State	Specifies the amount of time that the alert has been in the current resolution state.
Problem State	Indicates what problem state the alert is in.
Repeat Count	Specifies the number of identical duplicate alerts that this instance represents.
Name	Specifies the name of the rule that generated the alert.
Source	Indicates where the alert was generated, for example, from MOM, or a specific server.
Ticket Id	Specifies the ticket ID assigned to the alert.
Owner	Specifies the person responsible for tracking and resolving the alert.

Available Column (by default)	Description
Description	Specifies the description of the alert.
Time Of Last Event	Specifies the date and time that the last event was detected.
Time Raised	Specifies the date and time that the alert was raised.
Alert Id	Specifies the unique identifier for the alert.
Rule Id	Specifies the unique identifier for the rule that generated the alert.
Rule Name	Specifies the name of the rule that generated the alert.
CustomFieldn	Specifies the values of user-defined alert fields.
Time Added	Specifies the date and time that the alert was added to the list.
Time of First Event	Specifies the date and time that the alert was first raised. The time is shown in the time zone of the local computer.
Time Resolved	Specifies the date and time that the alert was resolved. The time is shown in the time zone of the local computer.
Resolved By	Specifies the person who resolved the alert.
Modified By	Specifies the person who last modified the alert.
Computer Custom Data n	Specifies the values of user-defined computer data fields.
Maintenance Mode End	Date and time that maintenance mode ended for this alert.
Maintenance Mode User	Specifies the person who put the alert in maintenance mode.
Maintenance Mode Reason	Specifies the reason for putting the computer into maintenance mode.

Events Tab

The Events tab in the Navigation pane behaves no differently than the Alerts tab. You can create new custom events views and personalize custom events views, and there are the default events views called Events and Task Status. The Events view simply includes all events with no filtering at all (by default)! Task Status shows events that are related to task results. The table that follows shows the available columns to be displayed in the Events view.

Available Column	Description
Provider Type	Specifies the type of event provider that generated the event.
Provider Name	Specifies the name of the event provider.
Source Domain	Specifies the domain on which the computer that generated the event resides.
Source Computer	Specifies the computer on which the event generated.
Consolidated	Indicates whether this event represents a number of identical events consolidated by a consolidation rule.
Raises Alert	Indicates whether the event generated an alert and, if so, how many alerts the event generated.
Event Id	Specifies a unique identifier for the event.
User	Specifies the name of the Windows account logged on to the computer that generated the event.
Event Number	Specifies the Windows event number.
Category	Specifies the Windows text string that describes the event.
Source	Specifies the user-defined name identifying the source of the alert.
Description	Specifies the description of the event.
Computer	Specifies the computer on which an agent generated the event.
Domain	Specifies the domain to which the computer belongs.
Time	Specifies the date and time that the event was raised.
Type	Specifies the Windows event type, such as Error, Warning, or Information.

Performance Tab

Performance graphs are accessed in the "thick" OC via the Performance tab. Again, you can create new custom performance views and personalize custom performance views, and there are the default performance views under Performance. The table that follows shows the available columns to be displayed in the Performance view (all of which are displayed by default).

Column	Definition
Include	Check box that will include the counter's collected data inside the graph
Domain	Domain of the machine listed
Computer	Machine's name
Performance Object	Resource that is being monitored
Performance Instance	A particular instance of the counter that's being collected
Performance Counter	Specific Windows Performance Counter
Last Sampled Value	Value of the last sample
Last Sampled Time	Timestamp of the last sample

Tasks Pane

Tasks are actions that can diagnose or repair a particular problem. Tasks are accessed in the "thick" OC via the Tasks pane. Some default tasks are always available to you, and there are MP-specific tasks that get installed as part of the corresponding MP. Computer Management, Event Viewer, IP Configuration, Ping, and Remote Desktop are all default tasks. One particular MOM instance managed by Derek, one of the authors, has Microsoft Baseline Security Analyzer, MOM, SQL Server, Windows DFS, IIS, and Windows Base OS MP task groupings (which reflect the MPs currently installed on that instance).

To launch a task simply left-click on the task and it is launched in a new window. One very useful tidbit of information is that when you launch a task it is "targeting" the source computer of whatever alert/event you have highlighted! Thus, if Derek selects an alert that was generated by DereksMachine and then clicks the Ping task, a new window launches pinging DereksMachine.
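The targeting behavior can be pictured with a short Python sketch. This is not how the console is implemented internally; it only illustrates the idea that the task takes its target from the source computer of the highlighted item. The function name and the dictionary layout are assumptions made for the example.

```python
def build_ping_command(selected_item, count=4):
    """Build a ping command line aimed at the selected item's source computer."""
    target = selected_item["computer"]
    # "-n" is the Windows ping option for the number of echo requests to send.
    return ["ping", "-n", str(count), target]


# The alert (or event) currently highlighted in the console.
selected_alert = {"name": "Service stopped", "computer": "DereksMachine"}

command = build_ping_command(selected_alert)
print(" ".join(command))
```

Highlighting a different alert changes only the selected item, so the same task definition ends up pointed at a different machine.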

"Thin" Web Operator Console

The "thin" OC has a title bar, a filter bar, the current date/time, Alerts tab, Computers tab, Events tab, and the Alerts/Alert Details panes. By default, when you browse to the URL, it will select the Alerts tab with the alerts/details displayed (see Figure 13-2). The only filtering that you can do via this interface is by clicking the "filter" link in the filter bar. Doing this opens a modal dialog box containing alert filtering by computer group, basic criteria, and advanced criteria.

Figure 13-2

You navigate this console in much the same way as the "thick" OC, clicking on the various tabs for the alert's properties or selecting a different alert, which then refreshes the alert properties. The alerts can be ordered by clicking the column names; an arrow indicates whether the content is sorted in ascending or descending order.

Events work just like the alerts do in the "thin" console as well. There is virtually no difference apart from the event-specific information that gets displayed. A link in the lower-left corner of the Alert Details pane is labeled Help; click this link to open an .htm file with helpful information regarding the Web Console.

Using the Web Reporting Console

In addition to all of the monitoring capabilities that the two OCs provide you with, an additional source of monitoring information is the Reporting Console (RC) (see Figure 13-3). As you are aware, when you install MOM you gain access to several predefined reports via the RC, but there is even more! Each MP that you download and install may include additional reports for that specific MP's target software. You can select reports to be installed in the initial setup wizard of an MP.

Figure 13-3

Do not be confused: the RC is nothing but an instance of Microsoft Reporting Services (MSRS)! Thus, if you are familiar with MSRS, you should be able to begin using it immediately and quite effectively.

Launching the RC

You can launch the RC in either of two ways:

Browse to the URL http://www.[yourMomServer]/Reports/Pages/Folder.aspx.
Launch the Administrator Console, select Operations in the Explorer pane, and then left-click the Start Reporting Console link in the details pane.

Alert-Based Monitoring

Alerts occur when preconfigured rules in the Administrator Console have their conditions met. Once an alert has been generated, the monitoring phase begins. As you are probably aware by now, alerts are somewhat complex entities, containing several pieces of information. Some of the alert's information is static from its "birth" to its "death." Other alert information is dynamic, changing over the course of the alert's life. Several tabs are displayed in the Alert Details pane in the "thick" OC; each tab represents a logical collection of related alert information.

Properties Tab

The Properties tab contains several pieces of useful, read-only information. Some of the most useful items are Severity, Resolution State, Time of First Event, Time of Last Event, Description, Age, and Repeat Count. The table that follows shows all available properties.

Property	Definition
Description	A description of an alert.
Name	The alert's name.
Severity	The importance of an alert.
Resolution State	Status of an alert's resolution.
Domain	Machine's domain that was responsible for throwing the alert.
Computer	Machine responsible for throwing the alert.
Time of First Event	Timestamp of the alert's first occurrence (using Suppress Duplicate Alerts).
Time of Last Event	Timestamp of the alert's last occurrence (using Suppress Duplicate Alerts).
Alert Latency	Time from when the alert was thrown at the originating server to when the alert was in the MOM server for monitoring purposes.
Problem State	The Problem State shows the current state of the problem. It indicates if the reported problem is still occurring.
Repeat Count	Number of occurrences (using Suppress Duplicate Alerts).
Age	Age of alerts.
Source	Source of the alert.
Alert Id	Identifier of the alert.
Rule	Corresponding rule's location.

Custom Properties Tab

This tab is useful for a handful of purposes, all of which relate to applying custom information against a particular alert. First we have an Alert Owner property, which can be used to assign alert ownership to a particular MOM operator. Later in this chapter, we go into more detail about using this feature effectively. The Ticket ID property is useful if the alert is being tracked alongside a separate ticketing system; thus you would enter that system's ticket ID in this property to help enforce the relationship. Custom fields are useful for any other custom information you may wish to track with your alerts.

Events Tab

The Events tab is useful when you have events that are related to a particular alert. An example of when this is useful would be when you have an alert described as "The response 'script: SQL Server 2000 Service Discovery' has been running more than 600 seconds and exceeded the time allowed to run." The event behind this alert has the exact same description, but the two could have been different. The point is that lots of events occur but may not necessarily lead to an alert getting generated! As the section "Event-Based Monitoring" later in this chapter shows, you can go from an event to an alert in exactly the same manner.

Product Knowledge Tab

One of the biggest selling points of MOM is its extensibility. Software vendors can develop MPs in parallel with their software products and supply both to their customers. A vendor's MP contains their own internal knowledge of their software products; it is in the Product Knowledge tab that this information is shown per MP rule!

Note

A link in some product knowledge tabs directs you to the vendor's web site for more detailed and current knowledge. The link is labeled MOM Online.

Company Knowledge Tab

In addition to vendor-supplied knowledge there is also your own organization's accumulated knowledge about various software products (sometimes this exceeds the vendor's, too). By selecting this tab you can view any company-specific information regarding the alert's rule. You can create or update your company's information about a particular rule in one of two ways. The first is by simply clicking the Edit button on this tab in the Alert Details view. A second method is detailed below:

Launch the Administrator Console, and expand Management Packs/Rule Groups/Microsoft Operations Manager/Operations Manager 2005/Agent/Performance Rules/.
Double-click the Report Collection-System Uptime rule.
Select the Knowledge Base tab, and click Edit.
Type in test, and click OK.
Click OK again.

History Tab

The information in this tab includes the source, a date and timestamp, the GUID for the rule that generated the alert, and the name of the management group. If you want to add more information to the history, click the Append button.

Event-Based Monitoring

Events are not as important per se as alerts, but it's not wise to disregard them either. A wealth of good information is contained in events. Let's now discuss the various attributes of an event using the "thick" OC.

Properties Tab

This tab contains much of the same information as the Properties tab in the Alert Details pane does. A few items of interest here are the Raises Alert, Provider Type, Source, and Time properties. The table that follows provides a listing of all available properties.

Property	Definition
Description	A description of an event
Domain	Machine's domain that was responsible for throwing the event
Computer	Machine responsible for throwing the event
Time	Timestamp of the event's occurrence
Type	Specifies the Windows event type, such as Error, Warning, or Information
Provider Name	Specifies the name of the event provider, such as Application or Security
Event Number	The Windows event number
Provider Type	Specifies the type of event provider that generated the event, such as a Windows event log
Source	Source of the event
Category	Specifies the Windows text string that describes the event
Raises Alert	Indicates whether the event generated an alert and, if so, how many alerts the event generated
Consolidated	Indicates whether this event represents a number of identical events consolidated by a consolidation rule and, if so, the number of consolidated events this event represents
Event ID	Identifier of the event

Alerts Tab

The Alerts tab contains information only when an alert was raised as a result of the event you are currently viewing. As noted earlier, MOM makes it very easy to switch between events and alerts as they have a logical relationship. Now review the columns for the Alerts tab in the event details pane.

Column	Definition
Severity	The importance of an alert
Time Last Modified	Timestamp of when the alert was last changed
Resolution State	Status of an alert's resolution
Source	Source of the alert
Name	The alert's name
Owner	Specifies the person responsible for tracking and resolving the alert

Parameters Tab

This tab contains read-only information that is useful in diagnosing why a particular event was raised. Some rules have actions defined in them that receive parameters; it is these parameters that get tracked with the event that was raised as a result. The following table shows the columns defined in the Parameters tab in the event details pane.

Column	Definition
Position	The ordering of a parameter
Name	The name of a parameter
Value	The value that was passed into the parameter

Task-Based Monitoring

Throughout the chapter we have discussed tasks quite a bit, but let's wrap them up here. Tasks are viewed in the "thick" OC via the Tasks pane (see Figure 13-4). There are two "flavors" of tasks, default and custom. Default tasks are those that get installed either as part of the base MOM installation or with any MP you might have installed. Custom tasks can be created in the Administrator Console and are valuable when you need an action that has not been supplied to you by default. It is useful to have tasks inside of the MOM environment for the following reasons:

As you now know, tasks get launched with the computer of the event/alert you are currently viewing as their "target."
Tasks provide a more integrated experience for the user.
You can wrap security around the tasks via console scopes, thus limiting certain users to particular tasks.

Figure 13-4

Performance Monitoring

Part of MOM's greatness is its performance monitoring capabilities. Every resource that has a Windows NT Performance Counter Provider is a candidate for performance monitoring in MOM! There are literally hundreds of these counters, and we cannot go into the details of each and every one, as that would fill another book in itself. Once you have identified a provider you wish to use to gather MOM performance data, the next step is either to verify that a performance rule has already been defined and use it, or to create a custom performance rule using the provider of interest to you. Performance rules define how MOM processes performance counter data. There are two types of performance rules: Measuring rules and Threshold rules.

Performance Rule Type	Definition
Measuring rules	Cause MOM to collect numeric values from sources, such as WMI or Microsoft Windows NT performance counters. MOM stores sample numeric measures in the database. Measuring rules can also include a response.
Threshold rules	Specify that MOM generate an alert or execute a response when a WMI value or performance counter crosses a defined threshold.
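The difference between the two rule types can be illustrated with a small Python sketch. It is a conceptual model only, not MOM's rule engine: the function names and the sample layout are hypothetical, and "crossing" a threshold is simplified here to "greater than," which is just one of the comparisons a real rule could use.

```python
def apply_measuring_rule(store, sample):
    """Measuring rule: keep the numeric sample for later graphing or reporting."""
    store.append(sample)


def apply_threshold_rule(sample, threshold):
    """Threshold rule: return an alert record if the value exceeds the threshold, else None."""
    if sample["value"] > threshold:
        return {
            "computer": sample["computer"],
            "counter": sample["counter"],
            "value": sample["value"],
        }
    return None


samples = [
    {"computer": "WEB01", "counter": "% Processor Time", "value": 12.0},
    {"computer": "WEB01", "counter": "% Processor Time", "value": 95.0},
    {"computer": "WEB01", "counter": "% Processor Time", "value": 40.0},
]

stored = []           # stands in for the database of collected samples
threshold_alerts = []  # alerts produced by the threshold rule

for sample in samples:
    apply_measuring_rule(stored, sample)
    alert = apply_threshold_rule(sample, threshold=90.0)
    if alert is not None:
        threshold_alerts.append(alert)

print(len(stored), "samples stored;", len(threshold_alerts), "alert raised")
```

Every sample is stored by the measuring rule, but only the one above the assumed threshold of 90 produces an alert.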

Now that you know how to collect performance data inside of MOM, let's learn how to extract it. The steps below are used to create and view a performance graph in the "thick" console:

Launch the "thick" OC.
If the Navigation pane is not displayed, select View Navigation Pane from the menu.
Select the Performance tab, and expand All:Performance Views.
There should be a default performance graph labeled Performance. Select it.
In the right-hand pane is a listing of computers that you are currently monitoring in MOM. Select a machine in the list that is being agent-monitored and hit the Select Counters button.
Place a check in the check box where Performance Object = Process, Performance Instance = MOMService, Performance Counter = % Processor Time. This shows you the usage of the CPU by the MOM agent that is running on your monitored machine.
Click the Draw Graph button.
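Conceptually, checking a counter and drawing the graph amounts to picking one series out of the collected samples. The Python sketch below models that selection under stated assumptions: the sample dictionaries, their keys, and the select_series function are invented for illustration and do not reflect how MOM stores or queries performance data.

```python
def select_series(samples, computer, perf_object, instance, counter):
    """Return (time, value) pairs for one counter on one computer, oldest first."""
    wanted = (computer, perf_object, instance, counter)
    matching = [
        s for s in samples
        if (s["computer"], s["object"], s["instance"], s["counter"]) == wanted
    ]
    # ISO-style timestamps of equal length sort correctly as plain strings.
    matching.sort(key=lambda s: s["time"])
    return [(s["time"], s["value"]) for s in matching]


samples = [
    {"computer": "WEB01", "object": "Process", "instance": "MOMService",
     "counter": "% Processor Time", "time": "2005-06-01T10:05", "value": 3.5},
    {"computer": "WEB01", "object": "Process", "instance": "MOMService",
     "counter": "% Processor Time", "time": "2005-06-01T10:00", "value": 2.0},
    {"computer": "WEB01", "object": "Memory", "instance": "",
     "counter": "Available MBytes", "time": "2005-06-01T10:00", "value": 512.0},
]

series = select_series(samples, "WEB01", "Process", "MOMService", "% Processor Time")
print(series)
```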

Advanced Monitoring Topics

Following are several "best practices" for MOM monitoring. While these topics could have been discussed in earlier sections, they are best addressed separately. The topics include suppressing duplicate alerts, meeting SLAs with MOM, and enforcing accountability with alert ownership.

Suppressing Duplicate Alerts

Suppressing duplicate alerts is an important topic. Who wants to keep getting informed that the same issue is occurring? You will find that most rules suppress duplicate alerts and this is also the default for new rules created via the Create Rule Wizard (there is a step in the wizard to configure this). You can also configure the formula used for determining a duplicate alert in the Admin Console (see Figure 13-5). Time for some more hands-on work. Here are the steps for viewing configurable items for suppressing duplicate alerts:

Start up the Administrator Console.
Expand Rule Groups/Microsoft Operations Manager/Operations Manager 2005/Agent/ in the Explorer pane.
Select Performance Rules in the Explorer pane.
Select Performance Threshold: MOM Service CPU in the details pane.
Right-click on the rule and select Properties.
Select the Alert Suppression tab.

Figure 13-5

Again, if a rule suppresses duplicate alerts, the repeat count property of a generated alert will increment by one for each ongoing occurrence. If a particular rule does not suppress duplicates, you are going to receive a new instance of the alert for each occurrence. We highly encourage you to embrace suppressing duplicate alerts, as most of the time this is going to be the desired behavior of your rules.
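The effect of suppression on the alert list can be modeled in a few lines of Python. This is a sketch under assumptions, not MOM's implementation: the duplicate test here compares only the alert name and computer (in MOM the fields are whatever you configure on the Alert Suppression tab), and the repeat count is assumed to start at zero for the first occurrence.

```python
def raise_alert(open_alerts, name, computer, suppress_duplicates=True):
    """Record an occurrence, folding it into an existing alert when suppression is on."""
    if suppress_duplicates:
        for alert in open_alerts:
            # Assumed duplicate formula: same alert name on the same computer.
            if alert["name"] == name and alert["computer"] == computer:
                alert["repeat_count"] += 1
                return alert
    alert = {"name": name, "computer": computer, "repeat_count": 0}
    open_alerts.append(alert)
    return alert


suppressed = []
unsuppressed = []
for _ in range(3):
    raise_alert(suppressed, "MOM Service CPU", "WEB01", suppress_duplicates=True)
    raise_alert(unsuppressed, "MOM Service CPU", "WEB01", suppress_duplicates=False)

print(len(suppressed), suppressed[0]["repeat_count"])
print(len(unsuppressed))
```

With suppression, three occurrences leave a single alert whose repeat count has been incremented twice; without it, the operator sees three separate alerts.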

Meeting Your Service Level Agreements with Service Level Exceptions and Custom Resolution States

As you may or may not be aware, you can create/modify/delete Resolution States in MOM. This can become quite useful in the context of SLAs because it is in the Resolution State entity of MOM that you can define the service level agreement time. If any alert in a given state exceeds the time allowed, it becomes a service level exception. Here are the steps to set resolution state times allowed:

Launch the Administrator Console, and expand the Administration node in the Explorer pane.
Select Global Settings, and double-click the Alert Resolution States Setting in the details pane.
Select the New Resolution State (which is a default state), and click the Modify button.
Focus your attention on the Service level agreement region of the dialog box, and change the default value of 10 minutes to 5 minutes. Click OK.
Click the Apply button and then OK in the Alert Resolution States dialog box.

In order to know if you are meeting your SLAs, you must be able to track any SLA violations (or, we hope, lack thereof). By altering the properties of an alert view in the OC, you can restrict your viewed alerts to only those that have violated a particular Resolution State's SLA. Here are the steps for viewing alerts that violate an SLA:

Right-click any alert view in the "thick" OC and select Properties (you should now know how to do this).
Check the box labeled "that violated specified service level agreement."
Click on the blue text in the View Description textbox that reads "specified."
Set whatever violation criteria you want, and click OK.
Click OK again in the Alert View properties dialog box.
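The rule behind a service level exception is simple enough to express as a short Python sketch. It assumes a hypothetical in-memory representation of alerts (the dictionary keys and the violates_sla function are not MOM names) and uses the 5-minute allowance that the earlier exercise set for the New resolution state.

```python
from datetime import datetime, timedelta

# Time allowed per resolution state; states without an entry have no SLA.
sla_by_state = {"New": timedelta(minutes=5)}


def violates_sla(alert, now, sla_by_state):
    """Return True if the alert has been in its current state longer than allowed."""
    allowed = sla_by_state.get(alert["resolution_state"])
    if allowed is None:
        return False
    return now - alert["state_entered"] > allowed


now = datetime(2005, 6, 1, 10, 10)
alerts = [
    {"name": "Disk space low", "resolution_state": "New",
     "state_entered": datetime(2005, 6, 1, 10, 0)},
    {"name": "Service stopped", "resolution_state": "New",
     "state_entered": datetime(2005, 6, 1, 10, 8)},
    {"name": "CPU threshold exceeded", "resolution_state": "Resolved",
     "state_entered": datetime(2005, 6, 1, 9, 0)},
]

exceptions = [a["name"] for a in alerts if violates_sla(a, now, sla_by_state)]
print(exceptions)
```

Only the alert that has sat in the New state for ten minutes exceeds the allowance; the two-minute-old alert is still within it, and the Resolved alert has no time limit in this model.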

Enforcing Accountability with Alert Ownership

As in any ticketing system, certain individuals are going to be "owners" of specific problem domains. So, we may have a SQL DBA on our team who would take ownership of all SQL Server-related alerts, an IIS geek on our team who would take ownership of all IIS-related alerts, and so on. Each individual or group to whom you wish to be able to assign ownership of an alert must be defined as either an Operator or a Notification Group in the Admin Console. By using the Alert Owner property you can create alert queues and enforce accountability for resolving specific alerts. To automatically assign alerts to a predetermined owner, simply type the name of an existing operator into the owner field on the alert's tab for a particular rule in the Admin Console.
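An alert queue is simply the set of alerts grouped by their Alert Owner value. The Python sketch below shows that grouping on hypothetical data; the owner names, the dictionary layout, and the "Unassigned" label are assumptions made for the example, not MOM terminology.

```python
from collections import defaultdict


def build_owner_queues(alerts):
    """Group alert names by owner; alerts without an owner go to 'Unassigned'."""
    queues = defaultdict(list)
    for alert in alerts:
        owner = alert.get("owner") or "Unassigned"
        queues[owner].append(alert["name"])
    return dict(queues)


alerts = [
    {"name": "SQL Server service stopped", "owner": "SqlDba"},
    {"name": "IIS application pool failure", "owner": "IisOperator"},
    {"name": "SQL Server blocking detected", "owner": "SqlDba"},
    {"name": "Disk space low", "owner": ""},
]

queues = build_owner_queues(alerts)
for owner, names in queues.items():
    print(owner, names)
```

A non-empty "Unassigned" queue is a useful signal in its own right: it lists the alerts for which nobody is yet accountable.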