Monitoring: a Four-Step Process | Microsoft Application Center 2000 Resource Kit 2001

The easiest way to approach the Application Center monitoring process is to break it down into the four major steps shown in Figure 7.14.

Figure 7.14 The major areas of Application Center monitoring activity and process flow

Generating Data

The first step in the monitoring process is creating data that will provide monitoring information. Enabled by WMI, Application Center uses the following major data sources to obtain information:

Application Center events
Health Monitor data collectors
Windows events
Performance counters

Logging Data

The next step in the monitoring process is logging the data that is generated. As you've seen already, Application Center uses the SQL desktop engine to store data for a cluster and its members, thereby extending the existing Windows event and performance logs—which also store certain information by default.

Querying Data

The third step involves querying the data store by using built-in components provided by Application Center. The various information views that are available through the user interface are obtained by using parameterized SQL queries—handled transparently by the user interface—that run against the ACLog database.

Viewing Data

The final step is presenting the information on the monitoring screens, which provide member-wide and cluster-wide views of events and performance.

Let's step through the first three steps in the monitoring process, starting with generating the data.

Generating Data

In the WMI section, you saw how events are fired by the operating system or an application. Application Center uses its own custom provider to generate events that supplement the data provided by Windows events and Health Monitor events. In addition to this event data, the Performance Monitor obtains counter-based performance data from the Performance Data Helper (PDH).

Application Center generates events for the following core services:

Cluster services
Replication service
Request forwarding
Monitoring

Figure 7.15 illustrates the architecture that's used to provide event information to WMI. The core services in the preceding list function as clients for the Passive Provider, which is a key element in sending Application Center events to WMI.

NOTE
The provider is a decoupled WMI event provider that sends Application Center event notifications to WMI. This provider is an in-process COM component created by the provider's clients, such as the cluster and replication services.

Figure 7.15 Eventing architecture

As you can see in Figure 7.15, the Application Center Event Provider writes a subset of errors and warnings to the Windows Event Log and it sends all event information to WMI.

Event Schema

The Application Center event schema is a hierarchy of WMI classes with a common root, MicrosoftAC_Base_Event, which is the Base class. The following .mof code shows how MicrosoftAC_Base_Event is defined:

class MicrosoftAC_Base_Event : __ExtrinsicEvent
{
    // Identifies the event. This is specific to the source that generated the
    // event log entry and is used, together with SourceName, to uniquely
    // identify an NT event type.
    [Key]
    uint32 EventId;

    // This uniquely identifies each instance of an event so that we can refer
    // to the event later. We will be automatically generating these events in
    // the provider.
    [Key]
    string GUID;

    // Error code
    uint32 Status;

    // Error message
    string StatusMessage;

    // Specifies the time at which the source generated the event.
    datetime TimeGenerated;

    // The severity level
    [Values {"Error", "Warning", "Information"}, ValueMap {1, 2, 4}]
    uint32 Type;
};

There are two types of event classes: Containers and Events. Containers are higher-level classes that serve as categories for events (Replication Session and Request Forwarding Initialization, for example). Every event generated at run time is an instance of a series of these hierarchical classes. A container query will return events for any of its children.
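The container behavior can be illustrated with ordinary class inheritance. The following Python sketch is an analogy only, not Application Center or WMI code: the class names loosely mirror the schema, and the `query` helper is hypothetical. It shows why a query against a container also returns the events raised by its children.

```python
class BaseEvent:
    """Stands in for MicrosoftAC_Base_Event (illustrative only)."""

    def __init__(self, status_message):
        self.status_message = status_message


class ReplicationEvent(BaseEvent):
    """Container: a category, not something a service fires directly."""


class ReplicationEngineEvent(ReplicationEvent):
    """Container nested inside the Replication container."""


class StartReplEvent(ReplicationEngineEvent):
    """Concrete event at the bottom of the hierarchy."""


class RequestForwardingEvent(BaseEvent):
    """Unrelated container, used to show what a query excludes."""


def query(events, container):
    """Return every event that is an instance of container or of a subclass."""
    return [event for event in events if isinstance(event, container)]


fired = [
    StartReplEvent("Synchronization enabled successfully"),
    RequestForwardingEvent("hypothetical forwarding event"),
]
replication_events = query(fired, ReplicationEvent)
```

Querying `ReplicationEvent` returns only the replication event, while querying `BaseEvent` returns both, which is the same effect a container query has on its child classes.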

All the classes that derive from the base class use the following naming convention: MicrosoftAC_class1_class2_class3_name_Event, where classn is the name of each of the parent classes, not including Base. Underscores indicate the class hierarchy. To reduce the length of class names, the parent classes can be abbreviated.

All classes end with Event to indicate that they are events per WMI convention. This extended class naming is done to make the event namespace more usable from monitoring tools, such as Health Monitor.

Schema Example

The class structure for Replication Service is:

Base
  Replication
    Engine
      General
        Events

This class structure is represented in WMI as:

MicrosoftAC_Base_Event
  MicrosoftAC_Replication_Event
    MicrosoftAC_Replication_Engine_Event
      MicrosoftAC_Replication_Engine_General_Event

If you enumerate MicrosoftAC_Replication_Engine_General_Event, you'll find several events represented by the following classes:

MicrosoftAC_Replication_Engine_General_SetReplAttrFailed_Event
MicrosoftAC_Replication_Engine_General_SetDriverAttrFailed_Event
MicrosoftAC_Replication_Engine_General_StartReplFailed_Event
MicrosoftAC_Replication_Engine_General_SetDriverAttr_Event
MicrosoftAC_Replication_Engine_General_SetReplAttr_Event
MicrosoftAC_Replication_Engine_General_StartRepl_Event
MicrosoftAC_Replication_Engine_General_StopRepl_Event
MicrosoftAC_Replication_Engine_General_DirChangeNotifyFailed_Event
MicrosoftAC_Replication_Engine_General_RemovedirectoryFailed_Event

The other clients that use the event provider implement a class structure and schema that follows the example given for the Replication Service.
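The naming convention can be sketched as a pair of small helpers. This is an illustration in Python, not part of Application Center; the function names `event_class_name` and `container_chain` are hypothetical, and the sketch ignores the optional abbreviation of parent class names.

```python
PREFIX = "MicrosoftAC"
SUFFIX = "Event"


def event_class_name(parents, name=None):
    """Join the parent class names (excluding Base) and an optional event name."""
    parts = [PREFIX, *parents]
    if name:
        parts.append(name)
    parts.append(SUFFIX)
    return "_".join(parts)


def container_chain(parents):
    """List the class names from the base class down to the innermost container."""
    chain = [event_class_name(["Base"])]
    for depth in range(1, len(parents) + 1):
        chain.append(event_class_name(parents[:depth]))
    return chain


hierarchy = ["Replication", "Engine", "General"]
start_repl = event_class_name(hierarchy, "StartRepl")
containers = container_chain(hierarchy)
```

Here `start_repl` reproduces one of the class names listed above, and `containers` reproduces the four-level WMI class structure from the schema example.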

Figure 7.16 shows how event information is generated from an instance of the Replication Service's class MicrosoftAC_Replication_Engine_General_StartRepl_Event. The event's status, "Synchronization enabled successfully," is passed to the event provider, which in turn forwards the information to WMI. Once this information is in WMI, the appropriate event consumer can access the data and write it to one or more event logs.

Figure 7.16 Architectural elements and process flow when an Application Center service generates an event

Two items should be noted in Figure 7.16. First, an event is written to the Windows Event Log only when an error occurs. Second, every event, whether or not it is an error, is sent to WMI. WMI, in turn, writes information to the Application Center log and any user-defined logs.

Logging Data

As you may have already gathered, Application Center does not collect data in a central cluster database. Instead, it persists data in a SQL desktop engine database that's installed on each member. Queries related to monitoring are run against the individual data stores to provide member-wide and cluster-wide reporting.

After data is generated, it has to be logged. This is accomplished by using the architecture illustrated in Figure 7.17.

The central element in this monitoring model is the Log Agent component, which functions as an intermediary between data consumers and the local instance of the SQL desktop engine database. (The consumers subscribe to the providers described in the preceding section.) The agent runs in-process with its logging clients (the consumers) and provides its services whenever a client has data that needs to be logged. Each local instance of the Log Agent maintains an OLE DB connection to the data store and provides an interface for structured logging. Each time the agent writes to the log, it combines the log data it receives (represented as a variant containing an array of variants) with the log parameters that the client passes (server information and time stamp) to generate a log record that's written to the database.

Figure 7.17 Logging architecture
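The Log Agent's role of combining client data with log parameters can be sketched as follows. This is a minimal Python illustration under loud assumptions: an in-memory SQLite database stands in for the SQL desktop engine and its OLE DB connection, a JSON string stands in for the variant array, and the `LogAgent` class, the `log_records` table, and its columns are all hypothetical rather than the real ACLog schema.

```python
import json
import sqlite3
from datetime import datetime, timezone


class LogAgent:
    """Hypothetical stand-in for the Log Agent; not the real component."""

    def __init__(self, connection):
        # The agent holds one connection to the local data store.
        self._conn = connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS log_records "
            "(server TEXT, logged_at TEXT, payload TEXT)"
        )

    def write(self, log_data, server, timestamp):
        """Combine the client's data with its log parameters into one record."""
        record = (server, timestamp.isoformat(), json.dumps(log_data))
        self._conn.execute("INSERT INTO log_records VALUES (?, ?, ?)", record)
        self._conn.commit()
        return record


agent = LogAgent(sqlite3.connect(":memory:"))
agent.write(
    [4, "Synchronization enabled successfully"],  # assumed payload layout
    "MEMBER01",                                   # hypothetical server name
    datetime(2001, 1, 1, 12, 0, tzinfo=timezone.utc),
)
```

The point of the sketch is the division of labor: the client supplies the data and the server and time stamp parameters, and the agent alone owns the connection and the shape of the stored record.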

The Event Logging Consumer

The Event Logging consumer is a permanent consumer that subscribes to events from the following sources:

Application Center
Health Monitor
Windows Event Log

WMI activates this component based on permanent consumer registration before delivering events. The Event Logging consumer is used by the user interface to configure the event query filters that determine which events to collect according to their level of severity.
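A severity-based event filter of this kind could plausibly be expressed as a WQL query. The Python sketch below only builds such a query string; the queries that Application Center actually registers are not shown in this chapter, so the `severity_filter` helper and its output are assumptions. The numeric values come from the Type property of MicrosoftAC_Base_Event, which applies to Application Center events only.

```python
# Values taken from the ValueMap on MicrosoftAC_Base_Event.Type.
SEVERITY = {"Error": 1, "Warning": 2, "Information": 4}


def severity_filter(levels, event_class="MicrosoftAC_Base_Event"):
    """Build a WQL event query that selects only the given severity levels."""
    if not levels:
        raise ValueError("at least one severity level is required")
    clauses = " OR ".join(f"Type = {SEVERITY[level]}" for level in levels)
    return f"SELECT * FROM {event_class} WHERE {clauses}"


errors_and_warnings = severity_filter(["Error", "Warning"])
```

Because the query targets the base class, it would match every Application Center event class that derives from it, subject to the severity clause.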

NOTE
This consumer runs as a COM+ application with the process identity of a cluster user. COM+ performs an access check when events are delivered by WMI via calls to the Event Logging consumer. During cluster creation, the user name or password for the server may be changed. If this happens, you must update the process identity of the consumer COM+ application accordingly.

Although WMI throttles the delivery of events from the provider to the consumer, there may be cases in which the consumer can't keep up with the incoming data, in which case an event buffer overflow occurs. If this happens, WMI will drop events. While there is no guarantee that data won't get lost, Application Center uses additional buffering to ensure that event data loss is minimal.
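The overflow behavior can be pictured with a bounded buffer. The Python sketch below is a generic illustration, not WMI's or Application Center's implementation: the `BoundedEventBuffer` class is hypothetical, and dropping the newest event when the buffer is full is an assumed policy, since the chapter does not say which events WMI discards.

```python
from collections import deque


class BoundedEventBuffer:
    """Fixed-capacity queue that counts the events it has to drop."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._queue = deque()
        self.dropped = 0

    def deliver(self, event):
        """Queue an event, or drop it if the consumer has fallen behind."""
        if len(self._queue) >= self._capacity:
            self.dropped += 1  # assumed policy: discard the incoming event
            return False
        self._queue.append(event)
        return True

    def consume(self):
        """Hand the oldest queued event to the consumer, or None if empty."""
        return self._queue.popleft() if self._queue else None


buffer = BoundedEventBuffer(capacity=3)
accepted = [buffer.deliver(f"event-{i}") for i in range(5)]
```

A larger capacity, which is the effect of the additional buffering mentioned above, reduces how often events are dropped but cannot rule it out if the consumer stays slower than the producer.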

The Performance Counter Logging Consumer

Application Center uses the Performance Counter Logging consumer to log performance metrics that are used for historical performance charts. This consumer is a permanent event consumer that links directly to the PDH to obtain data. The Performance Counter Logging component is implemented as a COM automation server and runs out-of-process to WMI. In the event that the consumer isn't running, WMI activates it before delivering the events that the consumer uses.

This consumer is configured through WMI with instances of the following configuration classes:

MicrosoftAC_CapacityLoggingConfig
MicrosoftAC_CapacityCounterConfig

Because changes to these configuration class instances are made on the cluster controller and replicated by the replication engine to every cluster member, each member picks up configuration changes to the Performance Counter Logging consumer.

Querying and Preparing Data

It's hard to say which is more difficult, getting data out of the log or putting it in. On the query side of the argument, the system has to locate the fields that are required for a specific view of the data, and then the data has to be formatted for display on the screen. Figure 7.18 shows the architecture that Application Center uses to access and query the data store, retrieve the data, format the data, and display the data in the user interface.

Figure 7.18 The event querying and viewing architecture

The key new elements in this architecture are the user interface (Web browser and MMC snap-in) and the Log Query Helper (LQH) Service.

The User Interface

Either user interface can send requests for information to the LQH Service. These actions are triggered by setting focus on a node in the console tree or by clicking a button (for example, Refresh) in the details pane of the snap-in.

The LQH Service

The role of the LQH Service is to provide the log data needed to populate the Event Viewer and Performance Viewer pages. It runs as a service named Application Center Log Query Helper under the local system account and depends on the RPC Service.

The rollup component performs these basic tasks:

Accepts requests from the user interface.
Passes a query to each server.
Returns results and status to the user interface.
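The three tasks above can be sketched as a simple fan-out loop. This Python example is a hypothetical illustration, not the LQH Service's actual interface: the `rollup` function, the per-member callables, and the status strings are all assumptions made for the sketch.

```python
def rollup(members, query):
    """Send one query to every member and collect rows plus per-member status."""
    results = []
    status = {}
    for name, run_query in members.items():
        try:
            rows = run_query(query)
        except Exception as exc:
            # A failing member is reported in the status, not raised.
            status[name] = f"failed: {exc}"
            continue
        results.extend((name, row) for row in rows)
        status[name] = "ok"
    return results, status


def healthy_member(query):
    """Hypothetical member that answers the query."""
    return ["event A", "event B"]


def offline_member(query):
    """Hypothetical member that cannot be reached."""
    raise ConnectionError("member unreachable")


results, status = rollup(
    {"MEMBER01": healthy_member, "MEMBER02": offline_member},
    "errors in the last hour",
)
```

Returning status alongside the results lets the user interface show a cluster-wide view even when one member cannot answer.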

The background information that we've provided about the different elements of Application Center health monitoring and how it's implemented should help you with the decisions that you'll have to make when modifying or creating new monitors, a topic that is covered in detail in Chapter 9, "Working with Monitors and Events."