As we noted in Chapter 7, "Monitoring," Application Center enables a default set of performance counters that are used to capture performance data on every cluster member and logs this data to the Application Center Events and Performance Logging database. As soon as you create a cluster on a server, or add a server to a cluster, counter logging is initiated and counter data is written to the local instance of the ACLog database.
NOTE
The default counters are defined in the file Perflogconsumer.mof, which is used to create the Windows Management Instrumentation (WMI) counter instances that the Application Center Events and Performance Logging database uses. In turn, a WMI performance-logging consumer uses an agent to write counter information to the database. In order to display this counter list in the user interface, Application Center queries the database with a query component.
Each of the installed counters can be enabled for graphing on the performance chart that's available for the cluster or member nodes by using a Web page dialog that you can launch from any performance chart that's displayed in the details pane of the snap-in. (See "Enabling Counter Graphing," later in this chapter.)
The cross-section of counters selected as the Application Center default performance counters is listed in Table 10.10. Based on feedback from Microsoft Consulting Services, product teams, early adopters, and beta testers, these counters were determined to be the ones that system administrators are most likely to use on a regular basis. They should meet most of your normal operational performance monitoring requirements. You'll notice that most of these counters have already been identified in earlier sections of the chapter that dealt with monitoring the different aspects of a Web server environment. You can, of course, add counters, which we'll cover later in this section.
In addition to listing the Application Center default performance counters alphabetically by name, Table 10.10 also provides a short description for each counter, identifies the counter's unit of measurement, and identifies the scope of the data. Scope describes what the data represents: the present value, an accumulated value, an average, or data collected over a period of time.
Table 10.10 Application Center Performance Counters
Counter | Description | Units | Scope |
---|---|---|---|
Available Bytes (memory) | The amount of physical memory that is available to processes running on the computer. It is calculated by summing space on the Zeroed, Free, and Standby memory lists. This figure should be at least 5 percent of total memory at all times. (See note 1.) | Bytes | Present value |
Bytes Total/sec (Web Service) | The sum of Bytes Sent/sec and Bytes Received/sec. This is the total rate of bytes that are transferred by the Web Service. | Integer | Data per time period |
Connections active (TCP) | The number of times TCP connections have made a direct transition to the Syn-sent state from the Closed state. | Integer | Present value |
Context Switches/sec (System) | This value can indicate excessive locking in code, perhaps creating a contention for resources. If too high, add another server or check with Microsoft for the latest patches. | Integer | Data per time period |
Current Connections (Web Service) | The number of current client connections to the Web Service. | Integer | Present value |
Current Disk Queue Length (physical disk) | The number of requests outstanding on the disk at the time the performance data is collected. It includes requests in service at the time of the reading. Multi-spindle disk devices can have multiple requests active at one time, but other concurrent requests are awaiting service. This counter might reflect a transitory high or low queue length, but if there is a sustained load on the disk drive, it is likely that this will be consistently high. Requests are experiencing delays proportional to the length of this queue minus the number of spindles on the disks. (See note 2.) | Integer | Present value |
Errors per second (ASP) | The number of errors generated by ASP applications, per second. | Integer | Data per time period |
Get Requests/sec (Web Service) | The number of HTTP requests that are using the GET method, per second. The GET method is the most common method used on the Web. | Integer | Data per time period |
ISAPI extension requests/sec (Web Service) | The number of ISAPI extension requests that are simultaneously being processed by the Web Service, per second. | Integer | Data per time period |
Page faults/sec (memory) | The number of times, per second, that the server reads the page file on the disk or from memory that is not assigned to the working set. Most CPUs can handle large numbers of page faults without consequence; however, if disk reads are high, there might be performance degradation. | Bytes | Data per time period |
Private Bytes (process: Inetinfo) | The number of bytes of memory that are taken up by a particular process (in this case, Inetinfo, which is part of IIS). | Bytes | Present value |
% Privileged Time (CPU) | The percentage of non-idle processor time spent in privileged mode. (Privileged mode is a processing mode designed for operating system components and hardware-manipulating drivers. It allows direct access to hardware and all memory. The alternative, user mode, is a restricted processing mode designed for applications, environment subsystems, and integral subsystems. The operating system switches application threads to privileged mode to access operating system services). % Privileged Time includes time servicing interrupts and deferred procedure calls (DPCs). A high rate of privileged time might be attributable to a large number of interrupts that are being generated by a failing device. This counter displays the average busy time as a percentage of the sample time. | Percentage | Average of accumulated values |
Processor Utilization (CPU) | The percentage of time that the processor is executing a non-idle thread. This counter was designed as a primary indicator of processor activity. It is calculated by measuring the time that the processor spends executing the thread of the Idle process in each sample interval, and subtracting that value from 100 percent. Processor bottlenecks are characterized by high Processor:% Processor Time numbers while the network adapter remains well below capacity. (See note 3.) | Percentage | Average of accumulated values |
% User Time (CPU) | The percentage of non-idle processor time spent in user mode. (User mode is a restricted processing mode designed for applications, environment subsystems, and integral subsystems. The alternative, privileged mode, is designed for operating system components and allows direct access to hardware and all memory. The operating system switches application threads to privileged mode to access operating system services.) This counter displays the average busy time as a percentage of the sample time. | Percentage | Average of accumulated values |
Request execution time (ASP) | The number of milliseconds that it took the most recent ASP request to complete. | Milliseconds | Last value |
Requests per second (ASP) | The number of requests executed, per second. | Integer | Data per time period |
Requests Queued (ASP) | The number of requests waiting for service from the queue. This number should be small, except during heavy traffic periods. Large numbers of queued requests indicate that there is a performance bottleneck somewhere in your server. | Integer | Present value |
Request wait time (ASP) | The amount of time that the most recent ASP request was waiting in the queue. | Milliseconds | Last value |
Total Server Memory (SQL Server: Memory Manager) | The total amount of dynamic memory the server is currently consuming. | Bytes | Present value |
1. This value should be greater than 20 MB.
2. This difference should average less than 2 for good performance.
3. Processor utilization does occasionally peak at fairly high levels, but this level should not be sustained for a long period.
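The guidelines attached to the Available Bytes and Current Disk Queue Length counters can be expressed as simple checks. The following is a minimal sketch only: the function names are hypothetical, the sample readings are invented, and it assumes that "20 MB" means 20 × 1024 × 1024 bytes.

```python
from statistics import mean

MIN_AVAILABLE_FRACTION = 0.05            # table: at least 5 percent of total memory
MIN_AVAILABLE_BYTES = 20 * 1024 * 1024   # note 1: greater than 20 MB (binary MB assumed)


def memory_ok(available_bytes, total_bytes):
    """Return True when Available Bytes satisfies both memory guidelines."""
    return (available_bytes >= MIN_AVAILABLE_FRACTION * total_bytes
            and available_bytes > MIN_AVAILABLE_BYTES)


def disk_queue_ok(queue_samples, spindles, limit=2):
    """Return True when the average of (queue length - spindles) is below the limit."""
    differences = [sample - spindles for sample in queue_samples]
    return mean(differences) < limit


if __name__ == "__main__":
    # Invented readings: 64 MB available out of 1 GB, and four queue samples.
    print(memory_ok(available_bytes=64 * 1024 * 1024, total_bytes=1024 ** 3))
    print(disk_queue_ok([3, 5, 2, 4], spindles=2))
```

Checks like these are most useful against logged samples collected over a period of time, because both counters report a present value that can be transitory.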
The System Test team's favorite counters
The Application Center System Test team identified the following counters as their favorites for isolating performance bottlenecks and identifying memory leaks:
- Active Server Pages: Requests/sec
- Active Server Pages: Errors/sec
- Active Server Pages: Transactions/sec
- Distributed Transaction Coordinator: Response Time -- Average
- Distributed Transaction Coordinator: Transactions/sec
- Memory: Available MBytes
- Network Interface: Bytes Total/sec
- Processor: % Processor Time
You can obtain a current list of the installed counters on a server running Application Center by using one of several techniques. The first method, of course, is via the Application Center user interface:
The Add a Counter dialog box, which displays all the counters that are currently installed on the system, appears.
NOTE
It is possible to get two different counter lists depending on the way you query for them. The Add a Counter dialog box queries data from the Application Center Events and Performance Logging database; all other methods query the WMI repository. If there are counters that are not enabled for logging, the two lists will differ, with the one retrieved from the database being shorter. You can retrieve old data for counters that are no longer being collected.
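If you capture both lists, a set comparison shows where they differ. This is only an illustrative sketch: the function name and the sample lists are hypothetical, and it assumes you have already exported the counter names from the database and from the WMI repository by some other means.

```python
def compare_counter_lists(database_names, wmi_names):
    """Report counter names that appear in only one of the two lists."""
    database = set(database_names)
    wmi = set(wmi_names)
    return {
        # Defined in WMI but not present in the logging database.
        "only_in_wmi": sorted(wmi - database),
        # Present in the database but no longer defined in WMI (old data).
        "only_in_database": sorted(database - wmi),
    }


if __name__ == "__main__":
    # Hypothetical sample lists, using counter names from Table 10.10.
    from_database = ["Available Bytes", "Requests Queued"]
    from_wmi = ["Available Bytes", "Requests Queued", "Context Switches/sec"]
    print(compare_counter_lists(from_database, from_wmi))
```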
The second method involves using the WMI Tester (Wbemtest.exe) or WMI Common Information Model (CIM) Studio (CIM Studio) and running one of these against the member from which you want to obtain counter information. Follow these steps:
Finally, for the third method, you can run the Counters.vbs script that's provided on the Application Center CD. In addition to obtaining a list of the installed counters, you can use this script to "delete" a counter. In the context of the Counters.vbs script, "delete" means to stop collecting data from the counter. It does not remove the counter from the ACLog database.
CAUTION
You should be extremely cautious when writing any scripts that access ACLog and remove counters. If done incorrectly, you can easily affect data integrity and corrupt the database.
To run this script:
Run without parameters, the script displays help for the two parameters that are available, /list and /delete. Use the /list parameter to list the installed counters and the /delete parameter, accompanied by a counter name enclosed in quotation marks, to delete the specified counter.
Here is the Counters.vbs script:
    ' "." targets the local computer; change it to query another member
    strComputerName = "."

    Set args = WScript.Arguments
    cmd = ""
    If args.Count > 0 Then
        cmd = args(0)
    End If

    Select Case cmd
        Case "/list"
            listCounters
        Case "/delete"
            ' /delete requires a counter name as the second argument
            If args.Count > 1 Then
                deleteCounter args(1)
            Else
                showHelp
            End If
        Case Else
            showHelp
    End Select

    Sub e(str)
        WScript.Echo str
    End Sub

    '
    ' Display help if script is executed without parameters
    '
    Sub showHelp()
        e "/list to display installed counters"
        e "/delete <counter name> to stop collecting a counter"
    End Sub

    '
    ' List the counters
    '
    Sub listCounters()
        Set wbemLocator = CreateObject("WbemScripting.SWbemLocator")
        ' Set impersonation (3 = Impersonate) before connecting so that it applies
        wbemLocator.Security_.ImpersonationLevel = 3
        Set wbemService = wbemLocator.ConnectServer(strComputerName, "root\MicrosoftApplicationCenter")
        Set counterInstances = wbemService.InstancesOf("MicrosoftAC_CapacityCounterConfig")
        For Each counter In counterInstances
            e counter.Name
        Next
    End Sub

    '
    ' Stop logging data from the specified counter
    '
    Sub deleteCounter(counterName)
        Set wbemLocator = CreateObject("WbemScripting.SWbemLocator")
        wbemLocator.Security_.ImpersonationLevel = 3
        Set wbemService = wbemLocator.ConnectServer(strComputerName, "root\MicrosoftApplicationCenter")
        wbemService.Delete "MicrosoftAC_CapacityCounterConfig.Name=""" & counterName & """"
        e "Deleted counter: " & counterName
    End Sub
If the counters that are provided don't completely meet your monitoring requirements, you can load additional counters into the Application Center namespace. Creating new counters isn't difficult; however, you should determine whether or not new counters are needed to meet an ongoing operational requirement.
We recommend that you only create new cluster-wide counters if you intend to gather data on an ongoing basis with the intention of accumulating historical data for reporting and planning purposes. In this case, you would create the counter on the cluster controller so that the updated counter collection is replicated to all the cluster members the next time there's a full synchronization—which you can force manually after you create the new counter(s).
In situations where you require additional monitoring capability for a short period of time, such as performance tuning on a single member, you can add performance counters to that member. Remember to take the member out of the synchronization loop before creating the new counter so that the local counter collection isn't overwritten by the counter definitions on the controller. After you've finished collecting performance data, you can bring the member back into the synchronization loop; the next time a full synchronization occurs, the counter collection will be restored to its original state. If a new counter is added on a member, you need to connect directly to that member—in the Connect to server dialog box, click Manage this server only—in order to see the counter on the member. If you don't do this, you will see only the counter list for the cluster controller.
An alternative to creating a new counter is to use the available operating system tools, such as Performance Monitor and Network Monitor, to perform in-depth monitoring of the server in question. With these tools, you can log the data you need for ongoing analysis without changing the structure of the ACLog database, and it is generally easier to isolate the information you require for tuning the server or an application.
Creating a new counter is accomplished by writing a counter definition and saving it as a MOF file or by modifying the sample counters file that's provided on the Application Center CD.
The following code illustrates a typical counter definition that defines a counter for the Application Center namespace:
    // Specifies the WMI namespace for the instance
    #pragma namespace("\\root\\MicrosoftApplicationCenter")

    //
    // Counter consumer class definition
    //
    instance of MicrosoftAC_CapacityCounterConfig
    {
        Name = "CPU 0 Interrupts/sec";
        CounterPath = "\\Processor(0)\\Interrupts/sec";
        CounterType = 1;
        Units = "";
        AggregationMethod = 1;
        ClusterAggregation = 1;
        DefaultScale = 0;
    };
After you run Mofcomp against this file, the new counter is created as an instance of the MicrosoftAC_CapacityCounterConfig class. After the performance log consumer retrieves this information and logs it, a stored procedure detects the counter identifier and then writes an entry to the counter metadata table. Data integrity is enforced through this process.
Let's analyze the preceding sample in more detail and then create a new counter definition that defines a new counter for the Application Center counter collection.
The required properties for a counter are as follows:
The following values can be used to specify an aggregation method for the counter:
WARNING
Do not use the Min or Max aggregation methods for ClusterAggregation when a counter specifies Sum—a cumulative counter—for server aggregation. The results are not useful, very unpredictable, and not supported. In addition, a ClusterAggregation value of 0 indicates no aggregation. As a result, this counter will not be displayed in the cluster-wide view. An example of this is Thread\ID Process. ID Process is the unique identifier for this process; ID Process numbers are reused, so they only identify a process for the lifetime of that process.
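To see why the choice of cluster aggregation matters, consider how per-member values combine. The sketch below is illustrative only: the method names are plain-text labels chosen for readability, not the numeric codes that the counter definition uses, and the readings are invented.

```python
from statistics import mean


def aggregate(values, method):
    """Combine one value per member into a single cluster-wide value."""
    if method == "none":
        return None  # no aggregation: nothing to show in the cluster-wide view
    if method == "sum":
        return sum(values)
    if method == "average":
        return mean(values)
    if method == "min":
        return min(values)
    if method == "max":
        return max(values)
    raise ValueError(f"unknown aggregation method: {method}")


if __name__ == "__main__":
    cpu_percent = [40.0, 70.0, 100.0]    # invented readings, one per member
    requests_per_sec = [120, 80, 100]

    print(aggregate(cpu_percent, "sum"))         # 210.0, not a usable percentage
    print(aggregate(cpu_percent, "average"))     # 70.0
    print(aggregate(requests_per_sec, "sum"))    # 300
```

A percentage such as processor utilization is only meaningful when averaged across members, whereas a rate such as requests per second can reasonably be summed to describe the cluster as a whole.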
Let's say, for example, that we want to add two counters to verify that there is a potential processor bottleneck caused by a client request. The two counters are Processor:Interrupts/sec and Processor:% DPC Time. The first counter tells us how many hardware interrupts the processor is receiving and servicing each second, and the second tells us what percentage of processor time is spent on deferred procedure calls.
The easiest way to obtain the counter information that is required for the counter definition is as follows:
The Add Counters dialog box appears.
Figure 10.4 shows the Performance snap-in with % DPC Time selected as the counter. Note also that the _Total instance is selected by default.
Figure 10.4 The Performance snap-in and the Add Counters dialog box
Using the information provided in the Add Counters dialog box, we can start building our MOF file to add the new counters. For the counter path, we have:
The next code sample contains our new counter definition for the %DPC Time counter:
    // Specifies the WMI namespace for the instance
    #pragma namespace("\\root\\MicrosoftApplicationCenter")

    //
    // DPC counter consumer class definition
    //
    instance of MicrosoftAC_CapacityCounterConfig
    {
        Name = "DPC Interrupts/sec";
        CounterPath = "\\Processor(_Total)\\% DPC Time";
        CounterType = 1;
        Units = "Interrupts/sec";
        //
        // Use averaging for cluster aggregation because summing this value across
        // the cluster does not provide meaningful results
        //
        AggregationMethod = 1;
        ClusterAggregation = 1;
        DefaultScale = 0;
    };
We can repeat the preceding steps to obtain information about the % Interrupt Time counter so that we can add it to the preceding code. When all of the necessary coding is finished, we'll save the file (as a text file with a .mof file name extension) on the server where we want to add the counter. Next, we'll open the command-line window and run Mofcomp against the file to add it to the Application Center counter collection. Finally, to verify that the counters were successfully added, we'll run Counters.vbs /list from the command line to obtain a list of the currently active counters. This list verifies that the WMI class instances were successfully stored in the WMI repository. To verify that the counter is available for logging in the Performance view, open the Add a Counter dialog box, and then confirm that the counter name is listed. If the counter isn't listed, check the Event view to see if any error events were generated from running Mofcomp to add the counter.
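When several counters have to be defined, the MOF text can be generated rather than typed by hand, which avoids mistakes in the doubled backslashes of the counter path. The following is a rough sketch, not part of Application Center: the helper names are hypothetical, the display name "Total % DPC Time" is an illustrative choice, and the default property values are simply copied from the sample definitions in this section.

```python
NAMESPACE_PRAGMA = r'#pragma namespace("\\root\\MicrosoftApplicationCenter")'


def mof_string(value):
    """Return value as a MOF string literal with backslashes and quotes escaped."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def counter_instance(name, counter_path, units="", counter_type=1,
                     aggregation_method=1, cluster_aggregation=1,
                     default_scale=0):
    """Render one MicrosoftAC_CapacityCounterConfig instance as MOF text."""
    return "\n".join([
        "instance of MicrosoftAC_CapacityCounterConfig",
        "{",
        f"    Name = {mof_string(name)};",
        f"    CounterPath = {mof_string(counter_path)};",
        f"    CounterType = {counter_type};",
        f"    Units = {mof_string(units)};",
        f"    AggregationMethod = {aggregation_method};",
        f"    ClusterAggregation = {cluster_aggregation};",
        f"    DefaultScale = {default_scale};",
        "};",
    ])


def build_mof(instances):
    """Join the namespace pragma and the rendered instances into one file body."""
    return "\n\n".join([NAMESPACE_PRAGMA, *instances]) + "\n"


if __name__ == "__main__":
    # Counter paths are written with single backslashes; mof_string doubles them.
    text = build_mof([
        counter_instance("CPU 0 Interrupts/sec", r"\Processor(0)\Interrupts/sec"),
        counter_instance("Total % DPC Time", r"\Processor(_Total)\% DPC Time"),
    ])
    print(text)
```

The printed text can be saved with a .mof file name extension and compiled with Mofcomp as described above.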
NOTE
You should add new counters on the cluster controller. Because counters are a replicated property, any new counter information is replicated to all the cluster members. In addition, the list of cluster-wide counters that is displayed in the Application Center snap-in is retrieved from the controller.
Through the Application Center user interface, you can enable counter graphing on a per-member basis or across the cluster. This provides flexibility in managing your members, particularly when some, such as ACDW802AS in the test cluster we set up, do not have the same performance capabilities as the other members.
The steps in enabling counter graphing in a performance chart are as follows:
Figure 10.5 illustrates the user interface for enabling a counter.
Cluster-wide performance graphs are displayed when you select the cluster node view. Server counter graphs are automatically rolled up to the cluster view—in accordance with the counter aggregation settings—when the same counter is enabled on every member. (See Figure 10.6, later in this chapter, for an illustration of cluster-wide counter displays.)
Figure 10.5 Using the Add a counter dialog box to enable graphing for a counter
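The roll-up rule for cluster-wide graphs (a counter appears in the cluster view only when it is enabled on every member) amounts to a set intersection. The sketch below is a simplified model with a hypothetical function name; apart from ACDW802AS, the member names are invented.

```python
def cluster_rollup_counters(enabled_by_member):
    """Return the counters that are enabled for graphing on every member."""
    member_sets = [set(counters) for counters in enabled_by_member.values()]
    if not member_sets:
        return []
    return sorted(set.intersection(*member_sets))


if __name__ == "__main__":
    enabled = {
        "ACDW802AS": ["Processor Utilization", "Requests Queued"],
        "MEMBER-B": ["Processor Utilization", "Requests Queued", "Available Bytes"],
        "MEMBER-C": ["Processor Utilization", "Available Bytes"],
    }
    print(cluster_rollup_counters(enabled))
```

In this example, only Processor Utilization would be eligible for the cluster view, because the other counters are missing from at least one member.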