Typical Processor-
A processor bottleneck occurs when the demand for processor threads exceeds the supply available to the system or the applications being deployed. Requests for processor time queue up, CPU utilization stays high until the queue empties, and system response degrades.
When you find that the processor utilization on a server is consistently high (90 percent or higher) it usually leads to processes queuing up, waiting for processor time, and
Let’s discuss an example of high processor utilization. If you are monitoring an IIS server hosting a single Web site that relies upon a legacy COM+ application written in Visual Basic 6 to parse through
NOTE
When examining processor usage, keep in mind the role of the computer and the type of work being done. High processor values on a SQL server are less desirable than on a Web server.
There are two
System Object
The System object and its associated counters measure aggregate data for threads running on the processor. They provide
The number of threads in the processor queue. Unlike the disk counters (discussed later in the chapter), this counter shows ready threads only, not threads that are running. There is a single queue for processor time even on computers with multiple processors. Therefore, if a computer has multiple processors, you need to divide this value by the number of processors
One way to determine whether a processor bottleneck exists with your application is to monitor the System\Processor Queue Length counter. A sustained queue length along with an over-utilized processor (90 percent and above) is a strong indicator of a processor bottleneck.
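The check described above can be sketched in a few lines of Python. This is only an illustration, not part of System Monitor: the function name and the per-processor queue limit of 2 are assumptions introduced here, while the 90 percent utilization figure and the division by the number of processors come from the text.

```python
from statistics import mean


def processor_bottleneck(queue_lengths, cpu_percents, num_processors,
                         queue_per_cpu_limit=2.0, cpu_limit=90.0):
    """Return True when both indicators hold across the sampled interval.

    queue_lengths: samples of the Processor Queue Length counter
    cpu_percents:  samples of processor utilization, in percent
    queue_per_cpu_limit: hypothetical threshold, not taken from the text
    """
    if not queue_lengths or not cpu_percents:
        raise ValueError("need at least one sample of each counter")
    if num_processors <= 0:
        raise ValueError("num_processors must be positive")
    # There is a single queue for all processors, so normalize per processor.
    avg_queue_per_cpu = mean(queue_lengths) / num_processors
    avg_cpu = mean(cpu_percents)
    return avg_queue_per_cpu > queue_per_cpu_limit and avg_cpu >= cpu_limit


# Hypothetical samples from a four-processor server.
print(processor_bottleneck([12, 10, 14], [95, 97, 92], num_processors=4))
```

Averaging over the samples is one simple way to express "sustained"; a stricter variant could require every sample to exceed the limits.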
When monitoring the Processor Queue Length counter we
You can also monitor Processor\% Interrupt Time for an indirect indicator of the activity of disk drivers, network adapters, and other devices that generate interrupts.
The combined rate at which all processors on the computer are switched from one thread to another. Context switches occur when a running thread voluntarily relinquishes the processor, is pre-empted by a higher priority ready thread, or switches between
A system that experiences excessive context switching due to inefficient application code or poor system architecture can be extremely costly in terms of resource usage. Your goal should always be to decrease the amount of context switching occurring on your application or database servers. Context switches
Finally, when monitoring your system you should make sure that the System\Context Switches/sec counter that
Disk Bottlenecks
Disk space is a recurring problem. No matter how much drive space you configure your servers or network storage devices with, your software seems to
The System Monitor measures different aspects of physical and logical disk performance. To understand the state of disk resource consumption you will need to monitor several disk counters, in some instances for several days. You will probably also need to work through some formulas to determine whether a disk bottleneck exists at your server; these formulas are detailed in the real-world example below. Before we delve into them, let’s review some of the counters you will monitor when hunting down a disk bottleneck. These counters allow you to troubleshoot, plan capacity, and measure the activity of your disk subsystem, and some of them supply the inputs that the disk bottleneck formulas require.
The average number of both read and
The average number of read requests that were queued for the selected disk during the sample interval.
The average number of write requests that were queued for the selected disk during the sample interval.
The average time, in seconds, of a read of data from the disk.
The time, in seconds, of the average disk transfer.
The rate of read operations on the disk.
The rate of write operations on the disk.
How the ACE Team
An internal product team at Microsoft was interested in evaluating server hardware from two different
Load the login page
Select a user
Load the
Submit actual work times to the manager
Load the resource views page
Set and save notification reminders
Delegate one task to another resource
The client machines were configured so that all of the 500 databases at the SQL server would be accessed during the tests. This helped prevent any one of the databases from receiving a majority of the SQL transactions. After configuring the client machines, the stress test harness was started and run for 20 minutes (15 minutes were set aside as a warm up period). During these 20 minutes, performance data at the SQL server was collected for benchmark purposes.
A wait time of 10 and 60 seconds was used when executing the load against the
When both scenarios were executed, significant disk read and write times were observed, which prompted an investigation into the disk capacity of the hardware being used. The calculations indicated that the I/O per disk exceeded the manufacturer’s specified I/O capacity for the disk.
The performance data collected during the 10 second and 60 second wait-time benchmark indicated the existence of a disk bottleneck at Server 1. In order to verify this, our team applied the performance data gathered from the physical disk activity to the following formula:
I/Os per Disk = [Reads + (4xWrites)] / Number of Disks
If the calculated I/Os per disk exceeded the capacity for the server, this would verify the existence of a disk bottleneck. The disk I/O capacity and the calculated I/Os per disk are outlined below. Note that for each of the calculations, 85 random I/Os per disk is used as the capacity for a disk in a RAID 5 configuration.
10-Second Wait Time Test Scenario on Server 1
Disk I/O capacity = 85 random I/Os per disk
Calculated I/Os per disk = [269.7 + (4x74.6) ] / 5
Calculated I/Os per disk = 113.62 random I/Os per disk
At 113.62 random I/Os per disk Server 1 is suffering from a disk bottleneck, as the capacity for each disk in the server is only 85 random I/Os per disk.
10-Second Wait Time Test Scenario on Server 2
Disk I/O capacity = 85 random I/Os per disk
Calculated I/Os per disk = [138.3 + (4x43.0)] / 4
Calculated I/Os per disk = 77.6 random I/Os per disk
At 77.6 random I/Os per disk Server 2 is below the capacity of 85 random I/Os per disk, therefore no disk bottleneck exists.
60-Second Wait Time Test Scenario on Server 1
Disk I/O capacity = 85 random I/Os per disk
Calculated I/Os per disk = [294.8 + (4x71.8) ] / 5
Calculated I/Os per disk = 116.4 random I/Os per disk
At 116.4 random I/Os per disk Server 1 is suffering from a disk bottleneck as the capacity for each disk in the server is only 85 random I/Os per disk.
60-Second Wait Time Test Scenario on Server 2
Disk I/O capacity = 85 random I/Os per disk
Calculated I/Os per disk = [68.9 + (4x24.0) ] / 4
Calculated I/Os per disk = 41.2 random I/Os per disk
At 41.2 random I/Os per disk Server 2 is significantly below the capacity of 85 random I/Os per disk, therefore no disk bottleneck exists. At 113.62 and 116.4 random I/Os per disk respectively, Server 1 is suffering from a disk bottleneck, as the capacity for each disk in the server is only 85 random I/Os per disk thus
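The four calculations above all apply the same formula, so they can be reproduced with a short Python sketch. The function and variable names are illustrative choices made here; the read and write rates, disk counts, write multiplier of 4, and the capacity of 85 random I/Os per disk are the figures quoted in the text.

```python
def ios_per_disk(reads, writes, num_disks, write_penalty=4):
    """Apply I/Os per disk = [reads + (write_penalty x writes)] / num_disks."""
    if num_disks <= 0:
        raise ValueError("num_disks must be positive")
    return (reads + write_penalty * writes) / num_disks


DISK_CAPACITY = 85  # random I/Os per disk in a RAID 5 configuration

# (reads per second, writes per second, number of disks) for each scenario
scenarios = {
    "Server 1, 10-second wait": (269.7, 74.6, 5),
    "Server 2, 10-second wait": (138.3, 43.0, 4),
    "Server 1, 60-second wait": (294.8, 71.8, 5),
    "Server 2, 60-second wait": (68.9, 24.0, 4),
}

for name, (reads, writes, disks) in scenarios.items():
    load = ios_per_disk(reads, writes, disks)
    verdict = "bottleneck" if load > DISK_CAPACITY else "within capacity"
    print(f"{name}: {load:.1f} I/Os per disk ({verdict})")
```

Keeping the write multiplier as a parameter makes it easy to rerun the same check for a disk configuration with a different write cost, although only the RAID 5 value of 4 is used in this example.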
Disk Architecture Matters to Performance
Today, many Web applications are built to interact with a database server. Many if not all of the applications we test use SQL Server 2000, and in most cases we find significant performance gains by tuning the SQL server. These wins come through optimization of the SQL code, database schema, or disk utilization. When designing the architecture of your database, you must decide how data and log files are read from and written to disk. For example, do you want to write your log files to a RAID device or to a non-RAID device? The wrong choices can lead to a disk bottleneck. In one such case we were able to apply formulas that proved or disproved the existence of a disk bottleneck. You will find details of the project and the formulas used in the real-world example above.
Memory
When analyzing the performance of your Web applications, you should determine if a system is starving for memory due to a memory leak or other application fault, or if the system is simply over-used and requires more hardware. In this section we discuss the counters you should monitor to determine the existence and then cause of the memory bottleneck. (Note that there are tools available to you other than System Monitor to analyze memory utilization of a server. It may be worth your while to investigate some of these tools, as they can save time when monitoring the system.)
The average number of pages faulted per second. It is measured in number of pages faulted per second because only one page is faulted in each fault operation; hence this is also equal to the number of page fault operations. This counter includes both hard faults (those that require disk access) and soft faults (where the faulted page is found elsewhere in physical memory). Most processors can handle large numbers of soft faults without significant consequences. However, hard faults, which require disk access, can cause significant delays.
Indicates how many bytes of memory are currently available for use by processes. Pages/sec provides the number of pages that were either retrieved from disk due to hard page faults or written to disk to free space in the working set due to page faults.
The rate at which the disk was read to resolve hard page faults. It shows the number of read operations, without regard to the number of pages retrieved in each operation. A hard page fault occurs when a process references a page in virtual memory that is not in the working set or elsewhere in physical memory, and must be retrieved from disk. This counter is a primary indicator of the kinds of faults that cause system-wide delays. It includes read operations to
The rate at which pages are written to disk to free up space in physical memory. Pages are written to disk only if they are changed while in physical memory, so they are likely to hold data, not code. This counter shows write operations, without regard to the number of pages written in each operation. This counter displays the difference between the values observed in the last two samples, divided by the duration of the sample interval.
The rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays. It is the sum of Memory\Pages Input/sec and Memory\Pages Output/sec. It is counted in numbers of pages, so it can be compared to other counts of pages, such as Memory\Page Faults/sec, without conversion. It includes pages retrieved to satisfy faults in the file system cache (usually requested by applications) and non-cached mapped memory files.
How the ACE Team Discovered a Memory Leak
In this example we discuss how we were able to determine the existence of a memory leak in an application that was submitted to our team for performance testing. Performance analysts on our team met with the development team to understand some of the common user scenarios for the Web application. The analyst discussed existing performance issues the development team was aware of. The developers were
| Windows 2000 IIS 5.0 | ~Average-IIS | ~Maximum / Total-IIS |
|---|---|---|
| System-% Total Processor Time | 55% | 100% |
| Inetinfo-% Total Processor Time | 0.5% | 1% |
| Dllhost-% Total Processor Time | 41% | 100% |
| Memory: Available in Megabytes | 164 MB | 185 MB |
| Memory: Pages/sec | | 0.2 |
| Inetinfo: Private in Megabytes | 14 MB | 14 MB |
| Dllhost: Private in Megabytes | 38 MB | 56 MB |

| Windows 2000 IIS 5.0 | ~Average-IIS | ~Maximum / Total-IIS |
|---|---|---|
| System-% Total Processor Time | 69% | 100% |
| Inetinfo-% Total Processor Time | 0.6% | 1.5% |
| Dllhost-% Total Processor Time | 71% | 100% |
| Memory: Available in Megabytes | 56 MB | 196 MB |
| Memory: Pages/sec | 51 | 295 |
| Inetinfo: Private in Megabytes | 14 MB | 14.4 MB |
| Dllhost: Private in Megabytes | 368 MB | 671 MB |
Memory leaks should be investigated by monitoring Memory\Available Bytes, Process\Private Bytes, and Process\Working Set. A memory leak would typically
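One way to make the Process\Private Bytes trend concrete is to fit a straight line through logged samples and look at its slope. The Python sketch below is a hedged illustration only: the function names, the growth threshold of 10 MB per hour, and the sample values are hypothetical and are not taken from the test runs described in this chapter.

```python
def private_bytes_slope(samples_mb, interval_seconds):
    """Least-squares slope of Private Bytes samples, in MB per second."""
    n = len(samples_mb)
    if n < 2:
        raise ValueError("need at least two samples")
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    times = [i * interval_seconds for i in range(n)]
    mean_t = sum(times) / n
    mean_y = sum(samples_mb) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, samples_mb))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def looks_like_leak(samples_mb, interval_seconds, min_growth_mb_per_hour=10.0):
    """Flag steady growth; the threshold is a hypothetical value."""
    slope_per_hour = private_bytes_slope(samples_mb, interval_seconds) * 3600
    return slope_per_hour >= min_growth_mb_per_hour


# Hypothetical Private Bytes samples (MB), logged every 10 minutes.
growing = [40, 95, 150, 210, 270, 330]
steady = [38, 39, 38, 39, 38, 39]
print(looks_like_leak(growing, 600), looks_like_leak(steady, 600))
```

A rising slope alone does not prove a leak, since caches and warm-up can also grow memory, so a check like this is best treated as a prompt for closer investigation.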
Create and Configure Alerts
You can configure the Performance Logs and Alerts service to fire off alerts when a specified performance event has occurred at the server. For example, if the available memory at the Web server
Logs an entry to the application event log
Sends a network message to a specified user
Starts a performance data log
Runs a specified program
There are several instances when configuring an alert to trigger an event helps increase your testing efficiency. One is when you are running an extended stress test. Let’s say the stress test must be run over a 24-hour period and you are particularly interested in what happens with the Web server’s memory. You could configure an alert that records an event to the application event log each time a spike occurs with the Pages/sec counter. This way, you don’t have to try to count the number of spikes in an
To create an alert follow these steps:
To open Performance, click Start, point to Programs, point to Administrative Tools, and then click Performance.
Double-click Performance Logs and Alerts, and then click Alerts. Any existing alerts will be listed in the details pane. A green icon indicates that an alert is running; a red icon indicates an alert has been stopped or is not currently active.
Right-click a blank area of the details pane and click New Alert Settings.
In Name, type the name of the alert, and then click OK.
To define a comment for your alert, along with counters, alert thresholds, and the sample interval, use the General tab. To define actions that should occur when counter data triggers an alert, use the Action tab, and to define when the service should begin scanning for alerts, use the Schedule tab.
NOTE
You must have Full Control access to a subkey in the registry in order to create or modify a log configuration. The subkey is HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\SysmonLog\Log Queries. In general, administrators have this access by default. Administrators can grant access to users using the Security menu in Regedt32.exe. In addition, to run the Performance Logs and Alerts service (which is installed by Setup and runs in the background when you configure a log to run), you must have the right to start or
CAUTION
Incorrectly editing the registry may severely damage your system. Before making changes to the registry, you should back up any valued data on the computer.
To define counters and thresholds for an Alert, follow these steps:
Open Performance.
Double-click Performance Logs and Alerts, and then click Alerts.
In the details pane, double-click the alert.
In Comment, type a comment to describe the alert as needed.
Click Add.
For each counter or group of counters that you want to add to the log, perform the following steps:
To monitor counters from the computer on which the Performance Logs and Alerts service will run, click Use Local Computer Counters.
Or, to monitor counters from a specific computer regardless of where the service is run, click Select Counters From Computer and specify the name of the computer you want to monitor.
In Performance object, click an object to monitor.
In Performance counters, click one or more counters to monitor.
To monitor all instances of the selected counters, click All Instances. (Binary logs can include instances that are not available at log startup but subsequently become available.)
Or, to monitor particular instances of the selected counters, click Select Instances From List, and then click an instance or instances to monitor.
Click Add.
In Alert When The Value Is, specify Under or Over, and in Limit, specify the value that triggers the alert.
In Sample Data Every, specify the amount and the unit of measure for the update interval.
Complete the alert configuration using the Action and Schedule tabs.
NOTE
When creating a monitoring console for export, be sure to select Use Local Computer Counters. Otherwise, counter logs will obtain data from the computer named in the text box, regardless of where the console file is installed.
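The threshold settings in the steps above (a counter, Under or Over, and a Limit, evaluated at each sample) can be modeled with a small Python sketch. This is not how the Performance Logs and Alerts service is implemented; the `Alert` class, the `count_triggers` helper, and the sample values are hypothetical, and the strict comparisons are an assumption about how Under and Over behave.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    counter: str     # counter path, for example Memory\Pages/sec
    direction: str   # "over" or "under", as in Alert When The Value Is
    limit: float     # the value entered in Limit

    def triggered(self, value):
        """Return True when a sampled value crosses the limit."""
        if self.direction == "over":
            return value > self.limit
        if self.direction == "under":
            return value < self.limit
        raise ValueError("direction must be 'over' or 'under'")


def count_triggers(alert, samples):
    """Count how many samples would have fired the alert."""
    return sum(1 for value in samples if alert.triggered(value))


# Hypothetical Pages/sec samples, one per sample interval.
pages_alert = Alert(r"Memory\Pages/sec", "over", 50)
print(count_triggers(pages_alert, [10, 51, 295, 50, 3]))
```

Counting triggers this way mirrors the idea of having the alert log each spike during a long stress test instead of counting spikes by hand.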
To define actions for an alert, follow these steps:
Open Performance.
Double-click Performance Logs and Alerts, and then click Alerts.
In the details pane, double-click the alert.
Click the Action tab.
To have the Performance Logs and Alerts service create an entry visible in Event Viewer, select Log An Entry in the Application Event Log.
To have the service trigger the messenger service to send a message, select Send a Network Message to and type the name of the computer on which the alert message should be displayed.
To run a counter log when an alert occurs, select Start Performance Data Log and specify the counter log you want to run.
To have a program run when an alert occurs, select Run This Program and type the file
To start or stop a counter log, trace log, or alert manually, follow these steps:
Open Performance.
Double-click Performance Logs and Alerts, and click Counter Logs, Trace Logs, or Alerts.
In the details pane, right-click the name of the log or alert you want to start or stop, and click Start to begin the logging or alert activity you defined, or click Stop to terminate the activity.
NOTE
There may be a
To remove counters from a log or alert, follow these steps:
Open Performance.
Double-click Performance Logs and Alerts, and then click Counter Logs or Alerts.
In the details pane, double-click the name of the log or alert.
Under Counters, click the counter you want to remove, and then click Remove.
To view or change properties of a log or alert, follow these steps:
Open Performance.
Double-click Performance Logs and Alerts.
Click Counter Logs, Trace Logs, or Alerts.
In the details pane, double-click the name of the log or alert.
View or change the log properties as needed.
To define start or stop parameters for a log or alert, follow these steps:
Open Performance.
Double-click Performance Logs and Alerts, and then click Counter Logs, Trace Logs, or Alerts.
In the details pane, double-click the name of the log or alert.
Click the Schedule tab.
Under Start Log, click one of the following options:
To start the log or alert manually, click Manually. When this option is selected, to start the log or alert, right-click the log name in the details pane, and click Start.
To start the log or alert at a specific time and date, click At, and then specify the time and date.
Under Stop Log, select one of the following options:
To stop the log or alert manually, click Manually. When this option is selected, to stop the log or alert, right-click the log or alert name in the details pane, and click Stop.
To stop the log or alert after a specified duration, click After, and then specify the number of intervals and the type of interval (days, hours, and so on).
To stop the log or alert at a specific time and date, click At, and then specify the time and date. (The year box accepts four characters; the others accept two
To stop a log when the log file becomes full, select options as
For counter logs, click When the Log File is Full. The file will continue to accumulate data according to the file-
For trace logs, click When the n-MB Log File is Full. The file will continue to accumulate data according to the file-size limit you set on the Log Files tab (in megabytes).
Complete the properties as appropriate for logs or alerts:
When setting this option, take into consideration your available disk space and any disk quotas that are in place. An error might occur if your disk runs out of disk space due to logging.
For logs, under When a Log File Closes, select the appropriate option:
If you want to configure a circular (continuous, automated) counter or trace logging, select Start a New Log File.
If you want to run a program after the log file stops (for example, a copy command for transferring completed logs to an archive site), select Run This Command. Also type the path and file name of the program to run, or click Browse to locate the program.
For alerts, under When An Alert Scan Finishes, select Start a New Alert Scan if you want to configure continuous alert scanning.
To delete a log or alert, follow these steps:
Open Performance.
Double-click Performance Logs and Alerts.
Click Counter Logs, Trace Logs, or Alerts.
In the details pane, right-click the name of the log or alert, and click Delete.
When you schedule a log to close at a specific time and date or close the log manually, the Start a New Log File option is unavailable.