Monitoring your site’s performance and reliability is an important part of your operations. Monitoring allows you to identify potential problems before they occur so that you can prevent these problems from turning into emergencies. By monitoring a system, you can determine when to upgrade your system and whether changes made to your system are helping or hurting performance. Monitoring allows you to establish performance baselines that you can compare to monitoring data collected at a later time, allowing you to determine what changes have occurred. From this information, you can decide which actions to take. In this lesson you’ll learn how to design a monitoring strategy that allows you to assess your system’s performance so that you can address potential problems that might occur in your Web environment.
Windows 2000 and IIS contain objects that allow you to gather performance data from various components in your system. Performance objects typically correspond to major hardware components such as memory or processors. The objects are built into the operating system, although other programs may install their own performance objects. Each performance object contains a set of counters that provide information about specific aspects of a system or service. For example, the Processor performance object is associated with the processors on your systems. The Processor performance object contains a number of counters, such as the % Processor Time counter and the % User Time counter. Some performance objects can have multiple instances. For those objects you can track statistics for each instance.
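Monitoring tools commonly identify a counter by a path that names the computer, the performance object, the instance (if any), and the counter. The following Python sketch splits such a path into its parts. It assumes the usual \\Computer\Object(Instance)\Counter notation; the CounterPath type, the parse_counter_path function, and the computer name WEB01 are hypothetical names used only for illustration.

```python
import re
from typing import NamedTuple, Optional


class CounterPath(NamedTuple):
    computer: Optional[str]   # None means the local computer
    obj: str                  # performance object, such as "Processor"
    instance: Optional[str]   # None for objects without instances
    counter: str              # counter name, such as "% Processor Time"


# Optional \\Computer prefix, then \Object or \Object(Instance), then \Counter.
_PATH_RE = re.compile(
    r"^(?:\\\\(?P<computer>[^\\]+))?"
    r"\\(?P<obj>[^\\(]+)"
    r"(?:\((?P<instance>[^)]*)\))?"
    r"\\(?P<counter>.+)$"
)


def parse_counter_path(path: str) -> CounterPath:
    """Split a counter path into computer, object, instance, and counter."""
    match = _PATH_RE.match(path)
    if match is None:
        raise ValueError(f"not a counter path: {path!r}")
    return CounterPath(
        match.group("computer"),
        match.group("obj"),
        match.group("instance"),
        match.group("counter"),
    )


local = parse_counter_path(r"\Processor(_Total)\% Processor Time")
remote = parse_counter_path(r"\\WEB01\Memory\Available Bytes")  # WEB01 is hypothetical
```

Here local.instance is "_Total" and local.computer is None, while remote.computer is "WEB01" and remote.instance is None because the Memory object in this example has no instance.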
Microsoft provides a number of tools that allow you to monitor performance. Some of these tools are included with Windows 2000 and IIS, others are available through the Windows 2000 Server Resource Kit, and still others can be downloaded from Microsoft’s Web site at http://www.microsoft.com. These tools use performance objects and counters to monitor systems and services. Some of the most commonly used tools are the Performance tool, Task Manager, Windows Management Instrumentation (WMI), and Event Viewer, which are described below. In addition to these four tools, Microsoft provides other tools that you can use to monitor performance. These tools are described in Table 11.1.
Table 11.1 Tools Available to Monitor Performance
HTTP Monitoring Tool: Allows you to monitor Hypertext Transfer Protocol (HTTP) activity and set alerts that notify you of dramatic changes in activity
Allows you to monitor network traffic
A command-line tool that detects current network connections and lists them with information about protocol, local address, foreign address, and state
Performance Counter Check: A scriptable COM object that allows you to read performance counters from within Microsoft Windows Script Host (WSH) or from within an Active Server Pages (ASP) file
Allows you to view each process on the local system in detail, set thread priority, view and change security settings, and terminate processes
Allows you to view each process on the local system in detail
Process Thread and Status: Allows you to view the status of all processes and threads
Allows you to query the process inheritance tree and stop processes on local and remote computers
Allows you to examine each process in detail, set process and thread priority, and stop the process if necessary
Web Application Stress Tool (WAST): Simulates multiple browsers requesting pages from a Web site so you can gather performance and stability information about your Web applications
Web Capacity Analysis Tool (WCAT): Simulates various workloads on client-server configurations so that you can test how your IIS 5.0 and network configurations will respond to different client requests for content, data, or HTML pages
The Performance tool is a Microsoft Management Console (MMC) interface that contains two snap-ins: the System Monitor snap-in and the Performance Logs and Alerts snap-in. Figure 11.1 shows the Performance console and the two snap-ins. Notice that the System Monitor snap-in is active and is monitoring the % Processor Time counter and the % User Time counter, both of which are part of the Processor object. Also notice the spikes in activity when applications are opened.
Figure 11.1 - System Monitor (in the Performance console) monitoring the % Processor Time counter and the % User Time counter
The Performance console is the most effective Microsoft tool that you can use to establish a baseline of server performance. It also allows you to monitor and measure the effects of any changes you make to software or hardware. While you’re monitoring performance, you can view counter readings graphically, log counter activity, or set alerts that are triggered when specified thresholds are met.
System Monitor allows you to collect and view extensive data about the usage of hardware resources and the activity of system services on computers. You can collect and view real-time performance data on a local computer or from several remote computers. You can define the data that you want to collect by type of data (objects, counters, and instances) and source of data (local computer or remote computers). To view performance data in System Monitor, you simply add specific counters to your view.
Performance Logs and Alerts allows you to collect performance data automatically from local or remote computers and save that data to logs. You can then view the logged data by using System Monitor or by exporting the data to spreadsheet programs or databases for analysis and report generation. Like System Monitor, Performance Logs and Alerts uses the performance objects, counters, and instances to provide data about hardware resources and system services. Performance Logs and Alerts supports two types of logs: counter logs and trace logs. With counter logs, data is collected at specified intervals. With trace logs, data is collected when certain activities occur, such as when a disk I/O operation or a page fault occurs.
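Once a counter log has been exported to a text format, you can summarize it to establish a baseline. The following Python sketch is one way to do that. It assumes a comma-separated export whose first column is a timestamp and whose remaining columns each hold one counter; the sample values are invented for illustration, and summarize_counter_log is a hypothetical helper, not part of any Microsoft tool.

```python
import csv
import io
import statistics

# Invented sample data standing in for an exported counter log.
SAMPLE_LOG = """Time,\\Processor(_Total)\\% Processor Time,\\Memory\\Available Bytes
10:00:00,12.5,268435456
10:00:15,48.0,251658240
10:00:30,31.5,260046848
"""


def summarize_counter_log(text: str) -> dict:
    """Return {counter name: (minimum, mean, maximum)} for each counter column."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = {name: [] for name in header[1:]}  # skip the timestamp column
    for row in reader:
        for name, value in zip(header[1:], row[1:]):
            if value.strip():  # tolerate blank samples
                columns[name].append(float(value))
    return {
        name: (min(values), statistics.mean(values), max(values))
        for name, values in columns.items()
        if values
    }


baseline = summarize_counter_log(SAMPLE_LOG)
```

Saving a summary like this from a period of normal operation gives you figures to compare against logs collected after a configuration or hardware change.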
Performance Logs and Alerts also allows you to set an alert on a counter. When you configure an alert, you must specify the alert threshold on the counter. For example, suppose you want to set an alert on the % Processor Time counter. You configure the alert so that the threshold is reached when the counter value exceeds 20 percent. You must then determine what action should be taken when the value exceeds that threshold. You can configure the alert to take any of the following actions:
Like the Performance tool, Task Manager provides performance information about your systems. Task Manager shows you a snapshot of programs and processes running on your computer. It also provides a summary of processor and memory usage, as shown in Figure 11.2.
Figure 11.2 - Performance tab in Task Manager
The Performance tab provides a dynamic overview of your computer’s performance. It includes graphs for CPU and memory usage; the total number of handles, threads, and processes running; and the total kilobytes (KB) of physical, kernel, and commit memory. You can also configure the Performance tab to display the amount of CPU resources consumed by kernel operations.
Task Manager is useful as a quick reference to system operation and performance; however, the Performance tool provides far more capabilities. Although Task Manager uses data from some of the same performance counters used by the Performance tool, Task Manager doesn’t have access to the breadth of information available from all installed counters. In addition, Task Manager doesn’t support logs and alerts.
WMI is the Microsoft implementation of Web-Based Enterprise Management (WBEM), which provides uniform access to management information. WMI is integrated with the Common Information Model (CIM), an extensible object-oriented schema for managing systems, networks, applications, databases, and devices. WMI allows you to monitor, track, and control system events related to software applications, hardware components, and networks.
WMI provides the infrastructure for system monitoring by exposing hardware and software diagnostics in a common application programming interface (API). You can use this infrastructure for any WMI management data.
Windows 2000 allows you to collect data about system resources, such as disks, memory, processors, and network components. By default, the operating system uses the registry to collect this data. However, you can collect this data by using the WMI interface instead of the registry. WMI consolidates data from the hardware platform, drivers, and applications and passes it on to a management information store. The store uses CIM to expose and interface with the data it holds. Together, WMI and CIM enable management applications, platforms, and consoles to perform a variety of tasks, including monitoring and logging events.
You can use WMI with the Performance tool to gather performance data. At the command prompt, type perfmon /wmi. The Performance tool will open as before and will include the System Monitor snap-in and the Performance Logs and Alerts snap-in. However, the data will be collected through WMI rather than through the registry.
Event Viewer is another tool that allows you to monitor your system. Event Viewer maintains logs about application, security, and system events on your computer. Event Viewer allows you to view and manage event logs, gather information about hardware and software problems, and monitor Windows 2000 security.
Event Viewer supports several types of logs, including the Application log, the Security log, and the System log. Each log can contain the following types of events: Error, Warning, Information, Success Audit, and Failure Audit. Figure 11.3 shows the System log in Event Viewer. Notice that several event types are shown in the detail pane.
When you configure an alert in the Performance Logs and Alerts snap-in, you can specify that an event is logged to the Application log if the threshold is exceeded. You can then use Event Viewer to view that log to determine when and how often that alert has been triggered.
Figure 11.3 - System log in Event Viewer
Event logs are used extensively in auditing your system. Auditing is discussed in Lesson 2, "Designing a Security Auditing Strategy."
When monitoring a system that supports a Web site and its applications and data, you should start by focusing on specific areas of performance, including memory, processing, network I/O, security overhead, and Web applications. You can use the Performance tool to monitor each area. Be sure to log data for several days in order to gather a reliable cross section of activity, including unusually high and low activity.
This section provides an overview of the type of performance objects and counters that you should monitor in order to evaluate your system’s performance. For the names of the specific counters that you should monitor, see the Microsoft Windows 2000 Server Resource Kit.
Memory should be the first component in your system that you monitor. Inadequate memory can make problems appear to reside in other parts of your system. For example, what appears on the surface to be poor disk or processor performance can in fact be the result of a memory problem. You should rule out memory performance problems before investigating other components.
In Windows 2000, the largest component of activity is the process. Each process contains threads, which are used to accomplish particular tasks. The physical RAM available to the process is called the working set. If the process exceeds the amount of available RAM, it can’t store all of its code and frequently used data. As a result, some of this information must be stored on a disk, which results in an increase in disk activity. Figure 11.4 provides an overview of how a process and its threads use memory.
Figure 11.4 - The working set of a process and its threads
When collecting data about your memory’s performance, you should monitor the following components: available memory, paging, file system cache, paging file size, and memory pool size.
Your system should have enough memory to provide space for the Inetinfo working set so that Windows 2000 will not have to perform disk operations. You should check the size of the working set to determine how much its size varies in response to general memory availability on the server. Your available memory shouldn’t dip below 5 percent of the amount of physical RAM on the server. You should also compare the size of the working set to the rate of page faults attributed to that process. If you can’t lower the page fault rate to an acceptable level, you might have to add memory.
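The 5 percent guideline is simple arithmetic, which the following Python sketch makes explicit. The function name, the MB constant, and the 512-MB server are assumptions chosen for illustration; in practice you would take the available figure from the lowest reading in your logged data.

```python
MB = 1024 * 1024


def available_memory_ok(available_bytes: int, physical_bytes: int,
                        floor_fraction: float = 0.05) -> bool:
    """Return True when available memory is at or above the floor.

    floor_fraction defaults to the 5 percent guideline from the text.
    """
    if physical_bytes <= 0:
        raise ValueError("physical_bytes must be positive")
    return available_bytes >= physical_bytes * floor_fraction


# A hypothetical server with 512 MB of RAM has a floor of 25.6 MB.
physical = 512 * MB
comfortable = available_memory_ok(30 * MB, physical)  # above the floor
too_low = available_memory_ok(20 * MB, physical)      # below the floor
```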
When reviewing performance data, you should also look at how often objects sought in the cache are found. Frequent cache misses result in increased disk I/O and decreased performance. If cache hits are low or if cache misses are high, the cache might be too small to function effectively. Cache flushes can also affect performance. Windows 2000 flushes objects from the cache if they change or if they time out before they’re reused. A high rate of cache flushes associated with elevated cache misses and page faults might mean that the cache is being flushed too frequently. To measure cache flushes, you should compare the number of cache flushes to the number of cache misses and to the rate of page faults in the Inetinfo process.
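If your monitoring data gives you counts of cache hits and cache misses for the same period, you can express cache effectiveness as a percentage. The following Python sketch shows the calculation. The function name and the sample counts are invented for illustration, and the text does not specify what hit percentage counts as acceptable.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Return cache hits as a percentage of all cache lookups."""
    lookups = hits + misses
    if lookups == 0:
        return 0.0  # no lookups were recorded during the period
    return 100.0 * hits / lookups


# Invented sample: 900 hits and 100 misses in one logging interval.
ratio = cache_hit_ratio(900, 100)
```

With these sample counts the ratio is 90 percent. Tracking the same figure over several days shows whether the cache becomes less effective as load rises or memory becomes scarce.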
As you analyze the performance data, be sure to measure cache size in relation to available memory. You want to track how small the cache gets and how often that happens. When memory is scarce, the system trims the cache, and when there is plenty of memory, the system enlarges the cache. If the cache is too small, performance can degrade. You might need to add more memory, defragment your disk, or both.
For active servers, a processor bottleneck can become a problem. A bottleneck occurs when one or more processes take up nearly all the time of all the processors on a computer. If this occurs, process threads must wait in a queue for processor time, and other activity stops until the queue is cleared. The processors on a Windows 2000 computer running IIS must support the operating system, IIS processes, and processes unrelated to either. You can use such tools as WCAT, WAST, and the Performance console to measure processor performance. When using any of these tools, be sure to account for the system resources used by the tool itself as you analyze performance data.
As you collect data about processor performance, be sure to include information about processor activity, IIS service connections, and IIS threads. Data about processor activity should include processor queue length and processor time percentages. Data about IIS connections should include the Web service and File Transfer Protocol (FTP) service. Data about IIS threads should include thread count, processor time, and context switches.
A long, sustained queue length indicates that a processor can’t handle the load assigned to it. As a result, threads are being kept waiting. A sustained queue length of two or more threads can indicate a processor bottleneck. You can configure an alert in Performance Logs and Alerts to notify you if the processor queue length reaches an unacceptable value. You can use data about processor time percentages to determine how processor load is being distributed among processors. If all processors are being shared equally but are reaching their maximum (and causing sustained queue lengths), you might need to upgrade or add processors. If one processor is being used above all others, you might need to replace the application running in that process or move the process to another server.
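When you review logged queue-length samples, the question is whether the queue stays long, not whether it spikes once. The following Python sketch finds stretches of consecutive samples at or above a threshold. The threshold of two threads comes from the text; the function name and the choice of three consecutive samples as the meaning of "sustained" are assumptions that you would adjust to your own sampling interval.

```python
from typing import Iterable, List, Tuple


def sustained_runs(samples: Iterable[float], threshold: float = 2.0,
                   min_samples: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for every run of at
    least min_samples consecutive samples at or above threshold."""
    runs = []
    start = None   # index where the current run began, if any
    count = 0      # number of samples seen so far
    for index, value in enumerate(samples):
        count = index + 1
        if value >= threshold:
            if start is None:
                start = index
        else:
            if start is not None and index - start >= min_samples:
                runs.append((start, index))
            start = None
    # Close a run that is still open when the samples end.
    if start is not None and count - start >= min_samples:
        runs.append((start, count))
    return runs


# Invented queue-length samples from a counter log.
queue_lengths = [0, 1, 3, 4, 2, 1, 5, 0, 2, 2, 2]
runs = sustained_runs(queue_lengths)
```

For the sample data, the function reports two sustained stretches, covering samples 2 through 4 and samples 8 through 10. The single reading of 5 at sample 6 is ignored because it isn't sustained.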
Connection data allows you to identify patterns of client demand for your server. When combined with information about processor queue lengths, connection data allows you to determine whether load levels are causing a processor bottleneck. User load might be causing a bottleneck if the data reveals a long, sustained processor queue, high use rates on one or more processors, or current connections reaching a plateau at a high value, indicating that some connections are being blocked out.
When analyzing data about thread count, you should determine how many threads the Inetinfo process creates and how the number of threads varies. You should also observe the processor time for each thread and the number of context switches. A large number of threads is likely to increase the number of context switches, which might interfere with performance, especially if processor utilization is more than 70 percent.
The main functions of IIS are to establish client connections, receive and interpret requests, and deliver files. Two factors determine how effectively IIS can perform these functions: bandwidth and capacity. Effective bandwidth relies on the link’s transmission capacity, the server configuration, and the server workload. Network capacity is determined, at least in part, by the number of connections established and maintained by the server.
When collecting network I/O data, you should gather information about transmission rates and Transmission Control Protocol (TCP) connections. Transmission rate data should include bytes sent and received by the Web service, FTP service, and Simple Mail Transfer Protocol (SMTP) service. You should also collect sent and received data about TCP segments, Internet Protocol (IP) datagrams, and the network interface. TCP connection data should include information about established, failed, and reset connections.
You can use the data that you collect about transmission rates and TCP connections to determine network capacity and how often you reach that capacity. You should compare these numbers to processor and memory use to help pinpoint where any bottlenecks might be occurring. By collecting data that includes spikes in traffic over a long period of time, you can determine whether you have enough capacity to meet user demand. For example, suppose you’re having problems supporting all your users at peak time. If your network interface is close to capacity at those times, but your processor and memory use are moderate, you know that you should address network capacity issues.
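The comparison described above can be sketched in Python as follows. The sketch converts a bytes-per-second reading into a percentage of link capacity and then reports which resources are saturated at peak time. The function names, the 85 percent cutoff for "close to capacity," and the sample readings are all assumptions made for illustration; the text doesn't prescribe specific values.

```python
def network_utilization_pct(bytes_per_sec: float, link_bits_per_sec: float) -> float:
    """Express a bytes-per-second reading as a percentage of link capacity."""
    if link_bits_per_sec <= 0:
        raise ValueError("link_bits_per_sec must be positive")
    return 100.0 * bytes_per_sec * 8 / link_bits_per_sec


def likely_bottleneck(network_pct: float, processor_pct: float,
                      memory_pct: float, high: float = 85.0) -> str:
    """Name the resources at or above the assumed cutoff, or "none"."""
    readings = {
        "network": network_pct,
        "processor": processor_pct,
        "memory": memory_pct,
    }
    saturated = [name for name, pct in readings.items() if pct >= high]
    return ", ".join(saturated) if saturated else "none"


# Invented peak-time readings: 11,000,000 bytes/sec on a 100-Mbps link,
# with moderate processor and memory use.
net_pct = network_utilization_pct(11_000_000, 100_000_000)
verdict = likely_bottleneck(net_pct, processor_pct=45.0, memory_pct=60.0)
```

In this example the network interface is at 88 percent of capacity while the processor and memory are moderate, so the sketch points to the network, which matches the scenario in the text.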
The number of connections that are rejected or reset might also indicate that your network connection can’t support the current or increasing demand for your site. An increasing number of failures and resets or a consistently increasing rate of failures and resets can indicate a bandwidth shortage.
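To tell an increasing rate apart from an increasing number, you can look at how much a running total grows in each logging interval. The following Python sketch assumes that the failure or reset readings are cumulative totals sampled at equal intervals; the function names and sample values are invented for illustration.

```python
from typing import List, Sequence


def per_interval_increase(cumulative: Sequence[int]) -> List[int]:
    """Return how much a running total grew between consecutive samples."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def is_rate_rising(cumulative: Sequence[int]) -> bool:
    """Return True when every interval adds more than the interval before it."""
    deltas = per_interval_increase(cumulative)
    return len(deltas) >= 2 and all(b > a for a, b in zip(deltas, deltas[1:]))


# Invented running totals of reset connections at four sample times.
resets = [10, 12, 17, 27]
growth = per_interval_increase(resets)
rising = is_rate_rising(resets)
```

For the sample totals, the per-interval growth is 2, 5, and 10, so the rate itself is rising. A total that grows by the same amount each interval is increasing in number, but its rate is steady.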
Any layer of security that you implement in your system can affect performance. However, you can’t measure security overhead simply by monitoring a separate process or threads. Many security features in Windows 2000 are integrated into the operating system and IIS. The most common way to measure security overhead is to compare performance with and without the specific security feature. When collecting data about security overhead, you should gather information about processor activity, the processor queue, physical memory used, network traffic, and latency and delays.
Analyzing data about security overhead consists primarily of comparing data collected with and without the security feature implemented. You can then use the results of these comparisons to determine whether to implement the security feature and, if so, what type of upgrading you should do in order to support that feature. For example, you might need to upgrade or add processors, add memory, or use custom hardware.
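One way to make the with-and-without comparison concrete is to express each difference as a percentage of the reading taken without the feature. The following Python sketch does this for a pair of measurements. The function name, the metric names, and the numbers are invented for illustration and don't come from any real test.

```python
def overhead_percent(without_feature: float, with_feature: float) -> float:
    """Return the change caused by a feature as a percentage of the baseline."""
    if without_feature == 0:
        raise ValueError("the baseline reading must be nonzero")
    return 100.0 * (with_feature - without_feature) / without_feature


# Invented averages, recorded as (without the feature, with the feature).
measurements = {
    "processor_pct": (40.0, 55.0),
    "response_ms": (120.0, 150.0),
}
overhead = {
    name: overhead_percent(off, on)
    for name, (off, on) in measurements.items()
}
```

With these sample figures, the feature raises processor use by 37.5 percent and response time by 25 percent. For the comparison to be meaningful, both sets of readings should be collected under the same load.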
A poorly written Web application can result in an inefficient use of resources. For example, a script might make several references to a database instead of a single comprehensive one. If Web applications are an important part of your site, you should monitor the performance of those applications by monitoring ASP, Common Gateway Interface (CGI), and Internet Server Application Programming Interface (ISAPI) requests. You should also monitor Web service GET and POST requests.
You can also configure IIS to log events in the Windows Application event log (which you can view through Event Viewer) when ASP errors occur. Events are logged when a client request for an ASP application is unsuccessful.
If your ASP requests per second are low during peak usage, your application might be causing a bottleneck. At the same time, the number of requests queued and the request wait time should remain low, although they will go up and down under varying loads. If the limit is reached for the number of requests that can be queued, client browsers will receive a message saying that the server is busy.
If pages are being executed quickly and don’t wait for I/O, the number of requests executing is likely to be low. If pages must wait for I/O, the number of requests executing is likely to be high. If the number of requests executing is high, the number of queued requests is high, and CPU utilization is low, you may need to increase the maximum number of allowed processor threads.
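The last rule above combines three readings, so it can help to write it down as a single test. In the following Python sketch, the function name and all three cutoff values are placeholders; the text doesn't define what counts as high or low, so you would replace them with values taken from your own baseline.

```python
def should_raise_thread_limit(executing: int, queued: int, cpu_pct: float,
                              executing_high: int = 20, queued_high: int = 10,
                              cpu_low: float = 50.0) -> bool:
    """Return True when many requests are executing and queued while the
    processor is lightly used, which suggests that pages are waiting for I/O.

    The three cutoffs are placeholder assumptions, not recommended values.
    """
    return (executing >= executing_high
            and queued >= queued_high
            and cpu_pct < cpu_low)


# Invented readings taken at peak load.
waiting_on_io = should_raise_thread_limit(executing=25, queued=40, cpu_pct=30.0)
cpu_bound = should_raise_thread_limit(executing=25, queued=40, cpu_pct=90.0)
```

The first set of readings satisfies the rule. The second doesn't, because the processor is already busy, and adding threads would only add to the load.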
If CGI and ISAPI requests drop while under increasing loads, the application itself might be causing a problem. If you’re using CGI, you might want to consider converting to ASP or ISAPI.
If your data analysis reveals a problem, you might need to rewrite your application to improve performance. However, it’s also possible that you need to upgrade your system to support the demand for your applications.
For many administrators, system monitoring should include the capacity to detect Web application failures automatically and then notify the appropriate individuals or services of the failure. Administrators can use Performance Logs and Alerts to create alerts based on specific counters related to applications. For example, you can configure an alert based on the Active Server Pages\Errors From Script Compilers counter so that an administrator is notified if that counter exceeds a certain limit. You can also use other tools to monitor application failures, such as Health Monitor 2.1 in Microsoft Application Center 2000.
Your monitoring strategy should include collecting data about memory, processing, network I/O, security overhead, and Web applications. Table 11.2 provides an overview of the considerations that you should take into account when monitoring your system.
Table 11.2 Monitoring Your System
Memory: Memory is the most critical component to monitor because problems can appear in other areas of a system that are related to inadequate memory. When collecting data about memory, be sure to include data on available memory, paging, file system cache, paging file size, and memory pool size.
Processing: If a processor bottleneck occurs, process threads must wait in queue for processor time. When you collect processor data, include processor queue length and processor time percentages. Also monitor IIS connections and threads.
Network I/O: Two factors, bandwidth and capacity, determine how effectively IIS can establish client connections, receive and interpret requests, and deliver files. When collecting network I/O data, include information about transmission rates and TCP connections. For transmission rates, include data about bytes sent and received by the Web service, FTP service, and SMTP service and data about TCP segments, IP datagrams, and the network interface. For TCP connections, include data about established, failed, and reset connections.
Security overhead: The more layers of security you implement in your system, the more performance can be affected. To determine security overhead, collect data with and without the specific security features. Include data about processor activity, the processor queue, physical memory used, network traffic, and latency and delays.
Web applications: A poorly written Web application can result in an inefficient use of resources. It’s also possible that you don’t have enough resources to handle the application. When collecting application data, monitor ASP, CGI, and ISAPI requests. Also monitor Web service GET and POST requests.
You should adhere to the following guidelines when developing your monitoring strategy:
Lucerne Publishing maintains a small Web site that allows users to view products online. The company is experiencing performance degradation at peak usage. Network administrators at the company first monitor memory and find that all aspects of memory appear to be operating within acceptable ranges. Next they decide to collect data about processor activity, as shown in Figure 11.5.
To monitor processor activity, administrators use the following counters:
Figure 11.5 - Monitoring processor activity
In reviewing the data, the administrators discover that the processor reaches peak capacity at high usage times. To address this problem, they plan to add a second processor to the system.
Windows 2000 and IIS contain performance objects that you can use to gather data about your system’s performance. Each object contains a set of counters that provide performance information about specific aspects of a system or resource. Microsoft provides a number of tools that allow you to monitor performance, including the Performance tool, Task Manager, WMI, and Event Viewer. When monitoring your system, you should focus on memory, processing, network I/O, security overhead, and Web applications. Memory should be the first component in your system that you monitor. For memory, monitor available memory, paging, file system cache, paging file size, and memory pool size. For processing, monitor processor activity, IIS service connections, and IIS threads. For network I/O, monitor transmission rates and TCP connections. For security, monitor processor activity, the processor queue, physical memory used, network traffic, and latency and delays. For applications, monitor ASP, CGI, and ISAPI requests, as well as Web service GET and POST requests.