Performance Monitor


Performance Monitor is an important tool because it shows you not only how SQL Server is performing but also how Windows itself is performing. Performance Monitor provides a huge set of counters. Probably no one understands all of them, so don't be daunted.

This section is not about how to use Performance Monitor. Instead, the focus here is on how you can use the capabilities of this tool to diagnose performance problems in your system. For general information about using Performance Monitor, see the operations guide in the Windows 2000 Server or Windows Server 2003 resource kit.

You should concentrate your initial efforts on three main areas when investigating performance-related issues:

  • Processor usage

  • Disk activity

  • Memory usage

Monitor these key counters over a "typical" 24-hour period. Pick a typical business day, not a weekend or holiday, so you get an accurate picture of what's happening.

Processor Usage

Processor bottlenecks occur when the processor is so busy that it cannot respond to requests in a timely manner. Processor bottlenecks that happen suddenly and unexpectedly, without additional load on the server, are commonly caused by a non-optimal query plan, a poor configuration, or a design factor, rather than by insufficient hardware. Nonetheless, always look at the big picture first. Before jumping to conclusions, monitor other counters such as disk, memory, process, and network. That will help you pinpoint the source of the processor activity.

The following sections outline the key Performance Monitor counters for monitoring CPU utilization. Remember that it is normal for the trace to show a sharp spike when a program starts executing; you can safely ignore these spikes.

Object: Processor - Counter: % Processor Time

This counter determines the percentage of time the processor is busy by measuring the percentage of time the idle process's thread is running and then subtracting that value from 100 percent. Measuring the processor activity of your SQL Server machine is key to identifying potential CPU bottlenecks. For multiprocessor systems, a separate instance of this counter should be monitored for each processor. In such systems, we recommend that you use the System: % Total Processor Time counter to determine the average for all processors.

Object: System - Counter: % Total Processor Time

This counter measures the average of all the CPUs in your server. Obviously this is a key counter to watch for average CPU utilization. A rule of thumb is that if the % Total Processor Time counter exceeds 90% for a continuous period (over 20 minutes or so) on a busy database server, you may have a CPU bottleneck on your server.

The value that characterizes high processor usage depends greatly on your system and workload, so you can define your target maximum usage at a higher or lower value than we have specified here. You have to monitor the response time of transactions to ensure that they are within reason; if they are not, CPU usage greater than 90% may simply mean that the workload is too much for the available CPU resources, and you either have to add CPU resources or reduce or tune the workload.

While the % Total Processor Time counter is important, you probably don't want to jump to the conclusion that you have a CPU bottleneck based on just this one counter. Another indicator of CPU performance is the System Object: Processor Queue Length.

Object: System - Counter: Processor Queue Length

The processor queue is the collection of one or more threads that are ready to run but cannot because another thread is currently running on the processor. This counter reports the number of threads waiting for processor cycles. If the Processor Queue Length counter exceeds two per CPU for a continuous period (over 20 minutes or so), then you may have a CPU bottleneck. For very busy multiprocessor systems, the queue length may range from one to three per CPU. For example, if you have eight CPUs in your server, the expected range of Processor Queue Length on a busy database server with high CPU activity is 8 to 24.
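
As a rough cross-check from inside SQL Server, you can look at how many tasks are waiting for CPU on each scheduler through the sys.dm_os_schedulers DMV (DMVs are covered later in this chapter). This is only a sketch, not a substitute for the Performance Monitor counter:

  -- Tasks that are runnable but waiting for CPU on each SQL Server scheduler.
  -- Sustained nonzero runnable_tasks_count values across schedulers tell the
  -- same story as a long processor queue: work is waiting for processor time.
  SELECT scheduler_id,
         current_tasks_count,
         runnable_tasks_count
  FROM   sys.dm_os_schedulers
  WHERE  scheduler_id < 255;   -- user schedulers only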

Object: Processor - Counter: % Privileged Time

This is another important counter to monitor when you see high CPU utilization. The Intel x86 family of processors supports four operating modes, called rings, numbered 0 through 3 and each isolated from the others by the hardware; Windows uses only two of these modes: ring 0 for kernel mode and ring 3 for user mode. Operating system code and device drivers run in kernel mode. This counter represents the percentage of time the CPU spends executing kernel code - for example, processing SQL Server I/O requests. If this counter exceeds 80% to 90% for a continuous period (over 20 minutes or so), and at the same time the Physical Disk counters (discussed in the "Disk Activity" section later in the chapter) are high, you may have a disk bottleneck rather than a CPU bottleneck.

Object: Processor - Counter: % User Time

This counter measures the percentage of time the processor spends running user-mode code, such as SQL Server or your anti-virus software. The sum of % User Time and % Privileged Time is normally equal to % Processor Time. This is also a very important counter to look at when you see high CPU usage, especially if you are running multiple processes on a server. You may want to investigate further by looking at the individual instances of the Process: % User Time counter.

Object: Processor - Counter: Interrupts/Sec

This counter measures the number of interrupts per second that the processor was asked to respond to. Interrupts are generated by hardware components such as hard-disk controller adapters and network interface cards. A sustained value over 1,000 is usually an indication of a problem. Causes can include poorly configured drivers, errors in drivers, excessive use of a device, or hardware failure. If Interrupts/Sec stays well above your baseline for a sustained period, you probably have a hardware issue.

Isolating Processor Activity Created by SQL Server

After you have identified a processor bottleneck, you next need to determine whether SQL Server is using the processor or whether other processes running on the server are consuming it. To do this, log the processor time used by SQL Server on your system, as follows:

  • Select the Process object.

  • Select the % Processor Time counter.

  • Select the SqlServr instance. (If you have multiple instances of SQL Server running on a single box, Performance Monitor will list them as sqlservr#1, sqlservr#2, etc. Make sure that you pick the correct instance. You can use the Process: ID Process counter to do this. That counter lists the ProcessID and the instance name in Performance Monitor.)

Object: Process - Counter: % Processor Time

This counter gives you the percentage of time a CPU spends serving the non-idle threads of a given process on your server. It does the same thing as Processor: % Processor Time, but it gives you the percentage of time for the specific process you have chosen - in this case SQL Server. If it turns out that SQL Server is not the process saturating the CPU, you might want to add instances for other processes to find out which one is causing the problem. You can also quickly look at Task Manager and sort the activity by CPU, which will indicate which process is consuming the most CPU.
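
Once you have confirmed that sqlservr.exe is the busy process, a quick look inside SQL Server can show which sessions are actually burning the cycles. The following is a minimal sketch using the sys.dm_exec_requests and sys.dm_exec_sql_text DMVs (covered later in this chapter); cpu_time is reported in milliseconds:

  -- Active requests ordered by the CPU time they have consumed so far.
  SELECT TOP 10
         r.session_id,
         r.status,
         r.cpu_time,
         r.logical_reads,
         t.text AS batch_text
  FROM   sys.dm_exec_requests AS r
  CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
  WHERE  r.session_id > 50   -- skip system sessions
  ORDER BY r.cpu_time DESC;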

Object: Process - Counter: % User Time

This counter shows how much CPU time each individual process is consuming in user mode. It may be that you have just installed new anti-virus software and it is eating all your CPU cycles; you can identify the culprit by analyzing Process: % User Time for the anti-virus process.

Resolving Processor Bottlenecks

With the help of all these counters, you can pretty much come to a conclusion about whether you have a CPU bottleneck in your system. If you do, the performance trend analyses of these counters will be very useful. Even if these counters are well within the limits now, if you notice an increase each month (provided you are doing trend analysis every month), that is a good indication that you will run out of CPU cycles soon.

Analyze Your Application

You can use SQL Profiler and DMVs to gather the data for application analysis. We discuss both of these later in this chapter. Many CPU-intensive operations occur in SQL Server, such as excessive compilations and recompilations. It is also possible that just one or two problematic queries in your application are causing high CPU usage. The queries might get a high cache hit ratio (described in the "Memory Usage" section), but still require a large amount of logical I/O. Try to tune those queries or add indexes if necessary. The methods you can use to make such adjustments are varied and vast - from redesigning your application to reworking some queries, adding indexes, and more. If you have the money, you can buy bigger hardware, but you certainly don't want to kill a mouse with a bazooka. In addition, as a best practice, do not run OLTP and OLAP applications on the same server. To get the best performance, different types of applications should have their own dedicated servers, each one using the configuration required for the type of application you are running.
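
As a starting point for that analysis, the plan cache statistics can show which cached queries have consumed the most CPU since the last restart. The following is a hedged sketch using sys.dm_exec_query_stats (total_worker_time is reported in microseconds); treat it as a quick triage query, not a full workload analysis:

  -- Cached query plans ordered by total CPU consumed.
  SELECT TOP 10
         qs.total_worker_time   AS total_cpu_microseconds,
         qs.execution_count,
         qs.total_logical_reads,
         st.text                AS batch_text
  FROM   sys.dm_exec_query_stats AS qs
  CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
  ORDER BY qs.total_worker_time DESC;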

Server Configuration

Make sure that your server is properly configured. For more details on this topic please refer to Chapters 11 and 12.

After looking at the preceding options, if you still have high CPU usage you might want to consider purchasing a faster processor or adding a processor to your existing system. Make sure you use a faster processor with the largest processor cache that is practical.

Disk Activity

SQL Server relies on the Windows operating system to perform I/O operations. The disk system handles the storage and movement of data on your system, giving it a powerful influence on your system's overall responsiveness. Disk I/O is frequently the cause of bottlenecks in a system. You need to observe many factors in determining the performance of the disk system. These include the level of usage, the rate of throughput, the amount of disk space available, and whether a queue is developing for the disk systems. Unless your database fits into physical memory, SQL Server constantly brings database pages into and out of the buffer pool. This generates substantial I/O traffic. Similarly, log records need to be flushed to the disk before a transaction can be declared committed.

Many of these factors are interrelated. For example, if disk utilization is high, disk throughput might peak, and a queue might begin to form. These conditions might result in increased response time and cause performance to slow. In addition, extremely low disk space can affect response time. We have seen cases where overall system response was very slow and it turned out that the C: drive had only 2MB of free space left! Make sure you monitor free disk space and take action when it falls below a given threshold.

The operating system uses a driver called diskperf.sys to collect disk performance data. Keep in mind that by default the operating system activates only the Physical Disk counters. You must activate the Logical Disk counters manually using the diskperf command. To enable the Logical Disk counters, run the following commands at the command prompt, and be sure to reboot the server afterward.

 Diskperf -yv
 Diskperf -ye     (when monitoring RAID disks)

We strongly recommend that you enable all disk performance data collection on any system for which you care about performance. The diskperf measurement layer does add some code to the I/O manager stack, so there is added latency associated with each I/O request that accesses a physical disk when measurement is turned on. However, the overhead is trivial. In a benchmark environment where a 700MHz, eight-way Windows 2003 server was handling 50K I/O per second, enabling the diskperf measurements reduced its I/O capacity by about 4 percent, to 48K I/O per second. If you haven't turned on the disk performance counters and a performance problem arises that happens to be disk-related (and trust us, many are), you won't be able to gather the data right away, because loading diskperf requires a server reboot.

When it comes to I/O-related information under Windows, Performance Monitor is the undisputed king. We have listed the key I/O-related Performance Monitor counters in this section. The following sections describe the Physical Disk counters because they are the ones that truly provide value.

Object: Physical Disk - Counter: Avg. Disk Sec/Transfer

This counter reports how fast data is being moved (in seconds). It measures the average time of each transfer, regardless of number of bytes read or written. The Avg. Disk Sec/Transfer measurement reported is based on the complete round-trip time of a request. Strictly speaking, it is a direct measurement of the disk response time, which means it includes the queue time. Queue time is the time spent waiting for the device because it is busy with another request, or waiting for the SCSI bus to the device because it is busy. A high value (for most disks, greater than 0.3 seconds) for this counter might mean that the system is retrying the requests due to lengthy queuing, or, less commonly, disk failures. To further analyze the data use Physical Disk: Avg. Disk Sec/Read and Physical Disk: Avg. Disk Sec/Write.

Object: Physical Disk - Counter: Avg. Disk Sec/Read

Because this counter reports how fast data is being read from your disk (in seconds), it gives you a good idea of disk read operation response time. If you are running a RAID configuration, this counter may signal that you should change your RAID configuration to a different one to get a better response time.

Object: Physical Disk - Counter: Avg. Disk Sec/Write

This counter is nearly the same as the previous one, but it reports how fast data is written to your disk, instead of read from it.
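
SQL Server keeps its own view of file-level read and write latency, which you can compare against these Physical Disk counters. The sketch below uses the sys.dm_io_virtual_file_stats DMV; the values are cumulative since the instance started, so for trending you would capture the output twice and work with the differences:

  -- Average read and write latency (in milliseconds) per database file,
  -- as observed by SQL Server since startup.
  SELECT DB_NAME(vfs.database_id) AS database_name,
         vfs.file_id,
         vfs.io_stall_read_ms  * 1.0 / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
         vfs.io_stall_write_ms * 1.0 / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
  FROM   sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
  ORDER BY database_name, vfs.file_id;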

Object: Physical Disk - Counter: % Disk Time

This counter reports the percentage of time that the selected disk drive is busy servicing read or write requests. The % Disk Time counter is not measured directly. It is the value derived by the diskperf filter driver that provides the disk performance statistics. Diskperf is the layer of software sitting in the disk driver stack, as discussed earlier (see Figure 13-1).

Figure 13-1: The diskperf.sys layer in the disk driver stack

As I/O request packets (IRPs) pass through the Diskperf.sys layer, it keeps track of the time the I/O starts and finishes. On the way to the device, diskperf records a timestamp for the IRP. On the way back from the device, the completion time is recorded. The difference is the duration of the I/O request. Averaged over a collection interval, this becomes the Avg. Disk Sec/Transfer, a direct measure of the disk response time from the point of view of the device driver. Diskperf also maintains byte counts and separate counters for reads and writes at both Logical and Physical Disk level. This enables Avg. Disk Sec/Transfer to be broken out into reads and writes.

% Disk Time is derived by diskperf from the sum of all the IRP round-trip times over the collection interval divided by the interval duration, which works out to Avg. Disk Sec/Transfer multiplied by Disk Transfers/Sec (this counter is described later in this section) - essentially the following:

% Disk Time = Avg. Disk Sec/Transfer × Disk Transfers/Sec

Because the Avg. Disk Sec/Transfer that diskperf measures includes disk queuing, % Disk Time can grow greater than 100 percent if there is significant queuing. The formula used to derive % Disk Time corresponds to Little's Law, a well-known equivalence relation that expresses the number of requests in the system as a function of the arrival rate and the service time. According to the law, Avg. Disk Sec/Transfer times Disk Transfers/Sec properly yields the average number of requests in the system, formally known as the average queue length. The average queue length calculated in this fashion includes both the IRPs queued for service and those actually in service.
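
To make the arithmetic concrete: if a disk averages 0.008 seconds per transfer while sustaining 200 transfers per second, the product is 0.008 × 200 = 1.6. Performance Monitor reports that as an Avg. Disk Queue Length of 1.6 and a % Disk Time of 160 percent, which is why the counter can exceed 100 percent when requests queue.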

As a general guideline, if this counter is high (more than 90%), then you should start looking at potential I/O bottlenecks. Use the Physical Disk: Current Disk Queue Length counter together with % Disk Time for potential I/O bottlenecks. Even after looking at these two counters, though, you should not jump to the conclusion that you have an I/O bottleneck. We will discuss later which other counters you should look at before concluding that you have a disk I/O bottleneck. If you are using a RAID device, the % Disk Time counter can indicate a value greater than 100 percent. If it does, use the Avg. Disk Queue Length counter to determine how many system requests on average are waiting for disk access.

Object: Physical Disk - Counter: Avg. Disk Queue Length

This counter tracks the number of requests that are queued and waiting for a disk during the sample interval, as well as the requests in service. The Avg. Disk Queue Length counter is derived from the product of Avg. Disk Sec/Transfer and Disk Transfers/Sec - the average response time of the device times the I/O rate, as described previously. The number of waiting I/O requests should be sustained at no more than 1.5 to 2 times the number of spindles that make up the physical disk. Most disks have one spindle, although RAID devices usually have more than one. A hardware RAID device appears as one physical disk in Performance Monitor; RAID devices created through software appear as multiple instances.

Object: Physical Disk - Counter: Current Disk Queue Length

This counter tracks the number of requests that are waiting as well as the requests currently being serviced. It gives an instantaneous value, or snapshot, of the current queue length, unlike Avg. Disk Queue Length, which reports averages. The Avg. Disk Queue Length counter is an estimate of the number of outstanding requests on the disk (Logical and Physical), including any requests currently in service at the device plus any requests waiting for service. If requests are waiting for the device inside the SCSI device driver layer of the software below the diskperf filter driver (refer to Figure 13-1), the Current Disk Queue Length counter will have a value greater than 0. If the requests are queued in the hardware, which is usual for SCSI disks and RAID controllers, the Current Disk Queue Length counter will show a value of 0 even though requests are queued.
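
You can also see queuing from SQL Server's side of the fence. The sys.dm_io_pending_io_requests DMV lists I/O requests that SQL Server has issued but that have not yet completed; on a healthy system this sketch usually returns few or no rows, and rows that linger with large io_pending_ms_ticks values point at a slow disk path:

  -- I/O requests issued by SQL Server that are still outstanding.
  SELECT io_type,
         io_pending,
         io_pending_ms_ticks
  FROM   sys.dm_io_pending_io_requests
  ORDER BY io_pending_ms_ticks DESC;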

Object: Physical Disk - Counter: % Disk Read Time

This counter reports the percentage of time that the selected disk drive is busy servicing read requests. When the % Disk Time counter is high, you should add this counter to your watch list because it will give you more information about whether read activity is causing a bottleneck or not. In addition, this counter, along with % Disk Write Time, provides a high-level view of the type of disk activity for which you must plan.

Object: Physical Disk - Counter: % Disk Write Time

This counter reports the percentage of time that the selected disk drive is busy servicing write requests. When the % Disk Time counter is high, you should add this counter to your watch list because it will give you more information about whether write activity is causing a bottleneck or not.

Object: Physical Disk - Counter: % Idle Time

This counter reports the percentage of time the disk system was not processing requests and no work was queued. This counter, when added to % Disk Time, might not equal 100 percent, because % Disk Time can exaggerate disk usage. Having a measure of disk idle time enables you to calculate disk busy time as 100 minus % Idle Time, which is a valid measure of disk usage.

Object: Physical Disk - Counter: Avg. Disk Bytes/Transfer

This counter measures the size of input/output operations. The disk is efficient if it transfers large amounts of data relatively quickly. The counter shows how many bytes are moved, on average, during an operation: it displays the ratio of bytes transferred to the number of transfers completed, calculated by comparing the number of bytes sent during the last interval to the number of transfers completed during the last interval. If you want to drill down further, use the Physical Disk: Avg. Disk Bytes/Read and Physical Disk: Avg. Disk Bytes/Write counters.

Object: Physical Disk - Counter: Disk Bytes/Sec

This counter measures the rate at which bytes are transferred; it is the primary measure of disk throughput. To analyze the data based on reads and writes, use the Physical Disk: Disk Read Bytes/Sec and Physical Disk: Disk Write Bytes/Sec counters.

Isolating Disk Activity Created by SQL Server

We have discussed all the counters you should monitor to find disk bottlenecks. However, you may have multiple applications running on your server, and it is very possible that they cause a lot of disk I/O; you may also have a memory bottleneck in your system. You should therefore isolate the disk activity created by SQL Server. Monitor the following counters to verify whether the disk activity is caused by SQL Server (a query sketch follows the list):

  • SQL Server: Buffer Manager: Page reads/sec

  • SQL Server: Buffer Manager: Page writes/sec
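
Both counters are also exposed through the sys.dm_os_performance_counters DMV, so you can read them from a query window. The following sketch assumes the usual Buffer Manager object name; note that these per-second counters are stored as cumulative values, so you need to sample twice and compare the deltas to get a rate:

  -- SQL Server's own page read/write counters (cumulative since startup).
  SELECT counter_name, cntr_value
  FROM   sys.dm_os_performance_counters
  WHERE  object_name LIKE '%Buffer Manager%'
    AND  counter_name IN ('Page reads/sec', 'Page writes/sec');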

Sometimes your application is simply too big for the hardware you have, and a problem that appears to be related to disk I/O may be resolved by adding more RAM. Make sure you do a proper analysis before making a decision. That's where trend analysis is very helpful, because you can see how the performance problem evolved.

Is Disk Performance the Bottleneck?

With the help of the disk counters, you can come to a conclusion regarding whether you have disk bottlenecks in your system or not. Several conditions must exist in order for you to determine that a disk bottleneck exists in your system, including a sustained rate of disk activity well above your baseline, persistent disk queue length longer than two per disk, and the absence of a significant amount of paging. Without this combination of factors, it is unlikely that you have a disk bottleneck in your system.

Note that sometimes your disk hardware may be faulty, which can cause a lot of interrupts to the CPU. That is a processor bottleneck caused by the disk subsystem, and it can have a systemwide performance impact (see the description of the Interrupts/Sec counter in the "Processor Usage" section). Make sure you consider this when you analyze the performance data.

Data Analysis

When interpreting log data, remember the limitations of the performance counters that report sums of disk time: the counters sum the totals rather than averaging them over the number of disks. For example, if you want to calculate the I/Os per disk in different RAID configurations, you have to use the following formulas (a worked example follows the list):

  • RAID 0 - I/Os per disk = (Reads + Writes) / Number of disks

  • RAID 1 - I/Os per disk = [Reads + (2 × Writes)] / 2

  • RAID 5 - I/Os per disk = [Reads + (4 × Writes)] / Number of disks

  • RAID 10 - I/Os per disk = [Reads + (2 × Writes)] / Number of disks
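
For example, suppose Performance Monitor shows 1,200 reads/sec and 400 writes/sec against an eight-disk RAID 5 array. Each spindle is actually servicing [1,200 + (4 × 400)] / 8 = 350 I/Os per second, not the 200 you might assume by dividing the raw total of 1,600 by eight.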

As explained earlier, % Disk Time can sometimes exceed 100 percent, which a percentage should not do. If that is the case, use the Avg. Disk Queue Length, Avg. Disk Write Queue Length, and Avg. Disk Read Queue Length counters to display disk activity and usage as decimals rather than as percentages. Don't forget to recalculate the values over the whole disk configuration.

Resolving Disk Bottlenecks

If, after monitoring your system, you come to the conclusion that you have a disk bottleneck, you need to be able to resolve the problem. The following sections outline some steps you can take to return your system to optimal performance.

Analyze Your Application

You can use SQL Profiler and DMVs to gather the data for analysis, as discussed later in this chapter. In addition, look at the execution plans and see which plans lead to the most I/O being consumed. It is possible that a better plan (for example, a plan using the right indexes) can minimize the I/O. Query plan analysis is described in detail in Chapter 14. If there are missing indexes, you may want to run the Database Engine Tuning Advisor to find them (see Chapter 15 for details).
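
The Database Engine Tuning Advisor is the most thorough option, but as a quick first pass you can also look at the missing-index DMVs that the optimizer populates as it compiles queries. This is only a sketch, and the suggestions are hints to investigate rather than indexes to create blindly:

  -- Index suggestions recorded by the optimizer since the last restart.
  SELECT TOP 10
         mid.statement           AS table_name,
         mid.equality_columns,
         mid.inequality_columns,
         mid.included_columns,
         migs.user_seeks,
         migs.avg_user_impact
  FROM   sys.dm_db_missing_index_group_stats AS migs
  JOIN   sys.dm_db_missing_index_groups      AS mig
         ON migs.group_handle = mig.index_group_handle
  JOIN   sys.dm_db_missing_index_details     AS mid
         ON mig.index_handle = mid.index_handle
  ORDER BY migs.avg_user_impact * migs.user_seeks DESC;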

Disk Configuration and File Layout

If you are using RAID, make sure that it is properly configured. For more details on this topic, refer to Chapter 11. In addition, after looking at the Avg. Disk Queue Length and Current Disk Queue Length counters for each physical disk, make sure that no single disk is queuing far more requests than the others.

Increase I/O Bandwidth

If after looking at these options you still have disk bottlenecks, you should consider adding more physical drives to the current disk array controller, or replacing your current disks with faster drives. That will help boost both read and write access time. In addition, you can add faster or additional I/O controllers. Try to get I/O controllers with more cache on them. Finally, adding more memory can help reduce disk I/O.

Disk Performance Counters

The following table lists the counters you should monitor for disk performance data.


Counter Type                              Counters
----------------------------------------  ------------------------------------------------
Queue-Length Counters                     Avg. Disk Queue Length
(Physical Disk Object)                    Avg. Disk Read Queue Length
                                          Avg. Disk Write Queue Length
                                          Current Disk Queue Length

Throughput Counters                       Disk Bytes/Sec
(Physical Disk Object)                    Disk Read Bytes/Sec
                                          Disk Write Bytes/Sec

Usage Counters                            % Disk Time
(Physical Disk Object)                    % Disk Read Time
                                          % Disk Write Time
                                          % Idle Time
                                          Disk Reads/Sec
                                          Disk Writes/Sec
                                          Disk Transfers/Sec

Paging Counters                           SQL Server: Buffer Manager: Cache Hit Ratio
(Memory Object)                           SQL Server: Buffer Manager: Page Life Expectancy
                                          SQL Server: Buffer Manager: Checkpoint Pages/Sec
                                          SQL Server: Buffer Manager: Lazy Writes/Sec
                                          Memory: Pages/Sec
                                          Memory: Page Reads/Sec
                                          Memory: Page Writes/Sec

Memory Usage

Low memory conditions can slow the operation of the applications and services on your system. Monitor an instance of SQL Server periodically to confirm that memory usage is within typical ranges. When your server is low on memory, paging - the process of moving virtual memory back and forth between physical memory and the disk - increases, resulting in more work for your disks. The paging activity may have to compete with the I/O generated by other transactions, intensifying any disk bottleneck. Monitor the following counters to identify memory bottlenecks.

Object: Memory - Counter: Available Mbytes

This counter reports how many megabytes of memory are currently available after the working sets of running processes and the cache have been served. It is calculated by adding the amount of space on the Zeroed, Free, and Standby memory lists. Free memory is ready for use; Zeroed memory consists of pages filled with zeros to prevent later processes from seeing data used by a previous process; and Standby memory is memory removed from a process's working set (its physical memory) en route to disk, but still available to be recalled. If this amount is low, external memory pressure may be present. The exact value depends on many factors; however, you can start investigating when available memory drops below 50–100MB. External memory pressure is clearly present when the value drops below 10MB.

Please keep in mind that when RAM is in short supply (when committed bytes is greater than installed RAM), the operating system will attempt to keep a certain fraction of installed RAM available for immediate use by copying virtual memory pages that are not in active use to the page file. For this reason, this counter will not go to zero, and is not necessarily a good indication of whether your system is short of RAM.

Object: Memory - Counter: Pages/Sec

This counter reports the number of requested pages that were not immediately available in RAM and had to be read from the disk or had to be written to the disk to make room in RAM for other pages.

This counter is the sum of Memory: Pages Input/Sec and Memory: Pages Output/Sec. Pages Input/Sec is the rate at which pages are read from disk to resolve hard page faults. Hard page faults occur when a process refers to a page in virtual memory that is not in its working set or elsewhere in physical memory, and must be retrieved from disk. When a page is faulted, the system tries to read multiple contiguous pages into memory to maximize the benefit of the read operation. Compare the value of Memory: Pages Input/Sec to the value of Memory: Page Reads/Sec to determine the average number of pages read into memory during each read operation. Pages Output/Sec is the rate at which pages are written to disk to free up space in physical memory. Pages are written back to disk only when they are changed in physical memory, so they are likely to hold data, not code. A high rate of pages output might indicate a memory shortage. Windows writes more pages back to disk to free up space when physical memory is in short supply.

If your system experiences a high rate of hard page faults, the value for Memory: Pages/Sec can be high. A high value for this counter is not necessarily indicative of memory pressure. If an application running on your server alongside SQL Server is doing sequential reading of memory-mapped files, whether cached or not, this counter will report that activity too. Therefore, it is necessary to monitor other counters, such as Memory: Pages Output/Sec and Memory: Pages Input/Sec, in order to understand the big picture.

Object: Memory - Counter: Committed Bytes

Committed Bytes is the amount of committed virtual memory, in bytes. Committed memory is the physical memory that has space reserved on the disk paging file(s). There can be one or more paging files on each physical drive. This counter displays the last observed value only; it is not an average. As Committed Bytes grows above the amount of available RAM, paging will increase and the size of the page file in use will also increase. At some point, paging activity will start to significantly impact perceived performance.

Object: Memory - Counter: Commit Limit

Commit Limit is the amount of virtual memory that can be committed without having to extend the paging file(s). It is measured in bytes. Committed memory is the physical memory that has space reserved on the disk paging files. There can be one paging file on each logical drive. If the paging file(s) are expanded, this limit increases accordingly. This counter displays the last observed value only; it is not an average.

Isolating Memory Used by SQL Server

After you have identified a memory bottleneck, you next need to isolate the memory used by SQL Server. By default, SQL Server changes its memory requirements dynamically, on the basis of available system resources. If SQL Server needs more memory, it queries the operating system to determine whether free physical memory is available and uses the available memory. If SQL Server does not need the memory currently allocated to it, it releases the memory to the operating system if the OS asks for it. To monitor the amount of memory SQL Server uses, examine the following Performance Monitor counters.

Object: Process - Counter: Working Set

The Working Set counter reports the amount of physical memory currently allocated to the process; this can include both shared and private pages resident in physical memory. (The Private Bytes counter, by contrast, reports memory allocated exclusively to the process.) Working set monitoring is important because when memory is in short supply, the operating system trims the working sets of processes and paging occurs. As long as no trimming has occurred, Working Set is the best counter for seeing how much memory has been allocated within the SQL Server process space.

Object: Process - Counter: Private Bytes

This counter shows the current number of bytes allocated to a process that cannot be shared with other processes. It is probably the best counter in Performance Monitor for viewing the approximate amount of memory consumed by any threads within the sqlservr.exe process space.

Object: Process - Counter: Virtual Bytes

You can use this counter to see the total virtual memory being used by SQL Server. Virtual Bytes is the current size, in bytes, of the virtual address space the process is using. Use of virtual address space does not necessarily imply corresponding use of either disk or main memory pages.

Object: SQL Server: Buffer Manager - Counter: Buffer Cache Hit Ratio

This counter reports the percentage of pages found in the buffer cache without having to read from disk. The ratio is the total number of cache hits divided by the total number of cache lookups since an instance of SQL Server was started. After a long period of time, the ratio moves very little. Because reading from the cache is much less expensive than reading from disk, you want this ratio to be high (90 percent or higher is desirable).
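
If you prefer to check this ratio from a query window, it is available through the sys.dm_os_performance_counters DMV, where it is stored as two counters: the ratio value and its base. A minimal sketch:

  -- Buffer cache hit ratio computed from the ratio and base counters.
  SELECT 100.0 * a.cntr_value / NULLIF(b.cntr_value, 0) AS buffer_cache_hit_ratio
  FROM   sys.dm_os_performance_counters AS a
  JOIN   sys.dm_os_performance_counters AS b
         ON a.object_name = b.object_name
  WHERE  a.object_name LIKE '%Buffer Manager%'
    AND  a.counter_name = 'Buffer cache hit ratio'
    AND  b.counter_name = 'Buffer cache hit ratio base';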

Object: SQL Server: Buffer Manager - Counter: Total Pages

This counter indicates the total number of pages in the buffer pool, which includes database pages, free pages, and stolen pages.

Object: SQL Server: Memory Manager - Counter: Total Server Memory (KB) and Target Server Memory (KB)

The first counter, SQL Server: Memory Manager: Total Server Memory (KB), tells you how much memory the Sqlservr service is currently using. This includes the total of the buffers committed to the SQL Server buffer pool and the OS buffers. The second counter, SQL Server: Memory Manager: Target Server Memory (KB), tells you how much memory SQL Server would like to have in order to operate efficiently. This is based on the number of buffers reserved by SQL Server.

If, over time, the SQL Server: Memory Manager: Total Server Memory (KB) counter is less than the SQL Server: Memory Manager: Target Server Memory (KB) counter, then SQL Server has enough memory to run efficiently. Conversely, if the SQL Server: Memory Manager: Total Server Memory (KB) counter is more than or equal to the SQL Server: Memory Manager: Target Server Memory (KB) counter, then SQL Server may be under memory pressure and could use access to more physical memory.
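
A quick way to compare the two values side by side is to read them from sys.dm_os_performance_counters; this is only a sketch and assumes the standard Memory Manager object name:

  -- How much memory SQL Server currently has versus how much it wants.
  SELECT counter_name, cntr_value AS value_kb
  FROM   sys.dm_os_performance_counters
  WHERE  object_name LIKE '%Memory Manager%'
    AND  counter_name IN ('Total Server Memory (KB)', 'Target Server Memory (KB)');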

Resolving Memory Bottlenecks

The easy solution to memory bottlenecks is to add more memory, but as we said earlier, tuning your application always comes first. Try to find queries that are memory intensive, such as queries with large worktables - such as hashes for joins and sorts - and see if you can tune them. You will learn more about tuning T-SQL queries in Chapter 14.

In addition, refer to Chapter 11 to ensure that you have configured your server properly. If, after adding more memory, you are still running into memory bottlenecks and you are on a 32-bit machine, then look into a 64-bit system. For an analysis (reporting) type of application, where you have a lot of aggregations, sorts, and hash joins, consider a 64-bit platform. Of course, we are talking about big applications with tens of thousands of users running complex reports and pulling a lot of data, not something like the AdventureWorks sample database.


