You need to be able to tell how NT/Windows 2000 is reacting to the presence of SQL Server running within it: how SQL Server is using memory, the processors, and other important system resources. A large number of objects and counters relate to Windows and the services it is running. The next few sections will look at which objects and counters provide useful information in investigating certain areas of the system and focus on the ones that you need for monitoring SQL Server.
Monitoring the Network Interface
One area of possible congestion is the network card or network interface; it does not matter how fast the server's work is if it has to queue up to go out through a small pipe.
Remember: Any activity on the server machine might be consuming some of the bandwidth of the network interface/card. You can see the total activity, in detail, via Performance Monitor. Table 37.1 shows the typical network performance object and counters you will want to use to measure total network interface activity.
Table 37.1. Network Interface Performance Objects and Counters
Figure 37.6 illustrates a pretty low usage picture for a particular network interface.
Figure 37.6. Network interface's performance object and counters.
In general, if the SQL Server counter is grossly lower than the server's counter, then other activity on the server is occurring that is potentially bogging this server down or not allowing SQL Server to be used optimally. The rule of thumb here is to isolate all other functionality to other servers if you can and let SQL Server be the main application on a machine.
Pay strict attention to how many requests are queuing up, waiting to make use of the network interface. You can see this by using the DBCC PERFMON command and looking at the Command Queue Length value. As mentioned earlier, this number should be 0. If it is 2 or more, then the network interface has a bottleneck. Check the bus width of the card. Obviously, a 32-bit PCI card is faster than an 8-bit ISA one. Also, check that you have the latest drivers from the hardware vendor.
When using DBCC PERFMON, the detail information of actual bytes read and written allows you to understand the size of this network activity. A quick calculation of bytes read divided by the number of reads gives you the average size of reads from the network. If this is large, then you might want to question what the application is doing and whether the network as a whole can handle bandwidth requests this large.
DBCC PERFMON
Go

Statistic                   Value
--------------------------- ------------------------
Network Reads               39.0
Network Writes              47.0
Network Bytes Read          4008.0
Network Bytes Written       70975.0
Command Queue Length        0.0
Max Command Queue Length    0.0
Worker Threads              0.0
Max Worker Threads          0.0
Network Threads             0.0
Max Network Threads         0.0

Wait Type         Requests        Wait Time        Signal Wait Time
----------------- --------------- ---------------- ------------------------
NETWORKIO         18.0            40.0             0.0
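Using the sample values from the listing above (substitute your own server's numbers), the average read and write sizes can be computed with a quick SELECT:

```sql
-- Average network read/write sizes, using the values from the
-- DBCC PERFMON sample output above (replace with your own numbers)
SELECT 4008.0 / 39.0  AS 'Avg Bytes Per Read',   -- roughly 103 bytes
       70975.0 / 47.0 AS 'Avg Bytes Per Write'   -- roughly 1510 bytes
```

Here the reads are small, but the writes average about 1.5KB apiece, which is the kind of number you would weigh against your network's capacity.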
The sp_monitor system stored procedure, as well as several SQL Server system functions, can also be used to see much of what is shown in DBCC PERFMON. DBCC PERFMON's Network Reads corresponds to sp_monitor's packets_received (or the @@pack_received system function), and DBCC PERFMON's Network Writes corresponds to sp_monitor's packets_sent (or the @@pack_sent system function).
The following SELECT statement retrieves the current picture of what is being handled by SQL Server from a network packets point of view:
SELECT @@connections as 'Connections',
       @@pack_received as 'Packets Received',
       @@pack_sent as 'Packets Sent',
       getdate() as 'As of datetime'
go

Connections Packets Received Packets Sent As of datetime
----------- ---------------- ------------ -----------------------
39          998              1799         2001-09-01 14:11:56.660

(1 row(s) affected)
Monitoring the Processors
The main processor(s) of your server is doing the majority of all the hard work, executing the operating system code and all applications. This is the next logical point to start looking at the performance of your system. The emphasis here will be to see if the processors that are allocated to the server are busy enough to maximize performance, but not so saturated as to create a bottleneck. The rule of thumb here is to see if your processors are working at between 50-80 percent. If this usage is consistently above 90-95 percent, then you must look at splitting off some of the workload or adding processors. Table 37.2 indicates some of the key performance objects and counters for measuring processor utilization.
Table 37.2. Processor-Related Performance Objects and Counters
The counters System: % Total Processor Time, System: Processor Queue Length, and Processor: % Processor Time are the most critical to watch. If the percentages are consistently high (above that 90-95 percent level), then you need to identify which specific processes and threads are consuming so many CPU cycles.
From the SQL Server point of view, you can execute a simple SELECT statement that yields the SQL Server processes and their corresponding threads.
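The original listing for this query is not reproduced here; a sketch of one such query against the master..sysprocesses system table (where the kpid column holds the Windows thread ID associated with each SQL Server process) might look like this:

```sql
-- Active SQL Server processes and their Windows threads,
-- ordered so the top CPU consumers appear first
SELECT spid, kpid, cpu, physical_io, hostname, program_name
FROM master..sysprocesses
ORDER BY cpu DESC
```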
This will give you the top CPU resource hogs that are active on SQL Server. After you identify which processes are causing a burden on the CPU, check whether they can be either turned off or moved to a different server. If they cannot be turned off or moved, then you might want to consider upgrading the processor.
No one should use the SQL Server box as a workstation because using the processor for client applications can cause SQL Server to starve for processor time. The ideal Windows setup for SQL Server is on a standalone member server to the Windows domain. Do not install SQL Server onto a Primary Domain Controller (PDC) or Backup Domain Controller (BDC) because they run additional services that consume memory, CPU, and network resources.
Before you upgrade to the latest processor just because the % Processor Time counter is constantly high, you might want to check the load placed on the CPU by your other adapters. By checking Processor: % Interrupt Time and Processor: Interrupts/sec, you can tell whether the CPU is interrupted more than normal by adapters such as disk controllers.
The % Interrupt Time should be as close to 0 as possible; controller cards should handle any processing requirements. The optimum value of Interrupts/Sec varies with the CPU used; DEC Alpha processors generate a nonmaskable interrupt every 10 milliseconds (ms), whereas Intel processors interrupt every 15ms. The lowest absolute values are 100 interrupts per second and 67 interrupts per second, respectively.
The System: Context Switches/sec counter can reveal when excessive context switching occurs, which usually directly affects overall performance. In addition, the System: Threads counter can give a good picture of the excessive demand placed on the CPU by having to service huge numbers of threads. In general, only look at these counters if processor queuing is happening.
By upgrading inefficient controllers to bus-mastering controllers, you can take some of the load from the CPU and put it back on the adapter. You will also want to keep the controller patched with the latest drivers from the hardware vendor.
Monitoring Memory
Memory, like the processor, is divided into segments for each process running on the server. If the demand on memory is too high, the operating system has to use virtual memory to supplement the physical memory. Virtual memory is storage allocated on the hard disk; it is kept in a file named PAGEFILE.SYS under Windows. Table 37.3 reflects the main performance objects and counters that are best utilized to monitor memory for SQL Server.
Table 37.3. Memory-Related Performance Objects and Counters
Numerous goals can be achieved related to memory and SQL Server. Figure 37.7 shows a typical monitoring of memory underway.
Figure 37.7. Memory performance object and counters.
It is important to remember that when the operating system or SQL Server can't find something in physical memory and has to go to virtual memory on disk, performance will degrade. Therefore, you need to work on minimizing this situation, known as swapping or page faulting.
To observe the level of the page faulting, you can look at the Memory: Page Faults/sec and Process: Page Faults (for a SQL Server Instance) counters.
Next in line are the MSSQL Buffer Manager: Buffer Cache Hit Ratio and MSSQL Buffer Manager: Total Pages counters. These directly indicate how well SQL Server is finding data in its controlled memory (cache). You need to achieve a ratio of 90 percent or higher here. DBCC PERFMON also reports the Cache Hit Ratio and Cache Size information.
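Much of this counter information can also be pulled from inside SQL Server via the master.dbo.sysperfinfo system table. The following is a sketch; the object_name value assumes a default instance, where the counters register under the SQLServer: prefix rather than an instance-specific name:

```sql
-- Buffer cache counters from within SQL Server (default instance assumed)
SELECT counter_name, cntr_value
FROM master.dbo.sysperfinfo
WHERE object_name = 'SQLServer:Buffer Manager'
  AND counter_name IN ('Buffer cache hit ratio',
                       'Buffer cache hit ratio base',
                       'Total pages')
```

Note that the hit ratio is stored as two raw counters; divide Buffer cache hit ratio by Buffer cache hit ratio base to get the percentage that Performance Monitor displays.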
If the Memory: Pages/sec counter is greater than 0 or the Memory: Page Reads/sec counter is greater than 5, the operating system is being forced to use the disk to resolve memory references. These are called hard faults. The Memory: Pages/sec counter is one of the best indicators of the amount of paging that Windows is doing and of the adequacy of SQL Server's current memory configuration.
Because the memory used by SQL Server 2000 dynamically grows and shrinks, you might want to track the exact usage using either Process: Working Set: SQLServr or MSSQL: Memory Manager: Total Server Memory (KB). These counters indicate the current size of the memory used by the SQL Server process. If these are consistently high as compared to the amount of physical memory in the machine, then you are probably ready to install more memory on this box. If you see a performance degradation because SQL Server must continually grow and shrink its memory, you should either remove some of the other services or processes running or use the configuration option Use a Fixed Memory Size.
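The same sysperfinfo table can be used to track SQL Server's current memory footprint over time. Again, this is a sketch assuming a default instance; the LIKE filter is used here because the exact counter names (with or without spaces before "(KB)") vary between builds:

```sql
-- Current memory used by the SQL Server process, in KB
SELECT counter_name, cntr_value
FROM master.dbo.sysperfinfo
WHERE object_name = 'SQLServer:Memory Manager'
  AND counter_name LIKE '%Server Memory%'
```

Sampling this periodically and comparing it against physical memory gives you the same growth/shrink picture described above without leaving Query Analyzer.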
Monitoring the Disk System
By monitoring the portion of the system cache used for the server services (synchronous) and that related to the SQL Server (asynchronous), you can see how much disk access is related to SQL Server. Not all asynchronous disk activity is SQL Server, but on a dedicated box, it should be. You can watch a number of different synchronous and asynchronous counters, depending on the type of activity you want to monitor. The essential performance objects and counters related to monitoring the disk system are indicated in Table 37.4.
Table 37.4. Disk Usage-Related Performance Objects and Counters
Before you can get information from some of these counters, you must first turn them on by using diskperf. From a command prompt, execute diskperf -y and then reboot the computer (see Figure 37.8). The counters are off by default so that nothing slows down disk performance in any way; if you want to monitor this area, you have to ask for it and accept a slight hit on overall performance due to the overhead of these counters.
Figure 37.8. Setting the diskperf -y option on.
Slow disk I/O causes a reduction in transaction throughput. To identify which disks are receiving all the attention, you should monitor both the Physical Disk and Logical Disk performance objects. You have many more opportunities to tune at the disk level than you do with other components such as processors. This has long been the area where database administrators and system administrators have been able to gain better performance.
The place to start is with looking at the Physical Disk: Current Disk Queue Length and Physical Disk: Avg. Disk Queue Length counters' behavior of all disks or of each particular disk. This will identify where much of the attention is from a disk-usage point of view. Figure 37.9 shows a disk-monitoring session under way that is loaded with long disk queues. In other words, disk operations are being stacked up and there appears to be a major issue here.
Figure 37.9. Disk performance object and counters.
As you monitor each individual disk, you might see that some drives are not as busy as others. You can relocate heavily used resources to minimize these long queue lengths that you have uncovered and spread out the disk activity. Common techniques for this are to relocate indexes away from tables, isolate read-only tables away from volatile tables, and so on.
The Physical Disk: % Disk Time counter for each physical disk drive will show you the percentage of time that the disk is active; a continuously high value could indicate an under-performing disk subsystem.
Of course, the monitoring up to this point will only show half the picture if drives are partitioned into multiple logical drives. To see the work on each logical drive, you need to examine the logical disk counters; in fact, you can monitor read and write activity separately with Logical Disk: Disk Write Bytes/sec and Logical Disk: Disk Read Bytes/sec.
If you use RAID, you need to know how many physical drives are in each RAID array to interpret the monitored disk-queuing values for any one disk. In general, just divide the disk queue value by the number of physical drives in the disk array. This gives you a fairly accurate queue length for each physical disk.
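As a hypothetical illustration (these numbers are invented, not taken from the text), an observed queue length of 8 on a four-drive array works out as follows:

```sql
-- Hypothetical: observed Avg. Disk Queue Length of 8 on a 4-drive RAID array
SELECT 8.0 / 4 AS 'Queue Length Per Physical Disk'  -- 2.0 per disk
```

A per-disk value around 2 sits at the edge of the commonly cited rule of thumb of about two queued operations per spindle, so this array would bear watching.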