Identifying Bottlenecks

In many cases, you can use simple logic to determine whether the SQL tier is a bottleneck. For example, when multiple Web servers access data from a single SQL server and the Web tier is the limiting factor, reducing the number of Web servers should decrease overall application throughput. If changing the number of Web servers does not change the throughput, the SQL tier is probably the source of the bottleneck.

NOTE
Frequently, the stress clients may become the bottleneck before enough load has been applied to the Web servers or the SQL servers. Therefore, it is important to check the stress clients' resource utilization periodically during each stress test.

Finding out whether the SQL tier has a problem isn't a terribly difficult task. You can reduce the number of Web servers as mentioned earlier, or monitor the processor utilization at each tier. In contrast, pinpointing the problems and producing valid solutions is not easy, and this chapter will not give you such skills instantly. Instead, we'll show you how we discover SQL tier bottlenecks, and we hope that you will learn from our experience. We will start by discussing the tools included with SQL Server 2000.

Tools We Use

We troubleshoot most SQL-related problems using only the tools included with SQL Server 2000. Slow queries can be identified using SQL Profiler; blocking can be detected using SQL Profiler and SQL Query Analyzer; and other causes of bottlenecks, such as disk or memory, can be identified using System Monitor. Let's look at each tool and discuss how we use it.

SQL Profiler

SQL Profiler, if installed in the default location, can be found under Start\Programs\Microsoft SQL Server\Profiler.

NOTE
To run a trace, the user must have administrator rights on the SQL server being monitored.

After launching SQL Profiler, select the server that you want to monitor and then enter your login information. After a connection has been established, the Trace properties dialog box will open. This dialog box has four tabs: General, Events, Data Columns, and Filters, which are used to specify the trace properties.

On the General tab, you can specify the name of a trace, specify a trace template, and choose to save the data to a trace file or a SQL table. If you choose to save to a file, you have to specify a directory and filename; if you choose to save the data to a SQL server, you need to specify the server, database, and table.

On the Events tab, you can select the events to monitor. The events are grouped into collections under which individual events are listed. Selecting an event should display a description on the bottom of the tab. Monitoring all the events might be overkill and can put a burden on the server. Some of the events that we typically capture are:

  • Stored Procedures: SP:StmtCompleted, SP:Completed, RPC:Completed and TSQL: SQL:StmtCompleted, SQL:BatchCompleted

    These events are triggered whenever a statement, stored procedure, RPC, or batch is completed. By including the Text, Reads, CPU, and Duration data columns in the trace, you can identify slow-running code and its cost in terms of CPU resources used. Knowing the number of reads incurred is also valuable for establishing a baseline against which you can compare your optimization efforts.

  • Stored Procedures: SP:Starting, SP:StmtStarting and TSQL: SQL:StmtStarting

    By including these events along with the completed events, you can profile a stored procedure down to the individual line of code. This is useful for long-running stored procedures because it lets you identify the problematic statement before the procedure completes. Additionally, if errors occur while the stored procedures run, you can pinpoint the statement at which the error occurs.

  • Sessions: Existing Connection

    With this event you can verify the SQL Server options set for database connections. Session options can affect the way queries are processed. For example, to use indexed views, the ARITHABORT option must be set to ON; otherwise, the underlying tables will be used to retrieve the data instead of the clustered index on the view (see the example after this event list).

  • Stored Procedures: SP:Recompile

    The steps involved in running a stored procedure are parsing, compiling, and executing. Each of these steps takes a finite amount of time. Ideally, code should be parsed and compiled only once, so that server resources are used only to execute the query. Improperly coded stored procedures can cause a recompile every time they are called, which adversely affects duration and can even cause blocking. By including this event in your trace, you can gauge the amount of recompiling taking place, and by including the SP:Starting event in the same trace, you can identify the code that needs to be optimized.

  • Locks: Lock:Timeout and Lock:Deadlock

    You must track these events because they directly affect the scalability of the application. A lock timeout or a deadlock can equate to incomplete transactions and result in errors on the client side.

  • Errors and Warnings: Attention

    This event is generated any time a client disconnects or a timeout occurs. For applications that typically require a significant amount of time to process requests, you may need to increase the timeout value to reduce the number of timeouts. For example, in ADO.NET you may need to increase the CommandTimeout property beyond its 30-second default for queries that have long durations.

  • Errors and Warnings: Missing Column Statistics

    SQL Server can automatically create and maintain statistics on tables and indexes. This information is merely a distribution of the values in particular data columns of the tables or indexes and is used by the query processor to select an execution plan. Missing or outdated statistics could cause the query processor to pick an inefficient execution plan, which can have an impact on the overall throughput of the application.

  • Errors and Warnings: Missing Join Predicate

    This event is triggered whenever a query runs without a join predicate between two tables, producing a cross join. Cross joins are rarely intended and can indicate an oversight in a developer's code. Tracking this event lets you confirm whether any queries reference tables in such a manner.

  • Errors and Warnings: Exception and Error Log

    These events are useful for determining whether any unusual errors occur as a result of stressing the application. By monitoring them, you can establish whether those errors are a result of load. For example, you can set trace flags 1204 and 3605 for all connections made to a server, and SQL Profiler can then capture information relating to deadlocks. Deadlocking is a serious problem that results in a poor client experience and significantly reduces application throughput.

For more information, search under "event classes" in SQL Server Books Online.
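
To illustrate the session-option point made for the Sessions: Existing Connection event, the following batch is a minimal sketch of the SET options that indexed views require, followed by a query against a hypothetical indexed view named dbo.ProductTotals (substitute an indexed view from your own database):

SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET NUMERIC_ROUNDABORT OFF
GO
-- With these options in effect, the optimizer can use the index on the view;
-- NOEXPAND asks it to use the view's clustered index directly
SELECT ProductID, TotalSold
FROM dbo.ProductTotals (NOEXPAND)
GO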

On the Data Columns tab, you can specify the data columns that you want to capture. There are 43 columns, of which Event Class and SPID are required. Depending on the level of activity a given application generates on the SQL server and how many users are in the environment, you might end up with a huge amount of data. Keep in mind that not all data columns will be populated, because not every column is meaningful for every event. For example, Duration is meaningful only for completed events such as SP:Completed; this column will be blank for SP:Starting. Some of the data columns that we collect are listed in Table 8-1:

Table 8-1. Typical Data Columns

  • Event: The name of the event, for example, the start of a stored procedure, a connection being established, and so on.

  • Text: The text of the SQL command being run. This column can be blank if it is irrelevant for a particular event.

  • CPU: CPU resources used, in milliseconds.

  • Reads: Number of logical reads performed to execute a query.

  • Writes: Number of physical writes performed by a SQL command.

  • Duration: The duration of an event, such as the time to execute a stored procedure.

  • StartTime: Time the event was triggered.

  • EndTime: Time the event completed. This applies to completed events.

  • NestLevel: The depth within a nesting operation. For example, if a stored procedure calls another stored procedure, the nest level of the calling procedure is 1 and that of the called procedure is 2. This is equal to the @@NESTLEVEL global variable.

  • Application Name: The name of the application that has a connection to the server.

  • HostName: The name of the machine on which the application that submits a SQL command is running.

  • NTUserName: The Windows user account that the connecting application is running under.

  • LoginName: Either the Windows user name or the SQL Server login, depending on the authentication method used to connect.

For more information, search under "data columns" in SQL Server Books Online.

On the Filters tab, you can specify filters to further limit the data collected. For example, if you are only interested in the SQL activity generated by one particular application, you can set a filter on the application name. By default, events triggered by SQL Profiler are excluded. You can familiarize yourself with all the filters available by examining the Filters tab in the Trace Properties dialog box. Selecting a filter should display an explanation of the filter at the bottom of the tab.

Now that you know what to track, you may be wondering how to analyze the trace you have captured. SQL Profiler gives you two choices for saving the data: a trace file or a SQL table. Even if you initially save the data to a file, you can always open the trace file later and save the data to a SQL table from there. The choice depends on your preference.

TIP
You can also query a trace file directly by using the system function fn_trace_gettable. For example, the following query returns, in table format, all events from the c:\trace.trc file with a duration greater than 1000 milliseconds:

SELECT * FROM ::fn_trace_gettable('c:\trace.trc', default) WHERE Duration > 1000

For more information, see "fn_trace_gettable" in SQL Server Books Online.

We find that for large amounts of data, it is easier to analyze a trace by saving the data to a table and writing queries against this table to retrieve information such as the few stored procedures with the longest durations or the average duration across all instances of a stored procedure. Before you decide that some code is slow, you need to define criteria for a slow-running query.

TIP
We consider any code that has a duration of a second or more while the application is under minimal load to be an area for further investigation and potential improvement. The logic behind this criterion is that under load these durations will simply increase and hence reduce application throughput.
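
To illustrate the table-based approach, the following query is a rough sketch that assumes the trace was saved to a table named dbo.TraceData (the name is arbitrary) with Profiler's standard columns. It returns the ten statements with the highest average duration, along with their average CPU and reads:

-- Top 10 statements by average duration from a trace saved to a table
SELECT TOP 10
    CAST(TextData AS varchar(256)) AS SqlText,
    COUNT(*) AS Executions,
    AVG(Duration) AS AvgDuration,
    AVG(CPU) AS AvgCPU,
    AVG(Reads) AS AvgReads
FROM dbo.TraceData
WHERE Duration IS NOT NULL
GROUP BY CAST(TextData AS varchar(256))
ORDER BY AvgDuration DESC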

As you run more and more traces, you will find that you always track certain events and data columns or specify certain filters. SQL Profiler makes this task easy by letting you create a template containing the trace properties that you most often use. This will save you from having to define your trace properties every time you need to run a trace. The steps to create a template are very simple: Choose New Trace Template from the File menu to display the Trace Properties dialog box, in which you can specify events, data columns, and filters. You can then click Save As on the General tab. Alternatively, you can choose Open Trace Template from the File menu to open an existing template. You can then customize the template and save it under a different name.

System Monitor

SQL Server exposes a number of performance objects that can be monitored with System Monitor. Chapter 4 covers this tool in great detail and focuses on counters, such as Processor, Memory, Disk, and Network, that relate to system-level resources. This section covers only the SQL Server-specific counters that we tend to look at most often. The reference books listed at the start of this chapter cover the other counters in more detail, and depending on your specific needs, you may want to look at others. The SQL Server-specific counters that we use most often are:

  • SQLServer:Buffer Manager
    • Buffer cache hit ratio

      The percentage of pages that were found in the buffer pool without having to incur a read from disk (physical I/O). Low values might be a symptom of low memory conditions or poor indexing.

  • SQLServer:General Statistics Object
    • User Connections

      The number of active SQL connections to the system. This counter is informational and can be used in reporting results to quantify the level of concurrency on the system.

  • SQLServer:Locks
    • Lock Requests/sec

      The number of locks requested per second. Optimizing queries to reduce the number of reads can reduce this counter.

    • Lock Timeouts/sec

      Number of lock requests that time out while waiting for a lock to be granted. Ideally this counter should be zero.

    • Lock Waits/sec

      Number of lock requests that could not be granted immediately. Ideally this counter should be as close to zero as possible.

    • Number of Deadlocks/sec

      Number of requests that resulted in a deadlock condition. Deadlocks are detrimental to the scalability of applications and result in a poor user experience. This counter must be zero.

      By monitoring the instances available for these counters, you can find out what types of locks were involved and their contribution to the total values of these counters. Good indexing and possibly lock hints should alleviate any adverse lock conditions (a brief lock hint example follows this list of counters).

  • SQLServer:Memory Manager
    • Memory Grants Pending

      Number of processes waiting for a workspace memory grant. This counter should be as close to zero as possible; otherwise it may indicate a memory bottleneck.

  • SQLServer:SQL Statistics
    • Batch Requests/sec

      Number of batch requests submitted to the server per second. This counter is used to quantify the amount of load on a system.

    • SQL Compilations/sec

      Number of compilations per second. Ideally this counter should be low. If the Batch Requests/sec counter is close to this counter, a large proportion of the submitted SQL is probably ad hoc calls rather than reused plans.

    • SQL Re-Compilations/sec

      Number of recompilations per second. This counter should also be low. Stored procedures should ideally be compiled once and their execution plans reused. A high value for this counter might require recoding the stored procedures to minimize recompilation.
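
As a brief illustration of the lock hints mentioned in the Locks counters above, the following sketch reads the Categories table from the book's Store sample database without acquiring shared locks. Because it permits dirty reads, a hint like this is appropriate only for queries that can tolerate uncommitted data:

-- NOLOCK avoids shared locks (and the blocking they can cause),
-- at the cost of possibly reading uncommitted data
SELECT CategoryID, CategoryName
FROM dbo.Categories WITH (NOLOCK)
ORDER BY CategoryName ASC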

SQL Query Analyzer

SQL Query Analyzer can be used in various ways: to execute SQL scripts that deploy new code, to count the number of records inserted or updated, to analyze query execution plans, or to execute various system stored procedures. Among the tools included with SQL Server 2000, Query Analyzer is by far the tool our team uses most frequently. Rather than describe Query Analyzer in detail, we will show specific examples of how it can be used to analyze blocking and execution plans later in this chapter.

Quick Row Count

For a scenario that inserts records into tables, counting the number of rows before and after the stress test is useful for calculating the transaction throughput. Counting all the rows in a very large table can be time-consuming. Instead, the sysindexes system table in the application database can be queried to get the total row count for each table with the following query:

SELECT o.name, i.rows
FROM sysobjects o
INNER JOIN sysindexes i ON o.id = i.id
WHERE i.indid < 2
ORDER BY o.name

Reading the sysindexes table requires less I/O compared to counting a large table with thousands or millions of records. Therefore, the more records you have to count, the greater will be the benefit of querying the sysindexes table. Note that it is generally not recommended to query system tables directly. Only if the query duration of counting records from the table is unacceptable should you consider the sysindexes table.
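
One caveat: the rows value in sysindexes is maintained lazily and is not guaranteed to be exact. If precise before-and-after counts matter, you can refresh the stored counts first. The following one-liner is a sketch that assumes your application database is named Store, as in the book's sample site, and that the COUNT_ROWS option is available on your build:

-- Refresh the row counts stored in sysindexes before querying them
DBCC UPDATEUSAGE ('Store') WITH COUNT_ROWS, NO_INFOMSGS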

Blocking Problems

A blocking condition occurs whenever a connection cannot acquire a lock on a resource because that resource is already locked by another connection. Blocking only occurs if the requested locks from each connection are incompatible. This causes one connection to wait until the other connection releases the locks. For instance, if Connection A requests an exclusive lock on Table A, no other connections can place another exclusive lock on the same table. Other connections that require an exclusive lock on Table A must wait until the existing lock is released by Connection A.
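
If you want to see blocking firsthand before troubleshooting it, you can reproduce it with two SQL Query Analyzer windows. This is only a sketch; it uses the Categories table from the book's Store sample database, but any table will do:

-- Window 1: start a transaction, take exclusive locks, and do not commit yet
BEGIN TRANSACTION
UPDATE dbo.Categories SET CategoryName = CategoryName

-- Window 2: this query waits until the transaction in Window 1 ends
SELECT CategoryID, CategoryName FROM dbo.Categories

-- Window 1: release the locks and unblock Window 2
ROLLBACK TRANSACTION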

Depending on the type of application, blocking can occur even in a highly tuned application; however, if blocking occurs often enough to cause long wait times, it becomes a performance problem. Knowing how to troubleshoot blocking problems can have a terrific payoff, and anyone interested in SQL Server tuning should acquire this skill. Again, discussing every available method is beyond the scope of this book, but we will show you a few methods that we use in performance testing to identify blocking issues.

Identifying Blocking Connections

Typical symptoms of a blocking bottleneck are low system resource (CPU and disk) utilization while the SQL tier throughput is maxed out. If you suspect blocking is the bottleneck, the first step toward identifying the cause is to query the sysprocesses table in the master database. Listing 8-1 shows a script that returns information on the blocked connections and the last statement sent by the root blocker:

Listing 8-1

-- Script returns blocking information from the sysprocesses table
SELECT spid, blocked, status, waittime, waittype, waitresource,
       db_name(dbid) DatabaseName, cmd, hostname, loginame
FROM master..sysprocesses
WHERE blocked != 0

DECLARE @spid int

-- Get the root blocker's spid
SELECT @spid = A.spid
FROM master..sysprocesses A
INNER JOIN master..sysprocesses B ON A.spid = B.blocked
WHERE A.blocked = 0

IF NOT @spid IS NULL
BEGIN
    -- Returns the last statement sent from the connection
    DBCC INPUTBUFFER(@spid)
END

The IBuySpy sample site included with this book does not inherently have blocking issues on the SQL tier. For the purpose of demonstration, we will add a new stored procedure that recompiles each time it is executed, to simulate compile blocking. The following script adds such a stored procedure:

CREATE PROCEDURE ProductCategoryList_Recompile
WITH RECOMPILE
AS
SELECT CategoryID, CategoryName
FROM dbo.Categories
ORDER BY CategoryName ASC
GO

To observe the compile blocking behavior, we will have to run several instances of the recompiling stored procedure simultaneously. As discussed in Chapter 3, we can use ACT to simulate load on the SQL server. The ACT script in Listing 8-2 can be used to apply the required load:

Listing 8-2

' The ACT script that simulates simultaneous execution of the stored procedure
Dim oConn
On Error Resume Next

Set oConn = CreateObject("ADODB.Connection")
oConn.Open "driver={SQL Server};Server=SQLServer;" & _
    "Database=Store;uid=user;pwd=user"

If err.Number <> 0 Then
    Test.Trace("Error Opening connection: " & _
        err.number & ", " & err.description)
Else
    oConn.Execute("EXEC ProductCategoryList_Recompile")
    If err.Number <> 0 Then
        Test.Trace("Error Executing: " & _
            err.number & ", " & err.description)
    End If
End If

You can set the simultaneous browser connections to 10 and run a test for five minutes. While stressing the SQL server, executing the script in Listing 8-1 will give you information regarding the blocked connections and the last statement sent by the root blocker. In this example, the blocked connection's waitresource column should indicate compile blocking with the following information:

TAB: 7:1173579219 [[COMPILE]]

NOTE
The "7" after "TAB:" indicates the Store database, and "1173579219" indicates the object being compiled. In our case, "1173579219" represents ProductCategoryList_Recompile. You can get this information by running SELECT Object_Name(1173579219) in the Store database. Note that the format of the waitresource field may change in the next version or the next service pack, and the current method of retrieving the database and stored procedure name may not be valid.

The last statement sent by the root blocker should be displayed as ProductCategoryList_Recompile. We purposely created the problem in ProductCategoryList_Recompile, and querying the sysprocesses table correctly identified the recompile issue in that same stored procedure.

The occurrence of excessive compiles is just one of the numerous causes of blocking. Long running transactions, improper Transaction Isolation Level, and inappropriate indexes can also cause blocking. Regardless of the cause, the approach in this section can be used to identify the blocking stored procedures or ad-hoc queries. After they are identified, you can analyze the query further to pinpoint the problem.

Locks

In-depth knowledge of lock types used in SQL Server is essential for troubleshooting blocking problems. We strongly recommend that you thoroughly review the lock topics in the aforementioned reference materials. Understanding the topics will help you troubleshoot lock timeouts as well as deadlocks.

For applications that experience SQL blocking, we typically execute sp_lock to look at the granted or waiting locks. It is a snapshot of the locks held at the time the sp_lock system stored procedure executes; therefore, we execute it multiple times during the stress test to see a pattern in the locking behavior. The output of sp_lock is cryptic and needs to be cross-referenced to get the table or index names. Thus, we run a custom stored procedure, similar to sp_lock_verbose included in The Guru's Guide to Transact-SQL, to eliminate the repetitive cross-referencing task.
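
If you do not have a similar helper procedure at hand, the same idea can be sketched by capturing the sp_lock result set into a temporary table and resolving the object IDs yourself. The column list below is sized loosely to accommodate the sp_lock output shown later in this section; run the batch in the application database so that OBJECT_NAME resolves the IDs correctly:

-- Capture sp_lock output and translate object IDs into names
CREATE TABLE #locks (
    spid     int,
    dbid     int,
    ObjId    int,
    IndId    int,
    Type     nvarchar(10),
    Resource nvarchar(64),
    Mode     nvarchar(20),
    Status   nvarchar(10)
)
INSERT INTO #locks EXEC sp_lock
SELECT l.spid, l.Type, l.Mode, l.Status, OBJECT_NAME(l.ObjId) AS ObjectName
FROM #locks l
WHERE l.dbid = DB_ID() AND l.ObjId > 0
DROP TABLE #locks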

Again, for demonstration purposes, we will introduce a lock problem in one of the stored procedures. The following script will add such a stored procedure:

CREATE PROCEDURE ProductCategoryList_XLOCK
AS
-- Use a transaction to hold the exclusive lock
BEGIN TRANSACTION
    SELECT CategoryID, CategoryName
    FROM dbo.Categories (XLOCK)
    ORDER BY CategoryName ASC
    WAITFOR DELAY '00:00:01' -- Hold the lock for 1 second
COMMIT TRANSACTION
GO

Executing sp_lock while stressing the SQL server with this new stored procedure will produce an output similar to the following:

spid   dbid   ObjId       IndId  Type  Resource          Mode      Status
------ ------ ----------- ------ ----  ----------------  --------  ------
51     7      0           0      DB                      S         GRANT
53     7      0           0      DB                      S         GRANT
53     7      1977058079  0      RID   1:89:0            X         WAIT
53     7      1977058079  0      PAG   1:89              IX        GRANT
53     7      1977058079  0      TAB                     IX        GRANT
54     7      1977058079  0      TAB                     IX        GRANT
54     7      1977058079  0      RID   1:89:2            X         GRANT
54     7      1977058079  0      RID   1:89:6            X         GRANT
... (Result abbreviated)

The interesting parts of the sp_lock output are the ObjId, Type, Mode, and Status columns. For example, the output shows exclusive mode, row identifier (RID) type locks granted on the ObjId of "1977058079". The RID lock type is shown instead of the KEY lock type due to the lack of a clustered index on the Categories table. In addition, an exclusive lock is waiting to be granted, as indicated by the WAIT status. You can execute SELECT Object_Name(<ObjId>) to get the object name, and in this case, "1977058079" represents the Categories table. Combining the results obtained by querying the sysprocesses table to get the blocking connection and the output of the sp_lock stored procedure to get the lock information will make it easier to narrow the problem down to a statement.

Deadlocks

A deadlock occurs when two or more connections block each other and each is waiting on a resource that the other connection has locked. For instance, Connection A holds a lock on Row A and is waiting to get a lock on Row B, while Connection B holds a lock on Row B and is waiting to get a lock on Row A. Each connection is waiting on the other, and neither can proceed to commit or roll back its transaction. This type of deadlock is referred to as a cyclical deadlock. Another possible deadlock type is a conversion deadlock, which occurs when two or more connections hold shared locks on a resource and each wants to convert its shared lock to an exclusive lock. Regardless of the deadlock type, we use trace flags to troubleshoot deadlock problems. The following statement writes the deadlock trace report (1204) to the SQL Server error log (3605) and sets the trace flags for all connections (-1):

DBCC TRACEON(1204, 3605, -1)

After the trace flags are turned on, you can capture the ErrorLog event using SQL Profiler or view the error log using Enterprise Manager. For demonstration purposes, we created the following script to add a stored procedure that forces a conversion deadlock:

CREATE CLUSTERED INDEX IXC_ModelName ON dbo.Products (ModelName)
GO
CREATE PROCEDURE ProductsUnitCostUpdate_DeadLock
AS
BEGIN TRANSACTION
    SELECT * FROM dbo.Products (HOLDLOCK)
    WHERE ModelName = N'Bullet Proof Facial Tissue'
    WAITFOR DELAY '00:00:05'
    UPDATE dbo.Products
    SET UnitCost = UnitCost * 0.90
    WHERE ModelName = N'Bullet Proof Facial Tissue'
COMMIT TRANSACTION
GO

You must run the deadlock stored procedure in two separate SQL Query Analyzer windows, starting them within a few seconds of each other, to produce a deadlock. The following report is a sample produced by enabling trace flag 1204:

Deadlock encountered .... Printing deadlock information
Wait-for graph

Node:1
KEY: 7:2041058307:1 (a60421ba9ed3) CleanCnt:2 Mode: Range-S-S Flags: 0x0
 Grant List::
   Owner:0x42be1340 Mode: Range-S-S Flg:0x0 Ref:0 Life:02000000 SPID:52 ECID:0
   SPID: 52 ECID: 0 Statement Type: UPDATE Line #: 12
   Input Buf: Language Event: ProductsUnitCostUpdate_DeadLock
 Requested By:
   ResType:LockOwner Stype:'OR' Mode: X SPID:51 ECID:0 Ec:(0x42f77568)
   Value:0x42be1360 Cost:(0/0)

Node:2
KEY: 7:2041058307:1 (a60421ba9ed3) CleanCnt:2 Mode: Range-S-S Flags: 0x0
 Grant List::
   Owner:0x42be5240 Mode: Range-S-S Flg:0x0 Ref:0 Life:02000000 SPID:51 ECID:0
   SPID: 51 ECID: 0 Statement Type: UPDATE Line #: 12
   Input Buf: Language Event: ProductsUnitCostUpdate_DeadLock
 Requested By:
   ResType:LockOwner Stype:'OR' Mode: X SPID:52 ECID:0 Ec:(0x430b5568)
   Value:0x42be1260 Cost:(0/0)

Victim Resource Owner:
 ResType:LockOwner Stype:'OR' Mode: X SPID:52 ECID:0 Ec:(0x430b5568)
 Value:0x42be1260 Cost:(0/0)

Trace flag 1204 produces detailed information for each deadlock occurrence, and we usually focus our attention on a few sections. We look for the lock resource in "KEY:", check the lock mode in "Mode:", and search for the stored procedure name or the ad hoc query in "Input Buf:". With this information, we can narrow our troubleshooting down to a few stored procedures or queries. More detailed information on troubleshooting deadlocks and the trace flag 1204 report can be found in SQL Server Books Online and Inside Microsoft SQL Server 2000.

Additional Resources

So far we have discussed manual methods of troubleshooting blocking problems. Several published articles include blocking scripts; using those scripts can reduce the human error factor. We highly recommend the following article:

  • "INF: How to Monitor SQL Server 2000 Blocking (Q271509)" at http://support.microsoft.com/default.aspx?scid=kb;en-us;q271509

In addition, the following articles are recommended:

  • "INF: SQL Blocking Due to [[COMPILE]] Locks (Q263889)" at http://support.microsoft.com/default.aspx?scid=kb;en-us;Q263889

  • "INF: Troubleshooting Application Performance with SQL Server (Q224587)" at http://support.microsoft.com/default.aspx?scid=kb;en-us;Q224587


