Database Tuning Advisor


It used to be that the DBA had to spend a lot of time reviewing the database design, learning about data distribution, finding and examining in detail the main queries, and then manually tuning indexes to try to find the best set of indexes to suit individual queries. With the Database Engine Tuning Advisor (DTA), this slow and laborious process is no longer needed. You can use DTA to tune individual queries as they are being developed, and to tune whole workloads as they become available.

DTA does this either by analyzing individual queries from SQL Server Management Studio, or by using a SQL Server Profiler trace file. The workload should contain at least one example of each query called, but it doesn't need to contain repeated calls to the same procedure as you would expect to see in a trace from a production system. This is because DTA tunes each unique query only once; it isn't going to look at the interaction of all the queries in the workload and provide a balanced set of indexes to suit a mix of INSERT, UPDATE, and DELETE statements. It will simply look at each query and provide recommendations to improve that query, so the DBA still has some work to do in deciding which indexes to implement to get the best compromise between insert, update, and delete performance.

Now we'll jump straight into using DTA to create some indexes for us.

Using DTA to Tune Individual Queries

Imagine a scenario in which a developer DBA is writing queries for a new database that has no indexes yet and wants to create an initial set. You need a database with data in it, and that data has to be representative of the final data distribution. In the sample workload, you'll examine the index recommendations with different amounts of data in the target tables. The following table shows the number of rows in each table when you run DTA for the insertpeople query.

 Table        Insert    Update
 People       0         1,000,000
 BoysNames    100       100
 GirlsNames   100       100
 LastNames    2,000     2,000

Before starting to run DTA, you need to figure out how to determine the effectiveness of each of the DTA recommendations. DTA will give you its expectation of performance improvement, but you should check its effectiveness for yourself, so you need some way to measure before and after performance.

We've chosen to use three metrics for this. The first is the insert time for each row. To get this you can use a simple stored procedure that calls the insert stored procedure multiple times and reports how many rows it inserted and the insert rate at predetermined intervals. The second metric is the output of STATISTICS IO, which you can gather in SQL Server Management Studio by turning on the Query option for Statistics IO. The third metric is the output of STATISTICS TIME.
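
The exact wrapper isn't important; the following is a minimal sketch of what such a timing procedure can look like. The name measureInserts and its reporting logic are our illustration, not the actual helper from the book's download:

 -- Illustrative timing wrapper: calls the insert procedure repeatedly and
 -- prints the rate every @reportEvery iterations
 create procedure measureInserts
     @totalRows int,
     @reportEvery int
 as
 set nocount on
 declare @i int, @batchStart datetime, @ms int
 set @i = 0
 set @batchStart = getdate()
 while @i < @totalRows
 begin
     exec insertPeople        -- the insert procedure under test
     set @i = @i + 1
     if @i % @reportEvery = 0
     begin
         set @ms = datediff(ms, @batchStart, getdate())
         -- nullif guards against a sub-millisecond batch
         print 'Inserted ' + cast(@reportEvery as varchar(10))
             + ' people in ' + cast(@ms as varchar(10))
             + 'mS at a rate of '
             + cast(@reportEvery * 1000.0 / nullif(@ms, 0) as varchar(20))
             + ' per Second'
         set @batchStart = getdate()
     end
 end
 go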

Before you start tuning, capture your starting metrics. To ensure that you get consistent results, capture a cold time and several warm times, and then average the warm times.

One other thing you'll look at in this example is the wait time during query execution. Unless the query is very slow, a single execution won't give you the instantaneous results you need to see any waits. By running the procedure in a tight loop, however, you can sample the wait stats repeatedly and stand a good chance of seeing what the query is waiting on.

The examples shown in the figures in this chapter were run on an Intel Centrino 1.73GHz laptop with 2GB of RAM, and a single 110GB disk that's been partitioned into 55GB as C: and 55GB as D:. For comparison of the results, we also repeated the tests on a desktop machine with dual 3GHz Intel processors with HT enabled. This machine has 4GB of RAM, a 40GB disk running the OS, and four 160GB SATA disks configured as a RAID 0 array (striped with a 128KB stripe size) connected to a four-port SATA controller on the PCI bus. The PCI bus here limits the disk throughput to around 100 MB/Sec for large sequential I/O.

Start off by capturing stats for the insertpeople stored procedure with an empty people table. To ensure that the server is in a cold state, use the following commands before each cold run to empty the buffer cache and the procedure cache, so the first run has to read from disk and go through a full stored procedure compile cycle. This is much faster than restarting SQL Server every time, and gives good repeatable results:

 dbcc dropcleanbuffers
 dbcc freeproccache

Now run the stored procedure and see how fast it goes by using this script:

 use People
 go
 truncate table people
 go
 dbcc dropcleanbuffers
 dbcc freeproccache
 go
 set statistics time on
 set statistics io on
 go
 -- Cold run
 exec insertPeople
 go
 -- first warm run
 exec insertPeople
 go
 -- second warm run
 exec insertPeople
 go
 -- third warm run
 exec insertPeople
 go
 set statistics time off
 set statistics io off
 go
 -- we ran the SP to insert 2 people 4 times, so we should have 8 people in the DB
 select count(*) from people
 go

Following are the results of executing the procedure several times:

 -- Cold Run
 SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 89 ms.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'BoysNames'. Scan count 1, logical reads 1, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'GirlsNames'. Scan count 1, logical reads 1, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'lastNames'. Scan count 1, logical reads 8, physical reads 8, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 5 ms.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'BoysNames'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'GirlsNames'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'lastNames'. Scan count 1, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 29 ms.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 2 ms.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'people'. Scan count 0, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 91 ms.
 Table 'people'. Scan count 0, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 217 ms.
 -- Warm Run times
 SQL Server parse and compile time:
    CPU time = 0 ms,  elapsed time = 1 ms.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'BoysNames'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'GirlsNames'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'lastNames'. Scan count 1, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 16 ms,  elapsed time = 1 ms.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'BoysNames'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'GirlsNames'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'lastNames'. Scan count 1, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 31 ms.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'people'. Scan count 0, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'people'. Scan count 0, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 2 ms.
 SQL Server Execution Times:
    CPU time = 16 ms,  elapsed time = 36 ms.

There is some very useful information in this output. Looking at both the cold and warm run outputs, you can see that they are both pretty fast, with a cold run elapsed time of 217 ms and a warm run elapsed time of 36 ms.

Looking at the cold run stats and focusing on the number of physical reads, you can see that there were a total of ten physical reads: one to read the BoysNames table, one to read the GirlsNames table, and eight to read the lastNames table into memory.

The warm run stats show that there were no physical reads, only logical reads. The parse and compile time was also greatly reduced. This tells us that the query didn't need recompiling, which is good because it saves a lot of time on each call. The warm run stats also show that it's taking about 30 to 40 ms for each insert.

Given that we are only issuing 10 reads to execute the query, and that repeated calls don't invoke additional physical reads, it's going to be hard to improve performance by further reducing these already low numbers. It's also going to be hard to see any small time-based improvements when the time taken is already so short, at just 30–40 milliseconds.

To make it easier to observe small changes in performance, we are going to need to execute the queries hundreds or thousands of times, and then look at the overall stats for a very large number of executions. This will help highlight any small changes.

To do this, you need to put the script into a loop and run it thousands of times to determine whether that gives a better measurement. Use the following command-line statement:

 sqlcmd -StracysLaptop -E -d people -Q" exec makepeople 10000, 500" 

These are the results of the makePeople stored procedure:

 Inserted 1000 people in 670mS at a rate of 1492.54 per Second
 Inserted 1000 people in 720mS at a rate of 1388.89 per Second
 Inserted 1000 people in 656mS at a rate of 1524.39 per Second
 Inserted 1000 people in 686mS at a rate of 1457.73 per Second
 Inserted 1000 people in 720mS at a rate of 1388.89 per Second

The inserts are going fast enough that you are getting between 1,400 and 1,500 inserts per second.

Now you should see what you are waiting on. To do that, run the sqlcmd line again, increasing the parameters if necessary so that it runs long enough to sample. While it runs, you can query the sysprocesses table to see what the wait types are for your query. Here is the cmd line:

 sqlcmd -StracysLaptop -E -d people -Q" exec makepeople 10000, 500" 

This is the query that will enable you to monitor what you are waiting on:

 set nocount on
 while 1 > 0
 begin
   select spid, kpid, blocked, waittime, lastwaittype, waitresource
   from master..sysprocesses
   where program_name = 'SQLCMD'
   waitfor delay '00:00:00.05'
 end

These are the results of the query (cleaned up to save space):

 spid   kpid   blocked waittime             lastwaittype
 55     3336   0       0                    WRITELOG
 55     3336   0       0                    WRITELOG
 55     3336   0       0                    WRITELOG
 55     3336   0       0                    WRITELOG

Not surprisingly, on such a simple insert on a very basic slow disk, most of the time is spent waiting on the log write (WRITELOG). This tells you that for this query, most of the detailed stats matter less than the raw write rate reported by the makePeople stored procedure.

One final check before going onto DTA is to take a look at the output of showplan_text to see what the query plan looks like. You can then compare this with the query plan after applying any recommendations from DTA and see how it changes.
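
If you want to capture the plan text yourself, turn the option on in its own batch, run the procedure, and turn it off again. With SHOWPLAN_TEXT on, statements are compiled but not executed; the estimated plan text is returned instead:

 set showplan_text on
 go
 -- compiled but not executed; the plan text is returned
 exec insertPeople
 go
 set showplan_text off
 go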

The output is too verbose to include here, but you can get it from this book's Web site at www.wrox.com. The key elements of interest are shown here:

 |--Table Scan(OBJECT:([People].[dbo].[BoysNames]))
 |--Table Scan(OBJECT:([People].[dbo].[GirlsNames]))
 |--Table Scan(OBJECT:([People].[dbo].[lastNames]))

This shows that you are using a table scan to get the names from the lookup tables. For BoysNames and GirlsNames this works just fine because the tables are so small, but it isn't optimal on lastNames, where the table has 2,000 rows and occupies seven or eight database pages.

Now see what DTA recommends for you. Running DTA against the sample query is simple. Open the script run insertpeople.sql in a new query window. Right-click the window and select Analyze Query in Database Engine Tuning Advisor, as shown in Figure 15-1.

Figure 15-1

This brings up the Database Engine Tuning Advisor, shown in Figure 15-2.

Figure 15-2

There are two things you need to change before you click the Start Analysis button. First, change the database for workload analysis from master to people. Second, select which database you want to tune by selecting the people database. Now you are ready to start the analysis session by clicking the Start Analysis button. When you start the analysis session, DTA adds a new Progress tab and updates its analysis progress, as shown in Figure 15-3.

Figure 15-3

When the analysis is complete, DTA adds two more tabs: Recommendations and Reports.

For the insert query, DTA has recommended that you create a clustered index on the lastNames table. This will reduce the number of reads on the lastNames table from 8 to 1 or 2 on each query. Percentage-wise, this is quite a large reduction.
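
DTA's generated script boils down to a single CREATE INDEX statement. Based on the index name that appears in the plan output later in this section, it is something along these lines (the key column is our assumption from that name; in practice, use the script DTA generates):

 use People
 go
 -- index name taken from the plan output shown later; keying on ID
 -- is an assumption based on that name
 create clustered index cix_LastNames_ID on dbo.lastNames (ID)
 go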

Now you should implement the recommendation and check the performance difference. Start by looking at the stats time and I/O after adding the clustered index on LastNames. The following is abbreviated stats output, showing just the key areas that have changed:

 -- COLD Run
 Table 'lastNames'. Scan count 1, logical reads 9, physical reads 1, read-ahead reads 7, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 Table 'lastNames'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 15 ms,  elapsed time = 590 ms.
 -- Warm Run
 Table 'lastNames'. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 Table 'lastNames'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 4 ms.

There are two differences between these stats and the earlier pre-indexed stats. Now that you are using the clustered index on lastNames, the number of logical reads is reduced dramatically, from eight down to two. In addition, when the table is first read on the cold run, it is brought into memory using read-ahead, which loads the whole table much more quickly than the regular table scan used before indexing.

Now take a look at the showplan_text output and confirm what you are observing in the I/O stats. Here is the relevant section from the plan output:

 |--Clustered Index Scan(OBJECT:([People].[dbo].[lastNames].[cix_LastNames_ID])) 

This shows that you are in fact using the newly added clustered index. Now see how much this affects the execution of the query (remember that before, you were able to achieve 1,400–1,500 inserts per second):

 Inserted 1000 people in 530mS at a rate of 1886.79 per Second
 Inserted 1000 people in 606mS at a rate of 1650.17 per Second
 Inserted 1000 people in 610mS at a rate of 1639.34 per Second
 Inserted 1000 people in 533mS at a rate of 1876.17 per Second

This shows that the rate of insertion has increased to 1,600–1,900 per second. That's quite an improvement for adding a clustered index.

One final thing to check: What are you waiting on?

 spid   kpid   blocked waittime             lastwaittype
 54     5804   0       0                    WRITELOG
 54     5804   0       0                    WRITELOG
 54     5804   0       0                    WRITELOG
 54     5804   0       0                    WRITELOG

No surprises there: It's still the log that's limiting insert performance.

Indexes for Updates

Next you want to tune the update query. Start that by capturing some metrics around the query's performance. Before you do that, fill the table up a bit by writing a million rows to the people table. You need that many in order to get a full set of results for the marriage query, which pulls out the top 1,000 rows for a given date range.
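
A sketch of this reset follows. The makepeople parameter values here are assumptions (its parameter meanings aren't documented in the text), and the script that repopulates lastNames is part of the book's samples rather than shown:

 use People
 go
 truncate table people
 go
 -- parameter meanings assumed: rows to insert and reporting interval
 exec makepeople 1000000, 10000
 go
 truncate table lastNames
 go
 -- repopulate lastNames with 2,000 rows using the book's sample script (not shown here)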

As sketched above, truncate the people table, run makepeople to fill it with 1,000,000 rows, truncate the lastNames table, and then fill it with 2,000 rows. After that, you can start capturing metrics around the raw query performance again. Here are the results of the cold run and three warm runs, edited to remove the many extra rows, and with some additional formatting for clarity:

 Table '#boys'. Scan count 0, logical reads 1003, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 Table 'people'. Scan count 1, logical reads 1824, physical reads 0, read-ahead reads 1904, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 63 ms,  elapsed time = 1197 ms.
 (1000 row(s) affected)
 Table '#girls'. Scan count 0, logical reads 1003, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 Table 'people'. Scan count 1, logical reads 1897, physical reads 0, read-ahead reads 64, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 46 ms,  elapsed time = 866 ms.
 (1000 row(s) affected)
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table '#boys'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table '#girls'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'people'. Scan count 2, logical reads 32281, physical reads 0, read-ahead reads 14172, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 Table 'Worktable'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 813 ms,  elapsed time = 8350 ms.
 (1 row(s) affected)
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 SQL Server Execution Times:
    CPU time = 922 ms,  elapsed time = 10464 ms.

This shows that the cold run took almost 10 seconds to complete.

The warm run looks like this:

 SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 1 ms.
 SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 1 ms.
 Table '#boys'. Scan count 0, logical reads 1003, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 Table 'people'. Scan count 1, logical reads 1895, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 47 ms,  elapsed time = 216 ms.
 (1000 row(s) affected)
 Table '#girls'. Scan count 0, logical reads 1003, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 Table 'people'. Scan count 1, logical reads 1765, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 47 ms,  elapsed time = 46 ms.
 (1000 row(s) affected)
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table '#boys'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 3 ms.
 Table '#girls'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'people'. Scan count 2, logical reads 32281, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 Table 'Worktable'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 703 ms,  elapsed time = 713 ms.
 (1 row(s) affected)
 SQL Server Execution Times:
    CPU time = 15 ms,  elapsed time = 1 ms.
 SQL Server Execution Times:
    CPU time = 812 ms,  elapsed time = 980 ms.

There was a significant improvement between the cold and warm runs, due to the reduction in compilation time and to faster data access, as the tables are now mostly loaded into memory. This is shown by the reduction in read-ahead reads between the cold and warm runs.

This procedure is taking between 10 seconds for a cold run and 1 second for a warm run, so it should be much easier to see any improvement that the DTA can make. However, we'll still run it in a loop and see what the average update rate is over a longer period of executions.

This is the command to run:

 sqlcmd -E -d people -Q" exec domarriages 100, 10" 

These are the results:

 Married 20 people in 15326mS at a rate of 1.30497 per Second
 Married 20 people in 18610mS at a rate of 1.07469 per Second
 Married 20 people in 15470mS at a rate of 1.29282 per Second
 Married 20 people in 14610mS at a rate of 1.36893 per Second
 Married 20 people in 14890mS at a rate of 1.34318 per Second
 Married 20 people in 17076mS at a rate of 1.17123 per Second

The results show that updates are running at a rate of only 1 to 1.4 per second, so it should be relatively easy to see any performance improvement.

Before going on to run DTA, let's take a quick look at the wait types. Run the command again, run the monitoring code, and capture the wait types from sysprocesses. The output should look something like this:

 spid   kpid   blocked waittime             lastwaittype
 52     4212   0       546                  LOGMGR_RESERVE_APPEND
 52     4212   0       0                    SOS_SCHEDULER_YIELD

There was a pretty even split between these two wait types. SQL Server Books Online (BOL) explains what each wait type means; look them up either by searching on the wait type or by searching for sys.dm_os_wait_stats. LOGMGR_RESERVE_APPEND occurs when SQL Server is waiting to see whether truncating the log will free enough space to write the current log record. In this case, the database was configured with the simple recovery model and the log file is on a very slow disk, so you should expect to see a lot of log-related waits.

SOS_SCHEDULER_YIELD occurs when a task voluntarily yields the scheduler and has to wait for a new quantum. These are quite different from the wait types in the insert query, which is expected because the update has a very different characteristic than the insert.
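
If you prefer cumulative counters to the point-in-time samples from sysprocesses, the sys.dm_os_wait_stats DMV mentioned above gives the same picture. For example:

 -- Top waits accumulated since the counters were last cleared
 select top 10 wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
 from sys.dm_os_wait_stats
 order by wait_time_ms desc
 go
 -- Optionally clear the counters before a test run
 dbcc sqlperf ('sys.dm_os_wait_stats', clear)
 go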

Now see what DTA has to say about this workload. This time we'll show you how to run DTA against a workload. The next step here is to set up SQL Server Profiler to capture a trace file, which is covered in Chapter 13. The key things to remember when setting up SQL Server Profiler for this trace are twofold: use the Tuning template, and ensure that the files are set to roll over, although the overall file size will be trivial.

Once Profiler is set up and the trace is running, execute the marriage stored procedure once, and then stop the trace, which saves the file. Now you have a workload trace file on disk, and you can start up DTA and use it to tune the workload. Launch DTA and connect to your SQL Server. If you have been following through the earlier sections, you will see the earlier tuning sessions in the left pane of DTA, as shown in Figure 15-4.

Figure 15-4

This is another part of DTA that is worth briefly mentioning. Each of your tuning sessions is saved, so you can go back and review what you asked DTA to tune and the recommendations it came up with. However, if you just use the default session names, as we have been doing, the session names don't have a lot of meaning, and pretty soon it's difficult to know what each session was for. Therefore, we suggest that you come up with a naming scheme that makes sense to you. This will help you find the session you're looking for in the future.

After setting the database to be People in both the drop-down selection and the list of databases to tune, you need to tell DTA that you want to tune a file, and where the file is. Either type in the filename and full path, or use the Browse button to select the trace file you just created, as shown in Figure 15-5.

Figure 15-5

Now we can start the Analysis session again and see what results DTA has for us this time. You can see those results in Figure 15-6.

Figure 15-6

This time DTA has two recommendations, and reckons it can improve things by 98 percent. Take a closer look at what you can do with these recommendations. To do this, scroll the recommendations window over to the right to find the Definition column. If you hover over this column, a ToolTip pops up, telling you to click the link to get a T-SQL script of the recommendations. Doing so reveals a script like the one shown in Figure 15-7.

Figure 15-7

You can now copy this script to the clipboard, and from there into a file or directly into SQL Server Management Studio to be executed. Alternatively, after taking a look at the recommendations, you can have DTA run them for you by selecting Apply Recommendations from the Actions menu.

Before doing that, take a look at some of the other information in the Reports tab. This area of DTA holds a lot of useful information about what DTA did and why. In the simple case we have been working with here, most of the interesting information is on the Statement cost reports. Start with the first report in the list, shown in Figure 15-8.

Figure 15-8

This shows the various statements in the workload, and the estimated improvement that DTA expects to gain from the recommendation. In this case, all of the improvement comes from the update statement, which the Tuning report believes will be improved by 99.90 percent.

To apply the changes, simply let DTA make them. When you select Apply Recommendations, DTA asks whether you want to run the script now or schedule it for some time in the future. Choose to make the changes right away so you can see the immediate impact. While it's executing, DTA shows the status of the changes, as shown in Figure 15-9.

Figure 15-9

Now that it's done, go back and see how fast your queries are running. Start with the output of stats I/O and stats time. Running the same script again now gives you the following results.

This is the cold run:

 SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 1 ms.
 SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 63 ms.
 Table '#boys'. Scan count 0, logical reads 1003, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 Table 'people'. Scan count 1, logical reads 1759, physical reads 3, read-ahead reads 1974, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 32 ms,  elapsed time = 851 ms.
 (1000 row(s) affected)
 Table '#girls'. Scan count 0, logical reads 1003, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 Table 'people'. Scan count 1, logical reads 1794, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 46 ms,  elapsed time = 49 ms.
 (1000 row(s) affected)
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table '#boys'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table '#girls'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'people'. Scan count 2, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 Table 'Worktable'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 94 ms,  elapsed time = 1012 ms.

This is the result for the warm run:

 SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 1 ms.
 SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 1 ms.
 Table '#boys'. Scan count 0, logical reads 1003, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 Table 'people'. Scan count 1, logical reads 1793, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 32 ms,  elapsed time = 44 ms.
 (1000 row(s) affected)
 Table '#girls'. Scan count 0, logical reads 1003, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 Table 'people'. Scan count 1, logical reads 1736, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 46 ms,  elapsed time = 447 ms.
 (1000 row(s) affected)
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table '#boys'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table '#girls'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 1 ms.
 Table 'people'. Scan count 2, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 Table 'Worktable'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 78 ms,  elapsed time = 494 ms.

That has reduced the CPU time for the warm run from 812 ms to 78 ms, although the elapsed time only came down from around 1 second to 500 ms. Now see how fast it runs in a tight loop:

 Married 20 people in 9860mS at a rate of 2.0284 per Second
 Married 20 people in 8890mS at a rate of 2.24972 per Second
 Married 20 people in 9110mS at a rate of 2.19539 per Second
 Married 20 people in 6936mS at a rate of 2.88351 per Second
 Married 20 people in 5280mS at a rate of 3.78788 per Second
 Married 20 people in 5376mS at a rate of 3.72024 per Second

That's pretty remarkable; you have gone from just over 1 update per second to 2–4 updates per second.

Now check the waits again:

 spid   kpid   blocked waittime             lastwaittype
 58     4688   0       859                  LOGMGR_RESERVE_APPEND

This time the waits are predominantly the single LOGMGR_RESERVE_APPEND wait type.

Finally, take a quick look at the showplan_text output to see how the DTA recommendations are changing the query plan.

This is the showplan output for the update before applying the DTA recommendations:

 |--Table Update(OBJECT:([People].[dbo].[people]), SET:([People].[dbo].[people].[lastName] = RaiseIfNull([Expr1016])))
      |--Table Spool
           |--Compute Scalar(DEFINE:([Expr1016]=[Expr1016]))
                |--Nested Loops(Left Outer Join)
                     |--Top(ROWCOUNT est 0)
                     |    |--Table Scan(OBJECT:([People].[dbo].[people]), WHERE:([People].[dbo].[people].[personID]=[@girlID]) ORDERED)
                     |--Assert(WHERE:(CASE WHEN [Expr1015]>(1) THEN (0) ELSE NULL END))
                          |--Stream Aggregate(DEFINE:([Expr1015]=Count(*), [Expr1016]=ANY([People].[dbo].[people].[lastName])))
                               |--Table Scan(OBJECT:([People].[dbo].[people]), WHERE:([People].[dbo].[people].[personID]=[@BoyID]))

You can clearly see that you are using a table scan to apply the update to People.

This is the showplan output after applying the DTA recommendations:

 |--Clustered Index Update(OBJECT:([People].[dbo].[people].[_dta_index_people_c_6_2089058478__K1]), SET:([People].[dbo].[people].[lastName] = RaiseIfNull([Expr1016])))
      |--Table Spool
           |--Compute Scalar(DEFINE:([Expr1016]=[Expr1016]))
                |--Nested Loops(Left Outer Join)
                     |--Top(ROWCOUNT est 0)
                     |    |--Clustered Index Seek(OBJECT:([People].[dbo].[people].[_dta_index_people_c_6_2089058478__K1]), SEEK:([People].[dbo].[people].[personID]=[@girlID]) ORDERED FORWARD)
                     |--Assert(WHERE:(CASE WHEN [Expr1015]>(1) THEN (0) ELSE NULL END))
                          |--Stream Aggregate(DEFINE:([Expr1015]=Count(*), [Expr1016]=ANY([People].[dbo].[people].[lastName])))
                               |--Clustered Index Seek(OBJECT:([People].[dbo].[people].[_dta_index_people_c_6_2089058478__K1]), SEEK:([People].[dbo].[people].[personID]=[@BoyID]) ORDERED FORWARD)

This starts with a Clustered Index Update operator, showing that you are now using the newly created clustered index to apply the update, and this is what provides the big benefit.

Reassessing Inserts after Adding Update Indexes

Now go back and measure the impact the update indexes have had on the insert procedure. You haven't done anything else to the insert procedure, but the update procedure added a clustered index to People, so now the insert procedure will have to contend with the additional overhead of index maintenance, inserting new records, and splitting index pages. It's going to be interesting to see how much slower this makes the inserts, and how it changes what you are waiting on.

Start by looking at the stats time and I/O output:

 -- COLD RUN
 SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 51 ms.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'BoysNames'. Scan count 1, logical reads 1, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 21 ms.
 Table 'GirlsNames'. Scan count 1, logical reads 1, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'lastNames'. Scan count 1, logical reads 9, physical reads 1, read-ahead reads 7, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 2 ms.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'BoysNames'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'GirlsNames'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'lastNames'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 29 ms.
 SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 1 ms.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'people'. Scan count 0, logical reads 3, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 63 ms.
 Table 'people'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 168 ms.
 -- WARM RUN
 SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 1 ms.
 SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 1 ms.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'BoysNames'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'GirlsNames'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'lastNames'. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'BoysNames'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'GirlsNames'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'lastNames'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'people'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 1 ms.
 Table 'people'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 3 ms.
 SQL Server Execution Times:
    CPU time = 0 ms,  elapsed time = 33 ms.

On the cold run, you increased the number of logical reads from 1 to 3. Physical reads didn't change because the whole database is pretty well cached by now. This change is too small to see on a single run, so you'll need to run the query a few thousand times. Before you do that, take a look at the showplan_text output.

The only obvious change here is that you can see that you are now doing an insert into a clustered index table versus the heap you were inserting into before:

 |--Clustered Index Insert(OBJECT:([People].[dbo].[people].[_dta_index_people_c_6_2089058478__K1]), SET:([People].[dbo].[people].[firstName] = RaiseIfNull([@BoysName]),[People].[dbo].[people].[lastName] = RaiseIfNull([@lastName]),[People].[dbo].[peop 

Now see how this has changed the insert rate when you run makePeople. This is now reporting the following:

 Inserted 10000 people in 5250mS at a rate of 1904.76 per Second
 Inserted 10000 people in 5296mS at a rate of 1888.22 per Second
 Inserted 10000 people in 5233mS at a rate of 1910.95 per Second
 Inserted 10000 people in 5300mS at a rate of 1886.79 per Second
 Inserted 10000 people in 5233mS at a rate of 1910.95 per Second

This shows that you haven't caused any measurable impact on the insert rate, which remains at around 1,800 to 1,900 per second. This is more evidence that the limiting factor for this insert is the speed at which you can write to the log (not surprisingly, this is pretty slow on a laptop system). The same can be true on a large enterprise server: the amount of log activity may be much higher, but if the log files are on a slow or poorly configured disk array, log throughput again limits ultimate insert performance.
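
One way to confirm that the log really is the limiting factor is to look at the per-file I/O statistics, for example:

 -- Cumulative write activity and write stalls per file of the People database;
 -- heavy io_stall_write_ms on the log file points at log throughput as the limit
 select db_name(database_id) as database_name, file_id,
        num_of_writes, num_of_bytes_written, io_stall_write_ms
 from sys.dm_io_virtual_file_stats (db_id('People'), null)
 go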

The wait stats for the query show that you are still waiting on WRITELOG, so index maintenance has not yet become the bottleneck.

Too Many Indexes?

One final scenario we are going to look at is how DTA tells you when you have too many indexes. To make this obvious, we'll add a whole stack of other indexes to the four tables in this scenario, run DTA against the insert and update procedures, and see what it tells you about indexes you aren't using.

Here is the script to create some bad indexes to see what DTA will recommend:

 -- Create Bad indexes
 use people
 go
 create clustered index cix_boysnames on BoysNames (ID, Name)
 go
 create index ix_boysnames_id on BoysNames (id)
 go
 create index ix_boysnames_name on BoysNames (name)
 go
 create clustered index cix_girlsnames on GirlsNames (ID, Name)
 go
 create index ix_Girlsnames_id on GirlsNames (id)
 go
 create index ix_Girlsnames_name on GirlsNames (name)
 go
 create clustered index cix_LastNames on LastNames (ID, Name)
 go
 create index ix_Lastnames_id on LastNames (id)
 go
 create index ix_Lastnames_name on LastNames (name)
 go
 create clustered index cix_people on people (firstname)
 go
 create index ix_people_id on people (personid)
 go
 create index ix_people_dob on people (dob)
 go
 create index ix_people_lastname on people (lastname)
 go
 create index ix_people_dod on people (dod)
 go
 create index ix_people_sex on people (sex)
 go

Here are the results of running makePeople again:

 Inserted 1000 people in 1203mS at a rate of 831.255 per Second
 Inserted 1000 people in 750mS at a rate of 1333.33 per Second
 Inserted 1000 people in 640mS at a rate of 1562.5 per Second
 Inserted 1000 people in 673mS at a rate of 1485.88 per Second
 Inserted 1000 people in 656mS at a rate of 1524.39 per Second

These results show that insert performance has dropped dramatically. The worst batch achieved only 831 inserts per second, about half the best rate achieved earlier, which was nearly 2,000 inserts per second. The wait stats show that you are still waiting on the log write, although you are clearly spending more time in index maintenance and in reading extra pages. The stats I/O output indicates that you are incurring a few extra reads on the BoysNames and GirlsNames tables, but otherwise there isn't a great difference in the stats. The showplan output indicates that there is an index scan on both BoysNames and GirlsNames, rather than the single-page table scan that was there before, which accounts for the extra page reads:

 |--Index Scan(OBJECT:([People].[dbo].[BoysNames].[ix_boysnames_name]))
 |--Index Scan(OBJECT:([People].[dbo].[GirlsNames].[ix_Girlsnames_name]))
 |--Index Scan(OBJECT:([People].[dbo].[lastNames].[ix_Lastnames_name]))

In addition, you are no longer using a useful clustered index on lastNames, but have to use the nonclustered index instead, which requires an extra page read on every lastNames access. Overall, the performance degradation isn't that great; on a system with a faster disk it might be considerably higher.

Now see what DTA has to say about all these extra indexes. You can reuse the DoMarriage trace file you captured earlier, but you'll have to change the DTA options. Open DTA and select the trace file you used earlier. Select the Tuning Options tab, and under the section titled "Physical Design Structures (PDS) to keep in database," change the default selection from "Keep all existing PDS" to "Do not keep any existing PDS." You can have DTA advise you about additional indexes as well by keeping the same options selected under the PDS section, or you can have DTA just show you which indexes to remove. For now, ask DTA to examine the existing indexes and to recommend additional indexes if it finds any, as shown in Figure 15-10.

Figure 15-10
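
As an aside, the options set here in the GUI can also be supplied to the dta.exe command-line utility that ships with SQL Server 2005. The following is a rough sketch of an equivalent invocation; the trace filename and session name are placeholders, -fa IDX asks for index recommendations, and -fk NONE tells DTA not to keep any existing PDS:

 dta -E -S tracysLaptop -D People -if marriage.trc -s "Marriage - no existing PDS" -fa IDX -fk NONE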

Now that you have done that, you can start the analysis session and see what results DTA provides. The results are shown in Figure 15-11.

Figure 15-11

DTA has easily found the bad indexes. It's recommending that you drop 12 of them, and in their place create two new indexes and two new statistics.
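
The drop portion of DTA's generated script is just a series of DROP INDEX statements against the names created earlier. For example, two of the twelve might look like this (which indexes survive depends on DTA's analysis, so treat these as illustrative):

 drop index ix_people_sex on dbo.people
 go
 drop index ix_people_dod on dbo.people
 go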

Tuning a Workload

Tuning a whole workload is just as easy. The biggest challenge with tuning a workload is creating the trace file in the first place, as you must determine what to put into it. In the sample scenario, there are only five different queries. Even if they are called in very different patterns, DTA doesn't really care how frequently each statement is called - it's just looking for queries that it can tune. To create a workload file to tune, all you need to do is gather one example of each of your queries and pass that to DTA; in turn, it will provide recommendations for the whole workload.
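
Note that the workload doesn't have to be a trace file: DTA also accepts a plain T-SQL script as a workload, so a small .sql file containing one example of each call works too. A sketch, using only the procedures shown in this chapter:

 -- workload.sql: one example of each call for DTA to tune
 use People
 go
 exec insertPeople
 go
 exec domarriages 100, 10
 go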

To see how this works, start from the point where you have a workload trace file - in this case, named workload.trc. Open DTA and change the name of the session to reflect what you are doing. Call this tuning session Mixed Workload with No Indexes, indicating that you are starting out with no indexes in the database.

Select the trace file you previously created, and then select the tuning options you want - in this case, you are only interested in new indexes. After selecting the database to tune, start the analysis. In this sample scenario, DTA came back with the recommendations shown in Figure 15-12.

Figure 15-12

A note of caution when tuning a workload: DTA will make recommendations based on the cost estimates for each part of the workload. In a workload with a variety of queries, DTA will focus on the big gains, and may therefore miss recommendations for some of the smaller queries. Because of this, it's still worthwhile to tune individual queries after applying recommendations for the whole workload.

A good example of this is that DTA didn't recommend indexes on the lastNames table when tuning the whole workload, even though, when you tuned just the insertPeople procedure, DTA saw enough of an improvement to recommend them. Figure 15-13 shows the recommendations DTA comes up with for the lastNames table when tuning only the insertPeople procedure.

Figure 15-13


