Index Tuning

Using the methods discussed in the previous section, you should be able to narrow your troubleshooting efforts to problematic queries or stored procedures. Identifying which SQL calls are causing the problem is a big step toward finding a solution. Unfortunately, it is almost impossible to list every problem that could occur with SQL Server 2000. Our team discovers new and difficult problems every day, and the more we work with SQL Server, the more we realize that books or whitepapers alone aren't enough to make us experts in SQL Server performance tuning. So if you can't come up with a solution to your current performance problem right after reading this chapter, don't be discouraged; you can't learn to tune your application's SQL Server issues overnight.

Another issue that we frequently encounter is the lack of proper indexes. We usually take a similar approach when tuning the indexes for each problematic SQL call. In the next section, we will discuss the methods that we use to tune indexes.

Analyzing the Execution Plan

One approach to identifying the cause of long durations is to analyze the query execution plan and check the data retrieval method chosen by the SQL Server 2000 query optimizer. You can do this by using Query Analyzer to get a graphical view of the execution plan.

Populating Data

Before we begin, we will add 100,000 rows to the Orders table and 1,000,000 rows to the OrderDetails table in the IBuySpy sample site database. Actual numbers of rows used for analyzing the SQL tier will vary depending on the business requirements and the projected database growth. We chose an arbitrary number of rows to demonstrate how to analyze with large data sets. SQL Server behaves differently depending on the number of rows and the uniqueness of column data; therefore, it is crucial to tune queries on the correctly sized database. Listing 8-3 shows the T-SQL used to generate the additional records.

Listing 8-3

-- The T-SQL used to load additional data
SET NOCOUNT ON -- Stop the "number of rows affected" messages
DECLARE @Count int
DECLARE @OrderID int
DECLARE @CustomerID int
DECLARE @DateAdd int
DECLARE @DateAdd2 int
DECLARE @Today DateTime
DECLARE @SQL nvarchar(4000)

-- Count the records before the inserts
SELECT Count(*) Orders FROM Orders
SELECT Count(*) OrderDetails FROM OrderDetails
SELECT Count(*) Customers FROM Customers

SET @Today = GetDate()
SET @Count = 1

-- Add 100,000 Orders and 10,000 Customers
WHILE @Count <= 100000
BEGIN
    -- Insert a new customer every 10 records
    IF @Count % 10 = 0 OR @Count = 1
    BEGIN
        INSERT INTO Customers ( FullName, EMailAddress, Password )
        VALUES ( 'TestUser_' + Cast(@Count as varchar(10)),
                 'TestUser_' + Cast(@Count as varchar(10)) + '@test.com',
                 'password' )
        SET @CustomerID = @@Identity
    END

    -- Vary the OrderDate and ShipDate
    SET @DateAdd = (-1 * (@Count % 365))
    SET @DateAdd2 = (-1 * (@Count % 365)) + 1

    INSERT INTO Orders ( CustomerID, OrderDate, ShipDate )
    VALUES ( @CustomerID, DateAdd(d, @DateAdd, @Today), DateAdd(d, @DateAdd2, @Today) )
    SET @OrderID = @@Identity

    SET @SQL = N'INSERT INTO OrderDetails ( OrderID, ProductID, Quantity, UnitCost )
                 SELECT TOP 10 @OrderID, ProductID, 1, UnitCost FROM Products '

    -- Use 4 different sorts to add different products
    IF @Count % 4 = 1 SET @SQL = @SQL + N'ORDER BY CategoryID'
    IF @Count % 4 = 2 SET @SQL = @SQL + N'ORDER BY ModelNumber'
    IF @Count % 4 = 3 SET @SQL = @SQL + N'ORDER BY ModelName'
    IF @Count % 4 = 0 SET @SQL = @SQL + N'ORDER BY UnitCost'

    EXEC sp_executesql @SQL, N'@OrderID int', @OrderID = @OrderID

    SET @Count = @Count + 1
END

-- Count after the inserts
SELECT Count(*) Orders FROM Orders
SELECT Count(*) OrderDetails FROM OrderDetails
SELECT Count(*) Customers FROM Customers

Viewing the Execution Plan

Now that we have loaded the data successfully, we will execute ProductsMostPopular, one of the stored procedures that is adversely affected by the increase in database size. To get the execution plan, the Show Execution Plan option must be turned on in Query Analyzer (press CTRL+K). Figure 8-1 shows ProductsMostPopular's execution plan.


Figure 8-1. The execution plan created by running the ProductsMostPopular stored procedure
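If you prefer a text-based plan, or you want to capture the plan from a script, Query Analyzer also supports the SHOWPLAN_TEXT session option. The following is a minimal sketch; while the option is on, the statement is compiled but not executed, so only the estimated plan is returned:

-- Text-based alternative to the graphical execution plan.
-- While SHOWPLAN_TEXT is ON, statements are compiled but not executed.
SET SHOWPLAN_TEXT ON
GO
EXEC ProductsMostPopular
GO
SET SHOWPLAN_TEXT OFF
GO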

In this relatively simple example, ProductsMostPopular contains the following code:

CREATE Procedure ProductsMostPopular
AS
SELECT TOP 5
    OrderDetails.ProductID,
    SUM(OrderDetails.Quantity) as TotalNum,
    Products.ModelName
FROM OrderDetails
INNER JOIN Products ON OrderDetails.ProductID = Products.ProductID
GROUP BY OrderDetails.ProductID, Products.ModelName
ORDER BY TotalNum DESC
GO

The icons representing the physical operators used to execute statements are too numerous to list here; they are well documented in SQL Server Books Online, which also provides details on how to interpret the graphical output. In this section, we will focus on the things to look for that can cause long call durations.

Among the various physical operators, look for the operators that show a high percentage of the cost. For example, in Figure 8-1, the Hash Match/Aggregate and Table Scan operators show high costs: 54 percent and 46 percent, respectively. When you pause the pointer over an icon in the execution plan, a ToolTip appears with detailed information. Figure 8-2 shows the ToolTip information for the Table Scan icon.


Figure 8-2. ToolTip information appears when the pointer is above the Table Scan icon

The ToolTip shows that the Query Optimizer picked a Table Scan operator to retrieve data from the object [store].[dbo].[OrderDetails]. In a typical execution plan, Table Scans and Index Scans can cause longer durations compared to an execution plan that uses Index Seeks. In our case, however, reading the entire OrderDetails table is required to determine which five products are the most popular. Table Scans and Index Scans on small tables are not unusual and in such cases may even be more efficient. You should not automatically be alarmed when you find these operators in an execution plan; adding more indexes to a small table may not make any difference in query execution duration. Therefore, you will need information beyond the physical operators to perform effective index tuning.

NOTE
The Table Scan operator is used on tables without clustered indexes (heaps). The lack of a clustered index may cause poor performance. For more details about the problem, see PRB: Poor Performance on a Heap (Q297861) at http://support.microsoft.com/default.aspx?scid=kb;en-us;Q297861

Additional Tuning Information

SQL Server has useful statistics options with which you can measure resource utilization, such as STATISTICS IO. A combined analysis of the query execution plan and the I/O utilization statistics can help you discover the tables that are the source of the performance problem. Ultimately, the goal is to reduce the execution duration of the queries, and this is where the STATISTICS TIME option can help you measure the durations. Turning on the STATISTICS TIME option displays the parse and compile time for the query execution, letting you measure how long it takes to generate the query's execution plan.

TIP
A query's duration can be captured by SQL Profiler; however, with the STATISTICS TIME option you can view the duration in the same window from which you executed the query.

In addition to the STATISTICS options, there are two DBCC commands that clear the data buffer and the procedure cache. DBCC DROPCLEANBUFFERS lets you test queries with a cold buffer cache without restarting SQL Server, which is useful for gathering consistent query statistics. Another helpful command is DBCC FREEPROCCACHE, which lets you compare query performance with and without the procedure cache.

CAUTION
You should avoid running the DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE commands in a production environment.

Now, let's take a look at the performance metrics obtained by executing the stored procedure ProductsMostPopular via the following script in Query Analyzer:

DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
SET STATISTICS IO ON
SET STATISTICS TIME ON
GO
EXEC ProductsMostPopular

Depending on the complexity of the stored procedure, the amount of extra information generated by the DBCC commands and the STATISTICS options will vary. In the case of ProductsMostPopular, the following output is what we are interested in:

...
SQL Server parse and compile time:
   CPU time = 32 ms, elapsed time = 187 ms.

(5 row(s) affected)

Table 'Products'. Scan count 5, logical reads 10,
physical reads 4, read-ahead reads 0.
Table 'OrderDetails'. Scan count 1, logical reads 3800,
physical reads 0, read-ahead reads 3801.

SQL Server Execution Times:
   CPU time = 1813 ms, elapsed time = 1826 ms.
...
(Result abbreviated)

The resulting statistics will vary depending on the database size, the SQL Server version, and the server hardware. The output that we obtained shows that it took 187 milliseconds to parse and compile the stored procedure ProductsMostPopular and 1826 milliseconds to execute it. This timing information is what you obtain by enabling the STATISTICS TIME option. The most interesting part of the output is the logical reads on the OrderDetails table. The logical reads value indicates the number of pages that were read from the buffer cache; the value of 3800 that we obtained confirms that the Table Scan operator used to retrieve the data is expensive compared to the reads on the Products table. The scan count measure can also be useful from time to time. It represents how many times the object was accessed and is directly correlated with the type of join used. For example, a query that uses a loop join and returns 100 rows may show a scan count of 100, while the same query using a merge join instead may show a scan count of 1. In our case, the scan counts are negligible because of their low values. The physical and read-ahead reads vary depending on whether the required data was already in the buffer. For query tuning purposes, these two counters can be ignored.
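To see how the join type drives the scan count, you can force each join strategy with a join hint and compare the STATISTICS IO output. The following is only a sketch assuming the IBuySpy schema used in this chapter and an arbitrary CustomerID value; the hints are for testing only, since the optimizer normally picks the join type for you.

-- Force each join strategy and compare the scan counts reported by
-- STATISTICS IO for the same query (for testing only).
SET STATISTICS IO ON
GO
SELECT o.OrderID, od.ProductID, od.Quantity
FROM Orders o
INNER LOOP JOIN OrderDetails od ON o.OrderID = od.OrderID
WHERE o.CustomerID = 10704
GO
SELECT o.OrderID, od.ProductID, od.Quantity
FROM Orders o
INNER MERGE JOIN OrderDetails od ON o.OrderID = od.OrderID
WHERE o.CustomerID = 10704
GO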

After the execution plan of a poorly performing query has been analyzed, you can decide whether to add, drop, or modify indexes, modify queries, or make database schema changes. The best scenario would be to pick a modification that gives the most gain, and our team has seen the most performance gains from index tuning. Also, index optimizations typically minimize the risk of introducing new functional bugs. In the next section, we will show you how we go about index tuning.

Understanding Indexes

If you have worked with SQL Server before reading this chapter, you have probably heard of clustered and nonclustered indexes. Learning as much as you can about how indexes are stored and retrieved is critical for proper index tuning. Again, a detailed discussion of SQL Server indexes is beyond the scope of this book. Instead, we will briefly introduce each index type in Table 8-2.

Table 8-2. Available Indexes in SQL Server

Clustered Index: Physically sorts the table based on the order of the indexed column(s). This type of index contains all the table data. Unless specified otherwise, a clustered index is created by default when a primary key constraint is added (see the example following this table).

Nonclustered Index: Does not contain the table data; instead, it contains pointers to the location of the table data. If a column that is not included in the index needs to be queried, a Bookmark Lookup is performed against the heap or the clustered index.
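As a quick illustration of the default behavior noted in Table 8-2, the following sketch uses hypothetical table and constraint names (they are not part of the IBuySpy database) to show a primary key that receives the default clustered index and one that is explicitly declared nonclustered:

-- A PRIMARY KEY constraint creates a clustered index by default.
CREATE TABLE SampleOrders
(
    SampleOrderID int NOT NULL CONSTRAINT PK_SampleOrders PRIMARY KEY,
    Notes varchar(100) NULL
)

-- Specifying NONCLUSTERED keeps the clustered index available for
-- another column.
CREATE TABLE SampleShipments
(
    TrackingCode uniqueidentifier NOT NULL
        CONSTRAINT PK_SampleShipments PRIMARY KEY NONCLUSTERED,
    ShipDate datetime NOT NULL
)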

Choosing the Right Indexes

In real-world situations, choosing the right indexes can be a complex task. You cannot just focus on one poorly performing query, because modifying indexes could affect other queries that rely on the existing indexes. For example, if creating new indexes decreases the duration of a SELECT query, it may increase the duration of INSERT, DELETE, or UPDATE queries. A worse scenario would be to add an index on frequently changing columns. Changing values force rows to be repositioned to keep the index values in order, and in a large table this may add delay to each transaction that modifies the indexed columns. The longer transaction durations will hold exclusive locks longer and might cause blocking to the point where it becomes the bottleneck.

Fortunately, as mentioned in Chapter 3, real-world user scenarios can be scripted and stress tested. After the scenarios are scripted, you can easily execute multiple stress tests to compare the results between each modification. Running the stress tests will allow you to find the bottlenecks and will minimize the risk of creating additional problems by verifying the validity of the index changes. Also, using the script to test each modification allows you to measure the overall improvement for each change. For the purpose of demonstration, all of our index tuning examples will focus on one query.

Nonclustered Index

A nonclustered index does not contain the entire table data. Instead, it contains duplicates of the chosen column data and pointers to the location of the rest of the table data. The pointer may be a clustered index key or a row identifier if the table lacks a clustered index. These characteristics make a nonclustered index compact; as a result, scanning the entire index requires fewer page reads than scanning a clustered index. Therefore, when you are looking for a range of data, as long as all of the required columns are contained in the nonclustered index, a nonclustered index is typically faster than a clustered index.

Our team frequently sees significant performance gains from applying appropriate nonclustered indexes. Although we can't discuss every possible nonclustered index usage in this chapter, we will discuss the covering index, which we encounter often.

A covering index is created by choosing all of the columns that are referenced by a query to be part of the index. Given the right situation, a covering index can greatly improve a query's performance by eliminating bookmark lookups. Because a nonclustered index does not contain table data other than the selected column(s), any column data that is not part of the index has to be retrieved from either a clustered index or a heap. By adding to the index the extra column(s) not specified in the searchable arguments (SARGs) or join conditions, all data can be resolved from the nonclustered index and this extra lookup can be eliminated.

Covering Index Example 1

From the earlier analysis of the execution plan generated by the ProductsMostPopular stored procedure, we discovered that most of the logical reads occurred on the OrderDetails table. Let's first look at the index information on OrderDetails by executing the following statement:

sp_helpindex OrderDetails

The output shows one nonclustered index on the OrderID and ProductID columns. Moreover, the index_description field indicates that the index supports the primary key constraint. Based on the index information and the query used in ProductsMostPopular, we can build Table 8-3 to help us determine which columns should be used and in what order.

Table 8-3. Column Choice and Order (OrderDetails Table)

Searchable Argument (SARG): None
GROUP BY or ORDER BY: ProductID
JOIN: ProductID
All other columns used in the query: Sum(Quantity)
Existing Index: Primary Key, Nonclustered Index on (OrderID, ProductID)

Query Shortcuts

Stored procedures or SQL statements can be saved in the Customize window under the Tools menu. Once configured, the stored procedure will be executed every time the shortcut key is pressed. For example, sp_helpindex can be mapped to CTRL+F1 as shown in Figure 8-3.


Figure 8-3. Stored procedures can be saved in the Customize window for quick execution.

Next time you need to execute sp_helpindex, simply highlight the table name you are interested in and press CTRL+F1. sp_helpindex and sp_helptext are good candidates to customize.

As indicated in Table 8-3, creating a nonclustered index on the ProductID and Quantity columns will cover all of the columns used in the query. In addition, ProductID is used in the GROUP BY clause; therefore, an index sorted first by ProductID should be more efficient than one sorted first by Quantity. Regardless, let's create two indexes with different column orders and see which one is more efficient. The following script creates the two indexes:

CREATE INDEX IX_OrderDetails_ProductID_Quantity ON OrderDetails (ProductID, Quantity)
CREATE INDEX IX_OrderDetails_Quantity_ProductID ON OrderDetails (Quantity, ProductID)

When we executed ProductsMostPopular again, the Query Optimizer chose different operators and the covering index on ProductID and Quantity. Again, the same dual 1-GHz Pentium processor server was used for this example. Figure 8-4 shows the new execution plan.


Figure 8-4. New execution plan after creating covering indexes on the OrderDetails table

The Table Scan operator was replaced with an Index Scan operator, and Stream Aggregate was chosen instead of Hash Match/Aggregate. For the second covering index comparison, an INDEX hint was used; the following code shows the hint usage:

SELECT TOP 5
    OrderDetails.ProductID,
    SUM(OrderDetails.Quantity) as TotalNum,
    Products.ModelName
FROM OrderDetails (INDEX=IX_OrderDetails_Quantity_ProductID)
INNER JOIN Products ON OrderDetails.ProductID = Products.ProductID
GROUP BY OrderDetails.ProductID, Products.ModelName
ORDER BY TotalNum DESC

NOTE
Most of the time, Query Optimizer chooses the most efficient indexes and operators to retrieve data. However, it is good practice to validate the chosen plan by forcing alternate plans and measuring the differences.

In this example, three execution plans were observed. For comparison purposes, statistics from each plan are listed in Table 8-4.

Table 8-4. Comparing Execution Plan Queries

Original Query
  Operators used and cost percentage: Table Scan (46 percent) and Hash Match/Aggregate (54 percent)
  Logical reads on OrderDetails: 3800
  Query duration in milliseconds: 1826

Covering index on ProductID and Quantity
  Operators used and cost percentage: Index Scan (51 percent) and Stream Aggregate (49 percent)
  Logical reads on OrderDetails: 2184
  Query duration in milliseconds: 1021

Covering index on Quantity and ProductID
  Operators used and cost percentage: Index Scan (38 percent) and Hash Match/Aggregate (61 percent)
  Logical reads on OrderDetails: 2370
  Query duration in milliseconds: 2095

As predicted earlier, the first covering index, on the ProductID and Quantity columns, results in the most improvement. In fact, when the order of the columns in the covering index was reversed, the query duration was longer than without the index. Therefore, each index modification should be tested and validated. Without thorough testing, your index tuning attempts may hurt the application's performance.

In this example, implementing a covering index reduced the query duration by approximately 800 milliseconds. Typically we would analyze the query further to see whether an indexed view can reduce the duration even more. We will discuss indexed views in the clustered index section.

Covering Index Example 2

In this example, we will examine the OrdersList stored procedure. The query retrieves an order history for a customer and requires the CustomerID as a parameter. The following code was used to execute the stored procedure:

DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
GO
SET STATISTICS IO ON
SET STATISTICS TIME ON
GO
EXEC OrdersList @CustomerID = 10704

Following the same method used in Example 1, we first capture the query statistics by turning on STATISTICS IO and STATISTICS TIME and then review the execution plan. By enabling the STATISTICS TIME setting, we determine that the query takes approximately 640 milliseconds. The duration is not as severe as ProductsMostPopular; however, the query's execution plan does reveal a Table Scan and a Bookmark Lookup being performed, which usually means that index tuning is necessary. For demonstration purposes, we will assume that OrdersList is executed frequently and that it needs tuning. We should therefore analyze the columns used in OrdersList. Running the system stored procedure sp_helptext with OrdersList as a parameter displays the following code:

CREATE Procedure OrdersList
(
    @CustomerID int
)
As
SELECT Orders.OrderID,
       Cast(sum(orderdetails.quantity * orderdetails.unitcost) as money) as OrderTotal,
       Orders.OrderDate,
       Orders.ShipDate
FROM Orders
INNER JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID
GROUP BY CustomerID, Orders.OrderID, Orders.OrderDate, Orders.ShipDate
HAVING Orders.CustomerID = @CustomerID

Based on the T-SQL used in OrdersList, we can build tables similar to Table 8-3, shown here as Table 8-5 and Table 8-6:

Table 8-5. Orders Table Columns Used in OrdersList

Searchable Argument (SARG): CustomerID
GROUP BY or ORDER BY: CustomerID, OrderID, OrderDate, ShipDate
JOIN: OrderID
All other columns used in the query: None
Existing Index: None

Table 8-6. OrderDetails Table Columns Used in OrdersList

Searchable Argument (SARG): None
GROUP BY or ORDER BY: None
JOIN: OrderID
All other columns used in the query: SUM(Quantity * UnitCost)
Existing Index: Primary Key, Nonclustered Index on (OrderID, ProductID)

TIP
Writing out the columns used in each table as shown in Table 8-6 is highly recommended, especially for more complex queries.

Similar to the covering index example 1 shown earlier, we can use the following script to create the two covering indexes based on Table 8-5 and Table 8-6:

CREATE INDEX IX_Orders_CustomerID_Covered ON Orders (CustomerID, OrderID, OrderDate, ShipDate)
CREATE INDEX IX_OrderDetails_OrderID_Quantity_UnitCost ON OrderDetails (OrderID, Quantity, UnitCost)

For the purpose of comparison, statistics obtained before and after adding the covering indexes are listed in Table 8-7.

Table 8-7. Statistical Comparison of Each Execution Plan

Original Query
  Operators used and cost percentage: Table Scan on Orders (22 percent) and Bookmark Lookup on OrderDetails (75 percent)
  Logical reads: Orders = 520, OrderDetails = 160
  Query duration in milliseconds: 644

After Adding Covering Indexes
  Operators used and cost percentage: Index Seek on Orders (42 percent) and Index Seek on OrderDetails (51 percent)
  Logical reads: Orders = 3, OrderDetails = 30
  Query duration in milliseconds: 6

As indicated in Table 8-7, OrdersList executes almost a hundred times faster with the new covering indexes. In our experience, it is not rare to see such dramatic improvements from adding appropriate covering indexes. Again, the improvement is for just one stored procedure running in isolation. The remaining frequently used scenarios should be thoroughly tested to confirm that overall performance is improved before any new indexes are implemented on a production system.

Clustered Index

Since a clustered index physically sorts the table data, you can have only one clustered index per table. These characteristics can be beneficial in many cases. For instance, having a clustered index on a table allows you to defragment the table. Unless there is a good reason not to use a clustered index, one should be created for each table. In addition, you should consider choosing narrow columns as the clustered index key, since all nonclustered indexes will contain the clustered index key. A wide clustered index key will cost you extra space in the nonclustered indexes and will also cause additional I/O.
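To make the key-width consideration concrete, here is a sketch using a hypothetical SampleCustomers table (not part of the IBuySpy schema); every nonclustered index on the table carries a copy of whichever clustered key is chosen, so the narrow integer key is generally the cheaper choice:

-- Narrow clustered key: each nonclustered index on the table carries
-- only the 4-byte CustomerID for every row.
CREATE UNIQUE CLUSTERED INDEX IXC_SampleCustomers_CustomerID
    ON SampleCustomers (CustomerID)

-- A wide clustered key, such as an e-mail address, would be copied
-- into every nonclustered index row, increasing size and I/O.
-- CREATE UNIQUE CLUSTERED INDEX IXC_SampleCustomers_EMailAddress
--     ON SampleCustomers (EMailAddress)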

SQL Server makes it easy to create a clustered index on each table by making it the default index that is created when defining a primary key constraint. The difficult part of clustered index tuning is to pick the proper column(s) to use in the index. It is beyond the scope of this chapter to go over every possible clustered index usage. Instead, we will discuss the following typically neglected areas that our team faces frequently:

  • The FILLFACTOR option

  • Clustered index on a view

Let's look at each scenario with examples.

FILLFACTOR

The FILLFACTOR option determines how much space to leave free on each page. For example, a FILLFACTOR of 80 will reserve 20 percent free space, which is approximately 1.6 KB of an 8-KB page. It is useful on tables or indexes that receive many inserts or updates. The reserved space helps reduce the page reorganizations (page splits) caused by insert or update operations on a full page. For instance, suppose a customer information table has an address field and the initial customer record was created with an empty address field. If the page containing the customer record is full or nearly full, an update to the address information may require additional space and cause a page split.
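The FILLFACTOR is specified when an index is created or rebuilt. The following statement is only a sketch with a hypothetical index name; it leaves roughly 20 percent of each leaf page free on the Customers table:

-- Reserve roughly 20 percent free space on each leaf page when the
-- index is built; the index name here is hypothetical.
CREATE NONCLUSTERED INDEX IX_Customers_EMailAddress
    ON Customers (EMailAddress)
    WITH FILLFACTOR = 80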

Frequent page splits lead to fragmentation and reduce the overall data density. If pages are only 50 percent full because of heavy page splits, retrieving the data will require twice as much I/O compared to pages that are 100 percent full. Applications that experience heavy inserts and updates can benefit from having free space on each page, which reduces page splits and the resulting data fragmentation. Since the fill factor guarantees free space only at the time the index is created, table and index fragmentation and page densities should be monitored periodically.

The DBCC SHOWCONTIG command allows you to view the fragmentation and data density in tables or indexes. To demonstrate fragmentation, we will slightly modify the Orders table in the IBuySpy sample site database. The following code will add a nullable description field and add a clustered index on the OrderID column:

ALTER TABLE Orders ADD OrderDesc Varchar(500) NULL
GO
CREATE CLUSTERED INDEX IXC_Orders_OrderID ON Orders (OrderID)
GO

Using the code in Listing 8-4 we can measure query performance at different fragmentation levels:

Listing 8-4

-- Gathers query performance based on varying fragmentation
-- Clears cached data buffer and execution plans
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
GO
-- Displays IO and TIME related statistics
SET STATISTICS IO ON
SET STATISTICS TIME ON
GO
-- A query on the Orders table
SELECT TOP 10 WITH TIES CustomerID,
       MAX(DateDiff(hh, OrderDate, ShipDate)) MaxShippingDelay
FROM Orders (INDEX=IXC_Orders_OrderID)
GROUP BY CustomerID
ORDER BY MaxShippingDelay DESC, CustomerID
GO
-- Displays fragmentation information on the Orders table
DBCC SHOWCONTIG('Orders')
GO

The following text is an excerpt of the output generated by executing the Listing 8-4 script:

DBCC SHOWCONTIG scanning 'Orders' table...
Table: 'Orders' (2025058250); index ID: 1, database ID: 7
TABLE level scan performed.
- Pages Scanned................................: 418
- Extents Scanned..............................: 54
- Extent Switches..............................: 53
- Avg. Pages per Extent........................: 7.7
- Scan Density [Best Count:Actual Count].......: 98.15% [53:54]
- Logical Scan Fragmentation ..................: 0.24%
- Extent Scan Fragmentation ...................: 1.85%
- Avg. Bytes Free per Page.....................: 25.2
- Avg. Page Density (full).....................: 99.69%
DBCC execution completed. If DBCC printed error messages,
contact your system administrator.
...
Table 'Orders'. Scan count 1, logical reads 419,
physical reads 0, read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 297 ms, elapsed time = 302 ms.
...
(Result abbreviated)

The parts of the DBCC SHOWCONTIG output that we are most interested in are the scan density and the average page density. The scan density measures the contiguousness of the extent links; a heavily fragmented table will have a lower scan density than a less fragmented one, and fewer extent switches are required to retrieve data from less fragmented tables.

An average page density closer to 100 percent requires less I/O to retrieve data from the table. This is due to the way SQL Server retrieves data from a page: regardless of how full the page is, the entire 8-KB page is read. In our case, both density values are optimal. Now, let's see what happens when we modify the OrderDesc column for every tenth row in the Orders table. We can accomplish this by executing the following statements:

-- The REPLICATE function isn't necessary here, but it can be
-- useful when you want to generate test data.
UPDATE Orders
SET OrderDesc = 'Some description ' + REPLICATE('X', OrderID % 10)
WHERE OrderID % 10 = 0

We can execute Listing 8-4 again to see the query performance and the Orders table's fragmentation level. The output from this execution is:

Table: 'Orders' (2025058250); index ID: 1, database ID: 7
TABLE level scan performed.
- Pages Scanned................................: 834
- Extents Scanned..............................: 110
- Extent Switches..............................: 833
- Avg. Pages per Extent........................: 7.6
- Scan Density [Best Count:Actual Count].......: 12.59% [105:834]
- Logical Scan Fragmentation ..................: 50.00%
- Extent Scan Fragmentation ...................: 36.36%
- Avg. Bytes Free per Page.....................: 3768.9
- Avg. Page Density (full).....................: 53.44%
DBCC execution completed. If DBCC printed error messages,
contact your system administrator.
...
Table 'Orders'. Scan count 1, logical reads 836,
physical reads 0, read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 307 ms, elapsed time = 307 ms.
...
(Result abbreviated)

After forcing the fragmentation with the sample code, the query's duration did not change significantly; however, the logical reads almost doubled, from 419 to 836. This is mostly due to the reduced average page density, which dropped from 99 percent to 53 percent. The duration in this example is based on a single user executing the query. As more user load is put on the server, the impact of the additional reads will manifest itself as a further increase in duration.

The fragmentation could have been prevented if the FILLFACTOR of the Orders table's clustered index had been set at 90 percent. You can modify the FILLFACTOR in a few ways: you can drop and re-create the clustered index while specifying a FILLFACTOR of 90, or you can execute DBCC DBREINDEX to change the FILLFACTOR. For more information, see DBCC DBREINDEX and CREATE INDEX in SQL Server Books Online.
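For example, either of the following statements, based on the clustered index created earlier in this section, would rebuild the index with a FILLFACTOR of 90:

-- Rebuild the existing clustered index with a 90 percent FILLFACTOR.
DBCC DBREINDEX ('Orders', 'IXC_Orders_OrderID', 90)

-- Equivalent approach: re-create the index in place.
CREATE CLUSTERED INDEX IXC_Orders_OrderID ON Orders (OrderID)
    WITH DROP_EXISTING, FILLFACTOR = 90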

Indexed View

Indexed views are a new feature in SQL Server 2000 Enterprise Edition. In previous versions of SQL Server, the server had to retrieve data from the base tables each time a query used a view. With an indexed view, however, the results are materialized before a query uses the view. Depending on a view's overhead of retrieving data from the base tables, you may find that the materialized result set can make a dramatic difference in a query's performance. In this section, we will continue with the example used in Covering Index Example 1 to demonstrate the use of an indexed view.

In our experience, indexed views produce significant improvements for queries that use aggregate functions such as SUM or COUNT. Incidentally, the ProductsMostPopular stored procedure in the IBuySpy application uses the SUM function. The following query is part of the stored procedure:

SELECT TOP 5
    OrderDetails.ProductID,
    SUM(OrderDetails.Quantity) as TotalNum,
    Products.ModelName
FROM OrderDetails
INNER JOIN Products ON OrderDetails.ProductID = Products.ProductID
GROUP BY OrderDetails.ProductID, Products.ModelName
ORDER BY TotalNum DESC

Based on the query used in ProductsMostPopular, a view and a unique clustered index can be created with the T-SQL statements in Listing 8-5:

Listing 8-5

-- User option requirements for creating an indexed view
SET QUOTED_IDENTIFIER ON
SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET NUMERIC_ROUNDABORT OFF
GO
IF EXISTS (SELECT * FROM dbo.sysobjects
           WHERE id = object_id(N'[dbo].[ProductOrderCount]')
           AND OBJECTPROPERTY(id, N'IsView') = 1)
    DROP VIEW [dbo].[ProductOrderCount]
GO
CREATE VIEW dbo.ProductOrderCount
WITH SCHEMABINDING
AS
SELECT od.productid,
       p.modelname,
       SUM(od.Quantity) OrderSum,
       COUNT_BIG(*) RecordCount -- COUNT_BIG aggregate function is required
FROM dbo.orderdetails od
INNER JOIN dbo.products p ON od.productid = p.productid
GROUP BY od.productid, p.modelname
GO
-- The first index on the indexed view must be clustered and unique
CREATE UNIQUE CLUSTERED INDEX IXUC_ProductOrderCount
    ON dbo.ProductOrderCount (ProductID)

When you are creating an indexed view, make sure that all of the required user options are set correctly. The required user options are shown at the top of Listing 8-5. In addition, the view must be created with the SCHEMABINDING attribute. Lastly, the first index on the view must be a unique clustered index. Once the clustered index is created, additional nonclustered indexes can be created as necessary. More information on indexed views can be obtained from SQL Server Books Online, especially the section outlining the requirements that have to be met. Now that the indexed view is created, we can execute the ProductsMostPopular stored procedure again to observe the effect on performance. Table 8-8 compares the query results with and without the indexed view.

Table 8-8. Query Comparison of the Indexed View

Original Query
  Operators used and cost percentage: Table Scan (46 percent) and Hash Match/Aggregate (54 percent)
  Logical reads on OrderDetails: 3800
  Query duration in milliseconds: 1826

Covering index on ProductID and Quantity
  Operators used and cost percentage: Index Scan (51 percent) and Stream Aggregate (49 percent)
  Logical reads on OrderDetails: 2184
  Query duration in milliseconds: 1021

Indexed View
  Operators used and cost percentage: Clustered Index Scan (77 percent) and Sort (23 percent)
  Logical reads: 2 (on the ProductOrderCount view)
  Query duration in milliseconds: 20

As indicated in Table 8-8, using the ProductOrderCount indexed view significantly reduces the query duration and logical reads. In fact, the logical reads no longer occur on the OrderDetails table but now occur on the ProductOrderCount view. The following output shows the STATISTICS IO results generated by running ProductsMostPopular:

...
(5 row(s) affected)

Table 'ProductOrderCount'. Scan count 1, logical reads 2,
physical reads 0, read-ahead reads 0.
...
(Result abbreviated)

If ProductsMostPopular were the only stored procedure executed by the application, you could conclude that the indexed view is the best solution. Real-world applications typically execute more than just one stored procedure. Indexed views can drastically improve read performance; however, they can also degrade the performance of queries that modify data. Therefore, before deciding to use an indexed view, further stress testing is required to verify that the indexed view does in fact improve overall application throughput.
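One related point from our experience, not shown in the output above: when you want to confirm during testing that a query is actually resolved from the view rather than from the base tables (or when you are running an edition other than Enterprise), you can reference the view directly with the NOEXPAND table hint, as sketched here against the view created in Listing 8-5:

-- Reference the indexed view directly so the plan uses the view's
-- clustered index instead of expanding the view to its base tables.
SELECT TOP 5 ProductID, ModelName, OrderSum
FROM dbo.ProductOrderCount WITH (NOEXPAND)
ORDER BY OrderSum DESC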

SQL Server includes an Index Tuning Wizard, which can sometimes help you come up with efficient indexes to implement. An interesting exercise would be to allow the Index Tuning Wizard to analyze the execution of ProductsMostPopular. You can do this by running the Index Tuning Wizard from Query Analyzer. (Press CTRL+I.) In the case of ProductsMostPopular, the Index Tuning Wizard recommended an almost identical indexed view. In many cases, the Index Tuning Wizard can be a time saver and give you useful insight on indexes. If you have never used the Index Tuning Wizard before, we highly recommend that you read the related articles in SQL Server Books Online for more information. Microsoft TechNet also has an excellent article on the Index Tuning Wizard.

By no means do the examples in this section cover all index tuning situations; however, you should now have a good idea of the importance of proper index tuning. A simple indexing mistake or a lack of proper indexing on a high-transaction-volume server can bring performance to its knees. Evaluating each index in a database with many objects can be a daunting task, but it is worth the pain. In most cases, a properly indexed database will give you fewer headaches in the long run. We highly recommend that you investigate indexes in your efforts to identify and solve SQL bottlenecks before attempting to modify the underlying queries or deciding to upgrade hardware.


