Lesson 2: Optimizing Index Strategies

Estimated lesson time: 40 minutes

SQL Server 2005 supports two basic types of indexes: clustered and non-clustered. Both indexes are implemented as a balanced tree, where the so-called leaf level is the bottom level of the index structure. The difference between these index types is that the clustered index is the actual table, that is, the bottom level of a clustered index contains the actual rows, including all columns, of the table. A non-clustered index, on the other hand, only contains the columns included in its key, plus a pointer pointing to the actual data row. If a table does not have a clustered index defined on it, it is called a heap, or unsorted table. You could also say that a table can have one of two forms: It is either a heap (unsorted) or a clustered index (sorted).

Improving Performance with Covered Indexes

The notion of a covered index is that SQL Server doesn't need to use lookups between the non-clustered index and the table to return the query results. Because a clustered index is the actual table, clustered indexes always cover queries.

To consider the index "covered," it must contain all columns referenced in the query (in any clause, SELECT, JOIN, WHERE, GROUP BY, HAVING, etc.). Consider the following SQL table and query:

Test TableA
Col1	Col2	Col3

 SELECT Col1 FROM Test.TableA WHERE Col2 = 1;

For an index to cover this query, it must contain at least the columns Col1 and Col2. You can do this in several ways. All of the following indexes would cover this query:

 CREATE NONCLUSTERED INDEX TestIndex ON Test.TableA (Col1, Col2); CREATE NONCLUSTERED INDEX TestIndex ON Test.TableA (Col2, Col1); CREATE NONCLUSTERED INDEX TestIndex ON Test.TableA (Col1) INCLUDE (Col2); CREATE NONCLUSTERED INDEX TestIndex ON Test.TableA (Col2) INCLUDE (Col1); CREATE NONCLUSTERED INDEX TestIndex ON Test.TableA (Col1, Col2, Col3); CREATE NONCLUSTERED INDEX TestIndex ON Test.TableA (Col3) INCLUDE (Col1, Col2);

As you can see, the columns only need to be found in the index; their position and whether they are found in the index key or are included columns (discussed in this lesson) does not matter. Of course, both the execution plan and the performance could differ greatly between these indexes; however, they all cover the query.

The performance benefit gained by using a covered index is typically great for queries that return a large number of rows (a non-selective query) and small for queries that return few rows (a selective query). Remember that a small number of rows could be 10 for a table with a couple of hundred rows and 1,000 for a table with millions of rows. Following is a performance comparison of four queries. The table that the queries are executed against has the following schema and is populated with 1,000,000 rows:

 CREATE TABLE Test.CoveredIndexTest (     Col1 INT NOT NULL     Col2 NVARCHAR(2047) NOT NULL ); INSERT Test.CoveredIndexTest (Col1, Col2)     VALUES (0, 'A lonely row...'); INSERT Test.CoveredIndexTest (Col1, Col2)     SELECT TOP(999999) message_id, text FROM sys.messages AS sm     CROSS JOIN (         SELECT TOP(15) 1 AS Col FROM sys.messages     ) AS x

On the test machine, the size of this table is 24,238 pages (roughly 193 MB); also note that the table is a heap, that is, it does not have a clustered index defined on it. The queries and indexes used in this test have the following definitions; the performance metrics (measured in page reads) for the queries are shown in Table 7-4.

Table 7-4: Query Performance Matrix
Open table as spreadsheet
	Query #1 (1 row)	Query #2 (selective)	Query #3 (somewhat selective)	Query #4 (non-selective)
No index	24,237 pages	24,237 pages	24,237 pages	24,237 pages
Non-covered index	4 pages	1,062 pages	5,029 pages	50,125 pages
Covered index	3 pages	28 pages	139 pages	1,286 pages

 -- Query #1 -- Returns 1 row. SELECT Col1, Col2 FROM Test.CoveredIndexTest     WHERE Col1 = 0; -- Query #2  -- Returns roughly 0.1% of the rows found in the table. -- (1,056 rows) SELECT Col1, Col2 FROM Test.CoveredIndexTest     WHERE Col1 BETWEEN 1205 AND 1225; -- Query #3 -- Returns roughly 0.5% of the rows found in the table. -- (5,016 rows) SELECT Col1, Col2 FROM Test.CoveredIndexTest     WHERE Col1 BETWEEN 1205 AND 1426;  -- Query #4 (non-selective) -- Returns roughly 5% of the rows found in the table. -- (50,028 rows) SELECT Col1, Col2 FROM Test.CoveredIndexTest     WHERE Col1 BETWEEN 1205 AND 2298; -- Non-covered index CREATE NONCLUSTERED INDEX NonCovered ON Test.CoveredIndexTest (Col1); -- Covered index CREATE NONCLUSTERED INDEX Covered ON Test.CoveredIndexTest (Col1) INCLUDE (Col2);

 -- Drop the table used in this example. DROP TABLE Test.CoveredIndexTest;

The performance metric that is shown in the table is the number of data pages that SQL Server touched during the query execution (SET STATISTICS IO ON, logical reads).

Note that the so-called "selective" query (Query #2) returns 0.01 percent of the rows in the table. For a table of this size, that still amounts to 1,000 rows. If you are speaking to someone about the number of rows that will be affected by a query, and he or she says that number is "only a couple of percent of the table," this usually translates to a lot of rows.

Some conclusions that we can draw from the test are given here. (This is only with regard to read performance; write performance is discussed later in this lesson.)

A covered index always performs better than a non-covered index.
For queries that return a very limited number of rows, a non-covered index also performs very well.
For the somewhat-selective query (Query #3), the covered index reads at least 36 times fewer pages than the non-covered index. In this case, a query was considered selective by the query optimizer when it matched less than roughly 0.77 percent of the table.

Using Included Columns and Index Depth

In earlier versions of SQL Server (before 2005), creating covered non-clustered indexes could often be impossible because an index could contain no more than 16 columns or be more than 900 bytes wide. The new "included column" feature makes it possible to add columns to an index without adding them to the index's key. Included columns cannot be used for tasks such as filtering or sorting; their sole benefit is reducing page reads through covering queries, thereby avoiding table lookups.

An index can have a maximum of 1,023 included columns (a table can have a maximum of 1,024 columns), making it possible to create a non-clustered index that covers the entire table, which is almost like having a second clustered index! In addition, columns that use one of the large data types (VARCHAR(max), NVARCHAR(max), VARBINARY(max), XML, TEXT, NTEXT, and IMAGE) are allowed to be included in an index as an included column.

Only columns that are used for filtering, grouping, or sorting should be part of the index key; all other columns included in the index should be included columns. Besides allowing for more columns in the index, included columns have other benefits. In the following SQL script, a table with 1,000,000 rows is created with two indexes. One index has all columns in the index key, while the other index only has one column in the key (the one that would be filtered on), and the rest of the columns are included. The width of each row in the index will be a little over 300 bytes; this might sound like a very wide index row, but it not uncommon. This also makes up for the fact that the test table only contains one million rows; for larger tables, the width of the index does not need to be this big to make a performance difference. The following script defines (and populates) objects and indexes used in the following examples:

 CREATE TABLE Test.IncludedColumnsTest(     PKCol UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID()         PRIMARY KEY CLUSTERED     ,Col1 INT IDENTITY NOT NULL     ,Col2 CHAR(20) NOT NULL     ,Col3 CHAR(20) NOT NULL     ,Col4 CHAR(20) NOT NULL     ,Col5 CHAR(20) NOT NULL     ,Col6 CHAR(20) NOT NULL     ,Col7 CHAR(20) NOT NULL     ,Col8 CHAR(20) NOT NULL     ,Col9 CHAR(20) NOT NULL     ,Col10 CHAR(20) NOT NULL     ,Col11 CHAR(20) NOT NULL     ,Col12 CHAR(20) NOT NULL     ,Col13 CHAR(20) NOT NULL     ,Col14 CHAR(20) NOT NULL     ,Col15 CHAR(20) NOT NULL     ,Col16 CHAR(20) NOT NULL ); INSERT Test.IncludedColumnsTest (Col2, Col3, Col4, Col5, Col6, Col7, Col8,        Col9, Col10, Col11, Col12, Col13, Col14, Col15, Col16) SELECT TOP(1000000)     CAST(message_id AS CHAR(20)) AS Col2     ,CAST(message_id AS CHAR(20)) AS Col3     ,CAST(message_id AS CHAR(20)) AS Col4     ,CAST(message_id AS CHAR(20)) AS Col5     ,CAST(message_id AS CHAR(20)) AS Col6     ,CAST(message_id AS CHAR(20)) AS Col7     ,CAST(message_id AS CHAR(20)) AS Col8     ,CAST(message_id AS CHAR(20)) AS Col9     ,CAST(message_id AS CHAR(20)) AS Col10     ,CAST(message_id AS CHAR(20)) AS Col11     ,CAST(message_id AS CHAR(20)) AS Col12     ,CAST(message_id AS CHAR(20)) AS Col13     ,CAST(message_id AS CHAR(20)) AS Col14     ,CAST(message_id AS CHAR(20)) AS Col15     ,CAST(message_id AS CHAR(20)) AS Col16 FROM sys.messages AS sm CROSS JOIN (     SELECT TOP(15) 1 AS Col FROM sys.messages ) AS x; CREATE NONCLUSTERED INDEX IncludedColumns ON Test.IncludedColumnsTest (Col1) INCLUDE (Col2, Col3, Col4, Col5, Col6, Col7, Col8, Col9, Col10, Col11, Col12, Col13, Col14, Col15, Col16); CREATE NONCLUSTERED INDEX NoIncludedColumns ON Test.IncludedColumnsTest (Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8, Col9, Col10, Col11, Col12, Col13, Col14, Col15, Col16);

Table 7-5 shows some of the interesting differences between the index with and without included columns.

Table 7-5: Index Size Matrix
Open table as spreadsheet
	IncludedColumns	NoIncludedColumns
Total size	40,147 pages	41,743 pages
Size of the non-leaf level of the index	146 pages	1,743 pages
Index depth	3 levels (root page + 1 intermediate level + leaf level)	5 levels (root page + 3 intermediate levels + leaf level)
Average size of rows in the non-leaf levels of the index	27 bytes	327 bytes
Average size of rows in the leaf level of the index	321 bytes	321 bytes

You can retrieve this information from the dynamic management function sys.dm_db_index_physical_stats by executing the following query:

 SELECT * FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('Test.IncludedColumnsTest'), NULL, NULL, 'DETAILED');

The total size of the index is not reduced by more than about four percent because the leaf levels of both indexes contain the same data. However, the non-leaf levels of the index with included columns only needs to contain the one column that is in the index's key (plus pointers to the next level), while, for the other index, all columns are part of the index key, making each row in the non-leaf level roughly the same size as that of the leaf level. Table 7-6 shows the layout of each level of a NoIncludedColumns index.

Table 7-6: Levels of the NoIncludedColumns Index
Open table as spreadsheet
Root	1 page with 4 rows pointing to the next level
1^st intermediate level	4 pages with a total of 70 rows pointing to the next level
2^nd intermediate level	70 pages with a total of 1,668 rows pointing to the next level
3^rd intermediate level	1,668 pages with a total of 40,000 rows pointing to the next level
Leaf level	40,000 pages containing all of the 1,000,000 rows of the index

Table 7-7 shows the layout of each level of an IncludedColumns index.

Table 7-7: Levels of the IncludedColumns Index
Open table as spreadsheet
Root	1 page with 145 rows pointing to the next level
Intermediate level	145 pages with a total of 40,001 rows pointing to the next level
Leaf level	40,001 pages containing all of the 1,000,000 rows of the index

Due to the fact that the rows in the non-leaf level pages of the NoIncludedColumns index are substantially larger than those of the IncludedColumns index, more pages and therefore more levels are needed to create the balanced tree for the index. Because the NoIncludedColumns index is two levels (40 percent) deeper than the IncludedColumns index, each search through the NoIncludedColumns index will need two more page reads to get to the bottom of the index. This might not sound like much, but if the index is used for repeated searches, such as for joins or if it is heavily queried, this will cause performance degradation.

In Table 7-8, three example queries are shown that join a table called Test.OtherTable with the Test.IncludedColumnsTest using different indexes. Note that the index hints (WITH(INDEX)) are only used to force SQL Server to use the specified index instead of the optimal index (which would be the IncludedColumns index). A new index named NotCovered is added to show the performance of a non-clustered index that does not cover the query. The following script defines additional objects and indexes required by the example:

 -- Create the NotCovered index. CREATE NONCLUSTERED INDEX NotCovered ON Test.IncludedColumnsTest (Col1); -- Create and populate the Test.OtherTable table. CREATE TABLE Test.OtherTable (     PKCol INT IDENTITY NOT NULL PRIMARY KEY     ,Col1 INT NOT NULL ); INSERT Test.OtherTable (Col1) SELECT Col1 FROM Test.IncludedColumnsTest;

Table 7-8: Levels of the *IncludedColumns* Index
Open table as spreadsheet
Query #	Definition	Page reads
Query #1 Index: IncludedColumns The execution plan is shown in Figure 7-5.	SELECT o.PKCol, i.Col2 FROMTest.OtherTableAS o INNER JOIN Test.IncludedColumnsTestAS i WITH(INDEX(IncludedColumns)) ON o.Col1 = i.Col1 WHERE o.PKCol BETWEEN 1 AND 10000;	32,726 pages
Query #2 Index: NoIncludedColumns The execution plan is shown in Figure 7-5.	SELECT o.PKCol, i.Col2 FROM Test.OtherTable AS o INNER JOIN Test.IncludedColumnsTest AS i WITH(INDEX(NoIncludedColumns)) ON o.Col1 = i.Col1 WHERE o.PKCol BETWEEN 1 AND 10000;	53,994 pages
Query #3 Index: NotCovered The execution plan is shown in Figure 7-5.	SELECT o.PKCol, i.Col2 FROM Test.OtherTable AS o INNER JOIN Test.IncludedColumnsTest AS i WITH(INDEX(NotCovered)) ON o.Col1 = i.Col1 WHERE o.PKCol BETWEEN 1 AND 10000;	62,617 pages

Table 7-8: Levels of the IncludedColumns Index

Open table as spreadsheet

Query #

Definition

Page reads

Query #1 Index: IncludedColumns The execution plan is shown in Figure 7-5.

 SELECT o.PKCol, i.Col2 FROMTest.OtherTableAS o INNER JOIN Test.IncludedColumnsTestAS i      WITH(INDEX(IncludedColumns))      ON o.Col1 = i.Col1 WHERE o.PKCol BETWEEN 1      AND 10000;

32,726 pages

Query #2 Index: NoIncludedColumns The execution plan is shown in Figure 7-5.

 SELECT o.PKCol, i.Col2 FROM Test.OtherTable AS o INNER JOIN      Test.IncludedColumnsTest AS i      WITH(INDEX(NoIncludedColumns))      ON o.Col1 = i.Col1 WHERE o.PKCol BETWEEN 1      AND 10000;

53,994 pages

Query #3 Index: NotCovered The execution plan is shown in Figure 7-5.

 SELECT o.PKCol, i.Col2 FROM Test.OtherTable AS o INNER JOIN      Test.IncludedColumnsTest AS i      WITH(INDEX(NotCovered))      ON o.Col1 = i.Col1 WHERE o.PKCol BETWEEN 1      AND 10000;

62,617 pages

image from book
Figure 7-5: Execution plans for Queries 1–3 in SQL Server Management Studio

Query #1, with the IncludedColumns index, is the best-performing query with 32,726 page reads. Query #2, with the NoIncludedColumns index, used 53,994 page reads. As you can see, the difference in the number of page reads between the two indexes is roughly the same as the difference in index levels (40 percent). Query #3, with the NotCovered index, is the worst-performing query with 62,617 page reads because of the extra reads necessary to fetch the data that was not found in the index from the table. (Note the extra Nested Loops Join in the execution plan of Query #3.)

Using Clustered Indexes

Because a clustered index is the actual table reading from the clustered index, it never results in lookups. Therefore, the clustered index should generally be defined on columns that are often queried and typically return a lot of data. This is because the problem of lookups and fetching a large number of rows doesn't exist. Two good candidates for the clustered index are either the most frequently queried foreign key column of the table (a search on a foreign key typically returns many rows) or the most frequently searched date column. (Date searches generally return a large number of rows as well.)

Another important consideration when selecting the column or columns on which to create the clustered index is that the key size of the clustered index should be as small as possible. If a clustered index exists on a table, all non-clustered indexes on that table will use the key of the clustered index as the row pointer from the non-clustered index to the table. If a clustered index does not exist, the Row Identifier is used, which takes up eight bytes of storage in each row of each non-clustered index. This can significantly increase size for larger tables. Consider the following:

You have a table with 40,000,000 rows.
The table has five non-clustered indexes.
The clustered index key is 60 bytes wide. (This is not uncommon when you have clustered indexes that span a few columns.)

The total size of all row pointers from the non-clustered indexes on this table (only the pointers) would be:

40,000,000 * 5 * 60 = 12,000,000,000 bytes (close to 12 gigabytes)

If the clustered index was changed to be on only one column with a smaller data type, such as an integer for a foreign key, each row pointer would be only four bytes. Because four bytes is added to all duplicates of the clustered index key to keep it unique internally, the calculation uses eight bytes as the new clustered index key size.

40,000,000 * 5 * 8 = 1,600,000,000 bytes (close to 1.5 gigabytes)

The difference in storage needed is more than 10 gigabytes.

Exam Tip

For the exam, it's important to know the difference in non-clustered indexes depending on whether a clustered index exists on the table.

Read Performance vs. Write Performance

The addition of indexes typically only helps boost read performance. Write performance is typically degraded because the indexes must be kept up-to-date with the data in the table. If a table has five non-clustered indexes defined on it, an insert into that table is really six inserts: one for the table and one for each index. The same goes for deletes. With update statements, only indexes that contain the columns that are updated by the update statement must be touched.

When index keys are updated, the row in the index must be moved to the appropriate position in the index (which is not applicable for included columns). The result is that the update is split into a delete followed by an insert. Depending on the internal fragmentation of the index pages, this might also cause page splits.

Consider the following simple performance test on the Test.IndexInsertTest table containing 1,000,000 rows. 10,000 rows will be inserted in each test, and the table will be re-created between tests. First, the insert is performed against the table without any non-clustered indexes, then it is performed with one non-clustered index, and finally, with five non-clustered indexes.

 CREATE TABLE Test.IndexInsertTest (     PKCol UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID()         PRIMARY KEY CLUSTERED     ,Col1 INT NOT NULL ); INSERT Test.IndexInsertTest (Col1) SELECT TOP(1000000)     ROW_NUMBER() OVER (ORDER BY message_id) AS Col1 FROM sys.messages AS sm CROSS JOIN ( SELECT TOP(15) 1 AS Col FROM sys.messages ) AS x; -- Rebuild the table's clustered index. ALTER INDEX ALL ON Test.OtherTable REBUILD; -- Created table containing the rows used to perform the inserts. CREATE TABLE Test.OtherTable (     PKCol INT IDENTITY(100000,4) NOT NULL PRIMARY KEY     ,OtherCol INT NOT NULL ) INSERT Test.OtherTable (OtherCol) SELECT Col1 FROM Test.IncludedColumnsTest     WHERE Col1 BETWEEN 1 AND 10000;

Following is the first test, without any non-clustered indexes defined on the table. The execution plan for this insert statement is shown in Figure 7-6.

 INSERT Test.IndexInsertTest (Col1)     SELECT PKCol FROM Test.OtherTable;

Figure 7-6: An execution plan from SQL Server Management Studio of the insert statement used in this test

The estimated query cost for the insert statement in this test was 0.74, and SQL Server touched 32,190 pages in the Test.IndexInsertTest table while performing the inserts.

Following is the second test with one non-clustered index defined on the table. The execution plan for this insert statement is shown in Figure 7-7.

 -- 1. Drop and re-create the Test.IndexInsertTest table. -- 2. Add one non-clustered index. CREATE NONCLUSTERED INDEX NCIdx1 ON Test.IndexInsertTest (Col1); -- 3. Execute the insert statement. INSERT Test.IndexInsertTest (Col1)     SELECT PKCol FROM Test.OtherTable;

Figure 7-7: An execution plan from SQL Server Management Studio of the insert statement used in this test

The estimated query cost for the insert statement in this test was 1.58, and SQL Server touched 65,125 pages in the Test.IndexInsertTest table while performing the inserts. This is roughly both twice the cost and number of pages compared with Test #1.

Following is the third test, with five non-clustered indexes defined on the table. The execution plan for this insert statement is shown in Figure 7-8.

 -- 1. Drop and recreate the Test.IndexInsertTest table. -- 2. Add five non-clustered indexes. CREATE NONCLUSTERED INDEX NCIdx1 ON Test.IndexInsertTest (Col1); CREATE NONCLUSTERED INDEX NCIdx2 ON Test.IndexInsertTest (Col1); CREATE NONCLUSTERED INDEX NCIdx3 ON Test.IndexInsertTest (Col1); CREATE NONCLUSTERED INDEX NCIdx4 ON Test.IndexInsertTest (Col1); CREATE NONCLUSTERED INDEX NCIdx5 ON Test.IndexInsertTest (Col1); -- 3. Execute the insert statement. INSERT Test.IndexInsertTest (Col1)     SELECT PKCol FROM Test.OtherTable;

image from book
Figure 7-8: An execution plan from SQL Server Management Studio of the insert statement used in this test

This time, the estimated query cost for the insert statement was 4.47, and SQL Server touched a staggering 196,853 pages in the Test.IndexInsertTest table while performing the inserts. As you can see, the cost for performing the inserts is roughly doubled with each new non-clustered index. However, in this case, each non-clustered index is roughly the same size (same width) as the table itself. For typical tables, the non-clustered indexes are narrower than the table and will not hurt performance (percentage-wise) to the same degree as in this test.

Because the ratio between read and write operations varies greatly between systems, and even tables, it is a good idea to create indexes to optimize read performance and then test the effect that the created indexes has on write performance. As long as the write performance is acceptable (and you have enough disk space to manage the created indexes), you can keep the created indexes. It is also advisable to run such a test every so often to verify that the read versus write ratio for the table hasn't changed.

You should also note that both update and delete statements will benefit from certain indexes to locate the rows in the table that they need to update or delete.

Using Computed Columns

A computed column is generally derived from other columns in the same table and can reference both system-and user-defined functions in its definition. To be able to create an index on a computed column, it must adhere to a few requirements, which you can find in SQL Server 2005 Books Online under the topic "Creating Indexes on Computed Columns" at http://msdn2.microsoft.com/en-us/library/ms189292.aspx.

By defining a computed column and indexing it, it is possible to make queries that would generally require an index or table scan to use a seek operation instead. Consider the following query for sales orders in the AdventureWorks database. The query's execution plan is shown in Figure 7-9.

 USE AdventureWorks; -- First create an index on the OrderDate column -- to support this query. CREATE NONCLUSTERED INDEX OrderDateIndex ON     Sales.SalesOrderHeader (OrderDate); GO SET STATISTICS IO ON; SELECT COUNT(*) FROM Sales.SalesOrderHeader WHERE MONTH(OrderDate) = 5;

image from book
Figure 7-9: An execution plan of the previous select statement from SQL Server Management Studio

Because the query did not use a valid search argument by modifying the column in the WHERE clause by applying a function to it, the OrderDateIndex index can only be used for scanning and not for seeking. To be able to produce an index seek, SQL Server must maintain an index of the result of the function call, in this case, MONTH(OrderDate). You can do this by adding a computed column to the table and indexing that column as follows. The query's execution plan is shown in Figure 7-10.

 -- Add the column. ALTER TABLE Sales.SalesOrderHeader     ADD OrderMonth AS MONTH(OrderDate); -- Create an index on the computed column. CREATE NONCLUSTERED INDEX OrderMonthIndex     ON Sales.SalesOrderHeader (OrderMonth); GO SET STATISTICS IO ON; -- Run the query and reference the new column. SELECT COUNT(*) FROM Sales.SalesOrderHeader WHERE OrderMonth = 5;

image from book
Figure 7-10: An execution plan of the select using the computed column in the WHERE clause

This time, the query performs a seek operation on the index of the computed column, resulting in only eight page reads. Depending on the complexity of your query and computed column definition, the optimizer can automatically use the index of the computed column without the computed column being referenced in the query, as follows. This query will also get the execution plan that was shown in Figure 7-10.

 SET STATISTICS IO ON; -- Run the query without referencing the computed column. SELECT COUNT(*) FROM Sales.SalesOrderHeader WHERE MONTH(OrderDate) = 5;

As you can see, SQL Server used the index of the computed column without having a reference to it in the query. This is a great feature because it makes it possible to add computed columns and index them without having to change the queries in applications or stored procedures to use the new index.

Besides using computed columns with indexes with function calls, you can also use indexed computed columns to provide indexes in different collations. Consider that we have the table Test.Person with the column Name using the Latin1_General_CI_AI collation. Now we want to find all rows starting with the character "Ö." In Latin1_General, the dots over the O are just considered accents, but in other languages, such as German and Swedish, Ö is a different character than O. Consider that the table is typically queried by English customers that will expect to get both "O" and "Ö" back from a search such as LIKE 'Ö%' and occasionally by Swedish customers who will expect to get only "Ö" back from the search. Because the table is typically queried by English customers, it makes sense to keep the Latin1_General_CI_AI collation, and when Swedish customers query the table to use the COLLATE keyword to explicitly use the Finnish_Swedish_CI_AI collation. Review the following script and queries. The execution plans for the two queries in the following script are shown in Figures 7-11 and 7-12.

 -- Create and populate the table CREATE TABLE Test.ProductNames (     Name NVARCHAR(50) COLLATE Latin1_General_CI_AI ); INSERT Test.ProductNames (Name) VALUES ('Öl'); INSERT Test.ProductNames (Name) VALUES ('Olja'); INSERT Test.ProductNames (Name) VALUES ('Beer'); INSERT Test.ProductNames (Name) VALUES ('Oil'); CREATE CLUSTERED INDEX NameIndex ON Test.ProductNames     (Name); GO -- Query #1 -- Query for all product names that begin with the letter Ö -- using the default collation. SELECT Name FROM Test.ProductNames     WHERE Name LIKE 'Ö%'; -- Result: Name ------------- Oil Öl Olja -- Query #2 -- Query for all product names that begin with the letter Ö -- using the Finnish_Swedish_CI_AI collation. SELECT Name FROM Test.ProductNames     WHERE Name LIKE 'Ö%' COLLATE Finnish_Swedish_CI_AI; -- Result: Name ------------- Öl

image from book
Figure 7-11: An execution plan of Query #1 in SQL Server Management Studio

image from book
Figure 7-12: An execution plan of Query #2 in SQL Server Management Studio

Comparing the execution plans of Query 1 (Figure 7-11) and Query 2 (Figure 7-12), we can see that, because the comparison in Query 2 needs to use a collation other than that of the column (and therefore, the index), a clustered index scan is used instead of an index seek, as in Query 1. By adding an indexed computed column to this table and specifying the Finnish_Swedish_CI_AS collation for this column, SQL Server can automatically use that index instead (which is shown below). Note that this is only a viable solution if you are using a relatively low number of collations because these indexes will need to be both stored and maintained like all other indexes. The execution plan for the query in the following script is shown in Figure 7-13.

 -- Add a computed column with another collation. ALTER TABLE Test.ProductNames      ADD Name_Finnish_Swedish_CI_AI         AS Name COLLATE Finnish_Swedish_CI_AI; -- Create an index on the computed column. CREATE NONCLUSTERED INDEX NameIndex2 ON Test.ProductNames     (Name_Finnish_Swedish_CI_AI); GO -- Query for all product names that begin with the letter Ö -- using the Finnish_Swedish_CI_AI collation without specifying -- the computed column. SELECT Name FROM Test.ProductNames     WHERE Name LIKE 'Ö%' COLLATE Finnish_Swedish_CI_AI; -- Result: Name ------------- Öl

image from book
Figure 7-13: An execution plan of the query in SQL Server Management Studio

Using Indexed Views

A normal database view is just a named select statement that can be used from other select statements. These views have no particular impact on performance. Beginning with SQL Server 2000, you could create one or more indexes on a view because the view satisfies certain requirements. These requirements are quite extensive and can be found in Books Online under the heading "Creating Indexed Views" at http://msdn2.microsoft.com/en-us/library/ms191432.aspx. By creating an index on a view, the view is materialized. This means that, in the logical sense, it is still a view, but now that view actually stores the data found in the view. If the data is changed in the tables on which the view is based, the view is automatically updated to reflect those changes.

Creating indexed views can greatly improve read performance of queries. An important aspect of indexed views is that the query optimizer can automatically detect that an indexed view that satisfies a certain query exists and can use it, even if the indexed view is referenced in the query. This, however, is only true for SQL Server 2005 Enterprise Edition and Developer Edition.

The following example shows a query and its execution plan (shown in Figure 7-14) without an indexed view:

 USE AdventureWorks; SELECT     p.Name     ,sod.OrderQty     ,soh.OrderDate FROM Production.Product AS p INNER JOIN Sales.SalesOrderDetail AS sod     ON sod.ProductID = p.ProductID INNER JOIN Sales.SalesOrderHeader AS soh     ON soh.SalesOrderID = sod.SalesOrderID WHERE soh.TerritoryID = 1;

image from book
Figure 7-14: An execution plan of the query in SQL Server Management Studio

The cost of the previous query was 2.03. Next, an indexed view is created to optimize the query, and the same query is executed again. The execution plan for this query is shown in Figure 7-15. The first index created on a view must materialize the entire view, which means that it must be a clustered index. It is also a requirement that the first index be unique (which is why the column SalesOrderDetailID has been added to the example's indexed view).

 CREATE VIEW Sales.ProductsSoldVw WITH SCHEMABINDING AS SELECT     soh.TerritoryID     ,sod.SalesOrderDetailID     ,p.Name     ,sod.OrderQty     ,soh.OrderDate FROM Production.Product AS p INNER JOIN Sales.SalesOrderDetail AS sod     ON sod.ProductID = p.ProductID INNER JOIN Sales.SalesOrderHeader AS soh     ON soh.SalesOrderID = sod.SalesOrderID GO CREATE UNIQUE CLUSTERED INDEX ProductsSoldVwIdx     ON Sales.ProductsSoldVw (TerritoryID, SalesOrderDetailID); GO SELECT     p.Name     ,sod.OrderQty     ,soh.OrderDate FROM Production.Product AS p INNER JOIN Sales.SalesOrderDetail AS sod     ON sod.ProductID = p.ProductID INNER JOIN Sales.SalesOrderHeader AS soh     ON soh.SalesOrderID = sod.SalesOrderID WHERE soh.TerritoryID = 1;

image from book
Figure 7-15: An execution plan of the query in SQL Server Management Studio

This time, the indexed view is used by the query (even though it is not referenced) and the query cost drops to 0.14.

After you have created the unique clustered index on the view, it is possible to create additional non-clustered indexes on the same view. If you are using any edition of SQL Server other than Enterprise Edition or Developer Edition, your query must directly reference the view, and you must add the optimizer hint WITH(NOEXPAND) to the query.

 SELECT     Name     ,OrderQty     ,OrderDate FROM Sales.ProductsSoldVw WITH(NOEXPAND) WHERE TerritoryID = 1;

Analyzing Index Usage

Because indexes incur a cost (for storage space and for keeping them up-to-date when DML statements are executed), it is important to keep track of which indexes are actually being used. If an index is never used, it is likely that it can be dropped to both save storage space and reduce the cost of write operations. It is important to keep in mind that some indexes are created for a specific purpose; for example, to optimize the monthly salary reports. Therefore, you should be careful when dropping unused or seldom-used indexes. When you drop an index that is seldom used, you should document your actions so that the dropped index can be recreated if it is needed later.

In earlier versions of SQL Server, there was no simple query that could be executed to find out how frequently specific indexes were used. One possible method was to run a SQL Server Profiler trace and track the use of indexes. This trace could later be analyzed to find out which indexes were used, or rather, which indexes were not used.

In SQL Server 2005, you can query a dynamic management view called sys.dm_db_index_usage_stats to find the index-usage information. The columns from this view that are particularly interesting are shown in Table 7-9. (You can see the whole table at SQL Server 2005 Books Online at http://msdn2.microsoft.com/en-us/library/ms188755.aspx.)

Table 7-9: Subset of the sys.dm_db_index_usage_stats DMV
Open table as spreadsheet
Column name	Data type	Description
database_id	smallint	ID of the database on which the table or view is defined
object_id	int	ID of the table or view on which the index is defined
index_id	int	ID of the index
user_seeks	bigint	Number of seeks by user queries
user_scans	bigint	Number of scans by user queries
user_lookups	bigint	Number of lookups by user queries
user_updates	bigint	Number of updates by user queries
last_user_seek	datetime	Time of last user seek
last_user_scan	datetime	Time of last user scan
last_user_lookup	datetime	Time of last user lookup

You should typically query this view for indexes that have low values in the user_seeks or user_scans column. All values in this view are reset whenever the SQL Server service is restarted. The values for a specific database are removed if the database is either detached or shut down. Note that indexes that have not been used since the view was reset are not included in the view.

Lab: Optimizing Queries Through Indexing

In this lab, you will use two different indexing techniques to optimize a specific query. The query returns the Customer ID and the total amount for all purchases for all customers (that have made purchases) in a specific territory. This lab only considers read performance and does not take write performance into account. The completed lab is available in the \Labs\Chapter 07\Lab2 folder on the companion CD.

Following is the query that you will try to optimize:

 -- Query that will be optimized: SELECT     soh.CustomerID     ,SUM(sod.OrderQty * sod.UnitPrice) AS TotalPurchases FROM Test.SalesOrderHeader AS soh INNER JOIN Test.SalesOrderDetail AS sod     ON sod.SalesOrderID = soh.SalesOrderID WHERE soh.TerritoryID = 1 GROUP BY soh.CustomerID;

Exercise 1: Create a Performance Base Line for the Query

In this exercise, you will create the base line for the query that needs to be optimized by executing it without adding any indexes.

Open SQL Server Management Studio and connect to an instance of SQL Server 2005.

In a new query window, type and execute the following SQL statements to create the TestDB database, the Test schema, and the two tables that will be used in this exercise:

 CREATE DATABASE TestDB; GO USE TestDB; GO CREATE SCHEMA Test; GO SELECT * INTO Test.SalesOrderHeader FROM AdventureWorks.Sales.SalesOrderHeader; GO SELECT * INTO Test.SalesOrderDetail FROM AdventureWorks.Sales.SalesOrderDetail; GO ALTER TABLE Test.SalesOrderHeader     ADD CONSTRAINT PKSalesOrderHeader     PRIMARY KEY(SalesOrderID); GO ALTER TABLE Test.SalesOrderDetail     ADD CONSTRAINT PKSalesOrderDetail     PRIMARY KEY(SalesOrderDetailID);

Turn on Include Actual Execution Plan in SQL Server Management Studio by pressing Ctrl+M.
Type and execute the following SQL statement to turn on the reporting of page reads:
```
 SET STATISTICS IO ON; 
```
Type and execute the query to test its performance:
```
 SELECT     soh.CustomerID     ,SUM(sod.OrderQty * sod.UnitPrice) AS TotalPurchases FROM Test.SalesOrderHeader AS soh INNER JOIN Test.SalesOrderDetail AS sod ON sod.SalesOrderID = soh.SalesOrderID WHERE soh.TerritoryID = 1 GROUP BY soh.CustomerID; 
```
What was the total cost of the query?

(You can find the value in the Execution Plan tab by moving the mouse pointer over the SELECT operator and locating the value named Estimated Subtree Cost.)

What was the total number of page reads for the query?

(You can find this value by scrolling to the bottom of the Messages tab and summarizing the values for logical reads.)
Type and execute the following SQL statement to clean up after this exercise:
```
 USE master; DROP DATABASE TestDB; 
```

Exercise 2: Optimize the Query by Using Clustered Indexes

In this exercise, you will optimize the query by modifying the primary key constraints to be non-clustered indexes and then creating appropriate clustered indexes.

Open SQL Server Management Studio and connect to an instance of SQL Server 2005.

In a new query window, type and execute the following SQL statements to create the TestDB database, the Test schema, and the two tables that will be used in this exercise:

 CREATE DATABASE TestDB; GO USE TestDB; GO CREATE SCHEMA Test; GO SELECT * INTO Test.SalesOrderHeader FROM AdventureWorks.Sales.SalesOrderHeader; GO SELECT * INTO Test.SalesOrderDetail FROM AdventureWorks.Sales.SalesOrderDetail; GO ALTER TABLE Test.SalesOrderHeader     ADD CONSTRAINT PKSalesOrderHeader     PRIMARY KEY(SalesOrderID); GO ALTER TABLE Test.SalesOrderDetail     ADD CONSTRAINT PKSalesOrderDetail     PRIMARY KEY(SalesOrderDetailID);

Type and execute the following SQL statements to modify the primary key constraint on the Test.SalesOrderHeader table to become a non-clustered index and then create an appropriate clustered index for the query:

 -- Modify the PK to be a non-clustered index. ALTER TABLE Test.SalesOrderHeader     DROP CONSTRAINT PKSalesOrderHeader; ALTER TABLE Test.SalesOrderHeader     ADD CONSTRAINT PKSalesOrderHeader     PRIMARY KEY NONCLUSTERED (SalesOrderID); -- Create the clustered index. CREATE CLUSTERED INDEX CluIdx ON Test.SalesOrderHeader     (TerritoryID, CustomerID);

Type and execute the following SQL statements to modify the primary key constraint on the Test.SalesOrderDetail table to become a non-clustered index and then create an appropriate clustered index for the query:

 -- Modify the PK to be a non-clustered index. ALTER TABLE Test.SalesOrderDetail     DROP CONSTRAINT PKSalesOrderDetail; ALTER TABLE Test.SalesOrderDetail     ADD CONSTRAINT PKSalesOrderDetail     PRIMARY KEY NONCLUSTERED (SalesOrderDetailID); -- Create the clustered index. CREATE CLUSTERED INDEX CluIdx ON Test.SalesOrderDetail     (SalesOrderID);

Turn on Include Actual Execution Plan in SQL Server Management Studio by pressing Ctrl+M.
Type and execute the following SQL statement to turn on the reporting of page reads:
```
 SET STATISTICS IO ON; 
```

Type and execute the query to test its performance:

 SELECT     soh.CustomerID     ,SUM(sod.OrderQty * sod.UnitPrice) AS TotalPurchases FROM Test.SalesOrderHeader AS soh INNER JOIN Test.SalesOrderDetail AS sod     ON sod.SalesOrderID = soh.SalesOrderID WHERE soh.TerritoryID = 1 GROUP BY soh.CustomerID;

What was the total cost of the query?

What was the total number of page reads for the query?

Type and execute the following SQL statement to clean up after this exercise:
```
 USE master; DROP DATABASE TestDB; 
```

Exercise 3: Optimize the Query by Using Covered Non-Clustered Indexes

In this exercise, you will optimize the query by creating covered non-clustered indexes.

Open SQL Server Management Studio and connect to an instance of SQL Server 2005.

In a new query window, type and execute the following SQL statements to create the TestDB database, the Test schema, and the two tables that will be used in this exercise:

 CREATE DATABASE TestDB; GO USE TestDB; GO CREATE SCHEMA Test; GO SELECT * INTO Test.SalesOrderHeader FROM AdventureWorks.Sales.SalesOrderHeader; GO SELECT * INTO Test.SalesOrderDetail FROM AdventureWorks.Sales.SalesOrderDetail; GO ALTER TABLE Test.SalesOrderHeader     ADD CONSTRAINT PKSalesOrderHeader     PRIMARY KEY(SalesOrderID); O ALTER TABLE Test.SalesOrderDetail    ADD CONSTRAINT PKSalesOrderDetail      PRIMARY KEY(SalesOrderDetailID);

Type and execute the following SQL statement to create the covered non-clustered index that will be used by the query when accessing the Test.SalesOrderHeader table:
```
 CREATE NONCLUSTERED INDEX TestIndex ON Test.SalesOrderHeader (TerritoryID, SalesOrderID) INCLUDE (CustomerID); 
```
Type and execute the following SQL statement to create the covered non-clustered index that will be used by the query when accessing the Test.SalesOrderDetail table:
```
 CREATE NONCLUSTERED INDEX TestIndex ON Test.SalesOrderDetail (SalesOrderID) INCLUDE (OrderQty, UnitPrice); 
```
Turn on Include Actual Execution Plan in SQL Server Management Studio by pressing Ctrl+M.
Type and execute the following SQL statement to turn on the reporting of page reads:
```
 SET STATISTICS IO ON; 
```
Type and execute the query to test its performance:
```
 SELECT     soh.CustomerID     ,SUM(sod.OrderQty * sod.UnitPrice) AS TotalPurchases FROM Test.SalesOrderHeader AS soh INNER JOIN Test.SalesOrderDetail AS sod     ON sod.SalesOrderID = soh.SalesOrderID WHERE soh.TerritoryID = 1 GROUP BY soh.CustomerID; 
```
What was the total cost of the query?

(You can find the value in the Execution Plan tab by moving the mouse pointer over the SELECT operator and locating the value named Estimated Subtree Cost.)

What was the total number of page reads for the query?

(You can find this value by scrolling to the bottom of the Messages tab and summarizing the values for logical reads.)
Type and execute the following SQL statement to clean up after this exercise:
```
 USE master; DROP DATABASE TestDB; 
```

Exercise 4: Optimize the Query by Implementing an Indexed View

In this exercise, you will optimize the query by creating an indexed view to cover the query.

Open SQL Server Management Studio and connect to an instance of SQL Server 2005.

In a new query window, type and execute the following SQL statements to create the TestDB database, the Test schema, and the two tables that will be used in this exercise:

 CREATE DATABASE TestDB; GO USE TestDB; GO CREATE SCHEMA Test; GO SELECT * INTO Test.SalesOrderHeader FROM AdventureWorks.Sales.SalesOrderHeader; GO SELECT * INTO Test.SalesOrderDetail FROM AdventureWorks.Sales.SalesOrderDetail; GO ALTER TABLE Test.SalesOrderHeader     ADD CONSTRAINT PKSalesOrderHeader     PRIMARY KEY(SalesOrderID); GO ALTER TABLE Test.SalesOrderDetail     ADD CONSTRAINT PKSalesOrderDetail     PRIMARY KEY(SalesOrderDetailID);

Type and execute the following SQL statement to create the view:

 CREATE VIEW Test.SalesByCustomerVw WITH SCHEMABINDING AS SELECT     soh.TerritoryID     ,soh.CustomerID     ,SUM(sod.OrderQty * sod.UnitPrice) AS TotalPurchases     ,COUNT_BIG(*) AS NumberOfRows FROM Test.SalesOrderHeader AS soh INNER JOIN Test.SalesOrderDetail AS sod     ON sod.SalesOrderID = soh.SalesOrderID GROUP BY soh.TerritoryID, soh.CustomerID;

Type and execute the following SQL statement to index the view:

 CREATE UNIQUE CLUSTERED INDEX SalesByCustomerVwIdx ON Test.SalesByCustomerVw (TerritoryID, CustomerID);

Turn on Include Actual Execution Plan in SQL Server Management Studio by pressing Ctrl+M.
Type and execute the following SQL statement to turn on the reporting of page reads:
```
 SET STATISTICS IO ON; 
```

Type and execute the query to test its performance:

 SELECT     soh.CustomerID     ,SUM(sod.OrderQty * sod.UnitPrice) AS TotalPurchases FROM Test.SalesOrderHeader AS soh INNER JOIN Test.SalesOrderDetail AS sod     ON sod.SalesOrderID = soh.SalesOrderID WHERE soh.TerritoryID = 1 GROUP BY soh.CustomerID;

Important

Verify that the indexed view is used

Verify that the indexed view is used to execute the query by examining the query execution plan. If the indexed view is not used (which is probably because you are running neither SQL Server 2005 Developer Edition nor Enterprise Edition), execute the following query to force the use of the indexed view:

 SELECT CustomerID, TotalPurchases FROM Test.SalesByCustomerVw WITH(NOEXPAND) WHERE TerritoryID = 1;

What was the total cost of the query?

What was the total number of page reads for the query?

Type and execute the following SQL statement to clean up after this exercise:
```
 USE master; DROP DATABASE TestDB; 
```

Exercise 5: Compare the Test Results

In this exercise, you will compare the results from the earlier exercises.

Enter the cost and page read count for each index technique in the following table.

Open table as spreadsheet

Index technique	Cost	Pages
Base line
Clustered indexes
Covered non-clustered indexes
Indexed view

Which of these techniques provided the lowest (best) cost?

Which of these techniques provided the lowest (best) page count?