Special Indexes | Inside Microsoft SQL Server 2005: The Storage Engine (Solid Quality Learning)

SQL Server 2005 allows you to create two special kinds of indexes: indexes on computed columns and indexes on views. Without indexes, both of these constructscomputed columns and viewsare purely logical. There is no physical storage for the data involved. A computed column is not stored with the table data; it is recomputed every time a row is accessed. A view does not save any data; it basically saves a SELECT statement that is re-executed every time the data in the view is accessed. With the new special indexes, SQL Server actually materializes what was only logical data into the physical leaf level of an index.

Prerequisites

Before you can create indexes on either computed columns or views, certain prerequisites must be met. The biggest issue is that SQL Server must be able to guarantee that given the identical base table data, the same values will always be returned for any computed columns or for the rows in a view. To guarantee that the same values will always be generated, these special indexes have certain requirements, which fall into three categories. First, a number of session-level options must be set to a specific value. Second, there are some restrictions on the functions that can be used within the column or view definition. The third requirement, which applies only to indexed views, is that the tables that the view is based on must meet certain criteria.

SET Options

The following seven SET options can affect the result value of an expression or predicate, so you must set them as shown to create indexed views or indexes on computed columns:

SET CONCAT_NULL_YIELDS_NULL ON SET QUOTED_IDENTIFIER ON SET ANSI_NULLS ON SET ANSI_PADDING ON SET ANSI_WARNINGS ON SET NUMERIC_ROUNDABORT OFF

Note that all the options have to be ON except the NUMERIC_ROUNDABORT option, which has to be OFF. Technically, the option ARITHABORT must also be set to ON, but in SQL Server 2005, setting ANSI_WARNINGS to ON automatically sets ARITHABORT to ON, so you do not need to set it separately. If any of these options is not set as specified, you'll get an error message when you create a special index. In addition, if you've already created one of these indexes and then you attempt to modify the column or view on which the index is based, you'll get an error. If you issue a SELECT that normally should use the index, and if the SET options do not have the values indicated, the index will be ignored but no error will be generated.

There are a couple of way to determine whether the SET options are set appropriately before you create one of these special indexes. You can use the property function SESSIONPROPERTY to test the settings for your current connection. A returned value of 1 means that the setting is ON, and a 0 means that it is OFF. The following example checks the current session setting for the option NUMERIC_ROUNDABORT:

SELECT SessionProperty('NUMERIC_ROUNDABORT')

Alternatively you can use the DMV sys.dm_exec_session to check the SET options for any connection. The following query returns the values for five of the previously discussed six set options for the current session:

SELECT quoted_identifier, arithabort, ansi_warnings,        ansi_padding, ansi_nulls, concat_null_yields_null FROM sys.dm_exec_sessions WHERE session_id = @@spid

Unfortunately, NUMERIC_ROUNDABORT is not included in the sys.dm_exec_session view. There is no way to see the setting for that value for other connections besides the current one.

Permissible Functions

A function is either deterministic or nondeterministic. If the function returns the same result every time it is called with the same set of input values, it is deterministic. If it can return different results when called with the same set of input values, it is nondeterministic. For the purposes of indexes, a function is considered deterministic if it always returns the same values for the same input values when all the SET options have the required settings. Any function used in a computed column's definition or used in the SELECT list or WHERE clause of an indexable view must be deterministic.

SQL Server 2005 Books Online contains a complete list of which supplied functions are deterministic and which are nondeterministic. Some functions can be either deterministic or nondeterministic, depending on how they are used, and Books Online also describes these functions.

It might seem that the list of nondeterministic functions is quite restrictive, but SQL Server must be able to guarantee that the values stored in the index will be consistent. In some cases, the restrictions might be overly cautious, but the downside of being not cautious enough might be that your indexed views or indexes on computed columns are meaningless. The same restrictions apply to functions you use in your own user-defined functions (UDFs)that is, your own functions cannot be based on any nondeterministic built-in function. UDFs are explained in detail in Inside Microsoft SQL Server 2005: T-SQL Programming. You can verify the determinism property of any function by using the OBJECTPROPERTY function:

SELECT OBJECTPROPERTY(object_id('<function_name>'), 'IsDeterministic')

Even if a function is deterministic, if it contains float or real expressions, the result of the function might vary with different processors, depending on the processor architecture or microcode version. Expressions or functions containing values of the data type float or real are therefore considered to be imprecise. To guarantee consistent values even when moving a database from one machine to another (by detaching and attaching, or by backup and restore), imprecise values can only be used in key columns of indexes if they are physically stored in the database and not recomputed. An imprecise value can be used if it is the value of a stored column in a table or if it is a computed column that is marked as persisted. I'll discuss persisted columns in more detail in the upcoming section titled "Indexes on Computed Columns."

Schema Binding

To create an indexed view, a requirement on the table itself is that the definition of any underlying object's schema cannot change. To prevent a change in schema definition, the CREATE VIEW statement allows the WITH SCHEMABINDING option. When you specify WITH SCHEMABINDING, the SELECT statement that defines the view must include the two-part names (schema.object) of all referenced tables. You can't drop or alter tables that participate in a view created with the SCHEMABINDING clause unless you've dropped that view or changed the view so that it no longer has schema binding. Otherwise, SQL Server raises an error. If any of the tables on which the view is based is owned by someone other than the user creating the view, the view creator doesn't automatically have the right to create the view with schema binding because that would restrict the table's owner from making changes to her own table. A user must be granted REFERENCES permission in order to create a view with schema binding on that table. We'll see an example of schema binding in a moment.

Indexes on Computed Columns

SQL Server 2005 allows you to build indexes on deterministic, precise computed columns where the resulting data type is otherwise indexable. This means that the column's data type cannot be text, ntext, or image. Such a computed column can participate at any position of an index or in a PRIMARY KEY or UNIQUE constraint. You cannot define a FOREIGN KEY, CHECK, or DEFAULT constraint on a computed column, and computed columns are always considered nullable unless you enclose the expression in the ISNULL function. You cannot create indexes on any computed columns in system tables. When you create an index on computed columns, the six SET options must first have the correct values set.

Here's an example:

CREATE TABLE t1 (a int, b as 2*a); GO CREATE INDEX i1 on t1 (b); GO

If any of your SET options does not have the correct value when you create the table, you get this message when you try to create the index:

Server: Msg 1934, Level 16, State 1, Line 2 CREATE INDEX failed because the following SET options have incorrect settings: '<OPTION NAME>'.

If more than one option has an incorrect value, the error message will report them all.

Here's an example that creates a table with a nondeterministic computed column:

CREATE TABLE t2 (a int, b datetime, c AS datename(mm, b)); GO CREATE INDEX i2 on t2 (c); GO

When you try to create the index on the computed column c, you get this error:

Msg 2729, Level 16, State 1, Line 1 Column 'c' in table 't2' cannot be used in an index or statistics or as a partition key becauseit is nondeterministic.

Column c is nondeterministic because the month value of datename can have different values depending on the language you're using.

Using the COLUMNPROPERTY Function

You can use the IsDeterministic column property to determine before you create an index on a computed column whether that column is deterministic. If you specify this property, the COLUMNPROPERTY function returns 1 if the column is deterministic and 0 otherwise. The result is undefined for noncomputed columns, so you should consider checking the IsComputed property before you check the IsDeterministic property. The following example detects that column c in table t2 in the previous example is nondeterministic:

SELECT COLUMNPROPERTY(object_id('t2'), 'c', 'IsDeterministic');

The value 0 is returned, which means that column c is nondeterministic. Note that the COLUMNPROPERTY function requires an object ID for the first argument and a column name for the second argument.

Implementation of a Computed Column

If you create a clustered index on a computed column, the computed column is no longer a virtual column in the table. Its computed value will physically exist in the rows of the table, which is the leaf level of the clustered index. Updates to the columns that the computed column is based on will also update the computed column in the table itself. For example, in the t1 table created previously, if we insert a row with the value 10 in column a, the row will be created with both the values 10 and 20 in the actual data row. If we then update the 10 to 15, the second column will be automatically updated to 30.

Persisted Columns

The ability to mark a column as persisted, a new feature in SQL Server 2005, allows storage of computed values in a table, even before you build an index. In fact, this feature was added to the product to allow columns of computed values from underlying table columns of type float or real to have indexes built on them. The alternative, when you want an index on such a column, would be to drop and re-create the underlying column, which can involve an enormous amount of overhead on a large table.

Here's an example. In the Northwind database, the Order Details table has a column called Discount that is of type real. The following code adds a computed column called Final that shows the total price for an item after the discount is applied. The statement to build an index on Final will fail because the resultant column involving the real value is imprecise.

USE Northwind; GO ALTER TABLE [Order Details]    ADD Final AS     (Quantity * UnitPrice) - Discount * (Quantity * UnitPrice); GO CREATE INDEX OD_Final_Index on [Order Details](Final); Error Message: Msg 2799, Level 16, State 1, Line 1 Cannot create index or statistics 'OD_Final_Index' on table 'Order Details' because the comp uted column 'Final' is imprecise and not persisted. Consider removing column from index or statistics key or marking computed column persisted.

Without persisted columns, the only way to create an index on a computed column containing the final price would be to drop the Discount column from the table and redefine it. Any existing indexes on Discount would have to dropped also, and then rebuilt. With persisted columns, all you need to do is drop the computed column, which is a metadata-only operation, and then redefine it as a persisted column. You can then build the index on the computed, persisted column.

ALTER TABLE [Order Details]    DROP COLUMN Final; GO ALTER TABLE [Order Details]    ADD Final AS    (Quantity * UnitPrice) - Discount * (Quantity * UnitPrice) PERSISTED; GO CREATE INDEX OD_Final_Index on [Order Details](Final);

When determining whether you have to use the PERSISTED option, use the COLUMNPROPERTY function and the IsPrecise property to determine whether a deterministic column is precise.

SELECT COLUMNPROPERTY (object_id('Order Details'), 'Final', 'IsPrecise');

You can also use persisted columns when you define partitions. A computed column that is used as the partitioning column must be explicitly marked as PERSISTED, whether it is imprecise. We'll look at partitioning later in this chapter.

Indexed Views

Indexed views in SQL Server are similar to what other products call materialized views. One of the most important benefits of indexed views is the ability materialize summary aggregates of large tables. For example, consider a customer table containing rows for several million U.S.-based customers, from which you want information regarding customers in each state. You can create a view based on a GROUP BY query, grouping by state and containing the count of orders per state. Normal views are only saved queries (as discussed in Inside Microsoft SQL Server 2005: T-SQL Programming) and do not store the results. Every time the view is referenced, the aggregation to produce the grouped results must be recomputed. When you create an index on the view, the aggregated data is stored in the leaf level of the index. So instead of millions of customer rows, your indexed view has only 50 rows, one for each state. Your aggregate reporting queries can then be processed using the indexed views without having to scan the underlying large tables. The first index you must build on a view is a clustered index, and because the clustered index contains all the data at its leaf level, this index actually does materialize the view. The view's data is physically stored at the leaf level of the clustered index.

Additional Requirements

In addition to the requirement that all functions used in the view be deterministic and that the required SET options be set to the appropriate values, the view definition can't contain any of the following:

TOP
text, ntext, or image columns
DISTINCT
MIN, MAX, COUNT(*), COUNT(<expression>), STDEV, VARIANCE, AVG
SUM on a nullable expression
A derived table
The ROWSET function
Another view (you can reference only base tables)
UNION
Subqueries, OUTER joins, or self-joins
Full-text predicates (CONTAINS, FREETEXT)
COMPUTE, COMPUTE BY
ORDER BY

Also, if the view definition contains GROUP BY, you must include the aggregate COUNT_BIG(*) in the SELECT list. COUNT_BIG returns a value of the data type BIGINT, which is an 8-byte integer. A view that contains GROUP BY can't contain HAVING, CUBE, ROLLUP, or GROUP BY ALL. Also, all GROUP BY columns must appear in the SELECT list. Note that if your view contains both SUM and COUNT_BIG(*), you can compute the equivalent of the AVG function even though AVG is not allowed in indexed views. Although these restrictions might seem severe, remember that they apply to the view definitions, not to the queries that might use the indexed views.

To verify that you've met all the requirements, you can use the OBJECTPROPERTY function's IsIndexable property. The following query tells you whether you can build an index on a view called Product Totals:

SELECT ObjectProperty(object_id('Product_Totals'), 'IsIndexable');

A return value of 1 means you've met all requirements and can build an index on the view.

Creating an Indexed View

The first step in building an index on a view is to create the view itself. Here's an example from the AdventureWorks database:

USE AdventureWorks; GO CREATE VIEW Vdiscount1 WITH SCHEMABINDING AS SELECT SUM(UnitPrice*OrderQty) AS SumPrice, SUM(UnitPrice*OrderQty*(1.00-UnitPriceDiscount)) AS SumDiscountPrice, COUNT_BIG(*) AS Count, ProductID FROM Sales.SalesOrderDetail GROUP BY ProductID;

Note the WITH SCHEMABINDING clause and the specification of the schema name (dbo) for the table. At this point, we have a normal viewa stored SELECT statement that uses no storage space. In fact, if we look at the data in sys.dm_db_partition_stats for this view, we'll see that no rows are returned.

SELECT convert(char(25),dbo.index_name(object_id, index_id)) AS index_name,      used_page_count, reserved_page_count, row_count FROM sys.dm_db_partition_stats WHERE object_id=object_id('dbo.Vdiscount1');

To create an indexed view, you must create an index. The first index you create on a view must be a unique clustered index. Clustered indexes are the only type of SQL Server index that contains data; the clustered index on a view contains all the data that makes up the view definition. This statement defines a unique clustered index for the view:

CREATE UNIQUE CLUSTERED INDEX VDiscount_Idx ON Vdiscount1 (ProductID);

After you create the index, you can reexamine sys.dm_db_partition_stats. My output looks like this:

index_name     used_page_count      reserved_page_count  row_count -------------- -------------------- -------------------- --------- VDiscountIdx   6                    6                    266

Data that composes the indexed view is persistent, with the indexed view storing the data in the clustered index's leaf level. You could construct something similar by using temporary tables to store the data you're interested in. But a temporary table is static and doesn't reflect changes to underlying data. In contrast, SQL Server automatically maintains indexed views, updating information stored in the clustered index whenever anyone changes data that affects the view.

After you create the unique clustered index, you can create multiple nonclustered indexes on the view. You can determine whether a view is indexed by using the OBJECTPROPERTY function's IsIndexed property. For the Vdiscount1 indexed view, the following statement returns a 1, which means the view is indexed:

SELECT ObjectProperty(object_id('Vdiscount1'), 'IsIndexed');

Once a view is indexed, metadata about space usage and location is available through the catalog view, just as for any other index.

Using an Indexed View

One of the most valuable benefits of indexed views is that your queries don't have to directly reference a view to use the index on the view. Consider the Vdiscount1 indexed view. Suppose that you issue the following SELECT statement:

SELECT ProductID, total_sales = SUM(UnitPrice*OrderQty) FROM Sales.SalesOrderDetail GROUP BY ProductID;

SQL Server's query optimizer will realize that the precomputed sums of all the UnitPrice * OrderQty values for each ProductID are already available in the index for the Vdiscoun1 view. The query optimizer will evaluate the cost of using that indexed view in processing the query, and the indexed view will very likely be used to access the information required to satisfy this querythe Sales.SalesOrderDetail table might never need to be touched at all.

Note

Although you can create indexed views in any edition of SQL Server 2005, in order for the optimizer to consider using them even when they aren't referenced in the query, the engine edition of your SQL Server 2005 must be Enterprise. This is the case with the Enterprise Edition as well as the Developer and Evaluation editions.

Just because you have an indexed view doesn't mean the query optimizer will always choose it for the query's execution plan. In fact, even if you reference the indexed view directly in the FROM clause, the query optimizer might decide to directly access the base table instead. To make sure that an indexed view in your FROM clause is not expanded into its underlying SELECT statement, you can use the NOEXPAND hint in the FROM clause. In Inside Microsoft SQL Server 2005: Tuning and Optimization, I'll tell you more about how the query optimizer decides whether to use indexed views and how you can tell if indexed views are being used for a query. I'll also revisit indexes on computed columns and show you some situations where you can make good use of them.