Special Indexes | Inside Microsoft SQL Server 2000

SQL Server 2000 allows you to create two special kinds of indexes: indexes on computed columns and indexes on views. Without indexes, both of these constructs are purely logical. There is no physical storage for the data involved. A computed column is not stored with the table data; it is recomputed every time a row is accessed. A view does not save any data; it basically saves a SELECT statement that is reexecuted every time the data in the view is accessed. With the new special indexes, SQL Server actually materializes what was only logical data into the physical leaf level of an index.

Prerequisites

Before you can create indexes on either computed columns or views, certain prerequisites must be met. The biggest issue is that SQL Server needs to be able to guarantee that given the identical base table data, the same values will always be returned for any computed columns or for the rows in a view. To guarantee that the same values will always be generated, these special indexes have certain requirements, which fall into three categories. First, a number of session-level options must be set to a specific value. Second, there are some restrictions on the functions that can be used within the column or view definition. The third requirement, which applies only to indexed views, is that the tables that the view is based on must meet certain criteria.

SET Options

The following seven SET options can affect the result value of an expression or predicate, so you must set them as shown to create indexed views or indexes on computed columns:

 SET ARITHABORT ON SET CONCAT_NULL_YIELDS_NULL ON SET QUOTED_IDENTIFIER ON SET ANSI_NULLS ON SET ANSI_PADDING ON SET ANSI_WARNINGS ON SET NUMERIC_ROUNDABORT OFF

Note that all the options have to be ON except the NUMERIC_ROUNDABORT option, which has to be OFF. If any of these options are not set as specified, you'll get an error message when you create a special index. In addition, if you've already created one of these indexes and then you attempt to modify the column or view on which the index is based, you'll get an error. If you issue a SELECT that normally should use the index, and if the seven SET options do not have the values indicated, the index will be ignored but no error will be generated.

You can also use the property function SESSIONPROPERTY to test whether you have each option appropriately set. A returned value of 1 means that the setting is ON, and a 0 means OFF. The following example checks the current session setting for the option NUMERIC_ROUNDABORT:

 SELECT SessionProperty('NUMERIC_ROUNDABORT')

Permissible Functions

A function is either deterministic or nondeterministic. If the function returns the same result every time it is called with the same set of input values, it is deterministic. If it can return different results when called with the same set of input values, it is nondeterministic. For the purposes of indexes, a function is considered deterministic if it always returns the same values for the same input values when the seven SET options have the required settings. Any function used in a computed column's definition or used in the SELECT list or WHERE clause of an indexable view must be deterministic.

Table 8-4 lists the deterministic and nondeterministic functions in SQL Server 2000. Where it might not be obvious, I have included an indication of why a particular function is nondeterministic—that is, why you might not always get the same result when the function is called with the same input values.

Table 8-4. SQL Server 2000 deterministic and nondeterministic functions.

Function	Deterministic or Nondeterministic Attribute
Aggregate functions (SUM, AVG, and so on) (Note that you cannot use aggregate functions in computed columns—as I'll discuss below—but you can use some of them in indexed views.)	Deterministic
Configuration Functions	All are Nondeterministic
Cursor Functions	All are Nondeterministic
Date and Time Functions:
DATEADD(datepart, number, date)	Deterministic
DATEDIFF(datepart, startdate, enddate)	Deterministic
DATENAME	Nondeterministic (the result is language dependent, particularly for the value returned when you're looking for the names of months or days of the week.)
DATEPART	Deterministic (except for DATEPART(dw,date) because the value depends on the current setting of DATEFIRST)
DAY (date)	Deterministic
GETDATE	Nondeterministic
GETUTCDATE	Nondeterministic
MONTH (date)	Deterministic
YEAR (date)	Deterministic
Mathematical Functions:
RAND()	Nondeterministic
The rest of mathematical functions, including RAND(SEED)	Deterministic
Metadata functions (object info)	All are Nondeterministic
Security functions (such as IS_MEMBER)	All are Nondeterministic
String functions (such as ASCII and STR)	All are Deterministic except CHARINDEX and PATINDEX
System Functions:
APP_NAME	Nondeterministic
CAST	Nondeterministic if the datatype is datetime, smalldatetime, or sql_variant. Deterministic otherwise.
CONVERT (data_type[(length)],	Nondeterministic if the datatype is expression) datetime, smalldatetime, or sql_variant. Deterministic otherwise.
CONVERT (data_type[(length)], expression, style)	Deterministic
COALESCE	Deterministic
CURRENT_TIMESTAMP	Nondeterministic
CURRENT_USER	Nondeterministic
DATALENGTH	Deterministic
@@ERROR	Nondeterministic
FORMATMESSAGE	Nondeterministic
GETANSINULL	Nondeterministic
HOST_ID	Nondeterministic
HOST_NAME	Nondeterministic
IDENT_INCR	Nondeterministic
IDENT_SEED	Nondeterministic
@@IDENTITY	Nondeterministic
IDENTITY (function)	Nondeterministic
ISDATE (expression)	Nondeterministic (because it depends on DATEFORMAT) (Note that this function can be deterministic if used with the CONVERT function, with CONVERT style parameter specified, and style not equal to 0, 100, 9, or 109.)
ISNULL	Deterministic
ISNUMERIC	Deterministic
NEWID	Nondeterministic
NULLIF	Deterministic
PARSENAME	Deterministic
PERMISSIONS	Nondeterministic
@@ROWCOUNT	Nondeterministic
SESSION_USER	Nondeterministic
STATS_DATE	Nondeterministic
SYSTEM_USER	Nondeterministic
@@TRANCOUNT	Nondeterministic
USER_NAME	Nondeterministic
CHECKSUM(select list)	Deterministic except for CHECKSUM(*)
System statistical functions such as @@cpu_busy, @@idle	Nondeterministic
Text and Image Functions:
PATINDEX	Nondeterministic
TEXTPTR	Nondeterministic
TEXTVALID	Nondeterministic
OPERATORS (such as +, -, /, *, string +, - neg)	Deterministic
Implicit Date Conversion Functions:
String -> date	Nondeterministic (because it depends on DATEFORMAT)

This might seem like quite a restrictive list, but the same restrictions apply to functions you use in your own user-defined functions (UDFs)—that is, your own functions cannot be based on any nondeterministic built-in function. I'll talk about UDFs in Chapter 11.

Schema Binding

To create an indexed view, a requirement on the table itself is that you don't want the definition of any underlying object's schema to change. To prevent a change in schema definition, SQL Server 2000's CREATE VIEW statement allows the WITH SCHEMABINDING option. When you specify WITH SCHEMABINDING, the SELECT statement that defines the view must include the two-part names (owner.object) of all referenced tables. You can't drop or alter tables participating in a view created with the SCHEMABINDING clause unless you've dropped that view or changed the view so that it no longer has schema binding. Otherwise, SQL Server raises an error. If any of the tables on which the view is based is owned by someone other than the user creating the view, the view creator doesn't automatically have the right to create the view with schema binding because that would restrict the table's owner from making changes to her own table. The table owner must grant another user REFERENCES permission in order for that user to create a view with schema binding on that table. We'll see an example of schema binding in a moment.

Indexes on Computed Columns

SQL Server 2000 allows you to build indexes on deterministic computed columns where the resulting data type is otherwise indexable. This means that the column's datatype cannot be text, ntext, or image. Such a computed column can participate at any position of an index or in a PRIMARY KEY or UNIQUE constraint. You cannot define a FOREIGN KEY, CHECK, or DEFAULT constraint on a computed column, and computed columns are always considered nullable, unless you enclose the expression in the ISNULL function.. You cannot create indexes on any computed columns in system tables. When you create an index on computed columns, the seven SET options must first have the correct values set.

Here's an example:

 CREATE TABLE t1 (a int, b as 2*a) GO CREATE INDEX i1 on t1 (b) GO

If any of your seven SET options don't have the correct values when you create the table, you get this message when you try to create the index:

 Server: Msg 1934, Level 16, State 1, Line 2 CREATE INDEX failed because the following SET options have incorrect settings: '<OPTION NAME>'.

If more than one option has an incorrect value, the error message reports only one incorrectly set option.

Here's an example that creates a table with a nondeterministic computed column:

 CREATE TABLE t2 (a int, b datetime, c AS datename(mm, b)) GO CREATE INDEX i2 on t2 (c) GO

When you try to create the index on the computed column c, you get this error:

 Server: Msg 1933, Level 16, State 1, Line 1 Cannot create index because the key column 'c' is non-deterministic or imprecise.

Column c is nondeterministic because the month value of datename can have different values depending on the language you're using.

Using the COLUMNPROPERTY Function

You can use the new IsDeterministic property to determine before you create an index on a computed column whether that column is deterministic. If you specify this property, the COLUMNPROPERTY function returns 1 if the column is deterministic and 0 otherwise. The result is undefined for noncomputed columns, so you should consider checking the IsComputed property before you check the IsDeterministic property. The following example detects that column c in table t2 in the previous example is nondeterministic:

 SELECT COLUMNPROPERTY(object_id('t2'), 'c', 'IsDeterministic')

The value 0 is returned, which means that column c is nondeterministic. Note that the COLUMNPROPERTY function requires an object ID for the first argument and a column name for the second argument.

Implementation of a Computed Column

If you create a clustered index on a computed column, the computed column is no longer a virtual column in the table. Its computed value will physically exist in the rows of the table, which is the leaf level of the clustered index. Updates to the columns that the computed column is based on will update the computed column in the table itself. For example, in the t1 table created previously, if we insert a row with the value 10 in column a, the row will be created with both the values 10 and 20 in the actual data row. If we then update the 10 to 15, the second column will be automatically updated to 30.

In Chapter 16, I'll revisit indexes on computed columns and show you some situations where you can make good use of them.

Indexed Views

Indexed views in SQL Server 2000 are similar to what other products call materialized views. The first index you must build on a view is a clustered index, and since the clustered index contains all the data at its leaf level, this index actually does materialize the view. The view's data is physically stored at the leaf level of the clustered index.

Additional Requirements

In addition to the requirement that all functions used in the view be deterministic and that the seven SET options be set to the appropriate values, the view definition can't contain any of the following:

TOP
text, ntext, or image columns
DISTINCT
MIN, MAX, COUNT(*), COUNT(<expression>), STDEV, VARIANCE, AVG
SUM on a nullable expression
A derived table
The ROWSET function
Another view (you can reference only base tables)
UNION
Subqueries, OUTER joins, or self-joins
Full-text predicates (CONTAINS, FREETEXT)
COMPUTE, COMPUTE BY
ORDER BY

Also, if the view definition contains GROUP BY, you must include the aggregate COUNT_BIG(*) in the SELECT list. COUNT_BIG returns a value of the datatype BIGINT, which is an 8-byte integer. A view that contains GROUP BY can't contain HAVING, CUBE, ROLLUP, or GROUP BY ALL. Also, all GROUP BY columns must appear in the SELECT list. Note that if your view contains both SUM and COUNT_BIG(*), you can compute the equivalent of the AVG function even though AVG is not allowed in indexed views. Although these restrictions might seem severe, remember that they apply to the view definitions, not to the queries that might use the indexed views.

To verify that you've met all the requirements, you can use the OBJECTPROPERTY function's IsIndexable property. The following query tells you whether you can build an index on a view called Product Totals:

 SELECT ObjectProperty(object_id('Product_Totals'), 'IsIndexable')

A return value of 1 means you've met all requirements and can build an index on the view.

Creating an Indexed View

The first step to building an index on a view is to create the view itself. Here's an example from the Northwind database:

 USE northwind GO CREATE VIEW Product_Totals WITH SCHEMABINDING AS select productid, total_volume = sum(unitPrice * Quantity), total_qty = sum(Quantity) , number = count_big(*) from dbo."order details" group by productid

Note the WITH SCHEMABINDING clause and the specification of the owner name (dbo) for the table. At this point, we have a normal view—a stored SELECT statement that uses no storage space. In fact, if we run the system stored procedure sp_spaceused on this view, we'll get this error message:

 Server: Msg 15235, Level 16, State 1, Procedure sp_spaceused, Line 91 Views do not have space allocated.

To create an indexed view, you need to create an index. The first index you create on a view must be a unique clustered index. Clustered indexes are the only type of SQL Server index that contains data; the clustered index on a view contains all the data that makes up the view definition. This statement defines a unique clustered index, PV_IDX, for the view:

 CREATE UNIQUE CLUSTERED INDEX PV_IDX on Product_Totals(productid)

After you create the index, you can rerun sp_spaceused. My output looks like this:

 Name       rows   reserved   data index_size unused ----------------- ------- ---------- ----- ---------- ------ Product_Totals  77   24 KB   8 KB 16 KB   0 KB

Data that comprises the indexed view is persistent, with the indexed view storing the data in the clustered index's leaf level. You could construct something similar by using temporary tables to store the data you're interested in. But a temporary table is static and doesn't reflect changes to underlying data. In contrast, SQL Server automatically maintains indexed views, updating information stored in the clustered index whenever anyone changes data that affects the view.

After you create the unique clustered index, you can create multiple nonclustered indexes on the view. You can determine whether a view is indexed by using the OBJECTPROPERTY function's IsIndexed property. For the Total_Products indexed view, the following statement returns a 1, which means that the view is indexed:

 SELECT ObjectProperty(object_id('Product_Totals'), 'IsIndexed')

Executing the system stored procedure sp_help or sp_helpindex returns complete information about any indexes on the view.

Using an Indexed View

One of the most valuable benefits of indexed views is that your queries don't have to directly reference a view to use the index on the view. Consider the Product_Totals indexed view. Suppose you issue the following SELECT statement:

 SELECT productid, total_qty = sum(Quantity) FROM dbo."order details" GROUP BY productid

SQL Server's query optimizer will realize that the precomputed sums of all the Quantity values for each productid are already available in the index for the Product_Totals view. The query optimizer will evaluate the cost of using that indexed view in processing the query. But just because you have an indexed view doesn't mean that the query optimizer will always choose it for the query's execution plan. In fact, even if you reference the indexed view directly in the FROM clause, the query optimizer might decide to directly access the base table instead. In Chapters 15 and 16, I'll tell you more about how the query optimizer decides whether to use indexed views and how you can tell if indexed views are being used for a query.