Optimization Through Design | MCAD/MCSE/MCDBA 70-229 Exam Cram 2: Designing & Implementing Databases w/SQL Server 2000 Enterprise Edition

A lot of factors affect SQL Server and the applications that use its data resources. Gaining performance and response time from a server is a primary concern for database developers and administrators. One key element in obtaining the utmost from a database is an indexing strategy that helps to achieve the business needs of the enterprise, in part by returning data from queries in a responsive fashion.

The query optimizer in SQL Server reliably chooses the most effective index for use with any given query, which helps considerably if indexes have been created to choose from. In most cases the optimizer provides the best performance from the index choices provided. The overall index design strategy should provide a good selection of indexes for the optimizer to use.

You can't just start putting a lot of indexes throughout your table structures until you understand the implications of indexes. Although indexes provide good query performance, they can also take away from other processes. You can expect performance to degrade when performing updates against the data. Indexes consume hardware resources and processing cycles; memory overhead is associated with their use; regular maintenance is required to keep them at optimum levels; and many database processes have to work within the presence of the index.

SQL Server indexing strategies begin with the most basic elements of providing a primary index, but there are more advanced tuning and design considerations. The exam covers indexing in many categories, including index selection, tuning, maintenance, and specialty implementations.

Pointing an Index in the Right Direction

An indexing strategy involves your decisions on how to implement indexes, which columns you choose to index, and how you decide that one index is better than another. SQL Server supports two kinds of indexes: clustered and nonclustered. Creating additional indexes, including clustered and nonclustered, covering indexes, and indexed views, is imperative to optimize data access.

Choosing an indexing strategy can be an extremely hard decision. Know when to implement certain indexes and where to put them. Index choice is definitely going to be questioned on the exam. The Index Tuning Wizard can help improve and design accurate, efficient, and proficient indexes on the fly with fascinating results. The Index Tuning Wizard enables you to select and create powerful indexes and statistics for a Microsoft SQL Server 2000 database without prior knowledge of index and database structural designs. The Index Tuning Wizard simplifies one of the tasks that weigh on DBAs' backs: choosing the correct column to index.

Information concerning indexes can be viewed by using the sp_helpindex stored procedure. To rename or delete an index, use the sp_rename stored procedure and DROP INDEX statement, respectively. When information that affects data is modified, index information can go loose throughout the database.

Index statistics need to be up-to-date for the optimizer to decide on the fastest route of access. You can use the UPDATE STATISTICS command, which updates statistical information for an index. Rebuilding indexes helps in collecting the scattered information and bringing data back from its fragmented form. Use the DBCC DBREINDEX to re-index a single index or all indexes in a table.

SQL Server supports indexes defined on any column in a table, including computed columns. If a table is created with no clustered index, the data rows are not stored in any particular order. This structure is called a heap. If order is supplied by a clustered index, heap storage is not used. In this case, the data is stored in a given sequence based on the clustered index.

SQL Server supports indexing of two basic types, although there are many variations of those types. The two types of SQL Server indexes are clustered and nonclustered indexes. Clustered indexes are implemented so that the logical order of the key values determines the physical order of the corresponding rows in a table. Nonclustered indexes are indexes in which the logical order of the index is different than the physical, stored order of the rows on disk. The following sections examine them both.

Full-Text Indexing and Searches

Full-Text Search is a completely separate program that runs as a service, namely Microsoft Search Service, or MSSearch. Full-Text Search can be used in conjunction with all sorts of information from all the various MS BackOffice products. The Full-Text catalogs and indexes are not stored in a SQL Server database. They are stored in separate files managed by the Microsoft Search Service.

Full-Text indexes are special indexes that efficiently track the words you're looking for in a table. They help in enabling special searching functions that differ from regular indexes. Full-Text indexes are not automatically updated, and they reside in a storage space called the Full-Text catalog.

When a Full-Text index is created, you can perform wildcard searches (Full-Text search) that locate words in close proximity. All Full-Text indexes are by default placed in a single Full-Text catalog. Each SQL Server at its apex can store 256 Full-Text catalogs.

The Full-Text catalog files are not recovered during a SQL Server recovery. They also cannot be backed up and restored using the T-SQL BACKUP and RESTORE statements. The T catalogs must be resynchronized separately after a recovery or restore operation. The Full-Text catalog files are accessible only to the Microsoft Search Service and the Windows NT or Windows 2000 system administrator.

To enable Full-Text searches, you can run the Full-Text Indexing Wizard, which enables you to manage and create Full-Text indexes. Note that Full-Text indexes may be created only on columns that contain only text. Full-Text indexes are not automatically updated, thereby bringing up the need to automate the process of updating by setting a job or performing a manual administrative task.

Clustered Indexes

A clustered index is a type of index in which the logical order of the key values determines the physical order of the data stored on disk. Because a clustered index shows how the physical data of a table is stored, there can be only one clustered index per table.

The selection of appropriate column(s) on which to base a clustered index is particularly important. Columns should be chosen where range searches are frequently performed or based on the expected order of the majority of reports printed from the data. A range search occurs within a WHERE conditional operation, as in selecting the range of all authors who have an advance greater than $500 but less than $4,000. With this type of range search, the index, if present, first locates the smallest value (500), and then it locates the other values alongside that value until the last value is reached (4000).

Each index page holds a header that is followed by index rows. Each of these rows contains a pointer value to another subpage or a row of data. The lowest level of implementation in clustered indexes is known as the leaf page or node, and it contains data rows and pages. The first level of implementation is known as the root node. Anything between the root node and the leaf node is referred to as intermediate-level pages.

Candidates for clustered indexes have the following characteristics:

A column queried using range operators such as BETWEEN, <, or >.
Columns that are grouped using the GROUP BY clause or that involve joins or that represent the ORDER of displayed or printed output.
Queries and reporting that access sequential data.
Operations that result in large resultsets.
Columns that don't have incremented values; for example, an ID column has incremented values.
Columns with many duplicate values.
Columns that don't have incremented values; for example, an ID column has incremented values.
On the PK when there are many inserts causing all records to be inserted on the same page, which can remain in memory.

Defining clustered indexes on views enables you to store the resultsets of the view in the database, thereby reducing the overhead of dynamically building the resultset. Because you can create only one clustered index, the majority of indexes created are nonclustered indexes. A nonclustered index can operate over an existing clustered index or over a data heap.

Nonclustered Indexes

Nonclustered indexes are indexes in which the logical order of the key values in the index is different than the physical order of the rows in the indexed table. There are two major differences that distinguish them from clustered indexes. First, the leaf node of the nonclustered index does not contain the concrete data. It contains index rows having a nonclustered key value that point to a data row. Second, as already noted, nonclustered indexes do not physically rearrange the data.

When you query data on a column with a nonclustered index, SQL Server 2000 first tries to find a record containing your table object ID and a value in the indid column from the sysindexes table that is >= 1 and < 255, because a 0 implies there is no index at all, and 255 implies that the table has Text/Image pages. After it finds the record, SQL Server examines the root column to find the root page of the index. After SQL Server finds the root page, it can begin a search for your data.

The SQL Server 2000 optimizer chooses an index to use for the search. The Keys column in sysindexes lists which columns are involved in each index. If the optimizer knows this and knows which columns you are searching on, it can use index statistics to determine which index to use. In the simplest case, if you are searching on only one column, A, and only one index exists on column A, then it uses that index. When you are searching on multiple columns, and many or all of these columns are indexed, it uses statistical methods to determine the index.

It's important to understand the following guidelines before you create nonclustered indexes:

Nonclustered indexes are good when queries don't return a large resultset.
Columns that are used in a SELECT list or the WHERE clause are good candidates for nonclustered indexes.
A nonclustered index is the default index in SQL Server.
There can be up to 249 nonclustered indexes on a single table.
Nonclustered indexes generally take more overhead than clustered indexes, so you should create them only when it is absolutely necessary.
Nonclustered indexes make a good choice for use as a covering index.

Be careful on the exam to pay close attention to WHERE and ORDER clause examples that don't have matching or covering indexes. Increased performance and decreased response time may be achieved by adding an index in these cases.

What to Index?

Column selectivity is a major step in the process of indexing. In general, columns that are frequently accessed by the WHERE or the ORDER BY clauses should be considered. When you build indexes, try to narrow indexes down to the minimum number of columns needed. Multicolumn indexes act negatively on performance. Columns with unique values serving as primary keys are also good candidates.

The challenge for the database designer is to build a physical data model that provides efficient data access. This can be done by minimizing I/O processing time. The following types of columns are good ones to index:

A column that acts as the table's primary or foreign key.
Columns that are regularly sorted by the ORDER BY clause.
Columns that are filtered on an exact condition using the WHERE clausefor instance, WHERE state= 'Ca'.
Columns that are queried on joins.
Columns that hold integer values rather than character values.
Searches for rows with search key values in a range of valuesfor example, WHERE Royalty BETWEEN 1000 and 4000.
Queries that use the like clause, but only if they start with character data. Examine the following: WHERE au_fname LIKE 'sm%'.

The true test of any index strategy occurs when queries are processed during day-to-day operations. Strong guidelines also exist on things that should not be indexed.

What Not to Index?

You really can't and shouldn't index all the columns in your table. Doing so would significantly drop performance on inserts and deletes, even though most queries would run fast. When determining whether to index a small table, look to see whether it requires more page reads to scan the index than there are pages in the table. In this case an index will hurt performance, not help it. Therefore, a table with less than three pages is not helped by any index. Learn to use the SQL Server Query Analyzer tool as a guide for whether an index is useful. Recognize table scans; the process of reading all records from a table in sequence may take fewer cycles than accessing an index firstparticularly on small tables.

Here are some conditions that would indicate you should not index a column:

If the index is never used by the query optimizer
If the column values exhibit low selectivity, often greater than 95% for nonclustered indexes
If the columns to be indexed are very wide
If the table is rarely queried
If the columns are not used in WHERE clauses, aggregated, or used in sorting or in join operations.

Indexes provide many trade-offs. Although queries may show a performance improvement, INSERT, UPDATE, and DELETE operations could see a decline in performance. You may not know the power of indexes until you perform large searches on tables having tens of thousands of rows. Implementing an indexing strategy would not be proper for a small database with a few tables containing no more than 50 rows. Tables are benefited by indexing for the following reasons:

As many as 249 nonclustered indexes can be created on a single table.
Indexes help increase efficiency on complex searches on large tables.
Indexes are easy to implement.
Sixteen columns can compose the same composite index, as long as the total key length is less than 900 bytes.
Indexes may be used to enforce uniqueness throughout rows in tables.

Creating indexes against a view is new to SQL Server 2000. With this functionality you can provide a few advanced implementations. Pay particular attention to restrictions and required settings, because they are sure to be needed on the exam.

Nonclustered Covering Indexes

A covering index is a nonclustered index that is built on all the columns needed to satisfy a SQL query, in both the selection criteria and the WHERE clause. Covering indexes save a huge amount of I/O and build a lot of efficiency in a query. For instance, if you query SQL Server with the statement

 SELECT Au_FName, Au_LName FROM Authors WHERE State= 'ca'

you can run the following to create an index:

 CREATE NONCLUSTERED INDEX MyIndex ON Authors(State, Au_FName, Au_LName)

MyIndex would be considered a covering index because a composite index is built on all columns specified in the SELECT statement and WHERE clause. Covering indexes significantly boost performance because all the data needed for the query to base the search on is contained within the index; only the index pages, not the data pages, of the table must be referenced to resolve where to retrieve the data.

Indexing Strategies

One clustered index per table can be used and, if implemented, it determines the physical order of the data. Nonclustered indexes act like those in the back of a bookpointing out the physical location of the data. Nonclustered covering indexes can be created in cases in which the exact query content is known.

Keys, ranges, and unique values are strong selections for index candidates. Seldom-used data, binary data, and repeating values are poor index candidates. After index candidates have been selected, monitor application usage. Adjust the indexing strategy on a regular basis to provide reliably high performance.

Index Maintenance over Time

It is a little tricky, at best, to select an appropriate indexing strategy, and maintaining it in the face of ongoing changes inherent in a database life cycle is even more of a challenge. Index maintenance over time is the next consideration.

Index creation and alteration are two necessary mechanisms in any indexing strategy. A database goes through a life cycle and undergoes size changes, application uses, and other growth. Any database system is in a constant state of flux because data is always being added, changed, removed, or otherwise affected by change.

Sometimes indexes need to be rebuilt because of changes in data that occurred after it was built. Rebuilding indexes helps in collecting the scattered information and bringing index data back to its original form. This increases the overall performance by making it easier for SQL Server to read pages to get data. You can re-create an index in three ways:

By deleting the original index and then re-creating it
By specifying DROP_EXISTING in the CREATE INDEX clause
By using the Database Console Command (DBCC), DBCC DBREINDEX

The first option is highly unfavorable for clustered indexes because other levels of indexes use the same index's cluster key as a pointer to data rows. When a clustered index is defined on a table, this dictates the physical order of the data. The clustered index key values are used as data pointers for all other nonclustered keys. Deleting a clustered index and then re-creating it means deleting and re-creating all nonclustered indexes.

The second option is rebuilding the index. This method reduces unnecessary work and is advantageous to both clustered and nonclustered indexes in that it significantly reduces the overhead of rebuilding the index.

The third option is the preferred out of the three choices, because it enables you to rebuild multiple indexes in a single shot, without rebuilding indexes individually. It also enables you to rebuild indexes on tables that use PRIMARY and UNIQUE constraints without requiring you to delete and rebuild each constraint.

SQL Server enables you to rebuild one or more indexes on a table by using the DBCC DBREINDEX statement, without having to rebuild each index separately. DBCC DBREINDEX is also useful if you want to rebuild indexes that enforce PRIMARY KEY or UNIQUE constraints without having to delete and re-create the constraints. If simple defragmentation of index space is desired, use the INDEXDEFRAG option. There are many uses of DBCC in database maintenance and in itself DBCC is a major topic; for that reason we will look at DBCC in depth later, in the section "Database Console Command (DBCC)."

Using the Index Tuning Wizard

The Index Tuning Wizard is a graphical tool that enables you to select and create powerful indexes and statistics for Microsoft SQL Server 2000 databases. The Index Tuning Wizard simplifies the task of choosing which indexes to create in a table. As you've seen previously, one of the most difficult tasks for a DBA is determining which columns get indexed. Fortunately, Microsoft has shipped a wizard that identifies the best clustered and nonclustered indexes for a table and the indexes that are most likely to improve query performance.

The Index Tuning Wizard is used to find the most efficient path of execution in a set of input. This input is a table or file, namely a workload. To make workloads, use the SQL Profiler to set a trace. The SQL Profiler enables you to monitor and record activity events in a file or table. SQL Profiler traces enable the Index Tuning Wizard to make accurate decisions. Traces of the different events that happen while work is performed can be recorded for hours or even days, depending on the period desired to be monitored. After the events are recorded, the Index Tuning Wizard works with the query processor to establish the viability of a configuration.

Caution! Profiler can affect overall database performance, sometimes significantly, so use it sparingly and remember to turn it off as soon as you feel that the necessary information has been captured.

For the Index Tuning Wizard to really be useful, it must be executed during production use of the database, or at least under circumstances in which a sufficient query load will produce meaningful results.

Dropping Unwanted Indexes

If an index isn't in use any longer, you may want to delete it. Deleting an index means freeing all storage space the index was previously holding. It also results in more efficient INSERT, UPDATE, and DELETE operations. Only a table owner is granted permission to delete indexes. To delete an index or indexes, run the DROP INDEX statement.

Dropping an index may require the removal of constraints if those constraints are directly related to the index. In other instances, you may just want to remove an index that has outlived its usefulness. To help determine index validity, the Index Tuning Wizard is provided as an assistant in index analysis. You can't drop an index created with a PRIMARY KEY or UNIQUE constraint; the alter table command must be used.

Partitioned Views to Optimize Access

When a distributed partitioned view is used, the data resides on several servers that work together to maintain and display the data. These servers may be called federated database servers. This term is used to refer to these servers that share the data-hosting job but are independently administered.

With partitioned views, tables that store large amounts of data can be split up (using horizontal partitioning) into smaller member tables. This data in the member table holds the same number of columns as the original table; it is only the number of rows that is decreased. After the data is broken down into smaller member tables, a view defined with UNION ALL is used to bring all member tables together. This view looks like a single large resultset. When a SELECT query is run against the partitioned view, SQL Server uses CHECK constraints in determining which member table the data is from. The CHECK constraint is usually created on the primary key column.

A partitioned view combines horizontally partitioned data from member tables across one or more servers (distributed partitioned views). A member table can be thought of as a segment from a larger table. There are two types of partitioned views in SQL Server: a local partitioned view and a distributed partitioned view. A local partitioned view is a partitioned view in which all member tables reside on the local instance of SQL Server. Distributed partitioned views are new to SQL Server 2000 and are a bit more advanced than local partitioned views. The key difference between a distributed partitioned view and a local partitioned view is that in a local partitioned view, the data is collected from a single server alone. In contrast, a distributed partitioned view collects data from two or more instances of SQL Server, hence the name "distributed."

To decide whether you should create a partitioned view, you have to examine the data you will be working with, and consider how it is used. If you have a table that is used by many different department or regions (each with its own server), you can look at partitioning the database along those lines.

But, you say, the system has been in place for years and what used to perform well just doesn't seem to be doing so well anymore. A neglected system or just plain poor maintenance may be the cause, and there are several things you should look at before deciding that a redesign is necessary.

Optimizing Performance in the Partitions

To maximize performance in large applications, the design of partitioned views should be examined. Also, placing indexes on views will provide for faster searching and retrieval of data. In itself, a view generally provides for better performance because less data is involved. Always consider the use of views when tables have an extremely large number of columns, application design is left to the user, and tables contain sensitive information.

Indexing Views for Optimization

Indexed views are a new concept introduced in SQL Server 2000. They allow view resultsets to be stored in the database's physical storage after an index is created. In contrast to this, in a non-indexed view, the view is activated at runtime and the resultset is dynamically built based on logic needed to create output from the underlying base tables. Any computations, such as joins or aggregations, are done during query execution for each query referencing the view, ultimately leading in degraded performance.

Defining clustered indexes on views enables the resultsets of the view to be stored in the database, thereby reducing the overhead of dynamically building the resultset. This means that indexed views can be treated like any other table because additional nonclustered indexes can be added. An indexed view can be created using the CREATE INDEX or CREATE VIEW statements, which is done when data is occasionally updated. Some of the benefits of indexed views are listed here:

They increase performance of queries that use joins and aggregates.
Data in an indexed view is updated when a base table gets modified.
Views that aggregate and join data are improved. When an index is created on a view, the view is executed and the resultset is stored in the database.
The first time you create an index on a view, it has to be of type unique clustered.

Creating an indexed view is as simple as altering the CREATE INDEX statement. Use the same syntax used in creating table indexes except that the table name is replaced with the view name. Also, some session-level options must be toggled (see the preceding section). Keep in mind that only the view owner is allowed to execute a CREATE INDEX statement on a view.

All indexes on a view are dropped if the view is dropped. All nonclustered indexes on the view are dropped if the clustered index is dropped. Nonclustered indexes can be dropped individually. Dropping the clustered index on the view removes the stored resultset, and the optimizer returns to processing the view like a standard view.