Tuning the Database Structure | MCTS 70-431 Exam Cram: Implementing and Maintaining Microsoft SQL Server 2005 Exam

From the first stages of the database design, you make decisions that directly affect performance. Many tools are available to assist with the tuning of a database after you begin the implementation, but it is important to consider the design from the outset. Many performance gains can come out of the database design itself.

The table structures and relationships, indexing, and other physical objects contribute to performance gains. You need to consider controlling the environment right from the beginning of any project. The following sections describe some of these techniques.

Indexing Strategies

A lot of factors affect SQL Server and the applications that use its data resources. Improving performance and response time from a server are primary concerns of database developers and administrators. One key element in obtaining the utmost from a database is having an indexing strategy that helps to achieve the business needs of the enterprise, in part by returning data from queries in a responsive fashion.

You can use one clustered index per table, If a clustered index is implemented, it determines the physical order of the data. Nonclustered indexes act like those in the back of a bookpointing out the physical location of the data. You can create nonclustered covering indexes in cases in which the exact query content is known. This means the indexes include all columns referenced by the query.

Keys, ranges, and unique values are strong selections for index candidates. Seldom-used data, binary data, and repeating values are weaker index candidates. After you have selected index candidates, you need to monitor application usage. You can adjust the indexing strategy regularly to provide reliably high performance.

Until you understand the implications of indexes, you should not use a lot of them throughout your table structures. Although indexes provide good query performance, they can also take away from other processes. You can expect performance to degrade when performing updates against the data:

Indexes consume hardware resources and processing cycles.
Memory overhead is associated with index use.
Regular maintenance is required to keep indexes at optimum levels.
Many database processes have to work within the presence of the index.

Exam Alert

Any SQL Server indexing strategy should begin with the most basic element of providing a primary index, but there are more advanced tuning and design considerations, as well. The 70-431 exam covers indexing in many categories, including index selection, tuning, maintenance, and specialty implementations.

What to Index

Column selection is a major step in the process of indexing. In general, you should consider indexing on columns that are frequently accessed by WHERE, ORDER BY, and JOIN clauses. When you build indexes, you should try to narrow them down to the minimum number of columns needed. Multicolumn indexes have a negative impact on performance. Columns with unique values serving as primary keys are also good candidates.

The challenge for a database designer is to build a physical data model that provides efficient data access. This can be done by minimizing I/O processing time. The following columns are good ones to index:

A column that acts as the table's primary or foreign key
Columns that are regularly sorted by the ORDER BY clause
Columns that are filtered on an exact condition, using the WHERE clause (for instance, WHERE state= 'Ca')
Columns that are queried on joins
Columns that hold integer values rather than character values
Searches for rows with search key values in a range of values (for example, WHERE Royalty BETWEEN 1000 and 4000)
Queries that use the like clause, but only if they start with character data (for example, WHERE au_fname LIKE 'sm%')

The true test of any indexing strategy occurs when queries are processed during day-to-day operations.

What Not to Index

Strong guidelines exist on what should not be indexed. You really can't and shouldn't index all the columns in a table. Doing so would significantly decrease performance on insertions and deletions, even though most queries would run fast. When determining whether to index a small table, you should determine whether more page reads are needed to scan the index than there are pages in the table. In such a case, an index would hurt performance, not help it. Therefore, a table with fewer than three pages is not helped by any index.

You should learn to use the SQL Server Query Analyzer tool as a guide for whether an index is useful. You need to recognize table scans; the process of reading all records from a table in sequence may take fewer cycles than accessing an index firstparticularly on small tables.

The following are some conditions under which you should not index a column:

If the index is never used by the Query Optimizer
If the column values exhibit low selectivity, often greater than 95% for nonclustered indexes
If the columns to be indexed are very wide
If the table is rarely queried
If the columns are not used in WHERE clauses, aggregated, or used in sorting or in JOIN operations

Using indexes involves many trade-offs. Although queries may show a performance improvement, INSERT, UPDATE, and DELETE operations could see a decline in performance. You might not know the power of indexes until you perform large searches on tables that have tens of thousands of rows. Implementing an indexing strategy would not be proper for a small database with a few tables containing no more than 50 rows.

Indexing has the following benefits:

Indexes help increase efficiency on complex searches on large tables.
Indexes are easy to implement.
Indexes can be used to enforce uniqueness throughout rows in tables.

Indexed Views

Creating indexes against a view is new to SQL Server 2005. With this functionality, you can provide a few advanced implementations. You need to pay particular attention to restrictions and required settings because they are sure to be covered on the exam.

An indexed view is a materialized view, which means it has been computed and stored. You index a view by creating a unique clustered index. Indexed views dramatically improve the performance of queries that aggregate many rows. However, they are not well suited for underlying data sets that are frequently updated.

Revisiting Indexing Postimplementation

It is a rarity for any complex database system design to implement all the right indexes on the first attempt. You can use the DTA to analyze existing index structures against a workload to determine what changes need to be implemented.

You should also periodically perform maintenance on existing indexes through defragmentation and rebuilding of index structures. You can use the ALTER INDEX command with the REORGANIZE option to defragment an index, and using the REBUILD option drops the existing index and then re-creates it. The REBUILD statement is preferred over DBCC DBREINDEX, which you can still use now but will be removed in a future version.

Leaving Space for Inserts (Fill Factor)

Fill factor is the percentage at which SQL Server fills leaf-level pages upon creation of indexes. Provision for empty pages enables the server to insert additional rows without performing a page-split operation. (A page split occurs when a new row is inserted into a table that has no empty space for its placement.) As the storage pages fill, page splits occur, and this can hamper performance and increase fragmentation. The fill factor is a configuration option that you set through the Enterprise Manager or the sp_configure stored procedure.

You normally find that queries (which read of existing data) outweigh data updates by a substantial margin. Providing extra room slows down the query process as empty space must be transversed during the reading process. Therefore, you might not want to adjust the fill factor value at all in static systems, where there are smaller numbers of additions.

Setting the fill factor too low hampers read performance because the server must negotiate a series of empty pages to fetch the desired data. It is beneficial to specify a fill factor when you create an index on a table that already has data and will have a high volume of inserts. If you do not specify this setting when creating an index, the server default fill factor setting is chosen.

The percentage value for the fill factor is not maintained over time; it applies only at the time of creation. Therefore, if inserts into a table occur frequently, it is important to take maintenance measures for rebuilding the indexes to ensure that the empty space is put back in place. You can rebuild a specific index by using the CREATE INDEX T-SQL command with the DROP EXISTING option. In addition, you can defragment indexes by using the DBCC INDEXDEFRAG command, which also reapplies the fill factor.

The pad index setting, which is closely related to the setting for fill factor, allows space to be left in non-leaf levels. You cannot specify the pad index by itself, and you can use it only if you supply a fill factor. You do not provide a value for this setting; it matches the setting given for the fill factor.

Data Partitioning Across Servers

You may be able to improve query performance by partitioning a table or an index or creating a partitioned view. This depends on the types of queries that are most frequently run and on the hardware configuration.

Partitioned Views

A partitioned view joins horizontally partitioned data. The data is presented as a single rowset, although it may come from a set of member tables across one or more servers. The view can be created from a set of member tables on the same instance; creating a local partitioned view in this manner is included for backward compatibility. The preferred method for partitioning data locally is by using partitioned tables.

Partitioned views can be created from member tables across multiple servers, creating a distributed partitioned view and forming a federation of database servers. A federation is simply a group of servers administered independently but that cooperate to share the processing and data storage. Forming a federation of database servers by partitioning data lets you scale out a set of servers to support the processing requirements of large, multitiered websites.

Partitioned Tables and Indexes

If queries that involve equi-joins are frequently run, improved performance may result if the partitioning columns are the same as the columns on which the tables are joined.

In partitioning scenarios, the tables or their indexes should be collocated, which means they should either use the same named partition function or use different ones that are essentially performing the same division. Partition functions are considered equivalent when they have the same number of parameters of corresponding data types, define the same number of partitions, and define the same boundary values for partitions.

The Query Optimizer can process a join faster, because the partitions can be joined. If a query joins two tables that are not partitioned on the join field, the presence of partitions may actually slow query processing.

Partitioning generally improves performance because of the addition of hardware components that work together to retrieve data. However, you need to be careful when you configure partitions. If partitions are mapped to filegroups, each accessing a different physical disk drive, data is sorted first by partition. SQL Server accesses one drive at a time under this mapping, and that reduces performance. A better configuration of partitions is to stripe data files of the partitions across more than one disk by using RAID 0.

Using the DTA

The DTA is a tool that analyzes database performance. You run workloads created using SQL Server Profiler against one or more databases. The DTA provides recommendations to add, remove, or modify clustered indexes, nonclustered indexes, indexed views, and/or partitioning.

The DTA has two interfaces: a standalone graphical interface and a command-line utility program, dta.exe, for DTA functionality in software programs and scripts.

In versions of SQL Server prior to SQL Server 2005, some of the DTA functionality was available within the Index Tuning Wizard. The DTA evaluates more types of events and structures than the Index Tuning Wizard, and it provides higher-quality recommendations as well.

Exam Alert

You can expect to see questions about the DTA on the 70-431 exam. In particular, you need to know the variety of settings that are available to provide variations on the diagnosis process.

The DTA provides a variety of settings so that you can specify objects to analyze and structures to maintain, and it provides advanced settings for online index recommendations. You can use these settings to specify how the DTA is to perform its analysis. Figure 8.1 shows the DTA interface and its available options.

Figure 8.1. The DTA graphical interface.

The DTA has two groups of physical design structures (PDS) settings. The first, near the top of the interface, allows for the selection of potential new structures to analyze for inclusion in the database. The second, on the bottom of the interface, allows for the selection of existing PDS elements to keep in the database and not analyze for removal.

When analyzing existing structures by using the DTA, if you want no additional PDS structures implemented, you can select the Evaluate Utilization of Existing PDS Only option from the Physical Design Structures (PDS) to Use in the Database area of the DTA. Alternatively, you can select the Do Not Keep Any Existing PDS option from the Physical Design Structures (PDS) to Keep in the Database area of the dialog to evaluate all existing structures.

Within the Partitioning Strategy to Employ section of the DTA, you can identify the style of partitioning to analyze for implementation. You can select anything from no analysis to full analysis. The Aligned Partitioning option recommends only partitions that are aligned with existing partitions.

Within the Advanced Tuning Options section of the DTA, you can identify the online index recommendations. The selection All Recommendations Are Offline generates the best recommendations possible, but it does not recommend that any physical design structures be created online. The option Generate Online Recommendations Where Possible chooses methods that can be implemented with the server online. This analysis occurs even if the implementation can be performed faster offline. The Generate Only Online Recommendations selection makes only recommendations that allow the server to stay online.