Creating the Physical Design for the Data Model

In Chapter 8, "Creating the Logical Data Model," you created logical designs of the data model, which included defining tables and columns, normalizing data, defining relationships, and defining primary/foreign keys. The physical specifications further define indexes, partitions, and data denormalization, all of which greatly affect performance. Even applications created correctly a year ago need to have their performance features revisited frequently because as data grows, shrinks, or changes, performance is affected for better or worse.

Indexing

A database index is similar to a book index, which enables you to quickly find key sections in a book without having to search page by page. A database index enables you to quickly find a reference to a key section of data within a table and, instead of scanning the entire table, go directly to the specified row. Obviously, this method is a timesaver, but keep in mind that indexes are practical only if they are compact and selective. For example, having 500 indexed pages for a 1,000-page book would be impractical. By the same token, a table with too many indexes can perform worse than a table with no indexes at all.

Indexes must be used properly to be effective. Columns used in table joins, in WHERE clauses, and as unique identifiers are excellent candidates for indexing. SQL Server offers full support for creating and managing indexes: you can create them from Enterprise Manager, from Transact-SQL code, or from Query Analyzer. After the index schema is in place, you should run and analyze queries to determine whether performance has improved or degraded. There are two main types of indexes: clustered and non-clustered. Each type has its own advantages and usage guidelines, explained in the following sections.
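As a simple illustration, an index can be created on a frequently filtered column with a few lines of Transact-SQL. The table and column names here (Customers, LastName) are hypothetical and exist only for this sketch:

```sql
-- Hypothetical table used for illustration.
CREATE TABLE Customers (
    CustomerID INT IDENTITY(1,1) PRIMARY KEY,  -- primary key is clustered by default
    LastName   VARCHAR(50) NOT NULL,
    FirstName  VARCHAR(50) NOT NULL
)

-- Index the column the application filters on most often.
CREATE INDEX IX_Customers_LastName ON Customers (LastName)

-- This query can now seek on the index instead of scanning the whole table.
SELECT CustomerID, FirstName, LastName
FROM Customers
WHERE LastName = 'Smith'
```

After creating the index, you can compare the query's execution plan in Query Analyzer before and after to confirm that an index seek replaced a table scan.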

Clustered and Non-Clustered Indexes

A clustered index physically sorts the rows of a table by the indexed column or columns. A table can contain only one clustered index because the data rows themselves can be stored in only one order. Clustered indexes are an excellent choice for unique identifiers, such as an auto-incrementing identity column. By default, SQL Server automatically creates a clustered index on a table's primary key.

Non-clustered indexes do not sort the table; instead, each one is a separate structure that stores the indexed column values together with pointers back to the corresponding data rows. A table can have up to 249 non-clustered indexes. A non-clustered index provides excellent benefits over a full table scan when the majority of queries include a WHERE clause on the indexed column. Creating too many non-clustered indexes, however, impedes performance: each index consumes physical space and must be maintained on every insert and update, a poor trade-off if an index is used only 10% (or less) of the time.
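The two index types are distinguished in Transact-SQL by the CLUSTERED and NONCLUSTERED keywords. The Orders table and its columns below are hypothetical:

```sql
-- Hypothetical table for demonstrating both index types.
CREATE TABLE Orders (
    OrderID    INT NOT NULL,
    CustomerID INT NOT NULL,
    OrderDate  DATETIME NOT NULL
)

-- Only one clustered index is allowed; it determines the physical row order.
CREATE CLUSTERED INDEX IX_Orders_OrderID ON Orders (OrderID)

-- Non-clustered indexes are separate structures that point back to the rows;
-- a table can have up to 249 of them.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON Orders (CustomerID)
```

Here the clustered index supports range scans by order number, while the non-clustered index supports lookups by customer without reordering the table.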

SQL Server Index Tuning Wizard

SQL Server offers the Index Tuning Wizard, an excellent tool for configuring indexes across an entire database. The Index Tuning Wizard can automatically recommend index placement, saving you the time usually spent analyzing results and making appropriate modifications. It tunes the database to your specific queries. For example, you might have two tables that look practically identical, where SQL scripts run SELECT statements against one table and INSERT/UPDATE statements against the other. The Index Tuning Wizard would most likely recommend indexing only the table that has a large number of SELECT statements performed on it. To give the wizard a snapshot of data for analysis, you must capture a trace with SQL Profiler. After a trace has been run and saved, the Index Tuning Wizard can analyze it and give you useful recommendations for proper index placement.

Partitions

Partitioning is the process of dividing items into smaller, more manageable sections. To partition a pie, for example, you would slice it into eight pieces, making it easier to eat. Data is partitioned in much the same way and includes the following dividing options:

  • Horizontal partitions This method is used to divide a table based on its rows.

  • Vertical partitions This method is used to divide a table based on its columns.

  • Disk partitions This method is used to divide database tables across multiple physical disk drives.

  • Database partitions This method, performed via federated databases, is used to divide database tables across multiple servers hosting different instances of SQL Server.

Horizontal and Vertical Partitioning

Horizontal partitioning divides a table based on rows. For example, you could divide a table by an auto-incrementing identifier, placing records into new tables based on their IDs. Each newly created table would then represent a range of unique IDs. When used improperly, however, this method can quickly become unmanageable: each time a block of numbers is exhausted, a new table must be created for the next range of records. This can be done manually, or handled programmatically by a nightly job.
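In SQL Server, horizontally partitioned member tables are typically reunited behind a partitioned view, so applications still see a single table. The table names, ID ranges, and view name below are hypothetical:

```sql
-- Member tables, each holding one range of IDs.
-- The CHECK constraints tell SQL Server which rows live in which table.
CREATE TABLE Orders1 (
    OrderID   INT NOT NULL PRIMARY KEY
              CHECK (OrderID BETWEEN 1 AND 99999),
    OrderDate DATETIME NOT NULL
)
CREATE TABLE Orders2 (
    OrderID   INT NOT NULL PRIMARY KEY
              CHECK (OrderID BETWEEN 100000 AND 199999),
    OrderDate DATETIME NOT NULL
)
GO

-- The partitioned view presents the member tables as one logical table.
CREATE VIEW AllOrders AS
    SELECT OrderID, OrderDate FROM Orders1
    UNION ALL
    SELECT OrderID, OrderDate FROM Orders2
```

Because of the CHECK constraints, a query such as `SELECT * FROM AllOrders WHERE OrderID = 150000` touches only the member table whose range contains that ID.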

Vertical partitioning divides the table based on columns. You use this method when you have columns that are extremely large or infrequently accessed. For example, you might have an application that queries employee information, such as name, SSN, and address, but requests the address only sporadically. Vertically partitioning the address information into its own table could substantially improve the performance of queries accessing the frequently requested fields.
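The employee example can be sketched as two tables sharing the same key; the names below are hypothetical:

```sql
-- Frequently accessed columns stay in the main table.
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    Name       VARCHAR(100) NOT NULL,
    SSN        CHAR(11) NOT NULL
)

-- Large, rarely queried columns move to a companion table with the same key.
CREATE TABLE EmployeeAddresses (
    EmployeeID INT PRIMARY KEY REFERENCES Employees (EmployeeID),
    Address    VARCHAR(500)
)

-- The sporadic queries that do need the address join the tables back together.
SELECT e.Name, a.Address
FROM Employees e
JOIN EmployeeAddresses a ON a.EmployeeID = e.EmployeeID
```

Queries that need only the name and SSN now read narrower rows, so more rows fit on each data page.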

Disk and Database Partitioning

Occasionally, you might have a few monolithic tables containing more than a few million rows each, with the rest of the tables in the database being fairly small and very efficient. If you placed the large tables on a separate physical disk, you could significantly speed up queries against the smaller tables in the rest of the database. This process is called disk partitioning.
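In SQL Server, placing a table on a separate disk is done through filegroups. The database name, filegroup name, and file path below are hypothetical:

```sql
-- Add a filegroup backed by a file on a second physical disk.
ALTER DATABASE Sales ADD FILEGROUP LargeTables
ALTER DATABASE Sales ADD FILE (
    NAME = 'Sales_Large',
    FILENAME = 'E:\Data\Sales_Large.ndf'   -- path on the separate disk
) TO FILEGROUP LargeTables
GO

-- Create the monolithic table on that filegroup,
-- away from the smaller tables on the default disk.
CREATE TABLE OrderHistory (
    OrderID   INT NOT NULL,
    OrderDate DATETIME NOT NULL
) ON LargeTables
```

I/O against OrderHistory now lands on its own spindle, so heavy scans of the large table no longer compete with the smaller tables for disk throughput.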

SQL Server also enables you to place larger tables not just on a different disk within the same server, but also on another server running a different instance of SQL Server. This process, called database partitioning, is done via federated databases and requires that the server making the request add the other server as a linked server. Federated databases are not implemented in other industry-leading database products, giving SQL Server an advantage over its main competitors. Imagine dealing with several terabytes' worth of data: only so much disk space can be added to a single server, but if more servers can be added, physical space ceases to be an issue.
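A minimal sketch of the linked-server step, assuming a hypothetical remote server named SERVER2 and member tables CustomersA (local) and CustomersB (remote) that have already been split by key range:

```sql
-- Register the remote server that holds the other member table.
EXEC sp_addlinkedserver
    @server = 'SERVER2',
    @srvproduct = '',
    @provider = 'SQLOLEDB',
    @datasrc = 'SERVER2'
GO

-- A distributed partitioned view spans the local table and the
-- linked server's table, using four-part names for the remote side.
CREATE VIEW AllCustomers AS
    SELECT CustomerID, Name FROM Sales.dbo.CustomersA
    UNION ALL
    SELECT CustomerID, Name FROM SERVER2.Sales.dbo.CustomersB
```

Each participating server defines the same view, so an application can query AllCustomers on either server and reach every partition of the federation.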

Denormalization

In Chapter 8, you created your normalization scheme, which included eliminating redundant data. In theory, normalization produces an ideal data model, but in practice, introducing some redundancy into your tables can increase performance. Denormalization can be beneficial for a query that performs an inner join between two tables, one small and compact and the other a large reference table. If the large reference table is taken out of the join and the values it supplies are copied into the smaller table, queries against the smaller table run faster.
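The join-elimination idea can be sketched with two hypothetical tables, a small OrderItems table and a large Products reference table:

```sql
-- Normalized: every query joins to the large Products table
-- just to display the product name.
SELECT oi.OrderID, p.ProductName
FROM OrderItems oi
JOIN Products p ON p.ProductID = oi.ProductID

-- Denormalized: copy the name into OrderItems so the join disappears.
ALTER TABLE OrderItems ADD ProductName VARCHAR(100)
GO
UPDATE oi
SET ProductName = p.ProductName
FROM OrderItems oi
JOIN Products p ON p.ProductID = oi.ProductID
GO

-- Reads are now served from the small table alone.
SELECT OrderID, ProductName FROM OrderItems
```

The cost of this redundancy is that updates to a product name must now touch both tables, which is why the technique suits read-heavy reporting databases better than transaction-heavy ones.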

Denormalization is commonly practiced in data warehousing and reporting databases. Transaction-heavy e-commerce databases usually shy away from denormalization because they must remain as dynamic as possible at all times. There are also instances of "overnormalization," in which retrieving a required record is possible only after joining three or more tables; normalization in such cases can prove impractical for the needed result set. The decision to denormalize, and the extent of denormalization, are both judgment calls to a certain degree. Running a trace of an average working day and checking where normalization might have been taken to the extreme are always good ideas.



MCSD Self-Paced Training Kit: Analyzing Requirements and Defining Microsoft .NET Solution Architectures, Exam 70-300 (Pro-Certification)
ISBN: 0735618941
Year: 2006
Pages: 175
