Certification Objective 8.04: Creating the Physical Design for the Data Model | MCSD Analyzing Requirements and Defining .NET Solutions Architectures Study Guide (Exam 70-300 (Certification Press)

For relational databases, transforming the logical database design into a physical model is often not that difficult. The logical model does a good job of identifying the tables, their fields, the primary key or keys, and the relationships with other tables.

In order to convert a logical design into a physical one, three key properties must be added to the data model:

Assign a system data type to each of the fields
Define the physical file structure of the database, including file names and locations
Identify frequently searched fields that need additional indexes for performance reasons

Exam Watch
The MCSD 70-300 does not contain specific SQL syntax questions, although you have to understand how databases are modeled.

In this section, we will examine the issues related to designing the physical database design model.

Creating an Indexing Specification

In relational databases, an index is a special database construct that makes searching for data faster on a particular column or columns within a data table. There is a slight additional overhead for maintaining each index, so it is important not to index every column on every table. However, if there is a particular column that will frequently be used as one of the criteria in a search or a join operation, you should consider creating an index for that field.

Application developers do not need to specify particular indexes in their searches. The database server will intelligently make use of appropriate indexes when executing search or join operations by creating an execution plan. One of the important tools for database developers when tuning their applications for performance is to examine the execution plans for all the application queries, to see if adding more indexes could improve performance.

One important thing to remember when designing an indexing strategy is that primary keys do not need to be indexed. Primary keys are, by definition, already indexes. Databases already do their best to make searches against a primary key one of the most efficient database operations imaginable. On the other hand, foreign keys are not automatically indexes. Foreign keys are the most likely candidates to become indexes, because they are usually involved in join operations.

Generally speaking, there are two types of indexes: clustered and unclustered. A clustered index simply means that the data table is actually physically sorted according to the index itself. Any table can have at most one clustered index—usually the primary key. Unclustered indexes do not affect the order in which data is sorted on the physical disk drive. The database keeps a separate record of the sort order of the index, and can quickly link them to the existing sort order.

Certain columns make better indexes than others. Here are a few general guidelines for when to create indexes:

Columns that are frequently sorted on using the ORDER BY clause should be indexed.
Columns that participate in joins should always be indexed.
The column that stores the primary key of the table is frequently made a clustered index, especially if it is used in join operations. There can only be one clustered index per table.
A column that is often searched for in ranges of values could potentially be made a clustered index.
Take the nature of your application into account. If the nature of your application is to frequently write to the database (such as an online transaction processing, or OLTP, application), then indexes can slow down your application. However, if the nature of your application is to frequently read from the database (such as an online analytical processing, or OLAP, application), then indexes will often speed up your application considerably.

Creating new indexes is often a task that happens near the end of the developing phase. A database administrator (DBA), working with members of the development team, can analyze the performance bottlenecks of all of the application’s queries and reports.

Partitioning Data

In SQL Server, you can create a database table that is stored on two or more database servers. One way to do this is to split the data into horizontal partitions. A horizontal partition is a database table that only stores a subset of the total data. For example, look at the Sales table in Table 8-1.

A horizontally partitioned database table splits the database table based on the value in one of the columns. Looking at the example from Table 8-1, and assuming this Sales table consists of millions of data rows similar to those examples shown, we can split this table between the two sales offices: office number 32 and office number 13. We can set up and install two SQL Server machines, one in each of the offices, to handle the primary database workload. The tables are separate from each other, as each machine only handles search requests for the data in the tables it manages. But behind the scenes the two databases are linked, as you could easily run a search request against all the rows in the Sales database, and the results would return matches from both offices.

One of the biggest drawbacks to using partitioned tables is that you use the ability to manage unique keys between the two servers. Therefore, you must partition the tables based on a column that is unique to each one, such as the Office ID. Even though Sales ID would normally server on its own as a primary key, because the tables are partitioned the Office ID becomes part of the key as well.