Defining SQL Server Tables

There are two basic categories of tables in SQL Server: user created and system defined. Tables are generally used to store data, but each database contains a number of tables other than those used to store data. These tables store information that enables SQL Server to keep track of objects and procedures within a database. The sysobjects and syscomments system tables maintain entries containing the object definitions and other tracking information for each object. Various other tables also exist to maintain information about specific objects.

For more information regarding system tables, refer to SQL Server Books Online. System tables are used whenever SQL Server needs object information. You should never alter system tables directly; instead, you should allow SQL Server to manipulate the entries as needed. Many stored procedure and system views have been created to minimize the need to query the tables directly.

Exam Alert

Querying system tables is not the preferred method for obtaining system data. Using store procedures and dynamic management views are preferred methods.

In many systems, prior to creating tables (or any object, for that matter), you must determine the purpose of the table. If it is to hold data that is frequently accessed, the file placement of the table should take that into consideration. Tables that hold archive data and other less frequently accessed data require less maintenance than more volatile tables and don't have to be as responsive to user queries, so you should consider that as well.

Keep in mind when assigning objects to files that you can place some objects away from the mainstream data by using filegroups. You can select the object placement from Table Design Properties in the Enterprise Manager or through the use of an ON clause in a CREATE/ALTER statement. SQL Server enables you to place the following table objects:

Tables
Indexes
text, ntext, or image data

You don't necessarily always move objects around. Most applications are not large enough to justify these measures. You need to move objects around only in very large database systems.

When you create a table, you define the structure of the rows and columns (or records and fields, if you prefer) of the table. The objects associated with a table are actually considerably more than just column definitions. In SQL Server 2005, six types of objects are associated with a table definition

Columns These are the fields or attributes of the records in a table.
Keys Keys are primary or secondary identifiers of a table and table relationship.
Constraints A constraint is a regulation placed on data in a table.
Triggers Triggers are actions that automatically occur as additions, changes, and deletions of data occur.
Indexes An index is an object that controls the viewing of data in a particular sequence.
Statistics Statistical information helps the SQL Server Query Optimizer resolve queries in the most efficient manner possible.

Each of these objects plays an important role in a table. The objects are all closely related. In most cases, a setting in one of the objects causes an action or a reaction to occur in another object. Most of this activity occurs at the time of data entry. As additions, changes, and deletions are performed against the data, activities can be configured to aid in keeping the system functional and reliable.

When a table is created, you assign each column in the table a number of attributes that make up the column definition. The minimum requirements for column attributes are a name for the attribute and a data type.

Using Types and Schemas

The data types in SQL Server 2005 span more than just numbers, letters, and special characters. Table 2.1 summarizes the available data types.

Table 2.1. SQL Server 2005 Data Types
Data Type	Type	Byte Size	Notes
`bigint`	Exact numeric	8 bytes
`binary`	Binary string	1 byte per character
`bit`	Exact numeric	1 byte per 8 or less in table
`char`	Character	1 byte per character
`cursor`	Transact SQL cursor	N/A	Cannot be assigned as a column data type
`datetime`	Date and time	8 bytes
`decimal`	Exact numeric	5, 9, 13, or 17 bytes (depending on the precision)	Equivalent to `numeric`
`float`	Approximate	4 or 8 bytes (depending numeric on the precision)
`image`	Binary string	2TB maximum
`int`	Exact numeric	4 bytes
`money`	Exact numeric	8 bytes
`nchar`	Unicode character	2 bytes per character
`ntext`	Unicode character	2 bytes per character and 1TB maximum
`numeric`	Exact numeric	5, 9, 13, or 17 bytes (depending on the precision)	Equivalent to `decimal`
`nvarchar`	Unicode character	2 bytes per character
`real`	Approximate numeric	4 bytes
`smalldatetime`	Date and time	4 bytes
`smallint`	Exact numeric	2 bytes
`smallmoney`	Exact numeric	4 bytes
`sql_variant`	Undetermined/any	Varies with content	Can store any data type except `text`, `ntext`, `image`, `timestamp`, and `sql_variant`
table	SQL Server table	N/A	Cannot be assigned as a column data type
`text`	Character	1 byte per character and 2TB maximum
`timestamp`	Auto-generated	8 bytes	Auto-generated value
`tinyint`	Exact numeric	1 byte
`varbinary`	Binary string	1 byte per character
`varchar`	Character	1 byte per character
`uniqueidentifier`	GUID	16 bytes	Globally unique identifier
`xml`	XML document	2GB maximum

When you select the data type for any given column, you should keep in mind the maximum content of the column. You should also consider the presence or absence of a key field or a column in an index because these should be kept as small as possible.

User-Defined Types (UDTs)

There are two types of UDTs: standard and CLR UDTs. In the simplest form, UDTs simply remap system data types and apply some limits. More advanced UDTs can execute code to provide more diverse types. Some potential implementations of UDTs include those that use date, time, currency, and extended numeric types in a manner other than is offered by default data type behavior. Applications that utilize nonstandard data where system types can't be applied, such as geographical or other complex graphical data, are good candidates for UDTs. Also, if you use encoded or encrypted data, you can use UDTs.

UDTs are stored as database objects and are based on any of the system data types. UDTs can be used when several tables must store the same type of data in a column and you must ensure that these columns have exactly the same data type, length, and nullability.

Using UDTs can help you create tables more quickly and can also help you control the data in a predictable manner. Often, a UDT is created in the model database; it then exists in all new user-defined databases created.

The 70-431 exam might include questions that deal with the selection of the most appropriate or most efficient data type. You can use the byte sizes listed in Table 2.1 to help determine the correct type. You might also get questions in which you must differentiate between standard UDT use and CLR UDTs. You might also need to decide between the use of UDTs and user-defined functions (UDFs).

CLR UDTs

UDTs are programmability database objects that can take advantage of the CLR provided in SQL Server 2005. These CLR objects are programmed against an assembly created in the Microsoft .NET Framework. The ability to execute CLR code is set to OFF by default. You can enable CLR code by using sp_configure or via SQL Server Surface Area Configuration for Features.

Exam Alert

With the CLR being the latest and greatest thing to hit SQL Server, you can bet that you will see a question or two on the topic. Remember, just because it's possible to use other languages doesn't mean that doing so is the best solution. With a standard data task, T-SQL is more efficient than the CLR. If a task is a little out of the ordinary and requires an extra level of processing capabilities, you can effectively use the CLR.

Something that has always been a problem with UDTs in the past is the inability to encompass business logic into the type definition. With use of the CLR in complex processing scenarios, however, anything is possible. Although it is possible, though, you should not try to throw the CLR at every problem. T-SQL still handles data issues more effectively than the CLR.

You should use the CLR only when necessary because there is a considerable amount of overhead associated with using CLR types. Most data-checking mechanisms should be performed by other features of the database system, where they are performed in a more efficient and effective manner. One potential use of the CLR would be in situations where data types are set up in class/subclass scenarios.

XML Schema Collections

SQL Server provides for storage of a full XML document within a single column through the xml data type. You can associate XSD schemas with XML data through the association of an XML schema collection. The XML schema collection stores imported schemas. You can use these imported schemas to validate XML data or to apply type information for the XML data as it is stored in the database. XML is such an intricate part of SQL Server databases that this book provides a whole chapter on it. Refer to Chapter 4, "Supporting the XML Framework," for more information on using XML and XSD.

Other Attributes

You can set the NULL or NOT NULL property when you define a column. If a column is set to NULL, it allows empty entries. If a column is set to NOT NULL, the column must have an entry during entry or alteration of row data. If a column is to be included as the primary key or as a portion of the primary key, then it is set to NOT NULL and cannot be altered.

You might want a column to have a default value. The best way to achieve this is to use the DEFAULT option of the CREATE and ALTER TABLE statements. For backward compatibility, you can programmatically create and bind a DEFAULT object, but you should not do this because it will not be supported in future versions of SQL Server.

Using Columns with Automated or Calculated Values

Many types of columns achieve their resulting values through calculations or other internally automated mechanisms. Several of these column types are defined in the following sections, and each is used for a specific reason.

Computed Columns

When a column is defined, it can be defined so as to get its results from the result of a calculation. This type of column is referred to as a computed column. You might want to use a computed column in a denormalized data framework to avoid having to do complex or frequently repeated calculations within an application. In this case, the calculation is performed when the data is entered or when the data the formula is based on changes.

A computed column is calculated based on an expression that can use other columns in the same table. The expression can be a noncomputed column name, constant, function, variable, or any combination of these, connected by one or more operators. About the only restriction is that the expression cannot be a subquery.

Identity Columns

Another type of field that has its value automatically calculated and entered is an identity column. If you select a column to be used as an identity column, the value is calculated based on a seed starting point for the table, and an incremental value is added each time the calculation occurs.

Identity columns are useful when you want to achieve unique column content in an increasing (or decreasing) manner. Identity columns are often candidates for the primary key because they contain unique values.

Columns with `timestamp` Data Types

The timestamp data type is a binary data type. Oddly enough, everyone seems to assume that it is a time-valued variable, but timestamp has no time-related values, at least not from a calendar or clock perspective. The content is derived through the fact that each database has a counter that is incremented for each insert or update operation performed. This counter is the database timestamp. A table can have only one column that uses the timestamp data type. Each generated timestamp is guaranteed to be unique within the database. The timestamp data type is useful for concurrency logicthat is, for detecting whether a row has been updated since it was last read.

Maintaining Order by Using Indexes

Putting data into sequence to accommodate quick retrieval and to provide meaningful and usable output for an application usually requires that a variety of indexes be defined. A clustered index provides the physical order of the data being stored, whereas a nonclustered index provides an ordered list with pointers to the physical location of the data.

You can most easily understand indexing if you compare the data and index storage of a database to that of a book. In a book, the data itself is placed on the pages in a sequence that is meaningful if you read the book sequentially from cover to cover. An index at the back of the book enables you to read the data in a different order. You can locate a topic by looking through a list of topics that is accompanied by a physical page reference to the place where the topic can be found. To read a single topic, you need not skim through the entire book; you simply need to use the index to find the topic. In a similar manner, data in a database can be handled randomly or in sequence. You can locate a single record in the database by looking it up in the index rather than reading through all the rest of the data. Conversely, if a report is to be generated from all the data in a database, you can read the data sequentially, in its entirety.

Index storage in SQL Server has a B-tree storage structure. The indexes are maintained in 8KB pages classified as root, intermediate, and leaf-level pages. In a clustered index, the leaf level is the data itself, and all other levels represent index pages. In a nonclustered index, all pages contain indexes (see Figure 2.3).

Figure 2.3. The B-tree structure used for clustered indexes.

If a clustered index has not been defined for a given table, the data is stored in a heap. A data heap does not maintain data in any particular order; it simply stores the data in the order in which it is entered. In some applications, in which data is never retrieved in any particular order on a regular basis, this might actually be advantageous.

You can create indexes by using the T-SQL CREATE INDEX command. When creating indexes, it is good practice to leave space for later insertions of data. The following example creates a compound, nonclustered index that is 75% full:

CREATE INDEX IXProductItem ON ProductOrderLine (OrderMateKey, ProductLineKey) WITH FILLFACTOR = 75

The two different organizations of indexes, clustered and nonclustered, provide for the ordering of data in two ways: either through physically rearranging the data, as in a clustered index, or through the use of data pointers, as in a nonclustered index. If the organization is not specified, as in the previous example, nonclustered is the default behavior.

Indexing Through Reordering: Clustered Indexing

Selecting the appropriate column(s) on which to base a clustered index is important for several reasons. As previously mentioned, a clustered index represents the order in which data is physically stored on the disk. For this reason, you can define only a single clustered index for any table. If you choose not to use a clustered index in a table, the data on disk is stored in a heap. A clustered index, if present, has clustering keys that are used by all nonclustered indexes to determine the physical location of the data.

The basis for the index is usually determined by the order in which the majority of applications and queries want their output. The clustered index values are also present in other indexes, and the size of the defined index should be kept as small as possible. When you select a clustering key, you should try to utilize a numeric data type because character types cause index storage to occupy much more space.

You should always define a clustered index before you define any nonclustered indexes. If you do these tasks in reverse order, all nonclustered indexes rebuild themselves when the clustered index is created.

Indexing Through Data Pointers: Nonclustered Indexing

Nonclustered indexes provide a means of retrieving data from a database in an order other than that in which the data is physically stored. The only alternative to the use of these indexes is to use a sort operation, which would place undue overhead on the client system and might not produce the desired response times. A data sort implementation is usually performed for one-time operations only or for applications that will have very limited usage.

Although creating indexes increases performance and saves resources in a lot of cases, you should avoid creating indexes that will rarely be utilized. Each time a record is added to a table, all indexes in the table must be updated, and this might also cause undue system overhead. Therefore, it is important to carefully plan index usage.

One-of-a-Kind Indexing: Unique Indexing

When you create indexes, it is important to guarantee that each value is distinctive. This is particularly important for a primary key. SQL Server automatically applies a unique index to a primary key to ensure that each key value uniquely defines a row in the table. You might want to create additional unique indexes for columns that are not defined as the primary key.

Leaving Room for Additions in Indexes

The fill factor is the percentage at which SQL Server fills leaf-level pages upon creation of indexes. Provision for empty pages enables the server to insert additional rows without performing a page-split operation. A page split occurs when a new row is inserted into a table that has no empty space for its placement. As the storage pages fill, page splits occur, and they can hamper performance and increase fragmentation.

You normally find that queries (which read existing data) outweigh data updates by a substantial margin. Providing the extra room slows down the query process. Therefore, you might not want to adjust the FILLFACTOR setting at all in static systems where there are small numbers of additions.

On the other hand, setting the fill factor too low hampers read performance because the server must negotiate a series of empty pages to actually fetch the desired data. It is beneficial to specify a fill factor when you create an index on a table that already has data and will have a high volume of inserts. If you do not specify the fill factor when creating an index, the server default FILLFACTOR setting is chosen. You set the fill factor for a server through the Enterprise Manager or by using the sp_configure stored procedure.

The percentage value for the fill factor is not maintained over time; it applies only at the time of creation. Therefore, if inserts into a table occur frequently, it is important to take maintenance measures for rebuilding the indexes to ensure that the empty space is put back in place. You can rebuild a specific index by using the CREATE INDEX T-SQL command with the DROP EXISTING option. You can also defragment indexes by using the DBCC INDEXDEFRAG command, which also reapplies the fill factor.

The PADINDEX setting is closely related to the FILLFACTOR setting, to allow space to be left in non-leaf levels. You cannot specify PADINDEX by itself; you can use it only if you supply a fill factor. You do not provide a value for this setting; it matches the setting given for the fill factor.

Primary/Foreign Keys and Relationships

A table key is an attribute that is used to identify a particular row of a table. Both primary and foreign keys are defined in the form of constraints. These keys work together to accommodate table relationships. A foreign key refers to the primary key in the parent table, forming a one-to-one or one-to-many relationship. A many-to-many relationship is really two one-to-many relationships using a joining table.

When multiple tables maintained in a database are related to each other, you need to take some measures to ensure that the reliability of these relationships stays intact. To enforce referential integrity, you create a relationship between two tables. You can do this through the database diagram feature of the Enterprise Manager or by using the CREATE and ALTER TABLE T-SQL statements. Normally, you relate the referencing or foreign key of one table to the primary key or other unique value of a second table.

The `PRIMARY KEY` Constraint

A PRIMARY KEY constraint enforces entity integrity in that it does not permit any two rows in a table to have the same key value. This enables each row to be uniquely defined in its own right. Although a primary key should be created when a table is initially created, you can add or change a primary key at any time after table creation. You could add a primary key upon table creation as shown here:

CREATE TABLE Suppliers ( supid id NOT NULL    CONSTRAINT [UPKCL_supind] PRIMARY KEY CLUSTERED,   lname varchar (40) NOT NULL ,   fname varchar (20) NOT NULL ,   phone char (12) NOT NULL ,   address varchar (40) NULL ,   city varchar (20) NULL ,   state char (2) NULL ,   zip char (5) NULL ,   active bit NOT NULL ) ON [PRIMARY]

A primary key cannot have any duplicate values. SQL Server automatically creates a unique index to enforce the exclusiveness of each value. If a primary key is referenced by a foreign key in another table, the primary key cannot be removed unless the foreign key relationship is removed first.

Although it is not a requirement of the SQL Server database environment, a primary key should be defined for each table. A primary key helps records maintain their identities as unique rows of a table and also provides a means of relating tables to other tables in the database to maintain normal forms. A foreign key is defined in a subsidiary table as a pointer to the primary key or another unique value in the primary table to create a relationship.

Table keys can be set up in several different types of relationships:

One-to-many relationships The most common relationships are one-to-many relationships, in which the unique value in one table has many subsidiary records in the second table.
One-to-one relationships Another form of relationship, which is normally used to split a table with an extraordinary number of columns, is a one-to-one relationship. You can use one-to-one relationships to split a table and associate a single unique value in one table with the same unique value in a second table.
Many-to-many relationships You can also define a many-to-many relationship, but this form of referencing requires three tables and is really two separate one-to-many relationships.

Utilizing referential integrity guidelines helps maintain the accuracy of data entered into a system. A database system uses referential integrity rules to prohibit subsidiary elements from being entered into the system unless a matching unique element is in the referenced table. The system also protects the data from changes and deletions, assuming that cascading actions have been carefully and properly implemented. (Cascading actions are discussed later in this chapter, in the section "Using Cascading Actions.")

The `FOREIGN KEY` Constraint

A FOREIGN KEY constraint is defined so that a primary and subsidiary table can be linked together by a common value. A foreign key can be linked to any unique column in the main table; it does not necessarily have to be linked to the primary key. It can be linked to any column that is associated with a unique index.

You can define a foreign key and its relationship when creating or altering a table definition. The following example defines a relationship by using T-SQL:

CREATE TABLE PurchaseOrderDetails      ( DetailsID           smallint,        OrderID             smallint           FOREIGN KEY (OrderID) REFERENCES PurchaseOrders(OrderID),        SupID             smallint           FOREIGN KEY (SupID) REFERENCES Suppliers(SupID),        QtyOrdered bigint,        WarehouseLocation  smallint       )

With a foreign key defined, you cannot add a value to the foreign key column if a matching value is not present in the primary table. If a child entry with an ID is not found in the parent table, this is known as an orphan child and is a breach of referential integrity rules.

Using Cascading Actions

Cascading actions affect update and delete activity when an existing foreign key value is changed or removed. You control cascading action by using the CREATE and ALTER TABLE statements, with clauses for ON DELETE and ON UPDATE. You can also select these features by using the Enterprise Manager.

Exam Alert

You can expect the 70-431 exam to ask something about cascading actions. You should also be prepared for the exam by knowing all the results and implications of cascading actions. For example, you might be asked what occurs when a record contained in the parent table is deleted or has its key value changed.

In a cascading update, when you change the value of a key in a situation in which a foreign key in another table references the key value, those changed values are reflected in the other tables. Something similar happens with a delete operation: If a record is deleted, all subsidiary records in other tables are also deleted. For example, if an invoice record is deleted from an invoice table that has invoice details stored in another table and referenced by a foreign key, the details are also removed.

A series of cascading actions could easily result from the update or deletion of important keys. For example, the deletion of a customer could cause the deletion of all that customer's orders, which could cause the deletion of all the customer's invoices, which in turn could cause the deletion of all of the customer's invoice details. For this reason, careful system design is important, and you should consider archiving data through the use of triggers.

In the case of multiple cascading actions, all the triggers to be fired by the effects of the original deletion fire first. AFTER TRiggers then fire on the original table, and then the AFTER TRiggers in the table chain subsequently fire. As discussed in the following section, SQL Server 2005 has two different styles of triggers:

Data Manipulation Language (DML) triggers are familiar to most SQL Server technicians.
Data Definition Language (DDL) triggers are new to SQL Server.

Using DML and DDL Triggers

A DML trigger is like a stored procedure in that it contains a set of T-SQL statements saved for future execution. The big difference is that, unlike stored procedures, DML triggers are executed automatically, based on data activity in a table. A DML trigger may fire based on UPDATE, INSERT, or DELETE operations.

Exam Alert

Be sure to recognize these two anagrams because they may be the only clues given as to where the triggers are attached:

DML correlates to IUD: INSERT, UPDATE, DELETE.
DDL correlates to CAD: CREATE, ALTER, DROP.

DDL triggers, also referred to as database triggers, fire procedures in response to DDL statements. DDL statements start with CREATE, ALTER, and DROP. DDL triggers can be used for administrative tasks such as auditing and regulating database operations. DDL triggers fire after an event has occurred. You cannot define an INSTEAD OF operation on a DDL trigger. In SQL Server, DML triggers can be fired after an operation completes (the SQL Server default) or instead of the triggering operation. You can use an AFTER trigger to archive data when it is deleted, to send a notification that the new data has been added or changed, or to initiate any other process you might want to automate based on data activity. You can use an INSTEAD OF trigger to perform more advanced activities such as advanced data checking, to enable updates in a view to occur across multiple tables, or to perform many other functions that might be necessary in place of a triggering activity.

Triggers represent a mechanism in which code can be executed based on activity in the data.

Partitioning Tables

To support the partitioning of tables, you must be running SQL Server 2005 Enterprise Edition. Only the Enterprise Edition of SQL Server supports partitioning.

As discussed earlier in this chapter, tables store information about an entity, such as products or sales. Each table has attributes that describe only that entity. While a single table for each entity is the easiest to design and understand, such tables are not necessarily optimized for performance, scalability, and manageability, particularly as the tables grow larger.

Partitioning of data could allow data to be always available, even during periods of scheduled maintenance. By dividing the data into partitions, you can implement the maintenance process such that data on one partition is available while maintenance occurs on another.

When maintenance operations are performed, many costly effects can occur, such as degrading performance and data blocking problems. System and database backupsalthough being performed when the database is still onlinecould affect the overall performance of a server. There are also space, time, and operational costs associated with most maintenance activities. All this can negatively affect overall scalability of a server.

Partitioning Strategies

Partitions can be designed in a symmetric or asymmetric fashion, and although it is most useful to design symmetrically, the access requirements of a lot of systems necessitate an asymmetric design.

A symmetric design is one in which all related data is placed on the same server so that most queries do not have to cross network boundaries to access the data. It is also easier to manage data if the division of tables can be performed in such a manner that each server has the same amount of data. In most real-world applications, data is accessed in a random fashion that can make the designer lean toward an asymmetric implementation. The design can be configured so that one server has a larger role and/or contains more data than the others. You can improve performance if you weigh each server's use and make one server work harder on the partitioned applications because the other servers perform larger tasks that deal with other, unrelated processes.

Table Considerations

Table size is usually the primary factor in considering a table to be a candidate for partitioning. A table can also be considered if activities of other users or maintenance operations have a limiting effect on availability. You might want to implement partitioning in any scenario where performance is severely degraded or if a period of downtime is regularly needed for maintenance purposes.

Varying access patterns of the rows and columns within data could have an impact on the decision to partition a table. Performance could be degraded because of excessive reading of data when table scans are used to retrieve individual pieces of data.

A database doesn't necessarily have to be a large database to be problematic enough to warrant partitioning. You might want to consider partitioning any database that does not perform as desired. You also need to consider whether operational and maintenance costs have exceeded budgetary specifications.

Advantages of Partitioning

Using partitions aids in improving scalability and manageability of large tables or tables that have varying access patterns. Partitioning can help by dividing the data into smaller, more manageable sections.

If a table exists on a system that has multiple CPUs, partitioning the table can lead to better performance through execution of parallel operations. The performance of operations across extremely large data sets can benefit if operations are performed against individual subsets of the data in a parallel manner. In SQL Server 2005, related tables that are partitioned to the same partitioning key and the same partitioning function are aligned. When the Query Optimizer detects that two aligned tables are joined, it can join the data that resides on the same partitions first, and then it can combine the results with the results from other partitions.

Creating Partitioned Tables

To implement table partitioning, you follow these steps:

1.	Determine what you are using as the partitioning key and how many partitions you will use. Assume here that you are to divide the data into four partitions, based on alphabetic sorting: First partition: AF Second partition: GM Third partition: NS Fourth partition: TZ
2.	Create one or more filegroups. It is generally recommended that you have the same number of filegroups as partitions.
3.	Use the `CREATE PARTITION FUNCTION` statement to create a function in the current database that maps the rows of a table or index into partitions based on the values of a specified column. The number and domain of the partitions of a partitioned table or index are determined in a partition function.
4.	Use the `CREATE PARTITION SCHEME` statement to create a scheme in the current database that maps the partitions of a partitioned table or index to filegroups.
5.	Create the tables and other objects that are going to be stored on each partition.
6.	Create indexes and any other partition objects that have the same functions and create the partition scheme.

Let's first look at creating the filegroups used for data storage. To begin, you need to have created a database called ONE. You will need to add some filegroups and files to the database. You do this by using the ALTER DATABASE statement. You should probably create one filegroup/file for each partition. Depending on the actual physical storage, you might want to create multiple filegroups, each with one or more files. The following code creates four filegroups, each with a single file:

ALTER DATABASE ONE ADD FILEGROUP PartitionOne ALTER DATABASE ONE ADD FILEGROUP PartitionTwo ALTER DATABASE ONE ADD FILEGROUP PartitionTri ALTER DATABASE ONE ADD FILEGROUP PartitionFour GO ALTER DATABASE ONE ADD FILE (NAME = N'PartitionOne',     FILENAME = N'C:\PartitionData\Part1.ndf',     SIZE = 10MB, FILEGROWTH = 5MB)     TO FILEGROUP PartitionOne ALTER DATABASE ONE ADD FILE (NAME = N'PartitionTwo',     FILENAME = N'C:\PartitionData\Part2.ndf',     SIZE = 10MB, FILEGROWTH = 5MB)     TO FILEGROUP PartitionTwo ALTER DATABASE ONE ADD FILE (NAME = N'PartitionTri',     FILENAME = N'C:\PartitionData\Part3.ndf',     SIZE = 10MB, FILEGROWTH = 5MB)     TO FILEGROUP PartitionTri ALTER DATABASE ONE ADD FILE (NAME = N'PartitionFour',     FILENAME = N'C:\PartitionData\Part4.ndf',     SIZE = 10MB, FILEGROWTH = 5MB)     TO FILEGROUP PartitionFour GO

In this example, each partition file is in the same location, C:\PartitionData, which is a pre-created file folder on the hard drive where the data will be stored. In a production scenario, the partition files would each likely be stored on different volumes. Each file has appropriate autogrowth properties assigned.

Exam Alert

Pay close attention to the order in which objects are put in place:

The function definition has to be present to define a scheme based on that function.
The filegroups have to be present before creating the scheme that will use them.
The scheme has to be present for table creation to know where to store data.

The first true step in the partitioning process uses the CREATE PARTITION FUNCTION statement to provide the definition of how the data is to be split among the partitions. You need to supply three parameters to the statement.

The data type of the column used for determining the partitioning of the data
A range boundary indicator as to whether values fall left (the default) or right of the provided values
A list of the boundary values to be used in partitioning

To continue along with the example, the partition function creation would be similar to the following code:

CREATE PARTITION FUNCTION AlphaRangePartFun(char(30))     AS RANGE RIGHT FOR VALUES ('G', 'N', 'T') GO

Three values are provided, and using the RIGHT boundary element means that the value provided is the boundary element just to the right of the partition. In this example, G is the first letter to the right of the partition values, which will be A, B, C, D, E, and F. Also, N is the first letter to the right of G tHRough M, and T is the first letter to the right of N through S. The final partition would hold all remaining values to T tHRough Z. The last partition value set does not need to be defined.

When the partition function has been defined, you can proceed to define the scheme on which storage will be performed, using the previously defined function to divide the data into partitions. In defining the scheme, you use the CREATE PARTITION SCHEME statement and supply the name of the partition function to use as well as the names of the filegroups that hold the data for each partition. For this example, the code would be as follows:

CREATE PARTITION SCHEME AlphaRangePartSch      AS PARTITION AlphaRangePartFun     TO (PartitionOne, PartitionTwo, PartitionTri, PartitionFour) GO

Finally, you create the tables and any other objects you want to assign to the partitions. Notice in the following example that both the table and the index that are partitioned will be stored on AlphaRangePartSch, and the storage is based on the CUSTNAME field of the table:

CREATE TABLE dbo.CUSTOMERS(     CUSTNMBR char(15) NOT NULL,     CUSTNAME char(30) NOT NULL      CONSTRAINT AlphaCheckCon       CHECK (CUSTNAME >= 'A' AND CUSTNAME <= 'ZZZ'),     CNTCPRSN char(31),     ADDRESS1 char(31),     ADDRESS2 char(31),     ADDRESS3 char(31),     COUNTRY char(21),     CITY char(31),     STATE char(29),     ZIP char(11),     PHONE1 char(21),     PHONE2 char(21),     INACTIVE tinyint NOT NULL  CONSTRAINT [PKCUSTMR] PRIMARY KEY NONCLUSTERED   (CUSTNAME ASC) WITH (IGNORE_DUP_KEY = OFF)    ON AlphaRangePartSch(CUSTNAME))  ON AlphaRangePartSch(CUSTNAME) GO

After you finish creating the partitioned table, data begins to be stored across the partitions, based on the function you defined. You can then see the PARTITION FUNCTION and PARTITION SCHEME objects within the Storage folder of the ONE database within SQL Server Management Studio. The appropriate scheme then shows up as the storage location in both the table properties and the index properties.

Using `$PARTITION` in Queries

With a partitioned table, you can perform queries against the entire table in a normal manner. If you prefer to make a query against only a single partition, you can also easily perform that or any other similar query by using the $PARTITION function. You use this function to return the partition number. When you use it in combination with the other functions or aggregates, you can return some very useful information.

Exam Alert

$PARTITION is a function that returns a number that corresponds to the partition where the data is or would be. You can use this function in a number of ways whenever you need to know how data is dispersed.

You can use the $PARTITION function to test the function and scheme of any partitioned table. Even if a table has no data, you can find out which partition would be used by running a test similar to this:

USE ONE GO SELECT $Partition.AlphaRangePartFun ('Brown') SELECT $Partition.AlphaRangePartFun ('Jones') SELECT $Partition.AlphaRangePartFun ('Smith') SELECT $Partition.AlphaRangePartFun ('Zachary') GO

You can substitute any name in the single quotes. The value returned tells in which partition a record would be stored if it were inserted in the table. You can also use $PARTITION to easily track the record content of each partition. For example, the following query would return the number of records found in each of the partitions:

USE ONE GO SELECT $Partition.AlphaRangePartFun(CUSTNAME) AS Partition,     COUNT(*) AS Count FROM Customers     GROUP BY $Partition.AlphaRangePartFun(CUSTNAME)     ORDER BY Partition GO

In combination with grouping, in this manner you can get a good summary of the number of records stored within each partition. This is a useful view to set up in a production environment; you can easily convert the procedure to a view as follows:

CREATE VIEW PartitionCount AS     SELECT $Partition.AlphaRangePartFun(CUSTNAME) AS Partition,         COUNT(*) AS Count FROM Customers         GROUP BY $Partition.AlphaRangePartFun(CUSTNAME) GO SELECT * FROM PartitionCount     ORDER BY Partition GO

Using Types and Schemas

Table 2.1. SQL Server 2005 Data Types

User-Defined Types (UDTs)

CLR UDTs

XML Schema Collections

Other Attributes

Using Columns with Automated or Calculated Values

Computed Columns

Identity Columns

Columns with timestamp Data Types

Maintaining Order by Using Indexes

Figure 2.3. The B-tree structure used for clustered indexes.

Indexing Through Reordering: Clustered Indexing

Indexing Through Data Pointers: Nonclustered Indexing

One-of-a-Kind Indexing: Unique Indexing

Leaving Room for Additions in Indexes

Primary/Foreign Keys and Relationships

The PRIMARY KEY Constraint

The FOREIGN KEY Constraint

Using Cascading Actions

Using DML and DDL Triggers

Partitioning Tables

Partitioning Strategies

Table Considerations

Advantages of Partitioning

Creating Partitioned Tables

Using $PARTITION in Queries

Columns with `timestamp` Data Types

The `PRIMARY KEY` Constraint

The `FOREIGN KEY` Constraint

Using `$PARTITION` in Queries