Table and Index Partitioning | Inside Microsoft SQL Server 2005: The Storage Engine (Solid Quality Learning)

As we've already seen when looking at the metadata for table and index storage, partitioning is an integral feature of SQL Server space organization. Figure 6-6 illustrated the relationship between tables and indexes (hobts), partitions, and allocation units. To determine where a table or index is stored, you must reference the partition of the table or index. Tables and indexes that are built without any reference to partitions are considered to be stored on a single partition. One of the more useful metadata objects for retrieving information about data storage is the dynamic management view called sys.dm_db_partition_stats, which combines information found in sys.partitions, sys.allocation_units, and sys.indexes.

A partitioned object is one that is internally split into separate physical units that can be stored in different locations. Partitioning is invisible to the users and programmers, who can use Transact-SQL code to select from a partitioned table exactly the same way they select from a non-partitioned table. Creating large objects on multiple partitions improves the manageability and maintainability of your database system and can greatly enhance the performance of activities such as purging historic data and loading large amounts of data. In SQL Server 2000, partitioning is available only by manually creating a view that combines multiple tables. That functionality is referred to as partitioned views. The SQL Server 2005 built-in partitioning of tables and indexes has many advantages over partitioned views, including improved execution plans and fewer prerequisites for implementation.

A full discussion of all the benefits and uses of partitioning is outside the scope of this book. In this section, we'll focus primarily on the partitioning metadata. In Inside Microsoft SQL Server 2005: Tuning and Optimization, we'll look at the use of partitions for performance improvement and examine query plans involving partitioned tables and partitioned indexes.

Partition Functions and Partition Schemes

To understand the partitioning metadata, we need a little background into how partitions are defined. I will use an example based on the SQL Server samples, combining the scripts in PartitionAW.sql and sliding.sql. You can find my script, called Partition.sql, with the companion content. This script defines two tables: TransactionHistory and TransactionHistoryArchive, along with a clustered index and two nonclustered indexes on each. Both tables are partitioned on the TransactionDate column, with each month of data in a separate partition. Initially there are 12 partitions in TransactionHistory and 2 in TransactionHistoryArchive.

Before you create a partitioned table or index, you must define a partition function, which is basically just a set of endpoints. The number of endpoints will be one less than the number of partitions. Here is the partition function that my example will use:

CREATE PARTITION FUNCTION [TransactionRangePF1] (datetime) AS RANGE RIGHT FOR VALUES ('20031001', '20031101', '20031201',                '20040101', '20040201', '20040301', '20040401',                '20040501', '20040601', '20040701', '20040801');

Note that the table name is not mentioned in the function definition because the partition function is not tied to any one particular table. The function TransactionRangePF1 divides the data into 12 partitions because there are 11 datetime endpoints. The keyword RIGHT specifies that any value that equals one of the endpoints will go into the partition to the right of the endpoint. So for this function, all values less than October 1, 2003, will go in the first partition, and values greater than or equal to October 1, 2003, and less than November 1, 2003, will go in the second partition. I could have also specified LEFT (which is the default), in which case the value equal to the endpoint would go in the partition to the left. After you define the partition function, you define a partition scheme, which lists a set of filegroups onto which each range of data will be placed. Here is the partition schema for my example:

CREATE PARTITION SCHEME [TransactionsPS1] AS PARTITION [TransactionRangePF1] TO ([PRIMARY], [PRIMARY], [PRIMARY] , [PRIMARY], [PRIMARY], [PRIMARY] , [PRIMARY], [PRIMARY], [PRIMARY] , [PRIMARY], [PRIMARY], [PRIMARY]); GO

To avoid having to create 12 files and filegroups, I have put all the partitions on the PRIMARY filegroup, but for the full benefit of partitioning, you would most likely have each partition on its own filegroup. The CREATE PARTITION SCHEME command must list at least as many filegroups as there are partitions, but there can be more. And as you can see in the example, the listed filegroups do not have to be unique. Additional filegroups will be used in order, as more partitions are added, which can happen when a partition function is altered to split an existing range into two. If you do not specify extra filegroups at the time you create the partition scheme, you can alter the partition scheme to add another filegroup.

The partition function and partition scheme for a second table are shown here:

CREATE PARTITION FUNCTION [TransactionArchivePF2] (datetime) AS RANGE RIGHT FOR VALUES ('20030901'); GO CREATE PARTITION SCHEME [TransactionArchivePS2] AS PARTITION [TransactionArchivePF2] TO ([PRIMARY], [PRIMARY]); GO

My script then creates two tables and loads data into them. I will not include all the details here. To partition a table, you must specify a partition scheme in the table creation statement. I create a table called TransactionArchive that includes this line as the last part of the CREATE TABLE:

ON [TransactionsPS1] (TransactionDate)

My second table, TransactionArchiveHistory, is created using the TransactionsPS1 partitioning scheme.

My script then loads data into the two tables, and because the partition scheme has already been defined, as the data is loaded each row is placed in the appropriate partition. After the tables are loaded, we can examine the metadata.

Metadata for Partitioning

Figure 7-8 shows most of the catalog views for retrieving information about partitions. Along the left and bottom edges you can see the sys.tables, sys.indexes, sys.partitions, and sys.allocation_units that I've discussed already.

Figure 7-8. Catalog views containing metadata relevant to partitioning and data storage

In some of my queries, I will use the undocumented sys.system_internals_allocation_units view instead of sys.allocation_units in order to retrieve page address information. Below, I'll describe the most relevant columns of each of the these views:

Sys.data_spaces has a primary key called data_space_id, which is either a partition ID or a filegroup ID, and there is one row for each filegroup and one row for each partition scheme. One of the columns in sys.data_spaces specifies to which type of data space the row refers. If the row refers to a partition scheme, data_space_id can be joined with sys.partition_schemes.data_space_id. If the row refers to a filegroup, data_space_id can be joined with sys.filegroups.data_space_id. The sys.indexes view also has a data_space_id column to indicate how each heap or B-tree stored in sys.indexes is stored. So, if we know that a table is partitioned, we can directly join it with sys.partition_schemes without going through sys.data_spaces. Alternatively, you can use the following query to determine whether a table is partitioned, by replacing Production.TransactionHistoryArchive with the name of the table in which you're interested:
```
SELECT DISTINCT object_name(object_id) as TableName,            ISNULL(ps.name, 'Not partitioned') as PartitionScheme    FROM (sys.indexes i LEFT  JOIN sys.partition_schemes ps                   ON (i.data_space_id = ps.data_space_id))    WHERE (i.object_id = object_id('Production.TransactionHistoryArchive'))              AND  (i.index_id IN (0,1));
```
Sys.partition_schemes has one row for each partition scheme. In addition to the data_space_id and the name of the partition scheme, it has a function_id column to join with sys.partition_functions.
Sys.destination_data_spaces is a linking table because sys.partition_schemes and sys.filegroups are in a many-to-many relationship with each other. For each partition scheme, there is one row for each partition. The partition number is in the destination_id column, and the filegroup ID is stored in the data_space_id column.
Sys.partition_functions contains one row for each partition function, and its primary key function_id is a foreign key in sys.partition_schemes.
Sys.partition_range_values (not shown) has one row for each endpoint of each partition function. Its function_id column can be joined with sys.partition_functions, and its boundary_id column can join with either partition_id in sys.partitions or with destination_id in sys.destination_data_spaces.

These views have other columns that I haven't mentioned, and there are additional views that provide information, such as the columns and their data types that the partitioning is based on. However, the preceding information should be sufficient to understand Figure 7-8 and the view below. This view returns information about each partition of each partitioned table. The WHERE clause filters out partitioned indexes (other than the clustered index), but you can change that condition if you desire. When selecting from the view, you can add your own WHERE clause to find information about just the table you're interested in.

CREATE VIEW Partition_Info AS SELECT OBJECT_NAME(i.object_id) as Object_Name, dbo.INDEX_NAME(i.object_id,i.index_id) AS Index_Name,         p.partition_number, fg.name AS Filegroup_Name, rows, au.total_pages,         CASE boundary_value_on_right                    WHEN 1 THEN 'less than'                    ELSE 'less than or equal to' END as 'comparison', value FROM sys.partitions p JOIN sys.indexes i      ON p.object_id = i.object_id and p.index_id = i.index_id        JOIN sys.partition_schemes ps                 ON ps.data_space_id = i.data_space_id        JOIN sys.partition_functions f                    ON f.function_id = ps.function_id        LEFT JOIN sys.partition_range_values rv ON f.function_id = rv.function_id                     AND p.partition_number = rv.boundary_id      JOIN sys.destination_data_spaces dds              ON dds.partition_scheme_id = ps.data_space_id                   AND dds.destination_id = p.partition_number      JOIN sys.filegroups fg                 ON dds.data_space_id = fg.data_space_id      JOIN (SELECT container_id, sum(total_pages) as total_pages                      FROM sys.allocation_units                      GROUP BY container_id) AS au                 ON au.container_id = p.partition_id WHERE i.index_id <2;

The LEFT JOIN is needed to get all the partitions because the sys.partition_range_values view has a row only for each boundary value, not for each partition. The LEFT JOIN gives the last partition with a boundary value of NULL, which means there is no upper limit to the last partition's value. There is a derived table that groups together all the rows in sys.allocation_units for a partition, so the space used for all the types of storage (in-row, row-overflow, and LOB) will be aggregated into a single value. This query will use the preceding view to get information about my TransactionHistory table's partitions:

SELECT * FROM Partition_Info WHERE Object_Name = 'TransactionHistory';

Here are my results:

Object_Name	partition_number	Filegroup_Name	rows	total_pages	comparison	value
TransactionHistory	1	PRIMARY	11155	209	less than	`2003-10-0`
TransactionHistory	2	PRIMARY	9339	177	less than	`2003-11-01`
TransactionHistory	3	PRIMARY	10169	185	less than	`2003-12-01`
TransactionHistory	4	PRIMARY	12181	225	less than	`2004-01-01`
TransactionHistory	5	PRIMARY	9558	177	less than	`2004-02-01`
TransactionHistory	6	PRIMARY	10217	193	less than	`2004-03-01`
TransactionHistory	7	PRIMARY	10703	201	less than	`2004-04-01`
TransactionHistory	8	PRIMARY	10640	193	less than	`2004-05-01`
TransactionHistory	9	PRIMARY	12508	225	less than	`2004-06-01`
TransactionHistory	10	PRIMARY	12585	233	less than	`2004-07-01`
TransactionHistory	11	PRIMARY	3380	73	less than	`2004-08-01`
TransactionHistory	12	PRIMARY	1008	33	less than	NULL

This view contains details about the boundary point of each partition, as well as the filegroup that each partition is stored on, the number of rows in each partition, and the amount of space used. Note that although the comparison indicates that the values in the partitioning column for the rows in a particular partition are less than the specified value, you should assume that it also means that the values are greater than or equal to the specified value in the preceding partition. However, this view doesn't provide information about where in the particular filegroup the data is located. We'll look at a metadata query that gives us location information in the next section.

Partition Power

One of the main benefits of partitioning your data is that you can move data from one partition to another as a metadata-only operation. The data itself doesn't have to move. As I mentioned, this is not intended to be a complete how-to guide to SQL Server 2005 partitioning; rather, it is a look into the internal storage of partitioning information. However, to show the internals of rearranging partitions, we need to look at some additional partitioning operations.

The main operation you use when working with partitions is the SWITCH option to the ALTER TABLE command. This option allows you to:

Assign a table as a partition to an already existing partitioned table
Switch a partition from one partitioned table to another
Reassign a partition to form a single table

In all these operations, no data is moved. Rather, the metadata is updated in the sys.partitions and sys.system_internals_allocation_units views to indicate that a given allocation unit now is part of a different partition. Let's look at an example. The following query returns information about each allocation unit in the first two partitions of my TransactionHistory and TransactionHistoryArchive tables, including the number of rows, the number of pages, the type of data in the allocation unit, and the page where the allocation unit starts.

SELECT convert(char(25),object_name(object_id)) AS name,     rows, convert(char(15),type_desc) as page_type_desc,     total_pages AS pages, first_page, index_id, partition_number FROM sys.partitions p JOIN sys.system_internals_allocation_units a      ON p.partition_id = a.container_id WHERE (object_id=object_id('[Production].[TransactionHistory]')    OR object_id=object_id('[Production].[TransactionHistoryArchive]'))   AND index_id = 1 AND partition_number <= 2;

Here is the data I get back. (I left out the page_type_desc because all the rows are of type IN_ROW_DATA.)

name                       rows    pages      first_page      index_id    partition_number -------------------------  ------- ---------- --------------  ----------- ----------------- TransactionHistory         11155   209        0xD81B00000100  1           1 TransactionHistory         9339    177        0xA82200000100  1           2 TransactionHistoryArchive  89253   1553       0x981B00000100  1           1 TransactionHistoryArchive  0       0          0x000000000000  1           2

Now let's move one of my partitions. My ultimate goal is to add a new partition to TransactionHistory to store a new month's worth of data and to move the oldest month's data into TransactionHistoryArchive. The partition function used by my TransactionHistory table divides the data into 12 partitions, and the last one contains all dates greater than or equal to August 1, 2004. I'm going to alter the partition function to put a new boundary point in for September 1, 2004, so the last partition will be split. Before doing that, I must ensure that the partition scheme using this function knows what filegroup to use for the newly created partition. With this command, some data movement will occur, and all data from the last partition of any tables using this partition scheme will be moved to a new allocation unit. Please refer to Books Online for the complete details about each of the following commands:

ALTER PARTITION SCHEME TransactionsPS1 NEXT USED [PRIMARY]; GO ALTER PARTITION FUNCTION TransactionRangePF1() SPLIT RANGE ('20040901'); GO

Next, I'll do something similar for the function and partition scheme used by TransactionHistoryArchive. In this case, I'll add a new boundary point for October 1, 2003.

ALTER PARTITION SCHEME TransactionArchivePS2 NEXT USED [PRIMARY]; GO ALTER PARTITION FUNCTION TransactionArchivePF2() SPLIT RANGE ('20031001'); GO

I want to move all data from TransactionHistory with dates earlier than October 1, 2003 to the second partition of TransactionHistoryArchive. However, the first partition of TransactionHistory technically has no lower limit; it is everything earlier than October 1, 2003. The second partition of TransactionHistoryArchive does have a lower limit, which is the first boundary point, or September 1, 2003. To SWITCH a partition from one table to another, I have to guarantee that all the data to be moved meets the requirements for the new location. So I add a CHECK constraint that guarantees that no data in TransactionHistory is earlier than September 1, 2003. After adding the CHECK constraint, I run the ALTER TABLE command with the SWITCH option to move the data in partition 1 of TransactionHistory to partition 2 of TransactionHistoryArchive. (For your testing purposes, you could try leaving out the next step that adds the constraint and try just executing the ALTER TABLE / SWITCH command. You'll get an error message, and then you add the constraint and run the SWITCH command again.)

ALTER TABLE [Production].[TransactionHistory] ADD CONSTRAINT [CK_TransactionHistory_DateRange] CHECK ([TransactionDate] >= '20030901'); GO ALTER TABLE [Production].[TransactionHistory] SWITCH PARTITION 1 TO [Production].[TransactionHistoryArchive] PARTITION 2; GO

Now we run the metadata query that examines the size and location of the first two partitions of each table:

SELECT convert(char(25),object_name(object_id)) AS name,     rows, convert(char(15),type_desc) as page_type_desc,     total_pages AS pages, first_page, index_id, partition_number FROM sys.partitions p JOIN sys.system_internals_allocation_units a      ON p.partition_id = a.container_id WHERE (object_id=object_id('[Production].[TransactionHistory]')    OR object_id=object_id('[Production].[TransactionHistoryArchive]'))   AND index_id = 1 AND partition_number <= 2; RESULTS: name                  rows    pages      first_page      index_id    partition_number --------------------  ------- ---------- --------------  ----------- ---------------- TransactionHistory    0       0          0x000000000000  1           1 TransactionHistory    9339    177        0xA82200000100  1           2 TransactionHistoryAr  89253   1553       0x981B00000100  1           1 TransactionHistoryAr  11155   209        0xD81B00000100  1           2

You'll notice that the second partition of TransactionHistoryArchive now has exactly the same information that the first partition of TransactionHistory had in the first result set. It has the same number or rows (11,155), the same number of pages (209), and the same starting page (0xD81B00000100, or file 1, page 7128). No data was moved; the only change was that the allocation unit starting at file 1, page 7128 is not recorded as belonging to the second partition of the TransactionHistoryArchive table.

Although my partitioning script created the indexes for my partitioned tables using the same partition scheme used for the tables themselves, this is not always necessary. An index for a partitioned table can be partitioned using the same partition scheme or a different one. If you do not specify a partition scheme or filegroup when you build an index on a partitioned table, the index is placed in the same partition scheme as the underlying table, using the same partitioning column. Indexes built on the same partition scheme as the base table are called aligned indexes. Whether to make your indexes aligned with an underlying partitioned table is primarily a performance-related decision; I will discuss it in Inside Microsoft SQL Server 2005: Tuning and Optimization.

However, there is an internal storage component associated with automatically aligned indexes. As previously mentioned, if you build an index on a partitioned table and do not specify a filegroup or partitioning scheme on which to place the index, SQL Server creates the index using the same partitioning scheme that the table uses. However, if the partitioning column is not part of the index definition, SQL Server adds the partitioning column as an extra included column in the index. If the index is clustered, adding an included column is not necessary because the clustered index always contains all the columns already. Another case in which SQL Server does not add an included column automatically is when you create a unique index, either clustered or nonclustered. Because it is a requirement of unique partitioned indexes that the partitioning column be contained in the unique key, a unique index for which you have not explicitly included the partitioning key will not be automatically partitioned.

Note

In addition to looking in Books Online, you can get more partitioning background, usage suggestions, and best practices by referring to the white paper titled "SQL Server 2005 Partitioned Tables and Indexes" by Kimberly Tripp, which is included with the companion content.