Partition Tables and Indexes


Now we'll dig into the details of some of these new features, starting with one that we're particularly excited about: partition tables and partition indexes. First you'll learn why you would use this new feature, and then how to use it; along the way, you'll discover what partition tables and indexes are and how they work.

Why Use Partition Tables and Indexes?

Partition tables are a way to spread a single table over multiple partitions, and while doing so each partition can be on a separate filegroup. There are several reasons for doing this, described in the following sections.

Faster and Easier Data Loading

If your database has a very large amount of data to load, you might want to consider using a partition table. Anytime we're talking about a very large amount of data, this isn't a specific amount of data, but any case in which the load operation takes longer than is acceptable in the production cycle. A partition table enables you to load the data to an empty table that's not in use by the "live" data, so it has less impact on concurrent live operations. Clearly, there will be an impact to the I/O subsystem, but if you also have separate filegroups on different physical disks, even this has a minimal impact on overall system performance.

Once the data is loaded to the new table, you can perform a switch to add the new table to the live data. This switch is a simple metadata change that executes very quickly, so partition tables are a great way to load large amounts of data with limited impact on users touching the rest of the data in the table.

Faster and Easier Data Deletion or Archival

For the very same reasons, partition tables also help you to delete or archive data. If your data is partitioned on boundaries that are also the natural boundaries on which you add or remove data, then the data is considered to be aligned. When your data is aligned, deleting or archiving data is as simple as switching a table out of the current partition, after which you can unload or archive it at your leisure.

There is a bit of a catch to this part: with archival, you often want to move the old data to slower or different storage. The switch operation is fast precisely because all it does is change metadata; it doesn't move any data around. To get the data from the filegroup where it lived onto the old, slow archival disks, you have to move it yourself, but you are moving it after the partition has been switched out of the partition table. Therefore, although the move may take quite some time, it has minimal impact on queries executing against the live data.

Faster Queries

We're sure the opportunity to get faster queries has you very interested. One thing that the Query Optimizer can do when querying a partition table is eliminate searching through partitions that it knows won't hold any results. This is referred to as partition elimination. This only works if the data in the partition table or index is aligned with the query. That is, the data has to be distributed among the partitions in a way that matches the search clause on the query. You will learn more details about this as we cover how to create a partition table.
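As a sketch of what partition elimination looks like in practice (this assumes the People table and PeopleRangePF1 function built later in this section), a query whose predicate is on the partitioning column lets the optimizer skip partitions that can't contain matching rows:

```sql
-- Because People is partitioned on DOB, this predicate lets the
-- optimizer read only the partitions covering the 1950s instead
-- of scanning every partition
select count(*)
from people
where dob >= '01/01/1950' and dob < '01/01/1960'

-- You can also ask which partition a given value maps to:
select $partition.PeopleRangePF1('06/15/1955') as [Partition Number]
```

In the actual execution plan, the seek or scan operator's properties show how many partitions were actually touched, which is how you confirm elimination took place.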

Sliding Windows

A sliding window is basically what we referred to earlier in the discussion about adding new data and then deleting or archiving old data. What we did was fill a new table, switch it into the live table, and then switch an existing partition out of the live table for archival or deletion. It's kind of like sliding a window of new data into the current partition table, and then sliding an old window of data out of the partition table.

Prerequisites for Partitioning

Before you get all excited about partition tables, you should remember that partitioning is only available with SQL Server 2005 Enterprise Edition. There are also some expectations about the hardware in use, in particular the storage system, although these are implicit expectations, and you can store the data anywhere you want. You just won't get the same performance benefits you would if you had a larger enterprise storage system with multiple disk groups dedicated to different partitions.

Creating Partition Tables

When you decide to create a partition table for the first time, you can get pretty lost in the documentation for partition functions, range left versus range right, partition schemes, and how to actually create something that would work, so we'll walk you through the steps for this process.

Creating a Partition Table from an Existing Table

Suppose that you're starting out with an existing table, and you want to turn it into a partition table. You face a bit of a limitation here: spreading the table over multiple filegroups would require physically moving all the data around, so this example partitions the table onto a partition scheme that keeps everything on a single filegroup.

The partition function is the function that determines how your data is split between partitions, so the first step is to create the partition function. Here is the code for a partition function that splits a table called People into multiple partitions based on the DOB field:

 -- Range partition function for the People table,
 -- every 10 years from 2006 - 110 = 1896,
 -- start the first partition at everything before 1 Jan 1900
 -- 1890-1900, 1900-1910, 1910-1920, 1920-1930, 1930-1940, 1940-1950, 1950-1960,
 -- 1960-1970, 1970-1980, 1980-1990, 1990-2000, 2000 onwards
 CREATE PARTITION FUNCTION [PeopleRangePF1] (datetime)
 AS RANGE RIGHT FOR VALUES (
    '01/01/1900', '01/01/1910', '01/01/1920', '01/01/1930',
    '01/01/1940', '01/01/1950', '01/01/1960', '01/01/1970',
    '01/01/1980', '01/01/1990', '01/01/2000', '01/01/2005',
    '01/01/2006'
 );
 GO

Most of this will look pretty familiar, but there is one new bit of syntax that's particularly perplexing, and that's RANGE RIGHT FOR VALUES. According to BOL, it determines how the boundary condition is applied at the actual range value; that is, it tells the function what to do with data that exactly matches a range boundary. In this case, the first range boundary is '01/01/1900'. What should the function do with data that matches that value exactly - does it go above or below? RANGE RIGHT tells the function to put the matching data into the partition on the right (higher values, above the range boundary), whereas RANGE LEFT tells the function to put it into the partition on the left (lower values, below the range boundary).
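To make the difference concrete, here is a small sketch using a pair of throwaway single-boundary functions (pfRight and pfLeft are invented names, not part of the People example) and $PARTITION to see where a value exactly on the boundary lands:

```sql
-- RANGE RIGHT: the boundary value belongs to the partition on its right
CREATE PARTITION FUNCTION pfRight (datetime)
AS RANGE RIGHT FOR VALUES ('01/01/1900');
-- RANGE LEFT: the boundary value belongs to the partition on its left
CREATE PARTITION FUNCTION pfLeft (datetime)
AS RANGE LEFT FOR VALUES ('01/01/1900');
GO
select $partition.pfRight('01/01/1900') as RightResult, -- 2 (upper partition)
       $partition.pfLeft('01/01/1900')  as LeftResult   -- 1 (lower partition)
```

The same input value lands in partition 2 under RANGE RIGHT but partition 1 under RANGE LEFT, which is the whole difference between the two options.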

Something else to watch for here when using datetime fields is the precise datetime you're matching on. In this example, you have the luxury of not worrying about the exact placement of a few values close to the boundary, which may get into the next partition. In a production system, you want to ensure that the partition holds exactly the values you want. In that case, you need to pay particular attention to the exact datetime constant you specify for the range boundary.
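One subtlety worth a quick sketch: the datetime type is accurate only to roughly 3.33 milliseconds, rounding values to .000, .003, or .007. That means a value intended to be "just before midnight" can round up across your boundary:

```sql
-- .999 rounds UP to midnight of the next day, so under RANGE RIGHT
-- this value lands in the 2000 partition, not the 1990s partition
select cast('12/31/1999 23:59:59.999' as datetime)

-- the latest datetime value that stays below the '01/01/2000'
-- boundary ends in .997
select cast('12/31/1999 23:59:59.997' as datetime)
```

This is exactly the kind of boundary placement you need to verify on a production system before relying on a partition holding precisely the rows you expect.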

The second step is to create the partition scheme, as follows:

 CREATE PARTITION SCHEME [PeoplePS1]
 AS PARTITION [PeopleRangePF1]
 TO ([PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY],
     [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY],
     [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY],
     [PRIMARY], [PRIMARY]);
 GO

This is pretty simple syntax. The main thing you need to look out for is the exact number of partitions to be created. This partition function created 13 boundaries, so you need to have 13+1 = 14 partitions for the data to go into.

Because you are partitioning an existing table, you have to keep all the data on the existing filegroup. For cases where you need to physically move the data to different filegroups, see the details on deleting or archiving for information about how to move data around most efficiently.
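If you want to double-check the function and scheme before applying them to any data, the partitioning catalog views show the boundary values and the filegroup each partition maps to. This is a sketch; the views and columns are standard, but the output is trimmed to the interesting columns:

```sql
-- list the boundary values defined by the partition function
select pf.name, prv.boundary_id, prv.value
from sys.partition_functions as pf
join sys.partition_range_values as prv
    on pf.function_id = prv.function_id
where pf.name = 'PeopleRangePF1'
order by prv.boundary_id

-- list the filegroup each partition of the scheme maps to
select ps.name as scheme_name, dds.destination_id, fg.name as filegroup_name
from sys.partition_schemes as ps
join sys.destination_data_spaces as dds
    on ps.data_space_id = dds.partition_scheme_id
join sys.filegroups as fg
    on dds.data_space_id = fg.data_space_id
where ps.name = 'PeoplePS1'
order by dds.destination_id
```

The first query should return 13 boundaries and the second 14 destinations, matching the 13+1 arithmetic above.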

The third step is applying the partition by creating a clustered index on the partition scheme. As the comment says, you already have data in your table, so now you have to spread it across the partitions, which you do by using a clustered index:

 -- we already have data in the table, so we have to partition it using a
 -- clustered index
 create clustered index cix_people on people (dob)
 on peoplePS1 (DOB)
 go

The fourth and final step is to check the system metadata to confirm that you have partitioned correctly. To do this there is some new syntax you can use to determine the partition on which that data lives: $partition. Here is an example of using this to see how many rows are on each partition:

 -- How many people do we have in the whole table?
 select count (*) from people -- 17,391,302

 -- this will tell us how many entries there are in each partition
 select $partition.PeopleRangePF1(dob) [Partition Number], count(*) as total
 from people
 group by $partition.PeopleRangePF1(dob)
 order by $partition.PeopleRangePF1(dob)
 /*
 Partition Number total
 ---------------- -----------
 1                630366
 2                1575914
 3                1574769
 4                1573157
 5                1573374
 6                1574617
 7                1575106
 8                1577206
 9                1578043
 10               1575668
 11               1576848
 12               788732
 13               157508
 14               59994
 */

Using this syntax, you can see the number of rows on each partition to confirm that the data is distributed as you planned.

Adding New Data to the Partition Table

Now that your table is partitioned, you need a way to add new data offline, and to switch that new data into the partition table. The first step creates a table to hold all the new data to be loaded. This needs to have an identical layout to the partitioned table:

 -- Now go create a new People table to hold any births after 7/11/2006
 -- MUST BE IDENTICAL to People
 create table newPeople (
   [personID] [uniqueidentifier] NULL DEFAULT (newsequentialid()),
   [firstName] [varchar](80) NOT NULL,
   [lastName] [varchar](80) NOT NULL,
   [DOB] [datetime] NOT NULL,
   [DOD] [datetime] NULL,
   [sex] [char](1) NOT NULL
 ) on [PRIMARY]   -- Must be on the same filegroup as the target partition!

After doing this, you need to create modified versions of the Birth and doBirths stored procedures to insert rows into the newPeople table, rather than into the People table. We called this new procedure doNewBirths. The change is shown as follows to make it clear that all we are doing is changing the table we insert into from People to newPeople.

Change the lines

 insert people ( firstName, lastname, dob, sex)
        values ( @BoysName, @lastName, getdate(), 'M')
 insert people ( firstName, lastname, dob, sex)
        values ( @GirlsName, @lastName, getdate(), 'F')

to these two lines:

 insert newPeople ( firstName, lastname, dob, sex)
        values ( @BoysName, @lastName, getdate(), 'M')
 insert newPeople ( firstName, lastname, dob, sex)
        values ( @GirlsName, @lastName, getdate(), 'F')

Inserting to the new table is amazingly fast because it's empty, and there are no indexes to slow down the insert rate. Now insert some data into the table - a few hundred thousand rows. Do this by running the new doNewBirths stored procedure. Changing the arguments will determine how many rows it inserts.

Here is a sample command line to insert 100,000 rows into the newpeople table. The two arguments for this procedure are @People int, which tells the procedure how many rows to insert, and @ReportInterval int, which tells the procedure how frequently to report insert stats. For 100,000 rows, we chose to report only every 10,000 rows:

 sqlcmd -S tracysLaptop -E -d people -Q "exec doNewBirths 100000, 10000"

You're now ready to move onto the next step of creating a clustered index to match the partition index:

 -- create the index on the new table.
 create clustered index cix_newPeople on newPeople(dob)

Creating the clustered index takes quite some time, but it doesn't affect your live system performance because it's not connected in any way to the live People table.

One more thing you need to do is create a check constraint on the new table to ensure that the data matches the partition function boundary you are just about to set up:

 -- Before we do the switch,
 -- create a check constraint on the source table
 -- to enforce the integrity of the data in the partition
 ALTER TABLE newPeople
 ADD CONSTRAINT [CK_DOB_DateRange]
 CHECK ([DOB] >= '7/11/2006');
 GO

Now you can start making changes to the live partition to prepare it for the new set of data you want to load. The first step here is to alter the partition scheme to specify where the new partition is going to live. This has to be on the same filegroup as the new table you just filled up. In this case, this is as easy as everything is on Primary anyway. In a production system with multiple filegroups, this would be one of the empty filegroups available. For more details on using multiple filegroups, see the "Partitioning a Production System" section coming up shortly.

 -- alter the partition scheme to ready a new empty partition
 ALTER PARTITION SCHEME PeoplePS1
 NEXT USED [PRIMARY];
 GO

The next step is to create the new partition boundary in the partition function. This is done by using an ALTER PARTITION function statement. In this case, you have a RANGE RIGHT function, your new data is for 11 July 2006, and you want all the new data to be in the new partition, so using RANGE RIGHT, you specify the boundary as 11 July 2006:

 -- Split the newest partition at 11 July 2006.
 -- This is a RANGE RIGHT function,
 -- so anything on 7/11 goes into the NEW partition,
 -- anything BEFORE 7/11 goes into the OLD partition
 -- The new partition this creates goes to the partition
 -- we prepared in the alter scheme above
 ALTER PARTITION FUNCTION PeopleRangePF1()
 SPLIT RANGE ('7/11/2006');
 GO

The next step will switch the new data into the partition, but before doing that, check to see how many people you have in the newPeople table:

 -- Check how many people we have in the newPeople table
 select count (*) from newPeople -- 399996

Now you can apply the switch and move all those people into the new partition:

 -- Now go switch in the data from the new table
 ALTER TABLE newPeople SWITCH
     -- No partition here as we are switching in a NON partitioned table
 TO People PARTITION 15;  -- but need to add the partition here!
 GO

The switch ran very quickly, so you check how many people are in each table. First check to see how many people are now in the newPeople table:

 select count (*) from newPeople -- this table is now empty 

As expected, the table is empty; hopefully all 399,996 people are now in the new partition of the partition table:

 -- go check the partitions
 -- this will tell us how many entries there are in each partition
 select $partition.PeopleRangePF1(dob) [Partition Number], count(*) as total
 from people
 group by $partition.PeopleRangePF1(dob)
 order by $partition.PeopleRangePF1(dob)
 /*
 Partition Number total
 ---------------- -----------
 1                630366
 2                1575914
 3                1574769
 4                1573157
 5                1573374
 6                1574617
 7                1575106
 8                1577206
 9                1578043
 10               1575668
 11               1576848
 12               788732
 13               157508
 14               59994
 15               399996
 */
 -- We now have a 15th partition with all the new people in it

Querying the partition table, we now have a fifteenth partition with 399,996 people in it. As you can see, it was relatively easy to slide in a new partition.

Deleting Data from the Partition Table

After a new set of data that represents a new day, week, or month of data has been added to your table, you also need to go move out the old data, either to delete it or to archive it somewhere. The first step in this process is to create a new table for the data. This needs to be an empty table, and it needs to have the exact same structure as the partition table. In addition, it must be on the same filegroup as the partition you are going to remove:

 -- Now go slide out the oldest partition.
 -- Step one, we need a table to put the data into
 create table oldPeople (
   [personID] [uniqueidentifier] NULL DEFAULT (newsequentialid()),
   [firstName] [varchar](80) NOT NULL,
   [lastName] [varchar](80) NOT NULL,
   [DOB] [datetime] NOT NULL,
   [DOD] [datetime] NULL,
   [sex] [char](1) NOT NULL
 ) on [PRIMARY]   -- Must be on the same filegroup as the source partition!

 -- we need a clustered index on DOB to match the partition
 create clustered index cix_oldPeople on oldPeople(dob)
 go

Note that you don't need a check constraint on this table, as it's moving out of the partition, not into it.

Now that you have the table and clustered index to match the partition table and index, you can execute the switch to move out a set of data. This is done using the ALTER TABLE switch statement again, but note that now the syntax has changed a little; you have to specify partition numbers:

 -- now go switch out the partition
 ALTER TABLE People SWITCH partition 1  -- which partition are we removing
 TO OldPeople  -- No partition here as we are switching to a NON partitioned table
 GO

That was remarkably easy, and amazingly fast. Now check how many rows are in each partition again:

 -- Query the system metadata to see where the old data went
 -- this will tell us how many entries there are in each partition
 select $partition.PeopleRangePF1(dob) [Partition Number], count(*) as total
 from people
 group by $partition.PeopleRangePF1(dob)
 order by $partition.PeopleRangePF1(dob)
 /*
 Partition Number total
 ---------------- -----------
 2                1575914
 3                1574769
 4                1573157
 5                1573374
 6                1574617
 7                1575106
 8                1577206
 9                1578043
 10               1575668
 11               1576848
 12               788732
 13               157508
 14               59994
 15               399996
 */
 -- Notice that partition 1 has gone!

 -- lets see how many people are now in the oldPeople table
 select count (*) from oldPeople -- 630366

You can see that partition 1 is no longer part of the partition table, and that the rows from that partition are now in the oldPeople table.

There is one more bit of tidying up you need to perform, and that's to alter the partition function to merge the old range that you no longer need. This is done using another ALTER PARTITION function statement, where you specify the old boundary that you no longer need:

 -- next we can merge the first partition
 ALTER PARTITION FUNCTION PeopleRangePF1()
 MERGE RANGE ('01/01/1900');
 GO

 -- now go check the partition layout after the merge!
 -- this will tell us how many entries there are in each partition
 select $partition.PeopleRangePF1(dob) [Partition Number], count(*) as total
 from people
 group by $partition.PeopleRangePF1(dob)
 order by $partition.PeopleRangePF1(dob)
 /*
 Partition Number total
 ---------------- -----------
 1                1575914
 2                1574769
 3                1573157
 4                1573374
 5                1574617
 6                1575106
 7                1577206
 8                1578043
 9                1575668
 10               1576848
 11               788732
 12               157508
 13               59994
 14               399996
 */
 -- Now we only have partitions 1-14 again

That was all pretty easy, but you aren't quite done yet. You have moved the old partition data out of the partition table but it's still on the same filegroup consuming fast, expensive disk resources. You want the data to be on the old, slow, cheap disks for archival, or in some location ready to be backed up and deleted.

To do this, you need to physically move the data from one filegroup to another. There are several ways you can do this, but the fastest and most efficient is to use a select into. This is pretty simple, but you do have to be a little careful about altering the default filegroup if anyone else might be creating objects in the database while you are doing your data movements. There are three steps you need to complete here.

  1. Alter the default filegroup to be the archival filegroup.

  2. Select into a new table.

  3. Alter the default filegroup back.

Here is the code you can use to do this:

 -- Once it's slid out, we have to physically move the data
 -- to a different filegroup for archival storage
 -- to move it we need to do some MAGIC !!!
 -- *********************************************
 -- *
 -- *     --- WARNING ---
 -- * WE MUST ENSURE no one else is creating new objects
 -- * during this section !
 -- * else they will end up on our slow disks
 -- *
 -- *********************************************

 -- Change the default filegroup to be slowDisks
 -- This is so that the select into creates the new table on slowDisks
 ALTER DATABASE People
 MODIFY FILEGROUP [slowDisks] DEFAULT
 GO

 -- Move the data!
 select * into archive2People from oldPeople

 -- alter the default filegroup back again!
 ALTER DATABASE People
 MODIFY FILEGROUP [primary] DEFAULT
 GO

Finally, query some system metadata to see where your new table is living. The following is the query you can use for this, with abbreviated results showing just the objects of interest:

 -- Lets go check which filegroup all our objects are on
 select object_name(i.object_id) as ObjectName,
        i.name as indexName,
        f.name as filegroupName
 from sys.indexes as i
 inner join sys.filegroups as f
     on i.data_space_id = f.data_space_id
 where i.object_id > 100
 -- Returns
 /*
 ObjectName              IndexName             filegroupName
 newPeople               cix_newPeople         PRIMARY
 oldPeople               NULL                  PRIMARY
 oldPeople               cix_oldPeople         slowDisks
 archivePeople           NULL                  slowDisks
 Person                  NULL                  PRIMARY
 Person                  ix_HomePhone          PRIMARY
 Person                  ix_dob_name           PRIMARY
 Person                  ix_personid           PRIMARY
 */

Partitioning a Production System

The examples covered so far help to show how you can implement a sliding window scenario, but they gloss over some of the finer details that are going to be the key points on a production system. One of the main issues on a production system is that you will have multiple filegroups matched to different physical storage.

The only place this really changes any of what you've seen so far is when you're creating and altering the partition scheme. Rather than the example shown earlier, you would create a partition scheme using something like this:

 CREATE PARTITION SCHEME [PeoplePS1]
 AS PARTITION [PeopleRangePF1]
 TO ([FileGroup1], [FileGroup2], [FileGroup3]);
 GO

Here, each of your partitions would go to a different filegroup.

When you create filegroups, you need to make sure you have enough filegroups not just for the live table, but also for new data and for the old data before it's moved to archival storage or deleted. So you need filegroups for at least the number of live partitions, plus two more: one for the new data and one for the old data.
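As a sketch of the setup this implies (the filegroup names, file names, and paths here are invented for illustration), each filegroup gets its own file, ideally on separate physical storage:

```sql
-- Hypothetical setup: one filegroup (and one file) per live partition,
-- plus two spares for the incoming and outgoing partitions
ALTER DATABASE People ADD FILEGROUP [FileGroup1];
ALTER DATABASE People ADD FILE (
    NAME = 'PeopleFG1',
    FILENAME = 'D:\Data\PeopleFG1.ndf',
    SIZE = 1GB
) TO FILEGROUP [FileGroup1];
GO
-- repeat for FileGroup2, FileGroup3, ... including the two spares
```

The ALTER PARTITION SCHEME ... NEXT USED statement shown earlier is how you point the scheme at whichever of these filegroups should receive the next new partition.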

An alternative physical layout might have multiple partitions sharing the same filegroup. The exact details of laying out physical disks is beyond the scope of this chapter, but is touched on in Chapter 11.

Partitioning and DTA

DTA will provide partitioning recommendations if you ask it to. To do this, change the tuning options on the Tuning Options tab: under the section titled Partitioning Strategy to Employ, change the default setting of No Partitioning to Full Partitioning. You also need to choose the right kind of queries to tune. There's no point tuning your INSERT, DELETE, or SWITCH statements, because DTA can't improve their performance by adding indexes; it ignores them and looks only at statements it can tune, i.e., the SELECT statements that go against the nonpartitioned table.

If the table is big enough, and the queries would benefit from partitioning, DTA will provide scripts to create the partition function, the partition scheme, and the clustered index. Here is an example of a DTA partitioning recommendation we received from a different database. The underlying table has 57,000,000 rows, and uses 1.8GB of data space and 3.6GB of index space:

 CREATE PARTITION FUNCTION [_dta_pf__2533](int)
 AS RANGE LEFT FOR VALUES (
    103863552, 196930103, 203421423, 246065168, 269171113,
    269702979, 270375078, 273695583, 276447808, 280951053,
    298459732, 298855583, 299375843, 299810346, 301474640)

 CREATE PARTITION SCHEME [_dta_ps__8258]
 AS PARTITION [_dta_pf__2533]
 TO ([PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY],
     [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY],
     [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY],
     [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY])

 CREATE NONCLUSTERED INDEX [_dta_index_myTable_6_741577680_23810504_K1_K2_K3]
 ON [dbo].[myTable] (
   [Col1] ASC,
   [Col2] ASC,
   [Col3] ASC
 ) WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF,
         IGNORE_DUP_KEY = OFF, ONLINE = OFF)
 ON [_dta_ps__8258]([Colx])



Professional SQL Server 2005 Administration
ISBN: 0470055200
Year: 2004
Pages: 193
