Distributed Partitioned Views | ppk on JavaScript, 1/e

Now that you're an expert on linked servers, you're ready to dive into distributed partitioned views (DPVs). You've probably heard this SQL Server scalability buzzword, and you may have tried DPVs. They can be extremely complex to create, because they require you to cross every t and dot every i.

DPVs allow you to create views that tie together multiple databases on the same or different servers. The views themselves can be inserted, updated, deleted, and used for selections. With DPVs, you can break one large table into smaller, more manageable, quicker to process chunks, and place the chunks on separate servers, called federated servers. This allows you to scale out to seemingly indefinite levels. Whenever the server performance slows, take another slice of the data and place it on another server.

Note

Distributed partitioned views are only available in Enterprise and Developer Editions of SQL Server.

DPV Architecture

Figure 9-5 shows the basic architecture for DPVs. A client queries the partitioned view, not the table directly. The client can connect to either server in order to query the partitioned view.

click to expand
Figure 9-5: Distributed partitioned view architecture

This approach invites a certain level of scalability, because your connections aren't managed by a single server. In other words, you can spread your connections across multiple servers as well. From the partitioned view, SQL Server determines which server holds the data and then retrieves it. If both servers contain data, both servers are queried. You can have more than two servers involved in this approach.

Tip

Linked servers are a key component of distributed partitioned views. Make sure the connection between the two (or more) servers is a fast one. You don't want the main slowdown in your queries to be the network between the participating servers.

For distributed partitioned views to work, you should divide the data equally. The data is divided and governed by CHECK constraints. SQL Server uses these constraints to determine which server has the data based. For example, ServerA may hold CustomerID 1-10000, while ServerB holds CustomerID 10001-99999.

Caution

The column you use to divide the data must be a part of the primary key, and must be in the same ordinal position in each table on the various servers. Additionally, you'll want to ensure that when you change your password of the login used in the linked server that it is also changed in the properties of the linked server.

Preparing for a DPV

Distributed partitioned views are very erratic and quite complex to set up. Plan and design your DPVs carefully, because you don't want to have to redesign them after they're implemented. The basic preparation steps for creating a DPV are as follows:

Create the database if one doesn't exist.
Design the tables to be partitioned by CHECK constraints.
Create a linked server on each of the participating servers. If you have three servers, you'll have to create two linked servers on each of the participating servers. For example, on ServerB, you would have two linked servers, one for ServerA and one for ServerC.
Design your partitioned views using the UNION ALL clause.
Begin using the views.

Now that the basics are covered, let's get started on a two-server DPV. In this section, I'll walk you through the step-by-step process of creating a DPV and querying against it. (For this example, I'll use the Customers table from the Northwind database, and I'll split the table down the middle alphabetically.)

First, create another database (in this example, NorthwindDPV). This step is not required, but it's a good practice to have a fresh table on which to place the constraint.

Next, create a shell of the table. For the purpose of brevity, I've only included four columns from my table. Use code with the following syntax on ServerA to create the database and table:

CREATE Database NorthwindDPV GO USE NorthwindDPV CREATE TABLE Customers (       CustomerID nchar (5) NOT NULL ,       CompanyName nvarchar (40) NOT NULL ,       ContactName nvarchar (30) NULL,        CONSTRAINT          PK_Customers  PRIMARY KEY CLUSTERED (CustomerID),        CONSTRAINT CKDPVCustomerID         CHECK (CustomerID BETWEEN 'AAAAA' and 'LZZZZ')) GO

Notice that I'm dividing the data based on CustomerID, which is the primary key and is not an identity. The constraint CKDPVCustomerID governs which data will go into the table. If you choose to use artificial keys, make sure that you set the seed to a different number-one that lines up with the CHECK constraint.

You run similar code on ServerB, but you divide the data slightly differently to reflect the last half of the data:

CREATE Database NorthwindDPV GO USE NorthwindDPV CREATE TABLE Customers (       CustomerID nchar (5) NOT NULL ,       CompanyName nvarchar (40) NOT NULL ,       ContactName nvarchar (30) NULL,        CONSTRAINT          PK_Customers  PRIMARY KEY CLUSTERED (CustomerID),        CONSTRAINT CKDPVCustomerID         CHECK (CustomerID BETWEEN 'M' and 'ZZZZZ')) GO

Next, add a linked server on ServerA and ServerB. The linked server's name can be the same on both servers in a two-server DPV environment, but I strongly recommend against it. This would restrict you to a limit of two servers participating in the DPV. With that in mind, create a linked server (I called mine DPVSERVER1) on ServerA which connects to ServerB:

sp_addlinkedserver    @server=DPVSERVER1,    @srvproduct = 'SQLServer OLEDB Provider',    @provider = 'SQLOLEDB',    @datasrc = 'ServerB'

On ServerB, run similar code, but change the name of the linked server, and also change the data source to ServerA:

sp_addlinkedserver    @server=DPVSERVER2,    @srvproduct = 'SQLServer OLEDB Provider',    @provider = 'SQLOLEDB',    @datasrc = 'ServerA'

Now, map the logins for the linked servers. For ServerA, use the following code with the appropriate passwords and logins:

EXEC sp_addlinkedsrvlogin   @rmtsrvname='DPVServer1',   @useself='false',   @rmtuser='sa',   @rmtpassword='password'

On ServerB, you must run similar code, but change the linked server name:

EXEC sp_addlinkedsrvlogin   @rmtsrvname='DPVServer2',   @useself='false',   @rmtuser='sa',   @rmtpassword='password'

On both servers, turn on the Lazy Schema Validation option, which results in a significant performance boost. The Lazy Schema Validation option allows SQL Server to skip the validation of the schema at the beginning of every query. This saves time at both the query and network levels. To set the option on ServerA, use the following query:

EXEC sp_serveroption 'DPVSERVER1', 'lazy schema validation', 'true'

Similarly on ServerB, run the following query:

EXEC sp_serveroption 'DPVSERVER2', 'lazy schema validation', 'true'

Setting Up and Using the Partitioned Views

Now that the design is complete, create the partitioned views on ServerA and ServerB. The first view on ServerA takes all the records from its Customers table, then includes the records from ServerB using the UNION ALL clause. To accomplish this, connect to ServerA and run the following command:

CREATE VIEW DPVCustomers AS SELECT * FROM Customers UNION ALL SELECT * FROM DPVSERVER1.NorthwindDPV.dbo.Customers

On ServerB, create a similar view. For simplicity, I like to use the exact same name for the view. This permits users to connect to either system and query the same data by view name. I recommend that you do the same in your DPV implementation. To create the second view, use the following syntax:

CREATE VIEW DPVCustomers AS SELECT * FROM DPVSERVER2.NorthwindDPV.dbo.Customers UNION ALL SELECT * FROM Customers

Pay special attention to the order of this query. On ServerB, I've placed the remote server above the local server. This is because ServerA holds the first half of the data and ServerB holds the second half. If a user queries the view, you want the user to see a consistent view of the data, and you want to keep the data ordered the same, regardless of the connection. Unless the user issues an ORDER BY clause, he may not see the proper order of the data if you don't place the linked server's data in the first part of the data. The same theory applies for three servers in your DPV.

Inserting Data into the Partitioned View

You insert data into the DPV in the way you insert data into any other table. To load data, connect to either server and run a query with the following format:

INSERT INTO DPVCustomers SELECT CustomerID, CompanyName, ContactName FROM Northwind.dbo.Customers

Once you execute this query, half of the data is automatically placed onto ServerA and the other half on ServerB.

In the Trenches

You'll probably see an error with a very confusing error message-it's the nested transaction error (error number 7395):

Server: Msg 7395, Level 16, State 2, Line 1 Unable to start a nested transaction for OLE DB provider 'SQLOLEDB'. A nested transaction was required because the XACT_ABORT option was set to OFF. [OLE/DB provider returned message: Cannot start more transactions on this session.]

To use distributed partitioned views, you must have the XACT_ABORT option turned on at the client level. You need to turn this on in Query Analyzer before you execute your INSERT statement.

This is the number one gotcha I've experienced with DPVs. Once this option is turned on, SQL Server aborts any transaction where a runtime error occurs. This option is required for DPVs to work and is enabled by default, so you must run the following statement in Query Analyzer:

SET XACT_ABORT ON

After you turn on XACT_ABORT, you can rerun the INSERT statement.

You can also insert data manually into your DPV as shown here:

INSERT INTO DPVCustomers (CustomerID, CompanyName, ContactName) Values('BKNIG', 'Test Company', 'Brian Knight')

In the Trenches

A DPV design problem that made me bang my head on my keyboard involved the CHECK constraints. These constraints cause problems if you use Enterprise Manager or syntax like this to disable them:

ALTER TABLE Customers NOCHECK CONSTRAINT all

The CHECK constraint becomes 'untrusted' since SQL Server can't validate the data that already exists once you enable it. This results in the following error when you try to modify data in a partitioned view that uses the table:

Server: Msg 4436, Level 16, State 12, Line 1 UNION ALL view 'DPVCustomers' is not updateable because a  partitioning column was not found.

The only way to fix this problem is to drop the CHECK constraint and then add it back. This behavior is by design and is not a bug.

Querying a Partitioned View

Querying from a DPV is simple. All you have to do is connect to any of the servers participating in the DPV and run a standard SELECT statement as shown here:

SELECT * FROM DPVCustomers ORDER BY CustomerID

Each server takes a slice of the workload, and then returns the data that is merged by the server the client used to execute the query.

For a more interesting query, look at the following:

SELECT * FROM DPVCustomers WHERE CustomerID = 'QUICK'

If you execute this query from ServerB, which doesn't hold the segment of the data you need, the query is sent to ServerB and then sent to the client. ServerA is never even queried. This demonstrates the power of linked servers. Even if you segment data onto ten servers, only the servers that hold the necessary data to fulfill the query are interrogated.

If ServerA goes down, ServerB can still service any queries that need its segment of data if the client is connected to ServerB. If any part of your query needs data from ServerA, you would receive the following error:

Server: Msg 17, Level 16, State 1, Line 1 SQL Server does not exist or access denied.

Updating a Partitioned View

When Microsoft first designed DPVs, the plan didn't include permitting users to update primary keys, since that task would require moving data between servers. When Microsoft submitted their benchmark to the TPC, other database vendors rejected SQL Server's benchmark claims because of this limitation. This missing ingredient was fixed a few months later, before SQL Server 2000 entered production, and is now available. For example, you can update a CustomerID, as shown here:

UPDATE DPVCustomers SET CustomerID = 'MNGHT' WHERE CustomerID = 'BKNIG'

If you run this query from any of the participating servers, the record is moved from ServerA to ServerB because its CustomerID is now greater than MAAAA. This is useful if you're partitioning based on a sales region, and the customer moves to a new region.

DPV Limitations

Since DPV is a brand new feature in SQL Server 2000, there are bound to be limitations that may seem strange. Things like primary key updateability were considered a low priority until Microsoft's benchmark results were rejected. As this feature matures, we can expect easier methodology, and more tasks that can be accomplished via a GUI. Until then, we'll have to deal with some strange problems, and items that seem like bugs but are just limitations. Keep the following guidelines in mind when designing and implementing DPVs:

Always turn on the XACT_ABORT option before you run a SQL statement against a DPV.
Spend a lot of time planning your scalability strategy. You don't want to have to go back and reslice your data if you can avoid it.
Increase performance by using the Lazy Schema Validation option on your linked servers involved in the DPV.
Be sure the partitioning column is involved in the primary key.
You can only reference a particular column once in a partitioned view.
Use the same ANSI_PADDING settings on all instances of your table and view.
When inserting data, explicitly declare each column, even if the value is NULL.
Try to avoid text columns whenever possible.

In the Trenches

Another DPV common issue involves text, ntext, and image fields. If you try to run an UPDATE statement against a distributed partitioned view where the core table has one of these fields, you receive the following error:

Server: Msg 8626, Level 16, State 1, Line 1 Only text pointers are allowed in work tables, never text, ntext, or image columns. The query processor produced a query plan that required a text, ntext, or image column in a work table.

Error 8626 is by design. It is caused by SQL Server's inability to move the text, ntext, or image field columns from one distributed partition to another.

The only workaround is to remove the column or delete the row, and add it back with the modified data. For more information, see article Q270007 at http://support.microsoft.com/.