Preinstallation Concerns | ppk on JavaScript, 1/e

There are some items to be noted before you proceed with the configuration of your replication topology. Most of these items are corrected during the installation. By making the tables in your publication capable of being replicated, you may break your applications that use queries like:

SELECT * FROM Categories

This is because some types of replication will add a column to each table for replication. In this case, you may return more columns than the application is expecting and cause some errors. Be aware of these before proceeding and make sure your application isn't using blanket SELECT * queries. In most cases, this will not harm your application even if the application is using blanket queries. This would only cause a problem if the application was using the list of fields in the table to paint the application's screen.

Data Types Concerns

User-defined data types are not fully supported in replication. Use of these data types in replication will work if the data types are predefined on the subscribers. As you configure the articles with these data types, SQL Server will convert them to their base data counterparts during synchronization.

Merge replication requires that each table participating in the replication have a column with a uniqueidentifier data type and the ROWGUIDCOL attribute enabled. This assigns a global unique identifier (GUID) to each column added and assures that your data has a unique ID across all servers. If your table already has this, you're set. If it doesn't, SQL Server will automatically create one for you in each participating table. If you would like to do this yourself in a table, use syntax like the following:

CREATE TABLE GUIDTable (ProductNumber int,  Inventory int,  GUID uniqueidentifier ROWGUIDCOL)

Then insert data into the table as shown in the following syntax. The newid() function creates the GUID. You can also make this your default value for the column.

INSERT INTO GUIDTable values(1, 1000, newid()) INSERT INTO GUIDTable values(3, 5000, newid()) INSERT INTO GUIDTable values(8, 5, newid())

The contents of the table would look like the following results after running the INSERT statements. Each GUID will be unique across each system:

ProductNumber Inventory   GUID ------------- ----------- ------------------------------------  1             1000        D6511462-2DEB-11D5-8DF2-000094B63497 3             5000        D6511463-2DEB-11D5-8DF2-000094B63497 8             5           D6511464-2DEB-11D5-8DF2-000094B63497

Dealing with Identity Columns

Tables that deal with identities require special consideration. One of the misconceptions in the SQL Server community is that SQL Server cannot support identity columns in a replication topology. SQL Server can indeed support them if you use a small clause when creating the table. If you try to replicate a table with an identity column in it, SQL Server inserts it with the IDENTITY_INSERT option turned on. This option explicitly inserts the identity data into the remote system.

By design, SQL Server will then reseed the remote table to the inserted value plus one. In most systems, this leads to Primary Key constraint errors when you're trying to insert duplicate values, because of the reseeding. The way to avoid this is by specifying the NOT FOR REPLICATION clause on an identity column. By doing this, you force SQL Server to turn off the feature that reseeds the database after an insert is made with the IDENTITY_INSERT option turned on. Let's look at the GUIDTable table we created earlier, but make the ProductNumber column use an identity column in this manner:

CREATE TABLE GUIDTable2 (ProductNumber int identity(1,1) NOT FOR REPLICATION PRIMARY KEY,  Inventory int,  GUID uniqueidentifier ROWGUIDCOL)

Another identity problem with replication occurs when multiple subscribers can insert data. Say, for example, that Server A inserts data into your GUIDTable2 table shown above and receives the identity of 1. Then, Server B inserts a different record and also receives an identity of 1. You will then create a collision in your data and replication will stop due to a Primary Key conflict. This will occur because the SQL Server is no longer reseeding the table due to the NOT FOR REPLICATION clause. Even when the clause is not used, you risk collisions with data not reaching the servers until another record is inserted. The way around this is to strategically seed and increment your identity columns. For example, with two servers in your topology, you may want to use the following strategy:

Server A's column has a seed of 1 and increments by 2.This will make the server use only odd numbers for the identity column.
Server B's column has a seed of 2 and increments by 2.This will make the server use only even numbers for the identity column.

How about a four-server topology? As you add more servers into your topology, you have to get more creative with your identity columns. For a four-server topology, I use the following strategy:

Server A's column has a seed of 1 and increments by 2.This will make the server use only odd numbers for the identity column.
Server B's column has a seed of 2 and increments by 2.This will make the server use only even numbers for the identity column.
Server C's column has a seed of -1 and increments by -2.This will make the server use only odd negative numbers for the identity column.
Server D's column has a seed of -2 and increments by -2.This will make the server use only even negative numbers for the identity column.

Let's look at one final scenario where we throw eight total servers into the topology. We have to get even more creative to accommodate our identity columns in this scenario:

Server A's column has a seed of 1 and increments by 2.This will make the server use only odd numbers for the identity column.
Server B's column has a seed of 2 and increments by 2.This will make the server use only even numbers for the identity column.
Server C's column has a seed of -1 and increments by -2.This will make the server use only odd negative numbers for the identity column.
Server D's column has a seed of -2 and increments by -2.This will make the server use only even negative numbers for the identity column.
Server E's column has a seed of 1,000,000,001 and increments by 2.This will make the server use only odd numbers for the identity column.It also creates a range of data starting at one billion.
Server F's column has a seed of 1,000,000,002 and increments by 2.This will make the server use only even numbers for the identity column.It also creates a range of data starting at one billion.
Server G's column has a seed of -1,000,000,001 and increments by 2.This will make the server use only odd numbers for the identity column.It also creates a range of data starting at negative one billion.
Server H's column has a seed of -1,000,000,002 and increments by 2.This will make the server use only even numbers for the identity column.It also creates a range of data starting at negative one billion.

For servers E through H, I chose one billion because it's approximately half the amount of data an integer can store. Because of the complexity of these scenarios, it's extremely important to plan your design well in advance. As you add more servers to your topology, you will need to shrink the identity range slowly. In some cases, you may prefer to use the uniqueidentifier we discussed earlier to uniquely identify a column in place of the identity column. The catch is that the uniqueidentifier is 16 bytes compared to the 4-byte int field. You can also use a bigint to make your ranges larger since it's only 8 bytes in size.