Replication is a popular feature of SQL Server because it allows you to make copies of your data and keep them synchronized with a standardized process that can be automatic or manual. How does replication fit into the high availability picture? This chapter discusses the two ways to think about replication: using replication to make your databases available and making your replication architecture available.
As noted in Chapter 3, Making a High Availability Technology Choice, although transactional replication or merge replication can be used to make a database available, they would be third on the list of Microsoft SQL Server technology choices. For the purposes of high availability, if you had to choose one form of replication, it would be transactional because it has the lowest latency from Publisher to Subscriber. Remember that you can only use replication for high availability in certain scenarios and with certain considerations.
The best (and most likely) scenario is one in which you need to make a copy of all your data for read-only reporting purposes, which is something log shipping cannot do well. In the event of a disaster, with the proper objects and users added, you might also be able to use the replicated data for your application. Unlike log shipping, if you generate this copy of the database with transactional replication, your latency can be seconds instead of minutes. This section details what you need to know to plan for implementing replication for availability purposes.
The other likely scenario is the need to make a synchronized copy of a subset of your data, which is also not possible using log shipping. Log shipping, like clustering, provides a solution for your entire database: you cannot log ship a single table or a subset of a table.
As noted already, you more than likely will employ transactional replication for high availability uses. Depending on the situation, more than one form of replication might provide some sort of benefit for you.
Merge replication is useful when the following criteria are met:
Multiple Subscribers have to update data at various times, and those changes are sent back to the Publisher and finally to other Subscribers. An example of this is near-real-time inventory updates for a chain of stores, which would allow each individual store to query other stores for an out-of-stock item. As long as the employee knew there was a bit of latency (for example, updates every half hour), he or she could call the other store to check. All of this would ensure customer satisfaction and maybe a sale.
Site autonomy is critical. As with the previous inventory example, each individual store needs to have data specific to that location, but also needs some shared corporate data (such as product numbers and prices).
Merge replication provides a rich framework for conflict resolution for updating Subscribers. You can either use the conflict resolution provided by SQL Server or code your own custom solution. As a general rule, conflicts should be infrequent. If you allow Subscribers to update data, the data itself should not be very complex, because complex data means more complex conflict resolution. Remember, data ownership is at the heart of all conflicts.
Note | There might be certain instances in which merge replication can have a lower latency than transactional replication. It depends on the distributor and distribution database as well as the size of transactions. |
Snapshot replication is unlikely to be useful as a means of providing high availability, but a short description is provided here for completeness. Snapshot replication can be used when the following criteria are met:
The amount of data you are replicating is not large, and the data is read-only.
The data being replicated does not change often. Snapshot replication is useful if changes to the data are substantial but infrequent. For example, if you batch update catalog data and you affect most of the database, it is more efficient to generate and deliver a complete snapshot than to replicate individual rows. If you want to compare this to another SQL Server-based technology, think about backup and restore: a snapshot is a point-in-time picture of your data, just as a backup is. How current the Subscriber will be depends on the frequency with which you generate and apply snapshots.
New data does not have to be distributed immediately because out-of-date data is still usable. This also implies that latency would be acceptable because Subscribers do not need up-to-the-minute updates. A good example of this type of data is census information, which is updated every 10 years in the United States.
Subscriber sites are often disconnected, as when they are located all around the world. Again, this implies that a high latency tolerance exists, especially if your network connection is slow and possibly unreliable.
Transactional replication is useful when the following criteria are met:
Low latency for change delivery to the Subscriber is required. In this case, you are looking for things like real-time reporting that will give you accurate information that is much more current than merge or snapshot replication can provide.
In terms of transactional consistency, with transactional replication, all the changes associated with a transaction executed at the Publisher are guaranteed to be delivered to the Subscriber together and in the same order. Merge replication, on the other hand, achieves data convergence using row-by-row reconciliation and does not guarantee that all the changes associated with a transaction will be delivered in the same synchronization session.
You need high throughput and low data latency. Merge replication can run frequently, as can log shipping, but their latency is measured in minutes, not seconds. With transactional replication, depending on the volume of transactions, you need good network connectivity and guarantees that the links between your sites will be available. The greater the distance between sites, the more you need to plan your architecture properly.
Your application cannot tolerate changes to the database schema. Unlike merge replication, read-only transactional replication does not require SQL Server to add a uniqueidentifier column to published tables.
If you are going to use replication to create a warm standby, the same naming issue applies as with log shipping: you can use Network Load Balancing to abstract the name change, or, if it is possible, you can rename the instance to match the name of the original Publisher.
More problematic, however, is ensuring that the logins needed at the Subscriber exist so that the warm standby can function as the Publisher. Part of the problem, too, is that all of the data might not be at the Subscriber, so can you really use the Subscriber as a new database?
In using replication, your schema design is very important because it dictates whether you can use replication at all. If you are considering replication, you need to think about it long before you implement your solution, starting with the design phase.
Only transactional replication requires an explicitly declared primary key on all published tables; merge replication does not, because it identifies rows using a uniqueidentifier column. Similarly, merge replication does not require that tables referenced by foreign keys be included in the publication, although including them helps ensure data integrity at the Subscriber. You therefore need to design your schema properly, with appropriate primary and foreign keys.
Some packaged applications do not support modification of your database schema, in particular changes to primary keys and foreign keys. Do not implement replication without asking your third-party software vendor whether it invalidates their application support.
If your published table does not contain a column with a uniqueidentifier data type, SQL Server adds one when the publication is generated (this applies to merge replication and to publications that allow updating Subscribers). The column is used to uniquely identify a row so that it can be reconciled if it is updated elsewhere.
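If you prefer to control the column's name and placement, you can add it yourself before creating the publication. A minimal sketch, assuming a hypothetical Inventory table:

-- Add a ROWGUIDCOL ahead of time so SQL Server does not have to add one
-- when the publication is generated. NEWID() supplies values for existing rows.
ALTER TABLE Inventory
ADD rowguid uniqueidentifier DEFAULT NEWID() ROWGUIDCOL NOT NULL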
Different forms of replication support different row sizes and numbers of columns. With snapshot or transactional replication, a published table can have up to 255 columns and a maximum row size of 8000 bytes. A table used by merge replication can have up to 246 columns and a maximum row size of 6000 bytes, which leaves 2000 bytes for the overhead of conflict resolution; if the row size is greater than 6000 bytes, conflict-tracking metadata may be truncated. Think back to Chapter 4, Disk Techniques for High Availability, where you learned how to calculate a table's row size. Using this, you will know enough about your schema to help you configure replication. Do not assume that you will be able to send every column of your table to a Subscriber if you exceed the 6000- or 8000-byte limit for each row.
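As a quick sanity check, you can approximate each table's maximum declared row size from the system catalog. A minimal sketch, assuming the SQL Server 2000 system tables (variable-length columns are counted at their declared maximum):

-- Approximate maximum declared row size, in bytes, for each user table.
SELECT o.name AS TableName,
       COUNT(*) AS ColumnCount,
       SUM(c.length) AS MaxDeclaredRowBytes
FROM sysobjects o
JOIN syscolumns c ON c.id = o.id
WHERE o.xtype = 'U'
GROUP BY o.name
ORDER BY MaxDeclaredRowBytes DESC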
One major difference from log shipping is that, in any replication model, not every change recorded in the transaction log is sent to the Subscriber.
Important | If you modify replicated objects after their initial publication, it is not straightforward to propagate those changes through replication. |
More important, your schema cannot be very flexible. If you make constant changes to your database structure, it might be hard to maintain a replication solution. The only schema changes you can make are adding a column to a publication or dropping a column from a publication, either through Enterprise Manager or by using sp_repladdcolumn and sp_dropreplcolumn directly. Considering that adding and dropping columns are the most common schema changes people make to their tables, this satisfies most requirements.
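For illustration, a sketch of adding and then dropping a column on a published table; the table, column, and publication names are hypothetical:

-- Add a new column to the published table and to the specified publication.
EXEC sp_repladdcolumn
    @source_object = 'MyTable',
    @column = 'Region',
    @typetext = 'char(2) NULL',
    @publication_to_add = 'MyPublication'

-- Drop the column from the table and from all publications that contain it.
EXEC sp_dropreplcolumn
    @source_object = 'MyTable',
    @column = 'Region'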
If you are replicating text or image columns as part of your publications, consult the SQL Server Books Online topics Planning for Transactional Replication and Planning for Merge Replication. One of the biggest differences between transactional replication and merge replication is that with merge, WRITETEXT and UPDATETEXT are not supported. You must perform an explicit UPDATE on the column, as described in the steps later in this section.
With transactional or snapshot replication, you can send text or image data types, but if you are using the immediate updating or queued updating options, changes made at the Subscriber to replicated text or image data are not supported. If the subscriptions are read-only, replicating text and image data types as part of transactional or snapshot replication is supported.
If you are using UPDATETEXT or WRITETEXT to update text and image columns published using transactional replication, the text pointer should be retrieved within the same transaction as the UPDATETEXT or WRITETEXT operation, using repeatable read isolation. For example, do not retrieve the text pointer in one transaction and then use it in another; it might have moved and become invalid.
In addition, when you obtain the text pointer, you should not perform any operations that can alter the location of the text pointed to by the text pointer (such as updating the primary key) before executing the UPDATETEXT or WRITETEXT statements.
This is the recommended way of using UPDATETEXT and WRITETEXT operations with data to be replicated:
Begin the transaction.
Obtain the text pointer with repeatable read isolation.
Use the text pointer in the UPDATETEXT or WRITETEXT operation.
Commit the transaction.
BEGIN TRAN
DECLARE @textpointer binary(16)
-- Obtain the text pointer with repeatable read isolation so it remains
-- valid for the WRITETEXT operation within this transaction.
SELECT @textpointer = TEXTPTR(MyColumn)
FROM MyTable (REPEATABLEREAD)
WHERE ID = '1'
WRITETEXT MyTable.MyColumn @textpointer 'Sample Text'
-- Dummy update to fire the trigger that will update metadata and ensure the
-- update gets propagated to other Subscribers. If you set the value to
-- itself, there is no change.
UPDATE MyTable SET MyColumn = MyColumn WHERE ID = '1'
COMMIT TRAN
In your current schema, if you are using automatically generated integer columns as identity columns or as columns to help partition data, you might have to use the NOT FOR REPLICATION option in your schema.
Note | NOT FOR REPLICATION can only be implemented using Transact-SQL and not through Enterprise Manager. NOT FOR REPLICATION is an option of ALTER TABLE, ALTER TRIGGER, CREATE TABLE, and CREATE TRIGGER. |
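A minimal sketch, assuming a hypothetical Orders table, of how the option appears in table DDL:

-- The identity value is not reseeded by inserts that arrive through
-- replication agents, and the CHECK constraint is not enforced for them.
CREATE TABLE Orders (
    OrderID int IDENTITY(1,1) NOT FOR REPLICATION PRIMARY KEY,
    StoreID int NOT NULL,
    Amount money NOT NULL,
    CONSTRAINT CK_Orders_StoreID
        CHECK NOT FOR REPLICATION (StoreID BETWEEN 1 AND 100)
)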
Note that with transactional replication, the identity property is not propagated by default to a read-only Subscriber. If you are using columns with the identity property, consider them carefully when setting up replication for warm standby to ensure that your application works correctly after failover to the warm standby database. In this case, there are two options: choose another technology to create the warm standby server, or manually manage the identity property at the Subscriber as described in the Replication Data Considerations topic in SQL Server Books Online.
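One hedged approach, if you do fail over to the Subscriber, is to reseed the identity column manually so that new inserts do not collide with replicated values. A sketch, assuming the hypothetical Orders table from earlier:

-- Reseed the identity column to the highest replicated value so that the
-- next locally generated value does not collide with existing rows.
DECLARE @maxid int
SELECT @maxid = MAX(OrderID) FROM Orders
DBCC CHECKIDENT ('Orders', RESEED, @maxid)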
You can replicate timestamp columns with merge replication or transactional replication with queued updating, but note that the column is replicated and the value is regenerated at the Subscriber. Therefore, you do not get the value that is at the Publisher. Generally, this does not create a problem for an application. Timestamps are not specifically representative of clock time, but are instead increasing numbers based in part on the ID of the database in which the timestamp column exists. As a result, these columns have unique context only within the database in which they are generated.
Timestamps are most commonly used to perform optimistic concurrency control, where a preread timestamp value on a row is compared to the current value just before issuing an update on the row. If the timestamps differ, you know another user updated the row. The application can then choose to prevent the update or take other appropriate action. Replicating the actual originating value of a timestamp column from one location to another is, in many cases, not highly useful, so the replication subsystem masks this column out of the propagated update and allows each site to calculate a unique timestamp value for each row that is appropriate to its database context. This helps ensure consistency in most common timestamp uses (for example, optimistic locking), regardless of whether an update arrived at a given replica as part of a replicated transaction or directly from a user application. If you do want to replicate timestamp columns and keep the values the same, you might have to store them in another way and perform a conversion. If you want to include time-based data, consider using a datetime data type instead of a timestamp.
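For illustration, a minimal sketch of the optimistic concurrency pattern just described; the table and column names (Orders, RowVersion) are hypothetical:

-- @OldVersion holds the timestamp value read when the row was first fetched.
DECLARE @ID int, @NewAmount money, @OldVersion binary(8)
SELECT @ID = 1, @NewAmount = 9.99
SELECT @OldVersion = RowVersion FROM Orders WHERE OrderID = @ID

-- The update succeeds only if no one has changed the row in the meantime.
UPDATE Orders
SET Amount = @NewAmount
WHERE OrderID = @ID AND RowVersion = @OldVersion

IF @@ROWCOUNT = 0
    RAISERROR ('Row was modified by another user.', 16, 1)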
If you are using timestamp columns in your database and employing snapshot or transactional replication with read-only Subscribers or Subscribers that allow immediate updating, the literal values are replicated, but the data type of the column at the Subscriber is changed to binary(8). This might cause problems if you want to use the Subscriber later in a failure scenario. For example, because of the data type difference, any application behavior that expects a timestamp will not function if the secondary is used for updating. If the secondary is for read-only reporting or for read-only access in a disaster recovery scenario while the primary is being repaired, the data type change might not matter.
If you are replicating between SQL Servers with different character sets, no translation of data and data types occurs between Publisher and Subscriber. You therefore need to ensure that all servers in your replication topology have the same code page and sort order. If you do not ensure that all of the Subscribers match the Publisher in this regard, you might experience different results from your queries. If such a scenario exists, you might not want to use that Subscriber as a potential secondary.
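A quick way to compare these settings on each server in the topology; this sketch assumes SQL Server 2000:

-- Run on every Publisher, Distributor, and Subscriber and compare the output.
SELECT SERVERPROPERTY('Collation') AS ServerCollation
EXEC sp_helpsort  -- reports the server's sort order and character set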