Log shipping is an extremely flexible and compelling option as a primary or secondary method of availability with SQL Server. It is a proven solution with a good track record. From an implementation and cost perspective, log shipping provides one of the best values of the SQL Server availability technologies.
Replication is a popular feature of SQL Server because it allows you to make copies of your data and keep them synchronized with a standardized process that can be automatic or manual. How does replication fit into the high availability picture? This chapter discusses the two ways to think about replication: using replication to make your databases available and making your replication architecture available.
As noted in Chapter 3, Making a High Availability Technology Choice, although transactional replication or merge replication can be used to make a database available, they would be third on the list of Microsoft SQL Server technology choices. For the purposes of high availability, if you had to choose one form of replication, it would be transactional because it has the lowest latency of the replication types.
The best (and most likely) scenario is one in which you need to make a copy of all your data for read-only reporting purposes, which is something log shipping cannot do well. In the event of a disaster, with the proper objects and users added, you might also be able to use the replicated data for your application. If you generate this copy of the database with transactional replication, your latency, unlike log shipping's, can be seconds rather than minutes. This section details what you need to know to plan for implementing replication for availability purposes.
The other likely scenario is the need to make a synchronized copy of a subset of your data, which is also not possible using log shipping. Log shipping, like clustering, provides a solution for your entire database: you cannot log ship a single table or a subset of a table.
As noted already, you more than likely will use transactional replication for availability, but each form of replication has scenarios where it fits best.
Merge replication is useful when the following criteria are met:
Multiple Subscribers have to update data at various times. Those changes are sent back to the Publisher and finally to other Subscribers. An example of this is near-real-time inventory updates for a chain of stores. This would allow each individual store to query other stores for an item that is out of stock locally.
Site autonomy is critical. As with the previous inventory example, each individual store needs to have data specific to that location, but also needs some shared corporate data (such as product and pricing information).
Merge replication provides a rich framework for conflict resolution for updating Subscribers. You can either use the conflict resolution provided by SQL Server or code your own custom solution. As a general rule, conflicts should be infrequent. If you are allowing Subscribers to update data, the data itself should not be very complex, because complex data means more complex conflict resolution. Remember, data ownership is at the heart of all conflicts.
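As a sketch of the built-in option, one of the Microsoft-supplied resolvers can be attached when the merge article is defined. The publication, article, table, and column names below are hypothetical, and this assumes the SQL Server 2000 supplied resolver set:

```sql
-- Sketch only: names are illustrative, not from this chapter.
-- Attach a Microsoft-supplied conflict resolver to a merge article so that
-- the change with the earlier value in a designated datetime column wins.
EXEC sp_addmergearticle
    @publication      = N'InventoryPub',
    @article          = N'StoreInventory',
    @source_object    = N'StoreInventory',
    @article_resolver = N'Microsoft SQL Server DATETIME (Earlier Wins) Conflict Resolver',
    @resolver_info    = N'LastModified'   -- datetime column used to pick the winner
```

A custom resolver coded as a COM component or stored procedure can be named in @article_resolver the same way.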
There might be certain instances in which merge replication can have a lower latency than transactional replication. It depends on the distributor and distribution database as well as the
Snapshot replication is useful when the following criteria are met:
The amount of data you are replicating is not large, and is read-only.
The data being replicated does not change often. Snapshot replication is useful if changes to the data are substantial but infrequent. For example, if you batch update catalog data and you affect most of the database, it is more efficient to generate and deliver a complete snapshot than to replicate individual rows. If you want to compare this to another SQL Server-based technology, think about backup and restore: a snapshot is a point-in-time picture of your data, just as a backup is. How current the Subscriber will be depends on the frequency of the snapshots you generate and apply.
New data does not have to be distributed immediately because out-of-date data is still usable. This also implies that latency would be acceptable because Subscribers do not need up-to-the-minute updates. A good example of this type of data is census information, which is updated every 10 years.
Subscriber sites are often disconnected, as when they are located all around the world. Again, this implies a high tolerance for latency.
Transactional replication is useful when the following criteria are met:
Low latency for change delivery to the Subscriber is required. In this case, you are looking for things like real-time reporting that will give you accurate information that is much more current than merge or snapshot replication can provide.
In terms of transactional consistency, with transactional replication, all the changes associated with a transaction executed at the Publisher are applied at the Subscriber as a unit, so the Subscriber remains transactionally consistent.
You need high throughput and low data latency. Merge replication can run frequently, as can log shipping, but their latency is measured in minutes, not seconds. With transactional replication, depending on the volume of transactions, you need good network connectivity and sufficient bandwidth to keep up with the change volume.
Your application cannot
If you are going to use replication to create a warm standby, it is like log shipping: you can use Network Load Balancing to abstract the name change, or you can change the name of the instance if it is possible to match the name of the original server.
In using replication, your schema design is very important because it dictates whether you can use replication at all. If you are considering replication, you need to think about it long before you implement your solution; it starts in the design phase.
Only transactional replication requires an explicitly declared primary key on every published table; merge replication does not, because it identifies rows with a uniqueidentifier column. Merge replication also does not require that every table referenced by a foreign key be in the publication, although including referenced tables helps ensure data integrity at the Subscriber. You therefore need to design your schema properly, with appropriate primary and foreign keys.
Some packaged applications do not support the modification of your database schema, in particular primary keys and foreign keys. Do not implement replication without asking your third-party software vendor if this invalidates their application support.
With merge replication, if your published table does not contain a column with the uniqueidentifier data type and the ROWGUIDCOL property, SQL Server adds one when the publication is generated. This column uniquely identifies each row so that it can be reconciled if it is updated elsewhere.
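If you prefer to control this column yourself rather than let SQL Server add it, you can declare it ahead of time. A minimal sketch, assuming a hypothetical table named MyTable:

```sql
-- Add an explicit rowguid column in advance so the merge publication does
-- not have to add one for you (table and constraint names are illustrative).
ALTER TABLE MyTable
ADD rowguid uniqueidentifier ROWGUIDCOL NOT NULL
    CONSTRAINT DF_MyTable_rowguid DEFAULT NEWID()
```

Declaring it yourself lets you pick the column name and add it during a planned maintenance window instead of at snapshot generation time.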
Different forms of replication support different row sizes and numbers of columns. With snapshot or transactional replication, a replicated table can have up to 255 columns and a maximum row size of 8000 bytes in the publication. A table used by merge replication can have up to 246 columns and a maximum row size of 6000 bytes, which leaves 2000 bytes for the overhead of conflict resolution; if the row size is greater than 6000 bytes, conflict-tracking metadata may be truncated. Think back to Chapter 4, Disk Techniques for High Availability, where you learned how to calculate a table's row size. Using this, you will know enough about your schema to help you configure replication. Do not assume that you will be able to send every column in your table to a Subscriber if you want to send the entire table and you exceed the 6000- or 8000-byte limit for each row.
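A quick way to estimate a table's maximum row size against these limits is to sum the declared column lengths from the system catalog. This sketch assumes the SQL Server 2000 syscolumns system table and an illustrative table name:

```sql
-- Sum the maximum physical length in bytes of each column, then compare the
-- result to the 8000-byte (snapshot/transactional) or 6000-byte (merge)
-- per-row publication limit.
SELECT SUM(length) AS max_row_bytes
FROM syscolumns
WHERE id = OBJECT_ID('MyTable')
```

Because syscolumns stores the maximum declared length, the result is an upper bound for variable-length columns, which is the conservative figure you want when checking the limits above.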
One major difference from log shipping is that not all changes tracked in the transaction log are sent over as part of replication in any model.
If you modify replicated objects after their initial publication, it is not straightforward to send those changes as part of replication.
More important, your schema cannot change freely. If you make constant changes to your database structure, it might be hard to maintain a replication solution. The only schema changes you can make are adding a column to or dropping a column from a publication, either through Enterprise Manager or by using sp_repladdcolumn and sp_dropreplcolumn directly. Because adding and dropping columns are the most common schema changes people make on their tables, this satisfies most requirements.
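For instance, a column can be added to a published table and pushed into a publication in one step with the stored procedures just mentioned. The table, column, and publication names here are illustrative:

```sql
-- Add a new column to the table and to the named publication in one operation.
EXEC sp_repladdcolumn
    @source_object      = N'MyTable',
    @column             = N'NewCol',
    @typetext           = N'int NULL',
    @publication_to_add = N'MyPublication'

-- Later, remove the column from the table and its publications.
EXEC sp_dropreplcolumn
    @source_object = N'MyTable',
    @column        = N'NewCol'
```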
If you are replicating these columns as
With transactional replication or snapshot replication, you can send text or image data types, but if you are using the immediate updating or queued updating options, changes made at the Subscriber to replicated text or image data are not supported. If the subscriptions are read-only, replicating text and image data types as part of transactional or snapshot replication is supported.
If you are using UPDATETEXT or WRITETEXT to update text and image columns published with transactional replication, the text pointer should be retrieved within the same transaction as the UPDATETEXT or WRITETEXT operation, with repeatable read isolation.
In addition, when you obtain the text pointer, you should not perform any operations that can alter the location of the text pointed to by the text pointer (such as updating the primary key) before executing the UPDATETEXT or WRITETEXT statements.
This is the recommended way of using UPDATETEXT and WRITETEXT operations with data to be replicated:
Begin the transaction.
Obtain the text pointer with repeatable read isolation.
Use the text pointer in the UPDATETEXT or WRITETEXT operation.
Commit the transaction.
DECLARE @textpointer binary(16)

BEGIN TRAN

-- Obtain the text pointer with repeatable read isolation.
SELECT @textpointer = TEXTPTR(MyColumn)
FROM MyTable (REPEATABLEREAD)
WHERE ID = '1'

WRITETEXT MyTable.MyColumn @textpointer 'Sample Text'

-- Dummy update to fire the trigger that updates replication metadata and
-- ensures the update is propagated to other Subscribers. Setting the value
-- to itself makes no change to the data.
UPDATE MyTable SET MyColumn = MyColumn WHERE ID = '1'

COMMIT TRAN
In your current schema, if you are using automatically generated integer columns as identity columns or as columns to help partition data, you might have to use the NOT FOR REPLICATION constraint in your schema.
NOT FOR REPLICATION can only be implemented using Transact-SQL.
Note that with transactional replication, the identity property is not propagated by default to a read-only Subscriber. If you are using data types with the identity property, consider these implications carefully.
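A minimal sketch of the option in a table definition (table and column names are illustrative): the identity value increments for ordinary inserts but is preserved as-is when a replication agent applies rows.

```sql
CREATE TABLE Orders
(
    -- NOT FOR REPLICATION: replication agents keep the Publisher's identity
    -- values rather than generating new ones at the Subscriber.
    OrderID    int IDENTITY(1,1) NOT FOR REPLICATION PRIMARY KEY,
    CustomerID int      NOT NULL,
    OrderDate  datetime NOT NULL
)
```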
You can replicate timestamp columns with merge replication or transactional replication with queued updating, but note that although the column is replicated, the value is regenerated at the Subscriber. Therefore, you do not get the value that is at the Publisher. Generally, this does not create a problem for an application. Timestamp values are not meaningful outside the database that generated them.
Timestamps are most commonly used to perform optimistic concurrency control, where a timestamp value read earlier from a row is compared to the row's current value just before issuing an update. If the timestamps do not match, the row has changed since it was read, and the update is rejected.
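The pattern looks roughly like this; the table, column, and variable names are illustrative, and @old_ts is assumed to hold the timestamp value read earlier:

```sql
-- Optimistic concurrency with a timestamp column: the update succeeds only
-- if the row has not changed since it was read.
UPDATE MyTable
SET MyColumn = 'new value'
WHERE ID = 1
  AND ts = @old_ts

IF @@ROWCOUNT = 0
    PRINT 'Conflict: the row changed after it was read.'
```

Because the Subscriber regenerates timestamp values, this check only works against the database where the original read occurred.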
If you are using timestamp data type columns in your database and
If you are replicating between SQL Servers with different character sets, no translation of data and data types occurs between Publisher and Subscriber. You therefore need to ensure that all servers in your replication topology have the same code page and sort order. If you do not ensure that all of the Subscribers match the Publisher in this regard, you might experience different results from your queries. If such a scenario exists, you might not want to use that Subscriber as a potential secondary.