Replication Overview | Professional SQL Server 2005 Administration (Wrox Professional Guides)

SQL Server replication closely resembles the magazine publishing industry, so we'll use that analogy to explain its overall architecture. Consider National Geographic. For each monthly issue, the publisher has to collect articles, photos, and interviews. These items then form a pool, from which the editors pick out items to be included in a monthly issue, called a publication. Once a monthly publication is printed, it is shipped out via various distribution channels to subscribers all over the world.

In SQL Server replication, similar terminology is used. The pool from which a publication is formed can be considered a database. Each piece selected for publication is an article; it can be a table, a stored procedure, or another database object. Like a magazine publisher, replication also needs a distributor to deliver publications, keep track of delivery status, and maintain a history of synchronization to maintain data consistency. Depending on the kind of replication model you choose, articles from a publication can be stored as files stored in a folder that both publisher and subscriber(s) have access to, or transactions within articles can be committed on subscribers synchronously or asynchronously. Regardless of how publications are delivered, replication always needs a distributor database to keep track of delivery status. Depending on the capacity of the publication server, the distributor database can be on the same server as the publisher or other dedicated or nondedicated servers.

Conceptually, however, there are differences between SQL Server replication and a magazine publishing business. For example, in some replication models, a subscriber or subscribers can update articles and have them propagated back to the publisher or other subscribers. In the peer-to-peer replication model, each participant of replication acts both as publisher and subscriber, so that changes made in different databases will be replicated back and forth between multiple servers.

There are quite a few enhancements in replication with the release of SQL Server 2005 (for example, replication of new data types such as varchar(max), XML data types, and CLR objects, including assemblies). You can also replicate partitioned tables and full-text indexes. One significant improvement is that most ALTER DDL statements work without removing replication first.

Another significant enhancement in SQL Server 2005 is the introduction of peer-to-peer replication. This headless replication model provides a good alternative for load-balancing and high-availability scenarios.

Replication types

SQL Server 2005 has the basic replication types that existed in SQL Server 2000: snapshot replication, transactional replication, and merge replication. In addition, peer-to-peer replication was introduced in SQL Server 2005. Among those types, snapshot replication is the most basic.

Snapshot Replication

As its name implies, snapshot replication takes a snapshot of a publication and makes it available to subscribers. When the publication is applied on the subscribing database, the articles in the subscriber, such as tables, views, and stored procedures, will be dropped and recreated. Snapshot replication is best suited for fairly static data, when it is acceptable to have copies of data that is out of date between replication intervals, or when article size is small. For example, suppose you have lookup tables to maintain ZIP codes. Those tables can be good snapshot-replication candidates in most cases, because they are fairly static.

Transactional Replication

Transactional replication allows for faster data synchronization with less latency. When an environment is enabled for transactional replication, the subscriber should already have the initial snapshot applied. For subsequent changes applied to published articles, a log-reader process will record them and propagate those changes to the subscriber. Depending on how it is set up, this data synchronization can occur near real time, so it is useful for cases where you want incremental changes to happen at the subscriber quickly.

Merge Replication

Merge replication is usually used when there is no constant network connectivity among publishers and subscribers. It allows sites to work fairly autonomously and merge the changes to the data when they are together online again. Once again, it needs a snapshot to initialize the replication. When that is done, subsequent changes are tracked with triggers. When there are data conflicts during replication, they will be resolved by the rules set up earlier.

Replication Components

The SQL Server Agent plays a very important role in replication, as it did in SQL Server 2000. Publication creation, data distribution and merging, setting and checking status, and cleaning up replication history are all triggered by SQL Server Agent jobs.

In addition to SQL Server Agent, there are other agents specifically designed for replication, such as the Snapshot Agent, Distribution Agent, Merge Agent, Log Reader Agent, and Queue Reader Agent. These agents are all executables that will be called when you initiate jobs either directly from SQL Server Agent or from replication tools from Management Studio.