Replication Internals | MCSE Training Guide (70-229). Designing and Implementing Databases with MicrosoftR SQL ServerT 2000 Enterprise Edition

Now it's time to see how all of the types of replication are actually implemented: how they process their work, how they relate to one another in processing, and the events they undergo. Here you will find how each replication is implemented internally in SQL Server 2000 and what events fire in a given situation.

To support and configure snapshot, merge, and transactional replication (as in the sub-objective, "Support merge, snapshot, and transactional replication models"), you need to know the internals of these three models.

Inside Snapshot Replication

So, how does snapshot replication work? How does the publication get from the Publisher to the Subscriber in this model?

The Snapshot agent copies snapshot files (containing published tables and database objects such as stored procedures) into the Snapshot folders, which by default reside in the Distributor.

The following events take place in snapshot replication:

Snapshot replication is configured, and the Snapshot agent tries to connect to the publisher.
If the connection is successful, the Snapshot agent transfers snapshot files (containing published tables and database objects such as stored procedures) into the Snapshot folder.
The Distribution agent takes these database objects and schemas (found in the Snapshot folder) and rebuilds them in the Subscriber.
The Distribution agent tries to connect to both the Distribution database and the Subscriber's database.
If the connection succeeds, the Distribution agent reads the database objects and schema and copies them onto the Subscriber.

You know how snapshot replication works. What about transactional replication? How is it internally different? The next section describes it.

Inside Transactional Replication

As you learned in the preceding section, transactional replication differs from other types of replication in that it copies the transactions that occurred in the publishing database to the distribution database on the Distributor. The three agents involved in Transactional replication are the Distribution, Log Reader, and Snapshot agents . As in snapshot replication, the Snapshot agent copies snapshot files (containing published tables and database objects such as stored procedures) into the Snapshot folder, which by default resides on the Distributor. (However, as in snapshot replication, it is possible to specify an alternate location for storage.) The Snapshot agent does this only to create the initial snapshot, which can then be propagated to the Subscriber.

The new agent used in transactional replication is the Log Reader agent. The Log Reader agent has a pretty straightforward job. Its job is to keep track of the Transaction log of the databases that are marked for Transactional replication and to copy those certain transactions (that are marked for replication) into the distribution database. It might be worth noting that the Log Reader agent examines only the transactional log of the Publisher.

Similar to the way the Distribution agent works in snapshot replication, the Distribution agent in transaction-based replication copies the tables containing the transactions from the Distributor onto the Subscriber. The snapshot transactions are held in a folder until the Distribution agent moves them to all Subscribers or a Distribution agent residing at the Subscriber calls for the transactions. The Snapshot folder contains schema and data of published tables; it is propagated to the Subscriber only once, to set up the replication for the first time. Copying the whole publication onto the Subscriber is called applying the initial snapshot . After the initial snapshot is applied at the Publisher, the Subscriber can then receive incremental changes made at the Publisher.

Transactional replication is best used in situations where you want changes to be delivered to Subscribers as they occur, when you need strict adherence to ACID properties (as defined in Chapter 2, "Data Modeling"), the network connection is reliable, or Subscribers connect frequently. This is usually the technique of choice because it requires little if any modification of the table structure and delivers changes with very little latency.

Another situation when transactional replication is useful is when the amount of data in the tables is huge and the number of changes, inserts , and deletes is relatively small. In this case it is faster and better to use transactional replication and simply replicate the changes than it is to use snapshot and resend the entire huge table on a scheduled basis.

The following events take place in transactional replication:

Snapshot or transactional replication is configured and the Snapshot agent tries to connect the Distributor with the Publisher.
If the connection is successful, the Snapshot agent transfers Snapshot files (containing published tables and database objects such as stored procedures) into the Snapshot folder.
The Distribution agent tries to connect the Distributor with the server on which it is located (either the Distributor or Subscriber).
If the connection succeeds, the Distribution agent reads the database objects and schema and copies them onto the Subscriber.
The Log Reader agent monitors the transaction log (hoping to find any UPDATE , INSERT , and/or DELETE statement) and when it finds modifications, it sends it to the distribution database. The scheduled basis at which the Log Reader agent monitors can be set to either continuous monitoring or monitoring after a custom time interval.
When the Distributor sends a push or the Subscriber calls a pull, the Distribution agent reads the files from the distribution folder, and the actions are applied at the Subscriber. When a Subscriber activates the agent through a pull, changes are propagated to all Subscribers.

Now you have seen how transactional replication works, and have learned that it is a good solution when data needs to be sent and received in real time. In other instances, however, transactional replication is not the best alternative. Many scenarios require more flexibility, such as that offered by merge replication. The next section describes merge replication and how it processes replication.

Inside Merge Replication

In SQL Server 2000, merge replication is made possible by using both the Snapshot and Merge agents. As in other types of replication, the Snapshot agent carries out the job of preparing initial Snapshot files (containing published tables and database objects such as stored procedures) and transfers them to the Snapshot folder.

As soon as you allow users in multiple locations to change their own copies of the data independently, data conflicts become inevitable. A conflict occurs when data has been changed at more than one location and then an attempt is made to synchronize the data. You have a choice whether to use row-level or column-level conflict tracking. In row-level tracking, a change to two separate columns still results in a conflict. For example, one person might change a person's name while another person changes the phone number. In column-level tracking this is not an issue, but using row-level tracking results in a conflict.

Timestamp column values are regenerated when the data is copied to the Subscribers. Timestamps can be compared to determine when changes were made.

If you are replicating columns that are defined as Foreign Keys, make sure to include the reference table in the publication so that integrity can be maintained .

The Merge agent is responsible for the detection of all conflicts. Conflicts are resolved by priority, by first synchronization, or by a custom resolver as defined through a stored procedure or COM object. The default mechanism used by SQL Server is priority based, where priorities are assigned to each Subscriber.

The Merge agent transfers modifications from the Publisher and applies them at the Subscriber and then takes modifications made at the Subscriber and applies them at the Publisher. If conflicts occur during synchronization between the Publisher and Subscriber, the Merge agent gets notified and takes the proper course of action. Taking the proper course of action depends on how resolution has been set up.

Merge replication allows for the most flexibility of all the replication models, but latency is the highest of the three models. This form of replication is appropriate in situations where the Subscribers need to be able to update the data they receive, network connectivity is not consistent, and conflicts have been minimized by application design or are otherwise unlikely . Merge replication requires the highest degree of design planning in that merge replication requires a uniqueidentifier column with the rowguidcol attribute included. When applied, merge replication may have to alter the table structure and will require conflict resolution strategies to be applied.

Merge replication enables each copy of data to be changed independently, but it is not the only solution if data needs to be changed at the Subscriber. The alternative is to allow updating subscriptions with transactional and snapshot replication.