SQL Server utilizes replication agents to do different tasks during the replication process. These agents are constantly waking up at some frequency and fulfilling specific jobs. As you can see in Figure 22.18, several replication agent categories are listed under the Replication Monitor branch. Let's look at the main ones.
Figure 22.18. Various agent jobs.
The Snapshot Agent
The snapshot agent is responsible for preparing the schema and initial data files of published tables and stored procedures, storing the snapshot on the distribution server and recording information about the synchronization status in the distribution database. Each publication will have its own snapshot agent that runs on the distribution server. It will take on the name of the publication within the publishing database, within the machine on which it executes ([Machine][Publishing database][Publication Name ]).
Figure 22.18 shows what this snapshot agent looks like under the SQL Server Agent (Management > SQL Server Agent > Jobs) branch in Enterprise Manager. In addition, it also can be referenced from the Replication Monitor option (within the Replication Monitor > Agents > Snapshot Agents branch). You probably most often will use the Replication Monitor path to these agents!
It's worth noting that the snapshot agent might not even be used if the initialization of the subscriber's schema and data is done manually.
The Snapshot Agent
The snapshot agent is the process that ensures that both databases start on an even playing field. This process is known as synchronization. The synchronization process is performed whenever a publication has a new subscriber. Synchronization happens only one time for each new subscriber and ensures that database schema and data are exact replicas on both servers. After the initial synchronization, all updates are made via replication.
When a new server subscribes to a publication, synchronization is performed. When synchronization begins, a copy of the table schema is copied to a file with an .SCH extension. This file contains all the information necessary to create the table and any indexes on the tables, if they are requested . Next , a copy is made of the data in the table to be synchronized and written to a file with a .BCP extension. The data file is a BCP, or bulk copy file. Both files are stored in the temporary working directory on the distribution server.
After the synchronization process has started and the data files have been created, any inserts , updates, and deletes are stored in the distribution database. These changes will not be replicated to the subscription database until the synchronization process is complete.
When the synchronization process starts, only new subscribers are affected. Any subscriber that has been synchronized already and has been receiving modifications is unaffected. The synchronization set is applied to all servers that are waiting for initial synchronization. After the schema and data have been re-created, all transactions that have been stored in the distribution server are sent to the subscriber.
When you set up a subscription, it is possible to manually load the initial snapshot onto the server. This is known as manual synchronization. For extremely large databases, it is frequently easier to dump the database to tape and then reload the database on the sub-scription server. If you load the snapshot this way, SQL Server will assume that the databases already are synchronized and automatically will begin sending data modifications.
The Snapshot Agent Processing
Figure 22.19 shows the details of the snapshot agent execution for a typical push subscription.
Figure 22.19. Snapshot agent execution.
Following is the sequence of tasks that are carried out by the snapshot agent:
The Log Reader Agent
The log reader agent is responsible for moving transactions marked for replication from the transaction log of the published database to the distribution database. Each database published using transactional replication has its own log reader agent that runs on the distribution server. It will be easy to find because it takes on the name of the publishing database whose transaction log it is reading [Machine name][Publishing DB name] and REPL-LogReader category. Figure 22.18 also shows a Log Reader agent for the Northwind database.
After initial synchronization has taken place, the log reader agent begins to move transactions from the publication server to the distribution server. All actions that modify data in a database are logged to the transaction log in that database. Not only is this log used in the automatic recovery process, but it also is used in the replication process. When an article is created for publication and the subscription is activated, all entries about that article are marked in the transaction log. For each publication in a database, a log reader agent reads the transaction log and looks for any marked transactions. When the log reader agent finds a change in the log, it reads the changes and converts them to SQL statements that correspond to the action that was taken in the article. The SQL statements then are stored in a table on the distribution server waiting to be distributed to subscribers.
Because replication is based on the transaction log, several changes are made in the way the transaction log works. During normal processing, any transaction that has either been successfully completed or rolled back, is marked inactive. When you are performing replication, completed transactions are not marked inactive until the log reader process has read them and sent them to the distribution server.
It should be noted that truncating and fast bulk-copying into a table are non-logged processes. In tables marked for publication, you will not be able to perform non-logged operations unless you, temporarily, turn off replication on that table.
The Distribution Agent
A distribution agent moves transactions and snapshot jobs held in the distribution database out to the subscribers. This agent won't be created until a push subscription is defined. This distribution agent will take on the name of what the publication database is along with the subscriber information [machine name][publication DB name][subscriber machine name]. Figure 22.18 also shows a distribution agent for the Northwind database.
Those not set up for immediate synchronization share a distribution agent that runs on the distribution server. Pull subscriptions, to either snapshot or transactional publications, have a distribution agent that runs on the subscriber. Merge publications do not have a distribution agent at all. Rather, they rely on the merge agent, discussed next.
In transactional replication, the transactions have been moved into the distribution database, and the distribution agent either pushes out the changes to the subscribers or pulls them from the distributor, depending on how the servers were set up. All actions that change data on the publishing server are applied to the subscribing servers in the same order they were incurred. Figure 22.20 shows the latest history of the distribution agent and the successful delivery of a transaction.
Figure 22.20. Distribution agent activity.
The Merge Agent
When dealing with merge publications, the merge agent moves and reconciles incremental data changes that occurred after the initial snapshot was created. Each merge publication has a merge agent that connects to the publishing server, and the subscribing server and updates both as changes are made. In a full merge scenario, the agent first uploads all changes from the subscriber where the generation is 0, or the generation is greater than the last generation sent to the publisher. The agent gathers the rows in which changes were made, and those rows without conflicts are applied to the publishing database.
A "conflict" can arise when changes are made at both the publishing server and the subscription server to a particular row(s) of data. A conflict resolver handles these conflicts. Conflict resolvers are associated with an article in the publication definition. These conflict resolvers are sets of rules or custom scripts that can handle any complex conflict situation that might occur. The agent then reverses the process by downloading any changes from the publisher to the subscriber. Push subscriptions have merge agents that run on the publication server, whereas pull subscriptions have merge agents that run on the subscription server. Snapshot and transactional publications do not use merge agents.
The Miscellaneous Agents
In Figure 22.21, you can see that several miscellaneous agents have been set up to do house cleaning around the replication configuration. These agents include the following:
Figure 22.21. Miscellaneous agents.