Designing a Replication Strategy | MCITP Administrator: Microsoft SQL Server 2005 Optimization and Maintenance (Exam 70-444) Study Guide

Replication is another method of working with data at remote sources. SQL Server 2005 provides several methods of replication technologies so that you are able to copy and distribute your database objects among several databases and to multiple locations. The initial philosophy of replication is to enable the data to be close to the user. From this philosophy, many technologies have come forth that answer such questions as these: How fresh is the replicated data? How much bandwidth is available? How often is the original data updated? Can the user at the replicated site update the data? Is the data centrally located? Can the user at the replicated site just update the data belonging to that site and have read access to all other sites’ data? Answering these questions has spurred several technologies, each based on the publisher-subscriber metaphor.

Understanding Replication Terminology

SQL Server 2005 uses the publishing metaphor for its replication strategy. This includes the publisher, distributor, and subscriber with the related components of publications, subscriptions, and articles, as shown in Figure 8.8.

image from book
Figure 8.8: The publisher-distributor-subscriber metaphor

Let’s first make certain you understand the terms:

Publisher The publisher maintains the source database. It makes data that is published and available for replication. Furthermore, it detects and sends changes to all published data to the distributor.
Distributor The distributor is a server that maintains the distribution database. It also stores metadata, historical data, transactions, and queues. The distributor’s role of storing the data and moving it out to the subscriber depends on the method of replication chosen. For example, it has a greater role for snapshot replication and transactional replication than it has for merge replication, as you will learn in the following sections.
The distributor can be on the same computer as the publisher. However, in a large environment it is most often on its own server and might support several publication servers.
Subscriber The subscriber holds a copy of the data and changes to the data and makes them available for your users. Depending on the replication topology chosen, the subscriber might be allowed to update data and send it back to the publisher.
Publication A publication is the basis of the data being replicated. It can contain one or more articles. An article can be either data or objects in a database. A user database can have more than one publication on it.
Subscription A subscription is a request for a copy of a publication, including all the articles contained within the publication, to be sent to the subscriber. There are two types of subscriptions:
- Push subscription A push subscription is defined at the publisher. The publisher sends its changes to the subscriber without any request from the subscriber. You can accomplish this in several ways: on demand, continuously, or per a schedule. The Distribution Agent or Merge Agent in charge of propagating the changes resides at the distributor.
- Pull subscription The subscriber initiates the pull subscription. The subscriber requests the changes made at the publisher. The user controls data synchronization at the subscriber. The changes are usually synchronized on demand or on a schedule as opposed to continuously. Since a publisher might have a large number of subscribers and they might be autonomous or disconnected, the Distribution Agent or Merge Agent runs at the subscriber.

Article An article is data or a database object that is the smallest part of a publication. It can contain such objects as stored procedures, views, and user-defined functions. However, you cannot subscribe to an article; you must subscribe to a publication.

Understanding Replication Types

Now that you understand the terminology, we’ll explain how it applies to SQL Server 2005.

There are three main types of replication for use in distributed applications: snapshot replication, transactional replication, and merge replication. You need to understand many factors to determine the correct type of replication for a particular application. Some factors are the physical environment, the amount of data to be replicated, whether the data will be updated by the subscriber, and whether the replication is server to server or server to computer, such as a laptop or handheld device.

Understanding the Replication Process

An initial synchronization of the published articles between the publisher and the subscriber is typical for each type of replication. Most often this initial synchronization is performed by replication with a snapshot. A snapshot is merely a copy of everything specified by the publication. Once the snapshot has been created, it is sent out to the subscribers.

If data changes infrequently, snapshot replication might be all that is required. The snapshot and the snapshot replication process distribute the data exactly as it appears at a certain point in time. If changes to your data are infrequent and it is acceptable to have copies be outdated for a period of time (until the next snapshot), then you should use snapshot also as your replication technology.

For some applications, it is important that data changes incrementally after the initial snapshot flow to the subscriber, either continuously or over a scheduled time. Other applications further require that changes flow from the subscriber back to the publisher. For these situations, transactional replication and merge replication provide the necessary options.

Introducing Snapshot Replication

As mentioned previously, snapshot replication distributes your data exactly as it appears at a given moment in time. It is a copy of the data at the publication. When synchronization occurs, the entire snapshot is again sent to the subscriber.

Snapshot replication is most appropriate for data that changes infrequently, for those situations where it is acceptable to have data that is not fresh, for small volumes or data, or for data that has a large amount of changes over a short time span.

Snapshot replication has a small continuous overhead at the publisher. However, if the data set being published is large, then snapshot replication requires a goodly amount of resources to generate and apply the snapshot.

Figure 8.9 shows you the snapshot replication components.

image from book
Figure 8.9: Snapshot replication components

Introducing Transactional Replication

Typically, transactional replication starts with a snapshot of the publication. Incremental changes in both data and schema at the source are replicated to the destination as they occur. These changes are applied in near real time to the subscriber in the same order they were written at the publisher, guaranteeing a consistency between the two.

Transactional replication is most often used in environments that desire minimal latency (delay in time). It is normally applied in server-to-server replication. The publisher or subscriber can be either a SQL Server or a non-SQL Server database, such as Oracle.

The default definition of transactional replication creates a read-only publication. Changes are not usually replicated back to the publisher. However, transactional replication offers two other types beyond this standard format: transactional replication with updating subscriptions using a hierarchical topology and the new peer-to-peer topology. In the former, conflicts to changes made at multiple subscribers are resolved automatically. In the latter topology, there is no data change conflict because a row can be changed at only one location; this is termed data stewardship.

Note

See Chapter 7 for conflict resolution in replication topologies.

Introducing Merge Replication

Merge replication also typically starts with a snapshot of the publication. This topology uses triggers to track subsequent changes made at either the publisher or the subscriber. When it connects to the network, the subscriber synchronizes with the publisher and exchanges all changes since the last synchronization.

Merge replication is most often used in server-to-client replication. One example is the replication between a server and a mobile device such as a PDA. However, a variety of situations both with and without data conflicts can utilize this technology. Each subscriber can require a different partition of the data; in this case, there are no conflicts. If each subscriber is vying for the same data, conflicts can occur; in this instance, you need to have procedures available to detect and handle the conflicts.

Understanding the Role of Replication Agents

The replication process uses a number of individual programs called agents that run as jobs scheduled under the SQL Server Agent. These agents carry out tasks that are a part of tracking changes and distributing data. Therefore, the SQL Server Agent must be running for the jobs of the replication agents to run. You can also run the replication agents by using the command line or by using applications with replication management objects (RMO). You administer replication agents using SQL Server Replication Monitor and SQL Server Management Studio.

We will now review the tasks of these agents.

The SQL Server Agent

The SQL Server Agent is the host and schedules the jobs for replication and other agents used in the administration of SQL Server 2005. For replication agents to run, you need to make certain the SQL Server Agent is running. It is wise to have it start automatically when SQL Server starts.

Tip

The SQL Server Agent should be running under a domain user account. If you have installed the SQL Server Agent under a local system account, the service is able to access only the local computer.

The Snapshot Agent

The Snapshot Agent prepares the snapshot files. These files contain the schema and data of the tables and database objects you have decided to publish. The files are then stored in the snapshot folder for the publisher. The agent then records the tracking information in the distribution database on the distributor. When you configure a distributor, you specify the snapshot folder.

The Snapshot Agent performs the following tasks:

It establishes the connection between the publisher and the distributor. If necessary, the agent takes locks on the published tables. In merge publications, the Snapshot Agent does not take locks. For transactional publications, the Snapshot Agent takes locks only during the beginning phase of the snapshot generation. For snapshot publications, locks are held for the entire snapshot generation process.
It writes a copy of the table schema for each article to an .sch file. If you are publishing other database objects, it generates those additional script files.
It copies the published data from the publisher and writes them to the snapshot folder. The snapshot is generated as a set of bulk copy program (BCP) files.
In both snapshot and transactional publications, the Snapshot Agent appends rows to the MSrepl_commands and MSrepl_transactions tables in the distribution database. The MSrepl_commands table contains entries that are commands that indicate the location of .sch and .bcp files, any other snapshot files, and references to any pre- or post-snapshot scripts. The MSrepl_transactions table contains entries that are commands pertaining to synchronizing the subscriber.
It releases any locks on published tables.

The Distribution Agent

The Distribution Agent is used with both snapshot and transactional replication. It is in charge of moving the newly created snapshot jobs and transactions that are held in the distribution database to each subscriber for synchronization.

The Distribution Agent performs the following tasks:

It establishes a connection to the distributor.
It reads the location of the snapshot files from the MSrepl_commands table and the subscriber synchronization commands from both the MSrepl_commands and MSrepl_ transactions tables.
It applies the schema as well as other commands to the subscription database.

Tip

The Distribution Agent is configured by default to run under SQL Server authentication (distributor_admin). If you change the authentication to Windows when it logs into an instance of SQL Server, it will fail.

The Merge Agent

The Merge Agent is used with merge replication. It applies the initial snapshot to the subscriber. It then moves and reconciles any changes that occur. The Merge Agent runs at the distributor for push subscriptions and at the subscriber for pull subscriptions. Each merge subscription has its own Merge Agent that connects and updates both the publisher and the subscriber. Usually the merge is bidirectional, but it can be configured so that changes are moved in only one direction.

The Merge Agent performs the following tasks:

It establishes a connection to the publisher.
It examines the sysmergeschemachange table on the publisher and determines whether there is a new snapshot that should be applied at the subscriber.
If a new snapshot is available, the Merge Agent applies to the subscription database the snapshot files from the location specified in sysmergeschemachange.

The Log Reader Agent

The Log Reader Agent is used with transactional replication. It monitors the transaction logs of each database configured for transactional replication and searches for transactions that need to be replicated. It moves those transactions from the publisher’s transaction log to the distribution database. Each database has its own Log Reader Agent.

The Queue Reader Agent

The Queue Reader Agent takes the messages from a queue and applies them to the appropriate publication. It runs at the distributor and moves changes made at the subscriber back to the publisher.

Only a single instance of the Queue Reader Agent exists, which services all publishers and their publications for a given distribution database.

Deciding How to Distribute Your Data

You need to consider many factors as you set up your replication strategy. The following three factors are of primary importance:

Latency You must decide whether your replication strategy requires all copies of the data to be always up-to-date or whether a certain degree of latency, time delay, is acceptable. The latency factor is important in your choice of replication topology. For example, some reporting scenarios require up-to-the-minute data, meaning that a transactional topology is necessary, whereas other types of reports might be fine with a daily lag that would indicate a snapshot replication technology.
Site autonomy You need to know the amount of independence each site is allowed. This independence is termed autonomy. For example, you must know whether the subscribers will always be connected to the publisher. Snapshot replication and merge replication offer a higher degree of independence than transactional replication, which needs a constant connection.
Transactional consistency Your strategy must maintain a synchronized state between the publisher and subscriber; this is called transactional consistency. A transaction is a series of modifications to your data that must be completed in their entirety or rolled back. When you use replication, you will always have some latency, but you must maintain transactional consistency.

Understanding Replication Environments

To best understand how you can use replication in your environment, it is helpful to separate replication into two categories: server to server and server to client.

Replicating Data in a Server-to-Server Environment

You can achieve some important advantages by replicating data in a server-to-server environment:

Improving scalability and availability The ability to maintain continuously updated copies of your data across multiple servers allows your users to read data with minimal conflicts, thus improving scalability. It further provides access to data when one or more servers might be taken offline, thus improving availability.
Integrating data from multiple sites Quite often data is entered at multiple sites and must be consolidated to a centralized site. In a like manner, data can be replicated to remote offices of a company.
Utilizing data warehousing and reporting You can use replication to move data from online transactional processing (OLTP) sources to data warehousing and reporting servers.
Integrating heterogeneous data You can use replication to access data that is stored in a non–SQL Server database and use it with SQL Server data.
Offloading batch processing You can use replication to offload batch processing to a dedicated server.

Replicating Data in a Server-to-Client Environment

Applications interact between server and client. Present-day clients include workstations, laptops, tablets, and devices. It is often expedient to configure replication in this process. Here are a few situations:

Exchanging data with mobile users Many applications are based on providing and collecting data from mobile users. For example, a salesperson might collect customer data and need to relay it to the central database. Such methods are part of customer relationship management (CRM) and sales force automation (SFA) applications.
Utilizing consumer point of sale (POS) applications POS applications require the captured data be replicated to a central site. Examples of POS applications are customer checkout locations and ATMs.
Integrating data from multiple sites You may have an application that supports several regional offices that need to synchronize with a central office. In this scenario, you have data that can flow in both directions. These applications use replication topologies to handle the data transfer.

Configuring a Replication Strategy

You’ll now set up a replication so you can see how it works. In Exercise 8.3, you will be using the named instance that you created in Exercise 8.1 for your subscriber, so make certain it is available.

Exercise 8.3: Configuring Transactional Replication

In this exercise, you will configure transactional replication. The default instance of your SQL Server 2005 server will be the publisher and distributor. The named instance created for Exercise 8.1 will be the subscriber. You can use this scenario when you need minimal latency and have a well-established connection with the publishing server. As a reminder, the default instance is NEWCATSERVER, and the named instance is TRAINING.

Open SSMS, and connect to both your default and named instances. Also, make certain that the SQL Server Agent is running in both instances.

It is good practice to create and share a folder to contain the snapshots. You can use the default NTFS security and share permissions. Create and share a folder such as C\:Snapshots.
In Object Explorer on your default instance, expand Replication, right-click Local Publication, and choose New Publication.
The New Publication Wizard appears, as shown here. Click Next.
The Distributor page opens. Choose that your server will act as its own distributor (the default radio button). Click Next.
The Snapshot Folder page opens. Notice the default location for the snapshots. Change the default location to the snapshot folder you created, C:\Snapshots. Click Next.
The Publication Database page opens. Here you will choose the database that contains the data or objects you want to publish. Choose AdventureWorks, and click Next.

The Publication Type page appears. On this page, you must choose the type of publication that fits your replication strategy. Notice that each type is described in the lower panel, as shown here. Take time to review each type. Choose Transactional Publication, and click Next.
The Articles page opens. On this page, you select the articles to publish. Expand Tables, and check the Contact(Person) table. Click Next.
The Filter Table Rows page opens. Here you can horizontally partition your table. For example, you can choose just the rows with ContactID < 100. Let’s add a filter here. Choose Add.
The Add Filter page opens. Double-click ContactID. This fills in the Filter Statement box on the right with [ContactID]; type < 100 next to it. Your screen should look like the one shown here. Click OK.

You are taken back to the Filter Table Rows page. The filter you created should appear in the box at the bottom of the page. Click Next.
The Snapshot Agent page opens. Select the Create a Snapshot Immediately check box, and click Next.
The Agent Security page opens. Make certain that Use the Security Settings from the Snapshot Agent is checked, as shown here. Then click the Security Settings button.
Specify the account you are using for the Snapshot Agent process. This account can be the service account you are using for your SQL services. Ensure that the radio button Connect to the Publisher By Impersonating the Process Account is chosen, as shown here. Click OK.

You are taken back to the Agent Security page with the Security Settings for the Snapshot Agent and the Log Reader Agent filled in. Click Next.
The Wizard Actions page opens. Make certain the Create the Publication Selection is checked. Notice that you are also able to script your actions on this page. Click Next.
The Complete the Wizard page opens. Give the publication a name, such as AWContactsPub. Your screen should look like the one shown here. Click Finish.

We hope you had success on each step of the Creating Publication page. Your distributor as well as the AWContactsPub publication should now be created. The Snapshot Agent also should have started. Your screen should look like the one shown here. Click Close.

image from book

Your publication should appear under the Local Publications folder in the Object Explorer.

In Exercise 8.4, you will create a push subscriber on a remote server that will use the publication you just created. For this exercise, the remote server is the named instance.

Exercise 8.4: Configuring a Push Subscription

In this exercise, you will create a push subscription using the AWContactsPub publication. The subscriber will be located on the TRAINING named instance.

With SSMS open, right-click the local publication you just created, [AdventureWorks]: AWContactsPub. Choose New Subscriptions from the drop-down menu.
The New Subscription Wizard opens, as shown here. Click Next.
The Publication page opens. Under Databases and Publications, AWContactsPub should be showing and highlighted, as shown here. Click Next.

The Distribution Agent Location page opens. Since you are creating a push subscription, keep the default of Run all Agents at the Distributor. Notice the choice of moving the agents to the subscriber in the case of a pull subscription. Click Next.
The Subscribers page opens, as shown here. Since you want to create the subscriber on the TRAINING instance, click the Add Subscriber button, and choose Add SQL Server Subscriber from the drop-down menu.
Connect to the TRAINING instance of your server by choosing the instance name from the Server Name drop-down list, and click the Connect button.
Under the Subscription Database column for your TRAINING instance, choose <New Database…>, and name it AWCONTACTS. You can also choose an existing database or have AWCONTACTS created prior to creating the subscription. The Subscribers page should look like the one shown here. Click Next.

The Distribution Agent Security page opens. You need to specify the security information for all subscriptions. To do this, click the ellipsis on the far right. A page opens for you to enter your security information. Use the process account for the Distribution Agent, filling in the account name and password. Connect to both the distributor and subscriber using the default of impersonating the process account, as shown here. Click Next.
The Synchronization Schedule page opens. Keep the default to schedule the agent to run continuously, as shown here. Click Next.

The Initialize Subscriptions page opens. Keep the defaults and have the subscriber initialize immediately, as shown here. Click Next.
The Wizard Actions page opens. Keep the default to create the subscription. Click Next.
The Complete the Wizard page opens with the choices you have selected. Click Finish to create the subscription.
The Creating Subscription(s) page opens, creating the subscription. After the subscription has completed successfully, close the window.
Expand the TRAINING location in Object Browser. Your subscription is located in the Replication folder under the Local Subscriptions folder, as shown here.

You can launch Replication Monitor by right-clicking the Replication folder in your default instance. If you expand your server name on the left side of the window, you can see your publication. The right side of the window shows your subscription, stating its performance and latency, as shown here.

image from book

Administering Replication

Once you have chosen the appropriate replication methodology for your application and configured the topology, you need to understand how to administer your replication topology. This section covers some of the best practices that Microsoft recommends. We’ll merely list these practices here, but we urge you to read more about this area in SQL Server Books Online under the topics of “Best Practices for Replication Administration” and “Frequently Asked Questions for Replication Administrators.”

Here are the recommended best practices that you should apply to all replication topologies:

Develop and test a backup and restore strategy.
Script the replication topology.
Create thresholds and alerts.
Monitor the replication topology.
Establish performance baselines and tune replication if necessary.

Here are the recommended best practices that you should apply but that might not be required for your topology:

Validate data periodically.
Adjust agent parameters through profiles.
Adjust publication and distribution retention periods.
Understand how to change article and publication properties if application requirements change.
Understand how to make schema changes if application requirements change.

In the next sections, we will discuss the implementation of some of these best practices.

Considering Implementation Issues

Since much of your replicated data will travel over distributed environments that may involve the Internet or more than one domain within your own company, you need to be aware of and provide the needed security for the replication connections. The song remains the same: if you are able to use Windows authentication for your server and agent service accounts, configure them in that manner.

Considering Security Issues

Grant the agents only the permissions appropriate to the tasks. SQL Server 2005 Books Online has a definitive list of these permissions for agents under the topic “Replication Agent Security Model.” You can secure connections to your replication servers by using Secure Sockets Layer (SSL) through a certificate or Internet Protocol Security (IPSEC) to encrypt data that is transmitted across the unsecured network. You can view security settings in SSMS within the Security page of the property sheet of each replication object. You can also use system stored procedures to change passwords for replication agent accounts. For example, you can use the sp_changereplicationserverpasswords stored procedure to change the passwords for all instances of an account used by all replication agents on a particular replication server.

Using a Dedicated Snapshot Folder

As mentioned in Exercise 8.3, it is good practice to use a dedicated folder to store snapshots. The Snapshot Agent needs write permission to this folder. The other agents that use this folder are the Distribution Agent and perhaps the Merge Agent. They need read permissions to the folder. You might remember when you used the wizard that it cautioned you about using a local path. If your distributor is not on the publisher, you must specify a UNC network share name (\\) for the folder.

Tip

If you specify a network share for the snapshot folder, remember to grant appropriate share permissions for the folder also.

Using a VPN Connection

If you are using replication over the Internet with mobile subscribers, configuring a secure virtual private network (VPN) might be your best solution. We have found that to be a great solution.

Using No Longer Supported Features

We must mention two more items while we are talking about security if you have been using replication in previous versions of SQL Server.

First, ActiveX controls are marked as unsafe in SQL Server 2005. Thus, they are not able to be used for scripting for initializing variables or functions. This means that the Snapshot Agent ActiveX control is no longer available in this version of SQL Server. A managed Snapshot Agent has replaced it.

Second, attachable subscriptions have been deprecated. You might have used this feature in previous versions of SQL Server when deploying a large number of pull subscriptions, such as in merge replication. You now have several methods to use in place of attachable subscriptions, such as initializing your subscriptions with a backup or initializing your subscriptions through a script.

Note

This is a great place to tell you more about RMO. All aspects of SQL Server replication can be designed programmatically by using RMO. Previous versions of SQL Server used SQL Distributed Management Objects (SQL-DMO). RMO gives you the model for all the enhancements we have been discussing. For example, you can create a publisher programmatically using RMO and specify its attributes and methods. The libraries and classes have all been predefined for you. So if you’d like, you can take off your admin hat and put on your dev hat and enjoy doing everything we have done thus far using Visual Studio 2005. Such neat stuff!

Considering Data Integrity Issues

In replication, you often want the user functions to affect the data differently than the replication agent functions. This concept is particularly true in data constraints. For example, if you place a check constraint for data to be within a certain range for a particular column, this constraint is applied by the user when the data is entered. It is neither necessary nor desired for it to be applied repeatedly upon replication to each subscriber of the data. Therefore, the default application of the check constraint is NOT FOR REPLICATION.

Understanding NOT FOR REPLICATION

In most cases, the default settings specify that constraints are not copied to the subscriber. You can determine this setting by looking on the Articles Page of the Publication Properties sheet for your publication. Select an article, and then click Article Properties. Click Set Properties of All <ObjectType> Articles. Under the Copy Objects and Setting to Subscriber heading, you should see the default settings shown in Table 8.1.

Table 8.1: The Default Settings for Objects Copied to the Subscriber
Open table as spreadsheet
Setting	Default
Copy foreign key constraints	False
Copy check constraints	False
Copy default value specifications	False
Copy user triggers	False

Here’s how they’re applied:

The foreign key constraint is enforced when acted upon by a user but is not enforced when acted upon by a replication agent performing a data manipulation operation.
The check constraint is enforced when acted upon by a user but is not enforced when acted upon by a replication agent performing a data manipulation operation.
The trigger is executed when acted upon by a user but is not executed when acted upon by a replication agent performing a data manipulation operation.
Likewise, the identity column value is incremented when acted upon by a user’s insert operation but not incremented when a replication agent performs the insert operation.

Specifying NOT FOR REPLICATION

You have seen how to view the default schema values for the constraints. You also have the ability to change the values at the same location.

If you want to use Transact-SQL to change in the default specification or if you want to reinforce that specification in the event it has been somehow changed-always a good idea in our eyes-you can use the following:

CREATE TABLE statement This statement contains a NOT FOR REPLICATION clause for the IDENTITY, FOREIGN KEY, and CHECK columns.
ALTER TABLE statement This statement contains a DROP NOT FOR REPLICATION clause for the IDENTITY column. Enabling this indicates that values in IDENTITY columns are incremented by replication agents.
CREATE TRIGGER statement This statement contains a NOT FOR REPLICATION clause that indicates the trigger is not executed when acted upon by a replication agent.
ALTER TRIGGER statement This statement contains a NOT FOR REPLICATION clause that indicates the trigger is not executed when acted upon by a replication agent.

Understanding How to Modify Foreign Keys for Replication

Even though the default for replicated foreign keys in SQL Server 2005 is for the key not to be replicated by the replication agent, you might be upgrading from a previous version of SQL Server and need to modify or make certain that this is indeed the case. We have seen replicating foreign keys as a major cause of problems.

Here is how you disable a foreign key constraint for replication in SSMS:

In Object Explorer, expand the table with the foreign key constraint you want to modify.
Expand the Keys folder.
Right-click the foreign key constraint.
If you are using SP1 or earlier, click Modify; otherwise, click Design.
The Foreign Key Relationships dialog box appears. Select a value of No for Enforce for Replication.
Click Close.

Understanding How to Keep Triggers from Publishing

A different issue is that of keeping triggers from publishing altogether. It is often mistaken that this is the task of the NOT FOR REPLICATION clause; however, as we noted earlier, this is not the case.

The way you keep a trigger from publishing is to encrypt the trigger. Encrypted triggers cannot be published. It’s that simple. You encrypt a trigger by using WITH ENCRYPTION in the CREATE TRIGGER or ALTER TRIGGER statement. Remember, when you alter the trigger, make certain to include the WITH ENCRYPTION clause.

Tip

For the test, know how to keep triggers from replicating and how to keep triggers from publishing.

Designing Replication Alerts

Once the replication topology is established, you need to continually monitor its health. One important job is to configure alerts to warn you when a process goes awry. SQL Server Agent enables you to use alerts to monitor events through the creation of jobs and job steps that can be used to trigger a task, send you a notification, and write to logs.

Using Predefined Alerts

SQL Server 2005 provides several predefined alerts for you to use. These alerts are available when you configure your server as a distributor.

You can use both SQL Server Management Studio and SQL Server Agent to monitor events, such as replication agent events, through alerts. SQL Server Agent monitors the application login windows for events associated with alerts. When it finds an event, SQL Server Agent responds automatically by executing a task or sending a message you have defined to a specified operator. SQL Server also includes a set of predefined alerts for replication agents that you can configure to execute a task and/or notify an operator.

The alerts shown in Table 8.2 are installed when a computer is configured as a distributor.

Table 8.2: Predefined Replication Agent Alerts
Open table as spreadsheet
Message ID	Predefined Alert	Condition Causing the Alert to Fire
14150	Replication: agent success	The replication agent shuts down successfully.
14151	Replication: agent failure	The replication agent shuts down with an error.
14152	Replication: agent retry	The replication agent shuts down after unsuccessfully retrying an operation. For instance, the agent encounters a server not available, dead-lock, connection failure, or timeout failure error.
14157	Replication: expired subscription dropped	The expired subscription is dropped.
20572	Replication: Subscription reinitialized after validation failure	A subscription reinitializes successfully on response from job “Reinitialize subscriptions on data validation failure.”
20574	Replication: Subscriber has failed data validation	The Distribution Agent or Merge Agent fails data validation.
20575	Replication: Subscriber has passed data validation	The Distribution Agent or Merge Agent passes data validation.
20578	Replication: agent custom shutdown

For all predefined alerts, except 14157, “Replication: expired subscription dropped,” and 20572, “Replication: Subscription reinitialized after validation failure,” additional information is entered in the sysreplicationalerts system table stored in the msdb database. Figure 8.10 shows some columns from the table.

Figure 8.10: The Sysreplicationalerts table in msdb

You can configure a predefined replication alert in SSMS. Exercise 8.5 will show you the steps.

Exercise 8.5: Configuring a Predefined Replication Alert

In this exercise, you will configure the “Replication:Subscriber has failed data validation” predefined alert and specify a response job to that alert.

In SSMS Object Browser, connect to your distributor, your default instance of SQL Server.
Expand the SQL Server Agent folder, and then expand the Alerts folder.
Right-click the “Replication: Subscriber has failed data validation” alert, and click Properties.
On the General page, click Enable. Specify the AdventureWorks database as the database to which the alert should apply, as shown here.

Select the Response page. Check the Execute Job box, and click the ellipsis in the Browse button. Click Browse. The Browse for Objects dialog box opens. Select Reinitialize Subscriptions Having Data Validation Failures job, as shown here. Click OK in both open dialog boxes.
On the Response page, also configure an operator for email, a pager, or Net Send, if desired, as shown here.

When this job executes, it reinitializes the subscription by using a RPC to a stored procedure. In this exercise, you are using a local distributor, so you don’t need to do anything else. If the publisher uses a remote distributor, you must define a remote server login at the publisher so that the RPC from the distributor to the publisher can be made.
On the Options page, you can customize the message that goes to the operator.
When you have completed all the customization for the alert, click OK.

Using Replication Monitor for Database Maintenance

Replication Monitor has been enhanced for SQL Server 2005 to enable you to easily see and control your replication strategy. It provides another venue for enabling warnings and alerts. Configuring alerts in Replication Monitor will give you the information you need regarding status and performance in a timely manner. When you open the Warnings and Agents tab in Replication Monitor, you should see a screen like that shown in Figure 8.11.

image from book
Figure 8.11: Warnings and Agents tab in Replication Monitor

If you click the Configure Alerts button, the Configure Replication Alerts dialog box opens containing all the alerts that can be configured in Replication Monitor (see Figure 8.12).

image from book
Figure 8.12: Configuring replication alerts in Replication Monitor

Enabling Replication Monitor Threshold Warnings

You may have noticed on the Warnings and Agents tab that only two warnings appeared. This is because only two warnings pertain to this replication topology, transactional replication. To set different thresholds for the warnings, merely click in the Threshold columns, and make the appropriate changes. Remember to click the Save Changes button.

Here is a list of how the warnings pair up with the replication technologies:

Transactional replication Warn if a subscription will expire within the threshold.
Warn if latency exceeds the threshold.
Snapshot replication Warn if a subscription will expire within the threshold.
Merge replication Warn if a subscription will expire within the threshold.
Warn if a merge length for dial-up connections exceeds the threshold.
Warn if a merge length for LAN connections exceeds the threshold.
Warn if rows merged per second for LAN connections is less than the threshold.
Warn if rows merged per second for dial-up connections is less than the threshold.

You should notice that these warnings actually fall into four categories. Let’s take a look at each category. This will help explain how you should set the thresholds.

Imminent subscription expiration This warning is available for all replication topologies. Once the threshold has been set, if it is met or exceeded, the subscription status is displayed as Expiring Soon/Expired.

Exceeding the specified latency This warning applies only to transactional replication. Latency in this instance means the amount of time that elapses between a transaction being committed at the publisher and the corresponding transaction being committed at the subscriber. Once the threshold has been set, if it is met or exceeded, the subscription status is displayed as Performance Critical.

Exceeding the specified synchronization time This warning applies only to merge replication. Once the threshold has been set, if it is met or exceeded, the status is displayed as Long-Running Merge. You are able to specify different thresholds for dial-up and local area network (LAN) connections.

Falling short of processing the specified number of rows in a given amount of time This warning applies only to merge replication. Once the threshold has been set, if it is met or exceeded, the status is displayed as Performance Critical. You are able to specify different thresholds for dial-up and LAN connections.

Enabling Replication Monitor Alerts

You can trigger an alert in Replication Monitor in addition to displaying a warning. Alerts can be triggered for replication warnings and errors. Let’s go back to the Configure Alerts dialog box.

From the Warnings and Agents screen, click Configure Alerts.
Highlight and click the Configure button for any listed replication alert.
Notice that for each alert you can configure a job to execute and an operator to notify on the Response page of the properties sheet.
You can either cancel out or create an alert at this step.

Monitoring Agent Information

You can view information regarding the agents for a particular publication from the bottom of the Warnings and Agents tab:

To view detailed information about an agent, right-click the agent, and choose View Details. You will see a complete processing history of that agent.
To view detailed information regarding the job that runs the agent, right-click the agent at the bottom of the Warnings and Agents tab, and click Properties.
If you want to manage the profile of an agent, just right-click the agent, and then choose Agent Profile. You are able to list the parameters for an agent’s profile by clicking the Browse button on the right (the ellipsis). The Default Agent Profile properties sheet opens with the parameters set for the profile. If you uncheck Show Only Parameters Used in This Profile, you can see all the options available.
To start a nonrunning agent, right-click the agent, and choose Start Agent.
To stop a running agent, right-click the agent, and choose Stop Agent.

Verifying Replication

Once again Replication Monitor provides several methods for you to monitor your topology in the realm of validation. If you are using transactional replication latency, you can check the validity that the transaction reached the subscription by using a tracer token. If your replication topologies include transactional or merge replication, you can accomplish subscription validation at the publication screen.

Verifying Latency with Tracer Tokens

In SQL Server 2005 you can use tracer tokens to validate connections and measure the latency for transactional replication. The tracer token takes a small amount of data, a token, and writes it to the transaction log of the publication database. The database views the token as though it were a transaction needing replicating and sends it through the replication process.

The token measures the time to commit from the publisher to the distributor and the time to commit from the distributor to the subscriber. From the calculations it performs, you can find whether the token actually reaches the subscriber and, if so, how long it takes. Thus, you are able to discern your poorest-performing subscribers.

In Exercise 8.6, you will view the latency in your replication topology by using a tracer token.

Exercise 8.6: Viewing Latency Using a Tracer Token

In this exercise, you will insert a tracer token into the publication database and find the latency from the publisher to the distributor and from the distributor to the subscriber.

In SSMS, open Replication Monitor if it is not open already. In the left pane, click your publication. In the right pane, click the Tracer Tokens tab.
Click the Insert Tracer button.
As the tracer token is traveling, you should see Pending, and then the elapsed time should display in each column.
Subsequent implants of the tracer token will show only the latest one’s statistics. The drop-down list for Time Inserted can give you historical data. You can see the Tracer Tokens page here.

Validating Replicated Data

You can use Replication Monitor to validate that data replicated to the subscriber matches what was sent from the publisher for transactional and merge replication topologies.

You see how that works in Exercise 8.7.

Exercise 8.7: Validating Subscriptions in Replication Monitor

In this exercise, you will validate your subscriptions to AWContactsPub in Replication Monitor.

In SSMS, open Replication Monitor. In the left pane, expand to your publication, and then right-click it.
On the drop-down menu, choose Validate Subscriptions.
The Validate Subscriptions dialog box opens. In this dialog box, you can choose to validate all SQL Server subscriptions for the given publication or validate a particular SQL Server subscription for the publication. Choose the second radio button, Validate the Following Subscriptions, and highlight the AWCONTACTS subscription, as shown here.

Click the Validation Options button. This opens the Subscription Validation Options dialog box. Notice you have a number of choices on how to query the data used for validation. Keep the default of Compute a Fast Row Count. Notice that the check box for comparing checksums is also checked by default. Keep both defaults, as shown here. Click OK.
When the validation completes, a dialog box opens giving you the results of the publisher to distributor history, distributor to subscriber history, and the undistributed commands, as shown here.

If you are using merge publications, you can also validate that your data is partitioned correctly at the subscriber. You can find more information about this in SQL Server 2005 Books Online under the topic “Validating Partition Information for a Merge Subscriber.”

Validating Data Programmatically

As with all other replication tasks, you can use system stored procedures to validate your replicated data. The stored procedures that apply to validating data are as follows:

sp_article_validation
sp_marksubscriptionvalidation
sp_publication_validation
sp_validatemergepublication
sp_validatemergesubscription

You can find the syntax and parameters for each of these stored procedures in SQL Server 2005 Books Online under the stored procedure’s name.

Tip

Generally, data is not being delivered to your subscribers for one of two reasons. First, the data might not be reaching the destination because of either filtering, an issue with one of the agents, or an error in the replication. Second, if the data is being delivered, it might be being deleted at the subscriber immediately after it is applied.

Designing a Maintenance Plan

You have seen many instances where you can use Replication Monitor in designing a plan for the maintenance of your replication topologies. In addition, you can apply several maintenance jobs through either Replication Monitor or SSMS.

In addition to the agents mentioned earlier, replication has several jobs that perform both scheduled and on-demand maintenance. You can start and stop these jobs from the Jobs folder in SSMS and from the Common Jobs tab in Replication Monitor.

The following list describes the replication maintenance jobs:

Agent History Clean Up: Distribution This job removes replication agent history from the distribution database. Its default run time is every 10 minutes.
Distribution Clean Up: Distribution This job removes replicated transactions from the distribution database. If a subscription has not been synchronized within the maximum distribution retention period, it deactivates that subscription. Its default run time is every 10 minutes.
Expired Subscription Clean Up This job detects and removes expired subscriptions from publication databases. Its default run time is every day at 1 a.m.
Reinitialize Subscriptions Having Data Validation Failures This job detects all subscriptions that have data validation failures. Each failed subscription gets marked for reinitialization and gets a new snapshot applied the next time the Merge Agent or Distribution Agent runs. This job is not enabled by default and thus has no default schedule.
Replication Agents Checkup This job detects replication agents that are not actively logging history. If a job step fails, it writes the failure to the Microsoft Windows event log. Its default run time is every 10 minutes.
Replication monitoring refresher for distribution This job refreshes cached queries in Replication Monitor. It runs continuously.

You can view and modify the properties of these jobs in Replication Monitor or in SQL Server Agent by right-clicking the job and clicking Properties. You can modify the Job Steps, Schedules, Alerts, and Notifications settings to fit your maintenance plan.

Designing a Plan to Resolve Replication Conflicts

When you create a publication that accepts updates from the subscribers, you need to consider how you will handle conflicts. As you learned in Chapter 7, two communication mechanisms can propagate the changes: two-phase commit (often referred to as immediate updating) and queues (which are termed queued updating).

To handle the changes, SQL Server inserts a new column into each table that is included in the publication article. As you may recall, a new globally unique identifier (GUID) is placed in this created column, the uniqueidentifier column, each time the data row is modified. At the time of synchronization between the publisher and subscriber, the Queue Reader Agent compares the values of the GUID to see whether they are the same and whether a change has occurred, and the synchronization process updates the publisher.

Resolving Data Conflicts in Transactional Replication with Updating Subscribers

We’re assuming that you have completed Exercises 8.3 and 8.4, which created a transactional publication with a push subscription or that you know how to create publications and subscriptions. In Exercise 8.8, we’ll show how to create a transactional publication with an updatable subscription so you can see how it works. We will not take you through all the steps of creating the objects, instead just showing you the pertinent screens.

Exercise 8.8: Configuring a Transactional Publication with an Updatable Subscription

In this exercise, you will create a transactional publication of the ProductVendor table of the AdventureWorks database that has an updatable subscription.

Create the publication of the ProductVendor table on your default instance of SQL Server. If you select the ProductID column, the other columns necessary will be selected accordingly for you.

Notice that a uniqueidentifier column is added to the table to track changes, as shown here.
Complete the creation of the publication. We named ours AWVendorUpdate. Note that you also need to set up the linked server accounts to send the updates back to the publisher.

Since queued updating allows the possibility for updates to occur at all servers involved in the process simultaneously, you need to configure a conflict resolution policy.

The default conflict resolution policy is Keep the Publisher Change. You can find this policy on the Subscriptions Options page of the Publication properties sheet, as shown in Figure 8.13.

image from book
Figure 8.13: Conflict resolution policy for updatable subscription

The other two conflict resolution policy options you can specify are Keep the Subscriber Change or Reinitialize the Subscription.

Resolving Data Conflicts in Merge Replication

With merge replication, you can also make changes at both the publisher and the subscriber. Merge replication is often used for server-to-client applications as well as applications where the subscriber might be offline making updates for a period of time and then reconnect. When a Merge Agent runs, it synchronizes the changes by implementing triggers so that the publisher and subscriber converge to the same values. Each replicated table contains a GUID column to track each replicated row uniquely (see Figure 8.14).

image from book
Figure 8.14: Merge replication components

If the data has been modified at both locations, the Merge Agent detects the change as a data conflict. The decision on how the conflict is resolved is based upon how you have defined the conflict resolution policy for merge replication. By default, the publisher is the winner. However, you can choose from a series of resolution options, create your own resolver using stored procedures, or create an RMO procedure.

Figure 8.15 shows the Resolver tab of the Article Properties dialog box in SSMS.

image from book
Figure 8.15: Resolver tab of the Article Properties dialog box

You need to assess your application’s business logic needs when choosing which of the conflict resolvers to use.

Designing a Plan to Modify Agent Profiles

Replication agents are executables that perform tasks associated with replication. As you have seen, they perform a variety of jobs including creating copies of schema and data, detecting updates at the publisher or subscriber, and propagating changes between servers. Each replication agent is defined by the set of parameters given to the executable. When you install a replication topology, the agents defined are given default parameters; however, you are able to modify these parameters, thus modifying the agent’s profile.

Agent profiles are stored on the distributor. As an agent starts, it logs into the distributor and queries for the parameters in its profile. Each agent has at least one predefined profile, its default profile.

In addition to the default profile, the Log Reader Agent, Distribution Agent, and Merge Agent each have additional predefined profiles that you can choose that more closely fit your application. In addition, you are able to create your own agent profiles using the given parameters.

Understanding the Parameters

Here are some parameter concepts you might find helpful when modifying agent profiles:

After your initial testing, monitoring, or debugging, reduce the verbose levels of the agents.
Reduce the –HistoryVerboseLevel parameter and the –OutputVerboseLevel parameter of the Distribution Agents or Merge Agents. Reducing this parameter limits the number of new rows used to track input and output. When you need them for debugging purposes, increase the level again.
Use the –MaxBCPThreads parameter of the Snapshot Agent, Merge Agent, and Distribution Agent. The number of threads specified should not exceed the number of processors on the computer. When you use this parameter, you are provided with the number of bulk copy operations that can be performed in parallel when the snapshot is created and applied.
Use the –UseInprocLoader parameter of the Distribution Agent and the Merge Agent. If you are publishing tables that include XML columns, you cannot use this parameter. This parameter causes the agent to use the BULK INSERT command when the snapshot is applied.

Modifying Agent Profiles

You can modify the agent profiles in SSMS, in Replication Monitor, and programmatically.

Modifying Agent Profiles in SSMS

Here is the procedure to modify the agent profiles in SSMS:

Open SSMS. On the instance containing the distributor, right-click the Replication folder.
Choose Distributor Properties from the drop-down list.
On the General page, click the Profile Defaults button.
The Agent Profiles page opens. You can use this page to modify the agent profiles. Notice that the Distribution Agents, Merge Agents, and Log Reader Agents all have alternate profiles that have been precreated for you. If you search Books Online using the name of each agent, you can see the parameters explained.
You can choose a different agent profile and then click the button at the bottom of the window to Change Existing Agents if desired. Otherwise, the modifications will affect only new agents.
To view or edit the parameters associated with a profile, click the Browse button (ellipsis) on the far right.
The parameters and their values appear. You can see all parameters if you deselect the Show Only Parameters Used in This Profile check box.
To create your own parameters, click the New button, give the new configuration a name, and choose the desired parameters and their metrics accordingly.

Modifying Agent Profiles in Replication Monitor

Here is the procedure to modify the agent profiles in Replication Monitor:

Open Replication Monitor, and choose the publisher.
Right-click the publisher, and choose Agent Profiles.
You will see the same screen you saw if you used SSMS. Follow the steps starting at step 4 in the previous section.

Modifying Agent Profiles Programmatically in SSMS Using the Command Prompt

You need to understand that replication agents are executables that accept command-line parameters. Agents run as job steps under the SQL Server Agent. As such, you can modify them using their job step properties.

You can access the jobs and their corresponding job steps, which you are seeking to modify, either by viewing SQL Server Agent Jobs or by viewing the Replication Monitor tab named Common Jobs. In either case, clicking Properties opens the job. You can find the job steps on the Steps page and edit as you choose.

The changes you make take effect the next time the agent is started. In the case, where the agent runs continuously, it must be stopped and restarted.

Modifying Agent Profiles Programmatically Using Transact-SQL

You should never modify or delete predefined Agent Profiles.

To create a new agent profile, follow these steps:

At the Distributor, through Transact-SQL you need to execute the sp_add_agent_ profile system stored procedure. Here are the parameters it takes:
- For the name, use @name.
- For @profile_type, use a value of 1.
- For @agent_type, use one of the following: 1 for Replication Snapshot Agent, 2 for Replication Log Reader Agent, 3 for Replication Distribution Agent, 4 for Replication Merge Agent, or 9 for Replication Queue Reader Agent.
- For @default, use a value of 1 if you want this profile to be the default profile for its type of replication agent.

The identifier for the new profile is returned using the @profile_id output parameter.

Once the new profile has been created, you can customize it by adding, removing, or modifying the default parameters.

To modify an existing agent profile, follow these steps:

At the distributor, through Transact-SQL you need to execute the sp_help_agent_ profile system stored procedure. Specify one of the following values for the agent type (@agent_type) you want: 1 for Replication Snapshot Agent, 2 for Replication Log Reader Agent, 3 for Replication Distribution Agent, 4 for Replication Merge Agent, or 9 for Replication Queue Reader Agent.

Note the profile_id of the specific profile you want to change. The parameters of this profile will be returned to you. At this time, note any changes you want to make in the parameters.
To change the value of a parameter in a profile, execute the Transact-SQL system stored procedure sp_change_agent_profile. Specify the profile identifier for @profile_id, the name of the parameter to change for @property, and a new value for the parameter for @value.

An existing agent profile cannot become the default profile for an agent. You must create a new profile as the default profile.

To drop an agent profile, follow these steps:

At the distributor, through Transact-SQL you need to execute the sp_help_agent_ profile system stored procedure. Specify the one of the following values for the agent type (@agent_type) you want: 1 for Replication Snapshot Agent, 2 for Replication Log Reader Agent, 3 for Replication Distribution Agent, 4 for Replication Merge Agent, or 9 for Replication Queue Reader Agent.

Note the profile_id of the specific profile you want to drop.
Execute the sp_drop_agent_profile system stored procedure, specifying the profile identifier for @profile_id.

Tuning Replication Configuration

You can enhance replication performance in a number of ways. We will break them into five levels.

Tuning at the Server and Network Level

To enhance replication at the server and network level, try the following:

It is a best practice to set the minimum and maximum amount of memory allocated to Microsoft SQL Server Database Engine to avoid low memory availability during times of replication and operating system paging at other times.
Always use a separate disk drive for the transaction log for all databases involved in replication. This is a good standard practice for all your database development work.
Replication puts a large burden on your server’s memory, especially the tasks of the distributor. Consider adding memory to servers you use in your replication topology.
Use servers with multiprocessors. SQL Server services as well as replication agents can take advantage of additional processors.
Use a network with fast throughput. Don’t let your network be a bottleneck, especially in transactional replication. Even though you can specify settings for slower networks and your replication agent parameters can be modified accordingly, nothing is going to help as much as faster components.

Tuning at the Database Design Level

To enhance replication at the database design level, try the following:

For your publication database, you should follow best practices for database design. This is another good standard practice even if you are not replicating. However, be careful not to have unnecessary indexes at the subscriber. Additional indexes can slow down performance for inserts, updates, and deletes.
To reduce contention between your database users and your replication activity, you might consider setting the READ_COMMITTED_SNAPSHOT database option.
You need to be cautious with application logic in triggers. Especially when used at the subscriber, business logic in user-defined triggers can slow down the replication of changes.
- For transactional replication, it is often more efficient to include this logic in custom stored procedures.
- For merge replication, it is usually more efficient to use business logic handlers.

Limit large object (LOB) data types in your database. They generally require more processing than other column types. Use varchar(max), nvarchar(max), or varbinary(max) instead.

Tuning at the Publication Design Level

To enhance replication at the publication design level, try the following:

Publish only the data that is required. More is not always better. Additional resources will be used unnecessarily.
Use publication design and application behavior to minimize your conflicts. You can utilize publishing subsets of data to subscribers to offset conflicts. You should always plan your design.
Apply filters only when necessary. Applying filters to your data is done on a row-by-row basis by the Log Reader Agent. This affects the throughput of the data. Be judicious in the application of your filters.

Tuning at the Subscription Level

To enhance replication at the subscription level, try the following:

Consider using pull subscriptions if you have a large number of subscribers. Run your agents at the subscribers.
If large amounts of changes are sent to the subscribers, consider the reinitialization of the subscription.

Tuning at the Snapshot Level

To enhance replication at the snapshot level, try the following:

You should run the Snapshot Agent at off-peak times and otherwise only when necessary.
You should use the default native mode snapshot unless a character mode snapshot is required by a particular subscriber.
Create a single snapshot folder for a publication.
Place the snapshot folder on a drive local to the distributor. Do not put the snapshot folder on a drive that is used to store database or log files.
When creating the subscription database at the subscriber, you might consider specifying the recovery model of simple or bulk-logged. Either model will support minimal logging for inserted data. You can change the recovery model after the subscription is finished loading its initial data.
If you have a network with low bandwidth, you might want to consider using an alternate snapshot folder on removable media with compressed snapshots.
If you are replicating a large data set, you might consider initializing a subscription manually.

Setting Up a Performance Baseline

After you have set up your replication strategy, we highly recommend you set up a performance baseline. It is important that you understand the load that is typical for your replication scenario. You have already learned the vast amount of metrics available in Replication Monitor, but don’t forget that you can also use System Monitor and SQL Server Profiler to give you statistical information. If most of your strategy is built on transactional replication, latency and throughout are your two big metrics to keep a handle on. But here is a list of the standard “five dimensions” for which you should have typical numbers:

Latency Latency is the delay in time for a data change to be propagated between nodes in a replication topology.
Throughput Throughput is the amount of replication activity a system can sustain over a period of time.
Concurrency Concurrency is the number of simultaneous processes that can operate on a system.
Duration of synchronization Duration of synchronization is the amount of time it takes a given synchronization to complete.
Resource consumption Resource consumption are those resources used, usually at the hardware and network layers, by the replication process.