Lesson 1: Planning Replication

[Previous] [Next]

Good planning is the key to successful replication implementation. Setting up a replication scenario can be complex. The graphical tools make the physical implementation easy, but the design of the scenario warrants the same level of attention that the development of your original database schema did. You must have a working model before you can implement replication successfully.

This lesson begins with a conceptual consideration of business issues. It then addresses the details of physically implementing a replication solution. The lesson concludes with a discussion of data definition issues in replication.

After this lesson, you will be able to

Address issues in planning a Microsoft SQL Server version 7 replication scenario

Estimated lesson time: 30 minutes

Design Considerations

Designing the implementation of replication is similar to designing a database: you must plan the logical design before attempting the physical implementation. This section considers the questions you should address.

What Data Will You Publish?

When distributing data, your subscribing servers often require only a subset of data from the database. It is important to replicate only necessary data to reduce disk space use, processing time, and network input/output (I/O).

Think about how you should group articles. Will you publish subsets of data—such as by group, by site, or by region? If you have a central Subscriber/multiple Publisher scenario, perhaps each site should publish all of its tables, and the central location should subscribe to each publication.

If you disseminate data, you must do one of the following:

Create one publication for global data to which all sites will subscribe and one publication for partitioned data for each Subscriber

Create one publication for each Subscriber that contains both the global data and the partitioned data

IMPORTANT
You are able to make changes to publications, such as adding new articles, but you have to remove all subscriptions to the publication before you do so. You must then re-create the subscriptions after making the changes. This is one of the reasons why you should plan carefully before adding all of your publications and subscriptions.

Who Receives the Data?

Which servers will subscribe to the data? What characteristics do the destination servers have? Are they known, or are they off line? If sites must have update capabilities, you need to address how to resolve conflicts or have high-quality connections to those sites to make the Immediate Updating Subscribers option feasible.

How Often Must Data Be Replicated?

How will replication be accomplished—on a schedule or on demand?

If your application requirements allow latency, you can schedule infrequent updates. For example, if a site uses data for summary reporting or trend reporting on sales activity for the current year, the data need not be 100 percent current. A marketing department (decision support) is a good example of a site that would not require the most current information; once a week or once a month might be adequate.

What Are the Network Characteristics?

Use the following network considerations to help you choose the type of replication that best suits your network:

Are all nodes in the network always available, or are they available only at periodic intervals?

If they are not always available, consider using a type of replication that supports higher latency.

Is the network fast?

If not, you may want to use filtering to minimize the quantity of data that is sent across the network. Be sure to note whether slow links exist that could be bypassed by having a remote distribution server.

What is the network capacity?

If the capacity is low, consider using remote Distributors. Determine the time of day that has the greatest activity level. This is important if offline Subscribers will connect during peak usage periods. Schedule accordingly.

How reliable is the network?

If failures are common, it is not wise to use a replication plan that includes distributed transactions, as these cannot be completed when the servers involved in the distributed transaction are not connected.

Determining a Replication Solution

Preliminary planning of the physical implementation of the logical design includes addressing the questions in this section.

What Is Your Replication Topology?

What replication topology will you implement? Establishing the server roles lays the physical framework for implementing replication. The decisions to be made are as follows:

Select the physical replication model.

Determine whether the Distributor will be local or remote.

Determine whether the distribution database will be shared. If you have multiple Publishers that share a Distributor, will each use its own distribution database, or will they all share a distribution database on the Distributor?

What Type of Replication Will You Use?

You can select snapshot, transactional, or merge replication.

Who Initiates Replication Activities?

Push subscriptions use Distributor resources, and pull subscriptions use Subscriber resources.

How Many Subscribers Will There Be?

Estimating the number of Subscribers will help to determine the load on the distribution server.

What Are Your Space Requirements?

The following factors affect transaction log sizes for all databases that are involved in replication, as well as the size of the distribution database and working folder on the Distributor:

The amount of activity in the published databases

The number of publications and articles

Replication frequency

Replication latency

Type of replication

Data Definition Issues

Replication data must be defined with specific characteristics because some characteristics cannot be replicated or are altered when replication takes place. This section discusses the issues involved in data definition.

Using Data Types

The following table lists the data types that affect replication.

Data type	Impact
Timestamp	Indicates the sequence of SQL Server activity on a row. Despite the name, timestamp values are not related to date or time; they are database-wide, unique numbers that are intended only to indicate whether a row has changed. The values do not usually have any relevance on the Subscriber, so the timestamp column can be filtered out. If replicated, a timestamp column is replicated to the Subscriber as a binary(8) data type. If you are using the Immediate Updating Subscribers option, a timestamp column is required; one will be added to a table if it does not already exist. Timestamp columns must be removed from merge publications because it is not possible for timestamp column values to ever converge across all sites.
Uniqueidentifier	A globally unique identifier (GUID) created by setting the default for the column to the NEWID() function. In tables published for merge replication, a uniqueidentifier column is required; one will be added to a table if it does not already exist.
User-defined	Cannot be replicated unless the user-defined data type already exists in the Subscriber database.

TIP
For consistent query results on data that is replicated between servers, it is recommended that each server that participates in replication use the same character set and sort order. This is not required, but this practice ensures that query processing behavior is consistent across all servers.

Using an IDENTITY Property

During replication, the value of the column with an IDENTITY property is replicated, but the property itself is not replicated. At initial snapshot, identity columns are not created with the IDENTITY property. This means that values on the Subscriber will always match those on the Publisher.

To partition the data at each site, you can set up an identity column again on the Subscriber. Be sure that you

Using the NOT FOR REPLICATION Option

The NOT FOR REPLICATION option allows you to disable certain features—such as the IDENTITY property, CHECK constraints, and triggers—for data replicated to a Subscriber. When users modify data on the Subscriber, these features operate normally.

A good use of this option is with the multiple Publishers/multiple Subscribers model. For example, you may have a CHECK constraint that verifies that a location is within a particular sales region, ensuring that sales representatives cannot enter orders for customers who are not within their regions. Adding the NOT FOR REPLICATION option to the CHECK constraint allows the data from other regions to be replicated even though it will contain invalid locations.

Set this option when you define a column with an IDENTITY property or a CHECK constraint or when you create triggers.

Configuring Network Security Access

Before you begin implementing a replication scenario, you should ensure that you have met certain basic requirements. This section describes the tasks you need to perform.

Establish Trust Relationships Among Domains

If servers that participate in replication reside within separate Microsoft Windows NT Server domains, you must establish trust relationships between those domains. For information on establishing trust relationships, see Windows NT Server Help.

Verify the Windows NT Domain User Account for SQL Server Agent

By default, replication uses the Windows NT domain user account that SQL Server Agent uses. It is advisable to use the same SQL Server Agent domain user account for all servers that are participating in replication.

Verify that you have a domain user account with administrative privileges in Windows NT for SQL Server Agent. The account should be a member of the Windows NT local Administrators group.