Good planning is the key to successful replication implementation. Setting up a replication scenario can be complex. The graphical tools make the physical implementation easy, but the design of the scenario warrants the same level of attention that the development of your original database schema did. You must have a working model before you can implement replication successfully.
This lesson begins with a conceptual consideration of business issues. It then addresses the details of physically implementing a replication solution. The lesson concludes with a discussion of data definition issues in replication.
After this lesson, you will be able to
- Address issues in planning a Microsoft SQL Server version 7 replication scenario
Estimated lesson time: 30 minutes
Designing the implementation of replication is similar to designing a database: you must plan the logical design before attempting the physical implementation. This section considers the questions you should address.
When distributing data, your subscribing servers often require only a subset of data from the database. It is important to replicate only necessary data to reduce disk space use, processing time, and network input/output (I/O).
Think about how you should group articles. Will you publish subsets of data—such as by group, by site, or by region? If you have a central Subscriber/multiple Publisher scenario, perhaps each site should publish all of its tables, and the central location should subscribe to each publication.
If you disseminate data, you must do one of the following:
IMPORTANT
You are able to make changes to publications, such as adding new articles, but you have to remove all subscriptions to the publication before you do so. You must then re-create the subscriptions after making the changes. This is one of the reasons why you should plan carefully before adding all of your publications and subscriptions.
Which servers will subscribe to the data? What characteristics do the destination servers have? Are they known, or are they off line? If sites must have update capabilities, you need to address how to resolve conflicts or have high-quality connections to those sites to make the Immediate Updating Subscribers option feasible.
How will replication be accomplished—on a schedule or on demand?
If your application requirements allow latency, you can schedule infrequent updates. For example, if a site uses data for summary reporting or trend reporting on sales activity for the current year, the data need not be 100 percent current. A marketing department (decision support) is a good example of a site that would not require the most current information; once a week or once a month might be adequate.
Use the following network considerations to help you choose the type of replication that best suits your network:
If not, you may want to use filtering to minimize the quantity of data that is sent across the network. Be sure to note whether slow links exist that could be bypassed by having a remote distribution server.
If the capacity is low, consider using remote Distributors. Determine the time of day that has the greatest activity level. This is important if offline Subscribers will connect during peak usage periods. Schedule accordingly.
If failures are common, it is not wise to use a replication plan that includes distributed transactions, as these cannot be completed when the servers involved in the distributed transaction are not connected.
Preliminary planning of the physical implementation of the logical design includes addressing the questions in this section.
What replication topology will you implement? Establishing the server roles lays the physical framework for implementing replication. The decisions to be made are as follows:
You can select snapshot, transactional, or merge replication.
Push subscriptions use Distributor resources, and pull subscriptions use Subscriber resources.
Estimating the number of Subscribers will help to determine the load on the distribution server.
The following factors affect transaction log sizes for all databases that are involved in replication, as well as the size of the distribution database and working folder on the Distributor:
Replication data must be defined with specific characteristics because some characteristics cannot be replicated or are altered when replication takes place. This section discusses the issues involved in data definition.
The following table lists the data types that affect replication.
Data type | Impact |
---|---|
Timestamp | Indicates the sequence of SQL Server activity on a row. Despite the name, timestamp values are not related to date or time; they are database-wide, unique numbers that are intended only to indicate whether a row has changed. The values do not usually have any relevance on the Subscriber, so the timestamp column can be filtered out. If replicated, a timestamp column is replicated to the Subscriber as a binary(8) data type. If you are using the Immediate Updating Subscribers option, a timestamp column is required; one will be added to a table if it does not already exist. Timestamp columns must be removed from merge publications because it is not possible for timestamp column values to ever converge across all sites. |
Uniqueidentifier | A globally unique identifier (GUID) created by setting the default for the column to the NEWID() function. In tables published for merge replication, a uniqueidentifier column is required; one will be added to a table if it does not already exist. |
User-defined | Cannot be replicated unless the user-defined data type already exists in the Subscriber database. |
TIP
For consistent query results on data that is replicated between servers, it is recommended that each server that participates in replication use the same character set and sort order. This is not required, but this practice ensures that query processing behavior is consistent across all servers.
During replication, the value of the column with an IDENTITY property is replicated, but the property itself is not replicated. At initial snapshot, identity columns are not created with the IDENTITY property. This means that values on the Subscriber will always match those on the Publisher.
To partition the data at each site, you can set up an identity column again on the Subscriber. Be sure that you
The NOT FOR REPLICATION option allows you to disable certain features—such as the IDENTITY property, CHECK constraints, and triggers—for data replicated to a Subscriber. When users modify data on the Subscriber, these features operate normally.
A good use of this option is with the multiple Publishers/multiple Subscribers model. For example, you may have a CHECK constraint that verifies that a location is within a particular sales region, ensuring that sales representatives cannot enter orders for customers who are not within their regions. Adding the NOT FOR REPLICATION option to the CHECK constraint allows the data from other regions to be replicated even though it will contain invalid locations.
Set this option when you define a column with an IDENTITY property or a CHECK constraint or when you create triggers.
Before you begin implementing a replication scenario, you should ensure that you have met certain basic requirements. This section describes the tasks you need to perform.
If servers that participate in replication reside within separate Microsoft Windows NT Server domains, you must establish trust relationships between those domains. For information on establishing trust relationships, see Windows NT Server Help.
By default, replication uses the Windows NT domain user account that SQL Server Agent uses. It is advisable to use the same SQL Server Agent domain user account for all servers that are participating in replication.
Verify that you have a domain user account with administrative privileges in Windows NT for SQL Server Agent. The account should be a member of the Windows NT local Administrators group.
NOTE
The Windows NT account that you use for SQL Server Agent cannot be the Local System or a local user account, because neither account allows access to the network. You can use a SQL Server login account rather than a Windows NT account.
Before you implement replication, you must consider all of the replication types and physical models and design a replication solution that matches your environment. Certain data types and table attributes require special attention when designing your replication solution. The first step in implementing a replication solution is to ensure that you have configured your SQL Server security for network access.