Highly Available Replication Architecture


There are three words that should apply to any replication architecture: keep it simple. The more complex it is, the more difficult it is to maintain over the long term. In the event of a disaster (see the later section, Disaster Recovery with a Replicated Environment), the simpler the topology, the easier it should be to recover. You should also test the normal operation of replication and simulate an actual switch to the replicated secondary and, if applicable, a switch back to the original Publisher, prior to rolling out your replication architecture in a production environment.

The three obvious components that you are looking to protect are the Publisher, Distributor, and Subscriber. Of these three, the first two, the Publisher and Distributor, are the most important to protect. Without the Publisher, you have no source. Without the Distributor, you have no way to push the data out to Subscribers. It is as simple as that.

Note

Remember that if you are deploying merge or bidirectional transactional replication, you will need some sort of conflict resolution to ensure that there will always be a winner if more than one update of the same piece of data is generated. Merge replication has conflict resolution built in, whereas transactional replication needs custom resolvers. For more information, see the topics Implementing Non-partitioned, Bidirectional Transactional Replication and Merge Replication Conflict Detection and Resolution in the updated SQL Server Books Online; they describe more factors you need to take into account in any disaster recovery scenario.

Replication Agents

Before you can plan your architecture, you must understand which replication agents, the underlying components that perform the different actions of replication, are used by each method of replication (see Table 8-1).

Warning

Do not modify any replication objects (tables, triggers, stored procedures, and so on) that are created on your Publisher, Distributor, or Subscriber. Doing so makes your solution unsupported.

Table 8-1: Agents Per Model of Replication

Agent                   Snapshot   Merge   Transactional
Snapshot Agent          Yes        Yes     Yes
Distribution Agent      Yes        No      Yes
Log Reader Agent        No         No      Yes
Merge Agent             No         Yes     No
Queue Reader Agent      No         No      Yes
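
If you want to see which of these agents are actually configured on a given server, one quick check is the SQL Server Agent jobs they run under; replication agents are created as jobs in categories whose names start with REPL- (for example, REPL-Snapshot, REPL-LogReader, REPL-Distribution, REPL-Merge, and REPL-QueueReader). The following is a minimal sketch only, assuming those category names match your build:

    -- Run on the Distributor (or wherever the agents are configured).
    -- Lists the replication agent jobs grouped by agent type.
    USE msdb
    GO
    SELECT c.name AS agent_category,
           j.name AS job_name,
           j.enabled
    FROM   dbo.sysjobs AS j
           INNER JOIN dbo.syscategories AS c
                   ON j.category_id = c.category_id
    WHERE  c.name LIKE 'REPL-%'
    ORDER  BY c.name, j.name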

Snapshot Agent

The Snapshot Agent is used with all models of replication. It prepares schema and initial data files of published tables and stored procedures, stores the snapshot files, and inserts information about initial synchronization in the distribution database. The Snapshot Agent typically runs at the Distributor.

There is one Snapshot Agent per publication. When you create a snapshot of data, there are physical files, or objects, associated with it, and these differ with the model of replication, as shown in Table 8-2. These objects vary in number depending on your replication configuration's publications and articles; there is no exact formula to calculate it, and you can only get the exact count by testing. All forms of replication are initiated with a snapshot.

Once you create the initial snapshot, you should consider backing up the snapshot files if you want to use them to initiate replication again. You can either back up the snapshot folder itself or allow another execution of the Snapshot Agent to generate another snapshot. As part of a disaster recovery plan, you can use the old or the new snapshot files (depending on your strategy) for the publication prior to creating a new subscription or reinitializing an existing one.
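
If you are not sure where the snapshot files live, sp_helpdistributor reports the Distributor's snapshot working directory. The sketch below simply confirms the folder before you include it in your file backups; the example UNC path is a placeholder:

    -- Run on the Publisher to report its Distributor configuration.
    EXEC sp_helpdistributor
    -- The directory column in the result set is the snapshot working
    -- directory (for example, \\DISTSRV\ReplData); back up that folder
    -- if you want to preserve the generated snapshot files.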

Note

Triggers configured with replication are AFTER triggers and behave as normal AFTER triggers. Configuring replication does not run the sp_settriggerorder stored procedure, so the replication triggers have no guaranteed firing order relative to your own AFTER triggers.

Table 8-2: File Types Associated with Replication

File Type                        Merge   Snapshot or Transactional
Conflict tables (.cft)           Yes     No
Constraints (.idx)               No      Yes
Constraints and indexes (.dri)   Yes     Yes
Data (.bcp)                      Yes     Yes
Schema (.sch)                    Yes     Yes
System table data (.sys)         Yes     No
Triggers (.trg)                  Yes     No

Distribution Agent

The Distribution Agent is used with snapshot replication and transactional replication. It moves snapshot files and incremental changes held in the distribution database to Subscribers. The Distribution Agent typically runs at the Distributor for push subscriptions and at the Subscriber for pull subscriptions.
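
Because the push-versus-pull decision is what determines where the Distribution Agent runs and consumes resources, it is worth seeing where that decision is made. The following is a minimal sketch only; the publication, Subscriber, and database names are placeholders:

    -- Run on the Publisher, in the published database.
    -- A push subscription: the Distribution Agent runs at the Distributor.
    EXEC sp_addsubscription
        @publication       = N'SalesPub',
        @subscriber        = N'SUB1',
        @destination_db    = N'SalesSubDB',
        @subscription_type = N'push',
        @sync_type         = N'automatic'
    -- For a pull subscription, specify @subscription_type = N'pull' and also
    -- register the subscription at the Subscriber (sp_addpullsubscription),
    -- which moves the Distribution Agent workload to the Subscriber.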

Log Reader Agent

The Log Reader Agent is used with transactional replication. It moves transactions marked for replication from the transaction log on the Publisher to the distribution database. Each database marked for transactional replication has one Log Reader Agent that runs on the Distributor and connects to the Publisher.
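
Because the Log Reader Agent is the only process draining replicated transactions out of the publishing database's log, it is worth keeping an eye on its backlog and latency. One simple check is sp_replcounters (the exact columns returned can vary by service pack):

    -- Run on the Publisher. Returns one row per published database showing
    -- the number of replicated transactions still waiting in the log and
    -- the current Log Reader latency.
    EXEC sp_replcounters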

Merge Agent

The Merge Agent is used only with merge replication. It applies the initial snapshot at the Subscriber. It then moves and reconciles incremental data changes that occurred after the initial snapshot was created. Each merge subscription has its own Merge Agent that connects to and updates both Publisher and Subscriber. The Merge Agent typically runs at the Distributor for push subscriptions and at the Subscriber for pull subscriptions.

Queue Reader Agent

The Queue Reader Agent is used with snapshot or transactional replication when you select the Queued Updating Subscribers option. It moves transactions and reconciles those incremental changes from a Subscriber with those at the Publisher as needed. There is one Queue Reader Agent per published database and the Queue Reader Agent typically runs at the Distributor regardless of whether push or pull subscriptions are employed.
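
The Queue Reader Agent exists only because the publication was created with queued updating enabled, so that choice is made at publication time. A minimal sketch of the relevant options follows; the publication name is a placeholder, and you should confirm the parameter names against Books Online for your build:

    -- Run on the Publisher, in the publication database.
    EXEC sp_addpublication
        @publication       = N'SalesPub',
        @repl_freq         = N'continuous',   -- transactional publication
        @allow_queued_tran = N'true',         -- enables queued updating Subscribers
        @conflict_policy   = N'pub wins'      -- Publisher wins queued conflicts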

Note

In the end, what is important is where the agents run and consume resources.

Scenario 1: Separate Publisher and Distributor

When you use transactional replication, you should not put the Publisher and Distributor on the same server, a configuration otherwise known as a Local Distributor (see Figure 8-1).

Figure 8-1: Local Distributor in a replication topology.

It is better to separate the Publisher and Distributor and create what is known as a Remote Distributor (see Figure 8-2). In fact, you can have more than one Distributor, but a Publisher can only talk to one Distributor. Multiple Publishers can share the same Distributor.

Figure 8-2: Remote Distributor in a replication topology.

Using a Remote Distributor has the following benefits:

  • It can improve performance because it offloads work performed by the Replication Agents from the primary computer.

  • It allows for the smoothest possible upgrade paths. SQL Server replication is designed to support replication among servers of differing versions. This can be an advantage because you can upgrade one computer at a time, redirecting client connections to a replica during the upgrade process. Users achieve an aggregate data-access availability that might not be possible in a nonreplicated scenario. You can also do this with merge and transactional replication with queued updating even if the Publisher and Distributor are on the same machine, but the best scenario is if they are separated.

  • A single SQL Server acting as both Publisher and Distributor is removed as a single point of failure. Although the Publisher and the Distributor can each still be a single point of failure, you are not creating a bigger problem by placing them on the same server.

  • In a disaster recovery scenario, you might only be dealing with one server and one piece of functionality, not multiple problems. This approach helps to minimize single points of failure, and is a compelling reason to use a Remote Distributor.
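
Configuring a Remote Distributor is done with the standard replication system stored procedures (or the Configure Publishing and Distribution Wizard). The following is a minimal sketch only; the server names, password, and working directory are placeholders, and you should review the full parameter lists in SQL Server Books Online before using them:

    -- Step 1: run on the server that will be the Remote Distributor.
    EXEC sp_adddistributor
        @distributor = N'DISTSRV',           -- this server
        @password    = N'StrongPassword1'    -- distributor_admin password
    EXEC sp_adddistributiondb
        @database = N'distribution'
    EXEC sp_adddistpublisher
        @publisher         = N'PUBSRV',      -- the Publisher that will use it
        @distribution_db   = N'distribution',
        @working_directory = N'\\DISTSRV\ReplData'

    -- Step 2: run on the Publisher, pointing it at the Remote Distributor.
    EXEC sp_adddistributor
        @distributor = N'DISTSRV',
        @password    = N'StrongPassword1'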

Scenario 2: Using a Republisher

This scenario is not dissimilar to using a Remote Distributor. You still have a Publisher and a Distributor, but you then take a Subscriber and have it publish the data out to other servers. That Subscriber is then known as a Republisher, or a Publishing Subscriber (see Figure 8-3). In this case, the Republisher would act as its own Distributor, or you could even set up another one. This scenario benefits you if your company has sites around the world: because of network issues, it is often easier and more cost-effective to have local servers distributing data to nearby Subscribers. It also increases the availability of your solution, because even if the original Publisher or Distributor goes down, any changes that have already reached the Republisher can still be propagated to its Subscribers.

Figure 8-3: Using a Republisher.

Note

The Republisher topology is really a hierarchy of servers; it is not something you use to implement bidirectional transactional replication.
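
If you do implement a Republisher, remember that the Subscriber database must be enabled for publishing just like any other published database before you can create publications on it. A minimal sketch, with a hypothetical subscription database name:

    -- Run on the Republisher (the Subscriber that will republish the data).
    EXEC sp_replicationdboption
        @dbname  = N'SalesSubDB',
        @optname = N'publish',
        @value   = N'true'
    -- After this, create publications and articles on the Republisher in the
    -- normal way, using either its own Distributor or a separate one.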

SQL Server Service Packs and Replication

SQL Server service packs will definitely make an impact on your replication implementation. If you apply them in the wrong order, you might need to reinitialize your replicated environment or use the disaster recovery procedures detailed here. See the section Applying a SQL Server 2000 Service Pack in Chapter 13, Highly Available Upgrades, for more information on replication and SQL Server service packs. The order in which you apply the service pack to the servers participating in replication matters: update the Distributor first, then the Publisher, and finally the Subscribers.

You can also use replication to help keep your database available during a service pack install, much as you can use log shipping: switch to the standby and allow updates to happen there, assuming that it can function as the primary in every way it needs to. Once the service pack is done on the Publisher, switch back and upgrade the Subscriber. This is only applicable after the Distributor has been upgraded, and the Distributor might even be the same server as the Publisher.

Planning Disk Capacity for Replication

Replication affects your disk capacity planning (in addition to anything outlined in Chapter 4 and the rest of the book), and disk capacity is an obvious concern for any highly available environment. You do not want to run out of disk space before you even implement your solution. Table 8-3 walks through the requirements for each model of replication.

Table 8-3: Disk Requirements for Replication

Merge

Impact on transaction log: The database and its log file will grow by a small amount when updates to a published table are made, because the change tracking metadata results in additional inserts, updates, or deletes to system tables during the execution of user transactions. Changes are tracked in the database itself, not in the log, so there is no impact of replication frequency on your backup strategy as it relates to truncating the transaction log.

Disk space needed: Review SQL Server Books Online for a description of the change tracking tables created by merge replication in a user database. The maximum size is rarely used, but row-tracked tables can incur up to 249 bytes of change tracking overhead and column-tracked tables can incur up to 2,048 bytes per changed row. Remember, only changed rows get a metadata entry; rows that are never updated do not accumulate change metadata. Also, the Merge Agent trims aged metadata out of these system tables based on the publication retention period. The distribution database does not store tracked changes; it stores only history and error information, which is also trimmed periodically based on the publication retention period.

Snapshot

Impact on transaction log: Not a concern from a replication standpoint, because the Subscriber gets everything that is in the primary.

Disk space needed: A snapshot generates files in the file system representing the schema and data for all published objects. The Snapshot Agent typically BCPs (bulk copies) data out from the Publisher to a file in a SQL Server binary format. You can further reduce the disk impact by setting a publication property so that the files generated by the snapshot are compressed into a .cab file. The ultimate disk impact therefore varies with the number of objects published, the amount of data published, and whether compression is used when generating the snapshot files. In general, if you have a large database, the file system storage requirements will be large. The distribution database itself stores relatively little: a few bytes of control data directing the subsequent forwarding of the snapshot files to the Subscribers, and a small amount of history and error information reflecting the execution of the Snapshot Agent.

Transactional

Impact on transaction log: Transactional changes are tracked in the transaction log of the publishing database. The Log Reader Agent moves transactions from the publishing database's transaction log to the distribution database, where they are queued for subsequent distribution to Subscribers. The transaction log cannot be truncated beyond the last transaction processed by the Log Reader Agent, so additional log growth can occur when replication is employed. To minimize the potential growth of the publishing database's log, run the Log Reader Agent continuously.

Disk space needed: Disk space is an obvious concern because the Distributor is now storing copies of the transactions generated at the publishing database until they can be distributed to all Subscribers. The transaction log is also a concern because transactions in the published database cannot be truncated from the log until they have been moved into the distribution database, so you need to size the data and log portions appropriately for both the published database and the distribution database.
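
To see whether the Log Reader Agent is keeping up, and therefore whether the publishing database's log can actually be truncated, you can check for replicated transactions that have not yet reached the distribution database. Two simple checks, run in the published database, are sketched below:

    -- Reports the oldest active transaction and, for a published database,
    -- the oldest replicated transaction that has not yet been delivered to
    -- the distribution database (which is what holds up log truncation).
    DBCC OPENTRAN

    -- Lists the transactions in the log that are marked for replication but
    -- have not yet been processed by the Log Reader Agent.
    EXEC sp_repltrans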

Important

No matter what model of replication you use, you should not use the Simple Recovery model for your databases. For more information on recovery models, consult Chapter 9, Database Environment Basics for Recovery, and Chapter 10, Implementing Backup and Restore.
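
A quick way to confirm the current recovery model of a published database and change it if necessary is sketched below; the database name is a placeholder:

    -- Check the current recovery model.
    SELECT DATABASEPROPERTYEX(N'SalesDB', 'Recovery') AS recovery_model

    -- Switch to the Full recovery model if it is not already in use.
    ALTER DATABASE SalesDB SET RECOVERY FULL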



