Data Replication

Replication allows you to automatically distribute copies of data from one server to one or more destination servers at one or more remote locations. Data integrity is a key design point of SQL Server's replication capabilities. The data at replicating sites might be slightly outdated, but it will accurately reflect the state of the data at a recent point in time and is guaranteed to reflect any changes made soon after they occur (or soon after a reliable network connection is established).

SQL Server 2000 supports three modes of replication, which are detailed on the companion CD in a whitepaper titled "Understanding SQL Server 7.0 Replication."

  • Transactional replication In this mode of replication, one site is considered the owner, or publisher, of a published article (a table or a subset of a table), and other sites subscribe to that article. All changes made to the article must be made at the publisher and then replicated to the subscribers.
  • Merge replication Although the publisher defines the articles to be published, any participating sites can make changes. System tables keep track of the changes, which are then propagated to all participants, including the original publisher. You can define which type of conflict resolution to use in the event of conflicting updates made by multiple sites.
  • Snapshot replication All data from the publishing site is periodically copied to all the subscribing sites. Individual changes aren't propagated. This method was formerly called scheduled table refresh.

Distributed transactions that use the two-phase commit protocol guarantee that the ACID properties are maintained, but replication doesn't. Replication isn't strongly consistent (the C in ACID). Instead, it provides loosely consistent data. With the two-phase commit protocol, a transaction is an all-or-nothing proposition, and the data is guaranteed to be strongly consistent. But inherent in the two-phase commit algorithm is the fact that a failure at any one site makes the entire transaction fail or can keep the transaction in doubt for a long period of time, during which all participants must hold locks, thereby crippling concurrency. SQL Server does provide an option to transactional replication called Immediate Updating Subscribers that uses distributed transactions between one subscriber and the publisher. Other subscribers to the same publication will not be updated as part of the transaction, however.

At first glance, you might think a system should require that updates be made at all sites in real time. In fact, when the costs of two-phase commit are realized (chiefly, the vulnerability that can result from a failure at just one node), the most pragmatic solution might be to make changes in real-enough time.

For example, suppose you run a car rental agency with 500 rental counters worldwide and you maintain a customer profile table containing 500,000 renters who belong to your Gold Card program. You want to store this customer profile table locally at all 500 rental counters so that even if a communication failure occurs, the profiles will be available wherever a Gold Card member might walk up to do business. Although all sites should have up-to-date records of all customers, it would be disastrous to insist that an update of the Gold Card profile must occur as part of a two-phase transaction for all 500 sites or not at all. If you were to do that, because of the vagaries of worldwide communication or because a storm might knock out the power at one site, you wouldn't be able to perform a simple update to the customer profile very often.

Replication is a much better solution in this case. The master customer profile table is maintained at your corporate headquarters. Replication publishes this data, and the rental counters subscribe to this information. When customer data is changed or when a customer is added or removed, these changes (and only these changes) are propagated to all the subscribing sites. In a well-connected network, the time delay might be just a few seconds. If a particular site is unavailable, no other sites are affected—they still get their changes. When the unavailable site is back online, the changes are automatically propagated and the subscriber is brought up to date. At any time, a given rental counter might have information that's slightly out of date—that is, it might reflect the state of the data at the corporate headquarters at some earlier time. This is considered loosely consistent, as opposed to the strongly consistent model of two-phase commit, in which all sites (or none) immediately reflect the change.

Although a time delay can occur in loosely consistent systems, maintaining transactional consistency is one of the chief design points of SQL Server replication. If multiple updates occur as a single atomic transaction to data being replicated, the entire transaction is also replicated. At the subscribing site, the transaction either entirely commits or is again replicated until it commits.

With SQL Server, data can be replicated continuously or at specified intervals. It can be replicated in its entirety or as filtered subsets (known as horizontal and vertical partitions). In addition to replicating to other SQL Servers, SQL Server 2000 can replicate to heterogeneous data sources that have an appropriate ODBC driver or OLE DB provider available. These include Access databases, Oracle databases, and other ODBC data sources. SQL Server 2000 can also replicate from other data sources, using ODBC drivers or OLE DB providers. These sources include Oracle 8, Microsoft Jet version 4, and IBM DB/2.

Unlike SQL Server, some products on the market promote replication as an "update anywhere, anytime, any way" model. However, this model has inherently unstable behavior if many nodes participate and update activity is moderate to heavy.1

Updates made at multiple sites will conflict with one another and will have to be reconciled. SQL Server 2000's merge replication does allow multiple site updates, but it's not a true "update anywhere, anytime, any way" solution. (See the whitepaper on the companion CD for more details on exactly what SQL Server merge replication can do.)



Inside Microsoft SQL Server 2000
Inside Microsoft SQL Server 2000
ISBN: 0735609985
EAN: 2147483647
Year: 2005
Pages: 179
Authors: Kalen Delaney

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net