Lesson 1: Introduction to Distributed Data

[Previous] [Next]

This lesson starts with a description of the distributed data environment and addresses factors in deciding how to distribute data. It then summarizes the various methods of distributing data.

After this lesson, you will be able to

  • List the various methods for distributing data
  • Describe the characteristics of Microsoft SQL Server version 7 replication

Estimated lesson time: 20 minutes

The Need for Distributed Data

A distributed data environment is one that can include multiple copies of the same information on multiple servers. For example, in an international company using a sales tracking application, data originates in one country and is distributed to servers in other countries to make querying more efficient.

If a distributed data environment evolves, rather than being created intentionally, it may become necessary to implement a solution that makes it more manageable. For example, during the early development of a company, different divisions typically have their own independent servers, which may run different database management systems. The challenge is to make the data more accessible to the whole company.

When you create a distributed data environment, you design a solution that

  • Brings the data closer to the user
  • Allows sites to operate independently (autonomously)
  • Separates online transaction processing (OLTP) from read-intensive applications, such as data marts and data warehousing
  • Can reduce conflicts
  • Because data can be distributed throughout the network, the information is made available to many users, and conflicts are reduced during user requests.

Considerations for Distributing Data

Two principal strategies exist for implementing distributed data: distributed transactions and replication. With both strategies, it is possible to keep multiple copies of data current. It also is possible to design a distributed environment that includes aspects of each strategy. Figure 15.1 compares the major differences between replication and distributed transactions; these differences are described in the sections that follow.

click to view at full size.

Figure 15.1 Comparing replication with distributed transactions

Distributed Transactions

Distributed transactions guarantee that all copies of your data are consistent all the time. This is usually based on the two-phase commit protocol. Each server that is included in a distributed transaction must be online and able to complete its part of the transaction. The distributed transactions method is less scalable than replication because the failure of a transaction at one site means failure at all sites. You should use this approach only when data must be synchronized at all times.

Distributed transactions are typically used for, but not limited to, applications that make simultaneous updates to data in more than one database. For example, distributed transactions might be used in an invoicing application that adds an invoice to a branch invoicing database and updates a central stock database, which records stock levels at a warehouse used by all branches.

Replication

With replication, recent copies of data are duplicated and distributed from a source database to a destination database, usually on a separate server. Autonomous sites are supported, allowing more scalability because sites can be online intermittently. Databases participating in replication can be located on a large server servicing hundreds of users but can also be located on a single user's computer, making replication useful for a wide variety of applications.

Replication is typically used for, but not limited to, data that is duplicated from one database to another. For example, replication might be used for a product catalog that is maintained at a central office and replicated to branch offices.

Factors in Deciding How to Distribute Data

When you decide how to distribute data, you must consider such factors as latency, site autonomy, transactional consistency, and database update conflicts.

Latency

Latency is the time delay that occurs between the updates to two or more sets of distributed data. You need to consider what degree of latency is acceptable for your database application. Distributed transactions require near-zero latency, while replication may cause or allow latency of a few seconds to a number of days.

Site Autonomy

Site autonomy refers to the degree to which sites can operate independently. Distributed transactions require that sites be permanently connected, whereas some forms of replication can allow sites to be completely disconnected for long periods.

Transactional Consistency

A transaction is a series of data modifications that must be completed entirely or not at all. Distributed transactions enforce complete, immediate transactional consistency. Some forms of replication maintain transactional consistency, although there is a delay from the initial update; other forms of replication do not guarantee transactional consistency.

Database Update Conflicts

If data is being updated at different sites, conflicts can occur. Distributed transactions give the same degree of multi-user consistency for multiple sites as a single server. Some forms of replication prevent conflicts by allowing data to be updated at only one site. Alternatively, you can design your application so that each participating site works with data that is strictly segregated, or partitioned, from other sites. For example, you might design your order entry system so that a given sales representative has a unique territory code, preventing orders from conflicting with those of other sales representatives.

NOTE
The term immediate guaranteed consistency replaces the term tight consistency used in SQL Server 6.x. The term latent guaranteed consistency replaces the term loose consistency used in SQL Server 6.x.

Methods of Distributing Data with SQL Server

As illustrated in Figure 15.2, the different methods of distributing data provide varying degrees of transactional latency and autonomy. It is important to select the method that best suits your particular business needs and environment.

click to view at full size.

Figure 15.2 Comparison of methods for distributing data

Distributed Transactions

Using distributed transactions guarantees that all sites have the same data at the same time. Microsoft Distributed Transaction Coordinator (MS DTC) facilitates distributed transactions in SQL Server by using a protocol known as two-phase commit to guarantee that a transaction completes at all participating sites at the same time.

Replication

Various types of replication exist for distributing data in SQL Server. They are summarized here and presented in detail in Lesson 3.

  • Transactional replication—Only changed data is distributed. The sequence of transactions is maintained. Conflicts do not arise, because there is only one location where data is changed.
  • Snapshot replication—A picture of the entire current data (changed and unchanged) at a source server replaces data at a destination server on a periodic basis or on demand.
  • Merge replication—Multiple sites make changes to data independently of one another. The changes are periodically merged together at the source server. Conflicts can occur (and be resolved), so this type does not guarantee transactional consistency.

Lesson Summary

Data is commonly distributed in an organization. SQL Server provides support for distributed transactions and replication to support these distributed data needs. Distributed transactions are implemented using a protocol known as two-phase commit that guarantees that a transaction completes at all participating sites at the same time. Replication duplicates and distributes recent copies of data from a source database to a destination database. With replication, distribution of changed data may be delayed.



Microsoft Press - Microsoft SQL Server 7. 0 System Administration Training Kit
Microsoft SQL Server 7.0 System Administration Training Kit
ISBN: 1572318279
EAN: 2147483647
Year: 1999
Pages: 100

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net