You must consider many factors when choosing a method to distribute data. Your business requirements will determine which is the right method for you. In general, you will need to understand the timing and latency of your data, its independence at each site, and your specific need to filter or partition the data.
Autonomy, Timing, and Latency of Data
Distributed data implementations can be built using several facilities in Microsoft SQL Server: Data Transformation Services (DTS), the Distributed Transaction Coordinator (DTC), and data replication. The key is to match the right facility to the type of data distribution you need.
In some applications, such as online transaction processing and inventory control systems, data must be synchronized at all times. This requirement, called immediate transactional consistency, was known as tight consistency in previous versions of SQL Server.
SQL Server implements immediate transactional consistency in the form of two-phase commit processing. A two-phase commit, sometimes known as 2PC, ensures that a transaction is committed on all servers or rolled back on all servers, so that the data on every server is 100 percent in sync at all times. One of the main drawbacks of immediate transactional consistency is that it requires a high-speed LAN to work. This type of solution might not be feasible for large environments with many servers, where occasional network outages are inevitable. These types of implementations can be built with DTC and DTS.
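The commit-everywhere-or-rollback-everywhere behavior can be sketched in a few lines of Python. This is only an illustration of the 2PC protocol itself; the class and function names are invented for the example and bear no relation to the actual DTC programming interface.

```python
# Illustrative sketch of two-phase commit (2PC); not the DTC API.

class Participant:
    """One server taking part in the distributed transaction."""

    def __init__(self, name, will_succeed=True):
        self.name = name
        self.will_succeed = will_succeed  # simulates a vote outcome
        self.state = "idle"

    def prepare(self):
        # Phase 1: the server votes on whether it can commit.
        self.state = "prepared" if self.will_succeed else "aborted"
        return self.will_succeed

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


def two_phase_commit(participants):
    # Phase 1 (prepare): every participant must vote yes.
    if all(p.prepare() for p in participants):
        # Phase 2 (commit): the transaction is applied everywhere.
        for p in participants:
            p.commit()
        return True
    # A single "no" vote rolls the transaction back on all servers.
    for p in participants:
        p.rollback()
    return False
```

Note the all-or-nothing property: one unavailable or failing participant forces a rollback on every server, which is exactly why 2PC sacrifices site autonomy for consistency.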
In other applications, such as decision support and report generation systems, 100 percent data synchronization all of the time is not as important. This requirement, called latent transactional consistency, was known as loose consistency in previous versions of SQL Server.
Latent transactional consistency is implemented in SQL Server via data replication. Replication allows data to be updated on all servers, but not simultaneously. The result is known as "real-enough time" data, or latent transactional consistency, because a lag exists between an update on the main server and the replicated copies. In this scenario, if you could stop all data modifications on all servers, every server would eventually have the same data. Unlike the immediate transactional consistency model, replication works over both LANs and WANs, across slow or fast links.
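The lag-then-converge behavior can be sketched as follows. The Publisher and Subscriber classes and the distribute function are illustrative assumptions for the example, not SQL Server's actual replication agents.

```python
# Illustrative sketch of latent transactional consistency via
# replication; not SQL Server's replication machinery.

class Publisher:
    def __init__(self):
        self.data = {}
        self.pending = []  # changes not yet sent to subscribers

    def update(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))


class Subscriber:
    def __init__(self):
        self.data = {}


def distribute(publisher, subscribers):
    # Runs periodically, possibly over a slow WAN link. Until it
    # runs, subscribers lag behind the publisher (latency).
    for key, value in publisher.pending:
        for sub in subscribers:
            sub.data[key] = value
    publisher.pending.clear()
```

Between runs of distribute, a subscriber's data is stale; once updates stop and distribution catches up, all servers hold the same data, which is the latent-consistency guarantee.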
When planning a distributed application, you must consider the effect of one site's operation on another, known as site autonomy. A site with complete autonomy can continue to function without being connected to any other site; a site with no autonomy cannot function unless it is connected to all other sites. For example, applications that use two-phase commits rely on every other site being able to accept changes immediately. If any one site is unavailable, no transactions can be committed on any server. In contrast, sites using merge replication can be completely disconnected from all other sites and continue to work effectively, although data consistency is not guaranteed while they are disconnected. Some solutions combine both high data consistency and site autonomy.
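The autonomous-then-reconcile pattern of merge-style replication can be sketched with a simple merge step. The timestamp representation and the last-writer-wins rule here are assumptions made for the example; SQL Server merge replication uses its own conflict-resolution mechanisms.

```python
# Illustrative sketch of merging changes from two autonomous sites.
# Each site maps key -> (value, timestamp); last writer wins.
# This is an assumed conflict policy, not SQL Server's resolver.

def merge(site_a, site_b):
    merged = dict(site_a)
    for key, (value, ts) in site_b.items():
        # Take site_b's version only if it is newer (or absent in a).
        if key not in merged or ts > merged[key][1]:
            merged[key] = (value, ts)
    return merged
```

Each site accepts updates while disconnected, so autonomy is preserved; the cost is that conflicting updates must be resolved at merge time rather than prevented up front.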
Methods of Data Distribution
After you have determined the amount of transactional latency and site autonomy your business requirements allow, select the data distribution method that matches; each method offers a different combination of site autonomy and latency. With distributed data systems, you can choose from several methods: