This lesson presents the physical replication models used in SQL Server, drawing a connection between the replication types (snapshot, transactional, and merge) and the physically implemented replication models. The lesson concludes with examples of each of these models.
After this lesson, you will be able to:
- Describe the physical replication models used in SQL Server
Estimated lesson time: 90 minutes
The basic physical replication models shown in Figure 7.10 illustrate how the server replication roles can be implemented in replication. These models are discussed in turn in this section.
Figure 7.10 Physical replication models
In the central Publisher/Distributor model (the SQL Server default), one or two servers are defined as the Publisher/Distributor. The Publisher/Distributor publishes and distributes data to any number of servers that are set up as Subscribers.
The Publisher and the Distributor can reside on one server or on separate servers. In either case, the publication server is the primary owner or source of all replicated data. Typically, the distribution server stores data before the data is replicated to the subscribing servers. Data that is received at subscription sites is intended to be read-only. Administrators must ensure that only the SELECT permission is allowed on Subscriber tables.
If the Publisher and the Distributor are established on separate servers, this model offloads much of the replication work from the Publisher to the Distributor.
In the central Subscriber/multiple Publishers model, multiple Publishers replicate data to a single Subscriber. This model addresses the need for consolidating data at a centralized site and providing the local sites with local data only. Because multiple Publishers are writing to the same subscription table, it is important to ensure that all data has a unique local owner so that another Publisher does not overwrite it. You can accomplish this by filtering the data horizontally.
In the multiple Publishers/multiple Subscribers model, multiple publication servers and multiple subscription servers each potentially plays a dual role. This model is the closest implementation to fully distributed data processing. You must be careful when you design both the schema and update types to ensure that an adequate level of data consistency exists at all sites. Horizontal filtering or partitioning will be necessary to achieve this level.
For each publication, you can use any of the replication models with any of the replication types. The following are some facts about the interaction between the replication model and the replication type:
A single database can have many publications with differing replication types. For example, in your company database, you might have a publication for inventory information that uses transactional replication with the immediate updating Subscribers option. Another publication that contains a customer list could use merge replication so that all sites can update it.
The example in this section features a sales analysis system used by a corporate headquarters and regional sales offices.
The Current Process
Currently, sales personnel in European regional sales offices are required to forecast sales prior to the end of each month. One of the primary tools that they use is a sales analysis system that exists at the corporate headquarters in New York. The regional sales offices have reliable, high-speed network connections to London. However, they can connect to the New York headquarters only via dialup Internet connections. For this reason, sales data at the regional offices is often unavailable.
The Business Issue
Regional sales offices cannot accurately forecast sales because they do not have consistent access to current sales data. Cost prohibits installing high-speed network connections between all of the regional offices and New York.
The Proposed Replication Model
Implementing a central Publisher model with snapshot replication can make the sales information available to Subscribers in each region. Publications that horizontally filter data can be created so that each region receives sales data for its customers only. This reduces the amount of data that each Subscriber receives.
A remote Distributor can take advantage of a high-speed network connection between New York and London. The Distributor then sends filtered subscriptions to the regional Subscribers in the Rome, Paris, and Dublin offices. Network connections between New York and the Subscriber locations are unnecessary. London would also be a Subscriber for its own data. Figure 7.11 illustrates this proposed solution.
Figure 7.11 Example of a central Publisher/remote Distributor
With this solution, users in each region can query their regional SQL Servers for sales data rather than using the SQL Server at corporate headquarters. The subscribed data cannot be updated, and it doesn t need to be. The snapshot can be scheduled to take place daily, weekly, or at some other appropriate interval.
A variation of this solution would be to configure London as a publishing Subscriber. London would subscribe to the data from New York and then publish the data to the other European offices. This would be a good approach if all of the regional offices were receiving all of the data, not only their own filtered data.
The example in this section features an automobile parts manufacturer with four regional warehouses across the country.
The Current Process
Currently, inventory levels fall below reorder levels and are not noticed until the warehouse is unable to completely fill a customer order. All warehouses manage their own inventories. When the quantity for a part falls below the inventory reorder point, the inventory control manager at the warehouse places an order with the factory. This procedure requires manual monitoring and order placing by each warehouse.
The Business Issue
No centralized inventory monitoring exists at the factory.
The Proposed Replication Model
This problem can be solved by implementing a central Subscriber/multiple Publishers model that uses transactional replication to roll up the inventory information from all of the regional sites to a main inventory database at the central factory site. Inventory can be reordered automatically when established reorder points are reached for individual parts in each warehouse. These orders are generated at the central factory site, based on the data that has been replicated from the warehouses.
Because the ID number for the parts is the same in all regions, adding a separate region code column (reg_code) uniquely identifies data from the regions when the data is rolled up at the central site. The region code column is used as part of a composite primary key (the reg_code and id columns) to uniquely identify the parts from each region. Figure 7.12 illustrates this solution.
Figure 7.12 Example of a central Subscriber with multiple Publishers
With this solution, data does not need to be updated at the factory; each warehouse updates only the rows that it publishes.
A pastry company has three shops in different parts of the city. Each shop has a server that publishes its own orders table and subscribes to the orders tables that the other shops publish.
The Current Process
Currently, each shop runs out of certain ingredients on a regular basis. When one shop runs out of a necessary ingredient, it must contact one or both of the other shops to find out whether the ingredient is in stock and then separately arrange to have it ordered.
The Business Issue
Because it is impossible to know the inventory of the other shops until the last minute, situations arise in which all shops are out of the same ingredient at the same time.
The Proposed Replication Model
Implementing a multiple Publishers/multiple Subscribers model that uses transactional replication enables each shop to know immediately if the inventory of the other shops can solve a given problem. Shortages can be anticipated so that each store is able to fill its orders.
As shown in Figure 7.13, Shops A, B, and C each have a copy of the orders table. Each shop is both a Publisher and a Subscriber the server at each shop publishes some rows from the orders table to the other two shops and subscribes to the rows of the orders table published by the other two shops.
Figure 7.13 Example of multiple Publishers with multiple Subscribers
Each shop has responsibility for only a portion of the data. The orders table is horizontally filtered by area. For example, Shop A is responsible for updating the data in Area 1. For this scenario to work with transactional replication, each shop can update the data only in its area.
These scenarios present some specific data warehousing architectures that might benefit from using replication to move data between systems. These are not required configurations; they simply present possible uses of replication components in a data warehousing architecture:
This scenario shows a collection of OLTP applications that feed into the central data warehouse via the Data Transformation Services (DTS). DTS populates the data warehouse on a scheduled basis. As part of the final step in the DTS package, a snapshot replication should be executed to push data down to the data marts. After the data has been refreshed in the data marts, the online analytical processing (OLAP) cubes can be scheduled for rebuilding. Figure 7.14 shows where replication is used in this scenario.
This configuration is best suited for LANs due to the volume of data possibly being moved through the system.
Figure 7.14 Using replication to update data marts
This scenario shows a collection of OLTP applications that feed into an intermediate database for transformation via transactional replication. From the intermediate database, DTS cleans the data and pumps it into the central data warehouse. As part of the final step in the DTS package, a snapshot replication should be executed to push data down to the data marts. After the data has been refreshed in the data marts, the OLAP cubes can be scheduled for rebuilding. Figure 7.15 shows where replication is used in this scenario.
This configuration is best suited for remote OLTP applications and a central data warehouse and OLAP cube. Moving just the transactions into the intermediate database reduces the volume of data being moved from the remote OLTP application into the central data warehouse. From this point, data moves similarly to the previous scenario.
Figure 7.15 Using replication to stage data for Transformation
This scenario shows a collection of OLTP applications that feed into local data marts via DTS. From the local data marts, snapshot replication pushes the data into the central data warehouse. After the data has been refreshed in the data warehouse, the OLAP cubes can be scheduled for rebuilding. Figure 7.16 shows where replication is used in this scenario.
This configuration is best suited for remote OLTP applications and remote data marts with a central data warehouse and OLAP cube. This configuration depends on a reliable WAN connection.
Figure 7.16 Using replication to update data warehouses
You have a number of choices for populating your data warehouse and data marts. The following sections help you select the appropriate replication or DTS solution for various data warehousing environments.
Transactional replication is often the best choice when replicating data from OLTP servers, because the Publisher sends a steady stream of inserts and updates as they occur on the OLTP server. The process of replication uses only a small amount of the processing time of the OLTP server and a small amount of network bandwidth for each transaction.
Snapshot replication is a good choice when the OLTP server does not have a reliable or fast connection to the Subscriber or when the OLTP server has no processor time to spend on replication. Snapshot replication is typically used to replicate an entire table, but you can create a horizontally partitioned article to limit the rows sent to the Subscriber and prevent overloading the Distributor with images of large databases.
TIP
In combining the execution of stored procedures and transactional replication, it may be more efficient to replicate the execution of the stored procedure rather than the operations that the stored procedure performed on the published tables.
When the data is static, snapshot replication is a good choice. For example, the information in the data warehouse is usually considered unchangeable. Transactional replication works only if someone is inserting, updating, or deleting rows in the table. Because those operations are impossible in a read-only database, snapshot replication is the only method of replication for moving data out of the data warehouse.
NOTE
You can set the read-only database option so that SQL Server does not allow the database to be modified. Setting the read-only option also improves performance because almost no locking is necessary in the database. Alternatively, if you consider a database to be read-only but do not set the read-only database option, you will have to use security and programmatic measures to ensure that the database is not modified.
Because there is no need for transactional consistency when summarizing information, snapshot replication is a good choice for replicating aggregate data. You can perform complex joins and apply aggregate functions only once and then distribute the results to Subscribers.
If you must recombine data into different schemas, you can replicate to the Subscriber and then use DTS to convert the data into a new format. For example, you could generate a cross-join between the Customer table and the Invoice table to create a table that contains a customer s city, state, and country information as well as the part number, quantity, and cost of each item ordered.
Use the DTS Import wizard to create a DTS package that imports and transforms data from database management systems other than SQL Server 7.0. Once the data is imported into a SQL Server database, you can use snapshot replication to distribute the data to other servers.
In the replication exercises, your server will be the Publisher, Distributor, and Subscriber. You will create publications in the Northwind database (representing the Publisher) and subscriptions in other databases or tables (representing the Subscribers).
In this exercise, you will enable publishing and distribution and create a new publication of the Products table in the Northwind database. You do not actually do anything to enable publishing and distribution; it is enabled automatically by the Create Publication wizard when you create the publication.
Option | Value |
---|---|
Distributor | Your server |
Publication type | Transactional publication |
Allow immediate-updating subscriptions | No |
Subscriber types | All Subscribers will be servers running SQL Server |
Articles for publication | Check dbo.Products, click the ellipsis to set the Article name and the Destination table name |
Article name | ReplProducts_Article |
Destination table name | ReplProducts Click OK to return to the wizard |
Publication name | Northwind_Products_Publication |
No, create a publication without data filters and with the following properties | Selected |
In this exercise, you will create a new database that will be used to create a pull subscription in the next exercise.
In this exercise, you will create a subscription to pull the publication that you published in a previous exercise.
Option | Value |
---|---|
Choose Publication | Expand your server, then select Northwind_Products_Publication: Northwind |
Destination Database | PullSubs |
Initialize Subscription | Yes, initialize the schema and data at the Subscriber |
Distribution Agent Schedule | Continuously |
Start Required Services | SQLServerAgent (on SQLSERVER) |
The job called SQLSERVER-Northwind-Northwind_Products_Pub-SQLSERVER-PullSubs- 0, with a category of REPL-Distribution is the job that starts the Distribution Agent for your new subscription.
Which server s system resources will this job consume? Would a push subscription change the resources being used?
Answer
In this exercise, you will start the Snapshot Agent to create a snapshot of the data and schema of the publication. Once the snapshot has been created, the Distribution Agent will use it to create the ReplProducts table and populate it in the PullSubs database.
View the agent history and action messages by right-clicking the agent in the Details pane and selecting Agent History. The Distribution Agent will say that there are no replicated transactions available because no new transactions have occurred since it replicated the snapshot. Click Close.
USE PublSubs SELECT * FROM ReplProducts |
In this exercise, you will use SQL Server Enterprise Manager to see some of the changes that are made to a server when replication is enabled.
If distribution (and other system databases such as master) is not listed, right-click on your server and click Edit SQL Server Registration properties. On the property sheet, make sure that the Show system databases and system objects option is checked.
You must not delete the distribution database or add objects to or remove objects from the distribution database. This database is used to store replication information and replication data from the Publisher that is to be sent to or retrieved by the Subscribers.
In this exercise, you will update the Products table (on the Publisher) and verify that the updated information is replicated to the Subscriber.
USE Northwind SELECT * FROM Products WHERE ProductID = 1 |
USE Northwind UPDATE Products SET ReorderLevel = 20 WHERE ProductID = 1 SELECT * FROM Products |
Immediately after executing the script, switch back to SQL Server Enterprise Manager and watch the Status of the Log Reader Agent.
Within a few moments, the Status of the Log Reader Agent entry should change to Running and the Last Action should indicate that one transaction with one command was delivered. In the console tree, click Distribution Agents. Within a few moments, the status of the Distribution Agent entry should change to Running, and the Last Action should indicate that one transaction with one command was delivered.
USE PublSubs SELECT * FROM Products WHERE ProductID = 1 |
Along with the three different types of replication discussed in Lesson 3, "SQL Server Replication Types," you can use many different physical models when you implement replication. The primary models are the central Publisher/Distributor model, the central Subscriber/multiple Publishers model, and the multiple Publishers/multiple Subscribers model. These models can be mixed or combined. When implementing replication, you can also choose whether to place the Publisher and the Distributor on the same computer or on different computers.
Replication can be used in a number of ways to move data in a data warehouse environment. These ways include updating data marts, staging data for transformation, and updating data warehouses.