Lesson 4: Physical Replication Models | Microsoft SQL Server 7 Data Warehousing Technical Support Training Kit

This lesson presents the physical replication models used in SQL Server, drawing a connection between the replication types (snapshot, transactional, and merge) and the physically implemented replication models. The lesson concludes with examples of each of these models.

After this lesson, you will be able to:

Describe the physical replication models used in SQL Server

Estimated lesson time: 90 minutes

Overview of the Replication Models

The basic physical replication models shown in Figure 7.10 illustrate how the server replication roles can be implemented in replication. These models are discussed in turn in this section.

click to view at full size

Figure 7.10 Physical replication models

The Central Publisher/Distributor Model

In the central Publisher/Distributor model (the SQL Server default), one or two servers are defined as the Publisher/Distributor. The Publisher/Distributor publishes and distributes data to any number of servers that are set up as Subscribers.

The Publisher and the Distributor can reside on one server or on separate servers. In either case, the publication server is the primary owner or source of all replicated data. Typically, the distribution server stores data before the data is replicated to the subscribing servers. Data that is received at subscription sites is intended to be read-only. Administrators must ensure that only the SELECT permission is allowed on Subscriber tables.

If the Publisher and the Distributor are established on separate servers, this model offloads much of the replication work from the Publisher to the Distributor.

The Central Subscriber/Multiple Publishers Model

In the central Subscriber/multiple Publishers model, multiple Publishers replicate data to a single Subscriber. This model addresses the need for consolidating data at a centralized site and providing the local sites with local data only. Because multiple Publishers are writing to the same subscription table, it is important to ensure that all data has a unique local owner so that another Publisher does not overwrite it. You can accomplish this by filtering the data horizontally.

The Multiple Publishers/Multiple Subscribers Model

In the multiple Publishers/multiple Subscribers model, multiple publication servers and multiple subscription servers each potentially plays a dual role. This model is the closest implementation to fully distributed data processing. You must be careful when you design both the schema and update types to ensure that an adequate level of data consistency exists at all sites. Horizontal filtering or partitioning will be necessary to achieve this level.

Combining Replication Models and Types

For each publication, you can use any of the replication models with any of the replication types. The following are some facts about the interaction between the replication model and the replication type:

The replication model is the physical implementation of your replication design. You will spend most of your development time in designing your replication model.

The replication type (snapshot, transactional, or merge) provides the functionality that details how to maintain replicated data. This will be determined by the latency, consistency, and site autonomy requirements of your environment.

Any of the replication models can use any of the replication types. You typically select the model and the type at the same time; one does not determine the other.

A single database can have many publications with differing replication types. For example, in your company database, you might have a publication for inventory information that uses transactional replication with the immediate updating Subscribers option. Another publication that contains a customer list could use merge replication so that all sites can update it.

An Example of the Central Publisher/Remote Distributor Model

The example in this section features a sales analysis system used by a corporate headquarters and regional sales offices.

The Current Process

Currently, sales personnel in European regional sales offices are required to forecast sales prior to the end of each month. One of the primary tools that they use is a sales analysis system that exists at the corporate headquarters in New York. The regional sales offices have reliable, high-speed network connections to London. However, they can connect to the New York headquarters only via dialup Internet connections. For this reason, sales data at the regional offices is often unavailable.

The Business Issue

Regional sales offices cannot accurately forecast sales because they do not have consistent access to current sales data. Cost prohibits installing high-speed network connections between all of the regional offices and New York.

The Proposed Replication Model

Implementing a central Publisher model with snapshot replication can make the sales information available to Subscribers in each region. Publications that horizontally filter data can be created so that each region receives sales data for its customers only. This reduces the amount of data that each Subscriber receives.

A remote Distributor can take advantage of a high-speed network connection between New York and London. The Distributor then sends filtered subscriptions to the regional Subscribers in the Rome, Paris, and Dublin offices. Network connections between New York and the Subscriber locations are unnecessary. London would also be a Subscriber for its own data. Figure 7.11 illustrates this proposed solution.

click to view at full size

Figure 7.11 Example of a central Publisher/remote Distributor

With this solution, users in each region can query their regional SQL Servers for sales data rather than using the SQL Server at corporate headquarters. The subscribed data cannot be updated, and it doesn t need to be. The snapshot can be scheduled to take place daily, weekly, or at some other appropriate interval.

A variation of this solution would be to configure London as a publishing Subscriber. London would subscribe to the data from New York and then publish the data to the other European offices. This would be a good approach if all of the regional offices were receiving all of the data, not only their own filtered data.

An Example of the Central Subscriber/Multiple Publishers Model

The example in this section features an automobile parts manufacturer with four regional warehouses across the country.

The Current Process

Currently, inventory levels fall below reorder levels and are not noticed until the warehouse is unable to completely fill a customer order. All warehouses manage their own inventories. When the quantity for a part falls below the inventory reorder point, the inventory control manager at the warehouse places an order with the factory. This procedure requires manual monitoring and order placing by each warehouse.

The Business Issue

No centralized inventory monitoring exists at the factory.

The Proposed Replication Model

This problem can be solved by implementing a central Subscriber/multiple Publishers model that uses transactional replication to roll up the inventory information from all of the regional sites to a main inventory database at the central factory site. Inventory can be reordered automatically when established reorder points are reached for individual parts in each warehouse. These orders are generated at the central factory site, based on the data that has been replicated from the warehouses.

Because the ID number for the parts is the same in all regions, adding a separate region code column (reg_code) uniquely identifies data from the regions when the data is rolled up at the central site. The region code column is used as part of a composite primary key (the reg_code and id columns) to uniquely identify the parts from each region. Figure 7.12 illustrates this solution.

click to view at full size

Figure 7.12 Example of a central Subscriber with multiple Publishers

With this solution, data does not need to be updated at the factory; each warehouse updates only the rows that it publishes.

An Example of the Multiple Publishers/Multiple Subscribers Model

A pastry company has three shops in different parts of the city. Each shop has a server that publishes its own orders table and subscribes to the orders tables that the other shops publish.

The Current Process

Currently, each shop runs out of certain ingredients on a regular basis. When one shop runs out of a necessary ingredient, it must contact one or both of the other shops to find out whether the ingredient is in stock and then separately arrange to have it ordered.

The Business Issue

Because it is impossible to know the inventory of the other shops until the last minute, situations arise in which all shops are out of the same ingredient at the same time.

The Proposed Replication Model

Implementing a multiple Publishers/multiple Subscribers model that uses transactional replication enables each shop to know immediately if the inventory of the other shops can solve a given problem. Shortages can be anticipated so that each store is able to fill its orders.

As shown in Figure 7.13, Shops A, B, and C each have a copy of the orders table. Each shop is both a Publisher and a Subscriber the server at each shop publishes some rows from the orders table to the other two shops and subscribes to the rows of the orders table published by the other two shops.

click to view at full size

Figure 7.13 Example of multiple Publishers with multiple Subscribers

Each shop has responsibility for only a portion of the data. The orders table is horizontally filtered by area. For example, Shop A is responsible for updating the data in Area 1. For this scenario to work with transactional replication, each shop can update the data only in its area.

Using Replication to Update Warehoused Data

These scenarios present some specific data warehousing architectures that might benefit from using replication to move data between systems. These are not required configurations; they simply present possible uses of replication components in a data warehousing architecture:

Using replication to update data marts

Using replication to stage data for transformation

Using replication to update data warehouses

Using Replication to Update Data Marts

This scenario shows a collection of OLTP applications that feed into the central data warehouse via the Data Transformation Services (DTS). DTS populates the data warehouse on a scheduled basis. As part of the final step in the DTS package, a snapshot replication should be executed to push data down to the data marts. After the data has been refreshed in the data marts, the online analytical processing (OLAP) cubes can be scheduled for rebuilding. Figure 7.14 shows where replication is used in this scenario.

This configuration is best suited for LANs due to the volume of data possibly being moved through the system.

click to view at full size

Figure 7.14 Using replication to update data marts

Using Replication to Stage Data for Transformation

This scenario shows a collection of OLTP applications that feed into an intermediate database for transformation via transactional replication. From the intermediate database, DTS cleans the data and pumps it into the central data warehouse. As part of the final step in the DTS package, a snapshot replication should be executed to push data down to the data marts. After the data has been refreshed in the data marts, the OLAP cubes can be scheduled for rebuilding. Figure 7.15 shows where replication is used in this scenario.

This configuration is best suited for remote OLTP applications and a central data warehouse and OLAP cube. Moving just the transactions into the intermediate database reduces the volume of data being moved from the remote OLTP application into the central data warehouse. From this point, data moves similarly to the previous scenario.

click to view at full size

Figure 7.15 Using replication to stage data for Transformation

Using Replication to Update Data Warehouses

This scenario shows a collection of OLTP applications that feed into local data marts via DTS. From the local data marts, snapshot replication pushes the data into the central data warehouse. After the data has been refreshed in the data warehouse, the OLAP cubes can be scheduled for rebuilding. Figure 7.16 shows where replication is used in this scenario.

This configuration is best suited for remote OLTP applications and remote data marts with a central data warehouse and OLAP cube. This configuration depends on a reliable WAN connection.

click to view at full size

Figure 7.16 Using replication to update data warehouses

Selecting a Data Population Mechanism

You have a number of choices for populating your data warehouse and data marts. The following sections help you select the appropriate replication or DTS solution for various data warehousing environments.

Use Transactional Replication with OLTP Servers

Transactional replication is often the best choice when replicating data from OLTP servers, because the Publisher sends a steady stream of inserts and updates as they occur on the OLTP server. The process of replication uses only a small amount of the processing time of the OLTP server and a small amount of network bandwidth for each transaction.

Snapshot replication is a good choice when the OLTP server does not have a reliable or fast connection to the Subscriber or when the OLTP server has no processor time to spend on replication. Snapshot replication is typically used to replicate an entire table, but you can create a horizontally partitioned article to limit the rows sent to the Subscriber and prevent overloading the Distributor with images of large databases.

TIP
In combining the execution of stored procedures and transactional replication, it may be more efficient to replicate the execution of the stored procedure rather than the operations that the stored procedure performed on the published tables.

Use Snapshot Replication When the Data Is Static

When the data is static, snapshot replication is a good choice. For example, the information in the data warehouse is usually considered unchangeable. Transactional replication works only if someone is inserting, updating, or deleting rows in the table. Because those operations are impossible in a read-only database, snapshot replication is the only method of replication for moving data out of the data warehouse.

NOTE
You can set the read-only database option so that SQL Server does not allow the database to be modified. Setting the read-only option also improves performance because almost no locking is necessary in the database. Alternatively, if you consider a database to be read-only but do not set the read-only database option, you will have to use security and programmatic measures to ensure that the database is not modified.

Use Snapshot Replication with Aggregate Data

Because there is no need for transactional consistency when summarizing information, snapshot replication is a good choice for replicating aggregate data. You can perform complex joins and apply aggregate functions only once and then distribute the results to Subscribers.

Use DTS to Convert to a Different Schema

If you must recombine data into different schemas, you can replicate to the Subscriber and then use DTS to convert the data into a new format. For example, you could generate a cross-join between the Customer table and the Invoice table to create a table that contains a customer s city, state, and country information as well as the part number, quantity, and cost of each item ordered.

Use DTS to Import Heterogeneous Data

Use the DTS Import wizard to create a DTS package that imports and transforms data from database management systems other than SQL Server 7.0. Once the data is imported into a SQL Server database, you can use snapshot replication to distribute the data to other servers.

Replication Exercises

In the replication exercises, your server will be the Publisher, Distributor, and Subscriber. You will create publications in the Northwind database (representing the Publisher) and subscriptions in other databases or tables (representing the Subscribers).

Exercise 1: Enabling Publishing and Distribution and Creating a Publication

In this exercise, you will enable publishing and distribution and create a new publication of the Products table in the Northwind database. You do not actually do anything to enable publishing and distribution; it is enabled automatically by the Create Publication wizard when you create the publication.

To enable publishing and distribution and create a publication

In the console tree of SQL Server Enterprise Manager, click your server name.

On the Tools menu, point to Replication, and then click Create and Manage Publications.

In the Create and Manage Publications dialog box, click Northwind, and then click Create Publication.

Use the Create Publication wizard and the information in the following table to create your publication. Accept defaults for options not specified. When you reach the last window of the wizard, click Finish to close the wizard and create the publication.

Option	Value
Distributor	Your server
Publication type	Transactional publication
Allow immediate-updating subscriptions	No
Subscriber types	All Subscribers will be servers running SQL Server
Articles for publication	Check dbo.Products, click the ellipsis to set the Article name and the Destination table name
Article name	ReplProducts_Article
Destination table name	ReplProducts Click OK to return to the wizard
Publication name	Northwind_Products_Publication
No, create a publication without data filters and with the following properties	Selected

If you are prompted to have the SQL Server Agent start automatically, click Yes. You will see a dialog box that informs you that the Replication Monitor has been added to the console tree; click Close to close this dialog box.

After the publication has been created, click Close to close the Create and Manage Publications dialog box.

On the Tools menu, point to Replication, and then click Configure Publishing, Subscribers and Distribution.

In the Publisher and Distributor Properties dialog box, on the Distributor tab, select distribution in Databases, and then click Properties.

In the Store the transactions, At least option, type 24 to have the Distributor store the transactions for at least 24 hours.

Click OK twice to close the Distribution Properties and the Publisher and Distributor Properties dialog boxes.

In the console tree, expand your server, and then expand Replication Monitor.

If prompted, click Yes to turn polling on. Click OK to save the refresh and rate settings.

Expand Publishers, expand your server, and then click Northwind_Products_Publication:Northwind. The status of the Snapshot and Log Reader Agents for your new publication is shown in the Details pane.

Exercise 2: Creating a New Database

In this exercise, you will create a new database that will be used to create a pull subscription in the next exercise.

To create a new database

In the console tree, right-click Databases, and then click New Database.

In Name, type PullSubs, and then click OK to close the Database Properties dialog box and create the new database.

Exercise 3: Creating a Subscription

In this exercise, you will create a subscription to pull the publication that you published in a previous exercise.

To create a subscription

Click your server. On the Tools menu, point to Replication, and then click Pull Subscription to SQLSERVER (or your server name, if different).

Select PullSubs, and then click Pull New Subscription.

Use the Pull Subscription wizard and the information in the following table to pull Northwind_Products_Publication from the Publisher. When you reach the last window of the wizard, click Finish to close the wizard and create the pull subscription.

Option	Value
Choose Publication	Expand your server, then select Northwind_Products_Publication: Northwind
Destination Database	PullSubs
Initialize Subscription	Yes, initialize the schema and data at the Subscriber
Distribution Agent Schedule	Continuously
Start Required Services	SQLServerAgent (on SQLSERVER)

When you have completed all of the steps in the wizard, close the Pull Subscription dialog box.

After the subscription has been created, in the console tree expand your server, expand Management, expand SQL Server Agent, and then click Jobs.

The job called SQLSERVER-Northwind-Northwind_Products_Pub-SQLSERVER-PullSubs- 0, with a category of REPL-Distribution is the job that starts the Distribution Agent for your new subscription.

Which server s system resources will this job consume? Would a push subscription change the resources being used?

Answer

In the console tree, expand Replication Monitor, expand Agents, and then click Distribution Agents. The agent listed in the Details pane is the Distribution Agent for you new subscription being monitored by the Replication Monitor. Note that the Last action for the agent says, "The initial snapshot for article ReplProducts_Article is not yet available." This is because the Snapshot Agent has not yet been run. It is scheduled to run, by default only, at 11:30 p.m. each day. In the next exercise, you will start the Snapshot Agent manually.

Exercise 4: Running the Snapshot Agent

In this exercise, you will start the Snapshot Agent to create a snapshot of the data and schema of the publication. Once the snapshot has been created, the Distribution Agent will use it to create the ReplProducts table and populate it in the PullSubs database.

To run the Snapshot Agent

In the console tree, expand Replication Monitor, expand Publishers, expand your server, and then click Northwind_Products_Publication:Northwind.

In the Details pane, right-click Snapshot, and then click Start.

Wait for the Status column to say, "Succeeded," indicating that the Snapshot Agent has generated a snapshot successfully.

Right-click Snapshot, and then click Agent History. In the history list, there will be a single entry indicating the successful generation of a snapshot of one article.

Click Session Details. Review the list of actions that were performed when the snapshot was generated. Click Close twice to close the Snapshot Agent History dialog boxes.

Use Windows Explorer to review the folders and files that were created below C:\Mssql7\Repldata\Unc. Note the ReplProducts_Article.bcp, ReplProducts_Article.idx, and ReplProducts_Article.sch files that the Snapshot Agent created and that were referenced in the session details you reviewed in the previous step.

With Notepad, open the ReplProducts_Article.sch file and review the script that was generated. This script contains the table schema creation statements.

With Notepad, open the ReplProducts_Article.idx file and review the script that was generated. This script contains index creation statements.

Switch to SQL Server Enterprise Manager. In the console tree, expand Replication Monitor, expand Agents, and then click Distribution Agents.

View the agent history and action messages by right-clicking the agent in the Details pane and selecting Agent History. The Distribution Agent will say that there are no replicated transactions available because no new transactions have occurred since it replicated the snapshot. Click Close.

Open SQL Server Query Analyzer. Verify that the article was published to the ReplProducts table by executing the following statement:

 USE PublSubs SELECT * FROM ReplProducts

Exercise 5: Exploring Changes Due to Replication

In this exercise, you will use SQL Server Enterprise Manager to see some of the changes that are made to a server when replication is enabled.

To explore some of the changes after replication is enabled

Right-click your server name in the console tree, and click Disconnect. Many of the steps in this exercise require data to be refreshed in SQL Server Enterprise Manager; by disconnecting before you begin, you force all of the information to be refreshed.

In the console tree, expand your server, then expand Databases. Note that the distribution database has been added to the list of databases.

If distribution (and other system databases such as master) is not listed, right-click on your server and click Edit SQL Server Registration properties. On the property sheet, make sure that the Show system databases and system objects option is checked.

You must not delete the distribution database or add objects to or remove objects from the distribution database. This database is used to store replication information and replication data from the Publisher that is to be sent to or retrieved by the Subscribers.

Notice that the Northwind database icon has a hand below it, indicating that it has been published. Expand the Northwind database. A node called Publications has been added.

Expand Publications, and then click Northwind_Products_Publication. In the Details pane, all subscriptions to the selected publication are listed.

Expand the PullSubs database. Notice that a node called Pull Subscriptions has been added.

Click Pull Subscriptions. In the Details pane, the subscription to Northwind_Products_Publication is listed.

In the Details pane, right-click the subscription and then click Properties. Review the properties of your pull subscription.

Click Cancel to close the Pull Subscription Properties dialog box.

Exercise 6: Updating the Published Article

In this exercise, you will update the Products table (on the Publisher) and verify that the updated information is replicated to the Subscriber.

To update the published article

In SQL Server Query Analyzer, type and execute the following:

 USE Northwind SELECT * FROM Products WHERE ProductID = 1

Review the result set. Make a note of the ReorderLevel for the product (it should be 10).

Switch to SQL Server Enterprise Manager, then expand Replication Monitor. Expand Agents, and then click Log Reader Agents. You must now switch to SQL Server Query Analyzer and type and execute the following script.

 USE Northwind UPDATE Products SET ReorderLevel = 20 WHERE ProductID = 1 SELECT * FROM Products

Immediately after executing the script, switch back to SQL Server Enterprise Manager and watch the Status of the Log Reader Agent.

Within a few moments, the Status of the Log Reader Agent entry should change to Running and the Last Action should indicate that one transaction with one command was delivered. In the console tree, click Distribution Agents. Within a few moments, the status of the Distribution Agent entry should change to Running, and the Last Action should indicate that one transaction with one command was delivered.

In the console tree, click Log Reader Agents. Right-click SQLSERVER (or your server name, if different), and then click Agent History. Click Session Details to see the details for the current session. You may still see the action indicating that one transaction with one command was delivered; otherwise, the action will now indicate that no replicated transactions are available. The detail above the action messages includes various statistics for the current session for the Log Reader Agent.

In the console tree, click Distribution Agents. Right-click the publication in the Details pane, and then click Agent History. Click Session Details to see the details for the current session. You may still see the action indicating that one transaction with one command was delivered; otherwise, the action will now indicate that no replicated transactions are available. The detail above the action messages includes various statistics for the current session for the Distribution Agent.

If you want to see the agents running again, execute the query from Step 3 with a different reorder level. You must use a different reorder level or the Distribution Agent will not run, because the Log Reader will detect that no data has actually changed.

Switch to SQL Server Query Analyzer. Execute the following statement to view the reorder level in the ReplProducts table in the PullSubs database and verify that the data was replicated:

 USE PublSubs SELECT * FROM Products WHERE ProductID = 1

Lesson Summary

Along with the three different types of replication discussed in Lesson 3, "SQL Server Replication Types," you can use many different physical models when you implement replication. The primary models are the central Publisher/Distributor model, the central Subscriber/multiple Publishers model, and the multiple Publishers/multiple Subscribers model. These models can be mixed or combined. When implementing replication, you can also choose whether to place the Publisher and the Distributor on the same computer or on different computers.

Replication can be used in a number of ways to move data in a data warehouse environment. These ways include updating data marts, staging data for transformation, and updating data warehouses.