This lesson describes the Publisher/Subscriber metaphor, which is the SQL Server model for defining the source and destination of replicated data and the sets of data that will be replicated. It also discusses various types of data filtering.
After this lesson, you will be able to:
- Explain the Publisher/Subscriber metaphor, including articles, publications, and subscriptions
- Explain the process of filtering data for replication
- Describe SQL Server replication agents
Estimated lesson time: 30 minutes
Replication uses a Publisher/Subscriber metaphor for distributing data. In a replication environment, a Publisher sends data and a Subscriber receives data.
A single SQL Server can act as a Publisher, a Distributor, a Subscriber, or any combination of the three for one or more databases at the same time. Figure 7.3 illustrates the Publisher/distributor/subscriber relationship.
Figure 7.3 The Publisher/Distributor/Subscriber relationship
For a video demonstration that discusses SQL Server replication, run the Rep.htm file located in the \Media folder on the Supplemental Course Materials CD-ROM.
A Publisher is a SQL Server that maintains a source database, makes published data from that database available for replication, and sends changes to the published data to the Distributor.
A Subscriber is a SQL Server that receives an initial copy of the replicated data and later receives updates to this data. It is possible to allow the data on a Subscriber to be changed, and a Subscriber can, in turn, be a Publisher to other Subscribers.
The Distributor receives a copy of all changes to the published data, stores the changes, and then makes them available to the appropriate Subscribers. A special system database called the distribution database and a folder called the distribution working folder are created on the Distributor for storing this data and replication configuration information. By default, the distribution folder is C:\Mssql7\Repldata, but this can be changed and other folders can be created. Storing the replicated data in this way makes it possible to forward data to Subscribers at short or long intervals and allows for Subscribers that are not always connected. The Distributor can send changes to Subscribers, or Subscribers can fetch changes from the Distributor.
Although the Publisher and the Distributor can be on the same computer, it is more typical in larger or more active sites to locate the Distributor on its own server. If the Distributor is located on another computer, a complete, separately licensed SQL Server installation is required on that computer. It is also possible for one distribution server to support multiple publication servers.
NOTE
The most important concept in replication is that every replicated data element has only one Publisher. None of the SQL Server replication options uses a so-called multiple master model. This does not mean that data can be modified only on a single server. You will learn how to allow subscribed data to be changed and even published, but in doing so no data element will have more than one Publisher.
In keeping with the Publisher/Subscriber metaphor, the terms publication and article are used to refer to data that is published.
A publication is a collection of articles. The following facts apply to a publication:
An article is the basic unit of replication and represents a single data element that is replicated. Subscribers subscribe to publications, not articles. An article can be
It is possible to publish a subset of a table as an article. To publish a subset of a table, you use a filter to select the rows (horizontal filter) and columns (vertical filter) that are replicated; this is known as filtering data. Filtering data helps to avoid replication conflicts when multiple sites are allowed to update data. You can filter tables vertically, horizontally, or both vertically and horizontally. Each instance of a filtered table is a separate article. Figure 7.4 shows examples of the results of vertical and horizontal filters applied to two tables.
Figure 7.4 Horizontal and vertical filtering
As shown in the upper half of Figure 7.4, a vertical filter contains a subset of the columns in a table. The Subscriber receives only the replicated columns. For example, you might use a vertical filter to publish all but the Salary column from the Employee table. Vertical filtering is similar to specifying only certain columns from a table in a SELECT statement.
Merge replication (described in Lesson 3) does not support vertical filtering.
As shown in the lower half of Figure 7.4, a horizontal filter contains a subset of the rows in a table. The Subscriber receives only the subset of rows. For example, you can publish order records by region to each region. Horizontal filtering is similar to specifying a WHERE clause in a SELECT statement.
An alternative to filtering is to create separate tables. This can be more efficient than filtering, preventing conflicts and simplifying the logical view of the data. For example, instead of storing the sales data from multiple branches in a single table at a head office, you could create a separate table for each branch. The disadvantages to creating separate tables are that applications must deal with the separate tables, and administration can be more complex because if the table structure changes, tables on different servers may have to be altered.
Creating separate tables that use the same schema but contain different rows is usually called partitioning. Creating separate tables containing different columns is less common.
Partitioning Rows
Partitioning rows (horizontal partitioning) involves physically defining a horizontal subset of data as a separate table. For example, you can partition a customer table into separate tables for each region.
Partitioning Columns
Partitioning columns (vertical partitioning) involves physically defining a vertical subset of data as a separate table. For example, you can vertically partition an employee table by placing the name, title, and office number columns in one table and other confidential information, such as birth date and salary information, in another table.
The configuration that defines how a database on a Subscriber will receive a publication from a Publisher is referred to as a subscription.
Two kinds of subscriptions are possible. The type of subscription determines how subscriptions are created and administered and how data is replicated. There can be many subscriptions of both types to a single publication, as shown in Figure 7.5.
Figure 7.5 Push versus pull subscriptions
You can set up subscriptions while creating or administering publications on the Publisher. This is known as a push subscription. Push subscriptions centralize subscription administration in the following ways:
With a push subscription, the Distributor propagates the changes to a Subscriber without a request from the Subscriber to do so. Typically, push subscriptions are used in applications that must send changes to Subscribers as soon as they occur. Push subscriptions are best for applications that require higher security and near-real time updates and in cases in which the higher processor overhead at the Distributor does not affect performance.
For push subscriptions, the replication agents that replicate data to the Subscriber run at the Distributor or the Publisher.
You can also set up a subscription while administering a Subscriber. This is known as a pull subscription. The following are distinguishing characteristics of a pull subscription:
Pull subscriptions are best for applications that need lower security, that need more Subscriber autonomy (such as mobile users), and that need to support a high number of Subscribers (such as Subscribers that use the Internet).
Two kinds of pull subscriptions are available. Standard pull subscriptions are registered on the Publisher. Anonymous pull subscriptions are set up entirely on the Subscriber; no information about the subscription is stored on the Publisher. Anonymous subscriptions are ideal for Subscribers that connect via the Internet. Internet-based Subscribers can connect using the FTP protocol.
For pull subscriptions, the replication agents that replicate data to the Subscriber run at the Subscriber.
TIP
This discussion of subscriptions states that creation and administration of subscriptions is performed on the Publisher for push subscriptions and on the Subscriber for pull subscriptions. This definition does not mean that these tasks must be performed physically at these servers. You can register a Publisher and a Subscriber in SQL Server Enterprise Manager running on a client computer and then manage both push and pull subscriptions by simply selecting the relevant server.
SQL Server replication uses a Publisher/Subscriber metaphor. Data identified for replication is distributed from a Publisher server through a Distribution server to a Subscriber server. Data marked for replication, called a publication, can be horizontally and vertically filtered. A subscription configures a Subscriber to receive data from a Publisher. Subscriptions can be configured from the Publisher, a push subscription, or from the Subscriber, a pull subscription.