3 4
In this section, you'll learn about basic database replication concepts. We'll look at the publish-and-subscribe metaphor, the three types of replication, replication data, data propagation, and replication agents.
Microsoft SQL Server 2000 replication is based on the publish-and-subscribe metaphor first used to implement replication in SQL Server 6. This metaphor consists of three main concepts: publishers, distributors, and subscribers. A publisher is a database system that makes data available for replication. A distributor is the database system that contains the distribution database, or pseudodata, used to maintain and manage the replication. A subscriber is a database system that receives replicated data and stores the replicated database.
The publisher consists of a Microsoft Windows system hosting a SQL Server database. This database provides data to be replicated to other systems. In addition, the SQL Server database keeps track of which data has changed so that it can be effectively replicated. The publisher also maintains information about which data is configured for replication. Depending on the type of replication that is chosen, the publisher does some work or little work during the replication process. This will be explained in further detail later in this chapter.
A replicated environment can contain multiple subscribers, but any given set of data that is configured for replication, called an article, can have only one publisher. (Articles are described in more detail in the section "Replication Data" later in this chapter.) Having only one publisher for a particular set of data does not mean that the publisher is the only component that can modify the data—the subscriber can also modify and even republish the data. However, this can get a little tricky, as you will see in this chapter and the next two chapters.
In addition to containing the distribution database, servers acting as distributors store metadata, history data, and other information. In many cases, the distributor is also responsible for distributing the replication data to subscribers. The publisher and the distributor are not required to be on the same server. In fact, you will likely use a dedicated server as a distributor. Each publisher must be assigned a distributor when it is created, and a publisher can have only one distributor. This arrangement is described in more detail in the section "Configuring Publishing and Distribution" later in this chapter.
NOTE
Metadata is data about data. Metadata is used in replication to keep track of the state of replication operations. It is also the data that is propagated by the distributor to other members of the replication set and includes information about the structure of data and the properties of data, such as the type of data in a column (numeric, text, and so on) or the length of a column.
As mentioned, subscribers are the database servers that store the replicated data and receive updates. Subscribers can also make updates and serve as publishers to other systems. For a subscriber to receive replicated data, it must subscribe to that data. Subscribing to replication involves configuring the subscriber to receive that data. A subscription is the database information to which you are subscribing. You'll learn more about the relationships between the components in the section "Configuring Snapshot Replication" later in this chapter.
SQL Server offers three types of replication: snapshot, transactional, and merge. These replication types offer varying degrees of data consistency within the replicated database, and they require different levels of overhead.
Snapshot replication is the simplest replication type. With snapshot replication, a picture, or snapshot, of the database is taken periodically and propagated to subscribers. The main advantage of snapshot replication is that it does not involve continuous overhead on publishers and subscribers. That is, it does not require publishers to continuously monitor data changes, and it doesn't require the continuous transmission of data to subscribers. The main disadvantage is that the database on a subscriber is only as current as the last snapshot.
In many cases, as you will see later in this chapter, snapshot replication is sufficient and appropriate—for example, when source data is modified only occasionally. Information such as phone lists, price lists, and item descriptions can easily be handled by using snapshot replication. These lists can be updated once per day during off hours.
Transactional replication can be used to replicate changes to the database. With transactional replication, any changes made to articles (a set of data configured for replication) are immediately captured from the transaction log and propagated to the distributors. Using transactional replication, you can keep a publisher and its subscribers in almost exactly the same state, depending on how you configure the replication.
Transactional replication should be used when it is important to keep all of the replicated systems current. Transactional replication uses more system overhead than snapshot replication because it individually applies each transaction that changes data in the system to the replicated systems. However, transactional replication keeps the systems in a more up-to-date state than do snapshot and merge replication. Transactional replication is covered in detail in Chapter 27.
Merge replication is similar to transactional replication in that it keeps track of the changes made to articles. However, instead of individually propagating transactions that make changes, merge replication periodically transmits a batch of changes. Because merge replication transmits data in batches, it is also similar to snapshot replication. (Although with snapshot replication, all data configured for replication, not just changes, is propagated.) Merge replication is covered in detail in Chapter 28.
You group data to be replicated in an object called a publication. A publication is made up of one or more articles. Let's look at articles and publications in more detail.
As mentioned, an article is an individual set of data that is to be replicated. An article can be an entire table, a subset of a table consisting of certain columns or rows, or a stored procedure. You create subsets by using filters. A filter that is used to create a subset consisting of rows is called a horizontal filter. A filter that is used to create a subset consisting of columns is known as a vertical filter. Horizontal and vertical filters are covered in more detail later in this chapter.
NOTE
Filters are defined and work differently in merge replication than in snapshot and transactional replication. This is covered in detail in this chapter and in the next two chapters.
A publication is a set of articles grouped together as a unit. Publications provide the means to replicate a logical grouping of articles as one replication object. For example, you can create a publication to be used to replicate a database consisting of multiple tables, each of which is defined as an article. It is more efficient to replicate a database by replicating the entire database in one publication than by replicating tables individually.
A publication can consist of a single article, but it almost always contains more than one article. However, a subscriber can subscribe only to publications, not to articles. Therefore, if you want to subscribe to a single article, you must configure a publication that contains only that article and then subscribe to that publication.
Replicated data can be propagated in a number of ways. All propagation methods are based on either push subscriptions or pull subscriptions. A subscriber can support a mixture of push and pull subscriptions simultaneously.
If you set up a push subscription, the distributor is responsible for providing updates to the subscribers. Updates are initiated without any request from the subscriber. A push subscription is useful when centralized administration is desired because the distributor, rather than multiple subscribers, controls and administers replication. In other words, the initiation and the scheduling of the replication are handled on the distributor.
Push subscriptions allow you a lot of flexibility in scheduling replication. Push subscriptions can be configured to propagate changes immediately after they are made or to perform updates on a regular schedule. You'll learn more about these options in the section "Configuring Snapshot Replication" later in this chapter.
Pull subscriptions allow subscribers to initiate replication. Replication can be initiated either via a scheduled task or manually. Pull subscriptions are useful if you have a large number of subscribers and if the subscribers are not always attached to the network. Because subscribers initiate pull subscriptions, subscribers not always connected to the network can periodically connect and request replication data. This can also be useful in reducing the number of connection errors reported on the distributor. If the distributor tries to initiate replication to a subscriber that does not respond, an error will be reported. Thus, if the replication is initiated on the subscriber only when it is attached, no errors will be reported.
Several agents are used to perform the actions necessary to move the replicated data from the publisher to the distributor and finally to the subscriber: the Snapshot Agent, the Log Reader Agent, the Distribution Agent, the Merge Agent, and the Queue Reader Agent. In this section, you'll learn what these agents do and how to manage them.
The Snapshot Agent is used for creating and propagating the snapshots from the publisher to the distributor (or snapshot location). The Snapshot Agent creates the replication data (the snapshot) and creates the information that is used by the Distribution Agent to propagate that data (the metadata). The Snapshot Agent stores the snapshot on the distributor (or anywhere that you specify). The Snapshot Agent is also responsible for maintaining information about the synchronization status of the replication objects; this information is stored in the distribution database.
The Snapshot Agent is dormant most of the time and might periodically activate, based on the schedule that you have configured, and perform its tasks. Each time the Snapshot Agent runs, it performs the following tasks:
As you can see, the Snapshot Agent is responsible for only creating the snapshot; it does not distribute it to subscribers. Other agents perform this task.
The snapshot should be refreshed often when snapshot replication is used and the data is changing on the publisher. Because the snapshot is regularly copied to subscribers, it should be current.
NOTE
When transactional or merge replication is being used and new subscriptions are not being added, it is unnecessary to refresh the snapshot at all.
The Log Reader Agent is used in transactional replication to extract change information from the transaction log on the publisher in order to replicate these commands into the distribution database. Each database that uses transactional replication has its own Log Reader Agent on the publisher. The Log Reader Agent is covered in detail in Chapter 27.
The Distribution Agent propagates snapshots and transactions from the distribution database to subscribers. Each publication has its own Distribution Agent. If you are using a push subscription, the Distribution Agent runs on the distributor. If you are using a pull subscription, the Distribution Agent runs on the subscriber.
The Merge Agent is used in merge replication to reconcile (merge) incremental changes that have occurred since the last reconciliation. When you use merge replication, the Distribution Agent and the Snapshot Agent aren't used—the Merge Agent communicates with both the publisher and the distributor. The Merge Agent is covered in detail in Chapter 28.
The Queue Reader Agent is used to propagate changes made to subscribers of snapshot or transaction replication that have been configured with the queued updating option. This option allows changes to be made on the subscriber without the need to use a distributed transaction.
You can monitor any of the agents by using Enterprise Manager. To do so, follow these steps: