Chapter 12: SQL Server Replication | ppk on JavaScript, 1/e

When offices spread out, and salespeople become more remote, administrators periodically face the need to 'disconnect' the data. Moving data from one OLE DB data source to another destination can be one of the most frustrating and challenging things a DBA can endure. In this chapter, I'll discuss replicating your data in SQL Server 2000. Truthfully, you could read an entire book on the sophisticated features of SQL Server replication and only scratch the surface, so in this chapter I'll try to focus on administration issues with the feature.

SQL Server Replication

the major reasons for the release of SQL Server 6.0 was for replication functions, and the feature has been improving since that release. Replication allows you to migrate data to a different server, even if that server is in a different geographical area. Some of the scenarios where you might choose to use replication include:

You have remote salespeople who need to synchronize a list of products from headquarters and send in their sales for the day.
You're performing an inventory check of your warehouse on a handheld device and you want to synchronize your figures with the estimated figures every two hours.
You have a loan officer on a laptop who needs to synchronize the borrower's information with the main system.
Your remote offices need to synchronize their weekly sales and inventory figures with headquarters. At the same time, the HR department at headquarters needs to send the remote offices a list of new employees.
You have a number of distributors that need your inventory and product list daily.

Note

There are many third-party software programs that perform a function similar to SQL Server replication. It is my experience that the native SQL Server replication is far less costly to implement and is more reliable.

Replication Considerations

As you can imagine, the reasons to replicate your data to a distributed database are numerous. There are a number of high-level considerations to keep in mind before you begin to plan your solution:

Connectivity Ensure that your servers will be able to communicate with each other on the SQL Server port (by default, port 1433). Depending on the type of replication, you may also need Microsoft Distributed Transaction Coordinator (MS DTC), which uses a different set of port numbers. This is usually the most overlooked part of the planning process until the week before. Most companies have a line of red tape that you have to go through before the firewall can be opened for a port. Keep in mind that if you're installing a new T-1 or frame relay connection between your offices, it could take up to 45 days to install the connection.
Autonomy Determine how independent each server will be regarding the parent server (also referred to as the publisher). Decide whether the remote site can modify data in the sales table and then merge it with headquarters. The alternative is to make the sales table a read-only table. Also decide whether you want updates to take place on a transactional basis or via batch updates from headquarters.
Data and schema needs Determine how consistent the data is among all the servers. For example, you must decide if your servers will have the same data on all the servers rapidly, or if it's fine to have a slight lag between updates. The former requires added processing power, and the latter adds wait time.
Latency This refers to how up to date your data is. If you're deploying an application to a remote salesperson, you will want to create a high level of autonomy. Remote offices may be more dependent on up-to-date information and need more frequent updates.
Determine the ladder Decide who wins a database update if two users in different locations update the same data. Are you going to deploy a 'first come, first served' scenario, or are you going to have a hierarchy where certain logins (such as a CEO) would receive preferential treatment?

I'll expand on all of these considerations when I discuss the various types of replication.

Replication Architecture

The architecture of replication is relatively straightforward. The set of items you're replicating is called a publication. Each publication consists of multiple articles (the information being replicated). You can replicate the following types of articles:

Entire table Allows you to replicate the entire table from one location to another. This is the easiest type of replication.
Vertical partition Allows you to choose individual columns from a table. For example, this is useful if you want to replicate all the columns in an employee table other than the salary column.
Horizontal partition Allows you to replicate only select data from one location to another. For example, this is useful if you want to replicate only the specific salesperson's data to their laptop.
Stored procedure results Allows you to replicate stored procedures. This is only available in a select few replication types.
Views Allows you to replicate the contents of a view to a remote system. This is only available in a select few replication types.

In replication, data originates from the publisher, which controls the master copy of the schema and data that is sent to other servers. Data is transferred to a distributor, which acts as the middleman, distributing the data to anyone who has subscribed to the data. This takes a huge load off of the publisher when you have dozens of subscribers. The data that is transferred is called an article. Figure 12-1 illustrates the architecture of the publisher/distributor/subscriber paradigm.

click to expand
Figure 12-1: Publisher/distributor replication model

The distributor holds a key component called the distribution database, which contains the history of the replicated events. In some types of replication, that database contains the transactions that are replicated. Distributors take the load off the production database server. If you have lots of subscribers, you should make the distributor a server that is not used for any other purpose than to replicate data.

Note

The distribution database is extremely important and you should make sure it has plenty of space. In some types of replication (such as transactional), you'll need a great deal of space. Also take note that if you lose connection with the subscribers, the space requirement will continue to grow until the system can resynchronize with all subscribers.

Caution

Technically, a single SQL Server can perform multiple roles. For example, your publisher can also be a distributor. This is not advised, however, due to the amount of extra traffic this creates on your main production database. In data warehouse environments, it is not uncommon to have multiple publishers send data to a single data warehouse subscriber.

The subscribers can subscribe to groups of articles called publications. These publications can be pushed or pulled to the subscriber. Pushing a publication means the publisher initiates the transfer; pulling means the subscriber requests the publication. Push subscriptions involve minimal latency because the publisher pushes the data as needed or on a schedule. Pull subscriptions can involve a higher amount of latency, but also give the subscriber a high amount of autonomy.

Note

Pull subscriptions are ideal for traveling salespeople who can only synchronize up to the publisher when in the office.

Heterogeneous Support for Replication

One of the advantages of SQL Server replication is that it uses OLE DB to move data. As a result, you can send data to data sources such as Oracle, Access, and DB2. Most features are also backward compatible with SQL Server 7.0. To check compatibility, you should know that the following features are built into the DBMS:

DBMS must be ODBC level 1 compliant
Database cannot be read-only
DBMS must be 32-bit, thread-safe
DBMS must support transactions
DBMS must support DDL