News Site Revisited


After the preceding discussion of replication technologies, you might be thoroughly confused about what a good replication strategy is. This confusion isn't a bad thing; it is the decision in disguise. The goal here is scalability, and as such we need to ensure that we can scale horizontally. This means that if we need more power, we want to be able to simply add more nodes for a linear performance benefit. Although a linear speed-up is a theoretical best, it is still our goal, and we should design the system to target it.

Master-master replication is limited to two nodes, so adding nodes simply isn't possible. Multimaster replication using two-phase commit is too expensive to scale horizontally; in a 100-node configuration, the cost of each node performing 2PC with 99 other systems would cause everything to come crashing down. Master-slave replication is the only approach left standing.

Choosing Technologies and Methods

To make this discussion more concrete, assume that our news site is running against Oracle. Why Oracle? The demands on the OLTP system that drives the site are intense (remember, that is the problem we are trying to solve here), and a tremendous amount of maintenance and auditing goes on in stored procedures. Additionally, there is a need for extensive data mining against an operational data store that pulls information from the OLTP system and exposes it to analysts. That operational data store runs Oracle because the analysts asked to use some of the data mining and reporting tools that Oracle provides. The real answer, of course, is that the architect, VP of Engineering, and CTO sat down with a group of vendors and liked what they heard from Oracle. All that matters is that the system is running Oracle, it is far too busy, and we want to scale to many more users.

The site's dynamic content is serviced centrally via a small cluster of web application servers directly referencing the master OLTP Oracle database that services the end-user and an even smaller set of web application servers that serve administrative users (columnists, editors, and so on). The operational data store hangs off the OLTP database and services some administrative users to perform business tasks and in-depth analysis. Figure 8.1 illustrates this architecture.

Figure 8.1. A centralized, vertically scalable architecture


We could simply deploy two slave Oracle instances at each web cluster location and perform master-slave replication. Technically this would work, but financially, not a chance. Let's assume that we have content web servers in each of our four worldwide locations as shown in Figure 6.10 in Chapter 6, "Static Content Serving for Speed and Glory." Assuming 10 dual-processor machines in each location, that would mean Oracle licenses for 4 clusters x 2 nodes/cluster x 2 processors/node = 16 processors! The price for that would make more than a casual dent in the year's operating budget. If you aren't familiar with Oracle's licensing policies, they are typically per-feature and per-processor. That policy is friendly to those who want to vertically scale their solutions, but as discussed, our goal is horizontal scalability.

It is important to remember that Oracle does not "enable" our architecture. It is a relational database solution that can be queried via SQL, and, most importantly, there are many others just like it. We could switch our infrastructure to use something less costly knowing now that we want to scale out to many more nodes; however, one rule of thumb is don't change what works if you can help it.

We really want to make as few changes as possible to our existing core architecture while enabling the web application deployed at each data center to have a locally accessible database with an accurate copy of data managed at the core. We want to be able to scale up and down without incurring a large operating expense or a large capital expense. It so happens that our developers' expertise is in building against Oracle, and we want to leverage as much of that as possible. Between PostgreSQL and MySQL, PostgreSQL is closer to Oracle in feature set and usage than MySQL is.

An Oracle core infrastructure with 16 PostgreSQL replicas (shown in Figure 8.2) is an architecture that does not pose a substantial financial risk or an enormous technical risk with respect to leveraging existing application developer expertise.

Figure 8.2. A master-slave oriented architecture that scales horizontally


What we hope to leverage here is the replication of heavily queried data to our various clusters. The data, as with most heavily used website data, changes infrequently with respect to the frequency it is referenced in read-only operations. In other words, the read-to-write ratio is high. In our case, we need to replicate the news articles, administrative users (authors, editors, approvers), and customers (the end-user). Of course, this book isn't only on this topic, so the example here will aim to be complete and accurate with respect to the limited problem we are going to tackle. An actual implementation would likely encompass much more data. Here we will talk about three tables, but I've used this technique before to handle a multiterabyte schema with more than 1,200 tables.

Implementing Cross-Vendor Database Replication

Implementing cross-vendor database replication in a generic manner can revolutionize the way problems are tackled in an architecture. If done correctly, it can allow you to truly choose the right tool for the job when it comes to databases. If we have some tasks that can be solved using features only found in MySQL or Postgres or Oracle or SQL Server, we simply replicate the necessary information to and from them and our problem is solved.

As discussed, there are a variety of database replication techniques, and what each offers is different. Our goals are met by master-slave database replication, which fortunately is easy to implement in an application external to the source and destination databases.

Specifically, we have three tables in Oracle that we want to replicate to 16 PostgreSQL nodes: AdminUsers is rather small with only about 5,000 rows; SiteUsers is large, and we see about 0.1% change on that table (by row count) on a daily basis (that is, of the 100 million rows, about 100,000 rows change during the course of a normal day). NewsArticles contains the articles that are published, and this can be between 50 and 2,000 new articles per day, and many more updates as articles are written, edited, and approved. The code listing for source database CREATE TABLE DDL follows:

CREATE TABLE AdminUsers (
  UserID INTEGER PRIMARY KEY,
  UserName VARCHAR2(32) NOT NULL UNIQUE,
  Password VARCHAR2(80) NOT NULL,
  EmailAddress VARCHAR2(255) NOT NULL
);

CREATE TABLE SiteUsers (
  SiteUserID INTEGER PRIMARY KEY,
  CreatedDate DATE NOT NULL,
  UserName VARCHAR2(32) NOT NULL UNIQUE,
  Password VARCHAR2(80) NOT NULL,
  EmailAddress VARCHAR2(255) NOT NULL,
  Country VARCHAR2(3) NOT NULL,
  PostalCode VARCHAR2(10) NOT NULL
);

CREATE TABLE NewsArticles (
  ArticleID INTEGER PRIMARY KEY,
  Author INTEGER NOT NULL REFERENCES AdminUsers(UserID),
  Editor INTEGER REFERENCES AdminUsers(UserID),
  Approver INTEGER REFERENCES AdminUsers(UserID),
  CreatedDate DATE NOT NULL,
  LastModifiedDate DATE NOT NULL,
  PublishedDate DATE,
  Title VARCHAR2(200),
  AbstractText VARCHAR2(1024),
  BodyText CLOB,
  isAvailable INTEGER DEFAULT 0 NOT NULL
);


The SiteUsers table in particular is far thinner than we would expect a real site's to be; all the tables here are kept light on columns to keep the example brief.

The basic concept here is to replicate the changes that happen against these tables to another database. So, the obvious first step is to attempt to keep track of the changes. Changes in databases all come in one of three forms: addition, removal, and change (that is, INSERT, DELETE, and UPDATE).

DML Replay Replication

Keeping track of INSERT, DELETE, and UPDATE is actually easy because SQL lets us place a trigger on each of these operations. Before we jump the gun and write ourselves a trigger to track the changes, we'll need some place to stick the data itself. This is where the techniques can vary from one approach to another.

In our model, we care only about representing the most recent view available, and it isn't vitally important for us to play each transaction on the source forward and commit them individually on the destination. Although it isn't any more difficult to do this sort of thing, we simply want to know all the missed transactions and replay them to achieve a "current" resultset. As such, we don't care how the record changed, just that it is likely different. It is wholly sufficient to track just the primary key of the records that have been modified in some way.

The first step is to create the tables that will store the primary keys that get modified. We will tackle the NewsArticles and SiteUsers tables this way and leave AdminUsers for the next section. We need to know which row changed (by primary key), and it makes our lives easier to know which transaction the change happened in (so that we can more easily track what we have already processed). For database maintenance (such as cleaning out the DML log when entries get old), we should also track the time at which the modification took place. The code listing for the DDL to create the DML logs follows:

CREATE TABLE SiteUsersDML (
  TXNID VARCHAR2(20) NOT NULL,
  SiteUserID INTEGER NOT NULL,
  InsertionDate DATE NOT NULL
);
CREATE INDEX SiteUsersDML_TXNID_IDX ON SiteUsersDML(TXNID);
CREATE INDEX SiteUsersDML_InsertionDate_IDX ON SiteUsersDML(InsertionDate);

CREATE TABLE NewsArticlesDML (
  TXNID VARCHAR2(20) NOT NULL,
  ArticleID INTEGER NOT NULL,
  InsertionDate DATE NOT NULL
);
CREATE INDEX NewsArticlesDML_TXNID_IDX ON NewsArticlesDML(TXNID);
CREATE INDEX NewsArticlesDML_InsertionDate_IDX ON NewsArticlesDML(InsertionDate);


Now that we have some place to track our changes, let's do it. Each time we insert or update data, we want to place the new primary key (SiteUserID for SiteUsers and ArticleID for NewsArticles) into the DML table along with the current time and the transaction ID. Any time we delete data, we want to take the same action except on the old primary key (in SQL talk, there is no "new" row for a delete because it is gone). The code listing for DML tracking triggers follows:

CREATE OR REPLACE TRIGGER SiteUsersDMLTracker
  AFTER INSERT OR UPDATE OR DELETE
  ON SiteUsers
  REFERENCING NEW AS NEW OLD AS OLD
  FOR EACH ROW
BEGIN
  IF (deleting)
  THEN
    INSERT INTO SiteUsersDML
                (TXNID, SiteUserID, InsertionDate)
         VALUES (DBMS_TRANSACTION.local_transaction_id,
                 :OLD.SiteUserID, SYSDATE);
  ELSE
    INSERT INTO SiteUsersDML
                (TXNID, SiteUserID, InsertionDate)
         VALUES (DBMS_TRANSACTION.local_transaction_id,
                 :NEW.SiteUserID, SYSDATE);
  END IF;
END;

CREATE OR REPLACE TRIGGER NewsArticlesDMLTracker
  AFTER INSERT OR UPDATE OR DELETE
  ON NewsArticles
  REFERENCING NEW AS NEW OLD AS OLD
  FOR EACH ROW
BEGIN
  IF (deleting)
  THEN
    INSERT INTO NewsArticlesDML
                (TXNID, ArticleID, InsertionDate)
         VALUES (DBMS_TRANSACTION.local_transaction_id,
                 :OLD.ArticleID, SYSDATE);
  ELSE
    INSERT INTO NewsArticlesDML
                (TXNID, ArticleID, InsertionDate)
         VALUES (DBMS_TRANSACTION.local_transaction_id,
                 :NEW.ArticleID, SYSDATE);
  END IF;
END;


After these triggers are in place, every modification to a row in the SiteUsers and NewsArticles tables will be tracked in the corresponding DML table as well. The InsertionDate in these tables exists only so that rows can be removed from the DML logs after they have been applied to all the slave nodes; it cannot be used for tracking replication progress.

The next step is for a slave to connect to the master server and request all the transactions that have occurred since the last time it ran. You might think that it would be correct to have the slave store the last (latest) InsertionDate time stamp it witnessed as a high water mark and request all DML that has occurred since that point in time. This is terribly wrong.

The concept of "since" applies intuitively to time. However, in databases, what has happened "since" the last time you looked has nothing to do with when it happened but rather when it was committed. Oracle, unfortunately, does not expose the ability to place a trigger before commit, so there is no way to tell when all the rows inserted into our DML log left the isolation of their transaction and were actually committed for others to see. Our replication process is one of those "others," so this is a big issue. If we relied on time, we could have a race condition (shown in Figure 8.3), which would lead to a wholly inconsistent copy on the slaves.

Figure 8.3. The race condition in replaying DML by time


Instead of playing back by time, we should play back by transaction ID. This, unfortunately, requires us to maintain a list of transactions that each node has processed so that it can determine which transactions are "new" on each subsequent run. The code listing for the DDL to create tables for replication progress follows:

CREATE TABLE SiteUsersDMLProcessed (
  NodeName VARCHAR2(80) NOT NULL,
  TXNID VARCHAR2(20) NOT NULL,
  PRIMARY KEY(NodeName, TXNID)
);
CREATE INDEX SiteUsersDMLP_NN_IDX ON SiteUsersDMLProcessed(NodeName);

CREATE TABLE NewsArticlesDMLProcessed (
  NodeName VARCHAR2(80) NOT NULL,
  TXNID VARCHAR2(20) NOT NULL,
  PRIMARY KEY(NodeName, TXNID)
);
CREATE INDEX NewsArticlesDMLP_NN_IDX ON NewsArticlesDMLProcessed(NodeName);


Now that we have all our building blocks in place on the master, we can tackle the actual replication from the slave's perspective.
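Before writing the slave-side code, it helps to see the shape of the query a slave will issue against the master. The following is roughly the statement the replication script later in this section generates for SiteUsers (the :nodename bind variable identifies the requesting slave); the outer joins filter out transactions this node has already recorded as processed and still return the DML row even when the source row has since been deleted:

SELECT l.TXNID, l.SiteUserID,
       t.SiteUserID, t.CreatedDate, t.UserName, t.Password,
       t.EmailAddress, t.Country, t.PostalCode
  FROM SiteUsers t, SiteUsersDML l, SiteUsersDMLProcessed p
 WHERE l.TXNID = p.TXNID(+)
   AND p.NodeName(+) = :nodename
   AND p.TXNID IS NULL
   AND l.SiteUserID = t.SiteUserID(+)
 ORDER BY l.TXNID;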

Clearly we can't put any data into our PostgreSQL replica until we have the necessary tables in place. We need tables to hold the SiteUsers and NewsArticles as well as stored procedures (or functions in PostgreSQL) to apply changes as they come in. The code listing for DDL to create target tables in PostgreSQL follows:

CREATE TABLE SiteUsers (
  SiteUserID INTEGER PRIMARY KEY,
  CreatedDate TIMESTAMP NOT NULL,
  UserName VARCHAR(32) NOT NULL UNIQUE,
  Password VARCHAR(80) NOT NULL,
  EmailAddress VARCHAR(255) NOT NULL,
  Country VARCHAR(3) NOT NULL,
  PostalCode VARCHAR(10) NOT NULL
);

CREATE TABLE NewsArticles (
  ArticleID INTEGER PRIMARY KEY,
  Author INTEGER NOT NULL,
  Editor INTEGER,
  Approver INTEGER,
  CreatedDate TIMESTAMP NOT NULL,
  LastModifiedDate TIMESTAMP NOT NULL,
  PublishedDate TIMESTAMP,
  Title VARCHAR(200),
  AbstractText VARCHAR(1024),
  BodyText TEXT,
  isAvailable INTEGER NOT NULL DEFAULT 0
);


Now that we have our tables in place, we need to write the stored procedures (or functions) in PostgreSQL for each table to apply the changes that we pull from our master Oracle instance. These stored procedures are fairly generic, and you could write an automated procedure for generating them from DDL information. The code listing for SiteUsers_Apply and NewsArticles_Apply follows:

01: CREATE FUNCTION
02: SiteUsers_Apply(integer, integer, timestamp, varchar,
03:                 varchar, varchar, varchar, varchar)
04: RETURNS void
05: AS $$
06: DECLARE
07:   v_refid ALIAS FOR $1;
08:   v_SiteUserID ALIAS FOR $2;
09:   v_CreatedDate ALIAS FOR $3;
10:   v_UserName ALIAS FOR $4;
11:   v_Password ALIAS FOR $5;
12:   v_EmailAddress ALIAS FOR $6;
13:   v_Country ALIAS FOR $7;
14:   v_PostalCode ALIAS FOR $8;
15: BEGIN
16:   IF v_SiteUserID IS NULL THEN
17:     DELETE FROM SiteUsers WHERE SiteUserID = v_refid;
18:     RETURN;
19:   END IF;
20:   UPDATE SiteUsers
21:      SET SiteUserID = v_SiteUserID, CreatedDate = v_CreatedDate,
22:          UserName = v_UserName, Password = v_Password,
23:          EmailAddress = v_EmailAddress, Country = v_Country,
24:          PostalCode = v_PostalCode
25:    WHERE SiteUserID = v_refid;
26:   IF NOT FOUND THEN
27:     INSERT INTO SiteUsers
28:            (SiteUserID, CreatedDate, UserName, Password,
29:             EmailAddress, Country, PostalCode)
30:     VALUES (v_SiteUserID, v_CreatedDate, v_UserName, v_Password,
31:             v_EmailAddress, v_Country, v_PostalCode);
32:   END IF;
33: END;
34: $$ LANGUAGE 'plpgsql';
35:
36: CREATE FUNCTION
37: NewsArticles_Apply(integer, integer, integer, integer, integer,
38:                    timestamp, timestamp, timestamp, varchar,
39:                    varchar, text, integer)
40: RETURNS void
41: AS $$
42: DECLARE
43:   v_refid ALIAS FOR $1;
44:   v_ArticleID ALIAS FOR $2;
45:   v_Author ALIAS FOR $3;
46:   v_Editor ALIAS FOR $4;
47:   v_Approver ALIAS FOR $5;
48:   v_CreatedDate ALIAS FOR $6;
49:   v_LastModifiedDate ALIAS FOR $7;
50:   v_PublishedDate ALIAS FOR $8;
51:   v_Title ALIAS FOR $9;
52:   v_AbstractText ALIAS FOR $10;
53:   v_BodyText ALIAS FOR $11;
54:   v_isAvailable ALIAS FOR $12;
55: BEGIN
56:   IF v_ArticleID IS NULL THEN
57:     DELETE FROM NewsArticles WHERE ArticleID = v_refid;
58:     RETURN;
59:   END IF;
60:   UPDATE NewsArticles
61:      SET ArticleID = v_ArticleID, Author = v_Author,
62:          Editor = v_Editor, Approver = v_Approver,
63:          CreatedDate = v_CreatedDate,
64:          LastModifiedDate = v_LastModifiedDate,
65:          PublishedDate = v_PublishedDate,
66:          Title = v_Title, AbstractText = v_AbstractText,
67:          BodyText = v_BodyText, isAvailable = v_isAvailable
68:    WHERE ArticleID = v_refid;
69:   IF NOT FOUND THEN
70:     INSERT INTO NewsArticles
71:            (ArticleID, Author, Editor, Approver, CreatedDate,
72:             LastModifiedDate, PublishedDate, Title,
73:             AbstractText, BodyText, isAvailable)
74:     VALUES (v_ArticleID, v_Author, v_Editor, v_Approver,
75:             v_CreatedDate, v_LastModifiedDate, v_PublishedDate,
76:             v_Title, v_AbstractText, v_BodyText, v_isAvailable);
77:   END IF;
78: END;
79: $$ LANGUAGE 'plpgsql';


The basic concept behind these two routines is the same. The program that will pull changes from Oracle and apply them to PostgreSQL will know the original primary key of the row that was changed and the new full row (primary key and other columns) as it is in Oracle. The original primary keys are the reference primary keys, and the new full row represents how that referenced row should now appear. We have three cases to deal with:

  • If the new full row has a NULL value in the nonreference primary key, it means that the row no longer exists in the master (it was deleted). This means that we should delete the row from the slave (based on the reference primary key). Lines 16-19 and 56-59.

  • If the new full row has a valid (not NULL) primary key and the reference primary key exists in the slave table already, we must update that row to reflect the new information provided. Lines 20-25 and 60-68.

  • If the new full row has a valid (not NULL) primary key and the reference primary key does not exist in the slave table, we must insert this new row. Lines 26-32 and 69-77.
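To see how these cases play out, here are two hypothetical calls to SiteUsers_Apply (the argument values are invented for illustration). In the first, SiteUserID 1001 no longer exists on the master, so the new full row is all NULLs and the local copy is deleted; in the second, SiteUserID 1002 does exist, so the local row is updated if present and inserted otherwise:

SELECT SiteUsers_Apply(1001, NULL, NULL, NULL, NULL, NULL, NULL, NULL);
SELECT SiteUsers_Apply(1002, 1002, '2006-05-01 09:30:00', 'jdoe',
                       'hashedpassword', 'jdoe@example.com', 'USA', '21046');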

We now have the necessary back-end infrastructure in place in Oracle and PostgreSQL and must write the actual workhorse. We'll use Perl for this example because the DBI (Database Independent Interface for Perl) is simple, sweet, and entirely consistent across all the DBDs (database drivers), including the one for Oracle and the one for PostgreSQL.

Although the concept of tracking your changes using triggers is straightforward, there are several moving parts. The slave side is much simpler in both concept and implementation. Basically, a slave does the following:

  1. Connects to the master

  2. Requests all modifications it has not yet processed

  3. Applies those modifications

  4. Informs the master it has applied them

  5. Commits at the slave and the master

  6. Repeats

A majority of the legwork here is authoring the queries that will serve to interact with the master and slave. We will make an effort to generalize the query generation so that adding new tables is easy. You should note that writing this as a single procedural Perl script is likely not the best way to go about putting a piece of critical infrastructure into your system. Making this into a clean, well-separated Perl replication framework is left as an exercise for the reader (and a rather simple one at that). Here it is as one procedural script so that we can walk through it section by section to match it against our six-item checklist without chasing down different classes and methods. The code listing for dml_replay.pl follows:

001: #!/usr/bin/perl
002:
003: use strict;
004: use DBI;
005: use POSIX qw/uname/;
006: use Getopt::Long;
007: use vars qw/%db %sql_ops %ops %tableinfo
008:             $nodename $interval $print_usage $verbose/;
009:
010: ($nodename = [POSIX::uname]->[1]) =~ s/\..*//; # short hostname
011: $interval = 5;                                 # 5 sec default
012:
013: sub vlog { printf STDERR @_ if($verbose); }
014:
015: GetOptions( 'n|nodename=s' => \$nodename,
016:             'i|interval=i' => \$interval,
017:             'v|verbose'    => \$verbose,
018:             'h|help'       => \$print_usage);
019:
020: if($print_usage) {
021:   print "$0 [-n node] [-i seconds] [-v] [-h]\n";
022:   exit;
023: }
024:
025: $db{master} = DBI->connect("dbi:Oracle:MASTER", "user", "pw",
026:                            { AutoCommit => 0, RaiseError => 1 } ) ||
027:   die "Could not connect to master";
028: $db{slave} = DBI->connect("dbi:Pg:database=slave", "user", "pw",
029:                           { AutoCommit => 0, RaiseError => 1 } ) ||
030:   die "Could not connect to slave";
031:
032: # define our tables (names, keys, and other columns)
033: %tableinfo = (
034:   'SiteUsers' =>
035:     { 'keys'    => [ 'SiteUserID' ],
036:       'columns' => [ 'CreatedDate', 'UserName', 'Password',
037:                      'EmailAddress', 'Country', 'PostalCode' ],
038:     },
039:   'NewsArticles' =>
040:     { 'keys'    => [ 'ArticleID' ],
041:       'columns' => [ 'Author', 'Editor', 'Approver', 'CreatedDate',
042:                      'LastModifiedDate', 'PublishedDate', 'Title',
043:                      'AbstractText', 'BodyText', 'isAvailable' ],
044:     },
045: );
046:
047: %sql_ops = (
048:   master => {
049:     fetchlogs => sub {
050:       my ($table, $pks, $cols) = @_;
051:       return qq{
052:       SELECT l.txnid, }.join(',', map { "l.$_" } (@$pks)).", ".
053:              join(',', map { "t.$_" } (@$pks, @$cols)).qq{
054:         FROM $table t, ${table}DML l, ${table}DMLProcessed p
055:        WHERE l.txnid = p.txnid(+)  /* left join l to p and from */
056:          AND p.nodename(+) = ?     /* our node's view to remove */
057:          AND p.txnid is NULL       /* all seen txnids in p */
058:          /* then left join against t on the primary key to */
059:          /* pull the changed row even if it has been deleted. */
060:          AND }.join(' AND ', map { "l.$_ = t.$_(+)" } @$pks).qq{
061:        ORDER BY l.txnid};
062:     },
063:     record => sub {
064:       my ($table, $pks, $cols) = @_;
065:       return qq{
066:       INSERT INTO ${table}DMLProcessed (txnid, nodename)
067:                                 VALUES (?, ?)
068:       };
069:     },
070:   },
071:   slave => {
072:     apply => sub {
073:       my ($table, $pks, $cols) = @_;
074:       # reference primary keys, new primary keys, new columns
075:       return qq{ SELECT ${table}_Apply(}.
076:              join(',', map { "?" } (@$pks, @$pks, @$cols)).
077:              ')';
078:     }
079:   }
080: );
081:
082: # transform %sql_ops into %ops where the operations are now
083: # DBI statement handles instead of SQL text.
084: while (my($table,$props) = each %tableinfo) {
085:   for my $connection ('master', 'slave') {
086:     while(my($op,$tsql) = each %{$sql_ops{$connection}}) {
087:       # use our template to build sql for this table
088:       # ($tsql is a coderef that generates our SQL statements)
089:       my $sql = $tsql->($table, $props->{keys}, $props->{columns});
090:       $ops{$connection}->{$table}->{$op} =
091:         $db{$connection}->prepare($sql) ||
092:           die "Could not prepare $sql on $connection";
093:     }
094:   }
095: }
096:
097: sub dml_replay($) {
098:   my $table = shift;
099:   my $rows = 0;
100:   my %seen_txns;
101:   eval {
102:     my $master = $ops{master}->{$table};
103:     my $slave = $ops{slave}->{$table};
104:     vlog("Fetch the $table logs from the master\n");
105:     $master->{fetchlogs}->execute($nodename);
106:     while(my @row = $master->{fetchlogs}->fetchrow()) {
107:       # txnid is the first column, pass the remaining to apply
108:       my $txnid = shift @row;
109:       vlog("[$txnid] Apply the changed row to the slave\n");
110:       $slave->{apply}->execute(@row);
111:       $seen_txns{$txnid}++;
112:       $rows++;
113:     }
114:     foreach my $txnid (keys %seen_txns) {
115:       vlog("[$txnid] Record the application to the master\n");
116:       $master->{record}->execute($txnid, $nodename);
117:     }
118:     $master->{fetchlogs}->finish();
119:     if($rows) {
120:       my $txns = scalar(keys %seen_txns); # get a count
121:       vlog("[$table] commit $txns txns / $rows rows\n");
122:       for ('slave', 'master') { $db{$_}->commit(); }
123:     }
124:   };
125:   if($@) {
126:     vlog("rollback DML replication on $table: $@\n");
127:     for ('slave', 'master') { eval { $db{$_}->rollback(); }; }
128:   }
129: }
130:
131: my $stop;
132: $SIG{'INT'} = sub { $stop = 1; };
133: while(!$stop) {
134:   foreach my $table (keys %tableinfo) {
135:     dml_replay($table);
136:   }
137:   sleep($interval);
138: }
139:
140: vlog("Disconnecting...\n");
141: for my $connection ('slave', 'master') {
142:   $db{$connection}->disconnect();
143: }


Let's walk over the checklist again and cross-reference it with our script:

  1. Connect to the master. This is handled in lines 25-30.

  2. Request all modifications it has not yet processed. The method for generating the SQL is defined in lines 49-62. That method is used and the result prepared on the database handle in lines 82-95. The request and the loop over the response set are in lines 105-106.

  3. Apply those modifications. The SQL for applying the changes to the slave table is defined in lines 72-78 and prepared as a statement handle in lines 82-95. The resultset of the previous step is then applied to the slave table, and the transaction IDs that are applied are tracked in lines 107-112.

  4. Inform the master that it has applied them. The SQL for recording applied transactions with the master is defined in lines 63-69 and prepared as a statement handle in lines 82-95. Lines 114-117 execute this statement handle to inform the master of the transactions we applied in the previous step.

  5. Commit at the slave and the master. The changes are committed in lines 119-123. If there was an error during any of the previous steps, the eval block would be exited and the rollback induced in lines 125-128.

  6. Repeat. Lines 131-138 loop over the preceding steps.

Perhaps a more straightforward recap of the script itself would be

  1. Lines 25-30 connect.

  2. Lines 32-45 define the tables we will be replicating.

  3. Lines 47-80 define methods to generate all the SQL we will need to do our job.

  4. Lines 82-95 prepare all the SQL for all the tables on the appropriate database handle (master or slave).

  5. Lines 97-129 apply the process of fetching changes from the master, applying them to the slave, recording the application with the master, and committing (or rolling back on failure).

  6. Lines 131-138 continually do this work on a defined interval.

All in all, it is simple. To start, we run the script (and leave it running); then we export the master table and import it manually. From that point on, the script will keep the replica up-to-date (within $interval seconds).

One nuance that was not addressed in this example is that the DMLProcessed tables will grow unbounded. There should be periodic, automated administrative action taken to remove old records from the DML logs on the master.
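A minimal sketch of that maintenance for the SiteUsers log follows; the seven-day retention window is an assumption and should comfortably exceed the longest slave outage you intend to tolerate. It deletes log entries that are old and that every node currently named in the bookkeeping table has processed, and then prunes bookkeeping rows that no longer reference a log entry:

DELETE FROM SiteUsersDML l
 WHERE l.InsertionDate < SYSDATE - 7
   AND (SELECT COUNT(DISTINCT p.NodeName)
          FROM SiteUsersDMLProcessed p
         WHERE p.TXNID = l.TXNID)
     = (SELECT COUNT(DISTINCT NodeName) FROM SiteUsersDMLProcessed);

DELETE FROM SiteUsersDMLProcessed p
 WHERE NOT EXISTS (SELECT 1 FROM SiteUsersDML l WHERE l.TXNID = p.TXNID);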

Snapshot Replication for Small Datasets

Now is a perfect time to take a detour. We have one table that is rather small. AdminUsers is only roughly 5,000 rows. The act of pulling that data from the database is tiny, and the act of creating a new table of that size is hardly worthy of mention.

Tables like these often do not merit the complexity of tracking changes, shipping them over, and reapplying them on slave nodes. Instead, each slave node can simply truncate its copy, pull a new one, insert it, and commit. On some databases this has unappealing locking behavior and can affect read-only queries referencing the target table. We see this in PostgreSQL sometimes, so we will implement something called flipping snapshots.

The concept is simple. We have two copies of the table, and we alternate between them, applying the snapshot procedure. At the end of each procedure, we replace a publicly used view to reference the table that we just finished populating.

The first thing we need, as in the DML replay examples, is a table to hold our AdminUsers information. Unlike the DML replay example, we will not use the name of the table directly because we want to alternate between two snapshots. We do this by creating two identical snapshot tables and a view against one of them:

CREATE TABLE AdminUsers_snap1 (
  UserID INTEGER PRIMARY KEY,
  UserName VARCHAR(32) NOT NULL UNIQUE,
  Password VARCHAR(80) NOT NULL,
  EmailAddress VARCHAR(255) NOT NULL
);

CREATE TABLE AdminUsers_snap2 (
  UserID INTEGER PRIMARY KEY,
  UserName VARCHAR(32) NOT NULL UNIQUE,
  Password VARCHAR(80) NOT NULL,
  EmailAddress VARCHAR(255) NOT NULL
);

CREATE VIEW AdminUsers AS SELECT * FROM AdminUsers_snap1;


Now our tables and view are there, but they are empty. Populating a snapshot table is much more straightforward than the previous examples because it is not fine-grained. Although the process is clearly less efficient, for small tables that change infrequently it is a simple and easy solution because it requires no instrumentation on the master node.

The process of populating the tables is simple (a SQL sketch of the sequence follows the list):

  1. Truncate the local, unused snapshot table (the one to which the view is not pointing).

  2. Pull all rows from the master table and insert them into the now empty snapshot table.

  3. Alter the view to point to the snapshot table we just populated.

  4. Commit.
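In SQL terms, a single refresh boils down to roughly the following sequence, shown here for AdminUsers and assuming the view currently points at AdminUsers_snap1. The row loading is shown as a comment because the rows arrive from the master over a separate connection (via the script below, or a dblink if one is configured):

BEGIN;
TRUNCATE TABLE AdminUsers_snap2;
-- one parameterized INSERT per row fetched from the master:
--   INSERT INTO AdminUsers_snap2 (UserID, UserName, Password, EmailAddress)
--        VALUES (?, ?, ?, ?);
CREATE OR REPLACE VIEW AdminUsers AS SELECT * FROM AdminUsers_snap2;
COMMIT;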

As you can imagine, there are several ways to do this. In most databases, performing bulk data loads efficiently requires using a separate API: Oracle has sqlldr, PostgreSQL has COPY, and MySQL has LOAD DATA INFILE. Additionally, many databases support the concept of dblinks, which are direct connections to other databases usually over some connection such as ODBC. Using a direct dblink or a native bulk loading method is the correct implementation of this, but we will use a generic (yet less efficient) approach here so that the code is less baroque and more generally applicable. The code for snapshot.pl follows:

01: #!/usr/bin/perl
02:
03: use strict;
04: use DBI;
05:
06: my $master = DBI->connect("dbi:Oracle:MASTER", "user", "pw",
07:                           { AutoCommit => 0, RaiseError => 1 } ) ||
08:   die "Could not connect to master";
09: my $slave = DBI->connect("dbi:Pg:database=slave", "user", "pw",
10:                          { AutoCommit => 0, RaiseError => 1 } ) ||
11:   die "Could not connect to slave";
12:
13: # This is how we will find out the view's current source table
14: my $whichsnap = $slave->prepare(q{
15:     SELECT substring(definition from '%_snap#"_#"%' for '#')
16:       FROM pg_views
17:      WHERE viewname = lower(?)
18: }) || die "Cannot prepare snapshot detector\n";
19:
20: while(my $table = shift) {
21:   eval {
22:     # Which snapshot is the view pointing to? (1 or 2)
23:     $whichsnap->execute($table);
24:     my($snap) = $whichsnap->fetchrow();
25:     $whichsnap->finish();
26:
27:     # Choose the non-active snapshot for population
28:     my $newsnap;
29:     $newsnap = "${table}_snap2" if($snap eq '1');
30:     $newsnap = "${table}_snap1" if($snap eq '2');
31:     $newsnap || die "[$table] which snapshot to use?\n";
32:
33:     # Empty the snapshot table
34:     $slave->do(qq{TRUNCATE TABLE $newsnap}) ||
35:       die "[$table] could not truncate $newsnap\n";
36:
37:     # Pull all the rows from the master and put them
38:     # in the slave table
39:     my $src = $master->prepare(qq{SELECT * FROM $table});
40:     $src->execute();
41:     # Our column names are @{$src->{NAME}}
42:     my $dst = $slave->prepare(qq{
43:       INSERT INTO $newsnap (}. join(',', @{$src->{NAME}}).q{)
44:            VALUES (}.join(',', map { "?" } @{$src->{NAME}}).')');
45:     while(my $row = $src->fetchrow_hashref()) {
46:       $dst->execute(map { $row->{$_} } @{$src->{NAME}});
47:     }
48:     $src->finish();
49:
50:     # Replace the view with a view onto the new snapshot
51:     $slave->do(qq{CREATE OR REPLACE VIEW $table AS
52:                        SELECT * FROM $newsnap}) ||
53:       die "[$table] cannot replace view\n";
54:
55:     $slave->commit();
56:   };
57:   # Rollback on any sort of error
58:   if($@) { eval { $slave->rollback(); }; }
59: }
60: $master->disconnect();
61: $slave->disconnect();


The preceding code sample is much simpler than the DML replay script. In this script we simply connect to the databases (lines 6-11), determine which snapshot the current view references (lines 13-18 and 22-31), truncate the "other" snapshot (lines 33-35), pull the rows from the master and stick them into our empty table (lines 37-48), flip the view to look at our fresh snapshot (lines 50-53), and commit (line 55). The script takes the tables to snapshot as command-line arguments.

Again, although this script is functional, a production version should leverage the database vendor-specific extensions for bulk data loading to reduce overall load on the slave node. In my personal experience doing this from Oracle to MySQL, we used LOAD DATA INFILE with success, and when performing snapshots from Oracle to PostgreSQL we used the PL/Perl procedural language in combination with dbi-link to perform all aspects of the snapshot procedure from within PostgreSQL itself.
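As one example of what that might look like on the PostgreSQL side, the row-by-row INSERT loop in snapshot.pl (lines 42-47) could be swapped for a COPY-based load. This is only a sketch, and it assumes a DBD::Pg recent enough to expose the pg_putcopydata and pg_putcopyend methods:

# Replaces the INSERT loop (lines 42-47) with a COPY FROM STDIN load.
my @cols = @{$src->{NAME}};
$slave->do("COPY $newsnap (" . join(',', @cols) . ") FROM STDIN");
while (my $row = $src->fetchrow_hashref()) {
  # Emit one tab-delimited line per row; encode NULLs as \N and escape
  # the characters the COPY text format treats specially.
  my @vals = map {
    my $v = $row->{$_};
    defined($v) ? do { $v =~ s/([\\\t\n\r])/\\$1/g; $v } : '\N';
  } @cols;
  $slave->pg_putcopydata(join("\t", @vals) . "\n");
}
$slave->pg_putcopyend();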



