Requirements | Replicating PostgreSQL Data with Slony

Table of contents:

Before you can create a replication cluster, you must install Slony on every computer that will host a node. You can find Slony at the PostgreSQL's gborg web site (http://gborg.postgresql.org) or at http://slony.info. Currently, you must compile Slony yourself (at the time we are writing this, binary distributions are not available). To build Slony, you must have the following PostgreSQL components on each system:

PostgreSQL C header files (typically found in /usr/include, you can locate these files with the command pg_config includedir)
PostgreSQL Server C header files (typically found in /usr/include/pgsql/ server, you can locates these files with the command pg_config includedir-server)
PostgreSQL client libraries (typically found in /usr/lib, you can locate the actual directory with the command pg_config -libdir)
PL/pgSQL (typically found in /usr/lib/pgsql, you can locate the actual directory with the command pg_config pkglibdir)

Once you've confirmed that you have all of the components that you need, download and unpack the Slony archive. Next cd to the directory that holds the freshly unpacked files and then, dance the familiar GNU Tango: ./configure && make && make install.

If you've installed PostgreSQL in a "standard" location, configure should be able to find the PostgreSQL components that it requires. If configure complains that it can't find a PostgreSQL component, you may need to add one or more of the following options to the configure command line:

with-pgconfigdir=directory: The given directory must contain the pg_config utility (configure uses pg_config to find most of the other PostgreSQL components)
with-pgbindir=directory: Location of the PostgreSQL postmaster (Slony installs a few programs, such as slon and slonik, in this directory)
with-pgincludedir=directory: Location of the PostgreSQL headers
with-pgincludeserverdir=directory: Location of the PostgreSQL server headers
with-pglibdir=directory: Location of the PostgreSQL libraries (Slony programs are linked against the libpq object library)
with-pgpkglibdir=directory: Location of the PostgreSQL pkglib directory (specifically, configure searches for the PL/pgSQL shared library)
with-pgsharedir=directory: Location of the PostgreSQL share directory (Slony installs a number of SQL scripts in this directory)

In addition to the Slony package itself, you must prepare each PostgreSQL server for replication. First, the target database (that is, the database that you are originating or subscribing to) must exist on each node. Although not required, you may find it easier to manage the cluster if you use consistent database names on each node. In our video store example, every node contains a database named movies. Within movies, each node contains one schema for each branch. (You would end up with a schema named springfield, a schema named boomtown, and a schema named snoozeville.) Finally, you must install PL/pgSQL into each node. Many of the triggers and functions that Slony uses are written in PL/pgSQL (others are written in C).

To carry out its work, Slony adds a number of tables, views, sequences, functions, and types to each node. Slony segregates most of its data into a single schema within each nodethe name of the schema is formed by adding an underscore to the start of the replication cluster name. For example, if you create a replication cluster named branches, Slony creates a schema named _branches in each node. Slony will refuse to cooperate if it finds anything in the cluster schema. (Slony prefers to create the cluster schema itself.) Although it's possible to populate the Slony schemas by hand, it's easier to use the slonik command. slonik is a scripting engineto create or manage a Slony cluster, you write a script and invoke the slonik command to execute that script.

When you execute a script, slonik connects to one or more of the databases in the cluster and executes commands (often stored procedures) in those databases. Each script must contain a preamble that tells slonik how to connect to each node. For example, a script that manages the branches cluster would begin like this:

cluster name = branches;
node 1 admin conninfo = 'dbname=movies host=springfield user=slony';
node 2 admin conninfo = 'dbname=movies host=boomtown user=slony';
node 3 admin conninfo = 'dbname=movies host=snoozeville user=slony';
...

Symbolic Names and Include Files

The slonik command recently learned how to perform two new tricks that make it much easier to write and, especially, maintain slonik scripts. Starting with release 1.1, you can define symbolic names and refer to those names later in the script. For example, using symbolic names, you could rewrite the preamble like this:

define CLUSTER branches;
define SPRINGFIELD 1;
define BOOMTOWN 2;
define SNOOZEVILLE 3;

cluster = @CLUSTER;
node @SPRINGFIELD admin 
 conninfo = 'dbname=movies host=springfield user=slony';
node @BOOMTOWN admin 
 conninfo = 'dbname=movies host=boomtown user=slony';
node @SNOOZEVILLE admin 
 conninfo = 'dbname=movies host=snoozeville user=slony';

A define statement assigns a symbolic name to a string. The first define command, for example, creates a symbol named CLUSTER whose value is branches. When slonik sees @symbol in a script, it substitutes the value of the symbol. The symbol value may include spaces and can also include references to other symbols.

At first glance, symbolic names may not seem to offer much, but as your slonik scripts become more complex, you'll find that symbolic names make it easier to keep track of node numbers, set numbers, status codes, and so on. In fact, when you combine symbolic names with include files, your slonik scripts become much shorter and much more maintainable. Rather than writing the same cluster and node directives in every slonik script, you can create a separate preamble file and include that file in the other scripts. For the remainder of this chapter, I'll assume that you are using a version of Slony that supports symbolic names and include files. If you're using an older version, I'd encourage you to upgrade.

The first command in the preamble tells slonik the name of the cluster. (slonik needs the cluster name so that it can refer to the proper schema within each database.) The cluster name directive is followed by a series of node declarations. Each node declaration assigns a unique numeric identifier to the node and tells slonik how to connect to the database. Since slonik is a client application that uses the libpq library to connect to each database, you must provide the connection information in the form of a libpq-style connection string. The conninfo string typically specifies a database name, a hostname, a username, and occasionally a port number, but you can include any of the connection properties described in Table 5.2. If you want to consolidate all of your connection properties into a single location, you might consider creating a pg_service.conf file and including the service names in your slonik script instead. For example, if you have a pg_service.conf file that contains the following entries:

[springfield-slonik]
dbname=movies
host=springfield
user=slony

[boomtown-slonik]
dbname=movies
host=boomtown
user=slony

[snoozeville-slonik]
dbname=movies
host=snoozeville
user=slony

You could write a slonik preamble like this:

# File: preamble
cluster name = branches;
node 1 admin conninfo = 'service=springfield-slonik';
node 2 admin conninfo = 'service=boomtown-slonik';
node 3 admin conninfo = 'service=snoozeville-slonik';
...

Slony Password Management

You may have noticed that the connection strings shown here do not include a password. Although you could include a password in node declarations, it's not a good idea to do so since the passwords would be stored in plain-text form. Instead, you should use one of the more secure authentication methods described in Chapter 23, "Security." The TRUST authentication method is undoubtedly the easiest method to configure (no passwords involved), but that's not particularly secure when you have more than one host in the replication cluster. The md5 authentication method is a reasonably secure choice, but you still have to store your passwords somewhere. Instead of embedding a plain-text password in a script, you should store PostgreSQL passwords in $HOME/.pgpass.

When a libpq application (like slonik or slon) connects to a database that requires a password, it searches for the password in the connection string. If the connection string does not contain that password (and, in general, it should not), it searches for the password in a file named $HOME/.pgpass (note the period at the beginning of the filename). If the client application is running on a Windows host, libpq searches for the password in a file named postgresqlpgpass.conf in the Application Data directory assigned to your account (the %APPDATA% environment variable points to your Application Data directory).

For example, if you are logged in to your computer as user bruce, libpq reads ~bruce/.pgpass (which is just another way of writing the .pgpass file found in bruce's home directory). Each entry in the .pgpass file contains a hostname, port, database name, username, and password (in plain-text form). libpq searches through .pgpass until it finds a match on the first four fields (an "*" matches any value for that field). If user bruce has created a .pgpass file that contains the following entries:

# format - hostname:port:database:username:password
# an '*' matches any value in that field
boomtown:*:movies:bruce:cows
boomtown:*:movies:slony:cows-slony

He can log in to the movies database on host boomtown (any port). Since .pgpass contains two entries that match host boomtown (any port) and database movies, bruce can log in as user bruce or as user slony. Either of the following psql commands will work without prompting bruce for a password:

[bruce@springfield ~] psql -h boomtown -U bruce movies
[bruce@springfield ~] psql -h boomtown -U slony movies

It's important to remember that libpq reads the .pgpass file found in the $HOME directory of the user running the client application, even if you connect to a different account in the target database. That means that, given the slony.preamble (or pg_service.conf) file shown earlier, your .pgpass file must contain at least the following entries:

springfield:*:movies:slony:cow-slony
boomtown:*:movies:slony:emu-slony
snoozeville:*:movies:slony:penguin-slony

Instead of writing the same preamble in every slonik script, you can save the preamble in a separate file (say, preamble.sk) and then include<> that file in other scripts. The preamble file is a great place to define symbolic names for values that you'll use in other scripts. Listing 24.1 shows a revised preamble.sk file that we'll include<> in the rest of the scripts that you'll see in this chapter.

Listing 24.1. preamble.sk

# File: preamble.sk
define CLUSTER branches;
define SPRINGFIELD 1;
define BOOMTOWN 2;
define SNOOZEVILLE 3;
define fqn fully qualified name;
define SUCCESS 0;
define FAILURES 1;
cluster = @CLUSTER;
node @SPRINGFIELD admin conninfo = 'dbname=movies host=springfield user=slony';
node @BOOMTOWN admin conninfo = 'dbname=movies host=boomtown user=slony';
node @SNOOZEVILLE admin conninfo = 'dbname=movies host=snoozeville user=slony';

Once you have the preamble in place, you can add commands to the slonik script. Most commands are composed of a one, two, or three-word verb followed by a comma-separated list of options enclosed in a pair of parentheses.

Part I: General PostgreSQL Use

Introduction to PostgreSQL and SQL

Working with Data in PostgreSQL

PostgreSQL SQL Syntax and Use

Performance

Part II: Programming with PostgreSQL

Introduction to PostgreSQL Programming

Extending PostgreSQL

PL/pgSQL

The PostgreSQL C APIlibpq

A Simpler C APIlibpgeasy

The New PostgreSQL C++ APIlibpqxx

Embedding SQL Commands in C Programsecpg

Using PostgreSQL from an ODBC Client Application

Using PostgreSQL from a Java Client Application

Using PostgreSQL with Perl

Using PostgreSQL with PHP

Using PostgreSQL with Tcl and Tcl/Tk

Using PostgreSQL with Python

Npgsql: The .NET Data Provider

Other Useful Programming Tools

Part III: PostgreSQL Administration

Introduction to PostgreSQL Administration

PostgreSQL Administration

Internationalization and Localization

Security

Replicating PostgreSQL Data with Slony

Contributed Modules

Index