21.1. PostgreSQL Architecture

From a power-user standpoint, PostgreSQL is like any other database. The following terms mean essentially the same in PostgreSQL as they do in any other relational database:

Database
Table
Index
Row
Attribute
Extent
Partition
Transaction

21.1.1. Clusters

A PostgreSQL cluster is analogous to an instance in other RDBMSs, and each cluster works with one or more databases. It is not typical to have more than one cluster per database server, but it is possible and is done in some cases. For example, consider a shared hosting environment with different sets of users that should not be given administrative access to each other's data. Since user information is global within a cluster, you would need to create multiple clusters to have multiple groups of administrators administering separate databases without giving them the ability to administer another group's database. The pg_ctl command is used to start the cluster.
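For example, assuming a cluster whose data directory is /usr/local/pgsql/data (an illustrative path; yours will differ), pg_ctl starts and stops it like this:

$ pg_ctl -D /usr/local/pgsql/data -l logfile start
$ pg_ctl -D /usr/local/pgsql/data stop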

21.1.2. Tablespace

On one hand, a tablespace in PostgreSQL is the same as any other tablespace, as defined in Chapter 15; it is the space into which you put tables and other objects. As in other database systems, a PostgreSQL tablespace also consists of multiple pagefiles or datafiles. It's how those pagefiles are created that makes PostgreSQL different.

When you create a tablespace in PostgreSQL, you do not specify a list of one or more datafiles; you specify the name of the directory in which you want to store the tablespace. For example, the default tablespace is created within the directory where you install the PostgreSQL cluster. Additional tablespaces might reside on an LVM volume or an external RAID volume for performance reasons. When you create the tablespace, PostgreSQL creates the first datafile/pagefile. As you create tables in the tablespace and fill them with data, that pagefile fills up. Once the pagefile reaches 1 GB in size, PostgreSQL creates another one, and this continues until the volume where the tablespace resides runs out of space. Additional tablespaces can be created at any time as space is needed, and tables can be moved between tablespaces with a simple alter table command.
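Here is a minimal sketch, using made-up names and paths, of creating a tablespace in a directory and then moving a table into it (the directory must already exist and be owned by the PostgreSQL system user):

postgres=# create tablespace raidspace location '/raid0/pgdata/raidspace';
postgres=# alter table orders set tablespace raidspace;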

Running out of space in a tablespace is a very bad thing: the backend of PostgreSQL crashes, and any open transactions are lost.


21.1.3. Pagefile/Datafile

As mentioned in the tablespace section, PostgreSQL pagefiles/datafiles are similar to datafiles in other databases, with one exception: they are not created by the DBA and then assigned to a tablespace. They are created automatically by PostgreSQL as the tablespace needs more room for data.
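On disk, these files appear inside the tablespace directory as numerically named files that grow to 1 GB and then continue as .1, .2, and so on. A representative listing (the object IDs, sizes, and dates are illustrative only) might look like this:

$ ls -l /usr/local/pgsql/data/base/16384
-rw-------  1 postgres postgres 1073741824 Jun  1 12:00 16412
-rw-------  1 postgres postgres  524288000 Jun  1 12:00 16412.1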

21.1.4. Startup Scripts

PostgreSQL is started using scripts that are typically custom to the environment. PostgreSQL does not have a global configuration file such as Oracle's oratab to keep track of all instances on a machine. Instances on a given machine can be discovered using a process list. On a Unix/Linux machine, ps -ef would show something like the following:

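The exact output varies by version and platform, but on a PostgreSQL 8.1-era system you would typically see one postmaster process per cluster plus several helper processes, along the lines of the following (the PIDs and times are illustrative only):

postgres  3492     1  0 09:14 ?  00:00:00 /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data
postgres  3495  3492  0 09:14 ?  00:00:00 postgres: writer process
postgres  3496  3492  0 09:14 ?  00:00:00 postgres: stats buffer process
postgres  3497  3496  0 09:14 ?  00:00:00 postgres: stats collector process

The -D argument on each postmaster command line identifies the data directory of that cluster, which is how you can tell the instances apart.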

21.1.5. System Tables

The system tables, which are stored in the default tablespace, can be queried for a list of all databases within the cluster, all tablespaces within the cluster, and so on. This information can then be used for backup purposes. The following command shows a list of databases:

postgres=# select datname from pg_database order by datname;
   datname
------------------
 calendar
 postgres
 systemvdm2
 template0
 template1
(5 rows)

You can also query the system tables for a list of tablespaces with this command:

postgres=# select spcname from pg_tablespace order by spcname;
  spcname
------------
 pg_default
 pg_global
(2 rows)
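In the 8.x releases, pg_tablespace also has a spclocation column that records the directory behind each user-created tablespace (it is empty for the built-in pg_default and pg_global tablespaces, which live under the cluster's data directory). Querying it is a simple way to find out which filesystems a backup needs to cover:

postgres=# select spcname, spclocation from pg_tablespace order by spcname;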

21.1.6. Large Objects

PostgreSQL has special datatypes for storing large objects such as image files and large text data, including bytea and text, as well as a separate large object (BLOB) facility. Depending on the configuration, large object data is stored in special files in the filesystem. If you're running PostgreSQL 8.1 or higher, such data is always backed up when using the pg_dump or pg_dumpall commands. Previous versions require you to specify the -b option to pg_dump to include this data in a backup.
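For example, on a pre-8.1 cluster you might dump a database named mydb (a hypothetical name) in the custom archive format with large objects included like this; on 8.1 and later the -b flag is no longer needed:

$ pg_dump -Fc -b mydb > mydb.dump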

21.1.7. Rollback Process

PostgreSQL handles the rollback process differently from other RDBMSs. Consider a complex transaction that consists of a single begin transaction statement, followed by hundreds of transactional commands and a single commit statement. Many RDBMS products would flush the changes to disk as they happen, even if the transaction isn't complete yet. The thought is that 99.9 percent of all transactions are committed anyway, so they may as well be flushed to disk now. The challenge is what happens if someone pulls the plug on the database: we have an incomplete transaction that is partially written to disk.

This is what the rollback log is for in other RDBMSs. Before changing a block on disk, the database stores the before image of that block in the rollback log. If the transaction commits, its before images are deleted from the rollback log; any blocks remaining in the rollback log therefore belong to incomplete transactions and are used during crash recovery to return the database to a consistent state.

PostgreSQL handles this differently. Only committed transactions are actually written to disk; uncommitted transactions make changes only to data that is in memory. Another big difference is that changes to tuples in PostgreSQL do not overwrite the previous version of the tuple. PostgreSQL writes a new tuple and marks the old one as inactive. If a transaction completes, its changes are flushed to disk at the next checkpoint, and the inactive tuple is marked for deletion by a vacuum process that happens later. If a transaction doesn't complete, it is rolled back by making the new tuple inactive and making the previous version of the tuple active again. This is why PostgreSQL does not need a separate rollback log.
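You can watch this tuple versioning through the hidden xmin and xmax system columns, which record the transaction IDs that created and expired each row version. A quick sketch, using a hypothetical accounts table:

postgres=# select xmin, xmax, * from accounts where id = 1;
postgres=# begin;
postgres=# update accounts set balance = balance - 100 where id = 1;
postgres=# select xmin, xmax, * from accounts where id = 1;  -- a new row version with a new xmin
postgres=# rollback;                                         -- the previous version simply becomes current again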

The vacuum process mentioned earlier should happen on a regular basis; it deletes from each table all tuples marked for deletion. The vacuum process is extremely important and must be performed. PostgreSQL uses a 32-bit transaction ID counter, so if you ever pass roughly 4 billion transactions without vacuuming, your database stops working, forcing you to vacuum before you can continue.

If you don't occasionally run the vacuum process, your database eventually stops functioning, and you lose any transactions in flight when that happens. Don't let this happen to you. The good news is that automated vacuuming has been built into PostgreSQL 8.1. It is disabled by default, however.
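A manual vacuum can be run from psql (vacuum analyze;) or from the shell with the vacuumdb wrapper; in 8.1 you can also enable the built-in autovacuum daemon in postgresql.conf, which additionally requires the row-level statistics collector. A sketch, to be adapted to your own configuration:

$ vacuumdb --all --analyze

# in postgresql.conf (PostgreSQL 8.1)
stats_start_collector = on
stats_row_level = on
autovacuum = on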


21.1.8. Write Ahead Log

The write ahead log, or WAL, is very similar to the transaction log in any other database. All changes are written to the log before they are written to the datafiles, allowing the database to redo those transactions during crash recovery. The WAL can also be used for point-in-time recovery by replaying transactions that have occurred since the last full backup.

A running PostgreSQL system creates a sequence of WAL records that are divided into WAL segment files, which are normally 16 MB each, although you can specify a different size when building PostgreSQL. These segment files are given numeric names that represent their position in the overall sequence of records. If WAL archiving is not enabled, PostgreSQL creates a few segment files and then recycles them by renaming older segment files with newer segment numbers. This retains enough records to support hot backups and crash recovery, but does not help if you want to perform point-in-time recovery.

WAL archiving copies the segment files to an alternate location before they are renamed, and it must be enabled in order to use the WAL for point-in-time recovery. You enable WAL archiving by specifying a value for the archive_command parameter on startup. As with other parameters, this can be specified in postgresql.conf or passed as an option to the postmaster command. If you specify the option in postgresql.conf, you should be able to tell the postmaster to reread the configuration file by issuing a pg_ctl reload; if that doesn't work, you need to stop and start the database. The string should be a series of commands designed to copy an individual file to an alternate location. In the string, any %p is replaced by the full path of the file to be archived, and any %f is replaced by the filename without its path. (Use %% if you want an actual % character in your command.) Here are some example commands from the PostgreSQL manual:

archive_command = 'cp -i %p /mnt/server/archivedir/%f'

This specification means that PostgreSQL copies each WAL segment file to /mnt/server/archivedir as it is completed. If you want to make sure you're not overwriting any existing files, the following should work on most Unix variants:

archive_command = 'test ! -f .../%f && cp %p .../%f'

It tests to see whether the file is present in the destination before it copies it. Obviously, these examples should be enhanced for use in your own environment.
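Once archive_command is set in postgresql.conf, you can ask the running postmaster to reread its configuration, as mentioned earlier (the data directory path here is only an illustration):

$ pg_ctl -D /usr/local/pgsql/data reload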

Alternatively, you can use a custom method for archiving and for other things as well. This example shows such a custom script:

archive_command = 'mycp.sh %p /mnt/server/archivedir/%f'

In addition to copying the files to the appropriate location, the mycp.sh script could also perform additional checks to suit your requirements (a sketch follows this list). For example, it could:

  • Compress or zip the WAL segment file.

  • Check that you are not overwriting another segment file.
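Here is a minimal sketch of what such a script might look like; mycp.sh is only a hypothetical name, and the error handling is deliberately simple. Note that the command run by archive_command must return a nonzero exit status when it fails so that PostgreSQL knows to hold onto the segment and retry:

#!/bin/sh
# mycp.sh -- called as: mycp.sh <full-path-to-segment> <destination-path>
SRC="$1"
DST="$2"

# Refuse to overwrite a segment that has already been archived.
[ -f "$DST.gz" ] && exit 1

# Compress the segment on its way into the archive directory.
gzip -c "$SRC" > "$DST.gz" || { rm -f "$DST.gz"; exit 1; }

exit 0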



