Chapter 21. PostgreSQL
As discussed in Chapter 15, any relational database that is to be trusted with production data should be ACID-compliant. PostgreSQL is the most popular
|
21.1. PostgreSQL Architecture
From a power-
21.1.1. Clusters
A PostgreSQL
cluster
is analogous to an instance in other RDBMSs, and each cluster works with one or more databases. It is not typical to have more than one cluster per database server, but it is possible and is done in some cases. For example, consider a shared hosting environment with different sets of users that should not be given administrative access to each others' data. Since user information is global within a cluster, you would need to create multiple instances to have multiple groups of administrators administering separate databases without giving them the ability to administer another
21.1.2. TablespaceOn one hand, a tablespace in PostgreSQL is the same as any other tablespace, as defined in Chapter 15; it is the space into which you put tables and other objects. As in other database systems, a PostgreSQL tablespace also consists of multiple pagefiles or datafiles . It's how those pagefiles are created that makes PostgreSQL different.
When you create a tablespace in PostgreSQL, you do not specify a list of one or more datafiles; you specify the
21.1.3. Pagefile/DatafileAs mentioned in the tablespace section, PostgreSQL pagefiles/datafiles are similar to datafiles in other databases with one exception. They are not created by the DBA and then assigned to a tablespace. They are created automatically by PostgreSQL as the tablespace needs more room for data. 21.1.4. Startup Scripts
PostgreSQL is started using scripts that are typically custom to the environment. PostgreSQL does not have a global configuration file such as Oracle's
oratab
to keep track of all instances on a machine. Instances on a given machine can be
Need a process list from a PostgreSQL machine 21.1.5. System TablesThe system tables, which are stored in the default tablespace, can be queried for a list of all databases within the cluster, tablespaces within a database, and so on. This information can then be used for backup purposes. The following command shows a list of databases: postgres=# select datname from pg_database order by datname; datname ------------------ calendar postgres systemvdm2 template0 template1 (5 rows) You can also query the system tables for a list of tablespaces with this command: postgres=# select spcname from pg_tablespace order by spcname; spcname ------------ pg_default pg_global (2 rows) 21.1.6. Large Objects
PostgreSQL has special datatypes for storing large objects such as image files and large text data. They include ByteA, BLOB and Text. Depending on the configuration, LOB data is stored in special files in the filesystem. If you're running PostgreSQL 8.1 or higher, such data is always
21.1.7. Rollback ProcessPostgreSQL handles the rollback process differently from other RDBMSs. Consider a complex transaction that consists of a single begin transaction statement, followed by hundreds of transactional commands and a single commit statement. Many RDBMS products would flush the changes to disk as they happen, even if the transaction isn't complete yet. The thought is that 99.9 percent of all transactions are committed anyway, so they may as well be flushed to disk now. The challenge is what happens if someone pulls the plug on the database: we have an incomplete transaction that is partially written to disk. This is what the rollback log is for in other RDBMSs. Before changing a block on disk, the before image of that block is stored in the rollback log. If the transaction is committed, its before images are deleted from the rollback log. That means that any blocks in the rollback log are used during crash recovery to return the database to a consistent state.
PostgreSQL handles this differently. Only committed transactions are actually written to disk; uncommitted transactions make changes only to data that is in memory. Another big difference is that changes to tuples in PostgreSQL do not overwrite the previous version of the tuple. PostgreSQL
This vacuum process happens on a regular basis and deletes from the table all tuples marked for deletion. The vacuum process is extremely important and must be performed. If you ever hit over 4 billion transactions without vacuuming, your database stops working, forcing you to vacuum before you can continue.
21.1.8. Write Ahead Log
The write ahead log, or WAL, is very similar to the transaction log in any other database. All transactions are written to the log before they are allowed to be written to disk, allowing the database to redo those transactions in case of crash recovery. The WAL can also be used for point-in-time recovery by
A running PostgreSQL system creates a sequence of WAL records that are divided into WAL segment files, which are normally 16 MB each, although you can specify a different system when building PostgreSQL. These segment files are given numeric
WAL archiving copies the segment files to an alternate location before they are
$ archive_command = 'cp -i %p /mnt/server/archivedir/%f' This specification would mean that PostgreSQL would copy each WAL segment file to /mnt/server/archivedir as it is completed. If you want to make sure you're not overwriting any existing files, the following should work on most Unix variants: $ archive_command = 'test ! -f .../%f && cp %p .../%f' It tests to see whether the file is present in the destination before it copies it. Obviously, these examples should be enhanced for use in your own environment. Alternatively, you can use a custom method for archiving and for other things as well. This example shows such a custom script: $ archive_command = 'mycp.sh %p /mnt/server/archivedir/%f' In addition to copying the files to the appropriate location, the mycp.sh script could also do additional checking to suit your requirements. For example:
|