How to Configure PostgreSQL
This chapter is dedicated to all database administrators and to those who want to become PostgreSQL administrators. Database administration can be either a simple or a difficult task, depending on the demands your database has to face and the level of security you have to provide.
PostgreSQL is indeed a database system that can easily be administered as long as the database administrator (DBA) understands the basic concepts of databases, backup strategies, and security issues. Security demands are always increasing and it is necessary to have a high standard for security, even if your database is only used in a protected environment. But let's get to some practical stuff.
Configuring the
Postmaster
The heart of every PostgreSQL database server is the so called
"postmaster,
the supervisor daemon of your database. The postmaster has a lot of parameters that can be configured to make your database run better and faster.
By using
postmaster --help
we can get a little help about working with the postmaster.
On Unix systems, the postmaster is usually started when the default runlevel starts its processes, but it can also be started manually. No matter how the postmaster is started, you can use the flags just shown. Let's take a closer look at some flags.
-
-B
Defines the number of shared memory buffers PostgreSQL uses for the backend processes. One shared memory buffer is usually 8KB, but the
size
can be redefined in
src/include/config.h
(see
BLCKSZ
).
-
-D
Specifies the directory PostgreSQL will use for the data.
-D
defines the root directory of a cluster of databases. If the database directories are not defined via the shell, PostgreSQL will use the environment variable
$PGDATA
instead. If you want to set this variable, check out your Unix manual and look for the
export
command (Bourne Shell).
-
-N
Sets the maximum number of backend processes that are allowed to run on the machine. The default value is set to 32 but can also be changed in
src/include/config.h
.
-
-S
Starts the postmaster in silent mode, which means that all messages the postmaster normally produces ”no matter whether on standard output or standard error ”are redirected to
/dev/null
. This is extremely bad for troubleshooting because you will loose important information.
-
-d
Defines the debugging level your PostgreSQL server will use. The debugging level can be set from 1 to 5. The higher the debugging level is, the more information will be displayed. In most cases, it is useful to redirect the output to a file because this is the only way to get a complete history of what is going on with your PostgreSQL server.
-
-i
Allows remote users to connect to your PostgreSQL server. Without
-i
set, only local Unix domain socket connections can be established.
-
-l
Provides secure connections in combination with the
-i
flag. This works only when SSL has been enabled at compile time.
-
-o
Passes options to the backend processes. This can be very useful when tuning or debugging your server. Many users consider
-o
more comfortable than editing files.
-
-p
Defines the port your server will use. If no port is defined, PostgreSQL will use the value of the environment variable
$PGPORT
. Normally, PostgreSQL uses 5432 as the default port (the value that is usually compiled into the binaries).
Note
All changes that are done in
src/include/config.h
require a recompile of PostgreSQL.
If you want to start PostgreSQL using an
init
script, the parameters passed to the postmaster have to be added to the
init
scripts in
/etc/rc.d/init.d
for users of RedHat Linux or
/etc/init.d
for users of Debian Linux. Just go through the shell script and add the desired flags to the section where
pg_ctl
is started.
init
scripts are very hardware dependent, so there is more than just one way of how things can be done.
PostgreSQL 7.1 ”The
pg_options
File
PostgreSQL databases version 7.1 use a file called
pg_options
to configure runtime parameters. The
pg_options
file is used to configure a variety of parameters. It can usually be found in
$PGDATA/data
. All values defined in
pg_options
can be changed at runtime. You don't have to restart PostgreSQL because the parameters in
pg_options
are re-read when a new backend starts, becase everytime the backend is started a connection to the PostgreSQL database is established.
This feature is extremely useful because it avoids unpleasant downtimes.
Note
Queries that are already running are not affected by the changes.
The syntax of the
pg_options
file is extremely simple, a shown in the following:
verbose=2
query=2
syslog=0
First
verbose
is set to two which makes PostgreSQL display some runtime information on the screen. The seond line
tells
PostgreSQL to print information about queries. Syslog is turned off so that all information is sent to
stdout
and
stderr
. Adding an option can easily be done by adding it to the file. The following is a complete list of all currently supported options:
-
all
all
is the global trace flag. It can be set to
(trace messages enabled individually),
1
(enable all trace messages), or
-1
(disable all trace messages). Tracing means that the internals of the database can be
monitored
.
-
verbose
The
verbose
flag is used to define the amount of message the database provides.
means that no messages are displayed.
1
provides more messages, and
2
displays as many messages as possible.
-
query
This flag is used to define the way queries are traced.
means that no query is displayed,
1
writes
one line per query, and
4
prints the full query.
-
plan
Prints the query plan.
-
parse
Prints the output of the parser.
-
rewritten
Prints rewritten queries.
-
pretty_plan
Pretty-print query plan.
-
pretty_parse
Prints pretty-print parser output.
-
pretty_rewritten
Prints pretty-print rewritten queries.
-
parsestats
Prints the statistics of the parser.
-
plannerstats
Prints the statistics of the planner.
-
executorstats
Prints the statistics of the executor.
-
locks
Used to trace locking.
-
userlocks
Used to trace
user
locks.
-
spinlocks
Used to trace spin locks.
-
notify
Used to trace notify functions.
-
lock_debug_oidmin
Minimum relation
oid
traced by locks.
-
lock_debug_relid
oid
, if not zero, of relation traced by locks.
-
deadlock_timeout
Deadlock check timer.
-
syslog
Defines the behavior of
syslog
.
prints all messages to standard output or standard error,
1
prints all messages to standard output, standard error, and
syslog
.
2
sends the messages to
syslog
only.
-
hostlookup
Enables hostname lookups in
ps_status
.
-
showportnumber
Shows port
numbers
in
ps_status
.
-
nofsync
Disables
fsync
on a per-backend basis.
The
nofsync
flag may be
especially
interesting for you, and it is worth going into a little more into detail.
int fsync(int fd)
can be found in the
unistd.h
C library and is used to flush all data to disk. This is useful because it protects your data from being lost when the machine crashes. But
fsync
can also cost a lot of performance when dealing with huge amounts of data. We will focus on the performance aspect of
fsync
in Chapter 8, called "Performance Tuning."
PostgreSQL 7.1 ”
postgreSQL.conf
Since PostgreSQL 7.1, all runtime parameters can be found in
postgresql.conf
. Like its predecessor, the file can be found in
$PGDATA/data
. Let's take a look at the default
config
file:
#
# PostgreSQL configuration file
# -----------------------------
#
# This file consists of lines of the form
#
# name = value
#
# (The `=' is optional.) White space is collapsed, comments are
# introduced by `#' anywhere on a line. The complete list of option
# names and allowed values can be found in the PostgreSQL
# documentation. Examples are:
#log_connections = on
#fsync = off
#max_connections = 64
# Any option can also be given as a command line switch to the
# postmaster, e.g., 'postmaster -c log_connections=on'. Some options
# can be set at run-time with the 'SET' SQL command.
#========================================================================
#
# Connection Parameters
#
#tcpip_socket = false
#ssl = false
#max_connections = 32 # 1-1024
#port = 5432
#hostname_lookup = false
#show_source_port = false
#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777
#virtual_host = ''
#krb_server_keyfile = ''
#
# Performance
#
#sort_mem = 512
#shared_buffers = 2*max_connections # min 16
#fsync = true
#
# Optimizer Parameters
#
#enable_seqscan = true
#enable_indexscan = true
#enable_tidscan = true
#enable_sort = true
#enable_nestloop = true
#enable_mergejoin = true
#enable_hashjoin = true
#ksqo = false
#geqo = true
#effective_cache_size = 1000 # default in 8k pages
#random_page_cost = 4
#cpu_tuple_cost = 0.01
#cpu_index_tuple_cost = 0.001
#cpu_operator_cost = 0.0025
#geqo_selection_bias = 2.0 # range 1.5-2.0
#
# GEQO Optimizer Parameters
#
#geqo_threshold = 11
#geqo_pool_size = 0 #default based in tables, range 128-1024
#geqo_effort = 1
#geqo_generations = 0
#geqo_random_seed = -1 # auto-compute seed
#
# Inheritance
#
#sql_inheritance = true
#
# Deadlock
#
#deadlock_timeout = 1000
#
# Expression Depth Limitation
#
#max_expr_depth = 10000 # min 10
#
# Write-ahead log (WAL)
#
#wal_buffers = 8 # min 4
#wal_files = 0 # range 0-64
#wal_sync_method = fsync # fsync or fdatasync or open_sync or open_datasync
# Note: default wal_sync_method varies across platforms
#wal_debug = 0 # range 0-16
#commit_delay = 0 # range 0-100000
#commit_siblings = 5 # range 1-1000
#checkpoint_segments = 3 # in logfile segments (16MB each), min 1
#checkpoint_timeout = 300 # in seconds, range 30-3600
#
# Debug display
#
#silent_mode = false
#log_connections = false
#log_timestamp = false
#log_pid = false
#debug_level = 0 # range 0-16
#debug_print_query = false
#debug_print_parse = false
#debug_print_rewritten = false
#debug_print_plan = false
#debug_pretty_print = false
#ifdef USE_ASSERT_CHECKING
#debug_assertions = true
#endif
#
# Syslog
#
#ifdef ENABLE_SYSLOG
#syslog = 0 # range 0-2
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'
#endif
#
# Statistics
#
#show_parser_stats = false
#show_planner_stats = false
#show_executor_stats = false
#show_query_stats = false
#ifdef BTREE_BUILD_STATS
#show_btree_build_stats = false
#endif
#
# Lock Tracing
#
#trace_notify = false
#ifdef LOCK_DEBUG
#trace_locks = false
#trace_userlocks = false
#trace_spinlocks = false
#debug_deadlocks = false
#trace_lock_oidmin = 16384
#trace_lock_table = 0
#endif
We will take a look at the most important parameters that affect your daily work with PostgreSQL.
-
log_connections
Prints one line for every connection established to the server.
-
fsync
Tells the database whether to
turn
fsync()
on or off. If
fsync()
is on, the database makes sure that the data is physically written to disk.
-
max_connections
By default, PostgreSQL allows 32 simultaneous connections. This value can be changed by setting
max_connections
to the desired value. By default, there is a compile limit of 1024 connections, which can only be changed by recompiling the server.
-
tcpip_socket
If this flag is turned on, PostgreSQL accepts TCP/IP connections and not only Unix domain socket connections.
-
ssl
Enables secure connections via SSL.
-
port
Defines the TCP-port to which the server is listening. This value can only be set when the server starts; you can't change it for individual backend processes.
-
hostname_lookup
If you want hostnames instead of IP addresses to be logged,
hostname_lookup
has to be turned on. Don't forget that DNS lookups can take some time.
-
show_source_port
With this flag enabled, the database shows the outgoing port of the host connecting to PostgreSQL.
-
unix_socket_directory
Defines the directory of the Unix domain socket the postmaster is listening to for connections from client applications. The default value is usually set to
/tmp
.
-
unix_socket_group
Defines the group owner of the Unix domain socket. By default, it is an empty string, which means that the default
group
of the current user is taken.
-
unix_socket_permissions
The value must be defined by using the default Unix syntax for setting file system permissions. For example,
0777
means that everybody can connect.
-
virtual_host
Works pretty much the same like Apache's
virtual_host
directly. By default, PostgreSQL listens to all hostnames assigned to the machine.
-
krb_server_keyfile
Defines the location of the Kerberos key file.
-
sort_mem
If PostgreSQL has to sort data, it is usually done in memory as long as the size of the
sort
buffer is not exceeded. If that happens, the database uses temporary files for sorting. Compared to sorting in memory, this is ghastly slow, but if the amount of data that has to be
processed
is extremely huge, it cannot be done in any other way. Keep in mind that if many concurrent sorts are running, the amount of memory defined by
sort_mem
is used multiple times. The size of the sort buffer is defined in kilobytes.
-
shared_buffers
Sets the shared memory buffers the database will use. A buffer is usually 8KB; the default value is set to 32.
-
enable_seqscan
If you want to avoid the planner using sequential
scans
, you can unset the
enable_seqscan
flag. It is not possible to turn sequential scans off completely, but their use can be reduced to the minimum.
-
enable_indexscan
Defines whether the database use indexscans.
-
enable_tidscan
Defines whether the planner uses TID scan plan types.
-
enable_sort
If you want to influence the number of sort operations, you can unset the
enable_sort
flag. Sorting can be turned off completely because the planner cannot always find a workaround.
-
enable_nestloop
Enables the use of nested
loops
.
-
enable_mergejoin
Enables the use of merge joins.
-
enable_hashjoins
Enables the use of
has
joins.
-
ksqo
The Key Set Query Optimizer (KSQO) makes the planner convert queries with many ANDs and ORs in the
WHERE
clause to queries using
UNION
. This is usually done when working with Microsoft Access.
-
geqo
Defines whether generic query optimization is used. Generic query optimization means that query planning is done without exhaustive searching. The database uses
on
by default.
-
effective_cache_size
Defines the value the optimizer assumes for the size of the disk cache.
-
random_page_cost
In many cases, PostgreSQL cannot do sequential reading. Jumping from block to block takes longer than sequential reading.
random_page_cost
can be used to tell the query optimizer how much longer random reading takes (for example, 4 times). This is a floating point value and allows very fine tuning.
-
cpu_tuple_cost
Defines the value the optimizer will assume to process a tuple. It is measured as a fraction of the cost of a sequential page fetch.
-
cpu_index_tuple_cost
Similar to
cpu_tuple_cost
but sets the time for processing a tuple using an index.
-
cpu_operator_cost
Sets the time to process an operator in a
WHERE
clause. It is also measured as a fraction of the cost of a sequential scan.
-
sql_inheritance
Defines the behavior of the database when dealing with inheritance. As described in the section called "Inheritance" in Chapter 8), PostgreSQL treats inherited tables slightly different. If you find the old way better, set this value to
off
.
-
deadlock_timeout
In a few cases, the database has to face a deadlock.
deadlock_timeout
defines the time to wait when waiting for a lock before checking whether the conditions for a deadlock are fulfilled. The time is measured in
milliseconds
.
-
max_expr_depth
Tells the database how many expressions can be nested within each other. If this is set to a very high value, the backend can crash because of buffer overflows.
-
wal_buffers
Defines the number of disk page buffers stored in shared memory for WAL log. WAL is short for Write Ahead Log.
-
wal_files
PostgreSQL creates log files in advance.
wal_files
defines how many of these files are created.
-
silent_mode
This has actually the same effect as the
-S
flag of the postmaster. No standard error and no standard output is produced.
-
log_timestamp
Adds a timestamp to each entry in the logfile. The default value is set to
off
, but we strongly recommend that you turn it
on
, because it allows you to find out what kind of action was performed by the database.
-
log_pid
The process ID can be added to every entry in the logfile by using the
log_pid
flag.
-
debug_level
The higher this value is, the more debugging information will be displayed. The default value is set to
. During development, it is useful to turn the debug level up and then down again for production.
-
syslog
syslog
is a powerful logging tool supported by all current Linux distributions. If this value is set to
,
syslog
is turned off completely.
1
means that messages are written to standard output and
syslog
. If you use
2
,
syslog
only is used.
postgresql.conf
is simply a wonderful thing. All major configuration
tasks
can be done within just one file, and this makes PostgreSQL administration even more simple.
|