6.1 Installing PostgreSQL on Unix | PHP and PostgreSQL. Advanced Web Programming2002

In general two ways of installing PostgreSQL are available. On the one hand, it is possible to install precompiled binaries that are available for the most widespread Unix systems. On the other hand, it is possible to install and compile the source code of PostgreSQL. Both methods are covered extensively in this section.

6.1.1 Installing Binaries

Installing binary packages is in most cases much more comfortable than installing the source code of PostgreSQL because it can be done much faster and it's difficult to do something wrong. Binaries are available for all major Linux distributions such as Red Hat and Mandrake.

In this section you will learn to deal with PostgreSQL binaries on Linux Debian and Linux Red Hat.

6.1.1.1 Installing and Removing RPM Packages

Installing RPM packages is an easy task. Download the latest RPM packages from a PostgreSQL mirror near you and install them with a simple shell command:

 [root@router postgres]# rpm -Uvh *rpm Preparing...                ########################################### [100%]    1:postgresql-docs        ########################################### [  7%]    2:postgresql-jdbc        ########################################### [ 15%]    3:postgresql-libs        ########################################### [ 23%]    4:postgresql             ########################################### [ 30%]    5:postgresql-contrib     ########################################### [ 38%]    6:postgresql-devel       ########################################### [ 46%]    7:postgresql-odbc        ########################################### [ 53%]    8:postgresql-perl        ########################################### [ 61%]    9:postgresql-python      ########################################### [ 69%]   10:postgresql-server      ########################################### [ 76%]   11:postgresql-tcl         ########################################### [ 84%]   12:postgresql-test        ########################################### [ 92%]   13:postgresql-tk          ########################################### [100%]

-U makes sure that all packages are updated if an older version of PostgreSQL is already installed. v and h can be used to generate easy-to-read output. On some systems a package called mx might be missing and therefore rpm fails. In this case --nodeps can be added to the RPM command so that no dependencies will be checked. If the mx package is missing, this won't affect PostgreSQL.

Some Linux systems such as SuSE Linux provide tools like Yast to install and remove binary packages. These tools have a graphical user interface and can easily be used. However, on all RPM-based systems, PostgreSQL can also be installed using a fixed set of shell commands.

After packages have been installed, it might be necessary to remove them from the system. Removing packages can also be done using RPM. If you want to find out which PostgreSQL packages are installed on your system, use the following command:

 [root@router /root]# rpm -qva | grep postgresql | sed -e 's/-7.*//gi' postgresql-docs postgresql-devel postgresql-tk postgresql-tcl postgresql-contrib postgresql-server postgresql postgresql-libs postgresql-jdbc postgresql-python postgresql-perl postgresql-odbc postgresql-test

rpm -qva returns a list of all packages installed on the system. From this list you can easily extract all packages related to PostgreSQL by piping the list to the grep command. Now that you have all PostgreSQL packages, you can easily remove the version number of the packages using good old sed. The remaining list can now be used by PRM to remove the packages. With the help of the following command, all PostgreSQL packages are removed:

 [root@router /root]# rpm -ev `rpm -qva | grep postgresql | sed -e 's/-7.*//gi' ` --nodeps error: cannot remove /var/lib/pgsql/data - directory not empty error: cannot remove /var/lib/pgsql - directory not empty error: cannot remove /usr/share/pgsql - directory not empty error: cannot remove /usr/lib/pgsql - directory not empty error: cannot remove /usr/share/man/manl - directory not empty error: cannot remove /usr/lib/perl5/site_perl/5.6.0/i386-linux/auto/Pg  directory not empty

The command generating the list of PostgreSQL packages installed on the system is added to rpm -ev --nodeps. With the help of command-line substitution, the list of files will be passed to rpm -ev --nodeps. --nodeps makes sure that RPM does not check for dependencies. This can be useful when substituting packages. If you want to install qmail instead of sendmail, for instance, it is useful to uninstall sendmail using the --nodeps flag because otherwise you'd have to uninstall half of the system to fulfill all dependencies before installing qmail.

As you can see, some errors are displayed on the screen. PostgreSQL does not delete the directory where databases are located because this might lead to trouble. Therefore error messages are displayed to tell the user that some components still reside on the system.

6.1.1.2 Installing and Removing Debian Packages

Debian is not based on RPM because it has its own package managing system. Installing and removing Debian packages can easily be done using dselect, which has a curses-based user interface.

If you want to install PostgreSQL using a shell command, you can use dpkg.

If recent Debian packages are not available, you can generate them with the help of alien. Alien is a package for converting Unix packages from one format to another. With alien, it is an easy task to generate Debian packages out of RPM packages.

To download alien, check out http://kitenet.net/programs/alien.

6.1.1.3 Starting PostgreSQL

Now that PostgreSQL has been installed successfully, it is time to start the database daemon so that the server can be accessed.

If you have installed the binary packages, this can easily be done with the help of init scripts, which are included in PostgreSQL's binary distribution. In the case of Red Hat Linux, PostgreSQL can easily be started as follows:

 [root@router /root]# /etc/rc.d/init.d/postgresql start Initializing database:                                     [  OK  ] Starting postgresql service:                               [  OK  ]

The script checks whether the server has been started the first time and takes appropriate action. Now PostgreSQL can be accessed from the local host.

If you are not using Red Hat Linux, the init scripts might be located in a different directory. In the case of Debian Linux, the init scripts can be found in /etc/init.d.

6.1.2 Installing the Source Code

When you want to run the latest version of PostgreSQL, it is useful to install the source code of PostgreSQL instead of the binary distribution. The advantage of installing the source code is that you know perfectly well what you have done and which flags have been set. In addition, it gives you great flexibility and you can add your own extensions to PostgreSQL. If there are no binaries available, you have to install the source code.

Installing the source code is not difficult and can easily be done.

6.1.2.1 Compiling PostgreSQL

The following procedure works for Unix-based operating systems. The first thing to do is to download the latest version of PostgreSQL on your hard disk.

After you've completed the download, extract the tar archive using a simple shell command:

 [root@athlon postgres]# tar xvfz postgresql-7.2.tar.gz

A directory containing the source code will be generated. In the next step, PostgreSQL must be configured. Enter the directory and run configure to see which flags can be set:

 [root@athlon postgresql-7.2]# ./configure --help Usage: configure [options] [host] Options: [defaults in brackets after descriptions] Configuration:   --cache-file=FILE       cache test results in FILE   --help                  print this message   --no-create             do not create output files   --quiet, --silent       do not print `checking...' messages   --version               print the version of autoconf that created configure Directory and file names:   --prefix=PREFIX         install architecture-independent files in PREFIX                           [/usr/local/pgsql]   --exec-prefix=EPREFIX   install architecture-dependent files in EPREFIX                           [same as prefix]   --bindir=DIR            user executables in DIR [EPREFIX/bin]   --sbindir=DIR           system admin executables in DIR [EPREFIX/sbin]   --libexecdir=DIR        program executables in DIR [EPREFIX/libexec]   --datadir=DIR           read-only architecture-independent data in DIR                           [PREFIX/share]   --sysconfdir=DIR        read-only single-machine data in DIR [PREFIX/etc]   --sharedstatedir=DIR    modifiable architecture-independent data in DIR                           [PREFIX/com]   --localstatedir=DIR     modifiable single-machine data in DIR [PREFIX/var]   --libdir=DIR            object code libraries in DIR [EPREFIX/lib]   --includedir=DIR        C header files in DIR [PREFIX/include]   --oldincludedir=DIR     C header files for non-gcc in DIR [/usr/include]   --docdir=DIR           doc documentation in DIR [PREFIX/doc]   --mandir=DIR            man documentation in DIR [PREFIX/man]   --srcdir=DIR            find the sources in DIR [configure dir or ..]   --program-prefix=PREFIX prepend PREFIX to installed program names   --program-suffix=SUFFIX append SUFFIX to installed program names   --program-transform-name=PROGRAM                           run sed PROGRAM on installed program names Host type:   --build=BUILD           configure for building on BUILD [BUILD=HOST]   --host=HOST             configure for HOST [guessed]   --target=TARGET         configure for TARGET [TARGET=HOST] Features and packages:   --disable-FEATURE       do not include FEATURE (same as --enable-FEATURE=no)   --enable-FEATURE[=ARG]  include FEATURE [ARG=yes]   --with-PACKAGE[=ARG]    use PACKAGE [ARG=yes]   --without-PACKAGE       do not use PACKAGE (same as --with-PACKAGE=no)   --x-includes=DIR        X include files are in DIR   --x-libraries=DIR       X library files are in DIR --enable and --with options recognized:   --with-includes=DIRS    look for additional header files in DIRS   --with-libraries=DIRS   look for additional libraries in DIRS   --with-libs=DIRS        alternative spelling of --with-libraries   --enable-locale         enable locale support   --enable-recode         enable character set recode support   --enable-multibyte      enable multibyte character support   --enable-nls[=LANGUAGES]     enable Native Language Support   --with-pgport=PORTNUM   change default port number [5432]   --with-maxbackends=N    set default maximum number of connections [32]   --disable-shared        do not build shared libraries   --disable-rpath         do not embed shared library search path in executables   --enable-debug          build with debugging symbols (-g)   --enable-depend         turn on automatic dependency tracking   --enable-cassert        enable assertion checks (for debugging)   --with-tcl              build Tcl and Tk interfaces   --without-tk            do not build Tk interfaces if Tcl is enabled   --with-tclconfig=DIR    tclConfig.sh and tkConfig.sh are in DIR   --with-tkconfig=DIR     tkConfig.sh is in DIR   --with-perl             build Perl interface and PL/Perl   --with-python           build Python interface module   --with-java             build JDBC interface and Java tools   --with-krb4[=DIR]       build with Kerberos 4 support [/usr/athena]   --with-krb5[=DIR]       build with Kerberos 5 support [/usr/athena]   --with-krb-srvnam=NAME  name of the service principal in Kerberos [postgres]   --with-pam[=DIR]        build with PAM support [/usr]   --with-openssl[=DIR]    build with OpenSSL support [/usr/local/ssl]   --enable-odbc           build the ODBC driver package   --with-unixodbc         build ODBC driver for unixODBC   --with-iodbc            build ODBC driver for iODBC   --with-odbcinst=DIR     default directory for odbcinst.ini [sysconfdir]   --with-CXX              build C++ modules (libpq++)   --with-gnu-ld           assume the C compiler uses GNU ld [default=no]   --enable-syslog         enable logging to syslog

As you can see, the list of flags seems to be endless and in most cases, you will need only a few of these flags. In many cases you will need additional libraries in order to make PostgreSQL compile properly. Let's have a look at the most important flags you will need to compile PostgreSQL (this is not a complete reference).

--prefix=PREFIX This flag defines where PostgreSQL's binaries should be installed. Usually this flag is set to define the root directory of PostgreSQL.
--datadir=DIR This flag defines where the templates of your database will be installed. This is not the directory where your database will be located this is the directory where the files used to create databases will be stored.
--enable-locale Local support is very often needed when working with non-English data. This flag will cause a small decrease in performance.
--enable-recode This flag can be used to recode single-byte characters. For users of the Cyrillic language set, this will be an essential flag, but it can also be used for other character sets.
--enable-multibyte For users of languages like Japanese, Korean, and Chinese, this flag is required because one character cannot be stored in one byte.
--enable-nls[=LANGUAGES] Every language has its specific characters and symbols. With the help of this flag, it is possible to use languages other than English.
--with-pgport=NUMBER Usually port 5432 (TCP) is used as the default port. If you want to use another port as the default value, this flag has to be set.
--with-CXX If you need the C++ interface to PostgreSQL, this flag has to be enabled.
--with-perl Builds the Perl modules and installs them in the place where Perl modules are usually installed on your system.
--with-python To build the Python modules, this flag has to be enabled.
--with-tcl Builds the TCL/Tk components of PostgreSQL.
--without-tk This flag will cause PostgreSQL not to build the Tk components of the TCL/Tk modules. If you are working with console applications, this might be useful for you.
--enable-odbc Builds the ODBC driver.
--with-iodbc Builds the ODBC driver for the iODBC driver instead of the unixODBC driver.
--with-unixodbc Builds ODBC for the unixODBC driver.
--with-krb5=DIRECTORY Builds Kerberos 5 support for PostgreSQL. Only one version of Kerberos can be enabled at a time (either 4 or 5).
--with-openssl=DIRECTORY To use PostgreSQL with SSL, this flag must be enabled and the path to the OpenSSL installation must be defined.
--with-java To build Java support, ant must be installed on your system; otherwise, compiling the sources will fail.
--enable-syslog This flag enables syslog, but it does not mean that syslog has to be used for logging.

In addition, compiler flags can be passed to configure using environment variables:

CC and CXX Defines a C or C++ compiler. Using these flags, it is possible to select a C compiler that would not be used by configure.
CFLAGS and CXXFLAGS Both variables can be used to pass flags to the C and C++ compiler. In case of Mac OS X, -flat-namespace should be used.

After you have seen which flags are accepted by configure, it is time to configure PostgreSQL. Therefore it is useful to write a short shell script, which in this example is called compile.sh. This is not necessary, but it will help you to remember how the current version of PostgreSQL has been compiled:

 #!/bin/sh env CFLAGS=' -march=athlon -O3 ' ./configure --prefix=/usr/local/postgresql \         --enable-locale --enable-recode --enable-multibyte --with-tcl \         --with-perl --with-python --with-openssl=/usr/share/ssl \         --enable-odbc --with-unixodbc --with-CXX make make install

Before you can start compile.sh, it is necessary to create the directory where the binaries will be installed:

 [root@athlon postgresql-7.2]# mkdir /usr/local/postgresql

As you can see, the name of the directory created is the same as you have defined in compile.sh.

Two flags are passed to configure via the environment variable CFLAGS. -march=athlon tells GCC to optimize the code for Athlon CPUs, and -O3 tells GCC that functions can be substituted by inline code, which in some cases leads to faster code. If you are not using AMD CPUs as we have in this example, you can optimize the code for any other CPU. For information about other hardware architectures and optimization, check out the man pages of the compiler you are planning to use. Keep in mind that these flags passed to configure are optional and need not be defined.

In addition to --prefix, multibyte support and some flags related to languages are also enabled. Interfaces to Tcl, Perl, Python, and C++ will also be built. To compile SSL, you can see that the directory of the local SSL installation has been defined. The directory in the script works for Red Hat 7.1 if you want to build SSL support for other distributions, check out the documentation of the software you are using. For building ODBC, two flags have been set. First ODBC is enabled. The second flag tells that the interface should be built for unixODBC, which is included in Red Hat 7.1. In general two ODBC drivers for Unix are available. In addition to unixODBC, iODBC can be used. Depending on the Linux distribution you are using, the driver installed on your system may vary.

Now that the server has been configured, make is used to compile the code. With the help of make install, the binaries are copied to the desired location.

6.1.2.2 Creating a Database Cluster

Before you can start PostgreSQL, it is necessary to create a set of databases, which in PostgreSQL is called a database cluster. A database cluster contains a set of database templates plus the database you will define. To create a database cluster, it is first necessary to create a user for running PostgreSQL. Usually this user is called postgres. To create a new user, you can use useradd:

 [root@athlon postgresql-7.2]# useradd postgres

If no error is displayed, the user has successfully been added to the system. Now it is time to assign a password to the new user:

 [root@athlon postgresql-7.2]# passwd postgres Changing password for user postgres New Unix password: Retype new Unix password: passwd: all authentication tokens updated successfully

The password has to be passed to the system twice so that the operating system can make sure that everything is correct.

In the next step you can create a directory where the database cluster should be located. This can easily be done by using mkdir. Make sure that the owner and the group of the directory are changed to the right user after you have created it:

 [root@athlon postgresql-7.2]# mkdir /var/pgdb [root@athlon postgresql-7.2]# chown postgres.postgres /var/pgdb/

Now the bin directory of PostgreSQL has to be added to the path so that the shell can find the executables. This can be done in /etc/profile by adding the appropriate directory to $PATH.

After you've performed all the steps described in this section, a database cluster can finally be generated using initdb:

 bash-2.04$ initdb -L /usr/local/postgresql/share/ -D /var/pgdb/ The files belonging to this database system will be owned by user "postgres". This user must also own the server process. Fixing permissions on existing directory /var/pgdb/... ok creating directory /var/pgdb//base... ok creating directory /var/pgdb//global... ok creating directory /var/pgdb//pg_xlog... ok creating directory /var/pgdb//pg_clog... ok creating template1 database in /var/pgdb//base/1... ok creating configuration files... ok initializing pg_shadow... ok enabling unlimited row size for system tables... ok creating system views... ok loading pg_description... ok vacuuming database template1... ok copying template1 to template0... ok Success. You can now start the database server using:     /usr/local/postgresql/bin/postmaster -D /var/pgdb/ or    /usr/local/postgresql/bin/pg_ctl -D /var/pgdb/ -l logfile start

If no error was displayed on the screen, a new database cluster has been added to the system.

6.1.2.3 Starting and Stopping PostgreSQL

If a database cluster has been created, you can start the database now. Starting the database means that a daemon called postmaster has to be started. This can be done in two ways: The postmaster can be started directly, or the postmaster can be started by pg_ctl.

Let's have a look at the command-line parameters the postmaster accepts:

 bash-2.04$ postmaster --help postmaster is the PostgreSQL server. Usage:   postmaster [options...] Options:   -B NBUFFERS     number of shared buffers (default 64)   -c NAME=VALUE   set run-time parameter   -d 1-5          debugging level   -D DATADIR      database directory   -F              turn fsync off   -h HOSTNAME     host name or IP address to listen on   -i              enable TCP/IP connections   -k DIRECTORY    Unix-domain socket location   -l              enable SSL connections   -N MAX-CONNECT  maximum number of allowed connections (default 32)   -o OPTIONS      pass 'OPTIONS' to each backend server   -p PORT         port number to listen on (default 5432)   -S              silent mode (start in background without logging output) Developer options:   -n              do not reinitialize shared memory after abnormal exit   -s              send SIGSTOP to all backend servers if one dies Please read the documentation for the complete list of run-time configuration settings and how to set them on the command line or in the configuration file. Report bugs to <pgsql-bugs@postgresql.org>.

As you can see, the postmaster accepts many parameters and many things can be defined when starting the daemon. One of the most important flags is the -D flag, which defines the database cluster the postmaster should work on. Another flag you should not forget about is the -i flag. If this flag is not enabled, PostgreSQL will not accept TCP/IP connections. Without -i, PostgreSQL is only capable of accepting local connections. Many people forget about this and have problems connecting PostgreSQL to the Internet.

Normally the postmaster is not started directly. In most cases it is started using pg_ctl. Using pg_ctl has many advantages because it allows you to start and shut down PostgreSQL comfortably and in a secure way. Let's have a look at the syntax overview of pg_ctl:

 bash-2.04$ pg_ctl --help pg_ctl is a utility to start, stop, restart, reload configuration files, or report the status of a PostgreSQL server. Usage:   pg_ctl start   [-w] [-D DATADIR] [-s] [-l FILENAME] [-o "OPTIONS"]   pg_ctl stop    [-W] [-D DATADIR] [-s] [-m SHUTDOWN-MODE]   pg_ctl restart [-w] [-D DATADIR] [-s] [-m SHUTDOWN-MODE] [-o "OPTIONS"]   pg_ctl reload  [-D DATADIR] [-s]   pg_ctl status  [-D DATADIR] Common options:   -D DATADIR            Location of the database storage area   -s                    Only print errors, no informational messages   -w                    Wait until operation completes   -W                    Do not wait until operation completes (The default is to wait for shutdown, but not for start or restart.) If the -D option is omitted, the environment variable PGDATA is used. Options for start or restart:   -l FILENAME           Write (or append) server log to FILENAME.  The                         use of this option is highly recommended.   -o OPTIONS            Command line options to pass to the postmaster                         (PostgreSQL server executable)   -p PATH-TO-POSTMASTER Normally not necessary Options for stop or restart:   -m SHUTDOWN-MODE      May be 'smart', 'fast', or 'immediate' Shutdown modes are:   smart                 Quit after all clients have disconnected   fast                  Quit directly, with proper shutdown   immediate             Quit without complete shutdown; will lead                         to recovery run on restart Report bugs to <pgsql-bugs@postgresql.org>.

pg_ctl accepts many parameters as well. Keep in mind that pg_ctl does nothing except start the postmaster. Therefore -D must also be defined to tell the postmaster which database cluster should be used. If you want to create a logfile containing all information returned by the database, server -l has to be used. Writing the logging information into a file is highly recommended because it allows you to keep track of what is going on inside your database. In some cases it might also help you to debug your applications. -o can be used to pass parameters to the backend processes, which are responsible for handling connections to the database. This is very important because it allows you to configure the database to your needs while it is running. One parameter that has to be passed to the postmaster is the -i flag. As you can see, pg_ctl does not provide a -i flag itself, so it has to be sent to the postmaster using -o.

Let's start the postmaster with the help of pg_ctl:

 bash-2.04$ pg_ctl -D /var/pgdb/ -l /tmp/postgresql.log -o "-i" start postmaster successfully started

-D tells PostgreSQL to use the database cluster you have created before. The logging information will be stored in /tmp/postgresql.log. All values defined by -o are passed to the postmaster process. In this example, you want -i to be passed to the back end.

To see if PostgreSQL has been started successfully, you can use ps in combination with grep. ps ax lists all processes running on the system, and grep post makes sure that only those processes belonging to PostgreSQL are displayed:

 bash-2.04$ ps ax | grep post 29459 pts/2    S      0:00 su - postgres 29546 pts/2    SW     0:00 su postgres 29882 pts/2    S      0:00 /usr/local/postgresql/bin/postmaster -i 29883 pts/2    S      0:00 postgres: stats buffer process 29885 pts/2    S      0:00 postgres: stats collector process 29888 pts/2    S      0:00 grep post

As you can see, three processes are running. Those of you who are running PostgreSQL < 7.2 will see that only one process is running.

To stop PostgreSQL again, you can use pg_ctl as well:

 bash-2.04$ pg_ctl -D /var/pgdb/ stop waiting for postmaster to shut down......done postmaster successfully shut down

Because many postmaster processes can run simultaneously, it is necessary to define the location of the database the postmaster you want to shut down is operating on.

If you want to run many postmasters at once, every postmaster needs its own database cluster to work with it is not possible to start two postmasters working on just one cluster. You also have to make sure that every postmaster has its own TCP port to listen to. Otherwise PostgreSQL is not able to find out which request is related to which postmaster process.

As you can see, PostgreSQL provides several methods for shutting down. Depending on how fast the database should quit, the user can decide which method is best to use. With the help of the -m flag, you can choose what kind of shutdown you consider to be useful.