The next step in securing a PostgreSQL installation is determining which computers are allowed to access your data.
PostgreSQL uses the $PGDATA/pg_hba.conf file to control client access (hba is an acronym for host-based authentication). When a client application (such as psql) TRies to connect to a PostgreSQL server, it sends a username and database name to the postmaster (I'll call those the target user and the target database). For example, if you run psql like this
$ psql -U bruce -d sales
the target user is bruce and the target database is sales.
When this request arrives, the postmaster searches through pg_hba.conf to find an entry that matches the target user, target database, client address, and connection type. Each entry in the pg_hba.conf file contains a connection type, a database name (or a set of database names), a username (or a set of usernames), and a client address. The postmaster will rarely find an entry that's an exact match. Instead, an entry may contain wildcards that match a number of different client requests. For example, the postmaster may find an entry that matches all users, or perhaps a group of related databases. The postmaster searches pg_hba.conf from the beginning and stops searching as soon as it finds an entry that matches all four fields (user, database, client, and connection type). If the postmaster reaches the end of pg_hba.conf without finding a match, it refuses the connection request.
If the postmaster finds an entry that matches the connection request, it uses that entry to decide how to authenticate the user. Each entry specifies an authentication method (such as TRust, reject, or password)the client application must authenticate itself (that is, prove its identity) using the method designated in the matching entry. For example, if the postmaster finds a matching pg_hba.conf enTRy that requires password authentication, the client application must supply the correct password. PostgreSQL supports a number of authentication methodseach method adds a different level of security to your system. In general, stronger authentication methods are more inconvenient than weaker authentications, but the stronger methods provide better protection for your data.
Let's start by looking at a short and simple pg_hba.conf file:
# Allow all local users to connect without providing passwords local all all trust # Allow users on our local network to connect to # database 'movies' if they have a valid password host movies 192.168.0.0 255.255.255.0 password
First, you should know that lines that begin with a # character are comments, and blank lines are ignored.
The first field in each record corresponds to a type of connection. PostgreSQL currently supports four types of connections:
If you're using an older version of PostgreSQL, the pg_hba.conf understands a slightly different set of connection types:
As the postmaster searches through pg_hba.conf, it compares the connection type field to the method used by the client. For example, if the client contacts the postmaster over an SSL secured TCP/IP connection, the postmaster will examine any entry that begins with hostssl or host the postmaster will ignore local and hostnossl entries.
The second field in each pg_hba.conf record identifies the set of databases controlled by the entry. The postmaster searches through pg_hba.conf to find an entry that matches the target database. You can include the name of a single database in this field, a comma-separated list of database names, or one of three special values. The string all controls access to all databases, and sameuser controls access to a database whose name is identical to the name of the user making the connection. The third special value is samegroup. When the server finds samegroup in the database field, it searches for a group whose name matches the target database. If such a group exists and the target user is a member of that group, the entry matches. If the group does not exist (or if the target user is not a member of the group), the server continues its search.
For example, if the postmaster encounters a samegroup entry when user bruce TRies to connect to a database named sales, the entry matches as long as bruce is a member of the group sales.
The third field in each entry identifies the set of users controlled by the entry. You can enter the name of a single user in this field, the name of a single group (prefix group names with a +), a comma-separated list of user and group names, or the string all (which matches any user).
If you're PostgreSQL version 8.0 or later, you can store database names and user/group names in separate files. Just prefix the name of the file with an @ and write that into the pg_hba.conf file in place of the database (or user) field, like this:
local @myDatabases.txt @myUsers.txt trust
The fourth field identifies the set of client addresses controlled by the entry. You cannot include a hostname in this fieldyou must specify clients by IP address. PostgreSQL prohibits hostnames to deter DNS spoofing attacks. You can specify a group of related hosts using a netmask (such as 192.168.0.0 255.255.255.0) or a CIDR-formatted address (such as 192.168.0.0/24).
The remainder of the pg_hba.conf entry depends on the connection type. I'll look at each one in turn.
local Connections
The format of a local enTRy is
local database user authentication-method [authentication-option]
A local enTRy matches any connection attempt that arrives on a Unix-domain socket.
You know that the database field contains the name of a database (or all, sameuser, or samegroup). The user field contains a list of user (and/or group) names. The client-address field is implied for local connection types (local means that the client resides on the same host as the postmaster). The authentication-method field determines what method you must use to prove your identity. I'll explain authentication methods and authentication options in a moment.
host, hostssl, and hostnossl Connections
The format of a host or hostssl record is
host database user ip-address authentication-method [ option ] hostnossl database user ip-address authentication-method [ option ] hostssl database user ip-address authentication-method [ option ]
The ip-address field specifies either a TCP/IP host or a TCP/IP network (by numeric address). You can specify the ip-address in netmask form or in CIDR form. In netmask form, you provide two pieces of information: an IP-address and a mask (with a space in between). The mask tells the postmaster how many bits in the ip-address are significant. For example, if the postmaster sees an address such as
192.168.0.0 255.255.255.0
the given entry matches any host in the 192.168.0.* network. To explicitly match a single computer, specify the exact IP address and mask of 255.255.255.255 (for example, 192.168.0.42 255.255.255.255).
In CIDR form, you provide an IP address and the number of significant bits (with a '/'in between). For example, if the postmaster sees an address such as the given entry matches any host in the 192.168.0.* network. To explicitly match a single computer, specify the exact IP address followed by the number of bits in the address (for example, 192.168.0.42/32 or, for an IPv6 address, fe80::290:4bff:fe94:7103/64).
192.168.0.0/24
If you try to connect to a postmaster and your client address does not match any of the pg_hba.conf records, the connection is rejected.
Now let's look at the authentication methods. Remember that you can specify a different authentication method for each host (or for each network). Some authentication methods are more secure than others, whereas some methods are more convenient than others.
The trust Authentication Method
When you use the trust authentication method, you allow any user on the client system to access your data. The client application does not have to provide any passwords (beyond what may be required to log in to the client system itself).
trust is the least secure of the authentication methodsit relies on the security of the client system.
You should never use trust to authenticate a connection attempt in an insecure network.
In most cases, you won't want to use the trust method to authenticate local connections. At first glance, it seems reasonable to trust the security on your own host; after all, I have to prove my identity to the operating system before I can start a client application. But the problem is not that I can fool the operating system; the problem is that I can impersonate another user. Consider the following scenario:
Welcome to arturo, please login... login: korry Password: cows Last login: Fri Jan 18 10:48:00 from marsalis [korry]$ psql -U sheila -d movies Welcome to psql, the PostgreSQL interactive terminal. movies=>
To log in to my host (arturo) as user korry, I am required to provide an operating system-authenticated password. But, if the movies database allows local connection attempts to be trusted, nothing stops me from impersonating another user (possibly gaining elevated privileges).
Given the security problems with trust, why would you ever want to use it? The TRust authentication method is useful on single-user machines (that is, systems with only one user authorized to log in). You may also use TRust to authenticate local connections on development or testing systems.
You never want to use TRust on a multiuser system that contains important data.
The ident Authentication Method
The ident authentication method (like TRust) relies on the client system to authenticate the user.
In the previous section, I showed you how easy it is to impersonate another user using the TRust authentication method. All I have to do to impersonate another user is use the -U flag when I fire up the psql client application.
ident tries to be a bit more secure. Let's pretend that I am currently logged in to host vivaldi as user korry, and I want to connect to a PostgreSQL server running on host arturo:
$ whoami korry $ psql -h arturo -d movies -U korry Welcome to psql, the PostgreSQL interactive terminal. movies=> select user; current_user korry
I'll walk through the authentication process for this connection.
First, my local copy of psql makes a TCP/IP connection to the postmaster process on host arturo and sends my username (korry). The postmaster (on arturo) connects back to the identd daemon on host vivaldi (remember, I am running psql on host vivaldi). The postmaster sends the psql-to-postmaster connection information to identd and identd replies with my username (also korry).
Now, the postmaster examines the pg_hba.conf record that matches my host. Assume that it finds the following:
host all 192.168.0.85 255.255.255.255 ident sameuser
The sameuser field tells the postmaster that if I am trying to connect using a name that matches the identd response, I am allowed to connect. (That might sound a little confusing at first. When you use the ident authentication method, the postmaster works with two different usernames: the name that I provided to the client application and the name returned by the identd daemon.)
Now let's see what happens when I try to impersonate another user. Recall from the previous section that I can fool the trust authentication method simply by lying about my username. It's a little harder to cheat with ident.
Let's say that I am logged in to host vivaldi as user sheila and I try to impersonate user korry. You can assume that because I am logged in to vivaldi, I have proven my identity to vivaldi by providing sheila's password.
$ whoami sheila $ psql -h arturo -d movies -U korry psql: IDENT authentication failed for user 'sheila'
As before, my local copy of psql makes a TCP/IP connection to the postmaster process on host arturo and sends the username that I provided on the command line (korry). The postmaster (on arturo) connects back to the identd daemon on host vivaldi. This time, the identd daemon returns my real username (sheila).
At this point, the postmaster (on arturo) is working with two usernames. I have logged in to the client (vivaldi) as user sheila but when I started psql, I specified a username of korry. Because my pg_hba.conf record specified sameuser, I can't connect with two different usernamesmy connection attempt is rejected.
Now that you've seen how the ident method provides a bit more security than trust, I'll show you a few more options that you can use with ident.
In the preceding examples, I used the sameuser option in my pg_hba.conf record. Instead of sameuser, I can specify the name of a map. A map corresponds to a set of entries in the $PGDATA/pg_ident.conf file. pg_ident.conf is a text file containing one record per line (as usual, blank lines and lines starting with a '#' character are ignored). Each record in the pg_ident.conf file contains three fields:
Here is an example:
# pg_ident.conf # #mapname ident-name pguser-name #----------- ------------- ----------- host-wynton Administrator bruce host-vivaldi Administrator sheila host-vivaldi sheila sheila host-vivaldi korry korry # pg_hba.conf # host all 192.168.0.85 255.255.255.255 ident host-vivaldi host all 192.168.0.22 255.255.255.255 ident host-wynton
You can see in this example that I have defined two ident maps: host-vivaldi and host-wynton. The pg_hba.conf file specifies that any connection attempts from host 192.168.0.85 should use the ident method with the host-vivaldi ident map; any connection attempts from host 192.168.0.22 should use the host-wynton map.
Now look at the pg_ident.conf filethere are three entries in the host-vivaldi map and one entry in the host-wynton map.
The host-wynton map says that if I am logged in to my client machine (192.168.0.22) as user Administrator, I can connect to a database as PostgreSQL user bruce.
The host-vivaldi map says that I can connect as PostgreSQL user sheila if I am logged in to my host as Administrator or if I am logged in as user sheila. Also, if I am logged in as korry, I can connect as PostgreSQL user korry.
So, why is the ident method insecure? Think back to the trust methodit is insecure because you trust the user to tell the truth about his or her identity. ident is insecure because you are trusting the client system. The network protocol used by the identd daemon is very simple and easy to impersonate. It's easy to set up a homegrown program to respond to identd queries with inaccurate usernames. In fact, I recently downloaded and installed an ident server on my Windows laptop, and one of the command-line options allowed me to specify a fake username!
I would recommend against using the ident authentication method except on closed networks (that is, networks where you control all the connected hosts).
The password Authentication Method
The password authentication method provides a reasonably high level of security compared to trust and ident. When you use password authentication, the client must prove its identity by providing a valid password.
On a Unix (or Linux) host, OS passwords are usually stored in /etc/passwd or /etc/shadow. When you log in to a Unix machine, you are prompted for your OS password, and the login program compares the password that you enter with the appropriate entry in the /etc/passwd file.
PostgreSQL authentication passwords are not related to the password that you use to log in to your operating system. The PostgreSQL server stores passwords in the pg_shadow system table (PostgreSQL servers older than version 7.3 could also store passwords in an external file). You use the CREATE USER or ALTER USER commands to maintain passwords. For example, to create a new (password-authenticated) user, you would use the following command:
CREATE USER bruce WITH PASSWORD 'cricketers';
If you want to change bruce's password, you would use the ALTER USER command:
ALTER USER bruce WITH PASSWORD 'Wooloomooloo';
The crypt Authentication Method
The crypt authentication method is nearly identical to password. The only difference between the two is that the password method sends your password over the network in plain-text form while crypt does not. Sending plain-text passwords across the network is always a bad ideait's just too easy to sniff them out of a network stream.
I mentioned in the last section that you could only use encrypted passwords with the md5 authentication method. So how does the crypt authentication method avoid sending plain-text passwords over the network?
When a client application wants to connect to a crypt-authenticating server, the server sends a random number (called a salt value) back to the client. After the client knows what salt value to use, it encrypts the password (entered by the user) with the salt and sends the result to the server. The server reads the plain-text password (stored in the pg_shadow table) and encrypts it with the same salt value. If the two encrypted passwords match, the client is successfully authenticated. The result is that passwords are stored in plain-text form, but the client sends encrypted passwords across the network.
The md5 Authentication Method
The third password-based authentication method is md5. With md5 authentication, you store passwords in the pg_shadow table in encrypted form. md5 authentication was not available prior to PostgreSQL release 7.2.
You create encrypted passwords using the CREATE USER and ALTER USER commands.
ALTER USER bruce WITH ENCRYPTED PASSWORD 'Wooloomooloo';
Note the keyword ENCRYPTED.
md5 is a cryptographically secure message digest algorithm developed by Ron L. Rivest of RSA Security. A message digest algorithm takes a plain-text message (in our case, a password) and produces a long number, called a hash or digest, based on the contents of the message. The md5 algorithm is carefully designed so that no two messages are likely to produce the same digest. It is nearly impossible to recover the original password given an md5 digest.
How can a message digest be used as a password? If you feed two passwords into the md5 algorithm, you will get the same digest value if the passwords are identical. When you create an encrypted password, the password itself is not actually stored in pg_shadow. Instead, PostgreSQL computes an md5 digest over the password and stores the digest. When a client attempts to connect using md5 authentication, the client computes an md5 digest over the password provided by the user and sends the digest to the server. The server compares the digest stored in pg_shadow with the digest provided by the client. If the two digests are identical, it is extremely likely that the passwords match.
There are a couple of security holes in the procedure that I just described. Let's say that bruce and sheila each happened to choose the same password. Two identical passwords will produce the same message digest. If bruce happened to notice that his password had the same message digest as sheila's, he would know that he and sheila had chosen the same password. To avoid this problem, PostgreSQL combines each password with the user's name before computing the md5 digest. That way, if two users happen to choose the same password, they won't have the same md5 digests. The second problem has to do with network security. If a client sent the same message digest to the server every time a given user logged in, the message digest would essentially function as a plain-text password. A nefarious user could watch the network traffic, capture the plain-text message digest, and impersonate the real user (by providing the same plain-text message digest). Instead, PostgreSQL uses the salt strategy that I described earlier. When a client connects to an md5 authenticating server, the server sends a random salt to the client. The client computes an md5 digest based on the user ID and password; this digest matches the digest stored in pg_shadow. The client then combines the salt (from the server) with the first md5 digest and computes a second digest. The second digest is sent to the server. The server combines the salt with the digest stored in pg_shadow and computes a new md5 digest. The server then compares the client's digest with its (salted) ownif the digests match, the passwords match.
The pam Authentication Method
The final password-based authentication method is pam (Pluggable Authentication Module). You've probably noticed that PostgreSQL offers many methods for authenticating a user. The authentication problem is not unique to PostgreSQLmany applications need to authenticate users. The goal of pam is to separate the act of authenticating a user from each and every application by placing authentication services in a framework that can be called by any application.
A system administrator can define different authentication methods for each application, depending on how secure the application needs to be. Using pam, an administrator can create a completely open system, requiring no passwords at all, or can choose to authenticate users using passwords, challenge-response protocols, or even more esoteric biometric authentication methods. PostgreSQL can use the pam framework.
Although pam can be ported to many Unix systems, it is most commonly found in Linux and Solaris. Configuring a pam system is not for the faint-of-heart, and the topic deserves an entire book. Because of space considerations, I won't try to describe how to configure a pam installation. Instead, I recommend that you visit the Linux-PAM web site (http://www.kernel.org/pub/linux/libs/pam/) for more information.
The krb4 and krb5 Authentication Methods
The krb4 and krb5 authentication methods correspond to Kerberos version 4 and Kerberos version 5, respectively. Kerberos is a network-secure authentication service developed at MIT.
Kerberos is a complex package (particularly from the administrator's point of view), but it offers a high level of security. After Kerberos is properly installed and configured, it is easy to use.
The easiest way to understand Kerberos is to compare it with a more traditional authentication method.
Let's say that you want to use telnet to connect to another host (bach) on your network. You start by issuing the telnet command:
$ telnet bach Trying bach... Connected to bach (192.168.0.56) Escape character is '^]'. login: korry Password: cows Last login: Thu Jan 24 19:18:44 $
After providing your username, the login program (on bach) asks for your password. Your password is compared with the password stored on bach (in the /etc/passwd or /etc/shadow file). If the password that you provide matches, you have proven your identity and bach permits access.
If you log out of bach and log back in, you must again provide your identity and prove that you are who you say you are.
Now let's see how you perform the same operation when using Kerberos.
With Kerberos, you don't have to prove your identity to each server; instead, you authenticate yourself to a trusted server. In this case, trusted means that both the client (that's you) and the server will trust the Kerberos authentication agent to verify that you are who you say you are.
Before you telnet using Kerberos, you must first obtain a ticket.
$ kinit Password for korry@movies.biz: cows
After you enter your password, the kinit program contacts the Kerberos authentication server (AS) and asks for a ticket. If your password was correct, the AS returns a chunk of data known as a TGT (ticket-granting ticket). The kinit program stores your TGT in a cache file inside of a temporary directory on your system.
At this point, you have proven your identity to the AS, and the AS has given back a certificate that you can use with servers that trust the AS. You can view your TGT using the klist command:
$ klist Ticket cache: /tmp/krb5cc_tty1 Default principal: korry@movies.biz Valid starting Expires Service principal 25 Jan 02 01:25:47 25 Jan 02 09:25:42 krbtgt/movies.bi@ movies.biz $
(Notice that the ticket expires in about eight hoursI have to occasionally reauthenticate myself to the AS.)
Now, you can use that TGT by using a Kerberos-enabled telnet client to connect to a Kerberos-enabled telnet server:
$ telnet -a bach Trying bach... Connected to bach (192.168.0.56) Escape character is '^]'. Last login: Thu Jan 24 19:18:44 $
There are two things that you should notice about this login example. First, I used the-a flag when I started telnetthat flag asks telnet to use Kerberos authentication. Second, I was not prompted for a user or for a password. Why not? The telnet client (on my local machine) used my TGT to ask the AS for another ticket, specifically a ticket that allows me to connect to the telnet server on bach. The AS sent the second ticket back to my local machine, and the new ticket was stored in my ticket cache. This new ticket is specific to telnet. My local telnet client sends the new ticket to the telnet server. The ticket contains enough secure (encrypted) information to satisfy the telnet server that I have proven my identity (specifically, I have proven my identity to the AS, and the telnet server trusts the AS).
I can view the new ticket with the klist command:
$ klist Ticket cache: /tmp/krb5cc_tty1 Default principal: korry@movies.biz Valid starting Expires Service principal 25 Jan 02 01:25:47 25 Jan 02 09:25:42 krbtgt/movies.biz@movies.biz 25 Jan 02 03:01:25 25 Jan 02 13:01:20 host/bach.movies.biz@movies.biz $
So, how does all this fit into PostgreSQL? PostgreSQL client applications (psql, for example) and the postmaster can be compiled to support Kerberos authentication.
When you specify the krb4- or krb5-authentication method, you are telling the postmaster that client applications must provide a valid Kerberos ticket.
When you connect to a krb4 or krb5 authenticated postmaster with a Kerberos-enabled client application, you are not required to supply a username or passwordinstead, the client application sends a Kerberos ticket to the postmaster.
The nice thing about Kerberos authentication is that it is secure and convenient at the same time. It is secure because you never send plain-text passwords over an insecure network. It is convenient because you authenticate yourself only once (using the kinit program).
As I mentioned earlier, setting up a Kerberos system is not a trivial project. After you have gone through the pain and mystery of installing and configuring Kerberos, you can configure PostgreSQL to use Kerberos to authenticate connection requests. Explaining how to install and configure would require a second book. If you are interested in using Kerberos authentication with PostgreSQL, I recommend you start by reading through the Kerberos web site: http://web.mit.edu/kerberos/www/index.html. The PostgreSQL Administrator's Guide provides the details you will need to connect a PostgreSQL database to an installed Kerberos system.
Kerberos is the second most secure authentication method.
The reject Authentication Method
The reject authentication method is the easiest to understand and is also the most secure. When a client tries to connect from a system authenticated by the reject method, the connection attempt is rejected.
If you try to connect from a system that does not match any of the pg_hba.conf records, you are also rejected.
Why might you want to use the reject method? Let's say that you have a reasonable amount of trust in most of the machines on your network, but you reserve one host as a demonstration machine (192.168.0.15). The demonstration machine should be allowed to access the demo database, but no other databases. Every other host should be allowed to access all databases (using Kerberos 5).
# File: pg_hba.conf # # Type Database Client IP address Netmask Method ###### ######## ################# ############### ###### host demo 192.168.0.15 255.255.255.255 trust host all 192.168.0.15 255.255.255.255 reject host all 192.168.0.0 255.255.255.0 krb5
Notice that there are two entries for the demo machine (192.168.0.15). The first entry allows trusted access to the demo database. The second entry rejects access to all other databases. This demonstrates an important point: The postmaster starts reading at the beginning of the pg_hba.conf file and stops as soon as it finds a record that matches on connection type, database name, and IP address/mask. When a user tries to connect to the demo database from the demo machine, the postmaster searches for a record of type host with a database of either demo, all, or sameuser (and of course, a match on the IP address/Netmask combination). The first record matches, so the postmaster allows access without requiring any form of authentication other than the IP address of the demo machine. Now suppose that a user (again on the demo machine) tries to connect to a different database (say, accounting). This time, the postmaster searches for a record of type host and a database of accounting, all, or sameuser. The first record no longer matches (wrong database name), so the postmaster moves on. The second record matches and the postmaster rejects the connection attempt. If a user logged in to a different host tries to connect, the postmaster will find the third record (the first two records won't match the IP address) and allow access using Kerberos 5 authentication.
If the postmaster can't find a record that matches a connection attempt, the connection is rejected, so you may be wondering why the reject method is needed.
Consider what would happen if you removed the second record from this file. If a user on the demo machine tries to connect to the accounting database, the postmaster will ignore the first record (wrong database) and move on to the last record. The last record says that anyone in our local network should be allowed to connect to all databases using Kerberos 5 authentication. That is clearly the wrong answer.
Part I: General PostgreSQL Use
Introduction to PostgreSQL and SQL
Working with Data in PostgreSQL
PostgreSQL SQL Syntax and Use
Performance
Part II: Programming with PostgreSQL
Introduction to PostgreSQL Programming
Extending PostgreSQL
PL/pgSQL
The PostgreSQL C APIlibpq
A Simpler C APIlibpgeasy
The New PostgreSQL C++ APIlibpqxx
Embedding SQL Commands in C Programsecpg
Using PostgreSQL from an ODBC Client Application
Using PostgreSQL from a Java Client Application
Using PostgreSQL with Perl
Using PostgreSQL with PHP
Using PostgreSQL with Tcl and Tcl/Tk
Using PostgreSQL with Python
Npgsql: The .NET Data Provider
Other Useful Programming Tools
Part III: PostgreSQL Administration
Introduction to PostgreSQL Administration
PostgreSQL Administration
Internationalization and Localization
Security
Replicating PostgreSQL Data with Slony
Contributed Modules
Index