PostgreSQL Indexes | PostgreSQL Essential Reference

I l @ ve RuBoard

Essentially, indexes help database systems search tables more efficiently . The concept of indexes is widely supported among all the popular RDBMSs, and PostgreSQL is no exception.

By default, PostgreSQL has support for up to three types of indexes: B-Tree, R-Tree, and hash. During index creation, the specific type of index required can be specified. Each index type is best suited for a specific type of indexing.

A general rule of thumb when using indexes is to determine what queries your database is making consistent use of. Essentially, indexes should exist for every WHERE criteria in frequently used queries.

B-Tree Indexes

The B-Tree index is based on an implementation of the high-concurrency Lehman-Yao B-Trees. The B-Tree index is a fully dynamic index that does not need periodic optimization.

This is the default index of which PostgreSQL most often makes use. In fact, if the CREATE INDEX command is called with no specification of index types, a B-Tree index will be generated.

B-Tree indexes are used whenever the following list of comparison operators is employed:

 <, <=, =, =>, >

Currently, B-Tree indexes are the only provided indexes that support multicolumn indexing. Up to 16 columns can be aggregated into a B-Tree multicolumn index (although this limit can be altered at compile time).

R-Tree Indexes

R-Tree indexes are especially suited for fast optimization of geometric and/or spatial comparisons. R-Tree indexes are implementations of Antonin Guttman's quadratic splits . The R-Tree index is a fully dynamic index that does not need periodic optimization.

R-Tree indexes are preferentially searched whenever one of the following comparison operators is employed:

 <<, &<, &>, >>, @, ~=, &&

Hash Indexes

The hash index is a standard hash index that is an implementation of Litwin's linear hashing algorithms. This hash index is a fully dynamic index that does not need periodic optimization.

Hash indexes can be used whenever the = operator is employed in a comparison. However, there is no substantial evidence that a hash index is any faster than a B-Tree index, as implemented in PostgreSQL. Therefore, in most cases, it is preferable to use B-Tree for = comparisons.

Other Index Topics

There are also specific uses of indexes that should be talked about. Namely, they can also be used on the output of functions and in multicolumn situations.

Functional Indexes

Often it is desirable to index the results of a function if a particular query makes consistent use of it. For instance, if you were frequently querying a table for a MAX() value from a field, it would be beneficial to create a separate index that contains the output from that function:

 CREATE INDEX max_payroll_idx ON payroll (MAX(salary));

This will result in much faster optimization times when queries call WHERE MAX(salary)>n or other such selection criteria.

Functional indexes are not suited for use in multicolumn indexes.

Multicolumn Indexes

The B-Tree indexing scheme utilized by PostgreSQL can support multicolumn indexes up to 16 columns wide. (This is a compile-time option.) Most frequently, multicolumn indexes are used only when the AND operator is used within a query.

For instance:

 CREATE INDEX name_ssn_idx ON payroll (name, ssn);  SELECT * FROM payroll WHERE name='Bill' AND ssn='555-55-5555';

This would make use of the multicolumn name_ssn_idx . However, the following would not:

 SELECT * FROM payroll WHERE name='Bill' OR ssn='555-55-5555';

Multicolumn indexes generally should be used sparingly. Most often, single-column indexes are more efficient in terms of speed and storage size than are multicolumn indexes.

However, multicolumn indexes can be used with great effect to aggregate unique row keys for tables that contain a lot of similar information. This is generally used to enforce data integrity. For instance, suppose a table made use of the following fields:

 Age Height Name

Each individual field would prove to be difficult to enforce any unique constraints upon. After all, there will be many potential people who are 5'10" or who are 25 and so on. However, there will be decidedly fewer people who are 5'10", 25 years old, and named Bill Smith. This can be a good candidate to use a multicolumn index with unique constraints to enforce data integrity.

Primary Key versus Unique Key Indexes

A common cause of confusion is the fact that there are two key types that, on the surface, seem to perform the same function.

Both primary and unique keys make use of indexes that enforce rules requiring field values to be unique to the table. However, there are some important distinctions and finer differences between them:

Primary keys are mainly used to relate a field value to a specific row (OID) in a table. This is why primary keys can be used as relational keys when used in conjunction with foreign tables. Additionally, primary keys will not allow NULL values to be entered.
Unique keys do not relate a field value to a specific row; they just enforce a uniqueness clause on the specified column. Although this is useful for maintaining data integrity, it is not necessarily as useful for foreign table relations as primary keys are. Moreover, unique keys will generally allow NULL values to be inserted.
There is no functional difference between a UNIQUE NO NULL constraint and a primary key. The keyword PRIMARY KEY exists to serve as a mnemonic device to remind the user of the purpose for the index constraint.

Here's a basic example of the differences: Suppose you have two fields that are important in an employee table. One field is for the employee_id , which is assigned by the system, and the other is an SSN used by humans for data input and so on.

In this scenario, the employee_id should be designated as a primary key, and the SSN should be designated as a unique key.

I l @ ve RuBoard