One problem that you will most likely encounter in your database life is the need to generate unique identifiers. We've already seen one example of this in the customers tablethe customer_id column is nothing more than a unique identifier. Sometimes, an entity that you want to store in your database will have a naturally unique identifier. For example, if you are designing a database to track employee information (in the United States), a Social Security number might make a good identifier. Of course, if you employ people who are not U.S. citizens, the Social Security number scheme will fail. If you are tracking information about automobiles, you might be tempted to use the license plate number as a unique identifier. That would work fine until you needed to track autos in more than one state. The VIN (or Vehicle Identification Number) is a naturally unique identifier.
Quite often, you will need to store information about an entity that has no naturally unique ID. In those cases, you are likely to simply assign a unique number to each entity. After you have decided to create a uniquifier[10], the next problem is coming up with a sequence of unique numbers.
[10] I'm not sure that "uniquifier" is a real word, but I've used it for quite some time and it sure is a lot easier to say than "disambiguator."
PostgreSQL offers help in the form of a SEQUENCE. A SEQUENCE is an object that automatically generates sequence numbers. You can create as many SEQUENCE objects as you like: Each SEQUENCE has a unique name.
Let's create a new SEQUENCE that you can use to generate unique identifiers for rows in your customers table. You already have a few customers, so start the sequence numbers at 10:
movies=# CREATE SEQUENCE customer_id_seq START 10; CREATE
The "ds " command (in psql) shows you a list of the SEQUENCE objects in your database:
movies=# ds List of relations Name | Type | Owner -----------------+----------+------ customer_id_seq | sequence | korry (1 row)
Now, let's try using this SEQUENCE. PostgreSQL provides a number of functions that you can call to make use of a SEQUENCE. The one that you are most interested in at the moment is the nextval() function. When you call the nextval() function, you provide (in the form of a string) the name of the SEQUENCE as the only argument.
For example, when you INSERT a new row in the customers table, you want PostgreSQL to automatically assign a unique customer_id :
movies=# INSERT INTO movies-# customers( customer_id, customer_name ) movies-# VALUES movies-# ( movies-# nextval( 'customer_id_seq' ), 'John Gomez' movies-# ); movies=# SELECT * FROM customers WHERE customer_name = 'John Gomez'; customer_id | customer_name | phone | birth_date | balance -------------+---------------+-------+------------+-------- 10 | John Gomez | | | (1 row)
You can see that the SEQUENCE (customer_id_seq) generated a new customer_id, starting with the value that you requested. You can use the currval() function to find the value that was just generated by your server process:
movies=# SELECT currval( 'customer_id_seq' ); currval --------- 10
The complete syntax for the CREATE SEQUENCE command is
CREATE SEQUENCE name [ INCREMENT increment ] [ MINVALUE min ] [ MAXVALUE max ] [ START start_value ] [ CACHE cache_count ] [ CYCLE ]
Notice that the only required item is the name.
The INCREMENT attribute determines the amount added to generate a new sequence number. This value can be positive or negative, but not zero. Positive values cause the sequence numbers to increase in value as they are generated (that is, 0, 1, 2, and so on). Negative values cause the sequence numbers to decrease in value (that is, 3, 2, 1, 0, and so on).
The MINVALUE and MAXVALUE attributes control the minimum and maximum values (respectively) for the SEQUENCE.
What happens when a SEQUENCE has reached the end of its valid range? You get to decide: If you include the CYCLE attribute, the SEQUENCE will wrap around. For example, if you create a cyclical SEQUENCE with MINVALUE 0 and MAXVALUE 3, you will retrieve the following sequence numbers: 0, 1, 2, 3, 0, 1, 2, 3, .... If you don't include the CYCLE attribute, you will see: 0, 1, 2, 3, error: reached MAXVALUE.
The START attribute determines the first sequence number generated by a SEQUENCE. The value for the START attribute must be within the MINVALUE and MAXVALUE range.
The default values for most of the SEQUENCE attributes depend on whether the INCREMENT is positive or negative. The default value for the INCREMENT attribute is 1. If you specify a negative INCREMENT, the MINVALUE defaults to -2147483647, and MAXVALUE defaults to -1. If you specify a positive INCREMENT, MINVALUE defaults to 1, and MAXVALUE defaults to 2147483647. The default value for the START attribute is also dependent on the sign of the INCREMENT. A positive INCREMENT defaults the START value to the MINVALUE attribute. A negative INCREMENT defaults the START value to the MAXVALUE attribute.
Remember, these are the defaultsyou can choose any meaningful combination of values that you like (within the valid range of a BIGINT).
The default SEQUENCE attributes are summarized in Table 2.29.
Attribute Name |
Default Value |
---|---|
INCREMENT |
1 |
MINVALUE |
-INCREMENT > 0 ? 1_INCREMENT < 0 ? -2147483647 |
MAXVALUE |
INCREMENT > 0 ? 2147483647 INCREMENT < 0 ? -1 |
START |
INCREMENT > 0 ? MINVALUE INCREMENT < 0 ? MAXVALUE |
CACHE |
1 |
CYCLE |
False |
The CACHE attribute is a performance-tuning parameter; it determines how many sequence numbers are generated and held in memory. In most cases, you can simply use the default value (1). If you suspect that sequence number generation is a bottleneck in your application, you might consider increasing the CACHE attribute, but be sure to read the warning in the PostgreSQL documentation (see the CREATE SEQUENCE section).
You can view the attributes of a SEQUENCE by treating it as a table and selecting from it[11]:
[11] There are four other columns in a SEQUENCE, but they hold bookkeeping information required to properly maintain the SEQUENCE.
movies=# SELECT movies-# increment_by, max_value, min_value, cache_value, is_cycled movies-# FROM movies-# customer_id_seq; increment_by | max_value | min_value | cache_value | is_cycled --------------+-----------+-----------+-------------+----------- 1 | 3 | 0 | 1 | f
PostgreSQL provides three functions that work with SEQUENCE s. I described the nextval() and currval() functions earlier; nextval() generates (and returns) a new value from a SEQUENCE, and currval() retrieves the most recently generated value. You can reset a SEQUENCE to any value between MINVALUE and MAXVALUE by calling the setval() function. For example:
movies=# SELECT nextval( 'customer_id_seq' ); ERROR: customer_id_seq.nextval: reached MAXVALUE (3) movies=# SELECT setval( 'customer_id_seq', 0 ); setval -------- 0 (1 row) movies=# SELECT nextval( 'customer_id_seq' ); nextval --------- 1
Now that you know how SEQUENCE s work in PostgreSQL, let's revisit the SERIAL data type. I mentioned earlier in this chapter that a SERIAL is really implemented as a SEQUENCE (see the "SERIAL, BIGSERIAL, and SEQUENCES" sidebar). Remember that a SERIAL provides an automatically increasing (or decreasing) unique identifier. That sounds just like a SEQUENCE, so what's the difference? A SEQUENCE is a standalone object, whereas SERIAL is a data type that you can assign to a column.
Let's create a new table that contains a SERIAL column:
movies=# CREATE TABLE serial_test ( pkey SERIAL, payload INTEGER ); NOTICE: CREATE TABLE will create implicit sequence 'serial_test_pkey_seq' for SERIAL column 'serial_test.pkey' NOTICE: CREATE TABLE/UNIQUE will create implicit index 'serial_test_pkey_key' for table 'serial_test' CREATE
The CREATE TABLE command is normally silent. When you create a table with a SERIAL column, PostgreSQL does a little extra work on your behalf. First, PostgreSQL creates a SEQUENCE for you. The name of the SEQUENCE is based on the name of the table and the name of the column. In this case, the SEQUENCE is named serial_test_pkey_seq. Next, PostgreSQL creates a unique index. We haven't really talked about indexes yet: for now, know that a unique index on the pkey column ensures that you have no duplicate values in that column. PostgreSQL performs one more nicety for you when you create a SERIAL column. The d command (in psql) shows you this last step:
movies=# d serial_test Table "public.serial_test" Attribute | Type | Modifier -----------+---------+------------------------------------------------- pkey | integer | not null default nextval('serial_test_pkey_seq') payload | integer | Index: serial_test_pkey_key
PostgreSQL has created a default value for the pkey column. A column's default value is used whenever you insert a row but omit a value for that column. For example, if you execute the command INSERT INTO serial_test( payload ) VALUES( 24307 );, you have not provided an explicit value for the pkey column. In this case, PostgreSQL evaluates the default value for pkey and inserts the resulting value. Because the default value for pkey is a call to the nextval() function, each new row is assigned a new (unique) sequence number.
Part I: General PostgreSQL Use
Introduction to PostgreSQL and SQL
Working with Data in PostgreSQL
PostgreSQL SQL Syntax and Use
Performance
Part II: Programming with PostgreSQL
Introduction to PostgreSQL Programming
Extending PostgreSQL
PL/pgSQL
The PostgreSQL C APIlibpq
A Simpler C APIlibpgeasy
The New PostgreSQL C++ APIlibpqxx
Embedding SQL Commands in C Programsecpg
Using PostgreSQL from an ODBC Client Application
Using PostgreSQL from a Java Client Application
Using PostgreSQL with Perl
Using PostgreSQL with PHP
Using PostgreSQL with Tcl and Tcl/Tk
Using PostgreSQL with Python
Npgsql: The .NET Data Provider
Other Useful Programming Tools
Part III: PostgreSQL Administration
Introduction to PostgreSQL Administration
PostgreSQL Administration
Internationalization and Localization
Security
Replicating PostgreSQL Data with Slony
Contributed Modules
Index