Creating New Tables | PostgreSQL SQL Syntax and Use

The previous section described how to create and drop databases. Now let's move down one level in the PostgreSQL storage hierarchy and talk about creating and dropping tables.

You've created some simple tables in the first two chapters; it's time to talk about some of the more advanced features of the CREATE TABLE command. Here is the command that you used to create the customers table:

CREATE TABLE customers (
 customer_id INTEGER UNIQUE,
 customer_name VARCHAR(50),
 phone CHAR(8),
 birth_date DATE,
 balance DECIMAL(7,2)
);

This command creates a permanent table named customers. A table name must meet the naming criteria described earlier in this chapter. When you create a table, PostgreSQL automatically creates a new data type[2] with the same name as the table. This means that you can't create a table whose name is the same as an existing data type.

[2] This seems to be a holdover from earlier days. You can't actually do anything with this data type.

When you execute this command, the customers table is created in the database that you are connected to. If you are using PostgreSQL 7.3 or later, the customers table is created in the first schema in your search path. (If you are using a version older than 7.3, your copy of PostgreSQL does not support schemas). If you want the table to be created in some other schema, you can prefix the table name with the schema qualifier, for example:

CREATE TABLE joes_video.customers( ... );

The new table is owned by you. You can't give ownership to another user at the time you create the table, but you can change it later using the ALTER TABLE...OWNER TO command (described later).

When you create a table (or an index), you can tell PostgreSQL to store the object in a specific tablespace by including a TABLESPACE tablespacename clause, like this:

CREATE TABLE joes_video.customers( ... ) TABLESPACE mytablespace;

If you don't specify a tablespace, PostgreSQL creates the table in the tablespace assigned to the schema (if you're creating an index without specifying a tablespace, the index is created in the tablespace of the parent table).

Temporary Tables

I mentioned earlier that the customers table is a permanent table. You can also create temporary tables. A permanent table persists after you terminate your PostgreSQL session; a temporary table is automatically destroyed when your PostgreSQL session ends. Temporary tables are also local to your session, meaning that other PostgreSQL sessions can't see temporary tables that you create. Because temporary tables are local to each session, you don't have to worry about colliding with the name of a table created by another session.

If you create a temporary table with the same name as a permanent table, you are effectively hiding the permanent table. For example, let's create a temporary table that hides the permanent customers table:

CREATE TEMPORARY TABLE customers (
 customer_id INTEGER UNIQUE,
 customer_name VARCHAR(50),
 phone CHAR(8),
 birth_date DATE,
 balance DECIMAL(7,2)
);

Notice that the only difference between this command and the command that you used to create the permanent customers table is the TEMPORARY keyword[3]. Now you have two tables, each named customers. If you now SELECT from or INSERT into the customers table, you will be working with the temporary table. Prior to version 7.3, there was no way to get back to the permanent table except by dropping the temporary table:

[3] You can abbreviate TEMPORARY to TEMP.

movies=# SELECT * FROM customers;
 customer_id | customer_name | phone | birth_date | balance
-------------+----------------------+----------+------------+---------
 1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00
 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00
 3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00
 4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00
 8 | Wink Wankel | 555-1000 | 1988-12-25 | 0.00
(5 rows)

movies=# CREATE TEMPORARY TABLE customers
movies-# (
movies(# customer_id INTEGER UNIQUE,
movies(# customer_name VARCHAR(50),
movies(# phone CHAR(8),
movies(# birth_date DATE,
movies(# balance DECIMAL(7,2)
movies(# );
CREATE

movies=# SELECT * FROM customers;
 customer_id | customer_name | phone | birth_date | balance
-------------+----------------------+----------+------------+---------
(0 rows)

movies=# DROP TABLE customers;
DROP

movies=# SELECT * FROM customers;
 customer_id | customer_name | phone | birth_date | balance
-------------+----------------------+----------+------------+---------
 1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00
 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00
 3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00
 4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00
 8 | Wink Wankel | 555-1000 | 1988-12-25 | 0.00
(5 rows)

Starting with release 7.3, you can access the permanent table by including the name of the schema where the permanent table resides.

A temporary table is like a scratch pad. You can use a temporary table to accumulate intermediate results. Quite often, you will find that a complex query can be formulated more easily by first extracting the data that interests you into a temporary table. If you find that you are creating a given temporary table over and over again, you might want to convert that table into a view. See the section titled "Using Views" in Chapter 1 for more information about views.

Table Constraints

In Chapter 2, "Working with Data in PostgreSQL," we explored the various constraints that you can apply to a column: NOT NULL, UNIQUE, PRIMARY KEY, REFERENCES, and CHECK(). You can also apply constraints to a table as a whole or to groups of columns within a table.

First, let's look at the CHECK() constraint. The syntax for a CHECK() constraint is

[CONSTRAINT constraint-name] CHECK( boolean-expression )

When you define a CHECK() constraint for a table, you are telling PostgreSQL that any insertions or updates made to the table must satisfy the boolean-expression given within the constraint. The difference between a column constraint and a table constraint is that a column constraint should refer only to the column to which it relates. A table constraint can refer to any column in the table.

For example, suppose that you had an orders table to track customer orders:

CREATE TABLE orders
(
 customer_number INTEGER,
 part_number CHAR(8),
 quantity_ordered INTEGER,
 price_per_part DECIMAL(7,2)
);

You could create a table-related CHECK() constraint to ensure that the extended price (that is, quantity_ordered times price_per_part) of any given order is at least $5.00:

CREATE TABLE orders
(
 customer_number INTEGER,
 part_number CHAR(8),
 quantity_ordered INTEGER,
 price_per_part DECIMAL(7,2),

 CONSTRAINT verify_minimum_order
 CHECK (( price_per_part * quantity_ordered) >= 5.00::DECIMAL )
);

Each time a row is inserted into the orders table (or the quantity_ordered or price_per_part columns are updated), the verify_minimum_order constraint is evaluated. If the expression evaluates to FALSE, the modification is rejected. If the expression evaluates to trUE or NULL, the modification is allowed.

You may have noticed that a table constraint looks very much like a column constraint. PostgreSQL can tell the difference between the two types by their placement within the CREATE TABLE statement. A column constraint is placed within a column definitionafter the column's data type and before the comma. A table constraint is listed outside of a column definition. The only tricky spot is a table constraint that follows the last column definition; you normally would not include a comma after the last column. If you want a constraint to be treated as a table constraint, be sure to include a comma following the last column definition. At the moment, PostgreSQL does not treat table constraints and column constraints differently, but in a future release it may.

Each of the table constraint varieties is related to a type of column constraint.

The UNIQUE table constraint is identical to the UNIQUE column constraint, except that you can specify that a group of columns must be unique. For example, here is the rentals table as currently defined:

CREATE TABLE rentals
(
 tape_id CHARACTER(8),
 customer_id INTEGER,
 rental_date DATE
);

Let's modify this table to reflect the business rule that any given tape cannot be rented twice on the same day:

CREATE TABLE rentals
(
 tape_id CHARACTER(8),
 customer_id INTEGER,
 rental_date DATE,

 UNIQUE( rental_date, tape_id )

);

Now when you insert a row into the rentals table, PostgreSQL will ensure that there are no other rows with the same combination of rental_date and tape_id. Notice that I did not provide a constraint name in this example; constraint names are optional.

The PRIMARY KEY table constraint is identical to the PRIMARY KEY column constraint, except that you can specify that the key is composed of a group of columns rather than a single column.

The REFERENCES table constraint is similar to the REFERENCES column constraint. When you create a REFERENCES column constraint, you are telling PostgreSQL that a column value in one table refers to a row in another table. More specifically, a REFERENCES column constraint specifies a relationship between two columns. When you create a REFERENCES table constraint, you can relate a group of columns in one table to a group of columns in another table. Quite often, you will find that the unique identifier for a table (that is, the PRIMARY KEY) is composed of multiple columns. Let's say that the Virtual Video Store is having great success and you decide to open a second store. You might want to consolidate the data for each store into a single database. Start by creating a new table:

CREATE TABLE stores
(
 store_id INTEGER PRIMARY KEY,
 location VARCHAR
);

Now, change the definition of the customers table to include a store_id for each customer:

CREATE TABLE customers (
 store_id INTEGER REFERENCES stores( store_id ),
 customer_id INTEGER UNIQUE,
 customer_name VARCHAR(50),
 phone CHAR(8),
 birth_date DATE,
 balance DECIMAL(7,2),

 PRIMARY KEY( store_id, customer_id )
);

The store_id column in the customers table refers to the store_id column in the stores table. Because store_id is the primary key to the stores table, you could have written the REFERENCES constraint in either of two ways:

store_id INTEGER REFERENCES stores( store_id )

store_id INTEGER REFERENCES stores

Also, notice that the primary key for this table is composed of two columns: store_id and customer_id. I can have two customers with the same customer_id as long as they have different store_ids.

Now you have to change the rentals table as well:

CREATE TABLE rentals
(
 store_id INTEGER,
 tape_id CHARACTER(8),
 customer_id INTEGER,
 rental_date DATE,

 UNIQUE( rental_date, tape_id )
 FOREIGN KEY( store_id, customer_id ) REFERENCES customers
);

The customers table has a two-part primary key. Each row in the rentals table refers to a row in the customers table, so the FOREIGN KEY constraint must specify a two-part foreign key. Again, because foreign key refers to the primary key of the customers table, I can write this constraint in either of two forms:

FOREIGN KEY( store_id, customer_id ) 
 REFERENCES customers( store_id, customer_id )

FOREIGN KEY( store_id, customer_id ) 
 REFERENCES customers

Now that I have the referential integrity constraints defined, they will behave as described in the "Column Constraints" section of Chapter 2. Remember, a table constraint functions the same as a column constraint, except that table constraints can refer to more than one column.

Dropping Tables

Dropping a table is much easier than creating a table. The syntax for the DROP TABLE command is

DROP TABLE table-name [, ...];

If you are using PostgreSQL 7.3 or later, you can qualify the table name with a schema. For example, here is the command to destroy the rentals table:

DROP TABLE rentals;

If the rentals table existed in some schema other than your current schema, you would qualify the table name:

DROP TABLE sheila.rentals;

You can destroy a table only if you are the table's owner or if you are a PostgreSQL superuser. Notice that I used the word destroy here rather than drop. It's important to realize that when you execute a DROP TABLE command, you are destroying all the data in that table.

PostgreSQL has a nice feature that I have not seen in other databases: You can roll back a DROP TABLE command. Try the following experiment. First, let's view the contents of the tapes table:

movies=# SELECT * FROM tapes;
 
tape_id | title | duration
----------+---------------+----------
 AB-12345 | The Godfather |
 AB-67472 | The Godfather |
 MC-68873 | Casablanca |
 OW-41221 | Citizen Kane |
 AH-54706 | Rear Window |
(5 rows)

Now, start a multistatement transaction and destroy the tapes table:

movies=# BEGIN WORK;
BEGIN

movies=# DROP TABLE tapes;

NOTICE: DROP TABLE implicitly drops referential integrity trigger
 from table "rentals"
DROP

If you try to SELECT from the tapes table, you'll find that it has been destroyed:

movies=# SELECT * FROM tapes;
ERROR: Relation "tapes" does not exist

If you COMMIT this transaction, the table will permanently disappear; let's ROLLBACK the transaction instead:

movies=# ROLLBACK;
ROLLBACK

The ROLLBACK threw out all changes made since the beginning of the transaction, including the DROP TABLE command. You should be able to SELECT from the tapes table again and see the same data that was there before:

movies=# SELECT * FROM tapes;
tape_id | title | duration
----------+---------------+----------
 AB-12345 | The Godfather |
 AB-67472 | The Godfather |
 MC-68873 | Casablanca |
 OW-41221 | Citizen Kane |
 AH-54706 | Rear Window |
(5 rows)

This is a very nice feature. You can roll back CREATE TABLE, DROP TABLE, CREATE VIEW, DROP VIEW, CREATE INDEX, DROP INDEX, and so on. I'll discuss transactions a bit later in this chapter. For now, I'd like to point out a few details that I glossed over in the previous example. You may have noticed that the DROP TABLE command produced a NOTICE.

movies=# DROP TABLE tapes;
NOTICE: DROP TABLE implicitly drops referential integrity trigger
 from table "rentals"
DROP

When you drop a table, PostgreSQL will automatically DROP any indexes defined for that table as well as any triggers or rules. If other tables refer to the table that you dropped (by means of a REFERENCE constraint), PostgreSQL will automatically drop the constraints in the other tables. However, any views that refer to the dropped table will not be removeda view can refer to many tables and PostgreSQL would not know how to remove a single table from a multitable SELECT.

Inheritance

Another PostgreSQL feature that is uncommon in relational database systems is inheritance. Inheritance is one of the foundations of the object-oriented programming paradigm. Using inheritance, you can define a hierarchy of related data types (in PostgreSQL, you define a hierarchy of related tables). Each layer in the inheritance hierarchy represents a specialization of the layer above it[4].

[4] We'll view an inheritance hierarchy with the most general types at the top and the most specialized types at the bottom.

Let's look at an example. The Virtual Video database defines a table that stores information about the tapes that you have in stock:

movies=# d tapes
 Column | Type | Modifiers
----------+-----------------------+-----------
 tape_id | character(8) | not null
 title | character varying(80) | not null
 duration | interval |

For each tape, you store the tape_id, title, and duration. Let's say that you decide to jump into the twenty-first century and rent DVDs as well as videotapes. You could store DVD records in the tapes table, but a tape and a DVD are not really the same thing. Let's create a new table that defines the characteristics common to both DVDs and videotapes:

CREATE TABLE video
(
 video_id CHARACTER(8) PRIMARY KEY,
 title VARCHAR(80),
 duration INTERVAL
);

Now, create a table to hold the DVDs. For each DVD you have in stock, you want to store everything in the video table plus a region_id and an array of audio_tracks. Here is the new table definition:

movies=# CREATE TABLE dvds
movies-# (
movies(# region_id INTEGER,
movies(# audio_tracks VARCHAR[]
movies(# ) INHERITS ( video );

Notice the last line in this command: You are telling PostgreSQL that the dvds table inherits from the video table. Now let's INSERT a new DVD:

movies=# INSERT INTO dvds VALUES
movies=# (
movies(# 'ASIN-750', -- video_id
movies(# 'Star Wars', -- title
movies(# '121 minutes', -- duration
movies(# 1, -- region_id
movies(# '{English,Spanish}' -- audio_tracks
movies(# );

Now, if you SELECT from the dvds table, you'll see the information that you just inserted:

 video_id | title | duration | region_id | audio_tracks
----------+-----------+----------+-----------+-------------------
 ASIN-750 | Star Wars | 02:01:00 | 1 | {English,Spanish}

At this point, you might be thinking that the INHERITS clause did nothing more than create a row template that PostgreSQL copied when you created the dvds table. That's not the caseif you simply want to create a table that has the same structure as another table, use the LIKE table-name clause instead of the INHERITS table-name clause. When we say that dvds inherits from video, we are not simply saying that a DVD is like a video, we are saying that a DVD is a video. Let's SELECT from the video table now; remember, you haven't explicitly inserted any data into the video table, so you might expect the result set to be empty:

movies=# SELECT * FROM video;
 video_id | title | duration
----------+-----------+----------
 ASIN-750 | Star Wars | 02:01:00

A DVD is a video. When you SELECT from the video table, you see only the columns that comprise a video. When you SELECT from the dvds table, you see all the columns that comprise a DVD. In this relationship, you say that the dvds table specializes[5] the more general video table.

[5] Object-oriented terminology defines many different phrases for this inheritance relationship: specialize/generalize, subclass/superclass, and so on. Choose the phrase that you like.

If you are using a version of PostgreSQL older than 7.2, you must code this query as SELECT * FROM video* to see the DVD entries. Starting with release 7.2, SELECT will include descendent tables and you have to say SELECT * FROM ONLY video to suppress descendents.

You now have a new table to track your DVD inventory; let's go back and redefine the tapes table to fit into the inheritance hierarchy. For each tape, we want to store a video_id, a title, and a duration. This is where we started: the video table already stores all this information. You should still create a new table to track videotapesat some point in the future, you may find information that relates to a videotape, but not to a DVD:

movies=# CREATE TABLE tapes ( ) INHERITS( video );
CREATE

This CREATE TABLE command creates a new table identical in structure to the video table. Each row in the tapes table will contain a video_id, a title, and a duration. Insert a row into the tapes table:

movies=# INSERT INTO tapes VALUES
movies-# (
movies(# 'ASIN-8YD',
movies(# 'Flight To Mars(1951)',
movies(# '72 min'
movies(# );
INSERT

When you SELECT from the tapes table, you should see this new row:

movies=# SELECT * FROM tapes;
 tape_id | title | duration
----------+----------------------+----------
 ASIN-8YD | Flight To Mars(1951) | 01:12:00 
(1 row)

And because a tape is a video, you would also expect to see this row in the video table:

movies=# SELECT * FROM video;
 video_id | title | duration
----------+--------------------------------+----------
 ASIN-750 | Star Wars | 02:01:00
 ASIN-8YD | Flight To Mars(1951) | 01:12:00
(2 rows)

Now here's the interesting part. A DVD is a videoany row that you add to the dvds table shows up in the video table. A tape is a videoany row that you add to the tapes table shows up in the video table. But a DVD is not a tape (and a tape is not a DVD). Any row that you add to the dvds table will not show up in the tapes table (and vice versa).

If you want a list of all the tapes you have in stock, you can SELECT from the tapes table. If you want a list of all the DVDs in stock, SELECT from the dvds table. If you want a list of all videos in stock, SELECT from the videos table.

In this example, the inheritance hierarchy is only two levels deep. PostgreSQL imposes no limit to the number of levels that you can define in an inheritance hierarchy. You can also create a table that inherits from multiple tablesthe new table will have all the columns defined in the more general tables.

I should caution you about two problems with the current implementation of inheritance in PostgreSQL. First, indexes are not shared between parent and child tables. On one hand, that's good because it gives you good performance. On the other hand, that's bad because PostgreSQL uses an index to guarantee uniqueness. That means that you could have a videotape and a DVD with the same video_id. Of course, you can work around this problem by encoding the type of video in the video_id (for example, use a T for tapes and a D for DVDs). But PostgreSQL won't give you any help in fixing this problem. The other potential problem with inheritance is that triggers are not shared between parent and child tables. If you define a trigger for the topmost table in your inheritance hierarchy, you will have to remember to define the same trigger for each descendant.

We have redefined some of the example tables many times in the past two chapters. In a real-world environment, you probably won't want to throw out all your data each time you need to make a change to the definition of an existing table. Let's explore a better way to alter a table.

ALTER TABLE

Now that you have a video table, a dvds table, and a tapes table, let's add a new column to all three tables that you can use to record the rating of the video (PG, G, R, and so on).

You could add the rating column to the tapes table and to the dvds table, but you really want the rating column to be a part of every video. The ALTER TABLE ... ADD COLUMN command adds a new column for you, leaving all the original data in place:

movies=# ALTER TABLE video ADD COLUMN rating VARCHAR;
ALTER

Now, if you look at the definition of the video table, you will see the new column:

movies=# d video
 Table "video"
 Column | Type | Modifiers
----------+-----------------------+-----------
 video_id | character(8) | not null
 title | character varying(80) |
 duration | interval |
 rating | character varying |
Indexes:
 "video_pkey" PRIMARY KEY, btree (video_id)

After the ALTER TABLE command completes, each row in the video table has a new column; the value of every rating column will be NULL. Because you have changed the definition of a video, and a DVD is a video, you might expect that the dvds table will also contain a rating column:

movies=# d dvds
 Table "dvds"
 Column | Type | Modifiers
--------------+-----------------------+-----------
 video_id | character(8) | not null
 title | character varying(80) |
 duration | interval |
 region_id | integer |
 audio_tracks | character varying[] |
 rating | character varying |
Inherits: video

Similarly, the tapes table will also inherit the new rating column:

movies=# d tapes
 Table "tapes"
 Column | Type | Modifiers
----------+-----------------------+-----------
 video_id | character(8) | not null
 title | character varying(80) |
 duration | interval |
 rating | character varying |
Inherits: video

Starting with PostgreSQL version 8.0, you can change the data type of an existing column using ALTER TABLE. For example, to change the data type of the customers.customer_id column from INTEGER to NUMERIC( 7, 2 ), you could execute the command:

ALTER TABLE customers ALTER COLUMN customer_id TYPE NUMERIC( 7,2 )

As long as PostgreSQL knows how to convert a value from the old data type to the new data type, you can freely change data types. If PostgreSQL doesn't know how to convert between the old and new types, you can include a USING expression clause to tell PostgreSQL how to perform the conversion. The expression following the USING keyword typically refers to the original column value. For example, if you want to change the data type of customers.customer_id and multiply each customer_id by 100 at the same time, use the following command:

ALTER TABLE customers ALTER COLUMN customer_id TYPE NUMERIC( 7,2 ) USING customer_id * 100

You can also refer to other columns in the USING expression. For example, say that you are currently storing each customer name in two columns, last_name and first_name, and you've decided to combine them into a single column named customer name. You can do that with the following commands:

movies=# ALTER TABLE customers
movies-# ALTER COLUMN last_name 
movies-# TYPE VARCHAR USING ( last_name || ',' || first_name ),
movies-# DROP COLUMN first_name;
ALTER TABLE

movies=# ALTER TABLE customers
movies-# RENAME COLUMN last_name TO customer_name;
ALTER TABLE

The first ALTER TABLE command performs two alterations. First, for each row in the table, it evaluates the expression last_name || ',' || first_name and assigns that value to the last_name column (converting the result into type VARCHAR along the way). Next, the (first) ALTER TABLE command removes the first_name column from each row. You're left with a single column called last_name that contains the concatenation of the original last_name and first_name columns (with a comma in between). The second ALTER TABLE command renames the last_name column to customer_name.

Keep in mind that some ALTER TABLE commands will take longer to execute than others. It takes very little time to change the name of a column. It can take quite a while to change the data type of a column (because PostgreSQL has to traverse every row in the table and write out a new version). If you use ALTER TABLE ... SET TABLESPACE to move a table from one tablespace to another, the server must physically copy each block in the table. In most cases, it's faster to execute a series of ALTER TABLE commands than it is to read the old data into a client application, change each row, and then write the result back to the server. When you use an ALTER TABLE command, the entire transformation occurs within the server; if you modify the structure of a table using a custom-written client application, you have to send every row to the client, perform the transformation, and then send every row back to the server.

The ALTER TABLE command is useful when you are in the development stages of a project. Using ALTER TABLE, you can add new columns to a table, define default values, rename columns (and tables), add and drop constraints, change the data type of a column, and transfer ownership. The capabilities of the ALTER TABLE command seem to grow with each new releasesee the PostgreSQL Reference Manual for more details.

Part I: General PostgreSQL Use

Introduction to PostgreSQL and SQL

Working with Data in PostgreSQL

PostgreSQL SQL Syntax and Use

Performance

Part II: Programming with PostgreSQL

Introduction to PostgreSQL Programming

Extending PostgreSQL

PL/pgSQL

The PostgreSQL C APIlibpq

A Simpler C APIlibpgeasy

The New PostgreSQL C++ APIlibpqxx

Embedding SQL Commands in C Programsecpg

Using PostgreSQL from an ODBC Client Application

Using PostgreSQL from a Java Client Application

Using PostgreSQL with Perl

Using PostgreSQL with PHP

Using PostgreSQL with Tcl and Tcl/Tk

Using PostgreSQL with Python

Npgsql: The .NET Data Provider

Other Useful Programming Tools

Part III: PostgreSQL Administration

Introduction to PostgreSQL Administration

PostgreSQL Administration

Internationalization and Localization

Security

Replicating PostgreSQL Data with Slony

Contributed Modules

Index