Retrieving Data from the Sample Database

At this point, you should have a sample database (movies) that contains three tables (tapes, customers, and rentals) and a few rows in each table. You know how to get data into a table; now let's see how to view that data.

The SELECT statement is used to retrieve data from a database. SELECT is the most complex statement in the SQL language, and the most powerful. Using SELECT, you can retrieve entire tables, single rows, a group of rows that meet a set of constraints, combinations of multiple tables, expressions, and more. To help you understand the basics of the SELECT statement, I'll try to break it down into each of its forms and move from the simple to the more complex.

SELECT Expression

In its simplest form, you can use the SELECT statement to retrieve one or more values from a set of predefined functions. You've already seen how to retrieve your PostgreSQL user id:

movies=# select user;
 current_user
---------------
 korry
(1 row)

movies=# q

Other values that you might want to see are

select 5; -- returns the number 5 (whoopee)
select sqrt(2.0); -- returns the square root of 2
select timeofday();-- returns current date/time
select now(); -- returns time of start of transaction
select version(); -- returns the version of PostgreSQL you are using

select now(), timeofday();

Commenting

The -- characters introduce a commentany text that follows is ignored.

The previous example shows how to SELECT more than one piece of informationjust list all the values that you want, separated by commas.

The PostgreSQL User's Guide contains a list of all the functions that are distributed with PostgreSQL. In Chapter 2, I'll show you how to combine columns, functions, operators, and literal values into more complex expressions.

SELECT * FROM Table

You probably won't use the first form of the SELECT statement very oftenit just isn't very exciting. Moving to the next level of complexity, let's see how to retrieve data from one of the tables that you created earlier:

movies=# SELECT * FROM customers;

 customer_id | customer_name | phone | birth_date | balance
-------------+----------------------+----------+------------+---------
 3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00
 1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00
 4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00
 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00(4 rows)

When you write a SELECT statement, you have to tell PostgreSQL what information you are trying to retrieve. Let's take a closer look at the components of this SELECT statement.

Following the SELECT keyword, you specify a list of the columns that you want to retrieve. I used an asterisk (*) here to tell PostgreSQL that we want to see all the columns in the customers table.

Next, you have to tell PostgreSQL which table you want to view; in this case, you want to see the customers table.

Now let's look at the results of this query. A SELECT statement returns a result set. A result set is a table composed of all the rows and columns (or fields) that you request. A result set may be empty.

You asked PostgreSQL to return all the columns in the customers tablenotice that the columns are displayed (from left to right) in the order that you specified when you created the table. You may have noticed that the rows are returned in an (apparently) arbitrary order. That's an important thing to keep in mind: Unless you specifically request that PostgreSQL return rows in a particular order, you won't be able to predict which rows will come first[5]. This is a performance feature; if you don't care about row ordering, let PostgreSQL return the rows in the fastest possible way.

[5] Okay, some people probably could predict the order in which the rows will appear. Those people have way too much free time and consider a propeller to be fashionable headwear. They are also very good at inducing sleep.

Making the Most of the psql Console

You'll be spending a lot of time using the psql console so it's a good idea to get to know it well. psql can do more than just send a command to the server and display the result. You can use bash-style tab completion to reduce the amount of typing you have to do. To use tab completion, just type in the first few characters of a word and then press the TAB keypsql will try to complete the rest of the word. Tab completion is smart. If you type in the first few characters of a command and then press TAB, psql tries to complete the command name. If you've already entered DROP DATABASE and then press TAB, psql shows you a list of databases; type in the first few characters of a database, press TAB, and psql completes the name of the database. If you press TAB in a context where psql expects to find a username (like DROP USER ), you'll see a list of users. psql can complete column names, data type names, domain names, aggregate names, function names, index names, table names, view names, database names, encodings, languages, schemas, and users.

You can also change the format that psql uses to display query results. By default, psql uses a format named aligned (each column is preceded by a column header and values are aligned within a grid). You can also choose unaligned, html, or latex format. To change the output format, use the command pset format format-name. For example, to switch to html format, type in the command pset format html. Once you're in HTML-mode, query results will include the HTML tags required to display the results in tabular form. You probably want to send HTML output to a file (rather than seeing all of the formatting commands in your terminal window). Use the o filename command to route query results to the given filename. See the psql manual page ($ man psql) for complete details. Play around with the formatting options. Play around with tab completion. Change your psql prompt. psql packs a lot of power into an easy-to-use interface.

 

SELECT Single-Column FROM Table

If you don't want to view all of the columns from a table, you can replace the * (following the SELECT keyword) with the name of a column:

movies=# SELECT title FROM tapes;
 title
---------------
 The Godfather
 The Godfather
 Casablanca
 Citizen Kane
 Rear Window
(5 rows)

Again, the rows are presented in an arbitrary order. But this time you see only a single column. You may have noticed that "The Godfather" appears twice in this list. That happens because our imaginary video store owns two copies of that movie. I'll show you how to get rid of duplicates in a moment.

SELECT Column-List FROM Table

So far, you have seen how to select all the columns in a table and how to select a single column. Of course, there is a middle groundyou can select a list of columns:

movies=# SELECT customer_name, birth_date FROM customers;
 customer_name | birth_date
----------------------+------------
 Jones, Henry | 1970-10-10
 Rubin, William | 1972-07-10
 Panky, Henry | 1968-01-21
 Wonderland, Alice N. | 1969-03-05
(4 rows)

Instead of naming a single column after the SELECT keyword, you can provide a column-separated list of column names. Column names can appear in any order, and the results will appear in the order you specify.

SELECT Expression-List FROM Table

In addition to selecting columns, you can also select expressions. Remember, an expression is a combination of columns, functions, operators, literal values, and other expressions that will evaluate to a single value. Here is an example:

movies=# SELECT
movies-# customer_name,
movies-# birth_date,
movies-# age( birth_date )
movies-# FROM customers;
 customer_name | birth_date | age
----------------------+------------+------------------------------
 Jones, Henry | 1970-10-10 | 31 years 4 mons 3 days 01:00
 Rubin, William | 1972-07-10 | 29 years 7 mons 3 days 01:00
 Panky, Henry | 1968-01-21 | 34 years 23 days
 Wonderland, Alice N. | 1969-03-05 | 32 years 11 mons 8 days
(4 rows)

In this example, I've selected two columns and an expression. The expression age( birth_date ) is evaluated for each row in the table. The age() function subtracts the given date from the current date[6].

[6] Technically, the age() function subtracts the given timestamp (date+time) from the current date and time.

Selecting Specific Rows

The preceding few sections have shown you how to specify which columns you want to see in a result set. Now let's see how to choose only the rows that you want.

First, I'll show you to how to eliminate duplicate rows; then I'll introduce the WHERE clause.

SELECT [ALL | DISTINCT | DISTINCT ON]

In an earlier example, you selected the titles of all the videotapes owned by your video store:

movies=# SELECT title from tapes;
 title
---------------
 The Godfather
 The Godfather
 Casablanca
 Citizen Kane
 Rear Window
(5 rows)

Notice that "The Godfather" is listed twice (you own two copies of that video). You can use the DISTINCT clause to filter out duplicate rows:

movies=# SELECT DISTINCT title FROM tapes;
 title
---------------
 Casablanca
 Citizen Kane
 Rear Window
 The Godfather
(4 rows)

You now have a single row with the value "The Godfather." Let's see what happens when you add the tape_id back into the previous query:

movies=# SELECT DISTINCT title, tape_id FROM tapes;
 title | tape_id
---------------+----------
 Casablanca | MC-68873
 Citizen Kane | OW-41221
 Rear Window | AH-54706
 The Godfather | AB-12345
 The Godfather | AB-67472
(5 rows)

We're back to seeing "The Godfather" twice. What happened? The DISTINCT clause removes duplicate rows, not duplicate column values; and when the tape IDs are added to the result, the rows containing "The Godfather" are no longer identical.

If you want to filter rows that have duplicate values in one (or more) columns, use the DISTINCT ON() form:

movies=# SELECT DISTINCT ON (title) title, tape_id FROM tapes;
 title | tape_id
---------------+----------
 Casablanca | MC-68873
 Citizen Kane | OW-41221
 Rear Window | AH-54706
 The Godfather | AB-12345
(4 rows)

Notice that one of the "The Godfather" rows has been omitted from the result set. If you don't include an ORDER BY clause (I'll cover that in a moment), you can't predict which row in a set of duplicates will be included in the result set.

You can list multiple columns (or expressions) in the DISTINCT ON() clause.

The WHERE Clause

The next form of the SELECT statement includes the WHERE clause. Here is the syntax diagram for this form:

SELECT expression-list FROM table WHERE conditions

Using the WHERE clause, you can filter out rows that you don't want included in the result set. Let's see a simple example. First, here is the complete customers table:

movies=# SELECT * FROM customers;

 customer_id | customer_name | phone | birth_date | balance
-------------+----------------------+----------+------------+---------
 3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00
 1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00
 4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00
 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00
(4 rows)

Now pick out only those customers who owe you some money:

movies=# SELECT * FROM customers WHERE balance > 0;

 customer_id | customer_name | phone | birth_date | balance
-------------+----------------------+----------+------------+---------
 4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00
 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00
(2 rows)

In this example, I've used a single condition to restrict the rows included in the result set: balance > 0.

When PostgreSQL executes a SELECT statement, it evaluates the WHERE clause as it processes each row. If all the conditions specified by the WHERE clause are met, the row will be included in the result set (if a row meets all the conditions in the WHERE clause, the row satisfies the WHERE clause).

Here is an example that is slightly more complex:

movies=# SELECT customer_name, phone FROM customers
movies-# WHERE
movies-# ( balance = 0 )
movies-# AND
movies-# ( AGE( birth_date ) < '35 years' )
movies-# ;
 customer_name | phone
---------------+----------
 Jones, Henry | 555-1212
(1 row)

In this query, I've specified two conditions, separated by an AND operator. The conditions are: balance = 0 and AGE( birth_date ) < '34 years'[7]. As before, PostgreSQL reads each row in the customers table and evaluates the WHERE clause. If a given row is to be included in the result set, it must satisfy two constraintsbalance must be equal to zero and the customer must be younger than 35 years of age. If either of these conditions is false for a given row, that row will not be included in the result set.

[7] I'll show you how to format various date/time related values in Chapter 2.

AND is one of the logical operators supported by PostgreSQL. A logical operator is used to combine logical expressions. A logical expression is an expression that evaluates to trUE, FALSE, or unknown (NULL). The other two logical operators are OR and NOT.

Let's see how the OR operator works:

movies=# SELECT customer_id, customer_name, balance, AGE(birth_date)
movies-# FROM customers
movies-# WHERE
movies-# ( balance = 0 )
movies-# OR
movies-# ( AGE( birth_date ) < '35 years' )
movies-# ;
 customer_id | customer_name | balance | age
-------------+----------------+---------+----------------------------------
 3 | Panky, Henry | 0.00 | 36 years 8 mons 29 days 23:00:00
 1 | Jones, Henry | 0.00 | 34 years 10 days
 2 | Rubin, William | 15.00 | 32 years 3 mons 10 days
(3 rows)

The OR operator evaluates to trUE if either (or both) of the conditions is TRUE. The first row (id = 1) is included in the result set because it satisfies the first condition (balance = 0). It is included even if it does not satisfy the second condition. The second row (id = 2) is included in the result set because it satisfies the second condition, but not the first. You can see the difference between AND and OR. A row satisfies the AND operator if both conditions are TRUE. A row satisfies the OR operator if either condition is trUE (or if both are TRUE).

The NOT operator is simple:

movies=# SELECT * FROM customers
movies-# WHERE
movies-# NOT ( balance = 0 )
movies-# ;

 customer_id | customer_name | phone | birth_date | balance
-------------+----------------------+----------+------------+---------
 4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00
 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00
(2 rows)

NOT evaluates to TRUE if its operand is FALSE and evaluates to FALSE if its operand is TRUE. The NOT operator inverts (or reverses) a test. Without the NOT operator, the previous example would have returned all customers where the balance column was equal to zero. With the NOT operator, you get the other rows instead.

One other point that I should mention about the WHERE clause. Just because you mention a column in the WHERE clause does not mean that you have to include the column in the result set. For example:

movies=# SELECT customer_id, customer_name FROM customers
movies-# WHERE
movies-# balance != 0
movies-# ;

 customer_id | customer_name
-------------+----------------------
 4 | Wonderland, Alice N.
 2 | Rubin, William
(2 rows)

This example also shows a more common alternative to the NOT operator. The != operator means "is not equal to." The != operator is not an exact replacement for NOTit can only be used to check for inequality, whereas NOT is used to reverse the sense of any logical expression.

NULL Values

Sometimes when you add data to a table, you find that you don't know what value you should include for a column. For example, you may encounter a customer who does not want to provide you with his or her birthday. What value should be recorded in the birth_date column for that customer? You don't really want to make up an answeryou want a date value that means "unknown." This is what the NULL value is for. NULL usually means that you don't know what value should be entered into a column, but it can also mean that a column does not apply. A NULL value in the birth_date column certainly means that we don't know a customer's birth_date, not that birth_date does not apply[8]. On the other hand, you might want to include a rating column in the tapes table. A NULL value in the rating column might imply that the movie was produced before ratings were introduced and therefore the rating column does not apply.

[8] I am making the assumption that the customers for your video store have actually been born. For some of you, that may not be a valid assumption.

Some columns should not allow NULL values. In most cases, it would not make sense to add a customer to your customers table unless you know the customer's name. Therefore, the customer_name column should be mandatory (in other words, customer_name should not allow NULL values).

Let's drop and re-create the customers table so that you can tell PostgreSQL which columns should allow NULL values:

movies=# DROP TABLE customers;
DROP
movies=# CREATE TABLE customers (
movies-# customer_id INTEGER UNIQUE NOT NULL,
movies-# customer_name VARCHAR(50) NOT NULL,
movies-# phone CHAR(8),
movies-# birth_date DATE,
movies-# balance DECIMAL(7,2)
movies-#);
CREATE

The NOT NULL modifier tells PostgreSQL that the customer_id and customer_name columns are mandatory. If you don't specify NOT NULL, PostgreSQL assumes that a column is optional. You can include the keyword NULL to make your choices more obvious:

movies=# DROP TABLE customers;
DROP
movies=# CREATE TABLE customers (
movies-# customer_id INTEGER UNIQUE NOT NULL,
movies-# customer_name VARCHAR(50) NOT NULL,
movies-# phone CHAR(8) NULL,
movies-# birth_date DATE NULL,
movies-# balance DECIMAL(7,2) NULL
movies-#);
CREATE

Notice that a column of any data type can support NULL values.

The NULL value has a unique property that is often the source of much confusion. NULL is not equal to any value, not even itself. NULL is not less than any value, and NULL is not greater than any value. Let's add a customer with a NULL balance:

movies=# INSERT INTO customers
movies-# VALUES
movies-# (
movies(# 5, 'Funkmaster, Freddy', '555-FUNK', NULL, NULL
movies(# )
movies-# ;

Now we have five customers:

movies=# SELECT * FROM customers;

 customer_id | customer_name | phone | birth_date | balance
-------------+----------------------+----------+------------+---------
 3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00
 1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00
 4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00
 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00
 5 | Funkmaster, Freddy | 555-FUNK | |
(5 rows)

One of these customers has a NULL balance. Let's try a few queries:

movies=# SELECT * FROM customers WHERE balance > NULL;

 customer_id | customer_name | phone | birth_date | balance
-------------+---------------+-------+------------+---------
(0 rows)

This query did not return any rows. You might think that it should have customer number 2 (Rubin, William); after all, 15.00 is surely greater than 0. But remember, NULL is not equal to, greater than, or less than any other value. NULL is not the same as zero. Rather than using relational operators ( '=', '!=', '<', or '>'), you should use either the IS or IS NOT operator.

movies=# SELECT * FROM customers WHERE balance IS NULL;

 customer_id | customer_name | phone | birth_date | balance
-------------+--------------------+----------+------------+---------
 5 | Funkmaster, Freddy | 555-FUNK | |
(1 row)

movies=# SELECT * FROM customers WHERE balance IS NOT NULL;
 customer_id | customer_name | phone | birth_date | balance
-------------+----------------------+----------+------------+---------
 3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00
 1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00
 4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00
 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00
(4 rows)

The NULL value introduces another complication. If NULL is not greater than, equal to, or less than any other value, what would 'NULL + 4' mean? Is NULL + 4 greater than NULL? It can't be because that would imply that NULL is less than NULL + 4 and, by definition, NULL can't be less than another value. What does all this mean? It means that you can't do math with a NULL value.

movies=# SELECT customer_id, customer_name, balance, balance+4 FROM customers;

 customer_id | customer_name | balance | ?column?
-------------+----------------------+---------+----------
 3 | Panky, Henry | 0.00 | 4.00
 1 | Jones, Henry | 0.00 | 4.00
 4 | Wonderland, Alice N. | 3.00 | 7.00
 2 | Rubin, William | 15.00 | 19.00
 5 | Funkmaster, Freddy | |
(5 rows)

This query shows what happens when you try to perform a mathematical operation using NULL. When you try to add '4' to NULL, you end up with NULL.

The NULL value complicates logic operators as well. Most programmers are familiar with two-valued logic operators (that is, logic operators that are defined for the values trUE and FALSE). When you add in NULL values, the logic operators become a bit more complex. Tables 1.4, 1.5, and 1.6 show the truth tables for each logical operator.

Table 1.4. Truth Table for Three-Valued AND Operator

a

b

a AND b

trUE

trUE

trUE

trUE

FALSE

FALSE

trUE

NULL

NULL

FALSE

FALSE

FALSE

FALSE

NULL

FALSE

NULL

NULL

NULL

Source: PostgreSQL User's Guide

Table 1.5. Truth Table for Three-Valued OR Operator

a

b

a OR b

trUE

trUE

trUE

trUE

FALSE

TRUE

TRUE

NULL

trUE

FALSE

FALSE

FALSE

FALSE

NULL

NULL

NULL

NULL

NULL

Source: PostgreSQL User's Guide

Table 1.6. Truth Table for Three-Valued NOT Operator

a

NOT a

trUE

FALSE

FALSE

trUE

NULL

NULL

Source: PostgreSQL User's Guide

I don't mean to scare you away from the NULL valueit's very useful and often necessarybut you do have to understand the complications that it introduces.

NULLIF() and COALESCE()

PostgreSQL offers two operators that can convert a NULL value to some other value or convert a specific value into NULL.

The COALESCE() operator will substitute a default value whenever it encounters a NULL. For example, pretend that you've added two more columns, male_lead and female_lead to the tapes table so that it looks like this:

movies=# SELECT * from tapes;
tape_id | title | male_lead | female_lead | duration
----------+---------------+-----------------+----------------+----------
 AB-12345 | The Godfather | Marlon Brando | | 02:55:00
 AB-67472 | The Godfather | Marlon Brando | | 02:55:00
 MC-68873 | Casablanca | Humphrey Bogart | Ingrid Bergman | 01:42:00
 OW-41221 | Citizen Kane | | | 01:59:00
 AH-54706 | Rear Window | James Stewart | Grace Kelly |
 AH-44289 | The Birds | | Tippi Hedren | 01:59:00
(6 rows)

You can use the COALESCE() operator to transform a NULL male_lead into the word 'Unknown':

movies=# SELECT title, COALESCE( male_lead, 'Unknown' ) FROM tapes;
 title | coalesce
---------------+-----------------
 The Godfather | Marlon Brando
 The Godfather | Marlon Brando
 Casablanca | Humphrey Bogart
 Citizen Kane | Unknown
 Rear Window | James Stewart
 The Birds | Unknown
(6 rows)

The COALESCE() operator is more talented than we've shown hereit can search through a list of values, returning the first non-NULL value it finds. For example, the following query prints the male_lead, or, if male_lead is NULL, the female_lead, or if both are NULL, 'Unknown':

movies=# SELECT title, COALESCE( male_lead, female_lead, 'Unknown' )
movies-# AS "Starring"
movies-# FROM TAPES;
 title | Starring
---------------+-----------------
 The Godfather | Marlon Brando
 The Godfather | Marlon Brando
 Casablanca | Humphrey Bogart
 Citizen Kane | Unknown
 Rear Window | James Stewart
 The Birds | Tippi Hedren
(6 rows)

You can string together any number of expressions inside of the COALESCE() operator (as long as all expressions evaluate to the same type) and COALESCE() will evaluate to the leftmost non-NULL value in the list. If all of the expressions inside COALESCE() are NULL, the entire expression evaluates to NULL.

The NULLIF() operator translates a non-NULL value into NULL. NULLIF() is often used to do the opposite of COALESCE(). COALESCE() TRansforms NULL into a default valueNULLIF() translates a default value into NULL. In many circumstances, you want to treat a numeric value and a NULL value as being the same thing. For example, the balance column (in the customers table) is NULL until a customer actually rents a tape: A NULL balance implies that you haven't actually done any business with the customer yet. But a NULL balance also implies that the customer owes you no money. To convert a NULL balance to 0, use COALESCE( balance, 0 ). To convert a zero balance to NULL, use NULLIF( balance, 0 ). When PostgreSQL evaluates an NULLIF( arg1, arg2 ) expression, it compares the two arguments; if they are equal, the expression evaluates to NULL; if they are not equal, the expression evaluates to the value of arg1.

Part I: General PostgreSQL Use

Introduction to PostgreSQL and SQL

Working with Data in PostgreSQL

PostgreSQL SQL Syntax and Use

Performance

Part II: Programming with PostgreSQL

Introduction to PostgreSQL Programming

Extending PostgreSQL

PL/pgSQL

The PostgreSQL C APIlibpq

A Simpler C APIlibpgeasy

The New PostgreSQL C++ APIlibpqxx

Embedding SQL Commands in C Programsecpg

Using PostgreSQL from an ODBC Client Application

Using PostgreSQL from a Java Client Application

Using PostgreSQL with Perl

Using PostgreSQL with PHP

Using PostgreSQL with Tcl and Tcl/Tk

Using PostgreSQL with Python

Npgsql: The .NET Data Provider

Other Useful Programming Tools

Part III: PostgreSQL Administration

Introduction to PostgreSQL Administration

PostgreSQL Administration

Internationalization and Localization

Security

Replicating PostgreSQL Data with Slony

Contributed Modules

Index



PostgreSQL(c) The comprehensive guide to building, programming, and administering PostgreSQL databases
PostgreSQL(c) The comprehensive guide to building, programming, and administering PostgreSQL databases
ISBN: 735712573
EAN: N/A
Year: 2004
Pages: 261

Flylib.com © 2008-2020.
If you may any questions please contact us: flylib@qtcs.net