Multi-Table Joins

Table of contents:

So far, all the queries that you've seen involve a single table. Most databases contain multiple tables and there are relationships between these tables. This sample database has an example:

movies=# d rentals

 Table "rentals"
 Attribute | Type | Modifier
-------------+--------------+----------
 tape_id | character(8) | not null
 rental_date | date | not null
 customer_id | integer | not null

Here's a description of the rentals table from earlier in this chapter:

"When a customer comes in to rent a tape, we will add a row to the rentals table to record the transaction. There are three pieces of information that we need to record for each rental: the tape_id, the customer_id, and the date that the rental occurred. Notice that each row in the rentals table refers to a customer (customer_id) and a tape (tape_id)."

You can see that each row in the rentals table refers to a tape (tape_id) and to a customer (customer_id). If you SELECT from the rentals table, you can see the tape ID and customer ID, but you can't see the movie title or customer name. What you need here is a join. When you need to retrieve data from multiple tables, you join those tables.

PostgreSQL (and all relational databases) supports a number of join types. The most basic join type is a cross-join (or Cartesian product). In a cross join, PostgreSQL joins each row in the first table to each row in the second table to produce a result table. If you are joining against a third table, PostgreSQL joins each row in the intermediate result with each row in the third table.

Let's look at an example. We'll cross-join the rentals and customers tables. First, I'll show you each table:

movies=# SELECT * FROM rentals;
 tape_id | rental_date | customer_id
----------+-------------+-------------
 AB-12345 | 2001-11-25 | 1
 AB-67472 | 2001-11-25 | 3
 OW-41221 | 2001-11-25 | 1
 MC-68873 | 2001-11-20 | 3
(4 rows)

movies=# SELECT * FROM customers;

 customer_id | customer_name | phone | birth_date | balance
-------------+----------------------+----------+------------+---------
 3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00
 1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00
 4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 | 3.00
 2 | Rubin, William | 555-2211 | 1972-07-10 | 15.00
 5 | Funkmaster, Freddy | 555-FUNK | |
 7 | Gull, Jonathon LC | 555-1111 | 1984-02-05 |
 8 | Grumby, Jonas | 555-2222 | 1984-02-21 |
(7 rows)

Now I'll join these tables. To perform a cross-join, we simply list each table in the FROM clause:

movies=# SELECT rentals.*, customers.customer_id, customers.customer_name
movies-# FROM rentals, customers;
 tape_id | rental_date | customer_id | customer_id | customer_name
----------+-------------+-------------+-------------+----------------------
 AB-12345 | 2001-11-25 | 1 | 3 | Panky, Henry
 AB-12345 | 2001-11-25 | 1 | 1 | Jones, Henry
 AB-12345 | 2001-11-25 | 1 | 4 | Wonderland, Alice N.
 AB-12345 | 2001-11-25 | 1 | 2 | Rubin, William
 AB-12345 | 2001-11-25 | 1 | 5 | Funkmaster, Freddy
 AB-12345 | 2001-11-25 | 1 | 7 | Gull, Jonathon LC
 AB-12345 | 2001-11-25 | 1 | 8 | Grumby, Jonas
 AB-67472 | 2001-11-25 | 3 | 3 | Panky, Henry
 AB-67472 | 2001-11-25 | 3 | 1 | Jones, Henry
 AB-67472 | 2001-11-25 | 3 | 4 | Wonderland, Alice N.
 AB-67472 | 2001-11-25 | 3 | 2 | Rubin, William
 AB-67472 | 2001-11-25 | 3 | 5 | Funkmaster, Freddy
 AB-67472 | 2001-11-25 | 3 | 7 | Gull, Jonathon LC
 AB-67472 | 2001-11-25 | 3 | 8 | Grumby, Jonas
 OW-41221 | 2001-11-25 | 1 | 3 | Panky, Henry
 OW-41221 | 2001-11-25 | 1 | 1 | Jones, Henry
 OW-41221 | 2001-11-25 | 1 | 4 | Wonderland, Alice N.
 OW-41221 | 2001-11-25 | 1 | 2 | Rubin, William
 OW-41221 | 2001-11-25 | 1 | 5 | Funkmaster, Freddy
 OW-41221 | 2001-11-25 | 1 | 7 | Gull, Jonathon LC
 OW-41221 | 2001-11-25 | 1 | 8 | Grumby, Jonas
 MC-68873 | 2001-11-20 | 3 | 3 | Panky, Henry
 MC-68873 | 2001-11-20 | 3 | 1 | Jones, Henry
 MC-68873 | 2001-11-20 | 3 | 4 | Wonderland, Alice N.
 MC-68873 | 2001-11-20 | 3 | 2 | Rubin, William
 MC-68873 | 2001-11-20 | 3 | 5 | Funkmaster, Freddy
 MC-68873 | 2001-11-20 | 3 | 7 | Gull, Jonathon LC
 MC-68873 | 2001-11-20 | 3 | 8 | Grumby, Jonas
(28 rows)

You can see that PostgreSQL has joined each row in the rentals table to each row in the customers table. The rentals table contains four rows; the customers table contains seven rows. The result set contains 4 ¥ 7 or 28 rows.

Cross-joins are rarely usefulthey usually don't represent real-world relationships.

The second type of join, the inner-join, is very useful. An inner-join starts with a cross-join, and then throws out the rows that you don't want. Take a close look at the results of the previous query. Here are the first seven rows again:

tape_id | rental_date | customer_id | customer_id | customer_name
----------+-------------+-------------+-------------+----------------------
 AB-12345 | 2001-11-25 | 1 | 3 | Panky, Henry
 AB-12345 | 2001-11-25 | 1 | 1 | Jones, Henry
 AB-12345 | 2001-11-25 | 1 | 4 | Wonderland, Alice N.
 AB-12345 | 2001-11-25 | 1 | 2 | Rubin, William
 AB-12345 | 2001-11-25 | 1 | 5 | Funkmaster, Freddy
 AB-12345 | 2001-11-25 | 1 | 7 | Gull, Jonathon LC
 AB-12345 | 2001-11-25 | 1 | 8 | Grumby, Jonas

 . . . . .
 . . . . .
 . . . . .

These seven rows were produced by joining the first row in the rentals table:

 tape_id | rental_date | customer_id
----------+-------------+-------------
 AB-12345 | 2001-11-25 | 1

with each row in the customers table. What is the real-world relationship between a rentals row and a customers row? Each row in the rentals table contains a customer ID. Each row in the customers table is uniquely identified by a customer ID. So, given a rentals row, we can find the corresponding customers row by searching for a customer where the customer ID is equal to rentals.customer_id. Looking back at the previous query, you can see that the meaningful rows are those WHERE customers.customer_id = rentals.customer_id.

Qualifying Column Names

Notice that this WHERE clause mentions two columns with the same names (customer_id). You may find it helpful to qualify each column name by prefixing it with the name of the corresponding table, followed by a period. So, customers.customer_id refers to the customer_id column in the customers table and rentals.customer_id refers to the customer_id column in the rentals table. Adding the table qualifier is sometimes required if a command involves two columns with identical names, but is useful in other cases.

Now you can construct a query that will show us all of the rentals and the names of the corresponding customers:

movies=# SELECT rentals.*, customers.customer_id, customers.customer_name
movies-# FROM rentals, customers
movies-# WHERE customers.customer_id = rentals.customer_id;

 tape_id | rental_date | customer_id | customer_id | customer_name
----------+-------------+-------------+-------------+---------------
 AB-12345 | 2001-11-25 | 1 | 1 | Jones, Henry
 AB-67472 | 2001-11-25 | 3 | 3 | Panky, Henry
 OW-41221 | 2001-11-25 | 1 | 1 | Jones, Henry
 MC-68873 | 2001-11-20 | 3 | 3 | Panky, Henry
(4 rows)

To execute this query, PostgreSQL could start by creating the cross-join between all the tables involved, producing an intermediate result table. Next, PostgreSQL could throw out all the rows that fail to satisfy the WHERE clause. In practice, this would be a poor strategy: Cross-joins can get very large quickly. Instead, the PostgreSQL query optimizer analyzes the query and plans an execution strategy to minimize execution time. I'll cover query optimization in Chapter 4.

Join Types

We've seen two join types so far: cross-joins and inner-joins. Now we'll look at outer-joins. An outer-join is similar to an inner-join: a relationship between two tables is established by correlating a column from each table.

In an earlier section, you wrote a query that answered the question: "Which customers are currently renting movies?" How would you answer the question: "Who are my customers and which movies are they currently renting?" You might start by trying the following query:

movies=# SELECT customers.*, rentals.tape_id
movies-# FROM customers, rentals
movies=# WHERE rentals.customer_id = customers.customer_id;

 customer_id | customer_name | phone | birth_date | balance | tape_id
-------------+---------------+----------+------------+---------+----------
 1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00 | AB-12345
 3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00 | AB-67472
 1 | Jones, Henry | 555-1212 | 1970-10-10 | 0.00 | OW-41221
 3 | Panky, Henry | 555-1221 | 1968-01-21 | 0.00 | MC-68873
 rows)

Well, that didn't work. This query showed you which customers are currently renting movies (and the movies that they are renting). What we really want is a list of all customers and, if a customer is currently renting any movies, all the movies rented. This is an outer-join. An outer-join preserves all the rows in one table (or both tables) regardless of whether a matching row can be found in the second table.

The syntax for an outer-join is a little strange. Here is an example:

movies=# SELECT customers.customer_name, rentals.tape_id
movies-# FROM customers LEFT OUTER JOIN rentals
movies-# ON customers.customer_id = rentals.customer_id;
 customer_name | tape_id
----------------------+----------
 Jones, Henry | AB-12345
 Jones, Henry | OW-41221
 Rubin, William |
 Panky, Henry | AB-67472
 Panky, Henry | MC-68873
 Wonderland, Alice N. |
 Funkmaster, Freddy |
 Gull, Jonathon LC |
 Grumby, Jonas |
rows)

This query is a left outer-join. Why left? Because you will see each row from the left table (the table to the left of the LEFT OUTER JOIN phrase). An inner-join would list only two customers ("Jones, Henry" and "Panky, Henry")the other customers have no rentals.

A RIGHT OUTER JOIN preserves each row from the right table. A FULL OUTER JOIN preserves each row from both tables.

The following query shows a list of all customers, all tapes, and any rentals:

movies=# SELECT customers.customer_name, rentals.tape_id, tapes.title
movies-# FROM customers FULL OUTER JOIN rentals
movies-# ON customers.customer_id = rentals.customer_id
movies-# FULL OUTER JOIN tapes
movies-# ON tapes.tape_id = rentals.tape_id;
 customer_name | tape_id | title
----------------------+----------+----------------------
 Jones, Henry | AB-12345 | The Godfather
 Panky, Henry | AB-67472 | The Godfather
 | | Rear Window
 | | American Citizen, An
 Panky, Henry | MC-68873 | Casablanca
 Jones, Henry | OW-41221 | Citizen Kane
 Rubin, William | |
 Wonderland, Alice N. | |
 Funkmaster, Freddy | |
 Gull, Jonathon LC | |
 Grumby, Jonas | |
 | | Sly
 | | Stone
(13 rows)


Part I: General PostgreSQL Use

Introduction to PostgreSQL and SQL

Working with Data in PostgreSQL

PostgreSQL SQL Syntax and Use

Performance

Part II: Programming with PostgreSQL

Introduction to PostgreSQL Programming

Extending PostgreSQL

PL/pgSQL

The PostgreSQL C APIlibpq

A Simpler C APIlibpgeasy

The New PostgreSQL C++ APIlibpqxx

Embedding SQL Commands in C Programsecpg

Using PostgreSQL from an ODBC Client Application

Using PostgreSQL from a Java Client Application

Using PostgreSQL with Perl

Using PostgreSQL with PHP

Using PostgreSQL with Tcl and Tcl/Tk

Using PostgreSQL with Python

Npgsql: The .NET Data Provider

Other Useful Programming Tools

Part III: PostgreSQL Administration

Introduction to PostgreSQL Administration

PostgreSQL Administration

Internationalization and Localization

Security

Replicating PostgreSQL Data with Slony

Contributed Modules

Index



PostgreSQL(c) The comprehensive guide to building, programming, and administering PostgreSQL databases
PostgreSQL(c) The comprehensive guide to building, programming, and administering PostgreSQL databases
ISBN: 735712573
EAN: N/A
Year: 2004
Pages: 261

Flylib.com © 2008-2020.
If you may any questions please contact us: flylib@qtcs.net