So far, all the queries that you've seen involve a single table. Most databases contain multiple tables and there are relationships between these tables. This sample database has an example: movies=# \d rentals Table "rentals" Attribute Type Modifier -------------+--------------+---------- tape_id character(8) not null rental_date date not null customer_id integer not null Here's a description of the rentals table from earlier in this chapter:
You can see that each row in the rentals table refers to a tape ( tape_id ) and to a customer ( customer_id ). If you SELECT from the rentals table, you can see the tape ID and customer ID, but you can't see the movie title or customer name . What you need here is a join . When you need to retrieve data from multiple tables, you join those tables. PostgreSQL (and all relational databases) supports a number of join types. The most basic join type is a cross-join (or Cartesian product). In a cross join, PostgreSQL joins each row in the first table to each row in the second table to produce a result table. If you are joining against a third table, PostgreSQL joins each row in the intermediate result with each row in the third table. Let's look at an example. We'll cross-join the rentals and customers tables. First, I'll show you each table: movies=# SELECT * FROM rentals; tape_id rental_date customer_id ----------+-------------+------------- AB-12345 2001-11-25 1 AB-67472 2001-11-25 3 OW-41221 2001-11-25 1 MC-68873 2001-11-20 3 (4 rows) movies=# SELECT * FROM customers; id customer_name phone birth_date balance ----+----------------------+----------+------------+--------- 1 Jones, Henry 555-1212 1970-10-10 0.00 2 Rubin, William 555-2211 1972-07-10 15.00 3 Panky, Henry 555-1221 1968-01-21 0.00 4 Wonderland, Alice N. 555-1122 1969-03-05 3.00 5 Funkmaster, Freddy 555-FUNK 7 Gull, Jonathon LC 555-1111 1984-02-05 8 Grumby, Jonas 555-2222 1984-02-21 (7 rows) Now I'll join these tables. To perform a cross-join, we simply list each table in the FROM clause: movies=# SELECT rentals.*, customers.id, customers.customer_name movies-# FROM rentals, customers; tape_id rental_date customer_id id customer_name ----------+-------------+-------------+----+---------------------- AB-12345 2001-11-25 1 1 Jones, Henry AB-12345 2001-11-25 1 2 Rubin, William AB-12345 2001-11-25 1 3 Panky, Henry AB-12345 2001-11-25 1 4 Wonderland, Alice N. AB-12345 2001-11-25 1 5 Funkmaster, Freddy AB-12345 2001-11-25 1 7 Gull, Jonathon LC AB-12345 2001-11-25 1 8 Grumby, Jonas AB-67472 2001-11-25 3 1 Jones, Henry AB-67472 2001-11-25 3 2 Rubin, William AB-67472 2001-11-25 3 3 Panky, Henry AB-67472 2001-11-25 3 4 Wonderland, Alice N. AB-67472 2001-11-25 3 5 Funkmaster, Freddy AB-67472 2001-11-25 3 7 Gull, Jonathon LC AB-67472 2001-11-25 3 8 Grumby, Jonas OW-41221 2001-11-25 1 1 Jones, Henry OW-41221 2001-11-25 1 2 Rubin, William OW-41221 2001-11-25 1 3 Panky, Henry OW-41221 2001-11-25 1 4 Wonderland, Alice N. OW-41221 2001-11-25 1 5 Funkmaster, Freddy OW-41221 2001-11-25 1 7 Gull, Jonathon LC OW-41221 2001-11-25 1 8 Grumby, Jonas MC-68873 2001-11-20 3 1 Jones, Henry MC-68873 2001-11-20 3 2 Rubin, William MC-68873 2001-11-20 3 3 Panky, Henry MC-68873 2001-11-20 3 4 Wonderland, Alice N. MC-68873 2001-11-20 3 5 Funkmaster, Freddy MC-68873 2001-11-20 3 7 Gull, Jonathon LC MC-68873 2001-11-20 3 8 Grumby, Jonas (28 rows) You can see that PostgreSQL has joined each row in the rentals table to each row in the customers table. The rentals table contains four rows; the customers table contains seven rows. The result set contains 4 x 7 or 28 rows. Cross-joins are rarely useful ”they usually don't represent real-world relationships. The second type of join, the inner-join , is very useful. An inner-join starts with a cross-join, and then throws out the rows that you don't want. Take a close look at the results of the previous query. Here are the first seven rows again: tape_id rental_date customer_id id customer_name ----------+-------------+-------------+----+---------------------- AB-12345 2001-11-25 1 1 Jones, Henry AB-12345 2001-11-25 1 2 Rubin, William AB-12345 2001-11-25 1 3 Panky, Henry AB-12345 2001-11-25 1 4 Wonderland, Alice N. AB-12345 2001-11-25 1 5 Funkmaster, Freddy AB-12345 2001-11-25 1 7 Gull, Jonathon LC AB-12345 2001-11-25 1 8 Grumby, Jonas . . . . . . . . . . . . . . . These seven rows were produced by joining the first row in the rentals table: tape_id rental_date customer_id ----------+-------------+------------- AB-12345 2001-11-25 1 with each row in the customers table. What is the real-world relationship between a rentals row and a customers row? Each row in the rentals table contains a customer ID. Each row in the customers table is uniquely identified by a customer ID. So, given a rentals row, we can find the corresponding customers row by searching for a customer where the customer ID is equal to rentals.customer_id . Looking back at the previous query, you can see that the meaningful rows are those WHERE customers.id = rentals.customer_id .
Now you can construct a query that will show us all of the rentals and the names of the corresponding customers: movies=# SELECT rentals.*, customers.id, customers.customer_name movies-# FROM rentals, customers movies-# WHERE customers.id = rentals.customer_id; tape_id rental_date customer_id id customer_name ----------+-------------+-------------+----+--------------- AB-12345 2001-11-25 1 1 Jones, Henry OW-41221 2001-11-25 1 1 Jones, Henry AB-67472 2001-11-25 3 3 Panky, Henry MC-68873 2001-11-20 3 3 Panky, Henry (4 rows) To execute this query, PostgreSQL could start by creating the cross-join between all the tables involved, producing an intermediate result table. Next, PostgreSQL could throw out all the rows that fail to satisfy the WHERE clause. In practice, this would be a poor strategy: Cross-joins can get very large quickly. Instead, the PostgreSQL query optimizer analyzes the query and plans an execution strategy to minimize execution time. I'll cover query optimization in Chapter 4, "Query Optimization." Join TypesWe've seen two join types so far: cross-joins and inner-joins. Now we'll look at outer-joins . An outer-join is similar to an inner-join: a relationship between two tables is established by correlating a column from each table. In an earlier section, you wrote a query that answered the question: "Which customers are currently renting movies?" How would you answer the question: "Who are my customers and which movies are they currently renting ?" You might start by trying the following query: movies=# SELECT customers.*, rentals.tape_id movies-# FROM customers, rentals movies-# WHERE rentals.customer_id = customers.id; id customer_name phone birth_date balance tape_id ----+---------------+----------+------------+---------+---------- 1 Jones, Henry 555-1212 1970-10-10 0.00 AB-12345 1 Jones, Henry 555-1212 1970-10-10 0.00 OW-41221 3 Panky, Henry 555-1221 1968-01-21 0.00 AB-67472 3 Panky, Henry 555-1221 1968-01-21 0.00 MC-68873 (4 rows) Well, that didn't work. This query showed you which customers are currently renting movies (and the movies that they are renting). What we really want is a list of all customers and, if a customer is currently renting any movies, all the movies rented. This is an outer-join. An outer-join preserves all the rows in one table (or both tables) regardless of whether a matching row can be found in the second table. The syntax for an outer-join is a little strange . Here is an example: movies=# SELECT customers.customer_name, rentals.tape_id movies-# FROM customers LEFT OUTER JOIN rentals movies-# ON customers.id = rentals.customer_id; customer_name tape_id ----------------------+---------- Jones, Henry AB-12345 Jones, Henry OW-41221 Rubin, William Panky, Henry AB-67472 Panky, Henry MC-68873 Wonderland, Alice N. Funkmaster, Freddy Gull, Jonathon LC Grumby, Jonas (9 rows) This query is a left outer-join . Why left? Because you will see each row from the left table (the table to the left of the LEFT OUTER JOIN phrase). An inner-join would list only two customers ("Jones, Henry" and "Panky, Henry") ”the other customers have no rentals. A RIGHT OUTER JOIN preserves each row from the right table. A FULL OUTER JOIN preserves each row from both tables. The following query shows a list of all customers , all tapes , and any rentals : movies=# SELECT customers.customer_name, rentals.tape_id, tapes.title movies-# FROM customers FULL OUTER JOIN rentals movies-# ON customers.id = rentals.customer_id movies-# FULL OUTER JOIN tapes movies-# ON tapes.tape_id = rentals.tape_id; customer_name tape_id title ----------------------+----------+---------------------- Jones, Henry AB-12345 The Godfather Panky, Henry AB-67472 The Godfather Rear Window American Citizen, An Panky, Henry MC-68873 Casablanca Jones, Henry OW-41221 Citizen Kane Rubin, William Wonderland, Alice N. Funkmaster, Freddy Gull, Jonathon LC Grumby, Jonas Sly Stone (13 rows) |