7.5 Writing Joins


According to the relational concept, data is stored in various tables. To combine this data, so-called joins have to be performed. Joining means that tables are combined using common attributes.

The next scenario shows a relation between students and training courses. Every student can attend multiple training courses, so you need two tables. One table stores information about students and the second table contains the list of training courses a student attends. Here is the SQL code needed for generating the two tables:

 phpbook=# CREATE TABLE student (id int4, name text); CREATE phpbook=# CREATE TABLE course(student_id int4, course text); CREATE 

Let's insert some data into the tables:

 phpbook=# INSERT INTO student VALUES (1, 'Epi'); INSERT 24792 1 phpbook=# INSERT INTO student VALUES (2, 'Hans'); INSERT 24793 1 phpbook=# INSERT INTO student VALUES (3, 'Heinz'); INSERT 24794 1 

Table student contains three records now:

 phpbook=# SELECT * FROM student;  id | name ----+-------   1 | Epi   2 | Hans   3 | Heinz (3 rows) 

Three students can be found in the table. Let's add a list of training courses these students attend to table course:

 phpbook=# INSERT INTO course VALUES (1, 'PostgreSQL for Beginners'); INSERT 24795 1 phpbook=# INSERT INTO course VALUES (1, 'PostgreSQL for C Programmers'); INSERT 24796 1 phpbook=# INSERT INTO course VALUES (1, 'PostgreSQL for Tcl Programmers'); INSERT 24797 1 phpbook=# INSERT INTO course VALUES (2, 'PostgreSQL for Tcl Programmers'); INSERT 24798 1 

If no error occurred, four records will be in the table:

 phpbook=# SELECT * FROM course;  student_id |             course ------------+--------------------------------           1 | PostgreSQL for Beginners           1 | PostgreSQL for C Programmers           1 | PostgreSQL for Tcl Programmers           2 | PostgreSQL for Tcl Programmers (4 rows) 

Let's try to find all students who attend training courses. The student id and the name should be displayed:

 phpbook=# SELECT student.* FROM student, course WHERE student.id=course.student _id;  id | name ----+------   1 | Epi   1 | Epi   1 | Epi   2 | Hans (4 rows) 

Epi and Hans attend training courses, but the lazy student called Heinz does not attend a single training course. The problem with the query shown in the previous listing is that every name is listed more than once. Epi attends three training courses and therefore Epi is listed three times. Let's have a closer look at the query: All columns in student will be displayed. Two tables are selected from the database. In the WHERE clause you can see how a join is performed. The id in table student has to be the same as the student_id in the table course. This way two tables are connected using a common attribute.

To solve the problem concerning multiple entries, use DISTINCT:

 phpbook=# SELECT DISTINCT student.* FROM student, course WHERE student.id=course.student_id;  id | name ----+------   1 | Epi   2 | Hans (2 rows) 

The result is now what it is supposed to be. Every record is only displayed once. DISTINCT makes sure that no duplicated entries can be in the result.

In the next example you will see how to retrieve a list of people and the training courses attended by them:

 SELECT student.name, course.course         FROM student, course         WHERE student.id=course.student_id; 

In this case you have to select the name of the student from table student and the name of the training course from table course. In this case you don't need DISTINCT because a student cannot attend the same training course twice (in this model this has to be checked on the application level):

The result will contain information from both tables:

  name |             course ------+--------------------------------  Epi  | PostgreSQL for Beginners  Epi  | PostgreSQL for C Programmers  Epi  | PostgreSQL for Tcl Programmers  Hans | PostgreSQL for Tcl Programmers (4 rows) 

Let's try to query for all people attending the Tcl programming course:

 phpbook=# SELECT student.id, course.course WHERE course.course='PostgreSQL for Tcl Programmers';  id |             course ----+--------------------------------   1 | PostgreSQL for Tcl Programmers   2 | PostgreSQL for Tcl Programmers   3 | PostgreSQL for Tcl Programmers   1 | PostgreSQL for Tcl Programmers   2 | PostgreSQL for Tcl Programmers   3 | PostgreSQL for Tcl Programmers (6 rows) 

Somehow the query seems to be wrong. Heinz does not attend a single training course, but student number 3 is still in the list.

To understand what went wrong, you can omit the WHERE clause restricting the name of the training course:

 phpbook=# SELECT * FROM student, course;  id | name  | student_id |             course ----+-------+------------+--------------------------------   1 | Epi   |          1 | PostgreSQL for Beginners   1 | Epi   |          1 | PostgreSQL for C Programmers   1 | Epi   |          1 | PostgreSQL for Tcl Programmers   1 | Epi   |          2 | PostgreSQL for Tcl Programmers   2 | Hans  |          1 | PostgreSQL for Beginners   2 | Hans  |          1 | PostgreSQL for C Programmers   2 | Hans  |          1 | PostgreSQL for Tcl Programmers   2 | Hans  |          2 | PostgreSQL for Tcl Programmers   3 | Heinz |          1 | PostgreSQL for Beginners   3 | Heinz |          1 | PostgreSQL for C Programmers   3 | Heinz |          1 | PostgreSQL for Tcl Programmers   3 | Heinz |          2 | PostgreSQL for Tcl Programmers (12 rows) 

PostgreSQL has computed the Cartesian product of the two tables. Building the Cartesian product of a list of tables is one of the core operations defined in the relational concept. In the previous example the Cartesian product has also been generated, but you have only selected those records containing the string PostgreSQL for Tcl Programmers. The problem with the query was that you did not add a join condition to the WHERE clause. A join condition is a condition that tells PostgreSQL that one column in the first table has to be equal to one column in the second table. This way tables can be connected with each other. Always keep in mind that if you don't add join conditions to a query, the database will return the Cartesian product.

Let's rewrite the query returning a list of all people attending the Tcl training course:

 SELECT *         FROM student, course         WHERE student.id=course.student_id                 AND course.course='PostgreSQL for Tcl Programmers'; 

As you can see a join condition is in the query and the result will be correct.

  id | name | student_id |             course ----+------+------------+--------------------------------   1 | Epi  |          1 | PostgreSQL for Tcl Programmers   2 | Hans |          2 | PostgreSQL for Tcl Programmers (2 rows) 

Of course, you won't need columns in both tables, but the desired set of data is returned by the query.

After dealing with joins with two tables, you will learn how to write joins with more than just two tables involved. Therefore you have to create an additional table:

 CREATE TABLE course_desc (         course_name text,       -- name of the training course         desc_type text,         -- type of description (short, medium, long)         desctext text           -- text of description ); INSERT INTO course_desc VALUES ('PostgreSQL for Tcl Programmers',         'short', 'a training course about Tcl'); INSERT INTO course_desc VALUES ('PostgreSQL for Tcl Programmers',         'long', 'a training course about Tcl and PostgreSQL'); 

This table is used to store descriptions about training courses. Two records have been added to the table. One record contains a short description of the Tcl course and the second one contains a longer description of the course.

The target of the next query is to get a list of people attending the training course about Tcl plus the short description of the training course. Therefore three tables have to be joined:

 SELECT student.name, course.course, course_desc.desctext         FROM student, course, course_desc         WHERE student.id=course.student_id                 AND course.course=course_desc.course_name                 AND course_desc.desc_type='short'; 

For every table added to the query you will need additional join conditions; otherwise, PostgreSQL will generate the Cartesian product. Take a look at the result of the query:

  name |             course             |          desctext ------+--------------------------------+-----------------------------  Epi  | PostgreSQL for Tcl Programmers | a training course about Tcl  Hans | PostgreSQL for Tcl Programmers | a training course about Tcl (2 rows) 

The desired result is returned by the database.

Writing queries with many tables involved can sometimes be difficult. In such cases, it can be useful to work with views. You will learn more about views in Chapter 8.



PHP and PostgreSQL. Advanced Web Programming2002
PHP and PostgreSQL. Advanced Web Programming2002
ISBN: N/A
EAN: N/A
Year: 2004
Pages: 201

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net