Hack 75. Minimize Bandwidth in One-to-Many Joins

When a one-to-many JOIN produces excess "repetition" in a result set, use a UNION.

Suppose you have a web site containing articles and comments. The database for this might have one table containing the articles and another table for the comments on those articles, with a relationship between these tables, linking each comment to its article.

Here is the ARTICLES table:

CREATE TABLE articles
( id INTEGER not null PRIMARY KEY
, title VARCHAR(100) not null
, article TEXT not null 
)

Each column is NOT NULL because it is required (for the primary key) or expected by the application (an article must have both a title and article text to be accepted).

If your database system does not support the TEXT type, the last column, article, could be any large string data type, such as VARCHAR(5000). The exact data type doesn't really matter, just that the data length of the column is very large because it needs to hold the article text of any article.

Here is the COMMENTS table:

CREATE TABLE comments
( id INTEGER not null PRIMARY KEY
, article_id INTEGER not null 
, title VARCHAR(100) null
, content TEXT not null
, FOREIGN KEY (article_id) REFERENCES articles(id)
)

The COMMENTS table has a similar structure, with an additional column being a foreign key to link each comment to the article on which it is commenting. The title is NULL because it is optional for comments. Finally, there is a large string data type column for the comment text, which you might think need not be as large as the column for the article text, but the business logic may allow for comments of arbitrary length.

This is a classical one-to-many relationship. Each article can have one or more comments, and each comment belongs to only one article.

Technically, it's a one-to-zero-or-many relationship, because the business logic dictates that an article must be able to exist without any comments, which is the situation that pertains immediately after the article is first added, before the first comment on it is created. For a one-to-many relationship, an article could be created with just the article ID, and the first "comment" on the table actually would contain the title and text of the article. Although that does not necessarily make sense in this example, it is how many ticketing systems handle tickets and notesthe "description" of the ticket is actually the first note.

9.12.1. Retrieve an Article and Its Comments

To retrieve an article, title, text, and all comments, you can use the following query:

SELECT articles.id
 , articles.title
 , articles.article
 , comments.id
 , comments.title
 , comments.content
 FROM articles 
LEFT OUTER
 JOIN comments 
 ON comments.article_id = articles.id
 WHERE articles.id = 937
ORDER
 BY articles.id
 , comments.id

The query retrieves a specific article by ID, plus its associated comments, if any, and it's a left outer join because the article might not have comments. The result set will look something like this:

aid atitle article cid ctitle content
937 Title One day I ... 1013 I am 1st First Post!
937 Title One day I ... 1024 Very Useful! Great article!
937 Title One day I ... 1037 Nothing to add I agree w/2nd commenter.
937 Title One day I ... 1042 Me Too' I like this article too.

Because the design was based on the classical one-to-many relationship, the result set is the classical one-to-many result set. Note that the article text and title are repeated in the result set four times, and they could contain 2 GB of data. The query works, but it wastes bandwidth.

You could perform two separate queries, but refer to "Combine Your Queries" [Hack #70] for why this can be bad. There is another way to combine two queries into one, without the "repetition" of data in the result set that you see with a one-to-many join.

9.12.2. The UNION Query

The following UNION query consists of two SELECTs combined with the UNION ALL operator:

SELECT 'article' AS rowtype
 , articles.id
 , articles.title
 , articles.article
 FROM articles 
 WHERE articles.id = 937
UNION ALL
SELECT 'comment' AS rowtype
 , comments.id
 , comments.title
 , comments.content
 FROM articles 
INNER
 JOIN comments 
 ON comments.article_id = articles.id
 WHERE articles.id = 937
ORDER
 BY rowtype

Although there are two SELECTs, there is only one ORDER BY, which sorts the combined results of both SELECTs into one sorted final result set. This is truly a hack; it relies on the fact that the rowtype article comes before comment in the ORDER BY.

The two SELECTs return the article and its comments, respectively. Each SELECT is driven by an identical WHERE clause, to select a specific article or its comments. The second SELECT's join must be an inner join rather than a left outer join, because the first SELECT retrieves the article and the second query retrieves only related comments.

9.12.3. Results

Here is the result of the query:

rowtype id title article
article 937 Title One day I ... 
comment 1013 I am 1st First Post!
comment 1024 Very Useful! Great article! 
comment 1037 Nothing to add I agree w/2nd commenter.
comment 1042 Me Too' I like this article too.

Optimal bandwidth is achieved because there is no repetition of any large column value. The rowtype can be a single-byte column if desired; its values are assigned as literals in the SELECTs.

In the second SELECT, is it necessary to do the join at all? Could the second SELECT simply retrieve from the comments table using a similar WHERE condition on articles_id instead? Yes, but it's better to leave the join in place so that, if desired, instead of retrieving a specific article by ID, you could return all articles that meet a certain condition, such as a keyword in their title. This type of condition testing would require using the same WHERE clause in each SELECT.

9.12.4. Does This Always Work?

You cannot take this approach with every one-to-many relationship that you meet. In order for you to write this type of UNION query, both SELECTs must return a union-compatible result set. This means the same number of columns, with each pair of columns having compatible data types (dates with dates, numbers with numbers, and strings with strings). Articles and comments are well suited for this type of UNION because the tables share such a similar design.

Rudy Limeback

SQL Fundamentals

Joins, Unions, and Views

Text Handling

Date Handling

Number Crunching

Online Applications

Organizing Data

Storing Small Amounts of Data

Locking and Performance

Reporting

Users and Administration

Wider Access

Index