Hack 84. Store Comma-Delimited Lists in a Column

Yes, it is possible to write a JOIN on a column containing a comma-delimited list of values.

You may have heard that it's not a good idea to design a table with a comma-delimited list of key values in a single column. But what if you inherit such a table and you need to write a query using it?

10.8.1. Advantages of Comma-Delimited Lists

You commonly encounter comma-delimited lists in web page HTML forms containing a SELECT, or drop-down list, with the MULTIPLE option. For example, suppose you have a web site of articles, each of which can belong to one or more categories. To add a new article, you will likely have a form which includes a drop down to select to which categories the article belongs:

NewsFeatured ArticlesOpinionsScienceTechnologyComputers

When the web page form is submitted, all of the option values selected in the drop down will be submitted as a comma-delimited list of values in the categories form field. So if the user selects Opinions, Science, and Computers, the value of the categories form field will be '3,4,6'.

The advantages of simply storing the list of user-selected values into a VARCHAR column are obvious: the code to store the list is simpler than the code to store the individual values of the list as multiple rows in a separate relationship table.

10.8.2. Disadvantages of Comma-Delimited Lists

One disadvantage becomes apparent the moment you want to use the comma-delimited list in a join. Suppose you need to retrieve the article title and body, as well as the category names for each category to which the article belongs. You will probably try this approach first:

SELECT articles.title
 , articles.body
 , categories.name
 FROM articles 
INNER
 JOIN categories
 ON categories.id IN ( articles.categories ) 

The problem is, you cannot use IN like that (you may even get a syntax error), because the VARCHAR column isn't an actual list of values as far as SQL is concernedit's a single string value, which just happens to contain commas. You cannot use an equality test either, because none of the categories.id values will equal the comma-delimited list (unless there's only one value in the list). What to do?

10.8.3. Joining on a Comma-Delimited List

The difficulty of joining on a comma-delimited list is that one of the columns has a key value which must be found "inside" the other column value. This sounds like a job for LIKE:

SELECT articles.title
 , articles.body
 , categories.name
 FROM articles 
INNER
 JOIN categories
 ON articles.categories 
 LIKE '%' || categories.id || '%' 

You can use the || operator to concatenate strings in Oracle and PostgreSQL. In MySQL you can use CONCAT('%', categories.id, '%'), and in SQL Server you can use '%' + categories.id + '%'.

This approach takes every value of categories.id and determines whether it's located anywhere within the articles.categories comma-delimited string values. Simple, right?

Wrong. If the comma-delimited list of values is '3,9,37,53,82' and the single key value is 7, the LIKE expression becomes '%7%', and you will get a correct match even though article 7 is not in the list.

The way around this predicament is to realize that you also need to delimit the LIKE value. Here is the final query:

select articles.title
 , articles.body
 , categories.name
 from articles 
inner
 join categories
 on ',' || articles.categories || ','
 like '%,' || categories.id || ',%' 

To use the preceding example data, this query now searches the string ',3,9,37,53,82,' for a LIKE expression of '%,7,%' which will not match. Appending the commas to the front and back of both the searched and the search strings ensures that the first and last items in the list will be found.

Even though this approach works, you should be aware that it will not scale. The fact that you're using LIKE with wildcards means that no index on those columns will be used, and the query is forced to scan the table sequentially. The bigger the table is, the slower the query will be.

 

10.8.4. See Also

"Search for a String Across Columns" [Hack #16]

Rudy Limeback

SQL Fundamentals

Joins, Unions, and Views

Text Handling

Date Handling

Number Crunching

Online Applications

Organizing Data

Storing Small Amounts of Data

Locking and Performance

Reporting

Users and Administration

Wider Access

Index



SQL Hacks
SQL Hacks
ISBN: 0596527993
EAN: 2147483647
Year: 2004
Pages: 147

Flylib.com © 2008-2020.
If you may any questions please contact us: flylib@qtcs.net