Recipe6.11.Converting Delimited Data into a Multi-Valued IN-List

Recipe 6.11. Converting Delimited Data into a Multi-Valued IN-List

Problem

You have delimited data that you want to pass to the IN-list iterator of a WHERE clause. Consider the following string:

 7654,7698,7782,7788

You would like to use the string in a WHERE clause but the following SQL fails because EMPNO is a numeric column:

 select ename,sal,deptno   from emp  where empno in ( '7654,7698,7782,7788' )

This SQL fails because, while EMPNO is a numeric column, the IN list is composed of a single string value. You want that string to be treated as a comma-delimited list of numeric values.

Solution

On the surface it may seem that SQL should do the work of treating a delimited string as a list of delimited values for you, but that is not the case. When a comma embedded within quotes is encountered, SQL can't possibly know that signals a multi-valued list. SQL must treat everything between the quotes as a single entity, as one string value. You must break the string up into individual EMPNOs. The key to this solution is to walk the string, but not into individual characters. You want to walk the string into valid EMPNO values.

DB2

By walking the string passed to the IN-list, you can you can easily convert it to rows. The functions ROW_NUMBER, LOCATE, and SUBSTR are particularly useful here:

  1 select empno,ename,sal,deptno  2   from emp  3  where empno in (  4 select cast(substr(c,2,locate(',',c,2)-2) as integer) empno  5   from (  6 select substr(csv.emps,cast(iter.pos as integer)) as c  7   from (select ','||'7654,7698,7782,7788'||',' emps  8          from t1) csv,  9        (select id as pos 10           from t100 ) iter 11  where iter.pos <= length(csv.emps) 12        ) x 13  where length(c) > 1 14    and substr(c,1,1) = ',' 15        ) y

MySQL

By walking the string passed to the IN-list, you can easily convert it to rows:

  1 select empno, ename, sal, deptno  2   from emp  3  where empno in  4        (  5 select substring_index(  6        substring_index(list.vals,',',iter.pos),',',-1) empno  6   from (select id pos from t10) as iter,  7        (select '7654,7698,7782,7788' as vals  8           from t1) list  9  where iter.pos <= 10        (length(list.vals)-length(replace(list.vals,',','')))+1 11        ) x

Oracle

By walking the string passed to the IN-list, you can you can easily convert it to rows. The functions ROWNUM, SUBSTR, and INSTR are particularly useful here:

  1 select empno,ename,sal,deptno  2   from emp  3  where empno in (  4        select to_number(  5                   rtrim(  6                 substr(emps,  7                  instr(emps,',',1,iter.pos)+1,  8                  instr(emps,',',1,iter.pos+1)  9                  instr(emps,',',1,iter.pos)),',')) emps 10          from (select ','||'7654,7698,7782,7788'||',' emps from t1) csv, 11               (select rownum pos from emp) iter 12         where iter.pos <= ((length(csv.emps)- 13                   length(replace(csv.emps,',')))/length(','))-1 14 )

Postgres

By walking the string passed to the IN-list, you can easily convert it to rows. The function SPLIT_PART makes it easy to parse the string into individual numbers:

  1 select ename,sal,deptno  2   from emp  3  where empno in (  4 select cast(empno as integer) as empno  5   from (  6 select split_part(list.vals,',',iter.pos) as empno  7   from (select id as pos from t10) iter,  8        (select ','||'7654,7698,7782,7788'||',' as vals  9           from t1) list 10  where iter.pos <= 11        length(list.vals)-length(replace(list.vals,',','')) 12        ) z 13  where length(empno) > 0 14        ) x

SQL Server

By walking the string passed to the IN-list, you can you can easily convert it to rows. The functions ROW_NUMBER, CHARINDEX, and SUBSTRING are particularly useful here:

  1 select empno,ename,sal,deptno  2   from emp  3  where empno in (select substring(c,2,charindex(',',c,2)-2) as empno  4   from (  5 select substring(csv.emps,iter.pos,len(csv.emps)) as c  6   from (select ','+'7654,7698,7782,7788'+',' as emps  7           from t1) csv,  8        (select id as pos  9          from t100) iter 10  where iter.pos <= len(csv.emps) 11       ) x 12  where len(c) > 1 13    and substring(c,1,1) = ',' 14       ) y

Discussion

The first and most important step in this solution is to walk the string. Once you've accomplished that, all that's left is to parse the string into individual, numeric values using your DBMS's provided functions.

DB2 and SQL Server

The inline view X (lines 611) walks the string. The idea in this solution is to "walk through" the string, so that each row has one less character than the one before it:

 ,7654,7698,7782,7788, 7654,7698,7782,7788, 654,7698,7782,7788, 54,7698,7782,7788, 4,7698,7782,7788, ,7698,7782,7788, 7698,7782,7788, 698,7782,7788, 98,7782,7788, 8,7782,7788, ,7782,7788, 7782,7788, 782,7788, 82,7788, 2,7788, ,7788, 7788, 788, 88, 8, ,

Notice that by enclosing the string in commas (the delimiter), there's no need to make special checks as to where the beginning or end of the string is.

The next step is to keep only the values you want to use in the IN-list. The values to keep are the ones with leading commas, with the exception of the last row with its lone comma. Use SUBSTR or SUBSTRING to identify which rows have a leading comma, then keep all characters found before the next comma in that row. Once that's done, cast the string to a number so it can be properly evaluated against the numeric column EMPNO (lines 414):

  EMPNO ------   7654   7698   7782   7788

The final step is to use the results in a subquery to return the desired rows.

MySQL

The inline view (lines 59) walks the string. The expression on line 10 determines how many values are in the string by finding the number of commas (the delimiter) and adding one. The function SUBSTRING_INDEX (line 6) returns all characters in the string before (to the left of ) the nth occurrence of a comma (the delimiter):

 +---------------------+ | empno               | +---------------------+ | 7654                | | 7654,7698           | | 7654,7698,7782      | | 7654,7698,7782,7788 | +---------------------+

Those rows are then passed to another call to SUBSTRING_INDEX (line 5); this time the nth occurrence of the delimited is 1, which causes all values to the right of the nth occurrence of the delimiter to be kept:

 +-------+ | empno | +-------+ | 7654  | | 7698  | | 7782  | | 7788  | +-------+

The final step is to plug the results into a subquery.

Oracle

The first step is to walk the string:

  select emps,pos   from (select ','||'7654,7698,7782,7788'||',' emps           from t1) csv,        (select rownum pos from emp) iter  where iter.pos <=  ((length(csv.emps)-length(replace(csv.emps,',')))/length(','))-1 EMPS                         POS --------------------- ---------- ,7654,7698,7782,7788,          1 ,7654,7698,7782,7788,          2 ,7654,7698,7782,7788,          3 ,7654,7698,7782,7788,          4

The number of rows returned represents the number of values in your list. The values for POS are crucial to the query as they are needed to parse the string into individual values. The strings are parsed using SUBSTR and INSTR. POS is used to locate the nth occurrence of the delimiter in each string. By enclosing the strings in commas, no special checks are necessary to determine the beginning or end of a string. The values passed to SUBSTR, INSTR (lines 79) locate the nth and nth+1 occurrence of the delimiter. By subtracting the value returned for the current comma (the location in the string where the current comma is) from the value returned bythe next comma (the location in the string where the next comma is) you can extract each value from the string:

  select substr(emps,        instr(emps,',',1,iter.pos)+1,        instr(emps,',',1,iter.pos+1)        instr(emps,',',1,iter.pos)) emps   from (select ','||'7654,7698,7782,7788'||',' emps           from t1) csv,        (select rownum pos from emp) iter  where iter.pos <=   ((length(csv.emps)-length(replace(csv.emps,',')))/length(','))-1  EMPS  -----------  7654,  7698,  7782,  7788,

The final step is to remove the trailing comma from each value, cast it to a number, and plug it into a subquery.

PostgreSQL

The inline view Z (lines 69) walks the string. The number of rows returned is determined by how many values are in the string. To find the number of values in the string, subtract the size of the string without the delimiter from the size of the string with the delimiter (line 9). The function SPLIT_PART does the work of parsing the string. It looks for the value that comes before the nth occurrence of the delimiter:

  select list.vals,        split_part(list.vals,',',iter.pos) as empno,        iter.pos   from (select id as pos from t10) iter,        (select ','||'7654,7698,7782,7788'||',' as vals           from t1) list  where iter.pos <=        length(list.vals)-length(replace(list.vals,',',''))      vals             | empno | pos ----------------------+-------+----- ,7654,7698,7782,7788, |       |   1 ,7654,7698,7782,7788, | 7654  |   2 ,7654,7698,7782,7788, | 7698  |   3 ,7654,7698,7782,7788, | 7782  |   4 ,7654,7698,7782,7788, | 7788  |   5

The final step is to cast the values (EMPNO) to a number and plug it into a subquery.