Recipe6.12.Alphabetizing a String | SQL Cookbook (Cookbooks (OReilly))

Recipe 6.12. Alphabetizing a String

Problem

You want alphabetize the individual characters within strings in your tables. Consider the following result set:

 ENAME ---------- ADAMS ALLEN BLAKE CLARK FORD JAMES JONES KING MARTIN MILLER SCOTT SMITH TURNER WARD

You would like the result to be:

 OLD_NAME   NEW_NAME ---------- -------- ADAMS      AADMS ALLEN      AELLN BLAKE      ABEKL CLARK      ACKLR FORD       DFOR JAMES      AEJMS JONES      EJNOS KING       GIKN MARTIN     AIMNRT MILLER     EILLMR SCOTT      COSTT SMITH      HIMST TURNER     ENRRTU WARD       ADRW

Solution

This problem is a perfect example of why it is crucial to understand your DBMS and what functionality is available to you. In situations where your DBMS does not provide built-in functions to facilitate this solution, you need to come up with something creative. Compare the MySQL solution with the rest.

DB2

To alphabetize rows of strings it is necessary to walk each string then order its characters:

  1 select ename,  2        max(case when pos=1 then c else '' end)||  3        max(case when pos=2 then c else '' end)||  4        max(case when pos=3 then c else '' end)||  5        max(case when pos=4 then c else '' end)||  6        max(case when pos=5 then c else '' end)||  7        max(case when pos=6 then c else '' end)  8   from (  9 select e.ename, 10        cast(substr(e.ename,iter.pos,1) as varchar(100)) c, 11        cast(row_number( )over(partition by e.ename 12                               order by substr(e.ename,iter.pos,1)) 13            as integer) pos 14   from emp e, 15        (select cast(row_number( )over( ) as integer) pos 16           from emp) iter 17  where iter.pos <= length(e.ename) 18        ) x 19  group by ename

MySQL

The key here is the GROUP_CONCAT function, which allows you to not only concatenate the characters that make up each name but also order them:

 1 select ename, group_concat(c order by c separator '') 2   from ( 3 select ename, substr(a.ename,iter.pos,1) c 4   from emp a, 5        ( select id pos from t10 ) iter 6  where iter.pos <= length(a.ename) 7        ) x 8  group by ename

Oracle

The function SYS_CONNECT_BY_PATH allows you to iteratively build a list:

  1 select old_name, new_name  2   from (  3 select old_name, replace(sys_connect_by_path(c,' '),' ') new_name  4   from (  5 select e.ename old_name,  6        row_number() over(partition by e.ename  7                         order by substr(e.ename,iter.pos,1)) rn,  8        substr(e.ename,iter.pos,1) c  9   from emp e, 10        ( select rownum pos from emp ) iter 11  where iter.pos <= length(e.ename) 12  order by 1 13         ) x 14  start with rn = 1 15 connect by prior rn = rn-1 and prior old_name = old_name 16         ) 17  where length(old_name) = length(new_name)

PostgreSQL

PostgreSQL does not offer any built-in functions to easily sort characters in a string, so it is necessary not only to walk through each string but also to know in advance the largest length of any one name. View V is used in this solution for readability:

  create or replace view V as  select x.*    from (  select a.ename,         substr(a.ename,iter.pos,1) as c    from emp a,         (select id as pos from t10) iter   where iter.pos <= length(a.ename)   order by 1,2         ) x

The following select statement leverages the view:

  1 select ename,  2        max(case when pos=1 then  3                 case when cnt=1 then c  4                      else rpad(c,cast(cnt as integer),c)  5                 end  6                 else ''  7             end)||  8        max(case when pos=2 then  9                 case when cnt=1 then c 10                      else rpad(c,cast(cnt as integer),c) 11                 end 12                 else '' 13             end)|| 14        max(case when pos=3 then 15                 case when cnt=1 then c 16                      else rpad(c,cast(cnt as integer),c) 17                 end 18                 else '' 19             end)|| 20        max(case when pos=4 then 21                 case when cnt=1 then c 22                      else rpad(c,cast(cnt as integer),c) 23                 end 24                 else '' 25             end)|| 26        max(case when pos=5 then 27                 case when cnt=1 then c 28                      else rpad(c,cast(cnt as integer),c) 29                 end 30                 else '' 31             end)|| 32        max(case when pos=6 then 33                 case when cnt=1 then c 34                      else rpad(c,cast(cnt as integer),c) 35                 end 36                 else '' 37             end) 38   from ( 39 select a.ename, a.c, 40        (select count(*) 41           from v b 42          where a.ename=b.ename and a.c=b.c ) as cnt, 43        (select count(*)+1 44           from v b 45          where a.ename=b.ename and b.c<a.c) as pos 46   from v a 47        ) x 48  group by ename

SQL Server

To alphabetize rows of strings it is necessary to walk each string, and then order their characters:

  1 select ename,  2           max(case when pos=1 then c else '' end)+  3           max(case when pos=2 then c else '' end)+  4           max(case when pos=3 then c else '' end)+  5           max(case when pos=4 then c else '' end)+  6           max(case when pos=5 then c else '' end)+  7           max(case when pos=6 then c else '' end)  8      from (  9    select e.ename, 10         substring(e.ename,iter.pos,1) as c, 11         row_number() over ( 12          partition by e.ename 13              order by substring(e.ename,iter.pos,1)) as pos 14    from emp e, 15         (select row_number()over(order by ename) as pos 16            from emp) iter 17   where iter.pos <= len(e.ename) 18          ) x 19   group by ename

Discussion

DB2 and SQL Server

The inline view X returns each character in each name as a row. The function SUBSTR or SUBSTRING extracts each character from each name, and the function ROW_NUMBER ranks each character alphabetically:

 ENAME  C  POS -----  -  --- ADAMS  A  1 ADAMS  A  2 ADAMS  D  3 ADAMS  M  4 ADAMS  S  5 …

To return each letter of a string as a row, you must walk the string. This is accomplished with inline view ITER.

Now that the letters in each name have been alphabetized, the last step is to put those letters back together, into a string, in the order they are ranked. Each letter's position is evaluated by the CASE statements (lines 27). If a character is found at a particular position it is then concatenated to the result of the next evaluation (the following CASE statement). Because the aggregate function MAX is used as well, only one character per position POS is returned, so that only one row per name is returned. The CASE evaluation goes up to the number 6, which is the maximum number of characters in any name in table EMP.

MySQL

The inline view X (lines 36) returns each character in each name as a row. The function SUBSTR extracts each character from each name:

 ENAME  C -----  - ADAMS  A ADAMS  A ADAMS  D ADAMS  M ADAMS  S …

Inline view ITER is used to walk the string. From there, the rest of the work is done by the GROUP_CONCAT function. By specifying an order, the function not only concatenates each letter, it does so alphabetically.

Oracle

The real work is done by inline view X (lines 511), where the characters in each name are extracted and put into alphabetical order. This is accomplished by walking the string, then imposing order on those characters. The rest of the query merely glues the names back together.

The tearing apart of names can be seen by executing only inline view X:

 OLD_NAME          RN C ---------- --------- - ADAMS              1 A ADAMS              2 A ADAMS              3 D ADAMS              4 M ADAMS              5 S …

The next step is to take the alphabetized characters and rebuild each name. This is done with the function SYS_CONNECT_BY_PATH by appending each character to the ones before it:

 OLD_NAME   NEW_NAME ---------- --------- ADAMS      A ADAMS      AA ADAMS      AAD ADAMS      AADM ADAMS      AADMS …

The final step is to keep only the strings that have the same length as the names they were built from.

PostgreSQL

For readability, view V is used in this solution to walk the string. The function SUBSTR, in the view definition, extracts each character from each name so that the view returns:

 ENAME C ----- - ADAMS A ADAMS A ADAMS D ADAMS M ADAMS S …

The view also orders the results by ENAME and by each letter in each name. The inline view X (lines 1518) returns the names and characters from view V, the number of times each character occurs in each name, and its position (alphabetically):

 ename | c | cnt | pos ------+---+-----+----- ADAMS | A |   2 |   1 ADAMS | A |   2 |   1 ADAMS | D |   1 |   3 ADAMS | M |   1 |   4 ADAMS | S |   1 |   5

The extra columns CNT and POS, returned by the inline view X, are crucial to the solution. POS is used to rank each character and CNT is used to determine the number of times the character exists in each name. The final step is to evaluate the position of each character and rebuild the name. You'll notice that each case statement is actually two case statements. This is to determine whether or not a character occursmore than once in a name; if it does, then rather than return that character, what is returned is that character appended to itself CNT times. The aggregate function, MAX, is used to ensure there is only one row per name.