Recipe6.9.Ordering by a Number in a String


Recipe 6.9. Ordering by a Number in a String

Problem

You want order your result set based on a number within a string. Consider the following view:

 create view V as select e.ename ||' '||         cast(e.empno as char(4))||' '||         d.dname as data   from emp e, dept d  where e.deptno=d.deptno 

This view returns the following data:

 DATA  ----------------------------  CLARK   7782 ACCOUNTING  KING    7839 ACCOUNTING  MILLER  7934 ACCOUNTING  SMITH   7369 RESEARCH  JONES   7566 RESEARCH  SCOTT   7788 RESEARCH  ADAMS   7876 RESEARCH  FORD    7902 RESEARCH  ALLEN   7499 SALES  WARD    7521 SALES  MARTIN  7654 SALES  BLAKE   7698 SALES  TURNER  7844 SALES  JAMES   7900 SALES 

You want to order the results based on the employee number, which falls between the employee name and respective department:

 DATA --------------------------- SMITH    7369 RESEARCH ALLEN    7499 SALES WARD     7521 SALES JONES    7566 RESEARCH MARTIN   7654 SALES BLAKE    7698 SALES CLARK    7782 ACCOUNTING SCOTT    7788 RESEARCH KING     7839 ACCOUNTING TURNER   7844 SALES ADAMS    7876 RESEARCH JAMES    7900 SALES FORD     7902 RESEARCH MILLER   7934 ACCOUNTING 

Solution

Each solution uses functions and syntax specific to its DBMS, but the method (making use of the built-in functions REPLACE and TRANSLATE) is the same for each. The idea is to use REPLACE and TRANSLATE to remove non-digits from the strings, leaving only the numeric values upon which to sort.

DB2

Use the built-in functions REPLACE and TRANSLATE to order by numeric characters in a string:

 1 select data 2   from V 3  order by 4         cast( 5      replace( 6    translate(data,repeat('#',length(data)), 7      replace( 8    translate(data,'##########','0123456789'), 9             '#','')),'#','') as integer) 

Oracle

Use the built-in functions REPLACE and TRANSLATE to order by numeric characters in a string:

 1 select data 2   from V 3  order by 4         to_number( 5           replace( 6         translate(data, 7           replace( 8         translate(data,'0123456789','##########'), 9                  '#'),rpad('#',20,'#')),'#')) 

PostgreSQL

Use the built-in functions REPLACE and TRANSLATE to order by numeric characters in a string:

 1 select data 2   from V 3  order by 4         cast( 5      replace( 6    translate(data, 7      replace( 8    translate(data,'0123456789','##########'), 9             '#',''),rpad('#',20,'#')),'#','') as integer) 

MySQL and SQL Server

As of the time of this writing, neither vendor supplies the TRANSLATE function.

Discussion

The purpose of view V is only to supply rows on which to demonstrate this recipe's solution. The view simply concatenates several columns from the EMP table. The solution shows how to take such concatenated text as input and sort it by the employee number embedded within.

The ORDER BY clause in each solution may look a bit intimidating but performs quite well and is pretty straightforward once you examine it piece by piece. To order by the numbers in the string, it's easiest to remove any characters that are not numbers. Once the non-numeric characters are removed all that is left to do is cast the string of numerals into a number, then sort as you see fit. Before examining each function call it is important to understand the order in which each function is called. Starting with the innermost call, TRANSLATE (line 8 from each of the original solutions), you see that:

  1. TRANSLATE (line 8) is called and the results are returned to

  2. REPLACE (line 7) and those results are returned to

  3. TRANSLATE (line 6) and those results are returned to

  4. REPLACE (line 5) and those results are returned and finally

  5. cast into a number

The first step is to convert the numbers into characters that do not exist in the rest of the string. For this example, I chose "#" and used TRANSLATE to convert all non-numeric characters into occurrences of "#". For example, the following query shows the original data on the left and the results from the first translation:

  select data,        translate(data,'0123456789','##########') as tmp   from V DATA                           TMP  ------------------------------ -----------------------  CLARK   7782 ACCOUNTING        CLARK   #### ACCOUNTING  KING    7839 ACCOUNTING        KING    #### ACCOUNTING  MILLER  7934 ACCOUNTING        MILLER  #### ACCOUNTING  SMITH   7369 RESEARCH          SMITH   #### RESEARCH  JONES   7566 RESEARCH          JONES   #### RESEARCH  SCOTT   7788 RESEARCH          SCOTT   #### RESEARCH  ADAMS   7876 RESEARCH          ADAMS   #### RESEARCH  FORD    7902 RESEARCH          FORD    #### RESEARCH  ALLEN   7499 SALES             ALLEN   #### SALES  WARD    7521 SALES             WARD    #### SALES  MARTIN  7654 SALES             MARTIN  #### SALES  BLAKE   7698 SALES             BLAKE   #### SALES  TURNER 7844 SALES          TURNER  #### SALES  JAMES  7900 SALES          JAMES   #### SALES 

TRANSLATE finds the numerals in each string and converts each one to to the "#" character. The modified strings are then returned to REPLACE (line 11), which removes all occurrences of "#":

  select data, replace( translate(data,'0123456789','##########'),'#') as tmp   from V DATA                           TMP  ------------------------------ -------------------  CLARK   7782 ACCOUNTING        CLARK   ACCOUNTING  KING    7839 ACCOUNTING        KING    ACCOUNTING  MILLER  7934 ACCOUNTING        MILLER  ACCOUNTING  SMITH   7369 RESEARCH          SMITH   RESEARCH  JONES   7566 RESEARCH          JONES   RESEARCH  SCOTT   7788 RESEARCH          SCOTT   RESEARCH  ADAMS   7876 RESEARCH          ADAMS   RESEARCH  FORD    7902 RESEARCH          FORD    RESEARCH  ALLEN   7499 SALES             ALLEN   SALES  WARD    7521 SALES             WARD    SALES  MARTIN  7654 SALES             MARTIN  SALES  BLAKE   7698 SALES             BLAKE   SALES  TURNER  7844 SALES             TURNER  SALES  JAMES   7900 SALES             JAMES   SALES 

The strings are then returned to TRANSLATE once again, but this time it's the second (outermost) TRANSLATE in the solution. TRANSLATE searches the original string for any characters that match the characters in TMP. If any are found, they too are converted to "#"s. This conversion allows all non-numeric characters to be treated as a single character (because they are all transformed to the same character):

  select data, translate(data,              replace(              translate(data,'0123456789','##########'),              '#'),              rpad('#',length(data),'#')) as tmp   from V DATA                           TMP ------------------------------ --------------------------- CLARK   7782 ACCOUNTING        ########7782########### KING    7839 ACCOUNTING        ########7839########### MILLER  7934 ACCOUNTING        ########7934########### SMITH   7369 RESEARCH          ########7369######### JONES   7566 RESEARCH          ########7566######### SCOTT   7788 RESEARCH          ########7788######### ADAMS   7876 RESEARCH          ########7876######### FORD    7902 RESEARCH          ########7902######### ALLEN   7499 SALES             ########7499###### WARD    7521 SALES             ########7521###### MARTIN  7654 SALES             ########7654###### BLAKE   7698 SALES             ########7698###### TURNER  7844 SALES             ########7844###### JAMES   7900 SALES             ########7900###### 

The next step is to remove all "#" characters through a call to REPLACE (line 8), leaving you with only numbers:

  select data, replace(              translate(data,              replace(            translate(data,'0123456789','##########'),                      '#'),                      rpad('#',length(data),'#')),'#') as tmp   from V DATA                           TMP ------------------------------ ----------- CLARK   7782 ACCOUNTING        7782 KING    7839 ACCOUNTING        7839 MILLER  7934 ACCOUNTING        7934 SMITH   7369 RESEARCH          7369 JONES   7566 RESEARCH          7566 SCOTT   7788 RESEARCH          7788 ADAMS   7876 RESEARCH          7876 FORD    7902 RESEARCH          7902 ALLEN   7499 SALES             7499 WARD    7521 SALES             7521 MARTIN  7654 SALES             7654 BLAKE   7698 SALES             7698 TURNER  7844 SALES             7844 JAMES   7900 SALES             7900 

Finally, cast TMP to a number (line 4) using the appropriate DBMS function (often CAST) to accomplish this:

  select data, to_number(               replace(              translate(data,              replace(        translate(data,'0123456789','##########'),                  '#'),                  rpad('#',length(data),'#')),'#')) as tmp   from V DATA                                  TMP ------------------------------ ---------- CLARK   7782 ACCOUNTING              7782 KING    7839 ACCOUNTING              7839 MILLER  7934 ACCOUNTING              7934 SMITH   7369 RESEARCH                7369 JONES   7566 RESEARCH                7566 SCOTT   7788 RESEARCH                7788 ADAMS   7876 RESEARCH                7876 FORD    7902 RESEARCH                7902 ALLEN   7499 SALES                   7499 WARD    7521 SALES                   7521 MARTIN  7654 SALES                   7654 BLAKE   7698 SALES                   7698 TURNER  7844 SALES                   7844 JAMES   7900 SALES                   7900 

When developing queries like this, it's helpful to work with your expressions in the SELECT list. That way, you can easily view the intermediate results as you work toward a final solution. However, because the point of this recipe is to order the results, ultimately you should place all the function calls into the ORDER BY clause:

  select data   from V  order by         to_number(           replace(         translate( data,           replace(         translate( data,'0123456789','##########'),                   '#'),rpad('#',length(data),'#')),'#')) DATA --------------------------- SMITH   7369 RESEARCH ALLEN   7499 SALES WARD    7521 SALES JONES   7566 RESEARCH MARTIN  7654 SALES BLAKE   7698 SALES CLARK   7782 ACCOUNTING SCOTT   7788 RESEARCH KING    7839 ACCOUNTING TURNER  7844 SALES ADAMS   7876 RESEARCH JAMES   7900 SALES FORD    7902 RESEARCH MILLER  7934 ACCOUNTING 

As a final note, the data in the view is comprised of three fields, only one being numeric. Keep in mind that if there had been multiple numeric fields, they would have all been concatenated into one number before the rows were sorted.




SQL Cookbook
SQL Cookbook (Cookbooks (OReilly))
ISBN: 0596009763
EAN: 2147483647
Year: 2005
Pages: 235

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net