Recipe 6.9. Ordering by a Number in a StringProblemYou want order your result set based on a number within a string. Consider the following view: create view V as select e.ename ||' '|| cast(e.empno as char(4))||' '|| d.dname as data from emp e, dept d where e.deptno=d.deptno This view returns the following data: DATA ---------------------------- CLARK 7782 ACCOUNTING KING 7839 ACCOUNTING MILLER 7934 ACCOUNTING SMITH 7369 RESEARCH JONES 7566 RESEARCH SCOTT 7788 RESEARCH ADAMS 7876 RESEARCH FORD 7902 RESEARCH ALLEN 7499 SALES WARD 7521 SALES MARTIN 7654 SALES BLAKE 7698 SALES TURNER 7844 SALES JAMES 7900 SALES You want to order the results based on the employee number, which falls between the employee name and respective department: DATA --------------------------- SMITH 7369 RESEARCH ALLEN 7499 SALES WARD 7521 SALES JONES 7566 RESEARCH MARTIN 7654 SALES BLAKE 7698 SALES CLARK 7782 ACCOUNTING SCOTT 7788 RESEARCH KING 7839 ACCOUNTING TURNER 7844 SALES ADAMS 7876 RESEARCH JAMES 7900 SALES FORD 7902 RESEARCH MILLER 7934 ACCOUNTING SolutionEach solution uses functions and syntax specific to its DBMS, but the method (making use of the built-in functions REPLACE and TRANSLATE) is the same for each. The idea is to use REPLACE and TRANSLATE to remove non-digits from the strings, leaving only the numeric values upon which to sort. DB2Use the built-in functions REPLACE and TRANSLATE to order by numeric characters in a string: 1 select data 2 from V 3 order by 4 cast( 5 replace( 6 translate(data,repeat('#',length(data)), 7 replace( 8 translate(data,'##########','0123456789'), 9 '#','')),'#','') as integer) OracleUse the built-in functions REPLACE and TRANSLATE to order by numeric characters in a string: 1 select data 2 from V 3 order by 4 to_number( 5 replace( 6 translate(data, 7 replace( 8 translate(data,'0123456789','##########'), 9 '#'),rpad('#',20,'#')),'#')) PostgreSQLUse the built-in functions REPLACE and TRANSLATE to order by numeric characters in a string: 1 select data 2 from V 3 order by 4 cast( 5 replace( 6 translate(data, 7 replace( 8 translate(data,'0123456789','##########'), 9 '#',''),rpad('#',20,'#')),'#','') as integer) MySQL and SQL ServerAs of the time of this writing, neither vendor supplies the TRANSLATE function. DiscussionThe purpose of view V is only to supply rows on which to demonstrate this recipe's solution. The view simply concatenates several columns from the EMP table. The solution shows how to take such concatenated text as input and sort it by the employee number embedded within. The ORDER BY clause in each solution may look a bit intimidating but performs quite well and is pretty straightforward once you examine it piece by piece. To order by the numbers in the string, it's easiest to remove any characters that are not numbers. Once the non-numeric characters are removed all that is left to do is cast the string of numerals into a number, then sort as you see fit. Before examining each function call it is important to understand the order in which each function is called. Starting with the innermost call, TRANSLATE (line 8 from each of the original solutions), you see that:
The first step is to convert the numbers into characters that do not exist in the rest of the string. For this example, I chose "#" and used TRANSLATE to convert all non-numeric characters into occurrences of "#". For example, the following query shows the original data on the left and the results from the first translation: select data, translate(data,'0123456789','##########') as tmp from V DATA TMP ------------------------------ ----------------------- CLARK 7782 ACCOUNTING CLARK #### ACCOUNTING KING 7839 ACCOUNTING KING #### ACCOUNTING MILLER 7934 ACCOUNTING MILLER #### ACCOUNTING SMITH 7369 RESEARCH SMITH #### RESEARCH JONES 7566 RESEARCH JONES #### RESEARCH SCOTT 7788 RESEARCH SCOTT #### RESEARCH ADAMS 7876 RESEARCH ADAMS #### RESEARCH FORD 7902 RESEARCH FORD #### RESEARCH ALLEN 7499 SALES ALLEN #### SALES WARD 7521 SALES WARD #### SALES MARTIN 7654 SALES MARTIN #### SALES BLAKE 7698 SALES BLAKE #### SALES TURNER 7844 SALES TURNER #### SALES JAMES 7900 SALES JAMES #### SALES TRANSLATE finds the numerals in each string and converts each one to to the "#" character. The modified strings are then returned to REPLACE (line 11), which removes all occurrences of "#": select data, replace( translate(data,'0123456789','##########'),'#') as tmp from V DATA TMP ------------------------------ ------------------- CLARK 7782 ACCOUNTING CLARK ACCOUNTING KING 7839 ACCOUNTING KING ACCOUNTING MILLER 7934 ACCOUNTING MILLER ACCOUNTING SMITH 7369 RESEARCH SMITH RESEARCH JONES 7566 RESEARCH JONES RESEARCH SCOTT 7788 RESEARCH SCOTT RESEARCH ADAMS 7876 RESEARCH ADAMS RESEARCH FORD 7902 RESEARCH FORD RESEARCH ALLEN 7499 SALES ALLEN SALES WARD 7521 SALES WARD SALES MARTIN 7654 SALES MARTIN SALES BLAKE 7698 SALES BLAKE SALES TURNER 7844 SALES TURNER SALES JAMES 7900 SALES JAMES SALES The strings are then returned to TRANSLATE once again, but this time it's the second (outermost) TRANSLATE in the solution. TRANSLATE searches the original string for any characters that match the characters in TMP. If any are found, they too are converted to "#"s. This conversion allows all non-numeric characters to be treated as a single character (because they are all transformed to the same character): select data, translate(data, replace( translate(data,'0123456789','##########'), '#'), rpad('#',length(data),'#')) as tmp from V DATA TMP ------------------------------ --------------------------- CLARK 7782 ACCOUNTING ########7782########### KING 7839 ACCOUNTING ########7839########### MILLER 7934 ACCOUNTING ########7934########### SMITH 7369 RESEARCH ########7369######### JONES 7566 RESEARCH ########7566######### SCOTT 7788 RESEARCH ########7788######### ADAMS 7876 RESEARCH ########7876######### FORD 7902 RESEARCH ########7902######### ALLEN 7499 SALES ########7499###### WARD 7521 SALES ########7521###### MARTIN 7654 SALES ########7654###### BLAKE 7698 SALES ########7698###### TURNER 7844 SALES ########7844###### JAMES 7900 SALES ########7900###### The next step is to remove all "#" characters through a call to REPLACE (line 8), leaving you with only numbers: select data, replace( translate(data, replace( translate(data,'0123456789','##########'), '#'), rpad('#',length(data),'#')),'#') as tmp from V DATA TMP ------------------------------ ----------- CLARK 7782 ACCOUNTING 7782 KING 7839 ACCOUNTING 7839 MILLER 7934 ACCOUNTING 7934 SMITH 7369 RESEARCH 7369 JONES 7566 RESEARCH 7566 SCOTT 7788 RESEARCH 7788 ADAMS 7876 RESEARCH 7876 FORD 7902 RESEARCH 7902 ALLEN 7499 SALES 7499 WARD 7521 SALES 7521 MARTIN 7654 SALES 7654 BLAKE 7698 SALES 7698 TURNER 7844 SALES 7844 JAMES 7900 SALES 7900 Finally, cast TMP to a number (line 4) using the appropriate DBMS function (often CAST) to accomplish this: select data, to_number( replace( translate(data, replace( translate(data,'0123456789','##########'), '#'), rpad('#',length(data),'#')),'#')) as tmp from V DATA TMP ------------------------------ ---------- CLARK 7782 ACCOUNTING 7782 KING 7839 ACCOUNTING 7839 MILLER 7934 ACCOUNTING 7934 SMITH 7369 RESEARCH 7369 JONES 7566 RESEARCH 7566 SCOTT 7788 RESEARCH 7788 ADAMS 7876 RESEARCH 7876 FORD 7902 RESEARCH 7902 ALLEN 7499 SALES 7499 WARD 7521 SALES 7521 MARTIN 7654 SALES 7654 BLAKE 7698 SALES 7698 TURNER 7844 SALES 7844 JAMES 7900 SALES 7900 When developing queries like this, it's helpful to work with your expressions in the SELECT list. That way, you can easily view the intermediate results as you work toward a final solution. However, because the point of this recipe is to order the results, ultimately you should place all the function calls into the ORDER BY clause: select data from V order by to_number( replace( translate( data, replace( translate( data,'0123456789','##########'), '#'),rpad('#',length(data),'#')),'#')) DATA --------------------------- SMITH 7369 RESEARCH ALLEN 7499 SALES WARD 7521 SALES JONES 7566 RESEARCH MARTIN 7654 SALES BLAKE 7698 SALES CLARK 7782 ACCOUNTING SCOTT 7788 RESEARCH KING 7839 ACCOUNTING TURNER 7844 SALES ADAMS 7876 RESEARCH JAMES 7900 SALES FORD 7902 RESEARCH MILLER 7934 ACCOUNTING As a final note, the data in the view is comprised of three fields, only one being numeric. Keep in mind that if there had been multiple numeric fields, they would have all been concatenated into one number before the rows were sorted. |