Recipe 6.5. Separating Numeric and Character DataProblemYou have (unfortunately) stored numeric data along with character data together in one column. You want to separate the character data from the numeric data. Consider the following result set: DATA --------------- SMITH800 ALLEN1600 WARD1250 JONES2975 MARTIN1250 BLAKE2850 CLARK2450 SCOTT3000 KING5000 TURNER1500 ADAMS1100 JAMES950 FORD3000 MILLER1300 You would like the result to be: ENAME SAL ---------- ---------- SMITH 800 ALLEN 1600 WARD 1250 JONES 2975 MARTIN 1250 BLAKE 2850 CLARK 2450 SCOTT 3000 KING 5000 TURNER 1500 ADAMS 1100 JAMES 950 FORD 3000 MILLER 1300 SolutionUse the built-in functions TRANSLATE and REPLACE to isolate the character from the numeric data. Like other recipes in this chapter, the trick is to use TRANSLATE to transform multiple characters into a single character you can reference. This way you are no longer searching for multiple numbers or characters, rather one character to represent all numbers or one character to represent all characters. DB2Use the functions TRANSLATE and REPLACE to isolate and separate the numeric from the character data: 1 select replace( 2 translate(data,'0000000000','0123456789'),'0','') ename, 3 cast( 4 replace( 5 translate(lower(data),repeat('z',26), 6 'abcdefghijklmnopqrstuvwxyz'),'z','') as integer) sal 7 from ( 8 select ename||cast(sal as char(4)) data 9 from emp 10 ) x OracleUse the functions TRANSLATE and REPLACE to isolate and separate the numeric from the character data: 1 select replace( 2 translate(data,'0123456789','0000000000'),'0') ename, 3 to_number( 5 replace( 6 translate(lower(data), 7 'abcdefghijklmnopqrstuvwxyz', 8 rpad('z',26,'z')),'z')) sal 9 from ( 10 select ename||sal data 11 from emp 12 ) PostgreSQLUse the functions TRANSLATE and REPLACE to isolate and separate the numeric from the character data: 1 select replace( 2 translate(data,'0123456789','0000000000'),'0','') as ename, 3 cast( 4 replace( 5 translate(lower(data), 6 'abcdefghijklmnopqrstuvwxyz', 7 rpad('z',26,'z')),'z','') as integer) as sal 8 from ( 9 select ename||sal as data 10 from emp 11 ) x DiscussionThe syntax is a bit different for each DBMS, but the technique is the same. I will use the solution for Oracle in the discussion section. The key to solving this problem is to isolate the numeric and character data. You can use TRANSLATE and REPLACE to do this. To extract the numeric data, first isolate all character data using TRANSLATE: select data, translate(lower(data), 'abcdefghijklmnopqrstuvwxyz', rpad('z',26,'z')) sal from (select ename||sal data from emp) DATA SAL -------------------- ------------------- SMITH800 zzzzz800 ALLEN1600 zzzzz1600 WARD1250 zzzz1250 JONES2975 zzzzz2975 MARTIN1250 zzzzzz1250 BLAKE2850 zzzzz2850 CLARK2450 zzzzz2450 SCOTT3000 zzzzz3000 KING5000 zzzz5000 TURNER1500 zzzzzz1500 ADAMS1100 zzzzz1100 JAMES950 zzzzz950 FORD3000 zzzz3000 MILLER1300 zzzzzz1300 By using TRANSLATE you convert every non-numeric character into a lowercase Z. The next step is to remove all instances of lowercase Z from each record using REPLACE, leaving only numerical characters that can then be cast to a number: select data, to_number( replace( translate(lower(data), 'abcdefghijklmnopqrstuvwxyz', rpad('z',26,'z')),'z')) sal from (select ename||sal data from emp) DATA SAL -------------------- ---------- SMITH800 800 ALLEN1600 1600 WARD1250 1250 JONES2975 2975 MARTIN1250 1250 BLAKE2850 2850 CLARK2450 2450 SCOTT3000 3000 KING5000 5000 TURNER1500 1500 ADAMS1100 1100 JAMES950 950 FORD3000 3000 MILLER1300 1300 To extract the non-numeric characters, isolate the numeric characters using TRANSLATE: select data, translate(data,'0123456789','0000000000') ename from (select ename||sal data from emp) DATA ENAME -------------------- ---------- SMITH800 SMITH000 ALLEN1600 ALLEN0000 WARD1250 WARD0000 JONES2975 JONES0000 MARTIN1250 MARTIN0000 BLAKE2850 BLAKE0000 CLARK2450 CLARK0000 SCOTT3000 SCOTT0000 KING5000 KING0000 TURNER1500 TURNER0000 ADAMS1100 ADAMS0000 JAMES950 JAMES000 FORD3000 FORD0000 MILLER1300 MILLER0000 By using TRANSLATE you convert every numeric character into a zero. The next step is to remove all instances of zero from each record using REPLACE, leaving only non-numeric characters: select data, replace(translate(data,'0123456789','0000000000'),'0') ename from (select ename||sal data from emp) DATA ENAME -------------------- ------- SMITH800 SMITH ALLEN1600 ALLEN WARD1250 WARD JONES2975 JONES MARTIN1250 MARTIN BLAKE2850 BLAKE CLARK2450 CLARK SCOTT3000 SCOTT KING5000 KING TURNER1500 TURNER ADAMS1100 ADAMS JAMES950 JAMES FORD3000 FORD MILLER1300 MILLER Put the two techniques together and you have your solution. |