Recipe6.5.Separating Numeric and Character Data


Recipe 6.5. Separating Numeric and Character Data

Problem

You have (unfortunately) stored numeric data along with character data together in one column. You want to separate the character data from the numeric data. Consider the following result set:

 DATA --------------- SMITH800 ALLEN1600 WARD1250 JONES2975 MARTIN1250 BLAKE2850 CLARK2450 SCOTT3000 KING5000 TURNER1500 ADAMS1100 JAMES950 FORD3000 MILLER1300 

You would like the result to be:

 ENAME             SAL ---------- ---------- SMITH             800 ALLEN            1600 WARD             1250 JONES            2975 MARTIN           1250 BLAKE            2850 CLARK            2450 SCOTT            3000 KING             5000 TURNER           1500 ADAMS            1100 JAMES             950 FORD             3000 MILLER           1300 

Solution

Use the built-in functions TRANSLATE and REPLACE to isolate the character from the numeric data. Like other recipes in this chapter, the trick is to use TRANSLATE to transform multiple characters into a single character you can reference. This way you are no longer searching for multiple numbers or characters, rather one character to represent all numbers or one character to represent all characters.

DB2

Use the functions TRANSLATE and REPLACE to isolate and separate the numeric from the character data:

  1 select replace(  2      translate(data,'0000000000','0123456789'),'0','') ename,  3        cast(  4      replace(  5    translate(lower(data),repeat('z',26),  6           'abcdefghijklmnopqrstuvwxyz'),'z','') as integer) sal  7    from (  8  select ename||cast(sal as char(4)) data  9    from emp 10         ) x 

Oracle

Use the functions TRANSLATE and REPLACE to isolate and separate the numeric from the character data:

  1 select replace(  2      translate(data,'0123456789','0000000000'),'0') ename,  3      to_number(  5        replace(  6        translate(lower(data),  7                  'abcdefghijklmnopqrstuvwxyz',  8                   rpad('z',26,'z')),'z')) sal  9   from ( 10 select ename||sal data 11   from emp 12        ) 

PostgreSQL

Use the functions TRANSLATE and REPLACE to isolate and separate the numeric from the character data:

  1 select replace(  2      translate(data,'0123456789','0000000000'),'0','') as ename,  3           cast(  4        replace(  5      translate(lower(data),  6                'abcdefghijklmnopqrstuvwxyz',  7                rpad('z',26,'z')),'z','') as integer) as sal  8   from (  9 select ename||sal as data 10   from emp 11        ) x 

Discussion

The syntax is a bit different for each DBMS, but the technique is the same. I will use the solution for Oracle in the discussion section. The key to solving this problem is to isolate the numeric and character data. You can use TRANSLATE and REPLACE to do this. To extract the numeric data, first isolate all character data using TRANSLATE:

  select data,        translate(lower(data),                 'abcdefghijklmnopqrstuvwxyz',                 rpad('z',26,'z')) sal   from (select ename||sal data from emp) DATA                 SAL -------------------- ------------------- SMITH800             zzzzz800 ALLEN1600            zzzzz1600 WARD1250             zzzz1250 JONES2975            zzzzz2975 MARTIN1250           zzzzzz1250 BLAKE2850            zzzzz2850 CLARK2450            zzzzz2450 SCOTT3000            zzzzz3000 KING5000             zzzz5000 TURNER1500           zzzzzz1500 ADAMS1100            zzzzz1100 JAMES950             zzzzz950 FORD3000             zzzz3000 MILLER1300           zzzzzz1300 

By using TRANSLATE you convert every non-numeric character into a lowercase Z. The next step is to remove all instances of lowercase Z from each record using REPLACE, leaving only numerical characters that can then be cast to a number:

  select data,        to_number(          replace(        translate(lower(data),                  'abcdefghijklmnopqrstuvwxyz',                  rpad('z',26,'z')),'z')) sal   from (select ename||sal data from emp)  DATA                        SAL  -------------------- ----------  SMITH800                    800  ALLEN1600                  1600  WARD1250                   1250  JONES2975                  2975  MARTIN1250                 1250  BLAKE2850                  2850  CLARK2450                  2450  SCOTT3000                  3000  KING5000                   5000  TURNER1500                 1500  ADAMS1100                  1100  JAMES950                    950  FORD3000                   3000  MILLER1300                 1300 

To extract the non-numeric characters, isolate the numeric characters using TRANSLATE:

  select data,        translate(data,'0123456789','0000000000') ename   from (select ename||sal data from emp)  DATA                 ENAME  -------------------- ----------  SMITH800             SMITH000  ALLEN1600            ALLEN0000  WARD1250             WARD0000  JONES2975            JONES0000  MARTIN1250           MARTIN0000  BLAKE2850            BLAKE0000  CLARK2450            CLARK0000  SCOTT3000            SCOTT0000  KING5000             KING0000  TURNER1500           TURNER0000  ADAMS1100            ADAMS0000  JAMES950             JAMES000  FORD3000             FORD0000  MILLER1300           MILLER0000 

By using TRANSLATE you convert every numeric character into a zero. The next step is to remove all instances of zero from each record using REPLACE, leaving only non-numeric characters:

  select data,        replace(translate(data,'0123456789','0000000000'),'0') ename   from (select ename||sal data from emp)  DATA                 ENAME  -------------------- -------  SMITH800             SMITH  ALLEN1600            ALLEN  WARD1250             WARD  JONES2975            JONES  MARTIN1250           MARTIN  BLAKE2850            BLAKE  CLARK2450            CLARK  SCOTT3000            SCOTT  KING5000             KING  TURNER1500           TURNER  ADAMS1100            ADAMS  JAMES950             JAMES  FORD3000             FORD  MILLER1300           MILLER 

Put the two techniques together and you have your solution.




SQL Cookbook
SQL Cookbook (Cookbooks (OReilly))
ISBN: 0596009763
EAN: 2147483647
Year: 2005
Pages: 235

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net