Recipe6.13.Identifying Strings That Can Be Treated as Numbers


Recipe 6.13. Identifying Strings That Can Be Treated as Numbers

Problem

You have a column that is defined to hold character data. Unfortunately, the rows contain mixed numeric and character data. Consider view V:

 create view V as select replace(mixed,' ','') as mixed   from ( select substr(ename,1,2)||        cast(deptno as char(4))||        substr(ename,3,2) as mixed   from emp  where deptno = 10  union all select cast(empno as char(4)) as mixed   from emp  where deptno = 20  union all select ename as mixed   from emp  where deptno = 30        ) x select * from v  MIXED  --------------  CL10AR  KI10NG  MI10LL  7369  7566  7788  7876  7902  ALLEN  WARD  MARTIN  BLAKE  TURNER  JAMES 

You want to return rows that are numbers only, or that contain at least one number. If the numbers are mixed with character data, you want to remove the characters and return only the numbers. For the sample data above you want the following result set:

    MIXED --------       10       10       10     7369     7566     7788     7876     7902 

Solution

The functions REPLACE and TRANSLATE are extremely useful for manipulating strings and individual characters. The key is to convert all numbers to a single character, which then makes it easy to isolate and identify any number by referring to a single character.

DB2

Use functions TRANSLATE, REPLACE, and POSSTR to isolate the numeric characters in each row. The calls to CAST are necessary in view V; otherwise, the view will fail to be created due to type conversion errors. You'll need the function REPLACE to remove extraneous white space due to casting to the fixed length CHAR:

  1 select mixed old,  2        cast(  3          case  4          when  5            replace(  6           translate(mixed,'9999999999','0123456789'),'9','') = ''  7          then  8             mixed  9          else replace( 10            translate(mixed, 11               repeat('#',length(mixed)), 12             replace( 13              translate(mixed,'9999999999','0123456789'),'9','')), 14                      '#','') 15          end as integer ) mixed 16   from V 17  where posstr(translate(mixed,'9999999999','0123456789'),'9') > 0 

MySQL

The syntax for MySQL is slightly different and will define view V as:

 create view V as select concat(          substr(ename,1,2),          replace(cast(deptno as char(4)),' ',''),          substr(ename,3,2)        ) as mixed   from emp  where deptno = 10  union all select replace(cast(empno as char(4)), ' ', '')   from emp where deptno = 20  union all select ename from emp where deptno = 30 

Because MySQL does not support the TRANSLATE function, you must walk each row and evaluate it on a character-by-character basis.

  1 select cast(group_concat(c order by pos separator '') as unsigned)  2        as MIXED1  3   from (  4 select v.mixed, iter.pos, substr(v.mixed,iter.pos,1) as c  5   from V,  6        ( select id pos from t10 ) iter  7  where iter.pos <= length(v.mixed)  8    and ascii(substr(v.mixed,iter.pos,1)) between 48 and 57  9        ) y 10  group by mixed 11  order by 1 

Oracle

Use functions TRANSLATE, REPLACE, and INSTR to isolate the numeric characters in each row. The calls to CAST are not necessary in view V. Use the function REPLACE to remove extraneous white space due to casting to the fixed length CHAR. If you decide you would like to keep the explicit type conversion calls in the view definition, it is suggested you cast to VARCHAR2:

  1 select to_number (  2         case  3         when  4            replace(translate(mixed,'0123456789','9999999999'),'9')  5           is not null  6         then  7              replace(  8            translate(mixed,  9              replace( 10            translate(mixed,'0123456789','9999999999'),'9'), 11                     rpad('#',length(mixed),'#')),'#') 12         else 13              mixed 14         end 15         ) mixed 16   from V 17  where instr(translate(mixed,'0123456789','9999999999'),'9') > 0 

PostgreSQL

Use functions TRANSLATE, REPLACE, and STRPOS to isolate the numeric characters in each row. The calls to CAST are not necessary in view V. Use the function REPLACE ito remove extraneous white space due to casting to the fixed length CHAR. If you decide you would like to keep the explicit type conversion calls in the view definition, it is suggested you cast to VARCHAR:

  1 select cast(  2        case  3        when  4         replace(translate(mixed,'0123456789','9999999999'),'9','')  5         is not null  6        then  7           replace(  8        translate(mixed,  9           replace( 10        translate(mixed,'0123456789','9999999999'),'9',''), 11                 rpad('#',length(mixed),'#')),'#','') 12        else 13          mixed 14        end as integer ) as mixed 15    from V 16  where strpos(translate(mixed,'0123456789','9999999999'),'9') > 0    

SQL Server

The built-in function ISNUMERIC along with a wildcard search allows you to easily identify strings that contains numbers, but getting numeric characters out of a string is not particularly efficient because the TRANSLATE function is not supported.

Discussion

The TRANSLATE function is very useful here as it allows you to easily isolate and identify numbers and characters. The trick is to convert all numbers to a single character; this way, rather than searching for different numbers you only search for one character.

DB2, Oracle, and PostgreSQL

The syntax differs slightly among these DBMSs, but the technique is the same. I'll use the solution for PostgreSQL for the discussion.

The real work is done by functions TRANSLATE and REPLACE. To get the final result set requires several function calls, each listed below in one query:

  select mixed as orig, translate(mixed,'0123456789','9999999999') as mixed1, replace(translate(mixed,'0123456789','9999999999'),'9','') as mixed2,  translate(mixed,  replace(  translate(mixed,'0123456789','9999999999'),'9',''),           rpad('#',length(mixed),'#')) as mixed3,   replace(  translate(mixed,  replace(  translate(mixed,'0123456789','9999999999'),'9',''),           rpad('#',length(mixed),'#')),'#','') as mixed4   from V   where strpos(translate(mixed,'0123456789','9999999999'),'9') > 0   ORIG  | MIXED1 | MIXED2 | MIXED3 | MIXED4 | MIXED5  --------+--------+--------+--------+--------+--------  CL10AR | CL99AR | CLAR   | ##10## | 10     |     10  KI10NG | KI99NG | KING   | ##10## | 10     |     10  MI10LL | MI99LL | MILL   | ##10## | 10     |     10  7369   | 9999   |        | 7369   | 7369   |   7369  7566   | 9999   |        | 7566   | 7566   |   7566  7788   | 9999   |        | 7788   | 7788   |   7788  7876   | 9999   |        | 7876   | 7876   |   7876  7902   | 9999   |        | 7902   | 7902   |   7902 

First, notice that any rows without at least one number are removed. How this is accomplished will become clear as you examine each of the columns in the above result set. The rows that are kept are the values in the ORIG column and are the rows that will eventually make up the result set. The first step to extracting the numbers is to use the function TRANSLATE to convert any number to a 9 (you can use any digit; 9 is arbitrary), this is represented by the values in MIXED1. Now that all numbers are 9's, they can be treating as a single unit. The next step is to remove all of the numbers by using the function REPLACE. Because all digits are now 9, REPLACE simply looks for any 9's and removes them. This is represented by the values in MIXED2. The next step, MIXED3, uses values that are returned by MIXED2. These values are then compared to the values in ORIG. If any characters from MIXED2 are found in ORIG, they are converted to the # character by TRANSLATE. The result set from MIXED3 shows that the letters, not the numbers, have now been singled out and converted to a single character. Now that all non-numeric characters are represented by #'s, they can be treated as a single unit. The next step, MIXED4, uses REPLACE to find and remove any # characters in each row; what's left are numbers only. The final step is to cast the numeric characters as numbers. Now that you've gone through the steps, you can see how the WHERE clause works. The results from MIXED1 are passed to STRPOS, and if a 9 is found (the position in the string where the first 9 is located) the result must be greater than 0. For rows that return a value greater than zero, it means there's at least one number in that row and it should be kept.

MySQL

The first step is to walk each string and evaluate each character and determine whether or not it's a number:

  select v.mixed, iter.pos, substr(v.mixed,iter.pos,1) as c   from V,       ( select id pos from t10 ) iter  where iter.pos <= length(v.mixed)   order by 1,2 +--------+------+------+ | mixed  | pos  | c    | +--------+------+------+ | 7369   |    1 | 7    | | 7369   |    2 | 3    | | 7369   |    3 | 6    | | 7369   |    4 | 9    | … | ALLEN  |    1 | A    | | ALLEN  |    2 | L    | | ALLEN  |    3 | L    | | ALLEN  |    4 | E    | | ALLEN  |    5 | N    | … | CL10AR |    1 | C    | | CL10AR |    2 | L    | | CL10AR |    3 | 1    | | CL10AR |    4 | 0    | | CL10AR |    5 | A    | | CL10AR |    6 | R    | +--------+------+------+ 

Now that each character in each string can be evaluated individually, the next step is to keep only the rows that have a number in the C column:

  select v.mixed, iter.pos, substr(v.mixed,iter.pos,1) as c    from V,        ( select id pos from t10 ) iter  where iter.pos <= length(v.mixed)   and ascii(substr(v.mixed,iter.pos,1)) between 48 and 57  order by 1,2 +--------+------+------+ | mixed  | pos  | c    | +--------+------+------+ | 7369   |    1 | 7    | | 7369   |    2 | 3    | | 7369   |    3 | 6    | | 7369   |    4 | 9    | … | CL10AR |    3 | 1    | | CL10AR |    4 | 0    | … +--------+------+------+ 

At this point, all the rows in column C are numbers. The next step is to use GROUP_CONCAT to concatenate the numbers to form their respective whole number in MIXED. The final result is then cast as a number:

  select cast(group_concat(c order by pos separator '') as unsigned)          as MIXED1   from (  select v.mixed, iter.pos, substr(v.mixed,iter.pos,1) as c    from V,        ( select id pos from t10 ) iter   where iter.pos <= length(v.mixed)    and ascii(substr(x.mixed,iter.pos,1)) between 48 and 57        ) y   group by mixed   order by 1 +--------+ | MIXED1 | +--------+ |    10  | |    10  | |    10  | |  7369  | |  7566  | |  7788  | |  7876  | |  7902  | +--------+ 

As a final note, keep in mind that any digits in each string will be concatenated to form one numeric value. For example, an input value of, say, '99Gennick87' will result in the value 9987 being returned. This is something to keep in mind, particularly when working with serialized data.




SQL Cookbook
SQL Cookbook (Cookbooks (OReilly))
ISBN: 0596009763
EAN: 2147483647
Year: 2005
Pages: 235

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net