Recipe6.6.Determining Whether a String Is Alphanumeric


Recipe 6.6. Determining Whether a String Is Alphanumeric

Problem

You want to return rows from a table only when a column of interest contains no characters other than numbers and letters. Consider the following view V (SQL Server users will use the operator "+" for concatenation instead of "||"):

 create view V as select ename as data   from emp  where deptno=10  union all select ename||', $'|| cast(sal as char(4)) ||'.00' as data   from emp  where deptno=20  union all select ename|| cast(deptno as char(4)) as data   from emp  where deptno=30 

The view V represents your table, and it returns the following:

 DATA -------------------- CLARK KING MILLER SMITH, $800.00 JONES, $2975.00 SCOTT, $3000.00 ADAMS, $1100.00 FORD, $3000.00 ALLEN30 WARD30 MARTIN30 BLAKE30 TURNER30 JAMES30 

However, from the view's data you want to return only the following records:

 DATA ------------- CLARK KING MILLER ALLEN30 WARD30 MARTIN30 BLAKE30 TURNER30 JAMES30 

In short, you wish to omit those rows containing data other than letters and digits.

Solution

It may seem intuitive at first to solve the problem by searching for all the possible non-alphanumeric characters that can be found in a string, but, on the contrary, you will find it easier to do the exact opposite: find all the alphanumeric characters. By doing so, you can treat all the alphanumeric characters as one by converting them to one single character. The reason you want to do this is so the alphanumeric characters can be manipulated together, as a whole. Once you've generated a copy of the string in which all alphanumeric characters are represented by a single character of your choosing, it is easy to isolate the alphanumeric characters from any other characters.

DB2

Use the function TRANSLATE to convert all alphanumeric characters to a single character, then identify any rows that have characters other than the converted alphanumeric character. For DB2 users, the CAST function calls in view V are necessary; otherwise, the view cannot be created due to type conversion errors. Take extra care when working with casts to CHAR as they are fixed length (padded):

 1 select data 2   from V 3  where translate(lower(data), 4                  repeat('a',36), 5                  '0123456789abcdefghijklmnopqrstuvwxyz') = 6                  repeat('a',length(data)) 

MySQL

The syntax for view V is slightly different in MySQL:

 create view V as select ename as data   from emp  where deptno=10  union all select concat(ename,', $',sal,'.00') as data   from emp  where deptno=20  union all select concat(ename,deptno) as data   from emp  where deptno=30 

Use a regular expression to easily find rows that contain non-alphanumeric data:

 1 select data 2   from V 3  where data regexp '[^0-9a-zA-Z]' = 0 

Oracle and PostgreSQL

Use the function TRANSLATE to convert all alphanumeric characters to a single character, then identify any rows that have characters other than the converted alphanumeric character. The CAST function calls in view V are not needed for Oracle and PostgreSQL. Take extra care when working with casts to CHAR as they are fixed length (padded). If you decide to cast, cast to VARCHAR or VARCHAR2:

 1 select data 2   from V 3  where translate(lower(data), 4                  '0123456789abcdefghijklmnopqrstuvwxyz', 5                  rpad('a',36,'a')) = rpad('a',length(data),'a') 

SQL Server

Because SQL Server does not support a TRANSLATE function, you must walk each row and find any that contains a character that contains a non-alphanumeric value. That can be done many ways, but the following solution uses an ASCII-value evaluation:

  1 select data  2   from (  3 select v.data, iter.pos,  4        substring(v.data,iter.pos,1) c,  5        ascii(substring(v.data,iter.pos,1)) val  6   from v,  7        ( select id as pos from t100 ) iter  8  where iter.pos <= len(v.data)  9        ) x 10  group by data 11  having min(val) between 48 and 122 

Discussion

The key to these solutions is being able to reference multiple characters concurrently. By using the function TRANSLATE you can easily manipulate all numbers or all characters without having to "iterate" and inspect each character one by one.

DB2, Oracle, and PostgreSQL

Only 9 of the 14 rows from view V are alphanumeric. To find the rows that are alphanumeric only, simply use the function TRANSLATE. In this example, TRANSLATE converts characters 09 and az to "a". Once the conversion is done, the converted row is then compared with a string of all "a" with the same length (as the row). If the length is the same, then you know all the characters are alphanumeric and nothing else.

By using the TRANSLATE function (using the Oracle syntax):

 where translate(lower(data),                 '0123456789abcdefghijklmnopqrstuvwxyz',                  rpad('a',36,'a')) 

you convert all numbers and letters into a distinct character (I chose "a"). Once the data is converted, all strings that are indeed alphanumeric can be identified as a string comprising only a single character (in this case, "a"). This can be seen by running TRANSLATE by itself:

  select data, translate(lower(data),                   '0123456789abcdefghijklmnopqrstuvwxyz',                    rpad('a',36,'a'))   from V DATA                 TRANSLATE(LOWER(DATA) -------------------- --------------------- CLARK                aaaaa … SMITH, $800.00       aaaaa, $aaa.aa … ALLEN30              aaaaaaa … 

The alphanumeric values are converted, but the string lengths have not been modified. Because the lengths are the same, the rows to keep are the ones for which the call to TRANSLATE returns all a's. You keep those rows, rejecting the others, by comparing each original string's length with the length of its corresponding string of a's:

  select data, translate(lower(data),                   '0123456789abcdefghijklmnopqrstuvwxyz',                    rpad('a',36,'a')) translated,         rpad('a',length(data),'a') fixed   from V DATA                 TRANSLATED           FIXED -------------------- -------------------- ---------------- CLARK                aaaaa                aaaaa … SMITH, $800.00       aaaaa, $aaa.aa       aaaaaaaaaaaaaa … ALLEN30              aaaaaaa              aaaaaaa … 

The last step is to keep only the strings where TRANSLATED equals FIXED.

MySQL

The expression in the WHERE clause:

 where data regexp '[^0-9a-zA-Z]' = 0 

causes rows that have only numbers or characters to be returned. The value ranges in the brackets, "0-9a-zA-Z", represent all possible numbers and letters. The character "^" is for negation, so the expression can be stated as "not numbers or letters." A return value of 1 is true and 0 is false, so the whole expression can be stated as "return rows where anything other than numbers and letters is false."

SQL Server

The first step is to walk each row returned by view V. Each character in the value returned for DATA will itself be returned as a row. The values returned by C represent each individual character for the values returned by DATA:

 +-----------------+------+------+------+ | data            | pos  | c    | val  | +-----------------+------+------+------+ | ADAMS, $1100.00 |    1 | A    |   65 | | ADAMS, $1100.00 |    2 | D    |   68 | | ADAMS, $1100.00 |    3 | A    |   65 | | ADAMS, $1100.00 |    4 | M    |   77 | | ADAMS, $1100.00 |    5 | S    |   83 | | ADAMS, $1100.00 |    6 | ,    |   44 | | ADAMS, $1100.00 |    7 |      |   32 | | ADAMS, $1100.00 |    8 | $    |   36 | | ADAMS, $1100.00 |    9 | 1    |   49 | | ADAMS, $1100.00 |   10 | 1    |   49 | | ADAMS, $1100.00 |   11 | 0    |   48 | | ADAMS, $1100.00 |   12 | 0    |   48 | | ADAMS, $1100.00 |   13 | .    |   46 | | ADAMS, $1100.00 |   14 | 0    |   48 | | ADAMS, $1100.00 |   15 | 0    |   48 | 

Inline view Z not only returns each character in the column DATA row by row, it also provides the ASCII value for each character. For this particular implementation of SQL Server, the range 48122 represents alphanumeric characters. With that knowledge, you can group each row in DATA and filter out any such that the minimum ASCII value is not in the 48122 range.




SQL Cookbook
SQL Cookbook (Cookbooks (OReilly))
ISBN: 0596009763
EAN: 2147483647
Year: 2005
Pages: 235

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net