Recipe 6.6. Determining Whether a String Is AlphanumericProblemYou want to return rows from a table only when a column of interest contains no characters other than numbers and letters. Consider the following view V (SQL Server users will use the operator "+" for concatenation instead of "||"): create view V as select ename as data from emp where deptno=10 union all select ename||', $'|| cast(sal as char(4)) ||'.00' as data from emp where deptno=20 union all select ename|| cast(deptno as char(4)) as data from emp where deptno=30 The view V represents your table, and it returns the following: DATA -------------------- CLARK KING MILLER SMITH, $800.00 JONES, $2975.00 SCOTT, $3000.00 ADAMS, $1100.00 FORD, $3000.00 ALLEN30 WARD30 MARTIN30 BLAKE30 TURNER30 JAMES30 However, from the view's data you want to return only the following records: DATA ------------- CLARK KING MILLER ALLEN30 WARD30 MARTIN30 BLAKE30 TURNER30 JAMES30 In short, you wish to omit those rows containing data other than letters and digits. SolutionIt may seem intuitive at first to solve the problem by searching for all the possible non-alphanumeric characters that can be found in a string, but, on the contrary, you will find it easier to do the exact opposite: find all the alphanumeric characters. By doing so, you can treat all the alphanumeric characters as one by converting them to one single character. The reason you want to do this is so the alphanumeric characters can be manipulated together, as a whole. Once you've generated a copy of the string in which all alphanumeric characters are represented by a single character of your choosing, it is easy to isolate the alphanumeric characters from any other characters. DB2Use the function TRANSLATE to convert all alphanumeric characters to a single character, then identify any rows that have characters other than the converted alphanumeric character. For DB2 users, the CAST function calls in view V are necessary; otherwise, the view cannot be created due to type conversion errors. Take extra care when working with casts to CHAR as they are fixed length (padded): 1 select data 2 from V 3 where translate(lower(data), 4 repeat('a',36), 5 '0123456789abcdefghijklmnopqrstuvwxyz') = 6 repeat('a',length(data)) MySQLThe syntax for view V is slightly different in MySQL: create view V as select ename as data from emp where deptno=10 union all select concat(ename,', $',sal,'.00') as data from emp where deptno=20 union all select concat(ename,deptno) as data from emp where deptno=30 Use a regular expression to easily find rows that contain non-alphanumeric data: 1 select data 2 from V 3 where data regexp '[^0-9a-zA-Z]' = 0 Oracle and PostgreSQLUse the function TRANSLATE to convert all alphanumeric characters to a single character, then identify any rows that have characters other than the converted alphanumeric character. The CAST function calls in view V are not needed for Oracle and PostgreSQL. Take extra care when working with casts to CHAR as they are fixed length (padded). If you decide to cast, cast to VARCHAR or VARCHAR2: 1 select data 2 from V 3 where translate(lower(data), 4 '0123456789abcdefghijklmnopqrstuvwxyz', 5 rpad('a',36,'a')) = rpad('a',length(data),'a') SQL ServerBecause SQL Server does not support a TRANSLATE function, you must walk each row and find any that contains a character that contains a non-alphanumeric value. That can be done many ways, but the following solution uses an ASCII-value evaluation: 1 select data 2 from ( 3 select v.data, iter.pos, 4 substring(v.data,iter.pos,1) c, 5 ascii(substring(v.data,iter.pos,1)) val 6 from v, 7 ( select id as pos from t100 ) iter 8 where iter.pos <= len(v.data) 9 ) x 10 group by data 11 having min(val) between 48 and 122 DiscussionThe key to these solutions is being able to reference multiple characters concurrently. By using the function TRANSLATE you can easily manipulate all numbers or all characters without having to "iterate" and inspect each character one by one. DB2, Oracle, and PostgreSQLOnly 9 of the 14 rows from view V are alphanumeric. To find the rows that are alphanumeric only, simply use the function TRANSLATE. In this example, TRANSLATE converts characters 09 and az to "a". Once the conversion is done, the converted row is then compared with a string of all "a" with the same length (as the row). If the length is the same, then you know all the characters are alphanumeric and nothing else. By using the TRANSLATE function (using the Oracle syntax): where translate(lower(data), '0123456789abcdefghijklmnopqrstuvwxyz', rpad('a',36,'a')) you convert all numbers and letters into a distinct character (I chose "a"). Once the data is converted, all strings that are indeed alphanumeric can be identified as a string comprising only a single character (in this case, "a"). This can be seen by running TRANSLATE by itself: select data, translate(lower(data), '0123456789abcdefghijklmnopqrstuvwxyz', rpad('a',36,'a')) from V DATA TRANSLATE(LOWER(DATA) -------------------- --------------------- CLARK aaaaa … SMITH, $800.00 aaaaa, $aaa.aa … ALLEN30 aaaaaaa … The alphanumeric values are converted, but the string lengths have not been modified. Because the lengths are the same, the rows to keep are the ones for which the call to TRANSLATE returns all a's. You keep those rows, rejecting the others, by comparing each original string's length with the length of its corresponding string of a's: select data, translate(lower(data), '0123456789abcdefghijklmnopqrstuvwxyz', rpad('a',36,'a')) translated, rpad('a',length(data),'a') fixed from V DATA TRANSLATED FIXED -------------------- -------------------- ---------------- CLARK aaaaa aaaaa … SMITH, $800.00 aaaaa, $aaa.aa aaaaaaaaaaaaaa … ALLEN30 aaaaaaa aaaaaaa … The last step is to keep only the strings where TRANSLATED equals FIXED. MySQLThe expression in the WHERE clause: where data regexp '[^0-9a-zA-Z]' = 0 causes rows that have only numbers or characters to be returned. The value ranges in the brackets, "0-9a-zA-Z", represent all possible numbers and letters. The character "^" is for negation, so the expression can be stated as "not numbers or letters." A return value of 1 is true and 0 is false, so the whole expression can be stated as "return rows where anything other than numbers and letters is false." SQL ServerThe first step is to walk each row returned by view V. Each character in the value returned for DATA will itself be returned as a row. The values returned by C represent each individual character for the values returned by DATA: +-----------------+------+------+------+ | data | pos | c | val | +-----------------+------+------+------+ | ADAMS, $1100.00 | 1 | A | 65 | | ADAMS, $1100.00 | 2 | D | 68 | | ADAMS, $1100.00 | 3 | A | 65 | | ADAMS, $1100.00 | 4 | M | 77 | | ADAMS, $1100.00 | 5 | S | 83 | | ADAMS, $1100.00 | 6 | , | 44 | | ADAMS, $1100.00 | 7 | | 32 | | ADAMS, $1100.00 | 8 | $ | 36 | | ADAMS, $1100.00 | 9 | 1 | 49 | | ADAMS, $1100.00 | 10 | 1 | 49 | | ADAMS, $1100.00 | 11 | 0 | 48 | | ADAMS, $1100.00 | 12 | 0 | 48 | | ADAMS, $1100.00 | 13 | . | 46 | | ADAMS, $1100.00 | 14 | 0 | 48 | | ADAMS, $1100.00 | 15 | 0 | 48 | Inline view Z not only returns each character in the column DATA row by row, it also provides the ASCII value for each character. For this particular implementation of SQL Server, the range 48122 represents alphanumeric characters. With that knowledge, you can group each row in DATA and filter out any such that the minimum ASCII value is not in the 48122 range. |