Recipe 5.8. Converting the Lettercase of a Stubborn String


Recipe 5.8. Converting the Lettercase of a "Stubborn" String

Problem

You want to convert a string to uppercase or lowercase, but UPPER⁠(⁠ ⁠ ⁠) and LOWER⁠(⁠ ⁠ ⁠) don't work.

Solution

You're probably trying to convert a binary string. Convert it to a nonbinary string so that it has a character set and collation and becomes subject to case mapping.

Discussion

The usual way to convert a string to uppercase or lowercase is to use the UPPER⁠(⁠ ⁠ ⁠) or LOWER⁠(⁠ ⁠ ⁠) function:

mysql> SET @s = 'aBcD'; mysql> SELECT UPPER(@s), LOWER(@s); +-----------+-----------+ | UPPER(@s) | LOWER(@s) | +-----------+-----------+ | ABCD      | abcd      | +-----------+-----------+ 

But sometimes you'll run across a string that is "stubborn" and resists lettercase conversion. This is common for columns that have a BINARY or BLOB data type:

mysql> CREATE TABLE t (b BLOB) SELECT 'aBcD' AS b; mysql> SELECT b, UPPER(b), LOWER(b) FROM t; +------+----------+----------+ | b    | UPPER(b) | LOWER(b) | +------+----------+----------+ | aBcD | aBcD     | aBcD     | +------+----------+----------+ 

The cause of the problem here is that the column is a binary string: it has no character set or collation and lettercase does not apply. Thus, UPPER⁠(⁠ ⁠ ⁠) and LOWER⁠(⁠ ⁠ ⁠) do nothing, which can be confusing. Compounding the confusion is that lettercase conversion of binary strings used to work in older versions of MySQL, but does so no longer. What's going on? Here is the history:

  • Before MySQL 4.1, all strings, including binary strings, were interpreted with respect to the server's default character set. Consequently, the UPPER⁠(⁠ ⁠ ⁠) and LOWER⁠(⁠ ⁠ ⁠) functions performed case mapping even for binary strings:

    mysql> SET @s = BINARY 'aBcD'; mysql> SELECT @s, LOWER(@s), UPPER(@s); +------+-----------+-----------+ | @s   | LOWER(@s) | UPPER(@s) | +------+-----------+-----------+ | aBcD | abcd      | ABCD      | +------+-----------+-----------+ 

  • In MySQL 4.1, character set handling was revised significantly, with one of the changes being that character set and collation applied only to nonbinary strings. From 4.1 up, a binary string is just a sequence of bytes, and lettercase has no meaning, even if you store what looks like text in the string. As a result, the LOWER⁠(⁠ ⁠ ⁠) and UPPER⁠(⁠ ⁠ ⁠) functions do nothing when applied to binary strings:

    mysql> SET @s = BINARY 'aBcD'; mysql> SELECT @s, LOWER(@s), UPPER(@s); +------+-----------+-----------+ | @s   | LOWER(@s) | UPPER(@s) | +------+-----------+-----------+ | aBcD | aBcD      | aBcD      | +------+-----------+-----------+ 

To map a binary string to a given lettercase, convert it to a nonbinary string, choosing a character set that contains an alphabet with uppercase and lowercase characters. The case-conversion functions then will work as you expect because the collation provides case mapping. The following example uses the BLOB column from earlier in this section, but the same principles apply to binary string literals and string expressions:

mysql> SELECT b,     -> UPPER(CONVERT(b USING latin1)) AS upper,     -> LOWER(CONVERT(b USING latin1)) AS lower     -> FROM t; +------+-------+-------+ | b    | upper | lower | +------+-------+-------+ | aBcD | ABCD  | abcd  | +------+-------+-------+ 

The same kind of case-conversion problem occurs with functions that return binary strings, which is typical for functions such as MD5⁠(⁠ ⁠ ⁠) or COMPRESS⁠(⁠ ⁠ ⁠) that perform encryption or compression.

If you're not sure whether a string expression is binary or nonbinary, use the CHARSET⁠(⁠ ⁠ ⁠) function. The following example shows that VERSION⁠(⁠ ⁠ ⁠) returns a nonbinary string, but MD5⁠(⁠ ⁠ ⁠) returns a binary string:

mysql> SELECT CHARSET(VERSION()), CHARSET(MD5('some string')); +--------------------+-----------------------------+ | CHARSET(VERSION()) | CHARSET(MD5('some string')) | +--------------------+-----------------------------+ | utf8               | binary                      | +--------------------+-----------------------------+ 

That result indicates that the string produced by VERSION⁠(⁠ ⁠ ⁠) can be case-mapped directly, but the string produced by MD5⁠(⁠ ⁠ ⁠) must first be converted to a nonbinary string:

mysql> SELECT UPPER(VERSION()); +------------------+ | UPPER(VERSION()) | +------------------+ | 5.1.12-BETA-LOG  | +------------------+ mysql> SELECT UPPER(CONVERT(MD5('some string') USING latin1)); +-------------------------------------------------+ | UPPER(CONVERT(MD5('some string') USING latin1)) | +-------------------------------------------------+ | 5AC749FBEEC93607FC28D666BE85E73A                | +-------------------------------------------------+ 




MySQL Cookbook
MySQL Cookbook
ISBN: 059652708X
EAN: 2147483647
Year: 2004
Pages: 375
Authors: Paul DuBois

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net