Recipe 5.5. Checking a String s Character Set or Collation


Recipe 5.5. Checking a String's Character Set or Collation

Problem

You want to know the character set or collation of a string.

Solution

Use the CHARSET⁠(⁠ ⁠ ⁠) or COLLATION⁠(⁠ ⁠ ⁠) function.

Discussion

If you create a table using the following definition, you know that values stored in the column will have a character set of utf8 and a collation of utf8_danish_ci:

CREATE TABLE t (c CHAR(10) CHARACTER SET utf8 COLLATE utf8_danish_ci); 

But sometimes it's not so clear what character set or collation applies to a string. Server configuration affects literal strings and some string functions, and other string functions return values in a specific character set. Symptoms that you have the wrong character set or collation are that a collation-mismatch error occurs for a comparison operation, or a lettercase conversion doesn't work properly. This section shows how to check what character set or collation a string has. Section 5.6 shows how to convert strings to a different character set or collation.

To find out what character set or collation a string has, use the CHARSET⁠(⁠ ⁠ ⁠) or COLLATION⁠(⁠ ⁠ ⁠) function. For example, did you know that the USER⁠(⁠ ⁠ ⁠) function returns a Unicode string?

mysql> SELECT USER(), CHARSET(USER()), COLLATION(USER()); +------------------+-----------------+-------------------+ | USER()           | CHARSET(USER()) | COLLATION(USER()) | +------------------+-----------------+-------------------+ | cbuser@localhost | utf8            | utf8_general_ci   | +------------------+-----------------+-------------------+ 

String values that take their character set and collation from the current configuration may change properties if the configuration changes. This is true for literal strings:

mysql> SET NAMES 'latin1'; mysql> SELECT CHARSET('abc'), COLLATION('abc'); +----------------+-------------------+ | CHARSET('abc') | COLLATION('abc')  | +----------------+-------------------+ | latin1         | latin1_swedish_ci | +----------------+-------------------+ mysql> SET NAMES latin7 COLLATE 'latin7_bin'; mysql> SELECT CHARSET('abc'), COLLATION('abc'); +----------------+------------------+ | CHARSET('abc') | COLLATION('abc') | +----------------+------------------+ | latin7         | latin7_bin       | +----------------+------------------+ 

For a binary string, the CHARSET⁠(⁠ ⁠ ⁠) or COLLATION⁠(⁠ ⁠ ⁠) functions return a value of binary, which means that the string is compared and sorted based on numeric byte values, not character collation values. Several functions return binary strings, such as MD5⁠(⁠ ⁠ ⁠) and PASSWORD⁠(⁠ ⁠ ⁠):

mysql> SELECT CHARSET(MD5('a')), COLLATION(MD5('a')); +-------------------+---------------------+ | CHARSET(MD5('a')) | COLLATION(MD5('a')) | +-------------------+---------------------+ | binary            | binary              | +-------------------+---------------------+ mysql> SELECT CHARSET(PASSWORD('a')), COLLATION(PASSWORD('a')); +------------------------+--------------------------+ | CHARSET(PASSWORD('a')) | COLLATION(PASSWORD('a')) | +------------------------+--------------------------+ | binary                 | binary                   | +------------------------+--------------------------+ 

It can be useful to know that a function or string expression produces a binary string if you're trying to perform lettercase conversion on the result and it's not working. See Section 5.8 for details.




MySQL Cookbook
MySQL Cookbook
ISBN: 059652708X
EAN: 2147483647
Year: 2004
Pages: 375
Authors: Paul DuBois

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net