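One property of a string is whether it is binary or nonbinary. A binary string is a sequence of bytes that is compared byte by byte using numeric byte values. A nonbinary string is a sequence of characters that is interpreted according to a character set and compared according to a collation.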
A characteristic of nonbinary strings is that they have a character set. To see which character sets are available, use this statement:

mysql> SHOW CHARACTER SET;
+----------+-----------------------------+---------------------+--------+
| Charset  | Description                 | Default collation   | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 |
| dec8     | DEC West European           | dec8_swedish_ci     |      1 |
| cp850    | DOS West European           | cp850_general_ci    |      1 |
| hp8      | HP West European            | hp8_english_ci      |      1 |
| koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |      1 |
| latin1   | cp1252 West European        | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European | latin2_general_ci   |      1 |
...
| utf8     | UTF-8 Unicode               | utf8_general_ci     |      3 |
| ucs2     | UCS-2 Unicode               | ucs2_general_ci     |      2 |
...

The default character set in MySQL is latin1. If you need to store characters from several languages in a single column, consider using one of the Unicode character sets (utf8 or ucs2) because they can represent characters from multiple languages.

Some character sets contain only single-byte characters, whereas others allow multibyte characters. For some multibyte character sets, all characters have a fixed length. Others contain characters of varying lengths. For example, Unicode data can be stored using the ucs2 character set, in which all characters take two bytes, or the utf8 character set, in which characters take from one to three bytes.

You can determine whether a given string contains multibyte characters using the LENGTH() and CHAR_LENGTH() functions, which return the length of a string in bytes and characters, respectively. If LENGTH() is greater than CHAR_LENGTH() for a given string, multibyte characters are present.
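To make the byte/character distinction concrete, the following sketch converts the same three-letter string to ucs2 and to utf8 and compares LENGTH() with CHAR_LENGTH(). The user variable names and sample values are chosen here purely for illustration, and the last statement assumes that byte E9 is the letter é in latin1.

```sql
-- Illustrative values only; @u, @v, and @e are arbitrary user variable names.
SET @u = CONVERT('abc' USING ucs2);
SET @v = CONVERT('abc' USING utf8);

-- ucs2 uses two bytes per character: LENGTH() is 6, CHAR_LENGTH() is 3.
SELECT LENGTH(@u), CHAR_LENGTH(@u);

-- ASCII letters take one byte each in utf8: both functions return 3.
SELECT LENGTH(@v), CHAR_LENGTH(@v);

-- é (byte E9 in latin1) takes two bytes in utf8: LENGTH() is 2, CHAR_LENGTH() is 1.
SET @e = CONVERT(_latin1 X'E9' USING utf8);
SELECT LENGTH(@e), CHAR_LENGTH(@e);
```

Only the first and third strings contain multibyte characters, which is why LENGTH() exceeds CHAR_LENGTH() for them and not for the second.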
Another characteristic of nonbinary strings is collation, which determines the sort order of characters in the character set. Use SHOW COLLATION to see which collations are available; add a LIKE clause to see the collations for a particular character set:

mysql> SHOW COLLATION LIKE 'latin1%';
+-------------------+---------+----+---------+----------+---------+
| Collation         | Charset | Id | Default | Compiled | Sortlen |
+-------------------+---------+----+---------+----------+---------+
| latin1_german1_ci | latin1  |  5 |         | Yes      |       1 |
| latin1_swedish_ci | latin1  |  8 | Yes     | Yes      |       1 |
| latin1_danish_ci  | latin1  | 15 |         | Yes      |       1 |
| latin1_german2_ci | latin1  | 31 |         | Yes      |       2 |
| latin1_bin        | latin1  | 47 |         | Yes      |       1 |
| latin1_general_ci | latin1  | 48 |         | Yes      |       1 |
| latin1_general_cs | latin1  | 49 |         | Yes      |       1 |
| latin1_spanish_ci | latin1  | 94 |         | Yes      |       1 |
+-------------------+---------+----+---------+----------+---------+

In contexts where no collation is indicated, the collation with Yes in the Default column is the default collation used for strings in the given character set. As shown, the default collation for latin1 is latin1_swedish_ci. (Default collations are also displayed by SHOW CHARACTER SET.)

A collation can be case-sensitive (a and A are different), case-insensitive (a and A are the same), or binary (two characters are the same or different based on whether their numeric values are equal). A collation name ending in ci, cs, or bin is case-insensitive, case-sensitive, or binary, respectively.

A binary collation provides a sort order for nonbinary strings that is something like the order for binary strings, in the sense that comparisons for binary strings and binary collations both use numeric values. However, there is a difference: binary string comparisons are always based on single-byte units, whereas a binary collation compares nonbinary strings using character numeric values; depending on the character set, some of these might be multibyte values.
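The effect of the ci, cs, and bin suffixes can be checked directly with a comparison. The statements below are a minimal sketch: the letters compared are arbitrary, and the _latin1 introducer is used so that the literals are latin1 strings whatever the connection character set happens to be.

```sql
-- Each comparison returns 1 (equal) or 0 (not equal).
SELECT _latin1 'a' = _latin1 'A' COLLATE latin1_general_ci;  -- 1: case-insensitive
SELECT _latin1 'a' = _latin1 'A' COLLATE latin1_general_cs;  -- 0: case-sensitive
SELECT _latin1 'a' = _latin1 'A' COLLATE latin1_bin;         -- 0: numeric values differ
```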
The following example illustrates how collation affects sort order. Suppose that a table contains a latin1 string column and has the following rows:

mysql> CREATE TABLE t (c CHAR(3) CHARACTER SET latin1);
mysql> INSERT INTO t (c) VALUES('AAA'),('bbb'),('aaa'),('BBB');
mysql> SELECT c FROM t;
+------+
| c    |
+------+
| AAA  |
| bbb  |
| aaa  |
| BBB  |
+------+

By applying the COLLATE operator to the column, you can choose which collation to use for sorting and thus affect the order of the result.
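As a sketch of how COLLATE might be applied to the table t created above, the following statements sort the same rows under a case-insensitive, a case-sensitive, and a binary latin1 collation. The choice of these three collations is an assumption made for illustration; any collation listed by SHOW COLLATION for latin1 could be substituted.

```sql
-- Case-insensitive: both spellings of 'aaa' sort before both spellings of
-- 'bbb'; the order within each pair is not guaranteed.
SELECT c FROM t ORDER BY c COLLATE latin1_swedish_ci;

-- Case-sensitive: rows are still grouped by letter, but uppercase and
-- lowercase versions of a letter are treated as different values.
SELECT c FROM t ORDER BY c COLLATE latin1_general_cs;

-- Binary: rows sort by numeric character values, so the uppercase rows
-- (AAA, BBB) come before the lowercase rows (aaa, bbb).
SELECT c FROM t ORDER BY c COLLATE latin1_bin;
```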
You can choose a language-specific collation if you require that comparison and sorting operations use the sorting rules of a particular language. For example, if you store strings using the utf8 character set, the default collation (utf8_general_ci) treats ch and ll as two-character strings. If you need the traditional Spanish ordering that treats ch and ll as single characters that follow c and l, respectively, use the utf8_spanish2_ci collation. The two collations produce different results, as shown here:

mysql> CREATE TABLE t (c CHAR(2) CHARACTER SET utf8);
mysql> INSERT INTO t (c) VALUES('cg'),('ch'),('ci'),('lk'),('ll'),('lm');
mysql> SELECT c FROM t ORDER BY c COLLATE utf8_general_ci;
+------+
| c    |
+------+
| cg   |
| ch   |
| ci   |
| lk   |
| ll   |
| lm   |
+------+
mysql> SELECT c FROM t ORDER BY c COLLATE utf8_spanish2_ci;
+------+
| c    |
+------+
| cg   |
| ci   |
| ch   |
| lk   |
| lm   |
| ll   |
+------+
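If the traditional Spanish ordering is wanted for every query rather than only occasionally, one option is to declare the collation on the column itself. The following is a minimal sketch under that assumption; t_es is a hypothetical table name introduced only for this example.

```sql
-- t_es is a hypothetical table name used only for this sketch.
CREATE TABLE t_es (c CHAR(2) CHARACTER SET utf8 COLLATE utf8_spanish2_ci);
INSERT INTO t_es (c) VALUES('cg'),('ch'),('ci'),('lk'),('ll'),('lm');

-- No COLLATE operator is needed: sorting uses the column's own collation.
SELECT c FROM t_es ORDER BY c;

-- COLLATE can still override the column collation for a single query.
SELECT c FROM t_es ORDER BY c COLLATE utf8_general_ci;
```

Declaring the collation in the column definition keeps the ORDER BY clauses short, while the COLLATE operator remains available for the occasional query that needs a different ordering.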