Recipe 7.4. Controlling Case Sensitivity of String Sorts


Problem

String sorting operations are case-sensitive when you don't want them to be, or vice versa.

Solution

Alter the comparison characteristics of the sorted values.

Discussion

Chapter 5 discusses how string comparison properties depend on whether the strings are binary or nonbinary:

  • Binary strings are sequences of bytes. They are compared byte by byte using numeric byte values. Character set and lettercase have no meaning for comparisons.

  • Nonbinary strings are sequences of characters. They have a character set and collation and are compared character by character using the order defined by the collation.

These properties apply to string sorting as well, because sorting is based on comparison. To alter the sorting properties of a string column, you must alter its comparison properties. (For a summary of which string data types are binary and nonbinary, see Section 5.2.)

The examples in this section use a table that has case-insensitive and case-sensitive nonbinary columns, and a binary column:

CREATE TABLE str_val (   ci_str   CHAR(3) CHARACTER SET latin1 COLLATE latin1_swedish_ci,   cs_str   CHAR(3) CHARACTER SET latin1 COLLATE latin1_general_cs,   bin_str  BINARY(3) ); 

Suppose that the table has the following contents:

+--------+--------+---------+ | ci_str | cs_str | bin_str | +--------+--------+---------+ | AAA    | AAA    | AAA     | | aaa    | aaa    | aaa     | | bbb    | bbb    | bbb     | | BBB    | BBB    | BBB     | +--------+--------+---------+ 

Each column contains the same values, but the natural sort orders for the column data types produce three different results:

  • The case-insensitive collation sorts a and A together, placing them before b and B. However, for a given letter, it does not necessarily order one lettercase before another, as shown by the following result:

    mysql> SELECT ci_str FROM str_val ORDER BY ci_str; +--------+ | ci_str | +--------+ | AAA    | | aaa    | | bbb    | | BBB    | +--------+ 

  • The case-sensitive collation puts A and a before B and b, and sorts uppercase before lowercase:

    mysql> SELECT cs_str FROM str_val ORDER BY cs_str; +--------+ | cs_str | +--------+ | AAA    | | aaa    | | BBB    | | bbb    | +--------+ 

  • The binary strings sort numerically. Assuming that uppercase letters have numeric values less than those of lowercase letters, a binary sort results in the following ordering:

    mysql> SELECT bin_str FROM str_val ORDER BY bin_str; +---------+ | bin_str | +---------+ | AAA     | | BBB     | | aaa     | | bbb     | +---------+ 

    You get the same result for a nonbinary string column that has a binary collation, as long as the column contains single-byte characters (for example, CHAR(3) CHARACTER SET latin1 COLLATE latin1_bin). For multibyte characters, a binary collation still produces a numeric sort, but the character values use multibyte numbers.

To alter the sorting properties of each column, use the techniques described in Section 5.9 for controlling how string comparisons work:

  • To sort case-insensitive strings in case-sensitive fashion, order the sorted values using a case-sensitive collation:

    mysql> SELECT ci_str FROM str_val     -> ORDER BY ci_str COLLATE latin1_general_cs; +--------+ | ci_str | +--------+ | AAA    | | aaa    | | BBB    | | bbb    | +--------+ 

  • To sort case-sensitive strings in case-insensitive fashion, order the sorted values using a case-insensitive collation:

    mysql> SELECT cs_str FROM str_val     -> ORDER BY cs_str COLLATE latin1_swedish_ci; +--------+ | cs_str | +--------+ | AAA    | | aaa    | | bbb    | | BBB    | +--------+ 

    Another possibility is to sort using values that have all been converted to the same lettercase, which makes lettercase irrelevant:

    mysql> SELECT cs_str FROM str_val     -> ORDER BY UPPER(cs_str); +--------+ | cs_str | +--------+ | AAA    | | aaa    | | bbb    | | BBB    | +--------+ 

  • Binary strings sort using numeric byte values, so there is no concept of lettercase involved. However, because letters in different cases have different byte values, comparisons of binary strings effectively are case-sensitive. (That is, a and A are unequal.) To sort binary strings using a case-insensitive ordering, convert them to nonbinary strings and apply an appropriate collation. For example, to perform a case-insensitive sort, use a statement like this:

    mysql> SELECT bin_str FROM str_val     -> ORDER BY CONVERT(bin_str USING latin1) COLLATE latin1_swedish_ci; +---------+ | bin_str | +---------+ | AAA     | | aaa     | | bbb     | | BBB     | +---------+ 

    If the default collation is case-insensitive (as is true for latin1), you can omit the COLLATE clause.




MySQL Cookbook
MySQL Cookbook
ISBN: 059652708X
EAN: 2147483647
Year: 2004
Pages: 375
Authors: Paul DuBois

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net