10.3. String Expressions


Literal strings in expressions are written as quoted values. By default, either single quotes or double quotes can be used, although single quotes are more standard. Also, if the ANSI_QUOTES SQL mode is enabled, double quotes are interpreted as identifier-quoting characters, so literal strings can be quoted only with single quotes.

The data types for representing strings in tables include CHAR, VARCHAR, BINARY, VARBINARY, and the TEXT and BLOB types. You choose which type to use depending on factors such as the maximum length of values, whether you require fixed-length or variable-length values, and whether the strings to be stored are non-binary or binary.

Direct use of strings in expressions occurs primarily in comparison operations. Otherwise, most string operations are performed by using functions.

The usual comparison operators apply to string values (=, <>, <, BETWEEN, and so forth). The result of a comparison depends on whether strings are non-binary or binary and, for non-binary strings that have the same character set, on their collation. (A comparison between strings that have different character sets typically results in an error.) String comparisons are dealt with further in Section 10.3.1, "Case Sensitivity in String Comparisons." Pattern matching is another form of comparison; it's covered in Section 10.3.2, "Using LIKE for Pattern Matching."

String concatenation is done with the CONCAT() function:

 mysql> SELECT CONCAT('abc','def',REPEAT('X',3)); +-----------------------------------+ | CONCAT('abc','def',REPEAT('X',3)) | +-----------------------------------+ | abcdefXXX                         | +-----------------------------------+ 

The || operator is treated as the logical OR operator by default, but can be used for string concatenation if you enable the PIPES_AS_CONCAT SQL mode:

 mysql> SELECT 'abc' || 'def'; +----------------+ | 'abc' || 'def' | +----------------+ |              0 | +----------------+ 1 row in set, 2 warnings (0.00 sec) mysql> SET sql_mode = 'PIPES_AS_CONCAT'; Query OK, 0 rows affected (0.00 sec) mysql> SELECT 'abc' || 'def'; +----------------+ | 'abc' || 'def' | +----------------+ | abcdef         | +----------------+ 1 row in set (0.00 sec) 

In the first SELECT statement, || performs a logical OR operation. This is a numeric operation, so MySQL converts the strings in the expression to numbers first. Neither looks like a number, so MySQL converts them to zero, which is why there is a warning count of two. The resulting operands for the operation are zero, so the result also is zero. After PIPES_AS_CONCAT is enabled, || produces a string concatenation instead.

Several functions take string arguments or return string values. Some types of operations these functions can perform are to convert lettercase, calculate string lengths, or search for, insert, or replace substrings. Section 10.6, "Functions in SQL Expressions," presents some representative examples.

10.3.1. Case Sensitivity in String Comparisons

String comparisons are somewhat more complex than numeric or temporal comparisons. Numbers sort in numeric order and dates and times sort in temporal order, but string comparisons depend not only on the specific content of the strings, but on whether they are non-binary or binary. A letter in uppercase may compare as the same or different than the same letter in lowercase, and a letter with one type of accent may be considered the same or different than that letter with another type of accent.

The earlier discussion in Chapter 5, "Data Types," describes how strings may be non-binary or binary, and how the properties of these two types of strings differ. To summarize:

  • A non-binary string contains characters from a particular character set, and is associated with one of the collations (sorting orders) available for the character set. Characters may consist of single or multiple bytes. A collation can be case insensitive (lettercase is not significant), case sensitive (lettercase is significant), or binary (comparisons are based on numeric character values).

  • A binary string is treated as raw bytes. It has no character set and no collation. Comparisons between binary strings are based on numeric byte values.

The rules that govern string comparison apply in several ways. They determine the result of comparisons performed explicitly with operators such as = and <, and comparisons performed implicitly by ORDER BY, GROUP BY, and DISTINCT operations.

The default character set and collation for literal strings depend on the values of the character_set_connection and collation_connection system variables. The default character set is latin1. The default collation is latin1_swedish_ci, which is case insensitive as indicated by the "_ci" at the end of the collation name. Assuming these connection settings, literal strings are not case sensitive by default. You can see this by comparing strings that differ only in lettercase:

 mysql> SELECT 'Hello' = 'hello'; +-------------------+ | 'Hello' = 'hello' | +-------------------+ |                 1 | +-------------------+ 

A given collation might cause certain accented characters to compare the same as other characters. For example, 'ü' and 'ue' are different in the default latin1_swedish_ci collation, but with the latin1_german2_ci collation ("German phone-book" collation), they have the same sort value and thus compare as equal:

 mysql> SELECT 'Müller' = 'Mueller'; +----------------------+ | 'Müller' = 'Mueller' | +----------------------+ |                    0 | +----------------------+ mysql> SET collation_connection = latin1_german2_ci; mysql> SELECT 'Müller' = 'Mueller'; +----------------------+ | 'Müller' = 'Mueller' | +----------------------+ |                    1 | +----------------------+ 

For binary strings, lettercase is significant. However, this is not because binary strings are case sensitive per se, because binary strings have no character set. Rather, it is because uppercase and lowercase versions of a character have different numeric values.

A non-binary string can be treated as a binary string by preceding it with the BINARY keyword. If either string in a comparison is binary, both strings are treated as binary:

 mysql> SELECT BINARY 'Hello' = 'hello'; +--------------------------+ | BINARY 'Hello' = 'hello' | +--------------------------+ |                        0 | +--------------------------+ mysql> SELECT 'Hello' = BINARY 'hello'; +--------------------------+ | 'Hello' = BINARY 'hello' | +--------------------------+ |                        0 | +--------------------------+ 

The sorting principles just described were demonstrated using literal strings, but the same principles apply to string-valued table columns. Suppose that a table t contains a column c and has the following rows:

 mysql> SELECT c FROM t; +-----------+ | c         | +-----------+ | Hello     | | goodbye   | | Bonjour   | | au revoir | +-----------+ 

If c is a CHAR column that has the latin1_swedish_ci collation, it is a non-binary column with a case-insensitive collation. Uppercase and lowercase letters are treated as identical and a sort operation that uses ORDER BY produces results like this:

 mysql> SELECT c FROM t ORDER BY c; +-----------+ | c         | +-----------+ | au revoir | | Bonjour   | | goodbye   | | Hello     | +-----------+ 

If c is declared as a BINARY column instead, it has no character set or collation. ORDER BY sorts using raw byte codes and produces a different result. Assuming that the values are stored on a machine that uses ASCII codes, the numeric values for uppercase letters precede those for lowercase letters and the result looks like this:

 mysql> SELECT c FROM t ORDER BY c; +-----------+ | c         | +-----------+ | Bonjour   | | Hello     | | au revoir | | goodbye   | +-----------+ 

String comparison rules also apply to GROUP BY and DISTINCT operations. Suppose that t has a column c with the following contents:

 mysql> SELECT c FROM t; +---------+ | c       | +---------+ | Hello   | | hello   | | Goodbye | | goodbye | +---------+ 

If c is a non-binary, case-insensitive column, GROUP BY and DISTINCT do not make lettercase distinctions:

 mysql> SELECT c, COUNT(*) FROM t GROUP BY c; +---------+----------+ | c       | COUNT(*) | +---------+----------+ | Goodbye |        2 | | Hello   |        2 | +---------+----------+ mysql> SELECT DISTINCT c FROM t; +---------+ | c       | +---------+ | Hello   | | Goodbye | +---------+ 

On the other hand, if c is a BINARY column, those operations use byte values for sorting:

 mysql> SELECT c, COUNT(*) FROM t GROUP BY c; +---------+----------+ | c       | COUNT(*) | +---------+----------+ | Goodbye |        1 | | Hello   |        1 | | goodbye |        1 | | hello   |        1 | +---------+----------+ mysql> SELECT DISTINCT c FROM t; +---------+ | c       | +---------+ | Hello   | | hello   | | Goodbye | | goodbye | +---------+ 

The preceding discussion shows that to understand sorting and comparison behavior for strings, it's important to know whether they are non-binary or binary. This is important when using string functions as well. String functions may treat their arguments as non-binary or binary strings, or return binary or non-binary results. It depends on the function. Here are some examples:

  • LENGTH() returns the length of a string in bytes, whereas CHAR_LENGTH() returns the length in characters. For strings that contain only single-byte characters, the two functions return identical results. For strings that contain multi-byte characters, you should choose the function that is appropriate for the type of result you want. For example, the sjis character set includes characters that require two bytes to represent. The value of LENGTH() for any string containing such characters will be greater than the value of CHAR_LENGTH().

  • The UPPER() and LOWER() functions perform case conversion only if the argument is a non-binary string. Suppose that 'AbCd' is non-binary. In that case, the two functions return a value in the requested lettercase:

     mysql> SELECT UPPER('AbCd'), LOWER('AbCd'); +---------------+---------------+ | UPPER('AbCd') | LOWER('AbCd') | +---------------+---------------+ | ABCD          | abcd          | +---------------+---------------+ 

    However, if 'AbCd' is a binary string, it has no character set. In that case, the concept of lettercase does not apply, and UPPER() and LOWER() do nothing:

     mysql> SELECT UPPER(BINARY 'AbCd'), LOWER(BINARY 'AbCd'); +----------------------+----------------------+ | UPPER(BINARY 'AbCd') | LOWER(BINARY 'AbCd') | +----------------------+----------------------+ | AbCd                 | AbCd                 | +----------------------+----------------------+ 

    To make the two functions perform case conversion for a binary string, convert it to a non-binary string. For example:

     mysql> SELECT UPPER(CONVERT(BINARY 'AbCd' USING latin1)); +--------------------------------------------+ | UPPER(CONVERT(BINARY 'AbCd' USING latin1)) | +--------------------------------------------+ | ABCD                                       | +--------------------------------------------+ 

  • MD5() takes a string argument and produces a 32-byte checksum represented as a string of hexadecimal digits. It treats its argument as a binary string:

     mysql> SELECT MD5('a'); +----------------------------------+ | MD5('a')                         | +----------------------------------+ | 0cc175b9c0f1b6a831c399e269772661 | +----------------------------------+ mysql> SELECT MD5('A'); +----------------------------------+ | MD5('A')                         | +----------------------------------+ | 7fc56270e7a70fa81a5935b72eacbe29 | +----------------------------------+ 

These examples demonstrate that you must take into account the properties of the particular function you want to use. If you don't, you might be surprised at the results you get. See the MySQL Reference Manual for details on individual functions.

10.3.2. Using LIKE for Pattern Matching

Operators such as = and != are useful for finding values that are equal to or not equal to a specific exact comparison value. When it's necessary to find values based on similarity instead, a pattern match is useful. To perform a pattern match, use value LIKE 'pattern', where value is the value you want to test and 'pattern' is a pattern string that describes the general form of values that you want to match.

Patterns used with the LIKE pattern-matching operator can contain two special characters (called "metacharacters" or "wildcards") that stand for something other than themselves:

  • The '%' character matches any sequence of zero or more characters. For example, the pattern 'a%' matches any string that begins with 'a', '%b' matches any string that ends with 'b', and '%c%' matches any string that contains a 'c'. The pattern '%' matches any string, including empty strings.

  • The '_' (underscore) character matches any single character. 'd_g' matches strings such as 'dig', 'dog', and 'd@g'. Because '_' matches any single character, it matches itself and the pattern 'd_g' also matches the string 'd_g'.

A pattern can use these metacharacters in combination. For example, '_%' matches any string containing at least one character.

LIKE evaluates to NULL if either operand is NULL, but any non-NULL literal value matches itself. Likewise, a function call that produces a non-NULL value matches itself (with one exception). Thus, the following expressions evaluate as true:

 'ABC' LIKE 'ABC' column_name LIKE column_name VERSION() LIKE VERSION() 

The exception is that different invocations of the RAND() random-number function might return different values, even within the same query:

 mysql> SELECT RAND(), RAND(); +------------------+------------------+ | RAND()           | RAND()           | +------------------+------------------+ | 0.15430032289987 | 0.30666533979277 | +------------------+------------------+ 

As a result, the expression RAND() LIKE RAND() normally will be false.

LIKE performs a non-binary comparison if both operands are non-binary strings; otherwise, the comparison is binary:

 mysql> SELECT 'ABC' LIKE 'abc', 'ABC' LIKE BINARY 'abc'; +------------------+-------------------------+ | 'ABC' LIKE 'abc' | 'ABC' LIKE BINARY 'abc' | +------------------+-------------------------+ |                1 |                       0 | +------------------+-------------------------+ 

To invert a pattern match, use NOT LIKE rather than LIKE:

 mysql> SELECT 'ABC' LIKE 'A%', 'ABC' NOT LIKE 'A%'; +-----------------+---------------------+ | 'ABC' LIKE 'A%' | 'ABC' NOT LIKE 'A%' | +-----------------+---------------------+ |               1 |                   0 | +-----------------+---------------------+ 

MySQL, unlike some other database systems, allows use of LIKE with non-string values. This can be useful in some cases. For example, the expression d LIKE '19%' is true for date values d that occur during the 1900s. MySQL evaluates such comparisons by converting non-string values to strings before performing the pattern match.

It's possible to specify the pattern in a LIKE expression using a table column. In this case, the actual pattern that a value is compared to can vary for every row of a result set. The following table has one column containing patterns and another column that characterizes the type of string each pattern matches:

 mysql> SELECT pattern, description FROM patlist; +---------+--------------------------------+ | pattern | description                    | +---------+--------------------------------+ |         | empty string                   | | _%      | non-empty string               | | _ _ _   | string of exactly 3 characters | +---------+--------------------------------+ 

The patterns in the table can be applied to specific values to characterize them:

 mysql> SELECT description, IF('' LIKE pattern,'YES','NO')     -> FROM patlist; +--------------------------------+--------------------------------+ | description                    | IF('' LIKE pattern,'YES','NO') | +--------------------------------+--------------------------------+ | empty string                   | YES                            | | non-empty string               | NO                             | | string of exactly 3 characters | NO                             | +--------------------------------+--------------------------------+ mysql> SELECT description, IF('abc' LIKE pattern,'YES','NO')     -> FROM patlist; +--------------------------------+-----------------------------------+ | description                    | IF('abc' LIKE pattern,'YES','NO') | +--------------------------------+-----------------------------------+ | empty string                   | NO                                | | non-empty string               | YES                               | | string of exactly 3 characters | YES                               | +--------------------------------+-----------------------------------+ mysql> SELECT description, IF('hello' LIKE pattern,'YES','NO')     -> FROM patlist; +--------------------------------+-------------------------------------+ | description                    | IF('hello' LIKE pattern,'YES','NO') | +--------------------------------+-------------------------------------+ | empty string                   | NO                                  | | non-empty string               | YES                                 | | string of exactly 3 characters | NO                                  | +--------------------------------+-------------------------------------+ 

To match a pattern metacharacter literally, escape it by preceding it by a backslash:

 mysql> SELECT 'AA' LIKE 'A%', 'AA' LIKE 'A\%', 'A%' LIKE 'A\%'; +----------------+-----------------+-----------------+ | 'AA' LIKE 'A%' | 'AA' LIKE 'A\%' | 'A%' LIKE 'A\%' | +----------------+-----------------+-----------------+ |              1 |               0 |               1 | +----------------+-----------------+-----------------+ mysql> SELECT 'AA' LIKE 'A_', 'AA' LIKE 'A\_', 'A_' LIKE 'A\_'; +----------------+-----------------+-----------------+ | 'AA' LIKE 'A_' | 'AA' LIKE 'A\_' | 'A_' LIKE 'A\_' | +----------------+-----------------+-----------------+ |              1 |               0 |               1 | +----------------+-----------------+-----------------+ 

To specify a given character as the escape character, use an ESCAPE clause:

 mysql> SELECT 'AA' LIKE 'A@%' ESCAPE '@', 'A%' LIKE 'A@%' ESCAPE '@'; +----------------------------+----------------------------+ | 'AA' LIKE 'A@%' ESCAPE '@' | 'A%' LIKE 'A@%' ESCAPE '@' | +----------------------------+----------------------------+ |                          0 |                          1 | +----------------------------+----------------------------+ 



MySQL 5 Certification Study Guide
MySQL 5.0 Certification Study Guide
ISBN: 0672328127
EAN: 2147483647
Year: 2006
Pages: 312

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net