Recipe 5.2. Choosing a String Data Type


Problem

You need to store string data but aren't sure which is the most appropriate data type.

Solution

Choose the data type according to the characteristics of the information to be stored and how you need to use it. Consider questions such as these:

  • Are the strings binary or nonbinary?

  • Does case sensitivity matter?

  • What is the maximum string length?

  • Do you want to store fixed- or variable-length values?

  • Do you need to retain trailing spaces?

  • Is there a fixed set of allowable values?

Discussion

MySQL provides several binary and nonbinary string data types. These types come in pairs as shown in the following table.

Binary data typeNonbinary data typeMaximum length
BINARY CHAR 255
VARBINARY VARCHAR 65,535
TINYBLOB TINYTEXT 255
BLOB TEXT 65,535
MEDIUMBLOB MEDIUMTEXT 16,777,215
LONGBLOB LONGTEXT 4,294,967,295


For the binary data types, the maximum length is the number of bytes the string must be able to hold. For the nonbinary types, the maximum length is the number of characters the string must be able to hold (which for a string containing multibyte characters requires more than that many bytes).

For the BINARY and CHAR data types, MySQL stores column values using a fixed width. For example, values stored in a BINARY(10) or CHAR(10) column always take 10 bytes or 10 characters, respectively. Shorter values are padded to the required length as necessary when stored. For BINARY, the pad value is 0x00 (the zero-valued byte, also known as ASCII NUL). CHAR values are padded with spaces. Trailing pad bytes or characters are stripped from BINARY and CHAR values when they are retrieved.

For VARBINARY, VARCHAR, and the BLOB and TEXT types, MySQL stores values using only as much storage as required, up to the maximum column length. No padding is added or stripped when values are stored or retrieved.

If you want to preserve trailing pad values that are present in the original strings that are stored, use a data type for which no stripping occurs. For example, if you're storing character (nonbinary) strings that might end with spaces, and you want to preserve them, use VARCHAR or one of the TEXT data types. The following statements illustrate the difference in trailing-space handling for CHAR and VARCHAR columns:

mysql> CREATE TABLE t (c1 CHAR(10), c2 VARCHAR(10)); mysql> INSERT INTO t (c1,c2) VALUES('abc       ','abc       '); mysql> SELECT c1, c2, CHAR_LENGTH(c1), CHAR_LENGTH(c2) FROM t; +------+------------+-----------------+-----------------+ | c1   | c2         | CHAR_LENGTH(c1) | CHAR_LENGTH(c2) | +------+------------+-----------------+-----------------+ | abc  | abc        |               3 |              10 | +------+------------+-----------------+-----------------+ 

Thus, if you store a string that contains trailing spaces into a CHAR column, you will find that they're gone when you retrieve the value. Similar padding and stripping occurs for BINARY columns, except that the pad value is 0x00.

NOTE

Prior to MySQL 5.0.3, VARCHAR and VARBINARY have a maximum length of 255. Also, stripping of trailing pad values for retrieved values applies to VARCHAR and VARBINARY columns, so you should use one of the TEXT or BLOB types if you want to retain trailing spaces or 0x00 bytes.

A table can include a mix of binary and nonbinary string columns, and its nonbinary columns can use different character sets and collations. When you declare a nonbinary string column, use the CHARACTER SET and COLLATE attributes if you require a particular character set and collation. For example, if you need to store utf8 (Unicode) and sjis (Japanese) strings, you might define a table with two columns like this:

CREATE TABLE mytbl (   utf8data VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_danish_ci,   sjisdata VARCHAR(100) CHARACTER SET sjis COLLATE sjis_japanese_ci ); 

It is allowable to omit CHARACTER SET, COLLATE, or both from a column definition:

  • If you specify CHARACTER SET and omit COLLATE, the default collation for the character set is used.

  • If you specify COLLATE and omit CHARACTER SET, the character set implied by the collation name (the first part of the name) is used. For example, utf8_danish_ci and sjis_japanese_ci imply utf8 and sjis, respectively. (This means that the CHARACTER SET attributes could have been omitted from the preceding CREATE TABLE statement.)

  • If you omit both CHARACTER SET and COLLATE, the column is assigned the table default character set and collation. (A table definition can include those attributes following the closing parenthesis at the end of the CREATE TABLE statement. If present, they apply to columns that have no explicit character set or collation of their own. If omitted, the table defaults are taken from the database defaults. The database defaults can be specified when you create the database with the CREATE DATABASE statement. The server defaults apply to the database if they are omitted.)

The server default character set and collation are latin1 and latin1_swedish_ci unless you start the server with the --character-set-server and --collation-server options to specify different values. This means that, by default, strings use the latin1 character set and are not case-sensitive.

MySQL also supports ENUM and SET string types, which are used for data that has a fixed set of allowable values. You can use the CHARACTER SET and COLLATE attributes for these data types as well.




MySQL Cookbook
MySQL Cookbook
ISBN: 059652708X
EAN: 2147483647
Year: 2004
Pages: 375
Authors: Paul DuBois

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net