The following table lists the string data types provided in MySQL.
When you choose a string data type, consider the following factors:
The following discussion first describes the general differences between non-binary and binary strings, and then the specific characteristics of each of the string data types. 5.4.1. Character Set SupportStrings in MySQL may be treated as non-binary or binary. The differences between these types of strings make them suited to different purposes. The most general difference is that non-binary strings have a character set and consist of characters in that character set, whereas binary strings consist simply of bytes that are distinguished only by their numeric values. This section explores the implications of this difference. Non-binary strings have the following characteristics:
A given character set may have several collations to choose from. This enables you to select different sort orders for the same character set. For example, with the latin1 character set, you can choose from any of the following collations, many of which correspond to the sorting order rules of specific languages: mysql> SHOW COLLATION LIKE 'latin1%'; +-------------------+---------+----+---------+----------+---------+ | Collation | Charset | Id | Default | Compiled | Sortlen | +-------------------+---------+----+---------+----------+---------+ | latin1_german1_ci | latin1 | 5 | | | 0 | | latin1_swedish_ci | latin1 | 8 | Yes | Yes | 1 | | latin1_danish_ci | latin1 | 15 | | | 0 | | latin1_german2_ci | latin1 | 31 | | Yes | 2 | | latin1_bin | latin1 | 47 | | Yes | 1 | | latin1_general_ci | latin1 | 48 | | | 0 | | latin1_general_cs | latin1 | 49 | | | 0 | | latin1_spanish_ci | latin1 | 94 | | | 0 | +-------------------+---------+----+---------+----------+---------+ Each collation name ends with _ci, _cs, or _bin, signifying that the collation is case insensitive, case sensitive, or binary. Binary strings have the following characteristics:
String comparison rules are addressed in more detail in Section 10.3.1, "Case Sensitivity in String Comparisons." The different treatment of non-binary and binary strings in MySQL is important when it comes to choosing data types for table columns. You normally base the decision on whether you want to treat column values as containing characters or raw bytes. Thus, non-binary columns are more suitable for character strings such as textual descriptions, and binary columns are more suitable for raw data such as images or compressed data. Three data types store non-binary strings: CHAR, VARCHAR, and TEXT. THRee data types store binary strings: BINARY, VARBINARY, and BLOB. They're each described further in the following sections. You can mix non-binary and binary string columns within a single table. Also, for non-binary string columns, different columns can use different character sets and collations. For example, assume that you want to create a table named auth_info, to store login name and password authorization information for users of an application, as well as a picture to associate with each user. You want login names to match in any lettercase, passwords to be case sensitive, and the picture column must store binary image data. The following table definition satisfies these requirements: CREATE TABLE auth_info ( login CHAR(32) CHARACTER SET latin1, password CHAR(32) CHARACTER SET latin1 COLLATE latin1_general_cs, picture MEDIUMBLOB ); 5.4.2. Non-Binary String Data Types: CHAR, VARCHAR, TEXTThe CHAR, VARCHAR, and TEXT data types store non-binary strings (that is, strings of characters that have a character set and collation). The types differ in terms of their maximum allowable length and in how trailing spaces are handled. The CHAR data type is a fixed-length type. To define a CHAR column, provide the column name, the keyword CHAR, and the maximum length of acceptable values in parentheses. The length should be a number from 0 to 255. The CHAR data type holds strings up to the length specified in the column definition. Values in a CHAR column always take the same amount of storage. For example, a column defined as CHAR(30) requires 30 characters for each value, even empty values. Values shorter than the designated length are padded with spaces to that length when they are stored. Trailing spaces are removed from CHAR values when they are retrieved, so retrieved values might not be the same length as when stored. VARCHAR is a variable-length data type. VARCHAR columns are defined similarly to CHAR columns, but the maximum length can be a number up to 65,535. (The actual allowable maximum length is a few characters less due to internal restrictions imposed by storage engines.) A string stored into a VARCHAR column takes only the number of characters required to store it, plus one or two bytes to record the string's length. (One byte for columns declared with a length less than 256, two bytes otherwise.) Values in a VARCHAR column are stored as given. Trailing spaces are not removed or added for storage or retrieval. The TEXT data type comes in four different sizes, differing in the maximum length of values they can store. All are variable-length types, so an individual value requires storage equal to the length (in characters) of the value, plus 1 to 4 bytes to record the length of the value. Trailing spaces are not removed or added for storage or retrieval. The following table summarizes the non-binary string data types. For the storage requirement values, M represents the maximum length of a column. L represents the actual length of a given value, which may be 0 to M.
For fixed-length (CHAR) columns, MySQL must allocate enough space to store any value containing up to as many characters allowed by the column declaration. For CHAR(10), 10 bytes are required if the column has a single-byte character set. If the column has a multi-byte character set, MySQL must allocate 10 times the width of the widest allowed character. For utf8, each character takes from one to three bytes, so MySQL must allocate three bytes per character, or 30 bytes per column value. This amount of storage is required even for storing an empty string. For variable-length (VARCHAR, TEXT) columns, MySQL allocates only the required amount of space for each stored value. A 10-character utf8 VARCHAR column requires 10 bytes (plus a length byte) for a value that contains only single-byte characters, but 30 bytes (plus a length byte) if it contains only triple-byte characters. Non-binary strings have a character set and collation, and non-binary string columns by default are assigned the character set and collation of the table that contains them. The CHARACTER SET and COLLATE attributes can be used to designate specific values for a column, as described in Section 5.6, "Column Attributes." 5.4.3. Binary String Data Types: BINARY, VARBINARY, BLOBThe BINARY, VARBINARY, and BLOB data types are the binary string equivalents of the non-binary CHAR, VARCHAR, and TEXT data types. That is, they store strings that consist of bytes rather than characters, and they have no character set or collation. Like the corresponding non-binary types, binary string types differ in terms of their maximum allowable length and in how trailing spaces are handled. BINARY is a fixed-length data type. The length should be a number from 0 to 255. Values shorter than the designated length are padded with spaces to that length when they are stored. Trailing spaces are removed from BINARY values when they are retrieved, so retrieved values might not be the same length as when stored. For this reason, BINARY may not be suited for applications that store binary data if stored values can have trailing spaces. For example, if an encrypted value happens to end with spaces, the retrieved value will be different from the value that was stored. VARBINARY is a variable-length data type. The maximum length can be a number up to 65,535. (The actual allowable maximum length is a few bytes less due to internal restrictions imposed by storage engines.) Values in a VARBINARY column are stored as given. Trailing spaces are not removed or added for storage or retrieval. The BLOB data type comes in four different sizes, differing in the maximum length of values they can store. All are variable-length types, so an individual value requires storage equal to the length (in bytes) of the value, plus 1 to 4 bytes to record the length of the value. The following table summarizes the binary string data types. For the storage requirement values, M represents the maximum length of a column. L represents the actual length of a given value, which may be 0 to M.
5.4.4. The ENUM and SET Data TypesThe ENUM and SET string data types are used when the values to be stored in a column are chosen from a fixed set of values. You define columns for both types in terms of string values, but MySQL represents them internally as integers. This leads to very efficient storage, but can have some results that are unintuitive unless you keep this string/integer duality in mind. ENUM is an enumeration type. An ENUM column definition includes a list of allowable values; each value in the list is called a "member" of the list. Every value stored in the column must equal one of the values in the list. A simple (and very common) use for ENUM is to create a two-element list for columns that store yes/no or true/false choices. The following table shows how to declare such columns: CREATE TABLE booleans ( yesno ENUM('Y','N'), truefalse ENUM('T','F') ); Enumeration values aren't limited to being single letters or uppercase. The columns could also be defined like this: CREATE TABLE booleans ( yesno ENUM('yes','no'), truefalse ENUM('true','false') ); An ENUM column definition may list up to 65,535 members. Enumerations with up to 255 members require one byte of storage per value. Enumerations with 256 to 65,535 members require two bytes per value. The following table contains an enumeration column continent that lists continent names as valid enumeration members: CREATE TABLE Countries ( name char(30), continent ENUM ('Asia','Europe','North America','Africa', 'Oceania','Antarctica','South America') ); The values in an ENUM column definition are given as a comma-separated list of quoted strings. Internally, MySQL stores the strings as integers, using the values 1 through n for a column with n enumeration members. The following statement assigns the enumeration value 'Africa' to the continent column; MySQL actually stores the value 4 because 'Africa' is the fourth continent name listed in the enumeration definition: INSERT INTO Countries (name,continent) VALUES('Kenya','Africa'); MySQL reserves the internal value 0 as an implicit member of all ENUM columns. It's used to represent illegal values assigned to an enumeration column. For example, if you assign 'USA' to the continent column, MySQL will store the value 0, rather than any of the values 1 through 7, because 'USA' is not a valid enumeration member. If you select the column later, MySQL displays 0 values as the empty string ''. (In strict SQL mode, an error occurs if you try to store an illegal ENUM value.) The SET data type, like ENUM, is declared using a comma-separated list of quoted strings that define its valid members. But unlike ENUM, a given SET column may be assigned a value consisting of any combination of those members. The following definition contains a list of symptoms exhibited by allergy sufferers: CREATE TABLE allergy ( symptom SET('sneezing','runny nose','stuffy head','red eyes') ); A patient may have any or all (or none) of these symptoms, and symptom values therefore might contain zero to four individual SET members, separated by commas. The following statements set the symptom column to the empty string (no SET members), a single SET member, and multiple SET members, respectively: INSERT INTO allergy (symptom) VALUES(''); INSERT INTO allergy (symptom) VALUES('stuffy head'); INSERT INTO allergy (symptom) VALUES('sneezing,red eyes'); MySQL represents SET columns as a bitmap using one bit per member, so the elements in the symptom definition have internal values of 1, 2, 4, and 8 (that is, they have the values of bits 0 through 3 in a byte). Internally, MySQL stores the values shown in the preceding INSERT statements as 0 (no bits set), 4 (bit 2 set), and 9 (bits 0 and 3 set; that is, 1 plus 8). A SET definition may contain up to 64 members. The internal storage required for set values varies depending on the number of SET elements (1, 2, 3, 4, or 8 bytes for sets of up to 8, 16, 24, 32, or 64 members). If you try to store an invalid list member into a SET column, it's ignored because it does not correspond to any bit in the column definition. For example, setting a symptom value to 'coughing,sneezing,wheezing' results in an internal value of 1 ('sneezing'). The 'coughing' and 'wheezing' elements are ignored because they aren't listed in the column definition as legal set members. (In strict SQL mode, an error occurs if you try to store an illegal SET value.) As mentioned earlier in this section, the conversion between string and numeric representations of ENUM and SET values can have unintuitive results. For example, although you would normally refer to an enumeration column using the string forms of its values, you can also use the internal numeric values. The effect of this can be very subtle if the string values look like numbers. Suppose that you define a table t like this: CREATE TABLE t (age INT, siblings ENUM('0','1','2','3','>3')); In this case, the enumeration values are the strings '0', '1', '2', '3', and '>3', and the matching internal numeric values are 1, 2, 3, 4, and 5, respectively. Now suppose that you issue the following statement: INSERT INTO t (age,siblings) VALUES(14,'3'); The siblings value is specified here as the string '3', and that is the value assigned to the column in the new record. However, you can also specify the siblings value as a number, as follows: INSERT INTO t (age,siblings) VALUES(14,3); But in this case, 3 is interpreted as the internal value, which corresponds to the enumeration value '2'! The same principle applies to retrievals. Consider the following two statements: SELECT * FROM t WHERE siblings = '3'; SELECT * FROM t WHERE siblings = 3; In the first case, you get records that have an enumeration value of '3'. In the second case, you get records where the internal value is 3; that is, records with an enumeration value of '2'. |