SQLite Engine > Data Type Management
SQLite data processing happens in the VM module. The VM is the sole manipulator of data stored in databases; everything else is controlled by bytecode programs that it executes. It decides what data to store there, and what data to retrieve from there. Assigning appropriate storage types to data values, and doing necessary type conversions, are the primary tasks of the VM. There are three places of data exchange where type conversions may take place: from application to engine, from engine to application, and from engine to engine. For the first two cases, the VM assigns types to user data. The VM attempts to convert user-supplied values into the declared SQL type of the column whenever it can, and vice versa. For the last item, the data conversions are required for expression evaluations. I discuss these three data conversion issues in the next three subsections.
In the Record Format section, I discussed storage formats for records of tables and indexes. Each record field value has a storage type. Every value supplied to SQLite, whether as literal embedded in an SQL statement, or as value bound to a prepared statement, is assigned a storage type before the statement is executed. The type is used to encode the supplied value to appropriate physical representation. The VM decides on the storage type for a given input value of a column in three steps: it first determines the storage type of the input data, then the declared SQL type of the column, and finally, if required, it does the type conversion. Under circumstances described next, SQLite may convert values between numeric storage types (INTEGER and REAL) and TEXT during query evaluation.
The VM assigns initial storage types to user supplied values as follows. A value specified as literal as part of an SQL statement is assigned one of the following storage types:
TEXT if the value is enclosed by single or double quotes
INTEGER if the value is an unquoted number with no decimal point or exponent
REAL if the value is an unquoted number with a decimal point or exponent
NULL if the value is the character string NULL without any quote surrounding it
BLOB if the value is specified using the X'ABCD' notation
Otherwise, the input value is rejected, and the query evaluation fails. Values for SQL parameters that are supplied using the sqlite3_bind_* API functions are assigned the storage types that most closely match the native type bound (e.g., sqlite3_bind_blob binds a value with storage type BLOB).
The storage type of a value that is the result of an SQL scalar operator depends on the outermost operator of the expression. User-defined functions may return values with any storage type. It is not generally possible to determine the type of the result of an expression at SQL statement preparation time. The VM assigns the storage type at runtime on obtaining the value.
Even though each column (except integer primary key) can store any type of value, the value may have an affinity of its declared SQL type. Other SQL database engines that I am aware of use the more restrictive system of static typing, where the type is associated with the container, and not with the value. To maximize compatibility between SQLite and other database engines, SQLite supports the concept of type affinity on columns. The type affinity of a column is the recommended type for values stored in that column: "it is recommended, not required." Any column can still store any type of data. It is just that some columns, given the choice, will prefer to use one type over another.
SQLite is typeless, i.e., there is no domain constraint. It permits storing any type of data in any table column irrespective of the declared SQL type of that column. (The exception to this rule is the rowid column; this column stores only integer values; any other value is rejected.) It lets you omit the SQL type declaration in create table statements. For example, create table T1(a, b, c) is a valid SQL statement in SQLite.
The preferred type for a column is called its affinity. Each column is assigned one of five affinity types: TEXT, NUMERIC, INTEGER, REAL, and NONE. (You may note a bit of naming conflict: "text," "integer," and "real" are names used for storage types, too. You can, however, determine the type category from the context.) The affinity type of a column is determined, depending on its declared SQL type in the CREATE TABLE statement, according to the following rules (SQLite is not at all strict about spelling errors in SQL type declarations):
If the SQL type contains the substring INT, then that column has INTEGER affinity.
If the SQL type contains any of the substrings CHAR, CLOB, or TEXT, then that column has TEXT affinity. (The SQL type VARCHAR contains the string CHAR, and therefore has the TEXT affinity.)
If the SQL type contains the substring BLOB, or if no type is specified, then the column has NONE affinity.
If the SQL type contains any of the substrings REAL, FLOA, or DOUB, then the column has REAL affinity.
Otherwise, the affinity type is NUMERIC.
The VM evaluates the rules in the same order as given above. The pattern matching is case insensitive. For example, if the declared SQL type of a column is BLOBINT, the affinity is INTEGER, and not NONE. If an SQL table is created using a create table table1 as select... statement, then all columns have no SQL types, and they are given NONE affinity. The type of implicit rowid is always integer.
There is a relationship between affinity types and storage types. If a user-supplied value for a column does not satisfy the relationship, the value is either rejected or converted into the appropriate format. When a value is inserted into a column, the VM first assigns the most suitable storage type (see the "Input data type" section), and then makes an attempt to convert the initial storage type into the format of its affinity type (see the "Column affinity" section). It does so using the following rules:
A column with TEXT affinity stores all data that have NULL, TEXT, or BLOB storage types. If a numerical value (integer or real) is inserted into the column, the value is converted into text form, and the final storage type becomes TEXT.
A column with NUMERIC affinity may contain values with all five storage types. When a text value is inserted into a NUMERIC column, an attempt is made to convert the value to an integer or real number. If the conversion is successful, then the converted value is stored using the INTEGER or REAL storage type. If the conversion cannot be performed, the value is stored using the TEXT storage type. No attempt is made to convert NULL or BLOB values.
A column with INTEGER affinity behaves in the same way as a column with NUMERIC affinity, except that if a real value with no floating point component (or text value that converts to such) is inserted, the value is converted to an integer and stored using the INTEGER storage type.
A column with REAL affinity behaves like a column with NUMERIC affinity, except that it forces integer values into floating point representation. (As an optimization, integer values are stored on disk as integers to take up less space, and are only converted to floating point as the values are read out of the table.)
A column with affinity NONE does not prefer one storage type over another. The VM does not attempt to convert any input value.
All SQL database engines do data conversions. They reject input values that cannot be converted into desired types. The difference is that SQLite may still store the values even if a format conversion is not possible. For example, if you have a table column declared as SQL type INTEGER, and try to insert a string (e.g., "123" or "abc"), the VM will look at the string and see if it looks like a number. If the string does look like a number (as in "123"), it is converted into a number (and into an integer if the number does not have a fractional part), and is stored with real or integer storage type. But, if the string is not a well-formed number (as in the case "abc"), it is stored as a string with TEXT storage type. A column with TEXT affinity tries to convert numbers into an ASCII text representation before storing them. But BLOBs are stored in TEXT columns as BLOBs because SQLite cannot in general convert a BLOB into text. SQLite allows inserting string values into integer columns. This is a feature, not a bug.
Let us study a very simple example for more clarity. An example of a typical row record of a typical typeless table is given in Figure 6-3. The initial storage types for the a, b, and c column input values are integer, NULL, and text, respectively. The affinity type of all columns is NONE, and the VM does not make any attempt to convert the initial storage types. In the figure, the row record (header plus data items) consists of 11 bytes as follows. (All numbers in the record are given in hexadecimals.)
The header is 4 bytes long: one byte for the header size itself, plus one byte for each of three manifest types. The value 4 is encoded as the single byte 0x04.
Type 1 is the number 2 (representing a 2-byte signed integer) encoded as a single byte 0x02.
Type 2 is the number 0 (NULL) encoded as a single byte 0x00.
Type 3 is the number 23 (a text, (23 13)/2 = 5 bytes long) encoded as a single byte 0x17.
Data 1 is a 2-byte integer 00B1, which is the value 177. Note that 177 could not be encoded as a single byte because B1 is -79, not 177.
Data 2 is NULL, and it does not occupy any byte in the record.
Data 3 is the 5-byte string 68 65 6C 6C 6F. The zero-terminator is omitted.
The sqlite3_column_* API functions read data out of the SQLite engine. These functions attempt to convert the data value where appropriate. For example, if the internal representation is FLOAT, and a text result is requested (sqlite3_column_text function), the VM uses the sprintf() library function internally to do the conversion before returning the value to the caller. Table 6-6 presents data conversion rules that the VM applies on internal data to produce output data for applications.
|Internal Type||Requested Type||Conversion|
|NULL||INTEGER||Result is 0|
|NULL||FLOAT||Result is 0.0|
|NULL||TEXT||Result is NULL pointer|
|NULL||BLOB||Result is NULL pointer|
|INTEGER||FLOAT||Convert from integer to float|
|INTEGER||TEXT||ASCII rendering of the integer|
|INTEGER||BLOB||Same as for INTEGERTEXT|
|FLOAT||INTEGER||Convert from float to integer|
|FLOAT||TEXT||ASCII rendering of the float|
|FLOAT||BLOB||Same as FLOATTEXT|
|TEXT||INTEGER||Use atoi() C library function|
|TEXT||FLOAT||Use atof() C library function|
|BLOB||INTEGER||Convert to TEXT then use atoi()|
|BLOB||FLOAT||Convert to TEXT then use atof()|
|BLOB||TEXT||Add a \000 terminator if needed|
The VM may convert internal data before comparing them one against another or evaluating expressions. It uses the following conversions for internal data.
The NULL value can be used for any table columns except primary key columns. The storage type of the NULL value is "NULL." The NULL value is distinct from all valid values for a given column irrespective of their storage types. SQL standards are not very specific about how to handle NULL values from columns in expressions. It is not clear from the standards exactly how NULL values should be handled in all circumstances. For example, how do we compare NULLs with one another, and with other values? SQLite handles NULLs in the way many other RDBMSs do. NULLs are indistinct (i.e., the same value) for the purpose of the SELECT DISTINCT statement, the UNION operator in a compound SELECT, and GROUP BY. NULLs are, however, distinct in a UNIQUE column. NULLs are handled by the built-in SUM function as specified by the SQL standard. Arithmetic operations on NULL produce NULL.
SQLite supports three kinds of comparison operations:
binary comparison operators =, <, <=, >, >= and ! =
set membership operator IN
ternary comparison operator BETWEEN
The outcome of a comparison depends on the storage types of the two values being compared, according to the following rules:
A value (the lefthand side operand) with storage type NULL is considered less than any other value (including another value with storage type NULL).
An INTEGER or REAL value is less than any TEXT or BLOB value. When an INTEGER or REAL is compared to another INTEGER or REAL, a numerical comparison is performed.
A TEXT value is less than a BLOB value. When two TEXT values are compared, the standard C library function memcmp is usually used to determine the result. However this can be overridden by user-defined collation functions.
When two BLOB values are compared, the result is always determined using memcmp.
Before applying these rules, the first and foremost task for the VM is to determine the storage types of the operands of the comparison operator. It first determines the preliminary storage types of operands, and then, (if necessary) converts values between types based on their affinity. Finally, it performs the comparison using the above four rules.
If an expression is a column or a reference to a column via an AS alias or a subselect with a column as the return value or rowid, then the affinity of that column is used as the affinity of the expression. Otherwise, the expression does not have an SQL type, and its affinity is NONE. SQLite may attempt to convert values between the numeric storage types (INTEGER and REAL) and TEXT before performing a comparison operation. For binary comparisons, this is done in the cases enumerated next. The term "expression" in the bullets means any SQL scalar expression or literal other than a column value.
When two column values are compared, if any one column has NUMERIC affinity, then that affinity is preferred for both values. That is, the VM makes an attempt to convert the other column value before the comparison.
When a column value is compared to the result of an expression, the affinity of the column is applied to the result of the expression before the comparison takes place.
When values of two expressions are compared, then no conversions occur. The values are compared following the above-mentioned four standard rules. For example, if a string is compared to a number, the number will always be less than the string.
In SQLite, the expression a BETWEEN b AND c is equivalent to a >= b AND a <=c, even if this means that different affinities are applied to a in each of the two comparisons required to evaluate the expression.
Expressions of the kind a IN (SELECT b ...) are handled by the three rules enumerated above for = binary operator (e.g., a = b). For example if b is a column value, and a is an expression, then the affinity of b is applied to a before any comparisons take place. SQLite treats the expression a IN (x, y, z) as equivalent to a = z OR a = y OR a = z even if different affinities are applied to a in different equality expressions.
Here are a few simple examples. Suppose you have a table t1 that is created by CREATE TABLE t1(a TEXT, b NUMERIC, c BLOB). You insert a single row in the table by executing INSERT INTO t1 VALUES ('500', '500', '500'). The final storage types for a, b, and c column values are TEXT, INTEGER, TEXT, respectively.
SELECT a < 60, a < 40 FROM t1 converts 60 and 40 to "60" and "40," respectively, before the comparison, because the a column has TEXT affinity and the values are compared as TEXT; and the statement returns 1|0 as output because "500" is less than "60" as text, and is not less than "40."
SELECT b < 60, b < 600 FROM t1 does not convert any literal values, and compares values as NUMERIC, and returns 0|1 as output.
SELECT c < 60, c < 600 from t1 does not convert 60 and 600 because c has NONE affinity. The two values (storage class NUMERIC) are less than "500" (storage class TEXT), and the SELECT returns 0|0 as output.
All mathematical operators (except the concatenation operator ||) apply NUMERIC affinity to all operands prior to their evaluation. If any operand cannot be converted to NUMERIC, then the result of the operation is NULL. For the concatenation operator, TEXT affinity is applied to both operands. If either operand cannot be converted to TEXT (because it is NULL or BLOB), then the result of the concatenation is NULL.
When values are sorted by an ORDER BY clause, no storage type conversions take place before the sort. The previously stated standard comparison rules are followed: values with storage type NULL come first, followed by INTEGER and REAL values interspersed in numeric order, followed by TEXT values, usually in memcmp() order, and, finally, BLOB values in memcmp() order. Texts can be ordered by user-defined collation functions by overriding the memcmp() function. (Collations are user-defined sorting functions. You can refer to the SQLite homepage to learn how to use collations in SQL statements.)
When grouping values with the GROUP BY clause, no storage type conversions take place before grouping. Values with different storage types are considered distinct, except for INTEGER and REAL values, which are considered equal if they are numerically equal.
The compound SELECT operators (namely UNION, INTERSECT, and EXCEPT), perform implicit comparisons between values. Before these comparisons are performed, an affinity may be applied to each value. The same affinity, if any, is applied to all values that may be returned in a single column of the compound SELECT result set. The affinity applied is the affinity of the column returned by the leftmost component SELECT that has a column value (and not some other kind of expression) in that position. If for a given compound SELECT column none of the component SELECTs return a column value, no affinity is applied to the values from that column before they are compared.