7.7. Data TypesBefore continuing with our discussion of database objects, you need to understand the data types supported by DB2. A data type indicates what type of data can be saved in a column or variable and how large it can be. DB2 data types are either:
7.7.1. DB2 Built-in Data TypesDB2 provides several built-in data types, which can be classified into the following categories:
Figure 7.8 summarizes the built-in data types supported in DB2. Figure 7.8. The DB2 built-in data types7.7.1.1 Numeric Data TypesThe numeric data types include the following:
A SMALLINT uses the least amount of storage in the database for each value. The data value range for a SMALLINT is 32768 to 32767. The precision for a SMALLINT is 5 digits to the left of the decimal. Each SMALLINT column value uses 2 bytes of database storage. An INTEGER uses twice as much storage as a SMALLINT but has a greater range of possible values. The data value range for an INTEGER data type is 2,147,483,648 to 2,147,483,647. The precision for an INTEGER is 10 digits to the left of the decimal. Each INTEGER column value uses 4 bytes of database storage. The BIGINT data type is available for supporting 64-bit integers. The value range for BIGINT is 9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. Since platforms include native support for 64-bit integers, processing large numbers with BIGINT is more efficient than processing with DECIMAL and more precise than using DOUBLE or REAL. Each BIGINT column value uses 8 bytes of database storage. The SMALLINT, INTEGER, and BIGINT data types do not allow any digits to the right of the decimal. A DECIMAL or NUMERIC data type is used for numbers with fractional and whole parts. The DECIMAL data is stored in a packed format. You must provide the precision and scale when using a DECIMAL data type. The precision is the total number of digits (ranging from 1 to 31 digits), and the scale is the number of digits in the fractional part of the number. NOTE If you do not explicitly specify the precision and scale, DB2 will use a default value of DECIMAL(5,0), that is, a precision of 5 digits and a scale of 0 digits. A REAL or FLOAT data type is an approximation of a number. The approximation requires 32 bits or 4 bytes of storage. To specify a single-precision number using the REAL data type, you must define its length between 1 and 24 (especially if you use the FLOAT data type, as it can represent both single- and double-precision numbers and is determined by the integer value specified). A DOUBLE or FLOAT data type is also an approximation of a number. The approximation requires 64 bits or 8 bytes of storage. To specify a double-precision number using the FLOAT data type, you must define its length between 25 and 53. 7.7.1.2 String Data TypesYou can define string or character columns as either fixed length or variable length. The character string data types include the following:
A CHAR column is stored as a fixed-length field in the database; if the string you enter is shorter than the defined length of the column, the string will be padded with blanks. This wastes space within the database if you tend to store strings that are a lot shorter than the length of the column. A fixed-length character column can have a maximum length of 254 characters. If no length is specified, DB2 will use the default length of 1 character. A VARCHAR column stores only the characters entered for the string, and its maximum size closely corresponds to the page size for the table. For example, for a table created in a table space with a 32K page size, the maximum length of a VARCHAR string is 32,672 characters. A LONG VARCHAR column also stores only the characters entered for the string, but it does not store them in the data object with the other columns. LONG VARCHAR is a special data type stored in a separate long object. The maximum length of a LONG VARCHAR string is 32,700 characters. NOTE You can use the FOR BIT DATA clause of CHAR, VARCHAR, and LONG VARCHAR data types to indicate the data will be stored as a sequence of bytes. The clause can be used for non-traditional data like video and audio. Code page conversions do not occur because data is compared bit by bit. GRAPHIC data types use 2 bytes of storage to represent a single character. There are three types:
NOTE The LONG VARCHAR and LONG VARGRAPHIC data types are deprecated, meaning that they are still supported but are no longer enhanced. These data types are not manipulated in the buffer pool but are accessed directly from disk, a direct input/output (I/O) operation, so using them may impact performance. Instead, use the VARCHAR or VARGRAPHIC data types, respectively, because their maximum lengths are very close to those of the LONG data types, but VARCHAR and VARGRAPHIC provide better performance. When a VARCHAR data type's maximum size of 32,672 bytes is not enough to hold your data, use large objects. Large objects can store data greater than 32K up to 2GB in size. They are typically used to store information such as a long XML document, an audio file, or a picture. Three kinds of LOB data types are provided with DB2:
BLOBs store variable-length data in binary format and are ideal for storing video or audio information in your database. This data type has some restrictions; for example, you cannot sort by this type of column. CLOBs store large amounts of variable-length single-byte character set (SBCS) or multi-byte character set (MBCS) character strings, for example, large amounts of text such as white papers or long documents. DBCLOBs store large amounts of variable-length double-byte character set (DBCS) character strings, such as large amounts of text in Chinese. Similar to LONG VARCHAR and LONG VARGRAPHIC data types, LOBs are accessed directly from disk without going through the buffer pool, so using LOBs is slower than using other data types. In addition, because changes to a database are logged in transaction log files, these files might get filled quickly when modifying a LOB column. To prevent this from happening, the CREATE TABLE statement has the NOT LOGGED option for LOB columns. For LOB columns defined as more than 1GB in size, NOT LOGGED is required. The CREATE TABLE statement also has the COMPACT option for LOBs to allocate just the necessary disk space. However, if you perform an update to the LOB column that would increase the size of the LOB, DB2 would need to allocate more space at that time, which incurs a performance penalty. Note that this option does not compress the LOBs. NOTE Do not use LOBs to store data less than 32K in size. Instead, use VARCHAR or VARCHAR FOR BIT DATA, which can hold a maximum of 32,672 bytes. This will help with database performance. 7.7.1.3 Datetime Data TypesDate and time data types are special character data types used to store date and/or time values in specific formats. DB2 supports three datetime data types: DATE, TIME, and TIMESTAMP.
7.7.1.4 Datalink Data TypesDatalinks allow relational databases to work with file systems. By using a DATALINK column, you can point to files and take advantage of the power of relational database features like referential integrity. Refer to the DB2 UDB SQL Reference manual for more information about datalinks. 7.7.2. User-Defined TypesUser-defined types (UDTs) allow database users to create or extend the use of data types to their own needs. UDTs can be classified as DISTINCT, STRUCTURE, or REFERENCE. This section discusses only DISTINCT types. Please refer to the DB2 UDB SQL Reference manual for the other kinds of UDTs. A DISTINCT UDT can enforce business rules and prevent data from being used improperly. UDTs are built on top of existing DB2 built-in data types. To create a UDT, use the CREATE DISTINCT TYPE statement: CREATE DISTINCT TYPE type_name AS built-in_datatype WITH COMPARISONS The WITH COMPARISONS clause is required for all data types, except BLOB, CLOB, DBCLOB, LONG VARCHAR, LONG VARGRAPHIC, and DATALINK data types. This clause causes DB2 to create system-generated SQL functions to perform casting between the types; these are known as casting functions. For example, let's say you create two UDTs, celsius and fahrenheit: CREATE DISTINCT TYPE celsius AS integer WITH COMPARISONS CREATE DISTINCT TYPE fahrenheit AS integer WITH COMPARISONS The first statement creates a casting function named celsius, and the second statement creates a casting function named fahrenheit. Now, let's say you create a table using the newly created UDTs: CREATE TABLE temperature (country varchar(100), average_temp_c celsius, average_temp_f fahrenheit ) Table temperature keeps track of the average temperature of each country in the world in both Celsius and Fahrenheit. If you would like to know which countries have an average temperature higher than 35 degrees Celsius, you can issue this query: SELECT country FROM temperature WHERE average_temp_c > 35 Would this query work? At first, you may think it will, but remember that average_temp_c has data type celsius, while 35 is an INTEGER. Even though celsius was created based on the INTEGER built-in data type, this comparison cannot be performed as is. To resolve this problem, use the casting function generated with the creation of the celsius UDT as shown below: SELECT country FROM temperature WHERE average_temp_c > celsius(35) UDTs enforce business rules by preventing illegitimate operations. For example, the following query will not work: SELECT country FROM temperature WHERE average_temp_c = average_temp_f Because column average_temp_c and average_temp_f are of different data types, this query will result in an error. If UDTs had not been created and the INTEGER built-in data type had been used instead for both columns, the query would have workedbut what meaning in real life would that have? To drop a UDT, use the statement DROP DISTINCT TYPE type_name. This will also drop the casting functions associated to the UDT. 7.7.3. Choosing the Proper Data TypeIt is important to choose the proper data type because this affects performance and disk space. To choose the correct data type, you need to understand how your data will be used and its possible values. Table 7.3 summarizes what you should consider.
NOTE For the first two rows of Table 7.3 we chose CHAR versus VARCHAR depending on the length of the data. If the maximum length is fewer than 10 characters, we suggest using a CHAR data type; otherwise, we recommend VARCHAR. Normally for small variable-length columns, a CHAR column provides better performance. We chose the value of 10 characters based on our experience, but it may vary depending on your data. |