DB2 9 s Data Types | DB2 9 Fundamentals: Certification Study Guide

DB2 9's Data Types

If you stop and think about it, most of the "data" you encounter on a day-to-day basis falls into distinct categories. The money you buy coffee with and the change you get back are numerical in nature; the email messages you read and the replies you send are composed of character strings; and many of the things you do, such as waking up, going to the office, returning home, and going to bed revolve around time. Much of the data that gets stored in a DB2 database can be categorized in a similar manner. To ensure that all data is stored as efficiently as possible, DB2 comes equipped with a rich assortment of built-in data types. And in case none of those meet your needs, DB2 also provides the capability to create an infinite number of user-defined data types, which can in turn be used to store complex, nontraditional data that might be found in an intricate computing environment.

The built-in data types available with DB2 9 are classified according to the kind of data they have been designed to hold:

Numeric data
Character string data
Date/time data
Large object data
Extensible Markup Language (XML) documents

In addition to these more common data types, special data types designed to be used with the DB2 Extenders are also available.

Numeric Data Types

As the name suggests, numeric data types are used to store numeric values-specifically, numeric values that have a sign and a precision. The sign is considered positive if the value is greater than or equal to zero and negative if the value is less than zero, while the precision is the actual number of digits used to present the value. Numeric data is stored using a fixed amount of storage space, and the amount of space required increases as the precision of the number goes up. Numeric data types include:

Small integer: The small integer data type is used to store numeric values that have a precision of 5 digits or less. The range for small integer values is -32,768 to 32,767, and 2 bytes of storage space are required for every small integer value stored. (Positive numbers have one less value in their range because they start at the value 0, while negative numbers start at -1.) The keyword SMALLINT is used to denote the small integer data type.
Integer: The integer data type is used to store numeric values that have a precision of 10 digits. The range for integer values is -2,147,483,648 to 2,147,483,647, and 4 bytes of storage space are required for every integer value stored. The keywords INTEGER and INT are used to denote the integer data type.
Big integer: The big integer data type is used to store numeric values that have a precision of 19 digits. The range for big integer values is -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807, and 8 bytes of storage space are required for every big integer value stored. The keyword BIGINT is used to denote the big integer data type. (This data type is typically used on systems that provide native support for 64-bit integers; on such systems, processing large numbers that have been stored as big integers is much more efficient, and any calculations performed are more precise.)
Decimal: The decimal data type is used to store numeric values that contain both whole and fractional parts, separated by a decimal point. The exact location of the decimal point is determined by the precision and the scale of the value (the scale is the number of digits used by the fractional part). The maximum precision allowed for decimal values is 31 digits, and the corresponding scale must be a positive number less than the precision of the number. The amount of storage space needed to store a decimal value can be determined by solving the following equation: Precision ÷ 2 (truncated) + 1 = Bytes required. (For example, the value 67.12345 has precision of 7, 7 ÷ 2 is 3, + 1 makes 4; therefore, 4 bytes are required to store the value 67.12345.) The keywords DECIMAL, DEC, NUMERIC, and NUM are used to denote the decimal data type. If no precision or scale is specified, a scale of 5 and a precision of 0 are used by default: DECIMAL(5,0).
Single-precision floating-point: The single-precision floating-point data type is used to store a 32-bit approximation of a real number. This number can be zero, or it can fall within the range -3.402E+38 to -1.175E-37 or the range 1.175E-37 to 3.402E+38. Each single-precision floating-point value can be up to 24 digits in length, and 4 bytes of storage space are required for every value stored. The keywords REAL and FLOAT are used to denote the single-precision floating-point data type.
Double-precision floating-point: The double-precision floating-point data type is used to store a 64-bit approximation of a real number. This number can be zero, or it can fall within the range -1.79769E+308 to -2.225E-307 or 2.225E-307 to 1.79769E+308. Each double-precision floating-point value can be up to 53 digits in length, and 8 bytes of storage space is required for every value stored. The terms DOUBLE, DOUBLE PRECISION, and FLOAT are used to denote the double-precision floating-point data type.

Character String Data Types

Character string data types are used to store values composed of one or more alphanumeric characters. Together, these characters may form a word, a sentence, a paragraph, or a complete document. A variety of character string data types are available. Deciding on which one to use for a given situation primarily depends on the storage requirements of the data value to be stored. Character string data types include:

Fixed-length character string: The fixed-length character string data type is used to store character string values that are between 1 and 254 characters in length. The amount of storage space needed to store a fixed-length character string value can be determined by solving the following equation: (Number of characters × 1) = Bytes required. (A fixed amount of storage space is allocated, even if all of the space allocated is not needed-short strings are padded with blanks.) The keywords CHARACTER and CHAR are used to denote the fixed-length character string data type.
Varying-length character string: The varying-length character string data type is used to store character string values that are up to 32,672 characters in length. However, the actual length allowed is governed by the table space page size used. For tables that reside in table spaces that use 4K pages, varying-length character string values cannot be more than 4,092 characters in length; for tables that reside in a table spaces that use 8K pages, varying-length character string values cannot be more than 8,188 characters in length, and so on. The amount of storage space needed to store a varying-length character string value can be determined by solving the following equation: (Number of characters × 1) + 4 = Bytes required. (Only the amount of storage space actually needed, plus 4 bytes for an "end-of-string" marker, is allocated-strings are not blank padded.) The keywords CHARACTER VARYING, CHAR VARYING, and VARCHAR are used to denote the varying-length character string data type.
Long varying-length character string: The long varying-length character string data type is used to store character string values that are up to 32,672 characters in length, regardless of the table space page size used. The amount of storage space needed to store a long varying-length character string value can be determined by solving the following equation: (Number of characters × 1) + 24 = Bytes required. The keyword LONG VARCHAR is used to denote the long varying-length character string data type.
Fixed-length double-byte character string: The fixed-length double-byte character string data type is used to store DBCS (double-byte character set) character string values that are up to 127 characters in length. (Most Asian character sets are double-byte character sets.) The amount of storage space needed to store a fixed-length double-byte character string value can be determined by solving the following equation: (Number of characters × 2) = Bytes required. The term GRAPHIC is used to denote the fixed-length double-byte character string data type.
Varying-length double-byte character string: The varying-length double-byte character string data type is used to store DBCS character string values that are up to 16,336 characters in length. Again, the actual length allowed is governed by the table space page size used. For tables that reside in table spaces that use 4K pages, varying-length double-byte character string values cannot be more than 2,046 characters in length; for tables that reside in table spaces that use 8K pages, varying-length double-byte character string values cannot be more than 4,094 characters in length; and so on. The amount of storage space needed to store a varying-length double-byte character string value can be determined by solving the following equation: (Number of characters × 2) + 4 = Bytes required. The keyword VARGRAPHIC is used to denote the varying-length double-byte character string data type.
Long varying-length double-byte character string: The long varying-length double-byte character string data type is used to store DBCS character string values that are up to 16,350 characters in length, regardless of the table space page size used. The amount of storage space needed to store a long varying-length double-byte character string value can be determined by solving the following equation: (Number of characters × 2) + 24 = Bytes required. The keyword LONG VARGRAPHIC is used to denote the long varying-length character string data type.

Date/Time Data Types

Data/time data types are used to store values that represent dates and times. From a user perspective, these values appear to be character strings; however, they are physically stored as binary packed strings. Date/time data types include:

Date: The date data type is used to store three-part values (year, month, and day) that represent calendar dates. The range for the year portion is 0001 to 9999; the range for the month portion is 1 to 12; and the range for the day portion is 1 to 28, 29, 30, or 31, depending upon the month value specified and whether or not the year specified is a leap year. Externally, date values appear to be fixed-length character string values 10 characters in length. However, only 4 bytes of storage space are required for every date value stored. The keyword DATE is used to denote the date data type.
Time: The time data type is used to store three-part values (hours, minutes, and seconds) that represent time, using a 24-hour clock. The range for the hours portion is 0 to 24; the range for the minutes portion is 0 to 59; and the range for the seconds portion is 0 to 59. Externally, time values appear to be fixed-length character string values 8 characters in length. However, only 3 bytes of storage space are required for every time value stored. The keyword TIME is used to denote the time data type.
Timestamp: The timestamp data type is used to store seven-part values (year, month, day, hours, minutes, seconds, and microseconds) that represent a specific calendar date and time (using a 24-hour clock). The range for the year portion is 0001 to 9999; the range for the month portion is 1 to 12; the range for the day portion is 1 to 28, 29, 30, or 31, depending upon the month value specified and whether or not the year specified is a leap year; the range for the hours portion is 0 to 24; the range for the minutes portion is 0 to 59; the range for the seconds portion is 0 to 59; and the range for the microseconds portion is 0 to 999,999. Externally, timestamp values appear to be fixed-length character string values 26 characters in length (this string is displayed in the form YYYY-MM-DD-HH.MM.SS.NNNNNN, which translates to Year-Month-Day-Hour.Minute.Second.Microseconds). However, only 10 bytes of storage space are required for every timestamp value stored. The keyword TIMESTAMP is used to denote the timestamp data type.

Because the representation of date and time values varies throughout the world, the actual string format used to present a date or a time value is dependent upon the territory code assigned to the database being used. Table 6-1 shows the date and time string formats that are available with DB2 9.

Table 6-1: DB2 9 Date and Time Formats
Open table as spreadsheet
Format Name	Abbreviation	Date String Format	Time String Format
International Organization for Standardization	ISO	YYYY-MM-DD	HH.MM.SS
IBM USA Standard	USA	MM/DD/YYYY	HH:MM AM or PM
IBM European Standard	EUR	DD.MM.YYYY	HH.MM.SS
Japanese Industrial Standard	JIS	YYYY-MM-DD	HH:MM:SS
Site Specific	LOC	Based on database territory and country code	Based on database territory and country code
For date formats, YYYY = Year, MM = Month, and DD = Day; for time formats, HH = Hour, MM = Minute, and SS = Seconds
Adapted from Tables 3 and 4 on page 89 of the DB2 SQL Reference - Volume 1 manual.

Large Object (LOB) Data Types

Large object (LOB) data types are used to store large amounts of unstructured data. Large object data types include the following:

Binary large object: The binary large object data type is used to store binary data values (such as documents, graphic images, pictures, audio, and video) that are up to 2 gigabytes in size. The keywords BINARY LARGE OBJECT and BLOB are used to denote the binary large object data type. The amount of storage space set aside to store a binary large object value is determined by the length specification provided when a binary large object data type is defined. For example, 800 bytes of storage space would be set aside for a BLOB(800) definition.
Character large object: The character large object data type is used to store SBCS (single-byte character set) or MBCS (multibyte character set) character string values that are between 32,700 and 2,147,483,647 characters in length. The keywords CHARACTER LARGE OBJECT, CHAR LARGE OBJECT, and CLOB are used to denote the character large object data type. The amount of storage space set aside to store a character large object value is determined by the length specification provided when a character large object data type is defined. For example, 800 bytes of storage space would be set aside for a CLOB(800) definition.
Double-byte character large object: The double-byte character large object data type is used to store DBCS (double-byte character set) character string values that are between 16,350 and 1,073,741,823 characters in length. The keyword DBCLOB is used to denote the double-byte character large object data type. The amount of storage space set aside to store a double-byte character large object value is determined by the length specification provided when a double-byte character large object data type is defined. For example, 800 bytes of storage space would be set aside for a DBCLOB(400) definition.

The Extensible Markup Language (XML) Document Data Type

Extensible Markup Language (XML) is a simple, very flexible text format that provides a neutral way to exchange data between different devices, systems, and applications; data is maintained in a self-describing format that is hierarchical in nature. The Extensible Markup Language Document data type is used to store XML documents in their native format. (An XML value that is stored in a column defined with the Extensible Markup Language data type must be a well-formed XML document.) XML values are processed in an internal representation that is not comparable to any string value; however, an XML value can be transformed into a serialized string value representing the XML document using the XMLSERIALIZE() function. Similarly, a string value that represents an XML document can be transformed into an XML value using the XMLPARSE() function. The keyword XML is used to denote the XML data type. The amount of storage space set aside to store an XML document varies and is determined, in part, by the size and characteristics of the XML document being stored.

Tip

XML documents can only be stored in single-partition databases that have been defined using the UTF-8 code set. Furthermore, using the XML data type prevents future use of the Database Partitioning Feature that is available for DB2 Enterprise Server Edition.

Extenders

The extender products that are available with DB2 9 consist of unique sets of user-defined data types and user-defined functions that can be used to store and manipulate nontraditional data such as graphical images and audio/video clips. Since many of the data types provided by the DB2 Extenders are based on built-in data types, the size limits and storage requirements of an extender data type often match those of their built-in data type counterparts-provided extender data is actually stored in the database. Some of the DB2 Extenders allow their data to be stored in external files that reside outside the database, while the location of the files themselves is stored in a database table. When this is the case, the storage requirements for an extender data type can be much lower than for its built-in data type equivalent.

User-Defined Data Types

In Chapter 4, "Working with Databases and Database Objects," we saw that user-defined data types (UDTs) are data types that are explicitly created by a database user. A user-defined data type can be a distinct data type that shares a common representation with one of the built-in data types provided with DB2 (created by executing the CREATE DISTINCT TYPE SQL statement), or it can be a structured type that consists of a sequence of named attributes, each of which have their own data type (created by executing the CREATE TYPE SQL statement). Structured data types can also be created as subtypes of other structured types, thereby defining a type hierarchy.

User-defined data types are subject to strong data typing, which means that even though they may share the same representation as other built-in or user-defined data types, the value of one user-defined data type is only compatible with values of that same type (or of other user-defined data types within the same data type hierarchy). As a result, user-defined data types cannot be used as arguments for most built-in functions. However, user-defined functions and operators that duplicate the functionality provided by the built-in functions can be created.