Creating Database Objects

Let's look in detail at some of the objects that can be created: data types, tables, table spaces, views, materialized query tables, indexes, sequence objects, databases, and storage groups.

Data Types

Data types are used to specify the attributes of the columns when creating a table. DB2 supplies various data types; others are created by the users.

DB2-Supplied Data Types

When the database design is being implemented, any of the DB2-supplied data types can be used. Data is stored in DB2 tables consisting of columns and rows. Every DB2 table is defined by using columns, which must be one of the built-in DB2 data types or user-defined data types. Every DB2-supplied data type belongs to one of these major categories:

Numeric
String (binary, single byte, double byte)
Date/time

The valid built-in DB2 data types are listed in Table 4-1.

Table 4-1. Built-In Data Types
Data	Data Type	DB2 Data Type
Signed numeric types: Exact	Binary integer: 16-bit	`SMALLINT`
Signed numeric types: Exact	Binary integer: 32-bit	`INTEGER`
Signed numeric types: Decimal	Packed	`DECIMAL`
Signed numeric types: Approximate	Floating-point single-precision	`REAL`
Signed numeric types: Approximate	Floating-point double-precision	`DOUBLE`
Date/time types	Time	`TIME`
	Timestamp	`TIMESTAMP`
	Date	`DATE`
String types: Character	Fixed length	`CHAR`
String types: Character	Varying length	`VARCHAR` `CLOB`
String types: Graphic	Fixed length	`GRAPHIC`
	Varying length	`VARGRAPHIC` `DBCLOB`
	Varying-length binary	`BLOB`
String Types: Binary	Varying length	`BLOB`
Row identifier		`ROWID`

Numeric Data Types

These data types are used to store various numeric types and precisions. The precision of a number is the number of digits used to represent its value. The data is stored in the DB2 database, using a fixed amount of storage for all numeric data types. The amount of storage required increases as the precision of the number increases.

Numeric values should not be enclosed in quotation marks. If quotation marks are used, the value is treated as a character string. Even if a field contains numbers in its representation, a DB2 numeric data type should be used to represent the data only if arithmetic operations should be allowed.

Six DB2 data types can be used to store numeric data: SMALLINT, INTEGER, DECIMAL/NUMERIC, FLOAT (REAL or DOUBLE), REAL, and DOUBLE.

Small integer (SMALLINT). A small integer uses the least amount of storage in the database for each value. An integer does not allow any digits to the right of the decimal. The data value range for a SMALLINT is 32,768 to 32,767. The precision for a SMALLINT is five digits to the left of the decimal. Two bytes of database storage are used for each SMALLINT column value.
Integer (INTEGER). An INTEGER takes twice as much storage as a SMALLINT but has a greater range of possible values. The value range for an INTEGER data type is 2,147,483,648 to 2,147,483,647. The precision for an INTEGER is ten digits to the left of the decimal. Four bytes of database storage are used for each INTEGER column value.
Decimal (DECIMAL/NUMERIC). A DECIMAL or NUMERIC data type is used for numbers with fractional and whole parts. The DECIMAL data is stored in a packed format. The precision and scale must be provided when a decimal data type is used. The precision is the total number of digitsfrom 1 to 31and the scale is the number of digits in the fractional part of the number. For example, a DECIMAL data type to store currency values of up to $1 million requires a definition of DECIMAL(9,2). The terms NUMERIC, DECIMAL, or DEC can all be used to declare a decimal or numeric column. A DECIMAL number takes up p/2 + 1 bytes of storage, where p is the precision used. For example, DEC(8,2) takes up 5 bytes of storage (8/2 + 1), whereas DEC(7,2) takes up only 4 bytes (truncate the division of p/2).
Single-precision floating-point (REAL/FLOAT). A REAL data type is an approximation of a number. The approximation requires 32 bits, or 4 bytes of storage. To specify a single-precision number using the REAL data type, its length must be defined between 1 and 24, especially if the FLOAT data type is used, as it can represent both single-precision and double-precision and is determined by the integer value specified.
Double-precision floating-point (DOUBLE/FLOAT). A DOUBLE or FLOAT data type is an approximation of a number. The approximation requires 64 bits, or 8 bytes of storage. To specify a double-precision number using the FLOAT data type, its length must be defined between 25 and 53.

You must also be aware of the range limits of the data types and the corresponding application programming language when you are manipulating these numeric fields. Some data values are of the integer type by nature, such as the number of test candidates. A number representing a number of people count not contain fractional datanumbers to the right of the decimal. On the other hand, some values require decimal places to accurately reflect their value, such as test scores. These two examples should use different DB2 data types to store their values (SMALLINT and DECIMAL, respectively).

String Data Types

The seven string data types are CHAR; VARCHAR; CLOB; GRAPHIC, VARGRAPHIC, and DBCLOB; and BLOB.

NOTE

The syntaxes of LONG VARCHAR, and LONG VARGRAPHIC are still supported, but the alternative syntaxes of VARCHAR(integer) and VARGRAPHIC(integer) are preferred. DB2 will translate the LONG definitions into the other format prior to defining the table. Therefore, only the VARCHAR() and VARGRAPHIC() definitions should be used.

Fixed-length character strings (CHAR). Fixed-length character strings are stored in the database, using the entire defined amount of storage. If the data being stored always has the same length, a CHAR data type should be used.
Using fixed-length character fields can potentially waste disk space within the database if the data is not using the defined amount of storage. However, overhead is involved in storing varying-length character strings. The length of a fixed-length string must be between 1 and 255 characters. If you do not supply a value for the length, a value of 1 is assumed.
Varying-length character strings (VARCHAR). Varying-length character strings are stored in the database, using only the amount of space required to store the data and a 2-byte prefix to hold the length. In the example we develop, the individual names are stored as varying-length strings (VARCHAR) because each person's name has a different length, up to a maximum length of 30 characters.
If a varying-length character string is updated and the resulting value is larger than the original, it may not fit on the same page, and the row will be moved to another page in the table, leaving a marker in the original place. These marker data records are known as indirect-reference rows. Having too many of these records can cause significant performance degradation, as multiple pages (I/Os) are required to return a single data record.
A VARCHAR column must fit on one database page. This means that a 4KB page would allow a VARCHAR up to 4,046 characters long, defined as VARCHAR(4046); an 8KB page would be up to 8,128 characters long, and so on, up to a 32KB page with the maximum column length of 32,704 bytes. This means that for this table, you must create a table space that can accommodate the larger page size, and you must have sufficient space in the row to accommodate this string.
NOTE

Character strings on the z/OS platform are stored in the database without a termination character. Depending on the non-z/OS development environment, a null terminator may or may not be appended to the end of a character string when the data is stored or retrieved.

NOTE

Updates to variable-character fields can cause the row length to change and may cause indirect references.
Character large object (CLOB). CLOB is a varying-length single-byte character set (SBCS) or multibyte character set (MBCS) character string stored in the database. CLOB columns are used to store greater than 32KB of text. The maximum size for each CLOB column is approximately 2GB (2 gigabytes minus 1 byte). Because this data type is of varying length, the amount of disk space allocated is determined by the amount of data in each record. Therefore, you should create the column specifying the length of the longest string.
NOTE

The FOR BIT DATA clause can be used following a character string column definition. During data exchange, code page conversions are not performed. Rather, data is treated and compared as binary (bit) data.
Double-byte character strings (GRAPHIC). The GRAPHIC data types represent a single character using 2 bytes of storage. The GRAPHIC data types include GRAPHIC (fixed length, maximum 127 DBCS characters), VARGRAPHIC (varying length, maximum 32,704 DBCS characters for 32KB pages), and DBCLOB.
VARCHAR data types exceeding 255 bytes are similar to CLOB data types. (Both types have usage restrictions.)
Double-byte character large objects (DBCLOB). DBCLOBs are varying-length character strings that are stored in the database, using 2 bytes to represent each character. A code page is associated with each column. DBCLOB columns are used for large amounts (> 32KB) of double-byte text data, such as Japanese text. The maximum length should be specified during the column definition, because each data record will be variable in length.
Binary large objects (BLOB). BLOBs are variable-length binary strings. The data is stored in a binary format in the database. Restrictions apply when using this data type, including the inability to sort using this type of column. The BLOB data type is useful for storing nontraditional relational database information. The maximum size of each BLOB column is approximately 2GB. Because this data type is of varying length, the amount of disk space allocated is determined by the amount of data in each record, not by the defined maximum size of the column in the table definition.
Traditionally, large unstructured data was stored somewhere outside the database. Therefore, the data could not be accessed using SQL. Besides the traditional database data types, DB2 implements data types that will store large amounts of unstructured data. These data types are known as large objects (LOBs). Multiple LOB columns can be defined for a single table. DB2 provides special considerations for handling these large objects. You can choose not to log the LOB values, to avoid large amounts of data being logged.
A LOG option can be specified during the CREATE TABLESPACE statement for each AUXILIARY TABLE holding LOB column data, to avoid logging any modifications. To define a LOB column greater than 1GB, you must specify the LOG NO option.
In a database, you may choose to use BLOBs for the storage of pictures, images, or audio or video objects, along with large documents. BLOB columns will accept any binary string without regard to the contents. To manipulate textual data that is greater than 32KB in length, you use a CLOB data type. For example, if each test candidate were required to submit his or her resume, it could be stored in a CLOB column along with the rest of the candidate's information. Many SQL functions can be used to manipulate large character data columns.

NOTE

Support of LOB data types requires a special table space, an auxiliary table, and an auxiliary index.

Date and Time Data Types

Three DB2 data types are specifically used to represent dates and times.

The DATE data type is stored internally as a packed string of 4 bytes. Externally, the string has a length of 10 bytes (MM-DD-YYYY). This representation can vary and is dependent on the country code.
The TIME data type is stored internally as a packed string of 3 bytes. Externally, the string has a length of 8 bytes (HH-MM-SS). This representation may vary.
The TIMESTAMP data type is stored internally as a packed string of 10 bytes. Externally, the string has a length of 26 bytes (YYYY-MM-DD-HH-MM-SS-NNNNNN).

From the user perspective, these data types can be treated as character or string data types. Every time you need to use a date/time attribute, you will need to enclose it in quotation marks. However, date/time data types are not stored in the database as fixed-length character strings.

NOTE

The internal format of date, time, and timestamps is not at all similar to the external format returned via SQL. For example, the internal format of TIMESTAMP is 10 bytes, but the external format is 26 characters.

DB2 provides special functions that allow you to manipulate these data types. These functions allow you to extract the month, hour, or year of a date/time column. The date and time formats correspond to the site default. Therefore, the string that represents a date value will change, depending on the default format. In some countries, the date format is DD/MM/YYYY; in other countries, it is YYYY-MM-DD. You should be aware of the default format used by your site to use the correct date string format. If an incorrect date format is used in an SQL statement, an SQL error will be reported. Scalar functions will return date, time, and timestamp columns in formats other than the default.

As a general recommendation, if you are interested in a single element of a date stringsay, month or yearalways use the SQL functions provided by DB2 to interpret the column value. By using the SQL functions, you can make your application more portable.

NOTE

TIMESTAMP fields use the most storage but contain the most accurate time, as they include microseconds.

NOTE

Timestamps cannot be guaranteed to be unique, as the granularity is only to microseconds, and it is possible for two events to ask the system timer for the timestamp within 1 microsecond. The 8-byte system clock is unique, however, as it appends the time value with additional bits.

All date/time data types have an internal and an external format. The external format is always a character string. Three date/time data type formats are available in DB2.

Date string (DATE). A number of valid methods can be used to represent a DATE as a string. Any of the string formats shown in Table 4-2 can be used to store dates in a DB2 database. When the data is retrieved, using a SELECT statement, the output string will be in one of these formats or can be returned in any format specified.

Table 4-2. Date String Formats
Standard	String Format
International Standards Organization (ISO)	YYYY-MM-DD
IBM USA Standard (USA)	MM/DD/YYYY
IBM European Standard (EUR)	DD.MM.YYYY
Japanese Industrial Standard (JIS)	YYYY-MM-DD

Additionally, DB2 has many scalar functions to return date information, such as

DATE
DAY
DAYOFMONTH
DAYOFWEEK
DAYOFWEEK_ISO
DAYOFYEAR
JULIAN_DAY
MONTH
QUARTER
WEEK
WEEK_ISO
YEAR

Time string (TIME). A number of valid methods exist for representing a time as a string. Any of the string formats in Table 4-3 can be used to store times in a DB2 database. When data is retrieved, the external format of the time will be one of the formats shown in Table 4-3.
Table 4-3. Time String Formats
Standard
Format
International Standards Organization ISO
HH.MM.SS
IBM USA Standard USA
HH:MM AM or PM
IBM European Standard EUR
HH.MM.SS
Japanese Industrial Standard JIS
HH:MM:SS

Additionally, DB2 has many scalar functions to return time information, such as
- HOUR
- MINUTE
- SECOND
- TIME
Timestamp string (TIMESTAMP). The timestamp data type has a single default external format. Timestamps have an external representation of YYYY-MM-DD-HH.MM.SS.NNNNNN (year-month-day-hour-minute-seconds-microseconds). However, several scalar functions can manipulate the output format, especially the TIMESTAMP_FORMAT function, which returns any type of string, up to 255 characters, based on a user-defined template.
Additionally, DB2 has several scalar functions to return timestamp information besides all those listed for DATE and TIME, such as
- MICROSECOND
- TIMESTAMP
- TIMESTAMP_FORMAT

Special Data Types

Some special data types in DB2 can be used to support special processing, such as random-number and sequential-number generation and the ability to create data types.

ROWID

A ROWID is a value that uniquely identifies a row in a table. A ROWID column enables writing queries that navigate directly to a row in the table. ROWID column values are generated by DB2 unless supplied, but they must be unique. If a value is supplied, it must be a valid DB2-generated value. The internal format is 17 bytes and contains bit data. The external length is 40 bytes. If a ROWID column is used in a query, the access path is referred to as 'Direct Row Access' because the row is directly accessed without a scan of the index or the table space. This type of access is shown as a 'D' in the ACCESS_PATH column in the PLAN_TABLE. For more information on the PLAN_TABLE and access paths, refer to Chapter 17.

Identity Column

Columns can be populated with values if no value was supplied by the user. It is also possible to have DB2 generate sequence numbers or other values as part of a column during record insertion.

In the majority of applications, a single column within a table represents a unique identifier for that row. Often, this identifier is a number that gets sequentially updated as new records are added. DB2 has a feature that automatically generates this value on behalf of the user. The following example shows a table definition with the EMP_NO field automatically being generated as a sequence.

 CREATE TABLE EMPLOYEE     (EMPNO INT GENERATED ALWAYS AS IDENTITY,      NAME CHAR(10)); INSERT INTO EMPLOYEE(NAME) VALUES 'SMITH'; INSERT INTO EMPLOYEE(NAME) VALUES 'LAWSON'; SELECT * FROM EMPLOYEE; EMPNO       NAME ----------- ---------- 1           SMITH 2           LAWSON

If the column is defined with GENERATED ALWAYS, the INSERT statement cannot specify a value for the EMPNO field. By default, the numbering will start at 1 and increment by 1. The starting and increment values can be modified as part of the column definition:

 CREATE TABLE EMPLOYEE     (EMPNO INT GENERATED ALWAYS AS        IDENTITY(START WITH 100, INCREMENT BY 10)),      NAME CHAR(10)); INSERT INTO EMPLOYEE(NAME) VALUES 'SMITH'; INSERT INTO EMPLOYEE(NAME) VALUES 'LAWSON'; SELECT * FROM EMPLOYEE; EMPNO NAME ----------- ----------         100 SMITH         110 LAWSON

In addition, the default value can be GENERATED BY DEFAULT, which means that the user has the option of supplying a value for the field. If no value is supplied, using the DEFAULT keyword, DB2 will generate the next number in sequence.

One additional keyword (CACHE) is available as part of IDENTITY columns. You can decide how many numbers DB2 should "pregenerate." This can help reduce catalog contention, as DB2 will store the next n numbers in memory rather than go back to the catalog tables to determine which number to generate next.

Identity columns are restricted to numeric valuesinteger or decimaland can be used in only one column in the table definition. Table 4-4 describes the values used for identity column definition.

 CREATE TABLE EMPLOYEE   (EMPNO INT GENERATED ALWAYS AS IDENTITY,   NAME CHAR(10),   SALARY INT,   BONUS INT,   PAY INT); INSERT INTO EMPLOYEE(NAME, SALARY, BONUS, PAY) VALUES ('SMITH',20000,2000,22000); INSERT INTO EMPLOYEE(NAME, SALARY, BONUS, PAY) VALUES ('LAWSON',30000,5000,35000); SELECT * FROM EMPLOYEE; EMPNO      NAME          SALARY         BONUS          PAY -------    ----------    -----------    -----------    ---------- 1          SMITH         20000          2000           22000 2          LAWSON        30000          5000           35000

Table 4-4. Identity Column Values
Value	Description
`GENERATED`	DB2-generated values for the column. This column must specify `GENERATED` if the column is to be considered an identity column or if the data type is a `ROWID` or a distinct type that is based on a `ROWID`.
`ALWAYS`	DB2-generated value for the column when a row is inserted into the table.
`BY DEFAULT`	DB2-generated value for the column when a row is inserted into the table unless a value is specified. This is recommended only when using data propagation.
`AS IDENTITY`	Specifies that the column is an identity column for the table. A table can have only one identity column. `AS IDENTITY` can be specified only if the data type for the column is an exact numeric type with a scale of zero: `SMALLINT`, `INTEGER`, `DECIMAL` with a scale of zero, or a distinct type based on one of these types. An identity column is implicitly not null.
`START WITH` `n`	A numeric constant that provides the first value for the identity column. The value can be a positive or negative value that could be assigned to the column. No nonzero digits are allowed to the right of the decimal point. The default is 1.
`INCREMENT BY` `n*`	A numeric constant that provides the interval between consecutive values of the identity column. This value can be any positive or negative value that is not 0, and the default is 1. With a positive value, the sequence of values for the identity column will ascend; if it is negative, the sequence of identity column values will descend.
`CACHE` `n*`	Provides the number of values of the identity column sequence that DB2 preallocates and keeps in memory. The default is 20. If a system fails, all cached identity column values will never be used. The value specified for `CACHE` also represents the maximum number of identity column values that may be lost during a system failure. With a data sharing environment, each member has its own range of consecutive values to use. For instance, if using `CACHE 30`, susbystem DB2T may get values 1 to 30, and subsystem DB2U may use values 31 to 60. The values that are assigned might not be in the order in which they were requested, if transactions from different members generate values for the same identity column.
`NO CACHE*`	Specifies that caching is not to be used. Use `NO CACHE` to guarantee that the identity values are generated in the order in which they are requested for non-affinity transactions in a data sharing environment.
`CYCLE*`	Specifies whether the identity column should continue to generate values after reaching either the maximum or minimum value of the sequence. If the column values are ascending, it will start with the lowest value. If the column values are descending, it will start with the highest value. The `MAXVALUE` and `MINVALUE` are used to determine these ranges. Note that when `CYCLE` is in effect, duplicate values can occur. If a unique index exists, an error will occur.
`MAXVALUE`^[*]	Specifies the maximum value that can be generated for the identity column. The value can be positive or negative but must be greater than the `MINVALUE`.
`MINVALUE`^[*]	Specifies the minimum value that is generated for the identity column. The value can be positive or negative but must be less than the `MAXVALUE`.
`RESET WITH`^[*]	Allows for the identity column values to be reset.

^[*] As of version 8, ALTER can be used on these values.

The EMPNO is generated as an IDENTITY column.

NOTE

If the data needs to be unloaded and reloaded while preserving previously generated numbers, the GENERATED option can be altered.

Using identity columns has some restrictions. Identity columns cannot be specified on a table that has an edit procedure defined. An identity column cannot be defined with the FIELDPROC clause or the WITH DEFAULT clause. When updating an identity column, you cannot update the value of an identity column that is defined as GENERATED ALWAYS. If you are doing an ALTER table to add an identity column to a table that is not empty, the table space that contains the table is placed in the REORG pending state. When the REORG utility is executed, DB2 will generate the values for the identity column for all existing rows, and the REORG pending status will be removed. The values are guaranteed to be unique, and their order is determined by the system.

When creating another table using the LIKE clausecreating a table with the columns/attributes of another tableyou have some options in order to pick up identity column attributes. For an identity column, the newly created table will inherit only the data type of the identity column. No other column attributes are inherited unless the INCLUDING IDENTITY COLUMN ATTRIBUTES clause is used.

Distinct Types

Distinct types allow a user to extend the data types that DB2 understands in a database. The distinct types can be created based on an existing data type or another distinct type. Distinct types are used to define further types of data being represented in the database. If columns are defined using different distinct types based on the same base data type, these columns cannot be directly compared. This is known as strong typing. DB2 provides this strong data typing to avoid end user mistakes during the assignment or comparison of different types of real-world data. For more information on how to create and use distinct types, refer to Chapter 15.

Null-Value Considerations

A null value represents an unknown state. Therefore, when columns containing null values are used in calculations, the result is unknown. All the data types discussed previously support the presence of null values. During the table definition, you can specify that a valid value must be provided. This is accomplished by adding a phrase to the column definition. The CREATE TABLE statement can contain the phrase NOT NULL following the definition of each column. This will ensure that the column contains a known data value.

Special considerations are required to properly handle null values when coding a DB2 application. DB2 treats a null value differently from other data values. To define a column not to accept null values, add the phrase NOT NULL to the end of the column definition.

In the following example, DB2 will not allow any null values to be stored in the c1 column:

 CREATE TABLE t1 (c1 CHAR(3) NOT NULL)

In general, avoid using nullable columns unless they are required to implement the database design. Another factor to consider is overhead storage. An extra byte per nullable column is necessary if null values are allowed.

NOTE

Relational databases allow null values. It is important to remember that they can be appropriate for your database design.

When you insert a row into a table and omit the value of one or more columns, those columns may be populated using either a null valueif the column is defined as nullableor a defined default valueif you have so specified. If the column is defined as not nullable, the insert will fail unless the data has been provided for the column.

DB2 has a defined default value for each DB2 data type, but you can provide a default value for each column. The default value is specified in the CREATE TABLE statement. By defining your own default value, you can ensure that the data value has been populated with a known value.

In the following example, all the INSERT statements that omit the DEPT column will populate the column with the default value of 10. The COMM column is defined as default. In this case, you can choose at insert time between null or the default value of 15. To ensure that the default value is being used during an INSERT operation, the keyword DEFAULT should be specified in the VALUES portion of the INSERT statement. The following example shows two examples of inserting a record with user-defined default values:

 CREATE TABLE STAFF (ID SMALLINT NOT NULL, NAME VARCHAR(9), DEPT SMALLINT NOT NULL WITH DEFAULT 10, JOB CHAR(5), YEARS SMALLINT, SALARY DECIMAL(7,2), COMM DECIMAL(7,2) NOT NULL WITH DEFAULT 15);

In this case, both of the following statements will have the same result:

 INSERT INTO STAFF VALUES(360,'Lawson',DEFAULT, 'SE',8,20000,DEFAULT); INSERT INTO STAFF (ID,NAME,JOB,YEARS,SALARY) VALUES(360, 'Lawson', 'SE',8,20000,);

The result is

 ID      NAME     DEPT   JOB   YEARS   SALARY    COMM ------ --------- ------ ----- ------ --------- --------- 360    Lawson     10     SE    8      20000.00  15.00 1 record(s) selected.

Selecting the Correct Data Type

Knowledge of the possible data values and their usage is required to be able to select the correct data type. Specifying an inappropriate data type when defining the tables can result in wasted disk space, improper expression evaluation, and performance problems. Table 4-5 provides a brief checklist for data type selection.

Table 4-5. Data Type Selection
Usage	Data Type
Is the data variable in length?	`VARCHAR`
If the data is variable in length, what is the maximum length?	`VARCHAR`
Do you need to sort, or order, the data?	`CHAR`, `VARCHAR`, `NUMERIC`
Is the data going to be used in arithmetic operations?	`DECIMAL`, `NUMERIC`, `REAL`, `DOUBLE`, `FLOAT`, `INTEGER`, `SMALLINT`
Does the data element contain decimals?	`DECIMAL`, `NUMERIC`, `REAL`, `DOUBLE`
Is the data fixed in length?	`CHAR`
Does the data have a specific meaning beyond DB2 base data types?	`DISTINCT TYPE`
Is the data larger than what a character string can store? Do you need to store nontraditional data?	`CLOB`, `BLOB`, `DBCLOB`

Remember that you need to create page sizes that are large enough to contain the length of a row in a table. This is particularly important for tables with large character columns. When using character data types, the choice between CHAR and VARCHAR is determined by the range of lengths of the columns. For example, if the range of column length is relatively small, use a fixed CHAR with the maximum length. This will reduce the storage requirements and could improve performance.

Unicode Support in DB2

Unicode support is another very important enhancement. Support for Unicode will help with support across multinational boundaries. The Unicode encoding scheme allows for the representation of code points and characters of various geographies and languages. Unicode is a fixed-length character-encoding scheme that includes characters from most of the world's languages. Unicode characters are usually shown as U+xxxx, where xxxx is the hexadecimal code of the character. Each character is 16 bits wide, allowing for support of 65,000 characters. The normal support provided is the UCS-2/UTF-8 standard. With UCS-2 or Unicode encoding, ASCII and control characters are also 2 bytes long, and the lead byte is 0. Because extraneous NULLs may appear anywhere in the string, this could be a problem for programs that expect ASCII. UTF-8 is a transformation algorithm used to avoid the problem for programs that rely on ASCII code. UTF-8 transforms fixed-length (2-byte) UCS characters into variable-length byte strings. ASCII characters are represented by single-byte codes, but non-ASCII characters are 2 to 4 bytes long. UTF-8 transforms UCS-2 characters to a multibyte code set.

The UCS-2 code page is being registered as code page 1200. When new characters are added to a code page, the code page number does not change. Code page 1200 always refers to the current version of Unicode/UCS-2 and has been used for UCS-2 support in DB2 UDB on many of the other platforms. UTF-8 has been registered as CCSID (Coded Character Set Identifier) 1208 (code page 1208) and is used as the multibyte code page number for DB2's UCS-2/UTF-8 support.

DB2 UDB supports UCS-2 as a new multibyte code page. CHAR, VARCHAR, LONG VARCHAR, and CLOB data is stored in UTF-8; GRAPHIC, VARGRAPHIC, LONG VARGRAPHIC, and DBCLOB, in UCS-2.

As a default, databases are created in the code page of the application creating them. Alternatively, UTF-8 can be specified as the CODESET name with any valid two-letter TERRITORY code:

 CREATE DATABASE dbname USING CODESET UTF-8 TERRITORY US

A UCS-2 database allows connection from every single-byte and multibyte code page supported. Code page character conversions between a client's code page and UTF-8 are automatically performed by DB2. Data in graphic string types is always in UCS-2 and does not go through code page conversions. Although some client workstations have a limited subset of UCS-2 characters, the database allows the entire repertoire of UCS-2 characters.

All supported data types are also supported in a UCS-2 database. In a UCS-2 database, all identifiers are in multibyte UTF-8. Therefore, it is possible to use any UCS-2 character in identifiers in which DB2 allows the use of a character in the extended character set. This feature will also allow UCS-2 literals to be specified either in GRAPHIC string constant format, using the G'...' or N'....' format, or as a UCS-2 hexadecimal string, using the UX'....' or GX'....' format.

Tables

Tables consist of columns and rows that store an unordered set of data records. Tables can have constraints to guarantee the uniqueness of data records, maintain the relationship between and within tables, and so on.

Table Constraints

A constraint is a rule that the database manager enforces. Constraints are of three types:

Unique constraint. Ensures the unique values of a key in a table. Any changes to the columns that comprise the unique key are checked for uniqueness.
Referential constraint. Enforces referential constraints on insert, update, and delete operations. It is the state of a database in which all values of all foreign keys are valid.
Check constraint. Verifies that changed data does not violate conditions specified when a table was created or altered.

Unique Constraints

A unique constraint is the rule that the values of a key are valid only if they are unique within the table. Each column making up the key in a unique constraint must be defined as NOT NULL. Unique constraints are defined in the CREATE TABLE statement or the ALTER TABLE statement, using the PRIMARY KEY clause or the UNIQUE clause. When used, the table is marked as unavailable until you explicitly create an index for any key specified as UNIQUE or PRIMARY KEY, unless processed by the schema processor, in which case DB2 will implicitly create all necessary indexes.

A table can have any number of unique constraints; however, a table cannot have more than one unique constraint on the same set of columns. The enforcement of the constraint is through the unique index. Once a unique constraint has been established on a column, the check for uniqueness during multiple row updates is deferred until the end of the update: deferred unique constraint. A unique constraint can also be used as the parent key in a referential constraint.

Referential Constraint

Referential constraint, or integrity, allows you to define required relationships between and within tables. The database manager maintains these relationships, which are expressed as referential constraints, and requires that all values of a given attribute or table column also exist in another table column. Figure 4-2 shows an example of the referential integrity (RI) between two tables. This constraint requires that every employee in the EMPLOYEE table must be in a department that exists in the DEPARTMENT table. No employee can be in a department that does not exist.

A unique key is a set of columns in which no two values are duplicated in any other row. Only one unique key can be defined as a primary key for each table. The unique key may also be known as the parent key when referenced by a foreign key.
A primary key is a special case of a unique key. Each table can have only one primary key. In this example, DEPTNO and EMPNO are the primary keys of the DEPARTMENT and EMPLOYEE tables.
A foreign key is a table column or set of columns that refer to a unique key or primary key of the same or another table. A foreign key is used to establish a relationship with a unique key or a primary key and enforces referential integrity among tables. The column WORKDEPT in the EMPLOYEE table is a foreign key because it refers to the primary key, column DEPTNO, in the DEPARTMENT table.

Figure 4-2. Referential integrity between two tables

A parent key is a primary key or unique key of a referential constraint. A parent table is a table containing a parent key that is related to at least one foreign key in the same or another table. A table can be a parent in an arbitrary number of relationships. In this example, the DEPARTMENT table, which has a primary key of DEPTNO, is a parent of the EMPLOYEE table, which contains the foreign key WORKDEPT.

A dependent table contains one or more foreign keys. A dependent table can also be a parent table. A table can be a dependent in an arbitrary number of relationships. For example, the EMPLOYEE table contains the foreign key WORKDEPT, which is dependent on the DEPARTMENT table that has a primary key.

A referential constraint is an assertion that non-null values of a designated foreign key are valid only if they also appear as values of a unique key of a designated parent table. The purpose of referential constraints is to guarantee that database relationships are maintained and that data entry rules are followed.

Enforcement of referential constraints has special implications for some SQL operations that depend on whether the table is a parent or a dependent. The database manager enforces referential constraints across systems based on the referential integrity rules: the INSERT, DELETE, and UPDATE rules. However, only the DELETE rules are explicitly defined.

The INSERT rule is implicit when a foreign key is specified. You can insert a row at any time into a parent table without any action being taken in the dependent table. You cannot insert a row into a dependent table unless a row in the parent table has a parent key value equal to the foreign key value of the row that is being inserted, unless the foreign key value is null. If an INSERT operation fails for one row during an attempt to insert more than one row, all rows inserted by the statement are removed from the database.

When you delete a row from a parent table, the database manager checks whether any dependent rows in the dependent table match foreign key values. If any dependent rows are found, several actions can be taken. You determine which action will be taken by specifying a DELETE rule when you create the dependent table.

RESTRICT or NO ACTION prevents any row in the parent table from being deleted if any dependent rows are found. If you need to remove both parent and dependent rows, delete dependent rows first.
CASCADE automatically deletes the row from the dependent table when the parent row is deleted.
SET NULL sets the value of the foreign key to NULL, provided that it allows nulls. Other parts of the row will remain unchanged.

The database manager prevents the update of a unique key of a parent row. When you update a foreign key in a dependent table and the foreign key is defined with the NOT NULL option, it must match a value of the parent key of the parent table.

Check Constraints

Table-check constraints enforce data integrity at the table level. Once a table-check constraint has been defined for a table, every UPDATE and INSERT statement will involve checking the restriction, or constraint. If the constraint is violated, the data record will not be inserted or updated, and an SQL error will be returned.

A table-check constraint can be defined at table creation time or later, using the ALTER TABLE statement. The table-check constraints can help implement specific rules for the data values contained in the table by specifying the values allowed in one or more columns in every row of a table. This can save time for the application developer, as the validation of each data value can be performed by the database and not by each of the applications accessing the database.

When a check constraint is created or added, DB2 performs a syntax check on it. A check constraint cannot contain host variables or special registers. The check constraint's definition is stored in the system catalog tables: specifically, the SYSIBM.SYSCHECKS and SYSIBM.SYSCHECKDEP tables.

You can define a check constraint on a table by using the ADD CHECK clause of the ALTER TABLE statement. If the table is empty, the check constraint is added to the description of the table. If the table is not empty, what happens when you define the check constraint depends on the value of the CURRENT RULES special register, which can be either STD or DB2. If the value is STD, the check constraint is enforced immediately when it is defined. If a row does not conform, the table-check constraint is not added to the table, and an error occurs. If the value is DB2, the check constraint is added to the table description, but its enforcement is deferred. Because some rows in the table might violate the check constraint, the table is placed in check-pending(CHKP) status. The best way to remove the CHKP status is to run the CHECK DATA utility. For more information on the CHKP status and the CHECK DATA utility, refer to Chapter 7.

The ALTER TABLE statement that is used to define a check constraint will fail if the table space or partition that contains the table is in a check-pending status, the CURRENT RULES special register value is STD, and the table is not empty. To remove a check constraint from a table, use the DROP CONSTRAINT or DROP CHECK clauses of the ALTER TABLE statement. You must not use DROP CONSTRAINT on the same ALTER TABLE statement as DROP FOREIGN KEY, DROP CHECK, or DROP.

The following example adds a check constraint to a table that checks to be sure that JOB is valid:

 ALTER TABLE EMPLOYEE ADD CONSTRAINT check_job CHECK (JOB IN ('Engineer','Sales','Manager'));

NOTE

It is a good idea to appropriately label every constrainttriggers, table check, or referential integrity. This is particularly important for diagnosing errors that might occur.

As check constraints are used to implement business rules, you may need to change them from time to time. This could happen when the business rules change in your organization. No special command is used to change a check constraint. Whenever a check constraint needs to be changed, you must drop it and create a new one. Check constraints can be dropped at any time, and this action will not affect your table or the data within it.

When you drop a check constraint, you must be aware that data validation performed by the constraint will no longer be in effect. The statement used to drop a constraint is the ALTER TABLE statement. The following example shows how to modify the existing constraint. After dropping the constraint, you have to create it with the new definition:

 ALTER TABLE EMPLOYEE DROP CONSTRAINT check_job; ALTER TABLE EMPLOYEE ADD CONSTRAINT check_job CHECK (JOB IN ('OPERATOR','CLERK'));

Creating Tables

The CREATE TABLE statement allows you to define a new table. The definition must include its name and the attributes of its columns. The definition may include other attributes of the table, such as its primary key or check constraints. The RENAME TABLE statement can change the name of an existing table.

Once the table is defined, column names and data types cannot be modified. However, new columns can be added to the table. Be careful when adding new columns, as default data values will be used for existing records.

A table can have a maximum of 750 columns. This maximum will not vary depending on the data page size on the z/OS. DB2 supports 4KB, 8KB, 16KB, and 32KB data page sizes. Table 4-6 shows the maximum number of columns in a table and maximum length of a row by page size.

Table 4-6. Maximum Table Columns and Row Lengths
	4KB Page	8KB Page	16KB Page	32KB Page
Maximum columns	750	750	750	750
Maximum row length (bytes)	4,056	8,138	16,330	32,714

NOTE

If an EDITPROC was defined on a table, the maximum row sizes shown in Table 4-6 will decrease by 10 bytes in each case.

NOTE

If the table is a dependent table in a referentially intact structure, the maximum number of columns is 749.

Tables are always created within a table space. Users can specify the table space name in which the table will be created, or DB2 will create one implicitly. In the following example, DB2 will implicitly create a table space in the HUMANDB database, because no table space name was provided. The name of the table space will be derived from the table:

 CREATE TABLE DEPARTMENT (DEPTNUMB SMALLINT NOT NULL, DEPTNAME VARCHAR(20), MGRNO SMALLINT, PRIMARY KEY(DEPTNUMB)) IN DATABASE HUMANDB;

NOTE

Indexes will have their own index spaces created when they are created.

After a table is created, user data can be placed into the table by using one of these methods:

INSERT statement
LOAD utility
DSN1COPY

NOTE

The LOG NO option is available only for LOB table spaces.

NOTE

If you need a temporary table that is used only for the duration of a program, use the DECLARE GLOBAL TEMPORARY TABLE statement instead. This will result in no catalog contention, minimal logging, and no lock contention.

Following are sample CREATE TABLE statements. This example creates two tables. The definition includes unique constraints, check constraints, and referential integrity. In this example, the following conditions apply.

The DEPARTMENT table has a primary key that consists of column DEPTNUMB.
The EMPLOYEE table has a check constraint saying that JOB should be Sales, Mgr, or Clerk.
The default value is defined for the column HIREDATE in the EMPLOYEE table.
The EMPLOYEE table has a primary key that consists of the column ID.
A referential constraint is defined between the DEPARTMENT table and the EMPLOYEE table.
The EMPLOYEE table is created in the HUMANTS table space in the HUMANDB database.

 CREATE TABLE DEPARTMENT (DEPTNUMB SMALLINT NOT NULL, DEPTNAME VARCHAR(20), MGRNO SMALLINT, PRIMARY KEY(DEPTNUMB)) IN HUMANDB.HUMANTS; CREATE TABLE EMPLOYEE    (ID SMALLINT NOT NULL, NAME VARCHAR(9) NOT NULL,    DEPT SMALLINT,    JOB CHAR(5) CHECK (JOB IN('SALES','MGR','CLERK')),    HIREDATE DATE WITH DEFAULT CURRENT DATE,    SALARY DECIMAL(7,2),    COMM DECIMAL(7,2), CONSTRAINT UNIQUEID PRIMARY KEY(ID), FOREIGN KEY(DEPT) REFERENCES DEPARTMENT(DEPTNUMB) ON DELETE RESTRICT) IN HUMANDB.HUMANTS;

Auxiliary Tables

LOB data is not stored in the table in which it is defined. The defined LOB column holds information about the LOB, whereas the LOB itself is stored in another location. The normal place for this data storage is a LOB table space defining the physical storage that will hold an auxiliary table related to the base column and table.

Because the LOB is stored in a separate table, one performance consideration might be that if you have a large variable-character column in usethat is infrequently accessedyou may be able to convert it to a LOB so that it is kept separately, and this could speed up table space scans on the remaining data because fewer pages would be accessed.

Null is the only supported default value for a LOB column; if the value is null, it will not take up space in the LOB table space. The following examples show how to create a base table with a LOB and an auxiliary table to support it:

 CREATE TABLE DB2USER1.CANDIDATE  (CID           CANDIDATE_ID    NOT NULL,   ...   CERT_APP      CHAR(1) NOT NULL WITH DEFAULT,   PHOTO         BITMAP,   PRIMARY KEY (CID);   IN DB2CERT.CERTTS; CREATE AUX TABLE CAND_PHOTO   IN DB2CERT.CERTPIC   STORES DB2USER1.CANDIDATE   COLUMN PHOTO;

Copying a Table Definition

It is possible to create a table using the same characteristics of another table or a view. This is done through the CREATE TABLE LIKE statement. The name specified after LIKE must identify a table or a view that exists at the current server, and the privilege set must implicitly or explicitly include the SELECT privilege on the identified table or view. An identified table must not be an auxiliary table. An identified view must not include a column that is considered to be a ROWID column or an identity column.

The use of LIKE is an implicit definition of n columns, where n is the number of columns in the identified table or view. The implicit definition includes all attributes of the n columns as they are described in SYSCOLUMNS, with a few exceptions such as identity attributes, unless the INCLUDING IDENTITY clause is used.

The implicit definition does not include any other attributes of the identified table or view. For example, the new table will not have a primary key, foreign key, or check constraint. The table is created in the table space implicitly or explicitly specified by the IN clause, and the table has any other optional clause only if the optional clause is specified.

Following is an example of the CREATE LIKE statement:

 CREATE TABLE NEW_DEPT LIKE DEPARTMENT IN DATABASE HUMANDB

Modifying a Table

After you create a table, you can use the ALTER TABLE statement to modify existing tables. The ALTER TABLE statement modifies existing tables by

Adding one or more columns to a table
Adding or dropping a primary key
Adding or dropping one or more unique or referential constraints
Adding or dropping one or more check-constraint definitions
Altering the data type lengthincrease only within numeric or character data types
Altering the data type from CHAR to VARCHAR or VARCHAR to CHAR
Altering identity column attributes
Enable auditing of the table

NOTE

If a primary key is dropped, the dependents will no longer have foreign keys.

The following example shows how to add a check constraint to the DEPARTMENT table:

 ALTER TABLE DEPARTMENT ADD CHECK (DEPTNUM > 10)

The next example shows how to change a data type of a column. We assume in this example that the data type was originally CHAR and was less than 30.

 ALTER TABLE DEPARTMENT ALTER COLUMN NAME CHAR(35)

NOTE

With a change such as increasing the size of a data type column, we must not forget to relay this information to the application programmer, as this may impact code, that is, host variable definitions.

Removing a Table

When you want to remove a table, issue a DROP TABLE statement:

 DROP TABLE EMPLOYEE

NOTE

Any objects that are directly or indirectly dependent on this table are deleted or made inoperativefor example, indexes, triggers, and views. Whenever a table is deleted, its description is deleted from the catalog, and any packages that reference the object are invalidated.

Sequence Objects

A sequence object is a user-defined object that generates a sequence of numeric values according to the specifications in which it was created. Sequence objects provide an incremental counter generated by DB2 and are similar to identity columns. An identity column can be thought of as a special kind of sequence object; however, the sequence column is separate from the table. Sequence objects support many of the same attributes as identity columns, such as INCREMENT BY, CACHE, CYCLE, RESTART WITH, MINVAL, and MAXVAL.

Sequence object values can be used by applications for a variety of reasons and have several benefits:

No waits for incrementing values
Standalone sequential number-generating objectnot tied to a table
Ascending or descending number generation
Useful for application porting from the other DBMSs
Can help to generate keys that can be used to coordinate keys across multiple tables (RI or application related)

Creating Sequence Objects

The sequence name is made up of two parts: the 128-byte schema name and the 128-byte identifier. Sequence objects are created using a new CREATE SEQUENCE statement, and all attributes are completely user defined or defaults. The values in the sequence object can be of any exact numeric data type. The starting value is defined with a START WITH value and advances with INCREMENT BY, ascending or descending. The values can be cached and are generated in the order of request.

The following example shows the creation and simple use of a sequence object:

 CREATE SEQUENCE ACCOUNT_SEQ    AS INTEGER    START WITH 1    INCREMENT BY 10    CYCLE    CACHE 20

Using Sequence Objects

Some additional advantages of using sequence objects over other methods of number generation include the use of the NEXT VALUE FOR and PREVIOUS VALUE FOR expressions. The short terms NEXTVAL and PREVVAL can be used as synonyms for NEXT VALUE and PREVIOUS VALUE, respectively. NEXTVAL FOR generates and returns the next value for the sequence object. PREVVAL FOR generates and returns the previous value for the sequence object. These statements can be used with the following:

SELECT and SELECT INTO
An INSERT statement within the SELECT clause of fullselect
UPDATE statement within the SET clause (searched or positioned)
SET host variable
VALUES or VALUES INTO
CREATE PROCEDURE, FUNCTION, TRIGGER

The following examples show use of these statements; assume that ACCT_SEQ is START WITH 10 INCREMENT BY 10.

SELECT NEXTVAL FOR ACCT_SEQ	Returns 10
SELECT NEXTVAL FOR ACCT_SEQ	Returns 20
COMMIT
SELECT PREVVAL FOR ACCT_SEQ	Returns 20
______________________________________________________________
UPDATE ACCOUNTS
SET ACCT_NO = NEXTVAL FOR ACCT_SEQ	Returns 30
______________________________________________________________
INSERT INTO ACCOUNTS (ACCT_NO)
VALUES(NEXTVAL FOR ACCT_SEQ)	Returns 40

As you can see, using sequence objects instead of identity columns has many benefits. Table 4-7 provides a short comparison of the two.

Table 4-7. Sequence Objects versus Identity Columns
Sequence Objects	Identity Columns (with V8 features)
Standalone sequence objects created at user request	Internal sequence objects generated/maintained and populated by DB2
Used for whatever purpose users choose and can have more than one	Associated with a particular table and can have only one
Can have many and populate as many table columns as necessary	Can have only one per table
`CYCLE` used to wrap around and repeat with no uniqueness consideration	`CYCLE` may have a problem if a unique index is on the identity column and duplicates are created
When used to populate a table, can later be updated	Cannot be updated if `GENERATED ALWAYS` used
`ALTER` the sequence object attributes; `COMMENT` and `GRANT`/`REVOKE` privileges	`ALTER TABLE` only; if adding an identity column to a populated table, it will be put in REORG-pending status.
`DROP` a sequence object	Cannot be removed from a table.^[*]
Supports `NEXT VALUE FOR EXPRESSION` and `PREVIOUS VALUE FOR EXPRESSION`	Must use `ID_VAL_LOCAL` and returns only most recent values in that user's commit scope

^[*] If future designs would benefit more from sequence objects than identity columns, careful consideration should be made when choosing to use identity columns. If they are defined on populated tables and you want to remove them, the table must be dropped and recreated. This could be a big problem for large tables in a high-availability environment.

Both identity columns and sequence objects have their place. Given that they both accomplish the same objectivegenerating sequence numbersit is up to you to choose which one would work best for you. That determination will depend on the flexibility you need with the generated numbers and how the applications will be using them.

Modifying Sequence Objects

Sequence objects can be altered to change

Whether to cycle the generated sequence values
The MAXVALUE or MINVALUE
The starting value
The increment value
Whether to cache the values

NOTE

You cannot change/alter the data type or length of the values generated by a sequence object. In order to do this, you would need to drop and recreate the sequence object.

Removing Sequence Objects

Sequence objects can be removed by using the DROP statement:

 DROP SEQUENCE <sequence-name>

Table Spaces

Data is stored in table spaces, which comprise one or many VSAM data sets.

Types of Table Spaces

Types of table spaces may be simple, segmented, partitioned, or LOB.

Simple Table Space

Simple table spaces are the default but normally not the optimal. More than one table can be in a simple table space. If several tables are in the table space, rows from different tables can be interleaved on the same page; therefore, when a page is locked, rows of other tables may potentially be locked.

Segmented Table Space

Normally, if a table is not partitioned, a segmented table space is used. A segmented table space organizes its pages into segments, with each segment containing the rows of only one table. Segments can be 4 to 64 pages each, with each segment having the same number of pages. When using a segmented table space to hold more than one table, make sure that the tables have similar characteristics in all categories, including size, volatility, locking needs, compression, and backup/recovery strategies. Some guidelines for how many tables to have in a segmented table space, based on the number of pages in the table space, are listed in Table 4-8.

Table 4-8. Table Page Thresholds
Number of Pages	Table Space Design
> 100,000	Consider partitioning
> 10,000	One-table segmented table space
> 128 to < 10,000	Multiple-table segmented table spaces
< 128	Multiple-table segmented table spaces

Using segmented table spaces has several advantages. Because the pages in a segment contain rows from only one table, locking interference with other tables will not occur. In simple table spaces, rows are intermixed on pages: If one table page is locked, it can inadvertently lock a row of another table just because it is on the same page. When you have only one table per table space, this is not an issue. Other benefits of having a segmented table space for one table follow.

If a table scan is performed, the segments belonging to the table being scanned are the only ones accessed; empty pages will not be scanned.
If a mass delete or a DROP table occurs, segment pages are available for immediate reuse after the commit, and it is not necessary to run a REORG utility.
Mass deletes are much faster for segmented table space and produce less logging.
The COPY utility will not have to copy empty pages left by a mass delete.
When inserting records, some read operations can be avoided by using the more comprehensive space map of the segmented table space.
By being able to safely combine several tables in a table space, you can reduce the number of open data sets necessary, which reduces the amount of memory required in the subsystem.

SEGSIZE tells DB2 how large to make each segment for a segmented table space and determines how many pages are contained in a segment. SEGSIZE will vary, depending on the size of the table space. Recommendations are listed in Table 4-9.

Table 4-9. Recommendations for Table Segment Size
Number of Pages	`SEGSIZE`
28 or less	4 to 28
28 to 128	32
128 or more	64

NOTE

A segmented table space is a single data set and is limited to 2GB.

The following example shows how to create a segmented table space with a segment size of 32:

 CREATE TABLESPACE CERTTS IN DB2CERT USING STOGROUP CERTSTG PRIQTY 52 SECQTY 20 ERASE NO LOCKSIZE PAGE BUFFERPOOL BP6 CLOSE YES SEGSIZE 32;

NOTE

ALTER cannot be used on the SEGSIZE parameter.

Partitioned Table Space

Partitioning a table space has several advantages. For large tables, partitioning is the only way to store large amounts of data. Partitioning also has advantages for tables that are not large. DB2 allows defining up to 4,096 partitions of up to 64GB each. (However, total table size is limited to 128TB (terabytes) and the number of partitions is dependent on the DSSIZE specified.) Nonpartitioned table spaces are limited to 64GB of data.

You can take advantage of the ability to execute utilities on separate partitions in parallel. This also gives you the ability to access data in certain partitions while utilities are executing on others. In a data sharing environment, you can spread partitions among several members to split workloads. You can also spread your data over multiple volumes and need not use the same storage group for each data set belonging to the table space. This also allows you to place frequently accessed partitions on faster devices. The following example shows how to create a partitioned table space with two partitions:

 CREATE TABLESPACE CERTTSPT IN DB2CERT USING STOGROUP CERTSTG    PRIQTY 100    SECQTY 120    ERASE NO NUMPARTS 2 (PART 1  COMPRESS YES,  PART 2  FREEPAGE 20) ERASE NO LOCKSIZE PAGE CLOSE NO;

NOTE

It is possible to use the ALTER statement to change the partitioning-key ranges for rebalancing. A REBALANCE option on the REORG utility allows for this as well.

Many designs are outgrowing the bounds of V7 partitioned table spaces and need to add partitions. In the past, this process could be rather cumbersome. In version 8, this can be done by using the ALTER statement:

 ALTER TABLE table1 ADD PARTITION

DB2 picks the partition number, based on the next-available partition. If the partitions are storage group (stogroup) defined, the data set is allocated for table space and partitioning index with a PRIQTY value picked up from the previous partition. You may need to alter the table space to give new parameters to better suit the new partition, that is, primary allocations and free space. If your underlying data sets are VCAT defined, the data set will need to be defined first, before the ALTER is performed. The new partitions will be available immediately after the ALTER. In order for this to take place, the table space will have to be stopped and then started again. A small outage will occur in order to take advantage of this.

All affected plans, packages, and cached SQL statements are invalidated because SQL may be optimized to read certain partitions, and the new number of partitions may change the access path. If the table space is defined as LARGE, the limit is 64 partitions. If using DSSIZE, the number of the partitions is limited, based on the chosen DSSIZE and the page size. See Figure 4-12 later in this chapter.

Figure 4-12. Tables for the`DB2CERT` database

Dealing with the archiving of data or rolling of partitions often involves a large outage and a lot of previous planning. Even though tables can support up to 4,096 partitions, some limitations exist; supporting that much historical data in the primary table space may not be desirable.

Partitions can be rotated in order to move the first partition to last. Doing so will allow moving the lowest logical partition to the last logical partition and specifying the new partition values for enforcing the values of the last partition. The RESET option allows a partition to be reset, at which time the data is deleted, which can impact the performance of the feature and the availability of the data. Before the rotation, the partition holds the oldest data; afterward, it will hold the newest data. The REUSE option allows extents to be kept, and the data set will not be physically deleted. Following is an example of how to rotate partitions:

 ALTER TABLE table1 ROTATE PARTITION FIRST TO LAST ENDING AT ('2004-03-31')

As mentioned, the data is deleted from old partitions. Depending on the size of the partition, the logging impacts of the deletes may become a performance issue. You may want to consider unloading the partition prior to ALTER by using the old tried-and-true method of LOAD…PART x REPLACE, using a dummy SYSREC. This will help lessen the duration of the outage during the rotating of the partitions.

During the rotate, a DBD lock will be taken as the DDL completes. All activity will be quiesced immediately when the ALTER is issued.

Another issue is the fact that all keys for deleted rows must be deleted from nonpartitioning indexes (NPIs). If multiple NPIs are on a table, the scans must be performed serially to perform the deletes.

If a physical DB2-enforced referential integrity relationship or triggers exist, deletes will be a row at a time. Again, in this situation, you may want to consider an unload of the data first. But if a DELETE RESTRICT relationship is in place, the rotate may not work.

The ALTER TABLE ROTATE can change partition order. To account for this, the LOGICAL_PART column has been added to the SYSTABLEPART and SYSCOPY tables. This change will also be visible in the DISPLAY DATABASE output.

Figure 4-3 shows the ALTER statement of the three-partition CUSTOMER table. In this example, we are rotating off the first physical partition (P1) and now the last logical partition (P3).

Figure 4-3. Rotating partitions

In order to have a partitioned table in the DB2 releases prior to version 8, a partitioning index defined to give DB2 the limit keys for each partition was required. This partitioning index was also required to be the clustering index. Version 8 does not require a partitioning index to be defined on the partitioned table. The ability to partition is handled by the table creation, not the index. The following example uses the CREATE TABLE statement to handle the partitioning of a table:

 CREATE TABLE ACCOUNTS    (ENTER_DATE  DATE,      ACCOUNT  INTEGER,      STATUS CHAR(3))     IN TSP1           PARTITION BY (ENTER_DATE)              (PARTITION 1 ENDING AT ('2002-02-28'),               PARTITION 2 ENDING AT ('2002-03-31'));

The PART VALUES clause could also be used instead of the PARTITION ENDING AT clause; however, PARTITION ENDING AT is preferred.

Once a partitioned table has been created, it is ready to be used. Creating a separate partitioning index with the VALUES keyword is not required and not allowed. The information about the limit keys for the partitions will be stored in SYSTABLES, SYSCOLUMNS, and SYSTABLEPART.

Using table-controlled partitioning instead of index-controlled partitioning is recommended. Table-controlled partitioning is a replacement for index-controlled partitioning. Table 4-10 lists the differences between the two partitioning methods.

Table 4-10. Differences between Table-Controlled and Index-Controlled Partitioning
Table-Controlled Partitioning	Index-Controlled Partitioning
A partitioning index is not required; a clustering index is not required.	A partitioning index is required; a clustering index is required.
Multiple partitioned indexes can be created in a table space.	Only one partitioned index can be created in a table space.
A table space partition is identified by both a physical partition number and a logical partition number.	A table space partition is identified by a physical partition number.
The high-limit key is always enforced.	The high-limit key is not enforced if the table space is nonlarge.

The change from index-controlled partitioning to table-controlled partitioning can occur when any of the following are used:

DROP partitioning index
ALTER INDEX NOT CLUSTER
ALTER TABLE ADD PART
ALTER TABLE ALTER PART ROTATE
ALTER TABLE ALTER PART partno
CREATE INDEX PARTITIONED
CREATE INDEX VALUES, with no CLUSTER keyword

LOB Table Space

A LOB table space needs to be created for each columnor each column of each partitionof a base LOB table with a LOB column. This table space will contain the auxiliary table. The LOB table space has a structure that can be up to 4,000TB in size. This is a storage model used only for LOB table spaces. This linear table space can be up to 254 data sets, which can be up to 64GB each. The LOB table space is implemented in the same fashion as pieces are implemented for NPIs, as discussed later in this chapter. We can have 254 partitions of 64GB each, for a total of 16TB, and we will have one for each LOB column, up to 254 partitions, or a possible 4,000TB. See Figure 4-4.

Figure 4-4. Physical LOB storage: table with a `LOB` column up to 254 partitions

Partitioned base tables can each have different LOB table spaces. A LOB value can be longer than a page in a LOB table space and can also span pages. Following is an example of creating a LOB table space:

 CREATE LOB TABLESPACE CERTPIC   IN DB2CERT   USING STOGROUP CERTSTG       PRIQTY 3200       SECQTY 1600   LOCKSIZE LOB   BUFFERPOOL BP16K1   GBPCACHE SYSTEM   LOG NO   CLOSE NO;

NOTE

If a database is not explicitly defined for a LOB table space, it defaults to DSNDB04.

Creating Table Spaces

The CREATE TABLESPACE statement is used to define a simple, segmented, or partitioned table space on the current server. A large number of parameters with significant options are used in defining table spaces, depending on the type of table space. The major table space parameters follow.

DSSIZE specifies that the maximum size of any partition in a partitioned table space can exceed the 2GB limit, up to a size of 64GB.
LOB specifies that the table space is a LOB table space, used to hold only LOB values.
USING specifies whether storage groups or a user-defined table space is being defined. USING STOGROUP specifies that DB2 will manage the data sets and that PRIQTY and SECQTY will be used to define the allocations for the data sets. USING VCAT specifies that the data sets will be user defined.
FREEPAGE and PCTFREE specify the amount of free space to be left when a table space is loaded or reorganized.
GBPCACHE is used in data sharing to specify which pages of a table space or a partition are written to the group buffer pools.
NUMPARTS specifies the number of partitions in a partitioned table space, with a maximum of 4096 allowed.
SEGSIZE specifies the number of pages in a segment for segmented table spaces.
BUFFERPOOL specifies which buffer pool the table space is assigned to. A default buffer pool can be set with DSNZPARM TBSBPOOL.
LOCKSIZE specifies the size of locks used within the table space and in certain cases can specify the threshold at which lock escalation occurs. The allowable lock sizes are ANY, TABLESPACE, TABLE, PAGE, ROW, and LOB.
MAXROWS specifies the maximum number of rows allowed on a page, up to a maximum of 255.
LOCKPART specifies whether selective partition locking is to be used when locking a partitioned table space.
COMPRESS specifies whether data compression is used for rows in the table space or partition.
TRACKMOD specifies whether DB2 tracks modified pages in the space map pages in the table space or partition.

Page Sizes

Four page sizes, listed in Table 4-9, are available for use.

Table 4-11. Table Page Sizes
Buffer Pool	Pages
BP0BP49	4KB
BP8K0BP8K9	8KB
BP16K0BP16K9	16KB
BP32KBP32K9	32KB

The 8KB, 16KB, and 32KB pages are comprised of 4K control intervals (CIs), unless the DSVCI DSN2PARM is set to YES. Index page sizes are only 4KB pages, and work file (DSNDB07) table space pages are only 4KB or 32KB. The page size is defined by the buffer pool chosen for the table space; that is, BP8K0 supports 8KB pages.

Better hit ratios can be achieved with the larger page sizes, and they have less I/O because more rows can fit on a page. For instance, for a 2,200-byte rowperhaps for a data warehousea 4KB page would be able to hold only one row, but if an 8KB page were used, three rows could fit on a page, one more than if 4KB pages were used, and one less lock would be required.

DSSIZE

Depending on the DSSIZE and the page size, the number of partitions a table space can have has limitations. Table 4-12 shows these limits.

Table 4-12. `DSSIZE` and Partition Limitations
DSSIZE	4K	8K	16K	32K
4GB	4,096	4,096	4,096	4,096
8GB	2,048	4,096	4,096	4,096
16GB	1,024	2,048	4,096	4,096
32GB	512	1,024	2,048	4,096
64GB	256	512	1,024	2,048

Free Space

The FREEPAGE and PCTFREE clauses are used to help improve the performance of updates and inserts by allowing free space to exist on table spaces or index spaces. Performance improvements include improved access to the data through better clustering of data, less index page splitting, faster inserts, fewer row overflows, and a reduction in the number of REORGs required. Some tradeoffs include an increase in the number of pagesand therefore more auxiliary storage neededfewer rows per I/O and less efficient use of buffer pools, and more pages to scan.

As a result, it is important to achieve a good balance for each individual table space and index space when deciding on free space, and that balance will depend on the processing requirements of each table space or index space. When inserts and updates are performed, DB2 will use the free space defined, and by doing so can keep records in clustering sequence as much as possible. When the free space is used up, the records must be located elsewhere, and this is when performance can begin to suffer. Read-only tables do not require any free space, and tables with a pure insert-at-end strategy generally don't require free space. Exceptions to this are tables with VARCHAR columns and tables using compression and that are subject to updates.

The FREEPAGE amount represents the number of full pages inserted between each empty page during a LOAD or REORG of a table space or an index space. The tradeoff is between how often reorganization can be performed and how much disk space can be allocated for an object. FREEPAGE should be used for table spaces so that inserts can be kept as close to the optimal page as possible. For indexes, FREEPAGE should be used for the same reason, except improvements would be in terms of keeping index page splits near the original page instead of placing them at the end of the index. FREEPAGE is useful when inserts are sequentially clustered.

PCTFREE is the percentage of a page left free during a LOAD or a REORG. PCTFREE is useful when you can assume an even distribution of inserts across the key ranges. It is also needed in indexes to avoid all random inserts causing page splits.

Compression

Using the COMPRESS clause of the CREATE TABLESPACE and ALTER TABLESPACE SQL statements allows for the compression of data in a table space or in a partition of a partitioned table space.

NOTE

Indexes and LOB table spaces are not compressed.

In many cases, using the COMPRESS clause can significantly reduce the amount of Direct Access Storage Device (DASD) space needed to store data, but the compression ratio achieved depends on the characteristics of the data. Compression allows us to get more rows on a page and therefore see many of the following performance benefits, depending on the SQL workload and the amount of compression:

Higher buffer pool hit ratios
Fewer I/Os
Fewer getpage operations
Reduced CPU (central processing unit) time for image copies

Using compression also has some considerations for processing cost.

Compressing data can result in a higher processor cost, depending on the SQL workload. However, if you use IBM's synchronous data compression hardware, processor time is significantly less than if you use the DB2-provided software simulation or an edit or field procedure to compress the data.
The processor cost to decode a row by using the COMPRESS clause is significantly less than the cost to encode that same row. This rule applies regardless of whether the compression uses the synchronous data compression hardware or the software simulation built into DB2.
The data access path DB2 uses affects the processor cost for data compression. In general, the relative overhead of compression is higher for table space scans and less costly for index access.

The following example shows a table space created with compression:

 CREATE TABLESPACE CERTTSPT IN DB2CERT USING STOGROUP CERTSTG    PRIQTY 100    SECQTY 120    ERASE NO COMPRESS YES ERASE NO LOCKSIZE PAGE

Modifying Table Spaces

After you create a table space, the ALTER TABLESPACE statement enables you to modify existing table spaces. The ALTER TABLESPACE statement modifies many of the table space parameters, such as

Change the buffer pool assignment.
Change the lock size or the lock escalation threshold.
Change the specifics for a single partition.
Change any of the space definitions.
Turn compression off and on.
Change how pages are cached in group buffer pools.
Change logging for LOB table spaces.
Rotate partitions.
Add partitions.
Change the USING VCAT.

NOTE

If ALTER is used on the USING VCAT, the table space must be stopped first.

In the following example, ALTER TABLESPACE is used to change the buffer pool assignment and size of the locks used:

 ALTER TABLESPACE DB2CERT.CERTTS    BUFFERPOOL BP4    LOCKSIZE ROW

Removing Table Spaces

When you want to remove a table space, use the DROP TABLESPACE statement to delete the object. This will remove any objects that are directly or indirectly dependent on the table space. This statement will also invalidate any packages or plans that refer to the object and will remove its descriptions and all related data from the catalog.

 DROP TABLESPACE DB2CERT.CERTTS

Views

Views are logical tables created using the CREATE VIEW statement. Once a view is defined, it can be accessed using DML statements, such as SELECT, INSERT, UPDATE, and DELETE, as if it were a base table. A view is a temporary table, and the data in the view is available only during query processing.

With a view, you can make a subset of table data available to an application program and validate data that is to be inserted or updated. A view can have column names that are different from those of corresponding columns in the original tables. The use of views provides flexibility in the way the application programs and end user queries look at the table data.

A sample CREATE VIEW statement is shown in the following example. The original table, EMPLOYEE, has columns named SALARY and COMM. For security reasons, this view is created from the ID, NAME, DEPT, JOB, and HIREDATE columns. In addition, we are restricting access on the column DEPT. This definition will show the information only of employees who belong to the department whose DEPTNO is 10.

 CREATE VIEW EMP_VIEW1 (EMPID,EMPNAME,DEPTNO,JOBTITLE,HIREDATE)    AS SELECT ID,NAME,DEPT,JOB,HIREDATE FROM EMPLOYEE    WHERE DEPT=10;

After the view has been created, the access privileges can be specified. This provides data security, as a restricted view of the base table is accessible. As we see in this example, a view can contain a WHERE clause to restrict access to certain rows or can contain a subset of the columns to restrict access to certain columns of data.

The column names in the view do not have to match the column names of the base table. The table name has an associated schema, as does the view name. Once the view has been defined, it can be used in such DML statements as SELECT, INSERT, UPDATE, and DELETE, with restrictions. The database administrator can decide to provide a group of users with a higher-level privilege on the view than the base table.

A view is an alternative way to look at data in one or more tables. A view is an SQL SELECT statement that is executed whenever the view is referenced in an SQL statement. Because it is not materialized until execution, such operations as ORDER BY, the WITH clause, and the OPTIMIZE FOR clause have no meaning.

WITH CHECK OPTION

If the view definition includes conditions, such as a WHERE clause, and the intent is to ensure that any INSERT or UPDATE statement referencing the view will have the WHERE clause applied, the view must be defined using WITH CHECK OPTION. This option can ensure the integrity of the data being modified in the database. An SQL error will be returned if the condition is violated during an INSERT or UPDATE operation.

The following example is of a view definition using the WITH CHECK OPTION, which is required to ensure that the condition is always checked. You want to ensure that the DEPT is always 10. This will restrict the input values for the DEPT column. When a view is used to insert a new value, the WITH CHECK OPTION is always enforced.

 CREATE VIEW EMP_VIEW2 (EMPID,EMPNAME,DEPTNO,JOBTITLE,HIREDATE)    AS SELECT ID,NAME,DEPT,JOB,HIREDATE FROM EMPLOYEE    WHERE DEPT=10 WITH CHECK OPTION;

If the view in this example is used in an INSERT statement, the row will be rejected if the DEPTNO column is not the value 10. It is important to remember that no data validation occurs during data modification if the WITH CHECK OPTION is not specified. If the view in the previous example were used in a SELECT statement, the conditionalWHERE clausewould be invoked, and the resulting table would contain only the matching rows of data. In other words, the WITH CHECK OPTION does not affect the result of a SELECT statement. The WITH CHECK OPTION must not be specified for read-only views.

Read-Only Views

A read-only view allows no inserts, updates, or deletes. A view is considered read-only if

The first FROM clause identifies more than one table or view, a table function, or a read-only view, a nested or common table expression, or a system-maintained MQT.
The outer SELECT contains a GROUP BY or HAVING clause
The outer SELECT contains a column function or DISTINCT
It contains a subquery with the same table as the outer SELECT

Modifying and Removing a View

In order to modify a view, you simply drop and recreate it. You cannot alter a view's attributes, but you can regenerate it. To remove a view, you simply use the following:

 DROP VIEW EMP_VIEW2

Materialized Query Tables (MQTs)

Prior to V8, decision-support queries were difficult and expensive. They typically operated over a large amount of data that might have to scan or process terabytes of data and possibly perform multiple joins and complex aggregations. With these types of queries, traditional optimization was failing, and performance was less than optimal.

The use of materialized query tables, which used to be known as automatic summary tables on non-mainframe platforms, allows you to precompute whole or parts of each query and then use computed results to answer future queries. MQTs provide the means to save the results of prior queries and then reuse the common query results in subsequent queries. This helps avoid redundant scanning, aggregating, and joins. MQTs are useful for data warehousetype applications.

MQTs do not completely eliminate optimization problems; rather, MQTs move optimization issues to other areas. Some challenges include finding the best MQT for expected workload, maintaining the MQTs when underlying tables are updated, recognizing usefulness of MQT for a query, and determining to use the MQT for a query. Most of these types of problems are addressed by OLAP tools, but MQTs are the first step.

Defining MQTs

MQTs work with two tables: a source table and a materialized query table. The source table is the base table, view, table expression, or table function. The materialized query table is the table used to contain materialized data derived from one or more source tables in a FULLSELECT and is similar to creating a view. However, a view is logical, whereas an MQT contains materialized data of the query result. You could refer to an MQT as a "materialized view."

MQTs are created with the CREATE TABLE statement, and the columns of the MQT can be explicitly specified or derived from the fullselect associated with the table. MQTs are physically stored as are declared temporary tables. The following is an example of the syntax used to create an MQT:

 CREATE TABLE SALESMQT (CUSTID, STOREID, LOCID, MTH) AS (    SELECT CUSTID, STOREID, LOCID, MTH, COUNT(*)    FROM SALES    GROUP BY CUSTID, STOREID, LOCID, MTH) DATA INITIALLY DEFERRED REFRESH DEFERRED MAINTAINED BY SYSTEM ENABLE QUERY OPTIMIZATION

This example creates an MQT called SALESMQT.

MQT Options

The option DATA INITIALLY DEFERRED states that when a materialized query table is created, it will not be populated immediately by the fullselect defined in the DDL. The REFRESH DEFERRED option says that the data in the MQT is not refreshed immediately when its based tables are updates. The MQT can be refreshed at any time by using the REFRESH TABLE statement. The REFRESH TABLE command can also be used for the initial population of the MQT. The REFRESH TABLE statement deletes all rows with a mass delete on the MQT and then executes the fullselect in the MQT definition to recalculate the data from the base tables. It then inserts the calculated result into the MQT and updates the catalog for the refresh timestamp and cardinality of the MQT. This is all performed in a single commit scope. Following is an example of the how to refresh SALESMQT:

 REFRESH TABLE SALESMQT;

The MAINTAINED BY SYSTEM option indicates that the MQT is system maintained and that you will need to use the SQL statement REFRESH TABLE to perform this. This option does not allow for user updates by LOAD, INSERT, UPDATE, or DELETE and thus is by nature read-only.

The MAINTAINED BY USER option indicates that the MQT is user maintained by either triggers or batch updates. This option allows for user updates via LOAD, INSERT, UPDATE, or DELETE, and it can also be updated by the REFRESH TABLE statement, which can use EXPLAIN if necessary.

ENABLE FOR QUERY OPTIMIZATION allows the DB2 optimizer to choose the MQT for use during the processes of establishing an access path. Optimization is discussed further in Chapter 17.

Indexes

An index is a list of the locations of rows, sorted by the contents of one or more specified columns. Indexes are typically used to improve the query performance, but can also serve a logical data design purpose. For example, a unique index does not allow the entry of duplicate values in columns, thereby guaranteeing that no rows of a table are the same. Indexes can be created to specify ascending or descending order by the values in a column. The indexes contain a pointer, known as a row identifier (RID), to the physical location of the rows in the table.

Indexes are created for three main purposes:

To ensure uniqueness of values
To improve query performance
To ensure a physical clustering sequence

More than one index can be defined on a particular base table, which can have a beneficial effect on the performance of queries. However, the more indexes there are, the more the database manager must work to keep the indexes up to date during UPDATE, DELETE, and INSERT operations. Creating a large number of indexes for a table that receives many updates can slow down processing. Some of the most important parameters of the CREATE INDEX statement are listed in Table 4-13.

Table 4-13. `CREATE INDEX` Parameters
Parameter	Function
`TYPE 2`	Specifies a type 2 index. The `TYPE 2` clause is not required. A type 2 index is always created.
`UNIQUE`	Prevents the table from containing two or more rows with the same value of the index key.
`USING clause`	Specifies whether storage groups or a user-defined index space is being defined. `USING STOGROUP` specifies that DB2 will manage the data sets and that a `PRIQTY` and a `SECQTY` will define the allocations for the data sets. `USING VCAT` specifies that the data sets will be user defined and that no space allocations are defined in this statement.
`FREEPAGE` and `PCTFREE`	Specify the amount of free space to be left when an index space is built or reorganized.
`GBPCACHE`	Used in data sharing to specify which pages of an index space are written to the group buffer pools.
`DEFINE`	Specifies whether to create the underlying data sets.
`CLUSTER`	Specifies whether it is a clustering index.
`PARTITIONED`	Specifies whether the index is partitioned: primary or secondary.
`PART`	Identifies the partition number.
`PARTITION BY RANGE`	Specifies the partitioning index for the table to determine the partitioning scheme for the data in the table.
`BUFFERPOOL`	Specifies which buffer pool the table space is assigned to. A default buffer pool can be assigned with `DSNZPARM IDXBPOOL`.
`PIECESIZE`	Identifies the size of the piecesdata setsused for a nonpartitioning index.
`NOT PADDED`	Allows for an index with a `VARCHAR` column to not be padded with blanks, reducing the size of the index and allowing it to be used for index-only access.
`COPY`	Specifies whether the index can be image copied.

NOTE

Indexes are also stored in underlying VSAM data sets, as are table spaces. In order to make a correlation between the index that you or DB2 created, you can look in the INDEXSPACE column in the SYSINDEXES catalog table.

Type 2 indexes are created by default. Type 2 indexes have no locking and do not have subpages. Type 1 indexes should not be used and, as of version 8, are not supported.

An index can be defined with the DEFINE NO option (DEFINE YES is the default). This is done to specify an index and defer the physical creation. The data sets will not be created until data is inserted into the index. This option is helpful in order to reduce the number of physical data sets.

Clustering Index

It is generally important to control the physical sequence of the data in a table. The CLUSTER option is used on one, and only one, index on a table and specifies the physical sequence. If not defined, the first index defined on the table in a nonpartitioned table space is used for the clustering sequence. The best clustering index is the one that supports the majority of the sequential access to the data in the table.

NOTE

When the CLUSTER keyword is used, the index is called the explicit clustering index. If no index is defined with the CLUSTER keyword, the index DB2 chooses for clustering is called the implicit clustering index.

In defining table spaces, one option is called MEMBER CLUSTER. When this option is specified, the clustering sequence specified by the clustering index is ignored. In that case, DB2 will locate the data on the basis of available space when an SQL INSERT statement is used. This option is used mainly in a data sharing environment to avoid excessive p-lock (physical lock) negotiation on the space map when inserts are coming in by the clustering index on multiple members.

The clustering index can also be changed. ALTER can be used to change whether an index is the clustering index. For instance, the following syntax changes an index to not be the clustering index:

 ALTER INDEX CUSTIX NO CLUSTER

After the NO CLUSTER alter index is completed, the inserts will still occur by the implicit clustering index. A new clustering index needs to be defined, or ALTER can be used on an existing index to become the new clustering index, as shown in the following syntax:

 ALTER INDEX CUSTIX2 CLUSTER

When the new clustering index is defined, inserts will occur by the new index. Obviously, for performance reasons, it is wise to perform a REORG right after the ALTER to specify the new clustering index so that the inserts can now occur by the new clustering index.

Partitioned Index

Partitioned indexes are of two types: primary and secondary. If the keys in the index match the limit-key columns, or subset of columns, in the partitioned table, the index is considered a primary partitioned index; if not, it is considered a secondary partitioned index. This will be reflected in SYSIBM.SYSINDEXES.

 CREATE UNIQUE PARTITIONED INDEX DB2USER1.TESTCNTX   ON DB2USER1.TEST_CENTER    (TCID ASC)   USING STOGROUP CERTSTG   PRIQTY 512   SECQTY 64   ERASE NO   CLUSTER   BUFFERPOOL BP3   CLOSE YES;

A partitioned index is made up of several data sets. Each partition can have different attributes; that is, some may have more free space than others.

Unique and Nonunique Indexes

A unique index guarantees the uniqueness of the data values in a table's columns. The unique index can be used during query processing to perform faster retrieval of data. The uniqueness is enforced at the end of the SQL statement that updates rows or inserts new rows. The uniqueness is also checked during the execution of the CREATE INDEX statement. If the table already contains rows with duplicate key values, the index is not created. An example of creating a unique index follows:

 CREATE UNIQUE INDEX DB2USER1.TESTIX   ON DB2USER1.TEST    (NUMBER ASC)   USING STOGROUP CERTSTG   PRIQTY 512   SECQTY 64   ERASE NO   CLUSTER

A nonunique index can improve query performance by maintaining a sorted order for the data. Depending on how many columns are used to define a key, you can have an atomic, or single-column, key or a composite key, which is composed of two or more columns.

The following are types of keys used to implement constraints.

A unique key is used to implement unique constraints. A unique constraint does not allow two different rows to have the same values on the key columns.
A primary key is used to implement entity-integrity constraints. A primary key is a special type of unique key. A table can have only one primary key. The primary-key column must be defined with the NOT NULL option.
A foreign key is used to implement referential integrity constraints. Referential constraints can reference only a primary key or a unique constraint. The values of a foreign key can have values defined only in the primary key or unique constraint they are referencing or null values. (A foreign key is not an index; although an index can be created to improve access to rows using the foreign key columns.)

NOTE

DB2 uses unique indexes and the NOT NULL option to maintain primary and unique key constraints.

Unique Where Not Null Index

This is a special form of a unique index. Normally, in a unique index, any two null values are taken to be equal. Specifying WHERE NOT NULL will allow any two null values to be unequal.

Null Values and Indexes

It is important to understand the difference between a primary key and a unique index. DB2 uses two elements to implement the relational database concept of primary and unique keys: unique indexes and the NOT NULL constraint. Therefore, unique indexes do not enforce the primary key constraint by themselves, as they can allow a null value. Null values are unknown, but when it comes to indexing, a null value is treated as equal to all other null values, with the exception of the UNIQUE WHERE NOT NULL INDEX. You cannot insert a NULL value twice if the column is a key of a unique index, because it violates the uniqueness rule for the index.

Nonpartitioning Indexes

NPIs are used on partitioned tables and are not the same as the clustered partitioning key, which is used to order and partition the data. Rather, NPIs are for access to the data.

NPIs can be unique or nonunique. Although you can have only one clustered partitioning index, you can have several NPIs on a table, if necessary. NPIs can be broken apart into multiple pieces, or data sets, by using the PIECESIZE clause on the CREATE INDEX statement. Pieces can vary in size from 254KB to 64GB; the best size will depend on how much data you have and how many pieces you want to manage. If you have several pieces, you can achieve more parallelism on processes, such as heavy INSERT batch jobs, by alleviating the bottlenecks caused by contention on a single data set. The following example shows how to create an NPI with pieces:

 CREATE UNIQUE INDEX DB2USER1.TESTCN2X   ON DB2USER1.TEST_CENTER    (CODE ASC)   USING STOGROUP CERTSTG   PIECESIZE 512K;

Data-Partitioned Secondary Indexes

The new DPSI index type provides many advantages over the traditional NPIs for secondary indexes on a partitioned table space in terms of availability and performance. We can now choose to partition a table without a partitioning/clustering index. So now you can choose to cluster by a different index if it better fits your data and processing requirements. This can be accomplished through a DPSI.

The partitioning scheme of the DPSI will be the same as the table space partitions, and the index keys in x index partition will match those in x partition of the table space. Figure 4-5 shows how a DPSI is physically structured. The CUST_NUMX index is a DPSI.

Figure 4-5. DPSI physical layout

A DPSI provides the following benefits:

Clustering by a secondary index
Ability to easily drop partitions
Ability to easily rotate partitions
Efficient utility processing on secondary indexes
Ability to reduce overhead in data sharing (affinity routing)

Although partition independence is furthered, some queries may not perform as well with DPSIs. If it has predicates that reference partitioning column values in a single partition and therefore are restricted to a single partition of the DPSI, the query will benefit from this new organization. The queries will have to be designed to allow for partition pruning through the predicates in order to accomplish this. However, if it references only columns in the DPSI, the predicate may not perform very well because it may need to probe several partitions of the index. Figure 4-6 shows how applications will need to code predicates for DPSIs.

Figure 4-6. Coding predicates for DPSIs

Another limitation to using DPSIs is the fact that they cannot be unique and may not be the best candidates for ORDER BYs.

LOB Indexes

An index must be created on an auxiliary table for a LOB. The index itself consists of 19 bytes for the ROWID and 5 bytes for the RID. Therefore, the index is always unique. No LOB columns are allowed in the index. The following example shows the CREATE statement for an auxiliary index:

 CREATE INDEX DB2CERT.PHOTOIX   ON DB2USER1.CAND_PHOTO   USING VCAT DB2USER1   COPY YES;

NOTE

No columns are specified, because the auxiliary indexes have implicitly generated keys.

General Indexing Guidelines

Indexes consume disk space. The amount of disk space will vary, depending on the length of the key columns and whether the index is unique or nonunique. The size of the index will increase as more data is inserted into the base table. Therefore, consider the disk space required for indexes when planning the size of the database. Some of the indexing considerations include the following.

Primary and unique key constraints will always create a unique index.
It is usually beneficial to create indexes on foreign-key constraint columns.
It is beneficial to always create a clustering index.

Modifying an Index

The ALTER INDEX statement allows you to change many of the characteristics on the index, such as

Add a column to the index
Change the buffer pool assignment
Change the specifics for a single partition
Change any of the space definitions
Change whether the index is to be copied
Change the clustering index
Change to not pad an index if it contains a VARCHAR

The following example shows how to change the buffer pool assignment for an index:

 ALTER INDEX TESTCN2X BUFFERPOOL BP1;

Removing an Index

When you want to remove an index, issue the following statement:

 DROP INDEX TESTCN2X;

Databases

A database is a collection of table spaces, index spaces, and the objects with them. A couple of types of databases are used for special purposes.

A WORKFILE database holds the DB2 work files used for sorting and other activities.
A TEMP database holds temporary tables as defined by a DECLARE TEMP statement. (Refer to Chapter 7 for more information on declared temporary tables.)

For these special databases, you can also specify which data sharing member they are for, as each member must have its own. You can also specify the coding scheme for the data in the database (ASCII, CCSID, EBCDIC, Unicode).

Creating a Database

Following is an example of the creation of the DB2CERT database. The BUFFERPOOL parameter lets us specify that any objects created in this database without a buffer pool assigned will default to buffer pool BP7. INDEXBP has the same purpose but provides a default buffer pool for indexes.

 CREATE DATABASE CERTTS STOGROUP CERTSTG BUFFERPOOL BP7 INDEXBP BP8;

Modifying a Database

The default buffer pools, encoding scheme, and the storage group can be changed for a database. The following is an example ALTER DATABASE:

 ALTER DATABASE DB2CERT BUFFERPOOL BP4;

Removing a Database

It is very easy to remove a database: a simple DROP statement, provided that all the appropriate authorities are in place. When a database is dropped, all dependent objects are dropped as well.

 DROP DATABASE DB2CERT;

Storage Groups

Storage groups are used to list the DASD volumes that will be used to store the data. Storage groups can contain one or many volumes and can work with or without SMS (system-managed storage). If table spaces or index spaces are defined using a storage groupidentified in the USING clause in the CREATE INDEX and CREATE TABLESPACE statementsthey are considered to be DB2 managed, and DB2 will create them, allowing you to specify the PRIQTY and SECQTY for the data set allocations. Otherwise, they are considered to be user managed and must be defined explicitly through the IDCAMS utility.

To create a storage group, do the following:

 CREATE STOGROUP CERTSTG VOLUME(*) VCAT DB2USER1;

NOTE

The asterisk (*) in this example indicates that SMS will manage the volumes to be used.

We can add or remove volumes within a storage group. This is done via ALTER:

 ALTER STOGROUP CERTSTG ADD VOL1;

Storage groups can be removed by using the DROP statement. This can be done only if no table spaces or index spaces are using it.

 DROP STOGROUP CERTSTG;

Data Types

DB2-Supplied Data Types

Table 4-1. Built-In Data Types

Numeric Data Types

String Data Types

Date and Time Data Types

Table 4-2. Date String Formats

Table 4-3. Time String Formats

Special Data Types

ROWID

Identity Column

Table 4-4. Identity Column Values

Distinct Types

Null-Value Considerations

Selecting the Correct Data Type

Table 4-5. Data Type Selection

Unicode Support in DB2

Tables

Table Constraints

Unique Constraints

Referential Constraint

Figure 4-2. Referential integrity between two tables

Check Constraints

Creating Tables

Table 4-6. Maximum Table Columns and Row Lengths

Auxiliary Tables

Copying a Table Definition

Modifying a Table

Removing a Table

Sequence Objects

Creating Sequence Objects

Using Sequence Objects

Table 4-7. Sequence Objects versus Identity Columns

Modifying Sequence Objects

Removing Sequence Objects

Table Spaces

Types of Table Spaces

Simple Table Space

Segmented Table Space

Table 4-8. Table Page Thresholds

Table 4-9. Recommendations for Table Segment Size

Partitioned Table Space

Figure 4-12. Tables for theDB2CERT database

Figure 4-3. Rotating partitions

Table 4-10. Differences between Table-Controlled and Index-Controlled Partitioning

LOB Table Space

Figure 4-4. Physical LOB storage: table with a LOB column up to 254 partitions

Creating Table Spaces

Page Sizes

Table 4-11. Table Page Sizes

DSSIZE

Table 4-12. DSSIZE and Partition Limitations

Free Space

Compression

Modifying Table Spaces

Removing Table Spaces

Views

WITH CHECK OPTION

Read-Only Views

Modifying and Removing a View

Materialized Query Tables (MQTs)

Defining MQTs

MQT Options

Indexes

Table 4-13. CREATE INDEX Parameters

Clustering Index

Partitioned Index

Unique and Nonunique Indexes

Unique Where Not Null Index

Null Values and Indexes

Nonpartitioning Indexes

Data-Partitioned Secondary Indexes

Figure 4-5. DPSI physical layout

Figure 4-6. Coding predicates for DPSIs

LOB Indexes

General Indexing Guidelines

Modifying an Index

Removing an Index

Databases

Figure 4-12. Tables for the`DB2CERT` database

Figure 4-4. Physical LOB storage: table with a `LOB` column up to 254 partitions

Table 4-12. `DSSIZE` and Partition Limitations

Table 4-13. `CREATE INDEX` Parameters