Let's look in detail at some of the objects that can be created: data types, tables, table spaces, views, materialized query tables, indexes, sequence objects, databases, and storage groups.
Data types are used to specify the attributes of the columns when creating a table. DB2 supplies various built-in data types; additional types can be created by users.
DB2-Supplied Data Types
When the database design is being implemented, any of the DB2-supplied data types can be used. Data is stored in DB2 tables consisting of columns and rows. Every DB2 table is defined by using columns, which must be one of the built-in DB2 data types or user-defined data types. Every DB2-supplied data type belongs to one of these major categories: numeric, string, date/time, or special.
The valid built-in DB2 data types are listed in Table 4-1.
Numeric Data Types
These data types are used to store various numeric types and precisions. The precision of a number is the number of digits used to represent its value. The data is stored in the DB2 database, using a fixed amount of storage for all numeric data types. The amount of storage required increases as the precision of the number increases.
Numeric values should not be enclosed in quotation marks. If quotation marks are used, the value is treated as a character string. Even if a field contains numbers in its representation, a DB2 numeric data type should be used to represent the data only if arithmetic operations should be allowed.
Six DB2 data types can be used to store numeric data: SMALLINT, INTEGER, DECIMAL (NUMERIC), REAL, DOUBLE, and FLOAT (a synonym for REAL or DOUBLE, depending on the precision specified).
You must also be aware of the range limits of the data types, and of the corresponding application programming language, when you are manipulating these numeric fields. Some data values are of the integer type by nature, such as the number of test candidates. A number representing a count of people cannot contain fractional data (numbers to the right of the decimal point). On the other hand, some values require decimal places to accurately reflect their value, such as test scores. These two examples should use different DB2 data types to store their values (SMALLINT and DECIMAL, respectively).
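A minimal sketch of this choice (the table and column names here are illustrative, not from the sample database):

```sql
-- A count of candidates is a whole number: SMALLINT is sufficient.
-- A test score needs decimal places: DECIMAL(5,2) holds values
-- such as 87.50 without losing the fraction.
CREATE TABLE TEST_RESULTS
  (TEST_ID    SMALLINT NOT NULL,
   CANDIDATES SMALLINT,        -- whole-number count
   AVG_SCORE  DECIMAL(5,2));   -- fractional value preserved
```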
String Data Types
The seven string data types are CHAR, VARCHAR, CLOB, GRAPHIC, VARGRAPHIC, DBCLOB, and BLOB.
The syntaxes of LONG VARCHAR and LONG VARGRAPHIC are still supported, but the alternative syntaxes VARCHAR(integer) and VARGRAPHIC(integer) are preferred. DB2 translates the LONG definitions into the other format before defining the table. Therefore, only the VARCHAR() and VARGRAPHIC() definitions should be used.
Support of LOB data types requires a special table space, an auxiliary table, and an auxiliary index.
Date and Time Data Types
Three DB2 data types are specifically used to represent dates and times.
From the user perspective, these data types can be treated as character or string data types. Every time you need to use a date/time attribute, you will need to enclose it in quotation marks. However, date/time data types are not stored in the database as fixed-length character strings.
The internal format of date, time, and timestamps is not at all similar to the external format returned via SQL. For example, the internal format of TIMESTAMP is 10 bytes, but the external format is 26 characters.
DB2 provides special functions that allow you to manipulate these data types. These functions allow you to extract the month, hour, or year of a date/time column. The date and time formats correspond to the site default. Therefore, the string that represents a date value will change, depending on the default format. In some countries, the date format is DD/MM/YYYY; in other countries, it is YYYY-MM-DD. You should be aware of the default format used by your site to use the correct date string format. If an incorrect date format is used in an SQL statement, an SQL error will be reported. Scalar functions can also be used to return date, time, and timestamp columns in formats other than the default.
As a general recommendation, if you are interested in a single element of a date string, say, month or year, always use the SQL functions provided by DB2 to interpret the column value. By using the SQL functions, you can make your application more portable.
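For instance, assuming a HIREDATE column of type DATE (a hypothetical table for illustration), the built-in scalar functions extract individual elements regardless of the site's default date format:

```sql
-- YEAR and MONTH operate on the DATE value itself,
-- so the result is independent of the display format.
SELECT YEAR(HIREDATE), MONTH(HIREDATE)
  FROM EMPLOYEE
 WHERE YEAR(HIREDATE) = 2004;
```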
TIMESTAMP fields use the most storage but contain the most accurate time, as they include microseconds.
Timestamps cannot be guaranteed to be unique, as the granularity is only to microseconds, and it is possible for two events to ask the system timer for the timestamp within 1 microsecond. The 8-byte system clock is unique, however, as it appends the time value with additional bits.
All date/time data types have an internal and an external format. The external format is always a character string. Three date/time data type formats are available in DB2.
Special Data Types
Some special data types in DB2 can be used to support special processing, such as random-number and sequential-number generation and the ability to create data types.
A ROWID is a value that uniquely identifies a row in a table. A ROWID column enables writing queries that navigate directly to a row in the table. ROWID column values are generated by DB2 unless supplied, but they must be unique. If a value is supplied, it must be a valid DB2-generated value. The internal format is 17 bytes and contains bit data. The external length is 40 bytes. If a ROWID column is used in a query, the access path is referred to as 'Direct Row Access' because the row is directly accessed without a scan of the index or the table space. This type of access is shown as a 'D' in the ACCESS_PATH column in the PLAN_TABLE. For more information on the PLAN_TABLE and access paths, refer to Chapter 17.
Columns can be populated with values if no value was supplied by the user. It is also possible to have DB2 generate sequence numbers or other values as part of a column during record insertion.
In the majority of applications, a single column within a table represents a unique identifier for that row. Often, this identifier is a number that gets sequentially updated as new records are added. DB2 has a feature that automatically generates this value on behalf of the user. The following example shows a table definition with the EMP_NO field automatically being generated as a sequence.
CREATE TABLE EMPLOYEE
  (EMPNO INT GENERATED ALWAYS AS IDENTITY,
   NAME CHAR(10));
INSERT INTO EMPLOYEE (NAME) VALUES ('SMITH');
INSERT INTO EMPLOYEE (NAME) VALUES ('LAWSON');
SELECT * FROM EMPLOYEE;

EMPNO       NAME
----------- ----------
          1 SMITH
          2 LAWSON
If the column is defined with GENERATED ALWAYS, the INSERT statement cannot specify a value for the EMPNO field. By default, the numbering will start at 1 and increment by 1. The starting and increment values can be modified as part of the column definition:
CREATE TABLE EMPLOYEE
  (EMPNO INT GENERATED ALWAYS AS IDENTITY
          (START WITH 100, INCREMENT BY 10),
   NAME CHAR(10));
INSERT INTO EMPLOYEE (NAME) VALUES ('SMITH');
INSERT INTO EMPLOYEE (NAME) VALUES ('LAWSON');
SELECT * FROM EMPLOYEE;

EMPNO       NAME
----------- ----------
        100 SMITH
        110 LAWSON
In addition, the default value can be GENERATED BY DEFAULT, which means that the user has the option of supplying a value for the field. If no value is supplied, using the DEFAULT keyword, DB2 will generate the next number in sequence.
One additional keyword (CACHE) is available as part of IDENTITY columns. You can decide how many numbers DB2 should "pregenerate." This can help reduce catalog contention, as DB2 will store the next n numbers in memory rather than go back to the catalog tables to determine which number to generate next.
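A sketch combining these options (the attribute values are chosen for illustration):

```sql
-- BY DEFAULT lets the application supply EMPNO; if it specifies the
-- DEFAULT keyword instead, DB2 generates the next value in sequence.
-- CACHE 50 preallocates 50 numbers in memory to reduce catalog
-- contention.
CREATE TABLE EMPLOYEE
  (EMPNO INT GENERATED BY DEFAULT AS IDENTITY
          (START WITH 1, INCREMENT BY 1, CACHE 50),
   NAME CHAR(10));
INSERT INTO EMPLOYEE (EMPNO, NAME) VALUES (500, 'SMITH');      -- user-supplied
INSERT INTO EMPLOYEE (EMPNO, NAME) VALUES (DEFAULT, 'LAWSON'); -- DB2-generated
```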
Identity columns are restricted to numeric values (integer or decimal) and can be used in only one column in the table definition. Table 4-4 describes the values used for identity column definition.
CREATE TABLE EMPLOYEE
  (EMPNO INT GENERATED ALWAYS AS IDENTITY,
   NAME CHAR(10),
   SALARY INT,
   BONUS INT,
   PAY INT);
INSERT INTO EMPLOYEE (NAME, SALARY, BONUS, PAY)
   VALUES ('SMITH', 20000, 2000, 22000);
INSERT INTO EMPLOYEE (NAME, SALARY, BONUS, PAY)
   VALUES ('LAWSON', 30000, 5000, 35000);
SELECT * FROM EMPLOYEE;

EMPNO   NAME       SALARY      BONUS       PAY
------- ---------- ----------- ----------- ----------
      1 SMITH            20000        2000      22000
      2 LAWSON           30000        5000      35000
The EMPNO is generated as an IDENTITY column.
If the data needs to be unloaded and reloaded while preserving previously generated numbers, the GENERATED option can be altered.
Using identity columns has some restrictions. Identity columns cannot be specified on a table that has an edit procedure defined. An identity column cannot be defined with the FIELDPROC clause or the WITH DEFAULT clause. When updating an identity column, you cannot update the value of an identity column that is defined as GENERATED ALWAYS. If you are doing an ALTER table to add an identity column to a table that is not empty, the table space that contains the table is placed in the REORG pending state. When the REORG utility is executed, DB2 will generate the values for the identity column for all existing rows, and the REORG pending status will be removed. The values are guaranteed to be unique, and their order is determined by the system.
When creating another table using the LIKE clause (creating a table with the columns/attributes of another table), you have some options for picking up identity column attributes. For an identity column, the newly created table will inherit only the data type of the identity column. No other column attributes are inherited unless the INCLUDING IDENTITY COLUMN ATTRIBUTES clause is used.
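A minimal sketch of the difference (the new table names are illustrative):

```sql
-- NEW_EMP1 gets EMPNO as a plain INT column; the identity
-- attributes are not carried over.
CREATE TABLE NEW_EMP1 LIKE EMPLOYEE;

-- NEW_EMP2 keeps EMPNO as a full identity column.
CREATE TABLE NEW_EMP2 LIKE EMPLOYEE
   INCLUDING IDENTITY COLUMN ATTRIBUTES;
```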
Distinct types allow a user to extend the data types that DB2 understands in a database. The distinct types can be created based on an existing data type or another distinct type. Distinct types are used to define further types of data being represented in the database. If columns are defined using different distinct types based on the same base data type, these columns cannot be directly compared. This is known as strong typing. DB2 provides this strong data typing to avoid end user mistakes during the assignment or comparison of different types of real-world data. For more information on how to create and use distinct types, refer to Chapter 15.
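For example, a distinct type can keep two money-like columns from being compared accidentally (the type and table names here are illustrative; Chapter 15 covers creation and casting in detail):

```sql
CREATE DISTINCT TYPE US_DOLLAR AS DECIMAL(9,2);
CREATE DISTINCT TYPE EURO      AS DECIMAL(9,2);

CREATE TABLE SALES
  (US_AMT US_DOLLAR,
   EU_AMT EURO);

-- Comparing US_AMT with EU_AMT directly fails, even though both are
-- based on DECIMAL(9,2); the values must first be cast explicitly.
```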
A null value represents an unknown state. Therefore, when columns containing null values are used in calculations, the result is unknown. All the data types discussed previously support the presence of null values. During the table definition, you can specify that a valid value must be provided. This is accomplished by adding a phrase to the column definition. The CREATE TABLE statement can contain the phrase NOT NULL following the definition of each column. This will ensure that the column contains a known data value.
Special considerations are required to properly handle null values when coding a DB2 application. DB2 treats a null value differently from other data values. To define a column not to accept null values, add the phrase NOT NULL to the end of the column definition.
In the following example, DB2 will not allow any null values to be stored in the c1 column:
CREATE TABLE t1 (c1 CHAR(3) NOT NULL)
In general, avoid using nullable columns unless they are required to implement the database design. Another factor to consider is overhead storage. An extra byte per nullable column is necessary if null values are allowed.
Relational databases allow null values. It is important to remember that they can be appropriate for your database design.
When you insert a row into a table and omit the value of one or more columns, those columns may be populated using either a null valueif the column is defined as nullableor a defined default valueif you have so specified. If the column is defined as not nullable, the insert will fail unless the data has been provided for the column.
DB2 has a defined default value for each DB2 data type, but you can provide a default value for each column. The default value is specified in the CREATE TABLE statement. By defining your own default value, you can ensure that the data value has been populated with a known value.
In the following example, any INSERT statement that omits the DEPT column will populate that column with the default value of 10. The COMM column is also defined with a default; omitting it at insert time, or specifying the DEFAULT keyword, will populate it with the value 15. To ensure that the default value is used during an INSERT operation, the keyword DEFAULT can be specified in the VALUES portion of the INSERT statement. The following shows two ways of inserting a record that rely on user-defined default values:
CREATE TABLE STAFF
  (ID     SMALLINT NOT NULL,
   NAME   VARCHAR(9),
   DEPT   SMALLINT NOT NULL WITH DEFAULT 10,
   JOB    CHAR(5),
   YEARS  SMALLINT,
   SALARY DECIMAL(7,2),
   COMM   DECIMAL(7,2) NOT NULL WITH DEFAULT 15);
In this case, both of the following statements will have the same result:
INSERT INTO STAFF
   VALUES (360, 'Lawson', DEFAULT, 'SE', 8, 20000, DEFAULT);
INSERT INTO STAFF (ID, NAME, JOB, YEARS, SALARY)
   VALUES (360, 'Lawson', 'SE', 8, 20000);
The result is
ID     NAME      DEPT   JOB   YEARS  SALARY    COMM
------ --------- ------ ----- ------ --------- ---------
   360 Lawson        10 SE         8  20000.00     15.00

1 record(s) selected.
Selecting the Correct Data Type
Knowledge of the possible data values and their usage is required to be able to select the correct data type. Specifying an inappropriate data type when defining the tables can result in wasted disk space, improper expression evaluation, and performance problems. Table 4-5 provides a brief checklist for data type selection.
Remember that you need to create page sizes that are large enough to contain the length of a row in a table. This is particularly important for tables with large character columns. When using character data types, the choice between CHAR and VARCHAR is determined by the range of lengths of the columns. For example, if the range of column length is relatively small, use a fixed CHAR with the maximum length. This will reduce the storage requirements and could improve performance.
Unicode Support in DB2
Unicode support is another very important enhancement. Support for Unicode will help with support across multinational boundaries. The Unicode encoding scheme allows for the representation of code points and characters of various geographies and languages. Unicode is a fixed-length character-encoding scheme that includes characters from most of the world's languages. Unicode characters are usually shown as U+xxxx, where xxxx is the hexadecimal code of the character. Each character is 16 bits wide, allowing for support of more than 65,000 characters. The normal support provided is the UCS-2/UTF-8 standard. With UCS-2, or Unicode, encoding, ASCII and control characters are also 2 bytes long, and the lead byte is 0. Because these extraneous null bytes may appear anywhere in a string, they can be a problem for programs that expect ASCII.

UTF-8 is a transformation algorithm used to avoid that problem for programs that rely on ASCII code. UTF-8 transforms fixed-length (2-byte) UCS characters into variable-length byte strings. ASCII characters are represented by single-byte codes, but non-ASCII characters are 2 to 4 bytes long. In other words, UTF-8 transforms UCS-2 characters to a multibyte code set.
The UCS-2 code page is being registered as code page 1200. When new characters are added to a code page, the code page number does not change. Code page 1200 always refers to the current version of Unicode/UCS-2 and has been used for UCS-2 support in DB2 UDB on many of the other platforms. UTF-8 has been registered as CCSID (Coded Character Set Identifier) 1208 (code page 1208) and is used as the multibyte code page number for DB2's UCS-2/UTF-8 support.
DB2 UDB supports UCS-2 as a new multibyte code page. CHAR, VARCHAR, LONG VARCHAR, and CLOB data is stored in UTF-8; GRAPHIC, VARGRAPHIC, LONG VARGRAPHIC, and DBCLOB, in UCS-2.
As a default, databases are created in the code page of the application creating them. Alternatively, UTF-8 can be specified as the CODESET name with any valid two-letter TERRITORY code:
CREATE DATABASE dbname USING CODESET UTF-8 TERRITORY US
A UCS-2 database allows connection from every single-byte and multibyte code page supported. Code page character conversions between a client's code page and UTF-8 are automatically performed by DB2. Data in graphic string types is always in UCS-2 and does not go through code page conversions. Although some client workstations have a limited subset of UCS-2 characters, the database allows the entire repertoire of UCS-2 characters.
All supported data types are also supported in a UCS-2 database. In a UCS-2 database, all identifiers are in multibyte UTF-8. Therefore, it is possible to use any UCS-2 character in identifiers in which DB2 allows the use of a character in the extended character set. This feature will also allow UCS-2 literals to be specified either in GRAPHIC string constant format, using the G'...' or N'....' format, or as a UCS-2 hexadecimal string, using the UX'....' or GX'....' format.
Tables consist of columns and rows that store an unordered set of data records. Tables can have constraints to guarantee the uniqueness of data records, maintain the relationship between and within tables, and so on.
A constraint is a rule that the database manager enforces. Constraints are of three types:
A unique constraint is the rule that the values of a key are valid only if they are unique within the table. Each column making up the key in a unique constraint must be defined as NOT NULL. Unique constraints are defined in the CREATE TABLE statement or the ALTER TABLE statement, using the PRIMARY KEY clause or the UNIQUE clause. Once defined, the table is marked as unavailable until you explicitly create an index for each key specified as UNIQUE or PRIMARY KEY, unless the statement is processed by the schema processor, in which case DB2 implicitly creates all necessary indexes.
A table can have any number of unique constraints; however, a table cannot have more than one unique constraint on the same set of columns. The enforcement of the constraint is through the unique index. Once a unique constraint has been established on a column, the check for uniqueness during multiple-row updates is deferred until the end of the update (a deferred unique constraint). A unique constraint can also be used as the parent key in a referential constraint.
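A sketch of defining a unique constraint and the indexes that enforce it (the constraint and index names are illustrative):

```sql
CREATE TABLE DEPARTMENT
  (DEPTNUMB SMALLINT NOT NULL,
   DEPTNAME VARCHAR(20) NOT NULL,
   PRIMARY KEY (DEPTNUMB),
   CONSTRAINT UNIQ_NAME UNIQUE (DEPTNAME));

-- Outside the schema processor, each unique key needs an explicit
-- unique index before the table becomes available:
CREATE UNIQUE INDEX XDEPT1 ON DEPARTMENT (DEPTNUMB);
CREATE UNIQUE INDEX XDEPT2 ON DEPARTMENT (DEPTNAME);
```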
Referential constraint, or integrity, allows you to define required relationships between and within tables. The database manager maintains these relationships, which are expressed as referential constraints, and requires that all values of a given attribute or table column also exist in another table column. Figure 4-2 shows an example of the referential integrity (RI) between two tables. This constraint requires that every employee in the EMPLOYEE table must be in a department that exists in the DEPARTMENT table. No employee can be in a department that does not exist.
Figure 4-2. Referential integrity between two tables
A parent key is a primary key or unique key of a referential constraint. A parent table is a table containing a parent key that is related to at least one foreign key in the same or another table. A table can be a parent in an arbitrary number of relationships. In this example, the DEPARTMENT table, which has a primary key of DEPTNO, is a parent of the EMPLOYEE table, which contains the foreign key WORKDEPT.
A dependent table contains one or more foreign keys. A dependent table can also be a parent table. A table can be a dependent in an arbitrary number of relationships. For example, the EMPLOYEE table contains the foreign key WORKDEPT, which is dependent on the DEPARTMENT table that has a primary key.
A referential constraint is an assertion that non-null values of a designated foreign key are valid only if they also appear as values of a unique key of a designated parent table. The purpose of referential constraints is to guarantee that database relationships are maintained and that data entry rules are followed.
Enforcement of referential constraints has special implications for some SQL operations that depend on whether the table is a parent or a dependent. The database manager enforces referential constraints across systems based on the referential integrity rules: the INSERT, DELETE, and UPDATE rules. However, only the DELETE rules are explicitly defined.
The INSERT rule is implicit when a foreign key is specified. You can insert a row at any time into a parent table without any action being taken in the dependent table. You cannot insert a row into a dependent table unless a row in the parent table has a parent key value equal to the foreign key value of the row that is being inserted, unless the foreign key value is null. If an INSERT operation fails for one row during an attempt to insert more than one row, all rows inserted by the statement are removed from the database.
When you delete a row from a parent table, the database manager checks whether any dependent rows in the dependent table match foreign key values. If any dependent rows are found, several actions can be taken. You determine which action will be taken by specifying a DELETE rule when you create the dependent table.
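The DELETE rule is specified on the foreign key of the dependent table. A sketch based on the tables in Figure 4-2 (RESTRICT shown; the alternative rules appear as comments):

```sql
CREATE TABLE EMPLOYEE
  (ID       SMALLINT NOT NULL PRIMARY KEY,
   NAME     VARCHAR(9),
   WORKDEPT SMALLINT,
   FOREIGN KEY (WORKDEPT)
      REFERENCES DEPARTMENT (DEPTNO)
      ON DELETE RESTRICT);  -- reject the delete if dependents exist
-- Alternatives for the DELETE rule:
--   ON DELETE CASCADE   deletes the dependent rows as well
--   ON DELETE SET NULL  sets WORKDEPT to null in dependent rows
```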
The database manager prevents the update of a unique key of a parent row. When you update a foreign key in a dependent table and the foreign key is defined with the NOT NULL option, it must match a value of the parent key of the parent table.
Table-check constraints enforce data integrity at the table level. Once a table-check constraint has been defined for a table, every UPDATE and INSERT statement will involve checking the restriction, or constraint. If the constraint is violated, the data record will not be inserted or updated, and an SQL error will be returned.
A table-check constraint can be defined at table creation time or later, using the ALTER TABLE statement. The table-check constraints can help implement specific rules for the data values contained in the table by specifying the values allowed in one or more columns in every row of a table. This can save time for the application developer, as the validation of each data value can be performed by the database and not by each of the applications accessing the database.
When a check constraint is created or added, DB2 performs a syntax check on it. A check constraint cannot contain host variables or special registers. The check constraint's definition is stored in the system catalog tables: specifically, the SYSIBM.SYSCHECKS and SYSIBM.SYSCHECKDEP tables.
You can define a check constraint on a table by using the ADD CHECK clause of the ALTER TABLE statement. If the table is empty, the check constraint is added to the description of the table. If the table is not empty, what happens when you define the check constraint depends on the value of the CURRENT RULES special register, which can be either STD or DB2. If the value is STD, the check constraint is enforced immediately when it is defined. If a row does not conform, the table-check constraint is not added to the table, and an error occurs. If the value is DB2, the check constraint is added to the table description, but its enforcement is deferred. Because some rows in the table might violate the check constraint, the table is placed in check-pending (CHKP) status. The best way to remove the CHKP status is to run the CHECK DATA utility. For more information on the CHKP status and the CHECK DATA utility, refer to Chapter 7.
The ALTER TABLE statement that is used to define a check constraint will fail if the table space or partition that contains the table is in a check-pending status, the CURRENT RULES special register value is STD, and the table is not empty. To remove a check constraint from a table, use the DROP CONSTRAINT or DROP CHECK clause of the ALTER TABLE statement. You must not use DROP CONSTRAINT on the same ALTER TABLE statement as DROP FOREIGN KEY or DROP CHECK.
The following example adds a check constraint to a table that checks to be sure that JOB is valid:
ALTER TABLE EMPLOYEE
   ADD CONSTRAINT check_job
   CHECK (JOB IN ('Engineer','Sales','Manager'));
It is a good idea to appropriately label every constraint (trigger, table check, or referential integrity). This is particularly important for diagnosing errors that might occur.
As check constraints are used to implement business rules, you may need to change them from time to time. This could happen when the business rules change in your organization. No special command is used to change a check constraint. Whenever a check constraint needs to be changed, you must drop it and create a new one. Check constraints can be dropped at any time, and this action will not affect your table or the data within it.
When you drop a check constraint, you must be aware that data validation performed by the constraint will no longer be in effect. The statement used to drop a constraint is the ALTER TABLE statement. The following example shows how to modify the existing constraint. After dropping the constraint, you have to create it with the new definition:
ALTER TABLE EMPLOYEE DROP CONSTRAINT check_job;
ALTER TABLE EMPLOYEE
   ADD CONSTRAINT check_job
   CHECK (JOB IN ('OPERATOR','CLERK'));
The CREATE TABLE statement allows you to define a new table. The definition must include its name and the attributes of its columns. The definition may include other attributes of the table, such as its primary key or check constraints. The RENAME TABLE statement can change the name of an existing table.
Once the table is defined, column names and data types cannot be modified. However, new columns can be added to the table. Be careful when adding new columns, as default data values will be used for existing records.
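A sketch of adding a column; supplying an explicit default avoids surprises for existing rows (the column name is illustrative):

```sql
-- Existing rows will report 'N' for the new column even though
-- no data was physically added to them.
ALTER TABLE EMPLOYEE
   ADD REMOTE_IND CHAR(1) NOT NULL WITH DEFAULT 'N';
```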
A table can have a maximum of 750 columns; this maximum does not vary with the data page size on z/OS. DB2 supports 4KB, 8KB, 16KB, and 32KB data page sizes. Table 4-6 shows the maximum number of columns in a table and the maximum length of a row by page size.
If an EDITPROC was defined on a table, the maximum row sizes shown in Table 4-6 will decrease by 10 bytes in each case.
If the table is a dependent table in a referentially intact structure, the maximum number of columns is 749.
Tables are always created within a table space. Users can specify the table space name in which the table will be created, or DB2 will create one implicitly. In the following example, DB2 will implicitly create a table space in the HUMANDB database, because no table space name was provided. The name of the table space will be derived from the table:
CREATE TABLE DEPARTMENT
  (DEPTNUMB SMALLINT NOT NULL,
   DEPTNAME VARCHAR(20),
   MGRNO    SMALLINT,
   PRIMARY KEY (DEPTNUMB))
   IN DATABASE HUMANDB;
Indexes will have their own index spaces created when they are created.
After a table is created, user data can be placed into the table by using one of these methods:
The LOG NO option is available only for LOB table spaces.
If you need a temporary table that is used only for the duration of a program, use the DECLARE GLOBAL TEMPORARY TABLE statement instead. This will result in no catalog contention, minimal logging, and no lock contention.
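A sketch of a declared temporary table (the column list is illustrative):

```sql
-- Lives only for the duration of the application process;
-- no catalog rows are created, and logging is minimal.
DECLARE GLOBAL TEMPORARY TABLE SESSION_TOTALS
  (DEPT  SMALLINT,
   TOTAL DECIMAL(9,2))
   ON COMMIT PRESERVE ROWS;
```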
Following are sample CREATE TABLE statements. This example creates two tables. The definition includes unique constraints, check constraints, and referential integrity. In this example, the following conditions apply.
CREATE TABLE DEPARTMENT
  (DEPTNUMB SMALLINT NOT NULL,
   DEPTNAME VARCHAR(20),
   MGRNO    SMALLINT,
   PRIMARY KEY (DEPTNUMB))
   IN HUMANDB.HUMANTS;

CREATE TABLE EMPLOYEE
  (ID       SMALLINT NOT NULL,
   NAME     VARCHAR(9) NOT NULL,
   DEPT     SMALLINT,
   JOB      CHAR(5) CHECK (JOB IN ('SALES','MGR','CLERK')),
   HIREDATE DATE WITH DEFAULT CURRENT DATE,
   SALARY   DECIMAL(7,2),
   COMM     DECIMAL(7,2),
   CONSTRAINT UNIQUEID PRIMARY KEY (ID),
   FOREIGN KEY (DEPT)
      REFERENCES DEPARTMENT (DEPTNUMB)
      ON DELETE RESTRICT)
   IN HUMANDB.HUMANTS;
LOB data is not stored in the table in which it is defined. The defined LOB column holds information about the LOB, whereas the LOB itself is stored in another location. The normal place for this data storage is a LOB table space defining the physical storage that will hold an auxiliary table related to the base column and table.
Because the LOB is stored in a separate table, one performance consideration arises: if you have a large variable-character column that is infrequently accessed, you may be able to convert it to a LOB so that it is kept separately. This could speed up table space scans on the remaining data, because fewer pages would be accessed.
Null is the only supported default value for a LOB column; if the value is null, it will not take up space in the LOB table space. The following examples show how to create a base table with a LOB and an auxiliary table to support it:
CREATE TABLE DB2USER1.CANDIDATE
  (CID      CANDIDATE_ID NOT NULL,
   ...
   CERT_APP CHAR(1) NOT NULL WITH DEFAULT,
   PHOTO    BITMAP,
   PRIMARY KEY (CID))
   IN DB2CERT.CERTTS;

CREATE AUX TABLE CAND_PHOTO
   IN DB2CERT.CERTPIC
   STORES DB2USER1.CANDIDATE
   COLUMN PHOTO;
Copying a Table Definition
It is possible to create a table using the same characteristics of another table or a view. This is done through the CREATE TABLE LIKE statement. The name specified after LIKE must identify a table or a view that exists at the current server, and the privilege set must implicitly or explicitly include the SELECT privilege on the identified table or view. An identified table must not be an auxiliary table. An identified view must not include a column that is considered to be a ROWID column or an identity column.
The use of LIKE is an implicit definition of n columns, where n is the number of columns in the identified table or view. The implicit definition includes all attributes of the n columns as they are described in SYSCOLUMNS, with a few exceptions such as identity attributes, unless the INCLUDING IDENTITY clause is used.
The implicit definition does not include any other attributes of the identified table or view. For example, the new table will not have a primary key, foreign key, or check constraint. The table is created in the table space implicitly or explicitly specified by the IN clause, and the table has any other optional clause only if the optional clause is specified.
Following is an example of the CREATE LIKE statement:
CREATE TABLE NEW_DEPT LIKE DEPARTMENT IN DATABASE HUMANDB
Modifying a Table
After you create a table, you can use the ALTER TABLE statement to modify existing tables. The ALTER TABLE statement modifies existing tables by
If a primary key is dropped, the dependents will no longer have foreign keys.
The following example shows how to add a check constraint to the DEPARTMENT table:
ALTER TABLE DEPARTMENT ADD CHECK (DEPTNUMB > 10)
The next example shows how to change the data type of a column. We assume in this example that the data type was originally CHAR, with a length of less than 30.
ALTER TABLE DEPARTMENT ALTER COLUMN NAME CHAR(35)
With a change such as increasing the size of a column's data type, we must not forget to relay this information to the application programmers, as it may affect code (for example, host variable definitions).
Removing a Table
When you want to remove a table, issue a DROP TABLE statement:
DROP TABLE EMPLOYEE
Any objects that are directly or indirectly dependent on this table are deleted or made inoperative: for example, indexes, triggers, and views. Whenever a table is deleted, its description is deleted from the catalog, and any packages that reference the object are invalidated.
A sequence object is a user-defined object that generates a sequence of numeric values according to the specifications with which it was created. Sequence objects provide an incremental counter generated by DB2 and are similar to identity columns. An identity column can be thought of as a special kind of sequence object; however, the sequence object is separate from any table. Sequence objects support many of the same attributes as identity columns, such as INCREMENT BY, CACHE, CYCLE, RESTART WITH, MINVALUE, and MAXVALUE.
Sequence object values can be used by applications for a variety of reasons and have several benefits:
Creating Sequence Objects
The sequence name is made up of two parts: the 128-byte schema name and the 128-byte identifier. Sequence objects are created by using the CREATE SEQUENCE statement, and all attributes can be user defined or left to their defaults. The values in the sequence object can be of any exact numeric data type. The starting value is defined with START WITH, and the sequence advances with INCREMENT BY, ascending or descending. The values can be cached and are generated in order of request.
The following example shows the creation and simple use of a sequence object:
CREATE SEQUENCE ACCOUNT_SEQ AS INTEGER START WITH 1 INCREMENT BY 10 CYCLE CACHE 20
Using Sequence Objects
Some additional advantages of using sequence objects over other methods of number generation include the use of the NEXT VALUE FOR and PREVIOUS VALUE FOR expressions. The short terms NEXTVAL and PREVVAL can be used as synonyms for NEXT VALUE and PREVIOUS VALUE, respectively. NEXTVAL FOR generates and returns the next value for the sequence object. PREVVAL FOR generates and returns the previous value for the sequence object. These statements can be used with the following:
The following examples show use of these statements; assume that ACCT_SEQ is START WITH 10 INCREMENT BY 10.
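As a sketch of how these expressions might be used (the ACCOUNTS table and its columns are hypothetical; assuming ACCT_SEQ is START WITH 10 INCREMENT BY 10):

```sql
-- Hypothetical table; the first reference generates the starting value, 10
INSERT INTO ACCOUNTS (ACCT_NO, NAME)
   VALUES (NEXT VALUE FOR ACCT_SEQ, 'SMITH');

-- PREVIOUS VALUE returns the most recent value generated
-- for this sequence in the current application process
SELECT PREVIOUS VALUE FOR ACCT_SEQ
   FROM SYSIBM.SYSDUMMY1;

-- The next reference advances the sequence by the increment, to 20
INSERT INTO ACCOUNTS (ACCT_NO, NAME)
   VALUES (NEXT VALUE FOR ACCT_SEQ, 'JONES');
```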
As you can see, using sequence objects instead of identity columns has many benefits. Table 4-7 provides a short comparison of the two.
Both identity columns and sequence objects have their place. Given that they both accomplish the same objective of generating sequence numbers, it is up to you to choose which one would work best for you. That determination will depend on the flexibility you need with the generated numbers and how the applications will be using them.
Modifying Sequence Objects
Sequence objects can be altered to change
You cannot alter the data type or length of the values generated by a sequence object. In order to do this, you would need to drop and recreate the sequence object.
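As an illustration, the attributes of the ACCOUNT_SEQ object created earlier might be altered like this (the values shown are hypothetical):

```sql
-- Restart the counter and change its caching behavior;
-- the data type (INTEGER) cannot be changed by ALTER
ALTER SEQUENCE ACCOUNT_SEQ
   RESTART WITH 1000
   INCREMENT BY 5
   CACHE 50;
```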
Removing Sequence Objects
Sequence objects can be removed by using the DROP statement:
DROP SEQUENCE <sequence-name>
Data is stored in table spaces, which consist of one or more VSAM data sets.
Types of Table Spaces
Table spaces may be simple, segmented, partitioned, or LOB.
Simple Table Space
Simple table spaces are the default but normally not the optimal choice. More than one table can reside in a simple table space. If several tables are in the table space, rows from different tables can be interleaved on the same page; therefore, when a page is locked, rows of other tables may be locked as well.
Segmented Table Space
Normally, if a table is not partitioned, a segmented table space is used. A segmented table space organizes its pages into segments, with each segment containing the rows of only one table. Segments can be 4 to 64 pages each, with each segment having the same number of pages. When using a segmented table space to hold more than one table, make sure that the tables have similar characteristics in all categories, including size, volatility, locking needs, compression, and backup/recovery strategies. Some guidelines for how many tables to have in a segmented table space, based on the number of pages in the table space, are listed in Table 4-8.
Using segmented table spaces has several advantages. Because the pages in a segment contain rows from only one table, locking interference with other tables will not occur. In simple table spaces, rows are intermixed on pages: If one table page is locked, it can inadvertently lock a row of another table just because it is on the same page. When you have only one table per table space, this is not an issue. Other benefits of having a segmented table space for one table follow.
SEGSIZE tells DB2 how large to make each segment for a segmented table space and determines how many pages are contained in a segment. SEGSIZE will vary, depending on the size of the table space. Recommendations are listed in Table 4-9.
A segmented table space data set is limited to 2GB.
The following example shows how to create a segmented table space with a segment size of 32:
CREATE TABLESPACE CERTTS IN DB2CERT USING STOGROUP CERTSTG PRIQTY 52 SECQTY 20 ERASE NO LOCKSIZE PAGE BUFFERPOOL BP6 CLOSE YES SEGSIZE 32;
ALTER cannot be used on the SEGSIZE parameter.
Partitioned Table Space
Partitioning a table space has several advantages. For large tables, partitioning is the only way to store large amounts of data. Partitioning also has advantages for tables that are not large. DB2 allows defining up to 4,096 partitions of up to 64GB each. (However, total table size is limited to 128TB, and the number of partitions depends on the DSSIZE specified.) Nonpartitioned table spaces are limited to 64GB of data.
You can take advantage of the ability to execute utilities on separate partitions in parallel. This also gives you the ability to access data in certain partitions while utilities are executing on others. In a data sharing environment, you can spread partitions among several members to split workloads. You can also spread your data over multiple volumes and need not use the same storage group for each data set belonging to the table space. This also allows you to place frequently accessed partitions on faster devices. The following example shows how to create a partitioned table space with two partitions:
CREATE TABLESPACE CERTTSPT IN DB2CERT USING STOGROUP CERTSTG PRIQTY 100 SECQTY 120 ERASE NO NUMPARTS 2 (PART 1 COMPRESS YES, PART 2 FREEPAGE 20) LOCKSIZE PAGE CLOSE NO;
It is possible to use the ALTER statement to change the partitioning-key ranges for rebalancing. A REBALANCE option on the REORG utility allows for this as well.
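As a sketch, a limit key might be changed like this (the table name and values are illustrative; verify the exact clause for your DB2 version, and note that the affected partitions are typically left in a REORG-pending state until reorganized):

```sql
-- Move the boundary of partition 2 to a new limit value
ALTER TABLE ACCOUNTS
   ALTER PARTITION 2 ENDING AT ('2002-04-30');
```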
Many designs are outgrowing the bounds of V7 partitioned table spaces and need to add partitions. In the past, this process could be rather cumbersome. In version 8, this can be done by using the ALTER statement:
ALTER TABLE table1 ADD PARTITION
DB2 picks the partition number, based on the next available partition. If the partitions are storage group (stogroup) defined, the data sets for the table space and partitioning index are allocated with a PRIQTY value picked up from the previous partition. You may need to alter the table space afterward to give the new partition more suitable parameters, such as primary allocations and free space. If the underlying data sets are VCAT defined, the data set will need to be defined first, before the ALTER is performed. The new partition will be available immediately after the ALTER; however, the table space will have to be stopped and then started again, so a small outage will occur in order to take advantage of it.
All affected plans, packages, and cached SQL statements are invalidated because SQL may be optimized to read certain partitions, and the new number of partitions may change the access path. If the table space is defined as LARGE, the limit is 64 partitions. If using DSSIZE, the number of the partitions is limited, based on the chosen DSSIZE and the page size. See Figure 4-12 later in this chapter.
Figure 4-12. Tables for the DB2CERT database
Dealing with the archiving of data or rolling of partitions often involves a large outage and a lot of previous planning. Even though tables can support up to 4,096 partitions, some limitations exist; supporting that much historical data in the primary table space may not be desirable.
Partitions can be rotated in order to move the first partition to last. Doing so moves the lowest logical partition to the last logical position, with new partition values specified to enforce the limits of the new last partition. The RESET option allows a partition to be reset, at which time its data is deleted, which can impact the performance of the feature and the availability of the data. Before the rotation, the partition holds the oldest data; afterward, it will hold the newest data. The REUSE option allows extents to be kept, so the data set is not physically deleted. Following is an example of how to rotate partitions:
ALTER TABLE table1 ROTATE PARTITION FIRST TO LAST ENDING AT ('2004-03-31')
As mentioned, the data is deleted from old partitions. Depending on the size of the partition, the logging impacts of the deletes may become a performance issue. You may want to consider unloading the partition prior to ALTER by using the old tried-and-true method of LOAD…PART x REPLACE, using a dummy SYSREC. This will help lessen the duration of the outage during the rotating of the partitions.
During the rotate, a DBD lock will be taken as the DDL completes. All activity will be quiesced immediately when the ALTER is issued.
Another issue is the fact that all keys for deleted rows must be deleted from nonpartitioning indexes (NPIs). If multiple NPIs are on a table, the scans must be performed serially to perform the deletes.
If a physical DB2-enforced referential integrity relationship or triggers exist, deletes will be a row at a time. Again, in this situation, you may want to consider an unload of the data first. But if a DELETE RESTRICT relationship is in place, the rotate may not work.
The ALTER TABLE ROTATE can change partition order. To account for this, the LOGICAL_PART column has been added to the SYSTABLEPART and SYSCOPY tables. This change will also be visible in the DISPLAY DATABASE output.
Figure 4-3 shows the ALTER statement on the three-partition CUSTOMER table. In this example, we are rotating the first physical partition (P1) to become the last logical partition (P3).
Figure 4-3. Rotating partitions
In DB2 releases prior to version 8, a partitioned table required a partitioning index that gave DB2 the limit keys for each partition. This partitioning index was also required to be the clustering index. Version 8 does not require a partitioning index to be defined on the partitioned table: the ability to partition is handled by the table creation, not the index. The following example uses the CREATE TABLE statement to handle the partitioning of a table:
CREATE TABLE ACCOUNTS (ENTER_DATE DATE, ACCOUNT INTEGER, STATUS CHAR(3)) IN TSP1 PARTITION BY (ENTER_DATE) (PARTITION 1 ENDING AT ('2002-02-28'), PARTITION 2 ENDING AT ('2002-03-31'));
The PART VALUES clause could also be used instead of the PARTITION ENDING AT clause; however, PARTITION ENDING AT is preferred.
Once a partitioned table has been created, it is ready to be used. Creating a separate partitioning index with the VALUES keyword is not required and not allowed. The information about the limit keys for the partitions will be stored in SYSTABLES, SYSCOLUMNS, and SYSTABLEPART.
Using table-controlled partitioning instead of index-controlled partitioning is recommended. Table-controlled partitioning is a replacement for index-controlled partitioning. Table 4-10 lists the differences between the two partitioning methods.
The change from index-controlled partitioning to table-controlled partitioning can occur when any of the following are used:
LOB Table Space
A LOB table space needs to be created for each LOB column, or each LOB column of each partition, of a base table. This table space will contain the auxiliary table. The LOB table space has a structure that can be up to 4,000TB in size. This is a storage model used only for LOB table spaces. This linear table space can comprise up to 254 data sets, which can be up to 64GB each. The LOB table space is implemented in the same fashion as pieces are implemented for NPIs, as discussed later in this chapter. We can have 254 partitions of 64GB each, for a total of 16TB per LOB column, and with one LOB table space per LOB column for up to 254 partitions, a possible 4,000TB. See Figure 4-4.
Figure 4-4. Physical LOB storage: table with a LOB column up to 254 partitions
Partitioned base tables can each have different LOB table spaces. A LOB value can be longer than a page in a LOB table space and can also span pages. Following is an example of creating a LOB table space:
CREATE LOB TABLESPACE CERTPIC IN DB2CERT USING STOGROUP CERTSTG PRIQTY 3200 SECQTY 1600 LOCKSIZE LOB BUFFERPOOL BP16K1 GBPCACHE SYSTEM LOG NO CLOSE NO;
If a database is not explicitly defined for a LOB table space, it defaults to DSNDB04.
Creating Table Spaces
The CREATE TABLESPACE statement is used to define a simple, segmented, or partitioned table space on the current server. A large number of parameters with significant options are used in defining table spaces, depending on the type of table space. The major table space parameters follow.
Four page sizes, listed in Table 4-9, are available for use.
The 8KB, 16KB, and 32KB pages are composed of 4KB control intervals (CIs), unless the DSVCI DSNZPARM is set to YES. Index pages are only 4KB, and work file (DSNDB07) table space pages are only 4KB or 32KB. The page size is determined by the buffer pool chosen for the table space; that is, BP8K0 supports 8KB pages.
Better hit ratios can be achieved with the larger page sizes, and they incur less I/O because more rows can fit on a page. For instance, with a 2,200-byte row, perhaps in a data warehouse, a 4KB page would be able to hold only one row, but an 8KB page could hold three rows, one more than two 4KB pages would hold, with one less lock required.
Depending on the DSSIZE and the page size, the number of partitions a table space can have has limitations. Table 4-12 shows these limits.
The FREEPAGE and PCTFREE clauses are used to help improve the performance of updates and inserts by allowing free space to exist on table spaces or index spaces. Performance improvements include improved access to the data through better clustering of data, less index page splitting, faster inserts, fewer row overflows, and a reduction in the number of REORGs required. Some tradeoffs include an increase in the number of pages, and therefore more auxiliary storage needed; fewer rows per I/O and less efficient use of buffer pools; and more pages to scan.
As a result, it is important to achieve a good balance for each individual table space and index space when deciding on free space, and that balance will depend on the processing requirements of each table space or index space. When inserts and updates are performed, DB2 will use the free space defined, and by doing so can keep records in clustering sequence as much as possible. When the free space is used up, the records must be located elsewhere, and this is when performance can begin to suffer. Read-only tables do not require any free space, and tables with a pure insert-at-end strategy generally don't require free space. Exceptions to this are tables with VARCHAR columns and tables using compression and that are subject to updates.
The FREEPAGE amount represents the number of full pages inserted between each empty page during a LOAD or REORG of a table space or an index space. The tradeoff is between how often reorganization can be performed and how much disk space can be allocated for an object. FREEPAGE should be used for table spaces so that inserts can be kept as close to the optimal page as possible. For indexes, FREEPAGE should be used for the same reason, except improvements would be in terms of keeping index page splits near the original page instead of placing them at the end of the index. FREEPAGE is useful when inserts are sequentially clustered.
PCTFREE is the percentage of a page left free during a LOAD or a REORG. PCTFREE is useful when you can assume an even distribution of inserts across the key ranges. It is also needed in indexes to avoid all random inserts causing page splits.
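As an illustration, free space might be specified on a table space like this (the name and quantities are hypothetical):

```sql
CREATE TABLESPACE CERTFREE IN DB2CERT
   USING STOGROUP CERTSTG
   FREEPAGE 63     -- one empty page after every 63 full pages at LOAD/REORG
   PCTFREE 10      -- leave 10 percent of each page free at LOAD/REORG
   SEGSIZE 32
   LOCKSIZE PAGE;
```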
Using the COMPRESS clause of the CREATE TABLESPACE and ALTER TABLESPACE SQL statements allows for the compression of data in a table space or in a partition of a partitioned table space.
Indexes and LOB table spaces are not compressed.
In many cases, using the COMPRESS clause can significantly reduce the amount of Direct Access Storage Device (DASD) space needed to store data, but the compression ratio achieved depends on the characteristics of the data. Compression allows us to get more rows on a page and therefore see many of the following performance benefits, depending on the SQL workload and the amount of compression:
Using compression also has some considerations for processing cost.
The following example shows a table space created with compression:
CREATE TABLESPACE CERTTSPT IN DB2CERT USING STOGROUP CERTSTG PRIQTY 100 SECQTY 120 ERASE NO COMPRESS YES ERASE NO LOCKSIZE PAGE
Modifying Table Spaces
After you create a table space, the ALTER TABLESPACE statement enables you to modify existing table spaces. The ALTER TABLESPACE statement modifies many of the table space parameters, such as
If ALTER is used on the USING VCAT, the table space must be stopped first.
In the following example, ALTER TABLESPACE is used to change the buffer pool assignment and size of the locks used:
ALTER TABLESPACE DB2CERT.CERTTS BUFFERPOOL BP4 LOCKSIZE ROW
Removing Table Spaces
When you want to remove a table space, use the DROP TABLESPACE statement to delete the object. This will remove any objects that are directly or indirectly dependent on the table space. This statement will also invalidate any packages or plans that refer to the object and will remove its descriptions and all related data from the catalog.
DROP TABLESPACE DB2CERT.CERTTS
Views are logical tables created using the CREATE VIEW statement. Once a view is defined, it can be accessed using DML statements, such as SELECT, INSERT, UPDATE, and DELETE, as if it were a base table. A view is not physically stored; the data in the view is materialized only during query processing.
With a view, you can make a subset of table data available to an application program and validate data that is to be inserted or updated. A view can have column names that are different from those of corresponding columns in the original tables. The use of views provides flexibility in the way the application programs and end user queries look at the table data.
A sample CREATE VIEW statement is shown in the following example. The original table, EMPLOYEE, has columns named SALARY and COMM. For security reasons, this view is created from the ID, NAME, DEPT, JOB, and HIREDATE columns. In addition, we are restricting access on the column DEPT. This definition will show the information only of employees who belong to the department whose DEPTNO is 10.
CREATE VIEW EMP_VIEW1 (EMPID,EMPNAME,DEPTNO,JOBTITLE,HIREDATE) AS SELECT ID,NAME,DEPT,JOB,HIREDATE FROM EMPLOYEE WHERE DEPT=10;
After the view has been created, the access privileges can be specified. This provides data security, as a restricted view of the base table is accessible. As we see in this example, a view can contain a WHERE clause to restrict access to certain rows or can contain a subset of the columns to restrict access to certain columns of data.
The column names in the view do not have to match the column names of the base table. The table name has an associated schema, as does the view name. Once the view has been defined, it can be used in such DML statements as SELECT, INSERT, UPDATE, and DELETE, with restrictions. The database administrator can decide to provide a group of users with a higher-level privilege on the view than the base table.
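For instance, a user might be granted access to the view without holding any privilege on the base table (the authorization ID is illustrative):

```sql
-- USER1 can query the restricted view but not the EMPLOYEE base table
GRANT SELECT ON EMP_VIEW1 TO USER1;
```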
A view is an alternative way to look at data in one or more tables. A view is an SQL SELECT statement that is executed whenever the view is referenced in an SQL statement. Because it is not materialized until execution, such operations as ORDER BY, the WITH clause, and the OPTIMIZE FOR clause have no meaning.
WITH CHECK OPTION
If the view definition includes conditions, such as a WHERE clause, and the intent is to ensure that any INSERT or UPDATE statement referencing the view will have the WHERE clause applied, the view must be defined using WITH CHECK OPTION. This option can ensure the integrity of the data being modified in the database. An SQL error will be returned if the condition is violated during an INSERT or UPDATE operation.
The following example is of a view definition using the WITH CHECK OPTION, which is required to ensure that the condition is always checked. You want to ensure that the DEPT is always 10. This will restrict the input values for the DEPT column. When a view is used to insert a new value, the WITH CHECK OPTION is always enforced.
CREATE VIEW EMP_VIEW2 (EMPID,EMPNAME,DEPTNO,JOBTITLE,HIREDATE) AS SELECT ID,NAME,DEPT,JOB,HIREDATE FROM EMPLOYEE WHERE DEPT=10 WITH CHECK OPTION;
If the view in this example is used in an INSERT statement, the row will be rejected if the DEPTNO column is not the value 10. It is important to remember that no data validation occurs during data modification if the WITH CHECK OPTION is not specified. If the view in the previous example were used in a SELECT statement, the condition in the WHERE clause would be applied, and the resulting table would contain only the matching rows of data. In other words, the WITH CHECK OPTION does not affect the result of a SELECT statement. The WITH CHECK OPTION must not be specified for read-only views.
A read-only view allows no inserts, updates, or deletes. A view is considered read-only if
Modifying and Removing a View
In order to modify a view, you simply drop and recreate it. You cannot alter a view's attributes, but you can regenerate it. To remove a view, you simply use the following:
DROP VIEW EMP_VIEW2
Materialized Query Tables (MQTs)
Prior to V8, decision-support queries were difficult and expensive. They typically operated over a large amount of data that might have to scan or process terabytes of data and possibly perform multiple joins and complex aggregations. With these types of queries, traditional optimization was failing, and performance was less than optimal.
The use of materialized query tables, which used to be known as automatic summary tables on non-mainframe platforms, allows you to precompute whole or parts of each query and then use computed results to answer future queries. MQTs provide the means to save the results of prior queries and then reuse the common query results in subsequent queries. This helps avoid redundant scanning, aggregating, and joins. MQTs are useful for data warehouse-type applications.
MQTs do not completely eliminate optimization problems; rather, they move optimization issues to other areas. Some challenges include finding the best MQT for an expected workload, maintaining MQTs when underlying tables are updated, recognizing the usefulness of an MQT for a query, and deciding whether to use an MQT for a query. Most of these types of problems are addressed by OLAP tools, but MQTs are the first step.
MQTs work with two tables: a source table and a materialized query table. The source table is the base table, view, table expression, or table function. The materialized query table is the table used to contain materialized data derived from one or more source tables in a FULLSELECT and is similar to creating a view. However, a view is logical, whereas an MQT contains materialized data of the query result. You could refer to an MQT as a "materialized view."
MQTs are created with the CREATE TABLE statement, and the columns of the MQT can be explicitly specified or derived from the fullselect associated with the table. MQTs are physically stored as are declared temporary tables. The following is an example of the syntax used to create an MQT:
CREATE TABLE SALESMQT (CUSTID, STOREID, LOCID, MTH, CNT) AS ( SELECT CUSTID, STOREID, LOCID, MTH, COUNT(*) FROM SALES GROUP BY CUSTID, STOREID, LOCID, MTH) DATA INITIALLY DEFERRED REFRESH DEFERRED MAINTAINED BY SYSTEM ENABLE QUERY OPTIMIZATION
This example creates an MQT called SALESMQT.
The DATA INITIALLY DEFERRED option states that when a materialized query table is created, it will not be populated immediately by the fullselect defined in the DDL. The REFRESH DEFERRED option says that the data in the MQT is not refreshed immediately when its base tables are updated. The MQT can be refreshed at any time by using the REFRESH TABLE statement, which can also be used for the initial population of the MQT. The REFRESH TABLE statement deletes all rows of the MQT with a mass delete, executes the fullselect in the MQT definition to recalculate the data from the base tables, inserts the calculated result into the MQT, and updates the catalog with the refresh timestamp and cardinality of the MQT. This is all performed in a single commit scope. Following is an example of how to refresh SALESMQT:
REFRESH TABLE SALESMQT;
The MAINTAINED BY SYSTEM option indicates that the MQT is system maintained and that you will need to use the SQL statement REFRESH TABLE to perform this. This option does not allow for user updates by LOAD, INSERT, UPDATE, or DELETE and thus is by nature read-only.
The MAINTAINED BY USER option indicates that the MQT is user maintained by either triggers or batch updates. This option allows for user updates via LOAD, INSERT, UPDATE, or DELETE, and it can also be updated by the REFRESH TABLE statement, which can use EXPLAIN if necessary.
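A user-maintained MQT might be defined as follows; this is a sketch modeled on the earlier SALESMQT example, with a hypothetical table name:

```sql
-- User-maintained MQT: can be populated by LOAD, INSERT, UPDATE,
-- or DELETE as well as by REFRESH TABLE
CREATE TABLE SALESMQT_U (CUSTID, STOREID, LOCID, MTH) AS
   (SELECT CUSTID, STOREID, LOCID, MTH
      FROM SALES
     GROUP BY CUSTID, STOREID, LOCID, MTH)
   DATA INITIALLY DEFERRED
   REFRESH DEFERRED
   MAINTAINED BY USER;
```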
ENABLE QUERY OPTIMIZATION allows the DB2 optimizer to consider the MQT during the process of establishing an access path. Optimization is discussed further in Chapter 17.
An index is a list of the locations of rows, sorted by the contents of one or more specified columns. Indexes are typically used to improve query performance but can also serve a logical data-design purpose. For example, a unique index does not allow the entry of duplicate values in its columns, thereby guaranteeing that no two rows of a table are the same. Indexes can be created to specify ascending or descending order of the values in a column. An index contains a pointer, known as a row identifier (RID), to the physical location of each row in the table.
Indexes are created for three main purposes:
More than one index can be defined on a particular base table, which can have a beneficial effect on the performance of queries. However, the more indexes there are, the more the database manager must work to keep the indexes up to date during UPDATE, DELETE, and INSERT operations. Creating a large number of indexes for a table that receives many updates can slow down processing. Some of the most important parameters of the CREATE INDEX statement are listed in Table 4-13.
Indexes, like table spaces, are stored in underlying VSAM data sets. To correlate an index with the index space that you or DB2 created, you can look at the INDEXSPACE column in the SYSIBM.SYSINDEXES catalog table.
Type 2 indexes are created by default. Type 2 indexes do not use index locking and do not have subpages. Type 1 indexes should not be used and, as of version 8, are no longer supported.
An index can be defined with the DEFINE NO option (DEFINE YES is the default). This is done to specify an index and defer the physical creation. The data sets will not be created until data is inserted into the index. This option is helpful in order to reduce the number of physical data sets.
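A sketch of deferring data set creation for an index (the index name and column are hypothetical):

```sql
-- No underlying VSAM data sets are created until a row is inserted
CREATE INDEX DB2USER1.TESTDEFX
   ON DB2USER1.TEST (NAME ASC)
   USING STOGROUP CERTSTG
   DEFINE NO;
```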
It is generally important to control the physical sequence of the data in a table. The CLUSTER option is used on one, and only one, index on a table and specifies the physical sequence. If not defined, the first index defined on the table in a nonpartitioned table space is used for the clustering sequence. The best clustering index is the one that supports the majority of the sequential access to the data in the table.
When the CLUSTER keyword is used, the index is called the explicit clustering index. If no index is defined with the CLUSTER keyword, the index DB2 chooses for clustering is called the implicit clustering index.
In defining table spaces, one option is called MEMBER CLUSTER. When this option is specified, the clustering sequence specified by the clustering index is ignored. In that case, DB2 will locate the data on the basis of available space when an SQL INSERT statement is used. This option is used mainly in a data sharing environment to avoid excessive p-lock (physical lock) negotiation on the space map when inserts are coming in by the clustering index on multiple members.
The clustering index can also be changed. ALTER can be used to change whether an index is the clustering index. For instance, the following syntax changes an index to not be the clustering index:
ALTER INDEX CUSTIX NO CLUSTER
After the NO CLUSTER alter index is completed, the inserts will still occur by the implicit clustering index. A new clustering index needs to be defined, or ALTER can be used on an existing index to become the new clustering index, as shown in the following syntax:
ALTER INDEX CUSTIX2 CLUSTER
When the new clustering index is defined, inserts will occur by the new index. Obviously, for performance reasons, it is wise to perform a REORG right after the ALTER to specify the new clustering index so that the inserts can now occur by the new clustering index.
Partitioned indexes are of two types: primary and secondary. If the keys in the index match the limit-key columns, or subset of columns, in the partitioned table, the index is considered a primary partitioned index; if not, it is considered a secondary partitioned index. This will be reflected in SYSIBM.SYSINDEXES.
The following example creates a partitioned index:
CREATE UNIQUE PARTITIONED INDEX DB2USER1.TESTCNTX ON DB2USER1.TEST_CENTER (TCID ASC) USING STOGROUP CERTSTG PRIQTY 512 SECQTY 64 ERASE NO CLUSTER BUFFERPOOL BP3 CLOSE YES;
A partitioned index is made up of several data sets. Each partition can have different attributes; that is, some may have more free space than others.
Unique and Nonunique Indexes
A unique index guarantees the uniqueness of the data values in a table's columns. The unique index can be used during query processing to perform faster retrieval of data. The uniqueness is enforced at the end of the SQL statement that updates rows or inserts new rows. The uniqueness is also checked during the execution of the CREATE INDEX statement. If the table already contains rows with duplicate key values, the index is not created. An example of creating a unique index follows:
CREATE UNIQUE INDEX DB2USER1.TESTIX ON DB2USER1.TEST (NUMBER ASC) USING STOGROUP CERTSTG PRIQTY 512 SECQTY 64 ERASE NO CLUSTER
A nonunique index can improve query performance by maintaining a sorted order for the data. Depending on how many columns are used to define a key, you can have an atomic, or single-column, key or a composite key, which is composed of two or more columns.
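For example, a nonunique index on a composite key of two columns might look like this (the table and column names are illustrative):

```sql
-- Composite key: rows are ordered by LASTNAME, then FIRSTNAME
CREATE INDEX DB2USER1.TESTNMX
   ON DB2USER1.TEST (LASTNAME ASC, FIRSTNAME ASC)
   USING STOGROUP CERTSTG;
```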
The following are types of keys used to implement constraints.
DB2 uses unique indexes and the NOT NULL option to maintain primary and unique key constraints.
Unique Where Not Null Index
This is a special form of a unique index. Normally, in a unique index, any two null values are taken to be equal. Specifying WHERE NOT NULL causes any two null values to be treated as unequal, so the index allows multiple rows whose key is null while still enforcing uniqueness for non-null keys.
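A sketch of such an index, assuming a hypothetical nullable PHONE column:

```sql
-- Uniqueness is enforced only for non-null PHONE values;
-- any number of rows may have a null PHONE
CREATE UNIQUE WHERE NOT NULL INDEX DB2USER1.PHONEIX
   ON DB2USER1.TEST_TAKEN (PHONE ASC);
```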
Null Values and Indexes
It is important to understand the difference between a primary key and a unique index. DB2 uses two elements to implement the relational database concept of primary and unique keys: unique indexes and the NOT NULL constraint. Therefore, unique indexes do not enforce the primary key constraint by themselves, as they can allow a null value. Null values are unknown, but when it comes to indexing, a null value is treated as equal to all other null values, with the exception of the UNIQUE WHERE NOT NULL INDEX. You cannot insert a NULL value twice if the column is a key of a unique index, because it violates the uniqueness rule for the index.
NPIs are used on partitioned tables and are not the same as the clustered partitioning key, which is used to order and partition the data. Rather, NPIs are for access to the data.
NPIs can be unique or nonunique. Although you can have only one clustered partitioning index, you can have several NPIs on a table, if necessary. NPIs can be broken apart into multiple pieces, or data sets, by using the PIECESIZE clause on the CREATE INDEX statement. Pieces can vary in size from 254KB to 64GB; the best size will depend on how much data you have and how many pieces you want to manage. If you have several pieces, you can achieve more parallelism on processes, such as heavy INSERT batch jobs, by alleviating the bottlenecks caused by contention on a single data set. The following example shows how to create an NPI with pieces:
CREATE UNIQUE INDEX DB2USER1.TESTCN2X
    ON DB2USER1.TEST_CENTER (CODE ASC)
    USING STOGROUP CERTSTG
    PIECESIZE 512K;
Data-Partitioned Secondary Indexes
The DPSI index type provides many advantages over traditional NPIs for secondary indexes on a partitioned table space in terms of availability and performance. You can now choose to partition a table without a partitioning/clustering index and instead cluster the data by a different index if that better fits your data and processing requirements. This is accomplished through a DPSI.
The partitioning scheme of the DPSI is the same as that of the table space: the index keys in partition n of the index match the rows in partition n of the table space. Figure 4-5 shows how a DPSI is physically structured. The CUST_NUMX index is a DPSI.
Figure 4-5. DPSI physical layout
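As a sketch (the index, table, and column names here are hypothetical), a DPSI is created by adding the PARTITIONED keyword to the CREATE INDEX statement:

```sql
-- Hypothetical example: CUST_NUMX is partitioned along the same
-- boundaries as the table space partitions of the CUSTOMER table.
CREATE INDEX DB2USER1.CUST_NUMX
    ON DB2USER1.CUSTOMER (CUST_NUM ASC)
    PARTITIONED;
```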
A DPSI provides the following benefits:
Although DPSIs further partition independence, some queries may not perform as well with them. A query benefits from this new organization if its predicates reference the partitioning column values and therefore restrict the search to a single partition of the DPSI. Queries must be designed to allow for this partition pruning through their predicates. However, if a query's predicates reference only columns in the DPSI, it may not perform very well, because DB2 may need to probe several partitions of the index. Figure 4-6 shows how applications will need to code predicates for DPSIs.
Figure 4-6. Coding predicates for DPSIs
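For illustration (the table and column names are hypothetical, assuming ACCT_DATE is the partitioning column and CUST_NUM carries the DPSI), the first query below allows partition pruning, while the second may have to probe every partition of the DPSI:

```sql
-- Good: the predicate on the partitioning column (ACCT_DATE) limits
-- the search to one partition, so only one DPSI partition is probed.
SELECT CUST_NAME
FROM DB2USER1.CUSTOMER
WHERE ACCT_DATE = '2004-06-30'
  AND CUST_NUM = 1234;

-- Potentially poor: only the DPSI column is referenced, so DB2 may
-- have to probe the CUST_NUM index in every partition.
SELECT CUST_NAME
FROM DB2USER1.CUSTOMER
WHERE CUST_NUM = 1234;
```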
Another limitation to using DPSIs is the fact that they cannot be unique and may not be the best candidates for ORDER BYs.
Indexes on Auxiliary Tables
An index must be created on an auxiliary table for a LOB. The index itself consists of 19 bytes for the ROWID and 5 bytes for the RID. Therefore, the index is always unique. No LOB columns are allowed in the index. The following example shows the CREATE statement for an auxiliary index:
CREATE INDEX DB2CERT.PHOTOIX ON DB2USER1.CAND_PHOTO USING VCAT DB2USER1 COPY YES;
No columns are specified, because the auxiliary indexes have implicitly generated keys.
General Indexing Guidelines
Indexes consume disk space. The amount of disk space will vary, depending on the length of the key columns and whether the index is unique or nonunique. The size of the index will increase as more data is inserted into the base table. Therefore, consider the disk space required for indexes when planning the size of the database. Some of the indexing considerations include the following.
Modifying an Index
The ALTER INDEX statement allows you to change many of the characteristics of the index, such as the buffer pool assignment, the PIECESIZE, and the primary and secondary space allocations.
The following example shows how to change the buffer pool assignment for an index:
ALTER INDEX TESTCN2X BUFFERPOOL BP1;
Removing an Index
When you want to remove an index, issue the following statement:
DROP INDEX TESTCN2X;
A database is a collection of table spaces, index spaces, and the objects within them. A couple of types of databases are used for special purposes.
For these special databases, you can also specify which data sharing member they are for, as each member must have its own. You can also specify the encoding scheme for the data in the database (ASCII, EBCDIC, or Unicode) via the CCSID clause.
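As a sketch (the database name here is hypothetical), the encoding scheme is set with the CCSID clause on CREATE DATABASE:

```sql
-- Hypothetical example: a database whose data is stored in Unicode.
CREATE DATABASE CERTUNI
    STOGROUP CERTSTG
    CCSID UNICODE;
```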
Creating a Database
Following is an example of the creation of the DB2CERT database. The BUFFERPOOL parameter specifies that any table spaces created in this database without an explicit buffer pool assignment will default to buffer pool BP7. INDEXBP serves the same purpose for indexes, here defaulting them to BP8.
CREATE DATABASE DB2CERT STOGROUP CERTSTG BUFFERPOOL BP7 INDEXBP BP8;
Modifying a Database
The default buffer pools, the encoding scheme, and the storage group can be changed for a database. The following is an example of an ALTER DATABASE statement:
ALTER DATABASE DB2CERT BUFFERPOOL BP4;
Removing a Database
It is very easy to remove a database: a simple DROP statement, provided that all the appropriate authorities are in place. When a database is dropped, all dependent objects are dropped as well.
DROP DATABASE DB2CERT;
Storage groups are used to list the DASD volumes that will be used to store the data. Storage groups can contain one or many volumes and can work with or without SMS (system-managed storage). If table spaces or index spaces are defined using a storage group (identified in the USING clause of the CREATE TABLESPACE and CREATE INDEX statements), they are considered to be DB2 managed, and DB2 will create the underlying data sets, allowing you to specify the PRIQTY and SECQTY for the data set allocations. Otherwise, they are considered to be user managed and must be defined explicitly through the IDCAMS utility.
To create a storage group, do the following:
CREATE STOGROUP CERTSTG VOLUMES(*) VCAT DB2USER1;
The asterisk (*) in this example indicates that SMS will manage the volumes to be used.
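As a sketch (the storage group name and volume serials here are hypothetical), specific volumes can be listed instead of letting SMS choose:

```sql
-- Hypothetical example: a storage group restricted to two named volumes.
CREATE STOGROUP CERTSTG2
    VOLUMES (VOL001, VOL002)
    VCAT DB2USER1;
```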
Volumes can be added to or removed from a storage group by using the ALTER STOGROUP statement:
ALTER STOGROUP CERTSTG ADD VOLUMES (VOL1);
A storage group can be removed by using the DROP statement, but only if no table spaces or index spaces are using it.
DROP STOGROUP CERTSTG;