Table Guidelines


The table is the basic unit of data that is accessible when using SQL. Data is inserted into, deleted from, and updated within DB2 tables. Once populated, data can be selected from a DB2 table. Basically, the table is the means by which end users gain access to DB2 data.

In general, you will define one table for each entity for which you will be storing data. A table can be thought of as a grouping of attributes that identify a physical entity. The table name should conform to the entity name. For example, consider the sample table for employees, DSN8810.EMP. EMP is the name of the table that represents an entity known as "employee." An employee has many attributes, some of which are EMPNO, FIRSTNME, and LASTNAME. These attributes are columns of the table.

When you create one table for each entity, the tables are easy to identify and use because they represent real-world "things."

Of course, at times, the simple rule of one "physical" table per "logical" entity will need to be broken. This usually occurs when you need to denormalize for performance reasons. More details on denormalization are provided later in this chapter in the section "Denormalization."

DB2 Table Parameters

The preceding section concentrated on the rows and columns of a DB2 table. Other parameters also must be considered when creating DB2 tables. This section provides guidelines to assist you in your table creation endeavors.

Encoding Scheme

The CCSID parameter can be used to specify ASCII , EBCDIC , or UNICODE encoding at the table level as well as at the table space level. All data stored within a table space must use the same encoding scheme. Any indexes defined for tables in the table space will have the same encoding scheme as the table space.


Support for Unicode is new as of DB2 Version 7.


DROP Restriction

To prohibit inadvertent table drops, use the WITH RESTRICT ON DROP clause of the CREATE TABLE statement. When WITH RESTRICT ON DROP is specified, drops cannot be issued for the table, its table space, or its database. To subsequently drop the table, it must first be altered to remove the RESTRICT ON DROP specification.
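For example, a minimal sketch (the table, column, and table space names are illustrative):

```sql
-- Protect the table (and its table space and database) from DROP
CREATE TABLE EMP_HISTORY
  (EMPNO     CHAR(6)   NOT NULL,
   CHANGE_TS TIMESTAMP NOT NULL)
  IN db.ts
  WITH RESTRICT ON DROP;

-- Later, to permit the table to be dropped again
ALTER TABLE EMP_HISTORY
  DROP RESTRICT ON DROP;
```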

DB2-Enforced Table Auditing

If you must audit user access to DB2 tables, you can specify an audit rule for your tables. Although the auditing features of DB2 are rudimentary, sometimes they are useful. DB2 has three table audit options: NONE, CHANGES, and ALL.

DB2 table auditing is done on a unit-of-work basis only. DB2 audits only the first table access of any particular type for each unit of work, not every table access. AUDIT CHANGES writes an audit trace record for the first insert, update, and delete made by each unit of work. AUDIT ALL writes an audit trace record for the first select, insert, update, and delete made by each unit of work. By specifying AUDIT NONE or by failing to code an audit parameter, table auditing is inactivated.
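For example, a sketch of specifying audit rules (the table name is illustrative):

```sql
-- Audit the first INSERT, UPDATE, and DELETE per unit of work
CREATE TABLE PAYROLL_ADJ
  (EMPNO  CHAR(6)      NOT NULL,
   AMOUNT DECIMAL(9,2) NOT NULL)
  IN db.ts
  AUDIT CHANGES;

-- Change the audit rule later without re-creating the table
ALTER TABLE PAYROLL_ADJ
  AUDIT ALL;
```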

Before deciding to audit DB2 table access, consider that table auditing incurs overhead: each time a table is accessed in a new unit of work, an audit trace record is written. Additionally, even if auditing has been specified for a given table, no audit trace records are written unless the appropriate DB2 audit trace classes are activated. For AUDIT CHANGES, activate audit trace classes 1, 2, 3, 4, 7, and 8. For AUDIT ALL, activate audit trace classes 1 through 8.

NOTE

Keep in mind that the ALTER TABLE statement itself is audited only if AUDIT CHANGES or AUDIT ALL is specified and the appropriate audit trace class has been activated.


In general, to alleviate overhead, do not audit table access unless your application absolutely requires it.

Specifying Your Own OBID

You can use the OBID parameter to explicitly specify a particular OBID (object identifier) for DB2 to use for the table being created. An OBID is an identifier assigned by DB2 for its own internal use and identification purposes.

You might choose to specify an OBID if you are re-creating a table that used to exist and you wish to keep the previous OBID that was assigned to the table. For example, if you are dropping a table but will be re-creating it with changes, you may wish to keep the previous OBID for that table.

To obtain the current OBID of any table you can query the DB2 Catalog as follows (supplying the CREATOR and NAME of the table for the question marks in the query):

 

 SELECT  OBID
 FROM    SYSIBM.SYSTABLES
 WHERE   CREATOR = ?
 AND     NAME = ?;

You cannot use the OBID parameter to assign an OBID that is already assigned to an existing DB2 table.
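For example, a sketch of re-creating a table with a retained OBID (the table name and the OBID value 6 are illustrative):

```sql
-- Re-create a dropped table, retaining the OBID it previously used
CREATE TABLE DSN8810.EMP_COPY
  (EMPNO CHAR(6) NOT NULL)
  IN db.ts
  OBID 6;
```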

Augmenting the Log with Changed Information

If you are using a data propagation technology such as IBM's DataPropagator product, you may have to use the DATA CAPTURE parameter to augment the information captured on the DB2 log.

There are two options for DATA CAPTURE: NONE and CHANGES. DATA CAPTURE NONE, which is the default, specifies that no additional information is to be recorded on the DB2 log. DATA CAPTURE CHANGES, however, will cause DB2 to write additional information about inserted, updated, and deleted data. When a data propagation tool uses the DB2 log to capture changed information, this additional data may be required to ensure the validity of the propagated data.
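For example, enabling the additional logging for an existing table can be sketched as:

```sql
-- Log full before/after information so a log-based
-- propagation tool can capture complete changed data
ALTER TABLE DSN8810.EMP
  DATA CAPTURE CHANGES;
```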

NOTE

DATA CAPTURE CHANGES will not capture additional information for LOB columns.


Temporary Tables

Most DB2 tables are permanent, meaning that once created, they exist until an authorized user drops them. At times, though, you may need to create DB2 tables that exist only for the duration of a program run. Such a table is known as a temporary table.

DB2 has provided the capability to create temporary tables since Version 5. But the initial functionality was practical only in certain circumstances due to some inherent limitations. This first type of temporary table is known as a created (or global) temporary table.


IBM's support of temporary tables was expanded as of Version 7. Now DB2 offers two different types of temporary tables: created and declared.


Why Use Temporary Tables?

Before we investigate these two types of temporary tables, let's first address why anyone would want or need to use a temporary table in the first place.

One potential use of temporary tables is to store intermediate SQL results during a program run. Consider, for example, if the results of a first query need to be used in a subsequent query. Instead of rerunning the first query (or combining it with the subsequent query), the results of the first query can be stored in a temporary table. Then the temporary table can be joined into the second query without incurring the overhead of rerunning the first query. This is especially useful if the first query is complex or inefficient.

Or what about result sets that need to be returned more than once by the same program? Consider this scenario: A complex multi-table join is coded that consumes a lot of resources to run. Furthermore, that join statement needs to be run three times during the course of the program. Instead of running the join three times, you can run it once and populate a temporary table with the results. The next two times you can simply read the temporary table, which might be more efficient than re-executing the complex, resource-consuming multi-table join.

Temporary tables also are useful for enabling non-relational data to be processed using SQL. For example, you can create a global temporary table that is populated with IMS data (or any other non-relational data source) by a program. Then during the course of that program, the temporary table containing the IMS data can be accessed by SQL statements and even joined to other DB2 tables. The same could be done for data from a flat file, VSAM, IDMS, or any other non-relational data.

Another reason for IBM's inclusion of temporary table support in DB2 is to make conversion from other relational products easier. Microsoft SQL Server and Oracle both have supported temporary tables for quite some time now. Without such support in DB2 it was very difficult for developers to convert or port their Oracle or SQL Server applications to DB2. IBM alleviated this problem by enabling temporary table support in DB2.

Now let's examine the two types of temporary tables supported by DB2.

Created Temporary Tables

A created temporary table exists only as long as the process that uses it. Created temporary tables are defined using the CREATE GLOBAL TEMPORARY TABLE statement. When created, the schema for the table is stored in the DB2 system catalog (SYSIBM.SYSTABLES) just like any other table, but the TYPE column is set to 'G' to indicate a global temporary table. Created temporary tables are sometimes referred to as global temporary tables, but this can be confusing because declared temporary tables also use the GLOBAL keyword when they are declared.

It is important to remember that a created global temporary table must be created using a DDL CREATE statement before it can be used in any program.
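For example, the one-time DDL step can be sketched as follows (the table and column names are illustrative):

```sql
-- Register the created temporary table's schema in the DB2 Catalog;
-- no data is stored until a process references the table
CREATE GLOBAL TEMPORARY TABLE TEMP_RESULTS
  (EMPNO CHAR(6) NOT NULL,
   BONUS DECIMAL(9,2));
```

Each application process that later references TEMP_RESULTS receives its own empty instance of the table.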

A created temporary table is instantiated when it is referenced in an OPEN, SELECT INTO, INSERT, or DELETE statement, not when it is created. Each application process that uses the temporary table creates a new instance of the table for its use. When using a temporary table, keep the following in mind:

  • Because they are not persistent, locking, logging, and recovery do not apply to temporary tables.

  • Indexes cannot be created on temporary tables, so all access is by a complete table scan.

  • Constraints cannot be created on temporary tables.

  • A null is the only default value permitted for columns of a temporary table.

  • Temporary tables cannot be referenced by DB2 utilities.

  • Temporary tables cannot be specified as the object of an UPDATE statement.

  • When deleting from a temporary table, all rows must be deleted.

  • Although views can be created on temporary tables, the WITH CHECK OPTION cannot be specified.

Work file data sets are used to manage the data of created temporary tables. The work database (DSNDB07) is used as storage for processing SQL statements that require working storage, not just for created temporary tables. If you are using created temporary tables, be sure to examine the DB2 Installation Guide for tactics to estimate the disk storage required for temporary work files.

When a temporary work file result table is populated using an INSERT statement, it uses work file space. No other process can use the same work file space as that temporary work file table until the table goes away. The space is reclaimed when the application process commits or rolls back, or when it is deallocated, depending on which RELEASE option was used when the plan or package was bound. It is a good idea to keep the work files in a separate buffer pool to make it easier to monitor. IFCID 0311 in performance trace class 8 can be used to distinguish these tables from other uses of the work file.

Declared Temporary Tables

DB2 Version 7 introduces declared temporary tables. Actually, to be more accurate, declared temporary tables were made available during DB2 Version 6 via an intermediate refresh.


This new type of temporary table is different from a created temporary table and overcomes many of its limitations. The first significant difference between declared and created temporary tables is that declared temporary tables are specified using a DECLARE statement in an application program, not using a DDL CREATE statement. Because they are not created using DDL, they do not have descriptions in the DB2 Catalog.

Additionally, declared temporary tables offer significant features and functionality not provided by created temporary tables. Consider the following:

  • Declared temporary tables can have indexes and CHECK constraints defined on them.

  • You can issue UPDATE statements and positioned DELETE statements against a declared temporary table.

  • You can implicitly define the columns of a declared temporary table and use the result table from a SELECT .

So, declared temporary tables offer much more flexibility and functionality than created temporary tables. To "create" an instance of a declared temporary table, you must issue the DECLARE GLOBAL TEMPORARY TABLE statement inside of an application program. That instance of the declared temporary table is known only to the process that issues the DECLARE statement. Multiple concurrent programs can be executed using the same declared temporary table name because each program will have its own copy of the temporary table.

But there is more work required to use a declared temporary table than there is to use a created temporary table. Before you can declare temporary tables you must create a temporary database and table spaces for them to use. This is accomplished by specifying the AS TEMP clause on a CREATE DATABASE statement. Then, you must create segmented table spaces in the temporary database. Only one temporary database for declared temporary tables is permitted per DB2 subsystem.

CAUTION

Everyone has the CREATETAB privilege in the temporary database by default, and you cannot REVOKE this privilege. It is implicitly granted when the temporary database is created.


When a DECLARE GLOBAL TEMPORARY TABLE statement is issued, DB2 will create an empty instance of the temporary table in the temporary table space. INSERT statements are used to populate the temporary table. Once inserted, the data can be accessed, modified, or deleted. When the program completes, DB2 will drop the instance of the temporary table. Also, be aware that users of temporary tables must have been granted USE authority on the temporary table space.

The following example shows a DECLARE statement that can be issued from an application program (assuming the temporary database and table spaces already have been defined):

 

 DECLARE GLOBAL TEMPORARY TABLE TEMP_EMP
   (EMPNO      CHAR(6)     NOT NULL,
    FIRSTNME   VARCHAR(12) NOT NULL,
    MIDINIT    CHAR(1)     NOT NULL,
    LASTNAME   VARCHAR(15) NOT NULL,
    WORKDEPT   CHAR(3),
    PHONENO    CHAR(4)
   );

This creates a declared temporary table named TEMP_EMP. Additionally, you can use the LIKE clause to DECLARE a temporary table that uses the same schema definition as another currently existing table. You can use the INCLUDING IDENTITY COLUMN ATTRIBUTES clause to copy the IDENTITY columns as well. For example:

 

 DECLARE GLOBAL TEMPORARY TABLE TEMP_PROJ
   LIKE DSN8810.PROJ
   INCLUDING IDENTITY
   ON COMMIT PRESERVE ROWS;

This example shows how to use the INCLUDING IDENTITY clause. However, the sample table DSN8810.PROJ does not use an IDENTITY column, so this statement would not work; it is shown as an example only. Identity columns are covered later in this chapter in the section titled "Sequence Objects and Identity Columns."

Notice also the ON COMMIT PRESERVE ROWS clause in the previous example. The ON COMMIT clause specifies what action DB2 is to take with the data in the declared temporary table when the program issues a COMMIT statement. There are two options: PRESERVE ROWS and DELETE ROWS. Specifying PRESERVE ROWS indicates that the rows of the table are to be kept. Beware, though, that the PRESERVE ROWS option impacts thread reuse. You will not be able to reuse threads for any application process that contains, at its most recent COMMIT, an active declared temporary table defined using the PRESERVE ROWS option of the ON COMMIT clause. The other option, which is the default, is ON COMMIT DELETE ROWS. In that case, all of the rows of the table are deleted as long as there are no cursors defined using WITH HOLD.

NOTE


Scrollable cursors, another new feature of DB2 V7, require declared temporary tables. A scrollable cursor provides the ability to scroll forward and backward through the data once the cursor is open.

The data from a scrollable cursor is maintained in a declared temporary table. DB2 uses this mechanism to facilitate scrolling through data in multiple ways: forward, backward, or to a specific position.

So, keep in mind, even if you do not choose to use temporary tables in your application programs, you may need to implement them to support scrollable cursors.


Declared Temporary Table Storage

Before using declared temporary tables, the temporary database and temporary table spaces must be defined to store the temporary data. For example,

 

 CREATE DATABASE TEMPDB AS TEMP;

 CREATE TABLESPACE TEMPTS
   IN TEMPDB
   SEGSIZE 4
   BUFFERPOOL BP7;

The table space is created as a temporary table space by virtue of it being in the temporary database.

The page size of the temporary table space must be large enough to hold the longest row in the declared temporary table. The size of a row in the declared temporary table might be considerably larger than the size of the row in the table for which the scrollable cursor is used. As with a regular table, the size of the row depends on the number of columns that are stored in the declared temporary table and the size of each column.

Consider creating a temporary table space for every page size. The buffer pool assignment determines the page size.

An in-depth discussion of calculating the storage requirements for a temporary table space for a declared temporary table is provided in the DB2 Installation Guide . Be sure to refer to that manual before implementing declared temporary tables or any features that rely on declared temporary tables (for example, scrollable cursors).

NOTE

Keep in mind that when there is more than one temporary table space defined to the DB2 subsystem, DB2 will select which temporary table spaces it will use for scrollable cursor processing. More information on scrollable cursors is provided in Chapter 11, "Using DB2 in an Application Program."


Temporary Table Guidelines

The following guidelines are provided for your assistance as you implement temporary tables for your DB2 applications.

Favor Declared Temporary Tables

For applications that require temporary tables, favor declared temporary tables over created temporary tables. Declared temporary tables are more flexible and functional than created temporary tables.

When using declared temporary tables you can index them, define CHECK constraints for them, and issue UPDATE statements and positioned DELETE statements against them, none of which can be done using created temporary tables.

When to Consider Created Temporary Tables

With all of the limitations of created temporary tables why would anyone still want to use them instead of declared temporary tables?

Well, there are a few potential problems with declared temporary tables, too. First of all, the DB2 Catalog (SYSIBM.SYSPACKDEP) will not show dependencies for declared temporary tables, but it will for created temporary tables. Second, some DBAs are leery of allowing database structures to be created by application programmers inside of an application program. With limited DDL and database design knowledge, it may not be wise to trust programmers to get the table structure correct. Furthermore, the additional management of the temporary database and table spaces can become an administrative burden.

So, created temporary tables are still useful, in the right situations. They should be considered primarily when no updating of temporary data is needed and access to the temporary data is purely sequential.

Use Temporary Tables with Stored Procedures

Temporary tables are most useful when a large result set must be returned from a stored procedure. Refer to Chapter 15, "Using DB2 Stored Procedures," for in-depth guidelines on using stored procedures.

Use SQL to Access Non-Relational Data

Temporary tables are useful for enabling non-relational data to be processed using SQL. For example, you can create a temporary table (global or created) and then populate it with IMS data in your application program. Then, during the course of that program, the temporary table containing the IMS data can be accessed by SQL statements and even joined to other DB2 tables.

General Table Guidelines

The following guidelines provide helpful hints for you to follow as you implement and manage tables within your DB2 applications and systems.

Use LIKE to Duplicate a Table's Schema

Use the LIKE clause to create a table with the same columns as another table. The following SQL creates a new table OLD_PROJ using the PROJ table as a template:

 

 CREATE TABLE DSN8810.OLD_PROJ
   LIKE DSN8810.PROJ
   IN db.ts;

The LIKE clause is particularly useful in the following instances:

  • When creating exception tables required by the CHECK utility

  • When multiple instances of a similar table must be created

  • When creating a PLAN_TABLE

  • When creating the same table for multiple users

Consider Using Comments

Consider using the COMMENT ON statement to document the entities you create. For V7 and previous releases, as many as 254 characters of descriptive text can be applied to each column, table, alias, index, distinct type, procedure, sequence, trigger, and UDF known to DB2. The maximum size of a comment is increased to 762 characters as of DB2 V8. The comment text is stored in a column named REMARKS in the appropriate DB2 Catalog table as follows:

  • SYSIBM.SYSTABLES for aliases and tables

  • SYSIBM.SYSCOLUMNS for columns

  • SYSIBM.SYSINDEXES for indexes

  • SYSIBM.SYSSEQUENCES for sequences

  • SYSIBM.SYSTRIGGERS for triggers

  • SYSIBM.SYSDATATYPES for distinct types

  • SYSIBM.SYSROUTINES for stored procedures and UDFs

If useful descriptions are maintained for all columns and tables, the DB2 Catalog can function as a crude data dictionary for DB2 objects. However, be aware that comments are stored in a VARCHAR column in each of the preceding catalog tables.
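For example (the comment text is illustrative):

```sql
-- Document a table and one of its columns;
-- the text is stored in the REMARKS catalog column
COMMENT ON TABLE DSN8810.EMP
  IS 'Current employees; one row per employee';

COMMENT ON COLUMN DSN8810.EMP.WORKDEPT
  IS 'Department to which the employee is assigned';
```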

CAUTION

When comments are specified, the overall size of the DB2 Catalog will expand and might grow to be larger than expected. Weigh the benefits of added documentation against the impact on the DB2 Catalog before adding comments on all columns and tables.


Avoid Specifying Labels for Columns

DB2 provides the capability to label columns of DB2 tables using the LABEL ON statement. This was useful at times prior to DB2 Version 8 because the maximum length for a column name was 18 characters; the maximum length of a column label was 30 characters.

The label is stored in the DB2 Catalog in the SYSIBM.SYSCOLUMNS table. QMF users can specify that they want to use labels rather than column names, thereby providing better report headings.

However, as of DB2 V8 the maximum length of a column name is 128 characters and the maximum length of the label is 90 characters, negating the benefit of a "descriptive" column label.
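If you do use labels, the statement is straightforward. A sketch using a sample-table column:

```sql
-- Assign a more descriptive report heading to a column
LABEL ON COLUMN DSN8810.EMP.EDLEVEL
  IS 'EDUCATION LEVEL';
```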

CAUTION

Be aware that specifying column labels will add to the overall size of the DB2 Catalog, specifically to the SYSIBM.SYSCOLUMNS table. However, labels will not cause the same amount of growth as comments because labels have a maximum size of 30 characters as opposed to 254 for comments; or, as of DB2 V8, 90 characters as opposed to 762 for comments.


Changing the Name of a Table

The RENAME statement enables DBAs to change the name of a DB2 table without dropping and re-creating the table. All table characteristics, data, and authorizations are maintained. This feature is not available prior to DB2 Version 5.
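For example (the new table name is illustrative):

```sql
-- Rename a table, keeping its data and authorizations intact
RENAME TABLE DSN8810.OLD_PROJ TO PROJ_ARCHIVE;
```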

Avoid Using the IN DATABASE Clause

When creating tables, do not use the IN DATABASE clause. When IN DATABASE is used, a simple table space is implicitly created in the specified database. This simple table space will use all of the default parameters, which, as we have already learned, are usually not optimal. Additionally, no other tables can be assigned to this table space.

Instead, explicitly create a table space (of the proper type for the data), and then create the table in that table space.

Rows and Columns

When defining DB2 columns to a table you will need to choose a data type and perhaps a length for each column. Recall from Chapter 1, "The Magic Words," that DB2 supports the following data types:

CHAR                 fixed-length alphanumeric data

VARCHAR              variable-length alphanumeric data

GRAPHIC              fixed-length graphic data

VARGRAPHIC           variable-length graphic data

SMALLINT             small integer numbers

INTEGER              larger integer numbers

DECIMAL(p,s)         numeric data

FLOAT(n) or REAL     single-precision floating point (if n < 22)

FLOAT(n) or FLOAT    double-precision floating point (if n > 21)

DATE                 calendar date data

TIME                 time data

TIMESTAMP            combination date and time data

ROWID                unique row identifier (internally generated by DB2)

BLOB                 binary large object

CLOB                 character large object

DBCLOB               double-byte character large object

As you define your DB2 tables you also will have to worry about other factors and assign other characteristics to each column. For example,

  • When creating a table you will need to define each column and therefore will be required to specify the columns in an effective sequence.

  • Each column must be named.

  • Each column must be defined either as nullable or as NOT NULL.

  • Each column may be assigned a default value to use when data is inserted to a row but no value is provided for that particular column.

As you create DB2 tables, you should be mindful of their composition (rows and columns) and how this affects performance. This section outlines several guidelines that ensure efficient row and column specification.

ROWID

The ROWID data type is used to generate a unique value for every row in a table. The value is internally generated by DB2. A table can have only one ROWID column. The values in a ROWID column cannot be null.

An additional parameter, GENERATED , must be specified for a column defined as a ROWID . The GENERATED parameter is reserved for ROWID columns only (as of DB2 Version 6). It indicates that the values for the column are to be generated by DB2. There are two options for GENERATED , one of which must be supplied:

  • The ALWAYS parameter indicates that DB2 will always generate a value for the column when a row is inserted into the table. Most ROWID columns should be defined with this option.

  • The BY DEFAULT parameter indicates that DB2 will generate a value for the column when a row is inserted into the table unless a value is specified. The BY DEFAULT option can be useful if you are using data propagation to move ROWID values from one table to another. If you specify BY DEFAULT , the ROWID column must have a unique, single-column index. Until this index is created, you cannot add rows to the table regardless of whether you are using INSERT or LOAD .
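For example, a sketch of a table with a ROWID column (the table and column names are illustrative):

```sql
-- A table with a DB2-generated unique row identifier
CREATE TABLE EMP_PHOTO
  (EMPNO    CHAR(6) NOT NULL,
   PHOTO_ID ROWID   NOT NULL GENERATED ALWAYS)
  IN db.ts;
```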

CAUTION

DB2 will use an explicitly specified value for a ROWID only if it is a valid ROWID value that was previously generated by DB2.


Sequence Objects and Identity Columns

When designing databases, a frequent request is for a column to contain sequentially generated numbers. For example, each row has a counter associated with it. When a new row is inserted, the counter should be incremented by one for the new row. Until recently, such a design was difficult to deliver.

Without sequence objects or identity columns an application program can implement similar functionality, but usually not in a manner that will perform adequately as usage scales. One common technique is to maintain a one-row table that contains the sequence number. Each transaction locks that table, increments the number, and then commits to unlock the table. In this scenario only one transaction at a time can increment the sequence number. A variation uses something like this:

 

 SELECT MAX(SEQ_NBR) + 1
 FROM   ONEROW_TABLE
 WITH RR;

The result is the next highest number to be used. This value is used by the application, and ONEROW_TABLE must be updated with the incremented value. Performance bottlenecks will occur with this method when a lot of concurrent usage is required.

But now DB2 offers two methods of automatically generating sequential numbers for a column:

  • Identity columns (DB2 V7)

  • SEQUENCE objects (DB2 V8)

Identity Columns

Identity columns were formally added to DB2 as of Version 7, but were also made available to DB2 Version 6 as a refresh. An identity column is defined to a DB2 column using the IDENTITY parameter. A column so defined will cause DB2 to automatically generate a unique, sequential value for that column when a row is added to the table. For example, identity columns might be used to generate unique primary key values or a value that somewhat mimics Oracle's row number capability. Using identity columns helps to avoid some of the concurrency and performance problems that can occur when application programs are used to populate sequential values for a "counter" column.

When inserting data into a table that uses an identity column, the program or user will not provide a value for the identity column. Instead, DB2 automatically generates the appropriate value to be inserted.

Only one identity column can be defined per DB2 table. Additionally, the data type of the column must be SMALLINT, INTEGER, or DECIMAL with a zero scale, that is, DECIMAL(n,0). The data type also can be a user-defined DISTINCT type based on one of these numeric data types. The designer has control over the starting point for the generated sequential values, and the number by which the count is incremented.

An example of creating a table with an identity column follows:

 

 CREATE TABLE EXAMPLE
   (ID_COL  INTEGER NOT NULL
            GENERATED ALWAYS AS IDENTITY
            START WITH 100
            INCREMENT BY 10
    ...);

In this example, the identity column is named ID_COL. The first value stored in the column will be 100 and each subsequent INSERT will add 10 to the last value. So the identity column values generated will be 100, 110, 120, 130, and so on.

Note, too, that each identity column has a property associated with it assigned using the GENERATED parameter. This parameter indicates how DB2 generates values for the column. You must specify GENERATED if the column is to be considered an identity column or the data type of the column is a ROWID . This means that DB2 must be permitted to generate values for all identity columns. There are two options for the GENERATED parameter, ALWAYS and BY DEFAULT :

  • GENERATED ALWAYS indicates that DB2 will always generate a value for the column when a row is inserted into the table. You will usually specify ALWAYS for your identity columns unless you are using data propagation.

  • GENERATED BY DEFAULT indicates that DB2 will generate a value for the column when a row is inserted into the table unless a value is specified. So, if you want to be able to insert an explicit value into an identity column you must specify GENERATED BY DEFAULT .

Additionally, you can specify what to do when the maximum value is hit. Specifying the CYCLE keyword will cause DB2 to begin generating values from the minimum value all over again. Of course, this can cause duplicate values to be generated and should only be used when uniqueness is not a requirement.
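A sketch of an identity column that cycles follows (the table and column names are illustrative, and the attribute style mirrors the earlier example):

```sql
-- Identity column that wraps around when the maximum is reached;
-- duplicates become possible once values cycle
CREATE TABLE EVENT_LOG
  (EVENT_ID SMALLINT NOT NULL
            GENERATED ALWAYS AS IDENTITY
            START WITH 1
            INCREMENT BY 1
            CYCLE,
   MSG      VARCHAR(100))
  IN db.ts;
```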

Sometimes it is necessary to retrieve the value of an identity column immediately after it is inserted. For example, if you are using identity columns for primary key generation you may need to retrieve the value to provide the foreign key of a child table row that is to be inserted after the primary key is generated. The IDENTITY_VAL_LOCAL() function can be used to retrieve the value of an identity column after insertion. For example, run the following statement immediately after the INSERT statement that sets the identity value:

 

 VALUES IDENTITY_VAL_LOCAL() INTO :IVAR; 

The host variable IVAR will contain the value of the identity column.

Problems with Identity Columns

Identity columns can be useful, depending on your specific needs, but the problems that accompany identity columns are numerous. Some of these problems include

  • Handling the loading of data into a table with an identity column defined as GENERATED BY DEFAULT . The next identity value to be assigned, as stored by DB2, may not be the value that should be generated next. This can be especially troublesome in a testing environment.

  • LOAD INTO PART x is not allowed if an identity column is part of the partitioning index.

  • What about environments that require regular loading and reloading ( REPLACE ) for testing? The identity column will not necessarily hold the same values for the same rows from test to test.

  • graphics/v8_icon.gif Prior to V8, it was not possible to change the GENERATED parameter (such as from GENERATED BY DEFAULT to GENERATED ALWAYS ).

  • The IDENTITY_VAL_LOCAL() function returns the value used for the last insert to the identity column. But it only works after a singleton INSERT . This means you cannot rely on this function if the table is populated using an INSERT with a SELECT or using the LOAD utility.

  • When the maximum value is reached for the identity column, DB2 will cycle back to the beginning to begin reassigning values, which might not be the desired approach.

If you can live with these caveats, then identity columns might be useful to your applications. However, in general, these "problems" make identity columns a very niche solution. IBM intends to rectify some of these problems in upcoming versions of DB2.

SEQUENCE Objects

Recall that DB2 has two methods of automatically generating sequential numbers. The first method is to define an identity column for the table; the second is to create a SEQUENCE object. A SEQUENCE object is a separate structure that generates sequential numbers.

graphics/v8_icon.gif

New to DB2 V8, a SEQUENCE is a database object specifically created to generate sequential values. So, using a SEQUENCE object requires the creation of a database object; using an identity column does not.


A SEQUENCE object is created using the CREATE SEQUENCE statement.

When the SEQUENCE object is created it can be used by applications to "grab" a next sequential value for use in a table. SEQUENCE objects are ideal for generating sequential, unique numeric key values. A sequence can be accessed and incremented by many applications concurrently without the hot spots and performance degradation associated with other methods of generating sequential values. SEQUENCE objects also can be used by more than one column in the same table.

Sequences are efficient and can be used by many users at the same time without causing performance problems. Multiple users can concurrently and efficiently access SEQUENCE objects because DB2 does not wait for a transaction to COMMIT before allowing the sequence to be incremented again by another transaction.

An example of creating a SEQUENCE object follows:

 

 CREATE SEQUENCE ACTNO_SEQ
   AS SMALLINT
   START WITH 1
   INCREMENT BY 1
   NOMAXVALUE
   NOCYCLE
   CACHE 10;

This creates the SEQUENCE object named ACTNO_SEQ . Now it can be used to generate a new sequential value, for example

 

 INSERT INTO DSN8810.ACT
     (ACTNO, ACTKWD, ACTDESC)
   VALUES
     (NEXT VALUE FOR ACTNO_SEQ, 'TEST', 'Test activity');

The NEXT VALUE FOR clause is known as a sequence expression. Coding the sequence expression causes DB2 to use the named SEQUENCE object to automatically generate the next value. You can use a sequence expression to request the previous value that was generated. For example,

 

 SELECT PREVIOUS VALUE FOR ACTNO_SEQ
 INTO   :IVAR
 FROM   DSN8810.ACT;

As you can see, sequence expressions are not limited to INSERT statements, but can be used in UPDATE and SELECT statements, too.

CAUTION

If you specify the NEXT VALUE FOR clause more than once in the same SQL statement, DB2 will return the same value for each NEXT VALUE FOR specification.


SEQUENCE Object Parameters

Similar to identity columns, a SEQUENCE object has parameters to control the starting point for the generated sequential values and the number by which the count is incremented. You can also specify the data type to be generated (the default is INTEGER ), as well as a minimum value ( MINVALUE ) and a maximum value ( MAXVALUE ) if you wish to exert more control over the range of values than the data type alone provides.

Again, as with identity columns, you can specify how the SEQUENCE should handle running out of values when the maximum value is hit. Specifying the CYCLE keyword will cause the SEQUENCE object to wrap around and begin generating values from the minimum value all over again.

A final consideration for SEQUENCE objects is caching. Sequence values can be cached in memory to facilitate better performance. The size of the cache specifies the number of sequence values that DB2 will pre-allocate in memory. In the previous example CACHE 10 indicates that ten sequence values will be generated and stored in memory for subsequent use. Of course, you can turn off caching by specifying NO CACHE . With caching turned off each new request for a sequence number will cause I/O to the DB2 Catalog ( SYSIBM.SYSSEQUENCES ) to generate the next sequential value.
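Caching behavior can also be tuned after the sequence exists. The following is a sketch; it assumes the DB2 V8 ALTER SEQUENCE statement supports changing the CACHE specification:

```sql
-- Turn off caching: every request reads SYSIBM.SYSSEQUENCES
ALTER SEQUENCE ACTNO_SEQ NO CACHE;

-- Or pre-allocate more values to reduce catalog I/O,
-- at the cost of larger potential gaps after a failure
ALTER SEQUENCE ACTNO_SEQ CACHE 50;
```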

SEQUENCE Object Guidelines

DB2 does not wait for an application that has incremented a sequence to commit before allowing the sequence to be incremented again by another application. Applications can use one sequence for many tables, or create multiple sequences for use of each table requiring generated key values. In either case, the applications control the relationship between the sequences and the tables.

The name of the SEQUENCE object indicates that we are going to use it to generate activity numbers ( ACTNO ), but its usage is not limited to that. Of course, failure to control the use of a SEQUENCE object can result in gaps in the sequential values. For example, if we use the ACTNO_SEQ object to generate a number for a different column, the next time we use it for ACTNO there will be a gap where we generated that number.

Other scenarios can cause gaps in a SEQUENCE , too. For example, issuing a ROLLBACK after acquiring a sequence number will not roll back the value of the sequence generator, so that value is lost. A DB2 failure can also cause gaps because cached sequence values will be lost.

NOTE

When sequences were introduced in non-mainframe DB2, syntax was supported that did not conform to the SQL standard. This non-standard syntax is supported on the mainframe as well:

  • NEXTVAL can be used in place of NEXT VALUE

  • PREVVAL can be used in place of PREVIOUS VALUE


Choosing Between IDENTITY and SEQUENCE

Although both IDENTITY columns and SEQUENCE objects are useful for generating incremental numeric values, you will need to choose between the two. Consider the following criteria when choosing one over the other. IDENTITY columns are useful when

  • Only one column in a table requires automatically generated values.

  • Each row requires a separate value.

  • An automatic generator is desired for a primary key of a table.

  • The process of generating a new value is tied closely to inserting into a table, regardless of how the insert happens.

SEQUENCE objects are useful when

  • Values generated from one sequence are to be stored in more than one table.

  • More than one column per table requires automatically generated values (multiple values may be generated for each row using the same sequence or more than one sequence).

  • The process of generating a new value is independent of any reference to a table.

Unlike SEQUENCE objects, which are more flexible, IDENTITY columns must adhere to several rigid requirements. For example, an IDENTITY column is always defined on a single table, and each table can have at most one IDENTITY column. Furthermore, when you create an IDENTITY column, the data type for that column must be numeric; this is not so for sequences. If you used a SEQUENCE object to generate a value, you could put that generated value into a CHAR column, for example. Finally, when defining an IDENTITY column you cannot specify the DEFAULT clause, and the column is implicitly defined as NOT NULL . Remember, DB2 automatically generates the IDENTITY column's value, so default values and nulls are not useful concepts.
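For example, a generated sequence value can populate a character column. This is a sketch; the PRODUCT table, PROD_SEQ sequence, and column names are hypothetical:

```sql
-- Hypothetical: store a generated number in a character code column
-- by casting the sequence value with the CHAR function
INSERT INTO PRODUCT
    (PROD_CODE, PROD_DESC)
  VALUES
    (CHAR(NEXT VALUE FOR PROD_SEQ), 'Widget');
```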

Consult Table 5.7 for a summary comparison of SEQUENCE objects and IDENTITY column characteristics.

Table 5.7. IDENTITY Columns Versus SEQUENCE Objects

IDENTITY Columns                                    SEQUENCE Objects
--------------------------------------------------  --------------------------------------------------
Internal objects generated and maintained by DB2    Standalone database objects created by a DBA
Associated with a single table                      Not associated with a specific table; useable
                                                    across tables
Use IDENTITY_VAL_LOCAL() to get the last            Use PREVIOUS VALUE FOR seq-expr to get the last
value assigned                                      value assigned
N/A                                                 Use NEXT VALUE FOR seq-expr to get the next
                                                    value to be assigned
Add/change using ALTER TABLE...ALTER COLUMN         Administer using ALTER SEQUENCE , DROP , COMMENT ,
( DB2 V8 only )                                     GRANT , and REVOKE
Version 6 refresh; Version 7                        Version 8


Row and Column Guidelines

The following guidelines can be used to assist you as you design the row and column specifications of your DB2 tables.

Avoid Wasted Space

If you avoid very large and very small row sizes, you can reduce the amount of space wasted by unusable bytes on the pages of a table space. Keep these rules in mind:

  • A maximum of 255 rows can be stored on one table space page.

  • A row length larger than 4,056 will not fit on a 4KB page. You will need to choose one of the other allowable page sizes. Page sizes supported by DB2 include 4KB, 8KB, 16KB, and 32KB.

  • A row length less than 15 wastes space because no more than 255 rows can be stored on a DB2 page, regardless of the size of the row. For pre-V3 subsystems, a row length less than 31 bytes wastes space because the limit at that time was 127 rows per page.

  • A row length of 2,029 results in only one row per (4K) page because the second row will be too large to exist on the same page.

Determine row size carefully to avoid wasting space. If you can combine small tables or split large tables to avoid wasting a large amount of space, do so. It usually is impossible to avoid wasting some space, however.
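The arithmetic behind these rules can be sketched in a few lines. This is a simplified model that ignores page headers and per-row overhead; it captures only the 4,056-byte usable space and the 255-row limit described above:

```python
def rows_per_4k_page(row_length):
    """Rows that fit on a 4KB page: usable space is 4,056 bytes,
    and DB2 never stores more than 255 rows per page."""
    return min(255, 4056 // row_length)

def wasted_bytes(row_length):
    """Bytes left unusable on a full 4KB page."""
    return 4056 - rows_per_4k_page(row_length) * row_length

# A 2,029-byte row leaves room for only one row per page
assert rows_per_4k_page(2029) == 1      # 2,027 bytes wasted
# A 10-byte row hits the 255-row cap, stranding space
assert rows_per_4k_page(10) == 255      # 4,056 - 2,550 = 1,506 wasted
```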

Define All Appropriate Columns

Be sure to apply normalization to your database design to arrive at the appropriate columns for each DB2 table. An introduction to normalization is forthcoming in a later section of this chapter.

Keep in mind, however, that the maximum number of columns that can be defined for a DB2 table is 750. If the table participates as a dependent to another table (referential integrity), then the maximum number of columns is reduced to 749.

For a table that would require more columns than the maximum, you might need to break it into two tables, each with the same primary key but with a different subset of the remaining columns.

Choose Meaningful Column Names

In many data processing shops, common names for data elements have been used for years. Sometimes these names seem arcane because they comply with physical constraints that have long since been overcome.

graphics/v8_icon.gif

With DB2 V8 a column name can comprise up to 128 bytes. For V7 and prior releases the maximum size of a column name is 18 bytes. You can enhance the usability of your applications if you use as many characters as necessary to achieve easy-to-understand column names. For example, use CUSTOMER_NAME instead of CNA0 for a customer name column. Do not use column names simply because people are accustomed to them.


This might be a tough sell in your organization, but it's well worth the effort. If you must support the older, non-descriptive names, consider creating tables with the fully descriptive names and then creating views of these tables with the old names. Eventually, people will convert to use the tables instead of the views.

Standardize Abbreviations

Every shop uses abbreviated data names. This isn't a bad practice, unless the specification of abbreviations is random, uncontrolled, or unplanned. Document and enforce strict abbreviation standards for data names in conjunction with your data-naming standards. For example, the CUSTOMER_NAME column mentioned in the previous guideline can be abbreviated in many ways ( CST_NME , CUST_NM , CUST_NAME , and so on). Choose one standard abbreviation and stick to it.

Many shops use a list of tokens to create data abbreviation standards. This is fine as long as each token represents only one entity and each entity has only one abbreviation. For example:

Entity     Standard Abbreviation
------     ---------------------
CUSTOMER   CUST
NAME       NME


Sequence Columns to Achieve Optimal Performance

The sequencing of columns in a table is not important from a functionality perspective because the relational model states that columns must be non-positional. Columns and rows do not need to be sequenced for the retrieval commands to work on tables.

When you create a table, however, you must supply the columns in a particular order, and that becomes the order in which they physically are stored. The columns then can be retrieved in any order using the appropriate SQL SELECT statement.

When creating your tables, you will get better performance if you sequence the columns with an understanding of how DB2 logs. When data is modified in your database, DB2 will log every change made (in most circumstances). At a high level, this is how DB2 logs:

  • For fixed-length rows, DB2 logs from the first byte changed to the last byte changed.

  • For variable-length rows in which the size of the row changes due to the modification, DB2 logs from the first byte changed to the end of the row.

  • For variable-length rows in which the size of the row does not change, DB2 logs from the first byte changed to the last byte changed.

"So what?" you may be asking. Logging can cause a significant performance bottleneck. You can optimize performance by sequencing your columns such that you minimize the amount of data logged. The less data DB2 needs to log, the better the performance of your data modification operations will be. So, follow these rules for column sequencing:

  • Place the primary key columns first to ease identification.

  • Place infrequently updated non-variable columns next.

  • Place infrequently updated variable columns after the infrequently updated non-variable columns.

  • Consider placing columns that are frequently modified at the same time next to one another in the table. This can help to reduce the amount of data that is logged. Do so even if it does not "appear" to be an elegant design; for example, if the MIDDLE_INITIAL column is often updated at the same time as the PHONE_NUMBER column, place the two columns next to each other. Even though it might "look nicer" to put the LAST_NAME column after MIDDLE_INITIAL , performance can be improved in this scenario by putting PHONE_NUMBER after MIDDLE_INITIAL .

  • Only then, and given the preceding constraints, try to sequence the columns in an order that makes sense to the users of the table.
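To see why column order matters, the logging rule for fixed-length rows can be modeled directly: DB2 logs one contiguous range from the first byte changed to the last byte changed. The byte offsets below are hypothetical, chosen only to contrast an adjacent layout with a separated one:

```python
def bytes_logged_fixed(changed_ranges):
    """For fixed-length rows, DB2 logs from the first byte changed
    to the last byte changed, as one contiguous range.
    changed_ranges: list of (start_offset, end_offset) tuples."""
    first = min(start for start, end in changed_ranges)
    last = max(end for start, end in changed_ranges)
    return last - first + 1

# MIDDLE_INITIAL and PHONE_NUMBER updated together.
# Adjacent layout: bytes 20-20 and 21-30 -> 11 bytes logged
adjacent = bytes_logged_fixed([(20, 20), (21, 30)])
# Separated layout: bytes 20-20 and 80-89 -> 70 bytes logged
separated = bytes_logged_fixed([(20, 20), (80, 89)])
assert adjacent == 11 and separated == 70
```

The separated layout forces DB2 to log every untouched byte in between, which is exactly the overhead the sequencing rules above are designed to avoid.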

NOTE

A varying length row is any row that contains a VARCHAR or VARGRAPHIC column or any row that is compressed. Using ALTER TABLE to add a column to an existing table also makes the rows varying length (until the table space for the table is reorganized).


Avoid Special Sequencing for Nullable Columns

Treat nullable columns the same as you would any other column. Some DBAs advise you to place nullable columns of the same data type after non-nullable columns. This is supposed to assist in administering the null columns, but in my opinion it does not. Sequencing nullable columns in this manner provides no clear benefit and should be avoided.

See the "DB2 Table Parameters" section in this chapter for additional advice on nullable columns and Chapter 2, "Data Manipulation Guidelines," for advice on accessing nullable columns.

Use Nulls with Care

A null is DB2's attempt to record missing or unknown information. When you assign a null to a column instance, it means that a value currently does not exist for the column. It's important to understand that a column assigned to null logically means one of two things: The column does not apply to this row, or the column applies to this row, but the information is not known at present.

For example, suppose that a table contains information on the hair color of employees. The HAIR_COLOR column is defined in the table as being capable of accepting nulls. Three new employees are added today: a man with black hair, a woman with unknown hair color, and a bald man. The woman with the unknown hair color and the bald man both could be assigned null HAIR_COLOR , but for different reasons. The hair color column for the woman would be null because she has hair but the color presently is unknown. The hair color column for the bald man would be null also, but this is because he has no hair and so hair color does not apply.

DB2 does not differentiate between nulls that signify unknown data and those that signify inapplicable data. This distinction must be made by the program logic of each application.

DB2 represents null in a special hidden column known as an indicator variable . An indicator variable is defined to DB2 for each column that can accept nulls. The indicator variable is transparent to an end user, but must be provided for when programming in a host language (such as COBOL or PL/I). Every column defined to a DB2 table must be designated as either allowing or disallowing nulls.

The default definition for columns in a DB2 table is to allow nulls. Nulls can be prohibited for a column by specifying the NOT NULL or NOT NULL WITH DEFAULT option in the CREATE TABLE statement.

Avoid nulls in columns that must participate in arithmetic logic (for example, DECIMAL money values). The AVG , COUNT DISTINCT , SUM , MAX , and MIN functions omit column occurrences set to null. The COUNT(*) function, however, does not omit rows containing nulls because it operates on rows. Thus, AVG is not equal to SUM/COUNT(*) when the average is being computed for a column that can contain nulls. If the COMM column is nullable, the results of the following queries are not the same:

 

 SELECT  AVG(COMM)            SELECT  SUM(COMM)/COUNT(*)
 FROM    DSN8810.EMP;         FROM    DSN8810.EMP;

For this reason, avoid nulls in columns involved in math functions.
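The discrepancy is easy to demonstrate outside of SQL. Here Python stands in for DB2's aggregate semantics, with None playing the role of null; the commission figures are made up:

```python
comm = [100.0, None, 500.0]   # one employee's commission is null

non_null = [c for c in comm if c is not None]
avg_comm = sum(non_null) / len(non_null)          # AVG(COMM): nulls omitted
sum_over_count_star = sum(non_null) / len(comm)   # SUM(COMM)/COUNT(*)

assert avg_comm == 300.0              # 600 / 2
assert sum_over_count_star == 200.0   # 600 / 3 -- a different answer
```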

When DATE , TIME , and TIMESTAMP columns can be unknown, assign them as nullable. DB2 checks to ensure that only valid dates, times, and timestamps are placed in columns defined as such. If the column can be unknown, it must be defined to be nullable because the default for these columns is the current date, current time, and current timestamp (unless explicitly defined otherwise using the DEFAULT clause). Null, therefore, is the only available option for the recording of missing dates, times, and timestamps.

For every other column, determine whether nullability can be of benefit before allowing nulls. Consider these rules:

  • When a nullable column participates in an ORDER BY or GROUP BY clause, the returned nulls are grouped at the high end of the sort order.

  • Nulls are considered to be equal when duplicates are eliminated by SELECT DISTINCT or COUNT (DISTINCT column ) .

  • A unique index considers nulls to be equivalent, so it disallows duplicate entries containing nulls, unless the WHERE NOT NULL clause is specified for the index.

  • For comparison in a SELECT statement, two null columns are not considered equal. When a nullable column participates in a predicate in the WHERE or HAVING clause, the nulls encountered cause the comparison to evaluate to UNKNOWN .

  • When a nullable column participates in a calculation, the result is null.

  • Columns that participate in a primary key cannot be null.

  • To test for the existence of nulls, use the special predicate IS NULL in the WHERE clause of the SELECT statement.

  • You cannot simply state WHERE column = NULL . You must state WHERE column IS NULL .

  • It is invalid to test if a column is < NULL , <= NULL , > NULL , or >= NULL . These are all meaningless because null is the absence of a value.

  • You can assign a column to null using the = predicate in the SET clause of the UPDATE statement.

Examine these rules closely. ORDER BY , GROUP BY , DISTINCT , and unique indexes consider nulls to be equal and handle them accordingly. The SELECT statement, however, deems the comparison of null columns to be not equivalence, but unknown. This inconsistent handling of nulls is an anomaly that you must remember when using nulls. The following are several sample SQL queries and the effects nulls have on them.

 

 SELECT   JOB, SUM(SALARY)
 FROM     DSN8810.EMP
 GROUP BY JOB;

This query returns the total salary for each type of job. All instances in which JOB is null will group at the bottom of the output.

 

 SELECT  EMPNO, PROJNO, ACTNO, EMPTIME,
         EMSTDATE, EMENDATE
 FROM    DSN8810.EMPPROJACT
 WHERE   EMSTDATE = EMENDATE;

This query retrieves all occurrences in which the project start date is equal to the project end date. This information is clearly erroneous, as anyone who has ever worked on a software development project can attest. The query does not return any rows in which either date or both dates are null for two reasons:

  • Two null columns are never equal for purposes of comparison.

  • When either column of a comparison operator is null, the result is unknown.

 

 UPDATE  DSN8810.DEPT
   SET   MGRNO = NULL
 WHERE   MGRNO = '000010';

This query sets the MGRNO column to null wherever MGRNO is currently equal to '000010' in the DEPT table.

NOTE

Nulls sometimes are inappropriately referred to as null values. Using the term value to describe a null column is incorrect because the term null implies the lack of a value. The relational model has abandoned the idea of nulls in favor of a similar concept called marks . The two types of marks are the A-mark and the I-mark . An A-mark refers to information that is applicable but presently unknown, whereas an I-mark refers to inapplicable information (information that does not apply). If DB2 implemented marks rather than nulls, the problem of differentiating between inapplicable and unknown data would disappear.

No commercial DBMS products support A-marks and I-marks.


Define Columns Across Tables in the Same Way

When a column that defines the same attribute as another column is given a different column name, data administrators refer to it as a column synonym . In general, column synonyms should be avoided except in the situations detailed in this guideline.

Every attribute should be defined in one way, that is, with one distinct name and one distinct data type and length. The name should be different only if the same attribute needs to be represented as a column in one table more than once, or if the practical meaning of the attribute differs as a column from table to table. For example, suppose that a database contains a table that holds the colors of items. This column is called Color. The same database has a table with a Preferred Color column for customers. This is the same logical attribute, but its meaning changes based on the context. It is not wise to simply call the column Color in the Customer table, because it would imply that the customer is that color!

An attribute must be defined twice in self-referencing tables and in tables requiring multiple foreign key references to a single table. In these situations, create a standard prefixing or suffixing mechanism for the multiple columns. After you define the mechanism, stick to it. For example, the DSN8810.DEPT table in Appendix A, "DB2 Sample Tables," is a self-referencing table that does not follow these recommendations. The ADMRDEPT column represents the same attribute as the DEPTNO column, but the name is not consistent. A better name for the column would have been ADMR_DEPTNO . This adds the ADMR prefix to the attribute name, DEPTNO .

The practical meaning of columns that represent the same attribute can differ from table to table as well. In the sample tables, for example, the MGRNO column in the DSN8810.DEPT table represents the same attribute as the EMPNO column in the DSN8810.EMP table. The two columns can be named differently in this situation because the employee number in the DEPT table represents a manager, whereas the employee number in the EMP table represents any employee. (Perhaps the MGRNO column should have been named MGR_EMPNO .)

The sample tables provide another example of when this guideline should have been followed, but wasn't. Consider the same two tables: DSN8810.DEPT and DSN8810.EMP . Both contain the department number attribute. In the DEPT table, the column representing this attribute is DEPTNO , but in the EMP table, the column is WORKDEPT . This is confusing and should be avoided. In this instance, both should have been named DEPTNO .

Never use homonyms. A homonym , in DB2-column terminology, is a column that is spelled and pronounced the same as another column, but represents a different attribute.

Avoid Duplicate Rows

To conform to the relational model, every DB2 table should prohibit duplicate rows. Duplicate rows cause ambiguity and add no value.

If duplicates exist for an entity, either the entity has not been rigorously defined and normalized or a simple counter column can be added to the table. The counter column would contain a number indicating the number of duplicates for the given row.

Define a Primary Key

To assist in the unique identification of rows, define a primary (or unique) key for every DB2 table. The preferred way to define a primary key is with the PRIMARY KEY clause of the CREATE TABLE statement.

Consider Using a Surrogate Key

When the primary key for a table is very large it can be unwieldy to implement. For example, an eight-column key of 250 bytes could pose problems if there are numerous foreign keys that refer to it. If the primary key is defined using eight columns and 250 bytes, then every foreign key also will consume 250 bytes and use eight columns.

If the length of the primary key is impractical to implement, consider defining a surrogate key. You might be able to use a DB2 TIMESTAMP or a SEQUENCE object to generate unique surrogate key values.

Use Appropriate DB2 Data Types

Use the appropriate DB2 data type when defining table columns. Recall the list of valid DB2 data types presented earlier in this section. Some people may advise you to avoid certain DB2 data types; this is unwise. Follow these rules:

  • Use the DB2 DATE data type to represent all dates. Do not use a character or numeric representation of the date.

  • Use the DB2 TIME data type to represent all times. Do not use a character or numeric representation of the time.

  • Favor the use of the DB2 TIMESTAMP data type when the date and time are always needed together, but rarely needed alone. Do not use a character or numeric representation of the timestamp. For additional insight, see the tradeoffs listed in the upcoming "Analyze DATE and TIME Columns Versus TIMESTAMP Columns" section.

  • Using INTEGER and SMALLINT data types is interchangeable with using the DECIMAL data type without scale. Specifying DECIMAL without scale sometimes is preferable to INTEGER and SMALLINT because it provides more control over the domain of the column. However, DECIMAL without scale might use additional DASD. For additional insight, see the tradeoffs listed in the upcoming "Consider All Options When Defining Columns As INTEGER" section.

  • When the data item is always numeric (and numeric only), use a numeric data type. Even if leading zeroes must be stored or reported, using the character data type is rarely acceptable. You can use program logic and reporting software to display numeric data with leading zeroes. Storing the data as a numeric data type has the benefit of providing automatic DB2 data integrity checking (non-numeric data can never be stored in a column defined with a numeric data type).

  • Remember, DB2 uses the cardinality of a column to determine its filter factors used during access path selection. The specification of column data types can influence this access path selection.

There are more possible character (alphanumeric) values than there are numeric values for columns of equal length. For example, consider the following two columns:

 

 COLUMN1   SMALLINT  NOT NULL
 COLUMN2   CHAR(5)   NOT NULL

COLUMN1 can contain values only in the range -32,768 to 32,767, for a total of 65,536 possible values. COLUMN2 , however, can contain all the permutations and combinations of legal alphabetic characters, special characters, and numerals. So you can see how defining numeric data as a numeric data type usually results in a more accurate access path selection by the DB2 optimizer; the specified domain is more accurate for filter factor calculations.

Analyze DATE and TIME Columns Versus TIMESTAMP Columns

When defining tables that require a date and time stamp, two solutions are available:

  • Coding two columns, one as a DATE data type and the other as a TIME data type

  • Coding one column specifying the TIMESTAMP data type

Each option has its benefits and drawbacks. Before choosing an approach, consider the following issues:

  • With DATE and TIME you must use two columns. TIMESTAMP uses one column, thereby simplifying data access and modification.

  • The combination of DATE and TIME columns requires 7 bytes of storage, while a TIMESTAMP column always requires 10 bytes of storage. Using the combination of DATE and TIME columns can save space.

  • TIMESTAMP provides greater time accuracy, down to the microsecond level. TIME provides accuracy only to the second level. If precision is important, use TIMESTAMP ; otherwise consider the combination of DATE and TIME .

  • Date and time arithmetic can be easier to implement using TIMESTAMP data instead of a combination of DATE and TIME . Subtracting one TIMESTAMP from another results in a TIMESTAMP duration. To calculate a duration using DATE and TIME columns, two subtraction operations must occur: one for the DATE column and one for the TIME column.

  • DB2 provides for the formatting of DATE and TIME columns via local DATE and TIME exits, the CHAR function, and the DATE and TIME precompiler options. These facilities are not available for TIMESTAMP columns. If the date and time information is to be extracted and displayed on a report or by an online application, the availability of these DB2-provided facilities for DATE and TIME columns should be considered when making your decision.
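The arithmetic point in the list above can be illustrated concretely. Here Python's datetime module stands in for DB2 date/time semantics, and the sample dates and times are made up:

```python
from datetime import datetime, date, time

# TIMESTAMP approach: one column, one subtraction
start_ts = datetime(2004, 6, 1, 8, 30, 0)
end_ts   = datetime(2004, 6, 3, 17, 45, 0)
duration = end_ts - start_ts            # single operation

# DATE + TIME approach: two columns, two subtractions,
# plus carry/borrow logic when the times wrap past midnight
start_d, start_t = date(2004, 6, 1), time(8, 30, 0)
end_d,   end_t   = date(2004, 6, 3), time(17, 45, 0)
day_part  = (end_d - start_d).days
time_part = (datetime.combine(end_d, end_t)
             - datetime.combine(end_d, start_t))

assert duration.days == 2               # 2 days, 9 hours, 15 minutes
```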

Consider the Display Format for DATE and TIME Data

DB2 provides four options for displaying DATE and TIME data, as shown in Table 5.8. Each format conforms to a standard means of displaying date and time data: EUR is the European Standard, ISO is the International Organization for Standardization format, JIS is the Japanese Industrial Standard, and USA is the IBM United States of America Standard.

Table 5.8. DB2 Date and Time Formats

Format   Date         Time
------   ----------   --------------
EUR      DD.MM.YYYY   HH.MM.SS
ISO      YYYY-MM-DD   HH.MM.SS
JIS      YYYY-MM-DD   HH:MM:SS
USA      MM/DD/YYYY   HH:MM AM or PM


DB2 also allows for the creation of an installation-defined date format. One of these formats is chosen as the default standard for your DB2 subsystem at installation time. The default is ISO . Any format can be displayed using the CHAR() function (previously described in Chapter 3, "Using DB2 Functions").

CAUTION

Avoid choosing the USA format as the default. The USA format causes TIME data to be displayed without the seconds component, instead appending an AM or PM indicator. EUR , ISO , and JIS all display TIME in military format, specifying 1 through 24 for the hour. The USA format does not, instead specifying 1 through 12 for the hour and using AM and PM to designate morning and evening times.

If the default format is USA , TIME columns will be displayed without seconds and with the AM or PM extension. When data is unloaded using DSNTIAUL , the seconds information is lost. This can result in data integrity problems if the unloaded data is subsequently loaded to another table or used as input for other processes.
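If the subsystem default must remain USA , seconds can still be preserved by requesting a different format explicitly in the unload SELECT. A sketch, assuming a hypothetical table SCHEDULE_TABLE with a TIME column named START_TIME :

```sql
-- Request JIS format explicitly so the seconds component is retained
SELECT CHAR(START_TIME, JIS)   -- HH:MM:SS instead of HH:MM AM/PM
FROM   SCHEDULE_TABLE;
```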


Consider Optimization When Choosing Data Type

The impact on optimization is another consideration when deciding whether to use a character or a numeric data type for a numeric column.

Consider, for example, a column that must store four-digit numeric codes. This can be supported using a CHAR(4) data type or a SMALLINT data type. Oftentimes, the desire to use CHAR(4) is driven by the need to display leading zeroes on reports.

Data integrity will not be an issue assuming that all data is edit-checked prior to insertion into the column (a big assumption). But even if edit checks are coded, DB2 is not aware of them and assumes that all combinations of characters are permitted. For access path determination on character columns, DB2 uses base 37 math. This assumes that one of the 26 alphabetic letters, the 10 numeric digits, or a space will usually be used, adding up to 37 possible characters. For a four-byte character column there are 37^4, or 1,874,161, possible values.

A SMALLINT column can range from -32,768 to 32,767, producing 65,536 possible small integer values. The drawback here is that negative or five-digit product codes could be entered. However, if we adhere to our proper edit check assumption, the data integrity problems will be avoided here, as well.

DB2 will use the HIGH2KEY and LOW2KEY values to calculate filter factors. For character columns, the range between HIGH2KEY and LOW2KEY is larger than for numeric columns because there are more total values. The filter factor will be larger for the numeric data type than for the character data type, which might influence DB2 to choose a different access path. For this reason, favor the SMALLINT over the CHAR(4) definition.
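The recommendation can be sketched in DDL; the PRODUCT table and PROD_CODE column names here are hypothetical. The DIGITS function can supply the leading zeroes at display time that often motivate the CHAR(4) choice:

```sql
-- Favor the numeric data type for numeric codes
CREATE TABLE PRODUCT
  (PROD_CODE   SMALLINT     NOT NULL,
   PROD_NAME   VARCHAR(40)  NOT NULL);

-- Leading zeroes for reports can be produced at display time;
-- DIGITS returns a CHAR(5) result for a SMALLINT argument
SELECT DIGITS(PROD_CODE), PROD_NAME
FROM   PRODUCT;
```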

Choose a Data Type Closest to the Desired Domain

It is always best to choose the data type for each column to be the one that is closest to its domain. By doing so, DB2 will perform data integrity checking that otherwise would need to be coded into application programs or CHECK constraints. For example, if you are storing numeric data in the column, do not choose a character data type. In general, adhere to the following rules:

  • If the data is numeric, favor SMALLINT , INTEGER , or DECIMAL data types. FLOAT is also an option.

  • If the data is character, use CHAR or VARCHAR data types.

  • If the data is date and time, use DATE , TIME , and TIMESTAMP data types.

  • If the data is multimedia, use GRAPHIC , VARGRAPHIC , BLOB , CLOB , or DBCLOB data types.
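Applying these rules, a sketch of a table definition (the table and column names are hypothetical; a BLOB column additionally requires auxiliary LOB objects, as discussed in Chapter 9):

```sql
CREATE TABLE ORDER_ITEM
  (ORDER_NO    INTEGER       NOT NULL,   -- numeric: not CHAR
   ITEM_DESC   VARCHAR(100)  NOT NULL,   -- character: VARCHAR
   ORDER_DATE  DATE          NOT NULL,   -- date: not CHAR(10)
   UNIT_PRICE  DECIMAL(9,2)  NOT NULL,   -- exact numeric with scale
   ITEM_PHOTO  BLOB(1M));                -- multimedia: LOB
```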

Specify Appropriate Defaults

When a row is inserted or loaded into a table and no value is specified for a column but the column has a default specified, the column will be set to the value that has been identified in the column default specification. Two types of defaults are available: system-defined and user-defined.

Each column can have a default value specifically tailored to it. These are known as user-defined defaults. DB2 also provides specific system-defined defaults for each data type, which are used if an explicit default value is not specified. The system-defined defaults are outlined in Table 5.9.

Table 5.9. System-Defined Column Default Values

Data Type               Default Value
Numeric                 Zero
Fixed-length String     Blanks
Varying-length String   String of length zero
Row identifier          Actual ROWID for the row
Date                    Current date
Time                    Current time
Timestamp               Current timestamp


For existing rows, when a non-nullable column is added to a table, DATE , TIME , and TIMESTAMP data types default to the lowest possible value instead of the current value. DATE types will default to January 1, 0001 ; TIME types will default to 0:00:00 ; and TIMESTAMP types will default to a date of January 1, 0001 and a time of 0:00:00.000000 .
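For example, adding a non-nullable DATE column to the sample employee table (the column name here is hypothetical) populates the existing rows with the lowest possible date:

```sql
-- Existing rows receive '0001-01-01', not the current date
ALTER TABLE DSN8810.EMP
  ADD NEXT_REVIEW_DATE DATE NOT NULL WITH DEFAULT;
```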

Four options are available for user-defined defaults: a constant value, USER , CURRENT SQLID , and NULL . When specifying a constant, the value must conform to the column on which it is defined. Specifying USER causes the column to default to the contents of the USER special register. When CURRENT SQLID is specified, the default value will be the SQL authid of the process performing the INSERT . NULL is self-explanatory.

In general, it is best to explicitly define the default value to be used for each column. If the system-defined default values are adequate for your application, it is fine to use them by not providing a value following the DEFAULT clause. Consider the following column definitions:

 

 BONUS       DECIMAL(9,2)  DEFAULT 500.00,
 COMM        DECIMAL(9,2)  NOT NULL WITH DEFAULT,

If a row is inserted without specifying BONUS and COMM , BONUS will default to 500.00 and COMM will default to zero.
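The special-register defaults are useful for audit columns. A sketch with hypothetical column names (the length of CREATED_SQLID may vary by DB2 version):

```sql
 CREATED_BY    CHAR(8)    NOT NULL WITH DEFAULT USER,
 CREATED_SQLID CHAR(8)    NOT NULL WITH DEFAULT CURRENT SQLID,
 CREATED_TS    TIMESTAMP  NOT NULL WITH DEFAULT,  -- current timestamp
```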

Choose VARCHAR Columns Carefully

You can save DASD storage space by using variable columns instead of placing small amounts of data in a large fixed space. Each variable column carries a 2-byte overhead, however, for storing the length of the data. Additionally, variable columns tend to increase CPU usage and can cause the update process to become inefficient. When a variable column is updated with a larger value, the row becomes larger; if not enough space is available to store the row, it must be moved to another page. This makes the update and any subsequent retrieval slower.

Follow these rules when defining variable character columns:

  • Avoid variable columns if a sufficient amount of DASD is available to store the data using fixed columns.

  • Do not define a variable column if its maximum length is less than 30 bytes. Instead, simply define the column as a fixed length CHAR (or GRAPHIC ) column.

  • Do not define a variable column if its maximum length is within 10 bytes of the average length of the column.

  • Do not define a variable column when the data does not vary from row to row.

  • Place variable columns at the end of the row, but before columns that are frequently updated. Refer back to the "Sequence Columns to Achieve Optimal Performance" section for more details.

  • Consider redefining variable columns by placing multiple rows of fixed-length columns in another table or by shortening the columns and placing the overflow in another table.

Compression Versus VARCHAR Columns

Using DB2 compression, you can achieve similar results as with VARCHAR columns. However, DB2 compression avoids the two bytes of overhead and requires no programmatic intervention for handling the two-byte column length information.

On the other hand, VARCHAR columns affect data for the column only. With compression, the entire row is affected. Therefore, there is a greater chance that an UPDATE of a compressed row will need to be relocated to another page if its size has increased.

Altering VARCHAR Columns

As of DB2 V6, you can ALTER the length of a VARCHAR column to a greater length. However, you cannot ALTER the length of a VARCHAR column to a smaller length.
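A sketch of such an expansion, using the sample project table (the new length is arbitrary; verify the exact ALTER syntax against your DB2 version):

```sql
-- Lengthening PROJNAME from VARCHAR(24); shrinking it is not permitted
ALTER TABLE DSN8810.PROJ
  ALTER COLUMN PROJNAME SET DATA TYPE VARCHAR(50);
```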

Monitor the Effectiveness of Variable Columns

Using views and SQL, it is possible to query the DB2 Catalog to determine the effectiveness of using VARCHAR for a column instead of CHAR . Consider, for example, the PROJNAME column of the DSN8810.PROJ table. It is defined as VARCHAR(24) .

To gauge whether VARCHAR is appropriate follow these steps:

  1. Create a view that returns the length of the PROJNAME column for every row, for example:

     

     CREATE VIEW PROJNAME_LENGTH
         (COL_LGTH)
     AS
     SELECT LENGTH(PROJNAME)
     FROM   DSN8810.PROJ;

  2. Issue the following query using SPUFI to produce a report detailing the LENGTH and number of occurrences for that length:

     

     SELECT   COL_LGTH, COUNT(*)
     FROM     PROJNAME_LENGTH
     GROUP BY COL_LGTH
     ORDER BY COL_LGTH;

This query will produce a report listing the lengths (in this case, from 1 to 24, excluding those lengths which do not occur) and the number of times that each length occurs in the table. These results can be analyzed to determine the range of lengths stored within the variable column.

If you are not concerned about this level of detail, the following query can be used instead to summarize the space characteristics of the variable column in question:

 

 SELECT  24*COUNT(*),
         24,
         SUM(2+LENGTH(PROJNAME)),
         AVG(2+LENGTH(PROJNAME)),
         24*COUNT(*)-SUM(2+LENGTH(PROJNAME)),
         24-AVG(2+LENGTH(PROJNAME))
 FROM    DSN8810.PROJ;

The constant 24 will need to be changed in the query to indicate the maximum length of the variable column as defined in the DDL. The individual columns returned by this report are defined in the following list:

Definition                          Calculation
Space used as CHAR(24)              24*COUNT(*)
Average space used as CHAR(24)      24
Space used as VARCHAR(24)           SUM(2+LENGTH(PROJNAME))
Average space used as VARCHAR(24)   AVG(2+LENGTH(PROJNAME))
Total space saved                   24*COUNT(*)-SUM(2+LENGTH(PROJNAME))
Average space saved                 24-AVG(2+LENGTH(PROJNAME))


Consider Using Odd DECIMAL Precision

Think about making the precision of all DECIMAL columns odd. This can provide an extra digit for the column being defined without using additional storage. For example, consider a column that must have a precision of 6 with a scale of 2. This would be defined as DECIMAL(6,2) . By defining the column as DECIMAL(7,2) instead, numbers up to 99999.99 can be stored instead of numbers up to 9999.99. This can save future expansion efforts.

However, if you must ensure that the data in the column conforms to the specified domain (that is, even precision), specify even precision.
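The storage equivalence follows from DECIMAL's packed format, which occupies (precision/2) + 1 bytes (truncated), so precisions 6 and 7 both occupy 4 bytes. A sketch with hypothetical column names:

```sql
 AMOUNT_EVEN  DECIMAL(6,2)  NOT NULL,   -- 4 bytes, up to 9999.99
 AMOUNT_ODD   DECIMAL(7,2)  NOT NULL,   -- 4 bytes, up to 99999.99
```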

Consider All Options When Defining Columns As INTEGER

Use SMALLINT instead of INTEGER when the -32,768 to 32,767 range of values is appropriate. This data type usually is a good choice for sequencing-type columns. The range of allowable values for the INTEGER data type is -2,147,483,648 to 2,147,483,647. These ranges might seem arbitrary, but they are designed to store the maximum amount of information in the minimum amount of space. A SMALLINT column occupies 2 bytes, and an INTEGER column occupies 4 bytes.

The alternative to SMALLINT and INTEGER data types is DECIMAL with a 0 scale. DECIMAL(5,0) supports the same range as SMALLINT , and DECIMAL(10,0) supports the same range as INTEGER . The DECIMAL equivalent of SMALLINT occupies 3 bytes of storage but permits values as large as 99,999 instead of only 32,767. The DECIMAL equivalent of INTEGER occupies 6 bytes but permits values as large as 9,999,999,999 instead of 2,147,483,647.

When deciding whether to use DECIMAL without scale to represent integer columns, another factor is control over the domain of acceptable values. The domain of SMALLINT and INTEGER columns is indicated by the range of allowable values for their respective data types. If you must ensure conformance to a domain, DECIMAL without scale provides the better control.

Suppose that you code a column called DAYS_ABSENT that indicates the number of days absent for employees in the DSN8810.EMP table. Suppose, too, that an employee cannot miss more than five days per year without being disciplined and that no one misses ten or more days. In this case, a single-digit integer column could support the requirements for DAYS_ABSENT . A DECIMAL(1,0) column would occupy 1 byte of physical storage and provide for values ranging from -9 to 9. By contrast, a SMALLINT column would occupy 2 bytes of physical storage and provide for values ranging from -32,768 to 32,767. The DECIMAL(1,0) column, however, more closely matches the domain for the DAYS_ABSENT column.
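A sketch of the DAYS_ABSENT definition; the CHECK constraint shown is an addition that narrows the domain still further, not part of the original example:

```sql
 DAYS_ABSENT  DECIMAL(1,0)  NOT NULL WITH DEFAULT
              CHECK (DAYS_ABSENT BETWEEN 0 AND 9),
```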

One final consideration: A decimal point is required with DECIMAL data, even when the data has no scale. For example, the integer 5 is expressed as 5. when it is treated as DECIMAL data. This can be confusing to programmers and users who are accustomed to dealing with integer data without a decimal point.

Consider all these factors when deciding whether to implement SMALLINT , INTEGER , or DECIMAL data types for integer columns.

Use LOBs for Large Multimedia Objects

As of V6, DB2 provides support for large multimedia data types. Using BLOB , CLOB , and DBCLOB data types, DB2 can be used to store complex, unstructured data, such as images, audio, text, and video. Such data types can be used to store much larger multimedia data than can be stored in VARGRAPHIC or VARCHAR columns. Of course, these data types need to be treated differently by application programs and there are management considerations (the larger they become the more difficult it is to administer the data). For more details on the object/relational capabilities of DB2, consult Chapter 9, "Large Objects and Object/Relational Databases."



DB2 Developer's Guide (5th Edition)
ISBN: 0672326132
Year: 2004
Pages: 388