The table is the basic unit of data that is accessible when using SQL. Data is inserted into, deleted from, and updated within DB2 tables. Once populated, data can be selected from a DB2 table. Basically, the table is the means by which end users gain access to DB2 data. In general, you will define one table for each entity for which you will be storing data. A table can be thought of as a grouping of attributes that identify a physical entity, and the table name should conform to the entity name. For example, consider the sample table for employees, DSN8810.EMP. EMP is the name of the table that represents an entity known as "employee." An employee has many attributes, some of which are EMPNO, FIRSTNME, and LASTNAME. These attributes are columns of the table. When you create one table for each entity, the tables are easy to identify and use because they represent real-world "things." Of course, at times, the simple rule of one "physical" table per "logical" entity will need to be broken. This usually occurs when you need to denormalize for performance reasons. More details on denormalization are provided later in this chapter in the section "Denormalization."

DB2 Table Parameters

The preceding section concentrated on the rows and columns of a DB2 table. Other parameters also must be considered when creating DB2 tables. This section provides guidelines to assist you in your table creation endeavors.

Encoding Scheme

The CCSID parameter can be used to specify ASCII, EBCDIC, or UNICODE encoding at the table level as well as at the table space level. All data stored within a table space must use the same encoding scheme. Any indexes defined for tables in the table space will have the same encoding scheme as the table space.
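As a sketch (the table, column, and table space names are illustrative, not from the text), the encoding scheme can be specified when the table is created; the CCSID chosen must match the encoding scheme of the table space:

```sql
-- Hypothetical table; CCSID UNICODE must agree with the table space's encoding
CREATE TABLE USERID.CUST_INFO
  (CUST_NO    INTEGER      NOT NULL,
   CUST_NAME  VARCHAR(40)  NOT NULL)
  IN DB.TS
  CCSID UNICODE;
```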
DROP Restriction

To prohibit inadvertent table drops, use the WITH RESTRICT ON DROP clause of the CREATE TABLE statement. When WITH RESTRICT ON DROP is specified, drops cannot be issued for the table, its table space, or its database. To subsequently drop the table, it must first be altered to remove the RESTRICT ON DROP specification.

DB2-Enforced Table Auditing

If you must audit user access to DB2 tables, you can specify an audit rule for your tables. Although the auditing features of DB2 are rudimentary, sometimes they are useful. DB2 has three table audit options: NONE, CHANGES, and ALL. DB2 table auditing is done on a unit-of-work basis only. DB2 audits only the first table access of any particular type for each unit of work, not every table access. AUDIT CHANGES writes an audit trace record for the first insert, update, and delete made by each unit of work. AUDIT ALL writes an audit trace record for the first select, insert, update, and delete made by each unit of work. By specifying AUDIT NONE or by failing to code an audit parameter, table auditing is deactivated. Before deciding to audit DB2 table access, consider that table auditing incurs overhead: each time a table is accessed in a new unit of work, an audit trace record is written. Additionally, even if auditing has been specified for a given table, no audit trace records are written unless the appropriate DB2 audit trace classes are activated. For AUDIT CHANGES, activate audit trace classes 1, 2, 3, 4, 7, and 8. For AUDIT ALL, activate audit trace classes 1 through 8.

NOTE

Keep in mind that the ALTER TABLE statement itself is audited only if AUDIT CHANGES or AUDIT ALL is specified and the appropriate audit trace class has been activated. In general, to alleviate overhead, do not audit table access unless your application absolutely requires it.
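The drop restriction might be applied at creation time and removed later, as in this sketch (table and table space names are illustrative):

```sql
-- Protect the table (and, by extension, its table space and database) from DROP
CREATE TABLE USERID.PAYROLL
  (EMP_NO  CHAR(6)       NOT NULL,
   SALARY  DECIMAL(9,2)  NOT NULL)
  IN DB.TS
  WITH RESTRICT ON DROP;

-- Later, remove the restriction so that the table can be dropped
ALTER TABLE USERID.PAYROLL
  DROP RESTRICT ON DROP;
```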
Specifying Your Own OBID

You can use the OBID parameter to explicitly specify a particular OBID (object identifier) for DB2 to use for the table being created. An OBID is an identifier assigned by DB2 for its own internal use and identification purposes. You might choose to specify an OBID if you are re-creating a table that used to exist and you wish to keep the previous OBID that was assigned to the table. For example, if you are dropping a table but will be re-creating it with changes, you may wish to keep the previous OBID for that table. To obtain the current OBID of any table, you can query the DB2 Catalog as follows (supplying the CREATOR and NAME of the table for the question marks in the query):

SELECT OBID
FROM   SYSIBM.SYSTABLES
WHERE  CREATOR = ?
AND    NAME = ?;

You cannot use the OBID parameter to assign an OBID that is already assigned to an existing DB2 table.

Augmenting the Log with Changed Information

If you are using a data propagation technology such as IBM's DataPropagator product, you may have to use the DATA CAPTURE parameter to augment the information captured on the DB2 log. There are two options for DATA CAPTURE: NONE and CHANGES. DATA CAPTURE NONE, which is the default, specifies that no additional information is to be recorded on the DB2 log. DATA CAPTURE CHANGES, however, will cause DB2 to write additional information about inserted, updated, and deleted data. When a data propagation tool uses the DB2 log to capture changed information, this additional data may be required to ensure the validity of the propagated data.

NOTE

DATA CAPTURE CHANGES will not capture additional information for LOB columns.

Temporary Tables

Most DB2 tables are permanent, meaning that once created, they exist until an authorized user drops them. At times, though, you may need to create DB2 tables that exist only for the duration of a program run. Such a table is known as a temporary table.
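For example, an existing table might be altered to begin augmenting the log for propagation (the table name is illustrative):

```sql
-- Start writing the additional log information needed by propagation tools
ALTER TABLE USERID.ACCOUNT
  DATA CAPTURE CHANGES;
```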
DB2 has provided the capability to create temporary tables since Version 5. But the initial functionality was practical only in certain circumstances due to some inherent limitations. This first type of temporary table is known as a created (or global) temporary table.
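As a sketch (table and column names are illustrative), a created temporary table is defined with DDL such as the following; the definition is recorded in the DB2 Catalog, but no instance of the table is materialized until a program references it:

```sql
CREATE GLOBAL TEMPORARY TABLE USERID.TEMP_RESULTS
  (ACCT_NO  CHAR(10)      NOT NULL,
   BALANCE  DECIMAL(11,2) NOT NULL);
```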
Why Use Temporary Tables?

Before we investigate these two types of temporary tables, let's first address why anyone would want or need to use a temporary table in the first place. One potential use of temporary tables is to store intermediate SQL results during a program run. Consider, for example, if the results of a first query need to be used in a subsequent query. Instead of rerunning the first query (or combining it with the subsequent query), the results of the first query can be stored in a temporary table. Then the temporary table can be joined into the second query without incurring the overhead of rerunning the first query. This is particularly useful if the first query is complex or inefficient. Or what about result sets that need to be returned more than once by the same program? Consider this scenario: A complex multi-table join is coded that consumes a lot of resources to run. Furthermore, that join statement needs to be run three times during the course of the program. Instead of running the join three times, you can run it once and populate a temporary table with the results. The next two times you can simply read the temporary table, which might be more efficient than re-executing the complex, resource-consuming multi-table join. Temporary tables also are useful for enabling non-relational data to be processed using SQL. For example, you can create a global temporary table that is populated with IMS data (or any other non-relational data source) by a program. Then, during the course of that program, the temporary table containing the IMS data can be accessed by SQL statements and even joined to other DB2 tables. The same could be done for data from a flat file, VSAM, IDMS, or any other non-relational data. Another reason for IBM's inclusion of temporary table support in DB2 is to make conversion from other relational products easier. Microsoft SQL Server and Oracle both have supported temporary tables for quite some time now.
Without such support in DB2, it was very difficult for developers to convert or port their Oracle or SQL Server applications to DB2. IBM alleviated this problem by enabling temporary table support in DB2. Now let's examine the two types of temporary tables supported by DB2.

Created Temporary Tables

A created temporary table exists only as long as the process that uses it. Temporary tables are created using the CREATE GLOBAL TEMPORARY TABLE statement. When created, the schema for the table is stored in the DB2 system catalog (SYSIBM.SYSTABLES) just like any other table, but the TYPE column is set to 'G' to indicate a global temporary table. Created temporary tables are sometimes referred to as global temporary tables, but this is confusing because declared temporary tables are also referred to as global when they are created. It is important to remember that a created global temporary table must be created using a DDL CREATE statement before it can be used in any program. A created temporary table is instantiated when it is referenced in an OPEN, SELECT INTO, INSERT, or DELETE statement, not when it is created. Each application process that uses the temporary table creates a new instance of the table for its use. When using a temporary table, keep the following in mind:
Work file data sets are used to manage the data of created temporary tables. The work database (DSNDB07) is used as storage for processing SQL statements that require working storage, not just for created temporary tables. If you are using created temporary tables, be sure to examine the DB2 Installation Guide for tactics to estimate the disk storage required for temporary work files. When a temporary work file result table is populated using an INSERT statement, it uses work file space. No other process can use the same work file space as that temporary work file table until the table goes away. The space is reclaimed when the application process commits or rolls back, or when it is deallocated, depending on which RELEASE option was used when the plan or package was bound. It is a good idea to keep the work files in a separate buffer pool to make them easier to monitor. IFCID 0311 in performance trace class 8 can be used to distinguish these tables from other uses of the work file.

Declared Temporary Tables
This newer type of temporary table is different from a created temporary table and overcomes many of its limitations. The first significant difference between declared and created temporary tables is that declared temporary tables are specified using a DECLARE statement in an application program, not using a DDL CREATE statement. Because they are not persistent, they do not have descriptions in the DB2 Catalog. Additionally, declared temporary tables offer significant features and functionality not provided by created temporary tables.
So, declared temporary tables offer much more flexibility and functionality than created temporary tables. To "create" an instance of a declared temporary table, you must issue the DECLARE GLOBAL TEMPORARY TABLE statement inside of an application program. That instance of the declared temporary table is known only to the process that issues the DECLARE statement. Multiple concurrent programs can be executed using the same declared temporary table name because each program will have its own copy of the temporary table. But there is more work required to use a declared temporary table than there is to use a created temporary table. Before you can declare temporary tables, you must create a temporary database and table spaces for them to use. This is accomplished by specifying the AS TEMP clause on a CREATE DATABASE statement. Then, you must create segmented table spaces in the temporary database. Only one temporary database for declared temporary tables is permitted per DB2 subsystem.

CAUTION

Everyone has the CREATETAB privilege in the temporary database by default, and you cannot REVOKE this privilege. It is implicitly granted when the temporary database is created.

When a DECLARE GLOBAL TEMPORARY TABLE statement is issued, DB2 will create an empty instance of the temporary table in the temporary table space. INSERT statements are used to populate the temporary table. Once inserted, the data can be accessed, modified, or deleted. When the program completes, DB2 will drop the instance of the temporary table. Also, be aware that users of temporary tables must have been granted USE authority on the temporary table space.
The following example shows a DECLARE statement that can be issued from an application program (assuming the temporary database and table spaces already have been defined):

DECLARE GLOBAL TEMPORARY TABLE TEMP_EMP
  (EMPNO     CHAR(6)     NOT NULL,
   FIRSTNME  VARCHAR(12) NOT NULL,
   MIDINIT   CHAR(1)     NOT NULL,
   LASTNAME  VARCHAR(15) NOT NULL,
   WORKDEPT  CHAR(3),
   PHONENO   CHAR(4));

This creates a declared temporary table named TEMP_EMP. Additionally, you can use the LIKE clause to DECLARE a temporary table that uses the same schema definition as another currently existing table. You can use the INCLUDING IDENTITY COLUMN ATTRIBUTES clause to copy the IDENTITY columns as well. For example:

DECLARE GLOBAL TEMPORARY TABLE TEMP_PROJ
  LIKE DSN8810.PROJ
  INCLUDING IDENTITY
  ON COMMIT PRESERVE ROWS;

This example shows how to use the INCLUDING IDENTITY clause. However, the sample table DSN8810.PROJ does not use an IDENTITY column, so this statement would not work; it is shown as an example only. Identity columns are covered later in this chapter in the section titled "Sequence Objects and Identity Columns." Notice also the ON COMMIT PRESERVE ROWS clause in the previous example. The ON COMMIT clause specifies what action DB2 is to take with the data in the declared temporary table when the program issues a COMMIT statement. There are two options: PRESERVE ROWS or DELETE ROWS. Specifying PRESERVE ROWS indicates that the rows of the table are to be kept. Beware, though, that the PRESERVE ROWS option impacts thread reuse. You will not be able to reuse threads for any application process that contains, at its most recent COMMIT, an active declared temporary table defined using the PRESERVE ROWS option of the ON COMMIT clause. The other option, which is the default, is ON COMMIT DELETE ROWS. In that case, all of the rows of the table are deleted as long as there are no cursors defined using WITH HOLD.

NOTE
Declared Temporary Table Storage

Before using declared temporary tables, the temporary database and temporary table spaces must be defined to store the temporary data. For example:

CREATE DATABASE TEMPDB AS TEMP;

CREATE TABLESPACE TEMPTS
  IN TEMPDB
  SEGSIZE 4
  BUFFERPOOL BP7;

The table space is created as a temporary table space by virtue of it being in the temporary database. The page size of the temporary table space must be large enough to hold the longest row in the declared temporary table. The size of a row in the declared temporary table might be considerably larger than the size of the row in the table for which the scrollable cursor is used. As with a regular table, the size of the row depends on the number of columns that are stored in the declared temporary table and the size of each column. Consider creating a temporary table space for every page size. The buffer pool assignment determines the page size. An in-depth discussion of calculating the storage requirements for a temporary table space for a declared temporary table is provided in the DB2 Installation Guide. Be sure to refer to that manual before implementing declared temporary tables or any features that rely on declared temporary tables (for example, scrollable cursors).

NOTE

Keep in mind that when there is more than one temporary table space defined to the DB2 subsystem, DB2 will select which temporary table spaces it will use for scrollable cursor processing. More information on scrollable cursors is provided in Chapter 11, "Using DB2 in an Application Program."

Temporary Table Guidelines

The following guidelines are provided for your assistance as you implement temporary tables for your DB2 applications.

Favor Declared Temporary Tables

For applications that require temporary tables, favor declared temporary tables over created temporary tables. Declared temporary tables are more flexible and functional than created temporary tables.
When using declared temporary tables, you can index them, define CHECK constraints for them, and issue UPDATE statements and positioned DELETE statements against them, none of which can be done using created temporary tables.

When to Consider Created Temporary Tables

With all of the limitations of created temporary tables, why would anyone still want to use them instead of declared temporary tables? Well, there are a few potential problems with declared temporary tables, too. First of all, the DB2 Catalog (SYSIBM.SYSPACKDEP) will not show dependencies for declared temporary tables, but it will for created temporary tables. Second, some DBAs are leery of allowing database structures to be created by application programmers inside of an application program. With limited DDL and database design knowledge, it may not be wise to trust programmers to get the table structure correct. Furthermore, the additional management of the temporary database and table spaces can become an administrative burden. So, created temporary tables are still useful, in the right situations. They should be considered primarily when no updating of the temporary data is needed and access to the temporary data is purely sequential.

Use Temporary Tables with Stored Procedures

Temporary tables are most useful when a large result set must be returned from a stored procedure. Refer to Chapter 15, "Using DB2 Stored Procedures," for in-depth guidelines on using stored procedures.

Use SQL to Access Non-Relational Data

Temporary tables are useful for enabling non-relational data to be processed using SQL. For example, you can create a temporary table (declared or created) and then populate it with IMS data in your application program. Then, during the course of that program, the temporary table containing the IMS data can be accessed by SQL statements and even joined to other DB2 tables.
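As a sketch (all names and host variables are illustrative), a program might declare a temporary table, populate it with rows read from IMS, and then join that non-relational data to a regular DB2 table. Note that declared temporary tables are referenced with the SESSION qualifier:

```sql
DECLARE GLOBAL TEMPORARY TABLE SESSION.IMS_CUST
  (CUST_NO       CHAR(10)     NOT NULL,
   SEGMENT_DATA  VARCHAR(100))
  ON COMMIT PRESERVE ROWS;

-- The program inserts a row for each IMS segment it reads...
INSERT INTO SESSION.IMS_CUST
VALUES (:CUSTNO, :SEGDATA);

-- ...and can then join the IMS data to DB2 tables with ordinary SQL
SELECT C.CUST_NO, C.SEGMENT_DATA, A.BALANCE
FROM   SESSION.IMS_CUST C,
       USERID.ACCOUNT   A
WHERE  C.CUST_NO = A.CUST_NO;
```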
General Table Guidelines

The following guidelines provide helpful hints for you to follow as you implement and manage tables within your DB2 applications and systems.

Use LIKE to Duplicate a Table's Schema

Use the LIKE clause to create a table with the same columns as another table. The following SQL creates a new table OLD_PROJ using the PROJ table as a template:

CREATE TABLE DSN8810.OLD_PROJ
  LIKE DSN8810.PROJ
  IN db.ts;

The LIKE clause is particularly useful in the following instances:
Consider Using Comments

Consider using the COMMENT ON statement to document the entities you create. For V7 and previous releases, as many as 254 characters of descriptive text can be applied to each column, table, alias, index, distinct type, procedure, sequence, trigger, and UDF known to DB2. The maximum size of a comment is increased to 762 characters as of DB2 V8. The comment text is stored in a column named REMARKS in the appropriate DB2 Catalog table.
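For example, comments might be added to a table and to one of its columns like this (the descriptive text is illustrative):

```sql
COMMENT ON TABLE DSN8810.EMP IS
  'Employee master table; one row per current employee';

COMMENT ON COLUMN DSN8810.EMP.WORKDEPT IS
  'Department to which the employee is assigned';
```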
If useful descriptions are maintained for all columns and tables, the DB2 Catalog can function as a crude data dictionary for DB2 objects. However, be aware that comments are stored in a VARCHAR column in each of the appropriate catalog tables.

CAUTION

When comments are specified, the overall size of the DB2 Catalog will expand and might grow to be larger than expected. Weigh the benefits of added documentation against the impact on the DB2 Catalog before adding comments on all columns and tables.

Avoid Specifying Labels for Columns

DB2 provides the capability to label columns of DB2 tables using the LABEL ON statement. This was useful at times prior to DB2 Version 8 because the maximum length for a column name was 18 characters, whereas the maximum length of a column label was 30 characters. The label is stored in the DB2 Catalog in the SYSIBM.SYSCOLUMNS table. QMF users can specify that they want to use labels rather than column names, thereby providing better report headings. However, as of DB2 V8 the maximum length of a column name is 128 characters and the maximum length of the label is 90 characters, negating the benefit of a "descriptive" column label.

CAUTION

Be aware that specifying column labels will add to the overall size of the DB2 Catalog, specifically to the SYSIBM.SYSCOLUMNS table. However, labels will not cause the same amount of growth as comments because labels have a maximum size of 30 characters as opposed to 254 for comments; or, as of DB2 V8, 90 characters as opposed to 762 for comments.

Changing the Name of a Table

The RENAME statement enables DBAs to change the name of a DB2 table without dropping and re-creating the table. All table characteristics, data, and authorizations are maintained. This feature is not available prior to DB2 Version 5.

Avoid Using the IN DATABASE Clause

When creating tables, do not use the IN DATABASE clause. When IN DATABASE is used, a simple table space is implicitly created in the specified database.
This simple table space will use all of the default parameters, which, as we have already learned, are usually not optimal. Additionally, no other tables can be assigned to this table space. Instead, explicitly create a table space (of the proper type for the data), and then create the table in that table space.

Rows and Columns

When defining DB2 columns to a table, you will need to choose a data type and perhaps a length for each column. Recall from Chapter 1, "The Magic Words," that DB2 supports the following data types:
As you define your DB2 tables, you also will have to consider other factors and assign other characteristics to each column. For example,
As you create DB2 tables, you should be mindful of their composition (rows and columns) and how this affects performance. This section outlines several guidelines that ensure efficient row and column specification.

ROWID

The ROWID data type is used to generate a unique value for every row in a table. The value is internally generated by DB2. A table can have only one ROWID column, and the values in a ROWID column cannot be null. An additional parameter, GENERATED, must be specified for a column defined as a ROWID. The GENERATED parameter is reserved for ROWID columns only (as of DB2 Version 6). It indicates that the values for the column are to be generated by DB2. There are two options for GENERATED, one of which must be supplied:
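As a sketch (table and column names are illustrative), a ROWID column might be defined with the GENERATED ALWAYS option, meaning DB2 always generates the value itself:

```sql
CREATE TABLE USERID.DOC_STORE
  (DOC_ID    ROWID       NOT NULL GENERATED ALWAYS,
   DOC_NAME  VARCHAR(44) NOT NULL)
  IN DB.TS;
```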
CAUTION

DB2 will use an explicitly specified value for a ROWID only if it is a valid ROWID value that was previously generated by DB2.

Sequence Objects and Identity Columns

When designing databases, a frequent request is for a column to contain sequentially generated numbers. For example, each row has a counter associated with it, and when a new row is inserted, the counter should be incremented by one for the new row. Until recently, such a design was difficult to deliver. Without sequence objects or identity columns, an application program can implement similar functionality, but usually not in a manner that will perform adequately as usage scales. One common technique is to maintain a one-row table that contains the sequence number. Each transaction locks that table, increments the number, and then commits to unlock the table. In this scenario, only one transaction at a time can increment the sequence number. A variation uses something like this (the column name is illustrative):

SELECT MAX(SEQ_NBR) + 1
FROM   ONEROW_TABLE
WITH RR;

The result is the next highest number to be used. This value is used by the application, and ONEROW_TABLE must be updated with the incremented value. Performance bottlenecks will occur with this method when a lot of concurrent usage is required. But now DB2 offers two methods of automatically generating sequential numbers for a column:
Identity Columns

Identity columns were formally added to DB2 as of Version 7, but were also made available to DB2 Version 6 via a refresh. An identity column is defined to a DB2 column using the IDENTITY parameter. A column thus defined will cause DB2 to automatically generate a unique, sequential value for that column when a row is added to the table. For example, identity columns might be used to generate unique primary key values or a value that somewhat mimics Oracle's row number capability. Using identity columns helps to avoid some of the concurrency and performance problems that can occur when application programs are used to populate sequential values for a "counter" column. When inserting data into a table that uses an identity column, the program or user will not provide a value for the identity column. Instead, DB2 automatically generates the appropriate value to be inserted. Only one identity column can be defined per DB2 table. Additionally, the data type of the column must be SMALLINT, INTEGER, or DECIMAL with a zero scale, that is, DECIMAL(n,0). The data type also can be a user-defined DISTINCT type based on one of these numeric data types. The designer has control over the starting point for the generated sequential values and the number by which the count is incremented. An example of creating a table with an identity column follows:

CREATE TABLE EXAMPLE
  (ID_COL INTEGER NOT NULL
     GENERATED ALWAYS AS IDENTITY
     START WITH 100
     INCREMENT BY 10
   ...);

In this example, the identity column is named ID_COL. The first value stored in the column will be 100, and subsequent INSERTs will add 10 to the last value. So the identity column values generated will be 100, 110, 120, 130, and so on. Note, too, that each identity column has a property associated with it, assigned using the GENERATED parameter. This parameter indicates how DB2 generates values for the column.
You must specify GENERATED if the column is to be considered an identity column or the data type of the column is a ROWID. This means that DB2 must be permitted to generate values for all identity columns. There are two options for the GENERATED parameter, ALWAYS and BY DEFAULT: GENERATED ALWAYS indicates that DB2 always generates the value for the column and users cannot supply their own; GENERATED BY DEFAULT indicates that DB2 generates a value only when one is not explicitly supplied.
Additionally, you can specify what to do when the maximum value is hit. Specifying the CYCLE keyword will cause DB2 to begin generating values from the minimum value all over again. Of course, this can cause duplicate values to be generated and should only be used when uniqueness is not a requirement. Sometimes it is necessary to retrieve the value of an identity column immediately after it is inserted. For example, if you are using identity columns for primary key generation, you may need to retrieve the value to provide the foreign key of a child table row that is to be inserted after the primary key is generated. The IDENTITY_VAL_LOCAL() function can be used to retrieve the value of an identity column after insertion. For example, run the following statement immediately after the INSERT statement that sets the identity value:

VALUES IDENTITY_VAL_LOCAL() INTO :IVAR;

The host variable IVAR will contain the value of the identity column.

Problems with Identity Columns

Identity columns can be useful, depending on your specific needs, but the problems that accompany identity columns are numerous. Some of these problems include
If you can live with these caveats, then identity columns might be useful to your applications. However, in general, these "problems" make identity columns a very niche solution. IBM intends to rectify some of these problems over time in upcoming versions of DB2.

SEQUENCE Objects

Recall that DB2 has two methods of automatically generating sequential numbers. The first method is to define an identity column for the table; the second is to create a SEQUENCE object. A SEQUENCE object is a separate structure that generates sequential numbers.
A SEQUENCE object is created using the CREATE SEQUENCE statement. When the SEQUENCE object is created, it can be used by applications to "grab" the next sequential value for use in a table. SEQUENCE objects are ideal for generating sequential, unique numeric key values. A sequence can be accessed and incremented by many applications concurrently without the hot spots and performance degradation associated with other methods of generating sequential values. SEQUENCE objects also can be used by more than one column in the same table. Sequences are efficient and can be used by many users at the same time without causing performance problems. Multiple users can concurrently and efficiently access SEQUENCE objects because DB2 does not wait for a transaction to COMMIT before allowing the sequence to be incremented again by another transaction. An example of creating a SEQUENCE object follows:

CREATE SEQUENCE ACTNO_SEQ
  AS SMALLINT
  START WITH 1
  INCREMENT BY 1
  NOMAXVALUE
  NOCYCLE
  CACHE 10;

This creates the SEQUENCE object named ACTNO_SEQ. Now it can be used to generate a new sequential value, for example:

INSERT INTO DSN8810.ACT
  (ACTNO, ACTKWD, ACTDESC)
VALUES
  (NEXT VALUE FOR ACTNO_SEQ, 'TEST', 'Test activity');

The NEXT VALUE FOR clause is known as a sequence expression. Coding the sequence expression causes DB2 to use the named SEQUENCE object to automatically generate the next value. You can use a sequence expression to request the previous value that was generated. For example:

SELECT PREVIOUS VALUE FOR ACTNO_SEQ
INTO   :IVAR
FROM   DSN8810.ACT;

As you can see, sequence expressions are not limited to INSERT statements, but can be used in UPDATE and SELECT statements, too.

CAUTION

If you specify the NEXT VALUE FOR clause more than once in the same SQL statement, DB2 will return the same value for each NEXT VALUE FOR specification.
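A common idiom (a sketch, not taken from the text) is to fetch the next value directly, for instance when a program needs the generated key before inserting the row that uses it:

```sql
-- SYSIBM.SYSDUMMY1 is a one-row catalog table, convenient for
-- evaluating a sequence expression exactly once
SELECT NEXT VALUE FOR ACTNO_SEQ
FROM   SYSIBM.SYSDUMMY1;
```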
SEQUENCE Object Parameters

Similar to identity columns, a SEQUENCE object has parameters to control the starting point for the generated sequential values and the number by which the count is incremented. You can also specify the data type to be generated (the default is INTEGER), as well as a minimum value (MINVALUE) and a maximum value (MAXVALUE) if you wish to have further control over the values than is provided by the data type chosen. Again, as with identity columns, you can specify how the SEQUENCE should handle running out of values when the maximum value is hit. Specifying the CYCLE keyword will cause the SEQUENCE object to wrap around and begin generating values from the minimum value all over again. A final consideration for SEQUENCE objects is caching. Sequence values can be cached in memory to facilitate better performance. The size of the cache specifies the number of sequence values that DB2 will pre-allocate in memory. In the previous example, CACHE 10 indicates that ten sequence values will be generated and stored in memory for subsequent use. Of course, you can turn off caching by specifying NO CACHE. With caching turned off, each new request for a sequence number will cause I/O to the DB2 Catalog (SYSIBM.SYSSEQUENCES) to generate the next sequential value.

SEQUENCE Object Guidelines

DB2 does not wait for an application that has incremented a sequence to commit before allowing the sequence to be incremented again by another application. Applications can use one sequence for many tables, or create multiple sequences, one for each table requiring generated key values. In either case, the applications control the relationship between the sequences and the tables. The name of the SEQUENCE object indicates that we are going to use it to generate activity numbers (ACTNO), but its usage is not limited to that. Of course, failure to control the use of a SEQUENCE object can result in gaps in the sequential values.
For example, if we use the ACTNO_SEQ object to generate a number for a different column, the next time we use it for ACTNO there will be a gap where we generated that number. Other scenarios can cause gaps in a SEQUENCE, too. For example, issuing a ROLLBACK after acquiring a sequence number will not roll back the value of the sequence generator, so that value is lost. A DB2 failure can also cause gaps because cached sequence values will be lost.

NOTE

When sequences were introduced in non-mainframe DB2, syntax was supported that did not conform to the SQL standard. This non-standard syntax is supported on the mainframe as well:
Choosing Between IDENTITY and SEQUENCE

Although both IDENTITY columns and SEQUENCE objects are useful for generating incremental numeric values, you will need to choose between the two. Consider the following criteria when choosing one over the other. IDENTITY columns are useful when
SEQUENCE objects are useful when
Unlike SEQUENCE objects, which are more flexible, IDENTITY columns must adhere to several rigid requirements. For example, an IDENTITY column is always defined on a single table, and each table can have at most one IDENTITY column. Furthermore, when you create an IDENTITY column, the data type for that column must be numeric; not so for sequences. If you used a SEQUENCE object to generate a value, you could put that generated value into a CHAR column, for example. Finally, when defining an IDENTITY column you cannot specify the DEFAULT clause, and the column is implicitly defined as NOT NULL. Remember, DB2 automatically generates the IDENTITY column's value, so default values and nulls are not useful concepts. Consult Table 5.7 for a summary comparison of SEQUENCE objects and IDENTITY column characteristics.

Table 5.7. IDENTITY Columns Versus SEQUENCE Objects
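The contrast can be sketched in DDL. The table, sequence, and column names below are hypothetical, chosen only to illustrate the points above:

```sql
-- IDENTITY: tied to one table, at most one per table,
-- must be numeric, and is implicitly NOT NULL
CREATE TABLE ORDER_T
   (ORDER_ID   INTEGER GENERATED ALWAYS AS IDENTITY,
    ORDER_DESC CHAR(40));

-- SEQUENCE: a standalone object; the generated value can even
-- be placed into a CHAR column via a conversion function
CREATE SEQUENCE ORDER_SEQ AS INTEGER START WITH 1 INCREMENT BY 1;

INSERT INTO ORDER_LOG (ORDER_CODE)          -- hypothetical table
   VALUES (CHAR(NEXT VALUE FOR ORDER_SEQ));
```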
Row and Column Guidelines

The following guidelines can be used to assist you as you design the row and column specifications of your DB2 tables.

Avoid Wasted Space

By avoiding very large and very small row sizes, you can reduce the amount of space wasted by unusable bytes on the pages of a table space. Keep these rules in mind:
Determine row size carefully to avoid wasting space. If you can combine small tables or split large tables to avoid wasting a large amount of space, do so. It usually is impossible to avoid wasting some space, however.

Define All Appropriate Columns

Be sure to apply normalization to your database design to arrive at the appropriate columns for each DB2 table. An introduction to normalization is forthcoming in a later section of this chapter. Keep in mind, however, that the maximum number of columns that can be defined for a DB2 table is 750. If the table participates as a dependent to another table (referential integrity), the maximum number of columns is reduced to 749. For a table that requires more columns than these limits allow, you might need to break it into two tables, each with the same primary key, but with a different subset of the total number of columns.

Choose Meaningful Column Names

In many data processing shops, common names for data elements have been used for years. Sometimes these names seem arcane because they comply with physical constraints that have long since been overcome.
This might be a tough sell in your organization, but it's well worth the effort. If you must support the older, non-descriptive names, consider creating tables with the fully descriptive names and then creating views of these tables with the old names. Eventually, people will convert to using the tables instead of the views.

Standardize Abbreviations

Every shop uses abbreviated data names. This isn't a bad practice, unless the specification of abbreviations is random, uncontrolled, or unplanned. Document and enforce strict abbreviation standards for data names in conjunction with your data-naming standards. For example, the CUSTOMER_NAME column mentioned in the previous guideline can be abbreviated in many ways (CST_NME, CUST_NM, CUST_NAME, and so on). Choose one standard abbreviation and stick to it. Many shops use a list of tokens to create data abbreviation standards. This is fine as long as each token represents only one entity and each entity has only one abbreviation. For example:
Sequence Columns to Achieve Optimal Performance

The sequencing of columns in a table is not important from a functionality perspective because the relational model states that columns are non-positional. Columns and rows do not need to be sequenced for the retrieval commands to work on tables. When you create a table, however, you must supply the columns in a particular order, and that becomes the order in which they are physically stored. The columns can then be retrieved in any order using the appropriate SQL SELECT statement. When creating your tables, you will get better performance if you sequence the columns with an understanding of how DB2 logs. When data is modified in your database, DB2 will log every change made (in most circumstances). At a high level, this is how DB2 logs:
"So what?" you may be asking. Logging can cause a significant performance bottleneck. You can optimize performance by sequencing your columns such that you minimize the amount of data logged. The less data DB2 needs to log, the better the performance of your data modification operations will be. So, follow these rules for column sequencing:
NOTE

A varying-length row is any row that contains a VARCHAR or VARGRAPHIC column or any row that is compressed. Using ALTER TABLE to add a column to an existing table also makes the rows varying length (until the table space for the table is reorganized).

Avoid Special Sequencing for Nullable Columns

Treat nullable columns the same as you would any other column. Some DBAs advise you to place nullable columns of the same data type after non-nullable columns. This is supposed to assist in administering the null columns, but in my opinion it does not. Sequencing nullable columns in this manner provides no clear benefit and should be avoided. See the "DB2 Table Parameters" section in this chapter for additional advice on nullable columns and Chapter 2, "Data Manipulation Guidelines," for advice on accessing nullable columns.

Use Nulls with Care

A null is DB2's attempt to record missing or unknown information. When you assign a null to a column instance, it means that a value currently does not exist for the column. It's important to understand that a column set to null logically means one of two things: The column does not apply to this row, or the column applies to this row, but the information is not known at present. For example, suppose that a table contains information on the hair color of employees. The HAIR_COLOR column is defined in the table as being capable of accepting nulls. Three new employees are added today: a man with black hair, a woman with unknown hair color, and a bald man. The woman with the unknown hair color and the bald man both could be assigned a null HAIR_COLOR, but for different reasons. The HAIR_COLOR column for the woman would be null because she has hair but the color presently is unknown. The HAIR_COLOR column for the bald man would also be null, but because he has no hair, so hair color does not apply. DB2 does not differentiate between nulls that signify unknown data and those that signify inapplicable data.
This distinction must be made by the program logic of each application. DB2 represents null in a special hidden "column" known as an indicator variable. An indicator variable is defined to DB2 for each column that can accept nulls. The indicator variable is transparent to an end user, but must be provided for when programming in a host language (such as COBOL or PL/I). Every column defined to a DB2 table must be designated as either allowing or disallowing nulls. The default definition for columns in a DB2 table is to allow nulls. Nulls can be prohibited for a column by specifying the NOT NULL or NOT NULL WITH DEFAULT option in the CREATE TABLE statement. Avoid nulls in columns that must participate in arithmetic logic (for example, DECIMAL money values). The AVG, COUNT DISTINCT, SUM, MAX, and MIN functions omit column occurrences set to null. The COUNT(*) function, however, does not omit rows containing nulls because it operates on rows. Thus, AVG is not equal to SUM/COUNT(*) when the average is being computed for a column that can contain nulls. If the COMM column is nullable, the results of the following queries are not the same:

SELECT AVG(COMM)
FROM   DSN8810.EMP;

SELECT SUM(COMM)/COUNT(*)
FROM   DSN8810.EMP;

For this reason, avoid nulls in columns involved in math functions. When DATE, TIME, and TIMESTAMP columns can be unknown, assign them as nullable. DB2 checks to ensure that only valid dates, times, and timestamps are placed in columns defined as such. If the column can be unknown, it must be defined to be nullable because the default for these columns is the current date, current time, and current timestamp (unless explicitly defined otherwise using the DEFAULT clause). Null, therefore, is the only available option for the recording of missing dates, times, and timestamps. For every other column, determine whether nullability can be of benefit before allowing nulls. Consider these rules:
Examine these rules closely. ORDER BY, GROUP BY, DISTINCT, and unique indexes consider nulls to be equal and handle them accordingly. The SELECT statement, however, deems that the comparison of null columns is not equivalence, but unknown. This inconsistent handling of nulls is an anomaly that you must remember when using nulls. The following are several sample SQL queries and the effect nulls have on them.

SELECT   JOB, SUM(SALARY)
FROM     DSN8810.EMP
GROUP BY JOB;

This query returns the total salary for each type of job. All instances in which JOB is null will be grouped together into a single row of the output.

SELECT EMPNO, PROJNO, ACTNO, EMPTIME, EMSTDATE, EMENDATE
FROM   DSN8810.EMPPROJACT
WHERE  EMSTDATE = EMENDATE;

This query retrieves all occurrences in which the project start date is equal to the project end date. This information is clearly erroneous, as anyone who has ever worked on a software development project can attest. The query does not return any rows in which either date or both dates are null for two reasons:
UPDATE DSN8810.DEPT
SET    MGRNO = NULL
WHERE  MGRNO = '000010';

This query sets the MGRNO column to null wherever MGRNO is currently equal to '000010' in the DEPT table.

NOTE

Nulls sometimes are inappropriately referred to as null values. Using the term value to describe a null column is incorrect because the term null implies the lack of a value. The relational model has abandoned the idea of nulls in favor of a similar concept called marks. The two types of marks are an A-mark and an I-mark. An A-mark refers to information that is applicable but presently unknown, whereas an I-mark refers to inapplicable information (information that does not apply). If DB2 were to implement marks rather than nulls, the problem of differentiating between inapplicable and unknown data would disappear. No commercial DBMS products support A-marks and I-marks.

Define Columns Across Tables in the Same Way

When a column that defines the same attribute as another column is given a different column name, data administrators refer to it as a column synonym. In general, column synonyms should be avoided except in the situations detailed in this guideline. Every attribute should be defined in one way, that is, with one distinct name and one distinct data type and length. The name should be different only if the same attribute needs to be represented as a column in one table more than once, or if the practical meaning of the attribute differs as a column from table to table. For example, suppose that a database contains a table that holds the colors of items. This column is called Color. The same database has a table with a Preferred Color column for customers. This is the same logical attribute, but its meaning changes based on the context. It is not wise to simply call the column Color in the Customer table, because that would imply that the customer is that color!
An attribute must be defined twice in self-referencing tables and in tables requiring multiple foreign key references to a single table. In these situations, create a standard prefixing or suffixing mechanism for the multiple columns. After you define the mechanism, stick to it. For example, the DSN8810.DEPT table in Appendix A, "DB2 Sample Tables," is a self-referencing table that does not follow these recommendations. The ADMRDEPT column represents the same attribute as the DEPTNO column, but the name is not consistent. A better name for the column would have been ADMR_DEPTNO. This adds the ADMR prefix to the attribute name, DEPTNO. The practical meaning of columns that represent the same attribute can differ from table to table as well. In the sample tables, for example, the MGRNO column in the DSN8810.DEPT table represents the same attribute as the EMPNO column in the DSN8810.EMP table. The two columns can be named differently in this situation because the employee number in the DEPT table represents a manager, whereas the employee number in the EMP table represents any employee. (Perhaps the MGRNO column should have been named MGR_EMPNO.) The sample tables provide another example of when this guideline should have been followed, but wasn't. Consider the same two tables: DSN8810.DEPT and DSN8810.EMP. Both contain the department number attribute. In the DEPT table, the column representing this attribute is DEPTNO, but in the EMP table, the column is WORKDEPT. This is confusing and should be avoided. In this instance, both columns should have been named DEPTNO. Never use homonyms. A homonym, in DB2-column terminology, is a column that is spelled and pronounced the same as another column, but represents a different attribute.

Avoid Duplicate Rows

To conform to the relational model, every DB2 table should prohibit duplicate rows. Duplicate rows cause ambiguity and add no value.
If duplicates exist for an entity, either the entity has not been rigorously defined and normalized, or a simple counter column can be added to the table. The counter column would contain a number indicating the number of duplicates for the given row.

Define a Primary Key

To assist in the unique identification of rows, define a primary (or unique) key for every DB2 table. The preferred way to define a primary key is with the PRIMARY KEY clause of the CREATE TABLE statement.

Consider Using a Surrogate Key

When the primary key for a table is very large, it can be unwieldy to implement. For example, an eight-column key of 250 bytes could pose problems if numerous foreign keys refer to it. If the primary key is defined using eight columns and 250 bytes, then every foreign key also will consume 250 bytes and use eight columns. If the length of the primary key is impractical to implement, consider defining a surrogate key. You might be able to use a DB2 TIMESTAMP or a SEQUENCE object to generate unique surrogate key values.

Use Appropriate DB2 Data Types

Use the appropriate DB2 data type when defining table columns. Recall the list of valid DB2 data types presented earlier in this section. Some people may advise you to avoid certain DB2 data types; this is unwise. Follow these rules:
There are more possible character (alphanumeric) values than there are numeric values for columns of equal length. For example, consider the following two columns:

COLUMN1    SMALLINT    NOT NULL
COLUMN2    CHAR(5)     NOT NULL

COLUMN1 can contain values only in the range -32,768 to 32,767, for a total of 65,536 possible values. COLUMN2, however, can contain all the permutations and combinations of legal alphabetic characters, special characters, and numerals. So you can see how defining numeric data as a numeric data type usually results in a more accurate access path selection by the DB2 optimizer; the specified domain is more accurate for filter factor calculations.

Analyze DATE and TIME Columns Versus TIMESTAMP Columns

When defining tables that require a date and time stamp, two solutions are available:
Each option has its benefits and drawbacks. Before choosing an approach, consider the following issues:
Consider the Display Format for DATE and TIME Data

DB2 provides four options for displaying DATE and TIME data, as shown in Table 5.8. Each format conforms to a standard means of displaying date and time data: EUR is the European standard, ISO is the International Standards Organization format, JIS is the Japanese Industrial Standard, and USA is the IBM United States of America standard.

Table 5.8. DB2 Date and Time Formats
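As a quick sketch, any of these formats can be requested explicitly with the CHAR() function, regardless of the subsystem default (the HIREDATE column comes from the sample EMP table):

```sql
-- Render the same DATE column in two different display formats
SELECT CHAR(HIREDATE, ISO),   -- e.g., yyyy-mm-dd
       CHAR(HIREDATE, USA)    -- e.g., mm/dd/yyyy
FROM   DSN8810.EMP;
```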
DB2 also allows for the creation of an installation-defined date format. One of these formats is chosen as the default standard for your DB2 subsystem at installation time. The default is ISO. Any format can be displayed using the CHAR() function (previously described in Chapter 3, "Using DB2 Functions").

CAUTION

Avoid choosing the USA format as the default. The USA format causes TIME data to be displayed without the seconds component, instead appending an AM or PM. EUR, ISO, and JIS all display TIME in military format, specifying 1 through 24 for the hour. The USA format does not, instead specifying 1 through 12 for the hour and using AM and PM to designate morning and evening times. If the default format is USA, TIME columns will be displayed without seconds and with the AM or PM extension. When data is unloaded using DSNTIAUL, the seconds information is lost. This can result in data integrity problems if the unloaded data is subsequently loaded to another table or used as input for other processes.

Consider Optimization When Choosing Data Type

The impact on optimization is another consideration when deciding whether to use a character or a numeric data type for a numeric column. Consider, for example, a column that must store four-digit numeric codes. This can be supported using a CHAR(4) data type or a SMALLINT data type. Often, the desire to use CHAR(4) is driven by the need to display leading zeroes on reports. Data integrity will not be an issue assuming that all data is edit-checked prior to insertion into the column (a big assumption). But even if edit checks are coded, DB2 is not aware of them and assumes that all combinations of characters are permitted. For access path determination on character columns, DB2 uses base 37 math. This assumes that usually one of the 26 alphabetic letters, the 10 numeric digits, or a space will be used. This adds up to 37 possible characters.
For a four-byte character column there are 37^4, or 1,874,161, possible values. A SMALLINT column can range from -32,768 to 32,767, producing 65,536 possible small integer values. The drawback here is that negative or five-digit codes could be entered. However, if we adhere to our proper edit-check assumption, the data integrity problems will be avoided here, as well. DB2 will use the HIGH2KEY and LOW2KEY values to calculate filter factors. For character columns, the range between HIGH2KEY and LOW2KEY is larger than for numeric columns because there are more total values. The filter factor will be larger for the numeric data type than for the character data type, which might influence DB2 to choose a different access path. For this reason, favor the SMALLINT definition over CHAR(4).

Choose a Data Type Closest to the Desired Domain

It is always best to choose the data type for each column to be the one that is closest to its domain. By doing so, DB2 will perform data integrity checking that otherwise would need to be coded into application programs or CHECK constraints. For example, if you are storing numeric data in the column, do not choose a character data type. In general, adhere to the following rules:
Specify Appropriate Defaults

When a row is inserted or loaded into a table and no value is specified for a column but the column has a default specified, the column will be set to the value that has been identified in the column default specification. Two types of defaults are available: system-defined and user-defined. Each column can have a default value specifically tailored to it. These are known as user-defined defaults. DB2 also provides specific system-defined defaults for each data type, which are used if an explicit default value is not specified. The system-defined defaults are outlined in Table 5.9.

Table 5.9. System-Defined Column Default Values
For existing rows, when a non-nullable column is added to a table, DATE, TIME, and TIMESTAMP data types default to the lowest possible value instead of the current value. DATE types will default to January 1, 0001; TIME types will default to 0:00:00; and TIMESTAMP types will default to a date of January 1, 0001 and a time of 0:00:00:00. Four options are available for user-defined defaults: a constant value, USER, CURRENT SQLID, and NULL. When specifying a constant, the value must conform to the column on which it is defined. Specifying USER causes the column to default to the contents of the USER special register. When CURRENT SQLID is specified, the default value will be the SQL authid of the process performing the INSERT. NULL is self-explanatory. In general, it is best to explicitly define the default value to be used for each column. If the system-defined default values are adequate for your application, it is fine to use them by not providing a value following the DEFAULT clause. Consider the following column definitions:

BONUS    DECIMAL(9,2)    DEFAULT 500.00,
COMM     DECIMAL(9,2)    NOT NULL WITH DEFAULT,

If a row is inserted without specifying BONUS and COMM, BONUS will default to 500.00 and COMM will default to zero.

Choose VARCHAR Columns Carefully

You can save DASD storage space by using variable columns instead of placing small amounts of data in a large fixed space. Each variable column carries a 2-byte overhead, however, for storing the length of the data. Additionally, variable columns tend to increase CPU usage and can cause the update process to become inefficient. When a variable column is updated with a larger value, the row becomes larger; if not enough space is available to store the row, it must be moved to another page. This makes the update and any subsequent retrieval slower. Follow these rules when defining variable character columns:
Compression Versus VARCHAR Columns

Using DB2 compression, you can achieve results similar to those of VARCHAR columns. However, DB2 compression avoids the two bytes of overhead and requires no programmatic intervention for handling the two-byte column length information. On the other hand, a VARCHAR column affects data for that column only. With compression, the entire row is affected. Therefore, there is a greater chance that an UPDATE of a compressed row will need to be relocated to another page if its size has increased.

Altering VARCHAR Columns

As of DB2 V6, you can ALTER the length of a VARCHAR column to a greater length. However, you cannot ALTER the length of a VARCHAR column to a smaller length.

Monitor the Effectiveness of Variable Columns

Using views and SQL, it is possible to query the DB2 Catalog to determine the effectiveness of using VARCHAR for a column instead of CHAR. Consider, for example, the PROJNAME column of the DSN8810.PROJ table. It is defined as VARCHAR(24). To gauge whether VARCHAR is appropriate, follow these steps:
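One way to gather the length distribution is to expose the stored length of each value through a view and then group on it. This is a sketch only; the view and column names are illustrative assumptions:

```sql
-- Step 1: create a view exposing the length of each PROJNAME value
-- (PROJNAME_LENGTH and COL_LENGTH are hypothetical names)
CREATE VIEW PROJNAME_LENGTH (COL_LENGTH) AS
    SELECT LENGTH(PROJNAME)
    FROM   DSN8810.PROJ;

-- Step 2: report the number of rows stored at each length
SELECT   COL_LENGTH, COUNT(*)
FROM     PROJNAME_LENGTH
GROUP BY COL_LENGTH
ORDER BY COL_LENGTH;
```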
This query will produce a report listing the lengths (in this case, from 1 to 24, excluding those lengths that do not occur) and the number of times that each length occurs in the table. These results can be analyzed to determine the range of lengths stored within the variable column. If you are not concerned about this level of detail, the following query can be used instead to summarize the space characteristics of the variable column in question:

SELECT  24*COUNT(*),
        24,
        SUM(2+LENGTH(PROJNAME)),
        AVG(2+LENGTH(PROJNAME)),
        24*COUNT(*)-SUM(2+LENGTH(PROJNAME)),
        24-AVG(2+LENGTH(PROJNAME))
FROM    DSN8810.PROJ;

The constant 24 will need to be changed in the query to indicate the maximum length of the variable column as defined in the DDL. The individual columns returned by this report are defined in the following list:
Consider Using Odd DECIMAL Precision

Think about making the precision of all DECIMAL columns odd. This can provide an extra digit for the column being defined without using additional storage. For example, consider a column that must have a precision of 6 with a scale of 2. This would be defined as DECIMAL(6,2). By defining the column as DECIMAL(7,2) instead, numbers up to 99999.99 can be stored instead of numbers up to 9999.99, in the same amount of storage. This can save future expansion efforts. However, if you must ensure that the data in the column conforms to the specified domain (that is, even precision), specify even precision.

Consider All Options When Defining Columns As INTEGER

Use SMALLINT instead of INTEGER when the -32,768 to 32,767 range of values is appropriate. This data type usually is a good choice for sequencing-type columns. The range of allowable values for the INTEGER data type is -2,147,483,648 to 2,147,483,647. These ranges might seem arbitrary, but they are designed to store the maximum amount of information in the minimum amount of space. A SMALLINT column occupies 2 bytes, and an INTEGER column occupies only 4 bytes. The alternative to the SMALLINT and INTEGER data types is DECIMAL with a scale of 0. DECIMAL(5,0) supports the same range as SMALLINT, and DECIMAL(10,0) supports the same range as INTEGER. The DECIMAL equivalent of SMALLINT occupies 3 bytes of storage but permits values as large as 99,999 instead of only 32,767. The DECIMAL equivalent of INTEGER occupies 6 bytes but permits values as large as 9,999,999,999 instead of 2,147,483,647. When deciding whether to use DECIMAL without scale to represent integer columns, another factor is control over the domain of acceptable values. The domain of SMALLINT and INTEGER columns is dictated by the range of allowable values for their respective data types. If you must ensure conformance to a narrower domain, DECIMAL without scale provides the better control.
Suppose that you code a column called DAYS_ABSENT that indicates the number of days absent for employees in the DSN8810.EMP table. Suppose too that an employee cannot miss more than five days per year without being disciplined and that no one misses ten or more days. In this case, a single-digit integer column could support the requirements for DAYS_ABSENT. A DECIMAL(1,0) column would occupy a single byte of physical storage and provide for values ranging from -9 to 9. By contrast, a SMALLINT column would occupy two bytes of physical storage and provide for values ranging from -32,768 to 32,767. The DECIMAL(1,0) column, therefore, more closely matches the domain for the DAYS_ABSENT column. One final consideration: A decimal point is required with DECIMAL data, even when the data has no scale. For example, the integer 5 is written as 5. when expressed as a decimal. This can be confusing to programmers and users who are accustomed to dealing with integer data without a decimal point. Consider all these factors when deciding whether to implement SMALLINT, INTEGER, or DECIMAL data types for integer columns.

Use LOBs for Large Multimedia Objects

As of V6, DB2 provides support for large multimedia data types. Using BLOB, CLOB, and DBCLOB data types, DB2 can be used to store complex, unstructured data, such as images, audio, text, and video. Such data types can be used to store much larger multimedia data than can be stored in VARGRAPHIC or VARCHAR columns. Of course, these data types need to be treated differently by application programs, and there are management considerations (the larger they become, the more difficult it is to administer the data). For more details on the object/relational capabilities of DB2, consult Chapter 9, "Large Objects and Object/Relational Databases."
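A base table with LOB columns might be sketched as follows. The table and column names are hypothetical, and note that the LOB table space, auxiliary table, and auxiliary index that must accompany such a table are omitted here (they are covered in Chapter 9):

```sql
-- Illustrative only: a base table holding employee photos and resumes.
-- A table with LOB columns also requires a ROWID column; the
-- supporting LOB table space and auxiliary objects are not shown.
CREATE TABLE EMP_PHOTO
   (EMPNO      CHAR(6)   NOT NULL,
    EMP_ROWID  ROWID     NOT NULL GENERATED ALWAYS,
    PHOTO      BLOB(1M),     -- binary large object (image data)
    RESUME     CLOB(100K));  -- character large object (text data)
```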