< Day Day Up > |
DB2 database changes are more flexible with DB2 V8 due to online schema evolution. IBM also refers to this feature as simple schema evolution, online schema changes, or just schema changes.
CAUTION For literal-minded readers, it most likely will not take millions of years for online schema evolution to support most types of database changes. I am simply using biological evolution as a metaphor for schema evolution. Online Change Management Prior to V8Of course, DB2 has offered some degree of online change management even prior to V8. For example, DB2 has provided the following capabilities for quite some time now:
These are just a few of the capabilities of DB2 change management, but some changes are not quite so easy. Online Schema Changes for V8Online schema evolution for DB2 V8 introduces some nice, new capabilities for managing database changes. What exactly can be changed today, as of DB2 Version 8?
Changing the Data Type of a ColumnSometimes it becomes necessary to change the data type of a column for an existing DB2 table. Prior to V8 this required dropping and re-creating the table, but as of V8 a column data type may be changed if the data can be converted from the old type to the new without losing significance. Essentially, this means that the new column definition has to allow for "larger" values than the current column definition. The ALTER TABLE statement can be used to change the data type of a column as indicated in Table 7.1. If the combination is not shown in this table, it is not supported ”that is, you cannot use ALTER to make such a change. Table 7.1. Data Type Changes Supported By ALTER
ALTER TABLE DSN8810.EMP ALTER COLUMN EDLEVEL SET DATATYPE DECIMAL(7,0); This is possible because the current data type of EDLEVEL was SMALLINT , so it can be changed to a DECIMAL with a scale of 5 or greater. After the ALTER runs successfully, DB2 creates a new "version" of the table space. The definition of the data type is stored in the DB2 Catalog and immediately applies to the data. Up to 256 concurrent versions of a table space and up to 16 concurrent versions of an index can be maintained by DB2. See the section "Versioning for Online Schema Changes" later in this chapter for more details on versioning. Keep in mind, though, that the existing data is not changed or reformatted on disk. Instead, when data is retrieved, the changed column(s) will be materialized in the new format. Updating or inserting data will cause the row to be saved using the format of the new data type. When the object is reorganized or rebuilt, the data is converted to the format of the latest version specified in the DB2 Catalog. This technique allows DB2 to offer the greatest availability to users with minimal performance degradation. Limitations on Changing a Data TypeKeep in mind that the data type can be changed only for character and numeric data types. You cannot change the data type of ROWID , DATE , TIME , TIMESTAMP , or FOR BIT DATA columns, nor can you change the length of an LOB column. Additionally, you cannot change the data type or length of a column under the following circumstances:
For each of these items you will need to DROP and re- CREATE the table to modify the data type or change its length. Impact of Changing a Data TypeWhen changing the data type for a column you need to be aware of the effect the change will have on other DB2 facilities and database objects. For example, when any column in a table has its data type changed, the plans, packages, and cached dynamic statements that reference the changed table are invalidated.
Statistics in the DB2 Catalog are also an issue. Any distribution statistics for the column in SYSIBM.SYSCOLDIST and SYSIBM.SYSCOLDISTSTATS will be invalidated when its data type changes. Additionally, the STATSTIME column in SYSIBM.SYSCOLUMNS will be set to January 1, 0001. This tells the optimizer to ignore the distribution frequency statistics. Be sure to run the RUNSTATS utility to repopulate the catalog with accurate column and index statistics as soon as possible after changing a column data type or length. Table spaces, indexes, and views are the obvious database objects that will need to be modified when a data type is changed. Check constraints are affected, as well. Every data type change requires the column's table space to be modified. Upon completion of a data type change, the table space will be placed in an AREO* exception state, which stands for Advisory REORG Pending. Users can continue to access the data while the table space is in the AREO* state, but performance will suffer because the columns will need to be converted from the old format to the format of the new data type. Additionally, when the data is modified, the entire row will be logged. Performance will continue to suffer until the table space is reorganized. Indexes need to be changed if the column whose data type has changed participates in an index. The availability of the index depends upon the data type of the column being changed. The index will be immediately available for use if a CHAR , VARCHAR , GRAPHIC , or VARGRAPHIC column is altered to increase its length. Altering a numeric data type to increase its length will result in delayed availability for the index. This includes columns defined as SMALLINT , INTEGER , DECIMAL , NUMERIC , FLOAT , REAL , or DOUBLE . The index is not immediately available because changes to numeric data would create severe performance problems. Instead, the index is placed into RBPD exception state, which stands for REBUILD Pending. If an entire index is rebuilt from the data, all the keys are converted to the latest format. The utilities that can be used to rebuild an index include REBUILD INDEX , REORG TABLESPACE , and LOAD REPLACE . CAUTION DB2 will not choose any index in an RBDP exception state for an access path . To resolve this problem run the REBUILD INDEX utility to remove the RBDP exception.
Views that reference an impacted column will be immediately regenerated. DB2 will examine the DB2 Catalog to perform this regeneration. Affected views are retrieved from the SYSIBM.SYSVIEWDEP table and then SYSIBM.SYSVTREE and SYSIBM.SYSVLTREE are used to review and modify the parse tree for the views. Keep in mind that a view can be created on another view, so this process might be recursive. Also, as with tables, a change to any column within a view invalidates all plans, packages, and dynamic cached statements that are dependent on that view. CAUTION The regeneration of a view can fail if the precision for decimal arithmetic does not work with the application. In this case, you must DROP and re- CREATE the view in order to correct the problem. Finally, when check constraints exist on a column whose data type or length has changed, the constraints will bea regenerated. CAUTION The regeneration of a check constraint can fail if the decimal point indicator or quote delimiter has changed since the check constraint was first defined. Changing an IndexPrior to DB2 V8, the aspects of an index that could be altered were limited to mostly storage characteristics. With V8 and online schema evolution, additional index attributes can be changed; you can use ALTER to add columns to an index, change the clustering specification, and modify the manner in which varying length index keys are treated. Adding Columns to an IndexTo add a column to an index under DB2 V8 you will use the ADD COLUMN clause of the ALTER INDEX statement. For example, to add a column to the XDEPT2 index (currently defined on the MGRNO column only) ALTER INDEX DSN8810.XDEPT2 ADD COLUMN (ADMRDEPT); Running this ALTER statement adds the ADMRDEPT column to the existing XDEPT2 index on the DSN8810.DEPT table. The new column will be appended to the end of the existing index key; you cannot change the existing order of a key or append a column to the beginning of the index key.
Changing ClusteringSometimes it becomes necessary to adjust the manner in which DB2 attempts to store data physically on disk. Recall from Chapter 5, "Data Definition Guidelines," that this is referred to as clustering. You might wish to change how data is clustered for several different reasons, such as
Prior to V8, changing clustering required dropping the clustering index and re-creating it without the CLUSTER keyword. To change clustering in DB2 V8, you can use the ALTER INDEX statement to specify either CLUSTER or NO CLUSTER . For example ALTER INDEX DSN8810.XPROJAC1 NO CLUSTER; Running this ALTER statement will change the XPROJAC1 index such that it no longer controls clustering. Keep in mind, though, that simply removing explicit clustering might not change the clustering specification for the table space. Until another clustering index is specified for the table, DB2 will continue to use the index that was just changed as the implicit clustering index. CAUTION If no explicit clustering index is specified for a table, the first index created on each table will be used as the implicit clustering index. When the clustering index is changed, new INSERT statements will cause data to be placed using the new clustering order. However, existing data is not immediately reclustered. Existing data will not be affected until the next time the table space is reorganized. Of course, you are still restricted to having only one clustering index at any one point in time. So, you will need to order and time the execution of your ALTER INDEX statements so that there is never a state when two clustering indexes exist at the same time. For example, to change the clustering index from IX2 to IX5 , you would issue the following sequence of ALTER statements: ALTER INDEX IX2 NO CLUSTER; COMMIT; ALTER INDEX IX5 CLUSTER; COMMIT; NOTE You can use the ALTER statement to respecify the clustering of your existing partitioned table spaces if you so desire . Prior to DB2 V8, the partitioning index for partitioned tables also had to be the clustering index. This is no longer the case. You can now specify a partitioning key that is not also the clustering key for a table. Changing the Treatment of Variable Index KeysPrior to DB2 V8, specifying a variable length column in an index caused DB2 to pad the data to its maximum length in the index key. This is no longer a requirement, because DB2 V8 allows you to specify PADDED or NOT PADDED to control whether the index key should be padded to its maximum length. This specification can be made when the index is defined using CREATE INDEX or changed using ALTER INDEX . NOTE The default is PADDED when you migrate from V7 to V8 in order to maintain compatibility with past implementations . However, for new V8 installations the default is NOT PADDED . A new DB2 V8 DSNZPARM named PADIX can be used to change the default. When changing an index from PADDED to NOT PADDED , the index is placed in the ARBDP exception state and a value of 'N' is placed in the PADDED column of SYSIBM.SYSINDEXES . The index must be rebuilt, because DB2 cannot determine the accurate length of the index key without accessing the table space. The index is not available for use until it has been rebuilt, thereby setting all of the keys to varying lengths and resetting the pending state. CAUTION Be aware that recovery to a point in time may cause the ARBDP exception state to be set (if that point in time was before the index was rebuilt). You can also change an index from NOT PADDED to PADDED using ALTER INDEX . If the index has varying length columns, it is placed in the AREO* exception state and a value of 'Y' is placed in the PADDED column of SYSIBM.SYSINDEXES . The index is available for use but performance will suffer. The index can be rebuilt or reorganized to pad the keys to the maximum length and reset the pending state. Whenever the padding attribute of the index is changed, DB2 creates a new version of the index in the DB2 Catalog. Changing Table Space Partitioning SpecificationsHistorically, one of the biggest impediments to managing DB2 database systems has been administering partitioned table spaces. Prior to DB2 V8, it was either difficult or impossible to modify the structure and many of the parameters of a partitioned table space.
With DB2 V8 you gain the ability to immediately add partitions, rotate partitions, and change the partitioning key values. In order to gain this flexibility, though, you will need to change from index-controlled partitioning to table-controlled partitioned tables. Then, you can use the ALTER TABLE statement to modify most of the partitioning specifications. Adding PartitionsTo add a partition to an existing table space the ALTER TABLE statement has been augmented with the ADD PART parameter. For example, consider a table space that is partitioned having one fiscal quarter worth of data per partition. Eventually, you might run out of partitions and need to add one. Assume that the last partition holds data up to the third quarter of 2004, but now you need to add data past this date. The following SQL shows how to use ALTER to add a new partition to a table: ALTER TABLE CREATOR.TBNAME ADD PART VALUES('12-31-2004'); Of course, this assumes that your fourth quarter ends in December. NOTE You do not specify a partition number when you add a partition. DB2 will determine the next partition number to be used by examining information in the DB2 Catalog. Along with adding a new data partition, a new partition is added for each partitioning index. This can include both the partitioning index and data-partitioned secondary indexes (DPSIs), as well. You can add partitions up to the maximum limit; the maximum number of partitions depends on the DSSIZE parameter and page size of the table space (as defined in Chapter 5). You cannot specify attributes such as PRIQTY and SECQTY ; instead, DB2 uses the values in use for the previous logical partition. Before you begin to use the new partition, you should execute an ALTER TABLESPACE statement to provide accurate space parameters for the new partition. If you are using STOGROUP s the next data set is automatically allocated for the table space and each partitioned index. When your DB2 objects are user managed ( VCAT ), you must pre-define the data sets using VSAM IDCAMS .
Rotating PartitionsIf the requirement to add a partition can be satisfied by allowing an existing partition to be reused, you might be able to rotate the partition. Rotating partitions allows old data to "roll off," but the partition is kept for new data. This is a good option in the following situations:
Partition rotation is implemented using the ALTER TABLE ALTER PART ROTATE FIRST TO LAST statement. When rotating, if you specify the RESET parameter, the data rows in the oldest (or logically first) partition are deleted, and a new table space high boundary is set so that partition becomes the last logical partition in sequence. This partition will then be ready to hold the new data as it is added. The partition that was rolled off is immediately available after the ALTER succeeds; a REORG is not required. CAUTION When specifying RESET , the existing data in the oldest partition is deleted and SYSIBM.SYSCOPY and SYSLGRNX rows associated with the partition being reset are deleted, too. The aftermath of rotating a partition can be confusing. This is especially the case if you are trying to match partitions to physical data sets. The .A001 data set is now the last logical partition, not the first. You will need to use the new LOGICAL_PART column in the SYSIBM.SYSTABLEPART table to match partitions to data sets. The DISPLAY command will list the status of table space partitions by logical partition. Also, steps need to be taken if you need to keep the rolled off data for archival purposes. Be sure to unload the data immediately before rotating the partition using either the UNLOAD utility or a user-written program. If this REUSE option is specified, a logical reset of the partition is done instead of deleting and redefining data sets. Existing extents for the partition will be kept. Changing Partition BoundariesDB2 V6 introduced the ability to modify limit keys for partitions. DB2 V8 adds the same capability for table-based partitioning with the ALTER TABLE ALTER PART VALUES statement. The affected data partitions are placed into the REORG pending state until they have been reorganized. Rebalancing PartitionsYou can rebalance partitions when running DB2 V8, too. Unlike the schema changes previously discussed in this chapter, partition rebalancing is accomplished using the REORG utility instead of the ALTER statement. When reorganizing a table space you can specify a new parameter, REBALANCE , indicating that new partition boundaries should be set for the range of partitions being reorganized. DB2 will rebalance the data such that it is evenly distributed across the partitions. Rebalancing is most practical when the data is not skewed greatly. CAUTION When many duplicate values occur in the columns that define the partition boundaries, DB2 might not be able to evenly balance the data effectively. Yes, you are reading this right. Running a REORG with the REBALANCE option can change the limit key for partition boundaries. The REORG will set new partition boundaries so that all the rows participating in the reorganization are evenly distributed across the partitions being reorganized. DB2 will update the SYSIBM.SYSTABLEPART and SYSIBM.SYSINDEXPART tables to record the new limit key values. Using REORG to rebalance partitions has its advantages. Using this approach, your partitions will not be placed in the REORG exception state, as would be the case if you changed the partition boundaries using ALTER TABLE . Keep the following restrictions in mind when considering whether to rebalance your partitions using REORG :
CAUTION After rebalancing is complete, plans, packages, and the dynamic statement cache records that reference the reorganized object are invalidated.
To support online schema changes, DB2 has been enhanced to support multiple versions of DB2 objects. As certain schema changes are made, DB2 creates a new version to refer to the new structure. Multiple versions can exist at one time, each version referring to the object at a different stage of its life. Issuing an ALTER for an existing DB2 object or column can cause a new format to be needed for tables, table spaces, and/or indexes. DB2 needs to know about the old format and the new format because all of the underlying data for an object (as well as its image copies) cannot be changed immediately to match the format of the latest version. By supporting multiple versions with different formats over time for tables and indexes, maximum data availability is achieved. DB2 references the version information to appropriately store and use the data in its correct format. Versioning is tracked and recorded in the OLDEST_VERSION and CURRENT_VERSION columns in the following DB2 Catalog tables:
SYSIBM.SYSOBDS is a new V8 DB2 Catalog table that contains one row for each table space and index that can be recovered to an image copy that was made before the first version was generated. However, the version information relevant to the data also is stored in system pages embedded in the table space or index page set. The system pages are stored along with the data so that the data can be properly interpreted. Doing so makes table spaces and indexes self-defining. Additionally, with the version information embedded in the page set, data can be accessed or unloaded from an image copy without DB2 being up. NOTE DB2 supported versioning prior to V8. When an indexed VARCHAR column in a table is enlarged, a new index version is created and tracked using the IOFACTOR column of SYSIBM.SYSINDEXES . In DB2 V8, the first ALTER that creates a new index version switches to DB2 V8 versioning by setting the OLDEST_VERSION and CURRENT_VERSION columns to the existing versions in the index. The OLDEST_VERSION is the oldest format of the data in the object itself or any image copy still registered in SYSIBM.SYSCOPY . There is an upper bound for version numbers . A table space can have up to 256 different active versions; an index can have up to 16 different active versions. A version is active if it is used on any page within a page set (table space or index) or is in use in an existing image copy still registered in the DB2 Catalog. CAUTION When the upper bound is hit for a version and a new version must be created, the version number will wrap back to the beginning ”starting again at version 1 (not zero). So, it is possible that the CURRENT_VERSION is a lower number than the OLDEST_VERSION . Of course, DB2 will not wrap if version number 1 is still an active version. If the maximum number of active versions is reached, a “4702 SQLCODE will be returned and the ALTER will fail. An object that is never altered remains at version zero. When Is a New Version Generated?A new version is created for the table or index that is affected whenever the following types of changes are made:
Deactivating VersionsFor table spaces, and indexes defined as COPY YES , the MODIFY utility needs to be run to update the LOW_VERSION in the DB2 Catalog and reclaim the version by making it inactive. If there are entries for COPY , REORG , or REPAIR VERSIONS remaining in SYSIBM.SYSCOPY for the table space, MODIFY updates LOW_VERSION to be the lowest value of LOW_VERSION found from matching SYSCOPY rows. If no SYSCOPY rows remain for the object, MODIFY sets LOW_VERSION to the lowest version data row or key that exists in the active page set. For indexes defined as COPY NO , running a REORG , REBUILD , or LOAD utility that resets the entire index updates the LOW_VERSION in SYSIBM.SYSINDEXES to be the same as HIGH_VERSION . Database Exception States for Online Schema ChangesException states are used by DB2 to alert administrators and users to a database condition that needs to be managed or improved. Exception states have been used in all past releases of DB2, but two new states have been created to support online schema changes. Throughout this chapter, we have discussed these new database exception states, but we will review them here for easy reference:
As with any DB2 database exception state, these states will appear when you issue the DISPLAY command to monitor the status of your database objects. Consult Chapter 37, "DB2 Utility and Command Guidelines," for a complete discussion of the database exception states. Online Schema Change Implementation ConsiderationsKeep in mind that existing access paths can become inefficient when a new version of an object is created. Therefore, making an online schema change might cause performance to degrade. Just because IBM has made it easier to implement changes to database objects does not mean that changes can be made indiscriminately and without planning. Be sure to treat every database change as a potential impact to performance and availability. Whenever possible, schedule schema changes as close to a scheduled reorganization as possible. This will minimize the potential performance impact. Certain types of online schema changes are more invasive than others. For example, rotating partitions can be very time-consuming . When rotating partitions of a partitioned table, the reset operation requires that the keys for deleted rows also be deleted from all NPIs. Because each NPI must be scanned to delete the keys this activity can consume an inordinate amount of elapsed time to complete. Individual delete row processing is required for referential integrity relationships when DATA CAPTURE is enabled, or when there are delete triggers. In such scenarios, be sure to factor in additional time to delete data a row at a time. |
< Day Day Up > |