Data Maintenance | DB2 for z/OS Version 8 DBA Certification Guide

Establishing a data maintenance policy can help ensure that the data in your tables is stored as efficiently as possible. Otherwise, you may discover that your applications start to experience degradation in performance. This may be caused by the poor physical organization of the data.

The physical distribution of the data stored in tables has a significant effect on the performance of applications using those tables. The way the data is stored in a table is affected by the update, insert, and delete operations on the table. For example, a delete operation may leave empty pages of data that may not be reused later. Also, updates to variable-length columns may result in the new column value's not fitting in the same data page. This can cause the row to be moved to a different page and so produce internal gaps or unused space in the table. As a consequence, DB2 may have to read more physical pages to retrieve the information required by the application.

These scenarios are almost unavoidable. However, as the database administrator, you can use the DB2 data maintenance utilities to optimize the physical distribution of the data stored in your tables. Two related utilities can help you organize the data in your tables: REORG and RUNSTATS.

Certain SQL operations may produce internal gaps in tables. So how can you determine the physical organization of your tables or indexes? How can you know how much space is being used and how much is free? With the information collected from the system catalog tables, you can generally determine the state of any table space or index space.

Reorganizing Table Spaces

The following actions occur during reorganization of a table space.

The data in the table space and the corresponding indexes defined in the table in the table space are reorganized.
The space of dropped tables is reclaimed, if it was not reclaimed before.
Free space is provided.
New space allocation units become effective.
Segments are realigned, and the rows of the tables are rearranged in clustering-index sequence, except for simple table spaces with multiple table
Overflow pointers are eliminated.
The corresponding indexes are rebuilt from scratch.
The version number of each row is updated to the current version number.

The REORG utility is used to reorganize data in table spaces and in indexes. Following are some of the important phases in the REORG utility.

UTILINIT, the initialization phase, performs the setup operations for the utility.
UNLOAD unloads the rows of the table space and writes to the unload data set, specified in the UNLDDN parameter in the utility. SYSREC is the default DD name for the unload data set. The unload data set is also referred to as the SYSREC data set. The type of table space, the number of tables in the table space, and whether the tables have explicit clustering indexes will all affect the sequence in which the rows are unloaded.
RELOAD reloads the rows from the unload data set into the table space. The sequence in which the rows were unloaded will be the sequence in which they are contained on the unload data set and subsequently the sequence in which they will be reloaded. All rows for a table were unloaded together for segmented table spaces and are then restored together, resulting in the contiguous segments for the table. Free space is reserved in the pages, and free pages are provided for future insertions during the reloading. This is provided as defined by the active PCTFREE and FREEPAGE values for the table space. As the rows are reloaded into the table space, the keys for the indexes of the table of the table space are extracted and, together with the new RID for the rows, stored onto the first work data set specified via the WORKDDN parameter. The default DD name for the data set is SYSUT1, also referred to as the SYSUT1 data set. Because they are extracted as the rows are reloaded, the key/RID pairs for the indexes are in an arbitrary sequence with regard to the key values. Also, the key/RID pairs for the various indexes are intermixed.
SORT sorts the key/RID pairs for the various indexes, using the DFSORT by index, key, and RID. At the end of this phase, the sorted key/RID pairs are contained on the second work data set, also referred to as the SORTOUT data set, specified in the WORKDDN parameter. The default name for this data set is SORTOUT.
NOTE

Although it is mentioned many times in reference to the REORG utility, DFSORT or an equivalent sort product can be used for the sorting, or equivalent utilities can be used.
BUILD is the phase in which the output of the SORT phasethe sorted key/RID pairs for the various indexesis used to build the indexes for the tables in the table space.
UTILTERM, the final phase, performs the cleanup operations, such as releasing virtual storage.

The phases in the REORG utility are performed sequentially. But exploiting many of the other features of the REORG utility will cause many of the phases to be performed in parallel.

If the target table space is segmented, REORG unloads and reloads by table. If an explicit clustering index exists on a table in a segmented table space, that table is unloaded in clustering sequence. If no explicit clustering index exists, the table is unloaded in physical row and segment order.

For segmented table spaces, REORG does not normally have to reclaim space from dropped tables. Space freed by dropping tables in a segmented table space is immediately available if the table space can be accessed when DROP TABLE is executed. If the table space cannot be accessed when DROP TABLE is executed, REORG reclaims the space for dropped tables. After the execution of the REORG, the segments for each table are contiguous.

The entire table space does not need to be reorganized if it is partitioned. You may choose to simply reorganize one partition, if necessary. If you reorganize a single partition or a range of partitions, all indexes of the table space are affected. Depending on how disorganized the nonpartitioning indexes are, they may need to be reorganized as well. The following example shows how to reorganize a partition of a table space:

 REORG TABLESPACE DB2USER1.CANDIDATE PART 3 SORTDATA SORTDEVT SYSDA

Ranges of partitions can also be reorganized. The following syntax would reorganize parts 3, 4, 5, and 6:

 REORG TABLESPACE DB2USER1.CANDIDATE PART 3:6

Reorganizing a LOB table space is a separate task from reorganizing the base table space. A LOB table space that was defined with LOG YES or LOG NO will affect logging while reorganizing a LOB column. Specify LOG YES and SHRLEVEL NONE when you reorganize a LOB table space, to avoid leaving it in COPY-pending status after the REORG.

During the LOB reorganization, the LOB is not unloaded and space is not reclaimed. Embedded free space is removed, and attempts are made to make the LOB pages contiguous. As a result, prefetch should be more effective.

Reorg Options

The REORG utility has some important parameters that help achieve parallelism in the REORG phases.

SHRLEVEL

The SHRLEVEL parameter determines the level of access allowed during the REORG. The NONE option states that, during the UNLOAD phase, applications can read but not write to the affected area and that, during the RELOAD phase, applications have no access to the data. The REFERENCE option allows for read access during the unload and reload phases. SHRLEVEL CHANGE allows for reading and writing during both the UNLOAD and RELOAD phases.

If you needed a more restrictive accesswith no access by applications or to the end userthe table space should be started in utility mode (UT). These are all offline REORGS.

The next example shows the REORG utility that does not allow concurrent writes during RELOAD:

 REORG TABLESPACE DB2USER1.CANDIDATE SHRLEVEL REFERENCE

If an object is in REORG-pending status (REORP), a REORG SHRLEVEL NONE will need to be executed on the affected data.

SORTDATA

Several factors come into play when determining the sequence in which the rows of the table space to be reorganized are unloaded: the type of table space, the number of tables in the table space, and whether the tables have explicit clustering indexes.

Segmented table spaces. Tables in segmented table spaces are unloaded one after the other. If a table has an explicit clustering indexa clustering index was defined with the CLUSTER keyword in the CREATE INDEX DDLthe clustering index is used to unload the rows of the table in the sequence of the clustering index that was defined. The table will then store the rows in the unload data set. If no explicit clustering index is defined, the data is unloaded and sorted in the physical sequence: segment by segment, page by page, and then row by row.
Partitioned table spaces. For these table spaces, the rows of the table space, or the individual partitions, to be reorganized are unloaded and stored in the unload data set. This is done using the partitioning index in the sequence defined by the index.
Simple table space. The unloading will depend on the number of tables in the table space. If multiple tables are in the table space, it is unloaded in physical sequence. The rows in the unload data set will have the same physical order as the table space, and the rows of the various tables are intermixed. If more than one table has an explicit clustering index, it will be used to unload the rows in the sequence defined by the index. If only one table is in the table space and does not have an explicit clustering index, the table will be unloaded in physical sequence.

When the SORTDATA option is used on the REORG utility control statement, DB2 always unloads the rows in physical sequence. Using SORTDATA will help with performance of the REORG utility. DFSORT or an equivalent sort utility is used to sort the rows in index sequence if the table space has at least one explicit clustering index and is not a simple table space with multiple tables. The index itself is not used for the sorting. SORTDATA operates differently for each type of table space.

For segmented table spaces, if an explicit clustering index is on at least one of the tables of the segmented table space, the tables will be unloaded table by table in a physical sequence and the output passed to DFSORT for sorting. The rows are then sorted in the sequence of the explicit or implicit clustering index of the tables.
NOTE

The data will not be sorted if no indexes are on the tables.

NOTE

In the case of REORGs not using SORTDATA, the rows of tables are also sorted if the tables have an implicit clustering index but not an explicit one. But at least one of the tables in the segmented table space must have an explicit clustering index; otherwise, the tables are unloaded one by one in physical sequence and the rows are not sorted.
For partitioned table spaces, the rows of the entire table space or the partitions to be reorganized are unloaded in physical sequence and passed to DFSORT to be sorted in the sequence defined by the clustering index. However, this index is not used for the sorting.
For simple table spaces, if the table space contains multiple tables, the rows are unloaded in physical sequence and not sorted. If the table space contains a single table, the rows are unloaded in physical sequence and passed to DFSORT, provided that an explicit clustering index is defined. If not, the rows will not be sorted.

In most cases, physically unloading the rows and sorting them by means of DFSORT is faster than unloading the rows through a clustering index if the CLUSTERRATIO of the index is less that 95 percent, or if the percentage of nonclustered rows (100*FAROFFPOS/CARDF) is greater that 5 percent). The lower the CLUSTERRATIO, the higher the performance improvements when using SORTDATA. SORTDATA has no advantage if the table space does not have an explicit clustering index or a partitioning index or if it is a simple table space with several tables.

The following example uses the SORTDATA keyword on the REORG of a partitioned table space:

 REORG TABLESPACE DB2USER1.CANDIDATE PART 3 SORTDATA SORTDEVT SYSDA

NOSYSREC

The intermediate storage of the rows in the unload data setSYSREC, after the unloading and sorting of the rows by the UNLOAD phasecan be avoided by specifying the NOSYSREC option explicitly or implicitly for the REORG utility. NOSYSREC is assumed for read/write online reorganizations (SHRLEVEL CHANGE), covered later in this chapter.

NOTE

Use of NOSYSREC requires that SORTDATA be used.

If the NOSYSREC parameter is used, the UNLOAD phase will not use the unload data set. Therefore, the UNLDDN parameter need not be specified or a SYSREC DD statement provided for the utility job.

Rows that must be sorted are passed to the RELOAD phase by the DFSORT exit after they have been sorted. If the rows do not need to be sorted, they are immediately passed to the RELOAD phase. DFSORT will need sort-work data sets if the number of rows to be sorted is large. If NOSYSREC is specified, either explicitly or implicitly, and dynamic allocation of the sort-work data set is not specified using SORTDEVT, you need to provide DD statements with the DD name DATAWKnn:nn = 01 for the first sort-work data set, 02 for the second, and so on.

When NOSYSREC is used, the DATAWKnn data set is used for the sorting of the rows, not the SORTWKnn-work data sets. DATAWKnn data sets need not be provided if you are not using NOSYSREC; they are not used if SORTDATA has been specified. In this case, the SORTWKnn data set is used if the sort-work data sets are not allocated dynamically.

Use of NOSYSREC will eliminate the I/O for the unload data set and will not cause additional I/O within DFSORT. The fact that different sort-work data sets are used does not change the I/O behavior of DFSORT. Therefore, use of NOSYSREC provides performance improvements.

NOTE

The REORG utility cannot be restarted if NOSYSREC has been specified or is assumed.

SORTKEYS

As of version 8, the SORTKEYS option is the default and no longer needs to be specified. The keyword is supported for compatibility only. The SYSUT1 and SORTOUT data sets can be eliminated from existing jobs to reduce any allocated and unused space. SORTKEYS allows for DFSORT exits to be used to pass the unsorted key/RID pairs for the indexes of the table space to DFSORT and to accept the sorted key/RID pairs again, rather than making the unsorted key/RID pairs available in the SYSUT1 data set and receiving the sorted key/RID pairs in the SORTOUT data set.

With SORTKEYS, the sorting of the key/RID pairs will be done in parallel with the reloading of the rows. Other activities can also be done in parallel, such as the building of the indexes and the inline statistics collection. The parallelism in the utility is accomplished by groups of z/OS subtasks. These task groups contain two or three subtasksSORT, BUILD, and STATISTICSdepending on whether inline statistics are to be established.

DB2 can allocate multiple task groups, two or three subtasks each, in order to establish the indexes for the tables of the table space in parallel with one another. The degree of parallelism used will depend on the specifications and the DD statements provided for sort-work data sets. If they are dynamically allocatedSORTDEVT specifiedthe degree of parallelism will be determined by DFSORT. If you manually allocate the sort-work data set and sort-message data sets, the degree of parallelism will be determined by the data sets you have allocated.

The following example uses SORTKEYS to execute the REORG utility:

 REORG TABLESPACE DB2CERT.CERTTS LOG NO SORTDATA SORTKEYS

Parallel Index Build

Because SORTKEYS is specified by default, many of the operations are performed in parallel: multiple sort and index build operations or multiple SORT and BUILD phases performed in parallel by the subtasks of the allocated task groups. The SORTBLD phase refers to when these operations are done in parallel. This phase encompasses the sort and build activities for all indexes involved in the reorganization.

The SORTBLD phase, which is jointly performed by the allocated task groups, partly overlaps with the RELOAD phase. Every index is constructed by a predefined task group, which can process multiple indexes. The proper SORT tasks for the indexes pass the key/RID pairs to DFSORT for sorting as the keys for the indexes are extracted from the rows and the new RIDs for the rows are determined. This allows the sorting of the key/RID pairs to be performed in parallel with the RELOAD phase. If multiple tasks groups are used, the key/RID pairs for the various indexes are sorted in parallel.

When DFSORT emits the sorted key/RID pairs, the BUILD tasks of the index will begin constructing the index. The building of the indexes will be partially overlapped with the reloading of the table space and the sorting of the key/RID pairs, even those for the same index. The various indexes are constructed in parallel if multiple task groups are used.

The construction of an index finishes only after the table space has completely been reloaded, despite the fact that the building of the indexes is done in parallel with the reloading of the table space. The task groups have three subtasks if inline statistics have been requested. This third STATISTICS subtask collects the requested statistics as the rows are reloaded and the indexes are built.

NOTE

Each SORT task requires a sort-work data set and a message data set. The message data set can be common to all task groups it is assigned to the SYSOUT.

Logging

The LOG option specifies whether records are logged during the reload phase of REORG. If the records are not logged, the table space is recoverable only after an image copy has been taken. If you specify COPYDDN, RECOVERYDDN, SHRLEVEL, REFERENCE, or SHRLEVEL CHANGE, an image copy is taken during REORG execution.

With LOG YES, the logs record changes made during the RELOAD phase. The LOG YES option is not allowed for any table space in DSNDB01 or DSNDB06, or if the SHRLEVEL REFERENCE or CHANGE options are used. If you specify SHRLEVEL NONEexplicitly or by defaultthe default is LOG YES. However, if you specify LOG NO with SHRLEVEL NONE, previous image copies may not be candidates for use during certain recoveries.

With LOG NO, DB2 does not log records. This puts the table space in COPY-pending status if either of these conditions is trUE.

REORG is executed at the local site, and COPYDDN, SHRLEVEL REFERENCE, or SHRLEVEL CHANGE is not specified.
REORG is executed at the remote site, and RECOVERYDDN is not specified.

Online Reorganizations (OLR)

Most reorganizations limit or restrict access to the table space that is being reorganized, especially if SHRLEVEL NONE is used. During the UNLOAD phase, write access is prohibited; during the RELOAD phase, no type of access is allowed. With today's increasing demand for 24-hour service, it becomes less and less acceptable that access is blocked for long periods of time or at all. To deal with this issue, DB2 supports online reorganization, a reorganization whereby online processes can access the table space for most of the time. Online reorg comes in two flavors.

If you specify SHRELVEL REFERENCE, you will have read access to the table space and access via the associated indexes for most of the time. The SWITCH phase has only a small window during which no access is allowed. SHRLEVEL REFERENCE reorganizations are also referred to as read-only online reorgs.
If you specify SHRLEVEL CHANGE, you will have read/write access to the table space and access via the indexes associated with the table space for most of the time. For a short period at the end of the LOG phase, you will have only read access; for a short period within the SWITCH phase, no access is allowed. SHRLEVEL CHANGE reorgs are referred to as read/write online reorgs. The LOG and SWITCH phases exist only for online reorganizations.

Read-Only OLR

Online reorg uses shadow data sets for the table space and the indexes being reorganized. Collectively, the shadow data sets for a table space are referred to as corresponding-shadow table space. Similarly, the shadow data sets for an index are referred to as corresponding-shadow index, or shadow index space. The shadow data sets never exist in the catalog.

Because the original table space does not change during a read-only online reorganization, four phases are sufficient. The rows of the space being reorganized are unloadedUNLOAD phaseand then reloaded into the shadow table spaceRELOAD phase. The reorganized indexes for the table space are not constructed in the original index spaces but in the shadow indexes spaces: in the SORTBLD phase or SORT/BUILD phases. During the SWITCH phase, the shadow data sets for the table/index spaces replace the original data sets, and all access is directed to them. As part of the switching process, the original data sets for the table/index spaces are renamed, and the shadow data sets receive their former names.

For DB2-managed table/index spaces, DB2 automatically creates the shadow data sets in the respective storage groups with the respective active space parameters and deletes the original data sets. For user-managed table/index spaces, you must use access method services (AMS) to define the data sets for the shadow tables/index space yourself. During the UTILTERM phase, the REORG utility completes the switch by renaming the original data sets again. They now receive the data set names the respective shadow data sets had before the reorganization. It is your responsibility to delete these data sets.

During all phases, excluding the SWITCH phase, the rows of the table space can be read, and the indexes of the table space can be used. However during the SWITCH phase, no access to the table space or through the indexes is allowed.

The FASTSWITCH keyword, the default, reduces the time that data is unavailable during the SWITCH phase. When this keyword is used, the online reorg no longer renames data sets, replacing the approximately 3-second outage associated with the renaming of original and shadow data set copies with a memory-speed switch of MVS catalog entries. In addition, FASTSWITCH does not have to invoke AMS to rename the data set.

If the data sets are system managed storage controlled, may need to switch automatic class selection routines to new naming standard (J00001 I-J switch).
The FASTSWITCH keyword cannot be used on catalog or directory objects.
A point-in-time (PIT) recovery works in spite of the changed data set name, even when concurrent copies are used.

Read/Write OLR

Read/write online reorg (SHRELEVEL CHANGE) does not allow you to specify the parameters SORTDATA, NONSYSREC, or SYSKEYS but always operates as if these parameters were specified. Thus, read/write OLR uses DFSORT to sort the rows during the UNLOAD phase and does not use the unload data set. This sorts the key/RID pairs for the indexes in parallel with the RELOAD phase and with the building of the indexes in the shadow index spaces. If the sort-work data sets are dynamically allocated or explicitly allocated for multiple task groups, read/write OLR builds the indexes in parallel.

The algorithm for read/write online reorg is the same as for read-only online reorg. However, the fact that changes are allowed during the UNLOAD, RELOAD, and SORTBLD phases is reflected by an additional phase: the LOG phase. During SWITCH, the changes performed throughout these phases are applied to the shadow table/index spaces. Even during most of the LOG phase, the user has full read/write access. Only at the end of the LOG phase does the REORG utility restrict access to read-only to guarantee that the reorganization comes to an end.

During a read-only online reorg, no access is allowed during the SWITCH phase. Also, a read/write online reorg always creates an inline copy for the table space being reorganized. Therefore, a full-image copy does not need to be created after the reorganization of the table space.

During the LOG phase, changes for the original table space on the DB2 log are applied to the shadow table space and consequently to the shadow index. The RIDs for the DB2 records on the DB2 log point to the original table space. In order to map the changes to the shadow table space, the REORG utility uses a four-column mapping tableactually a unique index over the mapping table. This table and index must be created prior to REORG utility execution.

NOTE

The mapping table must reside in a segmented table space that is not the table space being reorganized.

An example DDL of the four columns in the mapping table follows:

 CREATE TABLE MAP_TABLE   (TYPE CHAR(1) NOT NULL,    SOURCE_RID CHAR(5) NOT NULL,    TARGET_XRID CHAR(9) NOT NULL,    LRSN CHAR(6) NOT NULL) IN DB2CERT.MAPTS; CREATE UNIQUE INDEX MAPINX1 ON MAP_TABLE   (SOURCE_RID ASC,    TYPE,    TARGET_XRID,    LRSN);

The SOURCE_RID contains the RIDs for the rows in the original table space, whereas the column and the TARGET_XRID contain the extended RIDs for the rows in the shadow table space. The key of the mapping table index has the same four columns as the mapping table but in a different sequence.

The name of the mapping table must be specified via the MAPPING TABLE parameter for the REORG utility and can be any name you choose. An example of using the mapping table follows:

         REORG TABLESPACE DB2CERT.CERTTS SHRLEVEL CHANGE ... MAPPING TABLE DB2USER1.MAP_TABLE

DELETE, INSERT, SELECT, and UPDATE authorization will be needed on the mapping table.

NOTE

A mapping table is needed for each table space table being reorged in parallel, because exclusive locks are taken to prevent concurrent access to the mapping tables.

Only the index over the mapping table is filled; therefore, it is sufficient to allocate as little space as possible to the mapping table. For this index, you should assign at least 1.1 * Number of rows in TS * 27 bytes, assuming that the entire table space is being reorganized. For reorgs only of partitions of the table space, the number of rows in the partitions is to be used instead of the number of rows in the table space.

During read/write online reorganizations, application programs or end users may change the data in the original table space up to the end of the LOG phase. First, the LOG phase applies the changes made during the UNLOAD, RELOAD, and SORTBLD phases to the shadow table space and the shadow indexes. During most of the LOG phase, read/write access to the original table space is allowed. Because the data in the original table space may have changed during the first iteration, the second iteration must apply the changes made during the first iteration, a third iteration must apply the changes made during the second iteration, and so on. This could potentially cause the LOG phase to never end.

Ideally, fewer and fewer changes must be applied with each iteration. Conceivably, however, an endless number of iterations may be necessary. Therefore, the REORG utility must ultimately limit the access during the LOG phase to read, only to come to a final iteration. After this is done, REORG must apply only the changes accumulated during the previous iteration before it can enter the SWITCH phase.

Because the switch to read-only access impacts the operating environment, the REORG utility provides the ability to decide and specify how long a period this can be tolerated. This is done via the MAXRO parameter.

The REORG utility estimates how long the next iteration will take, based on the changes for the previous iteration. If its estimate is lower than or equal to the value specified via the MAXRO parameter, the REORG utility switches to read-only access or even to no access allowed, depending on what you have requested, and the last iteration takes place. The time of the last iteration may be larger than the estimate or the value you have specified. However, it should not be substantially larger.

If DRAIN WRITERS has been specified, either explicitly or by default, the REORG utility waits until all units of recovery accessing the table space have been committed and does not allow new units of recovery to begin. With DRAIN ALL, the REORG utility waits until all readers and writers are off the table space and does not allow any further read or write access. Readers may have locks (claims) not being released before they commit. Therefore, it is imperative that even long-running read-only programs commit from time to time to allow online reorganizations to succeed.

The default for MAXRO is 300 seconds. MAXRO DEFER causes log processing to continue indefinitely until you change MAXRO by means of the ALTER UTILITY command. The last iteration is forced by a different condition (Long-log) or the reorganization is terminated.

NOTE

Specifying a small value for MAXRO generally causes more iterations to take place and may increase the total elapsed time for the reorganization. A huge value for MAXRO will most likely lead to the second iteration's being the last one.

If a small value is specified for MAXRO, the log processing of the REORG utility may not catch up with the change activities performed for the table space. This is referred to as the long-log condition.

The REORG utility raises a long-log condition if the number of log records processed by the next iteration will not be sufficiently lower than the number of log records for the previous iteration and the next log iteration will take longer than the specified MAXRO values, that is, if the next iteration will not be the last one.

An option is available to specify what should happen in the case of long-log situations. After message DSNU3771 has been issued and the time specified through the DELAY parameter has passed, the action of the LONGLOG parameter is performed. The action can be CONTINUE, or continue log-processing iterations; TERM, or terminate the REORG; or DRAIN, or wait until all units of recovery involving the table space have been committed and prevent new units of recovery for the table space.

NOTE

The time on the DELAY parameter and the actions specified in the LONGLOG parameter can be changed by using the ALTER command, if necessary.

The REORG utility also allows you to control when a reorg must complete. You can use the DEADLINE parameter to specify a deadline, at which time the reorg must finish. The deadline can be an absolute timea time stampor a relative timea labeled duration expression using CURRENT DATE or CURRENT TIMESTAMP. If it estimates that the SWITCH phase will not complete before the specified deadline, the REORG utility issues the message that the DISPLAY UTILITY command would issue and terminates the reorganization. The DEADLINE parameter can be changed by means of the ALTER UTILITY command if the reorganization appears that it will not finish before its deadline and you do not want the reorg to be terminated.

NOTE

The deadline must be extended before it is reached.

Inline Statistics during REORG

After reorganization, the old statistics for the reorganized table space, index, or partitions are no longer valid; nor are the old statistics for the indexes associated with a table space being reorganized. New statistics must be established for these objects to ensure that the DB2 optimizer has the proper statistics for access-path determination. (For more information on access paths, refer to Chapter 18.)

Establishing the new statistics is done by executing the RUNSTATS utility after the reorganization or running the REORG utility for all the associated objects. These inline statistics can be requested by means of the STATISTICS clause. The DB2 catalog tables are updated accordingly and/or the statistics are reported in the output listing for the REORG utility. The following example requests inline statistics during a reorg:

 REORG TABLESPCE DB2CERT.CERTTS SORTDATA STATISTICS PART 5

By allowing REORG to perform inline statistics, the RUNSTATS utility does not need to be run after the reorg. It is faster to run the statistics inline, and they are done in parallel by subtasks during the reorg.

NOTE

Rebinds the plans and packages for application programs will still have to occur to make the statistics effective.

The keywords you can specify as a part of the STATISTICS clause are the same as you can specify for RUNSTATS, and the functions provided are the same. This includes statistics sampling, which will be discussed later in this chapter.

The REORG utility allows you to select the rows you want to remove, using the DISCARD clause, which may contain one or more FROM TABLE specifications identifying the tables from which rows should be removed.

In the FROM TABLE specification, you name a table of the table space from which rows should be removed and the selection condition for the removal of the rows (WHEN condition). In the WHEN condition, you can combine basic predicates by means of AND and OR to select the rows to be discarded. The predicates must refer to columns of the specified table.

NOTE

As the consequence of discarding rows, the check-pending (CHKP) conditions will be set for the table space containing dependent tables.

REORG also allows you to unload data by using the UNLOAD options: CONTINUE, PAUSE, ONLY, and EXTERNAL. CONTINUE tells REORG to continue processing after the data is unloaded; ONLY says to stop and terminate after the unload.

PAUSE tells REORG to stop after the UNLOAD phase. One common use for REORG UNLOAD PAUSE is to unload the data from the table space and then remove the extents from user-defined data sets by using access method services (AMS). The status of the REORG is recorded in SYSUTIL. The utility could then be restarted at the next phase (RESTART(PHASE)).

The UNLOAD EXTERNAL option allows you to unload data into a data set that will be in a format that the LOAD utility can use. The following examples show DISCARD and UNLOAD EXTERNAL:

 REORG TABLESPACE DB2CERT.CERTTS    UNLOAD EXTERNAL    FROM TABLE DB2USER1.TEST_TAKEN    WHEN (CID = 300) REORG TABLESPACE DB2CERT.CERTTS    DISCARD    FROM TABLE DB2USER1.TEST_TAKEN    WHEN (CID = 300)

Reorganizing Indexes

Reorganizing an index reclaims fragmented space and improves access performance. The REORG INDEX options are similar to those on REORG TABLESPACE, such as degree of access to the data during reorganization, collecting inline statistics using the STATISTICS keyword, preformatting pages, online reorg features, and the REPORTONLY feature. Sometimes, performance may be improved simply by reorganizing the index, not necessarily the table space.

The REORG INDEX statement will reorganize the entire indexall parts if partitioning. The REORG INDEX PART n reorganizes PART n of the partitioning index.

The execution phases of REORG INDEX are fairly similar to those of REORG TABLESPACE.

UTILINIT performs initialization and setup.
UNLOAD unloads index space and writes keys to a sequential data set.
BUILD builds indexes and updates index statistics.
LOG processes the log iteratively but is used only if you specify SHRLEVEL CHANGE.
SWITCH changes access to a shadow copy of the index space or partition and is used only if you specify SHRLEVEL REFERENCE or CHANGE.
UTILTERM performs any necessary cleanup.

Following is an example of the REORG INDEX utility with inline statistics:

 REORG INDEX DB2USER1.CANDIDATE SHRLEVEL REFERENCE STATISTICS

Triggering Reorganizations

Data that is organized well physically can improve the performance of access paths that rely on index or data scans. Well-organized data can also help reduce the amount of disk storage used by the index or table space. If the main reason for reorganizing is performance, the best way to determine when to reorganize is to watch your statistics for increased I/O, getpages, and processor consumption. When performance degrades to an unacceptable level, analyze the statistics described in the guidelines in this section to help you develop your own rules for when to reorganize in your particular environment. However, because running the REORG utility can be quite expensive and disruptive, it should be run only when a table space and indexes absolutely require it.

If the statistics are current in the DB2 catalog tables, REORG can determine whether it needs to perform the reorganization. The REORG utility allows for specification of criteria indicating when reorganization should be performed. This saves the DBA from having to do the analysis on the statistics and determining whether a reorg should be scheduled.

You can use some general guidelines to determine when to use the REORG utility. You query the catalog to manually determine to run REORG, or you can use the REORG utility to trigger reorgs, if necessary.

Catalog Queries

Catalog queries you can use to help you determine when to reorganize are included in DSNTESP in SDSNSAMP and can be used as input to SPUFI. The queries are as follows.

List table spaces that are candidates for reorganization:
```
 SELECT DBNAME, TSNAME  FROM SYSIBM.SYSTABLEPART   WHERE ((CARD > 0 AND (NEARINDREF + FARINDREF) * 100 / CARD > 10)   OR PERCDROP  > 10); 
```
NOTE

Indirect referencesgrowth in NEARINDREF and FARINDREFcan be caused by updates to columns defined as VARCHARs where the lengths of the rows change.

List index spaces that are candidates for reorganization:

 SELECT IXNAME, IXCREATOR  FROM SYSIBM.SYSINDEXPART   WHERE LEAFDIST > 200;

List the number of varying-length rows that were relocated to other pages because of an update. (This query gives an indication of how well DASD (Direct Access Storage Device) space is being used.)
```
 SELECT CARD, NEARINDREF, FARINDREF  FROM SYSIBM.SYSTABLEPART  WHERE DBNAME = 'xxx'  AND TSNAME = 'yyy'; 
```
List the percentage of unused space in a nonsegmented table space. (In nonsegmented table spaces, the space used by dropped tables is not reclaimed until you reorganize the table space.)
```
 SELECT PERCDROP  FROM SYSIBM.SYSTABLEPART  WHERE DBNAME = 'xxx'  AND TSNAME = 'yyy'; 
```
Determine whether the rows of a table are stored in the same order as the entries of its clustering index. (A large value of FAROFFPOS indicates that clustering is degenerating. A large value of NEAROFFPOS might also indicate that the table space needs reorganizing, but the value of FAROFFPOS is a better indicator.)
```
 SELECT NEAROFFPOS, FAROFFPOS  FROM SYSIBM.SYSINDEXPART  WHERE IXCREATOR = 'zzz'  AND IXNAME = 'www'; 
```
Return LEAFDIST, which is the average distance, multiplied by 100, between successive leaf pages during sequential access of an index. (If LEAFDIST increases over time, the index should be reorganized.)
```
 SELECT LEAFDIST  FROM SYSIBM.SYSINDEXPART  WHERE IXCREATOR = 'zzz'  AND IXNAME = 'www'; 
```
List the LOB table spaces that should be reorganized. (A value of ORGRATIO greater than 2 generally indicates a LOB table space that needs reorganization.)
```
 SELECT DBNAME, NAME, ORGRATIO  FROM SYSIBM.SYSLOBSTATS  WHERE ORGRATIO > 2; 
```

REORG Triggers

The REORG utility embeds the function of catalog queries. If a query returns a certain resultyou can use the default or supply your ownREORG will either reorganize or not. Optionally, you can have REORG run a report instead of doing the reorganization.

Following is an example of how to specify the OFFPOSLIMIT and INDREFLIMIT conditional reorg triggers:

 REORG TABLESPACE DB2USER1.TEST_TAKEN    SORTDATA NOSYSREC SORTKEYS    COPYDDN SYSCOPY1    OFFPOSLIMIT    INDREFLIMIT    STATISTICS TABLE(ALL)INDEX(ALL)

The REORG options OFFPOSLIMIT, INDREFLIMIT, and LEAFDISTLIMIT can also be equivalently described in the following SQL:

OFFPOSLIMIT

 SELECT CARDF   , (NEAROFFPOSF + FAROFFPOSF) * 100 / CARDF FROM SYSIBM.SYSINDEXPART WHERE CARDF > 0 AND (NEAROFFPOSF + FAROFFPOSF) * 100   / CARDF > :offposlimit

INDREFLIMIT

 SELECT CARD   , (NEARINDREF + FARINDREF) * 100 / CARD FROM SYSIBM.SYSTABLEPART WHERE CARD > 0 AND (NEARINDREF + FARINDREF * 100   / CARD > :indreflimit

LEAFDISTLIMIT

 SELECT LEAFDIST FROM SYSIBM.SYSINDEXPART WHERE LEAFDIST > :leafdistlimit

The REORG utility does not embed any function to help you determine when to reorganize LOB table spaces.

ALTER TABLE Statements

Another time to consider reorganizing data to improve performance is when ALTER TABLE statements have been used to add a column to the table or to change the data types or the lengths of existing columns. Such changes cause the table space to be placed in advisory REORG-pending (AREO*) status. In the case of changing the definition of an existing column, the table space is placed in AREO* status because the existing data is not immediately converted to its new definition.

Reorganizing the table space causes the rows to be reloaded, with the data converted to the new definition. Until the table space is reorganized, the changes must be tracked and applied as the data is accessed, possibly degrading performance. For example, depending on the number of changes, you may see decreased performance for dynamic SQL queries, updates, and deletes, as well as other ALTER statements, especially those that are run concurrently. In addition, multiple REORG and LOAD utilities running concurrently may perform slower or create timeouts. It may also take longer to unload a table that has had many changes prior to being reorganized.

Index Reorganizing

The LEAFNEAR and LEAFFAR columns of SYSIBM.SYSINDEXPART measure the disorganization of physical leaf pages by indicating the number of pages that are not in an optimal position. A REORG INDEX should be considered in the following cases.

LEAFFAR/NLEAF is greater than 10 percent. ( NLEAF is a column in SYSIBM.SYSINDEXES.)
PSEUDO_DEL_ENTRIES/CARDF is greater than 10 percent. If you are reorganizing the index because of this value, consider using the REUSE option to improve performance.
The data set has multiple extents. Keeping the number of extents to less than 50 is a general guideline. Many secondary extents can detract from the performance of index scans.
The index is in advisory REORG-pending status (AREO*) or advisory-REBUILD-pending status (ARBDP) as the result of an ALTER statement.
((REORGINSERTS + REORGDELETES) X 100) / TOTALENTRIES is greater than RRIInsertDeletePct.
(REORGAPPENDINSERT X 100) / TOTALENTRIES is greater than RRIInsertDeletePct.
Mass delete occurred.

Table Space Reorganizing

SYSIBM.SYSTABLEPART contains the information about how the data in the table is physically stored. Consider running REORG TABLESPACE in the following situations.

FAROFFPOSF/CARDF is greater than 10 percent. Or, if the index is a clustering index, the CLUSTERRATIOF column of SYSIBM.SYSINDEXES is less than 90 percent.
(NEARINDREF + FARINDREF)/CARDF is greater than 10 percent.
PERCDROP is greater than 10 percent for a simple table space. If you are reorganizing the table space because of this value, consider using the REUSE option to improve performance.
The data set has multiple extents. Keeping the number of extents to less than 50 is a general guideline.
The table space is in AREO* status as the result of an ALTER TABLE statement.

SYSIBM.SYSLOBSTATS contains information about how the data in the table space is physically stored. Consider running REORG on the LOB table space when the value in the ORGRATIO column is 2. Additionally, you can use real-time statistics to identify DB2 objects that should be reorganized, have their statistics updated, or be image copied.

NOTE

If you specify REPORTONLY on a reorg of a table space, partition, index, or index partition, the REORG does not take place independent of whether the limit is surpassed. You simply get a report of whether you needed a reorg.

DBA Analysis

The database administrator can choose not to use the REORG triggers and instead analyze the statistics manually to determine whether a reorg is necessary. For any table space, a reorg is needed if any of the following apply, using statistics from the clustering index.

Any data set behind the table space has multiple extents.
CLUSTERRATIO < 90 percent always. However, keep very small tables at 100 percent clustered; medium tables should be reorged below 98 percent; large tables should be reorged below 95 percent; and from 90 percent to 95 percent can cause very poor performance.
(NEARINDREF + FARINDREF) / CARD > 10 percent.
FAROFFPOS / CARD > 5 percent.
NEAROFFPOS / CARD > 10 percent.
DBD grows after successive drops/recreates in a table space.

To check whether your index space needs to be reorganized, review the LEAFDIST column in the SYSINDEXPART catalog table. Large numbers in the LEAFDIST column indicate that several pages are between successive leaf pages, and using the index will result in additional overhead. In this situation, DB2 may turn off prefetch use as well. Reorganizing the index space will solve these problems.

Reorganizing the Catalog and Directory

To determine when to reorganize the DB2 catalog table space and index spaces, you can also use the same techniques used for determining when to reorganize application table spaces and index spaces. First, you will want to ensure that statistics are kept current, by using RUNSTATS, based on the frequency of changes in the catalog, so that decisions for reorganizations are based on current numbers. A reorganization is also necessary if the objects are in extents or if unused space needs to be reclaimed.

NOTE

Every table space in the DSNDB06 database is eligible for reorganization, but DSNDB06.SYSPLAN cannot be reorganized with SHRLEVEL CHANGE.

DB2 directory reorganizations are also important because the directory contains critical information about internal DB2 control and structures. These elements are important to DB2 processing because they affect application plans and package execution, utility execution, and database access. If they become disorganized, transaction and utility performance can be affected.

The DB2 catalog and some of the DB2 directory tables have relationships. For instance, you would want to reorganize the directory table DBD01 when you reorganize catalog table SYSDBASE. Directory tables SCT02 and SPT01 would need to be reorganized with SYSPLAN and SYSPACKAGE, respectively.

Rebalancing Partitions

The limit keys for the partitions of a partitioned table space can be changed if the partitions become unbalanced. This is done via the ALTER TABLE SQL statement with the PARTITION BY parameter.

This changes only the definition in the DB2 catalog but does not move any rows in the partitioned table space or any index entries in the partitioned index. DB2 will also set the REORP status for those partitions because they may no longer contain the correct rows. The affected partitionsall partitions in the range affected by the ALTERare no longer accessible.

NOTE

The index partitions will not be placed in REORP status, but they cannot be used for access to the partitions in REORP status. NPIs cannot be used for access to the affected data partitions, but if index-only access is needed, they can be used.

In order to redistribute the rows for the affected partitions, you must reorganize all these partitions at the same time by specifying a range in the PART n:m parameter of the REORG utility control statement. The following example shows REORG parts 5 through 10:

 REORG TABLESPACE DB2CERT.CERTTS PART 5:10

NOTE

You cannot specify SHRLEVEL REFERENCE or SHRLEVEL CHANGE if partitions are in REORP. The REORG utility requires that you specify SHRLEVEL NONEexplicitly or by defaultif at least one of the target partitions is in REORP.

If the limit key for the last partition in the key sequence for a large table spacecreated with LARGE or DSSIZEis changed, there must be a discard data set if the key range had been reduced independent of whether some of its former rows no longer belong to the key range. The rows no longer belonging to any key range are then placed in the discard data set. This can be specified on the DD name for the discard data set via the DISCARDDN parameter, or use the default DD name of SYSDISC.

NOTE

The REORG utility always establishes an inline copy for at least one of the partitions being reorganized that is in REORP status. This is a single inline copy that includes all partitions of the specified range. The DD name for the local primary copy data set must be provided via the COPYDDN parameter, or provide a DD statement with the DD name SYSCOPY.