Data Recoveries | DB2 for z/OS Version 8 DBA Certification Guide

In order to ensure that it is possible to recover to a point in time when everything is consistent and in sync, a point of consistency, or a common quiesce point, must be established. This point of consistency is to establish a recovery set so that related objects will be recovered to a consistent point in time.

A point of consistency is established by executing the QUIESCE utility, which "quiets" the system and writes to the log the time during which there was no activity and everything was in sync. The utility does this by waiting for current units of work to complete and prevents any new ones from beginning. When the last unit of work has completed, a quiesce point is established. This point is recorded for each table space and index space in a recovery set by recording the START_RBA of the quiesce point in SYSIBM.SYSCOPY. This row will also have an ICTYPE of Q. These objects have a common point to which they can recover.

You determine which objects will be together in establishing this common point by using the TABLESPACESET keyword on the QUIESCE utility. You can also specify particular partitions of a partitioned table space:

 QUIESCE TABLESPACESET          TABLESPACE dbname1.tsname1          TABLESPACE dbname2.tsname2

NOTE

By specifying WRITE YES, the default, you can ensure that all changed pages of the table spaces are written to disk.

NOTE

QUIESCE will drain the table spaces one by one. In a system that is heavily updated and long-running units of recovery can be in the way, the QUIESCE utility can cause a serious outage because it will keep the table spaces that it was able to drain locked, causing SQL to fail.

Another way to establish a consistent point for your data is to establish a copy set. If you perform a single copy of multiple objects using the COPY utility with SHRLEVEL REFERENCE, the objects will have the same START_RBA value, which will create a recovery copy set.

Planning for Recovery

Report Recovery Utility

This utility provides information that can help plan for recovery:

Recovery history from SYSIBM.SYSCOPY
Log ranges from SYSIBM.SYSLGRNX
VOL-SER numbers of archive logs and BSDS
VOL-SER numbers of image copy data sets

The information from the REPORT RECOVERY utility helps you identify what image copies, active logs and archive logs are needed to recover a table space.

Fast Log Apply

The RECOVER utility can use either a standard log-apply process during recoveries or a fast log apply (FLA). Normal use is the standard log-apply process, but the FLA process can be used if it has been enabled.

The FLA process is enabled if a nonzero value for the size of the log-apply storage resides in the DBM1 address space. This is the DSNZPARM LOGAPSTG set on the DSNTIPL installation panel.

The FLA process will use two buffers, known as FLA buffers. One buffer is for reading the log records, and the other is used for applying the log records to the table space or index space. When the read buffer fills, the pages are sorted and then applied, making this now the write buffer, and the older write buffer becomes the new read buffer and begins to fill. These two buffers continue this flipping process until all the records have been read and applied.

FLA also takes advantage of parallelism. It uses one task per object being recovered for applying the logs and an additional task for reading the next set of log records at the same time.

Check Data Utility

This utility will check the table space for violations of table-check constraints or referential constraints and report this information. Check constraints would be violated only if they were added after the recovery, so only referential constraints should show up as violated. You can execute the CHECK DATA utility on the entire table space or on individual partitions. The utility will provide options for handling various violations.

Violations will be reported only in the utility output listing unless the FOR EXCEPTION clause on the utility is specified. If the DELETE NO clause is specified, the utility will report the violations and copy the violators to an exception (shadow) table provided. When DELETE YES is specified, the violators are removed after they are moved to the exception tables. The rows will have to be moved back to the original tables after they are corrected, but telling DB2 to delete the violators will reset the check-pending flag.

NOTE

Because it uses the primary index and, if available, the index on the foreign key, it is necessary to rebuild indexes before running the CHECK DATA utility.

Exception tables will have to be created for the CHECK DATA utility prior to its execution. These tables must have the same columns, in the same order, with the same data type, as the original table, although the column names can be different. The easiest way to do this is via the CREATE TABLE LIKE statement, creating an exception table exactly like the original. Two additional columns can be added to this table: a five-character column that gives you the RID or the original row and a timestamp column that gives you the starting data and time for the CHECK DATA utility.

NOTE

If a timestamp column is desired, a RID column must already exist. But you can have the RID column even without the TIMESTAMP column.

Catalog and Directory Recovery

You start a DB2 systemwide recovery by first recovering the catalog and directory; no other recoveries are possible before this process is completed. Therefore, the catalog and directory need to be image copied often, so that the recovery process for these objects is as fast as possible.

NOTE

If you lose your catalog and/or directory, you will not have an operational DB2 subsystem.

The time required for recovery of the catalog and the directory is crucial and needs to be kept to a minimum. Because DB2 performs the log apply for the catalog and the directory, it does not matter when you take the image copy of the catalog. However, it helps if you take a SHRLEVEL REFERENCE image copy just before you cut your last archive and take both this image and the archive logs to the vault or off site. This way, you make sure that the DB2 catalog contains all updates and that it is also valuable without the archive logs.

It is also important that the ICF (Integrated Catalog Facility) catalog be kept in sync with your DB2 catalog. For example, if the ICF catalog has yesterday's image in a disaster-recovery scenario, your DB2 catalog will reference data sets, such as image copies and/or table spaces, that do not exist, according to the ICF catalog. To avoid this problem, make sure that the recovery procedures of the ICF catalog match those of the DB2 catalog.

Because they also comprise table spaces and index spaces, the DB2 catalog and directory can fall subject to failures, such as a media failure, and these objects may need to be recovered. Recovering catalog data sets must be done in a specific order, owing to relationships and dependencies in the catalog. This order is as follows:

1.	`DSNDB01.SYSUTILX`
2.	`SYSUTILX` indexes
3.	`DSNDB01.DBD01`
4.	`DSNDB06.SYSCOPY`
5.	`SYSCOPY` (3) indexes (IBM only)
6.	`DSNDB01.SYSLGRNX`
7.	`SYSLGRNX` indexes
8.	`DSNDB06.SYSDBAUT`
9.	`SYSDBAUT` (3) indexes (IBM only)
10.	`DSNDB06.SYSUSER`
11.	`DSNDB06.SYSDBASE`
12.	`SYSDBASE` indexes (IBM only)
13.	`SYSUSER` (3) indexes (IBM only)
14.	All other catalog and directory table spaces and indexes
15.	Catalog indexes (user defined)
16.	System utility table spaces, such as QMF
17.	Communications database
18.	Object and application registration tables
19.	Resource-limit specification tables

The objects in steps 111 need to be recovered serially; the objects in steps 1219 can be recovered in parallel.

NOTE

Never recover the catalog and directory to a point in time unless all user data is also recovered to the same point in time. Otherwise, the inconsistency introduced will corrupt data and can cause systemwide DB2 failure.

Table Space Recovery

The RECOVER utility is used to recover a table space. The utility's two major phases are RESTORE and LOGAPPLY.

To determine which full image copy is needed, the RECOVER utility looks at the rows in the SYSIBM.SYSCOPY catalog table. This full image copy is taken and merged with any incremental copies that are found in the SYSCOPY table, replacing any updated pages in the full copy and providing the basis for the restoration of the table space.

It is possible to use backups not created by DB2. However, restoring any object is then your own responsibility, and you must specify that the RECOVER utility has to perform a LOGAPPLY only.

After the full image copy is brought up to current from the merging of the incremental image copies, the next phase is the LOGAPPLY. This phase applies any changes that are on the DB2 log that were made after the image copies were taken. The utility reads the logs and applies these changes to the restored table space. The utility uses the SYSLGRNX table to identify the DB2 log range that pertains to the table space being recovered. LOGAPPLY also uses the START_RBA in SYSCOPY to identify the latest image copy that was used during the RESTORE phase and to know from what point records in the log must be applied. The LOGAPPLY phase is very efficient because it can sort the log on page number and then on RBA/LRSN.

NOTE

The START_RBA will depend on whether the image copy was SHRLEVEL REFERENCE or SHRLEVEL CHANGE.

Index Space Recovery

You can also recover indexes if they were enabled for image copies in the DDL definition. Using image copies, an index is recovered rather than being rebuilt from the table by using the REBUILD utility. Only full image copies can be taken on indexes. The LOGAPPLY phase works the same way and applies necessary changes from the log since the time of the last image copy. Even if an index was marked to be recoverable, it can still be rebuilt by using the REBUILD utility. This utility will extract the index keys from the data, sort them, and build the index again, ignoring all recovery information available.

Object Recovery

Partial object recovery is possible against a partition, a data set of an NPI, a partition of a DPSI, a single page, and an error page range

NOTE

A single table cannot be independently recovered.

Just as it is possible to back up multiple objects in the same job execution, it is possible to recover multiple objects at the same time. In a single RECOVER statement, you can recover multiple table spaces, index spaces, partitions, or data sets. You simply repeat the TABLESPACE, INDEXSPACE, or INDEX clause:

 RECOVER TABLESPACE dbname1.tsname1         TABLESPACE dbname2.tsname2         INDEXSPACE creator1.ixname1         (PARALLEL (2))

The RESTORE phase is performed in parallel by using the PARALLEL option. In the preceding example, two streams of parallelism are attempted. The RESTORE phases for the objects are started and executed in parallel: One of the tasks reading the pages from the image copy, and the other tasks writing the pages that have been read.

The log is read only once during the LOGAPPLY phase. This phase is common for all the objects being recovered and does not use parallelism but can sort the log to improve performance.

NOTE

DB2 will lessen the parallel degree or even serialize the recoveries if storage is insufficient to support multiple streams.

Prior to version 8, parallel recovery was possible only if the image copies were on disk; if the copies were stacked on cartridge, the recovery processes were serialized. As of version 8, the stacked copies on tape can be recovered in parallel, improving usability and performance by implicitly retaining mounted volumes for input data sets used by RECOVER and COPYTOCOPY. Version 8 also allows dynamic allocation access for data sets stacked onto a tape volume. There is no unnecessary unloading and remounting of tapes between access

NOTE

This can also be applied via APARs (authorized program analysis report) for version 7.

Fallback Recovery

Sometimes, things can go wrong in recoveries, as when an image copy is not available, is defective, or is deleted. In these cases, DB2 has a few choices:

Use of local copies. If the local primary copy cannot be used, DB2 will try to use the local backup copy. DB2 will not try to use the remote copies, assuming that they are being keep off site.
Use of incremental copies. If an incremental copy cannot be used, DB2 will merge all prior incremental copies with the full copy, ignoring the defective incremental and all subsequent incremental image copies. DB2 will then start the LOGAPPLY phase at the START_RBA of the last incremental that was merged.
Invalid full copies. If the full copy cannot be used, DB2 will fall back to a previous full image copy and then merge it with the incremental image copies, if available. If no good full copy exists, the table space is recovered from the log or a DSNU510I error message is issued, and the utility terminates in error.
Index space invalid copies. If no valid index space copies exist, the same message as the previous point is issued, and the utility is terminated. It would then be necessary to perform a rebuild of the index from the table, using the REBUILD utility.

NOTE

If a REORG LOG NO or LOAD LOG NO was performed and no image copy was created, fallback processing will fail.

Point-in-Time Recovery

You can use the TOLOGPOINT parameter in your RECOVERY job in order to recover to any previous point in time. This parameter is specified to provide the log RBA or LRSN of the point in time for the recovery. This should be a quiesce point that was established for the copy recovery set. This information is obtained from the SYSIBM.SYSCOPY table or by using the REPORT RECOVERY utility.

After a point-in-time recovery has been performed, a row will be inserted into the SYSIBM.SYSCOPY table, with an ICTYPE of P for each object that was recovered, in order to allow future recoveries to skip those associated log ranges.

NOTE

In a non-data sharing environment, the TORBA keyword can be used, but it is recommended that TOLOGPOINT be used. In a distributed environment, TOLOGPOINT can also be used to recover two table spaces in two separate subsystems to the same point in time.

In order to avoid any pending conditions when you recover a table space set to a prior point in time, you will need to ensure that all the table spaces are in the same RECOVER control statement. You can use the TABLESPACESET keyword in the table space statement.

When a point-in-time recovery is performed for multiple objects in a single RECOVER utility control statement with TOCOPY, there is no LOGAPPLY phase but rather only a RESTORE phase to the last image copy. Therefore, multiple control statements must be used to prevent a CHKP status being set on the objects having RI relations. This will also cause an RBDP or CHKP on the indexes. It is wiser to specify a common START_RBA for a recovery copy set via the TOLOGPOINT parameter.

A point-in-time recovery can result in a variety of pending conditions.

RBDP (rebuild-pending) results for indexes that existed on tables that were part of a table space that was recovered. This status is for indexes that were not copy enabled. A copy-enabled index will get this status only if it was not included in the recovery set. Recovering only certain partitions can result in RDBP*, a nonpartitioning index rendering it inaccessible.
CHKP (check-pending) results from recovery of a table space that has tables with RI relationships with tables that were not recovered. This status also results when indexes are recovered for table spaces that were not also recovered.

In order to resolve these statuses, certain actions need to be taken. For an RBDP condition, the index will need to be rebuilt via the REBUILD INDEX utility. You will need to rebuild the index if a logical partition of a nonpartitioning index has RBDP* status.

For a CHKP, the CHECK DATA utility has to be performed. It will find any violations of referential integrity constraints and can optionally resolve them by removing the offending rows and any directly or indirectly dependent rows. CHKP conditions on an index may indicate that the index is inconsistent with the data in the table. The CHECK INDEX utility can help to resolve these inconsistencies by determining whether there are any; if so, you can run the REBUILD INDEX utility to rebuild the index and get it back in sync with the data.

NOTE

CHECK INDEX will reset the CHKP flag if no inconsistencies are detected. If inconsistencies between the index and the data exist, you may want to immediately run REBUILD INDEX to resolve problems in a timely manner.

NOTE

If any unexpected pending conditions, especially recovery pending (RECP), are encountered, carefully review the utility output. Utilities set and resolve pending conditions while they are processing. If, by the end of the utility, pending conditions still exist, perhaps something was wrong.

LOB Recovery

Special recovery considerations for LOB are needed because of the way large objects are stored and maintained.

Logging

An option allows you to decide whether to log changes to LOB columns. This option is specified on the creation of the LOB table space by using the LOG option; LOG YES is the default. LOBs larger than 1GB cannot be logged. If you have a LOB defined as LOG NO and then decide to alter it to be LOG YES, the LOB will be placed in copy-pending status. For LOBs that are defined as LOG NO, the force-at-commit protocol will ensure that the LOB values persist once committed. Even with LOG NO, the changes to the system pages and the auxiliary indexes are logged.

Point-in-Time Recovery

Even though there are no referential constraints between the LOB table space and its associated base table, the LOB table space belongs to the same table space set as the associated base table. A point-in-time recovery will need to recover both the base table and the LOB table space to a common point of consistency. Thus, a point of consistency will need to be established by quiescing the table space set or by using the COPY utility to generate a set of SHRLEVEL REFERENCE image copies for the table space set. The table space set will then need to be recovered to the RBA, the quiesce point, or the image copy set. A QUIESCE WRITE YES will record the point for the index on the auxiliary table if the index was defined with COPY YES.

AUXW Status

The LOG NO optionto not log the LOBsmeans any log-apply processing required during a recovery will invalidate the LOB values that were recorded after the last restored image copy of the table space, because DB2 records when the LOB values were changed but not the update itself. This will place the LOB table space in AUXW status (auxiliary warning), and the invalid LOB values will not be able to be read by SQL.

The table space that was defined with LOG NO will need to be recovered to current by running the CHECK LOB utility on the LOB table space to identify which rows are invalid. The rows are identified by their ROWID. Then SQL is used on the base table to update the invalid LOB values or delete the row that contains the invalid LOB value. A second run of the CHECK LOB utility will be needed to verify the LOB value, and then it will reset the AUXW status. Even if the LOB table space is in AUXW status, DML statements can be used on the base table but will fail with an SQLCODE of 904 when they try to read an invalid LOB.

Recover Pending

A recovery-pending (RECP) state can be set on table spaces and index spaces to prevent access to the affected objects. This state can be set on indexes after executing a recovery utility to TOCOPY and TORBA on a table space or from utility abends or terminations of RECOVER, LOAD, or REORG. The condition can be reset by executing a RECOVER, LOAD REPLACE, or REPAIR utility.