To ensure that it is possible to recover to a point in time when everything is consistent and in sync, a point of consistency, or common quiesce point, must be established. Establishing this point of consistency defines a recovery set so that related objects can be recovered to a consistent point in time.
A point of consistency is established by executing the QUIESCE utility, which "quiets" the system and writes to the log a time at which there was no activity and everything was in sync. The utility does this by waiting for current units of work to complete and preventing any new ones from beginning. When the last unit of work has completed, a quiesce point is established. This point is recorded for each table space and index space in a recovery set by recording the START_RBA of the quiesce point in SYSIBM.SYSCOPY; the row will also have an ICTYPE of Q. These objects then have a common point to which they can recover.
You determine which objects will share this common point by using the TABLESPACESET keyword on the QUIESCE utility. You can also specify particular partitions of a partitioned table space:
QUIESCE TABLESPACESET TABLESPACE dbname1.tsname1 TABLESPACE dbname2.tsname2
By specifying WRITE YES, the default, you can ensure that all changed pages of the table spaces are written to disk.
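Putting these options together, a quiesce of a table space set with changed pages externalized to disk might look like the following sketch; the object name is a placeholder carried over from the earlier example:

```
QUIESCE TABLESPACESET TABLESPACE dbname1.tsname1 WRITE YES
```

Naming one table space with TABLESPACESET quiesces every table space in the set to which it belongs.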
QUIESCE drains the table spaces one by one. In a heavily updated system where long-running units of recovery can get in the way, the QUIESCE utility can cause a serious outage: it keeps the table spaces it has already drained locked, causing SQL against them to fail.
Another way to establish a consistent point for your data is to establish a copy set. If you perform a single copy of multiple objects using the COPY utility with SHRLEVEL REFERENCE, the objects will have the same START_RBA value, which will create a recovery copy set.
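Such a copy set might be produced with a single COPY statement along these lines; this is a sketch, and the object names and output DD names are hypothetical:

```
COPY TABLESPACE dbname1.tsname1 COPYDDN(COPY1)
     TABLESPACE dbname2.tsname2 COPYDDN(COPY2)
     SHRLEVEL REFERENCE
```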
Planning for Recovery
Report Recovery Utility
This utility provides information that can help you plan for recovery.
The information from the REPORT RECOVERY utility helps you identify what image copies, active logs and archive logs are needed to recover a table space.
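For example, to report the recovery information for a single table space (the name is a placeholder):

```
REPORT RECOVERY TABLESPACE dbname1.tsname1
```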
Fast Log Apply
The RECOVER utility can use either a standard log-apply process or a fast log apply (FLA) process during recoveries. The standard log-apply process is used by default, but the FLA process can be used if it has been enabled.
The FLA process is enabled by specifying a nonzero value for the size of the log-apply storage in the DBM1 address space. This is the LOGAPSTG DSNZPARM, set on the DSNTIPL installation panel.
The FLA process uses two buffers, known as FLA buffers: one for reading the log records and one for applying them to the table space or index space. When the read buffer fills, its pages are sorted and applied, making it the new write buffer, while the former write buffer becomes the read buffer and begins to fill. The two buffers continue flipping in this way until all the records have been read and applied.
FLA also takes advantage of parallelism. It uses one task per object being recovered for applying the logs and an additional task for reading the next set of log records at the same time.
Check Data Utility
This utility will check the table space for violations of table-check constraints or referential constraints and report this information. Check constraints would be violated only if they were added after the recovery, so only referential constraints should show up as violated. You can execute the CHECK DATA utility on the entire table space or on individual partitions. The utility will provide options for handling various violations.
Violations are reported only in the utility output listing unless the FOR EXCEPTION clause is specified on the utility. If the DELETE NO clause is specified, the utility reports the violations and copies the violating rows to the exception (shadow) table provided. When DELETE YES is specified, the violators are removed after they are moved to the exception tables. The rows will have to be moved back to the original tables after they are corrected, but telling DB2 to delete the violators will reset the check-pending flag.
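A sketch of such a control statement, with hypothetical table and exception-table names:

```
CHECK DATA TABLESPACE dbname1.tsname1
      FOR EXCEPTION IN creator1.EMP USE creator1.EXCPT_EMP
      DELETE YES
```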
Because CHECK DATA uses the primary index and, if available, the index on the foreign key, it is necessary to rebuild any unavailable indexes before running the utility.
Exception tables must be created for the CHECK DATA utility prior to its execution. These tables must have the same columns, in the same order, with the same data types, as the original table, although the column names can be different. The easiest way to do this is via the CREATE TABLE LIKE statement, creating an exception table exactly like the original. Two additional columns can be added to this table: a five-character column that gives you the RID of the original row and a timestamp column that gives you the starting date and time of the CHECK DATA utility.
If a timestamp column is desired, a RID column must already exist. But you can have the RID column even without the TIMESTAMP column.
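A minimal sketch of creating such an exception table, assuming a hypothetical base table creator1.EMP; the added column names RID and CHECK_TS are arbitrary:

```sql
CREATE TABLE creator1.EXCPT_EMP LIKE creator1.EMP;
ALTER TABLE creator1.EXCPT_EMP ADD RID CHAR(5);
ALTER TABLE creator1.EXCPT_EMP ADD CHECK_TS TIMESTAMP;
```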
Catalog and Directory Recovery
You start a DB2 systemwide recovery by first recovering the catalog and directory; no other recoveries are possible before this process is completed. Therefore, the catalog and directory need to be image copied often, so that the recovery process for these objects is as fast as possible.
If you lose your catalog and/or directory, you will not have an operational DB2 subsystem.
The time required for recovery of the catalog and the directory is crucial and needs to be kept to a minimum. Because DB2 performs the log apply for the catalog and the directory, it does not matter when you take the image copy of the catalog. However, it helps if you take a SHRLEVEL REFERENCE image copy just before you cut your last archive and take both this image copy and the archive logs to the vault or off site. This way, you make sure that the DB2 catalog contains all updates and is usable even without the archive logs.
It is also important that the ICF (Integrated Catalog Facility) catalog be kept in sync with your DB2 catalog. For example, if the ICF catalog has yesterday's image in a disaster-recovery scenario, your DB2 catalog will reference data sets, such as image copies and/or table spaces, that do not exist, according to the ICF catalog. To avoid this problem, make sure that the recovery procedures of the ICF catalog match those of the DB2 catalog.
Because they, too, consist of table spaces and index spaces, the DB2 catalog and directory are subject to failures, such as a media failure, and these objects may need to be recovered. Recovering catalog data sets must be done in a specific order, owing to relationships and dependencies in the catalog. This order is as follows:
The objects in steps 1 through 11 need to be recovered serially; the objects in steps 12 through 19 can be recovered in parallel.
Never recover the catalog and directory to a point in time unless all user data is also recovered to the same point in time. Otherwise, the inconsistency introduced will corrupt data and can cause systemwide DB2 failure.
Table Space Recovery
The RECOVER utility is used to recover a table space. The utility's two major phases are RESTORE and LOGAPPLY.
To determine which full image copy is needed, the RECOVER utility looks at the rows in the SYSIBM.SYSCOPY catalog table. This full image copy is restored and merged with any incremental copies found in the SYSCOPY table, replacing any updated pages in the full copy and providing the basis for the restoration of the table space.
It is possible to use backups not created by DB2. However, restoring any object is then your own responsibility, and you must specify that the RECOVER utility has to perform a LOGAPPLY only.
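In that case, after restoring the data set yourself, you tell RECOVER to skip the RESTORE phase with the LOGONLY keyword, as in this sketch (the name is a placeholder):

```
RECOVER TABLESPACE dbname1.tsname1 LOGONLY
```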
After the full image copy is brought up to current from the merging of the incremental image copies, the next phase is the LOGAPPLY. This phase applies any changes that are on the DB2 log that were made after the image copies were taken. The utility reads the logs and applies these changes to the restored table space. The utility uses the SYSLGRNX table to identify the DB2 log range that pertains to the table space being recovered. LOGAPPLY also uses the START_RBA in SYSCOPY to identify the latest image copy that was used during the RESTORE phase and to know from what point records in the log must be applied. The LOGAPPLY phase is very efficient because it can sort the log on page number and then on RBA/LRSN.
The START_RBA will depend on whether the image copy was SHRLEVEL REFERENCE or SHRLEVEL CHANGE.
Index Space Recovery
You can also recover indexes if they were enabled for image copies in the DDL definition. Using image copies, an index is recovered rather than being rebuilt from the table by using the REBUILD utility. Only full image copies can be taken on indexes. The LOGAPPLY phase works the same way and applies necessary changes from the log since the time of the last image copy. Even if an index was marked to be recoverable, it can still be rebuilt by using the REBUILD utility. This utility will extract the index keys from the data, sort them, and build the index again, ignoring all recovery information available.
Partial object recovery is possible against a partition, a data set of an NPI, a partition of a DPSI, a single page, and an error page range.
A single table cannot be independently recovered.
Just as it is possible to back up multiple objects in the same job execution, it is possible to recover multiple objects at the same time. In a single RECOVER statement, you can recover multiple table spaces, index spaces, partitions, or data sets. You simply repeat the TABLESPACE, INDEXSPACE, or INDEX clause:
RECOVER TABLESPACE dbname1.tsname1 TABLESPACE dbname2.tsname2 INDEXSPACE creator1.ixname1 (PARALLEL (2))
The RESTORE phase is performed in parallel by using the PARALLEL option. In the preceding example, two streams of parallelism are attempted. The RESTORE phases for the objects are started and executed in parallel: one task reads the pages from the image copy, while the other tasks write the pages that have been read.
The log is read only once during the LOGAPPLY phase. This phase is common for all the objects being recovered and does not use parallelism but can sort the log to improve performance.
DB2 will lessen the parallel degree or even serialize the recoveries if storage is insufficient to support multiple streams.
Prior to version 8, parallel recovery was possible only if the image copies were on disk; if the copies were stacked on cartridge, the recovery processes were serialized. As of version 8, stacked copies on tape can be recovered in parallel, improving usability and performance by implicitly retaining mounted volumes for input data sets used by RECOVER and COPYTOCOPY. Version 8 also allows dynamic allocation access for data sets stacked onto a tape volume, so there is no unnecessary unloading and remounting of tapes between accesses.
This capability can also be applied via APARs (authorized program analysis reports) for version 7.
Sometimes, things can go wrong in recoveries, as when an image copy is not available, is defective, or is deleted. In these cases, DB2 has a few choices:
If a REORG LOG NO or LOAD LOG NO was performed and no image copy was created, fallback processing will fail.
You can use the TOLOGPOINT parameter in your RECOVERY job in order to recover to any previous point in time. This parameter is specified to provide the log RBA or LRSN of the point in time for the recovery. This should be a quiesce point that was established for the copy recovery set. This information is obtained from the SYSIBM.SYSCOPY table or by using the REPORT RECOVERY utility.
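For example, to recover to an established quiesce point (the table space name and the RBA value are placeholders):

```
RECOVER TABLESPACE dbname1.tsname1 TOLOGPOINT X'00000551BE7D'
```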
After a point-in-time recovery has been performed, a row will be inserted into the SYSIBM.SYSCOPY table, with an ICTYPE of P for each object that was recovered, in order to allow future recoveries to skip those associated log ranges.
In a non-data sharing environment, the TORBA keyword can be used, but it is recommended that TOLOGPOINT be used. In a distributed environment, TOLOGPOINT can also be used to recover two table spaces in two separate subsystems to the same point in time.
In order to avoid any pending conditions when you recover a table space set to a prior point in time, you will need to ensure that all the table spaces are in the same RECOVER control statement. You can use the TABLESPACESET keyword in the table space statement.
When a point-in-time recovery is performed for multiple objects with TOCOPY, there is no LOGAPPLY phase, only a RESTORE phase to the specified image copies. Because those copies may not represent a common point of consistency, a CHKP status can be set on the objects having RI relationships, along with an RBDP or CHKP status on the indexes. It is wiser to specify a common START_RBA for a recovery copy set via the TOLOGPOINT parameter.
A point-in-time recovery can result in a variety of pending conditions.
In order to resolve these statuses, certain actions need to be taken. For an RBDP condition, the index will need to be rebuilt via the REBUILD INDEX utility. You will need to rebuild the index if a logical partition of a nonpartitioning index has RBDP* status.
For a CHKP, the CHECK DATA utility has to be performed. It will find any violations of referential integrity constraints and can optionally resolve them by removing the offending rows and any directly or indirectly dependent rows. CHKP conditions on an index may indicate that the index is inconsistent with the data in the table. The CHECK INDEX utility can help to resolve these inconsistencies by determining whether there are any; if so, you can run the REBUILD INDEX utility to rebuild the index and get it back in sync with the data.
CHECK INDEX will reset the CHKP flag if no inconsistencies are detected. If inconsistencies between the index and the data exist, you may want to immediately run REBUILD INDEX to resolve problems in a timely manner.
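As a sketch, with placeholder names:

```
CHECK INDEX (ALL) TABLESPACE dbname1.tsname1
REBUILD INDEX (creator1.ixname1)
```

The first statement resets the CHKP flag if the indexes are consistent with the data; the second rebuilds an index from the data if it is not.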
If any unexpected pending conditions, especially recovery pending (RECP), are encountered, carefully review the utility output. Utilities set and resolve pending conditions while they are processing. If, by the end of the utility, pending conditions still exist, something probably went wrong.
Special recovery considerations are needed for LOBs because of the way large objects are stored and maintained.
An option allows you to decide whether to log changes to LOB columns. This option is specified on the creation of the LOB table space by using the LOG option; LOG YES is the default. LOBs larger than 1GB cannot be logged. If you have a LOB defined as LOG NO and then decide to alter it to be LOG YES, the LOB will be placed in copy-pending status. For LOBs that are defined as LOG NO, the force-at-commit protocol will ensure that the LOB values persist once committed. Even with LOG NO, the changes to the system pages and the auxiliary indexes are logged.
Even though there are no referential constraints between the LOB table space and its associated base table, the LOB table space belongs to the same table space set as the associated base table. A point-in-time recovery will need to recover both the base table and the LOB table space to a common point of consistency. Thus, a point of consistency will need to be established by quiescing the table space set or by using the COPY utility to generate a set of SHRLEVEL REFERENCE image copies for the table space set. The table space set will then need to be recovered to the RBA, the quiesce point, or the image copy set. A QUIESCE WRITE YES will record the point for the index on the auxiliary table if the index was defined with COPY YES.
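Because a table space set includes the LOB table spaces associated with the base table space, a single QUIESCE can establish the common point; a sketch with a placeholder name:

```
QUIESCE TABLESPACESET TABLESPACE dbname1.basets WRITE YES
```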
The LOG NO option (not logging the LOBs) means that any log-apply processing required during a recovery will invalidate the LOB values that were recorded after the last restored image copy of the table space, because DB2 records when the LOB values were changed but not the update itself. This will place the LOB table space in AUXW (auxiliary warning) status, and the invalid LOB values will not be able to be read by SQL.
After a LOB table space defined with LOG NO has been recovered to current, run the CHECK LOB utility on it to identify which rows are invalid. The rows are identified by their ROWID. Then use SQL on the base table to update the invalid LOB values or delete the rows that contain them. A second run of the CHECK LOB utility verifies the LOB values and resets the AUXW status. Even while the LOB table space is in AUXW status, DML statements can be used on the base table, but they will fail with an SQLCODE of -904 when they try to read an invalid LOB.
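The check itself is a simple control statement; the LOB table space name here is a placeholder:

```
CHECK LOB TABLESPACE dbname1.lobts1
```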
A recovery-pending (RECP) state can be set on table spaces and index spaces to prevent access to the affected objects. This state can be set on indexes after a recovery to TOCOPY or TORBA on a table space, or by abends or terminations of the RECOVER, LOAD, or REORG utilities. The condition can be reset by executing a RECOVER, LOAD REPLACE, or REPAIR utility.