Disaster Recoveries | DB2 for z/OS Version 8 DBA Certification Guide

Another type of recovery requiring preparation is disaster recovery. The possibility always exists of losing an entire data center and having to recover at another site, usually known as the disaster, or recovery, site. This type of recovery can be successful only with careful planning and practice.

To ensure a successful disaster recovery, regular backups of the data and logs must be available at the disaster site. The more current the backups are, the less data is lost and the less update processing has to be redone. The goal is to be restored and running as soon as possible.

Preparation

Many steps are involved in preparing for disaster recovery at a recovery site. This site needs to have copies of the data, including the catalog and directory, and the archive log data sets.

In order to prepare a recovery site to be able to recover to a fixed point in time, a weekly copy of the data, possibly with a DFSMS logical volume dump, must be available. This then needs to be sent to the recovery site, where it can be restored.

In order to do a recovery through the last archive copy, all the following objects need to be sent to the recovery site in a timely manner:

Image copies of all user data
Image copies of the catalog and directory
Archive logs
ICF EXPORT and list
BSDS lists

A disaster site can also be made ready by using the log-capture exit to capture real-time log data and then sending that data periodically to the recovery site. However, this is not often a viable option, owing to the overhead of the log-capture exit in high-volume environments.

NOTE

It is possible that you are using special facilities that you have become dependent on, such as DFSMS/HSM. In this case, these facilities should be in working order again at the recovery site and completely up to date. If these facilities go back in time, you might be forced to go back in time also. If you use these facilities, include them in your recovery scenarios.

Image Copies

Image copies of the application data will be required. In the event of a disaster recovery, it is assumed that a copy of the local copies is available at the disaster site. An option exists in the COPY utility to make copies to be sent regularly to the remote recovery site. A remote primary and a remote backup copy can be made like the local primary and backup by specifying the data set names in the DD statements for the RECOVERYDDN parameter in the COPY utility control cards. Those copies can be used for recovery on a subsystem that has the RECOVERYSITE DSNZPARM enabled or if you run a RECOVER using the RECOVERYSITE parameter.
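As a minimal sketch of such a job, the JCL below runs COPY through the DSNUPROC procedure and writes local and recovery-site copies in a single pass. The subsystem name DSN, the utility ID, the ddnames, the data set names, the space allocations, the job card values, and the sample table space DSN8D81A.DSN8S81E are all assumptions chosen for illustration; substitute your site's values.

```jcl
//DRCOPY   JOB (ACCT),'DR IMAGE COPY',CLASS=A,MSGCLASS=X
//* Assumed names throughout: subsystem DSN, DRCOPY.* data sets
//COPYTS   EXEC DSNUPROC,SYSTEM=DSN,UID='DRCOPY',UTPROC=''
//* Local primary and local backup copies
//LOCALP   DD DSN=DRCOPY.LP.DSN8S81E,DISP=(NEW,CATLG,CATLG),
//            UNIT=SYSDA,SPACE=(CYL,(15,5))
//LOCALB   DD DSN=DRCOPY.LB.DSN8S81E,DISP=(NEW,CATLG,CATLG),
//            UNIT=SYSDA,SPACE=(CYL,(15,5))
//* Recovery-site primary and backup copies, to be shipped off-site
//REMOTEP  DD DSN=DRCOPY.RP.DSN8S81E,DISP=(NEW,CATLG,CATLG),
//            UNIT=SYSDA,SPACE=(CYL,(15,5))
//REMOTEB  DD DSN=DRCOPY.RB.DSN8S81E,DISP=(NEW,CATLG,CATLG),
//            UNIT=SYSDA,SPACE=(CYL,(15,5))
//SYSIN    DD *
  COPY TABLESPACE DSN8D81A.DSN8S81E
       COPYDDN(LOCALP,LOCALB)
       RECOVERYDDN(REMOTEP,REMOTEB)
       SHRLEVEL REFERENCE
/*
```

At the recovery site, a control statement such as RECOVER TABLESPACE DSN8D81A.DSN8S81E RECOVERYSITE would then direct RECOVER to the copies registered for the recovery site.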

You will also need image copies of the catalog and directory and a listing of the contents of SYSCOPY, which can be obtained via an SQL SELECT.
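One way to produce that listing is a batch SQL job, sketched below with the DSNTEP2 sample program. The subsystem name DSN, the plan name DSNTEP81, the library names, and the job card values are assumptions that depend on how the samples were installed at your site. In the ICBACKUP column, RP and RB identify the recovery-site primary and backup copies.

```jcl
//SYSCLIST JOB (ACCT),'LIST SYSCOPY',CLASS=A,MSGCLASS=X
//* Assumed names: subsystem DSN, plan DSNTEP81, DSN810.* libraries
//LIST     EXEC PGM=IKJEFT01,DYNAMNBR=20
//STEPLIB  DD DISP=SHR,DSN=DSN810.SDSNLOAD
//SYSTSPRT DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//SYSTSIN  DD *
  DSN SYSTEM(DSN)
  RUN PROGRAM(DSNTEP2) PLAN(DSNTEP81) -
      LIB('DSN810.RUNLIB.LOAD')
  END
/*
//SYSIN    DD *
  SELECT DBNAME, TSNAME, DSNUM, ICTYPE, ICBACKUP,
         TIMESTAMP, DSNAME
    FROM SYSIBM.SYSCOPY
   ORDER BY DBNAME, TSNAME, TIMESTAMP;
/*
```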

NOTE

If the image copies need to be tracked, you will want to catalog them. It is wise to have a single ICF catalog for a DB2 system so that all information about a system is in sync.

NOTE

It is not necessary to keep index copies at the disaster site; they can be rebuilt, if necessary.

Archive Logs

Copies of the archive logs need to be made and taken to the disaster recovery site. These copies can be made by issuing the ARCHIVE LOG command to archive the current active DB2 log. A BSDS report needs to be created by using the print log map utility to have a listing of the archive logs, as DB2 will use the BSDS to find all the available archive logs during a recovery.
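The BSDS report can be produced with a short batch step. The sketch below runs the print log map utility, DSNJU004, against one BSDS copy; the library name DSN810.SDSNLOAD, the BSDS name DSNC810.BSDS01, and the job card values are assumptions for illustration.

```jcl
//PRTLOG   JOB (ACCT),'PRINT LOG MAP',CLASS=A,MSGCLASS=X
//* Assumed names: DSN810.SDSNLOAD library, DSNC810.BSDS01 BSDS
//MAP      EXEC PGM=DSNJU004
//STEPLIB  DD DISP=SHR,DSN=DSN810.SDSNLOAD
//* SYSUT1 points at the BSDS whose log inventory is to be listed
//SYSUT1   DD DISP=SHR,DSN=DSNC810.BSDS01
//SYSPRINT DD SYSOUT=*
```

Because the offload of an archive log runs asynchronously, it is worth confirming that the newest archive data set appears in the report before the listing is sent off-site.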

ICF Catalog

It is also necessary to back up the ICF catalog via the VSAM EXPORT command. A list of DB2 entries needs to be recorded daily via the VSAM LISTCAT command and sent to the recovery site.
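Both steps can be combined in a single access method services (IDCAMS) job, sketched here under assumed names: DSNCAT.USERCAT for the ICF user catalog, DSNC810 for the high-level qualifier of the DB2 data sets, and DRBKUP.ICFCAT.EXPORT for the export data set. The space allocation and job card values are likewise placeholders.

```jcl
//ICFBKUP  JOB (ACCT),'ICF EXPORT',CLASS=A,MSGCLASS=X
//* Assumed names: DSNCAT.USERCAT catalog, DSNC810 qualifier
//EXPCAT   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//EXPDD    DD DSN=DRBKUP.ICFCAT.EXPORT,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(10,10))
//SYSIN    DD *
  /* Back up the catalog; TEMPORARY leaves the original in place */
  EXPORT DSNCAT.USERCAT -
         OUTFILE(EXPDD) -
         TEMPORARY
  /* List the DB2 entries for shipment to the recovery site */
  LISTCAT LEVEL(DSNC810) ALL
/*
```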

NOTE

Often, the ICF catalog is the responsibility of a different department. Make sure that the ICF is always in sync with DB2. Create a single ICF catalog per DB2 system; perform EXPORT on the ICF after the DB2 catalog and directory have been image copied.

DB2 Libraries

The DB2 libraries need to be backed up to tape if they are changed. These libraries are as follows:

SMP/E, load, distribution, target libraries, DBRMs (database request modules), and user applications
The DSNTIJUZ job, which builds the DSNZPARM module and the DECP module
Data set allocations for the BSDS, logs, catalog, and directory

NOTE

It is good practice to record when all these items arrive at the recovery site and to have backups of all documentation.

Minimizing Data Loss

One disaster recovery scenario is to perform volume dumps and restores. But significant data loss can occur. In order to minimize the data loss, you should perform a dump of all the table spaces, logs, and BSDS while DB2 is up, after issuing an ARCHIVE LOG MODE(QUIESCE) command.

The ARCHIVE LOG command is useful when you are performing a DB2 backup in preparation for a remote-site recovery. With MODE(QUIESCE), the command allows the DB2 subsystem to quiesce all users after a commit point and to capture the resulting point of consistency in the current active log before the archive is taken. Therefore, when the archive log is used with the most current image copy during an off-site recovery, the number of data inconsistencies will be minimized.
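The command can be entered from a console or issued in batch through the DSN command processor, as in the sketch below. The subsystem name DSN, the library name, the job card values, and the TIME(10) quiesce period are assumptions; TIME gives the maximum number of seconds DB2 may spend trying to reach the quiesce point, and WAIT(YES) makes the quiesce processing synchronous to the job.

```jcl
//ARCHLOG  JOB (ACCT),'ARCHIVE LOG',CLASS=A,MSGCLASS=X
//* Assumed names: subsystem DSN, DSN810.SDSNLOAD library
//ARCH     EXEC PGM=IKJEFT01,DYNAMNBR=20
//STEPLIB  DD DISP=SHR,DSN=DSN810.SDSNLOAD
//SYSTSPRT DD SYSOUT=*
//SYSTSIN  DD *
  DSN SYSTEM(DSN)
  -ARCHIVE LOG MODE(QUIESCE) TIME(10) WAIT(YES)
  END
/*
```

If the quiesce point cannot be reached within the TIME period, the archive is not taken, so the job output should be checked before the volume dumps begin.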

NOTE

If a -STOP DB2 MODE(FORCE) operation is in progress, the ARCHIVE LOG command will not be allowed.

Taking the Table Spaces Offline

During a disaster recovery, table spaces can be taken offline, or made unavailable, until the recovery is done. To do so, set the DSNZPARM DEFER to ALL on install panel DSNTIPS, which is reached from the update selection menu, DSNTIPB, to allow the necessary log processing to continue.

Tracker-Site Recovery

The TRACKER SITE option allows for the creation of a separate DB2 subsystem or data sharing group that exists only for keeping shadow copies of the primary site's data. The primary full image copies need to be sent to the site after they have undergone a point-in-time recovery to ensure that they are up to date. A tracker site is supported by transferring the BSDS and archive logs from the primary site to the tracker site, which periodically runs LOGONLY recoveries to keep shadow data current. If a disaster occurs at the primary site, the tracker site will become the takeover site (Figure 8-2).

Figure 8-2. Tracker-site recovery

Because the tracker-site shadows activity on the primary site, image copies do not have to continually be shipped. This allows the tracker site to take control more quickly.
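A LOGONLY recovery cycle at the tracker site can be sketched as a utility job like the one below. The subsystem name DSN, the utility ID, the job card values, and the sample table space DSN8D81A.DSN8S81E are assumptions, and a real cycle would name every object being shadowed rather than a single table space.

```jcl
//TRKRCVR  JOB (ACCT),'TRACKER RECOVER',CLASS=A,MSGCLASS=X
//* Assumed names: subsystem DSN, sample table space DSN8S81E
//RCVR     EXEC DSNUPROC,SYSTEM=DSN,UID='TRKRCVR',UTPROC=''
//SYSIN    DD *
  RECOVER TABLESPACE DSN8D81A.DSN8S81E LOGONLY
/*
```

LOGONLY tells RECOVER to skip the image-copy restore and apply only log records to the existing data sets, which is why the newly shipped BSDS and archive logs are enough to bring the shadow data forward.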

The two main reasons for choosing a tracker site are to minimize:

The data lost during a disaster
The amount of time to get access to data during a disaster

You should use a tracker site if it is important to recover data quickly at a disaster-recovery site with minimal data loss. Tracker-site support is enabled through a DSNZPARM: install panel DSNTIPO has a field called TRACKER SITE (DSNZPARM TRKSITE), with NO as the default setting. After the tracker-site subsystem is installed, TRACKER SITE is set to YES to enable tracker-site support.

To start tracker-site support, both sites must be brought in sync by shutting down the primary DB2 subsystem and taking a disk dump of everything. This dump is restored on the tracker site. After the tracker site is started, the primary site can be started. As long as both sites stay connected, they stay in sync; if the connection is lost, the primary site cannot queue the log data, and you have to bring both sites in sync again through manual intervention.

If this option is chosen, it is important to make sure that the two sites never lose their connection. Also watch for table spaces that get out of sync because utilities were run with the LOG NO option.

The nature of the tracker site means that some operations are not allowed:

Some SQL statements: GRANT, REVOKE, DROP, ALTER, CREATE, UPDATE, INSERT, and DELETE (read-only SELECT statements are allowed but not recommended)
Binds
Many utilities, including COPY, RUNSTATS, and REPAIR