Yesterday you learned how to back up your database; today you will learn how to recover your database in the event of an emergency. Several types of failures can necessitate a recovery:
No matter what type of recovery is required, the important task of recovery should be performed as quickly as possible. If a failure requiring recovery occurs, many users will have little to do until the recovery is complete. But even though the recovery operation must be completed quickly, you should not rush. Any mistakes during the recovery will only delay the resumption of normal operations.
Today you will learn about recovery and restoration operations. These two concepts might sometimes be thought of as the same, but they are actually quite different:
Despite how reliable hardware has become, there are still occasions when the system might fail (perhaps because of a component failure or a power failure). If the failure causes no data loss, Oracle can recover itself; this is referred to as an instance recovery. If data has been lost, data must be restored from a previous backup and recovered; this is referred to as media recovery. If some event, such as an accidental deletion of a table, has occurred, a point-in-time recovery must be performed to avoid recovering the table-drop statement and repeating the mistake.
New Term: When an instance failure occurs, the Oracle recovery process can completely recover the database upon instance startup. All transactions that were committed at the time of the failure will be recovered, or rolled forward, and all transactions that were in process (also known as in-flight transactions) will be rolled back.
NOTE: Instance recovery can be quite time consuming. How much time instance recovery takes depends on the number of dirty buffers in the SGA. The number of dirty buffers depends on how much time has passed since the last checkpoint and the number of data modifications.
The instance-recovery process is automatic. When the instance is started, the startup process examines the datafiles and redo log files to determine whether the instance was properly shut down. At this point, the redo log is read and the affected transactions are rolled forward or back. If the checkpoint launched by the last log switch was completed, the transactions in fewer than one log file will require recovery.
TIP: If you tune the checkpoint interval and recovery parallelism, you can shorten the recovery time.
Users cannot access the database during the recovery process; only after the instance recovery is complete can users access the database. For this reason, recovery time should be kept to a minimum. Using the parallel-recovery feature of the Parallel Query option can help you reduce the time it takes for instance recovery to be completed. As you will see on Day 19, "Advanced Oracle Options," the number of processes or threads that perform the instance recovery can be tuned. If you tune the number of recovery processes to run best with your system, you can optimize the recovery interval.
In the event of a media failure (if, for example, a datafile is damaged), data recovery is necessary. If a datafile is damaged, you must restore it from backup before recovery can occur. The Oracle recovery process then applies archive log files and redo log files to restore the damaged datafile to its prior-to-failure state. This process might require the use of all archive log files created since the backup to recover the restored datafile, which can be quite time consuming.
TIP: If you schedule frequent backups, you can shorten restoration and recovery time. Recovery time depends both on how much time has passed and how much data has been modified since the last backup.
By placing the latest backup files as well as any archive log files created since the last backup online, you can shorten the time it takes to start the recovery process. If you have a plan and have everything ready to go, things will run more smoothly.
In order to assist the recovery process, Oracle8 provides several different methods for database recovery:
Each of these utilities can perform an effective backup; personal preference and your system needs dictate which one you use.
As with the database-backup procedure, you can perform a recovery using Enterprise Manager's Backup Manager utility. To do so, follow these steps:
2. The Restore wizard leads you through a series of screens that are similar to those of the Backup wizard. The first screen, shown in Figure 17.1, allows you to choose what type of restore you want to perform. Note that I have selected the Tablespaces radio button. After you decide what type of restore you want to perform, click Next.
Figure 17.1.
Screen one of the Restore wizard.
NOTE: The available options in this case are Tablespaces and Datafiles. Because the instance is up and running in this example, the database recovery option is not available. If the instance was down, Database would be an option.
Figure 17.2.
The Tablespaces screen of the Restore wizard.
Figure 17.3.
The Rename screen of the Restore wizard.
Figure 17.4.
The Channels screen of the Restore wizard.
Figure 17.5.
The Summary screen of the Restore wizard.
If you want to change the type of restore operation from an online to an offline restore, you can change the state of the instance from the main screen of the Backup Manager utility. To shut down the database, simply select the Shutdown radio button and click Apply (see Figure 17.6).
After you select the Shutdown option, you are asked what type of shutdown you want to perform. Your options are
Figure 17.6.
Change the state of the database from the main screen of Backup Manager.
I recommend that you always use the Immediate or Normal option. The Abort option should be used only in the case of an emergency.
As you have seen, it is easy to back up and restore a database using Backup Manager. Nonetheless, it is still important to test the recovery process periodically to make sure it is working correctly.
The method for restoring a datafile using OS facilities is as simple as the method used for creating the backup using OS facilities:
2. After you bring the tablespace back online (via the ALTER TABLESPACE command or the restarting of the Oracle instance), Oracle will realize that the datafile has been replaced and prompt you to apply the archived log files.
Tablespace and datafile recovery can be performed only if the RDBMS is running in ARCHIVELOG mode. Otherwise, you must restore the entire database from the last full, offline backup. No roll forward or rollback occurs.
You can use the Import utility to restore the database if and only if the Export utility was used to create the backup. For more information about the Import and Export utilities, see Day 9, "Managing Data."
NT Recovery Manager is very similar to NT Backup Manager. NT Recovery Manager, provided with your Oracle for Windows NT software, is not part of Enterprise Manager. Because of this, NT Recovery Manager can recover only a local database--it cannot provide recovery over a network. To recover a local database with NT Recovery Manager, perform these steps:
2. After you supply the password for the internal account and connect with the instance, you will see NT Recovery Manager's first screen. This screen, shown in Figure 17.7, lists the following recovery options:
Figure 17.7.
Oracle Recovery Manager offers a variety of recovery options.
4. After you specify which datafiles to recover, NT Recovery Manager automatically recovers them for you. If instance recovery is necessary, it is performed as well.
Figure 17.8.
The Data File Recovery screen of NT Recovery Manager.
You cannot restore from the loss of a log file. If a log file is damaged and the system is still functional, you can drop the log file, re-create it, and immediately perform a full backup. The backup is necessary because you will not be able to use the archive and redo log files for any recovery on this database. That is why I always recommend that redo log files reside on fault-tolerant disk volumes.
NOTE: I recommend using RAID mirroring on the redo log files to protect them in the event of a disk failure.
The point-in-time recovery allows you to recover a database back to a specified point in time, allowing you to recover from user errors or software problems that caused a failure to occur. However, because of the potential side effects, the point-in-time recovery, a somewhat complex operation, should be performed only on rare occasions. When you recover back to a point in time, the entire database will be recovered to that point. If a table is deleted by accident and the point-in-time recovery is being used to recover to a point just before that deletion, you must keep in mind that all other work to other tables that occurred after that point will be lost.
WARNING: The point-in-time recovery can be dangerous to use, because you are restoring to a point in time in the past. Always perform a full backup of your current database before attempting the point-in-time recovery. A point-in-time recovery recovers the entire database to that point in time. There is no way to recover a single tablespace or table, so all work done after that point to other tables will be lost.
NOTE: I have mentioned the point-in-time recovery only so that you will know it exists. If you plan to use the point-in-time recovery, you should study the Oracle documentation and carefully plan a restore before you attempt it.
The point-in-time recovery can be performed via Recovery Manager or via the RECOVER administrative SQL command. In either case, the syntax for performing the point-in-time recovery is entered through the command-line interface; there is no GUI option for the point-in-time recovery.
What method you use to restore your data depends on what type of error necessitated the recovery. In the event of a media failure, you must restore the datafile, then perform the media recovery with the point-in-time recovery option. In the event of a user error, follow these steps:
2. Restore this temporary database from a previous backup.
3. Specify a point in time immediately before the failure, then perform the point-in-time recovery on this temporary database.
4. Export the table that has been damaged, which now is in a state just before the damage.
5. Import that undamaged table to the real database.
This task can be time consuming and dangerous. If you confuse the temporary and real copies of the database, you might destroy valuable data.
Recovery Manager can be used to perform a point-in-time recovery via its command-line syntax. Because of the complexity of the command-line interface to Recovery Manager and the rarity of its use, I will not go into great detail on how to use it here. You can find very complete documentation in the Oracle Backup and Recovery manual.
When using the command-line interface to Recovery Manager, you must specify this additional parameter:
SET UNTIL
This parameter takes the following qualifiers:
TIME date LOG SEQ number THREAD number SCN scn
With this syntax, you can specify the exact time you want the recovery to stop by specifying the time, the log sequence, or the system change number. By using the SET UNTIL parameter when performing a recovery via Recovery Manager, you restore until the specified time.
When you perform a recovery with the RECOVER command, you can restore to any point in time by using the RECOVER UNTIL syntax. The RECOVER command has a number of options, but the important ones for the point-in-time recovery are
Like Recovery Manager, the RECOVER command can be used to recover a database, a tablespace, or a datafile, and can include a parallel clause. If you increase the degree of parallelism, you might see a performance benefit.
It is important that you plan your recovery strategy so that in the event of a system failure you will be ready. Several scenarios must be planned for, depending on the type of failure.
Except for requiring you to set the degree of parallelism, the instance-recovery process is automatic; it requires no operator intervention. Therefore, your planning should concentrate on the parallel recovery option. Evaluate your system to determine the degree of parallelism for the recovery process. This is set as the Oracle initialization parameter RECOVERY_PARALLELISM.
The number of disk drives and CPUs typically determines the degree of parallelism in your system. My rule of thumb is to use one process (thread) for every two disk drives, up to 20. Try starting with a number like this, and adjust it to suit your system. Because every system is different, it is difficult to make across-the-board recommendations. If you set this number too high, you might cause additional overhead.
When you plan for media recovery, you must consider not only the steps required to recover the damaged media, but the restoration of data as well. Some of the key items necessary for quick media recovery include
If downtime is not an option, consider using a fault-tolerant disk subsystem. You can avoid considerable downtime by protecting your disk drives with a RAID disk array.
TIP: The component in your system that is most likely to fail is probably a disk drive. The more disk drives you have in your system, the more likely one will fail. By protecting your disk drives with hardware or software RAID, you can avoid considerable downtime.
Recovering from operator error is similar to recovering from media failure, except with operator error recovery you might need a large amount of disk space to temporarily recover your database. Fortunately, users can typically continue working on the permanent database unless their duties require them to access the table(s) that was deleted or corrupted. To recover, you need the following components:
Today you learned how to recover from various types of system failures. You learned how instance, media, and point-in-time recoveries work. You also learned how to recover from an operator error. You learned the importance of planning ahead to reduce downtime; in most businesses, downtime can be very expensive.
On Day 18, "Administering Oracle Replication," you will learn how to use Oracle replication to increase performance and quicken recovery. You will also learn about other Oracle options for quick recovery, including the standby database option. These features do not provide fault tolerance, but do help you to come back online quickly.
A Restoring involves copying a backup file to replace a damaged datafile, whereas recovery involves Oracle using the transaction log to roll forward transactions.
Q What is the difference between instance recovery and media recovery?
A With instance recovery, Oracle automatically rolls forward committed transactions to bring the database up to date; media recovery requires a damaged datafile to be replaced with a backup copy before commencing.
Q What is an in-flight transaction?
A This term is sometimes used to describe a transaction that was in the process of execution when a failure occurred.
Q What Oracle utility is needed to restore from an export file?
A An export file can only be restored with the Import utility.
The workshop provides quiz questions to help you solidify your understanding of the material covered and exercises to provide you with experience in using what you've learned. See Appendix A, "Answers," for the answers to quiz questions.
2. What transactions are recovered from an instance failure?
3. What transactions can be recovered from media failure?
4. What is ARCHIVELOG mode?
5. What is a dirty buffer?
6. What is a checkpoint?
7. What files are necessary for instance recovery?
8. What files are necessary for media recovery?
9. What is a point-in-time recovery?
10. What is a point-in-time recovery used for?
2. Using NT Recovery Manager, restore the tablespace you backed up yesterday.
3. On a small test system, use the SHUTDOWN ABORT command to abort an Oracle instance. (Do not do this on a production system or one that others are using.)
4. Restart the instance and observe the recovery process; try this again with a larger degree of parallelism. (NOTE: You must perform some transactions for recovery to have activity.)
© Copyright, Macmillan Computer Publishing. All rights reserved.