An analysis of backup and recovery issues that get opened with Oracle Support Services revealed an interesting point: in situations where backups existed for recovery, a full restore of the entire database was initiated 40 percent of the time. In other words, when a hardware failure or a data corruption occurred, the DBA initiated a restore of every datafile in the database.
Folly! Certainly in some cases a full restore of the database may be required due to the nature of the problem. But a survey of the reported issues showed us that, in most cases, restoring a single datafile, or a subset of datafiles, would have sufficed to resolve the issue, saving hours of lost time with the entire database down. Why the whole database restore? Typically, it was a trained reaction by a DBA who has been conditioned to believe that there is only one way to recover the database, and that is to recover the entire database. Often, the database restore is scripted, and the DBA is simply dot-slashing a .sh file that was built back in the days of Oracle 7.3.4.
In defense of this technique, it is very simple. There is no gray area surrounding the restore and recover decisions. There is a clear road map to completion of the recovery, after which the database will be up. But if that were the case, if it were that simple, why do they keep calling Oracle Support for help? Let's just drop the shtick and admit that recovery situations are sticky, sweaty, nervous times for the DBA, particularly for the HADBA who is committed to uptime. And when things get murky, a black-and-white database restore looks very appealing, regardless of the downtime involved. So what is to be done?
First, we must realize that media backups are a required component of any HA strategy. RAC provides us with load balancing and protection against node failure. Data Guard protects our system against site failure and complete disaster scenarios. But RAC nodes still share the same datafiles, so there is still the possibility of datafile corruption that affects all nodes in the cluster. And Data Guard failover is expensive, so we try to avoid it whenever possible. Do we fail over when a single datafile goes belly up? No, there are times when good old-fashioned backups still serve a purpose.
While backups still serve a critical function, the full database restore and recovery must be avoided at all costs. For the HADBA, a full database restore from backup is the kiss of death for availability. It is better to fail over to the Data Guard (DG) standby system and then reinstantiate the primary than to waste valuable time with a full database restore and recovery. But having a sound backup strategy means having access to backups for restoring a single file, or a subset of datafiles. Given a specific recovery scenario, the best approach may be a file restore instead of DG failover. And, given new capabilities in Oracle Database 10g, this file restore can happen faster than ever before.
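As a rough illustration of the single-file approach described above, the following RMAN command sequence sketches a restore and recovery of one damaged datafile while the rest of the database stays open. It is a minimal sketch under stated assumptions, not a procedure taken from this chapter: the file number 5 is hypothetical, the file is assumed to lie outside the SYSTEM tablespace, and a usable backup plus the necessary archived redo are assumed to be available to RMAN.

```
# Hypothetical example: datafile 5 is an assumed file number.
# Take only the damaged datafile offline; other datafiles remain available.
SQL 'ALTER DATABASE DATAFILE 5 OFFLINE';

# Restore just that one file from backup.
RESTORE DATAFILE 5;

# Apply redo to bring the restored file up to date.
RECOVER DATAFILE 5;

# Bring the recovered file back online.
SQL 'ALTER DATABASE DATAFILE 5 ONLINE';
```

The point of the sketch is the scope of the outage: only the objects stored in the affected file are unavailable during the restore, rather than the whole database.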
To take full advantage of media backups, Oracle Recovery Manager (RMAN) is a requirement for any legitimate HA strategy. No longer the painful little utility best eschewed by seasoned DBAs empowered with tried-and-true shell scripts, RMAN now comes equipped with the kind of functionality that makes it a necessary HA partner. There is simply nowhere else to turn for the kind of killer features that RMAN brings to the table. RMAN has been developed specifically to assist in HA environments, so it not only integrates with RAC, Data Guard, and Oracle Flashback Technologies, it complements the entire HA stack beautifully, making the whole greater than the sum of its parts.
This chapter is dedicated to getting the most from an RMAN backup strategy, and not just for recovery but for assisting with load balancing, maintaining uptime, and minimizing downtime, as well as chipping in with RAC and DG maintenance.