| < Day Day Up > |
|
Every single system is prone to failures: natural, mechanical, or electronic. This could be the human system, automobiles, computer hardware, elevators, application servers, applications, database servers, databases, networks, or network connectivity. Based on the critical nature of the item and its everyday use, these types of failures need an alternative way to provide the required service and/or a method to keep the systems up and functioning. For example, the human system can fail due to sickness; to avoid falling sick the human body needs exercise, good food, and general care, so the body's metabolism works efficiently and keeps the body healthy. An automobile can fail due to a simple problem like a flat tire. A backup option in this case would be a spare tire and some essential tools to replace the tire. In some unavoidable conditions an alternative to the automobile has to be used (bus or taxi). Electronic devices such as computer hardware are prone to failures; hardware comes in many forms throughout the entire enterprise configuration. Normally protection against hardware failures is obtained by providing redundancy at all tiers of the configuration. This helps because, when one component fails, the others will help continue operation.
On the database side, the storage system that physically stores the data needs to be protected. Mirroring the disk, where the data is copied to another disk to provide safety, and failover when a disk in the array fails, will provide the required redundancy against disk failures. The disk redundant configuration is achieved by following an appropriate RAID configuration, as discussed in Chapter 2 (Hardware Concepts).
What happens when a privileged user deletes rows from a table in a production database? What happens when this damage is only noticed after a few days? What happens when lightning hits the production center and the electric grid, causing a short circuit that damages the entire storage subsystem? In all these situations an alternative method over and beyond the redundant hardware architecture is required to get to the bottom of the problem for its resolution, namely, a process to retrieve and recover the lost data.
The answer to this is that a copy of the data needs to be saved regularly to another medium and stored in a remote location. Such a method of data storage will protect the enterprise from losing its valuable data. The method of copying data from a live system for storage in a remote location is called a backup process.
Backup and recovery methods for a database under RAC are similar to the procedures used in a single stand-alone configuration. RAC supports all the backup features of an Oracle database running in single-instance mode.
While defining database configuration specifications (RAC or non- RAC) the following should be considered:
If loss of data is unacceptable, the ARCHIVELOG mode should be enabled.
All instances in a RAC configuration should be set to automatic archiving.
The archive destination for each instance needs to be available to each specific instance only during normal operation, but they have to be made available to the other instances performing recovery following a media failure.
Multiple archive logs are written to the same destination. Hence, partitions should not be used for archive log files because in a raw device configuration, each archive will overwrite the previous one.
If a backup and recovery strategy is not available, it is important that a good backup and recovery strategy is designed and tested. Various options available under Oracle are:
Full offline backup.
Hot backup.
Backup using Recovery Manager (RMAN).
A full offline backup procedure for a RAC environment is almost identical to a single-instance configuration. The major difference in a RAC environment is that all instances that are part of the clustered configuration should be shut down before the actual backup operation.
The procedure to perform a full offline backup is:
Query the V$DATAFILE view to obtain the names and locations of the data files. Spool the output to a file that could be read to perform the backup operation:
COL NAME FORMAT A50 SELECT FILE#, NAME, STATUS FROM V$DATAFILE /
Query the V$LOGFILE view to obtain the names and locations of the online redo log files. Spool the output to a file that can be read to perform the backup operation:
COL MEMBER FORMAT A50 SELECT * FROM V$LOGFILE /
Query the V$CONTROLFILE view to obtain the names and locations of the control files. Spool the output to a file that can be read to perform the backup operation:
COL NAME FORMAT A50 SELECT * FROM V$CONTROLFILE /
Shut down all instances (including the one used to query the data dictionary) that are currently accessing the database:
oracle$sqlplus /nolog SQL>connect as sys@RAC1 /as sysdba SQL>shutdown immediate
Save copies of the data files to an alternate location. This is done by using an operating system utility like ocopy when using raw devices or the standard copy utility available on most operating systems, when using clustered file systems to save all the data files, online redo log files and at least one copy of the control file to a backup location on disk.
Move the saved copies of the various files to backup media such as a tape. Depending on the backup and recovery strategy and the SLA requirements, retaining a copy of the latest backup set on disk would enable an easy restoration, if such a need arises.
Restart the instances:
oracle$sqlplus /nolog SQL> startup ORACLE instance started. Total System Global Area 450937896 bytes Fixed Size 730152 bytes Variable Size 285212672 bytes Database Buffers 163840000 bytes Redo Buffers 1155072 bytes Database mounted. Database opened. SQL>
For repeated use of this strategy the entire process is implemented using a script, provided such a script could shut down all instances before starting the backup operation.
Using the SRVCTL utility, startup and shutdown of remote instances could be controlled from a single instance, making this process easier.
Hot or online backup allows backup of all or part of the database while it is in operation. The procedure for a hot backup in a RAC configuration is similar to that followed in a single instance configuration. Unlike the full offline backups, hot backups can be performed only when the database has been configured to run in ARCHIVELOG mode.
The procedure to perform a hot backup is:
Query the V$DATAFILES and the V$TABLESPACE view to obtain the names and locations of the data files. These views could be joined using the TS# column.
SELECT TS.NAME, DF.NAME FROM V$DATAFILE DF, V$TABLESPACE TS WHERE DF.TS# = TS.TS# /
Query the V$LOGFILE view to obtain the names and locations of the online redo log files.
COL MEMBER FORMAT A50 SELECT * FROM V$LOGFILE /
Query the V$CONTROLFILE view to obtain the names and locations of the control files:
COL NAME FORMAT A50 SELECT * FROM V$CONTROLFILE /
As a privileged user like system, issue the ALTER TABLESPACE <tablespace_name> BEGIN BACKUP command. This command prevents the file header from being updated when a log switch or checkpoint occurs.
Issue the appropriate operating system command for backing up the data files for this tablespace. Use the ocopy in backing up from a raw device partition or a regular operating system copy like cp[1] when backing up from a clustered file system.
Issue the ALTER TABLESPACE <tablespace_name> END BACKUP command.
Repeat steps 4 through 6 for all tablespaces to be backed up.
Back up the control files with the ALTER DATABASE BACKUP CONTROLFILE TO... command. For an added measure of safety, back up the control file to trace:
ALTER DATABASE BACKUP CONTROLFILE TO TRACE. /
After all the tablespaces have been backed up, issue another ALTER SYSTEM ARCHIVE LOG CURRENT command to generate archive log files for all redo threads, including any unarchived logs from closed threads. Include the archived files generated by this step in your backup along with any other archived log files that were generated since the start of the backup.
Note | OCOPY is an Oracle-provided copy utility that allows writing to continue while the backup is running. Please note that (when using raw devices) on a Windows operating system also, Windows-based copy commands such as copy, xcopy cannot be used for backup, as this does not allow files in use to be copied. |
Usage of ocopy:
ocopy from_file [to_file [a | size_1 [size_n]]] ocopy -b from_file to_drive ocopy -r from_drive to_dir
b: Splits the input file into multiple output files. This option is useful for backing up to devices that are smaller than the input file.
r: Combines multiple input files and writes to a single output file. This option is useful for restoring backups created with the -b option.
It is a good practice to verify the files backed up using user-managed backups, such as cold and hot backups discussed above. This verification process should be added to the backup and recovery strategy and should be periodically verified.
The best method to verify the usability of backups is to restore them to a separate host and attempt to open the database.
Another method is to check the physical data structure integrity of the offline data file. This could be done using Oracle-provided utilities such as DBVERIFY.
Recovery Manager (RMAN) is a component of the Oracle database that provides a tightly integrated method for creating, managing, restoring, and recovering Oracle database backups. This utility supports hot, cold, and incremental backups. RMAN provides an option to maintain a repository called the recovery catalog that contains information about backup files and archived log files. RMAN uses the recovery catalog to automate restore operation and media recovery.
RMAN determines the most efficient method of executing the requested backup, restore, or recovery operation and then executes these operations in conjunction with the Oracle database server. The RMAN process and the server automatically identify modifications to the structure of the database and dynamically adjust the required operation to adapt to the changes.
RMAN offers a wide variety of advantages over the traditional full and hot backup options that have been available in Oracle for a very long time. Some of the key features are:
Recovery at the block level.
Backup retention policy.
Persistent configuration.
Automatic channel allocation.
Resumable backup and restore.
Multiplex archived log backups.
Managing space when restoring archive logs.
Archive log failover.
Backup of server parameter file.
Control file auto backup.
Enterprise manager support.
Figure 12.1 illustrates the various components that constitute the RMAN process. The figure also illustrates how the various components interact with each other, with the RMAN process, and finally with a certain certified external media process (MML) that is used to back up the data to external media such as a tape device. Let us briefly look at some of RMAN components and how they interact with each other.
Figure 12.1: RMAN components.
RMAN process In order to use RMAN, the RMAN executable has to be invoked. It is through this executable that all functions of the backup and recovery process are handled. This implies that the RMAN process is the central component of the entire backup and recovery operation.
Channels In order to communicate with the I/O device such as a tape or a disk system, RMAN processes need to open a communication link between these devices, called channels. Based on the number of I/O devices that would be involved in the backup operation, several channels could be opened at the same time. These channels could be invoked for parallel or asynchronous access to these devices or could be configured to backup in a sequential order.
Target database The database that is being backed up by the RMAN process is called the target database. In Figure 12.1, RMAN interacts with one target database called PRODDB. Since this is a RAC implementation, this database could have two or more instances, for example RAC1 and RAC2. The main difference in the implementation of RMAN in a RAC environment compared to a stand-alone environment, is that each instance has a copy of redo log files and these redo log files are archived to an instance-specific archive log destination. These archive log destination disks should be visible from all instances in the clustered configuration. This has two purposes:
To provide visibility to RMAN to back up these archive log files.
To provide visibility for recovery purposes during media recovery when the instance that the archive log files belong to has crashed.
Recovery catalog database The recovery catalog, as illustrated in Figure 12.1, is optional and is a repository used by RMAN to store backup and recovery activities performed on the target database. The catalog database does not contain the actual physical backup files from the target database. Normally, to provide visibility of the information contained in the catalog databases when the target database is not available, the catalog is created as a separate database that is external to the target database.
Media management layer The media management layer (MML) is a third-party piece of software that manages the reading and writing of files to and from tape. For example, the VERITAS NetBackup and the Legato backup utilities are MML products. Due to the variety of tape media devices available in the market, RMAN depends on the various certified MML products to backup the files to tape.
Oracle provides options for configuring RMAN, similar to those for installing and configuring a database. RMAN can be configured using the ommand line or using the GUI-based interface that comes bundled with the Oracle Enterprise Manager (OEM).
Create a tablespace in the catalog database to hold the catalog data:
SQL> CREATE TABLESPACE rcvcat_data_p001 2* DATA FILE '/u02/oradata/PRODDB/rcvcat_ts.dbf' SIZE 200m SQL> / Tablespace created.
Create an RMAN user in the catalog database:
SQL> CREATE USER rmanadmin IDENTIFIED BY rmanadmin 2 TEMPORARY TABLESPACE temp 3 DEFAULT TABLESPACE rcvcat_data_p001 QUOTA UNLIMITED ON rcvcat_data_p001 SQL>/ User created.
Grant the RMAN user a special role in the catalog database:
SQL> GRANT RECOVERY_CATALOG_OWNER TO rmanadmin; SQL> GRANT CONNECT, RESOURCE TO rmanadmin; Grant succeeded.
Create user for RMAN in the target database:
SQL> CREATE USER rmantarget IDENTIFIED BY rmantarget TEMPORARY TABLESPACE temp DEFAULT TABLESPACE users;
Grant the user the following privileges:
SQL>GRANT CONNECT, RESOURCE, UNLIMITED TABLESPACE TO rmantarget;
Shut down the instance and subsequently start it in mount mode:
SQL>SHUTDOWN IMMEDIATE SQL>STARTUP MOUNT
Verify if the instance is in archive log mode – if not, set the database to archive log mode:
SELECT NAME, LOG_MODE FROM V$DATABASE / NAME LOG_MODE --------- ------------ PRODDB NOARCHIVELOG SQL>ALTER DATABASE ARCHIVELOG;
Verify if the log mode has changed by querying V$DATABASE:
SELECT NAME, LOG_MODE FROM V$DATABASE / NAME LOG_MODE --------- ------------ PRODDB ARCHIVELOG
Shut down instance:
SQL>SHUTDOWN IMMEDIATE
Assign the following parameters in the initPRODDB.ora file:
RAC1.log_archive_dest = /archivelogs/arch/ RAC2.log_archive_dest = /archivelogs/arch/ RAC1.log_archive_format = 'RAC1_%T_%S.arc' RAC2.log_archive_format = 'RAC2_%T_%S.arc' *.log_archive_start = true
Oracle 10g | New Feature: The archive log-naming format specified by the ARCHIVE_LOG_FORMAT parameter can include the resetlogs ID as part of the format string. The resetlogs ID represents the current incarnation of the database. This allows RMAN for easier recovery of a database from a previous backup. The resetlogs ID is represented by the format string %r in the archive log name. For example LOG_ARCHIVE_FORMAT = RAC1_%T_%S_%r.arc' where %r represents the resetlogs ID. This is new with Oracle 10g. |
Start the instance:
SQL>STARTUP
Test archive operation by performing a log switch on the target database:
SQL>ARCHIVE LOG LIST Database log mode Archive Mode Automatic archival Enabled Archive destination /archivelogs/arch/ Oldest online log sequence 31 Next log sequence to archive 33 Current log sequence 33 SQL>ALTER SYSTEM SWITCH LOGFILE
This should create log files in $ORACLE_BASE/admin/PRODDB/RAC1/ arch directory.
Connect to the catalog database using RMAN and issue the create catalog command:
oracle$rman catalog rmanadmin/rmanadmin@rcatalog Recovery Manager: Release 9.2.0.1.0 - Production RMAN-06008: connected to recovery catalog database RMAN-06428: recovery catalog is not installed RMAN> CREATE CATALOG RMAN-06431: recovery catalog created
Connect to the target database and the catalog database using the RMAN and issue the register database command:
oracle$rman target sys/change_on_install@PRODDB catalog rman_user/ rman_pass@RCATALOG Recovery Manager: Release 9.2.0.2.0 - 64bit Production Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved. RMAN-06005: connected to target database:PRODDB (DBID=1016892352) RMAN-06008: connected to recovery catalog database RMAN> REGISTER DATABASE; RMAN-03022: compiling command: register RMAN-03023: executing command: register RMAN-08006: database registered in recovery catalog RMAN-03023: executing command: full resync RMAN-08002: starting full resync of recovery catalog RMAN-08004: full resync complete
Store scripts into the RMAN catalog:
RMAN>@rman_script_level0 RMAN>@rman_script_level1 RMAN>@rman_script_level2
Based on the backup and recovery startegy defined for the organiza tion, regular backups could be scheduled in one of two ways: by setting up command-level processes to run at scheduled intervals, or by using the GUI-based interface provided by OEM and its job scheduling interface.
Note | Sample RMAN scripts for setting up RMAN jobs are provided in Appendix 2 of this book. For detailed descriptions on installation and configuration procedures for RMAN please refer to Oracle-provided documentation sets for the appropriate version. |
Figure 12.1 specifies that RMAN depends on the archive log files, control files, and data files; this is true whether RMAN is configured for a stand-alone configuration or a RAC configuration. The data files and control files in a RAC implementation are required to be located on a shared storage subsystem. The files that are specific to the instance such as the archive log files can be stored on storage media that are visible to the instance. However, while these devices could be available for write activity to the node that owns the device, these devices should be available for read purposes to all instances participating in the cluster. The reason for this is because, in case a media recovery is required, the archive log files from the failed instance may need to be applied from one of the other available nodes or instances. This would be possible only if these devices are visible to the instance or node that is performing the recovery operation.
Another important point that is very critical to archiving log files in RAC implementations that use raw partitions is that the archive logs should be located on file systems and not on raw partitions. This is because raw partitions do not have the capability to store multiple files in these partitions and if archive logs are stored on a raw device partition, each new archive will overwrite the previous one.
Backup files from RMAN process could be written to disk or tape directly. Many organizations based on their backup strategy may decide to keep a few days of backup on disk for easy access. In this case the backup is stored on disk and then backed up from disk to tape. When disks are used for backup, these devices should also be visible from all instances participating in the cluster. In other words, these devices should also be mounted as shareable. In certain storage management solutions, such as NFS, there are considerable performance implications when the file systems are mounted to be shareable and such methods should be avoided.
Oracle 10g | New Feature: Starting with Oracle 10g a specific area in which Oracle can store and manage files related to backup and recovery could be created using the parameters DB_RECOVERY_FILE_DEST and DB_RECOVERY_FILE_DEST_SIZE. |
RMAN provides a good reporting option. Using the RMAN utility, various details of the backup activity and other options can be obtained.
A 'report schema' command provides a list of the various tablespaces and data files configured for the target database.
RMAN> report schema 2> ; using target database controlfile instead of recovery catalog Report of database schema File K-bytes Tablespace RB segs Datafile Name ---- ------- ---------- ------- -------- ----- 1 921600 SYSTEM *** /dev/vx/rdsk/oraracdg/partition1G_3 2 2867200 UNDO_RAC1 *** /dev/vx/rdsk/oraracdg/partition3G_2 3 921600 TOOLS *** /dev/vx/rdsk/oraracdg/partition1G_16 4 921600 USERS *** /dev/vx/rdsk/oraracdg/partition1G_13 5 2867200 UNDO_RAC2 *** /dev/vx/rdsk/oraracdg/vertpartition3G_9 6 921600 ASAD_DATA_P001 *** /dev/vx/rdsk/oraracdg/partition1G_44 7 921600 ASAD_DATA_P002 *** /dev/vx/rdsk/oraracdg/partition1G_45 8 921600 ASAD_DATA_P003 *** /dev/vx/rdsk/oraracdg/partition1G_47 9 921600 ORDHD_DATA_P001 *** /dev/vx/rdsk/oraracdg/partition1G_12 10 921600 ORDHD_DATA_P002 *** /dev/vx/rdsk/oraracdg/partition1G_14 11 921600 ORDHD_DATA_P003 *** /dev/vx/rdsk/oraracdg/partition1G_17 12 921600 ORDLI_DATA_P001 *** /dev/vx/rdsk/oraracdg/partition1G_10
This report provides a list of backup summary data; when the last backup was taken and what level of backup was taken (full, incremental, etc.):
RMAN> list backup summary; List of Backups ================= Key TY LV S Device Completion # # Tag Type Time Pieces Copies ---------------------------------------------------------------------- 1 B 0 A DISK 20-OCT-02 3 1 TAG20021020T135125 2 B 0 A DISK 20-OCT-02 2 1 TAG20021020T135125 3 B 0 A DISK 20-OCT-02 2 1 TAG20021020T135125 4 B 0 A DISK 20-OCT-02 5 1 TAG20021020T135125 5 B 1 A DISK 20-OCT-02 1 1 TAG20021020T144731 6 B 1 A DISK 20-OCT-02 1 1 TAG20021020T144731 7 B 1 A DISK 20-OCT-02 1 1 TAG20021020T144731
The following report lists all the data files that have changed since the last RMAN backup and are ready to be backed up:
RMAN> report need backup 2> ; RMAN retention policy will be applied to the command RMAN retention policy is set to redundancy 1 Report of files with less than 1 redundant backups File #bkps Name ---- ----- -------------------------------------- 32 0 /dev/vx/rdsk/oraracdg/partition100m_54a 230 0 /dev/vx/rdsk/oraracdg/partition3G_5 231 0 /dev/vx/rdsk/oraracdg/partition500m_6b 232 0 /dev/vx/rdsk/oraracdg/partition500m_7b 233 0 /dev/vx/rdsk/oraracdg/partition3G_13 234 0 /dev/vx/rdsk/oraracdg/partition3G_14 235 0 /dev/vx/rdsk/oraracdg/partition1G_11 236 0 /dev/vx/rdsk/oraracdg/partition3G_6 237 0 /dev/vx/rdsk/oraracdg/partition300m_51 238 0 /dev/vx/rdsk/oraracdg/partition500m_12a 239 0 /dev/vx/rdsk/oraracdg/partition500m_13b 240 0 /dev/vx/rdsk/oraracdg/partition500m_14a 241 0 /dev/vx/rdsk/oraracdg/partition500m_15a 242 0 /dev/vx/rdsk/oraracdg/partition500m_17a 243 0 /dev/vx/rdsk/oraracdg/partition500m_17b
[1]cp is a copy command used on Unix operating systems.
| < Day Day Up > |
|