One of the keys to administration in a RAC environment is understanding how redo and rollback are managed. The key is realizing that each individual instance requires, first, its own independent set of redo logs and, second, its own undo tablespace. Generally speaking, redo and undo are handled on a per-instance basis. Therefore, if a RAC database is composed of three instances, each instance must have at least two redo log groups of its own, for a minimum of six online redo log groups. Each instance can, of course, have more, but just as with a regular instance, two is the minimum. In the case of undo, each instance must likewise have its own undo tablespace. All of these files must still reside on the shared drive.
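To see how this per-instance assignment looks in practice, the undo tablespace used by each instance can be checked from any node with a query against GV$PARAMETER; for example:

```sql
-- Show the undo tablespace assigned to each instance in the cluster
select inst_id, value as undo_tablespace
  from gv$parameter
 where name = 'undo_tablespace'
 order by inst_id;
```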
Instance recovery occurs when an instance goes down abruptly, whether via a SHUTDOWN ABORT, the killing of a background process, or a crash of the node or the instance itself. After an ungraceful shutdown, the database must roll forward all information in the redo logs and roll back any transactions that had not yet been committed. This process is known as instance recovery, and it happens automatically, because the information required for recovering the instance should be available in the online redo logs.
In a RAC environment, the redo logs must be accessible from all instances so that instance recovery can be performed for one or more failed instances. Should instance recovery be required because a node goes down ungracefully (whatever the reason), one of the remaining instances must have access to the online redo logs in order to perform the recovery. Thus, even though the instance is down, the data in its redo logs is accessible and can be rolled forward by a surviving instance, with any uncommitted transactions being rolled back. This happens immediately in a RAC environment, without waiting for the downed instance to come back online. Here is an example of what you may see in the alert log of the instance performing instance recovery:
Post SMON to start 1st pass IR
Sun Feb 01 12:01:47 2004
Instance recovery: looking for dead threads
Sun Feb 01 12:01:47 2004
Beginning instance recovery of 1 threads
Sun Feb 01 12:01:47 2004
Started first pass scan
Sun Feb 01 12:01:50 2004
Completed first pass scan
 3941 redo blocks read, 165 data blocks need recovery
Sun Feb 01 12:01:51 2004
Started recovery at
 Thread 1: logseq 12, block 33210, scn 0.0
Recovery of Online Redo Log: Thread 1 Group 2 Seq 12 Reading mem 0
  Mem# 0 errs 0: /ocfs/oradata/grid/redo02.log
Sun Feb 01 12:01:53 2004
Completed redo application
The fact that instance recovery is done by a remaining node in the cluster means that when the crashed instance is restarted, no instance recovery is needed on that instance, because it will have already been done. If multiple instances go down, online instance recovery can still be done as long as there is a surviving instance. If all instances go down, then crash recovery is performed by the first instance to start up.
Media recovery differs from instance recovery in that it cannot be done automatically; it requires manual intervention, and it may also require the application of archived redo logs. If it is necessary to perform media recovery on some or all of the database files, you must do so from a single node/instance. If you are recovering the entire database, all other instances must be shut down, and then you can mount the database on the node you have chosen to perform recovery from. If you are recovering a single file (or set of files) that does not impact the entire database, all instances can be open, but the file or files being recovered must be offline and will therefore be inaccessible.
As we discussed previously, each instance is assigned a thread number, starting at 1, and the thread number for that instance should not change. The thread number is defined by the spfile parameter <sid>.THREAD=n, where n is the thread number for that instance. Thus, when a redo log group is created, it is assigned to a given instance using the thread number like so:
alter database add logfile thread 2 group 5 '/ocfs/oradata/grid/grid/redo02_05.log' size 100m;
The above example is a database on an OCFS drive.
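Note that when a redo thread is created in this manner for a new instance, it must also be enabled before the instance can open it. Continuing the thread 2 example above, the command would be along these lines:

```sql
-- Enable the newly created thread so an instance can acquire it
alter database enable public thread 2;
```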
A query similar to the following can be used to easily view the online redo logs, their groups, and their threads. This example is on a cluster using ASM:
set linesize 120
col inst_id for 99
col group# for 99
col thread# for 99
col sequence# for 9999999
col status for a10
col member for a40
col bytes for 9999999999
select a.group#, a.thread#, a.sequence#, b.member
from v$log a, v$logfile b
where a.group#=b.group#
order by 1,2,3;

GROUP# THREAD# SEQUENCE# MEMBER
------ ------- --------- ----------------------------------------
     1       1        14 +ASM_DISK/grid/onlinelog/group_1.265.3
     1       1        14 +ASM_DISK/grid/onlinelog/group_1.264.3
     2       1        15 +ASM_DISK/grid/onlinelog/group_2.263.3
     2       1        15 +ASM_DISK/grid/onlinelog/group_2.262.3
     3       2        11 +ASM_DISK/grid/onlinelog/group_3.258.3
     3       2        11 +ASM_DISK/grid/onlinelog/group_3.256.3
     4       2        12 +ASM_DISK/grid/onlinelog/group_4.257.3
     4       2        12 +ASM_DISK/grid/onlinelog/group_4.261.5
If you need to recover one or more datafiles, whether with the database open or mounted from a single instance, you will need the archived redo logs from all threads of redo, and these must be made available in the archive destination of the instance performing the recovery. (This assumes, of course, that you are running in archivelog mode, as any good HA DBA will.) How you accomplish this recovery depends greatly on the type of file system that you are using for your datafiles. Recall that you have three options for storing files on the shared disk: OCFS, ASM, or RAW devices. We will discuss each option, and how it affects the archival process, in the sections to follow. (Note that in Chapter 8 we will discuss these options further in conjunction with the use of RMAN for backing up the database and archivelogs.)
Archiving with RAW Devices as the File System

Recall that if your shared files are on RAW devices, you must have a separate RAW slice (with a link name) for every file. In this situation, it is not practical (or even possible) to have the archived logs created on RAW slices themselves. Therefore, if you are using RAW devices, the most common solution is to set the archive destination to the private drive on each node. In the event that media recovery is necessary, all archived redo logs from all instances would then have to be copied to the archive destination from which recovery is being initiated. To speed up recovery, this copying can be avoided by setting up NFS mounts on each node, similar to the following:
mount -t nfs rmsclnxclu2:/u01/app/oracle/oradata/test/archive /archive
Here we have mounted the archive destination from Node2 (rmsclnxclu2) to a directory called /archive on Node1 (rmsclnxclu1). Assuming that node rmsclnxclu1 has the same path for archiving, you can then have two archive destinations, as follows:
LOG_ARCHIVE_DEST_1=location='/u01/app/oracle/oradata/test/archive/'
LOG_ARCHIVE_DEST_2=location='/archive/'
By doing this, Node1 is now archiving to two separate destinations: the first is its own local archive directory, and the second, the /archive directory, is actually the NFS-mounted archive destination used by Node2. If you reverse the process on Node2, issuing an NFS mount from Node2 back to the archive destination on Node1, both instances will be archiving to their own local archive destination as well as to the archive destination of the other node. This means that in the event media recovery is needed, no matter which node you perform it from, you should have access to the archived logs from all threads. As you can see, this can get rather complicated with many more than two nodes, but in a two-node cluster it is a workable solution.
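To make such a cross-mount survive a reboot, an entry can be placed in /etc/fstab on each node. A sketch for Node1, using the same host and paths as the example above (the mount options shown are illustrative defaults):

```
rmsclnxclu2:/u01/app/oracle/oradata/test/archive  /archive  nfs  defaults  0 0
```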
Note | If you make both archive destinations mandatory, you may cause a hang of the instance if the NFS mount point for LOG_ARCHIVE_DEST_2 is inaccessible. Therefore, we recommend that you make the second destination optional in this configuration, to lessen the impact if one of the nodes is down. |
Archiving in a Cluster File System Environment

If you are using OCFS for your datafiles, you can take advantage of it to greatly simplify the archival process. Simply specify the same directory on the OCFS drive as the LOG_ARCHIVE_DEST on each node. Each instance will then write its archivelogs directly to the same directory. If you are using this option, we also strongly recommend that you specify a second archive destination on a private drive of each node. This will protect you in the event of a catastrophic failure of the disk subsystem where your OCFS drive(s) reside.
One word of caution here: If archiving to the same directory on OCFS, a directory lock is taken out when a new archived log file is generated. This directory lock will be held until the file is created, preventing another instance from being able to archive at the same time. This will manifest itself as a short delay on the second instance, until the directory lock is released. As such, it may be advisable to create separate directories on the OCFS drive for each instance. This sacrifices convenience somewhat, as in the event a recovery is needed, you will need to move or copy files into the same directory. However, it avoids a possible performance hit during normal day-to-day operations.
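One way to implement separate per-instance directories is to set LOG_ARCHIVE_DEST_1 differently for each instance using the SID clause of ALTER SYSTEM. The directory and instance names below are illustrative, not taken from the examples above:

```sql
-- Each instance archives to its own directory on the OCFS drive,
-- avoiding contention for the directory lock
alter system set log_archive_dest_1='LOCATION=/ocfs/oradata/grid/arch1' scope=both sid='grid1';
alter system set log_archive_dest_1='LOCATION=/ocfs/oradata/grid/arch2' scope=both sid='grid2';
```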
Archiving in an ASM Environment

Lastly, if you are using ASM as the storage medium for your datafiles, you can also use an ASM disk group as the location for your flashback recovery area. Doing so allows each instance to write its archived logs to the location specified by DB_RECOVERY_FILE_DEST, storing them in an area accessible by all nodes for recovery. RMAN can use this location to restore and recover whatever files are necessary, with access to archived logs from all instances. Note also that by default, the value of LOG_ARCHIVE_DEST_10 is automatically set equal to the location of DB_RECOVERY_FILE_DEST, and it will be used if no other LOG_ARCHIVE_DEST_n parameters are set. You can explicitly direct one of the LOG_ARCHIVE_DEST_n parameters to the flashback recovery area by setting it equal to the string LOCATION=USE_DB_RECOVERY_FILE_DEST. For example:
*.LOG_ARCHIVE_DEST_1='LOCATION=/u01/app/oracle/oradata/grid/archive/'
*.LOG_ARCHIVE_DEST_2='LOCATION=USE_DB_RECOVERY_FILE_DEST'
*.db_recovery_file_dest='+ASM_TEST'
*.db_recovery_file_dest_size=2147483648
We discuss this in more detail in Chapters 8 and 9. When archiving to an ASM disk group, Oracle will create a directory within the disk group for each day archived logs are generated. The directories containing the archived logs will stick around and cannot be deleted, even after they have been emptied (when the flashback recovery area and/or the archived logs are backed up via RMAN). Once a directory has been empty for seven days, it is deleted automatically when the controlfile is backed up using RMAN (either manually or via a controlfile autobackup).
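Backing up (and thereby emptying) the archived logs in the flashback recovery area is typically an RMAN operation; a minimal example, run from any one node:

```
RMAN> backup archivelog all delete input;
```

This backs up the archived logs from all threads and then removes them from the recovery area, which is what eventually allows the dated directories described above to be cleaned up.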
Note | A flashback recovery area can also be used with OCFS. If using RAW devices, however, you will need to set up either an ASM disk group or an OCFS mount point if you want to define a flashback recovery area in a RAC environment. |
Additional Notes on Archiving in a RAC Environment

During normal operations, the rate of log switches will likely differ between instances. If one instance is generating a large amount of redo while another is relatively idle, the gap between SCNs and log sequences may grow. To keep this gap from becoming too large, Oracle will periodically force a log switch on instances that are not seeing as much activity. In addition, redo threads that are closed but still enabled are also subject to these forced log switches, meaning that if an instance is shut down for a long period of time but you do not disable its redo thread, you may well find that it is still generating archived logs, which will be needed in the event of a recovery. If you find archived logs being generated for an instance that has been down for a while, the thread can be disabled via the command
Alter database disable public thread 3;
where 3 is the thread number to be disabled. To reenable it, replace DISABLE with ENABLE.
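Before disabling a thread, you can confirm which threads exist, which are enabled, and which are currently open by querying V$THREAD:

```sql
-- STATUS shows OPEN/CLOSED; ENABLED shows PUBLIC, PRIVATE, or DISABLED
select thread#, status, enabled from v$thread;
```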
HA Workshop: Enable Archiving and Flashback in a RAC Database
Workshop Notes
This workshop will go through the step-by-step process of putting a RAC database in archivelog mode and then enabling the Flashback Database option. A database must be in archivelog mode before flashback can be enabled, so that will be done first. Our example centers on a two-node cluster using ASM for the datafiles and for the DB_RECOVERY_FILE_DEST. Note that for purposes of illustration, the same ASM disk group is used for both the datafiles and the DB_RECOVERY_FILE_DEST; however, best practice is to separate these onto different disk groups. In this example, we will also use multiple archive destinations, with one destination pointing to the private drive of each node, and we will use the default value of 1440 (minutes) for DB_FLASHBACK_RETENTION_TARGET. In this workshop, the database name is test, and the instance names are test1 and test2.
Step 1. Create the local directories on each node needed for the nonshared (private) archive destination. In this example, ORACLE_BASE is set to /u01/app/oracle. Run these commands on each node:
[oracle@rmsclnxclu1 oracle]$ mkdir -p $ORACLE_BASE/test/archive
Step 2. Set the LOG_ARCHIVE_DEST_1 and LOG_ARCHIVE_DEST_2 parameters. Since these parameters will be identical for all nodes, we will use sid='*'. However, you may need to modify this for your situation if the directories are different on each node.
alter system set log_archive_dest_1='LOCATION=USE_DB_RECOVERY_FILE_DEST' SCOPE=SPFILE SID='*';

System altered.

alter system set log_archive_dest_2='LOCATION=/u01/app/oracle/oradata/test/archive' SCOPE=SPFILE SID='*';

System altered.
Step 3. Set LOG_ARCHIVE_START to TRUE for all instances to enable automatic archiving.
SQL> alter system set log_archive_start=true scope=spfile sid='*'; System altered.
Note that we illustrate the command for backward compatibility purposes, but in Oracle Database 10g, the parameter is actually deprecated. Automatic archiving will be enabled by default whenever an Oracle Database 10g database is placed in archivelog mode.
Step 4. Set CLUSTER_DATABASE to FALSE for the local instance, which you will then mount to put the database into archivelog mode. By having CLUSTER_DATABASE=FALSE, the subsequent shutdown and startup mount will actually do a Mount Exclusive by default, which is necessary to put the database in archivelog mode, and also to enable the flashback database feature:
SQL> alter system set cluster_database=false scope=spfile sid='test1'; System altered.
Step 5. Shut down all instances. Ensure that all instances are shut down cleanly:
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
Step 6. Mount the database from instance test1 (where CLUSTER_DATABASE was set to FALSE) and then put the database into archivelog mode.
SQL> startup mount
ORACLE instance started.
Total System Global Area  655434464 bytes
Fixed Size                   455392 bytes
Variable Size             125829120 bytes
Database Buffers          528482304 bytes
Redo Buffers                 667648 bytes
Database mounted.
SQL> alter database archivelog;

Database altered.
Note | If you did not shut down all instances cleanly in Step 5, putting the database in archivelog mode will fail with an error similar to the following:

alter database archivelog
*
ERROR at line 1:
ORA-00265: instance recovery required, cannot set ARCHIVELOG mode |
Step 7. Confirm that the database is in archivelog mode, with the appropriate parameters, by issuing the ARCHIVE LOG LIST command:
SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     1281
Next log sequence to archive   1282
Current log sequence           1282
Step 8. Confirm the location of the RECOVERY_FILE_DEST via a SHOW PARAMETER:
SQL> show parameter recovery_file

NAME                                 TYPE        VALUE
------------------------------------ ----------- ---------------------------
db_recovery_file_dest                string      +ASM_DISK
db_recovery_file_dest_size           big integer 8G
Step 9. Once the database is in archivelog mode, you can enable flashback while the database is still mounted in exclusive mode (CLUSTER_DATABASE = FALSE):
SQL> alter database flashback on; Database altered.
Step 10. Confirm that Flashback is enabled and verify the retention target:
SQL> select flashback_on, current_scn from v$database;

FLASHBACK_ON CURRENT_SCN
------------- --------------
YES           0

SQL> show parameter flash

NAME                                 TYPE        VALUE
------------------------------------ ----------- ---------
db_flashback_retention_target        integer     1440
Step 11. Reset the CLUSTER_DATABASE parameter back to true for all instances:
SQL> alter system set cluster_database=true scope=spfile sid='*'; System altered.
Step 12. Shut down this instance and then restart all cluster database instances. All instances will now be archiving their redo threads. We will discuss Flashback Database in more detail in Chapter 9.
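Assuming the database has been registered with srvctl under the name test (as it would be in a typical RAC installation), the shutdown and cluster-wide restart in Step 12 could be performed as follows:

```
srvctl stop instance -d test -i test1
srvctl start database -d test
```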
Note | The archived logs will be written out based on the init parameter LOG_ARCHIVE_FORMAT. By default, the LOG_ARCHIVE_FORMAT parameter includes the %T option, which means the thread number for each thread is included. Therefore, even if two instances have the same sequence number and end up being archived to the same location, the Thread# component of the archivelog will guarantee uniqueness. However, when archived logs are placed on an ASM disk group, these archived logs will not be named in accordance with the LOG_ARCHIVE_FORMAT, but rather will have system-generated names determined by Oracle, such as thread_2_seq_9.270.2, which still guarantees uniqueness across instances. |
Just as with the redo logs, each instance needs its own separate undo; in this case, each instance needs its own undo tablespace, again located on a shared drive. The undo tablespace is defined by the parameter <instance_name>.UNDO_TABLESPACE. As with redo, the undo will be needed during instance recovery, if required, by the instance performing the recovery. In addition, it is very likely that the undo tablespace will need to be read by other nodes during the course of normal activities. For example, assume that a user on Node1 issues an update statement, updating a row in a table. The before image of that row goes into the undo tablespace associated with that instance. Next, assume a second user on Node2 queries the same record, which has been updated on Node1 but not yet committed; this user must read the undo record in the undo tablespace assigned to Node1.
An undo tablespace can be dynamically altered while the instance is up via the ALTER SYSTEM command; however, it can only be changed to a tablespace that is specifically created as a tablespace of type UNDO, and that tablespace must not already be in use by another instance. For example, assume that we have just created a new undo tablespace called UNDOTBS_MYTEST, and we want to set that new tablespace as the undo tablespace for our third instance (called test3). The following SQL will make that change dynamically:
alter system set undo_tablespace=undotbs_mytest scope=both sid='test3';
Since scope=both was used, this change will persist even after a restart of the test3 instance, unless another statement is issued to change it later on.
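For reference, the undo tablespace used above must already exist and have been created as type UNDO. It could have been created with a statement along these lines, where the datafile clause and size are illustrative for an ASM-based database:

```sql
-- Create a dedicated undo tablespace on the shared ASM disk group
create undo tablespace undotbs_mytest datafile '+ASM_DISK' size 500m;
```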