12.4 Redo internals

 < Day Day Up > 



The ACID property of a transaction that relates to failures and recovery is the durability property. This property requires that once a user's transaction completes successfully (by the issue of a commit statement), the changes that are made must survive RDBMS failures. The e-mail system provides a good example; if the e-mail system fails, all sent e-mail would be retained in the inbox when the mailbox is opened at a later time.

Similarly, Oracle is able to protect data loss from the database failures by logging the information in the redo log files. In this section, we will look into the various areas that affect the backup and recovery feature, by examining certain internal details.

12.4.1 Logging

Oracle uses the physiological logging mechanism for tracking changes to the database. Physiological logging blends the best of both worlds, namely physical logging and logical logging.

Physical logging

Under the physical logging method both the ''before'' and ''after'' image of the block is logged. This is achieved by keeping the changes made to a block or an entire copy of the block with the ''before'' image.

Logical logging

Under this method of logging, the actual operation that is performed, such as a DELETE or an UPDATE operation, is recorded. This is required because if the user requests an undo operation, all the required conditions for the statement have to be made available.

Physiological logging

This is a combination of both the physical logging and logical logging. Under this method, both the before and after image of the block along with the type of operation is stored. Oracle maintains redo records for all changes, which consist of one or more change vectors. A change vector is the physical change to a database block. A group of change vectors, which constitute a single atomic change to the database, is called a redo record.

12.4.2 Redo architecture

Oracle redo architecture consists of several components:

Note 

The complete architecture of the redo log files is visible if the dump of the redo log file is examined through its trace files.

SCN  This records the consistent version and defines the committed version of the database. An SCN is allocated and saved in the header of a redo record that commits the transaction. SCNs are also recorded in other data structures, such as data file headers, control file structures and block header structures. It is saved in the redo record header that commits the transaction. It may also be saved in a record when it is necessary to mark the redo as being allocated after a specific SCN.

Note 

The SCN is 48 bits long and thus can be allocated at a rate of 16,384 SCNs per second for over 534 years without running out of them.

RBA  The redo byte address (RBA) identifies the start of a redo record within an online redo log file. It acts as a pointer to where a particular change vector appears in the redo stream. It consists of three fields, the log sequence number, the block number within the log, and byte offset within the block. The RBA structure is 10 bytes long.

Change vector  As discussed earlier, a change vector is the physical change to a database block. The change vector has a header that gives the data block address (DBA) of the block, the segment, the SCN, the change type, the block class, and the operation.

Redo record  A group of change vectors, which constitute a single atomic change to the database, is called a redo record. This grouping allows multiple database blocks to be changed so that either all changes occur or no changes occur. A redo record is a change to the database state. Redo records are ordered by SCNs; this helps during recovery, when changes are applied in the order of the SCN.

Redo log files  The redo records that contain the changes to the database blocks are stored in redo log files. These files are a series of circular files where all changes are recorded. Its thread number, the sequence number within a thread, and the range of SCNs spanned by its redo records, identifies each log. The information in the redo log files is logically ordered by SCN to preserve the order in which changes were made.

In the redo log header this information is stored in the thread number, sequence number, low SCN, and next SCN fields. The low SCN is the SCN associated with the first redo record. The next SCN is the low SCN of the log with the next higher sequence number of the same thread. The current log for an enabled thread has an infinite next SCN, since there is no log with a higher sequence number.

The following dump output illustrates the above redo log header information.

LOG FILE #1: (name #3) C:\ORACLE\ORADATA\. . .\REDO01.LOG Thread 1 redo log links: forward: 2 backward: 0 siz: 0x32000 seq: 0x00000002 hws: 0x8 bsz: 512 nab: 0x723 flg: 0x0 dup: 1 Archive links: fwrd: 0 back: 0 Prev scn: 0x0000.0002e872 Low scn: 0x0000.00034541 10/25/2002 00:19:40 Next scn: 0x0000.00039c37 11/03/2002 19:32:42 FILE HEADER:   Software vsn=153092096=0x9200000, Compatibility   Vsn=153092096=0x9200000   Db ID=2747698417=0xa3c68cf1, Db Name='ORA9IDB'   Activation ID=2747712753=0xa3c6c4f1   Control Seq=252=0xfc, File size=204800=0x32000 File Number=1, Blksiz=512, File Type=2 LOG   descrip:"Thread 0001, Seq# 0000000002, SCN 0x000000034541-0x000000039c37" thread: 1 nab: 0x723 seq: 0x00000002 hws: 0x8 eot: 0 dis: 0 reset logs count: 0x1c601474 scn: 0x0000.0002e872 Low scn: 0x0000.00034541 10/25/2002 00:19:40 Next scn: 0x0000.00039c37 11/03/2002 19:32:42 Enabled scn: 0x0000.0002e872 10/23/2002 22:32:20 Thread closed scn: 0x0000.00039c36 10/25/2002 00:32:45 Log format vsn: 0x8000000 Disk cksum: 0x2370 Calc cksum: 0x2370 Terminal Recovery Stamp scn: 0x0000.00000000 01/01/1988 00:00:00 Most recent redo scn: 0x0000.00000000 Largest LWN: 0 blocks Miscellaneous flags: 0x0 LOG FILE #2:

In a RAC environment, where multiple instances have their own redo log files and record database block changes pertaining to that instance, the order of SCN increase across threads /instances.

Redo threads  The redo record information generated by an instance is called a thread of redo and is assigned a number. The thread number becomes significant in a multi-instance database configuration such as RAC. Each thread is written to its own private set of redo log files.

For each log file there is a control file record that describes the log file characteristics. The index of a log's control file record is referred to as its log number. Log numbers are unique across instances. The log file record in the control file has fields identifying the number of group members, as well as the head and tail of filenames in the group.

Checkpoint structure  This is a data structure that defines a point in the redo stream, before which all changes to blocks have been written to disk. The checkpoint structure contains a checkpoint SCN, an enabled thread bitvec (a bit mask identifying what threads were enabled at the time of the checkpoint), a timestamp, the number of the thread that initiated the checkpoint, and an RBA.

The output below is an extract from a dump file of the online redo log file. This output provides visibility to the various redo architecture components, such as the RBA, SCN, DBA, and change vector.

REDO RECORD - Thread:1 RBA: 0x00001f.00000021.006c LEN: 0x0054 VLD: 0x01  SCN: 0x0000.070a75f4 SUBSCN: 1 11/21/2002 10:07:59 CHANGE #1 TYP:0 CLS:33 AFN:2 DBA:0x00800089 SCN:0x0000. 070a75f3 SEQ: 1 OP:5.4  ktucm redo: slt: 0x0007 sqn: 0x00001bf6 srt: 0 sta: 9 flg: 0x2  ktucf redo: uba: 0x00800291.0718.18 ext: 2 spc: 3270 fbi: 0

12.4.3 Recovery architecture

Like the redo architecture, Oracle's recovery architecture is composed of various components and different algorithms for different functions. While these algorithms are provided to help in the recovery functions of the database, they are related to the redo architecture because recovery of the database depends on the redo log files and the presence of data in these files. The following section discusses the various algorithms and how these algorithms interact during a recovery operation.

Oracle's FIX rule algorithm

We discussed earlier that Oracles buffer contains data that was retrieved by various processes and which are in a consistent state of being modified; while these modifications happen, Oracle needs to ensure that other processes do not access the buffer or pages of data in memory. Oracle's FIX rule provides this functionality, that is, it prevents the access of the buffer or page until all the changes needed to take the buffer to the next consistent state have been applied.

Oracle relies on the O/S for implementing the FIX rule. In a Unix environment, this rule is implemented with semaphores. (Semaphores should be a familiar subject to database administrators who have installed, configured, and tuned an Oracle database system). If sufficient semaphores are not defined the following errors are normally noticed during instance startup, of which the ORA-3113 and ORA-27416 are common:

ORA-7250 "spcre: semget error, unable to get first semaphore set." ORA-7279 "spcre: semget error, unable to get first semaphore set." ORA-7251 "spcre: semget error, could not allocate any semaphores." ORA-7252 "spcre: semget error, could not allocate any semaphores." ORA-7339 "spcre: maximum number of semaphore sets exceeded." ORA-3113 "end-of-file on communication channel" at instance startup. ORA-27146 "post/wait initialization failed" 

However, for others, semaphores are a system resource that Oracle utilizes for interprocess communication. How many semaphores does the Oracle RDBMS need? This depends on many factors, such as the number of process parameters defined in the init<SID>.ora file. The number of semaphores is normally set to two times the number of processes defined in the parameter file and the number depends on certain Unix kernel parameter settings.

On Unix the locking portion of the FIX rule is implemented by:

  1. Obtaining a page semaphore in exclusive mode prior to altering the page, or obtaining a page semaphore in shared or exclusive mode prior to reading the page.

  2. Holding the semaphore until the changes to be applied are recorded in a log buffer and the changes have been made to the page.

In order to ensure that the original state can be restored in the case of a failure, Oracle creates a redo record prior to changing the buffer. Oracle guarantees that when a page is unpinned after the page action completes, the redo records are consistent with the buffer state.

Several times during normal activity there could be potential failures of the page action, for example when a process dies while modifying a buffer. Under such circumstances there is an inconsistent page, and to bring the page back to a consistent state, Oracle needs to perform a block recovery. In order to perform this block level recovery, Oracle needs to apply the redo records to a copy of the block obtained from the data file.

Note 

While in a Unix environment, Oracle depends on the semaphores to implement this rule; on an O/S such as Windows there is no concept of semaphores. On such environments it should be ensured that sufficient memory is allocated for Oracle to perform efficiently.

Write-ahead logging algorithm

It was stated earlier that, in order to ensure that the original state can be restored in the case of a failure, Oracle creates a redo record in the undo segment prior to changing the buffer. Apart from this, Oracle also needs to create the redo for the purpose of the undo change and the change itself. The redo change created during the process of modifying the record needs to be written to the redo log files in order for the change to complete.

This writing of the entry into the redo log files completes the change process to the record.

To ensure that writing all changes to the redo log file is complete, Oracle needs to ensure that the LGWR completes the process of writing the changes to the online redo log files prior to DBWR writing the dirty blocks to disk.

Log force at commit algorithm

When the user process completes its transaction with a commit statement, Oracle generates an SCN. This committed record is formed in the log buffer as a result of the completion of the change process. However, in order to make this change permanent, all redo entries up to and including the committed record need to be written to the online redo log files. The writing of the redo entries is optimized by Oracle by using the piggyback approach, such that any transaction that completes during the signaling of a write will be written at the same time. Grouping several transactions together performs this operation.

Online log switching algorithm

We have discussed in Chapter 3 (Oracle Database Concepts) how Oracle manages the redo log files, how the log switch happens, and the various criteria under which Oracle performs a log switch. One of the situations discussed where a log switch happens, is when Oracle encounters insufficient space in the current online redo log file. However, just having insufficient space cannot ensure that the log file can be switched; Oracle has to ensure the following:

  1. If the ARCHIVE LOG mode is enabled, the archiving process has to complete on the redo log file where Oracle will point to write the next redo operation when the current log file is full.

  2. The next log file should not contain changes required in the event of instance recovery being needed. This is determined by Oracle by ensuring that the thread checkpoint SCN is beyond the highest SCN allocated in the log to be switched into. When a delay happens due to this behavior, Oracle writes a ''checkpoint not complete'' message in the alert_<SID>.log file (e.g., alert_RAC2.log). If this happens frequently, the alternative is to add more redo log files, increase the size of the log files or the frequency of checkpoint activity.

Checkpoint algorithm

In Chapter 3 (Oracle Database Concepts) the various events and situations that trigger the checkpoint activity (log switch, shutdown, hot backup, etc.) were discussed in detail. There are various types of checkpoints such as:

  • Thread checkpoint: It is only in RAC that the concept of threads is really utilized and hence this type of checkpoint pertains to a RAC implemen tation. All blocks dirtied prior to the checkpoint SCN in that thread for all online data files will be written to disk. This occurs each time a local checkpoint is initiated by the instance, for example, during log switch.

  • Database checkpoint: The thread checkpoint that has the lowest checkpoint SCN for all enabled threads becomes the database checkpoint. All blocks in memory, which contain changes made prior to this SCN across all instances, must be written out to disk.

  • Data file checkpoint: All blocks changed prior to the data file checkpoint SCN have been written to disk. Database checkpoints, hot backups, or when taking a tablespace offline, can signal data file checkpoint.

  • Incremental checkpoint: Checks that happen more frequently are called incremental checkpoints. The DBWR process periodically writes out buffers from the checkpoint queues to advance the incremental checkpoint in memory. The CKPT process determines the lowest low RBA of all the buffers at the heads of the checkpoints queues and writes out this RBA to a thread-private location in the control file. Incremental checkpoints are used to reduce recovery time in the case of failure. During recovery, the recovery process chooses the highest incremental checkpoint RBA and the thread checkpoint RBA as the starting position for recovery.

    In Oracle 8i this feature was implemented using the FAST_START_IO_TARGET parameter. In Version 9i, Oracle has introduced another parameter called FAST_START_MTTR_TARGET, which writes records based on the recovery time required.

  • Mini-checkpoint: This checkpoint occurs due to DDL operation such as DROP TABLE, ALTER TABLE, etc. Unlike the other types of checkpoints, the mini-checkpoint only affects blocks belonging to the object covered by the statement.

12.4.4 Redo architecture in RAC

The redo architecture in a RAC environment is no different compared to a stand-alone configuration. This is because all redo and recovery is instance-specific. Every instance will have a set of redo logs and archive logs that contain changes made by users on that instance. For example, as illustrated in Figure 12.1 above, RAC1 and RAC2 have their background process that write to their copy of the redo logs.

While each instance that participates in the clustered configuration has its own set of redo log files, each set of these redo log files is assigned a thread. The thread-to-instance relationship is assigned through the para meter file. It is during a recovery operation that these threading properties of the redo logs (in a RAC configuration) play an important part.

During a recovery operation, when one or more instances fail, the recovery operation is performed by one of the other instances that is available. A more detailed discussion about recovery operation is covered in the next section.



 < Day Day Up > 



Oracle Real Application Clusters
Oracle Real Application Clusters
ISBN: 1555582885
EAN: 2147483647
Year: 2004
Pages: 174

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net