10.2 Availability | Oracle Real Application Clusters

10.2.1 Oracle Real Application Clusters

RAC allows multiple Oracle instances residing on different nodes to access the same physical database, as shown in Figure 10.3. GCS and GES maintain consistency across the caches of the different nodes. RAC protects against either node failure or communication failure to a subset of the nodes. With a RAC implementation, two different setups for availability are possible:

Failover only: Workload runs only on one instance, with the second instance being used only for failover purposes. This requires no changes to the application.
Failover and scalability: Workload runs on multiple instances simultaneously. While this allows the application to take advantage of the resource of the multiple nodes, it may require changes to the application in order to be able to scale. This is the case if the application has not been designed to work in such an environment where the process from the application attaches to the database through one or more instances to process information.

Figure 10.3: Oracle Real Application Clusters.

Note

To understand these cluster failover principles it is important that the reader has read through Chapter 2 (Hardware Concepts).

10.2.2 How does the failover mechanism work?

RAC relies on the CM of the clustered operating system for failure detection. The CM is a distributed kernel component that monitors whether cluster members can communicate with each other, and enforces the rules of cluster membership. The CM:

Forms a cluster, adds members to a cluster and removes members from a cluster
Tracks which members in a cluster are active
Maintains a cluster membership list that is consistent on all cluster members
Provides timely notification of membership changes
Detects and handles possible cluster partitions

The CM ensures data integrity in the face of communication failures by using a voting mechanism. That is, processing and I/O activity are allowed only when the cluster has a quorum. Whether a quorum exists depends on a number of factors, such as the expected votes from the participating members in the cluster (node votes) and quorum disk votes.

Node votes are the fixed number of votes that a given member contributes towards a quorum. Cluster members can have either 1 or 0 node votes. Each member with a vote (1) is considered a voting member of the cluster and with 0 is considered a non-voting member.
Quorum disk votes are the fixed number of votes that a quorum disk contributes towards a quorum. Similar to the node vote, a quorum disk can also have either 1 or 0 votes.
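
The voting rule described above can be sketched in a few lines of Java. This is only an illustration: the source does not give the quorum formula, so the sketch assumes a simple majority rule (a quorum exists when the votes currently present exceed half of the expected votes). The class and method names are hypothetical, and the real calculation is specific to each cluster manager.

```java
// Hypothetical sketch: the majority rule used here is an assumption,
// not the formula of any particular cluster manager.
class QuorumCheck {

    /** Adds the quorum disk vote to the node votes (each 0 or 1) of the reachable members. */
    static int currentVotes(int quorumDiskVote, int... reachableNodeVotes) {
        int votes = checkVote(quorumDiskVote);
        for (int v : reachableNodeVotes) {
            votes += checkVote(v);
        }
        return votes;
    }

    /** Assumed rule: a quorum exists when current votes exceed half the expected votes. */
    static boolean hasQuorum(int currentVotes, int expectedVotes) {
        return 2 * currentVotes > expectedVotes;
    }

    // A node vote or a quorum disk vote can only be 0 or 1.
    private static int checkVote(int vote) {
        if (vote != 0 && vote != 1) {
            throw new IllegalArgumentException("a vote must be 0 or 1");
        }
        return vote;
    }

    public static void main(String[] args) {
        // Four voting members expected, no quorum disk vote; one member has failed.
        int votes = currentVotes(0, 1, 1, 1);
        System.out.println("votes present: " + votes
                + ", quorum: " + hasQuorum(votes, 4));
    }
}
```

Under this assumed rule, a four-node cluster that splits into two halves of two nodes each has no quorum on either side, which is the situation a quorum disk vote is intended to resolve.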

Note

While node votes are a common method of voting, certain cluster configurations use the quorum disk method to determine availability of cluster members.

The CM determines the availability of a member in a cluster using the heartbeat function. Using the heartbeat mechanism, the CM allows the nodes to communicate with the other nodes to determine availability on a continuous basis at preset intervals, e.g., 2 seconds on Sun and Linux clusters. This means that every 2 seconds every node validates the existence of another node in the cluster.

When the heartbeat from a node does not get through within a specified amount of time, a timeout occurs (the timeout is configurable in most operating systems). Like the heartbeat interval, the heartbeat timeout parameter varies from operating system to operating system; the default on Sun clusters is 12 seconds and the default on Linux clusters is 10 seconds.^[1] At this point, the node that originally detected the failure declares that the other node is not responding, is not available for communications, and has failed. The node that first detected the failure then attempts to reform the cluster. Reforming the cluster requires that the remaining nodes have enough members to form a quorum. If the remaining nodes form a quorum, the cluster membership is reorganized. The reorganization process regroups the nodes that are accessible and removes the nodes that have failed. For example, in a four-node cluster, if one node fails, the CM will regroup among the remaining three nodes.
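
The timeout logic described above can be pictured with a small sketch. It is not how any cluster manager is implemented (the CM is a kernel component); it only models the rule that a node is declared failed once no heartbeat has been seen for longer than the timeout, after which the membership is regrouped. The class and method names are hypothetical, and timestamps are passed in explicitly so the behavior is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of heartbeat timeout detection; names are illustrative only.
class HeartbeatMonitor {
    private final long timeoutMillis;
    // Last time (in milliseconds) a heartbeat was received from each member.
    private final Map<String, Long> lastSeen = new TreeMap<>();

    HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void recordHeartbeat(String node, long nowMillis) {
        lastSeen.put(node, nowMillis);
    }

    /** Returns the members whose last heartbeat is older than the timeout. */
    List<String> failedNodes(long nowMillis) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                failed.add(e.getKey());
            }
        }
        return failed;
    }

    /** Removes the failed members and returns the regrouped membership list. */
    List<String> reform(long nowMillis) {
        for (String node : failedNodes(nowMillis)) {
            lastSeen.remove(node);
        }
        return new ArrayList<>(lastSeen.keySet());
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(10_000); // 10-second timeout
        monitor.recordHeartbeat("ORA-DB1", 0);
        monitor.recordHeartbeat("ORA-DB2", 0);
        monitor.recordHeartbeat("ORA-DB2", 8_000); // ORA-DB1 has gone silent
        System.out.println("failed: " + monitor.failedNodes(11_000));
        System.out.println("members after reform: " + monitor.reform(11_000));
    }
}
```

A real cluster manager would also confirm that the surviving members hold a quorum before regrouping; that check is left out here to keep the sketch short.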

Similarly, when a node is added or joins the cluster after recovery, the CM performs this step to reform the cluster. The information regarding a node joining the cluster or leaving the cluster is exposed to the respective Oracle instances by the LMON process running on each cluster node.

As discussed previously in Chapter 4 (RAC Architecture), LMON is a background process that monitors the entire cluster to manage global resources. By constantly probing the other instances, it checks and manages instance deaths and the associated recovery for GCS. When a node joins or leaves the cluster, it handles reconfiguration of locks and resources. In particular, LMON handles the part of recovery associated with global resources. LMON-provided services are also known as Cluster Group Services (CGS).

The next step in the failover process is the database recovery operation. As part of this operation, Oracle will remaster the GCS of the failed instances amongst the available instances. Unlike the previous versions, where all services from both the failed and the available instances were reconfigured, in the case of Oracle 9i and above, only the services from the failed instance are reconfigured.

Once the reconfiguration of the services from the failed instances has completed, Oracle starts the cache recovery process by rolling forward transactions. This is done by reading the redo log files of the failed instance. Because of the shared storage subsystem, the redo log files of any specific instance are visible to the other instances. This allows whichever instance detected the failure to read the redo log files and start the recovery process.

After completion of the cache recovery process, Oracle starts the transaction recovery, that is, rolling back all uncommitted transactions. Oracle records all the steps of the recovery process in the alert log file of the instance that is performing the recovery.

The following extract is from the alert log file of the recovering instance. It displays the steps that Oracle has to perform; for example, the log file tracks the reconfiguration and cleaning up of enqueue resources.

Sun Sep 29 13:55:14 2002
Reconfiguration started
List of nodes: 0,
 Global Resource Directory frozen
one node partition
 Communication channels re-established
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 Resources and enqueues cleaned out
 Resources remastered 7306
 20900 GCS shadows traversed, 4 cancelled, 4159 closed
 10401 GCS resources traversed, 0 cancelled
 13156 GCS resources on freelist, 22000 on array, 22000 allocated
 set master node info
 Submitted all remote-enqueue requests
 Update rdomain variables
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 20900 GCS shadows traversed, 0 replayed, 4162 unopened
 Submitted all GCS remote-cache requests
 0 write requests issued in 16738 GCS resources
 263 PIs marked suspect, 0 flush PI msgs
Sun Sep 29 13:55:16 2002
Reconfiguration complete
 Post SMON to start 1st pass IR
Sun Sep 29 13:55:16 2002
instance recovery: looking for dead threads
Sun Sep 29 13:55:16 2002
Beginning instance recovery of 1 threads
Sun Sep 29 13:55:16 2002
Started first pass scan
Sun Sep 29 13:55:20 2002
Completed first pass scan
 200007 redo blocks read, 3240 data blocks need recovery
Sun Sep 29 13:55:25 2002
Started recovery at
 Thread 2: logseq 4, block 665405, scn 0.0
Recovery of Online Redo Log: Thread 2 Group 4 Seq 4 Reading mem 0
  Mem# 0 errs 0: /dev/vx/rdsk/oraracdg/partition1G_32
  Mem# 1 errs 0: /dev/vx/rdsk/oraracdg/partition1G_34
Sun Sep 29 13:55:31 2002
Ended recovery at
 Thread 2: logseq 4, block 865412, scn 0.6547319
 2474 data blocks read, 3715 data blocks written, 200007 redo blocks read
Ending instance recovery of 1 threads

From the detection of the failure through the remastering of the failed instance's GCS resources and cache recovery, most work on the surviving instances is paused. While transaction recovery takes place, work proceeds at a slower pace. This point is considered full database availability, because all data is now accessible, including the data that resided on the failed instance. The application is now responsible for reconnecting the users and repeating any uncommitted work they had done.

The output below is from the SMON trace file that indicates how SMON performs the recovery operations (basically by making available every segment that has completed recovery, for user access) when the recovering nodes try to recover the data that belongs to the failed instance:

SMON: about to recover undo segment 11
SMON: mark undo segment 11 as available
SMON: about to recover undo segment 12
SMON: mark undo segment 12 as available
SMON: about to recover undo segment 13
SMON: mark undo segment 13 as available
SMON: about to recover undo segment 14
SMON: mark undo segment 14 as available
SMON: about to recover undo segment 15
SMON: mark undo segment 15 as available
SMON: about to recover undo segment 16
SMON: mark undo segment 16 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 18
SMON: mark undo segment 18 as available
SMON: about to recover undo segment 19

While RAC provides high availability of systems, the database servers, and the applications that use them, there are various features and options under RAC that provide even more support towards achieving the 99.999% availability of today's Internet-based business requirements. RAC allows for multiple nodes to participate in a clustered configuration providing continuous availability. When one of the participating nodes fails, the users are migrated to another node, thus providing a failover mechanism.

The best failover is the one that no one notices. Unfortunately, even though Oracle has been structured to recover very quickly, failures can severely disrupt users by dropping connections from the database. Work in progress at the time of the failure is most likely lost. If the user queried 1000 rows from the database and a failure on the node occurred while the user was scrolling through these rows on his/her terminal, the failure would cause the user to re-execute the query and browse through these rows again. This disruption could be eliminated for most situations by masking the failure with the TAF option.

10.2.3 The watchdog daemon on Linux

Oracle provides cluster management software (OCMS) with RAC Enterprise Edition to manage Linux clusters. The OCMS consists of two components: the watchdog daemon and the CM.

The watchdog daemon (watchdogd) uses a software-implemented watchdog timer to monitor selected system resources to prevent database corruption. The watchdog timer is a feature of the Linux kernel; the watchdog daemon comes as part of RAC.

The watchdog daemon monitors the CM and passes notifications to the watchdog timer at defined intervals. The behavior of the watchdog timer is partially controlled by the CONFIG_WATCHDOG_NOWAYOUT configuration parameter at the Linux kernel level.

The value of the CONFIG_WATCHDOG_NOWAYOUT configuration parameter should be set to Y. If the watchdog timer detects an Oracle instance or CM failure, it resets the node to avoid possible database corruption.

The CM maintains the status of the nodes and the Oracle instances across the cluster. The CM process runs on each node of RAC. CM uses the following communication channels between nodes:

Private network
Quorum partition on the shared disk

During normal cluster operations, the CMs on each node of the cluster communicate with each other through heartbeat messages sent over the private network. The quorum partition is used as an emergency communication channel if a heartbeat message fails. A heartbeat message can fail for the following reasons:

The CM terminates on a node
The private network fails
There is an abnormally heavy load on the node

The CM uses the quorum partition to determine the reason for the failure. From each node, the CM periodically updates its designated block on the quorum partition. The other nodes check the timestamp of each block. If the heartbeat message from one of the nodes does not arrive, but the corresponding block on the quorum partition has a current timestamp, the network path between this node and the other nodes has failed.
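
The diagnosis described above amounts to a two-way check, sketched here in Java. The source states only the first case (no heartbeat, but a current timestamp, means a network path failure). Treating a stale timestamp as a failed or overloaded node is an assumption made for this sketch, as are the class name, the enum values, and the maxAgeMillis threshold that decides what counts as current.

```java
// Hypothetical sketch of the quorum-partition check; names and the staleness
// threshold are assumptions for illustration.
class QuorumPartitionCheck {

    enum Diagnosis { HEALTHY, NETWORK_PATH_FAILED, NODE_OR_CM_FAILED }

    /**
     * @param heartbeatArrived     whether the heartbeat arrived over the private network
     * @param blockTimestampMillis last update time of the node's block on the quorum partition
     * @param nowMillis            current time
     * @param maxAgeMillis         how old a timestamp may be and still count as current (assumed)
     */
    static Diagnosis diagnose(boolean heartbeatArrived, long blockTimestampMillis,
                              long nowMillis, long maxAgeMillis) {
        if (heartbeatArrived) {
            return Diagnosis.HEALTHY;
        }
        boolean timestampCurrent = nowMillis - blockTimestampMillis <= maxAgeMillis;
        // Case from the text: the node still writes to the shared disk, so only the network is down.
        if (timestampCurrent) {
            return Diagnosis.NETWORK_PATH_FAILED;
        }
        // Assumption: a stale block means the CM or the node itself stopped or is overloaded.
        return Diagnosis.NODE_OR_CM_FAILED;
    }

    public static void main(String[] args) {
        System.out.println(diagnose(false, 9_000, 10_000, 2_000));
        System.out.println(diagnose(false, 1_000, 10_000, 2_000));
    }
}
```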

Each Oracle instance registers with the local CM. The CM monitors the status of local Oracle instances and propagates this information to CMs on other nodes. If the Oracle instance fails on one of the nodes, the following events occur:

The CM on the node with the failed Oracle instance informs the watchdog daemon about the failure.
The watchdog daemon requests the watchdog timer to reset the failed node.
The watchdog timer resets the node.
The CMs on the surviving nodes inform their local Oracle instances that the failed node is removed from the cluster.
Oracle instances on the surviving nodes start the RAC reconfiguration procedure.

10.2.4 Transparent application failover

RAC provides near-continuous availability by hiding failures from end-user clients and application server clients. This provides continuous, uninterrupted data access. TAF in the database reroutes application clients to an available database node in the cluster when the connected node fails. Application clients do not see error messages describing loss of service.

TAF allows client application users to continue working after the application loses its connection to the database. While users may experience a brief pause during the time the database server fails over to a surviving cluster node, the session context is preserved. After failover completes, the application automatically reconnects to the database and, if desired, can continue retrieving data from SELECT statements initiated before the failure. After a failover, only the execution of interrupted SELECT statements can be resumed. All other calls are rolled back and OCI reports an error message that can be trapped and handled by the application.

Figure 10.4 illustrates the step-by-step scenario of the TAF configuration. When a node or instance fails, what steps are taken before the user continues to receive the data? In this configuration, if the user's connection to node ORA-DB1 dies, their transaction is rolled back; however, they can continue work without having to manually reconnect to the other instance, establish another transaction programmatically, and then execute the request again. This functionality of continuation of work is made possible using the TAF option.

Figure 10.4: Oracle transparent application failover.

To get a good understanding of how the TAF architecture works, it would be helpful to walk through a failover scenario using the earlier example, i.e. where a user is querying the database to retrieve 1000 rows from the database. Assume that the user is connected to node ORA-DB1 instance RAC1. By following the steps identified in Figure 10.4:

The heartbeat mechanism between the various nodes in the cluster checks to see if another node in the cluster is available and is participating in the cluster configuration. As discussed earlier, this verification process happens on a continuous basis.
Let us assume that a user executes a query against the database to retrieve 1000 rows from the database via instance RAC1.
The initial 500 rows are retrieved from the PRODDB database via instance RAC1 and returned to the user for browsing through the graphical user interface (GUI). (The application only retrieves and displays no more than 500 rows every time.)
While the user is browsing through the first 500 rows, node ORA-DB1 fails.
Node ORA-DB2 checks for the heartbeat of the other participating node and deduces that node ORA-DB1 is not responding to the heartbeat request; it times out and declares that the node ORA-DB1 has failed.
The user is unaware of the failure and scrolls past the initial 500 rows. In order to retrieve and display the remaining 500 rows, the process tries to connect to instance RAC1 but detects that RAC1 is not available.
When the process tries to connect to the instance, the entries in the tnsnames.ora file are used to establish the user connection to the other available node, ORA-DB2.
The user is now connected to instance RAC2 on node ORA-DB2.
Oracle re-executes the query using the connection on instance RAC2 and displays the remaining rows to the user. If the data was available in the buffer cache, the rows are returned to the user instantaneously. However, if the rows are not available, Oracle has to perform an I/O operation. This would be delayed until the recovery process has completed.
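
The last step of the walk-through (re-executing the query and continuing from the rows the user has not yet seen) can be modeled with a short simulation. This is not Oracle code and makes no claim about how Oracle Net implements the resume internally. It assumes the query returns the same rows in the same order when re-executed, and it uses a hypothetical executeQuery method as a stand-in for running the statement on the surviving instance.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of resuming a SELECT after failover; not Oracle's implementation.
class ResumableFetch {

    /** Stand-in for running the query on an instance: returns row numbers 1..rowCount in order. */
    static List<Integer> executeQuery(int rowCount) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 1; i <= rowCount; i++) {
            rows.add(i);
        }
        return rows;
    }

    /**
     * Re-executes the query (as on the surviving instance), skips the rows the
     * client already received, and returns the next batch.
     */
    static List<Integer> resumeAfterFailover(int rowCount, int rowsAlreadyFetched, int batchSize) {
        List<Integer> result = executeQuery(rowCount);
        int from = Math.min(rowsAlreadyFetched, result.size());
        int to = Math.min(from + batchSize, result.size());
        return new ArrayList<>(result.subList(from, to));
    }

    public static void main(String[] args) {
        // The user saw the first 500 of 1000 rows on RAC1 before ORA-DB1 failed.
        List<Integer> remaining = resumeAfterFailover(1000, 500, 500);
        System.out.println("rows returned after failover: " + remaining.size()
                + ", starting at row " + remaining.get(0));
    }
}
```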

In Figure 10.4, when node ORA-DB1 fails, any SELECT statements that had partially executed on instance RAC1 are migrated as part of the failover process and are displayed through instance RAC2, when the user process fails over to node ORA-DB2. All this happens transparently without any interruption to the user. It should be noted that along with the SELECT statement, the following are also failed over:

Client/server connection
User session state
Prepared statements
Active cursors that have begun to return results to the user

The benefits that TAF adds in meeting the high-availability requirements of today's mission-critical applications are overwhelming, and the first questions that arise are: why did Oracle not introduce such a feature earlier, and why is this feature not available in other databases? Though the mechanism is very useful in meeting today's high-availability requirements, implementing such a feature is complex, basically because database connections are not stateless. This means that at the time of failure, the database, the users, and the transactions are in a specific state of operation, such as:

The process may be in a state of retrieving data from the database.
The process has already established a connection via Oracle Net to an instance.
A user connecting to the database has password and other user-authentication information.
The session has language and character set information that is specific to the instance on which the user has established a connection.
There are cursors open by various sessions that could be in an execution state.
SELECT cursors are open and users have partially scrolled through the various rows using the client interface or the user application browser.
INSERT, UPDATE, and DELETE (DML) statements and PL/SQL procedures are being executed.

It should be noted from the scenarios above that only SELECT statements are failed over from one node to another; transactional statements are not failed over by configuring TAF. Transactional or DML statements can be transferred programmatically from node ORA-DB1 to node ORA-DB2 by validating the error messages that Oracle returns and taking appropriate action. (An example of handling failover of DML statements is shown in the Java example that appears later in this chapter.) This should be the preferred method, both to avoid user interruptions and to keep database or system failures transparent to the user. The following do not automatically fail over when a node fails:

PL/SQL server-side package variables
Global temporary tables
Effect of any ALTER SESSION statements
Applications not using OCI8 and above
Applications not using the JDBC thick driver
Transactional statements, i.e., statements that include INSERT, UPDATE, and DELETE operations

While the failover is in process, it would be user friendly to inform the user via the application interface that the activity or command issued may take some time. This information could be communicated by validating the various error messages returned by Oracle as part of the node and connection failure. Some of the common Oracle error codes that should be handled by the application to track and transfer transactional statements include:

ORA-01012: not logged on to Oracle
ORA-01033: Oracle initialization or shutdown in progress
ORA-01034: Oracle not available
ORA-01089: immediate shutdown in progress - no operations are permitted
ORA-03113: end-of-file on communication channel
ORA-03114: not connected to ORACLE
ORA-12203: TNS:unable to connect to destination
ORA-12500: TNS:listener failed to start a dedicated server process
ORA-12571: TNS:packet writer failure
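
One way an application might recognize these errors is to compare the vendor error code carried by a java.sql.SQLException against the list above. The sketch below assumes that the JDBC driver reports an ORA-nnnnn error as the integer nnnnn from getErrorCode(); the class and method names are hypothetical.

```java
import java.sql.SQLException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: classifies an SQLException using the error codes listed above.
class FailoverErrors {

    // ORA error numbers from the list above, as integer vendor codes.
    private static final Set<Integer> CONNECTION_FAILURE_CODES = new HashSet<>(
            Arrays.asList(1012, 1033, 1034, 1089, 3113, 3114, 12203, 12500, 12571));

    /** Returns true when the exception carries one of the listed connection-failure codes. */
    static boolean isConnectionFailure(SQLException e) {
        return CONNECTION_FAILURE_CODES.contains(e.getErrorCode());
    }

    public static void main(String[] args) {
        SQLException lost = new SQLException(
                "ORA-03113: end-of-file on communication channel", null, 3113);
        if (isConnectionFailure(lost)) {
            // An application could inform the user here and then retry the statement.
            System.out.println("Connection lost; the request may take longer than usual.");
        }
    }
}
```

An application that detects such an error on a DML statement could notify the user, wait for the connection to be re-established, and resubmit the transaction.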

TAF configuration

TAF can be configured using one of two methods.

TNSNAMES-based configuration
OCI API requests

TNSNAMES-based configuration

Under this method, configuring the TAF option involves adding Oracle Net parameters to the tnsnames.ora file. When one of the participating nodes fails, the values of these parameters determine the next step in the failover process. The parameter that drives the TAF option is FAILOVER_MODE, under the CONNECT_DATA section of a connect descriptor. Full TAF functionality is obtained by using one or more of the subparameters listed in Table 10.1.

Table 10.1: TAF Subparameters
Parameter	Description
BACKUP	Specifies a different net service name to be used to establish the backup connection. A backup should be specified when using PRECONNECT to pre-establish connections. Specifying a BACKUP is strongly recommended for BASIC methods; otherwise, reconnection may first attempt the instance that has just failed, adding additional delay until the client reconnects
TYPE	Specifies the type of failover. Three types of Oracle Net failover functionality are available by default to the Oracle Call Interface (OCI). NONE: The default; no failover functionality is used. SESSION: Fails over the session. With this option, only the connection is re-established; no work in progress is transferred from the failed instance to the available instance. SELECT: Enables users with open cursors to continue fetching on them after failure. Oracle Net keeps track of any SELECT statements issued in the current transaction. It also keeps track of how many rows have been fetched back to the client for each cursor associated with a SELECT statement. If the connection to the instance is lost, Oracle Net establishes a connection to a backup instance and continues with the execution of the SELECT statement from the point of failure
METHOD	Determines the speed of the failover from the primary to the secondary or backup node. BASIC: Establishes connections at failover time. PRECONNECT: Pre-establishes connections. If this value is used, the connection to the backup instance is made at the same time as the connection to the primary instance
RETRIES	Specifies the number of attempts to connect to the BACKUP node after a failure before giving up
DELAY	Specifies the amount of time in seconds to wait between attempts to connect to the BACKUP node after a failure before giving up

Another important parameter, or value, that should not be configured manually is the GLOBAL_DBNAME parameter in the SID_LIST_listener_name section of the listener.ora. Configuring this parameter in listener.ora disables TAF. If the GLOBAL_DBNAME parameter has been defined, the parameter should be deleted and the database should be allowed to dynamically register the global database names automatically.

TAF implementation

The TAF option using the tnsnames.ora file can be implemented in one of three ways:

Connect-time failover and client load balancing
Retrying a connection
Pre-establishing a connection

The various implementation options are explained through examples.

Connect-time failover and client load balancing

Oracle Net connects randomly to one of the listener addresses on node ORA-DB1 or ORA-DB2. If the instance fails after the connection, Oracle Net fails over to the other node's instance, preserving any SELECT statements in progress.

The connect-time failover example listed below is the basic method of tnsnames-based failover implementation. When the user session tries to connect to the instance on the first node (ORA-DB1) and determines that the instance is not responding or is not currently available, the session immediately tries the next host name defined in the list (namely ORA-DB2) to establish a connection. This failover from one instance to another applies both to connections that are being made for the very first time and to connection retries that occur when an instance crashes while a transaction is in progress.

PRODDB.SUMMERSKYUS.COM=
  (DESCRIPTION=
    (ADDRESS_LIST=
      (FAILOVER=ON)
      (LOAD_BALANCE=ON)
      (ADDRESS=(PROTOCOL=TCP)
        (HOST=ORA-DB2.SUMMERSKYUS.COM)
        (PORT=1521))
      (ADDRESS=(PROTOCOL=TCP)
        (HOST=ORA-DB1.SUMMERSKYUS.COM)
        (PORT=1521))
    )
    (CONNECT_DATA=
      (SERVICE_NAME=PRODDB.SUMMERSKYUS.COM)
      (ORACLE_HOME=/apps/oracle/product/9.2.0)
      (FAILOVER_MODE=(TYPE=SELECT)
        (METHOD=BASIC))
    )
  )

Note

If the FAILOVER and LOAD_BALANCE parameters are not placed together below the ADDRESS_LIST argument, certain sessions could encounter an ORA-3113 (end-of-file on communication channel) error during a failover.

Retrying a connection

With the RETRIES and DELAY parameters specified as part of the FAILOVER_MODE parameter, connections to the instances are automatically retried the number of times specified. In the example that follows, the connection is retried 20 times with a delay of 15 seconds between retries. Unlike the other option, where one node in the cluster fails and the connection is re-established on one of the other surviving nodes, under this option the connection is retried on the same node and no backup node is defined as part of the configuration. Similarly, there is no significance to the load-balancing parameter, which has been set to OFF.

These additional parameters are extremely useful when there are thousands of users connected to the instance of the failed node and all these users have to establish connections to the other available nodes. In the case of a dedicated connection, there is only a single thread to establish connections, and simultaneous connection requests from a large number of users could cause connection timeouts. The retry and delay parameters help to retry the connection, with a delay between retries to establish connections to the available node. This is less of an issue when the shared server is configured and used to establish connections to the database, because users are placed in a queue and when a connection becomes available the user establishes connection.

PRODDB.SUMMERSKYUS.COM=
  (DESCRIPTION=
    (ADDRESS_LIST=
      (FAILOVER=ON)
      (LOAD_BALANCE=OFF)
      (ADDRESS=(PROTOCOL=TCP)
        (HOST=ORA-DB2.SUMMERSKYUS.COM)
        (PORT=1521))
      (ADDRESS=(PROTOCOL=TCP)
        (HOST=ORA-DB1.SUMMERSKYUS.COM)
        (PORT=1521))
    )
    (CONNECT_DATA=
      (SERVICE_NAME=PRODDB.SUMMERSKYUS.COM)
      (ORACLE_HOME=/apps/oracle/product/9.2.0)
      (FAILOVER_MODE=(TYPE=SELECT)
        (METHOD=BASIC)
        (RETRIES=20)
        (DELAY=15))
    )
  )
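
The effect of RETRIES and DELAY can be pictured with a small retry loop. Oracle Net performs the retries itself when these parameters are set, so the sketch below is only a model of the behavior, not something an application needs to write. The class name, the connectWithRetry method, and the choice to count one initial attempt plus up to the given number of retries are all assumptions made for illustration.

```java
import java.util.concurrent.Callable;

// Hypothetical model of RETRIES/DELAY behavior; Oracle Net does this internally.
class ConnectRetry {

    /**
     * Runs the connection attempt, retrying up to 'retries' more times after the
     * first failure and sleeping 'delayMillis' between attempts (assumed counting).
     */
    static <T> T connectWithRetry(Callable<T> attempt, int retries, long delayMillis)
            throws Exception {
        if (retries < 0) {
            throw new IllegalArgumentException("retries must not be negative");
        }
        Exception last = null;
        for (int i = 0; i <= retries; i++) {
            try {
                return attempt.call();
            } catch (Exception e) {
                last = e;
                if (i < retries) {
                    Thread.sleep(delayMillis); // wait before the next attempt
                }
            }
        }
        throw last; // every attempt failed
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated connection that succeeds on the third attempt.
        String result = connectWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("instance not available");
            }
            return "connected";
        }, 20, 100L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Spacing the attempts out in this way is what keeps a large number of simultaneously reconnecting users from overwhelming the listener on the surviving node.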

Pre-establishing a connection

Another implementation option available under the TAF configuration is to set up a pre-established connection to the backup or secondary instance. One potential performance issue with failover is the time required to re-establish a connection after the primary instance has failed, which depends on the time taken to establish a connection to the backup or secondary instance. This can be resolved by pre-establishing connections, which means that the initial and backup connections are explicitly specified. While there is a great advantage in pre-establishing the connection, it is not without drawbacks: pre-established connections consume resources. During some controlled failover testing, additional resource usage was noticed when using pre-established connections. This is because the process always validates the connection throughout its activity.

In the following example, the Oracle Net connects to the listener on ORA-DB1 and simultaneously also makes a connection to the other instance on ORA-DB2. While the process has to make two connections at the beginning of a transaction, the time required to establish a connection during the failover is reduced.

If ORA-DB1 fails after the connection, Oracle Net fails over to ORA-DB2, preserving any SELECT statements in progress. Because the backup connection is already in place, the time needed for a failover is reduced.

Apart from the additional resource consumption, another drawback in using the preconnection method is that if the connection to the backup instance does not succeed during failover connect time, fail back to the original instance is not possible.

Pre-establishing a connection implies that the backup node is predefined or hard coded. This reduces the scope of availability, as the connection to the original nodes/instances is not dynamic like in the other methods.

PRODDB=
  (DESCRIPTION=
    (ADDRESS=
      (PROTOCOL=TCP)
      (HOST=ORA-DB1)
      (PORT=1521))
    (CONNECT_DATA=
      (SERVICE_NAME=PRODDB.SUMMERSKYUS.COM)
      (INSTANCE_NAME=ORA-DB1RAC1)
      (FAILOVER_MODE=
        (BACKUP=ORA-DB2RAC2.SUMMERSKYUS.COM)
        (TYPE=SELECT)
        (METHOD=PRECONNECT))))

ORA-DB2=
  (DESCRIPTION=
    (ADDRESS=
      (PROTOCOL=TCP)
      (HOST=ORA-DB2)
      (PORT=1521))
    (CONNECT_DATA=
      (SERVICE_NAME=PRODDB.SUMMERSKYUS.COM)
      (INSTANCE_NAME=ORA-DB2RAC2)
      (FAILOVER_MODE=
        (BACKUP=ORA-DB1RAC1.SUMMERSKYUS.COM)
        (TYPE=SELECT)
        (METHOD=PRECONNECT))))

OCI API requests

Under this method, implementing TAF involves using Oracle-provided APIs to accomplish what is normally performed through the tnsnames.ora file. However, under the OCI-based method, the application servers have a better control of what these APIs accomplish and provide appropriate actions based on the results from these calls.

When the OCI definitions are used, TAF is always active and there are no preset requirements of setting the mode to failover.

OCI-based TAF configuration is made possible by using the various failover type events provided through APIs. The failover events shown in Table 10.2 are part of the OracleOCIFailover interface.

Table 10.2: Failover Events
Failover Event	Description
FO_SESSION	The user session is reauthenticated on the server-side while open cursors in the OCI application need to be re-executed. This call is equivalent to FAILOVER_MODE= (TYPE= SESSION) defined in the tnsnames.ora file
FO_SELECT	The user session is reauthenticated on the server side; however, open cursors in the OCI can continue fetching. This implies that the client-side logic maintains the fetch state of each open cursor. This call is equivalent to FAILOVER_MODE= (TYPE= SELECT) defined in the tnsnames.ora file
FO_NONE	This is the default mode and implies no failover functionality is used. This call is equivalent to FAILOVER_MODE= (TYPE= NONE) defined in the tnsnames.ora file
FO_BEGIN	Indicates that failover has detected a lost connection and failover is starting
FO_END	Indicates successful completion of failover
FO_ABORT	Indicates that failover was unsuccessful and there is no option of retrying
FO_REAUTH	Indicates that a user handle has been reauthenticated
FO_ERROR	Indicates that failover was temporarily unsuccessful. This gives the application the opportunity to handle the error and retry failover. In the case of an error while failing over to a new connection, the JDBC application is able to retry failover. Typically the application sleeps for a while and then retries, either indefinitely or for a limited amount of time, by having the callback return FO_RETRY. The retry functionality is accomplished by using the FAILOVER_MODE= (RETRIES= <>, DELAY= <>) defined in the tnsnames.ora file
FO_EVENT_UNKNOWN	Indicates a bad failover event

TAF callbacks

TAF callbacks are used to track and trace failures. They are called during the failover to notify the JDBC application regarding events that are generated. In this case, unlike the tnsnames-based TAF configuration, the application has some control over the failover operation.

Because a failure can also occur while the failover process is establishing the new connection, the callback function is invoked several times during the course of re-establishing the user's session.

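The first call to the callback function occurs when Oracle first detects an instance connection loss; this allows the application to inform the user of an upcoming delay. If failover succeeds, the callback is called again once the connection has been re-established and is usable. At this time the client may wish to replay ALTER SESSION commands and inform the user that failover has happened. If failover is unsuccessful, then the callback is called to inform the application that failover will not take place.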

A detailed example will clearly demonstrate the advantages of utilizing the interface provided by Oracle, OracleOCIFailover:

public interface OracleOCIFailover {

    // Possible failover types
    public static final int FO_SESSION = 1;
    public static final int FO_SELECT = 2;
    public static final int FO_NONE = 3;
    public static final int FO_TYPE_UNKNOWN = 4;

    // Possible failover events registered with callback
    public static final int FO_BEGIN = 1;
    public static final int FO_END = 2;
    public static final int FO_ABORT = 3;
    public static final int FO_REAUTH = 4;
    public static final int FO_ERROR = 5;
    public static final int FO_RETRY = 6;
    public static final int FO_EVENT_UNKNOWN = 7;

    public int callbackFn(Connection conn,
                          Object ctxt, // Anything the user wants to save
                          int type,    // One of the above possible failover types
                          int event);  // One of the above possible failover events
}
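Because the event reaches the callback as a plain integer, it can help to translate it into a readable name in a way that tolerates unexpected values. The following is a minimal sketch under that assumption; `FailoverEventNames` and `describe` are hypothetical names, not part of the Oracle driver, and the numeric values are copied from the interface listing above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; not part of the Oracle JDBC driver. */
final class FailoverEventNames {
    // Event codes copied from the OracleOCIFailover listing above.
    private static final Map<Integer, String> EVENTS = new HashMap<>();
    static {
        EVENTS.put(1, "FO_BEGIN");
        EVENTS.put(2, "FO_END");
        EVENTS.put(3, "FO_ABORT");
        EVENTS.put(4, "FO_REAUTH");
        EVENTS.put(5, "FO_ERROR");
        EVENTS.put(6, "FO_RETRY");
        EVENTS.put(7, "FO_EVENT_UNKNOWN");
    }

    /** Returns the event name, or a placeholder for codes outside the table. */
    static String describe(int event) {
        return EVENTS.getOrDefault(event, "UNRECOGNIZED(" + event + ")");
    }
}
```

A map lookup with a default avoids the `ArrayIndexOutOfBoundsException` that indexing a fixed array with `event - 1` would raise if the driver ever passed a value outside the range 1 to 7.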

In the case of a failure of one of the instances, Oracle tries to restore the connections of the failed instance onto the active instance. This causes a possible delay, in which case the users are to be notified, as a business rule.

package rac.chapter10.taf;

// Java imports
import java.sql.Connection;
import java.sql.Statement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.DriverManager;

// Oracle imports
import oracle.jdbc.OracleConnection;

// log4j imports
import org.apache.log4j.Category;

public class TAFDetailsExample {

    /* Connection object to handle the database connection */
    private Connection con = null;

    /**
     * TAFCallbackFn class implements the interface provided
     * by Oracle in case of failover
     */
    private TAFCallbackFn clbFn = null;

    /* Failover string */
    private String strFailover = null;

    /* Statement object to execute query */
    private Statement stmt = null;

    /* Result set object to hold the results of the query */
    private ResultSet rs = null;

    /* Used for getting an instance of log4j for this class. */
    protected static Category cat =
        Category.getInstance(TAFDetailsExample.class.getName());

    /**
     * Constructor.
     */
    public TAFDetailsExample() {
    }

    public static void main(String[] args) {
        TAFDetailsExample tafDE = new TAFDetailsExample();
        // This method handles database connections.
        tafDE.handleDBConnections();
        try {
            // This method is used to execute query
            tafDE.runQuery();
            cat.debug(tafDE.toString());
            // This method is used to free the resources allocated
            tafDE.closeConnections();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }

    /**
     * This method is used to clear all the resources allocated.
     * @throws SQLException
     */
    void closeConnections() throws SQLException {
        rs.close();
        stmt.close();
        con.close();
        cat.debug("Allocated Resources are free now.");
    }

    /**
     * This method is used to handle the database connection
     * with specific connection strings.
     */
    void handleDBConnections() {
        try {
            // Register the Oracle driver
            DriverManager.registerDriver(
                new oracle.jdbc.driver.OracleDriver());
            // Create a Connection to the database with specific connection string
            con = (OracleConnection) DriverManager.getConnection(
                "jdbc:oracle:oci:@PRODDBTRANS", "user", "pwd");
            if (con != null)
                this.RegisterFailOver(); // register failover
        } catch (SQLException e) {
            cat.debug("Error Occurred while registering Failover.");
            e.printStackTrace();
        }
    }

    /**
     * This function registers the class that implements the
     * OracleOCIFailover interface. This is done to notify Oracle
     * that in case of a failure the callback function in the
     * registered class is to be invoked.
     */
    void RegisterFailOver() throws SQLException {
        clbFn = new TAFCallbackFn();
        strFailover = new String("Register Failover");
        // Registers the callback function.
        ((OracleConnection) con).registerTAFCallback(clbFn, strFailover);
        cat.debug(" Failover Registered Successfully. ");
    }

    /**
     * This method is used to execute query.
     */
    void runQuery() {
        try {
            stmt = con.createStatement();
        } catch (SQLException e) {
            e.printStackTrace();
        }
        long startTime = 0;
        long endTime = 0;
        /**
         * This loop is used for testing purposes only.
         */
        for (int i = 0; i < 20000; i++) {
            startTime = System.currentTimeMillis();
            try {
                /**
                 * This query is just used for testing purposes.
                 * In a real time scenario the query is dynamically
                 * passed and can be any valid PL/SQL statement.
                 */
                String query = "SELECT USP.USPRL_ID, " +
                    "USP.USPRL_FIRST_NAME, " +
                    "USP.USPRL_LAST_NAME, " +
                    "USP.USPRL_CITY, " +
                    "USP.USPRL_STATE_ID, " +
                    "COMP.COMP_NAME, " +
                    "COMP.COMP_TYPE_CD, " +
                    "USP.USPRL_EMAIL, " +
                    "USP.USPRL_PHONE, " +
                    "US.USEC_TOTAL_LOGINS, " +
                    "USP.USPRL_ROLE_CD, " +
                    "COMP.COMP_SCAC_CODE, " +
                    "USP.USPRL_LOGIN_NAME, " +
                    "UL.USRLI_ID " +
                    "FROM USER_PROFILE USP, " +
                    "COMPANY COMP, " +
                    "USER_LOGIN UL, " +
                    "USER_SECURITY US " +
                    "WHERE UL.USRLI_ACTIVE_STATUS_CD = 'ACTIVE' AND " +
                    "UL.USRLI_LOGGED_IN_EUSR_ID = USP.USPRL_ID " +
                    "AND USP.USPRL_COMP_ID = COMP.COMP_ID AND " +
                    "USP.USPRL_ID = US.USEC_USPRL_ID ORDER BY " +
                    "COMP.COMP_COMP_TYPE_CD, " +
                    "COMP.COMP_NAME, USP.USPRL_LAST_NAME";
                rs = stmt.executeQuery(query);
            } catch (SQLException e) {
                /**
                 * The limitations for failover prevent INSERT, DELETE, UPDATE
                 * and transactional statements from failing over.
                 * The possible errors that Oracle could throw in such a
                 * situation can be handled and we can get a new connection
                 * and execute the statements keeping the failure
                 * transparent to the user.
                 * The possible errors for handling are:
                 */
                if ((e.getErrorCode() == 1012) ||   // ORA-01012: not logged on to Oracle
                    (e.getErrorCode() == 1033) ||   // ORA-01033: Oracle initialization
                                                    // or shutdown in progress
                    (e.getErrorCode() == 1034) ||   // ORA-01034: Oracle not available
                    (e.getErrorCode() == 1089) ||   // ORA-01089: immediate shutdown in
                                                    // progress, no operations are permitted
                    (e.getErrorCode() == 3113) ||   // ORA-03113: end-of-file on
                                                    // communication channel
                    (e.getErrorCode() == 3114) ||   // ORA-03114: not connected to Oracle
                    (e.getErrorCode() == 12203) ||  // ORA-12203: TNS:unable to connect
                                                    // to destination
                    (e.getErrorCode() == 12500) ||  // ORA-12500: TNS:listener failed to
                                                    // start a dedicated server process
                    (e.getErrorCode() == 12571))    // ORA-12571: TNS:packet writer failure
                {
                    cat.debug("Node failed while executing " +
                        "INSERT/DELETE/UPDATE/TRANSACTIONAL Statements");
                    // Get another connection
                    handleDBConnections();
                    // Re-execute the query.
                    runQuery();
                } else {
                    // The failure is not due to a node failure.
                    e.printStackTrace();
                }
            }
            endTime = System.currentTimeMillis();
            if (cat.isDebugEnabled())
                cat.debug("Execution Time for the query is " +
                    (endTime - startTime) + " ms.");
        }
    }

    public String toString() {
        StringBuffer sb = new StringBuffer();
        try {
            sb.append("\nResultset values are " + "\n" + rs.getType());
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return sb.toString();
    }
}

package rac.chapter10.taf;

// Java imports
import java.sql.Connection;
import java.sql.SQLException;

// log4j imports
import org.apache.log4j.Category;

// Oracle imports
import oracle.jdbc.OracleOCIFailover;

public class TAFCallbackFn implements OracleOCIFailover {

    private static Category cat =
        Category.getInstance(TAFCallbackFn.class.getName());

    public TAFCallbackFn() {
    }

    /**
     * This callback function will be invoked on failure of a
     * node or lost connections
     * @param connection - The failed connection which will be restored
     * @param o - Used to hold the user context object
     * @param type - failover type
     * @param event - failover event
     * @return - In case of an error return FO_RETRY else
     * return 0
     */
    public int callbackFn(Connection connection, Object o, int type, int event) {
        String foType[] = { "SESSION", "SELECT", "NONE" };
        String foEvent[] = { "BEGIN", "END", "ABORT", "REAUTHORISE",
                             "ERROR", "RETRY", "UNKNOWN" };
        try {
            cat.debug("The connection for which the failover occurred is :" +
                connection.getMetaData().toString());
        } catch (SQLException e) {
            e.printStackTrace();
        }
        cat.debug("FAILOVER TYPE is : " + foType[type - 1]);
        cat.debug("FAILOVER EVENT is : " + foEvent[event - 1]);
        switch (event) {
            case FO_BEGIN:
                cat.info("Failover event is begin ");
                break;
            case FO_END:
                cat.info("Failover event is end");
                return 0;
            case FO_ABORT:
                cat.info("Failover is aborted");
                break;
            case FO_REAUTH:
                cat.info("Failover needs reauthorization");
                break;
            case FO_ERROR:
                cat.info("Error occurred while failing over. " +
                    "Retrying to restore connection.");
                try {
                    Thread.sleep(2000); // sleep for 2 seconds
                } catch (InterruptedException e) {
                    // Trap errors
                    cat.error("Error while causing the currently executing thread to sleep");
                }
                return FO_RETRY;
            default:
                cat.info("Default is returned");
        }
        cat.debug("Before returning from the Callback Function.");
        return 0;
    }
}

In the example above, the Oracle JDBC driver is registered and the connection is obtained from the DriverManager.

DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver());
con = (OracleConnection) DriverManager.getConnection(
    "jdbc:oracle:oci:@PRODDB", "user", "pwd");

Here PRODDB represents an entry in the tnsnames.ora file, which holds the connect descriptors. In this entry, failover has been enabled (FAILOVER=ON) and the failover type specifies that all SELECT queries will be failed over (TYPE=SELECT):

PRODDB.SUMMERSKYUS.COM=
  (DESCRIPTION=
    (ADDRESS_LIST=
      (FAILOVER=ON)
      (LOAD_BALANCE=ON)
      (ADDRESS=(PROTOCOL=TCP)(HOST=ORA-DB2.SUMMERSKYUS.COM)(PORT=1521))
      (ADDRESS=(PROTOCOL=TCP)(HOST=ORA-DB1.SUMMERSKYUS.COM)(PORT=1521))
    )
    (CONNECT_DATA=
      (SERVICE_NAME=PRODDB.SUMMERSKYUS.COM)
      (ORACLE_HOME = /apps/oracle/product/9.2.0)
      (FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)(RETRIES=20)(DELAY=15))
    )
  )

After a connection is established, the class that is implementing the Oracle interface should be registered with Oracle:

clbFn = new TAFCallbackFn();
strFailover = new String("Register Failover");
((OracleConnection) con).registerTAFCallback(clbFn, strFailover);

This notifies Oracle that in the case of a failure, the callback function implemented in the class TAFCallbackFn is to be called. Oracle also passes the failover type (SESSION, SELECT, or NONE) and the current failover event to the callback.

package rac.chapter10.taf;

// Java imports
import java.sql.Connection;
import java.sql.Statement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.DriverManager;

// Oracle imports
import oracle.jdbc.OracleConnection;

// log4j imports
import org.apache.log4j.Category;

When the failover starts, Oracle sends the FO_BEGIN event, notifying the application that the failover has begun, and behind the scenes tries to restore the connection. As explained earlier, if the failover type is SELECT, the query is re-executed and the cursor is positioned at the row where the failure occurred. Additionally, the session on the initial instance may have received session-specific commands (ALTER SESSION), which need to be re-executed before the user session can continue on the new instance. As discussed in the earlier section, session-specific commands are not automatically replayed on the failed-over instance. The callback is also called each time a user handle besides the primary handle is reauthenticated on the new connection. Since each user handle represents a server-side session, the client program would need to replay the ALTER SESSION commands for that session.
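One way to make the replay of session-specific commands manageable is to record each ALTER SESSION statement as it is issued, so that the same list can be re-issued once failover has ended. The sketch below illustrates the idea only; `SessionSettingsLog` is a hypothetical class, and the `Consumer<String>` stands in for whatever code executes a statement on the current connection (for example, a wrapper around `Statement.execute`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: remembers session-level statements for later replay. */
final class SessionSettingsLog {
    private final List<String> statements = new ArrayList<>();

    /** Runs a session-level statement now and remembers it. */
    void apply(String sql, Consumer<String> executor) {
        executor.accept(sql);
        statements.add(sql);
    }

    /** Re-issues every remembered statement in its original order. */
    int replay(Consumer<String> executor) {
        for (String sql : statements) {
            executor.accept(sql);
        }
        return statements.size();
    }
}
```

In this arrangement the application would call `replay` from its handling of the FO_END event, passing an executor bound to the restored connection.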

All the above-mentioned limitations for failover need to be handled so that the failure is transparent to the user. The possible errors that can be thrown by Oracle in such a case are handled and the connection is re-established and the query is re-executed.

if ((e.getErrorCode() == 1012) ||   // ORA-01012: not logged on to Oracle
    (e.getErrorCode() == 1033) ||   // ORA-01033: Oracle initialization
                                    // or shutdown in progress
    (e.getErrorCode() == 1034) ||   // ORA-01034: Oracle not available
    (e.getErrorCode() == 1089) ||   // ORA-01089: immediate shutdown in
                                    // progress, no operations are permitted
    (e.getErrorCode() == 3113) ||   // ORA-03113: end-of-file on communication channel
    (e.getErrorCode() == 3114) ||   // ORA-03114: not connected to Oracle
    (e.getErrorCode() == 12203) ||  // ORA-12203: TNS:unable to connect
                                    // to destination
    (e.getErrorCode() == 12500) ||  // ORA-12500: TNS:listener failed to
                                    // start a dedicated server process
    (e.getErrorCode() == 12571))    // ORA-12571: TNS:packet writer failure
{
    cat.debug("Node failed while executing" + " TRANSACTIONAL Statements");
    // Get another connection
    handleDBConnections();
    // Re-execute the query.
    runQuery();
}
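The chain of comparisons can also be expressed as a set lookup, which keeps the list of codes in one place. This is a small sketch rather than the author's implementation; `ConnectionLossErrors` is a hypothetical name, and the codes are the ones listed above. Note that Java reads an integer literal with a leading zero as octal, so the codes are written without leading zeros.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper grouping the ORA error codes handled above. */
final class ConnectionLossErrors {
    // Decimal literals only: 03113 would be parsed as octal (decimal 1611).
    private static final Set<Integer> CODES = Collections.unmodifiableSet(
        new HashSet<>(Arrays.asList(
            1012, 1033, 1034, 1089, 3113, 3114, 12203, 12500, 12571)));

    /** True if the vendor error code indicates a lost or unavailable instance. */
    static boolean isConnectionLoss(int oraErrorCode) {
        return CODES.contains(oraErrorCode);
    }
}
```

The catch block would then test `ConnectionLossErrors.isConnectionLoss(e.getErrorCode())` before reconnecting and re-executing the statement.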

When the connection is established and the failover has ended, the FO_END event is sent back to the application saying that the failover has been completed.

Transitions may not always be smooth; if an error is encountered while restoring a connection to the failed-over instance, the FO_ERROR event is sent to the application, indicating the error and requesting the application to handle it appropriately. Under these circumstances, the application could provide retry functionality, where the application sleeps for a predefined interval of time and then returns FO_RETRY. If a similar error occurs during a subsequent attempt, the application retries again until the number of retry attempts specified by the RETRIES property in the tnsnames.ora file has been reached. The sleep or rest time is defined by the DELAY property, also defined in the tnsnames.ora file.

case FO_ERROR:
    cat.info("Error Occurred while failing over. Retrying to restore connection.");
    try {
        Thread.sleep(2000);
    } catch (InterruptedException e) {
        // Trap errors
        cat.error("Error while causing the currently executing thread to sleep");
    }
    return FO_RETRY;
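The callback above returns FO_RETRY every time it sees FO_ERROR. An application that wants its own upper bound on retries, independent of the tnsnames.ora settings, could count attempts itself. The following is a sketch under that assumption; `BoundedRetryPolicy` is a hypothetical class, and the constant values are copied from the OracleOCIFailover listing earlier in this section.

```java
/** Hypothetical application-side cap on failover retries. */
final class BoundedRetryPolicy {
    // Values copied from the OracleOCIFailover listing in this section.
    static final int FO_END = 2;
    static final int FO_ERROR = 5;
    static final int FO_RETRY = 6;

    private final int maxRetries;
    private final long delayMillis;
    private int attempts = 0;

    BoundedRetryPolicy(int maxRetries, long delayMillis) {
        this.maxRetries = maxRetries;
        this.delayMillis = delayMillis;
    }

    /** Returns FO_RETRY while attempts remain after an FO_ERROR, otherwise 0. */
    int onEvent(int event) {
        if (event == FO_END) {
            attempts = 0; // failover completed; start fresh next time
            return 0;
        }
        if (event != FO_ERROR || attempts >= maxRetries) {
            return 0;
        }
        attempts++;
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and stop retrying
            return 0;
        }
        return FO_RETRY;
    }
}
```

With the values shown in the tnsnames.ora entry above (RETRIES=20, DELAY=15), the configured retry window is at most 20 x 15 = 300 seconds; a policy object like this one simply lets the callback give up earlier if the application prefers.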

The extract from a debug log below shows a scenario in which a failure occurred: the query, which had been executing on the primary instance in 47 ms, took about 3026 ms when the session failed over (this includes the time for the session to fail over, establish a new connection, and re-execute the query). The entire failover operation is transparent to the user.

20:55:54,041 runQuery TAFDetailsExample.java 91 DEBUG: Execution Time for the query is 47ms.
20:55:54,041 runQuery TAFDetailsExample.java 91 DEBUG: Execution Time for the query is 47ms.
20:55:54,041 runQuery TAFDetailsExample.java 91 DEBUG: Execution Time for the query is 32ms.
20:55:54,056 callbackFn TAFCallbackFn.java 45 DEBUG: The connection for which the failover occurred is: oracle.jdbc.driver.OracleDatabaseMetaData@6b13c7
20:55:54,056 callbackFn TAFCallbackFn.java 49 DEBUG: FAILOVER TYPE is : SELECT
20:55:54,056 callbackFn TAFCallbackFn.java 50 DEBUG: FAILOVER EVENT is : BEGIN
20:55:54,072 callbackFn TAFCallbackFn.java 57 INFO: Failover event is begin
20:55:54,072 callbackFn TAFCallbackFn.java 89 DEBUG: Before returning from the callBack Function.
20:55:56,121 callbackFn TAFCallbackFn.java 49 DEBUG: FAILOVER TYPE is : SELECT
20:55:56,121 callbackFn TAFCallbackFn.java 50 DEBUG: FAILOVER EVENT is : END
20:55:56,137 callbackFn TAFCallbackFn.java 61 INFO: Failover event is end
20:55:56,138 callbackFn TAFCallbackFn.java 89 DEBUG: Before returning from the callBack Function.
20:55:57,018 runQuery TAFDetailsExample.java 91 DEBUG: Execution Time for the query is 3026ms.
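The timestamps in the extract make it possible to separate the failover itself from the re-execution of the query: the BEGIN event is logged at 20:55:54,056 and the END event at 20:55:56,121. The sketch below computes that interval; `FailoverTiming` is a hypothetical helper, and the `HH:mm:ss,SSS` pattern is an assumption based on the timestamp layout visible in the log.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for measuring intervals between log entries. */
final class FailoverTiming {
    // Assumed timestamp layout, matching entries such as 20:55:54,056.
    private static final DateTimeFormatter LOG_TIME =
        DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    /** Milliseconds elapsed between two log timestamps on the same day. */
    static long millisBetween(String begin, String end) {
        LocalTime b = LocalTime.parse(begin, LOG_TIME);
        LocalTime e = LocalTime.parse(end, LOG_TIME);
        return Duration.between(b, e).toMillis();
    }
}
```

For the two timestamps above the result is 2065 ms, which suggests that roughly two of the three seconds reported for the slow execution were spent re-establishing the session.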

TAF verification

The TAF implementation can be verified by querying the Oracle-provided data dictionary views. V$SESSION has three columns, FAILOVER_TYPE, FAILOVER_METHOD, and FAILED_OVER, that provide information pertaining to the TAF implementation and allow the results to be verified when a node in the cluster crashes and its sessions fail over to one of the available nodes.

SELECT SID, USERNAME, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER
FROM V$SESSION
/

 SID  USERNAME    FAILOVER_TYPE  FAILOVER_M  FAILED_OVER
----  ----------  -------------  ----------  -----------
 316  OLTP_USER   SELECT         BASIC       YES
 317  OLTP_USER   SELECT         BASIC       NO
 320  OLTP_USER   SELECT         BASIC       NO
 326  OLTP_USER   SELECT         BASIC       NO
1257  MVALLATH    NONE           NONE        NO
 328  OLTP_USER   SELECT         BASIC       YES
 330  OLTP_USER   SELECT         BASIC       YES
 332  OLTP_USER   SELECT         BASIC       YES
 337  OLTP_USER   SELECT         BASIC       NO
 338  OLTP_USER   SELECT         BASIC       NO
 341  OLTP_USER   SELECT         BASIC       YES

The above query provides the details and status of the failover operation. The output indicates that five user sessions have failed over (FAILED_OVER = YES) from the instance that crashed. The user MVALLATH has a connection to the database, but has not been set up to use the failover option and therefore has the default FAILOVER_TYPE of NONE. On systems with many sessions it is better to look at the details by grouping the results. The following query gives a consolidated count of the operation:

SELECT MACHINE,
       FAILOVER_TYPE,
       FAILOVER_METHOD,
       FAILED_OVER,
       COUNT (*)
FROM V$SESSION
GROUP BY MACHINE,
         FAILOVER_TYPE,
         FAILOVER_METHOD,
         FAILED_OVER
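To make the effect of the grouping concrete, the sketch below applies the same consolidation in Java to the eleven rows shown in the earlier output. It is an illustration only: `FailoverSummary` and `SessionRow` are hypothetical names, and the MACHINE column is left out of the grouping key because it does not appear in the sample output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical in-memory version of the GROUP BY query above. */
final class FailoverSummary {
    static final class SessionRow {
        final int sid;
        final String failoverType;
        final String failoverMethod;
        final String failedOver;

        SessionRow(int sid, String type, String method, String failedOver) {
            this.sid = sid;
            this.failoverType = type;
            this.failoverMethod = method;
            this.failedOver = failedOver;
        }
    }

    /** Counts sessions per TYPE/METHOD/FAILED_OVER combination. */
    static Map<String, Long> countByStatus(List<SessionRow> rows) {
        return rows.stream().collect(Collectors.groupingBy(
            r -> r.failoverType + "/" + r.failoverMethod + "/" + r.failedOver,
            TreeMap::new,
            Collectors.counting()));
    }

    /** The eleven rows from the sample V$SESSION output in this section. */
    static List<SessionRow> sampleRows() {
        List<SessionRow> rows = new ArrayList<>();
        for (int sid : new int[] {316, 328, 330, 332, 341}) {
            rows.add(new SessionRow(sid, "SELECT", "BASIC", "YES"));
        }
        for (int sid : new int[] {317, 320, 326, 337, 338}) {
            rows.add(new SessionRow(sid, "SELECT", "BASIC", "NO"));
        }
        rows.add(new SessionRow(1257, "NONE", "NONE", "NO"));
        return rows;
    }
}
```

Applied to the sample rows, the summary contains three groups: five SELECT/BASIC sessions that failed over, five that did not, and one session with no failover configured.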

When configuring TAF with TYPE=SELECT, the following points should be kept in mind:

The ordering of rows retrieved by a SELECT statement is not fixed; for this reason, queries that might be replayed should contain an ORDER BY clause. However, even without an ORDER BY clause, rows returned by the reissued query are nearly always returned in the initial order; known exceptions are queries that execute using the HASH JOIN or PARALLEL query features. If an ORDER BY clause is not used, OCI will check if the set of discarded rows matches those previously retrieved to ensure that the application does not generate incorrect results.
Recovery time after a failover can be longer when using TYPE=SELECT. For example, if a query that retrieves 100,000 rows is interrupted by a failure after 99,989 rows have been fetched, then the client application will not be available for new work after a failover until 99,989 rows have first been refetched, discarded, and the last 11 rows of the query have been retrieved.
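The recovery cost described above can be pictured as a cursor replay: the query is reissued, the rows the client had already consumed are fetched again and thrown away, and only then do new rows reach the application. The following sketch simulates this with an ordinary iterator; it is not how OCI is implemented, and `SelectReplaySketch` is a hypothetical name.

```java
import java.util.Iterator;

/** Hypothetical simulation of refetch-and-discard after a SELECT failover. */
final class SelectReplaySketch {

    /** Skips rows already delivered before the failure; returns how many were discarded. */
    static <T> long skipAlreadyFetched(Iterator<T> reissuedQuery, long alreadyFetched) {
        long discarded = 0;
        while (discarded < alreadyFetched && reissuedQuery.hasNext()) {
            reissuedQuery.next();
            discarded++;
        }
        return discarded;
    }

    /** Counts the rows still to be delivered to the application. */
    static <T> long countRemaining(Iterator<T> reissuedQuery) {
        long remaining = 0;
        while (reissuedQuery.hasNext()) {
            reissuedQuery.next();
            remaining++;
        }
        return remaining;
    }
}
```

For the 100,000-row example, 99,989 rows are discarded before the final 11 are delivered, which is why the client appears busy for almost the full duration of the original query.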

Other benefits of TAF

The main function of TAF is to fail over users' sessions from the failed instance to another active instance. However, the same functionality can be put to use in other scenarios where TAF improves system availability. Some of these are:

Transactional shutdown
Quiescing the database

Shutdown Transactional

During maintenance windows, when an instance needs to be freed from user or client activity (for example, when applying a database patch to an instance without interrupting service to the clients), TAF can come in handy. By shutting down selected instances rather than the entire database, users can be migrated from one instance to another. This is done with the TRANSACTIONAL clause of the SHUTDOWN statement, which removes a node from service while deferring the shutdown until all existing transactions are completed. Newly submitted transactions are routed to an alternate node.

For example, the output below indicates a SHUTDOWN TRANSACTIONAL operation:

SQL> SHUTDOWN TRANSACTIONAL
Database closed.
Database dismounted
ORACLE instance shut down
SQL>

Quiescing the database

Certain database administrative activities require isolation from concurrent user transactions or queries. The quiesce database feature can be used for this purpose. Quiescing the database avoids having to shut down the database and reopen it in restricted mode to perform these administrative tasks.

Quiescing of the database is accomplished by issuing the following command:

ALTER SYSTEM QUIESCE RESTRICTED;

The QUIESCE RESTRICTED clause allows administrative tasks to be performed in isolation from concurrent user transactions or queries. In a RAC implementation, this affects all instances, not just the one that issues the statement.

Note

In a RAC implementation, quiescing of the database is allowed only if the database resource manager feature was activated at instance startup. The resource manager should be started on all instances. It is through the facilities of the database resource manager that non-DBA sessions are prevented from becoming active. It should be noted that while the quiesce statement is in effect, any changes to the current resource plan are queued until after the system is unquiesced.

After completion of DBA activities, the database can be unquiesced by issuing the following statement:

SQL> ALTER SYSTEM UNQUIESCE;

Once the database has been unquiesced the non-DBA activities are allowed to proceed.
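When the quiesce and unquiesce statements are issued from a program rather than from SQL*Plus, it is worth making sure that the database is unquiesced even if the maintenance task fails. The sketch below shows one way to structure this; `QuiesceWindow` is a hypothetical helper, and the `Consumer<String>` is a stand-in for code that executes a statement over a suitably privileged connection.

```java
import java.util.function.Consumer;

/** Hypothetical wrapper that pairs QUIESCE with a guaranteed UNQUIESCE. */
final class QuiesceWindow {

    /**
     * Quiesces the database, runs the maintenance task, and always
     * unquiesces afterwards, even when the task throws.
     */
    static void run(Consumer<String> sqlExecutor, Runnable maintenanceTask) {
        // Statements passed through JDBC carry no trailing semicolon.
        sqlExecutor.accept("ALTER SYSTEM QUIESCE RESTRICTED");
        try {
            maintenanceTask.run();
        } finally {
            sqlExecutor.accept("ALTER SYSTEM UNQUIESCE");
        }
    }
}
```

The `finally` block is the design point: a failed maintenance step should not leave non-DBA sessions blocked.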

10.2.5 Oracle Real Application Cluster Guard

This section focuses on the architecture and configuration of Oracle Real Application Cluster Guard (RACG). One of the great advantages of RAC is that it provides high availability. Availability can be provided through either an active/active configuration or an active/passive configuration. In the active/active configuration, both instances are active all the time, and users can connect to either instance, either directly through the address of the instance or through the load-balancing feature of Oracle Net. In an active/passive configuration, only one instance is active at any time and the other instance is provided for availability, in the sense that when one instance fails, users are automatically transferred to the other instance. RACG is one such tool that provides failover of the services from the failed node or instance to the other instance.

In previous versions of the product, Oracle provided similar functionality through the failsafe feature. Oracle parallel failsafe was available on a limited set of clusters and required a separate installation. The primary difference between the failsafe option and the current RACG option is that with failsafe the failover instance is not active; it is made active only after the primary instance has failed, when the actual failover happens. In the case of RACG, both instances are available all the time; however, users connect to only one active instance, and when the primary instance fails, the users are failed over to the secondary instance. Unlike the failsafe option, RACG combines the features of RAC and the vendor's cluster management services to provide an efficient failover.

Architecture

Figure 10.5 illustrates a two-node RACG configuration. Each node executes the vendor's CM, which in addition to its normal functions, is responsible for running and halting scripts automatically upon failover or when you issue the appropriate command.

Figure 10.5: Oracle Real Application Cluster Guard architecture.

Components of RACG

Packs

Each node contains a pack of software provided by Oracle. A pack is software that ensures the availability of the resources required to run an Oracle instance. A pack supports a single instance with access provided through listeners. A pack controls startups, shutdowns, and restarts of the processes under its control.

There is one pack for each instance. Packs contain the following components:

RAC instance
Listeners
IP Addresses
Pack functions
Monitors
Disk group manager

RAC instance

In Figure 10.5, node ORA-DB3 is running in the primary instance role and node ORA-DB4 in the secondary instance role.

Listeners

One or more listeners can be configured to accept Oracle Net connection requests. Public listeners support clients, and private listeners support tools such as OEM and RMAN and also provide access for database administration tasks. In Figure 10.5, the public listener points to the IP address of node ORA-DB3, which is configured as the primary node.

IP addresses

Clients can use the pack's relocatable IP address to access the resources managed by the pack. A relocatable IP address is a public IP address that is configured to be up or down by the RACG. A relocatable IP address is not associated with a specific physical server; it can float between physical servers. It is initially associated with only the primary node. If the primary node fails, then the address fails over to a different cluster node (a secondary node). The relocatable IP address is configured to be up as the first step when the pack is running and is configured to be down as the last step when the pack is halted.

A stationary, private IP address is configured for private tasks such as IPC, heartbeat, system management, and RMAN operations. A private listener supports access to the instance through the private IP address.

Public IP addresses are relocatable and can be moved between nodes to maintain availability to an active instance. However, private IP addresses are static and support connections to private listeners.

Pack functions

Packs do the following:

Start and stop the relocatable IP address and public listener
Start and stop the private listener
Start and stop the Oracle instance
Start and stop the monitors

A pack starts up the Oracle instance and monitors the instance. If it determines that the instance has expired, then it ensures that the resources associated with that instance are moved to the secondary node and subsequently enables service on the secondary node.

A pack can run on either the primary or the secondary node. When it is on its primary node, it starts up and shuts down everything. However, when the pack is on its foreign node, it only configures the relocatable IP address to be up or down.

Monitors

Instance monitor: The instance monitor detects termination of the local instance and initiates failover to the secondary node or restarts the instance.
Listener monitor: The function of the listener monitor is to check and restart the listeners. When the public listener fails to restart, the listener monitor exits and initiates a halt script, at which point RACG either begins failover or restarts the primary instance, depending on the state of the secondary node.
Heartbeat monitor: The heartbeat monitor checks the availability of the Oracle instance. During normal operation, the heartbeat monitor on each instance updates its own local heartbeat and checks the heartbeat of the other instance. The heartbeat monitor on the primary instance also executes a customer-defined query to test whether the primary instance is capable of work. The local Oracle instance is considered unavailable if the heartbeat monitor fails to complete three consecutive attempts and there are no unusual circumstances, such as instance recovery or unusually large numbers of sessions logging on.

If the primary instance is unavailable and the primary instance role has not resumed normal function on its new node, then the heartbeat monitor initiates takeover. A takeover occurs when the secondary node executes failover of the primary instance role to itself.
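The heartbeat rule described above, where the instance is considered unavailable after three consecutive failed attempts, amounts to a counter that resets on every success. The following sketch captures only that counting rule; `HeartbeatTracker` is a hypothetical class, and the exclusions for unusual circumstances (instance recovery, logon storms) are deliberately left out.

```java
/** Hypothetical counter for consecutive heartbeat failures. */
final class HeartbeatTracker {
    private final int threshold;
    private int consecutiveFailures = 0;

    HeartbeatTracker(int threshold) {
        this.threshold = threshold;
    }

    /**
     * Records the outcome of one heartbeat attempt.
     * Returns true once the failure streak reaches the threshold.
     */
    boolean record(boolean succeeded) {
        if (succeeded) {
            consecutiveFailures = 0; // any success breaks the streak
        } else {
            consecutiveFailures++;
        }
        return consecutiveFailures >= threshold;
    }
}
```

Requiring consecutive failures, rather than a total count, keeps a single slow response from triggering a takeover.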

Disk group manager

A disk group manager (DGM) is required only on some platforms, to enable public access to the database disks by the current primary node.

Configuration

In Oracle 9i, RACG supports a large variety of clusters and is automatically installed with the RAC option. The typical configuration for RACG is a two-instance RAC database with ACTIVE_INSTANCE_COUNT = 1 defined in their parameter files. This enables one instance as the primary instance and one as the secondary instance. The primary instance masters the entire GRD and is the only instance to allow user connections through Oracle Net Services. The secondary instance takes over the primary role when the primary instance fails.

In addition, the pieces of software necessary to manage an instance are configured into packs. Each pack is a self-contained set of software that can enable and monitor all the components of a RACG instance on a node.

Types of configurations

Hub configuration

A hub configuration consists of one node that serves as the secondary node for other nodes that serve as primary nodes for separate installations of RACG databases. The simplest possible hub configuration consists of three nodes.

Figure 10.6 illustrates that the primary instance for database A resides on node ORA-DB3, the primary instance for database B resides on node ORA-DB4, and the primary instance for database C resides on node ORA-DB5. The secondary instances for all three databases reside on node ORA-DB6.

Figure 10.6: Oracle Real Application Cluster Guard hub configuration.

In a stable state, all primary instances run on their preferred primary nodes. When a failure on a primary node occurs, the primary instance fails over to its secondary instance on node ORA-DB6. A single failover of a primary node to the secondary node has minimal impact; however, if this failure pattern repeats or if several primary nodes fail, there would be a considerable impact on the performance of RAC. In addition, if node ORA-DB6, which is configured as the secondary node for all the primary nodes, fails, then all of the RACG installations lose resilience. In this configuration, node ORA-DB6 is therefore a single point of failure.

Ring configuration

Compared to the hub-based configuration, this configuration distributes the various primary and secondary instances amongst various nodes. No one node is a single point of failure.

Figure 10.7 illustrates the RACG ring configuration. In this configuration, three nodes are configured so that each node hosts the primary instance of one database and the secondary instance of another, forming a ring.

Figure 10.7: Oracle Real Application Cluster Guard ring configuration.

The primary instance for database A resides on node ORA-DB3, while the secondary instance for database A resides on node ORA-DB4. The primary instance for database B resides on node ORA-DB4, while the secondary instance for database B resides on node ORA-DB5.

The primary instance for database C resides on node ORA-DB5, while the secondary instance for database C resides on node ORA-DB3.
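The difference between the two layouts can be checked with a small model that records, for each database, which node hosts the primary and which hosts the secondary. The sketch below uses the node assignments given for Figures 10.6 and 10.7; `GuardTopology` and its methods are hypothetical names introduced for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of primary/secondary placement per database. */
final class GuardTopology {

    /** Builds database -> {primary node, secondary node} from flat triples. */
    static Map<String, String[]> of(String... triples) {
        Map<String, String[]> placement = new LinkedHashMap<>();
        for (int i = 0; i < triples.length; i += 3) {
            placement.put(triples[i], new String[] {triples[i + 1], triples[i + 2]});
        }
        return placement;
    }

    /** Hub layout from Figure 10.6: ORA-DB6 is secondary for every database. */
    static Map<String, String[]> hub() {
        return of("A", "ORA-DB3", "ORA-DB6",
                  "B", "ORA-DB4", "ORA-DB6",
                  "C", "ORA-DB5", "ORA-DB6");
    }

    /** Ring layout from Figure 10.7: each node is secondary for one database. */
    static Map<String, String[]> ring() {
        return of("A", "ORA-DB3", "ORA-DB4",
                  "B", "ORA-DB4", "ORA-DB5",
                  "C", "ORA-DB5", "ORA-DB3");
    }

    /** Counts databases whose secondary instance is hosted on the given node. */
    static int secondariesOn(Map<String, String[]> placement, String node) {
        int count = 0;
        for (String[] nodes : placement.values()) {
            if (nodes[1].equals(node)) {
                count++;
            }
        }
        return count;
    }
}
```

In the hub layout, losing ORA-DB6 removes the secondary instance of all three databases, whereas in the ring layout the loss of any one node removes the secondary of only one database.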

Starting the RACG

The RACG software is controlled from the command line. To start RACG:

Log in as the root user.
Shut down all listeners associated with the RACG database on the cluster.
Ensure that the DB_NAME, ORACLE_SERVICE, ORACLE_HOME, and if necessary, the ORACLE_BASE environment variables are set correctly.
Enter the following command:
```
# pfsctl
```
Start RACG from the PFSCTL prompt:
```
PFSCTL> pfsboot
```

Check the RACG packs log file for any errors:

$ORACLE_BASE/admin/db_name/pfs/pfsdump/pfs_<ORACLE_SERVICE>_hostname.log
$ORACLE_HOME/pfs/db_name/log/pfs_<ORACLE_SERVICE>_hostname.log

Check the Oracle heartbeat monitor logs for errors:

$ORACLE_BASE/admin/db_name/pfs/pfsdump/pfs_<ORACLE_SERVICE>_hostname_ping.log
$ORACLE_HOME/pfs/db_name/log/pfs_<ORACLE_SERVICE>_hostname_ping.log

Operation

Figure 10.8 illustrates the failure operation when the primary instance fails. During normal operation, both node ORA-DB3 and node ORA-DB4 are operational. Pack A is running on its primary node, node ORA-DB3, and has the primary instance role. It contains the primary instance and an IP address. Pack B is running on its primary node, node ORA-DB4, and has the secondary instance role. It contains the secondary instance and an IP address.

Figure 10.8: Oracle Real Application Cluster Guard failover operation.

If the primary instance fails, then RACG automatically does the following:

The secondary instance becomes the primary instance.
Pack A starts on node ORA-DB4 in foreign mode. This means that only its relocatable IP address is configured to be up on node ORA-DB4.

Now both Pack A and Pack B are running on node ORA-DB4. Pack B contains the primary instance and its IP address. Pack A has only the relocatable IP address configured to be up. Nothing is running on node ORA-DB3.
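The state change that RACG performs on a primary failure can be modeled in a few lines. This is a toy model of the behavior described above, not RACG code; the `Pack` class and the `fail_primary` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pack:
    name: str
    home_node: str
    role: Optional[str]     # "primary", "secondary", or None when the instance is down
    node: Optional[str]     # node where the pack is currently running
    foreign: bool = False   # True when only the relocatable IP address is up


def fail_primary(packs):
    """Apply the two automatic steps described in the text."""
    primary = next(p for p in packs if p.role == "primary")
    secondary = next(p for p in packs if p.role == "secondary")
    # Step 1: the secondary instance becomes the primary instance.
    secondary.role = "primary"
    # Step 2: the failed pack starts on the surviving node in foreign mode,
    # which brings up only its relocatable IP address there.
    primary.role = None
    primary.node = secondary.node
    primary.foreign = True


if __name__ == "__main__":
    pack_a = Pack("A", "ORA-DB3", "primary", "ORA-DB3")
    pack_b = Pack("B", "ORA-DB4", "secondary", "ORA-DB4")
    fail_primary([pack_a, pack_b])
    print(pack_a)
    print(pack_b)
```

After the call, both packs report node ORA-DB4: Pack B holds the primary role, and Pack A is in foreign mode with no instance role, which matches the state shown in Figure 10.8.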

A notification about the failure is sent to the PFS log. If the system is set up to notify an administrator of the failure, the administrator can then use the RESTORE command to restore the secondary instance role, at which point RACG starts Pack A on node ORA-DB3. Because the instance on node ORA-DB4 now has the primary instance role, the instance associated with Pack A assumes the secondary instance role when it restarts. Once both instances are up and operating, the system is resilient again.

Restoration

After RACG fails over the primary instance role and the secondary instance role has been restored, the packs are on their home nodes, but the instance roles are reversed. If the primary instance needs to run on the preferred primary node, use the MOVE_PRIMARY and RESTORE commands.

Pack A is on node ORA-DB3 and has the secondary instance role. Pack B is on node ORA-DB4 and has the primary instance role. When the user enters the MOVE_PRIMARY command, RACG halts Pack B. The secondary instance, which is running on node ORA-DB3, becomes the primary instance. When the user enters the RESTORE command, RACG starts Pack B on node ORA-DB4. Pack B assumes the secondary instance role. The packs are now running on their home nodes with their original roles.
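The effect of the two commands on the instance roles can be traced with a short simulation. The sketch below only mirrors the sequence described above; the `move_primary` and `restore` functions are hypothetical stand-ins for the MOVE_PRIMARY and RESTORE commands and do not call RACG.

```python
def move_primary(packs):
    """Model MOVE_PRIMARY: halt the pack holding the primary role so that
    the running secondary instance takes over as primary."""
    primary = next(p for p in packs.values() if p["role"] == "primary")
    secondary = next(p for p in packs.values() if p["role"] == "secondary")
    primary.update(role=None, running=False)
    secondary["role"] = "primary"


def restore(packs):
    """Model RESTORE: start any halted pack on its home node; it assumes
    the secondary instance role."""
    for pack in packs.values():
        if not pack["running"]:
            pack.update(role="secondary", running=True)


if __name__ == "__main__":
    # State after a failover and restore: roles are reversed.
    packs = {
        "A": {"home": "ORA-DB3", "role": "secondary", "running": True},
        "B": {"home": "ORA-DB4", "role": "primary", "running": True},
    }
    move_primary(packs)
    print("after MOVE_PRIMARY:", packs)
    restore(packs)
    print("after RESTORE:", packs)
```

The trace shows why both commands are needed: MOVE_PRIMARY alone returns the primary role to Pack A but leaves Pack B halted, and only RESTORE brings the secondary instance back.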

Advantages and disadvantages of RACG

The primary advantage of the RACG configuration for high availability arises where one node is defined as the primary node and the other node would otherwise be unused. Under these circumstances, the second node is configured as the secondary node and is used only for failover.

There are several disadvantages to this configuration:

The secondary node is not utilized until the primary node fails. While the secondary node is up and operable, no user connections are permitted to this node.
Alternative high-availability solutions, such as ODG and OAR, are available. These solutions not only provide high availability and failover but also allow limited use of the secondary node, for example for read-only or reporting workloads.
RACG requires RAC to be installed and configured, which is an expensive solution given that the secondary instance is not utilized.
While RACG configuration could be implemented on multiple nodes as illustrated in the hub and ring configuration scenarios, these configurations are very complex to set up, implement, and manage. The ideal configuration is a two-node configuration.
RACG operation requires several manual interventions to reinstate the original primary and secondary node configurations.

^[1] Oracle provides a daemon process called the "watchdog" for Linux environments for failover detection. The watchdog process provides the heartbeat functionality for Oracle Cluster Management Services (OCMS).
