8.1 Failover

Team-Fly    

 
Oracle Net8 Configuration and Troubleshooting
By Jonathan  Gennick , Hugo Toledo
Table of Contents
Chapter 8.  Net8 Failover and Load Balancing


Where Net8 is concerned , failover refers to the ability to switch to another database instance when the connection to the primary instance fails. Consider the situation shown in Figure 8.1 where a client connects to an instance named PROD01 in a Parallel Server environment. Under some circumstances, if the PROD01 instance fails, Net8 can automatically connect the client to the backup instance named PROD02.

Figure 8.1. Failover to a backup instance in an OPS environment
figs/n8c_0801.gif

Connection failures can be broadly categorized as follows :

  • Those that occur while attempting to connect for the first time

  • Those that occur after a connection has been established

The first category is the easiest one to deal with. If you attempt to connect to an instance and for some reason that attempt fails, you can simply try again, but this time to a backup instance. For example, if you can't connect to the PROD01 instance, you can try connecting to PROD02. This process, which can continue for as long as you have backup instances, is known as connect-time failover .

If a connection fails after having been established, then the application normally must handle the details of reconnecting to a backup database, reestablishing the session environment, and resubmitting any work lost as a result of the failure. This type of failover is known as application failover . For certain types of applications, Net8 can handle application failure in a manner that is more or less transparent to the application. This Net8 feature is known as Transparent Application Failover (TAF).

8.1.1 Connect-Time Failover

You can use connect-time failover to cause clients to connect to a backup instance in cases where the primary instance cannot be reached. This makes the most sense in an OPS environment where multiple instances are all accessing the same database. However, it can be done in a non-OPS environment as well. If you are using Oracle's standby database feature, you can configure a net service name so that clients connect to the standby database whenever the primary database is unreachable. Similarly, you could connect to a backup database maintained using Oracle's replication features.

When you define a net service name, you can use any one of the following mechanisms to specify failover:

  • Multiple listener addresses within an address list

  • Multiple listener addresses within a description

  • Multiple descriptions within a description list

The difference between the three methods is that multiple listener addresses are specified, they all use the same connect data, while using multiple descriptions allows you to specify different connect data information for each connection. Net8 Assistant supports only the first mechanismthat of specifying multiple listener addresses within an address list. To use one of the other mechanisms, you'll need to manually edit your tnsnames.ora file.

One important issue to be aware of is that connect-time failover only works if you are dynamically registering global database names with your listeners. If you are statically configuring global database names , as shown in the following example, then connect-time failover will not work in a consistent manner:

 SID_LIST_PRODLISTENER =   (SID_LIST =     (SID_DESC =       (GLOBAL_DBNAME = donna.gennick.org)       (ORACLE_HOME = /s01/app/oracle/product/8.1.6)       (SID_NAME = donna)     )   ) 

With respect to this example, if you want to use Net8's connect-time failover feature, you need to delete the GLOBAL_DBNAME parameter and allow the database to register itself with the listener automatically. You can list the database in your SID_LIST; you just can't include the GLOBAL_DBNAME parameter.

8.1.1.1 Failover with multiple listener addresses

If you're implementing failover in an OPS environment, you'll be dealing with several instances, each of which runs on its own node. You'll also have a listener running on each node. The database, and hence the service name, will be the same no matter which instance you connect to. In such a situation, you should implement failover by defining a net service name with multiple listener addresses and one set of connection data. For example:

 PROD.GENNICK.ORG =   (DESCRIPTION =     (ADDRESS_LIST =       (ADDRESS = (PROTOCOL = TCP)(HOST = prod01.gennick.org)(PORT = 1521))       (ADDRESS = (PROTOCOL = TCP)(HOST = prod02.gennick.org)(PORT = 1521))       (FAILOVER = true)     )     (CONNECT_DATA =       (SERVICE_NAME = prod.gennick.org)     )   ) 

Here you have an addresses list with two listener address. Each listener address points to a different node in a cluster. The (FAILOVER = true) parameter causes Net8 to try connecting to each address in sequence until a successful connection is made. Figure 8.2 shows the sequence of events that occurs when a client connects using the net service name prod.gennick.org .

Figure 8.2. The connection process when failover is used
figs/n8c_0802.gif

The (FAILOVER = true) parameter shown in the previous example is not strictly necessary, because it represents the default behavior whenever multiple listener addresses are supplied within an address list or within a description. In fact, if you configure this failover scenario using Net8 Assistant, you won't see (FAILOVER = true) in the resulting net service name definition.

While it's not the preferred syntax, you can specify multiple addresses for a description without enclosing them within an address list. The following example implements the same failover scenario you saw in the previous example, but without an ADDRESS_LIST parameter:

 PROD.GENNICK.ORG =   (DESCRIPTION =     (ADDRESS = (PROTOCOL = TCP)(HOST = prod01.gennick.org)(PORT = 1521))     (ADDRESS = (PROTOCOL = TCP)(HOST = prod02.gennick.org)(PORT = 1521))     (FAILOVER = true)     (CONNECT_DATA =       (SERVICE_NAME = prod.gennick.org)     )   ) 

By default, failover is enabled when you specify multiple addresses. You can disable failover completely, even when multiple addresses have been supplied, by specifying (FAILOVER = false). When failover is disabled, Net8 will attempt a connection using the first address. If that attempt fails, no further attempts will be made, and an error will be returned.

When a description list is used, client load balancing is enabled by default. This has implications for how failover works. Read the next section for details.

8.1.1.2 Failover with multiple descriptions

When multiple descriptions are embedded within a DESCRIPTION_LIST entry, each description can have a different set of connection data. The following net service name definition results in connection attempts to two different databases on two different nodes:

 PROD.GENNICK.ORG =   (DESCRIPTION_LIST =     (FAILOVER = true)     (LOAD_BALANCE = false)     (DESCRIPTION =       (ADDRESS_LIST =         (ADDRESS = (PROTOCOL = TCP)(HOST = donna.gennick.org)(PORT = 1523))       )       (CONNECT_DATA =         (SERVICE_NAME = donna.gennick.org)       )     )     (DESCRIPTION =       (ADDRESS_LIST =         (ADDRESS = (PROTOCOL = TCP)(HOST = jonathan.gennick.org)(PORT = 1521))       )       (CONNECT_DATA =         (SERVICE_NAME = herman.gennick.org)       )     )   ) 

When you connect using the service name prod.gennick.org , Net8 will first try to connect to the database service donna.gennick.org on the host with the same name. If that connection fails, Net8 will move on to the database service herman . gennick.org on the host named jonathan.gennick.org .

Notice that the description list contains both (FAILOVER = true) and (LOAD_BALANCE = false). (FAILOVER = true) still represents the default behavior. It's included here to make it clear that failover is being used. (LOAD_BALANCE = false), however, does not represent the default behavior in this case. It's included to disable client load balancing, which is enabled by default whenever multiple descriptions are being used. With client load balancing enabled, Net8 would randomly choose descriptions from the description list (you can read more about how that works later in this chapter). By disabling client load balancing, you ensure that Net8 tries each DESCRIPTION in the order in which it appears in the list.

Description list failover cannot be configured using Net8 Assistant.

It's actually possible to mix the two failover mechanisms described so far. The example in this section shows a net service name that contains two DESCRIPTION entries that each contain one ADDRESS entry. You could easily have two or three ADDRESS entries for one DESCRIPTION. If you did that, Net8 would try all the addresses for one description before moving on to the next.

8.1.1.3 Failover and Net8 Assistant

Failover can be configured using the Net8 Assistantyou don't need to edit your tnsnames.ora file directly, but Net8 Assistant only supports the definition of multiple addresses in an address list. Figure 8.3 shows the net service name PROD defined in Net8 Assistant with two listener addresses.

Figure 8.3. Net8 Assistant showing two listener addresses for PROD
figs/n8c_0803.gif

To define a net service name with two or more addresses using Net8 Assistant, you have to first go through the normal process of using the Net Service Name Wizard to define the name with just one address. Once that's done, click on the plus (+) icon in the Address Configuration area to add a tab for a second address. Then fill in the address details. You can add as many address tabs as you need.

Once you have all the addresses defined, click the Advanced button and you'll be presented with the dialog shown in Figure 8.4. This dialog lets you choose from a combination of failover and client load balancing options. The annotations to the figure explain the two options specifically related to failover.

Figure 8.4. Controlling failover with the Net8 Assistant
figs/n8c_0804.gif

The remaining options in Figure 8.4, the ones that aren't annotated, are for client load balancing or for use with Connection Manager. Client load balancing is described later in this chapter; Connection Manager is described in Chapter 9.

8.1.1.4 Failover and Oracle Names

Oracle Names doesn't affect Net8's failover functionality. If you're using Net8 Assistant, the process for defining a net service name with connect-time failover in Oracle Names is very similar to that of defining one in your tnsnames.ora file. Figure 8.5 shows the net service name PROD being defined with two listener addresses in the Oracle Names server named ns1.gennick.org .

Figure 8.5. Defining a net service name using connect-time failover to an Oracle Names server
figs/n8c_0805.gif

You can see that the use of tabs for listener addresses is identical to that shown earlier in Figure 8.3. There are two buttons labeled Advanced. The uppermost one in this case is the one that opens up the Address List Options dialog used to specify failover and load balancing options.

With Names Control, you use the REGISTER command to define a net service name using connect-time failover. The following command, which must be entered as one long line, defines the net service name prod :

 REGISTER prod -t oracle_database -d (DESCRIPTION=(ADDRESS_LIST= (ADDRESS=(PROTOCOL=TCP)(HOST=prod01.gennick.org)(PORT=1521)) (ADDRESS=(PROTOCOL=TCP)(HOST=prod02.gennick.org)(PORT=1521))) (CONNECT_DATA=(SERVICE_NAME=herman.gennick.org))) 

REGISTER commands must be entered on one line, and you must not allow any spaces in the description following the -d parameter. Leave spaces in that string, and you'll get errors about using the wrong number of parameters.

8.1.2 Transparent Application Failover

Transparent Application Failover (TAF) is a Net8 feature designed to enable applications running in an OPS environment to gracefully recover from an instance failure by failing over to another instance accessing the same database. While designed for use with OPS, TAF can be used in non-OPS environments as well.

Unlike connect-time failover, TAF comes into play after an application has connected to an instance. If the connection to the instance is lost while the application is running, Net8 will transparently reconnect the application to another instance accessing the same database. The word "transparent" is best thought of in terms of the application user . In order for an application to take advantage of TAF, it must use failover-aware API calls that are built into the Oracle Call Interface (OCI). There are also TAF-related callbacks that can be used to make an application failover-aware.

SQL*Plus was one of the first applications to support TAF. Since then, Oracle has been working to add TAF capabilities to the following products:

  • Oracle Call Interface

  • ODBC

  • JDBC

  • Pro*C

  • Oracle Objects for OLE

8.1.2.1 TAF failover types

TAF supports two different types of failover: SESSION and SELECT. SESSION is the simplest type. When the connection to an instance is lost, SESSION failover results only in the establishment of a new connection to a backup instance. Any work in progress is lost.

SELECT failover is a bit more complex and enables some types of read-only applications to fail over without losing any work. When SELECT failover is used, Net8 keeps track of any SELECT statements issued in the current transaction. Net8 also keeps track of how many rows have been fetched back to the client for each cursor associated with a SELECT statement. If connection to the instance is lost, Net8 establishes a connection to a backup instance, reexecutes the SELECT statements, and positions the cursors so the client can continue fetching rows as if nothing had happened . SELECT failover is illustrated in Figure 8.6.

Figure 8.6. SELECT failover
figs/n8c_0806.gif

SELECT failover can be useful for reporting applications, but that's as sophisticated as TAF gets. There's no automatic recovery mechanism built into TAF to handle DML statements, such as INSERTs and UPDATES, that are in progress when a failover occurs. TAF has other inherent limitations as well. Some of these are described in Section 8.1.2.3.

8.1.2.2 TAF failover methods

TAF also supports two failover methods: BASIC and PRECONNECT. In both cases, you specify a net service name to use for the backup connection in case the primary connection fails. The difference between the two types lies in when the connection to the backup instance is made.

When the BASIC failover method is used, the connection to the backup instance is made only if and when the primary connection fails. When the PRECONNECT method is used, the connection to the backup instance is made at the same time as the connection to the primary instance. Having the backup connection already in place can reduce the time needed for a failover in the event that one needs to take place. The price you pay for that is the added overhead of always having that backup connection open.

8.1.2.3 TAF limitations

As good as it sounds, TAF has a number of limitations. Regardless of the failover type or the failover method, the following will be true when a failover occurs:

  • The effect of any ALTER SESSION statements will be lost.

  • Global temporary tables will be lost.

  • Any PL/SQL package states will be lost.

  • Transactions involving I NSERT, UPDATE, or DELETE statements cannot be handled automatically by TAF.

In addition to what gets preserved and lost during a failover, there are some connectivity issues to worry about. If a node goes down, your application may not notice it, and TAF may not be triggered until your application attempts to execute another SQL statement. A hung listener might cause a client to hang during a connection attempt. In that case, the client will never get a chance to attempt a connection to the backup instance. If the primary instance is up, but in an indeterminate state (such as during a startup or shutdown) client connections will fail, but not in a way that causes TAF to be triggered. Using ALTER SESSION to kill a client connection will also prevent TAF from being triggered.

The bottom line is that while Net8's TAF features represent a valuable piece of the puzzle when it comes to implementing robust applications that can fail over when necessary, you can't just slap TAF into place and expect all your applications to magically be capable of failover. As it stands now, TAF is most useful for read-only applications such as those for reporting or decision support.

8.1.2.4 Configuring TAF to connect to a backup instance

TAF is configured by adding a FAILOVER_MODE parameter to the CONNECT_DATA parameter for a net service name. TAF cannot be configured using Net8 Assistant. You have to manually edit tnsnames.ora or use the Oracle Names REGISTER command. If you are going to specify a backup instance, then you'll need two net service names: one to connect to the primary instance and one to connect to the backup instance. The following example shows a TAF configuration where connections to prod will fail over to prod _ bkup :

 prod.gennick.org =   (DESCRIPTION =     (ADDRESS_LIST =       (ADDRESS = (PROTOCOL = TCP)(HOST = jonathan.gennick.org)(PORT = 1521))     )     (CONNECT_DATA =       (SERVICE_NAME = herman.gennick.org)       (FAILOVER_MODE = (TYPE = SELECT)(METHOD=PRECONNECT)(BACKUP=prod_bkup))     )   ) prod_bkup.gennick.org =   (DESCRIPTION =     (ADDRESS_LIST =       (ADDRESS = (PROTOCOL = TCP)(HOST = donna.gennick.org)(PORT = 1523))     )     (CONNECT_DATA =       (SERVICE_NAME = donna.gennick.org)     )   ) 

The definition for prod contains a FAILOVER_MODE entry as part of its connection data. The BACKUP attribute for that entry specifies the net service name to which a connection should be made when a failover occurs. In this case, prod _ bkup represents the backup connection.

In this example, the backup connection is not to another OPS instance accessing the same database, but to an entirely different database. TAF can work either way, but if you fail over to a completely different database, you need some mechanism in place to keep it in sync with your primary database.

No FAILOVER_MODE entry has been placed in the definition for prod _ bkup . TAF doesn't support cascading failover. You can't fail over to a backup instance, and then fail over to yet another backup instance in the event that the first backup instance also fails.

8.1.2.5 Retries and delays

By default, when a TAF-initiated failover occurs, Net8 will make only one attempt to connect to the backup instance. Using the RETRIES and DELAY parameters, you can change that behavior so that Net8 makes multiple attempts to connect to the backup database. The FAILOVER_MODE specification in the following example calls for 20 retries at 30-second intervals:

 prod.gennick.org =   (DESCRIPTION =     (ADDRESS_LIST =       (ADDRESS = (PROTOCOL = TCP)(HOST = jonathan.gennick.org)(PORT = 1521))     )     (CONNECT_DATA =       (SERVICE_NAME = herman.gennick.org)       (FAILOVER_MODE = (TYPE = SELECT)(METHOD=BASIC)(BACKUP=prod_bkup)                        (RETRIES=20)(DELAY=30))     )   ) 

In this case, if the connection to prod is lost, Net8 will make 20 attempts over a period of 10 minutes (20 x 30 seconds = 10 minutes) to connect to the backup instance through the net service name prod _ bkup . This can be useful if you are using Oracle's standby database feature. If your primary database fails, it might take a few minutes for the standby to be brought up to date and opened. Using the RETRIES and DELAY attributes, you can accommodate that delay.

8.1.2.6 TAF and connect-time failover

You can mix transparent application failover with connect-time failover and take advantage of both. The following scenario illustrates this. Connect-time failover occurs from prod01.gennick.org to prod02.gennick.org . Assuming a successful connection to prod01.gennick.org , a failure while the application is running results in a TAF failover to prod02.gennick.org :

 PROD.GENNICK.ORG =    (DESCRIPTION =      (ADDRESS_LIST =        (ADDRESS = (PROTOCOL = TCP)(HOST = prod01.gennick.org)(PORT = 1521))       (ADDRESS = (PROTOCOL = TCP)(HOST = prod02.gennick.org)(PORT = 1521))     )     (CONNECT_DATA =       (SERVICE_NAME = herman.gennick.org)       (FAILOVER_MODE = (TYPE = SELECT)(METHOD=BASIC)(BACKUP=prod_bkup1))     )   ) PROD_BKUP1.GENNICK.ORG =   (DESCRIPTION =     (ADDRESS_LIST =       (ADDRESS = (PROTOCOL = TCP)(HOST = prod02.gennick.org)(PORT = 1521))     )     (CONNECT_DATA =       (SERVICE_NAME = herman.gennick.org)     )   ) 

In this scenario, failover always occurs from prod01 to prod02 . If connect-time failover causes the initial connection to be made to prod02 , then TAF failover won't be any help. That's because TAF failover can't be configured differently for the different addresses in the address list. If the initial connection is made to prod02 because of connect-time failover, then you are already connected to the backup mode. If another failure occurs, there's nowhere else to go.

You might think that you could gain some additional flexibility with respect to TAF and connect-time failover by using a description list instead of an address list. Consider the following example. Connect-time failover is to prod02 . The TAF definition for that description defines the backup connection to be prod_bkup1 , which points to prod01 . It appears as if you have reciprocity prod01 and prod02 always back up each other:

 prod.gennick.org =   (DESCRIPTION_LIST =     (FAILOVER = true)     (LOAD_BALANCE = false)     (DESCRIPTION =       (ADDRESS_LIST =         (ADDRESS = (PROTOCOL = TCP)(HOST = prod01.gennick.org)(PORT = 1521))       )       (CONNECT_DATA =         (SERVICE_NAME = herman.gennick.org)         (FAILOVER_MODE = (TYPE = SELECT)(METHOD=BASIC)(BACKUP=prod_bkup1))       )     )     (DESCRIPTION =       (ADDRESS_LIST =         (ADDRESS = (PROTOCOL = TCP)(HOST = prod02.gennick.org)(PORT = 1521))       )       (CONNECT_DATA =         (SERVICE_NAME = herman.gennick.org)         (FAILOVER_MODE = (TYPE = SELECT)(METHOD=BASIC)(BACKUP=prod_bkup2))       )     )   ) PROD_BKUP1.GENNICK.ORG =   (DESCRIPTION =     (ADDRESS_LIST =       (ADDRESS = (PROTOCOL = TCP)(HOST = prod02.gennick.org)(PORT = 1521))     )     (CONNECT_DATA =       (SERVICE_NAME = herman.gennick.org)     )   ) PROD_BKUP2.GENNICK.ORG =   (DESCRIPTION =     (ADDRESS_LIST =       (ADDRESS = (PROTOCOL = TCP)(HOST = prod01.gennick.org)(PORT = 1521))     )     (CONNECT_DATA =       (SERVICE_NAME = herman.gennick.org)     )   ) 

The reality, with respect to this example, is that regardless of whether connect-time failover occurs, the TAF settings are picked up from the first CONNECT_DATA entry encountered . Thus, the backup connection as far as TAF is concerned is always going to be prod _ bkup1 . This behavior is not well documented in the Oracle manuals, but that is currently the way it works.

Oracle may someday change this aspect of TAF. In a future release, Net8 may recognize that the second description in the list shown here has FAILOVER_MODE settings that are different from the first description. However, the behavior we are describing is correct as of the 8.1.6.0.0 release.

8.1.2.7 TAF status from V$SESSION

The V$SESSION view provides some information about TAF settings for the sessions currently connected to the instance. The information in V$SESSION applies only to TAF, and not to connect-time failover. The three columns to look at are these:

FAILOVER_TYPE

Indicates the type of failover. Valid values are NONE, SESSION, and SELECT. This value comes directly from the FAILOVER_MODE parameter named TYPE.

FAILOVER_METHOD

Indicates the method used to establish the backup connection. Valid values are NONE, BASIC, and PRECONNECT. This value comes directly from the FAILOVER_MODE parameter named METHOD.

FAILED_OVER

Indicates whether or not a session has failed over to the backup connection. Valid values are YES and NO.

The following example shows a query against V$SESSION that displays the available TAF information:

 SQL>  SELECT username, sid, serial#, failover_type, failover_method, failed_over  2  FROM v$session;  USERNAME          SID    SERIAL# FAILOVER_TYPE FAILOVER_METHOD  FAILED_OVER ---------- ---------- ---------- ------------- ---------------- -----------                     1          1 NONE          NONE             NO                     2          1 NONE          NONE             NO                     3          1 NONE          NONE             NO                     4          1 NONE          NONE             NO                     5          1 NONE          NONE             NO                     6          1 NONE          NONE             NO                     7        193 NONE          NONE             NO                     8        193 NONE          NONE             NO                     9        193 NONE          NONE             NO                    10        193 NONE          NONE             NO SYSTEM             11         63 SELECT        BASIC            YES 11 rows selected. 

In this example, the session for the user SYSTEM has failed over to the backup database. Note the value YES in the FAILED_OVER column. Prior to failover, that column would contain the value NO.

In most environments, only database administrators have access to the V$SESSION view.


Team-Fly    
Top


Oracle Net8 Configuration and Troubleshooting
Oracle Net8 Configuration and Troubleshooting
ISBN: 1565927532
EAN: 2147483647
Year: 2000
Pages: 120

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net