Net Configuration | Oracle Database 10g. High Availablity with RAC Flashback & Data Guard

< Day Day Up >

Now that we have gotten the definitions (and more) out of the way, we can look more at the specific configuration of these options. We saw some examples of different tnsnames entries, enabling the various features discussed. We will now discuss some of the mechanics of configuring these various pieces.

Using GUI Tools to Generate Configuration Files

Net configuration files can be generated manually by creating your own local tnsnames.ora file, or you can use a variety of different tools to walk you through the configurations. In Oracle Database 10g, probably the simplest way to accomplish this is to use the DBCA to create the services, as we described in Chapter 6. After the DBCA has done its work, you can take the tnsnames.ora files generated on the server and distribute them out to the appropriate clients. Note, however, that you probably do not want the entire server-side tnsnames.ora to be sent out to every client, because every service will be defined in that file. Instead, you can take that information specific to a service/application, and create a tnsnames.ora for a specific set of clients and distribute it that way. This will prevent, for example, clients who normally connect to the accounts payable service from instead trying to connect to the OLTP service.

Aside from the DBCA, you can also run the netmgr (Net Manager) or netca (Net Configuration Assistant) at the client, to step you through the creation of a basic tnsnames.ora file at the client site. Again, once you have a file set up for a specific service, or set of users, you can distribute the tnsnames.ora to multiple clients in need of access to that specific service.

Easy Connect

Oracle Database 10g allows for a simpler client configuration, in that in some cases it is not necessary to have a TNSNAMES file at all. This is possible using Oracle Net Easy Connect. In doing so, the client may connect by simply specifying the connect string in the form of //<vipname:port>/<service_name>. Note that it is not necessary to specify the port if you are using port 1521. This type of connection is possible only with Oracle Database 10g and later clients. A connection using the Easy Connect syntax will connect you to any available node that the service is running on, regardless of the VIP that is specified. For example, the following client connection to the grid_callcenter service could be made to either of the nodes available for the service, even though the VIP for node1 is specified:

connect scott/tiger@//rmscvip1/grid_callcenter

connect scott/tiger@//rmscvip1:1521/grid_callcenter

Using the Easy Connect syntax in Oracle Database 10g Release1, instead of a tnsnames.ora file, means that you will not be able to take advantage of advanced features, such as TAF or client-side load balancing. However, the listener load balancing is still possible.

Environments Suitable for TAF

The primary environment suited for use of transparent application failover is a RAC environment, because we can always be assured that when failing over to another node, we are still accessing the same data as before-because it is the same database. However, TAF can also be used in other environments, such as physical or logical standby environments, replication environments, and Streams environments. Determining the suitability depends on how these environments are used.

For example, with a standby database, it is possible to have users failover transparently, as we mentioned in Chapter 7, when the database is switched over to the standby. This depends, however, on how quickly the switchover takes place, and also on the settings for RETRIES and DELAY, as discussed next. With logical standby and replicated environments, it is possible for clients to failover as well, without necessarily having to perform a switchover, because in both cases (logical standby and replication) the target databases for failover are open. However, successful select failover would depend on the availability of the data at the target sites, as the propagation of data to a replicated site or a logical standby does not necessarily have to be synchronous.

In the following HA Workshop, we will take the most common case-that of implementing transparent application failover in a RAC environment, and demonstrate the configuration and testing of TAF to verify the setup. The same principles can be applied to other environments as well.

HA Workshop: Configuring and Testing TAF

Workshop Notes

This workshop will walk you through the creation of a tnsnames.ora file with select type failover enabled, and then test the failover of an in-flight query when the instance where the query started goes down.

Step 1. Create the tnsnames.ora file on an Oracle Database 10g client machine, similar to the following:

GRID =   (DESCRIPTION =     (ADDRESS = (PROTOCOL = TCP)(HOST = rmscvip1.us.oracle.com)(PORT = 1521))     (ADDRESS = (PROTOCOL = TCP)(HOST = rmscvip2.us.oracle.com)(PORT = 1521))     (LOAD_BALANCE = yes)     (CONNECT_DATA =       (SERVICE_NAME = grid)       (FAILOVER_MODE =         (TYPE = SELECT)         (METHOD = BASIC)         (RETRIES = 30)         (DELAY = 5)        )      )    )

Step 2. Connect to the grid service from the client machine:

sqlplus scott/tiger@grid

Step 3. Confirm which node you are connected to, and the failover method, with the following query, run from a sysdba session on the server:

SQL> select inst_id, username, failover_type, failover_method, failed_over   2  from gv$session where username = 'SCOTT';    INST_ID USERNA FAILOVER_TYPE FAILOVER_METHOD FAILED_OVER ---------- ------ ------------- --------------- ------------          2 SCOTT  SELECT        BASIC           NO

From the above, we can see that our user SCOTT connected to instance 2, the FAILOVER_TYPE is select, and the session has not yet failed over.

Step 4. Create or populate a table with a large amount of data-enough so that a long-running query can be performed (long enough that we can shut down instance 2 in the middle). In our case, we have granted select on DBA_OBJECTS to SCOTT, and created a table called TEST_TAF as select * from SYS.DBA_OBJECTS:

Create table TEST_TAF as select * from SYS.DBA_OBJECTS;

Step 5. Connect as sysdba again to instance 2, in another session, and prepare to issue a SHUTDOWN ABORT, but do not press ENTER yet.

Step 6. Toggle back over to user Scott's session on the client. Execute a query against the TEST_TAF table from within Scott's session:

select * from test_taf where object_name like 'D%';

Step 7. As soon as the query begins, switch back over to your sysdba connection to instance 2 and execute the SHUTDOWN ABORT command.

Step 8. After a brief pause, your query from within your client session (as user Scott) should continue on. On instance 1, open up another sysdba connection and run the same query as before to get the session information for Scott.

SQL> select inst_id, username, failover_type, failover_method, failed_over  from gv$session where username = 'SCOTT'    INST_ID USERNA FAILOVER_TYPE FAILOVER_METHOD FAILED_OVER ---------- ------ ------------- --------------- ------------          1 SCOTT  SELECT        BASIC           YES

Note that the session now resides on INST_ID 1, and the FAILED_OVER column says YES.

Note

The DELAY parameter in the above configuration tells us to wait five seconds after the failure is first noted, and then to attempt to reconnect. RETRIES tells us to retry this 30 times, with a five-second delay each time. The length of the pause you see in Step 7 will depend on how far the query has gotten, and will also depend on these parameters. If neither parameter is set, your session may error out rather than failing over as expected.

Listener Configuration

One of the most important differences in the configuration of Oracle Net in Oracle Database 10g is the fact that the listener in a RAC environment must be configured to listen on the virtual IP (VIP), rather than the actual IP address of the node. For those who may be familiar with RAC Guard in 9.2, or Parallel Fail Safe (OPFS) in previous releases, this is not a new concept, except to say that this is now an enforced configuration rather than an optional choice. However, this is still a new concept to most RAC users, and, of course, it is also new to those of you who are new to RAC altogether.

Another difference is the name used by the default listener. In past releases, it was fairly certain that 9 times out of 10, you were using a listener that was simply called LISTENER. However, in a RAC environment with a shared home (that is, a single home on a cluster file system), you only have a single listener.ora. You cannot, therefore, have two listeners both called LISTENER. Thus, it has become standard practice to name the listener for each node using the convention LISTENER_<NODENAME>. The default RAC and DBCA configuration in Oracle Database 10g will use that convention.

Why a Virtual IP?

So, why must we use a virtual IP address in a RAC environment in Oracle Database 10g? We touched on the reasoning briefly in earlier chapters, but we will go into more detail here. The simple answer to this question is 'TCP timeouts.' So, now that we have answered this question, we can move on, right? No? Okay, let's discuss this a bit more.

TCP Timeouts

TCP timeouts, believe it or not, play a huge piece in the perceived availability of applications. When a node in a RAC environment goes down, or in any high-availability environment with multiple addresses to attempt, there is no way for the client to know this. If a client is connecting using a TNS alias, or a service that allows connection to multiple nodes, the client may unknowingly try its first connection attempt to the node that is down. This in and of itself is not a problem, as there are supposed to be multiple addresses in the list. As such, when the client fails to get a response from the first address in the list, the next address will be tried, until the connection succeeds. The problem lies with the time that it takes to go to the next address in the list. How long does the client wait to determine that the host it is trying to reach is not accessible? The time can range anywhere from a few seconds to a few minutes, and in some environments this is simply unacceptable. If a node goes down for several hours, days, or weeks, the database may still be humming along just fine, with x number of nodes still accessing the database. However, some clients may always be trapped into making the initial connection attempt to the down node, and therefore be stuck in front of a (seemingly) interminable hourglass while the connection is timing out, prior to being rerouted to the next address in the list.

Reigning in TCP Timeouts at the OS Unfortunately, this time is something that is generally outside of the control of Oracle. In addition, it varies from client to client, and operating system to operating system. It is controlled by the operating system timeout values on the client side, so making modifications to all clients can be cumbersome since there may be many clients and many variations to configuration changes that need to be made. Further, changing the timeout values may also result in adverse consequences on other applications that the clients are running, if other applications rely on a higher TCP timeout value for whatever reason.

To make matters worse, the behavior may not be consistent. If client-side load balancing is enabled, it is quite possible that some connections will succeed immediately on their first attempt, because they just happened to randomly connect to a node that is available. At other times, however, the connection time increases, because the client randomly and unwittingly picks the down node for its first connection attempt. The result of this is confusion and frustration at the client side, even though from the database perspective, everything is functioning as it should.

Giving the HA DBA Control Over TCP Timeouts Enter the virtual IP address, or VIP. By using a virtual IP address, Oracle eliminates the problem with TCP timeouts on the initial connection, without the need to make any changes to a single client machine. This is done by enforcing client connections to first come in on the virtual IP address for all connections. When all nodes are functioning properly, each VIP is running on its assigned node, and connections are directed to the appropriate listener and service. When the unthinkable happens, and a node fails (gasp!), CRS will kick in, and the VIP for that node will actually be brought online on one of the remaining nodes of the cluster, where it can respond to a ping and also to connection attempts. It is important to note that this VIP will not now be accepting connections to the database. However, since the IP address is available, it will be able to respond to a connection attempt immediately. The response given to the client would normally be in the form of an ORA-12541, advising that there is no listener available. This is because the node where the VIP now resides has its own listener, but it is listening on its own VIP-not the VIP of any other nodes. The client, receiving the message back that there is no listener, will then be able to immediately retry, using the next IP in the ADDRESS_LIST, rather than waiting up to two minutes for the timeout we would normally expect. Thus, a connect-time failover has still occurred, but the connection attempt succeeds within a matter of a few seconds. The actual ORA error is masked, so that the client never sees it.

< Day Day Up >