15.3 Cluster interconnect

This is a very important piece of the clustered configuration in Oracle 9i RAC. Oracle relies heavily on the cluster interconnect for the movement of data between instances. Chapter 5 (Transaction Management) provides a detailed explanation of how global data movement occurs.

Testing of the cluster interconnect should start with the hardware configuration. The actual transfer rate and implemented packet size should be compared against the vendor's specification to ensure that the installation has been carried out as specified.

In a two-node RAC configuration, the cluster interconnect is a direct link between the two nodes (e.g., between ORA-DB1 and ORA-DB2). In a configuration of more than two nodes, a direct connection between every pair of nodes is not possible, so a switch is required to act as a bridge between the nodes participating in the clustered configuration. When assessing the performance of a system with more than two nodes, the latency of the switch has to be measured independently of the latency of the interconnect links to determine the true latency of the switch and the interconnect.

In Chapter 2 (Hardware Concepts) the various types of cluster interconnects were discussed. The speed of the cluster interconnect depends entirely on the hardware vendor and the layered operating system. Oracle depends on the operating system and the hardware to send packets of information across the cluster interconnect. For example, one interconnect protocol supported between Sun 4800s is UDP. However, on this configuration Solaris imposes an O/S limitation of a 64 KB packet size for data transfer. To transfer 256 KB of data across this interconnect would therefore take four round trips. On a high-transaction system, where user activity on the various instances participating in the clustered configuration generates a large amount of interconnect traffic, this could cause a serious performance issue.

After the initial hardware and operating-system-level tests to confirm the packet size across the interconnect, subsequent tests can be done from the Oracle database to ensure that cache-to-cache data transfer (the cache fusion technology) does not add any significant latency. The query below provides the average latency of a consistent block request on the system. The data in these views are cumulative figures since the last time the Oracle instances were bounced, and hence may not reflect the current performance of the interconnect or give a true picture of the latency in transferring data. To get a more realistic picture of the performance, it would be good to bounce all the Oracle instances and test again.

To obtain good performance it is important that the latency across the cluster interconnect be as low as possible. In Chapter 2 (Hardware Concepts) there is information about interconnects that support latency as low as 0.005 nanoseconds.

Latencies on the cluster interconnect could be caused by:

  • A large number of processes in the run queues waiting for CPU or scheduling delays.

  • Platform-specific O/S parameter settings that affect IPC buffering or process scheduling.

  • Slow, busy, or faulty interconnects.

Oracle recommends that the average latency of a consistent block request should typically be about 15 ms, depending on the system configuration and volume. The average latency of a consistent block request is the average latency of a consistent-read request round trip from the requesting instance to the holding instance and back to the requesting instance.

set numwidth 20
column "AVG CR BLOCK RECEIVE TIME (ms)" format 9999999.9

select b1.inst_id,
       b2.value "GCS CR BLOCKS RECEIVED",
       b1.value "GCS CR BLOCK RECEIVE TIME",
       ((b1.value / b2.value) * 10) "AVG CR BLOCK RECEIVE TIME (ms)"
from   gv$sysstat b1, gv$sysstat b2
where  b1.name = 'global cache cr block receive time'
and    b2.name = 'global cache cr blocks received'
and    b1.inst_id = b2.inst_id;

                  GCS CR    GCS CR BLOCK       AVG CR BLOCK
INST_ID  BLOCKS RECEIVED    RECEIVE TIME  RECEIVE TIME (ms)
-------  ---------------  --------------  -----------------
      1             2758          112394             443.78
      2             1346            1457               10.8

2 rows selected.

In the output above, it can be seen that the AVG CR BLOCK RECEIVE TIME on instance 1 is 443.78 ms; this is significantly high when the expected average latency recommended by Oracle is 15 ms. A high value is possible if the CPU has limited idle time and the system typically processes long-running queries. However, it is possible to have an average latency of less than 1 ms with user-mode IPC. Latency can also be influenced by a high value of the DB_FILE_MULTIBLOCK_READ_COUNT parameter, because a requesting process can issue more than one request for a block depending on the setting of this parameter, and correspondingly may have to wait longer. This kind of high latency requires further investigation of the cluster interconnect configuration, and tests should be performed at the operating system level.
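Before moving to operating-system-level tests, a similar check can be made for current (dirty) block transfers. The query below is a sketch along the same lines as the consistent-read query above; it assumes the statistics 'global cache current block receive time' and 'global cache current blocks received' are available in GV$SYSSTAT on this release:

set numwidth 20
column "AVG CUR BLOCK RECEIVE TIME (ms)" format 9999999.9

select b1.inst_id,
       b2.value "GCS CUR BLOCKS RECEIVED",
       b1.value "GCS CUR BLOCK RECEIVE TIME",
       ((b1.value / b2.value) * 10) "AVG CUR BLOCK RECEIVE TIME (ms)"
from   gv$sysstat b1, gv$sysstat b2
where  b1.name = 'global cache current block receive time'
and    b2.name = 'global cache current blocks received'
and    b1.inst_id = b2.inst_id;

As with the consistent-read figures, these values are cumulative since instance startup and should be interpreted with the same caution.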

Apart from the basic packet transfer tests that can be performed at the O/S level, there are other checks and tests that can be done to ensure that the cluster interconnect has been configured correctly.

  • There are redundant private high-speed interconnects between the nodes participating in the cluster. One interconnect is the primary and the other acts as the secondary when the primary fails, providing continuous availability.

  • The user network connection does not interfere with the cluster interconnect traffic. That is, they are isolated from each other.

  • The heartbeat verification occurs on a separate interconnect and is configured with a redundant path, so that the failure of one path does not cause an interruption. The heartbeat communication should be via a network that is separate from both the user network and the private interconnects discussed above.

The following operating system commands can provide information on the cluster interconnect configuration:

  • The netstat command displays network-related data structures. The output below, from netstat -i, indicates that there are three network adapters configured.

    ha-db1:RAC1:oracle # netstat -i
    Name Mtu  Net/Dest     Address         Ipkts Ierrs    Opkts Oerrs Collis Queue
    lo0  8232 loopback     localhost     1348906     0  1348906     0      0     0
    ge2  1500 ora-db1      ora-db1      71804657     0 86959594     0      0     0
    ge1  1500 172.16.1.0   172.16.1.1   74815149     0 83336461     0      0     0
    ge0  1500 172.16.0.128 172.16.0.129  8501769     0  6064307     0      0     0

    ge2: Primary adapter

    ge1: Primary interconnect adapter

    ge0: Secondary interconnect adapter

    lo0: The output also indicates that a loopback option is configured. Whether Oracle is using the loopback option should be verified using the ORADEBUG command, discussed later in this section. The use of the loopback IP depends on the integrity of the routing table defined on each of the nodes. Modification of the routing table can result in the inoperability of the interconnect. Without the cluster interconnect between the nodes, RAC traffic cannot continue, which could dramatically reduce database performance or halt database operations.

  • On systems running Sun Cluster, the usage of the transport layer can be verified with the scstat command.

    # scstat -W

    -- Cluster Transport Paths --

                       Endpoint                     Endpoint                     Status
                       --------                     --------                     ------
      Transport path:  ora-db1.summerskyus.com:ge1  ora-db2.summerskyus.com:ge1  online
      Transport path:  ora-db1.summerskyus.com:ge0  ora-db2.summerskyus.com:ge0  online

  • Another useful command for verification of the network configuration is ifconfig, for example:

    ora-db1:RAC1:oracle # ifconfig -a

Checks can also be done from the Oracle instance to ensure proper configuration of the interconnect protocol. If the following commands are executed as user sys, a trace file is generated in the user dump destination directory that contains certain diagnostic information pertaining to the UDP/IPC configurations:

SQL> ORADEBUG SETMYPID
SQL> ORADEBUG IPC
SQL> EXIT
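The trace file is written to the directory identified by the USER_DUMP_DEST parameter, which can be checked from the same session before running ORADEBUG (a quick sketch):

SQL> show parameter user_dump_dest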

The following is an extract from the trace file pertaining to the interconnect protocol. The output confirms that the cluster interconnect is being used for instance-to-instance message transfer.

SSKGXPT 0x3671e28 flags SSKGXPT_READPENDING
info for network 0
        socket no 9   IP 172.16.193.1   UDP 59084
        sflags SSKGXPT_WRITE SSKGXPT_UP
info for network 1
        socket no 0   IP 0.0.0.0   UDP 0
        sflags SSKGXPT_DOWN
context timestamp 0x4402d
        no ports

Note 

The above output is from a Sun 4800 and indicates the IP address and that the protocol used is UDP. On certain operating systems, such as Tru64, the trace output does not reveal the cluster interconnect information.

Oracle's alert log (output listed below) is another great source of information:

Mon Dec 2 11:21:58 2002
cluster interconnect IPC version: Oracle UDP/IP with Sun RSM disabled
IPC Vendor 1 proto 2 Version 1.0

The ndd command at the operating system level will confirm the actual UDP size definition. The output below is from a Sun environment:

ora-db1:RAC1:oracle # ndd -get /dev/udp
name to get/set ? udp_xmit_hiwat
value ?
length ?
8192
name to get/set ? udp_recv_hiwat
value ?
length ?
8192

The output above reveals that UDP has been configured for an 8 KB packet size. Applying this finding to the data gathered from Oracle's views indicates that it would take 14,050 trips for all the blocks to be transferred across the cluster interconnect (112,394/8 = 14,050). If this were set to 64 KB, the number of round trips would be significantly reduced (112,394/64 = 1,756 trips).

The STATSPACK report is also a good source of information for determining interconnect latency. For example, the following extract indicates high timeouts during "gcs remote message" transfers.

                                                          Avg
                                            Total Wait   wait    Waits
Event                    Waits   Timeouts     Time (s)   (ms)     /txn
------------------    --------   --------   ----------   ----  -------
gcs remote message     391,629    377,590        7,017     18  2,163.7
ges remote message      77,311     74,159        3,488     45    427.1
gcs remote message     209,188    186,286        3,505     17    817.1
ges remote message      77,257     73,797        3,489     45    301.8
gcs remote message     599,108    323,177        6,943     12    303.5
ges remote message      81,552     72,571        3,485     43     41.3

The output above indicates high timeouts between remote message transfers; it can also be noticed that the number of waits and the waits per transaction are significantly high.
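Between STATSPACK snapshots, the same wait events can also be examined directly from the dynamic performance views. The query below is a sketch; the figures in GV$SYSTEM_EVENT are cumulative since instance startup, and TIME_WAITED is reported in centiseconds:

select inst_id, event, total_waits, total_timeouts, time_waited, average_wait
from   gv$system_event
where  event in ('gcs remote message', 'ges remote message')
order by inst_id, event;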

Another parameter that affects the interconnect traffic is DB_FILE_MULTIBLOCK_READ_COUNT. This parameter controls how many blocks are read from disk in a single I/O operation. When data needs to be transferred across the cluster interconnect, it also determines how many blocks each instance requests from the other in a single read transfer.

Sizing this parameter should be based on the interconnect latency and the packet sizes as defined by the hardware vendor, and after considering the operating system limitations (e.g., the Sun UDP max setting is only 64 KB).
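As a sketch, the current setting can be checked and adjusted from SQL*Plus; the value of 8 used below is purely illustrative and should instead be derived from the packet-size and latency considerations discussed above:

SQL> show parameter db_file_multiblock_read_count

SQL> -- illustrative value only; requires an SPFILE and takes effect at the next restart
SQL> alter system set db_file_multiblock_read_count = 8 scope=spfile sid='*';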

Kernel parameters that define the UDP parameter settings in the respective hardware environments are shown in Table 15.1.

Table 15.1: Kernel Parameters

Hardware            Parameter
------------------  -------------------------------
SUN (ndd)           udp_xmit_hiwat
                    udp_recv_hiwat
Linux               /proc/sys/net/core/rmem_default
                    /proc/sys/net/core/rmem_max
                    /proc/sys/net/core/wmem_default
                    /proc/sys/net/core/wmem_max
Tru64 (sysconfig)   udp_recvspace
                    udp_sendspace
HP (ndd)            tcp_xmit_hiwater_def
                    tcp_recv_hiwater_def

Note 

In Oracle 9i (9.2.0.1), there is a 32 KB limit set by Oracle on the amount of information that can be transferred across the cluster interconnect. In Oracle 9i (9.2.0.2) this limitation has been removed, and the amount of information that can be transferred now depends on the limits defined by the hardware vendor.

CLUSTER_INTERCONNECTS

This parameter provides Oracle with information on the availability of additional cluster interconnects that can be used for cache fusion activity. The parameter overrides the default interconnect setting determined at the operating system level with a preferred network for cluster traffic.

While this parameter provides advantages on systems where high interconnect latency is observed, configuring it can defeat the interconnect high-availability feature. In other words, an interconnect failure that would normally be transparent could instead cause an Oracle cluster failure, because Oracle continues to attempt access through the specified network interface.
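As an illustration only, the parameter can be set per instance in the server parameter file. The addresses for the first node below are the private interconnect addresses (ge1 and ge0) from the earlier netstat output; the addresses and SID values for the second node are assumed for the example. Because the parameter is static, SCOPE=SPFILE is required and the instances must be restarted:

SQL> alter system set cluster_interconnects = '172.16.1.1:172.16.0.129' scope=spfile sid='RAC1';
SQL> -- second node addresses are hypothetical, for illustration
SQL> alter system set cluster_interconnects = '172.16.1.2:172.16.0.130' scope=spfile sid='RAC2';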


