Chapter 5: Database Administration in a RAC Environment




So, now that you have configured your RAC cluster and have a running RAC environment, what is needed to keep it that way? How is a RAC database different from a regular database, and how is it the same? What special knowledge is needed in the Brave New World of RAC to keep your system highly available? The reality and beauty of a RAC environment is that there is not that big a difference between a single-instance configuration and a RAC environment. However, there are enough differences to warrant devoting a chapter to them, as well as to some of the other administrative tasks involved in maintaining a RAC system.

RAC Essentials

In this section, we will discuss some of the underlying concepts of how RAC implements multiple instances for a single database, and give you the essentials for being able to determine how the database is set up and works. This includes points such as special parameters for RAC, how parameters differ for different instances, how redo and undo are managed, and differences in data dictionary views. This section will provide the most value to those who are new to the world of RAC with Oracle.

Instance Naming

This is just a short blurb on instance naming conventions in a RAC environment. We must first distinguish between the instance name and the database name. As noted in Chapter 4, the DBCA will prompt you for the database name; this will always be the same regardless of the node you are on, as it is associated with the physical database itself. The instance name, on the other hand, is associated with the memory structures on each node, so the instance name must be unique to each node. As a general rule, you want the instance name to be the same as the database name with the instance number appended, that is, <db_name>1, <db_name>2, and so on. This is how the DBCA will name the instances (note that sid and instance name are synonymous here). While you could deviate from this standard, it is strongly recommended that you do not.
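For example, a quick query against GV$INSTANCE (a minimal illustration, using the grid database from the examples in this book) shows the instance name and number on each node:

SQL> select inst_id, instance_name, instance_number from gv$instance;

   INST_ID INSTANCE_NAME    INSTANCE_NUMBER
---------- ---------------- ---------------
         1 grid1                          1
         2 grid2                          2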

spfile in the RAC World

This brings us to the topic of the spfile, as that is the first file read as your RAC instance starts up. The spfile, or server parameter file, contains all of the parameters read by each instance. The spfile was introduced in Oracle9i as an alternative to the regular parameter file, or pfile. The advantage of an spfile is that it allows you to know for certain which file was used by a given instance to start up; this is done via the following command:

SQL> show parameter spfile

With a regular parameter file, once the instance is started, there is no record of which file was used for startup. You can see the parameter values used at startup by checking the alert log, but not which file they were read from.
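As an illustration, the output on one of the grid instances might look something like the following (the path shown is purely hypothetical; yours will reflect wherever the shared spfile actually resides):

SQL> show parameter spfile

NAME      TYPE        VALUE
--------- ----------- ----------------------------------------
spfile    string      /ocfs/oradata/grid/spfilegrid.ora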

Viewing the spfile

While the spfile may appear to be viewable with a CAT or MORE command or a text editor such as WordPad (on Windows), it is actually a binary file. As such, the only way to edit or change it is from within the database itself, via an ALTER SYSTEM command. This is a double-edged sword, as it can sometimes complicate the simple task of changing a parameter. However, it also allows the HA DBA to make dynamic parameter changes that take effect immediately and, at the same time, have permanence, because the same command can write the change to the spfile for use on subsequent startups. This is not possible using a regular parameter file.
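If all you need is a readable copy of the current settings, a simple alternative is to dump the spfile out to a text pfile and view that instead (the file name and path here are just an example):

SQL> create pfile='/tmp/init_grid_copy.ora' from spfile;

File created.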

In a RAC environment, all instances should share the same server parameter file. Any parameter that must be unique for a given instance is prefaced with the instance name, in the form

<sid_name>.<parameter_name>

For parameters that apply to all instances (the majority of parameters), the instance name is replaced by an asterisk (*). As an example, view these parameters:

grid2.instance_number=2
grid1.instance_number=1
*.java_pool_size=50331648
*.job_queue_processes=2

In this example, each instance_number setting applies only to the named instance (grid1 or grid2), while the values for java_pool_size and job_queue_processes apply to all instances in the cluster.
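You can also see this instance-versus-clusterwide breakdown from within the database by querying V$SPPARAMETER, which includes a SID column (a minimal sketch; output trimmed for readability):

SQL> select sid, name, value from v$spparameter
  2   where name in ('instance_number', 'java_pool_size');

SID      NAME                 VALUE
-------- -------------------- ------------
grid1    instance_number      1
grid2    instance_number      2
*        java_pool_size       50331648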

Modifying the Parameters in the spfile

Should you need to change a parameter, you will do so via an ALTER SYSTEM SET <parameter_name> command. This section briefly discusses how to modify the parameters in the spfile. The first thing to be aware of is that some parameters can be modified dynamically and take effect immediately, while others require a restart of the instance before they take effect. In either case, you still modify the parameter via an ALTER SYSTEM command, but the SCOPE argument determines how and when the change is implemented.

By default, SCOPE is set to BOTH, meaning that if it is not specified, the ALTER SYSTEM command will assume that you want to write the new value out to the parameter file and also have it take effect immediately in memory. If you try to modify a parameter in memory that cannot be changed dynamically, you will get an ORA-02095 or an ORA-32018 error message. If this occurs, you must specify the SCOPE=SPFILE option, indicating that the parameter should be written only to the spfile and will take effect at the next startup of the instance.
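As a quick sketch of the three possibilities (the parameter and value here are only illustrative, using DB_CACHE_ADVICE, which happens to be dynamically modifiable):

-- Takes effect in memory only; the change is lost at the next restart
alter system set db_cache_advice=ready scope=memory;

-- Written to the spfile only; takes effect at the next startup
alter system set db_cache_advice=ready scope=spfile;

-- The default: changes memory immediately and records the value in the spfile
alter system set db_cache_advice=ready scope=both;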

Specifying the SID when Modifying Parameters

In a RAC environment, a key component of the ALTER SYSTEM command is the sid option. This tells Oracle which sid the change should apply to. If you leave the sid clause out, the default will be sid='*', meaning that the change is meant to apply to all instances. If you wish a change to apply to only one instance, you must specify sid='<sid_name>'. For example, the following command will set the parameter in the spfile for sid grid1:

SQL> alter system set db_cache_size=500m scope=spfile sid='grid1';

System altered.

This means that grid1 will have a db_cache_size of 500MB after it is restarted; because SCOPE=SPFILE was specified, the change does not take effect until then. In addition, the change impacts only the grid1 instance, not any other instances in the cluster.
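Conversely, a change intended for every instance in the cluster can be made in a single statement by specifying (or simply defaulting to) sid='*'. The value below is only an example:

SQL> alter system set job_queue_processes=4 scope=both sid='*';

System altered.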

RAC-Specific Parameters

The parameters instance_number and thread are examples of parameters that have specific purposes in a RAC environment; each instance must have a unique value for each of them. It is not that these parameters do not exist for a regular instance; it is just that in a single-instance environment, each defaults to 1. In a RAC environment, they take on a special meaning: the thread number defines which redo logs are associated with a given instance, and the instance number maps to the inst_id column in the data dictionary views beginning with GV$ (as opposed to the V$ views). These topics are discussed later in this chapter, in the sections 'Cache Coherency in a RAC Environment' and 'Managing REDO and UNDO in a RAC Environment.' The instance number is also used in other areas, such as job scheduling. To avoid confusion, the thread number and instance number should be the same for a given instance.
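A quick way to verify that the thread and instance numbers line up as recommended is to compare them across the cluster (a simple sketch):

SQL> select inst_id, instance_number, thread# from gv$instance;

   INST_ID INSTANCE_NUMBER    THREAD#
---------- --------------- ----------
         1               1          1
         2               2          2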

Aside from the instance and thread parameters, there are a few other parameters that take on a special significance in a RAC environment. These parameters include (but are not limited to) CLUSTER_DATABASE, CLUSTER_INTERCONNECTS, ACTIVE_INSTANCE_COUNT, and MAX_COMMIT_PROPAGATION_DELAY. We will cover these parameters briefly, and what they mean to the HA DBA, in this next section.

CLUSTER_DATABASE

CLUSTER_DATABASE essentially defines the instance as being part of a cluster. It determines if the controlfile will be mounted in a shared mode, meaning other instances can also mount it. In most cases, all instances must have this value set to TRUE. However, there are some unique situations where all instances but one will need to be shut down, and a single instance will be open with CLUSTER_DATABASE set to FALSE. One example of this is when upgrading or applying a patch. The upgrade scripts are generally run after a STARTUP MIGRATE command, which must be done from one instance with CLUSTER_DATABASE set to FALSE. Another example is when putting the database into archivelog mode, or flashback mode.
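A hedged sketch of that sequence is shown below, run from a single instance after all other instances have been shut down; the exact steps for your upgrade or patch are spelled out in its readme, so treat this only as an outline:

-- From the one remaining open instance
alter system set cluster_database=false scope=spfile sid='*';
shutdown immediate
startup migrate

-- ... run the upgrade or patch scripts here ...

alter system set cluster_database=true scope=spfile sid='*';
shutdown immediate
startup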

CLUSTER_INTERCONNECTS

CLUSTER_INTERCONNECTS allows you to specify the network that is used for the IPC traffic for the cluster. As we have demonstrated, this is specified during the cluster install, so you generally will not need to set this parameter. However, as we noted in Chapter 4, you may need to set it if you do not have Patch 3385409, to ensure that the correct network is used for the interconnect. You may also find it helpful to set this parameter if you have multiple networks for the interconnect, and/or if you add network cards after the CRS install. When modifying CLUSTER_INTERCONNECTS, you must do it for each sid, specifying the sid as follows:

alter system set cluster_interconnects = '10.1.1.1' scope=spfile sid='grid1';
alter system set cluster_interconnects = '10.1.1.2' scope=spfile sid='grid2';

Note, however, that you do not have to connect individually to each instance; you can issue the above commands from the same session. If you wish to specify multiple networks for the interconnect, specify both IP addresses in the same command, separated by a colon. For example:

alter system set cluster_interconnects = '10.1.1.1:192.168.0.1' scope=spfile sid='grid1';
alter system set cluster_interconnects = '10.1.1.2:192.168.0.2' scope=spfile sid='grid2';

There are two ways to confirm that the correct network is being used for the interconnect traffic. The first, and simplest, is to check the alert log immediately after the startup parameters are listed. You should see an entry in the alert log similar to the following:

Cluster communication is configured to use the following interface(s) for this instance
  10.1.1.2

If your database has been up and running so long that you no longer have the alert log from the startup (it is a highly available database, after all), then you can check the interconnect via the ORADEBUG IPC command in SQL*Plus. You must first attach to your own process, as follows:

SQL> oradebug setmypid
Statement processed.
SQL> oradebug ipc
Information written to trace file.

The trace file this information is written to will be in the udump directory; look for the most recently created file there (the command ls -ltr will sort the files by modification time, newest last). The trace file will have an entry such as this toward the end, showing the IP address that is used:

SSKGXPT 0xbf8cd8c flags SSKGXPT_READPENDING     info for network 0
        socket no 8     IP 10.1.1.2     UDP 52804
        sflags SSKGXPT_WRITESSKGXPT_UP

ACTIVE_INSTANCE_COUNT

ACTIVE_INSTANCE_COUNT applies only in a two-node cluster. If it is set to a value of 1, the first instance to start up is considered the primary instance, and the second is considered a secondary instance that will only accept connections if the primary goes down for some reason. In a grid computing environment, you will generally want all instances active and accepting connections, so this parameter will most likely be set to the total number of instances in the cluster.
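If you did want the primary/secondary behavior on a two-node cluster, the parameter would be set like any other static, clusterwide parameter (a sketch only; a restart is required for it to take effect):

SQL> alter system set active_instance_count=1 scope=spfile sid='*';

System altered.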

MAX_COMMIT_PROPAGATION_DELAY

MAX_COMMIT_PROPAGATION_DELAY defines the maximum amount of time that the system change number (SCN) is held in the SGA of the local instance before it is refreshed by the LGWR process. The default value of 700 (7 seconds) is generally adequate. However, in some extreme cases, where committed DML statements (that is, inserts, updates, or deletes) on one node are immediately followed by the need to query the new data from another node, this value may need to be changed to 0. Setting this parameter to 0 causes commits to be broadcast immediately to all other nodes, ensuring that the SGA has the current SCN on all nodes. When this parameter is set to 0, you will see an entry in the alert log like this:

This instance was first to open
Picked broadcast on commit scheme to generate SCNs

Otherwise, if this parameter is set to a value greater than 0, you will see that the Lamport scheme is chosen:

This instance was first to open
Picked Lamport scheme to generate SCNs

The downside to setting this parameter to 0 is a slight performance impact, as it increases the messaging traffic on the interconnect and causes LGWR to do more work to keep the SCNs in sync. In general, we recommend leaving this value at its default, which allows the messages that update the SCN across instances to be combined with other cluster communications and thereby handled more efficiently. If you do need to change this value, it must be set to the same value on all cluster nodes.
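If you do determine that broadcast-on-commit is required, the change must be made identically for all instances and takes effect only after a restart, for example:

SQL> alter system set max_commit_propagation_delay=0 scope=spfile sid='*';

System altered.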

Additional Background Processes in a RAC Instance

Aside from additional parameters in a RAC environment, you will also find that there are additional background processes associated with each RAC instance. The purpose of these processes is to work together in a coordinated fashion to maintain the locks necessary for multiple instances to access resources simultaneously, and to ensure that these resources are made available to the instances where they are most needed, in a timely fashion.

A simple query like the following will give you an idea of the background processes involved in a RAC environment:

SQL> select name, description from v$bgprocess where PADDR <> '00';

NAME  DESCRIPTION
----- ----------------------------------------------------------------
PMON  process cleanup
DIAG  diagnosibility process
LMON  global enqueue service monitor
LMD0  global enqueue service daemon 0
LMS0  global cache service process 0
LMS1  global cache service process 1
MMAN  Memory Manager
DBW0  db writer process 0
LGWR  Redo etc.
LCK0  Lock Process 0
CKPT  checkpoint
SMON  System Monitor Process
RECO  distributed recovery
CJQ0  Job Queue Coordinator
QMNC  AQ Coordinator
MMON  Manageability Monitor Process
MMNL  Manageability Monitor Process 2

17 rows selected.

Of these processes, the ones specific to a RAC instance are the DIAG, LCK, LMON, LMDn, and LMSn processes. We will give a brief description of each and how they interact in a RAC environment next.

DIAG: Diagnosability Daemon

The diagnosability daemon is responsible for capturing information on process failures in a RAC environment, and writing out trace information for failure analysis. The information produced by DIAG is most useful when working in conjunction with Oracle Support to troubleshoot causes for a failure. Only a single DIAG process is needed for each instance.

LCK: Lock Process

The lock process (LCK) manages requests that are not cache-fusion requests, such as row cache requests and library cache requests. Only a single LCK process is allowed for each instance. LCK maintains a list of lock elements and uses this list to validate locks during instance recovery.

LMD: Lock Manager Daemon Process

The lock manager daemon is also known as the global enqueue service daemon, as it manages global enqueue and global resource access. From within each instance, the LMD process manages incoming remote resource requests (that is, requests for locks that come from other instances in the cluster). It is also responsible for deadlock detection and monitoring for lock conversion timeouts.

LMON: Lock Monitor Process

LMON is the global enqueue service monitor. It is responsible for the reconfiguration of lock resources when an instance joins or leaves the cluster, and it is also responsible for dynamic lock remastering (discussed later in this section). LMON will generate a trace file whenever a reconfiguration occurs (as opposed to a remastering of a subset of locks). It is the responsibility of LMON to check for the death of instances clusterwide and to initiate reconfiguration as quickly as possible.

LMS: Lock Manager Server Process

The LMS process (or global cache service process) is in charge of shipping the blocks between instances for cache-fusion requests. In the event of a consistent-read request, the LMS process will first roll the block back, creating the consistent read (CR) image of the block, and will then ship that version of the block across the interconnect to the foreground process making the request at the remote instance. In addition, LMS must interact with the LMD process to retrieve lock requests placed by LMD. An instance may dynamically generate up to 10 LMS processes, depending on the load.

Cache Fusion: A Brief Intro

To understand further what these background processes are doing, let's take a moment to discuss the basics of how RAC manages access to shared datafiles from multiple nodes. The centerpiece of this is what is known as cache fusion. Cache fusion essentially enables the shipping of blocks between the SGAs of nodes in a cluster, via the interconnect, avoiding the need to push a block down to disk so that it can be reread into the buffer cache of another instance. When a block is read into the buffer cache of an instance in a RAC environment, a lock resource is assigned to that block (different from a row-level lock) to ensure that other instances are aware that the block is in use. If another instance then requests a copy of that block, which is already in the buffer cache of the first instance, the block can be shipped across the interconnect directly to the SGA of the other instance. If the block in memory has been changed, but that change has not been committed, a CR copy is shipped instead. This means that, whenever possible, data blocks move between each instance's buffer cache without being written to disk, the key being to avoid the additional I/O that would otherwise be necessary to synchronize the buffer caches of multiple instances. This is why it is critical to have a high-speed interconnect for your cluster: the assumption is that the interconnect will always be faster than going to disk.

Dynamic Resource Mastering

In addition, Oracle Database 10g introduces the concept of dynamic resource remastering. Locks are resources held in the SGA of each instance and are used to control access to database blocks. Each instance generally holds, or masters, a certain number of locks, which are associated with a range of blocks. When an instance requests a block, a lock must be obtained for that block from the instance that currently masters those locks, which may or may not be the instance making the request. With dynamic remastering, if one instance requests locks for certain blocks more often than any of the other instances do, those locks will eventually be moved into the SGA of the requesting instance, making future lock requests more efficient.

Reconfiguration

In the case of a node's death, the process of remastering that node's locks across the remaining instances is referred to as a reconfiguration. When a node or an instance dies or is taken offline, the locks (resources) that it previously mastered are redistributed among the remaining instances. When an instance rejoins the cluster, a reconfiguration again takes place, and the new instance ends up mastering a portion of the locks previously held by the other instances. A reconfiguration involves all of the lock resources in the cluster and is therefore distinct from the dynamic remastering of a subset of locks.

A reconfiguration can be seen in the alert log, prefaced with the line

Reconfiguration started (old inc 1, new inc 2)

and ending with the line

Reconfiguration complete

In addition, you can view the allocation of resources between instances by querying the view GV$RESOURCE_LIMIT.
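For example, a query such as the following shows how global cache and global enqueue resources are currently allocated on each instance (the resource names shown are an assumption based on a typical 10g instance and may vary by release):

SQL> select inst_id, resource_name, current_utilization, max_utilization
  2    from gv$resource_limit
  3   where resource_name in ('gcs_resources', 'ges_ress', 'ges_locks');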

Cache Coherency in a RAC Environment

Aside from the normal performance metrics monitored in a single-instance environment, a RAC environment requires that some additional metrics be monitored. Primarily, we recommend that the HA DBA focus on metrics related to messages across the interconnect, to gauge both the amount of interconnect traffic and its response time. This traffic essentially falls into two categories: global cache service (GCS) traffic and global enqueue service (GES) traffic.

Global Cache Service

The global cache service relates to the idea of the global buffer cache, which is integral to the cache-fusion concept; the global cache, in other words, refers to database blocks. The global cache service is responsible for maintaining cache coherency in this global buffer cache by ensuring that any time an instance attempts to modify a database block, a global lock resource is acquired, preventing another instance from modifying the same block at the same time. The instance making the change will have the current version of the block (with both committed and uncommitted transactions) as well as a past image of the block. Should another instance request that block, it is the duty of the global cache service to track who has the block, what version of the block they have, and what mode the block is held in. The LMS process is the key component of the global cache service.

Global Enqueue Service

Aside from the maintenance and management of database blocks themselves, a RAC environment must also coordinate certain other resources between nodes. Enter the global enqueue service. The global enqueue service, or GES, is primarily responsible for maintaining coherency in the dictionary cache and the library cache. The dictionary cache is essentially a cache of data dictionary information stored in the SGA of an instance for fast access. Since this dictionary information is stored in memory, changes on one node that result in dictionary changes (such as DDL statements) must be immediately propagated to the dictionary cache on all nodes. The GES is responsible for handling this and avoiding any discrepancies between instances. By the same token, library cache locks are taken out on objects within the database when the SQL statements that affect those objects are parsed. These locks must be maintained among instances, and the global enqueue service must ensure that deadlocks do not occur between multiple instances requesting access to the same objects. The LMON, LCK, and LMD processes work together to perform the functions of the global enqueue service.

GV$ Views

With all of this globalization we have discussed, it is only natural that we point out the addition of global views in a RAC environment. These are views that are the same as the more commonly used V$ views, except that they will have a column added called INST_ID, which will map to the instance_number of each instance in the cluster. As such, in a three-node cluster, queries against a GV$ view will give you three times as many rows, because each instance will have its own set of data to display. Knowing this, you can query stats for a specific instance or all instances without needing to connect specifically to each node. An example of some of the information you can gather is shown in the following query:

SQL> select * from gv$sysstat where name like '%gcs %';

   INST_ID STATISTIC# NAME                                CLASS      VALUE
---------- ---------- ------------------------------ ---------- ----------
         1         38 gcs messages sent                      32        716
         2         38 gcs messages sent                      32      57325

This allows you to see, with one query, that instance number 1 has sent 716 gcs messages, whereas instance number 2 has sent 57325 messages, a disparity that may warrant further investigation.

Monitoring RAC Metrics Using AWR

As discussed in Chapter 3, the reports generated from the Automatic Workload Repository in a RAC environment contain a section titled 'RAC Statistics,' where you can easily gather information on operations related to the global cache and global enqueues. You can get a quick look at how frequently reads are satisfied from the local cache versus the remote cache, how frequently you need to go to disk for a read, and the average get times for various operations. Generally, these average values should not exceed 30 ms at the top of the range, and they are normally lower. For example, the average current block receive time may be as high as 30 ms, but the average CR block receive time should not be much higher than 15 ms. The current block pin time, plus the send and receive times, together account for the total average time to process a current block request, and this total should not exceed 30 ms.
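Outside of the AWR report, you can compute a similar average directly from GV$SYSSTAT. The following is a minimal sketch, assuming the Oracle Database 10g statistic names 'gc cr blocks received' and 'gc cr block receive time' (names may vary slightly by release) and that at least some CR blocks have been received; the time statistic is in hundredths of a second, so it is multiplied by 10 to express the result in milliseconds:

SQL> select r.inst_id,
  2         r.value "CR blocks received",
  3         t.value "CR block receive time (cs)",
  4         round(t.value * 10 / r.value, 2) "Avg CR receive time (ms)"
  5    from gv$sysstat r, gv$sysstat t
  6   where r.name = 'gc cr blocks received'
  7     and t.name = 'gc cr block receive time'
  8     and r.inst_id = t.inst_id;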

Interconnect Performance

If you find that the performance of the interconnect is sub-par, you should begin by looking at the interconnect hardware. The first consideration, of course, is the speed itself. As we have mentioned several times, you want the fastest possible network to be used for the interconnect. In addition, you should ensure that the UDP buffers are set as high as possible. On Linux, you can check this via the following command:

sysctl net.core.rmem_max net.core.wmem_max net.core.rmem_default net.core.wmem_default
net.core.rmem_max = 65535
net.core.wmem_max = 131071
net.core.rmem_default = 65535
net.core.wmem_default = 65535

Alternatively, you can read the associated values directly from the respective files in the directory /proc/sys/net/core. These values can be increased to a maximum of 256KB apiece via the following sysctl commands:

sysctl -w net.core.rmem_max=262144
sysctl -w net.core.wmem_max=262144
sysctl -w net.core.rmem_default=262144
sysctl -w net.core.wmem_default=262144
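Note that values set with sysctl -w do not survive a reboot. To make them permanent on a typical Linux system (an assumption; check your platform documentation), the equivalent entries would be added to /etc/sysctl.conf:

net.core.rmem_max = 262144
net.core.wmem_max = 262144
net.core.rmem_default = 262144
net.core.wmem_default = 262144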





