So what does grid computing mean to the database? The underpinning of the 'database grid' is Real Application Clusters, or RAC for short. RAC came into its own in Oracle9i, realizing the potential of Oracle Parallel Server (OPS) by making applications truly scalable, without modification. The central idea behind RAC is the same as the central idea behind grid computing: plug in nodes as needed to handle additional workload, or remove nodes and redeploy them elsewhere when the situation warrants.
Oracle Database 10g takes this further, both by simplifying the process of adding and removing nodes and by increasing the number of nodes a RAC cluster can contain. Beyond that, Oracle now provides end-to-end clustering solutions on all supported platforms, as well as its own cluster file system on some platforms and (as discussed in Chapter 3) a volume management solution in Automatic Storage Management (ASM). This chapter explains the concepts behind RAC in Oracle Database 10g and provides configuration steps for installing your own cluster. As before, in keeping with the philosophy of commodity hardware and operating systems, our focus will be on the Linux platform, but the principles apply to all supported platforms. Where appropriate, we will point out steps that are specific to Linux.
Historically, Oracle relied on operating system vendors to provide the clusterware layer needed at the OS level to enable Oracle Parallel Server (OPS), and later RAC, to function. The normal process for the HA DBA was to defer to the sysadmin and/or the hardware/OS vendor to configure the operating system, and then to create a cluster using the OS vendor's software or clustering software from a third party. Once the OS and cluster were configured, the DBA would undertake the installation and configuration of Oracle (either OPS or RAC). This scenario often led to confusion, and sometimes finger-pointing, between hardware vendors, software vendors, the sysadmin, and the DBA.
Starting with version 184.108.40.206.1 on the Windows platform, this began to change. Out of necessity, due to the lack of viable clustering software for Windows, Oracle began to provide a clustering layer on the Windows platform to enable the use of OPS on Windows. Eventually, with the rising popularity of Linux, Oracle was compelled to also provide clusterware for the Linux platform. Prior to Oracle9i, the clusterware on Windows was distributed through hardware vendors, coming indirectly from Oracle. Starting with Oracle9i, the clusterware became an integrated part of the Oracle9i Enterprise Edition offerings for both Windows and Linux.
Now, with Oracle Database 10g, Oracle has introduced Cluster Ready Services (CRS), the logical next step in the evolution of Oracle-provided clusterware. CRS is clusterware provided by Oracle to cluster together nodes on any supported operating system, including Sun, HP, Tru64, AIX, Windows, and Linux (all nodes in a cluster must run the same operating system). On any of these platforms, CRS can be used instead of the OS vendor's clusterware or third-party clusterware. It is also possible to use CRS alongside third-party or operating system clusterware: if the HA DBA chooses to stick with the vendor-provided or third-party clusterware, CRS can be used to integrate Oracle clustering with it, allowing the Oracle RDBMS to communicate and work correctly with the existing cluster software. When using Oracle's CRS, running RAC with the Standard Edition of Oracle is now supported.
CRS consists of three major components, which manifest themselves as daemons run out of inittab on Unix operating systems, or as services on Windows. The three daemons are ocssd, the cluster synchronization services (CSS) daemon; crsd, the main engine for maintaining the availability of resources; and evmd, the event logger daemon. Of these, ocssd and evmd run as the oracle user, while crsd runs as root. The crsd and evmd daemons are set to start with the respawn option, so that in the event of a failure they are restarted. When running as part of CRS, the ocssd daemon is started with the fatal option, meaning that a failure of the daemon leads to a node restart. This is required to prevent data corruption should the nodes lose contact with each other. Note, however, that the ocssd daemon is also used in single-instance environments, to enable the use of ASM, as mentioned in Chapter 3. If ocssd is running in a single-instance environment, independent of CRS, failure of the daemon is not fatal to the node. The next sections go into a bit more detail on the two components you most need to be familiar with: ocssd and crsd.
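On Linux, the CRS installation adds entries along the following lines to /etc/inittab (the identifiers and paths vary by release; treat this as an illustrative sketch rather than output copied from a particular install). Note that all three entries use the inittab respawn action; it is the fatal argument passed to the CSS wrapper script that turns an ocssd failure into a node restart:

```
h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
```

A quick `ps -ef | grep -E 'ocssd|crsd|evmd'` confirms the daemons are up, and shows crsd running as root while ocssd and evmd run as oracle.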
CSS is the foundation for interprocess communication in a cluster environment. As such, CSS is also used to handle the interaction between ASM instances and regular RDBMS instances in a single-instance environment. In a cluster environment, CSS also provides group services: dynamic information on which nodes and instances are part of the cluster at any given time, and static information such as node names and node numbers (which change only when nodes are added or removed). CSS also handles rudimentary locking functionality within the cluster, though most locking is handled by the Integrated Distributed Lock Manager within the RDBMS itself. In addition to its other jobs, CSS is responsible for maintaining a heartbeat between the nodes in the cluster and monitoring the voting disk to guard against split-brain failures.
The crsd daemon is primarily responsible for maintaining the availability of application resources, also known as services, which we will discuss in Chapter 6. The crsd daemon starts and stops these resources, relocates them to another node in the event of a failure, and maintains their profiles in the Oracle Cluster Registry (OCR). In addition, crsd oversees the caching of the OCR for faster access and handles backups of the OCR. Chapter 6 is devoted to these operations, all of which fall under the realm of the crsd daemon.
Oracle CRS takes advantage of virtual IP addresses, or VIPs, to enable faster failover when a node fails. Each node therefore has not only its own statically assigned IP address, but also a virtual IP address. The listener on each node actually listens on the VIP, and client connections are meant to come in on the VIP. Should the node fail, the VIP fails over and comes online on one of the other nodes in the cluster.
Note that the purpose of this is not to let clients continue connecting to the database through that VIP on the other node. Rather, the purpose of the IP address failover is to reduce the time it takes for the client to recognize that a node is down. If the IP has failed over and is responding from another node, the client gets an immediate response when attempting a connection on that VIP. The response, however, is not a successful connection but a logon failure, indicating that while the IP is active, there is no instance available at that address. The client can then immediately retry the connection against another address in its address list, and successfully connect to a VIP still assigned to one of the surviving, functioning nodes in the cluster. This is referred to as rapid connect-time failover, and is discussed in more detail in Chapter 11.
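Connect-time failover of this kind is driven by the client's address list. A minimal tnsnames.ora entry might look like the following sketch; the hostnames (rac1-vip, rac2-vip) and the service name grid are hypothetical placeholders, not values from this chapter's configuration:

```
GRID =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = ON)
      (FAILOVER = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac2-vip)(PORT = 1521))
    )
    (CONNECT_DATA = (SERVICE_NAME = grid))
  )
```

With FAILOVER = ON, a logon failure on one address causes the client to try the next address in the list; because the failed node's VIP answers immediately (rather than leaving the client to wait out a TCP timeout), that retry happens in seconds.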
CRS is provided on a separate CD-ROM from the RDBMS install. CRS must be installed before the Oracle RDBMS, and it must go into its own home (generally referred to as the CRS_HOME), separate from the home used by the Oracle RDBMS. Ideally, CRS should be the first Oracle product you install. For this reason (among others), if you are planning an Oracle Database 10g single-instance install but think you may use CRS/RAC in the future, consider installing CRS first, in its own home. You can then run the Oracle install as Local Only (that is, without the RAC option), while retaining the opportunity to install with the RAC option later, with CRS already in place. Doing so leaves the ocssd daemon running out of the CRS home, where it can be used for ASM by all other Oracle Database 10g installations, whether they are local-only installs or RAC installs.
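A typical layout keeps the two homes side by side under an OFA-style directory tree. The paths below are hypothetical examples, not mandated values; the one hard requirement the sketch illustrates is that the CRS home and the RDBMS home are distinct directories:

```shell
#!/bin/sh
# Hypothetical OFA-style homes; adjust to your site's standards.
ORACLE_BASE=/u01/app/oracle
ORA_CRS_HOME=$ORACLE_BASE/product/10.1.0/crs_1   # CRS installed first, in its own home
ORACLE_HOME=$ORACLE_BASE/product/10.1.0/db_1     # RDBMS home, installed afterward
export ORACLE_BASE ORA_CRS_HOME ORACLE_HOME

# Sanity check: the two homes must never be the same directory.
if [ "$ORA_CRS_HOME" = "$ORACLE_HOME" ]; then
    echo "ERROR: CRS and the RDBMS must not share a home" >&2
    exit 1
fi
echo "CRS_HOME=$ORA_CRS_HOME"
echo "ORACLE_HOME=$ORACLE_HOME"
```

Keeping the homes separate also keeps patching independent: the CRS stack can be upgraded without touching the RDBMS home, and vice versa.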