To ensure that your CRS install goes as smoothly as possible, you want to be certain that you have done the necessary preparation. If you have, the actual install itself will take only a few minutes; the preparation is the key. The preparation can be broken down into two major categories: first, configuring the shared storage, and second, configuring the networking components. In an upcoming section, we will go through a workshop that walks you through the steps for configuring the shared storage.
HA Workshop: Configure a Linux/RAC Cluster
This HA Workshop is geared toward setting up a test environment with a two-node RAC cluster running Red Hat Linux 3.0 Advanced Server. While you have several options to choose from for setting up a test environment, including shared SCSI devices, NFS, and so forth, we have chosen to use Firewire drives for this workshop. The primary reasons for this are the cost and the simplicity. We caution you that Firewire is not an officially supported environment for RAC, and there are some gotchas. However, there is a concerted effort underway to support the IEEE 1394 standard on Linux (see http://www.linux1394.org ), and as such, later versions of the kernel are increasingly improving support for Firewire. We demonstrate it here because it does afford a low-cost means of testing the functionality and concepts of RAC, using hardware that is readily affordable and accessible for just about anyone. We believe in a chicken in every pot, two cars in every garage, and a test cluster under the desk of every DBA. In the vein of a very low-cost test system, we were able to configure a two-node RAC cluster using hardware that cost around $1,500.00 (less than the cost of the laptop I am currently using to write this chapter). While you do not have to do it so cheaply, this also demonstrates how RAC can be used to turn low-cost commodity hardware into a highly available system.
Step 1. Procure/beg/borrow the necessary hardware. The hardware specs on our cluster are as follows:
Two Refurbished Single CPU Pentium 4 machines - $825.00
  Single CPU - 2.6 GHz Processor
  512 MB RAM
  40 GB Internal Hard Drive
  CD-ROM Drive
  On-Board 100Mb Ethernet
Memory upgrade to 1GB on each node - $300.00
80GB IDE Drive - $117.00
Firewire Enclosure and Firewire cables - $95.00
2 Firewire Cards - $95.00
2 Additional Ethernet Network Cards - $40.00
10/100 Ethernet Switch - $50.00
The Firewire enclosure must support the OXFORD911 chipset standard, to allow simultaneous multinode access. For more information, see http://oss.oracle.com/projects/firewire/.
Step 2. Install Red Hat 3.0 on each node, using the Install Everything option to ensure that you have all of the necessary components. During the install, ensure that both network cards are in place to simplify the assignment of IP addresses to the cards. See the upcoming section on network configuration for information on IP addresses for the private cards.
Step 3. Install kernel-unsupported-2.4.21-4.EL.i686.rpm from Red Hat Disk3 on each node. This kernel has device support for many nonstandard devices, including the Firewire cards and drive:
cd /mnt/cdrom/RedHat/RPMS
rpm -iv kernel-unsupported-2.4.21-4.EL.i686.rpm
Step 4. Add the following lines to the /etc/modules.conf on each node:
options sbp2 sbp2_exclusive_login=0
post-install sbp2 insmod sd_mod
post-remove sbp2 rmmod sd_mod
The sbp2_exclusive_login=0 setting essentially disables exclusive access to the device from one node. By default, this value is set to 1, meaning exclusive_login is true, or enabled, and only one node will be able to access the disk. Setting this to 0, or disabled, will allow both nodes in the cluster to access the disk simultaneously. In theory, this should allow up to four machines to access the drive at once.
Step 5. Reboot each node. You should be at the console during the boot process, as Red Hat should detect your disk during startup. The Add Hardware Wizard will pop up and prompt you to configure the disk, but if you do not respond within 30 seconds, the boot process will continue without configuring the disk. If the reboot does not detect your disk, proceed to Step 6 regardless; you may need to reboot again after running the commands in Step 6.
Step 6. After the reboot, run the following commands to detect the cards and drive:
modprobe sd_mod
modprobe scsi_mod
modprobe ohci1394
modprobe sbp2
echo "scsi add-single-device 0 0 0 0" > /proc/scsi/scsi
As an alternative to the echo command, you can also run the rescan-scsi-bus.sh shell script, which is available for download from a number of sites.
Step 7. After running these commands, you should be able to see the drive. You can confirm this by running the dmesg command:
dmesg | more
Look through the dmesg output for lines such as
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
 sda: sda1 < sda5 sda6 sda7 sda8 sda9 >
In this case, you can see that the SCSI device is named sda. You can now begin to prepare the drive by running commands such as fdisk, using /dev/sda as the target:

fdisk /dev/sda
Step 8. Once you are in fdisk, type m to see the full menu of commands. To create a new partition, choose n. You should first create an extended partition, equal to the size of the entire shared disk; this first partition will be created as /dev/sda1. After this, the remaining partitions can be created as logical partitions. The number of partitions you create will depend on what type of file system you intend to use. For OCFS, we recommend a 100MB RAW partition for the OCR and a 50MB RAW partition for the voting disk; the rest can be used for the CFS disk. For ASM, we recommend the same, where the remainder is used for ASM. However, for ASM, if you want to test the functionality of having multiple disks, you can partition the remaining space into multiple logical partitions. This will allow you to simulate the behavior of adding and removing disks from a disk group, where the disks are actually just partitions on the same drive. Of course, this will not give you any true redundancy, should the actual disk fail. Once finished with the partitioning, type w to save and exit. The command fdisk -l will allow you to see all of the partitions created and confirm their configuration.
Step 9. In order to ensure that the disk is visible automatically on reboot from each node, you can add the commands listed in Step 6 to the /etc/rc.local file, which is executed at startup each time the machine is restarted. The /etc/rc.local script will then look like this:
touch /var/lock/subsys/local
modprobe sd_mod
modprobe scsi_mod
modprobe ohci1394
modprobe sbp2
echo "scsi add-single-device 0 0 0 0" > /proc/scsi/scsi
You can also confirm that the disk is discovered by checking the /var/log/messages file:
# cat messages | grep sda
Jan 10 14:57:37 rmsclnxclu2 kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Jan 10 14:57:37 rmsclnxclu2 kernel: SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
Jan 10 14:57:37 rmsclnxclu2 kernel:  sda: sda1 < sda5 sda6 sda7 sda8 sda9 >
As we mentioned, Firewire drives do have their quirks. If, after running through the previous steps, the drive is still not recognized, try powering off all nodes and the drive. Then power each piece on one at a time, starting with the drive first. Wait for each node to boot completely before attempting to start the next node.
For networking configurations, as we mentioned above, you should absolutely have a minimum of two network cards, with at least one of them dedicated solely to the private interconnect. This network will be used for cluster communication as well as cache fusion traffic between the instances, once you have your database created. It is imperative that these networks be correctly configured before the installation of CRS.
The card that is used for the public network should, of course, be assigned an IP address that is accessible on the network, and most likely that IP address will be registered in your DNS server. The IP assigned to the card for the private network will depend on how you configure that network. While you may be able to successfully configure a two-node cluster using only a crossover cable, it is strongly recommended that a switch (not a hub) be used for the private network. This will allow for more than two nodes on the interconnect, should the need arise, and will also allow for constant activity to the private card, even if or when the other node is down, preventing the card from being shut off or disabled by the OS.
In addition, it is strongly recommended that the network speed for the interconnect be as fast as you can afford. While 100Mb Ethernet cards and switches are adequate for testing purposes (certainly for the simple test environment that we have demonstrated), in a production environment you should plan for a Gigabit Ethernet network for the interconnect, at a minimum. The advantages of cache fusion in a RAC environment can be quickly lost if the interconnect is not robust enough to handle the amount of traffic required to ship high numbers of blocks between instances.
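To put rough numbers on that claim, here is a back-of-the-envelope calculation (shell arithmetic; the 8KB database block size is our assumption for illustration, not a value from this setup) of the ceiling on blocks shipped per second, ignoring all protocol overhead:

```shell
# rough ceiling on blocks shipped per second, ignoring all protocol overhead
# (an 8KB database block size is assumed for illustration)
block=8192
fast_eth=$(( 100 * 1000000 / 8 / block ))    # 100Mb Ethernet in bytes/sec, divided by block size
gig_eth=$(( 1000 * 1000000 / 8 / block ))    # Gigabit Ethernet
echo "100Mb: ~$fast_eth blocks/sec   Gigabit: ~$gig_eth blocks/sec"
```

Real sustained rates will be lower once TCP/IP and Oracle messaging overhead are counted, but the order-of-magnitude gap between the two networks is the point.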
The first step in the configuration of the network piece, after having installed multiple NICs in the machine, is to configure the public and private names in the /etc/hosts file. Even if the public names are registered in your network's DNS server, we recommend defining them in the hosts file as well. If the network for your interconnect is dedicated for use only by nodes of the cluster (as we have recommended), you may assign any IP address that you wish to these cards, provided each machine knows about it by having an entry in the hosts file:
127.0.0.1       localhost.localdomain localhost
192.168.4.11    rmsclnxclu1.us.oracle.com rmsclnxclu1
192.168.4.12    rmsclnxclu2.us.oracle.com rmsclnxclu2
192.168.4.22    rmscvip2.us.oracle.com rmscvip2
192.168.4.21    rmscvip1.us.oracle.com rmscvip1
10.1.1.2        private2
10.1.1.1        private1
In this example, the IP addresses and node names of private1 and private2 are defined in the hosts file on each node, and in the networking properties of the network card itself. Verify this by running the ifconfig -a command. The output should look similar to this:
$ ifconfig -a
eth0    Link encap:Ethernet  HWaddr 00:01:03:2C:69:BB
        inet addr:192.168.4.11  Bcast:192.168.7.255  Mask:255.255.252.0
...
eth1    Link encap:Ethernet  HWaddr 00:0B:DB:C0:2E:C4
        inet addr:10.1.1.1  Bcast:10.1.3.255  Mask:255.255.252.0
...
So interface eth0 is the public NIC, while eth1 is the private NIC. You will also notice in the sample hosts file above that there are two additional entries, rmscvip1 and rmscvip2. These entries are not currently used, and won't be used by the CRS install, but they are the virtual IPs that will be needed for the installation and creation of the database. Note that these IP addresses are on the same subnet as the public card. This is required, as these host names/IP addresses will be used for the actual client connections once the database is configured. Therefore, the VIPs must be valid, unused IP addresses on your public network, and must be on the same subnet as the public cards. For more information on the VIPs, refer to Chapter 11.
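As a sketch of that same-subnet rule, the following shell function (the addresses shown are illustrative, not necessarily the ones on your cluster) checks whether a candidate VIP falls on the same network as the public card, given the card's netmask:

```shell
# return success if two IPv4 addresses share a network under the given dotted-quad mask
same_subnet() {
  ip1=$1; ip2=$2; mask=$3
  old_ifs=$IFS; IFS=.
  set -- $ip1;  a1=$1; a2=$2; a3=$3; a4=$4
  set -- $ip2;  b1=$1; b2=$2; b3=$3; b4=$4
  set -- $mask; m1=$1; m2=$2; m3=$3; m4=$4
  IFS=$old_ifs
  [ $((a1 & m1)) -eq $((b1 & m1)) ] && [ $((a2 & m2)) -eq $((b2 & m2)) ] && \
  [ $((a3 & m3)) -eq $((b3 & m3)) ] && [ $((a4 & m4)) -eq $((b4 & m4)) ]
}

# public card 192.168.4.11/255.255.252.0; candidate VIP 192.168.4.21
same_subnet 192.168.4.11 192.168.4.21 255.255.252.0 && echo "VIP is on the public subnet"
```

A VIP that fails this check will not be usable for client connections once the database is configured.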
Once the network cards and the hosts file are configured, check your work by issuing a ping command of each host name, both public and private, from each node, including pinging yourself. For example, from Node1, which is rmsclnxclu1, make sure that you can first ping your own public name, followed by your own private name, and then ping the remote node's public and private name:
[root@rmsclnxclu1 etc]# ping rmsclnxclu1
PING rmsclnxclu1.us.oracle.com (192.168.4.11) 56(84) bytes of data.
64 bytes from rmsclnxclu1.us.oracle.com (192.168.4.11): icmp_seq=0 ttl=0 time=0.045 ms
[root@rmsclnxclu1 etc]# ping private1
PING private1 (10.1.1.1) 56(84) bytes of data.
64 bytes from private1 (10.1.1.1): icmp_seq=0 ttl=0 time=0.032 ms
[root@rmsclnxclu1 etc]# ping rmsclnxclu2
PING rmsclnxclu2.us.oracle.com (192.168.4.12) 56(84) bytes of data.
64 bytes from rmsclnxclu2.us.oracle.com (192.168.4.12): icmp_seq=0 ttl=64 time=0.456 ms
[root@rmsclnxclu1 etc]# ping private2
PING private2 (10.1.1.2) 56(84) bytes of data.
64 bytes from private2 (10.1.1.2): icmp_seq=0 ttl=64 time=0.199 ms
You can also ping the VIPs, even though you will not get a response-however, you will be able to check that the names are resolving to the correct IP addresses:
[root@rmsclnxclu1 etc]# ping rmscvip1
PING rmscvip1.us.oracle.com (192.168.4.21) 56(84) bytes of data.
From rmsclnxclu1.us.oracle.com (192.168.4.11) icmp_seq=0 Destination Host Unreachable
Repeat these steps from each node in the cluster, being sure to ping all nodes and all IP addresses. Remember that the VIPs will be unreachable at this stage of the install, but all of the other IP addresses (public and private) should respond successfully from each node. If not, then revisit the networking setup and make sure that there are no typos in the hosts file or in the network card definition. You may need to add both the public and private node names into the /etc/hosts.allow file.
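One common class of hosts-file typo, a host name mapped to more than one address, can be caught with a quick pipeline (shown here against sample data; on a real node, feed it /etc/hosts instead):

```shell
# flag host names that appear more than once; an empty result means no duplicates
# (sample data shown; substitute "cat /etc/hosts" for the printf on a real node)
printf '10.1.1.1 private1\n10.1.1.2 private2\n10.1.1.9 private1\n' |
  awk '{print $2}' | sort | uniq -d
```

Here the pipeline prints private1, since it is mapped twice in the sample data.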
As part of the networking setup, you must also set up user equivalence between the nodes in the cluster. Starting with Oracle Database 10g, the Oracle Universal Installer will first test user equivalence using ssh and scp. If this works, the secure shell and secure copies will be used. Otherwise, the OUI will fall back to rsh/rcp for the install. Since rsh and rcp are the most commonly used in past versions, we will begin by describing the configuration of user equivalence using rsh. You can do this by setting up entries in either /etc/hosts.equiv or in the .rhosts file in the oracle user's home directory. You will want to add entries for both the public and private node names under the oracle account as follows:
rmsclnxclu1 oracle
rmsclnxclu2 oracle
rmsclnxclu1.us.oracle.com oracle
rmsclnxclu2.us.oracle.com oracle
private1 oracle
private2 oracle
rmscvip1 oracle
rmscvip2 oracle
rmscvip1.us.oracle.com oracle
rmscvip2.us.oracle.com oracle
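Since the same ten entries are needed on every node, a short loop (host names from our example cluster) can generate them; on a real node, redirect the output into /home/oracle/.rhosts:

```shell
# emit the .rhosts entries; redirect into /home/oracle/.rhosts on each node
for h in rmsclnxclu1 rmsclnxclu2 \
         rmsclnxclu1.us.oracle.com rmsclnxclu2.us.oracle.com \
         private1 private2 \
         rmscvip1 rmscvip2 \
         rmscvip1.us.oracle.com rmscvip2.us.oracle.com; do
  echo "$h oracle"
done
```

Generating the file this way also makes it easy to keep the node list consistent across the cluster.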
In order for rsh to work on Linux, you must also enable it by setting disable=no in the /etc/xinetd.d/rsh file. The same can be accomplished by running ntsysv as root (command-line menu interface), or by running the GUI serviceconf utility and checking the box next to rsh so that it is set to start automatically at reboot. The serviceconf utility will allow you to click a restart button and start rsh right away. Test that rsh works by running a command similar to the following from each node (the following command was run from node rmsclnxclu1):
# rsh rmsclnxclu2 hostname
rmsclnxclu2
Using SSH for User Equivalence
In some instances, users may prefer to use ssh and scp, rather than rsh and rcp, for user equivalence. Starting with the Oracle Database 10g release of the Oracle Universal Installer, the installer will first try to use ssh if it is present, and will fall back to rsh if not. Thus, if you prefer, you may use ssh. A basic ssh configuration can be achieved by using the ssh-keygen utility and the DSA Protocol 2, which is part of the Red Hat 3.0 install. For example, on our two-node cluster, run the following from Node1 (rmsclnxclu1):
oracle@/home/oracle>: ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/home/oracle/.ssh/id_dsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/oracle/.ssh/id_dsa.
Your public key has been saved in /home/oracle/.ssh/id_dsa.pub.
For the above command, we took all the defaults. Once the public key is created, write out the contents of id_dsa.pub to the authorized_keys file:
oracle@/home/oracle>: cd .ssh
oracle@/home/oracle/.ssh>: cat id_dsa.pub > authorized_keys
Next, do a binary ftp of the file authorized_keys to Node2. Repeat the process of generating the key on Node2 by running ssh-keygen -t dsa as user oracle on Node2, again just pressing ENTER to accept the defaults:
Generating public/private dsa key pair.
Enter file in which to save the key (/home/oracle/.ssh/id_dsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/oracle/.ssh/id_dsa.
Your public key has been saved in /home/oracle/.ssh/id_dsa.pub.
Now, append the information from id_dsa.pub into the authorized_keys file as follows, on Node2:
oracle@/home/oracle/.ssh>: cat id_dsa.pub >> authorized_keys
Node rmsclnxclu2 now has an authorized_keys file that will allow the oracle user from either node to do an ssh to Node2. Copy this file back over to Node1 now, replacing the original authorized_keys file on that node. Test the user equivalence setup from each node via a command such as
oracle@/home/oracle/.ssh>: hostname
rmsclnxclu1
oracle@/home/oracle/.ssh>: ssh rmsclnxclu2 hostname
rmsclnxclu2
In the 'Configure a Linux/RAC Cluster' HA Workshop, we described how to configure a Firewire disk for shared access. Whether you are using a test Firewire cluster or a SAN with SCSI/fiber connections, the remaining disk configuration steps are the same. Once both nodes can see the disk, we must still create the partitions on those disks that CRS will need for the install. If you decide to use OCFS, the preparatory work will involve making sure that the OCFS drive is formatted and mounted on each node prior to the install, and directories are created so that the necessary CRS files can be created. If you plan to use either RAW devices or ASM, you must ensure that, at a minimum, you create two partitions for the CRS install to use (as previously mentioned, for the OCR and voting disk). Of course, you will need more partitions later on, for the database, particularly if using straight RAW devices, as you will need one partition for each datafile. However, for purposes of the CRS install, we will initially focus on the two that we need at the moment.
If using RAW devices or ASM, the device needed by the CRS install for the OCR disk should be a minimum of 100MB. This should be kept in mind when creating the partitions using fdisk. We recommend the same minimum size for the voting disk. The actual space used by the OCR will depend on the number of nodes, databases, and services in your cluster-100MB should be large enough to accommodate the current supported maximum number of nodes and services. To bind a RAW device to a partition, you need to add an entry for each partition to the /etc/sysconfig/rawdevices file. If using RAW devices, you will be creating the two devices for the OCR and voting disk, and then several other partitions for all of the datafiles. If using ASM, you should only create the partitions for the OCR and voting disk, and then one partition on each disk (unless you simply want to test the functionality by having multiple partitions on a single disk).
Use the output from the fdisk -l command to add entries into the /etc/sysconfig/rawdevices file. An example of the fdisk -l output is seen here:
Disk /dev/sda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1             1      9729  78148161    5  Extended
/dev/sda5             1        13    104359+  83  Linux
/dev/sda6            14        26    104391   83  Linux
/dev/sda7            27      3066  24418768+  83  Linux
/dev/sda8          3067      6106  24418768+  83  Linux
/dev/sda9          6107      9146  24418768+  83  Linux
Taking that information, we see that /dev/sda5 and /dev/sda6 are each roughly 100MB. These will be used for the OCR and voting disk. The remaining partitions are roughly 25GB apiece, and will be used for ASM disk groups. So, the rawdevices file on each node looks like this:
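As a quick sanity check, the Blocks column in the fdisk -l output is in 1KB units, so the partition sizes can be confirmed with simple shell arithmetic:

```shell
# fdisk -l reports 1KB blocks; convert the /dev/sda6 figure to megabytes
blocks=104391            # Blocks column for /dev/sda6 in the output above
mb=$(( blocks / 1024 ))
echo "/dev/sda6 is about $mb MB"
```

This works out to roughly the 100MB we wanted for the OCR and voting disk partitions.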
# raw device bindings
# format: <rawdev> <major> <minor>
#         <rawdev> <blockdev>
/dev/raw/raw1 /dev/sda5
/dev/raw/raw2 /dev/sda6
/dev/raw/raw3 /dev/sda7
/dev/raw/raw4 /dev/sda8
/dev/raw/raw5 /dev/sda9
After saving the rawdevices file, run ntsysv and make sure that the box next to the rawdevices service is checked. Run service rawdevices restart to restart the service immediately, without the need to reboot. The modifications to /etc/sysconfig/rawdevices and the restart of the rawdevices service must be done on each node. The location for our OCR will now be /dev/raw/raw1, with the location for the voting disk being /dev/raw/raw2. Ownership of all of the devices needs to be changed to oracle, and permissions set to 755.
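The ownership and permission changes just described can be applied with commands along these lines (a sketch; run as root on each node, adjust the device list to match your own rawdevices file, and note that the oinstall group is an assumption carried over from the directory setup later in this chapter):

```shell
# run as root on each node; device names correspond to the rawdevices file above
chown oracle:oinstall /dev/raw/raw1 /dev/raw/raw2 /dev/raw/raw3 /dev/raw/raw4 /dev/raw/raw5
chmod 755 /dev/raw/raw1 /dev/raw/raw2 /dev/raw/raw3 /dev/raw/raw4 /dev/raw/raw5
```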
If you are using OCFS, you could simply create a single partition on each disk via fdisk, and there is no need to add anything to the rawdevices file. With OCFS, we have already pointed out that the voting disk and the OCR can be files rather than partitions, but this requires you to have formatted and mounted the shared drive before starting the CRS install. To do this, you must first download the OCFS driver and OCFS tools from Oracle's web site at http://oss.oracle.com. The files that you need will, of course, depend on the operating system version, and also on the number of CPUs on the system: either a single-CPU or an SMP system. Specifically, for Red Hat 3.0, the location for download is http://oss.oracle.com/projects/ocfs/files/RedHat/RHEL3/i386/.
For systems with multiple processors, be sure to download the SMP versions of the files.
The OCFS versions are subject to change as updates are put out. Note also, as we mentioned previously on Linux, that if you are putting the OCR and voting disk on an OCFS drive, ensure that you have the latest version of OCFS, containing the fix for Patch 3467544. In addition, at the time of printing, OCFS Version 2, which will support the ORACLE_HOME as well as additional enhancements, is a beta release. Information on OCFS Version 2 can be found at http://oss.oracle.com/projects/ocfs2/.
Using ocfstool
Once the OCFS rpms are installed, ocfstool, part of the ocfs-tools rpm, is used to format and mount the drive. We will go through an HA Workshop in the next section that walks you through the steps of formatting and mounting an OCFS drive. The drive needs to be mounted from each node, and then you must determine what directory structure you want to use for the OCR and voting disk. Create the necessary subdirectories, changing the ownership of the subdirectories to the oracle user and setting the permissions as appropriate on those directories.
As we have alluded to previously, it is possible to use a combination of RAW/ASM and/or OCFS for the shared partitions. This likely makes the most sense if you are using OCFS for the ORACLE_HOME and ASM for the disks. In this case, you would need at least one shared OCFS formatted partition to be used for the ORACLE_HOME (you would be able to put the OCR and voting disks on this partition). You would then create RAW partitions, defined in the /etc/sysconfig/rawdevices file, which would be used later on as components of the ASM disk groups. ASM and OCFS will work together in this manner without any conflicts, provided you keep track of which partition is used for what.
HA Workshop: Install OCFS and Configure OCFS Drives
This workshop focuses on installing and configuring the Oracle Cluster File System on Linux. While Oracle may support third-party cluster file systems (or global file systems, as referred to in some cases), Oracle actually provides the OCFS drivers for both Windows and Linux. This workshop will walk you through the process of installing and configuring OCFS on Red Hat Linux 3.0.
Step 1. Download the latest OCFS rpms from http://oss.oracle.com, making them available to each cluster node (either locally or via an NFS mount point).
Step 2. Install the rpms on each node of the cluster in the following order, using this syntax:
[root@rmsclnxclu1 RPMS]# rpm -iv ocfs-support-1.0.9-12.i686.rpm
Preparing packages for installation...
ocfs-support-1.0.9-12
[root@rmsclnxclu1 RPMS]# rpm -iv ocfs-2.4.21-EL-1.0.9-12.i686.rpm
Preparing packages for installation...
ocfs-2.4.21-EL-1.0.9-12
Linking OCFS module into the module path    [ OK ]
[root@rmsclnxclu1 RPMS]# rpm -iv ocfs-tools-1.0.9-12.i686.rpm
Preparing packages for installation...
ocfs-tools-1.0.9-12
Step 3. ocfstool requires an X-Window display, so configure your display so that an X-Window connection can be made. First run xhost +:
[root@rmsclnxclu1 root]# xhost +
access control disabled, clients can connect from any host
[root@rmsclnxclu1 root]# DISPLAY=localhost:1
[root@rmsclnxclu1 root]# export DISPLAY
In this example, we used localhost:1 because we were connecting using the VNC client to a VNC server for root, which was listening on port 1. Your display setting may be different.
Step 4. ocfstool is loaded into the /usr/bin directory by default, which should be part of your path, so simply run ocfstool to open the X-Window:
[root@rmsclnxclu1 root]# ocfstool
Step 5. In order to prepare OCFS, we must first generate a configuration file by selecting the Generate Config option from the Tasks menu in the OCFS Tool screen. This will open up the OCFS Generate Config window, as seen in Figure 4-1. Change the interface to the private interface. In our example, we are using eth1 as the private network. Leave the port at its default value of 7000, and then put the private name in the Node Name box. Repeat the same steps on all cluster nodes, and then close ocfstool.
Figure 4-1: Generate Config Window in ocfstool
Step 6. On each cluster node, create a mount point for the OCFS drive(s). This mount point must be the same on each node, and should preferably be created directly off of the root. In this example, we are using /ocfs as the mount point:
[root@rmsclnxclu1 root]# cd /
[root@rmsclnxclu1 /]# mkdir ocfs
Step 7. Determine which partition(s) on the shared drive you want to use for OCFS. In our case, the partition is going to be /dev/sda7. Start ocfstool back up on one of the nodes, and this time choose Format from the Tasks menu. The OCFS Format window will open up, as seen in Figure 4-2. In the Device box, select the device you have determined will be the OCFS drive (/dev/sda7). Select the appropriate mount point, leaving the rest of the values as their defaults, and then click OK.
Figure 4-2: OCFS Format window
Step 8. After a few moments, the format operation should complete and you will be returned to the OCFS Tool window. Your device (/dev/sda7) should now be listed and highlighted (see Figure 4-3). Click on Mount to mount it to your directory. You should now be able to start ocfstool on the other nodes and mount the device on those nodes as well.
Figure 4-3: Mounting the drive via ocfstool
Step 9. One way to set up the drive such that it mounts automatically each time the node is rebooted is to add the following line to the /etc/rc.local script, using ocfs as the file system type:
mount -t ocfs /dev/sda7 /ocfs
Step 10. The mount point (/ocfs) should remain owned by root. As root, you can now cd into the /ocfs directory and create additional subdirectories, whose ownership can then be changed to the oracle user and oinstall group:
[root@rmsclnxclu1 /]# mkdir /ocfs/ocr
[root@rmsclnxclu1 /]# mkdir /ocfs/vote
[root@rmsclnxclu1 /]# mkdir /ocfs/oradata
[root@rmsclnxclu1 /]# chown oracle:oinstall /ocfs/*