As mentioned previously, our focus here is primarily on commodity clusters, so this section concentrates on preparing the Linux operating system for a CRS and RAC install. We will also go into some of the specific operating system configuration tasks, which are normally handled by an experienced systems administrator but which more and more must be undertaken by the DBA. We will discuss the various storage and file configurations available, the preparation needed for the different types of storage, and other requirements such as networking, kernel parameters, and memory.
On most platforms, CRS gives you several possible configurations for storage. The two basic configurations available on all platforms are RAW devices and Automatic Storage Management (ASM) for the shared files required by CRS and RAC. In addition, Oracle offers its own cluster file system (OCFS) on both the Linux and Windows platforms, and other vendors offer cluster file systems on other platforms. A fourth option on the Linux platform is to use NFS mounts via a certified network attached storage device (such as Network Appliance) for storage of the shared database and CRS files. We will describe the basics of these options in the 'Shared Storage Configuration' section, later in this chapter.
The one thing all of the storage options listed above have in common is that they involve disks that are accessible from multiple nodes simultaneously. This is often referred to as a shared everything architecture, since all of the disks are shared. Some cluster vendors and DBMSs use a different approach, a shared nothing architecture, in which a disk can be accessed by only one node at a time. If a node accessing a disk goes down, that disk must be failed over to another, surviving node. The advantage Oracle gains from the shared everything architecture is that any node can access the data it needs on any disk at any given time; it is not necessary to go through one specific node to get to the disks.
Oracle requires that all controlfiles, all online redo logs, and all database files be stored on the shared drives, and these files are therefore accessible from all nodes in the cluster. In addition, certain files such as archived redo logs, the system parameter file, and flashback logs can be stored on the shared drives, depending on the type of storage.
If using straight RAW devices for all files, you are essentially forgoing a file system altogether. Since there is no file system, you must create a separate partition for every file that the database will use. Soft links are then created to the RAW devices, and Oracle writes files out to the RAW devices via the link names. As such, it is not practical (or even possible) to store archived logs on RAW devices. The advantage of RAW devices is that they generally allow faster I/O. The disadvantages include the complexity of managing these files without a file system and, on platforms such as Linux, a limit on the number of partitions you can create, and thus on the number of files you can have.
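As a rough sketch of this layout (all device and path names here are hypothetical), the administrator binds each partition to a raw device and then creates a soft link with a meaningful name for Oracle to open. The snippet below simulates the link structure in a scratch directory so it can run anywhere; on a real system, the link target would be a /dev/raw device bound as root:

```shell
# Simulated in a temp directory; on a real system the target would be a
# raw binding created as root, e.g.:  raw /dev/raw/raw1 /dev/sdb1
ORADATA=$(mktemp -d)                 # stands in for /u02/oradata/RACDB
touch "$ORADATA/raw1"                # stands in for /dev/raw/raw1
# Give Oracle a friendly name that resolves to the raw device:
ln -s "$ORADATA/raw1" "$ORADATA/system01.dbf"
readlink "$ORADATA/system01.dbf"     # prints the path of the raw device
```

One soft link is created per database file, which is exactly why the partition limit on Linux translates directly into a limit on the number of database files.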
As we discussed in Chapter 3, Automatic Storage Management allows the use of block devices (or RAW devices), which are managed completely by Oracle. In many ways, this removes the complexity of RAW devices, while still giving the advantage of speed that block devices provide. When using ASM, all file management (including taking of backups) must be done from within the database. While archived logs and flashback logs can be stored on ASM disk groups, the Oracle binaries cannot, so it is still required to have either a cluster file system for the Oracle install (supportability depends on the platform and version of the cluster file system) or install to the private drives of each node. Since ASM is volume management that is specific to the Oracle database, it is designed with RAC in mind, so it is an ideal choice as a shared storage mechanism for database files in a RAC environment.
As mentioned previously, Oracle provides a cluster file system for both Windows and Linux platforms, which can be used for all required database files. The obvious advantage of the cluster file system is that it simplifies the management of the shared files by allowing a directory structure, and by allowing multiple files to be stored on a single shared disk/partition. In addition, on some platforms, the cluster file system can also be used as a shared location for the installation of the Oracle binaries for the ORACLE_HOME. The cluster file system Oracle provides for Windows has supported the ORACLE_HOME since the beginning. The cluster file system for Linux will support installation of the ORACLE_HOME, starting with version 2.x of OCFS for Linux.
On the Linux platform, it is also possible to place the shared device for the datafiles on an NFS mount, using a certified network attached storage vendor. This requires that all nodes have network access to the device and use the same NFS mount point. The advantage is that it also provides file system semantics, easing management complexity. The disadvantage is that the network latency introduced into the file I/O path can slow performance, depending on the speed of your network and network cards. While NFS mount points should not be used for the ORACLE_HOME, you may use NFS mounts as a location for archived logs and flashback logs (via the flashback recovery area).
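A typical way to wire this up is an identical /etc/fstab entry on each node. The server name and paths below are hypothetical, and the mount options shown are those commonly recommended for Oracle datafiles over NFS; verify the exact options against your NAS vendor's certification notes:

```
# /etc/fstab entry (identical on every node in the cluster)
nas1:/vol/oradata  /u02/oradata  nfs  rw,bg,hard,nointr,tcp,vers=3,rsize=32768,wsize=32768,timeo=600,actimeo=0  0 0
```

The hard,nointr and actimeo=0 options matter here: datafile I/O must not silently fail or be served from stale attribute caches when multiple nodes are writing.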
In order to configure a RAC environment, you must have at least two network cards in each node: one network card for client communication on the public network, and the other for private cluster communication. With this configuration in mind, you must first configure the /etc/hosts file on each node to have a unique name for both the public and private addresses, ideally with easily identifiable host names for each (node1, node1_private, node2, node2_private, and so on). While it is possible in a two-node cluster to have the private network consist of a simple crossover cable, that is not recommended, and on some platforms it is not supported because of the media sensing behavior of the operating system. On Windows, for example, if one node is powered down, the card for the interconnect on the surviving node is disabled because there is no more activity being sensed across the interconnect. This can lead to errors on the surviving node.
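For example, the /etc/hosts file for a two-node cluster might look like the following on every node (the addresses here are placeholders; substitute your own public and private subnets):

```
# Public network (client connections)
192.168.1.101   node1
192.168.1.102   node2
# Private interconnect (cluster communication only)
10.0.0.1        node1_private
10.0.0.2        node2_private
```

Keeping this file identical across all nodes avoids name resolution mismatches during the CRS install, which prompts for these public and private node names.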
The best solution is to have a dedicated switch between the nodes, which ensures there is constant activity on the card (coming from the switch) even when the other node is down. Of course, this also allows for expansion of your cluster beyond two nodes. The other consideration for production environments is that the switch and the cards for the interconnect must be capable of handling all of the cache fusion traffic and other messaging traffic across the interconnect; therefore, you want this network to be a high-speed network (Gigabit Ethernet or faster).
The detailed kernel parameter requirements for each platform are discussed in the specific chapter of the installation and configuration guide for that particular operating system. In this section, we will touch briefly on the specific requirements for Red Hat Linux 3.0. With Red Hat 3.0, kernel changes can be made by making modifications to the /etc/sysctl.conf file. In our case, we simply added the following lines to the end of the file:
kernel.shmall = 2097152
kernel.shmmax = 536870912
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.file-max = 65536
net.ipv4.ip_local_port_range = 1024 65000
In the case of kernel.shmmax, the value 536870912 is equal to half of the physical RAM installed on each node (our nodes had 1GB each).
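To arrive at a comparable value on your own hardware, you can read the total memory from /proc/meminfo and halve it; running sysctl -p as root then loads the edited /etc/sysctl.conf without a reboot. This is a sketch of the arithmetic, not an Oracle-mandated formula:

```shell
# Suggest a kernel.shmmax of half the physical RAM, in bytes.
# (536870912 bytes corresponds to nodes with 1GB of RAM.)
half_ram=$(awk '/^MemTotal:/ {printf "%d", ($2 * 1024) / 2}' /proc/meminfo)
echo "suggested kernel.shmmax = $half_ram"
# After appending the parameters to /etc/sysctl.conf, apply them (as root):
#   sysctl -p
```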
In our previous section on shared storage and the types of files stored on the shared disks, we intentionally left out two specific files. As part of the CRS install, you will be prompted to provide a separate location for the Oracle Cluster Registry (OCR) and the voting disk used by CRS. We did not mention these files previously because they are not associated with any database in particular, but rather, these are files used by your cluster itself. They are required to be stored on the shared disk, as all nodes must have access to the files. The OCR is essentially the metadata database for your cluster, keeping track of resources within the cluster, where they are running, and where they can (or should) be running. The voting disk will be used for resolving split-brain scenarios: should any cluster nodes lose network contact via the interconnect with the other nodes in the cluster, those conflicts will be resolved via the information in the voting disk.
How you define the location of these files/disks depends on what medium you have decided to use for the shared storage. If you are using straight RAW devices, you must set aside two slices (partitions), one for the OCR and one for the voting disk, before CRS is installed. The same is true if using ASM: while all of the database files can exist in ASM disk groups, the OCR and voting disk must be on RAW slices of their own. However, if you are using OCFS as the storage medium for the database, the OCR and voting disk can be just another file on the OCFS volume. In that case, the volumes must be configured and mounted using OCFS prior to the CRS install, and appropriate directories should be created for the files. Lastly, in the case of NFS mounts, you can likewise use those mount points as the location for these files, just as with OCFS.
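On Red Hat Linux, raw bindings can be made persistent across reboots via the /etc/sysconfig/rawdevices file, which the rawdevices service reads at boot. A sketch, with hypothetical partition names, reserving one binding for the OCR and one for the voting disk:

```
# /etc/sysconfig/rawdevices -- format: <raw device> <block device partition>
# /dev/raw/raw1 is reserved for the OCR, /dev/raw/raw2 for the voting disk
/dev/raw/raw1 /dev/sdc1
/dev/raw/raw2 /dev/sdc2
```

Both partitions must be visible under the same device names from every node, and the raw devices must be owned by the appropriate users before the CRS installer prompts for their paths.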
Use caution when placing the OCR and voting disk on an OCFS drive on Linux. At this point, we recommend that the OCR and voting disk be placed on RAW devices, even when OCFS is used for the database files. If you do use OCFS for the location of the voting disk and OCR on Linux, be sure that you are on OCFS version 1.0.11 or higher.