2.9 Disk allocation | Oracle Real Application Clusters


After identifying the most suitable architecture for the implementation and selecting the required disk striping/mirroring (RAID) level, the next step is to decide how the disk system itself will be implemented. There are two methods of disk allocation: the traditional file system, or direct use of the disks as raw devices. The raw device is the basic structure of any disk system; the file system is a software layer implemented over raw devices that makes the files stored on them easier to manage and maintain.

2.9.1 Raw partitions

A raw device partition is a contiguous region of a disk accessed through a Unix character-device interface. This interface provides raw access to the underlying device, arranging for direct I/O between a process and the logical disk. A write issued by a process therefore moves the data directly to the device, without passing through the file system's buffer cache.

2.9.2 File system

A Unix file system is a hierarchical tree of directories and files implemented on a raw device partition by the file system layer of the kernel. The file system uses a buffer cache to reduce the number of times the operating system must access the disk. When a process issues a write, the file system takes control of the operation and releases the process, freeing it to continue with other work. The file system then holds the data in its cache until several writes can be performed together. This can enhance system performance.

However, a system failure that occurs before the cached data has been written to disk can result in the loss of file system integrity. Additionally, the file system adds overhead to any operation that reads or writes data in direct accordance with its physical layout.
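The difference between a cached write and a write that has actually reached the device can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `buffered_write`, `durable_write`, and `demo` are hypothetical, the files are ordinary temporary files rather than database files, and `os.fsync` is used here as a simple way to force cached data out. It is not a description of how Oracle performs its I/O.

```python
import os
import tempfile


def buffered_write(path, data):
    """Write through the file system cache; the kernel decides when to flush."""
    with open(path, "wb") as f:
        # Returns once the data sits in user-space/kernel buffers. A crash
        # before the kernel flushes its cache could still lose this data.
        f.write(data)


def durable_write(path, data):
    """Write, then block until the kernel reports the data as flushed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)  # may write fewer bytes than asked
            view = view[written:]
        os.fsync(fd)  # force cached data for this file to the device
    finally:
        os.close(fd)


def demo():
    """Write the same payload both ways and return the two file sizes."""
    payload = b"x" * 8192
    with tempfile.TemporaryDirectory() as directory:
        cached = os.path.join(directory, "buffered.dat")
        synced = os.path.join(directory, "durable.dat")
        buffered_write(cached, payload)
        durable_write(synced, payload)
        return os.path.getsize(cached), os.path.getsize(synced)
```

Both files end up with the same contents; what differs is when the caller regains control relative to the data reaching the disk, which is exactly the window in which a system failure can cause loss.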

Shared file systems allow multiple hosts to access the same file system data. This reduces the number of duplicate copies of that data and distributes the load across the hosts accessing it.

2.9.3 Tradeoffs

Performance

Bypassing Unix file buffering results in savings on every disk read or write. These savings show up as a throughput improvement only if disk I/O is the system performance bottleneck. Very large database sites with high transaction volumes will certainly be concerned about disk I/O performance, and for them the conversion to raw device partitions should improve it. The gains are even greater when an online backup is contending for the physical files that make up the database.
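The point that the savings matter only when disk I/O is the bottleneck can be made concrete with an Amdahl-style estimate. The sketch below is a rough model, not a measurement: the function name `overall_speedup` and both of its parameters are hypothetical, and it assumes that elapsed time splits cleanly into an I/O share and everything else.

```python
def overall_speedup(io_fraction, io_speedup):
    """Estimate the whole-system speedup when only disk I/O gets faster.

    io_fraction: share of elapsed time spent waiting on disk I/O (0.0 to 1.0).
    io_speedup:  factor by which that I/O time shrinks (e.g. 2.0 = twice as fast).
    """
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be between 0.0 and 1.0")
    if io_speedup <= 0.0:
        raise ValueError("io_speedup must be positive")
    # The non-I/O share is unchanged; the I/O share is divided by the speedup.
    return 1.0 / ((1.0 - io_fraction) + io_fraction / io_speedup)
```

Under this model, halving I/O time on a system that spends half of its time on disk I/O gives `overall_speedup(0.5, 2.0)`, roughly 1.33, while the same improvement on a system that spends no time waiting on disk gives no speedup at all.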

Memory

The memory used by Unix to buffer file I/O can be better used by the RDBMS, which does its own I/O and caching. The more memory a machine has, the less significant this advantage of raw devices becomes.

Complexity

Using raw device partitions adds complexity to configuration planning, administration, and the movement of databases. While important, this is generally not a problem, because very large database (VLDB) sites tend to have experienced database administrators (DBAs), and setup and configuration is more or less a one-time activity.

A volume manager (VM) is a software tool that helps system administrators manage their storage resources. It enables them to group many devices into a single volume group that is then managed as a single entity. This simplifies the management of a large database and can also improve database performance. In many environments, the volume manager is also used to create striped volume sets.
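To illustrate what a striped volume set does, the following sketch maps a logical block number to a member device using simple round-robin striping. It is a minimal model under stated assumptions: the function name `locate_block` and its parameters are hypothetical, all devices are assumed to be the same size, and real volume managers may use different layouts.

```python
def locate_block(logical_block, stripe_unit_blocks, device_count):
    """Map a logical block to (device index, block offset on that device).

    Assumes round-robin striping: consecutive stripe units are placed on
    consecutive devices, wrapping around after the last device.
    """
    if logical_block < 0 or stripe_unit_blocks <= 0 or device_count <= 0:
        raise ValueError("arguments must be non-negative / positive")
    stripe_unit = logical_block // stripe_unit_blocks   # which stripe unit
    within_unit = logical_block % stripe_unit_blocks    # position inside it
    device = stripe_unit % device_count                 # round-robin device
    row = stripe_unit // device_count                   # full rows before it
    return device, row * stripe_unit_blocks + within_unit
```

With a stripe unit of 4 blocks across 3 devices, logical blocks 0 to 3 land on device 0, blocks 4 to 7 on device 1, blocks 8 to 11 on device 2, and block 12 wraps back to device 0. Sequential I/O is thereby spread across all member devices.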

In an Oracle environment running RAC or OPS, a clustered volume manager is required to set up shared devices, because both RAC and OPS must run on clustered, shared devices. Because the clustered and shared attributes are switched on or off for a whole device group rather than for many individual devices, management of a large database configuration becomes quicker and easier.

With OPS, the data files must be on raw devices. The archive logs must be on a file system, and the redo logs can reside on either a file system or a raw partition. Placing the redo logs on a file system requires that the files be cross-mounted using a network file system (NFS), so that they are visible from both nodes for recovery. Having the redo logs on a file system simplifies administration but has an impact on performance. In addition, the loss of an NFS mount point due to node failure can make the redo log files inaccessible. To avoid this, it is recommended that the redo log files be located on raw devices.
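The OPS placement rules just described can be summarized as a small lookup table with a checker. This is a sketch for illustration only: the names `OPS_PLACEMENT` and `check_ops_layout`, the file-type labels, and the sample file names are all hypothetical and do not correspond to any Oracle tool or parameter.

```python
# Allowed storage per file type under OPS, as stated in the text above.
OPS_PLACEMENT = {
    "datafile": {"raw"},
    "archive_log": {"filesystem"},
    "redo_log": {"raw", "filesystem"},  # raw is the recommended choice
}


def check_ops_layout(layout):
    """Return a list of rule violations for a planned layout.

    layout maps a file name to a (file_type, storage) pair, where storage
    is either "raw" or "filesystem".
    """
    problems = []
    for name, (file_type, storage) in sorted(layout.items()):
        allowed = OPS_PLACEMENT.get(file_type)
        if allowed is None:
            problems.append(f"{name}: unknown file type {file_type!r}")
        elif storage not in allowed:
            problems.append(f"{name}: {file_type} is not allowed on {storage}")
    return problems
```

For example, a plan that puts a data file on a file system is flagged, while a redo log on either kind of storage passes:

```python
plan = {
    "users01": ("datafile", "filesystem"),
    "redo01": ("redo_log", "raw"),
    "arch01": ("archive_log", "filesystem"),
}
print(check_ops_layout(plan))
```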

With RAC, the requirement for raw partitions is removed on most platforms. Except on Sun Solaris and HP-UX, data files and other optional files no longer need to be on raw partitions; they can be placed on file systems, as with a stand-alone Oracle database.
