0645-0647 | Oracle Development Unleashed (3rd Edition)

Previous Table of Contents Next

Page 645

Clustering is another way to improve performance through physical layout on the disk. It is sometimes advantageous to create an indexed cluster for a group of tables that are frequently joined. The index used to join the tables in the cluster must be declared as part of the cluster. Tables created in the cluster must specify this index in the CLUSTER clause of the CREATE TABLE statement. When an indexed cluster is created, each value for the cluster index (or cluster key) is stored only once. The rows of the tables that contain the clustered key value are physically stored together, as if already joined. This method minimizes the movement of read-write heads when accessing these tables and conserves disk space by storing the key value only once.

In the contact manager example, assume that the users will typically be contacting individuals by telephone, and that they will be retrieving this information much more frequently than they will be updating it. Clustering the Individuals and Phone Numbers relations should be considered in this case. The DDL script in Listing 25.6 illustrates how this cluster might be created. Listing 25.6. DDL script that creates an indexed cluster.

 CREATE CLUSTER Individual_Phone_Numbers (Individual_ID      NUMBER(10)) SIZE    256 STORAGE (INITIAL         1M          NEXT            1M          MAXEXTENTS     100          PCTINCREASE     10); CREATE TABLE Individuals (        Individual_ID          NUMBER(10)    PRIMARY KEY       ,Last_Name              VARCHAR2(30)  NOT NULL       ,First_Name             VARCHAR2(20)  NOT NULL       ,Middle_Initial         CHAR(1)       ,Last_Updt_User         VARCHAR2(20)  NOT NULL       ,Last_Updt_Timestamp    DATE          NOT NULL ) CLUSTER Individual_Phone_Numbers(Individual_ID); CREATE TABLE Phone_Numbers (        Phone_ID               NUMBER(10)    PRIMARY KEY       ,Individual_ID          NUMBER(10)    NOT NULL       ,Company_ID             NUMBER(10)    NOT NULL       ,Phone_Number           NUMBER(10)    NOT NULL       ,Phone_Type_ID          NUMBER(10)    NOT NULL       ,Last_Updt_User         VARCHAR2(20)  NOT NULL       ,Last_Updt_Timestamp    DATE          NOT NULL ) CLUSTER Individual_Phone_Numbers(Individual_ID); CREATE INDEX Indiv_Phone on CLUSTER Individual_Phone_Numbers;

Page 646

NOTE

The storage specification is not needed for the individual tables. Their rows will be stored in the cluster, and the storage specification for the cluster will be used by all objects that it will contain.

In Listing 25.6, the SIZE keyword indicates the size in bytes needed to store the rows corresponding to one key value. The size is always rounded up to the nearest block size. In the example, because the cluster key is stored only once, it will take only 245 bytes to store the rows for each cluster key. The example assumes that 256 bytes is the closest block size.

A second type of cluster, known as a hash key cluster, can also be utilized to improve performance. Rather than store rows based on a common indexed key, rows are stored together based on a common hash value, which is not physically stored in the database. The hash value is calculated at runtime, using either a user -defined function or Oracle's internal hashing function. Although this reduces physical disk accesses , hash key clustering should be used only on small tables or on tables that will have a maximum size that is known at design time. This is because the number of hash keys is fixed at design time, and resolving collisions requires additional reads.

Clustering should not be overused because there can be a negative impact on performance when clustered tables are accessed separately. Insert, update, and delete operations on clustered tables will typically be slower as well. For these reasons, the designer should carefully consider how tables will be used before creating a cluster.

Table partitioning and tablespace striping are techniques that consist of spreading a large database object across multiple disks. Oracle8 offers a number of new physical storage options that can be used to optimize performance for specific types of applications, including table partitioning. Table partitioning divides table data among two or more tablespaces and physical data files, ideally on separate disks. Currently, a table can be partitioned horizontally only, based on the value of one or more columns . For further information please see Chapter 7, "Oracle8 Server."

Performance can be greatly improved by striping, particularly when large tables are accessed by full table scans . The striping of a particular table can be forced by creating a tablespace with multiple data files on separate disks, each of which is smaller than the table itself. To provide maximum control over how the table is striped, it should be the only object in the tablespace. Listing 25.7 provides one example of how striping can be accomplished in Oracle.

Listing 25.7. An example of striping.

 CREATE TABLESPACE Individual_Stripes       DATAFILE `disk1/oradata/stripe1.dat' SIZE 100K REUSE,                `disk2/oradata/stripe2.dat' SIZE 100K REUSE       DEFAULT STORAGE (  INITIAL           200K                          NEXT              200K                          MAXEXTENTS         100

Page 647

 PCTINCREASE          0  ); CREATE TABLE Individuals (        Individual_ID          NUMBER(10)    PRIMARY KEY       ,Last_Name              VARCHAR2(30)  NOT NULL       ,First_Name             VARCHAR2(20)  NOT NULL       ,Middle_Initial         CHAR(1)       ,Last_Updt_User         VARCHAR2(20)  NOT NULL       ,Last_Updt_Timestamp    DATE          NOT NULL ) TABLESPACE Individual_Stripes       STORAGE (INITIAL             90K                NEXT                90K                MINEXTENTS            2                MAXEXTENTS          100                PCTINCREASE           0  );

Obviously, a third extent will not be able to be allocated for this table, making this approach to striping a high-maintenance proposition. The size of the table must be continually monitored , and new data files must be added to the tablespace when needed, using the ALTER TABLESPACE command with ADD DATAFILE. Although this approach requires additional maintenance, performance gains can be very significant, particularly if the disks have separate controllers. The designer should consider the trade-offs carefully before recommending that a table be striped.

Capacity Planning

Capacity planning is important in ensuring that adequate storage is available for future growth. The DDL scripts for each database object are invaluable in determining the overall storage required by the database. In fact, the process of capacity planning actually begins with the DDL scripts.

It starts with defining the column attributes. These column attributes, in turn , determine the size of each row in the table. The column attributes also determine the size of each row in indexes created on the columns. These attributes, combined with the estimated total number of rows (including provisions for future growth), are used in defining the storage clause for tables and indexes. For purposes of capacity planning, it should be assumed that all objects will reach their maximum extents.

CAUTION

The MAXEXTENTS parameter to the STORAGE clause has an upper limit that is determined by the operating system and file system environment in which the database resides. It is a bad practice to set MAXEXTENTS equal to the maximum allowable value. If the object grows more than expected, the NEXT parameter can be changed using ALTER object_type. If the maximum extents have already been allocated, however, this will have no effect. In this situation, the size of the object cannot be extended unless MAXEXTENTS can be increased.

Page 648

The next step is creating DDL for tablespaces. The data file(s) created by these scripts should be sufficiently large to have all objects that they will contain. When determining the size of the data files, it should be assumed that all objects within the tablespace will reach their maximum size, as defined by the STORAGE clause in the DDL script for each object.

The total size of the database can then be determined by simply adding the sizes of the data files. Never assume, however, that this estimate is accurate and complete. Also, there are a number of additional considerations to be made.

TIP

Remember to consider the sizes of log files. These are defined when the database is created.

In capacity planning, the designer must accommodate for unexpected growth. As a general rule of thumb, at least 25 percent (preferably 50 percent) of each disk should be free after the initial installation. This will allow additional data files to be created wherever necessary if tables grow larger than expected. It is common to experience unanticipated growth, and it is even more common to underestimate the requirements for temporary and rollback segments, or to identify the need for additional indexes after the initial design is complete.

The initial hardware configuration and data file layout should accommodate these possibilities. Capacity limitations can be crippling in the event that additional rollback segments or temporary segments cannot be created when needed.

For this reason, the importance of capacity planning should not be underestimated.

Summary

Designing a database is a methodical process. It begins with defining the requirements of the application. Entities and attributes are grouped into a logical data model that meets the business needs. The logical data model can be represented graphically as an entity-relationship diagram. These entities and attributes are then translated into a physical model. The physical model is when datatypes and column constraints are defined. The normalization process is used to eliminate redundancy of data. Denormalization is a process of breaking the normalization rules to gain performance increases . A well-designed database is important for future expansion as well as yielding ease of application programming. As Oracle8 delivers on its promise of object-relational technology, we must continue to incorporate these latest and greatest features within our paradigm or we may soon find that the corporate boat has left the dock without us!

Previous Table of Contents Next