0631-0634 | Oracle Development Unleashed (3rd Edition)

Previous Table of Contents Next

Page 631

The Normalization Process

A process known as normalization is a technique used to group attributes in ways that eliminate these types of problems. More specifically , the goals of normalization are to minimize redundancy and functional dependency. Functional dependencies occur when the value of one attribute can be determined from the value of another attribute. The attribute that can be determined is said to be functionally dependent on the attribute that is the determinant. By definition, then, all nonkey attributes will be functionally dependent on the primary key in every relation (because the primary key uniquely defines each row). When one attribute of a relation does not uniquely define another attribute, but limits it to a set of predefined values, this is called a multivalued dependency. A partial dependency exists when an attribute of a relation is functionally dependent on only one attribute of a concatenated key. Transitive dependencies occur when a nonkey attribute is functionally dependent on one or more other nonkey attributes in the relation.

Normalization is measured in terms of normal forms, and the process of normalization consists of taking appropriate steps to reach the next normal form:

First normal form is very easy to achieve. The only requirement is that each " cell " in the table has a single value. In other words, first normal form will be violated only if multiple values are being stored in a single row and column combination of a table.
Second normal form is achieved when all partial dependencies are removed. If there are no composite keys in the relations, this level of normalization is very easy to achieve also.
Third normal form requires that all transitive dependencies be removed. In the sample relations presented previously, there are at least two transitive dependencies: In the Addresses relation, both city and state are dependent on zip code. (It could also be said that there is a multivalued dependency between zip code and city and state.)
Fourth normal form is reached when all multivalued dependencies are removed.

(There are other normal forms, but they are beyond the scope of this discussion.)

Third normal form can be reached in the example by removing the transitive dependencies that exist in the Address relation. This can be achieved by removing city and state from the Address relation and creating two new relations, as you will note in Table 25.4.

Table 25.4. Third normal form of States and Cities tables.

Cities	States
Zip Code (PK)	Zip Code (PK)
City (PK)	State

Page 632

NOTE

The (PK) next to both Zip Code and City in the Cities relation indicates that the attributes make up the concatenated primary key. It would be very unusual to have a relation in which all attributes are part of the primary key.

This makes both the Zip Code and City attributes in the Address relation foreign keys. (Note that City is made part of the key because, in rural areas, one zip code might have more than one city.) The model can be further normalized by eliminating several of the multivalued dependencies, as you will note in Table 25.5.

Table 25.5. Tables defined with identifying primary keys.

Address Type	Phone Type	Contact Type
ID (PK)	ID (PK)	ID (PK)
ype	Type	Type
Contact Method	Contact Reason
ID (PK)	ID (PK)
Method	Reason

Where these attributes exist in the previously defined relations, they will remain as foreign keys. As these examples illustrate , the dependencies are not eliminated, but their nature is changed so that dependencies exist on key attributes rather than nonkey attributes. In this example, however, fourth normal form is still not achieved. In the Individual relation, Company has a multivalued dependency (at best) on the Last Name, First Name , and Middle Initial.

Separating Company from Individuals brings the model to fourth normal form. Additional relations allow addresses and phones to be associated with either companies or individuals, and allow individuals to be related to more than one company, as you will note in Table 25.6.

Table 25.6. Fourth normal form.

Individuals	Individual-Company Relation
ID (PK)	Individual ID (FK)
Last Name	Company ID (FK)
First Name
Middle Initial
Contact Type (FK)

Page 633

Individuals
Individual-Company Relation
Assigned Employee (FK)
Individual Notes
Last Update User ID
Last Update Date/Time
Companies	Entity Type
ID (PK)	ID (PK)
Company	Type Company Notes
Addresses	Phone Numbers
ID (PK)	ID (PK)
Entity Type (FK)	Entity Type (FK)
Entity ID (FK)	Entity ID (FK)
Address Line 1	Phone Number
Address Line 2	Phone Type (FK)
Address Line 3	Last Update User ID
City (FK)	Last Update Date/Time
Zip Code (FK)
Address Type (FK)
Last Update User ID
Last Update Date/Time

An additional aspect of the logical model is the design of tablespaces. A tablespace consists of one or more data files and, as the name implies, a tablespace houses one or more database objects. Before proceeding to the physical design, designers should consider how they want to use tablespaces to group database objects along logical boundaries.

One tablespace is always created when Oracle is installed. This tablespace, called SYSTEM, houses all system tables and other system objects used by Oracle itself. Although this tablespace can be used to house additional tables, it is preferable to create separate tablespaces for application-specific objects. In many cases, it is desirable to create several tablespaces to house different types of database objects. A common logical division is to create separate tablespaces for rollback segments, indexes, tables, and temporary segments. This topic will be discussed in greater detail in the following section, but these logical design issues are worthy of some consideration before proceeding to the physical design.

Page 634

The Physical Model

The physical database consists of data files, tablespaces, rollback segments, tables, columns , and indexes. There are dependencies between these elements that impose an order on the design process. The process often starts with designing the smallest units of physical storage (the column) and proceeds, in order, to each successively larger unit of storage. Overall capacity and performance considerations provide constraints to the design, and should be considered at every step. As with logical modeling, developing a physical design can be a somewhat iterative process.

Column Attributes and DDL

Designing the physical database begins with assigning column attributes. The attributes of a column determine how it will be physically stored in the database by defining its datatype and maximum length. Oracle8 allows up to 1,000 columns per table. The datatype and length of a column should be carefully chosen at design time, because it is sometimes difficult to change these attributes after data has been loaded. Consider the following summarizations of each of the Oracle data types:

CHAR(SIZE)	Used to store fixed-length, alphanumeric data. (SIZE) determines the number of characters that will be stored for each value in the column. If a value is inserted into the column that is shorter than (SIZE), it will be padded with spaces on the right until it reaches (SIZE) characters in length. The maximum length of this datatype is 255 bytes. If (SIZE) is omitted, the default is 1 byte.
VARCHAR2(SIZE)	Used to store variable-length, alphanumeric data. This datatype differs from CHAR in that inserted values will not be padded with spaces; the Oracle7 maximum (SIZE) for this type is 2,000 bytes and 4,000 bytes in Oracle8.
NUMBER(P, S)	Used to store positive or negative, fixed or floating-point numeric values. The precision, (P), determines the maximum length of the data, whereas the scale, (S), determines the number of places to the right of the decimal. If scale is omitted, the default is 0. If precision is omitted, values are stored with their original precision up to the maximum of 38 digits.
DATE	Used to store dates and times. Oracle uses its own internal format to store only 7 bytes for day, month, century, year, hour , minute, and second. This is important because it illustrates the point that dates are fairly inexpensive to store ”7 bytes per record, even if only a portion of the information is used. The default representation for dates in Oracle is DD-MON-YY. For example, '01-JAN-95' is used to represent January 1, 1995.

Previous Table of Contents Next