Defining Data Integrity

With respect to Oracle, data integrity often means that the data in the database consistently adheres to the business rules of the organization in question. The means by which this data remains consistent can be through application code, through the use of database triggers, and through declarative integrity constraints.

How any given business maps the business rules to these methods for designing in integrity is, usually, a combined decision made by the database administrator advising the designer. The administrator is primarily concerned with the implementation methods chosen by the designer and with balancing performance needs against the integrity requirements and the chosen method of implementation.

Application Code

Application code can either be implemented as stored procedures within the database or as applications running on the client or outside the database on the server.

Database Triggers

It was suggested to me by one of the people from whom I took an Oracle University class in Minnesota that, from a database administrator's perspective, "a trigger's a wonderful thing." At the time, we thought that he watched just a little too much Winnie the Pooh, but the sentiment has stayed with me.

Database triggers are PL/SQL programs whose executions are entirely event driven (such as on insert or after update or on error). Triggers can be enabled or disabled. That is to say, they can either be allowed to execute whenever the "triggering" event occurs, or they can be disallowed from occurring at all. Triggers are typically created to enforce complex business rules that cannot be defined as an integrity constraint.

Triggers are not a part of the Oracle 9i Fundamentals I exam, so they will not be explored in depth in this chapter. You should, however, be aware that they are a way to ensure integrity in the database.

Integrity Constraints

Integrity constraints are usually the preferred mechanism for the enforcement of business rules because they provide performance improvements, are easily declared and modified, and do not require extensive and complex coding. They tend to centralize the rules while allowing for flexibility and also have the advantage of being fully documented in the data dictionary.

A rule can be defined on a column or a set of columns in a single table that references (or matches) values in a related table. Referential integrity can also include rules that restrict the type of data manipulation allowed on the reference column values and how actions on those values affect the actions on dependent tables.

You can restrict, or disallow, the updating or deletion of referenced data so that you don't strand records in one table when the referencing data is deleted. You can set a column to either NULL or a default value when referenced data is updated or deleted. You can force updates or deletes to cascade to associated dependent data whenever the referenced data is updated or deleted. Or you can opt for no action to be allowed, meaning that updates or deletion on the referenced (parent) data be completely disallowed if child data exists. This differs from restricting because restriction is checked at the end of a statement or transaction. No action is the default action.

Table 14.1 lists the types of constraints available in Oracle and provides a description of those constraints.

Table 14.1. Constraints with Definition
Constraint	Description
Check	Specifies that a condition exists that every row in the table must satisfy.
Foreign key	Designates that a column or a combination of columns is a foreign key in a referential integrity constraint. Foreign keys can use the DELETE CASCADE rules that cause the deletion of all dependent rows when the referenced row is deleted. It can use the SET NULL rule so that information in the child table is retained when the parent information is updated or deleted but is no longer tied to a parent row. Finally, again by default, Oracle disallows the updating or deleting of the parent (referenced) information.
Not NULL	Specifies that a column cannot contain a null value. By definition, all columns in a table allow NULLs to exist. NULL is not the same as spaces; NULL is not the same as 0. It is the absence of data. A Not NULL constraint requires that a column of a table must not be allowed to contain NULL values.
Primary key	Designates that a column or a combination of columns is the given table's primary key. This designation carries with it an automatic index. A table can have at most one primary key; however, no table has to have a primary key. Any column included in a primary key cannot contain any NULL values.
Unique	Designates that a column or a combination of columns specifies the way that the entire row becomes guaranteed unique. This is nearly a primary key constraint, but it allows NULLs (NULL values are never equal to anything, not even another NULL; therefore, a column full of NULLs is by definition unique), and it does not carry an implicit index.

Foreign key constraints between two columns either in the same table or in different tables must be defined on columns with the exact same data type and size. The column names, however, do not have to match.

There are distinct advantages to using constraints over application code or triggers. Constraints enforce applicable business rules in code in the database rather than in the application logic. This means that any time the business rules change, the change can be made centrally to the database, and no one has to worry about those changes being missed in one of the application programs. Definition and declaration of constraints occurs by use of a simple SQL statement. Alteration requires no complex additional programming logic. Finally, the performance of placing the business rule in the constraints is at worst similar to the cost of executing a SQL statement that would evaluate that same logic. Using constraints also ensures that business rules are always enforced, no matter what method is used to modify the data.

Constraint States

Integrity constraints can either be enabled (ENABLE) or disabled (DISABLE). If a constraint is enabled, just as if a trigger is enabled, data is checked when it is being entered or updated, and data that does not conform to the given constraint's rule is prevented from being entered. If the constraint is disabled, any data entered in the database will be entered even if that data does not comply with the constraint rule.

An integrity constraint can be in any one of the following states:

DISABLE NOVALIDATE means that the constraint is not checked. Data in the table and any new data are entered and updated, and none of the data may any longer conform to the rules defined by the constraint. DISABLE implies NOVALIDATE unless VALIDATE is specified.
DISABLE VALIDATE means that the constraint is not active but also that no modification can take place in the constrained columns, and any indexes that were involved with the constraint are dropped.
ENABLE NOVALIDATE means that the constraint allows new data to be entered as long as that data meets the constrained condition but also that the data that is in the table does not have to meet those conditions. This is often a useful state in data warehousing where valid OLTP data is being uploaded.
ENABLE VALIDATE means that no row that violates the constraint can be entered into the table, and no row that violates the constraint exists in the table. If the constraint was in the ENABLE NOVALIDATE or the disabled state and then the constraint was altered to the ENABLE VALIDATE state, rows that weren't validated need to be altered or deleted if they don't conform when the validation is reenabled. ENABLE implies VALIDATE unless NOVALIDATE is specified.

Whenever a unique or primary key constraint moves from the DISABLE state to the ENABLE state and there is no existing index, a unique index gets created automatically. The converse is also true; whenever a primary key moves from an ENABLE state to the DISABLE state, the index is dropped.

Whenever any constraint moves from the NOVALIDATE to VALIDATE state, all data must consequently be checked. Moving from VALIDATE to NOVALIDATE, naturally, ignores the fact that the data has ever been checked for validity. This means that if you move from VALIDATE to NOVALIDATE and immediately back to VALIDATE without entering or changing any data, every entry has to be rechecked.

Constraint Checking

Your database can either check constraints to ensure data validity during the transaction, or defer the checking until the end of the transaction.

Nondeferred constraints are considered to be immediate constraints and are enforced at the end of each and every DML statement. Constraint violations in this case cause the statement to be rolled back immediately. If the constraint causes an action such as DELETE CASCADE, the action is taken as an immediate part of the statement that caused it. Constraints defined as nondeferrable cannot later be modified to be enforced at the end of a transaction rather than at the end of every DML.

Deferred constraints are checked only at the end of a transaction when that transaction is committed. At that time, if any constraint violations are detected, the entire transaction is rolled back. Deferred constraints are particularly useful when you are inserting both the parent and child rows in a foreign key relationship at the same time, as in an order entry system where the order and the items on that order are being entered at the same time.

Constraints can be defined as deferred and have either of the following characteristics:

INITIALLY IMMEDIATE means that by default the constraint should behave as if it were an immediate constraint and any other behavior needs to be set otherwise.
INITIALLY DEFERRED means that by default the constraint should be enforced only at the end of the transaction.

You can either use the SET CONSTRAINTS statement to make a constraint either DEFERRED or IMMEDIATE, or you can use the ALTER SESSION statement to make alterations using the SET CONSTRAINTS to DEFERRED or to IMMEDIATE.

The SET CONSTRAINTS statement makes constraints either DEFERRED or IMMEDIATE for a particular transaction and can either be used to set a single constraint or a list of constraints. SET CONSTRAINT lasts for the duration of the transaction, or until another SET CONSTRAINTS statement resets the mode. SET CONSTRAINTS is not allowed inside triggers. An example of the generic version of this statement follows:

 SET CONSTRAINT | CONSTRAINTS {constraint name|constraint list| ALL} {IMMEDIATE|DEFERRED}

The ALTER SESSION SET CONSTRAINTS command implies setting all constraints, and no list of constraints can be specified. The statement applies to the current session only. An example of the generic ALTER SESSION command follows:

 ALTER SESSION SET CONSTRAINTS = {IMMEDIATE|DEFERRED|DEFAULT}

Enforcing Primary and Unique Key Constraints

Primary keys and unique keys are enforced using indexes. You can control both the location of the index and the type of index used for enforcing these constraints. Oracle uses the following process to determine how to enforce the constraints:

If the constraint is disabled, no indexes are needed.
If the constraint is enabled and the columns in the constraint form the leading part of the index, the index is used to enforce the constraint whether the index itself was created as unique or nonunique.
If the constraint is enabled and there is no index that uses the constraint columns as the leading part of the index, an index with the same name as the constraint is created using the following rules:
- If the key is deferrable, a nonunique index on the key columns is built.
- If the key is nondeferrable, a unique index is created.
If an index is available to be used and the constraint is nondeferred, Oracle uses the existing index.
If an index is available to be used, the constraint is deferrable, and the index is nonunique, Oracle uses the existing index.

Guidelines for Using Constraints

It is necessary to take many things into consideration when determining what constraints to define on a table.

When primary keys and unique constraints are created, it is important to remember that these imply indexes even if you don't specify the indexes, and the indexes should be deliberately built and built in a tablespace separate from the one that houses the table. This is accomplished either using the USING INDEX clause or by first creating the table, then creating the index, and then altering the table to add or enable the constraint.

If you are anticipating a large number of bulk loads on the table, nonunique indexes would be preferred over unique indexes. Further, if you are doing bulk loads, it is typically preferable to disable the constraints, perform the load, and then reenable the constraints. If a unique index is used for enforcing a primary key or unique constraint, this index needs to be dropped when the constraint is disabled.

For self-referencing foreign keys (keys that join tables to themselves), you should define or enable foreign keys only after the initial load and defer constraint checking. Defining the constraints as deferrable is the most useful method when the table will be used as the target for frequent bulk loads.

You are now ready to define and implement data integrity constraints. The following section guides you through the considerations and implementation of these constraints.