DBMS-Enforced Data Integrity | Inside Microsoft SQL Server 2000

A database is only as useful as the user's confidence in it. That's why the server must enforce data integrity rules and business policies. SQL Server enforces data integrity within the database itself, guaranteeing that complex business policies are followed and that mandatory relationships between data elements are complied with.

Because SQL Server's client/server architecture allows you to use a variety of front-end applications to manipulate and present the same data from the server, it would be cumbersome to encode all the necessary integrity constraints, security permissions, and business rules into each application. If business policies were all coded in the front-end applications, every application would need to be modified every time a business policy changed. Even if you attempted to encode business rules into every client application, the danger of an application misbehaving would still exist. Most applications can't be fully trusted. Only the server can act as the final arbiter, and the server must not provide a back door for a poorly written or malicious application to subvert its integrity.

SQL Server uses advanced data integrity features such as stored procedures, declarative referential integrity (DRI), datatypes, constraints, rules, defaults, and triggers to enforce data integrity. Each of these features has its own use within a database; by combining these integrity features, you can make your database flexible and easy to manage yet secure.

Declarative Data Integrity

The ability to enforce data integrity based on the constraints you define when you create the database tables is known as declarative data integrity. The different kinds of declarative data integrity constraints correspond to the three fundamental aspects of data integrity: entity integrity, referential integrity, and domain integrity.

A central tenet of relational database theory is that every tuple of every relation (more colloquially, every row of every table) can be uniquely identified—a condition known as entity integrity. The attribute or combination of attributes (the column or combination of columns) that ensures uniqueness is known as the primary key. A table can have only one primary key. SQL Server allows you to designate the columns that make up the primary key when you define a table. This is known as a PRIMARY KEY constraint. SQL Server uses this PRIMARY KEY constraint to guarantee that the uniqueness of the values in the designated columns is never violated. It enforces entity integrity for a table by making sure that it has a primary key.

Sometimes more than one column (or combination of columns) of a table can uniquely identify a row—for example, an employee table might have an employee ID (emp_id) column and a Social Security number (soc_sec_num) column, both of whose values are considered unique. Such columns are often referred to as alternate keys or candidate keys. These keys must also be unique. Although a table can have only one primary key, it can have multiple alternate keys. SQL Server supports the multiple alternate key concept via UNIQUE constraints. When a column or combination of columns is declared unique, SQL Server prevents any row from being added or updated that would violate this uniqueness.

Assigning an arbitrary unique number as the primary key when no natural or convenient key exists is often most efficient. For example, businesses commonly use customer numbers or account numbers as unique identifiers or primary keys. SQL Server makes it easy to efficiently generate unique numbers by allowing one column in a table to have the IDENTITY property. You use the IDENTITY property to make sure that each value in the column is unique and that the values will increment (or decrement) by the amount you specify from a starting point that you specify. (A column with the IDENTITY property typically also has a PRIMARY KEY or UNIQUE constraint, but this isn't required.)

The second type of data integrity is referential integrity. SQL Server enforces logical relationships between tables with FOREIGN KEY constraints. A foreign key in a table is a column or combination of columns that match the primary key (or possibly an alternate key) of another table. The logical relationship between those two tables is the basis of the relational model; referential integrity means that this relationship is never violated.

For instance, the simple SELECT example shown earlier in the chapter includes a titles table and a publishers table. In the titles table, the column title_id (title ID) is the primary key. In the publishers table, the column pub_id (publisher ID) is the primary key. The titles table also includes a pub_id column, which isn't the primary key because a publisher can publish multiple titles. Instead, pub_id is a foreign key, and it references the primary key of the publishers table. If you declare this relationship when you define the table, SQL Server enforces it from both sides. First, it ensures that a title can't be entered in the titles table, or an existing pub_id in the titles table can't be modified, unless a valid publisher ID for the new pub_id is in the publishers table. Second, it ensures that no changes are made to a pub_id value in the publishers table without consideration of the referencing values in the titles table. It can do this in two ways. It can restrict changes to the publishers table so that a publisher can't be deleted or a pub_id value can't be changed if any titles in the titles table reference that publisher. Alternatively, it can cascade the changes through the titles table. So, if you were to delete a publisher, all rows in titles that reference that publisher would also be deleted. If you were to update a pub_id, all matching pub_id values in titles would be updated to the same value.

The third type of data integrity is domain integrity, in which individual data values must meet certain criteria. SQL Server makes sure that any data entered matches the type and range of the specified datatype. For example, it prohibits NULL values from being entered in a column unless the column explicitly allows NULLs. SQL Server supports a wide range of datatypes, resulting in great flexibility and efficient storage. It also supports the definition of CHECK constraints to define conditions that the data in a column must meet.

Datatypes

SQL Server datatypes provide the simplest form of data integrity by restricting the types of information (for example, characters, numbers, or dates) that can be stored in the columns of the database tables. You can also design your own datatypes (user-defined datatypes) to supplement those supplied by the system. For example, you can define a state_code datatype as two characters (CHAR(2)); SQL Server will then accept only two-character state codes. You can use a user-defined datatype to define columns in any table. An advantage of user-defined datatypes is that you can bind rules and defaults (which I'll discuss in the next two sections) to them for use in multiple tables, eliminating the need to include these types of checks in the front-end application.

CHECK Constraints and Rules

CHECK constraints and rules are integrity constraints that go beyond those implied by a column's datatype. When a user enters a value, SQL Server checks that value against any CHECK constraint or rule created for the specified column to ensure that only values that adhere to the definition of the constraint or rule are accepted.

Although CHECK constraints and rules are essentially equivalent in functionality, CHECK constraints are easier to use and provide more flexibility. CHECK constraints are the preferred mechanism for restricting values in a column; SQL Server provides rules primarily as backward compatibility feature. A CHECK constraint can be conveniently defined when a column is defined, it can be defined on multiple columns, and it has access to all of SQL Server's built-in functions. A rule, however, must be defined and then bound separately to a single column or user-defined datatype, and it has access only to built-in functions that don't reference database objects.

Both CHECK constraints and rules can require that a value fall within a particular range, match a particular pattern, or match one of the entries in a specified list. An advantage of CHECK constraints is that they can depend on either the value of another column or columns in the row or on the value returned by one of the built-in functions. A rule can't reference other fields. As an example of applying a CHECK constraint or rule, a database containing information on senior citizens could have the CHECK constraint or the rule "age column must contain a value between 65 and 120 years." A birth certificate database could require that the date in the birth_date column be some date prior to the current date.

Defaults

Defaults allow you to specify a value that SQL Server inserts if no value is explicitly entered in a particular field. For example, you can set the current date as the default value for an order_date field in a customer order record. Then, if a user or front-end application doesn't make an entry in the order_date field, SQL Server automatically inserts the current date. You can also use the keyword DEFAULT as a placeholder in an INSERT or UPDATE statement, which instructs SQL Server to set the value to the declared default value.

Triggers

Triggers are a special type of stored procedure. Stored procedures can be executed only when explicitly called; triggers are automatically invoked, or triggered, by SQL Server, and this is their main advantage. Triggers are associated with particular pieces of data and are called automatically whenever an attempt to modify that data is made, no matter what causes the modification (user entry or an application action).

Conceptually, a trigger is similar to a CHECK constraint or rule. SQL Server automatically activates triggers, constraints, and rules when an attempt is made to modify the data they protect. CHECK constraints and rules then perform fairly simple types of checks on the data—for example, "make sure the age field has a value between 0 and 120." Triggers, on the other hand, can enforce extremely elaborate restrictions on the data, which helps to ensure that the rules by which your business operates can't be subverted. Because triggers are a form of stored procedure, they have the full power of the Transact-SQL language at their disposal and they can invoke other stored and extended stored procedures. You can write a trigger that enforces complex business rules, such as the following:

Don't accept an order:

If the customer has any past due accounts with us

If the customer has a bad credit rating by ACME Credit Service (with the trigger calling an extended procedure that automatically dials up ACME to get the credit rating)

If the order is for more than $50,000 and the customer has had an account with us for less than six months

This integrity check is quite powerful. Yet the trigger to enforce it is simple to write. Triggers can also enforce referential integrity, ensuring that relationships between tables are maintained. For example, a trigger can prohibit a customer record from being deleted if open orders exist for the customer, or it can prohibit any new order for a customer for which no record exists. While you can do all this by declaring primary and foreign key constraints, triggers give you some added flexibility and functionality. Triggers allow you to supply your own error messages, which might be more useful than the default messages provided by SQL Server. Triggers also allow you to implement other forms of referential actions beside those available with the foreign key definition. I'll talk more about referential actions in Chapter 6, where I'll talk about constraints, and in Chapter 12, where I'll go into greater detail about triggers.

Triggers automatically execute whenever a specified change to a data object is attempted. A trigger executes once per statement, even if multiple rows would be affected. It has access to the before and after images of the data. SQL Server 2000 provides two kinds of triggers: "after" triggers and "instead-of" triggers. Instead-of triggers are what some people might call "before" triggers, but that name is not really accurate for SQL Server. An instead-of trigger defines actions to be carried out instead of the requested data modification. A before trigger would define actions to perform prior to the data modification. By default, if a table or view has an instead-of trigger defined for a particular modification operation (INSERT, UPDATE, or DELETE), that data modification does not take place. Instead, the trigger action is executed.

An after trigger executes after the data has been modified. The trigger takes further action such as rolling back the current transaction or carrying out special referential actions.