Constraints | Inside Microsoft SQL Server 2005: The Storage Engine (Solid Quality Learning)

Constraints provide a powerful yet easy way to enforce the data integrity in your database. Data integrity comes in three forms:

Entity integrity Ensures that a table has a primary key. In SQL Server 2005, you can guarantee entity integrity by defining PRIMARY KEY or UNIQUE constraints or by building unique indexes. Alternatively, you can write a trigger to enforce entity integrity, but this is usually far less efficient.
Domain integrity Ensures that data values meet certain criteria. In SQL Server 2005, domain integrity can be guaranteed in several ways. Choosing appropriate data types can ensure that a data value meets certain conditionsfor example, that the data represents a valid date. Other approaches include defining CHECK constraints or FOREIGN KEY constraints or writing a trigger. You might also consider DEFAULT constraints as an aspect of enforcing domain integrity.
Referential integrity Enforces relationships between two tables, a referenced table, and a referencing table. SQL Server 2000 allows you to define FOREIGN KEY constraints to enforce referential integrity, and you can also write triggers for enforcement. It's crucial to note that there are always two sides to referential integrity enforcement. If data is updated or deleted from the referenced table, referential integrity ensures that any data in the referencing table that refers to the changed or deleted data is handled in some way. On the other side, if data is updated or inserted into the referencing table, referential integrity ensures that the new data matches a value in the referenced table.

In this section, I'll briefly describe some of the internal aspects of managing constraints. Constraints are also called declarative data integrity because they are part of the actual table definition. This is in contrast to programmatic data integrity, which uses stored procedures or triggers, which are discussed in Inside Microsoft SQL Server 2005: T-SQL Programming.

Here are the five types of constraints:

PRIMARY KEY
UNIQUE
FOREIGN KEY
CHECK
DEFAULT

You might also sometimes see the IDENTITY property and the nullability of a column described as constraints. I typically don't consider these attributes to be constraints; instead, I think of them as properties of a column, for two reasons. First, each constraint has its own row in the sys.objects catalog view, but IDENTITY and nullability information is not available in sys.objects, only in sys.columns and sys.identity_columns. This makes me think that these properties are more like data types, which are also viewable through sys.columns. Second, when you use the command SELECT INTO to make a copy of a table, all column names and data types are copied, as well as IDENTITY information and column nullability, but constraints are not copied to the new table. This makes me think that IDENTITY and nullability are more a part of the actual table structure than constraints are.

Constraint Names and Catalog View Information

The following simple create table statement, which includes a primary key on the table, creates a primary key constraint along with the table, and the constraint has a very cryptic-looking name.

CREATE TABLE customer ( cust_id      int         IDENTITY NOT NULL PRIMARY KEY, cust_name    varchar(30) NOT NULL )

If you don't supply a constraint name in the CREATE or ALTER TABLE statement that defines the constraint, SQL Server comes up with a name for you.

The constraint produced from the preceding simple statement has a name very similar to the nonintuitive name PK__customer__0856260D. (The hexadecimal number at the end of the name will most likely be different for a customer table that you create.) All types of single-column constraints use this naming scheme, which I'll explain shortly. The advantage of explicitly naming your constraint rather than using the system-generated name is greater clarity. The constraint name is used in the error message for any constraint violation, so creating a name such as CUSTOMER_PK probably makes more sense to users than a name such as PK__customer__cust_i__0856260D. You should choose your own constraint names if such error messages are visible to your users. The first two characters (PK) show the constraint typePK for PRIMARY KEY, UQ for UNIQUE, FK for FOREIGN KEY, and DF for DEFAULT.

Next are two underscore characters, which are used as a separator. (You might be tempted to use one underscore to conserve characters and to avoid having to truncate as much. However, it's common to use a single underscore in a table name or a column name, both of which appear in the constraint name. Using two underscore characters distinguishes the kind of a name it is and where the separation occurs.)

Note

Constraint names are schema scoped, which means they all share the same namespace and hence must be unique within a schema. Within a schema, you cannot have two tables with the same name for their PRIMARY KEY constraint.

Next comes the table name (customer), which is limited to 116 characters for a PRIMARY KEY constraint and slightly fewer characters for all other constraint names. For all constraints other than PRIMARY KEY, there are two more underscore characters for separation followed by the next sequence of characters, which is the column name. The column name is truncated to five characters if necessary. If the column name has fewer than five characters in it, the length of the table name portion can be slightly longer.

And finally, the hexadecimal representation of the object ID for the constraint (68E79C55) comes after another separator. This value is used in the object_id column of the sys.objects catalog view. Object names are limited to 128 characters in SQL Server 2005, so the total length of all portions of the constraint name must also be less than or equal to 128.

Several catalog views contain constraint information. They all inherit the columns from the sys.objects view and include additional columns specific to the type of constraint. These views are:

sys.key_constraints
sys.check_constraints
sys.default_constraints
sys.foreign_keys

The parent_object_id column, which indicates which object contains the constraint, is actually part of the base sys.objects view, but for objects that have no "parent," this column is NULL. Here is a query that allows you to take the name of a constraint, such as PK__customer__0856260D shown earlier, and use the hexadecimal number at the end of the name to get the sys.objects information for the object that includes this constraint:

SELECT * FROM sys.objects WHERE object_id = (SELECT parent_object_id from sys.objects where object_id = 0x0856260D)

You might think a query like this is unnecessary because the constraint name includes the object name, but keep in mind that object names are not unique. You can have multiple objects with the name customer if they are in different schemas. The object_id however, is guaranteed to be unique.

Keep these additional facts about constraints in mind if you want to query the constraint metadata.

A constraint is an object. A constraint has an entry in the sys.objects table with a value in the type column of C, D, F, PK, or UQ for CHECK, DEFAULT, FOREIGN KEY, PRIMARY KEY, or UNIQUE, respectively.
All four constraint catalog views relate to sys.objects. The constraint views are just extensions of the sys.objects catalog view. The object_id column in the constraint views is the object ID of the constraint, and the parent_column_id column of the constraint views is the object ID of the base table on which the constraint is declared.
parent_column_id values in sys.check_constraints and sys.default_constraints indicate the column the constraint applies to. If the constraint is a column-level CHECK or DEFAULT constraint, the constraint view has the column ID of the column in its parent_column_id column. This parent_column_id is related to the column_id of sys.columns for the base table represented by parent_object_id. A table-level constraint or any PRIMARY KEY/UNIQUE constraint (even if column level) always has 0 in the parent_column_id column.
To see the columns in a FOREIGN KEY constraint, you can look at the view sys.foreign_key_columns. This view's constraint_object_id column relates to the row for the constraint in sys.objects, and for a composite foreign key, there will be multiple rows with the same constraint_object_id value. The constraint_column_id indicates the column sequence, the parent_object_id and parent_column_id indicate the table containing the FOREIGN KEY constraint (the referencing table), and the referenced_object_id and referenced_column_id provide information about the referenced table.
sys.key_constraints contains rows for both PRIMARY KEY constraints and UNIQUE constraints To see the names and order of the columns in a PRIMARY KEY or UNIQUE constraint, you can use the parent_object_id and unique_index_id columns to find the corresponding row in sys.indexes and then look at the sys.index_columns table. I'll discuss the relationship between PRIMARY KEY and UNIQUE constraints and indexes in Chapter 7, and we'll look at the metadata in more detail at that time.

Note

Some of the documentation classifies constraints as either table-level or column-level and implies that any constraint defined on the line with a column is a column-level constraint and any constraint defined as a separate line in the table or added to the table with the ALTER TABLE command is a table-level constraint. However, this distinction does not hold true when you look at constraint catalog views. The only distinction in the catalog views is whether the constraint is on a single column.

Constraint Failures in Transactions and Multiple-Row Data Modifications

Many bugs occur in application code because developers don't understand how failure of a constraint affects a multiple-statement transaction declared by the user. The biggest misconception is that any error, such as a constraint failure, automatically aborts and rolls back the entire transaction. On the contrary, after an error is raised, it's up to the transaction to proceed and ultimately commit or to roll back. This feature provides the developer with the flexibility to decide how to handle errors. (The semantics are also in accordance with the ANSI SQL-92 standard for COMMIT behavior.)

Because many developers have handled transaction errors incorrectly and because it can be tedious to add an error check after every command, SQL Server includes a SET option called XACT_ABORT that causes SQL Server to abort a transaction if it encounters any error during the transaction. The default setting is OFF, which is consistent with ANSI-standard behavior. For more details about handling errors in SQL Server 2005, see Inside Microsoft SQL Server 2005: T-SQL Programming, which has an entire chapter about error handling.

A final comment about constraint errors and transactions: a single data modification statement (such as an UPDATE statement) that affects multiple rows is automatically an atomic operation, even if it's not part of an explicit transaction. If such an UPDATE statement finds 100 rows that meet the criteria of the WHERE clause but one row fails because of a constraint violation, no rows will be updated. I discuss implicit and explicit transactions a bit more in Chapter 8.

The Order of Integrity Checks

The modification of a given row will fail if any constraint is violated or if a trigger rolls back the operation. As soon as a failure in a constraint occurs, the operation is aborted, subsequent checks for that row aren't performed, and no trigger fires for the row. Hence, the order of these checks can be important, as the following list shows.

Defaults are applied as appropriate.
NOT NULL violations are raised.
CHECK constraints are evaluated.
FOREIGN KEY checks of referencing tables are applied.
FOREIGN KEY checks of referenced tables are applied.
UNIQUE/PRIMARY KEY is checked for correctness.
Triggers fire.