Constraints provide a powerful yet easy way to enforce data integrity in your database. As described in Chapter 2, data integrity comes in three forms:
In this section, I'll discuss the use of constraints for enforcing your data integrity requirements. Constraints are also called declarative data integrity because they are part of the actual table definition. This is in contrast to programmatic data integrity enforcement, which uses stored procedures or triggers. Here are the five types of constraints:
You might also sometimes see the IDENTITY property and the nullability of a column described as constraints. I typically don't consider these attributes to be constraints; instead, I think of them as properties of a column, for two reasons. First, as we'll see, each constraint has its own row in the sysobjects system table, but IDENTITY and nullability information is not stored in sysobjects, only in syscolumns. This makes me think that these properties are more like datatypes, which are also stored in syscolumns. Second, when you use the special command SELECT INTO, a new table can be created that is a copy of an existing table. All column names and datatypes are copied, as well as IDENTITY information and column nullability. However, constraints are not copied to the new table. This makes me think that IDENTITY and nullability are more a part of the actual table structure than constraints are.
A central tenet of the relational model is that every row in a table is in some way unique and can be distinguished in some way from every other row in the table. You could use the combination of all columns in a table as this unique identifier, but the identifier is usually at most the combination of a handful of columns, and often it's just one column: the primary key. Although some tables might have multiple unique identifiers, each table can have only one primary key. For example, perhaps the employee table maintains both an Emp_ID column and an SSN (social security number) column, both of which can be considered unique. Such columns are often referred to as alternate keys or candidate keys, although both terms are design terms and aren't used by the ANSI SQL standard or by SQL Server. In practice, one of the two columns is logically promoted to primary key using the PRIMARY KEY constraint, and the other is usually declared by a UNIQUE constraint. Although neither the ANSI SQL standard nor SQL Server requires it, it's good practice to declare a PRIMARY KEY constraint on every table. Furthermore, you must designate a primary key for a table that will be published for transaction-based replication.
Internally, PRIMARY KEY and UNIQUE constraints are handled almost identically, so I'll discuss them together here. Declaring a PRIMARY KEY or UNIQUE constraint simply results in a unique index being created on the specified column or columns, and this index enforces the column's uniqueness in the same way that a unique index created manually on a column would. The query optimizer makes decisions based on the presence of the unique index rather than on the fact that a column was declared as a primary key. How the index got there in the first place is irrelevant to the optimizer.
All columns that are part of a primary key must be declared (either explicitly or implicitly) as NOT NULL. Columns that are part of a UNIQUE constraint can be declared to allow NULL. However, for the purposes of unique indexes, all NULLs are considered equal. So if the unique index is on a single column, only one NULL value can be stored (another good reason to try to avoid NULL whenever possible). If the unique index is on a composite key, one of the columns can have many NULLs as long as the value in the other column is unique. For example, if the constraint contains two int columns, exactly one row of each of these combinations will be allowed:
NULL   NULL
0      NULL
NULL   0
1      NULL
NULL   1
This behavior is questionable: NULL represents an unknown, but using it this way clearly implies that NULL is equal to NULL. As you'll recall, I recommend that you avoid using NULLs, especially in key columns.
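A minimal sketch of this behavior, using a hypothetical table named null_demo (the table and constraint names are mine, not from the text):

```sql
-- Hypothetical table illustrating NULL handling in a UNIQUE constraint
CREATE TABLE null_demo
(
a int NULL,
b int NULL,
CONSTRAINT null_demo_unq UNIQUE (a, b)
)
GO
INSERT null_demo VALUES (NULL, NULL)    -- succeeds
INSERT null_demo VALUES (1, NULL)       -- succeeds
INSERT null_demo VALUES (NULL, NULL)    -- fails with a duplicate key error,
                                        -- because NULL is treated as equal to NULL
```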
You can explicitly specify the index attributes CLUSTERED or NONCLUSTERED when you declare a constraint. If you don't, the index for a UNIQUE constraint will be nonclustered and the index for a PRIMARY KEY constraint will be clustered (unless CLUSTERED has already been explicitly stated for a unique index, because only one clustered index can exist per table). You can specify the index FILLFACTOR attribute if a PRIMARY KEY or UNIQUE constraint is added to an existing table using the ALTER TABLE command. FILLFACTOR doesn't make sense in a CREATE TABLE statement because the table has no existing data, and FILLFACTOR on an index affects how full pages are only when the index is initially created. FILLFACTOR isn't maintained when data is added.
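For example, when adding a constraint to an existing table, you can specify FILLFACTOR like this (the table and constraint names here are illustrative):

```sql
-- Add a PRIMARY KEY to an existing table, leaving each index page
-- 20 percent empty when the index is initially built
ALTER TABLE customer
ADD CONSTRAINT cust_pk PRIMARY KEY CLUSTERED (cust_id)
WITH FILLFACTOR = 80
```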
Try to keep the key lengths as compact as possible. Columns that are the primary key or that are unique are most likely to be joined and frequently queried. Compact key lengths allow more index entries to fit on a given 8-KB page, thereby reducing I/O, increasing cache hits, and speeding up character matching. Clustered index keys are used as bookmarks in all your nonclustered indexes, so a long clustered key will increase the size and decrease the I/O efficiency of all your indexes. So if your primary key has a clustered index, you've got plenty of good reasons to keep it short. When no naturally efficient compact key exists, it's often useful to manufacture a surrogate key using the IDENTITY property on an int column. If int doesn't provide enough range, a good second choice is a numeric column with the required precision and with scale 0. Alternatively, you can consider a bigint for an identity column. You might use this surrogate as the primary key, use it for most join and retrieval operations, and declare a UNIQUE constraint on the natural but inefficient columns that provide the logical unique identifier in your data. Or you might dispense with creating the UNIQUE constraint altogether if you don't need to have SQL Server enforce the uniqueness. Indexes slow performance of data modification statements because the index as well as the data must be maintained.
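A sketch of this surrogate-key pattern, with hypothetical table and column names:

```sql
-- A compact surrogate key for joins, plus a UNIQUE constraint
-- on the natural (but wider) identifier
CREATE TABLE part
(
part_id    int IDENTITY NOT NULL PRIMARY KEY,   -- compact surrogate key
part_desc  varchar(80)  NOT NULL,
CONSTRAINT part_desc_unq UNIQUE (part_desc)     -- natural identifier
)
```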
Although it's permissible to do so, don't create a PRIMARY KEY constraint on a column of type float or real. Because these are approximate datatypes, the uniqueness of such columns is also approximate, and the results can sometimes be unexpected.
You can't directly drop a unique index created as a result of a PRIMARY KEY or UNIQUE constraint by using the DROP INDEX statement. Instead, you must drop the constraint by using ALTER TABLE DROP CONSTRAINT (or you must drop the table itself). This feature was designed so that a constraint can't be compromised accidentally by someone who doesn't realize that the index is being used to enforce the constraint.
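For example, assuming a customer table with a PRIMARY KEY constraint named cust_pk (a hypothetical name), the two statements behave like this:

```sql
-- Fails: the index is being used to enforce a constraint
DROP INDEX customer.cust_pk

-- Works: dropping the constraint also drops its index
ALTER TABLE customer DROP CONSTRAINT cust_pk
```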
There is no way to temporarily disable a PRIMARY KEY or UNIQUE constraint. If disabling is required, use ALTER TABLE DROP CONSTRAINT and then later restore the constraint by using ALTER TABLE ADD CONSTRAINT. If the index you use to enforce uniqueness is clustered, when you add the constraint to an existing table, the entire table and all nonclustered indexes will be internally rebuilt to establish the cluster order. This can be a time-consuming task, and it requires about 1.2 times the existing table space as a temporary work area in the database would require (2.2 times total, counting the permanent space needed) so that the operation can be rolled back if necessary. However, you can rebuild an index that is enforcing a constraint by using the DROP_EXISTING option of the CREATE INDEX statement. This drops and re-creates the index behind the scenes as a single operation. I'll discuss the CREATE INDEX command in detail, including the DROP_EXISTING option, in Chapter 8.
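A sketch of such a rebuild, assuming the cust_pk constraint from the previous example enforces uniqueness on customer.cust_id:

```sql
-- Rebuild the index behind the constraint as a single operation,
-- without dropping and re-adding the constraint itself
CREATE UNIQUE CLUSTERED INDEX cust_pk
ON customer (cust_id)
WITH DROP_EXISTING
```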
Typically, you declare PRIMARY KEY and UNIQUE constraints when you create the table (CREATE TABLE). However, you can add or drop both by subsequently using the ALTER TABLE command. To simplify matters, you can declare a PRIMARY KEY or UNIQUE constraint that includes only a single column on the same line that you define that column in the CREATE TABLE statement. Or you can declare the constraint as a separate line in your CREATE TABLE statement. I'll show you examples of both methods shortly. The approach you use is largely a matter of personal preference unless your constraint involves multiple columns. Constraints defined along with a column definition can refer only to that column.
Following are examples of four ways to declare a PRIMARY KEY constraint on a single column. They all cause a unique, clustered index to be created. Note the (abridged) output of the sp_helpconstraint procedure for each—especially the constraint name.
EXAMPLE 1
CREATE TABLE customer
(
cust_id      int          IDENTITY NOT NULL PRIMARY KEY,
cust_name    varchar(30)  NOT NULL
)
GO
EXEC sp_helpconstraint customer
GO

>>>>
Object Name
-----------
customer

constraint_type          constraint_name
-----------------------  ------------------------------
PRIMARY KEY (clustered)  PK__customer__68E79C55
EXAMPLE 2
CREATE TABLE customer
(
cust_id      int          IDENTITY NOT NULL
             CONSTRAINT cust_pk PRIMARY KEY,
cust_name    varchar(30)  NOT NULL
)
GO
EXEC sp_helpconstraint customer
GO

>>>>
Object Name
-----------
customer

constraint_type          constraint_name
-----------------------  ---------------
PRIMARY KEY (clustered)  cust_pk

No foreign keys reference this table.
EXAMPLE 3
CREATE TABLE customer
(
cust_id      int          IDENTITY NOT NULL,
cust_name    varchar(30)  NOT NULL,
PRIMARY KEY (cust_id)
)
GO
EXEC sp_helpconstraint customer
GO

>>>>
Object Name
-----------
customer

constraint_type          constraint_name
-----------------------  ----------------------
PRIMARY KEY (clustered)  PK__customer__59063A47

No foreign keys reference this table.
EXAMPLE 4
CREATE TABLE customer
(
cust_id      int          IDENTITY NOT NULL,
cust_name    varchar(30)  NOT NULL,
CONSTRAINT customer_PK PRIMARY KEY (cust_id)
)
GO
EXEC sp_helpconstraint customer
GO

>>>>
Object Name
-----------
customer

constraint_type          constraint_name
-----------------------  ---------------
PRIMARY KEY (clustered)  customer_PK

No foreign keys reference this table.
In Examples 1 and 3, I did not provide an explicit name for the constraint, so SQL Server came up with the names. The names, PK__customer__68E79C55 and PK__customer__59063A47, seem cryptic, but there is some method to this apparent madness. All types of single-column constraints use this same naming scheme, which I'll discuss later in the chapter, and multi-column constraints use a similar scheme. Whether you choose a more intuitive name, such as cust_pk in Example 2 or customer_PK in Example 4, or the less intuitive (but information-packed), system-generated name is up to you. Here's an example of creating a multi-column UNIQUE constraint. (The primary key case is essentially identical.)
CREATE TABLE customer_location
(
cust_id               int  NOT NULL,
cust_location_number  int  NOT NULL,
CONSTRAINT customer_location_unique UNIQUE
    (cust_id, cust_location_number)
)
GO
EXEC sp_helpconstraint customer_location
GO

>>>
Object Name
-----------------
customer_location

constraint_type  constraint_name           constraint_keys
---------------  ------------------------  --------------------
UNIQUE           customer_location_unique  cust_id,
(non-clustered)                            cust_location_number

No foreign keys reference this table.
As noted earlier, a unique index is created to enforce either a PRIMARY KEY or a UNIQUE constraint. The name of the index is the same as the constraint name, whether the name was explicitly defined or system-generated. The index used to enforce the customer_location_unique constraint in the above example is also named customer_location_unique. The index used to enforce the column-level PRIMARY KEY constraint of the customer table in Example 1 is named PK__customer__68E79C55, which is the system-generated name of the constraint. You can use the sp_helpindex stored procedure to see information for all indexes of a given table. For example:
EXEC sp_helpindex customer

>>>
index_name              index_description               index_keys
----------------------  ------------------------------  ----------
PK__customer__68E79C55  clustered, unique, primary key  cust_id
                        located on default
One of the fundamental concepts of the relational model is the logical relationships between tables. In most databases, certain relationships must exist (that is, the data must have referential integrity) or else the data will be logically corrupt.
SQL Server automatically enforces referential integrity through the use of FOREIGN KEY constraints. This feature is sometimes referred to as declarative referential integrity, or DRI, to distinguish it from other features, such as triggers, that you can also use to enforce the existence of the relationships.
The full ANSI SQL-92 standard contains the notion of the referential action, sometimes (incompletely) referred to as a cascading delete. SQL Server 2000 supports only some of these actions as part of FOREIGN KEY constraints, but the full notion warrants some discussion here because the remaining capability exists via triggers.
The idea behind referential actions is this: sometimes, instead of just preventing an update or delete of data that would violate a foreign key reference, you might be able to perform an additional, compensating action that will still enable the constraint to be honored. For example, if you were to delete a row from the customer table that is referenced by rows in the orders table, you can have SQL Server automatically delete all the related order rows (that is, cascade the delete to orders), in which case the constraint won't be violated and the customer row can be deleted. This feature is intended for both UPDATE and DELETE statements, and four possible actions are defined: NO ACTION, CASCADE, SET DEFAULT, and SET NULL.
SQL Server 2000 allows you to specify either NO ACTION or CASCADE when you define your foreign key constraints. I'll give you some examples shortly. If you want to implement either SET DEFAULT or SET NULL, you can use a trigger. Implementing the SET NULL action is very straightforward, as you'll see in Chapter 12, which discusses triggers in detail. One reason it's so straightforward is that while a FOREIGN KEY requires any value in the referencing table to match a value in the referenced table, NULLs are not considered to be a value, and having a NULL in the referencing table doesn't break any relationships. Implementing the SET DEFAULT action is just a little more involved. Suppose you want to remove a row from the referenced table and set the foreign key value in all referencing rows to -9999. If you want to maintain referential integrity, you have to have a "dummy" row in the referenced table with the primary key value of -9999.
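A minimal sketch of a SET NULL trigger, with a hypothetical trigger name, assuming orders.cust_id allows NULL and that no enabled FOREIGN KEY constraint blocks the delete first (as discussed in the note below, an enabled constraint would fail before the trigger fires):

```sql
-- Hypothetical SET NULL referential action implemented as a trigger:
-- when customer rows are deleted, null out the referencing foreign keys
CREATE TRIGGER customer_set_null ON customer FOR DELETE
AS
UPDATE orders
SET cust_id = NULL
WHERE cust_id IN (SELECT cust_id FROM deleted)
```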
Keep in mind that enforcing a foreign key implies that SQL Server will check data in both the referenced and the referencing tables. The referential actions just discussed apply only to actions that are taken on the referenced table. On the referencing table, the only action allowed is to not allow an update or an insert if the relationship will be broken as a result.
NOTE
Because a constraint is checked before a trigger fires, you can't have both a FOREIGN KEY constraint to enforce the relationship when a new key is inserted into the referencing table and a trigger that performs an operation such as SET NULL on the referenced table. If you do have both, the constraint will fail and the statement will be aborted before the trigger that performs the referential action fires. If you want to allow the SET NULL action, you have to write two triggers: a trigger for delete and update on the referenced table to perform the SET NULL action, and a trigger for insert and update on the referencing table that disallows new data that would violate the referential integrity. If you do decide to enforce your referential integrity with triggers, you might still want to declare the foreign key relationship, largely for readability, so that the relationship between the tables is clear. You can then use the NOCHECK option of ALTER TABLE to disable the constraint, and then the trigger will fire.
This note is just referring to AFTER triggers. SQL Server 2000 provides an additional type of trigger called an INSTEAD OF trigger, which I'll talk about in Chapter 12.
Here's a simple way to declare a primary key/foreign key relationship:
CREATE TABLE customer
(
cust_id      int          NOT NULL IDENTITY PRIMARY KEY,
cust_name    varchar(50)  NOT NULL
)

CREATE TABLE orders
(
order_id     int          NOT NULL IDENTITY PRIMARY KEY,
cust_id      int          NOT NULL REFERENCES customer(cust_id)
)
Since the default referential action is NO ACTION, the orders table could also have been written in this way:
CREATE TABLE orders
(
order_id     int          NOT NULL IDENTITY PRIMARY KEY,
cust_id      int          NOT NULL REFERENCES customer(cust_id)
                          ON UPDATE NO ACTION
                          ON DELETE NO ACTION
)
The orders table contains the column cust_id, which references the primary key of the customer table. An order (order_id) must not exist unless it relates to an existing customer (cust_id). With this definition, you can't delete a row from the customer table if a row that references it currently exists in the orders table, and you can't modify the cust_id column in a way that would destroy the relationship.
Note that a referential action can be specified for both DELETE and UPDATE operations on the referenced table, and the two operations can have different actions. For example, you can choose to CASCADE any updates to the cust_id in customer but not allow (NO ACTION) any deletes of referenced rows in customer.
Here's an example of adding a FOREIGN KEY constraint to the orders table that does just that: it cascades any updates to the cust_id in customer but does not allow any deletes of referenced rows in customer:
CREATE TABLE orders
(
order_id     int          NOT NULL IDENTITY PRIMARY KEY,
cust_id      int          NOT NULL REFERENCES customer(cust_id)
                          ON UPDATE CASCADE
                          ON DELETE NO ACTION
)
The previous examples show the syntax for a single-column constraint—both the primary key and foreign key are single columns. This syntax uses the keyword REFERENCES, and the term "foreign key" is implied but not explicitly stated. The name of the FOREIGN KEY constraint is generated internally, following the same general form described earlier for PRIMARY KEY and UNIQUE constraints. Here's a portion of the output of sp_helpconstraint for both the customer and orders tables. (The tables were created in the pubs sample database.)
EXEC sp_helpconstraint customer

>>>
Object Name
-----------
customer

constraint_type  constraint_name         constraint_keys
---------------  ----------------------  ---------------
PRIMARY KEY      PK__customer__07F6335A  cust_id
(clustered)

Table is referenced by
----------------------------------------------
pubs.dbo.orders: FK__orders__cust_id__0AD2A005

EXEC sp_helpconstraint orders

>>>
Object Name
-----------
orders

constraint_type  constraint_name                delete_    update_
                                                action     action
---------------  -----------------------------  ---------  ---------
FOREIGN KEY      FK__orders__cust_id__0AD2A005  no action  no action
PRIMARY KEY      PK__orders__09DE7BCC
(clustered)

constraint_keys
---------------------------------------------
cust_id REFERENCES pubs.dbo.customer(cust_id)
order_id

No foreign keys reference this table.
Constraints defined inline with the column declaration can reference only a single column. This is also true for PRIMARY KEY and UNIQUE constraints. You must declare a multi-column constraint as a separate table element. The following example shows a multi-column FOREIGN KEY constraint:
CREATE TABLE customer
(
cust_id       int          NOT NULL,
location_num  smallint     NULL,
cust_name     varchar(50)  NOT NULL,
CONSTRAINT CUSTOMER_UNQ UNIQUE CLUSTERED (location_num, cust_id)
)

CREATE TABLE orders
(
order_id  int       NOT NULL IDENTITY
          CONSTRAINT ORDER_PK PRIMARY KEY NONCLUSTERED,
cust_num  int       NOT NULL,
cust_loc  smallint  NULL,
CONSTRAINT FK_ORDER_CUSTOMER FOREIGN KEY (cust_loc, cust_num)
    REFERENCES customer (location_num, cust_id)
)
GO
EXEC sp_helpconstraint customer
EXEC sp_helpconstraint orders
GO

>>>
Object Name
-----------
customer

constraint_type     constraint_name  constraint_keys
------------------  ---------------  ---------------------
UNIQUE (clustered)  CUSTOMER_UNQ     location_num, cust_id

Table is referenced by
---------------------------------
pubs.dbo.orders: FK_ORDER_CUSTOMER

Object Name
-----------
orders

constraint_type  constraint_name    constraint_keys
---------------  -----------------  ----------------------------
FOREIGN KEY      FK_ORDER_CUSTOMER  cust_loc, cust_num
                                    REFERENCES pubs.dbo.customer
                                    (location_num, cust_id)
PRIMARY KEY      ORDER_PK           order_id
(non-clustered)

No foreign keys reference this table.
The preceding example also shows the following variations on how you can create constraints:
Unlike a PRIMARY KEY or UNIQUE constraint, an index isn't automatically built for the column or columns declared as FOREIGN KEY. However, in many cases, you'll want to build indexes on these columns because they're often used for joining to other tables. To enforce foreign key relationships, SQL Server must add additional steps to the execution plan of every insert, delete, and update (if the update affects columns that are part of the relationship) that affects either the table referencing another table or the table being referenced. The execution plan, determined by the SQL Server optimizer, is simply the collection of steps that carries out the operation. (In Chapter 16, you'll see the actual execution plan by using the SHOWPLAN options.)
If no FOREIGN KEY constraints exist, a statement specifying the update of a single row of the orders table might have an execution plan like the following:
When a FOREIGN KEY constraint exists on the orders table, the same operation has more steps in the execution plan:
The execution plan is more complex if the orders table has many FOREIGN KEY constraints declared. Internally, a simple update or insert operation is no longer simple: it requires checking many other tables for matching entries. Because a seemingly simple operation might require checking as many as 253 other tables (see the next paragraph) and possibly creating multiple worktables, the operation might be much more complicated than it looks and much slower than expected.
A table can have a maximum of 253 FOREIGN KEY references. This limit is derived from the internal limit of 256 tables in a single query. In practice, an operation on a table with 253 or fewer FOREIGN KEY constraints might still fail with an error because of the 256-table query limit if worktables are required for the operation.
A database designed for excellent performance doesn't reach anything close to this limit of 253 FOREIGN KEY references. For best performance results, use FOREIGN KEY constraints judiciously. Some sites declare more FOREIGN KEY constraints than necessary because some of the constraints are logically redundant. Take the following example. The orders table declares a FOREIGN KEY constraint to both the master_customer and customer_location tables:
CREATE TABLE master_customer
(
cust_id    int          NOT NULL IDENTITY PRIMARY KEY,
cust_name  varchar(50)  NOT NULL
)

CREATE TABLE customer_location
(
cust_id   int       NOT NULL,
cust_loc  smallint  NOT NULL,
CONSTRAINT PK_CUSTOMER_LOCATION PRIMARY KEY (cust_id, cust_loc),
CONSTRAINT FK_CUSTOMER_LOCATION FOREIGN KEY (cust_id)
    REFERENCES master_customer (cust_id)
)

CREATE TABLE orders
(
order_id  int       NOT NULL IDENTITY PRIMARY KEY,
cust_id   int       NOT NULL,
cust_loc  smallint  NOT NULL,
CONSTRAINT FK_ORDER_MASTER_CUST FOREIGN KEY (cust_id)
    REFERENCES master_customer (cust_id),
CONSTRAINT FK_ORDER_CUST_LOC FOREIGN KEY (cust_id, cust_loc)
    REFERENCES customer_location (cust_id, cust_loc)
)
Although logically the relationship between the orders and master_customer tables exists, the relationship is redundant to and subsumed by the fact that orders is related to customer_location, which has its own FOREIGN KEY constraint to master_customer. Declaring a foreign key for master_customer adds unnecessary overhead without adding any further integrity protection.
NOTE
In the case just described, declaring a foreign key improves the readability of the table definition, but you can achieve the same result by simply adding comments to the CREATE TABLE command. It's perfectly legal to add a comment practically anywhere—even in the middle of a CREATE TABLE statement. A more subtle way to achieve this result is to declare the constraint so that it appears in sp_helpconstraint and in the system catalogs but then disable the constraint by using the ALTER TABLE NOCHECK option. Because the constraint will then be unenforced, an additional table isn't added to the execution plan.
The CREATE TABLE statement shown in the following example for the orders table omits the redundant foreign key and, for illustrative purposes, includes a comment. Despite the lack of a FOREIGN KEY constraint in the master_customer table, you still can't insert a cust_id that doesn't exist in the master_customer table because the reference to the customer_location table will prevent it.
CREATE TABLE orders
(
order_id  int       NOT NULL IDENTITY PRIMARY KEY,
cust_id   int       NOT NULL,
cust_loc  smallint  NOT NULL,
-- Implied FOREIGN KEY reference of:
-- (cust_id) REFERENCES master_customer (cust_id)
CONSTRAINT FK_ORDER_CUST_LOC FOREIGN KEY (cust_id, cust_loc)
    REFERENCES customer_location (cust_id, cust_loc)
)
Note that the table on which the foreign key is declared (the referencing table, which in this example is orders) isn't the only table that requires additional execution steps. When being updated, the table being referenced (in this case customer_location) must also have additional steps in its execution plan to ensure that an update of the columns being referenced won't break a relationship and create an orphan entry in the orders table. Even without making any changes directly to the customer_location table, you can see a significant decrease in its update or delete performance because of foreign key references added in other tables.
When using constraints, you should consider triggers, performance, and indexing. Let's take a look at the ramifications of each.
Triggers I won't discuss triggers in detail until Chapter 12, but for now, you should simply note that constraints are enforced before an AFTER triggered action is performed. If the constraint is violated, the statement will abort before the trigger fires.
NOTE
The owner of a table isn't allowed to declare a foreign key reference to another table unless the owner of the other table has granted REFERENCES permission to the first table owner. Even if the owner of the first table is allowed to select from the table to be referenced, that owner must have REFERENCES permission. This prevents another user from changing the performance of operations on your table without your knowledge or consent. You can grant any user REFERENCES permission even if you don't also grant SELECT permission, and vice versa. The only exception is that the DBO, or any user who is a member of the db_owner role, has full default permissions on all objects in the database.
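For example, granting this permission looks like the following (the user name is hypothetical):

```sql
-- Let user 'kalen' declare foreign keys that reference customer,
-- without also granting SELECT on the table
GRANT REFERENCES ON customer TO kalen
```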
Performance When deciding on the use of foreign key relationships, you must weigh the protection provided against the corresponding performance overhead. Be careful not to add constraints that form logically redundant relationships. Excessive use of FOREIGN KEY constraints can severely degrade the performance of seemingly simple operations.
Indexing The columns specified in FOREIGN KEY constraints are often strong candidates for index creation. You should build the index with the same key order used in the PRIMARY KEY or UNIQUE constraint of the table that it references so that joins can be performed efficiently. Also be aware that a foreign key is often a subset of the table's primary key. In the customer_location table used in the preceding two examples, cust_id is part of the primary key as well as a foreign key in its own right. Given that cust_id is part of a primary key, it's already part of an index. In this example, cust_id is the lead column of the index, and building a separate index on it alone probably isn't warranted. However, if cust_id is not the lead column of the index B-tree, it might make sense to build an index on it.
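A sketch of such an index on the orders table from the preceding examples, using a hypothetical index name and matching the key order of PK_CUSTOMER_LOCATION:

```sql
-- Index the foreign key columns in the same key order as the
-- primary key of the referenced customer_location table
CREATE INDEX orders_cust_loc_idx ON orders (cust_id, cust_loc)
```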
Sometimes two tables reference one another, which creates a bootstrap problem. Suppose Table1 has a foreign key reference to Table2, but Table2 has a foreign key reference to Table1. Even before either table contains any data, you'll be prevented from inserting a row into Table1 because the reference to Table2 will fail. Similarly, you can't insert a row into Table2 because the reference to Table1 will fail.
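A related wrinkle is that you can't even declare such mutual references entirely within CREATE TABLE statements, because the first table created can't reference a table that doesn't exist yet. One of the two foreign keys has to be added afterward with ALTER TABLE. A sketch, with hypothetical column and constraint names:

```sql
-- Table1 can't reference Table2 yet, so declare it without the foreign key
CREATE TABLE Table1
(
t1_id  int  NOT NULL PRIMARY KEY,
t2_id  int  NOT NULL
)

-- Table2 can reference Table1, which now exists
CREATE TABLE Table2
(
t2_id  int  NOT NULL PRIMARY KEY,
t1_id  int  NOT NULL REFERENCES Table1(t1_id)
)

-- Now close the loop by adding Table1's foreign key
ALTER TABLE Table1
ADD CONSTRAINT FK_T1_T2 FOREIGN KEY (t2_id) REFERENCES Table2(t2_id)
```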
ANSI SQL has a solution: deferred constraints, in which you can instruct the system to postpone constraint checking until the entire transaction is committed. With this elegant remedy, you put both INSERT statements into a single transaction, and by the time COMMIT occurs the two tables have correct references. Unfortunately, no mainstream product currently provides the deferred option for constraints. The deferred option is part of the complete SQL-92 specification, which no product has yet fully implemented.
SQL Server 2000 provides immediate constraint checking; it has no deferred option. SQL Server offers three options for dealing with constraint checking: it allows you to add constraints after adding data, it lets you temporarily disable checking of foreign key references, and it allows you to use the bcp (bulk copy) program or BULK INSERT command to initially load data and avoid checking FOREIGN KEY constraints. (You can override this default option with bcp or the BULK INSERT command and force FOREIGN KEY constraints to be validated.) To add constraints after adding data, don't create constraints via the CREATE TABLE command. After adding the initial data, you can add constraints by using the ALTER TABLE command.
With the second option, the table owner can temporarily disable checking of foreign key references by using the ALTER TABLE table NOCHECK CONSTRAINT statement. Once data exists, you can reestablish the FOREIGN KEY constraint by using ALTER TABLE table CHECK CONSTRAINT. Note that when an existing constraint is reenabled using this method, SQL Server doesn't automatically check to see that all rows still satisfy the constraint. To do this, you can simply issue a dummy update by setting a column to itself for all rows, determining whether any constraint violations are raised, and then fixing them. (For example, you can issue UPDATE orders SET cust_id = cust_id.)
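Using the constraint names from the earlier examples, the disable/reenable/verify sequence looks like this:

```sql
-- Temporarily disable foreign key checking on orders
ALTER TABLE orders NOCHECK CONSTRAINT FK_ORDER_CUST_LOC

-- ... load or modify data with checking disabled ...

-- Reenable the constraint; existing rows are NOT rechecked
ALTER TABLE orders CHECK CONSTRAINT FK_ORDER_CUST_LOC

-- Dummy update to flag any rows that now violate the constraint
UPDATE orders SET cust_id = cust_id
```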
Finally, you can use the bcp program or the BULK INSERT command to initially load data. The BULK INSERT command and the bcp program don't check any FOREIGN KEY constraints by default. You can use the CHECK_CONSTRAINTS option to override this behavior. BULK INSERT and bcp are faster than regular INSERT commands because they usually bypass normal integrity checks and most logging.
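For example, a BULK INSERT that forces constraint checking might look like this (the data file path is hypothetical):

```sql
-- Load orders from a data file; CHECK_CONSTRAINTS forces
-- FOREIGN KEY validation during the load
BULK INSERT pubs.dbo.orders
FROM 'c:\data\orders.dat'
WITH (CHECK_CONSTRAINTS)
```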
When you use ALTER TABLE to add (instead of reenable) a new FOREIGN KEY constraint for a table in which data already exists, the existing data is checked by default. If constraint violations occur, the constraint isn't added. With large tables, such a check can be quite time-consuming. You do have an alternative—you can add a FOREIGN KEY constraint and omit the check. To do this, you specify the WITH NOCHECK option with ALTER TABLE. All subsequent operations will be checked, but existing data won't be checked. As in the case of reenabling a constraint, you can then perform a dummy update to flag any violations in the existing data. If you use this option, you should do the dummy update as soon as possible to ensure that all the data is clean. Otherwise, your users might see constraint error messages when they perform update operations on the preexisting data even if they haven't changed any values.
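A sketch of adding a constraint without validating existing rows (the constraint name is illustrative):

```sql
-- WITH NOCHECK skips validation of existing data;
-- only subsequent operations will be checked
ALTER TABLE orders WITH NOCHECK
ADD CONSTRAINT FK_ORDER_CUSTOMER FOREIGN KEY (cust_id)
    REFERENCES master_customer (cust_id)
```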
If you're dropping tables, you must drop all the referencing tables or drop the referencing FOREIGN KEY constraints before dropping the referenced table. For example, in the preceding example's orders, customer_location, and master_customer tables, the following sequence of DROP statements fails because a table being dropped is referenced by a table that still exists—that is, customer_location can't be dropped because the orders table references it, and orders isn't dropped until later:
DROP TABLE customer_location
DROP TABLE master_customer
DROP TABLE orders
Changing the sequence to the following works fine because orders is dropped first:
DROP TABLE orders
DROP TABLE customer_location
DROP TABLE master_customer
When two tables reference each other, you must first drop the constraints or set them to NOCHECK (both operations use ALTER TABLE) before the tables can be dropped. Similarly, a table that's being referenced can't be part of a TRUNCATE TABLE command. You must drop or disable the constraint or else simply drop and rebuild the table.
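For two mutually referencing tables, the sequence might look like this sketch. The table and constraint names (tableA, tableB, FK_tableA_tableB, FK_tableB_tableA) are hypothetical:

```sql
-- Break the circular references first, then the tables can be dropped
ALTER TABLE tableA DROP CONSTRAINT FK_tableA_tableB
ALTER TABLE tableB DROP CONSTRAINT FK_tableB_tableA

DROP TABLE tableA
DROP TABLE tableB
```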
A table can be self-referencing—that is, the foreign key can reference one or more columns in the same table. The following example shows an employee table in which a column for managers references another employee entry:
CREATE TABLE employee
(
emp_id     int          NOT NULL PRIMARY KEY,
emp_name   varchar(30)  NOT NULL,
mgr_id     int          NOT NULL REFERENCES employee(emp_id)
)
The employee table is a perfectly reasonable table. It illustrates most of the issues I've discussed. However, in this case, a single INSERT command that satisfies the reference is legal. For example, if the CEO of the company has an emp_id of 1 and is also his own manager, the following INSERT will be allowed and can be a useful way to insert the first row in a self-referencing table:
INSERT employee VALUES (1, 'Chris Smith', 1)
Although SQL Server doesn't currently provide a deferred option for constraints, self-referencing tables add a twist that sometimes makes SQL Server use deferred operations internally. Consider the case of a nonqualified DELETE statement that deletes many rows in the table. After all rows are ultimately deleted, you can assume that no constraint violation will occur. However, violations might occur during the DELETE operation because some of the remaining referencing rows might be orphaned before they are actually deleted. SQL Server handles such interim violations automatically and without any user intervention. As long as the self-referencing constraints are valid at the end of the data modification statement, no errors are raised during processing.
To gracefully handle these interim violations, however, additional processing and worktables are required to hold the work-in-progress. This adds substantial overhead and can also limit the actual number of foreign keys that can be used. An UPDATE statement can also cause an interim violation. For example, if all employee numbers are to be changed by multiplying each by 1000, the following UPDATE statement would require worktables to avoid the possibility of raising an error on an interim violation:
UPDATE employee SET emp_id=emp_id * 1000, mgr_id=mgr_id * 1000
The additional worktables and the processing needed to handle the worktables are made part of the execution plan. Therefore, if the optimizer sees that a data modification statement could cause an interim violation, the additional temporary worktables will be created even if no such interim violations ever actually occur. These extra steps are needed only in the following situations:
If a data modification statement in your application meets the preceding criteria, you can be sure that SQL Server is automatically using a limited and special-purpose form of deferred constraints to protect against interim violations.
Enforcing domain integrity (that is, ensuring that only entries of expected types, values, or ranges can exist for a given column) is also important. SQL Server provides two ways to enforce domain integrity: CHECK constraints and rules. CHECK constraints allow you to define an expression for a table that must not evaluate to FALSE for a data modification statement to succeed. (Note that I didn't say that the constraint must evaluate to TRUE. The constraint will allow the row if it evaluates to TRUE or to unknown. The constraint evaluates to unknown when NULL values are present, and this introduces three-valued logic. The issues of NULLs and three-valued logic are discussed in depth in Chapter 7.) SQL Server provides a mechanism similar to CHECK constraints, called rules, which exists basically for backward compatibility reasons. Rules perform almost the same function as CHECK constraints, but they use different syntax and have fewer capabilities. Rules have existed in SQL Server since its initial release in 1989, well before CHECK constraints, which are part of the ANSI SQL-92 standard and were added in version 6 in 1995.
CHECK constraints make a table's definition more readable by including the domain checks in the DDL. Rules have a potential advantage in that they can be defined once and then bound to multiple columns and tables (using sp_bindrule each time), while you must respecify a CHECK constraint for each column and table. But the extra binding step can also be a hassle, so this capability for rules is beneficial only if a rule will be used in many places. Although performance between the two approaches is identical, CHECK constraints are generally preferred over rules because they're a direct part of the table's DDL, they're ANSI-standard SQL, they provide a few more capabilities than rules (such as the ability to reference other columns in the same row or to call a system function), and perhaps most important, they're more likely than rules to be further enhanced in future releases of SQL Server. For these reasons, I'm not going to talk about rules any more.
CHECK constraints add additional steps to the execution plan to ensure that the expression doesn't evaluate to FALSE (which would result in the operation being aborted). Although steps are added to the execution plan for data modifications, these are typically much less expensive than the extra steps discussed earlier for FOREIGN KEY constraints. For foreign key checking, another table must be searched, requiring additional I/O. CHECK constraints deal only with some logical expression for the specific row already being operated on, so no additional I/O is required. Because additional processing cycles are used to evaluate the expressions, the system requires more CPU use. But if there's plenty of CPU power to spare, the effect might well be negligible. You can watch this by using Performance Monitor.
As with other constraint types, you can declare CHECK constraints on a single column or on multiple columns. You must declare a CHECK constraint that refers to more than one column as a separate element in the CREATE TABLE statement. Only single-column CHECK constraints can be defined right along with the column definition, and only one CHECK constraint can be defined with the column. All other CHECK constraints have to be defined as separate elements. However, keep in mind that one constraint can have multiple logical expressions that can be AND'ed or OR'ed together.
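The distinction between the two declaration styles can be sketched as follows. The part table and its columns are hypothetical:

```sql
CREATE TABLE part
(
part_id   int NOT NULL PRIMARY KEY,
min_qty   int NOT NULL CHECK (min_qty >= 0),   -- single-column CHECK,
                                               -- declared with the column
max_qty   int NOT NULL,
-- A CHECK that references more than one column must be a separate element
CONSTRAINT qty_range CHECK (max_qty >= min_qty)
)
```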
Some CHECK constraint features have often been underutilized by SQL Server database designers, including the ability to reference other columns in the same row, use system and niladic functions (which are evaluated at runtime), use combinations of expressions combined with AND or OR, and use CASE expressions. The following example shows a table with multiple CHECK constraints (as well as a PRIMARY KEY constraint and a FOREIGN KEY constraint) and showcases some of these features:
CREATE TABLE employee
(
emp_id         int          NOT NULL PRIMARY KEY
                            CHECK (emp_id BETWEEN 0 AND 1000),
emp_name       varchar(30)  NOT NULL CONSTRAINT no_nums
                            CHECK (emp_name NOT LIKE '%[0-9]%'),
mgr_id         int          NOT NULL REFERENCES employee(emp_id),
entered_date   datetime     NULL CHECK (entered_date >= CURRENT_TIMESTAMP),
entered_by     int          CHECK (entered_by IS NOT NULL),
CONSTRAINT valid_entered_by CHECK
    (entered_by = SUSER_ID(NULL) AND entered_by <> emp_id),
CONSTRAINT valid_mgr CHECK (mgr_id <> emp_id OR emp_id = 1),
CONSTRAINT end_of_month CHECK (DATEPART(DAY, GETDATE()) < 28)
)
GO
EXEC sp_helpconstraint employee
GO

>>>>
Object Name
---------------
employee

constraint_type               constraint_name                        constraint_keys
---------------------------   ------------------------------------   ---------------
CHECK on column emp_id        CK__employee__emp_id__2C3393D0         ([emp_id] >= 0 and [emp_id] <= 1000)
CHECK on column entered_by    CK__employee__entered_by__300424B4     (not([entered_by] is null))
CHECK on column entered_date  CK__employee__entered_date__2F10007B   ([entered_date] >= getdate())
CHECK Table Level             end_of_month                           (datepart(day, getdate()) < 28)
FOREIGN KEY                   FK__employee__mgr_id__2E1BDC42         mgr_id
CHECK on column emp_name      no_nums                                ([emp_name] not like '%[0-9]%')
PRIMARY KEY (clustered)       PK__employee__2B3F6F97                 emp_id
CHECK Table Level             valid_entered_by                       ([entered_by] = suser_id(null) and [entered_by] <> [emp_id])
CHECK Table Level             valid_mgr                              ([mgr_id] <> [emp_id] or [emp_id] = 1)

Table is referenced by
--------------------------------
pubs.dbo.employee: FK__employee__mgr_id__2E1BDC42
This example illustrates the following points:
ALTER TABLE employee add department_no int CHECK (department_no < 0 AND department_no > 100)
However, once the preceding command executes successfully and the new column and constraint are added to the table, you can't add another row to the table. I accidentally typed AND in the ALTER TABLE statement instead of OR, but that isn't considered an illegal CHECK constraint. Each time I try to insert a row, SQL Server will try to validate that the value in the department_no column is both less than 0 and more than 100! Any value for department_no that doesn't meet that requirement will cause the insert to fail. It could take a long time to come up with a value to satisfy this constraint. In this case, I'd have to drop the constraint and add a new one with the correct definition before I could insert any rows.
As with FOREIGN KEY constraints, you can add or drop CHECK constraints by using ALTER TABLE. When adding a constraint, by default the existing data is checked for compliance; you can override this default by using the WITH NOCHECK syntax. You can later do a dummy update to check for any violations. The table or database owner can also temporarily disable CHECK constraints by using NOCHECK in the ALTER TABLE statement.
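These operations might be sketched as follows against the employee table from the earlier examples; the constraint name valid_emp_id is hypothetical:

```sql
-- Add a CHECK constraint without validating existing rows
ALTER TABLE employee WITH NOCHECK
    ADD CONSTRAINT valid_emp_id CHECK (emp_id > 0)

-- Temporarily disable the constraint, then reenable it
ALTER TABLE employee NOCHECK CONSTRAINT valid_emp_id
ALTER TABLE employee CHECK CONSTRAINT valid_emp_id

-- Dummy update to flag any violations in preexisting data
UPDATE employee SET emp_id = emp_id
```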
A default allows you to specify a constant value, NULL, or the run-time value of a system function if no known value exists or if the column is missing in an INSERT statement. Although you could argue that a default isn't truly a constraint (because a default doesn't enforce anything), you can create defaults in a CREATE TABLE statement using the CONSTRAINT keyword; therefore, I'll refer to defaults as constraints. Defaults add little overhead, and you can use them liberally without too much concern about performance degradation.
SQL Server provides two ways of creating defaults. First, you can create a default (CREATE DEFAULT) and then bind the default to a column (sp_bindefault). This has been the case since the original SQL Server release. Second, you can use default constraints. These were introduced in 1995 with SQL Server 6.0 as part of the CREATE TABLE and ALTER TABLE statements, and they're based on the ANSI SQL standard (which includes such niceties as being able to use system functions). Using defaults is pretty intuitive. The type of default you use is a matter of preference; both perform the same function internally. Future enhancements are likely to be made to the ANSI-style implementation. Having the default within the table DDL seems a cleaner approach. I recommend that you use defaults within CREATE TABLE and ALTER TABLE rather than within CREATE DEFAULT, so I'll focus on that style here.
Here's the example from the previous CHECK constraint discussion, now modified to include several defaults:
CREATE TABLE employee
(
emp_id         int          NOT NULL PRIMARY KEY DEFAULT 1000
                            CHECK (emp_id BETWEEN 0 AND 1000),
emp_name       varchar(30)  NULL DEFAULT NULL CONSTRAINT no_nums
                            CHECK (emp_name NOT LIKE '%[0-9]%'),
mgr_id         int          NOT NULL DEFAULT (1)
                            REFERENCES employee(emp_id),
entered_date   datetime     NOT NULL
                            CHECK (entered_date >=
                                CONVERT(char(10), CURRENT_TIMESTAMP, 102))
                            CONSTRAINT def_today
                            DEFAULT (CONVERT(char(10), GETDATE(), 102)),
entered_by     int          NOT NULL DEFAULT SUSER_ID()
                            CHECK (entered_by IS NOT NULL),
CONSTRAINT valid_entered_by CHECK
    (entered_by = SUSER_ID() AND entered_by <> emp_id),
CONSTRAINT valid_mgr CHECK (mgr_id <> emp_id OR emp_id = 1),
CONSTRAINT end_of_month CHECK (DATEPART(DAY, GETDATE()) < 28)
)
GO
EXEC sp_helpconstraint employee
GO

>>>
Object Name
---------------
employee

constraint_type                constraint_name                        constraint_keys
----------------------------   ------------------------------------   ---------------
CHECK on column emp_id         CK__employee__emp_id__2C3393D0         ([emp_id] >= 0 and [emp_id] <= 1000)
CHECK on column entered_by     CK__employee__entered_by__300424B4     (not([entered_by] is null))
DEFAULT on column entered_date def_today                              (convert(char(10), getdate(), 102))
DEFAULT on column emp_id       DF__employee__emp_id__35BCFE0A         (1000)
DEFAULT on column emp_name     DF__employee__emp_name__37A5467C       (null)
DEFAULT on column entered_by   DF__employee__entered_by__3D5E1FD2     (suser_id())
DEFAULT on column mgr_id       DF__employee__mgr_id__398D8EEE         (1)
CHECK on column entered_date   CK__employee__entered_date__2F10007B   ([entered_date] >= getdate())
CHECK Table Level              end_of_month                           (datepart(day, getdate()) < 28)
FOREIGN KEY                    FK__employee__mgr_id__2E1BDC42         mgr_id
CHECK on column emp_name       no_nums                                ([emp_name] not like '%[0-9]%')
PRIMARY KEY (clustered)        PK__employee__2B3F6F97                 emp_id
CHECK Table Level              valid_entered_by                       ([entered_by] = suser_id(null) and [entered_by] <> [emp_id])
CHECK Table Level              valid_mgr                              ([mgr_id] <> [emp_id] or [emp_id] = 1)

Table is referenced by
--------------------------------
pubs.dbo.employee: FK__employee__mgr_id__2E1BDC42
The preceding code demonstrates the following about defaults:
INSERT EMPLOYEE VALUES (1, 'The Big Guy', 1, DEFAULT, DEFAULT)
Table 6-10 summarizes the behavior of INSERT statements based on whether a column is declared NULL or NOT NULL and whether it has a default specified. It shows the result for the column for three cases:
Table 6-10. INSERT behavior with defaults.
| Column Declaration | No Entry, No Default | No Entry, Default | Enter NULL, No Default | Enter NULL, Default | Enter DEFAULT, No Default | Enter DEFAULT, Default |
|---|---|---|---|---|---|---|
| NULL | NULL | default | NULL | NULL | NULL | default |
| NOT NULL | error | default | error | error | error | default |
Declaring a default on a column that has the IDENTITY property doesn't make sense, and SQL Server will raise an error if you try it. The IDENTITY property acts as a default for the column. But the DEFAULT keyword cannot be used as a placeholder for an identity column in the values list of an INSERT statement. You can use a special form of INSERT statement if a table has a default value for every column (an identity column meets this criterion) or allows NULL. The following statement uses the DEFAULT VALUES clause instead of a column list and values list:
INSERT employee DEFAULT VALUES
TIP
You can generate some test data by putting the IDENTITY property on a primary key column and declaring default values for all other columns and then repeatedly issuing an INSERT statement of this form within a Transact-SQL loop.
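A minimal sketch of that technique follows; the test_data table and its columns are hypothetical:

```sql
-- IDENTITY primary key plus a default on every other column
CREATE TABLE test_data
(
row_id    int       NOT NULL IDENTITY PRIMARY KEY,
created   datetime  NOT NULL DEFAULT GETDATE(),
filler    char(10)  NOT NULL DEFAULT 'xxxxxxxxxx'
)
GO
-- Generate 1000 rows of test data in a Transact-SQL loop
DECLARE @i int
SET @i = 0
WHILE @i < 1000
BEGIN
    INSERT test_data DEFAULT VALUES
    SET @i = @i + 1
END
```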
This section offers some advice on working with constraints. I'll look at constraint names and system catalog entries, the status field, constraint failures, multiple-row data modifications, and integrity checks.
Earlier in this chapter, you learned about the cryptic-looking constraint names that SQL Server generates. Now I'll explain the naming. Consider again the following simple CREATE TABLE statement:
CREATE TABLE customer ( cust_id int IDENTITY NOT NULL PRIMARY KEY, cust_name varchar(30) NOT NULL )
The constraint produced from this simple statement bears the nonintuitive name PK__customer__68E79C55. All types of single-column constraints use this naming scheme, which I'll explain shortly. The advantage of explicitly naming your constraint rather than using the system-generated name is greater clarity. The constraint name is used in the error message for any constraint violation, so creating a name such as CUSTOMER_PK probably makes more sense to users than a name such as PK__customer__cust_i__0677FF3C. You should choose your own constraint names if such error messages are visible to your users. The first two characters (PK) show the constraint type—PK for PRIMARY KEY, UQ for UNIQUE, FK for FOREIGN KEY, and DF for DEFAULT. Next are two underscore characters, which are used as a separator. (You might be tempted to use one underscore to conserve characters and to avoid having to truncate as much. However, it's common to use a single underscore in a table name or a column name, both of which appear in the constraint name. Using two underscore characters distinguishes the kind of a name it is and where the separation occurs.)
Next comes the table name (customer), which is limited to 116 characters for a PRIMARY KEY constraint and slightly fewer characters for all other constraint names. For all constraints other than PRIMARY KEY, there are then two more underscore characters for separation followed by the next sequence of characters, which is the column name. The column name is truncated to five characters if necessary. If the column name has fewer than five characters in it, the length of the table name portion can be slightly longer.
And finally, the hexadecimal representation of the object ID for the constraint (68E79C55) comes after another separator. (This value is used as the id column of the sysobjects system table and the constid column of the sysconstraints system table.) Object names are limited to 128 characters in SQL Server 2000, so the total length of all portions of the constraint name must also be less than or equal to 128.
Note the queries and output using this value that are shown in Figure 6-14.
Figure 6-14. Querying the system tables using the hexadecimal ID value.
TIP
The hexadecimal value 0x68E79C55 is equal to the decimal value 1760009301, which is the value of constid in sysconstraints and of id in sysobjects.
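You can verify the conversion directly in a query window; this sketch assumes the constraint from the earlier example exists in the current database:

```sql
-- Convert the hexadecimal object ID to decimal
SELECT CONVERT(int, 0x68E79C55)   -- returns 1760009301

-- Map the decimal ID back to the constraint name
SELECT OBJECT_NAME(1760009301)
```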
These sample queries of system tables show the following:
TIP
To see the names and order of the columns in a PRIMARY KEY or UNIQUE constraint, you can query the sysindexes and syscolumns tables for the index being used to enforce the constraint. The name of the constraint and that of the index enforcing the constraint are the same, whether the name was user-specified or system-generated. The columns in the index key are somewhat cryptically encoded in the keys1 and keys2 fields of sysindexes. The easiest way to decode these values is to simply use the sp_helpindex system stored procedure; alternatively, you can use the code of that procedure as a template if you need to decode them in your own procedure.
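For example, the following call lists all indexes on the employee table, including the ones built to enforce its PRIMARY KEY and UNIQUE constraints; the index name for each of those matches the constraint name:

```sql
-- Show index names and key columns for the employee table
EXEC sp_helpindex 'employee'
```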
The status field of the sysconstraints view is a pseudo-bit-mask field packed with information. We could also call it a bitmap, because each bit has a particular meaning. If you know how to crack this column, you can essentially write your own sp_helpconstraint-like procedure. Note that the documentation is incomplete regarding the values of this column. One way to start decoding this column is to look at the definition of the sysconstraints view using the sp_helptext system procedure.
The lowest four bits, obtained by AND'ing status with 0xF (status & 0xF), contain the constraint type. A value of 1 is PRIMARY KEY, 2 is UNIQUE, 3 is FOREIGN KEY, 4 is CHECK, and 5 is DEFAULT. The fifth bit is on (status & 0x10 <> 0) when the constraint is a nonkey constraint on a single column. The sixth bit is on (status & 0x20 <> 0) when the constraint is on multiple columns or is a PRIMARY KEY or UNIQUE constraint.
NOTE
Some of the documentation classifies constraints as either table-level or column-level. This implies that any constraint defined on the line with a column is a column-level constraint and any constraint defined as a separate line in the table or added to the table with the ALTER TABLE command is a table-level constraint. However, this distinction does not hold true when you look at sysconstraints. Although it is further documented that the fifth bit is for a column-level constraint, you can see for yourself that this bit is on for any single column constraint except PRIMARY KEY and UNIQUE and that the sixth bit, which is documented as indicating a table-level constraint, is on for all multi-column constraints, as well as PRIMARY KEY and UNIQUE constraints.
Some of the higher bits are used for internal status purposes, such as noting whether a nonclustered index is being rebuilt, and for other internal states. Table 6-11 shows some of the other bit-mask values you might be interested in:
Table 6-11. Bitmap values in the status column of sysconstraints.
| Bitmap Value | Description |
|---|---|
| 16 | The constraint is a "column-level" constraint: a single-column constraint that isn't enforcing entity integrity. |
| 32 | The constraint is a "table-level" constraint: a multicolumn constraint, a PRIMARY KEY constraint, or a UNIQUE constraint. |
| 512 | The constraint is enforced by a clustered index. |
| 1024 | The constraint is enforced by a nonclustered index. |
| 16384 | The constraint has been disabled. |
| 32767 | The constraint has been enabled. |
| 131072 | SQL Server has generated a name for the constraint. |
Using this information, and not worrying about the higher bits used for internal status, you could use the following query to show constraint information for the employee table:
SELECT OBJECT_NAME(constid) 'Constraint Name',
    constid 'Constraint ID',
    CASE (status & 0xF)
        WHEN 1 THEN 'Primary Key'
        WHEN 2 THEN 'Unique'
        WHEN 3 THEN 'Foreign Key'
        WHEN 4 THEN 'Check'
        WHEN 5 THEN 'Default'
        ELSE 'Undefined'
    END 'Constraint Type',
    CASE (status & 0x30)
        WHEN 0x10 THEN 'Column'
        WHEN 0x20 THEN 'Table'
        ELSE 'NA'
    END 'Level'
FROM sysconstraints
WHERE id = OBJECT_ID('employee')

>>>
Constraint Name                        Constraint ID  Constraint Type  Level
------------------------------------   ------------   --------------   ------
PK__employee__49C3F6B7                 1237579447     Primary Key      Table
DF__employee__emp_id__4AB81AF0         1253579504     Default          Column
CK__employee__emp_id__4BAC3F29         1269579561     Check            Column
DF__employee__emp_name__4CA06362       1285579618     Default          Column
no_nums                                1301579675     Check            Column
DF__employee__mgr_id__4E88ABD4         1317579732     Default          Column
FK__employee__mgr_id__4F7CD00D         1333579789     Foreign Key      Column
CK__employee__entered_date__5070F446   1349579846     Check            Column
def_today                              1365579903     Default          Column
DF__employee__entered_by__52593CB8     1381579960     Default          Column
CK__employee__entered_by__534D60F1     1397580017     Check            Column
valid_entered_by                       1413580074     Check            Table
valid_mgr                              1429580131     Check            Table
end_of_month                           1445580188     Check            Table
Many bugs occur in application code because developers don't understand how the failure of a constraint affects a multiple-statement transaction declared by the user. The biggest misconception is that any error, such as a constraint failure, automatically aborts and rolls back the entire transaction. On the contrary, after an error is raised, it's up to the transaction to either proceed and ultimately commit or to roll back. This feature gives the developer the flexibility to decide how to handle errors. (The semantics are also in accordance with the ANSI SQL-92 standard for COMMIT behavior.)
Following is an example of a simple transaction that tries to insert three rows of data. The second row contains a duplicate key and violates the PRIMARY KEY constraint. Some developers believe that this example wouldn't insert any rows because of the error that occurs in one of the statements; they think that this error will cause the entire transaction to be aborted. However, this doesn't happen—instead, the statement inserts two rows and then commits that change. Although the second INSERT fails, the third INSERT is processed because no error checking has occurred between the statements, and then the transaction does a COMMIT. Because no instructions were provided to take some other action after the error other than to proceed, SQL Server does just that. It adds the first and third INSERT statements to the table and ignores the second statement.
IF EXISTS (SELECT * FROM sysobjects WHERE name='show_error'
    AND type='U')
    DROP TABLE show_error
GO
CREATE TABLE show_error
(
col1   smallint NOT NULL PRIMARY KEY,
col2   smallint NOT NULL
)
GO
BEGIN TRANSACTION
INSERT show_error VALUES (1, 1)
INSERT show_error VALUES (1, 2)
INSERT show_error VALUES (2, 2)
COMMIT TRANSACTION
GO
SELECT * FROM show_error
GO

Server: Msg 2627, Level 14, State 1, Line 1
Violation of PRIMARY KEY constraint 'PK__show_error__6EF57B66'.
Cannot insert duplicate key in object 'show_error'.
The statement has been terminated.

col1   col2
----   ----
1      1
2      2
Here's a modified version of the transaction. This example does some simple error checking using the system function @@ERROR and rolls back the transaction if any statement results in an error. In this example, no rows are inserted because the transaction is rolled back.
IF EXISTS (SELECT * FROM sysobjects WHERE name='show_error'
    AND type='U')
    DROP TABLE show_error
GO
CREATE TABLE show_error
(
col1   smallint NOT NULL PRIMARY KEY,
col2   smallint NOT NULL
)
GO
BEGIN TRANSACTION
INSERT show_error VALUES (1, 1)
IF @@ERROR <> 0 GOTO TRAN_ABORT
INSERT show_error VALUES (1, 2)
IF @@ERROR <> 0 GOTO TRAN_ABORT
INSERT show_error VALUES (2, 2)
IF @@ERROR <> 0 GOTO TRAN_ABORT
COMMIT TRANSACTION
GOTO FINISH
TRAN_ABORT:
ROLLBACK TRANSACTION
FINISH:
GO
SELECT * FROM show_error
GO

Server: Msg 2627, Level 14, State 1, Line 1
Violation of PRIMARY KEY constraint 'PK__show_error__70DDC3D8'.
Cannot insert duplicate key in object 'show_error'.
The statement has been terminated.

col1   col2
----   ----

(0 row(s) affected)
Because many developers have handled transaction errors incorrectly and because it can be tedious to add an error check after every command, SQL Server includes a SET statement that aborts a transaction if it encounters any error during the transaction. (Transact-SQL has no WHENEVER statement, although such a feature would be useful for situations like this.) Using SET XACT_ABORT ON causes the entire transaction to be aborted and rolled back if any error is encountered. The default setting is OFF, which is consistent with ANSI-standard behavior. By setting the option XACT_ABORT ON, we can now rerun the example that does no error checking, and no rows will be inserted:
IF EXISTS (SELECT * FROM sysobjects WHERE name='show_error'
    AND type='U')
    DROP TABLE show_error
GO
CREATE TABLE show_error
(
col1   smallint NOT NULL PRIMARY KEY,
col2   smallint NOT NULL
)
GO
SET XACT_ABORT ON
BEGIN TRANSACTION
INSERT show_error VALUES (1, 1)
INSERT show_error VALUES (1, 2)
INSERT show_error VALUES (2, 2)
COMMIT TRANSACTION
GO
SELECT * FROM show_error
GO

Server: Msg 2627, Level 14, State 1, Line 1
Violation of PRIMARY KEY constraint 'PK__show_error__72C60C4A'.
Cannot insert duplicate key in object 'show_error'.

col1   col2
----   ----

(0 row(s) affected)
A final comment about constraint errors and transactions: a single data modification statement (such as an UPDATE statement) that affects multiple rows is automatically an atomic operation, even if it's not part of an explicit transaction. If such an UPDATE statement finds 100 rows that meet the criteria of the WHERE clause but one row fails because of a constraint violation, no rows will be updated.
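A minimal sketch of this atomicity follows; the atomic_demo table is hypothetical:

```sql
CREATE TABLE atomic_demo
(
col1 int NOT NULL CHECK (col1 < 10)
)
INSERT atomic_demo VALUES (1)
INSERT atomic_demo VALUES (9)

-- This UPDATE would push the second row to 10, violating the CHECK
-- constraint. Because a multirow UPDATE is atomic, NEITHER row is
-- updated, even outside an explicit transaction.
UPDATE atomic_demo SET col1 = col1 + 1
```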
The modification of a given row will fail if any constraint is violated or if a trigger aborts the operation. As soon as a failure in a constraint occurs, the operation is aborted, subsequent checks for that row aren't performed, and no trigger fires for the row. Hence, the order of these checks can be important, as the following list shows.