Lesson 1: Validating Data and Permissions



Estimated lesson time: 60 minutes


There are several ways of validating data integrity in the database. These can be grouped into two categories:

  • Declarative data integrity Declarative data integrity is a set of rules that are applied to a table and its columns using the CREATE TABLE or ALTER TABLE statement. Examples of declarative data integrity include constraints (such as check, foreign key, unique, and primary key constraints) and rules.

  • Procedural data integrity Procedural data integrity is implemented either by having a stored procedure validate the data prior to inserting it into a table or by having a trigger check the data before or after a data manipulation language (DML) statement (such as INSERT, UPDATE, or DELETE) is issued.

In general, declarative data integrity is the simplest integrity check to implement because it requires very little development effort. This also makes it less likely to produce bugs because it contains less code than procedural data integrity. On the other hand, procedural data integrity typically enables more advanced integrity checks. The typical database application needs to use both declarative and procedural data integrity.

In this lesson, you start with declarative data integrity and then look at procedural data integrity. This lesson ends with a look at using Transact-SQL (T-SQL) to verify permissions and protection methods that you can implement to help prevent damage from SQL injection attacks.

Implementing Declarative Data Integrity

Declarative data integrity is implemented using either constraints or rules. Rules are objects that are similar in function to check constraints. However, you should not use rules in your database because rules are a deprecated feature in SQL Server and will most likely be removed in a future release of the product. This leaves you with constraints. There are five types of constraints: PRIMARY KEY, UNIQUE, FOREIGN KEY, CHECK, and DEFAULT. Default constraints are only used to provide default values for columns and, therefore, they are not covered in this chapter.

Primary Key and Unique Constraints

Both primary key and unique constraints identify a column or combination of columns that uniquely identifies a row in a table. This is enforced through the creation of a unique index, that is, an index that does not allow duplicate values. Because of this, primary key and unique constraints have the same size limitations as the key of an index: the key cannot contain more than 16 columns or 900 bytes of data.

Important 

Index key

All columns that are part of an index, excluding included columns, are referred to as the index's key. Included columns are covered in detail in Chapter 7, "Optimizing SQL Server 2005 Performance."

If nothing else is specified, the index that is created for a primary key is a clustered index and the index for a unique constraint is a nonclustered index. However, you can change this behavior by specifying the index type in the constraint definition:

 -- Primary key as a clustered index.
 ALTER TABLE MyTable ADD PRIMARY KEY (MyTableID);

 -- Primary key as a nonclustered index.
 ALTER TABLE MyTable ADD PRIMARY KEY NONCLUSTERED (MyTableID);

Because primary keys and unique constraints are both constraints and indexes, you can find information about them in both the sys.key_constraints and sys.indexes catalog views.
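As a sketch, the following query joins the two catalog views on the unique_index_id column (which links a key constraint to the index that enforces it) to show each constraint together with its index type:

```sql
-- List primary key and unique constraints together with the type of
-- index (clustered or nonclustered) that enforces each of them.
SELECT kc.name AS constraint_name
      ,kc.type_desc AS constraint_type
      ,i.type_desc AS index_type
FROM sys.key_constraints AS kc
JOIN sys.indexes AS i
    ON i.object_id = kc.parent_object_id
   AND i.index_id = kc.unique_index_id;
```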

Important 

Computed columns

You can create both primary key and unique constraints on computed columns.

Foreign Key Constraints

Foreign key constraints identify a column or combination of columns whose values must exist in another column or combination of columns in the same table or in another table in the same database. Foreign key constraints manage referential integrity between tables or within a single table. To implement a foreign key constraint, you must follow these rules:

  • The columns being referenced must have exactly the same data type (and collation, for string columns) as the local columns.

  • The columns being referenced must have a unique index created on them. This is typically implemented using either a primary key or a unique constraint.

  • Because the foreign key must reference a unique index, the foreign key columns have the same size limitations as that of the primary key and unique constraints.

A new feature of SQL Server 2005 is that you can create foreign key constraints on computed columns. You can find information about which foreign key constraints exist in your database by querying the sys.foreign_keys and sys.foreign_key_columns catalog views.

Foreign keys are usually queried frequently in user queries and in joins, as well as when SQL Server needs to verify referential integrity when deleting or updating primary key rows. This means that foreign keys usually greatly benefit from being indexed. Indexing is covered in greater detail in Chapter 7.
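As a sketch, an index supporting the foreign key column of the Orders table used later in this lesson could be created as follows (the index name is illustrative):

```sql
-- Nonclustered index supporting joins and referential-integrity checks
-- on the foreign key column. The index name is illustrative.
CREATE NONCLUSTERED INDEX IXOrdersCustomerID
    ON Test.Orders (CustomerID);
```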

When a delete or an update of a referenced row would violate a foreign key constraint, the default reaction is to raise an error message and roll back the statement that violated the constraint. If this is not the result you want, you can change the foreign key's action so that the delete or update is instead propagated to the referencing rows. There are four actions to choose from:

  • NO ACTION (the default; also available in earlier versions of SQL Server)

  • SET NULL

  • SET DEFAULT

  • CASCADE (also available in SQL Server 2000)

An example implementation is shown below:

 CREATE TABLE Test.Customers (
    CustomerID INT PRIMARY KEY
 );

 CREATE TABLE Test.Orders (
    OrderID INT PRIMARY KEY
    ,CustomerID INT NULL
         REFERENCES Test.Customers
              ON DELETE SET NULL
              ON UPDATE CASCADE
 );

The default behavior of the foreign key is NO ACTION. If the foreign key finds an error and NO ACTION is specified, SQL Server rolls back the statement that violated the constraint and raises an error message.

SET NULL and SET DEFAULT cause the referencing values to be set to NULL (for SET NULL) or to the default defined on the column (for SET DEFAULT) instead of raising an error and rolling back the statement. In the relationship between the Orders and Customers tables shown in the code sample, if a customer is deleted, the CustomerID column is set to NULL for all orders belonging to that customer and no error message is sent to the calling application.

The CASCADE action causes SQL Server to delete referencing rows for a delete statement (ON DELETE) and update the referencing values for an update statement (ON UPDATE). Using the same code sample, if the CustomerID column is changed for a row in the Customers table, all corresponding rows in the Orders table are updated with the same CustomerID to reflect the change. If ON DELETE CASCADE is specified for the foreign key constraint and a row in the Customers table is deleted, all referencing rows in the Orders table are deleted. This might sound reasonable, but it might not be possible to implement CASCADE for all foreign key constraints because cyclic references are not supported. For example, in the following script, an error is raised when you try to add the foreign key FKCustomersLastOrder because it would introduce a cyclic reference: if a customer is deleted, all referencing orders must be deleted, and all customers referencing those orders through the LastOrderID column must be deleted as well.

 CREATE TABLE Test.Customers (
    CustomerID INT PRIMARY KEY
    ,LastOrderID INT NULL
 );

 CREATE TABLE Test.Orders (
    OrderID INT PRIMARY KEY
    ,CustomerID INT NOT NULL
         REFERENCES Test.Customers
              ON DELETE CASCADE
              ON UPDATE NO ACTION
 );

 ALTER TABLE Test.Customers ADD
    CONSTRAINT FKCustomersLastOrder
    FOREIGN KEY (LastOrderID)
         REFERENCES Test.Orders (OrderID)
              ON DELETE CASCADE
              ON UPDATE NO ACTION;

In the previous example, consider what happens if a customer is deleted: all of the customer's orders are also deleted. This might be fine, but consider the following example:

 CREATE TABLE Test.Countries (
    CountryID INT PRIMARY KEY
 );

 CREATE TABLE Test.Cities (
    CityID INT PRIMARY KEY
    ,CountryID INT NOT NULL
         REFERENCES Test.Countries
              ON DELETE CASCADE
 );

 CREATE TABLE Test.Customers (
    CustomerID INT PRIMARY KEY
    ,CityID INT NOT NULL
         REFERENCES Test.Cities
              ON DELETE CASCADE
 );

 CREATE TABLE Test.Orders (
    OrderID INT PRIMARY KEY
    ,CustomerID INT NOT NULL
         REFERENCES Test.Customers
              ON DELETE CASCADE
 );

In this example, if you delete a country, all cities in that country, all customers in those cities, and all orders belonging to those customers are also deleted. Be cautious: you might be deleting more than you think. Consider someone executing the query "DELETE Test.Countries WHERE CountryID = 1;" from SQL Server Management Studio. The person might think he is deleting only one row in the Countries table when he might actually be deleting millions of rows. The time it takes to execute this delete statement depends on how many rows are being deleted. When it finishes, SQL Server Management Studio returns the following message:

 (1 row(s) affected) 

This message is returned even if millions of rows were deleted because the message tells us only how many rows were directly deleted by the executed statement. If you check the return value of the @@ROWCOUNT function, you will find that it also returns the value 1. There is nothing wrong with this behavior, but it is definitely something you should be aware of.

Important 

Triggers

If you have defined foreign keys with cascading actions, any AFTER triggers on the affected tables will still be executed, but they will be executed after the whole chain of cascading actions has completed. If an error occurs while the cascading action chain is being executed, the entire chain will be rolled back and no AFTER triggers will be executed for that chain.

Check Constraints

Check constraints are a set of rules that must be validated prior to data being allowed into a table. Advantages to using check constraints include:

  • They are simple to implement. (They are very similar to a WHERE clause.)

  • They are checked automatically.

  • They can improve performance.

A sample check constraint that verifies that a Product must have a positive price is shown here:

 ALTER TABLE Products ADD CHECK(Price >= 0.0); 

The simplicity is a great advantage over using triggers. However, there are some disadvantages as well, such as the following:

  • Error messages from check constraints are system-generated and cannot be replaced by a more user-friendly error message.

  • A check constraint cannot "see" the previous value of a column. This means that it cannot be used for some types of data integrity rules, such as "Updates to the price column cannot increase or decrease the price by more than ten percent."

One important aspect of check constraints is that they reject values that evaluate to False rather than accepting values that evaluate to True. That might seem like the same thing, but in SQL Server, it is not, and the difference creates an issue related to NULL values that is important to understand. For example, if you have a check constraint that states that "Price > 10.0," you can still insert a NULL value into the Price column. This value is allowed because any comparison made with NULL returns NULL; it is neither True nor False. If you don't want the check constraint to allow the null value, you can either disallow NULL in the Price column by specifying the NOT NULL constraint for the column or change the check constraint to read "Price > 10.0 AND Price IS NOT NULL".
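A minimal sketch of this behavior, using a hypothetical Test.Products table:

```sql
CREATE TABLE Test.Products (
    ProductID INT PRIMARY KEY
    ,Price DECIMAL(10,2) NULL CHECK (Price > 10.0)
);

-- Allowed: the comparison NULL > 10.0 evaluates to NULL, not False,
-- so the row is not rejected.
INSERT Test.Products (ProductID, Price) VALUES (1, NULL);

-- Disallowed: 5.0 > 10.0 evaluates to False, so the row is rejected.
INSERT Test.Products (ProductID, Price) VALUES (2, 5.0);
```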

Extending Check Constraints with User-Defined Functions

The expression in a check constraint can contain most of the logic that you can use in a WHERE clause (including NOT, AND, and OR). It can call scalar functions and reference other columns in the same table; however, it is not allowed to directly contain subqueries. Because you can write your own scalar functions in either T-SQL or managed code, you can apply advanced logic inside your check constraints and, through them, even use subqueries.

The following example creates a user-defined function called fnIsPhoneNumber in managed code to verify that a string contains a valid U.S. phone number by applying a regular expression. It then creates a table that calls the function from a check constraint to validate the phone numbers entered into the table. Because the PhoneNumber column allows NULL values, inserts and updates of nulls are allowed.

 //C#
 [SqlFunction(IsDeterministic = true, DataAccess = DataAccessKind.None)]
 static public SqlBoolean fnIsPhoneNumber(SqlString phoneNumber)
 {
    if(phoneNumber.IsNull){
       return SqlBoolean.Null;
    }
    return System.Text.RegularExpressions.Regex
       .IsMatch(phoneNumber.Value, @"^\([1-9]\d{2}\)\s?\d{3}\-\d{4}$");
 }

 'VB
 <Microsoft.SqlServer.Server.SqlFunction(IsDeterministic:=True, DataAccess:=DataAccessKind.None)> _
 Public Shared Function fnIsPhoneNumber(ByVal phoneNumber As SqlString) As SqlBoolean
    If (phoneNumber.IsNull) Then
       Return SqlBoolean.Null
    End If
    Return System.Text.RegularExpressions.Regex.IsMatch(phoneNumber.Value, _
       "^\([1-9]\d{2}\)\s?\d{3}\-\d{4}$")
 End Function

The following T-SQL statement creates a table and check constraint that references the user-defined function:

 CREATE TABLE Test.Contacts (
    ContactID INT IDENTITY PRIMARY KEY
    ,Name NVARCHAR(50) NOT NULL
    ,PhoneNumber VARCHAR(20) NULL
    ,CONSTRAINT CKContactsPhoneNumber
         CHECK(dbo.fnIsPhoneNumber(PhoneNumber) = CAST(1 AS BIT))
 );

 -- Allowed:
 INSERT Test.Contacts (Name, PhoneNumber)
      VALUES ('Tobias', '(425)555-1111');
 INSERT Test.Contacts (Name, PhoneNumber)
      VALUES ('Sara', NULL);

 -- Disallowed, will raise an error:
 INSERT Test.Contacts (Name, PhoneNumber)
      VALUES ('Bill', '(42)555-2222');

When is the check constraint executed? Only when needed. The query optimizer will decide to run the check constraint only if columns referenced in the check constraints are referenced by the executed DML statement. For inserts, this is always true because an insert always affects all columns (even if you insert a null value). For updates, the check constraint is only executed if a column contained in the check constraint is referenced by the update. If you turn on the Actual Execution Plan in SQL Server Management Studio and run the two update statements in the following code sample, you will get the execution plan shown in Figure 5-1. Notice the Assert icon in Query 2 of the execution plan; this is the execution of the check constraint. It is not executed for Query 1.

 UPDATE Test.Contacts SET Name = Name;
 UPDATE Test.Contacts SET PhoneNumber = PhoneNumber;

Figure 5-1: Actual Execution Plan in SQL Server Management Studio

Best Practices 

Performance

Adding a lot of logic to your check constraints can hurt performance. A good approach is to add the necessary constraints and then run a performance test to verify that the performance is sufficient.

Here you can see a very simple test that can be used to indicate whether you have the constraint in place:

 DECLARE @i INT;
 SET @i = 1;
 WHILE (@i < 100000)
 BEGIN
      UPDATE Test.Contacts SET PhoneNumber = PhoneNumber;
      SET @i = @i + 1;
 END

In one test, this sample script took about six seconds to run with the constraint in place. Remember that there are only two rows in the table. By turning off the check constraint (ALTER TABLE Test.Contacts NOCHECK CONSTRAINT CKContactsPhoneNumber;), the same batch took about three seconds to run, which is half the length of time.
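The statements used to turn the constraint off and back on look like the following; note that re-enabling a constraint this way does not recheck existing rows:

```sql
-- Turn the check constraint off; it is no longer validated at all.
ALTER TABLE Test.Contacts NOCHECK CONSTRAINT CKContactsPhoneNumber;

-- Turn it back on; new and updated rows are validated again, but
-- existing rows are not rechecked, so the constraint is untrusted.
ALTER TABLE Test.Contacts CHECK CONSTRAINT CKContactsPhoneNumber;
```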

Using a User-Defined Function with a Subquery

It is possible to include subqueries in check constraints by putting them inside a user-defined function. This practice can result in poor performance because the subquery is executed once for each row affected by an update or insert statement against the table. Using the previous example, you can also validate telephone area codes using a subquery. Store the allowed area codes in a separate table.

The following code sample is a user-defined function:

 //C#
 [SqlFunction(IsDeterministic = true, DataAccess = DataAccessKind.Read)]
 static public SqlBoolean fnIsPhoneNumber(SqlString phoneNumber)
 {
    if(phoneNumber.IsNull){
       return SqlBoolean.Null;
    }
    if(!System.Text.RegularExpressions.Regex
       .IsMatch(phoneNumber.Value, @"^\([1-9]\d{2}\)\s?\d{3}\-\d{4}$")){
       return false;
    }else{
       string areaCode = phoneNumber.Value.Substring(1, 3);
       using(SqlConnection conn = new SqlConnection(@"context connection=true;"))
       {
          using(SqlCommand cmd = conn.CreateCommand())
          {
             cmd.CommandText = @"IF EXISTS(SELECT * FROM Test.AreaCodes
                                      WHERE AreaCode = @AreaCode)
                                    SELECT CAST(1 AS BIT) AS Found
                                 ELSE
                                    SELECT CAST(0 AS BIT) AS Found";
             cmd.Parameters.Add("@AreaCode", SqlDbType.Char, 3).Value = areaCode;
             conn.Open();
             return (bool)cmd.ExecuteScalar();
          }
       }
    }
 }

 'VB
 <Microsoft.SqlServer.Server.SqlFunction(IsDeterministic:=True, DataAccess:=DataAccessKind.Read)> _
 Public Shared Function fnIsPhoneNumber(ByVal phoneNumber As SqlString) As SqlBoolean
    If (phoneNumber.IsNull) Then
       Return SqlBoolean.Null
    End If
    If Not System.Text.RegularExpressions.Regex.IsMatch(phoneNumber.Value, _
       "^\([1-9]\d{2}\)\s?\d{3}\-\d{4}$") Then
       Return False
    Else
       Dim areaCode As String = phoneNumber.Value.Substring(1, 3)
       Using conn As SqlConnection = New SqlConnection("context connection=true;")
          Using cmd As SqlCommand = conn.CreateCommand()
             cmd.CommandText = "IF EXISTS(SELECT * FROM Test.AreaCodes " & _
                "WHERE AreaCode = @AreaCode) " & _
                "SELECT CAST(1 AS BIT) AS Found " & _
                "ELSE " & _
                "SELECT CAST(0 AS BIT) AS Found"
             cmd.Parameters.Add("@AreaCode", SqlDbType.Char, 3).Value = areaCode
             conn.Open()
             Return CType(cmd.ExecuteScalar(), Boolean)
          End Using
       End Using
    End If
 End Function

The following T-SQL statement is used to create the table and check constraint referencing the user-defined function:

 CREATE TABLE Test.AreaCodes (
    AreaCode CHAR(3) NOT NULL PRIMARY KEY
 );
 INSERT Test.AreaCodes (AreaCode) VALUES ('425');

 CREATE TABLE Test.Contacts (
    ContactID INT IDENTITY PRIMARY KEY
    ,Name NVARCHAR(50) NOT NULL
    ,PhoneNumber VARCHAR(20) NULL
    ,CONSTRAINT CKContactsPhoneNumber
         CHECK(dbo.fnIsPhoneNumber(PhoneNumber) = CAST(1 AS BIT))
 );

 -- Allowed:
 INSERT Test.Contacts (Name, PhoneNumber)
    VALUES ('Tobias', '(425)555-1111');
 INSERT Test.Contacts (Name, PhoneNumber)
    VALUES ('Sara', NULL);

 -- Disallowed because of invalid area code:
 INSERT Test.Contacts (Name, PhoneNumber)
    VALUES ('Bill', '(111)555-2222');

Here is another important consideration: while the check constraint is verified for updates and inserts to the table, it is not verified when rows are deleted from the table that the subquery references. The data that the check constraint validated against on the insert or update can be deleted without raising an error. For example, the following delete statement would not result in an error:

 DELETE Test.AreaCodes WHERE AreaCode = '425'; 

However, after executing the delete statement, the following update statement would raise an error:

 UPDATE Test.Contacts SET PhoneNumber = PhoneNumber; 

This behavior is highly undesirable because it leads you to think you have the same protection that you have with foreign keys, which protect you against the delete statement as well. What happened to the performance with the new constraint in place? The test script that previously ran in six seconds with the constraint in place now runs in 33 seconds! In SQL Server 2005, you can implement this logic by using a foreign key.

Using a Foreign Key with a Subquery

Next, you'll implement the test of the phone number as a combination of a check constraint and a foreign key constraint. You will use the first version of the user-defined function (the one without the subquery) together with a foreign key. There are a couple of things that you need to do to make it possible to create the foreign key shown in the example.

  • The result of the expression in the AreaCode column must be of the same data type as the column that the foreign key references: CHAR(3). You do this by adding the CAST function to the AreaCode expression.

  • The column must also be marked as "persisted." PERSISTED means that SQL Server will physically store the result of the computed column's expression in the data row instead of calculating it each time it is referenced in a query. It will be recalculated every time the column is updated. One of the reasons for this requirement is performance; you don't want SQL Server to execute the SUBSTRING function each time the foreign key needs to be validated.

How can you implement the foreign key? You want it to check the area code only against the Test.AreaCodes table, not the entire phone number. You do this by implementing a computed column that returns only the area code portion of the phone number. The following script creates the new version of the Test.Contacts table, including the added foreign key constraint:

 CREATE TABLE Test.Contacts (
    ContactID INT IDENTITY PRIMARY KEY
    ,Name NVARCHAR(50) NOT NULL
    ,PhoneNumber VARCHAR(20) NULL
    ,CONSTRAINT CKContactsPhoneNumber
         CHECK(dbo.fnIsPhoneNumber(PhoneNumber) = 1)
    ,AreaCode AS CAST(SUBSTRING(PhoneNumber, 2, 3) AS CHAR(3)) PERSISTED
    ,CONSTRAINT FKContactsAreaCodes
         FOREIGN KEY (AreaCode)
         REFERENCES Test.AreaCodes
 );

As you can see, the AreaCode column in the Contacts table is just a subset of the PhoneNumber column.

What will happen if you insert a null value into the PhoneNumber column? The SUBSTRING function will return null, and null is accepted by the foreign key and interpreted as a value that does not reference the AreaCodes table. So, what happened with the performance this time? It took eight seconds, about two seconds slower than the first example that did not validate that the area code existed, but 25 seconds faster than the check constraint with the subquery. Remember that the foreign key now also protects from deleting area codes that are referenced by phone numbers.

Implementing Check and Foreign Key Constraints to Improve Query Performance

Can check and foreign key constraints improve query performance? Don't they just protect us against invalid data, and in doing so, somewhat degrade performance? The answers to these questions are: "Yes, they can" and "No, they don't."

Because foreign keys and check constraints are declared rules, the query optimizer can use them to create more efficient query plans. This usually involves skipping some part of the query plan because the optimizer can see that, for example, because of a foreign key constraint, it is unnecessary to execute that particular part of the plan. The following code sample is a simple example of this behavior with a foreign key constraint. Consider the following two tables and the foreign key FKOrdersCustomers:

 CREATE TABLE Test.Customers (
    CustomerID INT PRIMARY KEY
 );

 CREATE TABLE Test.Orders (
    OrderID INT PRIMARY KEY
    ,CustomerID INT NOT NULL
         CONSTRAINT FKOrdersCustomers
               REFERENCES Test.Customers (CustomerID)
 );

If you execute the following query against the two tables, the query returns all orders that have a valid customer reference.

 SELECT o.* FROM Test.Orders AS o
 WHERE EXISTS (SELECT * FROM Test.Customers AS c
                     WHERE c.CustomerID = o.CustomerID);

The execution plan used by SQL Server to execute this query is shown in Figure 5-2. In the execution plan, you can see that the Test.Customers table is not accessed; the only table being accessed is Test.Orders. This is because the query optimizer knows that it is unnecessary to execute the exists operator in this query: the foreign key constraint requires all orders to refer to an existing customer, which is exactly what the where clause checks.

Figure 5-2: Actual Execution Plan in SQL Server Management Studio

Turn off the foreign key by executing the following statement:

 ALTER TABLE Test.Orders NOCHECK CONSTRAINT FKOrdersCustomers; 

After executing the same query again, you get a new execution plan, as shown in Figure 5-3. The query optimizer executes the exists operator (in this case, the Nested Loops icon in the execution plan) to return only those orders that actually have a valid reference to the Test.Customers table. Because you turned off the foreign key constraint, SQL Server could not be sure that all orders actually have valid customer references and, therefore, had to execute the exists operator. For a large table, this can make a huge difference in execution time and input/output.

Figure 5-3: Actual Execution Plan in SQL Server Management Studio

Now turn on the foreign key again. Execute the statement below and rerun the query.

 ALTER TABLE Test.Orders
    CHECK CONSTRAINT FKOrdersCustomers;

After running the query this time, you end up with the same plan again: the plan shown in Figure 5-3. How can this be? You turned the constraint back on, so now SQL Server should be sure that all orders have valid customer references. However, this is actually not the case, because the foreign key is considered not trusted. A constraint that is not trusted (which only applies to foreign key and check constraints) is not taken into account by the query optimizer. Your foreign key is not trusted because, while it was turned off, someone could have inserted or updated an order row with an invalid CustomerID. Turning the constraint back on does not verify existing data. You can verify that the foreign key is indeed not trusted by executing the following query:

 SELECT name, is_not_trusted FROM sys.foreign_keys
    WHERE name = 'FKOrdersCustomers';

You will find that the is_not_trusted column contains the value 1, indicating that the constraint is not trusted. To make it trusted, you need to modify the earlier "turn on" statement by adding the WITH CHECK option to it, as shown in the following example:

 ALTER TABLE Test.Orders
    WITH CHECK
    CHECK CONSTRAINT FKOrdersCustomers;

This option tells SQL Server to verify that all rows in the table comply with the constraint prior to turning it back on. If any rows do not comply with the constraint, an error message is returned and the ALTER TABLE statement is rolled back.

If you execute the query again, you will find that you are back to the first execution plan (the one shown in Figure 5-2) and, if you query sys.foreign_keys again, you will find that the is_not_trusted column now returns the value 0. The constraint is once again trusted.

One last note on this implementation: you can change the script for the Test.Customers and Test.Orders tables, as shown in the following example, so that the CustomerID column in the Test.Orders table allows null values, that is, it is not declared with the NOT NULL constraint.

 CREATE TABLE Test.Customers (
    CustomerID INT PRIMARY KEY
 );

 CREATE TABLE Test.Orders (
    OrderID INT PRIMARY KEY
    ,CustomerID INT NULL
         CONSTRAINT FKOrdersCustomers
               REFERENCES Test.Customers (CustomerID)
 );

If you execute the same query against this table structure, you get the execution plan from Figure 5-3. This means that the exists operator is being executed even if you have the trusted foreign key constraint in place. To persuade the query optimizer to go back to the plan that didn't execute the exists operator, you need to change the query, as shown in the following example:

 SELECT o.* FROM Test.Orders AS o
 WHERE EXISTS (SELECT * FROM Test.Customers AS c
                     WHERE c.CustomerID = o.CustomerID)
    AND o.CustomerID IS NOT NULL;

This informs SQL Server that no orders with a CustomerID of NULL should be returned, which brings you back to the plan from Figure 5-2.

Implementing Procedural Data Integrity

Procedural data integrity is implemented either using T-SQL or managed code by creating stored procedures or triggers. Of course, user-defined functions can be used in both stored procedures and triggers to simplify integrity checks. Because procedural logic usually requires more coding than declarative integrity checks, it should be used when it is most appropriate.

Stored Procedures

Using stored procedures for most data access is a good practice. Here, you only look at what type of validations to perform in the stored procedure rather than in a constraint. The main advantage of performing data validation in stored procedures is that the stored procedure can validate a rule prior to starting work. For example, instead of performing 50 percent of the work up front and then finding an error and having to roll back this work, you have the option of first checking whether everything is okay and only then issuing the transaction.

Important 

Error probability

This practice is only good for performance if there is a high probability of an error in the execution. If there is no error 99 percent of the time, it is usually slower to perform data validation before issuing the transaction instead of just checking things as you proceed (which is what the check constraint would do). For a typical system, because there are many validation checks done prior to executing a stored procedure, the number of times that a stored procedure finds an error is relatively low.

In stored procedures, you should check the flow of the stored procedure's transaction. For example, if you have a stored procedure that inserts an order into the Orders table, followed by some order detail rows being inserted into the OrderDetails table, you should check at least the following in the stored procedure's code:

  • Did the insertion into the Orders table actually insert anything? If you issue an INSERT...SELECT statement, the SELECT could have returned 0 rows, which would cause the INSERT statement to insert 0 rows. This check can easily be done using the @@ROWCOUNT function.

  • Did the insertion into the OrderDetails table actually insert at least one row? Here you could again apply the same type of check using the @@ROWCOUNT function.
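A sketch of what such a check might look like in a stored procedure; the procedure name, parameters, and error message are illustrative, not from the lesson:

```sql
-- Illustrative procedure: insert an order and fail early if nothing
-- was inserted, instead of continuing with order detail rows.
CREATE PROCEDURE Test.spInsertOrder
    @OrderID INT
    ,@CustomerID INT
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRAN;

    -- This INSERT...SELECT inserts 0 rows if the customer does not exist.
    INSERT Test.Orders (OrderID, CustomerID)
        SELECT @OrderID, CustomerID
        FROM Test.Customers
        WHERE CustomerID = @CustomerID;

    IF @@ROWCOUNT = 0
    BEGIN
        ROLLBACK TRAN;
        RAISERROR('No order was inserted.', 16, 1);
        RETURN;
    END;

    -- ...insert order detail rows here, with the same kind of check...

    COMMIT TRAN;
END;
```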

DML Triggers

Triggers are very powerful, but should usually be implemented only when they are absolutely necessary. Triggers are executed after an insert, update, or delete statement has been completed. The problem with this is that if the first row being updated contains an error according to the trigger, all other rows that are updated by the update statement will be updated prior to the trigger being called. When the trigger finds the error, the entire update that activated the trigger will have to be rolled back by the trigger. In this case, a check constraint would have stopped the update as soon as the first error was encountered and would only need to roll back the rows updated prior to that row.

You might think that using an INSTEAD OF trigger would solve this problem, because it would be executed prior to the insert, update, or delete statement that is actually being executed. However, this is not the case. With this type of trigger, you have another performance penalty to pay. Because these triggers also provide you with the inserted and deleted tables against which to do your validation checks, SQL Server must "pretend" to issue the insert, update, or delete statement to populate these tables with the data touched by the statement performed. Then, if your trigger doesn't find any errors, you will have to issue a DML statement in the trigger to apply the changes found in the inserted or deleted table to the actual table. This type of trigger would only be better than an AFTER trigger if most executions contain errors.

In SQL Server 2005, triggers populate the inserted and deleted tables in tempdb. This might cause tempdb to require more space and have to handle more load than it would without the use of the trigger. Another common use for triggers is to maintain denormalized (that is, redundantly stored) data. Note that the query optimizer cannot use code found in triggers to create better execution plans.

Best Practices 

Data validation in triggers

For data validation checks, use triggers only if you can't do the same check using constraints or in your stored procedure. For example, if you need to compare the value found before an update with the new value being inserted, you might use a trigger.

Exam Tip 

For this exam, knowing when constraints, AFTER triggers, and INSTEAD OF triggers are executed is important.

Using T-SQL to Verify Permissions

Usually, you manage permissions by applying them to securables using the GRANT, DENY, and REVOKE statements. However, sometimes you might need to verify certain security settings inside a stored procedure. There are several functions available to help you with this.

One example is the IS_MEMBER function. It can check whether the current user is a member of a specific database role or Windows group. If you issue the statement SELECT IS_MEMBER('BUILTIN\Administrators'), it will return 1 if the current user is a member of the local Windows Administrators group and 0 if she is not. If the current user is not logged on using Windows authentication, the function will return NULL. There are several functions of this type that you can use. More information about them is available from SQL Server 2005 Books Online under the topic "Security Functions (Transact-SQL)" at http://msdn2.microsoft.com/en-us/library/ms186236.aspx.

Be aware that this function might behave differently depending on whether a user is logged on using Windows or whether he is logged on using SQL Server authentication. Another thing to note is that some of these functions take into account the use of the EXECUTE AS option/statement.

SQL Injection Attacks

If text is sent directly to SQL Server and then executed, it is potentially at risk for a SQL injection attack. SQL injection attacks occur when someone "injects" SQL code (such as a DELETE statement or the execution of a stored procedure) into an application or a Web site and that SQL code is then executed. The attack can occur either inside SQL Server when dynamic SQL is used, or on the client side if a string containing SQL is built and then sent directly to SQL Server.

On the client side, there are some precautions you can take to prevent SQL injection attacks:

  • Always use the most restrictive data type to store user input before sending it to SQL Server. For example, put numbers into numeric variables rather than string variables.

  • Verify the format of strings on the client side prior to sending them to the server. Disallow all characters that are not needed inside the string.

  • If you are using ADO.NET, always use SqlParameter objects to send parameters to both stored procedures and DML statements. Avoid adding parameters to SQL statements using string concatenation.

  • Never show detailed error messages to the user. Attackers can use these messages to work out how to write their SQL injection code!
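As a sketch of the first two bullets (the function and field names here are hypothetical, and Python is used only for brevity), the idea is to coerce numeric input into a numeric type and to whitelist, rather than blacklist, the characters allowed in string input before anything reaches the server:

```python
import re

def validate_order_quantity(raw: str) -> int:
    # Store numbers in a numeric type instead of passing the raw string along.
    return int(raw)  # raises ValueError for anything that is not an integer

# Whitelist of allowed characters; everything else is rejected.
NAME_PATTERN = re.compile(r"[A-Za-z0-9 .\-]{1,50}")

def validate_customer_name(raw: str) -> str:
    if not NAME_PATTERN.fullmatch(raw):
        raise ValueError("Invalid customer name")
    return raw

print(validate_order_quantity("42"))           # → 42
print(validate_customer_name("Anna Larsson"))  # → Anna Larsson
```

Note that such checks complement, but never replace, parameterized statements: validation narrows the input, while parameters guarantee it is treated as data.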

The following examples show inappropriate and appropriate ways to add parameters to SQL statements using ADO.NET.

In this inappropriate method for adding parameters, notice that the SqlCommand.Parameters collection is not used:

 //C#
 ...
 SqlCommand cmd = new SqlCommand();
 cmd.CommandText = @"SELECT * FROM ATable
                     WHERE Name = '" + txtName.Text + "';";
 ...

 'VB
 ...
 Dim cmd As SqlCommand = New SqlCommand()
 cmd.CommandText = "SELECT * FROM ATable " & _
                   "WHERE Name = '" & txtName.Text & "';"
 ...

Following is the appropriate method for adding parameters, this time using the SqlCommand.Parameters collection:

 //C#
 ...
 SqlCommand cmd = new SqlCommand();
 cmd.CommandText = "SELECT * FROM ATable WHERE Name = @Name;";
 cmd.Parameters.Add("@Name", SqlDbType.NVarChar, 20).Value = txtName.Text;
 ...

 'VB
 ...
 Dim cmd As SqlCommand = New SqlCommand()
 cmd.CommandText = "SELECT * FROM ATable WHERE Name = @Name;"
 cmd.Parameters.Add("@Name", SqlDbType.NVarChar, 20).Value = txtName.Text
 ...

Consider the inappropriate way. If someone enters the string " ' ; DELETE ATable; --" into the txtName text box, the SQL statement sent to SQL Server would be as follows:

 SELECT * FROM ATable WHERE Name = ' ';DELETE ATable;--'; 

This code could (depending on foreign keys, triggers, etc.) cause the deletion of all rows in the table ATable. If you used the SqlParameter object, the code sent to SQL Server would look like the following:

 SELECT * FROM ATable WHERE Name = ' '';DELETE ATable;--'; 

Notice the two single quotes (' '). Because ADO.NET automatically replaces one single quote with two, the entire text entered into the text box is used in the WHERE clause instead of forming a new statement.
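The effect of parameterization can also be demonstrated end to end. The following is a small, hypothetical sketch using Python's standard sqlite3 module (a different driver than ADO.NET, but the principle is identical): the injected text is compared as a literal value, so the DELETE never executes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ATable (Name TEXT)")
conn.execute("INSERT INTO ATable (Name) VALUES ('Alice'), ('Bob')")

malicious = "';DELETE FROM ATable;--"

# The placeholder keeps the input as data, never as executable SQL.
rows = conn.execute(
    "SELECT * FROM ATable WHERE Name = ?", (malicious,)
).fetchall()
print(rows)  # → [] (no row matches the injected string)

# The table is untouched: both rows survive.
count = conn.execute("SELECT COUNT(*) FROM ATable").fetchone()[0]
print(count)  # → 2
```

Had the malicious string been concatenated directly into the statement text instead, the driver would have received two statements, the second of which deletes the table's rows.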

On the SQL Server side, you should take the same precautions as on the client side, but also take the following into account:

  • Avoid using dynamic SQL (EXECUTE and sp_executesql).

  • If you must use dynamic SQL, use sp_executesql with parameters rather than EXECUTE and string concatenation (basically, the same as with ADO.NET).

  • Consider the use of the EXECUTE AS option for stored procedures to avoid having to give elevated permissions to users who use stored procedures that contain dynamic SQL.

Following are examples of inappropriate as well as appropriate use of dynamic SQL. First, in the example of inappropriate use of dynamic SQL, notice that all parameters are concatenated into the @SQL variable to define the SELECT statement:

 /* @OrderDate, @CustomerType and @IncludeOrders
    are parameters to this "stored procedure". */
 DECLARE @SQL NVARCHAR(max);
 SET @SQL = N'SELECT * FROM Sales.Customer AS c ';
 IF(@IncludeOrders = 1)
    SET @SQL = @SQL + N'INNER JOIN Sales.SalesOrderHeader AS soh ON
                        soh.CustomerID = c.CustomerID
                        AND soh.OrderDate = ''' + @OrderDate + ''' ';
 SET @SQL = @SQL + N'WHERE c.CustomerType = ''' + @CustomerType + ''';';
 EXECUTE(@SQL);

In the example of appropriate use of dynamic SQL that follows, notice that the parameter values are passed to the statement using the sp_executesql system stored procedure:

 /* @OrderDate, @CustomerType and @IncludeOrders
    are parameters to this "stored procedure". */
 DECLARE @SQL NVARCHAR(max);
 SET @SQL = N'SELECT * FROM Sales.Customer AS c ';
 IF(@IncludeOrders = 1)
    SET @SQL = @SQL + N'INNER JOIN Sales.SalesOrderHeader AS soh ON
                        soh.CustomerID = c.CustomerID
                        AND soh.OrderDate = @OrderDate ';
 SET @SQL = @SQL + N'WHERE c.CustomerType = @CustomerType;';
 EXEC sp_executesql
    @stmt = @SQL
    ,@params = N'@OrderDate DATETIME, @CustomerType NCHAR(1)'
    ,@OrderDate = @OrderDate
    ,@CustomerType = @CustomerType;

The difference between these statements is the same as was shown previously in the ADO.NET example. Here, sp_executesql adds the parameters correctly and, for example, replaces single quotes with two single quotes. Because the INNER JOIN is an expression rather than a parameter, it has to be added using concatenation. However, the @OrderDate parameter that filters the JOIN expression can be passed as a parameter through sp_executesql. Note that sp_executesql will not raise an error if the join is not added to the query, even though the @OrderDate parameter is then supplied but never used by the statement.

Lab: Comparing Procedural and Declarative Data Validation Performance

In this lab, you create two similar tables that share a specific data validation rule. You implement the rule with a trigger on one table and a check constraint on the other, and then compare the results.

Important 

Lab requirements

You will need to have SQL Server installed before you can complete this exercise. Refer to the Introduction for setup instructions.

Exercise 1: Apply a Data Validation Rule to Tables


In this exercise, you create two tables. One table will use a trigger to validate the rule, and the other table will use a check constraint. You then compare the insert performance by inserting 100,000 rows that adhere to the rule and 100,000 rows that do not. The rule is very simple: the column "Name" must contain more than five characters. You insert all 100,000 rows in one insert statement by issuing an INSERT...SELECT. You can also try issuing each insert separately; the result will be very different!

  1. Open Microsoft SQL Server Management Studio and connect to an instance of SQL Server 2005.

  2. In a new query window, type and execute the following SQL statements to create the TestDB database and the Test schema:

     CREATE DATABASE TestDB;
     GO
     USE TestDB;
     GO
     CREATE SCHEMA Test;
     GO

    Note the results of the creation of the TestDB database in Figure 5-4.

  3. In a new query window, type and execute the following SQL statements to create the Test.TableWithCheckConstraint (including its check constraint) and Test.TableWithTrigger tables, as well as the trigger on Test.TableWithTrigger:

     USE TestDB;
     GO
     CREATE TABLE Test.TableWithCheckConstraint (
        Name NVARCHAR(50) NOT NULL CHECK(LEN(Name) > 5)
     );
     CREATE TABLE Test.TableWithTrigger (
        Name NVARCHAR(50) NOT NULL
     );
     GO
     CREATE TRIGGER TableWithTriggerTestTrigger ON Test.TableWithTrigger
     AFTER INSERT, UPDATE
     AS
     BEGIN
        SET NOCOUNT ON;
        IF EXISTS (SELECT * FROM inserted
                   WHERE LEN(Name) <= 5)
           BEGIN
              RAISERROR('Too short name found, rolling back transaction.', 16, 1);
              ROLLBACK TRAN;
           END
     END

  4. Open a new query window and execute the following SQL statements to create a table containing 100,000 rows. This table will be used to issue the INSERT...SELECT statement against the previously created tables. This query might take several minutes to finish executing.

     USE TestDB;
     GO
     CREATE TABLE Test.Rows100000 (
        Row INT NOT NULL
     );
     INSERT Test.Rows100000 (Row)
        SELECT TOP(100000)
           ROW_NUMBER() OVER (ORDER BY s1.message_id)
        FROM sys.messages AS s1
        CROSS JOIN sys.messages AS s2;

  5. Open a new query window, execute the following INSERT...SELECT statement against the TableWithCheckConstraint table three times, and record the execution time of the last execution. In this step, you insert data that adheres to the rule (that is, it is more than five characters long):

     USE TestDB;
     GO
     DECLARE @Start DATETIME;
     SET @Start = CURRENT_TIMESTAMP;
     INSERT Test.TableWithCheckConstraint (Name)
        SELECT 'Name that is OK' FROM Test.Rows100000
     SELECT DATEDIFF(ms, @Start, CURRENT_TIMESTAMP);

  6. Clear the content of the query window, execute the following INSERT...SELECT statement against the TableWithTrigger table three times, and record the execution time of the last execution. In this step, you insert data that adheres to the rule (that is, it is more than five characters long):

     DECLARE @Start DATETIME;
     SET @Start = CURRENT_TIMESTAMP;
     INSERT Test.TableWithTrigger (Name)
        SELECT 'Name that is OK' FROM Test.Rows100000
     SELECT DATEDIFF(ms, @Start, CURRENT_TIMESTAMP);

    You should notice a small difference in execution time between steps 5 and 6. On a test machine, it was roughly 15 percent slower to insert the data into the table with the trigger.

  7. Try to insert data that does not adhere to the rule. Start by clearing the query window and executing the following INSERT...SELECT statement against the TableWithCheckConstraint table only once. Because you get an error on the first inserted row, the check constraint stops the statement execution and rolls back the statement immediately.

     INSERT Test.TableWithCheckConstraint (Name)
        SELECT 'Short' FROM Test.Rows100000;

  8. Finally, try to insert data that does not adhere to the rule into the table with the trigger. Clear the query window and execute the following INSERT...SELECT statement against the TableWithTrigger table only once. How does this differ from the result of step 7?

     INSERT Test.TableWithTrigger (Name)
        SELECT 'Short' FROM Test.Rows100000;

  9. Type and execute the following SQL statement to clean up after this exercise:

     USE master;
     DROP DATABASE TestDB;

Figure 5-4: Database creation script in SQL Server Management Studio





MCITP Self-Paced Training Kit (Exam 70-442): Designing and Optimizing Data Access by Using Microsoft SQL Server 2005
ISBN: 073562383X
Year: 2007
Pages: 162