Understanding SQL Server Data Integrity | Microsoft SQL Server 2005: The Complete Reference: Full Coverage of all New and Improved Features

When it comes to the subject of data integrity, we are dealing with a whole new barrel of pickles, one that SQL Server 2005 can adequately tackle under the guidance of the SQL Server developer and DBA…who now fully understands that he or she is modeling to ensure that data is plausible, not that it is correct. But there is still one more subject to discuss here before you can model your SQL Server 2005 database: business rules.

Data Integrity and Business Rules

A check constraint or trigger can easily be used to prevent a customer from spending more than $500 on credit. You might agree that any number above $500 is considered risky, but another customer might not. To apply this reasoning to the real world, for example, an airline booking system may be programmed to resist assigning seats to frequent-flier passengers who try to redeem miles toward a ticket, because the rules dictate that a seat should first be assigned to a cash customer, as opposed to a liability customer. All airlines maintain seat assignment rules when it comes to frequent fliers; although they vary widely in their rules.

Another rule, at a lower level than the one just described, would be that all rows for a given table must be unique. This rule is one of the core tenets of the Date relational model (C.J.Date). According to Chris Date, one of the world’s foremost database experts, the relational model should not allow for any NULLs or duplicates whatsoever. In fact, Date is outspokenly against NULLs and declares that they should never have been introduced into relational theory.

The Date rule declares that entity values (column values in a row) should never be NULL (unknown or missing). SQL Server 2005 lets you decide whether to abide by the Date rule or code to your own business rules, which may in certain circumstances allow both duplicate rows in a table and even NULL values.

Ensuring integrity is very much part of the relational database modeling, whether it is expressed in terms akin to calculus and algebra or according to Boolean logic or some other form of analysis. But data integrity, or the extent to which you manage it, as alluded to earlier, is also up to you. And this brings to us the subject of rules, specifically business rules.

Business rules are a hot topic in the new millennium. Yet they are really the abstract declaration of the data integrity requirements demanded by business owners and enterprise and data analysts. The “frequent fliers get last choice” rule discussed earlier is exactly the type of business rule about which we are talking.

We can look at this another way. We can say that the data integrity constraint logic is the formal definition of a business rule applied to corporate data. After all, you can scream about business rules and data integrity until the cows come home, but that will do nothing to a database that knows only unfettered character-based data, has more duplicates than bottle tops, and can do little about enforcing integrity (see Chapter 3). You will find, as we move from operational data support to analytical and temporal data support (discussed in Part III) in SQL Server 2005, that the formulation of business rules becomes more of a requirement than a luxury. Analytical data comes from operational data, so the more lax your integrity control in the OLTP system, the more effort you will have to expend when you need to get analysis data scrubbed before it can be copied to the data warehouse. Values like N/A, TBA, or “unknown” lessen the value in the data mine, and the information extraction becomes extremely time-consuming and expensive.

Figure 12–1 represents this discussion in conceptual terms. At the highest level-that is, the conceptual level-the enterprise and data analysts formulate rules with the business owners. This is also the requirements formulating level. In the middle is the modeling level that translates the business rules into database integrity requirements by database analysts and even DBAs. And at the lowest level is the development model that implements the integrity requirement as constraints, checks, and procedures in SQL Server 2005, implemented by DBAs and SQL Server developers.

image from book
Figure 12–1: Modeling the database for integrity and adherence to business rules

Now that our philosophical (and emotional) banks are charged, we can look at the level you are probably most interested in: implementing the integrity requirements. To do so, we must classify integrity into several governing sections as follows:

Database and table integrity
Referential integrity
Entity integrity
Type integrity
Domain integrity
Transition integrity
Transaction integrity

Database and Table Integrity

Database and table constraint mechanisms are the broadest form of integrity control because they relate to the relationships between entities, multiple columns in a table, and multiple tables in a database. A good example of a database integrity violation is allowing a customer with bad credit to buy more goods and to put the order on account, or even allowing payment to be made by anything other than cash, letter of credit, or a credit card. The following pseudocode would thus enforce the rules of database integrity for the given example:

 FOR Customers.Orders IF Customer_Credit.Credit < 10 BEGIN   RAISERROR 1.Message ('Customer must pay with cash or credit card')   GOTO Cash END ELSE BEGIN   RAISERROR 2.Message('Customer is cleared for credit')   GOTO Account END

Another table-level constraint against integrity violation is the case of the orphaned row. If we delete an entity related to one or more entities through primary-foreign key relationships, we are deemed (to use a legal expression) to have violated rules of referential integrity. Referential integrity rules should be adhered to in every database deployment, which is why we devote a separate section to this topic.

Referential Integrity

By deleting a member row that is referenced by other entities or rows, we in fact leave the remaining undeleted entities orphaned and the record in tatters. In fact, the database is now in a poor state because no easy means now exists of identifying or finding the orphaned entities and removing the entire record. And if you have data in a database that can never be accessed or relied upon, the entire database becomes questionable.

I liken referential integrity violations to the habit some people have of not finishing an apple. If you cut the apple, take out a wedge, or only eat half of it, the remaining fruit darkens and quickly goes bad. So it is when you delete a row and violate referential integrity: over time the database goes bad also. As mentioned in the chapters in Part I, if you regard all the rows in linked tables as combining to represent a complete record, deleting a row in one table and leaving the related rows intact is akin to taking a bite out of an apple and then leaving it to go brown and rot.

On the other hand, referential integrity violation can also be said to have occurred when an entity references a nonexistent entity in the database. Referential integrity requires that there be no references to nonexistent values and that if a key value changes, all references to it change consistently throughout the database.

In the past, referential integrity would be maintained using triggers and many lines of code; we called this approach procedural referential integrity (PRI). Most modern DBMS products now support declarative referential integrity (DRI), which is essentially the opposite of procedural referential integrity. DRI makes use of built-in mechanisms to ensure or maintain referential integrity, while PRI is the responsibility of the database modeler or developer.

The declarative referential integrity constraints are enforced as follows:

The integrity criteria are defined in the various database objects’ definitions, such as foreign key (FK) existence.
Checks, defaults, and rules trap the violations.
SQL Server automatically enforces the integrity rules.

Procedural referential integrity constraints are enforced as follows:

The integrity criteria are defined and enforced in your T-SQL code.
The constraints are implemented in triggers and stored procedures (see Chapters 13 and 14), or the T-SQL code in other manual constraints.

Entity Integrity

Entity constraints, also known as relational constraints, define rules that govern rows and columns in tables and the tables as units. Uniqueness is one of the primary entity integrity rules to ensure. Of course, you can maintain a database that enforces no uniqueness whatsoever, but that would certainly render a table devoid of any integrity For example, you can create and populate a table of orders and insert duplicate orders into the table willy-nilly. Your reasoning might be that, since customers can make duplicate orders, these duplicate orders should be stored without any constraints.

But is there such a thing as a duplicate order? Only in error, I believe, and a constraint should be designed to catch such an error. Each order is placed at a certain time and date. No order can be entered twice at exactly the same time. The description, quantity, price, and discount of an item might be duplicated, but the time the entry was made cannot. Besides, each order is a new transaction that differs from another transaction only in the time it was entered and the order number assigned to it. It is in fact unique.

Caution

Although highly unlikely, it is possible for two users to enter the same record or data at exactly the same time. An identity column and table locks would prevent the two rows from competing on the question of their uniqueness.

The tools we use to constrain entities are the primary keys (on identified columns), which can impose primary indexes, UNIQUE constraints, IDENTITY properties, and so forth. Table constraints are checked before any data can be allowed into the table. If the data violates the constraint rule-for instance, if it contains a duplicate row or a duplicate value on which we have installed a primary index-it is rejected.

Entity integrity is enforced using primary keys and unique indexes.

Type Integrity

Type constraint discussions generally cover domains, although data types and data domains are two separate concepts, and that is gospel. I have thus added domain constraints to the list, and we discuss domain integrity next. A type constraint enforces rules that govern correct use of the data type of a given value. Ensuring the consistent use of a date data type, for example, is a form of type constraint. Dates, decimals, currency, and the like are data types recorded in several formats that govern precision, but the data will become impossible to work with if such scalar values are not constrained to just one of the several formats required by your rules.

Assigning NULL values is another form of type constraint, even though we talk about NULL with respect to database and entity constraints. By allowing NULL values in a tuple (field), you are explicitly allowing the storage of missing or unknown values. If you disavow NULL values, you must either supply a value through some formula or supply a default value for the variable.

A NULL value, for that matter, is an oxymoron in a manner of speaking. You might ask yourself how a NULL value can be a value and replace something that is missing or unknown. It is, however, a convenient means of allowing a row to be inserted into the database even if it contains missing or unknown values. If your rules allow NULLs to be a temporary expedient, in they go. If not, out they go.

The rules of integrity should guide your use of the NULL. Ask yourself if your data is reliable if it is made up of missing or unknown values. If not knowing middle initials will not break your data, then the NULL represents a convenient placeholder until the initial becomes available. Inserting a default value would not be feasible, because you would be inserting a Type II error that could spell disaster: accepting as true something that may be false.

In this regard, you might consider NULL values to be a violation of database integrity as well. My rule allows NULL if it does not violate the integrity of my data or adversely affect the analytical value of the data. If it does, I must install a default value and code the logic, based on the business rule, that obtains the default. If you do not have the data for a value for a given row, you should instead assign a default value or obtain a value through an alternate formula. There is a lot of debate surrounding NULL. Nevertheless, SQL Server 2005 permits you the flexibility of either allowing or disallowing NULL values according to your needs…or ignorance.

Type constraints are defined in code and in the definitions of the data types, such as specifying NULL (allowed) or NOT NULL (disallowed). Integrity is ensured with automatic checks and procedural code.

Note

A type constraint is a column-level constraint.

Domain Integrity

Data domains are the logical grouping of values. For example, in my items table I have a color column that can take only one of five colors. Notice the emphasis on only because that is the focus of my domain constraints. The domain rule for the item dictates that it can only be red, blue, mustard, lime, or black. You might argue that mustard and lime belong to the domain flavors, and if you do, you have grasped the concept of a data domain, and you have joined the debate.

I tend not to agree that domain constraints govern data types; that is the work of type constraints. However, the check and constraint mechanisms might be the same for both, and many domain constraints have been cast along data type lines. For example, it is very convenient to constrain a numeric value by the type of integer. Domains can represent collections of data types, which is what leads to the incorrect definition that domains are data types, period. While domains are in fact data types, the definition of a domain should refer to a logical collection or grouping of data values and entities, not data types. As long as you understand the difference, you’ll be okay.

Besides checks, you can use stored procedures and triggers to ensure domain integrity. For example, a stored procedure can populate a list or table at the client with allowable values from which to choose.

Note

A domain constraint is also a column-level constraint.

Transition Integrity

One of my clients has a simple rule that was more important to the IS managers than anything in the database: “Call center agents are only allowed to take an order and receive payment.” The agent is not allowed access to any functions in the client application that debits items from inventory or causes a picker in the warehouse to go pack the items to send to shipping. Only a second-level stock manager is allowed to do that. My client manages its stock levels like a squirrel manages its acorns. Only stock managers can produce the pick and shipping data that will translate the order into a shippable collection of items. That was the business rule; I did not question it, I only implemented it.

Rules such as these are translated into what we call transition constraints. These constraints ensure that the status of records changes according to predetermined rules. Inventory levels and accounting databases need to adhere to strict transition constraints. For example, you should never credit to one table without debiting from another. Inventory cannot be debited if shipping is not credited.

There are several levels on which you can define or specify transition states. In most order-taking databases, these can be defined as follows:

Order entered.
Order taken, money taken, or credit approved.
Item back-ordered.
Items picked and packed.
Order canceled or paused.
Items shipped.
Obligation completed.

This list relates to the various transitions in a database. For example, an order changes from entered to taken only when either credit is approved or the items have been paid for. In other words, if transition integrity is maintained or enforced on the database, then an order entered can only be considered a de facto liability (the company owes the client the items) if money has changed hands or the customer is in good credit standing. A check on cash or credit will allow the order to go from a state of entered to a state of taken. Some companies do not consider an obligation completed until the order is on the road to the client. Only then do they actually debit the credit card.

Items also move through various states. For example, an AllowBackorders constraint can enforce a rule that either allows or disallows part of an order from being back-ordered. For example, a customer might request that the order should not ship until a part in the shipment is available for immediate delivery.

The aforementioned client also has another very important business rule. The items cannot be shipped and the software cannot produce the shipping label unless the shipping department has called the client and obtained a verbal agreement to accept the order. If the client agrees, only at that instant will the credit card be run or a debit applied to the account.

My client advised me the main reason for this rule is that about 95 percent of the shipments rejected by customers come from customers paying on credit accounts or credit cards. The customer changes his or her mind after placing the order and then refuses the delivery. My client then has to eat the loss on the shipping costs (often UPS Red or FedEx delivery) because the shipper has fulfilled.

Transaction Integrity

A transaction in the sense described here is a collection of operations on data that must be completed according to business rules as a unit, or the transaction is completely canceled. In this regard the transaction constraint is similar to the formal transaction monitoring in which SQL Server engages automatically and thus displays elements of atomicity (see Chapter 16). Transaction integrity in the preceding discussion is more a procedural integrity mechanism (something you usually have to code a solution for) that ensures that all of the components that are required for a complete transaction are entered and satisfy all of the preceding integrity rules before the transaction is committed.

Such transactions can, however, be applied over long periods, depending on the business needs and rules. Transactions can also happen over a short period, and several states can make up several transactions. For example, an order might be broken up into several transactions, one for each state an order is in.