This section describes a variety of data rules as examples that can be formulated and
Data rule execution can be laborious and expensive. Rule engines may be used to do some of the work. The alternative is to
The more data you use, the more machine resources will be burned. However, if you profile over too small a sample, you may not get the results you seek. It is best to use as much of the data as possible for data rule testing.
Data rules do not need to be executed
Searching for data rules begins with a methodical examination of the business meaning of each column in a table. Some very obvious relations between
Columns with a DATE or TIME data type are a good place to start. Some business objects have many dates in them. These dates
Some objects also include TIME_OF_DAY columns that also can have an implied ordering. For example, START_TIME should always be earlier than STOP_TIME for data-recording repair activities in an auto repair garage, provided the repair activities do not cross over the end of a day.
These fields can be mapped out to determine the expected ordering and then one or more data rules formulated to test for conformance. If all of the dates are in the same row of the table, one query can find all violations. It is generally very easy to spot the violation by examining the output. Examples of validation rules are
SHIPPING_DATE IS NOT EARLIER THAN ORDER_DATE RECEIVED_DATE IS NOT EARLIER THAN ORDER_DATE RENEWAL_DATE IS 1 YEAR AFTER CONTRACT_DATE
Other dates that are important in rules are the CURRENT_DATE and a milestone date. Many date fields are not allowed to be later than the CURRENT_DATE. For example, an ORDER_DATE cannot be forward dated. A milestone date is one that sets a boundary for objects. For example, the date a company was founded would be a milestone date for ensuring that no employee had a start date that preceded it. These types of rules are generally not included in this section because they can usually be part of the column value definition, as described in Chapter 8. They generally involve only a single column.
A time duration is the computation of time between a pair of date and/or time columns. For example, the function YEARS(HIRE_DATE—BIRTHDATE) will determine how old you were when you were
Some durations have boundaries set by the company. For example, a rule may exist that all orders sent to suppliers must be filled in 60 days or be automatically cancelled. A duration of more than 60 days would mean that either the rule was violated on purpose or one of the dates was wrong. Violations of duration rules generally indicate inaccurate data. One of the dates is wrong,
DAYS(SHIPPING_DATE_ORDER − DATE) LESS THAN 60 YEARS(HIRE_DATE − BIRTHDATE) GREATER THAN OR EQUAL TO 18
Duration rules can also check for order of values as discussed earlier. Negative duration values would indicate that values are not in the proper order.
Another place to look for simple data rules is to seek columns that divide the object type into
This division of the objects into subgroups probably has an impact on the acceptable values in other columns. Some of the columns may be used only for one subtype. In this case they must be NOT_NULL or not blank for that subgroup, and NULL or blank for others. They may also restrict the permitted values in some columns based on the
SPOUSE_NAME IS 'BLANK' IF MARRIED_FLAG IS NOT YES DATE_OF_LAST_BIRTH IS NULL IF GENDER IS 'MALE' IF EMPLOYEE_TYPE IS 'PART_TIME' THEN PAY_TYPE IS 'NONEXEMPT'
Violations of rules such as these are almost always the result of inaccurate data entry. These types of errors can have large impacts on decision support routines that tend to use the subgrouping columns to define comparison groups.
|
|
Subgrouping rules are particularly useful in finding
Common errors found are misentering GENDER or AGE (BIRTH_DATE). Simple checks generally yield a number of men having babies or other female-only procedures and children having diseases only possible in adults or the other way around.
These types of inaccuracies are easy to get into the database because these columns are rarely part of a key. In addition, the columns that depend on them are often specified as being NULL_ALLOWED because some of the time they require NULL and other times they do not. The database systems do not have the capability of enforcing conditional NULL rules. This means that there is no database support for catching problems at entry. It requires data rule formulation and execution to discover inaccuracies.
|
|
A similar type of rule involves data objects that record multiple steps in a process. As each step is completed, more data is added to the object. Each of the points in the process defines a "state" the data can be in. Each state generally involves some columns that are
Work flow rules are a subset of process rules. Data rules can be constructed from them to determine the correctness of values within each valid state.
As in the previous section, the database systems are not functionally sufficient to enforce this conditional NULL rule. A data rule is needed to ensure that all columns conform to the requirements of the current state.
The state may not affect only whether other columns are empty or not. It may also restrict the content of another column to a subset of the values permitted in value analysis.
Examples of work flow data objects are ORDERS (placed, shipped, received, returned, cancelled), LOAN_APPLICATION (
IF APPLICATION_STATUS IS "REVIEWED" THEN INTEREST_RATE IS NULL IF APPLICATION_STATUS IS "ACCEPTED" THEN INTEREST_RATE IS NOT NULL
Sometimes a column is determined by consideration of values from other columns through a business procedure or policy. Setting a customer rating or a risk factor on a security investment are examples. If the rules for setting these values are deterministic enough, they can be converted into data rules to test against the data. An example is
IF INVENTORY_USAGE_LAST6MONTHS IS 0 AND ONHAND IS GREATER THAN 0 THEN SALVAGE_FLAG=YES
Sometimes the business rule is vague, allowing discretion on a businessperson, and thus cannot be conveniently encoded as a data rule. However, a subset of the business rule may be absolute and can be encoded as a valid test. For example, a rule that defines a BAD customer as opposed to a GOOD customer may be very complex and allow some leeway. However, within the rule it may be an absolute that a customer with an OVERDUE amount in excess of $100,000 that is overdue by more than 90 days is a BAD customer. Even though some of the criteria for a BAD customer is
The rules shown so far involve values contained in a single row. Sometimes an object has multiple rows for
An example is testing for uniqueness across the PRODUCT_ID column of ORDER_LINE_ITEMS. This is a column that is not unique across the entire table but is expected to be unique across the rows of each order. In this example, it appears that only one column was being used. However, there were actually three columns used: PRODUCT_ID, ORDER_NUMBER, and LINE_ITEM_NUMBER. Another example of this is a data rule that says that multiple entries for the same employee for emergency contact cannot have the same telephone number.
Of course, additional data rules can be created that are not included in the samples shown. Many are more complex than those shown here. These are generally data rules dealing with business policies. Many times these rules are soft rules that are allowed to be violated at times. Examples are
EXPENSE LINE ITEMS FOR A SINGLE EXPENSE REPORT CAN- NOT INCLUDE DATES FOR MORE THAN ONE WEEK WHERE A WEEK IS DEFINED AS SUNDAY THROUGH SATURDAY AN ORDER CANNOT INCLUDE ITEMS THAT REQUIRE DIFFERENT SHIPPING MODES