CONSTRAINTS

only for RuBoard - do not distribute or recompile

CONSTRAINTS

Part of the role of the logical model is to record constraints that need to be imposed on the model. The introduction of full support for time brings with it some additional requirements for the imposition of constraints.

Double-Counting Constraints

Double-counting occurs when the joining of tables returns more rows than should be returned. This problem is usually avoided if the general rules about the structure of dimensional models are followed.

However, the introduction of existence attributes into the model increases the risk of error by changing the nature of the relationships in the warehouse data model from simple (1:n) to complex (m:n).

The problem is best described by the use of an example of a sale of wine. Table 6.2 shows the bottle cost existence.

Table  6.2. Example of the Existence of a Wine Cost Entity
Wine Code Start End Bottle Cost
4504 1997/02/27 1999/03/31 4.36
4504 1999/03/31 Now 4.79

The bottle cost has an existence attribute. The existence is continuous, but a change in the cost price of a bottle of the wine has occurred. This has resulted in the generation of a new row.

Table 6.3 shows a fragment of the sales fact table detailing a sale of the wine above.

Table  6.3. A Single Sale of Wine
Wine Code Day Quantity Value
4504 1999/03/31 5 24.95

Next, the following query is executed and is intended to show the sales value and costs:

 Select w.wine_name, s.value "Revenue",               sum(s.quantity * w.bottle_cost) "Cost"        from Sales s, Wine w, BottleCostExistence bce        where w.wine_code = s.wine_code        and w.wine_code = bce.wine_code        and s.day between bce.start_date and bce.end_date        group by w.wine_name 

The result set in Table 6.4 is returned.

Table  6.4. Example of Double-Counting
Wine Name Revenue Cost
Chianti Classico 24.95 21.80
Chianti Classico 24.95 23.95

The result is that the sale has been double-counted. The problem has occurred because of an overlap of dates that caused the sale, which was made on March 31, 1999, to successfully join to two rows. The date of the change may well be right. The old cost price ceased to be effective on March 31, 1999, and the new price took effect immediately, on the same day.

As far as the query processing is concerned , the multiple join is also correct. The join criteria have been met.

What is wrong is the granularity of time. There is an implicit constraint that states that:

Time overlaps in existence are not permitted in a dimensional model.

Therefore, if both dates are correct, then the granularity is incorrect and a finer grain, such as time of day, must be used instead.

The alternative is to ensure that the end of the old period and the start of the new period actually meet. . This means that there is no overlap, as shown in Table 6.5.

Table  6.5. Wine Cost Entity Showing No Overlaps in Existence
Wine Code Start End Bottle Cost
4504 1997/02/27 1999/03/30 4.36
4504 1999/03/31 Now 4.79

It is equally important to ensure that no gaps are inadvertently introduced, as in the Table 6.6.

Table  6.6. Wine Cost Entity Showing Gaps in Existence
Wine Code Start End Bottle Cost
4504 1997/02/27 1999/03/30 4.36
4504 1999/04/01 Now 4.79

If the data in Table 6.6 were used, then the result set would be empty.

The query used in the example was phrased to aid clarity. It is worth remembering that in data warehousing most queries, involving joins to the fact table, use arithmetical functions to aggregate the results. The chances of users identifying errors from the result sets of such queries are seriously reduced when large numbers of rows are aggregated.

Therefore, in order for the concept of existence to work, temporal constraints must prevent any form of overlapping to occur on periods; otherwise , there is a risk that double-counting might occur. The following query, using the temporal construct overlaps, would detect such an occurrence.

 Select R1.PK        From R1, R2        where R1.PK = R2.PK        and R1.Period <> R2.Period        where R1.Period  overlaps  R2.Period 

R1 and R2, in the previous query, are synonyms for the same relation, and PK is the primary key. The relation is subjected to a self-join in order to identify temporal overlaps.

This query can be rewritten, without using the temporal constructs, as in the following query:

 Select R1.PK        from R1, R2        where R1.PK = R2.PK        and (R1.Start <> R2.Start or R1.End <> R2.End)        and R1.Start <= R2.End        and R2.Start <= R1.End 

The periodic execution of such queries would enable the detection of double-counting errors.

Referential Integrity Constraints

Several axiomatic referential integrity constraints can be specified:

  1. The period of existence of the subordinate object must be contained within a single period of existence of the superordinate object.

  2. In a hierarchy, as long as the subordinate exists, there must also exist a relationship between the subordinate entity and its superordinate entity. This is due to the mandatory participation condition relating to subordinate entity in data warehouse models.

  3. During the period of existence of an entity, all the attributes of the entity must exist. It follows , therefore, that gaps in existence of attributes are not allowed while the entity is in existence.

There are some constraints relating to retrospection.

If an entity has true retrospection:

  1. It can have attributes that have true retrospection. The lifespans of those attributes must fall within the lifespan periods of the entity. If the entity ceases to exist, then the attributes with true retrospection should cease to exist at the same time.

  2. It can have attributes that have false retrospection. These attributes can change only when the existence attribute for the entity indicates that the entity exists. They cannot change when the existence attribute for the entity indicates that the entity does not exist.

  3. It can have attributes that have a value, for retrospection, of permanent, as these values never change.

If an entity has false retrospection:

  1. It can have attributes that have true retrospection. If the entity ceases to exist, then the attributes with true retrospection should cease to exist at the same time. When the entity changes from nonexistent to existent, these attributes should begin a new interval of existence at the same time.

  2. It can have attributes that have false retrospection. These attributes can change only when the existence attribute for the entity indicates that the entity exists. They cannot change when the existence attribute for the entity indicates that the entity does not exist.

  3. It can have attributes that have permanent retrospection, as these values never change.

If a dimension has a value for retrospection of permanent, there are no constraints that restrict the kind of attributes the dimension can have.

Deletion Constraints

It is for the owners of the data warehouse to determine the values of retrospection that are to be applied to each data warehouse object. The three approaches place very different requirements on the design of the warehouse, and the accuracy of the results from queries will vary depending on the choices made. Some rules can be applied.

The rules governing referential integrity violations in relational databases with respect to deletions must be applied to existence. Where a dimension changes its status from being existing to becoming logically nonexisting, this is equivalent to the entity being logically deleted. The application of the rules has to be applied selectively.

Cascade delete cannot be used because this may result in the deletion of facts and this would have the effect of invalidating the database. Although the data warehouse would retain integrity in the sense that the references would remain intact, the database would return incorrect results.

Nullifying the references (effectively-deleting the relationship) cannot be used, because the participation condition of the dimensions, which would be nullified, is mandatory. If it were permitted to delete the relationships, queries that aggregated all the facts using one dimension would produce different totals to other dimensions. This would have the effect of invalidating the results.

The only method for dealing with changes to existence is to use the restricted effect. This means that when a dimension's existence ceases, any referencing dimension must have its relationship existences closed and new relationship existences created that refer to an existing dimension, before the previous dimension is allowed to have its existence terminated .

only for RuBoard - do not distribute or recompile


Designing a Data Warehouse . Supporting Customer Relationship Management
Designing A Data Warehouse: Supporting Customer Relationship Management
ISBN: 0130897124
EAN: 2147483647
Year: 2000
Pages: 96
Authors: Chris Todman

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net