FREQUENCY OF CHANGED DATA CAPTURE

only for RuBoard - do not distribute or recompile


In the pursuit of accuracy relating to time, we need to know whether the data we receive for placement into the data warehouse accurately reflects the time that the change actually occurred. Chapter 4 identified this as another source of inaccuracy in the representation of time. This requirement cannot be fully satisfied by the data warehouse in isolation, as is now described.

As far as the facts are concerned, the time recorded against the event would be regarded by the organization as the valid time of the event. That is, it truly reflects the time the sale occurred or the call was made.

With the circumstances and dimensions, we are interested in capturing changes. As has been discussed, changes to some attributes are more important than others with respect to time. For the most important changes, we expect to record the times that the changes occurred. Some systems are able to provide valid time changes to attributes, but most are not equipped to do this. So we are faced with the problem of deducing changes by some kind of comparison process that periodically examines current values and compares them to previous values to determine precisely what has changed and how.
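The comparison process described above can be sketched as follows. This is a minimal illustration, not a prescribed method: it assumes each periodic extract has already been loaded into a dictionary keyed by a business key (for example, a customer number), and the name `diff_snapshots` and the tuple layout of the results are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]
# (business key, kind of change, {attribute: (old value, new value)})
Change = Tuple[str, str, Dict[str, Tuple[Optional[str], Optional[str]]]]


def diff_snapshots(previous: Dict[str, Row], current: Dict[str, Row]) -> List[Change]:
    """Compare the previous extract with the current one and report
    what has changed and how. Both extracts are keyed by business key."""
    changes: List[Change] = []

    for key, row in current.items():
        old = previous.get(key)
        if old is None:
            # Key is new since the previous extract.
            changes.append((key, "insert", {a: (None, v) for a, v in row.items()}))
            continue
        # Collect only the attributes whose values differ.
        changed = {a: (old.get(a), v) for a, v in row.items() if old.get(a) != v}
        if changed:
            changes.append((key, "update", changed))

    for key in previous:
        if key not in current:
            # Key has disappeared since the previous extract.
            changes.append((key, "delete", {}))

    return changes


previous = {"C1": {"address": "1 High St"}, "C2": {"address": "5 Low Rd"}}
current = {"C1": {"address": "9 New Ave"}, "C3": {"address": "2 Mill Ln"}}
result = diff_snapshots(previous, current)
```

Note that the result says what changed, but carries no information about when the change occurred.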

The only class of time available to us in this scenario is transaction time. Under normal circumstances, the transaction time is the time that the change is recorded in the operational system. Often, however, transaction times are not actually recorded anywhere by the application. A change to, say, a customer's address simply results in the old address being replaced by the new one, with no record kept of when the change was made. Other systems attempting to detect the change by a file comparison method have no real way of knowing when the change occurred in the real world or when it was recorded in the system.
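When a change is detected only by comparing extracts, the most that can be said is that it was recorded at some point between the two extracts. The sketch below makes that bound explicit; the function name `change_window` is hypothetical, and it assumes the times at which the two extracts were taken are known.

```python
from datetime import datetime, timedelta
from typing import Tuple


def change_window(previous_extract: datetime,
                  current_extract: datetime) -> Tuple[datetime, datetime, timedelta]:
    """Return the interval in which a detected change must have been
    recorded, together with the width of that interval (the uncertainty)."""
    if current_extract <= previous_extract:
        raise ValueError("current extract must be later than the previous one")
    return previous_extract, current_extract, current_extract - previous_extract


# Weekly extracts: the change can only be dated to within seven days.
earliest, latest, uncertainty = change_window(
    datetime(2000, 3, 1), datetime(2000, 3, 8)
)
```

The uncertainty equals the interval between extracts, so comparing more frequently narrows the window in which the change is known to have been recorded. It still says nothing about the valid time.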

So, in a data warehouse environment, there are two time lags to be considered. The first is the lag between the time the change occurred in the real world, the valid time, and the time the change is recorded in an operational system, the transaction time. Usually, the organization is unaware of the valid time of a change event. In any case, the valid time is rarely recorded. The second time lag is the time it takes for a change, once it has been recorded in the operational system, to find its way into the data warehouse.
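The two lags can be expressed as simple differences between three points in time. The following is an illustrative sketch only: the function `time_lags`, its parameter names, and the dates used are assumptions made for the example. Because the valid time is rarely known, it is treated as optional.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def time_lags(transaction_time: datetime,
              load_time: datetime,
              valid_time: Optional[datetime] = None
              ) -> Tuple[Optional[timedelta], timedelta]:
    """Return (first lag, second lag).

    First lag:  valid time -> transaction time (None if valid time unknown).
    Second lag: transaction time -> arrival in the data warehouse.
    """
    first_lag = transaction_time - valid_time if valid_time is not None else None
    second_lag = load_time - transaction_time
    return first_lag, second_lag


# Hypothetical case: a customer moves on 1 March, the new address is
# recorded on 10 March, and it reaches the warehouse on 1 April.
first_lag, second_lag = time_lags(
    transaction_time=datetime(2000, 3, 10),
    load_time=datetime(2000, 4, 1),
    valid_time=datetime(2000, 3, 1),
)
```

In this example the first lag is 9 days and the second is 22 days. Only the second lag is under the control of the warehouse designers.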

The solution is to try to minimize the time lags inherent in this process. Although that is often easier said than done, the objective of the designers must be to identify and process changes as quickly as possible so that the temporal aspect of the facts and dimensions can be synchronized.



Designing A Data Warehouse: Supporting Customer Relationship Management
ISBN: 0130897124
EAN: 2147483647
Year: 2000
Pages: 96
Authors: Chris Todman
