THE ROLE OF TIME

only for RuBoard - do not distribute or recompile

THE ROLE OF TIME

In data warehousing the addition of time enables historical data to be held and queried upon. This means that users of data warehouses can view aspects of their enterprise at any specific point or over any period of time for which the historical data is recorded. This enables the observation of patterns of behavior over time so that we can make comparisons between similar or dissimilar periods, for example, this year versus last year, seasonal trends. Armed with this information, we can extrapolate with the use of predictive models to assist us with planning and forecasting. We are, in effect, using the past to attempt to predict the future:

If men could learn from history, what lessons it might teach us! But passion and party blind our eyes, and the light which experience gives is a lantern on the stern , which shines only on the waves behind us!

”(Coleridge, 1835)

Despite this gloomy warning from the nineteenth century, the use of information from past events and trends is commonplace in economic forecasting, social trend forecasting, and even weather forecasting.

The value and importance of historical data are generally recognized. It has been observed that the ability to store historical data is one of the main advantages of data warehousing and that the absence of historical data, in operational systems, is one of the motivating factors in the development of data warehouses.

Some people argue that most operational systems do keep a limited amount of history, about 60 “90 days. In fact, this is not really the case, because the data held at any one time in, say, an order processing system will be orders whose lifecycle has not been completed to the extent that the order can be removed from the system. This means that it may take, on average, 60 “90 days for an order to pass through all its states from inception to deletion. Therefore, at any one time, some of the orders may be up to 90 days old with a status of invoiced, while others will be younger , with different status such as new, delivered, back ordered, etc. This is not the same as history in our sense.

Valid Time and Transaction Time

Throughout this chapter and the remainder of the book, we will make frequent reference to the valid times (Jensen et al., 1994) and transaction times of data warehouse records. These two times are defined in field of temporal database research and have quite precise meanings that are now explained.

The valid time associated with the value of, say, an attribute is the time when the value is true in modeled reality. For instance, the valid time of an order is the time that the order was taken. Such values may be associated with:

  1. A single instant.   Defined to be a time point on an underlying time axis. An event is defined as an instantaneous fact that occurs at an instant.

  2. Intervals (periods) of time.   Defined to be the time between two instants.

The valid time is normally supplied by the user , although in some cases, such as telephone calls, the valid time can be provided by the recording equipment.

The transaction time associated with the value of, say, an attribute records the time at which the value was stored in the database and is able to be retrieved. Transaction times are system generated and may be implemented using transaction commit times. Transaction times may also be represented by single instants or time intervals. Clearly, a transaction time can provide only an approximation of the valid time. We can say, generally, that the transaction time that records an event will never be earlier than the true valid time of the same event.

Behavioral Data

In a dimensional data warehouse, the source systems from which the behavioral data is derived are the organization's operational systems such as order processing, supply chain, and billing. The source systems are not usually designed to record or report upon historical information. For instance, in an order processing system, once an order has satisfactorily passed through its lifecycle, it tends to be removed from the system by some archival or deletion process. After this, for all practical purposes, the order will not be visible. In any case, it will have passed beyond the state that would make it eligible to be captured for information purposes.

The task of the data warehouse manager is to capture the appropriate entities when they achieve the state that renders them eligible to be entered into the data warehouse. That is, when the appropriate event occurs, a snapshot of the entity is recorded. This is likely to be before they reach the end of their lifecycle. For instance, an order is captured into the data warehouse when the order achieves a state of, say, invoiced. At this point the invoice becomes a fact in the data warehouse. Having been captured from the operational systems, the facts are usually inserted into the fact table using the bulk insertion facility that is available with most database management systems. Once loaded, the fact data is not usually subject to change at all. The recording of behavioral history in a fact table is achieved by the continual insertion of such records over time.

Usually each fact is associated with a single time attribute that records the time the event occurred. The time attribute of the event would, ideally , be the valid time, that is, when the event occurred in the real world. In practice, valid times are not always available and transaction times (i.e., the time the data was recorded) have to be used.

The actual data type used to record the time of the event will vary from one application to another depending on how precise the time has to be (the granularity of time might be day, month, and year when recording sales of wine, but would need to be more precise in the case of telephone calls and would probably include hours, minutes, and seconds).

Circumstantial Data

Operational data, from which the facts are derived, is accompanied by supporting data, often referred to as reference data. The reference data relates to entities such as customers, products, sales regions , etc. Its primary purpose, within the operation processing systems, is to enable, for instance, the right products and documentation to be sent to the right addresses. It is this data that is used to populate the dimensions and the dimensional hierarchies in the data warehouse.

In the same way that business transactions have a lifecycle, these reference entities also have a lifecycle. The lifecycle of reference data entities is somewhat different to transactions. Whereas business transactions, under normal circumstances, have a predefined lifecycle that starts at inception and proceeds through a logical path to deletion, the lifecycle of reference data can be much less clear. The existence of some entities can be discontinuous. This is particularly true of customer entities who may switch from one supplier to another and back again over time. It is also true of some other reference information, such as products (e.g., seasonal products). Also, the attributes are subject to change due to people moving, changing jobs, etc.

only for RuBoard - do not distribute or recompile


Designing a Data Warehouse . Supporting Customer Relationship Management
Designing A Data Warehouse: Supporting Customer Relationship Management
ISBN: 0130897124
EAN: 2147483647
Year: 2000
Pages: 96
Authors: Chris Todman

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net