|
|
We will introduce the problem by means of a motivating example. For the remainder of this section, we will consider dimensions as relations representing data as of the current time. Let us suppose a retail data warehouse with the following dimensions: time, salesperson, customer, and product. Moreover, as dimensions are organized in hierarchies, let us also assume the hierarchy {itemId → priceStatus, itemId → brand}, and the following functions (denoted roll-up functions) from itemId → to priceStatus: {(Panasonic DVD A120, reduced), (Sony STR DE675, regular), (Sharp MD-SR50, regular)} (let us ignore the dimension level brand at this time). The following table represents sales facts.
A query asking for the total sales per salesperson and price status would return the following table:
timeID | salesPersonID | customerID | itemID | salesAmount |
---|---|---|---|---|
10 - 10 - 01 | s1 | Becher, J. | Panasonic DVD A120 | 250 |
10 - 11 - 01 | s2 | Rodriguez, P. | Panasonic DVD A120 | 250 |
10 - 12 - 01 | s1 | Ashfield, K. | Sony STR DE675 | 300 |
10 - 13 - 01 | s2 | Finn, N. | Sharp MD - SR50 | 100 |
Suppose now that on October 14, 2001, we decide to assign the Panasonic DVD a regular price status. A nontemporal star or snowflake schema will store < Panasonic DVD A120, regular > in the table representing dimension Product, replacing the tuple <Panasonic DVD A120, reduced >, i.e., there will be no memory
spID | item Type | sales Amount |
---|---|---|
s1 | reduced | 250 |
s2 | reduced | 250 |
s1 | regular | 300 |
s2 | regular | 100 |
of the price status of an item. If the user now poses the same query, as all the sales occurred before the revision, she would expect to get the same result. However, she gets the following:
spID | price Status | sales Amount |
---|---|---|
s1 | regular | 550 |
s2 | regular | 350 |
What happened is that there are no longer items with reduced prices.
Notice that in order to issue a query, the user needs to know the schema of the data warehouse, that is, which are the attributes in the fact and dimension tables. However, this schema may change over time. For instance, itemId may not always have been an attribute of the fact and/or dimension tables if in the early days of this data warehouse, data with granularity itemId was not available at the sources. In this case, the query will only consider total sales made since the time at which itemId was added to the fact table, although information is available to obtain the total sales over the whole lifespan of the data warehouse, at least at a coarser level.
As another example of inaccurate results a user could get when querying this data warehouse, suppose that a many-to-one relationship from customers to salespersons exists, such that each salesperson is assigned a set of customers to serve. At a certain time, a customer, say c1, initially assigned to salesperson s1, is reassigned to s2. Suppose a sales manager wants to use the data warehouse to set future sales goals for each salesperson, basing the forecast on past sales data. Given the relationship between customers and salespersons, clearly this projection should be based on past volume of purchases made to customers currently assigned to each salesperson, no matter who was formerly assigned to whom, as a salesperson cannot expect anything from a customer no longer assigned to her.
The discussion above suggests that new models and query languages are needed in order to address temporal issues in OLAP. This will avoid building ad-hoc applications, as in current commercial OLAP systems, which have no built-in temporal capabilities.
|
|