EXAMPLE

We will introduce the problem by means of a motivating example. For the remainder of this section, we will consider dimensions as relations representing data as of the current time. Let us suppose a retail data warehouse with the following dimensions: time, salesperson, customer, and product. Moreover, as dimensions are organized in hierarchies, let us also assume the hierarchy {itemId priceStatus, itemId brand}, and the following functions (denoted roll-up functions) from itemId to priceStatus: {(Panasonic DVD A120, reduced), (Sony STR DE675, regular), (Sharp MD-SR50, regular)} (let us ignore the dimension level brand at this time). The following table represents sales facts.

A query asking for the total sales per salesperson and price status would return the following table:

timeID

salesPersonID

customerID

itemID

salesAmount

10 - 10 - 01

s1

Becher, J.

Panasonic DVD A120

250

10 - 11 - 01

s2

Rodriguez, P.

Panasonic DVD A120

250

10 - 12 - 01

s1

Ashfield, K.

Sony STR DE675

300

10 - 13 - 01

s2

Finn, N.

Sharp MD - SR50

100

Suppose now that on October 14, 2001, we decide to assign the Panasonic DVD a regular price status. A nontemporal star or snowflake schema will store < Panasonic DVD A120, regular > in the table representing dimension Product, replacing the tuple <Panasonic DVD A120, reduced >, i.e., there will be no memory

spID

item Type

sales Amount

s1

reduced

250

s2

reduced

250

s1

regular

300

s2

regular

100

of the price status of an item. If the user now poses the same query, as all the sales occurred before the revision, she would expect to get the same result. However, she gets the following:

spID

price Status

sales Amount

s1

regular

550

s2

regular

350

What happened is that there are no longer items with reduced prices.

Notice that in order to issue a query, the user needs to know the schema of the data warehouse, that is, which are the attributes in the fact and dimension tables. However, this schema may change over time. For instance, itemId may not always have been an attribute of the fact and/or dimension tables if in the early days of this data warehouse, data with granularity itemId was not available at the sources. In this case, the query will only consider total sales made since the time at which itemId was added to the fact table, although information is available to obtain the total sales over the whole lifespan of the data warehouse, at least at a coarser level.

As another example of inaccurate results a user could get when querying this data warehouse, suppose that a many-to-one relationship from customers to salespersons exists, such that each salesperson is assigned a set of customers to serve. At a certain time, a customer, say c1, initially assigned to salesperson s1, is reassigned to s2. Suppose a sales manager wants to use the data warehouse to set future sales goals for each salesperson, basing the forecast on past sales data. Given the relationship between customers and salespersons, clearly this projection should be based on past volume of purchases made to customers currently assigned to each salesperson, no matter who was formerly assigned to whom, as a salesperson cannot expect anything from a customer no longer assigned to her.

The discussion above suggests that new models and query languages are needed in order to address temporal issues in OLAP. This will avoid building ad-hoc applications, as in current commercial OLAP systems, which have no built-in temporal capabilities.



Multidimensional Databases(c) Problems and Solutions
Multidimensional Databases: Problems and Solutions
ISBN: 1591400538
EAN: 2147483647
Year: 2003
Pages: 150

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net