INCOMPLETE METADATA

Less common is incompleteness in the metadata. The metadata in a multidimensional database are the dimensions, categories, and units. Often, the metadata can be stored as a set of mapping tables, which encodes the relationship between a finer and a coarser category. The set of mapping tables for a dimension specifies the hierarchy for that dimension. A mapping table has two columns, labeled Fine and Coarse as shown in Figure 7. The mapping in the table is from individual kinds of Items sold in a store to Food Groups. In the mapping table, each unit in the Fine category is associated with some unit in the Coarse category. As already discussed, in order to be summarizable, all such relationships must be totaled on both sides, and many-to-one from the finer to the coarser category (Lenz & Shoshani, 1997). The example table shown in Figure 7 has several problems (Pedersen et al., 1999, 2000).

click to expand
Figure 7: A Mapping Table from Items to Food Groups

  • Charcoal is not a food. The null value in the Food Group associated with charcoal indicates that there is no Food Group to which charcoal belongs. The mapping is said to be non-covering.

  • Tomatoes are both a vegetable and a fruit. Technically, a tomato is a fruit, but most people think of tomatoes as vegetables, so the table specification indicates that they are both (the idea is to permit such specifications and let the multidimensional database solve the dilemma). The mapping is a many-to-many relationship, no longer many-to-one, and is referred to as non-strict.

  • Monsteros are difficult to identify. A monstero is a food that is familiar only in the tropics; it is about the size of a cob of corn, and is covered with dark green, hexagonal scales. The store has imported some monsteros from Australia, but the metadata specification is not quite clear on whether monsteros are fruits or vegetables. The mapping is non-strict.

  • Grass is a food group, and some Item sold by the store fits into the group, but it is not clear which Item. The mapping is non-onto since the Fine category has partial participation.

The problem of incomplete information in metadata is that it makes the hierarchy non-summarizable. Proper management of the incomplete information can restore the hierarchy. Each case is considered below in detail.

Non-covering hierarchies: Figure 8(a) shows an example of a non-covering hierarchy. The problem is subtle. There is no Food Group unit to which "Charcoal" belongs; but, skipping a level, "Charcoal" does belong to the "Picnic Supplies" unit in the Store Aisles category. The non-covering hierarchy is a problem in a semi-eager multidimensional model that materializes Food Groups. Even though Food Groups is a finer category than Store Aisles, the materialized values for Food Groups cannot be used to compute Store Aisles since "Picnic Supplies" will be undercounted. The solution is to complete the non-covering mapping by adding a hidden unit as shown in Figure 8(b). The unit "Hidden PS" is added to the Food Groups category. The unit is only seen by the system, that is, it is hidden from a user who queries for Food Group measures. At most one hidden unit is added for each non-covering violation.

click to expand
Figure 8: Repairing a Non-Covering Mapping

Non-onto hierarchies: Techniques for repairing non-onto hierarchies (where the coarse category does not totally participate) are similar to those for completing non-covering hierarchies. Here, the hierarchy is padded with finer, hidden values below the existing values, e.g., a hidden "Grass" value is inserted into the Item level and mapped to "Grass" in the Food Groups level to make the hierarchy onto.

Non-strict hierarchies: A non-strict hierarchy is a many-to-many mapping between two categories. In general, non-strict mappings can be handled by introducing new units in the hierarchy. The example presented above had two separate cases of non-strict mappings. The first case is that tomatoes are considered to be both a fruit and a vegetable. This could lead to overcounting, as shown in Figure 9(a). "Tomato" contributes to both "Fruit" and "Vegetable," and so is counted twice at "Produce." The second case is that a "Monstero" could be considered a "Fruit" or a "Vegetable." This could lead to overcounting (both mappings are possible) or undercounting (neither mapping is definite). The technique to repair the non-strictness in both cases is the same.

click to expand
Figure 9: Repairing a Non-Strict Hierarchy

To repair a non-strict hierarchy, two goals must be simultaneously attained. First, the counting problems must be fixed. In Figure 9(a) the counting problem manifests itself as an overcounting. The count at the "Produce" unit should be two rather than three since the produce is one "Tomato" and one "Apple." The second goal to achieve is the correct display of values at a coarse category. When data at a coarse category is viewed, it must include the multiple mapped unit(s) at the finer category. In the example, if the Food Group units are viewed then the count for "Vegetable" should be one, and for "Fruit," it should be two since a "Tomato" is both a fruit and a vegetable.

To solve the counting problem, a new (hidden) unit is added for each non-strict mapping. In Figure 9(b) the overcounting has been corrected by adding a "Vegetable and Fruit" unit. "Tomato" maps to the new unit. The solution to the second problem is to split each unit into a visible unit and a hidden unit. The hidden units are used to aggregate higher up the hierarchy. In Figure 9(b) the hidden units are shaded in gray. Note that "Tomato" does not map to the hidden "Vegetable" or "Fruit" units. The visible units are used for display purposes only. They are the original non-strict mappings. The visible units are shaded white in the Food Groups category of Figure 9(b). The relationship between the visible units in Food Groups and the units in Items is the same in both Figure 9(a) and Figure 9(b).



Multidimensional Databases(c) Problems and Solutions
Multidimensional Databases: Problems and Solutions
ISBN: 1591400538
EAN: 2147483647
Year: 2003
Pages: 150

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net