REPRESENTING AND INTERPRETING NULLS IN MULTIDIMENSIONAL AGGREGATE DATA

Sparse data and null values are common in multidimensional data. For multidimensional aggregate databases, the fine-grained classification developed in the theory of null values, as described in the ANSI/X3/SPARC Report (1975) and Vassiliou (1980), is not necessary. In practice, only two types of null values are used: unknown (or non-available, NA) and non-existing (the latter are often called "structural zero entries"; see Bishop, 1978).

An example of the null value "unknown" arises in a MAD that reports only fruit production in the USA: if the data for the state of California for the years "80, 81, 82" are missing, they are "unknown" (presumably not recorded in a survey and, in any case, not available).
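The "unknown" case can be illustrated with a minimal Python sketch. The production figures, the state "Florida", and the helper `yearly_total` are hypothetical and invented for illustration; `None` is assumed here as the in-memory marker for an NA cell. The sketch shows that a sum over the known cells is only a partial figure, so it is returned together with a completeness flag.

```python
from typing import Dict, Optional, Tuple

# Hypothetical fruit production figures (invented for illustration).
# None marks an "unknown" (NA) cell: a value exists but was not recorded.
production: Dict[Tuple[str, int], Optional[float]] = {
    ("California", 1980): None,
    ("California", 1981): None,
    ("California", 1982): None,
    ("Florida", 1980): 120.0,
    ("Florida", 1981): 135.0,
    ("Florida", 1982): 128.0,
}


def yearly_total(cells: Dict[Tuple[str, int], Optional[float]],
                 year: int) -> Tuple[float, bool]:
    """Return (sum of the known values, True if no cell was unknown)."""
    values = [v for (_state, y), v in cells.items() if y == year]
    known = [v for v in values if v is not None]
    return sum(known), len(known) == len(values)


partial, complete = yearly_total(production, 1980)
# 'complete' is False here: 'partial' covers Florida only and must not be
# presented as the national total for 1980.
```

Returning the flag alongside the sum is one possible design; another is to report the whole marginal as NA whenever any contributing cell is unknown.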

An example of "non-existing" data, by contrast, can be found in a MAD that reports cases of illness broken down by illness and sex: here the missing data concern "cancer of the prostate gland" for "sex = female" and "cancer of the uterus" for "sex = male" (the value is zero, but it is a structural zero, in that it can never take a value other than zero). There is an important difference between the two types of null values, especially with regard to the corresponding marginal value (the total over the category attribute).

In the former (unknown) case, the total computed over a category attribute that has unknown values is not the true total for that attribute. In the latter (non-existing) case, the total of prostate cancer cases, for example, is indeed the total reported in the marginal (summarizing over "sex"), but the marginal alone loses the information that these cases refer to males only and not to the entire population. For this reason, we distinguish two null values in the summary data of output tables: the non-available value (symbol "NA") and the structural zero (symbol "-"). Structural zeros are also important when the cross-product of the category attributes that describe the summary data is incomplete (for example, in the case of a relation).
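The distinction between the two null values and its effect on a marginal can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation: the `Null` enum, the functions `marginal` and `render`, the "influenza" row, and all counts are hypothetical. Two choices are assumptions of the sketch: a marginal with any unknown contributor is reported as NA, and a marginal over structural zeros also returns the list of categories it actually covers, so that the "males only" information is not lost.

```python
from enum import Enum


class Null(Enum):
    NA = "NA"          # unknown / non-available value
    STRUCT_ZERO = "-"  # non-existing: can never differ from zero


NA, SZ = Null.NA, Null.STRUCT_ZERO

# Hypothetical case counts by (illness, sex), invented for illustration.
cases = {
    ("prostate cancer", "male"): 40,
    ("prostate cancer", "female"): SZ,
    ("uterine cancer", "male"): SZ,
    ("uterine cancer", "female"): 25,
    ("influenza", "male"): 300,
    ("influenza", "female"): NA,
}


def marginal(cells, illness):
    """Summarize over sex for one illness.

    Returns (total, covered), where 'covered' lists the sexes that can
    contribute to the total at all (structural zeros are excluded).
    """
    row = {sex: v for (ill, sex), v in cells.items() if ill == illness}
    if any(v is NA for v in row.values()):
        return NA, []  # assumption: an unknown cell makes the total unknown
    covered = sorted(sex for sex, v in row.items() if v is not SZ)
    if not covered:
        return SZ, []  # every cell is a structural zero
    return sum(row[sex] for sex in covered), covered


def render(value):
    """Format a cell for an output table, using the symbols "NA" and "-"."""
    return value.value if isinstance(value, Null) else str(value)
```

Under these assumptions, `marginal(cases, "prostate cancer")` yields the total 40 together with `["male"]`, while `marginal(cases, "influenza")` yields NA; `render` prints the two null values with the symbols "NA" and "-".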



Multidimensional Databases: Problems and Solutions
ISBN: 1591400538
Year: 2003
Pages: 150