INCOMPLETE INFORMATION IN DATABASE MANAGEMENT SYSTEMS | Multidimensional Databases: Problems and Solutions

There has been extensive research on incomplete information in the context of relational, deductive, and object-oriented databases (Dyreson, 1997a). This research has covered incompleteness at the level of attribute values, of tuples, and of the schema. Many kinds of incomplete information have been identified: the term "incomplete" covers fuzzy, imprecise, indeterminate, indefinite, missing, partial, possible, probabilistic, unknown, uncertain, and vague information. The following sections discuss the most common kinds of incomplete information.

What Do We Mean By Incomplete?

The definition of the term "incomplete" given in The Concise Oxford English Dictionary is perhaps too concise: incomplete is defined as "not complete." The definition of "complete" is of more help: complete means "entire" or "whole." An object, then, is incomplete with respect to an object that is entire or whole. The incomplete object is missing something from its more complete partner. For example, an ancient Greek statue that is missing an arm is incomplete with respect to the statue with that arm. If the statue's arm is unearthed and reaffixed, the statue could be made complete.

It is important to note that information is incomplete only with respect to more complete information (which in turn could be incomplete with respect to still more complete information). The difference between complete and incomplete information is often a matter of precision. For instance, the fact that the temperature is 56 degrees is less precise than that the temperature is 56.4 degrees. Both statements about the temperature are, however, more precise than the assertion that the temperature is "in the fifties."
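
One way to make this ordering concrete is to read each statement as the interval of temperatures it is consistent with. The sketch below assumes that reading; the interval bounds, including the rounding convention for "56 degrees," are illustrative assumptions rather than part of the discussion above. Under this reading, one statement is at least as precise as another when its interval lies inside the other's.

```python
# Each statement is modeled as a half-open interval [low, high) of the
# temperatures it is consistent with. All bounds are illustrative assumptions.
FIFTIES = (50.0, 60.0)          # "in the fifties"
DEGREES_56 = (55.5, 56.5)       # "56 degrees", assuming rounding to the nearest degree
DEGREES_56_4 = (56.35, 56.45)   # "56.4 degrees", assuming rounding to the nearest tenth


def at_least_as_precise(a, b):
    """Return True if interval a lies entirely within interval b."""
    return b[0] <= a[0] and a[1] <= b[1]


# 56.4 refines 56, which in turn refines "in the fifties".
print(at_least_as_precise(DEGREES_56_4, DEGREES_56))
print(at_least_as_precise(DEGREES_56, FIFTIES))
print(at_least_as_precise(FIFTIES, DEGREES_56))
```

The relation is only a partial order: two intervals that merely overlap are not comparable, which matches the observation that information is incomplete only with respect to more complete information.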

Unknown Values

An unknown value is a value that is known to exist, but the actual value is unknown (Codd, 1979). The unknown value is assumed to belong to some specified domain of values. Unknown values are a very common kind of incomplete information; a null value in SQL is often interpreted to mean an unknown value. For example, in a grocery database, every product must have a calorie count, but the number of calories in a pack of gum could be recorded as unknown. The unknown value indicates that the gum has some non-negative number of calories (i.e., the number of calories is not "purple" or "-3"). An unknown value has various names in the literature, including missing null (Goldstein, 1981) and existential null (Biskup, 1984; Minker, 1982).
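
A minimal sketch of how an unknown value might behave in comparisons is given below. It assumes a three-valued reading in which a comparison involving an unknown value yields neither true nor false, similar in spirit to how SQL evaluates comparisons with null. The names `UNKNOWN` and `equals` and the calorie figure for the apple are hypothetical and serve only as illustration.

```python
from typing import Optional


class _Unknown:
    """Hypothetical marker for a value that exists but is not known."""

    def __repr__(self):
        return "UNKNOWN"


UNKNOWN = _Unknown()


def equals(a, b) -> Optional[bool]:
    """Three-valued equality; None stands for an unknown truth value."""
    if a is UNKNOWN or b is UNKNOWN:
        return None
    return a == b


# Illustrative data: the gum has a calorie count, but it is not recorded.
calories = {"gum": UNKNOWN, "apple": 95}

print(equals(calories["apple"], 95))   # a definite answer
print(equals(calories["gum"], 5))      # neither true nor false
```

A query that selects products with exactly 5 calories can therefore neither include nor exclude the gum with certainty.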

A null is often semantically overloaded to mean either an unknown value or an inapplicable value (Codd, 1979), in which case it is a no information null. Problems that result from such overloading have been identified (Zaniolo, 1984). The use of four-valued logics has been advocated to untangle the semantic confusion (Gessert, 1991).

Imprecise Values

A generalization of an unknown null is an imprecise value. An imprecise value is an attribute value that is known to exist and known to be a single value from a subset of the attribute domain, typically a contiguous range. For example, in a grocery database, the number of grams of protein in an apple is recorded as between 1 and 3 (the subset {1, 2, 3}), but the precise amount is not known since it varies from apple to apple. The amount of protein is said to be imprecise. A special case of an imprecise value is an unknown value (the subset is the domain itself). Another special case is a precise or complete value (the subset is a singleton set). Partially known values (Grant, 1979) and set nulls (Keller & Wilkins, 1985) are imprecise values.
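
The following sketch shows one possible representation of an imprecise value as a non-empty set of candidate values. The class name `Imprecise` and the methods `possibly` and `certainly` are hypothetical; they illustrate how a condition on an imprecise value can hold for some candidates, for all of them, or for none.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Imprecise:
    """Hypothetical imprecise value: exactly one of the candidates is the true value."""

    candidates: frozenset

    def __post_init__(self):
        if not self.candidates:
            raise ValueError("an imprecise value needs at least one candidate")

    def is_precise(self):
        # A singleton candidate set is an ordinary, complete value.
        return len(self.candidates) == 1

    def possibly(self, predicate):
        # True if at least one candidate satisfies the predicate.
        return any(predicate(v) for v in self.candidates)

    def certainly(self, predicate):
        # True if every candidate satisfies the predicate.
        return all(predicate(v) for v in self.candidates)


# Grams of protein in an apple: between 1 and 3.
apple_protein = Imprecise(frozenset(range(1, 4)))

print(apple_protein.certainly(lambda g: g >= 1))
print(apple_protein.possibly(lambda g: g >= 3))
print(apple_protein.certainly(lambda g: g >= 2))
```

Passing the whole attribute domain as the candidate set would model an unknown value, and a single-element set would model a precise one.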

Disjunctive Values

Another generalization of an unknown null is a disjunctive value (Grant & Minker, 1986), also known as an indefinite value (Liu & Sunderraman, 1990, 1991). A disjunctive value is a non-empty subset of the attribute domain. For example, an apple might be a McIntosh, a Granny Smith, or a Fuji apple. The disjunction can be exclusive (Ola, 1992) or inclusive (Homenda, 1991). If it is an exclusive disjunction, one and only one disjunct is the true value. If it is inclusive, then more than one disjunct can be the true value.
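
The difference between the two readings can be illustrated by enumerating the combinations of disjuncts that could be true. The sketch below is only an illustration; the function name `disjunctive_worlds` is hypothetical.

```python
from itertools import combinations


def disjunctive_worlds(disjuncts, exclusive=True):
    """List the sets of disjuncts that could be true under each reading."""
    items = sorted(disjuncts)
    if exclusive:
        # Exactly one disjunct is the true value.
        return [frozenset([d]) for d in items]
    # Inclusive: any non-empty combination of disjuncts can be true.
    return [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


varieties = {"McIntosh", "Granny Smith", "Fuji"}
print(len(disjunctive_worlds(varieties, exclusive=True)))
print(len(disjunctive_worlds(varieties, exclusive=False)))
```

For n disjuncts the exclusive reading admits n alternatives, while the inclusive reading admits every non-empty subset, that is, 2^n - 1 alternatives.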

Exotic Incomplete Values

A maybe value is an attribute value that might or might not exist (Gessert, 1991). If it does exist, the value is known. For instance, we could store in our grocery database that the price of a packet of coffee creamer is maybe 10 cents. Coffee creamer packets are usually given away freely, and therefore the price attribute is inapplicable, but some stores sell the coffee creamer at a cost of 10 cents. A maybe tuple is similar to a maybe value, but the entire tuple might not be part of the relation (Liu & Sunderraman, 1991). Maybe tuples are produced when one disjunct of an inclusive disjunctive fact is found to be true. The other disjuncts become maybe tuples.

A combination of inclusive disjunctive and maybe information is open information. An open value indicates that an attribute of a particular tuple is under the open world assumption (Gottlob & Zicari, 1988). The attribute value may not exist, could be exactly one value, or could be many values. For example, in an employee database an open value could be used for Jill's previous employment history. This value means that Jill might have had no previous employment (this could be Jill's first job), might have had one previous job, or might have been employed many times previously. The open value covers all these possibilities.

A no information value is a combination of an open value and an unknown value (Zaniolo, 1984). The no information value restricts an open value to resemble an unknown value. A no information value might not exist, but if it does, then it is a single value that is unknown, rather than possibly many values.

A generalization of open information is possible information (Lipski, 1979). Possible information is an attribute value that has an undetermined existence. If it does exist, it could be multiple values from a subset of the attribute domain. For example, in an employee database, an employee's previous employment history could be narrowed to possibly two companies; she could have worked for both companies, only one, or neither. A special case of a possible attribute value is an open value (the subset is the domain itself). Another special case is a maybe value (the subset is a singleton set).
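
As a rough illustration, possible information over a candidate subset can be read as the collection of all subsets of that candidate set, including the empty set (the value does not exist). The sketch below enumerates these alternatives; the function name `possible_worlds` and the company names are hypothetical.

```python
from itertools import combinations


def possible_worlds(candidates):
    """Enumerate every subset of the candidates, including the empty set."""
    items = sorted(candidates)
    return [
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    ]


# Previous employment narrowed to possibly two (hypothetical) companies:
# neither, only one of them, or both.
previous_employers = possible_worlds({"Company A", "Company B"})
print(len(previous_employers))

# A maybe value is the singleton case: the price is 10 cents or does not exist.
maybe_price = possible_worlds({10})
print(maybe_price)
```

Passing the entire attribute domain as the candidate set would correspond to an open value.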

Probabilistic Values

A probabilistic value is a generalization of an exclusive disjunctive value. A probabilistic value is a set of alternatives, where each alternative has an associated probability that it is the attribute value (Barbará et al., 1992; Cavallo & Pittarelli, 1987). For example, in a grocery database, assume that we do not know the fat content of Spam exactly, but are 70% sure it is 50 grams and 30% sure that it is 45 grams. The fat content is a probabilistic data value: the value exists, it is a value from a known subset of the attribute domain, it is exactly one value, and we know that some alternatives are more likely than others. In some models, one of the members of the set of alternatives could be an unknown value, in which case the associated probability is distributed uniformly over the elements in the domain (Barbará et al., 1992). In other models the probability is at the tuple level, indicating the likelihood that the tuple is a member of a relation (Gelenbe & Hebrail, 1986).
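
The Spam example can be sketched as a mapping from each alternative to its probability. The helper names `make_probabilistic` and `probability` below are hypothetical, and the sketch covers only attribute-level probabilities whose alternatives are all known values.

```python
import math


def make_probabilistic(alternatives):
    """Validate a mapping from alternative value to probability and return a copy."""
    probabilities = list(alternatives.values())
    if any(p < 0 for p in probabilities):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    return dict(alternatives)


def probability(value, predicate):
    """Total probability of the alternatives that satisfy the predicate."""
    return sum(p for v, p in value.items() if predicate(v))


# Fat content of Spam in grams: 70% sure it is 50, 30% sure it is 45.
spam_fat = make_probabilistic({50: 0.7, 45: 0.3})

# Probability that the fat content is at least 48 grams.
print(probability(spam_fat, lambda grams: grams >= 48))
```

Because exactly one alternative is the true value, the probabilities of the alternatives sum to 1, which is the sense in which a probabilistic value refines an exclusive disjunction.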

Possibilistic Values

Another variety of weighted incomplete information is a possibilistic or fuzzy set value. A possibilistic value is similar to a possible value: a fuzzy set is a set of possibilities, and each possibility is a maybe value, that is, it may belong to the set or it might not (Zadeh, 1989). The possibility that it does belong is given by a membership function, which assigns each possibility a degree of membership. The degree is a value between 0 and 1 (inclusive). A fuzzy set can be an attribute value.
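
A fuzzy set can be sketched as a mapping from each element to its degree of membership. The helper names `make_fuzzy_set` and `membership` and the degrees assigned to the "low fat" set below are invented for illustration only.

```python
def make_fuzzy_set(degrees):
    """Validate a mapping from element to degree of membership and return a copy."""
    for element, degree in degrees.items():
        if not 0.0 <= degree <= 1.0:
            raise ValueError(f"degree for {element!r} must lie between 0 and 1")
    return dict(degrees)


def membership(fuzzy_set, element):
    """Degree of membership; elements that are not listed have degree 0."""
    return fuzzy_set.get(element, 0.0)


# Hypothetical fuzzy set of "low fat" products; the degrees are made up.
low_fat = make_fuzzy_set({"apple": 1.0, "gum": 0.9, "Spam": 0.1})

print(membership(low_fat, "gum"))
print(membership(low_fat, "bread"))
```

Unlike the probabilities of a probabilistic value, the degrees of membership in this sketch are not required to sum to 1.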