11.6 Problems with the Random-Worlds Approach

The previous sections have shown that the random-worlds approach has many desirable properties. This section presents the flip side and shows that the random-worlds approach also suffers from some serious problems. I focus on two of them here: representation dependence and learning.

Suppose that the only predicate in the language is White and KB is true. Then μ∞(White(c) | KB) = 1/2. On the other hand, if ¬White is refined by adding Red and Blue to the vocabulary, and KB′ asserts that ¬White is the disjoint union of Red and Blue (i.e., KB′ is ∀x((¬White(x) ⇔ (Red(x) ∨ Blue(x))) ∧ ¬(Red(x) ∧ Blue(x)))), then it is not hard to show that μ∞(White(c) | KB′) = 1/3 (Exercise 11.17). The fact that simply expanding the language and giving a definition of an old notion (¬White) in terms of the new notions (Red and Blue) can affect the degree of belief seems to be a serious problem.
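To see where these two numbers come from, here is a minimal world-counting sketch (a rough illustration, not the book's calculation): it uses a small finite domain in place of the limit that defines μ∞, assumes the constant c denotes domain element 0 (by symmetry the choice does not matter), and builds KB′ directly into the per-element state space; the helper name degree_of_belief is introduced here for convenience.

```python
from itertools import product

def degree_of_belief(states, n, query):
    """Fraction of worlds satisfying `query`; a world assigns one of the
    allowed per-element `states` to each of the n domain elements."""
    worlds = list(product(states, repeat=n))
    return sum(query(w) for w in worlds) / len(worlds)

n = 6   # small domain; for these two knowledge bases the value is the same for every n

# Language with only White, KB = true: each element is White or not.
print(degree_of_belief(["White", "notWhite"], n,
                       query=lambda w: w[0] == "White"))    # 0.5

# Refined language: under KB', Red and Blue partition the non-white objects,
# so each element is in exactly one of three states, which are built in here
# directly rather than filtering worlds against the formula.
print(degree_of_belief(["White", "Red", "Blue"], n,
                       query=lambda w: w[0] == "White"))    # 0.333...
```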

This kind of representation dependence seems to be a necessary consequence of being able to draw conclusions that go beyond those that can be obtained by logical consequence alone. In some cases, the representation dependence may indicate something about the knowledge base. For example, suppose that only about half of all birds can fly, Tweety is a bird, and Opus is some other individual (who may or may not be a bird). One obvious way to represent this information is to have a language with predicates Bird and Flies, and to take the knowledge base KB to consist of the statements ‖Flies(x) | Bird(x)‖x ≈1 .5 and Bird(Tweety). It is easy to see that μ∞(Flies(Tweety) | KB) = .5 and μ∞(Bird(Opus) | KB) = .5. But suppose that, instead, the vocabulary has predicates Bird and FlyingBird. Let KB′ consist of the statements ‖FlyingBird(x) | Bird(x)‖x ≈2 .5, Bird(Tweety), and ∀x(FlyingBird(x) → Bird(x)). KB′ seems to be expressing the same information as KB. But μ∞(FlyingBird(Tweety) | KB′) = .5 and μ∞(Bird(Opus) | KB′) = 2/3. The degree of belief that Tweety flies is .5 in both cases, although the degree of belief that Opus is a bird changes. Arguably, the fact that the degree of belief that Opus is a bird is language dependent is a direct reflection of the fact that the knowledge base does not contain sufficient information to assign it a single "justified" value. This suggests that it would be useful to characterize those queries that are language independent, while recognizing that not all queries will be.
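The asymmetry for Opus can be checked by the same kind of counting, with two hedges: the sketch below replaces the approximate statistical assertion ≈ .5 by the exact constraint that the two kinds of birds occur in equal numbers, and it uses a fixed small domain, so the printed values for Opus only approach the limiting degrees of belief .5 and 2/3 as the domain grows and the tolerance shrinks. Tweety is taken to be element 0 and Opus element 1; the helper names are mine.

```python
from itertools import product

def consistent_worlds(states, n, kb):
    """All assignments of a per-element state to n domain elements satisfying kb."""
    return [w for w in product(states, repeat=n) if kb(w)]

def degree(worlds, query):
    return sum(query(w) for w in worlds) / len(worlds)

N = 10   # domain size; the random-worlds value is the limit as N grows

# Vocabulary {Bird, Flies}: four states per element; "half the birds fly"
# becomes "#flying birds == #non-flying birds"; Bird(Tweety) holds.
S1 = [("B", "F"), ("B", "nF"), ("nB", "F"), ("nB", "nF")]
w1 = consistent_worlds(S1, N, lambda w: w[0][0] == "B" and
     sum(e == ("B", "F") for e in w) == sum(e == ("B", "nF") for e in w))
print(degree(w1, lambda w: w[0][1] == "F"))   # 0.5: Tweety flies
print(degree(w1, lambda w: w[1][0] == "B"))   # ~0.48, tending to 0.5: Opus is a bird

# Vocabulary {Bird, FlyingBird} with FlyingBird -> Bird: three states per element.
S2 = ["FB", "B", "nB"]   # flying bird, non-flying bird, non-bird
w2 = consistent_worlds(S2, N, lambda w: w[0] != "nB" and
     sum(e == "FB" for e in w) == sum(e == "B" for e in w))
print(degree(w2, lambda w: w[0] == "FB"))     # 0.5: Tweety is a flying bird
print(degree(w2, lambda w: w[1] != "nB"))     # ~0.65, tending to 2/3: Opus is a bird
```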

In any case, in general, it seems that the best that can be done is to accept representation dependence and, indeed, declare that it is (at times) justified. The choice of an appropriate vocabulary is a significant one, which may encode some important information. In the example with colors, the choice of vocabulary can be viewed as reflecting the bias of the reasoner with respect to the partition of the world into colors. Researchers in machine learning and the philosophy of induction have long realized that bias is an inevitable component of effective inductive reasoning. So it should not be surprising if the related problem of computing degrees of belief also depends on the bias. Of course, if this is the case, then it would also be useful to have a good intuitive understanding of how the degrees of belief depend on the bias. In particular, it would be helpful to be able to give a knowledge base designer some guidelines for selecting the "appropriate" representation. Unfortunately, such guidelines do not exist (for random worlds or any other approach), to the best of my knowledge.

To understand the problem of learning, note that so far I have taken the knowledge base as given. But how does an agent come to "know" the information in the knowledge base? For some assertions, like "Tom has red hair", it seems reasonable that the knowledge comes from direct perceptions, which agents typically accept as reliable. But under what circumstances should a statement such as ‖Flies(x) | Bird(x)‖x ≈i .9 be included in a knowledge base? Although I have viewed statistical assertions as objective statements about the world, it is unrealistic to suppose that anyone could examine all the birds in the world and count how many of them fly. In practice, it seems that this statistical statement would appear in KB if someone inspects a (presumably large) sample of birds and about 90 percent of the birds in this sample fly. Then a leap is made: the sample is assumed to be typical, and the statistics in the sample are taken to be representative of the actual statistics.

Unfortunately, the random-worlds method by itself does not support this leap, at least not if sampling is represented in the most obvious way. Suppose that an agent starts with no information other than that Tweety is a bird. In that case, the agent's degree of belief that Tweety flies according to the random-worlds approach is, not surprisingly, .5. That is, μ∞(Flies(Tweety) | Bird(Tweety)) = .5 (Exercise 11.18(a)). In the absence of information, this seems quite reasonable. But the agent then starts observing birds. In fact, the agent observes N birds (think of N as large), say c1, …, cN, and the information regarding which of them fly is recorded in the knowledge base. Let Bird(Tweety) ∧ KB be the resulting knowledge base. Thus, KB has the form

Bird(c1) ∧ Flies1(c1) ∧ … ∧ Bird(cN) ∧ FliesN(cN),

where Fliesi(ci) is either Flies(ci) or ¬Flies(ci). It seems reasonable to expect that if most (say 90 percent) of the N birds observed by the agent fly, then the agent's belief that Tweety flies increases. Unfortunately, it does not: μ∞(Flies(Tweety) | Bird(Tweety) ∧ KB) = .5 (Exercise 11.18(b)).
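Counting worlds directly makes it clear why the observations have no effect: nothing in KB links Tweety's flying to the flying of c1, …, cN, so exactly half of the consistent worlds have Tweety flying. The sketch below is my own construction (the domain size and the particular observation pattern are arbitrary choices); it fixes Tweety as element 0 and the observed birds as elements 1 through 6.

```python
from itertools import product

# Per-element state: (is_bird, flies)
STATES = [(b, f) for b in (True, False) for f in (True, False)]

def mu_flies_tweety(n, observed_flies):
    """Fraction of worlds in which element 0 (Tweety) flies, given Bird(Tweety)
    and that elements 1..len(observed_flies) are birds with the recorded Flies values."""
    count = flying = 0
    for world in product(STATES, repeat=n):
        if not world[0][0]:                        # Bird(Tweety) must hold
            continue
        if any(world[i + 1] != (True, obs)         # Bird(c_i) with the observed Flies value
               for i, obs in enumerate(observed_flies)):
            continue
        count += 1
        flying += world[0][1]
    return flying / count

# Five of the six observed birds fly, yet the degree of belief stays at 1/2.
print(mu_flies_tweety(8, [True] * 5 + [False]))    # 0.5
```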

What if instead the sample is represented using a predicate S? The fact that 90 percent of sampled birds fly can then be expressed as ‖Flies(x) | Bird(x) ∧ S(x)‖x ≈1 .9. This helps, but not much. To see why, suppose that a fraction α of the domain elements were sampled. If KB′ is

Bird(Tweety) ∧ ‖Flies(x) | Bird(x) ∧ S(x)‖x ≈1 .9 ∧ ‖S(x)‖x ≈2 α,

it seems reasonable to expect that μ∞(Flies(Tweety) | KB′) = .9, but it is not. In fact, μ∞(Flies(Tweety) | KB′) = .9α + .5(1 − α) (Exercise 11.18(c)). The random-worlds approach treats the birds in S and those outside S as two unrelated populations; it maintains the default degree of belief (1/2) that a bird not in S will fly. (This follows from maximum entropy considerations, along the lines discussed in Section 11.5.) Intuitively, the random-worlds approach is not treating S as a random sample. Of course, the failure of the obvious approach does not imply that random worlds is incapable of learning statistics. Perhaps another representation can be found that will do better (although none has been found yet).
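The two-population behavior can also be reproduced by direct counting, under simplifications that are mine rather than the text's: every domain element is taken to be a bird, exactly half of the domain is sampled (α = .5), and ".9 of the sampled birds fly" is replaced by the exact constraint that 3 of the 4 sampled birds fly (.75), so that the counts are integers. The resulting degree of belief that Tweety flies is .75α + .5(1 − α) = .625 rather than .75, mirroring the .9α + .5(1 − α) result above.

```python
from itertools import product

N, IN_SAMPLE, SAMPLE_FLYERS = 8, 4, 3   # domain size, |S|, flying birds in S

count = tweety_flies = 0
# A world assigns each element a pair (in_sample, flies); Tweety is element 0
# and, like every other element, may or may not have been sampled.
for world in product([(s, f) for s in (False, True) for f in (False, True)], repeat=N):
    sampled = sum(s for s, f in world)
    flyers_in_sample = sum(f for s, f in world if s)
    if sampled == IN_SAMPLE and flyers_in_sample == SAMPLE_FLYERS:
        count += 1
        tweety_flies += world[0][1]

print(tweety_flies / count)   # 0.625 = .75 * .5 + .5 * .5, not .75
```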

To summarize, the random-worlds approach has many attractive features but some serious flaws as well. There are variants of the approach that deal well with some of the problems, but not with others. (See, e.g., Exercise 11.19.) Perhaps the best lesson that can be derived from this discussion is that it may be impossible to come up with a generic method for obtaining degrees of belief from statistical information that does the "right" thing in all possible circumstances. There is no escaping the need to understand the details of the application.



