11.3 Properties of Random Worlds



Any reasonable method of ascribing degrees of belief given a knowledge base should certainly assign the same degrees of belief to a formula φ given two equivalent knowledge bases. Not surprisingly, random worlds satisfies this property.

Proposition 11.3.1

If KB and KB′ are equivalent, then μ(φ | KB) = μ(φ | KB′) for all formulas φ. (Here μ(φ | KB) = μ(φ | KB′) means that either both degrees of belief exist and have the same value, or neither exists. A similar convention is used in other results.)

Proof By assumption, precisely the same set of worlds satisfy KB and KB′. Therefore, for all N and all tolerance vectors τ, the fractions #worlds_N^τ(φ ∧ KB)/#worlds_N^τ(KB) and #worlds_N^τ(φ ∧ KB′)/#worlds_N^τ(KB′) are equal. Therefore, the limits are also equal (or neither exists).

What about more interesting examples; in particular, what about the examples considered in Section 11.1? First, consider perhaps the simplest case, where there is a single reference class that is precisely the "right" one. For example, if KB says that 90 percent of people with jaundice have hepatitis and that Eric has jaundice, that is, if

    KB is Jaun(Eric) ∧ ∥Hep(x) | Jaun(x)∥x ≈1 .9,

then one would certainly hope that μ(Hep(Eric) | KB) = .9. (Note that the degree of belief assertion uses equality while the statistical assertion uses approximate equality.) More generally, suppose that the formula ψ(c) represents all the information in the knowledge base about the constant c. In this case, every individual x satisfying ψ(x) agrees with c on all properties for which there is information about c in the knowledge base. If there is statistical information in the knowledge base about the fraction of individuals satisfying ψ that also satisfy φ, then clearly ψ is the most appropriate reference class to use for assigning a degree of belief in φ(c).

The next result says that the random-worlds approach satisfies this desideratum. It essentially says that if KB has the form

    ψ(c) ∧ ∥φ(x) | ψ(x)∥x ≈i α ∧ KB′

and ψ(c) is all the information in KB about c, then μ(φ(c) | KB) = α. Here, KB′ is simply intended to denote the rest of the information in the knowledge base, whatever it may be. But what does it mean that "ψ(c) is all the information in KB about c"? For the purposes of this result, it means that (a) c does not appear in either φ(x) or ψ(x) and (b) c does not appear in KB′. To understand why c cannot appear in φ(x), suppose that φ(x) is Q(x) ∨ x = c, ψ(x) is true, and KB is the formula ∥φ(x) | true∥x ≈1 .5. If the desired result held without the requirement that c not appear in φ(x), it would lead to the erroneous conclusion that μ(φ(c) | KB) = .5. But since φ(c) is Q(c) ∨ c = c, and thus is valid, it follows that μ(φ(c) | KB) = 1. To see why the constant c cannot appear in ψ(x), suppose that ψ(x) is (P(x) ∧ x ≠ c) ∨ (¬P(x) ∧ x = c), φ(x) is P(x), and the KB is ψ(c) ∧ ∥P(x) | ψ(x)∥x ≈2 .5. Again, if the result held without the requirement that c not appear in ψ(x), it would lead to the erroneous conclusion that μ(P(c) | KB) = .5. But ψ(c) is equivalent to ¬P(c), so KB implies ¬P(c) and μ(P(c) | KB) = 0.
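These counterexamples can be checked directly by counting worlds. The following sketch (Python; the domain size n = 8 and the tolerance 1/16 are illustrative choices, not from the text) enumerates all worlds for the first example, where φ(x) is Q(x) ∨ x = c: the statistical constraint ∥φ(x) | true∥x ≈1 .5 is satisfiable, yet φ(c) holds in every world satisfying it, so the degree of belief is 1 rather than .5.

```python
from itertools import product
from fractions import Fraction

n, tau = 8, Fraction(1, 16)          # assumed domain size and tolerance
sat = sat_phi_c = 0
for q in product([0, 1], repeat=n):  # denotation of the predicate Q
    for c in range(n):               # denotation of the constant c
        # fraction of the domain satisfying phi(x) = Q(x) or x = c
        frac = Fraction(sum(1 for i in range(n) if q[i] or i == c), n)
        if abs(frac - Fraction(1, 2)) > tau:
            continue                 # world violates the statistical assertion
        sat += 1
        sat_phi_c += int(q[c] or c == c)  # phi(c) = Q(c) or c = c: valid
assert sat > 0 and sat_phi_c == sat  # mu(phi(c) | KB) = 1, not .5
```

The same style of enumeration also confirms the second counterexample: with ψ(x) as above, ψ(c) forces ¬P(c) in every world, so the degree of belief in P(c) is 0.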

Theorem 11.3.2

Suppose that KB is a knowledge base of the form

    ψ(c) ∧ ∥φ(x) | ψ(x)∥x ≈i α ∧ KB′,

KB is eventually consistent, and c does not appear in KB′, φ(x), or ψ(x). Then μ(φ(c) | KB) = α.

Proof Since KB is eventually consistent, there exists some τ* such that, for all tolerance vectors τ with 0 < τ ≤ τ*, there exists N_τ such that #worlds_N^τ(KB) > 0 for all N ≥ N_τ. Fix such a τ and some N ≥ N_τ. The proof strategy is to partition worlds_N^τ(KB) into disjoint clusters and prove that, within each cluster, the fraction of worlds satisfying φ(c) is between α − τi and α + τi. From this it follows that the fraction of worlds in worlds_N^τ(KB) satisfying φ(c)—that is, the degree of belief in φ(c)—must also be between α − τi and α + τi. The result then follows by letting τ go to 0.

Here are the details. Given τ and N, partition worlds_N^τ(KB) so that two worlds are in the same cluster if and only if they agree on the denotation of all symbols in the vocabulary other than c. Let W be one such cluster. Since ψ does not mention c, the set of individuals d ∈ D_N such that ψ(d) holds is the same at all the relational structures in W. That is, given a world w ∈ W, let D_{w,ψ} = {d ∈ D_N : w ⊨ ψ(d)}. Then D_{w,ψ} = D_{w′,ψ} for all w, w′ ∈ W, since the denotation of all the symbols in the vocabulary other than c is the same in w and w′, and c does not appear in ψ (Exercise 10.3). I write D_{W,ψ} to emphasize the fact that the set of domain elements satisfying ψ is the same at all the relational structures in W. Similarly, let D_{W,φ∧ψ} be the set of domain elements satisfying φ ∧ ψ in W.

Since the worlds in W all satisfy KB (for the fixed choice of τ), they must satisfy ∥φ(x) | ψ(x)∥x ≈i α. Thus, (α − τi)|D_{W,ψ}| ≤ |D_{W,φ∧ψ}| ≤ (α + τi)|D_{W,ψ}|. Since the worlds in W all satisfy ψ(c), it must be the case that c^w ∈ D_{W,ψ} for all w ∈ W. Moreover, since c is not mentioned in KB except for the statement ψ(c), the denotation of c does not affect the truth of ∥φ(x) | ψ(x)∥x ≈i α ∧ KB′. Thus, for each d ∈ D_{W,ψ} there must be exactly one world w_d ∈ W such that c^{w_d} = d. That is, there is a one-to-one correspondence between the worlds in W and D_{W,ψ}. Similarly, there is a one-to-one correspondence between the worlds in W satisfying φ(c) and D_{W,φ∧ψ}. Therefore, the fraction of worlds in W satisfying φ(c) is in [α − τi, α + τi].

The fraction of worlds in worlds_N^τ(KB) satisfying φ(c) (which is #worlds_N^τ(φ(c) ∧ KB)/#worlds_N^τ(KB), by definition) is a weighted average of the fractions within the individual clusters. More precisely, if f_W is the fraction of worlds in W satisfying φ(c), then

    #worlds_N^τ(φ(c) ∧ KB)/#worlds_N^τ(KB) = Σ_W (|W|/#worlds_N^τ(KB)) · f_W,

where the sum is taken over all clusters W (Exercise 11.5). Since f_W ∈ [α − τi, α + τi] for all clusters W, it immediately follows that #worlds_N^τ(φ(c) ∧ KB)/#worlds_N^τ(KB) ∈ [α − τi, α + τi].

This is true for all N ≥ N_τ. It follows that

    lim inf_{N→∞} #worlds_N^τ(φ(c) ∧ KB)/#worlds_N^τ(KB) and lim sup_{N→∞} #worlds_N^τ(φ(c) ∧ KB)/#worlds_N^τ(KB)

are both also in the range [α − τi, α + τi]. Since this holds for all τ ≤ τ*, it follows that

    lim_{τ→0} lim_{N→∞} #worlds_N^τ(φ(c) ∧ KB)/#worlds_N^τ(KB) = α.

Thus, μ(φ(c) | KB) = α.
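For small domain sizes, the theorem can be sanity-checked by brute-force counting. The sketch below (Python; predicate names come from the running hepatitis example, while the domain size and tolerance are illustrative assumptions) computes the fraction of KB-worlds satisfying Hep(Eric) at a fixed N; as the cluster argument shows, this fraction already lies within the tolerance of α at each finite N.

```python
from itertools import product
from fractions import Fraction

def degree_of_belief(n, alpha, tau):
    """Fraction of worlds over domain {0,...,n-1} satisfying
    KB = Jaun(Eric) and ||Hep(x) | Jaun(x)||_x within tau of alpha
    in which Hep(Eric) also holds."""
    sat = sat_hep = 0
    for jaun in product([0, 1], repeat=n):
        j = sum(jaun)
        if j == 0:
            continue  # Jaun(Eric) already fails in such worlds
        for hep in product([0, 1], repeat=n):
            h = sum(1 for i in range(n) if jaun[i] and hep[i])
            if abs(Fraction(h, j) - alpha) > tau:
                continue  # the statistical assertion fails
            for eric in range(n):  # denotation of the constant Eric
                if jaun[eric]:
                    sat += 1
                    sat_hep += hep[eric]
    return Fraction(sat_hep, sat)

mu = degree_of_belief(6, Fraction(9, 10), Fraction(1, 10))
assert Fraction(8, 10) <= mu <= 1  # within [alpha - tau, alpha + tau]
```

Each pair of denotations for Jaun and Hep is one "cluster" of the proof; within it, Eric ranges over exactly the jaundiced individuals, of whom a fraction h/j ∈ [α − τ, α + τ] have hepatitis.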

Theorem 11.3.2 can be generalized in several ways; see Exercise 11.6. However, even this version suffices for a number of interesting conclusions.

Example 11.3.3

Suppose that the doctor sees a patient Eric with jaundice and his medical textbook says that 90 percent of people with jaundice have hepatitis, 80 percent of people with hepatitis have a fever, and fewer than 5 percent of people have hepatitis. Let

    KBhep be Jaun(Eric) ∧ ∥Hep(x) | Jaun(x)∥x ≈1 .9, and
    KB′hep be ∥Fever(x) | Hep(x)∥x ≈2 .8 ∧ ∥Hep(x) | true∥x ≾3 .05.

Then μ(Hep(Eric) | KBhep ∧ KB′hep) = .9, as desired; all the information in KB′hep is ignored. Other kinds of information would also be ignored. For example, if the doctor had information about other patients and other statistical information, this could be added to KB′hep without affecting the conclusion, as long as it did not mention Eric.

Preference for the more specific reference class also follows from Theorem 11.3.2.

Corollary 11.3.4

Suppose that KB is a knowledge base of the form

    ψ1(c) ∧ ψ2(c) ∧ ∥φ(x) | ψ1(x) ∧ ψ2(x)∥x ≈i α1 ∧ ∥φ(x) | ψ1(x)∥x ≈j α2 ∧ KB′,

KB is eventually consistent, and c does not appear in KB′, ψ1(x), ψ2(x), or φ(x). Then μ(φ(c) | KB) = α1.

Proof Set KB″ = ∥φ(x) | ψ1(x)∥x ≈j α2 ∧ KB′. Observe that KB = ψ1(c) ∧ ψ2(c) ∧ ∥φ(x) | ψ1(x) ∧ ψ2(x)∥x ≈i α1 ∧ KB″ and that c does not appear in KB″, so the result follows immediately from Theorem 11.3.2 (taking ψ = ψ1 ∧ ψ2).

As an immediate consequence of Corollary 11.3.4, if the doctor knows all the facts in the knowledge base KBhep ∧ KB′hep of Example 11.3.3 and, in addition, knows that Eric is a baby and that only 10 percent of babies with jaundice have hepatitis, then the doctor would ascribe degree of belief .1 to Eric's having hepatitis.

Preference for the more specific reference class sometimes comes in another guise, where it is more obvious that the more specific reference class is the smaller one.

Corollary 11.3.5

Suppose that KB is a knowledge base of the form

    ψ1(c) ∧ ψ2(c) ∧ ∀x(ψ1(x) → ψ2(x)) ∧ ∥φ(x) | ψ1(x)∥x ≈i α1 ∧ ∥φ(x) | ψ2(x)∥x ≈j α2 ∧ KB′,

KB is eventually consistent, and c does not appear in KB′, ψ1(x), ψ2(x), or φ(x). Then μ(φ(c) | KB) = α1.

Proof Let KB1 be identical to KB except without the conjunct ψ2(c). KB is equivalent to KB1, since (ψ1(c) ∧ ∀x(ψ1(x) → ψ2(x))) ⇒ ψ2(c) is valid. Thus, by Proposition 11.3.1, μ(φ(c) | KB) = μ(φ(c) | KB1). The fact that μ(φ(c) | KB1) = α1 is an immediate consequence of Theorem 11.3.2 (taking ψ = ψ1); since ∀x(ψ1(x) → ψ2(x)) ∧ ∥φ(x) | ψ2(x)∥x ≈j α2 does not mention c, it can be incorporated into KB′.

Note that in Corollary 11.3.5 there are two potential reference classes for c: the individuals that satisfy ψ1(x) and the individuals that satisfy ψ2(x). Since KB implies ∀x(ψ1(x) → ψ2(x)), clearly ψ1(x) is the more specific reference class (at least in worlds satisfying KB). Corollary 11.3.5 says that the statistical information about the reference class ψ1 is what determines the degree of belief of φ; the statistical information regarding ψ2 is irrelevant.

Example 11.1.1 shows that a preference for the more specific reference class can sometimes be problematic. Why does the random-worlds approach not encounter this problem? The following example suggests one answer:

Example 11.3.6

Let ψ(x) =def Jaun(x) ∧ (¬Hep(x) ∨ x = Eric). Let KB″hep be KBhep ∧ ∥Hep(x) | ψ(x)∥x ≈4 0. Clearly ψ(x) is more specific than Jaun(x); that is, ⊨ ∀x(ψ(x) → Jaun(x)). Corollary 11.3.5 seems to suggest that the doctor's degree of belief that Eric has hepatitis should be 0. However, this is not the case; Corollary 11.3.5 does not apply because ψ(x) mentions Eric. This observation suggests that what makes the reference class used in Example 11.1.1 fishy is that it mentions Eric. A reference class that explicitly mentions Eric should not be used to derive a degree of belief regarding Eric, even if very good statistics are available for that reference class. (In fact, it can be shown that μ(Hep(Eric) | KB″hep) = μ(Hep(Eric) | KBhep) = .9, since in fact μ(∥Hep(x) | ψ(x)∥x ≈4 0 | KBhep) = 1: the new information in KB″hep holds in almost all worlds that satisfy KBhep, so it does not really add anything. However, a proof of this fact is beyond the scope of this book.)

In Theorem 11.3.2, the knowledge base is assumed to have statistics for precisely the right reference class to match the knowledge about the individual(s) in question. Unfortunately, in many cases, the available statistical information is not detailed enough for Theorem 11.3.2 to apply. Consider the knowledge base KBhep from the hepatitis example, and suppose that the doctor also knows that Eric has red hair; that is, his knowledge is characterized by KBhep ∧ Red(Eric). Since the knowledge base does not include statistics for the frequency of hepatitis among red-haired individuals, Theorem 11.3.2 does not apply. It seems reasonable here to ignore Red(Eric). But why is it reasonable to ignore Red(Eric) and not Jaun(Eric)? To solve this problem in complete generality would require a detailed theory of irrelevance, perhaps using the ideas of conditional independence from Chapter 4. Such a theory is not yet available. Nevertheless, the next theorem shows that, if irrelevance is taken to mean "uses only symbols not mentioned in the relevant statistical likelihood formula," the random-worlds approach gives the desired result. Roughly speaking, the theorem says that if the KB includes the information ∥φ(x) | ψ(x)∥x ≈i α ∧ ψ(c), and perhaps a great deal of other information (including possibly information about c), then the degree of belief in φ(c) is still α, provided that the other information about c does not involve symbols that appear in φ, and whatever other statistics are available about φ in the knowledge base are "subsumed" by the information ∥φ(x) | ψ(x)∥x ≈i α. "Subsumed" here means that for any other statistical term of the form ∥φ(x) | ψ′(x)∥x, either ∀x(ψ(x) → ψ′(x)) or ∀x(ψ(x) → ¬ψ′(x)) follows from the knowledge base.

Theorem 11.3.7


Let KB be a knowledge base of the form

    ∥φ(x) | ψ(x)∥x ≈i α ∧ ψ(c) ∧ KB′.

Suppose that

  1. KB is eventually consistent,

  2. c does not appear in φ(x), and

  3. none of the symbols in the vocabulary that appear in φ(x) appear in ψ(x) or KB′, except possibly in statistical expressions of the form ∥φ(x) | ψ′(x)∥x; moreover, for any such expression, either ⊨ KB ⇒ ∀x(ψ(x) → ψ′(x)) or ⊨ KB ⇒ ∀x(ψ(x) → ¬ψ′(x)).

Then μ(φ(c) | KB) = α.


Proof Just as in the proof of Theorem 11.3.2, the key idea involves partitioning the set of worlds satisfying KB appropriately. The details are left to Exercise 11.7.

Note how Theorem 11.3.7 differs from Theorem 11.3.2. In Theorem 11.3.2, c cannot appear in ψ(x) or KB′. In Theorem 11.3.7, c is allowed to appear in ψ(x) and KB′, but no symbol in the vocabulary that appears in φ(x) may appear in ψ(x) or KB′. Thus, if φ(x) is P(x), then ψ(x) cannot be (P(x) ∧ x ≠ c) ∨ (¬P(x) ∧ x = c), because P cannot appear in ψ(x).

From Theorem 11.3.7, it follows immediately that μ(Hep(Eric) | KBhep ∧ Red(Eric)) = .9. The degree of belief would continue to be .9 even if other information about Eric were added to KBhep, such as that Eric has a fever or that Eric is a baby, as long as the information did not involve the predicate Hep.
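The irrelevance of Red(Eric) can likewise be checked by brute-force counting on a tiny domain (an illustrative sketch; the domain size and tolerance are my assumptions). Adding a predicate Red to the vocabulary and conditioning on Red(Eric) leaves the fraction of worlds satisfying Hep(Eric) unchanged, because the denotation of Red is unconstrained by the conjuncts involving Hep.

```python
from itertools import product
from fractions import Fraction

def mu_hep(n, tau, require_red=False):
    """Fraction of worlds satisfying KB in which Hep(Eric) holds, where
    KB = Jaun(Eric) [and Red(Eric), if require_red]
         and ||Hep(x) | Jaun(x)||_x within tau of 9/10."""
    alpha = Fraction(9, 10)
    sat = sat_hep = 0
    for jaun in product([0, 1], repeat=n):
        j = sum(jaun)
        if j == 0:
            continue  # Jaun(Eric) already fails in such worlds
        for hep in product([0, 1], repeat=n):
            h = sum(1 for i in range(n) if jaun[i] and hep[i])
            if abs(Fraction(h, j) - alpha) > tau:
                continue  # the statistical assertion fails
            for red in product([0, 1], repeat=n):  # denotation of Red
                for eric in range(n):
                    if not jaun[eric]:
                        continue
                    if require_red and not red[eric]:
                        continue
                    sat += 1
                    sat_hep += hep[eric]
    return Fraction(sat_hep, sat)

base = mu_hep(5, Fraction(1, 10))
with_red = mu_hep(5, Fraction(1, 10), require_red=True)
assert base == with_red              # Red(Eric) is ignored
assert Fraction(8, 10) <= base <= 1  # and the value stays near .9
```

The equality is exact: within each cluster, requiring red[eric] merely scales the number of Red-denotations by the same factor whether or not hep[eric] holds.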

I now consider a different issue: competing reference classes. In all the examples I have considered so far, there is an obviously "best" reference class. In practice, this will rarely be the case. It seems difficult to completely characterize the behavior of the random-worlds approach on arbitrary knowledge bases (although the connection between random worlds and maximum entropy described in Section 11.5 certainly gives some insight). Interestingly, if there are competing reference classes that are essentially disjoint, Dempster's Rule of Combination can be used to compute the degree of belief.

For simplicity, assume that the knowledge base consists of exactly two pieces of statistical information, both about a unary predicate P—∥P(x) | ψ1(x)∥x ≈i α1 and ∥P(x) | ψ2(x)∥x ≈j α2—and, in addition, the knowledge base says that there is exactly one individual satisfying both ψ1(x) and ψ2(x); that is, the knowledge base includes the formula ∃!x(ψ1(x) ∧ ψ2(x)). (See Exercise 11.8 for the precise definition of ∃!xφ(x).) The two statistical likelihood formulas can be viewed as providing evidence in favor of P to degree α1 and α2, respectively. Consider two probability measures μ1 and μ2 on a two-point space {0, 1} such that μ1(1) = α1 and μ2(1) = α2. (Think of μ1(1) as describing the degree of belief that P(c) is true according to the evidence provided by the statistical formula ∥P(x) | ψ1(x)∥x and μ2(1) as describing the degree of belief that P(c) is true according to ∥P(x) | ψ2(x)∥x.) According to Dempster's Rule of Combination, (μ1 ⊕ μ2)(1) = α1α2/(α1α2 + (1 − α1)(1 − α2)). As shown in Section 3.4, Dempster's Rule of Combination is appropriate for combining evidence probabilistically. The next theorem shows that this is also how the random-worlds approach combines evidence in this case.
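On this two-point space, Dempster's Rule of Combination reduces to one line of arithmetic; the helper below (a minimal sketch, not code from the text) makes the computation concrete.

```python
def dempster(a1, a2):
    """Combine two independent degrees of support for P on the
    two-point space {0, 1} via Dempster's Rule of Combination."""
    denom = a1 * a2 + (1 - a1) * (1 - a2)
    if denom == 0:
        raise ValueError("totally conflicting extreme evidence")
    return a1 * a2 / denom

# neutral evidence (.5) leaves the other value unchanged
assert abs(dempster(0.9, 0.5) - 0.9) < 1e-9
# two agreeing bodies of evidence reinforce each other: .8 and .8 give ~.94
assert abs(dempster(0.8, 0.8) - 16 / 17) < 1e-9
```

Combining .9 with the neutral value .5 returns .9, while combining .8 with .8 returns 16/17 ≈ .94, the values discussed in Example 11.3.9 below.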

Theorem 11.3.8

Suppose that KB is a knowledge base of the form

    ψ1(c) ∧ ψ2(c) ∧ ∃!x(ψ1(x) ∧ ψ2(x)) ∧ ∥P(x) | ψ1(x)∥x ≈i α1 ∧ ∥P(x) | ψ2(x)∥x ≈j α2,

KB is eventually consistent, P is a unary predicate, neither P nor c appears in ψ1(x) or ψ2(x), and either α1 < 1 and α2 < 1 or α1 > 0 and α2 > 0. Then μ(P(c) | KB) = α1α2/(α1α2 + (1 − α1)(1 − α2)).

Proof Again, the idea is to appropriately partition the set of worlds satisfying KB. See Exercise 11.9.

This result can be generalized to allow more than two pieces of statistical information; Dempster's Rule of Combination still applies (Exercise 11.10). It is also not necessary to assume that there is a unique individual satisfying both ψ1 and ψ2. It suffices that the set of individuals satisfying ψ1 ∧ ψ2 be "small" relative to the set satisfying ψ1 and the set satisfying ψ2, although the technical details are beyond the scope of this book.

The following example illustrates Theorem 11.3.8:

Example 11.3.9


Assume that the knowledge base consists of the information that Nixon is both a Quaker and a Republican, and there is statistical information for the proportion of pacifists within both classes. More formally, assume that KBNixon is

    Quak(Nixon) ∧ Repub(Nixon) ∧ ∃!x(Quak(x) ∧ Repub(x)) ∧ ∥Pac(x) | Quak(x)∥x ≈1 α ∧ ∥Pac(x) | Repub(x)∥x ≈2 β.

What is the degree of belief that Nixon is a pacifist, given KBNixon? Clearly that depends on α and β. Let φ be Pac(Nixon). By Theorem 11.3.8, if {α, β} ≠ {0, 1}, then μ(φ | KBNixon) always exists and its value is αβ/(αβ + (1 − α)(1 − β)). If, for example, β = .5, so that the information for Republicans is neutral, then μ(φ | KBNixon) = α: the data for Quakers is used to determine the degree of belief. If the evidence given by the two reference classes is conflicting—α > .5 > β—then μ(φ | KBNixon) ∈ [β, α]: some intermediate value is chosen. If, on the other hand, the two reference classes provide evidence in the same direction, then the degree of belief is greater than both α and β. For example, if α = β = .8, then the degree of belief is about .94. This has a reasonable explanation: if there are two independent bodies of evidence both supporting φ, then their combination should provide even more support for φ.

Now assume that α = 1 and β > 0. In that case, it follows from Theorem 11.3.8 that μ(φ | KBNixon) = 1. Intuitively, an extreme value dominates. But what happens if the extreme values conflict? For example, suppose that α = 1 and β = 0. This says that almost all Quakers are pacifists and almost no Republicans are. In that case, Theorem 11.3.8 does not apply. In fact, it can be shown that the degree of belief does not exist. This is because the value of the limit depends on the way in which the tolerances tend to 0. More precisely, if τ1 ≪ τ2 (where ≪ means "much smaller than"), so that the "almost all" in the statistical interpretation of the first conjunct is much closer to "all" than the "almost none" in the second conjunct is to "none," then the limit is 1. Symmetrically, if τ2 ≪ τ1, then the limit is 0. On the other hand, if τ1 = τ2, then the limit is 1/2. (In particular, this means that if the subscript 1 were used for the ≈ in both statistical assertions, then the degree of belief would be 1/2.)
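The three regimes can be illustrated numerically under a heuristic reading (my assumption, not a claim from the text): interpret the first conjunct as supporting pacifism to degree 1 − τ1, the second to degree τ2, and combine the two as in Theorem 11.3.8. Exact rational arithmetic then reproduces the claimed values of 1/2, 1, and 0.

```python
from fractions import Fraction

def combined(tau1, tau2):
    """Heuristic combined belief: read 'almost all Quakers are pacifists'
    as 1 - tau1 and 'almost no Republicans are' as tau2, then combine
    the two pieces of evidence as in Theorem 11.3.8."""
    a, b = 1 - tau1, tau2
    return a * b / (a * b + (1 - a) * (1 - b))

t = Fraction(1, 1000)
assert combined(t, t) == Fraction(1, 2)        # tau1 = tau2: value 1/2
assert combined(t * t, t) > Fraction(99, 100)  # tau1 << tau2: tends to 1
assert combined(t, t * t) < Fraction(1, 100)   # tau2 << tau1: tends to 0
```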

There are good reasons for the limit not to exist in this case. The knowledge base simply does not say what the relationship between τ1 and τ2 is. (It would certainly be possible, of course, to consider a richer language that allows such relationships to be expressed.)





Reasoning About Uncertainty
ISBN: 0262582597
Year: 2005