11.3 Properties of Random Worlds



Any reasonable method of ascribing degrees of belief given a knowledge base should certainly assign the same degrees of belief to a formula φ given two equivalent knowledge bases. Not surprisingly, random worlds satisfies this property.

Proposition 11.3.1

If KB and KB′ are equivalent, then μ(φ | KB) = μ(φ | KB′) for all formulas φ. (Here μ(φ | KB) = μ(φ | KB′) means that either both degrees of belief exist and have the same value, or neither exists. A similar convention is used in other results.)

Proof By assumption, precisely the same set of worlds satisfy KB and KB′. Therefore, for all N and all tolerance vectors τ, the fractions #worlds_N^τ(φ ∧ KB)/#worlds_N^τ(KB) and #worlds_N^τ(φ ∧ KB′)/#worlds_N^τ(KB′) are equal. Therefore, the limits are also equal (or neither exists).

What about more interesting examples; in particular, what about the examples considered in Section 11.1? First, consider perhaps the simplest case, where there is a single reference class that is precisely the "right" one. For example, if KB says that 90 percent of people with jaundice have hepatitis and that Eric has jaundice, that is, if

    KB is Jaun(Eric) ∧ ∥Hep(x) | Jaun(x)∥x ≈1 .9,

then one would certainly hope that μ(Hep(Eric) | KB) = .9. (Note that the degree of belief assertion uses equality while the statistical assertion uses approximate equality.) More generally, suppose that the formula ψ(c) represents all the information in the knowledge base about the constant c. In this case, every individual x satisfying ψ(x) agrees with c on all properties for which there is information about c in the knowledge base. If there is statistical information in the knowledge base about the fraction of individuals satisfying ψ that also satisfy φ, then clearly ψ is the most appropriate reference class to use for assigning a degree of belief in φ(c).

The next result says that the random-worlds approach satisfies this desideratum. It essentially says that if KB has the form

    ψ(c) ∧ ∥φ(x) | ψ(x)∥x ≈i α ∧ KB′

and ψ(c) is all the information in KB about c, then μ(φ(c) | KB) = α. Here, KB′ is simply intended to denote the rest of the information in the knowledge base, whatever it may be. But what does it mean that "ψ(c) is all the information in KB about c"? For the purposes of this result, it means that (a) c does not appear in either φ(x) or ψ(x) and (b) c does not appear in KB′. To understand why c cannot appear in φ(x), suppose that φ(x) is Q(x) ∨ x = c, ψ(x) is true, and KB is the formula ∥φ(x) | true∥x ≈1 .5. If the desired result held without the requirement that c not appear in φ(x), it would lead to the erroneous conclusion that μ(φ(c) | KB) = .5. But since φ(c) is Q(c) ∨ c = c, and thus is valid, it follows that μ(φ(c) | KB) = 1. To see why the constant c cannot appear in ψ(x), suppose that ψ(x) is (P(x) ∧ x ≠ c) ∨ (¬P(x) ∧ x = c), φ(x) is P(x), and the KB is ψ(c) ∧ ∥P(x) | ψ(x)∥x ≈2 .5. Again, if the result held without the requirement that c not appear in ψ(x), it would lead to the erroneous conclusion that μ(P(c) | KB) = .5. But ψ(c) is equivalent to ¬P(c), so KB implies ¬P(c) and μ(P(c) | KB) = 0.
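These counterexamples can be checked directly by counting worlds. The following sketch (Python; the domain size n = 8 and the tolerance 1/16 are illustrative choices, not from the text) enumerates all worlds for the first example, where φ(x) is Q(x) ∨ x = c: the statistical constraint ∥φ(x) | true∥x ≈1 .5 is satisfiable, yet φ(c) holds in every world satisfying it, so the degree of belief is 1 rather than .5.

```python
from itertools import product
from fractions import Fraction

n, tau = 8, Fraction(1, 16)          # assumed domain size and tolerance
sat = sat_phi_c = 0
for q in product([0, 1], repeat=n):  # denotation of the predicate Q
    for c in range(n):               # denotation of the constant c
        # fraction of the domain satisfying phi(x) = Q(x) or x = c
        frac = Fraction(sum(1 for i in range(n) if q[i] or i == c), n)
        if abs(frac - Fraction(1, 2)) > tau:
            continue                 # world violates the statistical assertion
        sat += 1
        sat_phi_c += int(q[c] or c == c)  # phi(c) = Q(c) or c = c: valid
assert sat > 0 and sat_phi_c == sat  # mu(phi(c) | KB) = 1, not .5
```

The same style of enumeration also confirms the second counterexample: with ψ(x) as above, ψ(c) forces ¬P(c) in every world, so the degree of belief in P(c) is 0.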

Theorem 11.3.2

Suppose that KB is a knowledge base of the form

    ψ(c) ∧ ∥φ(x) | ψ(x)∥x ≈i α ∧ KB′,

KB is eventually consistent, and c does not appear in KB′, φ(x), or ψ(x). Then μ(φ(c) | KB) = α.

Proof Since KB is eventually consistent, there exists some τ* such that, for all tolerance vectors τ with 0 < τ ≤ τ*, there exists N_τ such that #worlds_N^τ(KB) > 0 for all N ≥ N_τ. Fix such a τ and some N ≥ N_τ. The proof strategy is to partition worlds_N^τ(KB) into disjoint clusters and prove that, within each cluster, the fraction of worlds satisfying φ(c) is between α − τi and α + τi. From this it follows that the fraction of worlds in worlds_N^τ(KB) satisfying φ(c)—that is, the degree of belief in φ(c)—must also be between α − τi and α + τi. The result then follows by letting τ go to 0.

Here are the details. Given τ and N, partition worlds_N^τ(KB) so that two worlds are in the same cluster if and only if they agree on the denotation of all symbols in the vocabulary other than c. Let W be one such cluster. Since ψ does not mention c, the set of individuals d ∈ D_N such that ψ(d) holds is the same at all the relational structures in W. That is, given a world w ∈ W, let D_{w,ψ} = {d ∈ D_N : w ⊨ ψ(d)}. Then D_{w,ψ} = D_{w′,ψ} for all w, w′ ∈ W, since the denotation of all the symbols in the vocabulary other than c is the same in w and w′, and c does not appear in ψ (Exercise 10.3). I write D_{W,ψ} to emphasize the fact that the set of domain elements satisfying ψ is the same at all the relational structures in W. Similarly, let D_{W,φ∧ψ} be the set of domain elements satisfying φ ∧ ψ in W.

Since the worlds in W all satisfy KB (for the fixed choice of τ), they must satisfy ∥φ(x) | ψ(x)∥x ≈i α. Thus, (α − τi)|D_{W,ψ}| ≤ |D_{W,φ∧ψ}| ≤ (α + τi)|D_{W,ψ}|. Since the worlds in W all satisfy ψ(c), it must be the case that c^w ∈ D_{W,ψ} for all w ∈ W. Moreover, since c is not mentioned in KB except for the statement ψ(c), the denotation of c does not affect the truth of ∥φ(x) | ψ(x)∥x ≈i α ∧ KB′. Thus, for each d ∈ D_{W,ψ} there must be exactly one world w_d ∈ W such that c^{w_d} = d. That is, there is a one-to-one correspondence between the worlds in W and D_{W,ψ}. Similarly, there is a one-to-one correspondence between the worlds in W satisfying φ(c) and D_{W,φ∧ψ}. Therefore, the fraction of worlds in W satisfying φ(c) is in [α − τi, α + τi].

The fraction of worlds in worlds_N^τ(KB) satisfying φ(c) (which is #worlds_N^τ(φ(c) ∧ KB)/#worlds_N^τ(KB), by definition) is a weighted average of the fractions within the individual clusters. More precisely, if f_W is the fraction of worlds in W satisfying φ(c), then

    #worlds_N^τ(φ(c) ∧ KB)/#worlds_N^τ(KB) = Σ_W (|W|/#worlds_N^τ(KB)) · f_W,

where the sum is taken over all clusters W (Exercise 11.5). Since f_W ∈ [α − τi, α + τi] for all clusters W, it immediately follows that #worlds_N^τ(φ(c) ∧ KB)/#worlds_N^τ(KB) ∈ [α − τi, α + τi].

This is true for all N ≥ N_τ. It follows that

    lim inf_{N→∞} #worlds_N^τ(φ(c) ∧ KB)/#worlds_N^τ(KB) and lim sup_{N→∞} #worlds_N^τ(φ(c) ∧ KB)/#worlds_N^τ(KB)

are both also in the range [α − τi, α + τi]. Since this holds for all τ ≤ τ*, it follows that

    lim_{τ→0} lim_{N→∞} #worlds_N^τ(φ(c) ∧ KB)/#worlds_N^τ(KB) = α.

Thus, μ(φ(c) | KB) = α.
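For small domain sizes, the theorem can be sanity-checked by brute-force counting. The sketch below (Python; predicate names come from the running hepatitis example, while the domain size and tolerance are illustrative assumptions) computes the fraction of KB-worlds satisfying Hep(Eric) at a fixed N; as the cluster argument shows, this fraction already lies within the tolerance of α at each finite N.

```python
from itertools import product
from fractions import Fraction

def degree_of_belief(n, alpha, tau):
    """Fraction of worlds over domain {0,...,n-1} satisfying
    KB = Jaun(Eric) and ||Hep(x) | Jaun(x)||_x within tau of alpha
    in which Hep(Eric) also holds."""
    sat = sat_hep = 0
    for jaun in product([0, 1], repeat=n):
        j = sum(jaun)
        if j == 0:
            continue  # Jaun(Eric) already fails in such worlds
        for hep in product([0, 1], repeat=n):
            h = sum(1 for i in range(n) if jaun[i] and hep[i])
            if abs(Fraction(h, j) - alpha) > tau:
                continue  # the statistical assertion fails
            for eric in range(n):  # denotation of the constant Eric
                if jaun[eric]:
                    sat += 1
                    sat_hep += hep[eric]
    return Fraction(sat_hep, sat)

mu = degree_of_belief(6, Fraction(9, 10), Fraction(1, 10))
assert Fraction(8, 10) <= mu <= 1  # within [alpha - tau, alpha + tau]
```

Each pair of denotations for Jaun and Hep is one "cluster" of the proof; within it, Eric ranges over exactly the jaundiced individuals, of whom a fraction h/j ∈ [α − τ, α + τ] have hepatitis.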

Theorem 11.3.2 can be generalized in several ways; see Exercise 11.6. However, even this version suffices for a number of interesting conclusions.

Example 11.3.3

Suppose that the doctor sees a patient Eric with jaundice and his medical textbook says that 90 percent of people with jaundice have hepatitis, 80 percent of people with hepatitis have a fever, and fewer than 5 percent of people have hepatitis. Let

    KBhep be Jaun(Eric) ∧ ∥Hep(x) | Jaun(x)∥x ≈1 .9, and
    KB′hep be ∥Fever(x) | Hep(x)∥x ≈2 .8 ∧ ∥Hep(x) | true∥x ≾3 .05.

Then μ(Hep(Eric) | KBhep ∧ KB′hep) = .9, as desired; all the information in KB′hep is ignored. Other kinds of information would also be ignored. For example, if the doctor had information about other patients and other statistical information, this could be added to KB′hep without affecting the conclusion, as long as it did not mention Eric.

Preference for the more specific reference class also follows from Theorem 11.3.2.

Corollary 11.3.4

Suppose that KB is a knowledge base of the form

    ψ1(c) ∧ ψ2(c) ∧ ∥φ(x) | ψ1(x) ∧ ψ2(x)∥x ≈i α1 ∧ ∥φ(x) | ψ1(x)∥x ≈j α2 ∧ KB′,

KB is eventually consistent, and c does not appear in KB′, ψ1(x), ψ2(x), or φ(x). Then μ(φ(c) | KB) = α1.

Proof Set KB″ = ∥φ(x) | ψ1(x)∥x ≈j α2 ∧ KB′. Observe that KB = ψ1(c) ∧ ψ2(c) ∧ ∥φ(x) | ψ1(x) ∧ ψ2(x)∥x ≈i α1 ∧ KB″ and that c does not appear in KB″, so the result follows immediately from Theorem 11.3.2 (taking ψ = ψ1 ∧ ψ2).

As an immediate consequence of Corollary 11.3.4, if the doctor knows all the facts in the knowledge base KBhep ∧ KB′hep of Example 11.3.3 and, in addition, knows that Eric is a baby and that only 10 percent of babies with jaundice have hepatitis, then the doctor would ascribe degree of belief .1 to Eric's having hepatitis.

Preference for the more specific reference class sometimes comes in another guise, where it is more obvious that the more specific reference class is the smaller one.

Corollary 11.3.5

Suppose that KB is a knowledge base of the form

    ψ1(c) ∧ ψ2(c) ∧ ∀x(ψ1(x) → ψ2(x)) ∧ ∥φ(x) | ψ1(x)∥x ≈i α1 ∧ ∥φ(x) | ψ2(x)∥x ≈j α2 ∧ KB′,

KB is eventually consistent, and c does not appear in KB′, ψ1(x), ψ2(x), or φ(x). Then μ(φ(c) | KB) = α1.

Proof Let KB1 be identical to KB except without the conjunct ψ2(c). KB is equivalent to KB1, since (ψ1(c) ∧ ∀x(ψ1(x) → ψ2(x))) ⇒ ψ2(c) is valid. Thus, by Proposition 11.3.1, μ(φ(c) | KB) = μ(φ(c) | KB1). The fact that μ(φ(c) | KB1) = α1 is an immediate consequence of Theorem 11.3.2 (taking ψ = ψ1); since ∀x(ψ1(x) → ψ2(x)) ∧ ∥φ(x) | ψ2(x)∥x ≈j α2 does not mention c, it can be incorporated into KB′.

Note that in Corollary 11.3.5 there are two potential reference classes for c: the individuals that satisfy ψ1(x) and the individuals that satisfy ψ2(x). Since KB implies ∀x(ψ1(x) → ψ2(x)), clearly ψ1(x) is the more specific reference class (at least in worlds satisfying KB). Corollary 11.3.5 says that the statistical information about the reference class ψ1 is what determines the degree of belief of φ; the statistical information regarding ψ2 is irrelevant.

Example 11.1.1 shows that a preference for the more specific reference class can sometimes be problematic. Why does the random-worlds approach not encounter this problem? The following example suggests one answer:

Example 11.3.6

Let ψ(x) =def Jaun(x) ∧ (¬Hep(x) ∨ x = Eric). Let KB″hep be KBhep ∧ ∥Hep(x) | ψ(x)∥x ≈4 0. Clearly ψ(x) is more specific than Jaun(x); that is, ⊨ ∀x(ψ(x) → Jaun(x)). Corollary 11.3.5 seems to suggest that the doctor's degree of belief that Eric has hepatitis should be 0. However, this is not the case; Corollary 11.3.5 does not apply because ψ(x) mentions Eric. This observation suggests that what makes the reference class used in Example 11.1.1 fishy is that it mentions Eric. A reference class that explicitly mentions Eric should not be used to derive a degree of belief regarding Eric, even if very good statistics are available for that reference class. (In fact, it can be shown that μ(Hep(Eric) | KB″hep) = μ(Hep(Eric) | KBhep) = .9, since in fact μ(∥Hep(x) | ψ(x)∥x ≈4 0 | KBhep) = 1: the new information in KB″hep holds in almost all worlds that satisfy KBhep, so it does not really add anything. However, a proof of this fact is beyond the scope of this book.)

In Theorem 11.3.2, the knowledge base is assumed to have statistics for precisely the right reference class to match the knowledge about the individual(s) in question. Unfortunately, in many cases, the available statistical information is not detailed enough for Theorem 11.3.2 to apply. Consider the knowledge base KBhep from the hepatitis example, and suppose that the doctor also knows that Eric has red hair; that is, his knowledge is characterized by KBhep ∧ Red(Eric). Since the knowledge base does not include statistics for the frequency of hepatitis among red-haired individuals, Theorem 11.3.2 does not apply. It seems reasonable here to ignore Red(Eric). But why is it reasonable to ignore Red(Eric) and not Jaun(Eric)? To solve this problem in complete generality would require a detailed theory of irrelevance, perhaps using the ideas of conditional independence from Chapter 4. Such a theory is not yet available. Nevertheless, the next theorem shows that, if irrelevance is taken to mean "uses only symbols not mentioned in the relevant statistical likelihood formula," the random-worlds approach gives the desired result. Roughly speaking, the theorem says that if the KB includes the information ∥φ(x) | ψ(x)∥x ≈i α ∧ ψ(c), and perhaps a great deal of other information (including possibly information about c), then the degree of belief in φ(c) is still α, provided that the other information about c does not involve symbols that appear in φ, and whatever other statistics are available about φ in the knowledge base are "subsumed" by the information ∥φ(x) | ψ(x)∥x ≈i α. "Subsumed" here means that for any other statistical term of the form ∥φ(x) | ψ′(x)∥x, either ∀x(ψ(x) → ψ′(x)) or ∀x(ψ(x) → ¬ψ′(x)) follows from the knowledge base.

Theorem 11.3.7


Let KB be a knowledge base of the form

    ∥φ(x) | ψ(x)∥x ≈i α ∧ ψ(c) ∧ KB′.

Suppose that

  1. KB is eventually consistent,

  2. c does not appear in φ(x), and

  3. none of the symbols in the vocabulary that appear in φ(x) appear in ψ(x) or KB′, except possibly in statistical expressions of the form ∥φ(x) | ψ′(x)∥x; moreover, for any such expression, either ⊨ KB ⇒ ∀x(ψ(x) → ψ′(x)) or ⊨ KB ⇒ ∀x(ψ(x) → ¬ψ′(x)).

Then μ(φ(c) | KB) = α.


Proof Just as in the proof of Theorem 11.3.2, the key idea involves partitioning the set of worlds satisfying KB appropriately. The details are left to Exercise 11.7.

Note how Theorem 11.3.7 differs from Theorem 11.3.2. In Theorem 11.3.2, c cannot appear in ψ(x) or KB′. In Theorem 11.3.7, c is allowed to appear in ψ(x) and KB′, but no symbol in the vocabulary that appears in φ(x) may appear in ψ(x) or KB′. Thus, if φ(x) is P(x), then ψ(x) cannot be (P(x) ∧ x ≠ c) ∨ (¬P(x) ∧ x = c), because P cannot appear in ψ(x).

From Theorem 11.3.7, it follows immediately that μ(Hep(Eric) | KBhep ∧ Red(Eric)) = .9. The degree of belief would continue to be .9 even if other information about Eric were added to KBhep, such as that Eric has a fever or that Eric is a baby, as long as the information did not involve the predicate Hep.
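The irrelevance of Red(Eric) can likewise be checked by brute-force counting on a tiny domain (an illustrative sketch; the domain size and tolerance are my assumptions). Adding a predicate Red to the vocabulary and conditioning on Red(Eric) leaves the fraction of worlds satisfying Hep(Eric) unchanged, because the denotation of Red is unconstrained by the conjuncts involving Hep.

```python
from itertools import product
from fractions import Fraction

def mu_hep(n, tau, require_red=False):
    """Fraction of worlds satisfying KB in which Hep(Eric) holds, where
    KB = Jaun(Eric) [and Red(Eric), if require_red]
         and ||Hep(x) | Jaun(x)||_x within tau of 9/10."""
    alpha = Fraction(9, 10)
    sat = sat_hep = 0
    for jaun in product([0, 1], repeat=n):
        j = sum(jaun)
        if j == 0:
            continue  # Jaun(Eric) already fails in such worlds
        for hep in product([0, 1], repeat=n):
            h = sum(1 for i in range(n) if jaun[i] and hep[i])
            if abs(Fraction(h, j) - alpha) > tau:
                continue  # the statistical assertion fails
            for red in product([0, 1], repeat=n):  # denotation of Red
                for eric in range(n):
                    if not jaun[eric]:
                        continue
                    if require_red and not red[eric]:
                        continue
                    sat += 1
                    sat_hep += hep[eric]
    return Fraction(sat_hep, sat)

base = mu_hep(5, Fraction(1, 10))
with_red = mu_hep(5, Fraction(1, 10), require_red=True)
assert base == with_red              # Red(Eric) is ignored
assert Fraction(8, 10) <= base <= 1  # and the value stays near .9
```

The equality is exact: within each cluster, requiring red[eric] merely scales the number of Red-denotations by the same factor whether or not hep[eric] holds.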

I now consider a different issue: competing reference classes. In all the examples I have considered so far, there is an obviously "best" reference class. In practice, this will rarely be the case. It seems difficult to completely characterize the behavior of the random-worlds approach on arbitrary knowledge bases (although the connection between random worlds and maximum entropy described in Section 11.5 certainly gives some insight). Interestingly, if there are competing reference classes that are essentially disjoint, Dempster's Rule of Combination can be used to compute the degree of belief.

For simplicity, assume that the knowledge base consists of exactly two pieces of statistical information, both about a unary predicate P—∥P(x) | ψ1(x)∥x ≈i α1 and ∥P(x) | ψ2(x)∥x ≈j α2—and, in addition, the knowledge base says that there is exactly one individual satisfying both ψ1(x) and ψ2(x); that is, the knowledge base includes the formula ∃!x(ψ1(x) ∧ ψ2(x)). (See Exercise 11.8 for the precise definition of ∃!xφ(x).) The two statistical likelihood formulas can be viewed as providing evidence in favor of P to degree α1 and α2, respectively. Consider two probability measures μ1 and μ2 on a two-point space {0, 1} such that μ1(1) = α1 and μ2(1) = α2. (Think of μ1(1) as describing the degree of belief that P(c) is true according to the evidence provided by the statistical formula ∥P(x) | ψ1(x)∥x and μ2(1) as describing the degree of belief that P(c) is true according to ∥P(x) | ψ2(x)∥x.) According to Dempster's Rule of Combination, (μ1 ⊕ μ2)(1) = α1α2/(α1α2 + (1 − α1)(1 − α2)). As shown in Section 3.4, Dempster's Rule of Combination is appropriate for combining evidence probabilistically. The next theorem shows that this is also how the random-worlds approach combines evidence in this case.
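On this two-point space, Dempster's Rule of Combination reduces to one line of arithmetic; the helper below (a minimal sketch, not code from the text) makes the computation concrete.

```python
def dempster(a1, a2):
    """Combine two independent degrees of support for P on the
    two-point space {0, 1} via Dempster's Rule of Combination."""
    denom = a1 * a2 + (1 - a1) * (1 - a2)
    if denom == 0:
        raise ValueError("totally conflicting extreme evidence")
    return a1 * a2 / denom

# neutral evidence (.5) leaves the other value unchanged
assert abs(dempster(0.9, 0.5) - 0.9) < 1e-9
# two agreeing bodies of evidence reinforce each other: .8 and .8 give ~.94
assert abs(dempster(0.8, 0.8) - 16 / 17) < 1e-9
```

Combining .9 with the neutral value .5 returns .9, while combining .8 with .8 returns 16/17 ≈ .94, the values discussed in Example 11.3.9 below.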

Theorem 11.3.8

Suppose that KB is a knowledge base of the form

    ψ1(c) ∧ ψ2(c) ∧ ∃!x(ψ1(x) ∧ ψ2(x)) ∧ ∥P(x) | ψ1(x)∥x ≈i α1 ∧ ∥P(x) | ψ2(x)∥x ≈j α2,

KB is eventually consistent, P is a unary predicate, neither P nor c appears in ψ1(x) or ψ2(x), and either α1 < 1 and α2 < 1 or α1 > 0 and α2 > 0. Then μ(P(c) | KB) = α1α2/(α1α2 + (1 − α1)(1 − α2)).

Proof Again, the idea is to appropriately partition the set of worlds satisfying KB. See Exercise 11.9.

This result can be generalized to allow more than two pieces of statistical information; Dempster's Rule of Combination still applies (Exercise 11.10). It is also not necessary to assume that there is a unique individual satisfying both ψ1 and ψ2. It suffices that the set of individuals satisfying ψ1 ∧ ψ2 be "small" relative to the set satisfying ψ1 and the set satisfying ψ2, although the technical details are beyond the scope of this book.

The following example illustrates Theorem 11.3.8:

Example 11.3.9


Assume that the knowledge base consists of the information that Nixon is both a Quaker and a Republican, and there is statistical information for the proportion of pacifists within both classes. More formally, assume that KBNixon is

    Quak(Nixon) ∧ Repub(Nixon) ∧ ∃!x(Quak(x) ∧ Repub(x)) ∧ ∥Pac(x) | Quak(x)∥x ≈1 α ∧ ∥Pac(x) | Repub(x)∥x ≈2 β.

What is the degree of belief that Nixon is a pacifist, given KBNixon? Clearly that depends on α and β. Let φ be Pac(Nixon). By Theorem 11.3.8, if {α, β} ≠ {0, 1}, then μ(φ | KBNixon) always exists and its value is αβ/(αβ + (1 − α)(1 − β)). If, for example, β = .5, so that the information for Republicans is neutral, then μ(φ | KBNixon) = α: the data for Quakers is used to determine the degree of belief. If the evidence given by the two reference classes is conflicting—α > .5 > β—then μ(φ | KBNixon) ∈ [β, α]: some intermediate value is chosen. If, on the other hand, the two reference classes provide evidence in the same direction, then the degree of belief is greater than both α and β. For example, if α = β = .8, then the degree of belief is about .94. This has a reasonable explanation: if there are two independent bodies of evidence both supporting φ, then their combination should provide even more support for φ.

Now assume that α = 1 and β > 0. In that case, it follows from Theorem 11.3.8 that μ(φ | KBNixon) = 1. Intuitively, an extreme value dominates. But what happens if the extreme values conflict? For example, suppose that α = 1 and β = 0. This says that almost all Quakers are pacifists and almost no Republicans are. In that case, Theorem 11.3.8 does not apply. In fact, it can be shown that the degree of belief does not exist. This is because the value of the limit depends on the way in which the tolerances tend to 0. More precisely, if τ1 ≪ τ2 (where ≪ means "much smaller than"), so that the "almost all" in the statistical interpretation of the first conjunct is much closer to "all" than the "almost none" in the second conjunct is to "none," then the limit is 1. Symmetrically, if τ2 ≪ τ1, then the limit is 0. On the other hand, if τ1 = τ2, then the limit is 1/2. (In particular, this means that if the subscript 1 were used for the ≈ in both statistical assertions, then the degree of belief would be 1/2.)
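The three regimes can be illustrated numerically under a heuristic reading (my assumption, not a claim from the text): interpret the first conjunct as supporting pacifism to degree 1 − τ1, the second to degree τ2, and combine the two as in Theorem 11.3.8. Exact rational arithmetic then reproduces the claimed values of 1/2, 1, and 0.

```python
from fractions import Fraction

def combined(tau1, tau2):
    """Heuristic combined belief: read 'almost all Quakers are pacifists'
    as 1 - tau1 and 'almost no Republicans are' as tau2, then combine
    the two pieces of evidence as in Theorem 11.3.8."""
    a, b = 1 - tau1, tau2
    return a * b / (a * b + (1 - a) * (1 - b))

t = Fraction(1, 1000)
assert combined(t, t) == Fraction(1, 2)        # tau1 = tau2: value 1/2
assert combined(t * t, t) > Fraction(99, 100)  # tau1 << tau2: tends to 1
assert combined(t, t * t) < Fraction(1, 100)   # tau2 << tau1: tends to 0
```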

There are good reasons for the limit not to exist in this case. The knowledge base simply does not say what the relationship between τ1 and τ2 is. (It would certainly be possible, of course, to consider a richer language that allows such relationships to be expressed.)





Reasoning About Uncertainty
ISBN: 0262582597
Year: 2005