11.2 The Random-Worlds Approach

The basic idea behind the random-worlds approach is easy to explain and understand. Fix a finite vocabulary $\mathcal{T}$ and a domain $D_N$ of size $N$; for simplicity, take $D_N = \{1, \ldots, N\}$. Since $\mathcal{T}$ is finite, there are only finitely many possible relational $\mathcal{T}$-structures with domain $D_N$. Since "relational $\mathcal{T}$-structures with domain $D_N$" is a bit of a mouthful, in the remainder of this chapter I call them simply $D_N$-$\mathcal{T}$-structures.

  • If $\mathcal{T}$ consists of the unary predicate $P$, there are $2^N$ $D_N$-$\mathcal{T}$-structures: for each subset $U$ of $D_N$, there is a $D_N$-$\mathcal{T}$-structure $\mathcal{A}_U$ such that $P^{\mathcal{A}_U} = U$.

  • If $\mathcal{T}$ consists of the unary predicate $P$ and the constant symbol $c$, then there are $2^N \cdot N$ $D_N$-$\mathcal{T}$-structures; these can be characterized by pairs $(U, i)$, where $U \subseteq D_N$ is the interpretation of $P$ and $i \in D_N$ is the interpretation of $c$.

  • If $\mathcal{T}$ consists of the binary predicate $B$, then there are $2^{N^2}$ $D_N$-$\mathcal{T}$-structures, one for each subset of $D_N \times D_N$.
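
To make the counting concrete, here is a small worked instance of the last two cases (the value $N = 3$ is chosen purely for illustration): for the vocabulary consisting of $P$ and $c$ there are $2^3 \cdot 3 = 24$ $D_3$-$\mathcal{T}$-structures, while for the vocabulary consisting of the single binary predicate $B$ there are $2^{3^2} = 512$.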

Given a $D_N$-$\mathcal{T}$-structure $\mathcal{A}$, let $\mu^{unif}_N$ be the uniform probability measure on $D_N$, which gives each element of $D_N$ probability $1/N$. Then $(\mathcal{A}, \mu^{unif}_N)$ is a statistical $\mathcal{T}$-structure and can be used to determine the truth of all sentences in $\mathcal{L}^{QU,stat}(\mathcal{T})$. Now consider a simple probability structure $(W_N, \mu)$, where the worlds in $W_N$ are all the pairs of the form $(\mathcal{A}, \mu^{unif}_N)$, and $\mu$ is the uniform probability measure on $W_N$. In this probability structure, the conditional probability of a formula $\varphi \in \mathcal{L}^{QU,stat}(\mathcal{T})$ given a knowledge base KB consisting of formulas in $\mathcal{L}^{QU,stat}(\mathcal{T})$ is just the fraction of worlds satisfying KB that also satisfy $\varphi$. This is what I will take as the degree of belief of $\varphi$ given KB (given that the domain size is $N$).
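
As a concrete illustration of this definition, the following is a minimal brute-force sketch (not from the text; the toy vocabulary $\{P, c\}$, the particular KB, and all function names are my own choices). It enumerates all $D_N$-$\mathcal{T}$-structures for a small $N$ and computes a degree of belief as the fraction of KB-worlds that satisfy $\varphi$:

```python
from itertools import combinations

N = 10
domain = range(1, N + 1)

# A D_N-structure for the illustrative vocabulary {P, c} is a pair (U, i):
# U is the extension of the unary predicate P, i is the denotation of c.
worlds = [(frozenset(U), i)
          for k in range(N + 1)
          for U in combinations(domain, k)
          for i in domain]                  # 2^N * N worlds in all

def satisfies_kb(world):                    # KB: ||P(x)||_x = .8 (exactly 80% are P's)
    U, i = world
    return len(U) / N == 0.8

def satisfies_phi(world):                   # phi: P(c)
    U, i = world
    return i in U

kb_worlds = [w for w in worlds if satisfies_kb(w)]
print(len(worlds), len(kb_worlds))          # 10240 worlds, 450 of them satisfy KB
print(sum(satisfies_phi(w) for w in kb_worlds) / len(kb_worlds))   # 0.8
```

Here the statistical information about $P$ transfers directly to the degree of belief in $P(c)$, previewing the kind of behavior the random-worlds approach is designed to deliver.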

The intuition behind this approach is not hard to explain. If all worlds are originally equally likely (which seems reasonable, in the absence of any other information), then the degree of belief that the agent ascribes to φ upon learning KB should be the conditional probability that φ is true, given that KB is true. Put another way, the degree of belief that the agent ascribes to φ is just the probability of choosing a world (relational structure) at random that satisfies φ out of all the worlds that satisfy KB. That is why this is called the random-worlds approach.

There are two details I need to fill in to make this completely formal. I started by assuming a fixed domain size of N. But where did N come from? Why is a particular choice of N the right choice? In fact, there is no obvious choice of N. Typically, however, the domain is known to be large. (There are many birds and many people.) One way of approximating the degree of belief for a true but unknown large N is to consider the limiting conditional probability as N grows to infinity. This is what I in fact do here.

The other issue that needs to be dealt with involves some problematic aspects related to the use of the language $\mathcal{L}^{QU,stat}(\mathcal{T})$. To understand the issue, consider a formula such as $\|Hep(x) \mid Jaun(x)\|_x = .9$, which says that 90 percent of people with jaundice have hepatitis. Notice, however, that it is impossible for exactly 90 percent of people with jaundice to have hepatitis unless the number of people with jaundice is a multiple of ten. The statistical assertion was almost certainly not intended to have as a consequence such a statement about the number of people with jaundice. Rather, what was intended was almost certainly something like "approximately 90 percent of people with jaundice have hepatitis." Intuitively, this says that the proportion of jaundiced patients with hepatitis is close to 90 percent: that is, within some tolerance $\tau$ of .9. To capture this, I consider a language that uses approximate equality and inequality, rather than equality and inequality. The language has an infinite family of connectives $\approx_i$, $\preceq_i$, and $\succeq_i$, for $i = 1, 2, 3, \ldots$ ("$i$-approximately equal" or "$i$-approximately less than or equal"). The statement "approximately 90 percent of jaundiced patients have hepatitis" then becomes, say, $\|Hep(x) \mid Jaun(x)\|_x \approx_1 .9$. The intuition behind the semantics of approximate equality is that each comparison should be interpreted using some small tolerance factor to account for measurement error, sample variations, and so on. The appropriate tolerance may differ for various pieces of information, so the logic allows different subscripts on the "approximately equals" connectives. A formula such as $\|Flies(x) \mid Bird(x)\|_x \approx_1 1 \wedge \|Flies(x) \mid Bat(x)\|_x \approx_2 1$ says that both $\|Flies(x) \mid Bird(x)\|_x$ and $\|Flies(x) \mid Bat(x)\|_x$ are approximately 1, but the notion of "approximately" may be different in each case. (Note that the actual choice of subscripts is irrelevant here, as long as different notions of "approximately" are denoted by different subscripts.)

The formal definition of the language is identical to that of $\mathcal{L}^{QU,stat}(\mathcal{T})$, except that instead of statistical likelihood formulas, inequality formulas of the form $\|\varphi \mid \psi\|_X \approx_i \alpha$, $\|\varphi \mid \psi\|_X \preceq_i \alpha$, or $\|\varphi \mid \psi\|_X \succeq_i \alpha$ are used, for $i = 1, 2, 3, \ldots$. (The reason for using conditional statistical likelihood terms, rather than just unconditional ones as in $\mathcal{L}^{QU,stat}$, will shortly become clear. The results in this section and the next still hold even with polynomial statistical likelihood terms, but allowing only these simple inequality formulas simplifies the exposition.) Of course, a formula such as $\|\varphi\|_X \approx_i \alpha$ is an abbreviation for $\|\varphi \mid true\|_X \approx_i \alpha$. Call the resulting language $\mathcal{L}^{\approx}(\mathcal{T})$. As usual, I suppress the $\mathcal{T}$ if it does not play a significant role in the discussion.

The semantics for $\mathcal{L}^{\approx}$ must include some way of interpreting $\approx_i$, $\preceq_i$, and $\succeq_i$. This is done by using a tolerance vector $\vec{\tau} = \langle \tau_1, \tau_2, \ldots \rangle$ of positive reals. Intuitively, $\zeta \approx_i \zeta'$ if the values of $\zeta$ and $\zeta'$ are within $\tau_i$ of each other. (For now there is no need to worry about where the tolerance vector is coming from.) A statistical-approximation $\mathcal{T}$-structure is a tuple $(\mathcal{A}, \vec{\tau})$, where $\mathcal{A}$ is a relational $\mathcal{T}$-structure and $\vec{\tau}$ is a tolerance vector. Let $\mathcal{M}^{approx}(\mathcal{T})$ consist of all statistical-approximation $\mathcal{T}$-structures.

Given a tolerance vector $\vec{\tau}$, a formula $\varphi \in \mathcal{L}^{\approx}$ can be translated to a formula $\varphi^{\vec{\tau}} \in \mathcal{L}^{QU,stat}$. The idea is that a formula such as $\|\varphi \mid \psi\|_X \preceq_i \alpha$ becomes $\|\varphi \mid \psi\|_X \leq \alpha + \tau_i$; multiplying out the denominator, this is $\|\varphi \wedge \psi\|_X \leq (\alpha + \tau_i)\|\psi\|_X$. Formally, the translation is defined inductively as follows:

$$(\|\varphi \mid \psi\|_X \preceq_i \alpha)^{\vec{\tau}} \;=\; \|\varphi \wedge \psi\|_X \leq (\alpha + \tau_i)\|\psi\|_X,$$
$$(\|\varphi \mid \psi\|_X \succeq_i \alpha)^{\vec{\tau}} \;=\; \|\varphi \wedge \psi\|_X \geq (\alpha - \tau_i)\|\psi\|_X,$$
$$(\|\varphi \mid \psi\|_X \approx_i \alpha)^{\vec{\tau}} \;=\; (\alpha - \tau_i)\|\psi\|_X \leq \|\varphi \wedge \psi\|_X \leq (\alpha + \tau_i)\|\psi\|_X;$$

the translation leaves first-order formulas unchanged and commutes with the Boolean connectives and quantifiers.

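As a small illustration of the translation (a sketch with my own naming; the proportions are passed in directly rather than computed from a structure), the following checks an approximate conditional formula via its multiplied-out form:

```python
def approx_eq_holds(phi_and_psi_prop, psi_prop, alpha, tau_i):
    """Check  ||phi | psi||_X  ~_i  alpha  via its translation
    (alpha - tau_i)*||psi||_X  <=  ||phi & psi||_X  <=  (alpha + tau_i)*||psi||_X."""
    return (alpha - tau_i) * psi_prop <= phi_and_psi_prop <= (alpha + tau_i) * psi_prop

# 2% of the domain is jaundiced and all of them have hepatitis:
# the conditional proportion is 1, so "approximately .9" should fail.
print(approx_eq_holds(phi_and_psi_prop=0.02, psi_prop=0.02, alpha=0.9, tau_i=0.05))  # False
```

Note that when $\|\psi\|_X = 0$ both bounds are 0, so the translated constraint holds trivially; the translation is well defined even when the denominator is 0.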
This translation shows why conditional statistical likelihood terms are taken as primitive in $\mathcal{L}^{\approx}$, rather than taking them to be abbreviations for the expressions that result by clearing the denominator. Suppose that the knowledge base KB says

$$\|Penguin(x)\|_x \approx_1 0 \wedge \|Flies(x) \mid Penguin(x)\|_x \approx_2 0;$$

that is, the proportion of penguins is very small but the proportion of fliers among penguins is also very small. Clearing the denominator naively results in the knowledge base

$$KB' = (\|Penguin(x)\|_x \approx_1 0) \wedge (\|Flies(x) \wedge Penguin(x)\|_x \approx_2 0 \cdot \|Penguin(x)\|_x),$$

which is equivalent to

$$KB' = (\|Penguin(x)\|_x \approx_1 0) \wedge (\|Flies(x) \wedge Penguin(x)\|_x \approx_2 0).$$

This last formula simply asserts that the proportion of penguins and the proportion of flying penguins are both small, but says nothing about the proportion of fliers among penguins. In fact, the world where all penguins fly is consistent with KB$'$. Clearly, the process of multiplying out across an approximate connective does not preserve the intended interpretation of the formulas.
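
A quick numeric check (the specific numbers are my own, purely for illustration) makes the gap explicit. Suppose the domain has 1000 individuals, exactly 2 of which are penguins, and both penguins fly. Then

$$\|Penguin(x)\|_x = \|Flies(x) \wedge Penguin(x)\|_x = .002,$$

so KB$'$ is satisfied for any reasonable tolerances, yet $\|Flies(x) \mid Penguin(x)\|_x = 1$, so the conjunct $\|Flies(x) \mid Penguin(x)\|_x \approx_2 0$ of KB fails badly.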

In any case, using the translation, it is straightforward to give semantics to formulas in $\mathcal{L}^{\approx}(\mathcal{T})$. For a formula $\varphi \in \mathcal{L}^{\approx}(\mathcal{T})$ and a statistical-approximation $\mathcal{T}$-structure $(\mathcal{A}, \vec{\tau})$ whose domain is $D_N$,

$$(\mathcal{A}, \vec{\tau}) \models \varphi \text{ iff } (\mathcal{A}, \mu^{unif}_N) \models \varphi^{\vec{\tau}}.$$
It remains to assign degrees of belief to formulas. Let $\mathcal{W}_N(\mathcal{T})$ consist of all $D_N$-$\mathcal{T}$-structures; let $worlds^{\vec{\tau}}_N(KB)$ be the set of worlds $\mathcal{A} \in \mathcal{W}_N(\mathcal{T})$ such that $(\mathcal{A}, \vec{\tau}) \models KB$; let $\#worlds^{\vec{\tau}}_N(KB)$ be the cardinality of $worlds^{\vec{\tau}}_N(KB)$. The degree of belief in $\varphi$ given KB with respect to $\mathcal{W}_N$ and $\vec{\tau}$ is

$$\mu^{\vec{\tau}}_N(\varphi \mid KB) = \frac{\#worlds^{\vec{\tau}}_N(\varphi \wedge KB)}{\#worlds^{\vec{\tau}}_N(KB)}.$$

If $\#worlds^{\vec{\tau}}_N(KB) = 0$, the degree of belief is undefined.
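
The following sketch (my own illustration; the vocabulary $\{Hep, Jaun, c\}$ and the counting shortcut are not from the text) computes $\mu^{\vec{\tau}}_N(Hep(c) \mid KB)$ for KB asserting $\|Hep(x) \mid Jaun(x)\|_x \approx_1 .9 \wedge Jaun(c)$, grouping worlds by the number $j$ of jaundiced individuals and the number $h$ of those that also have hepatitis:

```python
from math import comb

def degree_of_belief(N, tau, alpha=0.9):
    """mu^tau_N(Hep(c) | KB) for KB = ||Hep(x) | Jaun(x)||_x ~_1 alpha AND Jaun(c),
    counting D_N-structures over the toy vocabulary {Hep, Jaun, c} by
    j = |Jaun| and h = |Hep & Jaun|."""
    kb = 0            # number of worlds satisfying KB
    kb_and_phi = 0    # number of worlds satisfying KB and Hep(c)
    for j in range(1, N + 1):          # Jaun(c) forces at least one jaundiced individual
        for h in range(j + 1):
            if abs(h / j - alpha) <= tau:     # the approximate statistical constraint
                # choose Jaun, choose Hep inside Jaun, choose Hep freely outside Jaun
                ways = comb(N, j) * comb(j, h) * 2 ** (N - j)
                kb += ways * j                # j choices of c with Jaun(c)
                kb_and_phi += ways * h        # h of those also satisfy Hep(c)
    return kb_and_phi / kb if kb else None

for N in (20, 50, 100, 200):
    print(N, degree_of_belief(N, tau=0.05))
```

For growing $N$ the computed values settle down to a value within the tolerance of .9, which is the kind of limiting behavior the definitions below are designed to capture.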

Strictly speaking, I should write $\#worlds^{\vec{\tau}}_{N,\mathcal{T}}(KB)$ rather than $\#worlds^{\vec{\tau}}_N(KB)$, since the number also depends on the choice of the vocabulary $\mathcal{T}$. The degree of belief, however, does not depend on the vocabulary. It is not hard to show that if both $\mathcal{T}$ and $\mathcal{T}'$ contain all the symbols that appear in $\varphi$ and KB, then

$$\frac{\#worlds^{\vec{\tau}}_{N,\mathcal{T}}(\varphi \wedge KB)}{\#worlds^{\vec{\tau}}_{N,\mathcal{T}}(KB)} = \frac{\#worlds^{\vec{\tau}}_{N,\mathcal{T}'}(\varphi \wedge KB)}{\#worlds^{\vec{\tau}}_{N,\mathcal{T}'}(KB)}$$

(Exercise 11.1).

Typically, neither $N$ nor $\vec{\tau}$ is known exactly. However, $N$ is thought of as "large" and $\vec{\tau}$ is thought of as "small." As I suggested earlier, one way of approximating the value of an expression where $N$ is "large" is by considering the limit as $N$ goes to infinity; similarly, I approximate the value of the expression for "small" $\vec{\tau}$ by taking the limit as $\vec{\tau}$ goes to $\vec{0}$. That is, I take the degree of belief in $\varphi$ given KB to be $\lim_{\vec{\tau} \to \vec{0}} \lim_{N \to \infty} \mu^{\vec{\tau}}_N(\varphi \mid KB)$. Notice that the limit is taken first over $N$ for each fixed $\vec{\tau}$ and then over $\vec{\tau}$. This order is important. If the limit over $N$ appeared last, then nothing would be gained by using approximate equality, since the result would be equivalent to treating approximate equality as exact equality (Exercise 11.2). Note also that the limit of the expression as $\vec{\tau} \to \vec{0}$ may depend on how $\vec{\tau}$ approaches $\vec{0}$. For example, if $\vec{\tau} = \langle \tau_1, \tau_2 \rangle$, then $\tau_1/\tau_2$ can take on any value from 0 to $\infty$ depending on how $\langle \tau_1, \tau_2 \rangle \to \langle 0, 0 \rangle$. It is not hard to show that, unless the limit is the same no matter how $\vec{\tau}$ approaches $\vec{0}$, there will be some way of having $\vec{\tau}$ approach $\vec{0}$ for which the limit does not exist at all (Exercise 11.3).
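
To see the path dependence concretely (this particular pair of paths is my own illustration), compare two ways of sending $\langle \tau_1, \tau_2 \rangle$ to $\langle 0, 0 \rangle$:

$$\text{along } \tau_2 = \tau_1: \; \frac{\tau_1}{\tau_2} \to 1, \qquad \text{along } \tau_2 = \tau_1^2: \; \frac{\tau_1}{\tau_2} = \frac{1}{\tau_1} \to \infty,$$

so an expression whose value behaves like $\tau_1/\tau_2$ has no single limit as $\vec{\tau} \to \vec{0}$.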

In any case, this limit may not exist, for a number of reasons. An obvious one is that $\mu^{\vec{\tau}}_N(\varphi \mid KB)$ is undefined if $\#worlds^{\vec{\tau}}_N(KB) = 0$. It actually is not important if $\#worlds^{\vec{\tau}}_N(KB) = 0$ for finitely many values of $N$; in the limit, this is irrelevant. However, what if KB includes a conjunct such as $FIN_{100}$, which is true only if $N \leq 100$? In that case, $\#worlds^{\vec{\tau}}_N(KB) = 0$ for all $N > 100$, and the limit will certainly not exist. Of course, if the agent is fortunate enough to know the domain size, then this approach (without taking limits) can be applied to domains of that size. However, in this chapter I am interested in the case that there are no known upper bounds on the domain size for any given tolerance. More precisely, I consider only knowledge bases KB that are eventually consistent, in that there exists $\vec{\tau}^*$ such that for all $\vec{\tau}$ with $\vec{0} < \vec{\tau} < \vec{\tau}^*$ (where $\vec{\tau} < \vec{\tau}^*$ means that $\tau_i < \tau^*_i$ for all $i$) there exists $N_{\vec{\tau}}$ such that $\#worlds^{\vec{\tau}}_N(KB) > 0$ for all $N \geq N_{\vec{\tau}}$.

Even if KB is eventually consistent, the limit may not exist. For example, it may be the case that for some $i$, $\mu^{\vec{\tau}}_N(\varphi \mid KB)$ oscillates between $\alpha + \tau_i$ and $\alpha - \tau_i$ as $N$ gets large. In this case, for any particular $\vec{\tau}$, the limit as $N$ grows does not exist. However, it seems as if the limit as $\vec{\tau}$ grows small "should", in this case, be $\alpha$, since the oscillations about $\alpha$ go to 0. Such problems can be avoided by considering the lim sup and lim inf, rather than the limit. The lim inf of a sequence is the limit of the infimums; that is,

$$\liminf_{N \to \infty} a_N = \lim_{N \to \infty} \inf \{a_{N'} : N' \geq N\}.$$

The lim sup is defined analogously, using sup instead of inf. Thus, for example, the lim inf of the sequence 0, 1, 0, 1, 0, 1, … is 0; the lim sup is 1. The limit clearly does not exist. The lim inf exists for any sequence bounded from below, even if the limit does not; similarly, the lim sup exists for any sequence bounded from above (Exercise 11.4).

The lim inf and lim sup of a sequence are equal iff the limit of the sequence exists and is equal to each of them; that is, $\liminf_{N \to \infty} a_N = \limsup_{N \to \infty} a_N = a$ iff $\lim_{N \to \infty} a_N = a$. Thus, using lim inf and lim sup to define the degree of belief leads to a definition that generalizes the one given earlier in terms of limits. Moreover, since, for any $\vec{\tau}$, the sequence $\mu^{\vec{\tau}}_1(\varphi \mid KB), \mu^{\vec{\tau}}_2(\varphi \mid KB), \ldots$ is always bounded from above and below (by 1 and 0, respectively), the lim sup and lim inf always exist.

Definition 11.2.1

If

$$\lim_{\vec{\tau} \to \vec{0}} \liminf_{N \to \infty} \mu^{\vec{\tau}}_N(\varphi \mid KB) \quad \text{and} \quad \lim_{\vec{\tau} \to \vec{0}} \limsup_{N \to \infty} \mu^{\vec{\tau}}_N(\varphi \mid KB)$$

both exist and are equal, then the degree of belief in $\varphi$ given KB, written $\mu_\infty(\varphi \mid KB)$, is defined as the common limit; otherwise $\mu_\infty(\varphi \mid KB)$ does not exist.
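
Returning to the oscillating example above (the particular sequence is my own, chosen to match that discussion), suppose that for each fixed $\vec{\tau}$,

$$\mu^{\vec{\tau}}_N(\varphi \mid KB) = \alpha + (-1)^N \tau_i .$$

Then $\liminf_{N \to \infty} \mu^{\vec{\tau}}_N(\varphi \mid KB) = \alpha - \tau_i$ and $\limsup_{N \to \infty} \mu^{\vec{\tau}}_N(\varphi \mid KB) = \alpha + \tau_i$; both tend to $\alpha$ as $\vec{\tau} \to \vec{0}$, so Definition 11.2.1 gives $\mu_\infty(\varphi \mid KB) = \alpha$, just as intuition suggests.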

Even using this definition, there are many cases where the degree of belief does not exist. This is not necessarily bad. It simply says that the information provided in the knowledge base does not allow the agent to come up with a well-defined degree of belief. There are certainly cases where it is better to recognize that the information is inconclusive rather than trying to create a number. (See Example 11.3.9 for a concrete illustration.)

Definitions cannot be said to be right or wrong; we can, however, try to see whether they are interesting or useful, and to what extent they capture our intuitions. In the next four sections, I prove a number of properties of the random-worlds approach to obtaining a degree of belief given a knowledge base consisting of statistical and first-order information, as captured by Definition 11.2.1. The next three sections illustrate some attractive features of the approach; Section 11.6 considers some arguably unattractive features.
