Notes


All standard texts on probability discuss conditioning and Bayes' Rule in detail. The betting justification for conditional probability goes back to Teller [1973] (who credits David Lewis with the idea); the version of the argument given in Section 3.2.1 is based on one given by Paris [1994] (which in turn is based on work by Kemeny [1955] and Shimony [1955]). In particular, Paris [1994] provides a proof of Theorem 3.2.5.

Another defense of conditioning, given by van Fraassen [1984], is based on what he calls the Reflection Principle. If μ denotes the agent's current probability and μ_t denotes his probability at time t, the Reflection Principle says that if, upon reflection, an agent realizes that his degree of belief at time t that U is true will be α, then his current degree of belief should also be α. That is, μ(U | μ_t(U) = α) should be α. Van Fraassen then shows that if a rational agent's beliefs obey the Reflection Principle, then he must update his beliefs by conditioning. (The Reflection Principle is sometimes called Miller's Principle, since it was first mentioned by Miller [1966].) Gaifman [1986] and Samet [1997; 1998b] present some more recent work connecting conditioning and reflection.
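
To see the principle in action in the simplest case, here is a small Python sketch showing that an agent who will update by conditioning on whichever cell of a partition he observes automatically satisfies Reflection. The space, the partition, and the numbers are invented for illustration:

```python
from fractions import Fraction as F

# A small check that a conditioning agent satisfies the Reflection Principle:
# if the agent will update by conditioning on the observed cell of a
# partition, then mu(U | mu_t(U) = alpha) = alpha.

prior = {1: F(1, 4), 2: F(1, 4), 3: F(1, 4), 4: F(1, 4)}
evidence_cells = [{1, 2}, {3, 4}]   # the agent will learn which cell occurred
U = {1, 3, 4}

def cond_prob(mu, V, given):
    num = sum(p for w, p in mu.items() if w in V and w in given)
    den = sum(p for w, p in mu.items() if w in given)
    return num / den

# The agent's time-t probability of U, as a function of the actual world:
post = {w: cond_prob(prior, U, cell) for cell in evidence_cells for w in cell}

for alpha in set(post.values()):
    event = {w for w, a in post.items() if a == alpha}   # "mu_t(U) = alpha"
    assert cond_prob(prior, U, event) == alpha
    print(f"mu(U | mu_t(U) = {alpha}) = {alpha}")
```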

The Reflection Principle is closely related to another issue discussed in the text: the difference between an agent's current beliefs regarding V if U were to occur and how the agent would change his beliefs regarding V if U actually did occur. This issue has been discussed at length in the literature, going back to Ramsey's [1931b] seminal paper. Walley [1991, Section 6.1] and Howson and Urbach [1989, Chapter 6] both have a careful discussion of the issue and give further pointers to the literature.

Van Fraassen [1987] provides yet another defense of conditioning. He shows that any updating process that satisfies two simple properties (essentially, that updating by U results in U having probability 1, and that the update procedure is representation independent in a certain sense) must be conditioning.

Bacchus, Kyburg, and Thalos [1990] present a relatively recent collection of arguments against various defenses of probabilistic conditioning.

The problem of dealing with conditioning on sets of probability 0 is an old one. Walley [1991, Section 6.10] gives a careful discussion of the issue as well as further references. As pointed out in the text, conditional probability measures are one attempt to deal with the issue. (It is worth stressing that, even if conditioning on sets of probability 0 is not a concern, there are still compelling philosophical reasons to take conditional probability as primitive; beliefs are always relative to—that is, conditional on—the evidence at hand. With this intuition, it may not always be appropriate to assume that ℱ′, the set of events that can be conditioned on, is a subset of ℱ. An agent may not be willing to assign a probability to the event of getting certain evidence and still be willing to assign a probability to other events, conditional on having that evidence.) In any case, Popper [1968] was the first to formally consider conditional probability as the basic notion. De Finetti [1936] also did some early work, apparently independently, taking conditional probabilities as primitive. Indeed, as Rényi [1964] points out, the idea of taking conditional probabilities as primitive seems to go back as far as Keynes [1921]. CP1–3 are essentially due to Rényi [1955]. Conditional probability measures are sometimes called Popper functions. Van Fraassen [1976] calls an acceptable conditional probability measure a Popper measure. The relationship between nonstandard probability measures and conditional probability measures is considered in [Halpern 2001b; McGee 1994].
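
For the reader's convenience, the Rényi-style axioms for a conditional probability measure run roughly as follows (this is a paraphrase; the text's exact statement of CP1–3 may differ in detail):

```latex
% Axioms for a conditional probability measure \mu on (W, \mathcal{F}, \mathcal{F}'),
% with \mathcal{F}' the set of events that can be conditioned on:
\begin{align*}
\text{CP1.}\quad & \mu(U \mid U) = 1 \text{ for all } U \in \mathcal{F}' \\
\text{CP2.}\quad & \mu(V_1 \cup V_2 \mid U) = \mu(V_1 \mid U) + \mu(V_2 \mid U)
                   \text{ if } V_1 \cap V_2 = \emptyset \\
\text{CP3.}\quad & \mu(V_1 \cap V_2 \mid U) = \mu(V_1 \mid V_2 \cap U)\,\mu(V_2 \mid U)
                   \text{ if } V_2 \cap U \in \mathcal{F}'
\end{align*}
```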

[Grove and Halpern 1998] provides a characterization of the approach to updating sets of probabilities considered here (i.e., conditioning each probability measure μ in the set individually, as long as the new information is compatible with μ) in the spirit of van Fraassen's [1987] characterization. Other approaches to updating sets of probabilities are certainly also possible, even among approaches that throw out some probability measures and condition the rest. Gilboa and Schmeidler [1993] focus on one such rule. Roughly speaking, they take 𝒫||U = {μ|U : μ ∈ 𝒫, μ(U) = sup_{μ′ ∈ 𝒫} μ′(U)}. They show that if 𝒫 is a closed, convex set of probability measures, this update rule acts like DS conditioning (hence my choice of notation). (A set of probability measures is closed if it contains its limits. That is, a set 𝒫 of probability measures on W is closed if for all sequences μ_1, μ_2, … of probability measures in 𝒫, if μ_n → μ in the sense that μ_n(U) → μ(U) for all measurable U ⊆ W, then μ ∈ 𝒫. Thus, for example, if W = {0, 1} and μ_n assigns probability 1/n to 0 and (n − 1)/n to 1, then μ_n → μ, where μ(0) = 0 and μ(1) = 1.)
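
For concreteness, here is a minimal Python sketch of the Gilboa–Schmeidler rule for a finite set of measures on a finite space: keep only the measures that give U maximal probability, then condition each. The function names are mine, not from any library:

```python
def condition(mu, U):
    """Bayesian conditioning of mu (dict: world -> prob) on the event U (a set)."""
    muU = sum(p for w, p in mu.items() if w in U)
    if muU == 0:
        raise ValueError("cannot condition on a set of measure 0")
    return {w: (p / muU if w in U else 0.0) for w, p in mu.items()}

def gs_update(measures, U):
    """Discard measures not giving U maximal probability; condition the rest."""
    sup_U = max(sum(p for w, p in mu.items() if w in U) for mu in measures)
    return [condition(mu, U)
            for mu in measures
            if sum(p for w, p in mu.items() if w in U) == sup_U]

# Example: two measures on W = {'a', 'b', 'c'}; update on U = {'a', 'b'}.
P = [{'a': 0.2, 'b': 0.2, 'c': 0.6},   # mu(U) = 0.4, thrown out
     {'a': 0.3, 'b': 0.5, 'c': 0.2}]   # mu(U) = 0.8, the only survivor
print(gs_update(P, {'a', 'b'}))         # [{'a': 0.375, 'b': 0.625, 'c': 0.0}]
```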

The three-prisoners puzzle is old. It is discussed, for example, in [Bar-Hillel and Falk 1982; Gardner 1961; Mosteller 1965]. The description of the story given here is taken from [Diaconis 1978], and much of the discussion is based on that given in [Fagin and Halpern 1991a], which in turn is based on that in [Diaconis 1978; Diaconis and Zabell 1986].

The fact that conditioning on sets of probability measures loses valuable information, in the sense discussed in Example 3.4.1, seems to be well known, although I am not aware of any explicit discussion of it. Peter Walley was the one who convinced me to consider it seriously. The representation of evidence using belief functions was first considered by Shafer [1976], who proved Theorems 3.4.5 and 3.4.6 (see [Shafer 1976, Theorems 9.7, 9.8]). Shafer also defined Bel_o. The representation μ_o is taken from [Halpern and Fagin 1992] and [Walley 1987], as is the general formulation of the results in terms of the space .

There has been a great deal of work in the literature on representing evidence in a purely probabilistic framework. Much of the work goes under the rubric confirmation or weight of evidence. In the literature evidence is typically represented as a number in the range [−∞, ∞], with 0 meaning "no evidence one way or another", ∞ meaning "overwhelming evidence in favor of the hypothesis", and −∞ meaning "overwhelming evidence against the hypothesis". See, for example, [Good 1960; Milne 1996] for typical papers in this (quite vast) literature. The representation U has, to the best of my knowledge, not been considered before.
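
One standard representative of this literature is Good-style weight of evidence: the log-likelihood ratio of the evidence under the hypothesis and under its negation. A minimal Python sketch (the function name is mine):

```python
import math

def weight_of_evidence(p_e_given_h, p_e_given_not_h):
    """log P(E|H)/P(E|not-H): 0 is neutral, +inf overwhelmingly favors H,
    -inf overwhelmingly favors not-H."""
    if p_e_given_h == 0:
        return -math.inf
    if p_e_given_not_h == 0:
        return math.inf
    return math.log(p_e_given_h / p_e_given_not_h)

print(weight_of_evidence(0.8, 0.2))  # ~1.386: the evidence favors H
print(weight_of_evidence(0.5, 0.5))  # 0.0: neutral evidence
```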

Theorem 3.5.1 was proved independently by numerous authors [Campos, Lamata, and Moral 1990; Fagin and Halpern 1991a; Smets and Kennes 1989; Walley 1981]. Indeed, it even appears (lost in a welter of notation) as Equation 4.8 in Dempster's original paper on belief functions [Dempster 1967].

Theorems 3.6.2 and 3.6.3 are proved in [Fagin and Halpern 1991a]; Theorem 3.6.3 was proved independently by Jaffray [1992]. Several characterizations of Bel(V ||U) are also provided in [Fagin and Halpern 1991a], including a characterization as a lower probability of a set of probability measures (although not the set 𝒫_Bel|U). Gilboa and Schmeidler [1993] provide an axiomatic defense for DS conditioning.

The approach discussed here for conditioning with possibility measures is due to Hisdal [1978]. Although this is the most commonly used approach in finite spaces, Dubois and Prade [1998, p. 206] suggest that in infinite spaces, for technical reasons, it may be more appropriate to use Poss(V ||U) rather than Poss(V |U) as the notion of conditioning. They also argue that Poss(V | U) is appropriate for a qualitative, nonnumeric representation of uncertainty, while Poss(V ||U) is more appropriate for a numeric representation. A number of other approaches to conditioning have been considered for possibility measures; see [Dubois and Prade 1998; Fonck 1994]. The definition of conditioning for ranking functions is due to Spohn [1988].
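
A minimal sketch of the two notions for a possibility distribution on a finite set of worlds may help fix the difference: Poss(V | U) is Hisdal's rule (worlds of maximal possibility within U are promoted to 1, the rest keep their possibility), while Poss(V ||U) divides through by Poss(U), in analogy with probability. The function names are mine:

```python
def poss_of(poss, U):
    """Possibility of the event U under the distribution poss (world -> [0,1])."""
    return max((poss[w] for w in U), default=0.0)

def hisdal_condition(poss, U):
    """Poss(. | U): maximal-possibility worlds in U get 1; others keep poss; 0 outside U."""
    pU = poss_of(poss, U)
    return {w: (1.0 if poss[w] == pU else poss[w]) if w in U else 0.0
            for w in poss}

def numeric_condition(poss, U):
    """Poss(. || U): renormalize by dividing by Poss(U)."""
    pU = poss_of(poss, U)
    return {w: (poss[w] / pU if w in U else 0.0) for w in poss}

poss = {'a': 1.0, 'b': 0.6, 'c': 0.3}
U = {'b', 'c'}
print(hisdal_condition(poss, U))   # {'a': 0.0, 'b': 1.0, 'c': 0.3}
print(numeric_condition(poss, U))  # {'a': 0.0, 'b': 1.0, 'c': 0.5}
```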

The discussion of conditional plausibility spaces in Section 3.9, as well as the definition of an algebraic cps, is taken from [Halpern 2001a]; it is based on earlier definitions given in [Friedman and Halpern 1995]. The idea of putting an algebraic structure on likelihood measures also appears in [Darwiche 1992; Darwiche and Ginsberg 1992; Weydert 1994].

Jeffrey's Rule was first discussed and motivated by Jeffrey [1968], using an example much like Example 3.10.1. Diaconis and Zabell [1982] discuss a number of approaches to updating subjective probability, including Jeffrey's Rule, variation distance, and relative entropy. Proposition 3.11.1 was proved by May [1976].
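
As a reminder of the rule itself: if U_1, …, U_n partition W and the agent's new probabilities for the cells are α_1, …, α_n, the updated measure is μ_J(V) = Σ_i α_i μ(V | U_i). A minimal Python sketch (the worlds and numbers are invented, loosely echoing Jeffrey's cloth-by-candlelight example):

```python
def jeffrey_update(mu, partition, alphas):
    """mu: dict world -> prob; partition: disjoint sets covering the worlds;
    alphas: the new probabilities of the cells (summing to 1)."""
    new_mu = {w: 0.0 for w in mu}
    for U, alpha in zip(partition, alphas):
        muU = sum(mu[w] for w in U)
        if muU == 0:
            if alpha > 0:
                raise ValueError("Jeffrey's Rule needs mu(U_i) > 0 when alpha_i > 0")
            continue
        for w in U:
            new_mu[w] += alpha * mu[w] / muU
    return new_mu

# A glimpse in dim light shifts the probability of "the cloth is blue"
# vs. "the cloth is green" to 0.7 / 0.3:
mu = {'blue-sold': 0.3, 'blue-unsold': 0.3, 'green-sold': 0.3, 'green-unsold': 0.1}
partition = [{'blue-sold', 'blue-unsold'}, {'green-sold', 'green-unsold'}]
print(jeffrey_update(mu, partition, [0.7, 0.3]))
# {'blue-sold': 0.35, 'blue-unsold': 0.35, 'green-sold': 0.225, 'green-unsold': 0.075}
```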

Entropy and maximum entropy were introduced by Shannon in his classic book with Weaver [Shannon and Weaver 1949]; Shannon also characterized entropy as the unique function satisfying certain natural conditions. Jaynes [1957] was the first to argue that maximum entropy should be used as an inference procedure. That is, given a set C of constraints, an agent should act "as if" the probability is determined by the measure that maximizes entropy relative to C. This can be viewed as a combination of relative entropy together with the principle of indifference. See Chapter 11 for more on maximum entropy and the principle of indifference.
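
As a concrete instance, here is a sketch of maximum-entropy inference for one toy constraint class, a die with a prescribed mean (a standard illustration often attributed to Jaynes). A Lagrange-multiplier argument shows the maximizing distribution has exponential form p_i ∝ e^{λi}, and λ can be found by bisection; the code is illustrative and the function names are mine:

```python
import math

OUTCOMES = range(1, 7)

def maxent_dist(lam):
    """The exponential-form distribution p_i proportional to exp(lam * i)."""
    weights = {i: math.exp(lam * i) for i in OUTCOMES}
    z = sum(weights.values())
    return {i: w / z for i, w in weights.items()}

def mean(p):
    return sum(i * pi for i, pi in p.items())

def maxent_given_mean(target, lo=-10.0, hi=10.0, iters=100):
    """Bisect on lam: the mean of maxent_dist(lam) is increasing in lam."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean(maxent_dist(mid)) < target:
            lo = mid
        else:
            hi = mid
    return maxent_dist((lo + hi) / 2)

p = maxent_given_mean(4.5)
print({i: round(pi, 3) for i, pi in p.items()})  # probabilities increase toward 6
print(round(mean(p), 3))                          # ~4.5
```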

Relative entropy was introduced by Kullback and Leibler [1951]. An axiomatic defense of maximum entropy and relative entropy was given by Shore and Johnson [1980]; a recent detailed discussion of the reasonableness of this defense is given by Uffink [1995]. See [Cover and Thomas 1991] for a good introduction to the topic.
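
In the finite case the definition is simple enough to state as code: D(p‖q) = Σ_w p(w) log(p(w)/q(w)), with the usual conventions that 0 log 0 = 0 and that the divergence is infinite if p puts mass where q puts none. A minimal sketch:

```python
import math

def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) for finite distributions (dicts)."""
    d = 0.0
    for w, pw in p.items():
        if pw == 0:
            continue                     # 0 log 0 is taken to be 0
        if q.get(w, 0) == 0:
            return math.inf              # p not absolutely continuous w.r.t. q
        d += pw * math.log(pw / q[w])
    return d

p = {'a': 0.5, 'b': 0.5}
q = {'a': 0.9, 'b': 0.1}
print(relative_entropy(p, q))  # ~0.511
print(relative_entropy(p, p))  # 0.0: D(p || q) = 0 iff p = q
```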

Maximum entropy and relative entropy are widely used in many applications today, ranging from speech recognition [Jelinek 1997] to modeling queuing behavior [Kouvatsos 1994]. Analogues of maximum entropy were proposed for belief functions by Yager [1983] and for possibility measures by Klir and his colleagues [Higashi and Klir 1983; Klir and Mariano 1987]. An example of the counterintuitive behavior of relative entropy is given by van Fraassen's Judy Benjamin problem [1981]; see [Grove and Halpern 1997; Grünwald and Halpern 2002] for further discussion of this problem.

The lexicographic probability spaces discussed in Exercises 3.50 and 3.51 were introduced by Blume, Brandenburger, and Dekel [1991a; 1991b], who showed that they could be used to model a weaker version of Savage's [1954] postulates of rationality, discussed in the notes to Chapter 2. A number of papers have discussed the connection among conditional probability spaces, nonstandard probability measures, and sequences of probability measures [Halpern 2001b; Hammond 1994; Rényi 1956; van Fraassen 1976; McGee 1994].



