Notes


There are many texts on all facets of probability; four standard introductions are by Ash [1970], Billingsley [1986], Feller [1957], and Halmos [1950]. In particular, these texts show that it is impossible to find a probability measure μ defined on all subsets of the interval [0, 1] in such a way that (1) the probability of an interval [a, b] is its length b − a and (2) μ(U′) = μ(U) if U′ is the result of translating U by a constant. (Formally, if x mod 1 is the fractional part of x, so that, e.g., 1.6 mod 1 = .6, then U′ is the result of translating U by the constant c if U′ = {(x + c) mod 1 : x ∈ U}.) There is, however, a translation-invariant countably additive probability measure μ defined on a large σ-algebra of subsets of [0, 1] that includes all the intervals, so that μ([a, b]) = b − a. That is part of the technical motivation for taking the domain of a probability measure to be an algebra (or a σ-algebra, if W is infinite).
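The translation operation defined above is easy to make concrete. The following Python sketch (the helper names translate_mod1 and total_length are my own, not from the text) translates a finite union of half-open intervals by c mod 1 and checks that the total length is preserved; property (2) asks for exactly this invariance, which is what cannot be extended to all subsets of [0, 1].

```python
def translate_mod1(intervals, c):
    """Translate a disjoint union of [a, b) intervals inside [0, 1) by c, mod 1."""
    result = []
    for a, b in intervals:
        start = (a + c) % 1
        length = b - a
        if start + length <= 1:
            result.append((start, start + length))
        else:
            # the shifted interval wraps past 1, so it splits into two pieces
            result.append((start, 1.0))
            result.append((0.0, start + length - 1))
    return result

def total_length(intervals):
    return sum(b - a for a, b in intervals)

U = [(0.2, 0.5), (0.7, 0.9)]
V = translate_mod1(U, 0.6)
# total_length(V) equals total_length(U) = 0.5, up to floating-point error
```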

Billingsley [1986, p. 17] discusses why, in general, it is useful to have probability measures defined on algebras (indeed, σ-algebras). Dynkin systems [Williams 1991] (sometimes called λ-systems [Billingsley 1986, p. 37]) are an attempt to go beyond algebras. A Dynkin system is a set of subsets of a space W that contains W and is closed under complements and disjoint unions (or countable disjoint unions, depending on whether the analogue of an algebra or a σ-algebra is desired); it is not necessarily closed under arbitrary unions. That is, if 𝒟 is a Dynkin system and U, V ∈ 𝒟, then U ∪ V is in 𝒟 if U and V are disjoint, but if U and V are not disjoint, then U ∪ V may not be in 𝒟. Notice that properties P1 and P2 make perfect sense in Dynkin systems, so a Dynkin system can be taken to be the domain of a probability measure. It is certainly more reasonable to assume that the set of sets to which a probability can be assigned forms a Dynkin system rather than an algebra. Moreover, most of the discussion of probability given here goes through if the domain of a probability measure is taken to be a Dynkin system.
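For a finite W, the Dynkin-system closure conditions can be checked by brute force. A minimal sketch (the family D below is my own toy example): it satisfies all three conditions, yet the union of the two non-disjoint sets {1, 2} and {1, 3} is missing, so it is not an algebra.

```python
from itertools import combinations

def is_dynkin(W, D):
    """Check: W is in D, D is closed under complements and under disjoint unions."""
    if W not in D:
        return False
    if any(frozenset(W - S) not in D for S in D):
        return False
    for U, V in combinations(D, 2):
        if not (U & V) and (U | V) not in D:  # disjoint, but union missing
            return False
    return True

W = frozenset({1, 2, 3, 4})
# Every pair of disjoint members has its union present, and every member has
# its complement present -- but {1,2} | {1,3} = {1,2,3} is absent.
D = {frozenset(), W, frozenset({1, 2}), frozenset({3, 4}),
     frozenset({1, 3}), frozenset({2, 4})}
```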

The use of the principle of indifference in probability is associated with a number of people in the seventeenth and eighteenth centuries, chief among them perhaps Bernoulli and Laplace. Hacking [1975] provides a good historical discussion. The term principle of indifference is due to Keynes [1921]; it has also been called the principle of insufficient reason [Kries 1886].

Many justifications for probability can be found in the literature. As stated in the text, the strongest proponent of the relative-frequency interpretation was von Mises [1957]. A recent defense of this position was given by van Lambalgen [1987].

Ramsey [1931b] gave perhaps the first careful justification of the subjective viewpoint; the variant of his argument given here is due to Paris [1994]. De Finetti [1931, 1937, 1972] proved the first Dutch book arguments. The subjective viewpoint often goes under the name Bayesianism, and its adherents are often called Bayesians (after Reverend Thomas Bayes, who derived Bayes' Rule, discussed in Chapter 3).

The notion of a bet considered here is an instance of what Walley [1991] calls a gamble: a function from the set W of worlds to the reals. (Gambles will be studied in more detail in Chapter 4.) Walley [1991, p. 152] describes a number of rationality axioms for when a gamble should be considered acceptable; gamble X is then considered preferable to Y if the gamble X − Y is acceptable. Walley's axioms D0 and D3 correspond to RAT1 and RAT4; axiom D3 corresponds to a property RAT5 considered in Chapter 3. RAT2 (transitivity) follows for Walley from his D3 and the definitions. Walley deliberately does not have an analogue of RAT3; he wants to allow incomparable gambles.

Another famous justification of probability is due to Cox [1946], who showed that any function that assigns degrees to events and satisfies certain minimal properties (such as the degree of belief in the complement of U being a decreasing function of the degree of belief in U) must be isomorphic to a probability measure. Unfortunately, Cox's argument is not quite correct as stated; his hypotheses need to be strengthened (in ways that make them less compelling) to make it correct [Halpern 1999a; Halpern 1999b; Paris 1994].

Yet another justification for probability is due to Savage [1954], who showed that a rational agent (where "rational" is defined in terms of a collection of axioms) can, in a precise sense, be viewed as acting as if his beliefs were characterized by a probability measure. More precisely, Savage showed that a rational agent's preferences on a set of actions can be represented by a probability measure on a set of possible worlds combined with a utility function on the outcomes of the actions; the agent then prefers action a to action b if and only if the expected utility of a is higher than that of b. Savage's approach has had a profound impact on the field of decision theory (see Section 5.4).
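Savage's representation is easy to illustrate once a probability measure and a utility function are fixed. In this sketch (the worlds, outcomes, and numbers are all made up for illustration), actions map worlds to outcomes, and the agent prefers the action with the higher expected utility:

```python
# Expected utility of an action: sum over worlds of mu(w) * utility(outcome).
def expected_utility(mu, utility, action):
    return sum(mu[w] * utility[action[w]] for w in mu)

mu = {'rain': 0.3, 'sun': 0.7}                      # beliefs about the world
utility = {'wet': -10, 'dry': 5, 'encumbered': 3}   # utilities of outcomes

# Actions, represented as maps from worlds to outcomes
take_umbrella = {'rain': 'encumbered', 'sun': 'encumbered'}
leave_umbrella = {'rain': 'wet', 'sun': 'dry'}

# EU(take_umbrella) = 3.0; EU(leave_umbrella) = 0.3*(-10) + 0.7*5 = 0.5,
# so this agent prefers taking the umbrella.
```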

The behavior of people on examples such as Example 2.3.2 has been the subject of intense investigation. This example is closely related to the Ellsberg paradox; see the references for Chapter 5.

The idea of modeling imprecision in terms of sets of probability measures is an old one, apparently going back as far as Boole [1854, Chapters 16–21] and Ostrogradsky [1838]. Borel [1943, Section 3.8] suggested that upper and lower probabilities could be measured behaviorally, as betting rates on or against an event. These arguments were formalized by Smith [1961]. In many cases, the set of probabilities is taken to be convex (so that if μ1 and μ2 are in the set, then so is aμ1 + bμ2, for all a, b ∈ [0, 1] with a + b = 1); see, for example, [Campos and Moral 1995; Couso, Moral, and Walley 1999; Gilboa and Schmeidler 1993; Levi 1985; Walley 1991] for discussion and further references. It has been argued [Couso, Moral, and Walley 1999] that, as far as making a decision goes, a set of probabilities is behaviorally equivalent to its convex hull (i.e., the least convex set that contains it). However, a convex set does not seem appropriate for representing, say, the uncertainty in the two-coin problem from Chapter 1. Moreover, there are contexts other than decision making where a set of probabilities has very different properties from its convex hull (see Exercise 4.12). Thus, I do not assume convexity in this book.
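The behavioral equivalence of a set and its convex hull is easy to illustrate numerically. In this sketch (the two measures μ1 and μ2 are made up), every convex combination assigns an event a probability between the two endpoints, so the pair {μ1, μ2} and its convex hull determine the same lower and upper probabilities:

```python
def convex_comb(mu1, mu2, a):
    """The mixture a*mu1 + (1-a)*mu2, itself a probability measure."""
    return {w: a * mu1[w] + (1 - a) * mu2[w] for w in mu1}

def prob(mu, U):
    return sum(mu[w] for w in U)

mu1 = {'h': 0.3, 't': 0.7}
mu2 = {'h': 0.8, 't': 0.2}
U = {'h'}

# sample the segment between mu1 and mu2
hull = [convex_comb(mu1, mu2, a / 10) for a in range(11)]
lower = min(prob(mu, U) for mu in hull)  # attained at an endpoint (mu1)
upper = max(prob(mu, U) for mu in hull)  # attained at an endpoint (mu2)
```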

Walley [1991] provides a thorough discussion of a representation of uncertainty that he calls upper and lower previsions. They are upper and lower bounds on the uncertainty of an event (and are closely related to lower and upper probabilities); see the notes to Chapter 5 for more details.

The idea of using inner measures to capture imprecision was first discussed in [Fagin and Halpern 1991b]. The inclusion-exclusion rule is discussed in most standard probability texts, as well as in standard introductions to discrete mathematics (e.g., [Maurer and Ralston 1991]). Upper and lower probabilities were characterized (independently, it seems) by Wolf [1977], Williams [1976], and Anger and Lembcke [1985]. In particular, Anger and Lembcke show that (2.18) (see Exercise 2.17) characterizes upper probabilities. (It follows from Exercise 2.17 that (2.13) characterizes lower probabilities.) Further discussion of the properties of upper and lower probabilities can be found in [Halpern and Pucella 2001].
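The inclusion-exclusion rule mentioned above is easy to spell out for a finite space: the probability of a union is the alternating sum, over nonempty groups of the sets, of the probabilities of their intersections. A small sketch (the measure and events are illustrative):

```python
from itertools import combinations

def prob(mu, U):
    return sum(mu[w] for w in U)

def inclusion_exclusion(mu, sets):
    """mu of the union of `sets`, via alternating sums over intersections."""
    total = 0.0
    for k in range(1, len(sets) + 1):
        for group in combinations(sets, k):
            inter = frozenset.intersection(*group)
            total += (-1) ** (k + 1) * prob(mu, inter)
    return total

mu = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}
U1, U2, U3 = frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})
# direct computation gives mu(U1 | U2 | U3) = mu({1, 2, 3, 4}) = 1.0
```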

The proof of Theorem 2.3.3 is sketched in Exercise 2.8. The result seems to be due to Horn and Tarski [1948]. As mentioned in the discussion in Exercise 2.8, if countable additivity is required, Theorem 2.3.3 may not hold. In fact, if countable additivity is required, the set μ may be empty! (For those familiar with probability theory and set theory, this is why: Let be the Borel subsets of [0, 1], let be all subsets of [0, 1], and let μ be Lebesgue measure defined on the Borel sets in [0, 1]. As shown by Ulam [1930], under the continuum hypothesis (which says that there are no cardinalities in between the cardinality of the reals and the cardinality of the natural numbers), there is no countably additive measure extending μ defined on all subsets of [0, 1].) A variant of Proposition 2.3.3 does hold even for countably additive measures. If μ is a probability measure on an algebra , let μ consist of all extensions of μ to some algebra (so that the measures in μ may be defined on different algebras). Define (μ)*(U) = inf{μ′(U):μ μ, μ′ is defined on U}. Then essentially the same arguments as those given in Exercise 2.8 show that μ* = (μ)*. These arguments hold even if all the probability measures in μ are required to be countably additive (assuming that μ is countably additive).

Belief functions were originally introduced by Dempster [1967, 1968], and then extensively developed by Shafer [1976]. Choquet [1953] independently and earlier introduced the notion of capacities (now often called Choquet capacities); a k-monotone capacity satisfies B3 for n = 1, …, k; infinitely-monotone capacities are mathematically equivalent to belief functions. Theorem 2.4.1 was originally proved by Dempster [1967], while Theorem 2.4.3 was proved by Shafer [1976, p. 39]. Examples 2.4.4 and 2.4.5 are taken from Gordon and Shortliffe [1984] (with slight modifications). Fagin and I [1991b] and Ruspini [1987] were the first to observe the connection between belief functions and inner measures. Exercise 2.12 is Proposition 3.1 in [Fagin and Halpern 1991b]; it also follows from a more general result proved by Shafer [1979]. Shafer [1990] discusses various justifications for and interpretations of belief functions. He explicitly rejects the idea of belief functions as lower probabilities.
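Given a mass function m, belief and plausibility are straightforward to compute on a finite space: Bel(U) sums the mass of focal elements contained in U, while Plaus(U) sums the mass of those intersecting U. A sketch with made-up mass values, which also checks the duality Plaus(U) = 1 − Bel of the complement:

```python
def bel(m, U):
    """Belief: total mass of focal elements contained in U."""
    return sum(p for B, p in m.items() if B <= U)

def plaus(m, U):
    """Plausibility: total mass of focal elements intersecting U."""
    return sum(p for B, p in m.items() if B & U)

W = frozenset({'a', 'b', 'c'})
m = {frozenset({'a'}): 0.4, frozenset({'a', 'b'}): 0.3, W: 0.3}

U = frozenset({'a', 'b'})
# bel(m, U) = 0.7, plaus(m, U) = 1.0, and plaus(m, U) == 1 - bel(m, W - U)
```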

Possibility measures were introduced by Zadeh [1978], who developed them from his earlier work on fuzzy sets and fuzzy logic [Zadeh 1975]. The theory was greatly developed by Dubois, Prade, and others; a good introduction can be found in [Dubois and Prade 1990]. Theorem 2.5.4 on the connection between possibility measures and plausibility functions based on consonant mass functions is proved, for example, by Dubois and Prade [1982].

Ordinal conditional functions were originally defined by Spohn [1988], who allowed them to have values in the ordinals, not just values in ℕ*. Spohn also showed the relationship between his ranking functions and nonstandard probability, as sketched in Exercise 2.40. (For more on nonstandard probability measures and their applications to decision theory and game theory, see, e.g., [Hammond 1994].) The degree-of-surprise interpretation for ranking functions goes back to Shackle [1969].
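On a finite space, a ranking function is determined by the ranks of the individual worlds (the ranks below are made up): the rank of a set is the minimum rank of its elements, with κ(∅) = ∞, so κ(U ∪ V) = min(κ(U), κ(V)). In the nonstandard-probability connection sketched in Exercise 2.40, a world of rank k roughly corresponds to probability of order ε^k.

```python
import math

# illustrative ranks; rank 0 = unsurprising, higher rank = more surprising
rank = {'w1': 0, 'w2': 1, 'w3': 2}

def kappa(U):
    """Rank of a set: the minimum rank of its worlds (infinity for the empty set)."""
    return min((rank[w] for w in U), default=math.inf)

U, V = {'w2'}, {'w3'}
# kappa(U) = 1, kappa(V) = 2, kappa(U | V) = 1, kappa(set()) = infinity
```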

Most of the ideas in Section 2.7 go back to Lewis [1973], but he focused on the case of total preorders. The presentation (and, to some extent, the notation) in this section is inspired by that of [Halpern 1997a]. What is called ⪰s in [Halpern 1997a] is called ⪰e here; ≻′ in [Halpern 1997a] is ≻e here. The ordering ⪰s is actually taken from [Friedman and Halpern 2001]. Other ways of ordering sets have been discussed in the literature; see, for example, [Dershowitz and Manna 1979; Doyle, Shoham, and Wellman 1991]. (A more detailed discussion of other approaches and further references can be found in [Halpern 1997a].) The characterizations in Theorems 2.7.2 and 2.7.6 are typical of results in the game theory literature. These particular results are inspired by similar results in [Halpern 1999c]. These "set-theoretic completeness" results should be compared to the axiomatic completeness results proved in Section 7.5.

As observed in the text, the properties of ⪰e are quite different from those satisfied by the (total) preorder on sets induced by a probability measure. A qualitative probability preorder is a preorder on sets induced by a probability measure. That is, ⪰ is a qualitative probability preorder if there is a probability measure μ such that U ⪰ V iff μ(U) ≥ μ(V). What properties does a qualitative probability preorder have? Clearly, ⪰ must be a total preorder. Another obvious property is that if V is disjoint from both U and U′, then U ⪰ U′ iff U ∪ V ⪰ U′ ∪ V (i.e., the analogue of property Pl3 in Exercise 2.57). It turns out that it is possible to characterize qualitative probability preorders, but the characterization is nontrivial. Fine [1973] discusses this issue in more detail.
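The disjoint-union property can be verified by brute force on a small finite space. In this sketch, integer weights (my own choice, to keep the comparisons exact) stand in for an unnormalized probability measure; normalizing does not change the induced preorder:

```python
from itertools import chain, combinations

weight = {1: 1, 2: 2, 3: 3, 4: 4}  # unnormalized; divide by 10 for a measure
W = frozenset(weight)

def wt(U):
    return sum(weight[w] for w in U)

def geq(U, V):  # the induced preorder: U >= V iff mu(U) >= mu(V)
    return wt(U) >= wt(V)

def subsets(S):
    return [frozenset(c)
            for c in chain.from_iterable(combinations(S, k) for k in range(len(S) + 1))]

# if V is disjoint from both U1 and U2, then U1 >= U2 iff U1 | V >= U2 | V
ok = all(geq(U1, U2) == geq(U1 | V, U2 | V)
         for U1 in subsets(W) for U2 in subsets(W) for V in subsets(W)
         if not (V & U1) and not (V & U2))
```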

Plausibility measures were introduced in [Friedman and Halpern 1995; Friedman and Halpern 2001]; the discussion in Section 2.8 is taken from these papers. Weber [1991] independently introduced an equivalent notion. Schmeidler [1989] has a notion of nonadditive probability, which is also similar in spirit, except that the range of a nonadditive probability is [0, 1] (so that ν is a nonadditive probability on W iff (1) ν(∅) = 0, (2) ν(W) = 1, and (3) ν(U) ≤ ν(V) if U ⊆ V).
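On a finite space, Schmeidler's three conditions can be checked exhaustively. A sketch (the function ν below is my own example): it is monotone and properly normalized, but deliberately not additive.

```python
from itertools import chain, combinations

def subsets(S):
    return [frozenset(c)
            for c in chain.from_iterable(combinations(S, k) for k in range(len(S) + 1))]

def is_nonadditive_prob(W, nu):
    """Schmeidler's conditions: nu(empty) = 0, nu(W) = 1, monotone under subset."""
    if nu[frozenset()] != 0 or nu[W] != 1:
        return False
    return all(nu[U] <= nu[V]
               for U in subsets(W) for V in subsets(W) if U <= V)

W = frozenset({'a', 'b'})
# monotone but not additive: nu({a}) + nu({b}) = 1.4 != 1 = nu(W)
nu = {frozenset(): 0, frozenset({'a'}): 0.7, frozenset({'b'}): 0.7, W: 1}
```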

The issue of which representation of uncertainty is most appropriate in which setting deserves closer scrutiny. Walley [2000] has done one of the few serious analyses of this issue; I hope there will be more.




Reasoning About Uncertainty
ISBN: 0262582597
Year: 2005