Notes | Reasoning about Uncertainty

The notions of (conditional) independence and random variable are standard in probability theory, and they are discussed in all texts on probability (and, in particular, the ones cited in Chapter 2). Fine [1973] and Walley [1991] discuss qualitative properties of conditional independence such as CI1–6; Walley, in fact, includes CI3 as part of his definition of independence. Walley calls the asymmetric version of independence irrelevance. It is an interesting notion in its own right; see [Cozman 1998; Cozman and Walley 1999].

The focus on conditional independence properties can be traced back to Dawid [1979] and Spohn [1980], who both discussed properties that are variants of CIRV1–6 (CIRV6 is discussed in Exercise 4.21). Pearl [1988] discusses these properties at length. These properties have been called the graphoid properties in the literature, which contains extensive research on whether they completely characterize conditional independence of random variables. Very roughly, the graphoid properties do not characterize conditional independence of random variables (infinitely many extra properties are required to do that), but they do provide a complete characterization for all the properties of conditional independence of the form "if I^rv_μ(X₁, Y₁ | Z₁) and I^rv_μ(X₂, Y₂ | Z₂) then I^rv_μ(X₃, Y₃ | Z₃)," that is, where two (or fewer) conditional independence assertions imply a third one. (Note that CIRV1–6 all have this form.) Studený [1994] proves this result, discusses the issue, and provides further references.

Noninteractivity was originally defined in the context of possibility measures by Zadeh [1978]. Fonck [1994] studied it in that setting and showed that it is strictly weaker than independence for possibility measures. Shenoy [1994] defines a notion similar in spirit to noninteractivity for random variables. Lemmas 4.3.3 and 4.3.5 are taken from [Halpern 2001a]. Besides noninteractivity, a number of different approaches to defining independence have been considered, both for possibility measures [Campos and Huete 1999a; Campos and Huete 1999b; Dubois, Fariñas del Cerro, Herzig, and Prade 1994] and for sets of probability measures [Campos and Huete 1993; Campos and Moral 1995; Couso, Moral, and Walley 1999]. In general, CIRV1–5 do not hold for them.

As Peter Walley [private communication, 2000] points out, Example 4.3.6 is somewhat misleading in its suggestion that independence with respect to Pl_𝒫 avoids counterintuitive results with respect to functional independence. Suppose that the probabilities in the example are modified slightly so as to make them positive. For example, suppose that the coin in the example is known to land heads with probability either .99 or .01 (rather than 1 and 0, as in the example). Let μ′₀ and μ′₁ be the obvious modifications of μ₀ and μ₁ required to represent this situation, and let 𝒫′ = {μ′₀, μ′₁}. Now H¹ and H² are "almost functionally dependent." H¹ and H² continue to be type-1 independent, and noninteractivity continues to hold, but now I_{Pl_𝒫′}(H¹, H²) also holds. The real problem here is the issue raised in Section 3.4: this representation of uncertainty does not take evidence into account.
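The modified example can be checked numerically. The following is a minimal sketch, assuming that each measure in the modified set treats the two tosses as independent tosses of a coin with a fixed bias (.99 or .01). The function names are illustrative and do not come from the text, and the two measures are named by their bias because the notes do not say which bias belongs to μ′₀ and which to μ′₁. The code checks only the per-measure product condition and the probability that the two tosses agree; it does not implement the plausibility-based definition of independence with respect to Pl.

```python
from itertools import product
from math import isclose

def two_toss_measure(p_heads):
    """Product measure on two tosses of a coin with bias p_heads (an assumption)."""
    bias = {"h": p_heads, "t": 1.0 - p_heads}
    return {(a, b): bias[a] * bias[b] for a, b in product("ht", repeat=2)}

def prob(mu, event):
    """Probability of an event, given as a predicate on outcomes."""
    return sum(p for outcome, p in mu.items() if event(outcome))

def independent(mu, e1, e2):
    """True if mu(e1 and e2) equals mu(e1) * mu(e2), up to rounding."""
    both = prob(mu, lambda w: e1(w) and e2(w))
    return isclose(both, prob(mu, e1) * prob(mu, e2))

H1 = lambda w: w[0] == "h"      # first toss lands heads
H2 = lambda w: w[1] == "h"      # second toss lands heads
agree = lambda w: w[0] == w[1]  # both tosses land the same way

mu_low = two_toss_measure(0.01)
mu_high = two_toss_measure(0.99)
P_prime = [mu_low, mu_high]

for mu in P_prime:
    # Every outcome has positive probability, so conditioning is always defined.
    assert all(p > 0 for p in mu.values())
    print(independent(mu, H1, H2), round(prob(mu, agree), 4))
```

Under either measure the two tosses agree with probability .99² + .01² = .9802, which is one way to read "almost functionally dependent," while the product condition holds exactly for each measure taken separately.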

Theorem 4.4.5 is taken from [Halpern 2001a]. Characterizations of uncertainty measures for which CIRV1–5 hold, somewhat in the spirit of Theorem 4.4.5, can also be found in [Darwiche 1992; Darwiche and Ginsberg 1992; Friedman and Halpern 1995; Wilson 1994].

The idea of using graphical representations for probabilistic information measures can be traced back to Wright [1921] (see [Goldberger 1972] for a discussion). The work of Pearl [1988] energized the area, and it is currently a very active research topic, as a glance at recent proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI) [Cooper and Moral 1998; Laskey and Prade 1999; Boutilier and Goldszmidt 2000] will attest. The books by Castillo, Gutierrez, and Hadi [1997], Jensen [1996], and Neapolitan [1990] cover Bayesian networks in detail. Charniak [1991] provides a readable introduction.

Pearl [1988] introduced the notion of d-separation. The first half of Theorem 4.5.7 was proved by Verma [1986], and the second half by Geiger and Pearl [1988]; see also [Geiger, Verma, and Pearl 1990]. Construction 4.5.5 and Theorem 4.5.6 are also essentially due to Verma [1986]. Heckerman [1990] provides a good discussion of the PATHFINDER system. Numerous algorithms for performing inference in Bayesian networks are discussed by Pearl [1988] and in many of the papers in the proceedings of the UAI Conference. Plausibilistic Bayesian networks are discussed in [Halpern 2001a], from where the results of Section 4.5.4 are taken. Independence and d-separation for various approaches to representing sets of probability measures using Bayesian networks are discussed by Cozman [2000a; 2000b]. However, the technical details are quite different from the approach taken here.
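For readers who want to experiment with d-separation, the following is a minimal sketch of one way to test it. It does not follow the path-blocking definition directly. Instead it uses the equivalent moralized-ancestral-graph test: restrict the graph to the ancestors of the nodes involved, connect every pair of parents that share a child, drop edge directions, delete the conditioning set, and check reachability. The function name `d_separated` and the dictionary-of-parents encoding are assumptions made for this illustration.

```python
from collections import deque

def d_separated(parents, xs, ys, zs):
    """Return True if xs and ys are d-separated by zs in the DAG.

    parents maps each node to the set of its parents (hypothetical encoding).
    """
    xs, ys, zs = set(xs), set(ys), set(zs)

    # Step 1: keep only the nodes in xs, ys, zs and all of their ancestors.
    keep, stack = set(), list(xs | ys | zs)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))

    # Step 2: moralize. Link each child to its parents and link co-parents,
    # treating every edge as undirected.
    nbrs = {node: set() for node in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:
            nbrs[child].add(p)
            nbrs[p].add(child)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)

    # Step 3: breadth-first search from xs that never enters zs.
    frontier = deque(xs - zs)
    seen = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node in ys:
            return False
        for nxt in nbrs[node] - zs - seen:
            seen.add(nxt)
            frontier.append(nxt)
    return True

# A collider A -> C <- B with a descendant C -> D, plus a chain A -> E -> F.
dag = {"C": {"A", "B"}, "D": {"C"}, "E": {"A"}, "F": {"E"}}
print(d_separated(dag, {"A"}, {"B"}, set()))  # collider not conditioned on
print(d_separated(dag, {"A"}, {"B"}, {"D"}))  # descendant of the collider observed
print(d_separated(dag, {"A"}, {"F"}, {"E"}))  # chain blocked at E
```

In this small graph, A and B are d-separated by the empty set, become d-connected once the collider C or its descendant D is conditioned on, and A and F are d-separated by E.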