4.5 Bayesian Networks

Suppose that W is a set of possible worlds characterized by n binary random variables 𝒳 = {X_1, …, X_n}. That is, a world w ∈ W is a tuple (x_1, …, x_n), where x_i ∈ {0, 1} is the value of X_i. That means that there are 2^n worlds in W, say w_1, …, w_{2^n}. A naive description of a probability measure on W requires 2^n − 1 numbers, α_1, …, α_{2^n−1}, where α_i is the probability of world w_i. (Of course, the probability of w_{2^n} is determined by the other probabilities, since they must sum to 1.)

If n is relatively small, describing a probability measure in this naive way is not so unreasonable, but if n is, say, 1,000 (certainly not unreasonable in many practical applications), then it is completely infeasible. Bayesian networks provide a tool that makes describing and working with probability measures far more feasible.

4.5.1 Qualitative Bayesian Networks

A (qualitative) Bayesian network (sometimes called a belief network) is a dag, that is, a directed acyclic graph, whose nodes are labeled by random variables. (For readers not familiar with graph theory, a directed graph consists of a collection of nodes or vertices joined by directed edges. Formally, a directed edge is just an ordered pair of nodes; the edge (u, v) can be drawn by joining u and v by a line with an arrow pointing from u to v. A directed graph is acyclic if there is no cycle; that is, there does not exist a sequence of vertices v_0, …, v_k such that v_0 = v_k and there is an edge from v_i to v_{i+1} for i = 0, …, k − 1.) Informally, the edges in a Bayesian network can be thought of as representing causal influence.

For example, a Bayesian network for reasoning about the relationship between smoking and cancer might include binary random variables such as C for "has cancer," SH for "exposed to secondhand smoke," PS for "at least one parent smokes," and S for "smokes." The Bayesian network Gs in Figure 4.1 represents what seem to be reasonable causal relationships. Intuitively, Gs says that whether or not a patient has cancer is directly influenced by whether he is exposed to secondhand smoke and whether he smokes. Both of these random variables, in turn, are influenced by whether his parents smoke. Whether or not his parents smoke also clearly influences whether or not he has cancer, but this influence is mediated through the random variables SH and S. Once whether he smokes and was exposed to secondhand smoke is known, finding out whether his parents smoke gives no additional information. That is, C is independent of PS given SH and S.


Figure 4.1: A Bayesian network Gs that represents the relationship between smoking and cancer.
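
The structure of a Bayesian network like Gs is easy to encode in a program by recording, for each node, the set of its parents; the edges (and hence the ancestor and descendant relations) can be recovered from this. The following Python sketch is purely illustrative; the dictionary name and layout are my own assumptions, not anything prescribed by the formalism.

    # A hypothetical encoding of the qualitative network Gs:
    # each node is mapped to the set of its parents.
    PARENTS_GS = {
        "PS": set(),        # at least one parent smokes (a root of Gs)
        "SH": {"PS"},       # exposed to secondhand smoke
        "S":  {"PS"},       # smokes
        "C":  {"SH", "S"},  # has cancer
    }

    def edges(parents):
        """The directed edges (u, v) of the dag, u a parent of v."""
        return [(u, v) for v, ps in parents.items() for u in ps]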

More generally, given a Bayesian network G and a node X in G, think of the ancestors of X in the graph as those random variables that have a potential influence on X. (Formally, Y is an ancestor of X in G if there is a directed path from Y to X in G, that is, a sequence (Y_1, …, Y_k) of nodes such that Y_1 = Y, Y_k = X, and there is a directed edge from Y_i to Y_{i+1} for i = 1, …, k − 1.) This influence is mediated through the parents of X, those ancestors of X directly connected to X. That means that X should be conditionally independent of its ancestors, given its parents. The formal definition requires, in fact, that X be independent not only of its ancestors, but of its nondescendants, given its parents, where the nondescendants of X are those nodes Y such that X is not an ancestor of Y.

Definition 4.5.1

Given a qualitative Bayesian network G, let ParG(X) denote the parents of the random variable X in G; let DesG(X) denote the descendants of X, that is, X and all those nodes Y such that X is an ancestor of Y; let NonDesG(X) denote the nondescendants of X in G, that is, 𝒳 − DesG(X). Note that all ancestors of X are nondescendants of X. The Bayesian network G (qualitatively) represents, or is compatible with, the probability measure μ if Irvμ(X, NonDesG(X) | ParG(X)), that is, X is conditionally independent of its nondescendants given its parents, for all X ∈ 𝒳.
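
Given the parents dictionary from the sketch above, the sets DesG(X) and NonDesG(X) used in Definition 4.5.1 can be computed by a simple graph search. Again, this is only an illustrative sketch; the helper names are assumptions made for this example.

    def children(parents, x):
        """The nodes that have x as a parent."""
        return {v for v, ps in parents.items() if x in ps}

    def descendants(parents, x):
        """DesG(x): x together with every node reachable from x by directed edges."""
        des, stack = {x}, [x]
        while stack:
            for c in children(parents, stack.pop()):
                if c not in des:
                    des.add(c)
                    stack.append(c)
        return des

    def nondescendants(parents, x):
        """NonDesG(x): all the nodes that are not in DesG(x)."""
        return set(parents) - descendants(parents, x)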

Definition 4.5.1 says that G represents μ if, in a certain sense, it captures the conditional independence relations in μ. But what makes this notion of representation at all interesting? In particular, why focus on conditional independencies of the form Irvμ(X, NonDesG(X) | ParG(X))? To explain this, I digress briefly to discuss what is called the chain rule for probability. Given arbitrary sets U_1, …, U_n, it is immediate from the definition of conditional probability that

μ(U_1 ∩ … ∩ U_n) = μ(U_1 ∩ … ∩ U_{n−1}) × μ(U_n | U_1 ∩ … ∩ U_{n−1}).

(Assume for now that all the relevant probabilities are positive, so that the conditional probabilities are well defined.) Applying this observation inductively gives the chain rule:

μ(U_1 ∩ … ∩ U_n) = μ(U_1) × μ(U_2 | U_1) × μ(U_3 | U_1 ∩ U_2) × ⋯ × μ(U_n | U_1 ∩ … ∩ U_{n−1}).   (4.1)

As a special instance of the chain rule, take U_i to be the event X_i = x_i. Plugging this into (4.1) gives

μ(X_1 = x_1 ∩ … ∩ X_n = x_n) = μ(X_1 = x_1) × μ(X_2 = x_2 | X_1 = x_1) × ⋯ × μ(X_n = x_n | X_1 = x_1 ∩ … ∩ X_{n−1} = x_{n−1}).   (4.2)

Now suppose without loss of generality that X_1, …, X_n is a topological sort of (the nodes in) G; that is, if X_i is a parent of X_j, then i < j. It easily follows that {X_1, …, X_{k−1}} ⊆ NonDesG(X_k), for k = 1, …, n (Exercise 4.20); all the descendants of X_k must have subscripts greater than k. Thus, all the nodes in {X_1, …, X_{k−1}} are independent of X_k given ParG(X_k). It follows that

μ(X_k = x_k | X_1 = x_1 ∩ … ∩ X_{k−1} = x_{k−1}) = μ(X_k = x_k | ⋂_{X_i ∈ ParG(X_k)} X_i = x_i).

Thus, if G represents μ, then (4.2) reduces to

μ(X_1 = x_1 ∩ … ∩ X_n = x_n) = ∏_{k=1}^{n} μ(X_k = x_k | ⋂_{X_i ∈ ParG(X_k)} X_i = x_i).   (4.3)

Equation (4.3) shows that, if G represents μ, then μ is completely determined by conditional probabilities of the form μ(X_k = x_k | ⋂_{X_i ∈ ParG(X_k)} X_i = x_i), that is, by the conditional probability of each random variable given each possible setting of its parents.

The consequences of this observation are explored in the next section.
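
To see what (4.3) looks like in a particular case, consider the smoking network Gs of Figure 4.1, in which PS has no parents, PS is the only parent of both SH and S, and SH and S are the parents of C. If Gs represents μ, then (4.3) says that

μ(PS = p ∩ SH = h ∩ S = s ∩ C = c) = μ(PS = p) × μ(SH = h | PS = p) × μ(S = s | PS = p) × μ(C = c | SH = h ∩ S = s)

for all p, h, s, c ∈ {0, 1}, so the sixteen world probabilities are determined by the conditional probability of each variable given its parents.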

4.5.2 Quantitative Bayesian Networks

A qualitative Bayesian network G gives qualitative information about dependence and independence, but does not actually give the values of the conditional probabilities. A quantitative Bayesian network provides more quantitative information, by associating with each node X in G a conditional probability table (cpt) that quantifies the effects of the parents of X on X. For example, if X's parents in G are Y and Z, then the cpt for X would have an entry denoted d_{Y=j,Z=k} for all (j, k) ∈ {0, 1}². As the notation is meant to suggest, d_{Y=j,Z=k} = μ(X = 1 | Y = j ∩ Z = k) for the probability measure μ represented by G. (Of course, there is no need to have an entry for μ(X = 0 | Y = j ∩ Z = k), since this is just 1 − μ(X = 1 | Y = j ∩ Z = k).) Formally, a quantitative Bayesian network is a pair (G, f) consisting of a qualitative Bayesian network G and a function f that associates with each node X in G a cpt, where there is an entry in the interval [0, 1] in the cpt for each possible setting of the parents of X. If X is a root of G, then the cpt for X can be thought of as giving the unconditional probability that X = 1.

Example 4.5.2

Consider the Bayesian network Gs for smoking described in Figure 4.1. Let fs be the function that associates with the nodes C, S, SH, and PS the cpts shown in Figure 4.2.

The first line in the cpt for C describes the conditional probability that C = 1 given S = 1 ∩ SH = 1; the second line describes the conditional probability that C = 1 given S = 1 ∩ SH = 0; and so on. Note that the probability that C = 0 given S = 1 ∩ SH = 1 is .4 (= 1 − .6); similarly, the probability that C = 0 conditional on each setting of S and SH can be calculated from the cpt. Finally, note that the .3 in the cpt for PS is the unconditional probability of PS.

Figure 4.2: Cpts for the smoking example.
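
In a program, a cpt is naturally represented as a mapping from settings of the parents to the probability that the variable is 1. The Python sketch below does this for (Gs, fs). Only three of the numbers (.6 and .1 for C when SH = 1, and .3 for PS) appear in the text; since Figure 4.2 is not reproduced here, the remaining entries are placeholders that I have made up purely for illustration.

    # Hypothetical cpts for (Gs, fs): each entry gives mu(X = 1 | parents' values).
    # Entries marked "made up" are placeholders, not the values in Figure 4.2.
    CPT_GS = {
        "PS": {(): 0.3},                      # unconditional: mu(PS = 1) = .3
        "SH": {(("PS", 1),): 0.7,             # made up
               (("PS", 0),): 0.2},            # made up
        "S":  {(("PS", 1),): 0.5,             # made up
               (("PS", 0),): 0.2},            # made up
        "C":  {(("S", 1), ("SH", 1)): 0.6,    # from the text
               (("S", 1), ("SH", 0)): 0.3,    # made up
               (("S", 0), ("SH", 1)): 0.1,    # from the text
               (("S", 0), ("SH", 0)): 0.05},  # made up
    }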

Definition 4.5.3

A quantitative Bayesian network (G, f) (quantitatively) represents, or is compatible with, the probability measure μ if G qualitatively represents μ and the cpts agree with μ in that, for each random variable X, the entry in the cpt for X given some setting Y_1 = y_1, …, Y_k = y_k of its parents is μ(X = 1 | Y_1 = y_1 ∩ … ∩ Y_k = y_k) if μ(Y_1 = y_1 ∩ … ∩ Y_k = y_k) ≠ 0. (It does not matter what the cpt entry for Y_1 = y_1, …, Y_k = y_k is if μ(Y_1 = y_1 ∩ … ∩ Y_k = y_k) = 0.)

It follows immediately from (4.3) that if (G, f) quantitatively represents μ, then μ can be completely reconstructed from (G, f). More precisely, (4.3) shows that the 2^n values μ(X_1 = x_1 ∩ … ∩ X_n = x_n) can be computed from (G, f); from these values, μ(U) can be computed for all U ⊆ W.

Proposition 4.5.4

A quantitative Bayesian network (G, f) always quantitatively represents a unique probability measure, the one determined by using (4.3).

Proof See Exercise 4.22.

It is easy to calculate, using (4.3), the probability of any world according to the unique probability measure μs represented by the quantitative Bayesian network (Gs, fs) in Example 4.5.2; for example,

μs(C = 1 ∩ S = 1 ∩ SH = 1 ∩ PS = 1) = μs(C = 1 | S = 1 ∩ SH = 1) × μs(S = 1 | PS = 1) × μs(SH = 1 | PS = 1) × μs(PS = 1),

a product of four cpt entries.

(The numbers in the cpts have been made up purely for this example and bear no necessary relationship to reality!)

Proposition 4.5.4, while straightforward, is important because it shows that there is no choice of numbers in a cpt that can be inconsistent with probability. Whatever the numbers are in the cpts for (G, f) (as long as they are in the interval [0, 1]), there is always a probability measure that is compatible with (G, f).
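
Proposition 4.5.4 can also be seen operationally: whatever numbers in [0, 1] appear in the cpts, multiplying them out as in (4.3) yields nonnegative world probabilities that sum to 1. The Python sketch below, which reuses the hypothetical representation from the earlier sketches, computes this product measure; it is an illustration under those assumptions, not a definitive implementation.

    from itertools import product

    def joint_prob(parents, cpts, world):
        """mu(world) computed as in (4.3); world maps each variable to 0 or 1."""
        p = 1.0
        for x, ps in parents.items():
            key = tuple(sorted((y, world[y]) for y in ps))
            p1 = cpts[x][key]                 # entry for mu(X = 1 | parents' values)
            p *= p1 if world[x] == 1 else 1.0 - p1
        return p

    def all_worlds(parents):
        """All 2^n assignments of 0/1 values to the variables."""
        names = sorted(parents)
        for values in product((0, 1), repeat=len(names)):
            yield dict(zip(names, values))

    # The 2^n world probabilities are nonnegative and sum to 1 (up to rounding),
    # whatever numbers in [0, 1] appear in the cpts:
    # sum(joint_prob(PARENTS_GS, CPT_GS, w) for w in all_worlds(PARENTS_GS))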

What about the converse? Can every probability measure on W be represented by a quantitative Bayesian network? It can, and in general there are many ways of doing so.

Construction 4.5.5

Given μ, let Y_1, …, Y_n be any permutation of the random variables in 𝒳. (Think of Y_1, …, Y_n as describing an ordering of the variables in 𝒳.) Construct a qualitative Bayesian network as follows. For each k, find a minimal subset of {Y_1, …, Y_{k−1}}, call it P_k, such that Irvμ({Y_1, …, Y_{k−1}}, Y_k | P_k). (Clearly there is a subset with this property, namely, {Y_1, …, Y_{k−1}} itself. It follows that there must be a minimal subset with this property.) Then add edges from each of the nodes in P_k to Y_k. Call the resulting graph G.
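
Construction 4.5.5 can be read as an algorithm, given some way of testing the required conditional independencies. The Python sketch below is one naive rendering of it; the independence oracle is a stand-in for whatever access to μ is available, and the greedy pruning is merely a simple way of finding a small parent set (the construction itself only asks for some minimal subset). All names here are assumptions for illustration.

    def construct_network(order, is_independent):
        """A naive rendering of Construction 4.5.5.

        order: the permutation Y_1, ..., Y_n of the variables.
        is_independent(prefix, y, candidate): an assumed oracle returning True
            iff Irv_mu(prefix, y | candidate) holds for the measure mu in question.
        Returns a parents dictionary describing the resulting dag G.
        """
        parents = {}
        for k, y in enumerate(order):
            prefix = order[:k]            # {Y_1, ..., Y_{k-1}}
            pk = list(prefix)             # start with the whole prefix
            for z in prefix:
                # Try dropping z; keep the smaller set if independence still holds.
                trial = [v for v in pk if v != z]
                if is_independent(prefix, y, trial):
                    pk = trial
            parents[y] = set(pk)          # edges go from each node in pk to y
        return parents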

Theorem 4.5.6

The Bayesian network G obtained by applying Construction 4.5.5 to the probability measure μ qualitatively represents μ.

Proof Note that Y_1, …, Y_n is a topological sort of G; edges always go from nodes in {Y_1, …, Y_{k−1}} to Y_k. It follows that G is acyclic; that is, it is a dag. The construction guarantees that P_k = ParG(Y_k) and that Irvμ({Y_1, …, Y_{k−1}}, Y_k | ParG(Y_k)). Using CIRV1–5, it can be shown that Irvμ(NonDesG(Y_k), Y_k | ParG(Y_k)) (Exercise 4.23). Thus, G qualitatively represents μ.

Of course, given G and μ, it is then easy to construct a quantitative Bayesian network (G, f) that quantitatively represents μ by consulting μ.

How much does this buy us? That depends on how sparse the graph is, that is, on how many parents each node has. If a node has k parents, then its conditional probability table has 2^k entries. For example, there are nine entries altogether in the cpts in the Bayesian network (Gs, fs) in Example 4.5.2: the cpt for C has four entries, the cpts for SH and S each have two entries, and the one for PS has only one entry. On the other hand, a naive description of the probability measure would require fifteen numbers. In general, if each node in a Bayesian network has at most k parents, then there are at most n2^k entries in all the cpts. If k is small, then n2^k can be much smaller than 2^n − 1, the number of numbers needed for a naive description of the probability measure. (The numbers 2^k and 2^n − 1 arise because I have considered only binary random variables. If the random variables can have m values, say 0, 1, …, m − 1, the conditional probability table for a random variable X with k parents would have to describe the probability that X = j, for j = 1, …, m − 1, for each of the m^k possible settings of its parents, so would involve (m − 1)m^k entries.)
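
The counting in this paragraph is easy to check mechanically; the following throwaway helper (an illustration using the hypothetical parents dictionary above, not part of the formalism) computes the total number of cpt entries for a network over binary variables.

    def cpt_entries(parents):
        """Total number of cpt entries when all variables are binary: sum of 2^k."""
        return sum(2 ** len(ps) for ps in parents.values())

    # For PARENTS_GS this gives 1 + 2 + 2 + 4 = 9 entries, versus
    # 2 ** 4 - 1 == 15 numbers for the naive description.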

Not only does a well-designed Bayesian network (I discuss what "well-designed" means in the next paragraph) typically require far fewer numbers to represent a probability measure, the numbers are typically easier to obtain. For example, for (Gs, fs), it is typically easier to obtain entries in the cpt like μ(C = 1 | S = 1 ∩ SH = 0) (the probability that someone gets cancer given that they smoke and are not exposed to secondhand smoke) than it is to obtain μ(C = 1 ∩ S = 1 ∩ SH = 0 ∩ PS = 0).

Note that the Bayesian network constructed in Theorem 4.5.6 depends on the ordering of the random variables. For example, the first element in the ordering is guaranteed to be a root of the Bayesian network. Thus, there are many Bayesian networks that represent a given probability measure. But not all orderings lead to equally useful Bayesian networks. In a well-designed Bayesian network, the nodes are ordered so that if X has a causal influence on Y, then X precedes Y in the ordering. This typically leads both to simpler Bayesian networks (in the sense of having fewer edges) and to conditional probabilities that are easier to obtain in practice. For example, it is possible to construct a Bayesian network that represents the same probability measure as (Gs, fs) but has S as the root, by applying Theorem 4.5.6 with the ordering S, C, PS, SH (Exercise 4.24). However, not only does this network have more edges, the conditional probability tables require entries that are harder to elicit in practice. It is easier to elicit from medical experts the probability that someone will smoke given that at least one of his parents smokes (μ(S = 1 | PS = 1)) than the probability that at least one of a smoker's parents also smokes (μ(PS = 1 | S = 1)).

One of the main criticisms of the use of probability in applications such as expert systems used to be that probability measures were too hard to work with. Bayesian networks deal with part of the criticism by providing a (potentially) compact representation of probability measures, one that experience has shown can be effectively constructed in realistic domains. For example, they have been used by PATHFINDER, a diagnostic expert system for lymph node diseases. The first step in the use of Bayesian networks for PATHFINDER was for experts to decide what the relevant random variables were. At one stage in the design, they used 60 binary random variables to represent diseases (did the agent have the disease or not) and over 100 random variables for findings (symptoms and test results). Deciding on the appropriate vocabulary took 8 hours, constructing the appropriate qualitative Bayesian network took 35 hours, and making the assessments to fill in the cpts took another 40 hours. This is considered a perfectly acceptable length of time to spend in constructing a significant expert system.

But, of course, constructing the Bayesian network is only part of the problem. Once a probability measure has been represented using a Bayesian network, it is important to be able to draw inferences from it. For example, a doctor using the PATHFINDER system will typically want to know the probability of a given disease given certain findings. Even if the disease is a child of the symptom, computing the probability of the disease given the symptom requires some effort. For example, consider the problem of computing μs(C = 1 | SH = 1) for the unique probability measure μs compatible with (Gs, fs). The cpt says that μs(C = 1 | SH = 1 ∩ S = 1) = .6 and that μs(C = 1 | SH = 1 ∩ S = 0) = .1. μs(C = 1 | SH = 1) can be computed using the identity

μs(C = 1 | SH = 1) = μs(C = 1 | SH = 1 ∩ S = 1) × μs(S = 1 | SH = 1) + μs(C = 1 | SH = 1 ∩ S = 0) × μs(S = 0 | SH = 1).

That means that μs(S = 1 | SH = 1) and μs(S = 0 | SH = 1) must be computed. Efficient algorithms for such computations have been developed (and continue to be improved), which take advantage of the dag structure of a Bayesian network. It would take us too far afield here to go into the details; see the notes to this chapter for references.
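
For small networks, conditional probabilities such as μs(C = 1 | SH = 1) can always be computed, albeit naively, by summing the world probabilities given by (4.3). The Python sketch below does exactly this brute-force computation, reusing the hypothetical helpers from the earlier sketches; the efficient algorithms just mentioned avoid this exponential enumeration.

    def conditional(parents, cpts, query, evidence):
        """mu(query | evidence), computed by brute-force summation over all worlds."""
        numerator = denominator = 0.0
        for w in all_worlds(parents):
            if all(w[v] == val for v, val in evidence.items()):
                p = joint_prob(parents, cpts, w)
                denominator += p
                if all(w[v] == val for v, val in query.items()):
                    numerator += p
        return numerator / denominator

    # For example: conditional(PARENTS_GS, CPT_GS, {"C": 1}, {"SH": 1})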

4.5.3 Independencies in Bayesian Networks

By definition, a node (i.e., a random variable) in a Bayesian network G that qualitatively represents a probability measure μ is independent of its nondescendants, given its parents, with respect to μ. What other conditional independencies hold for the probability measures represented by a Bayesian network? These can easily be computed. There is a notion of d-separation, which I am about to define, with the property that X is conditionally independent of Y given Z if the nodes in Z d-separate every node in X from every node in Y.

Now for the formal definition. A node X is d-separated (the d is for directed) from a node Y by a set of nodes Z in the dag G, written d-sepG(X, Y | Z), if, for every undirected path from X to Y (an undirected path is a path that ignores the arrows; e.g., (SH, PS, S) is an undirected path from SH to S in Gs), there is a node Z on the path such that either

  (a) Z ∈ Z and there is an arrow on the path leading into Z and an arrow leading out from Z;

  (b) Z ∈ Z and Z has both path arrows leading out; or

  (c) Z has both path arrows leading in, and neither Z nor any of its descendants is in Z.

X is d-separated from Y by Z if every node X in X is d-separated from every node Y in Y by Z. Consider the graph Gs. The set {SH, S} d-separates PS from C. One path from PS to C is blocked by SH and the other by S, according to clause (a), since both S and SH have an arrow leading in and one leading out. Similarly, {PS} d-separates SH from S. The (undirected) path (SH, PS, S) is blocked by PS according to clause (b), and the undirected path (SH, C, S) is blocked by C according to clause (c), since neither C nor any of its descendants is in {PS}. On the other hand, {PS, C} does not d-separate SH from S, since there is no node on the path (SH, C, S) that satisfies any of (a), (b), or (c).

These examples may also help explain the intuition behind each of the clauses of the definition. Clause (a) is quite straightforward. Clearly if there is a directed path from PS to C, then PS can influence C, so PS and C are not independent. However, conditioning on {SH, S} blocks all paths from PS to C, so C is conditionally independent of PS given {SH, S}. The situation in clause (b) is exemplified by the edges leading out from PS to SH and S. Intuitively, smoking (S) and being exposed to secondhand smoke (SH) are not independent because they have a common cause, a parent smoking (PS). Finding out that S = 1 increases the likelihood that PS = 1, which in turn increases the likelihood that SH = 1. However, S and SH are conditionally independent given PS.

Clause (c) in the definition of d-separation is perhaps the most puzzling. Why should the absence of a node in Z cause X and Y to be d-separated? Again, this can be understood in terms of the graph Gs. Finding out that C = 1 makes S and SH become negatively correlated. Since they are both potential causes of C = 1, finding out that one holds decreases the likelihood that the other holds: finding out that S = 0 increases the likelihood that SH = 1, and finding out that S = 1 decreases the likelihood that SH = 1.

To understand the role of descendants in clause (c), suppose that a node D (for "early death") is added to Gs with an edge from C to D. Finding out that D = 1 makes it more likely that C = 1 and thus also makes S and SH negatively correlated.
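
On graphs as small as Gs, d-separation can be checked directly from the definition by enumerating undirected paths and testing each interior node against clauses (a)-(c). The Python sketch below does this naively, reusing the parents dictionary and the descendants helper from the earlier sketches; it is meant only as an illustration of the definition, not as a practical algorithm.

    def undirected_paths(parents, x, y):
        """All simple undirected paths from x to y, ignoring edge directions."""
        nbrs = {v: set() for v in parents}
        for v, ps in parents.items():
            for u in ps:
                nbrs[u].add(v)
                nbrs[v].add(u)
        def extend(path):
            if path[-1] == y:
                yield list(path)
                return
            for n in nbrs[path[-1]]:
                if n not in path:
                    yield from extend(path + [n])
        yield from extend([x])

    def blocked(parents, path, zs):
        """Is this undirected path blocked by the set zs, per clauses (a)-(c)?"""
        for i in range(1, len(path) - 1):
            a, b, c = path[i - 1], path[i], path[i + 1]
            arrows_in = (a in parents[b]) + (c in parents[b])  # arrows leading into b
            if b in zs and arrows_in < 2:                      # clauses (a) and (b)
                return True
            if arrows_in == 2 and not (descendants(parents, b) & zs):  # clause (c)
                return True
        return False

    def d_separated(parents, x, y, zs):
        """d-sepG(x, y | zs): every undirected path from x to y is blocked."""
        return all(blocked(parents, p, set(zs)) for p in undirected_paths(parents, x, y))

    # d_separated(PARENTS_GS, "PS", "C", {"SH", "S"}) should be True;
    # d_separated(PARENTS_GS, "SH", "S", {"PS", "C"}) should be False.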

The following theorem says that d-separation completely characterizes conditional independence in Bayesian networks:

Theorem 4.5.7

If X is d-separated from Y by Z in the Bayesian network G, then Irvμ(X, Y | Z) holds for all probability measures μ compatible with G. Conversely, if X is not d-separated from Y by Z, then there is a probability measure μ compatible with G such that Irvμ(X, Y | Z) does not hold.

The first half says that d-separation really does imply conditional independence in Bayesian networks. For future reference, it is worth noting that it can be proved using only properties CIRV1–5 and the fact that, by definition, Irvμ(NonDesG(X), X | ParG(X)) holds for every μ compatible with G. The second half says that, in a precise sense, d-separation completely captures conditional independence in qualitative Bayesian networks.

4.5.4 Plausibilistic Bayesian Networks

It should be clear that Bayesian networks can be used with other representations of uncertainty. Certainly nothing in the definition of qualitative Bayesian networks really depends on the use of probability—all the definitions are given in terms of conditional independence of random variables, which makes sense for all the notions of uncertainty we have considered. To what extent do results like Proposition 4.5.4 and Theorems 4.5.6 and 4.5.7 depend on the use of probability? Plausibility measures provide a useful tool with which to examine this question.

As far as qualitative representation goes, note that Definition 4.5.1 makes perfect sense if the probability measure μ is replaced by a plausibility measure Pl everywhere. The proof of Theorem 4.5.6 uses only CIRV1–5; by Theorem 4.4.5, CIRV1–5 hold for all algebraic cps's. The following corollary is immediate:

Corollary 4.5.8

A Bayesian network G obtained by applying (the analogue of) Construction 4.5.5 to the algebraic cpm Pl qualitatively represents Pl.

In light of Corollary 4.5.8, for the remainder of this section I restrict to algebraic cps's.

The notion of a Bayesian network quantitatively representing a plausibility measure makes sense for arbitrary plausibility measures, with one minor caveat. Now a cpt for X must have entries of the form d_{X=i | Y=j,Z=k}, for both i = 0 and i = 1, since the conditional plausibility of X = 0 can no longer necessarily be determined from that of X = 1. (Of course, in general, if X is not a binary random variable, then there must be an entry for each possible value of X.) With this minor change, the definition of representation is the obvious analogue of Definition 4.5.3.

Definition 4.5.9

A quantitative Bayesian network (G, f) represents a cpm Pl if G is compatible with Pl and the cpts agree with Pl, in the sense that, for each random variable X, the entry d_{X=i | Y_1=j_1, …, Y_k=j_k} in the cpt is

Pl(X = i | Y_1 = j_1 ∩ … ∩ Y_k = j_k)

if Y_1 = j_1 ∩ … ∩ Y_k = j_k ∈ ℱ′. (It does not matter what d_{X=i | Y_1=j_1, …, Y_k=j_k} is if Y_1 = j_1 ∩ … ∩ Y_k = j_k ∉ ℱ′.)

Given a cpm Pl, it is easy to construct a quantitative Bayesian network (G, f) that represents Pl: simply construct G so that it is compatible with Pl as in Corollary 4.5.8 and define f appropriately, using Pl. The more interesting question is whether there is a unique algebraic cpm determined by a quantitative Bayesian network. As stated, this question is somewhat underdetermined. The numbers in a quantitative network do not say what ⊕ and ⊗ ought to be for the algebraic cpm.

A reasonable way to make the question more interesting is the following. Recall that, for the purposes of this section, I have taken W to consist of the 2^n worlds characterized by the n binary random variables in 𝒳. Let 𝒫_{D,⊕,⊗} consist of all standard cps's of the form (W, ℱ, ℱ′, Pl), where ℱ = 2^W, so that all subsets of W are measurable, the range of Pl is D, and Pl is algebraic with respect to ⊕ and ⊗. Thus, for example, 𝒫_{ℕ*,min,+} consists of all conditional ranking functions on W defined from unconditional ranking functions by the construction in Section 3.8. Since a cps (W, ℱ, ℱ′, Pl) ∈ 𝒫_{D,⊕,⊗} is determined by Pl, I abuse notation and write Pl ∈ 𝒫_{D,⊕,⊗}.

With this notation, the question becomes whether a quantitative Bayesian network (G, f) such that the entries in the cpts are in D determines a unique element of 𝒫_{D,⊕,⊗}. The answer is yes, provided that (D, ⊕, ⊗) satisfies enough conditions. I do not go through all the conditions here; some of them are technical, and not much insight is gained from writing them all out carefully. They include, for example, conditions saying that ⊕ and ⊗ are commutative and associative and that ⊗ distributes over ⊕. It is worth noting that these conditions are satisfied by the definitions of ⊕ and ⊗ given in the proof of Proposition 3.9.2 for probability measures, ranking functions, possibility measures (using either Poss(V | U) or Poss(V ‖ U)), and the plausibility measure Pl_𝒫 defined by a set 𝒫 of probability measures, extended to a cps. Thus, it follows that, in all these cases, a quantitative Bayesian network represents a unique element Pl ∈ 𝒫_{D,⊕,⊗}.
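
As a concrete instance, for ranking functions ⊕ is min and ⊗ is +, so the analogue of (4.3) computes the rank of a world by adding the relevant cpt entries, and the rank of an event by minimizing over the worlds in it. The Python sketch below illustrates this; the table format (an entry for each value of the variable, as a plausibilistic cpt requires) and all names are assumptions for illustration, and it reuses all_worlds from the earlier sketch.

    INF = float("inf")

    def joint_rank(parents, rank_cpts, world):
        """Rank of a world: the analogue of (4.3), with product replaced by +."""
        total = 0
        for x, ps in parents.items():
            key = tuple(sorted((y, world[y]) for y in ps))
            total += rank_cpts[x][key][world[x]]   # an entry for each value of X
        return total

    def event_rank(parents, rank_cpts, event):
        """Rank of an event: the analogue of summation, i.e., min over its worlds."""
        return min(
            (joint_rank(parents, rank_cpts, w)
             for w in all_worlds(parents)
             if all(w[v] == val for v, val in event.items())),
            default=INF,
        )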

What about the analogue to Theorem 4.5.7? The first half is immediate for all algebraic cps's.

Corollary 4.5.10

If X is d-separated from Y by Z in the Bayesian network G, then IrvPl(X, Y | Z) holds for all cpms Pl ∈ 𝒫_{D,⊕,⊗} compatible with G.

Proof As I observed after the statement of Theorem 4.5.7, the result depends only on CIRV1–5. Since, by Theorem 4.4.5, CIRV1–5 hold for all algebraic cps's, the result follows.

Getting an analogue to the second half of Theorem 4.5.7 requires a little more work. Notice that to prove such an analogue, it suffices to show that if X is not d-separated from Y by Z in G, then there is a plausibility measure Pl ∈ 𝒫_{D,⊕,⊗} such that IrvPl(X, Y | Z) does not hold. Again, this result holds with enough conditions on (D, ⊕, ⊗): essentially the same conditions required to get a quantitative Bayesian network to determine a unique plausibility measure in 𝒫_{D,⊕,⊗}, together with a richness condition to ensure that there are "enough" plausibility measures in 𝒫_{D,⊕,⊗} to guarantee that if d-separation does not hold, then there is a plausibility measure that does not make the appropriate random variables (conditionally) independent. And again, these conditions hold for all the measures of uncertainty constructed in Proposition 3.9.2.

As these results show, the technology of Bayesian networks can be applied rather widely.

Exercises

  • 4.1 Show that Proposition 4.1.2 holds for all conditional probability measures, using only CP1–3.

  • 4.2 Suppose that μ is a conditional probability measure.

    1. Show that if U and V are independent with respect to μ, then μ(U ∩ V) = μ(U)μ(V).

    2. Show that if μ(U) ≠ 0, μ(V) ≠ 0, and μ(U ∩ V) = μ(U)μ(V), then U and V are independent with respect to μ.

  • 4.3 Prove Proposition 4.2.3.

  • 4.4 Prove Theorem 4.2.5.

  • 4.5 Show that Iμ(U, V | V) follows from CI1[μ]–CI5[μ].

  • 4.6 Show that CI1[Pl], CI2[Pl], and CI5[Pl] hold for all cpms Pl.

  • 4.7 This exercise examines the extent to which various notions of conditioning satisfy CI3 and CI4.

    1. Show that ranking functions and the representation Pl_𝒫 of a set 𝒫 of probability measures by a plausibility measure both satisfy CI3 and CI4.

    2. Show by means of counterexamples that none of Poss(V | U), Poss(V || U), Bel(V | U), or Bel(V || U) satisfy CI3 or CI4.

  • 4.8 Prove Lemma 4.3.3.

  • 4.9 Prove Lemma 4.3.5.

  • 4.10 Show that the assumption of standardness is necessary in Lemma 4.3.5. More precisely, suppose that (W, ℱ, ℱ′, Pl) is an arbitrary nonstandard algebraic cps. Show that there must exist some U such that IPl(U, U | W) does not hold although NIPl(U, U | W) does.

  • 4.11 Show that the cps implicitly defined in Example 3.2.4 is algebraic.

  • 4.12 This exercise shows that a set of probabilities and its convex hull are not equivalent insofar as determination of independencies goes. Suppose that a coin with an unknown probability of heads is tossed twice, and the tosses are known to be independent. A reasonable representation of this situation is given by the set 𝒫_0 consisting of all measures μ_α, where μ_α(hh) = α², μ_α(ht) = μ_α(th) = α(1 − α), μ_α(tt) = (1 − α)².

    1. Show that the coin tosses are independent with respect to Pl_𝒫_0.

    2. Show that 𝒫_0 is not convex (i.e., there exist μ_1, μ_2 ∈ 𝒫_0 such that aμ_1 + bμ_2 ∉ 𝒫_0, where a, b ∈ [0, 1] and a + b = 1).

    3. Show that the convex hull of 𝒫_0 (i.e., the least convex set containing 𝒫_0) includes measures for which the coin tosses are not independent.

  • 4.13 Show that if 𝒫 is a set of probability measures, then IPl_𝒫(U, V | V) implies Iμ(U, V | V) for all μ ∈ 𝒫.

  • *4.14 Prove Theorem 4.4.4.

  • 4.15 Show that CIRV5[Pl] holds for all cpms Pl.

  • 4.16 Construct a cps for which none of CIRV2–4 holds.

  • 4.17 Show that CIRV2 does not hold for belief functions, with conditioning defined as Bel(V | U), nor for *. (Hint: Construct a belief function Bel such that Bel(X = i) = Bel(Y = j ∩ Y′ = k) = 0 for all i, j, k ∈ {0, 1}, but Bel(Y = 0) = Bel(Y = 1) = 1/2. Show, as a consequence, that IrvBel(X, {Y, Y′}) holds, but IrvBel(X, Y) does not. A similar counterexample can be constructed for *.)

  • 4.18 Show using CIRV1–5 that Irvμ(X, Y | Z) iff Irvμ(X − Z, Y − Z | Z). Thus it is possible to assume without loss of generality that Z is disjoint from X and Y.

  • *4.19 Prove Theorem 4.4.5.

  • 4.20 Show that if X_1, …, X_n is a topological sort of G, then {X_1, …, X_{i−1}} ⊆ NonDesG(X_i).

  • *4.21 Consider the following property of conditional independence for random variables.

    CIRV6[μ]. If Irvμ(X, Y | Y′ ∪ Z) and Irvμ(X, Y′ | Y ∪ Z), then Irvμ(X, Y ∪ Y′ | Z).

    CIRV6 can be viewed as a partial converse to CIRV3.

    1. Show by means of a counterexample that CIRV6 does not hold if Y = Y′.

    2. Show by means of a counterexample that CIRV6 does not hold even if X, Y, Y′, Z are pairwise disjoint (i.e., if none of the random variables in X is in Y ∪ Y′ ∪ Z, none of the random variables in Y is in X ∪ Y′ ∪ Z, and so on).

    3. Show that CIRV6[μ] holds for all probability measures μ that are strictly positive with respect to X, Y, Y′, and Z in that if X = {X_1, …, X_n}, Y = {Y_1, …, Y_m}, Z = {Z_1, …, Z_k}, and Y′ = {Y′_1, …, Y′_p}, then for all x_i ∈ 𝒱(X_i), i = 1, …, n, y_j ∈ 𝒱(Y_j), j = 1, …, m, y′_l ∈ 𝒱(Y′_l), l = 1, …, p, and z_h ∈ 𝒱(Z_h), h = 1, …, k,

      μ(X_1 = x_1 ∩ … ∩ X_n = x_n ∩ Y_1 = y_1 ∩ … ∩ Y_m = y_m ∩ Y′_1 = y′_1 ∩ … ∩ Y′_p = y′_p ∩ Z_1 = z_1 ∩ … ∩ Z_k = z_k) > 0.

  • 4.22 Prove Proposition 4.5.4. Note that what requires proof here is that the required independence relations hold for the probability measure μ that is determined by (G, f).

  • 4.23 Complete the proof of Theorem 4.5.6 by showing that, for all nodes Y_k,

    Irvμ(NonDesG(Y_k), Y_k | ParG(Y_k)),

    using CIRV1–5. (Hint: Let Z_m = NonDesG(Y_k) ∩ {Y_1, …, Y_m}. Prove by induction on m that Irvμ(Z_m, Y_k | ParG(Y_k)), using CIRV1–5.)

  • 4.24 Consider the quantitative Bayesian network (Gs, fs) described in Example 4.5.2.

    1. Notice that {S, SH} blocks both paths from PS to C. What does this say about the relationship between PS and C in probabilistic terms?

    2. Calculate μs(C = 1 | PS = 1) for the unique probability measure μs represented by (Gs, fs).

    3. Use the construction of Theorem 4.5.6 to construct two qualitative Bayesian networks representing μs, both having S as their root.

    4. Suppose that you believe that there is a gene (that can be inherited) that results in a predisposition both to smoke and to have cancer, but otherwise smoking and cancer are unrelated. Draw a Bayesian network describing these beliefs, using the variables PG (at least one parent has this gene), G (has this gene), PS (at least one parent smokes), S (smokes), and C (has cancer). Explain why each edge you included is there.



