In practice, the notion of unconditional independence considered in Definition 4.1.1 is often not general enough. Consider the following example:
Example 4.2.1
Suppose that Alice has a coin that she knows is either fair or double-headed. Each possibility seems equally likely, so she assigns each of them probability 1/2. She then tosses the coin twice. Is the event that the first coin toss lands heads independent of the event that the second coin toss lands heads? Coin tosses are typically viewed as being independent but, in this case, that intuition is slightly misleading. There is another intuition at work here. If the first coin toss lands heads, it is more likely that the coin is double-headed, so the probability that the second coin toss lands heads is higher. This is perhaps even clearer if "heads" is replaced by "tails." Learning that the first coin toss lands tails shows that the coin must be fair, and thus makes the probability that the second coin toss lands tails 1/2. A priori, the probability that the second coin toss lands tails is only 1/4 (half of the probability 1/2 that the coin is fair).
This can be formalized using a space much like the one used in Example 3.2.2. There is one possible world corresponding to the double-headed coin, where the coin lands heads twice. This world has probability 1/2, since that is the probability of the coin being double-headed. There are four possible worlds corresponding to the fair coin, one for each of the four possible sequences of two coin tosses; each of them has probability 1/8. The probability of the first toss landing heads is 3/4: it happens in the world corresponding to the double-headed coin and in two of the four worlds corresponding to the fair coin. Similarly, the probability of the second toss landing heads is 3/4, and the probability of both tosses landing heads is 5/8. Thus, the conditional probability of the second toss landing heads given that the first toss lands heads is (5/8)/(3/4) = 5/6, which is not 3/4.
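Since the space has only five worlds, these numbers are easy to check by direct enumeration. Here is a minimal Python sketch; the world encoding and the helper names are my own, not the book's:

```python
from fractions import Fraction as F

# Worlds are (bias, toss1, toss2) triples; "dh" marks the double-headed coin.
worlds = {("dh", "H", "H"): F(1, 2),
          ("fair", "H", "H"): F(1, 8), ("fair", "H", "T"): F(1, 8),
          ("fair", "T", "H"): F(1, 8), ("fair", "T", "T"): F(1, 8)}

def prob(event):
    """Probability of the set of worlds satisfying the predicate `event`."""
    return sum(p for w, p in worlds.items() if event(w))

first_h  = lambda w: w[1] == "H"
second_h = lambda w: w[2] == "H"
both_h   = lambda w: first_h(w) and second_h(w)

print(prob(first_h))                 # 3/4
print(prob(second_h))                # 3/4
print(prob(both_h))                  # 5/8
print(prob(both_h) / prob(first_h))  # 5/6, not the unconditional 3/4
```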
The coin tosses are independent conditional on the bias of the coin. That is, the probability of two heads given that the coin is fair is the product of the probability that the first toss lands heads given that the coin is fair and the probability that the second toss lands heads given that the coin is fair (each is 1/2, and their product is 1/4). Similarly, the coin tosses are independent conditional on the coin being double-headed.
The formal definition of probabilistic conditional independence is a straightforward generalization of the definition of probabilistic independence.
Definition 4.2.2
U and V are probabilistically independent given (or conditional on) V′ (with respect to probability measure μ), written Iμ(U, V | V′), if μ(V ∩ V′) ≠ 0 implies μ(U | V ∩ V′) = μ(U | V′) and μ(U ∩ V′) ≠ 0 implies μ(V | U ∩ V′) = μ(V | V′).
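On a finite space, this definition can be checked mechanically. The sketch below is my own encoding (a measure as a dictionary from worlds to exact probabilities, events as Python sets, and the function name cond_indep rather than the book's Iμ); it tests both clauses of Definition 4.2.2 and re-derives the conclusions of Example 4.2.1:

```python
from fractions import Fraction as F

def mu(m, U):
    """Measure of the event U (a set of worlds) under the measure m (a dict)."""
    return sum((m[w] for w in U), F(0))

def cond_indep(m, U, V, Vp):
    """I_mu(U, V | V'): both clauses of Definition 4.2.2, with an implication
    whose antecedent fails holding vacuously."""
    def c(A, B):  # mu(A | B), only evaluated when mu(B) != 0
        return mu(m, A & B) / mu(m, B)
    return (mu(m, V & Vp) == 0 or c(U, V & Vp) == c(U, Vp)) and \
           (mu(m, U & Vp) == 0 or c(V, U & Vp) == c(V, Vp))

# The space of Example 4.2.1, with worlds named by (bias, toss1, toss2):
m = {("dh", "H", "H"): F(1, 2),
     ("fair", "H", "H"): F(1, 8), ("fair", "H", "T"): F(1, 8),
     ("fair", "T", "H"): F(1, 8), ("fair", "T", "T"): F(1, 8)}
W = set(m)
U1 = {w for w in W if w[1] == "H"}      # first toss lands heads
U2 = {w for w in W if w[2] == "H"}      # second toss lands heads
fair = {w for w in W if w[0] == "fair"}

print(cond_indep(m, U1, U2, W))      # False: dependent unconditionally
print(cond_indep(m, U1, U2, fair))   # True: independent given a fair coin
```

Taking V′ to be the whole space W tests unconditional independence, which is exactly the observation made next.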
It is immediate that U and V are (probabilistically) independent iff they are independent conditional on W. Thus, the definition of conditional independence generalizes that of (unconditional) independence.
The following result generalizes Proposition 4.1.2.
Proposition 4.2.3
The following are equivalent if μ(V′) ≠ 0:
μ(U ∩ V′) ≠ 0 implies μ(V | U ∩ V′) = μ(V | V′),
μ(U ∩ V | V′) = μ(U | V′)μ(V | V′),
μ(V ∩ V′) ≠ 0 implies μ(U | V ∩ V′) = μ(U | V′).
Proof See Exercise 4.3.
Just as in the case of unconditional independence, Proposition 4.2.3 shows that Definition 4.2.2 could have been simplified by using just one of the clauses. And, just as in the case of unconditional independence, I did not simplify the definition, because then the generalization would be less transparent.
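Although the proof is left as Exercise 4.3, the equivalence is easy to spot-check on small finite spaces. The sketch below (the encoding and random sampling scheme are my own choices, not the book's) asserts that the three conditions always agree when μ(V′) ≠ 0; passing is evidence, not a proof:

```python
import random
from fractions import Fraction as F

def mu(m, U):
    return sum((m[w] for w in U), F(0))

def c(m, A, B):     # mu(A | B), only evaluated when mu(B) != 0
    return mu(m, A & B) / mu(m, B)

W = set(range(6))

def random_event():
    return {w for w in W if random.random() < 0.5}

for _ in range(2000):
    weights = [random.randint(0, 4) for _ in W]    # zero weights exercise edge cases
    if sum(weights) == 0:
        continue
    m = {w: F(weights[w], sum(weights)) for w in W}
    U, V, Vp = random_event(), random_event(), random_event()
    if mu(m, Vp) == 0:
        continue                                   # the proposition assumes mu(V') != 0
    first  = mu(m, U & Vp) == 0 or c(m, V, U & Vp) == c(m, V, Vp)
    second = c(m, U & V, Vp) == c(m, U, Vp) * c(m, V, Vp)
    third  = mu(m, V & Vp) == 0 or c(m, U, V & Vp) == c(m, U, Vp)
    assert first == second == third
print("the three conditions agreed on every sampled case")
```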
In general, independent events can become dependent in the presence of additional information, as the following example shows:
Example 4.2.4
A fair coin is tossed twice. The event that it lands heads on the first toss is independent of the event that it lands heads on the second toss. But these events are no longer independent conditional on the event U that exactly one coin toss lands heads. Conditional on U, the probability that the first toss lands heads is 1/2, and the probability that the second toss lands heads is 1/2, but the probability that both land heads is 0.
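Enumerating the four equally likely worlds confirms this; the encoding below is my own:

```python
from fractions import Fraction as F

# Example 4.2.4: two tosses of a fair coin, four equally likely worlds.
worlds = {("H", "H"): F(1, 4), ("H", "T"): F(1, 4),
          ("T", "H"): F(1, 4), ("T", "T"): F(1, 4)}

def prob(event):
    return sum(p for w, p in worlds.items() if event(w))

def cond(event, given):
    """prob(event | given); `given` is assumed to have positive probability."""
    return prob(lambda w: event(w) and given(w)) / prob(given)

first_h  = lambda w: w[0] == "H"
second_h = lambda w: w[1] == "H"
one_head = lambda w: (w[0] == "H") != (w[1] == "H")   # the event U

print(cond(first_h, one_head))                               # 1/2
print(cond(second_h, one_head))                              # 1/2
print(cond(lambda w: first_h(w) and second_h(w), one_head))  # 0
```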
The following theorem collects some properties of conditional independence:
Theorem 4.2.5
For all probability measures μ on W, the following properties hold for all subsets U, V, and V′ of W:
CI1[μ]. If Iμ(U, V | V′) then Iμ(V, U | V′).
CI2[μ]. Iμ(U, W | V′).
CI3[μ]. If Iμ(U, V | V′) then Iμ(U, V̄ | V′).
CI4[μ]. If V1 ∩ V2 = ∅, Iμ(U, V1 | V′), and Iμ(U, V2 | V′), then Iμ(U, V1 ∪ V2 | V′).
CI5[μ]. Iμ(U, V | V′) iff Iμ(U, V ∩ V′ | V′).
Proof See Exercise 4.4.
I omit the parenthetical μ in CI1–5 when it is clear from context or plays no significant role. CI1 says that conditional independence is symmetric; this is almost immediate from the definition. CI2 says that the whole space W is independent of every other set, conditional on any set. This seems reasonable: no matter what is learned, the probability of the whole space is still 1. CI3 says that if U is conditionally independent of V, then it is also conditionally independent of the complement of V; if V is unrelated to U given V′, then so is V̄. CI4 says that if each of two disjoint sets V1 and V2 is independent of U given V′, then so is their union. Finally, CI5 says that when determining independence conditional on V′, all that matters is the relativization of all events to V′. Each of these properties is purely qualitative; no mention is made of numbers.
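Although the proof is deferred to Exercise 4.4, CI1–5 can also be spot-checked numerically. The following sketch reuses the cond_indep test from the Definition 4.2.2 sketch above and asserts each property over many randomly generated measures and events (the sampling scheme is my own); again, passing is a sanity check, not a proof:

```python
import random
from fractions import Fraction as F

def mu(m, U):
    return sum((m[w] for w in U), F(0))

def cond_indep(m, U, V, Vp):
    def c(A, B):    # mu(A | B), only evaluated when mu(B) != 0
        return mu(m, A & B) / mu(m, B)
    return (mu(m, V & Vp) == 0 or c(U, V & Vp) == c(U, Vp)) and \
           (mu(m, U & Vp) == 0 or c(V, U & Vp) == c(V, Vp))

W = set(range(6))

def random_event():
    return {w for w in W if random.random() < 0.5}

for _ in range(2000):
    weights = [random.randint(0, 4) for _ in W]
    if sum(weights) == 0:
        continue
    m = {w: F(weights[w], sum(weights)) for w in W}
    U, V, Vp = random_event(), random_event(), random_event()
    V1 = random_event()
    V2 = random_event() - V1                        # disjoint from V1 by construction
    if cond_indep(m, U, V, Vp):
        assert cond_indep(m, V, U, Vp)              # CI1: symmetry
        assert cond_indep(m, U, W - V, Vp)          # CI3: complement
    assert cond_indep(m, U, W, Vp)                  # CI2: the whole space
    if cond_indep(m, U, V1, Vp) and cond_indep(m, U, V2, Vp):
        assert cond_indep(m, U, V1 | V2, Vp)        # CI4: disjoint union
    assert cond_indep(m, U, V, Vp) == cond_indep(m, U, V & Vp, Vp)   # CI5
print("CI1-5 held on every sampled case")
```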