2.8 Plausibility Measures

I conclude this chapter by considering an approach that is a generalization of all the approaches mentioned so far. This approach uses what are called plausibility measures, which are unfortunately not the same as the plausibility functions used in the Dempster-Shafer approach (although plausibility functions are instances of plausibility measures). I hope that the reader will be able to sort through any confusion caused by this overloading of terminology.

The basic idea behind plausibility measures is straightforward. A probability measure maps sets in an algebra ℱ over a set W of worlds to [0, 1]. A plausibility measure is more general; it maps sets in ℱ to some arbitrary partially ordered set. If Pl is a plausibility measure, Pl(U) denotes the plausibility of U. If Pl(U) ≤ Pl(V), then V is at least as plausible as U. Because the ordering is partial, the plausibilities of two different sets may be incomparable; an agent may not be prepared to order two sets in terms of plausibility.

Formally, a plausibility space is a tuple S = (W, ℱ, Pl), where W is a set of worlds, ℱ is an algebra over W, and Pl maps sets in ℱ to some set D of plausibility values partially ordered by a relation ≤_D (so that ≤_D is reflexive, transitive, and antisymmetric). D is assumed to contain two special elements, ⊤_D and ⊥_D, such that ⊥_D ≤_D d ≤_D ⊤_D for all d ∈ D. As usual, the ordering <_D is defined by taking d₁ <_D d₂ if d₁ ≤_D d₂ and d₁ ≠ d₂. I omit the subscript D from ≤_D, <_D, ⊤_D, and ⊥_D whenever it is clear from context.

There are three requirements on plausibility measures. The first two just enforce the standard convention that the empty set gets the minimum plausibility (⊥) and the whole space gets the maximum plausibility (⊤). The third requirement says that a set must be at least as plausible as any of its subsets; that is, plausibility respects subsets.

Pl1. Pl(∅) = ⊥.
Pl2. Pl(W) = ⊤.
Pl3. If U ⊆ V, then Pl(U) ≤ Pl(V).
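For a finite W with the algebra 2^W, Pl1–3 can be checked mechanically. The following is a minimal Python sketch under that finiteness assumption; the helper names (`powerset`, `satisfies_pl_axioms`) and the example weights are illustrative choices, not anything defined in the text. The order on plausibility values is passed in as a function, so the same checker works for any partially ordered D.

```python
from fractions import Fraction
from itertools import chain, combinations


def powerset(worlds):
    """All subsets of a finite set of worlds, as frozensets (the algebra 2^W)."""
    ws = list(worlds)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(ws, r) for r in range(len(ws) + 1))]


def satisfies_pl_axioms(worlds, pl, leq, bottom, top):
    """Check Pl1-Pl3 for a function pl defined on every subset of a finite W.

    pl maps a frozenset to a plausibility value, leq(d1, d2) is the partial
    order on values, and bottom/top are the designated elements of D.
    """
    algebra = powerset(worlds)
    pl1 = pl(frozenset()) == bottom                     # Pl(∅) = ⊥
    pl2 = pl(frozenset(worlds)) == top                  # Pl(W) = ⊤
    pl3 = all(leq(pl(u), pl(v))                         # U ⊆ V ⇒ Pl(U) ≤ Pl(V)
              for u in algebra for v in algebra if u <= v)
    return pl1 and pl2 and pl3


# Hypothetical example: a probability measure on W = {a, b, c}, D = [0, 1].
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}


def prob(u):
    return sum((weights[w] for w in u), Fraction(0))


print(satisfies_pl_axioms(weights, prob, lambda x, y: x <= y, 0, 1))  # True
```

Exact fractions are used so that the comparison Pl(W) = ⊤ is not thrown off by floating-point rounding.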

Clearly probability measures, lower and upper probabilities, inner and outer measures, Dempster-Shafer belief functions and plausibility functions, and possibility and necessity measures are all instances of plausibility measures, where D = [0, 1], ⊥ = 0, ⊤ = 1, and ≤_D is the standard ordering on the reals. Ranking functions are also instances of plausibility measures; in this case, D = ℕ* (= ℕ ∪ {∞}), ⊥ = ∞, ⊤ = 0, and the ordering ≤_ℕ* is the opposite of the standard ordering on ℕ*; that is, x ≤_ℕ* y if and only if y ≤ x under the standard ordering.
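The reversed order for ranking functions is easy to get backwards, so here is a small Python sketch that checks Pl1–3 for a ranking function on a finite W. It assumes the usual construction κ(U) = min of κ(w) over w ∈ U with κ(∅) = ∞; the world names and ranks are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical ranks of individual worlds; some world must have rank 0.
world_rank = {"normal": 0, "odd": 1, "bizarre": 3}


def kappa(u):
    """kappa(U): the least rank of a world in U, and infinity for the empty set."""
    return min((world_rank[w] for w in u), default=math.inf)


def leq_rank(x, y):
    """The plausibility order on N*: x <= y iff y <= x in the standard order."""
    return y <= x


BOTTOM, TOP = math.inf, 0
W = frozenset(world_rank)
subsets = [frozenset(c) for r in range(len(W) + 1) for c in combinations(W, r)]

pl1 = kappa(frozenset()) == BOTTOM
pl2 = kappa(W) == TOP
pl3 = all(leq_rank(kappa(u), kappa(v)) for u in subsets for v in subsets if u <= v)
print(pl1, pl2, pl3)  # True True True
```

Under this order a higher rank means lower plausibility. For example, `leq_rank(kappa({"bizarre"}), kappa({"odd"}))` holds, so {bizarre} is at most as plausible as {odd}.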

In all these cases, the plausibility values are totally ordered. But there are also cases of interest where the plausibility values are not totally ordered. Two examples are given by starting with a partial preorder ≽ on W as in Section 2.7. The partial preorders ≽^e and ≽^s derived from ≽ can be used to define plausibility measures, although there is a minor but subtle issue. Given ≽, consider the plausibility space (W, 2^W, Pl_≽^e). Roughly speaking, Pl_≽^e is the identity, and Pl_≽^e(U) ≥ Pl_≽^e(V) iff U ≽^e V. There is only one problem with this: the set of plausibility values is supposed to be partially ordered, not just preordered, and ≽^e is in general only a preorder on 2^W.

One obvious way around this problem is to allow the order ≤_D of plausibility values to be a preorder rather than a partial order. There would be no conceptual difficulty in doing this, and in fact I do it (briefly) for technical reasons in Section 5.4.3. I have restricted attention to partial orders here partly to be consistent with the literature and partly because there seems to be an intuition that if the likelihood of U is at least as great as that of V, and the likelihood of V is at least as great as that of U, then U and V have equal likelihoods. In any case, the particular problem of capturing ≽^e using plausibility measures can easily be solved. Define an equivalence relation ~ on 2^W by taking U ~ V if U ≽^e V and V ≽^e U. Let [U] consist of all the sets equivalent to U; that is, [U] = {U′ : U ~ U′}. Let 2^W/~ = {[U] : U ⊆ W}. Define a partial order on 2^W/~ in the obvious way: [U] ≥ [V] iff U ≽^e V. It is easy to check that this order on 2^W/~ is well-defined and makes 2^W/~ a partially ordered set (Exercise 2.54). Now taking D = 2^W/~ and defining Pl_≽^e(U) = [U] gives a well-defined plausibility measure. Exactly the same technique works for ≽^s.
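The quotient construction can be sketched in Python for a small example. This sketch assumes the Section 2.7 reading that U ≽^e V holds iff every v ∈ V is matched by some u ∈ U with u ≽ v. It also uses a hypothetical total preorder w1 ≽ w2 ≽ w3. The code first exhibits two distinct sets that ≽^e cannot distinguish, then groups 2^W into classes [U]; the names `geq_e`, `pl_e`, and `leq_class` are illustrative only.

```python
from itertools import combinations

# Hypothetical preorder on W: w1 ≽ w2 ≽ w3, encoded by numeric heights.
height = {"w1": 3, "w2": 2, "w3": 1}
W = list(height)


def geq(u, v):
    """u ≽ v for individual worlds."""
    return height[u] >= height[v]


def geq_e(U, V):
    """U ≽^e V, assumed to mean: every v in V has some u in U with u ≽ v."""
    return all(any(geq(u, v) for u in U) for v in V)


def equivalent(U, V):
    """U ~ V iff U ≽^e V and V ≽^e U."""
    return geq_e(U, V) and geq_e(V, U)


subsets = [frozenset(c) for r in range(len(W) + 1) for c in combinations(W, r)]

# Group 2^W into equivalence classes; comparing with one representative
# suffices because ~ is an equivalence relation.
classes = []
for U in subsets:
    for cls in classes:
        if equivalent(U, next(iter(cls))):
            cls.add(U)
            break
    else:
        classes.append({U})


def pl_e(U):
    """Pl(U) = [U], the class of U, as a hashable frozenset of frozensets."""
    return frozenset(next(cls for cls in classes if U in cls))


def leq_class(c1, c2):
    """[V] <= [U] iff U ≽^e V, tested on arbitrary representatives."""
    return geq_e(next(iter(c2)), next(iter(c1)))


print(len(classes))  # 4
print(pl_e(frozenset({"w1"})) == pl_e(frozenset({"w1", "w2"})))  # True
```

Here {w1} and {w1, w2} are different sets with U ≽^e V in both directions, which is exactly why ≽^e alone is not antisymmetric. After passing to classes, they receive the same plausibility value.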

For a perhaps more interesting example, suppose that 풫 is a set of probability measures on W. Both 풫_* and 풫^* give a way of comparing the likelihood of two subsets U and V of W. These two ways are incomparable; it is easy to find a set 풫 of probability measures on W and subsets U and V of W such that 풫_*(U) < 풫_*(V) and 풫^*(U) > 풫^*(V) (Exercise 2.55(a)). Rather than choosing between 풫_* and 풫^*, it is possible to associate a different plausibility measure with 풫 that captures both. Let D_int = {(a, b) : 0 ≤ a ≤ b ≤ 1} (the int is for interval) and define (a, b) ≤ (a′, b′) iff a ≤ a′ and b ≤ b′. This puts a partial order on D_int, with ⊥_{D_int} = (0, 0) and ⊤_{D_int} = (1, 1). Define Pl_{풫_*,풫^*}(U) = (풫_*(U), 풫^*(U)). Thus, Pl_{풫_*,풫^*} associates with a set U two numbers that can be thought of as defining an interval in terms of the lower and upper probability of U. It is easy to check that Pl_{풫_*,풫^*}(U) ≤ Pl_{풫_*,풫^*}(V) if the upper probability of U is less than or equal to the lower probability of V, since then 풫_*(U) ≤ 풫^*(U) ≤ 풫_*(V) ≤ 풫^*(V). Clearly Pl_{풫_*,풫^*} satisfies Pl1–3, so it is indeed a plausibility measure, but one that puts only a partial (pre)order on events. A similar plausibility measure can be associated with a belief/plausibility function and with an inner/outer measure.
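For a finite set 풫, lower and upper probabilities are just a minimum and a maximum, so the claim that they can rank two events in opposite directions is easy to check directly. The sketch below uses two hypothetical measures on W = {a, b, c} and computes the pair (풫_*(U), 풫^*(U)) that the interval-valued measure assigns to each event. The function names are illustrative.

```python
from fractions import Fraction as F

# Two hypothetical probability measures on W = {a, b, c}.
mu1 = {"a": F(1, 10), "b": F(4, 10), "c": F(5, 10)}
mu2 = {"a": F(6, 10), "b": F(3, 10), "c": F(1, 10)}
P = [mu1, mu2]


def prob(mu, event):
    return sum((mu[w] for w in event), F(0))


def lower(event):
    """P_*(U): the infimum of mu(U) over P, a minimum since P is finite."""
    return min(prob(mu, event) for mu in P)


def upper(event):
    """P^*(U): the supremum of mu(U) over P."""
    return max(prob(mu, event) for mu in P)


def pl_interval(event):
    """The pair (P_*(U), P^*(U)) assigned to U."""
    return (lower(event), upper(event))


U, V = {"a"}, {"b"}
print(pl_interval(U))  # (Fraction(1, 10), Fraction(3, 5))
print(pl_interval(V))  # (Fraction(3, 10), Fraction(2, 5))
```

The lower probability ranks V above U (1/10 < 3/10), while the upper probability ranks U above V (3/5 > 2/5).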

The trouble with 풫_*, 풫^*, and even Pl_{풫_*,풫^*} is that they lose information. Example 2.3.4 gives one instance of this phenomenon; the fact that μ(b) = μ(y) for every measure μ ∈ 풫₄ is lost by taking lower and upper probabilities. It is easy to generate other examples. For example, it is not hard to find a set 풫 of probability measures and subsets U, V of W such that μ(U) ≤ μ(V) for all μ ∈ 풫 and μ(U) < μ(V) for some μ ∈ 풫, but 풫_*(U) = 풫_*(V) and 풫^*(U) = 풫^*(V). Indeed, there exists an infinite set 풫 of probability measures such that μ(U) < μ(V) for all μ ∈ 풫 but 풫_*(U) = 풫_*(V) and 풫^*(U) = 풫^*(V) (Exercise 2.55(b)). If all the probability measures in 풫 agree that U is less likely than V, it seems reasonable to conclude that U is less likely than V. However, none of the plausibility measures 풫_*, 풫^*, or Pl_{풫_*,풫^*} will necessarily draw this conclusion.
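Here is one concrete finite instance of the first claim, as a Python sketch with hypothetical numbers. It is not the infinite construction of Exercise 2.55(b). Every measure weakly prefers V = {v} to U = {u}, and one strictly prefers it, yet the lower and upper probabilities of U and V coincide.

```python
from fractions import Fraction as F

# Three hypothetical measures on W = {u, v, w}; U = {u} and V = {v}.
P = [
    {"u": F(2, 10), "v": F(2, 10), "w": F(6, 10)},
    {"u": F(3, 10), "v": F(4, 10), "w": F(3, 10)},
    {"u": F(5, 10), "v": F(5, 10), "w": F(0)},
]
U, V = {"u"}, {"v"}


def prob(mu, event):
    return sum((mu[w] for w in event), F(0))


def lower(event):
    return min(prob(mu, event) for mu in P)


def upper(event):
    return max(prob(mu, event) for mu in P)


every_measure_weakly_prefers_V = all(prob(mu, U) <= prob(mu, V) for mu in P)
some_measure_strictly_prefers_V = any(prob(mu, U) < prob(mu, V) for mu in P)

print(every_measure_weakly_prefers_V, some_measure_strictly_prefers_V)  # True True
print(lower(U) == lower(V), upper(U) == upper(V))  # True True
```

The second measure is the only one that separates U and V, and its values (3/10 and 4/10) lie strictly between the extremes, so taking the minimum and the maximum discards it.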

Fortunately, it is not hard to associate yet another plausibility measure with 풫 that does not lose this important information (and does indeed conclude that U is less likely than V).

To explain this representation, it is easiest to consider first the case that 풫 is finite. Suppose 풫 = {μ₁, …, μ_n}. Then the idea is to define Pl_풫(U) = (μ₁(U), …, μ_n(U)). That is, the plausibility of a set U is represented as a tuple, consisting of the probability of U according to each measure in 풫. The ordering on tuples is pointwise: (a₁, …, a_n) ≤ (b₁, …, b_n) if a_i ≤ b_i for i = 1, …, n. There are two minor problems with this approach, both easily fixed. The first is that a set is unordered. Although the subscripts suggest that μ₁ is the "first" element in 풫, there is no first element in 풫. On the other hand, there really is a first element in a tuple. Which probability measure in 풫 should be first, second, and so on? The second problem arises if 풫 is uncountable: it is not clear how to represent the set of measures in 풫 as a tuple.

These problems can be dealt with in a straightforward way. Let D_풫 consist of all functions from 풫 to [0, 1]. The standard pointwise ordering on functions (that is, f ≤ g if f(μ) ≤ g(μ) for all μ ∈ 풫) gives a partial order on D_풫. Note that ⊥_{D_풫} is the function f : 풫 → [0, 1] such that f(μ) = 0 for all μ ∈ 풫, and ⊤_{D_풫} is the function g such that g(μ) = 1 for all μ ∈ 풫. For U ⊆ W, let f_U be the function such that f_U(μ) = μ(U) for all μ ∈ 풫. Define the plausibility measure Pl_풫 by taking Pl_풫(U) = f_U. Thus, Pl_풫(U) ≤ Pl_풫(V) iff μ(U) ≤ μ(V) for all μ ∈ 풫. Clearly Pl_풫 satisfies Pl1–3. Pl1 and Pl2 follow since Pl_풫(∅) = f_∅ = ⊥_{D_풫} and Pl_풫(W) = f_W = ⊤_{D_풫}, while Pl3 holds because if U ⊆ V, then μ(U) ≤ μ(V) for all μ ∈ 풫. Note that if 풫 = {μ₁, …, μ_n}, then Pl_풫(U) is the function f such that f(μ_i) = μ_i(U). This function can be identified with the tuple (μ₁(U), …, μ_n(U)).
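For a finite 풫, the function f_U can be stored as a dictionary keyed by names for the measures. This sidesteps the question of which measure comes "first", because a dictionary imposes no meaningful order on its keys for the comparison. The following is a minimal Python sketch with hypothetical measure names and values. Here U = {u} and V = {v}; every measure gives U at most the probability of V, and one measure gives it strictly less.

```python
from fractions import Fraction as F

# A finite set of measures on W = {u, v, w}, indexed by hypothetical names.
P = {
    "mu1": {"u": F(2, 10), "v": F(2, 10), "w": F(6, 10)},
    "mu2": {"u": F(3, 10), "v": F(4, 10), "w": F(3, 10)},
    "mu3": {"u": F(5, 10), "v": F(5, 10), "w": F(0)},
}
W = frozenset("uvw")


def pl_P(event):
    """Pl_P(U) = f_U, the function mu |-> mu(U), as a dict from names to values."""
    return {name: sum((mu[w] for w in event), F(0)) for name, mu in P.items()}


def leq(f, g):
    """Pointwise order on D_P: f <= g iff f(mu) <= g(mu) for every mu in P."""
    return all(f[name] <= g[name] for name in P)


bottom = {name: F(0) for name in P}  # the constant-0 function
top = {name: F(1) for name in P}     # the constant-1 function

f_U, f_V = pl_P({"u"}), pl_P({"v"})
print(leq(f_U, f_V), leq(f_V, f_U))  # True False
```

So Pl_풫(U) < Pl_풫(V), even though the lower and upper probabilities of U and V coincide for these three measures.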

To see how this representation works, consider Example 2.3.2 (the example with a bag of red, blue, and yellow marbles). Recall that this was modeled using the set 풫₂ = {μ_a: a ∊ [0,.7]} of probabilities, where μ_a(red) = .3, μ_a(blue) = a, and μ_a(yellow) = .7 − a. Then, for example, Pl_풫₂ (blue) = f_blue, where f_blue(μ_a) = μ_a(blue) = a for all a ∈ [0, .7]. Similarly,

Pl_풫₂(red) = f_red, where f_red(μ_a) = .3,
Pl_풫₂(yellow) = f_yellow, where f_yellow(μ_a) = .7 − a,
Pl_풫₂({red, blue}) = f_{red,blue}, where f_{red,blue}(μ_a) = .3 + a.

The events yellow and blue are incomparable with respect to Pl_풫₂ since f_yellow and f_blue are incomparable (e.g., f_yellow(μ_.7) < f_blue(μ_.7), while f_yellow(μ₀) > f_blue(μ₀)).
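The functions for Example 2.3.2 can be tabulated in Python. 풫₂ is infinite, so this sketch assumes a finite sample a ∈ {0, .1, …, .7}. The sample still contains the witnesses μ₀ and μ_.7, so the incomparability it finds also holds for the full 풫₂. The names `mu`, `f`, and `leq` are illustrative.

```python
from fractions import Fraction as F

# Finite sample of P2: a ranges over 0, 0.1, ..., 0.7 (P2 itself uses all of [0, .7]).
A = [F(k, 10) for k in range(8)]


def mu(a):
    """mu_a from Example 2.3.2: red .3, blue a, yellow .7 - a."""
    return {"red": F(3, 10), "blue": a, "yellow": F(7, 10) - a}


def f(event):
    """f_event(mu_a) = mu_a(event), tabulated over the sampled values of a."""
    return {a: sum(mu(a)[c] for c in event) for a in A}


def leq(f1, f2):
    """Pointwise order over the sampled measures."""
    return all(f1[a] <= f2[a] for a in A)


f_blue, f_yellow = f({"blue"}), f({"yellow"})
print(leq(f_blue, f_yellow), leq(f_yellow, f_blue))  # False False
print(f({"red", "blue"})[F(2, 10)])  # 1/2
```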

On the other hand, consider the sets 풫₃ and 풫₄ from Example 2.3.4. Recall that 풫₃ = {μ : μ(blue) ≤ .5, μ(yellow) ≤ .5} and 풫₄ = {μ : μ(b) = μ(y)}. It is easy to check that Pl_풫₄(blue) = Pl_풫₄(yellow), since f_blue(μ) = μ(b) = μ(y) = f_yellow(μ) for every μ ∈ 풫₄, while Pl_풫₃(blue) and Pl_풫₃(yellow) are incomparable.
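The contrast between 풫₃ and 풫₄ can be sketched in Python. The sketch assumes that W = {r, b, y} and that Example 2.3.4 places no constraints beyond the ones recalled above. It approximates each set by the measures whose values are multiples of 1/10 and represents Pl_풫 by a tuple over that fixed list. Both witnesses used for incomparability, (r, b, y) = (.5, .5, 0) and (.5, 0, .5), belong to 풫₃ itself under that assumption.

```python
from fractions import Fraction as F

# All measures on the assumed W = {r, b, y} whose values are multiples of 1/10.
grid = [
    {"r": F(10 - i - j, 10), "b": F(i, 10), "y": F(j, 10)}
    for i in range(11) for j in range(11 - i)
]
P3 = [mu for mu in grid if mu["b"] <= F(1, 2) and mu["y"] <= F(1, 2)]
P4 = [mu for mu in grid if mu["b"] == mu["y"]]


def pl(P, world):
    """Tuple (mu({world}) for mu in P): Pl_P({world}) on the sampled P."""
    return tuple(mu[world] for mu in P)


def leq(x, y):
    """Pointwise order on tuples of equal length."""
    return all(a <= b for a, b in zip(x, y))


print(pl(P4, "b") == pl(P4, "y"))  # True
blue3, yellow3 = pl(P3, "b"), pl(P3, "y")
print(leq(blue3, yellow3), leq(yellow3, blue3))  # False False
```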

This technique for defining a plausibility measure that represents a set of probability measures is quite general. The same approach can be used essentially without change to represent any set of plausibility measures as a single plausibility measure.
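One way to read "essentially without change" is to index by the plausibility measures themselves and compare each coordinate using that measure's own order. The Python sketch below applies this reading to a possibility measure (values in [0, 1], usual order) and a ranking function (values in ℕ*, reversed order). The world data are hypothetical. The combined ⊥ and ⊤ are the tuples of the component ⊥s and ⊤s.

```python
import math
from fractions import Fraction as F
from itertools import combinations

W = ["w1", "w2", "w3"]
# Hypothetical world-level data for two measures with different value sets.
poss_of_world = {"w1": F(1), "w2": F(1, 2), "w3": F(1, 4)}
rank_of_world = {"w1": 1, "w2": 0, "w3": 2}


def poss(u):
    """Possibility measure: max over worlds, 0 on the empty set."""
    return max((poss_of_world[w] for w in u), default=F(0))


def kappa(u):
    """Ranking function: min over worlds, infinity on the empty set."""
    return min((rank_of_world[w] for w in u), default=math.inf)


# Each component: (measure, its own order <=_D_i, bottom_i, top_i).
components = [
    (poss, lambda x, y: x <= y, F(0), F(1)),
    (kappa, lambda x, y: y <= x, math.inf, 0),
]


def pl_combined(u):
    """Pl(U) is the tuple of component plausibilities, one per measure."""
    return tuple(pl(u) for pl, _, _, _ in components)


def leq_combined(d1, d2):
    """Pointwise order: compare each coordinate with that component's order."""
    return all(leq(a, b) for (_, leq, _, _), a, b in zip(components, d1, d2))


BOTTOM = tuple(b for _, _, b, _ in components)
TOP = tuple(t for _, _, _, t in components)
subsets = [frozenset(c) for r in range(len(W) + 1) for c in combinations(W, r)]

pl3 = all(leq_combined(pl_combined(u), pl_combined(v))
          for u in subsets for v in subsets if u <= v)
print(pl_combined(frozenset()) == BOTTOM, pl_combined(frozenset(W)) == TOP, pl3)  # True True True
d1, d2 = pl_combined({"w1"}), pl_combined({"w2"})
print(leq_combined(d1, d2), leq_combined(d2, d1))  # False False
```

The two components disagree about {w1} and {w2}, so the combined measure leaves them incomparable rather than forcing a choice between the two measures.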

Plausibility measures are very general. Pl1–3 are quite minimal requirements, by design, and arguably are the smallest set of properties that a representation of likelihood should satisfy. It is, of course, possible to add more properties, some of which seem quite natural, but these are typically properties that some representation of uncertainty does not satisfy (see, e.g., Exercise 2.57).

What is the advantage of having this generality? This should become clearer in later chapters, but I can make at least some motivating remarks now. For one thing, by using plausibility measures, it is possible to prove general results about properties of representations of uncertainty. That is, it is possible to show that all representations of uncertainty that have property X also have property Y. If it is clear that, say, possibility measures and ranking functions have property X, then it immediately follows that both have property Y; moreover, if Dempster-Shafer belief functions do not have property X, the proof may well give a deeper understanding as to why belief functions do not have property Y.

For example, it turns out that a great deal of mileage can be gained by assuming that there is some operation ⊕ on the set of plausibility values such that Pl(U ∪ V) = Pl(U) ⊕ Pl(V) if U and V are disjoint. (Unfortunately, the ⊕ discussed here has nothing to do with the ⊕ defined in the context of Dempster's Rule of Combination. I hope that it will be clear from context which version of ⊕ is being used.) If such an ⊕ exists, then Pl is said to be additive (with respect to ⊕). Probability measures, possibility measures, and ranking functions are all additive. In the case of probability measures, ⊕ is +; in the case of possibility measures, it is max; in the case of ranking functions, it is min. For the plausibility measure Pl_풫, ⊕ is essentially pointwise addition (see Section 3.9 for a more careful definition). However, belief functions are not additive; neither are plausibility functions, lower probabilities, or upper probabilities. There exist a set W, a belief function Bel on W, and pairwise disjoint subsets U₁, U₂, V₁, V₂ of W such that Bel(U₁) = Bel(V₁) and Bel(U₂) = Bel(V₂), but Bel(U₁ ∪ U₂) ≠ Bel(V₁ ∪ V₂) (Exercise 2.56). It follows that there cannot be a function ⊕ such that Bel(U ∪ V) = Bel(U) ⊕ Bel(V) for all disjoint U and V; otherwise, Bel(U₁ ∪ U₂) = Bel(U₁) ⊕ Bel(U₂) = Bel(V₁) ⊕ Bel(V₂) = Bel(V₁ ∪ V₂). Similar arguments apply to plausibility functions, lower probabilities, and upper probabilities. Thus, in the most general setting, I do not assume additivity. Plausibility measures are of interest in part because they make it possible to investigate the consequences of assuming additivity. I return to this issue in Sections 3.9, 5.3, and 8.4.
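Additivity can be checked by brute force on a small W, and the same setting gives one possible instance of the belief-function failure. The Python sketch below checks Pl(U ∪ V) = Pl(U) ⊕ Pl(V) over all disjoint pairs for a probability measure (⊕ = +), a possibility measure (⊕ = max), and a ranking function (⊕ = min); the world-level numbers are hypothetical. It then uses a hypothetical mass function with all mass on {a, b}, which is our own choice and not necessarily the example intended by Exercise 2.56. This yields Bel(U₁) = Bel(V₁) and Bel(U₂) = Bel(V₂) but Bel(U₁ ∪ U₂) ≠ Bel(V₁ ∪ V₂).

```python
import math
from fractions import Fraction as F
from itertools import combinations

W = ["a", "b", "c", "d"]
subsets = [frozenset(s) for r in range(len(W) + 1) for s in combinations(W, r)]

# Hypothetical world-level data for three additive measures.
p = {"a": F(1, 10), "b": F(2, 10), "c": F(3, 10), "d": F(4, 10)}
poss_w = {"a": F(1), "b": F(1, 2), "c": F(1, 3), "d": F(0)}
rank_w = {"a": 0, "b": 2, "c": 1, "d": 5}

# Each entry: (measure on subsets, the candidate operation ⊕).
measures = {
    "probability": (lambda u: sum((p[w] for w in u), F(0)), lambda x, y: x + y),
    "possibility": (lambda u: max((poss_w[w] for w in u), default=F(0)), max),
    "ranking": (lambda u: min((rank_w[w] for w in u), default=math.inf), min),
}


def is_additive(pl, oplus):
    """Check Pl(U ∪ V) = Pl(U) ⊕ Pl(V) for every pair of disjoint subsets."""
    return all(pl(u | v) == oplus(pl(u), pl(v))
               for u in subsets for v in subsets if not (u & v))


for name, (pl, oplus) in measures.items():
    print(name, is_additive(pl, oplus))  # each line ends in True

# Belief function from a hypothetical mass function with all mass on {a, b}.
mass = {frozenset({"a", "b"}): F(1)}


def bel(u):
    """Bel(U): total mass of the focal sets contained in U."""
    return sum((m for focal, m in mass.items() if focal <= u), F(0))


U1, U2, V1, V2 = map(frozenset, ({"a"}, {"b"}, {"c"}, {"d"}))
print(bel(U1) == bel(V1), bel(U2) == bel(V2))  # True True
print(bel(U1 | U2), bel(V1 | V2))  # 1 0
```

All four singleton sets get belief 0, but only {a, b} contains the focal set. No value of 0 ⊕ 0 can equal both 1 and 0.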