2.3 Lower and Upper Probabilities



Despite its widespread acceptance, there are some problems in using probability to represent uncertainty. Three of the most serious are (1) probability is not good at representing ignorance, (2) while an agent may be prepared to assign probabilities to some sets, she may not be prepared to assign probabilities to all sets, and (3) while an agent may be willing in principle to assign probabilities to all the sets in some algebra, computing these probabilities requires some computational effort; she may simply not have the computational resources required to do it. These criticisms turn out to be closely related to one of the criticisms of the Dutch book justification for probability mentioned in Section 2.2.1. The following two examples might help clarify the issues.

Example 2.3.1


Suppose that a coin is tossed once. There are two possible worlds, heads and tails, corresponding to the two possible outcomes. If the coin is known to be fair, it seems reasonable to assign probability 1/2 to each of these worlds. However, suppose that the coin has an unknown bias. How should this be represented? One approach might be to continue to take heads and tails as the elementary outcomes and, applying the principle of indifference, assign them both probability 1/2, just as in the case of a fair coin. However, there seems to be a significant qualitative difference between a fair coin and a coin of unknown bias. Is there some way that this difference can be captured? One possibility is to take the bias of the coin to be part of the possible world (i.e., a basic outcome would now describe both the bias of the coin and the outcome of the toss), but then what is the probability of heads?


Example 2.3.2


Suppose that a bag contains 100 marbles; 30 are known to be red, and the remainder are known to be either blue or yellow, although the exact proportion of blue and yellow is not known. What is the likelihood that a marble taken out of the bag is yellow? This can be modeled with three possible worlds, red, blue, and yellow, one for each of the possible outcomes. It seems reasonable to assign probability .3 to the outcome of choosing a red marble, and thus probability .7 to choosing either blue or yellow, but what probability should be assigned to the other two outcomes?


Empirically, it is clear that people do not use probability to represent the uncertainty in examples such as Example 2.3.2. For example, consider the following three bets. In each case a marble is chosen from the bag.

  • Br pays $1 if the marble is red, and 0 otherwise;

  • Bb pays $1 if the marble is blue, and 0 otherwise;

  • By pays $1 if the marble is yellow, and 0 otherwise.

People invariably prefer Br to both Bb and By, and they are indifferent between Bb and By. The fact that they are indifferent between Bb and By suggests that they view it as equally likely that the marble chosen is blue and that it is yellow. This seems reasonable; the problem statement provides no reason to prefer blue to yellow, or vice versa. However, if blue and yellow are equally probable, then the probability of drawing a blue marble and that of drawing a yellow marble are both .35, which suggests that By and Bb should both be preferred to Br. Moreover, any way of ascribing probability to blue and yellow either makes choosing a blue marble more likely than choosing a red marble, or makes choosing a yellow marble more likely than choosing a red marble (or both). This suggests that at least one of Bb and By should be preferred to Br, which is simply not what the experimental evidence shows.
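The arithmetic behind this observation is easy to check mechanically. The following sketch (mine, not from the text) enumerates the possible assignments of probability to blue, with red fixed at .3, and confirms that red is never the strictly most probable color:

```python
def best_single_color_bet(p_blue):
    """Return the color whose $1 bet has the highest expected payoff,
    given P(red) = .3, P(blue) = p_blue, P(yellow) = .7 - p_blue."""
    probs = {"red": 0.3, "blue": p_blue, "yellow": 0.7 - p_blue}
    return max(probs, key=probs.get)

# Since P(blue) + P(yellow) = .7, one of them is always at least .35 > .3,
# so no assignment of probabilities ever makes the bet on red the best one.
assert all(best_single_color_bet(n / 100) != "red" for n in range(71))
```

Whatever probability between 0 and .7 is assigned to blue, the best bet is blue or yellow, never red; yet people prefer Br.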

There are a number of ways of representing the uncertainty in these examples. As suggested in Example 2.3.1, it is possible to make the uncertainty about the bias of the coin part of the possible world. A possible world would then be a pair (a, X), where a ∈ [0, 1] and X ∈ {H, T}. Thus, for example, (1/3, H) is the world where the coin has bias 1/3 and lands heads. (Recall that the bias of a coin is the probability that it lands heads.) The problem with this approach (besides the fact that there are an uncountable number of worlds, although that is not a serious problem) is that it is not clear how to put a probability measure on the whole space, since there is no probability given on the coin having, say, bias in [1/3, 2/3]. The space can be partitioned into subspaces Wa, a ∈ [0, 1], where Wa consists of the two worlds (a, H) and (a, T). On Wa, there is an obvious probability measure μa: μa(a, H) = a and μa(a, T) = 1 − a. This just says that in a world in Wa (where the bias of the coin is a), the probability of heads is a and the probability of tails is 1 − a. For example, in the world (1/3, H), the probability measure used is μ1/3, which is defined on just (1/3, H) and (1/3, T); all the other worlds are ignored. The probability of heads at (1/3, H) is taken to be 1/3. This is just the probability of (1/3, H), since {(1/3, H)} is the intersection of the event "the coin lands heads" (i.e., all worlds of the form (a, H)) with W1/3.

This is an instance of an approach that will be examined in more detail in Sections 3.4 and 6.9. Rather than there being a global probability measure on the whole space, the space W is partitioned into subsets Wi, i ∈ I. (In this case, I = [0, 1].) On each subset Wi, there is a separate probability measure μi that is used for the worlds in that subset. The probability of an event U at a world in Wi is μi(Wi ∩ U).

For Example 2.3.2, the worlds would have the form (n, X), where X ∈ {red, blue, yellow} and n ∈ {0, …, 70}. (Think of n as representing the number of blue marbles.) In the subset Wn = {(n, red), (n, blue), (n, yellow)}, the world (n, red) has probability .3, (n, blue) has probability n/100, and (n, yellow) has probability (70 − n)/100. Thus, the probability of red is known to be .3; this is a fact true at every world (even though a different probability measure may be used at different worlds). Similarly, the probability of blue is known to be between 0 and .7, as is the probability of yellow. The probability of blue may be .3, but this is not known.

An advantage of this approach is that it allows a smooth transition to the purely probabilistic case. Suppose, for example, that a probability on the number of blue marbles is given. That amounts to putting a probability on the sets Wn, since Wn corresponds to the event that there are n blue marbles. If the probability of Wn is, say, bn, where Σ_{n=0}^{70} bn = 1, then the probability of (n, blue) is bn(n/100). In this way, a probability measure μ on the whole space W can be defined. The original probability μn on Wn is the result of conditioning μ on Wn. (I am assuming that readers are familiar with conditional probability; it is discussed in much more detail in Chapter 3.)
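As a concrete instance, the construction can be sketched with a hypothetical uniform prior b_n over n = 0, …, 70 (the choice of prior is mine, purely for illustration); the global measure is then obtained by summing out n:

```python
# Hypothetical uniform prior on the number n of blue marbles.
b = {n: 1 / 71 for n in range(71)}

# Global measure by total probability: P(blue) = sum_n b_n * (n/100),
# and similarly for yellow; red gets .3 in every subspace W_n.
p_blue = sum(b[n] * n / 100 for n in range(71))
p_yellow = sum(b[n] * (70 - n) / 100 for n in range(71))
p_red = 0.3

assert abs(p_red + p_blue + p_yellow - 1) < 1e-9
assert abs(p_blue - 0.35) < 1e-9  # under this prior, blue and yellow each get .35
```

Conditioning this global measure on W_n recovers the original μ_n, as the text says.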

This approach turns out to be quite fruitful. However, for now, I focus on two other approaches that do not involve extending the set of possible worlds. The first approach, which has been thoroughly studied in the literature, is quite natural: simply represent uncertainty using not just one probability measure, but a set of them. For example, in the case of the coin with unknown bias, the uncertainty can be represented using the set 𝒫1 = {μa : a ∈ [0, 1]} of probability measures, where μa gives heads probability a. Similarly, in the case of the marbles, the uncertainty can be represented using the set 𝒫2 = {μ′a : a ∈ [0, .7]}, where μ′a gives red probability .3, blue probability a, and yellow probability .7 − a. (I could restrict a to having the form n/100, for n ∈ {0, …, 70}, but it turns out to be a little more convenient in the later discussion not to make this restriction.)

A set 𝒫 of probability measures, all defined on a set W, can be represented as a single space W𝒫 = {(μ, w) : μ ∈ 𝒫, w ∈ W}. This space can be partitioned into subspaces Wμ, for μ ∈ 𝒫, where Wμ = {(μ, w) : w ∈ W}. On the subspace Wμ, the probability measure μ is used. This, of course, is an instance of the first approach discussed in this section. The first approach is actually somewhat more general. Here I am assuming that the space has the form A × B, where the elements of A define the partition, so that there is a probability measure μa on {a} × B for each a ∈ A. This type of space arises in many applications (see Section 3.4).

The last approach I consider in this section is to make only some sets measurable. Intuitively, the measurable sets are the ones to which a probability can be assigned. For example, in the case of the coin, the algebra might consist only of the empty set and {heads, tails}, so that {heads} and {tails} are no longer measurable sets. Clearly, there is only one probability measure on this space; for future reference, call it μ1. By considering this trivial algebra, there is no need to assign a probability to {heads} or {tails}.

Similarly, in the case of the marbles, consider the algebra

{∅, {red}, {blue, yellow}, {red, blue, yellow}}.

There is an obvious probability measure μ2 on this algebra that describes the story in Example 2.3.2: simply take μ2(red) = .3. That determines all the other probabilities.

Notice that, with the first approach, in the case of the marbles, the probability of red is .3 (since all probability measures in 𝒫2 give red probability .3), but all that can be said about the probability of blue is that it is somewhere between 0 and .7 (since that is the range of possible probabilities for blue according to the probability measures in 𝒫2), and similarly for yellow. There is a sense in which the second approach also gives this answer: any probability for blue between 0 and .7 is compatible with the probability measure μ2. Similarly, in the case of the coin with an unknown bias, all that can be said about the probability of heads is that it is somewhere between 0 and 1.

Recasting these examples in terms of the Dutch book argument, the fact that, for example, all that can be said about the probability of the marble being blue is that it is between 0 and .7 corresponds to the agent definitely preferring the bet (U̅, 1 − α) to (U, α) for α > .7, where U is the event of choosing a blue marble, but not being able to choose between the two bets for 0 ≤ α ≤ .7. In fact, the Dutch book justification for probability given in Theorem 2.2.3 can be recast to provide a justification for using sets of probabilities. Interestingly, with sets of probabilities, RAT3 no longer holds. The agent may not always be able to decide which of (U, α) and (U̅, 1 − α) she prefers.

Given a set 𝒫 of probability measures, all defined on an algebra ℱ over a set W, and U ∈ ℱ, define

𝒫_*(U) = inf{μ(U) : μ ∈ 𝒫} and 𝒫^*(U) = sup{μ(U) : μ ∈ 𝒫}.

𝒫_*(U) is called the lower probability of U, and 𝒫^*(U) is called the upper probability of U. For example, (𝒫2)_*(blue) = 0, (𝒫2)^*(blue) = .7, and similarly for yellow, while (𝒫2)_*(red) = (𝒫2)^*(red) = .3.
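In code, the inf and sup defining lower and upper probability become min and max once the set of measures is finite; the grid below is my finite stand-in for 𝒫2, in steps of .01:

```python
def lower(P, U):
    """Lower probability: the infimum of mu(U) over the measures in P."""
    return min(sum(mu[w] for w in U) for mu in P)

def upper(P, U):
    """Upper probability: the supremum of mu(U) over the measures in P."""
    return max(sum(mu[w] for w in U) for mu in P)

# Finite approximation of P2: red gets .3, blue gets a, yellow gets .7 - a.
P2 = [{"red": 0.3, "blue": a / 100, "yellow": 0.7 - a / 100} for a in range(71)]

assert lower(P2, {"blue"}) == 0.0 and upper(P2, {"blue"}) == 0.7
assert lower(P2, {"red"}) == upper(P2, {"red"}) == 0.3
```

Red's lower and upper probabilities coincide at .3, while blue's span the whole interval [0, .7], exactly as in the text.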

Now consider the approach of taking only some subsets to be measurable. An algebra ℱ′ is a subalgebra of an algebra ℱ if ℱ′ ⊆ ℱ. If ℱ′ is a subalgebra of ℱ, μ is a probability measure on ℱ′, and μ′ is a probability measure on ℱ, then μ′ is an extension of μ if μ and μ′ agree on all sets in ℱ′. Notice that 𝒫1 consists of all the extensions of μ1 to the algebra consisting of all subsets of {heads, tails}, and 𝒫2 consists of all extensions of μ2 to the algebra of all subsets of {red, blue, yellow}.

If μ is a probability measure on the subalgebra ℱ′ and U ∈ ℱ − ℱ′, then μ(U) is undefined, since U is not in the domain of μ. There are two standard ways of extending μ to ℱ, by defining functions μ_* and μ^*, traditionally called the inner measure and outer measure induced by μ, respectively. For U ∈ ℱ, define

μ_*(U) = sup{μ(V) : V ⊆ U, V ∈ ℱ′};
μ^*(U) = inf{μ(V) : V ⊇ U, V ∈ ℱ′}.
These definitions are perhaps best understood in the case where the set of possible worlds (and hence the algebra ℱ) is finite. In that case, μ_*(U) is the measure of the largest measurable set (in ℱ′) contained in U, and μ^*(U) is the measure of the smallest measurable set containing U. That is, μ_*(U) = μ(V1), where V1 = ⋃{B ∈ ℱ′ : B ⊆ U}, and μ^*(U) = μ(V2), where V2 = ⋂{B ∈ ℱ′ : U ⊆ B} (Exercise 2.7). Intuitively, μ_*(U) is the best approximation to the actual probability of U from below and μ^*(U) is the best approximation from above. If U ∈ ℱ′, then it is easy to see that μ_*(U) = μ^*(U) = μ(U). If U ∉ ℱ′ then, in general, μ_*(U) < μ^*(U). For example, (μ2)_*(blue) = 0 and (μ2)^*(blue) = .7, since the largest measurable set contained in {blue} is the empty set, while the smallest measurable set containing {blue} is {blue, yellow}. Similarly, (μ2)_*(red) = (μ2)^*(red) = μ2(red) = .3. These are precisely the same numbers obtained using the lower and upper probabilities (𝒫2)_* and (𝒫2)^*. Of course, this is no accident.
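In the finite case, the inner and outer measures can be computed directly from the subalgebra; a sketch for μ2 and the algebra {∅, {red}, {blue, yellow}, {red, blue, yellow}}:

```python
W = frozenset({"red", "blue", "yellow"})
algebra = [frozenset(), frozenset({"red"}), frozenset({"blue", "yellow"}), W]
mu2 = {frozenset(): 0.0, frozenset({"red"}): 0.3,
       frozenset({"blue", "yellow"}): 0.7, W: 1.0}

def inner(U):
    """Inner measure: measure of the largest measurable set contained in U."""
    return max(mu2[B] for B in algebra if B <= frozenset(U))

def outer(U):
    """Outer measure: measure of the smallest measurable set containing U."""
    return min(mu2[B] for B in algebra if frozenset(U) <= B)

assert inner({"blue"}) == 0.0 and outer({"blue"}) == 0.7
assert inner({"red"}) == outer({"red"}) == 0.3
```

The max and min suffice here because an algebra is closed under union and intersection, so the best measurable approximations from inside and outside are themselves in the algebra.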

Theorem 2.3.3


Let μ be a probability measure on a subalgebra ℱ′ of ℱ and let 𝒫μ consist of all extensions of μ to ℱ. Then μ_*(U) = (𝒫μ)_*(U) and μ^*(U) = (𝒫μ)^*(U) for all U ∈ ℱ.


Proof See Exercise 2.8. Note that, as the discussion in Exercise 2.8 and the notes to this chapter show, in general, the probability measures in 𝒫μ are only finitely additive. The result is not true in general for countably additive probability measures, although a variant of it does hold even for countably additive measures; see the notes for details.

Note that whereas probability measures are additive, so that if U and V are disjoint sets then μ(U ∪ V) = μ(U) + μ(V), inner measures are superadditive and outer measures are subadditive, so that for disjoint sets U and V,

μ_*(U ∪ V) ≥ μ_*(U) + μ_*(V) and μ^*(U ∪ V) ≤ μ^*(U) + μ^*(V).     (2.3)

In addition, the relationship between inner and outer measures is given by

μ^*(U) = 1 − μ_*(U̅).     (2.4)

(Exercise 2.9).

The inequalities in (2.3) are special cases of more general inequalities satisfied by inner and outer measures. These more general inequalities are best understood in terms of the inclusion-exclusion rule for probability, which describes how to compute the probability of the union of (not necessarily disjoint) sets. In the case of two sets, the rule says

μ(U ∪ V) = μ(U) + μ(V) − μ(U ∩ V).     (2.5)
To see this, note that U ∪ V can be written as the union of three disjoint sets, U − V, V − U, and U ∩ V. Thus,

μ(U ∪ V) = μ(U − V) + μ(V − U) + μ(U ∩ V).
Since U is the union of U − V and U ∩ V, and V is the union of V − U and U ∩ V, it follows that

μ(U) = μ(U − V) + μ(U ∩ V) and μ(V) = μ(V − U) + μ(U ∩ V).
Now (2.5) easily follows by simple algebra.

In the case of three sets U1, U2, U3, similar arguments show that

μ(U1 ∪ U2 ∪ U3) = μ(U1) + μ(U2) + μ(U3) − μ(U1 ∩ U2) − μ(U1 ∩ U3) − μ(U2 ∩ U3) + μ(U1 ∩ U2 ∩ U3).     (2.6)
That is, the probability of the union of U1, U2, and U3 can be determined by adding the probability of the individual sets (these are one-way intersections), subtracting the probability of the two-way intersections, and adding the probability of the three-way intersections.

The full-blown inclusion-exclusion rule is

μ(U1 ∪ ⋯ ∪ Un) = Σ_{∅ ≠ I ⊆ {1,…,n}} (−1)^{|I|+1} μ(⋂_{i∈I} Ui).     (2.7)
Equation (2.7) says that the probability of the union of n sets is obtained by adding the probabilities of the one-way intersections (the case where |I| = 1), subtracting the probabilities of the two-way intersections (the case where |I| = 2), adding the probabilities of the three-way intersections, and so on. The (−1)^{|I|+1} term causes the alternation from addition to subtraction and back again as the size of the index set I increases. Equations (2.5) and (2.6) are just special cases of the general rule, for n = 2 and n = 3. I leave it to the reader to verify the general rule (Exercise 2.10).
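The inclusion-exclusion rule can be spot-checked numerically; the sketch below (a check of my own devising) compares both sides of the rule for three randomly chosen sets under a randomly generated measure:

```python
import random
from functools import reduce
from itertools import combinations

random.seed(0)
W = list(range(8))
weights = [random.random() for _ in W]
total = sum(weights)

def mu(U):
    """A probability measure built from random weights on an 8-point space."""
    return sum(weights[w] for w in U) / total

n = 3
Us = [set(random.sample(W, 4)) for _ in range(n)]

# Inclusion-exclusion: mu of the union equals the alternating sum, over all
# nonempty index sets I, of mu of the corresponding intersections.
lhs = mu(set().union(*Us))
rhs = sum((-1) ** (len(I) + 1) * mu(reduce(set.intersection, [Us[i] for i in I]))
          for k in range(1, n + 1) for I in combinations(range(n), k))
assert abs(lhs - rhs) < 1e-12
```

The same loop works for any n; only the number of index sets I grows (there are 2^n − 1 of them).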

For inner measures, there is also an inclusion-exclusion rule, except that = is replaced by ≥. Thus,

μ_*(U1 ∪ ⋯ ∪ Un) ≥ Σ_{∅ ≠ I ⊆ {1,…,n}} (−1)^{|I|+1} μ_*(⋂_{i∈I} Ui)     (2.8)
(Exercise 2.12). For outer measures, there is a dual property that holds, which results from (2.8) by (1) switching the roles of intersection and union and (2) replacing ≥ by ≤. That is,

μ^*(U1 ∩ ⋯ ∩ Un) ≤ Σ_{∅ ≠ I ⊆ {1,…,n}} (−1)^{|I|+1} μ^*(⋃_{i∈I} Ui)     (2.9)
(Exercise 2.13). Theorem 7.4.1 in Section 7.4 shows that there is a sense in which these inequalities characterize inner and outer measures.

Theorem 2.3.3 shows that for every probability measure μ on a subalgebra ℱ′ of 2^W, there exists a set 𝒫 of probability measures defined on 2^W such that μ_* = 𝒫_*. Thus, inner measures can be viewed as a special case of lower probabilities. The converse of Theorem 2.3.3 does not hold; not every lower probability is the inner measure that arises from a measure defined on a subalgebra of 2^W. One way of seeing that lower probabilities are more general is by considering the properties that they satisfy.

It is easy to see that lower and upper probabilities satisfy analogues of (2.3) and (2.4) (with μ_* and μ^* replaced by 𝒫_* and 𝒫^*, respectively). If U and V are disjoint, then

𝒫_*(U ∪ V) ≥ 𝒫_*(U) + 𝒫_*(V) and 𝒫^*(U ∪ V) ≤ 𝒫^*(U) + 𝒫^*(V),     (2.10)

and

𝒫^*(U) = 1 − 𝒫_*(U̅).     (2.11)
However, they do not satisfy the analogues of (2.8) and (2.9) in general (Exercise 2.14). Note that if 𝒫_* does not satisfy the analogue of (2.8), then it cannot be the case that 𝒫_* = μ_* for some probability measure μ, since all inner measures do satisfy (2.8).

While (2.10) and (2.11) hold for all lower and upper probabilities, these properties do not completely characterize them. For example, the following property holds for lower and upper probabilities if U and V are disjoint:

𝒫_*(U ∪ V) ≤ 𝒫_*(U) + 𝒫^*(V);     (2.12)

moreover, this property does not follow from (2.10) and (2.11) (Exercise 2.15). However, even adding (2.12) to (2.10) and (2.11) does not provide a complete characterization of upper and lower probabilities. The property needed is rather complex. Stating it requires one more definition: A set 𝒰 of subsets of W covers a subset U of W exactly k times if every element of U is in exactly k sets in 𝒰. Consider the following property:

If {U1, …, Uk} covers U exactly m + n times and covers U̅ exactly m times, then Σ_{i=1}^k 𝒫_*(Ui) ≤ m + n·𝒫_*(U).     (2.13)
It is not hard to show that lower probabilities satisfy (2.13) and that (2.10) and (2.12) follow from (2.13) and (2.11) (Exercise 2.16). Indeed, in a precise sense (discussed in Exercise 2.16), (2.13) completely characterizes lower probabilities (and hence, together with (2.11), upper probabilities as well), at least if all the probability measures are only finitely additive.
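The cover property can be stress-tested by brute force. The sketch below (my construction, not from the text) takes the property in the form "if {U1, …, Uk} covers U exactly m + n times and covers the complement of U exactly m times, then Σ_i 𝒫_*(Ui) ≤ m + n·𝒫_*(U)", and checks every family of three (possibly repeated) subsets of the three-world marble space against the measures of 𝒫2, with probabilities kept in integer hundredths so the arithmetic is exact:

```python
from itertools import combinations, combinations_with_replacement

worlds = (0, 1, 2)                         # red, blue, yellow
P2 = [(30, a, 70 - a) for a in range(71)]  # probabilities in hundredths (exact)

def lower(U):
    return min(sum(mu[w] for w in U) for mu in P2)

subsets = [frozenset(c) for k in range(4) for c in combinations(worlds, k)]

for U in subsets:
    comp = frozenset(worlds) - U
    for family in combinations_with_replacement(subsets, 3):
        cover_U = {sum(w in S for S in family) for w in U}
        cover_c = {sum(w in S for S in family) for w in comp}
        if len(cover_U) > 1 or len(cover_c) > 1:
            continue                       # not an exact cover of U and its complement
        m = next(iter(cover_c), 0)
        n = next(iter(cover_U), 0) - m
        if n < 0:
            continue                       # the property applies only when n >= 0
        # sum_i P_*(U_i) <= m + n * P_*(U), here scaled by 100
        assert sum(lower(S) for S in family) <= 100 * m + n * lower(U)
```

The inequality holds because, for each measure μ, the exact cover forces Σ_i μ(Ui) = m + n·μ(U); taking the infimum on each side gives the bound.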

If all the probability measures in 𝒫 are countably additive and are defined on a σ-algebra ℱ, then 𝒫_* has one additional continuity property, analogous to (2.2):

If U1 ⊇ U2 ⊇ ⋯ and ⋂_{i=1}^∞ Ui = U, then lim_{i→∞} 𝒫_*(Ui) = 𝒫_*(U)

(Exercise 2.18(a)). The analogue of (2.1) does not hold for lower probability. For example, suppose that 𝒫 = {μ0, μ1, …}, where μn is the probability measure on ℕ = {0, 1, 2, …} such that μn(n) = 1. Clearly 𝒫_*(U) = 0 if U is a strict subset of ℕ, and 𝒫_*(ℕ) = 1. Let Un = {0, …, n}. Then U0, U1, … is an increasing sequence and ⋃_{i=0}^∞ Ui = ℕ, but lim_{i→∞} 𝒫_*(Ui) = 0 ≠ 𝒫_*(ℕ) = 1. On the other hand, the analogue of (2.1) does hold for upper probability, while the analogue of (2.2) does not (Exercise 2.18(b)).
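The failure of continuity from below can be replayed in code; the truncation bound N is my device for making the check finite:

```python
N = 500  # finite truncation of the natural numbers

def lower_prob(U):
    """Lower probability for P = {mu_0, ..., mu_N}, where mu_n puts all mass on n."""
    return min(1.0 if n in U else 0.0 for n in range(N + 1))

whole = set(range(N + 1))
Us = [set(range(k + 1)) for k in range(N)]    # increasing sequence of strict subsets

assert all(lower_prob(U) == 0.0 for U in Us)  # every strict subset gets lower prob 0
assert lower_prob(whole) == 1.0               # but the limit set gets 1
```

Each Uk omits some n, so the measure μn concentrated there assigns it 0; only the whole space is certain under every measure.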

Although I have been focusing on lower and upper probability, it is important to stress that sets of probability measures contain more information than is captured by their lower and upper probabilities, as the following example shows:

Example 2.3.4


Consider two variants of the example with marbles. In the first, all that is known is that there are at most 50 yellow marbles and at most 50 blue marbles in a bag of 100 marbles; no information at all is given about the number of red marbles. In the second case, it is known that there are exactly as many blue marbles as yellow marbles. The first situation can be captured by the set 𝒫3 = {μ : μ(blue) ≤ .5, μ(yellow) ≤ .5}. The second situation can be captured by the set 𝒫4 = {μ : μ(blue) = μ(yellow)}. These sets of measures are obviously quite different; in fact, 𝒫4 ⊂ 𝒫3. However, it is easy to see that (𝒫3)_* = (𝒫4)_* and, hence, that (𝒫3)^* = (𝒫4)^* (Exercise 2.19). Thus, the fact that blue and yellow have equal probability in every measure in 𝒫4 has been lost. I return to this issue in Section 2.8.

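This example can be checked by brute force over discretized versions of the two sets of measures (my sketch; the grids step by .01, with probabilities kept as integer hundredths so comparisons are exact):

```python
from itertools import product

# Measures as (blue, yellow, red) triples in hundredths of probability.
P3 = [(b, y, 100 - b - y) for b, y in product(range(51), repeat=2)]  # blue, yellow <= .5
P4 = [(a, a, 100 - 2 * a) for a in range(51)]                        # blue = yellow

def bounds(P, event):
    """(lower, upper) probability of an event given as coordinate indices."""
    vals = [sum(mu[i] for i in event) for mu in P]
    return min(vals), max(vals)

events = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
assert all(bounds(P3, e) == bounds(P4, e) for e in events)  # identical bounds
assert set(P4) < set(P3)                                    # yet P4 is strictly smaller
```

Every event gets exactly the same lower and upper probability from both sets, even though one set of measures is a strict subset of the other; the constraint μ(blue) = μ(yellow) is invisible to the bounds.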




Reasoning About Uncertainty
ISBN: 0262582597
Year: 2005
Pages: 140