6.9 Non-SDP Systems

SDP seems like such a natural requirement. Indeed, K_i(w) seems by far the most natural choice for W_{w,i}, and taking μ_{w,i} = μ_{w′,i} for w′ ∈ K_i(w) seems like almost a logical necessity, given the intuition behind K_i(w). If agent i has the same information at all the worlds in K_i(w) (an intuition that is enforced by the multi-agent systems framework), then how could agent i use a different probability measure at worlds that he cannot distinguish? The following example shows that all is not as obvious as it may first appear:

Example 6.9.1


Alice chooses a number, either 0 or 1, and writes it down. She then tosses a fair coin. If the outcome of the coin toss agrees with the number chosen (i.e., if the number chosen is 1 and the coin lands heads, or if the number chosen is 0 and the coin lands tails), then she performs an action a; otherwise, she does not. Suppose that Bob does not know Alice's choice. What is the probability, according to Bob, that Alice performs action a? What is the probability according to Alice? (For definiteness, assume that both of these probabilities are to be assessed at time 1, after Alice has chosen the number but before the coin is tossed.)

The story can be represented in terms of the computation tree shown in Figure 6.8. It seems reasonable to say that, according to Alice, who knows the number chosen, the probability (before she tosses the coin) that she performs action a is 1/2. There is also a reasonable argument to show that, even according to Bob (who does not know the number chosen), the probability is 1/2. Clearly, from Bob's viewpoint, if Alice chose 0, then the probability that Alice performs action a is 1/2 (since the probability of the coin landing tails is 1/2); similarly, if Alice chose 1, then the probability of her performing action a is 1/2. Since, no matter what number Alice chose, the probability according to Bob that Alice performs action a is 1/2, it seems reasonable to say that Bob knows that the probability of Alice's performing action a is 1/2.

Note that this argument does not assume a probability for the event that Alice chose 0. This is a good thing. No probability is provided by the problem statement, so none should be assumed.


Figure 6.8: Choosing a number, then tossing a coin.

Clearly this example involves reasoning about knowledge and probability, so it should be possible to model it using an epistemic probability frame. (It does not add any insight at this point to use a multi-agent system although, of course, it can be represented using a multi-agent system too.) I consider three frames for modeling it, which differ only in Bob's probability assignment PR_B. Let F^i = (W, K_A, K_B, PR_A, PR_B^i), i = 1, 2, 3. Most of the components of F^i are defined in (what I hope by now is) the obvious way.

  • W = {(0, H), (0, T), (1, H), (1, T)}: Alice chose 0 and the coin lands heads, Alice chose 0 and the coin lands tails, and so on. The worlds correspond to the runs in the computation tree.

  • Bob cannot distinguish any of these worlds; in any one of them, he considers all four possible. Thus, K_B(w) = W for all w ∈ W.

  • Alice knows the number she chose. Thus, in a world of the form (0, x), Alice considers only worlds of the form (0, y) possible; similarly, in a world of the form (1, x), Alice considers only worlds of the form (1, y) possible. Alice's probability assignment PR_A is the obvious one: W_{w,A} = K_A(w) = {(0, H), (0, T)} and μ_{w,A}(0, H) = μ_{w,A}(0, T) = 1/2 for w ∈ {(0, H), (0, T)}, and similarly for w ∈ {(1, H), (1, T)}.

It remains to define PR_B^i, i = 1, 2, 3. If Bob could assign some probability α to Alice's choosing 0, there would be no problem: in all worlds w, it seems reasonable to take W_{w,B} = W and define μ_{w,B} so that both (0, H) and (0, T) get probability α/2, while both (1, H) and (1, T) get probability (1 − α)/2. This gives a set of probability measures on W, parameterized by α.

It is not hard to show that the event U = {(0, T), (1, H)} that Alice performs action a has probability 1/2, for any choice of α. A Bayesian (i.e., someone who subscribes to the subjective viewpoint of probability) might feel that each agent should choose some α and work with that. Since the probability of U is 1/2, independent of the α chosen, this may seem reasonable. There are, however, two arguments against this viewpoint. The first argument is pragmatic. Since the problem statement does not give α, any particular choice may lead to conclusions beyond those justified by the problem. In this particular example, the choice of α may not matter but, in general, it will. It certainly seems unreasonable to rely on conclusions drawn on the basis of a particular choice of α. The second argument is more philosophical: adding a probability seems unnecessary here. After all, the earlier informal argument did not seem to need the assumption that there was some probability of Alice choosing 0. And since the argument did not seem to need it, it seems reasonable to hope to model the argument without using it.
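Spelled out (this little derivation is mine, not the book's), the reason the choice of α does not matter is that each of the two worlds making up U picks up exactly half the weight of one of the two subspaces:

```latex
% mu_alpha denotes the measure determined by a particular choice of alpha
\mu_\alpha(U) = \mu_\alpha(\{(0,T)\}) + \mu_\alpha(\{(1,H)\})
             = \frac{\alpha}{2} + \frac{1-\alpha}{2} = \frac{1}{2}.
```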

What alternatives are there to assuming a fixed probability α on Alice choosing 0? This situation is reminiscent of the situations considered at the beginning of Section 2.3. Three different approaches were discussed there; in principle, they can all be applied in this example. For example, it is possible to use an epistemic lower probability frame, so that there are sets of probabilities rather than a single probability. The set could then consist of a probability measure for each choice of α. While something like this would work, it still does not address the second point, that the argument did not depend on having a probability at all.

Another possibility is using nonmeasurable sets, making the event that Alice chooses 0 nonmeasurable. Unfortunately, this has some unpleasant consequences. Define PR_B^1 so that PR_B^1(w) = (W, ℱ_1, μ_1) for all w ∈ W, where ℱ_1 is the algebra with basis {(0, H), (1, H)} and {(0, T), (1, T)}, and where μ_1({(0, H), (1, H)}) = μ_1({(0, T), (1, T)}) = 1/2. That is, in F^1, the only events to which Bob assigns a probability are those for which the problem statement gives a probability: the event of the coin landing heads and the event of the coin landing tails. Of course, both these events are assigned probability 1/2.

The problem with this approach is that the event U of interest is not in ℱ_1. Thus, it is not true that Bob believes that Alice performs action a with probability 1/2. The event that Alice performs action a is not assigned a probability at all according to this approach!

A better approach is the first approach considered in Section 2.3: partitioning Bob's possible worlds into two subspaces, depending on whether Alice chooses 0 or 1. Bob's probability space when Alice chooses 0 consists of the two worlds where 0 is chosen, and similarly when Alice chooses 1. In this probability space, all worlds are measurable and have the obvious probability. This can easily be represented using probability assignments. Let PR_B^2(w) = PR_A(w) for all w ∈ W; Bob's probability assignment is the same as Alice's.

Frame F^2 supports the reasoning in the example. In fact, in F^2, the probability that Alice performs action a is 1/2 in every world w. More precisely, μ_{w,B}(U ∩ W_{w,B}) = 1/2 for all w. To see this, consider, for example, the world (0, T). Since W_{(0,T),B} = {(0, T), (0, H)}, it follows that U ∩ W_{(0,T),B} = {(0, T)}. By definition, μ_{(0,T),B}({(0, T)}) = 1/2, as desired. Similar arguments apply for all the other worlds. It follows that Bob knows that the probability that Alice performs action a is 1/2. Similarly, in this frame, Bob knows that the probability that the coin lands heads is 1/2 and that the probability that the coin lands tails is 1/2.

What is the probability that 0 was chosen, according to Bob? In the worlds where 0 is actually chosen—that is, (0, H) and (0, T)—it is 1; in the other two worlds, it is 0. So Bob knows that the probability that 0 was chosen is either 0 or 1.

Of course, once the door is open to partitioning Bob's set of possible worlds into separate subspaces, there is nothing to stop us from considering other possible partitions. In particular, consider frame F^3, where PR_B^3(w) = ({w}, ℱ_w, μ_w) for each world w ∈ W. Here ℱ_w and μ_w are completely determined, because the set of possible worlds is a singleton: ℱ_w consists of {w} and ∅, and μ_w assigns probability 1 and 0, respectively, to these sets. F^3 does not support the reasoning of the example; it is easy to check that at each world in F^3, the probability of U is either 0 or 1 (Exercise 6.19). Bob knows that the probability that Alice performs action a is either 0 or 1, but Bob does not know which.

To summarize, in F^1, the probability that Alice performs action a (according to Bob) is undefined; it corresponds to a nonmeasurable set. In F^2, the probability is 1/2. Finally, in F^3, the probability is either 0 or 1, but Bob does not know which. Note that each of F^1, F^2, and F^3 satisfies CONS and UNIF, but only F^1 (which arguably is the least satisfactory model) satisfies SDP. While F^2 corresponds perhaps most closely to the way Example 6.9.1 was presented, is there anything intrinsically wrong with the frame F^3? I would argue that there isn't.
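For concreteness, here is a minimal sketch (the encoding and the names PR2_B, PR3_B, and prob are mine, not the book's) that represents Bob's probability assignments in F^2 and F^3 and evaluates the event U at every world; no analogous computation applies in F^1, since there U is simply not in the algebra ℱ_1:

```python
from fractions import Fraction

# Worlds are pairs (number Alice chose, coin outcome); U is "Alice performs action a".
W = [(0, 'H'), (0, 'T'), (1, 'H'), (1, 'T')]
U = {(0, 'T'), (1, 'H')}

def PR2_B(w):
    """F^2: Bob's space at w is the cell determined by Alice's choice of number."""
    cell = [v for v in W if v[0] == w[0]]
    return cell, {v: Fraction(1, 2) for v in cell}

def PR3_B(w):
    """F^3: Bob's space at w is the singleton {w}."""
    return [w], {w: Fraction(1)}

def prob(assignment, w, event):
    cell, mu = assignment(w)
    return sum(mu[v] for v in cell if v in event)

for w in W:
    print(w, prob(PR2_B, w, U), prob(PR3_B, w, U))
# F^2: the value is 1/2 at every world, so Bob knows the probability of U is 1/2.
# F^3: the value is 0 or 1 depending on the world, so Bob knows only that it is 0 or 1.
```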

The question of the acceptability of F^3 is actually an instance of a larger question: Are there any reasonable constraints on how the subspaces can be chosen? Recall that a natural approach to doing the partitioning was discussed in Section 3.4. There, the subspaces were determined by the possible hypotheses. The "hypotheses" in F^2 are "Alice chose 0" and "Alice chose 1." In F^3, there are four hypotheses: "Alice chose 0 and her coin lands heads," "Alice chose 0 and her coin lands tails," and so on. To see that such hypotheses are not so unnatural, consider the one-coin problem from Chapter 1 again.

Example 6.9.2


This time Alice just tosses a fair coin and looks at the outcome. What is the probability of heads according to Bob? (I am now interested in the probability after the coin toss.) Clearly before the coin was tossed, the probability of heads according to Bob was 1/2. Recall that there seem to be two competing intuitions regarding the probability of heads after the coin is tossed. One says the probability is still 1/2. After all, Bob has not learned anything about the outcome of the coin toss, so why should he change his valuation of the probability? On the other hand, runs the counterargument, once the coin has been tossed, does it really make sense to talk about the probability of heads? It has either landed heads or tails, so at best, Bob can say that the probability is either 0 or 1, but he doesn't know which.


How can this example be modeled? There are two reasonable candidate frames, which again differ only in Bob's probability assignment. Let F^i = ({H, T}, K_A, K_B, PR_A, PR_B^i), i = 4, 5, where

  • K_A(w) = {w}, for w ∈ {H, T} (Alice knows the outcome of the coin toss);

  • K_B(w) = {H, T} (Bob does not know the outcome); and

  • PR_A(w) = ({w}, ℱ_w, μ_w) (Alice puts the obvious probability on her set of possible worlds, which is a singleton, just as in PR_B^3).

It remains to define PR_B^4 and PR_B^5. PR_B^4 is the probability assignment corresponding to the answer 1/2: PR_B^4(w) = ({H, T}, 2^{H, T}, μ), where μ(H) = μ(T) = 1/2, for both w = H and w = T. That is, according to PR_B^4, Bob uses the same probability space in both of the worlds he considers possible; in this probability space, he assigns both heads and tails probability 1/2. PR_B^5 is quite different: PR_B^5(w) = PR_A(w) = ({w}, ℱ_w, μ_w). It is easy to see that in each world in F^4, Bob assigns probability 1/2 to heads. On the other hand, in each world in F^5, Bob assigns either probability 0 or probability 1 to heads. Thus, Bob knows that the probability of heads is either 0 or 1, but he does not know which (Exercise 6.20). Note that F^4 is similar in spirit to F^2, while F^5 is similar to F^3. Moreover, F^4 and F^5 give the two answers for the probability of heads discussed in Chapter 1: "it is 1/2" vs. "it is either 0 or 1, but Bob does not know which."

In F^5, the components of the partition can be thought of as corresponding to the hypotheses "the coin landed heads" and "the coin landed tails." While this is reasonable, besides thinking in terms of possible hypotheses, another way to think about the partitioning is in terms of betting games or, more accurately, the knowledge of the person one is betting against. Consider Example 6.9.2 again. Imagine that besides Bob, Charlie is also watching Alice toss the coin. Before the coin is tossed, Bob may be willing to accept an offer from either Alice or Charlie to bet $1 for a payoff of $2 if the coin lands heads. Half the time the coin will land heads and Bob will be $1 ahead, and half the time the coin will land tails and Bob will lose $1. On average, he will break even. On the other hand, Bob should clearly not be willing to accept such an offer from Alice after the coin is tossed (since Alice saw the outcome of the coin toss), although he might still be willing to accept such an offer from Charlie. Roughly speaking, when playing against Charlie, it is appropriate for Bob to act as if the probability of heads is 1/2, whereas when playing against Alice, he should act as if it is either 0 or 1, but he does not know which. Similarly, under this interpretation, the frame F^5 in Example 6.9.2 amounts to betting against an adversary who knows the outcome of the coin toss, while F^4 amounts to betting against an adversary who does not know the outcome. This intuition regarding betting games can be generalized, although that is beyond the scope of the book. (See the notes for references.)

Example 6.9.1, where there are nonprobabilistic choices as well as probabilistic choices, may seem artificial. But, in fact, such situations are quite common, as the following example shows:

Example 6.9.3


Consider the problem of testing whether a number n is prime. There is a well-known deterministic algorithm for testing primality, often taught in elementary school: test each number between 2 and n − 1 to see if it divides n evenly. If one of them does, then n is composite; otherwise, it is prime.

This algorithm can clearly be improved. For example, there is no need to check all the numbers up to n − 1. Testing up to √n suffices: if n is composite, its smaller factor is bound to be at most √n. Moreover, there is no need to check any even number besides 2. Even with these improvements, if n is represented in binary (as a string of 0s and 1s), then there are still exponentially many numbers to check in the worst case as a function of the length of n. For example, if n is a 100-digit number, the square root of n is a 50-digit number, so there will still be roughly 2^50 numbers to check to see if n is prime. This is infeasible on current computers (and likely to remain so for a while). Until recently, no polynomial-time algorithm for testing primality was known. Now there is one. Nevertheless, by far the fastest primality-testing algorithm currently available is probabilistic.
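As a sketch of the improved deterministic test just described (the function name is mine), trial division up to √n, skipping even divisors other than 2, looks as follows; note that it still takes time exponential in the length of n:

```python
import math

def is_prime_by_trial_division(n: int) -> bool:
    """Deterministic test: check divisors 2, 3, 5, ..., up to the square root of n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True
```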

Before discussing the algorithm, it is worth noting that the problem of testing whether a number is prime is not just of academic interest. The well-known RSA algorithm for public-key cryptography depends on finding composite numbers that are the product of two large primes. Generating these primes requires an efficient primality-testing algorithm. (Theorems of number theory assure us that there are roughly n/log n primes less than n and that these are well distributed. Thus, it is easy to find a prime quickly by simply testing many 100-digit numbers generated at random for primality, provided that the primality test itself is fast.)

The probabilistic primality-testing algorithm is based on the existence of a predicate P that takes two arguments, n and a (think of n as the number whose primality we are trying to test and a as an arbitrary number between 1 and n), and has the following properties:

  1. P(n, a) is either 1 (true) or 0 (false); computing which it is can be done very rapidly (technically, in time polynomial in the length of n and a).

  2. If n is composite, then P(n, a) is true for at least n/2 possible choices of a in {1, …, n − 1}.

  3. If n is prime, then P(n, a) is false for all a.

The existence of such a predicate P can be proved using number-theoretic arguments. Given P, there is a straightforward probabilistic primality-testing algorithm: choose, say, 100 different values for a, all less than n, at random. Then compute P(n, a) for each of these choices of a. If P(n, a) is true for any of these choices of a, then the algorithm outputs "composite"; otherwise, it outputs "prime." This algorithm runs very quickly even on small computers. Property (1) guarantees that this algorithm is very efficient. Property (3) guarantees that if the algorithm outputs "composite," then n is definitely composite. If the algorithm outputs "prime," then there is a chance that n is not prime, but property (2) guarantees that this is very rarely the case: if n is indeed composite, then with high probability (probability at least 1 − (1/2)^100) the algorithm outputs "composite."
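The book does not specify the predicate P, so the following sketch uses the Miller–Rabin witness test, a standard choice with properties of this kind (for an odd composite n, in fact at least 3/4 of the candidate values of a are witnesses, which is stronger than property (2); for a prime n, there are none). The function names and the choice of predicate are mine:

```python
import random

def is_witness(a: int, n: int) -> bool:
    """Miller-Rabin test: True means a proves that the odd number n > 2 is composite."""
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return False
    for _ in range(s - 1):
        x = pow(x, 2, n)
        if x == n - 1:
            return False
    return True

def probably_prime(n: int, trials: int = 100) -> bool:
    """Outputs "composite" (False) only if n really is composite; outputs "prime" (True)
    incorrectly with probability at most (1/2)**trials when n is composite."""
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    for _ in range(trials):
        a = random.randrange(2, n - 1)
        if is_witness(a, n):
            return False
    return True
```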

Corresponding to this algorithm is a set of runs. The first round in these runs is nonprobabilistic: an input n is given (or chosen somehow). The correctness of the algorithm should not depend on how it is chosen. Moreover, in proving the correctness of this algorithm, it seems inappropriate to assume that there is a probability measure on the inputs. The algorithm should be correct for all inputs. But what does "correct" even mean? The preceding arguments show that if the algorithm outputs "composite," then it is correct in the sense that n is really composite. On the other hand, if the algorithm outputs "prime," then n may not be prime. It might seem natural to say that n is then prime with high probability, but that is not quite right. The input n is either prime or it is not; it does not make sense to say that it is prime with high probability. But it does make sense to say that the algorithm gives the correct answer with high probability. (Moreover, if the algorithm says that n is prime, then it seems reasonable for an agent to ascribe a high subjective degree of belief to the proposition "n is prime.")

The natural way to make this statement precise is to partition the runs of the algorithm into a collection of subsystems, one for each possible input, and to prove that the algorithm gives the right answer with high probability in each of these subsystems, where the probability on the runs in each subsystem is generated by the random choices for a. While for a fixed composite input n there may be a few runs where the algorithm incorrectly outputs "prime," in almost all runs it will give the correct output. The spirit of this argument is identical to that used in Example 6.9.1 to argue that the probability that Alice performs action a is 1/2, because she performs a with probability 1/2 whichever number she chooses in the first round.


Example 6.9.4


The coordinated attack problem is well-known in the distributed systems literature. Two generals, A and B, want to coordinate an attack on the enemy village. If they attack together, they will win the battle. However, if either attacks alone, his division will be wiped out. Thus, neither general wants to attack unless he is certain that the other will also attack. Unfortunately, the only way they have of communicating is by means of a messenger, who may get lost or captured. As is well known, no amount of communication suffices to guarantee coordination. But suppose it is known that each messenger sent will, with probability .5, deliver his message within an hour; otherwise, the messenger will get lost or captured, and never arrive. Moreover, messengers are independent. In that case, there is a trivial algorithm for coordination with high probability. If General A wants to attack on a particular day, A sends n messengers with messages saying "Attack at dawn" over an hour before dawn, then attacks at dawn. General B attacks at dawn iff he receives a message from A instructing him to do so. It is easy to see that if A attacks, then with probability 1 − (1/2)^n, B does too, and if A does not attack, then B definitely does not attack. Thus, the probability of coordination is high, whether or not A attacks.
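A small simulation sketch (the names are mine; it assumes exactly the independence and 1/2 delivery probability stated above) makes the 1 − (1/2)^n bound concrete:

```python
import random

def run_protocol(a_wants_to_attack: bool, n: int) -> bool:
    """Returns True iff the generals coordinate (both attack, or neither does)."""
    if not a_wants_to_attack:
        return True                      # A sends nothing and stays put; so does B
    some_message_arrives = any(random.random() < 0.5 for _ in range(n))
    return some_message_arrives          # A attacks; coordination iff B got a message

n, trials = 20, 100_000
coordinated = sum(run_protocol(True, n) for _ in range(trials))
print(coordinated / trials, 1 - 0.5 ** n)  # empirical vs. exact probability of coordination
```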

Consider the question of whether there is an attack on a particular day d. The multiagent system representing this situation can be partitioned into two sets of runs, one corresponding to the runs where A wants to attack on day d, and the other corresponding to the run where A does not want to attack on day d. The runs where A wants to attack differ in terms of which of the messengers actually manages to deliver his message. It is easy to describe a probability measure on each of these two sets of runs separately, but there is no probability measure on the whole set, since the problem statement does not give the probability that A wants to attack. Nevertheless, by partitioning the runs, depending on A's choice, it is possible to conclude that both A and B know (indeed, it is common knowledge among them) that, with high probability, they will coordinate, no matter what A chooses to do.


In Examples 6.9.1, 6.9.3, and 6.9.4, only the first choice is nonprobabilistic. (In the coordinated attack example, the nonprobabilistic choice is whether or not A wants to attack; this can be viewed as an initial nondeterministic choice made by the environment.) However, in general, nonprobabilistic choices or moves may be interleaved with probabilistic moves. Even in this case, it is possible to represent what happens in such a way that all the nonprobabilistic choices happen in the first round. The following example should help make this clear:

Example 6.9.5


Suppose that Alice has a fair coin and Bob has a biased coin, which lands heads with probability 2/3. There will be three coin tosses; Charlie gets to choose who tosses each time. That is, in rounds 1, 3, and 5, Charlie chooses who will toss a coin in the next round; in rounds 2, 4, and 6, the person Charlie chose in the previous round tosses his or her coin. Notice that, in this example, the probabilistic rounds—the coin tosses—alternate with the nonprobabilistic rounds—Charlie's choices.

One way of representing this situation is by taking the environment state at a point (r, m) to represent what happened up to that point: who Charlie chose and what the outcome of his or her coin toss was. Since what happens is public, the local states of Alice, Bob, and Charlie can be taken to be identical to that of the environment. It is easy to see that, with this representation, there are 64 (= 2^6) runs in the system, since there are two possibilities in each round. In one run, for example, Charlie chooses Alice, who tosses her coin and gets heads, then Charlie chooses Bob, who gets heads, and then Charlie chooses Alice again, who gets tails.

What is the probability of getting heads in all three coin tosses? That depends, of course, on who Charlie chose in each round; the story does not give probabilities for these choices. Intuitively, however, it should be somewhere between 1/8 (if Alice was chosen all the time) and 8/27 (if Bob was chosen all the time). Can the set of runs be partitioned as in Example 6.9.1 to make this precise?

When the only nonprobabilistic choice occurs at the beginning, it is straightforward to partition the runs according to the outcome of the nonprobabilistic choice. (The possible choices can be taken to be the "hypotheses.") However, because the nonprobabilistic choices here are interspersed with the probabilistic moves, it is not so obvious how to do the partitioning. One approach to dealing with this problem is to convert this situation to one where there is only one nonprobabilistic choice, which happens at the beginning. The trick is to ask what Charlie's strategy or protocol is for deciding who goes next. Charlie may have a very simple strategy like "pick Alice in every round" or "pick Alice, then Bob, then Alice again." However, in general, Charlie's strategy may depend on what has happened thus far in the run (in this case, the outcome of the coin tosses), as encoded in his local state. (I am implicitly assuming that Charlie sees the outcome of each coin toss here. If not, then Charlie's local state would not include the outcome, and hence his strategy could not depend on it.) For example, Charlie's strategy may be the following: "First pick Alice. If she tosses tails, pick Alice again; otherwise, pick Bob. If whoever was picked the second time tosses tails, then pick him/her again; otherwise, pick the other person." Note that both this strategy and the strategy "pick Alice, then Bob, then Alice again" are consistent with the run described earlier. In general, more than one strategy is consistent with observed behavior.

In any case, notice that once Charlie chooses a strategy, all his other choices are determined. Fixing a strategy factors out all the nondeterminism. The story in this example can then be captured by a multi-agent system where, in the first round, Charlie chooses a strategy and from then on picks Alice and Bob according to that strategy. In more detail, Charlie's local state now would include his choice of strategy together with what has happened thus far, while, as before, the local states of Alice, Bob, and the environment just describe the observable events (who Charlie chose up to the current time and the outcomes of the coin tosses).

It can be shown that Charlie has 2^21 possible strategies (although many of these can be identified; see Exercise 6.21(a)), and for each of these strategies, there are eight runs, corresponding to the possible outcomes of the coin tosses. Thus, this representation leads to a system with 2^21 · 8 = 2^24 runs. These runs can be partitioned according to the initial choice of strategy; there is a natural probability measure on the eight runs that arise from any fixed strategy (Exercise 6.21(b)). With this representation, it is indeed the case that the probability of getting three heads is somewhere between 1/8 and 8/27, since in the probability space corresponding to each strategy, the probability of the run with three heads is between 1/8 and 8/27.
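The following sketch (the encoding is mine) checks the probability bounds. For computing probabilities only the reachable decision points matter, and a reachable decision point is determined by the outcomes seen so far (1 + 2 + 4 = 7 of them, so 2^7 reduced strategies); counting Charlie's choices at every possible local state, reachable or not, gives the 1 + 4 + 16 = 21 decision points and hence the 2^21 strategies mentioned above.

```python
from fractions import Fraction
from itertools import product

P_HEADS = {'Alice': Fraction(1, 2), 'Bob': Fraction(2, 3)}

def prob_three_heads(strategy):
    """strategy maps the tuple of outcomes seen so far to 'Alice' or 'Bob'."""
    p, history = Fraction(1), ()
    for _ in range(3):
        p *= P_HEADS[strategy[history]]
        history += ('H',)                # only the all-heads branch matters here
    return p

decision_points = [()] + [('H',), ('T',)] + list(product('HT', repeat=2))
probs = {prob_three_heads(dict(zip(decision_points, choices)))
         for choices in product(['Alice', 'Bob'], repeat=len(decision_points))}
print(min(probs), max(probs))            # 1/8 and 8/27, as claimed
```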

Each of the 64 runs in the original representation corresponds to 2^18 runs in this representation (Exercise 6.21(c)). Although, in some sense, the new representation can be viewed as "inefficient," it has the advantage of partitioning the system cleanly into subsystems, each of which gives rise to a natural probability space.


As this example shows, in systems where there is possibly an initial nonprobabilistic choice, and from then on actions are chosen probabilistically (or deterministically) as a function of the local state, the system can be partitioned into subsystems according to the initial choice, and each subsystem can be viewed as a probability space in a natural way. The key point is that, once an agent can partition the set of runs in the system and place a probability on each subspace of the partition, the techniques of Section 6.4 can be applied with no change to get a probability assignment. These probability assignments satisfy UNIF and CONS, but not necessarily SDP. If agent i partitions the set of runs into subspaces ℛ_1, …, ℛ_m, then K_i(r, m) is partitioned into the sets K_i(r, m)(ℛ_j), j = 1, …, m, where K_i(r, m)(ℛ_j) consists of the points in K_i(r, m) that lie on runs in ℛ_j. If (r′, m′) ∈ K_i(r, m), then W_{r′,m′,i} is the unique set K_i(r, m)(ℛ_j) that includes (r′, m′).
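Schematically (this is my sketch, not the book's construction; it assumes a synchronous setting in which each run contributes at most one point to K_i(r, m), and all names are mine), the probability assignment obtained from a partition of the runs looks like this:

```python
def probability_assignment(point, K_i, cell_of, mu_runs):
    """point = (r, m).  K_i(point) is the set of points agent i considers possible;
    cell_of maps a run to its subspace; mu_runs[c] gives the probability of each run
    in subspace c.  Returns the sample space W_{r,m,i} and the measure on it, obtained
    by conditioning the subspace's measure on the runs appearing in K_i(point)."""
    r, m = point
    c = cell_of[r]
    W = [p for p in K_i(point) if cell_of[p[0]] == c]
    total = sum(mu_runs[c][p[0]] for p in W)
    mu = {p: mu_runs[c][p[0]] / total for p in W}
    return W, mu
```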

Each of the three frames F^1, F^2, and F^3 in Example 6.9.1 can be thought of as arising from this construction applied to the four runs illustrated in Figure 6.8. If Bob does not partition these runs at all, but takes the only nontrivial measurable sets to be {r_1, r_3} and {r_2, r_4}, the resulting frame is F^1. Frame F^1 satisfies SDP precisely because the original set of runs is not partitioned into separate subspaces. If Bob partitions the runs into two subspaces {r_1, r_2} and {r_3, r_4}, then the resulting frame is F^2. Finally, if the four runs are partitioned into four separate subspaces, the resulting frame is F^3. Since F^2 is the frame that seems to capture the informal argument best, what all this shows is that, although SDP may seem like an unavoidable assumption initially, in fact, there are many cases where it is too strong an assumption, and UNIF may be more appropriate.



