Paper Review: Why Conditionalize?

How should we change our beliefs in the light of new information? This is one of the central questions of epistemology, and has great practical importance.

For example, consider a doctor who has a patient who is concerned he might have cancer. The doctor has certain beliefs: for example, she may think that her patient is unlikely to have cancer since he is a young man. Now she performs a medical test on her patient, and she gets the result “positive”. The test is pretty good but not error-free. How should she change her beliefs based on this new evidence? This matters because she might suggest different courses of action depending on how probable she thinks it is that he has cancer.

Bayesian epistemology has an answer to this question: the agent should conditionalize.

For a Bayesian, an agent’s beliefs are probabilities. What this means is that instead of simply believing something or not, a Bayesian agent believes it to a certain degree — we call this a credence. For example, you might ask your friend “will a human set foot on the moon in the next decade?” We usually think your friend will have one of two beliefs: either she believes that someone will set foot on the moon in the next decade, or she believes that no one will. Of course she might be uncertain about this, but we still think of her as having this kind of full belief in a proposition.

However, if your friend is a Bayesian (like many of mine), she will not have a belief one way or another. Instead, she will have a credence. She would say something like “my credence that someone will set foot on the moon in the next decade is 0.25”. If she is coherent — if her beliefs have the mathematical structure of probability — then she will assign credence 0.75 to the proposition “no one will set foot on the moon in the next decade”, since it is a law of probability that the probability of a proposition and the probability of its negation must sum to 1.

Why should we believe that our beliefs should have the mathematical structure of probability? What, for example, is wrong with having a credence of 0.8 in a proposition A and 0.3 in the proposition not A?

One reason that philosophers give is Dutch book arguments. Suppose we had an agent whose beliefs were as above: she assigns a credence of 0.8 in a proposition A and 0.3 in the proposition not A. Then we can construct a collection of bets that the agent finds acceptable such that, no matter how the world turns out, the agent will lose money.

Since the agent assigns a credence of 0.8 to A, she is willing to pay 80 cents to buy a ticket from me that pays $1 if A is true. This is because, from her perspective, she has an 80% chance to win $1, and otherwise she wins $0, so the ticket is worth 80 cents to her. She is therefore indifferent between buying the ticket or not. We can do the same thing with not A — sell her a ticket for 30 cents that pays $1 if not A is true.

Now let us consider what happens in both possible outcomes.

  1. Suppose A obtains. Then since she bought both bets, she has paid me 80 + 30 = 110 cents. Since A is true, the ticket that pays $1 if A pays her $1. However, the other ticket does not. So overall she is down 10 cents, and I am up 10 cents.
  2. Suppose not A obtains. Then since she bought both bets, she has paid me 80 + 30 = 110 cents. Since A is false, the ticket that pays $1 if not A pays her $1. However, the other ticket does not. So overall she is down 10 cents, and I am up 10 cents.

Thus we see that no matter how the world turns out, she loses money. This set of bets is called a “Dutch book”, and Bayesians view the existence of such a book as revealing an inconsistency in her beliefs. We can construct Dutch books anytime an agent has beliefs that violate the probability axioms.
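The bookkeeping above can be checked with a short script. This is just a sketch of the example from the text, with the agent's incoherent credences of 0.8 in A and 0.3 in not A:

```python
# Synchronic Dutch book from the text: the agent pays 80 cents for the
# ticket [$1 if A, $0 otherwise] and 30 cents for [$1 if not-A, $0 otherwise].

def net_payoff(a_is_true: bool) -> int:
    """Agent's net payoff in cents for the two tickets combined."""
    cost = 80 + 30                            # price paid for both tickets
    winnings = 100 if a_is_true else 0        # ticket on A pays $1 iff A
    winnings += 0 if a_is_true else 100       # ticket on not-A pays $1 iff not-A
    return winnings - cost

print(net_payoff(True))   # -10: A obtains, she is down 10 cents
print(net_payoff(False))  # -10: not-A obtains, still down 10 cents
```

Either way she is down 10 cents, which is exactly the amount by which her two credences overshoot 1.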

So far we have only considered synchronic beliefs — beliefs of an agent at one time. But recall that our original concern was how we should update our beliefs given new information. This is a question about diachronic beliefs — beliefs of an agent across time.

In general, Bayesians think that (for an idealized agent) the correct way to update beliefs is through conditionalization. To understand what exactly this is we have to understand what a conditional probability is. We write an ordinary probability of a proposition A as P(A). As we saw earlier, we can think of this in connection to betting. In particular, this is how valuable (in dollars) the agent considers a ticket that pays out $1 if A is true and $0 otherwise. We write a conditional probability like this

P(A|B)

and we read it as “the probability of A given B”. We can think of a conditional probability as the value that an agent assigns to a bet on A conditional on B. If a bet is conditional on a proposition B then it is called off if B is false. For example, if we are betting on A conditional on B then we might have the following outcomes:

  • if A and B happen we get $1
  • if not A and B happen we lose $1
  • if not B then the bet is called off; we neither gain nor lose

An agent’s conditional probability of A given B is the value that she assigns to such a conditional bet, where she is refunded the price of the ticket if B does not obtain.

It is a theorem of the probability calculus that the conditional probability is given by

P(A|B) = \frac{P(B|A)P(A)}{P(B)}

This is the celebrated Bayes’ theorem.
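To see Bayes’ theorem at work, we can return to the doctor from the introduction. The specific numbers below are my own illustrative assumptions, not from the paper: a 1% prior that the patient has cancer, a test with 90% sensitivity, and a 5% false-positive rate.

```python
# Numeric check of Bayes' theorem, P(A|B) = P(B|A) P(A) / P(B), where
# A = "the patient has cancer" and B = "the test comes back positive".
# All three input numbers are illustrative assumptions.

p_cancer = 0.01             # P(A): prior credence in cancer
p_pos_given_cancer = 0.90   # P(B|A): test sensitivity
p_pos_given_healthy = 0.05  # false-positive rate

# Total probability of a positive result, P(B), by summing over both cases.
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_healthy * (1 - p_cancer)

# Bayes' theorem.
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
print(round(p_cancer_given_pos, 3))  # 0.154
```

Even after a positive result from a pretty good test, the posterior is only about 15%, because the disease was so improbable to begin with. This is exactly the kind of calculation the doctor needs before recommending a course of action.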

Now we have the resources to understand conditionalization. We know that a rational agent’s beliefs are governed at a particular time by the probability axioms. However, when an agent learns something, her beliefs might change. If her old beliefs are given by the probability function P_{old}, how should she choose her new probability function? The principle of conditionalization says that an agent should choose her new probability function such that

P_{new}(A) = P_{old}(A|E)

where E is the proposition expressing the evidence that the agent learned. In words, this means that if an agent observes some evidence E, then her new beliefs should be equal to her old beliefs conditional on E.
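A toy example (mine, not Lewis’s) makes the rule concrete: an agent starts with a uniform prior over the six faces of a die, then learns E = “the roll is even”, and conditionalizes.

```python
# Conditionalization on a six-face sample space:
# P_new(w) = P_old(w | E) = P_old(w and E) / P_old(E).

from fractions import Fraction

p_old = {face: Fraction(1, 6) for face in range(1, 7)}  # uniform prior
evidence = {2, 4, 6}                                    # the learned proposition E

p_e = sum(p_old[w] for w in evidence)                   # P_old(E) = 1/2
p_new = {w: (p_old[w] / p_e if w in evidence else Fraction(0))
         for w in p_old}

print(p_new[4])              # 1/3: each even face now gets credence 1/3
print(sum(p_new.values()))   # 1: the new credences are still coherent
```

Note that the update just renormalizes the prior over the worlds compatible with E, which is why the new function is automatically a probability function again.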

Why conditionalize? Why have such a diachronic constraint on an agent’s beliefs?

This is the central question of David Lewis’ paper “Why conditionalize?” (obviously).

***The original paper can be found here.***

This paper is a classic in formal epistemology. It originally started as a handout for a course Lewis was teaching, but he eventually published it after realizing the argument hadn’t been expressed elsewhere. Let’s go through the argument together.

His argument is a diachronic Dutch book. Similar to the synchronic Dutch book arguments, the strategy is to show that if an agent does not update her beliefs according to the principle of conditionalization then she will be susceptible to a Dutch book. It is important to note what the actual force of a Dutch book is. As Lewis writes

“Note also that the point of any Dutch book argument is not that it would be imprudent to run the risk that some sneaky Dutchman will come and drain your pockets. After all, there aren’t so many sneaky Dutchmen around; and anyway, if ever you see one coming, you can refuse to do business with him. Rather, the point is that if you are vulnerable to a Dutch book, whether synchronic or diachronic, that means that you have two contradictory opinions about the expected value of the very same transaction. To hold contradictory opinions may or may not be risky, but it is in any case irrational.” (pp. 404-405)

Let us go through his argument carefully.

Lewis considers an agent who is synchronically coherent. That is, at any particular moment in time her beliefs are probabilities. We can think of an agent with beliefs given by the probability function P_{old} at time 0. Suppose that E_{1}, E_{2},\ldots,E_{n} are mutually exclusive and jointly exhaustive propositions that specify all possible experiences the agent could have between time 0 and time 1. A set of propositions is mutually exclusive if at most one of them can be true. For example, if we are rolling a six-sided die once then the propositions “the die comes up 1” and “the die comes up 2” are mutually exclusive, since at most one can be true. A set of propositions is jointly exhaustive if at least one of them must be true. The propositions “the die comes up i” for each 1 \leq i \leq 6 are both mutually exclusive and jointly exhaustive.

So, basically, the n E_{i}s represent all the different possible learning experiences the agent could have. Now we can think of what the agent’s new probabilities would be if she observes a particular outcome. Let P_{i} be the new probability function she would have if she observed E_{i}.

We recall from earlier that the principle of conditionalization says that if an agent observes some evidence E, she should set her new probabilities to her old probabilities conditional on E. If we follow Lewis and express this a little more carefully, then we say that an agent conditionalizes if, for any proposition A,

P_{i}(A) = P_{old}(A|E_{i})

The strategy is to show that an agent is immune to a Dutch book if and only if she conditionalizes. Again, like in our earlier Dutch book example, Lewis supposes that your probability in any proposition A is indicative of the price at which you are indifferent between buying or selling the bet [$1 if A, 0 otherwise].

Okay, now let us construct the bets that we would use to Dutch book the agent. I follow Lewis’ construction. Since this is a diachronic context, we will be making bets at a few different points in time. There are two cases to consider: if your new probability in a proposition is higher than your old conditional probability, and if your new probability is lower than your conditional probability. Let us consider this second case first.

  1. I sell you two bets for the maximum price you will pay for them. The bets are [$1 if A\&E_{i}, $0 otherwise] and [$x if not-E_{i}, $0 otherwise] where x = P_{old}(A|E_{i}).
  2. Wait to see which of the E_{i} the agent experiences.
  3. If E_{i} is true then buy from the agent the bet [$1 if A, $0 otherwise] for the minimum price the agent will accept, which is P_{i}(A).

Now consider what happens in the two different possible outcomes: either E_{i} obtains or it does not. If it does, then the agent will have a net loss of P_{old}(A|E_{i}) - P_{i}(A) which is positive because we supposed that the agent’s new probability is lower than her old conditional one. So in this case she loses money. In the other possible outcome, in which E_{i} is false, the loss is $0. Thus, as Lewis writes
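We can verify this bookkeeping with a short script. The specific credences below are illustrative assumptions, not Lewis’s: suppose P_{old}(E_i) = 0.5 and P_{old}(A|E_i) = 0.6, but on learning E_i the agent would set P_i(A) = 0.4, violating conditionalization.

```python
# Lewis's diachronic book, with illustrative (assumed) credences.
p_e = 0.5      # P_old(E_i)
x = 0.6        # x = P_old(A | E_i)
p_i_a = 0.4    # P_i(A): her non-conditionalized new credence, lower than x

def agent_net(e_obtains: bool, a_obtains: bool) -> float:
    """Agent's net in dollars over the whole sequence of bets."""
    # Step 1: she buys [$1 if A&E_i] at her fair price P_old(A&E_i) = x * P_old(E_i),
    # and [$x if not-E_i] at her fair price x * P_old(not-E_i).
    net = -(x * p_e) - (x * (1 - p_e))
    net += 1.0 if (a_obtains and e_obtains) else 0.0  # first ticket pays off
    net += x if not e_obtains else 0.0                # second ticket pays off
    # Step 3: if E_i obtains, she sells me [$1 if A] for P_i(A).
    if e_obtains:
        net += p_i_a
        net -= 1.0 if a_obtains else 0.0
    return round(net, 10)

print(agent_net(True, True))    # -0.2 = P_i(A) - P_old(A|E_i)
print(agent_net(True, False))   # -0.2: same loss whether or not A obtains
print(agent_net(False, False))  #  0.0: bets called off, no loss, no gain
```

If E_i obtains she is down P_{old}(A|E_i) - P_i(A) = 0.2 dollars regardless of A; if E_i does not obtain she breaks even. A risk of loss, with no chance of gain.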

“As a result of your failure to conditionalize, I can inflict on you a risk of loss uncompensated by any chance of gain; and I can do this without at any point using knowledge that you do not have.” (p. 406)

We can deploy a similar method to Dutch book the agent if her new probabilities are greater than her old conditional probabilities: we would buy at the first step and sell at the third. It really is just the principle “buy low, sell high”.

We have a diachronic Dutch book for the Bayesian principle of conditionalization. (If we are being very very careful, we would say that this is not a true Dutch book, since the agent doesn’t lose money no matter what happens — however, as Lewis said in the quote earlier, the agent is willing to take a risk of loss uncompensated by any chance of gain, which is also clearly bad.) Thus, we observe with Lewis that “If you can be thus exploited you are irrational; so you are rational only if you conditionalize” (p. 406).

It pays to be Bayes.
