Why should one expect the future to resemble the past?
This is one formulation of Hume’s problem of induction. Consider the claim that the sun will rise tomorrow. Why should we expect this? It is true that it has risen every day we’ve been alive, and every day for a few billion years before that. But why should we expect it to rise tomorrow? It is not logically inconsistent for the sun to rise a large number of days in a row, and then one day not.
Sandy Zabell claims that a theorem due to Bruno de Finetti leads to a partial solution of Hume’s problem of induction. This is one of the central claims of his essay “Symmetry and its Discontents” in the volume of the same name. Since I found this result incredibly interesting, I will focus on this argument. However, the rest of the paper is certainly worth reading as well, and (for example) answered a question I’ve wondered about in the past: why didn’t the ancient Greeks have probability theory? This post, then, does not exhaust the insights of the paper.
Suppose you have found a coin, and you are flipping it. It can come up either heads or tails. You have certain beliefs about the outcomes of the tosses before you flip it. For example, even though you have never flipped this particular coin before, you might think that heads and tails are equally likely on the next flip. As you flip this coin and observe the outcomes, you might shift your beliefs. For example, if H=heads and T=tails, and you observe the sequence
you might shift your beliefs so that you think heads is more likely than tails on the next toss, since you have observed more heads in this sequence.
One intuitive way to think of this, which you might already be employing, is in terms of the bias of the coin. For example, the coin could be biased towards heads, so that the chance of the coin landing heads on each toss is 2/3 (and thus the chance of it landing tails is 1/3). When you first find the coin and you don’t know its bias, you might have a few different hypotheses about the chances. For example, you might think it very likely that the coin has a chance of 1/2 to come down heads (and 1/2 to come down tails). But you might also entertain the hypothesis that the coin has a chance of 3/4 to come down heads, and another hypothesis that it has a chance of 1/4 to come down heads.
These are different chance hypotheses. It is common in statistical inference to set up a statistical model that has these chance hypotheses, and then as we observe more data (for example, coin flips), we update our beliefs about the chances. We then use our beliefs about how likely it is that the coin has different biases to calculate how likely we think it is for the coin to come up heads (or tails) on the next toss. We also needn’t limit ourselves to considering only a finite number of chance hypotheses; using a little more sophisticated mathematics, we can have a probability distribution over all chance hypotheses.
So we have the following sketch: we find this coin. We have certain beliefs about how likely the coin is to come down heads or tails. We have some uncertainty about what the true chance of the coin coming up heads is, but as we observe more and more flips we can update our beliefs (for example, using Bayes’ theorem) about the chances. We then use our beliefs about the chances to make predictions about how likely the coin is to come up heads on the next toss. This is the kind of Bayesian inference that we take to characterize good inductive reasoning.
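To make this sketch concrete, here is a minimal illustration in Python. It assumes the three chance hypotheses mentioned above (1/4, 1/2 and 3/4 for heads); the prior weights and the observed sequence "HHTH" are hypothetical choices made only for this example, not values taken from Zabell's essay.

```python
from fractions import Fraction

def update(prior, flips):
    """Bayes-update beliefs over chance-of-heads hypotheses, one flip at a time.

    `prior` maps each hypothesised chance of heads to its prior weight.
    `flips` is a string of 'H' and 'T'.
    """
    posterior = dict(prior)
    for flip in flips:
        # Weight each hypothesis by how likely it made this flip.
        unnormalized = {
            chance: weight * (chance if flip == "H" else 1 - chance)
            for chance, weight in posterior.items()
        }
        total = sum(unnormalized.values())
        posterior = {c: w / total for c, w in unnormalized.items()}
    return posterior

def predict_heads(beliefs):
    """Probability of heads on the next toss, averaging over the hypotheses."""
    return sum(chance * weight for chance, weight in beliefs.items())

# Hypothetical prior: most weight on a fair coin, some on each biased option.
prior = {
    Fraction(1, 4): Fraction(1, 4),
    Fraction(1, 2): Fraction(1, 2),
    Fraction(3, 4): Fraction(1, 4),
}

# Hypothetical observations, chosen only for illustration.
posterior = update(prior, "HHTH")

print(predict_heads(prior))      # 1/2
print(predict_heads(posterior))  # 37/62
```

Before any flips the predicted probability of heads is 1/2. After three heads and one tail, weight shifts towards the 3/4 hypothesis and the prediction rises to 37/62, a little under 0.6.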
In this context, we might think that we have the following Hume’s-problem-of-induction-flavoured problem: what justifies us in believing that there is such a chance in the background? That is, it seems that we are getting to make inferences about the future because we think that this coin has a chance that is not changing, and that this is why the future will resemble the past. What justifies us in believing there is such a chance?
One might also have concerns about chance for other reasons. Zabell notes, for example, that if the exact same coin were spun in the exact same way, it would come up the same way. Barring quantum phenomena, coins are actually deterministic objects. There isn’t really a chance that the coin has. It is more our uncertainty about the toss than the coin itself that is bringing in probability. Although Zabell does not say this in the paper, I have heard other philosophers say this: “chance is cheesy.”
How, then, are we justified in using a statistical model that invokes chances?
This is where de Finetti’s theorem comes in to save the day. Suppose we don’t like chances for one reason or another. However, I am fine with having beliefs about the sequences of observations I could make, without appeal to chance. For example, before I flip the coin, I might think that the sequence
is more likely than
Suppose my beliefs about how the coin will land satisfy the following condition. For each sequence of a certain length that has the same number of heads, my belief that that sequence will obtain is the same. Zabell uses the example of three flips. If my beliefs satisfy this condition, then my belief (represented here by the probability function P) is such that
P(HTT) = P(THT) = P(TTH)
P(HHT) = P(HTH) = P(THH)
That is, I assign the same probability to sequences with the same number of heads. We should notice two things here. The first is that P(HTT) need not equal P(HHT). These are sequences with different numbers of heads. The second thing to notice is that we have not referred to chances at all here. We are only talking about my belief (represented by the probability function P) that I will observe a certain sequence.
This condition is called exchangeability. Roughly, my beliefs are exchangeable if, whenever I assign a degree of belief p to a sequence (for example HHT), I assign the same degree of belief p to every rearrangement of that sequence (for example THH).
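The condition can be checked mechanically for short sequences. The sketch below is only an illustration under assumed numbers: `mixture_probability` builds a belief function by averaging over three chance hypotheses with hypothetical weights, and `sticky_probability` is a made-up belief function in which each toss tends to repeat the previous one. The first is exchangeable and the second is not. Note that this shows the easy direction (a mixture of chance hypotheses gives exchangeable beliefs), not de Finetti's theorem itself.

```python
from fractions import Fraction
from itertools import product

# Hypothetical weights over chance-of-heads hypotheses (illustration only).
PRIOR = {
    Fraction(1, 4): Fraction(1, 10),
    Fraction(1, 2): Fraction(6, 10),
    Fraction(3, 4): Fraction(3, 10),
}

def mixture_probability(seq):
    """Belief in a sequence of 'H'/'T', averaged over the chance hypotheses."""
    total = Fraction(0)
    for chance, weight in PRIOR.items():
        p = Fraction(1)
        for flip in seq:
            p *= chance if flip == "H" else 1 - chance
        total += weight * p
    return total

def sticky_probability(seq):
    """Made-up beliefs: each toss repeats the previous one with probability 3/4."""
    p = Fraction(1, 2)  # first toss is heads or tails with equal probability
    for previous, current in zip(seq, seq[1:]):
        p *= Fraction(3, 4) if current == previous else Fraction(1, 4)
    return p

def is_exchangeable(prob, length):
    """True if all sequences of `length` with equal head counts get equal belief."""
    values_by_heads = {}
    for flips in product("HT", repeat=length):
        seq = "".join(flips)
        values_by_heads.setdefault(seq.count("H"), set()).add(prob(seq))
    return all(len(values) == 1 for values in values_by_heads.values())

print(is_exchangeable(mixture_probability, 3))  # True
print(is_exchangeable(sticky_probability, 3))   # False
print(mixture_probability("HTT"), mixture_probability("HHT"))  # 33/320 39/320
```

The last line shows that sequences with different numbers of heads can still receive different degrees of belief, as exchangeability allows.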
Now we can (informally) state de Finetti’s influential result: if the probability function P satisfies exchangeability, then the following two things are true:
- The limiting relative frequency of the sequence exists (with probability one, according to P).
- We can represent this probability function using a distribution over chance hypotheses.
(Of course this is an incredibly informal statement of the theorem–for the actual statement, see Zabell’s essay.)
This is a fairly remarkable theorem. The first of the two results says that if our beliefs over a sequence of observations are exchangeable then the infinite sequence of such observations will have a limiting relative frequency (this is not always the case). The second says that talking about chances and chance hypotheses is mathematically equivalent to talking about just our beliefs.
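For readers who want to see the shape of the second result, the usual way of writing the representation for an infinite exchangeable sequence of coin tosses is sketched below. The notation is chosen here for illustration and is not necessarily Zabell's: X_i is 1 if toss i lands heads and 0 otherwise, k is the number of heads among the first n tosses, and μ is a probability distribution over the possible chances p of heads.

```latex
% Notation assumed for illustration: X_i = 1 for heads, 0 for tails;
% \mu is a probability measure on the interval [0,1] of chance hypotheses.
\[
  P(X_1 = x_1, \dots, X_n = x_n)
  = \int_0^1 p^{k} (1-p)^{\,n-k} \, d\mu(p),
  \qquad k = \sum_{i=1}^{n} x_i .
\]
% First result: the limiting relative frequency exists with probability one.
\[
  \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i
  \quad \text{exists } P\text{-almost surely.}
\]
```

For three tosses this gives, for example, P(HHT) = ∫ p²(1−p) dμ(p), which depends only on the number of heads and not on their order.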
These results help to address our concerns. We were concerned before that talking about chances was cheesy for a number of reasons–we didn’t want to commit ourselves to believing in chances. But we see from this result that we don’t have to. Whenever we talk about chances, we could always in principle reduce our chance talk to just statements about our beliefs. Thus, even though we might use chance hypotheses and the associated statistical models for the sake of expediency, this does not commit us to believing in chances.
Furthermore, the first of the two results means that if our beliefs are exchangeable, then we expect that in the long run there will be some stable relative frequency of heads to tails. Of course, talk of heads and tails is not important, and the result is much more general–this can be about whether or not the sun rises for example.
This gives us what Zabell takes to be a partial resolution to Hume’s problem of induction, if the problem is cast a certain way. Zabell describes the problem as follows:
“In the coin-tossing situation, this [problem] reduces to: in a long sequence of tosses, if a coin comes up heads with a certain frequency, why are we justified in believing that in future tosses of the same coin, it will come up heads (approximately) the same fraction of the time?” (p. 5)
We see how de Finetti’s result helps us. If our beliefs about the outcomes are exchangeable, then we believe that there will be a relative frequency of heads to tails in the long run. Thus, from exchangeability alone we get that we expect the world (of observations) to have some kind of global structure. Furthermore, because we can mathematically represent our beliefs as if we have hypotheses about chances, we can do Bayesian inference in a chance framework. Moreover, for all but certain “highly opinionated, eccentric, or downright kinky ‘priors’” (Zabell, p. 5) the posterior distribution over chance hypotheses will be peaked around the observed relative frequency. For example, if we have observed one million tosses, and about 50% of them have come down heads and 50% have come down tails, then the posterior distribution over chance hypotheses will be peaked around the 1/2 heads chance hypothesis.
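The peaking of the posterior can be illustrated with a small calculation. The sketch below assumes a uniform prior over all chance hypotheses, which is my choice for the example rather than anything the essay requires. Under that assumption, after observing some heads and tails the posterior is a Beta distribution, and its mean and standard deviation have simple closed forms.

```python
import math

def beta_posterior_summary(heads, tails, prior_a=1.0, prior_b=1.0):
    """Mean and standard deviation of the posterior over the chance of heads.

    Assumes a Beta(prior_a, prior_b) prior; the default Beta(1, 1) is the
    uniform distribution over chances. The posterior is then
    Beta(prior_a + heads, prior_b + tails).
    """
    a = prior_a + heads
    b = prior_b + tails
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(variance)

# Ten tosses versus one million tosses, both with half heads.
for heads, tails in [(5, 5), (500_000, 500_000)]:
    mean, sd = beta_posterior_summary(heads, tails)
    print(f"{heads + tails:>9} tosses: mean = {mean:.4f}, sd = {sd:.5f}")
```

With ten tosses the posterior is still spread widely around 1/2; with one million tosses the standard deviation is about 0.0005, so nearly all of the posterior weight sits very close to the observed relative frequency.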
Thus, with respect to this particular version of Hume’s problem of induction, “de Finetti vanquishes Hume” (Zabell, p. 5).
As mentioned earlier, this is only one part of the paper. Zabell goes on to examine this assumption of exchangeability, and describes it as a kind of symmetry in our beliefs. He then explores justifications for such symmetry assumptions and the role they played in the historical development of inference and probability theory. I hope this summary has piqued your interest, and I strongly encourage you to read the paper (and the whole book!)–it is well worth it.