Paper Review: Philosophy and the practice of Bayesian statistics

One of the more interesting things I’ve learned in my life is that our best account of epistemology is that rational beliefs are governed by the probability axioms. Furthermore, there is a specific way in which a rational agent updates her beliefs given new evidence—Bayesian conditionalization.

Of course there is disagreement on this. At this point in my life, taking into consideration all of the arguments I’ve heard for one position or another, the balance of evidence seems to point pretty strongly towards the Bayesian account of epistemology. Thus, I am always very interested in arguments against the Bayesian picture—especially arguments that others take to be good arguments. I want to make sure I haven’t missed anything. If Bayesian epistemology is correct I want to know it is correct; if Bayesian epistemology is incorrect I want to know it is incorrect.

Andrew Gelman and Cosma Shalizi are two influential statisticians, and they have written one of the more popular arguments against Bayesianism. One of my Bayesian friends suggested we read the paper and see whether we found it persuasive. This led to a very rich meeting over coffee with some lovely and brilliant people—Saira Khan, Nikhil Addleman, and Aydin Mohseni (who wrote a guest post not long ago)—to whom I am indebted for their help in gaining clarity on the article.

I like the idea of thesis culture, so I’ll practice it here. My claim in this post is that although Gelman and Shalizi make interesting and worthwhile points concerning the practice of Bayesian statistics and the role it plays in our scientific methodology, overall their criticisms fail to make contact with Bayesian epistemology. That is, their argument against Bayesian epistemology fails. Having said that, I will highlight a particular way in which their argument could persuade someone for whom the main evidence for Bayesian epistemology is the success of Bayesian statistics (this is not my main evidence for Bayesian epistemology, nor even a large part of it).

***The original paper can be found here.***

The main target of the paper is “the usual story” of Bayesian statistics, “which [the authors] don’t like” (p. 8). The usual story is basically the idea I sketched earlier. An agent has some degrees of belief, and she updates them according to new evidence via Bayesian conditionalization. In the statistical context Gelman and Shalizi are working in, we can think of the agent as having a set of models under consideration. For example, she might have three models she is considering — a fair coin, a coin with a 40% chance of coming down heads, and a coin with a 60% chance of coming down heads. She has a probability distribution over these different models, which captures her uncertainty about which model is the actual data-generating process in the real world.

As the agent gains new evidence — in our example she might flip the coin and record the outcomes — she updates her probability distribution over models via Bayesian conditionalization. Gelman and Shalizi provide a helpful schematic of what this might look like.

This picture seems inductive: an agent starts with certain beliefs, and as she encounters new pieces of data she revises those beliefs in a coherent fashion.
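To make the usual story concrete, here is a minimal sketch of conditionalization for the coin example above. It is plain Python of my own devising (the model names are stand-ins), not anything from the paper:

```python
# Three candidate models: each maps to its P(heads).
models = {"fair": 0.5, "heads_40": 0.4, "heads_60": 0.6}

# Prior: uniform uncertainty over which model generates the data.
belief = {name: 1 / 3 for name in models}

def conditionalize(belief, models, flip):
    """One step of Bayesian conditionalization on a single flip (1 = heads)."""
    # Posterior is proportional to prior times likelihood.
    posterior = {
        name: belief[name] * (p if flip else 1 - p)
        for name, p in models.items()
    }
    total = sum(posterior.values())
    return {name: w / total for name, w in posterior.items()}

# Observe a short sequence of flips, updating after each one.
for flip in [1, 1, 0, 1, 1, 1]:
    belief = conditionalize(belief, models, flip)

print(belief)  # most weight now sits on the 60%-heads model
```

Nothing else happens in the usual story: beliefs change only by this one rule.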

This is the view of Bayesianism that they dislike. Instead of endorsing this view, Gelman and Shalizi aim “to demonstrate a possible Bayesian philosophy that goes beyond the usual inductivism and can better match Bayesian practice as [they] know it” (p. 11).

What is this other view? It is that Bayesian inference is actually better understood as part of sophisticated forms of the hypothetico-deductive method. This is the kind of philosophy underlying classical statistics. To understand their position here, let’s sketch the classical view of inference, and then see why they think Bayesian inference supports the classical view over their dispreferred “usual story”.

Classical statistics follows the logic of falsification. Under this view, you don’t actually gain evidence for any particular theory or hypothesis. Instead, you only ever rule out, or falsify, theories. Gelman and Shalizi write:

Scientists devise hypotheses, deduce implications for observations from them, and test those implications. Scientific hypotheses can be rejected (i.e., falsified), but never really established or accepted in the same way.

p. 8-9

Consider as a simple example that you have the hypothesis “my roommate didn’t do the dishes”. From this hypothesis you deduce that, if you enter your kitchen and look in the sink, you should see dirty dishes. If you look in the sink and do not see dirty dishes, then you have falsified the hypothesis (you actually need some background assumptions here too — that no fairies did the dishes for your roommate, for example — but you get the idea).

However, from seeing dirty dishes in the sink you can’t confirm your hypothesis. Maybe you forgot about dishes you had left undone yourself, and your roommate really did the dishes after all. Or maybe your brother broke in and left some dirty dishes. So, even though you can falsify the hypothesis, you can’t confirm it.

The connection to classical statistics becomes apparent when we can’t actually falsify things. For example, if I have the hypothesis “this coin is fair”, then that hypothesis is logically compatible with any sequence of tosses. However, if I toss the coin 100 times and 98 of them are heads, then even though we haven’t seen something incompatible with the hypothesis, we have seen something that is very unlikely given the hypothesis. Classical statistics formalizes this kind of reasoning, which for ease of reference I will call soft falsification. Soft falsification is the reasoning pattern in which we reject a hypothesis because we got data that is very unlikely given the hypothesis, while recognizing that in principle the hypothesis could still be correct (as opposed to the strict hypothetico-deductive method, where rejection is a deductive matter).
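As a quick illustration of the numbers involved (my own sketch, using only Python’s standard library), we can compute how unlikely 98 or more heads in 100 tosses is under the fairness hypothesis:

```python
from math import comb

def tail_probability(n, k, p=0.5):
    """P(at least k heads in n tosses) under the hypothesis P(heads) = p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Logically possible, but absurdly improbable under fairness -- so we
# "softly falsify" the fair-coin hypothesis rather than strictly refute it.
print(tail_probability(100, 98))  # ~ 4e-27
```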

An important part of this process is model checking, in which a statistician examines the predictions of a statistical model to see if they are compatible enough with the observed data. If the model’s predictions are too far off, the model is rejected; this checking is a key part of the soft falsification process.

Now we are in a position to understand Gelman and Shalizi’s argument against the usual Bayesian story. The following is the core.

In the usual Bayesian story there is a set of models with a probability distribution over them. As new observations are made, the probability distribution gets updated according to Bayesian conditioning. In the usual Bayesian story this is all that happens. We can think of this as a kind of model selection, in which we are trying to find the true (or most predictively useful) model in our set of models. Under certain background assumptions this has really nice properties: for example, the agent is guaranteed to converge to the truth in a formal sense.
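For instance, here is a toy simulation (mine, not the paper’s) in which the true model is in the set, exactly as the usual story assumes; conditionalizing on enough data drives the posterior onto the truth:

```python
import math
import random

random.seed(0)
models = {"fair": 0.5, "heads_40": 0.4, "heads_60": 0.6}
true_p = 0.6  # crucially, the true model is one of the candidates

# Simulate 1,000 flips from the true data-generating process (1 = heads).
flips = [1 if random.random() < true_p else 0 for _ in range(1000)]

# Conditionalize a uniform prior on the whole sequence, working in log
# space to avoid numerical underflow.
log_post = {name: 0.0 for name in models}
for flip in flips:
    for name, p in models.items():
        log_post[name] += math.log(p if flip else 1 - p)

shift = max(log_post.values())
weights = {name: math.exp(lp - shift) for name, lp in log_post.items()}
total = sum(weights.values())
print({name: w / total for name, w in weights.items()})
# Essentially all posterior mass ends up on "heads_60".
```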

However, Gelman and Shalizi show that, in practice, this is not at all how Bayesian statistics is used. Instead, it works as follows. A statistician has some phenomenon she wants to investigate. She constructs a set of models which are her best first attempts to model the phenomenon. She does the normal Bayesian updating on her models for a bit, changing the distribution over them as new data come in. Most of the time, however, she notices that the ensemble of models is not doing a great job of predicting future observations. She can make this judgment in a number of ways, from formally specified statistical tests to visually inspecting summary statistics and visualizations of the data. If the fit is bad enough, then she rejects that set of models and moves to a different set that she constructs, using insights from her judgments about why the previous set failed. On this view, Bayesian inference is a tool that helps us expand our set of models into better and better ones.
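Here is a hedged sketch of what that checking step can look like, in the spirit of the posterior predictive checks Gelman favors. The alternating-coin setup and the switch-counting statistic are stand-ins of mine: the point is that the best model in the set can win the Bayesian comparison and still badly fail a check, which is exactly the signal to expand the model set.

```python
import random

random.seed(0)

# Suppose the real process alternates deterministically: H, T, H, T, ...
# Every i.i.d. coin model in our set is wrong, yet the fair coin still
# "wins" the Bayesian comparison, since the frequency of heads is 1/2.
observed = [i % 2 for i in range(200)]

def n_switches(seq):
    """Test statistic: how many consecutive pairs of flips differ."""
    return sum(a != b for a, b in zip(seq, seq[1:]))

# Posterior predictive check under the winning model (fair i.i.d. coin):
# simulate replicated datasets and compare the statistic to the observed one.
replicated = [
    n_switches([random.randint(0, 1) for _ in range(200)])
    for _ in range(1000)
]
observed_stat = n_switches(observed)  # 199: every consecutive pair differs
p_value = sum(rep >= observed_stat for rep in replicated) / len(replicated)

print(observed_stat, p_value)  # p is ~0, so the whole model set is rejected
```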

This model checking stage is entirely outside of the usual Bayesian story. In the usual story there is only updating via conditionalization, and that is the end of it. However, we see that the way in which Bayesian statistics is actually used fits much more cleanly into the classical hypothetico-deductive framework. Gelman and Shalizi summarize the disagreement:

The main point where we disagree with many Bayesians is that we do not see Bayesian methods as generally useful for giving the posterior probability that a model is true, or the probability for preferring model A over model B, or whatever.

p. 22

They connect this to the broader Bayesian philosophy as follows:

Conversely, a problem with the inductive philosophy of Bayesian statistics – in which science ‘learns’ by updating the probabilities that various competing models are true – is that it assumes that the true model (or, at least, the models among which we will choose or over which we will average) is one of the possibilities being considered. This does not fit our own experiences of learning by finding that a model does not fit and needing to expand beyond the existing class of models to fix the problem.

p. 32

We see that Gelman and Shalizi are arguing against a particular philosophical stance — the position that Bayesian inference à la usual story is a full account of inductive inference (compare with the abstract of the paper).

Now that we have the core of their argument on the table, I want to articulate why I do not think this argument makes contact with the philosophical position against which they are arguing.

I agree that the way in which statisticians as a matter of fact do Bayesian statistics is not in line with the usual Bayesian story. However, the usual Bayesian story is not meant to be an account about the particular way in which statisticians use their models. I think we can understand the gap between the philosophical Bayesian position and the practice of statistics by thinking about the level of idealization.

Consider how Gelman and Shalizi start their abstract:

A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such…

p. 8

This is the philosophical position. This claim, that Bayesian inference is the core of rationality and inductive inference, requires a sufficient level of idealization to make precise. We are thinking about an idealized agent with a fully specified set (algebra) of propositions over which she has a probability function describing her beliefs. The algebra is supposed to be exhaustive, in that these are all of the hypotheses she considers possible at all. She then updates this probability function by conditioning on the evidence she encounters. This again is basically the usual story, but we see it is a story that takes place at a sufficiently high level of idealization. I certainly don’t have a fully specified algebra of propositions in my head over which I have a sharp probability function capturing my beliefs.
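For reference, the update rule itself is simple to state; it is everything around it (the exhaustive algebra, the sharp prior) that carries the idealization. Conditionalizing on evidence E sets the new credence in any hypothesis H to

$$P_{\text{new}}(H) \;=\; P_{\text{old}}(H \mid E) \;=\; \frac{P_{\text{old}}(E \mid H)\,P_{\text{old}}(H)}{P_{\text{old}}(E)}, \qquad \text{provided } P_{\text{old}}(E) > 0.$$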

When we speak about Bayesian inference as being the core of rationality and inductive inference, we are talking about a best-case, idealized kind of scenario. For something slightly more concrete to think about, imagine the best possible artificial intelligence, or an angelic intellect, that is not limited and systematically irrational in the ways we are.

Now, when I, qua limited finite intellect, am trying to learn about the world, of course I cannot consider my full algebra of propositions (or rather the set I would have if you sufficiently idealized me). Instead, since I have limited time and computing power, I want to think about and work with the simplest hypotheses I can. So I start with simple models, and maybe I treat them in a toy Bayesian statistics way or maybe I use more classical statistics, and I only move to more sophisticated models when I am forced to. When I am, I do in fact make judgments about when to “falsify” the assumptions underlying the ensemble, and about what kinds of models I should have in my next iteration.

The reason I do something like this, however, is that I am limited. This kind of inquiry is a bounded approximation of the optimal, idealized Bayesian inquiry I would otherwise be doing. (Gelman and Shalizi are certainly aware of this limitedness and mention it in the paper—they take it as a mark against Bayesian methods, though. I think this is a mistake that comes from confusing levels of idealization.)

This talk of idealization can be a little spooky, so I want to show why I think keeping track of it actually matters for the real world. As Gelman and Shalizi have it set up, we do the hypothetico-deductive soft falsification kind of thing, and this is really the heart of inference. However, I want to draw your attention to how much magic is present in this system.

Imagine you were trying to build an AI. What kind of epistemology would you give it?

If you give it the Bayesian epistemology, things are clear. You need to give it an algebra of propositions (a set of hypotheses) over which it can have a probability distribution, the probability function itself (its prior), and the rule it uses to change its probability function given new evidence (Bayesian conditionalization). (Realistically you would construct it with an algorithm that approximates these things to some degree.) Although there are choices that go into the algebra and the prior, the mechanics of the system are fully specified by the Bayesian framework.
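Here is a minimal sketch of what those three ingredients amount to in code. The class and its names are hypothetical, my own illustration rather than any standard API:

```python
class BayesianAgent:
    """A Bayesian epistemology in full: hypotheses, a prior, one update rule.

    `likelihood(h, e)` must return P(e | h). Beyond the choice of
    hypotheses, prior, and likelihood, nothing is left unspecified.
    """

    def __init__(self, hypotheses, prior, likelihood):
        self.credence = dict(zip(hypotheses, prior))
        self.likelihood = likelihood

    def observe(self, evidence):
        """Bayesian conditionalization: the one and only belief-change rule."""
        for h in self.credence:
            self.credence[h] *= self.likelihood(h, evidence)
        total = sum(self.credence.values())
        for h in self.credence:
            self.credence[h] /= total

# Usage, with the coin example: the designer's only choices are the
# hypothesis set and the prior; the mechanics are fixed by the framework.
agent = BayesianAgent(
    hypotheses=[0.4, 0.5, 0.6],  # each hypothesis: a value of P(heads)
    prior=[1 / 3, 1 / 3, 1 / 3],
    likelihood=lambda p, flip: p if flip else 1 - p,
)
for flip in [1, 1, 0, 1]:
    agent.observe(flip)
print(agent.credence)
```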

Now, instead, imagine you were trying to build it using the hypothetico-deductive soft falsification method. First of all, we face a similar problem of how to specify the initial hypothesis set. But now we also have to specify how the AI makes judgments about whether or not a model is good enough, and judgments about how to expand its model set.

Either we do not specify this, and then our AI doesn’t function, or we do specify it, and then we can start thinking about the rationality and optimality of these kinds of rules. But the hypothetico-deductive soft falsification method doesn’t even specify what the rules are! Right now it is some kind of primitive judgment that statisticians or scientists are making. Once we are forced to specify a rule, once we are forced to make it a more complete account of inference, there is no more room for magic. And, I claim, the same kinds of arguments we normally take to support the Bayesian account of inference and learning at the object level will make us Bayesians at this higher-order level. These are things like coherence, Dutch books, convergence properties, etc. Once we push magic out of our system we end up being Bayesians again (with the caveat that bounded rationality can look different — like hypothesis testing — while trying its best to approximate the ideal case).
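To make the point vivid, here is one hypothetical way of writing the hypothetico-deductive soft falsification loop down as an explicit procedure. Every function name and the threshold below are stand-ins of mine; the method itself supplies none of them:

```python
def hd_soft_falsification(data, initial_models, fit, check, expand,
                          threshold=0.05):
    """The hypothetico-deductive loop with its judgment calls made explicit.

    `fit` does the ordinary Bayesian conditionalization; `check` returns a
    goodness-of-fit score such as a posterior predictive p-value; `expand`
    proposes a new model set. The framework does not tell us what `check`,
    `expand`, or `threshold` should be -- those are exactly the primitive
    judgments, and once written down we can ask whether they are coherent,
    rational, and optimal.
    """
    models = initial_models
    while True:
        posterior = fit(models, data)                    # the Bayesian part
        if check(posterior, models, data) >= threshold:  # who chose this rule?
            return models, posterior
        models = expand(models, posterior, data)         # who chose this rule?
```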

This concludes my argument for why the paper failed to make contact with the philosophical Bayesian position. Now, however, I want to conclude by sketching one way in which it might undermine someone’s confidence in the Bayesian picture, if that person held it for certain reasons.

Bayesian statistics has had a lot of success recently. If someone’s main evidence for Bayesian epistemology was this success, then I think Gelman and Shalizi have shown that this line of evidence is undermined. Somewhat ironically, most of Bayesian statistics in practice fits more naturally into the hypothetico-deductive framework than into the Bayesian one. Thus, the person for whom the success of Bayesian statistics was the main source of evidence for Bayesian epistemology should find the article persuasive. However, I think that most philosophers of science are Bayesians mainly for other reasons, some of which I have touched on in other places.
