Last week we explored a proposed solution to Hume’s problem of induction — Schurz’s meta-inductor. The idea was this: suppose we have a bunch of predictors, and we are predicting something like the next symbol in a sequence. For example, we might have seen this sequence so far:
and our goal is to predict the next digit. Take a quick second and see what you think the next digit is.
Now, suppose instead the sequence looked something like this:
and we have to guess the next one. Less confident now, aren’t you? There doesn’t seem to be as easy a pattern.
Now suppose you can look at a few other people, and see what they are predicting. Suppose you have a friend, call her Alice, who has predicted every digit correctly. You have no idea how she does it, and she either can’t or refuses to explain it to you. However, it still seems reasonable that you might want to copy her predictions, if you can learn what she predicts before you have to make your guess.
This is basically the idea of Schurz’s meta-inductor. It considers the outputs of other predictors and aggregates them in a coherent, optimal way that I touched on in the last post. Instead of just following whoever is doing the best, the meta-inductor weights each predictor’s output by how well that predictor has done so far. What was so interesting was not just that this meta-inductor did well, but that it was optimal with respect to that set of methods.
What exactly does that mean? Suppose you have a set of predictors, as well as the meta-inductor. What Schurz proved is that, relative to this set of predictors, the meta-inductor will perform at least as well as the best predictor in the long run: its average success eventually matches or comes arbitrarily close to that of whichever predictor does best.
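To make "at least as well as the best predictor" slightly more concrete, here is one standard way such guarantees are written in the online-learning literature. The notation is an assumption of mine rather than Schurz’s: $m$ is the number of predictors $P_1, \dots, P_m$, $n$ is the number of rounds, and $L_n(\cdot)$ is cumulative loss, with per-round loss convex and bounded in $[0, 1]$. For the exponentially weighted average forecaster (a close cousin of the meta-inductor, not necessarily the exact weighting rule Schurz uses), with a learning rate tuned to $n$, the bound reads:

$$
\frac{L_n(\mathrm{MI})}{n} \;-\; \min_{1 \le i \le m} \frac{L_n(P_i)}{n} \;\le\; \sqrt{\frac{\ln m}{2n}}
$$

For a fixed, finite $m$ the right-hand side shrinks to zero as $n$ grows, which is the sense in which the aggregate "eventually" does as well as the best predictor. Note that the bound depends on $m$, so the size of the predictor set is doing real work in the guarantee.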
This is interesting, and does indeed point to something like a solution to Hume’s problem. The reasoning runs roughly as follows. Suppose we have our set of methods, and one of them is the scientific method (however you want to think of it). Others could be things like using Tarot cards to predict things, or praying to the Greek gods for predictions, or flipping coins to generate predictions: any method to which you have access. We start off not privileging one method over another. Who knows, maybe in our world Tarot cards are great predictors of events. Then we start predicting, and keeping track of which methods do well and which don’t. Using meta-induction, we pay attention to the successful ones and discount the predictions of the unsuccessful ones.
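The "keep track, then weight by success" loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Schurz’s exact procedure: it uses the exponentially weighted average forecaster from online learning as a stand-in for his weighting rule, and the function name `meta_induct`, the three toy predictors, and the alternating sequence are all hypothetical.

```python
import math
import random


def meta_induct(predictions, outcomes, eta):
    """Aggregate predictors by exponentially down-weighting past losses.

    predictions[t][i] is predictor i's forecast in [0, 1] for round t,
    outcomes[t] is the true value in [0, 1]. Returns the meta-inductor's
    cumulative loss, each predictor's cumulative loss, and final weights.
    """
    n_predictors = len(predictions[0])
    cum_loss = [0.0] * n_predictors
    meta_loss = 0.0
    for preds, y in zip(predictions, outcomes):
        # Weights depend only on past performance, never on the current outcome.
        weights = [math.exp(-eta * loss) for loss in cum_loss]
        total = sum(weights)
        forecast = sum(w * p for w, p in zip(weights, preds)) / total
        meta_loss += abs(forecast - y)
        # Only now do we see the outcome and update each track record.
        for i, p in enumerate(preds):
            cum_loss[i] += abs(p - y)
    weights = [math.exp(-eta * loss) for loss in cum_loss]
    total = sum(weights)
    return meta_loss, cum_loss, [w / total for w in weights]


# Toy setup (all hypothetical): an alternating 0/1 sequence and three predictors.
rng = random.Random(0)
T = 200
outcomes = [t % 2 for t in range(T)]
# Column 0: "Alice", always right. Column 1: a coin flipper. Column 2: always 0.
predictions = [[y, rng.randint(0, 1), 0] for y in outcomes]

# Learning rate tuned to the horizon T and the number of predictors (3).
eta = math.sqrt(8 * math.log(3) / T)
meta_loss, cum_loss, weights = meta_induct(predictions, outcomes, eta)

print("meta-inductor loss:", round(meta_loss, 3))
print("predictor losses:  ", cum_loss)
print("final weights:     ", [round(w, 4) for w in weights])
```

The meta-inductor never needs to know how Alice does it. It starts with equal weights, and the coin flipper and the constant guesser lose influence as their errors pile up.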
Now, if we take Hume’s problem to be something like justifying why science works (this is not exactly Hume’s problem, but it is certainly in a similar spirit), then, according to meta-induction, we need only look at science’s track record relative to other methods we might deploy. This is the empirical part of the argument. We notice that science does a decent job of predicting things, and is certainly better than Tarot cards and astrologers.
Thus, concludes Schurz, we have a meta-inductive justification of the scientific method. We have a deductive proof that meta-induction is optimal, and we have the empirical result that science has been successful in the past. We thus evade Hume’s charge that using induction to justify induction is circular; we instead use meta-induction to justify induction (the scientific method).
This seems great, and I think it is! However, as you might have expected, there are wrinkles in this reasoning, one of which Arnold points out in his paper.
***The original paper can be found here.***
Remember what Schurz’s proof showed. It showed that a meta-inductor, relative to a set of other predictors, predicts optimally. However, we might reasonably wonder which set of predictors we should use. For example, had we not chosen to include the scientific method in our set, our argument would not have gone through. Thus, we should think carefully about which predictors to include in our set.
One such natural set might be all the computable methods; methods which we could, in principle, deploy. Another might be stated in an even simpler way: the set of all methods we could use.
Both of these sets are infinite: there are infinitely many computable methods, and presumably the computable methods are a subset of all the methods we could use. If you disagree with this, and think instead that the set of all methods we could possibly use is finite, then the following argument will not trouble you as much.
What Arnold proves is that, even though Schurz’s theorem holds for the finite case, in which there are only a finite number of other predictors, it does not hold in the case where there are an infinite number of predictors.
Why might this be a problem? Well, we are trying to claim something like: due to meta-inductive considerations, we have discovered that science is a good method. In an even more straightforward way, we want to claim that meta-induction is optimal–the claim that science is a good method to deploy is parasitic on the fact that meta-induction is optimal. So, since Arnold shows that in the infinite case meta-induction ceases to be optimal, our solution to Hume’s problem evaporates.
Where does this leave us? I am sympathetic to the objection that we cannot, in fact, use an infinite number of methods. However, it also seems that comparing ourselves to a finite, static number of methods is untenable–what if we come up with a better scientific theory, or a better prediction algorithm? We want some way to introduce these prediction methods into our set. That is, we want not just a finite set, but a dynamic set.
This is a currently active research area. For an excellent example of such current work, see here. I’m excited to see how this approach turns out. It bears a deep resemblance to the problem of new theories in Bayesian epistemology. I also think that the move to a finite but dynamic set of predictors avoids the problems of moving to something like Solomonoff induction, which otherwise might seem like the natural next step.
This kind of work is a great example of how we can deploy technical methods to tackle traditional philosophical problems. I think the adoption of this kind of methodology indicates a positive trajectory for philosophy.