Probability plays a central role in this blog—many of my posts focus on where probability makes contact with philosophy and physics. However, there is also, of course, the mathematical theory of probability. The mathematics and the philosophy interact in many ways; often, technical results in the mathematics can be important for our work as philosophers.
For today’s post I want to dive into a paper that focuses on the more mathematical side of probability. The paper itself is very technical, and so I won’t be focusing on explaining how the proof or argument in the paper works. Instead I aim to provide the reader with an introduction to a few of the key concepts used in the paper, so that I can state the result of the paper with something approaching clarity.
***The original paper can be found here.***
There are three main concepts I want to discuss. The first is cardinality, the second is additivity, and the third is reflection/conglomerability. If you skim the post it will look mathematical, and it is, but I give you all the tools you need to follow along—no previous knowledge required.
I
The fundamental objects in mathematics are sets. A set is a collection of elements. For example, the set $A = \{1, 2, 3\}$ is the set that contains 3 elements: $1$, $2$, and $3$. We would say here that the cardinality of the set $A$ is 3. Similarly, the cardinality of the set $B = \{1, 2, 3, 4, 5\}$ is 5. Cardinality is like the size of a set.
Things become trickier when we move to infinite sets. For example, the set of natural numbers is infinite, since there is no largest natural number. We can’t assign this set a cardinality of any natural number, because it is bigger than any of them.
Does this mean that there are all the finite numbers, and then infinity, and these are all the possible cardinal numbers? Not quite. There are actually infinities of different sizes. To keep these separate, we want to assign different cardinal numbers to the different sizes.
To see that there are different sizes of infinities, we need to think a little more carefully about the size of a set. Finite sets are easy because they fit our intuition. If one set has 3 members and one set has 5, then the latter clearly is larger than the former. What about the set of natural numbers from before, $\mathbb{N} = \{1, 2, 3, \dots\}$, and the set of odd natural numbers, $O = \{1, 3, 5, \dots\}$?
On one line of thought, it looks like $\mathbb{N}$ should be bigger than $O$, because every odd number is also a natural number, but not the other way around. We write this relationship formally like this, $O \subseteq \mathbb{N}$, and say that $O$ is a subset of $\mathbb{N}$. The line of thought we were considering goes like this: one set is smaller than another if it is a subset of the other.
This is not the notion mathematicians use, and we can see why. First off, it doesn’t capture simple cases. For example, the set $\{1, 2, 3\}$ contains 3 elements, and the set $\{4, 5, 6, 7, 8\}$ contains 5, but it is not the case that $\{1, 2, 3\}$ is a subset of $\{4, 5, 6, 7, 8\}$, since they don’t have any of the same members. This criterion would not help us compare them. There we can still lean on our intuitions, since everything is finite. But how about these two sets: the even numbers $E = \{2, 4, 6, \dots\}$ and the odd numbers $O = \{1, 3, 5, \dots\}$? Neither is a subset of the other, and it is unclear how we should make the judgement. Are these two sets simply incomparable in size? That would certainly be an impoverished notion of cardinality.
Mathematicians have an answer to this problem, and it involves a special type of function called a bijection. You can think of a function as something that takes an object from one set, which we call the domain, say $X$, and maps it to an object in a different set, which we call the codomain, say $Y$. A bijection is a function that satisfies two properties: it is injective and surjective. A function is injective if no two elements of the domain get mapped to the same object. For example, if $f$ is our function, and if both $f(1) = a$ and $f(2) = a$, then $f$ is not injective. A function is surjective if every element of the codomain—in our example, $Y$—is ‘hit’ by the function. If we want to state this more carefully, we would say that for every element $y$ in $Y$, there is some $x$ in $X$ such that $f(x) = y$.
Now we can define a rigorous notion of cardinality. The cardinality of two sets $X$ and $Y$ is equal if and only if there is some bijective function from $X$ to $Y$. This is abstract, so let us consider an example.
Consider two sets $C = \{a, b, c\}$ and $D = \{x, y, z\}$. Neither is a subset of the other, since they don’t share any members. However, we can define a bijection $f$ between them as follows:

$$f(a) = x, \quad f(b) = y, \quad f(c) = z.$$

Notice that this satisfies our definition of a bijection. Each element of $D$ is hit by $f$, and no two elements of $C$ are mapped to the same element in $D$. Since this is a bijection, we say that $C$ and $D$ have the same cardinality, and we write $|C| = |D|$, where $|C|$ means the cardinality of $C$.
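If it helps to see this in code, here is a minimal Python sketch that checks the two properties for a finite mapping (the helper `is_bijection` is my own invention, not anything from the paper):

```python
def is_bijection(f: dict, domain: set, codomain: set) -> bool:
    """Check that f (given as a dict) is injective and surjective from domain onto codomain."""
    images = [f[x] for x in domain]
    injective = len(set(images)) == len(images)   # no two inputs share an output
    surjective = set(images) == codomain          # every element of the codomain is 'hit'
    return injective and surjective

C = {"a", "b", "c"}
D = {"x", "y", "z"}
f = {"a": "x", "b": "y", "c": "z"}

print(is_bijection(f, C, D))   # True, so |C| = |D|
```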
Notice that this allows us to compare the finite sets as well. Consider the sets $A = \{1, 2, 3\}$ and $C = \{a, b, c\}$ from above. We see that we can define a bijection $g$ between these sets as follows:

$$g(1) = a, \quad g(2) = b, \quad g(3) = c.$$

Thus, $|A| = |C|$. Furthermore, we see that, for example, $|A| \neq |B|$. Why? There cannot be a function from $A$ to $B$ that is surjective, since each element of $A$ can only be mapped to one element of $B$, but there are more than 3 elements in $B$.
However, we can also use this to show that the set of all odd numbers has the same cardinality as the set of naturals. We can see this because for the two sets

$$\mathbb{N} = \{1, 2, 3, \dots\} \quad \text{and} \quad O = \{1, 3, 5, \dots\}$$

we can define the function $f(n) = 2n - 1$, and this also satisfies the properties of a bijection.
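No program can check all infinitely many naturals, of course, but a quick sketch shows the pairing pattern on an initial segment:

```python
f = lambda n: 2 * n - 1   # the proposed bijection from the naturals to the odds

naturals = list(range(1, 11))           # 1 through 10
images = [f(n) for n in naturals]

print(images)                           # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(len(set(images)) == len(images))  # no repeats on this segment: True
```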
Now we can finally answer one of our earlier questions: are there different sizes of infinities? To show why this has to be the case I want to introduce the notion of a powerset. The powerset of a set $A$ is the set of all subsets of $A$. For example, if $A = \{1, 2, 3\}$ as before, then

$$\mathcal{P}(A) = \{\emptyset, \{1\}, \{2\}, \{3\}, \{1, 2\}, \{1, 3\}, \{2, 3\}, \{1, 2, 3\}\}.$$

We can see in this case that $|\mathcal{P}(A)| > |A|$, since the power set has 8 elements whereas the original set has only 3.
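Here is a small Python sketch that computes the powerset of a finite set with the standard library, confirming the count of $2^3 = 8$:

```python
from itertools import chain, combinations

def powerset(s):
    """All subsets of s, from the empty set up to s itself."""
    items = list(s)
    return [set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

A = {1, 2, 3}
print(powerset(A))       # [set(), {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}]
print(len(powerset(A)))  # 8
```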
Cantor, one of the founders of set theory, proved that for any set $A$, $|\mathcal{P}(A)| > |A|$. Consider the following line of reasoning. Suppose there were a bijection from $A$ to $\mathcal{P}(A)$, call it $f$. Remember, $\mathcal{P}(A)$ contains all subsets of $A$. One set we can define using $f$ is the set of elements of $A$ such that they are not a member of the set they get mapped to—call this set $R$. For example, in the above example with $A = \{1, 2, 3\}$, if $f(1) = \{2, 3\}$, then $1$ is a member of $R$, since it is not a member of $\{2, 3\}$. However, if $f(1) = \{1, 2\}$, then $1$ would not be a member of $R$, since it is a member of the set it got mapped to.

With this set up, consider the question: is the set $R$ hit by $f$? Remember, this was necessary for $f$ to be a bijection, since $R$ is a member of the power set of $A$. I claim that it is not hit by $f$. For suppose that there were some $x$ in $A$ such that $f(x) = R$. By the definition of $R$, $x$ is a member of $R$ exactly when $x$ is not a member of $f(x) = R$. But then $x$ is in $R$ if and only if $x$ is not in $R$. But since it can’t be both in it and not in it, this is a contradiction. Thus we see that $|A| \neq |\mathcal{P}(A)|$. However, since the map $g(x) = \{x\}$ is a perfectly fine injective function from $A$ to $\mathcal{P}(A)$, which shows that $\mathcal{P}(A)$ is at least as big as $A$, we have that $|A| < |\mathcal{P}(A)|$.
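The diagonal construction is easy to play with in code. Here is a sketch for one (necessarily doomed) attempt at mapping $A = \{1, 2, 3\}$ onto its powerset; the particular assignment below is arbitrary, and the point is that the diagonal set $R$ never appears among the outputs:

```python
A = {1, 2, 3}
f = {1: {2, 3}, 2: {1, 2}, 3: set()}   # one arbitrary attempted assignment of subsets

# R collects exactly the elements that are NOT members of the set they map to.
R = {x for x in A if x not in f[x]}

print(R)                 # {1, 3}
print(R in f.values())   # False: R is not 'hit' by f
```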
This is a fundamental result in set theory. It shows that there is a whole hierarchy of infinities—a whole hierarchy of cardinalities. Thus, we have a whole host of different infinite cardinalities to choose from. The $\kappa$ in the title of the paper refers to some infinite cardinal number. This completes my summary of the first concept.
II
The next concept is additivity. In particular, we are talking about the additivity of a probability function. Additivity tells us something about how we sum probabilities of different events.
In probability theory we work with something called an event space—this is a set of events. For example, if we are rolling a die, the event space might be the set $\{1, 2, 3, 4, 5, 6\}$, since there are 6 possible results of the die roll.
An event is a subset of this space. For example, rolling a 5 is an event—this is the set $\{5\}$. However, rolling an even number is also an event. This would correspond to the set $\{2, 4, 6\}$.
Knowing the probability of each of the basic events—the sides of the die—which in this case is one sixth, we might want to ask if we can calculate the probability of a different event, like rolling an even number.
Our intuition says that we sum up the probabilities of the individual basic events. For the even event, this means summing $\frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2}$. This makes sense, but what principle allows us to sum up the probability of other events like this?
What we are appealing to here is a hidden additivity principle. We have a probability function $p$ that takes as input an event, like rolling a 3—$\{3\}$—or rolling one of the 3 greatest numbers—$\{4, 5, 6\}$. The probability function $p$ assigns a real number between 0 and 1 to each event. The function must satisfy some properties, including additivity properties.
For example, consider the principle that if an event is the union—the set that contains all the elements of both sets; for example, the union of $\{1\}$ and $\{6\}$ is $\{1, 6\}$—of two disjoint events (two events are disjoint if they share no members), then the probability of the event is equal to the sum of the probabilities of the two events.
That was a little abstract, so let us consider an example. Suppose we are interested in the probability that we get either a 1 or a 6. This is the event $\{1, 6\}$. It is the union of two disjoint events: $\{1\}$ and $\{6\}$. We know the probability of each of these two events: $p(\{1\}) = p(\{6\}) = \frac{1}{6}$. We want to say then that the probability of getting a 1 or a 6 is $\frac{1}{6} + \frac{1}{6} = \frac{1}{3}$. In order to do this we are appealing to the principle in the preceding paragraph—a principle we might call 2-additivity, since it applies to unions of 2 events.
However, we see that from 2-additivity we get finite additivity for free. A probability function is finitely additive if the probability of any event that is the disjoint union of finitely many events is the sum of the probabilities of those events.
In order to see how we get this for free from 2-additivity, consider the earlier case of the even event, $\{2, 4, 6\}$. We can write this as the union of $\{2\}$ and $\{4, 6\}$. We don’t yet have the probability for this latter event, but we can split it again into the union of $\{4\}$ and $\{6\}$. We can then use 2-additivity to calculate the probability of the event $\{4, 6\}$, which is $\frac{1}{6} + \frac{1}{6} = \frac{1}{3}$, and then we use 2-additivity again to get the probability of the even event, $p(\{2, 4, 6\}) = \frac{1}{6} + \frac{1}{3} = \frac{1}{2}$. Thus, 2-additivity is sufficient for finite additivity.
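As a sanity check, here is a small Python sketch of this calculation (the names `atoms` and `p` are just for illustration), building event probabilities by summing the probabilities of the singletons:

```python
from fractions import Fraction

atoms = {side: Fraction(1, 6) for side in range(1, 7)}   # p({s}) = 1/6 for each side

def p(event):
    """Probability of an event (a subset of {1, ..., 6}) by summing its atoms."""
    return sum(atoms[s] for s in event)

print(p({2, 4, 6}))   # 1/2, matching the calculation above
print(p({1, 6}))      # 1/3
```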
Our example of the die roll was one of a finite event space. There were only 6 possible outcomes. However, in science we regularly consider infinite event spaces. Thus, we might want to consider probability functions that have stronger types of additivity. One type is countable additivity, which allows you to sum up the probabilities of collections of events of cardinality $\aleph_0$—we call this type countable because if a set is of this size there exists a bijection from it to the natural numbers—the countable numbers.
Most commonly used in probability theory is countable additivity, but we can still in principle think about higher degrees of additivity: additivity over collections of size $\kappa$, for example, where $\kappa$ is some arbitrary cardinal.
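To get a feel for countable additivity, consider the assignment $p(\{n\}) = 2^{-n}$ over the naturals (an example of mine, not from the paper). A program cannot add infinitely many terms, but a sketch of the partial sums shows them approaching 1, the probability of the whole space:

```python
# Partial sums of p({n}) = 2**-n for n = 1, 2, 3, ...
for cutoff in (5, 10, 20, 50):
    print(cutoff, sum(2.0 ** -n for n in range(1, cutoff + 1)))
# The sums approach 1 (the exact partial sum is 1 - 2**-cutoff).
```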
III
The last concept I want to introduce before stating the result of the paper is that of reflection/conglomerability. They are actually two different concepts, but I think starting with reflection provides a nice way to understand why conglomerability is interesting from a philosophical perspective.
The principle of reflection is a constraint on a rational agent’s probabilities. Consider an agent who is about to undergo a learning experience, in which she will observe one element of a partition. A partition is a set of events that are disjoint from each other, and that cover the whole space. For example, in our die case from before, one partition might be the two sets $\{1, 3, 5\}$ and $\{2, 4, 6\}$. The agent is trying to learn what the result of the die roll was, but we only tell her whether it was even or odd.
Reflection puts constraints on an agent’s current beliefs based on her possible future beliefs. For example, suppose that we want to know what an agent’s degree of belief that the die came up 1 should be. (Intuitively we think 1/6, but let us follow through this example to illustrate reflection, which can help us in more challenging cases). Suppose that her degree of belief in the die coming up 1 conditional on the even event is 0—that is, if she observes the even event then her new degree of belief in 1 will be 0 (since 1 is not even)—and her degree of belief in the die coming up 1 conditional on the odd event is 1/3. Furthermore, suppose that she assigns a credence of 0.5 to the roll being odd and a credence of 0.5 to the roll being even.
Reflection says that an agent’s current degree of belief in a proposition should be equal to the agent’s expectation of her future degree of belief. I illustrate this with the above case. If the agent learns that the roll was odd, then her probability in the proposition “the roll came up 1” is 1/3. If the agent learns that the roll was even, then her probability in the proposition “the roll came up 1” is 0. She assigns each of these cases probability 0.5. Thus, before she learns whether it was odd or even, her expectation of her future degree of belief is $0.5 \times \frac{1}{3} + 0.5 \times 0 = \frac{1}{6}$. Reflection says that her current degree of belief should equal one sixth. This is a kind of conservation of expected evidence principle.
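The arithmetic of the example is simple enough to spell out in a short sketch:

```python
from fractions import Fraction

posterior_given_odd = Fraction(1, 3)   # her belief in "came up 1" after learning odd
posterior_given_even = Fraction(0)     # ...after learning even
p_odd = p_even = Fraction(1, 2)

expected_future_belief = p_odd * posterior_given_odd + p_even * posterior_given_even
print(expected_future_belief)          # 1/6: what reflection says her current belief should be
```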
This connection to rationality can help us to see why conglomerability is interesting. Consider a probability function $p$, an event space, and an event $E$. The conditional probabilities are conglomerable with respect to a partition of size $\kappa$ if, for any two real constants $k_1$ and $k_2$, if the conditional probability of the event $E$ given each element of the partition is between the two constants, then this implies that the prior probability $p(E)$ is also between these two constants. In symbols: if $k_1 \leq p(E \mid h_i) \leq k_2$ for every element $h_i$ of the partition, then $k_1 \leq p(E) \leq k_2$.
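Here is a sketch of the conglomerability check for our finite die partition (again with helper names of my own). For finite partitions the condition always holds; the interest of the paper lies in how it can fail for suitably large infinite ones:

```python
from fractions import Fraction

atoms = {side: Fraction(1, 6) for side in range(1, 7)}

def p(event):
    return sum(atoms[s] for s in event)

def conditional(event, given):
    """p(event | given) for events over the die space."""
    return p(event & given) / p(given)

E = {1}                               # the roll came up 1
partition = [{1, 3, 5}, {2, 4, 6}]    # the odd/even partition

conds = [conditional(E, h) for h in partition]
k1, k2 = min(conds), max(conds)
print(k1 <= p(E) <= k2)               # True: p(E) = 1/6 lies between 0 and 1/3
```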
The authors give a nice summary of how this is interesting from an epistemic perspective:
> Conglomerability is an intuitively plausible property that probabilities might be required to have. Suppose that one thinks of the conditional probability $p(E \mid h_i)$ as representing one’s degree of belief in $E$ if one learns that $h_i$ is true. Then $p(E \mid h_i) \leq k_2$ for all $i$ in $I$ means that one believes that, no matter which $h_i$ one observes, one will have degree of belief in $E$ at most $k_2$. That is, if one knows for sure that one is going to believe that the probability of $E$ is at most $k_2$ after observing which $h_i$ is true, then one should be entitled to believe that the probability of $E$ is at most $k_2$ now. (p. 285)
Thus we see how we can capture an epistemic condition in a formal definition. Ideally we want our probability functions to be conglomerable.
IV
We can now state the main result of the paper, more or less. The main result is that

> Subject to several structural assumptions…the cardinality of a partition where [the probability function] $p$ is nonconglomerable is bounded above by the (least) cardinal $\kappa$ for which $p$ is not $\kappa$-additive… (p. 289)
In other words, there will be a partition, of size at most the least cardinal $\kappa$ for which our probability function is not $\kappa$-additive, for which conglomerability fails.
So what is the story here? We have an epistemic principle, something like a conservation of evidence principle, that we want our probability functions to satisfy. However, we have a mathematical result that says that this is impossible for certain large cardinalities in certain contexts. Thus, we have a case in which we clearly see highly technical work filter back up into our epistemology. Even though in some sense it is desirable that our beliefs should satisfy this property, it may in fact be impossible. We should not require rational agents to do the impossible.
Mathematical results can limit and inform our philosophy. We need to pay attention to them.