We’re taught in school something like the following: DNA is the blueprint for an organism. We can sharpen this idea using tools from information theory. Biologists tend to say things like “DNA transmits information” or “DNA sequences are coded instructions”. However, some philosophers of biology have claimed that when biologists talk about information in this way they aren’t really talking about the kind of information described in information theory.
In “The transmission sense of information”, Bergstrom and Rosvall reply to two criticisms of this kind. They argue that both criticisms fail, and that once we think of information in the sense of transmission, information theory makes sense in the biological context.
***The original paper can be found here.***
The first criticism is that information theory only gives us a shallow type of correlation, which they call the “causal sense” of information. Here is the idea. In information theory one of the main tools for thinking about the relation between two things is mutual information. The mutual information between X and Y, which we denote by I(X;Y), tells us how much we learn about X when we learn about Y.
For example, if X is the variable describing where someone was born, and Y is the variable describing their first language, then we think intuitively that these two variables contain mutual information. I was born in Canada. This makes it more likely that my first language is English than if I were born in Greece. Similarly, if you know my first language is English, it makes it more likely that I was born in Canada than in Greece. Of course these things are not perfectly correlated; my mother was born in Canada but her first language is Greek. The idea is simply that, on average, we expect learning about one of these features to be informative about the other. It doesn’t need to be a perfect connection for them to have mutual information.
However, some people think that this notion is insufficient for our purposes in biology. This is because mutual information is perfectly symmetric: I(X;Y) = I(Y;X).
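To make the symmetry concrete, here is a minimal sketch that computes mutual information for the birthplace and first-language example. The joint probabilities are made up purely for illustration (they are not from the paper), and the function name `mutual_information` is my own.

```python
from math import log2

# Hypothetical joint distribution P(birthplace, first language).
# These numbers are invented for illustration only.
joint = {
    ("Canada", "English"): 0.40,
    ("Canada", "Greek"): 0.10,
    ("Greece", "English"): 0.05,
    ("Greece", "Greek"): 0.45,
}


def mutual_information(joint):
    """Return I(X;Y) in bits for a joint distribution given as {(x, y): p}."""
    # Marginal distributions P(x) and P(y), obtained by summing the joint.
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    # I(X;Y) = sum over (x, y) of p(x, y) * log2( p(x, y) / (p(x) * p(y)) ).
    return sum(
        p * log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )


# Swap the roles of the two variables: P(first language, birthplace).
swapped = {(y, x): p for (x, y), p in joint.items()}

i_xy = mutual_information(joint)
i_yx = mutual_information(swapped)
print(f"I(X;Y) = {i_xy:.4f} bits")
print(f"I(Y;X) = {i_yx:.4f} bits")
```

The two printed values agree, because the formula treats X and Y interchangeably. Nothing in the calculation says which variable is the "source" and which is the "target", and that is exactly what the first criticism objects to.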
When biologists say something like “DNA contains information about the phenotype” there is an implicit directionality here. The information is supposed to flow from the genotype to the phenotype. However, this directionality is lost if we use information theory’s notion of mutual information.
The second criticism is that information theory (allegedly) does not deal with semantics. The concern is the following. Biologists don’t think that DNA carries information merely because it is correlated with phenotype. Instead, there is a sense in which the DNA represents the phenotype. Importantly, the phenotype does not represent the DNA, even though they are correlated.
This is a semantic view of genetic information, meant to capture the kind of directionality that was shown to be lacking in the previous criticism. We can see this clearly in figure 1 of the paper.
So, in brief, the two criticisms are that (i) information theory gives us only a shallow, symmetric notion of information insufficient for biological purposes and (ii) information theory lacks the capacity to describe the semantic relationship between DNA and phenotype.
Bergstrom and Rosvall argue that information theory (which they sometimes call Shannon theory) is up to the challenge. In particular, they argue that paying attention to the main motivation for Shannon theory can reveal how it can withstand the two criticisms.
“As we described above, philosophers of biology largely restrict Shannon theory to a descriptive statistics for correlations. This misses the point. At the core of Shannon theory is the study of how far mathematical objects such as sequences and functions can be compressed without losing their identity, and if compressed further, how much their structure will be distorted. From this foundation in the limits of compression emerges a richly practical theory of coding: information theory is a decision theory of how to package information for transport, efficiently. It is a theory about the structure of those sequences that efficiently transmit information.” (p. 163)
Viewed in this way, information theory tells us something about how to transmit information. Bergstrom and Rosvall note that in order to make sense of this idea of transmission in biology we have to have a clear idea about who is sending the information to whom.
They suggest that the best way to think about it, and the way that biologists themselves think about it, is that the information is being sent from one generation to the next:
“Here is the transmission; we know that genes are transmitted from parent to offspring in order to provide the offspring with information about how to make a living (e.g. metabolize sugars, create cell walls, etc.) in the world. This suggests that we can make sense of a large fraction of the use of information language in biology if we adopt a transmission view of information.” (p. 165)
Instead of thinking about information flowing from the genes to the phenotype, as pictured in figure 1, we have the picture in figure 2 of the paper, in which information flows from one generation to the next.
This leads them to develop a view of the transmission, or how information is conveyed:
Transmission view of information:
An object X conveys information if the function of X is to reduce, by virtue of its sequence properties, uncertainty on the part of an agent who observes X.
Bergstrom and Rosvall argue that this view addresses both of the problems we discussed earlier. We can see in a sketch how it does this. The information in DNA is about how to make a living, but it is transmitted from generation to generation. Thus it has a kind of directionality, in the way biologists need it to. An offspring uses the information from its parents, encoded in DNA, to start making a living in the world.
To address the second criticism—the semantic criticism—they argue that semantics are separate from transmission, and that when we are focused on transmission (as we should be) the semantic objection loses its force.
“The transmission sense of information allows us to separate claims about how information is transmitted from claims about what information means. Indeed, we can study how information is transmitted without having any knowledge of the ‘codebook’ for how to interpret the message, or even what the information represents. In many biological studies, we are exactly in this position.” (p. 170)
It is understanding the function of DNA, which is to transfer information from generation to generation, that shows how information theory can be used as more than just a metaphor in biology. Indeed, it helps us see “what makes the genetic code a code, and we get a new perspective on the information language that is part of the everyday working vocabulary of researchers in genetics and evolutionary biology” (p. 174).