This post was provoked by a recent blog post at the New Yorker by Gary Marcus and Ernest Davis, in which they criticize Nate Silver's book, The Signal and the Noise. In the course of their critique, Marcus and Davis work through the standard textbook example involving diagnosis of breast cancer. They assume, for purposes of the example, that there is a well-known prior probability of breast cancer. They say,
A Bayesian approach is particularly useful when predicting outcome probabilities in cases where one has strong prior knowledge of a situation. ... But the Bayesian approach is much less helpful when there is no consensus about what the prior probabilities should be.

I think that the second sentence above is exactly wrong. The Bayesian approach is also helpful, perhaps uniquely helpful, when there is uncertainty in the prior. All we have to do is express the uncertainty in a mathematical model, and then let Bayesian inference tell us how to re-allocate credibility given the data. The present blog post is an explicit illustration of one way to do this.
Please note that this post is not defending or criticizing Nate Silver's book. It is about what Marcus and Davis say about Bayesian methods, illustrated specifically by the case of disease diagnosis; it is not about anything Silver does or doesn't say about Bayesian or non-Bayesian methods. My goal is to clarify what Bayesian methods can do, and specifically to show one way of expressing uncertainty in the case of disease diagnosis.
First, the standard textbook example, as provided by Marcus and Davis in their blog:
Suppose, for instance (borrowing an old example that Silver revives), that a woman in her forties goes for a mammogram and receives bad news: a “positive” mammogram. However, since not every positive result is real, what is the probability that she actually has breast cancer? To calculate this, we need to know four numbers. The fraction of women in their forties who have breast cancer is 0.014 [bold added], which is about one in seventy. The fraction who do not have breast cancer is therefore 1 - 0.014 = 0.986. These fractions are known as the prior probabilities. The probability that a woman who has breast cancer will get a positive result on a mammogram is 0.75. The probability that a woman who does not have breast cancer will get a false positive on a mammogram is 0.1. These are known as the conditional probabilities. Applying Bayes’s theorem, we can conclude that, among women who get a positive result, the fraction who actually have breast cancer is (0.014 x 0.75) / ((0.014 x 0.75) + (0.986 x 0.1)) = 0.1, approximately. That is, once we have seen the test result, the chance is about ninety per cent that it is a false positive.

In the case above, the prior probability (emphasized by bold font) is assumed to be exactly 0.014. This probability presumably came from some gigantic previous survey of women, using some extremely accurate assay of the disease, so that the prior probability is conceived as having no uncertainty.
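For readers who want to check the arithmetic, here is a minimal sketch of that point-prior calculation in Python (the language choice is mine; Marcus and Davis give only the formula):

# Point-prior version of the diagnosis example.
prior_disease = 0.014     # assumed prior probability of breast cancer
hit_rate = 0.75           # p(positive mammogram | cancer)
false_alarm_rate = 0.10   # p(positive mammogram | no cancer)
numerator = prior_disease * hit_rate
denominator = numerator + (1.0 - prior_disease) * false_alarm_rate
print(numerator / denominator)   # about 0.096, i.e., roughly one in ten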
But what if there is uncertainty about the prior probability of the disease? What if, instead of a gigantic previous survey, there was only a small survey, or no previous survey at all? Bayesian methods handle this situation well. I've got to introduce a little mathematical notation here, but the ideas are straightforward. Let's denote the true probability of the disease in the population by the symbol θ (Greek letter theta). The standard example above stated that θ=0.014. But in Bayesian methods, we can put a distribution of relative credibility across the entire range of θ values from 0 to 1. Instead of assuming we believe only in the exact value θ=0.014, we say there is a spectrum of possibilities, and the prior knowledge expresses how certain we are about the various possible values of θ. The distribution across the range of θ values could be very broad --- that's how we express uncertainty. Or, the distribution could be sharply peaked over θ=0.014 --- that's how we express certainty.
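For concreteness in the computations sketched later in this post, I will assume that the prior on θ is a beta distribution specified by its mode ω and a "concentration" κ that acts like the size of a previous survey: θ ~ beta(a,b) with a = ω(κ−2)+1 and b = (1−ω)(κ−2)+1. Large κ yields a distribution sharply peaked at ω, small κ yields a broad distribution, and κ=2 yields the uniform distribution. (This is just one natural way to encode the prior knowledge; the exact form of the prior is not essential to the point being made.)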
Here is the standard example in which there is high certainty that θ=0.014, with the positive test result denoted mathematically as y=1.
The left panel shows a sharp spike over the value θ=0.014, indicating that all other values have essentially zero credibility. (Ignore the messy numbers under "95% HDI." They are irrelevant for present purposes.) This "spike" distribution can be thought of as being based on a previous survey of 20,000 women. That's why the header of the left panel says "Prior Certainty: 20000." The right panel above shows the posterior probability of having the disease, given the positive test result (y=1). The right bar, centered on disease=1.0, has a height of 0.097, which is the posterior probability of having the disease. This closely matches the result stated by Marcus and Davis in their example.
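To make the computation behind that panel explicit, here is a minimal sketch in Python using the mode-and-concentration beta prior described above. It is a simplified stand-in for the full analysis behind the figures, so its output should be treated as approximate. The key simplification is that, because the patient's disease status is a single yes/no outcome governed by θ, marginalizing over the uncertain θ reduces to plugging in the prior mean of θ.

# Diagnosis with an uncertain disease prevalence theta ~ beta(a, b).
def prob_disease_given_positive(mode=0.014, kappa=20000,
                                hit_rate=0.75, false_alarm_rate=0.10):
    # Beta prior specified by mode and concentration (assumed parameterization).
    a = mode * (kappa - 2) + 1
    b = (1 - mode) * (kappa - 2) + 1
    mean_theta = a / (a + b)   # prior mean of theta = marginal p(disease)
    p_pos_and_disease = mean_theta * hit_rate
    p_pos_and_healthy = (1 - mean_theta) * false_alarm_rate
    return p_pos_and_disease / (p_pos_and_disease + p_pos_and_healthy)

print(prob_disease_given_positive(kappa=20000))   # about 0.097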
We can use Bayesian inference seamlessly when the prior is less certain. Suppose, for example, that the previous survey involved only 20 women. Then the prior is a broader distribution over possible underlying probabilities of the disease, and the result looks like this:
The left panel shows a distribution that is spread out over θ, meaning that we are not very certain what the value of θ is. (The displayed distribution is actually the posterior distribution of θ given the test result; the prior distribution looks similar.) In particular, lots of values of θ greater than 0.014 are considered to be possible. The right panel above shows the posterior probability of having the disease, given the positive test result. Notice that the Bayesian inference indicates that the probability of having the disease is about 0.336, much more than the result of 0.097 when the prior was highly certain. This makes sense; after all, the prior knowledge is uncertain, so the test result should weigh more heavily.
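Using the function from the sketch above with κ=20 instead of κ=20000 gives a value in the same ballpark as the figure (the small discrepancy reflects the simplifications in the sketch):

print(prob_disease_given_positive(kappa=20))   # roughly 0.33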
We can use Bayesian inference seamlessly even when the prior is hugely uncertain. Suppose that the previous survey involved only 2 women, which abstractly means merely that we know the disease can happen, but that's all we know. Then the prior distribution looks like this:
The prior, displayed above, shows that any value of θ is equally possible, and the probability of having the disease, according to this prior, is 50-50. The posterior distribution, given a positive test result, looks like this:
Notice that the posterior probability of having the disease is 0.882. Why so high? Because we have very uncertain prior knowledge about the disease, and all we have to go by is the outcome of the diagnostic test. (The Bayesian inference also automatically updates the distribution of credibility across θ as shown in the left panel above. It shows that, given the single positive test result, we should shift our beliefs about the underlying probability of the disease toward higher values.)
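Continuing the same sketch, κ=2 gives the uniform prior, and a small grid approximation confirms that credibility over θ shifts upward after the positive test result (numpy assumed available; the exact numbers in the figure may differ slightly):

import numpy as np

print(prob_disease_given_positive(kappa=2))   # about 0.882

# Re-allocation of credibility over theta, given y=1, on a grid:
theta = np.linspace(0.0, 1.0, 2001)
prior = np.ones_like(theta)                        # uniform prior (kappa = 2)
likelihood = 0.75 * theta + 0.10 * (1.0 - theta)   # p(y=1 | theta)
posterior = prior * likelihood
posterior /= posterior.sum()                       # normalize on the grid
print((theta * posterior).sum())                   # posterior mean of theta, above 0.5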
In this post I've tried to show that Bayesian inference is seamlessly useful regardless of how much uncertainty there is in prior knowledge. The simplistic framing of the standard textbook example should not be construed as the only way to do Bayesian analysis. Indeed, the whole point of Bayesian inference is to express uncertainty and then re-allocate credibility when given new data. If there is a lack of consensus in prior knowledge, then the prior should express that lack of consensus, for example with a broad prior distribution as illustrated in the examples above. If different camps have different strong priors, then Bayesian analysis can tell each camp how it should re-allocate its idiosyncratic beliefs. With enough data, the idiosyncratic posterior distributions will converge, despite starting from different priors, as illustrated in the sketch below.
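To illustrate that convergence concretely, here is a small sketch (with purely hypothetical survey numbers of my own invention) in which two different beta priors on θ are updated by the same large data set. Because the beta prior is conjugate to this kind of count data, the update just adds the observed counts to the prior's parameters:

# Two camps with different priors on theta, updated by the same (hypothetical) survey.
prior_a1, prior_b1 = 1.252, 18.748   # beta prior with mode 0.014, concentration 20
prior_a2, prior_b2 = 1.0, 1.0        # uniform prior
z, N = 140, 10000                    # hypothetical survey: 140 cases among 10,000 women
posterior_mean1 = (prior_a1 + z) / (prior_a1 + prior_b1 + N)
posterior_mean2 = (prior_a2 + z) / (prior_a2 + prior_b2 + N)
print(posterior_mean1, posterior_mean2)   # both are close to 0.014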
By the way, I am not aware of the sort of analysis I've provided above appearing elsewhere in the literature. But it's straightforward, and I imagine it must be "out there" somewhere. If any of you can point me to it, I'd appreciate it.