The 2nd Edition of the book visited the grave of Jacob Bernoulli (1655-1705) in Basel, Switzerland. Jacob Bernoulli pre-dated Bayes (1701-1761), of course, but Bernoulli established foundational concepts and theorems of probability. The photos below were taken by Marc Sager, who is a student at the University of St. Gallen (where I give a workshop in the summers). Thanks, Marc!

The 1st Edition also visited Bernoulli, as was blogged here. The 1st Edition also visited Bayes' tomb and the remains of R. A. Fisher. The book is still waiting to visit Laplace!

If you pose the book with other famous Bayesians, or pre-Bayesians, or anti-Bayesians, either dead or not-quite-yet dead, please send me those photos too! (The goal is to be amusing and informative, not offensive.) Thanks, and have fun!

# Doing Bayesian Data Analysis

## Tuesday, April 14, 2015

## Sunday, April 12, 2015

### Power is of two kinds (or: Gandhi, power, fear, love, and statistics)

In the chapter of DBDA2E on *Goals, Power, and Sample Size* (p. 384), I quoted the Mahatma Gandhi: "Power is of two kinds. One is obtained by the fear of punishment and the other by arts of love. Power based on love is a thousand times more effective and permanent than the one derived from fear of punishment." But I was not able to find an original source for that quote, and I said so in a footnote.

Now the original source has been revealed to me by reader Atul Sharma. (Thank you, Atul!) He even pointed me to an online archive of image scans of the original documents. Here is the relevant page; the passage starts at the bottom of the left column:

The image comes from the Gandhi Heritage Portal. The full reference is Gandhi, M. K. (1925, 08 January). *Young India*, p. 15.

*What did that quote have to do with statistical goals and power?* Well, I was being playful with the word "power" but there also was a deeper relationship. In classical statistics, "power" refers to the goal of rejecting the null hypothesis. But that goal has problems, and a better goal is seeking precision (and accuracy) of parameter estimation. On p. 384 of DBDA2E I said "The goal of achieving precision thereby seems to be motivated by a desire to learn the true value, or, more poetically, by love of the truth, regardless of what it says about the null value. The goal of rejecting a null value, on the other hand, seems too often to be motivated by fear: fear of not being published or not being approved if the null fails to be rejected. The two goals for statistical power might be aligned with different core motivations, love or fear." Then came the quote from Gandhi.

## Thursday, April 9, 2015

### Bayes factors for tests of mean and effect size can be very different

In this post, we consider a Bayes factor null-hypothesis test of the mean in a normal distribution, and we find the unintuitive result that *the Bayes factor for the **mean** can be very different than the Bayes factor for the **effect size***. The reason is that the prior distribution of the standard deviation affects the implied prior on the effect size. Different vague priors on the standard deviation can dramatically change the BF on the effect size. By contrast, the posterior distributions of the mean and of the effect size are very stable despite changing the vague prior on the standard deviation.

Although I caution against using Bayes factors (BFs) to routinely test null hypotheses (for many reasons; see Ch. 12 of DBDA2E, or this article, or Appendix D of this article), there might be times when you want to give them a try. A nice way to approximate a BF for a null hypothesis test is with the Savage-Dickey method (again, see Ch. 12 of DBDA2E and references cited there, specifically pp. 352-354). Basically, to test the null hypothesis for a parameter, we consider a narrow region around the null value and see how much of the distribution is in that narrow region, for the prior and for the posterior. The ratio of the posterior to prior probabilities in that zone is the BF for the null hypothesis.

Consider a batch of data randomly sampled from a normal distribution, with N=43. We standardize the data and shift them up by 0.5, so the data have a mean of 0.5 and an SD of 1.0. Figure 1, below, shows the posterior distribution on the parameters:

Figure 1. Posterior when using the unif(0,1000) prior on sigma (the prior is shown in Figures 2 and 3).

First, consider the mu (mean) parameter. From the relation of the 95% HDI and ROPE, we would decide that a value of 0 for mu is not very credible, with the entire HDI outside the ROPE and only 0.7% of the posterior distribution practically equivalent to the null value. For the effect size, a similar conclusion is reached, with the 95% HDI completely outside the ROPE, and only 0.8% of the posterior practically equivalent to the null value. Note that the ROPEs for mu and effect size have been chosen here to be commensurate.

To determine Bayes factors (BFs) for mu and effect size, we need to consider the prior distribution in more detail. It has a broad normal prior on mu with an SD of 100 and a broad uniform prior on sigma from near 0 to 1000, as shown in Figures 2 and 3:

Figure 2. Prior with unif(0,1000) on sigma. Effect size is shown better in Figure 3.

The implied prior on the effect size, in the lower right above, is plotted badly because of a few outliers in the MCMC chain, so I replot it below in more detail:

Figure 3. Implied prior on effect size for unif(0,1000) prior on sigma.

The BF for a test of the null hypothesis on mu is the probability mass inside the ROPE for the posterior relative to the prior. In this case, the BF is 0.7% / 0.1% (rounded in the displays), which equals about 7. That is, the null hypothesis is 7 times more probable in the posterior than in the prior (or, more carefully stated, the data are 7 times more probable under the null hypothesis than under the alternative hypothesis). Thus, the BF for mu decides in *favor* of the null hypothesis.

The BF for a test of the null hypothesis on the effect size is the analogous ratio of probabilities in the ROPE for the effect size. The BF is 0.8% / 37.2%, or about 0.02, which indicates a strong preference *against* the null hypothesis. **Thus, the BF for mu disagrees with the BF for the effect size.**

Now we use a different vague prior on sigma, namely unif(0,10), but keeping the same vague prior on mu:

Figure 4. Prior with unif(0,10) on sigma. Effect size is replotted in Figure 5. Compare with Figure 2.

Figure 5. Effect size replotted from Figure 4, using unif(0,10) on sigma.

The resulting posterior distribution looks like this:

Figure 6. Posterior when using unif(0,10) prior on sigma.

Compare the posterior in Figure 6 with the posterior in Figure 1. You will see they are basically identical. In other words, the 95% HDIs have barely changed at all, and decisions based on HDI and ROPE are identical, and the probability statements are identical.

But the BF for effect size is rather different than before. Now it is 0.8% / 0.4%, or about 2, which is to say that the probability of the null hypothesis has gone up, i.e., this is a BF that leans in *favor* of the null hypothesis. Thus, a less vague prior on sigma has affected the implied prior on the effect size, which, of course, strongly affected the BF on effect size.

To summarize so far, *a change in the breadth of the prior on sigma had essentially no effect on the HDIs of the posterior distribution, but had a big effect on the BF for the effect size while having no effect on the BF for mu.*

Proponents of BFs will quickly point out that the priors used here are not well calibrated, i.e., they are too wide, too diluted. Instead, an appropriate use of BFs demands a well calibrated prior. (Proponents of BFs might even argue that an appropriate use of BFs would parameterize differently, focusing on effect size and sigma instead of mu and sigma.) I completely agree that the alternative prior must be meaningful and appropriate (again, see Ch. 12 of DBDA2E, or this article, or Appendix D of this article) and that the priors used here might not satisfy those requirements for a useful Bayes factor.

But there are still two take-away messages:

First, the BF for the mean (mu) need not lead to the same conclusion as the BF for the effect size unless the prior is set up just right.

Second, the posterior distribution on mu and effect size is barely affected at all by big changes in the vagueness of the prior, unlike the BF.
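The dependence of the effect-size BF on the prior for sigma can be checked with a short simulation. What follows is only a minimal sketch in Python with NumPy, not the code that produced the figures in this post. It draws directly from the two priors described above (normal with SD 100 on mu, uniform on sigma), computes the implied prior on effect size as mu/sigma, and applies the ROPE-based Savage-Dickey ratio. The ROPE half-width of 0.1, the lower limit of 0.001 on sigma, and all function names are assumptions made for illustration; the posterior mass of 0.8% is simply copied from the text rather than re-estimated.

```python
import numpy as np

ROPE_HALF_WIDTH = 0.1  # assumed ROPE limits of +/-0.1; not stated in the post


def rope_mass(samples, half_width=ROPE_HALF_WIDTH):
    """Fraction of samples that fall inside the ROPE around zero."""
    return float(np.mean(np.abs(np.asarray(samples)) < half_width))


def implied_prior_rope_masses(sigma_upper, n=1_000_000, seed=1):
    """Prior mass inside the ROPE for mu and for effect size = mu / sigma.

    mu ~ normal(0, 100) and sigma ~ uniform(0.001, sigma_upper), where the
    lower limit 0.001 stands in for "near 0" (an assumption).
    """
    rng = np.random.default_rng(seed)
    mu = rng.normal(0.0, 100.0, size=n)
    sigma = rng.uniform(0.001, sigma_upper, size=n)
    return rope_mass(mu), rope_mass(mu / sigma)


def savage_dickey_bf(posterior_mass, prior_mass):
    """ROPE-based approximation: posterior mass over prior mass in the ROPE."""
    return posterior_mass / prior_mass


POSTERIOR_MASS_EFFECT = 0.008  # the 0.8% reported in the text for both priors

for upper in (1000.0, 10.0):
    mu_mass, effect_mass = implied_prior_rope_masses(upper)
    bf_effect = savage_dickey_bf(POSTERIOR_MASS_EFFECT, effect_mass)
    print(f"sigma ~ unif(0,{upper:g}): prior mass in ROPE: "
          f"mu {mu_mass:.2%}, effect size {effect_mass:.2%}; "
          f"BF for effect size {bf_effect:.2f}")
```

Under these assumptions the prior mass for mu in its ROPE does not depend on the sigma prior at all, while the prior mass for effect size drops from roughly 37% to roughly 0.4%, which is what flips the effect-size BF from well below 1 to about 2.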
