## Wednesday, December 21, 2016

### Bayesian assessment of null values

A blog post by Christian Robert considered an ancient (2011!) article titled "Bayesian assessment of null values via parameter estimation and model comparison." Here I'll try to clarify the ideas from way back then through the lens of more recent diagrams from my workshops and a new article.

Terminology: "Bayesian assessment of null values" is meant as neutral wording that refers to any Bayesian method for assessing null values. Bayesian "hypothesis testing" is reserved here for Bayes factors. Making decisions from a posterior interval is not referred to as hypothesis testing, and it is not equivalent to using Bayes factors.

Bayesian hypothesis testing: Suppose we are modeling some data with a model that has parameter δ in which we are currently interested, along with some other parameters. A null hypothesis model can be formulated as a prior on the parameters that puts a "spike" at the null value of δ but is spread out over the other parameters. A non-null alternative model puts a prior on δ that allows non-null values. The two models are indexed by a higher-level discrete parameter M. The entire hierarchy (a mixture model) has all its parameters updated by the data. The following slide from my workshops illustrates:

The Bayes factor (BF) is the shift in model-index probabilities from prior to posterior; that is, it is the factor by which the data convert the prior odds of the models into the posterior odds: p(M=1|D)/p(M=2|D) = BF × p(M=1)/p(M=2), where BF = p(D|M=1)/p(D|M=2).

Digression: I'll mention two usual caveats about using Bayes factors. First, Bayesian model comparison, whether for null hypothesis testing or more generally for any (possibly non-nested) models, must use meaningful priors on the parameters in both models for the Bayes factor to be meaningful. Default priors for either model are typically not very meaningful and quite possibly misleading.
Second, the Bayes factor is not the posterior probability of the models. Typically we ultimately want to know the posterior probabilities of the models, and the BF is just a step in that direction.
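To make the second caveat concrete, here is a minimal Python sketch (the function name and the numerical values are illustrative, not from the article) of converting a Bayes factor into a posterior model probability, given the prior odds of the models:

```python
def posterior_prob_model1(bayes_factor, prior_odds=1.0):
    """Convert a Bayes factor (model 1 vs model 2) and the prior odds
    p(M1)/p(M2) into the posterior probability of model 1."""
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# The same BF yields very different posterior probabilities
# depending on the prior odds of the two models:
print(posterior_prob_model1(4.5, prior_odds=1.0))  # equal prior odds: ~0.82
print(posterior_prob_model1(4.5, prior_odds=0.1))  # null favored a priori: ~0.31
```

The point of the two calls is that a BF of 4.5 sounds like moderate evidence, yet the resulting posterior probability of the alternative model depends entirely on the prior odds one is willing to assert.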

Assessing a null value through parameter estimation: There's another way to assess null values. This other way focuses on the (marginal) posterior distribution of the parameter in which we're interested. (As mentioned at the outset, this approach is not called "hypothesis testing.") This approach is analogous to frequentist equivalence testing, which sets up a region of practical equivalence (ROPE) around the null value of the parameter:

The logic of this approach stems from a direct reading of the meaning of the intervals. We decide to reject the null value when the 95% highest density interval (HDI) falls entirely outside the ROPE, that is, when none of the 95% most credible parameter values is practically equivalent to the null value. We decide to accept the null value when the 95% HDI falls entirely inside the ROPE, that is, when all of the 95% most credible parameter values are practically equivalent to the null value. Furthermore, we can make direct probability statements about the probability mass inside the ROPE, such as "the probability that the parameter is practically equivalent to the null is 0.017" or "the probability that the parameter is practically equivalent to the null is 0.984."
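This decision rule is easy to sketch in code. The following Python fragment (function names and the Beta posterior in the usage example are illustrative assumptions, not from the article) approximates the 95% HDI as the shortest interval containing 95% of posterior samples, then compares it against the ROPE:

```python
import random

def hdi_from_samples(samples, mass=0.95):
    """Shortest interval containing `mass` of the samples (approximate HDI)."""
    s = sorted(samples)
    n = len(s)
    k = int(mass * n)
    # among all intervals spanning k samples, take the narrowest
    width, i = min((s[i + k] - s[i], i) for i in range(n - k))
    return s[i], s[i + k]

def hdi_rope_decision(samples, rope, mass=0.95):
    """Accept/reject the null value by comparing the HDI to the ROPE."""
    lo, hi = hdi_from_samples(samples, mass)
    rope_lo, rope_hi = rope
    if hi < rope_lo or lo > rope_hi:
        return "reject null"   # HDI entirely outside the ROPE
    if lo >= rope_lo and hi <= rope_hi:
        return "accept null"   # HDI entirely inside the ROPE
    return "undecided"

# Illustration: samples from a Beta(15, 5) posterior
# (which arises from z = 14, N = 18 with a uniform prior):
random.seed(42)
samples = [random.betavariate(15, 5) for _ in range(50000)]
print(hdi_from_samples(samples))               # approximate 95% HDI
print(hdi_rope_decision(samples, (0.45, 0.55)))
```

Note that the third outcome, "undecided," is what makes the rule honest with small samples: the HDI can straddle the ROPE boundary, in which case neither acceptance nor rejection is warranted yet.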

The ROPE is part of the decision rule, not part of the null hypothesis. The ROPE does not constitute an interval null hypothesis; the null hypothesis here is a point value. The ROPE serves two main purposes in the decision rule: First, it allows decisions to accept the null (again, analogous to frequentist equivalence testing). Second, it makes the decision rule asymptotically correct: as the sample size increases, the rule converges to the correct decision, declaring the parameter either practically equivalent to the null value (within the ROPE) or not (outside the ROPE).

Juxtaposing the two approaches: Notice that the two approaches to assessing null values are not equivalent and have different emphases. The BF focuses on the model index, whereas the HDI and ROPE focus on the parameter estimate:

Therefore the two approaches will not always come to the same decision, though often they will. Neither approach is uniquely "correct"; the two approaches frame the question differently and provide different information.

Below is an example of the different information provided by hypothesis testing and estimation (for both frequentist and Bayesian analyses). The data are dichotomous, with z=14 successes out of N=18 attempts (e.g., 14 heads in 18 flips). The data are modeled by a Bernoulli distribution with parameter θ. The null value is taken to be θ=0.50. For the Bayesian analysis, the alternative-hypothesis prior on θ is uniform merely for purposes of illustration; a uniform prior is equivalent to dbeta(1,1) in R.
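Both quantities for these data can be computed in a few lines. The following Python sketch derives the Bayes factor analytically from the Beta-function marginal likelihoods and then integrates the posterior density over a ROPE; the ROPE limits (0.45, 0.55) and the midpoint-rule integrator are illustrative assumptions, not specifications from the article:

```python
from math import lgamma, log, exp

z, N = 14, 18          # 14 successes out of 18 attempts
rope = (0.45, 0.55)    # assumed ROPE around the null value 0.50

def log_beta(x, y):
    """Log of the Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)

# --- Bayes factor, alternative (uniform prior) vs null (theta = 0.50) ---
# p(D|alt) = B(z+1, N-z+1) / B(1, 1),  p(D|null) = 0.5**N;
# the binomial coefficient cancels in the ratio.
log_bf = log_beta(z + 1, N - z + 1) - log_beta(1, 1) - N * log(0.5)
bf = exp(log_bf)

# --- Posterior mass inside the ROPE; posterior is Beta(z+1, N-z+1) ---
def beta_pdf(t, x, y):
    return exp((x - 1) * log(t) + (y - 1) * log(1 - t) - log_beta(x, y))

def rope_mass(lo, hi, x, y, n=10_000):
    # simple midpoint-rule integration of the posterior density
    h = (hi - lo) / n
    return sum(beta_pdf(lo + (i + 0.5) * h, x, y) for i in range(n)) * h

mass = rope_mass(rope[0], rope[1], z + 1, N - z + 1)
print(f"BF (alt vs null)     = {bf:.2f}")   # about 4.5 in favor of the alternative
print(f"P(theta in ROPE | D) = {mass:.3f}") # small posterior mass near 0.50
```

The two numbers answer different questions: the BF of roughly 4.5 compares the two models as wholes, while the small ROPE mass describes where the credible parameter values actually sit.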
You can see above that Bayesian hypothesis testing and Bayesian parameter estimation provide very different information. Which approach to use for assessing the null value then comes down to careful interpretation and practicalities. For more discussion please see this article: Kruschke, J. K. and Liddell, T. (2017), The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychonomic Bulletin & Review.

## Friday, December 16, 2016

### The Bayesian New Statistics: Hypothesis Testing, Estimation, Meta-Analysis, and Power Analysis from a Bayesian Perspective

UPDATE: Now published, see this post.

Two conceptual distinctions in the practice of data analysis. Rows show point-value hypothesis testing versus estimating magnitude with uncertainty. Columns show frequentist versus Bayesian methods. Cells indicate the typical information provided by each approach. [Figure 1 of Kruschke & Liddell (in press), The Bayesian new statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychonomic Bulletin & Review.]
Many people have found the table above to be useful for understanding two conceptual distinctions in the practice of data analysis. The article that discusses the table, and many other issues, is now in press. (It was submitted in mid May, 2015, and was just accepted; a blog post announcing its original version is here, along with many comments.) The in-press version can be found at OSF and at SSRN.

Abstract: In the practice of data analysis, there is a conceptual distinction between hypothesis testing, on the one hand, and estimation with quantified uncertainty, on the other hand. Among frequentists in psychology a shift of emphasis from hypothesis testing to estimation has been dubbed "the New Statistics" (Cumming, 2014). A second conceptual distinction is between frequentist methods and Bayesian methods. Our main goal in this article is to explain how Bayesian methods achieve the goals of the New Statistics better than frequentist methods. The article reviews frequentist and Bayesian approaches to hypothesis testing and to estimation with confidence or credible intervals. The article also describes Bayesian approaches to meta-analysis, randomized controlled trials, and power analysis.