Saturday, April 13, 2013

Bayesian robust heteroscedastic hierarchical two-factor "ANOVA"

A great advantage of Bayesian analysis, in software such as JAGS/BUGS, is that you can create models that are appropriate to the data (instead of shoe-horning your data into inappropriate models). For example, suppose you have a design that would be analyzed traditionally by two-factor ANOVA. In traditional ANOVA, it is assumed that the data are distributed normally within cells and that the variances are the same for all cells. But suppose you have real data. In real data, there may be many outliers (i.e., distributions within cells have heavy non-normal tails) and the cells may have very different variances (i.e., there is "heteroscedasticity" across cells). In JAGS/BUGS, we can create a model that (1) accommodates outliers by using heavy-tailed distributions to describe data within cells, and (2) accommodates heteroscedasticity by using distinct variance parameters in each cell. And, in the program I present here, there is hierarchical structure that provides shrinkage on the cell means and cell variances.

The program is a variant of programs in DBDA, so I will not provide a self-contained tutorial in this post, instead relying on readers to consult Chapter 19 of DBDA for background information. Since writing DBDA, the ANOVA-style programs have been iteratively improved, as explained in various posts in this blog; search the blog with the term "ANOVA".

Important: The hierarchical structure in the model can produce implosive shrinkage for small data sets and/or priors that allow (or encourage) zero differences between cells. Implosive shrinkage is merely what the model implies, given the data, but it might not be what you think is a meaningful description of the data. If you get implosive shrinkage that you deem inappropriate or not useful, then you can change the prior structure so it more accurately reflects your modeling assumptions. For example, you could simply change the model so it has no hierarchical structure, but then, of course, you get no sharing of information across cells. The program presently uses prior distributions that de-emphasize zero differences between cells. This helps prevent implosive shrinkage, but it is also an expression of a prior belief that exactly-zero differences are improbable. In particular, the prior is not a half-Cauchy or folded-t distribution that some statisticians have recommended as a generic prior for hierarchical variance terms. Please see the comments inserted throughout the program.

As an example, consider a 3-by-4 two-factor between-subjects design. (And no, I don't have any within-subjects versions; sorry.) A link to the data file is included below. Here is a plot of the sample means:
Although not visible in the plot of the cell means, the cells have outliers and heteroscedastic variances. The results of a conventional ANOVA yield:

             Df Sum Sq Mean Sq F value   Pr(>F)
x1            2    658   329.1   2.375   0.0939 .
x2            3   3661  1220.2   8.804 1.02e-05 ***
x1:x2         6   1446   241.0   1.739   0.1096
Residuals   588  81495   138.6
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Notice, above, the non-significant (overall) interaction. Using the Bayesian program, we find, among other things, that there is a credibly non-zero interaction contrast (depending on your choice of ROPE):
The Bayesian program provides estimates of all the cell means and cell variances (etc.), so you can do lots of comparisons of cell means and cell variances.

The program (AnovaTwoFactor.R) and data file (AnovaTwoFactorData.csv) are available at the usual program repository.

1. Not disputing your Bayesian analysis here, but why say that classic ANOVA assumes "that the data are distributed normally within cells and that the variances are the same for all cells".
There is no normality assumption of data in ANOVA, only normality of residuals within cells - which is quite different from normality of data. Also, frequentist methods have adjustments for violations of homogeneity of variance (think Kruskal-Wallis or something like that).

2. Regarding normality, that's what I meant (and what I still think I said). Unless you mean something else...

Regarding frequentist "fixes" for violations of homogeneity... My goal is to show that with Bayesian methods you can actually create a model that is appropriate for your data. As in this example, you can actually estimate distinct variances for each cell, and put those estimates in a hierachical structure (if that is an appropriate expression of your prior). The Bayesian method seamlessly produces a posterior distribution on parameters that is directly interpretable.

3. Dear John,
I think I understand most of the modeling decisions in this program (thanks to your excellent book). One aspect I don't see is your warning about the "de-emphasis of zero differences between cells". How should that come about? Thank you very much!

4. Dear Johannes:

The priors on the SD parameters are gamma distributions that are parameterized by their mode and SD. This makes the left side of the gamma distribution plummet down at zero. (For related info, see http://doingbayesiandataanalysis.blogspot.com/2012/08/gamma-likelihood-parameterized-by-mode.html )

Admittedly, when the SD of gamma distribution is large, the left side of the gamma distribution can still put a lot of mass very close to zero, but it's supposed to be an attempt to let the prior be zero at zero.

Ultimately, this sort of hierarchical prior in ANOVA-style models works best when there are many levels on the factors, and/or lots of data in the cells. The shrinkage can produce strange results for small data sets.