To make things concrete, consider an example provided by Maxwell & Delaney (2004, Designing Experiments and Analyzing Data: A Model Comparison Perspective, 2nd Edition, Erlbaum; Ch. 12). (As I've mentioned elsewhere, if you must learn NHST, their book is a great resource.) A perceptual psychologist is studying response times for identifying letters flashed on a screen. The letters can be rotated away from vertical by zero, four, or eight degrees. Every subject responds many times to letters at each of the three angles. The experimenter (for unknown reasons) analyzes only the median response time of each subject at each angle. Thus, each subject contributes only one datum at each level of angle (factor B). There are two types of subjects: "young" and "old." Age of subject is factor A. The structure of the data is shown below. Y is the median response time, in milliseconds. There are 10 subjects per Age group.
   Y | Subj | Angle     | Age
-----+------+-----------+-----------
 450 |    1 | B1(Zero)  | A1(Young)
 510 |    1 | B2(Four)  | A1(Young)
 630 |    1 | B3(Eight) | A1(Young)
 390 |    2 | B1(Zero)  | A1(Young)
 480 |    2 | B2(Four)  | A1(Young)
 540 |    2 | B3(Eight) | A1(Young)
 ... |  ... | ...       | ...
 510 |   10 | B1(Zero)  | A1(Young)
 540 |   10 | B2(Four)  | A1(Young)
 660 |   10 | B3(Eight) | A1(Young)
 420 |   11 | B1(Zero)  | A2(Old)
 570 |   11 | B2(Four)  | A2(Old)
 690 |   11 | B3(Eight) | A2(Old)
 600 |   12 | B1(Zero)  | A2(Old)
 720 |   12 | B2(Four)  | A2(Old)
 810 |   12 | B3(Eight) | A2(Old)
 ... |  ... | ...       | ...
 510 |   20 | B1(Zero)  | A2(Old)
 690 |   20 | B2(Four)  | A2(Old)
 810 |   20 | B3(Eight) | A2(Old)
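For reference, here is a minimal sketch of how a long-format data table like this could be read into the index vectors used by the model below; the file name and column names are illustrative assumptions, not the actual linked files.

myData <- read.csv( "SplitPlotData.csv" )      # illustrative file name; long format
y      <- myData$Y                             # median response times
aLvl   <- as.numeric( factor( myData$Age ) )   # 1=A1(Young), 2=A2(Old)
bLvl   <- as.numeric( factor( myData$Angle ) ) # 1=B1(Zero), 2=B2(Four), 3=B3(Eight)
sLvl   <- as.numeric( factor( myData$Subj ) )  # 1..20, each subject nested in one Age
Ntotal <- length( y )                          # total number of data points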
Split-plot designs are difficult to analyze in NHST because it is challenging to select an appropriate "error term" for constructing F ratios of different effects. Ch. 12 of Maxwell & Delaney is devoted largely to explaining the various options for error terms and the different F ratios (and different p values) that result. Split-plot designs get even more difficult when they are unbalanced, that is, when there are different numbers of subjects in the different levels of factor A. And, of course, when we do multiple comparisons we have to worry over which corrections to use.
In a Bayesian approach, on the other hand, there is no need to worry over the selection of error terms because we never construct F ratios. And there is no difficulty with unbalanced designs because the parameters are estimated based on whatever data are provided. The only challenge, in the hierarchical Bayesian approach that I prefer, is the construction of sum-to-zero parameterized effects. In all the ANOVA designs described in the book, the parameter values are converted so the effects, as deflections from baseline, sum to zero. (In case you haven't seen it yet, there is an important update regarding the sum-to-zero computations in the book's ANOVA programs, linked here.) With some effort, I figured out a way to convert the parameter estimates to sum-to-zero versions. It is done in R, outside of JAGS.
The model involves main effects for factor A and factor B, an interaction effect for AxB, and a main effect for subject within level of A. The basic model specification looks like this:
for ( i in 1:Ntotal ) {
  # Likelihood: each datum is normal around its predicted mean; tau is a precision in JAGS
  y[i] ~ dnorm( mu[i] , tau )
  # Predicted mean: baseline plus deflections for age level, subject, angle, and interaction
  mu[i] <- base + a[aLvl[i]] + s[sLvl[i]] + b[bLvl[i]] + axb[aLvl[i],bLvl[i]]
}
The index i goes over rows of the data table, and Ntotal is the total number of data points. The data values, y[i], are assumed to be normally distributed around the predicted mean mu[i]. The predicted mean is the overall baseline, base, plus a deflection a[aLvl[i]] due to being in level aLvl[i] of factor A, plus a deflection for each subject, plus a deflection for factor B, plus a deflection due to the interaction of factors A and B. There is a hierarchical prior on each type of deflection, so that the variance of the deflections is estimated from the data.
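To make the hierarchical structure concrete, here is a minimal sketch of what those priors might look like in JAGS. The uniform hyperpriors on the standard deviations and the names NaLvl, NbLvl, Nsubj, sigmaY, aSD, and so on are illustrative assumptions, not the book's exact code.

# Illustrative hierarchical priors (not the book's exact code):
base   ~ dnorm( 0 , 1.0E-6 )          # vague prior on the baseline
tau    <- pow( sigmaY , -2 )          # dnorm in JAGS takes a precision
sigmaY ~ dunif( 0 , 1000 )            # assumed vague prior on the noise SD
aSD    ~ dunif( 0 , 1000 )            # SD of factor A deflections, estimated from data
aTau   <- pow( aSD , -2 )
for ( j in 1:NaLvl ) { a[j] ~ dnorm( 0 , aTau ) }
bSD    ~ dunif( 0 , 1000 )            # SD of factor B deflections
bTau   <- pow( bSD , -2 )
for ( k in 1:NbLvl ) { b[k] ~ dnorm( 0 , bTau ) }
axbSD  ~ dunif( 0 , 1000 )            # SD of interaction deflections
axbTau <- pow( axbSD , -2 )
for ( j in 1:NaLvl ) { for ( k in 1:NbLvl ) { axb[j,k] ~ dnorm( 0 , axbTau ) } }
sSD    ~ dunif( 0 , 1000 )            # SD of subject deflections within levels of A
sTau   <- pow( sSD , -2 )
for ( m in 1:Nsubj ) { s[m] ~ dnorm( 0 , sTau ) }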
The tricky part comes in converting the parameter values to sum-to-zero values. Essentially, at each step in the chain the predicted mu is computed, then the cell means for the AxB table are computed. From the cell means, the baseline, A, B, and AxB deflections are computed. And, from the cell means, the within-level deflections of the subjects are computed. (The difficult part for me was figuring out that I should compute the cell means of the AxB table first, and compute everything else, including the subject effects, relative to those cell means.)
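Here is a minimal sketch of that conversion for a single step of the chain, assuming equal weighting of the AxB cells; the function and argument names are illustrative, not the actual program's.

# Sum-to-zero conversion for one MCMC step (illustrative names).
# Arguments: base (scalar), a (length-2), b (length-3), axb (2x3 matrix),
# s (length-20 subject deflections), aOfSubj (level of A for each subject).
sumToZero <- function( base , a , b , axb , s , aOfSubj ) {
  sBar   <- tapply( s , aOfSubj , mean )              # mean subject deflection per A level
  m      <- base + outer( a + sBar , b , "+" ) + axb  # AxB cell means, computed first
  b0     <- mean( m )                                 # new baseline: grand mean of cells
  aEff   <- rowMeans( m ) - b0                        # A deflections, sum to zero
  bEff   <- colMeans( m ) - b0                        # B deflections, sum to zero
  axbEff <- m - b0 - outer( aEff , bEff , "+" )       # interaction deflections, sum to zero
  sEff   <- s - sBar[ aOfSubj ]                       # subject deflections, zero within level
  list( b0=b0 , aEff=aEff , bEff=bEff , axbEff=axbEff , sEff=sEff )
}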
The first figure (below) shows the marginal posteriors of the standard deviations of the various deflections, along with the noise standard deviation. These estimates are not of primary interest, but here they are:
The next plot (below) shows the baseline, factor A deflections, factor B deflections, and AxB interaction deflections. Notice that the deflections do indeed sum to zero within each type:
The deflections are informative primarily when they are entered into contrasts. For example, the main effect of factor A is quite large:
This agrees with the conclusion of the NHST analysis in Maxwell & Delaney, who reported F(1,18)=7.28, p=.0147. But the Bayesian posterior seems much more powerful than the NHST p value. Did I do something wrong, or is the Bayesian analysis simply much more powerful in this design?
We can also examine more specific contrasts, such as comparing young versus old (A1 vs A2) at the zero-degree angle (B1) only. The result is shown in the left panel below, where it can be seen that a difference of zero is within the 95% HDI:
This result agrees with the conclusion from NHST reported by Maxwell & Delaney (pp. 602-605), with F=3.16 and p=.092, or F=3.08 and p still >.05, depending on the choice of error term.
The right panel above shows a contrast that investigates the difference of quadratic trends across the age groups. A quadratic trend over the three levels of angle has contrast coefficients of 1, -2, 1. The question is whether this contrast differs across the two levels of age. The Bayesian analysis shows that a difference of zero is within the 95% HDI. This result agrees with the conclusion from NHST reported by Maxwell & Delaney (pp. 605-607), with F(1,54)=3.101 and p>.05, or F(1,18)=4.038 with p>.05, depending again on the choice of error term.
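For concreteness, here is a minimal sketch of how these contrasts could be computed from the converted chain. It assumes the AxB cell means have been reassembled at each step into an array cellChain of size steps x 2 (ages) x 3 (angles); that name and shape are illustrative assumptions.

# Posterior contrasts from an assumed (steps x 2 x 3) array of cell means:
quadCoef <- c( 1 , -2 , 1 )                        # quadratic trend over the 3 angles
mainA    <- rowMeans( cellChain[ ,1, ] ) -
            rowMeans( cellChain[ ,2, ] )           # young vs old, averaged over angle
atZero   <- cellChain[ ,1,1 ] - cellChain[ ,2,1 ]  # A1 vs A2 at B1 (zero degrees) only
quadDiff <- as.vector( cellChain[ ,1, ] %*% quadCoef -
                       cellChain[ ,2, ] %*% quadCoef ) # quadratic-trend difference
# Each result is a vector of posterior samples; summarize it with, e.g., its 95% HDI.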
As usual, the above contrasts would have to be corrected for multiple comparisons in an NHST approach.
The Bayesian approach handles an unbalanced design seamlessly. For example, the analysis runs completely unchanged (with different results, of course) if we simply delete the last subject from the data table, so that there are 10 subjects in level A1 but only 9 subjects in level A2.
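As a sketch (with an illustrative data-frame name), the deletion is a single line, and everything downstream runs as before:

myData <- myData[ myData$Subj != 20 , ]   # drop subject 20: 10 young, 9 old remain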
See revised version here.
The data file is linked here, and the program is linked here. To run the program, just put it in the same folder as the data file and make sure that R has that folder as its working directory. The program uses JAGS and rjags, not OpenBUGS and BRugs. You must first install JAGS. JAGS runs on all platforms, including Mac and Linux! And then in R you must install rjags; just type install.packages("rjags").
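In R, the setup described above amounts to something like this sketch; the folder path and program file name are illustrative placeholders, not the actual linked files.

install.packages("rjags")           # one-time, after installing JAGS itself
setwd("path/to/analysis/folder")    # folder containing the program and the data file
source("SplitPlotProgram.R")        # illustrative name for the program file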
Please do comment!