A previous post reported an analysis of a "split plot" design, in which one factor is between subjects and a second factor is within subjects. That program has now been revised, and the advantage of Bayesian analysis over NHST has been confirmed.
My heartfelt thanks to

Wolf Vanpaemel for pointing out an error in one of the contrast analyses in the originally posted program. I have now corrected the error and expanded the program to handle the intended contrast. Subsequent discussion with Wolf also prompted me to think harder about whether the model I specified really matched the model used by NHST analyses, and whether the apparent advantage of the Bayesian analysis (reported in the previous post) really made sense. They do, as I explain below. I also revised other aspects of the program to incorporate the improvements in the most recent forms of the other ANOVA-type programs, as reported at

this previous post. (By the way, you may recall that Wolf and co-author

Francis Tuerlinckx wrote

a review of the book.)

**A reminder of the split plot design:**
Consider an example provided by Maxwell & Delaney (2004,

*Designing Experiments and Analyzing Data: A Model Comparison Perspective*,
2nd Edition, Erlbaum; Ch. 12). (As I've mentioned elsewhere, if you
must learn NHST, their book is a great resource.) A perceptual
psychologist is studying response times for identifying letters flashed
on a screen. The letters can be rotated off of vertical by zero degrees,
four degrees, or eight degrees. Every subject responds many times to
letters at each of the three angles. The experimenter (for unknown
reasons) analyzes only the median response time of each subject at each
angle. Thus,

*each subject contributes only one datum at each level of angle (factor B)*.
There are two types of subjects: "young" and "old." Age of subject is
factor A. The structure of the data is shown below. Y is the median
response time, in milliseconds. There are 10 subjects per Age group.

Y | Subj | Angle | Age |

450 | 1 | B1(Zero) | A1(Young) |

510 | 1 | B2(Four) | A1(Young) |

630 | 1 | B3(Eight) | A1(Young) |

390 | 2 | B1(Zero) | A1(Young) |

480 | 2 | B2(Four) | A1(Young) |

540 | 2 | B3(Eight) | A1(Young) |

... | ... | ... | ... |

510 | 10 | B1(Zero) | A1(Young) |

540 | 10 | B2(Four) | A1(Young) |

660 | 10 | B3(Eight) | A1(Young) |

420 | 11 | B1(Zero) | A2(Old) |

570 | 11 | B2(Four) | A2(Old) |

690 | 11 | B3(Eight) | A2(Old) |

600 | 12 | B1(Zero) | A2(Old) |

720 | 12 | B2(Four) | A2(Old) |

810 | 12 | B3(Eight) | A2(Old) |

... | ... | ... | ... |

510 | 20 | B1(Zero) | A2(Old) |

690 | 20 | B2(Four) | A2(Old) |

810 | 20 | B3(Eight) | A2(Old) |

(More info appears at

the original post.)

**An advantage of Bayesian analysis:**
The original post showed that the main effect of age, that is, the between-subject factor, was credibly non-zero, which agreed with the result from NHST, which had a

*p* value a little under .05. The estimated magnitude of difference between Old and Young closely matched the marginal means in the data.

*But the 95% HDI on the difference was very precise compared with the NHST confidence interval. In other words, the Bayesian analysis was much more powerful.* The diference was dramatic, so I expressed some doubt in the original post. But I am now convinced that the difference is real and correct. Essentially, the Bayesian analysis estimates all the parameters simultaneously, and creates a single, fixed posterior distribution over multi-dimensional parameter space. We then examine the posterior from difference perspectives, such as, for example, collapsing across all dimensions except Old minus Young, or collapsing across all dimensions except Four degrees minus Zero degrees. We do not "re-scale" the posterior for different perspectives on the parameters. In NHST, on the other hand, different tests can have different error terms in their

*F* ratios, which means that different comparisons are scaled differently. In particular, the

*F* ratio for Old versus Young uses a different error term (i.e., denominator) than the

*F* ratio for Four degrees versus Zero degrees, and it turns out that the former error term is larger than the latter error term, hence the former test is not as powerful, relatively speaking. Bayes is better!

**The models in the Bayesian analysis and in classical ANOVA:**
In classical ANOVA, the model analyzes the data variance into five components plus residual noise. (See Maxwell & Delaney, Ch. 12, cited above.) The five components are: (1) the (between-subect) main effect, A, (2) the (within-subject) main effect, B, (3) the main effect of subject within levels of A, S/A, (4) the interaction AxB, and (5) the interaction of B with subjects within levels of A, BxS/A. It turns out that the five components actually use up all the variance, so there is zero residual noise remaining. Another way of thinking about it is that

*the BxS/A interaction term cannot be identified separately from the noise*, because there is only one measurement per cell of the design. (That is also why the BxS/A term is used as the error term for some of the

*F* ratios.) Therefore the model I've used in the Bayesian analysis does

*not* include a BxS/A term:

for ( i in 1:Ntotal ) {
y[i] ~ dnorm( mu[i] , tau )
mu[i] <- base + a[aLvl[i]] + s[sLvl[i]] + b[bLvl[i]] + axb[aLvl[i],bLvl[i]]
# The model has no BxS term because that would leave zero noise variance.
}

The model remains unchanged from the original program; the news here is explicitly explaining it.

**The corrected contrast analysis:**
The original post included a few different contrast analyses. One was programmed incorrectly, because I mistakenly put it in the section for interaction contrasts, instead of creating a new section for so-called "simple" contrasts. While correcting the code, I also re-expressed the contrasts in the more meaningful new format reported at

this previous post. Thus, instead of just a matrix of numerical contrast coefficients without clear meaning, the new code uses logical referencing with the level names, like this:

simpleContrastList = list(
A1B1vA2B1 = ( outer(aLvlNames=="A2(Old)",bLvlNames=="B1(Zero)")
- outer(aLvlNames=="A1(Young)",bLvlNames=="B1(Zero)") )
)

That code simply means to compute the difference of "A2(Old),B1(Zero)" and "A1(Young),B1(Zero)", that is, the difference of Old and Young at Zero degrees angle. If you haven't previously tried programming a contrast, that code might still look mysterious, but once you get the hang of it, it's more meaningful and resistant to inadvertent re-ordering of levels than the old format:

simpleContrastList = list(
A1B1vA2B1 = matrix( c(1,0,0,-1,0,0) , nrow=2 , byrow=TRUE )
)

While that old form is shorter, it is less meaningful because it does not show the names of the levels being referred to, and it is easily broken if the levels get re-ordered by different operating system conventions for alphabetizing. One more change: The computation of the simple contrast, at the end of the program, relies on a new section of code.

**Other revisions in the program:**
I also changed the prior specification in the model, as explained at

this previous post. For example, here is the prior on the main effect of A:

for ( j in 1:NaLvl ) { a[j] ~ dnorm( 0.0 , aTau ) }
aTau <- 1 / pow( aSD , 2 )
aSD ~ dgamma(1.221,0.003683) # mode=60,sd=300. Change for scale of data!

The estimate of the standard deviation of the A effect,

aSD, has a gamma prior that has a mode at 60 with declining density toward zero, instead of a "generic" uniform or folded-t or gamma that has notable mass near zero.

Thanks again to Wolf for finding the error in the contrast analysis and spurring the revisions!

**Get the revised program here and the data file here.**