Saturday, May 19, 2012

Split-Plot Design in JAGS: Revised version

A previous post reported an analysis of a "split plot" design, in which one factor is between subjects and a second factor is within subjects. That program has now been revised, and the advantage of Bayesian analysis over NHST has been confirmed.

My heartfelt thanks to Wolf Vanpaemel for pointing out an error in one of the contrast analyses in the originally posted program. I have now corrected the error and expanded the program to handle the intended contrast. Subsequent discussion with Wolf also prompted me to think harder about whether the model I specified really matched the model used by NHST analyses, and whether the apparent advantage of the Bayesian analysis (reported in the previous post) really made sense. They do, as I explain below. I also revised other aspects of the program to incorporate the improvements in the most recent forms of the other ANOVA-type programs, as reported at this previous post. (By the way, you may recall that Wolf and co-author Francis Tuerlinckx wrote a review of the book.)

A reminder of the split plot design:
Consider an example provided by Maxwell & Delaney (2004, Designing Experiments and Analyzing Data: A Model Comparison Perspective, 2nd Edition, Erlbaum; Ch. 12). (As I've mentioned elsewhere, if you must learn NHST, their book is a great resource.) A perceptual psychologist is studying response times for identifying letters flashed on a screen. The letters can be rotated off of vertical by zero degrees, four degrees, or eight degrees. Every subject responds many times to letters at each of the three angles. The experimenter (for unknown reasons) analyzes only the median response time of each subject at each angle. Thus, each subject contributes only one datum at each level of angle (factor B). There are two types of subjects: "young" and "old." Age of subject is factor A. The structure of the data is shown below. Y is the median response time, in milliseconds. There are 10 subjects per Age group.
Y Subj Angle Age
450 1 B1(Zero) A1(Young)
510 1 B2(Four) A1(Young)
630 1 B3(Eight) A1(Young)
390 2 B1(Zero) A1(Young)
480 2 B2(Four) A1(Young)
540 2 B3(Eight) A1(Young)
... ... ... ...
510 10 B1(Zero) A1(Young)
540 10 B2(Four) A1(Young)
660 10 B3(Eight) A1(Young)
420 11 B1(Zero) A2(Old)
570 11 B2(Four) A2(Old)
690 11 B3(Eight) A2(Old)
600 12 B1(Zero) A2(Old)
720 12 B2(Four) A2(Old)
810 12 B3(Eight) A2(Old)
... ... ... ...
510 20 B1(Zero) A2(Old)
690 20 B2(Four) A2(Old)
810 20 B3(Eight) A2(Old)
(More info appears at the original post.)

An advantage of Bayesian analysis:
The original post showed that the main effect of age, that is, the between-subject factor, was credibly non-zero, which agreed with the result from NHST, which had a p value a little under .05. The estimated magnitude of difference between Old and Young closely matched the marginal means in the data. But the 95% HDI on the difference was very precise compared with the NHST confidence interval. In other words, the Bayesian analysis was much more powerful. The diference was dramatic, so I expressed some doubt in the original post. But I am now convinced that the difference is real and correct. Essentially, the Bayesian analysis estimates all the parameters simultaneously, and creates a single, fixed posterior distribution over multi-dimensional parameter space. We then examine the posterior from difference perspectives, such as, for example, collapsing across all dimensions except Old minus Young, or collapsing across all dimensions except Four degrees minus Zero degrees. We do not "re-scale" the posterior for different perspectives on the parameters. In NHST, on the other hand, different tests can have different error terms in their F ratios, which means that different comparisons are scaled differently. In particular, the F ratio for Old versus Young uses a different error term (i.e., denominator) than the F ratio for Four degrees versus Zero degrees, and it turns out that the former error term is larger than the latter error term, hence the former test is not as powerful, relatively speaking. Bayes is better!

The models in the Bayesian analysis and in classical ANOVA:
In classical ANOVA, the model analyzes the data variance into five components plus residual noise. (See Maxwell & Delaney, Ch. 12, cited above.) The five components are:  (1) the (between-subect) main effect, A, (2) the (within-subject) main effect, B, (3) the main effect of subject within levels of A, S/A, (4) the interaction AxB, and (5) the interaction of B with subjects within levels of A, BxS/A. It turns out that the five components actually use up all the variance, so there is zero residual noise remaining. Another way of thinking about it is that the BxS/A interaction term cannot be identified separately from the noise, because there is only one measurement per cell of the design. (That is also why the BxS/A term is used as the error term for some of the F ratios.) Therefore the model I've used in the Bayesian analysis does not include a BxS/A term:
  for ( i in 1:Ntotal ) {
    y[i] ~ dnorm( mu[i] , tau )
    mu[i] <- base + a[aLvl[i]] + s[sLvl[i]] + b[bLvl[i]] + axb[aLvl[i],bLvl[i]]
    # The model has no BxS term because that would leave zero noise variance.
 The model remains unchanged from the original program; the news here is explicitly explaining it.

The corrected contrast analysis:
The original post included a few different contrast analyses. One was programmed incorrectly, because I mistakenly put it in the section for interaction contrasts, instead of creating a new section for so-called "simple" contrasts. While correcting the code, I also re-expressed the contrasts in the more meaningful new format reported at this previous post. Thus, instead of just a matrix of numerical contrast coefficients without clear meaning, the new code uses logical referencing with the level names, like this:
  simpleContrastList = list(
    A1B1vA2B1 = ( outer(aLvlNames=="A2(Old)",bLvlNames=="B1(Zero)")
                  - outer(aLvlNames=="A1(Young)",bLvlNames=="B1(Zero)") )
That code simply means to compute the difference of "A2(Old),B1(Zero)" and "A1(Young),B1(Zero)", that is, the difference of Old and Young at Zero degrees angle. If you haven't previously tried programming a contrast, that code might still look mysterious, but once you get the hang of it, it's more meaningful and resistant to inadvertent re-ordering of levels than the old format:
  simpleContrastList = list(
    A1B1vA2B1 = matrix( c(1,0,0,-1,0,0) , nrow=2 , byrow=TRUE )
While that old form is shorter, it is less meaningful because it does not show the names of the levels being referred to, and it is easily broken if the levels get re-ordered by different operating system conventions for alphabetizing. One more change: The computation of the simple contrast, at the end of the program, relies on a new section of code.

Other revisions in the program:
I also changed the prior specification in the model, as explained at this previous post. For example, here is the prior on the main effect of A:
  for ( j in 1:NaLvl ) { a[j] ~ dnorm( 0.0 , aTau ) }
  aTau <- 1 / pow( aSD , 2 )
  aSD ~ dgamma(1.221,0.003683) # mode=60,sd=300. Change for scale of data!
The estimate of the standard deviation of the A effect, aSD, has a gamma prior that has a mode at 60 with declining density toward zero, instead of a "generic" uniform or folded-t or gamma that has notable mass near zero.

Thanks again to Wolf for finding the error in the contrast analysis and spurring the revisions!

Get the revised program here and the data file here.


  1. Coming from an agricultural background where plots are, well, plots (split-plots are divided, etc) and not having a copy of Maxwell & Delaney it would be very useful to see the code for the NHST analysis. It is hard to compare results between, say, nlme & JAGS in the abstract.

  2. Maybe you could use this post (and the oats dataset) to explain the split-plot example in a more traditional setting.


  3. The "split plot" terminology does seem to have different meanings in different fields (so to speak). It's a nice suggestion, therefore, to show the results from another data set that people might be more familiar with. Unfortunately, the oats data set has a different structure than what's being used in the Maxwell & Delaney example, as far as I can tell.

    In the oats data, there are three completely crossed factors: Block (6 levels), Variety (3 levels), and Nitrogen (4 levels). That's 72 (=6x3x4) data points altogether. No factor is nested within another factor. (At least, not as I understand the data matrix as shown in R.)

    But in the Maxwell & Delaney example, one of the factors (namely, subject) is nested within one of the other factors (namely, age). There are three factors: Subject (total of 20 levels, 10 in each of 2 ages), Age (2 levels), and Angle (3 levels). A complete crossing of the three factors would create 20x3x2 = 120 data points, but because Subject is not fully crossed with age, there are actually only 10x3 + 10x3 = 60 data points.

  4. Hi John,

    The treatment structure is a full factorial, but what makes it a split plot is the allocation of experimental units to treatments. Each of the six blocks was divided into 3 main plots (first level of randomization for plant varieties). Then each main plot was split into subplots (second level of randomization for fertilizer).

  5. Would you handle the effects and sum-to-zero constraints differently if the experimenter DID include multiple measurements of response time for each subject at each angle (instead of only using the median response)? Or would you still have a single main effect of subject within level A and proceed exactly the same?

    1. If every subject has many data in every cell, then it can make sense to model every subject individually, and put hierarchical structure across the individual parameters.

      For complex multifactor designs, a nice resource is the brms package (for Stan) by Paul Burkner,