To make things concrete, consider an example provided by Maxwell & Delaney (2004,

*Designing Experiments and Analyzing Data: A Model Comparison Perspective*, 2nd Edition, Erlbaum; Ch. 12). (As I've mentioned elsewhere, if you must learn NHST, their book is a great resource.) A perceptual psychologist is studying response times for identifying letters flashed on a screen. The letters can be rotated off of vertical by zero degrees, four degrees, or eight degrees. Every subject responds many times to letters at each of the three angles. The experimenter (for unknown reasons) analyzes only the median response time of each subject at each angle. Thus,

*each subject contributes only one datum at each level of angle (factor B)*. There are two types of subjects: "young" and "old." Age of subject is factor A. The structure of the data is shown below. Y is the median response time, in milliseconds. There are 10 subjects per Age group.

| Y | Subj | Angle | Age |
|-----|------|-----------|------------|
| 450 | 1 | B1(Zero) | A1(Young) |
| 510 | 1 | B2(Four) | A1(Young) |
| 630 | 1 | B3(Eight) | A1(Young) |
| 390 | 2 | B1(Zero) | A1(Young) |
| 480 | 2 | B2(Four) | A1(Young) |
| 540 | 2 | B3(Eight) | A1(Young) |
| ... | ... | ... | ... |
| 510 | 10 | B1(Zero) | A1(Young) |
| 540 | 10 | B2(Four) | A1(Young) |
| 660 | 10 | B3(Eight) | A1(Young) |
| 420 | 11 | B1(Zero) | A2(Old) |
| 570 | 11 | B2(Four) | A2(Old) |
| 690 | 11 | B3(Eight) | A2(Old) |
| 600 | 12 | B1(Zero) | A2(Old) |
| 720 | 12 | B2(Four) | A2(Old) |
| 810 | 12 | B3(Eight) | A2(Old) |
| ... | ... | ... | ... |
| 510 | 20 | B1(Zero) | A2(Old) |
| 690 | 20 | B2(Four) | A2(Old) |
| 810 | 20 | B3(Eight) | A2(Old) |

**Split-plot designs are difficult to analyze in NHST** because it is challenging to select an appropriate "error term" for constructing *F* ratios of different effects. Ch. 12 of Maxwell & Delaney is devoted largely to explaining the various options for error terms and the different *F* ratios (and different *p* values) that result. Split-plot designs get even more difficult when they are unbalanced, that is, when there are different numbers of subjects in the different levels of factor A. And, of course, when we do multiple comparisons we have to worry over which corrections to use.

**In a Bayesian approach**, on the other hand, there is no need to worry over the selection of error terms because we never construct *F* ratios. And there is no difficulty with unbalanced designs because the parameters are estimated based on whatever data are provided. The only challenge, in the hierarchical Bayesian approach that I prefer, is the construction of *sum-to-zero parameterized effects*. In all the ANOVA designs described in the book, the parameter values are converted so that the effects, as deflections from baseline, sum to zero. (In case you haven't seen it yet, there is an important update regarding the sum-to-zero computations in the book's ANOVA programs, linked here.) With some effort, I figured out a way to convert the parameter estimates to sum-to-zero versions. It is done in R, outside of JAGS.
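To make the sum-to-zero idea concrete, here is a minimal numeric sketch (in Python rather than the R used by the actual program, and with made-up numbers): an unconstrained baseline-plus-deflections parameterization can be recentered so the deflections sum to zero without changing any predicted mean.

```python
# Illustrative sketch only (not the book's code): recenter deflections so
# they sum to zero, folding their mean into the baseline.
import numpy as np

base = 500.0                        # hypothetical unconstrained baseline
a = np.array([10.0, 40.0, 70.0])    # hypothetical deflections for B1, B2, B3

mu = base + a                       # predicted cell means

base_sz = base + a.mean()           # sum-to-zero baseline
a_sz = a - a.mean()                 # sum-to-zero deflections

assert np.isclose(a_sz.sum(), 0.0)       # deflections now sum to zero
assert np.allclose(base_sz + a_sz, mu)   # predictions are unchanged
```

The same recentering must be done at every step in the MCMC chain, which is why the conversion is applied to the chain output rather than built into the model.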

The model involves main effects for factor A and factor B, an interaction effect for AxB, and a main effect for subject within level of A. The basic model specification looks like this:

```
for ( i in 1:Ntotal ) {
  y[i] ~ dnorm( mu[i] , tau )
  mu[i] <- base + a[aLvl[i]] + s[sLvl[i]] + b[bLvl[i]] + axb[aLvl[i],bLvl[i]]
}
```

The index i goes over rows of the data table, and Ntotal is the total number of data points. The data values, y[i], are assumed to be normally distributed around the predicted mean mu[i]. The predicted mean is the overall baseline, base, plus a deflection a[aLvl[i]] due to being in level aLvl of factor A, plus a deflection for each subject, plus a deflection for factor B, plus a deflection due to interaction of factors A and B. There is a hierarchical prior on each type of deflection, so that the variance of the deflections is estimated.

The tricky part comes in converting the parameter values to sum-to-zero values. Essentially, at each step in the chain the predicted mu is computed, then the cell means for the AxB table are computed. From the cell means, the baseline, A, B, and AxB deflections are computed. And, from the cell means, the within-level deflections of the subjects are computed. (The difficult part for me was figuring out that I should compute the cell means of the AxB table *first*, and compute everything else, including the subject effects, relative to those cell means.)
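The conversion at a single chain step can be sketched as follows (in Python for concreteness; the real program does this in R, and all numbers here are invented). The baseline is the grand mean of the AxB cell means, the A and B deflections are marginal means minus the grand mean, and the interaction deflections are whatever remains.

```python
# Hedged sketch of the sum-to-zero conversion at one MCMC step.
import numpy as np

m = np.array([[530., 580., 640.],    # hypothetical AxB cell means:
              [560., 640., 740.]])   # rows = Age (A), columns = Angle (B)

base = m.mean()                                    # grand mean
aEff = m.mean(axis=1) - base                       # A deflections
bEff = m.mean(axis=0) - base                       # B deflections
axb  = m - base - aEff[:, None] - bEff[None, :]    # interaction deflections

# Every deflection type sums to zero:
assert np.isclose(aEff.sum(), 0) and np.isclose(bEff.sum(), 0)
assert np.allclose(axb.sum(axis=0), 0) and np.allclose(axb.sum(axis=1), 0)

# Subject deflections are computed relative to the cell means of the
# subject's own level of A, then centered within that level:
subj_means = np.array([540., 570., 610.])          # hypothetical A1 subjects
sEff = subj_means - m[0].mean()
sEff = sEff - sEff.mean()
assert np.isclose(sEff.sum(), 0)                   # sums to zero within level
```

The key point, echoing the parenthetical above, is that the cell means come first and everything else, including the subject effects, is defined relative to them.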

The first figure (below) shows marginals of the posterior standard deviations on the various effects and the noise standard deviation. The estimates of the standard deviations are not of primary interest, but here they are:

The next plot (below) shows the baseline, factor A deflections, factor B deflections, and AxB interaction deflections. Notice that the deflections do indeed sum to zero within each type:

The deflections are informative primarily when they are entered into contrasts. For example, the main effect of factor A is quite large:

This agrees with the conclusion from the NHST analysis of Maxwell and Delaney, who reported F(1,18)=7.28, p=.0147. But the Bayesian posterior seems much more powerful than the NHST *p* value. Did I do something wrong, or is the Bayesian analysis simply much more powerful in this design?

We can also do contrasts, such as comparing young versus old (A1 vs A2) at the angle of zero (B1) only. The result is shown in the left panel below, where it can be seen that a difference of zero is within the 95% HDI:
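A contrast like this is just arithmetic on the posterior samples of the cell means. Here is a hedged sketch (in Python, with simulated stand-in samples, not the actual chain from the program): compute the difference at each step, then summarize it with a 95% HDI.

```python
# Illustrative only: the "posterior samples" here are simulated, not real.
import numpy as np

rng = np.random.default_rng(0)
# Stand-in posterior samples of cell means, shape (steps, ageLevel, angleLevel):
m = rng.normal(loc=[[[500., 560., 620.],
                     [540., 620., 700.]]], scale=15., size=(4000, 2, 3))

diff = m[:, 1, 0] - m[:, 0, 0]     # Old minus Young, at angle zero (B1)

# 95% HDI: the narrowest interval containing 95% of the sorted samples.
s = np.sort(diff)
n = len(s)
k = int(np.ceil(0.95 * n))
widths = s[k-1:] - s[:n-k+1]
i = widths.argmin()
hdi = (s[i], s[i+k-1])
print("95% HDI of the contrast:", hdi)
```

Whether zero falls inside the HDI is then read directly off the interval, with no error-term decision required.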

This result agrees with the conclusion from NHST reported by Maxwell & Delaney (pp. 602-605), with F=3.16 and p=.092, or F=3.08 and p still >.05, depending on the choice of error term.

The right panel above shows a contrast that investigates the difference of quadratic trends across the age groups. A quadratic trend for the three levels of angle has contrast coefficients of 1, -2, 1. The question is whether the contrast is different across the two levels of age. The Bayesian analysis shows that a difference of zero is within the 95% HDI. This result agrees with the conclusion from NHST reported by Maxwell & Delaney (pp. 605-607), with F(1,54)=3.101 and p>.05, or F(1,18)=4.038 with p>.05, depending again on the choice of error term.
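The difference-of-trends contrast is likewise simple arithmetic on posterior samples of the cell means. A minimal sketch (Python, with simulated stand-in samples):

```python
# Illustrative only: simulated stand-in posterior samples of cell means.
import numpy as np

rng = np.random.default_rng(1)
m = rng.normal(loc=[[[500., 560., 620.],
                     [540., 620., 700.]]], scale=15., size=(4000, 2, 3))

quad = np.array([1., -2., 1.])        # quadratic trend coefficients over angle
trend = m @ quad                       # trend[step, ageLevel]
diff_of_trends = trend[:, 1] - trend[:, 0]   # Old trend minus Young trend
print("posterior mean of the difference:", diff_of_trends.mean())
```

The resulting sample of differences is summarized with an HDI in exactly the same way as any other contrast.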

As usual, the above contrasts would have to be corrected for multiple comparisons in an NHST approach.

It is seamless to analyze an unbalanced design in the Bayesian approach. For example, the analysis runs completely unchanged (with different results) if we simply delete the last subject from the data table, so that there are 10 subjects in level A1 but only 9 subjects in level A2.

**See revised version here.**

The data file is linked here, and the program is linked here. To run the program, just put it in the same folder as the data file and make sure that R has that folder as its working directory.

**The program uses JAGS and rjags, not OpenBUGS and BRugs. You must first install JAGS.**

*JAGS runs on all platforms, including Mac and Linux!* And then in R you must install rjags; just type `install.packages("rjags")`.

Please do comment!