But we don't care about the unconstrained parameters. What we care about is the sum-to-zero versions of the parameters, because those are the parameters we use to interpret the data. It turns out that autocorrelation in the sum-to-zero parameters is much less severe. I've known about this for quite a while, but haven't gotten around to modifying the programs until now.

Here's an example. This comes from the two-factor hierarchical Bayesian ANOVA in Section 19.1.3 (p. 520) of the book, regarding salaries. The deflection parameters in the BUGS model are labeled a1[], a2[], and a1a2[,]. The sum-to-zero versions of the parameters are labeled b1[], b2[], and b1b2[,]. Here are the a1[] chains when the thinning is set to 75:

Obviously there is nasty autocorrelation, and much more thinning would be needed if we cared about these parameters. (Added Nov. 7, 2011: Regarding thinning, see a later post which points out that thinning is not advisable. But the remainder of this post, below, is still important for better interpretation of the parameters in Bayesian ANOVA.)

But we don't care about a1[]; instead, we care about the sum-to-zero reparameterization, b1[]. Here are the b1[] chains derived from the chains above:

There is minimal autocorrelation in the sum-to-zero versions of the parameters.
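To see why the reparameterization helps, here is a minimal numerical sketch. It is in Python rather than R/BUGS and is not taken from the book's programs; the simulated chains (a slowly drifting shared baseline plus well-mixed per-level wiggle) are an assumption chosen to mimic the behavior described above, and all names (`a`, `b`, `baseline`) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
T, J = 5000, 4  # number of MCMC steps, number of factor levels

# Simulated unconstrained deflection chains a[,j]: each level shares a
# slowly drifting baseline (a random walk, hence high autocorrelation),
# plus independent, well-mixed noise per level.
baseline = np.cumsum(rng.normal(0.0, 0.2, size=T))
noise = rng.normal(0.0, 1.0, size=(T, J))
a = baseline[:, None] + noise

# Sum-to-zero reparameterization: at each step, subtract the mean across
# levels. The shared drift cancels, leaving nearly independent draws.
b = a - a.mean(axis=1, keepdims=True)

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a chain."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

print("lag-1 autocorrelation of a[,1]:", lag1_autocorr(a[:, 0]))
print("lag-1 autocorrelation of b[,1]:", lag1_autocorr(b[:, 0]))
```

The a[,1] chain inherits the baseline's near-unity autocorrelation, while the b[,1] chain is close to white noise, even though both are computed from exactly the same underlying draws.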

Therefore, the versions of the programs that accompany the book, which assess and attenuate autocorrelation in the *non*-sum-to-zero parameters, thin far more than is necessary. The results are not wrong, merely overkill.

I have created new versions of two of the ANOVA programs such that the sum-to-zero constraint is computed inside the BUGS model statement, and therefore the sum-to-zero parameters can be assayed using the plotChains() command. The two new programs are ANOVAonewayBRugsSTZ.R and ANOVAtwowayBRugsSTZ.R. These programs use far less thinning than the original programs to achieve the same results. Other ANOVA programs, such as those that allow heterogeneous variances, and the Poisson exponential ANOVA, can be modified analogously.
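For concreteness, a fragment like the following could sit inside the BUGS model statement to define the sum-to-zero nodes. This is an illustrative sketch, not a copy of the actual STZ programs; the index-limit names (Nx1Lvl, Nx2Lvl) are assumptions:

```
# Illustrative sketch: convert a0, a1[], a2[], a1a2[,] into
# sum-to-zero b0, b1[], b2[], b1b2[,] inside the model statement.
for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) {
  m[j1,j2] <- a0 + a1[j1] + a2[j2] + a1a2[j1,j2]   # cell means
} }
b0 <- mean( m[1:Nx1Lvl,1:Nx2Lvl] )                  # grand mean
for ( j1 in 1:Nx1Lvl ) { b1[j1] <- mean( m[j1,1:Nx2Lvl] ) - b0 }
for ( j2 in 1:Nx2Lvl ) { b2[j2] <- mean( m[1:Nx1Lvl,j2] ) - b0 }
for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) {
  b1b2[j1,j2] <- m[j1,j2] - ( b0 + b1[j1] + b2[j2] )
} }
```

Because b0, b1[], b2[], and b1b2[,] are then logical nodes in the model, they can be monitored and passed to plotChains() just like any other parameter.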

Nice. MCMC usually "likes" deviation contrasts better than dummy coding because the X matrix is orthogonal. Similarly, if you want something like polynomial contrasts it would make a great deal of sense to set them up as orthogonal contrasts.

Jay: I probably misunderstand your comment, but I wanted to be sure that the method used in the programs is clear. The underlying MCMC chain is completely unaffected by the reparameterization, and the "a" parameters in the MCMC chain remain badly autocorrelated. It's just that we assess autocorrelation by considering the reparameterized "b" values, not the "a" values.

With respect to contrasts, again, those are computed after the MCMC chain is generated, and the contrasts have no influence on the MCMC chain. In the programs many different contrasts are examined, without concern for their orthogonality. In principle, we should check that the chains for the contrasts are not badly autocorrelated, but in practice they are not, so it's not a problem.