## Saturday, April 23, 2011

### ANOVA with NON-homogeneous variances: An example where it matters. Exercise 18.3 (p. 513) has you consider an ANOVA model that does not assume that every group has the same underlying variance. But the model is applied to data that, it turns out, have fairly homogeneous variances, and therefore the basic conclusions do not change from the model with homogeneous variances assumed. The graph at right shows some data in which the assumption of homogeneous variances does matter. There are four groups with apparently quite different variances. For these data, Groups 2 and 3 have small variances, and therefore they might be credibly different if considered alone. But groups 1 and 4 have large variances, so if the model is forced to accommodate all four groups simultaneously with a single variance (i.e., a single precision parameter), then the estimated variances of Groups 2 and 3 will be too large. Consequently, the estimated means of Groups 2 and 3 will be uncertain because a wide range of candidate mean parameter values can accommodate the data when the variance parameter is large. And, therefore, the means of Groups 2 and 3 will not be credibly different. Simultaneously, the estimated variances of Groups 1 and 4 will be too small. Consequently, the estimated means of Groups 1 and 4 will be too certain because only a relatively narrow range of candidate mean parameter values can accommodate the data when the variance parameter is small. And, therefore, the means of Groups 1 and 4 will appear to be credibly different.

Below are histograms of the differences between group deflections when the estimated variances are forced to be equal across groups. Group 2 vs 3 is lower left; Group 1 vs 4 is upper right. Notice that the comparison of Groups 2 and 3 suggests (inappropriately) that they are not credibly different, and the comparison of Groups 1 and 4 suggests (inappropriately) that they are credibly different. Click the image to display it enlarged, but don't forget to click the "back" button of your browser to come back to this page. Below are histograms of the differences between group deflections when the variances are allowed to be non-homogeneous across groups. Group 2 vs 3 is lower left; Group 1 vs 4 is upper right. Notice that the comparison of Groups 2 and 3 suggests (appropriately) that they are credibly different, and the comparison of Groups 1 and 4 suggests (appropriately) that they are not credibly different. Click the image to display it enlarged, but don't forget to click the "back" button of your browser to come back to this page.  By the way, if you run a traditional, NHST-style ANOVA on these data, using R's aov() function, and follow it with pairwise comparisons using R's TukeyHSD() function, you get the graph displayed at the right. Notice that it says that the difference between Groups 1 and 4 (labeled 4-1) is significant, but the difference between groups 2 and 3 (labeled 3-2) is not significant. In other words, aov() assumes homogeneity of variances.

1. What you've shown is that people should not use ordinary ANOVA. Mixed models can do a much better job. For example:

dat <- transform(dat, x=factor(x))
m1 <- asreml(y~x, data=dat, rcov=~at(x):units)
predict(m1,classify="x", vcov=TRUE, sed=TRUE)\$pred

x Predicted Std Err Status
1 95.85 1.888 Estimable
2 98.99 0.2974 Estimable
3 101.1 0.1792 Estimable
4 103.8 3.106 Estimable

The SED for group 2 vs group 3 is 0.3471798, which nicely shows the difference between the groups.

2. Kevin W:

Thanks for the comment.

The point of the post was to show that expanding an ANOVA model to include different variances for different groups (and shrinking them under a hyperprior) is not only straightforward in BUGS but is also consequential for interpreting data, unlike Exercise 18.3 in the book.

Your comment points out that non-traditional models can also be used in non-Bayesian approaches. But the non-Bayesian approach yields only a point estimate of the parameters, obtained from maximum likelihood (ML) or restricted maximum likelihood (REML). For significance testing, the non-Bayesian approach relies on additional assumptions to approximate sampling distributions and p-values (and is still stuck with concerns about corrections for multiple comparisons).

The Bayesian posterior distribution, on the other hand, yields complete distributional information about credible combinations of parameter values.

In other words, for any particular (hierarchical) model of data, the analyst could take a Bayesian or non-Bayesian approach to estimating and interpreting its parameters. In a Bayesian approach, a prior must be specified (which is typically innocuous at worst or genuinely useful at best ) and the posterior is computed (often by MCMC which is typically tractable ). The result is a complete conjoint distribution over the parameter space, which can be directly interpreted. In a non-Bayesian approach such as ML, the result is a point estimate. If desired, analysts can approximate the joint likelihood function at the MLE using a Laplacian or whatever strikes their fancy, but it's only a local approximation. And to interpret significance, the non-Bayesian approach has to make additional assumptions and approximations to generate sampling distributions and p-values. Why bother when complete information is revealed by the Bayesian approach?

 A complete discussion of priors for all possible exotic models is way beyond the scope of this comment, so let it go.

 Sure, MCMC can get stuck and inefficient in some applications, so the user has to get clever. But hopefully software will get better at detecting and self-correcting these situations.

3. Dr. Kruschke,
Is it statistically acceptable to run the non-homogeneous ANOVA on data that follow non-normal distributions? If so, how do you change the code to run such tests? Specifically, I am interested in analyzing data that are gamma distributed.

4. Dear Anonymous July 31:

Yes, it's easy. In the code ANOVAonewayNonhomogvarJagSTZ.R, change this

for ( i in 1:Ntotal ) {
y[i] ~ dnorm( mu[i] , tau[x[i]] )
mu[i] <- a0 + a[x[i]]
}

to this

for ( i in 1:Ntotal ) {
y[i] ~ dgamma( sY[i] , rY[i] )
sY[i] <- pow(mu[i],2) * tau[x[i]]
rY[i] <- mu[i] * tau[x[i]]
mu[i] <- max( a0 + a[x[i]] , 0.0001 )
}

Also, do NOT standardize the data, because the data must be non-negative to come from a gamma distribution. Therefore, in the dataList, use the original data y, not standardized data. And, after JAGS/BUGS has generated an MCMC chain, do not convert back from standardized to original scale, because the data weren't standardized in the first place.

Enjoy!

5. The asreml() function does not seem to exist? I am sorry, I cannot find it except in relation to other packages, and then it is still different (ASREML, etc).

6. I leave it to Kevin W. to answer that. I've not used the asreml function (it's someone else's package, nothing I've written or dealt with).