Saturday, April 8, 2017

Trade-off of between-group and within-group variance (and implosive shrinkage)

Background: Consider data that would traditionally be analyzed with a single-factor ANOVA; that is, a metric (continuous) predicted variable, \(y\), and a nominal predictor, "Group." In particular, consider the data plotted as red dots here:

A Bayesian approach easily allows a hierarchical model in which both the within-group and the between-group variance are estimated. The hierarchical structure imposes shrinkage on the estimated group means. All of this is explained in Chapter 19 of DBDA2E.
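For concreteness, the heart of such a model can be written as a JAGS model string in R. The sketch below names the within-group standard deviation ySigma and the between-group standard deviation aSigma, to match the plot code later in this post; the priors shown are simplified placeholders, not the exact priors used in the DBDA2E script.

modelString = "
model {
  for ( i in 1:Ntotal ) {
    # Each datum is normal around its group's mean, with a shared within-group SD:
    y[i] ~ dnorm( mu[ x[i] ] , 1/ySigma^2 )
  }
  for ( j in 1:NxLvl ) {
    # Group means are drawn from a higher-level normal distribution;
    # the between-group SD, aSigma, controls how strongly they shrink together:
    mu[j] ~ dnorm( grandMu , 1/aSigma^2 )
  }
  grandMu ~ dnorm( 0 , 1.0E-6 )        # vague priors, purely illustrative
  ySigma ~ dunif( 1.0E-3 , 1.0E+3 )
  aSigma ~ dunif( 1.0E-3 , 1.0E+3 )
}
"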

The data above come from Exercise 19.1, which is designed to illustrate "implosive shrinkage." Because there are only a few data points in each group, and there are many groups whose means differ little from baseline, a reasonable description of the data simply collapses the group means close to baseline while expanding the estimate of the within-group variance.
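Exercise 19.1 supplies its own data file, but data with the same qualitative character are easy to simulate. In the sketch below, every constant is invented for illustration: many small groups whose true means mostly hug a common baseline.

set.seed(47405)   # arbitrary seed, for reproducibility
nGroups = 20 ; nPerGroup = 5
# Most true group means sit near a baseline of 100; a couple deviate:
trueMean = c( rnorm( nGroups-2 , mean=100 , sd=1 ) , 108 , 92 )
simData = data.frame(
  Group = factor( rep( 1:nGroups , each=nPerGroup ) ) ,
  y = rnorm( nGroups*nPerGroup , mean=rep( trueMean , each=nPerGroup ) , sd=6 )
)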

The purpose of the present post is to show the trade-off in the posterior distribution of the estimated within-group variance and between-group variance, while also providing another view of implosive shrinkage.

After running the data through the generic hierarchical model in Exercise 19.1, we can look at the posterior distribution of the within-group and between-group standard deviations. The code below produces the following plot:


# Convert the MCMC chain (mcmcCoda, returned by genMCMC) to a matrix of samples:
mcmcMat = as.matrix( mcmcCoda )
openGraph()  # graphics-window utility from DBDA2E-utilities.R
# Scatter plot of the two standard-deviation parameters at each MCMC step:
plot( mcmcMat[,"aSigma"] , mcmcMat[,"ySigma"] , 
      xlab="Between-Group Sigma" , ylab="Within-Group Sigma" , 
      col="skyblue" , cex.lab=1.5 , 
      main="Trade-Off of Variances (and implosive shrinkage)" )
# Overlay a least-squares line to highlight the negative relation:
abline( lm( mcmcMat[,"ySigma"] ~ mcmcMat[,"aSigma"] ) )
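
The trade-off can also be summarized numerically, for example by the correlation of the two chains (using mcmcMat from the code above), which should come out negative, matching the plot:

# Correlation of the two SD parameters across MCMC steps:
cor( mcmcMat[,"aSigma"] , mcmcMat[,"ySigma"] )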



Notice in the plot that as the between-group standard deviation gets larger, the within-group standard deviation gets smaller. The trade-off arises because the total variability in the data must be accounted for by some combination of the two sources: the more spread that is attributed to the group means, the less remains for the data within each group. Notice also the implosive shrinkage: the estimate of the between-group standard deviation is "piled up" against zero.
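The pile-up is also visible in the marginal posterior of aSigma by itself, for example with the plotPost utility from DBDA2E-utilities.R:

openGraph()
# Histogram of the marginal posterior; its mode sits near zero:
plotPost( mcmcMat[,"aSigma"] , xlab="Between-Group Sigma (aSigma)" )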

4 comments:

  1. Hello, just starting this chapter and encountered this error below. Is it something obvious I have misspelt? Thanks, Graeme

    myDataFrame=read.csv("FruitflyDataReduced.csv")
    yName="Longevity"
    xName="CompanionNumber"
    source("Jags-Ymet-Xnom1fac-MnormalHom.R")
    mcmcCoda=genMCMC(datFrm=myDataFrame,yName=yName, xName=xName,
    numSavedSteps=1100, thinSteps=10, saveName=fileNameRoot)


    Error in genMCMC(datFrm = myDataFrame, yName = yName, xName = xName, numSavedSteps = 1100, :
    object 'fileNameRoot' not found



    OUTPUT
    Reading parameter file inits1.txt
    . Initializing model
    . Adapting 500
    -------------------------------------------------| 500
    ++++++++++++++++++++++++++++++++++++++++++++++++++ 100%
    Adaptation successful
    . Updating 1000
    -------------------------------------------------| 1000
    ************************************************** 100%
    . . . . . . Updating 3670
    -------------------------------------------------| 3650
    ************************************************** 100%
    * 100%
    . . . . Updating 0
    . Deleting model
    .
    All chains have finished
    Simulation complete. Reading coda files...
    Coda files loaded successfully
    Finished running the simulation

    Replies
    1. In the high-level script, one of the lines defines fileNameRoot. Be sure you've executed that line.
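      For example, a definition near the top of the script looks something like this (the prefix string itself is arbitrary):

      fileNameRoot = "FruitflyDataReduced-"  # prefix for saved graph/output files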

  2. Hi John:
    What is the difference between your model and a so-called random-effects ANOVA using lme4 or nlme, both of which are frequentist? The shrinkage here is also obtained via Bayes' theorem for the random effects. If you summarize the posterior with the mode, I would imagine the estimates are equivalent to those from a frequentist model, which also uses the mode.

    Replies
    1. Don: Shrinkage comes from the hierarchical model, not from Bayesian estimation. A maximum likelihood estimate of the hierarchical model would also show shrinkage. A major benefit of the Bayesian approach is the explicit posterior distribution on all the parameters -- no need for auxiliary assumptions to generate sampling distributions for p values and confidence intervals. (I'd have to delve into the details of lme4 to know if it's the very same model structure -- might be. But in JAGS, it's easy to generalize to non-normal distributions and heterogeneous variances.)
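      For a side-by-side comparison, a frequentist random-effects fit of the fruit-fly data from the earlier comment might look like the sketch below (assuming the lme4 package, with myDataFrame as loaded above):

      library(lme4)
      # Random intercept for each companion-number group:
      fit = lmer( Longevity ~ 1 + (1|CompanionNumber) , data=myDataFrame )
      summary(fit)  # random-effects Std.Dev. = between-group SD;
                    # Residual Std.Dev. = within-group SD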
