## Monday, July 11, 2016

### MCMC effective sample size for difference of parameters (in Bayesian posterior distribution)

We'd like the MCMC representation of a posterior distribution to have large effective sample size (ESS) for the relevant parameters. (I recommend ESS > 10,000 for reasonably stable estimates of the limits of the 95% highest density interval.) In many applications, we are interested in combinations of parameters, not just individual parameters. For example, in multi-group comparisons, we are interested in differences of the mean parameters. We might also be interested in the effect size (Cohen's d), which is the difference of mean parameters divided by the (pooled) standard deviation parameter. Unfortunately, the ESS of the combination of parameters might not be as high as the ESS's of the contributing parameters (or it might be higher!). This change in ESS is seen routinely in many results throughout DBDA2E.

This blog post shows a simple contrived example to demonstrate that the correlation of the contributing parameters is not necessarily linked to how much the ESS changes when the parameters are added or subtracted from each other. Consider estimating the means of two groups, with mean parameters, $\mu_1$ and $\mu_2$. We are interested in their difference, $\mu_1 - \mu_2$. See the plots below for two cases in which the correlation of the mean parameters is the same, but their ESS's when combined are quite different. In the left panel, ESS of the difference is much less than ESS of contributing parameters. In the right panel, ESS of difference is same as ESS of contributing parameters.

P.S. This example was motivated by an exercise for one of the workshops I give.

Appendix: The R code.

# Contrived example to illustrate that ESS of combination of parameters
# can be different that ESS of individual parameters.
library(coda)
# First example.
set.seed(47405)
nstep = 1000
x = rnorm(nstep)
effectiveSize(x)
xmy = x
xmy[2:nstep] = xmy[2:nstep] + 1.0 * xmy[1:(nstep-1)]
effectiveSize(xmy)
xpy = rnorm(length(x))
muOne = xpy - xmy
muTwo = xpy + xmy
effectiveSize(muOne)
effectiveSize(muTwo)
effectiveSize(muOne-muTwo)
# Plot result:
layout(matrix(1:2,nrow=1))
par(pty="s")
plot( muOne , muTwo , type="o" , asp=1 ,
xlab=bquote(mu) , ylab=bquote(mu) ,
main=bquote(atop(
list(
ESS[mu]==.(round(effectiveSize(muOne),0)) ,
ESS[mu]==.(round(effectiveSize(muTwo),0)) ) ,
list(
ESS[mu-mu]==.(round(effectiveSize(muOne-muTwo),0)) ,
ESS[mu+mu]==.(round(effectiveSize(muOne+muTwo),0)) )
)) ,
cex.lab=1.5 , cex.main=1.0 )
text( min(muOne) , min(muTwo) , adj=c(0,0) ,
labels=bquote(r[list(mu,mu)]==.(round(cor(muOne,muTwo),2))) )
# Second example:
set.seed(47405)
nstep=1000
muOne = rnorm(nstep)
muTwo = -0.35*muOne + rnorm(nstep)
effectiveSize(muOne)
effectiveSize(muTwo)
effectiveSize(muOne-muTwo)
# Plot result:
plot( muOne , muTwo , type="o" , asp=1 ,
xlab=bquote(mu) , ylab=bquote(mu) ,
main=bquote(atop(
list(
ESS[mu]==.(round(effectiveSize(muOne),0)) ,
ESS[mu]==.(round(effectiveSize(muTwo),0)) ) ,
list(
ESS[mu-mu]==.(round(effectiveSize(muOne-muTwo),0)) ,
ESS[mu+mu]==.(round(effectiveSize(muOne+muTwo),0)) )
)) ,
cex.lab=1.5 , cex.main=1.0 )
text( min(muOne) , min(muTwo) , adj=c(0,0) ,
labels=bquote(r[list(mu,mu)]==.(round(cor(muOne,muTwo),2))) )

1. Interesting and useful post. It is also nice to see your recommendation of at least 10,000 effective samples. I feel much MCMC analysis is done with too few Monte Carlo samples.

If interest is in estimating a contrast of the posterior parameters, (even if its of large dimension) it would be more appropriate to look at an estimate of ESS that takes correlations across components into account. This recent paper allows you to do that (http://arxiv.org/abs/1512.07713), by estimating the asymptotic covariance matrix of the estimates. The paper also gives a lower bound on the number of effective samples to guarantee precision in the estimates. I think you will find that section interesting and relevant.

You can find codes for this in the R package mcmcse.

2. Dear Dootika:

My reasons for recommending ESS >= 10,000 for any relevant parameter or contrast are explained in Section 7.5.2, specifically p. 184, of DBDA2E. Essentially, if the goal is a stable estimate of the 95% HDI limits, a large ESS is needed. The material in Section 7.5.2 illustrates the degree MCMC "wobble" in 95% HDI limits when ESS=10,000.

In all of the functions of DBDA2E (such as smryMCMC and plotPost) the ESS of the MCMC chain is returned. So, if a user computes mu1-mu2, the ESS of the difference is explicitly returned. It can be quite surprising that ESS of combinations of parameters is very different that ESS of the separate parameters --- regardless of the correlations of the parameters in the posterior distribution.

3. That exact same idea of making sure the 95% intervals are stable is used to create lower bound in the paper. Depending on how much "wobble" you are willing to afford, you get different lower bounds. But the numbers are comparable to 10,000 so certain levels of precision.

It is actually not surprising to me that ESS of combinations is very different from ESS of individuals. This has been my experience with ESS so far, that it is (as it should be) heavily dependent upon the function you are estimating.

4. Right, "surprise" is relative to one's current intuition, and intuition is always based on one's current background knowledge and experience. It can be surprising to newcomers to MCMC methods that ESS can be so different for combinations of parameters and component parameters, presumably because newcomers have a latent (and wrong) intuition about independence of MCMC steps for different parameters. But once you see examples to the contrary, and think about it, it's not so surprising anymore. One's intuitions become re-educated.

5. Absolutely agree, which is why I think this is an excellent post!

6. Hi Prof. Kruschke, I have a doubt related to this post.
I've been reading your book, and on page 185, you state that the figures were drawn with 50000 MCMC samples, each with ESS=10000. First, how were you able to do that? Is it known that for Normal dist., and a certain MCMC algorithm takes 50000 to reach an approximate 10000? And according to your post, shouldn't you also tell the reader what is the ESS for the width(difference of HDI's limits)?

1. Regarding Figure 7.13 on p. 185: Every MCMC sample had an ESS of essentially 10,000 because pseudo-random number generators for the normal distribution have essentially zero autocorrelation. For every sample (with ESS=10,000) I computed the HDI limits. I did that 50,000 times. The histograms show the distributions of HDI limits across the 50,000 simulated MCMC chains.

2. Thank you for your reply. I still have one more doubt though. How did you check for the ESS? Did you compute it, every cycle, or did you run the program until you were sure that you had at least 10,000 ?

3. Here's how to check the ESS of a sample:
> mcmcChain = rnorm(10000)
> library(coda)
> effectiveSize( mcmcChain )
var1
10000