Friday, July 29, 2011

How long should an MCMC chain be to get a stable HDI?

In the book, the default length of an MCMC chain differs from one program to another, ranging from around 2,000 to 10,000 total steps combined across chains (e.g., 3 chains with 1,000 steps each is 3,000 total steps). But how many steps are needed for a stable estimate of the HDI limits? Is 2,000 enough? Is 10,000 enough?

The issue motivating the question is that the MCMC chain is just one random sample from the true underlying posterior distribution. If we seeded the random chain differently (or used a few more or less burn-in steps, or thinned slightly more or less) we would have a different random MCMC chain with slightly different HDI limits because of random noise in the MCMC sample. When the MCMC chain gets very long, then in the limit it approximates the true posterior extremely well. But for chains of realistic length (not the unobtainable "in the limit" chain), how stable is the MCMC estimate of the HDI?

This blog entry reports some Monte Carlo simulations to illustrate the answer. Suppose that the true posterior is a known distribution, so we can easily generate random values from it. What we'll do is generate an MCMC chain from the true posterior, and estimate the (95%) HDI from the MCMC chain. We'll do that many times and keep track of the estimated HDI's, so we can see how much the estimated HDI's vary. For each MCMC chain we'll also determine the 2.5%ile and the 97.5%ile, to compare with the 95% HDI.

Here are results when the true posterior is a gamma distribution and the MCMC chain has length 1,000:
The top panel above shows the true gamma distribution from which the MCMC chains were generated. Each MCMC chain had 1,000 values in it, and for each chain the 95% HDI was estimated. This was done 20,000 times. The middle row of histograms shows the HDI limits across the 20,000 replications. The bottom row of histograms shows the percentiles across the 20,000 replications. The vertical dashed lines in the histograms show the true values of the HDI.
  • Notice first that the 2.5%ile and 97.5%ile, in the bottom row, are very poor estimates of the HDI. This is because the true distribution is skewed, and the equal-tailed credible interval is quite different than the HDI. 
  • Notice next that there is indeed quite a lot of variation across the 20,000 replications. For example, the histogram of "HDI low" spans the true value of 0.0799, but different chains give different estimates. (The histogram displays the 95% HDI across the 20,000 replications going from 0.02559 to 0.1976. This is not the HDI of the MCMC chain! It is the HDI, across replications, of the estimated lower value of the HDI of the MCMC chain.)
  • Notice finally that there is a bit of bias in the estimate of the limits of HDI. Specifically, the estimated HDI is a little bit narrower than the true HDI. This makes intuitive sense: The estimated HDI finds the narrowest 95% interval in the finite MCMC sample. Because the tails of the distribution have more points sampled just inside the HDI than just outside the HDI, there is, for small chains, a higher chance of finding a narrowest 95% interval just inside the true HDI. As the chain length gets larger, this small-sample bias is reduced, as can be seen in the next figures

Here are the results when the chain length is 5,000:

And here are the results when the chain length is 10,000:
When the chains have a length of 10,000, the bias in the HDI estimate is only slight. There is still, of course, some imprecision in the estimate of the HDI limits (and width). It is up to the user to ascertain whether this amount of imprecision is tolerable for the particular application, or whether a longer MCMC chain should be computed.

Conclusion: When the limits of the HDI are close to the decision boundary (such as the ROPE), or when the HDI width is only marginally narrow enough for the desired precision, then be sure to use a very long MCMC chain so that the HDI limits are stable. The graphs above help illustrate how long is "very long." In general, while the book uses chains on the order of 5,000 steps for pedagogical purposes, real research reports should probably use at least 10,000 steps or much more as patience permits.


  1. Dear John,

    Since I am relatively new to Bayesian inference, I am trying to develop some intuition on commonly assumed issues in practice. One of which is the one you approach on this blog post.

    Therefore, I am trying to visualize the stability of the lower and upper 95% HDI's under different numbers of efficient samples.

    Would you by any chance still have the code used for this post available?

    Thank you very much!



    1. Thanks for your interest. This topic is discussed at length in Chapter 7 (Section 7.5.2) of DBDA2E.

    2. Thank you very much for the quick and helpful response.

      I will look into it!


  2. Dear John,

    first of all - thank you for your book and for making Bayesian Statistics understandable and applicable for psychologists as well!

    At one point, however, I'm just stuck and i can't find the answer in your book (2011 version) and I thought my question would fit to this topic:
    Why, in general, do we use multiple chains to estimate the posterior distribution. What is the difference between 1 chain with 5000 steps vs. 5 chains with 1000 steps, regardless of the burn-in period.

    Thank you in advance.

    Best regards

  3. The first edition did not explain MCMC diagnostics much. The second edition does a more thorough job; see section 7.5 pp. 178-188. Basically, when multiple chains show the same form, we gain confidence that the chains are not stuck or clumpy in unrepresentative regions of parameter space.