*Computational Modeling of Cognition and Behavior*(

**at the publisher**,

**at Amazon.com**), that includes some chapters on Bayesian methods. Each chapter includes a little "in vivo" commentary by an outside contributor. My commentary accompanies their chapter regarding JAGS. The commentary is posted here in a succession of three blog posts; this is 1 of 3. Do check out their book!

**Run MCMC to achieve effective sample size (ESS) of 10,000**

Bayesian analysis of complex models is possible only by virtue of modern software that takes an abstract model specification and returns a representation of the posterior distribution. In software that uses Markov-chain Monte Carlo (MCMC) methods, such as JAGS, the representation is inherently noisy. The random noise from MCMC tends to cancel out as the chain gets longer and longer. But different aspects of the posterior distribution are differently affected by noise. A relatively stable aspect is the median value of the chain. The median tends to stabilize relatively quickly, that is, with relatively shorter chains, because the median is usually in a high-density region of the posterior and the value of the median does not depend on the distance to outliers (unlike the mean). But other crucial aspects of the posterior distribution tend to need longer chains to achieve stable values.

In particular, a crucial aspect of a parameter distribution is its width. Narrower distributions connote more certainty in the estimate of the parameter. A very useful indicator of the width of a distribution is its 95% highest density interval (HDI). Parameter values within the 95% HDI have higher probability density than parameter values outside the HDI, and the parameter values inside the 95% HDI have a total probability of 95%. An example of an HDI is illustrated in Figure 1.

How long of a chain is needed to produce stable estimates of the 95% HDI? One useful heuristic answer is 10,000 independent steps. The rationale for the heuristic is explained in Section 7.5.2 of

**Kruschke (2015)**. Note that the requirement is 10,000

*independent*steps. Unfortunately, most MCMC chains are strongly autocorrelated, meaning that successive steps are near each other, and are not independent. Therefore we need a measure of chain length that takes into account the autocorrelation of the chain. Such a measure is called the effective sample size (ESS), for which a formal definition is provided in Section 7.5.2 of

**Kruschke (2015)**.

ESS is computed in R by the effectiveSize function (which is in the coda package, which in turn is part of the rjags package for JAGS). For example, suppose we have generated an MCMC chain using the rjags function, coda.samples, and the resulting object is called mcmcfin. Then we can find the ESS of the parameters by typing effectiveSize(mcmcfin).

It is crucial to realize that (i) the ESS will usually be much less than the number of steps in the MCMC chain, and (ii) every parameter in a multi-parameter model has a different ESS. Some parameters might have large ESS while others have small ESS. Moreover, combinations of parameters, such as a difference of two means, can have quite different ESS than the separate parameters. Therefore it is important to check the ESS of every parameter of interest, and the ESS of any interesting parameter combinations.

Stay tuned for Parts 2 and 3...

What do we do if the ESS is low (<1k), even if we're running 200k+ numSavedSteps?

ReplyDeleteSevere autocorrelation can come from different sources. It's an interplay of model and data; that is, a model can have little autocorrelation for some data but lots of autocorrelation for other data. In these cases you have to "get smart" about what's going on in your model by investigating the posterior distribution -- is there a strong trade-off (i.e., correlation) of parameters in the posterior distribution? An example of this is the trade-off of intercept and slope in linear regression. If so, it's possible that a clever re-scaling of the data could solve the problem. Or, maybe the autocorrelation is telling you something about the parameterization in your model. An example of this is in "ANOVA" style models in which the non-sum-to-zero parameters are badly autocorrelated but the sum-to-zero parameters have little autocorrelation. You can try reparameterizing the model. Finally, if getting smart is too much of a pain, you can try using Stan. Its HMC method bends around difficult posterior distributions and greatly reduces autocorrelation, but at the cost of longer real time per step. In your case, might be the way to go!

DeleteWe at Coepd declared Data Science Internship Programs (Self sponsored) for professionals who want to have hands on experience. We are providing this program in alliance with IT Companies in COEPD Hyderabad premises. This program is dedicated to our unwavering participants predominantly acknowledging and appreciating the fact that they are on the path of making a career in Data Science discipline. This internship is designed to ensure that in addition to gaining the requisite theoretical knowledge, the readers gain sufficient hands-on practice and practical know-how to master the nitty-gritty of the Data Science profession. More than a training institute, COEPD today stands differentiated as a mission to help you "Build your dream career" - COEPD way.

ReplyDeletehttp://www.coepd.com/AnalyticsInternship.html