Friday, August 15, 2014

How to use MCMC posterior as prior for future data

An emailer writes:

Dear Prof. Kruschke,
Hello. My name is ... and I am ... . I'm trying to apply Bayesian theorem in developing a model of ... . I used your code to estimate posterior distribution without any trouble. Here is my question. Would you kindly let me know how to use the estimated posterior distribution as the prior of another Bayesian update? ...  Any comment would be greatly appreciated. Thank you very much for your time. Regards,
 My reply:

Thanks very much for your interest in the book.

Because MCMC methods require a mathematically specified prior, but generate a Monte Carlo sample of the posterior, you need to either (a) find a reasonable mathematical summary of the MCMC posterior to use as the mathematical prior for the next batch of data or (b) concatenate the previous data with the next batch of data and analyze them together.

Option (b) gives you the mathematically correct answer, but at the cost of a large data set which might slow down the MCMC process, and it demands that the previous and next batch of data are structured the same way and analyzed with the same model.

For option (a), there is no uniquely correct approach. For each parameter, examine its marginal MCMC posterior distribution and summarize it with a reasonable mathematical function. For example, if the parameter is metric on positive values, maybe a gamma distribution would mimic the MCMC distribution well. Use a mathematical distribution with the same central tendency and standard deviation as the MCMC posterior. A peril of mimicking each marginal distribution separately is that you lose all information about the correlation of parameters in the posterior. If the correlation is important for capturing the prior information you want, then you can use a multivariate prior for two or more parameters simultaneously (e.g., dmnorm in JAGS).

Hope this helps.

P.S. (Added an hour or so after the text above.) A similar thing was done in the book with pseudo-priors. See the middle paragraph of p. 251: "The shape and rate constants for the gamma pseudopriors were set so that the gamma distributions had the same mean and standard deviation as the posterior sample of kappa values. For example, consider the parameter kappa0. In the initial run without pseudopriors, a reasonable posterior sample was obtained for model index 2 (i.e., when kappa0 is actually constrained by the data, as shown in the bottom panel of Figure 10.3). That posterior sample had a mean m and a standard deviation s. The conversion from mean and standard deviation to shape and rate parameters was explained in the caption of Figure 9.8. The conversion yields shape and rate values specified in lines 55 to 56 of the code."


  1. Hey John, this is an incredibly timely post! We're actually working on a similar problem right now. Obviously, I don't know the details of the original emailer's particular problem, but we've gleaned a lot of insight from Myung et al (2013), who offer a technique for using posterior distributions to inform next-trial priors in the form of adaptive design optimization. The link is here: Maybe it will be of interest!

  2. Interesting thoughts on preserving correlations...!

  3. Hi John: I've also thought a little bit about this, and here are some points:

    = The MCMC algorithm needs to get prior probabilities for arbitary parameter values, and that can't be done with the raw MCMC output from a previous run. In principle, you should be able to use a kernel density fitted to previous output, but AFAIK no package implements this - you'd have to write your own code.

    = For your option (a), the function 'fitdistr' in the MASS package may be useful.

    = I think using just the marginal distributions (and implicitly taking them to be independent) is generally suboptimal, and it's often difficult to deal with the relationships via a multivariate distribution and covariances. Combining all the available data into a single run is the best solution.

    = It's easy if the data sets are similar, but that's not necessary: you can use different likelihoods for each, provided they have key parameters in common. BTW do think carefully about what the parameters mean in each model: "reaction time" is not the same when catching a falling rod versus pressing a key when a word flashes on the screen. In ecology, we can estimate abundance trends by combining count data, survival estimates from recaptures of marked animals, and fecundity data: see chapter 11 in Kery and Schaub (2012) Bayesian Population Analysis Using WinBUGS, Academic Press.

    Regards, Mike.