Sunday, April 2, 2017

A foolish prior on consistency is the howling of shrunken minds

(Photo from Professor Sylvain Fiset's web site.)
Professor Sylvain Fiset has been using Bayesian methods (from DBDA2E)  to analyze short-term memory of dogs. In each experimental trial, a dog watched a human experimenter tuck away a treat in one of several boxes. After a moment, the dog was allowed to retrieve the treat. Each trial was scored correct if the dog went directly to the correct box and scored wrong otherwise. (Actually, the task was a little harder, involving two hidden treats, but that detail isn't crucial here.) Thus, each dog had \(z_s\) correct out of \(N_s\) trials. We would like to estimate the underlying ability of each dog, \(\theta_s\), and the typical ability of the group of dogs, \(\omega\), and the consistency across the dogs, \(\kappa\). This is exactly the structure of the therapeutic touch example in Chapter 9 of DBDA2E. So, it should be trivial to simply read in the data, run the pre-existing script, and bark in joy at the output!

But when Sylvain did that, the MCMC chains would not converge. Howls! He asked me what was going on, and I determined that the data showed fairly large variability across dogs, and that, for whatever reason, the canned prior on the consistency parameter, \(\kappa\), was allowing the MCMC chains to "get stuck" at very low values of \(\kappa\). So I recommended tweaking the prior on \(\kappa\) a modest amount to keep it away from zero while still being broad and noncommittal. It then worked fine. Barks of joy!
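One generic way to implement that sort of tweak (a sketch under my own naming, not necessarily the exact prior used for Sylvain's data) is to specify the gamma prior on \(\kappa-2\) by its mode and standard deviation instead of by shape and rate: a mode above zero keeps the prior off the lowest values of \(\kappa\), while a large standard deviation keeps it broad and noncommittal. DBDA2E's utilities include a helper for this conversion (gammaShRaFromModeSD, if memory serves); the arithmetic is written out here.

```r
# Convert a desired mode and sd into the shape and rate of a gamma distribution.
gammaShapeRateFromModeSD <- function( mode , sd ) {
  rate  <- ( mode + sqrt( mode^2 + 4*sd^2 ) ) / ( 2*sd^2 )
  shape <- 1 + mode * rate
  list( shape = shape , rate = rate )
}

# For example, mode 1 and sd 10 (illustrative values):
gammaShapeRateFromModeSD( mode = 1 , sd = 10 )
# shape is approximately 1.105 and rate approximately 0.105.
# In the JAGS sketch above, the corresponding change would be, e.g.:
#   kappaMinusTwo ~ dgamma( 1.105 , 0.105 )
```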

However, only the next day (i.e., today), I realized that Past Self had already thought about the implications of the prior on \(\kappa\). (If only my human long-term memory were better!) Past Self was so concerned with this issue that he wrote Exercises about it in DBDA2E! See Exercises 9.1 and 9.2, with solutions available at the book's web site. Here's the idea: The consistency parameter, \(\kappa\), constrains how far apart the individual ability parameters, \(\theta_s\), can spread. The higher the consistency, the more shrinkage on the individual estimates. If the prior on \(\kappa\) avoids small values, it implies that the individual parameters will be relatively near each other. In other words, even if the prior on \(\kappa\) seems broad and noncommittal, it may imply stronger shrinkage on the individual estimates than you might intuit. Therefore, like Past Self, Present Self recommends that you take a look at the implied prior on the differences of individual values, \(\theta_s - \theta_t\), to check that the prior is appropriate for your application. Usually (for typical data sets with moderately large amounts of data) the shrinkage will be appropriate, but it's worth checking and thinking about.
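Here is one quick way to eyeball that implied prior: a Monte Carlo sketch based on the model structure above (not the book's exercise solutions), which simply draws \(\omega\) and \(\kappa\) from their priors, draws two \(\theta\)'s, and histograms the difference.

```r
# Prior check of the implied prior on theta_s - theta_t under a given kappa prior.
nSim   <- 100000
omega  <- rbeta( nSim , 1 , 1 )                               # prior on group mode
kappaMinusTwo <- rgamma( nSim , shape = 0.01 , rate = 0.01 )  # swap in other shape/rate to compare
kappa  <- kappaMinusTwo + 2
theta1 <- rbeta( nSim , omega*(kappa-2)+1 , (1-omega)*(kappa-2)+1 )
theta2 <- rbeta( nSim , omega*(kappa-2)+1 , (1-omega)*(kappa-2)+1 )
hist( theta1 - theta2 , breaks = 100 ,
      main = "Implied prior on theta_s - theta_t" , xlab = "theta_s - theta_t" )
```

If the histogram piles up near zero more than your application can tolerate, the prior on \(\kappa\) is enforcing more shrinkage than you intended.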

By the way, I met Sylvain at Stats Camp, where I taught one of my workshops in Bayesian Methods.

Finally, the title of this post is directed at myself, for having forgotten about Past Self's consideration of this issue; in defense of Present Self, though, the exercises were not about MCMC convergence. The title is a variation of a phrase from Ralph Waldo Emerson.
