Background: Suppose a researcher is interested in the Bayesian posterior distribution of a parameter because the parameter is directly meaningful in the research domain. This occurs, for example, in psychometrics. Specifically, in item response theory (IRT; for details and an example of Bayesian IRT see this blog post), the data from many test questions (i.e., the items) and many people yield estimates of the difficulties \(\delta_i\) and discriminations \(\gamma_i\) of the items, along with the abilities \(\alpha_p\) of the people. That is, the item difficulty is a parameter \(\delta_i\), and the analyst is specifically interested in the magnitude and uncertainty of each item's difficulty. The same is true for the other parameters: the analyst is specifically interested in the magnitude and uncertainty of the discrimination \(\gamma_i\) of every item and of the ability \(\alpha_p\) of every person.
The question: How should the posterior distribution of a meaningful parameter be summarized? We want a number that represents the central tendency of the (posterior) distribution, and numbers that indicate the uncertainty of the distribution. There are two options I'm considering, one based on densities, the other based on percentiles.
Densities. One way of conveying a summary of the posterior distribution is in terms of densities. This seems to be the most intuitive summary, as it directly answers the natural questions from the researcher:
- Question: Based on the data, what is the most credible parameter value? Answer: The modal (highest density) value. For example, we ask: Based on the data, what is the most credible value for this item's difficulty \(\delta_i\)? Answer: The mode of the posterior is 64.5.
- Question: Based on the data, what is the range of the 95% (say) most credible values? Answer: The 95% highest density interval (HDI). For example, we ask: Based on the data, what is the range of the 95% most credible values of \(\delta_i\)? Answer: 51.5 to 75.6.
*An illustration from DBDA2E showing how highest-density intervals and equal-tailed intervals (based on percentiles) are not necessarily equivalent.*
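As a concrete sketch of how these density-based summaries can be computed from MCMC samples (my own illustration, not from DBDA2E; the `hdi` and `posterior_mode` helpers and the Gamma stand-in posterior are assumptions for the example):

```python
import numpy as np

def hdi(samples, cred_mass=0.95):
    """Sample-based highest-density interval: the narrowest interval
    containing cred_mass of the sorted samples."""
    s = np.sort(samples)
    n_in = int(np.floor(cred_mass * len(s)))
    widths = s[n_in:] - s[:len(s) - n_in]   # width of each candidate interval
    i = int(np.argmin(widths))              # index of the narrowest one
    return s[i], s[i + n_in]

def posterior_mode(samples, bins=100):
    """Crude mode estimate: midpoint of the highest histogram bin."""
    counts, edges = np.histogram(samples, bins=bins)
    i = int(np.argmax(counts))
    return 0.5 * (edges[i] + edges[i + 1])

# Stand-in "posterior" for one item's difficulty (purely illustrative).
rng = np.random.default_rng(1)
delta_samples = rng.gamma(shape=8.0, scale=8.0, size=20000)

lo, hi = hdi(delta_samples)
print(f"mode ~ {posterior_mode(delta_samples):.1f}, 95% HDI ~ [{lo:.1f}, {hi:.1f}]")
```

For a skewed posterior like this one, the HDI is narrower than the equal-tailed interval, because it trades probability in the long thin tail for probability in the high-density region.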
Some pros and cons:
Density answers what the researcher wants to know: What is the most credible value of the parameter, and what is the range of the credible (i.e., high density) values? Those questions simply are not answered by percentiles. On the other hand, density is not invariant under non-linear (but monotonic) transformations of the parameters. By squeezing or stretching different regions of the parameter's scale, the densities can change dramatically, but the percentiles stay the same (on the transformed scale). This transformation invariance is the key reason that analysts avoid using densities in abstract, generic models and derivations.
But in applications where the parameters have meaningful interpretations, I don't think researchers are satisfied with percentiles. If you told a researcher, "Well, we cannot tell you what the most probable parameter value is; all we can tell you is the median (50th %ile)," I don't think the researcher would be satisfied. If you told the researcher, "We can tell you that 30% of the posterior falls below this 30th %ile, but we cannot tell you whether values below the 30th %ile have lower or higher probability density than values above it," I don't think the researcher would be satisfied. Lots of parameters in traditional psychometric models have meaningful scales (and aren't arbitrarily non-linearly transformed). Lots of parameters in conventional models have scales that directly map onto the data scales, for example the mean and standard deviation of a normal model (and the data scales are usually conventional and aren't arbitrarily non-linearly transformed). And in spatial or temporal models, many parameters directly correspond to space and time, which (in most terrestrial applications) are not non-linearly transformed.
Decision theory to the rescue? I know there is not a uniquely "correct" answer to this question. I suspect that the pros and cons could be formalized as cost functions in formal decision theory, and then an answer would emerge depending on the utilities assigned to density and transformation invariance. If the cost function depends on densities, then the mode and HDI would emerge as the better basis for decisions. If the cost function depends on transformation invariance, then the median and equal-tailed interval would emerge as the better basis for decisions.
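As a sketch of what that formalization might look like (again my own illustration, with an assumed Gamma stand-in posterior): with posterior samples in hand, the point estimate minimizing expected absolute loss is the posterior median, whereas an all-or-nothing loss within a small tolerance is minimized near the posterior mode. The choice of loss function picks the summary.

```python
import numpy as np

rng = np.random.default_rng(2)
post = rng.gamma(shape=2.0, scale=1.0, size=20000)  # skewed stand-in posterior
candidates = np.linspace(0.0, 6.0, 301)             # candidate point estimates

# Expected absolute-error loss for each candidate: minimized at the median.
abs_loss = np.abs(post[:, None] - candidates[None, :]).mean(axis=0)
best_abs = candidates[np.argmin(abs_loss)]

# All-or-nothing loss within tolerance eps: as eps shrinks, the minimizer
# approaches the posterior mode (~1.0 for a Gamma(2, 1) posterior).
eps = 0.05
zo_loss = (np.abs(post[:, None] - candidates[None, :]) > eps).mean(axis=0)
best_zo = candidates[np.argmin(zo_loss)]

print(f"abs-loss minimizer {best_abs:.2f} vs median {np.median(post):.2f}")
print(f"0-1-loss minimizer {best_zo:.2f} (mode of Gamma(2,1) is 1.0)")
```

For this right-skewed posterior the two minimizers differ noticeably, which is exactly the sense in which the chosen cost function decides between the density-based and percentile-based summaries.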
What do you think?