I'll be doing a workshop at the meeting of the Association for Psychological Science in Chicago, Thursday May 24. Details can be found here.
(I'm also doing a workshop in Chicago on May 4; details here.)
A list of future and past workshops can be found here.
Tuesday, April 24, 2012
Wednesday, April 18, 2012
Improved Programs for Hierarchical Bayesian ANOVA
I've made three important changes in the programs for hierarchical Bayesian ANOVA.
First, I changed the hyperprior on the variance (standard deviation) parameters. I was never pleased with the "+.1" kludge that I used in the book, explained on p. 496: "There is one other trick in the BUGS model specification that is not in the hierarchical diagram of Figure 18.1. One line of the BUGS model specifies that the standard deviation of the group effects, denoted aSDunabs, comes from a t distribution: aSDunabs ~ dt(0,0.001,2). Another line takes the absolute value to “fold” the t distribution onto the nonnegative numbers: aSD <- abs(aSDunabs) + .1. But that line also mysteriously adds a small constant, namely 0.1. This constant keeps aSD from venturing extremely close to zero. The reason for keeping aSD away from zero is that shrinkage can become overwhelmingly strong when there are many groups with few data points per group. This becomes especially problematic in the next chapter when we consider interaction of factors." A problem with that kludge is that the aSD parameter can never be less than 0.1, which is unlikely to reflect a real prior belief. It can produce strangely truncated posterior distributions. Therefore I simply abandoned the folded-t prior (recommended by Gelman, 2006) and used a reasonable gamma instead. The gamma prior, for each SD parameter, has a mode at 0.1 and a standard deviation of 10, which seems to be an appropriate scaling when the data are standardized (as indeed they are in the programs). Here's a picture of this prior:
The neat thing about this prior is that it smoothly drops down to zero as the parameter value gets close to zero. (For how to specify a gamma prior with a desired mode and standard deviation, see this previous blog post.) The prior on the within-cell SD remains uniform, as in the original programs and recommended by Gelman (2006).
Second, I changed the plots of the variance parameters so that they are on the original data scale, not the standardized scale. This was achieved merely by multiplying by ySDorig.
Third, I changed how contrasts are specified. The original method was problematic because the contrasts were specified as vectors of contrast coefficients, which (a) lacked explicit meaning and (b) only made sense if the order of the levels of the factors was exactly as assumed by the contrast coefficients. Unfortunately, it turns out that different settings of R's "locale" can produce different orderings of the factor levels, rendering the contrast coefficients nonsensical! (Thanks to Damion Junk for pointing out this problem to me.) The new method uses logical indexing of named levels, so that the contrast specification is meaningful and independent of local ordering conventions. For example, for the mussel muscle data, the old contrast specification was
contrastList = list( ...
ORE1vORE2 = c(1,-1,0,0,0) ,
ALAvORE = c(-1/2,-1/2,1,0,0) ,
... )
but the new specification is
normalize = function( v ){ return( v / sum(v) ) }
contrastList = list( ...
ORE1vORE2 = (xnames=="OregonN")-(xnames=="OregonT") ,
ALAvORE = (xnames=="Alaska")-normalize(xnames=="OregonN"|xnames=="OregonT") ,
... )
Fourth, I improved the program for displaying posterior histograms, so that it now defaults to better breaks based on the HDI. Therefore it is best to use plotPost without a breaks argument unless the (new) default looks bad.
Where are the new programs? Get the updated files, ANOVAonewayJagsSTZ.R, ANOVAtwowayJagsSTZ.R, plotPost.R, and McDonaldSK1991data.txt, from the usual place (click the column labeled "Last Modified" to get the most recent listed first). ** Updated April 25, 2012. Also with the program PoissonExponentialJagsSTZ.R **
First, I changed the hyperprior on the variance (standard deviation) parameters. I was never pleased with the "+.1" kludge that I used in the book, explained on p. 496: "There is one other trick in the BUGS model specification that is not in the hierarchical diagram of Figure 18.1. One line of the BUGS model specifies that the standard deviation of the group effects, denoted aSDunabs, comes from a t distribution: aSDunabs ~ dt(0,0.001,2). Another line takes the absolute value to “fold” the t distribution onto the nonnegative numbers: aSD <- abs(aSDunabs) + .1. But that line also mysteriously adds a small constant, namely 0.1. This constant keeps aSD from venturing extremely close to zero. The reason for keeping aSD away from zero is that shrinkage can become overwhelmingly strong when there are many groups with few data points per group. This becomes especially problematic in the next chapter when we consider interaction of factors." A problem with that kludge is that the aSD parameter can never be less than 0.1, which is unlikely to reflect a real prior belief. It can produce strangely truncated posterior distributions. Therefore I simply abandoned the folded-t prior (recommended by Gelman, 2006) and used a reasonable gamma instead. The gamma prior, for each SD parameter, has a mode at 0.1 and a standard deviation of 10, which seems to be an appropriate scaling when the data are standardized (as indeed they are in the programs). Here's a picture of this prior:
The neat thing about this prior is that it smoothly drops down to zero as the parameter value gets close to zero. (For how to specify a gamma prior with a desired mode and standard deviation, see this previous blog post.) The prior on the within-cell SD remains uniform, as in the original programs and recommended by Gelman (2006).
Second, I changed the plots of the variance parameters so that they are on the original data scale, not the standardized scale. This was achieved merely by multiplying by ySDorig.
Third, I changed how contrasts are specified. The original method was problematic because the contrasts were specified as vectors of contrast coefficients, which (a) lacked explicit meaning and (b) only made sense if the order of the levels of the factors was exactly as assumed by the contrast coefficients. Unfortunately, it turns out that different settings of R's "locale" can produce different orderings of the factor levels, rendering the contrast coefficients nonsensical! (Thanks to Damion Junk for pointing out this problem to me.) The new method uses logical indexing of named levels, so that the contrast specification is meaningful and independent of local ordering conventions. For example, for the mussel muscle data, the old contrast specification was
contrastList = list( ...
ORE1vORE2 = c(1,-1,0,0,0) ,
ALAvORE = c(-1/2,-1/2,1,0,0) ,
... )
but the new specification is
normalize = function( v ){ return( v / sum(v) ) }
contrastList = list( ...
ORE1vORE2 = (xnames=="OregonN")-(xnames=="OregonT") ,
ALAvORE = (xnames=="Alaska")-normalize(xnames=="OregonN"|xnames=="OregonT") ,
... )
Fourth, I improved the program for displaying posterior histograms, so that it now defaults to better breaks based on the HDI. Therefore it is best to use plotPost without a breaks argument unless the (new) default looks bad.
Where are the new programs? Get the updated files, ANOVAonewayJagsSTZ.R, ANOVAtwowayJagsSTZ.R, plotPost.R, and McDonaldSK1991data.txt, from the usual place (click the column labeled "Last Modified" to get the most recent listed first). ** Updated April 25, 2012. Also with the program PoissonExponentialJagsSTZ.R **
Monday, April 16, 2012
Why to use highest density intervals instead of equal tailed intervals
I recently received this question in an e-mail:
My reply:
Here is an example of what I mean. Suppose there are zero "heads" in N=10 flips of coin, being modeled with a Bernoulli likelihood. Suppose we start with a uniform prior --- that is, dbeta(1,1) --- on the bias of the coin. Then the posterior is dbeta(1,11), which looks like this:
The 95% HDI goes from zero to 0.2384. But the 95% equal-tailed interval goes from 0.0023 to 0.2849, which excludes zero. Clearly the HDI seems to be a more intuitive and meaningful summary of the posterior.
There is a way that the posterior 95% HDI could exclude zero even when the data have a frequency of zero. It can happen if the prior already excludes zero. For example, the prior might be dbeta(1.01,1.01), or dbeta(2,10), or whatever.
And, of course, the analogous argument applies when the likelihood is Poisson and the prior is gamma, or what have you.
See also this related post.
We've obtained a dataset that reports hospital rates of hospital-acquired bed sores (as % of patients, based on hospital-wide patient exams 4 days per year), and they also provide CI's for these point estimates "based on Bayesian statistics" (that's all we know, we're not given any more info)….and I'm confused because many of the hospitals that had a rate of zero events are paired with a confidence interval that doesn't contain zero (confidence interval is higher than zero). What do you think about that?
My reply:
If they really are using Bayesian "confidence intervals", and a frequency of zero does not produce a CI that includes zero, then I bet they are using equal-tailed credible intervals. By definition, a 95% equal tailed credible interval has to exclude 2.5% from each tail of the distribution. So, even if the mode of the posterior is at zero, if you exclude 2.5%, then you have to exclude zero. That's why I use highest density intervals (HDIs), not equal-tail CIs. HDIs always include the mode(s).
Here is an example of what I mean. Suppose there are zero "heads" in N=10 flips of coin, being modeled with a Bernoulli likelihood. Suppose we start with a uniform prior --- that is, dbeta(1,1) --- on the bias of the coin. Then the posterior is dbeta(1,11), which looks like this:
The 95% HDI goes from zero to 0.2384. But the 95% equal-tailed interval goes from 0.0023 to 0.2849, which excludes zero. Clearly the HDI seems to be a more intuitive and meaningful summary of the posterior.
There is a way that the posterior 95% HDI could exclude zero even when the data have a frequency of zero. It can happen if the prior already excludes zero. For example, the prior might be dbeta(1.01,1.01), or dbeta(2,10), or whatever.
And, of course, the analogous argument applies when the likelihood is Poisson and the prior is gamma, or what have you.
See also this related post.
Friday, April 13, 2012
Compilation of JAGS resources
Jeremy Anglim has posted a very nice summary of resources for JAGS and examples of Bayesian analyses. Do give it a look!
Saturday, April 7, 2012
Negative Binomial Reparameterization
In a previous post, I showed that direct estimation of the p and r parameters in a negative binomial distribution could involve bad autocorrelation in the MCMC chains, and I suggested that there must be some standard reparameterization to solve the problem, and asked for a pointer. Dr. John Davey of the University of Edinburgh was good enough to point the way. The solution is indeed straight forward: In the direct estimation, the p and r parameters are given priors, while the mean m (and variance v) is derived from p and r. In the reparameterization, the m and r parameters are given priors, while the p (and variance v) is derived from m and r. Autocorrelation in the chains is greatly reduced. Here is an example.
Parameterized by p and r:
Parameterized by m and r:
Parameterized by p and r:
Parameterized by m and r:
The posteriors are a bit different for the two parameterizations, because I used "generic" priors without any attempt to transform them to be equivalent.
Thanks to John Davey for this pointer!
The model specifications:
Parameterized by p and r:
model {
for( i in 1 : N ) {
y[i] ~ dnegbin( p , r )
}
p ~ dbeta(1.001,1.001)
r ~ dgamma(0.01,0.01)
m <- r*(1-p)/p
v <- r*(1-p)/(p*p)
}
for( i in 1 : N ) {
y[i] ~ dnegbin( p , r )
}
p ~ dbeta(1.001,1.001)
r ~ dgamma(0.01,0.01)
m <- r*(1-p)/p
v <- r*(1-p)/(p*p)
}
Parameterized by m and r:
model {
for( i in 1 : N ) {
y[i] ~ dnegbin( p , r )
}
p <- r/(r+m)
r ~ dgamma(0.01,0.01)
m ~ dgamma(0.01,0.01)
v <- r*(1-p)/(p*p)
}
for( i in 1 : N ) {
y[i] ~ dnegbin( p , r )
}
p <- r/(r+m)
r ~ dgamma(0.01,0.01)
m ~ dgamma(0.01,0.01)
v <- r*(1-p)/(p*p)
}
Results:
Parameterized by p and r:
Parameterized by m and r:
The posteriors are a bit different for the two parameterizations, because I used "generic" priors without any attempt to transform them to be equivalent.
Thanks to John Davey for this pointer!
Tuesday, April 3, 2012
Some videos about the history of Bayes' rule
Some videos about the history of Bayes' rule. Update: Be sure to read the comment from Sharon McGrayne (click the comments link at the end of the post).
One by Sharon Bertsch McGrayne (32 min.):
One by Bill Bryson (5 min.):
Update: Be sure to expand the comments link, below, to see Sharon McGrayne's clarifications of Bryson's statements.
One by Sharon Bertsch McGrayne (32 min.):
One by Bill Bryson (5 min.):
Update: Be sure to expand the comments link, below, to see Sharon McGrayne's clarifications of Bryson's statements.
Monday, April 2, 2012
Workshop in Chicago, May 4
Subscribe to:
Posts (Atom)