Thursday, August 9, 2012

Gamma Likelihood Parameterized by MODE and SD

In a previous post I showed that it's more intuitive to think of a gamma distribution in terms of its mode and standard deviation (sd) than its mean and sd because the gamma distribution is typically skewed. But I did not show how to estimate the mode and sd in the context of a working JAGS/BUGS program, which I do here.

Suppose we have some data, y, that we wish to describe with a gamma distribution. (This requires that all the y values are non-negative, of course.) The dgamma function in JAGS/BUGS and R is parameterized by shape and rate parameters, not by mean, mode, or sd. Therefore we must reparameterize the shape and rate into equivalent mean, mode or sd. If we want to reparameterize by the mean of the gamma distribution, a JAGS/BUGS model statement could look like this:

  model {
    for ( i in 1:N ) {
      y[i] ~ dgamma( sh , ra )
    }
    # parameterized by mean (m) and standard deviation (sd)
    sh <- pow(m,2) / pow(sd,2)
    ra <-     m    / pow(sd,2)
    m ~ dunif(0,100)
    sd ~ dunif(0,100)
  }

If we want to reparameterize by the mode of the gamma distribution, a JAGS/BUGS model statement could look like this:

  model {
    for ( i in 1:N ) {
      y[i] ~ dgamma( sh , ra )
    }
    # parameterized by mode (m) and standard deviation (sd):
    sh <- 1 + m * ra
    ra <- ( m + sqrt( m^2 + 4*sd^2 ) ) / ( 2 * sd^2 )
    m ~ dunif(0,100)
    sd ~ dunif(0,100)
  }
In both model statements, the m parameter is given a dunif(0,100) prior, which implies that the priors in the two models are not equivalent because the mean is not the same as the mode. With vague priors the difference may be inconsequential for the posterior, but it is an important conceptual distinction. In fact, the whole point of reparameterizing is to make the prior more intuitive to specify! It's more intuitive, for most of us, to put a meaningful prior on the mode and sd than on the shape and rate.

Here are the results for the two models:

The left panels show the data (which are the same for both models) with a smattering of posterior predicted gamma distributions superimposed. The upper row shows the estimates for the model parameterized by the mean and sd, while the lower row shows the estimates for the model parameterized by mode and sd. You can see that the estimated mean (about 8.08) is larger than the estimated mode (about 7.01), but the estimated sd is essentially the same in the two models.


Appendix: Full program listed here, merely FYI.

graphics.off()
rm(list=ls(all=TRUE))
fileNameRoot="GammaModeJags" # for constructing output filenames
if ( .Platform$OS.type != "windows" ) {
  windows <- function( ... ) X11( ... )
}
require(rjags)         # Kruschke, J. K. (2011). Doing Bayesian Data Analysis:
                       # A Tutorial with R and BUGS. Academic Press / Elsevier.
#------------------------------------------------------------------------------
# THE MODEL.

mType = c("Mean","Mode")[1]

if ( mType == "Mean" ) {
  modelstring = "
  model {
    for ( i in 1:N ) {
      y[i] ~ dgamma( sh , ra )
    }
    # parameterized by mean (m) and standard deviation (sd)
    sh <- pow(m,2) / pow(sd,2)
    ra <-     m    / pow(sd,2)
    m ~ dunif(0,100)
    sd ~ dunif(0,100)
  }
  " # close quote for modelstring


if ( mType == "Mode" ) {
  modelstring = "
  model {
    for ( i in 1:N ) {
      y[i] ~ dgamma( sh , ra )
    }
    # parameterized by mode (m) and standard deviation (sd):
    sh <- 1 + m * ra
    ra <- ( m + sqrt( m^2 + 4*sd^2 ) ) / ( 2 * sd^2 )
    m ~ dunif(0,100)
    sd ~ dunif(0,100)
  }
  " # close quote for modelstring
}
 
# Write model to a file, and send to BUGS:
writeLines(modelstring,con="model.txt")

#------------------------------------------------------------------------------
# THE DATA.

# Load the data:

N = 87 ; m=8 ; sd=3
set.seed(47401)
y = rgamma( N , shape=m^2/sd^2 , rate=m/sd^2 )

# Specify the data as a list:
dataList = list( y = y , N = N )

#------------------------------------------------------------------------------
# INTIALIZE THE CHAINS.

initsList = list( m = mean(y) , sd = sd(y) )

#------------------------------------------------------------------------------
# RUN THE CHAINS

parameters = c( "m" , "sd" ) 
adaptSteps = 1000              # Number of steps to "tune" the samplers.
burnInSteps = 1000            # Number of steps to "burn-in" the samplers.
nChains = 3                   # Number of chains to run.
numSavedSteps=50000           # Total number of steps in chains to save.
thinSteps=1                   # Number of steps to "thin" (1=keep every step).
nPerChain = ceiling( ( numSavedSteps * thinSteps ) / nChains ) # Steps per chain.
# Create, initialize, and adapt the model:
jagsModel = jags.model( "model.txt" , data=dataList , inits=initsList ,
                        n.chains=nChains , n.adapt=adaptSteps )
# Burn-in:
cat( "Burning in the MCMC chain...\n" )
update( jagsModel , n.iter=burnInSteps )
# The saved MCMC chain:
cat( "Sampling final MCMC chain...\n" )
codaSamples = coda.samples( jagsModel , variable.names=parameters ,
                            n.iter=nPerChain , thin=thinSteps )
# resulting codaSamples object has these indices:
#   codaSamples[[ chainIdx ]][ stepIdx , paramIdx ]

#------------------------------------------------------------------------------
# EXAMINE THE RESULTS

checkConvergence = F
if ( checkConvergence ) {
  windows()
  autocorr.plot( codaSamples , ask=F )
}

# Convert coda-object codaSamples to matrix object for easier handling.
# But note that this concatenates the different chains into one long chain.
# Result is mcmcChain[ stepIdx , paramIdx ]
mcmcChain = as.matrix( codaSamples )
chainLength = NROW(mcmcChain)

source("plotPost.R")

windows(width=10,height=3)
layout( matrix(1:3,nrow=1) )
# Data with superimposed posterior predictive gamma's:
hist( y , xlab="y" , ylim=c(0,0.15) , main="Data" , col="grey" , prob=TRUE )
xComb=seq(min(y),max(y),length=501)
nPPC = 15
for ( idx in round(seq(1,chainLength,length=nPPC)) ) {
  if ( mType=="Mean" ) {
    m = mcmcChain[idx,"m"]
    sd = mcmcChain[idx,"sd"]
    sh = m^2 / sd^2
    ra = m   / sd^2
  }
  if ( mType=="Mode" ) {
    m = mcmcChain[idx,"m"]
    sd = mcmcChain[idx,"sd"]
    ra = ( m + sqrt( m^2 + 4*sd^2 ) ) / ( 2 * sd^2 )
    sh = 1 + m * ra
  }
  lines( xComb , dgamma( xComb , sh , ra ) , col="skyblue" , lwd=2 )
}
# Posterior estimates of parameters:
par( mar=c(3,1,2.5,0) , mgp=c(2,0.7,0) )
plotPost( mcmcChain[,"m"] , xlab=mType , main="Posterior Est." , showMode=T )
plotPost( mcmcChain[,"sd"] , xlab="sd" , main="Posterior Est." , showMode=T )
savePlot(file=paste(fileNameRoot,mType,sep=""),type="jpg")

6 comments:

  1. I've been using Wolfram Alpha to help with reparametrization. For example, if I wanted to define gamma in terms of mean and sd i would look up the mean and sd of a gamma distribution on Wikipedia and the solve it using Wolfram Alpha like this:

    Solve[ {mu=a/b, sigma^2=a/b^2}, {a,b}]

    which gives me a = mu^2/sigma^2 and b = mu/sigma^2 . Usually this works well, though I can't get it to work for the mode and sd for the gamma for some reason... still thought I share this tip.

    ReplyDelete
  2. Hi John,

    In your first example (using the mean/sd parameterization), what if instead of putting a uniform prior on sd, you wanted to use an inverse gamma? Could you do this as:

    sd ~ dgamma(0.001,0.001)

    instead of:

    sd ~ dunif(0,100)

    ?

    Would you then need to divide sh and ra by pow(sd,-2) or is it still pow(sd,2)?

    Thanks in advance!

    ReplyDelete
  3. Well, you want the prior to make sense, as well as be mathematically convenient. A dgamma(0.001,0.001) prior puts a huge spike near zero (and a vast long tail up to huge values) -- is that what you want for your prior on sd? Usually the gamma is used as a prior for precision, not sd, but that's only because the gamma is conjugate for some choices of normal distribution. When we're referring to the sd of other distributions, gamma isn't necessarily conjugate. See Figure 16.2, p. 455 of DBDA2E. There are lots of other examples of parameterizing gamma by mode and sd in DBDA2E...

    ReplyDelete
  4. Thanks John, for your quick reply. What you say makes sense. I guess I've just become accustomed to using a gamma prior for precision, and was getting hung up on whether I was trying to model precision or sd. I should mention that I want to then feed the gamma distribution into a Poisson (gamma-Poisson mixture).

    ReplyDelete
  5. Would it be possible to parameterize the gamma likelihood by median and SD? If so, how?

    ReplyDelete
  6. According to the omniscient Wikipedia, there is no simple closed-form expression for the median of a gamma distribution. https://en.wikipedia.org/wiki/Gamma_distribution

    ReplyDelete