Monday, August 13, 2012
Workshop on doing Bayesian data analysis at Michigan State University, East Lansing, September 14-15. See details and registration info here. A list of some previous workshops is here.
Thursday, August 9, 2012
Gamma Likelihood Parameterized by MODE and SD
In a previous post I showed that it's more intuitive to think of a gamma distribution in terms of its mode and standard deviation (sd) than its mean and sd because the gamma distribution is typically skewed. But I did not show how to estimate the mode and sd in the context of a working JAGS/BUGS program, which I do here.
Suppose we have some data, y, that we wish to describe with a gamma distribution. (This requires that all the y values are positive, of course.) The dgamma function in JAGS/BUGS and R is parameterized by shape and rate parameters, not by mean, mode, or sd. But for a gamma distribution with shape s and rate r, the mean is s/r, the sd is sqrt(s)/r, and the mode is (s-1)/r (when s >= 1), so we can solve for the shape and rate in terms of the desired mean (or mode) and sd. If we want to parameterize by the mean of the gamma distribution, a JAGS/BUGS model statement could look like this:
model {
for ( i in 1:N ) {
y[i] ~ dgamma( sh , ra )
}
# parameterized by mean (m) and standard deviation (sd)
sh <- pow(m,2) / pow(sd,2)
ra <- m / pow(sd,2)
m ~ dunif(0,100)
sd ~ dunif(0,100)
}
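As a quick check outside of JAGS (a minimal R sketch of my own, not from the original post; the names mWanted and sdWanted are used only for this illustration), these formulas do recover the intended mean and sd, using the facts above that a gamma with shape s and rate r has mean s/r and sd sqrt(s)/r:
# Quick R check of the mean/sd conversion (not part of the JAGS model file):
mWanted = 8 ; sdWanted = 3
sh = mWanted^2 / sdWanted^2
ra = mWanted   / sdWanted^2
sh / ra                       # mean of dgamma(sh,ra): 8
sqrt(sh) / ra                 # sd of dgamma(sh,ra):   3
y = rgamma( 1e6 , shape=sh , rate=ra )
c( mean(y) , sd(y) )          # simulation check: approximately 8 and 3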
If we want to reparameterize by the mode of the gamma distribution, a JAGS/BUGS model statement could look like this:
model {
for ( i in 1:N ) {
y[i] ~ dgamma( sh , ra )
}
# parameterized by mode (m) and standard deviation (sd):
sh <- 1 + m * ra
ra <- ( m + sqrt( m^2 + 4*sd^2 ) ) / ( 2 * sd^2 )
m ~ dunif(0,100)
sd ~ dunif(0,100)
}
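The same kind of check works for the mode parameterization (again only a sketch of my own): for shape s > 1 and rate r, the gamma mode is (s-1)/r, so these formulas recover the intended mode and sd:
# Quick R check of the mode/sd conversion (not part of the JAGS model file):
modeWanted = 7 ; sdWanted = 3
ra = ( modeWanted + sqrt( modeWanted^2 + 4*sdWanted^2 ) ) / ( 2 * sdWanted^2 )
sh = 1 + modeWanted * ra
( sh - 1 ) / ra               # mode of dgamma(sh,ra): 7
sqrt( sh ) / ra               # sd of dgamma(sh,ra):   3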
In both model statements, the m parameter is given a dunif(0,100) prior, which implies that the priors in the two models are not equivalent because the mean is not the same as the mode. With vague priors the difference may be inconsequential for the posterior, but it is an important conceptual distinction. In fact, the whole point of reparameterizing is to make the prior more intuitive to specify! It's more intuitive, for most of us, to put a meaningful prior on the mode and sd than on the shape and rate.
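To see the point concretely, here is a small sketch of my own (not from the original program) showing that the identical dunif(0,100) prior on m implies different priors on the rate parameter, depending on whether m is interpreted as the mean or as the mode:
# Sketch: same uniform prior on m, different implied priors on the rate.
set.seed(1)
m = runif( 1e5 , 0 , 100 )
s = runif( 1e5 , 0 , 100 )    # prior draws of sd (named s here to avoid masking sd())
raIfMean = m / s^2
raIfMode = ( m + sqrt( m^2 + 4*s^2 ) ) / ( 2 * s^2 )
mean( raIfMode > raIfMean )   # equals 1: the rate is always larger when m is the mode
Because a gamma's mean always exceeds its mode, treating m as the mode implies a larger mean (mean = mode + 1/rate) and therefore a larger rate for the same sd.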
Here are the results for the two models:
The left panels show the data (which are the same for both models) with a smattering of posterior predicted gamma distributions superimposed. The upper row shows the estimates for the model parameterized by the mean and sd, while the lower row shows the estimates for the model parameterized by mode and sd. You can see that the estimated mean (about 8.08) is larger than the estimated mode (about 7.01), but the estimated sd is essentially the same in the two models.
Appendix: Full program listed here, merely FYI.
graphics.off()
rm(list=ls(all=TRUE))
fileNameRoot="GammaModeJags" # for constructing output filenames
if ( .Platform$OS.type != "windows" ) {
windows <- function( ... ) X11( ... )
}
require(rjags) # Kruschke, J. K. (2011). Doing Bayesian Data Analysis:
# A Tutorial with R and BUGS. Academic Press / Elsevier.
#------------------------------------------------------------------------------
# THE MODEL.
mType = c("Mean","Mode")[1]
if ( mType == "Mean" ) {
modelstring = "
model {
for ( i in 1:N ) {
y[i] ~ dgamma( sh , ra )
}
# parameterized by mean (m) and standard deviation (sd)
sh <- pow(m,2) / pow(sd,2)
ra <- m / pow(sd,2)
m ~ dunif(0,100)
sd ~ dunif(0,100)
}
" # close quote for modelstring
}
if ( mType == "Mode" ) {
modelstring = "
model {
for ( i in 1:N ) {
y[i] ~ dgamma( sh , ra )
}
# parameterized by mode (m) and standard deviation (sd):
sh <- 1 + m * ra
ra <- ( m + sqrt( m^2 + 4*sd^2 ) ) / ( 2 * sd^2 )
m ~ dunif(0,100)
sd ~ dunif(0,100)
}
" # close quote for modelstring
}
# Write model to a file for JAGS to read:
writeLines(modelstring,con="model.txt")
#------------------------------------------------------------------------------
# THE DATA.
# Generate simulated data:
N = 87 ; m=8 ; sd=3
set.seed(47401)
y = rgamma( N , shape=m^2/sd^2 , rate=m/sd^2 )
# Specify the data as a list:
dataList = list( y = y , N = N )
#------------------------------------------------------------------------------
# INITIALIZE THE CHAINS.
initsList = list( m = mean(y) , sd = sd(y) )
#------------------------------------------------------------------------------
# RUN THE CHAINS
parameters = c( "m" , "sd" )
adaptSteps = 1000 # Number of steps to "tune" the samplers.
burnInSteps = 1000 # Number of steps to "burn-in" the samplers.
nChains = 3 # Number of chains to run.
numSavedSteps=50000 # Total number of steps in chains to save.
thinSteps=1 # Number of steps to "thin" (1=keep every step).
nPerChain = ceiling( ( numSavedSteps * thinSteps ) / nChains ) # Steps per chain.
# Create, initialize, and adapt the model:
jagsModel = jags.model( "model.txt" , data=dataList , inits=initsList ,
n.chains=nChains , n.adapt=adaptSteps )
# Burn-in:
cat( "Burning in the MCMC chain...\n" )
update( jagsModel , n.iter=burnInSteps )
# The saved MCMC chain:
cat( "Sampling final MCMC chain...\n" )
codaSamples = coda.samples( jagsModel , variable.names=parameters ,
n.iter=nPerChain , thin=thinSteps )
# resulting codaSamples object has these indices:
# codaSamples[[ chainIdx ]][ stepIdx , paramIdx ]
#------------------------------------------------------------------------------
# EXAMINE THE RESULTS
checkConvergence = F
if ( checkConvergence ) {
windows()
autocorr.plot( codaSamples , ask=F )
}
# Convert coda-object codaSamples to matrix object for easier handling.
# But note that this concatenates the different chains into one long chain.
# Result is mcmcChain[ stepIdx , paramIdx ]
mcmcChain = as.matrix( codaSamples )
chainLength = NROW(mcmcChain)
source("plotPost.R")
windows(width=10,height=3)
layout( matrix(1:3,nrow=1) )
# Data with superimposed posterior predictive gamma's:
hist( y , xlab="y" , ylim=c(0,0.15) , main="Data" , col="grey" , prob=TRUE )
xComb=seq(min(y),max(y),length=501)
nPPC = 15
for ( idx in round(seq(1,chainLength,length=nPPC)) ) {
if ( mType=="Mean" ) {
m = mcmcChain[idx,"m"]
sd = mcmcChain[idx,"sd"]
sh = m^2 / sd^2
ra = m / sd^2
}
if ( mType=="Mode" ) {
m = mcmcChain[idx,"m"]
sd = mcmcChain[idx,"sd"]
ra = ( m + sqrt( m^2 + 4*sd^2 ) ) / ( 2 * sd^2 )
sh = 1 + m * ra
}
lines( xComb , dgamma( xComb , sh , ra ) , col="skyblue" , lwd=2 )
}
# Posterior estimates of parameters:
par( mar=c(3,1,2.5,0) , mgp=c(2,0.7,0) )
plotPost( mcmcChain[,"m"] , xlab=mType , main="Posterior Est." , showMode=T )
plotPost( mcmcChain[,"sd"] , xlab="sd" , main="Posterior Est." , showMode=T )
savePlot(file=paste(fileNameRoot,mType,sep=""),type="jpg")
Tuesday, August 7, 2012
Metropolis Algorithm: Discrete Position Probabilities
I was asked by a reader how I created Figure 7.2 of the book, reproduced at right, which shows the progression of discrete position probabilities at each step in a simple Metropolis algorithm. I couldn't find the original program, so I made a new one, reported here, with an additional example that illustrates a bimodal target distribution.
The main point is mentioned in the middle of p. 128: "At every time step, we multiply the current position probability vector w by the transition probability matrix T to get the position probabilities for the next time step. We keep multiplying by T over and over again to derive the long-run position probabilities. This process is exactly what generated the graphs in Figure 7.2."
The code consists of two main parts. First, build the transition matrix. Second, specify an initial position vector and repeatedly multiply by the transition matrix. Here is the code:
# For Figure 7.2, p. 123 of Doing Bayesian Data Analysis.
fileNameRoot = "DBDAfigure7pt2"
# Specify the target probability distribution.
pTargetName = "Linear" # for filename of saved graph
nSlots = 7
pTarget = 1:nSlots
pTarget = pTarget/sum(pTarget)
# Uncomment following lines for different example
#pTargetName = "Bimodal" # for filename of saved graph
#pTarget = c(1,2,2,1,3,3,1)
#nSlots = length(pTarget)
#pTarget = pTarget/sum(pTarget)
# Construct the matrix of proposal probabilities.
# Row is from position, column is to position.
proposalMatrix = matrix( 0 , nrow=nSlots , ncol=nSlots )
for ( fromIdx in 1:nSlots ) {
for ( toIdx in 1:nSlots ) {
if ( toIdx == fromIdx-1 ) { proposalMatrix[fromIdx,toIdx] = 0.5 }
if ( toIdx == fromIdx+1 ) { proposalMatrix[fromIdx,toIdx] = 0.5 }
}
}
# Construct the matrix of acceptance probabilities.
# Row is from position, column is to position.
acceptMatrix = matrix( 0 , nrow=nSlots , ncol=nSlots )
for ( fromIdx in 1:nSlots ) {
for ( toIdx in 1:nSlots ) {
acceptMatrix[fromIdx,toIdx] = min( pTarget[toIdx]/pTarget[fromIdx] , 1 )
}
}
# Compute the matrix of overall move probabilities:
moveMatrix = proposalMatrix * acceptMatrix
# Compute the transition matrix, including the probability of staying in place:
transitionMatrix = moveMatrix
for ( diagIdx in 1:nSlots ) {
transitionMatrix[diagIdx,diagIdx] = 1.0 - sum(moveMatrix[diagIdx,])
}
show( transitionMatrix )
# Specify starting position vector:
positionVec = rep(0,nSlots)
positionVec[round(nSlots/2)] = 1.0
# Loop through time steps and display position probabilities:
windows(height=7,width=7) # change to x11 for non-WindowsOS
layout( matrix(1:16,nrow=4) )
par( mar = c( 2.8 , 2.8 , 1.0 , 0.5 ) , mgp = c( 1.4 , 0.5 , 0 ) )
for ( timeIdx in 1:15 ) {
# Display position probability:
plot( 1:nSlots , positionVec , xlab="Position" , ylab="Probability" ,
type="h" , lwd=5 , col="skyblue" )
text( 1 , max(positionVec) , bquote(t==.(timeIdx)) , adj=c(0,1) )
# Update next position probability:
positionVec = positionVec %*% transitionMatrix
}
# Plot target distribution
plot( 1:nSlots , pTarget , xlab="Position" , ylab="Probability" ,
type="h" , lwd=5 , col="skyblue" )
text( 1 , max(positionVec) , bquote(target) , adj=c(0,1) )
savePlot( filename=paste(fileNameRoot,pTargetName,sep="") , type="jpg" )
For the linearly increasing target distribution, the transition matrix looks like this:
        [,1]      [,2]      [,3]  [,4]      [,5]       [,6]      [,7]
[1,]    0.50 0.5000000 0.0000000 0.000 0.0000000 0.00000000 0.0000000
[2,]    0.25 0.2500000 0.5000000 0.000 0.0000000 0.00000000 0.0000000
[3,]    0.00 0.3333333 0.1666667 0.500 0.0000000 0.00000000 0.0000000
[4,]    0.00 0.0000000 0.3750000 0.125 0.5000000 0.00000000 0.0000000
[5,]    0.00 0.0000000 0.0000000 0.400 0.1000000 0.50000000 0.0000000
[6,]    0.00 0.0000000 0.0000000 0.000 0.4166667 0.08333333 0.5000000
[7,]    0.00 0.0000000 0.0000000 0.000 0.0000000 0.42857143 0.5714286
The row is the "from" position and the column is the "to" position. For example, the probability of moving from position 4 to position 3 is the proposal probability times the acceptance probability: 0.5 * min( pTarget[3]/pTarget[4] , 1 ) = 0.5 * 3/4 = 0.375. Each diagonal entry is the probability of staying in place, so every row sums to 1.0.
The resulting graph looks like this:
If you uncomment the lines that specify a bimodal target, the resulting graph looks like this:
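Finally, as a direct check of the long-run claim quoted above, here is a small sketch of my own (it assumes that transitionMatrix, pTarget, and nSlots from the code above are still in the workspace): repeated multiplication by the transition matrix converges to the target distribution, which is the stationary distribution of this Metropolis chain.
# Sketch: long-run position probabilities match the target distribution.
positionVec = rep(0,nSlots)
positionVec[round(nSlots/2)] = 1.0
for ( timeIdx in 1:1000 ) {
  positionVec = positionVec %*% transitionMatrix
}
show( round( rbind( longRun=as.vector(positionVec) , target=pTarget ) , 4 ) )
# The two rows agree (to rounding) for both the linear and the bimodal target.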
Friday, August 3, 2012
Program updates and changes
The plotPost() function, which plots histograms of MCMC chains annotated with HDI and other info, was changed a few weeks ago. plotPost() now returns summary information about the chain instead of summary information about the histogram. A reader caught a conflict this caused in the program BernMetropolisTemplate.R, which has now been changed to accommodate the new plotPost(). The new programs can be found at the usual site: http://www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/Programs/?M=D. The zip file with all the programs has also been updated. Please let me know right away if you find other programs that have conflicts with the new plotPost(). Greatly appreciated!
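For example, the returned value can now be captured and displayed like this (a hypothetical usage sketch only; mcmcChain stands for a matrix of MCMC samples such as the one in the gamma-likelihood program above, and the exact contents of the returned summary may differ):
# Hypothetical sketch of using the value returned by the new plotPost():
source("plotPost.R")
chainInfo = plotPost( mcmcChain[,"m"] , xlab="m" )
show( chainInfo )   # summary of the MCMC chain itself, not of the histogram bins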