A little background: Suppose we have two groups of metric data. In each group the data seem to be unimodal and symmetrically distributed, but perhaps with outliers relative to a normal distribution. So we chose to describe the groups with possibly heavy-tailed t distributions, but using the same normality (df) parameter for both groups because outliers are rare (and it's the outliers that most influence the normality parameter). Thus there are five parameters to be estimated: μ1, σ1, μ2, σ2, and the normality parameter ν.
The original BEST program put simplistic broad priors on the five parameters. For example, σ1 was given a broad uniform prior extending across a range that far exceeded any plausible estimate for the scale of the data. It was explained how to make basic modifications of the prior in Appendix B of the article describing the programs. But the programs really were not intended for easy expression of informed priors. In the post I present a version of the BEST programs that are much easier for expression of informed priors.
In the new version, each parameter is given its own line of code for expressing its prior, so that each can be individually set.
- Each μ is given a normal prior with its own (constant) mean and standard deviation.
- Each σ is given a gamma prior with its own (constant) shape and rate. The shape and rate are derived from the desired mode and standard deviation of the gamma distribution.
- The normality parameter ν is given a gamma prior with its own (constant) shape and rate. The shape and rate are derived from the desired mean and standard deviation of the gamma distribution.
Pages 236-239 of DBDA2E explain the gamma distribution and formulas for converting the desired mode or mean and standard deviation to corresponding shape and rate values. In particular, it defines the R functions gammaShRaFromModeSD and gammaShRaFromMeanSD that convert desired mean or mode and SD to corresponding shape and rate parameters.
The JAGS model specification is shown below. To understand it, note that the metric data values are y[i] and the corresponding group membership is specified by x[i].
model {
for ( i in 1:Ntotal ) {
y[i] ~ dt( mu[x[i]] , 1/sigma[x[i]]^2 , nu )
}
mu[1] ~ dnorm( muM1 , 1/muSD1^2 ) # prior for mu[1]
sigma[1] ~ dgamma( Sh1 , Ra1 ) # prior for sigma[1]
mu[2] ~ dnorm( muM2 , 1/muSD2^2 ) # prior for mu[2]
sigma[2] ~ dgamma( Sh2 , Ra2 ) # prior for sigma[2]
nu <- nuMinusOne+1
nuMinusOne ~ dgamma( ShNu , RaNu ) # prior for nu
}
To set the prior constants so that they are broad on the scale of the data, I use the following (somewhat arbitrary) settings:
muM1 = mean(y) # centered on the data
muSD1 = sd(y)*5 # broad relative to the scale of the data
Sh1 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$shape
Ra1 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$rate
muM2 = mean(y)
muSD2 = sd(y)*5
Sh2 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$shape
Ra2 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$rate
ShNu = gammaShRaFromMeanSD( mean=29 , sd=29 )$shape
RaNu = gammaShRaFromMeanSD( mean=29 , sd=29 )$rate
Notice that the gamma prior for nuMinusOne is exactly the exponential prior used in the original program when its mean=29 and sd=29.
When the new program is run with the broad priors specified above, it gives output virtually identical to the original program for any modest amount of data.
Suppose now that you want to specify a strong prior on the first group, such that you want μ1 to be very nearly 100 and σ1 to be very nearly 15. You could specify that as
muM1 = 15
muSD1 = 1 # very small uncertainty
Sh1 = gammaShRaFromModeSD( mode=15 , sd=1 )$shape
Ra1 = gammaShRaFromModeSD( mode=15 , sd=1 )$rate
But it doesn't often make sense to just arbitrary put strong constraints in the prior. Instead, and in general, we would like the prior to resemble a posterior from previous data, starting with a vague proto-prior. Instead of relying on intuition to express an informed prior directly on the parameters, we start with a vague proto-prior and enter some reasonable data that instantiate our prior knowledge. Then we run the Bayesian analysis on the data and observe the resulting posterior distribution. It is the posterior distribution that is now the informed prior for subsequent data analysis. From the posterior distribution of the prior data, just "read off" the mode and sd of the marginal posterior on σ and other parameters. In the program, the MCMC chain is in the matrix mcmcChain, so for example, we can get at the mean and sd of the nu parameter using mean( mcmcChain[,"nu"] ) and sd( mcmcChain[,"nu"] ).
The full scripts are listed below:
BESTgamma.R:
# MODIFIED 4/28/15 WITH GAMMA PRIOR ON SIGMA
# John K. Kruschke
# johnkruschke@gmail.com
# http://www.indiana.edu/~kruschke/BEST/
#
# This program is believed to be free of errors, but it comes with no guarantee!
# The user bears all responsibility for interpreting the results.
# Please check the webpage above for updates or corrections.
#
### ***************************************************************
### ******** SEE FILE BESTgammaExample.R FOR INSTRUCTIONS **************
### ***************************************************************
source("openGraphSaveGraph.R") # graphics functions for Windows, Mac OS, Linux
# Function for shape and rate parameters of gamma. From DBDA2E-utilities.R; see
# p. 238 of DBDA2E. DBDA2E = Doing Bayesian Data Analysis Second Edition,
# https://sites.google.com/site/doingbayesiandataanalysis/
gammaShRaFromModeSD = function( mode , sd ) {
if ( mode <=0 ) stop("mode must be > 0")
if ( sd <=0 ) stop("sd must be > 0")
rate = ( mode + sqrt( mode^2 + 4 * sd^2 ) ) / ( 2 * sd^2 )
shape = 1 + mode * rate
return( list( shape=shape , rate=rate ) )
}
gammaShRaFromMeanSD = function( mean , sd ) {
if ( mean <=0 ) stop("mean must be > 0")
if ( sd <=0 ) stop("sd must be > 0")
shape = mean^2/sd^2
rate = mean/sd^2
return( list( shape=shape , rate=rate ) )
}
BESTmcmc = function( y1, y2, numSavedSteps=100000, thinSteps=1, showMCMC=FALSE) {
# This function generates an MCMC sample from the posterior distribution.
# Description of arguments:
# showMCMC is a flag for displaying diagnostic graphs of the chains.
# If F (the default), no chain graphs are displayed. If T, they are.
require(rjags)
#----------------------------------------------------------------------------
# THE MODEL.
modelString = "
model {
for ( i in 1:Ntotal ) {
y[i] ~ dt( mu[x[i]] , 1/sigma[x[i]]^2 , nu )
}
mu[1] ~ dnorm( muM1 , 1/muSD1^2 ) # prior for mu[1]
sigma[1] ~ dgamma( Sh1 , Ra1 ) # prior for sigma[1]
mu[2] ~ dnorm( muM2 , 1/muSD2^2 ) # prior for mu[2]
sigma[2] ~ dgamma( Sh2 , Ra2 ) # prior for sigma[2]
nu <- nuMinusOne+1
nuMinusOne ~ dgamma( ShNu , RaNu ) # prior for nu
}
" # close quote for modelString
# Write out modelString to a text file
writeLines( modelString , con="BESTmodel.txt" )
#------------------------------------------------------------------------------
# THE DATA.
# Load the data:
y = c( y1 , y2 ) # combine data into one vector
x = c( rep(1,length(y1)) , rep(2,length(y2)) ) # create group membership code
Ntotal = length(y)
# Specify the data and prior constants in a list, for later shipment to JAGS:
dataList = list(
y = y ,
x = x ,
Ntotal = Ntotal ,
# Below are the specifications for the prior constants. These are generic
# broad priors, but you can replace with informed values if appropriate.
muM1 = mean(y) ,
muSD1 = sd(y)*5 ,
muM2 = mean(y) ,
muSD2 = sd(y)*5 ,
Sh1 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$shape ,
Ra1 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$rate ,
Sh2 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$shape ,
Ra2 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$rate ,
ShNu = gammaShRaFromMeanSD( mean=29 , sd=29 )$shape ,
RaNu = gammaShRaFromMeanSD( mean=29 , sd=29 )$rate
)
#------------------------------------------------------------------------------
# INTIALIZE THE CHAINS.
# Initial values of MCMC chains based on data:
mu = c( mean(y1) , mean(y2) )
sigma = c( sd(y1) , sd(y2) )
# Regarding initial values in next line: (1) sigma will tend to be too big if
# the data have outliers, and (2) nu starts at 5 as a moderate value. These
# initial values keep the burn-in period moderate.
initsList = list( mu = mu , sigma = sigma , nuMinusOne = 4 )
#------------------------------------------------------------------------------
# RUN THE CHAINS
parameters = c( "mu" , "sigma" , "nu" ) # The parameters to be monitored
adaptSteps = 500 # Number of steps to "tune" the samplers
burnInSteps = 1000
nChains = 3
nIter = ceiling( ( numSavedSteps * thinSteps ) / nChains )
# Create, initialize, and adapt the model:
jagsModel = jags.model( "BESTmodel.txt" , data=dataList , inits=initsList ,
n.chains=nChains , n.adapt=adaptSteps )
# Burn-in:
cat( "Burning in the MCMC chain...\n" )
update( jagsModel , n.iter=burnInSteps )
# The saved MCMC chain:
cat( "Sampling final MCMC chain...\n" )
codaSamples = coda.samples( jagsModel , variable.names=parameters ,
n.iter=nIter , thin=thinSteps )
# resulting codaSamples object has these indices:
# codaSamples[[ chainIdx ]][ stepIdx , paramIdx ]
#------------------------------------------------------------------------------
# EXAMINE THE RESULTS
if ( showMCMC ) {
openGraph(width=7,height=7)
autocorr.plot( codaSamples[[1]] , ask=FALSE )
show( gelman.diag( codaSamples ) )
effectiveChainLength = effectiveSize( codaSamples )
show( effectiveChainLength )
}
# Convert coda-object codaSamples to matrix object for easier handling.
# But note that this concatenates the different chains into one long chain.
# Result is mcmcChain[ stepIdx , paramIdx ]
mcmcChain = as.matrix( codaSamples )
return( mcmcChain )
} # end function BESTmcmc
#==============================================================================
BESTsummary = function( y1 , y2 , mcmcChain ) {
source("HDIofMCMC.R")
mcmcSummary = function( paramSampleVec , compVal=NULL ) {
meanParam = mean( paramSampleVec )
medianParam = median( paramSampleVec )
dres = density( paramSampleVec )
modeParam = dres$x[which.max(dres$y)]
hdiLim = HDIofMCMC( paramSampleVec )
if ( !is.null(compVal) ) {
pcgtCompVal = ( 100 * sum( paramSampleVec > compVal )
/ length( paramSampleVec ) )
} else {
pcgtCompVal=NA
}
return( c( meanParam , medianParam , modeParam , hdiLim , pcgtCompVal ) )
}
# Define matrix for storing summary info:
summaryInfo = matrix( 0 , nrow=9 , ncol=6 , dimnames=list(
PARAMETER=c( "mu1" , "mu2" , "muDiff" , "sigma1" , "sigma2" , "sigmaDiff" ,
"nu" , "nuLog10" , "effSz" ),
SUMMARY.INFO=c( "mean" , "median" , "mode" , "HDIlow" , "HDIhigh" ,
"pcgtZero" )
) )
summaryInfo[ "mu1" , ] = mcmcSummary( mcmcChain[,"mu[1]"] )
summaryInfo[ "mu2" , ] = mcmcSummary( mcmcChain[,"mu[2]"] )
summaryInfo[ "muDiff" , ] = mcmcSummary( mcmcChain[,"mu[1]"]
- mcmcChain[,"mu[2]"] ,
compVal=0 )
summaryInfo[ "sigma1" , ] = mcmcSummary( mcmcChain[,"sigma[1]"] )
summaryInfo[ "sigma2" , ] = mcmcSummary( mcmcChain[,"sigma[2]"] )
summaryInfo[ "sigmaDiff" , ] = mcmcSummary( mcmcChain[,"sigma[1]"]
- mcmcChain[,"sigma[2]"] ,
compVal=0 )
summaryInfo[ "nu" , ] = mcmcSummary( mcmcChain[,"nu"] )
summaryInfo[ "nuLog10" , ] = mcmcSummary( log10(mcmcChain[,"nu"]) )
N1 = length(y1)
N2 = length(y2)
effSzChain = ( ( mcmcChain[,"mu[1]"] - mcmcChain[,"mu[2]"] )
/ sqrt( ( mcmcChain[,"sigma[1]"]^2 + mcmcChain[,"sigma[2]"]^2 ) / 2 ) )
summaryInfo[ "effSz" , ] = mcmcSummary( effSzChain , compVal=0 )
# Or, use sample-size weighted version:
# effSz = ( mu1 - mu2 ) / sqrt( ( sigma1^2 *(N1-1) + sigma2^2 *(N2-1) )
# / (N1+N2-2) )
# Be sure also to change plot label in BESTplot function, below.
return( summaryInfo )
}
#==============================================================================
BESTplot = function( y1 , y2 , mcmcChain , ROPEm=NULL , ROPEsd=NULL ,
ROPEeff=NULL , showCurve=FALSE , pairsPlot=FALSE ) {
# This function plots the posterior distribution (and data).
# Description of arguments:
# y1 and y2 are the data vectors.
# mcmcChain is a list of the type returned by function BTT.
# ROPEm is a two element vector, such as c(-1,1), specifying the limit
# of the ROPE on the difference of means.
# ROPEsd is a two element vector, such as c(-1,1), specifying the limit
# of the ROPE on the difference of standard deviations.
# ROPEeff is a two element vector, such as c(-1,1), specifying the limit
# of the ROPE on the effect size.
# showCurve is TRUE or FALSE and indicates whether the posterior should
# be displayed as a histogram (by default) or by an approximate curve.
# pairsPlot is TRUE or FALSE and indicates whether scatterplots of pairs
# of parameters should be displayed.
mu1 = mcmcChain[,"mu[1]"]
mu2 = mcmcChain[,"mu[2]"]
sigma1 = mcmcChain[,"sigma[1]"]
sigma2 = mcmcChain[,"sigma[2]"]
nu = mcmcChain[,"nu"]
if ( pairsPlot ) {
# Plot the parameters pairwise, to see correlations:
openGraph(width=7,height=7)
nPtToPlot = 1000
plotIdx = floor(seq(1,length(mu1),by=length(mu1)/nPtToPlot))
panel.cor = function(x, y, digits=2, prefix="", cex.cor, ...) {
usr = par("usr"); on.exit(par(usr))
par(usr = c(0, 1, 0, 1))
r = (cor(x, y))
txt = format(c(r, 0.123456789), digits=digits)[1]
txt = paste(prefix, txt, sep="")
if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
text(0.5, 0.5, txt, cex=1.25 ) # was cex=cex.cor*r
}
pairs( cbind( mu1 , mu2 , sigma1 , sigma2 , log10(nu) )[plotIdx,] ,
labels=c( expression(mu[1]) , expression(mu[2]) ,
expression(sigma[1]) , expression(sigma[2]) ,
expression(log10(nu)) ) ,
lower.panel=panel.cor , col="skyblue" )
}
source("plotPost.R")
# Set up window and layout:
openGraph(width=6.0,height=8.0)
layout( matrix( c(4,5,7,8,3,1,2,6,9,10) , nrow=5, byrow=FALSE ) )
par( mar=c(3.5,3.5,2.5,0.5) , mgp=c(2.25,0.7,0) )
# Select thinned steps in chain for plotting of posterior predictive curves:
chainLength = NROW( mcmcChain )
nCurvesToPlot = 30
stepIdxVec = seq( 1 , chainLength , floor(chainLength/nCurvesToPlot) )
xRange = range( c(y1,y2) )
xLim = c( xRange[1]-0.1*(xRange[2]-xRange[1]) ,
xRange[2]+0.1*(xRange[2]-xRange[1]) )
xVec = seq( xLim[1] , xLim[2] , length=200 )
maxY = max( dt( 0 , df=max(nu[stepIdxVec]) ) /
min(c(sigma1[stepIdxVec],sigma2[stepIdxVec])) )
# Plot data y1 and smattering of posterior predictive curves:
stepIdx = 1
plot( xVec , dt( (xVec-mu1[stepIdxVec[stepIdx]])/sigma1[stepIdxVec[stepIdx]] ,
df=nu[stepIdxVec[stepIdx]] )/sigma1[stepIdxVec[stepIdx]] ,
ylim=c(0,maxY) , cex.lab=1.75 ,
type="l" , col="skyblue" , lwd=1 , xlab="y" , ylab="p(y)" ,
main="Data Group 1 w. Post. Pred." )
for ( stepIdx in 2:length(stepIdxVec) ) {
lines(xVec, dt( (xVec-mu1[stepIdxVec[stepIdx]])/sigma1[stepIdxVec[stepIdx]] ,
df=nu[stepIdxVec[stepIdx]] )/sigma1[stepIdxVec[stepIdx]] ,
type="l" , col="skyblue" , lwd=1 )
}
histBinWd = median(sigma1)/2
histCenter = mean(mu1)
histBreaks = sort( c( seq( histCenter-histBinWd/2 , min(xVec)-histBinWd/2 ,
-histBinWd ),
seq( histCenter+histBinWd/2 , max(xVec)+histBinWd/2 ,
histBinWd ) , xLim ) )
histInfo = hist( y1 , plot=FALSE , breaks=histBreaks )
yPlotVec = histInfo$density
yPlotVec[ yPlotVec==0.0 ] = NA
xPlotVec = histInfo$mids
xPlotVec[ yPlotVec==0.0 ] = NA
points( xPlotVec , yPlotVec , type="h" , lwd=3 , col="red" )
text( max(xVec) , maxY , bquote(N[1]==.(length(y1))) , adj=c(1.1,1.1) )
# Plot data y2 and smattering of posterior predictive curves:
stepIdx = 1
plot( xVec , dt( (xVec-mu2[stepIdxVec[stepIdx]])/sigma2[stepIdxVec[stepIdx]] ,
df=nu[stepIdxVec[stepIdx]] )/sigma2[stepIdxVec[stepIdx]] ,
ylim=c(0,maxY) , cex.lab=1.75 ,
type="l" , col="skyblue" , lwd=1 , xlab="y" , ylab="p(y)" ,
main="Data Group 2 w. Post. Pred." )
for ( stepIdx in 2:length(stepIdxVec) ) {
lines(xVec, dt( (xVec-mu2[stepIdxVec[stepIdx]])/sigma2[stepIdxVec[stepIdx]] ,
df=nu[stepIdxVec[stepIdx]] )/sigma2[stepIdxVec[stepIdx]] ,
type="l" , col="skyblue" , lwd=1 )
}
histBinWd = median(sigma2)/2
histCenter = mean(mu2)
histBreaks = sort( c( seq( histCenter-histBinWd/2 , min(xVec)-histBinWd/2 ,
-histBinWd ),
seq( histCenter+histBinWd/2 , max(xVec)+histBinWd/2 ,
histBinWd ) , xLim ) )
histInfo = hist( y2 , plot=FALSE , breaks=histBreaks )
yPlotVec = histInfo$density
yPlotVec[ yPlotVec==0.0 ] = NA
xPlotVec = histInfo$mids
xPlotVec[ yPlotVec==0.0 ] = NA
points( xPlotVec , yPlotVec , type="h" , lwd=3 , col="red" )
text( max(xVec) , maxY , bquote(N[2]==.(length(y2))) , adj=c(1.1,1.1) )
# Plot posterior distribution of parameter nu:
histInfo = plotPost( log10(nu) , col="skyblue" , # breaks=30 ,
showCurve=showCurve ,
xlab=bquote("log10("*nu*")") , cex.lab = 1.75 , showMode=TRUE ,
main="Normality" ) # (<0.7 suggests kurtosis)
# Plot posterior distribution of parameters mu1, mu2, and their difference:
xlim = range( c( mu1 , mu2 ) )
histInfo = plotPost( mu1 , xlim=xlim , cex.lab = 1.75 ,
showCurve=showCurve ,
xlab=bquote(mu[1]) , main=paste("Group",1,"Mean") ,
col="skyblue" )
histInfo = plotPost( mu2 , xlim=xlim , cex.lab = 1.75 ,
showCurve=showCurve ,
xlab=bquote(mu[2]) , main=paste("Group",2,"Mean") ,
col="skyblue" )
histInfo = plotPost( mu1-mu2 , compVal=0 , showCurve=showCurve ,
xlab=bquote(mu[1] - mu[2]) , cex.lab = 1.75 , ROPE=ROPEm ,
main="Difference of Means" , col="skyblue" )
# Plot posterior distribution of param's sigma1, sigma2, and their difference:
xlim=range( c( sigma1 , sigma2 ) )
histInfo = plotPost( sigma1 , xlim=xlim , cex.lab = 1.75 ,
showCurve=showCurve ,
xlab=bquote(sigma[1]) , main=paste("Group",1,"Std. Dev.") ,
col="skyblue" , showMode=TRUE )
histInfo = plotPost( sigma2 , xlim=xlim , cex.lab = 1.75 ,
showCurve=showCurve ,
xlab=bquote(sigma[2]) , main=paste("Group",2,"Std. Dev.") ,
col="skyblue" , showMode=TRUE )
histInfo = plotPost( sigma1-sigma2 ,
compVal=0 , showCurve=showCurve ,
xlab=bquote(sigma[1] - sigma[2]) , cex.lab = 1.75 ,
ROPE=ROPEsd ,
main="Difference of Std. Dev.s" , col="skyblue" , showMode=TRUE )
# Plot of estimated effect size. Effect size is d-sub-a from
# Macmillan & Creelman, 1991; Simpson & Fitter, 1973; Swets, 1986a, 1986b.
effectSize = ( mu1 - mu2 ) / sqrt( ( sigma1^2 + sigma2^2 ) / 2 )
histInfo = plotPost( effectSize , compVal=0 , ROPE=ROPEeff ,
showCurve=showCurve ,
xlab=bquote( (mu[1]-mu[2])
/sqrt((sigma[1]^2 +sigma[2]^2 )/2 ) ),
showMode=TRUE , cex.lab=1.0 , main="Effect Size" , col="skyblue" )
return( BESTsummary( y1 , y2 , mcmcChain ) )} # end of function BESTplot
#==============================================================================
Script to run after above file is sourced:
### Make sure that the following programs are all
### in the same folder as this file:
### BESTgamma.R
### plotPost.R
### HDIofMCMC.R
### HDIofICDF.R
### openGraphSaveGraph.R
### Make sure that R's working directory is the folder in which those
### files reside. In RStudio, use menu tabs Tools -> Set Working Directory.
### If working in R, use menu tabs File -> Change Dir.
# Get the functions loaded into R's working memory:
source("BESTgamma.R")
# Specify data as vectors:
# (Replace with your own data as needed. R can read many formats of data files,
# see the commands "scan" or "read.table" and its variants.)
y1 = c(101,100,102,104,102,97,105,105,98,101,100,123,105,103,100,95,102,106,
109,102,82,102,100,102,102,101,102,102,103,103,97,97,103,101,97,104,
96,103,124,101,101,100,101,101,104,100,101)
y2 = c(99,101,100,101,102,100,97,101,104,101,102,102,100,105,88,101,100,
104,100,100,100,101,102,103,97,101,101,100,101,99,101,100,100,
101,100,99,101,100,102,99,100,99)
# Run the Bayesian analysis:
mcmcChain = BESTmcmc( y1 , y2 , numSavedSteps=10000 )
# Plot the results of the Bayesian analysis:
postInfo = BESTplot( y1 , y2 , mcmcChain , ROPEeff=c(-0.1,0.1) )
# Show detailed summary info on console:
show( postInfo )
#-------------------------------------------------------------------------------