## Tuesday, April 28, 2015

### Informed priors for Bayesian comparison of two groups

The BEST programs, for Bayesian estimation of two groups, were written with generic vague priors only minimally informed by the scale of the data. Here are new versions of the programs that are better suited for specifying informed priors.

A little background: Suppose we have two groups of metric data. In each group the data seem to be unimodal and symmetrically distributed, but perhaps with outliers relative to a normal distribution. So we chose to describe the groups with possibly heavy-tailed t distributions, but using the same normality (df) parameter for both groups because outliers are rare (and it's the outliers that most influence the normality parameter). Thus there are five parameters to be estimated: μ1, σ1, μ2, σ2, and the normality parameter ν.

The original BEST program put simplistic broad priors on the five parameters. For example, σ1 was given a broad uniform prior extending across a range that far exceeded any plausible estimate for the scale of the data. It was explained how to make basic modifications of the prior in Appendix B of the article describing the programs. But the programs really were not intended for easy expression of informed priors. In the post I present a version of the BEST programs that are much easier for expression of informed priors.

In the new version, each parameter is given its own line of code for expressing its prior, so that each can be individually set.

• Each μ is given a normal prior with its own (constant) mean and standard deviation.
• Each σ is given a gamma prior with its own (constant) shape and rate. The shape and rate are derived from the desired mode and standard deviation of the gamma distribution.
• The normality parameter ν is given a gamma prior with its own (constant) shape and rate. The shape and rate are derived from the desired mean and standard deviation of the gamma distribution.

Pages 236-239 of DBDA2E explain the gamma distribution and formulas for converting the desired mode or mean and standard deviation to corresponding shape and rate values. In particular, it defines the R functions gammaShRaFromModeSD and gammaShRaFromMeanSD that convert desired mean or mode and SD to corresponding shape and rate parameters.

The JAGS model specification is shown below. To understand it, note that the metric data values are y[i] and the corresponding group membership is specified by x[i].

model {
for ( i in 1:Ntotal ) {
y[i] ~ dt( mu[x[i]] , 1/sigma[x[i]]^2 , nu )
}
mu ~ dnorm( muM1 , 1/muSD1^2 )  # prior for mu
sigma ~ dgamma( Sh1 , Ra1 )     # prior for sigma
mu ~ dnorm( muM2 , 1/muSD2^2 )  # prior for mu
sigma ~ dgamma( Sh2 , Ra2 )     # prior for sigma
nu <- nuMinusOne+1
nuMinusOne ~ dgamma( ShNu , RaNu ) # prior for nu
}

To set the prior constants so that they are broad on the scale of the data, I use the following (somewhat arbitrary) settings:

muM1 = mean(y)  # centered on the data
muSD1 = sd(y)*5 # broad relative to the scale of the data

Sh1 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$shape Ra1 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$rate

muM2 = mean(y)
muSD2 = sd(y)*5

Sh2 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$shape Ra2 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$rate

ShNu = gammaShRaFromMeanSD( mean=29 , sd=29 )$shape RaNu = gammaShRaFromMeanSD( mean=29 , sd=29 )$rate

Notice that the gamma prior for nuMinusOne is exactly the exponential prior used in the original program when its mean=29 and sd=29.

When the new program is run with the broad priors specified above, it gives output virtually identical to the original program for any modest amount of data.

Suppose now that you want to specify a strong prior on the first group, such that you want μ1 to be very nearly 100 and σ1 to be very nearly 15. You could specify that as

muM1 = 15
muSD1 = 1 # very small uncertainty

Sh1 = gammaShRaFromModeSD( mode=15 , sd=1 )$shape Ra1 = gammaShRaFromModeSD( mode=15 , sd=1 )$rate

But it doesn't often make sense to just arbitrary put strong constraints in the prior. Instead, and in general, we would like the prior to resemble a posterior from previous data, starting with a vague proto-prior. Instead of relying on intuition to express an informed prior directly on the parameters, we start with a vague proto-prior and enter some reasonable data that instantiate our prior knowledge. Then we run the Bayesian analysis on the data and observe the resulting posterior distribution. It is the posterior distribution that is now the informed prior for subsequent data analysis. From the posterior distribution of the prior data, just "read off" the mode and sd of the marginal posterior on σ and other parameters. In the program, the MCMC chain is in the matrix mcmcChain, so for example, we can get at the mean and sd of the nu parameter using mean( mcmcChain[,"nu"] ) and sd( mcmcChain[,"nu"] ).

The full scripts are listed below:

BESTgamma.R:

# MODIFIED 4/28/15 WITH GAMMA PRIOR ON SIGMA
# John K. Kruschke
# johnkruschke@gmail.com
# http://www.indiana.edu/~kruschke/BEST/
#
# This program is believed to be free of errors, but it comes with no guarantee!
# The user bears all responsibility for interpreting the results.
#
### ***************************************************************
### ******** SEE FILE BESTgammaExample.R FOR INSTRUCTIONS **************
### ***************************************************************

source("openGraphSaveGraph.R") # graphics functions for Windows, Mac OS, Linux

# Function for shape and rate parameters of gamma. From DBDA2E-utilities.R; see
# p. 238 of DBDA2E. DBDA2E = Doing Bayesian Data Analysis Second Edition,
gammaShRaFromModeSD = function( mode , sd ) {
if ( mode <=0 ) stop("mode must be > 0")
if ( sd <=0 ) stop("sd must be > 0")
rate = ( mode + sqrt( mode^2 + 4 * sd^2 ) ) / ( 2 * sd^2 )
shape = 1 + mode * rate
return( list( shape=shape , rate=rate ) )
}

gammaShRaFromMeanSD = function( mean , sd ) {
if ( mean <=0 ) stop("mean must be > 0")
if ( sd <=0 ) stop("sd must be > 0")
shape = mean^2/sd^2
rate = mean/sd^2
return( list( shape=shape , rate=rate ) )
}

BESTmcmc = function( y1, y2, numSavedSteps=100000, thinSteps=1, showMCMC=FALSE) {
# This function generates an MCMC sample from the posterior distribution.
# Description of arguments:
# showMCMC is a flag for displaying diagnostic graphs of the chains.
#    If F (the default), no chain graphs are displayed. If T, they are.
require(rjags)

#----------------------------------------------------------------------------
# THE MODEL.
modelString = "
model {
for ( i in 1:Ntotal ) {
y[i] ~ dt( mu[x[i]] , 1/sigma[x[i]]^2 , nu )
}
mu ~ dnorm( muM1 , 1/muSD1^2 )  # prior for mu
sigma ~ dgamma( Sh1 , Ra1 )     # prior for sigma
mu ~ dnorm( muM2 , 1/muSD2^2 )  # prior for mu
sigma ~ dgamma( Sh2 , Ra2 )     # prior for sigma
nu <- nuMinusOne+1
nuMinusOne ~ dgamma( ShNu , RaNu ) # prior for nu
}
" # close quote for modelString
# Write out modelString to a text file
writeLines( modelString , con="BESTmodel.txt" )

#------------------------------------------------------------------------------
# THE DATA.
y = c( y1 , y2 ) # combine data into one vector
x = c( rep(1,length(y1)) , rep(2,length(y2)) ) # create group membership code
Ntotal = length(y)
# Specify the data and prior constants in a list, for later shipment to JAGS:
dataList = list(
y = y ,
x = x ,
Ntotal = Ntotal ,
# Below are the specifications for the prior constants. These are generic
# broad priors, but you can replace with informed values if appropriate.
muM1 = mean(y) ,
muSD1 = sd(y)*5 ,
muM2 = mean(y) ,
muSD2 = sd(y)*5 ,
Sh1 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$shape , Ra1 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$rate ,
Sh2 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$shape , Ra2 = gammaShRaFromModeSD( mode=sd(y) , sd=sd(y)*5 )$rate ,
ShNu = gammaShRaFromMeanSD( mean=29 , sd=29 )$shape , RaNu = gammaShRaFromMeanSD( mean=29 , sd=29 )$rate
)

#------------------------------------------------------------------------------
# INTIALIZE THE CHAINS.
# Initial values of MCMC chains based on data:
mu = c( mean(y1) , mean(y2) )
sigma = c( sd(y1) , sd(y2) )
# Regarding initial values in next line: (1) sigma will tend to be too big if
# the data have outliers, and (2) nu starts at 5 as a moderate value. These
# initial values keep the burn-in period moderate.
initsList = list( mu = mu , sigma = sigma , nuMinusOne = 4 )

#------------------------------------------------------------------------------
# RUN THE CHAINS

parameters = c( "mu" , "sigma" , "nu" )     # The parameters to be monitored
adaptSteps = 500               # Number of steps to "tune" the samplers
burnInSteps = 1000
nChains = 3
nIter = ceiling( ( numSavedSteps * thinSteps ) / nChains )
# Create, initialize, and adapt the model:
jagsModel = jags.model( "BESTmodel.txt" , data=dataList , inits=initsList ,
# Burn-in:
cat( "Burning in the MCMC chain...\n" )
update( jagsModel , n.iter=burnInSteps )
# The saved MCMC chain:
cat( "Sampling final MCMC chain...\n" )
codaSamples = coda.samples( jagsModel , variable.names=parameters ,
n.iter=nIter , thin=thinSteps )
# resulting codaSamples object has these indices:
#   codaSamples[[ chainIdx ]][ stepIdx , paramIdx ]

#------------------------------------------------------------------------------
# EXAMINE THE RESULTS
if ( showMCMC ) {
openGraph(width=7,height=7)
show( gelman.diag( codaSamples ) )
effectiveChainLength = effectiveSize( codaSamples )
show( effectiveChainLength )
}

# Convert coda-object codaSamples to matrix object for easier handling.
# But note that this concatenates the different chains into one long chain.
# Result is mcmcChain[ stepIdx , paramIdx ]
mcmcChain = as.matrix( codaSamples )
return( mcmcChain )

} # end function BESTmcmc

#==============================================================================

BESTsummary = function( y1 , y2 , mcmcChain ) {
source("HDIofMCMC.R")
mcmcSummary = function( paramSampleVec , compVal=NULL ) {
meanParam = mean( paramSampleVec )
medianParam = median( paramSampleVec )
dres = density( paramSampleVec )
modeParam = dres$x[which.max(dres$y)]
hdiLim = HDIofMCMC( paramSampleVec )
if ( !is.null(compVal) ) {
pcgtCompVal = ( 100 * sum( paramSampleVec > compVal )
/ length( paramSampleVec ) )
} else {
pcgtCompVal=NA
}
return( c( meanParam , medianParam , modeParam , hdiLim , pcgtCompVal ) )
}
# Define matrix for storing summary info:
summaryInfo = matrix( 0 , nrow=9 , ncol=6 , dimnames=list(
PARAMETER=c( "mu1" , "mu2" , "muDiff" , "sigma1" , "sigma2" , "sigmaDiff" ,
"nu" , "nuLog10" , "effSz" ),
SUMMARY.INFO=c( "mean" , "median" , "mode" , "HDIlow" , "HDIhigh" ,
"pcgtZero" )
) )
summaryInfo[ "mu1" , ] = mcmcSummary( mcmcChain[,"mu"] )
summaryInfo[ "mu2" , ] = mcmcSummary( mcmcChain[,"mu"] )
summaryInfo[ "muDiff" , ] = mcmcSummary( mcmcChain[,"mu"]
- mcmcChain[,"mu"] ,
compVal=0 )
summaryInfo[ "sigma1" , ] = mcmcSummary( mcmcChain[,"sigma"] )
summaryInfo[ "sigma2" , ] = mcmcSummary( mcmcChain[,"sigma"] )
summaryInfo[ "sigmaDiff" , ] = mcmcSummary( mcmcChain[,"sigma"]
- mcmcChain[,"sigma"] ,
compVal=0 )
summaryInfo[ "nu" , ] = mcmcSummary( mcmcChain[,"nu"] )
summaryInfo[ "nuLog10" , ] = mcmcSummary( log10(mcmcChain[,"nu"]) )

N1 = length(y1)
N2 = length(y2)
effSzChain = ( ( mcmcChain[,"mu"] - mcmcChain[,"mu"] )
/ sqrt( ( mcmcChain[,"sigma"]^2 + mcmcChain[,"sigma"]^2 ) / 2 ) )
summaryInfo[ "effSz" , ] = mcmcSummary( effSzChain , compVal=0 )
# Or, use sample-size weighted version:
# effSz = ( mu1 - mu2 ) / sqrt( ( sigma1^2 *(N1-1) + sigma2^2 *(N2-1) )
#                               / (N1+N2-2) )
# Be sure also to change plot label in BESTplot function, below.
return( summaryInfo )
}

#==============================================================================

BESTplot = function( y1 , y2 , mcmcChain , ROPEm=NULL , ROPEsd=NULL ,
ROPEeff=NULL , showCurve=FALSE , pairsPlot=FALSE ) {
# This function plots the posterior distribution (and data).
# Description of arguments:
# y1 and y2 are the data vectors.
# mcmcChain is a list of the type returned by function BTT.
# ROPEm is a two element vector, such as c(-1,1), specifying the limit
#   of the ROPE on the difference of means.
# ROPEsd is a two element vector, such as c(-1,1), specifying the limit
#   of the ROPE on the difference of standard deviations.
# ROPEeff is a two element vector, such as c(-1,1), specifying the limit
#   of the ROPE on the effect size.
# showCurve is TRUE or FALSE and indicates whether the posterior should
#   be displayed as a histogram (by default) or by an approximate curve.
# pairsPlot is TRUE or FALSE and indicates whether scatterplots of pairs
#   of parameters should be displayed.
mu1 = mcmcChain[,"mu"]
mu2 = mcmcChain[,"mu"]
sigma1 = mcmcChain[,"sigma"]
sigma2 = mcmcChain[,"sigma"]
nu = mcmcChain[,"nu"]
if ( pairsPlot ) {
# Plot the parameters pairwise, to see correlations:
openGraph(width=7,height=7)
nPtToPlot = 1000
plotIdx = floor(seq(1,length(mu1),by=length(mu1)/nPtToPlot))
panel.cor = function(x, y, digits=2, prefix="", cex.cor, ...) {
usr = par("usr"); on.exit(par(usr))
par(usr = c(0, 1, 0, 1))
r = (cor(x, y))
txt = format(c(r, 0.123456789), digits=digits)
txt = paste(prefix, txt, sep="")
if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
text(0.5, 0.5, txt, cex=1.25 ) # was cex=cex.cor*r
}
pairs( cbind( mu1 , mu2 , sigma1 , sigma2 , log10(nu) )[plotIdx,] ,
labels=c( expression(mu) , expression(mu) ,
expression(sigma) , expression(sigma) ,
expression(log10(nu)) ) ,
lower.panel=panel.cor , col="skyblue" )
}
source("plotPost.R")
# Set up window and layout:
openGraph(width=6.0,height=8.0)
layout( matrix( c(4,5,7,8,3,1,2,6,9,10) , nrow=5, byrow=FALSE ) )
par( mar=c(3.5,3.5,2.5,0.5) , mgp=c(2.25,0.7,0) )

# Select thinned steps in chain for plotting of posterior predictive curves:
chainLength = NROW( mcmcChain )
nCurvesToPlot = 30
stepIdxVec = seq( 1 , chainLength , floor(chainLength/nCurvesToPlot) )
xRange = range( c(y1,y2) )
xLim = c( xRange-0.1*(xRange-xRange) ,
xRange+0.1*(xRange-xRange) )
xVec = seq( xLim , xLim , length=200 )
maxY = max( dt( 0 , df=max(nu[stepIdxVec]) ) /
min(c(sigma1[stepIdxVec],sigma2[stepIdxVec])) )
# Plot data y1 and smattering of posterior predictive curves:
stepIdx = 1
plot( xVec , dt( (xVec-mu1[stepIdxVec[stepIdx]])/sigma1[stepIdxVec[stepIdx]] ,
df=nu[stepIdxVec[stepIdx]] )/sigma1[stepIdxVec[stepIdx]] ,
ylim=c(0,maxY) , cex.lab=1.75 ,
type="l" , col="skyblue" , lwd=1 , xlab="y" , ylab="p(y)" ,
main="Data Group 1 w. Post. Pred." )
for ( stepIdx in 2:length(stepIdxVec) ) {
lines(xVec, dt( (xVec-mu1[stepIdxVec[stepIdx]])/sigma1[stepIdxVec[stepIdx]] ,
df=nu[stepIdxVec[stepIdx]] )/sigma1[stepIdxVec[stepIdx]] ,
type="l" , col="skyblue" , lwd=1 )
}
histBinWd = median(sigma1)/2
histCenter = mean(mu1)
histBreaks = sort( c( seq( histCenter-histBinWd/2 , min(xVec)-histBinWd/2 ,
-histBinWd ),
seq( histCenter+histBinWd/2 , max(xVec)+histBinWd/2 ,
histBinWd ) , xLim ) )
histInfo = hist( y1 , plot=FALSE , breaks=histBreaks )
yPlotVec = histInfo$density yPlotVec[ yPlotVec==0.0 ] = NA xPlotVec = histInfo$mids
xPlotVec[ yPlotVec==0.0 ] = NA
points( xPlotVec , yPlotVec , type="h" , lwd=3 , col="red" )
text( max(xVec) , maxY , bquote(N==.(length(y1))) , adj=c(1.1,1.1) )
# Plot data y2 and smattering of posterior predictive curves:
stepIdx = 1
plot( xVec , dt( (xVec-mu2[stepIdxVec[stepIdx]])/sigma2[stepIdxVec[stepIdx]] ,
df=nu[stepIdxVec[stepIdx]] )/sigma2[stepIdxVec[stepIdx]] ,
ylim=c(0,maxY) , cex.lab=1.75 ,
type="l" , col="skyblue" , lwd=1 , xlab="y" , ylab="p(y)" ,
main="Data Group 2 w. Post. Pred." )
for ( stepIdx in 2:length(stepIdxVec) ) {
lines(xVec, dt( (xVec-mu2[stepIdxVec[stepIdx]])/sigma2[stepIdxVec[stepIdx]] ,
df=nu[stepIdxVec[stepIdx]] )/sigma2[stepIdxVec[stepIdx]] ,
type="l" , col="skyblue" , lwd=1 )
}
histBinWd = median(sigma2)/2
histCenter = mean(mu2)
histBreaks = sort( c( seq( histCenter-histBinWd/2 , min(xVec)-histBinWd/2 ,
-histBinWd ),
seq( histCenter+histBinWd/2 , max(xVec)+histBinWd/2 ,
histBinWd ) , xLim ) )
histInfo = hist( y2 , plot=FALSE , breaks=histBreaks )
yPlotVec = histInfo$density yPlotVec[ yPlotVec==0.0 ] = NA xPlotVec = histInfo$mids
xPlotVec[ yPlotVec==0.0 ] = NA
points( xPlotVec , yPlotVec , type="h" , lwd=3 , col="red" )
text( max(xVec) , maxY , bquote(N==.(length(y2))) , adj=c(1.1,1.1) )

# Plot posterior distribution of parameter nu:
histInfo = plotPost( log10(nu) , col="skyblue" , # breaks=30 ,
showCurve=showCurve ,
xlab=bquote("log10("*nu*")") , cex.lab = 1.75 , showMode=TRUE ,
main="Normality" ) #  (<0.7 suggests kurtosis)

# Plot posterior distribution of parameters mu1, mu2, and their difference:
xlim = range( c( mu1 , mu2 ) )
histInfo = plotPost( mu1 ,  xlim=xlim , cex.lab = 1.75 ,
showCurve=showCurve ,
xlab=bquote(mu) , main=paste("Group",1,"Mean") ,
col="skyblue" )
histInfo = plotPost( mu2 ,  xlim=xlim , cex.lab = 1.75 ,
showCurve=showCurve ,
xlab=bquote(mu) , main=paste("Group",2,"Mean") ,
col="skyblue" )
histInfo = plotPost( mu1-mu2 , compVal=0 ,  showCurve=showCurve ,
xlab=bquote(mu - mu) , cex.lab = 1.75 , ROPE=ROPEm ,
main="Difference of Means" , col="skyblue" )

# Plot posterior distribution of param's sigma1, sigma2, and their difference:
xlim=range( c( sigma1 , sigma2 ) )
histInfo = plotPost( sigma1 ,  xlim=xlim , cex.lab = 1.75 ,
showCurve=showCurve ,
xlab=bquote(sigma) , main=paste("Group",1,"Std. Dev.") ,
col="skyblue" , showMode=TRUE )
histInfo = plotPost( sigma2 ,  xlim=xlim , cex.lab = 1.75 ,
showCurve=showCurve ,
xlab=bquote(sigma) , main=paste("Group",2,"Std. Dev.") ,
col="skyblue" , showMode=TRUE )
histInfo = plotPost( sigma1-sigma2 ,
compVal=0 ,  showCurve=showCurve ,
xlab=bquote(sigma - sigma) , cex.lab = 1.75 ,
ROPE=ROPEsd ,
main="Difference of Std. Dev.s" , col="skyblue" , showMode=TRUE )

# Plot of estimated effect size. Effect size is d-sub-a from
# Macmillan & Creelman, 1991; Simpson & Fitter, 1973; Swets, 1986a, 1986b.
effectSize = ( mu1 - mu2 ) / sqrt( ( sigma1^2 + sigma2^2 ) / 2 )
histInfo = plotPost( effectSize , compVal=0 ,  ROPE=ROPEeff ,
showCurve=showCurve ,
xlab=bquote( (mu-mu)
/sqrt((sigma^2 +sigma^2 )/2 ) ),
showMode=TRUE , cex.lab=1.0 , main="Effect Size" , col="skyblue" )
return( BESTsummary( y1 , y2 , mcmcChain ) )} # end of function BESTplot

#==============================================================================

Script to run after above file is sourced:

###  Make sure that the following programs are all
###    in the same folder as this file:
###      BESTgamma.R
###      plotPost.R
###      HDIofMCMC.R
###      HDIofICDF.R
###      openGraphSaveGraph.R
### Make sure that R's working directory is the folder in which those
###    files reside. In RStudio, use menu tabs Tools -> Set Working Directory.
###    If working in R, use menu tabs File -> Change Dir.

# Get the functions loaded into R's working memory:
source("BESTgamma.R")

# Specify data as vectors:
# (Replace with your own data as needed. R can read many formats of data files,
# see the commands "scan" or "read.table" and its variants.)
y1 = c(101,100,102,104,102,97,105,105,98,101,100,123,105,103,100,95,102,106,
109,102,82,102,100,102,102,101,102,102,103,103,97,97,103,101,97,104,
96,103,124,101,101,100,101,101,104,100,101)
y2 = c(99,101,100,101,102,100,97,101,104,101,102,102,100,105,88,101,100,
104,100,100,100,101,102,103,97,101,101,100,101,99,101,100,100,
101,100,99,101,100,102,99,100,99)

# Run the Bayesian analysis:
mcmcChain = BESTmcmc( y1 , y2 , numSavedSteps=10000 )

# Plot the results of the Bayesian analysis:
postInfo = BESTplot( y1 , y2 , mcmcChain , ROPEeff=c(-0.1,0.1) )
# Show detailed summary info on console:
show( postInfo )

#-------------------------------------------------------------------------------

1. Hi,

I am very sorry to be directly asking for help on your blog ... It would be wonderful if you could help.

It's the first time I'm doing MCMC (the first time using R, even ;-)) and I have problems getting it to work.

I want to do a comparison of two groups, so I am using BEST. My likelihood function should be gamma (I'm generating the data, for now), and if I'm correct I should also specify a gamma prior on the mean for the posterior to be gamma (the prior on the standard deviation I am leaving uniform).

Even though the syntax is working now, the results - even for prior_only - look very wrong. And I have difficulties adapting the code for plotting (which comes with BEST) for a gamma prior/posterior.

It would be great if you could have a look at the code below (basically based on BEST but adapted) and tell if it's basically correct - I think it can't really be as the prior mu's are zero. Also I'd like to ask if you have code for plotting the results for MCMC with gamma - but of course this only really matters if I can get it basically correct :-)

2. This is how I generate the test data:

*************************************************************

y1_mean <- 14.3
y1_std <- 18.6
y1_n <- 44105
y1_alpha <- y1_mean^2 /y1_std^2
y1_beta <- y1_mean / y1_alpha
y1_gamma <- rgamma (n = y1_n, shape = y1_alpha, scale = y1_beta)

y2_mean <- 16.3
y2_std <- 39.1
y2_n <- 37641
y2_alpha <- y2_mean^2 /y2_std^2
y2_beta <- y2_mean / y2_alpha
y2_gamma <- rgamma (n = y2_n, shape = y2_alpha, scale = y2_beta)

*************************************************************

This is the adapted part from BEST.R:

*************************************************************

BESTmcmc = function( y1 , y2 ,
priorOnly=FALSE , showMCMC=FALSE ,
numSavedSteps=20000 , thinSteps=1 ,
mu1PriorMean = mean(c(y1,y2)) ,
mu1PriorSD = sd(c(y1,y2))*5 ,
mu2PriorMean = mean(c(y1,y2)) ,
mu2PriorSD = sd(c(y1,y2))*5 ,
runjagsMethod=runjagsMethodDefault ,
nChains=nChainsDefault ) {

y = c( y1 , y2 ) # combine data into one vector
x = c( rep(1,length(y1)) , rep(2,length(y2)) ) # create group membership code
Ntotal = length(y)
# Specify the data and prior constants in a list, for later shipment to JAGS:

if ( priorOnly ) {
dataList = list(
# y = y ,
x = x ,
Ntotal = Ntotal ,
priorSha1 = mu1PriorMean^2 / mu1PriorSD^2,
priorRa1 = mu1PriorMean / mu1PriorSD^2 ,
priorSha2 = mu2PriorMean^2 / mu2PriorSD^2,
priorRa2 = mu2PriorMean / mu2PriorSD^2
)
} else {
dataList = list(
y = y ,
x = x ,
Ntotal = Ntotal ,
priorSha1 = mu1PriorMean^2 / mu1PriorSD^2,
priorRa1 = mu1PriorMean / mu1PriorSD^2 ,
priorSha2 = mu2PriorMean^2 / mu2PriorSD^2,
priorRa2 = mu2PriorMean / mu2PriorSD^2
)

}

#----------------------------------------------------------------------------
# THE MODEL.
modelString = "
model {
for ( i in 1:Ntotal ) {
y[i] ~ dgamma( sha[x[i]] , ra[x[i]] )
}
sha <- pow(mu,2) / pow(sd,2)
sha <- pow(mu,2) / pow(sd,2)
ra <- mu / pow(sd,2)
ra <- mu / pow(sd,2)
mu ~ dgamma(priorSha1, priorRa1)
mu ~ dgamma(priorSha2, priorRa2)
sd ~ dunif(0,100)
sd ~ dunif(0,100)
}
" # close quote for modelString
# Write out modelString to a text file
writeLines( modelString , con="BESTmodel.txt" )

#------------------------------------------------------------------------------
# INTIALIZE THE CHAINS.
mu = c( mean(y1) , mean(y2) )
sd = c( sd(y1) , sd(y2) )
initsList = list( mu = mu, sd = sd)

#------------------------------------------------------------------------------
# RUN THE CHAINS

parameters = c( "mu" , "sd") # The parameters to be monitored

*************************************************************

I'd be very grateful for any help.

best
Sigrid

3. Have you seen the following post:
http://doingbayesiandataanalysis.blogspot.com/2012/08/gamma-likelihood-parameterized-by-mode.html
It might help. In particular, try a uniform prior on the gamma mode (or gamma mean).

I've only glanced at your code and it seems basically okay, but I haven't studied it closely. One suggestion: For debugging, try data N around 100, not 40000! It'll take a long time to churn through 2*40000 data points.

4. Thank you very much! Yes, I've been using that post for reference too :-)
But because of what I'd like to show, I'd prefer an informed prior distribution, not uniform - actually, best would be - like you use in BEST - one common prior for the mean for both groups.

But if you say it looks basically ok, i will try debugging!

For plotting the results, I should probably refer to that blog post and try to adapt it to two groups I guess ...

Thanks again!
Sigrid

5. Hi John—
I'm curious about the default prior on the groups' means you use in BEST (where the sd is set to the tau precision of the joint data: (5*sd)^-2). It works perfectly with the JAGS algorithm but seems to be far too narrow for others, like PyMC3's NUTS (cf. https://gist.github.com/daeh/d409d6dc5f7fd95c48b9269ed4d012d7). Do you have any idea why that's the case or suggestions about how I should think about parameterizing the priors for problems like this in a general way?

Thanks!!

1. (Think about in a general way as in given that it seems to be a function of the inference algorithm. This blog post was an excellent description of how to think about the priors given that they interact with JAGS like they do here. It just seems weird to need to play with the prior until I find one that works on a per algorithm basis.)

2. A model should give the same results no matter what software it's programmed in. One thing that comes to mind: Does PyMC3 parameterize the t distribution (and normal distribution) in terms of precision like JAGS does? Or does PyMC3 use standard deviation, like Stan does? Be careful to code the model appropriately...

3. that was exactly the issue. thanks a ton!