Nils Winter has created a Matlab interface to BEST (Bayesian estimation for two groups of metric-scale data). You can read about it at his GitHub README.

# Doing Bayesian Data Analysis

## Thursday, April 28, 2016

## Tuesday, April 19, 2016

### Bayesian estimation of log-normal parameters - Update

Using the log-normal density can be
confusing because it's parameterized in terms of the mean and precision
of the log-scale data, not the original-scale data. Thus, if your data,
y, are nicely described by a log-normal distribution, the estimated mean
and precision are for

This post updates the R+JAGS code from a previous post. Please see that previous post for more background. The previous post had code appropriate for the 1st edition of the book; this post updates a few details for use with the 2nd edition of the book. In particular, the new code shows new MCMC diagnostic plots and uses a more appropriate prior on meanOfLogY. Here are some of its output graphs; R code follows below.

Here is the full R script:

**(y), not y. To get the corresponding parameters for the original-scale data, you have to transform the estimated parameters. And the prior on the mean and precision have to be appropriate for log(y), as opposed to y. This post shows an example of a JAGS program (with rjags) for estimating the parameters of log-normal distributed data.***log*This post updates the R+JAGS code from a previous post. Please see that previous post for more background. The previous post had code appropriate for the 1st edition of the book; this post updates a few details for use with the 2nd edition of the book. In particular, the new code shows new MCMC diagnostic plots and uses a more appropriate prior on meanOfLogY. Here are some of its output graphs; R code follows below.

Here is the full R script:

**#------------------------------------------------------------------------------**

# Jags-Ymet-Xnom1grp-MlogNormal-Script.R

# April 19, 2016. John K. Kruschke.

# Requires auxiliary R scripts that accompany the book,

# Kruschke, J. K. (2015). Doing Bayesian Data Analysis, Second Edition:

# A Tutorial with R, JAGS, and Stan. Academic Press / Elsevier.

# graphics.off()

# rm(list=ls(all=TRUE))

source("DBDA2E-utilities.R")

fileNameRoot="Jags-Ymet-Xnom1grp-MlogNormal-Script"

graphFileType="png"

#------------------------------------------------------------------------------

# THE DATA.

# Generate some random data from known parameter values:

set.seed(47405)

trueLogM = 5.0 # true mean of log(y)

trueLogSD = 0.5 # true sd of log(y)

y = rnorm( n=125 ) # normal distribution of log-scale values

LogY = (y-mean(y))/sd(y)*trueLogSD + trueLogM

y = exp(LogY) # y is exponentiated values of log(y)

# OR JUST PUT IN YOUR ACTUAL DATA HERE:

# y = ...

# Package the data for shipping to JAGS:

dataList = list(

y = y ,

N = length(y) ,

meanOfLogY = mean(log(y)) ,

sdOfLogY = sd(log(y))

)

#------------------------------------------------------------------------------

# THE MODEL.

modelstring = "

model {

for( i in 1 : N ) {

y[i] ~ dlnorm( muOfLogY , 1/sigmaOfLogY^2 )

}

sigmaOfLogY ~ dunif( 0.001*sdOfLogY , 1000*sdOfLogY )

muOfLogY ~ dunif( 0.001*meanOfLogY , 1000*meanOfLogY )

muOfY <- exp(muOfLogY+sigmaOfLogY^2/2)

modeOfY <- exp(muOfLogY-sigmaOfLogY^2)

sigmaOfY <- sqrt(exp(2*muOfLogY+sigmaOfLogY^2)*(exp(sigmaOfLogY^2)-1))

}

" # close quote for modelstring

writeLines(modelstring,con="model.txt")

#------------------------------------------------------------------------------

# INTIALIZE THE CHAINS.

# Let JAGS do it

#------------------------------------------------------------------------------

# RUN THE CHAINS

require(rjags)

parameters = c("muOfLogY" , "sigmaOfLogY" , "muOfY" , "modeOfY" , "sigmaOfY" )

adaptSteps = 1000 # Number of steps to "tune" the samplers.

burnInSteps = 1000 # Number of steps to "burn-in" the samplers.

nChains = 3 # Number of chains to run.

numSavedSteps=20000 # Total number of steps in chains to save.

thinSteps=1 # Number of steps to "thin" (1=keep every step).

nPerChain = ceiling( ( numSavedSteps * thinSteps ) / nChains ) # Steps per chain.

# Create, initialize, and adapt the model:

jagsModel = jags.model( "model.txt" , data=dataList ,

n.chains=nChains , n.adapt=adaptSteps )

# Burn-in:

cat( "Burning in the MCMC chain...\n" )

update( jagsModel , n.iter=burnInSteps )

# The saved MCMC chain:

cat( "Sampling final MCMC chain...\n" )

mcmcCoda = coda.samples( jagsModel , variable.names=parameters ,

n.iter=nPerChain , thin=thinSteps )

#------------------------------------------------------------------------------

# EXAMINE THE RESULTS

# Display diagnostics of chain, for specified parameters:

parameterNames = varnames(mcmcCoda) # get all parameter names

for ( parName in parameterNames ) {

diagMCMC( codaObject=mcmcCoda , parName=parName ,

saveName=fileNameRoot , saveType=graphFileType )

}

# Convert coda-object codaSamples to matrix object for easier handling.

mcmcChain = as.matrix( mcmcCoda )

chainLength = NROW(mcmcChain)

openGraph(width=10,height=6)

layout(matrix(1:6,nrow=2,byrow=TRUE))

# posterior predictive

hist( dataList$y , xlab="y" , main="Data w. Post. Pred." , breaks=30 ,

col="pink" , border="white" , prob=TRUE , cex.lab=1.5)

pltIdx = floor(seq(1,chainLength,length=20))

xComb = seq( min(dataList$y) , max(dataList$y) , length=501 )

for ( chnIdx in pltIdx ) {

lines( xComb ,

dlnorm( xComb, mcmcChain[chnIdx,"muOfLogY"], mcmcChain[chnIdx,"sigmaOfLogY"] ),

col="skyblue" )

}

# param's of log(y)

postInfo = plotPost( mcmcChain[,"muOfLogY"] , xlab="mu of log(y)" )

postInfo = plotPost( mcmcChain[,"sigmaOfLogY"] , xlab="sigma of log(y)" )

# param's of y

postInfo = plotPost( mcmcChain[,"modeOfY"] , xlab="mode of y" )

postInfo = plotPost( mcmcChain[,"muOfY"] , xlab="mu of y" )

postInfo = plotPost( mcmcChain[,"sigmaOfY"] , xlab="sigma of y" , cenTend="mode")

saveGraph(file=fileNameRoot,type=graphFileType)

# Jags-Ymet-Xnom1grp-MlogNormal-Script.R

# April 19, 2016. John K. Kruschke.

# Requires auxiliary R scripts that accompany the book,

# Kruschke, J. K. (2015). Doing Bayesian Data Analysis, Second Edition:

# A Tutorial with R, JAGS, and Stan. Academic Press / Elsevier.

# graphics.off()

# rm(list=ls(all=TRUE))

source("DBDA2E-utilities.R")

fileNameRoot="Jags-Ymet-Xnom1grp-MlogNormal-Script"

graphFileType="png"

#------------------------------------------------------------------------------

# THE DATA.

# Generate some random data from known parameter values:

set.seed(47405)

trueLogM = 5.0 # true mean of log(y)

trueLogSD = 0.5 # true sd of log(y)

y = rnorm( n=125 ) # normal distribution of log-scale values

LogY = (y-mean(y))/sd(y)*trueLogSD + trueLogM

y = exp(LogY) # y is exponentiated values of log(y)

# OR JUST PUT IN YOUR ACTUAL DATA HERE:

# y = ...

# Package the data for shipping to JAGS:

dataList = list(

y = y ,

N = length(y) ,

meanOfLogY = mean(log(y)) ,

sdOfLogY = sd(log(y))

)

#------------------------------------------------------------------------------

# THE MODEL.

modelstring = "

model {

for( i in 1 : N ) {

y[i] ~ dlnorm( muOfLogY , 1/sigmaOfLogY^2 )

}

sigmaOfLogY ~ dunif( 0.001*sdOfLogY , 1000*sdOfLogY )

muOfLogY ~ dunif( 0.001*meanOfLogY , 1000*meanOfLogY )

muOfY <- exp(muOfLogY+sigmaOfLogY^2/2)

modeOfY <- exp(muOfLogY-sigmaOfLogY^2)

sigmaOfY <- sqrt(exp(2*muOfLogY+sigmaOfLogY^2)*(exp(sigmaOfLogY^2)-1))

}

" # close quote for modelstring

writeLines(modelstring,con="model.txt")

#------------------------------------------------------------------------------

# INTIALIZE THE CHAINS.

# Let JAGS do it

#------------------------------------------------------------------------------

# RUN THE CHAINS

require(rjags)

parameters = c("muOfLogY" , "sigmaOfLogY" , "muOfY" , "modeOfY" , "sigmaOfY" )

adaptSteps = 1000 # Number of steps to "tune" the samplers.

burnInSteps = 1000 # Number of steps to "burn-in" the samplers.

nChains = 3 # Number of chains to run.

numSavedSteps=20000 # Total number of steps in chains to save.

thinSteps=1 # Number of steps to "thin" (1=keep every step).

nPerChain = ceiling( ( numSavedSteps * thinSteps ) / nChains ) # Steps per chain.

# Create, initialize, and adapt the model:

jagsModel = jags.model( "model.txt" , data=dataList ,

n.chains=nChains , n.adapt=adaptSteps )

# Burn-in:

cat( "Burning in the MCMC chain...\n" )

update( jagsModel , n.iter=burnInSteps )

# The saved MCMC chain:

cat( "Sampling final MCMC chain...\n" )

mcmcCoda = coda.samples( jagsModel , variable.names=parameters ,

n.iter=nPerChain , thin=thinSteps )

#------------------------------------------------------------------------------

# EXAMINE THE RESULTS

# Display diagnostics of chain, for specified parameters:

parameterNames = varnames(mcmcCoda) # get all parameter names

for ( parName in parameterNames ) {

diagMCMC( codaObject=mcmcCoda , parName=parName ,

saveName=fileNameRoot , saveType=graphFileType )

}

# Convert coda-object codaSamples to matrix object for easier handling.

mcmcChain = as.matrix( mcmcCoda )

chainLength = NROW(mcmcChain)

openGraph(width=10,height=6)

layout(matrix(1:6,nrow=2,byrow=TRUE))

# posterior predictive

hist( dataList$y , xlab="y" , main="Data w. Post. Pred." , breaks=30 ,

col="pink" , border="white" , prob=TRUE , cex.lab=1.5)

pltIdx = floor(seq(1,chainLength,length=20))

xComb = seq( min(dataList$y) , max(dataList$y) , length=501 )

for ( chnIdx in pltIdx ) {

lines( xComb ,

dlnorm( xComb, mcmcChain[chnIdx,"muOfLogY"], mcmcChain[chnIdx,"sigmaOfLogY"] ),

col="skyblue" )

}

# param's of log(y)

postInfo = plotPost( mcmcChain[,"muOfLogY"] , xlab="mu of log(y)" )

postInfo = plotPost( mcmcChain[,"sigmaOfLogY"] , xlab="sigma of log(y)" )

# param's of y

postInfo = plotPost( mcmcChain[,"modeOfY"] , xlab="mode of y" )

postInfo = plotPost( mcmcChain[,"muOfY"] , xlab="mu of y" )

postInfo = plotPost( mcmcChain[,"sigmaOfY"] , xlab="sigma of y" , cenTend="mode")

saveGraph(file=fileNameRoot,type=graphFileType)

**#-------------------------------------------------------------------------------**## Friday, January 29, 2016

### The remains of R. A. Fisher are visited by Bayesians again

Thanks to Phillip Alday, the second edition of Doing Bayesian Data Analysis has visited the remains of R. A. Fisher at St. Peter's Cathedral in Adelaide. Here is a photo that Phillip snapped:

In a previous post, the first edition visited Fisher's remains. The book has also visited Bayes tomb, Laplace's memorial, and Jacob Bernoulli's grave. The goal here is to have some fun and pique some historical interest, not to show any (serious) disrespect. If you pose the book at sites of other relevant people, dead or not-quite-yet-dead, let me know!

In a previous post, the first edition visited Fisher's remains. The book has also visited Bayes tomb, Laplace's memorial, and Jacob Bernoulli's grave. The goal here is to have some fun and pique some historical interest, not to show any (serious) disrespect. If you pose the book at sites of other relevant people, dead or not-quite-yet-dead, let me know!

## Thursday, December 31, 2015

### (A pointer to) bayes.js: A Small Library for Doing MCMC in the Browser

Screen shot from bayes.js blog post. |

bayes.js: A small JavaScript library that implements an adaptive MCMC sampler and a couple of probability distributions, and that makes it relatively easy to implement simple Bayesian models in JavaScript.

See the post at http://www.sumsar.net/blog/2015/12/bayes-js-a-small-library-for-doing-mcmc-in-the-browser/

## Sunday, December 27, 2015

### Lessons from Bayesian disease diagnosis: Don't over-interpret the Bayes factor, VERSION 2

[This is a revised version of a previous post which is superseded by this post.

This revision has corrected derivations, new R/JAGS code, and new diagrams.]

**Overview**

"Captain, the prior probability of this character dying and leaving the show is infinitesimal." |

*not*appropriate because it ignores the prior probability of the disease, which presumably is very rare. Suppose the prior probability of having the disease is 1%. Then Bayes' rule implies that the posterior probability of having the disease is only about 16%, even after testing positive! This type of example is presented over and over again in introductory expositions (e.g., pp. 103-105 of DBDA2E), emphatically saying not to use only the detection and false alarm rates, and always to incorporate the prior probabilities of the conditions.

The ratio of the detection and false-alarm rates is the

*Bayes factor*, hence the lesson of the primary example is: Do not use the Bayes factor alone to make decisions; instead always incorporate the prior probabilities of the options. Unfortunately, this lesson is often forgotten or at least de-emphasized in general model-selection settings. In model-selection settings, some people compute the Bayes factor and use it alone as the basis for model selection. The tacit assumption is that the models all have equal prior probabilities. But this assumption of equal prior probabilities seems implausible in most situations, and should be explicitly justified. If we should not rely on the Bayes factor alone to make decisions in disease diagnosis or drug testing, why should we rely on the Bayes factor alone to make decisions in other model-selection applications?

In this post

- I explain the role of the Bayes factor in the simple disease-diagnosis situation. (This also applies to random drug screening.)
- Then I extend to the situation in which the prior probabilities are not certain. I show that in the limit, with
*utter*uncertainty in the prior probabilities, the posterior odds fade to match the Bayes factor. But even with some modest knowledge of the base rates, the posterior odds can be far from the Bayes factor. - Hierarchical diagrams are provided to illustrate the model comparisons.
- JAGS code is provided for hands-on experience.

**Bayes' rule applied to disease diagnosis**

A person is randomly selected from a population in which the prior probability of having a particular disease is 1%. Denote the condition of the person by the variable \(M\), which can have two values, either \(M\!\!=\!\!1\) for having the disease or \(M\!\!=\!\!2\) for not having the disease. The prior probability can be written as \(p(M\!\!=\!\!1) = 0.01\), and hence \(p(M\!\!=\!\!2) = 0.99\).

A diagnostic test for the disease can show a "positive" result suggesting that the disease is present, or a "negative" result suggesting that the disease is absent. The result of the test is the datum, denoted \(D\), which can have two values: \(D\!\!=\!\!1\) for "positive" or \(D\!\!=\!\!0\) for "negative". The correct-detection rate (a.k.a. sensitivity) of the test is 97%, meaning \(p( D\!\!=\!\!1 | M\!\!=\!\!1 ) = 0.97\). The false-alarm rate of the test is 5%, meaning \(p( D\!\!=\!\!1 | M\!\!=\!\!2 ) = 0.05\). The complement of the false alarm rate is called the specificity of the test, which in this example is 95%.

Applying Bayes' rule yields: \[ p( M\!\!=\!\!1| D\!\!=\!\!1 ) = p( D\!\!=\!\!1 | M\!\!=\!\!1 ) p( M\!\!=\!\!1 ) / p( D\!\!=\!\!1 ) \] and \[ p( M\!\!=\!\!2| D\!\!=\!\!1 ) = p( D\!\!=\!\!1 | M\!\!=\!\!2 ) p( M\!\!=\!\!2 ) / p( D\!\!=\!\!1 ) \] In the above two equations, the denominator \( p( D\!\!=\!\!1 ) \) is the same, computed as \( p( D\!\!=\!\!1 ) = \sum_m p( D\!\!=\!\!1 | M\!\!=\!\!m ) p( M\!\!=\!\!m ) \). The ratio of the two equations is \begin{align} \underbrace{\frac{p( M\!\!=\!\!1 | D\!\!=\!\!1 )}{p( M\!\!=\!\!2 | D\!\!=\!\!1 ) } }_{\mbox{posterior odds}} &= \underbrace{\frac{p( D\!\!=\!\!1 | M\!\!=\!\!1 )}{p( D\!\!=\!\!1 | M\!\!=\!\!2 )} }_{\mbox{Bayes factor}} \times \underbrace{\frac{p( M\!\!=\!\!1 ) }{ p( M\!\!=\!\!2 )}}_{\mbox{prior odds}} \label{eq:BF}\end{align} Notice from Equation \ref{eq:BF} that the Bayes factor is the ratio of the detection rate to the false alarm rate (when the datum is \(D\!\!=\!\!1\)). The Bayes factor converts the prior odds (i.e., the ratio of prior probabilities) to the posterior odds (i.e., the ratio of posterior probabilities).

When we plug in the numerical values of the example, we get \[ \underbrace{\frac{0.164}{0.836}}_{\mbox{posterior odds}} = \underbrace{\frac{0.97}{0.05}}_{\mbox{Bayes factor}} \times \underbrace{\frac{0.01}{0.99}}_{\mbox{prior odds}} \] or \[ \underbrace{0.196}_{\mbox{posterior odds}} = \underbrace{19.4}_{\mbox{Bayes factor}} \times \underbrace{0.0102}_{\mbox{prior odds}}\] (the displayed values are rounded, whereas the underlying calculations retained many decimal places). Thus, even though the Bayes factor is 19.4 in favor of having the disease, the posterior probability of having the disease is only 0.164 because the prior probability was 0.01.

The real-life lesson of these sorts of examples is that when doing random disease screening or random drug testing, it is not appropriate to use the Bayes factor alone to interpret the result of the diagnostic test. The prior probabilities of the conditions (disease or healthy, using drug or not) must be factored in, and the decision should be based on the posterior probability.

**Disease diagnosis**

*is*model comparisonThe case of disease diagnosis is just a case of model comparison. In model comparison, we have some data and we infer the posterior probabilities of the models given the data. For disease diagnosis, the data consist of a dichotomous value, namely \(D\!\!=\!\!1\) or \(D\!\!=\!\!0\), and the two models are merely two possible underlying states of the world, namely \(M\!\!=\!\!1\) or \(M\!\!=\!\!2\). Disease diagnosis is simply asking which model, disease or healthy, is a better model of the outcome of the diagnostic test.

The moral of the disease-diagnosis example ---that the Bayes factor alone is not appropriate for selecting the model--- applies to model comparison generally. Model comparison should be based on the posterior probabilities of the models.

**Decision rule in model comparison sometimes uses only the Bayes factor, but should not**

One practice in model comparison is to base decisions on the Bayes factor, forgetting about or at least de-emphasizing the prior probabilities (and hence the posterior probabilities) of the models. A typical decision rule is,

Applying the Bayes-factor-only decision rule to disease diagnosis would lead us to decide that a person has the disease whenever the diagnostic test is positive because the Bayes factor is 19.4 which is much greater than 3. Using only the Bayes factor for decision making, instead of the posterior probabilities of the models, ignores Bayes' rule and is not Bayesian despite the terminology. A Bayes factor is only Bayesian in the full context of Bayes' rule, which converts prior probabilities to posterior probabilities. (Note that I am not concerned with the value 3 in the decision rule. The decision threshold could be 3 or 10 or whatever. Rather, the problem is not taking into account the prior probabilities of the models.)If the Bayes factor is >3 in favor of one model relative to the other, accept the better model and reject the other.

The usual justification for using the Bayes factor alone is two fold. First, if the prior probabilities of the models are all equal, then the posterior odds algebraically equal the Bayes factor. (You can see that for yourself from Equation \ref{eq:BF}, above.) Setting the prior probabilities to be equal

*seems*to be an innocuous default for many model-comparison settings, even if it's an absurd assumption for random screening of diseases or drug usage. Second, even if unequal prior probabilities for the models are assumed, the Bayes factor tells us how to modify those prior odds to arrive at the corresponding posterior odds. Therefore it is informative to report the Bayes factor.

The two justifications must be used carefully. First, I doubt that most model-comparison settings really are well represented by assuming equal prior probabilities of the models.

**Setting the prior probabilities equal to each other is an assertion of equality, not an expression of uncertainty; the next section discusses uncertainty. Setting the prior probabilities to 0.50/0.50 should be based on as much prior information as setting the prior probabilities to 0.01/0.99.**Second, even if the Bayes factor is reported to summarize the analysis, the Bayes factor should not be the basis for decision making. The decision should be based on the posterior probabilities of the models.

**Modeling uncertainty in the prior probabilities of the models**

The usual formal set-up, as in the equations above, assumes specific exact values for the prior probabilities of the models. That is to say, \(p(M\!\!=\!\!m)\) has a specific exact value such as \(p(M\!\!=\!\!1) = 0.01\) for the "has disease" model. But what if we are uncertain about the prior probabilities of the models? Modeling uncertainty is exactly what Bayesian analysis is all about.

In a previous version of this post, I made a mathematical derivation that seemed reasonable but which did not match my intuition for how the answer should play out. I asked readers to let me know if they could point out errors in the derivation. In a series of emails, Tomasz Miąsko (tomasz.miasko@gmail.com) patiently helped me to reconceptualize the derivation, and he also provided me with a simple JAGS script to verify the result. Thank you, Tomasz! I finally have clarity and here I present an elaborated derivation and JAGS script.

Denote \(p(M\!\!=\!\!1)\) by the parameter \(\pi_1\), which can take on any value from 0 to 1. Of course, \(p(M\!\!=\!\!2) = 1-\pi_1\). We express our uncertainty in \(\pi_1\) as a probability distribution \(p(\pi_1)\). For example, \(p(\pi_1)\) could be a beta distribution that has mode \(\omega\) and concentration \(\kappa\). (Hence the shape parameters for the beta distribution are \(a=\omega (\kappa-2) + 1\) and \(b= (1-\omega) (\kappa-2) + 1\).) If we are extremely certain about the prior probability of the model, then \(\kappa\) has a very large value, meaning that the prior distribution on \(\pi_1\) is tightly concentrated at \(\omega\), and the analysis converges to the point value assumed in the equations above. If we are uncertain about the prior probability of the model, then \(\kappa\) is a small-ish value, with \(\kappa \ge 2\). When \(\kappa=2\) the prior is a uniform distribution and we have "utter" uncertainty in the value of \(\pi_2\). Figure 2 shows a diagram of the model.

\begin{align}

p( M , \pi_1 | D )

&= p( D | M , \pi_1 ) p( M , \pi_1 ) / p^*(D)

\notag

\\

&= p( D | M ) p( M | \pi_1 ) p( \pi_1 ) / p^*(D)

\label{eq:pMpi1gD}

\end{align}

where, and this is crucial, \(p^*(D)\) is the double integral over the two-parameter space:

\begin{align}

p^*(D)

&=

\sum_m \int_{\pi_1=0}^{\pi_1=1} d\pi_1 p( D | M ) p( M | \pi_1 ) p( \pi_1 )

\label{eq:pD2param}

\end{align}

Notice that \(p^*(D)\) is a constant, not a function of \(\pi_1\).

To compute the posterior probability of \(M\) given data \(D\), we marginalize the two-parameter posterior of Equation \ref{eq:pMpi1gD}:

\begin{align}

p(M|D)

&=

\int_{\pi_1=0}^{\pi_1=1} d\pi_1 p( M , \pi_1 | D )

\notag

\\

&=

\int_{\pi_1=0}^{\pi_1=1} d\pi_1 p( D | M ) p( M | \pi_1 ) p( \pi_1 ) / p^*(D)

\notag

\\

&=

\frac{ p( D | M )}{ p^*(D) } \int_{\pi_1=0}^{\pi_1=1} d\pi_1 p( M | \pi_1 ) p( \pi_1 )

\label{eq:pMgD2param}

\end{align}

Notice in Equation \ref{eq:pMgD2param} (above) that \(p^*(D)\) was pulled in front of the integral sign, which is valid because \(p^*(D)\) is a constant. For the two models, Equation \ref{eq:pMgD2param} becomes

\begin{align}

p(M=1|D)

&=

\frac{ p( D | M=1 )}{ p^*(D) }

\int_{\pi_1=0}^{\pi_1=1} d\pi_1 \pi_1 p( \pi_1 )

\notag

\\

&\mbox{and}

\notag

\\

p(M=2|D)

&=

\frac{ p( D | M=2 )}{ p^*(D) }

\int_{\pi_1=0}^{\pi_1=1} d\pi_1 (1-\pi_1) p( \pi_1 )

\notag

\end{align}

The posterior odds are therefore

\begin{align}

\frac{ p(M=1|D) }{ p(M=2|D) }

&=

\frac{ p( D | M=1 ) }{ p( D | M=2 ) }

\times

\frac{ \int_{\pi_1=0}^{\pi_1=1} d\pi_1 \pi_1 p( \pi_1 ) }{ \int_{\pi_1=0}^{\pi_1=1} d\pi_1 (1-\pi_1) p( \pi_1 )}

\label{eq:postOdds2param}

\end{align}

In particular, when the distribution \(p( \pi_1 )\) is uniform with \(p( \pi_1 )=1.0\), the prior odds in Equation \ref{eq:postOdds2param} are 1/1. Thus, when there is complete uncertainty on \(\pi_1\) as expressed by a uniform distribution, then the posterior odds match the Bayes factor.

[What was wrong with the derivation of the previous post superseded by this post? The previous derivation began with the wrong conditional, but I didn't realize it because the notation suppressed the conditioned variable and hid the problem. The previous derivation started with \(p(M=1|D)\) without explicitly noting that it was conditional on \(\pi_1\). Therefore it was easy to think, in fact difficult not to think, that the derivation began with \(p(M=1,\pi_1|D)\) when in fact it began with \(p(M=1|D,\pi_1)\). Thus, when integrating over \(\pi_1\), the result is \(\int d\pi_1 \, p(M=1|D,\pi_1) \) and

*not*the desired \(\int d\pi_1 \, p(M=1,\pi_1|D) \). Thus, the derivation of the previous post derived some strange quantity and not the posterior probability of \(M\).]

**Numerical results from R and JAGS**

Here are some results from the R script for JAGS appended below. In all cases, the mode of the prior on \(\pi_1\) is \(\omega=0.01\).

When the prior on \(\pi_1\) is extremely concentrated around \(\omega=0.01\), with \(\kappa=20000\), a run of JAGS produces

P(M=1|D)=0.165; P(M=2|D)=0.835; P(M=1|D)/P(M=2|D)=0.198

This result essentially matches the numerical result in the opening paragraphs of this post, which assumed a singular value for \(\pi_1\) at 0.01.When the prior on \(\pi_1\) is only modestly concentrated around \(\omega=0.01\), with \(\kappa=20\), a run of JAGS produces

P(M=1|D)=0.547; P(M=2|D)=0.453; P(M=1|D)/P(M=2|D)=1.21

Notice that the posterior odds are now slightly in favor of having the disease, but are far from the Bayes factor of 19.4.When the prior on \(\pi_1\) is weakly concentrated around \(\omega=0.01\), with \(\kappa=10\), a run of JAGS produces

P(M=1|D)=0.701; P(M=2|D)=0.299; P(M=1|D)/P(M=2|D)=2.34

Notice that the posterior odds are leaning in favor of having the disease, but have still not exceeded a ratio of 3, and are still far from the Bayes factor of 19.4.When the prior on \(\pi_1\) is completely uncertain around \(\omega=0.01\), that is, a uniform distribution with \(\kappa=2\), a run of JAGS produces

P(M=1|D)=0.951; P(M=2|D)=0.0493; P(M=1|D)/P(M=2|D)=19.3

Here, with utter uncertainty in the prior probabilities of the models, the posterior odds match the Bayes
factor, as was commented just after Equation \ref{eq:postOdds2param}.**Recap:**The posterior odds of the models are not the same thing as the Bayes factor of the models. Decisions should be based on the posterior probabilities. The Bayes factor does not take into account the prior probabilities of the models (and despite its name the Bayes factor by itself does not even use Bayes' rule). The derivations and numerical examples above have shown that even with a lot of uncertainty in the prior probabilities on models, the posterior odds can be quite different than the Bayes factor.

*known to have a mode at*, which I suspect rarely occurs in reality.

**exactly**0.5**The R and JAGS script:**

# Original script by Tomasz Miasko 24-Dec-2015 tomasz.miasko@gmail.com

# Elaborated by John Kruschke 26-Dec-2015 johnkruschke@gmail.com

rm(list=ls())

library(rjags)

# JAGS model for disease diagnosis example, 27-December-2015,

# http://doingbayesiandataanalysis.blogspot.com/

# Datum D is outcome of diagnostic test(s) for a single individual.

# D=1 suggests disease is present, D=0 suggests disease is absent.

# pD1gM[m] is probability of D=1 given model index m.

# m=1 means disease is truly present, m=2 means disease is truly absent.

modelString = "

model {

for ( i in 1:length(D) ) {

D[i] ~ dbern( pD1gM[m] ) # pD1gM[1:2] are constants provided in data list.

}

m ~ dcat( pM[1:2] ) # model index probability.

pM[1] ~ dbeta( omega*(kappa-2)+1 ,

(1-omega)*(kappa-2)+1 ) # prior probability for m=1

pM[2] <- 1-pM[1] # prior probability for m=2

# omega and kappa are constants provided in data list.

}

"

# Specify constants:

# The result of the diagnostic test(s) for a single individual:

D = c(rep(1,1),rep(0,0)) # use D=NA for prior

# Probability D=1 for m=1,2; i.e.,

# correct detection rate and false alarm rate:

pD1gM = c(0.97,0.05) # high value for disease present, low value for absent

# Modal base rate of disease present:

omega = 0.01

# Certainty of modal base rate (kappa >= 2):

kappa = c(2,20,20000)[3]

model = jags.model(

file=textConnection(modelString),

data=list( D=D , pD1gM=pD1gM , omega=omega , kappa=kappa ),

n.chains=4)

update( object=model, n.iter=1000 )

samples = coda.samples(

model=model,

variable.names=c("m","pM"),

n.iter=125000) # total MCMC steps = n.chains*n.iter

chain = as.matrix(samples)

pM1gD = sum(chain[,"m"]==1)/length(chain[,"m"])

pM2gD = sum(chain[,"m"]==2)/length(chain[,"m"])

print( paste0( "For D = ", sum(D) , " 1's and " , length(D)-sum(D) , " 0's" ,

": P(M=1|D)=" , signif(pM1gD,3) ,

"; P(M=2|D)=" , signif(pM2gD,3) ,

"; P(M=1|D)/P(M=2|D)=", signif(pM1gD/pM2gD,3) ) )

hist( chain[,"pM[1]"] , breaks=51 ,

main=bquote("p( pM[1] | D=" *.(sum(D))* " 1's and "

*.(length(D)-sum(D))* " 0's )" ) ,

col="grey" , border="grey" , xlim=c(0,1) )

## Tuesday, December 22, 2015

### Bayesian Item Response Theory in JAGS: A Hierarchical Two Parameter Logistic Model

I recently created a hierarchical two-parameter logistic model for item response theory (IRT). The JAGS script is now in the folder of scripts that accompany the book (available at the book's web site; click book cover at right).

Below are slides that accompany my presentation of the material. I hope the slides are self-explanatory for those of you who are already familiar with IRT, and maybe even for those of you who are not. Maybe one day I'll record a narration over the slides and post a video. Meanwhile, I hope the slides below are useful.

By the way, if you find this program to be useful and adapt it to your own real data, please let me know about the results. (And if you write up the results, it's okay to cite this blog post and the book :-)

All analyses begin with the data. The data are correct/wrong answers for items on a test. The JAGS program will use long format for the data:

Each item is modeled as producing a 1/0 response with probability that depends on the item's difficulty and the subject's ability:

There is indeterminacy in the difficulty/ability scale, and two points must be arbitrarily pinned down. I chose the following: The most difficult not-impossible item is given a difficulty of 100, and the easiest non-trivial item is given a difficulty of 0. Then I opted to put the subject abilities and item discriminations under higher-level distributions, to get the benefits of shrinkage in a hierarchical model:

Here is the JAGS script itself (again, see the program folder that accompanies the book):

Now for the data used to demonstrate the model:

Results from the JAGS run. First, the MCMC diagnostics all look good:

Here are the estimates of individual items, listed in order of estimated difficulty. Notice that the easiest has difficulty fixed at 0, and the most difficult has difficulty fixed at 100.

Subject abilities:

Two of the (simulated) subjects have data from only a single item. But this does not leave the estimated ability unbounded because of shrinkage from the hierarchical model, and the Bayesian estimation provides exact degrees of uncertainty on the estimates:

And in Bayesian estimation it is easy to make comparisons of item difficulties, or of item discriminations, or of subject abilities:

Below are slides that accompany my presentation of the material. I hope the slides are self-explanatory for those of you who are already familiar with IRT, and maybe even for those of you who are not. Maybe one day I'll record a narration over the slides and post a video. Meanwhile, I hope the slides below are useful.

By the way, if you find this program to be useful and adapt it to your own real data, please let me know about the results. (And if you write up the results, it's okay to cite this blog post and the book :-)

All analyses begin with the data. The data are correct/wrong answers for items on a test. The JAGS program will use long format for the data:

(Click on any image to enlarge.) |

Each item is modeled as producing a 1/0 response with probability that depends on the item's difficulty and the subject's ability:

There is indeterminacy in the difficulty/ability scale, and two points must be arbitrarily pinned down. I chose the following: The most difficult not-impossible item is given a difficulty of 100, and the easiest non-trivial item is given a difficulty of 0. Then I opted to put the subject abilities and item discriminations under higher-level distributions, to get the benefits of shrinkage in a hierarchical model:

Here is the JAGS script itself (again, see the program folder that accompanies the book):

Now for the data used to demonstrate the model:

Results from the JAGS run. First, the MCMC diagnostics all look good:

(Click on any image to the images enlarged.) |

Here are the estimates of individual items, listed in order of estimated difficulty. Notice that the easiest has difficulty fixed at 0, and the most difficult has difficulty fixed at 100.

Subject abilities:

Two of the (simulated) subjects have data from only a single item. But this does not leave the estimated ability unbounded because of shrinkage from the hierarchical model, and the Bayesian estimation provides exact degrees of uncertainty on the estimates:

And in Bayesian estimation it is easy to make comparisons of item difficulties, or of item discriminations, or of subject abilities:

## Saturday, December 12, 2015

### Lessons from Bayesian disease diagnosis: Don't over-interpret the Bayes factor

**This post has been revised and is now superseded by the new post linked >HERE<.**

**Overview**

"Captain, the prior probability of this character dying is extremely small." |

*not*appropriate because it ignores the prior probability of the disease, which presumably is very rare. Suppose the prior probability of having the disease is 1%. Then Bayes' rule implies that the posterior probability of having the disease is only about 16%, even after testing positive! This type of example is presented over and over again in introductory expositions (e.g., pp. 103-105 of DBDA2E), emphatically saying not to use only the detection and false alarm rates, and always to incorporate the prior probabilities of the conditions.

The ratio of the detection and false-alarm rates is the

*Bayes factor*, hence the lesson of the primary example is: Do not use the Bayes factor alone to make decisions; instead always incorporate the prior probabilities of the options. Unfortunately, this lesson is often forgotten or at least de-emphasized in general model-selection settings. In model-selection settings, some people compute the Bayes factor and use it alone as the basis for model selection. The tacit assumption is that the models all have equal prior probabilities. But this assumption of equal prior probabilities seems implausible in most situations, and should be explicitly justified.

If we should not rely on the Bayes factor alone to make decisions in disease diagnosis or drug testing, why should we rely on the Bayes factor alone to make decisions in other model-selection applications?

In this post I first explain the role of the Bayes factor in the usual simple disease-diagnosis or drug-screening situation. Then I extend to a situation in which the prior probabilities are not certain. I show that in the limit, with utter uncertainty in the prior probabilities, the posterior odds fade to match the Bayes factor. But the decision should not be based on the Bayes factor alone.

**This post has been revised and is now superseded by the new post linked >HERE<.**

**Bayes' rule applied to disease diagnosis**

A person is randomly selected from a population in which the prior probability of having a particular disease is 1%. Denote the condition of the person by the variable \(M\), which can have two values, either \(M=\mbox{healthy}\) or \(M=\mbox{disease}\). The prior probability can be written as \(p(M=\mbox{disease}) = 0.01\), and hence \(p(M=\mbox{healthy}) = 0.99\).

A diagnostic test for the disease can show a "positive" result, suggesting the disease is present, or a "negative" result, suggesting the disease is absent. The result of the test is the datum, denoted \(D\), which can have two values \(D=\mbox{pos}\) or \(D=\mbox{neg}\). The detection rate of the test is 97%, meaning \(p( D=\mbox{pos} | M=\mbox{disease} ) = 0.97\). This is also called the "sensitivity" of the test. The false-alarm rate of the test is 5%, meaning \(p( D=\mbox{pos} | M=\mbox{healthy} ) = 0.05\). The complement of the false alarm rate is called the "specificity" of the test, which in this example is 95%.

Applying Bayes' rule yields: \[

p( M=\mbox{disease}| D=\mbox{pos} ) = p( D=\mbox{pos} | M=\mbox{disease} ) p( M=\mbox{disease} ) / p( D=\mbox{pos} )

\] and \[

p( M=\mbox{healthy}| D=\mbox{pos} ) = p( D=\mbox{pos} | M=\mbox{healthy} ) p( M=\mbox{healthy} ) / p( D=\mbox{pos} )

\] In the above two equations, the denominator \( p( D=\mbox{pos} ) \) is the same, computed as \( \sum_m p( D=\mbox{pos} | M=\mbox{m} ) p( M=\mbox{m} ) \). The ratio of the two equations is

\[

\underbrace{\frac{p( M=\mbox{disease}| D=\mbox{pos} )}{p( M=\mbox{healthy}| D=\mbox{pos} ) } }_{\mbox{posterior odds}}

=

\underbrace{\frac{p( D=\mbox{pos} | M=\mbox{disease} )}{p( D=\mbox{pos} | M=\mbox{healthy} )} }_{\mbox{Bayes factor}} \times

\underbrace{\frac{p( M=\mbox{disease} ) }{ p( M=\mbox{healthy} )}}_{\mbox{prior odds}}

\] Notice from the above equation that the

*Bayes factor*is the ratio of the detection rate to the false alarm rate (when the datum is \(D=\mbox{pos}\)). The Bayes factor converts the

*prior odds*(i.e., the ratio of prior probabilities) to the

*posterior odds*(i.e., the ratio of posterior probabilities).

When we plug in the numerical values of the example, we get

\[

\underbrace{\frac{0.164}{0.836}}_{\mbox{posterior odds}}

=

\underbrace{\frac{0.97}{0.05}}_{\mbox{Bayes factor}} \times

\underbrace{\frac{0.01}{0.99}}_{\mbox{prior odds}}

\]

or

\[

\underbrace{0.196}_{\mbox{posterior odds}}

=

\underbrace{19.4}_{\mbox{Bayes factor}} \times

\underbrace{0.0102}_{\mbox{prior odds}}

\]

(the displayed values are rounded, whereas the underlying calculations retained many decimal places). Thus, even though the Bayes factor is 19.4 in favor of having the disease, the posterior probability of having the disease is only 0.164 because the prior probability was 0.01.

The real-life lesson of these sorts of examples is that when doing random disease screening or random drug testing, it is not appropriate to use the Bayes factor alone to interpret the result of the diagnostic test. The prior probabilities of the conditions (disease or healthy, using drug or not) must be factored in, and the decision should be based on the posterior probability.

**Disease diagnosis**

*is*model comparisonThe case of disease diagnosis is just a case of model comparison. In model comparison, we have some data and we infer the posterior probabilities of the models given the data. For disease diagnosis, the data consist of a single dichotomous value, namely \(D=\mbox{pos}\) or \(D=\mbox{neg}\), and the two models are merely two possible underlying states of the world, namely \(M=\mbox{disease}\) or \(M=\mbox{healthy}\). Disease diagnosis is simply asking which model, disease or healthy, is a better model of the outcome of the diagnostic test.

The moral of the disease-diagnosis example ---that the Bayes factor alone is not appropriate for selecting the model--- applies to model comparison generally. Model comparison should be based on the posterior probabilities of the models.

**Decision rule in model comparison sometimes uses only the Bayes factor --- but should not**

One practice in model comparison is to base decisions on the Bayes factor, forgetting about or at least de-emphasizing the prior probabilities (and hence the posterior probabilities) of the models. A typical decision rule is,

Applying the Bayes-factor-only decision rule to disease diagnosis would lead us to decide that a person has the disease whenever the diagnostic test is positive because the Bayes factor is 19.4 which is much greater than 3. Using only the Bayes factor for decision making, instead of the posterior probabilities of the models, ignores Bayes' rule and is not Bayesian despite the terminology. A Bayes factor is only Bayesian in the full context of Bayes' rule, which converts prior probabilities to posterior probabilities.If the Bayes factor is >3 in favor of one model relative to the other, accept the better model and reject the other.

The usual justification for using the Bayes factor alone is two fold. First, if the prior probabilities of the models are all equal, then the posterior odds algebraically equal the Bayes factor. (You can see that for yourself from the equations above.) Setting the prior probabilities to be equal

*seems*to be an innocuous default for many model-comparison settings, even if it's an absurd assumption for random screening of diseases or drug usage. Second, even if unequal prior probabilities for the models are assumed, the Bayes factor tells us how to modify those prior odds to arrive at the corresponding posterior odds. Therefore it is informative to report the Bayes factor.

The two justifications must be used carefully. First, I doubt that most model-comparison settings really are well represented by assuming equal prior probabilities of the models.

*utterly ignorant*of the models and has no prior data or theory or knowledge of any sort to suggest that one model is more or less plausible than the other.

**Setting the prior probabilities equal to each other is an assertion of equality, not an expression of uncertainty; the next section discusses uncertainty.**Second, even if the Bayes factor is reported to summarize the analysis, the Bayes factor should not be the basis for decision making. The decision should be based on the posterior probabilities of the models.

**Modeling uncertainty in the prior probabilities of the models**

The usual formal set-up, as in the equations above, assumes specific values for the prior probabilities of the models. That is to say, \(p(M=m)\) has a specific value, such as 0.01 for the "disease" model. But what if we are uncertain about the prior probabilities of the models?

**This post has been revised and is now superseded by the new post linked >HERE<.**

Modeling uncertainty is exactly what Bayesian analysis is all about. It is straight forward to model uncertainty in the prior probabilities of the models. (I do not recall seeing this analysis in the literature, but I would be quite surprised if the following analysis has not appeared somewhere before. Maybe even in this blog.) Denote \(p(M=1)\) by the parameter \(\theta\), which can take on any value from 0 to 1. We will express our uncertainty in \(\theta\) as a beta distribution that has mode \(\omega\) and concentration \(\kappa\). (Hence the shape parameters for the beta distribution are \(a=\omega (\kappa-2) + 1\) and \(b= (1-\omega) (\kappa-2) + 1\).) That is, \(p(\theta)=dbeta(\theta\,|\,\omega,\kappa)\). If we are extremely certain about the prior probability of the model, then \(\kappa\) has a very large value, meaning that the prior distribution on \(\theta\) is tightly concentrated at \(\omega\), and the analysis converges to the point value assumed in the equations above. If we are uncertain about the prior probability of the model, then \(\kappa\) is a small-ish value, with \(\kappa \ge 2\) and \(\kappa=2\) indicating a uniform distribution.

For the two-model case, the posterior probabilities of the models are

\[

p(M=1|D) =

\int_0^1 d\theta \, p(\theta) \, p(D|M=1) \, \theta / [ p(D|M=1) \, \theta + p(D|M=2) \, (1-\theta) ]

\]

and

\[

p(M=2|D) =

\int_0^1 d\theta \, p(\theta) \, p(D|M=2) \, (1-\theta) / [ p(D|M=1) \, \theta + p(D|M=2) \, (1-\theta) ]

\]

The ratio of those equations becomes

\[

\underbrace{\frac{p(M=1|D)}{p(M=2|D)}}_{\mbox{Posterior odds}}

=

\underbrace{\frac{p(D|M=1)}{p(D|M=2)}}_{\mbox{Bayes factor}}

\frac{\int_0^1 d\theta \, p(\theta) \, \theta / [ p(D|M=1) \, \theta + p(D|M=2) \, (1-\theta) ]}{\int_0^1 d\theta \, p(\theta) \, (1-\theta) / [ p(D|M=1) \, \theta + p(D|M=2) \, (1-\theta) ]}

\]

Heartier souls should feel free to derive an analytical simplification of the integrals above, but I'll just perform a numerical approximation in R. The code is appended below.

Here are some numerical results for disease diagnosis when the test result is positive. In all cases, the mode of the prior on \(M=1\) is \(\omega=0.01\), reflecting the assumed rarity of the disease. When \(\kappa\) is very large at 10002, then the posterior odds of having the disease are

**0.197**, essentially matching the analysis presented earlier when using \(p(M=\mbox{dis})=0.01\) exactly. When \(\kappa\) is set to a moderate value of 102, then the posterior odds of having the disease are

**0.345**. When \(\kappa\) is set to a value of 12 that expresses great uncertainty, then the posterior odds of having the disease are

**1.23**. Finally, when \(\kappa\) is set to a value of 2 that expresses essentially total ignorance about the prior probability of the disease, which means that the prior probability of having the disease could be 0.01 or 0.5 or 0.99 uniformly, then the posterior odds of having the disease are

**7.65**, which is getting close to the Bayes factor of 19.4. (Intuition tells me the result should equal the Bayes factor, which means either my intuition is wrong or the mathematical derivation is wrong or my R code is wrong. At this point I trust the numerical result and suspect my intuition. Perhaps a kind soul will gently enlighten me.

**UPDATE 25-Dec-2015: I am re-thinking this, and now believe that something is wrong with the derivation, and that the intuition is right after all. Stay tuned...!**)

[Paragraph added 2015-Dec-13 12:50pm:] I think the math and numbers are correct. The prior ratio involves \(\int_0^1 d\theta \, p(\theta) \, \theta / [ p(D|M=1) \, \theta + p(D|M=2) \, (1-\theta) ]\) divided by 1 minus that integral. The integral involves \(p(D|M=1)\) and \(p(D|M=2)\), so the prior ratio will not generally become 1 when \(p(\theta)\) is uniform.

All of which is to say that if the prior probabilities of the models are uncertain, then the uncertainty can be incorporated into the analysis, but you must still specify a particular degree of uncertainty. The more uncertain you are about the prior probabilities of the models, the stronger is the influence of the Bayes factor. But notice that even with fairly high uncertainty, when \(\kappa=12\), the posterior odds were quite different than the Bayes factor. [Next sentence added 2015-Dec-13 9:12am] Moreover, even when there is complete uncertainty about the prior probabilities of the models, when \(\kappa=2\), the posterior odds are not as extreme as the Bayes factor.

R code:

# Specify distribution on prior probability of Model 1, having the disease:

omega=0.01 # Modal prior probability of disease.

kappa=c(2,12,102,10002)[4] # >= 2.0.

# Specify false alarm rate and hit rate of diagnostic test:

FA = 0.05 # p( Test=+ | healthy )

Hit = 0.97 # p( Test=+ | disease )

BF = Hit/FA # Bayes factor

# Create grid on theta, p(disease):

dTheta = 0.0001 # differential interval on theta.

Theta = seq(0,1,dTheta) # grid on theta for approximating the integral.

# Create prior distribution on theta, as a beta distribution:

pTheta = dbeta( Theta , shape1=omega*(kappa-2)+1 , shape2=(1-omega)*(kappa-2)+1 )

pTheta = (pTheta / sum(pTheta)) / dTheta # make sure that densities sum to 1.0

# Compute the prior integrals:

integTheta = sum( dTheta * pTheta * Theta/(FA+(Hit-FA)*Theta) )

integOneMinusTheta = sum( dTheta * pTheta * (1-Theta)/(FA+(Hit-FA)*Theta) )

# Compute posterior probability of disease, given Test=+

priorOdds = integTheta / integOneMinusTheta

pDisGivenTestPos = Hit * integTheta

pHealthGivenTestPos = FA * integOneMinusTheta

show( c( pDisGivenTestPos , pHealthGivenTestPos ,

pDisGivenTestPos / pHealthGivenTestPos ) )

show( c( priorOdds , BF , priorOdds * BF ) )

Subscribe to:
Posts (Atom)