Sunday, May 15, 2016

Bayesian inference in the (abnormal) mind

The (abnormal) mind can be modeled as a Bayesian inference engine, as summarized in the post, Bayesian reasoning implicated in some mental disorders. Excerpt:
“The brain is a guessing machine [i.e., Bayesian inference engine - JKK], trying at each moment of time to guess what is out there,” says computational neuroscientist Peggy Seriès. Guesses just slightly off — like mistaking a smile for a smirk — rarely cause harm. But guessing gone seriously awry may play a part in mental illnesses such as schizophrenia, autism and even anxiety disorders, Seriès and other neuroscientists suspect. They say that a mathematical expression known as Bayes’ theorem — which quantifies how prior expectations can be combined with current evidence — may provide novel insights into pernicious mental problems that have so far defied explanation.
For a tutorial about Bayesian models of perception and cognition, see Bayesian learning theory applied to human cognition.
http://www.bcs.rochester.edu/people/robbie/jacobskruschke_cogsci.pdf


Note that Bayesian modeling of data is a richly valuable approach regardless of whether any particular Bayesian model of mind is accurate. See this brief blog post for the distinction between (Bayesian) descriptive models of data, psychometric models, and models of mind. 
http://doingbayesiandataanalysis.blogspot.com/2011/10/bayesian-models-of-mind-psychometric.html

Thursday, April 28, 2016

Matlab version of BEST

Nils Winter has created a Matlab interface to BEST (Bayesian estimation for two groups of metric-scale data). You can read about it at his GitHub README

Tuesday, April 19, 2016

Bayesian estimation of log-normal parameters - Update

Using the log-normal density can be confusing because it's parameterized in terms of the mean and precision of the log-scale data, not the original-scale data. Thus, if your data, y, are nicely described by a log-normal distribution, the estimated mean and precision are for log(y), not y. To get the corresponding parameters for the original-scale data, you have to transform the estimated parameters. And the prior on the mean and precision have to be appropriate for log(y), as opposed to y. This post shows an example of a JAGS program (with rjags) for estimating the parameters of log-normal distributed data.
 
This post updates the R+JAGS code from a previous post. Please see that previous post for more background. The previous post had code appropriate for the 1st edition of the book; this post updates a few details for use with the 2nd edition of the book. In particular, the new code shows new MCMC diagnostic plots and uses a more appropriate prior on meanOfLogY. Here are some of its output graphs; R code follows below.


Here is the full R script:

#------------------------------------------------------------------------------

# Jags-Ymet-Xnom1grp-MlogNormal-Script.R
# April 19, 2016. John K. Kruschke.
# Requires auxiliary R scripts that accompany the book,
# Kruschke, J. K. (2015). Doing Bayesian Data Analysis, Second Edition:
# A Tutorial with R, JAGS, and Stan. Academic Press / Elsevier.

# graphics.off()
# rm(list=ls(all=TRUE))

source("DBDA2E-utilities.R")
fileNameRoot="Jags-Ymet-Xnom1grp-MlogNormal-Script"
graphFileType="png"

#------------------------------------------------------------------------------
# THE DATA.

# Generate some random data from known parameter values:
set.seed(47405)
trueLogM = 5.0 # true mean of log(y)
trueLogSD = 0.5 # true sd of log(y)
y = rnorm( n=125 )  # normal distribution of log-scale values
LogY = (y-mean(y))/sd(y)*trueLogSD + trueLogM
y = exp(LogY) # y is exponentiated values of log(y)

# OR JUST PUT IN YOUR ACTUAL DATA HERE:
# y = ...

# Package the data for shipping to JAGS:
dataList = list(
  y = y ,
  N = length(y) ,
  meanOfLogY = mean(log(y)) ,
  sdOfLogY = sd(log(y))
)

#------------------------------------------------------------------------------
# THE MODEL.

modelstring = "
  model {
    for( i in 1 : N ) {
      y[i] ~ dlnorm( muOfLogY , 1/sigmaOfLogY^2 )
    }
  sigmaOfLogY ~ dunif( 0.001*sdOfLogY , 1000*sdOfLogY )
  muOfLogY ~ dunif( 0.001*meanOfLogY , 1000*meanOfLogY )
  muOfY <- exp(muOfLogY+sigmaOfLogY^2/2)
  modeOfY <- exp(muOfLogY-sigmaOfLogY^2)
  sigmaOfY <- sqrt(exp(2*muOfLogY+sigmaOfLogY^2)*(exp(sigmaOfLogY^2)-1))
}
" # close quote for modelstring
writeLines(modelstring,con="model.txt")

#------------------------------------------------------------------------------
# INTIALIZE THE CHAINS.
# Let JAGS do it

#------------------------------------------------------------------------------
# RUN THE CHAINS

require(rjags)

parameters = c("muOfLogY" , "sigmaOfLogY" , "muOfY" , "modeOfY" , "sigmaOfY" )
adaptSteps = 1000         # Number of steps to "tune" the samplers.
burnInSteps = 1000        # Number of steps to "burn-in" the samplers.
nChains = 3               # Number of chains to run.
numSavedSteps=20000       # Total number of steps in chains to save.
thinSteps=1               # Number of steps to "thin" (1=keep every step).
nPerChain = ceiling( ( numSavedSteps * thinSteps ) / nChains ) # Steps per chain.
# Create, initialize, and adapt the model:
jagsModel = jags.model( "model.txt" , data=dataList ,
                        n.chains=nChains , n.adapt=adaptSteps )
# Burn-in:
cat( "Burning in the MCMC chain...\n" )
update( jagsModel , n.iter=burnInSteps )
# The saved MCMC chain:
cat( "Sampling final MCMC chain...\n" )
mcmcCoda = coda.samples( jagsModel , variable.names=parameters ,
                            n.iter=nPerChain , thin=thinSteps )

#------------------------------------------------------------------------------
# EXAMINE THE RESULTS

# Display diagnostics of chain, for specified parameters:
parameterNames = varnames(mcmcCoda) # get all parameter names
for ( parName in parameterNames ) {
  diagMCMC( codaObject=mcmcCoda , parName=parName ,
            saveName=fileNameRoot , saveType=graphFileType )
}


# Convert coda-object codaSamples to matrix object for easier handling.
mcmcChain = as.matrix( mcmcCoda )
chainLength = NROW(mcmcChain)

openGraph(width=10,height=6)
layout(matrix(1:6,nrow=2,byrow=TRUE))
# posterior predictive
hist( dataList$y , xlab="y" , main="Data w. Post. Pred." , breaks=30 ,
      col="pink" , border="white" , prob=TRUE , cex.lab=1.5)
pltIdx = floor(seq(1,chainLength,length=20))
xComb = seq( min(dataList$y) , max(dataList$y) , length=501 )
for ( chnIdx in pltIdx ) {
  lines( xComb ,
         dlnorm( xComb, mcmcChain[chnIdx,"muOfLogY"], mcmcChain[chnIdx,"sigmaOfLogY"] ),
         col="skyblue" )
}
# param's of log(y)
postInfo = plotPost( mcmcChain[,"muOfLogY"] , xlab="mu of log(y)" )
postInfo = plotPost( mcmcChain[,"sigmaOfLogY"] , xlab="sigma of log(y)" )
# param's of y
postInfo = plotPost( mcmcChain[,"modeOfY"] , xlab="mode of y" )
postInfo = plotPost( mcmcChain[,"muOfY"] , xlab="mu of y" )
postInfo = plotPost( mcmcChain[,"sigmaOfY"] , xlab="sigma of y" , cenTend="mode")

saveGraph(file=fileNameRoot,type=graphFileType)

#-------------------------------------------------------------------------------

Friday, January 29, 2016

The remains of R. A. Fisher are visited by Bayesians again

Thanks to Phillip Alday, the second edition of Doing Bayesian Data Analysis has visited the remains of R. A. Fisher at St. Peter's Cathedral in Adelaide. Here is a photo that Phillip snapped: 
In a previous post, the first edition visited Fisher's remains. The book has also visited Bayes tomb, Laplace's memorial, and Jacob Bernoulli's grave. The goal here is to have some fun and pique some historical interest, not to show any (serious) disrespect. If you pose the book at sites of other relevant people, dead or not-quite-yet-dead, let me know!

Thursday, December 31, 2015

(A pointer to) bayes.js: A Small Library for Doing MCMC in the Browser

Screen shot from bayes.js blog post.
This is just a pointer to a new post at Rasmus Bååth's Blog that might interest readers of DBDA2E.

bayes.js: A small JavaScript library that implements an adaptive MCMC sampler and a couple of probability distributions, and that makes it relatively easy to implement simple Bayesian models in JavaScript.

See the post at http://www.sumsar.net/blog/2015/12/bayes-js-a-small-library-for-doing-mcmc-in-the-browser/

Sunday, December 27, 2015

Lessons from Bayesian disease diagnosis: Don't over-interpret the Bayes factor, VERSION 2


[This is a revised version of a previous post which is superseded by this post. 
This revision has corrected derivations, new R/JAGS code, and new diagrams.] 

Overview
"Captain, the prior probability of this
character dying and leaving the show
is infinitesimal."
A primary example of Bayes' rule is for disease diagnosis (or illicit drug screening). The example is invoked routinely to explain the importance of prior probabilities. Here's one version of it: Suppose a diagnostic test has a 97% detection rate and a 5% false alarm rate. Suppose a person selected at random tests positive. What is the probability that the person has the disease? It might seem that the odds of having the disease are 0.97/0.05 (i.e., the detection rate over the false alarm rate), which corresponds to a probability of about 95%. But Bayesians know that such an answer is not appropriate because it ignores the prior probability of the disease, which presumably is very rare. Suppose the prior probability of having the disease is 1%. Then Bayes' rule implies that the posterior probability of having the disease is only about 16%, even after testing positive! This type of example is presented over and over again in introductory expositions (e.g., pp. 103-105 of DBDA2E), emphatically saying not to use only the detection and false alarm rates, and always to incorporate the prior probabilities of the conditions.


The ratio of the detection and false-alarm rates is the Bayes factor, hence the lesson of the primary example is: Do not use the Bayes factor alone to make decisions; instead always incorporate the prior probabilities of the options. Unfortunately, this lesson is often forgotten or at least de-emphasized in general model-selection settings. In model-selection settings, some people compute the Bayes factor and use it alone as the basis for model selection. The tacit assumption is that the models all have equal prior probabilities. But this assumption of equal prior probabilities seems implausible in most situations, and should be explicitly justified. If we should not rely on the Bayes factor alone to make decisions in disease diagnosis or drug testing, why should we rely on the Bayes factor alone to make decisions in other model-selection applications?

In this post
  • I explain the role of the Bayes factor in the simple disease-diagnosis situation. (This also applies to random drug screening.)
  • Then I extend to the situation in which the prior probabilities are not certain. I show that in the limit, with utter uncertainty in the prior probabilities, the posterior odds fade to match the Bayes factor. But even with some modest knowledge of the base rates, the posterior odds can be far from the Bayes factor. 
  • Hierarchical diagrams are provided to illustrate the model comparisons.
  • JAGS code is provided for hands-on experience.
Conclusion: When deciding between models, the decision should not be based on the Bayes factor alone.

Bayes' rule applied to disease diagnosis

A person is randomly selected from a population in which the prior probability of having a particular disease is 1%. Denote the condition of the person by the variable \(M\), which can have two values, either \(M\!\!=\!\!1\) for having the disease or \(M\!\!=\!\!2\) for not having the disease. The prior probability can be written as \(p(M\!\!=\!\!1) = 0.01\), and hence \(p(M\!\!=\!\!2) = 0.99\).

A diagnostic test for the disease can show a "positive" result suggesting that the disease is present, or a "negative" result suggesting that the disease is absent. The result of the test is the datum, denoted \(D\), which can have two values: \(D\!\!=\!\!1\) for "positive" or \(D\!\!=\!\!0\) for "negative". The correct-detection rate (a.k.a. sensitivity) of the test is 97%, meaning \(p( D\!\!=\!\!1 | M\!\!=\!\!1 ) = 0.97\). The false-alarm rate of the test is 5%, meaning \(p( D\!\!=\!\!1 | M\!\!=\!\!2 ) = 0.05\). The complement of the false alarm rate is called the specificity of the test, which in this example is 95%.

Applying Bayes' rule yields: \[ p( M\!\!=\!\!1| D\!\!=\!\!1 ) = p( D\!\!=\!\!1 | M\!\!=\!\!1 ) p( M\!\!=\!\!1 ) / p( D\!\!=\!\!1 ) \] and \[ p( M\!\!=\!\!2| D\!\!=\!\!1 ) = p( D\!\!=\!\!1 | M\!\!=\!\!2 ) p( M\!\!=\!\!2 ) / p( D\!\!=\!\!1 ) \] In the above two equations, the denominator \( p( D\!\!=\!\!1 ) \) is the same, computed as \( p( D\!\!=\!\!1 ) = \sum_m p( D\!\!=\!\!1 | M\!\!=\!\!m ) p( M\!\!=\!\!m ) \). The ratio of the two equations is \begin{align} \underbrace{\frac{p( M\!\!=\!\!1 | D\!\!=\!\!1 )}{p( M\!\!=\!\!2 | D\!\!=\!\!1 ) } }_{\mbox{posterior odds}} &= \underbrace{\frac{p( D\!\!=\!\!1 | M\!\!=\!\!1 )}{p( D\!\!=\!\!1 | M\!\!=\!\!2 )} }_{\mbox{Bayes factor}} \times \underbrace{\frac{p( M\!\!=\!\!1 ) }{ p( M\!\!=\!\!2 )}}_{\mbox{prior odds}} \label{eq:BF}\end{align} Notice from Equation \ref{eq:BF} that the Bayes factor is the ratio of the detection rate to the false alarm rate (when the datum is \(D\!\!=\!\!1\)). The Bayes factor converts the prior odds (i.e., the ratio of prior probabilities) to the posterior odds (i.e., the ratio of posterior probabilities).

When we plug in the numerical values of the example, we get \[ \underbrace{\frac{0.164}{0.836}}_{\mbox{posterior odds}} = \underbrace{\frac{0.97}{0.05}}_{\mbox{Bayes factor}} \times \underbrace{\frac{0.01}{0.99}}_{\mbox{prior odds}} \] or \[  \underbrace{0.196}_{\mbox{posterior odds}} = \underbrace{19.4}_{\mbox{Bayes factor}} \times \underbrace{0.0102}_{\mbox{prior odds}}\] (the displayed values are rounded, whereas the underlying calculations retained many decimal places). Thus, even though the Bayes factor is 19.4 in favor of having the disease, the posterior probability of having the disease is only 0.164 because the prior probability was 0.01.

The real-life lesson of these sorts of examples is that when doing random disease screening or random drug testing, it is not appropriate to use the Bayes factor alone to interpret the result of the diagnostic test. The prior probabilities of the conditions (disease or healthy, using drug or not) must be factored in, and the decision should be based on the posterior probability. 


Disease diagnosis is model comparison

The case of disease diagnosis is just a case of model comparison. In model comparison, we have some data and we infer the posterior probabilities of the models given the data. For disease diagnosis, the data consist of a dichotomous value, namely \(D\!\!=\!\!1\) or \(D\!\!=\!\!0\), and the two models are merely two possible underlying states of the world, namely \(M\!\!=\!\!1\) or \(M\!\!=\!\!2\). Disease diagnosis is simply asking which model, disease or healthy, is a better model of the outcome of the diagnostic test.

The moral of the disease-diagnosis example ---that the Bayes factor alone is not appropriate for selecting the model--- applies to model comparison generally. Model comparison should be based on the posterior probabilities of the models.

Figure 1: Disease diagnosis as model comparison. At bottom, D is the datum from the diagnostic test, with D=1 denoting "positive" result and D=0 denoting negative result. At left, model M=1 represents disease present. The right bar of its Bernoulli distribution over D=1 has height labeled by p(D=1|M=1), which is the correct-detection rate. At right, model M=2 represents disease absent.The right bar of its Bernoulli distribution over D=1 has height labeled by p(D=1|M=2) which is the false-alarm rate. At the top of the diagram is the base rate of the diseases. The prior probability of having the disease, M=1, is \(\pi_1\).
Figure 1 shows disease diagnosis as model comparison analogous to the diagrams in Figures 10.1 and 10.2 of DBDA2E (pp. 267 and 269). The caption provides a description. Essentially, the model for "has disease" has likelihood function \(D \sim \mbox{dbern}( p(D\!\!=\!\!1|M\!\!=\!\!1) ) \) and the model for "no disease" has likelihood function \(D \sim \mbox{dbern}( p(D\!\!=\!\!1|M\!\!=\!\!2) ) \). The prior probability of the model indices is \(M \sim \mbox{dcat}( \pi_1 , 1\!-\!\pi_1 )\).

Decision rule in model comparison sometimes uses only the Bayes factor, but should not

One practice in model comparison is to base decisions on the Bayes factor, forgetting about or at least de-emphasizing the prior probabilities (and hence the posterior probabilities) of the models. A typical decision rule is,

If the Bayes factor is >3 in favor of one model relative to the other, accept the better model and reject the other. 
Applying the Bayes-factor-only decision rule to disease diagnosis would lead us to decide that a person has the disease whenever the diagnostic test is positive because the Bayes factor is 19.4 which is much greater than 3. Using only the Bayes factor for decision making, instead of the posterior probabilities of the models, ignores Bayes' rule and is not Bayesian despite the terminology. A Bayes factor is only Bayesian in the full context of Bayes' rule, which converts prior probabilities to posterior probabilities. (Note that I am not concerned with the value 3 in the decision rule. The decision threshold could be 3 or 10 or whatever. Rather, the problem is not taking into account the prior probabilities of the models.)

The usual justification for using the Bayes factor alone is two fold. First, if the prior probabilities of the models are all equal, then the posterior odds algebraically equal the Bayes factor. (You can see that for yourself from Equation \ref{eq:BF}, above.) Setting the prior probabilities to be equal seems to be an innocuous default for many model-comparison settings, even if it's an absurd assumption for random screening of diseases or drug usage. Second, even if unequal prior probabilities for the models are assumed, the Bayes factor tells us how to modify those prior odds to arrive at the corresponding posterior odds. Therefore it is informative to report the Bayes factor.

The two justifications must be used carefully. First, I doubt that most model-comparison settings really are well represented by assuming equal prior probabilities of the models. Setting the prior probabilities equal to each other is an assertion of equality, not an expression of uncertainty; the next section discusses uncertainty. Setting the prior probabilities to 0.50/0.50 should be based on as much prior information as setting the prior probabilities to 0.01/0.99. Second, even if the Bayes factor is reported to summarize the analysis, the Bayes factor should not be the basis for decision making. The decision should be based on the posterior probabilities of the models.

Modeling uncertainty in the prior probabilities of the models

The usual formal set-up, as in the equations above, assumes specific exact values for the prior probabilities of the models. That is to say, \(p(M\!\!=\!\!m)\) has a specific exact value such as
\(p(M\!\!=\!\!1) = 0.01\) for the "has disease" model. But what if we are uncertain about the prior probabilities of the models? Modeling uncertainty is exactly what Bayesian analysis is all about.

In a previous version of this post, I made a mathematical derivation that seemed reasonable but which did not match my intuition for how the answer should play out. I asked readers to let me know if they could point out errors in the derivation. In a series of emails, Tomasz Miąsko (tomasz.miasko@gmail.com) patiently helped me to reconceptualize the derivation, and he also provided me with a simple JAGS script to verify the result. Thank you, Tomasz! I finally have clarity and here I present an elaborated derivation and JAGS script.

Denote \(p(M\!\!=\!\!1)\) by the parameter \(\pi_1\), which can take on any value from 0 to 1. Of course, \(p(M\!\!=\!\!2) = 1-\pi_1\). We express our uncertainty in \(\pi_1\) as a probability distribution \(p(\pi_1)\). For example, \(p(\pi_1)\) could be a beta distribution that has mode \(\omega\) and concentration \(\kappa\). (Hence the shape parameters for the beta distribution are \(a=\omega (\kappa-2) + 1\) and \(b= (1-\omega) (\kappa-2) + 1\).) If we are extremely certain about the prior probability of the model, then \(\kappa\) has a very large value, meaning that the prior distribution on \(\pi_1\) is tightly concentrated at \(\omega\), and the analysis converges to the point value assumed in the equations above. If we are uncertain about the prior probability of the model, then \(\kappa\) is a small-ish value, with \(\kappa \ge 2\). When \(\kappa=2\) the prior is a uniform distribution and we have "utter" uncertainty in the value of \(\pi_2\). Figure 2 shows a diagram of the model.

Figure 2: Disease diagnosis with uncertainty in the base rates of the diseases. See Figure 1 for description of the lower part of the diagram. Near the top of the diagram is the base rate of the diseases, where the prior probability of having the disease is denoted \(\pi_1\). There is uncertainty in the value of \(\pi_1\) represented by the beta distribution at the top of the diagram. The beta distribution has mode \(\omega\) and concentration \(\kappa\).

Notice that the overall model involves two parameters: \(M\) and \(\pi_1\). Consider Bayes' rule for the two-parameter space (as pointed out to me by Tomasz Miasko):
\begin{align}
p( M , \pi_1 | D )
&= p( D | M , \pi_1 ) p( M , \pi_1 ) / p^*(D)
\notag
\\
&= p( D | M ) p( M | \pi_1 ) p( \pi_1 ) / p^*(D)
\label{eq:pMpi1gD}
\end{align}
where, and this is crucial, \(p^*(D)\) is the double integral over the two-parameter space:
\begin{align}
p^*(D)
&=
\sum_m \int_{\pi_1=0}^{\pi_1=1} d\pi_1 p( D | M ) p( M | \pi_1 ) p( \pi_1 )
\label{eq:pD2param}
\end{align}
Notice that \(p^*(D)\) is a constant, not a function of \(\pi_1\).

To compute the posterior probability of \(M\) given data \(D\), we marginalize the two-parameter posterior of Equation \ref{eq:pMpi1gD}:
\begin{align}
p(M|D)
&=
\int_{\pi_1=0}^{\pi_1=1} d\pi_1 p( M , \pi_1 | D )
\notag
\\
&=
\int_{\pi_1=0}^{\pi_1=1} d\pi_1 p( D | M ) p( M | \pi_1 ) p( \pi_1 ) / p^*(D)
\notag
\\
&=
\frac{ p( D | M )}{ p^*(D) } \int_{\pi_1=0}^{\pi_1=1} d\pi_1  p( M | \pi_1 ) p( \pi_1 )
\label{eq:pMgD2param}
\end{align}
Notice in Equation \ref{eq:pMgD2param} (above) that \(p^*(D)\) was pulled in front of the integral sign, which is valid because \(p^*(D)\) is a constant. For the two models, Equation \ref{eq:pMgD2param} becomes
\begin{align}
p(M=1|D)
&=
\frac{ p( D | M=1 )}{ p^*(D) }
\int_{\pi_1=0}^{\pi_1=1} d\pi_1  \pi_1 p( \pi_1 )
\notag
\\
&\mbox{and}
\notag
\\
p(M=2|D)
&=
\frac{ p( D | M=2 )}{ p^*(D) }
\int_{\pi_1=0}^{\pi_1=1} d\pi_1  (1-\pi_1) p( \pi_1 )
\notag
\end{align}
The posterior odds are therefore
\begin{align}
\frac{ p(M=1|D) }{ p(M=2|D) }
&=
\frac{ p( D | M=1 ) }{ p( D | M=2 ) }
\times
\frac{ \int_{\pi_1=0}^{\pi_1=1} d\pi_1  \pi_1 p( \pi_1 ) }{ \int_{\pi_1=0}^{\pi_1=1} d\pi_1  (1-\pi_1) p( \pi_1 )}
\label{eq:postOdds2param}
\end{align}

In particular, when the distribution \(p( \pi_1 )\) is uniform with \(p( \pi_1 )=1.0\), the prior odds in Equation \ref{eq:postOdds2param} are 1/1. Thus, when there is complete uncertainty on \(\pi_1\) as expressed by a uniform distribution, then the posterior odds match the Bayes factor.

[What was wrong with the derivation of the previous post superseded by this post? The previous derivation began with the wrong conditional, but I didn't realize it because the notation suppressed the conditioned variable and hid the problem. The previous derivation started with \(p(M=1|D)\) without explicitly noting that it was conditional on \(\pi_1\). Therefore it was easy to think, in fact difficult not to think, that the derivation began with \(p(M=1,\pi_1|D)\) when in fact it began with \(p(M=1|D,\pi_1)\). Thus, when integrating over \(\pi_1\), the result is \(\int d\pi_1 \, p(M=1|D,\pi_1) \) and not the desired \(\int d\pi_1 \, p(M=1,\pi_1|D) \). Thus, the derivation of the previous post derived some strange quantity and not the posterior probability of \(M\).]


Numerical results from R and JAGS

Here are some results from the R script for JAGS appended below. In all cases, the mode of the prior on \(\pi_1\) is \(\omega=0.01\).

When the prior on \(\pi_1\) is extremely concentrated around \(\omega=0.01\), with \(\kappa=20000\), a run of JAGS produces
P(M=1|D)=0.165; P(M=2|D)=0.835; P(M=1|D)/P(M=2|D)=0.198
This result essentially matches the numerical result in the opening paragraphs of this post, which assumed a singular value for \(\pi_1\) at 0.01.

When the prior on \(\pi_1\) is only modestly concentrated around \(\omega=0.01\), with \(\kappa=20\), a run of JAGS produces
P(M=1|D)=0.547; P(M=2|D)=0.453; P(M=1|D)/P(M=2|D)=1.21
Notice that the posterior odds are now slightly in favor of having the disease, but are far from the Bayes factor of 19.4.


When the prior on \(\pi_1\) is weakly concentrated around \(\omega=0.01\), with \(\kappa=10\), a run of JAGS produces
P(M=1|D)=0.701; P(M=2|D)=0.299; P(M=1|D)/P(M=2|D)=2.34
Notice that the posterior odds are leaning in favor of having the disease, but have still not exceeded a ratio of 3, and are still far from the Bayes factor of 19.4.


When the prior on \(\pi_1\) is completely uncertain around \(\omega=0.01\), that is, a uniform distribution with \(\kappa=2\), a run of JAGS produces
P(M=1|D)=0.951; P(M=2|D)=0.0493; P(M=1|D)/P(M=2|D)=19.3
Here, with utter uncertainty in the prior probabilities of the models, the posterior odds match the Bayes factor, as was commented just after Equation \ref{eq:postOdds2param}.

Recap: The posterior odds of the models are not the same thing as the Bayes factor of the models. Decisions should be based on the posterior probabilities. The Bayes factor does not take into account the prior probabilities of the models (and despite its name the Bayes factor by itself does not even use Bayes' rule). The derivations and numerical examples above have shown that even with a lot of uncertainty in the prior probabilities on models, the posterior odds can be quite different than the Bayes factor. The posterior odds numerically equal the Bayes factor when (i) there is utter uniform uncertainty in the prior probabilities of the models or (ii) when there is complete prior knowledge that the models have exactly equal prior probability. I suspect that neither of those conditions occurs in reality. [Revised 28-Dec-2015, 9:35am] The posterior odds numerically equal the Bayes factor when (as a broad sufficient but not necessary condition) the uncertainty of the model prior probability is symmetric and known to have a mode at exactly 0.5, which I suspect rarely occurs in reality.

The R and JAGS script:

# Original script by Tomasz Miasko 24-Dec-2015 tomasz.miasko@gmail.com
# Elaborated by John Kruschke 26-Dec-2015 johnkruschke@gmail.com

rm(list=ls())

library(rjags)

# JAGS model for disease diagnosis example, 27-December-2015,
# http://doingbayesiandataanalysis.blogspot.com/

# Datum D is outcome of diagnostic test(s) for a single individual.
# D=1 suggests disease is present, D=0 suggests disease is absent.
# pD1gM[m] is probability of D=1 given model index m.
# m=1 means disease is truly present, m=2 means disease is truly absent.

modelString = "
model {
  for ( i in 1:length(D) ) {
    D[i] ~ dbern( pD1gM[m] ) # pD1gM[1:2] are constants provided in data list.
  }
  m ~ dcat( pM[1:2] )   # model index probability.
  pM[1] ~ dbeta( omega*(kappa-2)+1 ,
                (1-omega)*(kappa-2)+1 ) # prior probability for m=1
  pM[2] <- 1-pM[1]      # prior probability for m=2
  # omega and kappa are constants provided in data list.
}
"

# Specify constants:
# The result of the diagnostic test(s) for a single individual:
D = c(rep(1,1),rep(0,0)) # use D=NA for prior
# Probability D=1 for m=1,2; i.e.,
# correct detection rate and false alarm rate:
pD1gM = c(0.97,0.05) # high value for disease present, low value for absent
# Modal base rate of disease present:
omega = 0.01
# Certainty of modal base rate (kappa >= 2):
kappa = c(2,20,20000)[3]

model = jags.model(
  file=textConnection(modelString),
  data=list( D=D , pD1gM=pD1gM , omega=omega , kappa=kappa ),
  n.chains=4)
update( object=model, n.iter=1000 )
samples = coda.samples(
  model=model,
  variable.names=c("m","pM"),
  n.iter=125000) # total MCMC steps = n.chains*n.iter
chain = as.matrix(samples)

pM1gD = sum(chain[,"m"]==1)/length(chain[,"m"])
pM2gD = sum(chain[,"m"]==2)/length(chain[,"m"])

print( paste0( "For D = ", sum(D) , " 1's and " , length(D)-sum(D) , " 0's" ,
              ": P(M=1|D)=" , signif(pM1gD,3) ,
              "; P(M=2|D)=" , signif(pM2gD,3) ,
              "; P(M=1|D)/P(M=2|D)=", signif(pM1gD/pM2gD,3) ) )
hist( chain[,"pM[1]"] , breaks=51 ,
      main=bquote("p( pM[1] | D=" *.(sum(D))* " 1's and " 
                    *.(length(D)-sum(D))* " 0's )" ) ,
      col="grey" , border="grey" , xlim=c(0,1) )

Tuesday, December 22, 2015

Bayesian Item Response Theory in JAGS: A Hierarchical Two Parameter Logistic Model

I recently created a hierarchical two-parameter logistic model for item response theory (IRT). The JAGS script is now in the folder of scripts that accompany the book (available at the book's web site; click book cover at right). 

Below are slides that accompany my presentation of the material. I hope the slides are self-explanatory for those of you who are already familiar with IRT, and maybe even for those of you who are not. Maybe one day I'll record a narration over the slides and post a video. Meanwhile, I hope the slides below are useful.

By the way, if you find this program to be useful and adapt it to your own real data, please let me know about the results. (And if you write up the results, it's okay to cite this blog post and the book :-)

All analyses begin with the data. The data are correct/wrong answers for items on a test. The JAGS program will use long format for the data:
(Click on any image to enlarge.)

Each item is modeled as producing a 1/0 response with probability that depends on the item's difficulty and the subject's ability:




There is indeterminacy in the difficulty/ability scale, and two points must be arbitrarily pinned down. I chose the following: The most difficult not-impossible item is given a difficulty of 100, and the easiest non-trivial item is given a difficulty of 0. Then I opted to put the subject abilities and item discriminations under higher-level distributions, to get the benefits of shrinkage in a hierarchical model:

Here is the JAGS script itself (again, see the program folder that accompanies the book):





Now for the data used to demonstrate the model:

Results from the JAGS run. First, the MCMC diagnostics all look good:
(Click on any image to the images enlarged.)


Here are the estimates of individual items, listed in order of estimated difficulty. Notice that the easiest has difficulty fixed at 0, and the most difficult has  difficulty fixed at 100.













Subject abilities:


 

Two of the (simulated) subjects have data from only a single item. But this does not leave the estimated ability unbounded because of shrinkage from the hierarchical model, and the Bayesian estimation provides exact degrees of uncertainty on the estimates:

And in Bayesian estimation it is easy to make comparisons of item difficulties, or of item discriminations, or of subject abilities: