Saturday, June 30, 2012
Courses that use Doing Bayesian Data Analysis?
Instructors seek examples of using the book, Doing Bayesian Data Analysis, as part of a course. A cursory web search yielded these few listed below, but there must be others. For example, my own course web page did not show up in the search, nor did another course from which a published review of the book arose. Please let me know of other courses that use the book, and I will update this list in a subsequent post.
- Applied Bayesian Modeling for the Social Sciences, Dave Armstrong
- Bayesian Methods, Nick Beauchamp
- Statistical & Cognitive Modeling for Formal Semantics, Adrian Brasoveanu
- Introduction to Bayesian Analysis, Brad Carlin
- Doing Bayesian Data Analysis, John Kruschke
- Applied Bayesian Statistics, Marco Steenbergen
- Statistics VI, Tuerlinckx and Vanpaemel (from which this book review arose)
- Bayesian Data Analysis, Uppsala University
- Topics in Quantitative Psychology: Bayesian Data Analysis (no web page), Caren Rotello, U. of Massachusetts
Monday, June 25, 2012
Workshop, July 11-13, U. Wisconsin, Madison
Wednesday, June 20, 2012
Solutions to exercises now available to everyone
For solutions to the exercises, please see the book's web site here (https://sites.google.com/site/doingbayesiandataanalysis/exercises).
Wednesday, June 6, 2012
Mixture of Normal Distributions
In this post I show a simple illustration of a mixture of normal distributions. For the examples, we have metric values that we suppose were generated by a mixture of two different normal distributions, which I'll call clusters. We don't know which cluster each datum came from. Our goal is to estimate, for each datum, the probability that it came from each of the two clusters, along with the means and standard deviation of the normal distributions that describe the clusters.
The model specification (for JAGS): The model assumes that the clusters have the same standard deviation, but different means.
model {
  # Likelihood:
  for ( i in 1:N ) {
    y[i] ~ dnorm( mu[i] , tau )
    mu[i] <- muOfClust[ clust[i] ]
    clust[i] ~ dcat( pClust[1:Nclust] )
  }
  # Prior:
  tau ~ dgamma( 0.01 , 0.01 )
  for ( clustIdx in 1:Nclust ) {
    muOfClust[clustIdx] ~ dnorm( 0 , 1.0E-10 )
  }
  pClust[1:Nclust] ~ ddirch( onesRepNclust )
}
The data specification:
# Generate random data from known parameter values:
set.seed(47405)
trueM1 = 100
N1 = 200
trueM2 = 145 # 145 for first example below; 130 for second example
N2 = 200
trueSD = 15
effsz = abs( trueM2 - trueM1 ) / trueSD
y1 = rnorm( N1 )
y1 = (y1-mean(y1))/sd(y1) * trueSD + trueM1
y2 = rnorm( N2 )
y2 = (y2-mean(y2))/sd(y2) * trueSD + trueM2
y = c( y1 , y2 )
N = length(y)
# Must have at least one data point with fixed assignment
# to each cluster, otherwise some clusters will end up empty:
Nclust = 2
clust = rep(NA,N)
clust[which.min(y)]=1 # smallest value assigned to cluster 1
clust[which.max(y)]=2 # highest value assigned to cluster 2
dataList = list(
  y = y ,
  N = N ,
  Nclust = Nclust ,
  clust = clust ,
  onesRepNclust = rep(1,Nclust)
)
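The post does not show the sampling code itself, so here is a minimal rjags sketch of how the model and data above might be run; the model-file name, number of chains, and iteration counts are my own illustrative choices, not taken from the post.
library(rjags)
# Hold the model specification shown above in a string and write it to a file
# that JAGS can read (the file name is an arbitrary choice):
modelString = "
model {
  # Likelihood:
  for ( i in 1:N ) {
    y[i] ~ dnorm( mu[i] , tau )
    mu[i] <- muOfClust[ clust[i] ]
    clust[i] ~ dcat( pClust[1:Nclust] )
  }
  # Prior:
  tau ~ dgamma( 0.01 , 0.01 )
  for ( clustIdx in 1:Nclust ) {
    muOfClust[clustIdx] ~ dnorm( 0 , 1.0E-10 )
  }
  pClust[1:Nclust] ~ ddirch( onesRepNclust )
}
"
writeLines( modelString , con="mixture_model.txt" )
# Compile the model, let JAGS adapt its samplers, and burn in:
jagsModel = jags.model( file="mixture_model.txt" , data=dataList ,
                        n.chains=3 , n.adapt=500 )
update( jagsModel , n.iter=1000 )
# Draw posterior samples of the cluster means, precision, mixture weights,
# and per-datum cluster assignments:
codaSamples = coda.samples( jagsModel ,
                            variable.names=c("muOfClust","tau","pClust","clust") ,
                            n.iter=5000 )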
Results when mean of cluster 2 is 3 standard deviations away from mean of cluster 1: The posterior recovers the generating values fairly well.
Figure: Upper panel: the data with the underlying normal generators. Lower panel: for each datum, the posterior probability that it is assigned to cluster 2.
Figure: Marginal posterior on the cluster means and SD.
Figure: Pairs plot of the cluster means and SD.
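As a sketch of how the lower panel of the first figure could be computed from the MCMC output (assuming the clust[i] nodes were monitored, as in the rjags sketch above):
# Combine all chains into one matrix; columns are named e.g. "clust[1]", "clust[2]", ...
mcmcMat = as.matrix( codaSamples )
clustCols = grep( "^clust\\[" , colnames(mcmcMat) )
# For each datum, the posterior probability of assignment to cluster 2 is the
# proportion of MCMC steps on which its sampled label equals 2:
pEachDatumInClust2 = colMeans( mcmcMat[ , clustCols ] == 2 )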
Results when mean of cluster 2 is 2 standard deviations away from mean of cluster 1: There is lots of uncertainty. See captions for discussion.
Figure: Notice the bimodal distribution of sigma (SD).
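The model is parameterized by the precision tau, so the SD shown in the figure is a transformation of the sampled values. A small sketch of the conversion, reusing the mcmcMat from the sketch above:
# The normal likelihood in JAGS uses precision tau; convert each sample to an SD:
sigma = 1 / sqrt( mcmcMat[ , "tau" ] )
hist( sigma , breaks=50 , main="Posterior of sigma (SD)" )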
Friday, June 1, 2012
Beta distribution parameterized by mode instead of mean
In this post, I describe how it is easier to intuit the beta distribution in terms of its mode than its mean. This is especially handy when specifying a prior beta distribution.
(In a previous post, I explained how it is easier to intuit the gamma distribution in terms of its mode instead of its mean.)
A problem with using the mean to describe a distribution is that, for skewed distributions, the mean can be far from the mode. The mode is often what we intuitively want as the "descriptive handle" on the distribution, so the mean is not a good surrogate for a description of central tendency. Especially when specifying a prior distribution, we may want to express our intuition in terms of the mode of the prior instead of its mean.
For a beta distribution with shape parameters a and b, the mode is (a-1)/(a+b-2). Suppose we have a desired mode, and we want to determine the corresponding shape parameters. Here's the solution. First, we express the "certainty" of the estimate in terms of the equivalent prior sample size,
k=a+b, with k≥2.
The certainty must be at least 2 because it essentially assumes that the prior contains at least one "head" and one "tail," which is to say that we know each outcome is at least possible. Then a little algebra reveals:
a = mode * (k-2) + 1
b = (1-mode) * (k-2) + 1
Here are a few examples:
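For instance, here is a small R sketch (my own illustration, not from the original post; the function name is made up) that converts a desired mode and certainty k into the shape parameters:
# Convert a desired mode and certainty k (equivalent prior sample size, k >= 2)
# into beta shape parameters, using the formulas above:
betaShapesFromMode = function( mode , k ) {
  if ( mode < 0 | mode > 1 ) stop("mode must be in [0,1]")
  if ( k < 2 ) stop("k must be at least 2")
  a = mode * (k-2) + 1
  b = (1-mode) * (k-2) + 1
  c( a=a , b=b )
}
betaShapesFromMode( mode=0.80 , k=12 )  # a=9, b=3: mode is (a-1)/(a+b-2) = 0.80
betaShapesFromMode( mode=0.50 , k=2 )   # a=1, b=1: the uniform distribution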
The book expressed beta distributions in terms of mean and certainty instead of mode and certainty; cf. Eqn. 5.5, p. 83, where m denoted the mean and n denoted the certainty, rather than the k used here.