Saturday, June 30, 2012
Courses that use Doing Bayesian Data Analysis?
Instructors seek examples of using the book, Doing Bayesian Data Analysis, as part of a course. A cursory web search yielded these few listed below, but there must be others. For example, my own course web page did not show up in the search, nor did another course from which a published review of the book arose. Please let me know of other courses that use the book, and I will update this list in a subsequent post.
- Applied Bayesian Modeling for the Social Sciences, Dave Armstrong
- Bayesian Methods, Nick Beauchamp
- Statistical & Cognitive Modeling for Formal Semantics, Adrian Brasoveanu
- Introduction to Bayesian Analysis, Brad Carlin
- Doing Bayesian Data Analysis, John Kruschke
- Applied Bayesian Statistics, Marco Steenbergen
- Statistics VI, Tuerlinckx and Vanpaemel (from which this book review arose)
- Bayesian Data Analysis, Uppsala University
- Topics in Quantitative Psychology: Bayesian Data Analysis (no web page), Caren Rotello, U. of Massachusetts
Monday, June 25, 2012
Workshop, July 11-13, U. Wisconsin, Madison
Wednesday, June 20, 2012
Solutions to exercises now available to everyone
For solutions to the exercises, please see the book's web site here (https://sites.google.com/site/doingbayesiandataanalysis/exercises).
Wednesday, June 6, 2012
Mixture of Normal Distributions
In this post I show a simple illustration of a mixture of normal distributions. For the examples, we have metric values that we suppose were generated by a mixture of two different normal distributions, which I'll call clusters. We don't know which cluster each datum came from. Our goal is to estimate, for each datum, the probability that it came from each of the two clusters, along with the means and standard deviation of the normal distributions that describe the clusters.
The model specification (for JAGS): The model assumes that the clusters have the same standard deviation, but different means.
model {
  # Likelihood:
  for ( i in 1:N ) {
    y[i] ~ dnorm( mu[i] , tau )
    mu[i] <- muOfClust[ clust[i] ]
    clust[i] ~ dcat( pClust[1:Nclust] )
  }
  # Prior:
  tau ~ dgamma( 0.01 , 0.01 )
  for ( clustIdx in 1:Nclust ) {
    muOfClust[clustIdx] ~ dnorm( 0 , 1.0E-10 )
  }
  pClust[1:Nclust] ~ ddirch( onesRepNclust )
}
The data specification:
# Generate random data from known parameter values:
set.seed(47405)
trueM1 = 100
N1 = 200
trueM2 = 145 # 145 for first example below; 130 for second example
N2 = 200
trueSD = 15
effsz = abs( trueM2 - trueM1 ) / trueSD
y1 = rnorm( N1 )
y1 = (y1-mean(y1))/sd(y1) * trueSD + trueM1
y2 = rnorm( N2 )
y2 = (y2-mean(y2))/sd(y2) * trueSD + trueM2
y = c( y1 , y2 )
N = length(y)
# Must have at least one data point with fixed assignment
# to each cluster, otherwise some clusters will end up empty:
Nclust = 2
clust = rep(NA,N)
clust[which.min(y)]=1 # smallest value assigned to cluster 1
clust[which.max(y)]=2 # highest value assigned to cluster 2
dataList = list(
  y = y ,
  N = N ,
  Nclust = Nclust ,
  clust = clust ,
  onesRepNclust = rep(1,Nclust)
)
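The post does not show the sampling code itself, so here is a minimal rjags sketch of how the model and data above might be run; the model-file name, number of chains, and iteration counts are my own illustrative choices, not taken from the post.
library(rjags)
# Hold the model specification shown above in a string and write it to a file
# that JAGS can read (the file name is an arbitrary choice):
modelString = "
model {
  # Likelihood:
  for ( i in 1:N ) {
    y[i] ~ dnorm( mu[i] , tau )
    mu[i] <- muOfClust[ clust[i] ]
    clust[i] ~ dcat( pClust[1:Nclust] )
  }
  # Prior:
  tau ~ dgamma( 0.01 , 0.01 )
  for ( clustIdx in 1:Nclust ) {
    muOfClust[clustIdx] ~ dnorm( 0 , 1.0E-10 )
  }
  pClust[1:Nclust] ~ ddirch( onesRepNclust )
}
"
writeLines( modelString , con="mixture_model.txt" )
# Compile the model, let JAGS adapt its samplers, and burn in:
jagsModel = jags.model( file="mixture_model.txt" , data=dataList ,
                        n.chains=3 , n.adapt=500 )
update( jagsModel , n.iter=1000 )
# Draw posterior samples of the cluster means, precision, mixture weights,
# and per-datum cluster assignments:
codaSamples = coda.samples( jagsModel ,
                            variable.names=c("muOfClust","tau","pClust","clust") ,
                            n.iter=5000 )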
Results when mean of cluster 2 is 3 standard deviations away from mean of cluster 1: The posterior recovers the generating values fairly well.
Figure: Upper panel: the data with the underlying normal generators. Lower panel: for each datum, the posterior probability that it is assigned to cluster 2.
Figure: Marginal posterior on the cluster means and SD.
Figure: Pairs plot of the cluster means and SD.
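As a sketch of how the lower panel of the first figure could be computed from the MCMC output (assuming the clust[i] nodes were monitored, as in the rjags sketch above):
# Combine all chains into one matrix; columns are named e.g. "clust[1]", "clust[2]", ...
mcmcMat = as.matrix( codaSamples )
clustCols = grep( "^clust\\[" , colnames(mcmcMat) )
# For each datum, the posterior probability of assignment to cluster 2 is the
# proportion of MCMC steps on which its sampled label equals 2:
pEachDatumInClust2 = colMeans( mcmcMat[ , clustCols ] == 2 )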
Results when mean of cluster 2 is 2 standard deviations away from mean of cluster 1: There is lots of uncertainty. See captions for discussion.
Figure: Notice the bimodal distribution of sigma (SD).
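The model is parameterized by the precision tau, so the SD shown in the figure is a transformation of the sampled values. A small sketch of the conversion, reusing the mcmcMat from the sketch above:
# The normal likelihood in JAGS uses precision tau; convert each sample to an SD:
sigma = 1 / sqrt( mcmcMat[ , "tau" ] )
hist( sigma , breaks=50 , main="Posterior of sigma (SD)" )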
Friday, June 1, 2012
Beta distribution parameterized by mode instead of mean
In this post, I describe how it is easier to intuit the beta distribution in terms of its mode than its mean. This is especially handy when specifying a prior beta distribution.
(In a previous post, I explained how it is easier to intuit the gamma distribution in terms of its mode instead of its mean.)
A problem with using the mean to describe a distribution is that, for skewed distributions, the mean can be far from the mode. The mode is often what we intuitively want as the "descriptive handle" on the distribution, so the mean is not a good surrogate for a description of central tendency. Especially when specifying a prior distribution, we may want to express our intuition in terms of the mode of the prior instead of its mean.
For a beta distribution with shape parameters a and b, the mode is (a-1)/(a+b-2). Suppose we have a desired mode, and we want to determine the corresponding shape parameters. Here's the solution. First, we express the "certainty" of the estimate in terms of the equivalent prior sample size,
k=a+b, with k≥2.
The certainty must be at least 2 because it essentially assumes that the prior contains at least one "head" and one "tail," which is to say that we know each outcome is at least possible. Then a little algebra reveals:
a = mode * (k-2) + 1
b = (1-mode) * (k-2) + 1
Here are a few examples:
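For instance, here is a small R sketch (my own illustration, not from the original post; the function name is made up) that converts a desired mode and certainty k into the shape parameters:
# Convert a desired mode and certainty k (equivalent prior sample size, k >= 2)
# into beta shape parameters, using the formulas above:
betaShapesFromMode = function( mode , k ) {
  if ( mode < 0 | mode > 1 ) stop("mode must be in [0,1]")
  if ( k < 2 ) stop("k must be at least 2")
  a = mode * (k-2) + 1
  b = (1-mode) * (k-2) + 1
  c( a=a , b=b )
}
betaShapesFromMode( mode=0.80 , k=12 )  # a=9, b=3: mode is (a-1)/(a+b-2) = 0.80
betaShapesFromMode( mode=0.50 , k=2 )   # a=1, b=1: the uniform distribution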
The book expressed beta distributions in terms of mean and certainty instead of mode and certainty; cf. Eqn. 5.5, p. 83, where m denoted the mean and n denoted the certainty, rather than the k used here.