Monday, February 27, 2012

JAGS code for Exercise 12.1 (initializing vectors with nonstochastic components)

Exercise 12.1 provides code snippets for modifying the BUGS code, but not for modifying the JAGS code. Here are the corresponding JAGS modifications. The primary issue is how to initialize vectors that have some stochastic and some nonstochastic components.

In the model specification, change the prior on the mu[j] as indicated in the exercise:

# Hyperpriors for mu and kappa:
## Four mu version:
#for ( j in 1:nCond ) {
# mu[j] ~ dbeta( a[j,mdlIdx] , b[j,mdlIdx] )
#}
# Two mu version:
for ( j in 1:2 ) {
  mu[j] ~ dbeta( a[j,mdlIdx] , b[j,mdlIdx] )
}
mu[3] <- mu[2]
mu[4] <- mu[2]


From p. 315 of the book: "... when initializing chains in BUGS [and JAGS!], only stochastic variables have random chains and therefore only the stochastic variables are initialized. Any variable that has a value set by the assignment operator, “<-”, has a deterministic, not stochastic value. But sometimes, as in the present application, a vector has some stochastic components and some nonstochastic components. The way to initialize such vectors is to provide a specific value for the stochastic components, and an NA value to the nonstochastic components..."

Hence, in the initialization section, include the following code:

sqzData = .01+.98*dataList$nCorrOfSubj/dataList$nTrlOfSubj
mu = aggregate( sqzData , list(dataList$CondOfSubj) , "mean" )[,"x"]
sd = aggregate( sqzData , list(dataList$CondOfSubj) , "sd" )[,"x"]
kappa = mu*(1-mu)/sd^2 - 1
initsList = list( theta = sqzData ,
 # For four mu version:
 # mu = mu ,
 # For two mu version: initialize the stochastic components mu[1] and mu[2],
 # and supply NA for the nonstochastic components mu[3] and mu[4]:
 mu = c( mu[1] , mean( mu[2:4] ) , NA , NA ) ,
 mu0 = mean(mu) ,
 kappa = kappa ,
 mdlIdx = 1 )


And, in the run-the-chains section, uncomment the explicit initialization:

jagsModel = jags.model( "model.txt" , data=dataList , inits=initsList ,
                        n.chains=nChains , n.adapt=adaptSteps )


If you copy and paste from this page, it would be best not to directly paste into your R script because extraneous font formatting may bother the R interpreter. To avoid this problem, copy from here but then paste into a plain-text editor such as Notepad in Windows, which will strip away the font info. Then copy and paste from the plain-text editor into R.

Sunday, February 26, 2012

Autocorrelation in transdimensional MCMC with pseudoprior

The model-index chain can be badly autocorrelated even with pseudopriors.
Chapter 10 of the book discusses model comparison by transdimensional MCMC in JAGS/BUGS. There can be extreme autocorrelation in the model index. The autocorrelation can be reduced by using pseudopriors, but even with pseudopriors the autocorrelation in the model index can still be severe. The immediate consequence is that the estimate of the model-index posterior probability can be very unstable. You may notice this instability if you run the program of Exercise 10.3 several times in JAGS, which, unlike BUGS, generates a different MCMC chain every time it is run. How can this problem of unstable model-index estimates be solved?

(N.B.: Chains for continuous parameters within models can be well mixed and stable, even when the chain for the model index is badly autocorrelated.)
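
As a quick check, you can inspect the autocorrelation of the model-index chain directly. Here is a minimal R sketch, assuming the coda samples from the Exercise 10.3 program are stored in codaSamples and the model index is monitored under the node name "mdlIdx" (both names are assumptions, not necessarily those in the actual program):

# Extract the model-index chain and examine its autocorrelation:
mdlIdxChain = as.matrix( codaSamples )[ , "mdlIdx" ]
acf( mdlIdxChain )        # plot the autocorrelation function
mean( mdlIdxChain == 1 )  # estimated posterior probability of model 1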

* Candidate Answer 1: Just use very long chains. In principle this is fine. In practice, even a moderately long chain can exceed a computer's memory. For example, I ran the model of Exercise 10.3, which has about 180 parameters, for 100,000 steps, and my aging desktop couldn't handle it. (I don't know whether it was a hardware or a software memory constraint.) Therefore, this is one of those cases in which thinning is useful -- because of computer constraints, not because it reduces variability in the estimate. Run the chain a very long time, but thin it so only a fraction of the chain is stored. (A sketch of this appears after this list.)

* Candidate Answer 2: Instead of transdimensional MCMC, estimate p(D|M) directly within each model. One approach to doing this is provided in Section 7.3.3 (p. 137) of the book. This method requires selecting a function h(parameters) that mimics the posterior. This is much like the creation of a pseudoprior in transdimensional MCMC. The book does not discuss how to construct h(parameters) for complex models. Essentially, it is the product of the pseudopriors at the top level of the model, times the conditional probabilities for the parameter dependencies, times the likelihood: h(parameters) = p(D|param) * p(param|hyperparam) * pseudoprior(hyperparam). (A small worked example appears after this list.)

* Candidate Answer 3: Use some other method to estimate p(D|M) directly within each model. See, for example, Ch. 11 of Ntzoufras (2009).

* Candidate Answer 4: Don't do model comparison, and instead just look at the parameter estimates within the more elaborate model. If you are considering nested models, the more elaborate model probably tells you everything the model comparison tells you, and more (as is discussed at length in the book, e.g., Section 10.3).
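
Regarding Candidate Answer 1, here is a minimal sketch of thinning with rjags, assuming the jagsModel object from the earlier post; the monitored node names are placeholders, not necessarily those in the actual Exercise 10.3 program:

library(rjags)
# Run a very long chain but store only every 50th step, so that
# 500,000 generated steps occupy only 10,000 stored steps per chain:
codaSamples = coda.samples( jagsModel ,
                            variable.names=c("mdlIdx","mu","kappa") ,
                            n.iter=500000 , thin=50 )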
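
Regarding Candidate Answer 2, here is a minimal sketch of the 1/p(D) estimator of Section 7.3.3, in a toy beta-Bernoulli example where p(D) is known exactly so the estimate can be checked. All names and constants are made up for illustration, and h() is chosen to be exactly the posterior density, which in real applications is only approximated:

# Data: z successes in N trials; prior is beta(a,b).
z = 7 ; N = 10 ; a = 1 ; b = 1
# The posterior is beta(z+a,N-z+b); sample it directly to stand in for an MCMC chain:
theta = rbeta( 50000 , z+a , N-z+b )
# h() mimics the posterior:
hDens = dbeta( theta , z+a , N-z+b )
# Estimator: 1/p(D) = mean over posterior samples of h(theta)/(p(D|theta)*p(theta))
likelihood = theta^z * (1-theta)^(N-z)
priorDens = dbeta( theta , a , b )
pD = 1 / mean( hDens / ( likelihood * priorDens ) )
# Exact value for comparison: p(D) = B(z+a,N-z+b)/B(a,b)
exactPD = beta( z+a , N-z+b ) / beta( a , b )
c( pD , exactPD )  # should agree closely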

Saturday, February 25, 2012

Why the puppies?

The posterior distribution is a compromise between the likelihood and the prior: The half-up ears are a compromise between the perky ears and the floppy ears. The marginal likelihood, a.k.a. evidence, is not needed in MCMC methods, so it gets sleepy with nothing much to do.

If the puppies bother you, a solution can be found at this other blog post.

Wednesday, February 22, 2012

Talk and Workshop at Vassar College, March 23-24

I'll be doing a talk and workshop at Vassar College. Details can be found here.


A list of future and past workshops can be found here.


(Poster composed entirely by folks at Vassar.)



Monday, February 20, 2012

Hierarchical Diagram for Pseudoprior Model in Ch. 10

Chapter 10 of the book describes the use of pseudopriors in model comparison to help the model-index parameter be sampled more efficiently (with less clumpy autocorrelation). The method was discussed without a diagram to illustrate where the pseudoprior has its effect in the model structure. This blog entry provides a diagram.


The extended example in Chapter 10 examined two models of the filtration-condensation data. One model used a separate κc for each condition. The second model used a single κ0 parameter for all conditions. The program that implements the model comparison is called FilconModelCompPseudoPriorJags.R (or FilconModelCompPseudoPriorBrugs.R originally). The diagram shown above extends Figures 9.15 and 9.17 of the book to include the two models' priors on κ. See the top of the diagram, which forks under the modelIdx into gamma distributions for κc or κ0. The main thing you have to do, which is not shown in the diagram, is imagine how the shape of the gamma distribution changes depending on the state of the model index. This is suggested by the text beside the gamma distributions. When the model index is 1, the left gamma distribution is actually being used to model the data, and the real prior constants are used for it. At the same time, the right gamma distribution is not being used to model the data, so the pseudoprior constants are used for it, to keep the κ0 value in a reasonable range. See the two variations below, which illustrate the cases when the model index is 1 and 2, respectively:
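
To make the fork concrete, here is a minimal sketch of the idea in JAGS model code. The node names and indexing scheme are placeholders for illustration; the actual names and the prior/pseudoprior constants are in FilconModelCompPseudoPriorJags.R:

# Gamma prior constants indexed by the model index, so each kappa gets
# its true-prior constants when its model is active and its pseudoprior
# constants (mimicking its posterior) when it is not:
mdlIdx ~ dcat( modelProb[1:2] )
for ( j in 1:nCond ) {
  kappa1[j] ~ dgamma( shk1[mdlIdx] , rak1[mdlIdx] )  # model 1: kappa per condition
}
kappa0 ~ dgamma( shk0[mdlIdx] , rak0[mdlIdx] )       # model 2: single kappa
# The likelihood then selects which kappa it uses via equals():
for ( j in 1:nCond ) {
  kappa[j] <- equals(mdlIdx,1)*kappa1[j] + equals(mdlIdx,2)*kappa0
}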

Sunday, February 12, 2012

The Book visits Bayes

On his recent visit to Bunhill Fields, Mark Andrews snapped a photo of the book at Bayes' tomb:


Many thanks to Mark!
And see the book at Fisher's remains, too.

Saturday, February 4, 2012

Talk and Workshop at the Eastern Psychological Association, March 2 - 3

I'll be doing a talk and workshop at the meeting of the Eastern Psychological Association. Details can be found here.
A list of future and past workshops can be found here.