Friday, May 10, 2013

Harp, Percussion, Shakespeare, AND Bayesian Data Analysis

There will be a week-long introductory course in doing Bayesian data analysis, July 15-29 (2013), at Indiana University. There are many art, music, theater and social events happening that same week, including the International Harp Competition, the IU Percussion Academy, and many others. Check the web page for complete course info and links to events calendars:
http://www.indiana.edu/~jkkteach/WorkshopICPSR2013.html
The Bayesian course is offered through the Inter-university Consortium for Political and Social Research (ICPSR).

* Please forward this announcement to interested persons or lists. *

Saturday, April 27, 2013

Bayesian estimation of log-normal parameters

Using the log-normal density can be confusing because it's parameterized in terms of the mean and precision of the log-scale data, not the original-scale data. Thus, if your data, y, are nicely described by a log-normal distribution, the estimated mean and precision are for log(y), not y. To get the corresponding parameters for the original-scale data, you have to transform the estimated parameters. And the prior on the mean and precision have to be appropriate for log(y), as opposed to y. This post shows an example of a JAGS program (with rjags) for estimating the parameters of log-normal distributed data.

The data for the example are shown as the (pink) histogram in the top left panel. A smattering of credible log-normal distributions is superimposed (as blue curves) on the data:
The upper-middle panel shows that the mean of the log-normal distribution is estimated to be 5.0. Notice that a value of 5.0 is nowhere near the bulk of the data, because the mean is for the log data, not the original-scale data. The transformed parameters are shown in the lower panels. Thus, the mode of the log-normal distribution, on the original scale, is about 115, and the mean of the log-normal distribution, on the original scale, is about 169.

The complete program I used is included below. Notice, in particular, that the priors are set to be vague on the scale of the log data.


graphics.off()
rm(list=ls(all=TRUE))
fileNameRoot="YmetricXsingleJagsDLNORM" # for constructing output filenames
source("openGraphSaveGraph.R")
source("plotPost.R")
require(rjags)         # Kruschke, J. K. (2011). Doing Bayesian Data Analysis:
                       # A Tutorial with R and BUGS. Academic Press / Elsevier.
#------------------------------------------------------------------------------
# THE MODEL.
modelstring = "
model {
  for( i in 1 : N ) {
    y[i] ~ dlnorm( muOfLogY , 1/sigmaOfLogY^2 )
  }
  sigmaOfLogY ~ dunif( 0.001*sdOfLogY , 1000*sdOfLogY )
  muOfLogY ~ dnorm( meanOfLogY , 0.001*1/sdOfLogY^2 )
  muOfY <- exp(muOfLogY+sigmaOfLogY^2/2)
  modeOfY <- exp(muOfLogY-sigmaOfLogY^2)
  sigmaOfY <- sqrt(exp(2*muOfLogY+sigmaOfLogY^2)*(exp(sigmaOfLogY^2)-1))
}
" # close quote for modelstring
writeLines(modelstring,con="model.txt")

#------------------------------------------------------------------------------
# THE DATA.

# Generate random data from known parameter values:
set.seed(47405)
trueM = 5.0
trueSD = 0.5
y = rnorm( n=125 )  # R dnorm uses mean and SD
LogY = (y-mean(y))/sd(y)*trueSD + trueM
y = exp(LogY)

dataList = list(
    y = y ,
    N = length(LogY) ,
    meanOfLogY = mean(LogY) ,
    sdOfLogY = sd(LogY)
)


#------------------------------------------------------------------------------
# INTIALIZE THE CHAINS.

# Let JAGS do it

#------------------------------------------------------------------------------
# RUN THE CHAINS


parameters = c("muOfLogY" , "sigmaOfLogY" , "muOfY" , "modeOfY" , "sigmaOfY" ) 
adaptSteps = 1000              # Number of steps to "tune" the samplers.
burnInSteps = 1000            # Number of steps to "burn-in" the samplers.
nChains = 3                   # Number of chains to run.
numSavedSteps=30000           # Total number of steps in chains to save.
thinSteps=1                   # Number of steps to "thin" (1=keep every step).
nPerChain = ceiling( ( numSavedSteps * thinSteps ) / nChains ) # Steps per chain.
# Create, initialize, and adapt the model:
jagsModel = jags.model( "model.txt" , data=dataList ,
                        n.chains=nChains , n.adapt=adaptSteps )
# Burn-in:
cat( "Burning in the MCMC chain...\n" )
update( jagsModel , n.iter=burnInSteps )
# The saved MCMC chain:
cat( "Sampling final MCMC chain...\n" )
codaSamples = coda.samples( jagsModel , variable.names=parameters ,
                            n.iter=nPerChain , thin=thinSteps )
# resulting codaSamples object has these indices:
#   codaSamples[[ chainIdx ]][ stepIdx , paramIdx ]

#------------------------------------------------------------------------------
# EXAMINE THE RESULTS

checkConvergence = FALSE
if ( checkConvergence ) {
  openGraph(width=7,height=7)
  autocorr.plot( codaSamples[[1]] , ask=FALSE )
  show( gelman.diag( codaSamples ) )
  effectiveChainLength = effectiveSize( codaSamples )
  show( effectiveChainLength )
}

# Convert coda-object codaSamples to matrix object for easier handling.
# But note that this concatenates the different chains into one long chain.
# Result is mcmcChain[ stepIdx , paramIdx ]
mcmcChain = as.matrix( codaSamples )
chainLength = NROW(mcmcChain)

openGraph(width=10,height=6)
layout(matrix(1:6,nrow=2,byrow=TRUE))
# posterior predictive
hist( dataList$y , xlab="y" , main="Data w. Post. Pred." , breaks=30 ,
      col="pink" , border="white" , prob=TRUE , cex.lab=1.5)
pltIdx = floor(seq(1,chainLength,length=20))
xComb = seq( min(dataList$y) , max(dataList$y) , length=501 )
for ( chnIdx in pltIdx ) {
  lines( xComb ,
         dlnorm( xComb, mcmcChain[chnIdx,"muOfLogY"], mcmcChain[chnIdx,"sigmaOfLogY"] ),
         col="skyblue" )
}
# param's of log(y)
postInfo = plotPost( mcmcChain[,"muOfLogY"] , xlab="mu of log(y)" )
postInfo = plotPost( mcmcChain[,"sigmaOfLogY"] , xlab="sigma of log(y)" )
# param's of y
postInfo = plotPost( mcmcChain[,"modeOfY"] , xlab="mode of y" )
postInfo = plotPost( mcmcChain[,"muOfY"] , xlab="mu of y" )
postInfo = plotPost( mcmcChain[,"sigmaOfY"] , xlab="sigma of y" , showMode=TRUE)

saveGraph(file=fileNameRoot,type="jpg")

#------------------------------------------------------------------------------


Monday, April 15, 2013

Four-day course in doing Bayesian data analysis, June 10-13

There will be a four-day introductory course in doing Bayesian data analysis, July 10-13 (2013), at the University of St. Gallen, Switzerland. The course is offered through the Summer School in Empirical Research Methods. Complete info is at this link: http://www.indiana.edu/~jkkteach/WorkshopStGallen2013.html

Saturday, April 13, 2013

Bayesian robust heteroscedastic hierarchical two-factor "ANOVA"

A great advantage of Bayesian analysis, in software such as JAGS/BUGS, is that you can create models that are appropriate to the data (instead of shoe-horning your data into inappropriate models). For example, suppose you have a design that would be analyzed traditionally by two-factor ANOVA. In traditional ANOVA, it is assumed that the data are distributed normally within cells and that the variances are the same for all cells. But suppose you have real data. In real data, there may be many outliers (i.e., distributions within cells have heavy non-normal tails) and the cells may have very different variances (i.e., there is "heteroscedasticity" across cells). In JAGS/BUGS, we can create a model that (1) accommodates outliers by using heavy-tailed distributions to describe data within cells, and (2) accommodates heteroscedasticity by using distinct variance parameters in each cell. And, in the program I present here, there is hierarchical structure that provides shrinkage on the cell means and cell variances.

The program is a variant of programs in DBDA, so I will not provide a self-contained tutorial in this post, instead relying on readers to consult Chapter 19 of DBDA for background information. Since writing DBDA, the ANOVA-style programs have been iteratively improved, as explained in various posts in this blog; search the blog with the term "ANOVA".

Important: The hierarchical structure in the model can produce implosive shrinkage for small data sets and/or priors that allow (or encourage) zero differences between cells. Implosive shrinkage is merely what the model implies, given the data, but it might not be what you think is a meaningful description of the data. If you get implosive shrinkage that you deem inappropriate or not useful, then you can change the prior structure so it more accurately reflects your modeling assumptions. For example, you could simply change the model so it has no hierarchical structure, but then, of course, you get no sharing of information across cells. The program presently uses prior distributions that de-emphasize zero differences between cells. This helps prevent implosive shrinkage, but it is also an expression of a prior belief that exactly-zero differences are improbable. In particular, the prior is not a half-Cauchy or folded-t distribution that some statisticians have recommended as a generic prior for hierarchical variance terms. Please see the comments inserted throughout the program.

As an example, consider a 3-by-4 two-factor between-subjects design. (And no, I don't have any within-subjects versions; sorry.) A link to the data file is included below. Here is a plot of the sample means:
Although not visible in the plot of the cell means, the cells have outliers and heteroscedastic variances. The results of a conventional ANOVA yield:


             Df Sum Sq Mean Sq F value   Pr(>F)    
x1            2    658   329.1   2.375   0.0939 .  
x2            3   3661  1220.2   8.804 1.02e-05 ***
x1:x2         6   1446   241.0   1.739   0.1096    
Residuals   588  81495   138.6                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Notice, above, the non-significant (overall) interaction. Using the Bayesian program, we find, among other things, that there is a credibly non-zero interaction contrast (depending on your choice of ROPE):
 The Bayesian program provides estimates of all the cell means and cell variances (etc.), so you can do lots of comparisons of cell means and cell variances.

The program (AnovaTwoFactor.R) and data file (AnovaTwoFactorData.csv) are available at the usual program repository.

Monday, April 8, 2013

Week-long course in doing Bayesian data analysis, July 15-19

There will be a week-long introductory course in doing Bayesian data analysis, July 15-29 (2013), at Indiana University. The course is offered through the Inter-university Consortium for Political and Social Research (ICPSR). Complete info is at this link:
http://www.indiana.edu/~jkkteach/WorkshopICPSR2013.html

* Please forward this announcement to interested persons or lists. *

Sunday, March 24, 2013

How to modify a JAGS+rjags (or BUGS) program to include new parameters

A central goal of providing you with lots of examples of programs, that do Bayesian parameter estimation in JAGS (BUGS) and rjags, is for you to be able to expand and modify the programs for your own needs. The book includes several cases in which programs are incrementally modified, but this blog post provides an explicit example of the systematic changes that are needed. We start with a program for linear regression, and expand it to include a parameter for quadratic trend. All the specific changes needed for the inclusion of a new parameter are explicitly pointed out, in every section of the program. This provides a guide for changing other programs to your own needs.

To motivate the present example, here (below) are plots of the data, fit by linear regression (the original program), and by regression that includes a quadratic component (the expanded program).
Notice above that there appears to be a systematic descrepancy between credible linear regression lines and the data. The data seem to have a curvature that is not captured by the linear trend. If we include a quadratic component in the model, the credible trend lines look like this:
(Sorry that the title of the graph still says "lines" instead of "curves".)

My purpose for this blog post is to show you how to expand the program that created the first result above into the program that created the second result above. I will do this simply by providing the two programs, and marking the changed lines in the expanded program with the comment, #CHANGED. The two programs can be found at the usual program repository: The original is here and the modified is here.

Although the programs are moderately long, primarily because of all the graphics commands that come at the end, the number of changes is small. Below, I highlight the changes, section by section.

The crucial conceptual change comes in model statement, which specifies the quadratic component for the trend. The model must be expanded to include the quadratic component. The parts highlighted in yellow were added to the original model specification:
model {
    for( i in 1 : Ndata ) {
        y[i] ~ dnorm( mu[i] , tau )
        mu[i] <- beta0 + beta1 * x[i] + beta2 * pow(x[i],2) # CHANGED
    }
    beta0 ~ dnorm( 0 , 1.0E-12 )
    beta1 ~ dnorm( 0 , 1.0E-12 )
    beta2 ~ dnorm( 0 , 1.0E-12 ) # CHANGED
    tau ~ dgamma( 0.001 , 0.001 )
}
Notice that the trend, specified by mu[i], was expanded to include a new parameter; namely, the beta2 coefficient on the squared data value. Importantly, with the introduction of a new parameter, that parameter must be provided with a prior, as shown in the second highlighted line above.

All that remains to be done after that is to make sure that the rest of the program takes into account the new parameter and model form.

The data section does not need to be changed, because the data are not changed.

The chain-initialization section, if there is one, must be changed to accommodate the new parameter:
initsList = list(
  beta0 = b0Init ,
  beta1 = bInit ,
  beta2 = 0 , # CHANGED
  tau = tauInit
)
I chose to be lazy and initialize the new parameter at zero, rather than doing it more intelligently.

In the run-the-chains section, we need to tell JAGS to keep track of the new parameter:
parameters = c("beta0" , "beta1" , "beta2" , "tau")  # CHANGED
Notice the only change in that whole section was specifying the new parameter to be tracked.

Hopefully you'll find that all those changes mentioned above are pretty straightforward! In general, when you're modifying a JAGS+rjags program, those steps above are the main ones to keep in mind. Here they are, summarized:
  • Carefully specify the model with its (new) parameters.
  • Be sure all the (new) parameters have sensible priors.
  • If you are defining your own initial values for the chains, be sure you've defined intial values for all the (new) parameters.
  • Tell JAGS to track the (new) parameters.
After that, the most effortful part is graphically displaying the results. Here is the change I made for the graphs presented at the beginning of this blog entry:
# Display data with believable regression curves and posterior predictions.
# Plot data values:
xRang = max(x)-min(x)
yRang = max(y)-min(y)
limMult = 0.25
xLim= c( min(x)-limMult*xRang , max(x)+limMult*xRang )
yLim= c( min(y)-limMult*yRang , max(y)+limMult*yRang )
plot( x , y , cex=1.5 , lwd=2 , col="black" , xlim=xLim , ylim=yLim ,
      xlab="X" , ylab="Y" , cex.lab=1.5 ,
      main="Data with credible regression lines" , cex.main=1.33  )
# Superimpose a smattering of believable regression lines:
xComb = seq(xLim[1],xLim[2],length=201)
for ( i in round(seq(from=1,to=chainLength,length=50)) ) {
  lines( xComb , 
         mcmcChain[i,"beta0"] + mcmcChain[i,"beta1"]*xComb 
         + mcmcChain[i,"beta2"]*xComb^2 , # CHANGED 
         col="skyblue" )
}
The set-up of the graph was unchanged; all I did was modify the blue curves that get plotted, so that they correspond to the model that was specified for JAGS. This is perhaps the most dangerous part of modifying JAGS programs: It is easy to modify the model specification for JAGS, but inadvertently not make the identical corresponding change to what gets plotted later! It would be nice if the model were specified only once, in a form that both JAGS and R simultaneously understand, but presently that is not how it's done in the JAGS/BUGS world.

Oh -- and what about interpreting the results in this case? Is there a credible non-zero quadratic trend, and how big is it? Answer: Just look at the posterior distribution on beta2. And, you might ask, how do we examine the results of the linear regression and decide to expand the model? That process is called a posterior predictive check, and my advice about it is provided in this article (n.b., your click on that link constitutes your request to me for a personal copy of the article, and my provision of a personal copy only).

Monday, March 4, 2013

Shrinkage in bimodal hierarchical models: Toward the modes, not the middle

In hierarchical models, the estimated values of intermediate-level parameters exhibit "shrinkage" because the higher-level distribution affects the intermediate-level parameter estimates. In typical applications, the form of the higher-level distribution is also estimated. And, in most typical applications, the higher-level distribution is unimodal, producing shrinkage of intermediate-level parameters is toward the middle of the higher-level distribution. Because this type of shrinkage is so prevalent, it is easy to think that shrinkage is always inward, toward the middle. But it does not have to be. This post shows a simple case of bimodal data producing a bimodal higher-level distribution, which causes shrinkage to be outward to the two modes. In other words, this post is a reminder that shrinkage is not toward the middle, shrinkage is toward the modes.

Consider a simple case of estimating the biases of several coins, and simultaneously estimating the distribution of biases across the coins. This is like estimating individual subject parameters (the coins) and group-level summary parameters (for the distribution across coins). For the jth coin, we observe a particular number of heads, Hj, out of its total number of flips, Nj. Denote the estimated bias of the coin as θj. Then the likelihood function is the usual product of Bernoulli's:
p(Hjj,Nj) = θjH (1-θj)(N-H)
The distribution of θj is here described by a beta distribution with shape parameters a and b:
p({θj}|a,b) = Πj dbeta(θj|a,b)
The overall likelihood of the parameters, for the particular data, is computed as the product of the equations above. Our goal is to find the parameter values, for {θj} and a and b, that maximize the likelihood.

In the two examples shown below there are 6 coins, each flipped 30 times. The proportion of flips that are heads is shown by the placement of the black dots in the figures below. For both examples, there are three coins that have fewer than 50% heads and three coins that have more than 50% heads. All that differs between the two examples is how extreme the separation is between the two clusters of coins.

In the first example, the two clusters of data are not separated by much. In the figure below, the black dots show the data. This first figure shows the likelihood if we choose values for θj that exactly match the observed proportion of heads in each coin, as shown by the blue circles, and we set a=1 and b=1 for the over-arching beta distribution:

We can find parameter values that produce a higher likelihood, however. In fact, the maximum likelihood estimates of the parameters are shown here:

Importantly, notice in the figure above that the blue circles, which represent the best estimates of the biases in the coins, are shrunken --- and toward the middle of the data. 

But now consider what happens when the data are more extremely bimodal, as shown below. First, again, we consider choosing parameter values that match the proportions in the data and give a uniform higher-level distribution:

The maximum-likelihood estimates of the parameters are shown here:

Importantly, notice in the figure above that the blue circles, which represent the best estimates of the biases in the coins, are "shrunken" --- away from the middle of the data. The moral: Shrinkage is toward the modes of the higher-level distribution, not necessarily toward the middle.