Wednesday, August 20, 2014

Converting combination of random variables to hierarchical form for JAGS (BUGS, Stan, etc.)

An emailer asks:

Hi, John. Long-time listener, first-time caller... I have a model that says X is a function of three (independent) random variables:
X ~ normal(mu, sigma) / uniform(a,a+b) - beta(v,w)
and I also have N random samples of X. Can I use JAGS to estimate mu, sigma, a, b, v and w? Or is this problem outside its scope? Not sure if it's even possible.

My reply:

In other words, you want

x <- r / u - b
r ~ normal( rMean , rSD )
u ~ uniform( uLow , uHigh )
b ~ beta( bA , bB )

but the problem is that JAGS won't accept that sort of model specification.

One solution is to change to an equivalent hierarchical formalization:

x ~ normal( rMean/u-b , rSD/u )
u ~ uniform( uLow , uHigh )
b ~ beta( bA , bB )

This hierarchical form is directly implementable in JAGS. But be careful that in JAGS the normal is parameterized by precision, not SD, so you'll need to write x ~ dnorm( rMean/u-b , 1/(rSD/u)^2 ).
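A minimal JAGS model block along these lines might look as follows. Note that each datum gets its own latent u[i] and b[i], because each x value arises from independent draws of the three variables. The priors on the top-level parameters (rMean, rSD, uLow, uHigh, bA, bB) are left as placeholders, since reasonable choices depend on the application:

```jags
model {
  for ( i in 1:N ) {
    # Precision form of the normal, as required by JAGS:
    x[i] ~ dnorm( rMean/u[i] - b[i] , 1/(rSD/u[i])^2 )
    u[i] ~ dunif( uLow , uHigh )
    b[i] ~ dbeta( bA , bB )
  }
  # Priors on rMean, rSD, uLow, uHigh, bA, bB go here.
}
```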

To demonstrate that the two forms are equivalent, here is a Monte Carlo sample generated in R from each formalization:

# Specify parameter values and sample size:
rMean=7 ; rSD=2
uLow=3 ; uHigh=5
bA=4 ; bB=3
N = 50000

# Sample x values your original way:
b = rbeta(N, bA,bB)
u = runif(N, uLow,uHigh)
r = rnorm(N, rMean,rSD)
x1 = r/u-b

# Sample x values a new way (hierarchical form):
x2 = rep(0,N)
for ( i in 1:N ) {
  b = rbeta(1, bA,bB)
  u = runif(1, uLow,uHigh)
  x2[i] = rnorm(1, rMean/u-b , rSD/u )
}

# Plot results:
layout( matrix(1:2,nrow=2) )
hist( x1 , breaks=51 , main="x1: original form" , xlab="x" )
text( mean(x1) , 0 , "+" , cex=2 )
hist( x2 , breaks=51 , main="x2: hierarchical form" , xlab="x" )
text( mean(x2) , 0 , "+" , cex=2 )

The result is shown below, where you can see that the original and new distributions are identical:
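If you prefer a numerical check to a visual one, comparing the first two moments of the two samples works as well (the sample size and seed here are just illustrative):

```r
set.seed(47)                      # illustrative seed, for reproducibility
rMean=7 ; rSD=2 ; uLow=3 ; uHigh=5 ; bA=4 ; bB=3 ; N=200000
# Original form: x = r/u - b, with independent draws of r, u, b:
x1 = rnorm(N,rMean,rSD) / runif(N,uLow,uHigh) - rbeta(N,bA,bB)
# Hierarchical form: draw u and b, then x ~ normal( rMean/u-b , rSD/u ):
u = runif(N,uLow,uHigh) ; b = rbeta(N,bA,bB)
x2 = rnorm(N, rMean/u-b , rSD/u )
# Means and SDs agree up to Monte Carlo error:
print( c( mean(x1) , mean(x2) ) )
print( c( sd(x1) , sd(x2) ) )
```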

Friday, August 15, 2014

How to use MCMC posterior as prior for future data

An emailer writes:

Dear Prof. Kruschke,
Hello. My name is ... and I am ... . I'm trying to apply Bayesian theorem in developing a model of ... . I used your code to estimate posterior distribution without any trouble. Here is my question. Would you kindly let me know how to use the estimated posterior distribution as the prior of another Bayesian update? ...  Any comment would be greatly appreciated. Thank you very much for your time. Regards,
My reply:

Thanks very much for your interest in the book.

Because MCMC methods require a mathematically specified prior, but generate a Monte Carlo sample of the posterior, you need to either (a) find a reasonable mathematical summary of the MCMC posterior to use as the mathematical prior for the next batch of data or (b) concatenate the previous data with the next batch of data and analyze them together.

Option (b) gives you the mathematically correct answer, but at the cost of an ever-growing data set, which might slow down the MCMC process; it also requires that the previous and next batches of data be structured the same way and analyzed with the same model.

For option (a), there is no uniquely correct approach. For each parameter, examine its marginal MCMC posterior distribution and summarize it with a reasonable mathematical function. For example, if the parameter is metric on positive values, maybe a gamma distribution would mimic the MCMC distribution well. Use a mathematical distribution with the same central tendency and standard deviation as the MCMC posterior. A peril of mimicking each marginal distribution separately is that you lose all information about the correlation of parameters in the posterior. If the correlation is important for capturing the prior information you want, then you can use a multivariate prior for two or more parameters simultaneously (e.g., dmnorm in JAGS).
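As a concrete sketch of option (a), suppose a parameter is metric on positive values and you want a gamma prior that mimics its MCMC posterior. Matching the mean m and standard deviation s gives shape = m^2/s^2 and rate = m/s^2. (The MCMC sample below is simulated as a stand-in for a real chain.)

```r
set.seed(1)                       # illustrative seed
# Stand-in for an MCMC sample of a positive-valued parameter:
postSample = rgamma( 20000 , shape=5 , rate=2 )
# Moment-match a gamma distribution to the sample:
m = mean(postSample) ; s = sd(postSample)
gShape = m^2 / s^2                # gamma mean is shape/rate = m
gRate  = m / s^2                  # gamma variance is shape/rate^2 = s^2
# gShape and gRate would then be used as constants in the prior for
# the next batch of data, e.g., theta ~ dgamma(gShape,gRate) in JAGS.
print( c( gShape , gRate ) )
```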

Hope this helps.

P.S. (Added an hour or so after the text above.) A similar thing was done in the book with pseudo-priors. See the middle paragraph of p. 251: "The shape and rate constants for the gamma pseudopriors were set so that the gamma distributions had the same mean and standard deviation as the posterior sample of kappa values. For example, consider the parameter kappa0. In the initial run without pseudopriors, a reasonable posterior sample was obtained for model index 2 (i.e., when kappa0 is actually constrained by the data, as shown in the bottom panel of Figure 10.3). That posterior sample had a mean m and a standard deviation s. The conversion from mean and standard deviation to shape and rate parameters was explained in the caption of Figure 9.8. The conversion yields shape and rate values specified in lines 55 to 56 of the code."

Tuesday, August 12, 2014

Stopping and testing intentions in p values

An emailer asks:

I am really interested in Bayesian analysis, but I don't get the issue of sampling intention being so important in frequentist t-tests; if you have 60 values you have 60 values surely - why does your intention matter? The computer does not know what I intended to do; or have I missed the point entirely!?

My reply:

That is exactly the point -- intuitively (for natural Bayesians) the stopping and testing intentions should not matter (for some things), but for p values the stopping and testing intention is of the essence, sine qua non. In lieu of reading the BEST article, see this video, starting at 7:15 minutes:

One other quick question if I may; how does the BEST package differ from the BayesFactor package?

An appendix of the BEST article explains the difference, but you can instead watch this video, starting at 2:35 minutes:

Those video excerpts might make the most sense if you just start with the first and watch them through, all the way...

Friday, August 8, 2014

Prior for normality (df) parameter in t distribution

A routine way to describe outliers in metric data is with a heavy-tailed t distribution instead of with a normal distribution. The heaviness of the tails is governed by a normality parameter, ν, also called the df parameter. What is a reasonable prior for ν? One answer: A shifted exponential distribution, which is what I used in the "BEST" article and software. But the book (on pp. 432-433) seems to use a different and obscure prior distribution. The purpose of this post is to show that the book's prior on the normality parameter is, in fact, a shifted exponential.

The book's prior on ν, there called tdf, is
  tdf <- 1 - tdfGain*log(1-udf)
  udf ~ dunif(0,1)
where tdfGain is a constant. In other words, tdf is a strange logarithmic transformation of udf, and udf has a uniform prior. Well, that's the same thing as a shifted exponential on tdf: the exponential part has mean tdfGain, so tdf itself has mean tdfGain+1.

To prove that claim, we could do some math, as suggested by the method in Section 23.4 (p. 630) regarding reparameterization. But instead I'll show a large Monte Carlo sample of tdf from the book, and superimpose a shifted exponential density, and you can see that they are identical:
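For completeness, the change-of-variables math is brief. It relies only on the standard fact that the negative log of a uniform variate is exponentially distributed:

```latex
\text{If } u \sim \mathrm{Uniform}(0,1) \text{ and } y = -\log(1-u),
\text{ then for } t \ge 0:\\
P(y \le t) = P\!\left(u \le 1 - e^{-t}\right) = 1 - e^{-t},
\quad \text{i.e., } y \sim \mathrm{Exponential}(\text{mean}=1).\\
\text{Hence } tdf = 1 + tdfGain \cdot y
\text{ is exponential with mean } tdfGain, \text{ shifted to start at } 1.
```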

In each of the above graphs, the title indicates the value of tdfGain, labelled as nuGain. A large number of values, udf, were randomly generated from a uniform distribution, and then those values were transformed to tdf (=ν) values using the formula tdf <- 1 - tdfGain*log(1-udf). The histogram plots the resulting tdf (=ν) values. The title of each graph also indicates the mean of the Monte Carlo sample, which should be nuGain+1 except for random sampling noise. Superimposed on the histogram is an exponential density with mean set to nuGain, shifted 1 unit to the right (because ν must be > 1). Thus, you can see that the book's prior for tdf really is a shifted exponential, as used in the BEST article.

Appendix: Here is the code for generating the graphs.
for ( nuGain in c(3,30,100) ) {
  # Random sample of nu from book prior:
  nu = 1 - nuGain*log( 1 - runif(50000) )
  # Plot histogram:
  xMax = 5*nuGain
  histInfo = hist( nu , probability=TRUE ,
                   col="skyblue" , border="white" ,
                   xlab=bquote(nu) , xlim=c(0,xMax) ,
                   cex.lab=1.5 , cex.main = 1.5 ,
                   breaks=c(0,seq(1,xMax,length=30),1E10) ,
                   main=bquote( list( nuGain==.(nuGain) ,
                                      mean(nu)==.(round(mean(nu),2)) ) ) )
  # Superimpose exponential density with mean nuGain, shifted 1 to the right:
  x = seq( 1 , xMax , length=201 )   # Make x-axis comb.
  lines( x , dexp( x-1 , 1/nuGain ) , lwd=2 )
  text( mean(nu) , dexp( mean(nu)-1 , 1/nuGain ) ,
        bquote("curve: dexp( "*nu*"-1 , mean="*.(nuGain)*" )") ,
        adj=c(-0.1,-0.1) , cex=1.5 )
}
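As a final sanity check, the sample mean and SD of nu can be verified directly against the claimed values nuGain+1 and nuGain (the sample size and seed are illustrative):

```r
set.seed(123)                     # illustrative seed
nuGain = 30
nu = 1 - nuGain*log( 1 - runif(500000) )
# Shifted exponential: mean should be near nuGain+1, SD near nuGain,
# and all values must exceed 1:
print( c( mean(nu) , sd(nu) , min(nu) ) )
```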