## Saturday, March 22, 2014

### Potpourri of recent inquires about doing Bayesian data analysis

I get a lot of email asking about the book and the blog. Too often, I say to myself, "That's interesting. I'll have to think about it and reply later." And then later never comes. My apologies to everyone to whom I have not yet replied. Below is a sample of a few of the messages I've received in the last few days, actually with replies.

Topics:
• Initializing chains for multi-factor ANOVA-style models.
• Errors in BUGS (another reason to use JAGS).
• Getting the decimal places of the HDI limits that don't appear in the graphs.
• Effect size measure in ANOVA-style log-linear models.
• Variable selection in multiple regression.

March 22, 2014:
Dear Prof. Kruschke,

by now I've opened and closed your contact form many a times, and have been successfully able to tackle each problem I had at the time using your book. Now, though, I am stumped. I am calculating a Poisson Exponential "ANOVA" for 4 nominal predictors and the code for initializing the chains troubles me. For that matter, my question is regarding any and all situations in which there are more than two nominal predictors present. The code to initialize the linear predictor sum is this:

linpred = as.vector( outer( a1 , a2 , "+" ) + a0 )

I understand the concept of the outer() function for two variables, but could you please explain how you would deal with adding a3 and a4 into this mix, assuming the rest of the code has been updated accordingly?

Kind regards,

Thanks very much for your interest in the book.

One approach is to try not to initialize manually, letting JAGS initialize for you. You have to check that the chains have burned in well and converged. But if it works, problem solved.

But if you really want to manually initialize, you have a few choices. One option is not to bother with the interaction terms. That is, initialize the main effects with lines like
a0 = mean( theData$y ) a1 = aggregate( theData$y , list( theData$x1 ) , mean )[,2] - a0 a2 = aggregate( theData$y , list( theData\$x2 ) , mean )[,2] - a0

but leave the interaction terms all initialized at zero.

If you really, really want to initialize the interactions, you can use the outer() function to build higher dimensional arrays. To get help, type
?outer
at R's command line. As an example, consider this:

> a=1:2
> b=3:5
> c=6:9
> outer(a,b,"+")
[,1] [,2] [,3]
[1,]    4    5    6
[2,]    5    6    7
> outer(c,outer(a,b,"+"),"+")
, , 1

[,1] [,2]
[1,]   10   11
[2,]   11   12
[3,]   12   13
[4,]   13   14

, , 2

[,1] [,2]
[1,]   11   12
[2,]   12   13
[3,]   13   14
[4,]   14   15

, , 3

[,1] [,2]
[1,]   12   13
[2,]   13   14
[3,]   14   15
[4,]   15   16


Hope that helps!
--John

March 20, 2014:
Hi John,

First of all I would like to mention that I am really enjoying reading the book.

I have studied a bit of Bayesian statistics in a university course and I find really interesting to see real world applications of MCMC.

I have been doing the exercises okay but I have come across a stumbling block in chapter Nine.

On exercise 9.2 when I changed the data parameters as per exercise solutions I do get the following error message. These errors appear when I run the FilconBrugs.R script in its original form as well

'Initializing chain 1:
initial values loaded and chain initialized but another chain contain uninitialized variables
Initializing chain 2:
initial values loaded and chain initialized but another chain contain uninitialized variables
Initializing chain 3:
model is initialized,

After this error then I encounter the following one:

"update error for node <theta> algorithm conjugate beta updater error argument two  has too small a value"

Basically the model does not produce any posterior. If you could let me know what the error could be due to it would be much appreciated. I'm using R studio and I haven't come across any errors in the rest of exercises and code that I have ran so far.

Kind Regards,

Thanks very much for your interest in the book.

First off, I strongly recommend using JAGS instead of BUGS. I have repeatedly found that BUGS yields troubles when JAGS works fine. Here is the link to a blog post about using JAGS instead of BUGS:
http://doingbayesiandataanalysis.blogspot.com/2012/01/complete-steps-for-installing-software.html

Both of the errors you encounter are from BUGS. The statements about intializing the chains are not errors, but merely status reports. It's simply going through the chains and initializing them, but letting you know that after chain 1 there are still 2 more chains to go, and after chain 1 there is still another chain to go.

The bit about "update error for node..." is a genuine error. As I recall, I do not get that sort of error in JAGS. There are work-arounds in BUGS, by truncating the distribution or the values of higher-level parameters. But give JAGS a try and see if that solves the problem.

Thanks again.
--John

March 20, 2014:
Good afternoon Mr. Kruschke,

I have a  questions about two-way ANOVA from your book. On page 522, you run an example using salary data and you mention that the means and HDI limits are displayed with only three significant digits, but more precise values can be obtained directly from the program.  How do I get those more precise values?

Thank you for help,

Thanks very much for your interest in the book.

The 22-Jun-2012 version of plotPost.R (see programs at http://www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/Programs/ ) produces a summary of the posterior as its returned value. For example, if you use the R statement

postInfo = plotPost( ... )

then postInfo has the mean, median, mode, HDI limits, etc.

Don't forget that even though multiple decimal points are there, they are not necessarily meaningful. The values are estimates from a random MCMC sample from the posterior, so they are noisy!

Thanks again.
--John

March 19, 2014:
Dear Prof. Krushke, I'm working with ch 22 of DBDA to analyse contingency tables. I'm curious to know where to look for a measure of effect size, both for the whole table and for individual interactions. Any tips you have would be much appreciated!
Thanks,

Thanks for your interest in the book.

My short answer is that I don’t have an adequate answer. There are many relevant measures of effect size in the frequentist literature, of course, but we seek something for hierarchical Bayesian log-linear models. A measure of effect size is supposed to be an aggregated difference (i.e., the “effect”) standardized relative to some indicator of variance or noise in the data. Moreover, in a Bayesian setting, the effect size has a posterior distribution; it’s not just a point estimate. An example of posterior effect size for a difference between groups is given here: http://www.indiana.edu/~kruschke/BEST/
One possibility for describing effect size in hierarchical Bayesian ANOVA-style models (including log-linear models) is a ratio of estimated variance parameters. For example, the main effect of factor A might have an “effect size” given by sigma_A divided by sigma_y, where sigma_A is the scale parameter of the distribution of the factor-A deflections, and sigma_y is the scale parameter of the within-cell noise distribution. But that’s just off the top of my head.
Thanks again,
--John

March 18, 2014:
John, I really enjoyed reading your post [http://doingbayesiandataanalysis.blogspot.com/2014/01/bayesian-variable-selection-in-multiple.html] as, similar to you, I feel that when it comes to real data there is a lot of fine tuning that is usually inadequately addressed in papers/books.

I have a problem where the posterior probability of delta=1 is always above 0.5 (even when the respective variable should clearly not be selected). I was wondering whether you made progress with your issues and perhaps can share some light on mine.

Thanks,

Thanks for the comment.

I’m not sure how you know that the “respective variable should clearly not be selected.” If you already know that, does that mean that you have prior knowledge that is not being expressed in the prior distribution of the model?

My impression of variable selection is that it’s a bit of an art, in the sense that the prior matters! But that’s true of Bayesian model selection in general. Real prior knowledge should be implemented if possible, and, in the case of variable selection, shrinkage from hierarchical structure should be implemented if appropriate.

1. I enjoy your blog so much. Like you, I am in psychology, and opportunities to use Bayesian statistics are very scarce (just by convention/tradition, of course). This post was greatly interesting, because it showed some of the real-life applications (and problems) others are juggling with.

This is the most Bayesian statistics I'll be exposed to for another while, I guess... (maybe until your next post :))

Patrick

2. Some important topics got mentioned.

> Don't forget that even though multiple decimal points are there, they are not necessarily meaningful.

Do you have some strategy to find out what is the variability in parameters due to sampler and sample size (not due to estimation error)? Then we could compute how many (effective) samples we need to obtain reliable estimates of the value at a particular decimal point. I don't have a solution for this. I always end up repeating the mcmc simulation just to see whether the parameters vary at a particular decimal point. If they do I increase the chain length...

> @ effect size

The concept of effect size does not really apply to bayesian analysis. I would say that every quantity that is not a bayes factor or a related probability of a model in model comarison is a kind of effect size measure. It is more conductive to ask what is the quantity that is most informative with respect to our research question. In log-normal model y~ lognorm(b0 + b1*indicator) I would report exp(b1) and interpret it as a multiplicative factor. That is if b1=-0.1 then exp(b1)=0.9 and I know that my treatment decreases the outcome var by 10 percent.

By the way, I think you create confusion by calling the regression models with categorical predictors anova in your book. People will look for variance and variance derived quantities when in fact your main message is (or is it not?) that the regression coefficient is of primary interest and should be reported.

best,
matus

3. Commenter> [Blog post said:] "Don't forget that even though multiple decimal points are there, they are not necessarily meaningful." Do you have some strategy to find out what is the variability in parameters due to sampler and sample size (not due to estimation error)? Then we could compute how many (effective) samples we need to obtain reliable estimates of the value at a particular decimal point. I don't have a solution for this. I always end up repeating the mcmc simulation just to see whether the parameters vary at a particular decimal point. If they do I increase the chain length...

Reply: The variability of the MCMC estimated value depends on the density of the distribution at that point. Thus, the MCMC estimate of the mean is relatively robust, assuming that the mean is at a relatively dense part of the distribution. The MCMC estimate of the 95% HDI limits, on the other hand, is relatively noisy, because the distribution is usually sparse at the limits. My rule of thumb for usefully stable estimates of the 95% HDI limits is an effective sample size (ESS) of 10,000. See, e.g., this blog post: http://doingbayesiandataanalysis.blogspot.com/2011/07/how-long-should-mcmc-chain-be-to-get.html

Commenter> The concept of effect size does not really apply to bayesian analysis. I would say that every quantity that is not a bayes factor or a related probability of a model in model comarison is a kind of effect size measure. It is more conductive to ask what is the quantity that is most informative with respect to our research question. In log-normal model y~ lognorm(b0 + b1*indicator) I would report exp(b1) and interpret it as a multiplicative factor. That is if b1=-0.1 then exp(b1)=0.9 and I know that my treatment decreases the outcome var by 10 percent.

Reply: Perhaps I don't understand what you mean, but the concept of effect size definitely does apply to Bayesian analysis. Again, see for example http://www.indiana.edu/~kruschke/BEST/

Commenter> By the way, I think you create confusion by calling the regression models with categorical predictors anova in your book. People will look for variance and variance derived quantities when in fact your main message is (or is it not?) that the regression coefficient is of primary interest and should be reported.

Reply: Sure, the models are "ANOVA" only by analogy, and there is no analysis of variance, in the least-squares sense, being done. I discuss this a little at the top of p. 492. I agree, however, that I could have been more clear about it. On the other hand, a lot of people think of "ANOVA" as the case of the generalized linear model that has nominal predictors with a metric predicted variable, not specifically as least-squares decomposition of variance.

4. > the concept of effect size definitely does apply to Bayesian analysis.

I guess it depends on how you define effect size. Though you are probably right that most people would call standardized estimates effect size.

Matus

5. On the point of contingency tables, Professor William Press has an exclusive installment regarding Bayesian analysis of contingency tables at http://granite.ices.utexas.edu/coursewiki/index.php/Segment_36._Contingency_Tables_Have_Nuisance_Parameters, which is part of his illuminating and entertaining "Opinionated Lessons on Statistics" series. Note Professor Press is also the co-author of Numerical Recipes. It's also interesting listening to his earlier presentations on contingency tables, especially where he says something like "So many people get these wrong", and he's impugning the use of the Fisher Exact Test. Indeed there's a "Bayesian inference supersedes the Fisher Exact Test" in all that.

This does not specifically help you set this up in JAGS, but I get you can listen to Professor Press and know where to go.

6. Jan:

Thanks for the link to Press' videos! I hadn't seen them before.

Two thoughts about that particular video:

1. In the brief chapter in DBDA about contingency tables, I assume that neither margin has fixed size. Observations just fall at random into any cell of the table.

2. More importantly, readers should note that Press is NOT doing what I would consider Bayesian estimation. He is, instead, computing p values. But he's computing p values with incorporating uncertainty about the marginals.