A good topic would be "Tips/Tricks for getting your sampling scheme to converge" (I'm thinking of things like variable transformations, mean centering, etc.)

Hacking my way through the puppy book. Unfortunately, while the methodology is ideal, I have a brainware glitch that prevents me extending the logic to three+ levels of a hierarchical model. Here's my problem : 1. One binary outcome per patient-treatment (side effect A or not) 2. 1-12 Treatments per patient 3. 1-3 courses of treatments per patient over 3 years.

Some predictors of S.E. A occur at the treatment level (like BP)

Some occur at the patient level (like Gender)

How does one go about nesting these enormous models in BUGS?

You can't use a binomial model because you're collapsing across treatment level predictors. If you use a dbern() model as the lowest level, are you not causing pseudoreplication at the patient level ... inflating that effect?

Even tougher question : What if the occurrence of side effect A is autocorrelated throughout a course of treatment within a patient?

Does that violate exchangeability? Will the world, or my head... explode?

R.Dunne

PS there's the potential to extend findings to a level below this looking at the change in side effects after each treatment. - which 4-level model might send me looking for booze.

How would you explain to an only ~moderately statistically literate scientific reviewer that adjustment for multiple comparisons is not necessary when sampling from a Bayesian Posterior?

This may not be possible with blogspot, but I'd love to see a "Very Serious Bayesian Data Analysis" format option for the blog, in addition to the book, though perhaps with density estimates rather than stern post-it smileys.

Absent such an option, my office mate may begin to suspect that I'm an avid reader of one of those cute-animal-photo web sites...

Your question is not trivial to answer. It raises the much broader issue of how to systematically program "mixed" (a.k.a. "split-plot") designs that have some within-subject and some between-subject factors. I've been working on some examples, but finding time to finish them up so that they're presentable has been hard. But I am working on it.

Meanwhile, I guess it would be a good idea to spread tarps around before working on the models, to ease clean-up in case of exploding heads.

I would love to show examples from marketing and many other fields, but I need data sets! If you have access to useful data that are interesting to a wide audience, I'd enjoy learning about them. (Of course, the data should be free of charge and not contain any confidential information.)

You asked for feedback about your blog and book, so here I go. Firstly, thank you for your passionate effort to make many of us aware of the advantages and benefits of proper Bayesian data analysis. I tried to understand and learn how to use these techniques many times in the past. But it wasn't until I read your writings (not only the book, but also your papers) when I started to see the light at the end of the tunnel. Admittedly, I still remain in the darkness most of the time, but at least now I'm sure it is a tunnel, not a pit. So there is an exit after all. Secondly, I've got a suggestion for your blog/popularization activities. If you really want to improve your impact on the scientific/student community, you might consider creating a Twitter account. Writing papers and books is OK (they're compelling when it comes to technical details), a blog is a good popularization tool, but Twitter (and related communities) allows fast viral spread. Not being myself an advanced user of Twitter, it's just my suggestion.

Let me say it again, congratulations and thanks for your awesome book. I feel how my understanding, my knowledge, and what is more important, my interest, grow little by little, and that's reinforcing per se. I may let you know if I eventually come out of that tunnel...

Thanks for your kind remarks. I hope that your emergence into Bayesian sunlight is smooth and swift!

Regarding Twitter, I'm not a user, but perhaps I should be? I just don't know what I'd tweet about. I only update the blog every couple weeks or so. If you or another reader have ideas about spreading Bayesian methods through Twitter, I'm happy to hear them.

Dear Dr. Kruschke, thanks for the wonderful book - I'm in the middle of it and it goes on to be enlightening and fun. Beside it is the first Bayesian book from which I don't get the feeling 'I would like to learn it but those integrals and notations are over my head'.

My suggestion: whenever possible - I would find it instructive to see for a model a related R glmm code using lme4, e.g., to relate the a hyperparameter estimate from BUGS to the one from lmer(). I often asked myselelf 'wait isn't this the same as (or at lest similar to) the variance component from a mixed logit model'?

A question: i've seen code of a mixed logistic BUGS model with a binomial likelihood a normal prior and normal hyperprior (for random slopes and intercepts). The book often uses the beta-distribution with a kappa, mu reparametrisation. Is there a rule of thumb when to use what? Is one way better for specific reasons?

Regarding comparison of Bayesian results with results from lmer() etc.: While I agree it might be nice, there are a number of reasons I've opted not to do it. First, it would double the size of the book, because essentially the book would also have to be a complete textbook about traditional methods and all the complexities of syntax for lmer() and related functions. Second, the application of traditional methods to mixed nonlinear models can be quite difficult and controversial especially when it comes to test statistics and p values. Instead, we can just do the Bayesian analysis and forego all the worries about p values. Therefore, instead of tackling it myself, I invite you and other readers to tell us about specific cases in which you've made comparisons yourself. Perhaps you or others could author summaries, which I would be happy to post in the blog.

Regarding using a normal prior versus a beta prior: If the parameter (call it theta) has support from zero to one, then a beta prior is the obvious and intuitive way to express prior beliefs about theta. Some people go through an extra contortion of reparameterizing theta via a logit (inverse logistic) and then putting a normal prior on logit(theta). But I find the latter approach unintuitive, especially because it's not immediately obvious what the resulting prior on theta itself actually looks like. But this is not to say that a normal on logit(theta) is "wrong". There might be good reasons to use it in particular contexts.

Right, plotChains.R uses some BRugs commands. In the JAGS versions of the programs, plotChains.R is not called. Instead, by default the programs don't even display the chain diagnostics because the chains are long enough to smooth out any typical autocorrelation in the applications used in the book.

But, if you want to see the trace plots or autocorrelation functions, there's a section in the JAGS programs that does so. For example, in ANOVAonewayJagsSTZ.R, the first part of examining the results is this:

The trace plots are created by plot(codaSamples). Beware, for long chains it takes a long time to plot and the plots are not very informative, but you could plot subparts of the chains. The autocorrelation is displayed by autocorr.plot(codaSamples[[1]]). As I mentioned, even if the autocorrelation is moderately high, the very long chains smooth it out anyway.

your book and your paper "Bayesian estimation supersedes the t test" are invaluable help for a non-mathmatician to get an understanding for the Bayesian method. However, the first paper I came across this method was Tim Gerrodette's "Inference without significance: measuring support for hypotheses rather than rejecting them" (http://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1269&context=usdeptcommercepub) where he argues on page 412 that its simply more likely (seven times) that a certain population decreased from t1 to t2 than not, although 0 is part of the HDI.

What do you think about this way of arguing?

In your book and articles you usually state in such cases that the "null hypothesis" cannot be excluded. So, what's your opinion about this?

Thanks for your comment and for the pointer to Gerrodette's article.

The method you point out is simply a different way of summarizing a posterior distribution. It could be used as input to a different decision rule, but Gerrodette does not go that far.

For concreteness, consider a posterior distribution that has P proportion of the posterior greater than zero. Then the odds of being greater than zero are P/(1-P). That's the summary value that Gerrodette mentions. Conceivably, that odds ratio could be used for a decision rule; e.g., reject zero if the odds ratio exceeds 39, which is .975/(1-.975).

I prefer the decision rule using the HDI (and the ROPE) because it's more intuitive (odds ratios take a bit of getting used to), it's sensitive to skew (equal tailed credible intervals, which go along with P, don't take skew into account), and it allows accepting the null (not just rejecting).

I am writing up a paper that I have analysed using Bayesian Parameter Estimation. While there are many papers extolling the virtues of Bayesian Parameter estimation there, I am finding it hard to source experimental papers that use and report such an analysis in the 'lean and mean' style that psychology journals require. Basically I need an example paper to base my reporting on. I figured of all people you would be the one to ask for a list. If you do could you post it?

Sorry not to have replied sooner -- for some reason Blogspot stopped notifying me of comments (still have to fix it).

In frequentist stats, the researcher reports (i) the estimated value of the descriptive statistic or parameter value and (ii) its 95% confidence interval. The analogous format in Bayesian stats is to specify the mode of the posterior distribution and its 95% HDI. Just as you'd say in frequentist stats, "estimated mu=2.3 (95% CI 2.1 to 2.5)," you say in Bayesian stats, "posterior modal mu=2.3 (95% HDI 2.1 to 2.5)." Of course, people typically leave out the words and just report the numbers, saying just once what the numbers mean.

If you want to decide that a parameter is "significantly" different from a null value, in frequentist stats you'd check if p<.05. In Bayesian stats, it depends on what kind of statement you want to make. The most analogous procedure to p values is Bayes factors, but decisions based on Bayes factors are nasty perilous -- see the five caveats in this article: https://link.springer.com/article/10.3758/s13423-017-1272-1 (preprint version available at OSF). Instead, you could decide based on the relation of the HDI to the null value -- see this article: http://journals.sagepub.com/doi/full/10.1177/2515245918771304 (preprint available at OSF).

A good topic would be "Tips/Tricks for getting your sampling scheme to converge" (I'm thinking of things like variable transformations, mean centering, etc.)

ReplyDeleteDr. K

ReplyDeleteHacking my way through the puppy book. Unfortunately, while the methodology is ideal, I have a brainware glitch that prevents me extending the logic to three+ levels of a hierarchical model. Here's my problem :

1. One binary outcome per patient-treatment (side effect A or not)

2. 1-12 Treatments per patient

3. 1-3 courses of treatments per patient over 3 years.

Some predictors of S.E. A occur at the treatment level (like BP)

Some occur at the patient level (like Gender)

How does one go about nesting these enormous models in BUGS?

You can't use a binomial model because you're collapsing across treatment level predictors. If you use a dbern() model as the lowest level, are you not causing pseudoreplication at the patient level ... inflating that effect?

Even tougher question : What if the occurrence of side effect A is autocorrelated throughout a course of treatment within a patient?

Does that violate exchangeability? Will the world, or my head... explode?

R.Dunne

PS there's the potential to extend findings to a level below this looking at the change in side effects after each treatment. - which 4-level model might send me looking for booze.

How would you explain to an only ~moderately statistically literate scientific reviewer that adjustment for multiple comparisons is not necessary when sampling from a Bayesian Posterior?

ReplyDeleteThis may not be possible with blogspot, but I'd love to see a "Very Serious Bayesian Data Analysis" format option for the blog, in addition to the book, though perhaps with density estimates rather than stern post-it smileys.

ReplyDeleteAbsent such an option, my office mate may begin to suspect that I'm an avid reader of one of those cute-animal-photo web sites...

Dear Anonymous Oct. 17, 2011, 6:39 PM:

ReplyDeleteOkay, puppies expunged from the blog margins.

Dear Ross (chloropleth):

ReplyDeleteYour question is not trivial to answer. It raises the much broader issue of how to systematically program "mixed" (a.k.a. "split-plot") designs that have some within-subject and some between-subject factors. I've been working on some examples, but finding time to finish them up so that they're presentable has been hard. But I am working on it.

Meanwhile, I guess it would be a good idea to spread tarps around before working on the models, to ease clean-up in case of exploding heads.

Hi Dr. K

ReplyDeleteCan I ask you for some marketing examples in your blogs using Bayesian analysis ?? Would be a great help !!

Dear Ankur:

ReplyDeleteI would love to show examples from marketing and many other fields, but I need data sets! If you have access to useful data that are interesting to a wide audience, I'd enjoy learning about them. (Of course, the data should be free of charge and not contain any confidential information.)

--John

Prof. Kruschke,

ReplyDeleteYou asked for feedback about your blog and book, so here I go.

Firstly, thank you for your passionate effort to make many of us aware of the advantages and benefits of proper Bayesian data analysis. I tried to understand and learn how to use these techniques many times in the past. But it wasn't until I read your writings (not only the book, but also your papers) when I started to see the light at the end of the tunnel. Admittedly, I still remain in the darkness most of the time, but at least now I'm sure it is a tunnel, not a pit. So there is an exit after all.

Secondly, I've got a suggestion for your blog/popularization activities. If you really want to improve your impact on the scientific/student community, you might consider creating a Twitter account. Writing papers and books is OK (they're compelling when it comes to technical details), a blog is a good popularization tool, but Twitter (and related communities) allows fast viral spread. Not being myself an advanced user of Twitter, it's just my suggestion.

Let me say it again, congratulations and thanks for your awesome book. I feel how my understanding, my knowledge, and what is more important, my interest, grow little by little, and that's reinforcing per se. I may let you know if I eventually come out of that tunnel...

PS: sorry about my bad English.

Fernando:

ReplyDeleteThanks for your kind remarks. I hope that your emergence into Bayesian sunlight is smooth and swift!

Regarding Twitter, I'm not a user, but perhaps I should be? I just don't know what I'd tweet about. I only update the blog every couple weeks or so. If you or another reader have ideas about spreading Bayesian methods through Twitter, I'm happy to hear them.

Thanks again.

Dear Dr. Kruschke,

ReplyDeletethanks for the wonderful book - I'm in the middle of it and it goes on to be enlightening and fun.

Beside it is the first Bayesian book from which I don't get the feeling 'I would like to learn it but those integrals and notations are over my head'.

My suggestion: whenever possible - I would find it instructive to see for a model a related R glmm code using lme4, e.g., to relate the a hyperparameter estimate from BUGS to the one from lmer(). I often asked myselelf 'wait isn't this the same as (or at lest similar to) the variance component from a mixed logit model'?

A question: i've seen code of a mixed logistic BUGS model with a binomial likelihood a normal prior and normal hyperprior (for random slopes and intercepts). The book often uses the beta-distribution with a kappa, mu reparametrisation. Is there a rule of thumb when to use what? Is one way better for specific reasons?

Thanks RenĂ©

Rene:

ReplyDeleteThank you for your nice remarks about the book.

Regarding comparison of Bayesian results with results from lmer() etc.: While I agree it might be nice, there are a number of reasons I've opted not to do it. First, it would double the size of the book, because essentially the book would also have to be a complete textbook about traditional methods and all the complexities of syntax for lmer() and related functions. Second, the application of traditional methods to mixed nonlinear models can be quite difficult and controversial especially when it comes to test statistics and p values. Instead, we can just do the Bayesian analysis and forego all the worries about p values. Therefore, instead of tackling it myself, I invite you and other readers to tell us about specific cases in which you've made comparisons yourself. Perhaps you or others could author summaries, which I would be happy to post in the blog.

Regarding using a normal prior versus a beta prior: If the parameter (call it theta) has support from zero to one, then a beta prior is the obvious and intuitive way to express prior beliefs about theta. Some people go through an extra contortion of reparameterizing theta via a logit (inverse logistic) and then putting a normal prior on logit(theta). But I find the latter approach unintuitive, especially because it's not immediately obvious what the resulting prior on theta itself actually looks like. But this is not to say that a normal on logit(theta) is "wrong". There might be good reasons to use it in particular contexts.

Thanks again for the comment and suggestion!

Do you have a version of plotChains.R for rJAGS?

ReplyDeleteIt looks like the current version of plotChains.R is for BRugs only.

Thanks!

Right, plotChains.R uses some BRugs commands. In the JAGS versions of the programs, plotChains.R is not called. Instead, by default the programs don't even display the chain diagnostics because the chains are long enough to smooth out any typical autocorrelation in the applications used in the book.

ReplyDeleteBut, if you want to see the trace plots or autocorrelation functions, there's a section in the JAGS programs that does so. For example, in ANOVAonewayJagsSTZ.R, the first part of examining the results is this:

checkConvergence = FALSE

if ( checkConvergence ) {

#show( summary( codaSamples ) )

#windows()

#plot( codaSamples , ask=F )

windows()

autocorr.plot( codaSamples[[1]][,c("sigma","aSD")] )

}

The trace plots are created by plot(codaSamples). Beware, for long chains it takes a long time to plot and the plots are not very informative, but you could plot subparts of the chains. The autocorrelation is displayed by autocorr.plot(codaSamples[[1]]). As I mentioned, even if the autocorrelation is moderately high, the very long chains smooth it out anyway.

Dear Prof Kruschke,

ReplyDeleteyour book and your paper "Bayesian estimation supersedes the t test" are invaluable help for a non-mathmatician to get an understanding for the Bayesian method.

However, the first paper I came across this method was Tim Gerrodette's "Inference without significance: measuring support for hypotheses rather than rejecting them" (http://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1269&context=usdeptcommercepub) where he argues on page 412 that its simply more likely (seven times) that a certain population decreased from t1 to t2 than not, although 0 is part of the HDI.

What do you think about this way of arguing?

In your book and articles you usually state in such cases that the "null hypothesis" cannot be excluded. So, what's your opinion about this?

Dear m.kazmierczak:

ReplyDeleteThanks for your comment and for the pointer to Gerrodette's article.

The method you point out is simply a different way of summarizing a posterior distribution. It could be used as input to a different decision rule, but Gerrodette does not go that far.

For concreteness, consider a posterior distribution that has P proportion of the posterior greater than zero. Then the odds of being greater than zero are P/(1-P). That's the summary value that Gerrodette mentions. Conceivably, that odds ratio could be used for a decision rule; e.g., reject zero if the odds ratio exceeds 39, which is .975/(1-.975).

I prefer the decision rule using the HDI (and the ROPE) because it's more intuitive (odds ratios take a bit of getting used to), it's sensitive to skew (equal tailed credible intervals, which go along with P, don't take skew into account), and it allows accepting the null (not just rejecting).

This comment has been removed by the author.

ReplyDeleteDear Professor Kruschke

ReplyDeleteI am writing up a paper that I have analysed using Bayesian Parameter Estimation. While there are many papers extolling the virtues of Bayesian Parameter estimation there, I am finding it hard to source experimental papers that use and report such an analysis in the 'lean and mean' style that psychology journals require. Basically I need an example paper to base my reporting on. I figured of all people you would be the one to ask for a list. If you do could you post it?

Sorry not to have replied sooner -- for some reason Blogspot stopped notifying me of comments (still have to fix it).

DeleteIn frequentist stats, the researcher reports (i) the estimated value of the descriptive statistic or parameter value and (ii) its 95% confidence interval. The analogous format in Bayesian stats is to specify the mode of the posterior distribution and its 95% HDI. Just as you'd say in frequentist stats, "estimated mu=2.3 (95% CI 2.1 to 2.5)," you say in Bayesian stats, "posterior modal mu=2.3 (95% HDI 2.1 to 2.5)." Of course, people typically leave out the words and just report the numbers, saying just once what the numbers mean.

If you want to decide that a parameter is "significantly" different from a null value, in frequentist stats you'd check if p<.05. In Bayesian stats, it depends on what kind of statement you want to make. The most analogous procedure to p values is Bayes factors, but decisions based on Bayes factors are nasty perilous -- see the five caveats in this article: https://link.springer.com/article/10.3758/s13423-017-1272-1 (preprint version available at OSF). Instead, you could decide based on the relation of the HDI to the null value -- see this article: http://journals.sagepub.com/doi/full/10.1177/2515245918771304 (preprint available at OSF).