Tuesday, April 1, 2014

Potpourri 2: more misc emails regarding doing Bayesian data analysis

More miscellaneous communications with readers. (I have omitted all the salutations and pleasantries to save space here, but most correspondents do begin and end their messages with introductions and salutations that I truly appreciate.) Again, apologies to those I don't reply to -- too many!

Topics:
  • Adapt ANOVA-style models for mixed between- within- designs? 
  • glm() in R, with a poem 
  • Metropolis algorithm
  • I have some data, what should I do?
  • Could you look over my conference paper?
  • Time series models?
  • How to write up a Bayesian analysis
  • Trying to adapt programs, getting errors
  • Modifying FilconJags.R and FilconJagsPower.R for two groups instead of four groups
  • Sequential updating (including more data) when using MCMC
  • How to take the difference between two parameter distributions
  • What book should be studied after Doing Bayesian Data Analysis?
  • Exercise 13.4

March 9: Adapt ANOVA-style models for mixed between- within- designs?
My data set comprises two populations: patients and control subjects.
All subjects were repeatedly assessed - albeit in different conditions, because the treatment manipulation of the patients is not applicable for controls.
While our main interest pertains the patients' treatment conditions (4 levels), I'd like to compare them with the "natural variation" observed in the control group, who just repeated our experiments twice.
Thus abstractly, the patient group is split by a within-factor of 4 levels P1-P4, and the controls are split by a 2-level within factor, C1 and C2. What's the best method to compare these, assuming normal distributed response variables?

Currently I use one-way BANOVAs, extended by a subject factor to account for repeated measures. But with these models (4 and 2 levels, respectively), I think I can only model the two population groups separately in different models, which hinders direct comparisons between parameter estimates.

My questions in particular:
1) Can/should I extend the BANOVA model to include both populations?
2) Or is it possible to compare the parameters of different BANOVA models? If so, how?
3) Does this kind of design have a specific name? This might be handy to know to search for modeling examples.
Thank you very much in advance!
I talked to a colleague, who was of the opinion that I can directly compare posteriors, provided that they are in the same units and play the same "role" in the model.
In my example, it would be admissible to compare the b0 parameters, i.e. the BANOVA baselines from different models. Using that comparison, I would at least be able to state whether the groups are different from each other on average.

Specifically, the comparison has to be directly computed from the MCMC chains. For each pair of values, randomly sampled from the chains, I would compute the difference, and thus build up a distribution of differences. This then may be inspected with the usual methods, e.g. using its 95% HDI.

While this sounds correct to me, I'd like to know some more details. How many samples should/can I draw from the chains? Should I inspect the autocorrelation? Any help and/or pointers to literature are greatly appreciated!

Regarding the model structure, yes, you should be able modify the ANOVA-style models to suit the structure of your data. This is not to say it is necessarily trivial or obvious how to do that, especially when trying to impose the sum-to-zero constraints (if you choose to do so). For example, see this blog post about a split-plot design, which is related to the sort of design you have:
http://doingbayesiandataanalysis.blogspot.com/2012/05/split-plot-design-in-jags-revised.html

Regarding computing differences: At every step in the MCMC chain, you can compute differences or ratios or other transformations of parameters in whatever way is meaningful. At each step in the chain, the parameters are *jointly* representative of the posterior distribution.

If you have split your data into two parts that are modeled separately, then the parameter values are, by definition, independent. You can then set the two chains side-by-side and take differences between them as if they were in a joint space.


March 11: glm() in R, with a poem
n R question that I cannot get to the bottom of. I was hoping that you might be able to shed some light on it, as everyone I ask is drawing a blank. 

It is rather simple actually, I have two factors, and I want to do a logistic regression against a (0,1) target using glm in R. 

I have a 0,1 target and then use some explanatory variables which are factors with levels e.g. ... When I try and perform the logistic regression it refuses to use the first level of each factor, as you can see from the screen cap jk1.png attached to this e-mail. ...  Is this something you have experience of? Am I overlooking something trivial?! Apologies if so! 
As your question is about glm() in R, and not about Bayesian analysis per se, I will defer to the R experts. You might try posting your question on http://stats.stackexchange.com/ or related groups.
Best wishes.
In my desperation
To elucidate the facts
I disturbed you unnecessarily
When I should have consulted stacks.

You're welcome, it's a superb book, and has helped me a great deal. Kind regards

March 1: Metropolis algorithm
I am currently doing my statistics thesis on ... which requires quite a great knowledge of Bayesian Theory especially MCMC methods. However I have some minor problems regarding the Metropolis algorithm:

Any distribution can be attributed to the proposal density function?

Also, the initial values given to the parameters (i.e. the first step of the Metropolis algorithm) are random or there must be some thinking behind the choice of values?

Thank you very much
If you are programming your own Metropolis sampler, you should study the general Metropolis-Hastings algorithm and learn about methods for "tuning" the proposal distribution during an adaptation phase. A great thing about JAGS/BUGS is that they do it for you.

The initial values can be anywhere that has non-zero prior and non-zero likelihood (actually, even that is okay - the real constraint is that the prior times likelihood needs to be defined at the initial values), but initial values near the mode of the posterior make burn-in a lot faster. In either case, you need to monitor convergence to be sure that the chains have lost any bias from their starting positions.

Good luck with your work.



March 14: I have some data, what should I do?
I'm contacting you to ask for your advise on the following problem.


I have 500 samples, each sample being ... [extended description omitted].
Using Bayesian analysis I want to determine the error rate of ... .
Below is an exemplary of the data set. [data omitted] I have never used Bayesian Analysis before so I dont really know how to proceed. I'm thinking that Bayesian analysis is suitable to answer the research question.
-The idea is to use a non informative prior, so the sample can be positive or negative.
-Not sure what distribution to use (binomial, bernoulli, multinomial,). not sure.
-The rest, I do not know.


I look forward to your reply.
Thanks very much.
Thanks for you interest in Bayesian analysis. I hope you do choose to pursue it.

The first thing you need to analyze your data is a descriptive model. The goal of the analysis is to estimate parameter values that are meaningful descriptions of the data. Once you have a model, Bayesian analysis provides a distribution of credible values of the parameters, given the data.

As your field of research is not my specialization, I'm not in a position to prescribe the model you should use. But, whatever it is, I am confident that Bayesian analysis is a richly informative method for analyzing your data.

Best wishes,



 March 16: Could you look over my conference paper?
... I will be presenting a paper at the annual meeting of the ... Association. My paper is based on your BEST software, and I was wondering if you would be willing to take a look at it to point out any major errors I may have committed and save me from potential embarrassment.
Thanks for thinking of me and showing me a draft of your paper. [And I meant that sincerely!] Unfortunately I don't have time to study its 20 pages. If you have a specific question about some topic, however, I might be able to provide a quick answer. Best of luck with your presentation!



March 16: Time series models?
Do you think it is possible to apply Bayesian data methods to time series? What I would hope to avoid is trying to splice GARCH methodology with the Bayesian approach. GARCH is lumbering and clumsy enough in itself. For high frequency financial data, it must surely be possible to assume distribution stability, until evidence begins to emerge that the distribution has broken down, and needs to be reformulated? I would greatly value your advice.
I have not ventured deeply into time-series models, so I have no personal endorsements of particular resources. But Bayesian methods can (in principle) be applied to any model that has parameters, even so-called "non-parametric" models (which have up to an infinite number of parameters). For example, there is a 2011 book titled "Bayesian time series models" that collects recent advanced applications. Although it's not a beginner's tutorial, it might provide useful pointers.


Feb. 25: How to write up a Bayesian analysis
Sir, thanks for Bayesian Estimation Supersedes the t-Test, I can make the plots, sorry but how to analyse the plots and write a paper. Will you please put up a model paper with such analyses. [Here] we have been brought up on a strict diet of ANOVA-RBD/CRD, and t tests with CV values (Analysis of Variance - Randomized Block Design / Completely Randomized design). How can I do these analyses / similar in Bayesian methods,
Reply from Mike Meredith:

I can't point you to papers describing a Bayesian analysis of ANOVA-type models, but I will indicate sources with advice and pass this on to John Kruschke who may have more references to hand.

John has a blog post on the topic at http://doingbayesiandataanalysis.blogspot.com/2012/05/how-to-report-bayesian-analysis.html. This has a reproduction of the first part of section (23.1) on "Reporting a Bayesian analysis" from Kruschke, J.K. (2011) Doing Bayesian data analysis: a tutorial with R and BUGS Elsevier, Amsterdam etc.

The main points are summarised in Kruschke, J.K., Aguinis, H., & Joo, H. (2012) The time has come: Bayesian methods for data analysis in the organizational sciences. Organizational Research Methods, 15, 722-752 in the section "Recommendations and Illustration of How to Report Bayesian Analysis Results", where they also give an example of writing up a regression analysis.

Also Kruschke, J.K. (2013) Bayesian estimation supersedes the t test. *Journal of Experimental Psychology: General*, 142, 573-603 has a section on "Reporting the results of a Bayesian analysis". You would need to cite that paper as the description of the methods for robust Bayesian analysis.

There are links to the two Kruschke papers at http://www.indiana.edu/~kruschke/publications.html

In my own field (wildlife ecology) we can rarely do experiments, so hypothesis testing is scarcely relevant anyway. The focus is on estimation (of density, abundance, occupancy, survival, etc) or on modelling. In the latter context, information theoretic approaches (using Akaike's Information Criterion, AIC) have been the norm, and moving from there to a Bayesian approach is relatively painless. So I can't point you to papers reporting a Bayesian analysis of ANOVA-type data; Marc Kery's book, An Introduction to WinBUGS for Ecologists (Academic Press, 2010), shows how to do this, but not how to write it up.



April 1: Trying to adapt programs, getting errors
I have been looking through your textbook Doing Bayesian Data Analysis and specifically have been looking at how to do the Bayesian ANOVA in Chapter 18. I have been running into many problems, and I was wondering if you would have time to take a quick look over my code to see what's going wrong. I have no familiarity with R prior to this book and no experience programming, so I am not sure where my errors are. Generally speaking, the problem is that objects and functions are not being found. I have attached a the .csv of my imaginary data set and a .txt of the code I am running in RStudio in case you are able to take a look (I don't mean to presume you will, it's just quicker to already have the materials attached). I do load the .csv into RStudio before running the code. I suppose my core problem here is that I don't know how to take the code from the book using the examples there and apply the code to my own data sets for personal use. Thank you for your time,
Thanks very much for your interest in the book. To have any chance to help you, I need to know what error messages are being shown. Could you please run your script and then copy and paste the R console, with error messages, into an email? Especially important is to identify the very first error that occurs.
Then I run the code, which I have attempted (unsuccessfully) to adapt from what is in the textbook for a Bayesian ANOVA.

After running the code, this is what I get. It seems like it's failing to create several objects that it needs to perform the analysis, or is not finding the functions it needs to do it. I'm not really sure where the problem begins. I know in programming code a single error can lead to other errors, but this is my first exposure to R programming and the code in the book is for specific practice examples and I'm just not sure if I am adapting it to my pretend data set properly (which I'm most likely not given the errors).

 I also get several windows where graphs are supposed to be, but nothing is in them. I colored the errors red to make them easier to find.

...

> modelCheck( "model.txt" )
Error: could not find function "modelCheck"
+     fnroot = paste( fnroot , dataSource , sep="" )
+     datarecord = read.table( "model.txt", header=T ,
+                              colClasses=c("factor","numeric") )
... 
Error: unexpected symbol in:
"    NxLvl = length(unique(datarecord$Group))
    contrastList = list(1v2"
>    
A few quick thoughts:

* If you're just interested in comparing two groups, definitely look here:
http://www.indiana.edu/~kruschke/BEST/
It provides you with complete programs for that case.

* I now strongly recommend NOT using BUGS, and instead using JAGS. JAGS is much like BUGS but works better. See these posts:
http://doingbayesiandataanalysis.blogspot.com/2012/01/now-in-jags-now-in-jags.html
http://doingbayesiandataanalysis.blogspot.com/2012/01/complete-steps-for-installing-software.html

* Your first error,
> modelCheck( "model.txt" )
Error: could not find function "modelCheck"
is that R cannot find BRugs. I think you need to first say
library(BRugs)
or whatever (I've forgotten since it's been so long that I've used BRugs).

* Your second error is that datarecord() is reading in model.txt, not data.
Hope that helps.



March 25: Modifying FilconJags.R and FilconJagsPower.R for two groups instead of four groups
The R code assumes the existence of four conditions (consistent with the ongoing example), but the data file has data for only two, so the code fails when it attempts to access the non-existent data. I can remove the code for handling the two conditions not in the data, but I assume that the intent was to have all four in the Rdata file. Is there a version of FilconJagsMuKappa.Rdata that has the expected four conditions?
If I understand your question correctly, what you are trying to do is apply the FilconJags.R and FilconJagsPower.R programs to data with only 2 groups instead of 4 groups. I think the programs should run without complaint for just 2 groups, although there won't be much benefit (i.e., shrinkage) from the hierarchical structure when there are only 2 groups.

Try this:
In FilconJags.R, include only the first 2 groups. Find the data section and insert the lines shown here:

# THE DATA.
# For each subject, specify the condition s/he was in,
# the number of trials s/he experienced, and the number correct.
# (These lines intentionally exceed the margins so that they don't take up
# excessive space on the printed page.)
cond = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4)
N = c(64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64)
z = c(45,63,58,64,58,63,51,60,59,47,63,61,60,51,59,45,61,59,60,58,63,56,63,64,64,60,64,62,49,64,64,58,64,52,64,64,64,62,64,61,59,59,55,62,51,58,55,54,59,57,58,60,54,42,59,57,59,53,53,42,59,57,29,36,51,64,60,54,54,38,61,60,61,60,62,55,38,43,58,60,44,44,32,56,43,36,38,48,32,40,40,34,45,42,41,32,48,36,29,37,53,55,50,47,46,44,50,56,58,42,58,54,57,54,51,49,52,51,49,51,46,46,42,49,46,56,42,53,55,51,55,49,53,55,40,46,56,47,54,54,42,34,35,41,48,46,39,55,30,49,27,51,41,36,45,41,53,32,43,33)

# Include only the first two groups:
includeIndex = ( cond <= 2 )
cond = cond[includeIndex]
N = N[includeIndex]
z = z[includeIndex]

nSubj = length(cond)
nCond = length(unique(cond))
Then source FilconJags.R. This will save the MCMC chain for the 2-group model. It will also try to produce graphs, but because the graphs assume four groups, the program will balk. But the posterior MCMC is saved before that happens.

FilconJagsPower.R will now run as is, except for two things. First, its graphics assume 4 groups. Therefore you can make this change:

    # Make plots for first 10 simulated experiments:
    #if ( expIdx <= 10 ) { plotRes = T } else { plotRes = F }
    plotRes=FALSE
And, its "goals" are defined in terms of 4 groups, so you can make this change:

# Goal is filtration vs condensation 95% HDI exceeds zero:
#goal1Ach = ( HDIofMCMC( (mu[1,]+mu[2,])/2 - (mu[3,]+mu[4,])/2 )[1] > 0.0 )
# Goal is filtration1 vs filtration2 95% HDI exceeds zero:
goal2Ach = ( HDIofMCMC( mu[1,]-mu[2,] )[1] > 0.0 )
# Goal is condensation1 vs condensation2 95% HDI has width less than 0.2:
#HDIlim = HDIofMCMC( mu[3,]-mu[4,] )
#HDIwidth = HDIlim[2] - HDIlim[1]
#goal3Ach = ( HDIwidth < 0.2 )
#goalAchieved = c( goal1Ach , goal2Ach , goal3Ach )
goalAchieved = c( goal2Ach )

I hope that steers you toward achieving your goals!


April 1: Sequential updating (including more data) when using MCMC
I'm sorry to bother you, but I have a question for which I did a lot of researches, but I couldn't get any result. Basically, I implemented a Hierarchal Bayesian model in R and it works fine. However, what I will like to do now is to apply the model on a trial by trial basis (instead of applying it on the whole set of trials) and get the parameters estimations after each trial to use them both as "prior" parameters on the next trial and to track the evolution of these parameters. I do not want to use sequential methods, I just want to apply the model on a trial-by-trial methods. Do you think you can give some idea?
If you are using MCMC methods, and you really want to use the posterior from one run as the prior for the next, it is not directly possible. This is because the posterior distribution from the first run will usually be complex (e.g., curved and correlated in parameter space, with possibly skewed tails and so on) and not possible to be exactly expressed mathematically in the prior specification. (The beauty of using conjugate priors is that the posterior has the same mathematical form as the prior, so incremental updating is elegantly easy. For complex models, there are no [known] conjugate priors.)
Some people attempt to get around that problem by approximating the posterior of the first run with some simpler mathematical form, and using that mathematical re-description as the prior for the next run. But this method is only as good (or bad) as the approximation, and I wouldn't recommend it unless you have really cogent reasons to think that the approximation is excellent.
If you are merely illustrating how the posterior changes with incremental updating, then you can simply do the data analysis using incremental data sets. Do an analysis with {x1,...xN}, then an analysis using the original prior with {x1,...,xN,xN+1}, then an analysis using the original prior with {x1,...xN,xN+1,xN+2}, and so on.



March 28: How to take the difference between two parameter distributions
one thing that has me perplexed right now is how you take the difference between two posterior distributions to see if there is a change. One very common application to us is to have similar data on a specific metric before and after a ... change. Naturally, the intent of our analysts is to understand if there has been a difference after that ... change. Our first step is to calculate the posteriors of the two datasets (before and after). Then I assumed that the posteriors are a random variable so they are compared by forming the convolution of the two distributions. However, if you just visualize two normal distributions (the posteriors) where one has a mean near zero and the other is shifted (say, to 5), the convolution will still cover zero. So although these two datasets are clearly different and the software change has had an impact, the HDI will still straddle zero and the conclusion will be that there was no change. Is this method of using convolution how you did your poster comparisons in the textbook? If not, I would very much like to know the methods for taking this difference like you did with respect to recall and classical music in your text (chapter 12). Thanks very much for you time,
At every step in the MCMC chain, the parameter values at that step are jointly representative of the multidimensional posterior distribution. Therefore the difference of relevant parameter values at each step is a representative difference.

I think the first example of this in the book occurs at the top of p. 176.

Because posterior distributions can have strange shapes, it is important to compute the difference at each step, maintaining the joint parameter values. See Fig. 12.3, p. 300.



March 27: What book should be studied after Doing Bayesian Data Analysis?
What do you recommend as a good second book [after Doing Bayesian Data Analysis]? I’m a long time R user and my problem space is in engineering.
For a follow-up book, I'd recommend
Ntzoufras (2009) Bayesian modeling using WinBUGS
and
Gelman et al. (2014) Bayesian data analysis 3rd ed.
You might also try
Lee & Wagenmakers (2014) Bayesian cognitive modeling.

None of those is specifically engineering, but Ntzoufras and Lee & Wagenmakers give lots of BUGS/JAGS examples and Gelman et al. is a great authoritative resource.



March 27: Exercise 13.4
Exercise 13.4 refers to Kruscke1996CSbugs.R; I don’t see it (or the Jags version) in the collection of R programs and data. The associated data file is there (Kruschke1996CSdatsum.Rdata), however. Is it not there on purpose (i.e., the student is expected to write the code from scratch)?
The program for that exercise was deliberately excluded from the posted programs because it was supposed to be a programming exercise, not merely "click the 'source' button and watch what happens." I should not have mentioned the program name in the exercise description, but I did because that was how I got my word processing software (LaTeX) to include code snippets. The program I have for that exercise is old and not prepared for release, so I would prefer not to circulate it. I apologize that this is not the answer you were hoping for!