Doing Bayesian Data Analysis: January 2012

Saturday, January 28, 2012

Complete steps for installing software and programs

November 29, 2014:
For how to get the software, please see the book's web site here (https://sites.google.com/site/doingbayesiandataanalysis/software-installation).

Sunday, January 22, 2012

Jocular Disbelief

Posted by Yann LeCun - Jan 19, 2012 - Public Google+

Joke of the day (true story, circa 2004):
Radford Neal (giving a talk): I don't necessarily think that the Bayesian method is the best thing to do in all cases...
Geoff Hinton: Sorry Radford, my prior probability for you saying this is zero, so I couldn't hear what you said.

Wednesday, January 18, 2012

Specialized Workshop, Feb 17-18, Seton Hall University

I'll be doing a day-and-a-half workshop at Seton Hall University, including a specialized analysis of data produced by researchers at SHU. Details can be found here.

A list of future and past workshops can be found here.

Parameterizing a gamma distribution by mode and sd

When trying to fashion a gamma-shaped prior, I've found it more intuitive to start with the mode and standard deviation, instead of the mean and standard deviation as used in the book. The reason is that the gamma distribution is typically very skewed, and therefore the location of the mean is not very intuitive. Here I present how to compute the shape and rate parameters of the gamma distribution from a desired mode and standard deviation.

You can extract the mathematical formula for getting to shape and rate parameters from the R code I created:

# Specify desired mode and sd of gamma distribution:
mode = 10
sd = 20

# Here are the corresponding rate and shape parameter values:
ra = ( mode + sqrt( mode^2 + 4*sd^2 ) ) / ( 2 * sd^2 )
sh = 1 + mode * ra

show(sh)
show(ra)

# Graph it:
x = seq(0,mode+5*sd,len=1001)
plot( x , dgamma( x , shape=sh , rate=ra ) , type="l" ,
      main=paste("dgamma, mode=",mode,", sd=",sd,sep="") ,
      ylab=paste("dgamma( shape=",signif(sh,3)," , rate=",signif(ra,3)," )",
                 sep="") )
abline( v=mode , lty="dotted" )

The figure it creates looks like this (below). Notice that the mode is indeed as desired.

Hope that helps. Let me know.

Friday, January 13, 2012

Graphics (plots) in R for MacOS, Linux, and Windows

It seems that there is no universal, cross-operating-system way for opening and saving graphics in R, at least not with the functionality I'd like.

[this post is updated/superseded by this one.]

Here's what I know so far --- but please let me know better ways and I'll update this post. I use Windows, so the info here about MacOS and Linux is what I've learned from readers, and I haven't tried it myself.

What I'd like to do: 1. Open a new graphics window. 2. Put plots into it. 3. Save it in any format of my choice. 4. Do this repeatedly, for a succession of graphs.

How to do it in Windows:
1. Use the windows()command to open a new graphics window.
2. Use plot(), lines(), etc. to make the graph.
3. Use the savePlot() command after the graph is drawn, with an appropriate argument to specify the desired format. File formats that work with savePlot() in Windows include "jpg", "pdf", "eps", etc. Note that savePlot() supersedes dev.copy2eps(), which was used in older versions of the programs.
4. Repeat as desired.

Equivalents for windows() in other operating systems:

Universal: dev.new(). The dev.new() function is great when running in the primitive editor that comes with R, but dev.new() fails in the RStudio editor, which does not allow repeated calls of dev.new(). Sigh. I don't know if repeated calls to dev.new() work in other editors or on other platforms. If you try and have information to share, please let me know.
MacOS: quartz(). You can change all the windows() calls to quartz(). The arguments in windows() default to windows(width,height), which might be different than quartz(). So, programs that use, say, windows(10,6) could be changed to quartz(width=10,height=6). But, I'm told that savePlot() does not work with a Mac window opened with quartz(). Sigh.
MacOS again: x11(). Mac users can also open a new window by using the x11() command, with which savePlot() does work. But I'm told that x11() can produce ugly fonts and strange proportions. Sigh. I'm told that x11( type="cairo", ... ) can do the trick.
Linux: x11(). But I haven't heard from anyone whether this really works without further modification. Linux users can try dev.new(), too, if they're not using RStudio.

Equivalents for savePlot() in other operating systems:

MacOS: If the window was opened with x11(), then savePlot() can work but you must specify a format type the MacOS knows about. For example, you cannot specify type="jpg" but must instead specify type="jpeg". If the window was opened with quartz(), then I am not aware of any way you can save the plot, after the fact, from the program command line. Perhaps you can use a pull-down menu item to manually save it.
Linux: I assume that savePlot(), with an appropriate type specification, will work. If there a Linux users out there who have tried it, please let me know.

An alternative procedure that works across platforms: Instead of creating the graph in a window and saving it after producing it, we open a "device" of the desired type before creating the graph, and then we close the device after the graph is done. For example, the pdf(file="fileName.pdf",...) command, in Windows, opens a device of type pdf. Then you draw the graph in the usual way. When done, close the device with the command dev.off() Thus:

pdf(file="fileName.pdf")

# plot commands go here

dev.off()

Besides pdf(), there are bmp(), jpeg(), png(), and tiff() commands for opening devices of different types. To see a complete list for your system, at the R command line type: help("Devices")

For providing these tips, I thank various readers, including George Kachergis, Reinhold Kliegl, Mark Perlin, Young Ahn, Uwe Ligges, and others.

minNforHDIpower.R updated

In Chapter 13, program minNforHDIpower.R is used to compute the minimum sample size needed for desired power. The program that was previously posted had a few glitches that have now been corrected. (Specifically, the source()-ing HDIofICDF.R needed quotes, and the call to HDIofICDF needed to include the argument credMass=HDImass. I also modified the condition at the beginning to use xor().) The results reported in the book are correct. You can download the updated program from the link in the book website.

If there are still any problems, please let me know.

Many thanks to Nico Corea for bringing this issue to my attention.

Tuesday, January 10, 2012

New procedure required for installing BRugs and OpenBUGS

Before R 2.14, OpenBUGS came packaged with BRugs, so all you had to do was install BRugs, and OpenBUGS magically came along for the ride. Now, you must install OpenBUGS as a separate step in the process. Of course, if you are using JAGS, you do not need to install OpenBUGS or BRugs. But if you want to use BRugs and OpenBUGS, here's what to do:

1. Install R from this site. Use the 32-bit Windows version.
2. Install OpenBUGS from this site.
3. Invoke R (32-bit Windows), and type install.packages("BRugs")

I recommend using JAGS instead, as explained at this previous blog post.

Warning message produced by R 2.14.1 in multiple linear regression programs

R version 2.14.1 produces a warning message that did not occur in earlier versions of R. Specifically, the sd(y) function now produces a warning message if y is a column matrix and not a vector. This situation arises in the various programs for multiple linear regression. I have therefore replaced the offending uses of sd(y) with apply(y,2,sd). You can download the updated programs from the link in the book website.

If I missed any occurrences in these or other programs, please let me know.

Many thanks to Agustin Alonso Rodriguez for bringing this issue to my attention.

Friday, January 6, 2012

Complete example of right censoring in JAGS (with rjags)

In this post I provide a complete and simple example of how to do estimation of right-censored data in JAGS (with rjags). I could not find a complete, simple example anywhere (on the web or in print), and it took me lots of trial and error to figure out the tacit assumptions, which I try to make explicit here. (Of course, there still might be errors or infelicities. Let me know.)

Before launching into programming details, let me illustrate why this is important. Right-censored data can occur in many situations. For example, when measuring response times, the subject is given a limited temporal window in which to respond, and if no response has occurred by the end of the window, then the trial is aborted and the response is recorded as censored. If we assume that the subject was "on task" but merely didn't have enough time to respond before the experimenter got bored, then it would be inappropriate to omit this trial from the modeled data. In other words, the trial is providing genuine information about the response time, namely, that it takes longer than the experimenter's censoring limit. If we omit this trial from the modeled data, then our estimates will be too small!

Here is an example. The data are 500 values generated from a normal distribution, rescaled so that the sample mean is 100.0 and the sample standard deviation is 15.0. Then I marked any value above +1SD as NA. In other words, I censored about 15% of the data in the right tail. Suppose we model the data with a normal likelihood and estimate its μ and σ parameters. We would like the estimates to recover the true generating values of μ=100.0 and σ=15.0. But if we simply omit the censored values (as if they weren't part of the data), the estimates look like this:

Posterior estimates when censored data are omitted. Estimates are too low.
(ROPE is shown only for visual landmarks; it's not particularly meaningful in this case.)

Notice that the estimated parameter values are too low. This makes sense, because the model is trying merely to describe the uncensored data shown in grey, without any attempt to take into account the censored data.

On the other hand, if we tell JAGS that there are censored values that it should take into account, the estimates look like this:

Posterior estimates when censored data are modeled. Estimates are accurate.
(ROPE is shown only for visual landmarks; it's not particularly meaningful in this case.)

Notice that the estimated parameter values are very accurate. This improvement in accuracy (and meaningfulness!) is why we want to be able to model censored data.

The basic idea for implementing in JAGS is presented briefly (one paragraph) in the JAGS user manual. The idea starts with defining an indicator vector that encodes, for every data value, whether the data value was censored or not. The indicator value, here named isCensored, is 1 for censored values and 0 for non-censored values. Then that indicator value is modeled thus: isCensored ~ dinterval(y,censorLimit) It took me a while to infer from that syntax the key idea: When isCensored is 1 and y is missing, then all JAGS knows is that y is somewhere above censorLimit, so JAGS effectively imputes a random value for y from the likelihood function (defined separately, in the usual way). Here are the tacit assumptions that I eventually figured out:

Censored data must be recorded as NA, not as the value of censoring limit.
When explicitly initializing the chains, the censored values of the data must be explicitly initialized (to values above the censoring limits)!

The full program for generating the figures above can be found here. For details, read the comments in the top of the program. Please let me know of errors, improvements, etc. And please let me know if this was helpful.

Sunday, January 1, 2012

Now in JAGS! Now in JAGS!

I have created JAGS versions of all the BUGS programs in Doing Bayesian Data Analysis. Unlike BUGS, JAGS runs on MacOS, Linux, and Windows. JAGS has other features that make it more robust and user-friendly than BUGS. I recommend that you use the JAGS versions of the programs. Please let me know if you encounter any errors or inaccuracies in the programs.

JAGS is an MCMC sampler much like BUGS. To communicate with JAGS from R, a library called rjags is loaded into R. The diagram below illustrates the analogous roles of the sampling programs and the interface libraries:

JAGS, OpenBUGS, and WinBUGS are MCMC samplers. Each has a corresponding package for communicating with R. The book used OpenBUGS with BRugs. But the new programs feature JAGS with rjags.

Even if you use Windows, JAGS is nicer than BUGS for many reasons. One nicety is that JAGS has a dynamic progress bar that tells you how much of the requested chain has been sampled. JAGS also seems more robust and actually works in many situations where BUGS mysteriously fails; e.g., the equals(,) function works in JAGS where it would not work in BUGS.

It is easy to install JAGS and rjags. First, go to the JAGS web site and follow the instructions for downloading and installing JAGS for your operating system. Then, at the command line in R, type install.packages("rjags"). Done!

It is easy to get the JAGS versions of the programs for Doing Bayesian Data Analysis. JAGS versions of the programs use the same name as the BUGS versions, but with the string "Bugs" or "BRugs" replaced with "Jags". (All of the original programs are still available.) A zip file with all the programs and data files is available here. A list of individual programs is available here; click on the column header "last modified" twice to get the most recent files to appear at the top of the list. [Update January 12, 2012: A few programs (BernBeta...) were inadvertently missing from the upload of January 1. I've now posted them with the other programs. Thanks to a reader for pointing out the missing files.]

As explained in other blog posts, there are other revisions and additions to the programs.
* There is a new program for split-plot designs, called SplitPlotJags.R. See this blog post.
* The new versions of the programs do no thinning but use longer chains (defaulting to chain lengths of 50000, but you can use even longer chains for real research reports). See this blog post.
* The ANOVA-like programs consider only the sum-to-zero (STZ) versions of the parameters, because autocorrelation in the pre-STZ parameters is irrelevant. Use the versions of the programs with filenames that end with "STZ". See this blog post.
* Instead of saving plots using dev.copy2eps, the programs use savePlot. It is easy to change the savePlot command to save in whatever format you prefer, such as jpg/jpeg. Each program also begins with a check of the user's operating system to redefine the windows() command if necessary. Both of these changes were recommended by comments from readers.

[Added 28 January: Complete installation instructions are listed here.]

Doing Bayesian Data Analysis