(I first messed around with analyzing Anscombe's quartet back in October 2010, for personal interest. A conversation I had last week in Basel, Switzerland, suggested that this analysis might interest a larger audience. Hence this blog post.)

Below are the four data sets, shown as black circles. They are modeled with typical linear regression that assumes normality:

y ~ dnorm( b0+b1*x , sigma )

with vague priors on the intercept b0, the slope b1, and the standard deviation sigma.Actually, for easy consistency with the subsequent analysis, I used a

*t*distribution with the normality (df) parameter given a prior that insists it is very large, near 500, which is essentially normal. Thus, the model is

y ~ dt( b0+b1*x , sigma , nu )

with vague priors on b0, b1, and sigma, and the prior on normality being
nu ~ dnorm( 500 , sd=0.1 )

The figures below show the data in the left panels, with a smattering of credible regression lines superimposed, and veritical bars that indicate the 95% HDI of posterior predicted y values at selected x values:

Notice that the marginal parameter estimates are the same for all four data sets (up to MCMC sampling error), which is exactly the point that Anscombe wanted to emphasize for

*least-squares*regression. Here we see that it is also true for Bayesian estimation, and

*this is no surprise when we remember from formal Bayesian analysis of linear regression (with normally distributed noise) that the "sufficient statistics" of the data determine the posterior distribution, and the sufficient statistics are identical for the four data sets*.

One of the beauties of the flexibility of JAGS/BUGS is that it is easy to implement other models. In particular, it is easy to implement regression that is robust to outliers, using the t distribution. I used the same model as above,

y ~ dt( b0+b1*x , sigma , nu )

with vague priors on b0, b1, and sigma, but I put a vague prior on the normality of the t distribution:
nu <- nuMinusOne + 1

nuMinusOne ~ dexp( 1/29 )

This is the same prior on nu as is used in the "BEST" model for robust comparison of two groups. The resulting posteriors look like this:nuMinusOne ~ dexp( 1/29 )

The normality parameter (nu) is not much different from its prior for Types 1, 2, and 4, resulting in slightly smaller estimates of sigma than when forcing nu to be 500. Most notably, however, for Type 3, the model detects that the data fall on a line and have a single outlier, indicated by the SD (sigma) estimated as an extremely small value, and the normality (nu) hugging 1.0, which is its smallest possible.

Analogously, we could easily do a

*robust*quadratic trend analysis, and thereby detect the non-linear trend in Type 2. See this previous blog post for how to extend a JAGS/BUGS linear regression to include a quadratic trend.

To reiterate and conclude, the Anscombe quartet not only makes its original point, that we should look at the posterior predictions to make sure that the model is a reasonable description of the data, but it also highlights that for Bayesian analysis using JAGS/BUGS, it is easy to try other models that may be better descriptions of the data.

ADDED 18 JUNE 2013: Rasmus Baath, after reading the above post, has actually implemented the extensions that I merely suggested above. Please continue reading his very interesting follow up.

This comment has been removed by the author.

ReplyDeleteNice post! For fun I tried to extend your model to, in some sense, model all four datasets in Anscombe's quartet. Perhaps not useful, but it was a fun exercise! Wrote a post about it here:

ReplyDeletehttp://www.sumsar.net/2013/06/bayesian-modeling-of-anscombes-quartet/

Very interesting! Thanks! (I edited the post to highlight your follow up.)

ReplyDeleteSorry to bother, I just rolled out a new version of my home page and the link to the "continuation" of you post has unfortunately automatically changed to:

ReplyDeletehttp://www.sumsar.net/blog/2013/06/bayesian-modeling-of-anscombes-quartet/

/Rasmus