I was very pleased to have had the opportunity to present a talk regarding Bayesian models of ordinal data at Princeton University, on Tuesday, March 5, 2019. An abstract of the talk is here, and the published article on which the talk was based is here.
A big thanks to Dr. Ting Qian who orchestrated the visit wonderfully. The lecture hall was packed, with people standing outside the door in the hall, in no small part due to Ting's organization of a popular statistics series. Big thanks also to the staff who made arrangements. And, of course, thanks to the many people who took time to meet with me while I was there.
I took a few hours the next morning to visit some special places in Princeton. In particular, I visited the graves of John Von Neumann and Kurt Godel. Von Neumann made contributions to Monte Carlo methods at the foundation of MCMC methods in Bayesian analysis.
Here's a snapshot of Von Neumann's grave stone:
And only a few feet away, Godel's grave stone:
After visiting the cemetery, I went to the Princeton University Art Museum and came across this painting by Georgia O'Keeffe. It's titled, "From a New Jersey Weekend II," painted in 1941.
It was an interesting coincidence that both O'Keeffe's and my visits featured two notable grave stones.
Here's a map of the Princeton Cemetery, with a marker at the approximate location of the graves of Von Neumann and Godel.
For posts about the book, DBDA2E, visiting other famous grave sites, see this post and its links to previous posts.
Doing Bayesian Data Analysis
Wednesday, March 13, 2019
Thursday, February 21, 2019
A Stendhal moment on the way to Bayesian stats class
On the way to my Bayesian stats class this morning I had a few moments of Stendhal syndrome and thought I'd share it with y'all. (Aside from being on the way to Bayesian stats class, it has nothing directly relevant to Bayesian statistics.) The sun was shining through some construction fences lining a sidewalk and were unintentionally beautifying an otherwise routine Thursday morning. Here are a couple of snapshots from my office window:
It reminds me of Christo and Jeanne-Claude's Running Fence:
It reminds me of Christo and Jeanne-Claude's Running Fence:
Sunday, October 21, 2018
Sinusoidal trend and global warming UPDATED
In a previous post from six years ago, I fit a sinusoidal trend, with auto-regressive component, to daily temperature data. (Spoiler alert: It's still getting warmer.) Recently I've received inquiries about the script for that analysis. I disinterred the ancient script, updated it, and grabbed more recent temperature data. The script and data file are linked below.
The result of the new analysis:
As you can see from the plot (above), the slope of the linear spine of the sinusoidal variation is 0.068 degrees Fahrenheit per year. The 95% HDI on the estimate spans zero, just as it did with the smaller data set in the previous post from six years ago. But I'm pretty sure that if this city were put into a big hierarchical model with lots of other cities across the globe, the high-level estimate of slope on the linear spine would be clearly greater than zero.
But evidence for global warming is not the point of this post. The point is to link the full script and data file. Here they are: R script; data. Hope this is helpful.
The result of the new analysis:
As you can see from the plot (above), the slope of the linear spine of the sinusoidal variation is 0.068 degrees Fahrenheit per year. The 95% HDI on the estimate spans zero, just as it did with the smaller data set in the previous post from six years ago. But I'm pretty sure that if this city were put into a big hierarchical model with lots of other cities across the globe, the high-level estimate of slope on the linear spine would be clearly greater than zero.
But evidence for global warming is not the point of this post. The point is to link the full script and data file. Here they are: R script; data. Hope this is helpful.
Wednesday, September 19, 2018
Which movie (treatment, group) is better? Opposite conclusions from different models.
Which movie is better? One way to answer is by considering the star ratings given to those movies. Just treat those 1-to-5 star ratings as numbers, throw them into a t test, and out pops your answer. Right? Not necessarily...
The analogous structure arises in many situations. Suppose, for example, we ask which group is happier, a group of poor people or a group of rich people? One way to answer is by considering subjective happiness ratings from an ordinal scale: 1 = very unhappy, 2 = mildly unhappy, 3 = neither unhappy nor happy, 4 = mildly happy, 5 = very happy. Just treat those 1-to-5 ratings as numbers, throw them into a t test, and out pops your answer. Right? Not necessarily...
Or, consider ratings of symptom intensity in different treatment groups. How bad is your headache? How depressed do you feel? Just treat the ratings as numbers and throw them into a t test, and out pops your answer. Right? Not necessarily...
Treating ordinal values as if they were numeric can lead to misinterpretations. Ordinal values do not indicate equal distances between their levels, nor equal coverage of each level. The conventional t test (and ANOVA and least-squares regression, etc.) assumes the data are metric values normally distributed around the model's predicted values. But obviously ordinal data are not normally distributed metric values.
A much better model of ordinal data is the ordered-probit model, which assumes a continuous latent dimension that is mapped to ordinal levels by slicing the latent dimension at thresholds. (The ordered-probit model is not the only good model of ordinal data, of course, but it's nicely analogous to the t test etc. because it assumes normally distributed noise on the latent dimension.)
The t test and the ordered probit model can produce opposite conclusions about the means of the groups. Here's an example involving star ratings from two movies:
The figure above shows data from two movies, labelled as Cases 5 and 6 in the first two columns. The pink histograms show the frequency distributions of the star ratings; they are the same in the upper and lower rows. The upper row shows the results from the ordered-probit model. The lower row shows the results from the metric model, that is, the t test. In particular, the right column shows the posterior difference of mu's for the two movies The differences are strongly in opposite directions for the two analyses. Each posterior distribution is marked with a dotted line at a difference of zero, and the line is annotated with the percentage of the distribution below zero and above zero. Notice the ordered-probit model fits the data much better than the metric model, as shown by the posterior predictions superimposed on the data: blue dots for the ordered probit model, and blue normal distributions for the metric model. (This is Figure 8 of the article linked below.)
Read all about it here:
Published article:
https://www.sciencedirect.com/science/article/pii/S0022103117307746
Preprint manuscript: https://osf.io/9h3et/
R code: https://osf.io/53ce9/files/
The analogous structure arises in many situations. Suppose, for example, we ask which group is happier, a group of poor people or a group of rich people? One way to answer is by considering subjective happiness ratings from an ordinal scale: 1 = very unhappy, 2 = mildly unhappy, 3 = neither unhappy nor happy, 4 = mildly happy, 5 = very happy. Just treat those 1-to-5 ratings as numbers, throw them into a t test, and out pops your answer. Right? Not necessarily...
Or, consider ratings of symptom intensity in different treatment groups. How bad is your headache? How depressed do you feel? Just treat the ratings as numbers and throw them into a t test, and out pops your answer. Right? Not necessarily...
Treating ordinal values as if they were numeric can lead to misinterpretations. Ordinal values do not indicate equal distances between their levels, nor equal coverage of each level. The conventional t test (and ANOVA and least-squares regression, etc.) assumes the data are metric values normally distributed around the model's predicted values. But obviously ordinal data are not normally distributed metric values.
A much better model of ordinal data is the ordered-probit model, which assumes a continuous latent dimension that is mapped to ordinal levels by slicing the latent dimension at thresholds. (The ordered-probit model is not the only good model of ordinal data, of course, but it's nicely analogous to the t test etc. because it assumes normally distributed noise on the latent dimension.)
The t test and the ordered probit model can produce opposite conclusions about the means of the groups. Here's an example involving star ratings from two movies:
The figure above shows data from two movies, labelled as Cases 5 and 6 in the first two columns. The pink histograms show the frequency distributions of the star ratings; they are the same in the upper and lower rows. The upper row shows the results from the ordered-probit model. The lower row shows the results from the metric model, that is, the t test. In particular, the right column shows the posterior difference of mu's for the two movies The differences are strongly in opposite directions for the two analyses. Each posterior distribution is marked with a dotted line at a difference of zero, and the line is annotated with the percentage of the distribution below zero and above zero. Notice the ordered-probit model fits the data much better than the metric model, as shown by the posterior predictions superimposed on the data: blue dots for the ordered probit model, and blue normal distributions for the metric model. (This is Figure 8 of the article linked below.)
Read all about it here:
Published article:
https://www.sciencedirect.com/science/article/pii/S0022103117307746
Preprint manuscript: https://osf.io/9h3et/
R code: https://osf.io/53ce9/files/
Friday, September 7, 2018
Analyzing ordinal data with metric models: What could possibly go wrong? (just published)
Just published:
Analyzing ordinal data with metric models: What could possibly go wrong?
We surveyed all articles in the Journal of Personality and Social Psychology(JPSP), Psychological Science (PS), and the Journal of Experimental Psychology: General (JEP:G) that mentioned the term “Likert,” and found that 100% of the articles that analyzed ordinal data did so using a metric model. We present novel evidence that analyzing ordinal data as if they were metric can systematically lead to errors. We demonstrate false alarms (i.e., detecting an effect where none exists, Type I errors) and failures to detect effects (i.e., loss of power, Type II errors). We demonstrate systematic inversions of effects, for which treating ordinal data as metric indicates the opposite ordering of means than the true ordering of means. We show the same problems — false alarms, misses, and inversions — for interactions in factorial designs and for trend analyses in regression. We demonstrate that averaging across multiple ordinal measurements does not solve or even ameliorate these problems. A central contribution is a graphical explanation of how and when the misrepresentations occur. Moreover, we point out that there is no sure-fire way to detect these problems by treating the ordinal values as metric, and instead we advocate use of ordered-probit models (or similar) because they will better describe the data. Finally, although frequentist approaches to some ordered-probit models are available, we use Bayesian methods because of their flexibility in specifying models and their richness and accuracy in providing parameter estimates. An R script is provided for running an analysis that compares ordered-probit and metric models.
Published article:
https://www.sciencedirect.com/science/article/pii/S0022103117307746
Preprint manuscript: https://osf.io/9h3et/
R code: https://osf.io/53ce9/files/
We surveyed all articles in the Journal of Personality and Social Psychology(JPSP), Psychological Science (PS), and the Journal of Experimental Psychology: General (JEP:G) that mentioned the term “Likert,” and found that 100% of the articles that analyzed ordinal data did so using a metric model. We present novel evidence that analyzing ordinal data as if they were metric can systematically lead to errors. We demonstrate false alarms (i.e., detecting an effect where none exists, Type I errors) and failures to detect effects (i.e., loss of power, Type II errors). We demonstrate systematic inversions of effects, for which treating ordinal data as metric indicates the opposite ordering of means than the true ordering of means. We show the same problems — false alarms, misses, and inversions — for interactions in factorial designs and for trend analyses in regression. We demonstrate that averaging across multiple ordinal measurements does not solve or even ameliorate these problems. A central contribution is a graphical explanation of how and when the misrepresentations occur. Moreover, we point out that there is no sure-fire way to detect these problems by treating the ordinal values as metric, and instead we advocate use of ordered-probit models (or similar) because they will better describe the data. Finally, although frequentist approaches to some ordered-probit models are available, we use Bayesian methods because of their flexibility in specifying models and their richness and accuracy in providing parameter estimates. An R script is provided for running an analysis that compares ordered-probit and metric models.
Published article:
https://www.sciencedirect.com/science/article/pii/S0022103117307746
Preprint manuscript: https://osf.io/9h3et/
R code: https://osf.io/53ce9/files/
Tuesday, May 8, 2018
Just published: "Rejecting or Accepting Parameter Values in Bayesian Estimation"
Just published: "Rejecting or Accepting Parameter Values in Bayesian Estimation" in the journal, Advances in Methods and Practices in Psychological Science.
Abstract: This article explains a decision rule that uses Bayesian posterior distributions as the basis for accepting or rejecting null values of parameters. This decision rule focuses on the range of plausible values indicated by the highest density interval of the posterior distribution and the relation between this range and a region of practical equivalence (ROPE) around the null value. The article also discusses considerations for setting the limits of a ROPE and emphasizes that analogous considerations apply to setting the decision thresholds for p values and Bayes factors.
From the introduction:
Abstract: This article explains a decision rule that uses Bayesian posterior distributions as the basis for accepting or rejecting null values of parameters. This decision rule focuses on the range of plausible values indicated by the highest density interval of the posterior distribution and the relation between this range and a region of practical equivalence (ROPE) around the null value. The article also discusses considerations for setting the limits of a ROPE and emphasizes that analogous considerations apply to setting the decision thresholds for p values and Bayes factors.
Figure 1 of the article. |
From the introduction:
In everyday life and in science, people often gather data to estimate a value precisely enough to take action. We use sensory data to decide that a fruit is ripe enough to be tasty but not overripe—that the ripeness is “just right” (e.g., Kappel, Fisher-Fleming, & Hogue, 1995, 1996). Scientists measured the position of the planet Mercury (among other things) until the estimate of the parameter γ in competing theories of gravity was sufficiently close to 1.0 to accept general relativity for applied purposes (e.g., Will, 2014).The published article is available here (http://journals.sagepub.com/doi/full/10.1177/2515245918771304) and a pre-print version, with some differences in details, is available here (https://osf.io/s5vdy).
These examples illustrate a method for decision making that I formalize in this article. This method, which is based on Bayesian estimation of parameters, uses two key ingredients. The first ingredient is a summary of certainty about the measurement. Because data are noisy, a larger set of data provides greater certainty about the estimated value of measurement. Certainty is expressed by a confidence interval in frequentist statistics and by a highest density interval (HDI) in Bayesian statistics. The HDI summarizes the range of most credible values of a measurement. The second key ingredient in the decision method is a range of parameter values that is good enough for practical purposes. This range is called the region of practical equivalence (ROPE). The decision rule, which I refer to as the HDI+ROPE decision rule, is intuitively straightforward: If the entire HDI—that is, all the most credible values—falls within the ROPE, then accept the target value for practical purposes. If the entire HDI falls outside the ROPE, then reject the target value. Otherwise, withhold a decision.
In this article, I explain the HDI+ROPE decision rule and provide examples. I then discuss considerations for setting the limits of a ROPE and explain that similar considerations apply to setting the decision thresholds for p values and Bayes factors.
Sunday, February 25, 2018
Make model diagrams for human comprehension and ease of programming
There's a great new book by Farrell and Lewandowsky, Computational Modeling of Cognition and Behavior (at the publisher, at Amazon.com), that includes some chapters on Bayesian methods. Each chapter includes a little "in vivo" commentary by an outside contributor. My commentary accompanies their chapter regarding JAGS. The commentary is posted here in a succession of three blog posts; this is 3 of 3. (Part 1 is here, and part 2 is here.) Do check out their book!
Make model diagrams for human comprehension and ease of programming
While a JAGS model specification captures the full structure of the model, it can help human beings to have a diagrammatic representation of the model. A diagram can help the viewer achieve a comprehensive overview of the relations between parameters and their meanings with respect to each other and to the data. A good conceptual diagram of a model can also guide writing the JAGS model specification.
For example, Figure 8.13 (below) shows a representation of the normal model used in the previous section. Because of graphical conventions for probability distributions, the data must be shown at the bottom of the diagram. Starting with y_{i}, the diagram shows that the data come from a normal distribution that has parameters μ and σ. Then the top of the diagram illustrates the prior distributions on the parameters.
Often when I’m creating a new model, I first sketch out a diagram in the style of Figure 8.13, and after I’m sure I have a coherent structure, then I type the model into JAGS, scanning the diagram from the bottom up.
There is another convention that is sometimes used to illustrate Bayesian models. This convention has historical roots in general treatments of statistical models that specify probabilistic dependencies between parameters such that no dependencies cycle back on themselves. Such structures are called directed acyclic graphs (DAG’s). In particular, the DAG diagrammatic convention was used by the software DoodleBUGS, which was a component of WinBUGS (Spiegelhalter, Thomas, Best, & Lunn, 2003), the precursor to JAGS.
Make model diagrams for human comprehension and ease of programming
While a JAGS model specification captures the full structure of the model, it can help human beings to have a diagrammatic representation of the model. A diagram can help the viewer achieve a comprehensive overview of the relations between parameters and their meanings with respect to each other and to the data. A good conceptual diagram of a model can also guide writing the JAGS model specification.
For example, Figure 8.13 (below) shows a representation of the normal model used in the previous section. Because of graphical conventions for probability distributions, the data must be shown at the bottom of the diagram. Starting with y_{i}, the diagram shows that the data come from a normal distribution that has parameters μ and σ. Then the top of the diagram illustrates the prior distributions on the parameters.
model { for ( i in 1:N ) { y[i] ~ dnorm( mu , 1/sigma^2 ) } mu ~ dunif( -100 , 100 ) sigma ~ dunif( 0 , 100 ) } |
(Listing 8.11. Describe data with a normal distribution in JAGS.) |
There is another convention that is sometimes used to illustrate Bayesian models. This convention has historical roots in general treatments of statistical models that specify probabilistic dependencies between parameters such that no dependencies cycle back on themselves. Such structures are called directed acyclic graphs (DAG’s). In particular, the DAG diagrammatic convention was used by the software DoodleBUGS, which was a component of WinBUGS (Spiegelhalter, Thomas, Best, & Lunn, 2003), the precursor to JAGS.
For more disucssion, see p. 197 of Kruschke (2015). It’s repeatedly emphasized for many different models in that book that every arrow in a model diagram (usually) has a corresponding line of code in JAGS. See another comparison of diagrams at this blog post. See tools for creating diagrams at this blog post.
(Ironically, the next chapter of Farrell and Lewandowsky's book violates this advice. You can judge for yourself whether you think the DAGs have any useful correspondence to the JAGS code. Despite the use of DAGs, their book is a great resource!)
Subscribe to:
Posts (Atom)