Doing Bayesian Data Analysis: Don't treat ordinal data as metric -- update of movie ratings

Sunday, January 28, 2018

Don't treat ordinal data as metric -- update of movie ratings

In a previous post I applied a Bayesian ordered-probit model to movie ratings and showed how the results differ from treating the data as if they were metric. The metric model used frequentist t tests (because that's what most applied researchers would do). In this post, I re-analyze the same data as if they were metric, but using a Bayesian model that has the same hierarchical structure as the Bayesian hierarchical ordered-probit model I used before. Here we can compare ordered-probit to metric treatments with all else held constant. Spoiler: Same conclusions, of course. The ordered-probit model fits the data much better than the metric model. Don't treat ordinal data as metric.

Please see the previous post for details about the data and the models. In particular, note that there is hierarchical structure on the standard deviations across movies, but not on the means across movies. In other words, there is no direct shrinkage on the means.

Here is a repeat of the data, from 30 movies, fit by the ordered-probit model:

The pink bars (above) are frequency histograms of the data from each movie; the blue dots (with vertical blue whiskers) are the posterior predictions and 95% HDIs for the ordered-probit model. A very good fit, all in all.

Here is some news: the data fit by the metric model. Here a 1-star rating is considered to be a score of 1.0 on a metric scale, a 2-star rating is considered to be a score of 2.0 on a metric scale, and so on. Each histogram is fit by a normal distribution (as is assumed by t tests, ANOVA, etc.). The result:

The superimposed blue curves (above) are a smattering from the posterior distribution. Not a very good fit to the data distributions.
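To see why a normal distribution can describe star ratings poorly, here is a minimal sketch in Python. It is not the Bayesian metric model from this post: it simply plugs the sample mean and standard deviation into a normal distribution, and the rating counts are hypothetical values chosen to mimic a well-liked movie with most ratings at 5 stars.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative probability of a normal(mu, sigma) distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Hypothetical counts of 1-, 2-, 3-, 4-, and 5-star ratings (not the post's data).
counts = [2, 3, 5, 20, 70]
n = sum(counts)

# Metric treatment: a k-star rating is scored as the number k.
mean = sum(k * c for k, c in enumerate(counts, start=1)) / n
sd = math.sqrt(sum(c * (k - mean) ** 2 for k, c in enumerate(counts, start=1)) / n)

# Mass the fitted normal puts above 5.5, where no rating can occur.
mass_above_top = 1.0 - normal_cdf(5.5, mean, sd)

# Mass the fitted normal puts in the interval around the 5-star score,
# compared with the observed proportion of 5-star ratings.
mass_near_five = normal_cdf(5.5, mean, sd) - normal_cdf(4.5, mean, sd)
observed_five = counts[-1] / n

print(f"mean = {mean:.2f}, sd = {sd:.2f}")
print(f"normal mass above 5.5 stars: {mass_above_top:.3f}")
print(f"normal mass in 4.5 to 5.5:   {mass_near_five:.3f}")
print(f"observed proportion of 5s:   {observed_five:.3f}")
```

With ratings piled up at the top of the scale, the symmetric normal curve has to spill past the highest possible rating and underrepresents the 5-star bar, which is the kind of misfit visible in the metric-model plots.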

More news: the means of the ordered-probit model plotted against the means of the metric model, with 95% HDIs:

You can see (above) that the rank ordering of the movies is quite different for the two models!

Case in point (and more news):

You can see (above) that the ordered-probit puts movie 10 well above movie 26, but treating the data as metric yields the opposite conclusion. Which conclusion is more appropriate? Clearly the ordered-probit describes the data much more accurately than treating the data as metric. Watch movie 10 before movie 26.
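How can the two treatments disagree about which movie is better? A minimal Python sketch, under stated assumptions, shows one mechanism. In an ordered-probit model each rating is the interval of a latent normal distribution that falls between adjacent thresholds. The thresholds, latent means, and latent standard deviations below are hypothetical illustration values, not estimates for movies 10 and 26.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative probability of a normal(mu, sigma) distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ordered_probit_probs(mu, sigma, thresholds):
    """Probability of each ordinal rating: the latent normal mass between
    adjacent thresholds, with open-ended intervals at both extremes."""
    edges = [0.0] + [normal_cdf(t, mu, sigma) for t in thresholds] + [1.0]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


def metric_mean(probs):
    """Mean rating when a k-star rating is scored as the number k."""
    return sum(k * p for k, p in enumerate(probs, start=1))


# Hypothetical thresholds separating 1|2, 2|3, 3|4, and 4|5 stars.
thresholds = [1.5, 2.5, 3.5, 4.5]

# Hypothetical movies: B has the higher latent mean but much more spread.
movie_a = {"mu": 4.3, "sigma": 0.5}
movie_b = {"mu": 4.8, "sigma": 3.0}

probs_a = ordered_probit_probs(movie_a["mu"], movie_a["sigma"], thresholds)
probs_b = ordered_probit_probs(movie_b["mu"], movie_b["sigma"], thresholds)

print("latent means:", movie_a["mu"], movie_b["mu"])
print("metric means:", round(metric_mean(probs_a), 2), round(metric_mean(probs_b), 2))
```

In this sketch movie B has the higher latent mean, but its wide spread pushes many ratings down to 1, 2, and 3 stars while the top of the scale caps everything above the last threshold at 5. Averaging the star values therefore ranks B below A, reversing the latent ordering.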

And don't be a free rider on the rating system. If you use the rating system, give a rating.

2 comments:

Manuel, March 23, 2018 at 12:54 AM
I am currently helping a colleague make sense of her data. Both the outcome and covariate of main interest are Likert scales. Presumably the correct way of going about the analysis would be to first retrieve the underlying continuous construct for each variable, for each individual and then proceed with a standard linear regression between those inferred measures?

Sorry if this is an obvious question but this is the first time I'm working (appropriately!) with this type of data. Really glad I found your preprint before mucking this one up.