Sunday, January 28, 2018

Don't treat ordinal data as metric -- update of movie ratings

In a previous post I applied a Bayesian ordered-probit model to movie ratings and showed how the results differ from treating the data as if they were metric. The metric model used frequentist t tests (because that's what most applied researchers would do). In this post, I re-analyze the same data as if they were metric, but using a Bayesian model that has the same hierarchical structure as the Bayesian hierarchical ordered-probit model I used before. Here we can compare ordered-probit to metric treatments with all else held constant. Spoiler: Same conclusions, of course. The ordered-probit model fits the data much better than the metric model. Don't treat ordinal data as metric.

Please see the previous post for details about the data and the models. In particular, note that there is hierarchical structure on the standard deviations across movies, but not on the means across movies. In other words, there is no direct shrinkage on the means.

Here is a repeat of the data, from 30 movies, fit by the ordered-probit model:
The pink bars (above) are frequency histograms of the data from each movie; the blue dots (with vertical blue whiskers) are the posterior predictions and 95% HDIs for the ordered-probit model. A very good fit, all in all.

Now for something new: the data fit by the metric model. Here a 1-star rating is considered to be a score of 1.0 on a metric scale, a 2-star rating is considered to be a score of 2.0 on a metric scale, and so on. Each histogram is fit by a normal distribution (as is assumed by t tests, ANOVA, etc.). The result:
The superimposed blue curves (above) are a smattering from the posterior distribution. Not a very good fit to the data distributions.
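
To make the contrast concrete, here is a minimal Python sketch. It is not the code used for the analysis. It assumes thresholds fixed at 1.5, 2.5, 3.5, and 4.5 and a made-up latent mean and standard deviation; none of these numbers come from the movie data, and the function names are hypothetical. The ordered-probit model turns a latent normal distribution into probabilities for the five star categories, whereas the metric treatment describes the same ratings by a normal distribution directly on the 1-to-5 scale.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative normal probability at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ordered_probit_probs(mu, sigma, thresholds):
    """Probability of each ordinal category: the latent normal mass
    that falls between successive thresholds."""
    cdf = [normal_cdf(t, mu, sigma) for t in thresholds]
    edges = [0.0] + cdf + [1.0]
    return [edges[k + 1] - edges[k] for k in range(len(edges) - 1)]

# Illustrative (assumed) values, not estimates from the movie data.
thresholds = [1.5, 2.5, 3.5, 4.5]
latent_mu, latent_sigma = 4.5, 1.5

probs = ordered_probit_probs(latent_mu, latent_sigma, thresholds)

# Metric treatment: score the categories as 1.0, 2.0, ..., 5.0 and
# summarize them by a mean and standard deviation.
stars = [1, 2, 3, 4, 5]
metric_mean = sum(s * p for s, p in zip(stars, probs))
metric_var = sum((s - metric_mean) ** 2 * p for s, p in zip(stars, probs))
metric_sd = math.sqrt(metric_var)

# A normal distribution with that mean and SD puts mass above 5 stars,
# where no rating can occur.
mass_above_5 = 1.0 - normal_cdf(5.0, metric_mean, metric_sd)

print("category probabilities:", [round(p, 3) for p in probs])
print("metric mean:", round(metric_mean, 3), "latent mean:", latent_mu)
print("normal mass above 5 stars:", round(mass_above_5, 3))
```

With these assumed values, the ratings pile up at 5 stars, so the metric mean falls below the latent mean, and a normal curve matched to the ratings spills a noticeable share of its mass past the top of the scale.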

More news: the means of the ordered-probit model plotted against the means of the metric model, with 95% HDIs:
You can see (above) that the rank ordering of the movies is quite different for the two models!

Case in point (and more news):
You can see (above) that the ordered-probit model puts movie 10 well above movie 26, but treating the data as metric yields the opposite conclusion. Which conclusion is more appropriate? Clearly the ordered-probit model describes the data much more accurately than treating the data as metric. Watch movie 10 before movie 26.
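
How can the two treatments disagree about rank order? Here is a small Python sketch of one way it can happen, using two hypothetical movies whose latent means and standard deviations are invented for illustration (they are not the estimates for movies 10 and 26), with thresholds assumed fixed at 1.5, 2.5, 3.5, and 4.5. One movie has a higher latent mean but much more variable ratings. Because the scale stops at 5 stars, that variability can only spread downward, which drags the metric mean below that of the other movie.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative normal probability at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def metric_mean_of_ratings(mu, sigma, thresholds):
    """Mean star rating implied by an ordered-probit model, when the
    stars are scored as the metric values 1, 2, 3, ..."""
    cdf = [normal_cdf(t, mu, sigma) for t in thresholds]
    edges = [0.0] + cdf + [1.0]
    probs = [edges[k + 1] - edges[k] for k in range(len(edges) - 1)]
    return sum((k + 1) * p for k, p in enumerate(probs))

thresholds = [1.5, 2.5, 3.5, 4.5]  # assumed for illustration

# Hypothetical movies: (latent mean, latent standard deviation).
movie_a = (4.6, 2.0)  # higher latent mean, very variable ratings
movie_b = (4.3, 0.6)  # lower latent mean, consistent ratings

mean_a = metric_mean_of_ratings(*movie_a, thresholds)
mean_b = metric_mean_of_ratings(*movie_b, thresholds)

print("latent means:  A =", movie_a[0], " B =", movie_b[0])
print("metric means:  A =", round(mean_a, 2), " B =", round(mean_b, 2))
print("rank order reversed:", (movie_a[0] > movie_b[0]) and (mean_a < mean_b))
```

In this made-up case the latent scale ranks movie A above movie B, while the mean of the star ratings ranks them the other way around.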

And don't be a free rider on the rating system. If you use the rating system, give a rating.