Sunday, August 16, 2015

A case in which metric data are better analyzed by an ordinal model

Here we consider some data that might have been smoothly distributed over a metric scale, but ended up being concentrated on only a few values. The usual treatment of the data as normally or t-distributed is not appropriate, and instead the data are binned and analyzed as ordinal.

The data are from an unpublished study by Shannon Bailey and Dr. Valerie Sims. In their experiment, people read a description of animal cruelty that occurred either in a kennel (group 1, N=270) or in an animal shelter (group 2, N=253). Respondents then indicated how large a fine they thought should be assessed against the transgressor, on a continuous scale from zero to 2,000 dollars. ($2,000 was the maximum allowed by state law at the time of the experiment.)

Because the responses are on a continuous scale, it seems reasonable to apply a model that describes the data as t-distributed, in which we estimate the groups' means, scales, and normality. For details of the model, see Ch. 16 of DBDA2E or the article in JEP:General. The result is shown below:
Unfortunately, the histograms of the data in the upper-right panels above show that a t distribution is a terrible description of these data. We cannot meaningfully interpret the model's parameters when the model does not describe the data well.

Despite the fact that the response scale was continuous, the responses were spontaneously ordinal. A histogram of the data (collapsed across groups) is shown below:
Notice that responses are strongly limited to 0, 500, 1000, 1500, and 2000. There are very few response values between those multiples of 500. The data were therefore converted to ordinal values as follows:
0 - 50    --> 1
50 - 450  --> 2
450 - 550 --> 3
550 - 950 --> 4
and so forth.
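The binning can be sketched in Python. The bin edges beyond 950 are an assumption that merely continues the stated pattern of narrow bins around each multiple of 500, up to the $2,000 cap:

```python
import numpy as np

# Bin edges around each multiple of $500, as listed above; the edges
# beyond $950 are assumed, continuing the same pattern up to $2,000.
edges = [50, 450, 550, 950, 1050, 1450, 1550, 1950]

def to_ordinal(responses):
    """Map continuous fine amounts (0-2000) to ordinal levels 1, 2, 3, ..."""
    # np.digitize returns the index of the bin each value falls into;
    # adding 1 yields ordinal levels starting at 1.
    return np.digitize(responses, edges) + 1

fines = np.array([0, 100, 500, 1200, 1500, 2000])
print(to_ordinal(fines))  # levels 1, 2, 3, 6, 7, 9
```

Under this scheme each modal response (0, 500, 1000, 1500, 2000) gets its own odd-numbered ordinal level, and the sparse in-between responses fall into the even-numbered levels.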

The resulting ordinal data were then analyzed using the cumulative thresholded normal model described in Section 23.3 of DBDA2E. The results were as follows:
Notice in the upper-right panels above that the data are described very accurately by the model. We can therefore give some credence to the interpretation of the parameters. The effect size (lower-right panel above) has a magnitude of about 0.35, indicating that people assigned fines about 1/3 of a standard deviation higher in the kennel condition than in the shelter condition.
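In the cumulative thresholded normal model, the probability of ordinal level k is the area of a latent normal distribution between thresholds theta_{k-1} and theta_k. A minimal sketch of that likelihood, and of the effect-size formula used above, follows; the particular mu, sigma, and threshold values here are made up for illustration and are not the fitted estimates from this analysis:

```python
from math import erf, sqrt, inf

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def category_probs(mu, sigma, thresholds):
    """P(level k) = Phi((theta_k - mu)/sigma) - Phi((theta_{k-1} - mu)/sigma),
    with theta_0 = -inf and theta_K = +inf."""
    edges = [-inf] + list(thresholds) + [inf]
    return [Phi((edges[k + 1] - mu) / sigma) - Phi((edges[k] - mu) / sigma)
            for k in range(len(edges) - 1)]

# Illustrative values only (4 thresholds -> 5 ordinal levels):
thresholds = [1.5, 2.5, 3.5, 4.5]
p_kennel  = category_probs(mu=3.2,  sigma=1.0, thresholds=thresholds)
p_shelter = category_probs(mu=2.85, sigma=1.0, thresholds=thresholds)

# Effect size as in DBDA2E: difference of latent means over the
# root-mean-square of the two latent standard deviations.
mu1, mu2, s1, s2 = 3.2, 2.85, 1.0, 1.0
effect_size = (mu1 - mu2) / sqrt((s1**2 + s2**2) / 2.0)
print(round(effect_size, 2))  # 0.35
```

The thresholds are estimated from the data along with the group means and standard deviations, so the model can accommodate the very uneven spacing of the response piles.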

Thanks go to Shannon Bailey for bringing this to my attention and for sharing the data so I could make this blog post.

1 comment:

  1. What about a case where I might have some metric data, atomic weights in particular, that I predict have a particular value? I've been re-reading some of Scientific Reasoning: The Bayesian Approach by Colin Howson and Peter Urbach, and they relate a story in which a theory predicted one value for an atomic weight but the data showed otherwise. The authors' approach is to examine the posteriors of the auxiliary hypothesis and of the general theory, and they show that the posterior of the general theory is less harmed by the data than that of the auxiliary hypothesis. I want to do a more refined analysis using probability distributions.

    Let's say here that the predicted value is exactly 36, with a prior probability of 60% for the auxiliary hypothesis and a prior probability of 90% for the theory. The data show that it is a bit less than that (the book gives 35.83, but I've created a data set using a Gaussian random number generator to have more data to play with). Using the generalized linear model seems inappropriate to me, but in the DBDA2E book, the GLM is the only area where a single metric predicted variable from a single metric predictor is discussed. To what section of the book would you refer me to do what I want with this toy data?
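    One way to formalize the commenter's toy setup is as a comparison between a point hypothesis (mu is exactly 36) and a diffuse alternative, updating the 60% prior by a Bayes factor. The sketch below assumes a known measurement standard deviation and an arbitrary prior width under the alternative; all numerical values other than 36, 35.83, and 60% are invented for illustration:

```python
from math import exp, sqrt, pi

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2.0 * pi))

# Hypothetical measurement setup (n, sigma, and tau are assumed values):
n, xbar, sigma = 20, 35.83, 0.30   # sample size, sample mean, measurement sd
mu0, tau = 36.0, 0.50              # point prediction; prior sd under alternative
prior_h = 0.60                     # prior probability that mu is exactly 36

se = sigma / sqrt(n)
# Marginal density of the sample mean under each hypothesis:
m_point = normal_pdf(xbar, mu0, se)                    # mu = 36 exactly
m_alt   = normal_pdf(xbar, mu0, sqrt(se**2 + tau**2))  # mu ~ N(36, tau^2)

bf = m_point / m_alt                       # Bayes factor for the point value
post_odds = bf * prior_h / (1.0 - prior_h)
post_h = post_odds / (1.0 + post_odds)     # posterior P(mu = 36 | data)
print(post_h)
```

    With these assumed numbers the data pull the probability of the exact prediction below its 60% prior; how far depends strongly on the choice of tau, which is exactly the kind of sensitivity a more refined analysis would need to examine.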