In R, the

polr function in the MASS package does ordinal probit regression (and ordinal logistic regression, but I focus here on probit). The polr function yields parameter estimates that are difficult to interpret intuitively because they assume a baseline intercept of 0 and a noise standard deviation of 1, which produces slopes and thresholds that are relative to an underlying scale with little direct relation to the ordinal values in the data. In can be more intuitive to fix the low and high thresholds at values that make sense on the ordinal data scale, and estimate the other parameters relative to those low and high thresholds. This intuitive parameterization is used in Chapter 23 of DBDA2E. This blog post presents a function that transforms the polr parameter values to the intuitive parameterization.

##
Background: Ordinal probit regression.

See Chapter 23 of DBDA2E for a complete introduction to and explanation of ordinal probit regression. The figure below provides a quick reminder of the model.

The data to be predicted are ordinal, shown on the left side of the figure. The predictor in this case is metric, shown at the lower-right of the figure. The model assumes there exists a underlying metric scale for the ordinal outcome, and this underlying scale is chopped into subintervals that have thresholds denoted theta (see the diagram). The model assumes there is some noisy mapping from the predictor to the underlying metric response scale, as assumed by ordinary linear regression, and the probability of each discrete ordinal response is the area under the normal curve between the thresholds for that response category.

From the data we estimate the intercept, the slope, the standard deviation, and the thresholds. But the underlying scale has arbitrary location and magnification, so two of the parameters can be arbitrarily fixed. In polr, the intercept is fixed at 0 and the standard deviation is fixed at 1. But that yields thresholds and slopes on a scale with no intuitive relation to the response values, which go from 1 to K, where K is the maximum ordinal response level.

A more intuitive parameterization fixes the low threshold at 1.5 (half way between 1 and 2), and the high threshold at K-.5 (half way between K-1 and K). Then the thresholds are directly interpretable with respect to those anchors, and the standard deviation can also be interpreted relative to the distance between those anchors.

##
Example: Happiness and financial assets.

Consider subjective ratings of happiness on a 1 to 5 ordinal scale, where 1 is bummed out and 5 is ecstatic (I forget the exact wording used to label the response items). For each respondent we also get their total assets. Some real data are shown in the top panel of the figure below:

See the figure caption for details. The lower panels of the figure show aspects of the Bayesian posterior distribution. Notice that the lowest threshold is fixed at 1.5, and the highest threshold is fixed 4.5. This means that the underlying metric output scale is aligned with the ordinal responses, and the other parameters are relative to the low and high anchors. In particular, the intercept is at about 3.16, which suggests a happiness rating of about 3 even with zero assets. The slope is about 4.14e-6 happiness units per yuan. The standard deviation of the noise is about 0.854 happiness units, suggesting large variability on a 5-point scale.

The output of polr, however, looks like this (where Happiness is Yord and Assets is X):

> polrInfo = polr( Yord ~ X , method="probit" )
> print( polrInfo )

Coefficients:
X
4.025474e-06
Intercepts:
1|2 2|3 3|4 4|5
-2.048053 -1.180089 -0.127664 1.544575

What do those parameter estimates mean? The coefficient on X is the slope, and the intercepts are the thresholds. While we can use those estimates to derive predictions of outcome probabilities, they aren't very directly interpretable, at least not for me!

The polr estimates can be transformed into the parameterization I explained above, and then we get the following:

> polrInfoOrd = polrToOrdScale( polrInfo )
> print( polrInfoOrd )

$sigma
[1] 0.8350432
$b0
[1] 3.210213
$coefficients
X
3.361444e-06
$zeta
1|2 2|3 3|4 4|5
1.500000 2.224788 3.103608 4.500000

The thresholds are the zeta values; notice that the lowest threshold is 1.5 and the highest threshold is 4.5. The sigma, intercept (i.e., b0), and slope (i.e., coefficient) are very close to the modal posterior values of the Bayesian estimation.

##
The function:

Here is the specification of the function in R:

polrToOrdScale = function( polrObject ) {

polrThresh = polrObject$zeta

polrSlopes = polrObject$coefficients

polrInter = 0.0

polrSigma = 1.0

K = length(polrThresh) + 1 # K is number of ordinal levels

sigmaMult = unname( (K-2)/(polrThresh[K-1]-polrThresh[1]) )

inter = unname( 1.5 - ( sigmaMult*polrThresh[1] ) )

respThresh = sigmaMult*polrThresh + inter

respSigma = sigmaMult*polrSigma

respB0 = sigmaMult*polrInter + inter

respSlopes = sigmaMult*polrSlopes

return( list( sigma=respSigma ,

b0=respB0 ,

coefficients=respSlopes ,

zeta=respThresh ) )

}
I have only tested the function on a few examples; there are no guarantees! Feel free to reply with refinements.