Before launching into programming details, let me illustrate why this is important. Right-censored data can occur in many situations. For example, when measuring response times, the subject is given a limited temporal window in which to respond, and if no response has occurred by the end of the window, then the trial is aborted and the response is recorded as censored. If we assume that the subject was "on task" but merely didn't have enough time to respond before the experimenter got bored, then it would be inappropriate to omit this trial from the modeled data. In other words, the trial is providing genuine information about the response time, namely, that it takes longer than the experimenter's censoring limit. If we omit this trial from the modeled data, then our estimates will be too small!
Here is an example. The data are 500 values generated from a normal distribution, rescaled so that the sample mean is 100.0 and the sample standard deviation is 15.0. Then I marked any value above +1SD as NA. In other words, I censored about 15% of the data in the right tail. Suppose we model the data with a normal likelihood and estimate its μ and σ parameters. We would like the estimates to recover the true generating values of μ=100.0 and σ=15.0. But if we simply omit the censored values (as if they weren't part of the data), the estimates look like this:
![]() |
| Posterior estimates when censored data are omitted. Estimates are too low. (ROPE is shown only for visual landmarks; it's not particularly meaningful in this case.) |
On the other hand, if we tell JAGS that there are censored values that it should take into account, the estimates look like this:
![]() |
| Posterior estimates when censored data are modeled. Estimates are accurate. (ROPE is shown only for visual landmarks; it's not particularly meaningful in this case.) |
The basic idea for implementing in JAGS is presented briefly (one paragraph) in the JAGS user manual. The idea starts with defining an indicator vector that encodes, for every data value, whether the data value was censored or not. The indicator value, here named isCensored, is 1 for censored values and 0 for non-censored values. Then that indicator value is modeled thus: isCensored ~ dinterval(y,censorLimit) It took me a while to infer from that syntax the key idea: When isCensored is 1 and y is missing, then all JAGS knows is that y is somewhere above censorLimit, so JAGS effectively imputes a random value for y from the likelihood function (defined separately, in the usual way). Here are the tacit assumptions that I eventually figured out:
- Censored data must be recorded as NA, not as the value of censoring limit.
- When explicitly initializing the chains, the censored values of the data must be explicitly initialized (to values above the censoring limits)!


4 comments:
This post is fortuitous, because I was struggling with right-censored data this week. I've expanded this model into one which is slightly more complicated, in which the data have associated measurement uncertainties, so the body of the model is:
isCensored[i] ~ dinterval(yobs[i], censorLimitVec[i])
yobs[i] ~ dnorm(y[i],yerr[i])
y[i] ~ dnorm(mu, tau)
with "yobs" representing the observed values with associated uncertainties "yerr".
For this to run successfully the yerr value associated with the censored yobs points have to be defined---it won't accept "NA". Why this is true isn't immediately clear to me.
Very helpful!! Thanks!
Running your program results in an error due to a change in JAGS 3.3.0 that do not allow values to be overwritten during initialization. The error goes away if
# intial values of censored data:
yInit = y
yInit[isCensored] = censorLimitVec[isCensored]+1
is rewritten to
# intial values of censored data:
yInit = NA
yInit[isCensored] = censorLimitVec[isCensored]+1
In reply to Galeri B. For generality, a better way to do this
yinit=y
yinit[]=NA
yInit[isCensored] = censorLimitVec[isCensored]+1
So that you are guaranteed to keep the dimension, even if there are no censored values in the range
Post a Comment