Doing Bayesian Data Analysis: Graphs of imputed censored y values

Friday, December 8, 2017

Graphs of imputed censored y values

When using JAGS with censored data, the censored values are imputed so that they are consistent with the model's parameters and the censoring limits. It's straightforward to record and graph the imputed values. Here are a couple of slides from my workshops that show how. Start by reading Section 25.4 of DBDA2E; the code on the slides follows from it.
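As a rough illustration of the setup that Section 25.4 describes (a sketch, not the code on the slides), here is one way to prepare right-censored data with a single limit per datum. The censored y values go to JAGS as NA. An indicator, isCensored, records which side of the limit each datum falls on, and `dinterval()` ties the indicator to y. The censored y's also need initial values that satisfy the constraint. The helper name `makeCensoredJagsData`, the normal likelihood, and the priors are assumptions for illustration. Because the censored y's are stochastic nodes, adding "y" to the monitored variables records their imputed values.

```r
# Minimal sketch, not the code from the slides: right-censored y with one limit
# per datum. makeCensoredJagsData is a hypothetical helper name.
makeCensoredJagsData = function( yRaw , censorLimit ) {
  Ntotal = length( yRaw )
  censorLimitVec = rep( censorLimit , length.out=Ntotal )
  # JAGS dinterval(y, c) equals 1 when y > c and 0 when y <= c,
  # so isCensored = 1 marks a datum recorded at or above its limit
  isCensored = as.numeric( yRaw >= censorLimitVec )
  y = yRaw
  y[ isCensored == 1 ] = NA      # unknown values that JAGS will impute
  dataList = list( y=y , isCensored=isCensored ,
                   censorLimitVec=censorLimitVec , Ntotal=Ntotal )
  # Initial values must satisfy the constraint: NA where y is observed,
  # a value just above the limit where y is censored
  yInit = rep( NA , Ntotal )
  yInit[ isCensored == 1 ] = censorLimitVec[ isCensored == 1 ] + 1
  list( dataList=dataList , yInit=yInit )
}

# Normal likelihood and vague priors are assumptions for illustration.
censoredModelString = "
model {
  for ( i in 1:Ntotal ) {
    isCensored[i] ~ dinterval( y[i] , censorLimitVec[i] )
    y[i] ~ dnorm( mu , 1/sigma^2 )
  }
  mu ~ dnorm( 0 , 1/(100)^2 )
  sigma ~ dunif( 1.0E-3 , 1.0E+3 )
}
"
# With rjags installed, monitoring "y" records the imputed censored values:
# library(rjags)
# prep = makeCensoredJagsData( yRaw , censorLimit )
# jagsModel = jags.model( textConnection( censoredModelString ) ,
#                         data=prep$dataList , inits=list( y=prep$yInit ) , n.chains=3 )
# codaSamples = coda.samples( jagsModel , variable.names=c( "mu" , "sigma" , "y" ) ,
#                             n.iter=10000 )
```

For interval censoring with several thresholds, the same pattern applies. In that case, censorLimitVec becomes a vector of thresholds, and isCensored holds the interval index rather than 0/1.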

8 comments:

Sean S, December 14, 2017 at 3:40 PM
Thanks for the quick and complete answer. It worked like a charm!

Nice trick of graphing only the first imputed y from each interval, since all the censored y's within a region look the same.

- Sean
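Here is one way to express this trick in base R, as a sketch. It takes the matrix of MCMC samples, for example from `as.matrix()` on a coda object, where imputed values appear in columns named "y[3]" and so on. It also takes a vector giving each datum's censoring interval. From these, it picks the first censored datum in each interval and plots that datum's imputed values. The helper names and the interval-label convention (0 or NA for uncensored data) are assumptions. The shortcut is only valid when every censored datum in an interval has the same predictive distribution, as in a single-group model. With predictors, each imputed y would differ.

```r
# Hypothetical helper: index of the first censored datum in each distinct
# interval. intervalIdx holds an interval label per datum, 0 or NA if uncensored.
firstCensoredPerInterval = function( intervalIdx ) {
  censored = which( !is.na( intervalIdx ) & intervalIdx > 0 )
  censored[ !duplicated( intervalIdx[ censored ] ) ]
}

# Hypothetical helper: one histogram of imputed values per censoring interval.
# mcmcMat is assumed to have columns named like "y[3]", as coda produces.
plotImputedY = function( mcmcMat , intervalIdx ) {
  idx = firstCensoredPerInterval( intervalIdx )
  for ( i in idx ) {
    colName = paste0( "y[" , i , "]" )
    hist( mcmcMat[ , colName ] , main=colName , xlab="imputed y" ,
          col="skyblue" , border="white" )
  }
  invisible( idx )
}
```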
Sean S, December 14, 2017 at 4:20 PM
I'm now attempting to implement a missing predictor model using the HtWt30 robust regression example.

It seems like I'm close, but the model won't quite run. Right now I get a runtime error: "unable to resolve the following parameters: x[3], x[4], x[7]" (line 5), and so on. I created a prior for use when the x's are missing, so I'm not sure what's causing the error.

Note that missingIdx is a data column with mIdx[i] > 0 wherever x is missing. Also note that, for convenience, I hard-coded some values that should be calculated on the fly.

I also apologize in advance for the terrible formatting of the code. I couldn't figure out how to get blogspot to let me paste in html code.

The code:
genMCMC = function( data, xName="x", yName="y", missingIdx="missingIdx", numSavedSteps=50000, saveName=NULL ) {
  #-----------------------------------------------------------------------------
  # THE DATA.
  y = data[,yName]
  x = data[,xName]
  meanY = mean(y)
  meanX = mean(x,na.rm=TRUE)
  sdY = sd(y)
  sdX = sd(x,na.rm=TRUE)
  mIdx = data[,missingIdx] # > 0 when predictor missing
  Ntotal = length(y)
  # Specify the data in a list, for later shipment to JAGS:
  dataList = list(
    x = x ,
    y = y ,
    mIdx = mIdx ,
    meanY = meanY ,
    meanX = meanX ,
    sdY = sdY ,
    sdX = sdX ,
    Ntotal = Ntotal
  )
  #-----------------------------------------------------------------------------
  # THE MODEL.
  modelString = "
  # Standardize the data:
  data {
    for ( i in 1:Ntotal ) {
      zx[i] <- ifelse ( mIdx[i]==0, (x[i]-meanX)/sdX , x[i] ) # skip NA's
      zy[i] <- ( y[i] - meanY ) / sdY
    }
  }
  # Specify the model for standardized data:
  model {
    for ( i in 1:Ntotal ) {
      zy[i] ~ dt( zbeta0 + zbeta1 * zx[i] , 1/zsigma^2 , nu )
    }
    # prior used for missing zx's
    zx ~ dnorm( 0 , 1 )
    # Priors vague on standardized scale:
    zbeta0 ~ dnorm( 0 , 1/(10)^2 )
    zbeta1 ~ dnorm( 0 , 1/(10)^2 )
    zsigma ~ dunif( 1.0E-3 , 1.0E+3 )
    nu ~ dexp(1/30.0)
    # Transform to original scale:
    beta1 <- zbeta1 * ysd / xsd
    beta0 <- zbeta0 * ysd + ym - zbeta1 * xm * ysd / xsd
    sigma <- zsigma * ysd
    x <- zx*sdX + meanX
  }
  " # close quote for modelString
  writeLines( modelString , con="TEMPmodel.txt" )
  #-----------------------------------------------------------------------------
  # INITIALIZE THE CHAINS
  # Initial values of MCMC chains based on data:
  beta0 = 0 ; zbeta0 = 0
  beta1 = 3.6440 # coef(lm(y~x))["x"]
  zbeta1 = 0.5295 # zbeta1 = cor(zx, zy, use = "complete.obs")
  sigma = sd(y) # na.rm=TRUE is default
  zsigma = 1
  nu = 30
  # initial values for missing data:
  xInit = rep( NA , length(x) )
  for ( i in 1:length(y) ) {
    if ( mIdx[i] > 0 ) { # only initialize if x is missing
      xInit[i] = 56.357662 + 0.0690965*y[i] # hardcoded to keep it simple
    }
  }
  initsList = list( beta0=beta0, beta1=beta1, zbeta0=zbeta0, zbeta1=zbeta1, zsigma=zsigma, sigma=sigma, nu = nu, x=xInit )
  #-----------------------------------------------------------------------------
  # RUN THE CHAINS
  parameters = c( "beta0", "beta1", "sigma", "zbeta0", "zbeta1", "zsigma", "nu", "x")
  # including "x" allows us to see imputed x distributions
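What follows is a hedged reading of the error, not a definitive diagnosis. Line 5 of TEMPmodel.txt is the `zx[i] <- ifelse(...)` line in the data block. Both branches of that `ifelse()` use x[i], and x[i] is NA for the missing rows, so JAGS cannot resolve those x's. The model block then also gives zx a prior and redefines x, but a JAGS node can be defined only once, and a deterministic node cannot also be data.

There are two separate problems as well. First, ysd, xsd, ym and xm do not appear in dataList, which uses sdY, sdX, meanY and meanX. Second, beta0, beta1 and sigma are deterministic, so JAGS will not accept initial values for them.

One way around all of this is sketched below. It assumes the predictors are missing at random and that a N(0,1) prior on standardized x is acceptable. The sketch standardizes in R, where NA's pass through, puts the prior on zx[i] inside the loop, and monitors a separate back-transformed node. The names `prepMissingXData`, `makeMissingXInits`, `missingXModelString` and `xImp` are hypothetical.

```r
# Minimal sketch: standardize in R so NA's stay NA; JAGS then treats the
# NA elements of zx as unknowns and imputes them.
prepMissingXData = function( x , y ) {
  meanX = mean( x , na.rm=TRUE ) ; sdX = sd( x , na.rm=TRUE )
  meanY = mean( y ) ; sdY = sd( y )
  list( zx=( x - meanX ) / sdX , zy=( y - meanY ) / sdY , Ntotal=length( y ) ,
        meanX=meanX , sdX=sdX , meanY=meanY , sdY=sdY )
}

missingXModelString = "
model {
  for ( i in 1:Ntotal ) {
    zx[i] ~ dnorm( 0 , 1 )   # data where observed, imputed where NA
    zy[i] ~ dt( zbeta0 + zbeta1 * zx[i] , 1/zsigma^2 , nu )
    xImp[i] <- zx[i] * sdX + meanX   # predictor on the original scale
  }
  zbeta0 ~ dnorm( 0 , 1/(10)^2 )
  zbeta1 ~ dnorm( 0 , 1/(10)^2 )
  zsigma ~ dunif( 1.0E-3 , 1.0E+3 )
  nu ~ dexp( 1/30.0 )
  # Transform to original scale, using the names supplied as data:
  beta1 <- zbeta1 * sdY / sdX
  beta0 <- zbeta0 * sdY + meanY - zbeta1 * meanX * sdY / sdX
  sigma <- zsigma * sdY
}
"

# Initial values only for stochastic nodes; zx is NA where it is observed.
makeMissingXInits = function( dataList ) {
  zxInit = rep( NA , dataList$Ntotal )
  zxInit[ is.na( dataList$zx ) ] = 0   # start missing predictors at the mean
  list( zbeta0=0 , zbeta1=0 , zsigma=1 , nu=30 , zx=zxInit )
}

# With rjags installed (not run here):
# library(rjags)
# dataList = prepMissingXData( x , y )
# jagsModel = jags.model( textConnection( missingXModelString ) , data=dataList ,
#                         inits=makeMissingXInits( dataList ) , n.chains=3 )
# codaSamples = coda.samples( jagsModel , n.iter=10000 ,
#   variable.names=c( "beta0" , "beta1" , "sigma" , "nu" , "xImp" ) )
```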
Sean S, December 27, 2017 at 12:26 PM
In the spirit of starting with a simple model, getting it to work, and then layering on the complexity, I was trying to get the simplest missing-predictor case to work (one predictor, one outcome, 6 of 30 predictor values missing). I assumed the missing predictors were centered at x = 0 with a standard deviation of 1 on the standardized scale. Initial values came from the inverse prediction equation x = f(y), fitted to the complete data. Of course, the imputed predictor values should be close to those initial values, but for now I'm just trying to get the HtWt30 model to run with the missing predictors before I move on to the harder cases.
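The code above hard-codes the inverse-prediction initial values (56.357662 + 0.0690965*y). Here is a minimal sketch of computing them on the fly instead. It assumes the inverse prediction is an ordinary least-squares fit of x on y over the complete cases, and the helper name `inverseInitX` is hypothetical. JAGS expects initial values for a partially observed variable to be NA wherever the value is observed, so the returned vector is NA at every observed x.

```r
# Hypothetical helper: initial values for missing x from the inverse
# regression x ~ y fitted to the complete cases (instead of hard-coding it).
inverseInitX = function( x , y ) {
  miss = is.na( x )
  completeCases = data.frame( x=x[ !miss ] , y=y[ !miss ] )
  invFit = lm( x ~ y , data=completeCases )
  xInit = rep( NA , length( x ) )   # NA where x is observed
  xInit[ miss ] = predict( invFit , newdata=data.frame( y=y[ miss ] ) )
  xInit
}
```

If the model imputes standardized predictors, convert the result with (xInit - meanX) / sdX before using it as the initial value of zx.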

Conceptually, isn't imputing a few missing predictor values while the model is MCMC'ed similar to jointly determining the regression model and the inverse prediction?
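In equation form, here is a sketch using the standardized notation of the model above, with the N(0,1) prior on a missing standardized predictor. At each MCMC step, a missing $z_{x,i}$ is drawn from its full conditional distribution. That distribution multiplies the regression likelihood of the observed $z_{y,i}$ by the prior on $z_{x,i}$:

$$
p\left(z_{x,i} \mid z_{y,i}, \beta^{z}_{0}, \beta^{z}_{1}, \sigma^{z}, \nu\right) \;\propto\; \mathrm{t}_{\nu}\!\left(z_{y,i} \,\middle|\, \beta^{z}_{0} + \beta^{z}_{1} z_{x,i},\ \sigma^{z}\right)\, \mathcal{N}\!\left(z_{x,i} \mid 0, 1\right)
$$

The regression parameters are themselves sampled, so the imputed values average over their uncertainty, much as in an inverse prediction. Unlike a plain inverse prediction, the prior term also pulls each imputed value toward the center of the predictor distribution.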

In my real case study, I do have intermediate measurements that will inform the missing predictor values. I have 10 predictors, each with some "true" measurements and some missing ones (the missing values will be measured independently, i.e., observed with unknown error), plus intermediate values to dial them in. That model seems like a bear to program and debug. I could have started with the multiple-predictor case (like the Guber data), with one of the predictors missing a few values, since that would be more realistic, but I truly wanted to start with the simplest missing-value case.

Sean S, December 29, 2017 at 3:48 PM
Do you have any examples of a model with some missing predictor values?

I would assume there are cases, like a repeated-measures experiment in medicine, where imputing a missing data point is more helpful than throwing out all the incomplete cases. The posterior of those missing data points may be wide, but you try to extract the maximum information from the data points you do have (given the assumptions of missing at random, multivariate normality, etc.).