Doing Bayesian Data Analysis: Bayesian predicted slopes with interaction in multiple linear regression

Suppose we have a multiple linear regression with interaction: \[\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{1\times 2} x_1 x_2 \] Notice that the slope on \(x_1\) is not just \(\beta_1\), it's \(\beta_1 + \beta_{1\times 2} x_2\): \[\hat{y} = \beta_0 + ( \beta_1 + \beta_{1\times 2} x_2) x_1 + \beta_2 x_2 \] In other words, the slope on \(x_1\) depends on the value of \(x_2\). That's the essence of interaction. Of course, the same applies analogously to the slope on \(x_2\).

This is explained in Ch. 18 of DBDA2E, and an example is given in Figure 18.9 on p. 529 that plots the predicted slopes and their 95% HDIs as a function of the value of the other predictor. The variable "Spend" is \(x_1\) and the variable "PrcntTake" is \(x_2\). The top panel (below) plots the posterior predicted slope on \(x_1\) for values of \(x_2\):

After Figure 18.9, p. 529, of DBDA2E

A reader asked me how to compute those predicted slopes in R and make the graph. I thought I already had the R code in the package of programs, but upon searching discovered that it was not there. (Sorry; with over 700 pages of text and many dozens of programs, some things get overlooked.) This blog post supplies the R code! :-)

First, from the folder of programs, open the file Jags-Ymet-XmetMulti-Mrobust-Example.R Then, at the top of the file, comment out all the data-loading examples and put in the following code for reading in the relevant data and creating an interaction variable. This is explained on p. 528 of DBDA2E:

# Two predictors with interaction:
# Read the SAT data file:
myData = read.csv( file="Guber1999data.csv" )
# Append named column with interaction product:
myData = cbind( myData , SpendXPrcnt=myData[,"Spend"]*myData[,"PrcntTake"] )
# Specify the column names:
yName = "SATT" ; xName = c("Spend","PrcntTake","SpendXPrcnt")
# Put the saved files into a sub-directory:
dirName = "Guber1999data-Jags-Inter"
if( !dir.exists(dirName) ) { dir.create(dirName) }
fileNameRoot = paste0(dirName,"/Guber1999data-Jags-Inter-")
# Specify the number of saved steps and the thinning (merely to save memory):
numSavedSteps=15000 ; thinSteps=20 # smaller thinSteps will run faster
# Specify the preferred graph file format:
graphFileType = "png"
Then run the MCMC, as already specified in the script. I'm repeating it here for completeness:

# Load the relevant model into R's working memory:
source("Jags-Ymet-XmetMulti-Mrobust.R")
# Generate the MCMC chain:
mcmcCoda = genMCMC( data=myData , xName=xName , yName=yName ,
                    numSavedSteps=numSavedSteps , thinSteps=thinSteps ,
                    saveName=fileNameRoot )

The script also has commands for displaying MCMC diagnostics, numerical summary, and basic plots of the posterior distribution.

Now the novel R code for computing the slopes and making the graph! At the end of the script, run the following code:

# Convert mcmc coda object to a matrix:
mcmcMat=as.matrix(mcmcCoda,chains=TRUE)

# Make combs of X1 and X2 values for displaying slopes and their HDIs:
X1Comb = seq(min(myData[,xName[1]]),max(myData[,xName[1]]),length=20)
X2Comb = seq(min(myData[,xName[2]]),max(myData[,xName[2]]),length=20)

# Create place holders for X1 median slopes and HDIs:
X1SlopeMed = rep(0,length(X2Comb))
X1SlopeHDI = matrix(0,nrow=2,ncol=length(X2Comb))
# Compute the slope on X1 for each value of X2,
# as explained at top of p. 530 of DBDA2E:
for ( combIdx in 1:length(X2Comb) ) {
credSlope = mcmcMat[,"beta[1]"] + mcmcMat[,"beta[3]"]*X2Comb[combIdx]
X1SlopeHDI[,combIdx] = HDIofMCMC( credSlope )
X1SlopeMed[combIdx] = median( credSlope )
}

# Same as above but for X2:
X2SlopeHDI = matrix(0,nrow=2,ncol=length(X1Comb))
X2SlopeMed = rep(0,length(X1Comb))
for ( combIdx in 1:length(X1Comb) ) {
credSlope = mcmcMat[,"beta[2]"] + mcmcMat[,"beta[3]"]*X1Comb[combIdx]
X2SlopeHDI[,combIdx] = HDIofMCMC( credSlope )
X2SlopeMed[combIdx] = median( credSlope )
}

# Make the graph as in lower part of Figure 18.9, p. 529 of DBDA2E:
openGraph()
layout(matrix(1:2,nrow=2))
par( mar=c(3.5,3.5,2,1) , mgp=c(2,0.7,0) )
# Panel to display slope on X1:
plot( X2Comb , X1SlopeMed , ylim=range(X1SlopeHDI) ,
      pch=19 , cex=1.5 , col="skyblue" ,
      ylab=paste("Slope on",xName[1]) , xlab=paste("Value of",xName[2]) ,
cex.lab=1.5 )
segments( x0=X2Comb , y0=X1SlopeHDI[1,] ,x1=X2Comb , y1=X1SlopeHDI[2,] ,
          col="skyblue" , lwd=3 )
abline(h=0,lty="dotted")
# Panel to display slope on X2:
plot( X1Comb , X2SlopeMed , ylim=range(X2SlopeHDI) , pch=19 , cex=1.5 ,
      col="skyblue" ,
      ylab=paste("Slope on",xName[2]) , xlab=paste("Value of",xName[1]) ,
      cex.lab=1.5 )
segments( x0=X1Comb , y0=X2SlopeHDI[1,] ,x1=X1Comb , y1=X2SlopeHDI[2,] ,
          col="skyblue" , lwd=3 )
abline(h=0,lty="dotted")
# Save it:
saveGraph( file=paste0(fileNameRoot,"Slopes") , type=graphFileType )

Hopefully the comments in the script (above) are sufficient for you to decipher each step.

Doing Bayesian Data Analysis

Thursday, July 21, 2016

Bayesian predicted slopes with interaction in multiple linear regression

No comments:

Post a Comment

Total Pageviews

Blog Roll