Thursday, July 21, 2016

Bayesian predicted slopes with interaction in multiple linear regression

Suppose we have a multiple linear regression with interaction: \[\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{1\times 2} x_1 x_2 \] Notice that the slope on \(x_1\) is not just \(\beta_1\), it's \(\beta_1 + \beta_{1\times 2} x_2\): \[\hat{y} = \beta_0 + ( \beta_1 + \beta_{1\times 2} x_2) x_1 + \beta_2 x_2  \] In other words, the slope on \(x_1\) depends on the value of \(x_2\). That's the essence of interaction. Of course, the same applies analogously to the slope on \(x_2\).

This is explained in Ch. 18 of DBDA2E, and an example is given in Figure 18.9 on p. 529 that plots the predicted slopes and their 95% HDIs as a function of the value of the other predictor. The variable "Spend" is \(x_1\) and the variable "PrcntTake" is \(x_2\). The top panel (below) plots the posterior predicted slope on \(x_1\) for values of \(x_2\):
After Figure 18.9, p. 529, of DBDA2E

A reader asked me how to compute those predicted slopes in R and make the graph. I thought I already had the R code in the package of programs, but upon searching discovered that it was not there. (Sorry; with over 700 pages of text and many dozens of programs, some things get overlooked.) This blog post supplies the R code!  :-)

First, from the folder of programs, open the file Jags-Ymet-XmetMulti-Mrobust-Example.R Then, at the top of the file, comment out all the data-loading examples and put in the following code for reading in the relevant data and creating an interaction variable. This is explained on p. 528 of DBDA2E:

# Two predictors with interaction:
# Read the SAT data file:

myData = read.csv( file="Guber1999data.csv" )
# Append named column with interaction product: 

myData = cbind( myData , SpendXPrcnt=myData[,"Spend"]*myData[,"PrcntTake"] )
# Specify the column names:

yName = "SATT" ; xName = c("Spend","PrcntTake","SpendXPrcnt")
# Put the saved files into a sub-directory:

dirName = "Guber1999data-Jags-Inter"
if( !dir.exists(dirName) ) { dir.create(dirName) }
fileNameRoot = paste0(dirName,"/Guber1999data-Jags-Inter-")
# Specify the number of saved steps and the thinning (merely to save memory):

numSavedSteps=15000 ; thinSteps=20 # smaller thinSteps will run faster
# Specify the preferred graph file format:
graphFileType = "png"
Then run the MCMC, as already specified in the script. I'm repeating it here for completeness:

# Load the relevant model into R's working memory: 
# Generate the MCMC chain: 

mcmcCoda = genMCMC( data=myData , xName=xName , yName=yName ,
                    numSavedSteps=numSavedSteps , thinSteps=thinSteps ,
                    saveName=fileNameRoot )

The script also has commands for displaying MCMC diagnostics, numerical summary, and basic plots of the posterior distribution.

Now the novel R code for computing the slopes and making the graph! At the end of the script, run the following code:

# Convert mcmc coda object to a matrix: 

# Make combs of X1 and X2 values for displaying slopes and their HDIs: 

X1Comb = seq(min(myData[,xName[1]]),max(myData[,xName[1]]),length=20)
X2Comb = seq(min(myData[,xName[2]]),max(myData[,xName[2]]),length=20)

# Create place holders for X1 median slopes and HDIs: 

X1SlopeMed = rep(0,length(X2Comb))
X1SlopeHDI = matrix(0,nrow=2,ncol=length(X2Comb))
# Compute the slope on X1 for each value of X2,
# as explained at top of p. 530 of DBDA2E:

for ( combIdx in 1:length(X2Comb) ) {
  credSlope = mcmcMat[,"beta[1]"] + mcmcMat[,"beta[3]"]*X2Comb[combIdx]
  X1SlopeHDI[,combIdx] = HDIofMCMC( credSlope )
  X1SlopeMed[combIdx] = median( credSlope )

# Same as above but for X2:

X2SlopeHDI = matrix(0,nrow=2,ncol=length(X1Comb))
X2SlopeMed = rep(0,length(X1Comb))
for ( combIdx in 1:length(X1Comb) ) {
  credSlope = mcmcMat[,"beta[2]"] + mcmcMat[,"beta[3]"]*X1Comb[combIdx]
  X2SlopeHDI[,combIdx] = HDIofMCMC( credSlope )
  X2SlopeMed[combIdx] = median( credSlope )

# Make the graph as in lower part of Figure 18.9, p. 529 of DBDA2E:

par( mar=c(3.5,3.5,2,1) , mgp=c(2,0.7,0) )
# Panel to display slope on X1:

plot( X2Comb , X1SlopeMed , ylim=range(X1SlopeHDI) , 
      pch=19 , cex=1.5 , col="skyblue" ,
      ylab=paste("Slope on",xName[1]) , xlab=paste("Value of",xName[2]) ,

      cex.lab=1.5 )
segments( x0=X2Comb , y0=X1SlopeHDI[1,] ,x1=X2Comb , y1=X1SlopeHDI[2,] ,
          col="skyblue" , lwd=3 )
# Panel to display slope on X2: 

plot( X1Comb , X2SlopeMed , ylim=range(X2SlopeHDI) , pch=19 , cex=1.5 , 
      col="skyblue" ,
      ylab=paste("Slope on",xName[2]) , xlab=paste("Value of",xName[1]) , 

      cex.lab=1.5 )
segments( x0=X1Comb , y0=X2SlopeHDI[1,] ,x1=X1Comb , y1=X2SlopeHDI[2,] ,
          col="skyblue" , lwd=3 )
# Save it:

 saveGraph( file=paste0(fileNameRoot,"Slopes") , type=graphFileType )

Hopefully the comments in the script (above) are sufficient for you to decipher each step.

No comments:

Post a Comment