Comments on Doing Bayesian Data Analysis: "p value from likelihood ratio test is not the same as p value from maximum likelihood estimate"

Jake Westfall (http://jakewestfall.org), 2014-09-18 15:45:

Ah, I see. A solution is emerging! Very interesting.

Noah Motion, 2014-09-18 09:01:

The fact that you had mentioned centering (but not scaling) x and y, while in my code I'm centering and scaling x and e (but not y), got me wondering about the role of variance. I hadn't thought it through yet, but it sounds like you have. I look forward to seeing the new pictures.

John K. Kruschke, 2014-09-18 08:18:

The key conceptual difference between the sampling distributions (of MLE beta1 vs. G2) is not how they treat the intercept; it's how they treat the variance. In some random samples, MLE beta1 can be large but come with a large MLE sigma, in which case G2 would be relatively small (these are the blue dots). In other random samples, MLE beta1 can be small but come with a small MLE sigma, in which case G2 could be relatively large (these are the red dots). G2 is (in this normal and linear case, like the F ratio) a measure of slope vs. intercept-only relative to the variance in the sample, while MLE beta1 is a "direct" measure of the slope on the absolute scale of the null hypothesis. I'll post some pictures soon...
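[Editor's sketch] To make Kruschke's point concrete: for the normal linear model, G2 = N*log(RSS0/RSS1), which for a fixed predictor is an exact increasing function of (beta1hat/sigmahat)^2, the slope relative to the sample's noise, whereas beta1hat alone ignores the noise. A minimal R illustration (assumed setup and variable names, not code from the thread):

set.seed(1)
N = 100
x = as.numeric(scale(runif(N)))   # fixed predictor across samples
b1 = sig = G2 = numeric(2000)
for (i in 1:2000) {
  y = rnorm(N)                    # data generated under the null (beta1 = 0)
  fit1 = lm(y ~ x)                # slope + intercept
  fit0 = lm(y ~ 1)                # intercept only
  rss1 = sum(resid(fit1)^2)
  rss0 = sum(resid(fit0)^2)
  b1[i]  = coef(fit1)[2]
  sig[i] = sqrt(rss1/N)           # MLE of sigma
  G2[i]  = N*log(rss0/rss1)       # 2*(logL1 - logL0) for normal models
}
cor(G2, b1^2, method="spearman")        # < 1: same slope can give different G2
cor(G2, (b1/sig)^2, method="spearman")  # 1: G2 is exactly monotone in slope/noise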
Noah Motion, 2014-09-18 08:04:

I meant to mention this in my first comment, but when you write that "...it adds another reason never to talk about 'the' p value for a set of data, because any data set has many different p values," I feel like it's worth pointing out that anyone who claims that there's a single p value for a data set is forgetting (or doesn't know) that a p value is associated with a *test* carried out on a data set. A data set without a test doesn't have any p value at all (nor does a test without a data set).

Anyway, when I use standardized data (x and e), I get exactly the same p values for the beta coefficient test and the model comparison test. Each time I run 5000 simulations, I get simulated p values that are very close to the values returned by lm() and anova(). I've tried it four or five times with different values for b1 this morning, and the results are very consistent (with respect to the p values matching across tests).

Here's my code:

N = 100
x = runif(N)*5 + 3
x = (x - mean(x))/sqrt(((N-1)/N)*var(x))   # standardize x: mean 0, MLE sd 1
b0 = 0
b1 = .23
e = rnorm(N, mean=0, sd=7)
e = (e - mean(e))/sqrt(((N-1)/N)*var(e))   # standardize the errors the same way
y = b0 + b1*x + e

data = data.frame(y=y, x=x)

fit1 = lm(y ~ x, data=data)   # slope + intercept
fit2 = lm(y ~ 1, data=data)   # intercept only
test = anova(fit2, fit1)      # model comparison (F test)

b1.f = coef(fit1)[2]   # observed slope estimate
Fs.f = test$F[2]       # observed F statistic

nsmp = 5000
b1.H0 = 0
BF = matrix(nrow=nsmp, ncol=2)   # null sampling distributions
colnames(BF) = c("beta","Fs")

for (smpi in 1:nsmp) {
  e = rnorm(N, mean=0, sd=7)
  e = (e - mean(e))/sqrt(((N-1)/N)*var(e))
  y = b0 + b1.H0*x + e   # data generated under the null
  d = data.frame(x=x, y=y)
  fit.a = lm(y ~ x, data=d)
  fit.b = lm(y ~ 1, data=d)
  test.ab = anova(fit.b, fit.a)
  BF[smpi,"beta"] = coef(fit.a)[2]
  BF[smpi,"Fs"] = test.ab$F[2]
}

par(mfrow=c(2,1))
blims = c(min(-1.25*abs(b1.f), min(BF[,"beta"])), max(1.25*abs(b1.f), max(BF[,"beta"])))
temp = hist(BF[,1], breaks=seq(blims[1], blims[2], length=50))
lines(c(b1.f,b1.f), c(0,max(temp$counts)), lwd=3, col="red3")   # observed slope
Flim = max(1.25*Fs.f, max(BF[,"Fs"]))
temp = hist(BF[,2], breaks=seq(0, Flim, length=50))
lines(c(Fs.f,Fs.f), c(0,max(temp$counts)), lwd=3, col="red3")   # observed F

# analytic vs. simulated p value for the slope (two-tailed):
ff = summary(fit1)$fstatistic
c(pf(ff[1], ff[2], ff[3], lower.tail=FALSE), sum(abs(BF[,1]) > abs(b1.f))/nsmp)

# analytic vs. simulated p value for the model comparison:
c(test[["Pr(>F)"]][2], sum(BF[,2] > Fs.f)/nsmp)

Jake Westfall (http://jakewestfall.org), 2014-09-18 00:14:

The joint distribution plot seems to make it pretty clear that the majority of the discrepancy is due to the one-tailed vs. two-tailed issue that you mentioned in the last post. As for the non-trivial discrepancy that remains after that, it seems like Noah's suggestion should do it. But you say that when you do this, contrary to Noah, you "get the same results as in the post." Does that also include correcting the tail issue? Between the two measures, it seems like we should be able to get them to match pretty much exactly. Having the code could be helpful here.

John K. Kruschke, 2014-09-17 23:31:

Noah: When I rerun the simulation with mean-centered actual data (both x and y), and mean-centered simulated sample data (so that the estimated intercept in actual and sampled data is always zero), I get the same results as in the post.
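[Editor's sketch] On Jake's tail point: with a single predictor, the model-comparison F statistic is exactly the square of the slope's t statistic, so the F-test p value equals the two-tailed (not one-tailed) p value for the slope. A quick R check (assumed setup, not code from the thread):

set.seed(2)
N = 100
x = as.numeric(scale(runif(N)))
y = 0.23*x + rnorm(N)
fit = lm(y ~ x)
tval = summary(fit)$coefficients["x", "t value"]
Fval = unname(summary(fit)$fstatistic["value"])
all.equal(tval^2, Fval)   # TRUE: F = t^2 for one predictor
all.equal(2*pt(-abs(tval), N - 2), pf(Fval, 1, N - 2, lower.tail=FALSE))   # TRUE: same p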
Noah Motion, 2014-09-17 21:43:

The test of the slope parameter ignores the intercept parameter and the fact that the intercept estimate will vary depending on whether or not the slope is present in the model.

The likelihood ratio test, by contrast, compares the two models as wholes: one has a slope and an intercept, the other has just an intercept.

I wrote some code to run similar simulations, but first I standardized x and the error before generating y. In this case, the full and restricted models both have estimated intercepts of zero (or close enough, in the ballpark of 1e-16), as expected. Both tests also give the same p value. (I used the anova() function, so it's a p value for an F statistic, but that should behave pretty much like a G2 statistic.)

If I plot the joint distribution of b1 and F from the null model, it looks pretty much like f(x) = x^2: a nice, clean U shape with little to no detectable noise.

Anonymous, 2014-09-17 20:43:

This recent paper (and its citations) could be useful:

http://ba.stat.cmu.edu/journal/forthcoming/smith.pdf
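[Editor's sketch] Noah's clean U shape has a simple explanation under his setup: when x and the null data y = e are both standardized to exact sample mean 0 and MLE variance 1, the F statistic is a deterministic function of the slope estimate, F = (N-2)*b1^2/(1-b1^2). Since F is increasing in b1^2, the F-test p value and the two-tailed slope p value must then agree exactly, which is why his p values match. A short R check (assumed setup, not code from the thread):

set.seed(3)
N = 100
std = function(v) (v - mean(v))/sqrt(((N-1)/N)*var(v))   # mean 0, MLE sd 1
x = std(runif(N))
b1 = Fs = numeric(1000)
for (i in 1:1000) {
  y = std(rnorm(N))   # null data: y = e, standardized
  fit = lm(y ~ x)
  b1[i] = coef(fit)[2]
  Fs[i] = summary(fit)$fstatistic["value"]
}
max(abs(Fs - (N-2)*b1^2/(1 - b1^2)))   # ~1e-12: an exact parabola, no noise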