In a post of a few hours ago, I pointed out that I was having trouble getting *p* values to agree for two different methods. Thanks to a suggestion from a reader, there is a resolution: The *p* values should not agree. This was actually my hunch and hope all along, because it adds another reason never to talk about "**the**" *p* value for a set of data: any data set has many different *p* values.

First, a recap: I'm doing Monte Carlo simulation of sampling distributions to compute *p* values. For example, consider the slope parameter, β₁, in simple linear regression. I want to find out if *p* < .05 for a null hypothesis that β₁ = 0. I'm working with two different ways to compute a *p* value, and am finding that the results do not agree with each other. Here are the two ways:

- Consider the maximum likelihood estimate (MLE) of β₁ from the actual data, denoted β₁^MLE_actual, and see where it falls in a sampling distribution of β₁^MLE_null for simulated data from the null hypothesis.
- Consider the likelihood-ratio statistic, G² = -2·log(LR), where LR is the ratio of the maximum likelihood of the restricted model with β₁ = 0 over the maximum likelihood of the full model with β₁ free. We see where G²_actual falls in the sampling distribution of G²_null for simulated data from the null hypothesis.
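The two ways above can be sketched in code. This is a minimal illustration, assuming Gaussian noise with σ = 1, a fixed predictor, and a made-up "actual" data set; it is not the post's actual simulation setup, just the same two statistics computed on simulated null data.

```python
import numpy as np

rng = np.random.default_rng(0)

def slope_mle(x, y):
    # OLS slope, which is the MLE of beta_1 under Gaussian noise.
    return np.polyfit(x, y, 1)[0]

def g_squared(x, y):
    # G^2 = -2 log(LR), restricted model (slope = 0) vs. full model.
    # With Gaussian noise and sigma profiled out, this reduces to
    # n * log(RSS_restricted / RSS_full).
    n = len(y)
    rss_restricted = np.sum((y - y.mean()) ** 2)   # intercept-only fit
    b1, b0 = np.polyfit(x, y, 1)
    rss_full = np.sum((y - (b0 + b1 * x)) ** 2)
    return n * np.log(rss_restricted / rss_full)

# Hypothetical "actual" data (illustrative only; not the post's data).
n = 25
x = rng.normal(size=n)
y_actual = 0.5 * x + rng.normal(size=n)

b1_actual = slope_mle(x, y_actual)
g2_actual = g_squared(x, y_actual)

# Monte Carlo sampling distributions under the null (beta_1 = 0).
n_sims = 2000
b1_null = np.empty(n_sims)
g2_null = np.empty(n_sims)
for i in range(n_sims):
    y_null = rng.normal(size=n)   # slope 0, intercept 0, sigma 1
    b1_null[i] = slope_mle(x, y_null)
    g2_null[i] = g_squared(x, y_null)

# One-tailed p values: proportion of null samples exceeding the actual value.
p_b1 = np.mean(b1_null > b1_actual)
p_g2 = np.mean(g2_null > g2_actual)
print(p_b1, p_g2)
```

Because the two statistics summarize each simulated sample differently, there is no reason the two printed proportions must match.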

See the previous post for details about how the data from the null hypothesis were generated. But here is a repeat of the old picture of the two sampling distributions:

Notice that the *p* values are not the same in the two distributions. (See the end of the previous post for reasons that the difference cannot be explained away as some simple artifact.)

**Now the news:** Arash Khodadadi, an advanced graduate student in Psychological & Brain Sciences at Indiana University, read the post and pointed out to me that not all samples that have G²_null > G²_actual are the same samples that have β₁^MLE_null > β₁^MLE_actual. Essentially he was saying that I really should be looking at the *joint* sampling distribution. So I made a plot, and here it is:

Each point corresponds to a sample from the null hypothesis. The marginals of the joint distribution were plotted in the previous figure. I put lines at the values of β₁^MLE_actual and G²_actual, and I color-coded the points that exceed those values. Obviously the points contributing to the *p* value for β₁^MLE are quite different from the points contributing to the *p* value for G². It is conceivable that the red and blue points would exactly equal each other mathematically (perhaps in a two-tailed version), but I doubt it.
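The same point can be made with a cross-tabulation instead of a plot: count how many null samples exceed the actual value on each statistic. This is a hedged sketch with an illustrative fake "actual" data set and simulation settings I chose for the example; the counts will not match the post's figure, but the structural point (the two exceedance sets differ, because G² is large for extreme slopes of *either* sign) carries over.

```python
import numpy as np

rng = np.random.default_rng(1)

def both_stats(x, y):
    # Return (slope MLE, G^2) for simple linear regression,
    # with G^2 = n * log(RSS_restricted / RSS_full).
    n = len(y)
    b1, b0 = np.polyfit(x, y, 1)
    rss_full = np.sum((y - (b0 + b1 * x)) ** 2)
    rss_restricted = np.sum((y - y.mean()) ** 2)
    return b1, n * np.log(rss_restricted / rss_full)

n, n_sims = 25, 2000
x = rng.normal(size=n)

# Hypothetical "actual" data (illustrative stand-in for the post's data).
y_actual = 0.4 * x + rng.normal(size=n)
b1_actual, g2_actual = both_stats(x, y_actual)

# Joint sampling distribution under the null hypothesis (slope = 0).
stats = np.array([both_stats(x, rng.normal(size=n)) for _ in range(n_sims)])
b1_null, g2_null = stats[:, 0], stats[:, 1]

exceeds_b1 = b1_null > b1_actual   # the "red" exceedance set
exceeds_g2 = g2_null > g2_actual   # the "blue" exceedance set

# A sample with a large NEGATIVE slope has a large G^2 but a small b1,
# so it can land in the G^2 set without landing in the b1 set.
print("exceeds both:", np.sum(exceeds_b1 & exceeds_g2))
print("only b1:    ", np.sum(exceeds_b1 & ~exceeds_g2))
print("only G^2:   ", np.sum(~exceeds_b1 & exceeds_g2))
```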

**Conclusion:** This exercise leads me to conclude that the *p* values are different because they are referring to different questions about the null hypothesis. Which one is more meaningful? For me, the sampling distribution of β₁^MLE makes more direct intuitive contact with what people want to know (in a frequentist framework), namely, *Is the observed magnitude of the slope very probable under the null hypothesis?* The sampling distribution of G² is less intuitive, as it is asking, *Is the observed ratio, of the probability of the data under the zero-slope model over the probability of the data under the free-slope model, very probable under the null hypothesis?* Both questions are meaningful, but the first one asks directly about the magnitude of the slope, whereas the second one asks about the relative probabilities of data under the model structures.

**Now the difficult semantic question:** When the *p* values conflict, as they do in this example (i.e., *p* < .05 for β₁^MLE, while *p* > .05 for G² [and if you prefer two-tailed tests, then we could contrive a case in which the *p* values conflict there, too]), is the slope "significantly" non-zero? The answer: It depends! It depends on the question you are asking. I think it makes intuitive sense to ask the question about the magnitude of the slope and, therefore, to say in this case that the slope is significantly non-zero. But if you specifically want to ask the model-comparison question, then you need the model-comparison *p* value, and you would conclude that the free-slope model is *not* significantly better than the zero-slope model.

**Addendum: SEE THE FOLLOW-UP POST.**