*p* values. For example, consider the slope parameter, β₁, in simple linear regression. I want to find out if *p* < .05 for a null hypothesis that β₁ = 0. I'm working with two different ways to compute a *p* value, and am finding that the results do not agree with each other. Here are the two ways:

- Consider the maximum likelihood estimate (MLE) of β₁ from the actual data, denoted β₁^MLE_actual, and see where it falls in the sampling distribution of β₁^MLE_null for simulated data from the null hypothesis.
- Consider the likelihood ratio statistic, G² = -2·log(LR), where LR is the ratio of the maximum likelihood of the restricted model with β₁ = 0 over the maximum likelihood of the full model with β₁ free. See where G²_actual falls in the sampling distribution of G²_null for simulated data from the null hypothesis.
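For reference, under Gaussian errors with σ also estimated by maximum likelihood (an assumption here, though it matches how the σ estimates are discussed later in the post), the likelihood ratio statistic has a convenient closed form in terms of the residual sums of squares of the two models:

```latex
G^2 \;=\; -2\log(\mathrm{LR})
     \;=\; N \,\log\!\left(\frac{\mathrm{RSS}_0}{\mathrm{RSS}_1}\right),
```

where RSS₀ is the residual sum of squares of the restricted (β₁ = 0) model, RSS₁ is that of the full model, and N is the number of data points. This follows because the Gaussian likelihood maximized over σ is proportional to (RSS/N)^(-N/2).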

The two *p* values don't agree. What am I doing wrong?

Here's an example with details. Consider the data in this scatter plot:

The maximum-likelihood regression line is shown above in blue, and the maximum-likelihood zero-slope line is shown in red. The MLE parameter values are shown in the header of the plot; in particular, β₁^MLE_actual = 5.86. The header also shows that G²_actual = 4.41.

For reference, in R a call to `lm` yields *p* = 0.068 for the slope coefficient. Let's see if either of the Monte Carlo schemes will agree with that result.
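The *p* value that `lm` reports for the slope is the classical two-tailed t-test *p* value, which `scipy.stats.linregress` reproduces. A minimal sketch, using made-up data since the post's data set isn't reproduced here:

```python
# Sketch: the two-tailed p value R's lm() reports for the slope is the
# classical t-test p value; scipy.stats.linregress computes the same thing.
# The data below are invented for illustration (not the post's data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = np.arange(10, dtype=float)
y = 3.0 + 0.8 * x + rng.normal(0, 4, size=x.size)

res = stats.linregress(x, y)
print(res.slope, res.pvalue)  # slope estimate and its two-tailed p value
```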

We need to generate Monte Carlo data samples. For this purpose, I use the x values in the actual data, and for each x value I generate a random y value according to the MLE values of the restricted model. The restricted model has parameter values shown in the header of the figure below (rounded to three digits):
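The generation step just described can be sketched as follows. The values here are invented stand-ins; the post's actual restricted-model MLEs appear in the figure header below:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the restricted model's MLE values
# (the post's actual fitted values are shown in its figure header).
x = np.arange(10, dtype=float)   # in the post, the actual data's x values are reused
beta0_null = 8.0                 # MLE intercept of the zero-slope model, i.e. mean(y)
sigma_null = 4.0                 # MLE of SD(y), i.e. N in the denominator

def simulate_null_y(x, rng):
    """One simulated data set: same x values, y drawn from the null model."""
    return beta0_null + sigma_null * rng.standard_normal(x.size)

y_sim = simulate_null_y(x, rng)
```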

[Figure: Sampling distributions from the null hypothesis. The null-hypothesis parameter values are shown in the header of the figure.]

For each simulated data set I computed β₁^MLE_null and G²_null. There were 50,000 simulated data sets to create fairly smooth and stable sampling distributions, shown above.
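The whole scheme, from fitting the two statistics to tabulating their null sampling distributions, can be sketched end to end. Everything below (data, seed, simulation count) is an invented stand-in, not the post's actual values:

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented stand-in data (the post's data set is not reproduced here).
x = np.arange(10, dtype=float)
y = 3.0 + 0.8 * x + rng.normal(0, 4, size=x.size)
N = x.size

def fit_stats(x, y):
    """Return (slope MLE, G^2) for simple linear regression vs. zero slope."""
    b1, b0 = np.polyfit(x, y, 1)            # full-model least squares = Gaussian MLE
    rss1 = np.sum((y - (b0 + b1 * x))**2)   # full-model residual sum of squares
    rss0 = np.sum((y - y.mean())**2)        # restricted (zero-slope) RSS
    g2 = N * np.log(rss0 / rss1)            # -2 log LR with sigma maximized out
    return b1, g2

b1_actual, g2_actual = fit_stats(x, y)

# Null-model MLEs: intercept = mean(y), sigma = SD with N in the denominator.
beta0_null, sigma_null = y.mean(), y.std(ddof=0)

n_sim = 10_000   # the post used 50,000; fewer here to keep the sketch quick
b1_null = np.empty(n_sim)
g2_null = np.empty(n_sim)
for i in range(n_sim):
    y_sim = beta0_null + sigma_null * rng.standard_normal(N)
    b1_null[i], g2_null[i] = fit_stats(x, y_sim)

# Monte Carlo p values: tail proportions of the null sampling distributions.
p_b1 = np.mean(np.abs(b1_null) >= abs(b1_actual))  # two-tailed for the slope
p_g2 = np.mean(g2_null >= g2_actual)               # G^2 is one-tailed by construction
print(p_b1, p_g2)
```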

The sampling distributions above also show the locations of β₁^MLE_actual and G²_actual, along with their *p* values. You can see that the *p* value for β₁^MLE_actual does not match the *p* value for G²_actual, but the latter *does* match the result of R's `lm` function.

Maybe the problem is merely Monte Carlo sampling noise. But, no, even much bigger simulations don't help (and the *p* values stay essentially the same).

Maybe the problem is one-tailed vs. two-tailed *p* values. But, no, two times the MLE *p* value does not equal the G² *p* value.

Maybe the problem is that the null-hypothesis σ should be the unbiased estimate of SD(y), which uses N-1 in the denominator, instead of the MLE of SD(y), which effectively uses N in the denominator. But, no, that still doesn't make the *p* values match. The *p* value for β₁^MLE does increase (i.e., the sampling distribution of β₁^MLE_null widens while the sampling distribution of G²_null remains unchanged), but it still does not match, either one-tailed or two-tailed.
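The N vs. N-1 distinction above is just NumPy's `ddof` argument; a toy sketch (the values are arbitrary):

```python
import numpy as np

y = np.array([2.0, 5.0, 9.0, 4.0])  # arbitrary toy values

sd_mle = y.std(ddof=0)       # MLE: divides the sum of squares by N
sd_unbiased = y.std(ddof=1)  # unbiased variance estimate: divides by N-1

# For the same data, the N-1 version is always the larger of the two,
# by a factor of sqrt(N / (N-1)).
assert sd_unbiased > sd_mle
```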

What am I doing wrong with the sampling distribution of β₁^MLE_null?

**ADDENDUM: SEE THE FOLLOW-UP POST!**
