Monday, September 29, 2014
Doing Bayesian Data Analysis in Python
Do you prefer Python? Some readers have undertaken to translate the computer programs from Doing Bayesian Data Analysis into Python, including Osvaldo Martin, who has this GitHub site for his ongoing project. If you are interested in what he has done, or if you are interested in contributing, please contact him. If you have your own Python-translation project going, please add a comment to this post. Thank you all for your contributions to make Bayesian methods available in all flavors.
Thursday, September 18, 2014
p value from likelihood ratio test is STILL not the same as p value from maximum likelihood estimate
In yesterday's post, I described two ways for finding a p value for a parameter, and pointed out that the two ways lead to different p values. As an example, I considered the slope parameter in simple linear regression. One way to get a p value for the slope parameter is with a standard likelihood ratio test, for linear model with free slope versus intercept-only model. The other way to get a p value is to generate a sampling distribution of the MLE slope parameter itself, from a null hypothesis that consists of the MLE intercept-only model. Through remarks from commenters, especially Arash Khodadadi and Noah Silbert, I've been prodded to clarify the differences between these approaches. This post points out
The header of the plot, above, indicates the MLE of the linear model, and -2log(LR), a.k.a. G2, rounded to three significant digits.
The null hypothesis is an intercept-only model (beta1=0) with beta0 and sigma set to their MLE values when beta1 is fixed at zero. (Which means, in this case, that beta0 is the mean of y and sigma is the sd of y using N not N-1 in the denominator.) I generated sample data from the null hypothesis using the x values in the actual data. For each sample I computed the MLE of the full model and -2log(LR). The resulting marginal sampling distributions are shown here:
In the left panel, above, the one-tailed p value of MLE beta1 is displayed; multiply it by 2 to get the two-tailed p value. Notice it is different than the p value of the likelihood ratio.
Below is the joint sampling distribution, where each point is a sample from the null hypothesis. There is a new twist to this figure: Each point is color coded for the magnitude of MLE sigma, where blue is the largest MLE sigma in the distribution and red is the smallest MLE sigma in the distribution.
The joint sampling distribution also shows the thresholds for p=.05 (one-tailed or two-tailed), and the actual data statistics are plotted as a "+".
You can see from the joint sampling distribution that MLE beta1 can be large-ish even when -2log(LR) is small-ish when the sample MLE sigma is large-ish (blue points). But the opposite can happen when the sample MLE sigma is small-ish (red points). Thus, a key difference between the measures of the slope parameter is how they deal with the variance. The likelihood ratio compares the free-slope against intercept-only models relative to the variance, while the MLE beta1 considers the slope on the absolute scale of the data, not relative to the variance.
As discussed in yesterday's post, I don't think either test is inherently better than the other. They just ask the question about the slope parameter in different ways. As mentioned yesterday, posing the question in terms of absolute MLE beta1 has direct intuitive interpretation. It's also much easier to use when defining confidence intervals as the range of parameter values not rejected by p<alpha (which is, for me, the most coherent way to define confidence intervals). But that's a topic for another day!
- The likelihood ratio test compares model fits relative to the variance, while the sampling distribution of the MLE slope parameter is on the absolute scale of the data.
- An example in which the slope parameter has p < .05 two-tailed in the sampling distribution of the MLE slope parameter but has p > .05 (one-tailed) in the sampling distribution of the likelihood ratio.
The header of the plot, above, indicates the MLE of the linear model, and -2log(LR), a.k.a. G2, rounded to three significant digits.
The null hypothesis is an intercept-only model (beta1=0) with beta0 and sigma set to their MLE values when beta1 is fixed at zero. (Which means, in this case, that beta0 is the mean of y and sigma is the sd of y using N not N-1 in the denominator.) I generated sample data from the null hypothesis using the x values in the actual data. For each sample I computed the MLE of the full model and -2log(LR). The resulting marginal sampling distributions are shown here:
In the left panel, above, the one-tailed p value of MLE beta1 is displayed; multiply it by 2 to get the two-tailed p value. Notice it is different than the p value of the likelihood ratio.
Below is the joint sampling distribution, where each point is a sample from the null hypothesis. There is a new twist to this figure: Each point is color coded for the magnitude of MLE sigma, where blue is the largest MLE sigma in the distribution and red is the smallest MLE sigma in the distribution.
The joint sampling distribution also shows the thresholds for p=.05 (one-tailed or two-tailed), and the actual data statistics are plotted as a "+".
You can see from the joint sampling distribution that MLE beta1 can be large-ish even when -2log(LR) is small-ish when the sample MLE sigma is large-ish (blue points). But the opposite can happen when the sample MLE sigma is small-ish (red points). Thus, a key difference between the measures of the slope parameter is how they deal with the variance. The likelihood ratio compares the free-slope against intercept-only models relative to the variance, while the MLE beta1 considers the slope on the absolute scale of the data, not relative to the variance.
As discussed in yesterday's post, I don't think either test is inherently better than the other. They just ask the question about the slope parameter in different ways. As mentioned yesterday, posing the question in terms of absolute MLE beta1 has direct intuitive interpretation. It's also much easier to use when defining confidence intervals as the range of parameter values not rejected by p<alpha (which is, for me, the most coherent way to define confidence intervals). But that's a topic for another day!
Wednesday, September 17, 2014
p value from likelihood ratio test is not the same as p value from maximum likelihood estimate
In a post of a few hours ago, I pointed out that I was having trouble getting p values to agree for two different methods. Thanks to a suggestion from a reader, there is a resolution: The p values should not agree. This was actually my hunch and hope all along, because it adds another reason never to talk about "the" p value for a set of data, because any data set has many different p values.
First, a recap:
I'm doing Monte Carlo simulation of sampling distributions to compute p values. For example, consider the slope parameter, β1, in simple linear regression. I want to find out if p < .05 for a null hypothesis that β1=0. I'm working with two different ways to compute a p value, and am finding that the results do not agree with each other. Here are the two ways:
Notice that the p values are not the same in the two distributions. (See the end of the previous post for reasons that the difference cannot be explained away as some simple artifact.)
Now the news: Arash Khodadadi, an advanced graduate student in Psychological & Brain Sciences at Indiana University, read the post and pointed out to me that not all samples that have G2null > G2actual are the same samples that have β1MLEnull > β1MLEactual. Essentially he was saying that I really should be looking at the joint sampling distribution. So I made a plot, and here it is:
Each point corresponds to a sample from the null hypothesis. The marginals of the joint distribution were plotted in the previous figure. I put lines at the values of β1MLEactual and G2actual, and I color coded the points that exceed those values. Obviously the points contributing to the p value for β1MLE
are quite different than the points contribution to the p value for G2. It is conceivable that the red and blue points would exactly equal each other mathematically (perhaps in a two-tailed version), but I doubt it.
Conclusion: This exercise leads me to conclude that the p values are different because they are referring to different questions about the the null hypothesis. Which one is more meaningful? For me, the sampling distribution of β1MLE makes more direct intuitive contact with what people want to know (in a frequentist framework), namely, Is the observed magnitude of the slope very probable under the null hypothesis? The sampling distribution of G2 is less intuitive, as it is asking, Is the observed ratio, of the probability of the data under the zero-slope model over the probability of the data under the free-slope model, very probable under the null hypothesis? Both questions are meaningful, but the first one asks directly about the magnitude of the slope, whereas the second one asks about the relative probabilities of data under the model structures.
Now the difficult semantic question: When the p values conflict, as they do in this example (i.e., p < .05 for β1MLE, while p > .05 for G2 [and if you prefer two-tailed tests, then we could contrive a case in which the p values conflict there, too]), is the slope "significantly" non-zero? The answer: It depends! It depends on the question you are asking. I think it makes intuitive sense to ask the question about the magnitude of the slope and, therefore, to say in this case that the slope is significantly non-zero. But if you specifically want to ask the model comparison question, then you need the model-comparison p value, and you would conclude that the free-slope model is not significantly better than the zero-slope model.
Addendum: SEE THE FOLLOW-UP POST.
First, a recap:
I'm doing Monte Carlo simulation of sampling distributions to compute p values. For example, consider the slope parameter, β1, in simple linear regression. I want to find out if p < .05 for a null hypothesis that β1=0. I'm working with two different ways to compute a p value, and am finding that the results do not agree with each other. Here are the two ways:
- Consider the maximum likelihood estimate (MLE) of β1 from the actual data, denoted β1MLEactual, and see where it falls in a sampling distribution of β1MLEnull for simulated data from the null hypothesis.
- Consider the likelihood ratio statistic, G2 = -2log(LR) where LR is the ratio of the maximum likelihood of the restricted model with β1=0 over the maximum likelihood of the full model with β1 free. We see where G2actual falls in the sampling distribution of G2null for simulated data from the null hypothesis.
Notice that the p values are not the same in the two distributions. (See the end of the previous post for reasons that the difference cannot be explained away as some simple artifact.)
Now the news: Arash Khodadadi, an advanced graduate student in Psychological & Brain Sciences at Indiana University, read the post and pointed out to me that not all samples that have G2null > G2actual are the same samples that have β1MLEnull > β1MLEactual. Essentially he was saying that I really should be looking at the joint sampling distribution. So I made a plot, and here it is:
Each point corresponds to a sample from the null hypothesis. The marginals of the joint distribution were plotted in the previous figure. I put lines at the values of β1MLEactual and G2actual, and I color coded the points that exceed those values. Obviously the points contributing to the p value for β1MLE
are quite different than the points contribution to the p value for G2. It is conceivable that the red and blue points would exactly equal each other mathematically (perhaps in a two-tailed version), but I doubt it.
Conclusion: This exercise leads me to conclude that the p values are different because they are referring to different questions about the the null hypothesis. Which one is more meaningful? For me, the sampling distribution of β1MLE makes more direct intuitive contact with what people want to know (in a frequentist framework), namely, Is the observed magnitude of the slope very probable under the null hypothesis? The sampling distribution of G2 is less intuitive, as it is asking, Is the observed ratio, of the probability of the data under the zero-slope model over the probability of the data under the free-slope model, very probable under the null hypothesis? Both questions are meaningful, but the first one asks directly about the magnitude of the slope, whereas the second one asks about the relative probabilities of data under the model structures.
Now the difficult semantic question: When the p values conflict, as they do in this example (i.e., p < .05 for β1MLE, while p > .05 for G2 [and if you prefer two-tailed tests, then we could contrive a case in which the p values conflict there, too]), is the slope "significantly" non-zero? The answer: It depends! It depends on the question you are asking. I think it makes intuitive sense to ask the question about the magnitude of the slope and, therefore, to say in this case that the slope is significantly non-zero. But if you specifically want to ask the model comparison question, then you need the model-comparison p value, and you would conclude that the free-slope model is not significantly better than the zero-slope model.
Addendum: SEE THE FOLLOW-UP POST.
Sampling distribution of maximum lilkelihood estimate - help?
I'm doing Monte Carlo simulation of sampling distributions to compute p values. For example, consider the slope parameter, β1, in simple linear regression. I want to find out if p < .05 for a null hypothesis that β1=0. I'm working with two different ways to compute a p value, and am finding that the results do not agree with each other. Here are the two ways:
Here's an example with details. Consider the data in this scatter plot:
The maximum-likelihood regression line is shown above in blue, and the maximum-likelihood zero-slope line is shown in red. The MLE parameter values are shown in the header of the plot; in particular β1MLEactual = 5.86. The header also shows that G2actual = 4.41.
For reference, in R a call to lm yields p = 0.068 for the slope coefficient. Let's see if either of the Monte Carlo schemes will agree with that result.
We need to generate Monte Carlo data samples. For this purpose, I use the x values in the actual data, and for each x value I generate a random y value according to the MLE values of the restricted model. The restricted model has parameter values shown in the header of the figure below (rounded to three digits):
Thus, for each x value in the data, a random value for y is created in R as rnorm(1,mean=158,sd=36.6)using the full-precision MLE values, not just the rounded values shown here. For each simulated set of data, I (well, the computer) computed β1MLEnull and G2null. There were 50,000 simulated data sets to create fairly smooth and stable sampling distributions, shown above.
The sampling distributions above also show the locations of β1MLEactual and G2actual along with their p values. You can see that the p value for β1MLEactual does not match the p value for G2actual, but the latter does match the result of R's lm function.
Maybe the problem is merely Monte Carlo sampling noise. But, no, even much bigger simulations don't help (and the p values stay essentially the same).
Maybe the problem is one-tailed vs two-tailed p values. But, no, two times the MLE p value does not equal the G2 p value.
Maybe the problem is that the null hypothesis σ should be the unbiased estimate of SD(y), using N-1 in the denominator for computing SD, instead of the MLE for SD(y), which effectively uses N in the denominator. But, no, that still doesn't make the p value match. The p value for β1MLE does increase (i.e., the sampling distribution of β1MLEnull widens while the sampling distribution of G2null remains unchanged), but still does not match, either one-tailed or two-tailed.
What am I doing wrong with the sampling distribution of β1MLEnull?
ADDENDUM: SEE THE FOLLOW-UP POST!
- Consider the maximum likelihood estimate (MLE) of β1 from the actual data, denoted β1MLEactual, and see where it falls in a sampling distribution of β1MLEnull for simulated data from the null hypothesis.
- Consider the likelihood ratio statistic, G2 = -2log(LR) where LR is the ratio of the maximum likelihood of the restricted model with β1=0 over the maximum likelihood of the full model with β1 free. We see where G2actual falls in the sampling distribution of G2null for simulated data from the null hypothesis.
Here's an example with details. Consider the data in this scatter plot:
The maximum-likelihood regression line is shown above in blue, and the maximum-likelihood zero-slope line is shown in red. The MLE parameter values are shown in the header of the plot; in particular β1MLEactual = 5.86. The header also shows that G2actual = 4.41.
For reference, in R a call to lm yields p = 0.068 for the slope coefficient. Let's see if either of the Monte Carlo schemes will agree with that result.
We need to generate Monte Carlo data samples. For this purpose, I use the x values in the actual data, and for each x value I generate a random y value according to the MLE values of the restricted model. The restricted model has parameter values shown in the header of the figure below (rounded to three digits):
Sampling distributions from the null hypothesis. The null-hypothesis parameter values are shown in the header of the |
The sampling distributions above also show the locations of β1MLEactual and G2actual along with their p values. You can see that the p value for β1MLEactual does not match the p value for G2actual, but the latter does match the result of R's lm function.
Maybe the problem is merely Monte Carlo sampling noise. But, no, even much bigger simulations don't help (and the p values stay essentially the same).
Maybe the problem is one-tailed vs two-tailed p values. But, no, two times the MLE p value does not equal the G2 p value.
Maybe the problem is that the null hypothesis σ should be the unbiased estimate of SD(y), using N-1 in the denominator for computing SD, instead of the MLE for SD(y), which effectively uses N in the denominator. But, no, that still doesn't make the p value match. The p value for β1MLE does increase (i.e., the sampling distribution of β1MLEnull widens while the sampling distribution of G2null remains unchanged), but still does not match, either one-tailed or two-tailed.
What am I doing wrong with the sampling distribution of β1MLEnull?
ADDENDUM: SEE THE FOLLOW-UP POST!
Subscribe to:
Posts (Atom)