Sunday, February 8, 2015
I've got variable Y that I want to predict from variables X1, X2, etc. What should I do?
For questions like yours (I've got a variable Y that I want to predict from variables X1, X2, etc.; what should I do?), the best answer is usually informed by background knowledge of the domain. Generic models, like multiple linear regression, don't always give the most meaningful answer.
For example, suppose you're trying to predict the amount of fencing (Y) you'll need for rectangular lots of length X1 and width X2. Then a linear regression would serve you well. Why? Because we know (from background knowledge) that perimeter is a linear function of length and width.
But suppose you're trying to predict how much grass seed you'll need for the same lots. Then you'd want a model that includes the product of X1 and X2, because the product gives the area of the lot, and the amount of seed depends on area.
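To make the fencing and grass-seed contrast concrete, here is a minimal sketch in plain Python. Everything in it is made up for illustration: the lot dimensions are simulated, the data are noise-free, and the helper names (solve, least_squares, max_abs_residual) are hypothetical. It fits an additive model to fencing, then fits seed area with and without the X1*X2 term.

```python
import random

def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def least_squares(design, y):
    """Ordinary least squares via the normal equations (adequate for a tiny example)."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

def max_abs_residual(design, y, coef):
    """Largest absolute prediction error of the fitted model."""
    return max(abs(yi - sum(c * v for c, v in zip(coef, row)))
               for row, yi in zip(design, y))

# Simulated lots: length X1 and width X2 (hypothetical values, arbitrary units).
random.seed(1)
lots = [(random.uniform(10, 50), random.uniform(10, 50)) for _ in range(30)]
fencing = [2 * x1 + 2 * x2 for x1, x2 in lots]   # perimeter
seed_area = [x1 * x2 for x1, x2 in lots]         # area

additive = [[1.0, x1, x2] for x1, x2 in lots]                # intercept, X1, X2
with_product = [[1.0, x1, x2, x1 * x2] for x1, x2 in lots]   # plus X1*X2

fence_coef = least_squares(additive, fencing)
area_additive_coef = least_squares(additive, seed_area)
area_product_coef = least_squares(with_product, seed_area)

print("fencing, additive model:", fence_coef)
print("area, additive model, worst error:",
      max_abs_residual(additive, seed_area, area_additive_coef))
print("area, model with product term, worst error:",
      max_abs_residual(with_product, seed_area, area_product_coef))
```

The additive model recovers the perimeter essentially exactly, but it leaves large errors when predicting area; adding the product term removes them. Real data would be noisy, so the fits would not be exact, but the structural point is the same.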
As another example, suppose you're trying to predict the installed length of a piece of pipe (Y) as a function of the date (X). You know that pipe expands and contracts as some function of temperature. And you also know that temperature cycles roughly sinusoidally (across the seasons of a year) as a function of date. So, to predict pipe length as a function of date, you'd use a trend that applies the expansion function to a sinusoidal function of date.
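As a rough sketch of what such a trend could look like, the Python below composes two hypothetical functions: a sinusoidal temperature curve with a one-year period, and a pipe-length function that assumes linear thermal expansion around a reference temperature. The linear form of the expansion, and every numeric value (mean temperature, amplitude, peak day, reference length, expansion coefficient), are assumptions chosen only for illustration; in a real analysis they would be parameters to estimate.

```python
import math

def temperature(day, mean_temp=12.0, amplitude=10.0, peak_day=200.0):
    """Hypothetical seasonal temperature (deg C): a sinusoid with a 365-day period."""
    return mean_temp + amplitude * math.cos(2 * math.pi * (day - peak_day) / 365.0)

def pipe_length(day, length_at_ref=100.0, ref_temp=12.0, expansion_coef=1.2e-5):
    """Hypothetical pipe length (m), assuming linear expansion around ref_temp."""
    return length_at_ref * (1.0 + expansion_coef * (temperature(day) - ref_temp))

# Predicted length for each day of one year.
lengths = [pipe_length(day) for day in range(365)]
longest_day = max(range(365), key=lambda d: lengths[d])
shortest_day = min(range(365), key=lambda d: lengths[d])

print("longest on day", longest_day, "at", lengths[longest_day], "m")
print("shortest on day", shortest_day, "at", lengths[shortest_day], "m")
```

The predicted length inherits the yearly cycle of the temperature curve. If the expansion function were nonlinear, only pipe_length would change; the sinusoidal dependence on date would stay the same.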
Whatever model you end up wanting, it can probably be implemented in JAGS (or BUGS or Stan). That's one of the beauties of the Bayesian approach with its general purpose MCMC software.