## Tuesday, October 2, 2012

### Bayesian estimation of trend with auto-regressive AR(1) deviation

This post shows how to estimate trend coefficients when there is an auto-regressive AR(1) process on the deviation from the trend. The specific example uses a sinusoidal trend to describe daily temperatures across many years, but the programming method in JAGS/BUGS can be easily adapted to other trends.

The example extends a previous post about average daily temperatures modeled as sinusoidal variation around a linear trend. The substantive goal was to estimate the slope of the linear component, to determine whether there is a credible non-zero increase in temperatures over the years. In that post, the discussion mentioned lack of independence across days in the deviation from the trend, and with this post the dependence is described by a simple auto-regressive AR(1) model.  Here is the model specification with the essential conceptual components highlighted in yellow:

model {
trend[1] <- beta0 + beta1 * x[1] + amp * cos( ( x[1] - thresh ) / wl )
for( i in 2 : Ndata ) {
y[i] ~ dt( mu[i] , tau , nu )
mu[i] <- trend[i] + ar1 * ( y[i-1] - trend[i-1] )
trend[i] <- beta0 + beta1 * x[i] + amp * cos( ( x[i] - thresh ) / wl )
}
ar1 ~ dunif(-1.1,1.1) # or dunif(-0.01,0.01)
beta0 ~ dnorm( 0 , 1.0E-12 )
beta1 ~ dnorm( 0 , 1.0E-12 )
tau ~ dgamma( 0.001 , 0.001 )
amp ~ dunif(0,50)
thresh ~ dunif(-183,183)
nu <- nuMinusOne + 1
nuMinusOne ~ dexp(1/29)
}

The trend is modeled as a linear component plus a sinusoidal component:
trend[i] <- beta0 + beta1 * x[i] + amp * cos( ( x[i] - thresh ) / wl )
The slope on the linear component is beta1.

The predicted value of y at time i, denoted mu[i], is the trend at time i plus a proportion of the deviation from the trend on the previous time step:
mu[i] <- trend[i] + ar1 * ( y[i-1] - trend[i-1] )
Notice that if ar1 is zero, then the model reduces to simply mu[i] = trend[i]. Here is the posterior when ar1 is restricted to being essentially zero, by setting its prior to ar1 ~ dunif(-0.01,0.01):
The parameter estimates are basically identical to those in the previous post (as they should be!). In particular, the linear trend component is credibly greater than zero.

When ar1 is freely estimated, by setting its prior to ar1 ~ dunif(-1.1,1.1), then the posterior looks like this:
Notice that the AR(1) coefficient is quite large positive, which makes sense for consecutive daily temperatures (if it's hotter than the sinusoid would predict on one day, it'll probably be hotter than the sinusoid would predict on the next day too). Notice that the estimate of the standard deviation of the noise is now smaller than before, which again makes sense because the AR(1) process is accounting for deviation from the trend which used to be accounted for only by the noise. Importantly, notice that estimates of the other trend parameters are now less certain. In particular, the linear trend component, while having the nearly the same mean in the posterior, has a much wider 95% HDI, which now includes zero.

1. Nice job!

I'm currently struggling to combine AR(1) with censoring to model solution times in a cognitive task. In addition to censoring, another difficulty is that almost all these time series models are working with gaussian white noise. In the context of reaction times it would be more appropriate to consider weibull or gamma distribution. But I find there is almost no work on non-gaussian cases . The best one can do is to log-transform the reaction times.

Nevertheless, I think the time-series approach is sorely underrepresented in psychology research. The consecutive trials in our experiments are not independently distributed as we pretend most of the time. In some cases this may even change the interpretation of the model as you example with temperatures shows.

2. JAGS is nice for implementing all the complexities you mentioned. In particular, censored data can be handled as described in this blog post:

http://doingbayesiandataanalysis.blogspot.com/2012/01/complete-example-of-right-censoring-in.html

In principle, it should be easy to adapt to non-Gaussian likelihood functions.

3. As for improving your model, it would be interesting to subtract the trend from the data and to have a look at the residuals. Are there any remaining periodic patterns? You may want to add more sinusoidal terms. What do the autocorrelation and partial autocorrelation functions of the residuals look like? (If your model describes your data well, there should be no significant left-over correlation in the residuals.)

Also there is a different parametrization in the literature for the sinusoid which involves both sine and cosine term. Its main advantage is that the trend equation then becomes linear with respect to the parameters and hence easier to estimate. Don't know whether this is an issue for MCMC. Maybe, it would improve the sampler convergence.

4. Yes I saw your post on censoring, but as noted there, you need to divine the proper initialization, before you can proceed. Jags error messages are not helpful. I prefer Pymc to Jags, although with pymc you have to be careful how you structure your code so that the sampling is fast enough.

Cryer & Chan recommend in their time series textbook (2008) to replace b*cos(2 *pi*f*t+phi) with b1*cos(2*pi*f*t)+b2*sin(2*pi*f*t) which is linear (f is assumed to be given).
Then it follows that b=sqrt(b1^2+b2^2) and phi=atan(-b1/b2). But I think with MCMC the parametrization shouldn't matter that much.