Saturday, April 7, 2012

Negative Binomial Reparameterization

In a previous post, I showed that direct estimation of the p and r parameters in a negative binomial distribution could involve bad autocorrelation in the MCMC chains, and I suggested that there must be some standard reparameterization to solve the problem, and asked for a pointer. Dr. John Davey of the University of Edinburgh was good enough to point the way. The solution is indeed straight forward: In the direct estimation, the p and r parameters are given priors, while the mean m (and variance v) is derived from p and r. In the reparameterization, the m and r parameters are given priors, while the p (and variance v) is derived from m and r. Autocorrelation in the chains is greatly reduced. Here is an example.

The model specifications:


Parameterized by p and r:
model {
    for( i in 1 : N ) {
      y[i] ~ dnegbin( p , r )
    }
    p ~ dbeta(1.001,1.001)
    r ~ dgamma(0.01,0.01)
    m <- r*(1-p)/p
    v <- r*(1-p)/(p*p)
}

Parameterized by m and r:
model {
    for( i in 1 : N ) {
      y[i] ~ dnegbin( p , r )
    }
    p <- r/(r+m)
    r ~ dgamma(0.01,0.01)
    m ~ dgamma(0.01,0.01)
    v <- r*(1-p)/(p*p)
}

Results:


Parameterized by p and r:


Parameterized by m and r:

The posteriors are a bit different for the two parameterizations, because I used "generic" priors without any attempt to transform them to be equivalent.

Thanks to John Davey for this pointer!

5 comments:

  1. This is great. I'm going to try it with my data now. Thanks for the tip since this seems like a common problem I will encounter with my count data.

    Cheers,
    Dan

    ReplyDelete
  2. Thanks, Professor Kruschke. You inspired me to try this -- and JAGS as a vehicle for learning BUGS -- on another problem, available for public inspection at: http://hypergeometric.wordpress.com/2013/12/23/jags-for-finding-highs-and-lows-in-a-week-of-wikipedia-accesses/

    ReplyDelete
  3. G'day, What would a full script look like when comparing two vectors both of which have a negbinom distribution? I have been trying to transpose this post with BEST.R function, but not very well.

    ReplyDelete
  4. Graeme:

    I won't have a chance to write a full script, but the key thing to do is use nested indexing for the group identifier. Row i of the data file contains the count y[i] and the group identifier x[i]. Then the model statement uses

    for ( i in 1:Nrows ) {
    y[i] ~ dnegbin( p[x[i]] , r[x[i]] )
    }
    for ( grpIdx in 1:2 ) {
    p[grpIdx] <- r[grpIdx]/(r[grpIdx]+m[grpIdx])
    m[grpIdx] ~ dgamma(0.01,0.01)
    # etc. ...
    }

    ReplyDelete
  5. This seems to be working to here. Anything obvioulsy wrong?

    y[i] ~ dnegbin( p[x[i]] , r[x[i]] )
    }
    for ( grpIdx in 1:2) {
    p[grpIdx] <- r[grpIdx]/(r[grpIdx]+m[grpIdx])
    m[grpIdx] ~ dgamma(0.01, 0.01)
    r[grpIdx] ~ dgamma(0.01, 0.01)
    v[grpIdx] <- r[grpIdx]*(1-p[grpIdx])/(p[grpIdx]*p[grpIdx])
    }
    }
    "

    snail<-read.table('alive14-15.txt',header=T)
    attach(snail)

    # Load the data:
    y = c( snail$alive14 , snail$alive15) # combine data into one vector
    x = c( rep(1,length(alive14)) , rep(2,length(alive15)) ) # create group membership code
    Ntotal = length(y)
    # Specify the data in a list, for later shipment to JAGS:
    dataList <- list(
    x = x,
    y = y ,
    Ntotal = Ntotal
    )
    length(y) #check
    length(x)

    paramaters<-c("m","r","p","v") #collect these
    jagsModel <- jags.model(textConnection(modelString), data=dataList, n.chains=3, n.adapt=1e3)

    codaSamples <- coda.samples(jagsModel, variable.names=parameters, n.iter=3000, thin=1)
    m <- as.matrix(codaSamples)

    thanks Graeme

    ReplyDelete