Thursday, August 8, 2013

How much of a Bayesian posterior distribution falls inside a region of practical equivalence (ROPE)

The posterior distribution of a parameter shows explicitly the relative credibility of the parameter values, given the data. But sometimes people want to make a yes/no decision about whether a particular parameter value is credible, such as the "null" values of 0.0 difference or 0.50 chance probability. For making decisions about null values, I advocate using a region of practical equivalence (ROPE) along with a posterior highest density interval (HDI). The null value is declared to be rejected if the (95%, say) HDI falls completely outside the ROPE, and the null value is declared to be accepted (for practical purposes) if the 95% HDI falls completely inside the ROPE. This decision rule accepts the null value only when the posterior estimate is precise enough to fall within a ROPE. The decision rule rejects the null only when the entire HDI falls beyond the buffer provided by the ROPE, which protects against hyper-inflated false alarms in sequential testing. And the decision rule is intuitive: You can graph the posterior, its HDI, its ROPE, and literally see what it all means in terms of the meaningful parameter being estimated. (For more details about the decision rule, its predecessors in the literature, and alternative approaches, see the article on this linked web page.)
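
Here is a minimal sketch of that decision rule in R (the helper name ropeDecision is just for illustration; it assumes the HDIofMCMC() function from the utility programs that are sourced in the script at the end of this post):

# Minimal illustrative sketch of the HDI+ROPE decision rule (not part of the
# utility programs). ROPE is c(lowLimit,highLimit); HDIofMCMC() returns the
# c(low,high) limits of the HDI of the MCMC sample.
ropeDecision = function( mcmcChain , ROPE , credMass=0.95 ) {
  HDIlim = HDIofMCMC( mcmcChain , credMass=credMass )
  if ( HDIlim[2] < ROPE[1] | HDIlim[1] > ROPE[2] ) {
    "reject null"                          # HDI falls completely outside the ROPE
  } else if ( HDIlim[1] >= ROPE[1] & HDIlim[2] <= ROPE[2] ) {
    "accept null for practical purposes"   # HDI falls completely inside the ROPE
  } else {
    "withhold decision"                    # HDI and ROPE partially overlap
  }
}
# Example: ropeDecision( rnorm(50000,0.03,0.023) , ROPE=c(-0.1,0.1) )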

But how do we set the limits of the ROPE? How big is "practically equivalent to the null value"? There is typically no uniquely correct answer to this question, because it depends on the particular domain of application and the current practice in that domain. But the rule does not need a uniquely correct ROPE; it merely needs a reasonable ROPE that is justifiable to the audience of the analysis. As a field matures, the limits of practical equivalence may change, and therefore what may be most useful to an audience is a description of how much of the posterior falls inside or outside the ROPE as a function of the width of the ROPE. The purpose of this blog post is to provide some examples and an R program for doing that.

Consider a situation in which we want to estimate an effect size parameter, which I'll call δ. Suppose we collect a lot of data so that we have a very precise estimate, and the posterior distribution shows that the effect size is very nearly zero, as plotted in the left panel below:
We see that zero is among the 95% most credible values of the effect size, but we can ask whether we should decide to "accept" the value zero (for practical purposes). If we establish a ROPE around zero, from -0.1 to +0.1 as shown above, then we see that the 95% HDI falls entirely within the ROPE and we decide to accept zero. What if we think a different ROPE is appropriate, either now or in the future? Simply display how much of the posterior distribution falls inside the ROPE, as a function of the ROPE size, as shown in the right panel above. The curve plots how much of the posterior distribution falls inside a ROPE centered at the null value, as a function of the radius (i.e., half width) of the ROPE. As a landmark, the plot shows dashed lines at the 95% HDI limit farthest from the null value. From the plot, readers can decide for themselves.
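
For any single ROPE radius, the plotted quantity is simply the proportion of the MCMC posterior sample that lies within that radius of the null value. A minimal sketch (the full plotting function appears in the script at the end of this post; the name propInROPE is just for illustration):

# Proportion of MCMC posterior samples within radius r of the comparison value:
propInROPE = function( mcmcChain , compVal , r ) {
  mean( abs( mcmcChain - compVal ) < r )
}
# Sweeping r across a grid of radii traces out the curve in the right panel.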

Here is another example. Suppose we are spinning a coin (or doing something more interesting that has dichotomous outcomes) and we want to estimate its underlying probability of heads, denoted θ. Suppose we collect a lot of data so that we have a precise estimate, as shown in the left panel below.
We see that the chance value of 0.50 is among the 95% most credible values of the underlying probability of heads, but we can ask whether we should decide to "accept" the value 0.50. If we establish a ROPE around 0.50, from 0.49 to 0.51 as shown above, we can see that the 95% HDI falls entirely within the ROPE and we would decide to accept 0.50. But we can provide more information by displaying the amount of the posterior distribution inside a ROPE centered on the null value as a function of the width of the ROPE. The right panel, above, allows the readers to decide for themselves.

The plots can be used for cases of rejecting the null, too. Suppose we spin a coin and the estimate shows a value apparently not at chance, as shown in the left panel below.
Should we reject the null value? The 95% HDI falls entirely outside the ROPE from 0.49 to 0.51, which means that the 95% most credible values are all not practically equivalent to 0.5. But what if we used a different ROPE? The posterior distribution, along with the explicit marking of the 95% HDI limits, is all we really need to see what the biggest ROPE could be and still let us reject the null. If we want more detailed information, the panel on the right, above, reveals the answer --- although we need to mentally subtract from 1.0 to get the posterior area not in the ROPE (on either side).
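
The mental subtraction can, of course, also be done in code; for example, re-using the illustrative propInROPE() sketch from above:

# Posterior mass NOT inside a ROPE of radius r (on either side of it):
propOutsideROPE = function( mcmcChain , compVal , r ) {
  1 - propInROPE( mcmcChain , compVal , r )
}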

Here is the R script I used for creating the graphs above. Notice that it defines a function for plotting the area in the ROPE.

#------------------------------------------------------------------------------

# From http://www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/Programs/ get
# openGraphSaveGraph.R, plotPost.R, and HDIofMCMC.R. Put them in the working
# directory of this script, or specify the path in the next lines' source()
# commands:
source("./openGraphSaveGraph.R")
source("./plotPost.R")
source("./HDIofMCMC.R")

# Define function for computing area in ROPE and plotting it:
plotAreaInROPE = function( mcmcChain , maxROPEradius , compVal=0.0 ,
                           HDImass=0.95 , ... ) {
  ropeRadVec = seq( 0 , maxROPEradius , length=201 ) # arbitrary comb
  areaInRope = rep( NA , length(ropeRadVec) )
  for ( rIdx in 1:length(ropeRadVec) ) {
    areaInRope[rIdx] = ( sum( mcmcChain > (compVal-ropeRadVec[rIdx])
                            & mcmcChain < (compVal+ropeRadVec[rIdx]) )
                         / length(mcmcChain) )
  }
  plot( ropeRadVec , areaInRope ,
        xlab=bquote("Radius of ROPE around "*.(compVal)) ,
        ylab="Posterior in ROPE" ,
        type="l" , lwd=4 , col="darkred" , cex.lab=1.5 , ... )
  # From http://www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/Programs/
  # get HDIofMCMC.R. Put it in the working directory of this script, or specify
  # the path in the next line's source() command:
  # source("./HDIofMCMC.R")
  HDIlim = HDIofMCMC( mcmcChain , credMass=HDImass )
  farHDIlim = HDIlim[which.max(abs(HDIlim-compVal))]
  ropeRadHDI = abs(compVal-farHDIlim)
  areaInFarHDIlim = ( sum( mcmcChain > (compVal-ropeRadHDI)
                         & mcmcChain < (compVal+ropeRadHDI) )
                      / length(mcmcChain) )
  lines( c(ropeRadHDI,ropeRadHDI) , c(-0.5,areaInFarHDIlim) ,
         lty="dashed" , col="darkred" )
  text( ropeRadHDI , 0 ,
        bquote( atop( .(100*HDImass)*"% HDI limit" ,
                     "farthest from "*.(compVal) ) ) , adj=c(0.5,0) )
  lines( c(-0.5,ropeRadHDI) ,c(areaInFarHDIlim,areaInFarHDIlim) ,
         lty="dashed" , col="darkred" )
  text( 0 , areaInFarHDIlim , bquote(.(signif(areaInFarHDIlim,3))) ,
        adj=c(0,1.1) )
  return( list( ROPEradius=ropeRadVec , areaInROPE=areaInRope ) )
}

#------------------------------------------------------------------------------

# Generate a fictitious MCMC posterior:
m = 0.03
s = (0.1-m)/3
paramMCMCsample = rnorm(50000,m,s)

# Specify the desired comparison value, ROPE radius, and HDI mass:
compVal = 0.0
ropeRad = 0.1
HDImass = .95

# Open graphics window and specify layout:
openGraph(width=7.5,height=3.5)
par(mar=c(3.5,3.5,2,1),mgp=c(2,0.7,0))
layout( matrix( c(1,2) , nrow=1 , byrow=FALSE ) )

# Plot the posterior with HDI and ROPE:
postInfo = plotPost( paramMCMCsample , compVal=compVal ,
                     ROPE=compVal+c(-ropeRad,ropeRad) , showMode=TRUE ,
                     credMass=HDImass , xlab=expression(delta*" (parameter)") )

# Plot the area in ROPE:
ropeInfo = plotAreaInROPE( mcmcChain=paramMCMCsample , maxROPEradius=0.15 ,
                           compVal=compVal , HDImass=HDImass )
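
# The plots above are for the effect-size (delta) example. For the coin-spinning
# (theta) example, only the fictitious posterior sample and the comparison value
# change. A hedged sketch, with the beta parameters chosen arbitrarily just to
# concentrate the posterior near 0.50:
thetaMCMCsample = rbeta( 50000 , 20000 , 20000 )
openGraph(width=7.5,height=3.5)
par(mar=c(3.5,3.5,2,1),mgp=c(2,0.7,0))
layout( matrix( c(1,2) , nrow=1 , byrow=FALSE ) )
postInfo = plotPost( thetaMCMCsample , compVal=0.5 , ROPE=c(0.49,0.51) ,
                     showMode=TRUE , credMass=HDImass ,
                     xlab=expression(theta*" (prob. of heads)") )
ropeInfo = plotAreaInROPE( mcmcChain=thetaMCMCsample , maxROPEradius=0.05 ,
                           compVal=0.5 , HDImass=HDImass )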


#------------------------------------------------------------------------------

18 comments:

  1. Interesting to read more about ROPE! It does feel a little bit like a Bayesian twist on hypothesis testing. A large ROPE results in a higher chance of accepting the null while a tight ROPE (tightrope?) gives a higher chance of rejecting the null. It would be interesting to see a paper where the ROPE decision rule is compared to other possible decision rules (Neyman–Pearson, Bayes factor, etc.).

    By the way, since you call it a ROPE, would it be ok to introduce some extra rope-related nomenclature? For example, when the ROPE null can be neither rejected nor accepted this could be a "tie", a reasonably tight ROPE could be a "tightrope", an unreasonably tight ROPE could be a "knot", etc.

  2. I like your string of thought, and it might lead to a long thread of loosely tied responses. Wherever these PUNishable comments take us, let's agree in advance not to noose our sense of humor!

  3. Very nice post and useful function. A version of the BEST package with it is now on GitHub, and I'll submit it to CRAN after I've run some more checks.

    BTW, is it clear that the landmark "HDI limit farthest" is the radius at which the HDI lies entirely within the ROPE? I.e., your preferred decision rule for accepting the hypothesis.

  4. Mike:

    That's a good point you make about the meaning of the farthest HDI limit.

    And thank you again for creating the BEST package!

    --John

  5. Is there a way to frame this criterion as the solution to the minimization of the posterior expected loss for some loss function? If so, what would be the corresponding loss function?

    I started a question here: http://stats.stackexchange.com/questions/103067/rope-vs-hdi-loss-function and it would be great to get your input or someone else's on it.

    Thanks for the great post!

  6. 2 questions on ROPE:

    1. Which do you think is more appropriate as a decision criterion: a) require the HDI interval to be within the ROPE, or b) set a lower limit on the posterior probability contained within the ROPE?

    2. What about a confidence level? For instance consider the philosophy behind a Bayesian fixed-in-advance beta-content tolerance interval (e.g., see Wolfinger) in which both a content (proportion) and a confidence level are specified. Specifying a confidence level provides a level of assurance that the result is repeatable. I think this question may apply mostly to a posterior predictive situation.

    Thank you for your thoughts!!

    Dave LeBlond (david.leblond@sbcglobal.net)

  7. > 1. Which do you think is more appropriate as a decision criterion: a) require the HDI interval to be within the ROPE, or b) set a lower limit on the posterior probability contained within the ROPE?

    I take the posterior density seriously, which is why it makes sense to me to consider the relation of the HDI to the ROPE. The decision to "accept" the null value is based on the 95% most probable values all being practically equivalent to the null value. The decision to "reject" the null value is based on the 95% most probable values all not being practically equivalent to the null value.

    Suppose instead that we accept a ROPEd value if (say) 95% of the posterior mass falls within the ROPE, and we reject a ROPEd value if (say) 95% of the posterior mass falls outside the ROPE. Then we would get weird conclusions when distributions are strongly skewed. Consider Figure 12.2, p.342, of DBDA2E, which shows a skewed distribution with its 95% HDI and its 95% equal-tailed interval (ETI). Suppose the ROPE boundaries are at the ETI boundaries. Then we would accept the ROPEd value despite the fact that there are parameter values with high posterior probability that are not practically equivalent to the ROPEd value. Or, suppose that the right side of the ROPE is at the left edge of the ETI in Figure 12.2. Then we would reject the ROPEd value despite the fact that there are parameter values with high posterior probability that are practically equivalent to the ROPEd value.
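
    A minimal numerical sketch of how different the two interval types can be for a skewed distribution (a gamma sample chosen arbitrarily for illustration, not the posterior of Figure 12.2; it assumes HDIofMCMC() from the utility programs used in the post):

    skewedChain = rgamma( 100000 , shape=1.5 , rate=5 )  # arbitrary right-skewed "posterior"
    ETI = quantile( skewedChain , probs=c(0.025,0.975) ) # 95% equal-tailed interval
    HDI = HDIofMCMC( skewedChain , credMass=0.95 )       # 95% highest density interval
    print( rbind( ETI=as.numeric(ETI) , HDI=HDI ) )
    # The ETI excludes high-density values near zero and includes low-density
    # values in the long right tail, which is what produces the conflicts
    # described above when ROPE limits happen to sit at the ETI limits.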

    > 2. What about a confidence level? For instance consider the philosophy behind a Bayesian fixed-in-advance beta-content tolerance interval (e.g., see Wolfinger) in which both a content (proportion) and a confidence level are specified. Specifying a confidence level provides a level of assurance that the result is repeatable. I think this question may apply mostly to a posterior predictive situation.

    I don't have much to say about this. I think, though, that you're leaning toward a frequentist criterion applied to a Bayesian posterior distribution. There's nothing inherently wrong with that, but it's a different animal than what's being pursued with the HDI and ROPE. With a frequentist criterion you need to make assumptions about your sampling intention (which you can do), but then the criterion is linked to your specific sampling assumptions.

  8. Somewhat related to the loss function question, why use the HDI and not just compute P(θ outside ROPE)? If it is "high" it seems like reason to reject the ROPE.

  9. Dear Anonymous March 18:

    I think that question is already answered (perhaps implicitly) in my last comment. I like a decision rule that treats the densities seriously. To illustrate with an extreme contrived example, suppose that 98% of the distribution is outside the ROPE, but it is so spread out that its density is actually lower than the density of the 2% of the distribution inside the ROPE. Then the most credible (highest density) values are inside the ROPE, even if most of the distribution is outside the ROPE.

    That's why I like a decision rule that actually uses the highest-density values. This isn't a uniquely correct decision rule, it's just one that makes a lot of sense if you treat densities seriously.
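
    A contrived numerical sketch of that situation (the mixture proportions and widths are arbitrary, chosen only to make the point), with a ROPE of radius 0.1 around 0:

    N = 100000
    contrivedChain = c( rnorm( 0.02*N , mean=0 , sd=0.01 ) ,   # 2%: narrow spike inside the ROPE
                        runif( 0.98*N , min=-100 , max=100 ) ) # 98%: extremely diffuse
    mean( abs(contrivedChain) > 0.1 )             # about 0.98 of the mass is outside the ROPE
    mean( abs(contrivedChain - 0 ) < 0.05 ) / 0.1 # approximate density near 0 (inside the ROPE)
    mean( abs(contrivedChain - 50) < 0.05 ) / 0.1 # approximate density near 50 (outside the ROPE)
    # The density inside the ROPE is far higher than the density anywhere outside
    # it, even though nearly all of the mass is outside the ROPE.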

  10. Do you think that it would be informative if you defined the ROPE as the 95% HDI of the same data randomized? You could include it as another possibly interesting waypoint in your plot of the proportion of the posterior inside the ROPE as a function of ROPE width.

    Replies
    1. The ROPE should be defined by considering what it means to be a negligible effect size, what it means for an effect to be "practically equivalent" to the landmark value. That's separate from whatever data you happen to have in the current study.

  11. Dear John,

    The decision rules that you mentioned are: accept the null hypothesis if the HDI is a subset of the ROPE, and reject it if HDI and ROPE are non-overlapping. But what happens if they only partially overlap? Do you think it would be meaningful to condense these three cases into two cases, like in frequentist testing: (i) the null hypothesis can be rejected, and (ii) the null hypothesis cannot be rejected (intersection of HDI and ROPE is nonempty)? In words, the null hypothesis could then *not* be rejected if the 95% most credible values do *not* fall outside of the ROPE. My final goal is to compare several means and derive a letter coding, similar in spirit to Fisher's LSD, to indicate which pairs are not "significantly different".

    Thank you,
    Dominik

    Replies
    1. A key feature of the ROPE+HDI approach is to be able to "accept" a parameter value (for practical purposes). In spirit it is much like frequentist equivalence testing.

      See this blog post about assessing null values:
      http://doingbayesiandataanalysis.blogspot.com/2016/12/bayesian-assessment-of-null-values.html

      See this blog post about equivalence testing:
      http://doingbayesiandataanalysis.blogspot.com/2017/02/equivalence-testing-two-one-sided-test.html

  12. I am a new fan here, using Bayesian methods. I am using a 3-level hierarchical log-linear growth model (non-centered slope & intercept) to estimate growth rates in plants growing in different pesticide concentrations. However, with the log-linear model, the differences in growth rate (slope) are extremely small.


    I have the control compared with the other 5 concentrations of pesticide. The mean differences of growth rates are described below:
    Comparison of concentrations         Mode of mean difference (growth rate)   95% HDI limits
    i)   control vs 1 mg/L pesticide       0.00576                                -0.00433 to 0.02
    ii)  control vs 1.8 mg/L pesticide    -0.0153                                 -0.0256 to 0.0042
    iii) control vs 3.2 mg/L pesticide    -0.0161                                 -0.0276 to 0.00505
    iv)  control vs 5.8 mg/L pesticide     0.0483                                  0.0367 to 0.0582
    v)   control vs 10 mg/L pesticide      0.0875                                  0.0777 to 0.098

    If using the suggested standard ROPE around zero, from -0.1 to +0.1 as mentioned above, there would be no effect of pesticide at any concentration (which is not the case). So it seems that with the log-linear model the ROPE limits need to be adjusted. I am looking at 0.02 instead of 0.1. Does 0.02 sound awkward? Is there another forum where I could post more information about the question? I have plotted the same plot as shown above; I wish I could show it here. This is only the result from one lab; there are data from another 30 labs which have the same problem. I'm hoping you could shed some light on the problem mentioned.

    Replies
    1. This article has more details about how to construct a ROPE:
      Kruschke, J. K. (2018). Rejecting or Accepting Parameter Values in Bayesian Estimation, Advances in Methods and Practices in Psychological Science, 1, 270-280.
      The article is at https://journals.sagepub.com/doi/pdf/10.1177/2515245918771304
      and important supplemental materials are at https://osf.io/fchdr/

  13. Dear John,

    I'm quite new to Bayesian statistics but really appreciate the clarity of your explanations. In Section 12.1.1 (p. 339) of your book (DBDA2E), you write:

    "Thus, if the MCMC HDI limit is very near the ROPE limit, be cautious in your interpretation because the HDI limit has instability due to MCMC randomness. Analytically derived HDI limits do not suffer this problem, of course."

    Could you please give some advice on just how near is "very near" (e.g. a certain proportion of the HDI length?), and what we need to consider when interpreting such cases?

    Many thanks :)
    Dylan

    Replies
    1. Hi Dylan. The answer to your question is discussed extensively in Ch. 7 (Section 7.5) of the book (DBDA, 2nd Edition). Basic recommendation: For reasonably stable HDI limits, make the effective sample size (ESS), of whatever it is you're looking at, at least 10,000.
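
      For checking the ESS of a parameter's chain, here is a minimal sketch assuming the coda package (not part of the book's utility programs; mcmcChain is a generic name standing in for the merged chain of the parameter you're looking at):

      library(coda)                          # assumes the coda package is installed
      effectiveSize( as.mcmc( mcmcChain ) )  # aim for an ESS of at least 10,000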

    2. Thanks so much for the quick reply, John. I just reread that section, and it seems to suggest that if my chains have converged and are well mixed, the Gelman–Rubin statistics are below 1.1, and ESS > 10,000, then there shouldn't be any great concern with interpreting HDI limits near the ROPE limits (although I'm still not sure how to interpret the "very near" of p. 339...?)
