## Wednesday, October 9, 2013

### Diagrams for hierarchical models - we need your opinion

When trying to understand a hierarchical model, I find it helpful to make a diagram of the dependencies between variables. But I have found the traditional directed acyclic graphs (DAGs) to be incomplete at best and downright confusing at worst. Therefore I created differently styled diagrams for Doing Bayesian Data Analysis (DBDA). I have found them to be very useful for explaining models, inventing models, and programming models. But my idiosyncratic impression might be only that, and I would like your insights about the pros and cons of the two styles of diagrams.

To make the contrast concrete, let's consider the classic "rats" example from BUGS. I will start with a textual explanation of the model (using modified phrasing and variable names), and then provide two diagrams of the model, one in DAG style and one in DBDA style. Here is the text:
The data come from 30 young rats whose weights were measured weekly starting at birth for five weeks, with the goal being to assess how their weights changed as a function of age. The variable $y_{j|i}$ denotes the weight of the $i$th rat measured at days since birth $x_{j|i}$. The weights are assumed to be distributed normally around the predicted value $\omega_{j|i}$ (omega):

$$y_{j|i} \sim \mathrm{normal}(\omega_{j|i}, \lambda)$$

The parameter $\lambda$ (lambda) represents the precision (i.e., 1/variance) of the normal distribution. The predicted weight of the $i$th rat is modeled as a linear function of days since birth:

$$\omega_{j|i} = \phi_i + \xi_i \, x_{j|i}$$

The individual intercepts $\phi_i$ (phi) and slopes $\xi_i$ (xi) are assumed to come from group-level normal distributions,

$$\phi_i \sim \mathrm{normal}(\kappa, \delta) \quad\text{and}\quad \xi_i \sim \mathrm{normal}(\zeta, \gamma)$$

where $\kappa$ (kappa) and $\zeta$ (zeta) are the means of the group-level distributions. The priors on the group-level means are set as vague normal distributions,

$$\kappa \sim \mathrm{normal}(M, H) \quad\text{and}\quad \zeta \sim \mathrm{normal}(M, H)$$

where the mean $M$ is approximately central on the scale of the data and the precision $H$ is very small. The precision parameters, $\lambda$ (lambda), $\delta$ (delta), and $\gamma$ (gamma), are given vague gamma priors,

$$\lambda \sim \mathrm{gamma}(K, I) \quad\text{and}\quad \delta \sim \mathrm{gamma}(K, I) \quad\text{and}\quad \gamma \sim \mathrm{gamma}(K, I)$$

where the shape $K$ and rate $I$ parameters are set to very small values.

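One way to read the specification above is as a recipe for generating data, working from the group level down to the individual weights. The following is only a rough sketch in Python (standard library only). The numerical values for the group-level means and precisions, the measurement days, and all function and argument names are illustrative assumptions; in particular, the hyperparameters are fixed by hand rather than drawn from the vague priors.

```python
import math
import random


def sd_from_precision(precision):
    """Convert a precision (1/variance) to a standard deviation."""
    return 1.0 / math.sqrt(precision)


def simulate_rats(n_rats=30, days=(0, 7, 14, 21, 28),
                  kappa=100.0, delta=0.01,  # assumed intercept mean and precision
                  zeta=6.0, gamma=4.0,      # assumed slope mean and precision
                  lam=0.03,                 # assumed within-rat precision
                  seed=1):
    """Generate (rat index, day, weight) triples from the hierarchical model."""
    rng = random.Random(seed)
    data = []
    for i in range(n_rats):
        # "~" steps: each rat's intercept and slope come from group-level normals.
        phi_i = rng.gauss(kappa, sd_from_precision(delta))
        xi_i = rng.gauss(zeta, sd_from_precision(gamma))
        for x in days:
            # "=" step: the predicted weight is a deterministic function.
            omega = phi_i + xi_i * x
            # "~" step: the observed weight scatters around the prediction.
            y = rng.gauss(omega, sd_from_precision(lam))
            data.append((i, x, y))
    return data


data = simulate_rats()
print(len(data))  # 30 rats x 5 days = 150 rows
```

Python's `random.gauss` takes a standard deviation, so each precision is converted before use; BUGS and JAGS take the precision directly.
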
Below are two different diagrams of the model. The first is a DAG. Because there are a variety of style conventions in the literature for DAGs, I used my own hybrid explained in the caption. But I think it is a good and useful hybrid, one that I would want to use if I used DAGs. Take a look at the DAG:
[Figure: DAG-style diagram of the model.] Squares denote constants, circles denote variables. Solid arrows denote stochastic dependency, heavy dotted arrows denote deterministic dependency. Rounded-corner rectangles denote "plates" for indices.
For DAG users, does the style above capture what you think is needed in a DAG? Does the DAG above help you understand the model? How? Does the DAG confuse you? Why? Would the DAG help you program the model in BUGS / JAGS / Stan? Or not?

Below is a DBDA-style diagram for the model:
[Figure: DBDA-style diagram of the model.] Arrows marked by "~" denote stochastic dependency, arrows marked by "=" denote deterministic dependency. Ellipses on arrows denote indices over which the dependency applies.
Does the DBDA-style diagram above help you understand the model? How? Does the DBDA-style confuse you? Why? Would the DBDA-style help you program the model in BUGS / JAGS / Stan? Or not?

1. The DAG requires the accompanying text, since the specific distributions aren't denoted in the figure.

On the other hand, you could, in principle, code the DBDA model in JAGS (or whatever) based (almost) entirely on the figure (you'd need a few things not in the figure, of course, e.g., the ranges over which i and j vary).

I'm not sure what the leading ellipses with j|i or i mean in the DBDA figure. Is it just that the indicated relationship is instantiated for all j|i or all i or whatever? I guess it has to be, but it's not entirely clear right away.

I think I prefer line-style (and/or line-width) differences for indicating stochastic vs. deterministic dependencies, so maybe incorporating those into the DBDA style would be nice.

Of course, with either, it would still be better to have accompanying text, and there's some important information that doesn't appear in either figure (e.g., the fact that H, K, and I all have to be positive).

2. John, I always found your own style very intuitive. I think it's an excellent didactic tool for one, but not just that. It's just better in helping people think about the data. I wasn't familiar with the DAGs, and they are not bad of course, but you should stick to your diagrams!

3. Some comments from cross-posting elsewhere:

[Commenter A] I found students followed the Kruschke diagrams, but had trouble putting them into code, which I believe (though have no evidence for) has to do with the lack of plates. I also wonder if replacing (slowly, over the course of the book) the pictogram+name format for identifying the distribution with a shorthand like N(M,H) might render the diagrams less intimidatingly large. Just my \$.00

[Commenter B] I think the Kruschke diagrams make it easier for the student and the novice to visualize the assumptions being made. I do think the plates would be helpful because it is easy to overlook the "..." or fail to see them on the screen when used during class.

[Commenter C] Seconding [Commenter A]'s suggestion to transition from pictograms to the shorthand. Also, if there's a way of including the plates without cluttering the figures too much, I'd suggest keeping them. I find it's really helpful for things like crossed-random effect models as a way of cleanly showing the difference between "things that vary across participants" and "things that vary across items".

4. Interesting idea. One quick comment.

I don't think it's useful to compare the two because DAGs are a tool for causal inference (and non-parametric which is why they lack some characteristics you're looking for) while these DBDAs appear to be tools for statistical inference.

So they're actually complementary! I'd use the DAG first to decide which type of analysis I want to do. Once I've decided whether I'm going to do regression or MSM or whatever, then the DBDAs would be useful to depict the statistical model.

5. The previous comment raises a point I'd love your comments on: What is a DAG good for? And does being good at that mean it's good for typical data analysis models?

My partial answer: A DAG is useful for indicating dependencies among numerous variables, especially when the variables are meaningfully identified in advance (e.g., ground is/isn't wet, lawn sprinkler is/isn't on, rain is/isn't falling). In traditional applications, the variable values are nominal, and therefore the dependencies are often described by contingency tables, which have numerous parameters and no pretty graphical representation. Sometimes the dependencies are formulas like Noisy-OR with only a few parameters, which might be usefully depicted, but that isn't usually done.

The idea of DAGs was also important in the conceptual and algorithmic development of WinBUGS. Indeed, DoodleBUGS uses DAGs --- see the link to the rats example in the post.

Despite those good uses and origins, I find that they don't transfer very well to typical data analysis situations. (Not even for "deciding which type of analysis ... to do." DAGs might be useful for deciding which extrinsically defined variables do or don't have dependencies, but in data analysis, the variables don't exist outside the models being considered.)

Keep the discussion going...

6. [Commenter D] I definitely like the distributions. I never found the plates useful, but if they could be added without clutter, sure.

[Commenter E] I agree with the others: good to group multiple parameters of hyper prior distributions. Without plate notation, it's hard to figure out which units get unique values of what. (I find it helpful to also label the plate with the index). It would be great to combine these two ideas, but what happens when hyper prior parameters sit in different plates? Should they be grouped by plate, or by the distribution they describe? It would be great to have it both ways, but that seems impossible (unless you use "weirdly contorted tray" notation, in which the plates take on arbitrary shapes to include all their elements)

7. In general I find the DBDA diagrams more intuitive, but that's probably because I learned this stuff initially from your book (thanks by the way!).

One thing that is not captured in either example is to show what parts of the model are observed, inferred or set to constant. In the DAG you show, both constants set by the modeller and the data (y and x) are shown as squares. Other models I have seen denote, for example, data as a bold square. This can be helpful especially when some observed quantity can contribute at a different level of the model.

I also (second? third?) the suggestions that transitioning from the full graphical DBDA diagram into a shorthand notation would be useful over the course of the book.

8. Hi John,

this is really great! How do you make these plots? In R?

9. The plots are presently made by hand in generic drawing software (such as LibreOffice Draw). In principle, it should be possible to start with a JAGS/BUGS model specification and automatically produce a diagram, but that would require many layers of savvy, including parsing the model specification, laying out network structures on planes, and details of a graphics language. It could be a great project for someone.
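As a very small step toward that project, the dependency structure can at least be written down as data and handed to a layout tool. The sketch below assumes Graphviz would do the layout and emits DOT text from a hand-written edge list. It does not parse a JAGS/BUGS specification, it does not draw the distribution icons, and the names `to_dot` and `rats_edges` are hypothetical.

```python
def to_dot(edges, name="model"):
    """Render (parent, child, kind) triples as Graphviz DOT text.

    kind is "~" for a stochastic dependency or "=" for a deterministic one.
    """
    styles = {"~": "solid", "=": "dotted"}
    lines = [f"digraph {name} {{"]
    for parent, child, kind in edges:
        lines.append(
            f'  "{parent}" -> "{child}" [style={styles[kind]}, label="{kind}"];'
        )
    lines.append("}")
    return "\n".join(lines)


# Hand-written edges for the rats model (hyperprior constants omitted).
rats_edges = [
    ("kappa", "phi", "~"), ("delta", "phi", "~"),
    ("zeta", "xi", "~"), ("gamma", "xi", "~"),
    ("phi", "omega", "="), ("xi", "omega", "="), ("x", "omega", "="),
    ("omega", "y", "~"), ("lambda", "y", "~"),
]
print(to_dot(rats_edges))
```

The printed text can be saved to a file and rendered with the Graphviz `dot` command. The hard parts named above, parsing the model and producing a DBDA-style layout, remain open.
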

10. [Commenter F] I'm interested in modelling, i.e., "the story of how the data came to be" [cite], and the DBDA style is brilliant for that. DAGs don't give anything like enough info (in your example, no indication of how phi, xi and x relate to omega, other than that it is deterministic) and I've never seen the point of DoodleBUGS. A text description interspersed with things like y ~ Binomial(N, p) is essential if you are going to understand either. I don't agree with the suggestions that DBDA-style graphics should be "phased out" in favour of shorthand versions. Perhaps the only justification for that would be that they are difficult/time consuming to make, but good communicators make the effort so that the reader has an easy time! Certainly a tutorial on how to make these plots would be welcome.

A couple of things which might make DBDA diagrams easier to read:

* the ~ and = signs next to the arrows are easily overlooked: different arrow styles would be better, ideally a wavy arrow for stochastic relationships (how to do that?) and a double arrow for deterministic.

* in the example diagram, it looks as if the arrow going down to omega is coming from xi; it would be clearer if there were a brace under the whole expression to "funnel" it down to omega.

* the ...i and ...j|i bits are also easily overlooked. Not sure how to improve that; plates will often work but could become a mess, e.g., phi and xi in the expression in your DBDA example would be on the "i" plate, but x needs to be on a "j|i" plate as well. Perhaps multiple arrows would do the trick.

DBDA diagrams are a bit intimidating to start with (don't think "confusing" is the right word). In a workshop, I build them up step by step on the whiteboard. Given a full diagram, I find I need to work through it to "translate into text" in my mind, but once that's done, it's an excellent summary. And does help when it comes to writing the JAGS code.

11. In my research practice I use simple DAGs: just unshaded and shaded (for observed vars) circles, with unidirectional and sometimes bidirectional arrows. This allows me to contemplate the structure of the model and the independence relations in particular. Once I've decided on the structure I write equations and then implement them. Often I go back and forth between the equations, implementation and the structure.

I like to keep the structure and the functional specification of the model separate. It allows me to focus on different kinds of questions. With the DAG I ask whether the independence relations can be justified and whether the unobserved variables are identifiable. Once I've chosen the structure, I then consider the distributions and the functional relations between them. Your diagrams mix these aspects together. Although I agree that they may be useful as a didactic device.

12. Your diagrams, all the way! (or as long as possible at least...)

What I'm surprised by is why nobody came up with your type of diagram before. Some comments:

* [Commenter A] I believe it is better to keep the distrograms (distribution pictograms) as long as possible, even if the model is large. Without the distrograms I think a standard model definition would be as clear.

* I like the notation with ~ and = decorating the arrows as opposed to having different arrow styles, as there is a clear correspondence here with the regular way of writing models (where ~ and = are used).

* Not really a part of the layout of the diagrams, but do you have a convention for what the different types of symbols stand for? That is, Greek for parameters, Latin lower case for variables and upper case for constants? If so, this would mean that the DBDA-style diagrams need not show which symbols stand for these three categories in any other way (relating to Tom Wallis's comment).

* Have you been thinking of how multivariate distributions could be depicted, for example a multivariate normal distribution?

13. I've just recently gotten into Bayesian analysis.

1) Graphical nodes: I much prefer DBDA style "graphic" nodes that convey distribution information over the plain DAG nodes.

2) Edge "end points". IMO the most useful feature that both are missing is that an edge leading into a distribution should indicate which *parameter* this edge/relationship is sending data to. (Instead you have the label of the argument that sent data along this edge...why?) Without it, it's a little bit like specifying arguments for a function without ever saying what the parameters are.

3) Node & Parameter label confounding. Going with the above point, the originating node should have the label for that node, rather than placing it on the end of the outgoing edge.

4) Iteration. At this early stage of interpreting models, I find the plates to be more intuitive than the ...i,j notation. I have to do a lot of abstraction for the latter. Maybe some kind of multiple/double arrows could help? Or a legend off to the side, showing something about i and j?

5) Stochastic nodes. In the DAG style, coding the lines as stochastic/deterministic made some sense, but in the DBDA diagram, are the ~ markers superfluous? In other words, does any arrow originating from a distribution node mean that you draw a value from that distribution, or can it sometimes mean that you pass the distribution itself as an argument?

6) Possible Additions: I think in some contexts, it might be helpful to annotate nodes or edges with some of the following flags:
- Observed nodes (maybe an eye symbol) for which we are providing data
- Target nodes (maybe a target symbol) for nodes of interest whose chains we will be monitoring
- Censored or truncated edges. Although this could be represented by a separate node, it might be less distracting if this were replaced by a shorthand along the edge itself

14. John, I always use graphs to understand the model better, and I really like yours.
What I do however is not use the ~ sign near the arrow, but I indicate with a double-lined arrow deterministic relationships.

To make an example, something like p<=logit(p)<-mu

The curves on the nodes are sometimes useful, sometimes not, but substituting the plates with indexes makes a big difference.

15. You might want to check out this paper, which Aleks Jakulin pointed me to a couple of years ago:

http://www.mpi-inf.mpg.de/~dietz/dirfactor-notation.pdf

It introduces some similar notions of labeling the distributions, only without the density pictures.

I find the density pictures misleading because the basic shape of a density can change based on its parameters. For example Beta(0.5, 0.5), Beta(1,1), and Beta(2,2) vary dramatically in basic shape. I also think it's going to be tricky to sketch the inverse Wishart distribution!
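The point about changing shape can be checked numerically. The following is a minimal sketch (standard library only; the function name `beta_pdf` is an assumption for illustration) that evaluates the three beta densities near the edges and at the center of the interval:

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)


# Compare the density near the edges (0.1, 0.9) with the center (0.5).
for a, b in [(0.5, 0.5), (1, 1), (2, 2)]:
    row = [round(beta_pdf(x, a, b), 2) for x in (0.1, 0.5, 0.9)]
    print(f"Beta({a}, {b}):", row)
```

Beta(0.5, 0.5) is higher at the edges than in the middle (U-shaped), Beta(1, 1) is flat, and Beta(2, 2) peaks in the middle.
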

16. Thanks for the pointer to the paper by Laura Dietz. Very interesting!

It's true that any single graph will misrepresent some member of the family it comes from. But what I want from an iconic graph is at-a-glance recognition of what type of function it is, along with the aesthetics of something pictorial. Even higher-dimensional distributions should have iconic forms, although I admit there might be diminishing returns. Nevertheless, I think that (iconic) pictures of the distributions can help. After all, why bother plotting any function at all if we have the mathematical formula? There's something about having a picture that makes it easier for (at least many) human beings to understand. But, as you suggest, pictures can also be misleading if viewers don't have the background knowledge to know which details are relevant and which are incidental.

17. Also, a big THANK YOU to all the people who have posted comments here. Even if I didn't respond directly to your comments, I have read them and appreciated them!

18. I agree with most people rooting for DBDA style, but would like to throw in one more suggestion to unclutter them:
Use an "intuitive" parameterization and HIDE the transformations needed just to conform with BUGS/JAGS.
E.g. for a normal distribution use and stick to sigma for dispersion; and explain the necessary reparameterizations elsewhere.

I think the graphical representation should be a mid-level description to convey the ideas behind a model. That's why the "distrograms" are so useful, and why imo the variable transformations are in the way.

19. You may be interested in the software I recently developed that allows you to create similar diagrams - it's called http://EquationMap.com

20. Hi John, significantly late to the original posting date so really just putting a message in a bottle here. I've been diving into Bayesian inference over the last few months as part of a graduate research project and I've come across this site a few times, which appears to be a highly venerable resource.

There is however one element of your notation in the DBDA model that I just cannot seem to work out, as it seems in contravention to the DAG notation I have been working with most intimately, namely the Win/OpenBUGS graphical notation. Although the WinBUGS notation for directed links (edges, arrows, what have you) is further described along the properties of signaling stochastic dependencies vs. functional/deterministic dependencies, such as those in the DBDA model, it seems that you showcase a deterministic link emerging from a deterministic node (i.e. the function in the middle of the model), towards a stochastic node, namely the mean omega. Would this not imply that the deterministic node is deterministically (conditionally) dependent on a stochastic node, against the flow of information updating rather than vice versa? If I was tasked with constructing this figure from scratch given my current understanding, I would have most certainly given deterministic links leading into the deterministic node from the two normal distributions preceding it, to signal downstream deterministic dependencies. I am certainly not a subject matter expert, but my inability to figure the reason for this out has been particularly irksome as if what is presented is the case, then it seems to subvert my general understanding of Bayesian hierarchical models, which would be particularly inopportune. Maybe I’m just overlooking an obvious piece of information or defined notational characteristic that would explain this discrepancy easily, but otherwise I remain perplexed. On the off chance that you see this post, I would really be intrigued to understand the situation and/or potential flaws in my reasoning, as I have been unable to locate a homologous representation in the BUGS literature and none of the other commenters seem to have mentioned this. Thanks for your consideration.
Best,
Chris N

1. The arrows in the DBDA diagrams indicate the direction of generation, as in a generative model of the data.

The arrows can be labeled with "~" or with "=". An arrow labeled with "=" merely means that its output is a deterministic function of its input, but such an arrow does *not* indicate that its input is not stochastic. The difference between "~" and "=" is the same as the difference between solid and dotted arrows in DAGs.
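
A tiny illustration of that last point, as a sketch only (Python, standard library; the numerical values and the name `predicted_weight` are made up for the example): the "=" step always returns the same output for the same inputs, yet its output still varies from draw to draw because its inputs arrive through "~" arrows.

```python
import random


def predicted_weight(phi, xi, x):
    """The "=" arrow: omega is fully determined by its inputs."""
    return phi + xi * x


rng = random.Random(0)
omegas = []
for _ in range(3):
    phi = rng.gauss(100.0, 10.0)  # "~" arrow: intercept drawn at random
    xi = rng.gauss(6.0, 0.5)      # "~" arrow: slope drawn at random
    omega = predicted_weight(phi, xi, x=14)
    # The same inputs always give the same output ...
    assert omega == predicted_weight(phi, xi, x=14)
    omegas.append(omega)

# ... yet omega differs across draws because its inputs are random.
print(omegas)
```
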