Tuesday, November 27, 2012
In a previous post and in a video, I used baseball data to illustrate some ideas about shrinkage and multiple comparisons in hierarchical models. Here I take it a step further, to multi-level hierarchical models, to illustrate shrinkage of estimates within different levels of the model.
Background: Consider the sport of baseball, in which players have opportunities at bat during a season of games, and occasionally hit the ball. In Major League Baseball, players typically hit the ball on about 24% of their opportunities at bat. That ratio, of hits (H) divided by opportunities at bat (AB), is called the "batting average" of each player. Players also must play a position in the field when the other team is at bat. Some fielding positions are specialized and those players are expected to focus on those skills, not necessarily on hitting. In particular, pitchers are not expected to be strong hitters, and catchers might also not be expected to focus so much on hitting. (But I'm no baseball aficionado, so please forgive any inadvertent insults or errors! My thanks to colleagues Ed Hirt and Jim Sherman for tutoring me in some of the basics of baseball rules.)
Our goal is to estimate the underlying probability of getting a hit by each player, based on their hits H, at bats AB, and primary position. The model will put each player's estimated hitting ability (i.e., probability of getting a hit) under a distribution for the primary position, and put those position distributions under an overarching distribution across all positions. Here is a diagram of the hierarchical model. It should be scanned from the bottom up, as described next.
The data are from 948 players in the 2012 regular season of Major League Baseball. Thanks to Matthew Maenner for posting a comment explaining how to download the data. The model simultaneously estimates 948 + 9*2 + 4 = 970 parameters.
Summary: With multi-level hierarchical models, there is shrinkage within each level. In this model, there was shrinkage of the position parameters toward each other, as illustrated by the pitcher and catcher distributions. And there was shrinkage of the player parameters within each position, as shown by various examples above. The model also provided some strong inferences about player abilities based on position alone, as illustrated by the estimates for individual players with few at bats.
Monday, November 12, 2012
Another 15 minute talk, (to be) presented at the Psychonomic Society Special Session on Improving the Quality of Psychological Science. This talk briefly discusses sequential testing, the goal of achieving precision (as opposed to rejecting a null value), and multiple comparisons. The video can be found here.