Does A Modern Ecologist Need To Become A Bayesian?
This question comes from Marney Pratt (@marney_pratt) as she noted that a recent paper tracking trends in ecology papers shows the use of Bayesian statistics increasing over time. (Before we get going, if you want a refresher about what exactly Bayesian thought entails, check out this previous post.) Anderson et al. say:
Regression, an overarching term encompassing many statistical approaches, saw continued and rapid growth in use since the 1950s, approaching correlation by 2005. References to ANOVA and t tests both peaked around 2000, while mentions of generalized linear models (GLMs) steadily increased in frequency, surpassing t tests in 2004. In terms of how such models were fit, least-squares, formerly the dominant paradigm, was rapidly surpassed by Bayesian, maximum likelihood, and information theoretic (AIC) terms around 2000; as of 2010, the term Bayesian was used over twice as often as maximum likelihood.
Unpacking this a little bit, the first thing I notice is that it’s not the models themselves that are changing, but rather it’s the computational approach to fitting those models that is changing. This is supported by a sentence in the discussion that focuses on the implications for ecology education.
“In terms of preparing students to publish their work, our results reinforce the call to refocus ecological statistical education from declining recipe-based approaches (eg ANOVAs) to probability theory, programming, and simulation-based approaches (eg McElreath 2015).”
So perhaps an answer to Marney’s question is, you don’t have to become a Bayesian, but you may need to get familiar with some of a Bayesian’s computational tools to ease your path forward. For any given research question, as we learn more about the ecology and collect more data, we will want to account for more things in a model. This often makes the model more complicated and thus likely harder to fit; we may need to switch tools in order to take on these new complexities.
For example, a generalized linear model is different from a linear model in that it allows the distribution of the response to differ within a set of families. This is a big step forward. If our response is a count of salamanders, we’ll want to analyse it differently than we would the probability of occurrence of a moose. Then a generalized linear mixed model adds another layer of complexity on top of that, allowing for random effects (e.g. we expected unobserved heterogeneity across something like sites or time points). If we are using a tool like R, we’ve had to go from `lm` to `glm` to `lme` (though luckily these have similar syntax). But what if we want to take things a step further?
The minute we want to change something a little bit, we can get stuck computationally. Maybe we think a few of our random effects might be really large (i.e. have heavy tails). Maybe we suspect that slope coefficients for species closely related based on phylogeny are more similar to one another than the slope coefficients for species that have less in common. At this point we’d be searching around, hoping that someone had implemented a method for incorporating this new piece of structure into the model or hacking together different packages specialized for related scenarios ourselves.
What if we could just write out the model structure we expect in a common language or framework and fit the model from there?
For a linear model we observe a covariate X and outcomes Y at many sites (indexed by i) for many species (indexed by j) and want to know the relationship between Y and X. Perhaps in the simplest case we think the relationship is the same for all species.
For ordinary least squares regression, this means that we assume that the observations come from a normal distribution with a mean modeled by a linear combination of a covariate and there is some residual error.
The only thing different about fitting this in a Bayesian framework is we need to establish our prior beliefs about where the intercept, slope, and residual error come from:
Here the pi symbols represent probability distributions that we are free to choose. This is one of the advantages of being Bayesian. If we have prior knowledge about what our responses should be, we can incorporate it here when we choose our priors. Defining that choice is non-trivial, but let’s focus on the structure of the model for the purposes of this post.
If we knew more information about what we expect the slope coefficients to be, like when we suspect that it actually should vary across species in a way that is related to the phylogeny, we could change the prior to be something more complicated, so that it incorporates the phylogeny.
Now we may know phylogenetic covariance relationship based on a phylogenetic tree, in which case, it’s just a fixed matrix, or we need to estimate parts of it, in which case we need a prior distribution on those parameters too.
If our outcomes are counts, we would think that they come from a Poisson distribution with some average value lambda that we model as a linear combination of a covariate. Now we are just changing the distribution of Y in this setup.
If we want to add in random effects per site, we can easily extend lambda.
Again, we’ve added an element that we don’t know, the variance of the random effects, so we need to specify a prior distribution for it. Here we can immediately see that if we think the random effects are heavy tailed we can replace that line with a different distribution and then put a prior on its defining parameter as well, no big deal.
Hopefully, the way these examples have been sketched out shows you why you may have heard “hierarchical Bayesian models” used. This structure is used by common Bayesian software such as BUGS, JAGS, Stan, and NIMBLE. We write out the form of the relationship we expect and then make sure we have defined the data generating process for all components, expressing our prior beliefs as probability distributions.
Of course I’m sweeping a lot of things under the rug like how do I specify these priors for the parameters at the bottom level of the hierarchy, how do I perform and streamline the computation after I’ve set up this model structure, how do I interpret the results (now a *posterior distribution* for our parameters of interest rather than a maximum likelihood estimate). Those are stories for another time, but the core idea is that if you can write down the model you have in mind, you can in principle fit it with the same type of machinery in a Bayeisan framework.
Long story short, and from my point of view, even if one doesn’t become a Bayesian, learning from the Bayesians can help us think about the model structure and the underlying data generating processes of the ecological phenomenon we are interested in. If we then learn to express this structure in an appropriate Bayesian coding language, we can fit more complicated models in a more unified way.