Bayesians v. Frequentists: A Tale as Old as Time
In our last stats post, we talked at length about everything that can influence the outcome of a statistical model. The choice of parameters. The choice of data. But one thing we avoided talking about was the choice of the approach to the model itself. And that brings us to the two big approaches in statistical modelling – Bayesian vs. Frequentist.
Depending on who you are, you may assign “beauty” and “beast” labels to different sides of the Bayesian v. Frequentist debate. We are not going to solve any serious rivalries here in this blog post. Instead, we are just going to try to give an overview of both schools of thought so that we can have a conceptual understanding of the broader debate.
It all starts with Bayes’ Rule (which has a dramatic history all its own): P(A|B) = [P(B|A) * P(A)] / P(B). This really just means that we can work out the probability of A given B as long as we know the probability of B given A and the marginal probabilities of A and B. If we swap the alphabet for a modelling scenario: P(parameter | data) = [P(data | parameter) * P(parameter)] / P(data). P(parameter) is the prior. P(data | parameter) is the likelihood. P(parameter | data) is the posterior and the final outcome of the model. P(data) is a normalizing constant that does not depend on the parameter, so we can often ignore it.
Still confused? That’s fine, let’s use a more concrete example. Say we want to know how likely it is that a lake is small given that a fish is absent from it – the probability of the parameter given the data, or P(parameter|data). To get there, we need to understand
- the probability of the lake being small – P(parameter),
- the probability of the fish being absent from the lake – P(data), and
- the probability of the fish being absent given that the lake is small – P(data|parameter).
That last one, the probability of our data given the parameter, is what a frequentist approach attempts to analyse. In other words, what is the probability that we got this data (that the fish is absent) given the parameter (the size of the lake)? Maximum likelihood estimation is used to choose the values of the parameters that maximize this probability. Bayesian analysis instead gives us the probability of the parameters given that we observed a set of data. We still want to find the values of the parameters that maximize this probability, but there is a subtle distinction in interpretation.
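To put some numbers on the fish example, here is a minimal Python sketch. Every probability in it is made up purely for illustration (nothing comes from real lake data), and the variable names are our own.

```python
# Hypothetical numbers, chosen only to illustrate Bayes' Rule.
p_small = 0.30               # P(parameter): the lake is small
p_absent_given_small = 0.80  # P(data | parameter): fish absent if the lake is small
p_absent_given_large = 0.20  # fish absent if the lake is not small

# P(data): total probability that the fish is absent, across both lake sizes
p_absent = (p_absent_given_small * p_small
            + p_absent_given_large * (1 - p_small))

# Bayes' Rule: P(parameter | data)
p_small_given_absent = p_absent_given_small * p_small / p_absent

print(f"P(absent)         = {p_absent:.3f}")
print(f"P(absent | small) = {p_absent_given_small:.3f}")
print(f"P(small | absent) = {p_small_given_absent:.3f}")
```

With these invented inputs, the likelihood P(absent | small) is 0.80 while the posterior P(small | absent) comes out at roughly 0.63. They are different numbers because they answer different questions, which is the distinction in interpretation described above.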
Hierarchical Bayesian models appear a lot in ecology. Some researchers are pragmatists, taking advantage of the many computational tools that have been developed (since maximum likelihood estimation for complicated models is often hard). Others have qualms about the subjective nature of Bayesian analysis. Bayesian analysis requires that we propose a set of reasonable values of our parameters (via priors) before conducting our analysis. Our results might be sensitive to these proposals.
How are priors chosen in practice? First we need to make sure that the prior puts a positive probability on the true value of the model parameter. Of course, if we knew the true value, we wouldn’t be doing all of this. Therefore we often choose a diffuse prior that covers our bases. We pay a price for this safety though; a diffuse prior makes the distribution of the posterior wider, increasing our uncertainty in the estimated parameter. (To get some intuition for how different priors affect the posterior distribution, check out these interactive apps.)
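As a rough illustration of that price, the sketch below compares a diffuse prior with a more concentrated one. It assumes a Beta prior on a probability with binomial data, because that pairing has a closed-form posterior. The data (a fish present in 7 of 10 surveyed lakes) and both priors are hypothetical, and the helper names are ours.

```python
from math import sqrt

def beta_posterior(prior_a, prior_b, successes, failures):
    """Conjugate update: Beta prior plus binomial data gives a Beta posterior."""
    return prior_a + successes, prior_b + failures

def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(var)

# Hypothetical data: fish present in 7 of 10 surveyed lakes
successes, failures = 7, 3

# Two hypothetical priors on the probability of presence
priors = {
    "diffuse Beta(1, 1)": (1, 1),
    "informative Beta(20, 20)": (20, 20),
}

results = {}
for name, (a, b) in priors.items():
    post_a, post_b = beta_posterior(a, b, successes, failures)
    results[name] = beta_mean_sd(post_a, post_b)
    mean, sd = results[name]
    print(f"{name}: posterior mean {mean:.3f}, sd {sd:.3f}")
```

In this toy setup the diffuse prior leaves a wider posterior than the informative one. The informative prior buys that extra precision by pulling the estimate toward its own centre of 0.5, which only helps if the prior was centred in a sensible place.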
Think of it like this: you smell smoke in a house, and you want to get to it as soon as possible. Your prior information might be that the smoke is most likely coming from the kitchen, so you check there first, and if your prior is right, you find the fire much faster. Great. But if your prior is wrong, you might waste too much time looking in the kitchen and not be able to find the source of the fire in time.
We can argue about priors forever, but we must keep in mind that in both the Bayesian and frequentist approaches we have to choose a likelihood distribution. This feels like second nature to us, but we should remember that if our model is mis-specified in either case, our results may be in jeopardy.
In a world where we had some information about the true parameter, we would want to choose a prior that put a large probability mass in the region where we suspect the parameter to be. This would help us decrease the uncertainty in the final estimate. A natural case where this happens is when we update analyses with additional data. Our estimate of a parameter from our first study provides a natural central point for the prior distribution for our second study. As in the scientific method, we constantly update our information about the state of the world by collecting more data and refining our analyses. Sometimes new data confirms what we already know and helps us narrow the uncertainty around an estimate. Other times new data provides overwhelming enough evidence that the new estimate ends up very different from our previous one. Somewhere in between the frequentist and Bayesian approaches lies so-called Empirical Bayes, which uses the observed data to form the prior distribution.
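Here is a small sketch of that updating idea, again assuming a Beta prior with binomial data so the arithmetic stays simple. It is a simplified version of what is described above: it carries the whole posterior from the first study forward as the prior for the second, not only a central point. Both "studies" are invented counts and the function names are ours.

```python
from math import sqrt

def update(prior, successes, failures):
    """Beta-binomial update: add observed counts to the prior's parameters."""
    a, b = prior
    return (a + successes, b + failures)

def beta_sd(params):
    """Standard deviation of a Beta(a, b) distribution."""
    a, b = params
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

prior = (1, 1)     # diffuse starting point
study_1 = (7, 3)   # hypothetical: present in 7 lakes, absent in 3
study_2 = (12, 8)  # hypothetical follow-up: present in 12, absent in 8

after_1 = update(prior, *study_1)    # posterior after the first study
after_2 = update(after_1, *study_2)  # first posterior reused as the new prior

# Analysing all of the data at once, starting from the original prior
pooled = update(prior, study_1[0] + study_2[0], study_1[1] + study_2[1])

print("after study 1:", after_1, f"sd {beta_sd(after_1):.3f}")
print("after study 2:", after_2, f"sd {beta_sd(after_2):.3f}")
print("pooled       :", pooled)
```

In this conjugate case, updating study by study lands on the same posterior as analysing everything in one go, and the uncertainty shrinks as data accumulates.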
Whether you identify as a Bayesian, a frequentist, or an equal opportunity modeler, assessing the fit and the sensitivity of the results to differing modeling assumptions is a key step in the data analysis workflow. Remember, all approaches have their flaws when used inappropriately.
Have a quantitative term or concept that mystifies you? Want it explained simply? Suggest a topic for next month → @sastoudt. You can also see more of Sara’s work at Ecology for the Masses at her profile here.