Tag Archives: statistical

In Silico Science: Ecology Without the Nature

When dealing with complicated ecological concepts, theoretical models – though they may seem abstract – often help create bridges to fill in our understanding, writes Thomas Haaland (Image Credit: Aga Khan, CC BY-SA 4.0, Image Cropped)

It should not come as a surprise any more that most ecologists don’t spend all that much (work) time outside. Numerous posts on this blog about data management and ecological modelling draw a picture of a modern biologist spending most of their time in front of a computer rather than out in the field. However, the work is still intimately related to the natural world. Gathering the data is simply the first step on the way to scientific understanding, and between organizing data, analyzing data, interpreting results and writing them up, the computer hours just vastly outweigh the outdoor hours. But there is another, more mysterious breed of researchers that has even less to do with nature: theoretical biologists.

Read more

Bayesians v. Frequentists: A Tale as Old as Time

In our last stats post, we talked at length about everything that can influence the outcome of a statistical model. The choice of parameters. The choice of data. But one thing we avoided talking about was the choice of the approach to the model itself. And that brings us to the two big approaches in statistical modelling – Bayesian vs. Frequentist.

Depending on who you are, you may assign “beauty” and “beast” labels to different sides of the Bayesian v. Frequentist debate. We are not going to solve any serious rivalries here in this blog post. Instead, we are just going to try to give an overview of both schools of thought so that we can have a conceptual understanding of the broader debate.

It all starts with Bayes’ Rule (which has a dramatic history all its own): P(A|B) = [P(B|A) * P(A)] / P(B). This really just means that we can work out the probability of A given B as long as we have information about the probability of B given A and the marginal probabilities of A and B. If we swap the letters for a modelling scenario: P(parameter | data) = [P(data | parameter) * P(parameter)] / P(data). P(parameter) is the prior. P(data | parameter) is the likelihood. P(parameter | data) is the posterior, and the final outcome of the model. P(data) is a constant we can often ignore.
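As a minimal numerical sketch (the probabilities below are made up purely for illustration), Bayes’ Rule is nothing more than a bit of arithmetic:

```python
# Bayes' Rule with made-up probabilities, purely for illustration.
p_A = 0.3          # P(A): marginal probability of A
p_B_given_A = 0.8  # P(B|A)
p_B = 0.5          # P(B): marginal probability of B

# P(A|B) = P(B|A) * P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)  # 0.48
```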

Still confused? That’s fine, let’s use a more concrete example. To understand how likely it is that a fish’s absence from a lake is down to the lake being too small – the probability of the parameter given the data, or P(parameter|data) – we need to understand

  1. the probability of the lake being small – P(parameter)
  2. the probability of the fish being absent from the lake – P(data), and
  3. the probability of the fish being absent from the lake given that the lake is small – P(data|parameter).

That last one, the probability of our data given the parameter, is what a frequentist approach attempts to analyse. In other words, what is the probability that we observed this data (that the fish is absent) given the parameter (the size of the lake)? Maximum likelihood estimation is used to choose the values of parameters that maximize this probability. Bayesian analysis gives us the probability of the parameters given that we observed a set of data. We still want to find the values of the parameters that maximize this probability, but there is a subtle distinction in interpretation.
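To make that distinction concrete, here is a rough sketch in Python (the survey counts are invented for illustration): a frequentist maximum likelihood estimate of a fish’s detection probability, next to the Bayesian posterior for the same parameter under a flat prior.

```python
import numpy as np
from scipy import optimize, stats

# Hypothetical data: the fish was detected on 3 of 10 surveys of the lake.
n_surveys, n_detections = 10, 3

# Frequentist: choose the detection probability p that maximises
# P(data | parameter), i.e. the binomial likelihood.
def neg_log_lik(p):
    return -stats.binom.logpmf(n_detections, n_surveys, p)

mle = optimize.minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print("MLE of p:", mle.x)  # ~0.3, the sample proportion

# Bayesian: combine a prior on p with the same likelihood to get
# P(parameter | data). With a flat Beta(1, 1) prior the posterior is
# Beta(1 + detections, 1 + non-detections).
posterior = stats.beta(1 + n_detections, 1 + n_surveys - n_detections)
print("Posterior mean of p:", posterior.mean())           # ~0.33
print("95% credible interval:", posterior.interval(0.95))
```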

Hierarchical Bayesian models appear a lot in ecology. Some researchers are pragmatists, taking advantage of the many computational tools that have been developed (since maximum likelihood estimation for complicated models is often hard). Others have qualms about the subjective nature of Bayesian analysis. Bayesian analysis requires that we propose a set of reasonable values of our parameters (via priors) before conducting our analysis. Our results might be sensitive to these proposals.

How are priors chosen in practice? First we need to make sure that the prior puts a positive probability on the true value of the model parameter. Of course, if we knew the true value, we wouldn’t be doing all of this. Therefore we often choose a diffuse prior that covers our bases. We pay a price for this safety though; a diffuse prior makes the distribution of the posterior wider, increasing our uncertainty in the estimated parameter. (To get some intuition for how different priors affect the posterior distribution, check out these interactive apps.)
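Here is a small sketch of that trade-off, again with invented numbers: the same data analysed under a diffuse prior and under a more informative one.

```python
from scipy import stats

# Hypothetical data: fish detected on 3 of 10 surveys.
k, n = 3, 10

# Diffuse prior: Beta(1, 1) spreads its weight evenly across [0, 1].
diffuse_post = stats.beta(1 + k, 1 + n - k)

# Informative prior: Beta(6, 14) concentrates mass around p ~ 0.3,
# as if an earlier study had already pointed us towards that value.
informative_post = stats.beta(6 + k, 14 + n - k)

print("Diffuse prior     -> posterior sd:", diffuse_post.std())      # wider
print("Informative prior -> posterior sd:", informative_post.std())  # narrower
```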

Think of it like this: you smell smoke in a house, and you want to get to it as soon as possible. Your prior information might be that the smoke is most likely coming from the kitchen, so you check there first, and if your prior is right, you find the fire much faster. Great. But if your prior is wrong, you might waste too much time looking in the kitchen and not be able to find the source of the fire in time.

Bayesian Pros: You could find the fire more quickly. Cons: You might miss it altogether. (Image Credit: bertknot, CC BY-SA 2.0)

We can argue about priors forever, but we must keep in mind that in both the Bayesian and frequentist approaches we have to choose a likelihood distribution. This feels like second nature to us, but we should remember that if our model is mis-specified in either case, our results may be in jeopardy.

In a world where we had some information about the true parameter, we would want to choose a prior that puts a large probability mass in the region where we suspect the parameter to be. This would help us decrease the uncertainty in the final estimate. A natural case where this happens is when we update analyses with additional data. Our estimate of a parameter from our first study provides a natural central point for the prior distribution in our second study. As in the scientific method, we constantly update our information about the state of the world by collecting more data and refining our analyses. Sometimes new data confirms what we already know and helps us narrow the uncertainty around an estimate. Other times new data provides evidence overwhelming enough that the new estimate ends up very different from our previous one. Somewhere in between the frequentist and Bayesian approaches lies so-called empirical Bayes, which uses the observed data to form the prior distribution.
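Here is a hedged sketch of what that updating looks like with a simple conjugate Beta-Binomial model (all of the counts are invented): the posterior from a first study becomes the prior for a second, and the result is identical to analysing the pooled data in one go.

```python
from scipy import stats

# Study 1 (invented): fish detected on 3 of 10 surveys, starting from a flat Beta(1, 1) prior.
a, b = 1, 1
a, b = a + 3, b + 7   # posterior after study 1: Beta(4, 8)

# Study 2 (invented): 12 detections in 20 new surveys, with the study-1
# posterior now serving as the prior.
a, b = a + 12, b + 8  # posterior after study 2: Beta(16, 16)
sequential = stats.beta(a, b)

# Pooling all 30 surveys (15 detections) under the original flat prior
# gives exactly the same posterior.
pooled = stats.beta(1 + 15, 1 + 15)

print(sequential.mean(), pooled.mean())  # both 0.5
```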

Whether you identify as a Bayesian, a frequentist, or an equal opportunity modeler, assessing the fit and the sensitivity of the results to differing modeling assumptions is a key step in the data analysis workflow. Remember, all approaches have their flaws when used inappropriately.

Have a quantitative term or concept that mystifies you? Want it explained simply? Suggest a topic for next month →  @sastoudt. You can also see more of Sara’s work at Ecology for the Masses at her profile here.

Title Image Credit: Daniel Brachlow, Pixabay licence, Image Cropped

The Independence Assumption and its Foe, Spatial Correlation

When animals like these wolves travel in packs, spotting one individual means we’re more likely to spot another soon after. So how do we come up with a reliable population estimate in situations like these? (Image Credit: Eric Kilby, CC BY-SA 2.0, Image Cropped)

The word “ecologist” may conjure the image of a scientist spending their time out in the field counting birds, looking for moss, or studying mushrooms. Yet whilst field ecologists remain an integral part of modern ecology, the reality is that much of the discipline has come to rely on complex models. These models are what allow us to arrive at figures like the 1 billion animals estimated to have died in the recent Australian bushfires, or to predict the spread of species further polewards as climate change warms our planet.

Read more

Species Associations in a Changing World

Species associations will change as the climate warms. So how can we attempt to predict these changes? (Image Credit: Charles J Sharp, CC BY-SA 4.0, Image Cropped)

Using joint species distribution models for evaluating how species-to-species associations depend on the environmental context (2017) Tikhonov et al, Methods in Ecology and Evolution, DOI: https://doi.org/10.1111/2041-210X.12723

The Crux

Statistical modelling is a crucial part of ecology. Being able to provide an (admittedly simplified) mathematical description of the relationship between species abundance, range or density and the surrounding environment is a huge help in taking proactive steps to manage an ecosystem, or predicting species numbers in other areas.

Historically, models have used environmental variables to explain population or evolutionary changes in species. When modelling a single species, many ecologists have also taken into account that the presence of other species (for example competitors or predators) may influence the presence of the species in question. This has led to the rise of joint species distribution models (JSDMs), which account for environmental variables as well as the associations between certain species. These models have become increasingly useful, and with environmental change now the norm in many ecosystems, this week’s authors produced one such model that accounts for changes in species associations in the face of changing environmental factors.
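As a purely conceptual sketch (this is not the machinery the authors actually use, and every number below is invented), a JSDM can be thought of as each species responding to the environment plus a species-to-species correlation in whatever variation is left over. The paper’s extension lets that association itself shift with the environmental context; the simulation below only shows the simpler fixed-association case.

```python
import numpy as np

rng = np.random.default_rng(42)
n_sites = 500

# Invented environmental covariate (e.g. standardised temperature).
temperature = rng.normal(size=n_sites)

# Each species has its own (invented) response to the environment...
beta = np.array([1.0, -0.5])  # species 1 likes warmth, species 2 does not

# ...plus latent residuals that are correlated between the two species,
# representing a positive association not explained by temperature.
residual_corr = np.array([[1.0, 0.6],
                          [0.6, 1.0]])
residuals = rng.multivariate_normal(mean=[0, 0], cov=residual_corr, size=n_sites)

# Multivariate-probit-style presence/absence: a species is present where
# its latent score (environmental effect + correlated residual) is positive.
latent = temperature[:, None] * beta + residuals
presence = (latent > 0).astype(int)

# Co-occurrence beyond what the shared covariate alone would explain.
print("Observed co-occurrence rate:", (presence[:, 0] & presence[:, 1]).mean())
```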

Read more