Category Archives: Stats Corner
There are many papers out there discussing estimates of abundance and occurrence of a variety of plants and animals. Sometimes you’ll also see references to relative abundance and relative occurrence. What makes researchers go for one estimate over the other? When might you face a similar choice? The goal of this post is to try to shed some light on when you might want to keep things relative.
Every once in a while the term “ecological fallacy” gets thrown around to critique a particular study. Some Twitter discussion around this pre-print, which compares COVID-19 mortality to vegetable consumption at a country level, got me thinking about the term again. So let’s go through what it is, why it’s a problem, and why sometimes it can’t be avoided.
Image Credit: Pharexia, Ratherous, AKS471883, Source Data from Johns Hopkins University CSSE, The Centers for Disease Control and Prevention, New York Times, CNBC.
As it quickly became clear in late February and early March that COVID-19 was not going away anytime soon, attention turned to trying to figure out when and where the virus would spread. Epidemiologists and virologists have had their work cut out for them, trying to simultaneously reassure and warn people the world over about the dangers, the nature and the potential timeline of the virus.
So it came as something of a surprise to see ecologists try to throw their hat into the ring. Early on in the pandemic, teams of ecologists sprang up, trying to use Species Distribution Models to predict the spread of the virus. And whilst this might sound helpful, many of these studies lacked collaboration with epidemiologists, and their predictions very quickly fell flat. Some studies suggested that areas like Brazil and Central Africa would be largely spared by the virus, which quickly turned out not to be the case. Flaws in the studies were spotted quite quickly by concerned members of both the ecological and epidemiological communities, and a few teams got started on responses.
A common goal of ecologists is to understand the population abundance of a particular species. We might be looking for the California condor as part of assessing how well the recovery project is going. This requires some field work, going out to a variety of sites and counting animals that we see. How do we choose which sites to go to? Even in the era of camera traps, we still need to know where to put our extra set of eyes. It would be a shame to have a particular camera not get any action due to an unlucky placement. We don’t have infinite time and money after all!
Image Credit: beeveephoto, CC BY-SA 2.0, Image Cropped
Everything that ecologists do – from saving endangered species to projecting climate change impacts – requires ecological data. Sometimes that data can be hard to come by, like when you’re trying to figure out the range of a rare moss. At other times, that data can be smack bang in front of you, but impossible to measure. The depth of a lake for instance, or the surface area of a tree. Today, we’ll look at how to overcome that second situation, by using other, easier-to-obtain covariates to provide an estimate of the property you’re looking for.
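To make the idea concrete, here is a minimal sketch of estimating a hard-to-measure property from an easy-to-measure covariate using a simple linear regression. Everything in it is an assumption for illustration: the data are simulated, the variable names (`diameter`, `area`) are hypothetical, and the "true" relationship is invented rather than taken from any real study.

```python
import random


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


random.seed(1)

# Simulated, made-up data: trunk diameter (cm) is easy to measure,
# surface area (m^2) is not. The relationship below is invented.
TRUE_INTERCEPT, TRUE_SLOPE = 2.0, 0.5
diameter = [random.uniform(10, 60) for _ in range(200)]
area = [TRUE_INTERCEPT + TRUE_SLOPE * d + random.gauss(0, 1.0) for d in diameter]

# Fit the model on trees where both quantities were measured...
intercept, slope = fit_line(diameter, area)

# ...then estimate the surface area of a new tree from its diameter alone.
predicted_area = intercept + slope * 35.0
print(f"slope={slope:.3f}, predicted area for a 35 cm trunk={predicted_area:.2f}")
```

In practice the relationship may not be a straight line, and more than one covariate may be needed, so treat this as a starting point rather than a recipe.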
In our last stats post, we talked at length about everything that can influence the outcome of a statistical model. The choice of parameters. The choice of data. But one thing we avoided talking about was the choice of the approach to the model itself. And that brings us to the two big approaches in statistical modelling – Bayesian vs. Frequentist.
In ecological studies, the quality of the data we use is often a concern. For example, individual animals may be cryptic and hard to detect. Certain sites that we should really be sampling might be hard to reach, so we end up sampling more accessible, less relevant ones. Or it could even be something as simple as recording a raven when we’re really seeing a crow (check out #CrowOrNo if you have problems with that last one). Modeling approaches aim to mitigate the effect of these data-collection shortcomings on our results.
However, even if we had perfect data, when we decide how to model that data, we have to make choices that may not match the reality of the scenario we are trying to understand. Model mis-specification is a generic term for when our model doesn’t match the processes which have generated the data we are trying to understand. It can lead to biased estimates of covariate effects and incorrect uncertainty quantification.
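One common form of mis-specification is leaving out a covariate that matters. The sketch below simulates that case, assuming made-up data in which the response depends on two correlated covariates but the fitted model includes only one. The names (`x1`, `x2`) and all the numbers are hypothetical.

```python
import random


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


random.seed(42)
n = 2000

# Invented data-generating process: y depends on x1 AND x2,
# and x2 is itself correlated with x1.
TRUE_EFFECT_X1 = 1.0
TRUE_EFFECT_X2 = 2.0
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.8 * a + random.gauss(0, 0.5) for a in x1]
y = [TRUE_EFFECT_X1 * a + TRUE_EFFECT_X2 * b + random.gauss(0, 1)
     for a, b in zip(x1, x2)]

# Mis-specified model: x2 is left out, so x1 absorbs part of its effect.
_, slope_x1_only = fit_line(x1, y)

print(f"true effect of x1: {TRUE_EFFECT_X1}")
print(f"estimated effect of x1 when x2 is omitted: {slope_x1_only:.2f}")
```

With perfectly measured data and plenty of it, the estimated effect of `x1` still lands near 1.0 + 2.0 × 0.8 = 2.6 rather than the true 1.0, because the model, not the data, is the problem.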
After the first edition of Ecology for the Masses’ new Stats Corner, many people requested a discussion of p-values. Ask and you shall receive! And as an added bonus, we’ll also talk about confidence intervals. (Image Credit: Patrick Kavanagh, CC BY 2.0, Image Cropped)
Much of ecological research involves making a decision. Does implementing a particular management strategy significantly increase the species diversity of a region? Is the amount of tree cover significantly associated with the number of deer? Do bigger individuals of a species tend to have longer life expectancies?
When animals like these wolves travel in packs, spotting one individual means we’re more likely to spot another soon after. So how do we come up with a reliable population estimate in situations like these? (Image Credit: Eric Kilby, CC BY-SA 2.0, Image Cropped)