What’s the Deal with P-Values and Their Friend the Confidence Interval?

After the first edition of Ecology for the Masses’ new Stats Corner, many people requested a discussion of p-values. Ask and you shall receive! And as an added bonus, we’ll also talk about confidence intervals. (Image Credit: Patrick Kavanagh, CC BY 2.0, Image Cropped)

Much of ecological research involves making a decision. Does implementing a particular management strategy significantly increase the species diversity of a region? Is the amount of tree cover significantly associated with the number of deer? Do bigger individuals of a species tend to have longer life expectancies?

To answer these questions, ecologists collect data and perform a statistical test, either explicitly or by interpreting the significance of a coefficient in a model (usually some sort of value relating to the effect of an environmental variable, like temperature or pollution levels). The p-value is often used to help translate the results of a test or model into a decision. You’ve heard it over and over again: if the p-value is less than 0.05 we reject the null in favor of the alternative. But what does that really mean? What is the null? What is the alternative? And what is so special about 0.05?

Plenty of people have weighed in on the use of p-values. This will not be a post that judges you (or applauds you) for using p-values; instead, the goal of this post is to make sure readers understand what p-values really are, and where they may lead us astray.

Consider the recent study assessing bird abundance over time. A null hypothesis in this scenario is that there is no change in bird abundance over time. An alternative hypothesis is that bird abundance is decreasing over time. The patterns we might see and methods used are of course quite nuanced, but here, let’s consider a simplified scenario where we have data on the estimates of bird abundance across a series of years. We could start by performing an ordinary linear regression using the abundance as the response variable and year as the explanatory variable (yes, there are lots of good reasons not to do this, but just for the sake of argument, bear with me) to try to get some information on whether bird abundance seems to be changing over time.

When we fit this model we will get a coefficient giving us an idea of the effect of “year,” an estimated standard error for that coefficient, and a p-value for the coefficient. The p-value is the probability (“p” for “probability”) that we would obtain an estimated coefficient equal to or more extreme than the one we calculated, given that the null hypothesis (there is no change in bird abundance over time) is true. The intuition is that if this probability is small, it is unlikely that we got our result just by chance under the scenario of the null hypothesis, which provides evidence in favor of the alternative hypothesis.
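To make the “equal to or more extreme, given the null” idea concrete, here is a minimal Python sketch. It is not the calculation regression software normally reports (that one relies on a t distribution); instead it swaps in a permutation test, which estimates the same kind of probability by brute force. The data are made up, and the function names `slope` and `permutation_p_value` are hypothetical.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def permutation_p_value(x, y, n_perm=5000, seed=1):
    """One-sided p-value for the alternative 'abundance is decreasing'.

    Shuffling y breaks any link between abundance and year, which mimics
    the null hypothesis of no change over time. The p-value is the share
    of shuffled data sets whose slope is at least as negative as ours.
    """
    rng = random.Random(seed)
    observed = slope(x, y)
    shuffled = list(y)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if slope(x, shuffled) <= observed:
            as_extreme += 1
    # Add 1 to top and bottom so the estimate is never exactly zero.
    return observed, (as_extreme + 1) / (n_perm + 1)

# Made-up data: 20 years of abundance with a true decline of 5 birds per year.
data_rng = random.Random(42)
years = list(range(2000, 2020))
abundance = [1000 - 5 * (yr - 2000) + data_rng.gauss(0, 20) for yr in years]

est, p = permutation_p_value(years, abundance)
print(f"estimated slope: {est:.2f}, permutation p-value: {p:.4f}")
```

Because the simulated decline is large relative to the noise, very few shuffled data sets should produce a slope as negative as the observed one, so the p-value comes out small.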

Now, I brushed past the standard error of the coefficient at first, but it is closely related to the p-value. Instead of using the p-value to help make a decision, we could use the coefficient and its standard error to create a confidence interval, which we could then use to help us make a decision. The statement that is drilled into most beginners in scientific modelling goes as follows: the 95% confidence interval means that if you replicated your study 100 times and calculated a confidence interval each time, you would expect about 95 of them to cover the true value of your parameter of interest.

It is important to realize that this does not mean that we have a 95% chance of making the right decision based on our confidence interval. The truth either is or is not in our confidence interval. Our study setup is what is being evaluated, not the particular study result. To use a confidence interval to make a decision, we consider any value within the interval to be plausible (since about 95 out of 100 intervals calculated under our setup would cover the truth). If zero lies within the confidence interval, then it is plausible that there is no relationship between time and abundance, and we fail to reject the null hypothesis of no change in abundance over time. As long as the test is two-sided and the interval's level matches the threshold (a 95% interval with a 0.05 cutoff), the decision we make with the p-value and the confidence interval will be the same.
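The “replicate the study many times” reading of a confidence interval can be checked by simulation. The sketch below is only an illustration under stated assumptions: the true slope, the noise level, and the 30-year study length are made up, and the critical value 2.048 is the 97.5th percentile of a t distribution with 28 degrees of freedom, hard-coded to avoid needing a statistics library.

```python
import math
import random

# 97.5th percentile of a t distribution with 28 degrees of freedom
# (30 yearly observations minus 2 fitted parameters).
T_CRIT_28_DF = 2.048

def fit_slope_with_se(x, y):
    """Least-squares slope and its estimated standard error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    a = mean_y - b * mean_x
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return b, se

def coverage(true_slope, n_studies=2000, seed=7):
    """Share of simulated studies whose 95% interval covers the true slope."""
    rng = random.Random(seed)
    years = list(range(30))
    covered = 0
    for _ in range(n_studies):
        # Each pass is one hypothetical replicate of the whole study.
        y = [500 + true_slope * t + rng.gauss(0, 25) for t in years]
        b, se = fit_slope_with_se(years, y)
        lower = b - T_CRIT_28_DF * se
        upper = b + T_CRIT_28_DF * se
        if lower <= true_slope <= upper:
            covered += 1
    return covered / n_studies

cov = coverage(true_slope=-2.0)
print(f"share of intervals covering the true slope: {cov:.3f}")
```

The printed share should land close to 0.95. Any single interval either covers the true slope or does not; the 95% describes the procedure across replicates.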

What if the data we had on our birds were a little richer? Instead of total abundance over time, we have species-specific abundance over time. We might fit an ordinary linear regression between abundance and year for each species, or if we are feeling fancy, use a species indicator term to obtain one model to rule them all. Now we have a different coefficient, standard error, and p-value for each species, explaining the relationship between its abundance and time. We could evaluate each p-value separately, but the more species there are, the more likely it is that we’ll get at least one p-value less than 0.05 just by chance. When that happens, we will falsely decide that a species’ abundance is changing over time.

The intuition here is that it is unlikely that one unlikely thing will happen to us, but it is more likely that at least one of many unlikely things will happen to us. Slightly more formally, you may have learned at some point that the probability of event A or event B happening is the sum of their individual probabilities (caveat, because I’m a statistician and can’t help it: the sum is exact only when the events cannot both happen, and otherwise it is an upper bound. If the tests are independent, which is maybe not a good assumption in the case of different bird species, but forgive me, the chance of at least one false positive across m tests at the 0.05 level is 1 − (1 − 0.05)^m, which is about 0.64 for 20 species). So if we combine enough really small probabilities, the total will eventually get big enough to cross into “likely” territory. In statistics, we call this the “multiple testing” problem. The good news is there are ways to adjust the p-values for how many tests we do in order to still make a decision based on them, but the bad news is that a discussion of those methods is out of the scope of this post. We’ll save that for another time if there is interest.
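To put rough numbers on the multiple testing problem, here is a small Python sketch. It assumes the tests are independent and that every null hypothesis is true, in which case each p-value is uniformly distributed between 0 and 1. The function names and the numbers of tests are made up for illustration.

```python
import random

ALPHA = 0.05

def prob_at_least_one_false_positive(n_tests, alpha=ALPHA):
    """Chance that at least one of n independent true-null tests has p < alpha."""
    return 1 - (1 - alpha) ** n_tests

def simulated_rate(n_tests, n_sims=20000, alpha=ALPHA, seed=3):
    """Same quantity by simulation: draw uniform p-values and look for any below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _sim in range(n_sims):
        if any(rng.random() < alpha for _test in range(n_tests)):
            hits += 1
    return hits / n_sims

for m in (1, 5, 20, 50):
    exact = prob_at_least_one_false_positive(m)
    sum_bound = min(1.0, m * ALPHA)  # adding the probabilities gives an upper bound
    sim = simulated_rate(m)
    print(f"{m:>3} tests: exact {exact:.3f}, sum bound {sum_bound:.2f}, simulated {sim:.3f}")
```

With a single test the chance of a false positive is the familiar 0.05, but with 20 species it is already about 0.64, so a “significant” trend somewhere in the list is more likely than not even when nothing is changing.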

Have a quantitative term or concept that mystifies you? Want it explained simply? Suggest a topic for next month to @sastoudt.

Ecofeminism: The Difficulty of a Definition

Over the next month or so I’ll be summarising a sociology paper that I wrote back in 2017 on ecofeminism. You can read the introductory piece here. This is part two. Image Credit: Christoph Strässler, CC BY-SA 2.0, Image Cropped.

One of the earliest difficulties that ecofeminism faced was that nobody seemed to understand exactly what it was. In the first piece of this series, I listed it as “a vaguely defined version of… a combination of ecology and feminism.” You can probably see this issue already – a combination of ecological and feminist thought sounds nice, but if it doesn’t have any clear message or meaning, then is there really a point?

In Defense of Aliens

It's important to remember that not all alien species are harmful, and we shouldn't treat them all as such

Image Credit: Lou133lou133, CC BY-SA 4.0, Image Cropped

Building on last week’s article on defining invasive and alien species as well as the work of Professor Mark Davis, I am going to do the unimaginable for an ecologist and argue that maybe alien species aren’t always a bad thing. I want to emphasize that maintaining biodiversity is essential, but maybe we should focus on the role of species in their environment rather than their place of origin.

Defining an Invader

The Northern Pike. Although it’s native to Norway, it has been moved around since and is now classified as ‘regionally invasive’. (Image Credit: Jik jik, CC BY-SA 2.0)

Two weeks ago, the Norwegian science institute Artsdatabanken (ADB) announced that they would be changing the name of their invasive and alien species index. The index was formerly known as the Black List; the institute decided to use a name with fewer negative connotations, “Fremmedartslista”, loosely translated as the Alien Species List. Given this series’ focus on species from that list, it seems like an appropriate time to look at how we define the terms ‘alien’ and ‘invasive’ species.
