Author Archives: Sara Stoudt

Don’t Let Coefficient Interpretation Make an Ass of You

Image Credit: beeveephoto, CC BY-SA 2.0, Image Cropped

Everything that ecologists do – from saving endangered species to projecting climate change impacts – requires ecological data. Sometimes that data can be hard to come by, like when you’re trying to figure out the range of a rare moss. At other times, that data can be smack bang in front of you, but impossible to measure. The depth of a lake for instance, or the surface area of a tree. Today, we’ll look at how to overcome that second situation, by using other, more easy-to-obtain covariates to provide an estimate of the property you’re looking for.

Sure, interpreting coefficients in increasingly complicated regression models is challenging, but have you ever tried to weigh a donkey in the wild? It turns out to be hard without special equipment, so Kate Milner and Jonathan Rougier devised a way to estimate a donkey’s weight from easier-to-obtain measurements such as height and girth (for which a simple tape measure will suffice). We’ll use their data to illustrate how to interpret a variety of types of coefficients in a regression scenario.

One Predictor

[Figure: regression output for weight ~ height]

We’ll start with a simple linear regression. We have our predictor variable (height) on the x-axis, and the variable we’re trying to predict (weight) on the y-axis. The results given on the right might seem simple to read – the value in the red circle, 4.55, is the slope, the average change in weight per one-unit increase in height. But before we jump to use that 4.55, we have to think about the units of all of the variables. In this case, weight is measured in kilograms and height is measured in centimeters. Now we have enough information to say an increase of one centimeter in a donkey’s height is associated with (note we don’t say “causes”) an increase in weight of 4.55 kilograms on average. The extra “on average” is because the model helps us understand the expected value of a donkey’s weight given its height; any particular donkey will vary around that expectation.
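As a quick sketch of how a slope like this is estimated and read off, here is a minimal Python example. The donkey data isn’t reproduced here, so the heights and weights below are simulated with a known slope of 4.55 kg/cm; the point is just that the fitted coefficient recovers the average change in weight per centimeter of height.

```python
import numpy as np

# Simulated stand-in for the donkey data (hypothetical values)
rng = np.random.default_rng(42)
height = rng.uniform(90, 110, size=200)                # heights in cm
weight = 4.55 * height - 320 + rng.normal(0, 5, 200)   # true slope: 4.55 kg/cm

# Fit weight ~ height; polyfit returns (slope, intercept) for deg=1
slope, intercept = np.polyfit(height, weight, deg=1)

# slope estimates the average change in weight (kg) per 1 cm of height
print(round(slope, 2))
```

With enough data, the estimated slope lands close to the value used to simulate the weights.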

One Predictor – Response on Log Scale

[Figure: regression output for log(weight) ~ height]

What happens if we need to apply a transformation? Now a simple case gets a little bit more complicated. If we transform the response variable (here we apply a logarithmic transformation to weight) we can still say an increase of one centimeter in a donkey’s height is associated with an increase of 0.036 in log weight on average. However, that is kind of clunky; what does log weight even mean in reality? Instead, we can back-transform the coefficient. Because the model is additive on the log scale, the effect on the original scale is multiplicative: an increase of one centimeter in height is associated with weight being multiplied by e^0.036 ≈ 1.037 on average – that is, a 3.7% increase in weight, not an increase of 1.037 kilograms.

Predictor on Log Scale

[Figure: regression output for weight ~ log(height)]

What if it is the predictor variable that is on the log scale? This makes things a bit more complicated because the effect of the predictor on the response is nonlinear (i.e. a one centimeter increase in height is associated with a different increase in weight depending on what the original height was). Therefore we have to talk in terms of a percentage increase rather than a fixed value increase. For example, a 1% increase in height is associated with a difference in average weight of 450.02 × log(1.01) ≈ 4.48 kilograms, where 450.02 is the coefficient on log(height) and log is the natural logarithm.
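The arithmetic for that percentage-based interpretation looks like this, using the coefficient 450.02 from the output above:

```python
import math

# Model: weight = a + 450.02 * log(height). A 1% increase in height changes
# log(height) by log(1.01), so the expected weight changes by:
b = 450.02
delta_weight = b * math.log(1.01)

print(round(delta_weight, 2))  # change in expected weight (kg) per 1% height
```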

Predictor and Response on Log Scale

[Figure: regression output for log(weight) ~ log(height)]

If both the predictor and the response are log transformed, the relationship between the two on the original scales is nonlinear in both directions. Now both parts of our explanation need to be in terms of percentages rather than fixed numbers, since the increases in absolute terms depend on the starting values. For a 1% increase in height we expect the ratio of average weights to be 1.01^3.57 ≈ 1.036. In other words, a 1% increase in height is associated with roughly a 3.6% increase in weight – close to the coefficient itself (3.57), which is why log-log coefficients are often read directly as percentage changes. These nonlinearities can make interpretation tricky. Find more guidance on log transformation interpretation here. Similarly, if you are using logistic regression, interpretation of coefficients can also be a bit mysterious. We won’t tackle that case in this post, but you can learn more here.
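The log-log calculation, using the coefficient 3.57 from the output above:

```python
# Model: log(weight) = a + 3.57 * log(height). For a 1% increase in height,
# expected weight is multiplied by 1.01 raised to the coefficient.
b = 3.57
ratio = 1.01 ** b

print(round(ratio, 3))  # multiplicative change in weight per 1% of height
```

Note that the exact ratio (≈ 1.036) is close to, but not exactly, 1 + 3.57/100; the "coefficient as percentage" reading is an approximation that works best for small coefficients.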

Interaction Term on Discrete Covariate

[Figure: regression output for weight ~ height × sex]

The relationship between height and weight may depend on a categorical variable, like sex in this case. The coefficient on height applies to the baseline category (here, stallion), so an increase in height of one centimeter is associated with an increase in weight of 4.79 kilograms for a stallion. The other sex categories each have an additional interaction term to consider. For a gelding, the interaction term (the blue circle) reduces the height slope by 1.14, so overall an increase in height of one centimeter is associated with an increase of 4.79 – 1.14 = 3.65 kilograms for a gelding donkey. Likewise, the female interaction term reduces the slope by 0.43, so an increase in height of one centimeter is associated with an increase of 4.79 – 0.43 = 4.36 kilograms for a female donkey.
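The per-category slopes are just the baseline slope plus each interaction term, which a couple of lines make explicit (coefficients from the output above):

```python
# Slope of weight on height for each sex category: baseline (stallion)
# coefficient plus the relevant interaction term.
baseline = 4.79  # kg per cm for the baseline category (stallion)
interaction = {"stallion": 0.0, "gelding": -1.14, "female": -0.43}

slopes = {sex: round(baseline + adj, 2) for sex, adj in interaction.items()}
print(slopes)
```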

Multiple Predictors

[Figure: regression output for weight ~ height + girth]

Commonly we model a response variable with more than one predictor. Then the interpretation of the coefficients changes a bit. The associations must be interpreted “in the presence of” the other covariates. This means that after accounting for height, an increase in girth of one centimeter is associated with an increase in weight of 2.84 kilograms on average. Similarly, after accounting for girth, an increase in height of one centimeter is associated with an increase in weight of 0.93 kilograms on average.

[Figure: scatterplot showing the correlation between height and girth]

Conceptually, this interpretation is necessary because covariates may share an association with the response. After accounting for one covariate, another covariate may have less association with the response because some of the variability in the response is already accounted for by variability in the first covariate. Above we see that height and girth are correlated with each other. Therefore some of the information that each covariate contributes to helping to understand weight is redundant in the presence of the other (this is why the coefficient on height is smaller in the model that includes both predictors).
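You can see this shrinkage directly in a simulation. The sketch below (entirely hypothetical numbers, not the donkey data) generates correlated height and girth, builds weight from both, and compares the height coefficient with and without girth in the model:

```python
import numpy as np

# Simulate correlated predictors and a response that depends on both.
rng = np.random.default_rng(1)
n = 500
height = rng.normal(100, 5, n)
girth = 0.8 * height + rng.normal(0, 2, n)            # correlated with height
weight = 1.0 * height + 2.5 * girth + rng.normal(0, 3, n)

# Model 1: weight ~ height alone
X1 = np.column_stack([np.ones(n), height])
b_alone = np.linalg.lstsq(X1, weight, rcond=None)[0][1]

# Model 2: weight ~ height + girth
X2 = np.column_stack([np.ones(n), height, girth])
b_joint = np.linalg.lstsq(X2, weight, rcond=None)[0][1]

print(round(b_alone, 2), round(b_joint, 2))
```

Alone, height “absorbs” girth’s contribution (its coefficient is inflated toward 1.0 + 2.5 × 0.8 = 3.0); with girth included, height’s coefficient drops back near its true value of 1.0.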

Interaction Term on Continuous Variable

[Figure: regression output for weight ~ height × girth, with conditional coefficient plot]

An interaction term between two continuous variables means that the magnitude of the relationship between the first covariate and the response depends on the value of the second covariate. To better understand how the relationship between weight and height changes depending on the value of girth, we can make a conditional coefficient plot (learn more here). We can see that the relationship between weight and height gets more positive as the value of girth increases. This matches the regression output (the relevant coefficient is 0.0275). This means that the increase in weight associated with height increases by an additional 0.0275 kilograms for every one centimeter increase in girth. It’s important to note though that for donkeys, a difference in weight of roughly 30 grams might not be practically significant.
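Mechanically, the conditional slope of height is the height main effect plus the interaction coefficient times girth. In the sketch below the interaction coefficient (0.0275) is from the output above, but the main-effect slope (`height_main = 0.5`) is a made-up placeholder, since the text doesn’t quote it:

```python
# Conditional slope of weight on height as a function of girth.
height_main = 0.5    # hypothetical main-effect slope for height (kg per cm)
interaction = 0.0275 # interaction coefficient from the output above

def height_slope(girth_cm):
    """Effect on weight of +1 cm of height, at a given girth."""
    return height_main + interaction * girth_cm

for g in (100, 110, 120):
    print(g, round(height_slope(g), 3))
```

Each extra 10 cm of girth adds 0.275 kg/cm to the height slope, matching the upward trend in the conditional coefficient plot.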

Interpretation of coefficients in regression output can be a bit of a mouthful, and word choice really does matter, especially when facing transformations of variables and multiple variables interacting with one another. Hopefully this example gives you enough guidance to get through your next hairy regression interpretation.

Have a quantitative term or concept that mystifies you? Want it explained simply? Suggest a topic for next month →  @sastoudt. You can also see more of Sara’s work at Ecology for the Masses at her profile here.

Bayesians v. Frequentists: A Tale as Old as Time

In our last stats post, we talked at length about everything that can influence the outcome of a statistical model. The choice of parameters. The choice of data. But one thing we avoided talking about was the choice of the approach to the model itself. And that brings us to the two big approaches in statistical modelling – Bayesian vs. Frequentist.

Depending on who you are, you may assign “beauty” and “beast” labels to different sides of the Bayesian v. Frequentist debate. We are not going to solve any serious rivalries here in this blog post. Instead, we are just going to try to give an overview of both schools of thought so that we can have a conceptual understanding of the broader debate.

It all starts with Bayes’ Rule (which has a dramatic history all its own): P(A|B) = P(B|A) × P(A) / P(B). This really just means that we can work out the probability of A given B if we have the probability of B given A and the marginal probabilities of A and B. If we swap the letters for a modelling scenario: P(parameter | data) = P(data | parameter) × P(parameter) / P(data). P(parameter) is the prior. P(parameter | data) is the posterior, the final outcome of the model. P(data) doesn’t depend on the parameter, so it acts as a normalizing constant we can often ignore.

Still confused? That’s fine, let’s use a more concrete example. Suppose we want the probability that a lake is too small given that a fish is absent from it – P(parameter | data). To get there, we need to understand

  1. the probability of the lake being small – P(parameter),
  2. the probability of the fish being absent from the lake – P(data), and
  3. the probability of the fish being absent given that the lake is small – P(data|parameter).
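To make the mechanics concrete, here is the fish/lake example worked through Bayes’ rule. None of these probabilities appear in the post; they are purely hypothetical placeholders:

```python
# Hypothetical inputs for the fish/lake example.
p_small = 0.3                   # P(parameter): prior that the lake is small
p_absent_given_small = 0.8      # P(data | parameter)
p_absent_given_not_small = 0.2  # needed to compute the marginal P(data)

# Marginal probability of the data, P(absent), via the law of total probability
p_absent = (p_absent_given_small * p_small
            + p_absent_given_not_small * (1 - p_small))

# Bayes' rule: P(small | absent)
p_small_given_absent = p_absent_given_small * p_small / p_absent
print(round(p_small_given_absent, 3))
```

Observing the fish’s absence pulls the probability that the lake is small up from the prior of 0.3 to about 0.63.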

That last one, the probability of our data given the parameter, is what a frequentist approach attempts to analyse. In other words, what is the probability that we observed this data (the fish is absent) given the parameter (the size of the lake)? Maximum likelihood estimation chooses the parameter values that maximize this probability. Bayesian analysis instead gives us the probability of the parameters given that we observed a set of data. We still want to find the parameter values that maximize this probability, but there is a subtle distinction in interpretation.

Hierarchical Bayesian models appear a lot in ecology. Some researchers are pragmatists, taking advantage of the many computational tools that have been developed (since maximum likelihood estimation for complicated models is often hard). Others have qualms about the subjective nature of Bayesian analysis. Bayesian analysis requires that we propose a set of reasonable values of our parameters (via priors) before conducting our analysis. Our results might be sensitive to these proposals.

How are priors chosen in practice? First we need to make sure that the prior puts positive probability on the true value of the model parameter. Of course, if we knew the true value, we wouldn’t be doing all of this. Therefore we often choose a diffuse prior that covers our bases. We pay a price for this safety though; a diffuse prior makes the posterior distribution wider, increasing our uncertainty in the estimated parameter. (To get some intuition for how different priors affect the posterior distribution, check out these interactive apps.)
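A small sketch of that trade-off, using a Beta-Binomial model (conjugate, so the posterior has a closed form; all numbers here are hypothetical). Suppose we record 7 detections of a species in 20 visits and want the posterior for the detection probability:

```python
# Beta-Binomial: with a Beta(alpha, beta) prior and (successes, failures)
# observed, the posterior is Beta(alpha + successes, beta + failures).
successes, trials = 7, 20

def posterior_sd(alpha_prior, beta_prior):
    """Standard deviation of the Beta posterior after the observed data."""
    a = alpha_prior + successes
    b = beta_prior + (trials - successes)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return var ** 0.5

diffuse = posterior_sd(1, 1)        # flat Beta(1, 1) prior
informative = posterior_sd(20, 40)  # prior concentrated near 1/3

print(round(diffuse, 3), round(informative, 3))
```

The informative prior roughly halves the posterior standard deviation here, but only pays off if the parameter really is near where the prior puts its mass.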

Think of it like this: you smell smoke in a house, and you want to get to it as soon as possible. Your prior information might be that the smoke is most likely coming from the kitchen, so you check there first, and if your prior is right, you find the fire much faster. Great. But if your prior is wrong, you might waste too much time looking in the kitchen and not be able to find the source of the fire in time.


Bayesian Pros: You could find the fire more quickly. Cons: You might miss it altogether. (Image Credit: bertknot, CC BY-SA 2.0)

We can argue about priors forever, but we must keep in mind that in both the Bayesian and frequentist approaches we have to choose a likelihood distribution. This choice feels like second nature to us, but we should remember that if our model is mis-specified, in either case our results may be in jeopardy.

In a world where we had some information about the true parameter, we would want to choose a prior that put a large probability mass in the region where we suspect the parameter to be. This would help us decrease the uncertainty in the final estimate. A natural case where this happens is when we update analyses with additional data. Our estimate of a parameter from our first study provides a natural central point for the prior distribution for our second study. As in the scientific method, we constantly update our information about the state of the world by collecting more data and refining our analyses. Sometimes new data confirms what we already know and helps us narrow the uncertainty around an estimate. Other times new data provides strong enough evidence that the new estimate ends up very different from our previous one. Somewhere in between frequentist and Bayesian approaches lies the so-called Empirical Bayes, which uses the observed data to form the prior distribution.
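The Beta-Binomial case makes this updating pattern very concrete: the posterior from study 1 becomes the prior for study 2, and each update just adds counts. All the counts below are hypothetical:

```python
# Sequential Bayesian updating with a Beta-Binomial model.
def update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing new counts."""
    return alpha + successes, beta + failures

prior = (1, 1)                   # diffuse starting prior, Beta(1, 1)
study1 = update(*prior, 12, 8)   # study 1: 12 detections, 8 misses
study2 = update(*study1, 30, 20) # study 2 starts from study 1's posterior

mean2 = study2[0] / sum(study2)  # posterior mean after both studies
print(study1, study2, round(mean2, 3))
```

Because the updates are additive, running both studies through at once gives the same posterior as updating sequentially, which is the formal version of “refining our analyses as data accumulates.”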

Whether you identify as a Bayesian, a frequentist, or an equal opportunity modeler, assessing the fit and the sensitivity of the results to differing modeling assumptions is a key step in the data analysis workflow. Remember, all approaches have their flaws when used inappropriately.


Title Image Credit: Daniel Brachlow, Pixabay licence, Image Cropped

Model Mis-specification: All The Ways Things Can Go Wrong…

Image Credit: Grand Velas Riviera Maya, CC BY-SA 2.0, Image Cropped

In ecological studies, the quality of the data we use is often a concern. For example, individual animals may be cryptic and hard to detect. Certain sites that we should really be sampling might be hard to reach, so we end up sampling more accessible, less relevant ones. Or it could even be something as simple as recording a raven when we’re really seeing a crow (check out #CrowOrNo if you have problems with that last one). Modeling approaches aim to mitigate the effects of these data-collection shortcomings on our results.

However, even if we had perfect data, when we decide how to model that data, we have to make choices that may not match the reality of the scenario we are trying to understand. Model mis-specification is a generic term for when our model doesn’t match the processes which have generated the data we are trying to understand. It can lead to biased estimates of covariates and incorrect uncertainty quantification.

Read more

What’s the Deal with P-Values and Their Friend the Confidence Interval?

After the first edition of Ecology for the Masses’ new Stats Corner, many people requested a discussion of p-values. Ask and you shall receive! And as an added bonus, we’ll also talk about confidence intervals. (Image Credit: Patrick Kavanagh, CC BY 2.0, Image Cropped)

Much of ecological research involves making a decision. Does implementing a particular management strategy significantly increase the species diversity of a region? Is the amount of tree cover significantly associated with the number of deer? Do bigger individuals of a species tend to have longer life expectancies?

Read more

The Independence Assumption and its Foe, Spatial Correlation

When animals like these wolves travel in packs, spotting one individual means we're more likely to spot another soon after. So how do we come up with a reliable population estimate in situations like these? (Image Credit: Eric Kilby, CC BY-SA 2.0, Image Cropped)


The thought of an ecologist may conjure the image of a scientist spending their time out in the field counting birds, looking for moss, studying mushrooms. Yet whilst field ecologists remain an integral part of modern ecology, the reality is that much of the discipline has come to rely on complex models. These are the processes which allow us to estimate figures like the 1 billion animals that have died in the recent Australian bushfires, or the potential spread of species further polewards as climate change warms our planet.

Read more