Don’t Let Coefficient Interpretation Make an Ass of You
Image Credit: beeveephoto, CC BY-SA 2.0, Image Cropped
Everything that ecologists do, from saving endangered species to projecting climate change impacts, requires ecological data. Sometimes that data can be hard to come by, like when you’re trying to figure out the range of a rare moss. At other times, that data can be smack bang in front of you, but impossible to measure directly. The depth of a lake for instance, or the surface area of a tree. Today, we’ll look at how to overcome that second situation by using other, easier-to-obtain covariates to estimate the property you’re looking for.
Sure, interpreting coefficients in increasingly complicated regression models is challenging, but have you ever tried to weigh a donkey in the wild? It turns out to be hard to do without special equipment, so Kate Milner and Jonathan Rougier devised a way to estimate weight from easier-to-obtain measurements such as height and girth (for these, a simple tape measure will suffice). We’ll use their data to illustrate how to interpret several types of coefficients in a regression setting.
We’ll start with a simple linear regression. We have our predictor variable (height) on the x-axis, and the variable we’re trying to predict (weight) on the y-axis. The results given on the right might seem like a simple case: the value given in the red circle indicates that when height increases by one unit, weight tends to increase by 4.55 units (on average). But before we jump to use that 4.55, we have to think about the units of all of the variables. In this case, weight is measured in kilograms and height is measured in centimeters. Now we have enough information to say that an increase of one centimeter in a donkey’s height is associated with (note we don’t say “causes”) an increase in weight of 4.55 kilograms on average. This extra “on average” is because the model describes the expected value of a donkey’s weight given its height, but there can be variability for any particular donkey.
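As a quick check on the units, here is a minimal Python sketch of what that slope implies. Only the 4.55 comes from the regression output described above; the function name and the example height differences are illustrative assumptions.

```python
# Slope from the simple linear regression in the post: kg of weight per cm of height.
HEIGHT_SLOPE_KG_PER_CM = 4.55


def expected_weight_difference(height_difference_cm: float) -> float:
    """Difference in average weight (kg) between donkeys whose heights
    differ by `height_difference_cm` centimeters."""
    return HEIGHT_SLOPE_KG_PER_CM * height_difference_cm


# Illustrative height differences (not taken from the data).
for delta_cm in (1, 5, 10):
    diff_kg = expected_weight_difference(delta_cm)
    print(f"{delta_cm} cm taller -> {diff_kg:.2f} kg heavier on average")
```

Because the model is linear on the original scale, the same 4.55 kilograms per centimeter applies whether the donkey is short or tall.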
One Predictor – Response on Log Scale
What happens if we need to apply a transformation? Now a simple case gets a little bit more complicated. If we transform the response variable (here we apply a logarithmic transformation to weight), we can still say that an increase of one centimeter in a donkey’s height is associated with an increase of 0.036 in log weight on average. However, that is kind of clunky; what does log weight even mean in reality? Instead, we can back-transform the coefficient: e^0.036 = 1.037. Because a difference on the log scale becomes a ratio on the original scale, this number is a multiplier, not a number of kilograms. An increase of one centimeter in a donkey’s height is associated with weight being multiplied by 1.037, that is, an increase of about 3.7% in weight on average. The change in kilograms therefore depends on how heavy the donkey was to begin with.
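To see why the back-transformed coefficient acts as a multiplier, here is a small Python sketch. The 0.036 is the coefficient from the output described above; the two starting weights are hypothetical values chosen only to show how the kilogram change varies.

```python
import math

# Coefficient from the post: change in log(weight) per cm of height.
LOG_WEIGHT_SLOPE = 0.036


def weight_multiplier(height_difference_cm: float) -> float:
    """Factor by which average weight is multiplied when height
    increases by `height_difference_cm` centimeters."""
    return math.exp(LOG_WEIGHT_SLOPE * height_difference_cm)


factor = weight_multiplier(1)
print(f"multiplier per cm: {factor:.3f} (about {100 * (factor - 1):.1f}% heavier)")

# Hypothetical starting weights, only to show that the change in kg is not constant.
for start_kg in (100, 200):
    gain_kg = start_kg * (factor - 1)
    print(f"starting at {start_kg} kg -> about {gain_kg:.2f} kg heavier")
```

The multiplier stays the same for every donkey, but the implied gain is roughly 3.7 kilograms at 100 kilograms and roughly 7.3 kilograms at 200 kilograms, so it should not be compared directly with the 4.55 kilograms per centimeter from the untransformed model.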
Predictor on Log Scale
What if it is the predictor variable that is on the log scale? This makes things a bit more complicated because the effect of the predictor on the response is nonlinear (i.e. a one centimeter increase in height is associated with a different increase in weight depending on what the original height was). Therefore we have to talk in terms of a percentage increase in the predictor rather than a fixed increase. For example, a 1% increase in height is associated with a difference in average weight of 450.02 * log(1.01) = 4.48 kilograms, where log is the natural logarithm.
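The same arithmetic can be sketched in Python for other percentage increases. The 450.02 is the coefficient quoted above; the function name and the 10% example are illustrative assumptions.

```python
import math

# Coefficient from the post: change in weight (kg) per unit of log(height).
LOG_HEIGHT_SLOPE = 450.02


def expected_weight_difference(percent_increase_in_height: float) -> float:
    """Difference in average weight (kg) associated with a given
    percentage increase in height, using the natural logarithm."""
    return LOG_HEIGHT_SLOPE * math.log(1 + percent_increase_in_height / 100)


for pct in (1, 10):
    diff_kg = expected_weight_difference(pct)
    print(f"{pct}% taller -> {diff_kg:.2f} kg heavier on average")
```

Note that a 10% increase gives slightly less than ten times the 1% figure, which is the nonlinearity at work.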
Predictor and Response on Log Scale
If both the predictor and the response are log transformed, the relationship is nonlinear on the original scale of both variables. Now both parts of our explanation need to be in terms of percentages rather than fixed numbers, since their increases in absolute terms depend on their starting values. For a 1% increase in height we expect the ratio of the average weights to be 1.01^3.57 = 1.036, or about 1.04. In other words, a 1% increase in height is associated with roughly a 4% increase in weight. These nonlinearities can make interpretation tricky. Find more guidance on log transformation interpretation here. Similarly, if you are using logistic regression, interpretation of coefficients can also be a bit mysterious. We won’t tackle that case in this post, but you can learn more here.
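For the log-log case, a short Python sketch of the ratio calculation may help. The 3.57 is the coefficient quoted above; the function name and the 10% example are illustrative assumptions.

```python
# Coefficient from the post: change in log(weight) per unit of log(height).
LOG_LOG_SLOPE = 3.57


def weight_ratio(percent_increase_in_height: float) -> float:
    """Ratio of average weights associated with a given percentage
    increase in height when both variables are log transformed."""
    return (1 + percent_increase_in_height / 100) ** LOG_LOG_SLOPE


for pct in (1, 10):
    ratio = weight_ratio(pct)
    print(f"{pct}% taller -> weight ratio {ratio:.3f} "
          f"(about {100 * (ratio - 1):.1f}% heavier)")
```

For small changes the coefficient itself is a handy approximation: a 1% increase in height corresponds to roughly a 3.57% increase in weight, close to the exact value computed above.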
Interaction Term on Discrete Covariate
The relationship between height and weight may depend on a categorical variable, like sex in this case. The coefficient on height is for a baseline category (here stallion), so an increase in height of one centimeter is associated with an increase in weight of 4.79 kilograms for a stallion. Other sex categories have an additional term to consider. For a gelding, the interaction term adjusts that slope, lowering the increase in weight per centimeter by 1.14 kilograms (the blue circle). Therefore, an increase in height of one centimeter is associated with an increase of 4.79 - 1.14 = 3.65 kilograms for a gelding donkey. A female has its own adjustment as well, lowering the increase per centimeter by 0.43 kilograms. Again, an increase in height of one centimeter is associated with an increase of 4.79 - 0.43 = 4.36 kilograms for a female donkey.
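The bookkeeping for the per-category slopes can be sketched in Python as follows. The three numbers come from the output described above; the dictionary layout and names are just one illustrative way to organize them.

```python
# Height slope for the baseline category (stallion), in kg per cm, from the post.
BASELINE_HEIGHT_SLOPE = 4.79

# Interaction coefficients from the post: adjustment to the height slope per category.
SLOPE_ADJUSTMENT = {
    "stallion": 0.0,   # baseline category, no adjustment
    "gelding": -1.14,
    "female": -0.43,
}


def height_slope(sex: str) -> float:
    """Expected change in weight (kg) per cm of height for the given sex."""
    return BASELINE_HEIGHT_SLOPE + SLOPE_ADJUSTMENT[sex]


for sex in SLOPE_ADJUSTMENT:
    print(f"{sex}: {height_slope(sex):.2f} kg per cm")
```

The interaction coefficients change the slope of the height relationship for each category; they are separate from any main effect of sex, which would shift the overall level of weight.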
Commonly we model a response variable with more than one predictor. Then the interpretation of the coefficients changes a bit. The associations must be interpreted “in the presence of” the other covariates. This means that after accounting for height, an increase in girth of one centimeter is associated with an increase in weight of 2.84 kilograms on average. Similarly, after accounting for girth, an increase in height of one centimeter is associated with an increase in weight of 0.93 kilograms.
Conceptually, this interpretation is necessary because covariates may share an association with the response. After accounting for one covariate, another covariate may have less association with the response because some of the variability in the response is already accounted for by variability in the first covariate. Above we see that height and girth are correlated with each other. Therefore some of the information that each covariate contributes to understanding weight is redundant in the presence of the other (this is why the coefficient on height is smaller in the model with both predictors than in the model with height alone).
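To illustrate why a coefficient shrinks once a correlated covariate is included, here is a minimal Python sketch on made-up numbers. Everything in it is an assumption for illustration: the five heights and girths, the "true" coefficients, and the intercept are synthetic and are not the donkey data or the fitted values from the post.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on x (single predictor)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Synthetic measurements in cm (NOT the donkey data): girth rises with height.
height = [90.0, 95.0, 100.0, 105.0, 110.0]
girth = [108.0, 116.0, 119.0, 127.0, 130.0]

# Synthetic weights built so that, holding the other covariate fixed,
# height contributes 1 kg per cm and girth contributes 3 kg per cm.
TRUE_HEIGHT_COEF = 1.0
TRUE_GIRTH_COEF = 3.0
weight = [TRUE_HEIGHT_COEF * h + TRUE_GIRTH_COEF * g - 200.0
          for h, g in zip(height, girth)]

girth_per_height = ols_slope(height, girth)   # how girth tracks height
height_only_slope = ols_slope(height, weight)  # height coefficient without girth

print(f"girth increases by {girth_per_height:.2f} cm per cm of height")
print(f"height-only slope: {height_only_slope:.2f} kg per cm")
print(f"height slope with girth in the model: {TRUE_HEIGHT_COEF:.2f} kg per cm")
```

In this toy example the height-only slope equals the height coefficient plus the girth coefficient times the slope of girth on height, so the single-predictor model credits height with the weight that really travels with girth.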
Interaction Term on Continuous Variable
An interaction term between two continuous variables means that the magnitude of the relationship between the first covariate and the response depends on the value of the second covariate. To better understand how the relationship between weight and height changes depending on the value of girth, we can make a conditional coefficient plot (learn more here). We can see that the relationship between weight and height gets more positive as the value of girth increases. This matches the regression output (the relevant coefficient is 0.0275). This means that the increase in weight associated with a one centimeter increase in height grows by an additional 0.0275 kilograms for every one centimeter increase in girth. It’s important to note though that for donkeys, a difference of roughly 28 grams might not be practically significant.
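Here is a short Python sketch of how the interaction coefficient shifts the height slope. Only the 0.0275 comes from the output described above; the main-effect coefficient for height is not quoted in this post, so the sketch reports only the change in slope, and the example girth differences are illustrative assumptions.

```python
# Interaction coefficient from the post: change in the height slope
# (kg per cm of height) for each additional cm of girth.
INTERACTION_COEF = 0.0275


def height_slope_shift(girth_difference_cm: float) -> float:
    """How much larger the height slope (kg per cm) is for a donkey whose
    girth is `girth_difference_cm` centimeters greater."""
    return INTERACTION_COEF * girth_difference_cm


# Illustrative girth differences (not taken from the data).
for delta_girth in (1, 10, 20):
    shift = height_slope_shift(delta_girth)
    print(f"girth +{delta_girth} cm -> height slope larger by "
          f"{shift:.4f} kg per cm ({1000 * shift:.1f} g per cm)")
```

A one centimeter difference in girth changes the height slope by only 27.5 grams per centimeter, but the shift adds up across the range of girths seen in real donkeys.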
Interpretation of coefficients in regression output can be a bit of a mouthful, and word choice really does matter, especially when facing transformations of variables and multiple variables interacting with one another. Hopefully this example provides enough guidance to see you through your next hairy regression interpretation.
Have a quantitative term or concept that mystifies you? Want it explained simply? Suggest a topic for next month → @sastoudt. You can also see more of Sara’s work at Ecology for the Masses at her profile here.
Comparing the non-transformed and log-transformed donkey examples, the non-transformed model predicts a ~4 kg increase in weight per cm, versus a ~1 kg increase in weight for the model run with the log-transformed response. That’s a big difference. How should this be interpreted?