Tag Archives: statistical modelling

Who Is Simpson And What Does His Paradox Mean For Ecologists?

Edward H. Simpson was a codebreaker at Bletchley Park, the home of Allied codebreakers during the Second World War. While you’d think this would be his claim to fame, perhaps his most lasting contribution is his description of Simpson’s paradox. The paradox describes the phenomenon whereby a relationship within a dataset changes dramatically, even reversing direction, depending on whether you look at the data by group or all together. The best-known examples of the paradox come from the medical world or the famous Berkeley admissions case. But what example can we keep in mind in an ecological setting to guide us? Let’s consider the dimensions of penguins’ bills measured at Palmer Station in Antarctica. If we are interested in the relationship between bill depth and bill length, we might start with a preliminary analysis such as a simple linear regression.
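Here is a minimal sketch, in Python with NumPy, of how the paradox can arise in data shaped like this. It uses simulated bill measurements, not the real Palmer Station data. Three hypothetical species are generated so that bill depth increases with bill length inside each species, while species with longer bills have shallower bills on average. The species names, group means and slopes are all invented. Fitting one least-squares line per species and one to the pooled data shows the slope changing sign.

```python
import numpy as np

# Simulated data, NOT the Palmer Station measurements. The species names,
# group means and slopes below are invented purely for illustration.
rng = np.random.default_rng(seed=42)

groups = {
    # name: (mean bill length in mm, mean bill depth in mm)
    "species_A": (38.0, 20.0),
    "species_B": (46.0, 17.0),
    "species_C": (54.0, 14.0),
}
WITHIN_SLOPE = 0.5   # assumed depth-vs-length slope inside every species
N_PER_GROUP = 50

length_parts, depth_parts, species = [], [], []
for name, (mean_len, mean_dep) in groups.items():
    # Bill lengths scatter around the species mean...
    x = mean_len + rng.normal(0.0, 2.0, N_PER_GROUP)
    # ...and depth rises with length within the species, plus a little noise.
    y = mean_dep + WITHIN_SLOPE * (x - mean_len) + rng.normal(0.0, 0.3, N_PER_GROUP)
    length_parts.append(x)
    depth_parts.append(y)
    species.extend([name] * N_PER_GROUP)

length = np.concatenate(length_parts)
depth = np.concatenate(depth_parts)
species = np.array(species)


def ols_slope(x, y):
    """Slope b of the least-squares line y = a + b * x."""
    slope, _intercept = np.polyfit(x, y, deg=1)
    return slope


# One regression on all birds together...
pooled_slope = ols_slope(length, depth)
# ...and one regression per species.
group_slopes = {
    name: ols_slope(length[species == name], depth[species == name])
    for name in groups
}

print(f"pooled slope:    {pooled_slope:+.2f}")
for name, slope in group_slopes.items():
    print(f"{name} slope: {slope:+.2f}")
```

The key design choice is fitting the same simple model twice: once per group and once on the pooled data. Comparing the two sets of slopes is what exposes the paradox.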

Finding Balance on the Bias-Variance Seesaw

Building models is a tricky business. There are lots of decisions involved and competing motivations. Say we are ecologists studying owl abundance in a park near our school. Our primary goal may be to have a good understanding of what is going on in our data. We don’t want to miss any important relationships between abundance and measurable features of the landscape. For example, if we didn’t include tree cover as an explanatory variable, our model might be underfit, since that variable carries potential information about the availability of nesting spots for owls.
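Here is a minimal sketch of that underfitting argument in Python with NumPy. The owl counts are entirely simulated: the number of sites, the Poisson counts and the true effect of tree cover are all hypothetical. For simplicity it fits ordinary least-squares models rather than the count models an ecologist would more likely use. It then compares how much variation in abundance each model leaves unexplained.

```python
import numpy as np

# Hypothetical survey: owl counts at 100 sites, simulated so that counts
# rise with tree cover. These are invented numbers, not field data.
rng = np.random.default_rng(seed=1)
n_sites = 100
tree_cover = rng.uniform(0.0, 1.0, n_sites)          # proportion of canopy cover
owl_count = rng.poisson(lam=1.0 + 6.0 * tree_cover)  # assumed true relationship


def fit_mse(X, y):
    """Least-squares fit of y on the columns of X; returns the mean squared residual."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    return np.mean(residuals ** 2)


intercept = np.ones(n_sites)
# Underfit model: ignores tree cover and predicts the same count at every site.
mse_without = fit_mse(intercept[:, None], owl_count)
# Model that includes tree cover as an explanatory variable.
mse_with = fit_mse(np.column_stack([intercept, tree_cover]), owl_count)

print(f"MSE without tree cover: {mse_without:.2f}")
print(f"MSE with tree cover:    {mse_with:.2f}")
```

In-sample error can only go down (or stay the same) when a variable is added to a least-squares fit. A lower number here therefore does not by itself mean the bigger model is better, and that is where the variance side of the seesaw comes in.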
