“Wait, What Am I Even Saying?” Communicating Statistics To A Wide Audience
If we write about the statistical methods behind our ecology work, and none of our readers understand it, have we really communicated at all?
This month I’m getting meta. It’s been about a year and a half since I started writing the Stats Corner for this blog with the goal of demystifying some of the statistical methods that are used by ecologists every day. At the same time, I’ve been writing a book with Deborah Nolan called “Communicating with Data: The Art of Writing for Data Science.” The book was released this spring, so it seemed like a good time to reflect on writing about statistics accessibly.
Writing is hard. Maybe you just finished writing up a final project for a class, are wrapping up a dissertation, or are starting to plan your summer of writing projects, and are feeling a twinge of “oh no.” Making other people quickly understand and feel invested in something that we ourselves have spent years of our lives working on is challenging. It can sometimes feel like we are glossing over all of the work we’ve done, and all of the fascinating rabbit holes we went down, when we boil the work down to just the essential takeaways. We may also forget that readers won’t have the same background knowledge that we have carefully accumulated over time, and therefore leave the reader asking “huh?!” more than we would like.
Writing about statistics has its own set of challenges. Think about your own field for a minute. Are there words that mean something particular in your work that the general public might interpret differently? Words like “community” and “diversity” come to mind. Statisticians face the same thing. We have overloaded words like “significance,” “confidence,” and “error,” and because we’re constantly focusing on what we *cannot* say with the data, we can often lose our audience along the way.
As an applied statistician who works on ecology applications, one of the biggest challenges I’ve faced is knowing when the methods actually *are* the story and when they are just playing a supporting role. I’m constantly writing for two audiences: I have to convince other statisticians that my approach is defensible while also convincing ecologists that the extra statistical effort is necessary. I’ll outline a few approaches for balancing these requirements and audiences.
Concrete Example in Context
There is nothing worse than a paper that starts out “Let X be… and Y be…” with a bunch of notation and subscripts floating around. This is true even if the methods really are the star of the show. Readers need context to help guide them through (like when I walked through different data scenarios to motivate different transformations). Remember, this is the first time they are encountering any of our work. We have been stewing in it for a long time, so everything feels obvious to us.
I hope most of my Stats Corner posts have succeeded in this concrete-example approach. Yes, I just make up examples based on animals I find interesting at the time, but the goal is to connect abstract methods to something tangible in the real world to get some traction. Don’t stop reading, reader!
“It’s like this… but better”
The nice thing about a lot of statistical methods is that they build on one another. So if you can find common ground with a simpler model that more people are familiar with and then articulate what makes the new approach different, then you are on your way. You can build up, layer by layer, perhaps even tracing the historical discovery process of the methods as you go. Sometimes this layering is literal, like when I established what fixed and random effects were and then used that common base to show how they go together in mixed models.
Beware the Consequences
Ok, I feel a little doomsday here, but sometimes you just need to spell out the worst-case scenario if the data are not analyzed correctly. This approach is even better if you can translate the statistical consequences into downstream consequences for ecological decisions based on the findings of a faulty analysis. For example, if we fail to quantify uncertainty appropriately (like when the independence assumption isn’t met), then our confidence intervals will be too narrow, and we might conclude that there is a statistically significant relationship between two variables when that relationship is actually spurious. We could be throwing money and resources at a relationship that isn’t even there.
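If you like seeing this rather than just hearing about it, here is a minimal simulation sketch of that failure mode (the numbers and function names are my own illustration, not from any particular analysis): we generate autocorrelated noise that has *no* real relationship to a predictor, fit an ordinary regression that assumes independent errors, and count how often the naive test cries “significant.”

```python
# A toy simulation: when errors are autocorrelated (AR(1)), naive OLS
# standard errors are too small, so the false-positive rate at the
# nominal 5% level climbs well above 0.05.
import numpy as np

rng = np.random.default_rng(42)

def ar1_noise(n, rho, rng):
    """AR(1) correlated noise: e[t] = rho * e[t-1] + white noise."""
    e = np.zeros(n)
    w = rng.normal(size=n)
    e[0] = w[0]
    for t in range(1, n):
        e[t] = rho * e[t - 1] + w[t]
    return e

def false_positive_rate(rho, n=50, reps=2000):
    """Fraction of simulated datasets where a naive test on the OLS slope
    of y ~ x declares significance, even though x and y are unrelated."""
    hits = 0
    for _ in range(reps):
        x = np.arange(n, dtype=float)        # e.g. time, or a spatial gradient
        y = ar1_noise(n, rho, rng)           # y truly has NO relationship to x
        xc = x - x.mean()                    # center x for a simple OLS slope
        slope = (xc @ y) / (xc @ xc)
        resid = y - y.mean() - slope * xc
        se = np.sqrt((resid @ resid) / (n - 2) / (xc @ xc))  # naive std. error
        if abs(slope / se) > 1.96:           # naive test at the 5% level
            hits += 1
    return hits / reps

print("independent errors (rho=0.0):", false_positive_rate(0.0))  # near 0.05
print("autocorrelated errors (rho=0.8):", false_positive_rate(0.8))  # far above 0.05
```

With independent errors the false-positive rate sits near the advertised 5%; with strongly autocorrelated errors it is several times that, which is exactly the “spurious relationship” trap described above.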
I’m still honing my own writing craft, and these certainly aren’t the only strategies to use. I’d love to hear from you! What scenarios do you find most challenging to write or talk to others about? What strategies have worked well for you?
How do we learn to balance being precise and faithful to the data while being accessible to a wide audience? Shameless plug time: check out our book. We hope it will help folks navigate that tradeoff and build those data communication skills. We don’t have a magic cure-all; it does take practice. But we hope the book provides readers with guided practice prompts, data-focused examples, and tips that we have found useful in our own writing, including writing the book itself (continuing with the meta thread). We interpret “communication” broadly to include reading, visualization, coding, and yes, even networking, all of which take practice. Want to know more? I’ve written a Twitter thread overview of the book (complete with a promo code) here.
Have a quantitative term or concept that mystifies you? Want it explained simply? Suggest a topic for next month → @sastoudt