Data in Colour: Bringing Photos Into Our Spreadsheets

Image Credit: Shiv’s fotografia, CC BY-SA 4.0, Image Cropped

When I think of the ecological data I typically work with, it usually tells me where plants or animals are, how many of them there are, and how those quantities might change. Most often, these organisms boil down to a few spreadsheet cells. But what if the questions you’re asking are less “where is the organism”, and more “what does it look like”? 

Photographic data is not a new phenomenon for scientists, but thanks to huge leaps in technology (hello, camera phones) it is a booming data source.  Community science – whereby members of the general public submit photos of species they’ve happened across – has seen a huge rise in popularity, thanks to apps and community platforms like iNaturalist. As a result, photo data is constantly growing in abundance, and many studies are quickly adapting to take advantage of this data source.

In this post, I’m going to cherry-pick some studies that have taken this approach and summarize how photos became data. Keep in mind there are many more examples (exhibit A) of this kind of work (and feel free to share your favorites with me), but I hope these examples show the research opportunities that arise when we consider data beyond the numbers.

We’ll start with the study that first sparked my interest in photo data. Moore et al. (2019) found that “greater wing coloration heats males – the magnitude of which improves flight performance under cool conditions but dramatically reduces it under warm conditions” in dragonflies. They were then able to use photos from iNaturalist to show that wing coloration “is dramatically reduced in the hottest portions of the species’ range.” How did they go from photos to a conclusion like this? They manually looked at a bunch of photos and decided if the wing coloration of interest was present or not. People-power took hundreds of photos and turned them into yes/no data. 

What if we don’t have the patience for that? Is there a way to automate the process? Laitly et al. (2020) compared photos taken in controlled settings with those taken “in the wild” by community scientists, using a computer program to automatically extract the color information, coded as RGB (red, green, blue contributions) and HSV (hue, saturation, value) summaries. In this way, a computer does the heavy lifting of translating a photo into numbers.
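To make "translating a photo into numbers" a little more concrete, here is a minimal Python sketch of the general idea. It is not the program Laitly et al. used. It assumes you have already pulled out the pixels of interest as (red, green, blue) triples on a 0-255 scale, and the function name `summarize_color` and the sample pixel values are made up for illustration.

```python
import colorsys

def summarize_color(pixels):
    """Summarize a patch of pixels as a mean RGB colour plus its HSV equivalent.

    `pixels` is a list of (red, green, blue) tuples on a 0-255 scale.
    The HSV values returned by colorsys are each on a 0-1 scale.
    """
    n = len(pixels)
    if n == 0:
        raise ValueError("need at least one pixel")

    # Average each colour channel across the patch.
    mean_r = sum(p[0] for p in pixels) / n
    mean_g = sum(p[1] for p in pixels) / n
    mean_b = sum(p[2] for p in pixels) / n

    # colorsys expects channel values between 0 and 1.
    h, s, v = colorsys.rgb_to_hsv(mean_r / 255, mean_g / 255, mean_b / 255)
    return {"rgb": (mean_r, mean_g, mean_b), "hsv": (h, s, v)}

# Hypothetical pixels sampled from one region of a photo.
wing_patch = [(200, 120, 40), (210, 130, 50), (190, 110, 30)]
summary = summarize_color(wing_patch)
print(summary)
```

Each photo then becomes one row of six numbers, which is exactly the kind of thing a spreadsheet is happy to hold. A real workflow would also need to deal with lighting and background differences between photos, which this sketch ignores.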

This automated analyst approach is enjoying a lot of popularity right now, with many researchers creating machine learning tools that can instantly turn photo (or even video) data into a species identification (with some remaining uncertainty of course). It has some astonishing practical applications too, with a recent study by McClure et al. (2021) showing that optical recognition of approaching birds could automatically slow wind turbines and lead to a drop in bird deaths!

What about museum specimens? Pearson et al. (2021) used a database of herbarium specimens to answer the question, “what is associated with variation in the flowering date of the California Poppy?” They used a mix of manual and automatic approaches. Some specimens were visually inspected to see whether they were flowering or not. Other specimens had metadata associated with the record, which was automatically sifted through via text analysis. Now the computer is turning unstructured words into yes/no data.
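As a rough illustration of how free-text notes can become yes/no data, here is a small Python sketch based on keyword matching. It is only a toy version of the idea, not the text analysis Pearson et al. ran. The keyword patterns, the function name `is_flowering`, and the example notes are all assumptions for the sake of the example.

```python
import re

# Hypothetical keyword patterns; a real study would need a vetted vocabulary.
FLOWERING = re.compile(r"\b(flower\w*|bloom\w*)\b", re.IGNORECASE)
# Catch simple negations such as "no flowers" or "not in flower".
NEGATED = re.compile(
    r"\b(no|not|without)\s+(\w+\s+)?(flower\w*|bloom\w*)\b", re.IGNORECASE
)

def is_flowering(note):
    """Return True if a specimen note appears to mention flowering."""
    if NEGATED.search(note):
        return False
    return bool(FLOWERING.search(note))

# Made-up specimen notes.
notes = [
    "Orange flowers, sandy soil",
    "Fruiting only, no flowers",
    "Collected from dry hillside",
    "Plants in full bloom along roadside",
]
flags = [is_flowering(note) for note in notes]
print(flags)
```

Simple keyword rules like these will miss unusual phrasings, so checking a sample of the results by eye is a sensible safeguard.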

Another personal favorite study looked at the molting patterns of a mountain goat’s winter coat. Nowak et al. (2020) wanted to assess how much of the winter coat had been shed and see what could help explain the patterns they found. This time the researchers needed a way to calculate the areas of the coat that were shed and unshed, which required a meticulous series of steps in Photoshop. They have made a video tutorial if you are interested in the details.

You might have noticed by now that the subtitle of this post is a bit of a misnomer. Even though these researchers started with objects that do not fit into a traditional spreadsheet, they found a way to manipulate photos and text to get the relevant information back into that precious spreadsheet. Fair enough, but hear me out! What I love about these studies is the creativity they require. It takes imagination to see something like a crowd-sourced photo as relevant to an impactful ecological question, and then to find a way to massage that new form of data into an analysis workflow we are already familiar with. Let’s cultivate that creativity, and cheers to the new types of questions we can answer with new types of data!

Have a quantitative term or concept that mystifies you? Want it explained simply? Suggest a topic for next month → @sastoudt

You can read more of Sara’s work at The Stats Corner.
