A Guide to Improving Data Coverage Using Australian Reptiles
Constant improvements in data integration technology mean that it's now possible to bring together large numbers of separate datasets into enormous ones spanning many species and regions. This sounds great in theory: it means we can look at important trends at large scales with plenty of data, right?
The problem is that there are often biases within these data. Some areas are more accessible and will have higher densities of observations or studies. Some species are of less interest and may be more poorly covered. Today’s researchers wanted to take such a dataset and see if they could identify patterns in the biases present.
What They Did
The authors compiled 1201 datasets which included field data on Australian reptiles registered between 1972 and 2017. The datasets had to include some sort of location details. The authors then modelled the locations of the data against four variables: proximity to a university, the reptile species richness of the location, human impact in the area and whether or not the study was located in a protected area.
They then used the effect of these variables to construct a map showing where future studies were unlikely to take place.
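To make the 'cold spot' idea concrete, here is a minimal sketch in Python. It is not the authors' pipeline. It assumes you already have a model score for every grid cell on a map, and simply flags the lowest-scoring cells. The function name, the example grid and the 25% cutoff are all made up for illustration.

```python
import numpy as np

def flag_cold_spots(scores, fraction=0.1):
    """Return a boolean mask marking the lowest-scoring `fraction` of cells.

    `scores` is a 2-D array of predicted sampling likelihood per grid cell
    (a hypothetical input). NaN cells, e.g. ocean, are never flagged.
    """
    scores = np.asarray(scores, dtype=float)
    cutoff = np.nanquantile(scores, fraction)  # quantile ignoring NaN cells
    return scores <= cutoff                    # NaN <= cutoff evaluates to False

# Toy 3x3 map; the NaN in the middle stands in for a cell with no data.
grid = np.array([
    [0.9, 0.8,    0.1],
    [0.7, np.nan, 0.6],
    [0.5, 0.4,    0.3],
])
mask = flag_cold_spots(grid, fraction=0.25)
```

Here `mask` is True for the two lowest-scoring cells (0.1 and 0.3), which are the places the toy model thinks a study is least likely to happen.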
Did You Know: Metadata
Metadata is data that tells you something about a larger group of data. In today’s paper metadata is being used to let scientists know if any biases exist in studies of reptiles in Australia. The tricky thing about data collection is that you have to be sure that you are collecting a non-biased and truly representative sample of all of the available data for your scientific study. Failure to do so can result in biased metadata, which means that you aren’t going to have an accurate representation of the overall data itself.
For example, if the researchers in today’s paper only collected studies that were performed in Melbourne, but ignored every other part of Australia, then they wouldn’t be able to make any statements about overall trends for Australian research.
What They Found
Proximity to universities turned out to be the best predictor of study location by a fair margin, with sampled sites tending to be much closer to universities. This was followed by human footprint and reptile species richness: areas with a higher human footprint and higher species richness were more likely to be sampled. Protected areas were the least important variable, though the results showed that sites inside protected areas were more likely to be sampled.
The figure below shows areas that were predicted to be ‘cold spots’ for reptile research in Australia, based on the values of the variables listed above.
When you’re working with a dataset that shows only presences and no absences, it’s quite common to use MaxEnt (maximum entropy) models. These are usually tricky, as they essentially have to make up absences to compare your presences to. This study is fortunate: because the authors are only looking at study sites, they don’t need to worry about ‘false absences’, where the model makes up an absence at a location where a study actually took place.
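For readers who want to see what 'making up absences' looks like, here is a rough sketch in Python. To be clear about the assumptions: it swaps MaxEnt for a plain logistic regression on presences versus randomly drawn background points, it uses a single simulated covariate standing in for distance to a university, and every name and number in it is hypothetical. It illustrates the general presence-only idea, not the analysis in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_background(covariate_grid, n, rng):
    """Draw n random grid cells to act as background (pseudo-absence) points."""
    idx = rng.choice(len(covariate_grid), size=n, replace=False)
    return covariate_grid[idx]

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Gradient-descent logistic regression; returns (weights, intercept)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability
        grad = p - y                            # gradient of the log-loss
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

# Simulated landscape: one standardised covariate per cell
# (standing in for distance to the nearest university).
grid = rng.normal(size=(5000, 1))

# Simulated study sites: cells closer to a university are more likely sampled.
p_study = 1.0 / (1.0 + np.exp(2.0 * grid[:, 0] + 2.0))
presences = grid[rng.random(5000) < p_study]

# Background points are drawn from the whole landscape, studied or not.
background = sample_background(grid, 1000, rng)

X = np.vstack([presences, background])
y = np.concatenate([np.ones(len(presences)), np.zeros(len(background))])
w, b = fit_logistic(X, y)
```

The fitted weight `w[0]` comes out negative, meaning the model has picked up that simulated study sites sit closer to universities than the landscape as a whole.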
Calls for better coverage of studies have been ongoing ever since we were able to compile large metadatasets. What this study does is show (for a particular taxon) exactly where we need better coverage. The map included above is a great guide for future reptile research.
A further advantage is that this paper gives a pretty straightforward guide to follow for other groups of species as well.
Sam Perrin is a freshwater ecologist currently completing his PhD at the Norwegian University of Science and Technology who might detest birds and fish but is very ambivalent on reptiles. You can read more about Sam’s research on his Ecology for the Masses profile here, and follow him on Twitter @samperrinNTNU.
Title Image Credit: Sam Perrin, CC BY-NC 2.0