Tag Archives: technology

The Modern Biologist’s Challenge: Data Management

Modern biologists often do most of their most integral work not deep in a forest, but sitting behind a laptop while fuelling their caffeine addictions (Image Credit: gdsteam, CC BY 2.0, Image Cropped)

When asked to picture a biologist, chances are that many people will picture someone like Jane Goodall or David Attenborough: a determined scientist wearing zip-off pants and a pair of sturdy boots, making their way through the thick vegetation of a remote Pacific island to study the intricate social behaviour of an elusive ground-dwelling mammal. Yet these days a large portion of modern biologists embark on very different journeys. Equipped with a computer full of code and mathematical models, they venture through a jungle of spreadsheets and tables filled with row upon row of data.

First of all, some nuance is needed. I might fit the picture of the biologist who only leaves their office to refill their coffee mug or cool down after another computer meltdown, but the majority of biologists do fit the above description of the ‘traditional biologist’ to varying degrees. They might spend time out in the field, growing plants in greenhouses or cultivating microorganisms in the lab. But nowadays they’re almost all spending some time wrangling, analyzing and visualizing data behind their computers. And as this type of scientist has slowly become the norm, the amount of biological data floating around has grown exponentially. And this comes with a whole new set of challenges.

The Challenges of Data Management

Good data management is fundamental to producing high-quality research. It starts with the creation and collection of data. Even if the process involves clear protocols, calibrated measuring devices and well-trained volunteers, students or researchers, the many people who are often involved in data collection will introduce errors and biases. Identifying sources of potential error and bias and documenting these explicitly makes it possible to account for them at a later stage, yet this is often hard to do in practice.

After collection, data are digitized and converted into a format suitable for subsequent analyses. During this process, a researcher, often with a particular study or research project in mind, makes any number of small, seemingly insignificant decisions that determine how the data are structured. The number of files to store the data in, variable names and data types might be logical to the researcher who processed the data, but might not appear so obvious to their student. Metadata or similar files and quality checks are often missing, so it is difficult to figure out how to interpret the content of the data. Choosing a consistent, intuitive format that is also usable in future work is not easy. As biologists are rarely trained in data management, the typical dataset may be a database manager’s worst nightmare: unorganized, inaccurate and inefficient.
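To make the idea of metadata and quality checks concrete, here is a minimal sketch of a data dictionary with a simple row check. Everything in it is an assumption for illustration: the column names, units and allowed ranges are invented, and a real project would define its own.

```python
# Hypothetical data dictionary: each column gets a type, a description and,
# where it makes sense, a unit and an allowed range. All values are made up.
DATA_DICTIONARY = {
    "ring_number": {"type": str, "description": "Unique ID of the ringed bird"},
    "body_mass": {"type": float, "unit": "g", "min": 5.0, "max": 60.0},
    "clutch_size": {"type": int, "unit": "eggs", "min": 0, "max": 20},
}


def check_row(row):
    """Return a list of human-readable problems found in one data row."""
    problems = []
    for column, spec in DATA_DICTIONARY.items():
        if column not in row:
            problems.append(f"{column}: missing")
            continue
        value = row[column]
        if not isinstance(value, spec["type"]):
            problems.append(
                f"{column}: expected {spec['type'].__name__}, "
                f"got {type(value).__name__}"
            )
            continue
        if "min" in spec and value < spec["min"]:
            problems.append(f"{column}: {value} below minimum {spec['min']}")
        if "max" in spec and value > spec["max"]:
            problems.append(f"{column}: {value} above maximum {spec['max']}")
    return problems


# A plausible row and one with a likely typo (285.0 g instead of 28.5 g).
good_row = {"ring_number": "AB123", "body_mass": 28.5, "clutch_size": 5}
bad_row = {"ring_number": "AB124", "body_mass": 285.0, "clutch_size": 5}

print(check_row(good_row))
print(check_row(bad_row))
```

Keeping the dictionary next to the data means that a student opening the files later can see what each column means and which values were considered plausible.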

Data management does not only entail the creation and processing of data; it also includes sharing and reusing data by the scientific community. It has become increasingly common to be asked to share the data used in a scientific paper. Online repositories such as Dryad – a community-led platform that is committed to making data available for research and educational reuse – or code-sharing platforms like GitHub are often used, but the available data is often a mere summary of the actual data used. This is not so surprising: imagine being a researcher responsible for the long-term individual-level monitoring of a species that is very dear to you. It can be very frightening to make years and years of commitment and valuable information available to the public, as it means that other researchers can incorporate that data into their own papers, even before you’ve had a chance to publish your own research. Sharing data can, however, be very valuable for the visibility and influence of the owner’s research, encourage collaborations and new research ideas, and improve transparency – a theme of increasing importance in the Open Access movement.

Community Standards and Initiatives

The challenges described above become even clearer when one integrates data from different sources. Inconsistencies and errors accumulate, and the many different formats and data structures make the conversion of these data into a usable format difficult and time-consuming. Luckily, there are some initiatives out there that recognise the problems with data management.

Community data standards are one way to tackle the seemingly infinite number of formats. Community data standards are, as the name implies, data formatting standards commonly used by a community. One of the most widely used data standards is Darwin Core, a standard that offers a clear and flexible framework for compiling biodiversity data using a glossary of terms, but there are numerous data standards tailored for specific research fields (e.g., Open Traits Network, a community of researchers and institutions working towards the standardisation and integration of trait data, and SPI-Birds, a network and database with a community-defined, standardized method for formatting data on hole-nesting birds).
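As a rough illustration of what adopting a standard can look like in practice, the sketch below renames the columns of a single field record to Darwin Core terms. The right-hand names (scientificName, eventDate, decimalLatitude, decimalLongitude, individualCount) are terms from the Darwin Core glossary; the left-hand column names, the example record and the helper function are hypothetical.

```python
# Hypothetical mapping from one lab's own column names to Darwin Core terms.
COLUMN_TO_DWC = {
    "species": "scientificName",
    "date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "count": "individualCount",
}


def to_darwin_core(record):
    """Rename known columns to Darwin Core terms.

    Columns without a mapping are returned separately, so that nothing
    is silently dropped during conversion.
    """
    mapped, unmapped = {}, {}
    for key, value in record.items():
        if key in COLUMN_TO_DWC:
            mapped[COLUMN_TO_DWC[key]] = value
        else:
            unmapped[key] = value
    return mapped, unmapped


# Made-up observation of house sparrows; the values are illustrative only.
field_record = {
    "species": "Passer domesticus",
    "date": "2019-05-14",
    "lat": 63.42,
    "lon": 10.39,
    "count": 3,
    "nest_box": "A12",
}

mapped, unmapped = to_darwin_core(field_record)
print(mapped)
print(unmapped)
```

Returning the unmapped columns explicitly is a deliberate choice here: it forces a decision about fields such as nest_box, which may need a term from a more specialised standard.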

Whilst the ubiquity of the house sparrow means there is plenty of data on it, that data can be a nightmare to bring together (Image Credit: TK McLean, Pixabay licence)

Progress towards integration of data from different sources has also been made through databases and initiatives such as the Global Biodiversity Information Facility (GBIF), an international network and research infrastructure that aims to provide open access to biodiversity data; GenBank, a database of all publicly available DNA sequences; and FORCE11. Using the FAIR principles, the FORCE11 community of researchers, librarians, publishers and funding agencies intends to provide guidelines to improve the findability, accessibility, interoperability (i.e., the ability to integrate with other data sources) and reusability of data and other digital research objects.

Biodiversity is facing unprecedented challenges like climate change, invasive species and habitat loss. To better understand the consequences of these pressures on biodiversity, data from different disciplines need to be integrated, which is only possible if individual datasets are well-managed, interoperable and publicly available.

To find out more about modern data management challenges, read our interview with GBIF’s Head of Informatics Tim Robertson, linked below.

Tim Robertson: The World of Ecological Data

Stefan Vriend is a population ecologist working as a PhD student at the Norwegian University of Science and Technology. Through his work on the spatial variation of hole-nesting bird demography, life history and phenotypic selection he got involved in the SPI-Birds Network and Database. You can read more about his research here, read more of his articles on Ecology for the Masses here or follow him on Twitter here.

The Anthropocene: A Human-Dominated Age on the Horizon

The impact of our species on the conditions and fundamental processes on Earth is unmistakable. From carbon emissions to the cities that dominate skylines to the plastics that swirl around in our seas, the evidence of our existence can be found anywhere. And now, a group of geologists considers our impact so drastic that a new epoch – the Anthropocene – should be declared. Whilst this change has gained support in much of the scientific community, others say that the Anthropocene is more about sensationalism or pop culture than science, as clear evidence for a new geological time is lacking. So whilst much of the scientific community, the general public and the media have already embraced the Anthropocene, the search for hard evidence for the start of a human-dominated age continues.

Read more

Tim Robertson: The World of Ecological Data

Image Credit: GBIF, CC BY 4.0, Image Cropped

When I was a child, I’d often study books of Australian birds and mammals, rifling through the pages to see which species lived nearby. My sources of information were the maps printed next to photos of the species, distribution maps showing the extent of each species’ range. These days, many of these species’ ranges are declining. Or at least, many ecologists believe they are. One of the problems with knowing exactly where species exist or how they are faring is a lack of data. The more data we have, the more precise an idea we get of the future of the species. Some data is difficult to collect, but even more data has already been collected and is simply inaccessible.

At the Living Norway seminar earlier this month I sat down with Tim Robertson, Head of Informatics at the Global Biodiversity Information Facility. GBIF is an international network that works to solve this data problem worldwide, both by making collected data accessible and by helping everyday people to collect scientific data. I spoke with Tim about the journey from a species observation to a species distribution map, the role of GBIF, and the future of data collection.

Read more

Andrew MacDougall: Finding Ecological Solutions for the Farming Industry

Image Credit: W.carter, CC0 1.0, Image Cropped

The farming industry has had a strange relationship with ecology over the years. Farmers have been maligned by claims that they shoot native species, greedily suck up water from nature and from people, and pollute our countryside with pesticides, all whilst producing the food many of us subsist on. So why haven’t ecologists worked with them more closely?

At the recent NØF 2019 Conference, Tanja Petersen and I sat down with Canadian ecologist Professor Andrew MacDougall, who has been working with the farming industry for the past six years to quantify their contribution to ecosystem services. We talked about the often damaging public perception of farmers, how his stereotypes were challenged by working with them, and the biggest problems the industry will face heading into the next fifty years.

Read more

The Changing Face of Ecology: ASFB Edition

I speak to another group of influential researchers on how ecology has changed over the recent decades (Image Credits: Sam Perrin, Mallee Catchment Management Authority, Gretta Pecl, CSIRO, CC BY-SA 2.0, all images cropped)

I’m 29. It’s not like that makes me uniquely qualified to give you the youth’s perspective on ecology today. But it does make me 100% unqualified to talk about how ecology has changed in recent decades. So when I was at the recent Australian Society for Fish Biology Conference (a line you’ll surely be sick of if you’ve been keeping up with my recent interviews), I decided to get some uniquely fishy perspectives on how our discipline has changed over the last 20-30 years.

The following commentaries are naturally from fish biologists. If you’d like a broader perspective on the changing face of ecology, check out Part One and Part Two of this series. You can also find the full interview with all the scientists below by clicking on their names.

Read more

Episode 7: The Monsters of Doctor Who

We pick apart the flaws of the monsters from one of our favourite TV shows, Doctor Who (Image Credit: Doctor Who Spoilers, CC BY 2.0, Image Cropped)

We bash down the doors of the TARDIS and ruthlessly mock the Doctor’s rogue’s gallery. We discover that Dave loves Peter Capaldi, Sam reveals a strange fetish and Adam tries to justify the moon being an egg.

Topics covered: Parasitism, species dispersal, behavioural ecology, the moon

6:30 – What is the Doctor?
11:44 – The Vashta Nerada (Silence in the Library/Forest of the Dead)
19:34 – The Alien Stingrays (Planet of the Dead)
30:34 – The Flood (Waters of Mars)
41:40 – The Krafayis (Vincent and the Doctor)
50:49 – Moon Egg F***er (Kill the Moon)
1:01:00 – Swimmy Long Boi (Thin Ice)
1:08:12 – The Doctor Who Royal Rumble

Paul Hebert: Saving Humanity From a Lonely Planet

Image Credit: Bernard Spragg, CC0 1.0

Earlier this year I sat down with Professor Paul Hebert, leader of the International Barcode of Life project. We talked at length about this project, which you can read more on here. But what’s the use of documenting life on our planet if we don’t use the information? And how do we maintain hope for species in a world where more seem to be dying out every day?

Read more