Tag Archives: management

The Modern Biologist’s Challenge: Data Management

Modern biologists often do most of their most integral work not deep in a forest, but sitting behind a laptop while fuelling their caffeine addictions (Image Credit: gdsteam, CC BY 2.0, Image Cropped)

When asked to picture a biologist, chances are that many people will picture someone like Jane Goodall or David Attenborough: a determined scientist wearing zip-off pants and a pair of sturdy boots, making their way through the thick vegetation of a remote Pacific island to study the intricate social behaviour of an elusive ground-dwelling mammal. Yet these days a large portion of modern biologists embark on very different journeys. Equipped with a computer full of code and mathematical models, they venture through a jungle of spreadsheets and tables filled with row upon row of data.

First of all, some nuance is needed. I might fit the picture of the biologist who only leaves their office to refill their coffee mug or cool down after another computer meltdown, but the majority of biologists do fit the above description of the ‘traditional biologist’ to varying degrees. They might spend time out in the field, growing plants in greenhouses or cultivating microorganisms in the lab. But nowadays they’re almost all spending some time wrangling, analyzing and visualizing data behind their computers. As this type of scientist has slowly become the norm, the amount of biological data floating around has grown exponentially, and this comes with a whole new set of challenges.

The Challenges of Data Management

Good data management is fundamental to producing high-quality research. It starts with the creation and collection of data. Even if the process involves clear protocols, calibrated measuring devices and well-trained volunteers, students or researchers, the many people that are often involved in data collection will introduce errors and biases. Identifying sources of potential error and bias and documenting these explicitly will make it possible to account for them at a later stage, yet often it’s hard to do this.

After collection, data are digitized and converted into a format suitable for subsequent analyses. During this process, a researcher, often with a particular study or research project in mind, makes any number of small, seemingly insignificant decisions that determine how the data are structured. The number of files to store the data in, the variable names and the data types might be logical to the researcher who processed the data, but might not appear so obvious to their student. Metadata files and quality checks are often missing, so it is difficult to figure out how to interpret the content of the data. Choosing a consistent, intuitive format that is also usable in future work is not easy. As biologists are rarely trained in data management, the typical dataset may be a database manager’s worst nightmare: unorganized, inaccurate and inefficient.
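To make the idea of metadata and quality checks a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the nest-box variables, the allowed ranges and the `check_records` helper are invented for illustration and do not come from any particular project or standard. The point is only that a small "data dictionary" describing each variable can double as an automatic check on the data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variable:
    """One entry in a data dictionary: metadata describing a single column."""
    description: str
    dtype: type
    unit: Optional[str] = None
    minimum: Optional[float] = None
    maximum: Optional[float] = None


# Hypothetical data dictionary for an imaginary nest-box dataset.
DATA_DICTIONARY = {
    "nest_id": Variable("Unique nest box identifier", str),
    "clutch_size": Variable("Number of eggs laid", int, unit="eggs",
                            minimum=0, maximum=20),
    "laying_date": Variable("Date the first egg was laid (ISO 8601)", str),
}


def check_records(records, dictionary):
    """Return a list of human-readable problems found in the records."""
    problems = []
    for row, record in enumerate(records, start=1):
        for name, variable in dictionary.items():
            value = record.get(name)
            if value is None:
                problems.append(f"row {row}: '{name}' is missing")
                continue
            if not isinstance(value, variable.dtype):
                problems.append(
                    f"row {row}: '{name}' should be {variable.dtype.__name__}")
                continue
            if variable.minimum is not None and value < variable.minimum:
                problems.append(f"row {row}: '{name}' is below {variable.minimum}")
            if variable.maximum is not None and value > variable.maximum:
                problems.append(f"row {row}: '{name}' is above {variable.maximum}")
        # Columns that nobody documented are flagged as well.
        for name in record:
            if name not in dictionary:
                problems.append(f"row {row}: '{name}' is not documented")
    return problems
```

Running `check_records` on a freshly digitized table before any analysis gives the next person who opens the file (often a student, or the same researcher a few years later) both a description of each variable and a list of entries that need a second look.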

Data management does not only entail the creation and processing of data; it also includes sharing and reusing data by the scientific community. It has become increasingly common to be asked to share the data used in a scientific paper. Online repositories such as Dryad (a community-led platform that is committed to making data available for research and educational reuse) or code-sharing platforms like GitHub are often used, but the available data are often a mere summary of the actual data used. It is not so surprising: imagine being a researcher responsible for the long-term individual-level monitoring of a species that is very dear to you. It can be very frightening to make years and years of commitment and valuable information available to the public, as it means that other researchers can incorporate that data into their own papers, even before you’ve had a chance to publish your own research. Sharing data can, however, be very valuable for the visibility and influence of the owner’s research, encourage collaborations and new research ideas, and improve transparency, a theme of increasing importance in the Open Access movement.

Community Standards and Initiatives

The challenges described above become even clearer when one integrates data from different sources. Inconsistencies and errors accumulate, and the many different formats and data structures make the conversion of these data into a usable format difficult and time-consuming. Luckily, there are some initiatives out there that recognise the problems with data management.

Community data standards are one way to tackle the seemingly endless number of formats. As the name implies, they are data formatting standards commonly used by a community. One of the most widely used data standards is Darwin Core, a standard that offers a clear and flexible framework for compiling biodiversity data using a glossary of terms, but there are numerous data standards tailored for specific research fields (e.g., Open Traits Network, a community of researchers and institutions working towards the standardisation and integration of trait data, and SPI-Birds, a network and database with a community-defined, standardized method for formatting data on hole-nesting birds).
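As a rough illustration of what adopting a shared glossary looks like in practice, the Python sketch below renames the columns of a single record to Darwin Core terms. The terms on the right-hand side (`occurrenceID`, `scientificName`, `eventDate`, `decimalLatitude`, `decimalLongitude`, `individualCount`) are Darwin Core terms; the local column names on the left and the `to_darwin_core` helper are assumptions made up for this example, and a real conversion would involve far more than renaming.

```python
# Hypothetical mapping from one lab's own column names to Darwin Core terms.
LOCAL_TO_DARWIN_CORE = {
    "record": "occurrenceID",
    "species": "scientificName",
    "date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "count": "individualCount",
}


def to_darwin_core(record, mapping=LOCAL_TO_DARWIN_CORE):
    """Rename the keys of one record to Darwin Core terms.

    Columns without a chosen term raise an error instead of being
    silently dropped, so that gaps in the mapping are noticed.
    """
    unmapped = sorted(set(record) - set(mapping))
    if unmapped:
        raise KeyError(f"no Darwin Core term chosen for: {', '.join(unmapped)}")
    return {mapping[name]: value for name, value in record.items()}


sighting = {"species": "Passer domesticus", "date": "2019-05-02", "count": 3}
standardised = to_darwin_core(sighting)
```

Once two datasets use the same terms, combining them no longer depends on guessing whether one group's `date` means the same thing as another group's `obs_day`.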


Whilst the ubiquity of the house sparrow means there is plenty of data on it, that data can be a nightmare to bring together (Image Credit: TK McLean, Pixabay licence)

Progress towards integration of data from different sources has also been made through databases and initiatives such as the Global Biodiversity Information Facility (GBIF), an international network and research infrastructure that aims to provide open access to biodiversity data; GenBank, a database of all publicly available DNA sequences; and FORCE11. Using the FAIR principles, the FORCE11 community of researchers, librarians, publishers and funding agencies intends to provide guidelines to improve the findability, accessibility, interoperability (i.e., the ability to integrate with other data sources) and reusability of data and other digital research objects.

Biodiversity is facing unprecedented challenges like climate change, invasive species and habitat loss. To better understand the consequences of these pressures on biodiversity, data from different disciplines need to be integrated, which is only possible if individual datasets are well-managed, interoperable and publicly available.

To find out more about modern data management challenges, read our interview with GBIF’s Head of Informatics Tim Robertson, linked below.

Tim Robertson: The World of Ecological Data

Stefan Vriend is a population ecologist working as a PhD student at the Norwegian University of Science and Technology. Through his work on the spatial variation of hole-nesting bird demography, life history and phenotypic selection he got involved in the SPI-Birds Network and Database. You can read more about his research here, read more of his articles on Ecology for the Masses here or follow him on Twitter here.

Fredrik Widemo: The Manifold Conflicts Behind the Hunting Industry

Image Credit: USFWS Endangered Species, CC BY 2.0, Image Cropped

Rewilding is a tricky business. Bringing back species that once roamed a country as their native land may seem like a worthy cause, but it is often fraught with conflict. People don’t want predators threatening their safety, or herbivores destroying their crops. Rural vs. urban tensions come into play. Local and federal politics get thrown into the mix.

With that in mind, I sat down with Associate Professor Fredrik Widemo, currently a Senior Lecturer with the Swedish University of Agricultural Sciences. Fredrik has previously worked at both the Swedish Association for Hunting and Wildlife Management (where he was the Director of Science) and the Swedish Biodiversity Centre. We explored some of the complexities behind the rewilding of wolves and its effects on the hunting and forestry industries in Sweden.

Read more

Modernising Ecological Data Management: Reflections from the Living Norway Seminar

Ecological data is constantly being collected worldwide, but how accessible is it? (Image Credit: GBIF, CC BY 4.0, Image Cropped)

This week Trondheim played host to Living Norway, a Norwegian collective that aims to promote FAIR data use and management. It might sound dry from an ecological perspective, but I was told I’d see my supervisor wearing a suit jacket, an opportunity too preposterous to miss. While the latter opportunity was certainly a highlight, the seminar itself proved fascinating, and underlined just how important FAIR data is for ecology, and science in general. So why is it so important, what can we do to help, and why do I keep capitalising FAIR?

Read more

Re-Analysing Forest Biodiversity

The Gribskov Forest in Denmark, where this study took place (Image Credit: Malene Thyssen, CC BY-SA 3.0, Image Cropped)

Biodiversity response to forest structure and management: Comparing species richness, conservation relevant species and functional diversity as metrics in forest conservation (2019) Lelli et al., Forest Ecology and Management, https://doi.org/10.1016/j.foreco.2018.09.057

The Crux

The classification of biodiversity is something that has become more and more relevant as the term ‘biodiversity’ has worked its way into the public’s vernacular. How we measure biodiversity can vastly influence our perception of it, and whilst we’ve previously looked at spatial interpretations of biodiversity on EcoMass, today I’m examining a paper that looks at interpretations of biodiversity by species groups.

Species richness (how many species are present in a given place) is often the go-to measurement for biodiversity. But it doesn’t always help when trying to conserve an ecosystem. For instance, we may wish to focus on certain types of species which are rare, or that preserve certain ecosystem functions. This paper looks at the differences in the effect of management on biodiversity, depending on which approach to biodiversity you take.

Read more

Policy for the Masses: Thoughts from a Day with IPBES

Bill Sutherland was one of two keynote speakers in last week’s seminar on biodiversity and ecosystem services (Image Credit: Øystein Kielland, NTNU University Museum, CC BY 2.0)

I’ve been on a bit of a policy trip lately. The latest Norwegian Ecological Society conference was heavily policy based, so much so that it inspired me to get in touch and set up a meeting with local freshwater managers in a country in which I do not speak the local language. So when a one-day seminar on the Intergovernmental Science-Policy Platform for Biodiversity and Ecosystem Services (mercifully usually referred to only as IPBES), hosted by the CBD, rolled into town, I was right on board.

Read more

Fishers and Fish Science: The Australian Fish Scientist Perspective

Fishing is an important part of Australian society. So is communication between fish scientists and fishers strong enough? (Image Credit: State Library of Queensland, Image Cropped)

Last Thursday, I posted an article on the need for more communication between the fish scientist community and the fishing community, which you can find here. It gives a breakdown of why better communication between the two groups is mutually beneficial, and how it could be improved. The piece was written after talks with a number of prominent Australian fish biologists, whose thoughts I’ve shared in more detail below.

Read more
