Donald Hobern: Cataloguing the Planet’s DNA

Posted on July 8, 2019 by Sam Perrin

I spoke with iBOL’s executive secretary, former GBIF executive secretary and amateur lepidopterist Donald Hobern about how DNA barcoding fits into modern conservation and ecology (Image Credit: Donald Hobern, CC BY 2.0, Image Cropped)

DNA barcoding has revolutionised science. Ask anyone working in evolution or taxonomy these days about the biggest changes they’ve seen in their discipline, and chances are they’ll mention gene sequencing and DNA processing. So when the International Barcode of Life (iBOL) Conference came to Trondheim last week, I jumped at the opportunity to learn more about the behind-the-scenes work that goes into cataloguing the DNA barcodes of life on Earth.

I sat down with Donald Hobern, Executive Secretary of iBOL, former Executive Secretary of the Global Biodiversity Information Facility (GBIF) and former Director of the Atlas of Living Australia (ALA). Donald joined iBOL just as it launched BIOSCAN, a $180 million program that aims to accelerate the cataloguing of the world’s biodiversity in DNA form. We spoke about BIOSCAN, the technology behind bringing occurrence and genetic data together, and how the work of iBOL and GBIF ties into the bigger picture of global conservation and sustainability.

Sam Perrin (SP): You began your career at business tech giant IBM. What was your journey from IBM to iBOL like?

Donald Hobern, Executive Secretary, iBOL (DH): As a child, my only real interest was natural history. Had things gone a different way I might have become a zoologist, but I dropped chemistry in high school and ended up as a programmer instead. I worked for IBM UK, for a total of 13 years. Eventually I emigrated to New Zealand. All this time I had been doing lots of birdwatching, but with small children I found it harder to get out and about, so I shifted my attention to things that would come to me, particularly moths. In New Zealand there were very few information resources, so organising the scattered data I could find on the web about all these insects was difficult. So I started playing around, making my own little databases and identification tools. At the same time, I was getting increasingly tired of just supporting IT customers.

I was searching the web for something new, found GBIF’s early job adverts in 2002, and applied. I moved from New Zealand to Denmark to take on responsibility for the early work around data standards and interoperability for global biodiversity data. I was very fortunate to fall right into the centre of the movement to integrate biodiversity information. Since then I’ve had the privilege of leading the Atlas of Living Australia, Australia’s national facility for organising biodiversity information, then coming back to Copenhagen and leading GBIF for seven and a half years. Through all of that, I’ve had the opportunity to see the whole landscape of efforts around the world to understand and map biodiversity, and to provide the kind of information we need to conserve the natural world.

SP: How would you best relate the work that GBIF, ALA and iBOL do to the bigger picture?

DH: A lot of it fits within larger themes, such as the work of the Convention on Biological Diversity and, right at the top, the United Nations Sustainable Development Goals (SDGs). So I’m interested in how every vegetation survey or insect collection can feed into the enormous, interconnected understanding of the world that we need if we’re going to treat biodiversity responsibly as part of the world we live in.

I imagine it this way. If there’s a politician who’s really serious about a fully rounded agenda for improving the life of their people and leaving behind a liveable world, I’d like them to be thinking about every single one of the SDGs. But if I were in that position and trying to make judgements about how to maintain biodiversity as part of that, I would need to know so many things. What’s our projected level of urbanisation? If my economy is decarbonising, what sort of land do I need for wind or solar farms? Today, it would be very hard to put biodiversity properly into that equation.

So for a long time I’ve been interested in the work of iBOL, and in the progress made with DNA barcoding as a way to shine a light into the parts of natural systems where much of the real species turnover happens: where much of the real diversity exists, and yet where we have so little data today. If we can use DNA barcoding to fill out our understanding of the diversity on Earth and support future taxonomic and conservation efforts, that’s a huge win for me and for the planet.

“To my mind, our first question is how many species exist, and how do we recognise them. The second question is how these species are organised in time and space. If we could answer those two things, we’d have a lot of what I think we need if we’re going to support those sustainable development goals.” (Image Credit: Donald Hobern, CC BY-SA 2.0)

SP: You’ve spent a lot of time with GBIF bringing together datasets from everywhere. What’s the biggest challenge integrating such disparate data?

DH: The biggest challenge for us is that the identification of organisms remains very, very difficult. Understanding what species are recorded is difficult, whether it’s a specimen in a collection or a field observation by a naturalist, or something referred to in the literature from 50 years ago. We may have a name, but there’s very rarely adequate information associated with that name to be able fully to untangle all of the possible sources of error and develop a confidence metric around that identification.

One of the other activities GBIF is heavily involved in is working on the fundamental catalogue of species that currently have defined names, and trying to link it to the species for which we have DNA-based identifications. Many species IDs are Latin binomials, but increasingly we have IDs based on things like Barcode Index Numbers (BINs).

Historically the greatest challenge we’ve had was getting data to be sufficiently open. When GBIF first started there was a lot of concern, even from institutions that were major proponents of a global biodiversity facility. One concern was that if they shared their data freely on the web, someone would download it, put it on a CD and sell it for profit. Thankfully none of that ever seems to have happened. Meanwhile, over the last 20 years we’ve seen a growth in open data and open science, to the point where funding bodies actually expect data to be open. Today the main barrier to sharing data is the little bit of extra work people have to do to share it, so we need to lower that technical threshold.

SP: Can you take me through what BIOSCAN is?

DH: iBOL has evolved since its early days. The initial focus was on Sanger sequencing. The idea was to develop a library of barcodes, primarily from museum specimens but also from some freshly collected organisms. The efforts of iBOL and the energies of so many groups around the world meant that we reached the end of the first phase of this activity with a reference library for around 500,000 species. Over that period, and in the couple of years since the first phase ended, we’ve seen sequencing technologies advance, giving us more and more scope to keep lowering the price of sequencing new material.

Now we’ve hit the point where costs make it much more feasible to think about massively scaling up the number of individual specimens we sample. That really accelerates building a library of barcodes that is much closer to the known range of biodiversity on Earth. The same reduction in costs makes it much more viable to use metabarcoding as a general-purpose tool for repeated surveying and sampling of the environment, rather than spending a lot of time looking for one or two particular species a couple of times. We’re probably on the edge of solutions that let us almost ‘stream’ biodiversity data, just as we monitor atmospheric carbon or ocean pH, with sensors giving us continuous signals. We’re not quite there yet, but most of the tools are in place and we’re making progress on the rest.

BIOSCAN is really the fusion of these things, an attempt to use the current state of technology over the next few years to accelerate the filling out of a reference library, which is the fundamental bedrock of what iBOL does. We have the tools, the processes, the pipelines for establishing a global biodiversity monitoring system based on DNA identifications. BIOSCAN brings that together. To my mind, our first question is how many species exist, and how do we recognise them. The second question is how these species are organised in time and space. If we could answer those two things, we’d have a lot of what I think we need if we’re going to support those sustainable development goals.

SP: What sort of implications does this have for ecology?

DH: At a basic level, iBOL is producing a list of individual species. But every species is itself a universe of associations with other species: microbes, bacteria, fungi, parasitoids, food. If we’re able to barcode those simultaneously with the central organism they cluster around, we start building up pictures of that individual’s interconnectedness. It gives us the tools to probe really interesting questions about evolution and ecology, particularly community ecology.

SP: What’s the leap you’d like to see in barcoding technology?

DH: I’m a passionate amateur naturalist. I spend a lot of time on iNaturalist uploading photos and working with others on identifications, particularly for Australian insects. I live next to probably the most intensively sampled patch of land in Australia, at least from the standpoint of insect taxonomy: right next to Black Mountain, where the Australian National Insect Collection is based. There have been decades of pretty intensive collecting in the area. But even there, of the thousand or so moth species readily found in my garden, probably 20-30% can be assigned to a genus but have no formal name. So it would be fabulous to know whether the ones I’m getting in my garden and the ones somebody is getting 800 km away are the same thing or different, and for us to be able to collaborate in providing much more robust information about these insects that fascinate us all, rather than just sticking them in an undiscriminated genus ‘bucket’.

Even today in Australia, I find that people are using the image-based information they can get from the Barcode of Life Data (BOLD) System to help with diagnosis. But we don’t know what all the different forms represented on BOLD really are. We don’t necessarily know in each case whether it’s a good species or just a well-marked variant of a known species; no-one has dug deeply enough into all of them. So even if I couldn’t necessarily tell which species something was, I would love at the very least to be able to get barcodes cheaply, and for others to be able to do the same, so we can compare our findings. Until recently we’ve been talking about paying $1,000 to submit a plate of 96 specimens and get a DNA barcode for each one. If we could get that down to 20 or 50 cents per specimen, I think there would be a radical upsurge in amateurs wanting to take up this opportunity to understand more about the insects in their own local patch.
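The size of that gap is easy to put a number on. Below is a back-of-envelope sketch in Python using only the figures Donald quotes (about $1,000 for a plate of 96 specimens, versus a hoped-for 20 or 50 cents each). The helper name `cost_per_specimen` is hypothetical, and the sketch assumes one specimen per well and ignores collecting and shipping costs.

```python
# Back-of-envelope comparison using the figures quoted in the interview.
# Assumes one specimen per well; collecting and shipping costs are ignored.

def cost_per_specimen(plate_cost: float, wells: int = 96) -> float:
    """Cost of barcoding one specimen when a whole plate is paid for."""
    return plate_cost / wells


current = cost_per_specimen(1000.0)  # roughly $10.42 per specimen
for target in (0.20, 0.50):
    factor = current / target
    print(f"${current:.2f} -> ${target:.2f} per specimen: about {factor:.0f}x cheaper")
```

On those assumptions, the price points Donald mentions would be roughly 20 to 50 times cheaper per specimen than a full plate at today’s quoted price.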

SP: So how do we make that happen?

DH: If we get to the point where we can secure enough funding to write off a large part of the world’s sequencing costs, everything will change. Take running a Malaise trap for a year as an example. If sequencing is off the table as a cost, then you have the cost of the trap, the consumables and ethanol, and probably some postage, so you’re probably only talking about a few hundred dollars per trap per year. And if we had price points for different countries and sampling methods, whether that’s a collecting plate fastened to the seabed or an insect trap in a coniferous forest, then we can start thinking about more novel ways to engage a much broader part of the community in cataloguing life on Earth.

For this to happen, we need to make the idea and the information more accessible. I picture a web-based view for each collecting site that shows you the organisms recorded at that site this month: the ones we have binomials for and the ones we don’t, which ones we have sequences for, and which ones we have images of. Then you can explore everything from temporal diversity to an organism’s seasonal changes. Exposing all kinds of diversity in a very visible way could really connect with the public and with educators, as well as thrilling the heart of amateur naturalists like myself.

Donald is a passionate amateur naturalist and has a fantastic Flickr account (Image Credit: Donald Hobern, CC BY 2.0)

SP: We’re talking about big leaps in technology on the DNA side, but there’s ongoing development on the informatics side too. Have there been times where growth on either side hasn’t matched the other?

DH: I’d actually say that there are four interconnecting areas where our efforts have to keep lining up: the sequencing technologies and the associated bioinformatics, plus long-term storage of both the digital products and the physical/DNA samples. I think those four things are going to keep stretching us in all directions.

Firstly, potentially vast volumes of metabarcoding data could be coming to us from many different sources. If we get to the point where sensors in the environment are more or less continually streaming, that could mean hideous volumes of data.

Secondly, one of the exciting things about barcoding is that it decouples two things we previously always had to treat in sequential order. Unless you had sorted out the taxonomy, you couldn’t clearly record what you had found in the environment. That’s what’s called the ‘taxonomic impediment’. What DNA barcodes do is make it possible to desynchronise collection and identification. But if we really want to benefit as much as we can from that desynchronisation, we need an IT system that continuously updates the understanding and labelling of the barcodes collected, based on increasingly fine-grained taxonomic understanding. So as a new organism is described, any matching barcodes that previously sat in unnamed BINs within a genus become associated with the species name, along with all the space/time distribution data they bring. There are challenges in building a continuously reinterpreting system like that.
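The core idea of such a continuously reinterpreting system can be sketched in a few lines. The Python below is a hypothetical illustration, not iBOL’s or BOLD’s actual software: the class names, the BIN labels ("BIN-A", "BIN-B") and the species name are made up, and it assumes the simplest case, where one BIN maps to exactly one newly described species. Records are stored with only a BIN, a site and a date; when a species is described and linked to a BIN, every earlier record in that BIN picks up the name along with its place and date.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class BarcodeRecord:
    """One sequenced specimen: its BIN plus where and when it was collected."""
    bin_id: str
    site: str
    date: str
    name: Optional[str] = None  # formal species name, once one is known


class BarcodeCatalogue:
    """Hypothetical store that re-labels records whenever a species is described."""

    def __init__(self) -> None:
        self.records: list[BarcodeRecord] = []
        self.bin_names: dict[str, str] = {}  # BIN -> formal species name

    def add(self, record: BarcodeRecord) -> None:
        # Records can be stored before anyone knows the species name;
        # if the BIN is already linked to a name, apply it straight away.
        if record.bin_id in self.bin_names:
            record.name = self.bin_names[record.bin_id]
        self.records.append(record)

    def describe_species(self, bin_id: str, name: str) -> int:
        # A new species description is linked to a BIN: every stored record in
        # that BIN inherits the name. Returns how many records were re-labelled.
        self.bin_names[bin_id] = name
        updated = 0
        for record in self.records:
            if record.bin_id == bin_id and record.name != name:
                record.name = name
                updated += 1
        return updated

    def occurrences(self, name: str) -> list[tuple[str, str]]:
        # The space/time data now attached to the formal name.
        return [(r.site, r.date) for r in self.records if r.name == name]


catalogue = BarcodeCatalogue()
catalogue.add(BarcodeRecord("BIN-A", "my garden", "2019-03-02"))
catalogue.add(BarcodeRecord("BIN-A", "site 800 km away", "2019-04-11"))
catalogue.add(BarcodeRecord("BIN-B", "my garden", "2019-03-02"))
print(catalogue.describe_species("BIN-A", "Example species"))  # 2
print(catalogue.occurrences("Example species"))
```

A real system would also have to cope with BINs being split or merged as taxonomy is revised, which this sketch deliberately leaves out.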

SP: These are big leaps we’re talking about. So how do we translate those technological leaps into changes in society?

DH: Well, the biggest challenge is how to communicate this in a way that overcomes the increasingly deep-seated scepticism about science generally. And that’s difficult, because you’re dealing with people who are not interested in facts. That’s going to be tough.

I’m very much an analytic, fact-based person. And so my assumption has always been that we have to make it easier for people to get access to information that allows them to make intelligent decisions. But right now, we’re not limited by knowledge, we’re limited by lack of will to act. And I’m not sure how much that’s going to change just from having more information.

Yesterday there were photos in the Guardian of sled dogs running across ice in Greenland where the top ten centimetres had thawed. Images like that have an enormous effect, like polar bears on ice floes. I think we have to allow some optimism for the fact that single images, single things that really are just memes, can have really deep effects. The view of the Earth from space affected how people saw the fragility of the environment.

But on the data side, we’re starting to see a public response to extreme weather events and rapid changes in the environment, as well as to things like the ‘insect apocalypse’ and the IPBES global assessment summary. That’s some cause for encouragement.