Europe has taken action to halt the loss of biodiversity. Last September saw the official launch of the Biodiversity Genomics Europe (BGE) project, funded by the European Union with 21 million euros, with the aim of boosting genomics as a tool for biodiversity conservation in Europe.
To achieve this, the umbrella project brings together two existing European networks based on two different DNA technologies:
- European Reference Genome Atlas (ERGA), which is the European node of the US-led Earth Biogenome Project (EBP). ERGA in turn includes other initiatives at regional level (such as the Catalan Initiative) or at taxonomic level (such as the Vertebrate Genomes Project). The aim of ERGA and the Earth Biogenome Project is to promote and facilitate the sequencing of the complete genome of all eukaryotic biodiversity.
- BIOSCAN, which is part of the International Barcode of Life (IBOL) network, led from Canada. The aim of this network is to determine, not the entire genome, but short DNA sequences (one or a few genes) that act as barcodes. These serve to differentiate between species, just as conventional barcodes distinguish products in a supermarket, for example. This project aims to barcode thousands of individuals of each species.
“They are two different strategies, two extremes of a range,” explains Rosa Fernández, researcher at the Institute for Evolutionary Biology (IBE: CSIC-UPF) and representative of the Spanish board of ERGA. This is what is called “depth vs breadth”, and the approach chosen depends on what you want to achieve.
On the one hand, with complete sequencing, the whole genome of one or a few individuals of each species is sequenced (at a cost of about 5000€ per gigabase of genome size). This offers more genomic information to understand the tree of life (especially for lesser-known organisms) and to study evolution and adaptations, which can be used for species conservation or even to discover substances produced by these organisms that may be relevant at the biomedical level.
On the other hand, barcoding sequences only one or two genes from thousands of individuals, cheaply obtaining a small amount of information from each of them (about 3-4€ per gene per species). This makes it possible to understand the composition of whole communities and to biomonitor them, to get a better view of variability, to identify cryptic species, and to see which populations have greater genetic diversity, and thus to manage biodiversity.
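To make the “depth vs breadth” trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the approximate prices quoted above; the function names and the example numbers (a 2 Gb genome, 1,000 species) are assumptions chosen for illustration, not project figures.

```python
# Rough cost comparison using the approximate figures quoted in the article.
# Function names and example inputs are hypothetical, for illustration only.

COST_PER_GIGABASE_EUR = 5000      # whole-genome sequencing, per Gb of genome size
BARCODE_COST_RANGE_EUR = (3, 4)   # barcoding, per gene per species


def genome_cost(genome_size_gb, individuals=1):
    """Approximate cost (EUR) of fully sequencing `individuals` genomes."""
    return COST_PER_GIGABASE_EUR * genome_size_gb * individuals


def barcode_cost_range(genes, species):
    """Approximate (low, high) cost (EUR) of barcoding `genes` genes in `species` species."""
    low, high = BARCODE_COST_RANGE_EUR
    return (low * genes * species, high * genes * species)


# Assumed example: one 2 Gb genome vs. one barcode gene for 1,000 species.
print(genome_cost(2))               # 10000
print(barcode_cost_range(1, 1000))  # (3000, 4000)
```

Under these assumed inputs, a single reference genome costs more than barcoding one gene across a thousand species, which is why the two approaches answer different questions.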
Eighty percent of the world’s species have not yet been discovered or described at a scientific level. Therefore, the project aims to create a DNA barcode for each species, with the goal of creating an “inventory of life on Earth”.
In this first phase, the BGE project aims to sequence about 500 species and to obtain the “barcodes” of several thousand more species, through ERGA and BIOSCAN.
Large-scale collaboration, IBE leadership and the role of CNAG
A project of this magnitude cannot be carried out alone. It therefore relies on an unprecedented volume of scientific collaboration, with about 30 partners from 20 European countries.
In Spain, ten CSIC centers have received 1.4 M€: 800,000 for ERGA and 600,000 for BIOSCAN. This will partly cover the collection of about 50 endemic and protected species of the Iberian Peninsula, whose genomes will be sequenced as part of the 500 reference genomes that ERGA intends to sequence. Collecting the samples from all over Spain, coordinated from IBE, involves a great deal of management work: deciding which species to sequence and handling all the logistics of collection, from collecting on dry ice and obtaining the necessary permits to sending the samples to the sequencing services.
“In order not to reinvent the wheel, a lot of coordination is needed”
Rosa Fernández (IBE:CSIC-UPF)
The sequencing and genome assembly will be the responsibility of another center linked to the PRBB, the CNAG-Centre for Genomic Regulation (CRG). The National Center for Genomic Analysis (CNAG) is one of the five major DNA sequencing and data analysis centers that will contribute to the project. It is estimated that it will participate in assembling the reference genomes of more than 100 species (the 50 Spanish species and others from the rest of Europe), which will be made available to the international community through public repositories.
The species to be sequenced in the Biodiversity Genomics Europe (BGE) project will come from:
- 3 biodiversity hotspots, in Spain, Greece and Slovenia, in each of which about 50 species endemic to the country will be sequenced.
- community samplings: species nominated by the entire ERGA community at the European level, with an emphasis on collecting and sequencing species from all countries.
- specific species considered ‘case studies’ because of their special interest, e.g. as disease transmitters.
But how are the specific species to be sequenced chosen? “We are developing prioritization rules for community sampling to decide which species are sequenced. These include the interest of the species (e.g. if no other member of that family has been sequenced), the size of the genome, good representation at the country level, etc.,” says Fernández. “In addition, the idea is also to keep parts of the samples in museums, and that is why we have collaborations with the Museum of Natural Sciences in Madrid, among others,” adds the biologist. Part of the funding will also help develop cell cultures of species for which there are still none: a kind of “cryozoo” that will be directed by Tomàs Marquès-Bonet, also from IBE.
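The prioritization criteria Fernández mentions (taxonomic novelty, genome size, country representation) could be combined into a simple ranking. The Python sketch below is purely illustrative: the project's actual rules are still being developed, and the weights, field names, placeholder species and the preference for smaller genomes are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # All fields are hypothetical stand-ins for the criteria quoted above.
    name: str
    family_already_sequenced: bool
    genome_size_gb: float
    species_already_chosen_from_country: int


def priority_score(c):
    """Toy score: higher means sequence sooner. Weights are invented."""
    score = 0.0
    if not c.family_already_sequenced:
        score += 2.0                      # reward taxonomic novelty
    score -= c.genome_size_gb             # assumption: smaller genomes cost less
    score -= 0.5 * c.species_already_chosen_from_country  # spread across countries
    return score


candidates = [
    Candidate("species C", False, 3.0, 2),
    Candidate("species A", False, 1.0, 0),
    Candidate("species B", True, 0.5, 0),
]

# Rank candidates from highest to lowest priority.
ranked = sorted(candidates, key=priority_score, reverse=True)
print([c.name for c in ranked])
```

A weighted sum is only one possible design; hard filters (for example, a maximum genome size) or per-country quotas would encode the same criteria differently.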
“Spain will contribute the reference genomes of approximately 50 species as part of the project. IBE will coordinate the collection of samples, and CNAG will carry out the sequencing”
A large project full of challenges
As is to be expected, such a project brings with it great challenges. “In such long and complex processes it is difficult to predict all the intermediate steps and how to connect them – the collection, sequencing and analysis; from who is in charge of what or who should be at each meeting, to understanding what the people who come in the next step need,” Fernández tells us. “We have people who are experts in different fields, and sometimes we don’t speak the same language… a person who works in the field, or someone who does experiments in the lab, or someone who does computational analysis… you work at different paces, you need different things. And you also have to take into account legal and ethical issues of sample collection and shipping. For example, sending a sample from Greece to England now requires permits that are not needed to send it from Spain.”
Apart from the logistical issues, there are also more conceptual challenges. “Defining what a reference genome is (i.e. of sufficient quality), which is what this project aims to do, is not always easy. We try to achieve high quality genomes, but we define this quality based on the experience we have from vertebrates, which are very homogeneous. However, non-model animals are much more variable, more unknown, and perhaps the maximum possible quality is much lower than that of vertebrates,” explains the researcher.