Tomàs Marquès-Bonet is the Principal Investigator of the Comparative Genomics Lab, at the Institute for Evolutionary Biology (IBE: CSIC-UPF), and for years he has been participating in a project that is revolutionizing the world of genomics and biodiversity: the Vertebrate Genomes Project.
Now, with the publication of an article in Nature, the project has put the first pieces of this 70,000-piece genomic puzzle in place. We wanted to know more.
Obligatory first question: What is the Vertebrate Genomes Project?
More than 10 years ago, a group of us, seeing the global biodiversity crisis and aware that we lacked a reference genome (a complete, high-quality genome) for many species, started a project to sequence 10,000 vertebrates, giving priority to the most threatened species: this was the Genome10K consortium.
We started working and, although 10,000 species was already a very ambitious goal, in 2017 we started the Vertebrate Genomes Project (VGP), which is an international project that aims to sequence, over the next decade, the 70,000 species of vertebrates that exist on Earth nowadays.
All these years of work have culminated in this first article in Nature, where we have already published 16 complete genomes.
From 16 to 70,000 there’s a big gap … will 10 years be enough?
The article is only the starting gun of the project. What we wanted was to publicize the gold standard that we have created for sequencing genomes of the highest quality. Since the beginning of the project we have explored tests and techniques, and we can say that we have now found the method: the combination of genomic techniques that allows us to obtain reference genomes cheaply enough but with excellent quality, almost error-free.
“With the article we want to make a call to use this set of techniques and thus generate high quality genomes that serve the entire scientific community”
We started by publishing 16 genomes because the importance of the article was not the number but the sequencing methodology. Now we are preparing the next article with more than 130 species, and the goal will be to try to reach 500 by the end of the year.
In terms of time, the limiting factor is the samples: for many species there is no good-quality material, or it is very difficult to get. If we had the samples, there would be no technical obstacles, and with distributed networks in which laboratories from all over the world collaborate, we could do it in ten years without problems.
“We encourage any group that has samples to contact us. We will look for a way to finance the collaboration and obtain data of the same quality to be able to compare them.”
You say that the technical level is not a problem, but getting these high-quality genomes cannot be easy… What challenges have you faced during the process?
One of the crucial parts of this project (and of genomics in general) is knowing how to assemble the parts of a reference genome. From a group of cells with the same DNA we have to read the bases that compose it one by one and, therefore, it is not a simple process.
We have been working on the technical aspect for many years, but the key point that has allowed us to launch the VGP now and not before is that, in recent years, the field of genomics has seen a revolution in the way these reference genomes are assembled. Now we have the technical ability to read the entire genome from end to end and reach regions of the genome that we could not read before.
“When it comes to reading and assembling genomes, until four years ago it was as if we watched TV from the 80s at 480p and now we do it in 4K”
This reading ability, which until now was reserved for the analysis of the human genome or model organisms for biomedicine (C. elegans, mouse …) can now be applied to the genome of vertebrates in a generalized way.
Now that you are talking about biomedicine… Could this methodology that you have developed be applied to other branches of research or to human health?
Of course! One of the reasons why we insist so much on the standardization and quality of the genomes generated in this project is to be able to compare them with the human genome and thus interpret it, which I believe will be the challenge of genomics in the 21st century.
There are many fragments of our genome whose function we still do not know, and this is where comparative genomics can help us, since all vertebrates share thousands of characteristics (systems, evolution and development, brain …).
“Having hundreds or thousands of complete vertebrate genomes opens the door to comparing and interpreting the human genome and diseases”
Thus, while we generate data that serve to better understand the human genome, we want to create reference genomes for the conservation of biodiversity, which is a crucial issue.
With these data we want to facilitate the work of the communities that work with highly threatened species. With this information they will be able to include genomics in their conservation policies in an economical and simple way.
About biodiversity conservation, do you feel that you are in a race against time with this project?
The obvious answer is yes: while we wait for the technology to exist, every year hundreds of species disappear in the world. Therefore, it is very important to start, in parallel to these sequencing efforts, other projects that help conserve, for example, viable cell lines of animals, and thus contribute to the conservation of biodiversity.
“There are already many cases of species that will never return because we do not have living material. If we don’t keep material from different species now, we may not be able to do so in the near future”
How important is collaboration between groups on projects like the VGP?
All these projects need multidisciplinary knowledge: zoology, taxonomy, bioinformatics, phylogenetics… They are large-scale projects that require the participation of centers and laboratories from all over the world in order to advance towards the common goal: the conservation of biodiversity.
For example, all the data generated by the VGP is open access: the genomes are public, and the only thing we ask is that users contact us to explain what they want to do. We also ask them to wait until we publish our results, since we are the ones that generated the genomes.
“Our intention is to publish all the genomes in public databases as we sequence them”
This data could therefore be used for other projects, right?
Totally! In fact, the VGP is part of another even more ambitious project that wants to sequence the entire DNA of the Earth: the Earth Biogenome Project (EBP).
In addition, at the European level, the European Reference Genome Atlas (ERGA) was created, which is the association of 400 researchers from different fields to sequence all organisms in Europe. The ERGA has different nodes, such as ERGA Spain, where Rosa Fernández (IBE) is one of the two representatives of the project.
The IBE is very involved with these initiatives! Your group, in particular, what role has it played in the VGP?
That’s true! But other centers of the Barcelona Biomedical Research Park (PRBB), such as the National Center for Genomic Analysis (CNAG) – Centre for Genomic Regulation (CRG) also participated actively. The CNAG-CRG is one of the most important hubs in Europe at the sequencing level and Ivo Gut is one of the authors of the article, since part of the Oxford Nanopore data (one of the techniques that has been used) was generated under his leadership.
In addition, the CNAG-CRG is expected to be one of the genome production centers in Europe in the ERGA, and they will maintain the quality standards that we have imposed on the VGP.
On a personal level, I have been on the Scientific Council of the project for many years, working together with 15 other researchers from around the world. Thus, in addition to my role in the scientific direction, my group has contributed intellectually to decisions such as which techniques to use or which species to prioritize, and has also actively sequenced part of the VGP genomes.
Finally: could we say, then, that we are in a good moment for genomics and the conservation of biodiversity?
I think so, and the proof is that even private companies are showing an interest: Oxford Nanopore, one of the large sequencing companies, presented a pilot project, Org One, in which they will finance the long-read sequencing (the most expensive and most important part of obtaining reference genomes) of critically endangered species.
Right now, the balance between technology and cost is so good that many initiatives of this type are emerging, and as a scientific community we have to take advantage of it.
Rhie A. et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature (2021).