A Hubble telescope for our genes: a new algorithm compares almost 1.5 million DNA sequences

A CRG research team has developed a new technique that compares 1.4 million DNA sequences at a time, 10 times more than had been possible so far.

Image obtained by NASA's Hubble Space Telescope. The new algorithm, designed by the Notredame laboratory, does for the genome what this telescope does for space.


There are different ways to travel to the past.

One way is the macro one: looking outwards, observing the behavior of huge galaxies and how the universe around us has changed over time. Another way is the micro one: looking inwards, towards the tiny genes, and comparing how they vary between species. Both approaches face technological challenges.

For the macro way, scientists developed the Hubble Space Telescope, which orbits the Earth and so avoids the distortion caused by Earth’s atmosphere.

For the micro way, computer software that compares DNA or protein sequences has been used for years. This technique, called Multiple Sequence Alignment (MSA), looks for similarities and differences among the biological sequences of different species to work out how long ago two species diverged and to predict how specific changes in a gene or a protein can affect its function. It has greatly advanced our knowledge of our evolutionary history and of our relationship with other species, but it runs into limits when comparing large numbers of sequences: the ceiling has long been 100,000 sequences.
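The kind of comparison an alignment makes possible can be illustrated with a minimal Python sketch. It is not the CRG algorithm and it does not build an alignment; it assumes three short sequences that are already aligned, with "-" marking a gap, and only measures how similar the rows are. The species labels, the sequences and the function names are hypothetical, chosen purely for illustration.

```python
from itertools import combinations

# Hypothetical toy alignment: rows are already aligned, '-' marks a gap.
alignment = {
    "human": "ATGCTAGC-A",
    "mouse": "ATGCTTGC-A",
    "fish":  "ATGATTGCTA",
}


def pairwise_identity(a, b):
    """Fraction of identical positions, skipping columns with a gap in either row."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # a gap carries no residue to compare
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def variable_columns(rows):
    """Indices of alignment columns where the rows do not all agree."""
    return [i for i, column in enumerate(zip(*rows)) if len(set(column)) > 1]


if __name__ == "__main__":
    for (name_a, seq_a), (name_b, seq_b) in combinations(alignment.items(), 2):
        print(f"{name_a} vs {name_b}: {pairwise_identity(seq_a, seq_b):.2f}")
    print("variable columns:", variable_columns(list(alignment.values())))
```

In this toy data the human and mouse rows agree at more positions than the human and fish rows, which is the sort of signal used to estimate how closely related species are. Real alignments involve far longer sequences and must also decide where the gaps go, which is the computationally hard part.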

 

Multiple Sequence Alignment (MSA) has so far made it possible to compare up to 100,000 DNA sequences from different species. It looks for similarities and differences among them to work out how long ago two species diverged and to predict how specific changes affect gene or protein function.

 

Example of Multiple Sequence Alignment (MSA). Picture by Miguel Andrade, CC BY-SA 3.0

From 100,000 to 1.4 million

Now, a team of researchers from the Centre for Genomic Regulation (CRG), led by Cedric Notredame, has developed a much more efficient algorithm that can compare 1.4 million sequences at the same time, 10 times more than was possible until now. The more sequences that can be compared at once, the further back in time we can go.

New sequencing techniques allow us to sequence more and more genomes. The Earth BioGenome project, for example, aims to sequence and catalog the genomes of all eukaryotic diversity on Earth, about 1.5 million sequences. The new algorithm developed at the CRG will make it possible to compare these genomes with one another to decipher their relationships and their evolutionary history.

“There is a great deal of ‘dark matter’ in biology that we have not yet identified, both in our genome and in those of other organisms, which, although seemingly irrelevant, may play a fundamental role in promoting human health and that of our planet, as with the discovery of CRISPR in archaea,” concludes Notredame. “Our development can help analyze genomes in more detail, to find a needle in the haystack of life’s genomes.”

 
