An important step towards sharing human genomic data

The first human genomic datasets are now available on the EGA Federated Network, based at the PRBB.

We talked to Teresa D'Altri, Project Management Coordinator at the EGA-CRG, about the EGA Federated Network, based at the PRBB, and other projects.

Two years after the birth of the Federation of European Genome and Phenome Archives (FEGA), the first datasets have been made public. Although stored at national level – in this case in Poland, Norway and Sweden – they are accessible globally through the EGA portal, based at the Barcelona Biomedical Research Park (PRBB).

These datasets cover topics as diverse as gut metagenomes, single-cell transcriptomics in autoimmune disease patients, genomic mapping of a whole population and paediatric cancers. They are all accessible to the scientific community according to each node’s national standards.

A genomic data federation

FEGA was launched in 2022 as an extension of the EGA, which is based at the EBI in Cambridge, UK, and the Centre for Genomic Regulation (CRG) in Barcelona. Seven countries have already set up national nodes to share metadata from their studies of human genomes (and other types of data) so that scientific teams around the world can safely re-use them. The EGA supports them by providing its platform as well as training and assistance.

Thus, after years of hard work, what Arcadi Navarro, director of the EGA team, said at the launch of FEGA has been fulfilled: “That any researcher can find out if there are studies and data of interest to them, anywhere in the world”.

“It took us a long time, because it was very important to lay the foundations correctly, to have a durable and secure structure, technically and ethically, and one that can also accept new technologies as they come along,” explains Teresa D’Altri, Project Management Coordinator at the EGA-CRG. “But we already have the first research datasets,” she celebrates.

GDI, the European Genomic Data Infrastructure

At the European level, the GDI (Genomic Data Infrastructure) project also aims to provide access to genomic, phenotypic and clinical data from all over Europe. It is co-funded by the European Commission and the participating countries.

“The idea has been brewing for a long time and has gradually grown,” explains D’Altri. In 2018, a political declaration was made at European level to “sequence 1 million human genomes of different ancestry or origin”. Various countries joined in, and in 2020 an agreement was signed to fund what was then called B1MG (Beyond 1 million genomes). In 2022, the infrastructures for this project – which eventually became known as GDI – finally began to be built.

The project is managed by ELIXIR Europe, who asked EGA for help. “We are a kind of ‘expertise providers’ for the GDI. In the end, both projects have a lot in common. It is the same concept: an extended structure consisting of local nodes and a common portal at European level”.

“FEGA and GDI share the same concept: an extended structure consisting of local nodes and a common portal at European level.”
Teresa D’Altri (EGA-CRG)

Indeed, D’Altri explains that some countries have decided to merge their FEGA and GDI nodes. At the moment, one of the software packages to be used by GDI is based on one developed at EGA Barcelona.

Why so many genomes?

GDI is not the only human genome sequencing project. There are similar projects in the US (All of Us), on the African continent (H3Africa) and in Asia (GenomeAsia 100K). But why do we need so many genomes?

“Human genomic data is very rich in information, has great potential and can be reused many, many times to answer different scientific questions,” explains D’Altri.

One of the problems, though, is the lack of diversity. “We have sequenced a lot of genomes, but most of them are of European ancestry. We lack a significant representation of the other ancestries. In fact, it wouldn’t be such a problem if we had more African genomes and fewer European genomes, because Africans are much more diverse and therefore more representative of human genetic diversity,” explains D’Altri.

There is a lack of diversity in the genomes we have described; there is no significant representation of non-European ancestry.

These projects – GDI, All of Us, H3Africa, GenomeAsia 100K and others – should help to fill this gap. The other part, sharing these data, must not be forgotten, and that is what FEGA does. As of today (February 2024), the FEGA network consists of seven European national nodes, but dozens of countries are working to establish a FEGA node.

As more and more data is added to the FEGA network, we will move closer to a truly global resource for accelerating disease discovery and improving human health in key areas such as cancer and rare diseases – the kind of research that particularly benefits from global data sharing.
