CE45 - Mathematics and digital sciences for biology and health 2021

Adequate graph structures for third-generation sequencing data exploration – Agate

Submission summary

In recent years, third-generation sequencing (TGS) has changed the whole genomic landscape. Because it provides long-range information that can span most genomic repeats, chromosome-scale assemblies can now be obtained even for vertebrate genomes. However, flawless de novo assembly remains a challenge, as assemblies may contain errors or miss regions and variations. For second-generation sequencing (SGS), many applications skip the assembly step and instead index and work directly on assembly graphs, which still contain most of the relevant information because heuristics have not yet been applied and introduced potential errors. These successful approaches were made possible by graph structures that could be queried efficiently, using indexes that scale to the largest datasets. To meet this renewed need for TGS, this project aims to design and implement efficient graph structures that support versatile queries adapted to those sequences.

Project coordination

Antoine Limasset (Centre de Recherche en Informatique, Signal et Automatique de Lille)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partnership

CRIStAL Centre de Recherche en Informatique, Signal et Automatique de Lille

ANR funding: 227,584 euros
Beginning and duration of the scientific project: January 2022 - 48 Months
