Adequate graph structures for third-generation sequencing data exploration – Agate
In recent years, third-generation sequencing (TGS) has transformed the genomics landscape. By providing long-range information that can span most genomic repeats, it makes it possible to obtain chromosome-scale assemblies, even for vertebrate genomes. However, flawless de novo assembly remains a challenge, as assemblies may contain errors or miss regions and variations. For second-generation sequencing (SGS), many applications skip the assembly step and instead index and work directly on assembly graphs, which still contain most of the relevant information because no heuristics have yet been applied and no potential errors introduced. These successful approaches were made possible by graph structures that can be queried efficiently, using indexes that scale to the largest datasets. To meet this renewed need for TGS, this project aims to design and implement efficient graph structures that support versatile queries adapted to long-read sequences.
Project coordination
Antoine Limasset (Centre de Recherche en Informatique, Signal et Automatique de Lille)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
Partnership
CRIStAL Centre de Recherche en Informatique, Signal et Automatique de Lille
ANR funding: 227,584 euros
Beginning and duration of the scientific project: January 2022, 48 months