BLANC - Blanc

Archaeal DNA Repair – CoCoGen

Submission summary

Today, 506 genome sequences are available, of which 80% are bacterial, and another 2400 genome projects are ongoing. Comparative genomics is a major challenge in post-genomic biology. Identifying common genomic structures, as well as differences between genomes, is a prerequisite for understanding genome maintenance and evolution. Such analyses depend on bioinformatics approaches that rely on genome comparison. Although some computational methods exist, numerous methodological problems remain to be solved; present solutions incompletely address issues critical for biological interpretation, such as accuracy, significance, and the presence of rearrangements.

Cocogen is a multidisciplinary project that gathers algorithm designers, statisticians, bioinformaticians, and biologists to address the challenges of comparative genomics. It aims at investigating genome comparison from an algorithmic and statistical viewpoint, and at understanding the molecular mechanisms of genome evolution in bacteria. Our first goal is to provide fast and adequate programs for genome comparison, as well as a measure of the statistical significance of a comparison. Our second goal is to use these methods to detect precisely the conserved (core genome) and variable regions of bacterial genomes. Our third goal is to exploit the resulting comparisons to analyse and experimentally test the molecular mechanisms underlying core genome conservation and the accumulation of variable regions in 3 bacteria.

The originality of the project is 1/ to investigate the comparative genomics of closely related genomes, and especially intra-species comparisons, 2/ to propose fundamental research in computer science and statistics to produce truly innovative solutions for this problem, 3/ to involve mathematicians and computer scientists, who develop new methods, and biologists, who evaluate their results on bacterial genomes.

The project is organized in 4 strongly interconnected parts:

* Algorithmic developments will rely on three ideas. First, we will use novel filtration strategies to find similar regions in two or more genomes very rapidly. Filters based on spaced seeds should enable us to find inexact anchors for genome alignment, and thus to improve both the alignment accuracy and the computational efficiency. Second, we will design negative filters to detect regions that are not similar to any other genome, in order to find genome-specific regions of a bacterial strain directly and incrementally. Third, we want to relax the ordering constraint on conserved regions and allow regions to be displaced in the comparison, which more realistically reflects the evolution of divergent genomes. Our approach aims at computing a relative compression of one target genome knowing other genomes, where the length of this compressed description approximates the relative Kolmogorov complexity of the target genome.

* Our statistical approach will first consist in developing a statistical analysis of the optimal seed length to anchor an alignment. It will also propose a simulation approach to evaluate the local significance of an alignment as well as its robustness with respect to parameters. It thus requires fast comparison algorithms. Finally, we will propose global scores to evaluate a complete alignment and perform the theoretical analysis of their statistical significance.

* The third part will systematically compare the genomes of bacterial strains (as already started in MOSAIC, genome.jouy.inra.fr/mosaic) with the methods developed in Part I and evaluate the comparisons using the significance measure investigated in Part II.

* The biological interpretation will first identify short DNA motifs, named KOPS, involved in chromosome segregation, propose a typology of strain variable regions, and infer putative mechanisms responsible for their origin. The resulting hypotheses will be investigated experimentally in Escherichia coli, Staphylococcus aureus, and Bacillus subtilis. Finally, we will combine information on core genome and strain variable regions to infer rules of genome evolution.

The deliverables of Cocogen include: fast, accurate algorithms for genome comparison and methods to evaluate the statistical significance of alignments, a Web resource with systematic detection of conserved and variable regions in bacterial genomes, and an accurate description of the mechanisms at the source of genome evolution in three bacterial species. Beyond the bacterial case, the methods and concepts developed will prove relevant for the analysis of eukaryotic genomes and metagenomic data.

Compared to the project submitted in 2006, the team of the coordinator has been strengthened, the requested financial support reduced, and the biological objective stated more precisely. The project contains several challenges. However, the strong commitment of the partners, together with the support that the National Research Agency can allocate, makes this project achievable in 3 years.
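
As an illustration of the spaced-seed filtration idea mentioned in the algorithmic part, the following minimal Python sketch may help. It is not the project's implementation: the function names `spaced_kmers` and `seed_hits`, the mask `"1101"`, and the two toy sequences are all assumptions chosen for the example. A mask position marked `1` must match exactly, while a position marked `0` is ignored, so a hit can tolerate a mismatch there and serve as an inexact anchor.

```python
from collections import defaultdict


def spaced_kmers(seq, mask):
    """Yield (position, key) for every window of len(mask) in seq.

    The key keeps only the characters that fall on '1' positions of the
    mask; characters on '0' positions are "don't care" and are dropped.
    """
    span = len(mask)
    keep = [i for i, m in enumerate(mask) if m == "1"]
    for pos in range(len(seq) - span + 1):
        window = seq[pos:pos + span]
        yield pos, "".join(window[i] for i in keep)


def seed_hits(a, b, mask):
    """Return sorted (pos_in_a, pos_in_b) pairs whose windows agree on
    every '1' position of the mask (hypothetical helper, for illustration)."""
    # Index all spaced keys of the first sequence.
    index = defaultdict(list)
    for pos_a, key in spaced_kmers(a, mask):
        index[key].append(pos_a)
    # Look up each spaced key of the second sequence in that index.
    hits = []
    for pos_b, key in spaced_kmers(b, mask):
        for pos_a in index.get(key, ()):
            hits.append((pos_a, pos_b))
    return sorted(hits)


if __name__ == "__main__":
    a = "ACGTACGT"
    b = "ACGAACGT"  # differs from a by one substitution (index 3)
    print(seed_hits(a, b, "1101"))  # spaced seed
    print(seed_hits(a, b, "1111"))  # contiguous seed of the same span
```

On these toy sequences, the spaced mask reports the pair (1, 1), whose windows `CGTA` and `CGAA` differ only at the ignored position, whereas the contiguous mask of the same span does not. Candidate anchors found this way would still need to be verified and extended by an alignment step.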

Project coordination

Eric RIVALS (Research organization)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partner

ANR funding: 250,000 euros
Beginning and duration of the scientific project: - 36 Months
