We use the yeast Saccharomyces cerevisiae as a model organism to explore, at the species level and at the genome scale, the dynamics of gene acquisition, capturing new genes at the time they arise and before they are removed by selection.
Objective 1. Comprehensive analysis of the evolutionary trajectories of natural NAGs. We analyse the evolutionary trajectories of NAGs based on high-quality population genomics datasets and using state-of-the-art computational approaches.
Objective 2. Experimental measures of organismal fitness of natural and engineered NAGs. We measure the fitness effects of natural and engineered NAGs by combining genome editing, synthetic biology and high-throughput phenotyping.
Objective 3. Sequence optimization and wiring of NAGs into the cellular network. Finally, we will investigate how NAGs functionally wire into cellular networks by measuring transcription, translation and post-translational modifications for both natural and engineered NAGs in various environments, in order to quantify their level of expression and of cellular integration.
Altogether, we plan to generate a large pool of heterogeneous data that we will integrate into a single conceptual framework including information on ecological niches and genetic backgrounds.
We use modern genetics and genomics approaches to understand the emergence, evolution and function of novel accessory genes. Some of the novel approaches we developed are described in the section below; the other methods are described in the application.
Revisiting the pangenome of S. cerevisiae:
We assembled a non-redundant dataset of 8,084 protein coding genes comprising (i) all genes originally defined in the 1,011 strain pangenome, (ii) 285 published de novo genes and (iii) 340 ORFs defined in SGD not initially included in the original dataset. We developed a new bioinformatic pipeline to look at the conservation of the protein length for these 8,084 genes across the 1,011 strains. We found that 6,332 genes were conserved in more than 90% of the isolates, 989 genes were segregating between 10 and 90% of the strains and 763 genes were absent from more than 90% of the strains.
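The three frequency classes reported above can be illustrated with a minimal sketch. It assumes that a per-gene count of strains carrying a length-conserved copy is already available; the function names, the class labels and the toy input are hypothetical and are not taken from the pipeline itself.

```python
from collections import Counter
from typing import Mapping

N_STRAINS = 1011  # number of isolates in the population dataset


def classify_gene(n_present: int, n_strains: int = N_STRAINS) -> str:
    """Assign a gene to a frequency class from the number of strains carrying it."""
    if not 0 <= n_present <= n_strains:
        raise ValueError("n_present must lie between 0 and n_strains")
    # Integer comparisons avoid floating-point issues at the 10% and 90% cut-offs.
    if 10 * n_present > 9 * n_strains:   # conserved in more than 90% of isolates
        return "conserved"
    if 10 * n_present < n_strains:       # absent from more than 90% of isolates
        return "rare"
    return "segregating"                 # between 10% and 90% of isolates


def summarize(presence_counts: Mapping[str, int], n_strains: int = N_STRAINS) -> Counter:
    """Count how many genes fall into each frequency class."""
    return Counter(classify_gene(n, n_strains) for n in presence_counts.values())


# Hypothetical toy input: gene name -> number of strains carrying the gene.
toy_counts = {"geneA": 1011, "geneB": 500, "geneC": 12, "geneD": 910, "geneE": 101}
summary = summarize(toy_counts)
```

Applied to the full dataset, a tally of this kind would be expected to reproduce the reported split of 6,332, 989 and 763 genes, which together account for all 8,084 genes.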
The landscape of shared alleles
We developed a new method to detect alleles shared between S. cerevisiae and S. paradoxus, which allows us to detect introgressions with unprecedented resolution. We are now applying this method to a subset of 100 genomes that capture a large part of S. cerevisiae variation. The results underscore complex and interconnected histories between the two species, with multiple rounds of interbreeding emerging from the shared-allele analysis.
Functional characterization of de novo genes:
We are using CRISPR/Cas9 to introduce a stop codon in the coding sequence of 32 de novo genes for which KanMX4 insertion mutants are not available. We have so far successfully generated nonsense mutations in 22 of them. Preliminary phenotyping tests revealed phenotypic variations associated with these mutations.
In parallel we developed a pipeline to search 9 different databases to gather information on gene expression and co-regulation as well as gene and protein interactions, to predict the functional pathways of the de novo genes. This allowed us to predict functional properties for 85 de novo genes with an estimated reliability of 75%.
We made extensive progress on Objective 2 and produced mapping populations to measure the fitness effects of introgressed genes. Several pooled experiments have already been sequenced, and some interesting candidate genes were identified and are currently under investigation. One related paper on the underlying mechanisms that generate introgressions has been submitted.
Once multiple NAGs have been tested and their contribution to fitness has been demonstrated, we will start the experiments described in Objective 3.
1. Tattini L*, Tellini N, Mozzachiodi S, D'Angiolo M, Loeillet S, Nicolas A and Liti G*. 2019. Accurate tracking of the mutational landscape of diploid hybrid genomes. Molecular Biology and Evolution. 36:2861-2877.
Evolutionary innovations can occur via several mutational paths, but an integrated view of novel gene acquisition at the population level is still lacking. The different mechanisms that generate or introduce new genes in a genome act continuously throughout evolution, resulting in genes of different ages. However, most studies on novel genes were conducted at the interspecific level by comparing a single reference genome per species. Population genomics is presently shifting the field of comparative genomics from single reference genomes to population pangenomes, thereby giving access to individual variations in the presence/absence of genes at the population level. The collection of genes present in a population is named the pangenome. The pangenome consists of core genes, invariably present in all individuals, and accessory genes, which segregate in the population at varying frequencies. Here, we define Novel Accessory Genes (NAGs) as the subset of accessory genes that are not vertically inherited from the species ancestor but were gained or emerged during the diversification of the species. NAGs originate mainly from introgression events, horizontal gene transfers (HGTs) and de novo gene emergence. When novel genes first appear, they are present at very low frequency in a population, likely in a single individual, and can subsequently either disappear or rise in frequency. Therefore, we hypothesize that the best evolutionary time scale at which to investigate gene acquisition mechanisms is the population level.
We propose to use the yeast Saccharomyces cerevisiae as a model organism to explore, at the species level and at the genome scale, the dynamics of gene acquisition, capturing new genes at the time they arise and before they are removed by selection. We will directly benefit both from the high-quality population genomic dataset available and from the possibility to perform large-scale experimental testing to a level not accessible in any other model organism. The main goal of our proposal is to systematically explore the mechanisms of acquisition of NAGs and their relative contributions to the emergence of evolutionary novelties.
First, we will analyse the evolutionary trajectories of NAGs based on high-quality population genomics datasets and using state-of-the-art computational approaches. Second, we will measure the fitness effects of natural and engineered NAGs by combining genome editing, synthetic biology and high-throughput phenotyping. Finally, we will investigate how NAGs functionally wire into cellular networks by measuring transcription, translation and post-translational modifications for both natural and engineered NAGs in various environments, in order to quantify their level of cellular integration. Altogether, we plan to generate a large pool of heterogeneous data that we will integrate into a single conceptual framework including information on ecological niches and genetic backgrounds. The outcome of this proposal will provide a multi-layered view of the functional and evolutionary impacts of NAGs in the yeast pangenome, revealing general rules leading to the evolution of new genes and new functions in eukaryotes.
Mr Gianni LITI (Centre Cancer et vieillissement)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
CQB Biologie Computationnelle et Quantitative
Centre Cancer et vieillissement
ANR funding: 493,487 euros
Beginning and duration of the scientific project: September 2018 - 48 Months