The GenoFish project aimed to study the teleost-specific whole genome duplication, which took place 320-350 million years ago, using comparative genomics approaches. To this end, the GenoFish project developed new genomic resources and new methodological tools.
Whole genome duplications are a recurring theme in vertebrate evolution, and understanding their consequences remains a crucial question in genomics and evolutionary biology. In this context, the GenoFish project focused on the teleost-specific whole genome duplication that took place 320-350 million years ago. This duplication is of particular interest because teleost fish underwent an especially large and rapid evolutionary radiation, and their genomes have retained many duplicate genes while other genes have returned to a single copy. This extra complexity has often been proposed as an explanation for the extraordinary radiation of teleost fishes, which account for half of all vertebrate species. Building on new fish genome sequences and new methods for analyzing duplicated genes, the GenoFish project aimed to explore, at the genome scale, how genes evolved after duplication in teleost fish. A further objective was to unify teleost gene nomenclature so that existing functional information can be linked more effectively between model fish species and other species for which such information is not available.
To answer these questions, the GenoFish project first sequenced and assembled the genomes of fourteen teleost fish species selected for their key taxonomic positions in teleost evolution, to serve as the basis for analyses of genome evolution. Particular attention was given to the superorder Elopomorpha, which includes eels, conger eels, moray eels and tarpons, because this group is recognized as one of the earliest-branching lineages of modern teleosts, having diverged very soon after the teleost whole genome duplication. Moreover, very few elopomorph genome sequences were available in international databases when the GenoFish project began. In addition to these new genomic resources, the GenoFish project developed new comparative genomics tools to study the complex evolutionary fates of duplicated genes. One of these tools, called SCORPIOs, enables a more precise analysis of duplicated gene evolution by combining gene sequence phylogenies, which are commonly used for this type of analysis, with the conservation of gene order (synteny).
Among the main results of the GenoFish project, we established the first large-scale map of duplicated regions across fish genomes, which now allows us to propose a unified gene nomenclature for teleost fish. Another, more unexpected, result of the GenoFish project challenges the classically accepted phylogenetic relationships among the major taxonomic groups at the base of the teleost tree of life.
The resources and methods developed during the GenoFish project open many perspectives for studying the evolution of teleost genomes after their whole genome duplication. In particular, integrating datasets on conserved non-coding elements or microRNAs would allow our analyses to be extended beyond protein-coding genes.
With the massive increase in genomic resources, another challenge will be to analyze not tens but hundreds of genomes simultaneously.
Two scientific publications resulting from the project have already been published:
- Parey et al., 2020. Synteny-Guided Resolution of Gene Trees Clarifies the Functional Impact of Whole-Genome Duplications. Mol Biol Evol. 37(11):3324-3337. doi: 10.1093/molbev/msaa149.
- Thompson et al., 2021. The bowfin genome illuminates the developmental evolution of ray-finned fishes. Nat Genet. 53(9):1373-1384. doi: 10.1038/s41588-021-00914-y.
Several other publications from the project have been submitted to scientific journals and are available on preprint servers, or are currently being written.
Genome duplication is a recurring theme in vertebrate evolution; for instance, the human genome retains numerous gene family members that arose from one or two rounds of whole genome duplication (WGD) at the origin of vertebrates. These ‘extra’ genes are, in principle, available for the evolution of new functions that could drive the origin of novelties and thus contribute to the diversification of life on Earth. Understanding how genes evolve after genome duplication is therefore crucial for understanding how genomes evolve and how they shape vertebrate development and physiology. Unfortunately, our understanding of vertebrate genomes is not yet sufficient to fully answer this question, and the first rounds of vertebrate WGD are so ancient that they are difficult to study. To improve our understanding of evolution after WGD, this project will take advantage of the teleost-specific WGD (TGD) that occurred 320-350 million years ago, after the split between Holostei and the lineage leading to Teleostei. The TGD is of special interest for the study of gene evolution because teleosts radiated shortly after it, giving rise to many extant lineages of enormous diversity. In addition, teleost genomes retained a substantial number of duplicate genes while others returned to a single copy, and this additional complexity has been hypothesized to explain the extraordinary radiation of this group, which now accounts for half of all vertebrate species. With the advent of next-generation sequencing technologies, the number of publicly available fish whole genome sequences has increased dramatically in recent years. However, these resources still miss many important nodes of teleost diversity and evolution: for instance, more than 80% of the species with sequenced genomes belong to the Euteleostei lineage.
In addition to the paucity of whole genome resources sampled across key evolutionary positions, many of these recent sequences are highly fragmented. They therefore cannot be used to correctly infer synteny relationships over long genomic segments, or to determine whether a given gene is truly missing from a fish genome or merely absent from the genome assembly. To address this problem, our project will first use the cutting-edge Single Molecule, Real-Time (SMRT) DNA sequencing approach to fill these knowledge and resource gaps. We will provide high-quality genomes for fish species carefully selected to occupy key taxonomic positions in teleost evolution. Notably, until the last few months this objective would not have been feasible at a reasonable cost and with a quality high enough to allow comparative whole genome analysis. The results of this project should provide genome-wide answers on how often different gene copies are lost independently in different fish lineages, and on whether lineage-specific changes in duplicate gene content, gene regulation, or gene expression patterns are important for the evolution of the remarkable diversity among teleosts. In addition, because these gene duplications also strongly affect the quality of gene annotation in teleosts, this project will propose a refined teleost gene nomenclature based on the results of our evolutionary analyses. A reformed nomenclature will link gene information across many vertebrate species, thereby transferring functional information from the current major fish model species (zebrafish, medaka) to other fish species of biomedical or economic relevance.
Mr Yann GUIGUEN (Laboratoire INRA de Physiologie et Génomique des Poissons)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
GeT-PlaGe INRA UAR1209
U-Oregon University of Oregon
U-Lausanne University of Lausanne
LPGP Laboratoire INRA de Physiologie et Génomique des Poissons
ANR funding: 453,257 euros
Beginning and duration of the scientific project: February 2017, 48 months