Understanding the Pathogenicity of Entamoeba Using Comparative Transcriptomics and Phylogenomics – GENAMIBE
Phylogenomics and high-throughput sequencing-driven research to understand infectious diseases
Entamoeba histolytica is the agent of human amoebiasis (dysentery and liver abscess). Two closely related species, E. dispar and E. moshkovskii, are non-pathogenic and produce asymptomatic infections. What leads to their dramatically different clinical outcomes? We address this question from transcriptomic (i.e. RNA) and genomic (i.e. DNA) perspectives.
To understand the genomic and transcriptomic differences between Entamoeba species
First, we will compare the genomes (i.e. at the DNA level) of Entamoeba species and catalogue the genes that might account for the adaptation of their lifestyle, i.e. commensal or parasitic. Then we will catalogue transcriptomic differences between Entamoeba species (i.e. at the RNA level), including differentially expressed genes and regulatory RNAs. Finally, we will merge the features common to the DNA and RNA data and combine them with our custom annotations of the genomes, in order to discover the gene sets that might be relevant to the clinical differences between Entamoeba species/strains.
On the genomics side, we will first annotate the gene families in the different species and then identify the gene families that are “abnormally evolving”, e.g. families that have expanded or that show a very fast substitution rate. On the transcriptomics side, we will sequence the RNA populations of the different species using high-throughput sequencing methods, which will enable us to identify the genes and regulatory RNAs that behave differently between species. Customized analysis software will be developed to analyze the high-throughput sequencing data. Finally, to facilitate the interpretation of these data, we will develop a desktop application that integrates the genomic and transcriptomic data with the functional annotations.
First, we revised and validated the original gene model annotations, defining a set of bona fide gene models. The data generated from this study are expected to provide valuable resources for the Entamoeba research community. We also quantified the stochastic noise of the splicing and polyadenylation processes and provided evidence that most of the alternative splicing and polyadenylation isoforms are unlikely to be functionally relevant.
Second, our data suggest that elevated levels of small RNA accumulation within a gene are quantitatively correlated with the down-regulation of the corresponding mRNA between the two strains, providing evidence for the existence of an endogenous RNA interference pathway in Entamoeba. We then demonstrated the pervasive existence of antisense transcripts and provided evidence that most of them are unlikely to originate from promiscuous leaky transcription of neighboring genes. We suspect that the genesis of antisense RNA and that of small RNA are correlated. We identified several virulence factors that are shut down by small RNA in the non-pathogenic E. histolytica strain, suggesting a possible role for small RNA in the pathogenicity of E. histolytica.
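The kind of quantitative relationship described above, between small RNA accumulation and mRNA down-regulation, can be illustrated with a rank correlation. The summary does not say which statistic was used, so the following is only a minimal sketch assuming a Spearman rank correlation; the per-gene values and variable names are hypothetical toy data, not results from the project.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-gene values (assumed for illustration only):
# log2 ratio of small RNA accumulation and of mRNA level between two strains.
small_rna_log2fc = [0.1, 1.5, 3.2, 2.4, 0.8]
mrna_log2fc = [0.2, -0.9, -2.8, -1.7, -0.3]

rho = spearman(small_rna_log2fc, mrna_log2fc)
print(f"Spearman rho = {rho:.3f}")
```

A strongly negative coefficient would be consistent with genes that accumulate more small RNA showing lower mRNA levels; it does not by itself establish a causal RNA interference mechanism.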
We expect this project to bring the following scientific breakthroughs: 1) Significant improvements to the gene annotations of Entamoeba genomes; 2) Discovery of regulatory RNAs in Entamoeba genomes; 3) Insights into the fundamental understanding of transcription in unicellular eukaryotes as a whole; 4) An example of the comprehensive application of phylogenomics to pathogens; 5) A demonstration of interspecies transcriptomic comparison to investigate phenotypic differences between pathogens; 6) Data-rich annotations of Entamoeba genomes to facilitate the interpretation of high-throughput functional studies.
A Comprehensive Evaluation of Normalization Methods for Illumina High-Throughput RNA-Seq Data Analysis. Briefings in Bioinformatics (in press).
We compared seven recently proposed normalization methods for the differential analysis of RNA-seq data and offer practical recommendations on the choice of normalization method and its impact on the differential analysis.
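To make concrete what a normalization method does to raw RNA-seq counts, here is a minimal sketch of one widely used approach, the median-of-ratios scaling factor popularized by DESeq. The summary does not name the seven methods that were compared, so this choice is an assumption made for illustration, and the count matrix is hypothetical toy data.

```python
import math
from statistics import median


def size_factors(counts):
    """Compute one scaling factor per sample by the median-of-ratios method.

    counts: list of genes, each a list of raw read counts (one per sample).
    """
    n_samples = len(counts[0])
    # Keep only genes with non-zero counts in every sample so logs are defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has non-zero counts in every sample")
    # Log of the geometric mean of each usable gene across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Ratio of each gene's count in sample j to that gene's geometric mean.
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide every count by the scaling factor of its sample."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Hypothetical toy matrix: 4 genes x 2 samples, where the second sample was
# sequenced twice as deeply as the first.
raw_counts = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 0],
]

factors = size_factors(raw_counts)
normalized = normalize(raw_counts)
print("size factors:", factors)
print("normalized counts:", normalized)
```

After scaling, genes whose counts differ only because of sequencing depth end up with equal values across samples, which is the property a differential analysis relies on.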
Entamoeba histolytica is a protozoan parasite and an amitochondriate pathogenic amoeba, which may cause dysentery and liver abscess in humans (i.e. amoebiasis). Disease (transmitted by cyst-contaminated water) develops in approximately 10% of infected individuals, resulting in 50 million clinical cases and 100,000 deaths annually. Differential pathogenicity is observed among E. histolytica strains: strain HM1:IMSS is virulent while strain Rahman is naturally attenuated. A closely related species, E. dispar, is non-pathogenic and produces asymptomatic infections. Another Entamoeba species, E. moshkovskii, is primarily free-living and rarely infects humans. Although these Entamoeba species are morphologically indistinguishable and phylogenetically closely related, their clinical outcomes are dramatically different. Their phenotypic differences form an excellent theoretical basis for genome-wide comparative analyses to search for factors relevant to pathogenicity and adaptation to humans. Our aim is to understand the phenotypic differences between Entamoeba species/strains using comparative phylogenomic and transcriptomic approaches. This aim is supported by two primary objectives: 1) to catalogue the relevant, high-resolution transcriptomic and phylogenomic differences between Entamoeba species/strains and 2) to functionally annotate the Entamoeba protein families and discover the gene sets relevant to the phenotypic differences between Entamoeba species/strains.
First, taking advantage of next-generation sequencing technologies, we plan to characterize the transcriptional landscape of the Entamoeba species/strains at an unprecedented scale and resolution, including the generation of genome-wide maps for coding and non-coding transcripts, small RNAs and antisense RNAs, as well as the expression profiles of these transcripts in three culture conditions (axenic culture, nitric oxide treatment and ex vivo colon culture). By discovering the diversity and expression profiles of small RNAs and antisense RNAs, we expect to provide valuable insights into the potential roles of non-coding RNA across the Entamoeba transcriptomes.
Second, from an evolutionary perspective, the phenotypic differences between pathogenic and non-pathogenic Entamoeba species result from natural selection acting on mutations at particular loci (i.e. adaptive selection). Identification of these loci might shed light on the genomic basis of their phenotypic differences. Therefore, using integrated phylogenomic methods, we plan to perform a genome-wide scan for mutations that are of interest from an evolutionary standpoint, including adaptively evolving sites and point mutations that are likely to have functional impacts. By co-analyzing these coding differences with the transcriptomes described above, this study is expected to provide a comprehensive picture of the genotypic differences between Entamoeba species.
Finally, the relatively poor annotations of Entamoeba genomes represent the bottleneck for the analysis of high-throughput data. We plan to re-annotate the coding regions with traceable functional annotations and present them with a desktop application, enabling researchers to easily integrate and analyze their data on a pathway/network basis rather than the laborious “from-gene-to-gene” basis.
This project represents the first and most comprehensive study of protists in terms of: 1) the types of transcripts we are able to capture; 2) the quality of the coding transcript map (i.e. map of transcription start sites, splice junctions, alternative splicing patterns and polyadenylation sites); 3) the scope of comparisons (i.e. inter-species, inter-strain and inter-culture-condition comparisons).
We expect this study to have a significant impact on the fundamental understanding of transcription in lower eukaryotes and to set a high-quality standard for studies of similar protists.
Project coordination
NANCY GUILLEN (INSTITUT PASTEUR) – nguillen@pasteur.fr
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
Partner
IP INSTITUT PASTEUR
ANR funding: 641,744 euros
Beginning and duration of the scientific project:
- 36 Months