CE45 - Mathematics, computer science, control theory and signal processing to address the challenges of biology and health

High-performance genomic profiling of DNA replication by nanopore sequencing – NanoPoRep


Going beyond DNA replication stochasticity.

Eukaryotes replicate their DNA by activating numerous replication origins. Despite the advent of massive DNA sequencing technologies, origins remain difficult to identify and our knowledge of them is still incomplete. Some studies report narrowly localised initiation sites, whereas others describe broad initiation zones. Because origin activation is stochastic, a different cohort of origins is chosen from among numerous potential origins at each replication cycle.

Revolutionise the characterisation of DNA replication on single DNA molecules.

Activation of each replication origin establishes two divergent replication forks that perform bidirectional DNA synthesis at variable speeds. Forks can slow down or pause at specific genomic sites, but the distribution of fork speeds across the genome is still unknown. Resolving the nature of origins and analysing fork progression requires high-throughput analyses at the single-molecule level, because cell-population methods provide only an average picture in which cell-to-cell variability and rare events are masked. Current single-molecule methods such as DNA combing can reveal fork progression by detecting the incorporation of non-standard nucleotides with antibodies. However, they provide no sequence information unless they are combined with DNA probes, which is laborious, low-resolution and low-throughput. High-throughput single-molecule genomic profiling of DNA replication therefore remains to be achieved. This challenge can be met by using nanopore sequencing to monitor non-standard nucleotide incorporation: sequencing precisely localises the analysed molecules on the reference genome and promises high throughput. Nanopore sequencing is an emerging technology in which the ionic current through a bioengineered nanopore is recorded while a single-stranded DNA molecule transits through the pore. The nucleotide sequence of the molecule is then inferred from the resulting current variations using dedicated signal processing tools. A pilot project provided proof of concept that the sensitivity of available nanopore technology is sufficient for this project. The main scientific barriers to be lifted are thus (i) to build a nanopore current analysis pipeline that discriminates at least one non-standard nucleotide compatible with DNA replication studies, in addition to the four canonical nucleotides, and (ii) to use this pipeline to map replication fork speed and orientation in yeast and human cells.

As the non-standard nucleotide we use bromodeoxyuridine (BrdU), a thymidine (T) analog compatible with replication studies, and as the sequencing platform the MinION nanopore system from Oxford Nanopore Technologies (ONT).
Single-stranded DNA molecules in which thymidine was fully substituted by BrdU were prepared in vitro by primer extension on linearised plasmid DNA. The presence of BrdU modifies the current intensity at most (though not all) thymidine positions, without notably perturbing the current at the other bases. These results demonstrate that the MinION system can discriminate BrdU from standard thymidine.
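The comparison described above can be illustrated with a minimal sketch. It assumes that mean currents have already been computed for every position of the plasmid reference in thymidine-only and fully BrdU-substituted reads; the function names, the 2 pA threshold and all current values are hypothetical toy numbers, not ONT tools or measured data.

```python
# Toy comparison of per-position mean currents (pA) between thymidine-only and
# BrdU-substituted molecules aligned to the same reference. The data and the
# 2 pA threshold are made up for illustration; they are not MinION measurements.


def current_shifts(mean_t, mean_brdu):
    """BrdU-minus-T difference in mean current at every aligned position."""
    if len(mean_t) != len(mean_brdu):
        raise ValueError("current tracks must have equal length")
    return [b - t for t, b in zip(mean_t, mean_brdu)]


def summarise_shifts(reference, shifts, threshold):
    """Return (fraction of T positions with |shift| > threshold,
    largest |shift| observed at non-T positions)."""
    if len(reference) != len(shifts):
        raise ValueError("reference and shift track must have equal length")
    t_shifts = [abs(s) for base, s in zip(reference, shifts) if base == "T"]
    other_shifts = [abs(s) for base, s in zip(reference, shifts) if base != "T"]
    frac_t = sum(s > threshold for s in t_shifts) / len(t_shifts) if t_shifts else 0.0
    return frac_t, max(other_shifts, default=0.0)


ref = "ACTGTTAC"
t_only = [80.0, 95.0, 70.0, 88.0, 72.0, 71.0, 82.0, 94.0]
brdu = [80.5, 94.8, 76.0, 88.2, 72.5, 78.0, 82.1, 94.3]
frac, max_other = summarise_shifts(ref, current_shifts(t_only, brdu), threshold=2.0)
print(f"{frac:.2f} of T positions shifted; max |shift| at other bases: {max_other:.1f} pA")
```

On these toy values the script reports 0.67 of T positions shifted and a maximum shift of 0.5 pA at the other bases, mirroring the qualitative observation that most, but not all, T positions respond to BrdU.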
An S. cerevisiae strain genetically modified to depend on an external supply of thymidine and/or BrdU for growth was cultured in media containing variable proportions of T and BrdU. In this way, we prepared 5 yeast DNA samples in which thymidine is substituted by BrdU in variable proportions (0%, 9%, 38%, 69% and 91%), as determined by mass spectrometry. The 5 samples were sequenced on the MinION. As with the in vitro samples, BrdU-induced nanopore current shifts are also detectable in labelled genomic DNA.
These samples constitute a reference dataset for supervised learning, which allowed us to develop the RepNano software. RepNano implements two strategies to estimate the local proportion of BrdU incorporation from the series of normalised current shifts between successive nucleotides. The first method (CNN) is a trained convolutional neural network with three convolutional layers that predicts the BrdU incorporation ratio in windows of 96 current shifts. The second method (TM) uses the distributions of current shifts at T/BrdU sites in either thymidine-only or fully BrdU-substituted reads to estimate the probabilities that an observed current shift corresponds to the T or the BrdU state. This method indicates that ~20% of T/BrdU sites are informative. For every sample, the mean BrdU content estimated by the two methods agrees well with the mass spectrometry measurement.
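The TM strategy can be sketched as a two-state mixture problem: given the reference distributions of current shifts at T/BrdU sites in thymidine-only and fully substituted reads, the BrdU proportion in a window is the mixing weight that best explains the observed shifts. The sketch below estimates that weight by expectation-maximisation (EM). It is not RepNano's code: approximating the reference distributions by Gaussians, their (mean, sd) values and the simulated window are all assumptions made for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def estimate_brdu_fraction(shifts, t_params, brdu_params, n_iter=300, p=0.5):
    """EM estimate of the BrdU weight p in the two-state mixture
    p * f_BrdU(x) + (1 - p) * f_T(x) over the current shifts of one window.

    t_params / brdu_params are (mean, sd) of the reference distributions,
    assumed Gaussian here purely for illustration."""
    like_t = [gaussian_pdf(x, *t_params) for x in shifts]
    like_b = [gaussian_pdf(x, *brdu_params) for x in shifts]
    for _ in range(n_iter):
        posteriors = []
        for lt, lb in zip(like_t, like_b):
            denom = p * lb + (1.0 - p) * lt
            # E-step: P(BrdU | shift); fall back to the prior if both likelihoods underflow
            posteriors.append(p * lb / denom if denom > 0 else p)
        # M-step: the updated weight is the mean posterior over the window
        p = sum(posteriors) / len(posteriors)
    return p


# Simulated window: 38% BrdU at T/BrdU sites, overlapping reference distributions
rng = random.Random(0)
T_REF, BRDU_REF = (0.0, 1.0), (1.5, 1.0)
sim = [rng.gauss(*BRDU_REF) if rng.random() < 0.38 else rng.gauss(*T_REF) for _ in range(4000)]
p_hat = estimate_brdu_fraction(sim, T_REF, BRDU_REF)
print(f"estimated BrdU fraction: {p_hat:.2f}")
```

Because the two hypothetical distributions overlap, each site is only weakly informative on its own, yet pooling thousands of sites should bring the estimate close to the simulated 38%.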

RepNano fulfils the main computational objective of the project, as it provides, from MinION nanopore current signals, a quantitative estimate of the incorporation of BrdU, a non-standard nucleotide compatible with replication studies. BrdU incorporation profiles obtained by RepNano on yeast samples subjected to a brief (4 min) BrdU pulse-chase display well-localised asymmetric patterns, with a steep increase from ~0 to 60-80% BrdU and a shallow decrease to ~10% BrdU. We interpret the steep segments as reflecting a rapid increase in the intracellular BrdUTP/dTTP ratio during the BrdU pulse, and the shallow segments as reflecting a progressive decrease in this ratio during the subsequent chase. Exploiting this asymmetry, we developed a signal processing tool that detects replication forks and determines their direction of progression at an unprecedented resolution of 200 nucleotides. The full procedure, named FORK-seq, allowed us (i) to locate 60,545 oriented replication forks, in very good agreement with mean-field estimates of replication fork directionality profiles, and, most importantly, (ii) to map 4,964 replication initiation events and 4,485 termination events. The latter represents significant progress in our understanding of the replication programme in yeast. It reveals a new class of initiation events, dispersed across the genome away from known origins, that account for ~9% of all initiations. Similarly, termination is more dispersed than previously thought, with 18% of termination events located in segments where cell-population methods could only detect a predominance of initiation events. These results illustrate the power of our single-molecule genomics approach.
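The logic of reading fork direction from the asymmetric motifs, and of calling initiation and termination events from adjacent forks, can be sketched as follows. This is a simplified illustration under stated assumptions, not the FORK-seq implementation: each track is assumed to be binned at 200 nt with a single peak, the steep flank is identified by comparing how many bins each flank spends above half of the peak value, and events are placed at the midpoint between two forks. The function names and toy values are hypothetical.

```python
def fork_direction(brdu):
    """Infer fork direction from one BrdU track binned along a read.

    Returns +1 (towards increasing coordinates), -1, or 0 if ambiguous.
    The flank that spends fewer bins above half of the peak value is taken as
    the steep (pulse) side, so the fork is inferred to move towards the
    shallow (chase) side."""
    peak = max(range(len(brdu)), key=brdu.__getitem__)
    half = 0.5 * brdu[peak]
    left = peak
    while left > 0 and brdu[left - 1] >= half:
        left -= 1
    right = peak
    while right < len(brdu) - 1 and brdu[right + 1] >= half:
        right += 1
    rise, decay = peak - left, right - peak
    if rise == decay:
        return 0
    return 1 if decay > rise else -1


def call_events(forks):
    """forks: (position, direction) pairs from one read, sorted by position.
    Adjacent diverging forks (-1, +1) flank an initiation event; adjacent
    converging forks (+1, -1) flank a termination event. Each event is placed
    at the midpoint between the two forks, a simplification."""
    events = []
    for (x1, d1), (x2, d2) in zip(forks, forks[1:]):
        if (d1, d2) == (-1, 1):
            events.append(("initiation", (x1 + x2) / 2))
        elif (d1, d2) == (1, -1):
            events.append(("termination", (x1 + x2) / 2))
    return events


track = [0, 2, 40, 75, 68, 60, 52, 45, 38, 30, 22, 15, 10, 10]  # toy % BrdU per 200-nt bin
print(fork_direction(track), fork_direction(track[::-1]))  # 1 -1
print(call_events([(1000, -1), (3000, 1), (8000, 1), (9000, -1)]))
```

The second call prints `[('initiation', 2000.0), ('termination', 8500.0)]`: the left pair of forks diverges and the right pair converges, while the two co-oriented forks in the middle produce no event.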

We designed, and successfully applied in yeast, a methodology that determines the orientation of replication forks by exploiting the unexpected asymmetric time dependence of BrdU incorporation. We are now developing a model of the kinetics of the intracellular BrdUTP/dTTP ratio that predicts the asymmetric BrdU incorporation profiles, so that the speed of each fork can be deduced by fitting its profile to the model. The data already accumulated (~5 forks per kb) will thus provide a first map of replication fork speed in yeast. Further experiments will densify this map to obtain the local fork speed distribution at high resolution. The next step will be to extend the analysis to the human genome. Since pulse-labelling of replication with BrdU is commonly used in DNA combing experiments, the analysis pipeline optimised in yeast is expected to be directly applicable to human cells. However, this will require overcoming a throughput challenge: to compensate for a genome 250-fold larger than the yeast genome, we are testing the selection of target regions prior to nanopore sequencing. Because the length of DNA replicated by a single fork is ~5-fold longer in human than in yeast, we are also developing a labelling protocol with successive BrdU pulse-chases, which will allow us to examine the stability of the speed of a single fork over time.
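Fitting fork speed to a kinetic model can be illustrated with a least-squares grid search. The rise-then-decay form of `brdu_ratio` and every parameter value (time constants, maximal BrdU fraction, the 2 kb/min synthetic speed) are placeholder assumptions; the project's actual model of BrdUTP/dTTP kinetics is still under development and may differ. A rightward-moving fork is assumed; a leftward fork would be fitted on the reversed track.

```python
import math


def brdu_ratio(t, pulse_min=4.0, r_max=0.8, tau_in=0.5, tau_out=6.0):
    """Hypothetical BrdU fraction incorporated at time t (min) after the pulse
    starts: exponential rise during the pulse, slower exponential decay in the
    chase. All parameter values are placeholders, not fitted constants."""
    if t < 0:
        return 0.0
    if t <= pulse_min:
        return r_max * (1.0 - math.exp(-t / tau_in))
    at_chase = r_max * (1.0 - math.exp(-pulse_min / tau_in))
    return at_chase * math.exp(-(t - pulse_min) / tau_out)


def predicted_profile(positions_kb, start_kb, speed_kb_per_min):
    """Expected BrdU along a rightward fork that reaches start_kb when the pulse begins."""
    return [brdu_ratio((x - start_kb) / speed_kb_per_min) for x in positions_kb]


def fit_fork(positions_kb, observed, speeds, starts):
    """Least-squares grid search over (speed, start); returns the best pair."""
    best = None
    for v in speeds:
        for x0 in starts:
            pred = predicted_profile(positions_kb, x0, v)
            sse = sum((o - p) ** 2 for o, p in zip(observed, pred))
            if best is None or sse < best[0]:
                best = (sse, v, x0)
    return best[1], best[2]


positions = [round(0.2 * i, 1) for i in range(101)]   # 200-nt bins over 20 kb
observed = predicted_profile(positions, 3.0, 2.0)      # noiseless synthetic track
speeds = [round(0.5 + 0.1 * i, 1) for i in range(36)]  # 0.5 to 4.0 kb/min
starts = [round(0.2 * i, 1) for i in range(31)]        # 0 to 6 kb
print(fit_fork(positions, observed, speeds, starts))   # (2.0, 3.0)
```

On the noiseless synthetic track the search recovers the speed and start position used to generate it. Real profiles are noisy, so a continuous optimiser with uncertainty estimates would be more appropriate than a fixed grid.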
The project also pursues computational objectives that aim to remove (i) the current dependence on ONT proprietary software, which could compromise the applicability of RepNano if ONT technology changes, and (ii) the need to assemble training datasets, which can be a complex process in some cases and limits the possibility of reproducing our approach for other applications.

The consortium has released RepNano to the scientific community as free, open-source software (MIT license). This allows other teams to directly perform single-molecule analyses based on BrdU labelling, whether for replication studies or other applications.
github.com/organic-chemistry/RepNano

1 article in an international peer reviewed journal:
Hennion M, Arbona JM, Lacroix L, Cruaud C, Theulot C, Le Tallec B, Proux F, Wu X, Novikova E, Engelen S, Lemainque A, Audit B, Hyrien O (2020) FORK-seq: replication landscape of the Saccharomyces cerevisiae genome by nanopore sequencing. Genome Biology 21, 125. hal-02979039

2 oral presentations including one invited talk:
- Invited oral presentation by M. Hennion. Mapping DNA replication using nanopore sequencing. (London Calling 2019, London, UK, 22-24 May 2019).
- Oral presentation by O. Hyrien. Quantitative, single-molecule analysis of replication initiation and fork progression using nanopore sequencing (Cold Spring Harbor Meeting on “Eukaryotic DNA Replication and Genome Maintenance”, Cold Spring Harbor, USA, 3–7 September 2019).

5 poster presentations by:
- M. Hennion. Mapping DNA replication using nanopore sequencing (Q-Life meeting, Paris, 11 April 2019; GDR 2019 «Stress réplicatif & cancer», 9-10 May 2019, Banyuls-sur-mer, France).
- B. Audit. Emergence of the spatio-temporal replication program - Role of origin distribution heterogeneity (Cold Spring Harbor Meeting on “Eukaryotic DNA Replication and Genome Maintenance”, Cold Spring Harbor, USA, 3–7 September 2019).
- H. Kabalane. Quantifying the co-regulation strength between DNA replication and gene transcription (idem).
- B. Theulot. Use of Nanopore sequencing to map genome replication at the single-molecule level (idem).

The project proposes a revolutionary method to characterize DNA replication on single molecules. Eukaryotes replicate DNA by activating numerous replication origins, each of which establishes two divergent replication forks that perform bidirectional DNA synthesis at variable speeds. Despite the advent of DNA microarrays and massive DNA sequencing technologies, origins remain difficult to identify and our knowledge of them is still incomplete. Solving the nature of origins and analyzing replication progression require high-throughput analyses at the single-molecule level, because cell-population methods provide only an average picture in which cell-to-cell variability and rare events are masked. State-of-the-art single-molecule methods monitor the incorporation of non-standard nucleotides during replication. However, they provide no sequence information unless they are combined with DNA probes, which is laborious, low-resolution and low-throughput. High-throughput single-molecule genomic profiling of DNA replication therefore remains to be achieved. This challenge can be met by using nanopore sequencing to monitor non-standard nucleotide incorporation: sequencing precisely localizes the analyzed molecules on the reference genome and promises high throughput. Nanopore sequencing is an emerging technology in which the ionic current through a bioengineered nanopore is recorded while a single-stranded DNA molecule transits through the pore. The nucleotide sequence of the molecule is then inferred from the resulting current variations using dedicated signal processing tools. We carried out a pilot project providing proof of concept that the sensitivity of available nanopore technology is sufficient for this project.
The main scientific barrier to be lifted is thus to build a nanopore current analysis pipeline that discriminates at least one non-standard nucleotide compatible with DNA replication studies, in addition to the four canonical nucleotides.

We will develop an open-source signal processing pipeline for nanopore current analysis together with experimental validations and applications, following neural-network-based approaches, which are currently the most efficient methods in the field. The first objective is to detect replication tracks resulting from a single short pulse (1-2 minutes) of bromodeoxyuridine (BrdU), a thymidine analog. This will allow replication initiation and progression to be measured much more precisely and at higher throughput than with state-of-the-art methods, while immediately identifying the genomic loci where replication events take place. The second objective is to make the technology also sensitive to the orientation of replication progression. We will tackle this challenge by using two consecutive BrdU pulses of different concentrations and developing an analysis pipeline that can discriminate them: the spatial order of low and high BrdU incorporation regions will give the orientation of replication progression for each pair of BrdU tracks. NanoPoRep will determine the distributions of single replication fork speed and orientation at all points of the yeast and human genomes with high resolution, and characterize DNA replication stochasticity. NanoPoRep will investigate the relationship between replication progression and chromatin organization and provide maps of potentially asymmetric replication barriers. The final product delivered by NanoPoRep will contain both software that other teams can use to directly analyze nanopore replication datasets and a programming suite that can readily be adapted to other experimental settings (a different non-standard nucleotide or nanopore). NanoPoRep will thus contribute to the emergence of nanopore sequencing as a technological breakthrough for biological assays and diagnostic tools, and thereby help address challenges in biology and human health.
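The two-pulse orientation principle can be sketched as follows. The thresholds separating unlabelled, low and high BrdU bins are placeholders, and because the text does not say which concentration is used first, the pulse order is a parameter here (`first_pulse`, a hypothetical name). Comparing the centres of the low and high segments rather than their edges is one simple design choice among many.

```python
def classify_bins(brdu, low_min=0.05, high_min=0.4):
    """Label each bin 0 (unlabelled), 1 (low BrdU) or 2 (high BrdU).
    The thresholds are placeholders that would depend on the two pulse concentrations."""
    return [2 if b >= high_min else 1 if b >= low_min else 0 for b in brdu]


def orientation_two_pulses(brdu, first_pulse="low"):
    """+1 if the fork moved towards increasing coordinates, -1 if towards
    decreasing ones, 0 if the track lacks either a low or a high segment.
    first_pulse names the concentration of the first pulse ("low" or "high")."""
    labels = classify_bins(brdu)
    low = [i for i, c in enumerate(labels) if c == 1]
    high = [i for i, c in enumerate(labels) if c == 2]
    if not low or not high:
        return 0
    low_centre, high_centre = sum(low) / len(low), sum(high) / len(high)
    if first_pulse == "low":
        first, second = low_centre, high_centre
    else:
        first, second = high_centre, low_centre
    # The fork synthesises the first-pulse segment before the second-pulse segment
    return 1 if second > first else -1


track = [0.0, 0.0, 0.10, 0.12, 0.10, 0.60, 0.65, 0.60, 0.0, 0.0]  # toy BrdU fractions
print(orientation_two_pulses(track), orientation_two_pulses(track[::-1]))  # 1 -1
```

With a low-then-high protocol, a low segment to the left of a high segment indicates a rightward fork; swapping the pulse order flips the interpretation of the same track.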

Project coordination

Benjamin AUDIT (LABORATOIRE DE PHYSIQUE DE L'ENS DE LYON)

The author of this summary is the project coordinator, who is responsible for the content of this summary. The ANR declines any responsibility for its contents.

Partner

LPENSL - CNRS LABORATOIRE DE PHYSIQUE DE L'ENS DE LYON
IBENS Institut de biologie de l'Ecole Normale Supérieure

ANR funding: 509,327 euros
Beginning and duration of the scientific project: February 2019 - 42 Months
