CE12 - Genetics, genomics and RNA 2021

Detailed and mechanistic characterization of Topologically Associating Domain (TAD) boundaries using complementary single-molecule sequencing and super-resolution imaging approaches – TADwalker

Borders in our genome: a new layer of regulatory complexity with unforeseen impact on the neighboring domains.

Chromosomes in mammalian cells are hubs of biological activity, including the precise regulation of the thousands of genes within the genome. To confine these activities to specific regions of the chromosomes, thus preventing unwanted interference, our chromosomes are separated into several thousand physically insulated domains (known as “Topologically Associating Domains” or “TADs”) within the cell nucleus. How the separation between TADs is achieved remains incompletely understood.

Deciphering the regulatory code of genomic borders and how this helps to improve the separation between neighboring TADs.

The complexity of mammals, including humans, requires highly complex and precisely regulated patterns of cell type-specific gene activity. To achieve such precision, many genes use additional regulatory elements (so-called enhancers) that can be located far away on the chromosome. Enhancers activate their target genes by forming DNA loops, but how enhancers activate their targets while ignoring other genes remains incompletely understood. A better understanding of this process is important, as incorrect loops between enhancers and promoters are frequent root causes of cancers and developmental defects. A major step forward in our understanding of how enhancers effectively select their target genes came from the discovery of TADs: the organization of mammalian chromosomes into several thousand physically insulated domains. Because of this insulation, DNA loops between enhancers and genes within the same domain are promoted, whereas loops between neighboring domains are rare. Consequently, the process by which enhancers find their target genes is greatly simplified. The organization of chromosomes into physically insulated domains within the cell nucleus requires borders between TADs. An essential factor in this process is the DNA-binding protein CTCF, whose binding is detected at nearly all TAD borders. Most previous studies on TAD structure and function had assumed that a single CTCF binding site was sufficient to create a functional TAD border. Yet, before the start of the TADwalker project, the project leader (Université Paris-Saclay / CNRS) had used genomics data to reveal that most TAD borders contain multiple binding sites for CTCF.
This finding suggested that CTCF binding cooperativity or synergy is needed to create TAD borders, providing potential explanations for two observations: (i) CTCF binds many additional sites in the genome without creating a noticeable TAD border, and (ii) many borders between TADs appear as extended zones within chromosomes, rather than as the single point expected if one CTCF binding site were sufficient. This latter observation was further supported by microscopy studies from the project partner (University of Pennsylvania, USA), who had noticed stochastic intermingling between regions surrounding a TAD border. The main aim of the TADwalker project was to determine whether clustered CTCF binding at TAD borders has distinctive characteristics (compared to binding elsewhere in the genome) and whether this clustered binding contributes both to the function of borders and to the structure of the neighboring TADs.

Based on our observation that CTCF binding sites are often clustered at TAD borders, combined with the finding that many TAD borders extend over longer distances, we hypothesized that the CTCF site that functions as the border may vary from cell to cell. To study the dynamic nature of TAD borders, and the associated variable function of individual CTCF binding sites, we therefore needed technologies that could measure TAD border function within individual cells.

To measure cell-to-cell variation in TAD border function, we combined two complementary technologies:

1. Single-molecule sequencing (project leader, Université Paris-Saclay / CNRS).

2. Super-resolution single-cell microscopy (project partner, University of Pennsylvania, USA).

These technologies were applied to cells where CTCF binding at a TAD border could be modulated in different ways:

1. Unmodified: CTCF could bind to all four naturally occurring sites that create the separation between the neighboring TADs.

2. Local modification of CTCF binding sites: a single site was removed, maintaining binding at three out of four naturally occurring sites.

3. Removal of the large majority of the CTCF protein from the cells: all four naturally occurring binding sites remained, but very little protein (< 5%) was available to bind these sites.

In parallel, we applied a computational strategy to characterize in more detail the diversity of TAD borders and clustered CTCF binding. Moreover, using in-depth data analysis and computer simulations based on polymer physics (which reproduce the behavior of the long DNA molecules that constitute chromosomes), we determined whether and how clustered CTCF binding influences the integrity of the neighboring TADs.

This collaborative project yielded two main results:

1. Our complementary single-molecule sequencing and super-resolution single-cell microscopy in normal cells revealed that each individual CTCF binding site measurably contributes to the separation between the two neighboring TADs. Yet, even at the border where four CTCF binding sites are grouped together, in around 5% of cases the regions located directly on both sides of the TAD border become intermingled.

>> Our FIRST conclusion is that TAD borders are moderately permeable, resulting in the fusion of TADs in a small fraction of cells.

By repeating these experiments in cells where either the number of binding sites or the amount of CTCF protein was reduced, we showed that intermingling increased, with a more pronounced effect when the total amount of protein was reduced than when a single site was removed.

>> Our SECOND conclusion is that CTCF binding sites additively contribute to the separation between neighboring TADs, thus providing a DNA-encoded means to regulate the permeability of TAD borders: the addition of more CTCF binding sites will reduce the permeability of the border.
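
The second conclusion can be made concrete with a deliberately simple toy model. The sketch below is not the project's analysis; it assumes, hypothetically, that an extruding cohesin complex reaching the border must pass each CTCF site in turn, and that each site stops it independently with probability `occupancy * block_prob`. Under that independence assumption, each site adds the same term to the logarithm of insulation, which is one simple way for sites to contribute additively. The function names, the parameter values, and the assumption that site occupancy falls in proportion to the < 5% residual protein are all illustrative choices, not measurements from TADwalker.

```python
"""Toy model of TAD border permeability (hypothetical; not fitted to any data).

Assumption: an extruding cohesin complex reaching the border must pass every
CTCF site in turn, and each site independently stops it with probability
occupancy * block_prob.
"""

import math


def border_permeability(n_sites: int, occupancy: float, block_prob: float) -> float:
    """Probability that an extruding complex crosses a border of n_sites CTCF sites."""
    if n_sites < 0:
        raise ValueError("n_sites must be non-negative")
    if not (0.0 <= occupancy <= 1.0 and 0.0 <= block_prob <= 1.0):
        raise ValueError("occupancy and block_prob must lie in [0, 1]")
    per_site_pass = 1.0 - occupancy * block_prob  # chance of slipping past one site
    return per_site_pass ** n_sites


def log_insulation(n_sites: int, occupancy: float, block_prob: float) -> float:
    """-log(permeability): every site adds the same term, so contributions are additive."""
    per_site_pass = 1.0 - occupancy * block_prob
    if per_site_pass <= 0.0:
        raise ValueError("a site that always blocks gives infinite insulation")
    return -n_sites * math.log(per_site_pass)


# Illustrative parameter values (assumptions, not measurements).
BLOCK_PROB = 0.5           # chance that a bound site stops cohesin
NORMAL_OCCUPANCY = 0.5     # fraction of time a site is bound in normal cells
DEPLETED_OCCUPANCY = 0.05  # assumes occupancy scales with the < 5% residual CTCF

scenarios = {
    "unmodified (4 sites)": border_permeability(4, NORMAL_OCCUPANCY, BLOCK_PROB),
    "one site removed (3 sites)": border_permeability(3, NORMAL_OCCUPANCY, BLOCK_PROB),
    "CTCF depleted (4 sites)": border_permeability(4, DEPLETED_OCCUPANCY, BLOCK_PROB),
}

if __name__ == "__main__":
    for name, permeability in scenarios.items():
        print(f"{name}: permeability {permeability:.3f}")
```

With these assumed values, the script prints permeabilities of 0.316 (four sites), 0.422 (three sites) and 0.904 (depleted CTCF). The ordering matches the qualitative result described above, with depletion weakening the border more than single-site removal, but the absolute numbers are per-encounter passage probabilities of a toy model and cannot be compared with the ~5% intermingling measured in cells.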

2. Our computational analysis and computer simulations first described the diversity of TAD borders within a mammalian genome. We found that borders vary in the size they occupy on the chromosome, ranging from narrow (~10% of borders) to highly extended. Next, we determined whether the size of the borders was linked to the distribution of CTCF binding sites. Here we found a direct correlation, confirming that CTCF binding occurs within a more extended region at extended borders. Within these borders, CTCF binding sites can be located anywhere, with no particular preference for specific positions such as the extremities. We then used computational simulations to determine whether and how border width influences mingling between the neighboring TADs. Here we found an unexpectedly large difference, with narrow borders promoting mingling between the neighboring TADs. This effect emerged not only for regions close to the borders, but also for regions much further away. Our simulations revealed that this is due to an unanticipated effect whereby borders attract the neighboring domains, thus promoting mingling. The same effect occurs at extended borders, but the increased distance between the neighboring TADs prevents mingling. By reanalyzing experimental data, we confirmed the effect identified in the simulations.

>> Our THIRD conclusion is that the width of borders influences the mingling of neighboring TADs, an effect that extends over long distances. The DNA-encoded spacing of CTCF binding sites therefore also influences the mingling of TADs over much longer distances.

These findings help decode how CTCF binding regulates communication between TADs, thereby better explaining how changes to the DNA sequence can perturb gene activity in various disease settings.

This project has shown how the DNA-encoded binding patterns of the CTCF protein help modulate the permeability of genomic borders and, by extension, the regulatory communication between neighboring TADs. For now, these observations remain mostly limited to the function of CTCF, which we studied at a small number of borders.

In follow-up experiments, including in the ANR-funded InsulatorGrammar project, we want to continue decoding how border structure modulates inter-domain communication. First, we want to develop approaches that allow us to address these questions more systematically: rather than removing parts of a border and then investigating the effects, we want to construct borders ourselves. For this, we will build a cell system that allows us to integrate our own “designer borders”, followed by a precise measurement of regulatory communication across the border. Second, we want to expand our investigations beyond CTCF alone, as other genomic features, including active genes, appear to be capable of acting as borders as well. Here, again, the incorporation of other genomic features into our “designer borders” will be instrumental.

Topologically Associating Domains (TADs) compartmentalize vertebrate genomes into functional neighborhoods for gene regulation, DNA replication, recombination and repair. Both structural variation in the genome and perturbed protein function can reorganize TAD structure. In the context of disease, TAD restructuring has been reported in a range of cancers and embryonic defects, and until now has mostly been linked to transcriptional deregulation and rewiring of gene-enhancer contacts.
TADs are formed by a continuously ongoing mechanism of Cohesin-mediated loop extrusion, which must be blocked at defined TAD boundaries to maintain separation between neighboring TADs. In vertebrate cells, the large majority of TAD boundaries are bound by the CTCF insulator protein. Experimental studies and in silico models of TAD formation have generally assumed that a single static CTCF binding site is sufficient to create a functional TAD boundary. Partner 1 in the project has recently reported that most TAD boundaries have a modular nature, with multiple CTCF binding sites clustered in extended transition zones. We speculate that this clustering counteracts the dynamic DNA-binding kinetics of CTCF. Partner 2, using super-resolution oligopainting, has reported that domains on either side of a TAD boundary can variably intermingle in individual cells, confirming the dynamic capacity of TAD boundaries. More recently, Partner 1 has developed Nano-C, a multi-contact 3C assay that allows simultaneous targeting of multiple viewpoints, and used it to confirm that individual CTCF sites within modular TAD boundaries additively contribute to insulation. Partner 2 has recently developed a 100-fold more efficient version of Oligopaints that is compatible with multicolor sequential FISH, used to reconstruct chromatin folding at single-allele resolution. Until now, a quantitative, molecular characterization of how modular and dynamic TAD boundaries interact with the loop extrusion machinery to insulate TADs has not been reported.
In our TADwalker project, we will capitalize on our complementary expertise in Nano-C and super-resolution Oligopaint imaging to perform a molecular characterization of TAD boundaries in mouse embryonic stem cells. Practically, we will combine the explorative capacity of Nano-C to identify elements with insulating function and the capacity of super-resolution imaging to quantitatively analyze large numbers of individual cells.
To achieve our overall goal, the project is divided into three work packages. First, we will generate an in-depth description of a representative set of modular TAD boundaries in normal cells, which will serve as a reference for our mechanistic studies. Second, we will determine the molecular interactions of modular TAD boundaries with the loop extrusion machinery. For this purpose, we will remove, one by one, the major components of the loop extrusion machinery or the CTCF protein, and then determine how the structure and insulation of TAD boundaries are affected. Third, we will precisely dissect the function of the modular nature of TAD boundaries by systematically removing or inverting individual CTCF binding sites within two selected TAD boundaries. Again, we will measure how these perturbations to the modularity of TAD boundaries influence the structure and insulating function of those boundaries.
The outcome of our project will be a highly detailed, molecular characterization of how modular TAD boundaries engage with the loop extrusion machinery to create stable TADs. Its results will allow an important refinement of existing models of TAD structure and function. Moreover, it will provide new leads to explain how distant structural variation, located within the extended transition zones formed by modular TAD boundaries, causes (moderate) disease-associated perturbations to gene regulation, DNA replication, recombination and repair.
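
The speculation that clustering counteracts the dynamic DNA-binding kinetics of CTCF can be illustrated with a short simulation. This is a minimal sketch under hypothetical assumptions that are not part of the project description: each CTCF site switches independently between bound and unbound states in discrete time steps (with per-step probabilities `p_on` and `p_off`), and a boundary is counted as "unguarded" whenever none of its sites is bound. It is a simple two-state Markov (telegraph) model, not the polymer or loop-extrusion simulations used in the project, and the function names and parameter values are illustrative.

```python
"""Toy telegraph-process sketch of dynamic CTCF binding at a clustered boundary.

Hypothetical assumptions (not taken from the project): sites bind and unbind
independently in discrete time steps; a boundary is "unguarded" whenever none
of its sites is bound.
"""

import random


def unguarded_fraction(n_sites: int, p_on: float, p_off: float,
                       n_steps: int, seed: int = 0) -> float:
    """Simulated fraction of time steps during which no site in the cluster is bound."""
    if p_on + p_off <= 0.0 or n_steps <= 0:
        raise ValueError("p_on + p_off and n_steps must be positive")
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    occupancy = p_on / (p_on + p_off)  # stationary bound probability of one site
    # Draw the initial states from the stationary distribution (no burn-in needed).
    bound = [rng.random() < occupancy for _ in range(n_sites)]
    unguarded = 0
    for _ in range(n_steps):
        for i in range(n_sites):
            if bound[i]:
                bound[i] = rng.random() >= p_off  # stays bound unless it unbinds
            else:
                bound[i] = rng.random() < p_on  # free site binds with probability p_on
        if not any(bound):
            unguarded += 1
    return unguarded / n_steps


def expected_unguarded(n_sites: int, p_on: float, p_off: float) -> float:
    """Analytical steady state for independent sites: (p_off / (p_on + p_off)) ** n."""
    return (p_off / (p_on + p_off)) ** n_sites


if __name__ == "__main__":
    for n in (1, 4):
        simulated = unguarded_fraction(n, p_on=0.1, p_off=0.1, n_steps=100_000)
        expected = expected_unguarded(n, p_on=0.1, p_off=0.1)
        print(f"{n} site(s): simulated {simulated:.3f}, expected {expected:.3f}")
```

With `p_on = p_off = 0.1` (each site bound half of the time), a single site leaves the boundary unguarded about half of the time, whereas a cluster of four independent sites does so only rarely, with the simulated fraction close to the analytical value of 0.0625. Under these assumptions, redundancy buffers the boundary against the binding and unbinding of any individual site.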

Project coordination

Daan Noordermeer (Institut de Biologie Intégrative de la Cellule)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partnership

I2BC Institut de Biologie Intégrative de la Cellule
University of Pennsylvania / Perelman School of Medicine

ANR funding: 285,562 euros
Beginning and duration of the scientific project: December 2021 - 36 months