Synthetic Enhancers

One of the main discoveries in biology in the post-genomic era is the realization that the difference between organisms (e.g. mouse and human) is not due to genes. Rather, the difference is rooted in the algorithm or program that controls when, where, and how much a particular gene is expressed. A seminal work published several years ago (Wilson, M.D. et al. Science (2008) 322:434-438) showed that when the genomic content of a mouse cell is replaced with human DNA (i.e. leaving the original mouse proteomic profile intact), the resultant gene-expression profile closely resembles that of the human cell. Had the expression profile resembled that of the mouse, then the genes or proteins could have been considered the crucial difference between the two organisms. As a result of this work and others, a new picture of the genome is beginning to emerge, in which the genome contains not only genes (protein-coding and RNA), but also a complex operating system that decides when, where, and how much a particular gene is expressed.

In eukaryotic organisms this program is executed via a host of regulatory sequences termed enhancers. Enhancers are complex regulatory regions that consist of many binding sites for several Transcription Factors (TFs). They are ubiquitous in all organisms, and are in particular associated with controlling gene expression during development. Enhancers are assumed to occupy ~10% of the genome (as compared with 1-2% for protein-coding genes), and likely number in the hundreds of thousands in the human genome alone.

While recent high-throughput analysis of enhancers has substantially increased our understanding of particular enhancers, the generalization of the findings to other systems has been limited. This is because, unlike protein-coding genes, where the vast majority of mutations are detrimental to the protein, in enhancers nearly every mutation produces some sort of functional change. This implies that in order to decipher a particular natural enhancer, millions of mutations must be carried out to collect enough functional changes to develop a generalized mutational model.

To gain faster insight into the underlying regulatory program, we devised the model-guided synthetic enhancer approach (see Schema 1). Here, we design our synthetic enhancers based on the predictions of numerical simulations, which use the Self-Avoiding Wormlike Chain model to predict the probability of looping for a particular arrangement of binding sites. After several rounds of synthetic biology experiments and model improvement, we use the model to make genomic predictions, and use both bioinformatic analysis and genome-editing techniques to test these predictions on real genomes. Our designs are based on the understanding that enhancers are modular objects, which can be divided into three modules or segments:

  • The driver module – akin to your computer’s power switch – without the presence of this TF, expression is not possible.
  • The program module – this is the area on the DNA that integrates binding sites for many TFs. The presence or absence of TFs either increases (activates) or decreases (represses) the probability of transcription. Thus, this module is akin to the software installed on your computer.
  • The CPU module – this is the actual promoter region, which typically binds a poised RNA polymerase holoenzyme complex. Transcription commences only if DNA looping takes place and the driver forms a contact with the promoter. As a result, this region is responsible for integrating all the inputs and executing a computation-like process, yielding an RNA transcript at a particular time and place.

This computational process is not typically digital, but rather analog.

A. Synthetic Bacterial Enhancers
We first tested our approach on bacterial enhancers. Bacterial enhancers are typically 100-200 bp in length, and consist of a σ54 promoter and a tandem of binding sites for a driver complex, located upstream of the promoter. As a result, bacterial enhancers are significantly simpler than their eukaryotic counterparts. Due to their simplicity, we were able to start our designs with a minimal enhancer, and proceed to increase its complexity. Our main goal was to determine what regulatory effects are possible in a bacterial enhancer architecture, and what the underlying mechanism is.
Using a combined theoretical and experimental approach, we found that bacterial enhancers can generate an intricate set of regulatory mechanisms that are mostly a consequence of DNA looping. We devised a self-avoiding wormlike chain (SAWLC) model to provide a possible mechanism, and using iterative rounds of experiment and modeling, we were able to continuously refine it. To date we have constructed ~1000 synthetic bacterial enhancer circuits, and confirmed many of our looping model’s predictions (see Figure 1 below).

Figure 1: Combined effects of excluded volume, stiffening, and bending
Top – bacterial synthetic enhancer schematic depicting the NtrC driver-activator binding sites positioned 192 bp upstream from the center of the glnAp2 σ54 promoter. The synthetic bacterial enhancer has an additional LacI binding site, which is positioned at various distances (k) from the NtrC binding sites. The regulatory effect generated by either LacI or the LacI-GST fusion is quantified via the mCherry fluorescence expression level ratio. The expression level ratio is measured by dividing the basal mCherry expression of the synthetic enhancer when the LacI or LacI-GST TF is present by the expression level when the TF is absent. (Left panel) Expression level ratio measurements for the synthetic enhancers with a single LacI binding site. Blue and red circles correspond to LacI-GST and LacI expression level ratios, respectively. Note the periodic behavior with ~11 bp periodicity. (Right panel) Model predictions. 1D cross-sections for N=192 links comparing two protrusion volumes that can also bend the thick chain by 10° (blue – 8.16 nm, red – 5.44 nm). The model shows that the excluded volume and bending effects oppose each other: while the former inhibits looping when the TF is positioned inside the loop, the latter promotes it. Thus, if a bending protein’s size can be made sufficiently large, a 180° phase shift in the regulatory effect should be achieved. The experimental data confirm the model’s predictions of the periodicity and of the 180° phase shift due to the increased volume endowed to the LacI protein by the GST moiety.
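
The two quantities discussed in this caption, the expression level ratio and the ~11 bp periodicity with its 180° phase shift, can be illustrated with a short Python sketch. This is a minimal illustration and not the analysis used to produce Figure 1: the function names are hypothetical, and the projection onto a single sinusoid is only a rough estimator that assumes the binding-site positions are evenly spaced and cover several helical turns.

```python
import math

def expression_level_ratio(with_tf, without_tf):
    """Reporter signal with the TF present divided by the signal with it absent."""
    if without_tf <= 0:
        raise ValueError("reference expression must be positive")
    return with_tf / without_tf

def sinusoid_projection(positions, ratios, period):
    """Project mean-subtracted ratios onto cos/sin at a fixed period (in bp).

    Returns (amplitude, phase_deg) for the model mean + A*cos(2*pi*k/period + phase).
    Assumes roughly uniform sampling over whole periods (a hypothetical simplification).
    """
    n = len(positions)
    mean = sum(ratios) / n
    omega = 2.0 * math.pi / period
    a = 2.0 / n * sum((r - mean) * math.cos(omega * k) for k, r in zip(positions, ratios))
    b = 2.0 / n * sum((r - mean) * math.sin(omega * k) for k, r in zip(positions, ratios))
    return math.hypot(a, b), math.degrees(math.atan2(-b, a))

def best_period(positions, ratios, candidates):
    """Candidate period (bp) with the largest projected amplitude."""
    return max(candidates, key=lambda p: sinusoid_projection(positions, ratios, p)[0])

def phase_difference(phase_a_deg, phase_b_deg):
    """Smallest angle (0 to 180 degrees) between two fitted phases."""
    d = abs(phase_a_deg - phase_b_deg) % 360.0
    return min(d, 360.0 - d)
```

Applied to two measured series (for example LacI and LacI-GST ratios at the same positions k), `best_period` would give a rough estimate of the periodicity, and `phase_difference` of the two fitted phases would indicate whether the curves are in phase (near 0°) or shifted by half a helical turn (near 180°).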

For a more detailed description of our bacterial synthetic enhancers see:
Using synthetic bacterial enhancers to reveal a looping-based mechanism for quenching-like repression. Nat. Commun. 7:10407 (2016).
Michal Brunwasser-Meirom, Yaroslav Pollak, Sarah Goldberg, Lior Levy, Orna Atar, Roee Amit

A protrusion can “eclipse” looping of a long self-avoiding chain (2016).
Yaroslav Pollak, Sarah Goldberg, Roee Amit

Self-avoiding wormlike chain model for double-stranded-DNA loop formation. Physical Review E 90, 052602 (2014).
Yaroslav Pollak, Sarah Goldberg, Roee Amit

B. Poised polymerase as a road-block and new bacterial regulatory mechanism
This project is a consequence of our synthetic enhancer strategy. We began by examining genome-level questions, such as the importance of particular promoter architectures and how close to one another two enhancers can be without interfering with each other’s promoter. By mixing and matching many different enhancer and promoter motifs, and taking the natural components out of context, we found that σ54 promoters can serve as powerful and directional downstream regulators of σ70 promoters. This regulatory capability implies that σ54 promoters, which can also function as simple σ54 binding sites, can play a dual role of promoter and regulator within the bacterial genome. Using oligo-pool technology, we tested all of the potential σ54 binding sites of E. coli and V. cholerae, and an additional set of σ54 promoters from another 20 bacterial species, for this silencing phenomenon. Using this focused design approach, we found that the silencing is encoded in multiple short CT-rich segments that cumulatively encode an anti-Shine-Dalgarno sequence. Thus, what we found is a “context-level” effect, where bacterial insulation is encoded via a collection of short 3-5 nucleotide motifs that cumulatively insulate σ54 promoters from transcriptional read-through.

Figure 2: Oligo-library analysis of the insulation phenomenon
(A) Oligo library design and schematic description of the protocol. In brief, the synthesized oligo library (Twist Bioscience) was cloned into E. coli competent cells, which were then grown and sorted by FACS into 14 expression bins according to the mCherry to eYFP fluorescence ratio, or expression level ratio (e.l.r). DNA from the cells of each bin was barcoded and pooled into a single sequencing run to produce an e.l.r profile for each variant. For details see Materials and Methods. (B) Library expression distribution. Heat map of the smoothed, normalized number of reads per expression bin obtained for 10438 analyzed variants, ordered according to increasing mean e.l.r. (C) Left: heat map ordering of the examined variants by mean e.l.r value, with silenced variants at the top. Middle: for each variant in the left panel, each enriched 5mer (E5mer) appearance is marked by a brown line at its position within the variant sequence. (Green shade) σ54 core promoter region. Top: σ54 core promoter consensus sequence (Barrios et al., 1999). Right: running average of the number of E5mers observed within a variant in the ordered heat map. Top: a PSSM summarizing a multiple alignment of the E5mers found with DRIMust.
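
The step from per-bin read counts to an ordered list of variants can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline described in Materials and Methods: the function names, the `bin_centers` values (a representative e.l.r for each sorting bin), and the use of a simple read-weighted mean are all hypothetical, and any smoothing or per-bin depth correction is omitted.

```python
def normalized_profile(reads_per_bin):
    """Convert raw read counts per expression bin into fractions that sum to 1."""
    total = sum(reads_per_bin)
    if total == 0:
        raise ValueError("variant has no reads")
    return [r / total for r in reads_per_bin]

def mean_elr(reads_per_bin, bin_centers):
    """Read-weighted mean expression level ratio of one variant.

    bin_centers holds one representative e.l.r value per sorting bin
    (hypothetical input; the real bin boundaries come from the FACS gates).
    """
    if len(reads_per_bin) != len(bin_centers):
        raise ValueError("one bin center is required per bin")
    profile = normalized_profile(reads_per_bin)
    return sum(p * c for p, c in zip(profile, bin_centers))

def order_variants(read_table, bin_centers):
    """Return variant IDs sorted by increasing mean e.l.r.

    read_table maps a variant ID to its list of read counts per bin.
    """
    return sorted(read_table, key=lambda v: mean_elr(read_table[v], bin_centers))
```

With 14 bins, `read_table` would hold a 14-element list per variant, and the output of `order_variants` corresponds to the row ordering used for a heat map such as the one in panel B.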

Short CT-rich motifs encoded within σ54 promoters insulate downstream genes from transcriptional read-through
Lior Levy, Leon Anavy, Oz Solomon, Roni Cohen, Shilo Ohayon, Orna Atar, Sarah Goldberg, Zohar Yakhini, Roee Amit

C. Synthetic Eukaryotic Enhancers and The MRG-GRammar Project (Massive Reverse Genomics to Decipher Gene Regulatory Grammar)
We are now expanding our approach to eukaryotic organisms through the MRG project. In this project we will construct synthetic enhancers in mammalian and yeast cells, building them from the ground up using endogenous drivers (e.g. c-Myc) or synthetic transcription factors (e.g. nuclease-deficient Cas9/CRISPR/VP64). We plan to fund this work using the MRG-GRammar project.
MRG-GRammar aims to devise an entirely new strategy for deciphering the regulatory rules of gene regulation. We will leverage Synthetic Biology with cutting-edge DNA synthesis technologies and high-throughput analysis to generate new types of biological datasets that systematically explore all possible regulatory landscapes rather than just the naturally occurring regulatory sequences.
The extensive and unbiased nature of these unique datasets will allow us to build new models explaining different aspects of regulatory activity, which will be tested in second-generation libraries designed based on model predictions. Through such an iterative process, we expect to make a significant breakthrough in deciphering, and evolving, the regulatory code. Our strategy synergizes four orthogonal objectives that will form a new knowledge base from which the regulatory algorithm can be derived. We will employ our strategy on diverse model organisms from the tree of life: bacteria, yeast, mouse cell lines, mouse stem cells, human cell lines, and finally, whole D. melanogaster embryos.
We expect this multidisciplinary synthetic biology approach to generate a major technological advance, which will provide the community with algorithms that will not only decipher extant natural regulatory code, but also interpret variations leading to a profoundly deeper understanding of the origins of many diseases. We expect our models to also serve as a reference in designing and implementing accurate and more controllable synthetic biology devices, with applications in fuel production, healthcare and other industrial fields. Thus, our ultimate goal is to substantially accelerate the advance of technologies and knowledge related to generating systematic and personal therapeutic solutions based on the analysis of each individual’s natural genomic variations.
MRG-GRammar is funded by FET-Open and is a collaboration between the Amit group, Eran Segal’s group (WIS), Sarah Teichmann’s group (EMBL), Eileen Furlong’s group (EMBL), and Jussi Taipale’s group (Karolinska Institute).

Figure 3: MRG-GRammar strategy schematic


D. Our Numerical simulations
In order to properly simulate enhancer-based transcriptional regulation (bacterial and eukaryotic), we devised a new approach to simulating nucleoprotein assemblies (see animation). Our approach bridges the gap between atomic-level molecular dynamics simulations (which can only be used to simulate DNA chains 25 bp long) and the qualitative and poorly predictive wormlike chain model. In our model, DNA is treated as a thick chain, and a nucleoprotein assembly is treated as a thick chain with protrusions (see thick chain drawing). The thick chain with protrusions serves as a crude structural model of the enhancer, and statistical mechanics tools are then used to faithfully simulate the possible spatial configurations that such a chain can take. After generating sufficiently large configurational ensembles, we focus on those states that are most likely to be “looped”, or transcriptionally active, which subsequently allows us to predict the regulatory output. Our model also allows us to incorporate other structural parameters of protein-DNA interactions, such as the degree of bending and the stiffening of the nucleoprotein chain, which together generate a powerful and predictive simulation of how particular enhancer binding-site architectures affect the transcriptional output (see animations).
The thick-chain-with-protrusion model predicts that many regulatory effects can be induced as a result of an excluded volume effect, that is, by the sheer presence and the particular location of the protrusion on the chain. This implies that, literally, any transcription factor can generate a substantial regulatory effect when properly positioned within an enhancer. Our model provides an elegant mechanism for “quenching-repression”, irrespective of additional chromatin-modifying effects. Quenching is a repression phenomenon in which a transcription factor is somehow able to down-regulate expression despite binding several thousand base pairs away from the promoter, and up to 150 bp away from the nearest activator on an enhancer. Our model shows (see schema below) that it is possible to generate such an effect by simple excluded volume considerations. In this case, the TF or protruding object essentially “eclipses”, or partially blocks, the line of sight of one chain end from the other, thus leading to repression. The eclipsing effect is strongly dependent on the protrusion’s position relative to the chain end, and as such must exhibit an 11 bp (DNA helical repeat) periodicity. We observed this periodic signature in our bacterial synthetic enhancer experiments (see Fig. 1 above). A typical set of results is shown in the schema below.
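
To give a feel for the ensemble-based estimate of looping, the following Python sketch samples configurations of a much simpler chain. It is a deliberately reduced illustration and not the thick-chain model described above: it uses a phantom discrete wormlike chain with no excluded volume, no chain thickness, and no protrusions, and it counts a configuration as “looped” whenever the two ends fall within a capture radius. The function names and the capture-radius criterion are assumptions made for the sketch; the 0.34 nm rise per bp and ~50 nm persistence length mentioned in the usage note are textbook values for double-stranded DNA, not parameters taken from this page.

```python
import math
import random

def sample_cos_theta(kappa, rng):
    """Draw cos(theta) from p(x) ~ exp(kappa * x) on [-1, 1] by inverse-CDF sampling."""
    u = 1.0 - rng.random()  # uniform in (0, 1], avoids log(0)
    x = 1.0 + math.log(u + (1.0 - u) * math.exp(-2.0 * kappa)) / kappa
    return max(-1.0, min(1.0, x))

def _perpendicular_frame(t):
    """Two unit vectors orthogonal to the unit tangent t and to each other."""
    hx, hy, hz = (1.0, 0.0, 0.0) if abs(t[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = (t[1] * hz - t[2] * hy, t[2] * hx - t[0] * hz, t[0] * hy - t[1] * hx)
    norm = math.sqrt(sum(c * c for c in e1))
    e1 = tuple(c / norm for c in e1)
    e2 = (t[1] * e1[2] - t[2] * e1[1], t[2] * e1[0] - t[0] * e1[2], t[0] * e1[1] - t[1] * e1[0])
    return e1, e2

def sample_end_to_end(n_links, link_length, persistence_length, rng):
    """End-to-end vector of one phantom discrete wormlike chain.

    Bending energy per joint is kappa * (1 - cos(theta)) in units of kT, with
    kappa = persistence_length / link_length (accurate when kappa >> 1).
    """
    kappa = persistence_length / link_length
    t = (0.0, 0.0, 1.0)
    pos = [0.0, 0.0, 0.0]
    for _ in range(n_links):
        for i in range(3):
            pos[i] += link_length * t[i]
        cos_t = sample_cos_theta(kappa, rng)
        sin_t = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))
        phi = rng.uniform(0.0, 2.0 * math.pi)  # uniform azimuth: no twist coupling
        e1, e2 = _perpendicular_frame(t)
        new_t = tuple(cos_t * t[i] + sin_t * (math.cos(phi) * e1[i] + math.sin(phi) * e2[i])
                      for i in range(3))
        norm = math.sqrt(sum(c * c for c in new_t))
        t = tuple(c / norm for c in new_t)
    return tuple(pos)

def looping_fraction(n_chains, n_links, link_length, persistence_length, capture_radius, seed=0):
    """Fraction of sampled chains whose ends lie within capture_radius of each other."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_chains):
        x, y, z = sample_end_to_end(n_links, link_length, persistence_length, rng)
        if math.sqrt(x * x + y * y + z * z) < capture_radius:
            hits += 1
    return hits / n_chains
```

A call such as `looping_fraction(2000, 600, 0.34, 50.0, 10.0)` would estimate the looped fraction for a 600 bp phantom chain with lengths in nm. Looped states are rare for short chains, so plain sampling of this kind needs very large ensembles; reproducing the eclipsing effect additionally requires rejecting configurations in which the chain or a protrusion overlaps the rest of the chain, which this sketch does not attempt.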


Figure 4: Explaining quenching repression. (A) F (fold down-regulation) is plotted as a function of chain length L for several values of Ro and K, and γk = 0° (solid lines) or 180° (dashed lines). The locations of the protrusions are denoted by circles on the corresponding curves. Note that an inhibitory regulatory effect is predicted independent of chain length. (B) F plotted as a function of Ro and K, for γk = 0. F is calculated numerically as the average of F (L) over the range of L values where F (L) ≈ const. The solid red curve is a visual aid: if the segment of the chain between the origin and the protrusion location were straight, points on the red curve would result in the protrusion touching the looping volume δr. Quenching repression effects can be observed up to about 150 bp from the chain end, consistent with experimental observations.

Using synthetic bacterial enhancers to reveal a looping-based mechanism for quenching-like repression. Nat. Commun. 7:10407 (2016).
Michal Brunwasser-Meirom, Yaroslav Pollak, Sarah Goldberg, Lior Levy, Orna Atar, Roee Amit

E. Genomic consequences of our synthetic enhancer experiments

The importance of INDELs
The main consequence of our combined numerical simulation and synthetic biology experiments is that short insertions or deletions (INDELs) of DNA sequences, whose sequence does not code for anything, can have a functional role. Thus, regulatory effects can be triggered not only by sequence-specific information, but also by the sheer physical characteristics of DNA. In this case, we observed that INDELs inside loops which are integer multiples of the DNA helical repeat (i.e. 11, 22, 32-33, etc.) function more as neutral mutations and as a result are not function altering. In contrast, INDELs which are odd-integer multiples of half the DNA’s helical repeat (i.e. 5-6, 16-17, etc.) are function altering, as they change the relative orientation between adjacent proteins on the enhancer. This observation led us to predict that in related enhancers, 11-bp INDELs should be more prevalent due to their non-function-altering nature, while 6-bp INDELs should be rare.
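
The helical-phase arithmetic behind this rule can be written out explicitly. The sketch below is only an illustration of the reasoning: the function names, the default 11 bp helical repeat (the value quoted in this text), and the 45° tolerance that separates the three classes are assumptions chosen for the example, not thresholds derived from our data.

```python
def rotation_offset_deg(indel_length, helical_repeat=11.0):
    """Angular offset (0 to 180 degrees) that an indel adds between flanking sites."""
    angle = (abs(indel_length) % helical_repeat) / helical_repeat * 360.0
    return min(angle, 360.0 - angle)

def classify_indel(indel_length, helical_repeat=11.0, tolerance_deg=45.0):
    """Label an indel by how it changes the relative orientation of bound proteins.

    tolerance_deg is a hypothetical cutoff used only for this illustration.
    """
    offset = rotation_offset_deg(indel_length, helical_repeat)
    if offset <= tolerance_deg:
        return "phase-preserving"   # near an integer number of helical turns
    if offset >= 180.0 - tolerance_deg:
        return "phase-flipping"     # near an odd multiple of half a turn
    return "intermediate"

if __name__ == "__main__":
    for length in (5, 6, 11, 16, 17, 22, 32, 33):
        print(length, round(rotation_offset_deg(length), 1), classify_indel(length))
```

With these defaults, lengths of 11, 22, 32, and 33 bp fall in the phase-preserving class and lengths of 5, 6, 16, and 17 bp fall in the phase-flipping class, matching the examples given in the text.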
We are currently in the process of checking this prediction across all bacterial genomes, and so far have been able to confirm it for the qrr genes in the Vibrio family. To do so, we annotated 64 putative bacterial enhancers which regulate a qrr gene in various Vibrio species. We cross-correlated all enhancer sequences and found a distinct periodicity of 10.5 bp in the computation. In contrast, for sequences immediately upstream of the enhancer, no such correlation was found. Thus, this analysis seems to provide additional support for the biological validity of the regulatory mechanism we propose.
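
The exact cross-correlation computation is not spelled out here. One simple possibility, shown purely as a hedged sketch, is a lagged match-fraction correlation averaged over all pairs of sequences; the function names and the choice of base identity as the similarity measure are assumptions, and the published analysis may differ.

```python
from itertools import combinations

def match_fraction(seq_a, seq_b, lag):
    """Fraction of identical bases when seq_b is shifted by `lag` positions along seq_a."""
    pairs = [(seq_a[i + lag], seq_b[i])
             for i in range(len(seq_b))
             if 0 <= i + lag < len(seq_a)]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)

def mean_cross_correlation(sequences, max_lag):
    """Average match fraction over all sequence pairs for lags 0..max_lag."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    return [sum(match_fraction(a, b, lag) for a, b in pairs) / len(pairs)
            for lag in range(max_lag + 1)]
```

A periodic signal in the resulting list of values, as a function of lag, would then be the quantity to inspect for a repeat close to the DNA helical repeat.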

Figure 5: Natural bacterial enhancers show evidence for the 11 bp INDEL hypothesis. (Top) Schema showing the phylogenetic relationship between the different Vibrio species, each containing a different number of qrr (quorum regulatory RNA) genes. (Middle) A schema of the qrr enhancer with LuxO as the activator. We cross-correlated the intra-loop sequence in all qrr enhancers. (Bottom) Results of the cross-correlation computation as a function of looping length, showing a periodic function with 10.5 bp periodicity.

Using synthetic bacterial enhancers to reveal a looping-based mechanism for quenching-like repression. Nat. Commun. 7:10407 (2016).
Michal Brunwasser-Meirom, Yaroslav Pollak, Sarah Goldberg, Lior Levy, Orna Atar, Roee Amit

Short CT-rich motifs encoded within σ54 promoters insulate downstream genes from transcriptional read-through
Lior Levy, Leon Anavy, Oz Solomon, Roni Cohen, Shilo Ohayon, Orna Atar, Sarah Goldberg, Zohar Yakhini, Roee Amit

Depletion of CT-rich motifs upstream of RBS
One of the immediate consequences of the bacterial insulation phenomenon that we uncovered (see Fig. 2) is the detrimental role that CT-rich motifs can play if positioned at a high concentration immediately upstream of bacterial ribosome binding sites (RBSs). Thus, to mitigate this insulation effect, regions upstream of RBSs should on average be depleted of CT-rich 3-5 nucleotide sequences. To check for this, we analyzed 591 bacterial mesophilic genomes and found that 364 genomes were indeed depleted of CT-rich motifs upstream of putative RBSs, supporting our hypothesis.

Figure 6: Analysis of prevalence of CT-rich k-mers around E. coli promoters and in other bacterial genomes
(A) A scheme for the analysis of the occurrences of aSD:SD around σ54 TSS positions. Step i – we scan the 50 bp region downstream of the TSS and locate the SD. Step ii – the best-matched aSD, i.e. the hexamer which is the best Hamming reverse complement to the SD found in Step i, is located in the 50 bp upstream of the TSS. (B) Venn diagram for promoters found with an aSD sequence that is either a perfect match or at most 1 bp away. Square: the space of all putative E. coli promoters. Red circle: the space of all putative E. coli σ54 promoters. Green and purple circles: promoters that possess either a perfect aSD match or one off by 1 bp. (C) Distribution of % proximal occurrences (%6mer:SD) within 300 bp separation (see Methods for details) of CT-rich to GA-rich (aSD:SD) pairs (orange) as compared with the % proximal occurrences of random to GA-rich (R:SD) hexamer pairs for E. coli. (D) Scatter plot for 591 mesophile and psychrophile bacterial genomes, where each genome is represented by the mean value for the aSD:SD (x-axis) and R:SD (y-axis) % proximal occurrence distributions. Dashed line (x=y) corresponds to the null model assuming that mean aSD:SD should equal mean R:SD. Vertical dash-dot line (x=7.5% occurrences at 300 bp) corresponds to the null expected value
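
The two-step search in panel A can be sketched in Python as follows. This is a minimal sketch rather than the code used for the figure: the function names are hypothetical, the SD is assumed to be located as the hexamer closest in Hamming distance to the canonical AGGAGG consensus, sequences are assumed to be uppercase DNA on the coding strand, and ties are resolved by taking the first window.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def best_window(region, target):
    """Return (start, window, distance) for the window of region closest to target."""
    k = len(target)
    if len(region) < k:
        raise ValueError("region is shorter than the target")
    best = None
    for i in range(len(region) - k + 1):
        window = region[i:i + k]
        d = hamming(window, target)
        if best is None or d < best[2]:
            best = (i, window, d)
    return best

def find_sd_asd_pair(sequence, tss, flank=50, sd_consensus="AGGAGG"):
    """Step i: locate the SD downstream of the TSS; step ii: best aSD upstream.

    Returns (sd, asd, mismatches), where mismatches is the Hamming distance
    between the upstream hexamer and the reverse complement of the SD.
    """
    downstream = sequence[tss:tss + flank]
    upstream = sequence[max(0, tss - flank):tss]
    _, sd, _ = best_window(downstream, sd_consensus)        # step i (assumed criterion)
    _, asd, mismatches = best_window(upstream, reverse_complement(sd))  # step ii
    return sd, asd, mismatches
```

In this sketch, a promoter with `mismatches` equal to 0 would count as a perfect aSD match and one with `mismatches` equal to 1 as off by 1 bp, the two categories shown in panel B.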