The Biological Rosetta Stone approach is currently being applied in two major research directions:
- Synthetic Enhancers
- Synthetic RNA scaffolds.
In addition, we also have a third direction, which emerges from the various undergraduate iGEM projects.
One of the main discoveries in biology in the post-genomic era is the realization that the difference between organisms (e.g. mouse and human) is not due to the genes themselves. Rather, the difference is rooted in the algorithm or program that controls when, where, and how much a particular gene is expressed. A seminal study (Wilson, M.D. et al. Science (2008) 322:434-438) showed that when the genomic content of a mouse cell is replaced with human DNA (i.e. leaving the original mouse proteomic profile intact), the resultant gene-expression profile closely resembles that of the human cell. Had the expression profile resembled that of the mouse, then the genes or proteins could be considered the crucial difference between the two organisms. As a result of this work and others, a new picture of the genome is beginning to emerge: the genome contains not only genes (protein-coding and RNA), but also a complex operating system, which decides when, where, and how much a particular gene is expressed.
In eukaryotic organisms this program is executed via a host of regulatory sequences termed enhancers. Enhancers are complex regulatory regions that consist of many binding sites for several transcription factors (TFs). They are ubiquitous in all organisms, and are particularly associated with the control of gene expression during development. Enhancers are estimated to occupy ~10% of the genome (as compared with 1-2% for protein-coding genes), and likely number in the hundreds of thousands in the human genome alone.
While recent high-throughput analysis of enhancers has substantially increased our understanding of particular enhancers, generalizing the findings to other systems has proven difficult. Unlike protein-coding genes, where the vast majority of mutations are detrimental to the protein, in enhancers nearly every mutation produces some sort of functional change. This implies that deciphering a particular natural enhancer requires millions of mutations in order to collect enough functional changes to develop a generalized mutational model.
To gain faster insight into the underlying regulatory rules, we devised the model-guided synthetic enhancer approach (see Schema 1). Here, we design our synthetic enhancers based on the predictions of numerical simulations, which utilize the self-avoiding wormlike chain model to predict the probability of looping for a particular arrangement of binding sites. After several rounds of synthetic biology experiments and model improvement, we use the model to make genomic predictions, and use both bioinformatic analysis and genome-editing techniques to test these predictions on real genomes. Our designs are based on the understanding that enhancers are modular objects, which can be divided into three modules or segments:
- The driver module – akin to your computer’s power switch – without the presence of this TF, expression is not possible.
- The program module – this is the area on the DNA that integrates binding sites for many TFs. The presence or absence of TFs either increases (activates) or decreases (represses) the probability of transcription. Thus, this module is akin to the software installed on your computer.
- The CPU module – this is the actual promoter region, which typically binds a poised RNA polymerase holoenzyme complex. Transcription commences only if DNA looping takes place and the driver forms a contact with the promoter. As a result, this region is responsible for integrating all the inputs and executing a computation-like process, yielding an RNA transcript at a particular time and place.
This computational process is not typically digital, but rather analog.
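The division of labor between the three modules can be illustrated with a toy calculation. The sketch below is only an illustration of the analogy, not our simulation: the function name `enhancer_output`, the multiplicative fold-change rule for TFs, and all numerical values are assumptions chosen for clarity. The driver acts as an on/off gate, the program module rescales the looping probability, and the output is a graded (analog) value rather than a binary one.

```python
def enhancer_output(driver_bound, base_loop_prob, tf_fold_changes, max_rate=1.0):
    """Toy analog output of an enhancer (hypothetical model, for illustration only).

    driver_bound     -- True if the driver TF is present (the "power switch")
    base_loop_prob   -- looping probability of the bare enhancer, between 0 and 1
    tf_fold_changes  -- one factor per bound TF in the program module:
                        > 1 activates, < 1 represses (assumed multiplicative)
    max_rate         -- transcription rate when the loop is always formed
    """
    # Driver module: without the driver there is no expression at all.
    if not driver_bound:
        return 0.0

    # Program module: each bound TF rescales the probability of looping.
    loop_prob = base_loop_prob
    for fold in tf_fold_changes:
        loop_prob *= fold
    loop_prob = min(max(loop_prob, 0.0), 1.0)  # keep it a valid probability

    # CPU module: the promoter fires in proportion to the looping probability.
    return max_rate * loop_prob


if __name__ == "__main__":
    print(enhancer_output(True, 0.2, []))          # bare enhancer
    print(enhancer_output(True, 0.2, [2.0]))       # one activator
    print(enhancer_output(True, 0.2, [2.0, 0.5]))  # activator plus repressor
    print(enhancer_output(False, 0.2, [2.0]))      # no driver
```

The point of the sketch is that the same driver and promoter give a continuum of output levels depending on which TFs occupy the program module.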
Synthetic Bacterial Enhancers
We first tested our approach on bacterial enhancers. Bacterial enhancers are typically 100-200 bp in length, and consist of a σ54 promoter and a tandem array of binding sites for a driver complex, located upstream of the promoter. To date we have constructed ~500 synthetic bacterial enhancers, and confirmed many of our model’s predictions (links to publications, seminars).
Poised Polymerase as a road-block and new bacterial regulatory mechanism
This project is a consequence of our synthetic enhancer strategy. We began by examining genome-level questions, such as how important particular promoter architectures are, and how close to one another two enhancers can be without interfering with each other’s promoter. By mixing and matching many different enhancer and promoter motifs, and taking the natural components out of context, we found that σ54 promoters can serve as powerful and directional downstream regulators of σ70 promoters. This regulatory capability implies that σ54 promoters, which can also function as simple σ54 binding sites, can play a dual role of promoter and regulator within the bacterial genome. At present, we are using oligo-pool technology to test all of E. coli’s potential σ54 binding sites and determine which of these sites actually play a regulatory role.
Synthetic Eukaryotic Enhancers and The MRG-GRammar Project (Massive Reverse Genomics to Decipher Gene Regulatory Grammar)
We are now expanding our approach to eukaryotic organisms through the MRG project. In this project we will construct synthetic enhancers in mammalian and yeast cells, building them from the ground up using endogenous drivers (e.g. c-Myc) or synthetic transcription factors (e.g. nuclease-deficient Cas9/CRISPR/VP64). We plan to fund this work through the MRG-GRammar project.
MRG-GRammar aims to devise an entirely new strategy for deciphering the regulatory rules of gene regulation. We will leverage Synthetic Biology with cutting-edge DNA synthesis technologies and high-throughput analysis to generate new types of biological datasets that systematically explore all possible regulatory landscapes rather than just the naturally occurring regulatory sequences.
The extensive and unbiased nature of these unique datasets will allow us to build new models explaining different aspects of regulatory activity, which will be tested in second-generation libraries designed based on model predictions. Through such an iterative process, we expect to make a significant breakthrough in deciphering, and evolving, the regulatory code. Our strategy synergizes four orthogonal objectives that will form a new knowledge base from which the regulatory algorithm can be derived. We will employ our strategy on diverse model organisms from the tree of life: bacteria, yeast, mouse cell lines, mouse stem cells, human cell lines, and finally, whole D. melanogaster embryos.
We expect this multidisciplinary synthetic biology approach to generate a major technological advance, which will provide the community with algorithms that will not only decipher extant natural regulatory code, but also interpret variations leading to a profoundly deeper understanding of the origins of many diseases. We expect our models to also serve as a reference in designing and implementing accurate and more controllable synthetic biology devices, with applications in fuel production, healthcare and other industrial fields. Thus, our ultimate goal is to substantially accelerate the advance of technologies and knowledge related to generating systematic and personal therapeutic solutions based on the analysis of each individual’s natural genomic variations.
MRG-GRammar is funded by FET-Open and is a collaboration between the Amit group, Eran Segal’s group (WIS), Sarah Teichmann’s group (EMBL), Eileen Furlong’s group (EMBL), and Jussi Taipale’s group (Karolinska Institute).
Our Numerical simulations
In order to properly simulate enhancer-based transcriptional regulation, we devised a new approach to simulate nucleoprotein assemblies (see animation). Our approach bridges the gap between atomic-level molecular dynamics simulations (which can only be used to simulate DNA chains about 25 bp long) and the qualitative and poorly predictive wormlike chain model. In our model DNA is treated as a thick chain, and a nucleoprotein assembly is treated as a thick chain with protrusions (see thick chain drawing). The thick chain with protrusions is a crude structural model of the enhancer; statistical mechanics tools are then used to faithfully simulate the possible spatial configurations that such a chain can take. After generating sufficiently large configurational ensembles, we focus on those states that are most likely to be “looped”, or transcriptionally active, which subsequently allows us to predict the regulatory output. Our model also allows us to incorporate other structural parameters of protein-DNA interactions, such as the degree of bending and stiffening of the nucleoprotein chain, which together generate a powerful and predictive simulation of how particular enhancer binding-site architectures affect the transcriptional output.
The thick-chain with protrusion model predicts that many regulatory effects can be induced by an excluded volume effect, that is, by the sheer presence and the particular location of the protrusion on the chain. This implies that literally any transcription factor can generate a substantial regulatory effect when properly positioned within an enhancer.
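The idea of sampling chain configurations and counting the looped ones can be sketched in a few lines. The example below is a deliberately simplified stand-in and not our simulation: it uses a freely jointed chain of hard spheres (no bending stiffness, so it is not a wormlike chain), models a bound protein simply as one bead with a larger radius, and defines "looped" as the two chain ends lying within a capture radius. The function names, bead radii, and capture radius are all assumptions made for illustration.

```python
import math
import random


def random_unit_vector(rng):
    """Draw a direction uniformly on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def grow_chain(radii, rng):
    """Grow one chain with unit-length links.

    Returns the bead positions, or None if any two non-bonded beads overlap
    (the whole chain is then discarded, which keeps the sampling unbiased).
    """
    beads = [(0.0, 0.0, 0.0)]
    for i in range(1, len(radii)):
        dx, dy, dz = random_unit_vector(rng)
        x, y, z = beads[-1]
        new = (x + dx, y + dy, z + dz)
        for j in range(i - 1):  # bead i-1 is bonded to bead i, so skip it
            if math.dist(new, beads[j]) < radii[i] + radii[j]:
                return None
        beads.append(new)
    return beads


def looping_probability(radii, capture_radius, n_samples, seed=0):
    """Fraction of self-avoiding configurations whose ends are in contact."""
    rng = random.Random(seed)
    accepted = looped = 0
    for _ in range(n_samples):
        chain = grow_chain(radii, rng)
        if chain is None:
            continue
        accepted += 1
        if math.dist(chain[0], chain[-1]) < capture_radius:
            looped += 1
    return looped / accepted if accepted else float("nan")


if __name__ == "__main__":
    bare = [0.3] * 10            # uniform thick chain
    with_protein = list(bare)
    with_protein[5] = 0.9        # one enlarged bead as a crude "protrusion"
    print(looping_probability(bare, 1.5, 20000, seed=1))
    print(looping_probability(with_protein, 1.5, 20000, seed=1))
```

Comparing the two printed values shows how an excluded-volume change at a single position can shift the estimated looping probability in this toy setting; the direction and size of the shift depend on the assumed geometry.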
Genomic predictions – INDELS
The main consequence of our combined numerical simulations and synthetic biology experiments is that short insertions or deletions (INDELs) of DNA sequences, whose sequence does not code for anything, can have a functional role. Thus, regulatory effects can be triggered not only by sequence-specific information, but also by the sheer physical characteristics of DNA. In this case, we observed that INDELs inside loops whose lengths are integer multiples of the DNA helical repeat (i.e. 11, 22, 32-33 bp, etc.) behave largely as neutral mutations and as a result are not function altering. In contrast, INDELs whose lengths are odd-integer multiples of half the DNA helical repeat (i.e. 5-6, 16-17 bp, etc.) are function altering, as they change the relative orientation between adjacent proteins on the enhancer. This observation led us to predict that in related enhancers, 11-bp INDELs should be more prevalent due to their non-function-altering nature, while 6-bp INDELs should be rare.
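The helical-phase argument above reduces to simple modular arithmetic, sketched below. The function names, the default helical repeat of 11 bp (chosen to match the examples of 11, 22 and 32-33 bp), and the 45-degree tolerance are assumptions made for illustration; they are not fitted parameters.

```python
def rotational_offset_deg(indel_length, helical_repeat=11.0):
    """Rotation (0 to 180 degrees) that an INDEL introduces between the
    proteins flanking it, given the helical repeat in bp."""
    turns = (abs(indel_length) % helical_repeat) / helical_repeat
    offset = turns * 360.0
    return min(offset, 360.0 - offset)  # fold onto the range 0..180


def classify_indel(indel_length, helical_repeat=11.0, tolerance_deg=45.0):
    """Label an INDEL by its effect on the relative orientation of neighbors.

    tolerance_deg is an arbitrary illustrative cutoff, not a measured value.
    """
    offset = rotational_offset_deg(indel_length, helical_repeat)
    if offset <= tolerance_deg:
        return "phase-preserving"   # near an integer number of turns
    if offset >= 180.0 - tolerance_deg:
        return "phase-flipping"     # near a half-integer number of turns
    return "intermediate"


if __name__ == "__main__":
    for length in (5, 6, 11, 16, 17, 22, 32, 33):
        print(length, classify_indel(length))
```

Under these assumptions, lengths near whole turns fall in the phase-preserving class and lengths near half turns fall in the phase-flipping class, mirroring the neutral versus function-altering distinction drawn in the text.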
We are currently checking this prediction across all bacterial genomes, and so far have been able to confirm it for the qrr genes in the Vibrio family. To do so, we annotated 64 putative bacterial enhancers that regulate a qrr gene in various Vibrio species. We cross-correlated all enhancer sequences and found a distinct periodicity of 10.5 bp in the resulting correlation. In contrast, no such periodicity was found for sequences immediately upstream of the enhancer. Thus, this analysis seems to provide additional support for the biological validity of the regulatory mechanism we propose.
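One simple way to look for a helical-repeat periodicity in a set of sequences is sketched below. It is not the cross-correlation analysis described above: as a stand-in, it converts each sequence into a 0/1 signal (here marking A/T positions, an arbitrary choice), scans a range of candidate periods, and reports the period with the largest summed Fourier power. The function names, the choice of signal, and the scan range are assumptions made for illustration.

```python
import cmath
import math


def indicator(seq, bases="AT"):
    """Turn a DNA sequence into a 0/1 signal marking the chosen bases."""
    return [1.0 if b in bases else 0.0 for b in seq.upper()]


def power_at_period(signal, period):
    """Squared magnitude of the mean-subtracted Fourier sum at one period (bp)."""
    mean = sum(signal) / len(signal)
    total = sum(
        (x - mean) * cmath.exp(-2j * math.pi * n / period)
        for n, x in enumerate(signal)
    )
    return abs(total) ** 2


def dominant_period(sequences, min_period=7.0, max_period=15.0, step=0.1):
    """Candidate period (bp) with the largest power summed over all sequences."""
    signals = [indicator(s) for s in sequences]
    n_steps = int(round((max_period - min_period) / step))
    best_period, best_power = None, -1.0
    for i in range(n_steps + 1):
        period = min_period + i * step
        power = sum(power_at_period(sig, period) for sig in signals)
        if power > best_power:
            best_period, best_power = period, power
    return best_period


if __name__ == "__main__":
    # Synthetic test sequence: an "A" roughly every 10.5 bp on a "C" background.
    marks = {round(10.5 * k) for k in range(40)}
    seq = "".join("A" if n in marks else "C" for n in range(420))
    print(dominant_period([seq]))
```

Running the same scan on flanking sequences (for example, the regions immediately upstream of each enhancer) gives a natural control for whether a detected period is specific to the enhancers.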