Synthetic long non-coding RNA scaffolds – slncRNAs
Molecular scaffolding is ubiquitous in biology. A scaffold is a complex of one or more molecular species to which several proteins or RNA molecules can bind, which allows an intricate set of enzymatic cascades or physical reactions to take place in close proximity. In theory, by bringing reactants close together, molecular scaffolding enables complex chemical reaction cascades that might not be possible if the reactants had to find one another through a standard 3D search. Another hallmark of molecular scaffolding is its modular structure, which is an advantage from both evolutionary and bioengineering perspectives: functional components can be exchanged with modules from other scaffolds, making it possible to rapidly evolve or engineer novel functions. However, despite its pervasiveness in natural biological systems, scaffolding has thus far been underexplored in synthetic biology. The reason is an insufficient understanding of the modular parts, which in turn leaves a critical gap in the knowledge needed to assemble them into functional scaffolds.
In recent years, my group has focused on RNA to study molecular scaffolds. Unlike DNA, RNA can provide a structurally flexible, sequence-programmable scaffold. On the one hand, it takes advantage of modern DNA synthesis technology, which allows us to generate large libraries of sequence variants rapidly and cost-effectively; on the other, it allows us to explore many different structural variants simultaneously. To learn how to reliably construct a functional molecular scaffold, a systematic characterization of the sequence-structure-function relationship encoded within scaffold modules is needed.
Once this is achieved, the knowledge obtained will facilitate the design of functional RNA-based scaffolds that we call synthetic long non-coding RNAs (slncRNAs). Unlike protein- and DNA-based scaffolds, which are either highly complex (protein) or structurally constrained (DNA), RNA could provide a more promiscuous substrate for engineered scaffolds across a wide variety of applications, from therapeutics to clean-tech. Thus, by elucidating the design rules and function encoded within natural lncRNA molecules, we should be able not only to engineer molecules with predictable regulatory output, but also to design nano-probes and nanoscale assembly lines based on RNA scaffolds. These scaffolds have the potential to usher in an era of computer-based design of genetically encoded chemistry of unprecedented complexity, with applications in a wide range of industries, such as pharmaceuticals, food, cosmetics, and general material science.
For some of the progress that we have made towards slncRNA scaffolds see below:
A. Exploring the effect of secondary structure on reporter gene expression.
We use the tools of reporter expression to explore the structure-function relationship of RNA molecules. In perhaps the simplest example, we install a hairpin at various positions of the 5’ UTR or the header of a reporter gene and explore the resultant effect on the rate of mCherry reporter production. The data show several interesting features. In the 5’ UTR, the level of expression depends on the sequence content of the flanking regions. If the sequence that flanks the hairpin encodes an anti-Shine-Dalgarno (aSD) sequence (CT-rich), then at a particular length translational repression will ensue. Our results show that the presence of the hairpin destabilizes the aSD:SD hairpin, necessitating a longer CT-rich segment to trigger repression.
A more interesting phenomenon occurs when the hairpins are placed in the reporter gene’s header region. Here, we observe a 3-nt periodic function for all hairpins studied, which likely indicates that the ribosome’s capability to unwind the hairpin depends on position. This repression phenomenon opens the possibility of generating a high-throughput binding assay for any structure-binding RBP (see below).
Figure 1: Dependence of mCherry expression level on hairpin position
(A) The four hairpins used in this experiment were the native (wt) binding sites for the MS2, PP7, GA, and Qβ coat proteins. Stop codons and start codons inside the binding sites are highlighted in bold and red. (B) Mean mCherry basal expression levels for 60 variants as a function of distance from the A of the AUG. We did not position hairpins in the region from the RBS to the AUG (colored gray). Data are shown for the PP7-wt (red), GA-wt (green), Qβ-wt (blue), and MS2-wt (magenta) binding sites as a function of hairpin position d, in the absence of RBPs (non-induced case). The dashed line for d>0 (right) represents the average basal level for two constructs with a non-structured sequence replacing the binding site downstream of the AUG. The dashed line for d<0 (left) represents the average basal level for four constructs without a binding site at positions ranging from d=-21 to -29. Each of the wt binding sites has a stop codon encoded within its sequence [see (A)]. As a result, every binding site has a set of positions (d=3,6, etc. for MS2-wt, PP7-wt, and Qβ-wt, and d=5,8, etc. for GA-wt) for which a stop codon in frame with the upstream AUG prevents proper readout of the structure’s effect on expression. For this same frame, the start codons inside the hairpins did not result in mCherry expression. CBPS stands for coat-protein binding site (schemas, top). (C) A simple kinetic scheme that can explain the 3-nt periodicity observed for mCherry expression. The reading frame D in which the ribosome encounters the hairpin determines the ultimate rate of mRNA unwinding, thus leading to three possible timescales.
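The frame bookkeeping in this caption can be made concrete with a short sketch. It assumes that d counts nucleotides from the A of the AUG (so codons begin at d = 0, 3, 6, ...) and that a hairpin placed at position d starts at that nucleotide. The two example sites are toy sequences chosen for illustration, not the actual wt binding sites, and the function names are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def reading_frame(d):
    """Frame (0, 1, or 2) in which a ribosome that started at the AUG reaches position d."""
    return d % 3


def stop_offsets(site):
    """Offsets within the site at which a stop-codon triplet begins."""
    return [k for k in range(len(site) - 2) if site[k:k + 3] in STOP_CODONS]


def in_frame_stop_positions(site, positions):
    """Positions d at which a stop triplet inside the site is in frame with the AUG."""
    offsets = stop_offsets(site)
    return [d for d in positions if any(reading_frame(d + k) == 0 for k in offsets)]


# Toy sequences (assumptions for illustration only).
toy_site_a = "GGCUGACCGCC"  # stop triplet UGA begins at offset 3
toy_site_b = "GUAACC"       # stop triplet UAA begins at offset 1

print(in_frame_stop_positions(toy_site_a, range(3, 12)))
print(in_frame_stop_positions(toy_site_b, range(3, 12)))
```

With these toy inputs, the first site is unreadable at every third position starting from d=3 and the second at every third position starting from d=5, which mirrors the two patterns quoted in the caption.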
B. RBP-RNA interaction – how do RNA binding proteins affect the RNA’s function and structure?
B.1 Repression effect and binding assay
When placing an RBP binding site in the header of genes, we noticed that a strong repression effect ensues for the phage coat proteins of PP7 (PCP), MS2 (MCP), Qβ (QCP), and GA (GCP). This repression only occurs when the hairpin is placed in the header of the gene within 11 nucleotides of the AUG, which coincides with the ribosome’s initiation region. This leads us to hypothesize that, upon binding of the RBP, the 30S subunit is blocked from forming an elongating 70S ribosome.
For more details see:
Figure 2: Translational regulation by an RBP-hairpin complex in the ribosomal initiation region
(A) Experimental schematic. Top: plasmid expressing the RBP-CP fusion from a pRhlR inducible promoter. Bottom: a second plasmid expressing the reporter, with the RBP binding site encoded within the 5′ end of the gene (at position d>0). (B) Dose-response functions for PCP with a reporter mRNA encoding PP7-wt at three positions: d=7 (red), d=11 (blue), and d=16 (green) nt. (C) A schematic of the mechanistic repression model. The bound RBP (middle) is able to disrupt the formation of the elongating 70S ribosome, leaving the bound 30S subunit inactive. If the hairpin is positioned downstream of the initiation region (bottom), the 70S ribosome is able to assemble and subsequently unwind the RBP-bound structure at a rate Tslow, which is substantially slower than the rate for a standard elongation step, Tfast. (D) Fold-repression measurements for PCP (red) and QCP (blue) as a function of hairpin position d. Fold repression is computed as the ratio of the mCherry rate of production at no induction to the rate of production at full induction. Note that for two constructs (QCP with d=4 and d=8) the basal levels without induction were too low for fold-repression measurements.
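The fold-repression definition in panel (D) amounts to a single ratio. The sketch below is one possible way to write it, assuming production rates are given in arbitrary units per hour. The `min_basal` cutoff is a hypothetical parameter that stands in for the "basal level too low" exclusion mentioned in the caption; its value is not taken from the figure.

```python
def fold_repression(rate_no_induction, rate_full_induction, min_basal=None):
    """Ratio of the mCherry production rate without induction to the rate at full induction.

    Returns None when the non-induced (basal) rate falls below `min_basal`,
    a hypothetical cutoff for constructs whose basal level is too low to measure.
    """
    if rate_full_induction <= 0:
        raise ValueError("rate at full induction must be positive")
    if min_basal is not None and rate_no_induction < min_basal:
        return None
    return rate_no_induction / rate_full_induction


# Illustrative numbers only (not measured values).
print(fold_repression(200.0, 20.0))
print(fold_repression(10.0, 5.0, min_basal=50.0))
```

Returning None rather than a number for low-basal constructs keeps them out of downstream averages instead of reporting a ratio dominated by noise.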
B.2 SHAPE-seq complements reporter expression assay
Due to RNA’s complex structure-function relationship, we needed a complementary assay to support our hypothesis. To provide additional evidence for our mechanistic interpretation, we carried out in-cell SHAPE-seq, which showed that the hairpin and initiation region are non-modified, or protected, over a large region corresponding to the initiation region, while no protection is observed outside the initiation region. This indicates that both the RBP and the 30S subunit are locked into place, leading to the observed repression phenomenon, as the figures show.
Figure 3: Schematic of the in-cell SHAPE-seq protocol used by our group
Schematic overview of the SHAPE-seq experiment. (A) Overnight-grown bacterial strains harboring both the RBP-binding site plasmid (PP7-wt, δ=5, containing the mCherry reporter) and the RBP-fusion plasmid (PCP-mCerulean) are split into two samples, and PCP-mCerulean expression is induced (using C4-HSL) in one of them. Following protein expression, each bacterial sample is further split and treated with either DMSO (as the non-modified control) or NAI. Subsequently, RNA is isolated and either further chemically probed (samples 2+4) or directly used for subsequent steps of SHAPE-seq (samples 1+3). (B) Following 2’ hydroxyl acylation and subsequent RNA isolation, RNA samples are reverse-transcribed using a gene-specific primer that binds to the target transcript. During reverse transcription, reverse transcriptase is stalled one nucleotide before the modification. Subsequently, a single-stranded 5’ phosphorylated (5’P) and 3-carbon spacer (3’C) adapter sequence is ligated to the obtained cDNAs; in the next step it serves as a primer-binding site for the Illumina index primers used to prepare double-stranded DNA for Illumina next-generation sequencing.
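Because reverse transcriptase stalls one nucleotide before the modification, each cDNA end can be translated back to a modified position. The sketch below assumes that reads have already been aligned and reduced to the 1-based transcript coordinate (5’ to 3’) of the last nucleotide copied by reverse transcriptase. The function names and input format are hypothetical, not part of the protocol.

```python
from collections import Counter


def modified_position(last_copied):
    """Transcript position of the modified base, given the last nucleotide copied.

    Reverse transcriptase reads the RNA 3' to 5' and stalls one nucleotide before
    the adduct, so in 5'-to-3' numbering the modified base is one position upstream.
    """
    return last_copied - 1


def modification_counts(cdna_ends, transcript_length):
    """Count inferred modifications per position, dropping ends that map off the transcript."""
    counts = Counter()
    for end in cdna_ends:
        pos = modified_position(end)
        if 1 <= pos <= transcript_length:
            counts[pos] += 1
    return counts


# Illustrative cDNA end coordinates (not real data).
counts = modification_counts([5, 5, 8, 1], transcript_length=10)
print(dict(counts))
```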
Figure 4: SHAPE-seq analysis of the PP7-wt binding site in absence and presence of RBP
(A) In vitro SHAPE-seq read ratio for the construct containing PP7-wt. (Inset) Structure of the PP7-wt binding site, with yellow and green boxes highlighting regions that are prone to NAI modification. (B) Comparison of in vivo SHAPE-seq ratios for induced (blue) and non-induced (red) strains containing a PP7-wt binding site in the d=5 position. (Inset) Quantitative PCR measurements of the induced (blue) and non-induced (red) strains. (C-E) A structural depiction of the binding site and the overall segment of the mRNA molecule to which SHAPE-seq was applied. Each base in the structure is colored according to its “reactivity” score. A high score means that the base was likely to be single-stranded, and thus modified in the assay. All structures were computed with RNAfold. Reactivity scores are shown for the in vivo non-induced (C), in vivo induced (D), and in vitro (E) cases, respectively. Notice the high reactivity in all bases for the non-induced case, which is consistent with a translationally active molecule that lacks secondary structure due to multiple translating ribosomes. In contrast, a comparison of the in vitro and in vivo induced cases shows that, in the induced case, the binding site and flanking regions are not modified relative to the in vitro control, and are thus likely protected by a protein or protein complexes.
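One plausible way to compute a per-position read ratio is sketched below. The caption does not define the ratio, so the definition used here is an assumption: the fraction of NAI-sample stops at a position divided by the fraction of DMSO-sample stops at the same position, with a pseudocount to avoid division by zero. The function name, the input dictionaries, and the pseudocount are all hypothetical.

```python
def read_ratio(nai_counts, dmso_counts, length, pseudocount=1.0):
    """Per-position ratio of modified (NAI) to control (DMSO) stop fractions.

    `nai_counts` and `dmso_counts` map 1-based transcript positions to stop counts.
    This normalization is an assumed definition, not the one used for the figure.
    """
    positions = range(1, length + 1)
    nai_total = sum(nai_counts.get(p, 0) for p in positions) + pseudocount * length
    dmso_total = sum(dmso_counts.get(p, 0) for p in positions) + pseudocount * length
    ratios = []
    for p in positions:
        nai_frac = (nai_counts.get(p, 0) + pseudocount) / nai_total
        dmso_frac = (dmso_counts.get(p, 0) + pseudocount) / dmso_total
        ratios.append(nai_frac / dmso_frac)
    return ratios


# Illustrative counts (not real data): position 1 is enriched in the NAI sample.
ratios = read_ratio({1: 9, 2: 1}, {1: 4, 2: 4}, length=2)
print(ratios)
```

Under this definition, a ratio above 1 marks a position that is modified more often than the control would predict, which is the sense in which a high score indicates a likely single-stranded base.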
B.3. Using the RBP-based repression phenomenon as a new binding assay for RBPs
Finally, we have been able to adapt this repression effect into a high-throughput RBP binding assay, which allows us to rapidly scan for alternative RBP binding sites and sheds light on the nature and mechanism of RBP binding. Finding additional binding sites is crucial for constructing slncRNA molecules efficiently, as such sites spare us from using repeat sequences, which are detrimental to most designs.
Figure 5: Repression effect can be used to estimate an effective dissociation constant KRBP
(A) Structure schematic for the 11 binding sites used in the binding affinity study. Red nucleotides indicate mutations from the original wt binding sequence. Abbreviations: US/LS/L/B = upper stem/lower stem/loop/bulge, m = mutation, s = short, struct = significant change in binding site structure. (B) Dose responses for 175 variants whose basal rate of production levels were > 50 a.u./hr. Each response is divided by its maximal mCherry level, for easier comparison. Variants are arranged in order of increasing fold up-regulation. (C) Normalized KRBP for variants that generated a detectable down-regulatory effect for at least one position. Blue corresponds to low KRBP, while red indicates high KRBP. If there was no measurable interaction between the RBP and binding site, KRBP was set to 1.
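To illustrate how an effective dissociation constant might be extracted from a normalized dose response, the sketch below fits a single-parameter repression curve by grid search. The model form, rate = 1 / (1 + x / K), is an assumption made for illustration, as is the use of the induction level x as a proxy for RBP concentration; the actual fitting procedure behind the figure is not described here, and all names are hypothetical.

```python
import math


def normalize(response):
    """Divide a dose response by its maximal value, as done for the heatmap."""
    top = max(response)
    return [r / top for r in response]


def repression_model(x, k):
    """Assumed one-parameter repression curve: 1 at x = 0, 0.5 at x = k."""
    return 1.0 / (1.0 + x / k)


def fit_k(levels, response, n_grid=400):
    """Grid-search the K that minimizes squared error against the normalized response."""
    y = normalize(response)
    positive = [x for x in levels if x > 0]
    lo = math.log10(min(positive)) - 2.0
    hi = math.log10(max(positive)) + 2.0
    best_k, best_err = None, float("inf")
    for i in range(n_grid + 1):
        k = 10 ** (lo + (hi - lo) * i / n_grid)
        err = sum((repression_model(x, k) - yi) ** 2 for x, yi in zip(levels, y))
        if err < best_err:
            best_k, best_err = k, err
    return best_k


# Synthetic dose response generated from the assumed model with K = 10.
levels = [0, 1, 3, 10, 30, 100]
response = [500.0 / (1.0 + x / 10.0) for x in levels]
print(fit_k(levels, response))
```

A log-spaced grid is used because plausible K values span orders of magnitude; a least-squares optimizer could replace it once the model form has been settled.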
B.4 Protein-sensing riboswitches
Another surprising regulatory effect that we identified in our studies is up-regulation. Namely, RBPs such as PCP or MCP can either down- or up-regulate expression depending on the location of their binding site on the RNA molecule. Up-regulation only occurs when the binding site is positioned in the 5’ UTR, independent of the distance to the AUG. Using a combination of SHAPE-seq and expression-level experiments, we were able to show that RBP binding induces a structural change that is akin to riboswitching; hence, we were able to make the first protein-sensing riboswitches.
Figure 6. Translational stimulation upon RBP binding in the 5’ UTR. (A) Heatmap of the dose responses of the 5’ UTR variants. Each response is divided by its maximal mCherry level, for easier comparison. Variants are arranged in order of increasing fold up-regulation. (B) Normalized KRBP. Blue corresponds to low KRBP, while red indicates no binding. If there was no measurable interaction between the RBP and binding site, KRBP was set to 1. (C) Bar graph showing fold change of each RBP—binding-site pair for all 11 binding sites, as follows: QCP-mCer (purple), PCP-mCer (green), MCP-mCer (red), and GCP-mCer (blue). Values larger and smaller than one correspond to up- and down-regulation, respectively. (Inset) Dose response function for MS2-U(-5)C with MCP at positions d=-23,-26,-29, and -31 nt from the AUG. (D) Dose response functions for two strains containing the PP7-wt (red) and PP7-USs (blue) binding sites at d=-29 nt from the AUG. (E) Adding CU-rich (red), AU-rich (blue), or random (green) flanking sequences upstream of the RBS. While basal levels are clearly affected by the presence of a strong CU-rich flanking sequence, the nature of the regulatory effect is apparently not determined by the strength of the aSD.
Figure 7. Up-regulation is dependent on binding-site free energy. (A) Schematic model for translation stimulation. Sizes of illustrated states reflect their probabilities. (Top-left) A free-energy landscape of the PP7-wt structure showing a stable intermediate state with a trapped 30S subunit, and a large energy barrier between the intermediate state and the folded-binding-site state. Arrows correspond to the transition rates between the different kinetic states. Thicker arrows correspond to faster rates. In this case, k2 is a fast process, which traps the molecule in a metastable state, with slow rates k-2 and k3 to transition out of it. (Top-middle) The stable intermediate state is destabilized for the PP7-USs binding site, resulting in a semi-stable partially-folded-binding-site state. This is depicted by the arrows with fast transition rates k-2 and k3, that generate an unstable intermediate kinetic state. (Top-right) Increasing the bottom stem length (PP7-wt+3) reduces the energy barrier between the intermediate and folded-binding-site states, thus destabilizing the intermediate state. This is depicted by the fast k2, k3, and slow k-2, k-3 rates. (Bottom) Energy landscape with RBP bound. Bound RBP reduces the energy barrier between intermediate and folded-binding-site states. (B) Fold change for binding sites with an extra 3 (blue), 6 (magenta), and 9 (green) stem base-pairs are shown relative to the fold change for PP7-wt (red). (Inset) KRBP calculated for all constructs shown in Figure 5, with corresponding RBPs (MCP or PCP). (C) Basal levels and logarithm of fold change for dose responses of all constructs with their corresponding RBPs (MCP or PCP).
B.5 Single molecule in single cell imaging – the problem of repeats.
Using the insights obtained from the expression-level experiments, we are now able to design improved PCP and MCP cassettes for use with an RBP fused to a fluorescent reporter. These cassettes do not contain repeats, and the binding sites are spaced by hairpins. This apparently leads to more efficient binding of the RBP-FP chimera. The images below show that a cassette with 5 MCP binding sites can be easily imaged in E. coli. We are currently working on structurally characterizing these cassettes and engineering novel ones.
Figure 8: Genetically encoded nanoprobes. (A) Cassettes of multiple RBP binding sites and linkers are designed according to the general schematic, with RBP binding sites alternating with non-RBP-binding hairpin structures. (B) When expressed with the RBP-FP chimera, the slncRNA forms localized spots, which can be analyzed. These can then be diversified to form genetically encoded nanoprobes for multiplexed real-time tracking of several regulatory loci. The image depicts E. coli cells expressing a cassette of 4 PCP binding sites separated by a 15-bp insulator hairpin linker with a small bulge. (C) Thermodynamic numerical simulation, which estimates RBP occupancy on multi-binding-site cassettes. The model aids in nanoprobe design. The k constants correspond to different rates that may be involved in the cassette folding, unfolding, and various RBP binding processes.
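A very reduced version of an occupancy estimate for a multi-site cassette is sketched below. It is not the simulation shown in panel (C). It assumes identical, independent sites, each of which can be unfolded, folded, or folded with RBP bound, and it uses equilibrium weights in place of the individual k rates. The parameter names `k_d` and `fold_weight` (the statistical weight of the folded state relative to the unfolded one) are hypothetical.

```python
from math import comb


def site_occupancy(rbp, k_d, fold_weight):
    """Equilibrium probability that one site is folded and bound.

    Statistical weights: unfolded = 1, folded = fold_weight,
    folded and bound = fold_weight * rbp / k_d.
    """
    bound = fold_weight * rbp / k_d
    return bound / (1.0 + fold_weight + bound)


def occupancy_distribution(n_sites, p):
    """Probability of j bound sites (j = 0..n_sites) for independent, identical sites."""
    return [comb(n_sites, j) * p ** j * (1.0 - p) ** (n_sites - j)
            for j in range(n_sites + 1)]


def mean_bound(n_sites, p):
    """Expected number of bound RBPs on the cassette."""
    return n_sites * p


# Illustrative parameters (not fitted values) for a four-site cassette.
p = site_occupancy(rbp=10.0, k_d=10.0, fold_weight=1.0)
print(p, mean_bound(4, p))
print(occupancy_distribution(4, p))
```

Treating the sites as independent ignores any coupling between neighboring hairpins, so this is best read as a baseline against which a fuller kinetic model could be compared.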
C. How does RNA bind to DNA?
C.1 Triplex-seq and Hoogsteen-base-pairing.
To fully realize a nano-assembly line, slncRNA molecules need to bind DNA so that multiple such scaffolds can be held in close proximity. RNA-DNA binding can be mediated by a DNA-RNA binding protein (DRBP). A more intriguing option is a direct RNA-DNA interaction via triple-helix formation. Triple-helix formation is facilitated by another set of hydrogen bonds called Hoogsteen bonds. The problem with deciphering the triplex code is rooted in the especially large sequence space that is available for this interaction, as one needs to match a triplex target site with its triplex-forming oligo. As a result, while this interaction was discovered over 50 years ago, there are many determinants that are still unclear, specifically:
- For purine-rich triplex target sites (TTSs) – what is the required length of a purine-rich stretch in a triplex forming oligo (TFO) of given length?
- How many non-purine nucleotides can be tolerated in a purine-rich TFO?
- Are G and A equally preferred, or are G-tracks dominant in purine-rich TFOs?
- What role does thymine play in TFO binding in both anti-parallel (purine motif) and parallel (pyrimidine motif) binding?
- What is the dependence of TFO sequence and binding affinity on pH?
- Can a position-specific TFO scoring matrix (PSSM) logo be established for each TTS?
To overcome the technological hurdles and the limitation of sequence space, we are exploring triplex formation both in vitro and in vivo, using large libraries of potential triplex-forming oligos that are custom manufactured using state-of-the-art DNA synthesis technology. Below, we show some preliminary data from our novel assay, which we call Triplex-seq. The data show that when an oligo containing equal amounts of G, C, and T at particular positions is transfected into mammalian cells, a subset that is enriched for G’s is pulled down with the genome. This result is consistent with previous triplex formation studies, but also with G-quadruplex formation. Thus, it remains to be seen whether we are directly observing triplex formation in vivo.
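The kind of per-position enrichment described here, and the position-specific scoring matrix raised in the list of open questions, can be sketched as a log-odds comparison between the pulled-down oligos and the input library. The sequences below are toy examples, not Triplex-seq data, and the function names and pseudocount are assumptions made for illustration.

```python
import math
from collections import Counter


def position_frequencies(seqs, alphabet="ACGT", pseudocount=1.0):
    """Per-position base frequencies for equal-length sequences, with a pseudocount."""
    length = len(seqs[0])
    total = len(seqs) + pseudocount * len(alphabet)
    freqs = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        freqs.append({b: (counts.get(b, 0) + pseudocount) / total for b in alphabet})
    return freqs


def log_enrichment(pulled_down, library, alphabet="ACGT", pseudocount=1.0):
    """log2 ratio of pulled-down to library frequency for each position and base."""
    fp = position_frequencies(pulled_down, alphabet, pseudocount)
    fl = position_frequencies(library, alphabet, pseudocount)
    return [{b: math.log2(fp[i][b] / fl[i][b]) for b in alphabet}
            for i in range(len(fp))]


# Toy input library with equal G, C, and T at each position, and a G-rich toy subset.
library = ["GCT", "CTG", "TGC"]
pulled_down = ["GGT", "GGC", "GGG"]
scores = log_enrichment(pulled_down, library)
print(scores[0])
```

A positive score for G across positions would reflect the enrichment described above, although on its own it cannot separate triplex formation from G-quadruplex formation.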