Biological Rosetta Stone

Over two hundred years ago, a remarkable discovery occurred in Egypt: a slab of dark granodiorite, lost for centuries, was uncovered bearing the same decree in three scripts: ancient Greek, Demotic, and Egyptian hieroglyphs. Within a quarter century, the hieroglyphic code was deciphered, and the writings of the Pharaohs were finally readable to historians and archaeologists.

We face a similar goal today: to decipher the genome. Only we do not have a Rosetta Stone; we need to make one. This is where Synthetic Biology comes into play.


What is Synthetic Biology?

Synthetic Biology has been enabled by two important developments:

  • The Genomics revolution.
  • The decreasing cost of synthesizing double-stranded DNA (<$0.1/bp).

The low cost of manufacturing DNA and the enormous genomic databases provide us with what is effectively a huge “warehouse” of parts that can be put together in whatever way we imagine for a nominal cost.

The Biological Rosetta Stone is a Synthetic Biology approach to research that involves developing increasingly complex libraries of scaffolded molecular structures that are complemented with nested biophysical models to develop a self-consistent and hierarchical understanding of our synthetic constructs.

We do this in three steps. First, we take characterized genomic parts and put them together, or “wire” them, in novel architectures (depicted by the sequence in the top panel). Next, we develop a structure-based thermodynamic model (middle panel). Finally, we analyze the regulatory output using our model to extract the underlying design principles that will enable us to translate the structural architecture and sequence into a computational algorithm (bottom panel).
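As a rough illustration of what a thermodynamic model of regulation can look like, the sketch below computes the equilibrium occupancy of a single binding site and maps it to an expression level. It is a textbook-style two-state toy, not the model used in this work; the function names and the parameters `k_d`, `basal`, and `fold_activation` are assumptions made for the example.

```python
def occupancy(concentration, k_d):
    """Equilibrium probability that a single binding site is occupied.

    Two-state model: the site is either empty (weight 1) or bound
    (weight concentration / k_d). Both arguments share the same units.
    """
    if concentration < 0 or k_d <= 0:
        raise ValueError("concentration must be >= 0 and k_d must be > 0")
    return concentration / (concentration + k_d)


def expression_level(tf_conc, k_d, basal=1.0, fold_activation=10.0):
    """Expression as the occupancy-weighted average of two promoter states.

    Hypothetical parameters: `basal` is the output with the site empty,
    and `fold_activation` is the multiplicative boost when it is bound.
    """
    p_bound = occupancy(tf_conc, k_d)
    return basal * ((1.0 - p_bound) + fold_activation * p_bound)


if __name__ == "__main__":
    # Scan activator concentrations around an assumed k_d of 10 (arbitrary units).
    for conc in (0.0, 1.0, 10.0, 100.0, 1000.0):
        print(conc, round(expression_level(conc, k_d=10.0), 3))
```

In a scheme like this, fitting measured outputs to the predicted curve would yield estimates of `k_d` and `fold_activation` for each construct, which is the kind of quantity a design rule could be built on.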

The litmus test is simple: if we can write a sequence down whose output can be predicted from the computational rules depicted in the Rosetta Stone, we can then apply this “key” to natural regulatory sequences in an attempt to decipher the regulatory genomic code.


The Biological Rosetta Stone Methodology

The Biological Rosetta Stone Methodology is depicted by the schema shown on the side. In a typical project, we begin with what we term model-guided design of our sequences. While a library of 100,000 sequences, each 250 bp long, seems like a large number to explore, it still covers only an insignificant fraction of the possible sequence space (250 bp gives 4^250 = 2^500 ≈ 3 × 10^150 sequences). Thus, we need to narrow the relevant space by any means possible. To do that, we use biological data, bioinformatics, and thermodynamic models to generate an initial guess as to the outcome of the particular phenomenon that we wish to understand. Using this guess, we design an oligo-pool library whose purpose is to test the predictions of our model. The oligo pool is then shotgun-cloned into the model organism of choice, and the data are obtained using a combination of imaging and next-generation sequencing. We then use the data to improve the model, produce a more focused library design, and carry out additional measurements. Using this approach, we are able to study diverse phenomena such as enhancer regulatory rules, RNA secondary structure, and RBP binding.
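The size of the sequence space can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes a four-letter alphabet (A, C, G, T), and the function names are made up for the example.

```python
import math


def sequence_space_size(length, alphabet=4):
    """Number of distinct sequences of a given length (exact integer)."""
    return alphabet ** length


def log10_coverage(library_size, length, alphabet=4):
    """log10 of the fraction of sequence space sampled by a library.

    Computed in log space because the fraction is far too small to be
    meaningful as an ordinary ratio.
    """
    return math.log10(library_size) - length * math.log10(alphabet)


if __name__ == "__main__":
    total = sequence_space_size(250)
    print("digits in 4^250:", len(str(total)))
    print("log10 of fraction covered by 100,000 sequences:",
          log10_coverage(100_000, 250))
```

The exact integer has 151 digits, consistent with roughly 3 × 10^150 sequences, so even a 100,000-member library samples a vanishingly small fraction of the space. This is why the library has to be designed rather than drawn at random.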