ABSTRACT
Whether synthetic genomes can power life has attracted broad interest in the synthetic biology field. Here, we report de novo synthesis of the largest eukaryotic chromosome thus far, synIV, a 1,454,621-bp yeast chromosome resulting from extensive genome streamlining and modification. We developed megachunk assembly combined with a hierarchical integration strategy, which significantly increased the accuracy and flexibility of synthetic chromosome construction. Besides the drastic sequence changes, we further manipulated the 3D structure of synIV to explore spatial gene regulation. Surprisingly, we found few gene expression changes, suggesting that positioning inside the yeast nucleoplasm plays a minor role in gene regulation. Lastly, we tethered synIV to the inner nuclear membrane via its hundreds of loxPsym sites and observed transcriptional repression of the entire chromosome, demonstrating chromosome-wide transcription manipulation without changing the DNA sequence. Our manipulation of the spatial structure of synIV sheds light on the higher-order architectural design of synthetic genomes.
Subjects
Cell Nucleus, Saccharomyces cerevisiae, Saccharomyces cerevisiae/genetics, Chromosomes/genetics, Fungal Genome, Synthetic Biology/methods
ABSTRACT
Engineering proteins with desired functions and biochemical properties is pivotal for biotechnology and drug discovery. While computational methods based on evolutionary information are reducing the experimental burden by designing targeted libraries of functional variants, they still have a low success rate when the desired protein has few or very remote homologous sequences. Here we propose an autoregressive model, called Temporal Dirichlet Variational Autoencoder (TDVAE), which exploits the mathematical properties of the Dirichlet distribution and temporal convolution to efficiently learn high-order information from a functionally related, possibly remotely similar, set of sequences. TDVAE is highly accurate in predicting the effects of amino acid mutations, while being 90% smaller than the other state-of-the-art models. We then use TDVAE to design variants of the human alpha-galactosidase enzyme as a potential treatment for Fabry disease. Our model builds a library of diverse variants which retain sequence, biochemical and structural properties of the wildtype protein, suggesting they could be suitable for enzyme replacement therapy. Taken together, our results show the importance of accurate sequence modelling and the potential of autoregressive models as protein engineering and analysis tools.
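The abstract does not say how TDVAE turns its sequence model into a mutation-effect prediction. One common approach for autoregressive sequence models, assumed here purely for illustration, is to score a substitution by the log-likelihood ratio between the mutant and the wildtype sequence. The sketch below uses a toy smoothed bigram model as a stand-in for the learned network; `fit_bigram`, `log_likelihood`, `mutation_score` and the four-residue "family" are hypothetical names and data, not part of TDVAE.

```python
import math
from collections import defaultdict

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
START = "^"                        # hypothetical start-of-sequence symbol


def fit_bigram(sequences, pseudocount=1.0):
    """Toy autoregressive model: p(residue | previous residue), add-k smoothed.

    Stand-in for a learned model such as TDVAE; it only illustrates the
    factorisation p(x) = prod_i p(x_i | x_<i) with a one-residue context.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        prev = START
        for aa in seq:
            counts[prev][aa] += 1.0
            prev = aa

    def cond_prob(prev, aa):
        row = counts.get(prev, {})
        total = sum(row.values()) + pseudocount * len(ALPHABET)
        return (row.get(aa, 0.0) + pseudocount) / total

    return cond_prob


def log_likelihood(seq, cond_prob):
    """Sum of log conditional probabilities along the sequence."""
    total = 0.0
    prev = START
    for aa in seq:
        total += math.log(cond_prob(prev, aa))
        prev = aa
    return total


def mutation_score(wildtype, position, new_aa, cond_prob):
    """log p(mutant) - log p(wildtype); negative means the model disfavours it."""
    mutant = wildtype[:position] + new_aa + wildtype[position + 1:]
    return log_likelihood(mutant, cond_prob) - log_likelihood(wildtype, cond_prob)


# Made-up toy "family" of related sequences, for demonstration only.
family = ["MKVL", "MKVL", "MKIL", "MRVL"]
model = fit_bigram(family)
score = mutation_score("MKVL", 1, "W", model)  # K -> W at index 1
```

In this toy setting a substitution never seen in the family (K to W) gets a negative score, while substituting a residue with itself scores exactly zero. A real model would condition on the full prefix rather than one residue.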
Subjects
Protein Engineering, Humans, Protein Engineering/methods, alpha-Galactosidase/genetics, alpha-Galactosidase/metabolism, Mutation, Algorithms, Molecular Models, Computational Biology/methods, Proteins/metabolism, Proteins/chemistry, Proteins/genetics
ABSTRACT
Pioneering advances in genome engineering, and specifically in genome writing, have revolutionized the field of synthetic biology, propelling us toward the creation of synthetic genomes. The Sc2.0 project aims to build the first fully synthetic eukaryotic organism by assembling the genome of Saccharomyces cerevisiae. With the completion of synthetic chromosome VIII (synVIII) described here, this goal is within reach. In addition to writing the yeast genome, we sought to manipulate an essential functional element: the point centromere. By relocating the native centromere sequence to various positions along chromosome VIII, we discovered that the minimal 118-bp CEN8 sequence is insufficient for conferring chromosomal stability at ectopic locations. Expanding the transplanted sequence to include a small segment (~500 bp) of the CDEIII-proximal pericentromere improved chromosome stability, demonstrating that minimal centromeres display context-dependent functionality.
ABSTRACT
We describe the complete synthesis, assembly, debugging, and characterization of a synthetic 404,963 bp chromosome, synIX (synthetic chromosome IX). Combined chromosome construction methods were used to synthesize and integrate its left arm (synIXL) into a strain containing previously described synIXR. We identified and resolved a bug affecting expression of EST3, a crucial gene for telomerase function, producing a synIX strain with near wild-type fitness. To facilitate future synthetic chromosome consolidation and increase flexibility of chromosome transfer between distinct strains, we combined chromoduction, a method to transfer a whole chromosome between two strains, with conditional centromere destabilization to substitute a chromosome of interest for its native counterpart. Both steps of this chromosome substitution method were efficient. We observed that wild-type II tended to co-transfer with synIX and was co-destabilized with wild-type IX, suggesting a potential gene dosage compensation relationship between these chromosomes.
ABSTRACT
We describe construction of the synthetic yeast chromosome XI (synXI) and reveal the effects of redesign at non-coding DNA elements. The 660-kb synthetic yeast genome project (Sc2.0) chromosome was assembled from synthesized DNA fragments before CRISPR-based methods were used in a process of bug discovery, redesign, and chromosome repair, including precise compaction of 200 kb of repeat sequence. Repaired defects were related to poor centromere function and mitochondrial health and were associated with modifications to non-coding regions. As part of the Sc2.0 design, loxPsym sequences for Cre-mediated recombination are inserted between most genes. Using the GAP1 locus from chromosome XI, we show that these sites can facilitate induced extrachromosomal circular DNA (eccDNA) formation, allowing direct study of the effects and propagation of these important molecules. Construction and characterization of synXI contributes to our understanding of non-coding DNA elements, provides a useful tool for eccDNA study, and will inform future synthetic genome design.
ABSTRACT
Type 2 diabetes mellitus (T2D) presents a major health and economic burden that could be alleviated with improved early prediction and intervention. While standard risk factors have shown good predictive performance, we show that the use of blood-based DNA methylation information leads to a significant improvement in the prediction of 10-year T2D incidence risk. Previous studies have been largely constrained by linear assumptions, the use of cytosine-guanine pairs one at a time, and binary outcomes. We present a flexible approach (via an R package, MethylPipeR) based on a range of linear and tree-ensemble models that incorporate time-to-event data for prediction. Using the Generation Scotland cohort (training set n_cases = 374, n_controls = 9,461; test set n_cases = 252, n_controls = 4,526), our best-performing model (area under the receiver operating characteristic curve (AUC) = 0.872, area under the precision-recall curve (PRAUC) = 0.302) showed notable improvement in 10-year onset prediction beyond standard risk factors (AUC = 0.839, PRAUC = 0.227). Replication was observed in the German-based KORA study (n = 1,451, n_cases = 142, P = 1.6 × 10⁻⁵).
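To put the two reported metrics in context, the sketch below (in Python, not the R package named in the abstract) computes ROC AUC from its pairwise definition and the case prevalence in the test set, which is approximately the precision-recall AUC expected from a classifier with no skill. The function names and the five-sample toy data are illustrative assumptions. The sketch treats 10-year onset as a plain binary label and ignores the time-to-event modelling the abstract describes.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random case outscores a random control.

    Ties count as half a win. Quadratic in sample size; fine for a sketch.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def prevalence(n_cases, n_controls):
    """Fraction of cases; roughly the PR-AUC of a no-skill classifier."""
    return n_cases / (n_cases + n_controls)


# Test-set counts as reported in the abstract.
baseline = prevalence(252, 4526)  # about 0.053

# Made-up toy scores, only to show the AUC function in use.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 5 of 6 case-control pairs ranked correctly
```

With about 5% of test participants being cases, the PRAUC values of 0.302 and 0.227 both sit well above the no-skill level of about 0.053. This is why the gap between the two models looks larger in PRAUC than in AUC.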