Search | BVS Aleitamento Materno

1.

A Deep Neural Network for Predicting and Engineering Alternative Polyadenylation.

Bogard, Nicholas; Linder, Johannes; Rosenberg, Alexander B; Seelig, Georg.

Cell ; 178(1): 91-106.e23, 2019 06 27.

Article in English | MEDLINE | ID: mdl-31178116

ABSTRACT

Alternative polyadenylation (APA) is a major driver of transcriptome diversity in human cells. Here, we use deep learning to predict APA from DNA sequence alone. We trained our model (APARENT, APA REgression NeT) on isoform expression data from over 3 million APA reporters. APARENT's predictions are highly accurate when tasked with inferring APA in synthetic and human 3'UTRs. Visualizing features learned across all network layers reveals that APARENT recognizes sequence motifs known to recruit APA regulators, discovers previously unknown sequence determinants of 3' end processing, and integrates these features into a comprehensive, interpretable, cis-regulatory code. We apply APARENT to forward engineer functional polyadenylation signals with precisely defined cleavage position and isoform usage and validate predictions experimentally. Finally, we use APARENT to quantify the impact of genetic variants on APA. Our approach detects pathogenic variants in a wide range of disease contexts, expanding our understanding of the genetic origins of disease.

Subjects

Deep Learning , Models, Genetic , Polyadenylation/genetics , 3' Untranslated Regions/genetics , Base Sequence/genetics , Databases, Genetic , Gene Expression/genetics , HEK293 Cells , Humans , Mutagenesis/genetics , RNA Cleavage/genetics , RNA, Messenger/genetics , RNA-Seq , Synthetic Biology , Transcriptome

2.

Learning the sequence determinants of alternative splicing from millions of random sequences.

Rosenberg, Alexander B; Patwardhan, Rupali P; Shendure, Jay; Seelig, Georg.

Cell ; 163(3): 698-711, 2015 Oct 22.

Article in English | MEDLINE | ID: mdl-26496609

ABSTRACT

Most human transcripts are alternatively spliced, and many disease-causing mutations affect RNA splicing. Toward better modeling the sequence determinants of alternative splicing, we measured the splicing patterns of over two million (M) synthetic mini-genes, which include degenerate subsequences totaling over 100 M bases of variation. The massive size of these training data allowed us to improve upon current models of splicing, as well as to gain new mechanistic insights. Our results show that the vast majority of hexamer sequence motifs measurably influence splice site selection when positioned within alternative exons, with multiple motifs acting additively rather than cooperatively. Intriguingly, motifs that enhance (suppress) exon inclusion in alternative 5' splicing also enhance (suppress) exon inclusion in alternative 3' or cassette exon splicing, suggesting a universal mechanism for alternative exon recognition. Finally, our empirically trained models are highly predictive of the effects of naturally occurring variants on alternative splicing in vivo.
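The additive behavior of hexamer motifs described in this abstract can be illustrated with a minimal sketch. The per-hexamer scores, the example sequence, and the function name `additive_exon_score` are invented for illustration and are not values or code from the study; the point is only that an additive model sums independent contributions from every overlapping 6-mer.

```python
from typing import Dict

# Hypothetical per-hexamer effect scores (made up for illustration only).
HEXAMER_SCORES: Dict[str, float] = {
    "GAAGAA": 0.8,   # assumed enhancer-like motif
    "TAGGGT": -0.6,  # assumed silencer-like motif
}

def additive_exon_score(seq: str, scores: Dict[str, float]) -> float:
    """Sum the scores of every overlapping 6-mer; unknown 6-mers contribute 0."""
    k = 6
    return sum(scores.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))

# Toy exon containing one copy of each motif above.
exon = "CCGAAGAACCTAGGGTCC"
print(additive_exon_score(exon, HEXAMER_SCORES))
```

Under an additive model like this one, the effect of two motifs is the sum of their individual effects; a cooperative model would need extra interaction terms.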

Subjects

Alternative Splicing , Genome, Human , Models, Genetic , Polymorphism, Single Nucleotide , Base Sequence , Humans , Molecular Sequence Data , Nucleotide Motifs , RNA Splice Sites

3.

CellMeSH: probabilistic cell-type identification using indexed literature.

Mao, Shunfu; Zhang, Yue; Seelig, Georg; Kannan, Sreeram.

Bioinformatics ; 38(5): 1393-1402, 2022 02 07.

Article in English | MEDLINE | ID: mdl-34893819

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) is widely used for analyzing gene expression in multi-cellular systems and provides unprecedented access to cellular heterogeneity. scRNA-seq experiments aim to identify and quantify all cell types present in a sample. Measured single-cell transcriptomes are grouped by similarity and the resulting clusters are mapped to cell types based on cluster-specific gene expression patterns. While the process of generating clusters has become largely automated, annotation remains a laborious ad hoc effort that requires expert biological knowledge. RESULTS: Here, we introduce CellMeSH, a new automated approach to identifying cell types for clusters based on prior literature. CellMeSH combines a database of gene-cell-type associations with a probabilistic method for database querying. The database is constructed by automatically linking gene and cell-type information from millions of publications using existing indexed literature resources. Compared to manually constructed databases, CellMeSH is more comprehensive and is easily updated with new data. The probabilistic query method enables reliable information retrieval even though the gene-cell-type associations extracted from the literature are noisy. CellMeSH is also able to optionally utilize prior knowledge about tissues or cells for further annotation improvement. CellMeSH achieves top-one and top-three accuracies on a number of mouse and human datasets that are consistently better than existing approaches. AVAILABILITY AND IMPLEMENTATION: Web server at https://uncurl.cs.washington.edu/db_query and API at https://github.com/shunfumao/cellmesh. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms , Software , Humans , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods

4.

Machine Learning for Designing Next-Generation mRNA Therapeutics.

Castillo-Hair, Sebastian M; Seelig, Georg.

Acc Chem Res ; 55(1): 24-34, 2022 01 04.

Article in English | MEDLINE | ID: mdl-34905691

ABSTRACT

Over just the last 2 years, mRNA therapeutics and vaccines have undergone a rapid transition from an intriguing concept to real-world impact. However, whereas some aspects of mRNA therapeutics, such as the use of chemical modifications to increase stability and reduce immunogenicity, have been extensively optimized for over two decades, other aspects, particularly the selection and design of the noncoding leader and trailer sequences which control translation efficiency and stability, have received comparably less attention. In practice, such 5' and 3' untranslated regions (UTRs) are often borrowed from highly expressed human genes with few or no modifications, as in the case for the Pfizer/BioNTech Covid vaccine. Focusing on the 5'UTR, we here argue that model-driven design is a promising alternative that provides unprecedented control over 5'UTR function. We review recent work that combines synthetic biology with machine learning to build quantitative models that relate ribosome loading, and thus translation efficiency, to the 5'UTR sequence. We first introduce an experimental approach that uses polysome profiling and high-throughput sequencing to quantify ribosome loading for hundreds of thousands of 5'UTRs in parallel. We apply this approach to measure ribosome loading in synthetic RNA libraries with a random sequence inserted into the 5'UTR. We then review Optimus 5-Prime, a convolutional neural network model trained on the experimental data. We highlight that very accurate models of biological regulation can be learned from synthetic data sets with degenerate 5'UTRs. We validate model predictions not only on held-out data sets from our random library but also on a large library of over 30,000 human 5'UTR fragments and using translation reporter data collected independently by other groups. Both the experiment and model are compatible with commonly used chemically modified nucleosides, in particular, pseudouridine (Ψ) and 1-methyl-pseudouridine (m1Ψ).
We find that, in general, 5'UTRs have very similar impacts when combined with different protein-coding sequences and even in the context of different chemical modifications. We demonstrate that Optimus 5-Prime can be combined with design algorithms to generate de novo sequences with precisely defined translation efficiencies. We emphasize recent developments in design algorithms that rely on activation maximization and generative modeling to improve both the fitness and diversity of designed sequences. Compared with prior approaches such as genetic algorithms, we show that these approaches are not only faster but also less likely to get stuck in local sequence optima. Finally, we discuss how the approach reviewed here can be generalized to other gene regions and applications.

Subjects

COVID-19 , Protein Biosynthesis , COVID-19 Vaccines , Humans , Machine Learning , RNA, Messenger/genetics , RNA, Messenger/metabolism , SARS-CoV-2

5.

Redefining the Etiologic Landscape of Cerebellar Malformations.

Aldinger, Kimberly A; Timms, Andrew E; Thomson, Zachary; Mirzaa, Ghayda M; Bennett, James T; Rosenberg, Alexander B; Roco, Charles M; Hirano, Matthew; Abidi, Fatima; Haldipur, Parthiv; Cheng, Chi V; Collins, Sarah; Park, Kaylee; Zeiger, Jordan; Overmann, Lynne M; Alkuraya, Fowzan S; Biesecker, Leslie G; Braddock, Stephen R; Cathey, Sara; Cho, Megan T; Chung, Brian H Y; Everman, David B; Zarate, Yuri A; Jones, Julie R; Schwartz, Charles E; Goldstein, Amy; Hopkin, Robert J; Krantz, Ian D; Ladda, Roger L; Leppig, Kathleen A; McGillivray, Barbara C; Sell, Susan; Wusik, Katherine; Gleeson, Joseph G; Nickerson, Deborah A; Bamshad, Michael J; Gerrelli, Dianne; Lisgo, Steven N; Seelig, Georg; Ishak, Gisele E; Barkovich, A James; Curry, Cynthia J; Glass, Ian A; Millen, Kathleen J; Doherty, Dan; Dobyns, William B.

Am J Hum Genet ; 105(3): 606-615, 2019 09 05.

Article in English | MEDLINE | ID: mdl-31474318

ABSTRACT

Cerebellar malformations are diverse congenital anomalies frequently associated with developmental disability. Although genetic and prenatal non-genetic causes have been described, no systematic analysis has been performed. Here, we present a large exome sequencing study of Dandy-Walker malformation (DWM) and cerebellar hypoplasia (CBLH). We performed exome sequencing in 282 individuals from 100 families with DWM or CBLH, and we established a molecular diagnosis in 36 of 100 families, with a significantly higher yield for CBLH (51%) than for DWM (16%). The 41 variants impact 27 neurodevelopmental-disorder-associated genes, thus demonstrating that CBLH and DWM are often features of monogenic neurodevelopmental disorders. Though only seven monogenic causes (19%) were identified in more than one individual, neuroimaging review of 131 additional individuals confirmed cerebellar abnormalities in 23 of 27 genetic disorders (85%). Prenatal risk factors were frequently found among individuals without a genetic diagnosis (30 of 64 individuals [47%]). Single-cell RNA sequencing of prenatal human cerebellar tissue revealed gene enrichment in neuronal and vascular cell types; this suggests that defective vasculogenesis may disrupt cerebellar development. Further, de novo gain-of-function variants in PDGFRB, a tyrosine kinase receptor essential for vascular progenitor signaling, were associated with CBLH, and this discovery links genetic and non-genetic etiologies. Our results suggest that genetic defects impact specific cerebellar cell types and implicate abnormal vascular development as a mechanism for cerebellar malformations. We also confirmed a major contribution for non-genetic prenatal factors in individuals with cerebellar abnormalities, substantially influencing diagnostic evaluation and counseling regarding recurrence risk and prognosis.

Subjects

Cerebellum/abnormalities , Cerebellum/diagnostic imaging , Cohort Studies , Female , Humans , Male , Pregnancy

6.

Fast activation maximization for molecular sequence design.

Linder, Johannes; Seelig, Georg.

BMC Bioinformatics ; 22(1): 510, 2021 Oct 20.

Article in English | MEDLINE | ID: mdl-34670493

ABSTRACT

BACKGROUND: Optimization of DNA and protein sequences based on Machine Learning models is becoming a powerful tool for molecular design. Activation maximization offers a simple design strategy for differentiable models: one-hot coded sequences are first approximated by a continuous representation, which is then iteratively optimized with respect to the predictor oracle by gradient ascent. While elegant, the current version of the method suffers from vanishing gradients and may cause predictor pathologies leading to poor convergence. RESULTS: Here, we introduce Fast SeqProp, an improved activation maximization method that combines straight-through approximation with normalization across the parameters of the input sequence distribution. Fast SeqProp overcomes bottlenecks in earlier methods arising from input parameters becoming skewed during optimization. Compared to prior methods, Fast SeqProp results in up to 100-fold faster convergence while also finding improved fitness optima for many applications. We demonstrate Fast SeqProp's capabilities by designing DNA and protein sequences for six deep learning predictors, including a protein structure predictor. CONCLUSIONS: Fast SeqProp offers a reliable and efficient method for general-purpose sequence optimization through a differentiable fitness predictor. As demonstrated on a variety of deep learning models, the method is widely applicable, and can incorporate various regularization techniques to maintain confidence in the sequence designs. As a design tool, Fast SeqProp may aid in the development of novel molecules, drug therapies and vaccines.
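The basic activation maximization loop described in the BACKGROUND can be sketched as follows. This is a simplified illustration, not Fast SeqProp itself: it uses gradient ascent on softmax-relaxed logits with a straight-through forward pass (the gradient is evaluated at the discrete one-hot sequence and passed back through the softmax), and it omits the normalization step the abstract credits for the speed-up. The linear predictor, its random weights, and all names (`WEIGHTS`, `predictor`, `predictor_grad`, `design`) are assumptions made for the example; a real oracle would be a trained network differentiated automatically.

```python
import numpy as np

rng = np.random.default_rng(0)
SEQ_LEN, ALPHABET = 8, "ACGT"
# Hypothetical predictor weights; a real oracle would be a trained network.
WEIGHTS = rng.normal(size=(SEQ_LEN, len(ALPHABET)))

def predictor(x):
    """Toy differentiable oracle: sum of position-specific nucleotide weights."""
    return float(np.sum(WEIGHTS * x))

def predictor_grad(x):
    """Gradient of the toy oracle w.r.t. its input (constant because it is linear)."""
    return WEIGHTS

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def design(n_steps=200, lr=1.0):
    logits = np.zeros((SEQ_LEN, len(ALPHABET)))
    for _ in range(n_steps):
        probs = softmax(logits)
        # Forward pass on the discrete one-hot sequence (straight-through idea).
        onehot = np.eye(len(ALPHABET))[probs.argmax(axis=1)]
        grad_x = predictor_grad(onehot)
        # Backward pass: treat the one-hot input as if it were probs and
        # backpropagate through the row-wise softmax.
        grad_logits = probs * (grad_x - (probs * grad_x).sum(axis=1, keepdims=True))
        logits += lr * grad_logits  # gradient ascent on the predicted fitness
    return np.eye(len(ALPHABET))[logits.argmax(axis=1)]

designed = design()
sequence = "".join(ALPHABET[i] for i in designed.argmax(axis=1))
print(sequence, predictor(designed))
```

For this linear toy oracle the optimum is simply the highest-weight nucleotide at each position, which makes the result easy to check. Note how `grad_logits` shrinks as `probs` saturates, a small-scale version of the vanishing-gradient issue the abstract mentions.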

Subjects

Algorithms , Machine Learning , Amino Acid Sequence

7.

Degenerate minigene library analysis enables identification of altered branch point utilization by mutant splicing factor 3B1 (SF3B1).

Gupta, Abhishek K; Murthy, Tushar; Paul, Kiran V; Ramirez, Oscar; Fisher, Joseph B; Rao, Sridhar; Rosenberg, Alexander B; Seelig, Georg; Minella, Alex C; Pillai, Manoj M.

Nucleic Acids Res ; 47(2): 970-980, 2019 01 25.

Article in English | MEDLINE | ID: mdl-30462273

ABSTRACT

Cancer-associated mutations of the core splicing factor 3B1 (SF3B1) result in selection of novel 3' splice sites (3'SS), but precise molecular mechanisms of oncogenesis remain unclear. SF3B1 stabilizes the interaction between U2 snRNP and branch point (BP) on the pre-mRNA. It has hence been speculated that a change in BP selection is the basis for novel 3'SS selection. Direct quantitative determination of BP utilization is however technically challenging. To define BP utilization by SF3B1-mutant spliceosomes, we used an overexpression approach in human cells as well as a complementary strategy using isogenic murine embryonic stem cells with monoallelic K700E mutations constructed via CRISPR/Cas9-based genome editing and a dual vector homology-directed repair methodology. A synthetic minigene library with degenerate regions in 3' intronic regions (3.4 million individual minigenes) was used to compare BP usage of SF3B1(K700E) and SF3B1(WT). Using this model, we show that SF3B1(K700E) spliceosomes utilize non-canonical sequence variants (at position -1 relative to BP adenosine) more frequently than wild-type spliceosomes. These predictions were confirmed using minigene splicing assays. Our results suggest a model of BP utilization by mutant SF3B1 wherein it is able to utilize non-consensus alternative BP sequences by stabilizing weaker U2-BP interactions.

Subjects

RNA Splicing Factors/metabolism , Animals , Base Pairing , Cells, Cultured , Embryonic Stem Cells/metabolism , Gene Library , HEK293 Cells , Humans , Mice , Mutation , Nucleotide Motifs , Phosphoproteins/genetics , RNA Splice Sites , RNA Splicing Factors/genetics , RNA, Messenger/metabolism

8.

Variant Interpretation: Functional Assays to the Rescue.

Starita, Lea M; Ahituv, Nadav; Dunham, Maitreya J; Kitzman, Jacob O; Roth, Frederick P; Seelig, Georg; Shendure, Jay; Fowler, Douglas M.

Am J Hum Genet ; 101(3): 315-325, 2017 Sep 07.

Article in English | MEDLINE | ID: mdl-28886340

ABSTRACT

Classical genetic approaches for interpreting variants, such as case-control or co-segregation studies, require finding many individuals with each variant. Because the overwhelming majority of variants are present in only a few living humans, this strategy has clear limits. Fully realizing the clinical potential of genetics requires that we accurately infer pathogenicity even for rare or private variation. Many computational approaches to predicting variant effects have been developed, but they can identify only a small fraction of pathogenic variants with the high confidence that is required in the clinic. Experimentally measuring a variant's functional consequences can provide clearer guidance, but individual assays performed only after the discovery of the variant are both time and resource intensive. Here, we discuss how multiplex assays of variant effect (MAVEs) can be used to measure the functional consequences of all possible variants in disease-relevant loci for a variety of molecular and cellular phenotypes. The resulting large-scale functional data can be combined with machine learning and clinical knowledge for the development of "lookup tables" of accurate pathogenicity predictions. A coordinated effort to produce, analyze, and disseminate large-scale functional data generated by multiplex assays could be essential to addressing the variant-interpretation crisis.

Subjects

Computational Biology/methods , Disease/genetics , Genetic Variation , Genome, Human , Humans

9.

Deep learning of the regulatory grammar of yeast 5' untranslated regions from 500,000 random sequences.

Cuperus, Josh T; Groves, Benjamin; Kuchina, Anna; Rosenberg, Alexander B; Jojic, Nebojsa; Fields, Stanley; Seelig, Georg.

Genome Res ; 27(12): 2015-2024, 2017 12.

Article in English | MEDLINE | ID: mdl-29097404

ABSTRACT

Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts the protein expression of the 5' untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of half a million 50-nucleotide-long random 5' UTRs and assayed their activity in a massively parallel growth selection experiment. The resulting data allow us to quantify the impact on protein expression of Kozak sequence composition, upstream open reading frames (uORFs), and secondary structure. We trained a convolutional neural network (CNN) on the random library and showed that it performs well at predicting the protein expression of both a held-out set of the random 5' UTRs as well as native S. cerevisiae 5' UTRs. The model additionally was used to computationally evolve highly active 5' UTRs. We confirmed experimentally that the great majority of the evolved sequences led to higher protein expression rates than the starting sequences, demonstrating the predictive power of this model.

Subjects

Models, Genetic , Saccharomyces cerevisiae/genetics , 5' Untranslated Regions , Alternative Splicing , Computer Simulation , Gene Library , Machine Learning , Neural Networks, Computer , RNA, Fungal , RNA, Messenger

10.

Programmable patterns in a DNA-based reaction-diffusion system.

Chen, Sifang; Seelig, Georg.

Soft Matter ; 16(14): 3555-3563, 2020 Apr 14.

Article in English | MEDLINE | ID: mdl-32219296

ABSTRACT

Biology offers compelling proof that macroscopic "living materials" can emerge from reactions between diffusing biomolecules. Here, we show that molecular self-organization could be a similarly powerful approach for engineering functional synthetic materials. We introduce a programmable DNA embedded hydrogel that produces tunable patterns at the centimeter length scale. We generate these patterns by implementing chemical reaction networks through synthetic DNA complexes, embedding the complexes in the hydrogel, and triggering with locally applied input DNA strands. We first demonstrate ring pattern formation around a circular input cavity and show that the ring width and intensity can be predictably tuned. Then, we create patterns of increasing complexity, including concentric rings and non-isotropic patterns. Finally, we show "destructive" and "constructive" interference patterns, by combining several ring-forming modules in the gel and triggering them from multiple sources. We further show that computer simulations based on the reaction-diffusion model can predict and inform the programming of target patterns.

Subjects

Computer Simulation , DNA/chemistry , Hydrogels/chemistry , Models, Chemical

11.

Scalable preprocessing for sparse scRNA-seq data exploiting prior knowledge.

Mukherjee, Sumit; Zhang, Yue; Fan, Joshua; Seelig, Georg; Kannan, Sreeram.

Bioinformatics ; 34(13): i124-i132, 2018 07 01.

Article in English | MEDLINE | ID: mdl-29949988

ABSTRACT

Motivation: Single cell RNA-seq (scRNA-seq) data contains a wealth of information which has to be inferred computationally from the observed sequencing reads. As the ability to sequence more cells improves rapidly, existing computational tools suffer from three problems. (i) The decreased reads-per-cell implies a highly sparse sample of the true cellular transcriptome. (ii) Many tools simply cannot handle the size of the resulting datasets. (iii) Prior biological knowledge such as bulk RNA-seq information of certain cell types or qualitative marker information is not taken into account. Here we present UNCURL, a preprocessing framework based on non-negative matrix factorization for scRNA-seq data, that is able to handle varying sampling distributions, scales to very large cell numbers and can incorporate prior knowledge. Results: We find that preprocessing using UNCURL consistently improves performance of commonly used scRNA-seq tools for clustering, visualization and lineage estimation, both in the absence and presence of prior knowledge. Finally we demonstrate that UNCURL is extremely scalable and parallelizable, and runs faster than other methods on a scRNA-seq dataset containing 1.3 million cells. Availability and implementation: Source code is available at https://github.com/yjzhang/uncurl_python. Supplementary information: Supplementary data are available at Bioinformatics online.
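Non-negative matrix factorization of a count matrix, the building block this abstract names, can be sketched with the classical multiplicative updates for the Poisson (generalized Kullback-Leibler) objective. This is a generic textbook NMF run on simulated counts, not the UNCURL implementation; the function names `poisson_nmf` and `kl_divergence`, the matrix sizes, and the simulated data are assumptions made for the example.

```python
import numpy as np

def poisson_nmf(V, k, n_iter=200, seed=0, eps=1e-10):
    """Factor a genes x cells count matrix V into W (genes x k) @ H (k x cells)."""
    rng = np.random.default_rng(seed)
    n_genes, n_cells = V.shape
    W = rng.random((n_genes, k)) + 0.1
    H = rng.random((k, n_cells)) + 0.1
    for _ in range(n_iter):
        # Multiplicative update for H under the generalized KL objective.
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # Multiplicative update for W, using the refreshed H.
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

def kl_divergence(V, WH, eps=1e-10):
    """Generalized KL divergence between counts V and the reconstruction WH."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))

# Simulated sparse counts from a hidden rank-3 structure (illustrative only).
rng = np.random.default_rng(1)
true_W = rng.random((30, 3)) * 5
true_H = rng.random((3, 40))
counts = rng.poisson(true_W @ true_H)

W0, H0 = poisson_nmf(counts, k=3, n_iter=0)    # random initialization only
W, H = poisson_nmf(counts, k=3, n_iter=200)    # after fitting
print(kl_divergence(counts, W0 @ H0), kl_divergence(counts, W @ H))
```

The low-rank product `W @ H` acts as a denoised estimate of the expression matrix; the columns of `H` give each cell's mixture over `k` latent states, which is the kind of representation downstream clustering or visualization can use.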

Subjects

Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Software , Algorithms , Cluster Analysis

12.

Molecular circuits for dynamic noise filtering.

Zechner, Christoph; Seelig, Georg; Rullan, Marc; Khammash, Mustafa.

Proc Natl Acad Sci U S A ; 113(17): 4729-34, 2016 Apr 26.

Article in English | MEDLINE | ID: mdl-27078094

ABSTRACT

The invention of the Kalman filter is a crowning achievement of filtering theory, one that has revolutionized technology in countless ways. By dealing effectively with noise, the Kalman filter has enabled various applications in positioning, navigation, control, and telecommunications. In the emerging field of synthetic biology, noise and context dependency are among the key challenges facing the successful implementation of reliable, complex, and scalable synthetic circuits. Although substantial further advancement in the field may very well rely on effectively addressing these issues, a principled protocol to deal with noise, as provided by the Kalman filter, remains completely missing. Here we develop an optimal filtering theory that is suitable for noisy biochemical networks. We show how the resulting filters can be implemented at the molecular level and provide various simulations related to estimation, system identification, and noise cancellation problems. We demonstrate our approach in vitro using DNA strand displacement cascades as well as in vivo using flow cytometry measurements of a light-inducible circuit in Escherichia coli.
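For readers unfamiliar with the Kalman filter invoked here, a minimal sketch of the classical discrete-time scalar version follows. It is background only and is not the molecular filter developed in the paper; the random-walk state model, the noise variances `q` and `r`, and the simulated measurements are assumptions chosen for illustration.

```python
import numpy as np

def kalman_1d(measurements, q=1e-3, r=0.5, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state observed with Gaussian noise.

    q: process-noise variance, r: measurement-noise variance,
    x0, p0: initial state estimate and its variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q                # predict: state unchanged, uncertainty grows
        k = p / (p + r)          # Kalman gain
        x = x + k * (z - x)      # update using the measurement residual
        p = (1.0 - k) * p        # posterior variance
        estimates.append(x)
    return np.array(estimates)

# Noisy readings of a constant signal (simulated for illustration).
rng = np.random.default_rng(1)
truth = 2.0
noisy = truth + rng.normal(scale=np.sqrt(0.5), size=500)
est = kalman_1d(noisy)
print(est[-1])
```

The gain `k` balances trust between the running estimate and each new measurement: large measurement noise `r` lowers the gain and smooths more strongly.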

Subjects

Computers, Molecular , Models, Biological , Models, Chemical , Models, Statistical , Signal Processing, Computer-Assisted , Signal-To-Noise Ratio

13.

An Engineered Kinetic Amplification Mechanism for Single Nucleotide Variant Discrimination by DNA Hybridization Probes.

Chen, Sherry Xi; Seelig, Georg.

J Am Chem Soc ; 138(15): 5076-86, 2016 Apr 20.

Article in English | MEDLINE | ID: mdl-27010123

ABSTRACT

Even a single-nucleotide difference between the sequences of two otherwise identical biological nucleic acids can have dramatic functional consequences. Here, we use model-guided reaction pathway engineering to quantitatively improve the performance of selective hybridization probes in recognizing single nucleotide variants (SNVs). Specifically, we build a detection system that combines discrimination by competition with DNA strand displacement-based catalytic amplification. We show, both mathematically and experimentally, that the single nucleotide selectivity of such a system in binding to single-stranded DNA and RNA is quadratically better than discrimination due to competitive hybridization alone. As an additional benefit, the integrated circuit inherits the property of amplification and provides at least 10-fold better sensitivity than standard hybridization probes. Moreover, we demonstrate how the detection mechanism can be tuned such that the detection reaction is agnostic to the position of the SNV within the target sequence. In contrast, prior strand displacement-based probes designed for kinetic discrimination are highly sensitive to position effects. We apply our system to reliably discriminate between different members of the let-7 microRNA family that differ in only a single base position. Our results demonstrate the power of systematic reaction network design to quantitatively improve biotechnology.

Subjects

DNA Probes/chemistry , DNA Probes/genetics , DNA/chemistry , DNA/genetics , MicroRNAs/chemistry , MicroRNAs/genetics , Nucleic Acid Hybridization/methods , Humans , Polymorphism, Single Nucleotide

14.

Navigating through a maze.

Seelig, Georg.

Nat Mater ; 18(3): 198-199, 2019 03.

Article in English | MEDLINE | ID: mdl-30783223

Subjects

DNA , Nanotechnology

15.

Optimizing 5'UTRs for mRNA-delivered gene editing using deep learning.

Castillo-Hair, Sebastian; Fedak, Stephen; Wang, Ban; Linder, Johannes; Havens, Kyle; Certo, Michael; Seelig, Georg.

Nat Commun ; 15(1): 5284, 2024 Jun 20.

Article in English | MEDLINE | ID: mdl-38902240

ABSTRACT

mRNA therapeutics are revolutionizing the pharmaceutical industry, but methods to optimize the primary sequence for increased expression are still lacking. Here, we design 5'UTRs for efficient mRNA translation using deep learning. We perform polysome profiling of fully or partially randomized 5'UTR libraries in three cell types and find that UTR performance is highly correlated across cell types. We train models on our datasets and use them to guide the design of high-performing 5'UTRs using gradient descent and generative neural networks. We experimentally test designed 5'UTRs with mRNA encoding megaTAL™ gene editing enzymes for two different gene targets and in two different cell lines. We find that the designed 5'UTRs support strong gene editing activity. Editing efficiency is correlated between cell types and gene targets, although the best performing UTR was specific to one cargo and cell type. Our results highlight the potential of model-based sequence design for mRNA therapeutics.

Subjects

5' Untranslated Regions , Deep Learning , Gene Editing , RNA, Messenger , RNA, Messenger/genetics , RNA, Messenger/metabolism , 5' Untranslated Regions/genetics , Humans , Gene Editing/methods , Polyribosomes/metabolism , Cell Line , HEK293 Cells , Protein Biosynthesis

16.

Iterative deep learning-design of human enhancers exploits condensed sequence grammar to achieve cell type-specificity.

Yin, Christopher; Hair, Sebastian Castillo; Byeon, Gun Woo; Bromley, Peter; Meuleman, Wouter; Seelig, Georg.

bioRxiv ; 2024 Jun 14.

Article in English | MEDLINE | ID: mdl-38915713

ABSTRACT

An important and largely unsolved problem in synthetic biology is how to target gene expression to specific cell types. Here, we apply iterative deep learning to design synthetic enhancers with strong differential activity between two human cell lines. We initially train models on published datasets of enhancer activity and chromatin accessibility and use them to guide the design of synthetic enhancers that maximize predicted specificity. We experimentally validate these sequences, use the measurements to re-optimize the predictor, and design a second generation of enhancers with improved specificity. Our design methods embed relevant transcription factor binding site (TFBS) motifs with higher frequencies than comparable endogenous enhancers while using a more selective motif vocabulary, and we show that enhancer activity is correlated with transcription factor expression at the single cell level. Finally, we characterize causal features of top enhancers via perturbation experiments and show enhancers as short as 50bp can maintain specificity.

17.

Single-cell RNA sequencing reveals plasmid constrains bacterial population heterogeneity and identifies a non-conjugating subpopulation.

Cyriaque, Valentine; Ibarra-Chávez, Rodrigo; Kuchina, Anna; Seelig, Georg; Nesme, Joseph; Madsen, Jonas Stenløkke.

Nat Commun ; 15(1): 5853, 2024 Jul 12.

Article in English | MEDLINE | ID: mdl-38997267

ABSTRACT

Transcriptional heterogeneity in isogenic bacterial populations can play various roles in bacterial evolution, but its detection remains technically challenging. Here, we use microbial split-pool ligation transcriptomics to study the relationship between bacterial subpopulation formation and plasmid-host interactions at the single-cell level. We find that single-cell transcript abundances are influenced by bacterial growth state and plasmid carriage. Moreover, plasmid carriage constrains the formation of bacterial subpopulations. Plasmid genes, including those with core functions such as replication and maintenance, exhibit transcriptional heterogeneity associated with cell activity. Notably, we identify a cell subpopulation that does not transcribe conjugal plasmid transfer genes, which may help reduce plasmid burden on a subset of cells. Our study advances the understanding of plasmid-mediated subpopulation dynamics and provides insights into the plasmid-bacteria interplay.

Subjects

Plasmids , Single-Cell Analysis , Plasmids/genetics , Single-Cell Analysis/methods , Escherichia coli/genetics , Sequence Analysis, RNA/methods , Conjugation, Genetic , Bacteria/genetics , Gene Expression Regulation, Bacterial , Genetic Heterogeneity

18.

High-throughput single-cell transcriptomics of bacteria using combinatorial barcoding.

Gaisser, Karl D; Skloss, Sophie N; Brettner, Leandra M; Paleologu, Luana; Roco, Charles M; Rosenberg, Alexander B; Hirano, Matthew; DePaolo, R William; Seelig, Georg; Kuchina, Anna.

Nat Protoc ; 2024 Jun 17.

Article in English | MEDLINE | ID: mdl-38886529

ABSTRACT

Microbial split-pool ligation transcriptomics (microSPLiT) is a high-throughput single-cell RNA sequencing method for bacteria. With four combinatorial barcoding rounds, microSPLiT can profile transcriptional states in hundreds of thousands of Gram-negative and Gram-positive bacteria in a single experiment without specialized equipment. As bacterial samples are fixed and permeabilized before barcoding, they can be collected and stored ahead of time. During the first barcoding round, the fixed and permeabilized bacteria are distributed into a 96-well plate, where their transcripts are reverse transcribed into cDNA and labeled with the first well-specific barcode inside the cells. The cells are mixed and redistributed two more times into new 96-well plates, where the second and third barcodes are appended to the cDNA via in-cell ligation reactions. Finally, the cells are mixed and divided into aliquot sub-libraries, which can be stored until future use or prepared for sequencing with the addition of a fourth barcode. It takes 4 days to generate sequencing-ready libraries, including 1 day for collection and overnight fixation of samples. The standard plate setup enables single-cell transcriptional profiling of up to 1 million bacterial cells and up to 96 samples in a single barcoding experiment, with the possibility of expansion by adding barcoding rounds. The protocol requires experience in basic molecular biology techniques, handling of bacterial samples and preparation of DNA libraries for next-generation sequencing. It can be performed by experienced undergraduate or graduate students. Data analysis requires access to computing resources, familiarity with Unix command line and basic experience with Python or R.
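The capacity of the barcoding scheme described above follows from simple combinatorics, sketched below. Three rounds in 96-well plates come from the abstract; the number of sublibraries carrying a distinct fourth barcode, the cell numbers, and the assumption that cells are assigned to wells uniformly at random are illustrative choices, and the function names are hypothetical.

```python
def barcode_space(wells_per_round=(96, 96, 96), sublibraries=1):
    """Number of distinct barcode combinations across all rounds."""
    total = sublibraries
    for wells in wells_per_round:
        total *= wells
    return total

def expected_collision_fraction(n_cells, n_barcodes):
    """Probability that a given cell shares its full barcode combination with
    at least one other cell, assuming uniform random assignment."""
    return 1.0 - (1.0 - 1.0 / n_barcodes) ** (n_cells - 1)

print(barcode_space())                 # three 96-well rounds -> 884736
print(barcode_space(sublibraries=8))   # assumed 8 sublibraries -> 7077888
print(expected_collision_fraction(10_000, barcode_space()))
```

Because the fourth barcode is added per sublibrary, the collision estimate applies to the cells within one sublibrary against the 884,736 combinations from the first three rounds; loading fewer cells per sublibrary lowers the expected collision rate.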

19.

DNA as a universal substrate for chemical kinetics.

Soloveichik, David; Seelig, Georg; Winfree, Erik.

Proc Natl Acad Sci U S A ; 107(12): 5393-8, 2010 Mar 23.

Article in English | MEDLINE | ID: mdl-20203007

ABSTRACT

Molecular programming aims to systematically engineer molecular and chemical systems of autonomous function and ever-increasing complexity. A key goal is to develop embedded control circuitry within a chemical system to direct molecular events. Here we show that systems of DNA molecules can be constructed that closely approximate the dynamic behavior of arbitrary systems of coupled chemical reactions. By using strand displacement reactions as a primitive, we construct reaction cascades with effectively unimolecular and bimolecular kinetics. Our construction allows individual reactions to be coupled in arbitrary ways such that reactants can participate in multiple reactions simultaneously, reproducing the desired dynamical properties. Thus arbitrary systems of chemical equations can be compiled into real chemical systems. We illustrate our method on the Lotka-Volterra oscillator, a limit-cycle oscillator, a chaotic system, and systems implementing feedback digital logic and algorithmic behavior.

Subjects

DNA/chemistry , DNA/metabolism , Models, Biological , Biophysical Phenomena , Computer Simulation , Kinetics , Nonlinear Dynamics , Systems Theory
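As an illustration of the kind of formal reaction system the abstract above takes as its target, the sketch below integrates the mass-action kinetics of one common Lotka-Volterra formulation (X -> 2X, X + Y -> 2Y, Y -> nothing). It simulates only the idealized chemical equations, not a DNA strand displacement implementation. The rate constants, initial concentrations, and function names are assumptions chosen for illustration.

```python
import math


def lotka_volterra_rates(x, y, k1, k2, k3):
    """Mass-action rates for X -> 2X (k1), X + Y -> 2Y (k2), Y -> nothing (k3)."""
    dx = k1 * x - k2 * x * y
    dy = k2 * x * y - k3 * y
    return dx, dy


def rk4_step(x, y, dt, k1, k2, k3):
    """Advance the two concentrations by one classical Runge-Kutta step."""
    a1, b1 = lotka_volterra_rates(x, y, k1, k2, k3)
    a2, b2 = lotka_volterra_rates(x + 0.5 * dt * a1, y + 0.5 * dt * b1, k1, k2, k3)
    a3, b3 = lotka_volterra_rates(x + 0.5 * dt * a2, y + 0.5 * dt * b2, k1, k2, k3)
    a4, b4 = lotka_volterra_rates(x + dt * a3, y + dt * b3, k1, k2, k3)
    x_next = x + dt * (a1 + 2 * a2 + 2 * a3 + a4) / 6.0
    y_next = y + dt * (b1 + 2 * b2 + 2 * b3 + b4) / 6.0
    return x_next, y_next


def simulate(x0, y0, t_end, dt=0.001, k1=1.0, k2=1.0, k3=1.0):
    """Return a list of (time, x, y) samples; all parameters are illustrative."""
    steps = int(round(t_end / dt))
    x, y = x0, y0
    trajectory = [(0.0, x, y)]
    for i in range(1, steps + 1):
        x, y = rk4_step(x, y, dt, k1, k2, k3)
        trajectory.append((i * dt, x, y))
    return trajectory


def invariant(x, y, k1=1.0, k2=1.0, k3=1.0):
    """Quantity conserved along exact Lotka-Volterra orbits; useful as a sanity check."""
    return k2 * x - k3 * math.log(x) + k2 * y - k1 * math.log(y)


trajectory = simulate(2.0, 1.0, t_end=10.0)
_, x_end, y_end = trajectory[-1]
print(x_end, y_end)
```

Checking that `invariant` stays nearly constant along the trajectory is a quick way to confirm that the integrator follows a closed oscillatory orbit rather than drifting.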

20.

Massively parallel protein-protein interaction measurement by sequencing (MP3-seq) enables rapid screening of protein heterodimers.

Baryshev, Alexander; La Fleur, Alyssa; Groves, Benjamin; Michel, Cirstyn; Baker, David; Ljubetic, Ajasja; Seelig, Georg.

bioRxiv ; 2023 Aug 16.

Article in English | MEDLINE | ID: mdl-36798377

ABSTRACT

Protein-protein interactions (PPIs) regulate many cellular processes, and engineered PPIs have cell and gene therapy applications. Here we introduce massively parallel protein-protein interaction measurement by sequencing (MP3-seq), an easy-to-use and highly scalable yeast-two-hybrid approach for measuring PPIs. In MP3-seq, DNA barcodes are associated with specific protein pairs, and barcode enrichment can be read by sequencing to provide a direct measure of interaction strength. We show that MP3-seq is highly quantitative and scales to over 100,000 interactions. We apply MP3-seq to characterize interactions between families of rationally designed heterodimers and to investigate elements conferring specificity to coiled-coil interactions. Finally, we predict coiled heterodimer structures using AlphaFold-Multimer (AF-M) and train linear models on physics simulation energy terms to predict MP3-seq values. We find that AF-M and AF-M complex prediction-based models could be valuable for pre-screening interactions, but that measuring interactions experimentally remains necessary to rank their strengths quantitatively.
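The idea of reading barcode enrichment as a measure of interaction strength, described in the abstract above, can be sketched with a generic log-ratio score. This is not the authors' analysis pipeline: the function name, the pseudocount, the frequency normalization, and the example counts are all assumptions made for illustration.

```python
import math


def log2_enrichment(pre_counts, post_counts, pseudocount=1.0):
    """Log2 ratio of each barcode's frequency after selection to its frequency before.

    pre_counts and post_counts map a protein-pair label to a read count.
    A pseudocount is added to every barcode so that zero counts stay finite.
    """
    labels = sorted(set(pre_counts) | set(post_counts))
    pre_total = sum(pre_counts.get(k, 0) for k in labels) + pseudocount * len(labels)
    post_total = sum(post_counts.get(k, 0) for k in labels) + pseudocount * len(labels)
    scores = {}
    for k in labels:
        pre_freq = (pre_counts.get(k, 0) + pseudocount) / pre_total
        post_freq = (post_counts.get(k, 0) + pseudocount) / post_total
        scores[k] = math.log2(post_freq / pre_freq)
    return scores


# Hypothetical read counts for two protein pairs before and after selection.
pre = {"A:B": 100, "A:C": 100}
post = {"A:B": 300, "A:C": 100}
scores = log2_enrichment(pre, post)
print(scores)
```

In this toy example the pair whose barcode gains read share after selection receives a positive score and the other a negative one. Real data would also need replicate handling and count-noise modeling, which this sketch leaves out.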
