Results 1 - 20 of 54
1.
Nat Struct Mol Biol ; 30(12): 1947-1957, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38087090

ABSTRACT

JTE-607 is an anticancer and anti-inflammatory compound and its active form, compound 2, directly binds to and inhibits CPSF73, the endonuclease for the cleavage step in pre-messenger RNA (pre-mRNA) 3' processing. Surprisingly, compound 2-mediated inhibition of pre-mRNA cleavage is sequence specific and the drug sensitivity is predominantly determined by sequences flanking the cleavage site (CS). Using massively parallel in vitro assays, we identified key sequence features that determine drug sensitivity. We trained a machine learning model that can predict poly(A) site (PAS) relative sensitivity to compound 2 and provide the molecular basis for understanding the impact of JTE-607 on PAS selection and transcription termination genome wide. We propose that CPSF73 and associated factors bind to the CS region in a sequence-dependent manner and the interaction affinity determines compound 2 sensitivity. These results have not only elucidated the mechanism of action of JTE-607, but also unveiled an evolutionarily conserved sequence specificity of the mRNA 3' processing machinery.


Subjects
RNA Precursors , RNA Processing, Post-Transcriptional , Cell Line , RNA Precursors/genetics , RNA Precursors/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism
2.
bioRxiv ; 2023 Nov 23.
Article in English | MEDLINE | ID: mdl-38045294

ABSTRACT

The 5' UTRs of mRNAs are critical for translation regulation, but their in vivo regulatory features are poorly characterized. Here, we report the regulatory landscape of 5' UTRs during early zebrafish embryogenesis using a massively parallel reporter assay of 18,154 sequences coupled to polysome profiling. We found that the 5' UTR is sufficient to confer temporal dynamics to translation initiation, and identified 86 motifs enriched in 5' UTRs with distinct ribosome recruitment capabilities. A quantitative deep learning model, DaniO5P, revealed a combined role for 5' UTR length, translation initiation site context, upstream AUGs and sequence motifs on in vivo ribosome recruitment. DaniO5P predicts the activities of 5' UTR isoforms and indicates that modulating 5' UTR length and motif grammar contributes to translation initiation dynamics. This study provides a first quantitative model of 5' UTR-based translation regulation in early vertebrate development and lays the foundation for identifying the underlying molecular effectors.

3.
Nat Nanotechnol ; 18(8): 912-921, 2023 08.
Article in English | MEDLINE | ID: mdl-37142708

ABSTRACT

DNA has emerged as an attractive medium for archival data storage due to its durability and high information density. Scalable parallel random access to information is a desirable property of any storage system. For DNA-based storage systems, however, this still needs to be robustly established. Here we report on a thermoconfined polymerase chain reaction, which enables multiplexed, repeated random access to compartmentalized DNA files. The strategy is based on localizing biotin-functionalized oligonucleotides inside thermoresponsive, semipermeable microcapsules. At low temperatures, microcapsules are permeable to enzymes, primers and amplified products, whereas at high temperatures, membrane collapse prevents molecular crosstalk during amplification. Our data show that the platform outperforms non-compartmentalized DNA storage in repeated random access and reduces amplification bias tenfold during multiplex polymerase chain reaction. Using fluorescent sorting, we also demonstrate sample pooling and data retrieval by microcapsule barcoding. Therefore, the thermoresponsive microcapsule technology offers a scalable, sequence-agnostic approach for repeated random access to archival DNA files.


Subjects
DNA , Information Storage and Retrieval , Capsules , DNA/genetics , Oligonucleotides , High-Throughput Nucleotide Sequencing
4.
bioRxiv ; 2023 Apr 11.
Article in English | MEDLINE | ID: mdl-37090613

ABSTRACT

JTE-607 is a small molecule compound with anti-inflammation and anti-cancer activities. Upon entering the cell, it is hydrolyzed to Compound 2, which directly binds to and inhibits CPSF73, the endonuclease for the cleavage step in pre-mRNA 3' processing. Although CPSF73 is universally required for mRNA 3' end formation, we have unexpectedly found that Compound 2-mediated inhibition of pre-mRNA 3' processing is sequence-specific and that the sequences flanking the cleavage site (CS) are a major determinant for drug sensitivity. By using massively parallel in vitro assays, we have measured the Compound 2 sensitivities of over 260,000 sequence variants and identified key sequence features that determine drug sensitivity. A machine learning model trained on these data can predict the impact of JTE-607 on poly(A) site (PAS) selection and transcription termination genome-wide. We propose a biochemical model in which CPSF73 and other mRNA 3' processing factors bind to RNA of the CS region in a sequence-specific manner and the affinity of such interaction determines the Compound 2 sensitivity of a PAS. As the Compound 2-resistant CS sequences, characterized by U/A-rich motifs, are prevalent in PASs from yeast to human, the CS region sequence may have more fundamental functions beyond determining drug resistance. Together, our study not only characterized the mechanism of action of a compound with clinical implications, but also revealed a previously unknown and evolutionarily conserved sequence-specificity of the mRNA 3' processing machinery.

5.
bioRxiv ; 2023 Aug 16.
Article in English | MEDLINE | ID: mdl-36798377

ABSTRACT

Protein-protein interactions (PPIs) regulate many cellular processes, and engineered PPIs have cell and gene therapy applications. Here we introduce massively parallel protein-protein interaction measurement by sequencing (MP3-seq), an easy-to-use and highly scalable yeast-two-hybrid approach for measuring PPIs. In MP3-seq, DNA barcodes are associated with specific protein pairs, and barcode enrichment can be read by sequencing to provide a direct measure of interaction strength. We show that MP3-seq is highly quantitative and scales to over 100,000 interactions. We apply MP3-seq to characterize interactions between families of rationally designed heterodimers and to investigate elements conferring specificity to coiled-coil interactions. Finally, we predict coiled heterodimer structures using AlphaFold-Multimer (AF-M) and train linear models on physics simulation energy terms to predict MP3-seq values. We find that AF-M and AF-M complex prediction-based models could be valuable for pre-screening interactions, but that measuring interactions experimentally remains necessary to rank their strengths quantitatively.
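
The idea of reading interaction strength from barcode enrichment can be illustrated with a minimal sketch. It assumes read counts per barcode before and after selection and scores each barcode by the log2 ratio of its read frequencies; the function name `log2_enrichment`, the pseudocount, the barcode labels, and the counts are all hypothetical, and this is not presented as the scoring used by MP3-seq itself.

```python
import math

def log2_enrichment(pre_counts, post_counts, pseudocount=1.0):
    """Per-barcode log2 change in read frequency after selection.

    Hypothetical helper: a pseudocount keeps barcodes with zero reads finite.
    """
    n = len(pre_counts)
    pre_total = sum(pre_counts.values()) + pseudocount * n
    post_total = sum(post_counts.get(bc, 0) for bc in pre_counts) + pseudocount * n
    scores = {}
    for barcode, pre in pre_counts.items():
        post = post_counts.get(barcode, 0)
        pre_freq = (pre + pseudocount) / pre_total
        post_freq = (post + pseudocount) / post_total
        scores[barcode] = math.log2(post_freq / pre_freq)
    return scores

# Made-up counts for three barcoded protein pairs (illustration only).
pre = {"pairA_bc": 1000, "pairB_bc": 1000, "pairC_bc": 1000}
post = {"pairA_bc": 4000, "pairB_bc": 900, "pairC_bc": 100}

scores = log2_enrichment(pre, post)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

A positive score marks a barcode that became more frequent after selection, which under this sketch's assumptions would correspond to a stronger interaction.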

6.
Genome Biol ; 23(1): 232, 2022 11 05.
Article in English | MEDLINE | ID: mdl-36335397

ABSTRACT

BACKGROUND: 3'-end processing by cleavage and polyadenylation is an important and finely tuned regulatory process during mRNA maturation. Numerous genetic variants are known to cause or contribute to human disorders by disrupting the cis-regulatory code of polyadenylation signals. Yet, due to the complexity of this code, variant interpretation remains challenging. RESULTS: We introduce a residual neural network model, APARENT2, that can infer 3'-cleavage and polyadenylation from DNA sequence more accurately than any previous model. This model generalizes to the case of alternative polyadenylation (APA) for a variable number of polyadenylation signals. We demonstrate APARENT2's performance on several variant datasets, including functional reporter data and human 3' aQTLs from GTEx. We apply neural network interpretation methods to gain insights into disrupted or protective higher-order features of polyadenylation. We fine-tune APARENT2 on human tissue-resolved transcriptomic data to elucidate tissue-specific variant effects. By combining APARENT2 with models of mRNA stability, we extend aQTL effect size predictions to the entire 3' untranslated region. Finally, we perform in silico saturation mutagenesis of all human polyadenylation signals and compare the predicted effects of [Formula: see text] million variants against gnomAD. While loss-of-function variants were generally selected against, we also find specific clinical conditions linked to gain-of-function mutations. For example, we detect an association between gain-of-function mutations in the 3'-end and autism spectrum disorder. To experimentally validate APARENT2's predictions, we assayed clinically relevant variants in multiple cell lines, including microglia-derived cells. 
CONCLUSIONS: A sequence-to-function model based on deep residual learning enables accurate functional interpretation of genetic variants in polyadenylation signals and, when coupled with large human variation databases, elucidates the link between functional 3'-end mutations and human health.


Subjects
Autism Spectrum Disorder , Polyadenylation , Humans , Autism Spectrum Disorder/genetics , RNA Stability/genetics , Transcriptome , Genetic Variation , 3' Untranslated Regions
7.
Nat Mach Intell ; 4(1): 41-54, 2022 Jan.
Article in English | MEDLINE | ID: mdl-35966405

ABSTRACT

Sequence-based neural networks can learn to make accurate predictions from large biological datasets, but model interpretation remains challenging. Many existing feature attribution methods are optimized for continuous rather than discrete input patterns and assess individual feature importance in isolation, making them ill-suited for interpreting non-linear interactions in molecular sequences. Building on work in computer vision and natural language processing, we developed an approach based on deep learning - Scrambler networks - wherein the most salient sequence positions are identified with learned input masks. Scramblers learn to predict Position-Specific Scoring Matrices (PSSMs) where unimportant nucleotides or residues are scrambled by raising their entropy. We apply Scramblers to interpret the effects of genetic variants, uncover non-linear interactions between cis-regulatory elements, explain binding specificity for protein-protein interactions, and identify structural determinants of de novo designed proteins. We show that Scramblers enable efficient attribution across large datasets and result in high-quality explanations, often outperforming state-of-the-art methods.

8.
Nat Commun ; 13(1): 4904, 2022 08 20.
Article in English | MEDLINE | ID: mdl-35987925

ABSTRACT

DNA has emerged as a powerful substrate for programming information processing machines at the nanoscale. Among the DNA computing primitives used today, DNA strand displacement (DSD) is arguably the most popular, with DSD-based circuit applications ranging from disease diagnostics to molecular artificial neural networks. The outputs of DSD circuits are generally read using fluorescence spectroscopy. However, due to the spectral overlap of typical small-molecule fluorescent reporters, the number of unique outputs that can be detected in parallel is limited, requiring complex optical setups or spatial isolation of reactions to make output bandwidths scalable. Here, we present a multiplexable sequencing-free readout method that enables real-time, kinetic measurement of DSD circuit activity through highly parallel, direct detection of barcoded output strands using nanopore sensor array technology (Oxford Nanopore Technologies' MinION device). These results increase DSD output bandwidth by an order of magnitude over what is currently feasible with fluorescence spectroscopy.


Subjects
Nanopores , DNA , High-Throughput Nucleotide Sequencing/methods , Recombination, Genetic , Sequence Analysis, DNA/methods
9.
Elife ; 11, 2022 03 21.
Article in English | MEDLINE | ID: mdl-35312478

ABSTRACT

Division of labor between cells is ubiquitous in biology but the use of multicellular consortia for engineering applications is only beginning to be explored. A significant advantage of multicellular circuits is their potential to be modular with respect to composition but this claim has not yet been extensively tested using experiments and quantitative modeling. Here, we construct a library of 24 yeast strains capable of sending, receiving or responding to three molecular signals, characterize them experimentally and build quantitative models of their input-output relationships. We then compose these strains into two- and three-strain cascades as well as a four-strain bistable switch and show that experimentally measured consortia dynamics can be predicted from the models of the constituent parts. To further explore the achievable range of behaviors, we perform a fully automated computational search over all two-, three-, and four-strain consortia to identify combinations that realize target behaviors including logic gates, band-pass filters, and time pulses. Strain combinations that are predicted to map onto a target behavior are further computationally optimized and then experimentally tested. Experiments closely track computational predictions. The high reliability of these model descriptions further strengthens the feasibility and highlights the potential for distributed computing in synthetic biology.


Subjects
Saccharomyces cerevisiae , Synthetic Biology , Gene Library , Logic , Reproducibility of Results , Saccharomyces cerevisiae/genetics , Synthetic Biology/methods
10.
Acc Chem Res ; 55(1): 24-34, 2022 01 04.
Article in English | MEDLINE | ID: mdl-34905691

ABSTRACT

Over just the last 2 years, mRNA therapeutics and vaccines have undergone a rapid transition from an intriguing concept to real-world impact. However, whereas some aspects of mRNA therapeutics, such as the use of chemical modifications to increase stability and reduce immunogenicity, have been extensively optimized for over two decades, other aspects, particularly the selection and design of the noncoding leader and trailer sequences which control translation efficiency and stability, have received comparably less attention. In practice, such 5' and 3' untranslated regions (UTRs) are often borrowed from highly expressed human genes with few or no modifications, as in the case for the Pfizer/BioNTech Covid vaccine. Focusing on the 5'UTR, we here argue that model-driven design is a promising alternative that provides unprecedented control over 5'UTR function. We review recent work that combines synthetic biology with machine learning to build quantitative models that relate ribosome loading, and thus translation efficiency, to the 5'UTR sequence. We first introduce an experimental approach that uses polysome profiling and high-throughput sequencing to quantify ribosome loading for hundreds of thousands of 5'UTRs in parallel. We apply this approach to measure ribosome loading in synthetic RNA libraries with a random sequence inserted into the 5'UTR. We then review Optimus 5-Prime, a convolutional neural network model trained on the experimental data. We highlight that very accurate models of biological regulation can be learned from synthetic data sets with degenerate 5'UTRs. We validate model predictions not only on held-out data sets from our random library but also on a large library of over 30 000 human 5'UTR fragments and using translation reporter data collected independently by other groups. Both the experiment and model are compatible with commonly used chemically modified nucleosides, in particular, pseudouridine (Ψ) and 1-methyl-pseudouridine (m1Ψ). 
We find that, in general, 5'UTRs have very similar impacts when combined with different protein-coding sequences and even in the context of different chemical modifications. We demonstrate that Optimus 5-Prime can be combined with design algorithms to generate de novo sequences with precisely defined translation efficiencies. We emphasize recent developments in design algorithms that rely on activation maximization and generative modeling to improve both the fitness and diversity of designed sequences. Compared with prior approaches such as genetic algorithms, we show that these approaches are not only faster but also less likely to get stuck in local sequence optima. Finally, we discuss how the approach reviewed here can be generalized to other gene regions and applications.


Subjects
COVID-19 , Protein Biosynthesis , COVID-19 Vaccines , Humans , Machine Learning , RNA, Messenger/genetics , RNA, Messenger/metabolism , SARS-CoV-2
11.
Bioinformatics ; 38(5): 1393-1402, 2022 02 07.
Article in English | MEDLINE | ID: mdl-34893819

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) is widely used for analyzing gene expression in multi-cellular systems and provides unprecedented access to cellular heterogeneity. scRNA-seq experiments aim to identify and quantify all cell types present in a sample. Measured single-cell transcriptomes are grouped by similarity and the resulting clusters are mapped to cell types based on cluster-specific gene expression patterns. While the process of generating clusters has become largely automated, annotation remains a laborious ad hoc effort that requires expert biological knowledge. RESULTS: Here, we introduce CellMeSH, a new automated approach to identifying cell types for clusters based on prior literature. CellMeSH combines a database of gene-cell-type associations with a probabilistic method for database querying. The database is constructed by automatically linking gene and cell-type information from millions of publications using existing indexed literature resources. Compared to manually constructed databases, CellMeSH is more comprehensive and is easily updated with new data. The probabilistic query method enables reliable information retrieval even though the gene-cell-type associations extracted from the literature are noisy. CellMeSH is also able to optionally utilize prior knowledge about tissues or cells for further annotation improvement. CellMeSH achieves top-one and top-three accuracies on a number of mouse and human datasets that are consistently better than existing approaches. AVAILABILITY AND IMPLEMENTATION: Web server at https://uncurl.cs.washington.edu/db_query and API at https://github.com/shunfumao/cellmesh. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Software , Humans , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
12.
BMC Bioinformatics ; 22(1): 510, 2021 Oct 20.
Article in English | MEDLINE | ID: mdl-34670493

ABSTRACT

BACKGROUND: Optimization of DNA and protein sequences based on Machine Learning models is becoming a powerful tool for molecular design. Activation maximization offers a simple design strategy for differentiable models: one-hot coded sequences are first approximated by a continuous representation, which is then iteratively optimized with respect to the predictor oracle by gradient ascent. While elegant, the current version of the method suffers from vanishing gradients and may cause predictor pathologies leading to poor convergence. RESULTS: Here, we introduce Fast SeqProp, an improved activation maximization method that combines straight-through approximation with normalization across the parameters of the input sequence distribution. Fast SeqProp overcomes bottlenecks in earlier methods arising from input parameters becoming skewed during optimization. Compared to prior methods, Fast SeqProp results in up to 100-fold faster convergence while also finding improved fitness optima for many applications. We demonstrate Fast SeqProp's capabilities by designing DNA and protein sequences for six deep learning predictors, including a protein structure predictor. CONCLUSIONS: Fast SeqProp offers a reliable and efficient method for general-purpose sequence optimization through a differentiable fitness predictor. As demonstrated on a variety of deep learning models, the method is widely applicable, and can incorporate various regularization techniques to maintain confidence in the sequence designs. As a design tool, Fast SeqProp may aid in the development of novel molecules, drug therapies and vaccines.


Subjects
Algorithms , Machine Learning , Amino Acid Sequence
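
The activation-maximization loop described in the Fast SeqProp record above can be sketched in a few lines of NumPy. This is a deliberately simplified illustration and not the published method: the oracle is a hypothetical linear scorer (`toy_predictor`), the straight-through step simply reuses the gradient with respect to the one-hot input as the logit gradient, and the normalization rescales the logits after each update instead of using learned scale and offset parameters. All function names, the target string, and the hyperparameters are assumptions.

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot_argmax(logits):
    """Forward pass: discretize an (L x 4) logit matrix into a one-hot sequence."""
    x = np.zeros_like(logits)
    x[np.arange(logits.shape[0]), logits.argmax(axis=1)] = 1.0
    return x

def normalize(logits, eps=1e-8):
    """Rescale all logits to zero mean and unit variance."""
    return (logits - logits.mean()) / (logits.std() + eps)

def toy_predictor(x, weights):
    """Hypothetical linear oracle: the score is the sum of matched weights."""
    return float((weights * x).sum())

def toy_predictor_grad(x, weights):
    """Gradient of the toy score w.r.t. the one-hot input (constant for a linear oracle)."""
    return weights

def design(weights, steps=100, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    logits = normalize(rng.normal(size=weights.shape))
    for _ in range(steps):
        x = one_hot_argmax(logits)
        grad_x = toy_predictor_grad(x, weights)
        # Straight-through step: treat the gradient w.r.t. the one-hot input
        # as if it were the gradient w.r.t. the logits, then re-normalize.
        logits = normalize(logits + lr * grad_x)
    return one_hot_argmax(logits)

# Toy oracle that rewards one made-up target sequence.
target = "GATTACA"
weights = np.zeros((len(target), 4))
for i, base in enumerate(target):
    weights[i, ALPHABET.index(base)] = 1.0

best = design(weights)
designed = "".join(ALPHABET[j] for j in best.argmax(axis=1))
print(designed, toy_predictor(best, weights))
```

Keeping the logits at unit variance prevents them from growing without bound during optimization, which is the intuition behind the normalization the abstract mentions; a real predictor would be a neural network whose gradient depends on the current sequence.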
13.
Nat Commun ; 12(1): 4764, 2021 08 06.
Article in English | MEDLINE | ID: mdl-34362913

ABSTRACT

As global demand for digital storage capacity grows, storage technologies based on synthetic DNA have emerged as a dense and durable alternative to traditional media. Existing approaches leverage robust error correcting codes and precise molecular mechanisms to reliably retrieve specific files from large databases. Typically, files are retrieved using a pre-specified key, analogous to a filename. However, these approaches lack the ability to perform more complex computations over the stored data, such as similarity search: e.g., finding images that look similar to an image of interest without prior knowledge of their file names. Here we demonstrate a technique for executing similarity search over a DNA-based database of 1.6 million images. Queries are implemented as hybridization probes, and a key step in our approach was to learn an image-to-sequence encoding ensuring that queries preferentially bind to targets representing visually similar images. Experimental results show that our molecular implementation performs comparably to state-of-the-art in silico algorithms for similarity search.


Subjects
Computational Biology/methods , DNA/chemistry , Databases, Genetic , Information Storage and Retrieval , Algorithms , Base Sequence , Computer Simulation , DNA/genetics , DNA Probes , Databases, Factual , Neural Networks, Computer
14.
Sci Rep ; 11(1): 15845, 2021 08 04.
Article in English | MEDLINE | ID: mdl-34349150

ABSTRACT

We performed a comprehensive analysis of the transcriptional changes occurring during human induced pluripotent stem cell (hiPSC) differentiation to cardiomyocytes. Using single cell RNA-seq, we sequenced > 20,000 single cells from 55 independent samples representing two differentiation protocols and multiple hiPSC lines. Samples included experimental replicates ranging from undifferentiated hiPSCs to mixed populations of cells at D90 post-differentiation. Differentiated cell populations clustered by time point, with differential expression analysis revealing markers of cardiomyocyte differentiation and maturation changing from D12 to D90. We next performed a complementary cluster-independent sparse regression analysis to identify and rank genes that best assigned cells to differentiation time points. The two highest ranked genes between D12 and D24 (MYH7 and MYH6) resulted in an accuracy of 0.84, and the three highest ranked genes between D24 and D90 (A2M, H19, IGF2) resulted in an accuracy of 0.94, revealing that low dimensional gene features can identify differentiation or maturation stages in differentiating cardiomyocytes. Expression levels of select genes were validated using RNA FISH. Finally, we interrogated differences in cardiac gene expression resulting from two differentiation protocols, experimental replicates, and three hiPSC lines in the WTC-11 background to identify sources of variation across these experimental variables.


Subjects
Biomarkers/metabolism , Cell Differentiation , Gene Expression Regulation , Induced Pluripotent Stem Cells/metabolism , Myocytes, Cardiac/cytology , Myocytes, Cardiac/metabolism , Transcriptome , Humans , Induced Pluripotent Stem Cells/cytology , RNA-Seq
15.
Nat Neurosci ; 24(8): 1163-1175, 2021 08.
Article in English | MEDLINE | ID: mdl-34140698

ABSTRACT

The human neonatal cerebellum is one-fourth of its adult size yet contains the blueprint required to integrate environmental cues with developing motor, cognitive and emotional skills into adulthood. Although mature cerebellar neuroanatomy is well studied, understanding of its developmental origins is limited. In this study, we systematically mapped the molecular, cellular and spatial composition of human fetal cerebellum by combining laser capture microscopy and SPLiT-seq single-nucleus transcriptomics. We profiled functionally distinct regions and gene expression dynamics within cell types and across development. The resulting cell atlas demonstrates that the molecular organization of the cerebellar anlage recapitulates cytoarchitecturally distinct regions and developmentally transient cell types that are distinct from the mouse cerebellum. By mapping genes dominant for pediatric and adult neurological disorders onto our dataset, we identify relevant cell types underlying disease mechanisms. These data provide a resource for probing the cellular basis of human cerebellar development and disease.


Subjects
Cerebellum/embryology , Neurogenesis , Fetus , Humans , Laser Capture Microdissection , Single-Cell Analysis , Transcriptome
16.
Science ; 371(6531), 2021 02 19.
Article in English | MEDLINE | ID: mdl-33335020

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) has become an essential tool for characterizing gene expression in eukaryotes, but current methods are incompatible with bacteria. Here, we introduce microSPLiT (microbial split-pool ligation transcriptomics), a high-throughput scRNA-seq method for Gram-negative and Gram-positive bacteria that can resolve heterogeneous transcriptional states. We applied microSPLiT to >25,000 Bacillus subtilis cells sampled at different growth stages, creating an atlas of changes in metabolism and lifestyle. We retrieved detailed gene expression profiles associated with known, but rare, states such as competence and prophage induction and also identified unexpected gene expression states, including the heterogeneous activation of a niche metabolic pathway in a subpopulation of cells. MicroSPLiT paves the way to high-throughput analysis of gene expression in bacterial communities that are otherwise not amenable to single-cell analysis, such as natural microbiota.


Subjects
Bacillus subtilis/genetics , Gene Expression Regulation, Bacterial , Metabolic Networks and Pathways/genetics , RNA-Seq/methods , Single-Cell Analysis/methods , Anti-Bacterial Agents/biosynthesis , Bacillus Phages/physiology , Bacillus subtilis/growth & development , Bacillus subtilis/metabolism , Carbon/metabolism , Culture Media , Escherichia coli/genetics , Fermentation/genetics , Gluconeogenesis/genetics , Glycolysis/genetics , Heat-Shock Response/genetics , Inositol/metabolism , Ion Transport , Metals/metabolism , Movement , Operon , RNA, Bacterial/genetics , Stress, Physiological , Transcription, Genetic , Transcriptome , Virus Activation
17.
Nat Commun ; 11(1): 3264, 2020 06 29.
Article in English | MEDLINE | ID: mdl-32601272

ABSTRACT

DNA has recently emerged as an attractive medium for archival data storage. Recent work has demonstrated proof-of-principle prototype systems; however, very uneven (biased) sequencing coverage has been reported, which indicates inefficiencies in the storage process. Deviations from the average coverage in the sequence copy distribution can either cause wasteful provisioning in sequencing or excessive number of missing sequences. Here, we use millions of unique sequences from a DNA-based digital data archival system to study the oligonucleotide copy unevenness problem and show that the two paramount sources of bias are the synthesis and amplification (PCR) processes. Based on these findings, we develop a statistical model for each molecular process as well as the overall process. We further use our model to explore the trade-offs between synthesis bias, storage physical density, logical redundancy, and sequencing redundancy, providing insights for engineering efficient, robust DNA data storage systems.


Subjects
Information Storage and Retrieval , Sequence Analysis, DNA , Bias , Models, Theoretical , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/statistics & numerical data
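
Why uneven copy numbers lead to either over-provisioned sequencing or missing sequences, as the DNA storage record above describes, can be seen in a small calculation. The sketch assumes reads per sequence follow a Poisson distribution whose mean is proportional to the sequence's relative abundance; that sampling model, the function name, and the example pools are assumptions for illustration and are not the statistical model developed in the paper.

```python
import math

def expected_missing_fraction(relative_abundances, mean_coverage):
    """Expected fraction of sequences that receive zero reads.

    Assumes Poisson sampling with a per-sequence mean proportional to abundance.
    """
    n = len(relative_abundances)
    total = sum(relative_abundances)
    missing = 0.0
    for w in relative_abundances:
        # Expected reads for this sequence, scaled so the pool average
        # equals mean_coverage.
        lam = mean_coverage * w * n / total
        missing += math.exp(-lam)  # Poisson probability of zero reads
    return missing / n

uniform_pool = [1.0] * 1000
# Hypothetical skew: half of the sequences are 4x less abundant than the rest.
skewed_pool = [0.4] * 500 + [1.6] * 500

uniform_missing = expected_missing_fraction(uniform_pool, mean_coverage=5)
skewed_missing = expected_missing_fraction(skewed_pool, mean_coverage=5)
print(uniform_missing, skewed_missing)
```

Under this model the skewed pool loses roughly ten times more sequences at the same average coverage, so recovering them requires either deeper sequencing or more logical redundancy.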
18.
Cell Syst ; 11(1): 49-62.e16, 2020 07 22.
Article in English | MEDLINE | ID: mdl-32711843

ABSTRACT

Engineering gene and protein sequences with defined functional properties is a major goal of synthetic biology. Deep neural network models, together with gradient ascent-style optimization, show promise for sequence design. The generated sequences can however get stuck in local minima and often have low diversity. Here, we develop deep exploration networks (DENs), a class of activation-maximizing generative models, which minimize the cost of a neural network fitness predictor by gradient descent. By penalizing any two generated patterns on the basis of a similarity metric, DENs explicitly maximize sequence diversity. To avoid drifting into low-confidence regions of the predictor, we incorporate variational autoencoders to maintain the likelihood ratio of generated sequences. Using DENs, we engineered polyadenylation signals with more than 10-fold higher selection odds than the best gradient ascent-generated patterns, identified splice regulatory sequences predicted to result in highly differential splicing between cell lines, and improved on state-of-the-art results for protein design tasks.


Subjects
DNA/genetics , Neural Networks, Computer , Sequence Analysis, Protein/methods , Humans
19.
Soft Matter ; 16(14): 3555-3563, 2020 Apr 14.
Article in English | MEDLINE | ID: mdl-32219296

ABSTRACT

Biology offers compelling proof that macroscopic "living materials" can emerge from reactions between diffusing biomolecules. Here, we show that molecular self-organization could be a similarly powerful approach for engineering functional synthetic materials. We introduce a programmable DNA embedded hydrogel that produces tunable patterns at the centimeter length scale. We generate these patterns by implementing chemical reaction networks through synthetic DNA complexes, embedding the complexes in the hydrogel, and triggering with locally applied input DNA strands. We first demonstrate ring pattern formation around a circular input cavity and show that the ring width and intensity can be predictably tuned. Then, we create patterns of increasing complexity, including concentric rings and non-isotropic patterns. Finally, we show "destructive" and "constructive" interference patterns, by combining several ring-forming modules in the gel and triggering them from multiple sources. We further show that computer simulations based on the reaction-diffusion model can predict and inform the programming of target patterns.


Subjects
Computer Simulation , DNA/chemistry , Hydrogels/chemistry , Models, Chemical
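
A reaction-diffusion simulation of the kind mentioned in the hydrogel record above can be sketched in one dimension. The sketch assumes a single bimolecular reaction, input + gate -> output, with a diffusing input strand and immobile gate and output species, integrated with an explicit finite-difference scheme; the species, rate constant, grid, and time step are all hypothetical and far simpler than the chemical reaction networks in the paper.

```python
import numpy as np

def simulate(n=101, steps=2000, dx=1.0, dt=0.1, diff=1.0, k=0.05):
    """1D reaction-diffusion sketch: input diffuses and reacts, input + gate -> output."""
    inp = np.zeros(n)
    inp[n // 2] = 10.0   # input strand loaded at the center of the gel
    gate = np.ones(n)    # immobile gate complex, uniform at the start
    out = np.zeros(n)    # immobile output (the visible pattern)
    for _ in range(steps):
        # Edge padding mirrors the boundary values, giving no-flux boundaries.
        padded = np.pad(inp, 1, mode="edge")
        lap = (padded[:-2] - 2 * inp + padded[2:]) / dx**2
        rate = k * inp * gate  # mass-action rate of input + gate -> output
        inp = inp + dt * (diff * lap - rate)
        gate = gate - dt * rate
        out = out + dt * rate
    return inp, gate, out

inp, gate, out = simulate()
print(out.argmax(), round(float(out.max()), 3))
```

The explicit scheme is stable here because `diff * dt / dx**2` is 0.1, below the 0.5 limit for one-dimensional diffusion; a larger time step or diffusion coefficient would need an implicit solver or a finer step.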
20.
Am J Hum Genet ; 105(3): 606-615, 2019 09 05.
Article in English | MEDLINE | ID: mdl-31474318

ABSTRACT

Cerebellar malformations are diverse congenital anomalies frequently associated with developmental disability. Although genetic and prenatal non-genetic causes have been described, no systematic analysis has been performed. Here, we present a large exome sequencing study of Dandy-Walker malformation (DWM) and cerebellar hypoplasia (CBLH). We performed exome sequencing in 282 individuals from 100 families with DWM or CBLH, and we established a molecular diagnosis in 36 of 100 families, with a significantly higher yield for CBLH (51%) than for DWM (16%). The 41 variants impact 27 neurodevelopmental-disorder-associated genes, thus demonstrating that CBLH and DWM are often features of monogenic neurodevelopmental disorders. Though only seven monogenic causes (19%) were identified in more than one individual, neuroimaging review of 131 additional individuals confirmed cerebellar abnormalities in 23 of 27 genetic disorders (85%). Prenatal risk factors were frequently found among individuals without a genetic diagnosis (30 of 64 individuals [47%]). Single-cell RNA sequencing of prenatal human cerebellar tissue revealed gene enrichment in neuronal and vascular cell types; this suggests that defective vasculogenesis may disrupt cerebellar development. Further, de novo gain-of-function variants in PDGFRB, a tyrosine kinase receptor essential for vascular progenitor signaling, were associated with CBLH, and this discovery links genetic and non-genetic etiologies. Our results suggest that genetic defects impact specific cerebellar cell types and implicate abnormal vascular development as a mechanism for cerebellar malformations. We also confirmed a major contribution for non-genetic prenatal factors in individuals with cerebellar abnormalities, substantially influencing diagnostic evaluation and counseling regarding recurrence risk and prognosis.


Subjects
Cerebellum/abnormalities , Cerebellum/diagnostic imaging , Cohort Studies , Female , Humans , Male , Pregnancy