Results 1 - 20 of 57
1.
bioRxiv ; 2024 Jun 14.
Article in English | MEDLINE | ID: mdl-38915713

ABSTRACT

An important and largely unsolved problem in synthetic biology is how to target gene expression to specific cell types. Here, we apply iterative deep learning to design synthetic enhancers with strong differential activity between two human cell lines. We initially train models on published datasets of enhancer activity and chromatin accessibility and use them to guide the design of synthetic enhancers that maximize predicted specificity. We experimentally validate these sequences, use the measurements to re-optimize the predictor, and design a second generation of enhancers with improved specificity. Our design methods embed relevant transcription factor binding site (TFBS) motifs with higher frequencies than comparable endogenous enhancers while using a more selective motif vocabulary, and we show that enhancer activity is correlated with transcription factor expression at the single cell level. Finally, we characterize causal features of top enhancers via perturbation experiments and show enhancers as short as 50bp can maintain specificity.

2.
Nat Protoc ; 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38886529

ABSTRACT

Microbial split-pool ligation transcriptomics (microSPLiT) is a high-throughput single-cell RNA sequencing method for bacteria. With four combinatorial barcoding rounds, microSPLiT can profile transcriptional states in hundreds of thousands of Gram-negative and Gram-positive bacteria in a single experiment without specialized equipment. As bacterial samples are fixed and permeabilized before barcoding, they can be collected and stored ahead of time. During the first barcoding round, the fixed and permeabilized bacteria are distributed into a 96-well plate, where their transcripts are reverse transcribed into cDNA and labeled with the first well-specific barcode inside the cells. The cells are mixed and redistributed two more times into new 96-well plates, where the second and third barcodes are appended to the cDNA via in-cell ligation reactions. Finally, the cells are mixed and divided into aliquot sub-libraries, which can be stored until future use or prepared for sequencing with the addition of a fourth barcode. It takes 4 days to generate sequencing-ready libraries, including 1 day for collection and overnight fixation of samples. The standard plate setup enables single-cell transcriptional profiling of up to 1 million bacterial cells and up to 96 samples in a single barcoding experiment, with the possibility of expansion by adding barcoding rounds. The protocol requires experience in basic molecular biology techniques, handling of bacterial samples and preparation of DNA libraries for next-generation sequencing. It can be performed by experienced undergraduate or graduate students. Data analysis requires access to computing resources, familiarity with Unix command line and basic experience with Python or R.
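
The barcoding arithmetic described above can be sketched as follows. This is a minimal illustration, not part of the published protocol: the function names, the uniform-assignment assumption, and the number of sub-libraries (8) are hypothetical choices made only for the example.

```python
def barcode_space(wells_per_round: int, rounds: int, sublibraries: int = 1) -> int:
    """Number of distinct barcode combinations available to label cells."""
    return wells_per_round ** rounds * sublibraries


def expected_collision_fraction(n_cells: int, n_barcodes: int) -> float:
    """Expected fraction of cells that share their barcode combination with
    at least one other cell, assuming combinations are assigned uniformly."""
    if n_cells < 1:
        return 0.0
    return 1.0 - (1.0 - 1.0 / n_barcodes) ** (n_cells - 1)


# Three plate-based rounds of 96 wells each, as described in the abstract.
three_round_space = barcode_space(96, 3)

# HYPOTHETICAL: 8 sub-libraries, each receiving its own fourth barcode.
total_space = barcode_space(96, 3, sublibraries=8)

for cells in (10_000, 100_000, 1_000_000):
    frac = expected_collision_fraction(cells, total_space)
    print(f"{cells:>9} cells: ~{frac:.1%} expected to share a barcode")
```

Under these assumptions, the share of cells with a non-unique barcode grows with the number of cells loaded, which is one way to see why adding barcoding rounds expands capacity.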

3.
Nat Commun ; 15(1): 5284, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38902240

ABSTRACT

mRNA therapeutics are revolutionizing the pharmaceutical industry, but methods to optimize the primary sequence for increased expression are still lacking. Here, we design 5'UTRs for efficient mRNA translation using deep learning. We perform polysome profiling of fully or partially randomized 5'UTR libraries in three cell types and find that UTR performance is highly correlated across cell types. We train models on our datasets and use them to guide the design of high-performing 5'UTRs using gradient descent and generative neural networks. We experimentally test designed 5'UTRs with mRNA encoding megaTAL™ gene editing enzymes for two different gene targets and in two different cell lines. We find that the designed 5'UTRs support strong gene editing activity. Editing efficiency is correlated between cell types and gene targets, although the best performing UTR was specific to one cargo and cell type. Our results highlight the potential of model-based sequence design for mRNA therapeutics.


Subject(s)
5' Untranslated Regions , Deep Learning , Gene Editing , RNA, Messenger , RNA, Messenger/genetics , RNA, Messenger/metabolism , 5' Untranslated Regions/genetics , Humans , Gene Editing/methods , Polyribosomes/metabolism , Cell Line , HEK293 Cells , Protein Biosynthesis
4.
bioRxiv ; 2023 Nov 23.
Article in English | MEDLINE | ID: mdl-38045294

ABSTRACT

The 5' UTRs of mRNAs are critical for translation regulation, but their in vivo regulatory features are poorly characterized. Here, we report the regulatory landscape of 5' UTRs during early zebrafish embryogenesis using a massively parallel reporter assay of 18,154 sequences coupled to polysome profiling. We found that the 5' UTR is sufficient to confer temporal dynamics to translation initiation, and identified 86 motifs enriched in 5' UTRs with distinct ribosome recruitment capabilities. A quantitative deep learning model, DaniO5P, revealed a combined role for 5' UTR length, translation initiation site context, upstream AUGs and sequence motifs on in vivo ribosome recruitment. DaniO5P predicts the activities of 5' UTR isoforms and indicates that modulating 5' UTR length and motif grammar contributes to translation initiation dynamics. This study provides a first quantitative model of 5' UTR-based translation regulation in early vertebrate development and lays the foundation for identifying the underlying molecular effectors.

5.
Nat Struct Mol Biol ; 30(12): 1947-1957, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38087090

ABSTRACT

JTE-607 is an anticancer and anti-inflammatory compound and its active form, compound 2, directly binds to and inhibits CPSF73, the endonuclease for the cleavage step in pre-messenger RNA (pre-mRNA) 3' processing. Surprisingly, compound 2-mediated inhibition of pre-mRNA cleavage is sequence specific and the drug sensitivity is predominantly determined by sequences flanking the cleavage site (CS). Using massively parallel in vitro assays, we identified key sequence features that determine drug sensitivity. We trained a machine learning model that can predict poly(A) site (PAS) relative sensitivity to compound 2 and provide the molecular basis for understanding the impact of JTE-607 on PAS selection and transcription termination genome wide. We propose that CPSF73 and associated factors bind to the CS region in a sequence-dependent manner and the interaction affinity determines compound 2 sensitivity. These results have not only elucidated the mechanism of action of JTE-607, but also unveiled an evolutionarily conserved sequence specificity of the mRNA 3' processing machinery.


Subject(s)
RNA Precursors , RNA Processing, Post-Transcriptional , Cell Line , RNA Precursors/genetics , RNA Precursors/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism
6.
Nat Nanotechnol ; 18(8): 912-921, 2023 08.
Article in English | MEDLINE | ID: mdl-37142708

ABSTRACT

DNA has emerged as an attractive medium for archival data storage due to its durability and high information density. Scalable parallel random access to information is a desirable property of any storage system. For DNA-based storage systems, however, this still needs to be robustly established. Here we report on a thermoconfined polymerase chain reaction, which enables multiplexed, repeated random access to compartmentalized DNA files. The strategy is based on localizing biotin-functionalized oligonucleotides inside thermoresponsive, semipermeable microcapsules. At low temperatures, microcapsules are permeable to enzymes, primers and amplified products, whereas at high temperatures, membrane collapse prevents molecular crosstalk during amplification. Our data show that the platform outperforms non-compartmentalized DNA storage compared with repeated random access and reduces amplification bias tenfold during multiplex polymerase chain reaction. Using fluorescent sorting, we also demonstrate sample pooling and data retrieval by microcapsule barcoding. Therefore, the thermoresponsive microcapsule technology offers a scalable, sequence-agnostic approach for repeated random access to archival DNA files.


Subject(s)
DNA , Information Storage and Retrieval , Capsules , DNA/genetics , Oligonucleotides , High-Throughput Nucleotide Sequencing
7.
bioRxiv ; 2023 Apr 11.
Article in English | MEDLINE | ID: mdl-37090613

ABSTRACT

JTE-607 is a small molecule compound with anti-inflammation and anti-cancer activities. Upon entering the cell, it is hydrolyzed to Compound 2, which directly binds to and inhibits CPSF73, the endonuclease for the cleavage step in pre-mRNA 3' processing. Although CPSF73 is universally required for mRNA 3' end formation, we have unexpectedly found that Compound 2-mediated inhibition of pre-mRNA 3' processing is sequence-specific and that the sequences flanking the cleavage site (CS) are a major determinant for drug sensitivity. By using massively parallel in vitro assays, we have measured the Compound 2 sensitivities of over 260,000 sequence variants and identified key sequence features that determine drug sensitivity. A machine learning model trained on these data can predict the impact of JTE-607 on poly(A) site (PAS) selection and transcription termination genome-wide. We propose a biochemical model in which CPSF73 and other mRNA 3' processing factors bind to RNA of the CS region in a sequence-specific manner and the affinity of such interaction determines the Compound 2 sensitivity of a PAS. As the Compound 2-resistant CS sequences, characterized by U/A-rich motifs, are prevalent in PASs from yeast to human, the CS region sequence may have more fundamental functions beyond determining drug resistance. Together, our study not only characterized the mechanism of action of a compound with clinical implications, but also revealed a previously unknown and evolutionarily conserved sequence-specificity of the mRNA 3' processing machinery.

8.
bioRxiv ; 2023 Aug 16.
Article in English | MEDLINE | ID: mdl-36798377

ABSTRACT

Protein-protein interactions (PPIs) regulate many cellular processes, and engineered PPIs have cell and gene therapy applications. Here we introduce massively parallel protein-protein interaction measurement by sequencing (MP3-seq), an easy-to-use and highly scalable yeast-two-hybrid approach for measuring PPIs. In MP3-seq, DNA barcodes are associated with specific protein pairs, and barcode enrichment can be read by sequencing to provide a direct measure of interaction strength. We show that MP3-seq is highly quantitative and scales to over 100,000 interactions. We apply MP3-seq to characterize interactions between families of rationally designed heterodimers and to investigate elements conferring specificity to coiled-coil interactions. Finally, we predict coiled heterodimer structures using AlphaFold-Multimer (AF-M) and train linear models on physics simulation energy terms to predict MP3-seq values. We find that AF-M and AF-M complex prediction-based models could be valuable for pre-screening interactions, but that measuring interactions experimentally remains necessary to rank their strengths quantitatively.
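
The idea of reading interaction strength from barcode enrichment can be sketched as a log2 fold change in barcode frequency before and after selection. The abstract does not describe the actual MP3-seq analysis pipeline, so the function name, the pseudocount, and the read counts below are all hypothetical and serve only to illustrate the principle.

```python
import math


def log2_enrichment(pre_counts, post_counts, pseudocount=1.0):
    """Per-barcode log2 fold change in relative frequency after selection."""
    barcodes = sorted(set(pre_counts) | set(post_counts))
    n = len(barcodes)
    # Totals include the pseudocount so that frequencies still sum to 1.
    pre_total = sum(pre_counts.get(b, 0) for b in barcodes) + pseudocount * n
    post_total = sum(post_counts.get(b, 0) for b in barcodes) + pseudocount * n
    scores = {}
    for b in barcodes:
        pre_freq = (pre_counts.get(b, 0) + pseudocount) / pre_total
        post_freq = (post_counts.get(b, 0) + pseudocount) / post_total
        scores[b] = math.log2(post_freq / pre_freq)
    return scores


# HYPOTHETICAL read counts for three protein-pair barcodes.
pre = {"pairA": 100, "pairB": 100, "pairC": 100}
post = {"pairA": 400, "pairB": 100, "pairC": 25}
scores = log2_enrichment(pre, post)
for pair, score in scores.items():
    print(pair, round(score, 2))
```

In this toy setup, a barcode that gains share during selection scores above zero and one that loses share scores below zero; the pseudocount keeps barcodes with zero reads from producing undefined values.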

9.
Genome Biol ; 23(1): 232, 2022 11 05.
Article in English | MEDLINE | ID: mdl-36335397

ABSTRACT

BACKGROUND: 3'-end processing by cleavage and polyadenylation is an important and finely tuned regulatory process during mRNA maturation. Numerous genetic variants are known to cause or contribute to human disorders by disrupting the cis-regulatory code of polyadenylation signals. Yet, due to the complexity of this code, variant interpretation remains challenging. RESULTS: We introduce a residual neural network model, APARENT2, that can infer 3'-cleavage and polyadenylation from DNA sequence more accurately than any previous model. This model generalizes to the case of alternative polyadenylation (APA) for a variable number of polyadenylation signals. We demonstrate APARENT2's performance on several variant datasets, including functional reporter data and human 3' aQTLs from GTEx. We apply neural network interpretation methods to gain insights into disrupted or protective higher-order features of polyadenylation. We fine-tune APARENT2 on human tissue-resolved transcriptomic data to elucidate tissue-specific variant effects. By combining APARENT2 with models of mRNA stability, we extend aQTL effect size predictions to the entire 3' untranslated region. Finally, we perform in silico saturation mutagenesis of all human polyadenylation signals and compare the predicted effects of [Formula: see text] million variants against gnomAD. While loss-of-function variants were generally selected against, we also find specific clinical conditions linked to gain-of-function mutations. For example, we detect an association between gain-of-function mutations in the 3'-end and autism spectrum disorder. To experimentally validate APARENT2's predictions, we assayed clinically relevant variants in multiple cell lines, including microglia-derived cells. 
CONCLUSIONS: A sequence-to-function model based on deep residual learning enables accurate functional interpretation of genetic variants in polyadenylation signals and, when coupled with large human variation databases, elucidates the link between functional 3'-end mutations and human health.
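
In silico saturation mutagenesis, as mentioned in the results above, amounts to enumerating every single-nucleotide substitution of a sequence and scoring each one with a model. The sketch below covers only the enumeration step; the scoring model (APARENT2 in the abstract) is not reproduced here, and the function name is an illustrative assumption.

```python
def saturation_mutagenesis(seq: str, alphabet: str = "ACGT"):
    """Yield (position, ref, alt, mutated_sequence) for every
    single-nucleotide substitution of seq."""
    seq = seq.upper()
    for pos, ref in enumerate(seq):
        for alt in alphabet:
            if alt != ref:
                yield pos, ref, alt, seq[:pos] + alt + seq[pos + 1:]


# Toy input: the canonical AATAAA polyadenylation signal hexamer.
variants = list(saturation_mutagenesis("AATAAA"))
print(len(variants))  # 6 positions x 3 alternative bases
for pos, ref, alt, mutated in variants[:3]:
    print(pos, ref, alt, mutated)
```

Each yielded sequence would then be passed to a predictor and compared with the reference prediction to estimate the variant effect.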


Subject(s)
Autism Spectrum Disorder , Polyadenylation , Humans , Autism Spectrum Disorder/genetics , RNA Stability/genetics , Transcriptome , Genetic Variation , 3' Untranslated Regions
10.
Nat Mach Intell ; 4(1): 41-54, 2022 Jan.
Article in English | MEDLINE | ID: mdl-35966405

ABSTRACT

Sequence-based neural networks can learn to make accurate predictions from large biological datasets, but model interpretation remains challenging. Many existing feature attribution methods are optimized for continuous rather than discrete input patterns and assess individual feature importance in isolation, making them ill-suited for interpreting non-linear interactions in molecular sequences. Building on work in computer vision and natural language processing, we developed an approach based on deep learning - Scrambler networks - wherein the most salient sequence positions are identified with learned input masks. Scramblers learn to predict Position-Specific Scoring Matrices (PSSMs) where unimportant nucleotides or residues are scrambled by raising their entropy. We apply Scramblers to interpret the effects of genetic variants, uncover non-linear interactions between cis-regulatory elements, explain binding specificity for protein-protein interactions, and identify structural determinants of de novo designed proteins. We show that Scramblers enable efficient attribution across large datasets and result in high-quality explanations, often outperforming state-of-the-art methods.
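
The notion of scrambling unimportant positions by raising their entropy can be illustrated with a small numerical sketch. This is not the Scrambler network's actual parameterization: the linear blend between a one-hot base and a uniform background, and the names `scramble` and `entropy_bits`, are simplifying assumptions for illustration.

```python
import numpy as np

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot matrix."""
    idx = [ALPHABET.index(c) for c in seq]
    return np.eye(len(ALPHABET))[idx]


def scramble(seq, importance, background=None):
    """Blend each position between its one-hot base (importance 1)
    and a background distribution (importance 0)."""
    x = one_hot(seq)
    m = np.asarray(importance, dtype=float)[:, None]
    if background is None:
        background = np.full(len(ALPHABET), 1.0 / len(ALPHABET))
    return m * x + (1.0 - m) * background


def entropy_bits(pssm):
    """Shannon entropy (bits) of each row of a position-specific matrix."""
    p = np.clip(pssm, 1e-12, 1.0)  # avoid log2(0)
    return -(pssm * np.log2(p)).sum(axis=1)


# HYPOTHETICAL importance mask: positions 0 and 3 kept, 1 scrambled, 2 half-kept.
pssm = scramble("ACGT", [1.0, 0.0, 0.5, 1.0])
print(np.round(entropy_bits(pssm), 3))
```

Positions with importance 1 keep zero entropy, while positions with importance 0 reach the 2-bit maximum for a four-letter alphabet.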

11.
Nat Commun ; 13(1): 4904, 2022 08 20.
Article in English | MEDLINE | ID: mdl-35987925

ABSTRACT

DNA has emerged as a powerful substrate for programming information processing machines at the nanoscale. Among the DNA computing primitives used today, DNA strand displacement (DSD) is arguably the most popular, with DSD-based circuit applications ranging from disease diagnostics to molecular artificial neural networks. The outputs of DSD circuits are generally read using fluorescence spectroscopy. However, due to the spectral overlap of typical small-molecule fluorescent reporters, the number of unique outputs that can be detected in parallel is limited, requiring complex optical setups or spatial isolation of reactions to make output bandwidths scalable. Here, we present a multiplexable sequencing-free readout method that enables real-time, kinetic measurement of DSD circuit activity through highly parallel, direct detection of barcoded output strands using nanopore sensor array technology (Oxford Nanopore Technologies' MinION device). These results increase DSD output bandwidth by an order of magnitude over what is currently feasible with fluorescence spectroscopy.


Subject(s)
Nanopores , DNA , High-Throughput Nucleotide Sequencing/methods , Recombination, Genetic , Sequence Analysis, DNA/methods
12.
Elife ; 11, 2022 03 21.
Article in English | MEDLINE | ID: mdl-35312478

ABSTRACT

Division of labor between cells is ubiquitous in biology but the use of multicellular consortia for engineering applications is only beginning to be explored. A significant advantage of multicellular circuits is their potential to be modular with respect to composition but this claim has not yet been extensively tested using experiments and quantitative modeling. Here, we construct a library of 24 yeast strains capable of sending, receiving or responding to three molecular signals, characterize them experimentally and build quantitative models of their input-output relationships. We then compose these strains into two- and three-strain cascades as well as a four-strain bistable switch and show that experimentally measured consortia dynamics can be predicted from the models of the constituent parts. To further explore the achievable range of behaviors, we perform a fully automated computational search over all two-, three-, and four-strain consortia to identify combinations that realize target behaviors including logic gates, band-pass filters, and time pulses. Strain combinations that are predicted to map onto a target behavior are further computationally optimized and then experimentally tested. Experiments closely track computational predictions. The high reliability of these model descriptions further strengthens the feasibility and highlights the potential for distributed computing in synthetic biology.


Subject(s)
Saccharomyces cerevisiae , Synthetic Biology , Gene Library , Logic , Reproducibility of Results , Saccharomyces cerevisiae/genetics , Synthetic Biology/methods
13.
Acc Chem Res ; 55(1): 24-34, 2022 01 04.
Article in English | MEDLINE | ID: mdl-34905691

ABSTRACT

Over just the last 2 years, mRNA therapeutics and vaccines have undergone a rapid transition from an intriguing concept to real-world impact. However, whereas some aspects of mRNA therapeutics, such as the use of chemical modifications to increase stability and reduce immunogenicity, have been extensively optimized for over two decades, other aspects, particularly the selection and design of the noncoding leader and trailer sequences which control translation efficiency and stability, have received comparably less attention. In practice, such 5' and 3' untranslated regions (UTRs) are often borrowed from highly expressed human genes with few or no modifications, as in the case for the Pfizer/BioNTech Covid vaccine. Focusing on the 5'UTR, we here argue that model-driven design is a promising alternative that provides unprecedented control over 5'UTR function. We review recent work that combines synthetic biology with machine learning to build quantitative models that relate ribosome loading, and thus translation efficiency, to the 5'UTR sequence. We first introduce an experimental approach that uses polysome profiling and high-throughput sequencing to quantify ribosome loading for hundreds of thousands of 5'UTRs in parallel. We apply this approach to measure ribosome loading in synthetic RNA libraries with a random sequence inserted into the 5'UTR. We then review Optimus 5-Prime, a convolutional neural network model trained on the experimental data. We highlight that very accurate models of biological regulation can be learned from synthetic data sets with degenerate 5'UTRs. We validate model predictions not only on held-out data sets from our random library but also on a large library of over 30 000 human 5'UTR fragments and using translation reporter data collected independently by other groups. Both the experiment and model are compatible with commonly used chemically modified nucleosides, in particular, pseudouridine (Ψ) and 1-methyl-pseudouridine (m1Ψ). 
We find that, in general, 5'UTRs have very similar impacts when combined with different protein-coding sequences and even in the context of different chemical modifications. We demonstrate that Optimus 5-Prime can be combined with design algorithms to generate de novo sequences with precisely defined translation efficiencies. We emphasize recent developments in design algorithms that rely on activation maximization and generative modeling to improve both the fitness and diversity of designed sequences. Compared with prior approaches such as genetic algorithms, we show that these approaches are not only faster but also less likely to get stuck in local sequence optima. Finally, we discuss how the approach reviewed here can be generalized to other gene regions and applications.


Subject(s)
COVID-19 , Protein Biosynthesis , COVID-19 Vaccines , Humans , Machine Learning , RNA, Messenger/genetics , RNA, Messenger/metabolism , SARS-CoV-2
14.
Bioinformatics ; 38(5): 1393-1402, 2022 02 07.
Article in English | MEDLINE | ID: mdl-34893819

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) is widely used for analyzing gene expression in multi-cellular systems and provides unprecedented access to cellular heterogeneity. scRNA-seq experiments aim to identify and quantify all cell types present in a sample. Measured single-cell transcriptomes are grouped by similarity and the resulting clusters are mapped to cell types based on cluster-specific gene expression patterns. While the process of generating clusters has become largely automated, annotation remains a laborious ad hoc effort that requires expert biological knowledge. RESULTS: Here, we introduce CellMeSH, a new automated approach to identifying cell types for clusters based on prior literature. CellMeSH combines a database of gene-cell-type associations with a probabilistic method for database querying. The database is constructed by automatically linking gene and cell-type information from millions of publications using existing indexed literature resources. Compared to manually constructed databases, CellMeSH is more comprehensive and is easily updated with new data. The probabilistic query method enables reliable information retrieval even though the gene-cell-type associations extracted from the literature are noisy. CellMeSH is also able to optionally utilize prior knowledge about tissues or cells for further annotation improvement. CellMeSH achieves top-one and top-three accuracies on a number of mouse and human datasets that are consistently better than existing approaches. AVAILABILITY AND IMPLEMENTATION: Web server at https://uncurl.cs.washington.edu/db_query and API at https://github.com/shunfumao/cellmesh. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Software , Humans , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
15.
BMC Bioinformatics ; 22(1): 510, 2021 Oct 20.
Article in English | MEDLINE | ID: mdl-34670493

ABSTRACT

BACKGROUND: Optimization of DNA and protein sequences based on Machine Learning models is becoming a powerful tool for molecular design. Activation maximization offers a simple design strategy for differentiable models: one-hot coded sequences are first approximated by a continuous representation, which is then iteratively optimized with respect to the predictor oracle by gradient ascent. While elegant, the current version of the method suffers from vanishing gradients and may cause predictor pathologies leading to poor convergence. RESULTS: Here, we introduce Fast SeqProp, an improved activation maximization method that combines straight-through approximation with normalization across the parameters of the input sequence distribution. Fast SeqProp overcomes bottlenecks in earlier methods arising from input parameters becoming skewed during optimization. Compared to prior methods, Fast SeqProp results in up to 100-fold faster convergence while also finding improved fitness optima for many applications. We demonstrate Fast SeqProp's capabilities by designing DNA and protein sequences for six deep learning predictors, including a protein structure predictor. CONCLUSIONS: Fast SeqProp offers a reliable and efficient method for general-purpose sequence optimization through a differentiable fitness predictor. As demonstrated on a variety of deep learning models, the method is widely applicable, and can incorporate various regularization techniques to maintain confidence in the sequence designs. As a design tool, Fast SeqProp may aid in the development of novel molecules, drug therapies and vaccines.
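
Activation maximization with a straight-through approximation, as outlined in the background above, can be sketched on a toy problem. This is a simplified illustration and not the Fast SeqProp implementation: it discretizes by argmax instead of sampling, omits the logit normalization step, and uses a hand-written toy oracle. The target motif, the oracle, and all function names are hypothetical.

```python
import numpy as np

ALPHABET = "ACGT"
TARGET = "TATAAA"  # HYPOTHETICAL motif rewarded by the toy oracle

# Toy oracle weights: 1 where a position matches TARGET, else 0.
W = np.eye(4)[[ALPHABET.index(c) for c in TARGET]]


def oracle(x):
    """Toy differentiable fitness: s + 0.1 * s**2, with s the match score."""
    s = float((W * x).sum())
    return s + 0.1 * s ** 2


def oracle_grad(x):
    """Gradient of the toy oracle with respect to the input matrix x."""
    s = float((W * x).sum())
    return (1.0 + 0.2 * s) * W


def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def design(steps=200, lr=1.0, seed=0):
    rng = np.random.default_rng(seed)
    logits = 0.1 * rng.standard_normal(W.shape)
    for _ in range(steps):
        probs = softmax(logits)
        # Forward pass: discretize to a one-hot sequence (argmax per position).
        x = np.eye(4)[probs.argmax(axis=1)]
        # Straight-through: treat d(one-hot)/d(probs) as the identity, so the
        # oracle gradient at the discrete input is passed directly to probs.
        g_probs = oracle_grad(x)
        # Backpropagate through the softmax to the logits.
        g_logits = probs * (g_probs - (probs * g_probs).sum(axis=1, keepdims=True))
        logits += lr * g_logits  # gradient ascent on fitness
    best = softmax(logits).argmax(axis=1)
    return "".join(ALPHABET[i] for i in best)


designed = design()
print(designed, oracle(np.eye(4)[[ALPHABET.index(c) for c in designed]]))
```

The oracle is always evaluated on a discrete one-hot sequence, while the update flows through the continuous softmax parameters; with a real neural predictor, `oracle_grad` would be replaced by automatic differentiation.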


Subject(s)
Algorithms , Machine Learning , Amino Acid Sequence
16.
Sci Rep ; 11(1): 15845, 2021 08 04.
Article in English | MEDLINE | ID: mdl-34349150

ABSTRACT

We performed a comprehensive analysis of the transcriptional changes occurring during human induced pluripotent stem cell (hiPSC) differentiation to cardiomyocytes. Using single cell RNA-seq, we sequenced > 20,000 single cells from 55 independent samples representing two differentiation protocols and multiple hiPSC lines. Samples included experimental replicates ranging from undifferentiated hiPSCs to mixed populations of cells at D90 post-differentiation. Differentiated cell populations clustered by time point, with differential expression analysis revealing markers of cardiomyocyte differentiation and maturation changing from D12 to D90. We next performed a complementary cluster-independent sparse regression analysis to identify and rank genes that best assigned cells to differentiation time points. The two highest ranked genes between D12 and D24 (MYH7 and MYH6) resulted in an accuracy of 0.84, and the three highest ranked genes between D24 and D90 (A2M, H19, IGF2) resulted in an accuracy of 0.94, revealing that low dimensional gene features can identify differentiation or maturation stages in differentiating cardiomyocytes. Expression levels of select genes were validated using RNA FISH. Finally, we interrogated differences in cardiac gene expression resulting from two differentiation protocols, experimental replicates, and three hiPSC lines in the WTC-11 background to identify sources of variation across these experimental variables.


Subject(s)
Biomarkers/metabolism , Cell Differentiation , Gene Expression Regulation , Induced Pluripotent Stem Cells/metabolism , Myocytes, Cardiac/cytology , Myocytes, Cardiac/metabolism , Transcriptome , Humans , Induced Pluripotent Stem Cells/cytology , RNA-Seq
17.
Nat Commun ; 12(1): 4764, 2021 08 06.
Article in English | MEDLINE | ID: mdl-34362913

ABSTRACT

As global demand for digital storage capacity grows, storage technologies based on synthetic DNA have emerged as a dense and durable alternative to traditional media. Existing approaches leverage robust error correcting codes and precise molecular mechanisms to reliably retrieve specific files from large databases. Typically, files are retrieved using a pre-specified key, analogous to a filename. However, these approaches lack the ability to perform more complex computations over the stored data, such as similarity search: e.g., finding images that look similar to an image of interest without prior knowledge of their file names. Here we demonstrate a technique for executing similarity search over a DNA-based database of 1.6 million images. Queries are implemented as hybridization probes, and a key step in our approach was to learn an image-to-sequence encoding ensuring that queries preferentially bind to targets representing visually similar images. Experimental results show that our molecular implementation performs comparably to state-of-the-art in silico algorithms for similarity search.


Subject(s)
Computational Biology/methods , DNA/chemistry , Databases, Genetic , Information Storage and Retrieval , Algorithms , Base Sequence , Computer Simulation , DNA/genetics , DNA Probes , Databases, Factual , Neural Networks, Computer
18.
Nat Neurosci ; 24(8): 1163-1175, 2021 08.
Article in English | MEDLINE | ID: mdl-34140698

ABSTRACT

The human neonatal cerebellum is one-fourth of its adult size yet contains the blueprint required to integrate environmental cues with developing motor, cognitive and emotional skills into adulthood. Although mature cerebellar neuroanatomy is well studied, understanding of its developmental origins is limited. In this study, we systematically mapped the molecular, cellular and spatial composition of human fetal cerebellum by combining laser capture microscopy and SPLiT-seq single-nucleus transcriptomics. We profiled functionally distinct regions and gene expression dynamics within cell types and across development. The resulting cell atlas demonstrates that the molecular organization of the cerebellar anlage recapitulates cytoarchitecturally distinct regions and developmentally transient cell types that are distinct from the mouse cerebellum. By mapping genes dominant for pediatric and adult neurological disorders onto our dataset, we identify relevant cell types underlying disease mechanisms. These data provide a resource for probing the cellular basis of human cerebellar development and disease.


Subject(s)
Cerebellum/embryology , Neurogenesis , Fetus , Humans , Laser Capture Microdissection , Single-Cell Analysis , Transcriptome
19.
Science ; 371(6531), 2021 02 19.
Article in English | MEDLINE | ID: mdl-33335020

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) has become an essential tool for characterizing gene expression in eukaryotes, but current methods are incompatible with bacteria. Here, we introduce microSPLiT (microbial split-pool ligation transcriptomics), a high-throughput scRNA-seq method for Gram-negative and Gram-positive bacteria that can resolve heterogeneous transcriptional states. We applied microSPLiT to >25,000 Bacillus subtilis cells sampled at different growth stages, creating an atlas of changes in metabolism and lifestyle. We retrieved detailed gene expression profiles associated with known, but rare, states such as competence and prophage induction and also identified unexpected gene expression states, including the heterogeneous activation of a niche metabolic pathway in a subpopulation of cells. MicroSPLiT paves the way to high-throughput analysis of gene expression in bacterial communities that are otherwise not amenable to single-cell analysis, such as natural microbiota.


Subject(s)
Bacillus subtilis/genetics , Gene Expression Regulation, Bacterial , Metabolic Networks and Pathways/genetics , RNA-Seq/methods , Single-Cell Analysis/methods , Anti-Bacterial Agents/biosynthesis , Bacillus Phages/physiology , Bacillus subtilis/growth & development , Bacillus subtilis/metabolism , Carbon/metabolism , Culture Media , Escherichia coli/genetics , Fermentation/genetics , Gluconeogenesis/genetics , Glycolysis/genetics , Heat-Shock Response/genetics , Inositol/metabolism , Ion Transport , Metals/metabolism , Movement , Operon , RNA, Bacterial/genetics , Stress, Physiological , Transcription, Genetic , Transcriptome , Virus Activation
20.
Cell Syst ; 11(1): 49-62.e16, 2020 07 22.
Article in English | MEDLINE | ID: mdl-32711843

ABSTRACT

Engineering gene and protein sequences with defined functional properties is a major goal of synthetic biology. Deep neural network models, together with gradient ascent-style optimization, show promise for sequence design. The generated sequences can however get stuck in local minima and often have low diversity. Here, we develop deep exploration networks (DENs), a class of activation-maximizing generative models, which minimize the cost of a neural network fitness predictor by gradient descent. By penalizing any two generated patterns on the basis of a similarity metric, DENs explicitly maximize sequence diversity. To avoid drifting into low-confidence regions of the predictor, we incorporate variational autoencoders to maintain the likelihood ratio of generated sequences. Using DENs, we engineered polyadenylation signals with more than 10-fold higher selection odds than the best gradient ascent-generated patterns, identified splice regulatory sequences predicted to result in highly differential splicing between cell lines, and improved on state-of-the-art results for protein design tasks.


Subject(s)
DNA/genetics , Neural Networks, Computer , Sequence Analysis, Protein/methods , Humans