Results 1 - 12 of 12
1.
Cell ; 178(1): 91-106.e23, 2019 06 27.
Article in English | MEDLINE | ID: mdl-31178116

ABSTRACT

Alternative polyadenylation (APA) is a major driver of transcriptome diversity in human cells. Here, we use deep learning to predict APA from DNA sequence alone. We trained our model (APARENT, APA REgression NeT) on isoform expression data from over 3 million APA reporters. APARENT's predictions are highly accurate when tasked with inferring APA in synthetic and human 3'UTRs. Visualizing features learned across all network layers reveals that APARENT recognizes sequence motifs known to recruit APA regulators, discovers previously unknown sequence determinants of 3' end processing, and integrates these features into a comprehensive, interpretable, cis-regulatory code. We apply APARENT to forward engineer functional polyadenylation signals with precisely defined cleavage position and isoform usage and validate predictions experimentally. Finally, we use APARENT to quantify the impact of genetic variants on APA. Our approach detects pathogenic variants in a wide range of disease contexts, expanding our understanding of the genetic origins of disease.


Subjects
Deep Learning, Genetic Models, Polyadenylation/genetics, 3' Untranslated Regions/genetics, Base Sequence/genetics, Genetic Databases, Gene Expression/genetics, HEK293 Cells, Humans, Mutagenesis/genetics, RNA Cleavage/genetics, Messenger RNA/genetics, RNA-Seq, Synthetic Biology, Transcriptome
2.
Cell ; 163(3): 698-711, 2015 Oct 22.
Article in English | MEDLINE | ID: mdl-26496609

ABSTRACT

Most human transcripts are alternatively spliced, and many disease-causing mutations affect RNA splicing. Toward better modeling the sequence determinants of alternative splicing, we measured the splicing patterns of over two million (M) synthetic mini-genes, which include degenerate subsequences totaling over 100 M bases of variation. The massive size of these training data allowed us to improve upon current models of splicing, as well as to gain new mechanistic insights. Our results show that the vast majority of hexamer sequence motifs measurably influence splice site selection when positioned within alternative exons, with multiple motifs acting additively rather than cooperatively. Intriguingly, motifs that enhance (suppress) exon inclusion in alternative 5' splicing also enhance (suppress) exon inclusion in alternative 3' or cassette exon splicing, suggesting a universal mechanism for alternative exon recognition. Finally, our empirically trained models are highly predictive of the effects of naturally occurring variants on alternative splicing in vivo.


Subjects
Alternative Splicing, Human Genome, Genetic Models, Single Nucleotide Polymorphism, Base Sequence, Humans, Molecular Sequence Data, Nucleotide Motifs, RNA Splice Sites
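
The additive behavior of hexamer motifs reported in the abstract above can be illustrated with a minimal sketch. It assumes that each hexamer contributes an independent log-odds score and that the scores are summed before being mapped to an inclusion fraction. The two motifs, their scores, and the function names are hypothetical placeholders, not values or methods taken from the paper.

```python
import math

# Hypothetical per-hexamer effects in log-odds units (illustrative values only,
# not measurements from the study).
HEXAMER_SCORES = {
    "GAAGAA": 0.8,   # assumed enhancer-like motif
    "TAGGGT": -0.9,  # assumed silencer-like motif
}


def hexamers(seq):
    """Yield every overlapping 6-mer in seq."""
    for i in range(len(seq) - 5):
        yield seq[i:i + 6]


def predicted_inclusion(seq, baseline_logit=0.0):
    """Sum hexamer scores additively, then map the logit to a 0-1 fraction."""
    logit = baseline_logit + sum(HEXAMER_SCORES.get(h, 0.0) for h in hexamers(seq))
    return 1.0 / (1.0 + math.exp(-logit))
```

Under this assumption, a sequence carrying both motifs gets a logit equal to the sum of the two individual scores, with no interaction term. A cooperative model would need an extra term for motif pairs.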
3.
Am J Hum Genet ; 105(3): 606-615, 2019 09 05.
Article in English | MEDLINE | ID: mdl-31474318

ABSTRACT

Cerebellar malformations are diverse congenital anomalies frequently associated with developmental disability. Although genetic and prenatal non-genetic causes have been described, no systematic analysis has been performed. Here, we present a large exome sequencing study of Dandy-Walker malformation (DWM) and cerebellar hypoplasia (CBLH). We performed exome sequencing in 282 individuals from 100 families with DWM or CBLH, and we established a molecular diagnosis in 36 of 100 families, with a significantly higher yield for CBLH (51%) than for DWM (16%). The 41 variants impact 27 neurodevelopmental-disorder-associated genes, thus demonstrating that CBLH and DWM are often features of monogenic neurodevelopmental disorders. Though only seven monogenic causes (19%) were identified in more than one individual, neuroimaging review of 131 additional individuals confirmed cerebellar abnormalities in 23 of 27 genetic disorders (85%). Prenatal risk factors were frequently found among individuals without a genetic diagnosis (30 of 64 individuals [47%]). Single-cell RNA sequencing of prenatal human cerebellar tissue revealed gene enrichment in neuronal and vascular cell types; this suggests that defective vasculogenesis may disrupt cerebellar development. Further, de novo gain-of-function variants in PDGFRB, a tyrosine kinase receptor essential for vascular progenitor signaling, were associated with CBLH, and this discovery links genetic and non-genetic etiologies. Our results suggest that genetic defects impact specific cerebellar cell types and implicate abnormal vascular development as a mechanism for cerebellar malformations. We also confirmed a major contribution for non-genetic prenatal factors in individuals with cerebellar abnormalities, substantially influencing diagnostic evaluation and counseling regarding recurrence risk and prognosis.


Subjects
Cerebellum/abnormalities, Cerebellum/diagnostic imaging, Cohort Studies, Female, Humans, Male, Pregnancy
4.
Nucleic Acids Res ; 47(2): 970-980, 2019 01 25.
Article in English | MEDLINE | ID: mdl-30462273

ABSTRACT

Cancer-associated mutations of the core splicing factor 3B1 (SF3B1) result in selection of novel 3' splice sites (3'SS), but precise molecular mechanisms of oncogenesis remain unclear. SF3B1 stabilizes the interaction between U2 snRNP and branch point (BP) on the pre-mRNA. It has hence been speculated that a change in BP selection is the basis for novel 3'SS selection. Direct quantitative determination of BP utilization is, however, technically challenging. To define BP utilization by SF3B1-mutant spliceosomes, we used an overexpression approach in human cells as well as a complementary strategy using isogenic murine embryonic stem cells with monoallelic K700E mutations constructed via CRISPR/Cas9-based genome editing and a dual vector homology-directed repair methodology. A synthetic minigene library with degenerate regions in 3' intronic regions (3.4 million individual minigenes) was used to compare BP usage of SF3B1K700E and SF3B1WT. Using this model, we show that SF3B1K700E spliceosomes utilize non-canonical sequence variants (at position -1 relative to BP adenosine) more frequently than wild-type spliceosomes. These predictions were confirmed using minigene splicing assays. Our results suggest a model of BP utilization by mutant SF3B1 wherein it is able to utilize non-consensus alternative BP sequences by stabilizing weaker U2-BP interactions.


Subjects
RNA Splicing Factors/metabolism, Animals, Base Pairing, Cultured Cells, Embryonic Stem Cells/metabolism, Gene Library, HEK293 Cells, Humans, Mice, Mutation, Nucleotide Motifs, Phosphoproteins/genetics, RNA Splice Sites, RNA Splicing Factors/genetics, Messenger RNA/metabolism
5.
Genome Res ; 27(12): 2015-2024, 2017 12.
Article in English | MEDLINE | ID: mdl-29097404

ABSTRACT

Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts the protein expression of the 5' untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of half a million 50-nucleotide-long random 5' UTRs and assayed their activity in a massively parallel growth selection experiment. The resulting data allow us to quantify the impact on protein expression of Kozak sequence composition, upstream open reading frames (uORFs), and secondary structure. We trained a convolutional neural network (CNN) on the random library and showed that it performs well at predicting the protein expression of both a held-out set of the random 5' UTRs and native S. cerevisiae 5' UTRs. The model was additionally used to computationally evolve highly active 5' UTRs. We confirmed experimentally that the great majority of the evolved sequences led to higher protein expression rates than the starting sequences, demonstrating the predictive power of this model.


Subjects
Genetic Models, Saccharomyces cerevisiae/genetics, 5' Untranslated Regions, Alternative Splicing, Computer Simulation, Gene Library, Machine Learning, Computer Neural Networks, Fungal RNA, Messenger RNA
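
One of the sequence features named in the abstract above, upstream open reading frames (uORFs), can be located with a simple scan. The sketch below is only an illustration of what such a feature looks like in a 5' UTR given as a DNA string. The function name and the returned format are assumptions made here, and this is not the analysis pipeline used in the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr):
    """Return (start, end) pairs for every ATG found in the UTR.

    start is the 0-based index of the A in ATG. end is the index just past the
    first in-frame stop codon, or None when no in-frame stop occurs within the
    UTR (the reading frame then runs on past the end of the UTR).
    """
    utr = utr.upper()
    uorfs = []
    for start in range(len(utr) - 2):
        if utr[start:start + 3] != "ATG":
            continue
        end = None
        # Walk codon by codon, in frame with the ATG, looking for a stop.
        for pos in range(start + 3, len(utr) - 2, 3):
            if utr[pos:pos + 3] in STOP_CODONS:
                end = pos + 3
                break
        uorfs.append((start, end))
    return uorfs
```

A call such as `find_uorfs("CCATGAAATAGCC")` reports one upstream ATG at index 2 with an in-frame TAG stop, while a UTR without any ATG yields an empty list.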
6.
Nat Protoc ; 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38886529

ABSTRACT

Microbial split-pool ligation transcriptomics (microSPLiT) is a high-throughput single-cell RNA sequencing method for bacteria. With four combinatorial barcoding rounds, microSPLiT can profile transcriptional states in hundreds of thousands of Gram-negative and Gram-positive bacteria in a single experiment without specialized equipment. As bacterial samples are fixed and permeabilized before barcoding, they can be collected and stored ahead of time. During the first barcoding round, the fixed and permeabilized bacteria are distributed into a 96-well plate, where their transcripts are reverse transcribed into cDNA and labeled with the first well-specific barcode inside the cells. The cells are mixed and redistributed two more times into new 96-well plates, where the second and third barcodes are appended to the cDNA via in-cell ligation reactions. Finally, the cells are mixed and divided into aliquot sub-libraries, which can be stored until future use or prepared for sequencing with the addition of a fourth barcode. It takes 4 days to generate sequencing-ready libraries, including 1 day for collection and overnight fixation of samples. The standard plate setup enables single-cell transcriptional profiling of up to 1 million bacterial cells and up to 96 samples in a single barcoding experiment, with the possibility of expansion by adding barcoding rounds. The protocol requires experience in basic molecular biology techniques, handling of bacterial samples and preparation of DNA libraries for next-generation sequencing. It can be performed by experienced undergraduate or graduate students. Data analysis requires access to computing resources, familiarity with Unix command line and basic experience with Python or R.
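
The arithmetic behind the combinatorial barcoding described above can be sketched as follows. The example assumes that cells are assigned to wells uniformly at random in each round and that each sub-library receives one distinct fourth barcode. The function names and the sub-library count used in the usage note are hypothetical.

```python
def barcode_space(wells_per_round, rounds, sublibraries=1):
    """Number of distinct barcode combinations across all rounds."""
    return wells_per_round ** rounds * sublibraries


def collision_fraction(n_cells, n_barcodes):
    """Probability that a given cell shares its full barcode with at least
    one other cell, assuming uniform random assignment of cells to barcodes."""
    return 1.0 - (1.0 - 1.0 / n_barcodes) ** (n_cells - 1)
```

Three rounds in 96-well plates give `barcode_space(96, 3)`, which is 884,736 combinations, and a fourth barcode multiplies that by the number of sub-libraries (for example, an assumed 8 sub-libraries gives 7,077,888). `collision_fraction` then gives a rough estimate of how often two cells would be indistinguishable at a given loading, which is why adding a barcoding round raises the number of cells that can be profiled.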

7.
Science ; 371(6531)2021 02 19.
Article in English | MEDLINE | ID: mdl-33335020

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) has become an essential tool for characterizing gene expression in eukaryotes, but current methods are incompatible with bacteria. Here, we introduce microSPLiT (microbial split-pool ligation transcriptomics), a high-throughput scRNA-seq method for Gram-negative and Gram-positive bacteria that can resolve heterogeneous transcriptional states. We applied microSPLiT to >25,000 Bacillus subtilis cells sampled at different growth stages, creating an atlas of changes in metabolism and lifestyle. We retrieved detailed gene expression profiles associated with known, but rare, states such as competence and prophage induction and also identified unexpected gene expression states, including the heterogeneous activation of a niche metabolic pathway in a subpopulation of cells. MicroSPLiT paves the way to high-throughput analysis of gene expression in bacterial communities that are otherwise not amenable to single-cell analysis, such as natural microbiota.


Subjects
Bacillus subtilis/genetics, Bacterial Gene Expression Regulation, Metabolic Networks and Pathways/genetics, RNA-Seq/methods, Single-Cell Analysis/methods, Anti-Bacterial Agents/biosynthesis, Bacillus Phages/physiology, Bacillus subtilis/growth & development, Bacillus subtilis/metabolism, Carbon/metabolism, Culture Media, Escherichia coli/genetics, Fermentation/genetics, Gluconeogenesis/genetics, Glycolysis/genetics, Heat-Shock Response/genetics, Inositol/metabolism, Ion Transport, Metals/metabolism, Movement, Operon, Bacterial RNA/genetics, Physiological Stress, Genetic Transcription, Transcriptome, Virus Activation
8.
Nat Neurosci ; 24(8): 1163-1175, 2021 08.
Article in English | MEDLINE | ID: mdl-34140698

ABSTRACT

The human neonatal cerebellum is one-fourth of its adult size yet contains the blueprint required to integrate environmental cues with developing motor, cognitive and emotional skills into adulthood. Although mature cerebellar neuroanatomy is well studied, understanding of its developmental origins is limited. In this study, we systematically mapped the molecular, cellular and spatial composition of human fetal cerebellum by combining laser capture microscopy and SPLiT-seq single-nucleus transcriptomics. We profiled functionally distinct regions and gene expression dynamics within cell types and across development. The resulting cell atlas demonstrates that the molecular organization of the cerebellar anlage recapitulates cytoarchitecturally distinct regions and developmentally transient cell types that are distinct from the mouse cerebellum. By mapping genes dominant for pediatric and adult neurological disorders onto our dataset, we identify relevant cell types underlying disease mechanisms. These data provide a resource for probing the cellular basis of human cerebellar development and disease.


Subjects
Cerebellum/embryology, Neurogenesis, Fetus, Humans, Laser Capture Microdissection, Single-Cell Analysis, Transcriptome
9.
Sci Rep ; 11(1): 15845, 2021 08 04.
Article in English | MEDLINE | ID: mdl-34349150

ABSTRACT

We performed a comprehensive analysis of the transcriptional changes occurring during human induced pluripotent stem cell (hiPSC) differentiation to cardiomyocytes. Using single cell RNA-seq, we sequenced > 20,000 single cells from 55 independent samples representing two differentiation protocols and multiple hiPSC lines. Samples included experimental replicates ranging from undifferentiated hiPSCs to mixed populations of cells at D90 post-differentiation. Differentiated cell populations clustered by time point, with differential expression analysis revealing markers of cardiomyocyte differentiation and maturation changing from D12 to D90. We next performed a complementary cluster-independent sparse regression analysis to identify and rank genes that best assigned cells to differentiation time points. The two highest ranked genes between D12 and D24 (MYH7 and MYH6) resulted in an accuracy of 0.84, and the three highest ranked genes between D24 and D90 (A2M, H19, IGF2) resulted in an accuracy of 0.94, revealing that low dimensional gene features can identify differentiation or maturation stages in differentiating cardiomyocytes. Expression levels of select genes were validated using RNA FISH. Finally, we interrogated differences in cardiac gene expression resulting from two differentiation protocols, experimental replicates, and three hiPSC lines in the WTC-11 background to identify sources of variation across these experimental variables.


Subjects
Biomarkers/metabolism, Cell Differentiation, Gene Expression Regulation, Induced Pluripotent Stem Cells/metabolism, Cardiac Myocytes/cytology, Cardiac Myocytes/metabolism, Transcriptome, Humans, Induced Pluripotent Stem Cells/cytology, RNA-Seq
10.
Cell Syst ; 11(1): 49-62.e16, 2020 07 22.
Article in English | MEDLINE | ID: mdl-32711843

ABSTRACT

Engineering gene and protein sequences with defined functional properties is a major goal of synthetic biology. Deep neural network models, together with gradient ascent-style optimization, show promise for sequence design. The generated sequences can however get stuck in local minima and often have low diversity. Here, we develop deep exploration networks (DENs), a class of activation-maximizing generative models, which minimize the cost of a neural network fitness predictor by gradient descent. By penalizing any two generated patterns on the basis of a similarity metric, DENs explicitly maximize sequence diversity. To avoid drifting into low-confidence regions of the predictor, we incorporate variational autoencoders to maintain the likelihood ratio of generated sequences. Using DENs, we engineered polyadenylation signals with more than 10-fold higher selection odds than the best gradient ascent-generated patterns, identified splice regulatory sequences predicted to result in highly differential splicing between cell lines, and improved on state-of-the-art results for protein design tasks.


Subjects
DNA/genetics, Computer Neural Networks, Protein Sequence Analysis/methods, Humans
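
The idea of penalizing pairs of generated patterns by their similarity, as described in the abstract above, can be illustrated with a toy cost function. This sketch works on plain strings rather than on a trained generator network, uses a made-up fitness predictor, and assumes a hinge penalty with a margin. All names and parameter values are hypothetical, and it shows only the shape of such a cost, not the DEN training procedure.

```python
def similarity(seq_a, seq_b):
    """Fraction of positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def toy_fitness(seq):
    """Stand-in fitness predictor: fraction of 'A' bases (illustrative only)."""
    return seq.count("A") / len(seq)


def pair_cost(seq_a, seq_b, margin=0.5, weight=1.0):
    """Cost to minimize for a pair of generated sequences: negative total
    fitness plus a hinge penalty on similarity above the margin."""
    fitness_term = -(toy_fitness(seq_a) + toy_fitness(seq_b))
    diversity_term = weight * max(0.0, similarity(seq_a, seq_b) - margin)
    return fitness_term + diversity_term
```

With `weight=2.0`, the identical pair `("AAAA", "AAAA")` has the highest possible fitness but a cost of -1.0, whereas the more diverse pair `("AAAT", "TAAA")` reaches a lower cost of -1.5. A generator trained against a cost of this shape is therefore pushed away from emitting the same sequence repeatedly.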
11.
Science ; 360(6385): 176-182, 2018 04 13.
Article in English | MEDLINE | ID: mdl-29545511

ABSTRACT

To facilitate scalable profiling of single cells, we developed split-pool ligation-based transcriptome sequencing (SPLiT-seq), a single-cell RNA-seq (scRNA-seq) method that labels the cellular origin of RNA through combinatorial barcoding. SPLiT-seq is compatible with fixed cells or nuclei, allows efficient sample multiplexing, and requires no customized equipment. We used SPLiT-seq to analyze 156,049 single-nucleus transcriptomes from postnatal day 2 and 11 mouse brains and spinal cords. More than 100 cell types were identified, with gene expression patterns corresponding to cellular function, regional specificity, and stage of differentiation. Pseudotime analysis revealed transcriptional programs driving four developmental lineages, providing a snapshot of early postnatal development in the murine central nervous system. SPLiT-seq provides a path toward comprehensive single-cell transcriptomic analysis of other similarly complex multicellular systems.


Subjects
Brain/growth & development, Gene Expression Profiling/methods, Developmental Gene Expression Regulation, Single-Cell Analysis/methods, Spinal Cord/growth & development, Transcriptome, Animals, Cell Nucleus/genetics, HEK293 Cells, Humans, Mice, NIH 3T3 Cells, Neurons/metabolism, RNA Sequence Analysis
12.
ACS Synth Biol ; 3(5): 324-31, 2014 May 16.
Article in English | MEDLINE | ID: mdl-24847681

ABSTRACT

Achieving precise control of mammalian transgene expression has remained a long-standing, and increasingly urgent, challenge in biomedical science. Despite much work, single-cell methods have consistently revealed that mammalian gene expression levels remain susceptible to fluctuations (noise) and external perturbations. Here, we show that precise control of protein synthesis can be realized using a single-gene microRNA (miRNA)-based feed-forward loop (sgFFL). This minimal autoregulatory gene circuit consists of an intronic miRNA that targets its own transcript. In response to a step-like increase in transcription rate, the network generated a transient protein expression pulse before returning to a lower steady state level, thus exhibiting adaptation. Critically, the steady state protein levels were independent of the size of the stimulus, demonstrating that this simple network architecture effectively buffered protein production against changes in transcription. The single-gene network architecture was also effective in buffering against transcriptional noise, leading to reduced cell-to-cell variability in protein synthesis. Noise was up to 5-fold lower for a sgFFL than for an unregulated control gene with equal mean protein levels. The noise buffering capability varied predictably with the strength of the miRNA-target interaction. Together, these results suggest that the sgFFL single-gene motif provides a general and broadly applicable platform for robust gene expression in synthetic and natural gene circuits.


Subjects
Gene Regulatory Networks/genetics, MicroRNAs/genetics, Genetic Models, Protein Biosynthesis/genetics, Synthetic Biology/methods, Animals, Cell Line, Physiological Feedback, Luminescent Proteins/genetics, Luminescent Proteins/metabolism, Mice, MicroRNAs/metabolism, Signal Transduction/genetics, Signal Transduction/physiology, Red Fluorescent Protein
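
The pulse-then-adaptation behavior described in the abstract above can be reproduced qualitatively with a toy model. The sketch assumes that the mRNA and the intronic miRNA are produced at the same transcription rate k, that the miRNA removes mRNA at a rate proportional to both concentrations, and that simple forward-Euler integration is adequate. The equations, parameter values, and function names are assumptions chosen for illustration and are not taken from the paper.

```python
def steady_state_protein(k, g=1.0, d_m=0.01, d_r=0.1, b=1.0, d_p=1.0):
    """Analytical steady-state protein level of the toy model at rate k."""
    mirna = k / d_r
    mrna = k / (d_m + g * mirna)
    return b * mrna / d_p


def simulate_step(k_low, k_high, t_end=100.0, dt=0.001,
                  g=1.0, d_m=0.01, d_r=0.1, b=1.0, d_p=1.0):
    """Protein trace after transcription steps from k_low to k_high at t = 0.

    Toy equations (all hypothetical):
      d(miRNA)/dt   = k - d_r * miRNA
      d(mRNA)/dt    = k - d_m * mRNA - g * miRNA * mRNA
      d(protein)/dt = b * mRNA - d_p * protein
    """
    # Start from the steady state reached under the low transcription rate.
    r = k_low / d_r
    m = k_low / (d_m + g * r)
    p = b * m / d_p
    trace = [p]
    for _ in range(int(round(t_end / dt))):
        dr = k_high - d_r * r
        dm = k_high - d_m * m - g * r * m
        dp = b * m - d_p * p
        r += dt * dr
        m += dt * dm
        p += dt * dp
        trace.append(p)
    return trace
```

Calling `simulate_step(1.0, 10.0)` produces a transient rise in protein followed by a return to a level close to `steady_state_protein(10.0)`. In this toy model the steady state approaches d_r / g when miRNA-mediated removal dominates, so it depends only weakly on k, which mirrors the buffering against transcription changes that the abstract reports.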