ABSTRACT
Although we can increasingly measure transcription, chromatin, methylation, and other aspects of molecular biology at single-cell resolution, most assays survey only one aspect of cellular biology. Here we describe sci-CAR, a combinatorial indexing-based coassay that jointly profiles chromatin accessibility and mRNA (CAR) in each of thousands of single cells. As a proof of concept, we apply sci-CAR to 4,825 cells, including a time series of dexamethasone treatment, as well as to 11,296 cells from the adult mouse kidney. With the resulting data, we compare the pseudotemporal dynamics of chromatin accessibility and gene expression, reconstruct the chromatin accessibility profiles of cell types defined by RNA profiles, and link cis-regulatory sites to their target genes on the basis of the covariance of chromatin accessibility and transcription across large numbers of single cells.
Subjects
Chromatin/metabolism; Gene Expression Profiling/methods; Gene Expression Regulation; Genomics/methods; Single-Cell Analysis/methods; A549 Cells; Animals; Dexamethasone/pharmacology; Gene Expression Regulation/drug effects; HEK293 Cells; Humans; Kidney/cytology; Kidney/drug effects; Mice; NIH 3T3 Cells; Regulatory Elements, Transcriptional/drug effects; Transcription, Genetic/drug effects
ABSTRACT
We applied a combinatorial indexing assay, sci-ATAC-seq, to profile genome-wide chromatin accessibility in ~100,000 single cells from 13 adult mouse tissues. We identify 85 distinct patterns of chromatin accessibility, most of which can be assigned to cell types, and ~400,000 differentially accessible elements. We use these data to link regulatory elements to their target genes, to define the transcription factor grammar specifying each cell type, and to discover in vivo correlates of heterogeneity in accessibility within cell types. We develop a technique for mapping single-cell gene expression data to single-cell chromatin accessibility data, facilitating the comparison of atlases. By intersecting mouse chromatin accessibility with human genome-wide association summary statistics, we identify cell-type-specific enrichments of the heritability signal for hundreds of complex traits. These data define the in vivo landscape of the regulatory genome for common mammalian cell types at single-cell resolution.
Subjects
Chromatin/chemistry; Single-Cell Analysis/methods; Animals; Cluster Analysis; Epigenesis, Genetic; Epigenomics; Gene Expression Regulation; Genome, Human; Genome-Wide Association Study; Humans; Male; Mammals; Mice; Mice, Inbred C57BL; Transcription Factors
ABSTRACT
Linking regulatory DNA elements to their target genes, which may be located hundreds of kilobases away, remains challenging. Here, we introduce Cicero, an algorithm that identifies co-accessible pairs of DNA elements using single-cell chromatin accessibility data and so connects regulatory elements to their putative target genes. We apply Cicero to investigate how dynamically accessible elements orchestrate gene regulation in differentiating myoblasts. Groups of Cicero-linked regulatory elements meet criteria of "chromatin hubs": they are enriched for physical proximity, interact with a common set of transcription factors, and undergo coordinated changes in histone marks that are predictive of changes in gene expression. Pseudotemporal analysis revealed that most DNA elements remain in chromatin hubs throughout differentiation. A subset of elements bound by MYOD1 in myoblasts exhibit early opening in a PBX1- and MEIS1-dependent manner. Our strategy can be applied to dissect the architecture, sequence determinants, and mechanisms of cis-regulation on a genome-wide scale.
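To make the idea of "co-accessible pairs" concrete, the following is a minimal sketch and not Cicero's actual algorithm, which the abstract does not detail. It assumes each peak is described by a genomic position and the set of cells in which it is accessible, and it scores pairs with a plain Jaccard index inside a distance window. The peak names, the 500 kb window, and the 0.5 score cutoff are all hypothetical values chosen for illustration.

```python
from itertools import combinations


def jaccard(cells_a, cells_b):
    """Jaccard index of two sets of cell IDs (0.0 if both are empty)."""
    union = cells_a | cells_b
    return len(cells_a & cells_b) / len(union) if union else 0.0


def co_accessible_pairs(peaks, max_distance=500_000, min_score=0.5):
    """Return (peak1, peak2, score) for nearby peaks that tend to be open
    in the same cells.

    peaks: dict mapping a peak name to (position_bp, set_of_cell_ids).
    max_distance and min_score are illustrative thresholds (assumptions).
    """
    links = []
    # Peak names are unique, so sorting the items orders them by name only.
    for (name1, (pos1, cells1)), (name2, (pos2, cells2)) in combinations(
        sorted(peaks.items()), 2
    ):
        if abs(pos1 - pos2) > max_distance:
            continue  # too far apart to be considered a candidate link
        score = jaccard(cells1, cells2)
        if score >= min_score:
            links.append((name1, name2, score))
    return links


# Toy data: hypothetical peaks on one chromosome.
peaks = {
    "promoter_A": (100_000, {"c1", "c2", "c3", "c4"}),
    "enhancer_B": (350_000, {"c1", "c2", "c3"}),
    "enhancer_C": (400_000, {"c5", "c6"}),
    "distal_D": (2_000_000, {"c1", "c2", "c3", "c4"}),
}
links = co_accessible_pairs(peaks)
print(links)
```

In this toy input only `enhancer_B` and `promoter_A` are linked: `distal_D` shares the same cells as the promoter but lies outside the distance window, and `enhancer_C` is nearby but open in different cells.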
Subjects
Chromatin Assembly and Disassembly/genetics; Chromatin/genetics; DNA/genetics; Enhancer Elements, Genetic/genetics; Gene Expression Regulation/genetics; Adolescent; Cell Differentiation/genetics; Female; Genes, Homeobox/genetics; Histones/genetics; Humans; Myoblasts/physiology; Transcription Factors/genetics
ABSTRACT
Understanding how gene regulatory networks control the progressive restriction of cell fates is a long-standing challenge. Recent advances in measuring gene expression in single cells are providing new insights into lineage commitment. However, the regulatory events underlying these changes remain unclear. Here we investigate the dynamics of chromatin regulatory landscapes during embryogenesis at single-cell resolution. Using single-cell combinatorial indexing assay for transposase accessible chromatin with sequencing (sci-ATAC-seq), we profiled chromatin accessibility in over 20,000 single nuclei from fixed Drosophila melanogaster embryos spanning three landmark embryonic stages: 2-4 h after egg laying (predominantly stage 5 blastoderm nuclei), when each embryo comprises around 6,000 multipotent cells; 6-8 h after egg laying (predominantly stage 10-11), to capture a midpoint in embryonic development when major lineages in the mesoderm and ectoderm are specified; and 10-12 h after egg laying (predominantly stage 13), when each of the embryo's more than 20,000 cells is undergoing terminal differentiation. Our results show that there is spatial heterogeneity in the accessibility of the regulatory genome before gastrulation, a feature that aligns with future cell fate, and that nuclei can be temporally ordered along developmental trajectories. During mid-embryogenesis, tissue granularity emerges such that individual cell types can be inferred by their chromatin accessibility while maintaining a signature of their germ layer of origin. Analysis of the data reveals overlapping usage of regulatory elements between cells of the endoderm and non-myogenic mesoderm, suggesting a common developmental program that is reminiscent of the mesendoderm lineage in other species. We identify 30,075 distal regulatory elements that exhibit tissue-specific accessibility. We validated the germ-layer specificity of a subset of these predicted enhancers in transgenic embryos, achieving an accuracy of 90%. Overall, our results demonstrate the power of shotgun single-cell profiling of embryos to resolve dynamic changes in the chromatin landscape during development, and to uncover the cis-regulatory programs of metazoan germ layers and cell types.
Subjects
Drosophila melanogaster/cytology; Drosophila melanogaster/embryology; Embryonic Development/genetics; Gene Expression Regulation, Developmental; Single-Cell Analysis; Animals; Cell Differentiation/genetics; Cell Lineage/genetics; Chromatin/genetics; Chromatin/metabolism; Drosophila melanogaster/genetics; Endoderm/cytology; Endoderm/metabolism; Enhancer Elements, Genetic/genetics; Female; Gastrulation/genetics; Genome, Insect/genetics; Male; Mesoderm/cytology; Mesoderm/metabolism; Organ Specificity/genetics; Organisms, Genetically Modified/cytology; Organisms, Genetically Modified/genetics; Reproducibility of Results
ABSTRACT
Motivation: The increasing availability of chromatin immunoprecipitation sequencing (ChIP-Seq) data enables us to learn more about the action of transcription factors in the regulation of gene expression. Even though in vivo transcriptional regulation often involves the concerted action of more than one transcription factor, the format of each individual ChIP-Seq dataset usually represents the action of a single transcription factor. Therefore, a relational database in which available ChIP-Seq datasets are curated is essential. Results: We present Expresso (database and webserver) as a tool for the collection and integration of available Arabidopsis ChIP-Seq peak data, which in turn can be linked to a user's gene expression data. Known target genes of transcription factors were identified by motif analysis of publicly available GEO ChIP-Seq data sets. Expresso currently provides three services: 1) Identification of target genes of a given transcription factor; 2) Identification of transcription factors that regulate a gene of interest; 3) Computation of correlation between the gene expression of transcription factors and their target genes. Availability: Expresso is freely available at http://bioinformatics.cs.vt.edu/expresso/.
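The three services listed above can be illustrated with a small sketch. It is not Expresso's implementation or schema. It assumes that peak-derived regulatory relationships are available as (transcription factor, target gene) pairs and that expression profiles are plain lists of values per gene. All gene names, the helper names `build_indexes` and `pearson`, and the expression values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def build_indexes(tf_target_pairs):
    """Build forward and reverse lookups from (TF, target gene) pairs."""
    tf_to_targets = defaultdict(set)
    target_to_tfs = defaultdict(set)
    for tf, gene in tf_target_pairs:
        tf_to_targets[tf].add(gene)    # service 1: targets of a given TF
        target_to_tfs[gene].add(tf)    # service 2: TFs regulating a gene
    return tf_to_targets, target_to_tfs


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (var_x * var_y) ** 0.5


# Hypothetical ChIP-Seq-derived relationships and user expression data.
pairs = [("TF1", "GENE_A"), ("TF1", "GENE_B"), ("TF2", "GENE_A")]
expression = {
    "TF1": [1.0, 2.0, 3.0, 4.0],
    "GENE_A": [2.0, 4.0, 6.0, 8.0],
    "GENE_B": [8.0, 6.0, 4.0, 2.0],
}

tf_to_targets, target_to_tfs = build_indexes(pairs)

# Service 3: correlate a TF's expression with each of its targets.
correlations = {
    gene: pearson(expression["TF1"], expression[gene])
    for gene in sorted(tf_to_targets["TF1"])
}
print(correlations)
```

Keeping both a forward and a reverse index makes the first two lookups constant-time, at the cost of storing each relationship twice. A relational database would express the same idea with one table and two indexed columns.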
ABSTRACT
BACKGROUND: Alternative splicing has been proposed to increase transcript diversity and protein plasticity in eukaryotic organisms, but the extent to which this is the case is currently unclear, especially with regard to the diversification of molecular function. Eukaryotic splicing involves complex interactions of splicing factors and their targets. Inference of co-splicing networks capturing these types of interactions is important for understanding this crucial, highly regulated post-transcriptional process at the systems level. RESULTS: First, several transcript and protein attributes, including coding potential of transcripts and differences in functional domains of proteins, were compared between splice variants and protein isoforms to assess transcript and protein diversity in a biological system. Alternative splicing was shown to increase transcript and function-related protein diversity in developing Arabidopsis embryos. Second, CoSpliceNet, which integrates co-expression and motif discovery at splicing regulatory regions to infer co-splicing networks, was developed. CoSpliceNet was applied to temporal RNA sequencing data to identify candidate regulators of splicing events and predict RNA-binding motifs, some of which are supported by prior experimental evidence. Analysis of inferred splicing factor targets revealed an unexpected role for the unfolded protein response in embryo development. CONCLUSIONS: The methods presented here can be used in any biological system to assess transcript diversity and protein plasticity and to predict candidate regulators, their targets, and RNA-binding motifs for splicing factors. CoSpliceNet is freely available at http://delasa.github.io/co-spliceNet/ .
Subjects
Computational Biology/methods; Gene Expression Regulation; Gene Regulatory Networks; RNA Splicing; Transcription, Genetic; Alternative Splicing; Arabidopsis/genetics; Arabidopsis/metabolism; Embryonic Development/genetics; Gene Expression Regulation, Developmental; Gene Expression Regulation, Plant; Protein Isoforms; Seeds/genetics; Seeds/metabolism
ABSTRACT
Gene regulatory networks (GRNs) provide a representation of relationships between regulators and their target genes. Several methods for GRN inference, both unsupervised and supervised, have been developed to date. Because regulatory relationships consistently reprogram in diverse tissues or under different conditions, GRNs inferred without specific biological contexts are of limited applicability. In this report, a machine learning approach is presented to predict GRNs specific to developing Arabidopsis thaliana embryos. We developed the Beacon GRN inference tool to predict GRNs occurring during seed development in Arabidopsis based on a support vector machine (SVM) model. We developed both global and local inference models and compared their performance, demonstrating that local models are generally superior for our application. Using both the expression levels of the genes expressed in developing embryos and prior known regulatory relationships, GRNs were predicted for specific embryonic developmental stages. The targets that are strongly positively correlated with their regulators are mostly expressed at the beginning of seed development. Potential direct targets were identified based on a match between the promoter regions of these inferred targets and the cis elements recognized by specific regulators. Our analysis also provides evidence for previously unknown inhibitory effects of three positive regulators of gene expression. The Beacon GRN inference tool provides a valuable model system for context-specific GRN inference and is freely available at https://github.com/BeaconProjectAtVirginiaTech/beacon_network_inference.git.
ABSTRACT
Developing Arabidopsis seeds accumulate oils and seed storage proteins synthesized by the pathways of primary metabolism. Seed development and metabolism are positively regulated by transcription factors belonging to the LAFL (LEC1, ABI3, FUSCA3 and LEC2) regulatory network. The VAL gene family encodes repressors of the seed maturation program in germinating seeds, although they are also expressed during seed maturation. The possible regulatory role of VAL1 in seed development has not been studied to date. Reverse genetics revealed that val1 mutant seeds accumulated elevated levels of proteins compared with the wild type, suggesting that VAL1 functions as a repressor of seed metabolism; however, in the absence of VAL1, the levels of metabolites, ABA, auxin and jasmonate derivatives did not change significantly in developing embryos. Two VAL1 splice variants were identified through RNA sequencing analysis: a full-length form and a truncated form lacking the plant homeodomain-like domain associated with epigenetic repression. None of the transcripts encoding the core LAFL network transcription factors were affected in val1 embryos. Instead, activation of VAL1 by FUSCA3 appears to result in the repression of a subset of seed maturation genes downstream of core LAFL regulators, as 39% of transcripts in the FUSCA3 regulon were derepressed in the val1 mutant. The LEC1 and LEC2 regulons also responded, but to a lesser extent. An additional 832 transcripts that were not LAFL targets were derepressed in val1 mutant embryos. These transcripts are candidate targets of VAL1, acting through epigenetic and/or transcriptional repression.
Subjects
Arabidopsis Proteins/metabolism; Arabidopsis/embryology; Arabidopsis/metabolism; Gene Expression Regulation, Plant; Repressor Proteins/metabolism; Transcription Factors/metabolism; Arabidopsis/genetics; Arabidopsis Proteins/genetics; CCAAT-Enhancer-Binding Proteins/genetics; CCAAT-Enhancer-Binding Proteins/metabolism; Gene Expression Regulation, Developmental/genetics; Gene Expression Regulation, Plant/genetics; Repressor Proteins/genetics; Transcription Factors/genetics
ABSTRACT
BACKGROUND: Transcriptomics reveals the existence of transcripts of different coding potential and strand orientation. Alternative splicing (AS) can yield proteins with altered number and types of functional domains, suggesting the global occurrence of transcriptional and post-transcriptional events. Many biological processes, including seed maturation and desiccation, are regulated post-transcriptionally (e.g., by AS), leading to the production of more than one coding or noncoding sense transcript from a single locus. RESULTS: We present an integrated computational framework to predict isoform-specific functions of plant transcripts. This framework includes a novel plant-specific weighted support vector machine classifier called CodeWise, which predicts the coding potential of transcripts with over 96% accuracy, and several other tools enabling global sequence similarity, functional domain, and co-expression network analyses. First, this framework was applied to all detected transcripts (103,106), of which 13% were predicted by CodeWise to be noncoding RNAs in developing soybean embryos. Second, to investigate the role of AS during soybean embryo development, a population of 2,938 alternatively spliced and differentially expressed splice variants was analyzed and mined with respect to timing of expression. Conserved domain analyses revealed that AS resulted in global changes in the number, types, and extent of truncation of functional domains in protein variants. Isoform-specific co-expression network analysis using ArrayMining and clustering analyses revealed specific sub-networks and potential interactions among the components of selected signaling pathways related to seed maturation and the acquisition of desiccation tolerance. These signaling pathways involved abscisic acid- and FUSCA3-related transcripts, several of which were classified as noncoding and/or antisense transcripts and were co-expressed with corresponding coding transcripts. Noncoding and antisense transcripts likely play important regulatory roles in seed maturation- and desiccation-related signaling in soybean. CONCLUSIONS: This work demonstrates how our integrated framework can be implemented to make experimentally testable predictions regarding the coding potential, co-expression, co-regulation, and function of transcripts and proteins related to a biological process of interest.
Subjects
Alternative Splicing; Gene Expression Regulation, Plant; Glycine max/genetics; Transcriptome; Genes, Plant; RNA, Plant; Seeds/genetics; Glycine max/embryology
ABSTRACT
Soybean (Glycine max) seeds are an important source of seed storage compounds, including protein, oil, and sugar used for food, feed, chemical, and biofuel production. We assessed detailed temporal transcriptional and metabolic changes in developing soybean embryos to gain a systems biology view of developmental and metabolic changes and to identify potential targets for metabolic engineering. Two major developmental and metabolic transitions were captured enabling identification of potential metabolic engineering targets specific to seed filling and to desiccation. The first transition involved a switch between different types of metabolism in dividing and elongating cells. The second transition involved the onset of maturation and desiccation tolerance during seed filling and a switch from photoheterotrophic to heterotrophic metabolism. Clustering analyses of metabolite and transcript data revealed clusters of functionally related metabolites and transcripts active in these different developmental and metabolic programs. The gene clusters provide a resource to generate predictions about the associations and interactions of unknown regulators with their targets based on "guilt-by-association" relationships. The inferred regulators also represent potential targets for future metabolic engineering of relevant pathways and steps in central carbon and nitrogen metabolism in soybean embryos and drought and desiccation tolerance in plants.
ABSTRACT
Developing soybean seeds accumulate oils, proteins, and carbohydrates that are used as oxidizable substrates providing metabolic precursors and energy during seed germination. The accumulation of these storage compounds in developing seeds is highly regulated at multiple levels, including the transcriptional and post-transcriptional levels. RNA sequencing was used to provide comprehensive information about transcriptional and post-transcriptional events that take place in developing soybean embryos. Bioinformatics analyses led to the identification of different classes of alternatively spliced isoforms and corresponding changes in their levels on a global scale during soybean embryo development. Alternative splicing was associated with transcripts involved in various metabolic and developmental processes, including central carbon and nitrogen metabolism, induction of maturation and dormancy, and splicing itself. Detailed examination of selected RNA isoforms revealed alterations in individual domains that could result in changes in subcellular localization of the resulting proteins, protein-protein and enzyme-substrate interactions, and regulation of protein activities. Different isoforms may play an important role in regulating developmental and metabolic processes occurring at different stages in developing oilseed embryos.