Results 1 - 9 of 9
1.
PLoS Comput Biol ; 17(9): e1008991, 2021 09.
Article in English | MEDLINE | ID: mdl-34570758

ABSTRACT

Identification of biopolymer motifs represents a key step in the analysis of biological sequences. The MEME Suite is a widely used toolkit for comprehensive analysis of biopolymer motifs; however, these tools are poorly integrated within popular analysis frameworks like the R/Bioconductor project, creating barriers to their use. Here we present memes, an R package that provides a seamless R interface to a selection of popular MEME Suite tools. memes provides a novel "data aware" interface to these tools, enabling rapid and complex discriminative motif analysis workflows. In addition to interfacing with popular MEME Suite tools, memes leverages existing R/Bioconductor data structures to store the multidimensional data returned by MEME Suite tools for rapid data access and manipulation. Finally, memes provides data visualization capabilities to facilitate communication of results. memes is available as a Bioconductor package at https://bioconductor.org/packages/memes, and the source code can be found at github.com/snystrom/memes.


Subject(s)
Amino Acid Motifs , Computational Biology/methods , Nucleotide Motifs , Software , Animals , Chromatin Immunoprecipitation Sequencing/statistics & numerical data , Computational Biology/statistics & numerical data , Data Interpretation, Statistical , Humans
2.
PLoS Comput Biol ; 17(9): e1009368, 2021 09.
Article in English | MEDLINE | ID: mdl-34473698

ABSTRACT

The ChIP-seq signal of histone modifications at promoters is a good predictor of gene expression in different cellular contexts, but whether this is also true at enhancers is not clear. To address this issue, we develop quantitative models to characterize the relationship of gene expression with histone modifications at enhancers or promoters. We use embryonic stem cells (ESCs), which contain a full spectrum of active and repressed (poised) enhancers, to train predictive models. As many poised enhancers in ESCs switch towards an active state during differentiation, predictive models can also be trained on poised enhancers throughout differentiation and in development. Remarkably, we determine that histone modifications at enhancers, as well as promoters, are predictive of gene expression in ESCs and throughout differentiation and development. Importantly, we demonstrate that their contribution to the predictive models varies depending on their location in enhancers or promoters. Moreover, we use a local regression (LOESS) to normalize sequencing data from different sources, which allows us to apply predictive models trained in a specific cellular context to a different one. We conclude that the relationship between gene expression and histone modifications at enhancers is universal and different from promoters. Our study provides new insight into how histone modifications relate to gene expression based on their location in enhancers or promoters.
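
The abstract mentions using LOESS to normalize sequencing data from different sources. As a rough illustration only, the Python sketch below shows one common form of that idea: MA-style LOESS normalization of a query sample against a reference over matched regions. The function names, the log2(x + 1) transform, the tricube kernel, and the `frac` span are assumptions, not the authors' published procedure.

```python
import math


def loess_fit(x, y, frac=0.5):
    """Locally weighted linear regression (tricube kernel), one fit per point.

    Returns the smoothed value of y at every x, in input order.
    """
    n = len(x)
    k = min(n, max(2, int(math.ceil(frac * n))))  # neighbours per local fit
    fitted = []
    for xi in x:
        # Bandwidth = distance to the k-th nearest neighbour (tiny floor avoids /0).
        h = sorted(abs(xj - xi) for xj in x)[k - 1] or 1e-12
        w = [(1 - (abs(xj - xi) / h) ** 3) ** 3 if abs(xj - xi) < h else 0.0
             for xj in x]
        sw = sum(w)  # >= 1 because xi itself always gets weight 1
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted


def ma_loess_normalize(reference, query, frac=0.5):
    """Remove intensity-dependent bias of `query` relative to `reference`.

    Both inputs are non-negative signals over the same regions (hypothetical
    inputs, e.g. histone-mark read counts at matched enhancers).
    """
    r = [math.log2(v + 1) for v in reference]
    q = [math.log2(v + 1) for v in query]
    m = [qi - ri for qi, ri in zip(q, r)]        # log ratio (M)
    a = [(qi + ri) / 2 for qi, ri in zip(q, r)]  # mean log signal (A)
    trend = loess_fit(a, m, frac)                # bias as a function of A
    # Subtract the fitted trend from the query's log signal, then undo log2(x + 1).
    return [2 ** (qi - ti) - 1 for qi, ti in zip(q, trend)]
```

After normalization the log ratio M no longer trends with A, which is the property that makes signal from the two sources comparable. For genome-scale inputs, a vectorized LOESS (for example, the `lowess` smoother in statsmodels) would be more practical than this quadratic-time loop.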


Subject(s)
Enhancer Elements, Genetic , Gene Expression , Histone Code/genetics , Models, Genetic , Promoter Regions, Genetic , Animals , Cell Differentiation/genetics , Cells, Cultured , Chromatin Immunoprecipitation Sequencing/statistics & numerical data , Computational Biology , Humans , Mice , Mouse Embryonic Stem Cells/metabolism , Regression Analysis
3.
PLoS Comput Biol ; 17(7): e1009203, 2021 07.
Article in English | MEDLINE | ID: mdl-34292930

ABSTRACT

Transcription factors (TFs) often function as a module including both master factors and mediators binding at cis-regulatory regions to modulate nearby gene transcription. ChIP-seq profiling of multiple TFs makes it feasible to infer functional TF modules. However, when inferring TF modules based on co-localization of ChIP-seq peaks, many weak binding events are often missed, especially for mediators, resulting in incomplete identification of modules. To address this problem, we develop a ChIP-seq data-driven Gibbs Sampler to infer Modules (ChIP-GSM) using a Bayesian framework that integrates ChIP-seq profiles of multiple TFs. ChIP-GSM samples read counts of module TFs iteratively to estimate the binding potential of a module to each region and, across all regions, estimates the module abundance. Using inferred module-region probabilistic bindings as feature units, ChIP-GSM then employs logistic regression to predict active regulatory elements. Validation of ChIP-GSM-predicted regulatory regions on multiple independent datasets sharing the same context confirms the advantage of using TF modules for predicting regulatory activity. In a case study of K562 cells, we demonstrate that ChIP-GSM-inferred modules form groups, activate gene expression at different time points, and mediate diverse functional cellular processes. Hence, ChIP-GSM infers biologically meaningful TF modules and improves the prediction accuracy of regulatory region activities.
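
The final ChIP-GSM step feeds module-region binding probabilities into a logistic regression that predicts active regulatory elements. The sketch below is a generic, dependency-free logistic regression fitted by batch gradient descent, included only to make that step concrete. The toy feature matrix (one column per hypothetical module), learning rate, and epoch count are illustrative assumptions; the authors' implementation may differ.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000, l2=0.0):
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        for j in range(d):
            w[j] -= lr * (gw[j] / n + l2 * w[j])
        b -= lr * gb / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that a region with features x is active."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Toy example: rows are regions, columns are hypothetical binding
# probabilities for two modules; labels mark known active regions.
X = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.3], [0.1, 0.2], [0.2, 0.1], [0.15, 0.3]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)
print(round(predict_proba(w, b, [0.9, 0.2]), 3))
```

Fitted weights can be read as how strongly each module's binding contributes to predicted activity, which is one reason a linear model is a natural choice for this kind of feature set.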


Subject(s)
Chromatin Immunoprecipitation Sequencing/methods , Gene Regulatory Networks , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Bayes Theorem , Binding Sites/genetics , Chromatin/genetics , Chromatin/metabolism , Chromatin Immunoprecipitation Sequencing/statistics & numerical data , Computational Biology , Enhancer Elements, Genetic , Epigenesis, Genetic , Gene Expression Regulation , Humans , K562 Cells , MCF-7 Cells , Models, Statistical , Promoter Regions, Genetic
4.
Commun Biol ; 4(1): 661, 2021 06 02.
Article in English | MEDLINE | ID: mdl-34079046

ABSTRACT

Detecting changes in the activity of a transcription factor (TF) in response to a perturbation provides insights into the underlying cellular process. Transcription Factor Enrichment Analysis (TFEA) is a robust and reliable computational method that detects positional motif enrichment associated with changes in transcription observed in response to a perturbation. TFEA detects positional motif enrichment within a list of ranked regions of interest (ROIs), typically sites of RNA polymerase initiation inferred from regulatory data such as nascent transcription. Therefore, we also introduce muMerge, a statistically principled method of generating a consensus list of ROIs from multiple replicates and conditions. TFEA is broadly applicable to data that informs on transcriptional regulation, including nascent transcription (e.g., PRO-Seq), CAGE, histone ChIP-Seq, and accessibility data (e.g., ATAC-Seq). TFEA not only identifies the key regulators responding to a perturbation, but also temporally unravels regulatory networks with time series data. Consequently, TFEA serves as a hypothesis-generating tool that provides an easy, rigorous, and cost-effective means to broadly assess TF activity, yielding new biological insights.
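
TFEA scores positional motif enrichment over regions ranked by their change in transcription. The sketch below illustrates the general shape of such a positional, rank-based statistic using a GSEA-style running sum. The exponential distance weighting, the 150 bp scale, the use of the region center as the assumed initiation site, and the area-based score are all assumptions chosen for illustration, not TFEA's published definition.

```python
import math


def positional_weight(distance_bp, scale_bp=150.0):
    """Hypothetical weight: motif hits close to the region center count more.

    `distance_bp` is the motif-to-center distance; None means no motif hit.
    """
    if distance_bp is None:
        return 0.0
    return math.exp(-abs(distance_bp) / scale_bp)


def enrichment_score(ranked_distances, scale_bp=150.0):
    """Signed area between the cumulative weighted-hit curve and a uniform line.

    `ranked_distances` must be ordered by rank (e.g. largest increase in
    transcription first). Positive values mean weighted motif hits concentrate
    toward the top of the ranking; negative values, toward the bottom.
    """
    weights = [positional_weight(d, scale_bp) for d in ranked_distances]
    total = sum(weights)
    n = len(weights)
    if n == 0 or total == 0:
        return 0.0
    cumulative, area = 0.0, 0.0
    for i, w in enumerate(weights, start=1):
        cumulative += w / total
        area += cumulative - i / n  # deviation from an even spread of hits
    return area / n
```

Significance would typically be judged against scores from randomly permuted rankings; that step is omitted here for brevity.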


Subject(s)
Transcription Factors/metabolism , Breast/cytology , Breast/metabolism , Cell Line , Chromatin Immunoprecipitation Sequencing/statistics & numerical data , Computational Biology/methods , Computer Simulation , Dexamethasone/pharmacology , Epithelial Cells/metabolism , Female , Gene Expression Regulation , Genetic Techniques/statistics & numerical data , HCT116 Cells , Humans , Imidazoles/pharmacology , Piperazines/pharmacology , Receptors, Glucocorticoid/drug effects , Receptors, Glucocorticoid/metabolism , Transcription Factors/genetics , Transcription, Genetic , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/metabolism
5.
J Invest Dermatol ; 141(7): 1745-1753, 2021 07.
Article in English | MEDLINE | ID: mdl-33607116

ABSTRACT

Psoriasis is a complex, chronic inflammatory skin disease characterized by keratinocyte hyperproliferation and a disordered immune response; however, its exact etiology remains unknown. To better understand the regulatory network underlying psoriasis, we explored the landscape of chromatin accessibility by using an assay for transposase-accessible chromatin using sequencing analysis of 15 psoriatic, 9 nonpsoriatic, and 19 normal skin tissue samples, and the chromatin accessibility data were integrated with genomic, epigenomic, and transcriptomic datasets. We identified 4,915 genomic regions that displayed differential accessibility in psoriatic samples compared with both nonpsoriatic and normal samples, nearly all of which exhibited an increased accessibility in psoriatic skin tissue. These differentially accessible regions tended to be more hypomethylated and correlated with the expression of their linked genes, which comprised several psoriasis susceptibility loci. Analyses of the differentially accessible region sequences showed that they were most highly enriched with FRA1 and/or activator protein-1 transcription factor DNA-binding motifs. We also found that AIM2, which encodes an important inflammasome component that triggers skin inflammation, is a direct target of FRA1 and/or activator protein-1. Our study provided clear insights and resources for an improved understanding of the pathogenesis of psoriasis. These disease-associated accessible regions might serve as therapeutic targets for psoriasis treatment in the future.


Subject(s)
Chromatin/metabolism , Gene Regulatory Networks/immunology , Psoriasis/genetics , Transposases/metabolism , Case-Control Studies , Chromatin Immunoprecipitation Sequencing/statistics & numerical data , DNA Methylation , Datasets as Topic , Epigenomics , Female , Humans , Inflammasomes/genetics , Inflammasomes/immunology , Male , Psoriasis/immunology , Psoriasis/pathology , RNA-Seq/statistics & numerical data , Skin/immunology , Skin/pathology
6.
PLoS Comput Biol ; 16(11): e1008405, 2020 11.
Article in English | MEDLINE | ID: mdl-33166290

ABSTRACT

Given the complexity and diversity of cancer genomics profiles, it is challenging to identify distinct clusters from different cancer types. Numerous analyses have been conducted for this purpose. Still, the methods they used do not directly support high-dimensional omics data across the whole genome (such as ATAC-seq profiles). In this study, based on deep adversarial learning, we present ClusterATAC, an end-to-end approach that leverages high-dimensional features to explore the classification results. On the ATAC-seq dataset and RNA-seq dataset, ClusterATAC has achieved excellent performance. Since ATAC-seq data plays a crucial role in the study of the effects of non-coding regions on the molecular classification of cancers, we explore the clustering solution obtained by ClusterATAC on the pan-cancer ATAC dataset. In this solution, more than 70% of the clusters are dominated by a single tumor type, and the vast majority of the remaining clusters are associated with similar tumor types. We explore the representative non-coding loci of each cluster and their linked genes, and verify some results by literature search. These results suggest that a large number of non-coding loci affect the development and progression of cancer through their linked genes, which can potentially advance cancer diagnosis and therapy.


Subject(s)
Chromatin Immunoprecipitation Sequencing/statistics & numerical data , Deep Learning , Neoplasms/classification , Neoplasms/genetics , Chromatin/genetics , Computational Biology , Databases, Nucleic Acid/statistics & numerical data , Genomics/methods , Genomics/statistics & numerical data , Humans , Multigene Family , Normal Distribution , Oncogenes , RNA-Seq/statistics & numerical data
7.
Nat Commun ; 11(1): 2696, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32483223

ABSTRACT

Conversion between cell types, e.g., by induced expression of master transcription factors, holds great promise for cellular therapy. Our ability to manipulate cell identity is constrained by incomplete information on cell identity genes (CIGs) and their expression regulation. Here, we develop CEFCIG, an artificial intelligence framework to uncover CIGs and further define their master regulators. Using machine learning, CEFCIG reveals unique histone codes for transcriptional regulation of reported CIGs, and uses these codes to predict CIGs and their master regulators with high accuracy. Applying CEFCIG to 1,005 epigenetic profiles, our analysis uncovers the landscape of regulatory networks for identity genes in individual cell or tissue types. Together, this work provides insights into cell identity regulation, and delivers a powerful technique to facilitate regenerative medicine.


Subject(s)
Cells/classification , Cells/metabolism , Histone Code , Machine Learning , Algorithms , Cells/cytology , Chromatin Immunoprecipitation Sequencing/statistics & numerical data , Databases, Genetic/statistics & numerical data , Epigenesis, Genetic , Gene Expression Regulation , Gene Regulatory Networks , Human Umbilical Vein Endothelial Cells/cytology , Human Umbilical Vein Endothelial Cells/metabolism , Humans , Phenotype , Pluripotent Stem Cells/cytology , Pluripotent Stem Cells/metabolism , RNA-Seq/statistics & numerical data , Regenerative Medicine , Transcription Factors/metabolism
8.
PLoS Comput Biol ; 15(8): e1007227, 2019 08.
Article in English | MEDLINE | ID: mdl-31425505

ABSTRACT

RNA-protein interactions play important roles in post-transcriptional regulation. Recent advances in cross-linking and immunoprecipitation followed by sequencing (CLIP-seq) technologies make it possible to detect the binding peaks of a given RNA-binding protein (RBP) at transcriptome scale. However, it is still challenging to predict the functional consequences of RBP binding peaks. In this study, we propose the Protein-RNA Association Strength (PRAS), which integrates the intensities and positions of RBP binding peaks to predict functional mRNA targets. We illustrate the superiority of PRAS over existing approaches in predicting the functional targets of two related but divergent CELF (CUGBP, ELAV-like factor) RBPs in mouse brain and muscle. We also demonstrate the potential of PRAS for wide adoption by applying it to the enhanced CLIP-seq (eCLIP) datasets of 37 RNA decay-related RBPs in two human cell lines. PRAS can be used to investigate any RBP with available CLIP-seq peaks. PRAS is freely available at http://ouyanglab.jax.org/pras/.
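
PRAS combines the intensity and the position of each CLIP-seq peak into a per-gene score. The abstract does not give the formula, so the sketch below only shows one plausible shape for such a score: a sum of peak intensities, each down-weighted by an exponential decay in its distance to a reference site on the transcript. The tuple layout, the choice of reference site, and the `decay_bp` constant are all hypothetical and should not be read as PRAS's definition.

```python
import math
from collections import defaultdict


def association_scores(peaks, decay_bp=1000.0):
    """Aggregate peaks into one score per gene.

    `peaks` is an iterable of (gene_id, intensity, distance_bp) tuples, where
    distance_bp is a hypothetical distance from the peak to a chosen
    reference site on the transcript.
    """
    scores = defaultdict(float)
    for gene, intensity, distance in peaks:
        # Stronger peaks add more; peaks far from the reference site add less.
        scores[gene] += intensity * math.exp(-abs(distance) / decay_bp)
    return dict(scores)


def rank_targets(scores):
    """Genes ordered from strongest to weakest association (ties by name)."""
    return sorted(scores, key=lambda g: (-scores[g], g))
```

Summing over peaks lets several moderate, well-placed peaks outweigh a single strong but distant one, which is the kind of trade-off a position-aware score is meant to capture.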


Subject(s)
Chromatin Immunoprecipitation Sequencing/statistics & numerical data , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA-Binding Proteins/metabolism , Software , Animals , Base Sequence , Binding Sites/genetics , Brain/metabolism , CELF Proteins/genetics , CELF Proteins/metabolism , Computational Biology , Databases, Protein , Gene Expression Profiling , Hep G2 Cells , Humans , K562 Cells , Mice , Muscles/metabolism , RNA-Binding Proteins/genetics
9.
Curr Med Chem ; 26(42): 7641-7654, 2019.
Article in English | MEDLINE | ID: mdl-29848263

ABSTRACT

BACKGROUND: Transcription factors are DNA-binding proteins that play key roles in many fundamental biological processes. Unraveling their interactions with DNA is essential to identify their target genes and understand the regulatory network. Genome-wide identification of their binding sites has become feasible thanks to recent progress in experimental and computational approaches. ChIP-chip, ChIP-seq, and ChIP-exo are three widely used techniques to demarcate genome-wide transcription factor binding sites. OBJECTIVE: This review aims to provide an overview of these three techniques, including their experimental procedures, computational approaches, and popular analytic tools. CONCLUSION: ChIP-chip, ChIP-seq, and ChIP-exo have been the major techniques to study genome-wide in vivo protein-DNA interaction. Due to the rapid development of next-generation sequencing technology, array-based ChIP-chip has been deprecated and ChIP-seq has become the most widely used technique to identify transcription factor binding sites genome-wide. The newly developed ChIP-exo further improves the spatial resolution to the single-nucleotide level. Numerous tools have been developed to analyze ChIP-chip, ChIP-seq, and ChIP-exo data. However, different programs may employ different mechanisms or underlying algorithms, so each will inherently carry its own set of statistical assumptions and biases. Choosing the most appropriate analytic program for a given experiment therefore requires careful consideration. Moreover, most programs only have a command-line interface, so their installation and usage require basic computational expertise in Unix/Linux.
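
Many ChIP-seq peak callers judge whether a window is enriched by asking how surprising its read count is under a Poisson background; MACS, for example, takes the background rate as the largest of a genome-wide rate and rates estimated from the control in surrounding windows. The Python sketch below illustrates that general idea only. The function names and the way local rates are passed in are assumptions, and real tools add many refinements (fragment-size estimation, duplicate handling, multiple-testing correction).

```python
import math


def _poisson_pmf(i, lam):
    """P(X = i) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    if k <= lam:
        # The tail is large here, so 1 - CDF loses no meaningful precision.
        cdf = sum(_poisson_pmf(i, lam) for i in range(k))
        return max(0.0, 1.0 - cdf)
    # k > lam: successive terms shrink, so sum the upper tail directly.
    total, i = 0.0, k
    term = _poisson_pmf(k, lam)
    while term > 0.0 and term > total * 1e-16:
        total += term
        i += 1
        term *= lam / i
    return min(1.0, total)


def peak_pvalue(observed, lambda_bg, local_lambdas=()):
    """Test a window's read count against the most conservative background rate.

    `local_lambdas` are hypothetical expected counts derived from the control
    sample in surrounding windows of different sizes.
    """
    lam = max([lambda_bg, *local_lambdas])
    return poisson_upper_tail(observed, lam)
```

Taking the maximum of the candidate rates makes the test conservative: a window in a region where the control itself is elevated needs more reads before it is called a peak.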


Subject(s)
DNA-Binding Proteins/metabolism , DNA/metabolism , Genome , Transcription Factors/metabolism , Animals , Binding Sites , Chromatin Immunoprecipitation Sequencing/methods , Chromatin Immunoprecipitation Sequencing/statistics & numerical data , Computational Biology , Humans , Protein Binding