Results 1 - 20 of 23
1.
Brief Bioinform ; 19(5): 1069-1081, 2018 09 28.
Article in English | MEDLINE | ID: mdl-28334268

ABSTRACT

Transcription factors are proteins that bind to specific DNA sequences and play important roles in controlling the expression levels of their target genes. Hence, prediction of transcription factor binding sites (TFBSs) provides a solid foundation for inferring gene regulatory mechanisms and building regulatory networks for a genome. Chromatin immunoprecipitation sequencing (ChIP-seq) technology can generate large-scale experimental data for such protein-DNA interactions, providing an unprecedented opportunity to identify TFBSs (a.k.a. cis-regulatory motifs). The bottleneck, however, is the lack of robust mathematical models and of efficient computational methods for TFBS prediction that make effective use of the massive ChIP-seq data sets in the public domain. The purpose of this study is to review existing motif-finding methods for ChIP-seq data from an algorithmic perspective and provide new computational insight into this field. The state of the art is presented by summarizing eight representative motif-finding algorithms along with their corresponding challenges, and by introducing some important related functions that address specific biological demands, including discriminative motif finding and cofactor motif analysis. Finally, potential directions and plans for ChIP-seq-based motif-finding tools are outlined in support of future algorithm development.


Subject(s)
Algorithms , Gene Regulatory Networks , Software , Base Sequence , Binding Sites/genetics , Chromatin Immunoprecipitation/statistics & numerical data , Computational Biology/methods , DNA/genetics , DNA/metabolism , Humans , Sequence Analysis, DNA/statistics & numerical data , Transcription Factors/metabolism
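
The record above reviews motif-finding algorithms without describing any one of them in detail. As a rough illustration of the scoring step many such methods share, the sketch below builds a log-odds position weight matrix from aligned sites and scans a sequence for its best match. The function names, the toy sites, the uniform background, and the pseudocount are all assumptions made for this example and are not taken from the reviewed paper.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from aligned, equal-length sites."""
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        # Start every base at the pseudocount so unseen bases keep a finite score.
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in "ACGT"})
    return pwm

def best_hit(pwm, sequence):
    """Return (score, offset) of the highest-scoring window in the sequence."""
    width = len(pwm)
    best = None
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if best is None or score > best[0]:
            best = (score, offset)
    return best

# Hypothetical toy binding sites, used only to exercise the functions.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTCA"]
pwm = build_pwm(sites)
score, offset = best_hit(pwm, "CCGTTGACTCAGGA")
```

De novo motif finders must also learn the matrix itself from unaligned ChIP-seq peaks, which this sketch does not attempt.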
2.
BMC Genomics ; 20(1): 6, 2019 Jan 05.
Article in English | MEDLINE | ID: mdl-30611200

ABSTRACT

BACKGROUND: Sequencing data has become a standard measure of diverse cellular activities. For example, gene expression is accurately measured by RNA sequencing (RNA-Seq) libraries, protein-DNA interactions are captured by chromatin immunoprecipitation sequencing (ChIP-Seq), protein-RNA interactions by crosslinking immunoprecipitation sequencing (CLIP-Seq) or RNA immunoprecipitation sequencing (RIP-Seq), and DNA accessibility by assay for transposase-accessible chromatin (ATAC-Seq), DNase or MNase sequencing libraries. Processing the data from these sequencing techniques involves library-specific approaches. However, in all cases, once the sequencing libraries are processed, the result is a count table specifying the estimated number of reads originating from each genomic locus. Differential analysis to determine which loci have different cellular activity under different conditions starts with the count table and iterates through a cycle of data assessment, preparation and analysis. Such complex analysis often relies on multiple programs and is therefore a challenge for those without programming skills. RESULTS: We developed DEBrowser as an R Bioconductor project to interactively visualize every step of the differential analysis, without programming. The application provides a rich and interactive web-based graphical user interface built on R's Shiny infrastructure. DEBrowser allows users to visualize data with various types of graphs that can be explored further by selecting and re-plotting any desired subset of data. Using the visualization approaches provided, users can identify and correct technical variations such as batch effects and sequencing depth that affect differential analysis. We show DEBrowser's ease of use by reproducing the analysis of two previously published data sets. CONCLUSIONS: DEBrowser is a flexible, intuitive, web-based analysis platform that enables an iterative and interactive analysis of count data without any requirement of programming knowledge.


Subject(s)
Chromatin Immunoprecipitation/statistics & numerical data , Genome, Human/genetics , Sequence Analysis, RNA/statistics & numerical data , Software , Chromatin/genetics , DNA/genetics , DNA-Binding Proteins/genetics , Data Interpretation, Statistical , Genomics/statistics & numerical data , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Sequence Analysis, DNA
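
The abstract above names sequencing depth as a technical variation that affects differential analysis of a count table. One common and simple correction is counts-per-million scaling, sketched below. This is a generic illustration, not DEBrowser's own code or API; the function name and the toy count table are assumptions made for the example.

```python
def counts_per_million(counts_by_sample):
    """Rescale each sample's locus counts so that they sum to one million."""
    normalized = {}
    for sample, counts in counts_by_sample.items():
        depth = sum(counts.values())
        if depth == 0:
            raise ValueError(f"sample {sample!r} has no reads")
        normalized[sample] = {
            locus: count * 1_000_000 / depth for locus, count in counts.items()
        }
    return normalized

# Hypothetical count table: the treated library is sequenced twice as deeply.
counts = {
    "control": {"geneA": 100, "geneB": 300, "geneC": 600},
    "treated": {"geneA": 400, "geneB": 600, "geneC": 1000},
}
cpm = counts_per_million(counts)
```

After scaling, geneB has the same value in both samples even though its raw count doubled, so the raw difference is explained by depth alone.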
3.
PLoS Comput Biol ; 14(4): e1006090, 2018 04.
Article in English | MEDLINE | ID: mdl-29684008

ABSTRACT

Genome-wide in vivo protein-DNA interactions are routinely mapped using high-throughput chromatin immunoprecipitation (ChIP). ChIP-reported regions are typically investigated for enriched sequence motifs, which are likely to model the DNA-binding specificity of the profiled protein and/or of co-occurring proteins. However, simple enrichment analyses can miss insights into the binding activity of the protein. Note that ChIP reports regions making direct contact with the protein as well as those binding through intermediaries. For example, consider a ChIP experiment targeting protein X, which binds DNA at its cognate sites, but simultaneously interacts with four other proteins. Each of these proteins also binds to its own specific cognate sites along distant parts of the genome, a scenario consistent with the current view of transcriptional hubs and chromatin loops. Since ChIP will pull down all X-associated regions, the final reported data will be a union of five distinct sets of regions, each containing binding sites of one of the five proteins. Characterizing all five different motifs and the corresponding sets is important to interpret the ChIP experiment and, ultimately, the role of X in regulation. We present DIVERSITY, which attempts exactly this: it partitions the data so that each partition can be characterized with its own de novo motif. DIVERSITY uses a Bayesian approach to identify the optimal number of motifs and the associated partitions, which together explain the entire dataset. This is in contrast to standard motif finders, which report motifs individually enriched in the data, but do not necessarily explain all reported regions. We show that the different motifs and associated regions identified by DIVERSITY give insights into the various complexes that may be forming along the chromatin, something that has so far not been attempted from ChIP data.
Webserver at http://diversity.ncl.res.in/; standalone (Mac OS X/Linux) from https://github.com/NarlikarLab/DIVERSITY/releases/tag/v1.0.0.


Subject(s)
Chromatin Immunoprecipitation/statistics & numerical data , Software , Algorithms , Animals , Bayes Theorem , Binding Sites , Chromatin/genetics , Chromatin/metabolism , Computational Biology , DNA/genetics , DNA/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Evolution, Molecular , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Neurons/metabolism , Nucleotide Motifs , Protein Binding , Sequence Analysis, DNA/statistics & numerical data , Transcription Factors/genetics , Transcription Factors/metabolism
4.
Nucleic Acids Res ; 43(6): e40, 2015 Mar 31.
Article in English | MEDLINE | ID: mdl-25564527

ABSTRACT

RNA-seq is a sensitive and accurate technique to compare steady-state levels of RNA between different cellular states. However, as it does not provide an account of transcriptional activity per se, other technologies are needed to more precisely determine acute transcriptional responses. Here, we have developed an easy, sensitive and accurate novel computational method, iRNA-seq, for genome-wide assessment of transcriptional activity based on analysis of intron coverage from total RNA-seq data. Comparison of the results derived from iRNA-seq analyses with parallel results derived using current methods for genome-wide determination of transcriptional activity, i.e. global run-on (GRO)-seq and RNA polymerase II (RNAPII) ChIP-seq, demonstrates that iRNA-seq provides similar results in terms of number of regulated genes and their fold change. However, unlike the current methods, which are all very labor-intensive and demanding in terms of sample material and technologies, iRNA-seq is cheap and easy and requires very little sample material. In conclusion, iRNA-seq offers an attractive novel alternative to current methods for determination of changes in transcriptional activity at a genome-wide level.


Subject(s)
Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Cell Line , Chromatin Immunoprecipitation/methods , Chromatin Immunoprecipitation/statistics & numerical data , Gene Expression Profiling/statistics & numerical data , Gene Expression Regulation , Genome, Human , Humans , Introns , Sequence Analysis, RNA/statistics & numerical data
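
The abstract above reports fold changes in transcriptional activity estimated from intron coverage but does not state the formula. A minimal sketch of one plausible calculation is a depth-normalized log2 ratio of intronic read counts with a pseudocount. The function name, the pseudocount, and the example numbers are assumptions for illustration and should not be read as the published iRNA-seq procedure.

```python
import math

def intron_log2_fold_change(treated_reads, control_reads,
                            treated_depth, control_depth, pseudocount=1.0):
    """Log2 ratio of depth-normalized intronic read counts for one gene.

    The pseudocount keeps the ratio finite when a gene has no intronic reads.
    """
    treated_rate = (treated_reads + pseudocount) / treated_depth
    control_rate = (control_reads + pseudocount) / control_depth
    return math.log2(treated_rate / control_rate)

# Hypothetical gene: 399 intronic reads after stimulation versus 99 before,
# in two libraries of equal depth.
change = intron_log2_fold_change(399, 99, 1_000_000, 1_000_000)
```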
5.
Nucleic Acids Res ; 43(6): e38, 2015 Mar 31.
Article in English | MEDLINE | ID: mdl-25539918

ABSTRACT

Genome-wide chromatin immunoprecipitation (ChIP) studies have brought significant insight into the genomic localization of chromatin-associated proteins and histone modifications. The large amount of data generated by these analyses, however, requires approaches that enable rapid validation and analysis of biological relevance. Furthermore, there are still protein and modification targets that are difficult to detect using standard ChIP methods. To address these issues, we developed an immediate chromatin immunoprecipitation procedure which we call ZipChip. ZipChip significantly reduces the time required and increases sensitivity, allowing for rapid screening of multiple loci. Here we describe how ZipChIP enables detection of histone modifications (H3K4 mono- and trimethylation) and two yeast histone demethylases, Jhd2 and Rph1, which were previously difficult to detect using standard methods. Furthermore, we demonstrate the versatility of ZipChIP by analyzing the enrichment of the histone deacetylase Sir2 at heterochromatin in yeast and enrichment of the chromatin remodeler, PICKLE, at euchromatin in Arabidopsis thaliana.


Subject(s)
Chromatin Immunoprecipitation/methods , Real-Time Polymerase Chain Reaction/methods , Actins/genetics , Actins/metabolism , Arabidopsis/genetics , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Chromatin/genetics , Chromatin/metabolism , Chromatin Immunoprecipitation/statistics & numerical data , DNA Helicases/genetics , DNA Helicases/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Genes, Fungal , Genes, Plant , Histone Demethylases/genetics , Histone Demethylases/metabolism , Histones/genetics , Histones/metabolism , Jumonji Domain-Containing Histone Demethylases/genetics , Jumonji Domain-Containing Histone Demethylases/metabolism , Open Reading Frames , Promoter Regions, Genetic , Repressor Proteins/genetics , Repressor Proteins/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Silent Information Regulator Proteins, Saccharomyces cerevisiae/genetics , Silent Information Regulator Proteins, Saccharomyces cerevisiae/metabolism , Sirtuin 2/genetics , Sirtuin 2/metabolism
6.
Brief Bioinform ; 14(2): 225-37, 2013 Mar.
Article in English | MEDLINE | ID: mdl-22517426

ABSTRACT

Motif discovery has been one of the most widely studied problems in bioinformatics ever since genomic and protein sequences have been available. In particular, its application to the de novo prediction of putative over-represented transcription factor binding sites in nucleotide sequences has been, and still is, one of the most challenging flavors of the problem. Recently, novel experimental techniques like chromatin immunoprecipitation (ChIP) have been introduced, permitting the genome-wide identification of protein-DNA interactions. ChIP, applied to transcription factors and coupled with genome tiling arrays (ChIP on Chip) or next-generation sequencing technologies (ChIP-Seq) has opened new avenues in research, as well as posed new challenges to bioinformaticians developing algorithms and methods for motif discovery.


Subject(s)
High-Throughput Nucleotide Sequencing/statistics & numerical data , Regulatory Elements, Transcriptional , Transcription Factors/metabolism , Algorithms , Animals , Binding Sites/genetics , Chromatin Immunoprecipitation/statistics & numerical data , Computational Biology , Consensus Sequence , DNA/genetics , DNA/metabolism , Gene Expression Profiling/statistics & numerical data , Humans
7.
Biostatistics ; 13(1): 113-28, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21914728

ABSTRACT

Chromatin immunoprecipitation followed by next generation sequencing (ChIP-seq) is a powerful technique that is being used in a wide range of biological studies including genome-wide measurements of protein-DNA interactions, DNA methylation, and histone modifications. The vast amount of data and biases introduced by sequencing and/or genome mapping pose new challenges and call for effective methods and fast computer programs for statistical analysis. To systematically model ChIP-seq data, we build a dynamic signal profile for each chromosome and then model the profile using a fully Bayesian hidden Ising model. The proposed model naturally takes into account spatial dependency and global and local distributions of sequence tags. It can be used for one-sample and two-sample analyses. Through model diagnosis, the proposed method can detect falsely enriched regions caused by sequencing and/or mapping errors, which is usually not offered by the existing hypothesis-testing-based methods. The proposed method is illustrated using 3 transcription factor (TF) ChIP-seq data sets and 2 mixed ChIP-seq data sets and compared with 4 popular and/or well-documented methods: MACS, CisGenome, BayesPeak, and SISSRs. The results indicate that the proposed method achieves equivalent or higher sensitivity and spatial resolution in detecting TF binding sites with false discovery rate at a much lower level.


Subject(s)
Chromatin Immunoprecipitation/statistics & numerical data , Models, Statistical , Sequence Analysis, DNA/statistics & numerical data , Algorithms , Bayes Theorem , Binding Sites/genetics , Biotechnology , DNA/genetics , DNA/metabolism , Data Interpretation, Statistical , Databases, Nucleic Acid , Humans , Markov Chains , Transcription Factors/metabolism
8.
Hum Genomics ; 5(2): 117-23, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21296745

ABSTRACT

Chromatin immunoprecipitation followed by massively parallel next-generation sequencing (ChIP-seq) is a valuable experimental strategy for assaying protein-DNA interaction over the whole genome. Many computational tools have been designed to find the peaks of the signals corresponding to protein binding sites. In this paper, three computational methods used in ChIP-seq data analysis are reviewed: ChIP-seq processing pipeline (spp), PeakSeq and CisGenome. The methods are also compared for agreement and disagreement in the peaks they find, using the publicly available Signal Transducers and Activators of Transcription protein 1 (STAT1) and RNA polymerase II (PolII) datasets with corresponding negative controls.


Subject(s)
Chromatin Immunoprecipitation/methods , Sequence Analysis, DNA , Software , Chromatin Immunoprecipitation/statistics & numerical data , Humans , Protein Binding , RNA Polymerase II/genetics , Research Design , STAT1 Transcription Factor/genetics
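
The abstract above compares how peak callers agree and disagree but does not say how agreement was measured. One simple way to quantify it is the fraction of one caller's peaks overlapped by at least one peak from another caller, sketched below. The interval representation, the function names, and the toy peak lists are assumptions for this example and are not taken from the reviewed tools.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def shared_fraction(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a overlapped by at least one peak in peaks_b."""
    if not peaks_a:
        return 0.0
    hits = sum(1 for a in peaks_a if any(overlaps(a, b) for b in peaks_b))
    return hits / len(peaks_a)

# Hypothetical peak calls from two tools on the same data set.
caller_one = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 50, 120)]
caller_two = [("chr1", 150, 260), ("chr2", 300, 400)]

one_in_two = shared_fraction(caller_one, caller_two)
two_in_one = shared_fraction(caller_two, caller_one)
```

The measure is asymmetric, so both directions are worth reporting. The pairwise loop is quadratic and only suits small lists; genome-scale comparisons usually sort the intervals or use an interval tree.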
9.
Biometrics ; 66(4): 1284-94, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20128774

ABSTRACT

ChIP-chip experiments are procedures that combine chromatin immunoprecipitation (ChIP) and DNA microarray (chip) technology to study a variety of biological problems, including protein-DNA interaction, histone modification, and DNA methylation. The most important feature of ChIP-chip data is that the intensity measurements of probes are spatially correlated because the DNA fragments are hybridized to neighboring probes in the experiments. We propose a simple, but powerful Bayesian hierarchical approach to ChIP-chip data through an Ising model with high-order interactions. The proposed method naturally takes into account the intrinsic spatial structure of the data and can be used to analyze data from multiple platforms with different genomic resolutions. The model parameters are estimated using the Gibbs sampler. The proposed method is illustrated using two publicly available data sets from Affymetrix and Agilent platforms, and compared with three alternative Bayesian methods, namely, Bayesian hierarchical model, hierarchical gamma mixture model, and Tilemap hidden Markov model. The numerical results indicate that the proposed method performs as well as the other three methods for the data from Affymetrix tiling arrays, but significantly outperforms the other three methods for the data from Agilent promoter arrays. In addition, we find that the proposed method has better operating characteristics in terms of sensitivities and false discovery rates under various scenarios.


Subject(s)
Bayes Theorem , Chromatin Immunoprecipitation/statistics & numerical data , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Humans , Methods , Sensitivity and Specificity
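
The abstract above does not give the model's equations. As a rough orientation only, a first-order one-dimensional Ising prior over probe states, together with the Gibbs update it implies, can be written as below. The symbols $x_i$, $y_i$, $a$, $b$ and $f_{\pm 1}$ are notation assumed here, and the published model adds high-order interactions that this sketch omits.

```latex
% x_i \in \{-1,+1\}: hidden enrichment state of probe i; y_i: observed intensity
p(x_1,\dots,x_n) \propto
  \exp\Big( a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n-1} x_i x_{i+1} \Big)

% Gibbs update for an interior probe i, with f_{\pm 1} the intensity densities
% of enriched (+1) and non-enriched (-1) probes, and s_i = x_{i-1} + x_{i+1}
P(x_i = +1 \mid x_{i-1}, x_{i+1}, y_i)
  = \frac{ f_{+1}(y_i)\, e^{\,a + b s_i} }
         { f_{+1}(y_i)\, e^{\,a + b s_i} + f_{-1}(y_i)\, e^{-a - b s_i} }
```

With $b > 0$ the prior favors neighboring probes sharing the same state, which is how the spatial correlation described in the abstract would enter a model of this kind.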
10.
PLoS Comput Biol ; 4(10): e1000201, 2008 Oct.
Article in English | MEDLINE | ID: mdl-18927605

ABSTRACT

Computational methods to identify functional genomic elements using genetic information have been very successful in determining gene structure and in identifying a handful of cis-regulatory elements. But the vast majority of regulatory elements have yet to be discovered, and it has become increasingly apparent that their discovery will not come from using genetic information alone. Recently, high-throughput technologies have enabled the creation of information-rich epigenetic maps, most notably for histone modifications. However, tools that search for functional elements using this epigenetic information have been lacking. Here, we describe an unsupervised learning method called ChromaSig to find, in an unbiased fashion, commonly occurring chromatin signatures in both tiling microarray and sequencing data. Applying this algorithm to nine chromatin marks across a 1% sampling of the human genome in HeLa cells, we recover eight clusters of distinct chromatin signatures, five of which correspond to known patterns associated with transcriptional promoters and enhancers. Interestingly, we observe that the distinct chromatin signatures found at enhancers mark distinct functional classes of enhancers in terms of transcription factor and coactivator binding. In addition, we identify three clusters of novel chromatin signatures that contain evolutionarily conserved sequences and potential cis-regulatory elements. Applying ChromaSig to a panel of 21 chromatin marks mapped genomewide by ChIP-Seq reveals 16 classes of genomic elements marked by distinct chromatin signatures. Interestingly, four classes containing enrichment for repressive histone modifications appear to be locally heterochromatic sites and are enriched in quickly evolving regions of the genome. The utility of this approach in uncovering novel, functionally significant genomic elements will aid future efforts of genome annotation via chromatin modifications.


Subject(s)
Chromatin/genetics , Genome, Human , Models, Genetic , Models, Statistical , Artificial Intelligence , Chromatin/metabolism , Chromatin Immunoprecipitation/statistics & numerical data , Computational Biology , Enhancer Elements, Genetic , HeLa Cells , Histones/chemistry , Histones/genetics , Histones/metabolism , Humans , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Promoter Regions, Genetic , Protein Processing, Post-Translational , Transcription Initiation Site
11.
Biometrics ; 65(4): 1087-95, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19210737

ABSTRACT

We propose a unified framework for the analysis of chromatin (Ch) immunoprecipitation (IP) microarray (ChIP-chip) data for detecting transcription factor binding sites (TFBSs) or motifs. ChIP-chip assays are used to focus the genome-wide search for TFBSs by isolating a sample of DNA fragments with TFBSs and applying this sample to a microarray with probes corresponding to tiled segments across the genome. Present analytical methods use a two-step approach: (i) analyze array data to estimate IP-enrichment peaks then (ii) analyze the corresponding sequences independently of intensity information. The proposed model integrates peak finding and motif discovery through a unified Bayesian hidden Markov model (HMM) framework that accommodates the inherent uncertainty in both measurements. A Markov chain Monte Carlo algorithm is formulated for parameter estimation, adapting recursive techniques used for HMMs. In simulations and applications to a yeast RAP1 dataset, the proposed method has favorable TFBS discovery performance compared to currently available two-stage procedures in terms of both sensitivity and specificity.


Subject(s)
Biometry/methods , Chromatin Immunoprecipitation/statistics & numerical data , Genomics/statistics & numerical data , Models, Statistical , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Algorithms , Base Sequence , Bayes Theorem , Binding Sites/genetics , DNA, Fungal/genetics , DNA, Fungal/metabolism , Markov Chains , Monte Carlo Method , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Shelterin Complex , Telomere-Binding Proteins/metabolism , Transcription Factors/metabolism
12.
Curr Opin Biotechnol ; 19(1): 50-4, 2008 Feb.
Article in English | MEDLINE | ID: mdl-18207385

ABSTRACT

Changes in transcript levels are assessed by microarray analysis on an individual basis, essentially resulting in long lists of genes that were found to have significantly changed transcript levels. However, in biology these changes do not occur as independent events, as such lists suggest, but in a highly coordinated and interdependent manner. Understanding the biological meaning of the observed changes requires elucidating such biological interdependencies. The most common way to achieve this is to project the gene lists onto distinct biological processes, often represented in the form of gene-ontology (GO) categories or metabolic and regulatory pathways as derived from literature analysis. This review focuses on different approaches and tools employed for this task, starting from GO-ranking methods, covering pathway mappings, and finally converging on biological network analysis. A brief outlook on the application of such approaches to the newest microarray-based technologies (Chromatin ImmunoPrecipitation, ChIP-on-chip) concludes the review.


Subject(s)
Oligonucleotide Array Sequence Analysis/statistics & numerical data , Biotechnology , Chromatin Immunoprecipitation/statistics & numerical data , Computational Biology , Data Interpretation, Statistical , Databases, Genetic
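
The abstract above mentions projecting gene lists onto GO categories and ranking them, without naming a specific test. A widely used choice for this kind of over-representation analysis is the one-sided hypergeometric test, sketched below. The function name and the example numbers are assumptions for illustration, and the tools covered by the review may use other statistics.

```python
from math import comb

def enrichment_p_value(population, annotated, selected, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    population: genes measured on the array
    annotated:  genes in the population that belong to the GO category
    selected:   genes in the significantly changed list
    overlap:    selected genes that belong to the GO category
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 8 of 100 changed genes fall in a 200-gene category,
# out of 20000 genes on the array.
p = enrichment_p_value(20000, 200, 100, 8)
```

When many categories are tested, the resulting p-values need a multiple-testing correction before ranking.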
13.
Methods Mol Biol ; 521: 255-78, 2009.
Article in English | MEDLINE | ID: mdl-19563111

ABSTRACT

Chromatin immunoprecipitation (ChIP) is a widely used method to study the interactions between proteins and discrete chromosomal loci in vivo. Originally, ChIP was developed for analysis of protein associations with DNA sequences known or suspected to bind the protein of interest. The advent of DNA microarrays has enabled the identification of all DNA sequences enriched by ChIP, providing a genomic view of protein binding. This powerful approach, termed ChIP-chip, is broadly applicable and has been particularly valuable in DNA replication studies to map replication origins in Saccharomyces cerevisiae based on the association of replication proteins with these chromosomal elements. We present a detailed ChIP-chip protocol for S. cerevisiae that uses oligonucleotide DNA microarrays printed on polylysine-coated glass slides and can also be easily adapted for commercially available high-density tiling microarrays from NimbleGen. We also outline general protocols for data analysis; however, microarray data analyses usually must be tailored specifically for individual studies, depending on experimental design, microarray format, and data quality.


Subject(s)
Chromatin Immunoprecipitation/methods , Chromatin/metabolism , DNA Replication , DNA-Binding Proteins/metabolism , Oligonucleotide Array Sequence Analysis/methods , Chromatin Immunoprecipitation/statistics & numerical data , Cross-Linking Reagents , DNA, Fungal/biosynthesis , DNA, Fungal/isolation & purification , Data Interpretation, Statistical , Fluorescent Dyes , Nucleic Acid Hybridization , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Replication Origin , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism
14.
Pac Symp Biocomput ; 24: 184-195, 2019.
Article in English | MEDLINE | ID: mdl-30864321

ABSTRACT

Genetic variations of the human genome are linked to many disease phenotypes. While whole-genome sequencing and genome-wide association studies (GWAS) have uncovered a number of genotype-phenotype associations, their functional interpretation remains challenging given that most single nucleotide polymorphisms (SNPs) fall into the non-coding region of the genome. Advances in chromatin immunoprecipitation sequencing (ChIP-seq) have made large-scale repositories of epigenetic data available, allowing investigation of coordinated mechanisms of epigenetic markers and transcriptional regulation and their influence on biological function. To address this, we propose SNPs2ChIP, a method to infer biological functions of non-coding variants through unsupervised statistical learning methods applied to publicly available epigenetic datasets. We systematically characterized latent factors by applying singular value decomposition to ChIP-seq tracks of lymphoblastoid cell lines, and annotated the biological function of each latent factor using the genomic region enrichment analysis tool. Using these annotated latent factors as reference, we developed SNPs2ChIP, a pipeline that takes genomic region(s) as an input, identifies the relevant latent factors with quantitative scores, and returns them along with their inferred functions. As a case study, we focused on systemic lupus erythematosus and demonstrated our method's ability to infer relevant biological function. We systematically applied SNPs2ChIP to publicly available datasets, including known GWAS associations from the GWAS catalogue and ChIP-seq peaks from a previously published study. Our approach of leveraging latent patterns across genome-wide epigenetic datasets to infer biological function will advance understanding of the genetics of human diseases by accelerating the interpretation of non-coding genomes.


Subject(s)
Chromatin Immunoprecipitation/statistics & numerical data , Polymorphism, Single Nucleotide , Algorithms , Cell Line , Computational Biology/methods , Databases, Nucleic Acid/statistics & numerical data , Epigenesis, Genetic , Genetic Association Studies , Genome, Human , Genome-Wide Association Study/statistics & numerical data , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Lupus Erythematosus, Systemic/genetics , Lymphocytes/metabolism , Receptors, Calcitriol/genetics
15.
Nat Commun ; 10(1): 4613, 2019 10 10.
Article in English | MEDLINE | ID: mdl-31601804

ABSTRACT

Characterizing and interpreting heterogeneous mixtures at the cellular level is a critical problem in genomics. Single-cell assays offer an opportunity to resolve cellular level heterogeneity, e.g., scRNA-seq enables single-cell expression profiling, and scATAC-seq identifies active regulatory elements. Furthermore, while scHi-C can measure the chromatin contacts (i.e., loops) between active regulatory elements to target genes in single cells, bulk HiChIP can measure such contacts in a higher resolution. In this work, we introduce DC3 (De-Convolution and Coupled-Clustering) as a method for the joint analysis of various bulk and single-cell data such as HiChIP, RNA-seq and ATAC-seq from the same heterogeneous cell population. DC3 can simultaneously identify distinct subpopulations, assign single cells to the subpopulations (i.e., clustering) and de-convolve the bulk data into subpopulation-specific data. The subpopulation-specific profiles of gene expression, chromatin accessibility and enhancer-promoter contact obtained by DC3 provide a comprehensive characterization of the gene regulatory system in each subpopulation.


Subject(s)
Algorithms , Cluster Analysis , Gene Expression Profiling/statistics & numerical data , Genomics/statistics & numerical data , Single-Cell Analysis/statistics & numerical data , Animals , Cell Line , Chromatin , Chromatin Immunoprecipitation/statistics & numerical data , Computer Simulation , Gene Expression Profiling/methods , Gene Regulatory Networks , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Mice , Promoter Regions, Genetic , Single-Cell Analysis/methods
16.
Methods Mol Biol ; 408: 129-51, 2007.
Article in English | MEDLINE | ID: mdl-18314581

ABSTRACT

The combinatorial control of gene regulatory switches involves both transcription factor (TF) complexes and associated epigenetic modifications to the chromatin template. Novel high-throughput technologies, such as chromatin immunoprecipitation on microarrays (ChIP-chip), have enabled genome-wide in vivo identification of TF target regulatory regions and related epigenetic modifications, which has led to the view of highly dynamic TF-DNA interactions in activated or repressed promoters. Consequently, modeling and elucidating the combinatorial interaction of TFs and corresponding cis-regulatory modules in target promoters is of paramount interest. An estimated 5% of the genes in mammalian genomes code for TF proteins, and computational modeling of cis-regulatory logic would rapidly increase the pace of experimental confirmation of TF target promoters at the bench. The purpose of this chapter is to discuss the use of different bioinformatics tools for predicting the target genes of TFs of interest in mammalian genomes, and the application of these methods in the analysis of ChIP-chip experimental data. The author describes the most commonly used databases and prediction programs that are available on the World Wide Web and demonstrates the use of some of these programs with an example. A list of these programs is provided along with their web Uniform Resource Locators (URLs), and guidelines for successful application are suggested.


Subject(s)
Computational Biology/statistics & numerical data , Computer Simulation , Epigenesis, Genetic , Transcription Factors/genetics , Transcription Factors/metabolism , Animals , Base Sequence , Binding Sites/genetics , Chromatin Immunoprecipitation/statistics & numerical data , CpG Islands , DNA/genetics , DNA/metabolism , Databases, Genetic , Decision Trees , Humans , Internet , Mice , Promoter Regions, Genetic
17.
BMC Bioinformatics ; 7: 434, 2006 Oct 05.
Article in English | MEDLINE | ID: mdl-17022824

ABSTRACT

BACKGROUND: High density oligonucleotide tiling arrays are an effective and powerful platform for conducting unbiased genome-wide studies. The ab initio probe selection method employed in tiling arrays is unbiased, and thus ensures consistent sampling across coding and non-coding regions of the genome. Tiling arrays are increasingly used in chromatin immunoprecipitation (IP) experiments (ChIP on chip). ChIP on chip facilitates the generation of genome-wide maps of in-vivo interactions between DNA-associated proteins including transcription factors and DNA. Analysis of the hybridization of an immunoprecipitated sample to a tiling array facilitates the identification of ChIP-enriched segments of the genome. These enriched segments are putative targets of antibody assayable regulatory elements. The enrichment response is not ubiquitous across the genome. Typically 5 to 10% of tiled probes manifest some significant enrichment. Depending upon the factor being studied, this response can drop to less than 1%. The detection and assessment of significance for interactions that emanate from non-canonical and/or un-annotated regions of the genome is especially challenging. This is the motivation behind the proposed algorithm. RESULTS: We have proposed a novel rank and replicate statistics-based methodology for identifying and ascribing statistical confidence to regions of ChIP-enrichment. The algorithm is optimized for identification of sites that manifest low levels of enrichment but are true positives, as validated by alternative biochemical experiments. Although the method is described here in the context of ChIP on chip experiments, it can be generalized to any treatment-control experimental design. The results of the algorithm show a high degree of concordance with independent biochemical validation methods. The sensitivity and specificity of the algorithm have been characterized via quantitative PCR and independent computational approaches. 
CONCLUSION: The algorithm ranks all enrichment sites based on their intra-replicate ranks and inter-replicate rank consistency. Following the ranking, the method allows segmentation of sites based on a meta p-value, a composite array signal enrichment criterion, or a composite of these two measures. The sensitivities obtained subsequent to the segmentation of data using a meta p-value of 10⁻⁵, an array signal enrichment of 0.2 and a composite of these two values are 88%, 87% and 95%, respectively.
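
The abstract does not say how the meta p-value is computed, so the following is only an illustrative sketch of the general idea of ranking sites within each replicate and then rewarding consistency across replicates. It assumes rank-based empirical p-values and Fisher's method for combining them. Both choices, and all function names, are assumptions made for illustration and are not taken from the paper.

```python
import math

def rank_pvalues(scores):
    """Empirical p-value per site from its rank within one replicate.

    Rank 1 is the strongest enrichment; p = rank / (n + 1).
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    pvals = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        pvals[idx] = rank / (n + 1)
    return pvals

def fisher_meta_pvalue(pvals):
    """Combine k p-values with Fisher's method (assumed stand-in).

    X = -2 * sum(ln p) follows a chi-square law with 2k degrees of freedom
    under independence; for even degrees of freedom the survival function
    has the closed form used below.
    """
    k = len(pvals)
    half_stat = -sum(math.log(p) for p in pvals)  # equals X / 2
    return math.exp(-half_stat) * sum(
        half_stat ** j / math.factorial(j) for j in range(k)
    )

def rank_sites(replicates):
    """Return (site indices sorted by meta p-value, meta p-values)."""
    per_rep = [rank_pvalues(r) for r in replicates]
    n_sites = len(replicates[0])
    meta = [fisher_meta_pvalue([rep[i] for rep in per_rep]) for i in range(n_sites)]
    order = sorted(range(n_sites), key=lambda i: meta[i])
    return order, meta

# Three hypothetical replicates of enrichment scores for four sites.
replicates = [
    [0.1, 2.0, 0.3, 1.5],
    [0.2, 1.8, 0.1, 1.9],
    [0.0, 2.2, 0.4, 1.0],
]
order, meta = rank_sites(replicates)
```

A site that ranks near the top in every replicate receives a small combined p-value, while a site that ranks high in only one replicate is penalized, which is the intuition behind using inter-replicate rank consistency.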


Subject(s)
Algorithms , Chromatin Immunoprecipitation/methods , Microarray Analysis/methods , Chromatin Immunoprecipitation/statistics & numerical data , Microarray Analysis/statistics & numerical data , Predictive Value of Tests , Statistics, Nonparametric
18.
Pac Symp Biocomput ; : 320-31, 2013.
Article in English | MEDLINE | ID: mdl-23424137

ABSTRACT

We have developed a novel approach called ChIPModule to systematically discover transcription factors and their cofactors from ChIP-seq data. Given a ChIP-seq dataset and the binding patterns of a large number of transcription factors, ChIPModule can efficiently identify groups of transcription factors whose binding sites significantly co-occur in the ChIP-seq peak regions. By testing ChIPModule on simulated data and experimental data, we have shown that ChIPModule identifies known cofactors of transcription factors and predicts new cofactors that are supported by the literature. ChIPModule provides a useful tool for studying gene transcriptional regulation.
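
The abstract does not describe the statistic ChIPModule uses, so the sketch below only illustrates what "significantly co-occur" can mean for a single pair of factors. It assumes a one-sided hypergeometric test on the number of peaks that contain both motifs. The function names and the toy peak sets are hypothetical.

```python
from math import comb

def cooccurrence_pvalue(n_peaks, n_a, n_b, n_both):
    """P(overlap >= n_both) under a hypergeometric null.

    n_peaks: total number of peak regions
    n_a, n_b: peaks containing motif A and motif B, respectively
    n_both: peaks containing both motifs
    """
    upper = min(n_a, n_b)
    total = comb(n_peaks, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_peaks - n_a, n_b - k)
        for k in range(n_both, upper + 1)
    )
    return tail / total

def pair_pvalue(peaks_a, peaks_b, all_peaks):
    """Same test, starting from sets of peak identifiers."""
    return cooccurrence_pvalue(
        len(all_peaks), len(peaks_a), len(peaks_b), len(peaks_a & peaks_b)
    )

# Toy example: 10 peaks, two motifs that always appear together.
all_peaks = set(range(10))
peaks_a = {0, 1, 2, 3, 4}
peaks_b = {0, 1, 2, 3, 4}
p = pair_pvalue(peaks_a, peaks_b, all_peaks)
```

Testing groups larger than pairs, and correcting for the many pairs examined, would need additional machinery that this sketch does not attempt.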


Subject(s)
Chromatin Immunoprecipitation/statistics & numerical data , Sequence Analysis/statistics & numerical data , Transcription Factors/genetics , Transcription Factors/metabolism , Binding Sites/genetics , Computational Biology , Databases, Genetic/statistics & numerical data , Humans
19.
PLoS One ; 7(1): e28272, 2012.
Article in English | MEDLINE | ID: mdl-22238575

ABSTRACT

Chromatin immunoprecipitation (ChIP) profiling detects in vivo protein-DNA binding, and has revealed a large combinatorial complexity in the binding of chromatin-associated proteins and their post-translational modifications. To fully explore the spatial and combinatorial patterns in ChIP-profiling data and detect potentially meaningful patterns, the areas of enrichment must be aligned and clustered, which is an algorithmically and computationally challenging task. We have developed CATCHprofiles, a novel tool for exhaustive pattern detection in ChIP profiling data. CATCHprofiles is built upon a computationally efficient implementation for the exhaustive alignment and hierarchical clustering of ChIP profiling data. The tool features a graphical interface for examination and browsing of the clustering results. CATCHprofiles requires no prior knowledge about functional sites, detects known binding patterns "ab initio", and enables the detection of new patterns from ChIP data at a high resolution, exemplified by the detection of asymmetric histone and histone modification patterns around H2A.Z-enriched sites. CATCHprofiles' capability for exhaustive analysis combined with its ease of use makes it an invaluable tool for explorative research based on ChIP profiling data. CATCHprofiles and the CATCH algorithm run on all platforms and are available for free through the CATCH website: http://catch.cmbi.ru.nl/. User support is available by subscribing to the mailing list catch-users@bioinformatics.org.
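
As a rough illustration of what exhaustive alignment of two enrichment profiles can involve, the sketch below tries every shift within a window, in both orientations, and keeps the alignment with the smallest mean squared difference over the overlapping positions. The distance measure, the inclusion of flipped orientations, and the function name are assumptions made for this example and are not a description of the CATCH algorithm itself.

```python
def best_alignment(p, q, max_shift):
    """Return (distance, shift, flipped) for the best alignment of q onto p.

    shift > 0 moves q to the right relative to p.
    flipped=True means q was reversed before shifting.
    Distance is the mean squared difference over overlapping positions.
    """
    candidates = []
    for flipped in (False, True):
        cand = q[::-1] if flipped else q
        for shift in range(-max_shift, max_shift + 1):
            pairs = [
                (p[i], cand[i - shift])
                for i in range(len(p))
                if 0 <= i - shift < len(cand)
            ]
            if not pairs:
                continue  # no overlap at this shift
            dist = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
            candidates.append((dist, shift, flipped))
    return min(candidates)

# Two hypothetical asymmetric profiles; q is p moved one position left.
p = [0, 0, 1, 3, 2, 0, 0]
q = [0, 1, 3, 2, 0, 0, 0]
result = best_alignment(p, q, max_shift=2)
result_flipped = best_alignment(p, q[::-1], max_shift=2)
```

Pairwise distances obtained this way could then feed a standard agglomerative clustering step, with the recorded shift and orientation used to merge profiles in a common frame.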


Subject(s)
Chromatin Immunoprecipitation/statistics & numerical data , Data Interpretation, Statistical , Microarray Analysis/statistics & numerical data , Sequence Alignment , Software , Algorithms , Base Sequence , Cells, Cultured , Chromatin Immunoprecipitation/methods , Cluster Analysis , Computational Biology/methods , Efficiency , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Humans , Models, Biological , Molecular Sequence Data , Promoter Regions, Genetic/genetics , Sequence Alignment/methods , Sequence Alignment/statistics & numerical data
20.
J Bioinform Comput Biol ; 9(2): 269-82, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21523932

ABSTRACT

New high-throughput sequencing technologies can generate millions of short sequences in a single experiment. As the size of the data increases, comparison of multiple experiments on different cell lines under different experimental conditions becomes a big challenge. In this paper, we investigate ways to compare multiple ChIP-sequencing experiments. We specifically studied epigenetic regulation of breast cancer and the effect of estrogen using 50 ChIP-sequencing data sets from the Illumina Genome Analyzer II. First, we evaluate the correlation among different experiments, focusing on the total number of reads in transcribed and promoter regions of the genome. Then, we adopt the method that is used to identify the most stable genes in RT-PCR experiments to understand background signal across all of the experiments and to identify the most variable transcribed and promoter regions of the genome. We observed that the most variable genes for transcribed regions and promoter regions are very distinct. Gene ontology and function enrichment analyses on these most variable genes demonstrate the biological relevance of the results. In this study, we present a method that can effectively select differential regions of the genome based on protein-binding profiles over multiple experiments using real data points without any normalization among the samples.
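
The abstract does not name the RT-PCR stability method it adopts. The sketch below assumes a geNorm-style measure purely for illustration: for each region, the average standard deviation of its log2 count ratio against every other region across experiments. Because each ratio is formed within one experiment, a sample-wide scaling factor cancels, which fits the stated goal of working without normalization among samples. The region names and counts are hypothetical.

```python
import math
from statistics import mean, stdev

def stability_m_values(counts):
    """geNorm-style M value per region (assumed stand-in method).

    counts: dict mapping region name -> list of positive read counts,
            one value per experiment, same order for every region.
    A low M means the region is stable; a high M means it is variable.
    """
    logs = {r: [math.log2(v) for v in vals] for r, vals in counts.items()}
    m_values = {}
    for region, lr in logs.items():
        pairwise_sd = [
            stdev([a - b for a, b in zip(lr, other)])
            for name, other in logs.items()
            if name != region
        ]
        m_values[region] = mean(pairwise_sd)
    return m_values

# Hypothetical read counts for three regions over three experiments.
counts = {
    "regionA": [100, 200, 400],
    "regionB": [50, 100, 200],   # constant ratio to regionA
    "regionC": [10, 500, 20],    # fluctuates independently
}
m = stability_m_values(counts)
most_variable = max(m, key=m.get)
```

Counts must be strictly positive for the logarithm, so real data with zero-read regions would need a pseudocount or filtering first.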


Subject(s)
Chromatin Immunoprecipitation/statistics & numerical data , Algorithms , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Cell Line , Cell Line, Tumor , Computational Biology , Epigenesis, Genetic , Female , Genome, Human , Humans , Protein Binding