Abstract
BACKGROUND: Massive amounts of data are produced by combining next-generation sequencing with complex biochemistry techniques to characterize regulatory genomics profiles, such as protein-DNA interaction and chromatin accessibility. Interpretation of such high-throughput data typically requires several different computational methods. However, existing tools are usually developed for a specific task, which makes it challenging to analyze the data in an integrative manner. RESULTS: We here describe the Regulatory Genomics Toolbox (RGT), a computational library for the integrative analysis of regulatory genomics data. RGT provides different functionalities to handle genomic signals and regions. Based on these, we developed several tools to perform distinct downstream analyses, including prediction of transcription factor binding sites using ATAC-seq data, identification of differential peaks from ChIP-seq data, detection of triple helix-mediated RNA and DNA interactions, visualization, and finding associations between distinct regulatory factors. CONCLUSION: We present RGT, a framework to facilitate the customization of computational methods to analyze genomic data for specific regulatory genomics problems. RGT is a comprehensive and flexible Python package for analyzing high-throughput regulatory genomics data and is available at https://github.com/CostaLab/reg-gen. The documentation is available at https://reg-gen.readthedocs.io.
Subjects
Chromatin, Genomics, Chromatin Immunoprecipitation Sequencing, Documentation, Gene Library
Abstract
DNase-seq allows nucleotide-level identification of transcription factor binding sites on the basis of a computational search of footprint-like DNase I cleavage patterns on the DNA. As is frequent in high-throughput methods, experimental artifacts such as DNase I cleavage bias affect the computational analysis of DNase-seq experiments. Here we performed a comprehensive and systematic study on the performance of computational footprinting methods. We evaluated ten footprinting methods in a panel of DNase-seq experiments for their ability to recover cell-specific transcription factor binding sites. We show that three methods (HINT, DNase2TF and PIQ) consistently outperformed the other evaluated methods and that correcting the DNase-seq signal for experimental artifacts significantly improved the accuracy of computational footprints. We also propose a score that can be used to detect footprints arising from transcription factors with potentially short residence times.
Subjects
Computational Biology/methods, DNA Footprinting/methods, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Software, Transcription Factors/metabolism, Algorithms, Binding Sites, Chromatin/genetics, Chromatin/metabolism, Chromatin Immunoprecipitation, Deoxyribonuclease I/metabolism, Humans, K562 Cells, Protein Binding
Abstract
The study of changes in protein-DNA interactions measured by ChIP-seq on dynamic systems, such as cell differentiation, response to treatments or the comparison of healthy and diseased individuals, is still an open challenge. There are few computational methods comparing changes in ChIP-seq signals with replicates. Moreover, none of these previous approaches addresses ChIP-seq-specific experimental artefacts arising from studies with biological replicates. We propose THOR, a Hidden Markov Model-based approach, to detect differential peaks between pairs of biological conditions with replicates. THOR provides all pre- and post-processing steps required in ChIP-seq analyses. Moreover, we propose a novel normalization approach based on housekeeping genes to deal with cases where replicates have distinct signal-to-noise ratios. To evaluate differential peak calling methods, we delineate a methodology using both biological and simulated data. This includes an evaluation procedure that associates differential peaks with changes in gene expression as well as histone modifications close to these peaks. We evaluate THOR and seven competing methods on data sets with distinct characteristics, from in vitro studies with technical replicates to clinical studies of cancer patients. Our evaluation analysis comprises 13 comparisons between pairs of biological conditions. We show that THOR performs best in all scenarios.
Subjects
Chromatin Immunoprecipitation, Computational Biology/methods, High-Throughput Nucleotide Sequencing, Markov Chains, DNA Sequence Analysis, Algorithms, Cell Differentiation/genetics, Datasets as Topic, Dendritic Cells/immunology, Dendritic Cells/metabolism, Genetic Epigenesis, Histones/metabolism, Humans, B-Cell Lymphoma/genetics, Workflow
Abstract
MOTIVATION: Detection of changes in deoxyribonucleic acid (DNA)-protein interactions from ChIP-seq data is a crucial step in unraveling the regulatory networks behind biological processes. The simplest variation of this problem is the differential peak calling (DPC) problem. Here, one has to find genomic regions with ChIP-seq signal changes between two cellular conditions in the interaction of a protein with DNA. The great majority of peak calling methods can only analyze one ChIP-seq signal at a time and are unable to perform DPC. Recently, a few approaches based on the combination of these peak callers with statistical tests for detecting differential digital expression have been proposed. However, these methods fail to detect detailed changes of protein-DNA interactions. RESULTS: We propose a One-stage DIffereNtial peak caller (ODIN), a Hidden Markov Model-based approach to detect and analyze differential peaks (DPs) in pairs of ChIP-seq data. ODIN performs genomic signal processing, peak calling and p-value calculation in an integrated framework. We also propose an evaluation methodology to compare ODIN with competing methods. The evaluation method is based on the association of DPs with expression changes in the same cellular conditions. Our empirical study based on several ChIP-seq experiments from transcription factors, histone modifications and simulated data shows that ODIN outperforms the considered competing methods in most scenarios.
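To make the DPC problem statement concrete, the following is a minimal Python sketch under strong assumptions: the genome has already been split into fixed-size bins, each condition is given as a list of read counts per bin, and a bin is called differential when the absolute log2 fold change exceeds a threshold. This is a naive stand-in for illustration only, not ODIN's Hidden Markov Model; the function name, the pseudocount and the threshold are hypothetical choices.

```python
import math


def naive_differential_bins(counts_a, counts_b, pseudocount=1.0, min_abs_log2fc=1.0):
    """Return (bin_index, log2_fold_change) for bins whose signal differs.

    counts_a, counts_b: read counts per genomic bin in conditions A and B.
    Condition B is rescaled to the library size of A before comparison.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both conditions must cover the same bins")

    # Library-size scaling so that overall sequencing depth does not
    # masquerade as a biological difference.
    total_a, total_b = sum(counts_a), sum(counts_b)
    scale = total_a / total_b if total_b else 1.0

    calls = []
    for i, (a, b) in enumerate(zip(counts_a, counts_b)):
        # The pseudocount avoids division by zero and damps low-count noise.
        log2fc = math.log2((a + pseudocount) / (b * scale + pseudocount))
        if abs(log2fc) >= min_abs_log2fc:
            calls.append((i, log2fc))
    return calls


if __name__ == "__main__":
    condition_a = [10, 10, 40, 10]
    condition_b = [10, 10, 10, 40]
    for index, lfc in naive_differential_bins(condition_a, condition_b):
        direction = "gain in A" if lfc > 0 else "gain in B"
        print(index, round(lfc, 2), direction)
```

A per-bin threshold like this ignores the dependence between neighbouring bins and provides no p-values, which is the kind of limitation an integrated model such as the one described in the abstract is meant to address.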
Subjects
Chromatin Immunoprecipitation/methods, DNA Sequence Analysis/methods, Animals, Genomics/methods, Histones/metabolism, Markov Chains, Mice, Transcription Factors/metabolism
Abstract
BACKGROUND: Elevated sequencing error rates are the predominant obstacle in single-nucleotide polymorphism (SNP) detection, which is a major goal in the bulk of current studies using next-generation sequencing (NGS). Beyond routinely handled generic sources of errors, certain base calling errors relate to specific sequence patterns. Statistically principled ways to associate sequence patterns with base calling errors have not been previously described. Extant approaches either incur decisive losses in power, due to relating errors with individual genomic positions rather than motifs, or do not properly distinguish between motif-induced and sequence-unspecific sources of errors. RESULTS: Here, for the first time, we describe a statistically rigorous framework for the discovery of motifs that induce sequencing errors. We apply our method to several datasets from Illumina GA IIx, HiSeq 2000, and MiSeq sequencers. We confirm previously known error-causing sequence contexts and report new, more specific ones. CONCLUSIONS: Checking for error-inducing motifs should be included in SNP calling pipelines to avoid false positives. To facilitate filtering of sets of putative SNPs, we provide tracks of error-prone genomic positions (in BED format). AVAILABILITY: http://discovering-cse.googlecode.com.
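As an illustration of how a BED track of error-prone positions could be used to filter putative SNPs, here is a minimal Python sketch. It assumes SNPs are given as (chromosome, 1-based position) pairs, as in VCF, and relies on the BED convention of 0-based, half-open intervals. The function names and the SNP representation are hypothetical; the exact layout of the authors' tracks is not specified in the abstract.

```python
import bisect
from collections import defaultdict


def load_bed_intervals(lines):
    """Parse BED lines into {chrom: (starts, ends)} with overlaps merged.

    BED coordinates are 0-based and half-open: [start, end).
    """
    raw = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and track headers
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))

    merged = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        starts, ends = [], []
        for start, end in intervals:
            if ends and start <= ends[-1]:
                # Overlapping or adjacent interval: extend the previous one.
                ends[-1] = max(ends[-1], end)
            else:
                starts.append(start)
                ends.append(end)
        merged[chrom] = (starts, ends)
    return merged


def in_error_prone_region(intervals, chrom, pos_1based):
    """True if a 1-based position falls inside any interval on chrom."""
    if chrom not in intervals:
        return False
    starts, ends = intervals[chrom]
    pos0 = pos_1based - 1  # convert to the 0-based BED coordinate
    idx = bisect.bisect_right(starts, pos0) - 1
    return idx >= 0 and pos0 < ends[idx]


def filter_snps(snps, intervals):
    """Keep only SNPs that lie outside the error-prone intervals."""
    return [(c, p) for c, p in snps if not in_error_prone_region(intervals, c, p)]


if __name__ == "__main__":
    bed = ["chr1\t10\t20", "chr1\t15\t30", "chr2\t0\t5"]
    regions = load_bed_intervals(bed)
    candidates = [("chr1", 10), ("chr1", 11), ("chr1", 31), ("chr2", 5), ("chr3", 1)]
    print(filter_snps(candidates, regions))
```

Merging overlapping intervals at load time keeps the lookup a single binary search per SNP; whether flagged SNPs should be dropped outright or only marked for review is a pipeline decision that the sketch leaves open.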
Subjects
High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Algorithms, DNA/chemistry, Genome, Genomics/methods, Humans, Nucleotide Motifs, Single Nucleotide Polymorphism
Abstract
BACKGROUND: Interferon alpha (IFNa) monotherapy is recommended as the standard therapy in polycythemia vera (PV) but not in chronic myeloid leukemia (CML). Here, we investigated the mechanisms of IFNa efficacy in JAK2V617F- vs. BCR-ABL-positive cells. METHODS: Gene expression microarrays and RT-qPCR of PV vs. CML patient PBMCs and CD34+ cells and of the murine cell line 32D expressing JAK2V617F or BCR-ABL were used to analyze and compare interferon-stimulated gene (ISG) expression. Furthermore, using CRISPR/Cas9n technology, targeted disruption of STAT1 or STAT2, respectively, was performed in 32D-BCR-ABL and 32D-JAK2V617F cells to evaluate the role of these transcription factors in IFNa efficacy. The knockout cell lines were reconstituted with STAT1, STAT2, STAT1Y701F, or STAT2Y689F to analyze the importance of wild-type and phosphomutant STATs for the IFNa response. ChIP-seq and ChIP were performed to correlate histone marks with ISG expression. RESULTS: Microarray analysis and RT-qPCR revealed significant upregulation of ISGs in 32D-JAK2V617F but downregulation in 32D-BCR-ABL cells, and these effects were reversed by tyrosine kinase inhibitor (TKI) treatment. Similar expression patterns were confirmed in human cell lines, primary PV and CML patient PBMCs and CD34+ cells, demonstrating that these effects are operational in patients. IFNa treatment increased Stat1, Stat2, and Irf9 mRNA as well as pY-STAT1 in all cell lines; however, viability was specifically decreased in 32D-JAK2V617F. STAT1 or STAT2 knockout and reconstitution with wild-type or phospho-deficient STAT mutants demonstrated the necessity of STAT2 for IFNa-induced STAT1 phosphorylation in BCR-ABL- but not in JAK2V617F-expressing cells. STAT1 was essential for IFNa activity in both BCR-ABL- and JAK2V617F-positive cells. Furthermore, ChIP experiments demonstrated higher repressive and lower active chromatin marks at the promoters of ISGs in BCR-ABL-expressing cells. CONCLUSIONS: JAK2V617F but not BCR-ABL sensitizes MPN cells to interferon, and this effect was dependent on STAT1. Moreover, STAT2 is a survival factor in BCR-ABL- and JAK2V617F-positive cells but an IFNa-sensitizing factor solely in 32D-JAK2V617F cells by upregulation of STAT1 expression.