Results 1 - 20 of 140
1.
Bioinformatics ; 39(6)2023 06 01.
Article in English | MEDLINE | ID: mdl-37243667

ABSTRACT

MOTIVATION: Single-cell sequencing enables exploring the pathways and processes of cells and cell populations. However, there is a paucity of pathway enrichment methods designed to tolerate the high noise and low gene coverage of this technology. When gene expression data are noisy and signals are sparse, testing pathway enrichment based on gene expression may not yield statistically significant results, which is particularly problematic when detecting the pathways enriched in less abundant cells that are vulnerable to disturbances. RESULTS: In this project, we developed a Weighted Concept Signature Enrichment Analysis specialized for pathway enrichment analysis from single-cell transcriptomics (scRNA-seq). Weighted Concept Signature Enrichment Analysis takes a broader approach to assessing the functional relations of pathway gene sets to differentially expressed genes, and leverages the cumulative signature of molecular concepts characteristic of the highly differentially expressed genes, which we termed the universal concept signature, to tolerate the high noise and low coverage of this technology. We then incorporated Weighted Concept Signature Enrichment Analysis into an R package called "IndepthPathway" so that biologists can broadly leverage this method for pathway analysis based on bulk and single-cell sequencing data. Through simulating technical variability and dropouts in gene expression characteristic of scRNA-seq, as well as benchmarking on a real dataset of matched single-cell and bulk RNA-seq data, we demonstrate that IndepthPathway presents outstanding stability and depth in pathway enrichment results under stochasticity of the data, and will thus substantially improve the scientific rigor of pathway analysis for single-cell sequencing data. AVAILABILITY AND IMPLEMENTATION: The IndepthPathway R package is available through: https://github.com/wangxlab/IndepthPathway.


Subject(s)
Single-Cell Analysis , Software , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Exome Sequencing
2.
Environ Sci Technol ; 58(13): 5889-5898, 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38501580

ABSTRACT

Human exposure to toxic chemicals presents a huge health burden. Key to understanding chemical toxicity is knowledge of the molecular target(s) of the chemicals. Because a comprehensive safety assessment for all chemicals is infeasible due to limited resources, a robust computational method for discovering targets of environmental exposures is a promising direction for public health research. In this study, we implemented a novel matrix completion algorithm named coupled matrix-matrix completion (CMMC) for predicting direct and indirect exposome-target interactions, which exploits the vast amount of accumulated data regarding chemical exposures and their molecular targets. Our approach achieved an AUC of 0.89 on a benchmark data set generated using data from the Comparative Toxicogenomics Database. Our case studies with bisphenol A and its analogues, PFAS, dioxins, PCBs, and VOCs show that CMMC can be used to accurately predict molecular targets of novel chemicals without any prior bioactivity knowledge. Our results demonstrate the feasibility and promise of computationally predicting environmental chemical-target interactions to efficiently prioritize chemicals in hazard identification and risk assessment.


Subject(s)
Dioxins , Polychlorinated Biphenyls , Humans , Environmental Exposure/analysis , Polychlorinated Biphenyls/analysis , Risk Assessment , Public Health
3.
Brief Bioinform ; 22(2): 2161-2171, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32186716

ABSTRACT

Predicting the interactions between drugs and targets plays an important role in new drug discovery and drug repurposing (also known as drug repositioning). There is a need to develop novel and efficient prediction approaches to avoid the costly and laborious process of determining drug-target interactions (DTIs) based on experiments alone. These computational prediction approaches should be capable of identifying potential DTIs in a timely manner. Matrix factorization methods have been proven to be the most reliable group of methods. Here, we first propose a matrix factorization-based method termed 'Coupled Matrix-Matrix Completion' (CMMC). Next, to utilize more comprehensive information provided in different databases and to incorporate multiple types of scores for drug-drug similarities and target-target relationships, we extend CMMC to 'Coupled Tensor-Matrix Completion' (CTMC) by considering drug-drug and target-target similarity/interaction tensors. RESULTS: Evaluation on two benchmark datasets, DrugBank and TTD, shows that CTMC outperforms the matrix-factorization-based methods GRMF, $L_{2,1}$-GRMF, NRLMF and NRLMF$\beta$. Based on the evaluation, CMMC and CTMC outperform the above three methods in terms of area under the curve, F1 score, sensitivity and specificity, with a considerably shorter run time.


Subject(s)
Computational Biology/methods , Drug Delivery Systems , Algorithms , Drug Development , Drug Interactions , Humans
4.
Brief Bioinform ; 22(1): 247-269, 2021 01 18.
Article in English | MEDLINE | ID: mdl-31950972

ABSTRACT

The task of predicting the interactions between drugs and targets plays a key role in the process of drug discovery. There is a need to develop novel and efficient prediction approaches to avoid relying solely on costly, laborious and not-always-deterministic experiments to determine drug-target interactions (DTIs). These approaches should be capable of identifying potential DTIs in a timely manner. In this article, we describe the data required for the task of DTI prediction, followed by a comprehensive catalog of the machine learning methods and databases that have been proposed and utilized to predict DTIs. The advantages and disadvantages of each set of methods are also briefly discussed. Lastly, we highlight the challenges one may face in predicting DTIs using machine learning approaches and conclude by shedding some light on important future research directions.


Subject(s)
Computational Biology/methods , Drug Discovery/methods , Machine Learning , Databases, Factual , Humans
5.
Brief Bioinform ; 21(5): 1717-1732, 2020 09 25.
Article in English | MEDLINE | ID: mdl-31631213

ABSTRACT

Identifying new gene functions and pathways underlying diseases and biological processes is a major challenge in genomics research. In particular, most methods for interpreting the pathways characteristic of an experimental gene list defined by genomic data are limited by their dependence on assessing the overlapping genes or their interactome topology, which cannot account for the variety of functional relations. This is particularly problematic for pathway discovery from single-cell genomics with low gene coverage, or for interpreting complex pathway changes such as those occurring during changes of cell state. Here, we exploited the comprehensive sets of molecular concepts that combine ontologies, pathways, interactions and domains to help inform the functional relations. We first developed a universal concept signature (uniConSig) analysis for genome-wide quantification of new gene functions underlying biological or pathological processes based on the signature molecular concepts computed from known functional gene lists. We then developed a novel concept signature enrichment analysis (CSEA) for deep functional assessment of the pathways enriched in an experimental gene list. This method is grounded in the framework of shared concept signatures between gene sets at multiple functional levels, thus overcoming the limitations of the current methods. Through meta-analysis of transcriptomic data sets of cancer cell line models and single hematopoietic stem cells, we demonstrate the broad applications of CSEA to pathway discovery from gene expression and single-cell transcriptomic data sets for genetic perturbations and changes of cell state, which complements the current modalities. The R modules for uniConSig analysis and CSEA are available through https://github.com/wangxlab/uniConSig.


Subject(s)
Gene Expression Profiling , Gene Regulatory Networks , Algorithms , Cell Line, Tumor , Databases, Genetic , Datasets as Topic , Genomics , Humans
6.
J Biol Chem ; 295(26): 8834-8845, 2020 06 26.
Article in English | MEDLINE | ID: mdl-32398261

ABSTRACT

Anaplastic thyroid cancer (ATC) is one of the most aggressive human malignancies, with an average life expectancy of ∼6 months from the time of diagnosis. The genetic and epigenetic changes that underlie this malignancy are incompletely understood. We found that ASH1-like histone lysine methyltransferase (ASH1L) is overexpressed in ATC relative to the much less aggressive and more common differentiated thyroid cancer. This increased expression was due at least in part to reduced levels of microRNA-200b-3p (miR-200b-3p), which represses ASH1L expression, in ATC. Genetic knockout of ASH1L protein expression in ATC cell lines decreased cell growth both in culture and in mouse xenografts. RNA-Seq analysis of ASH1L knockout versus WT ATC cell lines revealed that ASH1L is involved in the regulation of numerous cancer-related genes and gene sets. The pro-oncogenic long noncoding RNA colon cancer-associated transcript 1 (CCAT1) was one of the most highly (approximately 68-fold) down-regulated transcripts in ASH1L knockout cells. Therefore, we investigated CCAT1 as a potential mediator of the growth-inducing activity of ASH1L. Supporting this hypothesis, CCAT1 knockdown in ATC cells decreased their growth rate, and ChIP-Seq data indicated that CCAT1 is likely a direct target of ASH1L's histone methyltransferase activity. These results indicate that ASH1L contributes to the aggressiveness of ATC and suggest that ASH1L, along with its upstream regulator miR-200b-3p and its downstream mediator CCAT1, represents a potential therapeutic target in ATC.


Subject(s)
DNA-Binding Proteins/genetics , Histone-Lysine N-Methyltransferase/genetics , Thyroid Carcinoma, Anaplastic/genetics , Thyroid Neoplasms/genetics , Animals , Cell Line, Tumor , Female , Gene Expression Regulation, Neoplastic , Humans , Mice, Inbred NOD , Mice, SCID , Thyroid Carcinoma, Anaplastic/pathology , Thyroid Neoplasms/pathology
7.
J Biol Chem ; 295(25): 8537-8549, 2020 06 19.
Article in English | MEDLINE | ID: mdl-32371391

ABSTRACT

Overexpression of centromeric proteins has been identified in a number of human malignancies, but the functional and mechanistic contributions of these proteins to disease progression have not been characterized. The centromeric histone H3 variant centromere protein A (CENPA) is an epigenetic mark that determines centromere identity. Here, using an array of approaches, including RNA-sequencing and ChIP-sequencing analyses, immunohistochemistry-based tissue microarrays, and various cell biology assays, we demonstrate that CENPA is highly overexpressed in prostate cancer in both tissue and cell lines and that the level of CENPA expression correlates with the disease stage in a large cohort of patients. Gain-of-function and loss-of-function experiments confirmed that CENPA promotes prostate cancer cell line growth. The results from the integrated sequencing experiments suggested a previously unidentified function of CENPA as a transcriptional regulator that modulates expression of critical proliferation, cell-cycle, and centromere/kinetochore genes. Taken together, our findings show that CENPA overexpression is crucial to prostate cancer growth.


Subject(s)
Centromere Protein A/metabolism , Histones/metabolism , Prostatic Neoplasms/pathology , Cell Cycle Proteins/metabolism , Cell Division , Cell Line, Tumor , Cell Proliferation/genetics , Centromere Protein A/antagonists & inhibitors , Centromere Protein A/genetics , Gain of Function Mutation , Histones/genetics , Humans , Male , Prostatic Neoplasms/metabolism , RNA Interference , RNA, Small Interfering/metabolism
8.
Nutr Cancer ; 71(5): 772-780, 2019.
Article in English | MEDLINE | ID: mdl-30862188

ABSTRACT

AIM: Soy isoflavones have been suggested as epigenetic modulating agents with effects that could be important in carcinogenesis. Hypomethylation of LINE-1 has been associated with head and neck squamous cell carcinoma (HNSCC) development from oral premalignant lesions and with poor prognosis. To determine whether neoadjuvant soy isoflavone supplementation could modulate LINE-1 methylation in HNSCC, we undertook a clinical trial. METHODS: Thirty-nine patients received 2-3 weeks of soy isoflavone supplements (300 mg/day) orally prior to surgery. Methylation of LINE-1 and 6 other genes was measured by pyrosequencing in biopsy, resection, and whole blood (WB) specimens. Changes in methylation were tested using paired t tests and ANOVA. Median follow-up was 45 months. RESULTS: LINE-1 methylation increased significantly after soy isoflavone supplementation (P < 0.005). The amount of change correlated positively with the number of days isoflavones were taken (P = 0.04). Similar changes were not seen in corresponding WB samples. No significant changes in tumor or blood methylation levels were seen in the other candidate genes. CONCLUSION: This is the first demonstration of in vivo increases in tissue-specific global methylation associated with soy isoflavone intake in patients with HNSCC. Prior associations of LINE-1 hypomethylation with genetic instability, carcinogenesis, and prognosis suggest that soy isoflavones may be potential chemopreventive agents in HNSCC.


Subject(s)
DNA Methylation/drug effects , Dietary Supplements , Head and Neck Neoplasms/drug therapy , Isoflavones/pharmacology , Long Interspersed Nucleotide Elements/drug effects , Squamous Cell Carcinoma of Head and Neck/drug therapy , Female , Humans , Male , Middle Aged , Glycine max
10.
Bioinformatics ; 33(15): 2381-2383, 2017 Aug 01.
Article in English | MEDLINE | ID: mdl-28369316

ABSTRACT

MOTIVATION: Analysis of next-generation sequencing data often results in a list of genomic regions. These may include differentially methylated CpGs/regions, transcription factor binding sites, interacting chromatin regions, or GWAS-associated SNPs, among others. A common analysis step is to annotate such genomic regions to genomic annotations (promoters, exons, enhancers, etc.). Existing tools are limited by a lack of annotation sources and flexible options, the time it takes to annotate regions, an artificial one-to-one region-to-annotation mapping, a lack of visualization options to easily summarize data, or some combination thereof. RESULTS: We developed the annotatr Bioconductor package to flexibly and quickly summarize and plot annotations of genomic regions. The annotatr package reports all intersections of regions and annotations, giving a better understanding of the genomic context of the regions. A variety of graphics functions are implemented to easily plot numerical or categorical data associated with the regions across the annotations, and across annotation intersections, providing insight into how characteristics of the regions differ across the annotations. We demonstrate that annotatr is up to 27× faster than comparable R packages. Overall, annotatr enables a richer biological interpretation of experiments. AVAILABILITY AND IMPLEMENTATION: http://bioconductor.org/packages/annotatr/ and https://github.com/rcavalcante/annotatr. CONTACT: rcavalca@umich.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Annotation/methods , Regulatory Sequences, Nucleic Acid , Sequence Analysis, DNA/methods , Software , Chromatin/metabolism , Exons , Genomics/methods , Polymorphism, Single Nucleotide
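To illustrate the difference between reporting all intersections and an artificial one-to-one region-to-annotation mapping, here is a brute-force Python sketch. It is not annotatr's R API; the `Interval` type, the function names and the example region and annotation labels are hypothetical, and coordinates are assumed to be half-open as in BED files.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive (BED-style half-open coordinates)
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open intervals overlap if they share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def annotate_all(regions: List[Interval],
                 annotations: List[Interval]) -> List[Tuple[str, str]]:
    """Return every (region, annotation) pair that intersects, not just one hit per region."""
    return [(r.name, a.name) for r in regions for a in annotations if overlaps(r, a)]


regions = [Interval("chr1", 100, 200, "dmr1"), Interval("chr2", 50, 60, "dmr2")]
annotations = [
    Interval("chr1", 150, 300, "promoter:GENE_A"),
    Interval("chr1", 180, 220, "exon1:GENE_A"),
    Interval("chr1", 200, 260, "enhancer:X"),  # starts exactly where dmr1 ends
]
print(annotate_all(regions, annotations))
```

Here `dmr1` is reported against both the promoter and the exon, which a one-to-one mapping would collapse into a single label; the enhancer is not reported because half-open intervals that merely touch share no base. The nested loop is quadratic and only suitable for illustration; genome-scale tools use interval trees or sorted sweeps.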
11.
J Biol Chem ; 291(37): 19274-86, 2016 09 09.
Article in English | MEDLINE | ID: mdl-27435678

ABSTRACT

A subset of thyroid carcinomas contains a t(2;3)(q13;p25) chromosomal translocation that fuses paired box gene 8 (PAX8) with the peroxisome proliferator-activated receptor γ gene (PPARG), resulting in expression of a PAX8-PPARγ fusion protein, PPFP. We previously generated a transgenic mouse model of PPFP thyroid carcinoma and showed that feeding the PPARγ agonist pioglitazone greatly decreased the size of the primary tumor and prevented metastatic disease in vivo. The antitumor effect correlates with the fact that pioglitazone turns PPFP into a strongly PPARγ-like molecule, resulting in trans-differentiation of the thyroid cancer cells into adipocyte-like cells that lose malignant character as they become more differentiated. To further study this process, we performed cell culture experiments with thyrocytes from the PPFP mouse thyroid cancers. Our data show that pioglitazone induced cellular lipid accumulation and the expression of adipocyte marker genes in the cultured cells, and shRNA knockdown of PPFP eliminated this pioglitazone effect. In addition, we found that PPFP and thyroid transcription factor 1 (TTF-1) physically interact, and that these transcription factors bind near each other on numerous target genes. TTF-1 knockdown and overexpression studies showed that TTF-1 inhibits PPFP target gene expression and impairs adipogenic trans-differentiation. Surprisingly, pioglitazone repressed TTF-1 expression in PPFP-expressing thyrocytes. Our data indicate that TTF-1 interacts with PPFP to inhibit the pro-adipogenic response to pioglitazone, and that the ability of pioglitazone to decrease TTF-1 expression contributes to its pro-adipogenic action.


Subject(s)
Adipogenesis , Cell Differentiation , Oncogene Proteins, Fusion/metabolism , PAX8 Transcription Factor/metabolism , PPAR gamma/metabolism , Thyroid Neoplasms/metabolism , Animals , Cell Line, Tumor , Mice , Nuclear Proteins , Oncogene Proteins, Fusion/genetics , PAX8 Transcription Factor/genetics , PPAR gamma/genetics , Rats , Thyroid Neoplasms/genetics , Thyroid Neoplasms/pathology , Thyroid Nuclear Factor 1 , Transcription Factors
12.
Bioinformatics ; 32(7): 1100-2, 2016 04 01.
Article in English | MEDLINE | ID: mdl-26607492

ABSTRACT

UNLABELLED: Tests for differential gene expression with RNA-seq data have a tendency to identify certain types of transcripts as significant, e.g. longer and highly-expressed transcripts. This tendency has been shown to bias gene set enrichment (GSE) testing, which is used to find over- or under-represented biological functions in the data. Yet, there remains a surprising lack of tools for GSE testing specific to RNA-seq. We present a new GSE method for RNA-seq data, RNA-Enrich, that accounts for the above tendency empirically by adjusting for average read count per gene. RNA-Enrich is a quick, flexible method and web-based tool, with 16 available gene annotation databases. It does not require a P-value cut-off to define differential expression, and works well even in experiments with small sample sizes. We show that adjusting for read counts per gene improves both the type I error rate and detection power of the test. AVAILABILITY AND IMPLEMENTATION: RNA-Enrich is available at http://lrpath.ncibi.org or from supplemental material as R code. CONTACT: sartorma@umich.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Molecular Sequence Annotation , Sequence Analysis, RNA , Gene Expression Profiling , RNA , Software
13.
Bioinformatics ; 32(10): 1536-43, 2016 05 15.
Article in English | MEDLINE | ID: mdl-26794319

ABSTRACT

MOTIVATION: Capabilities in the field of metabolomics have grown tremendously in recent years. Many existing resources contain the chemical properties and classifications of commonly identified metabolites. However, the annotation of small molecules (both endogenous and synthetic) to meaningful biological pathways and concepts still lags behind the analytical capabilities and the chemistry-based annotations. Furthermore, no tools are available to visually explore relationships and networks among functionally related groups of metabolites (biomedical concepts). Such a tool would provide the ability to establish testable hypotheses regarding links among metabolic pathways, cellular processes, phenotypes and diseases. RESULTS: Here we present ConceptMetab, an interactive web-based tool for mapping and exploring the relationships among 16 069 biologically defined metabolite sets developed from Gene Ontology, KEGG and Medical Subject Headings, using both KEGG and PubChem compound identifiers, and based on statistical tests for association. We demonstrate the utility of ConceptMetab with multiple scenarios, showing it can be used to identify known and potentially novel relationships among metabolic pathways, cellular processes, phenotypes and diseases, and provides an intuitive interface for linking compounds to their molecular functions and higher level biological effects. AVAILABILITY AND IMPLEMENTATION: http://conceptmetab.med.umich.edu CONTACTS: akarnovsky@umich.edu or sartorma@umich.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Metabolomics , Software , Datasets as Topic , Humans , Metabolic Networks and Pathways , Statistics as Topic , Vocabulary, Controlled
14.
Hum Mol Genet ; 23(17): 4528-42, 2014 Sep 01.
Article in English | MEDLINE | ID: mdl-24781209

ABSTRACT

To globally survey the changes in transcriptional landscape during terminal erythroid differentiation, we performed RNA sequencing (RNA-seq) on primary human CD34(+) cells after ex vivo differentiation from the earliest into the most mature erythroid cell stages. This analysis identified thousands of novel intergenic and intronic transcripts as well as novel alternative transcript isoforms. After rigorous data filtering, 51 (presumptive) novel protein-coding transcripts, 5326 long and 679 small non-coding RNA candidates remained. The analysis also revealed two clear transcriptional trends during terminal erythroid differentiation: first, the complexity of transcript diversity was predominantly achieved by alternative splicing, and second, splicing junctional diversity diminished during erythroid differentiation. Finally, 404 genes that were not known previously to be differentially expressed in erythroid cells were annotated. Analysis of the most extremely differentially expressed transcripts revealed that these gene products were all closely associated with hematopoietic lineage differentiation. Taken together, this study will serve as a comprehensive platform for future in-depth investigation of human erythroid development that, in turn, may reveal new insights into multiple layers of the transcriptional regulatory hierarchy that controls erythropoiesis.


Subject(s)
Erythropoiesis/genetics , Gene Expression Profiling , Gene Expression Regulation, Developmental , Adult , Cell Differentiation/genetics , Cell Lineage/genetics , Erythroid Cells/cytology , Erythroid Cells/metabolism , Humans , Open Reading Frames/genetics , Protein Isoforms/metabolism , RNA Splicing/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA, Untranslated/genetics , Sequence Analysis, RNA , beta-Globins/metabolism
15.
Breast Cancer Res Treat ; 158(1): 29-41, 2016 07.
Article in English | MEDLINE | ID: mdl-27306423

ABSTRACT

Curcumin is a potential agent for both the prevention and treatment of cancers. Curcumin treatment alone, or in combination with piperine, limits breast stem cell self-renewal, while remaining non-toxic to normal differentiated cells. We paired fluorescence-activated cell sorting with RNA sequencing to characterize the genome-wide changes induced specifically in normal breast stem cells following treatment with these compounds. We generated genome-wide maps of the transcriptional changes that occur in epithelial-like (ALDH+) and mesenchymal-like (ALDH-/CD44+/CD24-) normal breast stem/progenitor cells following treatment with curcumin and piperine. We show that curcumin targets both stem cell populations by down-regulating expression of breast stem cell genes including ALDH1A3, CD49f, PROM1, and TP63. We also identified novel genes and pathways targeted by curcumin, including downregulation of SCD. Transient siRNA knockdown of SCD in MCF10A cells significantly inhibited mammosphere formation and reduced the mean proportion of CD44+/CD24- cells, suggesting that SCD is a regulator of breast stemness and a target of curcumin in breast stem cells. These findings extend previous reports of curcumin targeting stem cells, here in two phenotypically distinct stem/progenitor populations isolated from normal human breast tissue. We identified novel mechanisms by which curcumin and piperine target breast stem cell self-renewal, such as by targeting lipid metabolism, providing a mechanistic link between curcumin treatment and stem cell self-renewal. These results elucidate the mechanisms by which curcumin may act as a cancer-preventive compound and provide novel targets for cancer prevention and treatment.


Subject(s)
Antineoplastic Agents/pharmacology , Breast Neoplasms/genetics , Curcumin/pharmacology , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Stearoyl-CoA Desaturase/genetics , Alkaloids/pharmacology , Benzodioxoles/pharmacology , Breast Neoplasms/prevention & control , Cell Differentiation/drug effects , Cell Line, Tumor , Cell Proliferation/drug effects , Cell Separation , Female , Flow Cytometry , Gene Expression Regulation, Neoplastic/drug effects , Humans , MCF-7 Cells , Piperidines/pharmacology , Polyunsaturated Alkamides/pharmacology , Stem Cells/drug effects , Stem Cells/metabolism
16.
Nucleic Acids Res ; 42(13): e105, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24878920

ABSTRACT

Gene set enrichment testing can enhance the biological interpretation of ChIP-seq data. Here, we develop a method, ChIP-Enrich, for this analysis which empirically adjusts for gene locus length (the length of the gene body and its surrounding non-coding sequence). Adjustment for gene locus length is necessary because it is often positively associated with the presence of one or more peaks and because many biologically defined gene sets have an excess of genes with longer or shorter gene locus lengths. Unlike alternative methods, ChIP-Enrich can account for the wide range of gene locus length-to-peak presence relationships (observed in ENCODE ChIP-seq data sets). We show that ChIP-Enrich has a well-calibrated type I error rate using permuted ENCODE ChIP-seq data sets; in contrast, two commonly used gene set enrichment methods, Fisher's exact test and the binomial test implemented in Genomic Regions Enrichment of Annotations Tool (GREAT), can have highly inflated type I error rates and biases in ranking. We identify DNA-binding proteins, including CTCF, JunD and glucocorticoid receptor α (GRα), that show different enrichment patterns for peaks closer to versus further from transcription start sites. We also identify known and potential new biological functions of GRα. ChIP-Enrich is available as a web interface (http://chip-enrich.med.umich.edu) and Bioconductor package.


Subject(s)
Chromatin Immunoprecipitation/methods , Genes , Genetic Loci , Sequence Analysis, DNA/methods , DNA-Binding Proteins/analysis , Logistic Models , Receptors, Glucocorticoid/analysis
17.
J Virol ; 88(16): 8924-35, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24872592

ABSTRACT

UNLABELLED: Approximately 8% of the human genome is made up of endogenous retroviral sequences. As the HIV-1 Tat protein activates the overall expression of the human endogenous retrovirus type K (HERV-K) (HML-2), we used next-generation sequencing to determine which of the 91 currently annotated HERV-K (HML-2) proviruses are regulated by Tat. Transcriptome sequencing of total RNA isolated from Tat- and vehicle-treated peripheral blood lymphocytes (PBLs) from a healthy donor showed that Tat significantly activates expression of 26 unique HERV-K (HML-2) proviruses, silences 12, and does not significantly alter the expression of the remaining proviruses. Quantitative reverse transcription-PCR validation of the sequencing data was performed on Tat-treated PBLs of seven donors using provirus-specific primers and corroborated the results with a substantial degree of quantitative similarity. IMPORTANCE: The expression of HERV-K (HML-2) is tightly regulated but becomes markedly increased following infection with HIV-1, in part due to the HIV-1 Tat protein. The findings reported here demonstrate the complexity of the genome-wide regulation of HERV-K (HML-2) expression by Tat. This work also demonstrates that although HERV-K (HML-2) proviruses in the human genome are highly similar in terms of DNA sequence, modulation of the expression of specific proviruses in a given biological situation can be ascertained using next-generation sequencing and bioinformatics analysis.


Subject(s)
Endogenous Retroviruses/genetics , Gene Products, tat/genetics , Gene Products, tat/metabolism , HIV-1/genetics , HIV-1/metabolism , Transcriptome/genetics , Cells, Cultured , Endogenous Retroviruses/metabolism , Genome, Human/genetics , HIV Infections/genetics , HIV Infections/metabolism , High-Throughput Nucleotide Sequencing/methods , Humans , Lymphocytes/virology , Proviruses/genetics , Proviruses/metabolism , RNA, Viral/genetics , Viral Proteins/genetics , Viral Proteins/metabolism
18.
Bioinformatics ; 30(17): 2414-22, 2014 Sep 01.
Article in English | MEDLINE | ID: mdl-24836530

ABSTRACT

MOTIVATION: DNA methylation plays critical roles in gene regulation and cellular specification without altering DNA sequences. The wide application of reduced representation bisulfite sequencing (RRBS) and whole genome bisulfite sequencing (bis-seq) opens the door to studying DNA methylation at single CpG site resolution. One challenging question is how best to test for significant methylation differences between groups of biological samples in order to minimize false positive findings. RESULTS: We present a statistical analysis package, methylSig, to analyse genome-wide methylation differences between samples from different treatments or disease groups. MethylSig takes into account both read coverage and biological variation by utilizing a beta-binomial approach across biological samples for a CpG site or region, and identifies relevant differences in CpG methylation. It can also incorporate local information to improve group methylation level and/or variance estimation for experiments with small sample size. A permutation study based on data from enhanced RRBS samples shows that methylSig maintains a well-calibrated type-I error when the number of samples is three or more per group. Our simulations show that methylSig has higher sensitivity compared with several alternative methods. The use of methylSig is illustrated with a comparison of different subtypes of acute leukemia and normal bone marrow samples. AVAILABILITY: methylSig is available as an R package at http://sartorlab.ccmb.med.umich.edu/software. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
DNA Methylation , Sequence Analysis, DNA/methods , Software , CpG Islands , Genomics , Humans , Leukemia, Myeloid, Acute/genetics , Sulfites
19.
Bioinformatics ; 30(17): i393-400, 2014 Sep 01.
Article in English | MEDLINE | ID: mdl-25161225

ABSTRACT

MOTIVATION: Functional enrichment testing facilitates the interpretation of chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) data in terms of pathways and other biological contexts. Previous methods developed and used to test for key gene sets affected in ChIP-seq experiments treat peaks as points, and are based on the number of peaks associated with a gene or a binary score for each gene. These approaches work well for transcription factors, but histone modifications often occur over broad domains, and across multiple genes. RESULTS: To incorporate the unique properties of broad domains into functional enrichment testing, we developed Broad-Enrich, a method that uses the proportion of each gene's locus covered by a peak. We show that our method has a well-calibrated false-positive rate and performs well compared with alternative approaches on ChIP-seq data with broad domains. We illustrate Broad-Enrich with 55 ENCODE ChIP-seq datasets using different methods to define gene loci. Broad-Enrich can also be applied to other datasets consisting of broad genomic domains such as copy number variations. AVAILABILITY AND IMPLEMENTATION: http://broad-enrich.med.umich.edu for Web version and R package. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
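The per-gene score Broad-Enrich is described as using, the proportion of each gene's locus covered by a peak, can be sketched as follows. This is an illustration under assumptions, not the package's code: coordinates are half-open `[start, end)` on a single chromosome, overlapping peaks are merged first so shared bases are not double-counted, and the gene names and locus definitions are hypothetical. The downstream enrichment test (the subject terms list logistic models) is not shown.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def locus_coverage(locus, merged_peaks):
    """Fraction of the locus [start, end) covered by the (merged) peaks."""
    lstart, lend = locus
    covered = 0
    for pstart, pend in merged_peaks:
        overlap = min(lend, pend) - max(lstart, pstart)
        if overlap > 0:
            covered += overlap
    return covered / (lend - lstart)


def coverage_scores(loci, peaks):
    """Map each (hypothetical) gene name to the proportion of its locus
    covered by peaks; all intervals are assumed to lie on one chromosome."""
    merged = merge_intervals(peaks)
    return {gene: locus_coverage(locus, merged) for gene, locus in loci.items()}
```

With loci `{"geneA": (1000, 2000), "geneB": (5000, 6000), "geneC": (9000, 10000)}` and peaks `[(1500, 2500), (1800, 2200), (5000, 5100)]`, geneA is half covered, geneB one-tenth covered, and geneC not covered. Unlike a peak count or a binary score, a broad domain spanning most of a locus scores near 1 even when it is called as a single peak.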


Subject(s)
Chromatin Immunoprecipitation/methods , Genomics/methods , Histones/metabolism , Cell Line , Genetic Loci , High-Throughput Nucleotide Sequencing , Humans , Logistic Models , Sequence Analysis, DNA , Transcription Factors/metabolism
20.
Bioinformatics ; 30(18): 2568-75, 2014 Sep 15.
Article in English | MEDLINE | ID: mdl-24894502

ABSTRACT

MOTIVATION: ChIP-Seq is the standard method to identify genome-wide DNA-binding sites for transcription factors (TFs) and histone modifications. There is a growing need to analyze experiments with biological replicates, especially for epigenomic experiments where variation among biological samples can be substantial. However, tools that can perform group comparisons are currently lacking. RESULTS: We present a peak-calling prioritization pipeline (PePr) for identifying consistent or differential binding sites in ChIP-Seq experiments with biological replicates. PePr models read counts across the genome among biological samples with a negative binomial distribution and uses a local variance estimation method, ranking consistent or differential binding sites more favorably than sites with greater variability. We compared PePr with commonly used and recently proposed approaches on eight TF datasets and show that PePr uniquely identifies consistent regions with enriched read counts, high motif occurrence rate and known characteristics of TF binding based on visual inspection. For histone modification data with broadly enriched regions, PePr identified differential regions that are consistent within groups and outperformed other methods in scaling False Discovery Rate (FDR) analysis. AVAILABILITY AND IMPLEMENTATION: http://code.google.com/p/pepr-chip-seq/.
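The combination of a negative binomial model with local variance estimation can be illustrated for the differential-binding case. This is a hedged sketch, not PePr's published estimator: it assumes the variance form `Var = mu + phi * mu**2`, estimates `phi` per window by the method of moments, and interprets "local" as averaging `phi` over neighbouring genomic windows (our assumption). Windows are then ranked by a delta-method z-score for the log fold change, so consistent replicates (small `phi`) rank above variable ones. All function names, `half_width`, and the pseudocount are hypothetical.

```python
import math


def mom_dispersion(counts):
    """Method-of-moments NB dispersion from Var = mu + phi * mu**2, clipped at 0."""
    n = len(counts)
    if n < 2:
        return 0.0
    mu = sum(counts) / n
    if mu == 0:
        return 0.0
    var = sum((c - mu) ** 2 for c in counts) / (n - 1)
    return max(0.0, (var - mu) / mu ** 2)


def local_dispersions(windows, half_width):
    """Average each window's dispersion with neighbours within +/- half_width."""
    raw = [mom_dispersion(w) for w in windows]
    out = []
    for i in range(len(raw)):
        lo, hi = max(0, i - half_width), min(len(raw), i + half_width + 1)
        out.append(sum(raw[lo:hi]) / (hi - lo))
    return out


def rank_differential(group1, group2, half_width=2, pseudo=0.5):
    """Rank windows by |z| for the log fold change of group2 over group1.

    group1[i] and group2[i] hold the replicate read counts in window i.
    Uses Var(log mean) ~= (1/mu + phi) / n with locally pooled phi.
    """
    phi1 = local_dispersions(group1, half_width)
    phi2 = local_dispersions(group2, half_width)
    scores = []
    for i, (a, b) in enumerate(zip(group1, group2)):
        m1 = sum(a) / len(a) + pseudo  # pseudocount avoids log(0)
        m2 = sum(b) / len(b) + pseudo
        var = (1 / m1 + phi1[i]) / len(a) + (1 / m2 + phi2[i]) / len(b)
        z = (math.log(m2) - math.log(m1)) / math.sqrt(var)
        scores.append((i, z))
    return sorted(scores, key=lambda s: abs(s[1]), reverse=True)
```

Pooling `phi` across windows stabilises the estimate when each group has only two or three replicates, at the cost of assuming that nearby windows share similar variability; a real pipeline would also normalise for library size, which this sketch omits.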


Subject(s)
Algorithms , Chromatin Immunoprecipitation/methods , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Animals , Cell Line, Tumor , Epigenomics , Mice , Nucleotide Motifs , Oligonucleotide Array Sequence Analysis , Sequence Analysis, DNA , Transcription Factors/metabolism