Search | BVS CLAP/SMR-OPS/OMS

1.

BioPlexR and BioPlexPy: integrated data products for the analysis of human protein interactions.

Geistlinger, Ludwig; Vargas, Roger; Lee, Tyrone; Pan, Joshua; Huttlin, Edward L; Gentleman, Robert.

Bioinformatics ; 39(3), 2023 Mar 01.

Article in English | MEDLINE | ID: mdl-36794911

ABSTRACT

SUMMARY: The BioPlex project has created two proteome-scale, cell-line-specific protein-protein interaction (PPI) networks: the first in 293T cells, including 120k interactions among 15k proteins; and the second in HCT116 cells, including 70k interactions between 10k proteins. Here, we describe programmatic access to the BioPlex PPI networks and integration with related resources from within R and Python. Besides PPI networks for 293T and HCT116 cells, this includes access to CORUM protein complex data, PFAM protein domain data, PDB protein structures, and transcriptome and proteome data for the two cell lines. The implemented functionality serves as a basis for integrative downstream analysis of BioPlex PPI data with domain-specific R and Python packages, including efficient execution of maximum scoring sub-network analysis, protein domain-domain association analysis, mapping of PPIs onto 3D protein structures and analysis of BioPlex PPIs at the interface of transcriptomic and proteomic data. AVAILABILITY AND IMPLEMENTATION: The BioPlex R package is available from Bioconductor (bioconductor.org/packages/BioPlex), and the BioPlex Python package is available from PyPI (pypi.org/project/bioplexpy). Applications and downstream analyses are available from GitHub (github.com/ccb-hms/BioPlexAnalysis).

Subject(s)

Proteome, Software, Humans, Proteomics, Protein Interaction Maps, Transcriptome

2.

Demographic, spatial and temporal dietary intake patterns among 526 774 23andMe research participants.

Shelton, Janie F; Cameron, Briana; Aslibekyan, Stella; Gentleman, Robert.

Public Health Nutr ; 24(10): 2952-2963, 2021 Jul.

Article in English | MEDLINE | ID: mdl-32597744

ABSTRACT

OBJECTIVE: To characterise dietary habits, their temporal and spatial patterns and associations with BMI in the 23andMe study population. DESIGN: We present a large-scale cross-sectional analysis of self-reported dietary intake data derived from the web-based National Health and Nutrition Examination Survey 2009-2010 dietary screener. Survey-weighted estimates for each food item were characterised by age, sex, race/ethnicity, education and BMI. Temporal patterns were plotted over a 2-year time period, and average consumption for select food items was mapped by state. Finally, dietary intake variables were tested for association with BMI. SETTING: US-based adults 20-85 years of age participating in the 23andMe research programme. PARTICIPANTS: Participants were 23andMe customers who consented to participate in research (n 526 774) and completed web-based surveys on demographic and dietary habits. RESULTS: Survey-weighted estimates show very few participants met federal recommendations for fruit: 2·6 %, vegetables: 5·9 % and dairy intake: 2·8 %. Between 2017 and 2019, fruit, vegetables and milk intake frequency declined, while total dairy remained stable and added sugars increased. Seasonal patterns in reporting were most pronounced for ice cream, chocolate, fruits and vegetables. Dietary habits varied across the USA, with higher intake of sugar and energy dense foods characterising areas with higher average BMI. In multivariate-adjusted models, BMI was directly associated with the intake of processed meat, red meat and dairy, and inversely associated with consumption of fruit, vegetables and whole grains. CONCLUSIONS: 23andMe research participants have created an opportunity for rapid, large-scale, real-time nutritional data collection, informing demographic, seasonal and spatial patterns with broad geographical coverage across the USA.

Subject(s)

Diet, Vegetables, Adult, Cross-Sectional Studies, Demography, Eating, Energy Intake, Feeding Behavior, Fruit, Humans, Nutrition Surveys
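The dietary study above reports survey-weighted estimates of the share of participants who meet federal intake recommendations. The sketch below is minimal and assumes that each respondent carries a single precomputed survey weight. The study's actual weighting scheme is not described here. Under that assumption, a weighted proportion is the total weight of respondents who meet the guideline divided by the total weight of all respondents. The data and weights are hypothetical.

```python
from typing import Sequence


def weighted_proportion(meets: Sequence[bool], weights: Sequence[float]) -> float:
    """Survey-weighted share of respondents for whom `meets` is True.

    Estimate = sum(w_i for respondents meeting the guideline) / sum(w_i).
    """
    if len(meets) != len(weights):
        raise ValueError("meets and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    met = sum(w for m, w in zip(meets, weights) if m)
    return met / total


# Hypothetical toy data: four respondents; the 2nd and 4th meet the guideline.
meets = [False, True, False, True]
weights = [1.5, 0.5, 1.0, 1.0]  # illustrative weights, not real survey weights
print(weighted_proportion(meets, weights))  # 0.375
```

The unweighted share in this toy example would be 0.5. Down-weighting the second respondent pulls the estimate to 0.375. Real survey analyses also require design-based variance estimates, which this sketch omits.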

3.

Orchestrating high-throughput genomic analysis with Bioconductor.

Huber, Wolfgang; Carey, Vincent J; Gentleman, Robert; Anders, Simon; Carlson, Marc; Carvalho, Benilton S; Bravo, Hector Corrada; Davis, Sean; Gatto, Laurent; Girke, Thomas; Gottardo, Raphael; Hahne, Florian; Hansen, Kasper D; Irizarry, Rafael A; Lawrence, Michael; Love, Michael I; MacDonald, James; Obenchain, Valerie; Oles, Andrzej K; Pagès, Hervé; Reyes, Alejandro; Shannon, Paul; Smyth, Gordon K; Tenenbaum, Dan; Waldron, Levi; Morgan, Martin.

Nat Methods ; 12(2): 115-21, 2015 Feb.

Article in English | MEDLINE | ID: mdl-25633503

ABSTRACT

Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors.

Subject(s)

Computational Biology, Gene Expression Profiling, Genomics/methods, High-Throughput Screening Assays/methods, Software, Programming Languages, User-Computer Interface

4.

VariantTools: an extensible framework for developing and testing variant callers.

Lawrence, Michael; Gentleman, Robert.

Bioinformatics ; 33(20): 3311-3313, 2017 Oct 15.

Article in English | MEDLINE | ID: mdl-29028267

ABSTRACT

MOTIVATION: Variant calling is the complex task of separating real polymorphisms from errors. The appropriate strategy will depend on characteristics of the sample, the sequencing methodology and the questions of interest. RESULTS: We present VariantTools, an extensible framework for developing and testing variant callers. There are facilities for reproducibly tallying, filtering, flagging and annotating variants. The tools are extensible, modular and flexible, so that they are tunable to particular use cases, and they interoperate with existing analysis software so that they can be embedded in established workflows. AVAILABILITY AND IMPLEMENTATION: VariantTools is available from http://www.bioconductor.org/. CONTACT: michafla@gene.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Genotyping Techniques/methods, Polymorphism, Genetic, Sequence Analysis, DNA/methods, Software, Genomics/methods
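VariantTools is an R/Bioconductor package, and the Python below does not use its API. It is a hypothetical sketch of the tally-then-filter pattern that the VariantTools abstract describes. First, it counts the bases observed at each position from aligned-read observations. Then it keeps the non-reference alleles that pass both a minimum read count and a minimum allele fraction. The thresholds `min_alt_reads` and `min_alt_fraction` are illustrative assumptions, not VariantTools defaults.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def tally(observations: Iterable[Tuple[int, str]]) -> Dict[int, Counter]:
    """Count the bases observed at each position from (position, base) pairs."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for pos, base in observations:
        counts[pos][base.upper()] += 1
    return dict(counts)


def filter_variants(
    counts: Dict[int, Counter],
    reference: Dict[int, str],
    min_alt_reads: int = 2,         # hypothetical threshold
    min_alt_fraction: float = 0.2,  # hypothetical threshold
) -> List[Tuple[int, str, int, float]]:
    """Keep non-reference alleles that pass both filters.

    Returns (position, alt_base, alt_reads, alt_fraction), sorted by position.
    """
    calls = []
    for pos in sorted(counts):
        depth = sum(counts[pos].values())
        ref_base = reference.get(pos)
        for base, n in sorted(counts[pos].items()):
            if base == ref_base:
                continue
            fraction = n / depth
            if n >= min_alt_reads and fraction >= min_alt_fraction:
                calls.append((pos, base, n, fraction))
    return calls


# Toy pileup: position 10 has 3 A (ref) + 2 G; position 11 has 4 C (ref) + 1 T.
obs = [(10, "A")] * 3 + [(10, "G")] * 2 + [(11, "C")] * 4 + [(11, "T")]
ref = {10: "A", 11: "C"}
print(filter_variants(tally(obs), ref))  # [(10, 'G', 2, 0.4)]
```

Tallying and filtering are kept as separate functions so that the filter thresholds can be tuned and re-run without recounting. This mirrors, loosely, the modularity the abstract emphasises.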

5.

Recurrent R-spondin fusions in colon cancer.

Seshagiri, Somasekar; Stawiski, Eric W; Durinck, Steffen; Modrusan, Zora; Storm, Elaine E; Conboy, Caitlin B; Chaudhuri, Subhra; Guan, Yinghui; Janakiraman, Vasantharajan; Jaiswal, Bijay S; Guillory, Joseph; Ha, Connie; Dijkgraaf, Gerrit J P; Stinson, Jeremy; Gnad, Florian; Huntley, Melanie A; Degenhardt, Jeremiah D; Haverty, Peter M; Bourgon, Richard; Wang, Weiru; Koeppen, Hartmut; Gentleman, Robert; Starr, Timothy K; Zhang, Zemin; Largaespada, David A; Wu, Thomas D; de Sauvage, Frederic J.

Nature ; 488(7413): 660-4, 2012 Aug 30.

Article in English | MEDLINE | ID: mdl-22895193

ABSTRACT

Identifying and understanding changes in cancer genomes is essential for the development of targeted therapeutics. Here we analyse systematically more than 70 pairs of primary human colon tumours by applying next-generation sequencing to characterize their exomes, transcriptomes and copy-number alterations. We have identified 36,303 protein-altering somatic changes that include several new recurrent mutations in the Wnt pathway gene TCF7L2, chromatin-remodelling genes such as TET2 and TET3 and receptor tyrosine kinases including ERBB3. Our analysis for significantly mutated cancer genes identified 23 candidates, including the cell cycle checkpoint kinase ATM. Copy-number and RNA-seq data analysis identified amplifications and corresponding overexpression of IGF2 in a subset of colon tumours. Furthermore, using RNA-seq data we identified multiple fusion transcripts including recurrent gene fusions involving R-spondin family members RSPO2 and RSPO3 that together occur in 10% of colon tumours. The RSPO fusions were mutually exclusive with APC mutations, indicating that they probably have a role in the activation of Wnt signalling and tumorigenesis. Consistent with this we show that the RSPO fusion proteins were capable of potentiating Wnt signalling. The R-spondin gene fusions and several other gene mutations identified in this study provide new potential opportunities for therapeutic intervention in colon cancer.

Subject(s)

Colonic Neoplasms/genetics, Gene Fusion/genetics, Genes, Neoplasm/genetics, Intercellular Signaling Peptides and Proteins/genetics, Thrombospondins/genetics, Ataxia Telangiectasia Mutated Proteins, Base Sequence, Cell Cycle Proteins/genetics, Colonic Neoplasms/metabolism, Colonic Neoplasms/pathology, DNA Copy Number Variations/genetics, DNA-Binding Proteins/genetics, Dioxygenases/genetics, Exome/genetics, Gene Expression Profiling, Gene Expression Regulation, Neoplastic/genetics, Genes, APC, Humans, Insulin-Like Growth Factor II/genetics, Molecular Sequence Data, Mutation/genetics, Polymorphism, Single Nucleotide/genetics, Protein Serine-Threonine Kinases/genetics, Proto-Oncogene Proteins/genetics, Receptor, ErbB-3/genetics, Sequence Analysis, RNA, Signal Transduction/genetics, Transcription Factor 7-Like 2 Protein/genetics, Tumor Suppressor Proteins/genetics, Wnt Proteins/metabolism

6.

Complex regulation of ADAR-mediated RNA-editing across tissues.

Huntley, Melanie A; Lou, Melanie; Goldstein, Leonard D; Lawrence, Michael; Dijkgraaf, Gerrit J P; Kaminker, Joshua S; Gentleman, Robert.

BMC Genomics ; 17: 61, 2016 Jan 15.

Article in English | MEDLINE | ID: mdl-26768488

ABSTRACT

BACKGROUND: RNA editing is a tightly regulated and essential cellular process for a properly functioning brain. Dysfunction of A-to-I RNA editing can have catastrophic effects, particularly in the central nervous system. Thus, understanding how the process of RNA editing is regulated has important implications for human health. However, at present, very little is known about the regulation of editing across tissues and individuals. RESULTS: Here we present an analysis of RNA-editing patterns from 9 different tissues harvested from a single mouse. For comparison, we also analyzed data for 5 of these tissues harvested from 15 additional animals. We find that tissue specificity of editing largely reflects differential expression of substrate transcripts across tissues. We identified a surprising enrichment of editing in intronic regions of brain transcripts, which could account for previously reported higher levels of editing in brain. There exists a small but remarkable amount of editing which is tissue-specific, despite comparable expression levels of the edit site across multiple tissues. Expression levels of editing enzymes and their isoforms can explain some, but not all, of this variation. CONCLUSIONS: Together, these data suggest a complex regulation of the RNA-editing process beyond transcript expression levels.

Subject(s)

Adenosine Deaminase/genetics, Organ Specificity/genetics, RNA Editing/genetics, RNA-Binding Proteins/genetics, Adenosine Deaminase/biosynthesis, Animals, Brain/growth & development, Brain/metabolism, Gene Expression Regulation, Humans, Introns/genetics, Mice, Protein Isoforms/genetics, RNA-Binding Proteins/biosynthesis, Transcription, Genetic

7.

The mutation spectrum revealed by paired genome sequences from a lung cancer patient.

Lee, William; Jiang, Zhaoshi; Liu, Jinfeng; Haverty, Peter M; Guan, Yinghui; Stinson, Jeremy; Yue, Peng; Zhang, Yan; Pant, Krishna P; Bhatt, Deepali; Ha, Connie; Johnson, Stephanie; Kennemer, Michael I; Mohan, Sankar; Nazarenko, Igor; Watanabe, Colin; Sparks, Andrew B; Shames, David S; Gentleman, Robert; de Sauvage, Frederic J; Stern, Howard; Pandita, Ajay; Ballinger, Dennis G; Drmanac, Radoje; Modrusan, Zora; Seshagiri, Somasekar; Zhang, Zemin.

Nature ; 465(7297): 473-7, 2010 May 27.

Article in English | MEDLINE | ID: mdl-20505728

ABSTRACT

Lung cancer is the leading cause of cancer-related mortality worldwide, with non-small-cell lung carcinomas in smokers being the predominant form of the disease. Although previous studies have identified important common somatic mutations in lung cancers, they have primarily focused on a limited set of genes and have thus provided a constrained view of the mutational spectrum. Recent cancer sequencing efforts have used next-generation sequencing technologies to provide a genome-wide view of mutations in leukaemia, breast cancer and cancer cell lines. Here we present the complete sequences of a primary lung tumour (60x coverage) and adjacent normal tissue (46x). Comparing the two genomes, we identify a wide variety of somatic variations, including >50,000 high-confidence single nucleotide variants. We validated 530 somatic single nucleotide variants in this tumour, including one in the KRAS proto-oncogene and 391 others in coding regions, as well as 43 large-scale structural variations. These constitute a large set of new somatic mutations and yield an estimated 17.7 per megabase genome-wide somatic mutation rate. Notably, we observe a distinct pattern of selection against mutations within expressed genes compared to non-expressed genes and in promoter regions up to 5 kilobases upstream of all protein-coding genes. Furthermore, we observe a higher rate of amino acid-changing mutations in kinase genes. We present a comprehensive view of somatic alterations in a single lung tumour, and provide the first evidence, to our knowledge, of distinct selective pressures present within the tumour environment.

Subject(s)

Carcinoma, Non-Small-Cell Lung/genetics, Genome, Human/genetics, Lung Neoplasms/genetics, Point Mutation/genetics, DNA Mutational Analysis, Humans, Male, Middle Aged, Models, Biological, Proto-Oncogene Mas, Selection, Genetic/genetics

8.

Differential genomic targeting of the transcription factor TAL1 in alternate haematopoietic lineages.

Palii, Carmen G; Perez-Iratxeta, Carolina; Yao, Zizhen; Cao, Yi; Dai, Fengtao; Davison, Jerry; Atkins, Harold; Allan, David; Dilworth, F Jeffrey; Gentleman, Robert; Tapscott, Stephen J; Brand, Marjorie.

EMBO J ; 30(3): 494-509, 2011 Feb 02.

Article in English | MEDLINE | ID: mdl-21179004

ABSTRACT

TAL1/SCL is a master regulator of haematopoiesis whose expression promotes opposite outcomes depending on the cell type: differentiation in the erythroid lineage or oncogenesis in the T-cell lineage. Here, we used a combination of ChIP sequencing and gene expression profiling to compare the function of TAL1 in normal erythroid and leukaemic T cells. Analysis of the genome-wide binding properties of TAL1 in these two haematopoietic lineages revealed new insight into the mechanism by which transcription factors select their binding sites in alternate lineages. Our study shows limited overlap in the TAL1-binding profile between the two cell types with an unexpected preference for ETS and RUNX motifs adjacent to E-boxes in the T-cell lineage. Furthermore, we show that TAL1 interacts with RUNX1 and ETS1, and that these transcription factors are critically required for TAL1 binding to genes that modulate T-cell differentiation. Thus, our findings highlight a critical role of the cellular environment in modulating transcription factor binding, and provide insight into the mechanism by which TAL1 inhibits differentiation leading to oncogenesis in the T-cell lineage.

Subject(s)

Basic Helix-Loop-Helix Transcription Factors/genetics, Cell Differentiation/genetics, Cell Transformation, Neoplastic/genetics, Hematopoiesis/genetics, Leukemia, T-Cell/metabolism, Proto-Oncogene Proteins/genetics, T-Lymphocytes/metabolism, Base Sequence, Basic Helix-Loop-Helix Transcription Factors/metabolism, Binding Sites/genetics, Cells, Cultured, Chromatin Immunoprecipitation, Core Binding Factor Alpha 2 Subunit/genetics, Core Binding Factor Alpha 2 Subunit/metabolism, Gene Expression Profiling, Hematopoiesis/physiology, Humans, Jurkat Cells, Leukemia, T-Cell/genetics, Microarray Analysis, Molecular Sequence Data, Proto-Oncogene Protein c-ets-1/genetics, Proto-Oncogene Protein c-ets-1/metabolism, Proto-Oncogene Proteins/metabolism, Reverse Transcriptase Polymerase Chain Reaction, Sequence Analysis, DNA, T-Cell Acute Lymphocytic Leukemia Protein 1, T-Lymphocytes/cytology

9.

The effects of hepatitis B virus integration into the genomes of hepatocellular carcinoma patients.

Jiang, Zhaoshi; Jhunjhunwala, Suchit; Liu, Jinfeng; Haverty, Peter M; Kennemer, Michael I; Guan, Yinghui; Lee, William; Carnevali, Paolo; Stinson, Jeremy; Johnson, Stephanie; Diao, Jingyu; Yeung, Stacy; Jubb, Adrian; Ye, Weilan; Wu, Thomas D; Kapadia, Sharookh B; de Sauvage, Frederic J; Gentleman, Robert C; Stern, Howard M; Seshagiri, Somasekar; Pant, Krishna P; Modrusan, Zora; Ballinger, Dennis G; Zhang, Zemin.

Genome Res ; 22(4): 593-601, 2012 Apr.

Article in English | MEDLINE | ID: mdl-22267523

ABSTRACT

Hepatitis B virus (HBV) infection is a leading risk factor for hepatocellular carcinoma (HCC). HBV integration into the host genome has been reported, but its scale, impact and contribution to HCC development are not clear. Here, we sequenced the tumor and nontumor genomes (>80× coverage) and transcriptomes of four HCC patients and identified 255 HBV integration sites. Increased sequencing to 240× coverage revealed a proportionally higher number of integration sites. Clonal expansion of HBV-integrated hepatocytes was found specifically in tumor samples. We observe a diverse collection of genomic perturbations near viral integration sites, including direct gene disruption, viral promoter-driven human transcription, viral-human transcript fusion, and DNA copy number alteration. Thus, we report the most comprehensive characterization of HBV integration in hepatocellular carcinoma patients. Such widespread random viral integration will likely increase carcinogenic opportunities in HBV-infected individuals.

Subject(s)

Carcinoma, Hepatocellular/genetics, Genome, Human/genetics, Hepatitis B virus/genetics, Hepatitis B/genetics, Liver Neoplasms/genetics, Virus Integration/genetics, Base Sequence, Binding Sites/genetics, Carcinoma, Hepatocellular/virology, Female, Gene Expression Profiling/methods, Gene Expression Regulation, Neoplastic, Hepatitis B/virology, Hepatitis B virus/physiology, Host-Pathogen Interactions/genetics, Humans, Liver Neoplasms/virology, Male, Molecular Sequence Data, Mutation, Oligonucleotide Array Sequence Analysis, Sequence Analysis, DNA/methods, Transcriptome/genetics

10.

Genome and transcriptome sequencing of lung cancers reveal diverse mutational and splicing events.

Liu, Jinfeng; Lee, William; Jiang, Zhaoshi; Chen, Zhongqiang; Jhunjhunwala, Suchit; Haverty, Peter M; Gnad, Florian; Guan, Yinghui; Gilbert, Houston N; Stinson, Jeremy; Klijn, Christiaan; Guillory, Joseph; Bhatt, Deepali; Vartanian, Steffan; Walter, Kimberly; Chan, Jocelyn; Holcomb, Thomas; Dijkgraaf, Peter; Johnson, Stephanie; Koeman, Julie; Minna, John D; Gazdar, Adi F; Stern, Howard M; Hoeflich, Klaus P; Wu, Thomas D; Settleman, Jeff; de Sauvage, Frederic J; Gentleman, Robert C; Neve, Richard M; Stokoe, David; Modrusan, Zora; Seshagiri, Somasekar; Shames, David S; Zhang, Zemin.

Genome Res ; 22(12): 2315-27, 2012 Dec.

Article in English | MEDLINE | ID: mdl-23033341

ABSTRACT

Lung cancer is a highly heterogeneous disease in terms of both underlying genetic lesions and response to therapeutic treatments. We performed deep whole-genome sequencing and transcriptome sequencing on 19 lung cancer cell lines and three lung tumor/normal pairs. Overall, our data show that cell line models exhibit similar mutation spectra to human tumor samples. Smoker and never-smoker cancer samples exhibit distinguishable patterns of mutations. A number of epigenetic regulators, including KDM6A, ASH1L, SMARCA4, and ATAD2, are frequently altered by mutations or copy number changes. A systematic survey of splice-site mutations identified 106 splice site mutations associated with cancer specific aberrant splicing, including mutations in several known cancer-related genes. RAC1b, an isoform of the RAC1 GTPase that includes one additional exon, was found to be preferentially up-regulated in lung cancer. We further show that its expression is significantly associated with sensitivity to a MAP2K (MEK) inhibitor PD-0325901. Taken together, these data present a comprehensive genomic landscape of a large number of lung cancer samples and further demonstrate that cancer-specific alternative splicing is a widespread phenomenon that has potential utility as therapeutic biomarkers. The detailed characterizations of the lung cancer cell lines also provide genomic context to the vast amount of experimental data gathered for these lines over the decades, and represent highly valuable resources for cancer biology.

Subject(s)

Alternative Splicing, Gene Expression Regulation, Neoplastic, Genome, Human/genetics, Lung Neoplasms/genetics, Mutation, Transcriptome, ATPases Associated with Diverse Cellular Activities, Adenosine Triphosphatases/genetics, Adenosine Triphosphatases/metabolism, Cell Line, Tumor, DNA Copy Number Variations, DNA Helicases/genetics, DNA Helicases/metabolism, DNA-Binding Proteins/genetics, DNA-Binding Proteins/metabolism, Epigenomics, Exons, Genetic Markers, Heterozygote, Histone Demethylases/genetics, Histone Demethylases/metabolism, Histone-Lysine N-Methyltransferase, Humans, Karyotyping/methods, Lung Neoplasms/pathology, Nuclear Proteins/genetics, Nuclear Proteins/metabolism, Polymorphism, Single Nucleotide, Reproducibility of Results, Sequence Analysis, RNA, Transcription Factors/genetics, Transcription Factors/metabolism, Up-Regulation, rac1 GTP-Binding Protein/genetics, rac1 GTP-Binding Protein/metabolism

11.

gCMAP: user-friendly connectivity mapping with R.

Sandmann, Thomas; Kummerfeld, Sarah K; Gentleman, Robert; Bourgon, Richard.

Bioinformatics ; 30(1): 127-8, 2014 Jan 01.

Article in English | MEDLINE | ID: mdl-24132929

ABSTRACT

UNLABELLED: Connections between disease phenotypes and drug effects can be made by identifying commonalities in the associated patterns of differential gene expression. Searchable databases that record the impacts of chemical or genetic perturbations on the transcriptome--here referred to as 'connectivity maps'--permit discovery of such commonalities. We describe two R packages, gCMAP and gCMAPWeb, which provide a complete framework to construct and query connectivity maps assembled from user-defined collections of differential gene expression data. Microarray or RNAseq data are processed in a standardized way, and results can be interrogated using various well-established gene set enrichment methods. The packages also feature an easy-to-deploy web application that facilitates reproducible research through automatic generation of graphical and tabular reports. AVAILABILITY AND IMPLEMENTATION: The gCMAP and gCMAPWeb R packages are freely available for UNIX, Windows and Mac OS X operating systems at Bioconductor (http://www.bioconductor.org).

Subject(s)

Oligonucleotide Array Sequence Analysis/methods, User-Computer Interface, Animals, Cell Line, Gene Expression Profiling/methods, Humans, Internet
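The gCMAP abstract says that results can be interrogated with "various well-established gene set enrichment methods", but it does not name them. One of the simplest such methods is a hypergeometric over-representation test, which is equivalent to a one-sided Fisher exact test. The sketch below implements it with the standard library only. It is not gCMAP code, and the gene identifiers in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, universe: int, set_size: int, query_size: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(universe, set_size, query_size).

    universe:   total number of genes measured (N)
    set_size:   genes in the annotated gene set (K)
    query_size: genes in the query, e.g. differentially expressed genes (n)
    k:          observed overlap between the query and the gene set
    """
    denom = comb(universe, query_size)
    upper = min(set_size, query_size)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    return sum(
        comb(set_size, i) * comb(universe - set_size, query_size - i)
        for i in range(k, upper + 1)
    ) / denom


def overlap_pvalue(query: set, gene_set: set, universe: set) -> float:
    """One-sided over-representation p-value of gene_set within query."""
    query = query & universe
    gene_set = gene_set & universe
    k = len(query & gene_set)
    return hypergeom_upper_tail(k, len(universe), len(gene_set), len(query))


# Hypothetical toy universe of 4 genes: both query genes fall in the set.
print(overlap_pvalue({"a", "b"}, {"a", "b"}, {"a", "b", "c", "d"}))  # 1/6, about 0.1667
```

Both the query and the gene set are restricted to the measured universe before counting. This matters in practice because annotation sets often contain genes that were not assayed.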

12.

Discriminative motif analysis of high-throughput dataset.

Yao, Zizhen; Macquarrie, Kyle L; Fong, Abraham P; Tapscott, Stephen J; Ruzzo, Walter L; Gentleman, Robert C.

Bioinformatics ; 30(6): 775-83, 2014 Mar 15.

Article in English | MEDLINE | ID: mdl-24162561

ABSTRACT

MOTIVATION: High-throughput ChIP-seq studies typically identify thousands of peaks for a single transcription factor (TF). It is common for traditional motif discovery tools to predict motifs that are statistically significant against a naïve background distribution but are of questionable biological relevance. RESULTS: We describe a simple yet effective algorithm for discovering differential motifs between two sequence datasets that is effective in eliminating systematic biases and scalable to large datasets. Tested on 207 ENCODE ChIP-seq datasets, our method identifies correct motifs in 78% of the datasets with known motifs, demonstrating improvement in both accuracy and efficiency compared with DREME, another state-of-the-art discriminative motif discovery tool. More interestingly, on the remaining more challenging datasets, we identify common technical or biological factors that compromise the motif search results and use advanced features of our tool to control for these factors. We also present case studies demonstrating the ability of our method to detect single base pair differences in DNA specificity of two similar TFs. Lastly, we demonstrate discovery of key TF motifs involved in tissue specification by examination of high-throughput DNase accessibility data. AVAILABILITY: The motifRG package is publicly available via the Bioconductor repository. CONTACT: yzizhen@fhcrc.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Chromatin Immunoprecipitation/methods, High-Throughput Nucleotide Sequencing/methods, Sequence Analysis, DNA/methods, Algorithms, Base Sequence, DNA/genetics, Humans, Transcription Factors/genetics
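The motifRG abstract does not describe its algorithm in enough detail to reproduce here. The sketch below is deliberately naive and illustrates only the discriminative idea: each k-mer is scored by how much more often it appears in foreground sequences (for example, ChIP-seq peaks) than in a matched background set, rather than against a fixed background model. Several simplifying assumptions apply. The score counts sequences that contain a k-mer rather than total occurrences, uses a pseudocount, and ignores reverse complements. The sequences are made up.

```python
from collections import Counter
from itertools import product
from typing import Iterable, List, Tuple


def kmer_presence(seqs: Iterable[str], k: int) -> Counter:
    """Number of sequences containing each k-mer at least once (forward strand only)."""
    counts: Counter = Counter()
    for s in seqs:
        s = s.upper()
        seen = {s[i:i + k] for i in range(len(s) - k + 1)}
        counts.update(seen)
    return counts


def discriminative_kmers(
    fg: List[str], bg: List[str], k: int, pseudo: float = 1.0
) -> List[Tuple[str, float]]:
    """Rank all ACGT k-mers by smoothed foreground/background presence ratio."""
    fg_c, bg_c = kmer_presence(fg, k), kmer_presence(bg, k)
    scores = []
    for kmer in ("".join(p) for p in product("ACGT", repeat=k)):
        # Smoothed fraction of sequences containing the k-mer in each set.
        fg_frac = (fg_c[kmer] + pseudo) / (len(fg) + 2 * pseudo)
        bg_frac = (bg_c[kmer] + pseudo) / (len(bg) + 2 * pseudo)
        scores.append((kmer, fg_frac / bg_frac))
    scores.sort(key=lambda t: t[1], reverse=True)
    return scores


# Made-up example: every foreground sequence carries the E-box CACGTG.
fg = ["TTCACGTGAA", "GCACGTGC", "ACACGTGT"]
bg = ["TTTTAAAA", "GGGCCCAA", "ATATATAT"]
print(discriminative_kmers(fg, bg, 6)[0][0])  # CACGTG
```

Because the background is an explicit second sequence set, any bias shared by both sets (for example, GC content) tends to cancel out in the ratio. This is the motivation the abstract gives for a discriminative approach.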

13.

Addressing the accuracy of direct-to-consumer genetic testing.

Wu, Shirley; Pollard, Jeffrey; Chowdry, Arnab; Scheller, Richard; Gentleman, Robert.

Genet Med ; 21(3): 758-759, 2019 Mar.

Article in English | MEDLINE | ID: mdl-29955106

Subject(s)

Direct-To-Consumer Screening and Testing, Genetic Testing, Humans, Patient Care

14.

Software for computing and annotating genomic ranges.

Lawrence, Michael; Huber, Wolfgang; Pagès, Hervé; Aboyoun, Patrick; Carlson, Marc; Gentleman, Robert; Morgan, Martin T; Carey, Vincent J.

PLoS Comput Biol ; 9(8): e1003118, 2013.

Article in English | MEDLINE | ID: mdl-23950696

ABSTRACT

We describe Bioconductor infrastructure for representing and computing on annotated genomic ranges and integrating genomic data with the statistical computing features of R and its extensions. At the core of the infrastructure are three packages: IRanges, GenomicRanges, and GenomicFeatures. These packages provide scalable data structures for representing annotated ranges on the genome, with special support for transcript structures, read alignments and coverage vectors. Computational facilities include efficient algorithms for overlap and nearest neighbor detection, coverage calculation and other range operations. This infrastructure directly supports more than 80 other Bioconductor packages, including those for sequence analysis, differential expression analysis and visualization.

Subject(s)

Databases, Genetic, Genomics/methods, Software, Algorithms, Animals, Genomics/standards, Humans, Mice, Sequence Alignment, Sequence Analysis, DNA
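IRanges and GenomicRanges implement overlap detection in R and C with their own efficient algorithms, and the Python below is not their code. It is a minimal sort-and-scan sketch of what an overlap query computes. It assumes closed integer intervals `[start, end]` on a single chromosome and ignores strand. The function name `find_overlaps` is chosen here for illustration only.

```python
from bisect import bisect_right
from typing import List, Tuple

Interval = Tuple[int, int]  # closed interval [start, end] on one chromosome


def find_overlaps(query: List[Interval], subject: List[Interval]) -> List[Tuple[int, int]]:
    """Return sorted (query_index, subject_index) pairs whose intervals overlap.

    Subjects are sorted by start; for each query, only subjects starting at or
    before the query end are examined. Simple, but worst-case
    O(len(query) * len(subject)).
    """
    order = sorted(range(len(subject)), key=lambda j: subject[j][0])
    starts = [subject[j][0] for j in order]
    hits = []
    for i, (qs, qe) in enumerate(query):
        # Number of subjects whose start is <= the query end.
        stop = bisect_right(starts, qe)
        for j in order[:stop]:
            if subject[j][1] >= qs:  # subject ends at or after the query start
                hits.append((i, j))
    return sorted(hits)


query = [(1, 5), (10, 12)]
subject = [(4, 8), (6, 9), (12, 20), (20, 30)]
print(find_overlaps(query, subject))  # [(0, 0), (1, 2)]
```

Two closed intervals overlap exactly when each one starts no later than the other ends. The bisect step applies the first half of that condition, and the inner `if` applies the second.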

15.

An integrative genomic approach identifies p73 and p63 as activators of miR-200 microRNA family transcription.

Knouf, Emily C; Garg, Kavita; Arroyo, Jason D; Correa, Yesenia; Sarkar, Deepayan; Parkin, Rachael K; Wurz, Kaitlyn; O'Briant, Kathy C; Godwin, Andrew K; Urban, Nicole D; Ruzzo, Walter L; Gentleman, Robert; Drescher, Charles W; Swisher, Elizabeth M; Tewari, Muneesh.

Nucleic Acids Res ; 40(2): 499-510, 2012 Jan.

Article in English | MEDLINE | ID: mdl-21917857

ABSTRACT

Although microRNAs (miRNAs) are important regulators of gene expression, the transcriptional regulation of miRNAs themselves is not well understood. We employed an integrative computational pipeline to dissect the transcription factors (TFs) responsible for altered miRNA expression in ovarian carcinoma. Using experimental data and computational predictions to define miRNA promoters across the human genome, we identified TFs with binding sites significantly overrepresented among miRNA genes overexpressed in ovarian carcinoma. This pipeline nominated TFs of the p53/p63/p73 family as candidate drivers of miRNA overexpression. Analysis of data from an independent set of 253 ovarian carcinomas in The Cancer Genome Atlas showed that p73 and p63 expression is significantly correlated with expression of miRNAs whose promoters contain p53/p63/p73 family binding sites. In experimental validation of specific miRNAs predicted by the analysis to be regulated by p73 and p63, we found that p53/p63/p73 family binding sites modulate promoter activity of miRNAs of the miR-200 family, which are known regulators of cancer stem cells and epithelial-mesenchymal transitions. Furthermore, in chromatin immunoprecipitation studies both p73 and p63 directly associated with the miR-200b/a/429 promoter. This study delineates an integrative approach that can be applied to discover transcriptional regulatory mechanisms in other biological settings where analogous genomic data are available.

Subject(s)

DNA-Binding Proteins/metabolism, Genomics/methods, MicroRNAs/genetics, Nuclear Proteins/metabolism, Transcription Factors/metabolism, Tumor Suppressor Proteins/metabolism, Binding Sites, Carcinoma/genetics, Carcinoma/metabolism, Cell Line, Tumor, Female, Genome, Human, Humans, MicroRNAs/biosynthesis, Molecular Sequence Annotation, Ovarian Neoplasms/genetics, Ovarian Neoplasms/metabolism, Promoter Regions, Genetic, Transcription Initiation Site, Transcriptional Activation, Tumor Protein p73

16.

nhanesA: achieving transparency and reproducibility in NHANES research.

Ale, Laha; Gentleman, Robert; Sonmez, Teresa Filshtein; Sarkar, Deepayan; Endres, Christopher.

Database (Oxford) ; 2024, 2024 Apr 15.

Article in English | MEDLINE | ID: mdl-38625809

ABSTRACT

The National Health and Nutrition Examination Survey provides comprehensive data on demographics, sociology, health and nutrition. Conducted in 2-year cycles since 1999, most of its data are publicly accessible, making it pivotal for research areas like studying social determinants of health or tracking trends in health metrics such as obesity or diabetes. Assembling the data and analyzing it presents a number of technical and analytic challenges. This paper introduces the nhanesA R package, which is designed to assist researchers in data retrieval and analysis and to enable the sharing and extension of prior research efforts. We believe that fostering community-driven activity in data reproducibility and sharing of analytic methods will greatly benefit the scientific community and propel scientific advancements. Database URL: https://github.com/cjendres1/nhanes.

Subject(s)

Information Storage and Retrieval, Nutrition Surveys, Reproducibility of Results, Databases, Factual

17.

Querying genomic databases: refining the connectivity map.

Segal, Mark R; Xiong, Hao; Bengtsson, Henrik; Bourgon, Richard; Gentleman, Robert.

Stat Appl Genet Mol Biol ; 11(2)2012 Jan 06.

Article in English | MEDLINE | ID: mdl-22499690

ABSTRACT

The advent of high-throughput biotechnologies, which can efficiently measure gene expression on a global basis, has led to the creation and population of correspondingly rich databases and compendia. Such repositories have the potential to add enormous scientific value beyond that provided by individual studies which, due largely to cost considerations, are typified by small sample sizes. Accordingly, substantial effort has been invested in devising analysis schemes for utilizing gene-expression repositories. Here, we focus on one such scheme, the Connectivity Map (cmap), that was developed with the express purpose of identifying drugs with putative efficacy against a given disease, where the disease in question is characterized by a (differential) gene-expression signature. Initial claims surrounding cmap intimated that such tools might lead to new, previously unanticipated applications of existing drugs. However, further application suggests that its primary utility is in connecting a disease condition whose biology is largely unknown to a drug whose mechanisms of action are well understood, making cmap a tool for enhancing biological knowledge. The success of the Connectivity Map is belied by its simplicity. The aforementioned signature serves as an unordered query which is applied to a customized database of (differential) gene-expression experiments designed to elicit response to a wide range of drugs, across a spectrum of concentrations, durations, and cell lines. Such application is effected by computing a per experiment score that measures "closeness" between the signature and the experiment. Top-scoring experiments, and the attendant drug(s), are then deemed relevant to the disease underlying the query. Inference supporting such elicitations is pursued via re-sampling. In this paper, we revisit two key aspects of the Connectivity Map implementation.
Firstly, we develop new approaches to measuring closeness for the common scenario wherein the query constitutes an ordered list. These involve using metrics proposed for analyzing partially ranked data, these being of interest in their own right and not widely used. Secondly, we advance an alternate inferential approach based on generating empirical null distributions that exploit the scope, and capture dependencies, embodied by the database. Using these refinements, we undertake a comprehensive re-evaluation of Connectivity Map findings that, in general terms, reveals that accommodating ordered queries is less critical than the mode of inference.
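To make the "per experiment score" and "re-sampling" steps concrete, here is a sketch of a Kolmogorov-Smirnov-style score for an unordered up/down query against one experiment's ranked gene list. It follows the statistic as it is commonly described for the original Connectivity Map, but the exact formulas, the sign rule and the random-signature null below are assumptions. It is not the refined ordered-query metric or the database-derived empirical null that this paper proposes. All function names are hypothetical.

```python
import random
from typing import Sequence


def ks_score(ranked_genes: Sequence[str], tags: set[str]) -> float:
    """KS-style enrichment of an unordered tag set within a ranked gene list.

    Positive when the tags cluster near the top of the ranking,
    negative when they cluster near the bottom.
    """
    n = len(ranked_genes)
    # 1-based positions of the tag genes; enumerate yields them in ascending order
    positions = [i + 1 for i, g in enumerate(ranked_genes) if g in tags]
    t = len(positions)
    if t == 0:
        raise ValueError("none of the tag genes occur in the ranked list")
    a = max(j / t - v / n for j, v in enumerate(positions, start=1))
    b = max(v / n - (j - 1) / t for j, v in enumerate(positions, start=1))
    return a if a > b else -b


def connectivity_score(ranked_genes: Sequence[str], up: set[str], down: set[str]) -> float:
    """Combine up- and down-tag KS scores for one experiment (no normalization)."""
    ks_up = ks_score(ranked_genes, up)
    ks_down = ks_score(ranked_genes, down)
    # Same sign: the signature is not coherently reproduced, so score it 0
    if ks_up * ks_down > 0:
        return 0.0
    return ks_up - ks_down


def resampling_pvalue(ranked_genes: Sequence[str], up: set[str], down: set[str],
                      n_perm: int = 1000, seed: int = 0) -> float:
    """Compare the observed |score| with scores of random signatures of equal size."""
    rng = random.Random(seed)
    genes = list(ranked_genes)
    observed = abs(connectivity_score(genes, up, down))
    n_up, n_down = len(up), len(down)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, n_up + n_down)
        null = connectivity_score(genes, set(sample[:n_up]), set(sample[n_up:]))
        if abs(null) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive
    return (hits + 1) / (n_perm + 1)
```

Because this null shuffles genes independently, it ignores the dependencies among experiments in the database. The abstract reports that this kind of choice of inference matters more than whether the query is ordered.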

Subject(s)

Data Mining/methods , Databases, Genetic , Gene Expression Profiling , Algorithms , Computational Biology/methods , Estrogens/pharmacology , Gene Expression/drug effects , Genetic Predisposition to Disease , Genomics/methods , Histone Deacetylase Inhibitors/pharmacology , Humans , Limonins/pharmacology

18.

Towards BioDBcore: a community-defined information specification for biological databases.

Gaudet, Pascale; Bairoch, Amos; Field, Dawn; Sansone, Susanna-Assunta; Taylor, Chris; Attwood, Teresa K; Bateman, Alex; Blake, Judith A; Bult, Carol J; Cherry, J Michael; Chisholm, Rex L; Cochrane, Guy; Cook, Charles E; Eppig, Janan T; Galperin, Michael Y; Gentleman, Robert; Goble, Carole A; Gojobori, Takashi; Hancock, John M; Howe, Douglas G; Imanishi, Tadashi; Kelso, Janet; Landsman, David; Lewis, Suzanna E; Mizrachi, Ilene Karsch; Orchard, Sandra; Ouellette, B F Francis; Ranganathan, Shoba; Richardson, Lorna; Rocca-Serra, Philippe; Schofield, Paul N; Smedley, Damian; Southan, Christopher; Tan, Tin Wee; Tatusova, Tatiana; Whetzel, Patricia L; White, Owen; Yamasaki, Chisato.

Nucleic Acids Res ; 39(Database issue): D7-10, 2011 Jan.

Article in English | MEDLINE | ID: mdl-21097465

ABSTRACT

The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.

Subject(s)

Databases, Factual/standards , Information Dissemination

19.

Independent filtering increases detection power for high-throughput experiments.

Bourgon, Richard; Gentleman, Robert; Huber, Wolfgang.

Proc Natl Acad Sci U S A ; 107(21): 9546-51, 2010 May 25.

Article in English | MEDLINE | ID: mdl-20460310

ABSTRACT

With high-dimensional data, variable-by-variable statistical testing is often used to select variables whose behavior differs across conditions. Such an approach requires adjustment for multiple testing, which can result in low statistical power. A two-stage approach that first filters variables by a criterion independent of the test statistic, and then only tests variables which pass the filter, can provide higher power. We show that use of some filter/test statistics pairs presented in the literature may, however, lead to loss of type I error control. We describe other pairs which avoid this problem. In an application to microarray data, we found that gene-by-gene filtering by overall variance followed by a t-test increased the number of discoveries by 50%. We also show that this particular statistic pair induces a lower bound on fold-change among the set of discoveries. Independent filtering (using filter/test pairs that are independent under the null hypothesis but correlated under the alternative) is a general approach that can substantially increase the efficiency of experiments.
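The two-stage idea can be sketched as follows. Stage 1 keeps the genes with the highest overall variance, computed across all samples while ignoring class labels. Stage 2 applies a multiple-testing adjustment only to the survivors' t-test p-values. The abstract does not name the adjustment, so Benjamini-Hochberg is an assumption here. The `keep_fraction` parameter and both function names are hypothetical. The p-values and variances are taken as precomputed inputs.

```python
from typing import Sequence


def benjamini_hochberg(pvalues: Sequence[float], alpha: float) -> list[bool]:
    """Return a rejection flag per p-value under the BH step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= k * alpha / m
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= k_max
    return rejected


def independent_filter(variances: Sequence[float], pvalues: Sequence[float],
                       keep_fraction: float, alpha: float = 0.1) -> list[int]:
    """Filter on overall variance, then BH-adjust only the genes that pass.

    Returns the indices of genes declared significant.
    """
    m = len(pvalues)
    n_keep = max(1, round(keep_fraction * m))
    # Stage 1: the filter uses only the label-free overall variance
    kept = sorted(range(m), key=lambda i: variances[i], reverse=True)[:n_keep]
    # Stage 2: the adjustment is over n_keep tests instead of m
    flags = benjamini_hochberg([pvalues[i] for i in kept], alpha)
    return sorted(i for i, f in zip(kept, flags) if f)
```

The gain comes from the smaller number of tests in stage 2. The procedure is only valid when the filter statistic is independent of the test statistic under the null hypothesis, which the paper establishes for the overall-variance/t-test pairing.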

Subject(s)

Biometry/methods , Algorithms , Computational Biology , Models, Genetic

20.

Circulating microRNAs as stable blood-based markers for cancer detection.

Mitchell, Patrick S; Parkin, Rachael K; Kroh, Evan M; Fritz, Brian R; Wyman, Stacia K; Pogosova-Agadjanyan, Era L; Peterson, Amelia; Noteboom, Jennifer; O'Briant, Kathy C; Allen, April; Lin, Daniel W; Urban, Nicole; Drescher, Charles W; Knudsen, Beatrice S; Stirewalt, Derek L; Gentleman, Robert; Vessella, Robert L; Nelson, Peter S; Martin, Daniel B; Tewari, Muneesh.

Proc Natl Acad Sci U S A ; 105(30): 10513-8, 2008 Jul 29.

Article in English | MEDLINE | ID: mdl-18663219

ABSTRACT

Improved approaches for the detection of common epithelial malignancies are urgently needed to reduce the worldwide morbidity and mortality caused by cancer. MicroRNAs (miRNAs) are small (approximately 22 nt) regulatory RNAs that are frequently dysregulated in cancer and have shown promise as tissue-based markers for cancer classification and prognostication. We show here that miRNAs are present in human plasma in a remarkably stable form that is protected from endogenous RNase activity. miRNAs originating from human prostate cancer xenografts enter the circulation, are readily measured in plasma, and can robustly distinguish xenografted mice from controls. This concept extends to cancer in humans, where serum levels of miR-141 (a miRNA expressed in prostate cancer) can distinguish patients with prostate cancer from healthy controls. Our results establish the measurement of tumor-derived miRNAs in serum or plasma as an important approach for the blood-based detection of human cancer.

Subject(s)

Biomarkers, Tumor/genetics , Gene Expression Regulation, Neoplastic , MicroRNAs/blood , MicroRNAs/genetics , Animals , Cloning, Molecular , Gene Expression Profiling , Humans , Male , Mice , Neoplasm Transplantation , Neoplasms/metabolism , Prostatic Neoplasms/blood , Prostatic Neoplasms/genetics , RNA, Neoplasm/blood , RNA, Neoplasm/metabolism , Ribonucleases/metabolism , Sensitivity and Specificity