Search | BVS CLAP/SMR-OPS/OMS

1.

mbQTL: an R/Bioconductor package for microbial quantitative trait loci (QTL) estimation.

Movassagh, Mercedeh; Schiff, Steven J; Paulson, Joseph N.

Bioinformatics ; 39(9), 2023 09 02.

Article in English | MEDLINE | ID: mdl-37707523

ABSTRACT

MOTIVATION: In recent years, significant strides have been made in the field of genomics, with the commencement of large-scale studies aimed at collecting host mutational profiles and microbiome data. Integrating host gene mutational profiles from both healthy and diseased subjects with microbial abundance data holds immense promise for providing insights into several crucial research questions, including the development and progression of diseases and individual responses to therapeutic interventions. With the advent of sequencing methods such as 16S ribosomal RNA (rRNA) sequencing and whole genome sequencing, there is increasing evidence of an interplay between human genetics and microbial communities. Quantitative trait loci associated with microbial abundance (mbQTLs) are genetic variants that influence the abundance of microbial populations within the host. RESULTS: Here, we introduce mbQTL, the first R package integrating 16S ribosomal RNA (rRNA) sequencing data with single-nucleotide variant (SNV) and single-nucleotide polymorphism (SNP) data. We describe the statistical methods implemented for identifying microbe-SNV pairs, the relevant statistical measures, and the plotting functionality for interpretation. AVAILABILITY AND IMPLEMENTATION: mbQTL is available on Bioconductor at https://bioconductor.org/packages/mbQTL/.

Subject(s)

Microbiota , Quantitative Trait Loci , Humans , RNA, Ribosomal, 16S/genetics , Genomics , Mutation , Nucleotides
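The abstract mentions statistical methods for identifying microbe-SNV pairs without naming them. One common way to test a single pair, assumed here purely for illustration and not taken from the mbQTL API, is to regress a log-transformed taxon abundance on additively coded genotype dosage (0, 1 or 2 copies of the minor allele) and read off the slope. The helper name `snv_abundance_slope` and the pseudocount of 1 are hypothetical choices.

```python
import math


def snv_abundance_slope(genotypes, counts, pseudocount=1.0):
    """Hypothetical helper: least-squares fit of log abundance on genotype dosage.

    genotypes: minor-allele dosages (0, 1 or 2) for each sample.
    counts: raw 16S read counts for one taxon in the same samples.
    Returns (slope, intercept) of log(count + pseudocount) ~ genotype.
    """
    if len(genotypes) != len(counts) or len(genotypes) < 2:
        raise ValueError("need at least two paired observations")
    # Log-transform with a pseudocount so zero counts stay finite.
    y = [math.log(c + pseudocount) for c in counts]
    n = len(y)
    mean_g = sum(genotypes) / n
    mean_y = sum(y) / n
    # Closed-form simple linear regression.
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("genotype is monomorphic; slope is undefined")
    sxy = sum((g - mean_g) * (v - mean_y) for g, v in zip(genotypes, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g
    return slope, intercept
```

For example, counts of 9, 19 and 39 at dosages 0, 1 and 2 give shifted abundances of 10, 20 and 40, so the fitted slope is log(2): each extra allele doubles the abundance. A real analysis would also need a test statistic, covariates and multiple-testing control across all microbe-SNV pairs.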

2.

Neonatal Paenibacilliosis: Paenibacillus Infection as a Novel Cause of Sepsis in Term Neonates With High Risk of Sequelae in Uganda.

Ericson, Jessica E; Burgoine, Kathy; Kumbakumba, Elias; Ochora, Moses; Hehnly, Christine; Bajunirwe, Francis; Bazira, Joel; Fronterre, Claudio; Hagmann, Cornelia; Kulkarni, Abhaya V; Kumar, M Senthil; Magombe, Joshua; Mbabazi-Kabachelor, Edith; Morton, Sarah U; Movassagh, Mercedeh; Mugamba, John; Mulondo, Ronald; Natukwatsa, Davis; Kaaya, Brian Nsubuga; Olupot-Olupot, Peter; Onen, Justin; Sheldon, Kathryn; Smith, Jasmine; Ssentongo, Paddy; Ssenyonga, Peter; Warf, Benjamin; Wegoye, Emmanuel; Zhang, Lijun; Kiwanuka, Julius; Paulson, Joseph N; Broach, James R; Schiff, Steven J.

Clin Infect Dis ; 77(5): 768-775, 2023 09 11.

Article in English | MEDLINE | ID: mdl-37279589

ABSTRACT

BACKGROUND: Paenibacillus thiaminolyticus may be an underdiagnosed cause of neonatal sepsis. METHODS: We prospectively enrolled a cohort of 800 full-term neonates presenting with a clinical diagnosis of sepsis at 2 Ugandan hospitals. Quantitative polymerase chain reaction assays specific to P. thiaminolyticus and to the Paenibacillus genus were performed on the blood and cerebrospinal fluid (CSF) of the 631 neonates who had both specimen types available. Neonates with the Paenibacillus genus or species detected in either specimen type were considered to potentially have paenibacilliosis (37/631, 6%). We described antenatal, perinatal, and neonatal characteristics, presenting signs, and 12-month developmental outcomes for neonates with paenibacilliosis versus clinical sepsis due to other causes. RESULTS: Median age at presentation was 3 days (interquartile range 1-7). Fever (92%), irritability (84%), and clinical signs of seizures (51%) were common. Eleven (30%) had an adverse outcome: 5 (14%) neonates died during the first year of life; 5 of 32 (16%) survivors developed postinfectious hydrocephalus (PIH), and 1 (3%) additional survivor had neurodevelopmental impairment without hydrocephalus. CONCLUSIONS: Paenibacillus species were identified in 6% of neonates with signs of sepsis who presented to 2 Ugandan referral hospitals; 70% of these were P. thiaminolyticus. Improved diagnostics for neonatal sepsis are urgently needed. The optimal antibiotic treatment for this infection is unknown, but ampicillin and vancomycin will be ineffective in many cases. These results highlight the need to consider local pathogen prevalence and the possibility of unusual pathogens when choosing antibiotics for neonatal sepsis.

Subject(s)

Hydrocephalus , Neonatal Sepsis , Paenibacillus , Sepsis , Infant, Newborn , Humans , Female , Pregnancy , Uganda/epidemiology , Sepsis/complications , Sepsis/epidemiology , Sepsis/drug therapy , Anti-Bacterial Agents/therapeutic use , Disease Progression
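The percentages in this abstract can be cross-checked against the counts it reports. The short script below only recomputes those figures and adds no new data; it takes the survivor denominator as 37 - 5 = 32, which matches the "5 of 32" in the results.

```python
# Counts copied from the abstract above.
tested = 631   # neonates with both blood and CSF specimens
paeni = 37     # Paenibacillus detected in either specimen
deaths = 5     # died during the first year of life
pih = 5        # survivors with postinfectious hydrocephalus
ndi = 1        # survivor with impairment but no hydrocephalus

survivors = paeni - deaths
adverse = deaths + pih + ndi


def pct(part, whole):
    """Percentage rounded to the nearest whole number, as reported."""
    return round(100 * part / whole)


summary = {
    "prevalence": pct(paeni, tested),             # 37/631
    "deaths": pct(deaths, paeni),                 # 5/37
    "pih_among_survivors": pct(pih, survivors),   # 5/32
    "ndi_among_survivors": pct(ndi, survivors),   # 1/32
    "adverse": pct(adverse, paeni),               # 11/37
}
print(survivors, adverse, summary)
```

Every recomputed value matches the rounded figure in the abstract, and the three adverse-outcome categories sum to the reported eleven.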

3.

GameRank: R package for feature selection and construction.

Henneges, Carsten; Paulson, Joseph N.

Bioinformatics ; 38(20): 4840-4842, 2022 10 14.

Article in English | MEDLINE | ID: mdl-35951761

ABSTRACT

MOTIVATION: Calibrated and discriminating predictive models can be built by directly optimizing model performance metrics with combinatorial search algorithms. Predictive algorithms are often desired in clinical settings to identify patients who may be at high or low risk. However, because the combinatorial search space is large, these algorithms are slow and do not guarantee the global optimality of their selection. RESULTS: Here, we present GameRank, a novel and fast maximum likelihood-based feature selection algorithm. The method is implemented in an R package that includes additional functions for building calibrated and discriminative predictive models. AVAILABILITY AND IMPLEMENTATION: GameRank is available at https://github.com/Genentech/GameRank and released under the MIT License.

Subject(s)

Algorithms , Software , Humans , Likelihood Functions , Research Design

4.

mirTarRnaSeq: An R/Bioconductor Statistical Package for miRNA-mRNA Target Identification and Interaction Analysis.

Movassagh, Mercedeh; Morton, Sarah U; Hehnly, Christine; Smith, Jasmine; Doan, Trang T; Irizarry, Rafael; Broach, James R; Schiff, Steven J; Bailey, Jeffrey A; Paulson, Joseph N.

BMC Genomics ; 23(1): 439, 2022 Jun 13.

Article in English | MEDLINE | ID: mdl-35698050

ABSTRACT

We introduce mirTarRnaSeq, an R/Bioconductor package for quantitative assessment of miRNA-mRNA relationships within sample cohorts. mirTarRnaSeq is a statistical package for exploring predicted or pre-hypothesized miRNA-mRNA relationships following target prediction. We present two use cases applying mirTarRnaSeq. First, to identify miRNA targets, we examined EBV miRNAs for interaction with the human and viral transcriptomes of stomach adenocarcinoma. This revealed enrichment of mRNA targets highly expressed in CD105+ endothelial cells, monocytes, CD4+ T cells, NK cells, CD19+ B cells, and CD34 cells. Next, to investigate miRNA-mRNA relationships in SARS-CoV-2 (COVID-19) infection across time, we used paired miRNA and RNA sequencing datasets of SARS-CoV-2-infected lung epithelial cells across three time points (4, 12, and 24 hours post-infection). mirTarRnaSeq identified evidence for human miRNAs targeting cytokine signaling and neutrophil regulation immune pathways from 4 to 24 hours after SARS-CoV-2 infection. Confirming the clinical relevance of these predictions, three of the immune-specific mRNA-miRNA relationships identified in human lung epithelial cells after SARS-CoV-2 infection were also observed to be differentially expressed in blood from patients with COVID-19. Overall, mirTarRnaSeq is a robust tool that can address a wide range of biological questions, providing improved prediction of miRNA-mRNA interactions.

Subject(s)

COVID-19 , MicroRNAs , COVID-19/genetics , Endothelial Cells , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , SARS-CoV-2
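The abstract does not describe the models mirTarRnaSeq fits. As a minimal, hypothetical illustration of quantifying one miRNA-mRNA relationship across a sample cohort, the sketch below computes the Pearson correlation between a miRNA and each predicted target, assuming both are already on a log-expression scale. Because miRNAs typically repress their targets, a predicted pair that the data support would be expected to correlate negatively, so candidates are ranked from most negative upward. The function names are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("constant vector; correlation undefined")
    return cov / math.sqrt(vx * vy)


def rank_candidate_targets(mirna_expr, mrna_expr_by_gene):
    """Hypothetical helper: sort predicted targets from most negative correlation up."""
    scores = {gene: pearson(mirna_expr, expr) for gene, expr in mrna_expr_by_gene.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

Correlation alone cannot separate direct targeting from shared upstream regulation, which is why target prediction is applied first.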

5.

MicrobiomeExplorer: an R package for the analysis and visualization of microbial communities.

Reeder, Janina; Huang, Mo; Kaminker, Joshua S; Paulson, Joseph N.

Bioinformatics ; 37(9): 1317-1318, 2021 06 09.

Article in English | MEDLINE | ID: mdl-32960962

ABSTRACT

SUMMARY: We developed the MicrobiomeExplorer R package to facilitate the analysis and visualization of microbial communities. The MicrobiomeExplorer R package allows a user to perform typical microbiome analytic workflows and visualize their results, either through the command line or an interactive Shiny application included with the package. In addition to applying common analytical workflows, the application enables automated analysis report generation. AVAILABILITY AND IMPLEMENTATION: Available at https://github.com/zoecastillo/microbiomeExplorer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Microbiota , Software

6.

Prognostic mutational subtyping in de novo diffuse large B-cell lymphoma.

Kim, Eugene; Jiang, Yanwen; Xu, Tao; Bazeos, Alexandra; Knapp, Andrea; Bolen, Christopher R; Humphrey, Kathryn; Nielsen, Tina G; Penuel, Elicia; Paulson, Joseph N.

BMC Cancer ; 22(1): 231, 2022 Mar 03.

Article in English | MEDLINE | ID: mdl-35236331

ABSTRACT

BACKGROUND: Diffuse large B-cell lymphoma (DLBCL) is a heterogeneous disease defined using a number of well-established molecular subsets. Non-negative matrix factorization (NMF) applied to whole exome sequence data has previously identified six distinct molecular clusters in DLBCL with potential clinical relevance. In this study, we applied NMF clustering to targeted sequencing data generated with the FoundationOne Heme® panel from the Phase III GOYA (NCT01287741) and Phase Ib/II CAVALLI (NCT02055820) studies in de novo DLBCL. Biopsy samples, survival outcomes, RNA-Seq and targeted exome-sequencing data were available for 423 patients in GOYA (obinutuzumab [G]-cyclophosphamide, doxorubicin, vincristine, and prednisone [CHOP] vs rituximab [R]-CHOP) and 86 patients in CAVALLI (venetoclax+[G/R]-CHOP). RESULTS: When the NMF algorithm was applied to samples from the GOYA study analyzed using a comprehensive genomic profiling platform, four of the six previously reported groups were observed: MYD88/CD79B, BCL2/EZH2, NOTCH2/TNFAIP3, and no mutations. Mutation profiles, cell-of-origin subset distributions and clinical associations of the MYD88/CD79B and BCL2/EZH2 groups were similar to those described in previous NMF studies. In contrast, applying NMF to the CAVALLI study yielded only three clusters (MYD88/CD79B-like, BCL2/EZH2-like, and a no-mutations group), with a trend towards improved outcomes for BCL2/EZH2 over MYD88/CD79B. CONCLUSIONS: This analysis supports the utility of NMF used in conjunction with targeted sequencing platforms for identifying patients in different prognostic subsets. The observed trend for improved overall survival in the BCL2/EZH2 group is consistent with the mechanism of action of venetoclax, suggesting that targeted sequencing and NMF have potential for identifying patients who are more likely to benefit from venetoclax therapy.

Subject(s)

Lymphoma, Large B-Cell, Diffuse/genetics , Mutation/genetics , Adult , Aged , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Bridged Bicyclo Compounds, Heterocyclic/therapeutic use , Clinical Trials, Phase II as Topic , Clinical Trials, Phase III as Topic , Enhancer of Zeste Homolog 2 Protein/genetics , Female , Humans , Lymphoma, Large B-Cell, Diffuse/drug therapy , Male , Middle Aged , Prognosis , Proto-Oncogene Proteins c-bcl-2/genetics , RNA-Seq , Sulfonamides/therapeutic use , Treatment Outcome , Exome Sequencing
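The abstract applies non-negative matrix factorization (NMF) to targeted sequencing data but does not give the implementation. The sketch below, assuming numpy, uses the standard Lee-Seung multiplicative updates for the squared Frobenius objective to factor a non-negative patient-by-gene mutation matrix V into W (patients by k) and H (k by genes), then assigns each patient to the component with the largest weight in W. The rank k, iteration count, random initialization and argmax assignment rule are illustrative choices, not the settings used in the GOYA or CAVALLI analyses.

```python
import numpy as np


def nmf(V, k, n_iter=500, eps=1e-9, seed=0):
    """Factor non-negative V (n x m) into W (n x k) and H (k x m).

    Uses Lee & Seung multiplicative updates, which keep W and H non-negative
    and, apart from the small eps safeguard, do not increase ||V - WH||^2.
    """
    V = np.asarray(V, dtype=float)
    if (V < 0).any():
        raise ValueError("NMF requires a non-negative matrix")
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # eps in the denominators avoids division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def cluster_patients(V, k, **kwargs):
    """Assign each patient (row of V) to the component with the largest weight."""
    W, H = nmf(V, k, **kwargs)
    return W.argmax(axis=1), W, H
```

On a toy matrix in which three patients share mutations in two genes and three other patients share mutations in two different genes, a rank-2 factorization recovers the two groups. Real mutation matrices are far noisier, and the number of clusters is usually chosen by comparing stability or fit across several values of k.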

7.

Multivariable association discovery in population-scale meta-omics studies.

Mallick, Himel; Rahnavard, Ali; McIver, Lauren J; Ma, Siyuan; Zhang, Yancong; Nguyen, Long H; Tickle, Timothy L; Weingart, George; Ren, Boyu; Schwager, Emma H; Chatterjee, Suvo; Thompson, Kelsey N; Wilkinson, Jeremy E; Subramanian, Ayshwarya; Lu, Yiren; Waldron, Levi; Paulson, Joseph N; Franzosa, Eric A; Bravo, Hector Corrada; Huttenhower, Curtis.

PLoS Comput Biol ; 17(11): e1009442, 2021 11.

Article in English | MEDLINE | ID: mdl-34784344

ABSTRACT

It is challenging to associate features such as human health outcomes, diet, environmental conditions, or other metadata with microbial community measurements, due in part to their quantitative properties. Microbiome multi-omics are typically noisy, sparse (zero-inflated), high-dimensional, extremely non-normal, and often in the form of count or compositional measurements. Here we introduce an optimized combination of novel and established methodology to assess multivariable association of microbial community features with complex metadata in population-scale observational studies. Our approach, MaAsLin 2 (Microbiome Multivariable Associations with Linear Models), uses generalized linear and mixed models to accommodate a wide variety of modern epidemiological studies, including cross-sectional and longitudinal designs, as well as a variety of data types (e.g., counts and relative abundances) with or without covariates and repeated measurements. To construct this method, we conducted a large-scale evaluation of a broad range of scenarios under which straightforward identification of meta-omics associations can be challenging. These simulation studies reveal that MaAsLin 2's linear model preserves statistical power in the presence of repeated measures and multiple covariates, while accounting for the nuances of meta-omics features and controlling false discovery. We also applied MaAsLin 2 to a microbial multi-omics dataset from the Integrative Human Microbiome Project (HMP2), which, in addition to reproducing established results, revealed a unique, integrated landscape of inflammatory bowel diseases (IBD) across multiple time points and omics profiles.

Subject(s)

Computational Biology , Gastrointestinal Microbiome , Multivariate Analysis , Computer Simulation , Humans , Inflammatory Bowel Diseases/genetics , Inflammatory Bowel Diseases/metabolism , Inflammatory Bowel Diseases/pathology
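The abstract says MaAsLin 2 controls false discovery across the many feature-metadata tests it runs. A standard procedure for this is Benjamini-Hochberg (BH); the sketch below computes BH-adjusted p-values in plain Python. It is a generic implementation of that procedure, not MaAsLin 2's code, and it assumes the per-feature p-values have already been obtained from the fitted linear or mixed models.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order.

    For sorted p-values p_(1) <= ... <= p_(m), the adjusted value at rank i is
    min over j >= i of (m / j) * p_(j), capped at 1.
    """
    m = len(pvalues)
    if m == 0:
        return []
    if any(not 0.0 <= p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in [0, 1]")
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        value = pvalues[idx] * m / rank
        running_min = min(running_min, value)
        adjusted[idx] = running_min
    return adjusted
```

Features whose adjusted value falls below the chosen threshold (for example 0.05) are reported as associations, which bounds the expected proportion of false discoveries among them when the tests are independent or positively dependent.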

8.

metagenomeFeatures: an R package for working with 16S rRNA reference databases and marker-gene survey feature data.

Olson, Nathan D; Shah, Nidhi; Kancherla, Jayaram; Wagner, Justin; Paulson, Joseph N; Corrada Bravo, Hector.

Bioinformatics ; 35(19): 3870-3872, 2019 10 01.

Article in English | MEDLINE | ID: mdl-30821316

ABSTRACT

SUMMARY: We developed the metagenomeFeatures R Bioconductor package along with annotation packages for three 16S rRNA databases (Greengenes, RDP and SILVA) to facilitate working with 16S rRNA databases and marker-gene survey feature data. The metagenomeFeatures package defines two classes, MgDb for working with 16S rRNA sequence databases, and mgFeatures for marker-gene survey feature data. The associated annotation packages provide a consistent interface to the different databases facilitating database comparison and exploration. The mgFeatures-class represents a crucial step in the development of a common data structure for working with 16S marker-gene survey data in R. AVAILABILITY AND IMPLEMENTATION: https://bioconductor.org/packages/release/bioc/html/metagenomeFeatures.html. SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.

Subject(s)

Databases, Nucleic Acid , Software , RNA, Ribosomal, 16S , Surveys and Questionnaires

9.

Prognostic impact of somatic mutations in diffuse large B-cell lymphoma and relationship to cell-of-origin: data from the phase III GOYA study.

Bolen, Christopher R; Klanova, Magdalena; Trneny, Marek; Sehn, Laurie H; He, Jie; Tong, Jing; Paulson, Joseph N; Kim, Eugene; Vitolo, Umberto; Di Rocco, Alice; Fingerle-Rowson, Günter; Nielsen, Tina; Lenz, Georg; Oestergaard, Mikkel Z.

Haematologica ; 105(9): 2298-2307, 2020 09 01.

Article in English | MEDLINE | ID: mdl-33054054

ABSTRACT

Diffuse large B-cell lymphoma represents a biologically and clinically heterogeneous diagnostic category with well-defined cell-of-origin subtypes. Using data from the GOYA study (NCT01287741), we characterized the mutational profile of diffuse large B-cell lymphoma and evaluated the prognostic impact of somatic mutations in relation to cell-of-origin. Targeted DNA next-generation sequencing was performed in 499 formalin-fixed paraffin-embedded tissue biopsies from previously untreated patients. Prevalence of genetic alterations/mutations was examined. Multivariate Cox regression was used to evaluate the prognostic effect of individual genomic alterations. Of 465 genes analyzed, 59 were identified with mutations occurring in at least 10 of 499 patients (≥2% prevalence); 334 additional genes had mutations occurring in ≥1 patient. Single nucleotide variants were the most common mutation type. On multivariate analysis, BCL2 alterations were most strongly associated with shorter progression-free survival (multivariate hazard ratio: 2.6; 95% confidence interval: 1.6 to 4.2). BCL2 alterations were detected in 102 of 499 patients; 92 had BCL2 translocations, 90% of whom had germinal center B-cell-like diffuse large B-cell lymphoma. BCL2 alterations were also significantly correlated with BCL2 gene and protein expression levels. Validation of published mutational subsets revealed consistent patterns of co-occurrence, but no consistent prognostic differences between subsets. Our data confirm the molecular heterogeneity of diffuse large B-cell lymphoma, with potential treatment targets occurring in distinct cell-of-origin subtypes. clinicaltrials.gov identifier: NCT01287741.

Subject(s)

Lymphoma, Large B-Cell, Diffuse , Proto-Oncogene Proteins c-myc , Antineoplastic Combined Chemotherapy Protocols , Cyclophosphamide/therapeutic use , Doxorubicin/therapeutic use , Humans , Lymphoma, Large B-Cell, Diffuse/diagnosis , Lymphoma, Large B-Cell, Diffuse/drug therapy , Lymphoma, Large B-Cell, Diffuse/genetics , Mutation , Prednisone/therapeutic use , Prognosis , Proto-Oncogene Proteins c-bcl-2/genetics , Proto-Oncogene Proteins c-myc/genetics , Rituximab/therapeutic use , Vincristine/therapeutic use
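The reported hazard ratio for BCL2 alterations (2.6; 95% CI 1.6 to 4.2) can be used to back out an approximate standard error and Wald z-statistic. This assumes, as is conventional for Cox models, that the interval was computed on the log-hazard scale as log(HR) plus or minus 1.96 times the standard error. The values below are derived only from the numbers in the abstract; they are approximations, not figures reported by the authors.

```python
import math

hr, lower, upper = 2.6, 1.6, 4.2   # from the abstract (BCL2 alterations, PFS)
z_crit = 1.959964                  # two-sided 95% standard normal quantile

# On the log scale a Wald interval is symmetric: log(HR) +/- z_crit * SE.
se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
z = math.log(hr) / se
# Two-sided p-value from the standard normal via the complementary error function.
p_value = math.erfc(abs(z) / math.sqrt(2))

# The interval midpoint on the log scale should land close to the reported HR.
implied_hr = math.exp((math.log(upper) + math.log(lower)) / 2)

print(f"SE(log HR) ~ {se:.3f}, z ~ {z:.2f}, p ~ {p_value:.1e}, implied HR ~ {implied_hr:.2f}")
```

The implied midpoint (about 2.59) agrees with the reported 2.6 to rounding, which is consistent with a log-scale interval; the back-calculated z of roughly 3.9 corresponds to a two-sided p-value on the order of 1e-4.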

10.

Metaviz: interactive statistical and visual analysis of metagenomic data.

Wagner, Justin; Chelaru, Florin; Kancherla, Jayaram; Paulson, Joseph N; Zhang, Alexander; Felix, Victor; Mahurkar, Anup; Elmqvist, Niklas; Corrada Bravo, Héctor.

Nucleic Acids Res ; 46(6): 2777-2787, 2018 04 06.

Article in English | MEDLINE | ID: mdl-29529268

ABSTRACT

Large studies profiling microbial communities and their association with healthy or disease phenotypes are now commonplace. Processed data from many of these studies are publicly available but significant effort is required for users to effectively organize, explore and integrate it, limiting the utility of these rich data resources. Effective integrative and interactive visual and statistical tools to analyze many metagenomic samples can greatly increase the value of these data for researchers. We present Metaviz, a tool for interactive exploratory data analysis of annotated microbiome taxonomic community profiles derived from marker gene or whole metagenome shotgun sequencing. Metaviz is uniquely designed to address the challenge of browsing the hierarchical structure of metagenomic data features while rendering visualizations of data values that are dynamically updated in response to user navigation. We use Metaviz to provide the UMD Metagenome Browser web service, allowing users to browse and explore data for more than 7000 microbiomes from published studies. Users can also deploy Metaviz as a web service, or use it to analyze data through the metavizr package to interoperate with state-of-the-art analysis tools available through Bioconductor. Metaviz is free and open source with the code, documentation and tutorials publicly accessible.

Subject(s)

Computational Biology/methods , Metagenome/genetics , Metagenomics/methods , Whole Genome Sequencing/methods , Bacteria/classification , Bacteria/genetics , Child , Computational Biology/statistics & numerical data , Diarrhea/diagnosis , Diarrhea/genetics , Humans , Internet , Metagenomics/statistics & numerical data , Reproducibility of Results , Web Browser , Whole Genome Sequencing/statistics & numerical data
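Metaviz is described as browsing the hierarchical structure of metagenomic features while the displayed values update with user navigation. The sketch below shows the aggregation step that such navigation relies on, in plain Python: leaf-level counts, each labelled by a taxonomy path, are summed up to a chosen depth. The data layout and the function name `aggregate_to_level` are hypothetical and do not reflect the internals of Metaviz or metavizr.

```python
def aggregate_to_level(leaf_counts, depth):
    """Sum leaf counts to a given depth of the feature hierarchy.

    leaf_counts maps a taxonomy path (tuple of ranks, root first) to a
    per-sample list of counts. Paths shorter than `depth` are kept whole.
    """
    totals = {}
    n_samples = None
    for path, counts in leaf_counts.items():
        if n_samples is None:
            n_samples = len(counts)
        elif len(counts) != n_samples:
            raise ValueError("all features must have the same number of samples")
        # Truncate the path to the requested level and accumulate per sample.
        node = tuple(path[:depth])
        current = totals.setdefault(node, [0] * n_samples)
        for i, c in enumerate(counts):
            current[i] += c
    return totals
```

Collapsing to depth 2 (phylum, in a kingdom-first path) merges all classes within each phylum; moving the cut deeper or shallower in an interactive view amounts to recomputing this sum for the new level.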

11.

Simplified and representative bacterial community of maize roots.

Niu, Ben; Paulson, Joseph Nathaniel; Zheng, Xiaoqi; Kolter, Roberto.

Proc Natl Acad Sci U S A ; 114(12): E2450-E2459, 2017 03 21.

Article in English | MEDLINE | ID: mdl-28275097

ABSTRACT

Plant-associated microbes are important for the growth and health of their hosts. As a result of numerous prior studies, we know that host genotypes and abiotic factors influence the composition of plant microbiomes. However, the high complexity of these communities challenges detailed studies to define experimentally the mechanisms underlying the dynamics of community assembly and the beneficial effects of such microbiomes on plant hosts. In this work, from the distinctive microbiota assembled by maize roots, through host-mediated selection, we obtained a greatly simplified synthetic bacterial community consisting of seven strains (Enterobacter cloacae, Stenotrophomonas maltophilia, Ochrobactrum pituitosum, Herbaspirillum frisingense, Pseudomonas putida, Curtobacterium pusillum, and Chryseobacterium indologenes) representing three of the four most dominant phyla found in maize roots. By using a selective culture-dependent method to track the abundance of each strain, we investigated the role that each plays in community assembly on roots of axenic maize seedlings. Only the removal of E. cloacae led to the complete loss of the community, and C. pusillum took over. This result suggests that E. cloacae plays the role of keystone species in this model ecosystem. In planta and in vitro, this model community inhibited the phytopathogenic fungus Fusarium verticillioides, indicating a clear benefit to the host. Thus, combined with the selective culture-dependent quantification method, our synthetic seven-species community representing the root microbiome has the potential to serve as a useful system to explore how bacterial interspecies interactions affect root microbiome assembly and to dissect the beneficial effects of the root microbiota on hosts under laboratory conditions in the future.

Subject(s)

Bacteria/isolation & purification , Zea mays/microbiology , Bacteria/classification , Bacteria/genetics , Microbiota , Phylogeny , Plant Roots/microbiology , Soil Microbiology

12.

Exploring regulation in tissues with eQTL networks.

Fagny, Maud; Paulson, Joseph N; Kuijjer, Marieke L; Sonawane, Abhijeet R; Chen, Cho-Yi; Lopes-Ramos, Camila M; Glass, Kimberly; Quackenbush, John; Platig, John.

Proc Natl Acad Sci U S A ; 114(37): E7841-E7850, 2017 09 12.

Article in English | MEDLINE | ID: mdl-28851834

ABSTRACT

Characterizing the collective regulatory impact of genetic variants on complex phenotypes is a major challenge in developing a genotype to phenotype map. Using expression quantitative trait locus (eQTL) analyses, we constructed bipartite networks in which edges represent significant associations between genetic variants and gene expression levels and found that the network structure informs regulatory function. We show, in 13 tissues, that these eQTL networks are organized into dense, highly modular communities grouping genes often involved in coherent biological processes. We find communities representing shared processes across tissues, as well as communities associated with tissue-specific processes that coalesce around variants in tissue-specific active chromatin regions. Node centrality is also highly informative, with the global and community hubs differing in regulatory potential and likelihood of being disease associated.

Subject(s)

Genome-Wide Association Study/methods , Organ Specificity/genetics , Quantitative Trait Loci/genetics , Gene Expression/genetics , Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Genetic Predisposition to Disease/genetics , Genetic Variation , Genotype , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/physiology , Transcriptome/genetics
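The abstract reports dense, highly modular communities in bipartite SNP-gene eQTL networks but does not say how modularity is scored. A common choice for bipartite graphs is Barber's bipartite modularity, Q = (1/m) * sum over SNPs i and genes j of (A_ij - k_i * d_j / m) * [c_i == c_j], where A is the SNP-gene adjacency matrix, k_i and d_j are SNP and gene degrees, m is the number of edges and c assigns nodes to communities. Treating this as the relevant score is an assumption. The sketch evaluates Q for a given community assignment on an unweighted edge list; it does not perform the community search itself.

```python
from collections import Counter


def bipartite_modularity(edges, snp_community, gene_community):
    """Barber's bipartite modularity for an unweighted SNP-gene edge list.

    edges: iterable of (snp, gene) pairs, each a significant eQTL association.
    snp_community / gene_community: dicts mapping nodes to community labels.
    """
    edges = list(set(edges))  # ignore duplicate associations
    m = len(edges)
    if m == 0:
        raise ValueError("network has no edges")
    snp_degree = Counter(s for s, _ in edges)
    gene_degree = Counter(g for _, g in edges)

    # Observed term: number of edges whose endpoints share a community.
    observed = sum(1 for s, g in edges if snp_community[s] == gene_community[g])

    # Expected term: sum over same-community pairs of k_i * d_j / m, which
    # equals (total SNP degree) * (total gene degree) / m per community.
    snp_deg_by_comm = Counter()
    for s, k in snp_degree.items():
        snp_deg_by_comm[snp_community[s]] += k
    gene_deg_by_comm = Counter()
    for g, d in gene_degree.items():
        gene_deg_by_comm[gene_community[g]] += d
    expected = sum(snp_deg_by_comm[c] * gene_deg_by_comm[c] for c in snp_deg_by_comm) / m

    return (observed - expected) / m
```

Two disconnected SNP-gene blocks labelled as separate communities give Q = 0.5, while putting every node in one community gives Q = 0, the baseline of no modular structure.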

13.

Smooth quantile normalization.

Hicks, Stephanie C; Okrah, Kwame; Paulson, Joseph N; Quackenbush, John; Irizarry, Rafael A; Bravo, Héctor Corrada.

Biostatistics ; 19(2): 185-198, 2018 04 01.

Article in English | MEDLINE | ID: mdl-29036413

ABSTRACT

Between-sample normalization is a critical step in genomic data analysis to remove systematic bias and unwanted technical variation in high-throughput data. Global normalization methods are based on the assumption that observed variability in global properties is due to technical reasons and is unrelated to the biology of interest. For example, some methods correct for differences in sequencing read counts by scaling features to have similar median values across samples, but these fail to reduce other forms of unwanted technical variation. Methods such as quantile normalization transform the statistical distributions across samples to be the same and assume that global differences in the distribution are induced only by technical variation. However, it remains unclear how to proceed with normalization if these assumptions are violated, for example, if there are global differences in the statistical distributions between biological conditions or groups, and external information, such as negative or control features, is not available. Here, we introduce a generalization of quantile normalization, referred to as smooth quantile normalization (qsmooth), which is based on the assumption that the statistical distribution of each sample should be the same (or have the same distributional shape) within biological groups or conditions, while allowing that they may differ between groups. We illustrate the advantages of our method on several high-throughput datasets with global differences in distributions corresponding to different biological conditions. We also perform a Monte Carlo simulation study to illustrate the bias-variance tradeoff and root mean squared error of qsmooth compared to other global normalization methods. A software implementation is available from https://github.com/stephaniehicks/qsmooth.

Subject(s)

Biostatistics/methods , Data Interpretation, Statistical , Genomics/statistics & numerical data , High-Throughput Nucleotide Sequencing/statistics & numerical data , Models, Statistical , Humans
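Quantile normalization forces every sample onto one reference distribution (the mean of the sorted samples), whereas qsmooth lets the reference differ between biological groups. The sketch below, assuming numpy and a features-by-samples matrix, implements standard quantile normalization and a simplified group-aware variant in which each group's reference quantiles are the weighted mix w * (overall mean quantiles) + (1 - w) * (group mean quantiles). Using a single user-supplied constant w is a deliberate simplification: qsmooth estimates a quantile-specific weight from the data, so this illustrates the idea rather than reproducing the method. Ties are broken by position.

```python
import numpy as np


def _apply_reference(X, reference_by_column):
    """Replace each column's values by the reference quantile at the same rank."""
    X = np.asarray(X, dtype=float)
    out = np.empty_like(X)
    for j in range(X.shape[1]):
        # Rank of each value within its column (0 = smallest).
        ranks = np.argsort(np.argsort(X[:, j], kind="stable"), kind="stable")
        out[:, j] = reference_by_column[j][ranks]
    return out


def quantile_normalize(X):
    """Standard quantile normalization of a features x samples matrix."""
    X = np.asarray(X, dtype=float)
    reference = np.sort(X, axis=0).mean(axis=1)
    return _apply_reference(X, [reference] * X.shape[1])


def grouped_quantile_normalize(X, groups, w=0.0):
    """Simplified group-aware variant with a fixed weight w in [0, 1].

    w = 1 reproduces standard quantile normalization; w = 0 normalizes
    within each group only. (qsmooth itself estimates w per quantile.)
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    sorted_cols = np.sort(X, axis=0)
    overall = sorted_cols.mean(axis=1)
    group_ref = {g: sorted_cols[:, groups == g].mean(axis=1) for g in np.unique(groups)}
    refs = [w * overall + (1 - w) * group_ref[g] for g in groups]
    return _apply_reference(X, refs)
```

With two groups whose value ranges differ by an order of magnitude, standard quantile normalization pulls both onto a common intermediate distribution, while w = 0 preserves the between-group difference and only aligns samples within each group.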

14.

Cancer subtype identification using somatic mutation data.

Kuijjer, Marieke Lydia; Paulson, Joseph Nathaniel; Salzman, Peter; Ding, Wei; Quackenbush, John.

Br J Cancer ; 118(11): 1492-1501, 2018 05.

Article in English | MEDLINE | ID: mdl-29765148

ABSTRACT

BACKGROUND: With the onset of next-generation sequencing technologies, we have made great progress in identifying recurrent mutational drivers of cancer. As cancer tissues are now frequently screened for specific sets of mutations, a large number of samples has become available for analysis. Classification of patients with similar mutation profiles may help identify subgroups of patients who might benefit from specific types of treatment. However, classification based on somatic mutations is challenging due to the sparseness and heterogeneity of the data. METHODS: Here we describe a new method to de-sparsify somatic mutation data using biological pathways. We applied this method to 23 cancer types from The Cancer Genome Atlas, including samples from 5805 primary tumours. RESULTS: We show that, for most cancer types, de-sparsified mutation data associate with phenotypic data. We identify poor prognostic subtypes in three cancer types, which are associated with mutations in signal transduction pathways for which targeted treatment options are available. We identify subtype-drug associations for 14 additional subtypes. Finally, we perform a pan-cancer subtyping analysis and identify nine pan-cancer subtypes, which associate with mutations in four overarching sets of biological pathways. CONCLUSIONS: This study is an important step toward understanding mutational patterns in cancer.

Subject(s)

Biomarkers, Tumor/genetics , Computational Biology/methods , Mutation , Neoplasms/classification , Data Curation , Databases, Genetic , Female , Gene Regulatory Networks , Humans , Neoplasms/genetics , Principal Component Analysis , Prognosis
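The abstract does not spell out how pathways are used to de-sparsify mutation data. One simple scheme, assumed here for illustration and possibly different in detail from the published method, is to divide each gene's mutation count by the gene's length (longer genes collect more mutations by chance), sum these rates over each pathway's member genes, and divide by the pathway size. Because each pathway pools many genes, two tumours that share no individually mutated gene can still have non-zero, comparable pathway scores. All names below are hypothetical.

```python
def pathway_scores(mutation_counts, gene_lengths, pathways):
    """Hypothetical helper: turn sparse gene-level mutations into pathway scores.

    mutation_counts: {gene: number of somatic mutations} for one sample.
    gene_lengths: {gene: length in base pairs} for every pathway gene.
    pathways: {pathway name: set of member genes}.
    """
    scores = {}
    for name, genes in pathways.items():
        if not genes:
            raise ValueError(f"pathway {name!r} has no genes")
        # Length-normalized mutation rate per gene; unmutated genes contribute 0.
        total = sum(mutation_counts.get(g, 0) / gene_lengths[g] for g in genes)
        scores[name] = total / len(genes)
    return scores
```

Applying this to every sample turns a mostly-zero sample-by-gene matrix into a denser sample-by-pathway matrix that standard clustering methods can work with.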

15.

Estimating gene regulatory networks with pandaR.

Schlauch, Daniel; Paulson, Joseph N; Young, Albert; Glass, Kimberly; Quackenbush, John.

Bioinformatics ; 33(14): 2232-2234, 2017 Jul 15.

Article in English | MEDLINE | ID: mdl-28334344

ABSTRACT

CONTACT: johnq@jimmy.harvard.edu or dschlauch@fas.harvard.edu. AVAILABILITY AND IMPLEMENTATION: PandaR is provided as a Bioconductor R Package and is available at bioconductor.org/packages/pandaR.

Subject(s)

Computational Biology/methods , Gene Regulatory Networks , Software , Humans , Models, Biological , Protein Interaction Maps , Transcriptome

16.

Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data.

Paulson, Joseph N; Chen, Cho-Yi; Lopes-Ramos, Camila M; Kuijjer, Marieke L; Platig, John; Sonawane, Abhijeet R; Fagny, Maud; Glass, Kimberly; Quackenbush, John.

BMC Bioinformatics ; 18(1): 437, 2017 Oct 03.

Article in English | MEDLINE | ID: mdl-28974199

ABSTRACT

BACKGROUND: Although ultrahigh-throughput RNA-Sequencing has become the dominant technology for genome-wide transcriptional profiling, the vast majority of RNA-Seq studies profile only tens of samples, and most analytical pipelines are optimized for these smaller studies. However, projects are generating ever-larger data sets comprising RNA-Seq data from hundreds or thousands of samples, often collected at multiple centers and from diverse tissues. These complex data sets present significant analytical challenges due to batch and tissue effects, but provide the opportunity to revisit the assumptions and methods that we use to preprocess, normalize, and filter RNA-Seq data - critical first steps for any subsequent analysis. RESULTS: We find that analysis of large RNA-Seq data sets requires both careful quality control and accounting for sparsity due to the heterogeneity intrinsic in multi-group studies. We developed Yet Another RNA Normalization software pipeline (YARN), which includes quality control and preprocessing, gene filtering, and normalization steps designed to facilitate downstream analysis of large, heterogeneous RNA-Seq data sets, and we demonstrate its use with data from the Genotype-Tissue Expression (GTEx) project. CONCLUSIONS: An R package instantiating YARN is available at http://bioconductor.org/packages/yarn .

Subject(s)

Databases, Genetic , Organ Specificity/genetics , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/standards , Gene Expression Profiling , Gene Expression Regulation , Humans , Molecular Sequence Annotation , Principal Component Analysis , Quality Control , Reference Standards , Sample Size , Software
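The abstract attributes sparsity in multi-group RNA-Seq data to heterogeneity across groups: many genes are expressed in some tissues and silent in others. A tissue-aware gene filter therefore keeps a gene if it is expressed in enough samples of at least one tissue, rather than requiring expression across the whole cohort. The plain-Python sketch below illustrates that rule; the thresholds and the function name are hypothetical and are not YARN's defaults or API.

```python
def tissue_aware_filter(expression, tissues, min_value=1.0, min_fraction=0.5):
    """Keep genes expressed in at least `min_fraction` of samples in some tissue.

    expression: {gene: list of per-sample expression values (e.g. CPM)}.
    tissues: list of tissue labels aligned with the sample positions.
    """
    # Group sample positions by tissue label.
    by_tissue = {}
    for idx, tissue in enumerate(tissues):
        by_tissue.setdefault(tissue, []).append(idx)
    kept = []
    for gene, values in expression.items():
        if len(values) != len(tissues):
            raise ValueError(f"gene {gene!r} has the wrong number of samples")
        for idxs in by_tissue.values():
            expressed = sum(1 for i in idxs if values[i] >= min_value)
            if expressed / len(idxs) >= min_fraction:
                kept.append(gene)
                break  # one qualifying tissue is enough
    return kept
```

A tissue-specific gene such as one expressed only in blood survives this filter, whereas a cohort-wide threshold could discard it once the number of non-blood samples grows large.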

17.

Regulatory network changes between cell lines and their tissues of origin.

Lopes-Ramos, Camila M; Paulson, Joseph N; Chen, Cho-Yi; Kuijjer, Marieke L; Fagny, Maud; Platig, John; Sonawane, Abhijeet R; DeMeo, Dawn L; Quackenbush, John; Glass, Kimberly.

BMC Genomics ; 18(1): 723, 2017 Sep 12.

Article in English | MEDLINE | ID: mdl-28899340

ABSTRACT

BACKGROUND: Cell lines are an indispensable tool in biomedical research and often used as surrogates for tissues. Although there are recognized important cellular and transcriptomic differences between cell lines and tissues, a systematic overview of the differences between the regulatory processes of a cell line and those of its tissue of origin has not been conducted. The RNA-Seq data generated by the GTEx project is the first available data resource in which it is possible to perform a large-scale transcriptional and regulatory network analysis comparing cell lines with their tissues of origin. RESULTS: We compared 127 paired Epstein-Barr virus transformed lymphoblastoid cell lines (LCLs) and whole blood samples, and 244 paired primary fibroblast cell lines and skin samples. While gene expression analysis confirms that these cell lines carry the expression signatures of their primary tissues, albeit at reduced levels, network analysis indicates that expression changes are the cumulative result of many previously unreported alterations in transcription factor (TF) regulation. More specifically, cell cycle genes are over-expressed in cell lines compared to primary tissues, and this alteration in expression is a result of less repressive TF targeting. We confirmed these regulatory changes for four TFs, including SMAD5, using independent ChIP-seq data from ENCODE. CONCLUSIONS: Our results provide novel insights into the regulatory mechanisms controlling the expression differences between cell lines and tissues. The strong changes in TF regulation that we observe suggest that network changes, in addition to transcriptional levels, should be considered when using cell lines as models for tissues.

Subject(s)

Gene Expression Profiling , Gene Regulatory Networks , Cell Cycle/genetics , Cell Line , Humans , Organ Specificity

18.

Privacy-preserving microbiome analysis using secure computation.

Wagner, Justin; Paulson, Joseph N; Wang, Xiao; Bhattacharjee, Bobby; Corrada Bravo, Héctor.

Bioinformatics ; 32(12): 1873-9, 2016 06 15.

Article in English | MEDLINE | ID: mdl-26873931

ABSTRACT

MOTIVATION: Developing targeted therapeutics and identifying biomarkers relies on large amounts of research participant data. Beyond human DNA, scientists now investigate the DNA of micro-organisms inhabiting the human body. Recent work shows that an individual's collection of microbial DNA consistently identifies that person and could be used to link a real-world identity to a sensitive attribute in a research dataset. Unfortunately, the current suite of DNA-specific privacy-preserving analysis tools does not meet the requirements for microbiome sequencing studies. RESULTS: To address privacy concerns around microbiome sequencing, we implement metagenomic analyses using secure computation. Our implementation allows comparative analysis over combined data without revealing the feature counts for any individual sample. We focus on three analyses and perform an evaluation on datasets currently used by the microbiome research community. We use our implementation to simulate sharing data between four policy-domains. Additionally, we describe an application of our implementation for patients to combine data that allows drug developers to query against and compensate patients for the analysis. AVAILABILITY AND IMPLEMENTATION: The software is freely available for download at: http://cbcb.umd.edu/~hcorrada/projects/secureseq.html SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: hcorrada@umiacs.umd.edu.

Subject(s)

Microbiota , DNA , Humans , Metagenomics , Privacy , Software

19.

Individual-specific changes in the human gut microbiota after challenge with enterotoxigenic Escherichia coli and subsequent ciprofloxacin treatment.

Pop, Mihai; Paulson, Joseph N; Chakraborty, Subhra; Astrovskaya, Irina; Lindsay, Brianna R; Li, Shan; Bravo, Héctor Corrada; Harro, Clayton; Parkhill, Julian; Walker, Alan W; Walker, Richard I; Sack, David A; Stine, O Colin.

BMC Genomics ; 17: 440, 2016 06 08.

Article in English | MEDLINE | ID: mdl-27277524

ABSTRACT

BACKGROUND: Enterotoxigenic Escherichia coli (ETEC) is a major cause of diarrhea in inhabitants from low-income countries and in visitors to these countries. The impact of the human intestinal microbiota on the initiation and progression of ETEC diarrhea is not yet well understood. RESULTS: We used 16S rRNA (ribosomal RNA) gene sequencing to study changes in the fecal microbiota of 12 volunteers during a human challenge study with ETEC (H10407) and subsequent treatment with ciprofloxacin. Five subjects developed severe diarrhea and seven experienced few or no symptoms. Diarrheal symptoms were associated with high concentrations of fecal E. coli as measured by quantitative culture, quantitative PCR, and normalized number of 16S rRNA gene sequences. Large changes in other members of the microbiota varied greatly from individual to individual, whether or not diarrhea occurred. Nonetheless the variation within an individual was small compared to variation between individuals. Ciprofloxacin treatment reorganized microbiota populations; however, the original structure was largely restored at one and three month follow-up visits. CONCLUSION: Symptomatic ETEC infections, but not asymptomatic infections, were associated with high fecal concentrations of E. coli. Both infection and ciprofloxacin treatment caused variable changes in other bacteria that generally reverted to baseline levels after three months.

Subject(s)

Ciprofloxacin/therapeutic use , Enterotoxigenic Escherichia coli/drug effects , Enterotoxigenic Escherichia coli/physiology , Escherichia coli Infections/drug therapy , Escherichia coli Infections/microbiology , Gastrointestinal Microbiome/drug effects , Adult , Ciprofloxacin/pharmacology , Diarrhea/drug therapy , Diarrhea/microbiology , Feces/microbiology , Female , High-Throughput Nucleotide Sequencing , Humans , Male , Metagenome , Metagenomics/methods , Middle Aged , RNA, Ribosomal, 16S , ROC Curve , Treatment Outcome , Young Adult

20.

Differential abundance analysis for microbial marker-gene surveys.

Paulson, Joseph N; Stine, O Colin; Bravo, Héctor Corrada; Pop, Mihai.

Nat Methods ; 10(12): 1200-2, 2013 Dec.

Article in English | MEDLINE | ID: mdl-24076764

ABSTRACT

We introduce a methodology to assess differential abundance in sparse high-throughput microbial marker-gene survey data. Our approach, implemented in the metagenomeSeq Bioconductor package, relies on a novel normalization technique and a statistical model that accounts for undersampling, a common feature of large-scale marker-gene studies. Using simulated data and several published microbiota data sets, we show that metagenomeSeq outperforms the tools currently used in this field.

Subject(s)

Genetic Markers , Metagenomics/methods , Microbiota , RNA, Ribosomal, 16S/genetics , Algorithms , Animals , Area Under Curve , Cluster Analysis , Computer Simulation , Databases, Genetic , Gene Expression Profiling/methods , Genetic Variation , Humans , Intestines/microbiology , Mice , Models, Genetic , Models, Statistical , Normal Distribution , Phenotype , Sequence Analysis, DNA , Software
