Search | Cancer Prevention and Control

Predicting nutrition and environmental factors associated with female reproductive disorders using a knowledge graph and random forests.

Chan, Lauren E; Casiraghi, Elena; Reese, Justin; Harmon, Quaker E; Schaper, Kevin; Hegde, Harshad; Valentini, Giorgio; Schmitt, Charles; Motsinger-Reif, Alison; Hall, Janet E; Mungall, Christopher J; Robinson, Peter N; Haendel, Melissa A.

Int J Med Inform ; 187: 105461, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38643701

ABSTRACT

OBJECTIVE: Female reproductive disorders (FRDs) are common health conditions that may present with significant symptoms. Diet and environment are potential areas for FRD interventions. We utilized a knowledge graph (KG) method to predict factors associated with common FRDs (for example, endometriosis, ovarian cyst, and uterine fibroids). MATERIALS AND METHODS: We harmonized survey data from the Personalized Environment and Genes Study (PEGS) on internal and external environmental exposures and health conditions with biomedical ontology content. We merged the harmonized data and ontologies with supplemental nutrient and agricultural chemical data to create a KG. We analyzed the KG by embedding edges and applying a random forest for edge prediction to identify variables potentially associated with FRDs. We also conducted logistic regression analysis for comparison. RESULTS: Across 9765 PEGS respondents, the KG analysis resulted in 8535 significant or suggestive predicted links between FRDs and chemicals, phenotypes, and diseases. Amongst these links, 32 were exact matches when compared with the logistic regression results, including comorbidities, medications, foods, and occupational exposures. DISCUSSION: Mechanistic underpinnings of predicted links documented in the literature may support some of our findings. Our KG methods are useful for predicting possible associations in large, survey-based datasets with added information on directionality and magnitude of effect from logistic regression. These results should not be construed as causal but can support hypothesis generation. CONCLUSION: This investigation enabled the generation of hypotheses on a variety of potential links between FRDs and exposures. Future investigations should prospectively evaluate the variables hypothesized to impact FRDs.
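
The abstract does not state how edges were embedded. As a minimal sketch, assuming node embeddings are already available and that an edge is represented by the element-wise (Hadamard) product of its two node vectors (one common choice, not necessarily the authors'), the feature rows for an edge classifier could be built as follows. The node names, vectors, and function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical 3-dimensional node embeddings. Real embeddings would be
# learned from the knowledge graph and have many more dimensions.
node_embeddings: Dict[str, List[float]] = {
    "endometriosis": [0.5, -1.0, 2.0],
    "exposure_A": [2.0, 0.5, -1.0],
    "exposure_B": [0.0, 1.0, 1.0],
}


def hadamard_edge_embedding(source: str, target: str,
                            embeddings: Dict[str, List[float]]) -> List[float]:
    """Represent the edge (source, target) as the element-wise product
    of the two node vectors."""
    u, v = embeddings[source], embeddings[target]
    if len(u) != len(v):
        raise ValueError("node embeddings must have the same dimension")
    return [a * b for a, b in zip(u, v)]


def build_edge_features(edges: List[Tuple[str, str]],
                        embeddings: Dict[str, List[float]]) -> List[List[float]]:
    """Turn a list of candidate edges into one feature row per edge."""
    return [hadamard_edge_embedding(s, t, embeddings) for s, t in edges]


candidate_edges = [("exposure_A", "endometriosis"),
                   ("exposure_B", "endometriosis")]
features = build_edge_features(candidate_edges, node_embeddings)
```

In a pipeline of the kind the abstract describes, each feature row would be labelled as an existing or non-existing edge and passed to a random forest classifier; that training step is omitted here.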

Subjects

Environmental Exposure; Humans; Female; Environmental Exposure/adverse effects; Genital Diseases, Female; Logistic Models; Nutritional Status; Diet; Adult; Random Forest

Predicting nutrition and environmental factors associated with female reproductive disorders using a knowledge graph and random forests.

Chan, Lauren E; Casiraghi, Elena; Putman, Tim; Reese, Justin; Harmon, Quaker E; Schaper, Kevin; Hegde, Harshad; Valentini, Giorgio; Schmitt, Charles; Motsinger-Reif, Alison; Hall, Janet E; Mungall, Christopher J; Robinson, Peter N; Haendel, Melissa A.

medRxiv ; 2023 Jul 16.

Article in English | MEDLINE | ID: mdl-37502882

ABSTRACT

Objective: Female reproductive disorders (FRDs) are common health conditions that may present with significant symptoms. Diet and environment are potential areas for FRD interventions. We utilized a knowledge graph (KG) method to predict factors associated with common FRDs (e.g., endometriosis, ovarian cyst, and uterine fibroids). Materials and Methods: We harmonized survey data from the Personalized Environment and Genes Study on internal and external environmental exposures and health conditions with biomedical ontology content. We merged the harmonized data and ontologies with supplemental nutrient and agricultural chemical data to create a KG. We analyzed the KG by embedding edges and applying a random forest for edge prediction to identify variables potentially associated with FRDs. We also conducted logistic regression analysis for comparison. Results: Across 9765 PEGS respondents, the KG analysis resulted in 8535 significant predicted links between FRDs and chemicals, phenotypes, and diseases. Amongst these links, 32 were exact matches when compared with the logistic regression results, including comorbidities, medications, foods, and occupational exposures. Discussion: Mechanistic underpinnings of predicted links documented in the literature may support some of our findings. Our KG methods are useful for predicting possible associations in large, survey-based datasets with added information on directionality and magnitude of effect from logistic regression. These results should not be construed as causal, but can support hypothesis generation. Conclusion: This investigation enabled the generation of hypotheses on a variety of potential links between FRDs and exposures. Future investigations should prospectively evaluate the variables hypothesized to impact FRDs.

Phenopacket-tools: Building and validating GA4GH Phenopackets.

Danis, Daniel; Jacobsen, Julius O B; Wagner, Alex H; Groza, Tudor; Beckwith, Martha A; Rekerle, Lauren; Carmody, Leigh C; Reese, Justin; Hegde, Harshad; Ladewig, Markus S; Seitz, Berthold; Munoz-Torres, Monica; Harris, Nomi L; Rambla, Jordi; Baudis, Michael; Mungall, Christopher J; Haendel, Melissa A; Robinson, Peter N.

PLoS One ; 18(5): e0285433, 2023.

Article in English | MEDLINE | ID: mdl-37196000

ABSTRACT

The Global Alliance for Genomics and Health (GA4GH) is a standards-setting organization that is developing a suite of coordinated standards for genomics. The GA4GH Phenopacket Schema is a standard for sharing disease and phenotype information that characterizes an individual person or biosample. The Phenopacket Schema is flexible and can represent clinical data for any kind of human disease including rare disease, complex disease, and cancer. It also allows consortia or databases to apply additional constraints to ensure uniform data collection for specific goals. We present phenopacket-tools, an open-source Java library and command-line application for construction, conversion, and validation of phenopackets. Phenopacket-tools simplifies construction of phenopackets by providing concise builders, programmatic shortcuts, and predefined building blocks (ontology classes) for concepts such as anatomical organs, age of onset, biospecimen type, and clinical modifiers. Phenopacket-tools can be used to validate the syntax and semantics of phenopackets as well as to assess adherence to additional user-defined requirements. The documentation includes examples showing how to use the Java library and the command-line tool to create and validate phenopackets. We demonstrate how to create, convert, and validate phenopackets using the library or the command-line application. Source code, API documentation, comprehensive user guide and a tutorial can be found at https://github.com/phenopackets/phenopacket-tools. The library can be installed from the public Maven Central artifact repository and the application is available as a standalone archive. The phenopacket-tools library helps developers implement and standardize the collection and exchange of phenotypic and other clinical data for use in phenotype-driven genomic diagnostics, translational research, and precision medicine applications.
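
To give a sense of what a phenopacket contains, the following is a minimal sketch in Python that assembles a phenopacket-like JSON document using field names from the Phenopacket Schema v2. It does not use phenopacket-tools (which is a Java library and command-line application); the helper names, the identifiers, and the ontology release tag are assumptions made for illustration, and the field check is a hypothetical stand-in that is far weaker than the tool's syntactic and semantic validation.

```python
import json
from typing import Any, Dict, List, Sequence


def make_minimal_phenopacket(packet_id: str, subject_id: str,
                             hpo_terms: Dict[str, str],
                             created: str, created_by: str) -> Dict[str, Any]:
    """Assemble a phenopacket-like dictionary with Phenopacket Schema v2 field names."""
    return {
        "id": packet_id,
        "subject": {"id": subject_id},
        "phenotypicFeatures": [
            {"type": {"id": term_id, "label": label}}
            for term_id, label in hpo_terms.items()
        ],
        "metaData": {
            "created": created,
            "createdBy": created_by,
            "resources": [{
                "id": "hp",
                "name": "human phenotype ontology",
                "url": "http://purl.obolibrary.org/obo/hp.owl",
                "version": "2023-01-27",  # illustrative release tag only
                "namespacePrefix": "HP",
                "iriPrefix": "http://purl.obolibrary.org/obo/HP_",
            }],
            "phenopacketSchemaVersion": "2.0",
        },
    }


def missing_top_level_fields(packet: Dict[str, Any],
                             required: Sequence[str] = ("id", "metaData")) -> List[str]:
    """Hypothetical simplified check: list the required top-level keys that are absent."""
    return [field for field in required if field not in packet]


packet = make_minimal_phenopacket(
    packet_id="example-packet-1",      # made-up identifier
    subject_id="example-subject-1",    # made-up identifier
    hpo_terms={"HP:0001250": "Seizure"},
    created="2023-01-01T00:00:00Z",
    created_by="example-curator",
)
packet_json = json.dumps(packet, indent=2)
```

For real use, the builders and validators described in the phenopacket-tools user guide are the appropriate route; this sketch only shows the overall shape of the data.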

Subjects

Neoplasms; Software; Humans; Genomics; Databases, Factual; Gene Library

Metformin is associated with reduced COVID-19 severity in patients with prediabetes.

Chan, Lauren E; Casiraghi, Elena; Laraway, Bryan; Coleman, Ben; Blau, Hannah; Zaman, Adnin; Harris, Nomi L; Wilkins, Kenneth; Antony, Blessy; Gargano, Michael; Valentini, Giorgio; Sahner, David; Haendel, Melissa; Robinson, Peter N; Bramante, Carolyn; Reese, Justin.

Diabetes Res Clin Pract ; 194: 110157, 2022 Dec.

Article in English | MEDLINE | ID: mdl-36400170

ABSTRACT

AIMS: Studies suggest that metformin is associated with reduced COVID-19 severity in individuals with diabetes compared to other antihyperglycemics. We assessed if metformin is associated with reduced incidence of severe COVID-19 for patients with prediabetes or polycystic ovary syndrome (PCOS), common diseases that increase the risk of severe COVID-19. METHODS: This observational, retrospective study utilized EHR data from 52 hospitals for COVID-19 patients with PCOS or prediabetes treated with metformin or levothyroxine/ondansetron (controls). After balancing via inverse probability score weighting, associations with COVID-19 severity were assessed by logistic regression. RESULTS: In the prediabetes cohort, when compared to levothyroxine, metformin was associated with a significantly lower incidence of COVID-19 with "mild-ED" or worse (OR [95% CI]: 0.636, [0.455-0.888]) and "moderate" or worse severity (0.493 [0.339-0.718]). Compared to ondansetron, metformin was associated with lower incidence of "mild-ED" or worse severity (0.039 [0.026-0.057]), "moderate" or worse (0.045 [0.03-0.069]), "severe" or worse (0.183 [0.077-0.431]), and "mortality/hospice" (0.223 [0.071-0.694]). For PCOS, metformin showed no significant differences in severity compared to levothyroxine, but was associated with a significantly lower incidence of "mild-ED" or worse (0.101 [0.061-0.166]), and "moderate" or worse (0.094 [0.049-0.18]) COVID-19 outcome compared to ondansetron. CONCLUSIONS: Metformin use is associated with less severe COVID-19 in patients with prediabetes or PCOS.
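
The balancing step mentioned in the methods can be illustrated with a short sketch. It assumes that a propensity score (the estimated probability of receiving metformin given a patient's covariates) is already available for each patient, and it computes standard inverse probability of treatment weights: 1/p for treated patients and 1/(1 - p) for controls. The data and the function name are made up; the abstract does not give the study's actual weighting details.

```python
from typing import List


def inverse_probability_weights(treated: List[int],
                                propensity: List[float]) -> List[float]:
    """Return one weight per patient: 1/p if treated, 1/(1 - p) if control."""
    if len(treated) != len(propensity):
        raise ValueError("treated and propensity must have the same length")
    weights = []
    for t, p in zip(treated, propensity):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 / p if t == 1 else 1.0 / (1.0 - p))
    return weights


# Made-up example: 1 = metformin, 0 = control drug.
treated = [1, 1, 0, 0]
propensity = [0.5, 0.25, 0.5, 0.2]
weights = inverse_probability_weights(treated, propensity)
```

Patients who received a treatment that was unlikely given their covariates get larger weights, which is what balances the two groups before the outcome model (here, logistic regression) is fitted on the weighted sample.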

Subjects

COVID-19; Metformin; Polycystic Ovary Syndrome; Prediabetic State; Female; Humans; Metformin/therapeutic use; Retrospective Studies; COVID-19/epidemiology; COVID-19/complications; Prediabetic State/drug therapy; Prediabetic State/epidemiology; Prediabetic State/complications; Polycystic Ovary Syndrome/complications; Hypoglycemic Agents/therapeutic use; Thyroxine

Supervised learning with word embeddings derived from PubMed captures latent knowledge about protein kinases and cancer.

Ravanmehr, Vida; Blau, Hannah; Cappelletti, Luca; Fontana, Tommaso; Carmody, Leigh; Coleman, Ben; George, Joshy; Reese, Justin; Joachimiak, Marcin; Bocci, Giovanni; Hansen, Peter; Bult, Carol; Rueter, Jens; Casiraghi, Elena; Valentini, Giorgio; Mungall, Christopher; Oprea, Tudor I; Robinson, Peter N.

NAR Genom Bioinform ; 3(4): lqab113, 2021 Dec.

Article in English | MEDLINE | ID: mdl-34888523

ABSTRACT

Inhibiting protein kinases (PKs) that cause cancers has been an important topic in cancer therapy for years. So far, almost 8% of >530 PKs have been targeted by FDA-approved medications, and around 150 protein kinase inhibitors (PKIs) have been tested in clinical trials. We present an approach based on natural language processing and machine learning to investigate the relations between PKs and cancers, predicting PKs whose inhibition would be efficacious to treat a certain cancer. Our approach represents PKs and cancers as semantically meaningful 100-dimensional vectors based on word and concept neighborhoods in PubMed abstracts. We use information about phase I-IV trials in ClinicalTrials.gov to construct a training set for random forest classification. Our results with historical data show that associations between PKs and specific cancers can be predicted years in advance with good accuracy. Our tool can be used to predict the relevance of inhibiting PKs for specific cancers and to support the design of well-focused clinical trials to discover novel PKIs for cancer therapy.

BAYSIC: a Bayesian method for combining sets of genome variants with improved specificity and sensitivity.

Cantarel, Brandi L; Weaver, Daniel; McNeill, Nathan; Zhang, Jianhua; Mackey, Aaron J; Reese, Justin.

BMC Bioinformatics ; 15: 104, 2014 Apr 12.

Article in English | MEDLINE | ID: mdl-24725768

ABSTRACT

BACKGROUND: Accurate genomic variant detection is an essential step in gleaning medically useful information from genome data. However, low concordance among variant-calling methods reduces confidence in the clinical validity of whole genome and exome sequence data, and confounds downstream analysis for applications in genome medicine. Here we describe BAYSIC (BAYeSian Integrated Caller), which combines SNP variant calls produced by different methods (e.g. GATK, FreeBayes, Atlas, SamTools, etc.) into a more accurate set of variant calls. BAYSIC differs from majority voting, consensus or other ad hoc intersection-based schemes for combining sets of genome variant calls. Unlike other classification methods, the underlying BAYSIC model does not require training using a "gold standard" of true positives. Rather, with each new dataset, BAYSIC performs an unsupervised, fully Bayesian latent class analysis to estimate false positive and false negative error rates for each input method. The user specifies a posterior probability threshold according to the user's tolerance for false positive and false negative errors; lowering the posterior probability threshold allows the user to trade specificity for sensitivity while raising the threshold increases specificity in exchange for sensitivity. RESULTS: We assessed the performance of BAYSIC in comparison to other variant detection methods using ten low coverage (~5X) samples from The 1000 Genomes Project, a tumor/normal exome pair (40X), and exome sequences (40X) from positive control samples previously identified to contain clinically relevant SNPs. We demonstrated BAYSIC's superior variant-calling accuracy, both for somatic mutation detection and germline variant detection. CONCLUSIONS: BAYSIC provides a method for combining sets of SNP variant calls produced by different variant calling programs. The integrated set of SNP variant calls produced by BAYSIC improves the sensitivity and specificity of the variant calls used as input. In addition to combining sets of germline variants, BAYSIC can also be used to combine sets of somatic mutations detected in the context of tumor/normal sequencing experiments.
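
The role of the posterior probability threshold can be sketched as follows. This is not BAYSIC's implementation: BAYSIC estimates each caller's error rates from the data by Bayesian latent class analysis, whereas this sketch takes the false negative and false positive rates as fixed, made-up numbers and assumes the callers err independently given the true state of the site. All names and values are hypothetical.

```python
from typing import Sequence


def posterior_true_variant(calls: Sequence[int],
                           fn_rates: Sequence[float],
                           fp_rates: Sequence[float],
                           prior: float) -> float:
    """Posterior probability that a site is a true variant, given which
    callers reported it (1) or did not (0), via Bayes' theorem."""
    like_true = 1.0   # P(calls | site is a true variant)
    like_false = 1.0  # P(calls | site is not a variant)
    for called, fn, fp in zip(calls, fn_rates, fp_rates):
        like_true *= (1.0 - fn) if called else fn
        like_false *= fp if called else (1.0 - fp)
    numerator = prior * like_true
    return numerator / (numerator + (1.0 - prior) * like_false)


def accept(calls: Sequence[int], fn_rates: Sequence[float],
           fp_rates: Sequence[float], prior: float, threshold: float) -> bool:
    """Keep the site if its posterior meets the user-chosen threshold."""
    return posterior_true_variant(calls, fn_rates, fp_rates, prior) >= threshold


# Made-up error rates for three hypothetical callers.
fn_rates = [0.1, 0.2, 0.3]
fp_rates = [0.05, 0.1, 0.2]
prior = 0.5

p_two_of_three = posterior_true_variant([1, 1, 0], fn_rates, fp_rates, prior)
p_weakest_only = posterior_true_variant([0, 0, 1], fn_rates, fp_rates, prior)
```

With these numbers a site reported by the two more reliable callers gets a high posterior, while a site reported only by the least reliable caller gets a low one. Lowering the threshold passed to `accept` admits more sites (higher sensitivity, lower specificity), which is the trade-off the abstract describes.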

Subjects

Genome, Human; Software Design; Algorithms; Bayes Theorem; Exome; Humans; Mutation; Polymorphism, Single Nucleotide; Probability

Finding the missing honey bee genes: lessons learned from a genome upgrade.

Elsik, Christine G; Worley, Kim C; Bennett, Anna K; Beye, Martin; Camara, Francisco; Childers, Christopher P; de Graaf, Dirk C; Debyser, Griet; Deng, Jixin; Devreese, Bart; Elhaik, Eran; Evans, Jay D; Foster, Leonard J; Graur, Dan; Guigo, Roderic; Hoff, Katharina Jasmin; Holder, Michael E; Hudson, Matthew E; Hunt, Greg J; Jiang, Huaiyang; Joshi, Vandita; Khetani, Radhika S; Kosarev, Peter; Kovar, Christie L; Ma, Jian; Maleszka, Ryszard; Moritz, Robin F A; Munoz-Torres, Monica C; Murphy, Terence D; Muzny, Donna M; Newsham, Irene F; Reese, Justin T; Robertson, Hugh M; Robinson, Gene E; Rueppell, Olav; Solovyev, Victor; Stanke, Mario; Stolle, Eckart; Tsuruda, Jennifer M; Vaerenbergh, Matthias Van; Waterhouse, Robert M; Weaver, Daniel B; Whitfield, Charles W; Wu, Yuanqing; Zdobnov, Evgeny M; Zhang, Lan; Zhu, Dianhui; Gibbs, Richard A.

BMC Genomics ; 15: 86, 2014 Jan 30.

Article in English | MEDLINE | ID: mdl-24479613

ABSTRACT

BACKGROUND: The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. RESULTS: Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. CONCLUSIONS: Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination.

Subjects

Bees/genetics; Genes, Insect; Animals; Base Composition; Databases, Genetic; Interspersed Repetitive Sequences/genetics; Molecular Sequence Annotation; Open Reading Frames/genetics; Peptides/analysis; Sequence Analysis, RNA; Sequence Homology, Amino Acid