Search | VHL Regional Portal

1.

Integrating biological knowledge for mechanistic inference in the host-associated microbiome.

Santangelo, Brook E; Apgar, Madison; Colorado, Angela Sofia Burkhart; Martin, Casey G; Sterrett, John; Wall, Elena; Joachimiak, Marcin P; Hunter, Lawrence E; Lozupone, Catherine A.

Front Microbiol ; 15: 1351678, 2024.

Article in English | MEDLINE | ID: mdl-38638909

ABSTRACT

Advances in high-throughput technologies have enhanced our ability to describe microbial communities as they relate to human health and disease. Alongside the growth in sequencing data has come an influx of resources that synthesize knowledge surrounding microbial traits, functions, and metabolic potential with knowledge of how they may impact host pathways to influence disease phenotypes. These knowledge bases can enable the development of mechanistic explanations that may underlie correlations detected between microbial communities and disease. In this review, we survey existing resources and methodologies for the computational integration of broad classes of microbial and host knowledge. We evaluate these knowledge bases in their access methods, content, and source characteristics. We discuss challenges of the creation and utilization of knowledge bases including inconsistency of nomenclature assignment of taxa and metabolites across sources, whether the biological entities represented are rooted in ontologies or taxonomies, and how the structure and accessibility limit the diversity of applications and user types. We make this information available in a code and data repository at: https://github.com/lozuponelab/knowledge-source-mappings. Addressing these challenges will allow for the development of more effective tools for drawing from abundant knowledge to find new insights into microbial mechanisms in disease by fostering a systematic and unbiased exploration of existing information.

2.

An open source knowledge graph ecosystem for the life sciences.

Callahan, Tiffany J; Tripodi, Ignacio J; Stefanski, Adrianne L; Cappelletti, Luca; Taneja, Sanya B; Wyrwa, Jordan M; Casiraghi, Elena; Matentzoglu, Nicolas A; Reese, Justin; Silverstein, Jonathan C; Hoyt, Charles Tapley; Boyce, Richard D; Malec, Scott A; Unni, Deepak R; Joachimiak, Marcin P; Robinson, Peter N; Mungall, Christopher J; Cavalleri, Emanuele; Fontana, Tommaso; Valentini, Giorgio; Mesiti, Marco; Gillenwater, Lucas A; Santangelo, Brook; Vasilevsky, Nicole A; Hoehndorf, Robert; Bennett, Tellen D; Ryan, Patrick B; Hripcsak, George; Kahn, Michael G; Bada, Michael; Baumgartner, William A; Hunter, Lawrence E.

Sci Data ; 11(1): 363, 2024 Apr 11.

Article in English | MEDLINE | ID: mdl-38605048

ABSTRACT

Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoint resources and abstraction algorithms), and benchmarks (e.g., prebuilt KGs). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 different large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.

Subject(s)

Biological Science Disciplines , Knowledge Bases , Pattern Recognition, Automated , Algorithms , Translational Research, Biomedical

3.

Estimating geographic variation of infection fatality ratios during epidemics.

Ladau, Joshua; Brodie, Eoin L; Falco, Nicola; Bansal, Ishan; Hoffman, Elijah B; Joachimiak, Marcin P; Mora, Ana M; Walker, Angelica M; Wainwright, Haruko M; Wu, Yulun; Pavicic, Mirko; Jacobson, Daniel; Hess, Matthias; Brown, James B; Abuabara, Katrina.

Infect Dis Model ; 9(2): 634-643, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38572058

ABSTRACT

Objectives: We aim to estimate geographic variability in total numbers of infections and infection fatality ratios (IFR; the number of deaths caused by an infection per 1,000 infected people) when the availability and quality of data on disease burden are limited during an epidemic. Methods: We develop a noncentral hypergeometric framework that accounts for differential probabilities of positive tests and reflects the fact that symptomatic people are more likely to seek testing. We demonstrate the robustness, accuracy, and precision of this framework, and apply it to the United States (U.S.) COVID-19 pandemic to estimate county-level SARS-CoV-2 IFRs. Results: The estimators for the numbers of infections and IFRs showed high accuracy and precision; for instance, when applied to simulated validation data sets, across counties, Pearson correlation coefficients between estimator means and true values were 0.996 and 0.928, respectively, and they showed strong robustness to model misspecification. Applying the county-level estimators to the real, unsimulated COVID-19 data spanning April 1, 2020 to September 30, 2020 from across the U.S., we found that IFRs varied from 0 to 44.69, with a standard deviation of 3.55 and a median of 2.14. Conclusions: The proposed estimation framework can be used to identify geographic variation in IFRs across settings.

4.

Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning.

Caufield, J Harry; Hegde, Harshad; Emonet, Vincent; Harris, Nomi L; Joachimiak, Marcin P; Matentzoglu, Nicolas; Kim, HyeongSik; Moxon, Sierra; Reese, Justin T; Haendel, Melissa A; Robinson, Peter N; Mungall, Christopher J.

Bioinformatics ; 40(3), 2024 Mar 04.

Article in English | MEDLINE | ID: mdl-38383067

ABSTRACT

MOTIVATION: Creating knowledge bases and ontologies is a time consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrarily complex nested knowledge schemas. RESULTS: Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against an LLM to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for matched elements. We present examples of applying SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical to disease relationships. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction methods, but greatly surpasses an LLM's native capability of grounding entities with unique identifiers. SPIRES has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any new training data. This method supports a general strategy of leveraging the language interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly-available databases and ontologies external to the LLM. AVAILABILITY AND IMPLEMENTATION: SPIRES is available as part of the open source OntoGPT package: https://github.com/monarch-initiative/ontogpt.

Subject(s)

Knowledge Bases , Semantics , Databases, Factual

5.

A bacterial sensor taxonomy across earth ecosystems for machine learning applications.

Park, Helen; Joachimiak, Marcin P; Jungbluth, Sean P; Yang, Ziming; Riehl, William J; Canon, R Shane; Arkin, Adam P; Dehal, Paramvir S.

mSystems ; 9(1): e0002623, 2024 Jan 23.

Article in English | MEDLINE | ID: mdl-38078749

ABSTRACT

Microbial communities have evolved to colonize all ecosystems of the planet, from the deep sea to the human gut. Microbes survive by sensing, responding, and adapting to immediate environmental cues. This process is driven by signal transduction proteins such as histidine kinases, which use their sensing domains to bind or otherwise detect environmental cues and "transduce" signals to adjust internal processes. We hypothesized that an ecosystem's unique stimuli leave a sensor "fingerprint," able to identify and shed insight on ecosystem conditions. To test this, we collected 20,712 publicly available metagenomes from Host-associated, Environmental, and Engineered ecosystems across the globe. We extracted and clustered the collection's nearly 18M unique sensory domains into 113,712 similar groupings with MMseqs2. We built gradient-boosted decision tree machine learning models and found we could classify the ecosystem type (accuracy: 87%) and predict the levels of different physical parameters (R² score: 83%) using the sensor cluster abundance as features. Feature importance enables identification of the most predictive sensors to differentiate between ecosystems, which can lead to mechanistic interpretations if the sensor domains are well annotated. To demonstrate this, a machine learning model was trained to predict patients' disease states and used to identify domains related to oxygen sensing present in a healthy gut but missing in patients with abnormal conditions. Moreover, since 98.7% of identified sensor domains are uncharacterized, importance ranking can be used to prioritize sensors to determine what ecosystem function they may be sensing. Furthermore, these new predictive sensors can function as targets for novel sensor engineering with applications in biotechnology, ecosystem maintenance, and medicine. IMPORTANCE: Microbes infect, colonize, and proliferate due to their ability to sense and respond quickly to their surroundings. In this research, we extract the sensory proteins from a diverse range of environmental, engineered, and host-associated metagenomes. We trained machine learning classifiers using sensors as features such that it is possible to predict the ecosystem for a metagenome from its sensor profile. We use the optimized model's feature importance to identify the most impactful and predictive sensors in different environments. We next use the sensor profile from human gut metagenomes to classify their disease states and explore which sensors can explain differences between diseases. The sensors most predictive of environmental labels here, most of which correspond to uncharacterized proteins, are a useful starting point for the discovery of important environment signals and the development of possible diagnostic interventions.

Subject(s)

Metagenomics , Microbiota , Humans , Metagenome , Machine Learning , Earth, Planet

6.

Gene Set Summarization using Large Language Models.

Joachimiak, Marcin P; Caufield, J Harry; Harris, Nomi L; Kim, Hyeongsik; Mungall, Christopher J.

ArXiv ; 2023 May 25.

Article in English | MEDLINE | ID: mdl-37292480

ABSTRACT

Molecular biologists frequently interpret gene lists derived from high-throughput experiments and computational analysis. This is typically done as a statistical enrichment analysis that measures the over- or under-representation of biological function terms associated with genes or their properties, based on curated assertions from a knowledge base (KB) such as the Gene Ontology (GO). Interpreting gene lists can also be framed as a textual summarization task, enabling the use of Large Language Models (LLMs), potentially utilizing scientific texts directly and avoiding reliance on a KB. We developed SPINDOCTOR (Structured Prompt Interpolation of Natural Language Descriptions of Controlled Terms for Ontology Reporting), a method that uses GPT models to perform gene set function summarization as a complement to standard enrichment analysis. This method can use different sources of gene functional information: (1) structured text derived from curated ontological KB annotations, (2) ontology-free narrative gene summaries, or (3) direct model retrieval. We demonstrate that these methods are able to generate plausible and biologically valid summary GO term lists for gene sets. However, GPT-based approaches are unable to deliver reliable scores or p-values and often return terms that are not statistically significant. Crucially, these methods were rarely able to recapitulate the most precise and informative term from standard enrichment, likely due to an inability to generalize and reason using an ontology. Results are highly nondeterministic, with minor variations in prompt resulting in radically different term lists. Our results show that at this point, LLM-based methods are unsuitable as a replacement for standard term enrichment analysis and that manual curation of ontological assertions remains necessary.
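The "standard enrichment analysis" that this abstract contrasts with LLM summarization is commonly implemented as a one-sided hypergeometric (Fisher-style) over-representation test. The sketch below is an illustration of that general technique only, not the SPINDOCTOR code or the paper's pipeline; the function name and parameter names are assumptions introduced here.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, sample: int, hits: int) -> float:
    """Over-representation p-value P(X >= hits) under a hypergeometric null.

    population: number of genes in the background set (assumed parameter name)
    annotated:  background genes annotated with the term of interest
    sample:     size of the gene list being interpreted
    hits:       genes in the list that carry the annotation
    """
    upper = min(annotated, sample)       # largest possible overlap
    total = comb(population, sample)     # all ways to draw the gene list
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: all 3 listed genes carry a term held by 3 of 10 background genes.
p = enrichment_p_value(population=10, annotated=3, sample=3, hits=3)
print(p)  # 1/120
```

A test of this kind yields the scores and p-values that, according to the abstract, the GPT-based approaches could not reliably deliver.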

7.

KG-Hub-building and exchanging biological knowledge graphs.

Caufield, J Harry; Putman, Tim; Schaper, Kevin; Unni, Deepak R; Hegde, Harshad; Callahan, Tiffany J; Cappelletti, Luca; Moxon, Sierra A T; Ravanmehr, Vida; Carbon, Seth; Chan, Lauren E; Cortes, Katherina; Shefchek, Kent A; Elsarboukh, Glass; Balhoff, Jim; Fontana, Tommaso; Matentzoglu, Nicolas; Bruskiewich, Richard M; Thessen, Anne E; Harris, Nomi L; Munoz-Torres, Monica C; Haendel, Melissa A; Robinson, Peter N; Joachimiak, Marcin P; Mungall, Christopher J; Reese, Justin T.

Bioinformatics ; 39(7), 2023 07 01.

Article in English | MEDLINE | ID: mdl-37389415

ABSTRACT

MOTIVATION: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking. RESULTS: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate KGs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification. AVAILABILITY AND IMPLEMENTATION: https://kghub.org.

Subject(s)

Biological Ontologies , COVID-19 , Humans , Pattern Recognition, Automated , Rare Diseases , Machine Learning

8.

Developing a Knowledge Graph for Pharmacokinetic Natural Product-Drug Interactions.

Taneja, Sanya B; Callahan, Tiffany J; Paine, Mary F; Kane-Gill, Sandra L; Kilicoglu, Halil; Joachimiak, Marcin P; Boyce, Richard D.

J Biomed Inform ; 140: 104341, 2023 04.

Article in English | MEDLINE | ID: mdl-36933632

ABSTRACT

BACKGROUND: Pharmacokinetic natural product-drug interactions (NPDIs) occur when botanical or other natural products are co-consumed with pharmaceutical drugs. With the growing use of natural products, the risk for potential NPDIs and consequent adverse events has increased. Understanding mechanisms of NPDIs is key to preventing or minimizing adverse events. Although biomedical knowledge graphs (KGs) have been widely used for drug-drug interaction applications, computational investigation of NPDIs is novel. We constructed NP-KG as a first step toward computational discovery of plausible mechanistic explanations for pharmacokinetic NPDIs that can be used to guide scientific research. METHODS: We developed a large-scale, heterogeneous KG with biomedical ontologies, linked data, and full texts of the scientific literature. To construct the KG, biomedical ontologies and drug databases were integrated with the Phenotype Knowledge Translator framework. The semantic relation extraction systems, SemRep and Integrated Network and Dynamic Reasoning Assembler, were used to extract semantic predications (subject-relation-object triples) from full texts of the scientific literature related to the exemplar natural products green tea and kratom. A literature-based graph constructed from the predications was integrated into the ontology-grounded KG to create NP-KG. NP-KG was evaluated with case studies of pharmacokinetic green tea- and kratom-drug interactions through KG path searches and meta-path discovery to determine congruent and contradictory information in NP-KG compared to ground truth data. We also conducted an error analysis to identify knowledge gaps and incorrect predications in the KG. RESULTS: The fully integrated NP-KG consisted of 745,512 nodes and 7,249,576 edges. Evaluation of NP-KG resulted in congruent (38.98% for green tea, 50% for kratom), contradictory (15.25% for green tea, 21.43% for kratom), and both congruent and contradictory (15.25% for green tea, 21.43% for kratom) information compared to ground truth data. Potential pharmacokinetic mechanisms for several purported NPDIs, including the green tea-raloxifene, green tea-nadolol, kratom-midazolam, kratom-quetiapine, and kratom-venlafaxine interactions were congruent with the published literature. CONCLUSION: NP-KG is the first KG to integrate biomedical ontologies with full texts of the scientific literature focused on natural products. We demonstrate the application of NP-KG to identify known pharmacokinetic interactions between natural products and pharmaceutical drugs mediated by drug metabolizing enzymes and transporters. Future work will incorporate context, contradiction analysis, and embedding-based methods to enrich NP-KG. NP-KG is publicly available at https://doi.org/10.5281/zenodo.6814507. The code for relation extraction, KG construction, and hypothesis generation is available at https://github.com/sanyabt/np-kg.

Subject(s)

Biological Ontologies , Biological Products , Pattern Recognition, Automated , Drug Interactions , Semantics , Pharmaceutical Preparations

9.

GRAPE for fast and scalable graph processing and random-walk-based embedding.

Cappelletti, Luca; Fontana, Tommaso; Casiraghi, Elena; Ravanmehr, Vida; Callahan, Tiffany J; Cano, Carlos; Joachimiak, Marcin P; Mungall, Christopher J; Robinson, Peter N; Reese, Justin; Valentini, Giorgio.

Nat Comput Sci ; 3(6): 552-568, 2023 Jun.

Article in English | MEDLINE | ID: mdl-38177435

ABSTRACT

Graph representation learning methods opened new avenues for addressing complex, real-world problems represented by graphs. However, many graphs used in these applications comprise millions of nodes and billions of edges and are beyond the capabilities of current methods and software implementations. We present GRAPE (Graph Representation Learning, Prediction and Evaluation), a software resource for graph processing and embedding that is able to scale with big graphs by using specialized and smart data structures, algorithms, and a fast parallel implementation of random-walk-based methods. Compared with state-of-the-art software resources, GRAPE shows an improvement of orders of magnitude in empirical space and time complexity, as well as competitive edge- and node-label prediction performance. GRAPE comprises approximately 1.7 million well-documented lines of Python and Rust code and provides 69 node-embedding methods, 25 inference models, a collection of efficient graph-processing utilities, and over 80,000 graphs from the literature and other sources. Standardized interfaces allow a seamless integration of third-party libraries, while ready-to-use and modular pipelines permit an easy-to-use evaluation of graph-representation-learning methods, therefore also positioning GRAPE as a software resource that performs a fair comparison between methods and libraries for graph processing and embedding.

Subject(s)

Libraries , Vitis , Algorithms , Software , Learning
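The GRAPE record above refers to random-walk-based embedding methods. As a rough illustration of the sampling step such methods build on, here is a minimal first-order uniform random walk over an adjacency list. It is a sketch under stated assumptions: the function name, the dictionary graph format, and the toy graph are all hypothetical, and none of this is GRAPE's API or its parallel implementation.

```python
import random


def random_walk(adjacency: dict, start: str, length: int, rng: random.Random) -> list:
    """Sample one uniform random walk of at most `length` nodes.

    adjacency maps each node to a list of its neighbors (assumed format).
    The walk stops early if it reaches a node with no outgoing edges.
    """
    walk = [start]
    while len(walk) < length:
        neighbors = adjacency.get(walk[-1], [])
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


# Hypothetical toy graph; a seeded generator keeps the walks reproducible.
toy_graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a"]}
rng = random.Random(42)
walks = [random_walk(toy_graph, node, 5, rng) for node in toy_graph]
```

In embedding pipelines of this family, many such walks are typically fed to a skip-gram style model as if they were sentences; that training step is omitted here.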

10.

Why was this cited? Explainable machine learning applied to COVID-19 research literature.

Beranová, Lucie; Joachimiak, Marcin P; Kliegr, Tomás; Rabby, Gollam; Sklenák, Vilém.

Scientometrics ; 127(5): 2313-2349, 2022.

Article in English | MEDLINE | ID: mdl-35431364

ABSTRACT

Multiple studies have investigated bibliometric factors predictive of the citation count a research article will receive. In this article, we go beyond bibliometric data by using a range of machine learning techniques to find patterns predictive of citation count using both article content and available metadata. As the input collection, we use the CORD-19 corpus containing research articles (mostly from biology and medicine) applicable to the COVID-19 crisis. Our study employs a combination of state-of-the-art machine learning techniques for text understanding, including the embeddings-based language model BERT and several systems for detection and semantic expansion of entities: ConceptNet, PubTator, and ScispaCy. To interpret the resulting models, we use several explanation algorithms: random forest feature importance, LIME, and Shapley values. We compare the performance and comprehensibility of models obtained by "black-box" machine learning algorithms (neural networks and random forests) with models built with rule learning (CORELS, CBA), which are intrinsically explainable. Multiple rules were discovered, which referred to biomedical entities of potential interest. Of the rules with the highest lift measure, several pointed to dipeptidyl peptidase 4 (DPP4), a known MERS-CoV receptor and a critical determinant of camel-to-human transmission of the camel coronavirus (MERS-CoV). Some other interesting patterns related to the type of animal investigated were found. Articles referring to bats and camels tend to draw citations, while articles referring to most other animal species related to coronavirus receive few citations. Bat coronavirus is the only other virus from a non-human species in the betaB clade along with the SARS-CoV and SARS-CoV-2 viruses. MERS-CoV is in a sister betaC clade, also close to human SARS coronaviruses. Thus both species linked to high citation counts harbor coronaviruses which are more phylogenetically similar to human SARS viruses. On the other hand, feline (FIPV, FCOV) and canine coronaviruses (CCOV) are in the alpha coronavirus clade and more distant from the betaB clade with human SARS viruses. Other results include detection of apparent citation bias favouring authors with Western-sounding names. Equal performance of TF-IDF weights and binary word incidence matrix was observed, with the latter resulting in better interpretability. The best predictive performance was obtained with a "black-box" method, a neural network. The rule-based models led to the most insights, especially when coupled with text representation using semantic entity detection methods. Follow-up work should focus on the analysis of citation patterns in the context of phylogenetic trees, as well as on patterns referring to DPP4, which is currently considered a SARS-CoV-2 therapeutic target.
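The abstract ranks rules by their lift measure. For readers unfamiliar with it, lift compares how often the rule's antecedent and consequent co-occur with how often they would co-occur if independent: lift(A → C) = P(A and C) / (P(A) · P(C)). The sketch below computes it over sets of features; the function name and the four toy "articles" are made up for illustration and are not data or code from the study.

```python
def lift(transactions: list, antecedent: set, consequent: set) -> float:
    """Lift of the rule antecedent -> consequent over a list of feature sets."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    if n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    # P(A and C) / (P(A) * P(C)); values above 1 indicate positive association.
    return (n_both / n) / ((n_a / n) * (n_c / n))


# Hypothetical toy corpus: each article is a set of detected features.
toy_articles = [
    {"bat", "highly_cited"},
    {"bat", "highly_cited"},
    {"cat"},
    {"dog"},
]
print(lift(toy_articles, {"bat"}, {"highly_cited"}))  # 2.0
```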

11.

Supervised learning with word embeddings derived from PubMed captures latent knowledge about protein kinases and cancer.

Ravanmehr, Vida; Blau, Hannah; Cappelletti, Luca; Fontana, Tommaso; Carmody, Leigh; Coleman, Ben; George, Joshy; Reese, Justin; Joachimiak, Marcin; Bocci, Giovanni; Hansen, Peter; Bult, Carol; Rueter, Jens; Casiraghi, Elena; Valentini, Giorgio; Mungall, Christopher; Oprea, Tudor I; Robinson, Peter N.

NAR Genom Bioinform ; 3(4): lqab113, 2021 Dec.

Article in English | MEDLINE | ID: mdl-34888523

ABSTRACT

Inhibiting protein kinases (PKs) that cause cancers has been an important topic in cancer therapy for years. So far, almost 8% of >530 PKs have been targeted by FDA-approved medications, and around 150 protein kinase inhibitors (PKIs) have been tested in clinical trials. We present an approach based on natural language processing and machine learning to investigate the relations between PKs and cancers, predicting PKs whose inhibition would be efficacious to treat a certain cancer. Our approach represents PKs and cancers as semantically meaningful 100-dimensional vectors based on word and concept neighborhoods in PubMed abstracts. We use information about phase I-IV trials in ClinicalTrials.gov to construct a training set for random forest classification. Our results with historical data show that associations between PKs and specific cancers can be predicted years in advance with good accuracy. Our tool can be used to predict the relevance of inhibiting PKs for specific cancers and to support the design of well-focused clinical trials to discover novel PKIs for cancer therapy.

12.

Correction for Vangay et al., "Microbiome Metadata Standards: Report of the National Microbiome Data Collaborative's Workshop and Follow-On Activities".

Vangay, Pajau; Burgin, Josephine; Johnston, Anjanette; Beck, Kristen L; Berrios, Daniel C; Blumberg, Kai; Canon, Shane; Chain, Patrick; Chandonia, John-Marc; Christianson, Danielle; Costes, Sylvain V; Damerow, Joan; Duncan, William D; Dundore-Arias, Jose Pablo; Fagnan, Kjiersten; Galazka, Jonathan M; Gibbons, Sean M; Hays, David; Hervey, Judson; Hu, Bin; Hurwitz, Bonnie L; Jaiswal, Pankaj; Joachimiak, Marcin P; Kinkel, Linda; Ladau, Joshua; Martin, Stanton L; McCue, Lee Ann; Miller, Kayd; Mouncey, Nigel; Mungall, Chris; Pafilis, Evangelos; Reddy, T B K; Richardson, Lorna; Roux, Simon; Schriml, Lynn M; Shaffer, Justin P; Sundaramurthi, Jagadish Chandrabose; Thompson, Luke R; Timme, Ruth E; Zheng, Jie; Wood-Charlson, Elisha M; Eloe-Fadrosh, Emiley A.

mSystems ; 6(3), 2021 May 04.

Article in English | MEDLINE | ID: mdl-33947809

13.

Microbiome Metadata Standards: Report of the National Microbiome Data Collaborative's Workshop and Follow-On Activities.

Vangay, Pajau; Burgin, Josephine; Johnston, Anjanette; Beck, Kristen L; Berrios, Daniel C; Blumberg, Kai; Canon, Shane; Chain, Patrick; Chandonia, John-Marc; Christianson, Danielle; Costes, Sylvain V; Damerow, Joan; Duncan, William D; Dundore-Arias, Jose Pablo; Fagnan, Kjiersten; Galazka, Jonathan M; Gibbons, Sean M; Hays, David; Hervey, Judson; Hu, Bin; Hurwitz, Bonnie L; Jaiswal, Pankaj; Joachimiak, Marcin P; Kinkel, Linda; Ladau, Joshua; Martin, Stanton L; McCue, Lee Ann; Miller, Kayd; Mouncey, Nigel; Mungall, Chris; Pafilis, Evangelos; Reddy, T B K; Richardson, Lorna; Roux, Simon; Schriml, Lynn M; Shaffer, Justin P; Sundaramurthi, Jagadish Chandrabose; Thompson, Luke R; Timme, Ruth E; Zheng, Jie; Wood-Charlson, Elisha M; Eloe-Fadrosh, Emiley A.

mSystems ; 6(1), 2021 02 23.

Article in English | MEDLINE | ID: mdl-33622857

ABSTRACT

Microbiome samples are inherently defined by the environment in which they are found. Therefore, data that provide context and enable interpretation of measurements produced from biological samples, often referred to as metadata, are critical. Important contributions have been made in the development of community-driven metadata standards; however, these standards have not been uniformly embraced by the microbiome research community. To understand how these standards are being adopted, or the barriers to adoption, across research domains, institutions, and funding agencies, the National Microbiome Data Collaborative (NMDC) hosted a workshop in October 2019. This report provides a summary of discussions that took place throughout the workshop, as well as outcomes of the working groups initiated at the workshop.

14.

Zinc against COVID-19? Symptom surveillance and deficiency risk groups.

Joachimiak, Marcin P.

PLoS Negl Trop Dis ; 15(1): e0008895, 2021 01.

Article in English | MEDLINE | ID: mdl-33395417

ABSTRACT

A wide variety of symptoms is associated with Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infection, and these symptoms can overlap with other conditions and diseases. Knowing the distribution of symptoms across diseases and individuals can support clinical actions on timelines shorter than those for drug and vaccine development. Here, we focus on zinc deficiency symptoms, symptom overlap with other conditions, as well as zinc effects on immune health and mechanistic zinc deficiency risk groups. There are well-studied beneficial effects of zinc on the immune system including a decreased susceptibility to and improved clinical outcomes for infectious pathogens including multiple viruses. Zinc is also an anti-inflammatory and anti-oxidative stress agent, relevant to some severe Coronavirus Disease 2019 (COVID-19) symptoms. Unfortunately, zinc deficiency is common worldwide and not exclusive to the developing world. Lifestyle choices and preexisting conditions alone can result in zinc deficiency, and we compile zinc risk groups based on a review of the literature. It is also important to distinguish chronic zinc deficiency from deficiency acquired upon viral infection and immune response and their different supplementation strategies. Zinc is being considered as prophylactic or adjunct therapy for COVID-19, with 12 clinical trials underway, highlighting the relevance of this trace element for global pandemics. Using the example of zinc, we show that there is a critical need for a deeper understanding of essential trace elements in human health, and the resulting deficiency symptoms and their overlap with other conditions. This knowledge will directly support human immune health for decreasing susceptibility, shortening illness duration, and preventing progression to severe cases in the current and future pandemics.

Subject(s)

COVID-19 Drug Treatment , COVID-19/prevention & control , Zinc/administration & dosage , Zinc/deficiency , Anti-Inflammatory Agents/pharmacology , COVID-19/immunology , COVID-19/virology , Humans , Immune System/drug effects , Oxidative Stress/drug effects , Oxidative Stress/immunology , Pandemics , Risk Factors , SARS-CoV-2/isolation & purification

15.

KG-COVID-19: A Framework to Produce Customized Knowledge Graphs for COVID-19 Response.

Reese, Justin T; Unni, Deepak; Callahan, Tiffany J; Cappelletti, Luca; Ravanmehr, Vida; Carbon, Seth; Shefchek, Kent A; Good, Benjamin M; Balhoff, James P; Fontana, Tommaso; Blau, Hannah; Matentzoglu, Nicolas; Harris, Nomi L; Munoz-Torres, Monica C; Haendel, Melissa A; Robinson, Peter N; Joachimiak, Marcin P; Mungall, Christopher J.

Patterns (N Y) ; 2(1): 100155, 2021 Jan 08.

Article in English | MEDLINE | ID: mdl-33196056

ABSTRACT

Integrated, up-to-date data about SARS-CoV-2 and COVID-19 is crucial for the ongoing response to the COVID-19 pandemic by the biomedical research community. While rich biological knowledge exists for SARS-CoV-2 and related viruses (SARS-CoV, MERS-CoV), integrating this knowledge is difficult and time-consuming, since much of it is in siloed databases or in textual format. Furthermore, the data required by the research community vary drastically for different tasks; the optimal data for a machine learning task, for example, is much different from the data used to populate a browsable user interface for clinicians. To address these challenges, we created KG-COVID-19, a flexible framework that ingests and integrates heterogeneous biomedical data to produce knowledge graphs (KGs), and applied it to create a KG for COVID-19 response. This KG framework also can be applied to other problems in which siloed biomedical data must be quickly integrated for different research applications, including future pandemics.

16.

KG-COVID-19: a framework to produce customized knowledge graphs for COVID-19 response.

Reese, Justin; Unni, Deepak; Callahan, Tiffany J; Cappelletti, Luca; Ravanmehr, Vida; Carbon, Seth; Fontana, Tommaso; Blau, Hannah; Matentzoglu, Nicolas; Harris, Nomi L; Munoz-Torres, Monica C; Robinson, Peter N; Joachimiak, Marcin P; Mungall, Christopher J.

bioRxiv ; 2020 Aug 18.

Article in English | MEDLINE | ID: mdl-32839776

ABSTRACT

Integrated, up-to-date data about SARS-CoV-2 and coronavirus disease 2019 (COVID-19) is crucial for the ongoing response to the COVID-19 pandemic by the biomedical research community. While rich biological knowledge exists for SARS-CoV-2 and related viruses (SARS-CoV, MERS-CoV), integrating this knowledge is difficult and time-consuming, since much of it is in siloed databases or in textual format. Furthermore, the data required by the research community varies drastically for different tasks; the optimal data for a machine learning task, for example, is much different from the data used to populate a browsable user interface for clinicians. To address these challenges, we created KG-COVID-19, a flexible framework that ingests and integrates biomedical data to produce knowledge graphs (KGs) for COVID-19 response. This KG framework can also be applied to other problems in which siloed biomedical data must be quickly integrated for different research applications, including future pandemics. BIGGER PICTURE: An effective response to the COVID-19 pandemic relies on integration of the many different types of data available about SARS-CoV-2 and related viruses. KG-COVID-19 is a framework for producing knowledge graphs that can be customized for downstream applications, including machine learning tasks, hypothesis-based querying, and browsable user interfaces that enable researchers to explore COVID-19 data and discover relationships.

17.

How many rare diseases are there?

Haendel, Melissa; Vasilevsky, Nicole; Unni, Deepak; Bologa, Cristian; Harris, Nomi; Rehm, Heidi; Hamosh, Ada; Baynam, Gareth; Groza, Tudor; McMurry, Julie; Dawkins, Hugh; Rath, Ana; Thaxton, Courtney; Bocci, Giovanni; Joachimiak, Marcin P; Köhler, Sebastian; Robinson, Peter N; Mungall, Chris; Oprea, Tudor I.

Nat Rev Drug Discov ; 19(2): 77-78, 2020 02.

Article in English | MEDLINE | ID: mdl-32020066

Subject(s)

Rare Diseases/classification , Rare Diseases/epidemiology , Humans , Phenotype , Rare Diseases/diagnosis , Rare Diseases/therapy

18.

The Monarch Initiative in 2019: an integrative data and analytic platform connecting phenotypes to genotypes across species.

Shefchek, Kent A; Harris, Nomi L; Gargano, Michael; Matentzoglu, Nicolas; Unni, Deepak; Brush, Matthew; Keith, Daniel; Conlin, Tom; Vasilevsky, Nicole; Zhang, Xingmin Aaron; Balhoff, James P; Babb, Larry; Bello, Susan M; Blau, Hannah; Bradford, Yvonne; Carbon, Seth; Carmody, Leigh; Chan, Lauren E; Cipriani, Valentina; Cuzick, Alayne; Della Rocca, Maria; Dunn, Nathan; Essaid, Shahim; Fey, Petra; Grove, Chris; Gourdine, Jean-Phillipe; Hamosh, Ada; Harris, Midori; Helbig, Ingo; Hoatlin, Maureen; Joachimiak, Marcin; Jupp, Simon; Lett, Kenneth B; Lewis, Suzanna E; McNamara, Craig; Pendlington, Zoë M; Pilgrim, Clare; Putman, Tim; Ravanmehr, Vida; Reese, Justin; Riggs, Erin; Robb, Sofia; Roncaglia, Paola; Seager, James; Segerdell, Erik; Similuk, Morgan; Storm, Andrea L; Thaxon, Courtney; Thessen, Anne; Jacobsen, Julius O B.

Nucleic Acids Res ; 48(D1): D704-D715, 2020 01 08.

Article in English | MEDLINE | ID: mdl-31701156

ABSTRACT

In biology and biomedicine, relating phenotypic outcomes with genetic variation and environmental factors remains a challenge: patient phenotypes may not match known diseases, candidate variants may be in genes that haven't been characterized, research organisms may not recapitulate human or veterinary diseases, environmental factors affecting disease outcomes are unknown or undocumented, and many resources must be queried to find potentially significant phenotypic associations. The Monarch Initiative (https://monarchinitiative.org) integrates information on genes, variants, genotypes, phenotypes and diseases in a variety of species, and allows powerful ontology-based search. We develop many widely adopted ontologies that together enable sophisticated computational analysis, mechanistic discovery and diagnostics of Mendelian diseases. Our algorithms and tools are widely used to identify animal models of human disease through phenotypic similarity, for differential diagnostics and to facilitate translational research. Launched in 2015, Monarch has grown with regard to data (new organisms, more sources, better modeling); new API and standards; ontologies (new Mondo unified disease ontology, improvements to ontologies such as HPO and uPheno); user interface (a redesigned website); and community development. Monarch data, algorithms and tools are being used and extended by resources such as GA4GH and NCATS Translator, among others, to aid mechanistic discovery and diagnostics.

Subject(s)

Computational Biology/methods , Genotype , Phenotype , Algorithms , Animals , Biological Ontologies , Databases, Genetic , Exome , Genetic Association Studies , Genetic Variation , Genomics , Humans , Internet , Software , Translational Research, Biomedical , User-Computer Interface

19.

Semantic integration of clinical laboratory tests from electronic health records for deep phenotyping and biomarker discovery.

Zhang, Xingmin Aaron; Yates, Amy; Vasilevsky, Nicole; Gourdine, J P; Callahan, Tiffany J; Carmody, Leigh C; Danis, Daniel; Joachimiak, Marcin P; Ravanmehr, Vida; Pfaff, Emily R; Champion, James; Robasky, Kimberly; Xu, Hao; Fecho, Karamarie; Walton, Nephi A; Zhu, Richard L; Ramsdill, Justin; Mungall, Christopher J; Köhler, Sebastian; Haendel, Melissa A; McDonald, Clement J; Vreeman, Daniel J; Peden, David B; Bennett, Tellen D; Feinstein, James A; Martin, Blake; Stefanski, Adrianne L; Hunter, Lawrence E; Chute, Christopher G; Robinson, Peter N.

NPJ Digit Med ; 2, 2019.

Article in English | MEDLINE | ID: mdl-31119199

ABSTRACT

Electronic Health Record (EHR) systems typically define laboratory test results using the Logical Observation Identifiers Names and Codes (LOINC) and can transmit them using Fast Healthcare Interoperability Resources (FHIR) standards. LOINC has not yet been semantically integrated with computational resources for phenotype analysis. Here, we provide a method for mapping LOINC-encoded laboratory test results transmitted in FHIR standards to Human Phenotype Ontology (HPO) terms. We annotated the medical implications of 2923 commonly used laboratory tests with HPO terms. Using these annotations, our software assesses laboratory test results and converts each result into an HPO term. We validated our approach with EHR data from 15,681 patients with respiratory complaints and identified known biomarkers for asthma. Finally, we provide a freely available SMART on FHIR application that can be used within EHR systems. Our approach allows readily available laboratory tests in EHRs to be reused for deep phenotyping and exploits the hierarchical structure of HPO to integrate distinct tests that have comparable medical interpretations for association studies.
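The conversion step described in this abstract (assess a result, then look up an HPO term) might be sketched as follows. This is an illustration under stated assumptions, not the authors' software: the two-row `ANNOTATIONS` table, the `interpret` and `to_hpo` helpers, and the reference range are all made up for the example, and the codes shown should be checked against LOINC and HPO before any real use.

```python
# Illustrative annotation table: (LOINC code, interpretation) -> (HPO id, label).
# Assumed for this sketch; it is not taken from the paper's annotation file.
ANNOTATIONS = {
    ("2823-3", "H"): ("HP:0002153", "Hyperkalemia"),  # serum/plasma potassium, high
    ("2823-3", "L"): ("HP:0002900", "Hypokalemia"),   # serum/plasma potassium, low
}


def interpret(value, low, high):
    """Classify a numeric result against its reference range as L, N, or H."""
    if value < low:
        return "L"
    if value > high:
        return "H"
    return "N"


def to_hpo(loinc_code, value, low, high):
    """Return the annotated (HPO id, label) for a result, or None if unannotated."""
    return ANNOTATIONS.get((loinc_code, interpret(value, low, high)))


# Example call with an assumed reference range of 3.5 to 5.0 mmol/L.
high_potassium = to_hpo("2823-3", 6.1, 3.5, 5.0)
```

In this sketch a result within range, or a test with no annotation, simply returns `None`; how such cases are represented in the published method is not specified here.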

20.

Effects of genetic variation on the E. coli host-circuit interface.

Cardinale, Stefano; Joachimiak, Marcin Pawel; Arkin, Adam Paul.

Cell Rep ; 4(2): 231-7, 2013 Jul 25.

Article in English | MEDLINE | ID: mdl-23871664

ABSTRACT

Predictable operation of engineered biological circuitry requires the knowledge of host factors that compete or interfere with designed function. Here, we perform a detailed analysis of the interaction between constitutive expression from a test circuit and cell-growth properties in a subset of genetic variants of the bacterium Escherichia coli. Differences in generic cellular parameters such as ribosome availability and growth rate are the main determinants (89%) of strain-specific differences of circuit performance in laboratory-adapted strains but are responsible for only 35% of expression variation across 88 mutants of E. coli BW25113. In the latter strains, we identify specific cell functions, such as nitrogen metabolism, that directly modulate circuit behavior. Finally, we expose aspects of carbon metabolism that act in a strain- and sequence-specific manner. This method of dissecting interactions between host factors and heterologous circuits enables the discovery of mechanisms of interference necessary for the development of design principles for predictable cellular engineering.

Subject(s)

Escherichia coli/genetics , Escherichia coli/metabolism , Gene Expression Regulation, Bacterial , Genetic Variation , Host-Pathogen Interactions/genetics
