Search | VHL Regional Portal

1.

Integrating biological knowledge for mechanistic inference in the host-associated microbiome.

Santangelo, Brook E; Apgar, Madison; Colorado, Angela Sofia Burkhart; Martin, Casey G; Sterrett, John; Wall, Elena; Joachimiak, Marcin P; Hunter, Lawrence E; Lozupone, Catherine A.

Front Microbiol ; 15: 1351678, 2024.

Article in English | MEDLINE | ID: mdl-38638909

ABSTRACT

Advances in high-throughput technologies have enhanced our ability to describe microbial communities as they relate to human health and disease. Alongside the growth in sequencing data has come an influx of resources that synthesize knowledge surrounding microbial traits, functions, and metabolic potential with knowledge of how they may impact host pathways to influence disease phenotypes. These knowledge bases can enable the development of mechanistic explanations that may underlie correlations detected between microbial communities and disease. In this review, we survey existing resources and methodologies for the computational integration of broad classes of microbial and host knowledge. We evaluate these knowledge bases in their access methods, content, and source characteristics. We discuss challenges of the creation and utilization of knowledge bases including inconsistency of nomenclature assignment of taxa and metabolites across sources, whether the biological entities represented are rooted in ontologies or taxonomies, and how the structure and accessibility limit the diversity of applications and user types. We make this information available in a code and data repository at: https://github.com/lozuponelab/knowledge-source-mappings. Addressing these challenges will allow for the development of more effective tools for drawing from abundant knowledge to find new insights into microbial mechanisms in disease by fostering a systematic and unbiased exploration of existing information.

2.

Estimating geographic variation of infection fatality ratios during epidemics.

Ladau, Joshua; Brodie, Eoin L; Falco, Nicola; Bansal, Ishan; Hoffman, Elijah B; Joachimiak, Marcin P; Mora, Ana M; Walker, Angelica M; Wainwright, Haruko M; Wu, Yulun; Pavicic, Mirko; Jacobson, Daniel; Hess, Matthias; Brown, James B; Abuabara, Katrina.

Infect Dis Model ; 9(2): 634-643, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38572058

ABSTRACT

Objectives: We aim to estimate geographic variability in total numbers of infections and infection fatality ratios (IFR; the number of deaths caused by an infection per 1,000 infected people) when the availability and quality of data on disease burden are limited during an epidemic. Methods: We develop a noncentral hypergeometric framework that accounts for differential probabilities of positive tests and reflects the fact that symptomatic people are more likely to seek testing. We demonstrate the robustness, accuracy, and precision of this framework, and apply it to the United States (U.S.) COVID-19 pandemic to estimate county-level SARS-CoV-2 IFRs. Results: The estimators for the numbers of infections and IFRs showed high accuracy and precision; for instance, when applied to simulated validation data sets, across counties, Pearson correlation coefficients between estimator means and true values were 0.996 and 0.928, respectively, and they showed strong robustness to model misspecification. Applying the county-level estimators to the real, unsimulated COVID-19 data spanning April 1, 2020 to September 30, 2020 from across the U.S., we found that IFRs varied from 0 to 44.69, with a standard deviation of 3.55 and a median of 2.14. Conclusions: The proposed estimation framework can be used to identify geographic variation in IFRs across settings.
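For reference, the IFR definition used in the abstract above (deaths per 1,000 infected people) reduces to a simple ratio once the number of infections has been estimated. The sketch below shows only that arithmetic. The function name and the county figures are hypothetical, and it does not implement the noncentral hypergeometric estimator described in the paper.

```python
def ifr_per_1000(deaths: int, infections: int) -> float:
    """Infection fatality ratio expressed as deaths per 1,000 infected people."""
    if infections <= 0:
        raise ValueError("infections must be positive")
    if deaths < 0 or deaths > infections:
        raise ValueError("deaths must lie between 0 and infections")
    return 1000.0 * deaths / infections


# Hypothetical county: 30 deaths among an estimated 12,000 infections.
print(ifr_per_1000(30, 12_000))
```

In practice the hard part is the denominator: reported case counts undercount infections, which is the problem the estimation framework addresses.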

3.

An open source knowledge graph ecosystem for the life sciences.

Callahan, Tiffany J; Tripodi, Ignacio J; Stefanski, Adrianne L; Cappelletti, Luca; Taneja, Sanya B; Wyrwa, Jordan M; Casiraghi, Elena; Matentzoglu, Nicolas A; Reese, Justin; Silverstein, Jonathan C; Hoyt, Charles Tapley; Boyce, Richard D; Malec, Scott A; Unni, Deepak R; Joachimiak, Marcin P; Robinson, Peter N; Mungall, Christopher J; Cavalleri, Emanuele; Fontana, Tommaso; Valentini, Giorgio; Mesiti, Marco; Gillenwater, Lucas A; Santangelo, Brook; Vasilevsky, Nicole A; Hoehndorf, Robert; Bennett, Tellen D; Ryan, Patrick B; Hripcsak, George; Kahn, Michael G; Bada, Michael; Baumgartner, William A; Hunter, Lawrence E.

Sci Data ; 11(1): 363, 2024 Apr 11.

Article in English | MEDLINE | ID: mdl-38605048

ABSTRACT

Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoint resources and abstraction algorithms), and benchmarks (e.g., prebuilt KGs). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 different large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.

Subject(s)

Biological Science Disciplines , Knowledge Bases , Pattern Recognition, Automated , Algorithms , Translational Research, Biomedical

4.

Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning.

Caufield, J Harry; Hegde, Harshad; Emonet, Vincent; Harris, Nomi L; Joachimiak, Marcin P; Matentzoglu, Nicolas; Kim, HyeongSik; Moxon, Sierra; Reese, Justin T; Haendel, Melissa A; Robinson, Peter N; Mungall, Christopher J.

Bioinformatics ; 40(3), 2024 Mar 04.

Article in English | MEDLINE | ID: mdl-38383067

ABSTRACT

MOTIVATION: Creating knowledge bases and ontologies is a time consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrarily complex nested knowledge schemas. RESULTS: Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against an LLM to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for matched elements. We present examples of applying SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical to disease relationships. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction methods, but greatly surpasses an LLM's native capability of grounding entities with unique identifiers. SPIRES has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any new training data. This method supports a general strategy of leveraging the language interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly-available databases and ontologies external to the LLM. AVAILABILITY AND IMPLEMENTATION: SPIRES is available as part of the open source OntoGPT package: https://github.com/monarch-initiative/ontogpt.

Subject(s)

Knowledge Bases , Semantics , Databases, Factual

5.

A bacterial sensor taxonomy across earth ecosystems for machine learning applications.

Park, Helen; Joachimiak, Marcin P; Jungbluth, Sean P; Yang, Ziming; Riehl, William J; Canon, R Shane; Arkin, Adam P; Dehal, Paramvir S.

mSystems ; 9(1): e0002623, 2024 Jan 23.

Article in English | MEDLINE | ID: mdl-38078749

ABSTRACT

Microbial communities have evolved to colonize all ecosystems of the planet, from the deep sea to the human gut. Microbes survive by sensing, responding, and adapting to immediate environmental cues. This process is driven by signal transduction proteins such as histidine kinases, which use their sensing domains to bind or otherwise detect environmental cues and "transduce" signals to adjust internal processes. We hypothesized that an ecosystem's unique stimuli leave a sensor "fingerprint," able to identify and shed insight on ecosystem conditions. To test this, we collected 20,712 publicly available metagenomes from Host-associated, Environmental, and Engineered ecosystems across the globe. We extracted and clustered the collection's nearly 18M unique sensory domains into 113,712 similar groupings with MMseqs2. We built gradient-boosted decision tree machine learning models and found we could classify the ecosystem type (accuracy: 87%) and predict the levels of different physical parameters (R² score: 83%) using the sensor cluster abundance as features. Feature importance enables identification of the most predictive sensors to differentiate between ecosystems which can lead to mechanistic interpretations if the sensor domains are well annotated. To demonstrate this, a machine learning model was trained to predict patient's disease state and used to identify domains related to oxygen sensing present in a healthy gut but missing in patients with abnormal conditions. Moreover, since 98.7% of identified sensor domains are uncharacterized, importance ranking can be used to prioritize sensors to determine what ecosystem function they may be sensing. Furthermore, these new predictive sensors can function as targets for novel sensor engineering with applications in biotechnology, ecosystem maintenance, and medicine. IMPORTANCE: Microbes infect, colonize, and proliferate due to their ability to sense and respond quickly to their surroundings. In this research, we extract the sensory proteins from a diverse range of environmental, engineered, and host-associated metagenomes. We trained machine learning classifiers using sensors as features such that it is possible to predict the ecosystem for a metagenome from its sensor profile. We use the optimized model's feature importance to identify the most impactful and predictive sensors in different environments. We next use the sensor profile from human gut metagenomes to classify their disease states and explore which sensors can explain differences between diseases. The sensors most predictive of environmental labels here, most of which correspond to uncharacterized proteins, are a useful starting point for the discovery of important environment signals and the development of possible diagnostic interventions.

Subject(s)

Metagenomics , Microbiota , Humans , Metagenome , Machine Learning , Earth, Planet

6.

Gene Set Summarization using Large Language Models.

Joachimiak, Marcin P; Caufield, J Harry; Harris, Nomi L; Kim, Hyeongsik; Mungall, Christopher J.

ArXiv ; 2023 May 25.

Article in English | MEDLINE | ID: mdl-37292480

ABSTRACT

Molecular biologists frequently interpret gene lists derived from high-throughput experiments and computational analysis. This is typically done as a statistical enrichment analysis that measures the over- or under-representation of biological function terms associated with genes or their properties, based on curated assertions from a knowledge base (KB) such as the Gene Ontology (GO). Interpreting gene lists can also be framed as a textual summarization task, enabling the use of Large Language Models (LLMs), potentially utilizing scientific texts directly and avoiding reliance on a KB. We developed SPINDOCTOR (Structured Prompt Interpolation of Natural Language Descriptions of Controlled Terms for Ontology Reporting), a method that uses GPT models to perform gene set function summarization as a complement to standard enrichment analysis. This method can use different sources of gene functional information: (1) structured text derived from curated ontological KB annotations, (2) ontology-free narrative gene summaries, or (3) direct model retrieval. We demonstrate that these methods are able to generate plausible and biologically valid summary GO term lists for gene sets. However, GPT-based approaches are unable to deliver reliable scores or p-values and often return terms that are not statistically significant. Crucially, these methods were rarely able to recapitulate the most precise and informative term from standard enrichment, likely due to an inability to generalize and reason using an ontology. Results are highly nondeterministic, with minor variations in prompt resulting in radically different term lists. Our results show that at this point, LLM-based methods are unsuitable as a replacement for standard term enrichment analysis and that manual curation of ontological assertions remains necessary.
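The "standard enrichment analysis" that the abstract above contrasts with LLM summarization is commonly computed as a one-sided hypergeometric (Fisher's exact) test for over-representation. The sketch below illustrates that baseline calculation only. It is not part of SPINDOCTOR, and the function name and the counts in the usage line are hypothetical.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, sample: int, hits: int) -> float:
    """Over-representation p-value: P(X >= hits) for a hypergeometric X.

    population: genes in the background set
    annotated:  background genes carrying the term
    sample:     genes in the input gene list
    hits:       input genes carrying the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass of every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 background genes, 200 annotated to a term,
# a 50-gene list of which 5 carry the term.
print(enrichment_p_value(20_000, 200, 50, 5))
```

A real analysis would repeat this for every candidate term and then apply a multiple-testing correction, which is the source of the scores and p-values that the abstract notes GPT-based approaches cannot deliver reliably.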

7.

KG-Hub-building and exchanging biological knowledge graphs.

Caufield, J Harry; Putman, Tim; Schaper, Kevin; Unni, Deepak R; Hegde, Harshad; Callahan, Tiffany J; Cappelletti, Luca; Moxon, Sierra A T; Ravanmehr, Vida; Carbon, Seth; Chan, Lauren E; Cortes, Katherina; Shefchek, Kent A; Elsarboukh, Glass; Balhoff, Jim; Fontana, Tommaso; Matentzoglu, Nicolas; Bruskiewich, Richard M; Thessen, Anne E; Harris, Nomi L; Munoz-Torres, Monica C; Haendel, Melissa A; Robinson, Peter N; Joachimiak, Marcin P; Mungall, Christopher J; Reese, Justin T.

Bioinformatics ; 39(7), 2023 07 01.

Article in English | MEDLINE | ID: mdl-37389415

ABSTRACT

MOTIVATION: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking. RESULTS: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate KGs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification. AVAILABILITY AND IMPLEMENTATION: https://kghub.org.

Subject(s)

Biological Ontologies , COVID-19 , Humans , Pattern Recognition, Automated , Rare Diseases , Machine Learning

8.

Developing a Knowledge Graph for Pharmacokinetic Natural Product-Drug Interactions.

Taneja, Sanya B; Callahan, Tiffany J; Paine, Mary F; Kane-Gill, Sandra L; Kilicoglu, Halil; Joachimiak, Marcin P; Boyce, Richard D.

J Biomed Inform ; 140: 104341, 2023 04.

Article in English | MEDLINE | ID: mdl-36933632

ABSTRACT

BACKGROUND: Pharmacokinetic natural product-drug interactions (NPDIs) occur when botanical or other natural products are co-consumed with pharmaceutical drugs. With the growing use of natural products, the risk for potential NPDIs and consequent adverse events has increased. Understanding mechanisms of NPDIs is key to preventing or minimizing adverse events. Although biomedical knowledge graphs (KGs) have been widely used for drug-drug interaction applications, computational investigation of NPDIs is novel. We constructed NP-KG as a first step toward computational discovery of plausible mechanistic explanations for pharmacokinetic NPDIs that can be used to guide scientific research. METHODS: We developed a large-scale, heterogeneous KG with biomedical ontologies, linked data, and full texts of the scientific literature. To construct the KG, biomedical ontologies and drug databases were integrated with the Phenotype Knowledge Translator framework. The semantic relation extraction systems, SemRep and Integrated Network and Dynamic Reasoning Assembler, were used to extract semantic predications (subject-relation-object triples) from full texts of the scientific literature related to the exemplar natural products green tea and kratom. A literature-based graph constructed from the predications was integrated into the ontology-grounded KG to create NP-KG. NP-KG was evaluated with case studies of pharmacokinetic green tea- and kratom-drug interactions through KG path searches and meta-path discovery to determine congruent and contradictory information in NP-KG compared to ground truth data. We also conducted an error analysis to identify knowledge gaps and incorrect predications in the KG. RESULTS: The fully integrated NP-KG consisted of 745,512 nodes and 7,249,576 edges. 
Evaluation of NP-KG resulted in congruent (38.98% for green tea, 50% for kratom), contradictory (15.25% for green tea, 21.43% for kratom), and both congruent and contradictory (15.25% for green tea, 21.43% for kratom) information compared to ground truth data. Potential pharmacokinetic mechanisms for several purported NPDIs, including the green tea-raloxifene, green tea-nadolol, kratom-midazolam, kratom-quetiapine, and kratom-venlafaxine interactions were congruent with the published literature. CONCLUSION: NP-KG is the first KG to integrate biomedical ontologies with full texts of the scientific literature focused on natural products. We demonstrate the application of NP-KG to identify known pharmacokinetic interactions between natural products and pharmaceutical drugs mediated by drug metabolizing enzymes and transporters. Future work will incorporate context, contradiction analysis, and embedding-based methods to enrich NP-KG. NP-KG is publicly available at https://doi.org/10.5281/zenodo.6814507. The code for relation extraction, KG construction, and hypothesis generation is available at https://github.com/sanyabt/np-kg.
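The KG path searches mentioned in the abstract above can be pictured as graph traversal over subject-relation-object triples. The following is a minimal sketch using breadth-first search. It is not the NP-KG code, and the entity and relation names are placeholders invented for illustration rather than extracted predications.

```python
from collections import deque


def find_path(triples, start, goal):
    """Return the shortest chain of triples linking start to goal, or None."""
    # Index outgoing edges by subject for fast neighbor lookup.
    graph = {}
    for subj, rel, obj in triples:
        graph.setdefault(subj, []).append((rel, obj))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, obj in graph.get(node, []):
            if obj not in seen:
                seen.add(obj)
                queue.append((obj, path + [(node, rel, obj)]))
    return None


# Placeholder triples (hypothetical, not taken from NP-KG).
toy_triples = [
    ("natural_product_A", "inhibits", "enzyme_X"),
    ("enzyme_X", "metabolizes", "drug_B"),
    ("natural_product_A", "interacts_with", "transporter_Y"),
]
print(find_path(toy_triples, "natural_product_A", "drug_B"))
```

Each returned chain is a candidate mechanistic explanation that would still need to be checked against ground truth data, as the evaluation in the abstract does.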

Subject(s)

Biological Ontologies , Biological Products , Pattern Recognition, Automated , Drug Interactions , Semantics , Pharmaceutical Preparations

9.

GRAPE for fast and scalable graph processing and random-walk-based embedding.

Cappelletti, Luca; Fontana, Tommaso; Casiraghi, Elena; Ravanmehr, Vida; Callahan, Tiffany J; Cano, Carlos; Joachimiak, Marcin P; Mungall, Christopher J; Robinson, Peter N; Reese, Justin; Valentini, Giorgio.

Nat Comput Sci ; 3(6): 552-568, 2023 Jun.

Article in English | MEDLINE | ID: mdl-38177435

ABSTRACT

Graph representation learning methods opened new avenues for addressing complex, real-world problems represented by graphs. However, many graphs used in these applications comprise millions of nodes and billions of edges and are beyond the capabilities of current methods and software implementations. We present GRAPE (Graph Representation Learning, Prediction and Evaluation), a software resource for graph processing and embedding that is able to scale with big graphs by using specialized and smart data structures, algorithms, and a fast parallel implementation of random-walk-based methods. Compared with state-of-the-art software resources, GRAPE shows an improvement of orders of magnitude in empirical space and time complexity, as well as competitive edge- and node-label prediction performance. GRAPE comprises approximately 1.7 million well-documented lines of Python and Rust code and provides 69 node-embedding methods, 25 inference models, a collection of efficient graph-processing utilities, and over 80,000 graphs from the literature and other sources. Standardized interfaces allow a seamless integration of third-party libraries, while ready-to-use and modular pipelines permit an easy-to-use evaluation of graph-representation-learning methods, therefore also positioning GRAPE as a software resource that performs a fair comparison between methods and libraries for graph processing and embedding.
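Random-walk-based embedding methods of the kind named in the abstract above start by sampling node sequences from the graph, which are then fed to a sequence model. The sketch below shows a plain first-order uniform random walk in pure Python. It does not use or reproduce the GRAPE API, and the function name and toy graph are assumptions made for illustration.

```python
import random


def random_walk(adjacency, start, length, rng):
    """Sample a uniform random walk of at most `length` nodes from `start`."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = adjacency.get(walk[-1], [])
        if not neighbors:
            # Dead end: stop early rather than restart.
            break
        walk.append(rng.choice(neighbors))
    return walk


# Toy undirected graph stored as an adjacency list.
toy_graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
print(random_walk(toy_graph, "a", 6, random.Random(0)))
```

Scaling this step to graphs with billions of edges is where specialized data structures and parallel implementations, as described in the abstract, become necessary.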

Subject(s)

Libraries , Vitis , Algorithms , Software , Learning

10.

Why was this cited? Explainable machine learning applied to COVID-19 research literature.

Beranová, Lucie; Joachimiak, Marcin P; Kliegr, Tomás; Rabby, Gollam; Sklenák, Vilém.

Scientometrics ; 127(5): 2313-2349, 2022.

Article in English | MEDLINE | ID: mdl-35431364

ABSTRACT

Multiple studies have investigated bibliometric factors predictive of the citation count a research article will receive. In this article, we go beyond bibliometric data by using a range of machine learning techniques to find patterns predictive of citation count using both article content and available metadata. As the input collection, we use the CORD-19 corpus containing research articles, mostly from biology and medicine, applicable to the COVID-19 crisis. Our study employs a combination of state-of-the-art machine learning techniques for text understanding, including the embeddings-based language model BERT and several systems for detection and semantic expansion of entities: ConceptNet, PubTator, and scispaCy. To interpret the resulting models, we use several explanation algorithms: random forest feature importance, LIME, and Shapley values. We compare the performance and comprehensibility of models obtained by "black-box" machine learning algorithms (neural networks and random forests) with models built with rule learning (CORELS, CBA), which are intrinsically explainable. Multiple rules were discovered, which referred to biomedical entities of potential interest. Of the rules with the highest lift measure, several rules pointed to dipeptidyl peptidase 4 (DPP4), a known MERS-CoV receptor and a critical determinant of camel-to-human transmission of the camel coronavirus (MERS-CoV). Some other interesting patterns related to the type of animal investigated were found. Articles referring to bats and camels tend to draw citations, while articles referring to most other animal species related to coronavirus receive few citations. Bat coronavirus is the only other virus from a non-human species in the betaB clade along with the SARS-CoV and SARS-CoV-2 viruses. MERS-CoV is in a sister betaC clade, also close to human SARS coronaviruses. Thus both species linked to high citation counts harbor coronaviruses which are more phylogenetically similar to human SARS viruses. On the other hand, feline (FIPV, FCOV) and canine coronaviruses (CCOV) are in the alpha coronavirus clade and more distant from the betaB clade with human SARS viruses. Other results include detection of apparent citation bias favouring authors with Western-sounding names. Equal performance of TF-IDF weights and binary word incidence matrix was observed, with the latter resulting in better interpretability. The best predictive performance was obtained with a "black-box" method, a neural network. The rule-based models led to the most insights, especially when coupled with text representation using semantic entity detection methods. Follow-up work should focus on the analysis of citation patterns in the context of phylogenetic trees, as well as on patterns referring to DPP4, which is currently considered a SARS-CoV-2 therapeutic target.
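The lift measure used above to rank rules compares how often a rule's antecedent and consequent co-occur with how often they would co-occur if independent; values above 1 indicate a positive association. The sketch below shows that calculation from raw counts. The function name and the article counts are hypothetical and are not results from the study.

```python
def lift(n_total: int, n_antecedent: int, n_consequent: int, n_both: int) -> float:
    """Lift of the rule antecedent -> consequent.

    lift = P(A and B) / (P(A) * P(B)) = n_both * n_total / (n_A * n_B)
    """
    if min(n_total, n_antecedent, n_consequent) <= 0:
        raise ValueError("counts must be positive")
    return n_both * n_total / (n_antecedent * n_consequent)


# Hypothetical corpus: 100 articles, 20 mention an entity, 25 are highly
# cited, and 10 are both.
print(lift(100, 20, 25, 10))
```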

11.

Correction for Vangay et al., "Microbiome Metadata Standards: Report of the National Microbiome Data Collaborative's Workshop and Follow-On Activities".

Vangay, Pajau; Burgin, Josephine; Johnston, Anjanette; Beck, Kristen L; Berrios, Daniel C; Blumberg, Kai; Canon, Shane; Chain, Patrick; Chandonia, John-Marc; Christianson, Danielle; Costes, Sylvain V; Damerow, Joan; Duncan, William D; Dundore-Arias, Jose Pablo; Fagnan, Kjiersten; Galazka, Jonathan M; Gibbons, Sean M; Hays, David; Hervey, Judson; Hu, Bin; Hurwitz, Bonnie L; Jaiswal, Pankaj; Joachimiak, Marcin P; Kinkel, Linda; Ladau, Joshua; Martin, Stanton L; McCue, Lee Ann; Miller, Kayd; Mouncey, Nigel; Mungall, Chris; Pafilis, Evangelos; Reddy, T B K; Richardson, Lorna; Roux, Simon; Schriml, Lynn M; Shaffer, Justin P; Sundaramurthi, Jagadish Chandrabose; Thompson, Luke R; Timme, Ruth E; Zheng, Jie; Wood-Charlson, Elisha M; Eloe-Fadrosh, Emiley A.

mSystems ; 6(3), 2021 May 04.

Article in English | MEDLINE | ID: mdl-33947809

12.

Microbiome Metadata Standards: Report of the National Microbiome Data Collaborative's Workshop and Follow-On Activities.

Vangay, Pajau; Burgin, Josephine; Johnston, Anjanette; Beck, Kristen L; Berrios, Daniel C; Blumberg, Kai; Canon, Shane; Chain, Patrick; Chandonia, John-Marc; Christianson, Danielle; Costes, Sylvain V; Damerow, Joan; Duncan, William D; Dundore-Arias, Jose Pablo; Fagnan, Kjiersten; Galazka, Jonathan M; Gibbons, Sean M; Hays, David; Hervey, Judson; Hu, Bin; Hurwitz, Bonnie L; Jaiswal, Pankaj; Joachimiak, Marcin P; Kinkel, Linda; Ladau, Joshua; Martin, Stanton L; McCue, Lee Ann; Miller, Kayd; Mouncey, Nigel; Mungall, Chris; Pafilis, Evangelos; Reddy, T B K; Richardson, Lorna; Roux, Simon; Schriml, Lynn M; Shaffer, Justin P; Sundaramurthi, Jagadish Chandrabose; Thompson, Luke R; Timme, Ruth E; Zheng, Jie; Wood-Charlson, Elisha M; Eloe-Fadrosh, Emiley A.

mSystems ; 6(1), 2021 02 23.

Article in English | MEDLINE | ID: mdl-33622857

ABSTRACT

Microbiome samples are inherently defined by the environment in which they are found. Therefore, data that provide context and enable interpretation of measurements produced from biological samples, often referred to as metadata, are critical. Important contributions have been made in the development of community-driven metadata standards; however, these standards have not been uniformly embraced by the microbiome research community. To understand how these standards are being adopted, or the barriers to adoption, across research domains, institutions, and funding agencies, the National Microbiome Data Collaborative (NMDC) hosted a workshop in October 2019. This report provides a summary of discussions that took place throughout the workshop, as well as outcomes of the working groups initiated at the workshop.

13.

Zinc against COVID-19? Symptom surveillance and deficiency risk groups.

Joachimiak, Marcin P.

PLoS Negl Trop Dis ; 15(1): e0008895, 2021 01.

Article in English | MEDLINE | ID: mdl-33395417

ABSTRACT

A wide variety of symptoms is associated with Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infection, and these symptoms can overlap with other conditions and diseases. Knowing the distribution of symptoms across diseases and individuals can support clinical actions on timelines shorter than those for drug and vaccine development. Here, we focus on zinc deficiency symptoms, symptom overlap with other conditions, as well as zinc effects on immune health and mechanistic zinc deficiency risk groups. There are well-studied beneficial effects of zinc on the immune system including a decreased susceptibility to and improved clinical outcomes for infectious pathogens including multiple viruses. Zinc is also an anti-inflammatory and anti-oxidative stress agent, relevant to some severe Coronavirus Disease 2019 (COVID-19) symptoms. Unfortunately, zinc deficiency is common worldwide and not exclusive to the developing world. Lifestyle choices and preexisting conditions alone can result in zinc deficiency, and we compile zinc risk groups based on a review of the literature. It is also important to distinguish chronic zinc deficiency from deficiency acquired upon viral infection and immune response and their different supplementation strategies. Zinc is being considered as prophylactic or adjunct therapy for COVID-19, with 12 clinical trials underway, highlighting the relevance of this trace element for global pandemics. Using the example of zinc, we show that there is a critical need for a deeper understanding of essential trace elements in human health, and the resulting deficiency symptoms and their overlap with other conditions. This knowledge will directly support human immune health for decreasing susceptibility, shortening illness duration, and preventing progression to severe cases in the current and future pandemics.

Subject(s)

COVID-19 Drug Treatment , COVID-19/prevention & control , Zinc/administration & dosage , Zinc/deficiency , Anti-Inflammatory Agents/pharmacology , COVID-19/immunology , COVID-19/virology , Humans , Immune System/drug effects , Oxidative Stress/drug effects , Oxidative Stress/immunology , Pandemics , Risk Factors , SARS-CoV-2/isolation & purification

14.

KG-COVID-19: A Framework to Produce Customized Knowledge Graphs for COVID-19 Response.

Reese, Justin T; Unni, Deepak; Callahan, Tiffany J; Cappelletti, Luca; Ravanmehr, Vida; Carbon, Seth; Shefchek, Kent A; Good, Benjamin M; Balhoff, James P; Fontana, Tommaso; Blau, Hannah; Matentzoglu, Nicolas; Harris, Nomi L; Munoz-Torres, Monica C; Haendel, Melissa A; Robinson, Peter N; Joachimiak, Marcin P; Mungall, Christopher J.

Patterns (N Y) ; 2(1): 100155, 2021 Jan 08.

Article in English | MEDLINE | ID: mdl-33196056

ABSTRACT

Integrated, up-to-date data about SARS-CoV-2 and COVID-19 is crucial for the ongoing response to the COVID-19 pandemic by the biomedical research community. While rich biological knowledge exists for SARS-CoV-2 and related viruses (SARS-CoV, MERS-CoV), integrating this knowledge is difficult and time-consuming, since much of it is in siloed databases or in textual format. Furthermore, the data required by the research community vary drastically for different tasks; the optimal data for a machine learning task, for example, is much different from the data used to populate a browsable user interface for clinicians. To address these challenges, we created KG-COVID-19, a flexible framework that ingests and integrates heterogeneous biomedical data to produce knowledge graphs (KGs), and applied it to create a KG for COVID-19 response. This KG framework also can be applied to other problems in which siloed biomedical data must be quickly integrated for different research applications, including future pandemics.

15.

KG-COVID-19: a framework to produce customized knowledge graphs for COVID-19 response.

Reese, Justin; Unni, Deepak; Callahan, Tiffany J; Cappelletti, Luca; Ravanmehr, Vida; Carbon, Seth; Fontana, Tommaso; Blau, Hannah; Matentzoglu, Nicolas; Harris, Nomi L; Munoz-Torres, Monica C; Robinson, Peter N; Joachimiak, Marcin P; Mungall, Christopher J.

bioRxiv ; 2020 Aug 18.

Article in English | MEDLINE | ID: mdl-32839776

ABSTRACT

Integrated, up-to-date data about SARS-CoV-2 and coronavirus disease 2019 (COVID-19) is crucial for the ongoing response to the COVID-19 pandemic by the biomedical research community. While rich biological knowledge exists for SARS-CoV-2 and related viruses (SARS-CoV, MERS-CoV), integrating this knowledge is difficult and time consuming, since much of it is in siloed databases or in textual format. Furthermore, the data required by the research community varies drastically for different tasks - the optimal data for a machine learning task, for example, is much different from the data used to populate a browsable user interface for clinicians. To address these challenges, we created KG-COVID-19, a flexible framework that ingests and integrates biomedical data to produce knowledge graphs (KGs) for COVID-19 response. This KG framework can also be applied to other problems in which siloed biomedical data must be quickly integrated for different research applications, including future pandemics. BIGGER PICTURE: An effective response to the COVID-19 pandemic relies on integration of many different types of data available about SARS-CoV-2 and related viruses. KG-COVID-19 is a framework for producing knowledge graphs that can be customized for downstream applications including machine learning tasks, hypothesis-based querying, and browsable user interface to enable researchers to explore COVID-19 data and discover relationships.

16.

How many rare diseases are there?

Haendel, Melissa; Vasilevsky, Nicole; Unni, Deepak; Bologa, Cristian; Harris, Nomi; Rehm, Heidi; Hamosh, Ada; Baynam, Gareth; Groza, Tudor; McMurry, Julie; Dawkins, Hugh; Rath, Ana; Thaxton, Courtney; Bocci, Giovanni; Joachimiak, Marcin P; Köhler, Sebastian; Robinson, Peter N; Mungall, Chris; Oprea, Tudor I.

Nat Rev Drug Discov ; 19(2): 77-78, 2020 02.

Article in English | MEDLINE | ID: mdl-32020066

Subject(s)

Rare Diseases/classification , Rare Diseases/epidemiology , Humans , Phenotype , Rare Diseases/diagnosis , Rare Diseases/therapy

17.

Semantic integration of clinical laboratory tests from electronic health records for deep phenotyping and biomarker discovery.

Zhang, Xingmin Aaron; Yates, Amy; Vasilevsky, Nicole; Gourdine, J P; Callahan, Tiffany J; Carmody, Leigh C; Danis, Daniel; Joachimiak, Marcin P; Ravanmehr, Vida; Pfaff, Emily R; Champion, James; Robasky, Kimberly; Xu, Hao; Fecho, Karamarie; Walton, Nephi A; Zhu, Richard L; Ramsdill, Justin; Mungall, Christopher J; Köhler, Sebastian; Haendel, Melissa A; McDonald, Clement J; Vreeman, Daniel J; Peden, David B; Bennett, Tellen D; Feinstein, James A; Martin, Blake; Stefanski, Adrianne L; Hunter, Lawrence E; Chute, Christopher G; Robinson, Peter N.

NPJ Digit Med ; 2, 2019.

Article in English | MEDLINE | ID: mdl-31119199

ABSTRACT

Electronic Health Record (EHR) systems typically define laboratory test results using the Logical Observation Identifiers Names and Codes (LOINC) and can transmit them using Fast Healthcare Interoperability Resources (FHIR) standards. LOINC has not yet been semantically integrated with computational resources for phenotype analysis. Here, we provide a method for mapping LOINC-encoded laboratory test results transmitted in FHIR standards to Human Phenotype Ontology (HPO) terms. We annotated the medical implications of 2923 commonly used laboratory tests with HPO terms. Using these annotations, our software assesses laboratory test results and converts each result into an HPO term. We validated our approach with EHR data from 15,681 patients with respiratory complaints and identified known biomarkers for asthma. Finally, we provide a freely available SMART on FHIR application that can be used within EHR systems. Our approach allows readily available laboratory tests in EHR to be reused for deep phenotyping and exploits the hierarchical structure of HPO to integrate distinct tests that have comparable medical interpretations for association studies.

18.

Characterization of NaCl tolerance in Desulfovibrio vulgaris Hildenborough through experimental evolution.

Zhou, Aifen; Baidoo, Edward; He, Zhili; Mukhopadhyay, Aindrila; Baumohl, Jason K; Benke, Peter; Joachimiak, Marcin P; Xie, Ming; Song, Rong; Arkin, Adam P; Hazen, Terry C; Keasling, Jay D; Wall, Judy D; Stahl, David A; Zhou, Jizhong.

ISME J ; 7(9): 1790-802, 2013 Sep.

Article in English | MEDLINE | ID: mdl-23575373

ABSTRACT

Desulfovibrio vulgaris Hildenborough strains with significantly increased tolerance to NaCl were obtained via experimental evolution. A NaCl-evolved strain, ES9-11, isolated from a population cultured for 1200 generations in medium amended with 100 mM NaCl, showed better tolerance to NaCl than a control strain, EC3-10, cultured for 1200 generations in parallel but without NaCl amendment in the medium. To understand the NaCl adaptation mechanism in ES9-11, we analyzed the transcriptional, metabolite and phospholipid fatty acid (PLFA) profiles of strain ES9-11 with 0, 100 or 250 mM added NaCl in the medium compared with the ancestral strain and EC3-10 as controls. In all the culture conditions, increased expression of genes involved in amino-acid synthesis and transport, energy production and cation efflux, and decreased expression of flagellar assembly genes, were detected in ES9-11. Consistently, increased abundances of organic solutes and decreased cell motility were observed in ES9-11. Glutamate appears to be the most important osmoprotectant in D. vulgaris under NaCl stress, whereas other organic solutes such as glutamine, glycine and glycine betaine might contribute to NaCl tolerance under low NaCl concentration only. Unsaturation indices of PLFA significantly increased in ES9-11. Branched unsaturated PLFAs i17:1 ω9c, a17:1 ω9c and branched saturated i15:0 might have important roles in maintaining proper membrane fluidity under NaCl stress. Taken together, these data suggest that the accumulation of osmolytes, increased membrane fluidity, decreased cell motility and possibly an increased exclusion of Na(+) contribute to increased NaCl tolerance in NaCl-evolved D. vulgaris.

Subject(s)

Adaptation, Physiological , Biological Evolution , Desulfovibrio vulgaris/physiology , Gene Expression Regulation, Bacterial , Sodium Chloride/metabolism , Desulfovibrio vulgaris/genetics , Desulfovibrio vulgaris/metabolism , Energy Metabolism/genetics , Fatty Acids/metabolism , Gene Expression Profiling , Membrane Fluidity/genetics

19.

Deletion of the Desulfovibrio vulgaris carbon monoxide sensor invokes global changes in transcription.

Rajeev, Lara; Hillesland, Kristina L; Zane, Grant M; Zhou, Aifen; Joachimiak, Marcin P; He, Zhili; Zhou, Jizhong; Arkin, Adam P; Wall, Judy D; Stahl, David A.

J Bacteriol ; 194(21): 5783-93, 2012 Nov.

Article in English | MEDLINE | ID: mdl-22904289

ABSTRACT

The carbon monoxide-sensing transcriptional factor CooA has been studied only in hydrogenogenic organisms that can grow using CO as the sole source of energy. Homologs for the canonical CO oxidation system, including CooA, CO dehydrogenase (CODH), and a CO-dependent Coo hydrogenase, are present in the sulfate-reducing bacterium Desulfovibrio vulgaris, although it grows only poorly on CO. We show that D. vulgaris Hildenborough has an active CO dehydrogenase capable of consuming exogenous CO and that the expression of the CO dehydrogenase, but not that of a gene annotated as encoding a Coo hydrogenase, is dependent on both CO and CooA. Carbon monoxide did not act as a general metabolic inhibitor, since growth of a strain deleted for cooA was inhibited by CO on lactate-sulfate but not pyruvate-sulfate. While the deletion strain did not accumulate CO in excess, as would have been expected if CooA were important in the cycling of CO as a metabolic intermediate, global transcriptional analyses suggested that CooA and CODH are used during normal metabolism.

Subject(s)

Bacterial Proteins/genetics , Carbon Monoxide/metabolism , Desulfovibrio vulgaris/genetics , Gene Deletion , Gene Expression Profiling , Gene Expression Regulation, Bacterial , Transcription Factors/genetics , Aldehyde Oxidoreductases/metabolism , Desulfovibrio vulgaris/growth & development , Desulfovibrio vulgaris/metabolism , Lactates/metabolism , Multienzyme Complexes/metabolism , Pyruvic Acid/metabolism , Sulfates/metabolism

20.

Functional responses of methanogenic archaea to syntrophic growth.

Walker, Christopher B; Redding-Johanson, Alyssa M; Baidoo, Edward E; Rajeev, Lara; He, Zhili; Hendrickson, Erik L; Joachimiak, Marcin P; Stolyar, Sergey; Arkin, Adam P; Leigh, John A; Zhou, Jizhong; Keasling, Jay D; Mukhopadhyay, Aindrila; Stahl, David A.

ISME J ; 6(11): 2045-55, 2012 Nov.

Article in English | MEDLINE | ID: mdl-22739494

ABSTRACT

Methanococcus maripaludis grown syntrophically with Desulfovibrio vulgaris was compared with M. maripaludis monocultures grown under hydrogen limitation using transcriptional, proteomic and metabolite analyses. These measurements indicate a decrease in transcript abundance for energy-consuming biosynthetic functions in syntrophically grown M. maripaludis, with an increase in transcript abundance for genes involved in the energy-generating central pathway for methanogenesis. Compared with growth in monoculture under hydrogen limitation, the response of paralogous genes, such as those coding for hydrogenases, often diverged, with transcripts of one variant increasing in relative abundance, whereas the other was little changed or significantly decreased in abundance. A common theme was an apparent increase in transcripts for functions using H(2) directly as reductant, versus those using the reduced deazaflavin (coenzyme F(420)). The greater importance of direct reduction by H(2) was supported by improved syntrophic growth of a deletion mutant in an F(420)-dependent dehydrogenase of M. maripaludis. These data suggest that paralogous genes enable the methanogen to adapt to changing substrate availability, sustaining it under environmental conditions that are often near the thermodynamic threshold for growth. Additionally, the discovery of interspecies alanine transfer adds another metabolic dimension to this environmentally relevant mutualism.

Subject(s)

Desulfovibrio vulgaris/growth & development , Methanococcus/growth & development , Desulfovibrio vulgaris/genetics , Desulfovibrio vulgaris/metabolism , Energy Metabolism , Hydrogen/metabolism , Lactic Acid/metabolism , Methane/metabolism , Methanococcus/genetics , Methanococcus/metabolism , Oxidoreductases/genetics , Oxidoreductases/metabolism , Proteomics
