Search | VHL Regional Portal

1.

Decoding the exposome: data science methodologies and implications in exposome-wide association studies (ExWASs).

Chung, Ming Kei; House, John S; Akhtari, Farida S; Makris, Konstantinos C; Langston, Michael A; Islam, Khandaker Talat; Holmes, Philip; Chadeau-Hyam, Marc; Smirnov, Alex I; Du, Xiuxia; Thessen, Anne E; Cui, Yuxia; Zhang, Kai; Manrai, Arjun K; Motsinger-Reif, Alison; Patel, Chirag J.

Exposome ; 4(1): osae001, 2024.

Article in English | MEDLINE | ID: mdl-38344436

ABSTRACT

This paper explores the exposome concept and its role in elucidating the interplay between environmental exposures and human health. We introduce two key concepts critical for exposomics research. Firstly, we discuss the joint impact of genetics and environment on phenotypes, emphasizing the variance attributable to shared and nonshared environmental factors, underscoring the complexity of quantifying the exposome's influence on health outcomes. Secondly, we introduce the importance of advanced data-driven methods in large cohort studies for exposomic measurements. Here, we introduce the exposome-wide association study (ExWAS), an approach designed for systematic discovery of relationships between phenotypes and various exposures, identifying significant associations while controlling for multiple comparisons. We advocate for the standardized use of the term "exposome-wide association study, ExWAS," to facilitate clear communication and literature retrieval in this field. The paper aims to guide future health researchers in understanding and evaluating exposomic studies. Our discussion extends to emerging topics, such as FAIR Data Principles, biobanked healthcare datasets, and the functional exposome, outlining the future directions in exposomic research. This abstract provides a succinct overview of our comprehensive approach to understanding the complex dynamics of the exposome and its significant implications for human health.
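Illustrative note on the ExWAS step described in the abstract above (testing many exposures against a phenotype while controlling for multiple comparisons). The abstract does not name a correction method. The sketch below assumes the Benjamini-Hochberg false discovery rate procedure, which is one common choice. The exposure names and p-values are made up for illustration and are not results from the paper.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return one boolean per p-value: True where the test is rejected at FDR level alpha."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose rank is at or below that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical per-exposure p-values (assumed, for illustration only).
pvalues = {
    "exposure_a": 0.0004,
    "exposure_b": 0.012,
    "exposure_c": 0.04,
    "exposure_d": 0.2,
    "exposure_e": 0.8,
}
names = list(pvalues)
flags = benjamini_hochberg([pvalues[n] for n in names], alpha=0.05)
significant = [n for n, keep in zip(names, flags) if keep]
print(significant)
```

In a real analysis each p-value would come from a separate regression of the phenotype on one exposure, adjusted for covariates. Only the correction step is sketched here.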

2.

Data sharing and ontology use among agricultural genetics, genomics, and breeding databases and resources of the Agbiodata Consortium.

Clarke, Jennifer L; Cooper, Laurel D; Poelchau, Monica F; Berardini, Tanya Z; Elser, Justin; Farmer, Andrew D; Ficklin, Stephen; Kumari, Sunita; Laporte, Marie-Angélique; Nelson, Rex T; Sadohara, Rie; Selby, Peter; Thessen, Anne E; Whitehead, Brandon; Sen, Taner Z.

Database (Oxford) ; 2023, 2023 11 15.

Article in English | MEDLINE | ID: mdl-37971715

ABSTRACT

Over the last couple of decades, there has been a rapid growth in the number and scope of agricultural genetics, genomics and breeding (GGB) databases and resources. The AgBioData Consortium (https://www.agbiodata.org/) currently represents 44 databases and resources (https://www.agbiodata.org/databases) covering model or crop plant and animal GGB data, ontologies, pathways, genetic variation and breeding platforms (referred to as 'databases' throughout). One of the goals of the Consortium is to facilitate FAIR (Findable, Accessible, Interoperable, and Reusable) data management and the integration of datasets, which requires data sharing along with structured vocabularies and/or ontologies. Two AgBioData working groups, focused on Data Sharing and Ontologies, respectively, conducted a Consortium-wide survey to assess the current status and future needs of the members in those areas. A total of 33 researchers responded to the survey, representing 37 databases. Results suggest that data-sharing practices by AgBioData databases are in a fairly healthy state, but it is not clear whether this is true for all metadata and data types across all databases, and that ontology use has not substantially changed since a similar survey was conducted in 2017. Based on our evaluation of the survey results, we recommend (i) providing training for database personnel in specific data-sharing techniques, as well as in ontology use; (ii) further study on what metadata is shared, and how well it is shared among databases; (iii) promoting an understanding of data sharing and ontologies in the stakeholder community; (iv) improving data sharing and ontologies for specific phenotypic data types and formats; and (v) lowering specific barriers to data sharing and ontology use, by identifying sustainability solutions and by identifying, promoting, or developing data standards. Combined, these improvements are likely to help AgBioData databases increase development efforts towards improved ontology use, and data sharing via programmatic means. Database URL: https://www.agbiodata.org/databases.

Subject(s)

Data Management , Plant Breeding , Animals , Genomics/methods , Databases, Factual , Information Dissemination

3.

An approach for collaborative development of a federated biomedical knowledge graph-based question-answering system: Question-of-the-Month challenges.

Fecho, Karamarie; Bizon, Chris; Issabekova, Tursynay; Moxon, Sierra; Thessen, Anne E; Abdollahi, Shervin; Baranzini, Sergio E; Belhu, Basazin; Byrd, William E; Chung, Lawrence; Crouse, Andrew; Duby, Marc P; Ferguson, Stephen; Foksinska, Aleksandra; Forero, Laura; Friedman, Jennifer; Gardner, Vicki; Glusman, Gwênlyn; Hadlock, Jennifer; Hanspers, Kristina; Hinderer, Eugene; Hobbs, Charlotte; Hyde, Gregory; Huang, Sui; Koslicki, David; Mease, Philip; Muller, Sandrine; Mungall, Christopher J; Ramsey, Stephen A; Roach, Jared; Rubin, Irit; Schurman, Shepherd H; Shalev, Anath; Smith, Brett; Soman, Karthik; Stemann, Sarah; Su, Andrew I; Ta, Casey; Watkins, Paul B; Williams, Mark D; Wu, Chunlei; Xu, Colleen H.

J Clin Transl Sci ; 7(1): e214, 2023.

Article in English | MEDLINE | ID: mdl-37900350

ABSTRACT

Knowledge graphs have become a common approach for knowledge representation. Yet, the application of graph methodology is elusive due to the sheer number and complexity of knowledge sources. In addition, semantic incompatibilities hinder efforts to harmonize and integrate across these diverse sources. As part of The Biomedical Translator Consortium, we have developed a knowledge graph-based question-answering system designed to augment human reasoning and accelerate translational scientific discovery: the Translator system. We have applied the Translator system to answer biomedical questions in the context of a broad array of diseases and syndromes, including Fanconi anemia, primary ciliary dyskinesia, multiple sclerosis, and others. A variety of collaborative approaches have been used to research and develop the Translator system. One recent approach involved the establishment of a monthly "Question-of-the-Month (QotM) Challenge" series. Herein, we describe the structure of the QotM Challenge; the six challenges that have been conducted to date on drug-induced liver injury, cannabidiol toxicity, coronavirus infection, diabetes, psoriatic arthritis, and ATP1A3-related phenotypes; the scientific insights that have been gleaned during the challenges; and the technical issues that were identified over the course of the challenges and that can now be addressed to foster further development of the prototype Translator system. We close with a discussion on Large Language Models such as ChatGPT and highlight differences between those models and the Translator system.

4.

KG-Hub-building and exchanging biological knowledge graphs.

Caufield, J Harry; Putman, Tim; Schaper, Kevin; Unni, Deepak R; Hegde, Harshad; Callahan, Tiffany J; Cappelletti, Luca; Moxon, Sierra A T; Ravanmehr, Vida; Carbon, Seth; Chan, Lauren E; Cortes, Katherina; Shefchek, Kent A; Elsarboukh, Glass; Balhoff, Jim; Fontana, Tommaso; Matentzoglu, Nicolas; Bruskiewich, Richard M; Thessen, Anne E; Harris, Nomi L; Munoz-Torres, Monica C; Haendel, Melissa A; Robinson, Peter N; Joachimiak, Marcin P; Mungall, Christopher J; Reese, Justin T.

Bioinformatics ; 39(7), 2023 07 01.

Article in English | MEDLINE | ID: mdl-37389415

ABSTRACT

MOTIVATION: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking. RESULTS: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate KGs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification. AVAILABILITY AND IMPLEMENTATION: https://kghub.org.

Subject(s)

Biological Ontologies , COVID-19 , Humans , Pattern Recognition, Automated , Rare Diseases , Machine Learning

5.

Using knowledge graphs to infer gene expression in plants.

Thessen, Anne E; Cooper, Laurel; Swetnam, Tyson L; Hegde, Harshad; Reese, Justin; Elser, Justin; Jaiswal, Pankaj.

Front Artif Intell ; 6: 1201002, 2023.

Article in English | MEDLINE | ID: mdl-37384147

ABSTRACT

Introduction: Climate change is already affecting ecosystems around the world and forcing us to adapt to meet societal needs. The speed with which climate change is progressing necessitates a massive scaling up of the number of species with understood genotype-environment-phenotype (G×E×P) dynamics in order to increase ecosystem and agriculture resilience. An important part of predicting phenotype is understanding the complex gene regulatory networks present in organisms. Previous work has demonstrated that knowledge about one species can be applied to another using ontologically-supported knowledge bases that exploit homologous structures and homologous genes. These types of structures that can apply knowledge about one species to another have the potential to enable the massive scaling up that is needed through in silico experimentation. Methods: We developed one such structure, a knowledge graph (KG) using information from Planteome and the EMBL-EBI Expression Atlas that connects gene expression, molecular interactions, functions, and pathways to homology-based gene annotations. Our preliminary analysis uses data from gene expression studies in Arabidopsis thaliana and Populus trichocarpa plants exposed to drought conditions. Results: A graph query identified 16 pairs of homologous genes in these two taxa, some of which show opposite patterns of gene expression in response to drought. As expected, analysis of the upstream cis-regulatory region of these genes revealed that homologs with similar expression behavior had conserved cis-regulatory regions and potential interaction with similar trans-elements, unlike homologs that changed their expression in opposite ways. Discussion: This suggests that even though the homologous pairs share common ancestry and functional roles, predicting expression and phenotype through homology inference needs careful consideration of integrating cis and trans-regulatory components in the curated and inferred knowledge graph.

6.

The Ontology of Biological Attributes (OBA)-computational traits for the life sciences.

Stefancsik, Ray; Balhoff, James P; Balk, Meghan A; Ball, Robyn L; Bello, Susan M; Caron, Anita R; Chesler, Elissa J; de Souza, Vinicius; Gehrke, Sarah; Haendel, Melissa; Harris, Laura W; Harris, Nomi L; Ibrahim, Arwa; Koehler, Sebastian; Matentzoglu, Nicolas; McMurry, Julie A; Mungall, Christopher J; Munoz-Torres, Monica C; Putman, Tim; Robinson, Peter; Smedley, Damian; Sollis, Elliot; Thessen, Anne E; Vasilevsky, Nicole; Walton, David O; Osumi-Sutherland, David.

Mamm Genome ; 34(3): 364-378, 2023 09.

Article in English | MEDLINE | ID: mdl-37076585

ABSTRACT

Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focussed measurable trait data. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.

Subject(s)

Biological Ontologies , Biological Science Disciplines , Genome-Wide Association Study , Phenotype

7.

The Superfund Research Program Analytics Portal: linking environmental chemical exposure to biological phenotypes.

Gosline, Sara J C; Kim, Doo Nam; Pande, Paritosh; Thomas, Dennis G; Truong, Lisa; Hoffman, Peter; Barton, Michael; Loftus, Joseph; Moran, Addy; Hampton, Shawn; Dowson, Scott; Franklin, Lyndsey; Degnan, David; Anderson, Lindsey; Thessen, Anne; Tanguay, Robyn L; Anderson, Kim A; Waters, Katrina M.

Sci Data ; 10(1): 151, 2023 03 21.

Article in English | MEDLINE | ID: mdl-36944655

ABSTRACT

The OSU/PNNL Superfund Research Program (SRP) represents a longstanding collaboration to quantify Polycyclic Aromatic Hydrocarbons (PAHs) at various superfund sites in the Pacific Northwest and assess their potential impact on human health. To link the chemical measurements to biological activity, we describe the use of the zebrafish as a high-throughput developmental toxicity model that provides quantitative measurements of the exposure to chemicals. Toward this end, we have linked over 150 PAHs found at Superfund sites to the effect of these same chemicals in zebrafish, creating a rich dataset that links environmental exposure to biological response. To quantify this response, we have implemented a dose-response modelling pipeline to calculate benchmark dose parameters which enable potency comparison across over 500 chemicals and 12 of the phenotypes measured in zebrafish. We provide a rich dataset for download and analysis as well as a web portal that provides public access to this dataset via an interactive web site designed to support exploration and re-use of these data by the scientific community at http://srp.pnnl.gov .

Subject(s)

Environmental Exposure , Polycyclic Aromatic Hydrocarbons , Zebrafish , Animals , Humans , Environmental Exposure/analysis , Hazardous Substances/analysis , Northwestern United States , Polycyclic Aromatic Hydrocarbons/toxicity , Polycyclic Aromatic Hydrocarbons/analysis

8.

Workshop Report: Catalyzing Knowledge-Driven Discovery in Environmental Health Sciences through a Harmonized Language.

Holmgren, Stephanie; Bell, Shannon M; Wignall, Jessica; Duncan, Christopher G; Kwok, Richard K; Cronk, Ryan; Osborn, Kimberly; Black, Steven; Thessen, Anne; Schmitt, Charles.

Int J Environ Res Public Health ; 20(3), 2023 01 28.

Article in English | MEDLINE | ID: mdl-36767684

ABSTRACT

Harmonized language is essential to finding, sharing, and reusing large-scale, complex data. Gaps and barriers prevent the adoption of harmonized language approaches in environmental health sciences (EHS). To address this, the National Institute of Environmental Health Sciences and partners created the Environmental Health Language Collaborative (EHLC). The purpose of EHLC is to facilitate a community-driven effort to advance the development and adoption of harmonized language approaches in EHS. EHLC is a forum to pinpoint language harmonization gaps, to facilitate the development of, raise awareness of, and encourage the use of harmonization approaches and tools, and to develop new standards and recommendations. To ensure that EHLC's focus and structure would be sustainable long-term and meet the needs of the field, EHLC launched an inaugural workshop in September 2021 focused on "Developing Sustainable Language Solutions" and "Building a Sustainable Community". When the attendees were surveyed, 91% said harmonized language solutions would be of high value/benefit, and 60% agreed to continue contributing to EHLC efforts. Based on workshop discussions, future activities will focus on targeted collaborative use-case working groups in addition to offering education and training on ontologies, metadata, and standards, and developing an EHS language resource portal.

Subject(s)

Environmental Health , Language , United States , National Institute of Environmental Health Sciences (U.S.)

9.

The Ontology of Biological Attributes (OBA) - Computational Traits for the Life Sciences.

Stefancsik, Ray; Balhoff, James P; Balk, Meghan A; Ball, Robyn; Bello, Susan M; Caron, Anita R; Chessler, Elissa; de Souza, Vinicius; Gehrke, Sarah; Haendel, Melissa; Harris, Laura W; Harris, Nomi L; Ibrahim, Arwa; Koehler, Sebastian; Matentzoglu, Nicolas; McMurry, Julie A; Mungall, Christopher J; Munoz-Torres, Monica C; Putman, Tim; Robinson, Peter; Smedley, Damian; Sollis, Elliot; Thessen, Anne E; Vasilevsky, Nicole; Walton, David O; Osumi-Sutherland, David.

bioRxiv ; 2023 Jan 27.

Article in English | MEDLINE | ID: mdl-36747660

ABSTRACT

Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focused measurable trait data. Moreover, variations in gene expression in response to environmental disturbances even without any genetic alterations can also be associated with particular biological attributes. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.

10.

The Environmental Conditions, Treatments, and Exposures Ontology (ECTO): connecting toxicology and exposure to human health and beyond.

Chan, Lauren E; Thessen, Anne E; Duncan, William D; Matentzoglu, Nicolas; Schmitt, Charles; Grondin, Cynthia J; Vasilevsky, Nicole; McMurry, Julie A; Robinson, Peter N; Mungall, Christopher J; Haendel, Melissa A.

J Biomed Semantics ; 14(1): 3, 2023 02 24.

Article in English | MEDLINE | ID: mdl-36823605

ABSTRACT

BACKGROUND: Evaluating the impact of environmental exposures on organism health is a key goal of modern biomedicine and is critically important in an age of greater pollution and chemicals in our environment. Environmental health utilizes many different research methods and generates a variety of data types. However, to date, no comprehensive database represents the full spectrum of environmental health data. Due to a lack of interoperability between databases, tools for integrating these resources are needed. In this manuscript we present the Environmental Conditions, Treatments, and Exposures Ontology (ECTO), a species-agnostic ontology focused on exposure events that occur as a result of natural and experimental processes, such as diet, work, or research activities. ECTO is intended for use in harmonizing environmental health data resources to support cross-study integration and inference for mechanism discovery. METHODS AND FINDINGS: ECTO is an ontology designed for describing organismal exposures such as toxicological research, environmental variables, dietary features, and patient-reported data from surveys. ECTO utilizes the base model established within the Exposure Ontology (ExO). ECTO is developed using a combination of manual curation and Dead Simple OWL Design Patterns (DOSDP), and contains over 2700 environmental exposure terms, and incorporates chemical and environmental ontologies. ECTO is an Open Biological and Biomedical Ontology (OBO) Foundry ontology that is designed for interoperability, reuse, and axiomatization with other ontologies. ECTO terms have been utilized in axioms within the Mondo Disease Ontology to represent diseases caused or influenced by environmental factors, as well as for survey encoding for the Personalized Environment and Genes Study (PEGS). CONCLUSIONS: We constructed ECTO to meet Open Biological and Biomedical Ontology (OBO) Foundry principles to increase translation opportunities between environmental health and other areas of biology. ECTO has a growing community of contributors consisting of toxicologists, public health epidemiologists, and health care providers to provide the necessary expertise for areas that have been identified previously as gaps.

Subject(s)

Biological Ontologies , Humans , Databases, Factual

11.

The Exposome and Nutritional Pharmacology and Toxicology: A New Application for Metabolomics.

Rushing, Blake R; Thessen, Anne E; Soliman, Ghada A; Ramesh, Aramandla; Sumner, Susan Cj.

Exposome ; 3(1), 2023.

Article in English | MEDLINE | ID: mdl-38766521

ABSTRACT

The exposome refers to all of the internal and external life-long exposures that an individual experiences. These exposures, either acute or chronic, are associated with changes in metabolism that will positively or negatively influence the health and well-being of individuals. Nutrients and other dietary compounds modulate similar biochemical processes and have the potential in some cases to counteract the negative effects of exposures or enhance their beneficial effects. We present herein the concept of Nutritional Pharmacology/Toxicology which uses high-information metabolomics workflows to identify metabolic targets associated with exposures. Using this information, nutritional interventions can be designed toward those targets to mitigate adverse effects or enhance positive effects. We also discuss the potential for this approach in precision nutrition where nutrients/diet can be used to target gene-environment interactions and other subpopulation characteristics. Deriving these "nutrient cocktails" presents an opportunity to modify the effects of exposures for more beneficial outcomes in public health.

12.

Biolink Model: A universal schema for knowledge graphs in clinical, biomedical, and translational science.

Unni, Deepak R; Moxon, Sierra A T; Bada, Michael; Brush, Matthew; Bruskiewich, Richard; Caufield, J Harry; Clemons, Paul A; Dancik, Vlado; Dumontier, Michel; Fecho, Karamarie; Glusman, Gustavo; Hadlock, Jennifer J; Harris, Nomi L; Joshi, Arpita; Putman, Tim; Qin, Guangrong; Ramsey, Stephen A; Shefchek, Kent A; Solbrig, Harold; Soman, Karthik; Thessen, Anne E; Haendel, Melissa A; Bizon, Chris; Mungall, Christopher J.

Clin Transl Sci ; 15(8): 1848-1855, 2022 08.

Article in English | MEDLINE | ID: mdl-36125173

ABSTRACT

Within clinical, biomedical, and translational science, an increasing number of projects are adopting graphs for knowledge representation. Graph-based data models elucidate the interconnectedness among core biomedical concepts, enable data structures to be easily updated, and support intuitive queries, visualizations, and inference algorithms. However, knowledge discovery across these "knowledge graphs" (KGs) has remained difficult. Data set heterogeneity and complexity; the proliferation of ad hoc data formats; poor compliance with guidelines on findability, accessibility, interoperability, and reusability; and, in particular, the lack of a universally accepted, open-access model for standardization across biomedical KGs has left the task of reconciling data sources to downstream consumers. Biolink Model is an open-source data model that can be used to formalize the relationships between data structures in translational science. It incorporates object-oriented classification and graph-oriented features. The core of the model is a set of hierarchical, interconnected classes (or categories) and relationships between them (or predicates) representing biomedical entities such as gene, disease, chemical, anatomic structure, and phenotype. The model provides class and edge attributes and associations that guide how entities should relate to one another. Here, we highlight the need for a standardized data model for KGs, describe Biolink Model, and compare it with other models. We demonstrate the utility of Biolink Model in various initiatives, including the Biomedical Data Translator Consortium and the Monarch Initiative, and show how it has supported easier integration and interoperability of biomedical KGs, bringing together knowledge from multiple sources and helping to realize the goals of translational science.

Subject(s)

Pattern Recognition, Automated , Translational Science, Biomedical , Knowledge
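Illustrative note on the Biolink Model record above: the abstract describes a set of classes (categories) for biomedical entities and relationships (predicates) between them. The sketch below shows that shape with plain Python dataclasses. It is not the Biolink tooling or its API. The `EXAMPLE:` identifiers, the `Node` and `Edge` types, and the `objects_of` helper are hypothetical; the category and predicate strings are written in Biolink's CURIE style.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str        # CURIE identifier (hypothetical values below)
    category: str  # Biolink class, e.g. "biolink:Gene"
    name: str


@dataclass(frozen=True)
class Edge:
    subject: str    # id of the source node
    predicate: str  # Biolink relationship, e.g. "biolink:has_phenotype"
    object: str     # id of the target node


# A tiny, made-up graph: one gene, one disease, one phenotype.
nodes = {
    n.id: n
    for n in [
        Node("EXAMPLE:gene1", "biolink:Gene", "example gene"),
        Node("EXAMPLE:disease1", "biolink:Disease", "example disease"),
        Node("EXAMPLE:phenotype1", "biolink:PhenotypicFeature", "example phenotype"),
    ]
}
edges = [
    Edge("EXAMPLE:gene1", "biolink:gene_associated_with_condition", "EXAMPLE:disease1"),
    Edge("EXAMPLE:disease1", "biolink:has_phenotype", "EXAMPLE:phenotype1"),
]


def objects_of(subject_id, predicate, edges):
    """Return the ids of nodes reached from subject_id via the given predicate."""
    return [e.object for e in edges if e.subject == subject_id and e.predicate == predicate]


print(objects_of("EXAMPLE:gene1", "biolink:gene_associated_with_condition", edges))
```

Because every node carries a category and every edge a predicate drawn from one shared vocabulary, graphs built by different groups can be merged and queried with the same code, which is the interoperability point the abstract makes.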

13.

A Simple Standard for Sharing Ontological Mappings (SSSOM).

Matentzoglu, Nicolas; Balhoff, James P; Bello, Susan M; Bizon, Chris; Brush, Matthew; Callahan, Tiffany J; Chute, Christopher G; Duncan, William D; Evelo, Chris T; Gabriel, Davera; Graybeal, John; Gray, Alasdair; Gyori, Benjamin M; Haendel, Melissa; Harmse, Henriette; Harris, Nomi L; Harrow, Ian; Hegde, Harshad B; Hoyt, Amelia L; Hoyt, Charles T; Jiao, Dazhi; Jiménez-Ruiz, Ernesto; Jupp, Simon; Kim, Hyeongsik; Koehler, Sebastian; Liener, Thomas; Long, Qinqin; Malone, James; McLaughlin, James A; McMurry, Julie A; Moxon, Sierra; Munoz-Torres, Monica C; Osumi-Sutherland, David; Overton, James A; Peters, Bjoern; Putman, Tim; Queralt-Rosinach, Núria; Shefchek, Kent; Solbrig, Harold; Thessen, Anne; Tudorache, Tania; Vasilevsky, Nicole; Wagner, Alex H; Mungall, Christopher J.

Database (Oxford) ; 2022, 2022 05 25.

Article in English | MEDLINE | ID: mdl-35616100

ABSTRACT

Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM) which addresses these problems by: (i) Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. (ii) Defining an easy-to-use simple table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles. (iii) Implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices. (iv) Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec. 
Database URL: http://w3id.org/sssom/spec.

Subject(s)

Metadata , Semantic Web , Data Management , Databases, Factual , Workflow
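Illustrative note on the SSSOM record above: the abstract describes a simple table-based format whose metadata makes the nature of each mapping explicit (for example, exact versus broad match). The sketch below reads a small tab-separated table with Python's standard `csv` module and keeps only the exact matches. The rows, the `A:`/`B:` identifiers, and the `read_mappings` helper are hypothetical. The column names and the `skos:` and `semapv:` values follow the style of the SSSOM specification, which remains the authoritative source.

```python
import csv

# A made-up mapping table. Lines starting with "#" stand in for the
# commented metadata header that can precede the table.
SSSOM_TSV = (
    "# mapping_set_id: https://example.org/mappings/demo\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\tconfidence\n"
    "A:001\tskos:exactMatch\tB:101\tsemapv:ManualMappingCuration\t0.95\n"
    "A:002\tskos:broadMatch\tB:102\tsemapv:LexicalMatching\t0.70\n"
    "A:003\tskos:relatedMatch\tB:103\tsemapv:LexicalMatching\t0.40\n"
)


def read_mappings(text):
    """Parse the tab-separated table, skipping blank and '#' comment lines."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    return list(csv.DictReader(lines, delimiter="\t"))


mappings = read_mappings(SSSOM_TSV)

# Keep only mappings asserted as exact, e.g. for a high-precision use case.
exact = [
    (row["subject_id"], row["object_id"])
    for row in mappings
    if row["predicate_id"] == "skos:exactMatch"
]
print(exact)
```

Filtering on `predicate_id` and `mapping_justification` is what lets a pipeline separate curated exact matches from looser automated ones without parsing any ontology.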

14.

Progress toward a universal biomedical data translator.

Fecho, Karamarie; Thessen, Anne E; Baranzini, Sergio E; Bizon, Chris; Hadlock, Jennifer J; Huang, Sui; Roper, Ryan T; Southall, Noel; Ta, Casey; Watkins, Paul B; Williams, Mark D; Xu, Hao; Byrd, William; Dancík, Vlado; Duby, Marc P; Dumontier, Michel; Glusman, Gustavo; Harris, Nomi L; Hinderer, Eugene W; Hyde, Greg; Johs, Adam; Su, Andrew I; Qin, Guangrong; Zhu, Qian.

Clin Transl Sci ; 2022 May 25.

Article in English | MEDLINE | ID: mdl-35611543

ABSTRACT

Clinical, biomedical, and translational science has reached an inflection point in the breadth and diversity of available data and the potential impact of such data to improve human health and well-being. However, the data are often siloed, disorganized, and not broadly accessible due to discipline-specific differences in terminology and representation. To address these challenges, the Biomedical Data Translator Consortium has developed and tested a pilot knowledge graph-based "Translator" system capable of integrating existing biomedical data sets and "translating" those data into insights intended to augment human reasoning and accelerate translational science. Having demonstrated feasibility of the Translator system, the Translator program has since moved into development, and the Translator Consortium has made significant progress in the research, design, and implementation of an operational system. Herein, we describe the current system's architecture, performance, and quality of results. We apply Translator to several real-world use cases developed in collaboration with subject-matter experts. Finally, we discuss the scientific and technical features of Translator and compare those features to other state-of-the-art, biomedical graph-based question-answering systems.

15.

Implementation of Zebrafish Ontologies for Toxicology Screening.

Thessen, Anne E; Marvel, Skylar; Achenbach, J C; Fischer, Stephan; Haendel, Melissa A; Hayward, Kimberly; Klüver, Nils; Könemann, Sarah; Legradi, Jessica; Lein, Pamela; Leong, Connor; Mylroie, J Erik; Padilla, Stephanie; Perone, Dante; Planchart, Antonio; Prieto, Rafael Miñana; Muriana, Arantza; Quevedo, Celia; Reif, David; Ryan, Kristen; Stinckens, Evelyn; Truong, Lisa; Vergauwen, Lucia; Vom Berg, Colette; Wilbanks, Mitch; Yaghoobi, Bianca; Hamm, Jon.

Front Toxicol ; 4: 817999, 2022.

Article in English | MEDLINE | ID: mdl-35387429

ABSTRACT

Toxicological evaluation of chemicals using early-life stage zebrafish (Danio rerio) involves the observation and recording of altered phenotypes. Substantial variability has been observed among researchers in phenotypes reported from similar studies, as well as a lack of consistent data annotation, indicating a need for both terminological and data harmonization. When examined from a data science perspective, many of these apparent differences can be parsed into the same or similar endpoints whose measurements differ only in time, methodology, or nomenclature. Ontological knowledge structures can be leveraged to integrate diverse data sets across terminologies, scales, and modalities. Building on this premise, the National Toxicology Program's Systematic Evaluation of the Application of Zebrafish in Toxicology undertook a collaborative exercise to evaluate how the application of standardized phenotype terminology improved data consistency. To accomplish this, zebrafish researchers were asked to assess images of zebrafish larvae for morphological malformations in two surveys. In the first survey, researchers were asked to annotate observed malformations using their own terminology. In the second survey, researchers were asked to annotate the images from a list of terms and definitions from the Zebrafish Phenotype Ontology. Analysis of the results suggested that the use of ontology terms increased consistency and decreased ambiguity, but a larger study is needed to confirm. We conclude that utilizing a common data standard will not only reduce the heterogeneity of reported terms but also increase agreement and repeatability between different laboratories. Thus, we advocate for the development of a zebrafish phenotype atlas to help laboratories create interoperable, computable data.

16.

Catalyzing Knowledge-Driven Discovery in Environmental Health Sciences through a Community-Driven Harmonized Language.

Holmgren, Stephanie D; Boyles, Rebecca R; Cronk, Ryan D; Duncan, Christopher G; Kwok, Richard K; Lunn, Ruth M; Osborn, Kimberly C; Thessen, Anne E; Schmitt, Charles P.

Int J Environ Res Public Health ; 18(17), 2021 08 26.

Article in English | MEDLINE | ID: mdl-34501574

ABSTRACT

Harmonized language is critical for helping researchers to find data, collecting scientific data to facilitate comparison, and performing pooled and meta-analyses. Using standard terms to link data to knowledge systems facilitates knowledge-driven analysis, allows for the use of biomedical knowledge bases for scientific interpretation and hypothesis generation, and increasingly supports artificial intelligence (AI) and machine learning. Due to the breadth of environmental health sciences (EHS) research and the continuous evolution in scientific methods, the gaps in standard terminologies, vocabularies, ontologies, and related tools hamper the capabilities to address large-scale, complex EHS research questions that require the integration of disparate data and knowledge sources. The results of prior workshops to advance a harmonized environmental health language demonstrate that future efforts should be sustained and grounded in scientific need. We describe a community initiative whose mission was to advance integrative environmental health sciences research via the development and adoption of a harmonized language. The products, outcomes, and recommendations developed and endorsed by this community are expected to enhance data collection and management efforts for NIEHS and the EHS community, making data more findable and interoperable. This initiative will provide a community of practice space to exchange information and expertise, be a coordination hub for identifying and prioritizing activities, and a collaboration platform for the development and adoption of semantic solutions. We encourage anyone interested in advancing this mission to engage in this community.

Subject(s)

Artificial Intelligence , Language , Environmental Health , Knowledge Bases , National Institute of Environmental Health Sciences (U.S.) , United States

17.

From Reductionism to Reintegration: Solving society's most pressing problems requires building bridges between data types across the life sciences.

Thessen, Anne E; Bogdan, Paul; Patterson, David J; Casey, Theresa M; Hinojo-Hinojo, César; de Lange, Orlando; Haendel, Melissa A.

PLoS Biol ; 19(3): e3001129, 2021 03.

Article in English | MEDLINE | ID: mdl-33770077

ABSTRACT

Decades of reductionist approaches in biology have achieved spectacular progress, but the proliferation of subdisciplines, each with its own technical and social practices regarding data, impedes the growth of the multidisciplinary and interdisciplinary approaches now needed to address pressing societal challenges. Data integration is key to a reintegrated biology able to address global issues such as climate change, biodiversity loss, and sustainable ecosystem management. We identify major challenges to data integration and present a vision for a "Data as a Service"-oriented architecture to promote reuse of data for discovery. The proposed architecture includes standards development, new tools and services, and strategies for career-development and sustainability.

Subject(s)

Data Management/methods , Information Dissemination/methods , Interdisciplinary Research/trends , Biodiversity , Biological Science Disciplines , Conservation of Natural Resources , Ecosystem , Interdisciplinary Communication , Interdisciplinary Research/methods

18.

The landscape of nutri-informatics: a review of current resources and challenges for integrative nutrition research.

Chan, Lauren; Vasilevsky, Nicole; Thessen, Anne; McMurry, Julie; Haendel, Melissa.

Database (Oxford) ; 2021, 2021 01 25.

Article in English | MEDLINE | ID: mdl-33494105

ABSTRACT

Informatics has become an essential component of research in the past few decades, capitalizing on the efficiency and power of computation to improve the knowledge gained from increasing quantities and types of data. While other fields of research such as genomics are well represented in informatics resources, nutrition remains underrepresented. Nutrition is one of the most integral components of human life, and it impacts individuals far beyond just nutrient provisions. For example, nutrition plays a role in cultural practices, interpersonal relationships and body image. Despite this, integrated computational investigations have been limited due to challenges within nutrition informatics (nutri-informatics) and nutrition data. The purpose of this review is to describe the landscape of nutri-informatics resources available for use in computational nutrition research and clinical utilization. In particular, we will focus on the application of biomedical ontologies and their potential to improve the standardization and interoperability of nutrition terminologies and relationships between nutrition and other biomedical disciplines such as disease and phenomics. Additionally, we will highlight challenges currently faced by the nutri-informatics community including experimental design, data aggregation and the roles scientific journals and primary nutrition researchers play in facilitating data reuse and successful computational research. Finally, we will conclude with a call to action to create and follow community standards regarding standardization of language, documentation specifications and requirements for data reuse. With the continued movement toward community standards of this kind, the entire nutrition research community can transition toward greater usage of Findability, Accessibility, Interoperability and Reusability principles and in turn more transparent science.

Subject(s)

Informatics , Medical Informatics , Genomics , Humans , Knowledge , Research

19.

Community Approaches for Integrating Environmental Exposures into Human Models of Disease.

Thessen, Anne E; Grondin, Cynthia J; Kulkarni, Resham D; Brander, Susanne; Truong, Lisa; Vasilevsky, Nicole A; Callahan, Tiffany J; Chan, Lauren E; Westra, Brian; Willis, Mary; Rothenberg, Sarah E; Jarabek, Annie M; Burgoon, Lyle; Korrick, Susan A; Haendel, Melissa A.

Environ Health Perspect ; 128(12): 125002, 2020 12.

Article in English | MEDLINE | ID: mdl-33369481

ABSTRACT

BACKGROUND: A critical challenge in genomic medicine is identifying the genetic and environmental risk factors for disease. Currently, the available data links a majority of known coding human genes to phenotypes, but the environmental component of human disease is extremely underrepresented in these linked data sets. Without environmental exposure information, our ability to realize precision health is limited, even with the promise of modern genomics. Achieving integration of gene, phenotype, and environment will require extensive translation of data into a standard, computable form and the extension of the existing gene/phenotype data model. The data standards and models needed to achieve this integration do not currently exist. OBJECTIVES: Our objective is to foster development of community-driven data-reporting standards and a computational model that will facilitate the inclusion of exposure data in computational analysis of human disease. To this end, we present a preliminary semantic data model and use cases and competency questions for further community-driven model development and refinement. DISCUSSION: There is a real desire by the exposure science, epidemiology, and toxicology communities to use informatics approaches to improve their research workflow, gain new insights, and increase data reuse. Critical to success is the development of a community-driven data model for describing environmental exposures and linking them to existing models of human disease. https://doi.org/10.1289/EHP7215.

Subject(s)

Environmental Exposure , Environmental Pollutants , Genome, Human , Genomics , Humans

20.

Transforming the study of organisms: Phenomic data models and knowledge bases.

Thessen, Anne E; Walls, Ramona L; Vogt, Lars; Singer, Jessica; Warren, Robert; Buttigieg, Pier Luigi; Balhoff, James P; Mungall, Christopher J; McGuinness, Deborah L; Stucky, Brian J; Yoder, Matthew J; Haendel, Melissa A.

PLoS Comput Biol ; 16(11): e1008376, 2020 11.

Article in English | MEDLINE | ID: mdl-33232313

ABSTRACT

The rapidly decreasing cost of gene sequencing has resulted in a deluge of genomic data from across the tree of life; however, outside a few model organism databases, genomic data are limited in their scientific impact because they are not accompanied by computable phenomic data. The majority of phenomic data are contained in countless small, heterogeneous phenotypic data sets that are very difficult or impossible to integrate at scale because of variable formats, lack of digitization, and linguistic problems. One powerful solution is to represent phenotypic data using data models with precise, computable semantics, but adoption of semantic standards for representing phenotypic data has been slow, especially in biodiversity and ecology. Some phenotypic and trait data are available in a semantic language from knowledge bases, but these are often not interoperable. In this review, we will compare and contrast existing ontology and data models, focusing on nonhuman phenotypes and traits. We discuss barriers to integration of phenotypic data and make recommendations for developing an operationally useful, semantically interoperable phenotypic data ecosystem.

Subject(s)

Databases, Genetic , Knowledge Bases , Phenomics , Animals , Classification , Computational Biology , Ecosystem , Gene-Environment Interaction , Humans , Models, Biological , Models, Genetic , Models, Statistical , Phenotype , Semantics
