1.
Methods Mol Biol ; 2802: 587-609, 2024.
Article in English | MEDLINE | ID: mdl-38819573

ABSTRACT

Comparative analysis of (meta)genomes necessitates aggregation, integration, and synthesis of well-annotated data using standards. The Genomic Standards Consortium (GSC) collaborates with the research community to develop and maintain the Minimum Information about any (x) Sequence (MIxS) reporting standard for genomic data. To facilitate the use of the GSC's MIxS reporting standard, we provide a description of the structure and terminology, how to navigate ontologies for required terms in MIxS, and demonstrate practical usage through a soil metagenome example.


Subject(s)
Genomics , Metagenome , Metagenomics , Metagenomics/methods , Metagenomics/standards , Genomics/methods , Genomics/standards , Metagenome/genetics , Databases, Genetic , Soil Microbiology
3.
PLoS Comput Biol ; 20(2): e1011270, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38324613

ABSTRACT

CyVerse, the largest publicly-funded open-source research cyberinfrastructure for life sciences, has played a crucial role in advancing data-driven research since the 2010s. As the technology landscape evolved with the emergence of cloud computing platforms, machine learning and artificial intelligence (AI) applications, CyVerse has enabled access by providing interfaces, Software as a Service (SaaS), and cloud-native Infrastructure as Code (IaC) to leverage new technologies. CyVerse services enable researchers to integrate institutional and private computational resources and custom software, perform analyses, and publish data in accordance with open science principles. Over the past 13 years, CyVerse has registered more than 124,000 verified accounts from 160 countries and was used for over 1,600 peer-reviewed publications. Since 2011, 45,000 students and researchers have been trained to use CyVerse. The platform has been replicated and deployed in three countries outside the US, with additional private deployments on commercial clouds for US government agencies and multinational corporations. In this manuscript, we present a strategic blueprint for creating and managing SaaS cyberinfrastructure and IaC as free and open-source software.


Subject(s)
Artificial Intelligence , Software , Humans , Cloud Computing , Publishing
4.
Nucleic Acids Res ; 52(D1): D938-D949, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38000386

ABSTRACT

Bridging the gap between genetic variations, environmental determinants, and phenotypic outcomes is critical for supporting clinical diagnosis and understanding mechanisms of diseases. It requires integrating open data at a global scale. The Monarch Initiative advances these goals by developing open ontologies, semantic data models, and knowledge graphs for translational research. The Monarch App is an integrated platform combining data about genes, phenotypes, and diseases across species. Monarch's APIs enable access to carefully curated datasets and advanced analysis tools that support the understanding and diagnosis of disease for diverse applications such as variant prioritization, deep phenotyping, and patient profile-matching. We have migrated our system into a scalable, cloud-based infrastructure; simplified Monarch's data ingestion and knowledge graph integration systems; enhanced data mapping and integration standards; and developed a new user interface with novel search and graph navigation features. Furthermore, we advanced Monarch's analytic tools by developing a customized plugin for OpenAI's ChatGPT to increase the reliability of its responses about phenotypic data, allowing us to interrogate the knowledge in the Monarch graph using state-of-the-art Large Language Models. The resources of the Monarch Initiative can be found at monarchinitiative.org and its corresponding code repository at github.com/monarch-initiative/monarch-app.


Subject(s)
Databases, Factual , Disease , Genes , Phenotype , Humans , Internet , Databases, Factual/standards , Software , Genes/genetics , Disease/genetics
5.
Biodivers Data J ; 11: e112420, 2023.
Article in English | MEDLINE | ID: mdl-37829294

ABSTRACT

The standardization of data, encompassing both primary and contextual information (metadata), plays a pivotal role in facilitating data (re-)use, integration, and knowledge generation. However, the biodiversity and omics communities, converging on omics biodiversity data, have historically developed and adopted their own distinct standards, hindering effective (meta)data integration and collaboration. In response to this challenge, the Task Group (TG) for Sustainable DwC-MIxS Interoperability was established. Convening experts from the Biodiversity Information Standards (TDWG) and the Genomic Standards Consortium (GSC) alongside external stakeholders, the TG aimed to promote sustainable interoperability between the Minimum Information about any (x) Sequence (MIxS) and Darwin Core (DwC) specifications. To achieve this goal, the TG utilized the Simple Standard for Sharing Ontology Mappings (SSSOM) to create a comprehensive mapping of DwC keys to MIxS keys. This mapping, combined with the development of the MIxS-DwC extension, enables the incorporation of MIxS core terms into DwC-compliant metadata records, facilitating seamless data exchange between MIxS and DwC user communities. Through the implementation of this translation layer, data produced in either MIxS- or DwC-compliant formats can now be efficiently brokered, breaking down silos and fostering closer collaboration between the biodiversity and omics communities. To ensure its sustainability and lasting impact, TDWG and GSC have both signed a Memorandum of Understanding (MoU) on creating a continuous model to synchronize their standards. These achievements mark a significant step forward in enhancing data sharing and utilization across domains, thereby unlocking new opportunities for scientific discovery and advancement.

6.
Ecol Lett ; 26(11): 1877-1886, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37721806

ABSTRACT

Climate change has already caused local extinction in many plants and animals, based on surveys spanning many decades. As climate change accelerates, the pace of these extinctions may also accelerate, potentially leading to large-scale, species-level extinctions. We tested this hypothesis in a montane lizard. We resurveyed 18 mountain ranges in 2021-2022 after only ~7 years. We found rates of local extinction among the fastest ever recorded, which have tripled in the past ~7 years relative to the preceding ~42 years. Further, climate change generated local extinction in ~7 years similar to that seen in other organisms over ~70 years. Yet, contrary to expectations, populations at two of the hottest sites survived. We found that genomic data helped predict which populations survived and which went extinct. Overall, we show the increasing risk to biodiversity posed by accelerating climate change and the opportunity to study its effects over surprisingly brief timescales.


Subject(s)
Climate Change , Lizards , Animals , Biodiversity , Lizards/genetics , Hot Temperature , Extinction, Biological , Ecosystem
7.
J Pharmacokinet Pharmacodyn ; 50(6): 507-519, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37131052

ABSTRACT

Rare disease drug development is fraught with challenges, not the least of which is access to the limited data currently available throughout the rare disease ecosystem, where sharing of the available data is not guaranteed. Most pharmaceutical sponsors seeking to develop agents to treat rare diseases will initiate data landscaping efforts to identify various data sources that might be informative with respect to disease prevalence, patient selection and identification, disease progression and any data projecting likelihood of patient response to therapy, including any genetic data. Such data are often difficult to come by for highly prevalent, mainstream disease populations, let alone for the 8000 rare diseases that make up the pooled population of rare disease patients. The future of rare disease drug development will hopefully rely on increased data sharing and collaboration among the entire rare disease ecosystem. One path to achieving this outcome has been the development of the Rare Disease Cures Accelerator-Data and Analytics Platform (RDCA-DAP), funded by the US FDA and operationalized by the Critical Path Institute. FDA intentions were clearly focused on improving the quality of rare disease regulatory applications by sponsors seeking to develop treatment options for various rare disease populations. As this initiative moves into its second year of operations, it is envisioned that the increased connectivity to new and diverse data streams and tools will result in solutions that benefit the entire rare disease ecosystem and that the platform becomes a Collaboratory for engagement of this ecosystem that also includes patients and caregivers.


Subject(s)
Rare Diseases , Humans , Data Science , Disease Progression , Rare Diseases/drug therapy
8.
iScience ; 25(10): 105101, 2022 Oct 21.
Article in English | MEDLINE | ID: mdl-36212022

ABSTRACT

Understanding variation of traits within and among species through time and across space is central to many questions in biology. Many resources assemble species-level trait data, but the data and metadata underlying those trait measurements are often not reported. Here, we introduce FuTRES (Functional Trait Resource for Environmental Studies; pronounced few-tress), an online datastore and community resource for individual-level trait reporting that utilizes a semantic framework. FuTRES already stores millions of trait measurements for paleobiological, zooarchaeological, and modern specimens, with a current focus on mammals. We compare dynamically derived extant mammal species' body size measurements in FuTRES with summary values from other compilations, highlighting potential issues with simply reporting a single mean estimate. We then show that individual-level data improve estimates of body mass, including uncertainty, for zooarchaeological specimens. FuTRES facilitates trait data integration and discoverability, accelerating new research agendas, especially scaling from intra- to interspecific trait variability.

9.
Database (Oxford) ; 2022, 2022 Oct 08.
Article in English | MEDLINE | ID: mdl-36208225

ABSTRACT

Similar to managing software packages, managing the ontology life cycle involves multiple complex workflows such as preparing releases, continuous quality control checking and dependency management. To manage these processes, a diverse set of tools is required, from command-line utilities to powerful ontology-engineering environments. Particularly in the biomedical domain, which has developed a set of highly diverse yet inter-dependent ontologies, standardizing release practices and metadata and establishing shared quality standards are crucial to enable interoperability. The Ontology Development Kit (ODK) provides a set of standardized, customizable and automatically executable workflows, and packages all required tooling in a single Docker image. In this paper, we provide an overview of how the ODK works, show how it is used in practice and describe how we envision it driving standardization efforts in our community. Database URL: https://github.com/INCATools/ontology-development-kit.


Subject(s)
Biological Ontologies , Databases, Factual , Metadata , Quality Control , Software , Workflow
10.
Front Nutr ; 9: 928837, 2022.
Article in English | MEDLINE | ID: mdl-35811979

ABSTRACT

Informed policy and decision-making for food systems, nutritional security, and global health would benefit from standardization and comparison of food composition data, spanning production to consumption. To address this challenge, we present a formal controlled vocabulary of terms, definitions, and relationships within the Compositional Dietary Nutrition Ontology (CDNO, www.cdno.info) that enables description of nutritional attributes for material entities contributing to the human diet. We demonstrate how ongoing community development of CDNO classes can harmonize trans-disciplinary approaches for describing nutritional components from food production to diet.

11.
Ther Innov Regul Sci ; 56(5): 768-776, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35668316

ABSTRACT

Rare diseases impact the lives of an estimated 350 million people worldwide, and yet about 90% of rare diseases remain without an approved treatment. New technologies have become available, such as gene and oligonucleotide therapies, that offer great promise in treating rare diseases. However, progress toward the development of therapies to treat these diseases is hampered by a limited understanding of the course of each rare disease, how changes in disease progression occur and can be effectively measured over time, and challenges in designing and running clinical trials in diseases where the natural history is poorly characterized. Data that could be used to characterize the natural history of each disease has often been collected in various ways, including in electronic health records, patient-report registries, clinical natural history studies, and in past clinical trials. However, each data source contains a limited number of subjects and different data elements, and data is frequently kept proprietary in the hands of the study sponsor rather than shared widely across the rare disease community. The Rare Disease Cures Accelerator-Data and Analytics Platform (RDCA-DAP) is an FDA-funded effort to overcome these persistent challenges. By aggregating data across all rare diseases and making that data available to the community to support understanding of rare disease natural history and inform drug development, RDCA-DAP aims to accelerate the regulatory approval of new therapies. RDCA-DAP curates, standardizes, and tags data across rare disease datasets to make it findable within the database, and contains a built-in analytics platform to help visualize, interpret, and use it to support drug development. 
RDCA-DAP will coordinate data and tool resources across non-profit, commercial, and for-profit entities to serve a diverse array of rare disease stakeholders that includes academic researchers, drug developers, FDA reviewers and of course patients and their caregivers. Drug development programs utilizing the RDCA-DAP will be able to leverage existing data to support their efforts and reach definitive decisions on the efficacy of their therapeutics more efficiently and more rapidly than ever.


Subject(s)
Drug Development , Rare Diseases , Databases, Factual , Humans , Rare Diseases/drug therapy , Registries
12.
ISME Commun ; 2(1): 9, 2022 Feb 01.
Article in English | MEDLINE | ID: mdl-37938691

ABSTRACT

The symbiont-associated (SA) environmental package is a new extension to the Minimum Information about any (x) Sequence (MIxS) standards, established by the Parasite Microbiome Project (PMP) consortium in collaboration with the Genomic Standards Consortium. The SA package was built upon the host-associated MIxS standard, but reflects the nestedness of symbiont-associated microbiota within and across host-symbiont-microbe interactions. This package is designed to facilitate the collection and reporting of a broad range of metadata that apply to symbionts, such as life history traits, association with one or multiple host organisms, or the nature of host-symbiont interactions along the mutualism-parasitism continuum. To better reflect the inherent nestedness of all biological systems, we present a novel feature that allows users to co-localize samples, to nest a package within another package, and to identify replicates. Adoption of the MIxS-SA and of the new terms will facilitate reports of complex sampling designs from a myriad of environments.

13.
Gigascience ; 12, 2022 Dec 28.
Article in English | MEDLINE | ID: mdl-37632753

ABSTRACT

Omic BON is a thematic Biodiversity Observation Network under the Group on Earth Observations Biodiversity Observation Network (GEO BON), focused on coordinating the observation of biomolecules in organisms and the environment. Our founding partners include representatives from national, regional, and global observing systems; standards organizations; and data and sample management infrastructures. By coordinating observing strategies, methods, and data flows, Omic BON will facilitate the co-creation of a global omics meta-observatory to generate actionable knowledge. Here, we present key elements of Omic BON's founding charter and first activities.


Subject(s)
Biodiversity , Knowledge
14.
Database (Oxford) ; 2021, 2021 Oct 26.
Article in English | MEDLINE | ID: mdl-34697637

ABSTRACT

Biological ontologies are used to organize, curate and interpret the vast quantities of data arising from biological experiments. While this works well when using a single ontology, integrating multiple ontologies can be problematic, as they are developed independently, which can lead to incompatibilities. The Open Biological and Biomedical Ontologies (OBO) Foundry was created to address this by facilitating the development, harmonization, application and sharing of ontologies, guided by a set of overarching principles. One challenge in reaching these goals was that the OBO principles were not originally encoded in a precise fashion, and interpretation was subjective. Here, we show how we have addressed this by formally encoding the OBO principles as operational rules and implementing a suite of automated validation checks and a dashboard for objectively evaluating each ontology's compliance with each principle. This entailed a substantial effort to curate metadata across all ontologies and to coordinate with individual stakeholders. We have applied these checks across the full OBO suite of ontologies, revealing areas where individual ontologies require changes to conform to our principles. Our work demonstrates how a sizable, federated community can be organized and evaluated on objective criteria that help improve overall quality and interoperability, which is vital for the sustenance of the OBO project and towards the overall goals of making data Findable, Accessible, Interoperable, and Reusable (FAIR). Database URL http://obofoundry.org/.


Subject(s)
Biological Ontologies , Databases, Factual , Metadata
15.
Gigascience ; 10(5), 2021 May 07.
Article in English | MEDLINE | ID: mdl-33960385

ABSTRACT

Sampling the natural world and built environment underpins much of science, yet systems for managing material samples and associated (meta)data are fragmented across institutional catalogs, practices for identification, and discipline-specific (meta)data standards. The Internet of Samples (iSamples) is a standards-based collaboration to uniquely, consistently, and conveniently identify material samples, record core metadata about them, and link them to other samples, data, and research products. iSamples extends existing resources and best practices in data stewardship to render a cross-domain cyberinfrastructure that enables transdisciplinary research, discovery, and reuse of material samples in 21st century natural science.


Subject(s)
Internet , Metadata
17.
Article in English | MEDLINE | ID: mdl-35664667

ABSTRACT

Environmental contamination is a fundamental determinant of health and well-being, and when the environment is compromised, vulnerabilities are generated. The complex challenges associated with environmental health and food security are influenced by current and emerging political, social, economic, and environmental contexts. To solve these "wicked" dilemmas, disparate public health surveillance efforts are conducted by local, state, and federal agencies. More recently, citizen/community science (CS) monitoring efforts are providing site-specific data. One of the biggest challenges in using these government datasets, let alone incorporating CS data, for a holistic assessment of environmental exposure is data management and interoperability. To facilitate a more holistic perspective and approach to solution generation, we have developed a method to provide a common data model that will allow environmental health researchers working at different scales and research domains to exchange data and ask new questions. We anticipate that this method will help to address environmental health disparities, which are unjust and avoidable, while ensuring CS datasets are ethically integrated to achieve environmental justice. Specifically, we used a transdisciplinary research framework to develop a methodology to integrate CS data with existing governmental environmental monitoring and social attribute data (vulnerability and resilience variables) that span across 10 different federal and state agencies. A key challenge in integrating such different datasets is the lack of widely adopted ontologies for vulnerability and resiliency factors. In addition to following the best practice of submitting new term requests to existing ontologies to fill gaps, we have also created an application ontology, the Superfund Research Project Data Interface Ontology (SRPDIO).

18.
PLoS Comput Biol ; 16(11): e1008376, 2020 Nov.
Article in English | MEDLINE | ID: mdl-33232313

ABSTRACT

The rapidly decreasing cost of gene sequencing has resulted in a deluge of genomic data from across the tree of life; however, outside a few model organism databases, genomic data are limited in their scientific impact because they are not accompanied by computable phenomic data. The majority of phenomic data are contained in countless small, heterogeneous phenotypic data sets that are very difficult or impossible to integrate at scale because of variable formats, lack of digitization, and linguistic problems. One powerful solution is to represent phenotypic data using data models with precise, computable semantics, but adoption of semantic standards for representing phenotypic data has been slow, especially in biodiversity and ecology. Some phenotypic and trait data are available in a semantic language from knowledge bases, but these are often not interoperable. In this review, we will compare and contrast existing ontology and data models, focusing on nonhuman phenotypes and traits. We discuss barriers to integration of phenotypic data and make recommendations for developing an operationally useful, semantically interoperable phenotypic data ecosystem.


Subject(s)
Databases, Genetic , Knowledge Bases , Phenomics , Animals , Classification , Computational Biology , Ecosystem , Gene-Environment Interaction , Humans , Models, Biological , Models, Genetic , Models, Statistical , Phenotype , Semantics
20.
BMC Res Notes ; 13(1): 71, 2020 Feb 12.
Article in English | MEDLINE | ID: mdl-32051026

ABSTRACT

OBJECTIVES: Advanced tools and resources are needed to efficiently and sustainably produce food for an increasing world population in the context of variable environmental conditions. The maize Genomes to Fields (G2F) initiative is a multi-institutional effort that seeks to approach this challenge by developing a flexible and distributed infrastructure addressing emerging problems. G2F has generated large-scale phenotypic, genotypic, and environmental datasets using publicly available inbred lines and hybrids evaluated through a network of collaborators that are part of the G2F's genotype-by-environment (G × E) project. This report covers the public release of datasets for 2014-2017. DATA DESCRIPTION: Datasets include inbred genotypic information; phenotypic, climatic, and soil measurements; and metadata for each testing location across years. For a subset of inbreds in 2014 and 2015, yield component phenotypes were quantified by image analysis. Data released are accompanied by README descriptions. For genotypic and phenotypic data, both raw data and a version without outliers are reported. For climatic data, a version calibrated to the nearest airport weather station and a version without outliers are reported. The 2014 and 2015 datasets are updated versions of the previously released files [1], while the 2016 and 2017 datasets are newly available to the public.


Subject(s)
Genome, Plant/genetics , Plant Breeding , Zea mays/genetics , Datasets as Topic , Genotype , Phenotype