ABSTRACT
In 2003, the Human Disease Ontology (DO, https://disease-ontology.org/) was established at Northwestern University. In the intervening 20 years, the DO has expanded to become a highly utilized disease knowledge resource. Serving as the nomenclature and classification standard for human diseases, the DO provides a stable, etiology-based structure integrating mechanistic drivers of human disease. Over the past two decades the DO has grown from a collection of clinical vocabularies into an expertly curated semantic resource of over 11,300 common and rare diseases, linking disease concepts through more than 37,000 vocabulary cross-mappings (v2023-08-08). Here, we introduce the recently launched DO Knowledgebase (DO-KB), which expands the DO's representation of the diseaseome and enhances the findability, accessibility, interoperability and reusability (FAIR) of disease data through a new SPARQL service and a new Faceted Search Interface. The DO-KB is an integrated data system, built upon the DO's semantic disease knowledge backbone, with resources that expose and connect the DO's semantic knowledge with disease-related data across Open Linked Data resources. This update describes efforts to assess the DO's global impact and improvements to data quality and content, with emphasis on changes in the last two years.
Subjects
Ecosystem, Knowledge Bases, Humans, Rare Diseases, Semantics, Time Factors
ABSTRACT
Genomics encompasses the entire tree of life, both extinct and extant, and the evolutionary processes that shape this diversity. To date, genomic research has focused on humans, a small number of agricultural species, and established laboratory models. Fewer than 18,000 of ~2,000,000 eukaryotic species (<1%) have a representative genome sequence in GenBank, and only a fraction of these have ancillary information on genome structure, genetic variation, gene expression, epigenetic modifications, and population diversity. This imbalance reflects a perception that human studies are paramount in disease research. Yet understanding how genomes work, and how genetic variation shapes phenotypes, requires a broad view that embraces the vast diversity of life. We have the technology to collect massive and exquisitely detailed datasets about the world, but expertise is siloed into distinct fields. A new approach, integrating comparative genomics with cell and evolutionary biology, ecology, archaeology, anthropology, and conservation biology, is essential for understanding and protecting ourselves and our world. Here, we describe the potential for scientific discovery when comparative genomics works in close collaboration with a broad range of fields, as well as the technical, scientific, and social constraints that must be addressed.
Subjects
Biodiversity, Biological Evolution, Genomics/methods, Animals, Molecular Evolution, Genetic Variation/genetics, Genome/genetics, Genomics/trends, Humans, Phylogeny
ABSTRACT
The Human Disease Ontology (DO) (www.disease-ontology.org) database has significantly expanded its disease content and enhanced its userbase and website since the DO's 2018 Nucleic Acids Research Database issue paper. Conservatively, based on available resource statistics, terms from the DO have been annotated to over 1.5 million biomedical data elements and citations, a 10× increase in the past 5 years. The DO, funded as an NHGRI Genomic Resource, plays a key role in disease knowledge organization, representation and standardization, serving as a reference framework for multiscale biomedical data integration and analysis across thousands of clinical, biomedical and computational research projects and genomic resources around the world. This update reports on the addition of 1,793 new disease terms, a 14% increase in textual definitions and the integration of 22,137 new SubClassOf axioms defining disease-to-disease connections that represent the DO's complex disease classification. The DO's updated website provides multifaceted etiology searching, enhanced documentation and educational resources.
Subjects
Biological Ontologies, Factual Databases, Genetic Databases, Inborn Genetic Diseases/classification, Inborn Genetic Diseases/genetics, Genomics/classification, Humans
ABSTRACT
In response to the COVID-19 outbreak, scientists and medical researchers are capturing a wide range of host responses, symptoms and lingering post-recovery problems within the human population. These variable clinical manifestations suggest differences in influential factors, such as innate and adaptive host immunity, existing or underlying health conditions, comorbidities, genetics and other factors, compounding the complexity of COVID-19 pathobiology and of potential biomarkers associated with the disease as they become available. The heterogeneous data pose challenges for efficient extrapolation of information into clinical applications. We have curated 145 COVID-19 biomarkers by developing a novel cross-cutting disease biomarker data model that allows integration and evaluation of biomarkers in patients with comorbidities. Most biomarkers are related to the immune (SAA, TNF-α and IP-10) or coagulation (D-dimer, antithrombin and VWF) cascades, suggesting complex vascular pathobiology of the disease. Furthermore, we observe commonality with established cancer biomarkers (ACE2, IL-6, IL-4 and IL-2) as well as biomarkers for metabolic syndrome and diabetes (CRP, NLR and LDL). We explore these trends as we put forth a COVID-19 biomarker resource (https://data.oncomx.org/covid19) that will help researchers and diagnosticians alike.
ABSTRACT
BACKGROUND: Complex diseases often present as a diagnostic riddle, further complicated by the combination of multiple phenotypes and diseases as features of other diseases. With the aim of enhancing the determination of key etiological factors, we developed and tested a complex disease model that encompasses diverse factors that in combination result in complex diseases. This model was developed to address the challenges of classifying complex diseases given the evolving understanding of disease and of the interactions and contributions of genetic, environmental, and social factors. METHODS: Here we present a new approach for modeling complex diseases that integrates the multiple contributing genetic, epigenetic, environmental, host and social pathogenic effects causing disease. The model was developed to provide a guide for capturing diverse mechanisms of complex diseases. The model was tested by assessing the disease drivers for asthma, diabetes and fetal alcohol syndrome. RESULTS: We provide a detailed rationale for a model representing the classification of complex disease using the three test conditions of asthma, diabetes and fetal alcohol syndrome. Model assessment resulted in the reassessment of the three complex disease classifications and identified driving factors, thus improving the model. The model is robust and flexible enough to capture new information as the understanding of complex disease improves. CONCLUSIONS: The Human Disease Ontology's Complex Disease model offers a mechanism for defining more accurate disease classification as a tool for more precise clinical diagnosis. This broader representation of complex disease, therefore, has implications for clinicians and researchers who are tasked with creating evidence-based and consensus-based recommendations and for public health tracking of complex disease.
The new model facilitates the comparison of etiological factors between complex, common and rare diseases and is available at the Human Disease Ontology website.
Subjects
Asthma, Diabetes Mellitus, Fetal Alcohol Spectrum Disorders, Pregnancy, Female, Humans, Causality
ABSTRACT
In urban ecosystems, microbes play a key role in maintaining major ecological functions that directly support human health and city life. However, knowledge about the species composition and functions involved in urban environments is still limited, largely because the lack of reference genomes leaves more than half of the reads in metagenomic studies unclassified. Here we uncovered 732 novel bacterial species from 4,728 samples collected by the MetaSUB Consortium from various common surfaces, with their matching materials, in the mass transit systems of 60 cities. The number of novel species is significantly and positively correlated with city population, and more novel species can be identified in skin-associated samples. In-depth analysis of the new gene catalog showed that functional terms are significantly distinguishable by geography. Moreover, we found that more biosynthetic gene clusters (BGCs) can be found in novel species. The co-occurrence relationships between BGCs and genera, and the geographical specificity of BGCs, also provide more information on the synthesis pathways of natural products. Our study expanded the known diversity of the urban microbiome and suggested additional mechanisms for its taxonomic and functional characterization. Considering the great impact of urban microbiomes on human life, our study can also facilitate the analysis of microbial interactions between humans and the urban environment.
Subjects
Metagenome, Microbiota, Bacteria/genetics, Humans, Metagenomics, Microbial Interactions, Microbiota/genetics
ABSTRACT
The Human Disease Ontology (DO) (http://www.disease-ontology.org) database has undergone significant expansion in the past three years. The DO disease classification includes specific formal semantic rules to express meaningful disease models and has expanded from a single asserted classification to include multiple inferred mechanistic disease classifications, thus providing novel perspectives on related diseases. Expansion of disease terms, alternative anatomy, cell type and genetic disease classifications, and workflow automation highlight the updates to the DO since 2015. The enhanced breadth and depth of the DO's knowledgebase has expanded the DO's utility for exploring the multi-etiology of human disease, thus improving the capture and communication of health-related data across biomedical databases, bioinformatics tools, and genomic and cancer resources, as demonstrated by a 6.6× growth in the DO's user community since 2015. The DO's continual integration of human disease knowledge, evidenced by more than 200 SVN/GitHub releases/revisions since our DO 2015 NAR paper, includes the addition of 2,650 new disease terms, a 30% increase in textual definitions, and an expanding suite of disease classification hierarchies constructed through defined logical axioms.
Subjects
Biological Ontologies, Factual Databases, Disease, Disease/classification, Disease/etiology, Humans, Workflow
ABSTRACT
The Evidence and Conclusion Ontology (ECO) contains terms (classes) that describe types of evidence and assertion methods. ECO terms are used in the process of biocuration to capture the evidence that supports biological assertions (e.g. gene product X has function Y as supported by evidence Z). Capture of this information allows tracking of annotation provenance, establishment of quality control measures and query of evidence. ECO contains over 1500 terms and is in use by many leading biological resources including the Gene Ontology, UniProt and several model organism databases. ECO is continually being expanded and revised based on the needs of the biocuration community. The ontology is freely available for download from GitHub (https://github.com/evidenceontology/) or the project's website (http://evidenceontology.org/). Users can request new terms or changes to existing terms through the project's GitHub site. ECO is released into the public domain under CC0 1.0 Universal.
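The abstract's pattern "gene product X has function Y as supported by evidence Z" can be pictured as a simple record in which the evidence is an ECO term ID. The sketch below is a hypothetical illustration, not ECO's own tooling: the `Annotation` class, the `with_evidence` helper and all identifiers (including the `ECO:example-…` strings, which are placeholders rather than real ECO terms) are assumptions. It shows how storing the evidence type alongside each assertion enables provenance tracking and querying by evidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A curated assertion: subject has property, supported by evidence."""
    subject: str    # e.g. a gene product identifier
    assertion: str  # e.g. a function the subject is asserted to have
    evidence: str   # ECO term ID naming the type of supporting evidence
    source: str     # who or what made the assertion (provenance)


def with_evidence(annotations, evidence_ids):
    """Keep only annotations supported by one of the given evidence types."""
    wanted = set(evidence_ids)
    return [a for a in annotations if a.evidence in wanted]


# Hypothetical placeholder IDs; they are not real gene, function or ECO terms.
annotations = [
    Annotation("geneX", "functionY", "ECO:example-1", "curator_A"),
    Annotation("geneX", "functionZ", "ECO:example-2", "pipeline_B"),
]
selected = with_evidence(annotations, ["ECO:example-1"])
print([a.assertion for a in selected])  # ['functionY']
```

In practice, a query like this would typically also use the ECO hierarchy, so that selecting a general evidence class also matches its more specific subclasses. This sketch matches exact IDs only.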
Subjects
Computational Biology/methods, Genetic Databases, Gene Ontology, Proteins/genetics, Animals, Humans, Information Storage and Retrieval/methods, Internet, Proteins/metabolism, Protein Sequence Analysis, User-Computer Interface
ABSTRACT
The current version of the Human Disease Ontology (DO) (http://www.disease-ontology.org) database expands the utility of the ontology for the examination and comparison of genetic variation, phenotype, protein, drug and epitope data through the lens of human disease. DO is a biomedical resource of standardized common and rare disease concepts with stable identifiers organized by disease etiology. The content of DO has had 192 revisions since 2012, including the addition of 760 terms. Thirty-two percent of all terms now include definitions. DO has expanded the number and diversity of research communities and community members by 50+ during the past two years. These community members actively submit term requests, coordinate biomedical resource disease representation and provide expert curation guidance. Since the DO 2012 NAR paper, there have been hundreds of term requests and a steady increase in the number of DO listserv members, Twitter followers and DO website usage. DO is moving to a multi-editor model utilizing Protégé to curate DO in the Web Ontology Language (OWL). This will enable closer collaboration with the Human Phenotype Ontology, EBI's Ontology Working Group, Mouse Genome Informatics and the Monarch Initiative, among others, and enhance DO's current asserted view and multiple inferred views through reasoning.
Subjects
Biological Ontologies, Factual Databases, Disease, Inborn Genetic Diseases, Humans, Internet, Rare Diseases/genetics
ABSTRACT
The Disease Ontology (DO) enables cross-domain data integration through a common standard of human disease terms and their etiological descriptions. Standardized disease descriptors that are integrated across mammalian genomic resources provide a human-readable, machine-interpretable, community-driven disease corpus that unifies the representation of human common and rare diseases. The DO is populated by consensus-driven disease data descriptors that incorporate disease terms utilized by genomic and genetic projects and resources engaged in studies to understand the genetics of human disease through the study of model organisms. The DO project serves multiple roles for the model organism community by providing: (1) a structured "backbone" of disease concepts represented among the model organism databases; (2) authoritative disease curation services to researchers and resource providers; and (3) development of subsets of the DO representative of human diseases annotated to animal models curated within the model organism databases.
Subjects
Genetic Databases, Animal Disease Models, Inborn Genetic Diseases/classification, Animals, Inborn Genetic Diseases/genetics, Genome, Humans, Phenotype
ABSTRACT
A vast and rich body of information has grown up as a result of the world's enthusiasm for 'omics technologies. Finding ways to describe and make available this information that maximise its usefulness has become a major effort across the 'omics world. At the heart of this effort is the Genomic Standards Consortium (GSC), an open-membership organization that drives community-based standardization activities. Here we provide a short history of the GSC, provide an overview of its range of current activities, and make a call for the scientific community to join forces to improve the quality and quantity of contextual information about our public collections of genomes, metagenomes, and marker gene sequences.
Subjects
Genetic Databases, Genomics/standards, International Cooperation, Metagenome
ABSTRACT
In biomedical research, validating a scientific discovery hinges on the reproducibility of its experimental results. However, in genomics, the definition and implementation of reproducibility remain imprecise. We argue that genomic reproducibility, defined as the ability of bioinformatics tools to maintain consistent results across technical replicates, is essential for advancing scientific knowledge and medical applications. Initially, we examine different interpretations of reproducibility in genomics to clarify terms. Subsequently, we discuss the impact of bioinformatics tools on genomic reproducibility and explore methods for evaluating these tools regarding their effectiveness in ensuring genomic reproducibility. Finally, we recommend best practices to improve genomic reproducibility.
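Genomic reproducibility is defined here as a bioinformatics tool's ability to give consistent results across technical replicates. A simple way to quantify this for variant calling is the Jaccard overlap between the call sets from two replicates: the number of shared calls divided by the number of distinct calls overall. This metric is an illustrative assumption, not necessarily one the authors use. The sketch below represents variants as hypothetical `(chromosome, position, ref, alt)` tuples with made-up values.

```python
def jaccard_concordance(calls_a, calls_b):
    """Fraction of distinct variant calls shared by two replicates (Jaccard index)."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        return 1.0  # two empty call sets agree trivially
    return len(a & b) / len(union)


# Made-up calls from two technical replicates run through the same tool.
rep1 = [("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 500, "G", "A")]
rep2 = [("chr1", 1000, "A", "G"), ("chr2", 500, "G", "A"), ("chr3", 42, "T", "C")]
print(jaccard_concordance(rep1, rep2))  # 2 shared / 4 distinct = 0.5
```

A value of 1.0 means the tool produced identical calls on both replicates. Lower values point to variability that the tool, rather than the biology, may be introducing.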
Subjects
Computational Biology, Genomics, Genomics/methods, Computational Biology/methods, Reproducibility of Results, Humans
ABSTRACT
DNA/RNA-stable isotope probing (SIP) is a powerful tool to link in situ microbial activity to sequencing data. Every SIP dataset captures distinct information about microbial community metabolism, process rates, and population dynamics, offering valuable insights for a wide range of research questions. Data reuse maximizes the information derived from labor- and resource-intensive SIP approaches. Yet, a review of publicly available SIP sequencing metadata showed that critical information necessary for reproducibility and reuse was often missing. Here, we outline the Minimum Information for any Stable Isotope Probing Sequence (MISIP) according to the Minimum Information for any (x) Sequence (MIxS) framework and include examples of MISIP reporting for common SIP experiments. Our objectives are to expand the capacity of MIxS to accommodate SIP-specific metadata and to guide SIP users in metadata collection when planning and reporting an experiment. The MISIP standard requires five metadata fields (isotope, isotopolog, isotopolog label, labeling approach, and gradient position) and recommends several fields that represent best practices in acquiring and reporting SIP sequencing data (e.g., gradient density and nucleic acid amount). The standard is intended to be used in concert with other MIxS checklists to comprehensively describe the origin of sequence data, such as for marker genes (MISIP-MIMARKS) or metagenomes (MISIP-MIMS), in combination with metadata required by an environmental extension (e.g., soil). The adoption of the proposed data standard will improve the reuse of any sequence derived from a SIP experiment and, by extension, deepen understanding of in situ biogeochemical processes and microbial ecology.
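Because MISIP names five required fields, a submission can be screened for completeness before upload. The sketch below is a minimal check under stated assumptions. The snake_case field names are hypothetical renderings of the five required fields and may not match the official MIxS slot names. The sample values are made up.

```python
# Hypothetical snake_case names for the five MISIP required fields;
# check the published MIxS/MISIP terms for the exact slot names.
MISIP_REQUIRED = (
    "isotope",
    "isotopolog",
    "isotopolog_label",
    "labeling_approach",
    "gradient_position",
)


def missing_misip_fields(record):
    """Return the required MISIP fields that are absent or blank in a record."""
    missing = []
    for name in MISIP_REQUIRED:
        value = record.get(name)
        if value is None or str(value).strip() == "":
            missing.append(name)
    return missing


# Illustrative sample metadata; "labeling_approach" was left out on purpose.
sample = {
    "isotope": "13C",
    "isotopolog": "example substrate",
    "isotopolog_label": "example label",
    "gradient_position": 7,
}
print(missing_misip_fields(sample))  # ['labeling_approach']
```

Recommended fields such as gradient density would go in a separate, non-blocking check, because the standard does not require them.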
Subjects
Isotope Labeling, Isotope Labeling/methods, Reproducibility of Results, Microbiota/genetics, Metadata, Metagenomics/methods, DNA Sequence Analysis/methods, Metagenome
ABSTRACT
Comparative analysis of (meta)genomes necessitates aggregation, integration, and synthesis of well-annotated data using standards. The Genomic Standards Consortium (GSC) collaborates with the research community to develop and maintain the Minimum Information about any (x) Sequence (MIxS) reporting standard for genomic data. To facilitate the use of the GSC's MIxS reporting standard, we provide a description of the structure and terminology, how to navigate ontologies for required terms in MIxS, and demonstrate practical usage through a soil metagenome example.
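One practical step in navigating ontologies for required MIxS terms is to check that environment fields carry both a human-readable label and an ontology identifier. The sketch below assumes values written as `label [ENVO:########]`, with an eight-digit Environment Ontology (ENVO) ID. Treat that format as an assumption to confirm against the MIxS documentation. The ID used in the example is a placeholder, not a real ENVO term.

```python
import re

# Assumed value format for MIxS environment terms: "term label [ENVO:########]".
ENVO_TERM = re.compile(r"^(?P<label>\S.*?)\s*\[(?P<curie>ENVO:\d{8})\]$")


def parse_env_term(value):
    """Split an environment-term value into (label, ENVO CURIE), or return None."""
    match = ENVO_TERM.match(value.strip())
    if match is None:
        return None
    return match.group("label"), match.group("curie")


# Placeholder ID; look up the real term in ENVO before use.
print(parse_env_term("soil [ENVO:12345678]"))  # ('soil', 'ENVO:12345678')
print(parse_env_term("soil"))                  # None: no ontology ID given
```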
Subjects
Genomics, Metagenome, Metagenomics, Metagenomics/methods, Metagenomics/standards, Genomics/methods, Genomics/standards, Metagenome/genetics, Genetic Databases, Soil Microbiology
ABSTRACT
For a genomic resource provider, getting a handle on how the resource is used can be extremely challenging. At the same time, documenting the plethora of use cases is vital to demonstrate sustainability. Herein, we describe a flexible workflow, built on readily available software, that the Human Disease Ontology (DO) project has used to transition to semi-automated methods for identifying uses of the ontology in the published literature. The novel R package DO.utils (https://github.com/DiseaseOntology/DO.utils) has been devised with a small set of key functions to support our usage workflow in combination with Google Sheets. Use of this workflow has resulted in a 3-fold increase in the number of identified publications that use the DO and has provided novel usage insights that offer new research directions and reveal a clearer picture of the DO's use and scientific impact. The DO's resource use assessment workflow and the supporting software are designed to be useful to other resources, including databases, software tools, method providers and other web resources, to achieve similar results. Database URL: https://github.com/DiseaseOntology/DO.utils.
Subjects
Genomics, Software, Humans, Factual Databases, Workflow
ABSTRACT
Molecular biology methods and technologies have advanced substantially over the past decade. These new molecular methods should be incorporated among the standard tools of planetary protection (PP) and could be validated for incorporation by 2026. To address the feasibility of applying modern molecular techniques to such an application, NASA conducted a technology workshop with private industry partners, academics, and government agency stakeholders, along with NASA staff and contractors. The technical discussions and presentations of the Multi-Mission Metagenomics Technology Development Workshop focused on modernizing and supplementing the current PP assays. The goals of the workshop were to assess the state of metagenomics and other advanced molecular techniques in the context of providing a validated framework to supplement the bacterial endospore-based NASA Standard Assay, and to identify knowledge and technology gaps. In particular, workshop participants were tasked with discussing metagenomics as a stand-alone technology to provide rapid and comprehensive analysis of total nucleic acids and viable microorganisms on spacecraft surfaces, thereby allowing for the development of tailored and cost-effective microbial reduction plans for each hardware item on a spacecraft. Workshop participants recommended metagenomics approaches as the only data source that can adequately feed into quantitative microbial risk assessment models for evaluating the risk of forward contamination (of the extraterrestrial planet being explored) and back contamination (of Earth by harmful biological material). Participants were unanimous that a metagenomics workflow, in tandem with rapid targeted quantitative (digital) PCR, represents a revolutionary advance over existing methods for the assessment of microbial bioburden on spacecraft surfaces. The workshop highlighted low-biomass sampling, reagent contamination, and inconsistent bioinformatics data analysis as key areas for technology development.
Finally, it was concluded that implementing metagenomics as an additional workflow for addressing the concerns of NASA's robotic missions will represent a dramatic technological advance for PP and will benefit future missions whose success is affected by backward and forward contamination.
Subjects
Planets, Space Flight, United States, Humans, Extraterrestrial Environment, Metagenomics, United States National Aeronautics and Space Administration, Spacecraft, Policies
ABSTRACT
The Gemina system (http://gemina.igs.umaryland.edu) identifies, standardizes and integrates the outbreak metadata for the breadth of NIAID category A-C viral and bacterial pathogens, thereby providing an investigative and surveillance tool describing the Who [Host], What [Disease, Symptom], When [Date], Where [Location] and How [Pathogen, Environmental Source, Reservoir, Transmission Method] for each pathogen. The Gemina database will provide a greater understanding of the interactions of viral and bacterial pathogens with their hosts and infectious diseases through in-depth literature text-mining, integrated outbreak metadata, outbreak surveillance tools, extensive ontology development, metadata curation, and representative genomic sequence identification and standards development. The Gemina web interface provides metadata selection and retrieval of a pathogen's Infection Systems (Pathogen, Host, Disease, Transmission Method and Anatomy) and Incidents (Location and Date), along with a host's Age and Gender. The Gemina system provides an integrated investigative and geospatial surveillance system connecting pathogens, pathogen products and disease, anchored on the taxonomic IDs of the pathogen and host, to identify the breadth of hosts and diseases known for these pathogens, to identify the extent of outbreak locations, and to identify unique genomic regions with the DNA Signature Insignia Detection Tool.
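The Who/What/When/Where/How organization, anchored on pathogen and host taxonomic IDs, can be pictured as one record per incident. The sketch below is hypothetical and is not Gemina's actual schema. It includes only a subset of the listed facets (for example, it omits Environmental Source and Reservoir), and every taxon ID and value in it is a placeholder. The query mirrors one stated use of the system: finding the breadth of hosts and diseases known for a pathogen.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class OutbreakRecord:
    """One outbreak incident, grouped by Who/What/When/Where/How facets."""
    host_taxon_id: int        # Who: the host, anchored on its taxonomic ID
    disease: str              # What
    incident_date: date       # When
    location: str             # Where
    pathogen_taxon_id: int    # How: the pathogen, anchored on its taxonomic ID
    transmission_method: str  # How
    symptoms: list[str] = field(default_factory=list)  # What


def hosts_and_diseases(records, pathogen_taxon_id):
    """Collect the breadth of hosts and diseases recorded for one pathogen."""
    hosts, diseases = set(), set()
    for record in records:
        if record.pathogen_taxon_id == pathogen_taxon_id:
            hosts.add(record.host_taxon_id)
            diseases.add(record.disease)
    return hosts, diseases


# Placeholder taxon IDs and values, for illustration only.
records = [
    OutbreakRecord(2001, "disease A", date(2009, 5, 1), "location 1", 1001, "airborne"),
    OutbreakRecord(2002, "disease A", date(2010, 7, 3), "location 2", 1001, "foodborne"),
    OutbreakRecord(2001, "disease B", date(2011, 1, 9), "location 1", 1002, "vector"),
]
print(hosts_and_diseases(records, 1001))  # ({2001, 2002}, {'disease A'})
```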
Subjects
Communicable Diseases/microbiology, Computational Biology/methods, Genetic Databases, Nucleic Acid Databases, Bacterial Genes, Viral Genes, Animals, Bacterial Infections/microbiology, Communicable Diseases/virology, Computational Biology/trends, Factual Databases, Humans, Information Storage and Retrieval/methods, Internet, Software, User-Computer Interface, Virus Diseases/virology
ABSTRACT
Standardization of omics data drives FAIR data practices through community-built genomic data standards and biomedical ontologies. Use of standards has progressed from a foreign concept to a sought-after solution, moving from efforts to coordinate data within individual research projects to research communities coming together to identify solutions to common challenges. Today we are seeing the benefits of this multidecade groundswell to coordinate, exchange, and reuse data; to compare data across studies; and to integrate data across previously siloed resources.
Subjects
Biomedical Research, Genomics, Metagenomics, Reference Standards
ABSTRACT
The symbiont-associated (SA) environmental package is a new extension to the minimum information about any (x) sequence (MIxS) standards, established by the Parasite Microbiome Project (PMP) consortium in collaboration with the Genomic Standards Consortium. The SA package was built upon the host-associated MIxS standard but reflects the nestedness of symbiont-associated microbiota within and across host-symbiont-microbe interactions. This package is designed to facilitate the collection and reporting of a broad range of metadata that apply to symbionts, such as life history traits, association with one or multiple host organisms, or the nature of host-symbiont interactions along the mutualism-parasitism continuum. To better reflect the inherent nestedness of all biological systems, we present a novel feature that allows users to co-localize samples, to nest a package within another package, and to identify replicates. Adoption of the MIxS-SA package and of the new terms will facilitate reports of complex sampling designs from a myriad of environments.