RESUMO
Bridging the gap between genetic variations, environmental determinants, and phenotypic outcomes is critical for supporting clinical diagnosis and understanding mechanisms of diseases. It requires integrating open data at a global scale. The Monarch Initiative advances these goals by developing open ontologies, semantic data models, and knowledge graphs for translational research. The Monarch App is an integrated platform combining data about genes, phenotypes, and diseases across species. Monarch's APIs enable access to carefully curated datasets and advanced analysis tools that support the understanding and diagnosis of disease for diverse applications such as variant prioritization, deep phenotyping, and patient profile-matching. We have migrated our system into a scalable, cloud-based infrastructure; simplified Monarch's data ingestion and knowledge graph integration systems; enhanced data mapping and integration standards; and developed a new user interface with novel search and graph navigation features. Furthermore, we advanced Monarch's analytic tools by developing a customized plugin for OpenAI's ChatGPT to increase the reliability of its responses about phenotypic data, allowing us to interrogate the knowledge in the Monarch graph using state-of-the-art Large Language Models. The resources of the Monarch Initiative can be found at monarchinitiative.org and its corresponding code repository at github.com/monarch-initiative/monarch-app.
Assuntos
Bases de Dados Factuais , Doença , Genes , Fenótipo , Humanos , Internet , Bases de Dados Factuais/normas , Software , Genes/genética , Doença/genéticaRESUMO
MOTIVATION: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking. RESULTS: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate KGs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification. AVAILABILITY AND IMPLEMENTATION: https://kghub.org.
Assuntos
Ontologias Biológicas , COVID-19 , Humanos , Reconhecimento Automatizado de Padrão , Doenças Raras , Aprendizado de MáquinaRESUMO
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focussed measurable trait data. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.
Assuntos
Ontologias Biológicas , Disciplinas das Ciências Biológicas , Estudo de Associação Genômica Ampla , FenótipoRESUMO
The Human Phenotype Ontology (HPO, https://hpo.jax.org) was launched in 2008 to provide a comprehensive logical standard to describe and computationally analyze phenotypic abnormalities found in human disease. The HPO is now a worldwide standard for phenotype exchange. The HPO has grown steadily since its inception due to considerable contributions from clinical experts and researchers from a diverse range of disciplines. Here, we present recent major extensions of the HPO for neurology, nephrology, immunology, pulmonology, newborn screening, and other areas. For example, the seizure subontology now reflects the International League Against Epilepsy (ILAE) guidelines and these enhancements have already shown clinical validity. We present new efforts to harmonize computational definitions of phenotypic abnormalities across the HPO and multiple phenotype ontologies used for animal models of disease. These efforts will benefit software such as Exomiser by improving the accuracy and scope of cross-species phenotype matching. The computational modeling strategy used by the HPO to define disease entities and phenotypic features and distinguish between them is explained in detail.We also report on recent efforts to translate the HPO into indigenous languages. Finally, we summarize recent advances in the use of HPO in electronic health record systems.
Assuntos
Ontologias Biológicas , Biologia Computacional/métodos , Bases de Dados Factuais , Doença/genética , Genoma , Fenótipo , Software , Animais , Modelos Animais de Doenças , Genótipo , Humanos , Recém-Nascido , Cooperação Internacional , Internet , Triagem Neonatal/métodos , Farmacogenética/métodos , Terminologia como AssuntoRESUMO
Technological advances in both genome sequencing and prenatal imaging are increasing our ability to accurately recognize and diagnose Mendelian conditions prenatally. Phenotype-driven early genetic diagnosis of fetal genetic disease can help to strategize treatment options and clinical preventive measures during the perinatal period, to plan in utero therapies, and to inform parental decision-making. Fetal phenotypes of genetic diseases are often unique and at present are not well understood; more comprehensive knowledge about prenatal phenotypes and computational resources have an enormous potential to improve diagnostics and translational research. The Human Phenotype Ontology (HPO) has been widely used to support diagnostics and translational research in human genetics. To better support prenatal usage, the HPO consortium conducted a series of workshops with a group of domain experts in a variety of medical specialties, diagnostic techniques, as well as diseases and phenotypes related to prenatal medicine, including perinatal pathology, musculoskeletal anomalies, neurology, medical genetics, hydrops fetalis, craniofacial malformations, cardiology, neonatal-perinatal medicine, fetal medicine, placental pathology, prenatal imaging, and bioinformatics. We expanded the representation of prenatal phenotypes in HPO by adding 95 new phenotype terms under the Abnormality of prenatal development or birth (HP:0001197) grouping term, and revised definitions, synonyms, and disease annotations for most of the 152 terms that existed before the beginning of this effort. The expansion of prenatal phenotypes in HPO will support phenotype-driven prenatal exome and genome sequencing for precision genetic diagnostics of rare diseases to support prenatal care.
Assuntos
Biologia Computacional , Placenta , Recém-Nascido , Humanos , Feminino , Gravidez , Biologia Computacional/métodos , Fenótipo , Doenças Raras , Sequenciamento do ExomaRESUMO
Hyalella azteca is a cryptic species complex of epibenthic amphipods of interest to ecotoxicology and evolutionary biology. It is the primary crustacean used in North America for sediment toxicity testing and an emerging model for molecular ecotoxicology. To provide molecular resources for sediment quality assessments and evolutionary studies, we sequenced, assembled, and annotated the genome of the H. azteca U.S. Lab Strain. The genome quality and completeness is comparable with other ecotoxicological model species. Through targeted investigation and use of gene expression data sets of H. azteca exposed to pesticides, metals, and other emerging contaminants, we annotated and characterized the major gene families involved in sequestration, detoxification, oxidative stress, and toxicant response. Our results revealed gene loss related to light sensing, but a large expansion in chemoreceptors, likely underlying sensory shifts necessary in their low light habitats. Gene family expansions were also noted for cytochrome P450 genes, cuticle proteins, ion transporters, and include recent gene duplications in the metal sequestration protein, metallothionein. Mapping of differentially expressed transcripts to the genome significantly increased the ability to functionally annotate toxicant responsive genes. The H. azteca genome will greatly facilitate development of genomic tools for environmental assessments and promote an understanding of how evolution shapes toxicological pathways with implications for environmental and human health.
Assuntos
Anfípodes , Poluentes Químicos da Água , Animais , Ecotoxicologia , Sedimentos Geológicos , América do Norte , Testes de ToxicidadeRESUMO
Genomes of eusocial insects code for dramatic examples of phenotypic plasticity and social organization. We compared the genomes of seven ants, the honeybee, and various solitary insects to examine whether eusocial lineages share distinct features of genomic organization. Each ant lineage contains â¼4000 novel genes, but only 64 of these genes are conserved among all seven ants. Many gene families have been expanded in ants, notably those involved in chemical communication (e.g., desaturases and odorant receptors). Alignment of the ant genomes revealed reduced purifying selection compared with Drosophila without significantly reduced synteny. Correspondingly, ant genomes exhibit dramatic divergence of noncoding regulatory elements; however, extant conserved regions are enriched for novel noncoding RNAs and transcription factor-binding sites. Comparison of orthologous gene promoters between eusocial and solitary species revealed significant regulatory evolution in both cis (e.g., Creb) and trans (e.g., fork head) for nearly 2000 genes, many of which exhibit phenotypic plasticity. Our results emphasize that genomic changes can occur remarkably fast in ants, because two recently diverged leaf-cutter ant species exhibit faster accumulation of species-specific genes and greater divergence in regulatory elements compared with other ants or Drosophila. Thus, while the "socio-genomes" of ants and the honeybee are broadly characterized by a pervasive pattern of divergence in gene composition and regulation, they preserve lineage-specific regulatory features linked to eusociality. We propose that changes in gene regulation played a key role in the origins of insect eusociality, whereas changes in gene composition were more relevant for lineage-specific eusocial adaptations.
Assuntos
Formigas/genética , Genoma de Inseto , Animais , Comportamento Animal , Sítios de Ligação , Sequência Conservada , Metilação de DNA , Evolução Molecular , Regulação da Expressão Gênica , Himenópteros/genética , Proteínas de Insetos/genética , MicroRNAs/genética , Modelos Genéticos , Filogenia , Sequências Reguladoras de Ácido Nucleico , Análise de Sequência de DNA , Comportamento Social , Especificidade da Espécie , Sintenia , Fatores de Transcrição/genéticaRESUMO
BACKGROUND: The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. RESULTS: Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. CONCLUSIONS: Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination.
Assuntos
Abelhas/genética , Genes de Insetos , Animais , Composição de Bases , Bases de Dados Genéticas , Sequências Repetitivas Dispersas/genética , Anotação de Sequência Molecular , Fases de Leitura Aberta/genética , Peptídeos/análise , Análise de Sequência de RNA , Homologia de Sequência de AminoácidosRESUMO
Leaf-cutter ants are one of the most important herbivorous insects in the Neotropics, harvesting vast quantities of fresh leaf material. The ants use leaves to cultivate a fungus that serves as the colony's primary food source. This obligate ant-fungus mutualism is one of the few occurrences of farming by non-humans and likely facilitated the formation of their massive colonies. Mature leaf-cutter ant colonies contain millions of workers ranging in size from small garden tenders to large soldiers, resulting in one of the most complex polymorphic caste systems within ants. To begin uncovering the genomic underpinnings of this system, we sequenced the genome of Atta cephalotes using 454 pyrosequencing. One prediction from this ant's lifestyle is that it has undergone genetic modifications that reflect its obligate dependence on the fungus for nutrients. Analysis of this genome sequence is consistent with this hypothesis, as we find evidence for reductions in genes related to nutrient acquisition. These include extensive reductions in serine proteases (which are likely unnecessary because proteolysis is not a primary mechanism used to process nutrients obtained from the fungus), a loss of genes involved in arginine biosynthesis (suggesting that this amino acid is obtained from the fungus), and the absence of a hexamerin (which sequesters amino acids during larval development in other insects). Following recent reports of genome sequences from other insects that engage in symbioses with beneficial microbes, the A. cephalotes genome provides new insights into the symbiotic lifestyle of this ant and advances our understanding of host-microbe symbioses.
Assuntos
Formigas/fisiologia , Genoma de Inseto/genética , Folhas de Planta/fisiologia , Simbiose , Animais , Formigas/genética , Arginina/genética , Arginina/metabolismo , Sequência de Bases , Fungos/genética , Proteínas de Insetos/genética , Proteínas de Insetos/metabolismo , Análise de Sequência de DNA , Serina Proteases/genética , Serina Proteases/metabolismoRESUMO
Ants are some of the most abundant and familiar animals on Earth, and they play vital roles in most terrestrial ecosystems. Although all ants are eusocial, and display a variety of complex and fascinating behaviors, few genomic resources exist for them. Here, we report the draft genome sequence of a particularly widespread and well-studied species, the invasive Argentine ant (Linepithema humile), which was accomplished using a combination of 454 (Roche) and Illumina sequencing and community-based funding rather than federal grant support. Manual annotation of >1,000 genes from a variety of different gene families and functional classes reveals unique features of the Argentine ant's biology, as well as similarities to Apis mellifera and Nasonia vitripennis. Distinctive features of the Argentine ant genome include remarkable expansions of gustatory (116 genes) and odorant receptors (367 genes), an abundance of cytochrome P450 genes (>110), lineage-specific expansions of yellow/major royal jelly proteins and desaturases, and complete CpG DNA methylation and RNAi toolkits. The Argentine ant genome contains fewer immune genes than Drosophila and Tribolium, which may reflect the prominent role played by behavioral and chemical suppression of pathogens. Analysis of the ratio of observed to expected CpG nucleotides for genes in the reproductive development and apoptosis pathways suggests higher levels of methylation than in the genome overall. The resources provided by this genome sequence will offer an abundance of tools for researchers seeking to illuminate the fascinating biology of this emerging model organism.
Assuntos
Formigas/genética , Genoma de Inseto/genética , Genômica/métodos , Filogenia , Animais , Formigas/fisiologia , Sequência de Bases , California , Metilação de DNA , Biblioteca Gênica , Genética Populacional , Hierarquia Social , Dados de Sequência Molecular , Polimorfismo de Nucleotídeo Único/genética , Receptores Odorantes/genética , Análise de Sequência de DNARESUMO
We report the draft genome sequence of the red harvester ant, Pogonomyrmex barbatus. The genome was sequenced using 454 pyrosequencing, and the current assembly and annotation were completed in less than 1 y. Analyses of conserved gene groups (more than 1,200 manually annotated genes to date) suggest a high-quality assembly and annotation comparable to recently sequenced insect genomes using Sanger sequencing. The red harvester ant is a model for studying reproductive division of labor, phenotypic plasticity, and sociogenomics. Although the genome of P. barbatus is similar to other sequenced hymenopterans (Apis mellifera and Nasonia vitripennis) in GC content and compositional organization, and possesses a complete CpG methylation toolkit, its predicted genomic CpG content differs markedly from the other hymenopterans. Gene networks involved in generating key differences between the queen and worker castes (e.g., wings and ovaries) show signatures of increased methylation and suggest that ants and bees may have independently co-opted the same gene regulatory mechanisms for reproductive division of labor. Gene family expansions (e.g., 344 functional odorant receptors) and pseudogene accumulation in chemoreception and P450 genes compared with A. mellifera and N. vitripennis are consistent with major life-history changes during the adaptive radiation of Pogonomyrmex spp., perhaps in parallel with the development of the North American deserts.
Assuntos
Formigas/genética , Redes Reguladoras de Genes/genética , Genoma de Inseto/genética , Genômica/métodos , Filogenia , Animais , Formigas/fisiologia , Sequência de Bases , Clima Desértico , Hierarquia Social , Dados de Sequência Molecular , América do Norte , Fenótipo , Polimorfismo de Nucleotídeo Único/genética , Receptores Odorantes/genética , Análise de Sequência de DNARESUMO
The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present Phenopacket Store. Version 0.1.19 of Phenopacket Store includes 6668 phenopackets representing 475 Mendelian and chromosomal diseases associated with 423 genes and 3834 unique pathogenic alleles curated from 959 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.
RESUMO
The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present phenopacket-store. Version 0.1.12 of phenopacket-store includes 4916 phenopackets representing 277 Mendelian and chromosomal diseases associated with 236 genes, and 2872 unique pathogenic alleles curated from 605 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.
RESUMO
BACKGROUND: Ontologies are fundamental components of informatics infrastructure in domains such as biomedical, environmental, and food sciences, representing consensus knowledge in an accurate and computable form. However, their construction and maintenance demand substantial resources and necessitate substantial collaboration between domain experts, curators, and ontology experts. We present Dynamic Retrieval Augmented Generation of Ontologies using AI (DRAGON-AI), an ontology generation method employing Large Language Models (LLMs) and Retrieval Augmented Generation (RAG). DRAGON-AI can generate textual and logical ontology components, drawing from existing knowledge in multiple ontologies and unstructured text sources. RESULTS: We assessed performance of DRAGON-AI on de novo term construction across ten diverse ontologies, making use of extensive manual evaluation of results. Our method has high precision for relationship generation, but has slightly lower precision than from logic-based reasoning. Our method is also able to generate definitions deemed acceptable by expert evaluators, but these scored worse than human-authored definitions. Notably, evaluators with the highest level of confidence in a domain were better able to discern flaws in AI-generated definitions. We also demonstrated the ability of DRAGON-AI to incorporate natural language instructions in the form of GitHub issues. CONCLUSIONS: These findings suggest DRAGON-AI's potential to substantially aid the manual ontology construction process. However, our results also underscore the importance of having expert curators and ontology editors drive the ontology generation process.
Assuntos
Inteligência Artificial , Ontologias Biológicas , Processamento de Linguagem Natural , Armazenamento e Recuperação da Informação/métodosRESUMO
Phenotypic data are critical for understanding biological mechanisms and consequences of genomic variation, and are pivotal for clinical use cases such as disease diagnostics and treatment development. For over a century, vast quantities of phenotype data have been collected in many different contexts covering a variety of organisms. The emerging field of phenomics focuses on integrating and interpreting these data to inform biological hypotheses. A major impediment in phenomics is the wide range of distinct and disconnected approaches to recording the observable characteristics of an organism. Phenotype data are collected and curated using free text, single terms or combinations of terms, using multiple vocabularies, terminologies, or ontologies. Integrating these heterogeneous and often siloed data enables the application of biological knowledge both within and across species. Existing integration efforts are typically limited to mappings between pairs of terminologies; a generic knowledge representation that captures the full range of cross-species phenomics data is much needed. We have developed the Unified Phenotype Ontology (uPheno) framework, a community effort to provide an integration layer over domain-specific phenotype ontologies, as a single, unified, logical representation. uPheno comprises (1) a system for consistent computational definition of phenotype terms using ontology design patterns, maintained as a community library; (2) a hierarchical vocabulary of species-neutral phenotype terms under which their species-specific counterparts are grouped; and (3) mapping tables between species-specific ontologies. This harmonized representation supports use cases such as cross-species integration of genotype-phenotype associations from different organisms and cross-species informed variant prioritization.
RESUMO
The Hymenoptera Genome Database (HGD) is a comprehensive model organism database that caters to the needs of scientists working on insect species of the order Hymenoptera. This system implements open-source software and relational databases providing access to curated data contributed by an extensive, active research community. HGD contains data from 9 different species across â¼200 million years in the phylogeny of Hymenoptera, allowing researchers to leverage genetic, genome sequence and gene expression data, as well as the biological knowledge of related model organisms. The availability of resources across an order greatly facilitates comparative genomics and enhances our understanding of the biology of agriculturally important Hymenoptera species through genomics. Curated data at HGD includes predicted and annotated gene sets supported with evidence tracks such as ESTs/cDNAs, small RNA sequences and GC composition domains. Data at HGD can be queried using genome browsers and/or BLAST/PSI-BLAST servers, and it may also be downloaded to perform local searches. We encourage the public to access and contribute data to HGD at: http://HymenopteraGenome.org.
Assuntos
Bases de Dados Genéticas , Genoma de Inseto , Himenópteros/genética , Animais , Genômica , Anotação de Sequência Molecular , Software , Integração de SistemasRESUMO
The 24th annual Bioinformatics Open Source Conference ( BOSC 2023) was part of the 2023i conference on Intelligent Systems for Molecular Biology and the European Conference on Computational Biology (ISMB/ECCB 2023). Launched in 2000 and held yearly since, BOSC is the premier meeting covering open-source bioinformatics and open science. Like ISMB 2022, the 2023 meeting was a hybrid conference, with the in-person component hosted in Lyon, France. ISMB/ECCB attracted a near-record number of attendees, with over 2100 in person and about 900 more online. Approximately 200 people participated in BOSC sessions. In addition to 43 talks and 49 posters, BOSC featured two keynotes: Sara El-Gebali, who spoke about "A New Odyssey: Pioneering the Future of Scientific Progress Through Open Collaboration", and Joseph Yracheta, who spoke about "The Dissonance between Scientific Altruism & Capitalist Extraction: The Zero Trust and Federated Data Sovereignty Solution." Once again, a joint session brought together BOSC and the Bio-Ontologies COSI. The conference ended with a panel on Open and Ethical Data Sharing. As in prior years, BOSC was preceded by a CollaborationFest, a collaborative work event that brought together about 40 participants interested in synergistically combining ideas, shaping project plans, developing software, and more.
Assuntos
Biologia Computacional , Software , Humanos , Disseminação de InformaçãoRESUMO
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focused measurable trait data. Moreover, variations in gene expression in response to environmental disturbances even without any genetic alterations can also be associated with particular biological attributes. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.
RESUMO
Navigating the vast landscape of clinical literature to find optimal treatments and management strategies can be a challenging task, especially for rare diseases. To address this task, we introduce the Medical Action Ontology (MAxO), the first ontology specifically designed to organize medical procedures, therapies, and interventions in a structured way. Currently, MAxO contains 1757 medical action terms added through a combination of manual and semi-automated processes. MAxO was developed with logical structures that make it compatible with several other ontologies within the Open Biological and Biomedical Ontologies (OBO) Foundry. These cover a wide range of biomedical domains, from human anatomy and investigations to the chemical and protein entities involved in biological processes. We have created a database of over 16000 annotations that describe diagnostic modalities for specific phenotypic abnormalities as defined by the Human Phenotype Ontology (HPO). Additionally, 413 annotations are provided for medical actions for 189 rare diseases. We have developed a web application called POET (https://poet.jax.org/) for the community to use to contribute MAxO annotations. MAxO provides a computational representation of treatments and other actions taken for the clinical management of patients. The development of MAxO is closely coupled to the Mondo Disease Ontology (Mondo) and the Human Phenotype Ontology (HPO) and expands the scope of our computational modeling of diseases and phenotypic features to include diagnostics and therapeutic actions. MAxO is available under the open-source CC-BY 4.0 license (https://github.com/monarch-initiative/MAxO).