Results 1 - 20 of 58
1.
Nucleic Acids Res ; 48(D1): D704-D715, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31701156

ABSTRACT

In biology and biomedicine, relating phenotypic outcomes with genetic variation and environmental factors remains a challenge: patient phenotypes may not match known diseases, candidate variants may be in genes that haven't been characterized, research organisms may not recapitulate human or veterinary diseases, environmental factors affecting disease outcomes are unknown or undocumented, and many resources must be queried to find potentially significant phenotypic associations. The Monarch Initiative (https://monarchinitiative.org) integrates information on genes, variants, genotypes, phenotypes and diseases in a variety of species, and allows powerful ontology-based search. We develop many widely adopted ontologies that together enable sophisticated computational analysis, mechanistic discovery and diagnostics of Mendelian diseases. Our algorithms and tools are widely used to identify animal models of human disease through phenotypic similarity, for differential diagnostics and to facilitate translational research. Launched in 2015, Monarch has grown with regard to data (new organisms, more sources, better modeling); new API and standards; ontologies (new Mondo unified disease ontology, improvements to ontologies such as HPO and uPheno); user interface (a redesigned website); and community development. Monarch data, algorithms and tools are being used and extended by resources such as GA4GH and NCATS Translator, among others, to aid mechanistic discovery and diagnostics.


Subject(s)
Computational Biology/methods , Genotype , Phenotype , Algorithms , Animals , Biological Ontologies , Databases, Genetic , Exome , Genetic Association Studies , Genetic Variation , Genomics , Humans , Internet , Software , Translational Research, Biomedical , User-Computer Interface
2.
PLoS Comput Biol ; 15(2): e1006790, 2019 02.
Article in English | MEDLINE | ID: mdl-30726205

ABSTRACT

Genome annotation is the process of identifying the location and function of a genome's encoded features. Improving the biological accuracy of annotation is a complex and iterative process requiring researchers to review and incorporate multiple sources of information such as transcriptome alignments, predictive models based on sequence profiles, and comparisons to features found in related organisms. Because rapidly decreasing costs are enabling an ever-growing number of scientists to incorporate sequencing as a routine laboratory technique, there is widespread demand for tools that can assist in the deliberative analytical review of genomic information. To this end, we present Apollo, an open source software package that enables researchers to efficiently inspect and refine the precise structure and role of genomic features in a graphical browser-based platform. Some of Apollo's newer user interface features include support for real-time collaboration, allowing distributed users to simultaneously edit the same encoded features while also instantly seeing the updates made by other researchers on the same region in a manner similar to Google Docs. Its technical architecture enables Apollo to be integrated into multiple existing genomic analysis pipelines and heterogeneous laboratory workflow platforms. Finally, we consider the implications that Apollo and related applications may have on how the results of genome research are published and made accessible.


Subject(s)
Computational Biology/methods , Molecular Sequence Annotation/methods , Chromosome Mapping/methods , Database Management Systems , Genome/genetics , Genomics , Information Storage and Retrieval , Internet , Software , User-Computer Interface
3.
Am J Hum Genet ; 99(3): 595-606, 2016 09 01.
Article in English | MEDLINE | ID: mdl-27569544

ABSTRACT

The interpretation of non-coding variants still constitutes a major challenge in the application of whole-genome sequencing in Mendelian disease, especially for single-nucleotide and other small non-coding variants. Here we present Genomiser, an analysis framework that is able not only to score the relevance of variation in the non-coding genome, but also to associate regulatory variants to specific Mendelian diseases. Genomiser scores variants through either existing methods such as CADD or a bespoke machine learning method and combines these with allele frequency, regulatory sequences, chromosomal topological domains, and phenotypic relevance to discover variants associated to specific Mendelian disorders. Overall, Genomiser is able to identify causal regulatory variants as the top candidate in 77% of simulated whole genomes, allowing effective detection and discovery of regulatory variants in Mendelian disease.


Subject(s)
Algorithms , Genetic Diseases, Inborn/genetics , Genome, Human/genetics , Mutation/genetics , Gene Frequency , Genome-Wide Association Study , Humans , Machine Learning , Open Reading Frames/genetics , Phenotype , Point Mutation/genetics
4.
Nat Methods ; 13(5): 425-30, 2016 05.
Article in English | MEDLINE | ID: mdl-27043882

ABSTRACT

Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision-recall trade-offs. As a result, it is difficult to assess the performance of orthology inference methods. Here, we present a community effort to establish standards and an automated web-based service to facilitate orthology benchmarking. Using this service, we characterize 15 well-established inference methods and resources on a battery of 20 different benchmarks. Standardized benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimum requirement for new tools and resources, and guides the development of more accurate orthology inference methods.
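The precision-recall trade-off mentioned in this abstract can be illustrated with a minimal sketch. It assumes a trusted reference set of ortholog pairs is available, which, as the abstract notes, is rarely the case in practice. The function name `precision_recall` and the gene identifiers are hypothetical and are not taken from the benchmarking service.

```python
def precision_recall(predicted, reference):
    """Compare predicted ortholog pairs with a reference set.

    Pairs are treated as unordered, so ("h1", "m1") equals ("m1", "h1").
    """
    predicted = {frozenset(pair) for pair in predicted}
    reference = {frozenset(pair) for pair in reference}
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical reference pairs (human gene, mouse gene).
reference = [("h1", "m1"), ("h2", "m2"), ("h3", "m3"), ("h4", "m4")]

# A strict method: few predictions, all of them correct.
strict = [("h1", "m1"), ("h2", "m2")]

# A permissive method: more correct pairs found, but also more errors.
permissive = [("m1", "h1"), ("h2", "m2"), ("h3", "m3"),
              ("h3", "m9"), ("h4", "m7"), ("h5", "m5")]

print(precision_recall(strict, reference))      # (1.0, 0.5)
print(precision_recall(permissive, reference))  # (0.5, 0.75)
```

In this toy case the strict method suits applications that cannot tolerate false orthologs, while the permissive one suits applications that need broader coverage.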


Subject(s)
Computational Biology/standards , Genomics/standards , Phylogeny , Proteomics/standards , Archaea/classification , Archaea/genetics , Bacteria/classification , Bacteria/genetics , Computational Biology/methods , Databases, Genetic , Eukaryota/classification , Eukaryota/genetics , Gene Ontology , Genomics/methods , Models, Genetic , Proteomics/methods , Sequence Analysis, Protein , Sequence Homology , Species Specificity
5.
Nucleic Acids Res ; 45(D1): D712-D722, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899636

ABSTRACT

The correlation of phenotypic outcomes with genetic variation and environmental factors is a core pursuit in biology and biomedicine. Numerous challenges impede our progress: patient phenotypes may not match known diseases, candidate variants may be in genes that have not been characterized, model organisms may not recapitulate human or veterinary diseases, filling evolutionary gaps is difficult, and many resources must be queried to find potentially significant genotype-phenotype associations. Non-human organisms have proven instrumental in revealing biological mechanisms. Advanced informatics tools can identify phenotypically relevant disease models in research and diagnostic contexts. Large-scale integration of model organism and clinical research data can provide a breadth of knowledge not available from individual sources and can provide contextualization of data back to these sources. The Monarch Initiative (monarchinitiative.org) is a collaborative, open science effort that aims to semantically integrate genotype-phenotype data from many species and sources in order to support precision medicine, disease modeling, and mechanistic exploration. Our integrated knowledge graph, analytic tools, and web services enable diverse users to explore relationships between phenotypes and genotypes across species.


Subject(s)
Databases, Genetic , Genetic Association Studies/methods , Genotype , Phenotype , Animals , Biological Evolution , Computational Biology/methods , Data Curation , Humans , Search Engine , Software , Species Specificity , User-Computer Interface , Web Browser
6.
Am J Hum Genet ; 97(1): 111-24, 2015 Jul 02.
Article in English | MEDLINE | ID: mdl-26119816

ABSTRACT

The Human Phenotype Ontology (HPO) is widely used in the rare disease community for differential diagnostics, phenotype-driven analysis of next-generation sequence-variation data, and translational research, but a comparable resource has not been available for common disease. Here, we have developed a concept-recognition procedure that analyzes the frequencies of HPO disease annotations as identified in over five million PubMed abstracts by employing an iterative procedure to optimize precision and recall of the identified terms. We derived disease models for 3,145 common human diseases comprising a total of 132,006 HPO annotations. The HPO now comprises over 250,000 phenotypic annotations for over 10,000 rare and common diseases and can be used for examining the phenotypic overlap among common diseases that share risk alleles, as well as between Mendelian diseases and common diseases linked by genomic location. The annotations, as well as the HPO itself, are freely available.


Subject(s)
Gene Ontology/trends , Genetic Diseases, Inborn/classification , Genetic Diseases, Inborn/genetics , Phenotype , Terminology as Topic , Genetic Diseases, Inborn/pathology , Humans , MEDLINE , Models, Biological
7.
PLoS Biol ; 13(1): e1002033, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25562316

ABSTRACT

Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today's data barriers and facilitate analytical reproducibility.


Subject(s)
Genetic Association Studies , Animals , Computational Biology , Data Curation , Databases, Factual/standards , Gene-Environment Interaction , Genomics , Humans , Phenotype , Reference Standards , Reproducibility of Results , Terminology as Topic
8.
Genome Res ; 24(2): 340-8, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24162188

ABSTRACT

Numerous new disease-gene associations have been identified by whole-exome sequencing studies in the last few years. However, many cases remain unsolved due to the sheer number of candidate variants remaining after common filtering strategies such as removing low quality and common variants and those deemed unlikely to be pathogenic. The observation that each of our genomes contains about 100 genuine loss-of-function variants makes identification of the causative mutation problematic when using these strategies alone. We propose using the wealth of genotype to phenotype data that already exists from model organism studies to assess the potential impact of these exome variants. Here, we introduce PHenotypic Interpretation of Variants in Exomes (PHIVE), an algorithm that integrates the calculation of phenotype similarity between human diseases and genetically modified mouse models with evaluation of the variants according to allele frequency, pathogenicity, and mode of inheritance approaches in our Exomiser tool. Large-scale validation of PHIVE analysis using 100,000 exomes containing known mutations demonstrated a substantial improvement (up to 54.1-fold) over purely variant-based (frequency and pathogenicity) methods with the correct gene recalled as the top hit in up to 83% of samples, corresponding to an area under the ROC curve of >95%. We conclude that incorporation of phenotype data can play a vital role in translational bioinformatics and propose that exome sequencing projects should systematically capture clinical phenotypes to take advantage of the strategy presented here.


Subject(s)
Exome/genetics , Genetic Association Studies , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide/genetics , Algorithms , Animals , Computational Biology , Databases, Genetic , Humans , Mice , Phenotype , Sequence Analysis, DNA , Software
9.
Bioinformatics ; 32(22): 3501-3503, 2016 11 15.
Article in English | MEDLINE | ID: mdl-27412096

ABSTRACT

The MSAViewer is a quick and easy visualization and analysis JavaScript component for Multiple Sequence Alignment data of any size. Core features include interactive navigation through the alignment, application of popular color schemes, sorting, selecting and filtering. The MSAViewer is 'web ready': written entirely in JavaScript, compatible with modern web browsers and does not require any specialized software. The MSAViewer is part of the BioJS collection of components. AVAILABILITY AND IMPLEMENTATION: The MSAViewer is released as open source software under the Boost Software License 1.0. Documentation, source code and the viewer are available at http://msa.biojs.net/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: msa@bio.sh.


Subject(s)
Sequence Alignment , Software , Programming Languages , Web Browser
10.
Nucleic Acids Res ; 42(Database issue): D966-74, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24217912

ABSTRACT

The Human Phenotype Ontology (HPO) project, available at http://www.human-phenotype-ontology.org, provides a structured, comprehensive and well-defined set of 10,088 classes (terms) describing human phenotypic abnormalities and 13,326 subclass relations between the HPO classes. In addition we have developed logical definitions for 46% of all HPO classes using terms from ontologies for anatomy, cell types, function, embryology, pathology and other domains. This allows interoperability with several resources, especially those containing phenotype information on model organisms such as mouse and zebrafish. Here we describe the updated HPO database, which provides annotations of 7,278 human hereditary syndromes listed in OMIM, Orphanet and DECIPHER to classes of the HPO. Various meta-attributes such as frequency, references and negations are associated with each annotation. Several large-scale projects worldwide utilize the HPO for describing phenotype information in their datasets. We have therefore generated equivalence mappings to other phenotype vocabularies such as LDDB, Orphanet, MedDRA, UMLS and phenoDB, allowing integration of existing datasets and interoperability with multiple biomedical resources. We have created various ways to access the HPO database content using flat files, a MySQL database, and Web-based tools. All data and documentation on the HPO project can be found online.


Subject(s)
Biological Ontologies , Databases, Factual , Genetic Diseases, Inborn/genetics , Phenotype , Animals , Genetic Diseases, Inborn/diagnosis , Genomics , Humans , Internet , Mice
11.
Hum Mutat ; 36(10): 979-84, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26269093

ABSTRACT

The Matchmaker Exchange application programming interface (API) allows searching a patient's genotypic or phenotypic profiles across clinical sites, for the purposes of cohort discovery and variant disease causal validation. This API can be used not only to search for matching patients, but also to match against public disease and model organism data. This public disease data enable matching known diseases and variant-phenotype associations using phenotype semantic similarity algorithms developed by the Monarch Initiative. The model data can provide additional evidence to aid diagnosis, suggest relevant models for disease mechanism and treatment exploration, and identify collaborators across the translational divide. The Monarch Initiative provides an implementation of this API for searching multiple integrated sources of data that contextualize the knowledge about any given patient or patient family into the greater biomedical knowledge landscape. While this corpus of data can aid diagnosis, it is also the beginning of research to improve understanding of rare human diseases.


Subject(s)
Databases, Genetic , Disease/genetics , Genetic Predisposition to Disease/genetics , Animals , Disease Models, Animal , Genetic Variation , Humans , Information Dissemination , Phenotype , User-Computer Interface
12.
Mamm Genome ; 26(9-10): 548-55, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26092691

ABSTRACT

New sequencing technologies have ushered in a new era for diagnosis and discovery of new causative mutations for rare diseases. However, the sheer numbers of candidate variants that require interpretation in an exome or genomic analysis are still a challenging prospect. A powerful approach is the comparison of the patient's set of phenotypes (phenotypic profile) to known phenotypic profiles caused by mutations in orthologous genes associated with these variants. The most abundant source of relevant data for this task is available through the efforts of the Mouse Genome Informatics group and the International Mouse Phenotyping Consortium. In this review, we highlight the challenges in comparing human clinical phenotypes with mouse phenotypes and some of the solutions that have been developed by members of the Monarch Initiative. These tools allow the identification of mouse models for known disease-gene associations that may otherwise have been overlooked, as well as the prioritization of candidate genes for novel associations. The culmination of these efforts is the Exomiser software package that allows clinical researchers to analyse patient exomes in the context of variant frequency and predicted pathogenicity as well as the phenotypic similarity of the patient to any given candidate orthologous gene.


Subject(s)
Databases, Genetic , Genetic Diseases, Inborn , Animals , Computational Biology , Disease Models, Animal , Exome/genetics , Genomics , Humans , Mice , Mutation , Phenotype
13.
Nucleic Acids Res ; 40(Database issue): D1082-8, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22080565

ABSTRACT

In an effort to comprehensively characterize the functional elements within the genomes of the important model organisms Drosophila melanogaster and Caenorhabditis elegans, the NHGRI model organism Encyclopaedia of DNA Elements (modENCODE) consortium has generated an enormous library of genomic data along with detailed, structured information on all aspects of the experiments. The modMine database (http://intermine.modencode.org) described here has been built by the modENCODE Data Coordination Center to allow the broader research community to (i) search for and download data sets of interest among the thousands generated by modENCODE; (ii) access the data in an integrated form together with non-modENCODE data sets; and (iii) facilitate fine-grained analysis of the above data. The sophisticated search features are possible because of the collection of extensive experimental metadata by the consortium. Interfaces are provided to allow both biologists and bioinformaticians to exploit these rich modENCODE data sets now available via modMine.


Subject(s)
Caenorhabditis elegans/genetics , Databases, Genetic , Drosophila melanogaster/genetics , Animals , Gene Expression , Genome, Helminth , Genome, Insect , Genomics , Internet , User-Computer Interface
14.
Brief Bioinform ; 12(5): 449-62, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21873635

ABSTRACT

The goal of the Gene Ontology (GO) project is to provide a uniform way to describe the functions of gene products from organisms across all kingdoms of life and thereby enable analysis of genomic data. Protein annotations are either based on experiments or predicted from protein sequences. Since most sequences have not been experimentally characterized, most available annotations need to be based on predictions. To make as accurate inferences as possible, the GO Consortium's Reference Genome Project is using an explicit evolutionary framework to infer annotations of proteins from a broad set of genomes from experimental annotations in a semi-automated manner. Most components in the pipeline, such as selection of sequences, building multiple sequence alignments and phylogenetic trees, retrieving experimental annotations and depositing inferred annotations, are fully automated. However, the most crucial step in our pipeline relies on software-assisted curation by an expert biologist. This curation tool, Phylogenetic Annotation and INference Tool (PAINT) helps curators to infer annotations among members of a protein family. PAINT allows curators to make precise assertions as to when functions were gained and lost during evolution and record the evidence (e.g. experimentally supported GO annotations and phylogenetic information including orthology) for those assertions. In this article, we describe how we use PAINT to infer protein function in a phylogenetic context with emphasis on its strengths, limitations and guidelines. We also discuss specific examples showing how PAINT annotations compare with those generated by other highly used homology-based methods.


Subject(s)
Genomics/methods , Molecular Sequence Annotation/methods , Phylogeny , Proteins/chemistry , Databases, Genetic , Genome , Proteins/genetics
15.
Nat Cell Biol ; 8(11): 1190-4, 2006 Nov.
Article in English | MEDLINE | ID: mdl-17060903

ABSTRACT

Logical models and physical specifications provide the foundation for storage, management and analysis of complex sets of data, and describe the relationships between measured data elements and metadata - the contextual descriptors that define the primary data. Here, we use imaging applications to illustrate the purpose of the various implementations of data specifications and the requirement for open, standardized, data formats to facilitate the sharing of critical digital data and metadata.


Subject(s)
Database Management Systems/standards , Hypermedia/standards , Information Storage and Retrieval/standards , Programming Languages , Animals , Database Management Systems/statistics & numerical data , Genome , Humans , Hypermedia/statistics & numerical data , Information Storage and Retrieval/methods , Information Storage and Retrieval/statistics & numerical data , Research/standards , Research/statistics & numerical data , Research Design , Systems Integration , Time Factors
16.
PLoS Comput Biol ; 8(2): e1002386, 2012.
Article in English | MEDLINE | ID: mdl-22359495

ABSTRACT

A recent paper (Nehrt et al., PLoS Comput. Biol. 7:e1002073, 2011) has proposed a metric for the "functional similarity" between two genes that uses only the Gene Ontology (GO) annotations directly derived from published experimental results. Applying this metric, the authors concluded that paralogous genes within the mouse genome or the human genome are more functionally similar on average than orthologous genes between these genomes, an unexpected result with broad implications if true. We suggest, based on both theoretical and empirical considerations, that this proposed metric should not be interpreted as a functional similarity, and therefore cannot be used to support any conclusions about the "ortholog conjecture" (or, more properly, the "ortholog functional conservation hypothesis"). First, we reexamine the case studies presented by Nehrt et al. as examples of orthologs with divergent functions, and come to a very different conclusion: they actually exemplify how GO annotations for orthologous genes provide complementary information about conserved biological functions. We then show that there is a global ascertainment bias in the experiment-based GO annotations for human and mouse genes: particular types of experiments tend to be performed in different model organisms. We conclude that the reported statistical differences in annotations between pairs of orthologous genes do not reflect differences in biological function, but rather complementarity in experimental approaches. 
Our results underscore two general considerations for researchers proposing novel types of analysis based on the GO: 1) that GO annotations are often incomplete, potentially in a biased manner, and subject to an "open world assumption" (absence of an annotation does not imply absence of a function), and 2) that conclusions drawn from a novel, large-scale GO analysis should whenever possible be supported by careful, in-depth examination of examples, to help ensure the conclusions have a justifiable biological basis.


Subject(s)
Computational Biology/methods , Genome, Human , Algorithms , Animals , Cell Nucleus/metabolism , Genome , Genomics/methods , Humans , Mice , Models, Genetic , Models, Statistical , Molecular Biology/methods , Molecular Sequence Annotation/methods , Phosphorylation , Probability , Species Specificity
17.
Nucleic Acids Res ; 39(Database issue): D7-10, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21097465

ABSTRACT

The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.


Subject(s)
Databases, Factual/standards , Information Dissemination
18.
Hum Mutat ; 33(5): 858-66, 2012 May.
Article in English | MEDLINE | ID: mdl-22331800

ABSTRACT

Mouse phenotype data represents a valuable resource for the identification of disease-associated genes, especially where the molecular basis is unknown and there is no clue to the candidate gene's function, pathway involvement or expression pattern. However, until recently these data have not been systematically used due to difficulties in mapping between clinical features observed in humans and mouse phenotype annotations. Here, we describe a semantic approach to solve this problem and demonstrate highly significant recall of known disease-gene associations and orthology relationships. A Web application (MouseFinder; www.mousemodels.org) has been developed to allow users to search the results of our whole-phenome comparison of human and mouse. We demonstrate its use in identifying ARTN as a strong candidate gene within the 1p34.1-p32 mapped locus for a hereditary form of ptosis.


Subject(s)
Genetic Association Studies , Phenotype , Animals , Databases, Genetic , Disease/genetics , Disease Models, Animal , Humans , Mice , Molecular Sequence Annotation , Mutation , Online Systems , Terminology as Topic
19.
PLoS Biol ; 7(11): e1000247, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19956802

ABSTRACT

Scientists and clinicians who study genetic alterations and disease have traditionally described phenotypes in natural language. The considerable variation in these free-text descriptions has posed a hindrance to the important task of identifying candidate genes and models for human diseases and indicates the need for a computationally tractable method to mine data resources for mutant phenotypes. In this study, we tested the hypothesis that ontological annotation of disease phenotypes will facilitate the discovery of new genotype-phenotype relationships within and across species. To describe phenotypes using ontologies, we used an Entity-Quality (EQ) methodology, wherein the affected entity (E) and how it is affected (Q) are recorded using terms from a variety of ontologies. Using this EQ method, we annotated the phenotypes of 11 gene-linked human diseases described in Online Mendelian Inheritance in Man (OMIM). These human annotations were loaded into our Ontology-Based Database (OBD) along with other ontology-based phenotype descriptions of mutants from various model organism databases. Phenotypes recorded with this EQ method can be computationally compared based on the hierarchy of terms in the ontologies and the frequency of annotation. We utilized four similarity metrics to compare phenotypes and developed an ontology of homologous and analogous anatomical structures to compare phenotypes between species. Using these tools, we demonstrate that we can identify, through the similarity of the recorded phenotypes, other alleles of the same gene, other members of a signaling pathway, and orthologous genes and pathway members across species. We conclude that EQ-based annotation of phenotypes, in conjunction with a cross-species ontology, and a variety of similarity metrics can identify biologically meaningful similarities between genes by comparing phenotypes alone. 
This annotation and search method provides a novel and efficient means to identify gene candidates and animal models of human disease, which may shorten the lengthy path to identification and understanding of the genetic basis of human disease.
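As a rough illustration of comparing phenotypes "based on the hierarchy of terms in the ontologies and the frequency of annotation", the sketch below computes an information-content score for the most informative shared ancestor of two terms. This is a commonly used Resnik-style measure and is not necessarily one of the four metrics the authors used. The toy ontology, the gene names and all function names are hypothetical.

```python
import math

# Hypothetical toy ontology: term -> list of direct parent terms.
PARENTS = {
    "abnormal organ": [],
    "abnormal eye": ["abnormal organ"],
    "abnormal heart": ["abnormal organ"],
    "small eye": ["abnormal eye"],
    "absent eye": ["abnormal eye"],
}

# Hypothetical direct annotations: gene -> set of phenotype terms.
ANNOTATIONS = {
    "geneA": {"small eye"},
    "geneB": {"absent eye"},
    "geneC": {"abnormal heart"},
    "geneD": {"small eye", "abnormal heart"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """log(total genes / genes annotated to the term or a descendant)."""
    hits = sum(
        1 for terms in ANNOTATIONS.values()
        if any(term in ancestors(t) for t in terms)
    )
    # A term nobody is annotated to is treated as maximally specific.
    return math.log(len(ANNOTATIONS) / hits) if hits else math.inf


def term_similarity(a, b):
    """Information content of the most informative common ancestor."""
    common = ancestors(a) & ancestors(b)
    return max((information_content(t) for t in common), default=0.0)


def gene_similarity(g1, g2):
    """Best pairwise term similarity between two genes' annotations."""
    return max(
        term_similarity(a, b)
        for a in ANNOTATIONS[g1]
        for b in ANNOTATIONS[g2]
    )


print(gene_similarity("geneA", "geneB"))  # shared "abnormal eye" ancestor
print(gene_similarity("geneA", "geneC"))  # only the root is shared
```

Two eye phenotypes score higher than an eye and a heart phenotype because their shared ancestor is annotated to fewer genes than the root term.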


Subject(s)
Disease Models, Animal , Genetic Association Studies , Phenotype , Alleles , Animals , Hedgehog Proteins/genetics , Humans , Signal Transduction/genetics , Zebrafish , Zebrafish Proteins/genetics
20.
Database (Oxford) ; 2021, 2021 10 26.
Article in English | MEDLINE | ID: mdl-34697637

ABSTRACT

Biological ontologies are used to organize, curate and interpret the vast quantities of data arising from biological experiments. While this works well when using a single ontology, integrating multiple ontologies can be problematic, as they are developed independently, which can lead to incompatibilities. The Open Biological and Biomedical Ontologies (OBO) Foundry was created to address this by facilitating the development, harmonization, application and sharing of ontologies, guided by a set of overarching principles. One challenge in reaching these goals was that the OBO principles were not originally encoded in a precise fashion, and interpretation was subjective. Here, we show how we have addressed this by formally encoding the OBO principles as operational rules and implementing a suite of automated validation checks and a dashboard for objectively evaluating each ontology's compliance with each principle. This entailed a substantial effort to curate metadata across all ontologies and to coordinate with individual stakeholders. We have applied these checks across the full OBO suite of ontologies, revealing areas where individual ontologies require changes to conform to our principles. Our work demonstrates how a sizable, federated community can be organized and evaluated on objective criteria that help improve overall quality and interoperability, which is vital for the sustenance of the OBO project and towards the overall goals of making data Findable, Accessible, Interoperable, and Reusable (FAIR). Database URL http://obofoundry.org/.
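The idea of encoding principles as operational rules and reporting compliance on a dashboard can be sketched as follows. The rules, the metadata fields and the ontology records here are hypothetical stand-ins chosen for illustration. They are not the actual OBO Foundry checks or metadata schema.

```python
from typing import Callable, Dict, List

Ontology = Dict[str, str]
Check = Callable[[Ontology], bool]

# Hypothetical operational rules: each maps ontology metadata to pass/fail.
CHECKS: Dict[str, Check] = {
    "has_license": lambda o: bool(o.get("license")),
    "has_contact": lambda o: "@" in o.get("contact", ""),
    "has_description": lambda o: bool(o.get("description")),
}


def evaluate(ontology: Ontology) -> Dict[str, bool]:
    """Run every check against one ontology's metadata."""
    return {name: check(ontology) for name, check in CHECKS.items()}


def dashboard(ontologies: List[Ontology]) -> Dict[str, Dict[str, bool]]:
    """Build a per-ontology, per-rule compliance table."""
    return {o["id"]: evaluate(o) for o in ontologies}


# Hypothetical metadata records.
ontologies = [
    {"id": "onto1", "license": "CC-BY", "contact": "a@example.org",
     "description": "Example ontology"},
    {"id": "onto2", "license": "", "contact": "unknown"},
]

for onto_id, results in dashboard(ontologies).items():
    failed = [rule for rule, ok in results.items() if not ok]
    print(onto_id, "PASS" if not failed else f"FAIL: {', '.join(failed)}")
```

Keeping each rule as a small named function means the same metadata can be re-evaluated automatically whenever an ontology changes, which is the property the abstract emphasizes over subjective interpretation.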


Subject(s)
Biological Ontologies , Databases, Factual , Metadata