Results 1 - 9 of 9
1.
Database (Oxford) ; 2022, 2022 05 25.
Article in English | MEDLINE | ID: mdl-35616100

ABSTRACT

Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM) which addresses these problems by: (i) Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. (ii) Defining an easy-to-use simple table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles. (iii) Implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices. (iv) Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec. Database URL: http://w3id.org/sssom/spec.
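To make the table-based format concrete, here is a minimal sketch of loading a small SSSOM-style mapping set with pandas. The column names follow the published SSSOM specification (http://w3id.org/sssom/spec); the specific term pairs, justification values and confidence scores are illustrative, not taken from the paper.

```python
# Minimal sketch: reading a small SSSOM-style mapping table with pandas.
# Column names follow the SSSOM specification (http://w3id.org/sssom/spec);
# the CURIEs, labels and confidence values below are illustrative only.
import io
import pandas as pd

sssom_tsv = """\
subject_id\tsubject_label\tpredicate_id\tobject_id\tobject_label\tmapping_justification\tconfidence
HP:0012622\tChronic kidney disease\tskos:exactMatch\tMONDO:0005300\tchronic kidney disease\tsemapv:ManualMappingCuration\t1.0
HP:0000093\tProteinuria\tskos:broadMatch\tMP:0005556\tabnormal urine protein level\tsemapv:LexicalMatching\t0.8
"""

mappings = pd.read_csv(io.StringIO(sssom_tsv), sep="\t")

# Keep only exact matches above a confidence threshold, as a downstream
# pipeline that requires high precision might do.
exact = mappings[(mappings.predicate_id == "skos:exactMatch")
                 & (mappings.confidence >= 0.9)]
print(exact[["subject_id", "object_id"]])
```

Because the format is a plain TSV with explicit predicates and justifications, a pipeline can filter for exact, highly confident matches without parsing or querying the source ontologies.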


Subjects
Metadata , Semantic Web , Data Management , Databases, Factual , Workflow
2.
Drug Discov Today ; 24(10): 2068-2075, 2019 10.
Article in English | MEDLINE | ID: mdl-31158512

ABSTRACT

In this review, we summarize recent progress in ontology mapping (OM) at a crucial time when biomedical research faces a deluge of increasingly large and varied data. Ontology mapping is particularly important for realising the full potential of semantically enabled or enriched applications and for deriving meaningful insights, for example in drug discovery, using machine-learning technologies. We discuss challenges and solutions for producing better ontology mappings, as well as how to select ontologies before applying them. In addition, we describe tools and algorithms for ontology mapping, including the evaluation of tool capability and mapping quality. Finally, we outline the requirements for an ontology mapping service (OMS) and the progress being made towards implementing such sustainable services.


Subjects
Biological Ontologies , Drug Discovery/methods , Machine Learning , Semantics , Algorithms , Humans
3.
J Biomed Semantics ; 9(1): 9, 2018 02 08.
Article in English | MEDLINE | ID: mdl-29422110

ABSTRACT

BACKGROUND: The pathogenesis of inflammatory diseases can be tracked by studying the causal relationships among the factors contributing to their development. We could, for instance, hypothesize about how pathogenesis outcomes are connected to the observed conditions. To prove such causal hypotheses we would need a full understanding of the causal relationships and would have to provide all the evidence necessary to support our claims. In practice, however, we might not possess all the background knowledge on the causal relationships, and we might be unable to collect all the evidence needed to prove our hypotheses. RESULTS: In this work we propose a methodology for translating biological knowledge about causal relationships between biological processes and their effects on conditions into a computational framework for hypothesis testing. The methodology consists of two main steps: constructing a hypothesis graph from a formalization of the background knowledge on causal relationships, and measuring confidence in a causality hypothesis as a normalized weighted path computation in the hypothesis graph. In this framework, we can simulate evidence collection and assess confidence in a causality hypothesis proportionally to the amount of available knowledge and collected evidence. CONCLUSIONS: We evaluate our methodology on a hypothesis graph that represents both the contributing factors that may cause cartilage degradation and the factors that may be caused by cartilage degradation during osteoarthritis. Hypothesis graph construction proved robust to the addition of potentially contradictory information about simultaneous positive and negative effects. The confidence measures obtained for specific causality hypotheses were validated by our domain experts and correspond closely to their subjective assessments of confidence in the investigated hypotheses. Overall, our methodology for a shared hypothesis-testing framework exhibits properties that researchers will find useful when reviewing literature for their experimental studies, planning and prioritizing evidence acquisition, and testing their hypotheses with different depths of knowledge about the causal dependencies of biological processes and their effects on the observed conditions.
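The abstract does not give the exact formula, but the general idea of a normalized weighted path computation can be sketched as follows; the hypothesis graph, the edge weights and the geometric-mean normalization below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of scoring a causality hypothesis as a normalized weighted
# path in a hypothesis graph. The graph, weights and normalization below are
# illustrative assumptions, not the formula from the paper.
import networkx as nx

g = nx.DiGraph()
# Edge weights encode how strongly the source is believed to cause the target
# (e.g., derived from background knowledge plus collected evidence).
g.add_weighted_edges_from([
    ("IL-1",   "MMP-13",               0.9),
    ("MMP-13", "collagen breakdown",   0.8),
    ("collagen breakdown", "cartilage degradation", 0.7),
    ("IL-1",   "cartilage degradation", 0.2),   # weak direct link
])

def hypothesis_confidence(graph, cause, effect):
    """Best product of edge weights over all simple cause->effect paths,
    normalized by path length so long, weakly supported chains score lower."""
    best = 0.0
    for path in nx.all_simple_paths(graph, cause, effect):
        weights = [graph[u][v]["weight"] for u, v in zip(path, path[1:])]
        score = 1.0
        for w in weights:
            score *= w
        score **= 1.0 / len(weights)   # geometric mean as the normalization
        best = max(best, score)
    return best

print(hypothesis_confidence(g, "IL-1", "cartilage degradation"))
```

The confidence rises as more (and stronger) evidence edges are added along a path, which mirrors the paper's point that confidence should grow with the amount of available knowledge and collected evidence.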


Subjects
Biological Ontologies , Computer Graphics , Inflammation
4.
J Biomed Semantics ; 8(1): 55, 2017 Dec 02.
Article in English | MEDLINE | ID: mdl-29197409

ABSTRACT

BACKGROUND: The Disease and Phenotype track was designed to evaluate the relative performance of ontology matching systems that generate mappings between source ontologies. Disease and phenotype ontologies are important for applications such as data mining, data integration and knowledge management that support translational science in drug discovery and in understanding the genetics of disease. RESULTS: Eleven systems (out of 21 OAEI participating systems) were able to cope with at least one of the tasks in the Disease and Phenotype track. The AML, FCA-Map, LogMap(Bio) and PhenoMF systems produced the top results for ontology matching when compared against consensus alignments. Evaluation against manually curated mappings proved more difficult, most likely because these mapping sets comprised mostly subsumption relationships rather than equivalences. Manual assessment of unique equivalence mappings showed that the AML, LogMap(Bio) and PhenoMF systems achieved the highest precision. CONCLUSIONS: Four systems gave the highest performance for matching disease and phenotype ontologies. These systems coped well with the detection of equivalence matches but struggled to detect semantic similarity, which deserves more attention in the future development of ontology matching systems. The findings of this evaluation show that such systems could help to automate equivalence matching in the workflow of curators who maintain ontology mapping services in numerous domains such as disease and phenotype.
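As a rough illustration of how evaluation against a consensus alignment works, the sketch below computes precision, recall and F-measure for a hypothetical system output; the mapping pairs are placeholders, not OAEI data.

```python
# Minimal sketch of OAEI-style evaluation: comparing a system's mappings
# against a consensus alignment with precision/recall. The mapping pairs
# below are hypothetical placeholders.
consensus = {("DOID:9352", "HP:0000819"), ("DOID:1612", "NCIT:C4872")}
system    = {("DOID:9352", "HP:0000819"), ("DOID:0050117", "HP:0012649")}

true_positives = system & consensus
precision = len(true_positives) / len(system)
recall = len(true_positives) / len(consensus)
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```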


Subjects
Biological Ontologies , Disease , Phenotype , Consensus , Humans
5.
BMC Bioinformatics ; 13 Suppl 1: S6, 2012 Jan 25.
Article in English | MEDLINE | ID: mdl-22373409

ABSTRACT

BACKGROUND: The semantic integration of biomedical resources remains a challenging issue, yet it is required for effective information processing and data analysis. The availability of comprehensive knowledge resources such as biomedical ontologies and integrated thesauri greatly facilitates this integration effort by means of semantic annotation, which allows disparate data formats and contents to be expressed in a common semantic space. In this paper, we propose a multidimensional representation for such a semantic space, where dimensions correspond to the different perspectives in biomedical research (e.g., population, disease, anatomy and proteins/genes). RESULTS: This paper presents a novel method for building multidimensional semantic spaces from semantically annotated biomedical data collections. The method consists of two main processes: knowledge normalization and data normalization. The former arranges the concepts provided by a reference knowledge resource (e.g., biomedical ontologies and thesauri) into a set of hierarchical dimensions for analysis purposes. The latter reduces the annotation set associated with each collection item to a set of points in the multidimensional space. Additionally, we have developed a visual tool, called 3D-Browser, which implements OLAP-like operators over the generated multidimensional space. The method and the tool have been tested and evaluated in the context of the Health-e-Child (HeC) project. Automatic semantic annotation was applied to tag three collections of abstracts taken from PubMed (one for each target disease of the project), the Uniprot database, and the HeC patient record database. We adopted the UMLS Metathesaurus 2010AA as the reference knowledge resource. CONCLUSIONS: Current knowledge resources and semantic-aware technology make the integration of biomedical resources possible. Such integration is performed through semantic annotation of the intended biomedical data resources. This paper shows how these annotations can be exploited for integration, exploration, and analysis tasks. Results over a real scenario demonstrate the viability and usefulness of the approach, as well as the quality of the generated multidimensional semantic spaces.
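A minimal sketch of the data-normalization idea, assuming hypothetical hierarchies and annotations, might look like this: each item's annotation set is rolled up the concept hierarchies to yield one coordinate per analysis dimension.

```python
# Minimal sketch of the data-normalization idea: reduce each item's semantic
# annotations to one point per analysis dimension by rolling annotations up
# a concept hierarchy. Hierarchies and annotations are hypothetical.
DIMENSIONS = {
    "disease": {"juvenile idiopathic arthritis": "arthritis",
                "arthritis": "musculoskeletal disease"},
    "anatomy": {"knee joint": "joint", "joint": "skeletal system"},
}

def roll_up(concept, hierarchy, level=1):
    """Climb `level` steps up the is-a hierarchy (or stop at the root)."""
    for _ in range(level):
        concept = hierarchy.get(concept, concept)
    return concept

def to_point(annotations):
    """Map an item's annotation set to one coordinate per dimension."""
    point = {}
    for dim, hierarchy in DIMENSIONS.items():
        hits = [a for a in annotations if a in hierarchy or a in hierarchy.values()]
        point[dim] = roll_up(hits[0], hierarchy) if hits else None
    return point

abstract_annotations = {"juvenile idiopathic arthritis", "knee joint"}
print(to_point(abstract_annotations))
# {'disease': 'arthritis', 'anatomy': 'joint'}
```

Once every item is reduced to such a point, OLAP-like roll-up and slice operators can be applied over the dimensions, which is the kind of exploration the 3D-Browser tool described above provides.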


Subjects
Biomedical Research , Computational Biology/methods , Semantics , Biological Ontologies , Computer Graphics , Data Collection , Databases, Factual , Humans , PubMed
6.
J Biomed Semantics ; 2 Suppl 1: S2, 2011 Mar 07.
Article in English | MEDLINE | ID: mdl-21388571

ABSTRACT

BACKGROUND: The UMLS Metathesaurus (UMLS-Meta) is currently the most comprehensive effort for integrating independently-developed medical thesauri and ontologies. UMLS-Meta is being used in many applications, including PubMed and ClinicalTrials.gov. The integration of new sources combines automatic techniques, expert assessment, and auditing protocols. The automatic techniques currently in use, however, are mostly based on lexical algorithms and often disregard the semantics of the sources being integrated. RESULTS: In this paper, we argue that UMLS-Meta's current design and auditing methodologies could be significantly enhanced by taking into account the logic-based semantics of the ontology sources. We provide empirical evidence suggesting that UMLS-Meta in its 2009AA version contains a significant number of errors; these errors become immediately apparent if the rich semantics of the ontology sources is taken into account, manifesting themselves as unintended logical consequences that follow from the ontology sources together with the information in UMLS-Meta. We then propose general principles and specific logic-based techniques to effectively detect and repair such errors. CONCLUSIONS: Our results suggest that the methodologies employed in the design of UMLS-Meta are not only very costly in terms of human effort, but also error-prone. The techniques presented here can be useful for both reducing human effort in the design and maintenance of UMLS-Meta and improving the quality of its contents.
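As a toy illustration of the kind of unintended logical consequence the paper targets, the sketch below merges is-a axioms from hypothetical source ontologies with UMLS-style synonymy and computes the entailed subsumptions; a real audit would use a description-logic reasoner over the actual sources, and the concept names here are made up.

```python
# Toy illustration of an unintended consequence: combining is-a axioms from
# the source ontologies with UMLS-Meta synonymy (treated as equivalence) can
# entail subsumptions nobody intended. Concept names are made up.
from itertools import product

# is-a edges taken from (hypothetical) source ontologies: child -> parent
IS_A = {("Renal cyst", "Kidney disease"),
        ("Kidney disease", "Disorder")}

# UMLS-Meta groups these source terms under one concept (CUI), i.e. synonymy.
SAME_CUI = {("Kidney disease", "Renal cyst")}

def entailed_is_a(is_a, same_cui):
    """Transitive closure of is-a after merging synonymous terms."""
    edges = set(is_a)
    for a, b in same_cui:                       # equivalence => is-a both ways
        edges |= {(a, b), (b, a)}
    changed = True
    while changed:                              # naive transitive closure
        new = {(x, w) for (x, y), (z, w) in product(edges, edges) if y == z}
        changed = not new <= edges
        edges |= new
    return edges

for sub, sup in sorted(entailed_is_a(IS_A, SAME_CUI)):
    print(f"{sub} is-a {sup}")
# Among the output: "Kidney disease is-a Renal cyst" -- an unintended
# consequence that signals an over-merged UMLS concept.
```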

7.
BMC Bioinformatics ; 10 Suppl 10: S4, 2009 Oct 01.
Article in English | MEDLINE | ID: mdl-19796401

ABSTRACT

This paper explores how terminological resources can be used for ontology engineering. There are now several biomedical ontologies describing overlapping domains, but there is no clear correspondence between the concepts that are supposed to be equivalent or merely similar. These resources are valuable, but their integration and further development are expensive. Terminologies can support ontology development at several stages of the ontology lifecycle, e.g. ontology integration. In this paper we investigate the use of terminological resources throughout the ontology lifecycle. We claim that the proper creation and use of a shared thesaurus is a cornerstone for the successful application of Semantic Web technology within the life sciences. Moreover, we have applied our approach to a real scenario, the Health-e-Child (HeC) project, and we have evaluated the impact of filtering and re-organizing several resources. As a result, we have created a reference thesaurus for this project, named HeCTh.


Subjects
Computational Biology/methods , Information Storage and Retrieval/methods , Vocabulary, Controlled , Databases, Factual , Internet , Natural Language Processing , Unified Medical Language System
8.
BMC Bioinformatics ; 10 Suppl 12: S7, 2009 Oct 15.
Article in English | MEDLINE | ID: mdl-19828083

ABSTRACT

BACKGROUND: Today's public database infrastructure spans a very large collection of heterogeneous biological data, opening new opportunities for molecular biology, biomedical and bioinformatics research, but also raising new problems for their integration and computational processing. RESULTS: In this paper we survey the most interesting and novel approaches for representing, integrating and managing different kinds of biological data by exploiting XML and its related recommendations. Moreover, we present cutting-edge approaches for the appropriate management of heterogeneous biological data represented in XML. CONCLUSION: XML has succeeded in the integration of heterogeneous biomolecular information and has established itself as the syntactic glue for biological data sources. Nevertheless, a large variety of XML-based data formats have been proposed, which makes effective integration of bioinformatics data schemes difficult. The adoption of a few semantically rich standard formats is urgently needed to achieve seamless integration of current biological resources.


Subjects
Computational Biology/methods , Programming Languages , Databases, Factual , Databases, Genetic , Genomics , Mutation , Polymorphism, Genetic
9.
BMC Bioinformatics ; 9 Suppl 3: S3, 2008 Apr 11.
Article in English | MEDLINE | ID: mdl-18426548

ABSTRACT

BACKGROUND: In recent years, the recognition of semantic types from the biomedical scientific literature has focused on named entities such as protein and gene names (PGNs) and Gene Ontology terms (GO terms). Other semantic types, such as diseases, have not received the same level of attention. Different solutions have been proposed to identify disease named entities in the scientific literature. While matching the terminology with language patterns suffers from low recall (e.g., Whatizit), other solutions make use of morpho-syntactic features to better cover the full scope of terminological variability (e.g., MetaMap). Currently, MetaMap, provided by the National Library of Medicine (NLM), is the state-of-the-art solution for annotating UMLS (Unified Medical Language System) concepts in the literature. Nonetheless, its performance has not yet been assessed on an annotated corpus. In addition, little effort has been invested so far in generating an annotated dataset that links disease entities in text to disease entries in a database, thesaurus or ontology and that could serve as a gold standard for benchmarking text-mining solutions. RESULTS: As part of our research work, we have taken a corpus previously delivered for the identification of gene-disease associations based on the UMLS Metathesaurus, and we have reprocessed and re-annotated it. We gathered annotations for disease entities from two curators, analyzed their disagreement (0.51 in the kappa statistic) and composed a single annotated corpus for public use. Thereafter, three solutions for disease named entity recognition, including MetaMap, were applied to the corpus to automatically annotate it with UMLS Metathesaurus concepts, and the resulting annotations were benchmarked to compare their performance. CONCLUSIONS: The annotated corpus is publicly available at ftp://ftp.ebi.ac.uk/pub/software/textmining/corpora/diseases and can serve as a benchmark for other systems. In addition, we found that dictionary look-up already provides competitive results, indicating that disease terminology is highly standardized throughout the terminologies and the literature. MetaMap generates precise results at the expense of insufficient recall, while our statistical method obtains better recall at a lower precision rate. Even better precision is achieved by combining at least two of the three methods, but this approach again lowers recall. Altogether, our analysis gives a better understanding of the complexity of disease annotations in the literature. MetaMap and the dictionary-based approach are available through the Whatizit web service infrastructure (Rebholz-Schuhmann D, Arregui M, Gaudan S, Kirsch H, Jimeno A: Text processing through Web services: Calling Whatizit. Bioinformatics 2008, 24:296-298).
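As a rough sketch of the dictionary look-up baseline that the abstract reports as competitive, the snippet below matches a tiny illustrative term-to-CUI dictionary against a sentence; real systems such as MetaMap or Whatizit add morpho-syntactic normalization and disambiguation that this sketch omits.

```python
# Minimal sketch of dictionary look-up for disease mentions. The tiny
# dictionary (term -> UMLS CUI) and the sentence are illustrative.
import re

DISEASE_DICT = {
    "cystic fibrosis": "C0010674",
    "breast cancer": "C0006142",
    "diabetes mellitus": "C0011849",
}

# Longest terms first so "diabetes mellitus" wins over a shorter overlap.
pattern = re.compile(
    "|".join(re.escape(t) for t in sorted(DISEASE_DICT, key=len, reverse=True)),
    flags=re.IGNORECASE,
)

def annotate(text):
    """Return (surface form, CUI, start, end) for each dictionary hit."""
    return [(m.group(0), DISEASE_DICT[m.group(0).lower()], m.start(), m.end())
            for m in pattern.finditer(text)]

sentence = "Mutations in CFTR cause cystic fibrosis, unlike breast cancer."
print(annotate(sentence))
```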


Subjects
Algorithms , Artificial Intelligence , Disease/classification , Natural Language Processing , Pattern Recognition, Automated/methods , Terminology as Topic , Unified Medical Language System , Dictionaries as Topic , Semantics , Vocabulary, Controlled