Results 1 - 13 of 13
1.
Bioinformatics ; 25(12): i69-76, 2009 Jun 15.
Article in English | MEDLINE | ID: mdl-19478019

ABSTRACT

MOTIVATION: For many years, the Unified Medical Language System (UMLS) semantic network (SN) has been used as an upper-level semantic framework for the categorization of terms from terminological resources in biomedicine. BioTop has recently been developed as an upper-level ontology for the biomedical domain. In contrast to the SN, it is founded upon strict ontological principles, using OWL DL as a formal representation language, which has become standard in the semantic Web. In order to make logic-based reasoning available for the resources annotated or categorized with the SN, a mapping ontology was developed aligning the SN with BioTop. METHODS: The theoretical foundations and the practical realization of the alignment are described, with a focus on the design decisions taken, the problems encountered and the adaptations of BioTop that became necessary. For evaluation purposes, UMLS concept pairs obtained from MEDLINE abstracts by a named entity recognition system were tested for possible semantic relationships. Furthermore, all semantic-type combinations that occur in the UMLS Metathesaurus were checked for satisfiability. RESULTS: The effort-intensive alignment process required major design changes and enhancements of BioTop and brought up several design errors that could be fixed. A comparison between a human curator and the ontology yielded only low agreement. Ontology reasoning was also used to successfully identify 133 inconsistent semantic-type combinations. AVAILABILITY: BioTop, the OWL DL representation of the UMLS SN, and the mapping ontology are available at http://www.purl.org/biotop/.


Subjects
Computational Biology/methods , Information Storage and Retrieval/methods , Unified Medical Language System/standards , Databases, Factual , Pattern Recognition, Automated , Semantics , Vocabulary, Controlled
2.
Bioinformatics ; 25(16): 2064-70, 2009 Aug 15.
Article in English | MEDLINE | ID: mdl-19429601

ABSTRACT

MOTIVATION: The high level of polymorphism associated with the major histocompatibility complex (MHC) poses a challenge to organizing associated bioinformatic data, particularly in the area of hematopoietic stem cell transplantation. Thus, this area of research has great potential to profit from the ongoing development of biomedical ontologies, which offer structure and definition to MHC-data related communication and portability issues. RESULTS: We introduce the design considerations, methodological foundations and implementational issues underlying MaHCO, an ontology which represents the alleles and encoded molecules of the major histocompatibility complex. Importantly for human immunogenetics, it includes a detailed level of human leukocyte antigen (HLA) classification. We then present an ontology browser, search interfaces for immunogenetic fact and document retrieval, and the specification of an annotation language for semantic metadata, based on MaHCO. These use cases are intended to demonstrate the utility of ontology-driven bioinformatics in the field of immunogenetics. AVAILABILITY AND IMPLEMENTATION: The MaHCO Ontology is available via the BioPortal: http://www.bioontology.org/tools/portal/bioportal.html, and at: http://purl.org/stemnet/.


Subjects
Computational Biology/methods , Major Histocompatibility Complex , Alleles , Database Management Systems , Databases, Protein , Humans , Information Storage and Retrieval
3.
Stud Health Technol Inform ; 160(Pt 2): 1030-4, 2010.
Article in English | MEDLINE | ID: mdl-20841840

ABSTRACT

Terminologies which lack semantic connectivity hamper the effective search in biomedical fact databases and document retrieval systems. We here focus on the integration of two such isolated resources, the term lists from the protein fact database UNIPROT and the indexing vocabulary MESH from the bibliographic database MEDLINE. The generated semantic ties result from string matching and term set inclusion. We investigated the implicit terminological overlap between both resources in the domain of human proteins and evaluated our approach on a sample of 550 randomly selected UNIPROT entries that were manually mapped to their corresponding MESH headings. We achieved 90% precision and 79% recall (applying taxonomy-sensitive metrics). Fortunately, those proteins we were able to map to the MESH are ten times as frequently discussed in the literature as those on which we failed.


Subjects
Databases, Protein , Medical Subject Headings , Terminology as Topic , Databases, Bibliographic , Humans , MEDLINE , Proteins/classification , United States , Vocabulary, Controlled
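
The abstract of entry 3 says its semantic ties result from string matching and term set inclusion. The following is a minimal sketch of what such a mapping could look like, not the authors' implementation. The normalization rules, the function names (`normalize`, `map_term`, `precision_recall`) and the sample headings are assumptions made for illustration. The precision and recall shown here are the plain set-based versions, not the taxonomy-sensitive metrics reported in the abstract.

```python
import re


def normalize(term):
    """Lowercase, turn punctuation into spaces, collapse whitespace (assumed rules)."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", term.lower()).split())


def tokens(term):
    """Set of normalized word tokens of a term."""
    return frozenset(normalize(term).split())


def map_term(term, headings):
    """Map one source term to candidate headings.

    First try an exact match on the normalized strings. If none is found,
    fall back to set inclusion: a heading matches when all of its tokens
    occur among the tokens of the source term.
    """
    norm = normalize(term)
    exact = [(h, "exact") for h in headings if normalize(h) == norm]
    if exact:
        return exact
    term_tokens = tokens(term)
    return [
        (h, "inclusion")
        for h in headings
        if tokens(h) and tokens(h) <= term_tokens
    ]


def precision_recall(predicted, gold):
    """Plain set-based precision and recall over mapped pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Illustrative strings only; they are not taken from the evaluated sample.
sample_headings = ["Insulin", "Receptor, Insulin", "Hemoglobins"]
print(map_term("Insulin receptor substrate 1", sample_headings))
```

Set inclusion deliberately over-generates (the sample term matches both "Insulin" and "Receptor, Insulin"), so a real system would need a further ranking or filtering step that this sketch leaves out.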
4.
Stud Health Technol Inform ; 136: 9-14, 2008.
Article in English | MEDLINE | ID: mdl-18487700

ABSTRACT

The Gene Regulation Ontology (GRO) is designed as a novel approach to model complex events that are part of the gene regulatory processes. We introduce the design requirements for such a conceptual model and discuss terminological resources suitable to base its construction on. The ontology defines gene regulation events in terms of ontological classes and imposes constraints on them by specifying the participants involved. The logical structure of the ontology is intended to meet the needs of advanced information extraction and text mining systems which target the identification of event representations in scientific literature. The GRO has just been submitted to the OBO library and is currently under review. It is available at http://www.ebi.ac.uk/Rebholz-srv/GRO/GRO.html.


Subjects
Databases, Genetic , Gene Expression Regulation , Information Storage and Retrieval , Natural Language Processing , Computational Biology , Decision Support Techniques , Humans , Molecular Biology , Software , Transcription, Genetic , Vocabulary, Controlled
5.
Stud Health Technol Inform ; 129(Pt 2): 1225-9, 2007.
Article in English | MEDLINE | ID: mdl-17911910

ABSTRACT

In this paper we present the ongoing development and extension work on BioTop--a top-domain ontology for linking biomedical domain ontologies. We start by making the case for the application of a common ontology to interface independent biomedical domain ontologies by introducing a set of more general classes. Then we briefly depict the relation of BioTop to the GENIA ontology as the starting point of its initial development. Afterwards we propose our distinction of ontologies into top, top-domain and domain ones and describe our approach to the integration of the top ontology BFO into BioTop. Then we present our plans to join the OBO and OBO Foundry repository of ontologies and list its admission principles in relation to our ontology. Some actual BioTop interface classes are shown subsequently. We conclude by detailing some planned BioTop usages in the area of BioNLP and cancer research and show some further intended improvements.


Subjects
Biological Science Disciplines/classification , Vocabulary, Controlled , Biomedical Research , Medicine/classification
6.
Prev Vet Med ; 117(1): 180-8, 2014 Nov 01.
Article in English | MEDLINE | ID: mdl-25241618

ABSTRACT

The importance of livestock-associated methicillin-resistant Staphylococcus aureus (LA-MRSA) as an infectious agent for humans has increased in recent years in Germany. Although it is well known that the prevalence of MRSA in pig farms is high, risk factors for the presence of MRSA in herds of fattening pigs are still poorly understood. The aim of this study was to evaluate available data from previous studies on MRSA in fattening pigs in a meta-analysis to answer the question: What are the factors associated with the occurrence of MRSA in fattening pig herds? The studies on MRSA in pigs that were identified by literature research were heterogeneous with respect to the risk factors investigated and the type of herds focused on. Therefore we decided to carry out a pooling analysis on herd level rather than a typical meta-analysis. Eligible herd data were identified based on the published literature and communication with the authors. The final data set covered 400 fattening pig herds from 10 different studies and 12 risk factors. The prevalence of MRSA in the 400 fattening pig herds was 53.5%. Data were analyzed using generalized estimating equations (GEE). The resulting multivariate model confirmed previously identified risk factors for MRSA in pig herds (herd size and herd type). It also identified further risk factors: group treatment of fattening pigs with antimicrobial drugs (OR=1.79) and housing fattening pig herds on at least partially slatted floors (OR=2.39) compared to plain floor. In contrast, according to the model, fattening pig herds on farms keeping other livestock along with pigs were less likely to harbor MRSA (OR=0.54). The results underline the benefits from a pooling analysis and cooperative re-evaluation of published data.


Subjects
Methicillin-Resistant Staphylococcus aureus , Staphylococcal Infections/veterinary , Swine Diseases/microbiology , Animals , Risk Factors , Staphylococcal Infections/microbiology , Swine
7.
J Biomed Semantics ; 3 Suppl 1: S4, 2012 Apr 24.
Article in English | MEDLINE | ID: mdl-22541595

ABSTRACT

Identifying relationships between hitherto unrelated entities in different ontologies is the key task of ontology alignment. An alignment is either manually created by domain experts or automatically by an alignment system. In recent years, several alignment systems have been made available, each using its own set of methods for relation detection. To evaluate and compare these systems, typically a manually created alignment is used, the so-called reference alignment. Based on our experience with several of these reference alignments we derived requirements and translated them into simple quality checks to ensure the alignments' validity and also their reusability. In this article, these quality checks are applied to a standard reference alignment in the biomedical domain, the Ontology Alignment Evaluation Initiative Anatomy track reference alignment, and two more recent data sets covering multiple domains, including but not restricted to anatomy and biology.

8.
Pac Symp Biocomput ; : 376-87, 2012.
Article in English | MEDLINE | ID: mdl-22174293

ABSTRACT

In this paper, we report on adapting the JREX relation extraction engine, originally developed for the elicitation of protein-protein interaction relations, to the domains of pharmacogenetics and pharmacogenomics. We propose an intrinsic and an extrinsic evaluation scenario which is based on knowledge contained in the PharmGKB knowledge base. Porting JREX yields favorable results in the range of 80% F-score for Gene-Disease, Gene-Drug, and Drug-Disease relations.


Subjects
Knowledge Bases , Pharmacogenetics/statistics & numerical data , Aryl Hydrocarbon Hydroxylases/genetics , Aryl Hydrocarbon Hydroxylases/metabolism , Breast Neoplasms/drug therapy , Breast Neoplasms/genetics , Citalopram/pharmacokinetics , Computational Biology , Cytochrome P-450 CYP2C19 , Databases, Factual , Databases, Genetic , Docetaxel , Female , Genes, BRCA2 , Genetic Predisposition to Disease , Humans , Obesity/genetics , Pharmacogenetics/standards , Pharmacokinetics , Protein Interaction Mapping/statistics & numerical data , Taxoids/therapeutic use , Urocortins/genetics
9.
AMIA Annu Symp Proc ; 2012: 301-10, 2012.
Article in English | MEDLINE | ID: mdl-23304300

ABSTRACT

We report on basic design decisions and novel annotation procedures underlying the development of PathoJen, a corpus of Medline abstracts annotated for pathological phenomena, including diseases as a proper subclass. This named entity type is known to be hard to delineate and capture by annotation guidelines. We here propose a two-category encoding schema where we distinguish short from long mention spans, the first covering standardized terminology (e.g. diseases), the latter accounting for less structured descriptive statements about norm-deviant states, as well as criteria and observations that might signal pathologies. The second design decision relates to the way annotation instances are sampled. Here we subscribe to an Active Learning-based approach which is known to save annotation costs without sacrificing annotation quality by means of a sample bias. By design, Active Learning picks up 'hard' to annotate instances for human annotators, whereas 'easier' ones are passed over to the automatic classifier whose models already incorporate and gradually improve with previous annotation experience.


Subjects
Algorithms , Artificial Intelligence , Pathology/classification , Problem-Based Learning , Humans , MEDLINE
10.
J Biomed Semantics ; 2 Suppl 5: S11, 2011 Oct 06.
Article in English | MEDLINE | ID: mdl-22166494

ABSTRACT

BACKGROUND: Competitions in text mining have been used to measure the performance of automatic text processing solutions against a manually annotated gold standard corpus (GSC). The preparation of the GSC is time-consuming and costly, and the final corpus consists of at most a few thousand documents annotated with a limited set of semantic groups. To overcome these shortcomings, the CALBC project partners (PPs) have produced a large-scale annotated biomedical corpus with four different semantic groups through the harmonisation of annotations from automatic text mining solutions, the first version of the Silver Standard Corpus (SSC-I). The four semantic groups are chemical entities and drugs (CHED), genes and proteins (PRGE), diseases and disorders (DISO) and species (SPE). This corpus has been used for the First CALBC Challenge, which asked the participants to annotate the corpus with their text processing solutions. RESULTS: All four PPs from the CALBC project and, in addition, 12 challenge participants (CPs) contributed annotated data sets for an evaluation against the SSC-I. CPs could ignore the training data and deliver the annotations from their genuine annotation system, or could train a machine-learning approach on the provided pre-annotated data. In general, the performances of the annotation solutions were lower for entities from the categories CHED and PRGE in comparison to the identification of entities categorized as DISO and SPE. The best performance over all semantic groups was achieved by two annotation solutions that had been trained on the SSC-I. The data sets from participants were used to generate the harmonised Silver Standard Corpus II (SSC-II), provided that the participant had not made use of the annotated data set from the SSC-I for training purposes. The performances of the participants' solutions were again measured against the SSC-II. The performances of the annotation solutions showed again better results for DISO and SPE in comparison to CHED and PRGE. CONCLUSIONS: The SSC-I delivers a large set of annotations (1,121,705) for a large number of documents (100,000 Medline abstracts). The annotations cover four different semantic groups and are sufficiently homogeneous to be reproduced with a trained classifier leading to an average F-measure of 85%. Benchmarking the annotation solutions against the SSC-II leads to better performance for the CPs' annotation solutions in comparison to the SSC-I.
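
Entry 10 reports results as F-measure against a reference corpus. As a rough sketch of how such a score can be computed, assuming exact-match comparison of annotations represented as `(document_id, start, end, semantic_group)` tuples: the tuple layout, the function name `f_measure` and the sample annotations are hypothetical, and the CALBC challenge may have used different matching criteria.

```python
def f_measure(predicted, reference, beta=1.0):
    """F-measure of predicted annotations against a reference set.

    Both arguments are sets of hashable annotations; only exact matches
    count as true positives (an assumption of this sketch).
    """
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical annotations: (document_id, start, end, semantic_group)
reference = {
    ("doc1", 0, 7, "PRGE"),
    ("doc1", 20, 28, "DISO"),
    ("doc2", 5, 12, "SPE"),
    ("doc2", 30, 39, "CHED"),
}
predicted = {
    ("doc1", 0, 7, "PRGE"),
    ("doc1", 20, 28, "DISO"),
    ("doc2", 6, 12, "SPE"),  # boundary differs, so not an exact match
}

print(f_measure(predicted, reference))
```

With `beta=1.0` this is the harmonic mean of precision and recall; filtering both sets by semantic group before the call gives per-group scores.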

11.
J Bioinform Comput Biol ; 8(1): 163-79, 2010 Feb.
Article in English | MEDLINE | ID: mdl-20183881

ABSTRACT

The CALBC initiative aims to provide a large-scale biomedical text corpus that contains semantic annotations for named entities of different kinds. The generation of this corpus requires that the annotations from different automatic annotation systems be harmonized. In the first phase, the annotation systems from five participants (EMBL-EBI, EMC Rotterdam, NLM, JULIE Lab Jena, and Linguamatics) were gathered. All annotations were delivered in a common annotation format that included concept identifiers in the boundary assignments and that enabled comparison and alignment of the results. During the harmonization phase, the results produced from those different systems were integrated in a single harmonized corpus ("silver standard" corpus) by applying a voting scheme. We give an overview of the processed data and the principles of harmonization--formal boundary reconciliation and semantic matching of named entities. Finally, all submissions of the participants were evaluated against that silver standard corpus. We found that species and disease annotations are better standardized amongst the partners than the annotations of genes and proteins. The raw corpus is now available for additional named entity annotations. Parts of it will be made available later on for a public challenge. We expect that we can improve corpus building activities both in terms of the numbers of named entity classes being covered, as well as the size of the corpus in terms of annotated documents.


Subjects
Computational Biology/standards , Data Mining/standards , Cooperative Behavior , Data Mining/statistics & numerical data , Databases, Factual/statistics & numerical data , Unified Medical Language System
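
Entry 11 describes integrating the output of several annotation systems into a single "silver standard" corpus by applying a voting scheme. A minimal sketch of one possible voting scheme follows. It assumes annotations are `(document_id, start, end, entity_type)` tuples and that only identical tuples count as agreement. The function name `harmonize`, the `min_votes` threshold and the sample data are hypothetical, and the boundary reconciliation and semantic matching mentioned in the abstract are not modelled.

```python
from collections import Counter


def harmonize(system_annotations, min_votes=2):
    """Keep every annotation proposed by at least `min_votes` systems.

    `system_annotations` is a list with one collection of annotations per
    system. Each system contributes at most one vote per annotation.
    """
    votes = Counter()
    for annotations in system_annotations:
        votes.update(set(annotations))
    return {annotation for annotation, count in votes.items() if count >= min_votes}


# Hypothetical output of three annotation systems
system_a = [("doc1", 0, 7, "gene"), ("doc1", 20, 28, "disease")]
system_b = [("doc1", 0, 7, "gene")]
system_c = [("doc1", 0, 7, "gene"), ("doc1", 20, 28, "disease"), ("doc1", 40, 45, "species")]

silver = harmonize([system_a, system_b, system_c], min_votes=2)
print(sorted(silver))
```

Raising `min_votes` trades recall for precision in the resulting corpus: a higher threshold keeps only annotations that more systems agree on.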
12.
AMIA Annu Symp Proc ; : 41-5, 2007 Oct 11.
Article in English | MEDLINE | ID: mdl-18693794

ABSTRACT

We present a formally coherent and consistent multi-species MHC ontology which includes all human MHC alleles and serological groups. The ontology is part of StemNet, a knowledge management system for hematopoietic stem cell transplantation with an integrated semantic search engine. The OWL-encoded MHC ontology contributes to the system in a threefold manner. First, it supports query formulation and query processing as well as mapping onto external terminological resources; second, it eases the interaction with the search engine when navigating through search results; and finally, it provides a formal language for text annotation, a methodological prerequisite for state-of-the-art natural language text processors which are increasingly based on machine learning methods and hence require annotated text corpora.


Subjects
Major Histocompatibility Complex , Vocabulary, Controlled , Alleles , Animals , Artificial Intelligence , Humans , Immunogenetics , Major Histocompatibility Complex/genetics , Natural Language Processing , Programming Languages , Proteins/classification
13.
AMIA Annu Symp Proc ; : 694-8, 2006.
Article in English | MEDLINE | ID: mdl-17238430

ABSTRACT

There is a growing need for the general-purpose description of the basic conceptual entities in the life sciences. Up until now, upper level models have mainly been purpose-driven, such as the GENIA ontology, originally devised as a vocabulary for corpus annotation. As an alternative, we here present BioTop, a description-logic-based top level ontology for molecular biology, which we consider as an ontologically conscious redesign of the GENIA ontology.


Subjects
Molecular Biology/classification , Vocabulary, Controlled