Results 1-5 of 5
1.
Artif Intell Med ; 39(2): 127-36, 2007 Feb.
Article in English | MEDLINE | ID: mdl-17052900

ABSTRACT

OBJECTIVE: The number of new discoveries in the biomedical area (as published in the scientific literature) is growing at an exponential rate. This growth makes it very difficult to filter the most relevant results, and thus the extraction of the core information becomes very expensive. Therefore, there is a growing interest in text processing approaches that can deliver selected information from scientific publications, which can limit the amount of human intervention normally needed to gather those results. MATERIALS AND METHODS: This paper presents and evaluates an approach aimed at automating the process of extracting functional relations (e.g. interactions between genes and proteins) from scientific literature in the biomedical domain. The approach, using a novel dependency-based parser, is based on a complete syntactic analysis of the corpus. RESULTS: We have implemented a state-of-the-art text mining system for biomedical literature, based on a deep-linguistic, full-parsing approach. The results are validated on two different corpora: the manually annotated genomics information access (GENIA) corpus and the automatically annotated Arabidopsis thaliana circadian rhythms (ATCR) corpus. CONCLUSION: We show how a deep-linguistic approach (contrary to common belief) can be used in a real-world text mining application, offering high-precision relation extraction while retaining sufficient recall.


Subjects
Automation, Linguistics, Periodicals as Topic, Proteins/chemistry, Proteins/physiology, Animals, Artificial Intelligence, Factual Databases, Humans, Mammals, Plant Proteins, Publishing, Semantics
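The dependency-based relation extraction described in entry 1 can be illustrated with a minimal sketch. It assumes that a sentence has already been parsed into (head, relation, dependent) arcs labelled with Universal Dependencies relations. The paper's own parser and its label set are not described in the abstract. The verb list, the function name `extract_relations`, and the protein names `ProtA`/`ProtB` are hypothetical. The sketch shows how a single rule over dependency arcs covers both active and passive clauses, which a surface pattern over word order would have to list separately.

```python
from collections import defaultdict

# Hypothetical interaction vocabulary (lemmas); the paper's actual lexicon is not given.
INTERACTION_VERBS = {"activate", "inhibit", "bind", "phosphorylate"}

def extract_relations(lemmas, deps, entities):
    """Extract (agent, verb, target) triples from one parsed sentence.

    lemmas:   list of token lemmas, indexed by token position
    deps:     list of (head_index, relation, dependent_index) arcs,
              using Universal Dependencies labels (an assumption)
    entities: set of known entity names (e.g. from a protein lexicon)
    """
    # Collect the syntactic arguments of every interaction-verb token.
    roles_by_verb = defaultdict(dict)
    for head, rel, dep in deps:
        if lemmas[head] in INTERACTION_VERBS:
            roles_by_verb[head][rel] = lemmas[dep]

    relations = []
    for verb_idx, roles in sorted(roles_by_verb.items()):
        # Active voice: nsubj is the agent, obj the target.
        # Passive voice: obl:agent is the agent, nsubj:pass the target.
        agent = roles.get("nsubj") or roles.get("obl:agent")
        target = roles.get("obj") or roles.get("nsubj:pass")
        # Keep only relations in which both arguments are known entities.
        if agent in entities and target in entities:
            relations.append((agent, lemmas[verb_idx], target))
    return relations


# "ProtA activates ProtB" (active) and "ProtB is inhibited by ProtA" (passive),
# with hand-written parses standing in for real parser output.
active = extract_relations(
    ["ProtA", "activate", "ProtB"],
    [(1, "nsubj", 0), (1, "obj", 2)],
    {"ProtA", "ProtB"},
)
passive = extract_relations(
    ["ProtB", "be", "inhibit", "by", "ProtA"],
    [(2, "nsubj:pass", 0), (2, "aux:pass", 1), (2, "obl:agent", 4), (4, "case", 3)],
    {"ProtA", "ProtB"},
)
print(active)   # [('ProtA', 'activate', 'ProtB')]
print(passive)  # [('ProtA', 'inhibit', 'ProtB')]
```

Both sentences yield the same normalized relation shape. The entity filter is one simple way to trade recall for precision.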
2.
BMC Bioinformatics ; 7 Suppl 3: S3, 2006 Nov 24.
Article in English | MEDLINE | ID: mdl-17134476

ABSTRACT

BACKGROUND: The biomedical domain is witnessing rapid growth in the amount of published scientific results, which makes it increasingly difficult to filter the core information. There is a real need for support tools that 'digest' the published results and extract the most important information. RESULTS: We describe and evaluate an environment supporting the extraction of domain-specific relations, such as protein-protein interactions, from a richly annotated corpus. We use full, deep-linguistic parsing and manually created, versatile patterns expressing a large set of syntactic alternations, plus semantic ontology information. CONCLUSION: The experiments show that the approach described is capable of delivering high-precision results while maintaining sufficient levels of recall. The high level of abstraction of the rules used by the system, which are considerably more powerful and versatile than finite-state approaches, allows speedy interactive development and validation.


Subjects
Abstracting and Indexing, Artificial Intelligence, Information Storage and Retrieval/methods, Natural Language Processing, Periodicals as Topic, Terminology as Topic, Controlled Vocabulary, Algorithms, Factual Databases, Semantics, Software
3.
Artif Intell Med ; 52(2): 107-14, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21652190

ABSTRACT

OBJECTIVE: We present a combined terminological resource for text mining over biomedical literature. The purpose of the resource is to allow the detection of mentions of specific biological entities in scientific publications and their grounding to widely accepted identifiers. This is an essential process, useful in itself, and necessary as an intermediate step for almost every type of complex text mining application. METHODS: We discuss some of the properties of the terminology for this domain, in particular the degree of ambiguity, which poses a particular problem for text mining applications. Without correct recognition and disambiguation of the domain entities, no reliable results can be produced. RESULTS: We also discuss an application that makes use of the resulting terminological knowledge base. We annotate an existing corpus of sentences about protein interactions. The annotation consists of a normalization step that matches the terms in our resource with their actual representation in the corpus, and a disambiguation step that resolves the ambiguity of matched terms. CONCLUSION: In this paper we present a large terminological resource, compiled through the aggregation of a number of different manually curated sources. We discuss the lexical properties of such resources, specifically the degree of ambiguity of the terms, and we inspect the causes of such ambiguity, in particular for protein names. This information is of vital importance for the implementation of an efficient term normalization and grounding algorithm.


Subjects
Computational Biology/methods, Data Mining/methods, Bibliographic Databases, Algorithms, Publications, Controlled Vocabulary
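Entry 3 describes grounding as a normalization step followed by a disambiguation step. The sketch below shows that two-step pipeline under several assumptions. The lexicon is assumed to be a list of (term, identifier, organism) entries. The normalization rules (case, whitespace, hyphens, slashes) and the organism-based disambiguation heuristic are illustrative choices, not the resource's actual method. The identifiers `ID_H1` and `ID_M1` are hypothetical placeholders.

```python
import re
from collections import defaultdict

def normalize(term):
    """Collapse common surface variation: case, whitespace, hyphens, slashes."""
    return re.sub(r"[\s\-/]+", "", term.lower())

def build_index(entries):
    """Map each normalized term to the set of (identifier, organism) pairs
    it may denote. entries: iterable of (term, identifier, organism)."""
    index = defaultdict(set)
    for term, identifier, organism in entries:
        index[normalize(term)].add((identifier, organism))
    return index

def ground(mention, index, context_organisms):
    """Normalization: look the mention up by its normalized form.
    Disambiguation (hypothetical heuristic): prefer candidates whose organism
    is mentioned in the document; if none match, keep all candidates."""
    candidates = index.get(normalize(mention), set())
    preferred = {c for c in candidates if c[1] in context_organisms}
    return sorted(preferred or candidates)


index = build_index([
    ("IL-2", "ID_H1", "human"),
    ("interleukin 2", "ID_H1", "human"),
    ("Il2", "ID_M1", "mouse"),
])
print(ground("IL2", index, {"mouse"}))  # [('ID_M1', 'mouse')]
print(ground("IL 2", index, set()))     # [('ID_H1', 'human'), ('ID_M1', 'mouse')]
```

Falling back to all candidates when no organism matches favours recall. A precision-oriented variant would return an empty list instead.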
4.
Article in English | MEDLINE | ID: mdl-20671319

ABSTRACT

We describe a system for the detection of mentions of protein-protein interactions in the biomedical scientific literature. The original system was developed as part of the OntoGene project, which focuses on using advanced computational linguistic techniques for text mining applications in the biomedical domain. In this paper, we focus in particular on our participation in the BioCreative II.5 challenge, where the OntoGene system achieved best-ranked results. Additionally, we describe a feature-analysis experiment performed after the challenge, which yields the unexpected result that a single feature alone performs better than the combination of features used in the challenge.


Subjects
Computational Biology/methods, Data Mining/methods, Protein Interaction Mapping/methods, Bibliographic Databases, Natural Language Processing, Scientific Societies
5.
Genome Biol ; 9 Suppl 2: S13, 2008.
Article in English | MEDLINE | ID: mdl-18834491

ABSTRACT

BACKGROUND: Research scientists and companies working in the domains of biomedicine and genomics are increasingly faced with the problem of efficiently locating, within the vast body of published scientific findings, the critical pieces of information that are needed to direct current and future research investment. RESULTS: In this report we describe approaches taken within the scope of the second BioCreative competition to solve two aspects of this problem: detection of novel protein interactions reported in scientific articles, and detection of the experimental method that was used to confirm the interaction. Our approach to the former problem is based on a high-recall protein annotation step, followed by two strict disambiguation steps. The remaining proteins are then combined according to a number of lexico-syntactic filters, which deliver high-precision results while maintaining reasonable recall. The detection of the experimental methods is tackled by a pattern-matching approach, which delivered the best results in the official BioCreative evaluation. CONCLUSION: Although the results of BioCreative clearly show that no tool is sufficiently reliable for fully automated annotation, a few of the proposed approaches (including our own) already perform at a competitive level. This makes them interesting either as standalone tools for preliminary document inspection, or as modules within an environment aimed at supporting the curation of biomedical literature.


Subjects
Computational Biology/methods, Genes, Scientific Societies, Abstracting and Indexing, Internet, Protein Interaction Mapping, Reproducibility of Results
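Entry 5 mentions a pattern-matching approach for detecting the experimental method used to confirm an interaction. The sketch below implements that idea with regular expressions. The three method labels, their patterns, and the name `detect_methods` are hypothetical examples; they are not the pattern set used in the competition.

```python
import re

# Hypothetical method patterns; each tolerates a few common spelling variants.
METHOD_PATTERNS = {
    "two hybrid": re.compile(r"\b(yeast\s+)?two[\s-]hybrid\b", re.I),
    "coimmunoprecipitation": re.compile(r"\bco-?immunoprecipitat\w*", re.I),
    "pull down": re.compile(r"\bpull[\s-]?down\b", re.I),
}

def detect_methods(text):
    """Return the sorted list of method labels whose pattern matches the text."""
    return sorted(label for label, pattern in METHOD_PATTERNS.items()
                  if pattern.search(text))


print(detect_methods("Binding was confirmed by co-immunoprecipitation "
                     "and a yeast two-hybrid screen."))
# ['coimmunoprecipitation', 'two hybrid']
print(detect_methods("GST pull-down assays showed a direct interaction."))
# ['pull down']
```

Matching of this kind tends to favour precision. Its recall depends on how many surface variants each pattern covers.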