2.
Article in English | MEDLINE | ID: mdl-11072322

ABSTRACT

We selected the most frequent verbs from a raw corpus of 1 million words of Medline abstracts and identified (bracketed) the noun phrases in that corpus with a precision of 90%. Then, using the noun-phrase-bracketed corpus, we tried to find the subject and object terms of some verbs that occur frequently in the domain. The precision of finding the correct subject and object for each verb was about 73%. This task was possible only because we were able to linguistically analyze (parse) a large raw corpus. Our approach will be useful for classifying genes and gene products and for identifying the interactions between them. It is the first step in our effort to build a genome-related thesaurus and hierarchies fully automatically.
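The abstract does not say how subjects and objects were located, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical notation in which noun phrases are wrapped in square brackets, and it uses a naive heuristic: the nearest noun phrase to the left of the verb is taken as the subject and the nearest to the right as the object. The function names, the bracket notation, and the example sentence are all invented for illustration. The `precision` helper uses the standard definition (correct predictions divided by all predictions), which is also an assumption about how the reported figures were computed.

```python
import re

# Assumed (hypothetical) notation: a noun phrase is wrapped in [ ... ].
TOKEN_RE = re.compile(r"\[([^\]]+)\]|(\S+)")


def tokenize(bracketed):
    """Split a bracketed sentence into (kind, text) pairs: 'NP' or 'W' (word)."""
    tokens = []
    for match in TOKEN_RE.finditer(bracketed):
        if match.group(1) is not None:
            tokens.append(("NP", match.group(1)))
        else:
            tokens.append(("W", match.group(2)))
    return tokens


def subject_object(bracketed, verb):
    """Naive heuristic: nearest NP to the left is the subject, to the right the object."""
    tokens = tokenize(bracketed)
    pairs = []
    for i, (kind, text) in enumerate(tokens):
        if kind == "W" and text.lower() == verb.lower():
            subj = next((t for k, t in reversed(tokens[:i]) if k == "NP"), None)
            obj = next((t for k, t in tokens[i + 1:] if k == "NP"), None)
            pairs.append((subj, obj))
    return pairs


def precision(predicted, gold):
    """Fraction of predicted (subject, object) pairs that match the gold pairs."""
    if not predicted:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(predicted)


# Invented example sentence, not taken from the corpus described above.
sentence = "[IL-2] activates [NF-kappa B] in [T cells] ."
print(subject_object(sentence, "activates"))
```

A heuristic this simple breaks on passives, coordination, and intervening prepositional phrases, which is one reason a real system would rely on a parser rather than adjacency alone.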

3.
Bioinformatics ; 19 Suppl 1: i180-2, 2003.
Article in English | MEDLINE | ID: mdl-12855455

ABSTRACT

MOTIVATION: Natural language processing (NLP) methods are regarded as useful for raising the potential of text mining from biological literature. The lack of an extensively annotated corpus of this literature, however, is a major bottleneck for applying NLP techniques. The GENIA corpus is being developed to provide reference material that lets NLP techniques work for bio-text mining. RESULTS: GENIA corpus version 3.0, consisting of 2000 MEDLINE abstracts, has been released with more than 400,000 words and almost 100,000 annotations for biological terms.


Subjects
Abstracting and Indexing/methods , Biology/methods , Databases, Bibliographic , Information Storage and Retrieval/methods , Natural Language Processing , Periodicals as Topic , Terminology as Topic , Computational Biology/methods , Database Management Systems , Documentation , MEDLINE
4.
Pac Symp Biocomput ; : 408-19, 2001.
Article in English | MEDLINE | ID: mdl-11262959

ABSTRACT

We have designed and implemented an information extraction (IE) system that uses a full parser, in order to investigate the plausibility of full analysis of text using a general-purpose parser and grammar applied to the biomedical domain. We partially solved the problems of full parsing (inefficiency, ambiguity, and low coverage) by introducing preprocessors, and we proposed the use of modules that handle partial parsing results for further improvement. Our approach makes it possible to modularize the system, so that the IE system as a whole becomes easy to tune to specific domains and easy to maintain and improve by incorporating various techniques for disambiguation, speed-up, etc. In a preliminary experiment, of the 133 argument structures that should be extracted from 97 sentences, we obtained 23% uniquely and 24% with ambiguity. In addition, 20% are extractable from partial, rather than complete, results of full parsing.
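The percentages in the preliminary experiment can be turned into rough counts with a few lines of arithmetic. The sketch below assumes that the three categories (unique, ambiguous, and partial-parse extractions) do not overlap, which the abstract suggests but does not state. Because the abstract reports rounded percentages, the resulting counts are approximations and not figures taken from the paper. The variable and function names are illustrative.

```python
# Figures quoted in the abstract.
TOTAL_STRUCTURES = 133
reported_shares = {
    "unique": 0.23,     # extracted uniquely
    "ambiguous": 0.24,  # extracted with ambiguity
    "partial": 0.20,    # extractable from partial parse results
}


def approx_counts(total, shares):
    """Approximate counts implied by rounded percentages (not exact paper figures)."""
    return {name: round(total * share) for name, share in shares.items()}


counts = approx_counts(TOTAL_STRUCTURES, reported_shares)

# Assumption: the categories are disjoint, so their shares can be added.
combined_share = sum(reported_shares.values())

for name, count in counts.items():
    print(f"{name}: about {count} of {TOTAL_STRUCTURES}")
print(f"combined share: about {combined_share:.0%}")
```

Under the disjointness assumption, roughly two thirds of the target argument structures are reachable once partial parse results are used, compared with just under half from complete parses alone.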


Subjects
Natural Language Processing , Databases, Factual , Electronic Data Processing
5.
Article in English | MEDLINE | ID: mdl-11072324

ABSTRACT

Huge quantities of online medical texts such as Medline are available, and we would like to extract as much useful information from these resources as possible, ideally automatically, with the aid of computer technologies. In particular, recent advances in natural language processing (NLP) techniques raise new challenges and opportunities for tackling genome-related online text; combining NLP techniques with genome informatics extends beyond the traditional realms of either technology to a variety of emerging applications. In this paper, we describe some of our current efforts to develop NLP-based tools that tackle genome-related online documents for the information extraction task.
