Automating curation using a natural language processing pipeline.

Alex, Beatrice; Grover, Claire; Haddow, Barry; Kabadjov, Mijail; Klein, Ewan; Matthews, Michael; Tobin, Richard; Wang, Xinglong

Alex, Beatrice; Grover, Claire; Haddow, Barry; Kabadjov, Mijail; Klein, Ewan; Matthews, Michael; Tobin, Richard; Wang, Xinglong.

Afiliação

Alex B; School of Informatics, University of Edinburgh, Edinburgh, UK. balex@inf.ed.ac.uk

Genome Biol ; 9 Suppl 2: S10, 2008.

Article em En | MEDLINE | ID: mdl-18834488

RESUMO

BACKGROUND: The tasks in BioCreative II were designed to approximate some of the laborious work involved in curating biomedical research papers. The approach to these tasks taken by the University of Edinburgh team was to adapt and extend the existing natural language processing (NLP) system that we have developed as part of a commercial curation assistant. Although this paper concentrates on using NLP to assist with curation, the system can be equally employed to extract types of information from the literature that is immediately relevant to biologists in general. RESULTS: Our system was among the highest performing on the interaction subtasks, and competitive performance on the gene mention task was achieved with minimal development effort. For the gene normalization task, a string matching technique that can be quickly applied to new domains was shown to perform close to average. CONCLUSION: The technologies being developed were shown to be readily adapted to the BioCreative II tasks. Although high performance may be obtained on individual tasks such as gene mention recognition and normalization, and document classification, tasks in which a number of components must be combined, such as detection and normalization of interacting protein pairs, are still challenging for NLP systems.

Assuntos

Automação; Processamento de Linguagem Natural; Genes; Reprodutibilidade dos Testes

Texto completo

Imprimir

XML

PubMed Links

Buscar no Google

Texto completo: 1 Base de dados: MEDLINE Assunto principal: Automação / Processamento de Linguagem Natural Idioma: En Ano de publicação: 2008 Tipo de documento: Article

Texto completo

Imprimir

XML

PubMed Links

Buscar no Google

Texto completo: 1 Base de dados: MEDLINE Assunto principal: Automação / Processamento de Linguagem Natural Idioma: En Ano de publicação: 2008 Tipo de documento: Article