Automated PDF highlighting to support faster curation of literature for Parkinson's and Alzheimer's disease.

Wu, Honghan; Oellrich, Anika; Girges, Christine; de Bono, Bernard; Hubbard, Tim J P; Dobson, Richard J B

Affiliation

Wu H; Department of Biostatistics and Health Informatics, King's College London, De Crespigny Park, Denmark Hill London SE5 8AF, UK.
Oellrich A; School of Computer and Software, Nanjing University of Information Science and Technology, 219 Ningliu Road, Nanjing, China, 210044.
Girges C; Department of Biostatistics and Health Informatics, King's College London, De Crespigny Park, Denmark Hill London SE5 8AF, UK.
de Bono B; Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London, London Gower Street, WC1E 6BT, UK.
Hubbard TJ; Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London, London Gower Street, WC1E 6BT, UK.
Dobson RJ; Department of Medical and Molecular Genetics, King's College London, Guys Hospital, Great Maze Pond, London SE1 9RT, UK.

Database (Oxford); 2017(1), 2017-01-01.

Article in English | MEDLINE | ID: mdl-28365743

ABSTRACT

Neurodegenerative disorders such as Parkinson's and Alzheimer's disease are devastating and costly illnesses and a major source of global burden. To provide successful interventions for patients and reduce costs, both their causes and their pathological processes need to be understood. The ApiNATOMY project aims to contribute to our understanding of neurodegenerative disorders by manually curating and abstracting data from the vast body of literature amassed on these illnesses. As curation is labour-intensive, we aimed to speed up the process by automatically highlighting those parts of the PDF document of primary importance to the curator. Using techniques similar to those of summarisation, we developed an algorithm that relies on linguistic, semantic and spatial features. Employing this algorithm on a test set manually corrected for tool imprecision, we achieved a macro F1-measure of 0.51, which is an increase of 132% compared to the best bag-of-words baseline model. A user-based evaluation was also conducted to assess the usefulness of the methodology on 40 unseen publications. It revealed that in 85% of cases all highlighted sentences were relevant to the curation task, and that in about 65% of cases the highlights were sufficient to support the knowledge curation task without needing to consult the full text. In conclusion, we believe that these are promising results for a step in automating the recognition of curation-relevant sentences. Refining our approach to pre-digest papers will lead to faster processing and cost reduction in the curation process. Database URL: https://github.com/KHP-Informatics/NapEasy.
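
For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how a macro-averaged F1-measure is computed, together with the arithmetic behind a relative increase. It is not the authors' evaluation code. The function names, the choice of averaging unit (for example one score per paper) and the example counts are all assumptions made for illustration; only the figures 0.51 and 132% come from the abstract.

```python
from typing import List, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for one unit, from true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    # Convention assumed here: a unit with nothing to score contributes 0.0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts: List[Tuple[int, int, int]]) -> float:
    """Unweighted mean of the per-unit F1 scores (the 'macro' average)."""
    if not counts:
        return 0.0
    return sum(f1_score(tp, fp, fn) for tp, fp, fn in counts) / len(counts)


def implied_baseline(score: float, relative_increase_pct: float) -> float:
    """Baseline b such that score = b * (1 + increase / 100)."""
    return score / (1 + relative_increase_pct / 100)


# Hypothetical (tp, fp, fn) counts of highlighted sentences for three papers.
# These numbers are invented for illustration and are not taken from the study.
example_counts = [(4, 2, 2), (1, 1, 3), (3, 0, 3)]
print(round(macro_f1(example_counts), 3))  # 0.556

# Working backwards from the rounded figures in the abstract only.
print(round(implied_baseline(0.51, 132), 2))  # 0.22
```

Under these assumptions, the reported 132% increase corresponds to a best bag-of-words baseline of roughly 0.22, although the exact value depends on rounding in the published figures.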

Subjects

Alzheimer Disease; Data Curation/methods; Data Mining/methods; Parkinson Disease; Alzheimer Disease/genetics; Alzheimer Disease/metabolism; Animals; Data Curation/standards; Data Mining/standards; Humans; Parkinson Disease/genetics; Parkinson Disease/metabolism

Full text: 1
Collections: 01-international
Database: MEDLINE
Main subject: Parkinson Disease / Data Mining / Alzheimer Disease / Data Curation
Type of study: Prognostic_studies
Limits: Animals / Humans
Language: En
Journal: Database (Oxford)
Publication year: 2017
Document type: Article
Affiliation country: United Kingdom
