A crowdsourcing workflow for extracting chemical-induced disease relations from free text.

Li, Tong Shu; Bravo, Àlex; Furlong, Laura I; Good, Benjamin M; Su, Andrew I

Affiliation

Li TS; Department of Molecular and Experimental Medicine, the Scripps Research Institute, La Jolla, CA 92037, USA.
Bravo À; Research Programme on Biomedical Informatics (GRIB), IMIM, UPF, Barcelona, Spain.
Furlong LI; Research Programme on Biomedical Informatics (GRIB), IMIM, UPF, Barcelona, Spain.
Good BM; Department of Molecular and Experimental Medicine, the Scripps Research Institute, La Jolla, CA 92037, USA.
Su AI; Department of Molecular and Experimental Medicine, the Scripps Research Institute, La Jolla, CA 92037, USA. asu@scripps.edu

Database (Oxford); 2016.

Article in English | MEDLINE | ID: mdl-27087308

ABSTRACT

Relations between chemicals and diseases are among the most queried biomedical interactions. Although expert manual curation is the standard method for extracting these relations from the literature, it is expensive and impractical to apply to large numbers of documents, and therefore alternative methods are required. We describe here a crowdsourcing workflow for extracting chemical-induced disease relations from free text as part of the BioCreative V Chemical Disease Relation challenge. Five non-expert workers on the CrowdFlower platform were shown each potential chemical-induced disease relation highlighted in the original source text and asked to make binary judgments about whether the text supported the relation. Worker responses were aggregated through voting, and relations receiving four or more votes were predicted as true. On the official evaluation dataset of 500 PubMed abstracts, the crowd attained a 0.505 F-score (0.475 precision, 0.540 recall), with a maximum theoretical recall of 0.751 due to errors with named entity recognition. The total crowdsourcing cost was $1290.67 ($2.58 per abstract) and took a total of 7 h. A qualitative error analysis revealed that 46.66% of sampled errors were due to task limitations and gold standard errors, indicating that performance can still be improved. All code and results are publicly available at https://github.com/SuLab/crowd_cid_relex. Database URL: https://github.com/SuLab/crowd_cid_relex.
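The voting rule and the reported figures in the abstract can be illustrated with a minimal sketch. This is not the authors' code (which is in the repository linked above); the function names and the relation identifiers below are hypothetical, and the sketch assumes each worker response arrives as a (relation, supported) pair.

```python
from collections import Counter

# From the abstract: a relation is predicted true when it receives
# four or more of the five worker votes.
VOTE_THRESHOLD = 4


def aggregate(judgments, threshold=VOTE_THRESHOLD):
    """Aggregate binary worker judgments by voting.

    judgments: iterable of (relation_id, supported) pairs, one per worker
    response. The input layout is an assumption made for this sketch.
    Returns a dict mapping each relation_id to its predicted label.
    """
    votes = Counter()
    seen = set()
    for relation_id, supported in judgments:
        seen.add(relation_id)
        if supported:
            votes[relation_id] += 1
    return {rel: votes[rel] >= threshold for rel in seen}


def f_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical relations, five judgments each.
example = (
    [("chem_A->disease_X", True)] * 4 + [("chem_A->disease_X", False)]
    + [("chem_B->disease_Y", True)] * 3 + [("chem_B->disease_Y", False)] * 2
)
predictions = aggregate(example)

# Figures reported in the abstract.
reported_f = f_score(0.475, 0.540)
cost_per_abstract = 1290.67 / 500
```

Under these assumptions, the relation with four supporting votes is predicted true and the one with three is not. The reported precision and recall are consistent with the stated 0.505 F-score, and the total cost divided by 500 abstracts rounds to the stated $2.58.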

Subjects

Crowdsourcing; Data Curation/methods; Data Mining/methods; Databases, Factual; Disease/etiology; Hazardous Substances/toxicity; Humans; Workflow

Full text: 1
Collections: 01-international
Database: MEDLINE
Main subject: Hazardous Substances / Disease / Databases, Factual / Data Mining / Crowdsourcing / Data Curation
Type of study: Prognostic_studies / Qualitative_research / Risk_factors_studies
Limits: Humans
Language: En
Year of publication: 2016
Document type: Article
