Hybrid curation of gene-mutation relations combining automated extraction and crowdsourcing.

Burger, John D; Doughty, Emily; Khare, Ritu; Wei, Chih-Hsuan; Mishra, Rajashree; Aberdeen, John; Tresner-Kirsch, David; Wellner, Ben; Kann, Maricel G; Lu, Zhiyong; Hirschman, Lynette

Affiliation

Burger JD, Doughty E, Khare R, Wei CH, Mishra R, Aberdeen J, Tresner-Kirsch D, Wellner B, Kann MG, Lu Z, Hirschman L; The MITRE Corporation, Bedford, MA 01730, USA, Biomedical Informatics Program, Stanford University, Stanford, CA 94305, USA, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA and The University of Maryland, Baltimore C

Database (Oxford); 2014.

Article in En | MEDLINE | ID: mdl-25246425

ABSTRACT

BACKGROUND:

This article describes the capture of biological information using a hybrid approach that combines natural language processing, to extract biological entities, with crowdsourcing, in which annotators recruited via Amazon Mechanical Turk judge the correctness of candidate biological relations. These techniques were applied to extract gene-mutation relations from biomedical abstracts, with the goal of supporting production-scale capture of gene-mutation-disease findings as an open-source resource for personalized medicine.

RESULTS:

The hybrid system could be configured to provide good performance for gene-mutation extraction (precision ~82%; recall ~70% against an expert-generated gold standard) at a cost of $0.76 per abstract. This demonstrates that crowd labor platforms such as Amazon Mechanical Turk can be used to recruit quality annotators, even in an application requiring subject matter expertise; aggregated Turker judgments for gene-mutation relations exceeded 90% accuracy. Over half of the precision errors were due to mismatches against the gold standard hidden from annotator view (e.g., incorrect EntrezGene identifier or incorrect mutation position extracted), or incomplete task instructions (e.g., the need to exclude nonhuman mutations).
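The evaluation reported above can be illustrated with a minimal Python sketch. It assumes simple majority voting over yes/no judgments; the abstract does not state how the Turker judgments were aggregated, so this is not the authors' method. All names and the toy data (`geneA`, `mut1`, and so on) are hypothetical.

```python
def majority_vote(judgments):
    """Accept a candidate relation only if a strict majority of annotators said yes.

    judgments maps each candidate (gene, mutation) pair to a list of booleans,
    one per annotator. Ties count as rejections (an assumption of this sketch).
    """
    aggregated = {}
    for candidate, labels in judgments.items():
        yes_votes = sum(1 for label in labels if label)
        aggregated[candidate] = yes_votes * 2 > len(labels)
    return aggregated


def precision_recall(predicted, gold):
    """Compute precision and recall of a predicted set against a gold-standard set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: three annotator judgments per candidate relation.
judgments = {
    ("geneA", "mut1"): [True, True, False],
    ("geneB", "mut2"): [True, True, True],
    ("geneC", "mut3"): [False, False, True],
    ("geneD", "mut4"): [True, False, True],
}

# Hypothetical expert gold standard; it contains one relation that the
# automated extraction step never proposed, which lowers recall.
gold = {("geneA", "mut1"), ("geneB", "mut2"), ("geneE", "mut5")}

aggregated = majority_vote(judgments)
accepted = {candidate for candidate, ok in aggregated.items() if ok}
precision, recall = precision_recall(accepted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy setup, a relation accepted by the crowd but absent from the gold standard counts as a precision error, while a gold-standard relation that was never offered as a candidate counts against recall.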

CONCLUSIONS:

The hybrid curation model provides a readily scalable, cost-effective approach to curation, particularly if coupled with expert human review to filter precision errors. We plan to generalize the framework and make it available as open-source software. DATABASE URL: http://www.mitre.org/publications/technical-papers/hybrid-curation-of-gene-mutation-relations-combining-automated

Subject(s)

Crowdsourcing/methods; Data Curation/methods; Genetic Predisposition to Disease; Information Storage and Retrieval/methods; Mutation/genetics; Natural Language Processing; Computational Biology/methods; Crowdsourcing/economics; Data Curation/economics; Databases, Genetic; Genomics; Humans

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Natural Language Processing / Information Storage and Retrieval / Genetic Predisposition to Disease / Crowdsourcing / Data Curation / Mutation Type of study: Prognostic_studies Limits: Humans Language: En Journal: Database (Oxford) Year: 2014 Document type: Article
