Is the crowd better as an assistant or a replacement in ontology engineering? An exploration through the lens of the Gene Ontology.
Mortensen, Jonathan M; Telis, Natalie; Hughey, Jacob J; Fan-Minogue, Hua; Van Auken, Kimberly; Dumontier, Michel; Musen, Mark A.
  • Mortensen JM; Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305-5479, United States; Biomedical Informatics Training Program, Stanford University, Stanford, CA 94305-5479, United States.
  • Telis N; Biomedical Informatics Training Program, Stanford University, Stanford, CA 94305-5479, United States.
  • Hughey JJ; Institute of Computational Health Sciences, University of California, San Francisco, San Francisco, CA 94143, United States.
  • Fan-Minogue H; Department of Pediatrics, Stanford University, Stanford, CA 94305-5479, United States.
  • Van Auken K; Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, United States.
  • Dumontier M; Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305-5479, United States.
  • Musen MA; Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305-5479, United States. Electronic address: musen@stanford.edu.
J Biomed Inform; 60: 199-209, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26873781
ABSTRACT
Biomedical ontologies contain errors. Crowdsourcing, defined as taking a job traditionally performed by a designated agent and outsourcing it to an undefined, large group of people, provides scalable access to human judgment. The crowd therefore has the potential to overcome the limited accuracy and scalability of current ontology quality-assurance approaches. Crowd-based methods have identified errors in SNOMED CT, a large clinical ontology, with an accuracy similar to that of experts, suggesting that crowdsourcing is a feasible approach to identifying ontology errors. This work uses that same crowd-based methodology, as well as a panel of experts, to verify a subset of the Gene Ontology (200 relationships). Experts identified 16 errors, generally in relationships referencing acids and metals. The crowd performed poorly in identifying those errors, with an area under the receiver operating characteristic curve ranging from 0.44 to 0.73, depending on the method's configuration. However, when the crowd verified relationships that experts considered easy and that had useful definitions, it performed reasonably well. Notably, there are significantly fewer Google search results for Gene Ontology concepts than for SNOMED CT concepts. This disparity may account for the difference in performance: fewer search results indicate a more difficult task for the worker. The number of Internet search results could therefore serve as a way to assess which tasks are appropriate for the crowd. These results suggest that the crowd fits better as an expert assistant than as an expert replacement, completing the easy verification tasks so that experts can focus on the difficult ones.
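To make the evaluation concrete, here is a minimal sketch, with invented placeholder data, of how crowd verdicts can be scored against an expert gold standard via the area under the ROC curve; it does not reproduce the paper's actual pipeline or data.

```python
from typing import Sequence

def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the Mann-Whitney formulation: the probability that a
    randomly chosen true error receives a higher score than a randomly
    chosen correct relationship (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one error and one non-error")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented placeholder data: label 1 = experts flagged the relationship
# as erroneous; score = fraction of crowd workers who flagged it.
expert_labels = [1, 0, 0, 1, 0, 0, 1, 0]
crowd_scores = [0.9, 0.2, 0.4, 0.3, 0.1, 0.6, 0.7, 0.2]
print(f"crowd-vs-expert AUC = {roc_auc(expert_labels, crowd_scores):.2f}")
```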
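The search-result heuristic suggested in the abstract could be operationalized as a simple task router, sketched below under the assumption that a hit count per concept label is available from some search API; the threshold, counts, and function names here are hypothetical.

```python
# Hypothetical triage rule: send a verification task to the crowd only
# when its concept label is common enough on the web. `hits` stands in
# for whatever count a real search API would report; the threshold is
# an invented placeholder, not a value from the paper.
CROWD_THRESHOLD = 100_000

def route_task(hits: int) -> str:
    return "crowd" if hits >= CROWD_THRESHOLD else "expert"

tasks = [
    ("regulation of transcription", 2_400_000),          # familiar concept
    ("peptidyl-diphthamide metabolic process", 1_200),   # obscure concept
]
for concept, hits in tasks:
    print(f"{concept!r} -> {route_task(hits)}")
```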
Full text: 1 Database: MEDLINE Main subject: Systematized Nomenclature of Medicine / Crowdsourcing / Gene Ontology Study type: Prognostic_studies Limit: Humans Language: En Year: 2016 Document type: Article
