Performance and usability of machine learning for screening in systematic reviews: a comparative evaluation of three tools.

Gates, Allison; Guitard, Samantha; Pillay, Jennifer; Elliott, Sarah A; Dyson, Michele P; Newton, Amanda S; Hartling, Lisa

Affiliations

Gates A; Department of Pediatrics, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, University of Alberta, 11405 87 Ave NW, Edmonton, Alberta, T6G 1C9, Canada.
Guitard S; Department of Pediatrics, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, University of Alberta, 11405 87 Ave NW, Edmonton, Alberta, T6G 1C9, Canada.
Pillay J; Department of Pediatrics, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, University of Alberta, 11405 87 Ave NW, Edmonton, Alberta, T6G 1C9, Canada.
Elliott SA; Department of Pediatrics, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, University of Alberta, 11405 87 Ave NW, Edmonton, Alberta, T6G 1C9, Canada.
Dyson MP; Department of Pediatrics, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, University of Alberta, 11405 87 Ave NW, Edmonton, Alberta, T6G 1C9, Canada.
Newton AS; Department of Pediatrics, University of Alberta Evidence-based Practice Center, University of Alberta, 11405 87 Ave NW, Edmonton, Alberta, T6G 1C9, Canada.
Hartling L; Department of Pediatrics, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, University of Alberta, 11405 87 Ave NW, Edmonton, Alberta, T6G 1C9, Canada. hartling@ualberta.ca.

Syst Rev; 8(1): 278, 2019 Nov 15.

Article in English | MEDLINE | ID: mdl-31727150

ABSTRACT

BACKGROUND: We explored the performance of three machine learning tools designed to facilitate title and abstract screening in systematic reviews (SRs) when used to (a) eliminate irrelevant records (automated simulation) and (b) complement the work of a single reviewer (semi-automated simulation). We evaluated user experiences for each tool.
METHODS: We subjected three SRs to two retrospective screening simulations. In each tool (Abstrackr, DistillerSR, RobotAnalyst), we screened a 200-record training set and downloaded the predicted relevance of the remaining records. We calculated the proportion missed and workload and time savings compared to dual independent screening. To test user experiences, eight research staff tried each tool and completed a survey.
RESULTS: Using Abstrackr, DistillerSR, and RobotAnalyst, respectively, the median (range) proportion missed was 5 (0 to 28) percent, 97 (96 to 100) percent, and 70 (23 to 100) percent for the automated simulation and 1 (0 to 2) percent, 2 (0 to 7) percent, and 2 (0 to 4) percent for the semi-automated simulation. The median (range) workload savings was 90 (82 to 93) percent, 99 (98 to 99) percent, and 85 (85 to 88) percent for the automated simulation and 40 (32 to 43) percent, 49 (48 to 49) percent, and 35 (34 to 38) percent for the semi-automated simulation. The median (range) time savings was 154 (91 to 183), 185 (95 to 201), and 157 (86 to 172) hours for the automated simulation and 61 (42 to 82), 92 (46 to 100), and 64 (37 to 71) hours for the semi-automated simulation. Abstrackr identified 33-90% of records missed by a single reviewer. RobotAnalyst performed less well and DistillerSR provided no relative advantage. User experiences depended on user friendliness, qualities of the user interface, features and functions, trustworthiness, ease and speed of obtaining predictions, and practicality of the export file(s).
CONCLUSIONS: The workload savings afforded in the automated simulation came with increased risk of missing relevant records. Supplementing a single reviewer's decisions with relevance predictions (semi-automated simulation) sometimes reduced the proportion missed, but performance varied by tool and SR. Designing tools based on reviewers' self-identified preferences may improve their compatibility with present workflows.
SYSTEMATIC REVIEW REGISTRATION: Not applicable.
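
The abstract names three outcome measures (proportion missed, workload savings, time savings relative to dual independent screening) without giving their formulas. The Python sketch below shows one plausible way to compute them for a single simulation. Every definition here is an assumption for illustration, not the authors' published method: dual independent screening is taken to cost two human decisions per record, time per decision is a fixed hypothetical constant, and the names `ScreeningOutcome`, `proportion_missed`, `decisions_saved`, `workload_savings`, and `time_savings_hours` are invented. The semi-automated simulation would change how `human_decisions` and `retained_ids` are counted, not the formulas themselves.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ScreeningOutcome:
    """Hypothetical counts from one retrospective screening simulation."""
    total_records: int       # records retrieved by the SR search
    relevant_ids: set[str]   # records included in the completed SR (reference standard)
    retained_ids: set[str]   # records the approach would pass on to full-text review
    human_decisions: int     # title/abstract decisions made by people under the approach


# Assumption: dual independent screening needs two human decisions per record.
DUAL_DECISIONS_PER_RECORD = 2


def proportion_missed(o: ScreeningOutcome) -> float:
    """Share of truly relevant records that the approach failed to retain."""
    if not o.relevant_ids:
        return 0.0
    missed = o.relevant_ids - o.retained_ids
    return len(missed) / len(o.relevant_ids)


def decisions_saved(o: ScreeningOutcome) -> int:
    """Screening decisions avoided relative to the assumed dual-screening baseline."""
    return DUAL_DECISIONS_PER_RECORD * o.total_records - o.human_decisions


def workload_savings(o: ScreeningOutcome) -> float:
    """Decisions avoided as a share of the dual-screening baseline."""
    return decisions_saved(o) / (DUAL_DECISIONS_PER_RECORD * o.total_records)


def time_savings_hours(o: ScreeningOutcome, seconds_per_decision: float) -> float:
    """Hours saved, assuming a constant (hypothetical) time per decision."""
    return decisions_saved(o) * seconds_per_decision / 3600


if __name__ == "__main__":
    # Illustrative numbers only; not data from the study.
    outcome = ScreeningOutcome(
        total_records=1000,
        relevant_ids={f"rel{i}" for i in range(20)},
        retained_ids={f"rel{i}" for i in range(19)} | {"other1"},
        # Hypothetical automated workflow: a 200-record training set plus
        # 100 records predicted relevant, each screened in duplicate.
        human_decisions=2 * 200 + 2 * 100,
    )
    print(f"proportion missed: {proportion_missed(outcome):.2f}")
    print(f"workload savings:  {workload_savings(outcome):.2f}")
    print(f"time savings (h):  {time_savings_hours(outcome, 30):.1f}")
```

With these made-up inputs (one of 20 relevant records dropped, 600 of a possible 2,000 decisions made, 30 seconds per decision), the sketch prints a proportion missed of 0.05, workload savings of 0.70, and time savings of 11.7 hours.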

Subject(s)

Information Storage and Retrieval/methods; Machine Learning; Software; Abstracting and Indexing/classification; Humans; Reproducibility of Results; Systematic Reviews as Topic; Time Factors; Workload

Keywords

Automation; Machine learning; Systematic reviews; Usability; User experience

Database: MEDLINE | Main subject: Software / Information Storage and Retrieval / Machine Learning | Study type: Diagnostic_studies / Prognostic_studies / Screening_studies / Systematic_reviews | Limit: Humans | Language: English | Journal: Syst Rev | Year: 2019 | Document type: Article | Country of affiliation: Canada