Machine learning to optimize literature screening in medical guideline development.

Harmsen, Wouter; de Groot, Janke; Harkema, Albert; van Dusseldorp, Ingeborg; de Bruin, Jonathan; van den Brand, Sofie; van de Schoot, Rens


Affiliation

Harmsen W; Knowledge Institute for the Federation of Medical Specialists, Utrecht, The Netherlands.
de Groot J; Knowledge Institute for the Federation of Medical Specialists, Utrecht, The Netherlands.
Harkema A; Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, The Netherlands.
van Dusseldorp I; Knowledge Institute for the Federation of Medical Specialists, Utrecht, The Netherlands.
de Bruin J; Department of Research and Data Management Services, Information Technology Services, Utrecht University, Utrecht, The Netherlands.
van den Brand S; Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, The Netherlands.
van de Schoot R; Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, The Netherlands. a.g.j.vandeschoot@uu.nl.

Syst Rev; 13(1): 177, 2024 Jul 11.

Article in English | MEDLINE | ID: mdl-38992684

ABSTRACT

OBJECTIVES:

In a time of exponential growth of new evidence supporting clinical decision-making, combined with a labor-intensive process of selecting this evidence, methods are needed to speed up current processes and keep medical guidelines up to date. This study evaluated the performance and feasibility of active learning to support the selection of relevant publications within medical guideline development, and studied the role of noisy labels.

DESIGN:

We used a mixed-methods design. The manual literature selection by two independent clinicians was evaluated for 14 searches. This was followed by a series of simulations comparing the performance of random reading with that of screening prioritization based on active learning. We identified hard-to-find papers and checked their labels in a reflective dialogue.

MAIN OUTCOME MEASURES:

Inter-rater reliability was assessed using Cohen's kappa (κ). To evaluate the performance of active learning, we used the Work Saved over Sampling at 95% recall (WSS@95) and the percentage of Relevant Records Found after reading only 10% of the total number of records (RRF@10). We used the average time to discovery (ATD) to detect records with potentially noisy labels. Finally, the accuracy of labeling was discussed in a reflective dialogue with guideline developers.
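The four outcome measures can be made concrete with a short Python sketch. It is a minimal illustration under stated assumptions, not the study's code: labels are binary (1 = relevant or include, 0 = irrelevant or exclude); WSS follows the common formulation `(N - n)/N - (1 - recall)`, where `n` is the number of records read before the target recall is reached; the RRF cut-off is rounded down; and time to discovery (TD) is counted as the number of records read up to and including a relevant record, averaged over simulation runs. All function names are hypothetical.

```python
import math
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same records."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of records")
    n = len(rater_a)
    # Observed agreement: share of records given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n**2
    if p_e == 1:
        return float("nan")  # undefined when both raters use one identical label
    return (p_o - p_e) / (1 - p_e)


def wss_at_recall(labels_in_order, recall=0.95):
    """Work Saved over Sampling at a target recall (recall=0.95 gives WSS@95).

    labels_in_order: 1 (relevant) or 0 (irrelevant), in the order records were read.
    """
    if not 0 < recall <= 1:
        raise ValueError("recall must be in (0, 1]")
    n = len(labels_in_order)
    n_relevant = sum(labels_in_order)
    if n_relevant == 0:
        raise ValueError("need at least one relevant record")
    # Relevant records needed to reach the target recall; round() absorbs float error.
    needed = math.ceil(round(recall * n_relevant, 9))
    found = 0
    for n_read, label in enumerate(labels_in_order, start=1):
        found += label
        if found >= needed:
            # Share of records left unread, minus the (1 - recall) saved by random reading.
            return (n - n_read) / n - (1 - recall)


def rrf_at(labels_in_order, fraction=0.10):
    """Percentage of relevant records found after reading `fraction` of all records."""
    n_relevant = sum(labels_in_order)
    if n_relevant == 0:
        raise ValueError("need at least one relevant record")
    cutoff = int(fraction * len(labels_in_order))  # cut-off rounded down
    return 100 * sum(labels_in_order[:cutoff]) / n_relevant


def average_time_to_discovery(runs, relevant_ids):
    """Average TD per relevant record over simulation runs, plus their mean (ATD).

    runs: one screening order (a list of record ids) per simulation run.
    """
    # Position of every record in each run, counting from 1.
    position_maps = [{rid: pos for pos, rid in enumerate(order, start=1)} for order in runs]
    td = {rid: sum(p[rid] for p in position_maps) / len(position_maps) for rid in relevant_ids}
    atd = sum(td.values()) / len(td)
    return td, atd
```

Under this reading, relevant records with the highest average TD are the hard-to-find ones, which makes them natural candidates for a second look at their labels.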

RESULTS:

Mean κ for manual title-abstract selection by clinicians was 0.50 and varied between −0.01 and 0.87, based on 5,021 abstracts. WSS@95 ranged from 50.15% (SD = 17.7) when based on the clinicians' selection, through 69.24% (SD = 11.5) when based on the research methodologist's selection, to 75.76% (SD = 12.2) when based on the final full-text inclusion. A similar pattern was seen for RRF@10, with values of 48.31% (SD = 23.3), 62.8% (SD = 21.20), and 65.58% (SD = 23.25), respectively. The performance of active learning deteriorated as label noise increased. Compared with the final full-text selection, using the selection made by clinicians or by research methodologists lowered WSS@95 by 25.61 and 6.52 percentage points, respectively.
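The observation that active learning degrades as labels get noisier can be explored with a toy simulation. The sketch below is a hypothetical illustration, not the study's setup: it uses a hand-written multinomial naive Bayes classifier over pre-tokenised records, starts from one relevant and one irrelevant record as prior knowledge, always reads next the record with the highest predicted relevance (certainty-based sampling), and lets a possibly noisy label vector stand in for the human screener. The classifier, query strategy, and names such as `simulate_screening` and `add_label_noise` are all assumptions.

```python
import math
import random
from collections import Counter


def train_naive_bayes(docs, labels):
    """Token counts per class (1 = relevant, 0 = irrelevant) plus class sizes."""
    counts = {0: Counter(), 1: Counter()}
    for tokens, label in zip(docs, labels):
        counts[label].update(tokens)
    return counts, Counter(labels)


def relevance_log_odds(model, doc, vocab_size, alpha=1.0):
    """Log-odds that `doc` is relevant, with add-alpha (Laplace) smoothing."""
    counts, class_sizes = model
    score = math.log(class_sizes[1] / class_sizes[0])  # class prior
    totals = {c: sum(counts[c].values()) for c in (0, 1)}
    for token in doc:
        p1 = (counts[1][token] + alpha) / (totals[1] + alpha * vocab_size)
        p0 = (counts[0][token] + alpha) / (totals[0] + alpha * vocab_size)
        score += math.log(p1 / p0)
    return score


def simulate_screening(docs, labels, seed=0):
    """Order in which an active-learning screener reads the records.

    `labels` plays the human screener and may contain noise. After the prior
    knowledge, the model is retrained and the top-scoring unread record is read.
    """
    rng = random.Random(seed)
    relevant = [i for i, y in enumerate(labels) if y == 1]
    irrelevant = [i for i, y in enumerate(labels) if y == 0]
    if not relevant or not irrelevant:
        raise ValueError("need at least one relevant and one irrelevant label")
    vocab_size = len({token for doc in docs for token in doc})
    order = [rng.choice(relevant), rng.choice(irrelevant)]  # prior knowledge
    unread = sorted(set(range(len(docs))) - set(order))
    while unread:
        model = train_naive_bayes([docs[i] for i in order], [labels[i] for i in order])
        best = max(unread, key=lambda i: relevance_log_odds(model, docs[i], vocab_size))
        order.append(best)
        unread.remove(best)
    return order


def add_label_noise(labels, rate, seed=0):
    """Flip each label independently with probability `rate`."""
    rng = random.Random(seed)
    return [1 - y if rng.random() < rate else y for y in labels]


def records_until_all_found(order, true_labels):
    """Number of records read before every truly relevant record has been seen."""
    remaining = sum(true_labels)
    if remaining == 0:
        return 0
    for n_read, i in enumerate(order, start=1):
        remaining -= true_labels[i]
        if remaining == 0:
            return n_read
    return len(order)
```

To compare conditions, one could run `simulate_screening` once with the true labels and once with `add_label_noise(true_labels, rate)`, then score both orders with `records_until_all_found` against the true labels; averaging over several seeds smooths out the randomness of the prior-knowledge draw.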

CONCLUSION:

While active machine learning tools can accelerate literature screening within guideline development, they can only perform as well as the labels provided by human raters. Noisy labels make noisy machine learning.

Subjects

Machine Learning; Practice Guidelines as Topic; Humans; Reproducibility of Results; Clinical Decision-Making; Evidence-Based Medicine

Keywords

Active learning; Guideline development; Machine learning; Systematic reviewing

Collections: 01-internacional; Database: MEDLINE; Main subject: Practice Guidelines as Topic / Machine Learning; Limits: Humans; Language: English; Journal: Syst Rev; Year of publication: 2024; Document type: Article; Country of affiliation: Netherlands
