Evaluation of crowdsourced mortality prediction models as a framework for assessing artificial intelligence in medicine.

Bergquist, Timothy; Schaffter, Thomas; Yan, Yao; Yu, Thomas; Prosser, Justin; Gao, Jifan; Chen, Guanhua; Charzewski, Lukasz; Nawalany, Zofia; Brugere, Ivan; Retkute, Renata; Prusokas, Alidivinas; Prusokas, Augustinas; Choi, Yonghwa; Lee, Sanghoon; Choe, Junseok; Lee, Inggeol; Kim, Sunkyu; Kang, Jaewoo; Mooney, Sean D; Guinney, Justin


Affiliation

Bergquist T; Sage Bionetworks, Seattle, WA, United States.
Schaffter T; Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States.
Yan Y; Sage Bionetworks, Seattle, WA, United States.
Yu T; Sage Bionetworks, Seattle, WA, United States.
Prosser J; Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, United States.
Gao J; Sage Bionetworks, Seattle, WA, United States.
Chen G; Institute of Translational Health Sciences, University of Washington, Seattle, WA, United States.
Charzewski L; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, United States.
Nawalany Z; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, United States.
Brugere I; Proacta, Warsaw, Poland.
Retkute R; Division of Biophysics, University of Warsaw, Warsaw, Poland.
Prusokas A; Proacta, Warsaw, Poland.
Prusokas A; Department of Computer Science, University of Illinois at Chicago, Chicago, IL, United States.
Choi Y; Department of Plant Sciences, University of Cambridge, Cambridge, United Kingdom.
Lee S; Plant and Molecular Sciences, School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom.
Choe J; Department of Life Sciences, Imperial College London, London, United Kingdom.
Lee I; Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea.
Kim S; Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea.
Kang J; Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea.
Mooney SD; Department of Interdisciplinary Program in Bioinformatics, College of Informatics, Korea University, Seoul, Republic of Korea.
Guinney J; Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea.

J Am Med Inform Assoc; 31(1): 35-44, 2023 Dec 22.

Article in English | MEDLINE | ID: mdl-37604111

ABSTRACT

OBJECTIVE:

Applications of machine learning in healthcare are of high interest and have the potential to improve patient care. Yet the real-world accuracy of these models in clinical practice and across different patient subpopulations remains unclear. To address these questions, we hosted a community challenge to evaluate methods that predict healthcare outcomes, focusing on the prediction of all-cause mortality.

MATERIALS AND METHODS:

Using a Model-to-Data framework, 345 registered participants, forming 25 independent teams from 10 countries on 3 continents, generated 25 accurate models. All models were trained on a dataset of over 1.1 million patients and evaluated on patients prospectively collected during a 1-year observation period in a large health system.

RESULTS:

The top-performing team achieved a final area under the receiver operating characteristic curve (AUROC) of 0.947 (95% CI, 0.942-0.951) and an area under the precision-recall curve (AUPRC) of 0.487 (95% CI, 0.458-0.499) on a prospectively collected patient cohort.
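
The abstract reports both metrics without describing how they were computed. As a minimal sketch (not the challenge's scoring code), AUROC can be computed as the fraction of positive-negative pairs that a model ranks correctly, and AUPRC can be estimated by average precision. The function names `auroc` and `average_precision` are hypothetical, and the 95% confidence intervals, which would require a resampling step such as bootstrapping, are not shown.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outranks a random negative.

    Ties count as half a correctly ranked pair (Mann-Whitney U formulation).
    Runs in O(P * N) time, which is fine for a sketch but not for 1M patients.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUPRC estimated as the mean precision at the rank of each positive.

    Assumes untied scores; with ties the ranking order would be arbitrary.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    true_pos = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / k  # precision among the top-k predictions
    return total / n_pos
```

On labels `[1, 0, 0, 1, 0, 0, 0, 0]` with scores `[0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]`, `auroc` returns 10/12 (about 0.833) and `average_precision` returns 0.75. A random ranking scores about 0.5 AUROC but only about the positive prevalence in AUPRC, so for a rare outcome such as mortality an AUPRC far below the AUROC is expected rather than contradictory.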

DISCUSSION:

Post hoc analysis revealed that models differ in accuracy across subpopulations delineated by race or gender, even when they are trained on the same data.
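
The abstract does not state which metric the post hoc subgroup analysis used. As a minimal sketch, assuming per-subgroup AUROC, the stratification step can look like the following. The `(group, label, score)` record layout, the function `auroc_by_subgroup`, and the group names `"A"` and `"B"` are hypothetical placeholders for attributes such as race or gender.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def auroc(labels: List[int], scores: List[float]) -> float:
    """Mann-Whitney AUROC; ties count as half a correctly ranked pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined when a subgroup contains only one class
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_by_subgroup(records: Iterable[Tuple[str, int, float]]) -> Dict[str, float]:
    """Split (group, label, score) records by group and score each group separately."""
    groups: Dict[str, Tuple[List[int], List[float]]] = defaultdict(lambda: ([], []))
    for group, label, score in records:
        labels, scores = groups[group]
        labels.append(label)
        scores.append(score)
    return {g: auroc(ls, ss) for g, (ls, ss) in groups.items()}


records = [
    ("A", 1, 0.9), ("A", 0, 0.2), ("A", 0, 0.4),
    ("B", 1, 0.5), ("B", 0, 0.6), ("B", 0, 0.3),
]
print(auroc_by_subgroup(records))  # {'A': 1.0, 'B': 0.5}
```

In this toy data, pooling all six records gives an AUROC of 0.875 (7 of 8 positive-negative pairs ranked correctly), a single figure that hides the gap between subgroup A (1.0) and subgroup B (0.5).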

CONCLUSION:

This is the largest community challenge to date focused on evaluating state-of-the-art machine learning methods in a healthcare system, and it reveals both opportunities and pitfalls of clinical AI.

Subjects

Crowdsourcing; Medicine; Humans; Artificial Intelligence; Machine Learning; Algorithms

Keywords

evaluation; health informatics; machine learning

Full text: 1 | Database: MEDLINE | Main subject: Crowdsourcing / Medicine | Study type: Prognostic_studies / Risk_factors_studies | Limits: Humans | Language: English | Publication year: 2023 | Document type: Article