ABSTRACT
OBJECTIVE: Applications of machine learning in healthcare are of high interest and have the potential to improve patient care. Yet, the real-world accuracy of these models in clinical practice and on different patient subpopulations remains unclear. To address these important questions, we hosted a community challenge to evaluate methods that predict healthcare outcomes. We focused on the prediction of all-cause mortality as the community challenge question. MATERIALS AND METHODS: Using a Model-to-Data framework, 345 registered participants, coalescing into 25 independent teams, spread over 3 continents and 10 countries, generated 25 accurate models all trained on a dataset of over 1.1 million patients and evaluated on patients prospectively collected over a 1-year observation of a large health system. RESULTS: The top performing team achieved a final area under the receiver operating characteristic curve of 0.947 (95% CI, 0.942-0.951) and an area under the precision-recall curve of 0.487 (95% CI, 0.458-0.499) on a prospectively collected patient cohort. DISCUSSION: Post hoc analysis after the challenge revealed that models differ in accuracy on subpopulations, delineated by race or gender, even when they are trained on the same data. CONCLUSION: This is the largest community challenge focused on the evaluation of state-of-the-art machine learning methods in a healthcare system performed to date, revealing both opportunities and pitfalls of clinical AI.
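The two metrics reported above, each with a 95% confidence interval, can be illustrated with a minimal sketch. The abstract does not say how the intervals were computed; the percentile bootstrap used here is an assumption, and the function names (`auroc`, `average_precision`, `bootstrap_ci`) and toy values are hypothetical, not taken from the challenge.

```python
import random

def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.
    Ties count as half a win. Quadratic time, so only suitable for small examples."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).
    Tied scores are ranked in input order; this is a simplification."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    order = sorted(range(len(labels)), key=lambda i: -scores[i])
    hits, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            ap += hits / rank  # precision at each recalled positive
    return ap / total_pos

def bootstrap_ci(metric, labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (assumed method, not stated in the abstract)."""
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that lost one of the classes
        stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi

# Synthetic toy values, for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
toy_ci = bootstrap_ci(auroc, toy_labels, toy_scores, n_boot=200)
```

With rare outcomes such as mortality, AUPRC is typically much lower than AUROC because precision depends on prevalence, which is consistent with the gap between the two reported values.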
Subject(s)
Crowdsourcing , Medicine , Humans , Artificial Intelligence , Machine Learning , Algorithms
ABSTRACT
Importance: Machine learning could be used to predict the likelihood of diagnosis and severity of illness. Lack of COVID-19 patient data has hindered the data science community in developing models to aid in the response to the pandemic. Objectives: To describe the rapid development and evaluation of clinical algorithms to predict COVID-19 diagnosis and hospitalization using patient data by citizen scientists, provide an unbiased assessment of model performance, and benchmark model performance on subgroups. Design, Setting, and Participants: This diagnostic and prognostic study operated a continuous, crowdsourced challenge using a model-to-data approach to securely enable the use of regularly updated COVID-19 patient data from the University of Washington by participants from May 6 to December 23, 2020. A postchallenge analysis was conducted from December 24, 2020, to April 7, 2021, to assess the generalizability of models on the cumulative data set as well as subgroups stratified by age, sex, race, and time of COVID-19 test. By December 23, 2020, this challenge engaged 482 participants from 90 teams and 7 countries. Main Outcomes and Measures: Machine learning algorithms used patient data and output a score that represented the probability of patients receiving a positive COVID-19 test result or being hospitalized within 21 days after receiving a positive COVID-19 test result. Algorithms were evaluated using area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC) scores. Ensemble models aggregating models from the top challenge teams were developed and evaluated. Results: In the analysis using the cumulative data set, the best performance for COVID-19 diagnosis prediction was an AUROC of 0.776 (95% CI, 0.775-0.777) and an AUPRC of 0.297, and for hospitalization prediction, an AUROC of 0.796 (95% CI, 0.794-0.798) and an AUPRC of 0.188. Analysis of the top models submitted to the challenge showed consistently better model performance on the female group than the male group. Among all age groups, the best performance was obtained for the 25- to 49-year age group, and the worst performance was obtained for the group aged 17 years or younger. Conclusions and Relevance: In this diagnostic and prognostic study, models submitted by citizen scientists achieved high performance for the prediction of COVID-19 testing and hospitalization outcomes. Evaluation of challenge models on demographic subgroups and prospective data revealed performance discrepancies, providing insights into the potential bias and limitations in the models.
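The subgroup benchmarking described above amounts to computing the same metric separately within each stratum. The following is a minimal sketch of that idea, not the study's pipeline: the function names (`auroc`, `auroc_by_group`), the group labels, and all values are hypothetical and synthetic.

```python
from collections import defaultdict

def auroc(labels, scores):
    """AUROC via pairwise comparison; returns None if a class is missing."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # metric is undefined for a single-class subgroup
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auroc_by_group(labels, scores, groups):
    """Split predictions by subgroup label and score each subgroup on its own."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: auroc(ys, ss) for g, (ys, ss) in buckets.items()}

# Synthetic toy values, for illustration only (not study data).
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.1, 0.6, 0.5, 0.3, 0.4]
groups = ["female"] * 4 + ["male"] * 4
per_group = auroc_by_group(labels, scores, groups)
# per_group == {"female": 1.0, "male": 0.5}
```

Because a single model can rank cases well in one stratum and poorly in another, a pooled AUROC can hide discrepancies of the kind the abstract reports across sex and age groups.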
Subject(s)
Algorithms , Benchmarking , COVID-19/diagnosis , Clinical Decision Rules , Crowdsourcing , Hospitalization/statistics & numerical data , Machine Learning , Adolescent , Adult , Aged , Aged, 80 and over , Area Under Curve , COVID-19/epidemiology , COVID-19/therapy , COVID-19 Testing , Child , Child, Preschool , Female , Humans , Infant , Infant, Newborn , Male , Middle Aged , Models, Statistical , Prognosis , ROC Curve , Severity of Illness Index , Washington/epidemiology , Young Adult
ABSTRACT
In many animal societies, groups of individuals form stable social units that are shaped by well-delineated dominance hierarchies and a range of affiliative relationships. How do socially complex groups maintain cohesion and achieve collective movement? Using high-resolution GPS tracking of members of a wild baboon troop, we test whether collective movement in stable social groups is governed by interactions among local neighbours (commonly found in groups with largely anonymous memberships), social affiliates, and/or by individuals paying attention to global group structure. We construct candidate movement prediction models and evaluate their ability to predict the future trajectory of focal individuals. We find that baboon movements are best predicted by 4 to 6 neighbours. While these are generally individuals' nearest neighbours, we find that baboons have distinct preferences for particular neighbours, and that these social affiliates best predict individual location at longer time scales (>10 minutes). Our results support existing theoretical and empirical studies highlighting the importance of local rules in driving collective outcomes, such as collective departures, in primates. We extend previous studies by elucidating the rules that maintain cohesion in baboons 'on the move', as well as the different temporal scales of social interactions that are at play.
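One way to picture a candidate neighbour-based prediction model is sketched below. It is an illustrative assumption, not the authors' model: the focal individual's later position is predicted as the later centroid of its k currently nearest neighbours, and the prediction error is compared across values of k. The function names, individual labels, and coordinates are all hypothetical.

```python
import math

def predict_from_neighbours(now, later, focal, k):
    """Predict the focal individual's later position as the later centroid
    of its k nearest neighbours at the current time (assumed toy model)."""
    others = [name for name in now if name != focal]
    if k < 1 or not others:
        raise ValueError("need k >= 1 and at least one other individual")
    nearest = sorted(others, key=lambda name: math.dist(now[name], now[focal]))[:k]
    cx = sum(later[name][0] for name in nearest) / len(nearest)
    cy = sum(later[name][1] for name in nearest) / len(nearest)
    return (cx, cy)

def prediction_error(now, later, focal, k):
    """Euclidean distance between predicted and observed later position."""
    return math.dist(predict_from_neighbours(now, later, focal, k), later[focal])

# Synthetic positions in arbitrary units, for illustration only.
now = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 2.0), "d": (10.0, 10.0)}
later = {"a": (1.0, 1.0), "b": (2.0, 1.0), "c": (1.0, 3.0), "d": (10.0, 10.0)}

# Compare candidate models that differ only in the number of neighbours used.
errors = {k: prediction_error(now, later, "a", k) for k in (1, 2, 3)}
best_k = min(errors, key=errors.get)
```

Averaging such errors over many individuals and time points, and repeating at different time lags, would give a comparison in the spirit of the neighbour counts and time scales discussed in the abstract.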