Discrepancies in Stroke Distribution and Dataset Origin in Machine Learning for Stroke.

Velagapudi, Lohit; Mouchtouris, Nikolaos; Baldassari, Michael P; Nauheim, David; Khanna, Omaditya; Saiegh, Fadi Al; Herial, Nabeel; Gooch, M Reid; Tjoumakaris, Stavropoula; Rosenwasser, Robert H; Jabbour, Pascal.

J Stroke Cerebrovasc Dis ; 30(7): 105832, 2021 Jul.

Article in English | MEDLINE | ID: mdl-33940363

ABSTRACT

BACKGROUND: Machine learning algorithms depend on accurate and representative training datasets in order to become valuable clinical tools that generalize widely to a varied population. We aim to review machine learning uses in the stroke literature to assess the geographic distribution of the datasets and patient cohorts used to train these models, and to compare that distribution with the distribution of stroke to evaluate for disparities. AIMS: 582 studies were identified on initial searching of the PubMed database. Of these studies, 106 full texts were assessed after title and abstract screening, which excluded 476 papers. Of these 106 studies, 79 were excluded because they used cohorts from outside the United States or were review articles or editorials. 27 studies were thus included in this analysis. SUMMARY OF REVIEW: Of the 27 studies included, 7 (25.9%) used patient data from California, 6 (22.2%) were multicenter, 3 (11.1%) were in Massachusetts, 2 (7.4%) each in Illinois, Missouri, and New York, and 1 (3.7%) each from South Carolina, Washington, West Virginia, and Wisconsin. 1 (3.7%) study used data from Utah and Texas. These were qualitatively compared to a CDC study showing the highest prevalence of stroke in Mississippi (4.3%), followed by Oklahoma (3.4%), Washington D.C. (3.4%), Louisiana (3.3%), and Alabama (3.2%), while the prevalence in California was 2.6%. CONCLUSIONS: A strong disconnect exists between the datasets and patient cohorts used to train machine learning algorithms in clinical research and the stroke distribution of the populations in which clinical tools using these algorithms will be implemented. To reduce bias and increase generalizability and accuracy in future machine learning studies, datasets should draw on a varied patient population that reflects the unequal distribution of stroke risk factors; this would greatly benefit the usability of these tools and support accuracy on a nationwide scale.

Subjects

Data Mining; Machine Learning; Stroke/epidemiology; Bias; Data Accuracy; Databases, Factual; Humans; Prevalence; Prognosis; Stroke/diagnosis; Stroke/therapy; United States/epidemiology
