Search | BVS Regional Portal

Cost-sensitive ordinal classification methods to predict SARS-CoV-2 pneumonia severity.

Garcia-Garcia, Fernando; Lee, Dae-Jin; Espana Yandiola, Pedro Pablo; Landa, Isabel Urrutia; Martinez-Minaya, Joaquin; Hayet-Otero, Miren; Ermecheo, Monica Nieves; Quintana, Jose Maria; Menendez, Rosario; Torres, Antoni; Jorge, Rafael Zalacain.

IEEE J Biomed Health Inform ; PP2024 Feb 08.

Article in English | MEDLINE | ID: mdl-38329848

ABSTRACT

OBJECTIVE: To study the suitability of cost-sensitive ordinal artificial intelligence-machine learning (AI-ML) strategies in the prognosis of SARS-CoV-2 pneumonia severity.
MATERIALS & METHODS: Observational, retrospective, longitudinal cohort study in 4 hospitals in Spain. Information regarding demographic and clinical status was supplemented by socioeconomic data and air pollution exposures. We proposed AI-ML algorithms for ordinal classification via ordinal decomposition and for cost-sensitive learning via resampling techniques. For performance-based model selection, we defined a custom score including per-class sensitivities and asymmetric misprognosis costs. 260 distinct AI-ML models were evaluated via 10 repetitions of 5×5 nested cross-validation with hyperparameter tuning. Model selection was followed by the calibration of predicted probabilities. Final overall performance was compared against five well-established clinical severity scores and against a 'standard' (non-cost-sensitive, non-ordinal) AI-ML baseline. We also evaluated the explainability of our best model with respect to each of the input variables.
RESULTS: The study enrolled n = 1548 patients: 712 experienced low, 238 medium, and 598 high clinical severity. d = 131 variables were collected, becoming d′ = 148 features after categorical encoding. Model selection resulted in our best-performing AI-ML pipeline having: a) no imputation of missing data, b) no feature selection (i.e. using the full set of d′ features), c) 'Ordered Partitions' ordinal decomposition, d) cost-based reimbalance, and e) a Histogram-based Gradient Boosting classifier. This best model (calibrated) obtained a median accuracy of 68.1% [67.3%, 68.8%] (95% confidence interval), a balanced accuracy of 57.0% [55.6%, 57.9%], and an overall area under the curve (AUC) of 0.802 [0.795, 0.808]. In our dataset, it outperformed all five clinical severity scores and the 'standard' AI-ML baseline.
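The abstract names 'Ordered Partitions' as the selected ordinal decomposition but does not spell out its mechanics. The following is a minimal sketch, assuming the common formulation in which K ordered classes are reduced to K - 1 binary problems of the form "is severity greater than level k?", and the binary estimates are then differenced into class probabilities. The function names and the example probabilities are illustrative and are not taken from the study.

```python
from typing import List, Sequence


def ordered_partition_targets(y: Sequence[int], n_classes: int) -> List[List[int]]:
    """For each threshold k = 0..n_classes-2, build the binary target [y > k]."""
    return [[1 if label > k else 0 for label in y] for k in range(n_classes - 1)]


def combine_threshold_probabilities(p_greater: Sequence[float]) -> List[float]:
    """Turn estimates of P(y > k), k = 0..K-2, into K class probabilities.

    P(y = k) = P(y > k-1) - P(y > k), with P(y > -1) = 1 and P(y > K-1) = 0.
    Independently trained binary models may give non-monotone estimates, so
    negative differences are clipped to zero and the result is renormalised.
    """
    bounds = [1.0] + list(p_greater) + [0.0]
    raw = [max(bounds[k] - bounds[k + 1], 0.0) for k in range(len(bounds) - 1)]
    total = sum(raw)  # always >= 1.0 because clipping can only add mass
    return [r / total for r in raw]


# Severity encoded as 0 = low, 1 = medium, 2 = high (assumed encoding).
labels = [0, 1, 2, 2, 0]
binary_targets = ordered_partition_targets(labels, n_classes=3)

# Hypothetical outputs of the two binary models for one patient:
# P(severity > low) = 0.7 and P(severity > medium) = 0.4.
class_probs = combine_threshold_probabilities([0.7, 0.4])
```

Each list in `binary_targets` would be the training target for one binary classifier; `class_probs` shows how their outputs recombine into one probability per severity level.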
DISCUSSION & CONCLUSION: We conducted an exhaustive exploration of AI-ML methods designed for both ordinal and cost-sensitive classification, motivated by a real-world application domain (clinical severity prognosis) in which these topics arise naturally. Our model with the best classification performance successfully exploited the ordering information of the ground-truth classes, coping with imbalance and asymmetric costs. However, these ordinal and cost-sensitive aspects are seldom explored in the literature.
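To make the notions of per-class sensitivity and asymmetric misprognosis cost concrete, the sketch below computes both from a confusion matrix. The abstract gives neither the study's score formula nor its cost values, so the `COST` matrix here is a hypothetical example in which under-estimating severity is penalised more heavily than over-estimating it, and the toy labels are invented.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """Rows index the true class, columns the predicted class."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_sensitivity(cm: List[List[int]]) -> List[float]:
    """Fraction of each true class that was predicted correctly (recall)."""
    return [row[k] / sum(row) if sum(row) else 0.0 for k, row in enumerate(cm)]


def mean_cost(cm: List[List[int]], cost: List[List[float]]) -> float:
    """Average misprognosis cost per patient under the given cost matrix."""
    n = sum(sum(row) for row in cm)
    k = len(cm)
    return sum(cm[t][p] * cost[t][p] for t in range(k) for p in range(k)) / n


# Hypothetical costs (rows: true class, columns: predicted class),
# with classes ordered low, medium, high. Values are NOT from the study.
COST = [
    [0, 1, 2],  # true low: over-prognosis is mildly penalised
    [2, 0, 1],  # true medium
    [4, 2, 0],  # true high: missing a severe case is the costliest error
]

y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 1, 0]

cm = confusion_matrix(y_true, y_pred, n_classes=3)
sensitivities = per_class_sensitivity(cm)
balanced_accuracy = sum(sensitivities) / len(sensitivities)
average_cost = mean_cost(cm, COST)
```

Balanced accuracy is the plain mean of the per-class sensitivities, which is why it can sit well below overall accuracy when a minority class (here, medium severity in the study) is hard to predict.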

Extracting relevant predictive variables for COVID-19 severity prognosis: An exhaustive comparison of feature selection techniques.

Hayet-Otero, Miren; García-García, Fernando; Lee, Dae-Jin; Martínez-Minaya, Joaquín; España Yandiola, Pedro Pablo; Urrutia Landa, Isabel; Nieves Ermecheo, Mónica; Quintana, José María; Menéndez, Rosario; Torres, Antoni; Zalacain Jorge, Rafael; Arostegui, Inmaculada.

PLoS One ; 18(4): e0284150, 2023.

Article in English | MEDLINE | ID: mdl-37053151

ABSTRACT

With the COVID-19 pandemic having caused unprecedented numbers of infections and deaths, large research efforts have been undertaken to increase our understanding of the disease and the factors which determine diverse clinical evolutions. Here we focused on a fully data-driven exploration regarding which factors (clinical or otherwise) were most informative for SARS-CoV-2 pneumonia severity prediction via machine learning (ML). In particular, feature selection (FS) techniques, designed to reduce the dimensionality of data, allowed us to characterize which of our variables were the most useful for ML prognosis. We conducted a multi-centre clinical study, enrolling n = 1548 patients hospitalized due to SARS-CoV-2 pneumonia, of whom 712, 238, and 598 experienced low, medium and high-severity evolutions, respectively. Up to 106 patient-specific clinical variables were collected at admission, although 14 of them had to be discarded for containing ≥60% missing values. Alongside 7 socioeconomic attributes and 32 exposures to air pollution (chronic and acute), these became d = 148 features after variable encoding. We addressed this ordinal classification problem both as an ML classification task and as a regression task. Two imputation techniques for missing data were explored, along with a total of 166 unique FS algorithm configurations: 46 filters, 100 wrappers and 20 embedded methods. Of these, 21 setups achieved satisfactory bootstrap stability (≥0.70) with reasonable computation times: 16 filters, 2 wrappers, and 3 embedded methods. The subsets of features selected by each technique showed modest Jaccard similarities to one another. However, they consistently pointed to the importance of certain explanatory variables, namely: the patient's C-reactive protein (CRP), pneumonia severity index (PSI), respiratory rate (RR) and oxygen levels (saturation SpO2, and the quotients SpO2/RR and arterial SatO2/FiO2), the neutrophil-to-lymphocyte ratio (NLR) (and, to a certain extent, neutrophil and lymphocyte counts separately), lactate dehydrogenase (LDH), and procalcitonin (PCT) levels in blood. Remarkable agreement was found a posteriori between our strategy and independent clinical research works investigating risk factors for COVID-19 severity. Hence, these findings stress the suitability of this type of fully data-driven approach for knowledge extraction, as a complement to clinical perspectives.
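The Jaccard similarity used to compare selected feature subsets is the size of their intersection divided by the size of their union. The sketch below shows the computation; the three subsets are invented for illustration (reusing variable names from the abstract) and are not results from the study.

```python
from itertools import combinations
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|a ∩ b| / |a ∪ b|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(subsets: Dict[str, Set[str]]) -> float:
    """Average Jaccard similarity over all pairs of feature subsets."""
    pairs = list(combinations(subsets.values(), 2))
    if not pairs:
        raise ValueError("need at least two subsets to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical selections from three FS techniques (illustrative only).
selected = {
    "filter": {"CRP", "PSI", "RR", "SpO2"},
    "wrapper": {"CRP", "NLR", "LDH", "SpO2"},
    "embedded": {"CRP", "PSI", "PCT", "SpO2"},
}

similarity = mean_pairwise_jaccard(selected)
# Features chosen by every technique, regardless of overall overlap.
consensus = set.intersection(*selected.values())
```

In this toy case the pairwise similarities are modest, yet a small consensus set still emerges, which mirrors the pattern the abstract describes: low overall overlap between subsets alongside consistent agreement on a few variables.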

Subject(s)

COVID-19, Pneumonia, Humans, SARS-CoV-2, Pandemics, Prognosis, Retrospective Studies
