Improving mortality prediction in Acute Pancreatitis by machine learning and data augmentation.

Hameed, M Asad Bin; Alamgir, Zareen

Affiliation

Hameed MAB; Department of Computer Science, National University of Computer and Emerging Sciences (NUCES), Lahore, Pakistan. Electronic address: ranaasad074@gmail.com.
Alamgir Z; Department of Computer Science, National University of Computer and Emerging Sciences (NUCES), Lahore, Pakistan. Electronic address: zareen.alamgir@nu.edu.pk.

Comput Biol Med; 150: 106077, 2022 Nov.

Article in English | MEDLINE | ID: mdl-36137318

ABSTRACT

Acute pancreatitis (AP) is an inflammation of the pancreas that can be fatal or lead to further complications, depending on the severity of the attack. Early detection of severe AP can help save lives by directing intensive care, rigorous treatment, and resources to the patients who need them. Instead of relying on manual scoring systems, researchers increasingly employ machine learning and data mining models for the early detection of patients at high risk of mortality. Existing work on AP mortality prediction is scarce, and the few studies that exist have many shortcomings and are impractical for clinical deployment. In this work, we try to overcome these issues. One main issue is the lack of high-quality public datasets for AP, which are crucial for effectively training ML models. The available datasets are small, have many missing values, and suffer from high class imbalance. We augmented three public datasets, MIMIC-III, MIMIC-IV, and eICU, to obtain a larger dataset, and experiments showed that classifiers trained on the augmented data performed better than those trained on the original small datasets. Moreover, we employed recent techniques to handle the underlying issues in the data. The results showed that the iterative imputer is the best choice for filling missing values in AP data: it beats not only the basic techniques but also KNN-based imputation. Class imbalance was first addressed by downsampling, which appeared to give decent results on small test sets. However, numerous experiments on large test sets showed that downsampling in the case of AP produces misleading and poor results. Next, we applied various techniques to upsample the data to two different class splits, a 50 to 50 and a 70 to 30 majority-minority split. Four tabular generative adversarial networks (CTGAN, TGAN, CopulaGAN, and CTAB) and a variational autoencoder (TVAE) were used for synthetic data generation. SMOTE was also used for upsampling. The computational results showed that the Random Forest (RF) classifier outperformed all other classifiers on the 50 to 50 class split data generated by CTGAN, with an Fβ of 0.702 and a recall of 0.833. Results produced by RF on the TVAE dataset were comparable, with an Fβ of 0.698. With SMOTE-based upsampling, the DNN performed best, with an Fβ score of 0.671.
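As a rough illustration of two quantities the abstract relies on, the sketch below computes how many synthetic minority-class records would be needed to reach a given majority-minority split, and an Fβ score from confusion-matrix counts. It is not the authors' code: the function names, the example counts, and the default β = 2 are assumptions made here (the abstract does not state the β value used), and the generation of the synthetic records themselves (for example by CTGAN or SMOTE) is not shown.

```python
def synthetic_minority_needed(n_majority, n_minority, minority_pct):
    """Synthetic minority rows needed so that the minority class makes up
    `minority_pct` percent of the upsampled data (50 -> 50:50, 30 -> 70:30).

    Hypothetical helper for illustration; integer arithmetic avoids
    floating-point rounding surprises.
    """
    if not 0 < minority_pct < 100:
        raise ValueError("minority_pct must be strictly between 0 and 100")
    # The target minority count m satisfies m / (n_majority + m) = minority_pct / 100,
    # i.e. m = n_majority * minority_pct / (100 - minority_pct), rounded up.
    numerator = n_majority * minority_pct
    denominator = 100 - minority_pct
    target_minority = -(-numerator // denominator)  # ceiling division
    return max(0, target_minority - n_minority)


def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from true positives, false positives, and false negatives.

    beta > 1 weights recall more heavily than precision. The default of 2
    is an assumption, not a value taken from the abstract.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: 700 survivors (majority), 100 deaths (minority).
    print(synthetic_minority_needed(700, 100, 50))  # rows to add for a 50:50 split
    print(synthetic_minority_needed(700, 100, 30))  # rows to add for a 70:30 split
    print(f_beta(tp=50, fp=50, fn=50))
```

With β above 1, a classifier that misses deaths (false negatives) is penalized more than one that raises false alarms, which is one plausible reason to report Fβ together with recall for mortality prediction.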

Subject(s)

Pancreatitis; Humans; Pancreatitis/diagnosis; Acute Disease; Inflammation; Data Mining; Machine Learning

Keywords

Acute Pancreatitis; Generative Adversarial Network (GAN); Imputation; MIMIC-III; MIMIC-IV; Machine learning; Mortality prediction; Variational auto encoder (VAE)

Full text: 1 | Database: MEDLINE | Main subject: Pancreatitis | Study type: Guideline / Prognostic_studies / Risk_factors_studies / Screening_studies | Language: English | Journal: Comput Biol Med | Year: 2022 | Document type: Article
