Clinical data classification using an enhanced SMOTE and chaotic evolutionary feature selection.

Sreejith, S; Khanna Nehemiah, H; Kannan, A

Affiliation

Sreejith S; Ramanujan Computing Centre, Anna University, Chennai, 600025, Tamil Nadu, India.
Khanna Nehemiah H; Ramanujan Computing Centre, Anna University, Chennai, 600025, Tamil Nadu, India. Electronic address: nehemiah@annauniv.edu.
Kannan A; School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.

Comput Biol Med; 126: 103991, 2020 Nov.

Article in English | MEDLINE | ID: mdl-32987205

ABSTRACT

Class imbalance and the presence of irrelevant or redundant features in training data can pose serious challenges to the development of a classification framework. This paper proposes a framework for developing a Clinical Decision Support System (CDSS) that addresses both class imbalance and the feature selection problem. Under this framework, the dataset is balanced at the data level and a wrapper approach is used to perform feature selection. Three clinical datasets from the University of California Irvine (UCI) machine learning repository were used for experimentation: the Indian Liver Patient Dataset (ILPD), the Thoracic Surgery Dataset (TSD) and the Pima Indian Diabetes (PID) dataset. The Synthetic Minority Over-sampling Technique (SMOTE), enhanced using Orchard's algorithm, was used to balance the datasets. A wrapper approach that uses Chaotic Multi-Verse Optimisation (CMVO) was proposed for feature subset selection. The fitness function was the arithmetic mean of the Matthews correlation coefficient (MCC) and the F-score (F1), both measured using a Random Forest (RF) classifier. After the relevant features were selected, an RF comprising 100 estimators and using the Information Gain Ratio as the split criterion was used for classification. The classifier achieved a 0.65 MCC, a 0.84 F1 and 82.46% accuracy for the ILPD; a 0.74 MCC, a 0.87 F1 and 86.88% accuracy for the TSD; and a 0.78 MCC, a 0.89 F1 and 89.04% accuracy for the PID dataset. The effects of balancing and feature selection on the classifier were investigated, and the performance of the framework was compared with existing works in the literature. The results showed that the proposed framework is competitive in terms of the three performance measures used. The results of a Wilcoxon test confirmed the statistical superiority of the proposed method.
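
The fitness function described in the abstract, the arithmetic mean of MCC and F1, can be illustrated with a minimal sketch. This is not the authors' code: the function names (`confusion_counts`, `mcc`, `f1`, `fitness`) are hypothetical, binary labels with `1` as the positive class are assumed, and the predictions are taken as given rather than produced by a Random Forest on a CMVO-selected feature subset.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f1(tp, fp, fn):
    """F1 score of the positive class; 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def fitness(y_true, y_pred, positive=1):
    """Arithmetic mean of MCC and F1 (assumed form of the wrapper's fitness)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return (mcc(tp, fp, tn, fn) + f1(tp, fp, fn)) / 2


# Illustrative labels only, not data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
score = fitness(y_true, y_pred)
```

In a wrapper setting, a score of this kind would be computed for each candidate feature subset from the predictions of a classifier trained on that subset, and the optimiser would favour subsets with higher scores.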

Subject(s)

Algorithms; Machine Learning; Biological Evolution; Humans

Keywords

Chaotic maps; Class imbalance; Classification; Clinical decision support system; Feature selection; Multi Verse Optimisation; SMOTE

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Algorithms / Machine Learning | Study type: Prognostic_studies | Limit: Humans | Language: En | Journal: Comput Biol Med | Year: 2020 | Document type: Article | Country of affiliation: India
