Clinical data classification using an enhanced SMOTE and chaotic evolutionary feature selection.
Sreejith, S; Khanna Nehemiah, H; Kannan, A.
Affiliation
  • Sreejith S; Ramanujan Computing Centre, Anna University, Chennai, 600025, Tamil Nadu, India.
  • Khanna Nehemiah H; Ramanujan Computing Centre, Anna University, Chennai, 600025, Tamil Nadu, India. Electronic address: nehemiah@annauniv.edu.
  • Kannan A; School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.
Comput Biol Med; 126: 103991, 2020 Nov.
Article in English | MEDLINE | ID: mdl-32987205
ABSTRACT
Class imbalance and the presence of irrelevant or redundant features in training data can pose serious challenges to the development of a classification framework. This paper proposes a framework for developing a Clinical Decision Support System (CDSS) that addresses both class imbalance and the feature selection problem. Under this framework, the dataset is balanced at the data level and a wrapper approach is used to perform feature selection. Three clinical datasets from the University of California Irvine (UCI) machine learning repository were used for experimentation: the Indian Liver Patient Dataset (ILPD), the Thoracic Surgery Dataset (TSD) and the Pima Indian Diabetes (PID) dataset. The Synthetic Minority Over-sampling Technique (SMOTE), enhanced using Orchard's algorithm, was used to balance the datasets. A wrapper approach that uses Chaotic Multi-Verse Optimisation (CMVO) was proposed for feature subset selection. The arithmetic mean of the Matthews correlation coefficient (MCC) and the F-score (F1), measured using a Random Forest (RF) classifier, was used as the fitness function. After the relevant features were selected, an RF comprising 100 estimators and using the Information Gain Ratio as the split criterion was used for classification. The classifier achieved an MCC of 0.65, an F1 of 0.84 and 82.46% accuracy for the ILPD; an MCC of 0.74, an F1 of 0.87 and 86.88% accuracy for the TSD; and an MCC of 0.78, an F1 of 0.89 and 89.04% accuracy for the PID dataset. The effects of balancing and feature selection on the classifier were investigated, and the performance of the framework was compared with existing works in the literature. The results showed that the proposed framework is competitive in terms of the three performance measures used. The results of a Wilcoxon test confirmed the statistical superiority of the proposed method.
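The fitness function described in the abstract, the arithmetic mean of MCC and F1, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names (`confusion_counts`, `mcc`, `f1`, `fitness`) are hypothetical, the labels are assumed to be binary, and the predictions are passed in directly rather than produced by a Random Forest trained on a candidate feature subset, as they would be in the wrapper approach.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def f1(tp, fp, fn):
    """F1 score (harmonic mean of precision and recall) from raw counts."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def fitness(y_true, y_pred, positive=1):
    """Arithmetic mean of MCC and F1, as the abstract describes the fitness."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return (mcc(tp, fp, tn, fn) + f1(tp, fp, fn)) / 2


if __name__ == "__main__":
    # Toy labels, invented purely for illustration (not data from the paper).
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 1]
    print(fitness(y_true, y_pred))
```

In a wrapper setting, a search procedure such as CMVO would call a function like `fitness` once per candidate feature subset, using predictions from a classifier trained on only those features, and keep the subsets that score highest.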

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Algorithms / Machine Learning Type of study: Prognostic_studies Limit: Humans Language: En Journal: Comput Biol Med Year: 2020 Document type: Article Affiliation country: India

...