Imbalanced biomedical data classification using self-adaptive multilayer ELM combined with dynamic GAN.

Zhang, Liyuan; Yang, Huamin; Jiang, Zhengang

Affiliation

Zhang L; School of Computer Science and Technology, Medical Imaging Engineering Laboratory, Changchun University of Science and Technology, No.7089, Weixing Road, Changchun, China.
Yang H; School of Computer Science and Technology, Medical Imaging Engineering Laboratory, Changchun University of Science and Technology, No.7089, Weixing Road, Changchun, China. yhm@cust.edu.cn.
Jiang Z; School of Computer Science and Technology, Medical Imaging Engineering Laboratory, Changchun University of Science and Technology, No.7089, Weixing Road, Changchun, China.

Biomed Eng Online ; 17(1): 181, 2018 Dec 04.

Article in En | MEDLINE | ID: mdl-30514298

ABSTRACT

BACKGROUND: Imbalanced data classification is an unavoidable problem in intelligent medical diagnosis. Most real-world biomedical datasets have limited samples and high-dimensional features. This seriously degrades the classification performance of a model and can misguide disease diagnosis. Finding an effective classification method for imbalanced and limited biomedical datasets is a challenging task. METHODS: In this paper, we propose a novel multilayer extreme learning machine (ELM) classification model combined with a dynamic generative adversarial net (GAN) to tackle limited and imbalanced biomedical data. First, principal component analysis is used to remove irrelevant and redundant features, so that more meaningful pathological features are extracted. Next, a dynamic GAN is designed to generate realistic-looking minority class samples, thereby balancing the class distribution and effectively avoiding overfitting. Finally, a self-adaptive multilayer ELM is proposed to classify the balanced dataset. An analytic expression for the number of hidden layers and nodes is determined by quantitatively establishing the relationship between the change in imbalance ratio and the hyper-parameters of the model. Reducing interactive parameter adjustment makes the classification model more robust. RESULTS: To evaluate the classification performance of the proposed method, numerical experiments are conducted on four real-world biomedical datasets. The proposed method can generate authentic minority class samples and self-adaptively select the optimal parameters of the learning model. In comparison with the W-ELM, SMOTE-ELM, and H-ELM methods, the quantitative experimental results demonstrate that our method achieves better classification performance, in terms of ROC, AUC, G-mean, and F-measure metrics, and higher computational efficiency.
CONCLUSIONS: Our study provides an effective solution for imbalanced biomedical data classification under conditions of limited samples and high-dimensional features. The proposed method could offer a theoretical basis for computer-aided diagnosis. It has the potential to be applied in biomedical clinical practice.
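
The abstract reports results in terms of G-mean and F-measure, which are commonly preferred over plain accuracy when classes are imbalanced. The following is a minimal sketch of the standard textbook definitions of these two metrics, not the authors' evaluation code. The function names and the confusion-matrix counts are hypothetical, chosen only for illustration (10 minority samples and 90 majority samples).

```python
import math


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (minority recall) and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def f_measure(tp, fn, fp):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy(tp, fn, tn, fp):
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical classifier A: labels every sample as the majority class.
a = dict(tp=0, fn=10, tn=90, fp=0)
# Hypothetical classifier B: finds 8 of the 10 minority samples, with 9 false alarms.
b = dict(tp=8, fn=2, tn=81, fp=9)

for name, c in (("A", a), ("B", b)):
    print(
        name,
        round(accuracy(**c), 3),
        round(g_mean(**c), 3),
        round(f_measure(c["tp"], c["fn"], c["fp"]), 3),
    )
```

In this illustration, classifier A has the higher accuracy (0.90 versus 0.89) even though it never detects a minority sample, while its G-mean and F-measure are both zero. This is the kind of failure that imbalance-aware metrics are meant to expose.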

Subject(s)

Biomedical Research; Data Analysis; Machine Learning

Key words

Dynamic GAN; High-dimensional feature; Imbalanced data classification; Limited biomedical samples; Multilayer ELM

Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Biomedical Research / Machine Learning / Data Analysis
Type of study: Guideline
Language: En
Journal: Biomed Eng Online
Journal subject: Biomedical Engineering
Year: 2018
Document type: Article
Affiliation country: China
Country of publication: United Kingdom