A hybrid ensemble method based on double disturbance for classifying microarray data.

Chen, Tao; Xue, Huifeng; Hong, Zenglin; Cui, Man; Zhao, Hui

Affiliation

Chen T; School of Automation, Northwestern Polytechnical University, Xi'an, 710072, Shaanxi, China.
Xue H; School of Mathematics and Computer Science, Shaanxi University of Technology, Hanzhong, 723000, Shaanxi, China.
Hong Z; School of Automation, Northwestern Polytechnical University, Xi'an, 710072, Shaanxi, China.
Cui M; School of Automation, Northwestern Polytechnical University, Xi'an, 710072, Shaanxi, China.
Zhao H; School of Automation, Northwestern Polytechnical University, Xi'an, 710072, Shaanxi, China.

Biomed Mater Eng ; 26 Suppl 1: S1961-8, 2015.

Article in En | MEDLINE | ID: mdl-26405970

ABSTRACT

Microarray data are characterized by small sample sizes and high dimensionality, and they contain many irrelevant and redundant genes. This paper proposes a hybrid ensemble method based on double disturbance to improve classification performance. First, the original genes are ranked with the ReliefF algorithm and a subset of them is selected; a new training set is then generated from the original training set using only the selected genes. Second, D bootstrap training subsets are drawn from this new training set. Third, an attribute reduction method based on neighborhood mutual information with a different radius is used to reduce the genes on each bootstrap training subset, producing new training subsets, each of which is used to train a base classifier. Finally, a subset of the base classifiers is selected by teaching-learning-based optimization and combined into an ensemble by weighted voting. Experimental results on six benchmark cancer microarray datasets showed that the proposed method reduced the ensemble size and achieved higher classification performance than Bagging, AdaBoost, and Random Forest.
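The "double disturbance" idea in the abstract (perturbing both the samples and the genes seen by each base classifier, then combining the classifiers by weighted voting) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: a random gene subset stands in for the neighborhood-mutual-information reduction, a nearest-centroid rule stands in for the unspecified base classifier, out-of-bag accuracy stands in for the voting weights, and the ReliefF pre-filtering and the teaching-learning-based selection step are omitted. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def bootstrap_indices(n, rng):
    """Sample disturbance: draw n row indices with replacement."""
    return [rng.randrange(n) for _ in range(n)]

def fit_centroids(X, y, feats):
    """Toy base classifier: per-class mean over the chosen features."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += row[f]
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict_centroids(model, feats, row):
    """Return the class whose centroid is closest to the row."""
    def dist(c):
        return sum((row[f] - m) ** 2 for f, m in zip(feats, model[c]))
    return min(sorted(model), key=dist)

def train_ensemble(X, y, n_members, n_feats, seed=0):
    """Train members on bootstrap rows and a per-member feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    members = []
    for _ in range(n_members):
        idx = bootstrap_indices(n, rng)
        # Attribute disturbance: a random subset, used here only as a
        # stand-in for the neighborhood-mutual-information reduction.
        feats = sorted(rng.sample(range(d), n_feats))
        model = fit_centroids([X[i] for i in idx], [y[i] for i in idx], feats)
        # Assumed weighting: accuracy on out-of-bag rows
        # (falls back to all rows if none are left out).
        in_bag = set(idx)
        held_out = [i for i in range(n) if i not in in_bag] or list(range(n))
        hits = sum(predict_centroids(model, feats, X[i]) == y[i] for i in held_out)
        members.append((model, feats, hits / len(held_out)))
    return members

def predict(members, row):
    """Weighted vote over all ensemble members."""
    votes = Counter()
    for model, feats, weight in members:
        votes[predict_centroids(model, feats, row)] += weight
    return max(sorted(votes), key=votes.get)

# Toy data: 20 samples, 4 "genes", two well-separated classes.
X = [[c * 10 + 0.1 * (i % 3) for _ in range(4)] for c in (0, 1) for i in range(10)]
y = [c for c in (0, 1) for _ in range(10)]
members = train_ensemble(X, y, n_members=7, n_feats=2)
```

Calling `predict(members, row)` on a new expression vector returns the class with the largest total weight. In this sketch every trained member is kept, whereas the method described above retains only a selected subset of the base classifiers.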

Subjects

Gene Expression Profiling/methods; Machine Learning; Models, Statistical; Neoplasm Proteins/metabolism; Neoplasms/metabolism; Oligonucleotide Array Sequence Analysis/methods; Algorithms; Computer Simulation; Humans; Pattern Recognition, Automated/methods

Keywords

Microarray data; bagging; neighborhood mutual information; reliefF; teaching-learning-based optimization

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Models, Statistical / Oligonucleotide Array Sequence Analysis / Gene Expression Profiling / Machine Learning / Neoplasm Proteins / Neoplasms Type of study: Risk_factors_studies Limits: Humans Language: En Journal: Biomed Mater Eng Journal subject: BIOTECHNOLOGY / BIOMEDICAL ENGINEERING Year: 2015 Document type: Article Affiliation country: China
