Multi-view feature selection for identifying gene markers: a diversified biological data driven approach.

Acharya, Sudipta; Cui, Laizhong; Pan, Yi

Affiliation

Acharya S; College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, People's Republic of China.
Cui L; College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, People's Republic of China. cuilz@szu.edu.cn.
Pan Y; Department of Computer Science, Georgia State University, Atlanta, USA.

BMC Bioinformatics ; 21(Suppl 18): 483, 2020 Dec 30.

Article in En | MEDLINE | ID: mdl-33375940

ABSTRACT

BACKGROUND:

In recent years, the use of multiple genomic and proteomic sources to investigate challenging bioinformatics problems has become very popular among researchers. One such problem is feature (gene) selection: identifying relevant and non-redundant marker genes in high-dimensional gene expression data sets. In that context, an efficient feature selection algorithm that exploits knowledge from multiple biological resources may be an effective way to understand the spectrum of cancer or other diseases, with applications in specific epidemiology for a particular population.

RESULTS:

In the current article, we formulate feature selection and marker gene detection as a multi-view multi-objective clustering problem. To that end, we propose an Unsupervised Multi-View Multi-Objective clustering-based gene selection approach called UMVMO-select. Three important resources of biological data (gene ontology, protein interaction data, protein sequence), along with gene expression values, are used together to design two different views. UMVMO-select aims to reduce the gene space with little or no loss in sample classification efficiency, and it determines relevant and non-redundant gene markers from three cancer gene expression benchmark data sets.
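The general idea of clustering genes across two views and keeping one representative per cluster can be sketched as follows. This is not UMVMO-select itself: the multi-objective optimization is replaced here by plain k-medoids on an averaged distance matrix, and every name (`combine_views`, `k_medoids`, `pairwise`), the equal fusion weight, the deterministic initialization, and the toy distances are assumptions made only for illustration.

```python
def combine_views(view_a, view_b, weight=0.5):
    """Fuse two gene-by-gene distance matrices by a weighted average.

    The averaging rule and the weight are illustrative assumptions,
    not the fusion scheme used in the article.
    """
    n = len(view_a)
    return [[weight * view_a[i][j] + (1.0 - weight) * view_b[i][j]
             for j in range(n)] for i in range(n)]


def assign(dist, medoids):
    """Assign every gene to its nearest medoid; medoids stay in their own cluster."""
    clusters = {m: [m] for m in medoids}
    for g in range(len(dist)):
        if g not in clusters:
            nearest = min(medoids, key=lambda m: dist[g][m])
            clusters[nearest].append(g)
    return clusters


def k_medoids(dist, k, max_iter=100):
    """Plain k-medoids (alternating assign/update) on a precomputed distance matrix.

    Starts from the first k genes as medoids, a simple deterministic choice.
    """
    medoids = list(range(k))
    for _ in range(max_iter):
        clusters = assign(dist, medoids)
        # New medoid of a cluster: the member with the smallest total
        # distance to all other members of that cluster.
        updated = sorted(
            min(members, key=lambda c: sum(dist[c][o] for o in members))
            for members in clusters.values()
        )
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(dist, medoids)


def pairwise(coords):
    """Toy distance matrix from 1-D coordinates (stand-in for a real similarity measure)."""
    return [[abs(x - y) for y in coords] for x in coords]


# Hypothetical data: six genes forming two groups in both views.
expression_view = pairwise([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
annotation_view = pairwise([0.0, 2.0, 1.0, 10.0, 12.0, 11.0])

fused = combine_views(expression_view, annotation_view)
selected, clusters = k_medoids(fused, k=2)
print("selected gene indices:", selected)
```

The medoids returned in `selected` play the role of the reduced gene set: each one stands in for a cluster of genes that are close to each other in both views, which is one simple way to favor non-redundant markers.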

CONCLUSION:

A thorough comparative analysis has been performed against five clustering methods and nine existing feature selection methods with respect to several internal and external validity metrics. The results obtained indicate the superiority of the proposed method. The reported results are also validated through a biological significance test and heatmap plots.

Subjects

Algorithms; Genetic Markers/genetics; Cluster Analysis; Databases, Genetic; Gene Ontology; Humans; Neoplasms/genetics; Neoplasms/pathology; Protein Interaction Maps

Keywords

Gene ontology (GO); Gene selection; Gene similarity measures; Multi-objective clustering; Multi-view learning; Protein-protein interaction network (PPIN); Sample classification

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Algorithms / Genetic Markers | Limits: Humans | Language: En | Publication year: 2020 | Document type: Article
