Hybrid genetic algorithm-neural network: feature extraction for unpreprocessed microarray data.
Tong, Dong Ling; Schierz, Amanda C.
Affiliation
  • Tong DL; The John van Geest Cancer Research Centre, School of Science and Technology, Nottingham Trent University, UK. dong.tong@ntu.ac.uk
Artif Intell Med ; 53(1): 47-56, 2011 Sep.
Article in En | MEDLINE | ID: mdl-21775110
ABSTRACT

OBJECTIVE:

Suitable techniques for microarray analysis have been widely researched, particularly for the study of marker genes expressed in a specific type of cancer. Most of the machine learning methods that have been applied to significant gene selection focus on the classification ability rather than the selection ability of the method. These methods also require the microarray data to be preprocessed before analysis takes place. The objective of this study is to develop a hybrid genetic algorithm-neural network (GANN) model that emphasises feature selection and can operate on unpreprocessed microarray data.

METHOD:

The GANN is a hybrid model in which the fitness value of the genetic algorithm (GA) is based upon the number of samples correctly labelled by a standard feedforward artificial neural network (ANN). The model is evaluated using two benchmark microarray datasets with different array platforms and differing numbers of classes: a 2-class oligonucleotide microarray dataset for acute leukaemia and a 4-class complementary DNA (cDNA) microarray dataset for small round blue cell tumours (SRBCTs). The underlying concept of the GANN algorithm is to select highly informative genes by co-evolving both the GA fitness function and the ANN weights at the same time.
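The fitness idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the chromosome encoding, the genetic operators, or the network size, so everything below is an assumption made for illustration. Here each individual carries a gene subset together with the weights of a small feedforward network, the weights are perturbed by mutation rather than trained by backpropagation, and the names `make_toy_data`, `fitness`, `mutate` and `evolve` are hypothetical. The data are synthetic, not microarray measurements.

```python
import math
import random


def make_toy_data(rng, n_samples=40, n_genes=30):
    """Synthetic stand-in for expression data: only gene 0 carries class signal."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_genes)]
        row[0] += 3.0 if label else -3.0
        X.append(row)
        y.append(label)
    return X, y


def forward(x, w_hidden, w_out):
    """Feedforward pass: one tanh hidden layer, one linear score per class."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


def fitness(ind, X, y):
    """GA fitness: number of samples the individual's ANN labels correctly."""
    correct = 0
    for sample, label in zip(X, y):
        x = [sample[g] for g in ind["genes"]]
        scores = forward(x, ind["w_hidden"], ind["w_out"])
        if scores.index(max(scores)) == label:
            correct += 1
    return correct


def random_individual(rng, n_genes, k, n_hidden, n_classes):
    """Assumed encoding: k distinct gene indices plus random network weights."""
    return {
        "genes": rng.sample(range(n_genes), k),
        "w_hidden": [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n_hidden)],
        "w_out": [[rng.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_classes)],
    }


def mutate(rng, ind, n_genes, sigma=0.3):
    """Assumed operator: jitter all weights, and sometimes swap one gene."""
    child = {
        "genes": list(ind["genes"]),
        "w_hidden": [[w + rng.gauss(0, sigma) for w in row] for row in ind["w_hidden"]],
        "w_out": [[w + rng.gauss(0, sigma) for w in row] for row in ind["w_out"]],
    }
    if rng.random() < 0.5:
        pos = rng.randrange(len(child["genes"]))
        unused = [g for g in range(n_genes) if g not in child["genes"]]
        child["genes"][pos] = rng.choice(unused)
    return child


def evolve(X, y, n_classes, k=3, n_hidden=4, pop_size=20, generations=30, seed=0):
    """Truncation selection with elitism; returns best individual, its score, history."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    pop = [random_individual(rng, n_genes, k, n_hidden, n_classes) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, X, y), reverse=True)
        history.append(fitness(pop[0], X, y))
        elite = pop[: pop_size // 2]  # the top half survives unchanged
        children = [mutate(rng, rng.choice(elite), n_genes) for _ in range(pop_size - len(elite))]
        pop = elite + children
    best = max(pop, key=lambda ind: fitness(ind, X, y))
    return best, fitness(best, X, y), history


if __name__ == "__main__":
    X, y = make_toy_data(random.Random(1))
    best, score, _ = evolve(X, y, n_classes=2)
    print("selected genes:", sorted(best["genes"]), "correct:", score, "of", len(X))
```

Because the fitness is a raw count of correctly labelled samples, selection pressure acts on the gene subset and the weights together, which is one plausible reading of the co-evolution the abstract mentions. A real application would also need held-out samples to check that the selected genes generalise.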

RESULTS:

The novel GANN selected approximately 50% of the same genes as the original studies. This may indicate that these common genes are more biologically significant than other genes in the datasets. The remaining 50% of the significant genes identified were used to build predictive models, and for both datasets the models based on the set of genes extracted by the GANN method produced more accurate results. The results also suggest that the GANN method can not only detect genes that are exclusively associated with a single cancer type but can also explore genes that are differentially expressed in multiple cancer types.

CONCLUSIONS:

The results show that the GANN model successfully extracted statistically significant genes from the unpreprocessed microarray data, as well as known biologically significant genes. We also show that assessing the biological significance of genes based on classification accuracy may be misleading. Although the GANN's set of extra genes proves to be more statistically significant than those selected by other methods, a biological assessment of these genes is highly recommended to confirm their functionality.
Subject(s)

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Algorithms / Neural Networks, Computer / Oligonucleotide Array Sequence Analysis / Neoplasms Type of study: Prognostic_studies Language: En Journal: Artif Intell Med Journal subject: INFORMATICA MEDICA Year: 2011 Document type: Article Affiliation country: United Kingdom
