Statistical characterization and classification of colon microarray gene expression data using multiple machine learning paradigms.
Maniruzzaman, Md; Jahanur Rahman, Md; Ahammed, Benojir; Abedin, Md Menhazul; Suri, Harman S; Biswas, Mainak; El-Baz, Ayman; Bangeas, Petros; Tsoulfas, Georgios; Suri, Jasjit S.
Affiliation
  • Maniruzzaman M; Statistics Discipline, Khulna University, Khulna, Bangladesh; Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh.
  • Jahanur Rahman M; Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh.
  • Ahammed B; Statistics Discipline, Khulna University, Khulna, Bangladesh.
  • Abedin MM; Statistics Discipline, Khulna University, Khulna, Bangladesh.
  • Suri HS; Brown University, Providence, RI, USA.
  • Biswas M; Advanced Knowledge Engineering Centre, Global Biomedical Technologies, Inc., Roseville, CA, USA.
  • El-Baz A; Department of Bioengineering, University of Louisville, Louisville, Kentucky, USA.
  • Bangeas P; Department of Surgery, Papageorgiou Hospital, Aristotle University Thessaloniki, Greece.
  • Tsoulfas G; Department of Surgery, Aristotle University of Thessaloniki, Thessaloniki, Greece.
  • Suri JS; Advanced Knowledge Engineering Centre, Global Biomedical Technologies, Inc., Roseville, CA, USA; AtheroPoint, Roseville, CA, USA. Electronic address: jasjit.suri@atheropoint.com.
Comput Methods Programs Biomed ; 176: 173-193, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31200905
ABSTRACT

OBJECTIVE:

Colon microarray data are a repository of thousands of gene expression values, with different strengths for each cancer cell. It is necessary to detect which genes are responsible for cancer growth. This study presents an exhaustive comparison of different machine learning (ML) systems, which serves two major purposes: (a) identification of high-risk differential genes using statistical tests and (b) development of an ML strategy for predicting cancer genes.

METHODS:

Four statistical tests, namely Wilcoxon sign rank sum (WCSRS), t-test, Kruskal-Wallis (KW), and F-test, were adapted for cancerous gene identification using their p-values. The extracted gene set was used to classify cancer patients using ten classifiers, namely linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), naïve Bayes (NB), Gaussian process classification (GPC), support vector machine (SVM), artificial neural network (ANN), logistic regression (LR), decision tree (DT), AdaBoost (AB), and random forest (RF). Performance was then evaluated using cross-validation protocols and standardized metrics, viz. accuracy (ACC) and area under the curve (AUC).
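
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: score each gene with a two-sample rank-based test, keep the genes whose p-value falls below a threshold, and compute ACC and AUC for a set of predictions. It assumes the two-group Wilcoxon rank-sum form of the test with a normal approximation and no tie correction; the threshold `alpha=0.05`, all function names, and the synthetic data are hypothetical and are not taken from the paper.

```python
import math
import random


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_pvalue(group_a, group_b):
    """Two-sided rank-sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n1])  # rank sum of the first group
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


def select_genes(expr, labels, alpha=0.05):
    """Return indices of genes with p < alpha, smallest p-value first.

    expr: list of samples, each a list of expression values (one per gene).
    labels: 1 for cancer, 0 for control (assumed encoding).
    """
    scored = []
    for g in range(len(expr[0])):
        cancer = [row[g] for row, y in zip(expr, labels) if y == 1]
        control = [row[g] for row, y in zip(expr, labels) if y == 0]
        p = rank_sum_pvalue(cancer, control)
        if p < alpha:
            scored.append((p, g))
    return [g for _, g in sorted(scored)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve, computed from the ranks of the scores."""
    ranks = average_ranks(list(scores))
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Synthetic stand-in data: 40 cancer and 22 control samples, 20 genes.
# Gene 0 is shifted upward in the cancer group; the rest are pure noise.
random.seed(0)
labels = [1] * 40 + [0] * 22
expr = [[random.gauss(3.0 if (g == 0 and y == 1) else 0.0, 1.0)
         for g in range(20)] for y in labels]
selected = select_genes(expr, labels)
print("selected gene indices:", selected)
```

In a full pipeline the selected gene indices would feed a classifier such as a random forest, and `accuracy` and `auc` would be averaged over cross-validation folds. Running the gene selection inside each training fold, rather than once on the whole dataset, avoids leaking test labels into the feature selection.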

RESULTS:

The colon cancer dataset consists of 2000 genes from 62 patients (40 cancer vs. 22 control). The overall mean ACC of our ML system using all four statistical tests and all ten classifiers was 90.50%. The ML system showed an ACC of 99.81% using a combination of the WCSRS test and an RF-based classifier. This is an improvement of 8% over previously published values in the literature.

CONCLUSIONS:

An RF-based model combined with statistical tests for detection of high-risk genes showed the best performance for accurate cancer classification in multi-center clinical trials.

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Colon / Colonic Neoplasms / Tissue Array Analysis / Machine Learning Type of study: Clinical_trials / Diagnostic_studies / Etiology_studies / Guideline / Prognostic_studies / Risk_factors_studies Limits: Humans Language: En Journal: Comput Methods Programs Biomed Journal subject: MEDICAL INFORMATICS Year: 2019 Document type: Article Affiliation country: Bangladesh

...