Addressing the missing data challenge in multi-modal datasets for the diagnosis of Alzheimer's disease.
J Neurosci Methods; 375: 109582, 2022 06 01.
Article | En | MEDLINE | ID: mdl-35346696
BACKGROUND: One of the challenges facing accurate diagnosis and prognosis of Alzheimer's disease, beyond identifying the subtle changes that define its early onset, is the scarcity of data, compounded by the missing data challenge. Although the Alzheimer's Disease Neuroimaging Initiative (ADNI) database includes many participants, many observations have numerous missing features, which often leads to the exclusion of potentially valuable data points in ongoing experiments, especially in longitudinal studies. NEW METHODS: Motivated by the need to examine all participants, even those with missing tests or imaging modalities, this study draws attention to the Gradient Boosting (GB) algorithm, which has an inherent capability for handling missing values. The four groups considered are Cognitively Normal (CN), Early Mild Cognitive Impairment (EMCI), Late Mild Cognitive Impairment (LMCI) and Alzheimer's Disease (AD). Before applying state-of-the-art classifiers such as Support Vector Machine (SVM) and Random Forest (RF), the impact of imputing (i.e., replacing) missing data in common datasets with numerical techniques was investigated and compared with the GB algorithm. Empirical evaluations show that GB performance is highly resilient to missing values in comparison with the SVM and RF algorithms. The performance of these latter algorithms can, however, be improved when they are coupled with a more sophisticated imputation technique, such as soft-impute or the K-Nearest Neighbors (KNN) algorithm, provided the extent of data incompleteness is low. RESULTS: Classification accuracy improved by up to 3% in the multiclass classification of all four classes of subjects when all samples, including the incomplete ones, were considered during the model generation and testing phases.
COMPARISON WITH EXISTING METHODS: Unlike other methods, the proposed approach addresses the challenging multiclass classification of the ADNI dataset in the presence of different levels of missing data points. It also provides a comparative study of the effects of existing imputation techniques on block-wise missing data. Results of the proposed method are validated against gold-standard methods used for AD classification.
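The abstract names KNN imputation as one of the techniques that can help SVM and RF when little data is missing. As an illustration only, the idea can be sketched in plain Python. This is not the authors' implementation: the function names `nan_euclidean` and `knn_impute`, the distance rescaling, and the toy rows are all assumptions made for this example.

```python
import math

def nan_euclidean(a, b):
    """Euclidean distance over the features observed in both rows,
    rescaled to the full feature count (an assumed convention)."""
    shared = [(x, y) for x, y in zip(a, b)
              if not (math.isnan(x) or math.isnan(y))]
    if not shared:
        return math.inf  # no common features: rows are not comparable
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))

def knn_impute(rows, k=2):
    """Fill each missing value with the mean of that feature over the
    k nearest rows in which the feature is observed."""
    filled = [list(r) for r in rows]  # leave the input untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not math.isnan(value):
                continue
            # Candidate donors: other rows that observe feature j.
            donors = sorted(
                (nan_euclidean(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and not math.isnan(other[j])
            )
            donors = [(d, v) for d, v in donors if d != math.inf][:k]
            if donors:
                filled[i][j] = sum(v for _, v in donors) / len(donors)
    return filled

nan = float("nan")
# Hypothetical toy data: three features per subject, one value missing.
rows = [
    [1.0, 2.0, nan],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 5.0],
    [9.0, 9.0, 50.0],
]
filled = knn_impute(rows, k=2)
print(filled[0])  # [1.0, 2.0, 4.0]
```

The missing value in the first row is filled with the mean of the two closest donors (3.0 and 5.0), while the distant fourth row is ignored. With block-wise missingness, where whole modalities are absent, fewer shared features remain for the distance computation, which is consistent with the abstract's caveat that such imputation helps mainly when incompleteness is low.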
Keywords
Full text: 1
Databases: MEDLINE
Main subject: Alzheimer Disease / Cognitive Dysfunction
Type of study: Diagnostic_studies / Observational_studies
Limits: Humans
Language: En
Journal: J Neurosci Methods
Year: 2022
Document type: Article