Addressing the missing data challenge in multi-modal datasets for the diagnosis of Alzheimer's disease.

Aghili, Maryamossadat; Tabarestani, Solale; Adjouadi, Malek

Affiliation

Aghili M; Center for Advanced Technology and Education, Department of Electrical and Computer Engineering, Florida International University, Miami, FL, USA. Electronic address: maghili001@fiu.edu.
Tabarestani S; Center for Advanced Technology and Education, Department of Electrical and Computer Engineering, Florida International University, Miami, FL, USA. Electronic address: Staba006@fiu.edu.
Adjouadi M; Center for Advanced Technology and Education, Department of Electrical and Computer Engineering, Florida International University, Miami, FL, USA. Electronic address: adjouadi@fiu.edu.

J Neurosci Methods; 375: 109582, 2022 06 01.

Article in English | MEDLINE | ID: mdl-35346696

ABSTRACT

BACKGROUND: One of the challenges facing accurate diagnosis and prognosis of Alzheimer's disease, beyond identifying the subtle changes that define its early onset, is the scarcity of data, compounded by missing data. Although the Alzheimer's Disease Neuroimaging Initiative (ADNI) database has many participants, many observations are missing a large number of features, which often leads to the exclusion of potentially valuable data points from experiments, especially longitudinal studies.
NEW METHODS: Motivated by the need to examine all participants, even those with missing tests or imaging modalities, this study draws attention to the Gradient Boosting (GB) algorithm, which has an inherent capability to handle missing values. Four groups are considered: Cognitively Normal (CN), Early Mild Cognitive Impairment (EMCI), Late Mild Cognitive Impairment (LMCI) and Alzheimer's Disease (AD). Before applying state-of-the-art classifiers such as Support Vector Machine (SVM) and Random Forest (RF), the impact of imputing (i.e., replacing) missing data in common datasets with numerical techniques was investigated and compared with the GB algorithm. Empirical evaluations show that GB performance is highly resilient to missing values in comparison with the SVM and RF algorithms. The latter can, however, be improved when coupled with a more sophisticated imputation technique such as soft-impute or K-Nearest Neighbors (KNN), provided the extent of data incompleteness is low.
RESULTS: Classification accuracy improved by up to 3% in the multiclass classification of all four classes of subjects when all samples, including the incomplete ones, were considered during the model generation and testing phases.
COMPARISON WITH EXISTING METHODS: Unlike other methods, the proposed approach addresses the challenging multiclass classification of the ADNI dataset in the presence of different levels of missing data points. It also provides a comparative study of the effects of existing imputation techniques on block-wise missing data. Results of the proposed method are validated against gold-standard methods used for AD classification.
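
The abstract names weighted KNN imputation as one of the techniques that can help SVM and RF when few values are missing. The following is a minimal sketch of that general idea in plain Python, not the authors' implementation: the function name `knn_impute`, the inverse-distance weighting, and the toy data are all assumptions made for illustration. Missing entries are represented as NaN, and distances are computed only over features observed in both rows.

```python
import math

def knn_impute(rows, k=2):
    """Return a copy of rows with NaN entries filled by an
    inverse-distance-weighted mean over the k nearest donor rows.
    Hypothetical helper for illustration only."""

    def distance(a, b):
        # Root mean squared difference over features observed in both rows.
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if not math.isnan(x) and not math.isnan(y)]
        return math.sqrt(sum(shared) / len(shared)) if shared else math.inf

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not math.isnan(value):
                continue
            # Donors: other rows with feature j observed and a finite distance.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and not math.isnan(other[j])]
            donors = sorted(d for d in donors if math.isfinite(d[0]))[:k]
            if not donors:
                continue  # no comparable row; leave the entry missing
            # Closer donors get larger weights; epsilon avoids division by zero.
            weights = [1.0 / (d + 1e-9) for d, _ in donors]
            total = sum(w * v for w, (_, v) in zip(weights, donors))
            filled[i][j] = total / sum(weights)
    return filled

# Toy data (invented): three subjects, three features, one missing value.
nan = float("nan")
data = [[1.0, 2.0, nan],
        [1.0, 2.0, 3.0],
        [10.0, 20.0, 30.0]]
completed = knn_impute(data, k=1)
```

Restricting the distance to co-observed features matters for block-wise missing data, where a whole modality may be absent for a subject: rows that share no observed features are simply excluded as donors. As the abstract notes, this kind of imputation is expected to help only when the extent of missingness is low.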

Subject(s)

Alzheimer Disease; Cognitive Dysfunction; Alzheimer Disease/diagnostic imaging; Brain; Cognitive Dysfunction/diagnostic imaging; Humans; Magnetic Resonance Imaging/methods; Neuroimaging/methods

Keywords

ADNI data; Alzheimer's Disease; Gradient Boosting (GB); Multiclass classification; Multimodal data; Random Forest (RF); SVD Impute; Soft Impute; Support Vector Machine (SVM); Weighted K-nearest neighbors (KNN impute)

Full text: 1 Databases: MEDLINE Main subject: Alzheimer Disease / Cognitive Dysfunction Study type: Diagnostic_studies / Observational_studies Limit: Humans Language: En Journal: J Neurosci Methods Year: 2022 Document type: Article
