ABSTRACT
In machine learning, data often comes from different sources, but combining them can introduce extraneous variation that affects both generalization and interpretability. As an example, we investigate the classification of neurodegenerative diseases using FDG-PET data collected from multiple neuroimaging centers. Data collected at different centers introduces unwanted variation due to differences in scanners, scanning protocols, and processing methods. To address this issue, we propose a two-step approach to limit the influence of center-dependent variation on the classification of healthy controls and early vs. late-stage Parkinson's disease patients. First, we train a Generalized Matrix Learning Vector Quantization (GMLVQ) model on healthy control data to identify a "relevance space" that distinguishes between centers. Second, we use this space to construct a correction matrix that restricts the training of a second GMLVQ system on the diagnostic problem. We evaluate the effectiveness of this approach on real-world multi-center datasets and on a simulated artificial dataset. Our results demonstrate that the approach produces machine learning systems with reduced bias (they are more specific because information related to center differences is eliminated during training) and more informative relevance profiles that can be interpreted by medical experts. This method can be adapted to similar problems outside the neuroimaging domain, as long as an appropriate "relevance space" can be identified to construct the correction matrix.
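The two-step idea can be illustrated with a small sketch. It assumes, as one plausible reading of the abstract, that the correction matrix is an orthogonal projector Ψ = I − Σ v_k v_kᵀ that removes the center-discriminating directions v_k, and that the second model measures the squared distance ‖ΩΨ(x − w)‖². The function names, the toy directions, and the toy data are hypothetical and are not taken from the study.

```python
import math

def orthonormalize(vectors):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            dot = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - dot * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > 1e-12:  # skip directions already in the span
            basis.append([wi / norm for wi in w])
    return basis

def correction_matrix(center_directions):
    """Assumed form: Psi = I - sum_k v_k v_k^T (projects out the center subspace)."""
    basis = orthonormalize(center_directions)
    n = len(center_directions[0])
    psi = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for v in basis:
        for i in range(n):
            for j in range(n):
                psi[i][j] -= v[i] * v[j]
    return psi

def matvec(m, x):
    """Matrix-vector product for nested lists."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

def corrected_distance(omega, psi, x, w):
    """Squared distance ||Omega Psi (x - w)||^2 between sample x and prototype w."""
    diff = [xi - wi for xi, wi in zip(x, w)]
    z = matvec(omega, matvec(psi, diff))
    return sum(zi * zi for zi in z)

# Toy example (hypothetical): feature 0 carries only the center effect.
center_directions = [[1.0, 0.0, 0.0]]
psi = correction_matrix(center_directions)
omega = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x = [5.0, 1.0, 2.0]   # sample
w = [0.0, 1.0, 0.0]   # prototype
d = corrected_distance(omega, psi, x, w)
print(d)  # the large difference in feature 0 no longer contributes
```

In this toy setting the difference along feature 0 is ignored, so only the remaining features influence the distance to the prototype. In a real pipeline the directions would come from the center-classifying model rather than being fixed by hand.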
Subject(s)
Neuroimaging, Parkinson Disease, Humans, Positron-Emission Tomography, Machine Learning, Parkinson Disease/diagnostic imaging
ABSTRACT
BACKGROUND AND OBJECTIVES: 18F-fluorodeoxyglucose (FDG) positron emission tomography (PET) combined with principal component analysis (PCA) has been applied to identify disease-related brain patterns in neurodegenerative disorders such as Parkinson's disease (PD), dementia with Lewy bodies (DLB) and Alzheimer's disease (AD). These patterns are used to quantify functional brain changes at the single-subject level. This is especially relevant in determining disease progression in idiopathic REM sleep behavior disorder (iRBD), a prodromal stage of PD and DLB. However, the PCA method is limited in discriminating between neurodegenerative conditions. More advanced machine learning algorithms may provide a solution. In this study, we apply Generalized Matrix Learning Vector Quantization (GMLVQ) to FDG-PET scans of healthy controls and of patients with AD, PD and DLB. Scans of iRBD patients, acquired twice with an interval of approximately 4 years, were projected into GMLVQ space to visualize their trajectories. METHODS: We applied a combination of SSM/PCA and GMLVQ as a classifier on FDG-PET data of healthy controls and of AD, DLB, and PD patients. We determined the diagnostic performance using ten-times repeated ten-fold cross-validation. We analyzed the validity of the classification system by inspecting the GMLVQ space in two ways: first, by projecting the patients into this space; second, by representing the axes that span this decision space as voxel maps. Furthermore, we projected a cohort of iRBD patients, who had been scanned twice (approximately 4 years apart), into the same decision space and visualized their trajectories. RESULTS: The GMLVQ prototypes, relevance diagonal, and decision space voxel maps showed metabolic patterns that agree with previously identified disease-related brain patterns. The GMLVQ decision space showed a plausible quantification of FDG-PET data. The distance traveled by iRBD subjects through GMLVQ space per year (i.e., velocity) was correlated with the change in motor symptoms per year (Spearman's rho = 0.62, P = 0.004). CONCLUSION: In this proof-of-concept study, we show that GMLVQ provides a classification of patients with neurodegenerative disorders, and may be useful in future studies investigating speed of progression in prodromal disease stages.
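The velocity measure described in the results can be sketched as follows. This assumes that each subject has two projected points in the decision space, that velocity is the Euclidean distance between them divided by the scan interval in years, and that the association is tested with a rank correlation computed on average ranks. All numbers are made up for illustration and are not the study's data; the function names are hypothetical.

```python
import math

def velocity(p_first, p_second, years):
    """Distance traveled in the decision space per year."""
    return math.dist(p_first, p_second) / years

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(a, b):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(va * vb)

# Hypothetical subjects: (first projection, second projection, interval in years)
scans = [
    ((0.0, 0.0), (3.0, 4.0), 4.0),
    ((1.0, 1.0), (1.0, 3.0), 4.0),
    ((0.0, 0.0), (6.0, 8.0), 4.0),
]
velocities = [velocity(p1, p2, years) for p1, p2, years in scans]
motor_change_per_year = [1.0, 0.2, 3.0]  # hypothetical clinical score changes
rho = spearman_rho(velocities, motor_change_per_year)
print(velocities, rho)
```

A rank correlation is a natural choice here because it only assumes a monotone relationship between velocity and symptom change, not a linear one.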
Subject(s)
Neurodegenerative Diseases, Parkinson Disease, REM Sleep Behavior Disorder, Fluorodeoxyglucose F18, Humans, Neurodegenerative Diseases/diagnostic imaging, Parkinson Disease/diagnostic imaging, Positron-Emission Tomography/methods, REM Sleep Behavior Disorder/diagnostic imaging, REM Sleep Behavior Disorder/metabolism
ABSTRACT
BACKGROUND AND OBJECTIVE: Neurodegenerative diseases like Parkinson's disease often take several years before they can be diagnosed reliably on clinical grounds. Imaging techniques such as MRI are used to detect anatomical (structural) pathological changes. However, these kinds of changes are usually seen only late in the course of the disease. The measurement of functional brain activity by means of [18F]fluorodeoxyglucose positron emission tomography (FDG-PET) can provide useful information, but its interpretation is more difficult. The scaled sub-profile model principal component analysis (SSM/PCA) was shown to provide more useful information than other statistical techniques. Our objective is to improve the performance further by combining SSM/PCA and prototype-based generalized matrix learning vector quantization (GMLVQ). METHODS: We apply a combination of SSM/PCA and GMLVQ as a classifier. To demonstrate the combination's validity, we analyze FDG-PET data of Parkinson's disease (PD) patients collected at three different neuroimaging centers in Europe. We determine the diagnostic performance using ten-times repeated ten-fold cross-validation. Additionally, discriminative visualizations of the data are included. The prototypes and relevances of GMLVQ are transformed back to the original voxel space by exploiting the linearity of SSM/PCA. The resulting prototypes and relevance profiles were then assessed by three neurologists. RESULTS: One important finding is that discriminative visualization can help to identify disease-related properties as well as differences that are due to center-specific factors. Secondly, the neurologists assessed the interpretability of the method and confirmed that the prototypes are similar to known activity profiles of PD patients. CONCLUSION: We have shown that the presented combination of SSM/PCA and GMLVQ can provide a useful means to assess and better understand characteristic differences in FDG-PET data from PD patients and healthy controls (HCs). Based on the assessments by medical experts and the results of our computational analysis, we conclude that the first steps towards a diagnostic support system have been taken successfully.
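The back-transformation mentioned in the methods relies only on linearity, which the following sketch illustrates. It assumes a plain linear projection y = P(x − mean) with the components stored as rows of P, and it deliberately ignores any SSM-specific preprocessing such as log transformation or centering. Under that assumption a prototype w maps back as mean + Pᵀw and a relevance matrix Λ maps back as PᵀΛP. All names and numbers are hypothetical.

```python
def transpose(m):
    """Transpose a matrix stored as nested lists."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Matrix product for nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def prototype_to_voxels(components, mean, prototype):
    """Map a prototype from component space to voxel space: mean + P^T w."""
    return [m + sum(components[k][i] * prototype[k]
                    for k in range(len(prototype)))
            for i, m in enumerate(mean)]

def relevance_to_voxels(components, relevance):
    """Map a relevance matrix to voxel space: P^T Lambda P."""
    return matmul(matmul(transpose(components), relevance), components)

# Toy setup (hypothetical): 2 components over 3 voxels.
components = [[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]]
mean = [10.0, 20.0, 30.0]
prototype = [2.0, -1.0]                 # prototype in component space
relevance = [[0.75, 0.0],
             [0.0, 0.25]]               # relevance matrix in component space

voxel_prototype = prototype_to_voxels(components, mean, prototype)
voxel_relevance = relevance_to_voxels(components, relevance)
relevance_profile = [voxel_relevance[i][i] for i in range(len(mean))]
print(voxel_prototype, relevance_profile)
```

The diagonal of the back-projected relevance matrix gives one value per voxel, which is the kind of profile that can be displayed as a brain map and inspected by clinicians. The third voxel receives zero relevance here because no component loads on it.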