RESUMEN
Identifying groups of patients with similar disease progression patterns is key to understand disease heterogeneity, guide clinical decisions and improve patient care. In this paper, we propose a data-driven temporal stratification approach, ClusTric, combining triclustering and hierarchical clustering. The proposed approach enables the discovery of complex disease progression patterns not found by univariate temporal analyses. As a case study, we use Amyotrophic Lateral Sclerosis (ALS), a neurodegenerative disease with a non-linear and heterogeneous disease progression. In this context, we applied ClusTric to stratify a hospital-based population (Lisbon ALS Clinic dataset) and validate it in a clinical trial population. The results unravelled four clinically relevant disease progression groups: slow progressors, moderate bulbar and spinal progressors, and fast progressors. We compared ClusTric with a state-of-the-art method, showing its effectiveness in capturing the heterogeneity of ALS disease progression in a lower number of clinically relevant progression groups.
Asunto(s)
Esclerosis Amiotrófica Lateral , Progresión de la Enfermedad , Esclerosis Amiotrófica Lateral/patología , Esclerosis Amiotrófica Lateral/fisiopatología , Humanos , Masculino , Análisis por Conglomerados , Femenino , Persona de Mediana Edad , AncianoRESUMEN
BACKGROUND: Amyotrophic Lateral Sclerosis (ALS) is a fatal neurodegenerative disorder characterised by the progressive loss of motor neurons in the brain and spinal cord. The fact that ALS's disease course is highly heterogeneous, and its determinants not fully known, combined with ALS's relatively low prevalence, renders the successful application of artificial intelligence (AI) techniques particularly arduous. OBJECTIVE: This systematic review aims at identifying areas of agreement and unanswered questions regarding two notable applications of AI in ALS, namely the automatic, data-driven stratification of patients according to their phenotype, and the prediction of ALS progression. Differently from previous works, this review is focused on the methodological landscape of AI in ALS. METHODS: We conducted a systematic search of the Scopus and PubMed databases, looking for studies on data-driven stratification methods based on unsupervised techniques resulting in (A) automatic group discovery or (B) a transformation of the feature space allowing patient subgroups to be identified; and for studies on internally or externally validated methods for the prediction of ALS progression. We described the selected studies according to the following characteristics, when applicable: variables used, methodology, splitting criteria and number of groups, prediction outcomes, validation schemes, and metrics. RESULTS: Of the starting 1604 unique reports (2837 combined hits between Scopus and PubMed), 239 were selected for thorough screening, leading to the inclusion of 15 studies on patient stratification, 28 on prediction of ALS progression, and 6 on both stratification and prediction. In terms of variables used, most stratification and prediction studies included demographics and features derived from the ALSFRS or ALSFRS-R scores, which were also the main prediction targets. The most represented stratification methods were K-means, and hierarchical and expectation-maximisation clustering; while random forests, logistic regression, the Cox proportional hazard model, and various flavours of deep learning were the most widely used prediction methods. Predictive model validation was, albeit unexpectedly, quite rarely performed in absolute terms (leading to the exclusion of 78 eligible studies), with the overwhelming majority of included studies resorting to internal validation only. CONCLUSION: This systematic review highlighted a general agreement in terms of input variable selection for both stratification and prediction of ALS progression, and in terms of prediction targets. A striking lack of validated models emerged, as well as a general difficulty in reproducing many published studies, mainly due to the absence of the corresponding parameter lists. While deep learning seems promising for prediction applications, its superiority with respect to traditional methods has not been established; there is, instead, ample room for its application in the subfield of patient stratification. Finally, an open question remains on the role of new environmental and behavioural variables collected via novel, real-time sensors.
Asunto(s)
Esclerosis Amiotrófica Lateral , Humanos , Esclerosis Amiotrófica Lateral/diagnóstico , Inteligencia Artificial , Encéfalo , Análisis por Conglomerados , Bases de Datos FactualesRESUMEN
This work proposes a new class of explainable prognostic models for longitudinal data classification using triclusters. A new temporally constrained triclustering algorithm, termed TCtriCluster, is proposed to comprehensively find informative temporal patterns common to a subset of patients in a subset of features (triclusters), and use them as discriminative features within a state-of-the-art classifier with guarantees of interpretability. The proposed approach further enhances prediction with the potentialities of model explainability by revealing clinically relevant disease progression patterns underlying prognostics, describing features used for classification. The proposed methodology is used in the Amyotrophic Lateral Sclerosis (ALS) Portuguese cohort (N = 1321), providing the first comprehensive assessment of the prognostic limits of five notable clinical endpoints: need for non-invasive ventilation (NIV); need for an auxiliary communication device; need for percutaneous endoscopic gastrostomy (PEG); need for a caregiver; and need for a wheelchair. Triclustering-based predictors outperform state-of-the-art alternatives, being able to predict the need for auxiliary communication device (within 180 days) and the need for PEG (within 90 days) with an AUC above 90%. The approach was validated in clinical practice, supporting healthcare professionals in understanding the link between the highly heterogeneous patterns of ALS disease progression and the prognosis.
Asunto(s)
Esclerosis Amiotrófica Lateral , Ventilación no Invasiva , Humanos , Esclerosis Amiotrófica Lateral/diagnóstico , Esclerosis Amiotrófica Lateral/terapia , Pronóstico , Progresión de la Enfermedad , Respiración Artificial , GastrostomíaRESUMEN
Longitudinal cohort studies to study disease progression generally combine temporal features produced under periodic assessments (clinical follow-up) with static features associated with single-time assessments, genetic, psychophysiological, and demographic profiles. Subspace clustering, including biclustering and triclustering stances, enables the discovery of local and discriminative patterns from such multidimensional cohort data. These patterns, highly interpretable, are relevant to identifying groups of patients with similar traits or progression patterns. Despite their potential, their use for improving predictive tasks in clinical domains remains unexplored. In this work, we propose to learn predictive models from static and temporal data using discriminative patterns, obtained via biclustering and triclustering, as features within a state-of-the-art classifier, thus enhancing model interpretation. triCluster is extended to find time-contiguous triclusters in temporal data (temporal patterns) and a biclustering algorithm to discover coherent patterns in static data. The transformed data space, composed of bicluster and tricluster features, capture local and cross-variable associations with discriminative power, yielding unique statistical properties of interest. As a case study, we applied our methodology to follow-up data from Portuguese patients with Amyotrophic Lateral Sclerosis (ALS) to predict the need for non-invasive ventilation (NIV) since the last appointment. The results showed that, in general, our methodology outperformed baseline results using the original features. Furthermore, the bicluster/tricluster-based patterns used by the classifier can be used by clinicians to understand the models by highlighting relevant prognostic patterns.