Kernel machine learning methods to handle missing responses with complex predictors. Application in modelling five-year glucose changes using distributional representations.
Matabuena, Marcos; Félix, Paulo; García-Meixide, Carlos; Gude, Francisco.
Affiliation
  • Matabuena M; CiTIUS (Centro Singular de Investigación en Tecnoloxías Intelixentes), Universidade de Santiago de Compostela, Santiago de Compostela 15782, Spain. Electronic address: marcos.matabuena@usc.es.
  • Félix P; CiTIUS (Centro Singular de Investigación en Tecnoloxías Intelixentes), Universidade de Santiago de Compostela, Santiago de Compostela 15782, Spain.
  • García-Meixide C; ETH Zürich.
  • Gude F; Unidade de Epidemioloxía Clínica, Complexo Hospitalario Universidade de Santiago (CHUS), Travesía da Choupana, Santiago de Compostela 15706, Spain.
Comput Methods Programs Biomed ; 221: 106905, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35649295
ABSTRACT
BACKGROUND AND OBJECTIVES:

Missing data is a ubiquitous problem in longitudinal studies due to the number of patients lost to follow-up. Kernel methods have enriched the machine learning field by successfully managing non-vectorial predictors, such as graphs, strings, and probability distributions, and have emerged as a promising tool for the analysis of complex data stemming from modern healthcare. This paper proposes a new set of kernel methods to handle missing data in the response variables. These methods are applied to predict long-term changes in glycated haemoglobin (A1c), the primary biomarker used to diagnose and monitor the progression of diabetes mellitus, with an emphasis on exploring the predictive potential of continuous glucose monitoring (CGM).

METHODS:

We propose a new framework of non-linear kernel methods for testing statistical independence, selecting relevant predictors, and quantifying the uncertainty of the resultant predictive models. As a novelty in the clinical analysis, we used a distributional representation of CGM as a predictor and compared its performance with that of traditional diabetes biomarkers.
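As an illustration only, the following Python sketch shows one common way to turn CGM recordings into a distributional predictor and to measure dependence with a kernel statistic. It is not the authors' implementation: representing each patient by an empirical quantile function, using a Gaussian kernel on the 2-Wasserstein distance, choosing bandwidths by the median heuristic, and using the biased HSIC estimator are all assumptions, as are the function names and the synthetic data. The sketch uses fully observed responses and does not reproduce the paper's treatment of missing responses, which the abstract does not detail.

```python
import numpy as np


def quantile_representation(glucose, n_grid=50):
    """Summarise one patient's CGM readings by their empirical quantile function."""
    probs = (np.arange(n_grid) + 0.5) / n_grid  # midpoints of an even grid on (0, 1)
    return np.quantile(np.asarray(glucose, dtype=float), probs)


def gaussian_from_sqdist(d2, bandwidth=None):
    """Gaussian kernel matrix from a matrix of squared distances."""
    if bandwidth is None:
        # Median heuristic on the off-diagonal distances (an assumed default).
        bandwidth = np.sqrt(np.median(d2[np.triu_indices_from(d2, k=1)]))
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def wasserstein_gaussian_kernel(Q):
    """Kernel between 1-D distributions given as rows of quantile values.

    For distributions on the real line, the squared 2-Wasserstein distance is
    the squared L2 distance between quantile functions, approximated here by
    the mean squared difference on the quantile grid.
    """
    diff = Q[:, None, :] - Q[None, :, :]
    return gaussian_from_sqdist(np.mean(diff ** 2, axis=2))


def scalar_gaussian_kernel(y):
    """Gaussian kernel between scalar responses."""
    y = np.asarray(y, dtype=float)
    return gaussian_from_sqdist((y[:, None] - y[None, :]) ** 2)


def hsic(K, L):
    """Biased empirical HSIC statistic: trace(K H L H) / n^2."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    return np.trace(K @ H @ L @ H) / n ** 2


# Synthetic data for demonstration only (not from the study).
rng = np.random.default_rng(0)
cgm = [rng.normal(100 + 10 * i, 15 + i, size=288) for i in range(20)]
Q = np.vstack([quantile_representation(g) for g in cgm])
y = 0.02 * Q.mean(axis=1) + rng.normal(0.0, 0.1, size=20)  # made-up response

K = wasserstein_gaussian_kernel(Q)
L = scalar_gaussian_kernel(y)
stat = hsic(K, L)
print(f"HSIC statistic: {stat:.4f}")
```

In practice the statistic would be compared against a permutation or asymptotic null distribution to obtain a p-value; that step is omitted here for brevity.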

RESULTS:

The results show that, after the incorporation of CGM information, predictive ability increases from R² = 0.61 to R² = 0.71. In addition, the uncertainty analysis helps characterise subpopulations in which predictive performance is worse and for which a more personalised clinical follow-up is advisable, according to the expected uncertainty in each patient's glucose values.

CONCLUSIONS:

The proposed methods have proven to deal effectively with missing data. They also have the potential to improve the results of predictive tasks by including new complex objects as explanatory variables and modelling arbitrary dependence relations. The application of these methods to a longitudinal study of diabetes showed that the inclusion of a distributional representation of CGM data provides greater sensitivity in predicting five-year A1c changes than classical diabetes biomarkers and traditional CGM metrics.

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Diabetes Mellitus / Diabetes Mellitus, Type 1 Type of study: Observational_studies / Prognostic_studies / Risk_factors_studies Limits: Humans Language: En Publication year: 2022 Document type: Article
