Explainable machine learning aggregates polygenic risk scores and electronic health records for Alzheimer's disease prediction.
Sci Rep; 13(1): 450, 2023 01 09.
Article | en | MEDLINE | ID: mdl-36624143
Alzheimer's disease (AD) is the most common late-onset neurodegenerative disorder. Identifying individuals at increased risk of developing AD is important for early intervention. Using data from the Alzheimer Disease Genetics Consortium, we constructed polygenic risk scores (PRSs) for AD and for age-at-onset (AAO) of AD for UK Biobank participants. We then built machine learning (ML) models to predict the development of AD and explored feature importance among PRSs, conventional risk factors, and ICD-10 codes from electronic health records (EHRs), a total of more than 11,000 features, using the UK Biobank dataset. We used eXtreme Gradient Boosting (XGBoost) and SHapley Additive exPlanations (SHAP), which provided superior ML performance and aided explanation of the ML model. For participants aged 40 and older, the area under the curve for AD was 0.88. For participants aged 65 and older (late-onset AD), PRSs were the most important predictors. This is the first observation that PRSs constructed from AD risk and AAO play a more important role than age in predicting AD. The ML model also identified important predictors of developing AD from EHRs, including urinary tract infection, syncope and collapse, chest pain, disorientation, and hypercholesterolemia. Our ML model improved the accuracy of AD risk prediction by efficiently exploring numerous predictors and identified novel feature patterns.
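The abstract does not spell out how the PRSs or the area under the curve were computed. As a rough illustration only, the sketch below assumes the commonly used weighted-sum form of a PRS (risk-allele dosage times a per-variant effect size) and the rank-based definition of the area under the ROC curve. The function names, weights, genotypes, and labels are all hypothetical toy values, and this is not the authors' pipeline, which used XGBoost and SHAP over more than 11,000 features.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1, or 2) by per-variant effect sizes.

    Assumption: the standard additive PRS form; the abstract does not give a formula.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve as the probability that a randomly chosen
    case scores higher than a randomly chosen control (ties count as 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: three variants, four participants.
weights = [0.2, -0.1, 0.4]          # made-up per-variant effect sizes
genotypes = [
    [2, 0, 1],
    [0, 2, 0],
    [1, 1, 2],
    [0, 0, 0],
]
labels = [1, 0, 1, 0]               # 1 = case, 0 = control (made up)

scores = [polygenic_risk_score(g, weights) for g in genotypes]
print("PRS per participant:", scores)
print("AUC of PRS alone:", auc(scores, labels))
```

In a model like the one described, a score of this kind would be one input feature among many rather than a standalone predictor; the toy AUC here says nothing about the reported value of 0.88.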
Full text: 1
Databases: MEDLINE
Main subject: Alzheimer Disease
Study type: Etiology_studies / Prognostic_studies / Risk_factors_studies
Limits: Adult / Aged / Humans
Language: En
Journal: Sci Rep
Year: 2023
Document type: Article
Country of affiliation: United States