Search | BVS Regional Portal

Machine Learning Informed Diagnosis for Congenital Heart Disease in Large Claims Data Source.

Marelli, Ariane J; Li, Chao; Liu, Aihua; Nguyen, Hanh; Moroz, Harry; Brophy, James M; Guo, Liming; Buckeridge, David L; Tang, Jian; Yang, Archer Y; Li, Yue.

JACC Adv ; 3(2): 100801, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38939385

ABSTRACT

Background: With increasing interest in using large claims databases in medical practice and research, efficiently identifying patients with the disease of interest is a meaningful and essential step.
Objectives: This study aims to establish a machine learning (ML) approach to identify patients with congenital heart disease (CHD) in large claims databases.
Methods: We used data from the Quebec claims and hospitalization databases from 1983 to 2000. The study included 19,187 patients, of whom 3,784 were labeled as true CHD patients by a clinician-developed algorithm with manual audits, which served as the gold standard. To build an accurate, automated ML-based CHD classification system, we evaluated Gradient Boosting Decision Tree, Support Vector Machine, and Decision Tree models and compared them with regularized logistic regression. The Area Under the Precision-Recall Curve (AUPRC) was used as the evaluation metric. External validation was conducted on an updated data set extending to 2010, with different subjects.
Results: Among the ML methods evaluated, Gradient Boosting Decision Tree performed best in identifying true CHD patients, with an AUPRC of 99.3%, a sensitivity of 98.0%, and a specificity of 99.7%. External validation yielded similar performance statistics.
Conclusions: This study shows that tedious, time-consuming clinical inspection for identifying CHD patients in large claims databases can be replaced by a highly efficient ML algorithm. Our findings demonstrate that ML methods can automate complicated algorithms for identifying patients with complex diseases.
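
The results above are reported as AUPRC, sensitivity, and specificity. As a minimal sketch of how such metrics are computed from true labels, thresholded predictions, and classifier scores, the pure-Python functions below implement sensitivity, specificity, and average precision, one common step-wise estimator of AUPRC. The function names, the 0.5 threshold, and the toy data are hypothetical, and the abstract does not state which AUPRC estimator the authors used.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels coded 0/1.

    Assumes at least one positive and one negative true label.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC: sum over thresholds of (R_k - R_{k-1}) * P_k.

    Thresholds are visited from the highest score down; tied scores are
    processed together so the result does not depend on their order.
    """
    n_pos = sum(y_true)
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this threshold before updating P and R.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        recall, precision = tp / n_pos, tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Hypothetical toy data: 1 = true CHD patient, 0 = non-CHD.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1]
predicted = [1 if s >= 0.5 else 0 for s in scores]
sens, spec = sensitivity_specificity(labels, predicted)
ap = average_precision(labels, scores)
```

For this toy example, sensitivity is 1.0, specificity is 2/3, and average precision is (1 + 2/3 + 3/4)/3 = 29/36 ≈ 0.806. Unlike sensitivity and specificity, AUPRC does not depend on the chosen threshold.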

Structured learning in time-dependent Cox models.

Wang, Guanbo; Lian, Yi; Yang, Archer Y; Platt, Robert W; Wang, Rui; Perreault, Sylvie; Dorais, Marc; Schnitzer, Mireille E.

Stat Med ; 43(17): 3164-3183, 2024 Jul 30.

Article in English | MEDLINE | ID: mdl-38807296

ABSTRACT

Cox models with time-dependent coefficients and covariates are widely used in survival analysis. In high-dimensional settings, sparse regularization techniques are employed for variable selection, but existing methods for time-dependent Cox models lack flexibility in enforcing specific sparsity patterns (i.e., covariate structures). We propose a flexible framework for variable selection in time-dependent Cox models that accommodates complex selection rules. Our method can adapt to arbitrary grouping structures, including interaction selection and temporal, spatial, tree, and directed acyclic graph structures. It achieves accurate estimation with low false alarm rates. We developed the sox package, which implements a network flow algorithm for efficiently solving models with complex covariate structures. sox offers a user-friendly interface for specifying grouping structures and provides fast computation. Through examples, including a case study identifying predictors of time to all-cause death in patients with atrial fibrillation, we demonstrate the practical application of our method with specific selection rules.
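
For orientation, a generic objective of this kind can be written as a Cox partial log-likelihood with time-dependent covariates, penalized by a weighted group-lasso term. This is an illustrative assumption, not the exact sox formulation: the abstract does not give the penalty, and overlapping or nested groups (tree and DAG structures) typically require a modified penalty. Here $\delta_i$ is the event indicator, $t_i$ the observed time, $x_i(t)$ the covariate vector at time $t$, $R(t_i)$ the risk set, $\beta_g$ the coefficients in group $g \in \mathcal{G}$, $w_g \ge 0$ a group weight, and $\lambda \ge 0$ a tuning parameter.

```latex
\begin{align*}
\ell(\beta) &= \sum_{i=1}^{n} \delta_i \Big[ \beta^\top x_i(t_i)
  - \log \sum_{j \in R(t_i)} \exp\{\beta^\top x_j(t_i)\} \Big],
  \qquad R(t_i) = \{ j : t_j \ge t_i \}, \\
\hat{\beta} &= \arg\min_{\beta} \; -\tfrac{1}{n}\,\ell(\beta)
  + \lambda \sum_{g \in \mathcal{G}} w_g \,\lVert \beta_g \rVert_2 .
\end{align*}
```

Because the Euclidean norm $\lVert \beta_g \rVert_2$ is not differentiable at zero, a large enough $\lambda$ sets an entire group of coefficients to zero at once, which is how group-level selection rules are enforced.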

Subjects

Algorithms, Proportional Hazards Models, Humans, Survival Analysis, Atrial Fibrillation, Time Factors, Computer Simulation

MixEHR-SurG: A joint proportional hazard and guided topic model for inferring mortality-associated topics from electronic health records.

Li, Yixuan; Yang, Archer Y; Marelli, Ariane; Li, Yue.

J Biomed Inform ; 153: 104638, 2024 May.

Article in English | MEDLINE | ID: mdl-38631461

ABSTRACT

Survival models can help medical practitioners evaluate the prognostic importance of clinical variables for patient outcomes such as mortality or hospital readmission, and subsequently design personalized treatment regimes. Electronic Health Records (EHRs) hold promise for large-scale survival analysis based on systematically recorded clinical features for each patient. However, existing survival models either do not scale to high-dimensional, multi-modal EHR data or are difficult to interpret. In this study, we present a supervised topic model called MixEHR-SurG that simultaneously integrates heterogeneous EHR data and models survival hazard. Our contributions are threefold: (1) integrating EHR topic inference with the Cox proportional hazards likelihood; (2) integrating patient-specific topic hyperparameters using PheCode concepts, so that each topic can be identified with exactly one PheCode-associated phenotype; and (3) multi-modal survival topic inference. The result is a highly interpretable survival topic model that can infer PheCode-specific phenotype topics associated with patient mortality. We evaluated MixEHR-SurG using a simulated dataset and two real-world EHR datasets: the Quebec Congenital Heart Disease (CHD) data, consisting of 8211 subjects with 75,187 outpatient claim records covering 1767 unique ICD codes, and MIMIC-III, consisting of 1458 subjects with multi-modal EHR records. Compared with the baselines, MixEHR-SurG achieved superior dynamic AUROC for mortality prediction, with a mean AUROC of 0.89 on the simulated dataset and 0.645 on the CHD dataset. Qualitatively, MixEHR-SurG associates severe cardiac conditions with high mortality risk among CHD patients after their first heart failure hospitalization, and critical brain injuries with increased mortality among MIMIC-III patients after ICU discharge.
Together, integrating the Cox proportional hazards model with EHR topic inference in MixEHR-SurG yields not only competitive mortality prediction but also meaningful phenotype topics for in-depth survival analysis. The software is available on GitHub: https://github.com/li-lab-mcgill/MixEHR-SurG.
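
The abstract reports a mean dynamic AUROC. A common definition is the cumulative/dynamic AUC at horizon t, the probability that a subject who died by time t receives a higher predicted risk than a subject still alive after t. The sketch below is a simplified, unweighted empirical version under stated assumptions: it drops subjects censored before t instead of applying the inverse-probability-of-censoring weights that published estimators usually use, and it averages AUC(t) over a user-chosen grid of horizons. It is not the MixEHR-SurG evaluation code; the function names and toy data are hypothetical.

```python
from typing import Sequence


def cumulative_dynamic_auc(times: Sequence[float], events: Sequence[int],
                           risk: Sequence[float], t: float) -> float:
    """Unweighted cumulative/dynamic AUC at horizon t.

    Cases: observed event (events == 1) at or before t.
    Controls: observed time strictly after t.
    Subjects censored at or before t are ignored (no IPCW weighting).
    """
    cases = [r for ti, ei, r in zip(times, events, risk) if ti <= t and ei == 1]
    controls = [r for ti, r in zip(times, risk) if ti > t]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at horizon t")
    concordant = 0.0
    for rc in cases:
        for rk in controls:
            if rc > rk:
                concordant += 1.0   # case ranked above control
            elif rc == rk:
                concordant += 0.5   # ties count as half
    return concordant / (len(cases) * len(controls))


def mean_dynamic_auc(times: Sequence[float], events: Sequence[int],
                     risk: Sequence[float], horizons: Sequence[float]) -> float:
    """Simple arithmetic mean of AUC(t) over a grid of horizons."""
    aucs = [cumulative_dynamic_auc(times, events, risk, t) for t in horizons]
    return sum(aucs) / len(aucs)


# Hypothetical toy cohort: follow-up times, death indicators, predicted risks.
times = [2, 3, 5, 6, 8, 10]
events = [1, 0, 1, 1, 0, 1]
risk = [0.9, 0.4, 0.7, 0.3, 0.2, 0.5]
```

On this toy cohort, AUC(5) = 1.0 and AUC(6) = 5/6, so the mean over the horizons [5, 6] is 11/12. The subject censored at time 3 contributes to neither group at these horizons.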

Subjects

Electronic Health Records, Proportional Hazards Models, Humans, Survival Analysis, Algorithms, Prognosis, Mortality

Variable selection in high dimensions for discrete-outcome individualized treatment rules: Reducing severity of depression symptoms.

Moodie, Erica E M; Bian, Zeyu; Coulombe, Janie; Lian, Yi; Yang, Archer Y; Shortreed, Susan M.

Biostatistics ; 2023 Aug 31.

Article in English | MEDLINE | ID: mdl-37660312

ABSTRACT

Despite growing interest in estimating individualized treatment rules, little attention has been given to the binary outcome setting. Estimation is challenging with nonlinear link functions, especially when variable selection is needed. We use a new computational approach to solve a recently proposed doubly robust regularized estimating equation, accomplishing this difficult task in a case study of depression treatment. We demonstrate an application of this new approach, in combination with a weighted and penalized estimating equation, to this challenging binary outcome setting, and we show both the double robustness of the method and its effectiveness for variable selection. The work is motivated by, and applied to, an analysis of treatment for unipolar depression in a population of patients treated at Kaiser Permanente Washington.
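
Double robustness means an estimator remains consistent if either the treatment (propensity) model or the outcome model is correctly specified. The authors solve a doubly robust regularized estimating equation whose form the abstract does not give. As a simpler illustration of the same property, the sketch below uses a different, standard construction: the augmented inverse-probability-weighted (AIPW) estimate of the mean outcome if everyone followed a fixed treatment rule. It assumes that fitted propensity scores and outcome-model predictions are supplied as inputs, performs no variable selection, and uses hypothetical function and argument names.

```python
from typing import Sequence


def aipw_value(a: Sequence[int], y: Sequence[float], rule: Sequence[int],
               p_treat: Sequence[float], q_rule: Sequence[float]) -> float:
    """AIPW (doubly robust) estimate of the mean outcome under `rule`.

    a:       observed binary treatment (0/1)
    y:       observed outcome, e.g. binary response (0/1)
    rule:    treatment d(X) recommended by the rule for each subject (0/1)
    p_treat: fitted propensity P(A = 1 | X) for each subject
    q_rule:  fitted outcome-model prediction E[Y | X, A = d(X)]
    """
    total = 0.0
    for ai, yi, di, pi, qi in zip(a, y, rule, p_treat, q_rule):
        # Probability of receiving the treatment the rule recommends.
        p_follow = pi if di == 1 else 1.0 - pi
        follows = 1.0 if ai == di else 0.0
        # Outcome-model prediction plus an inverse-weighted residual correction
        # computed only from subjects whose observed treatment matches the rule.
        total += qi + follows * (yi - qi) / p_follow
    return total / len(a)
```

With `q_rule` set to zero, the estimate reduces to plain inverse-probability weighting. When the outcome model is correct, the residual term has mean zero, so the estimate stays consistent even if the propensity scores are misspecified.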
