Machine Learning Informed Diagnosis for Congenital Heart Disease in Large Claims Data Source.

Marelli, Ariane J; Li, Chao; Liu, Aihua; Nguyen, Hanh; Moroz, Harry; Brophy, James M; Guo, Liming; Buckeridge, David L; Tang, Jian; Yang, Archer Y; Li, Yue

Affiliations

Marelli AJ; McGill University Health Centre, McGill Adult Unit for Congenital Heart Disease Excellence, Montreal, Québec, Canada.
Li C; McGill University Health Centre, McGill Adult Unit for Congenital Heart Disease Excellence, Montreal, Québec, Canada.
Liu A; McGill University Health Centre, McGill Adult Unit for Congenital Heart Disease Excellence, Montreal, Québec, Canada.
Nguyen H; McGill University Health Centre, McGill Adult Unit for Congenital Heart Disease Excellence, Montreal, Québec, Canada.
Moroz H; McGill University Health Centre, McGill Adult Unit for Congenital Heart Disease Excellence, Montreal, Québec, Canada.
Brophy JM; Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montreal, Québec, Canada.
Guo L; McGill University Health Centre, McGill Adult Unit for Congenital Heart Disease Excellence, Montreal, Québec, Canada.
Buckeridge DL; Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montreal, Québec, Canada.
Tang J; Department of Decision Sciences HEC, Université de Montréal, Montreal, Québec, Canada.
Yang AY; Department of Mathematics and Statistics, McGill University, Montreal, Québec, Canada.
Li Y; School of Computer Science, McGill University, Montreal, Québec, Canada.

JACC Adv ; 3(2): 100801, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38939385

ABSTRACT

Background:

With increasing interest in using large claims databases in medical practice and research, efficiently identifying patients with the disease of interest is an essential step.

Objectives:

This study aims to establish a machine learning (ML) approach to identify patients with congenital heart disease (CHD) in large claims databases.

Methods:

We used data from the Quebec claims and hospitalization databases from 1983 to 2000. The study included 19,187 patients, of whom 3,784 were labeled as true CHD patients using a clinician-developed algorithm with manual audits, which was considered the gold standard. To establish an accurate, automated ML-based CHD classification system, we evaluated ML methods including Gradient Boosting Decision Tree, Support Vector Machine, and Decision Tree, and compared them with regularized logistic regression. The Area Under the Precision-Recall Curve was used as the evaluation metric. External validation was conducted on a data set updated to 2010 that contained different subjects.
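As an illustration of the evaluation metric, the sketch below computes a step-wise Area Under the Precision-Recall Curve (often called average precision) from gold-standard labels and model scores. It is not the authors' code: the function name `average_precision` and the toy data are hypothetical, and the sketch assumes binary labels (1 = CHD, 0 = non-CHD) and distinct scores. The metric suits this setting because only 3,784 of 19,187 patients (about 20%) are positive, and a classifier that ranks patients at random would score close to that prevalence.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    labels: iterable of 0/1 gold-standard labels (1 = CHD).
    scores: iterable of predicted scores, higher = more likely CHD.
    Assumes scores are distinct; ties are not treated specially.
    """
    labels = list(labels)
    scores = list(scores)
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank patients from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Precision at this rank; each positive adds 1/total_pos of recall.
            precision_sum += true_pos / rank
    return precision_sum / total_pos


# Made-up toy data, for illustration only.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
print(average_precision(toy_labels, toy_scores))
```

A score of 1.0 means every true CHD patient is ranked above every non-CHD patient.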

Results:

Among the ML methods evaluated, Gradient Boosting Decision Tree performed best in identifying true CHD patients, with an Area Under the Precision-Recall Curve of 99.3%, a sensitivity of 98.0%, and a specificity of 99.7%. External validation yielded similar model performance.
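For readers less familiar with the two threshold-based statistics reported here, the following minimal sketch shows how sensitivity and specificity are derived from binary predictions and gold-standard labels. The function name `sensitivity_specificity` and the example vectors are hypothetical and do not reproduce the study's data.

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for 0/1 labels and 0/1 predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # CHD correctly flagged
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # CHD missed
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # non-CHD correctly cleared
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # non-CHD wrongly flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    sensitivity = tp / (tp + fn)  # share of true CHD patients identified
    specificity = tn / (tn + fp)  # share of non-CHD patients correctly excluded
    return sensitivity, specificity


# Made-up example: 2 CHD patients and 3 non-CHD patients.
sens, spec = sensitivity_specificity([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```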

Conclusions:

This study shows that tedious, time-consuming clinical inspection for CHD patient identification can be replaced by a highly efficient ML algorithm in large claims databases. Our findings demonstrate that ML methods can be used to automate complicated algorithms to identify patients with complex diseases.

Keywords

congenital heart disease; large administrative claims database; machine learning

Full text: 1 Collection: 01-international Database: MEDLINE Language: En Journal: JACC Adv Year: 2024 Document type: Article Affiliation country: Canada