Retrospective comparison of traditional and artificial intelligence-based heart failure phenotyping in a US health system to enable real-world evidence.

Garan, Arthur Reshad; Monda, Keri L; Dent-Acosta, Ricardo E; Riskin, Daniel J; Gluckman, Ty J

Affiliations

Garan AR; Beth Israel Deaconess Medical Center, Department of Medicine, Division of Cardiology, Harvard Medical School, Boston, Massachusetts, USA agaran@bidmc.harvard.edu.
Monda KL; Amgen Inc, Thousand Oaks, California, USA.
Dent-Acosta RE; Amgen Inc, Thousand Oaks, California, USA.
Riskin DJ; Verantos, Menlo Park, California, USA.
Gluckman TJ; Center for Cardiovascular Analytics, Research and Data Science (CARDS), Providence Heart Institute, Providence Research Network, Portland, Oregon, USA.

BMJ Open; 13(8): e073178, 2023 Aug 09.

Article in En | MEDLINE | ID: mdl-37558448

ABSTRACT

OBJECTIVE:

Quantitatively evaluate the quality of data underlying real-world evidence (RWE) in heart failure (HF).

DESIGN:

The accuracy of identifying patients with HF and their phenotypic information was compared retrospectively between a traditional RWE approach (ie, structured query language applied to structured electronic health record (EHR) data) and an advanced RWE approach (ie, artificial intelligence (AI) applied to unstructured EHR data). The performance of each approach was measured by the harmonic mean of precision and recall (F1 score), using manual annotation of medical records as the reference standard.
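As an illustration of the metric named above, the following is a minimal sketch of how an F1 score can be computed for a single concept from counts of true positives, false positives and false negatives judged against a reference standard. The function name `f1_score` and the example counts are hypothetical and are not taken from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the harmonic mean of precision and recall.

    tp, fp and fn are counts of true positives, false positives and
    false negatives relative to a reference standard (for example,
    manual annotation). Returns 0.0 when the score is undefined.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one concept (illustrative only, not study data).
example = f1_score(tp=80, fp=20, fn=20)
print(f"F1 = {example:.1%}")
```

Because the harmonic mean is pulled toward the lower of its two inputs, an approach with high precision but poor recall (or the reverse) still receives a low F1 score.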

SETTING:

EHR data from a large academic healthcare system in North America between 2015 and 2019, with an expected catchment of approximately 500 000 patients.

POPULATION:

4288 encounters for 1155 patients aged 18-85 years, with 472 patients identified as having HF.

OUTCOME MEASURES:

HF and associated concepts, such as comorbidities, left ventricular ejection fraction, and selected medications.

RESULTS:

The average F1 scores across 19 HF-specific concepts were 49.0% and 94.1% for the traditional and advanced approaches, respectively (p<0.001 for all concepts with available data). The absolute difference in F1 score between approaches was 45.1% (98.1% relative increase in F1 score using the advanced approach). The advanced approach achieved superior F1 scores for HF presence, phenotype and associated comorbidities. Some phenotypes, such as HF with preserved ejection fraction, revealed dramatic differences in extraction accuracy based on technology applied, with a 4.9% F1 score when using natural language processing (NLP) alone and a 91.0% F1 score when using NLP plus AI-based inference.

CONCLUSIONS:

A traditional RWE generation approach resulted in low data quality in patients with HF. While an advanced approach demonstrated high accuracy, the results varied dramatically based on extraction techniques. For future studies, advanced approaches and accuracy measurement may be required to ensure data are fit-for-purpose.

Subjects

Artificial Intelligence; Heart Failure; Humans; Retrospective Studies; Stroke Volume; Ventricular Function, Left; Electronic Health Records; Natural Language Processing

Keywords

cardiac epidemiology; health informatics; heart failure

Full text: 1 Database: MEDLINE Main subject: Artificial Intelligence / Heart Failure Type of study: Observational_studies / Prognostic_studies / Risk_factors_studies Limits: Humans Language: En Year of publication: 2023 Document type: Article
