Results 1 - 20 of 56
1.
Circulation ; 149(12): 917-931, 2024 03 19.
Article in English | MEDLINE | ID: mdl-38314583

ABSTRACT

BACKGROUND: Artificial intelligence-enhanced ECG analysis shows promise to detect ventricular dysfunction and remodeling in adult populations. However, its application to pediatric populations remains underexplored. METHODS: A convolutional neural network was trained on paired ECG-echocardiograms (≤2 days apart) from patients ≤18 years of age without major congenital heart disease to detect human expert-classified greater than mild left ventricular (LV) dysfunction, hypertrophy, and dilation (individually and as a composite outcome). Model performance was evaluated on single ECG-echocardiogram pairs per patient at Boston Children's Hospital and externally at Mount Sinai Hospital using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). RESULTS: The training cohort comprised 92 377 ECG-echocardiogram pairs (46 261 patients; median age, 8.2 years). Test groups included internal testing (12 631 patients; median age, 8.8 years; 4.6% composite outcomes), emergency department (2830 patients; median age, 7.7 years; 10.0% composite outcomes), and external validation (5088 patients; median age, 4.3 years; 6.1% composite outcomes) cohorts. Model performance was similar on the internal test and emergency department cohorts, with model predictions of LV hypertrophy outperforming the pediatric cardiologist expert benchmark. Adding age and sex to the model did not improve its performance. When using quantitative outcome cutoffs, model performance was similar between internal testing (composite outcome: AUROC, 0.88, AUPRC, 0.43; LV dysfunction: AUROC, 0.92, AUPRC, 0.23; LV hypertrophy: AUROC, 0.88, AUPRC, 0.28; LV dilation: AUROC, 0.91, AUPRC, 0.47) and external validation (composite outcome: AUROC, 0.86, AUPRC, 0.39; LV dysfunction: AUROC, 0.94, AUPRC, 0.32; LV hypertrophy: AUROC, 0.84, AUPRC, 0.25; LV dilation: AUROC, 0.87, AUPRC, 0.33), with composite outcome negative predictive values of 99.0% and 99.2%, respectively. Saliency mapping highlighted ECG components that influenced model predictions (precordial QRS complexes for all outcomes; T waves for LV dysfunction). High-risk ECG features included lateral T-wave inversion (LV dysfunction); deep S waves in V1 and V2 and tall R waves in V6 (LV hypertrophy); and tall R waves in V4 through V6 (LV dilation). CONCLUSIONS: This externally validated algorithm shows promise to inexpensively screen for LV dysfunction and remodeling in children, which may facilitate improved access to care by democratizing the expertise of pediatric cardiologists.
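The screening metrics reported above (AUROC, AUPRC, and negative predictive value) can be reproduced with standard tools. Below is a minimal, hedged sketch using scikit-learn; the prevalence, labels, and scores are synthetic stand-ins, not the study's data.

```python
# Illustrative only: computing AUROC, AUPRC, and negative predictive value
# with scikit-learn on synthetic labels and scores.
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.binomial(1, 0.046, size=12631)      # ~4.6% composite-outcome prevalence
y_score = np.clip(0.35 * y_true + rng.normal(0.3, 0.2, size=y_true.size), 0, 1)

print(f"AUROC: {roc_auc_score(y_true, y_score):.2f}")
print(f"AUPRC: {average_precision_score(y_true, y_score):.2f}")  # common AUPRC estimator

y_pred = (y_score >= 0.5).astype(int)            # an arbitrary screening threshold
tn = ((y_pred == 0) & (y_true == 0)).sum()
fn = ((y_pred == 0) & (y_true == 1)).sum()
print(f"NPV: {tn / (tn + fn):.3f}")
```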


Subject(s)
Deep Learning , Ventricular Dysfunction, Left , Adult , Humans , Child , Child, Preschool , Electrocardiography , Artificial Intelligence , Ventricular Dysfunction, Left/diagnostic imaging , Hypertrophy, Left Ventricular/diagnostic imaging
2.
Ann Intern Med ; 176(10): 1358-1369, 2023 10.
Article in English | MEDLINE | ID: mdl-37812781

ABSTRACT

BACKGROUND: Substantial effort has been directed toward demonstrating uses of predictive models in health care. However, implementation of these models into clinical practice may influence patient outcomes, which in turn are captured in electronic health record data. As a result, deployed models may affect the predictive ability of current and future models. OBJECTIVE: To estimate changes in predictive model performance with use through 3 common scenarios: model retraining, sequentially implementing 1 model after another, and intervening in response to a model when 2 are simultaneously implemented. DESIGN: Simulation of model implementation and use in critical care settings at various levels of intervention effectiveness and clinician adherence. Models were either trained or retrained after simulated implementation. SETTING: Admissions to the intensive care unit (ICU) at Mount Sinai Health System (New York, New York) and Beth Israel Deaconess Medical Center (Boston, Massachusetts). PATIENTS: 130 000 critical care admissions across both health systems. INTERVENTION: Across 3 scenarios, interventions were simulated at varying levels of clinician adherence and effectiveness. MEASUREMENTS: Statistical measures of performance, including threshold-independent (area under the curve) and threshold-dependent measures. RESULTS: At a fixed 90% sensitivity, a mortality prediction model lost 9% to 39% specificity after one retraining in scenario 1 and lost 8% to 15% specificity when created after the implementation of an acute kidney injury (AKI) prediction model in scenario 2; in scenario 3, when models for AKI and mortality prediction were implemented simultaneously, each reduced the effective accuracy of the other by 1% to 28%. LIMITATIONS: In real-world practice, the effectiveness of and adherence to model-based recommendations are rarely known in advance. Only binary classifiers for tabular ICU admissions data were simulated. CONCLUSION: In simulated ICU settings, no universally effective model-updating approach for maintaining model performance seems to exist. Model use may have to be recorded to maintain the viability of predictive modeling. PRIMARY FUNDING SOURCE: National Center for Advancing Translational Sciences.
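The scenario comparisons above are anchored to specificity at a fixed 90% sensitivity. The sketch below shows one way to compute that quantity from model scores with scikit-learn; the data are simulated placeholders.

```python
# Sketch (assumed data): find the threshold that reaches 90% sensitivity
# on the ROC curve and report the specificity at that operating point.
import numpy as np
from sklearn.metrics import roc_curve

def specificity_at_sensitivity(y_true, y_score, target_sens=0.90):
    fpr, tpr, _ = roc_curve(y_true, y_score)
    idx = np.searchsorted(tpr, target_sens)   # first point reaching target sensitivity
    return 1.0 - fpr[idx]

rng = np.random.default_rng(1)
y = rng.binomial(1, 0.1, 5000)
s = 0.4 * y + rng.normal(0.3, 0.2, 5000)
print(f"Specificity at 90% sensitivity: {specificity_at_sensitivity(y, s):.2f}")
```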


Subject(s)
Acute Kidney Injury , Artificial Intelligence , Humans , Intensive Care Units , Critical Care , Delivery of Health Care
3.
J Am Soc Nephrol ; 32(1): 151-160, 2021 01.
Article in English | MEDLINE | ID: mdl-32883700

ABSTRACT

BACKGROUND: Early reports indicate that AKI is common among patients with coronavirus disease 2019 (COVID-19) and associated with worse outcomes. However, AKI among hospitalized patients with COVID-19 in the United States is not well described. METHODS: This retrospective, observational study involved a review of data from electronic health records of patients aged ≥18 years with laboratory-confirmed COVID-19 admitted to the Mount Sinai Health System from February 27 to May 30, 2020. We describe the frequency of AKI and dialysis requirement, AKI recovery, and adjusted odds ratios (aORs) for mortality. RESULTS: Of 3993 hospitalized patients with COVID-19, AKI occurred in 1835 (46%); 347 (19%) of the patients with AKI required dialysis. The proportions with stages 1, 2, or 3 AKI were 39%, 19%, and 42%, respectively. A total of 976 (24%) patients were admitted to intensive care, of whom 745 (76%) experienced AKI. Of the 435 patients with AKI and urine studies, 84% had proteinuria, 81% had hematuria, and 60% had leukocyturia. Independent predictors of severe AKI were CKD, male sex, and higher serum potassium at admission. In-hospital mortality was 50% among patients with AKI versus 8% among those without AKI (aOR, 9.2; 95% confidence interval, 7.5 to 11.3). Of survivors with AKI who were discharged, 35% had not recovered to baseline kidney function by the time of discharge. An additional 28 of 77 (36%) patients who had not recovered kidney function at discharge did so on posthospital follow-up. CONCLUSIONS: AKI is common among patients hospitalized with COVID-19 and is associated with high mortality. Of all patients with AKI, only 30% survived with recovery of kidney function by the time of discharge.
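The aOR of 9.2 reported above comes from a covariate-adjusted logistic regression. A minimal sketch of that analysis pattern with statsmodels follows; the covariates and simulated data are illustrative assumptions, not the study dataset.

```python
# Hedged sketch: adjusted odds ratio (aOR) for mortality given AKI via
# logistic regression with covariates. All values are synthetic.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 4000
df = pd.DataFrame({
    "aki": rng.binomial(1, 0.46, n),
    "age": rng.normal(64, 15, n),
    "male": rng.binomial(1, 0.6, n),
})
logit = -3 + 2.2 * df["aki"] + 0.03 * (df["age"] - 64)
df["died"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

model = smf.logit("died ~ aki + age + male", data=df).fit(disp=0)
aor = np.exp(model.params["aki"])
ci = np.exp(model.conf_int().loc["aki"])
print(f"aOR for AKI: {aor:.1f} (95% CI {ci[0]:.1f}-{ci[1]:.1f})")
```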


Subject(s)
Acute Kidney Injury/etiology , COVID-19/complications , SARS-CoV-2 , Acute Kidney Injury/epidemiology , Acute Kidney Injury/therapy , Acute Kidney Injury/urine , Aged , Aged, 80 and over , COVID-19/mortality , Female , Hematuria/etiology , Hospital Mortality , Hospitals, Private/statistics & numerical data , Hospitals, Urban/statistics & numerical data , Humans , Incidence , Inpatients , Leukocytes , Male , Middle Aged , New York City/epidemiology , Proteinuria/etiology , Renal Dialysis , Retrospective Studies , Treatment Outcome , Urine/cytology
4.
Europace ; 23(8): 1179-1191, 2021 08 06.
Article in English | MEDLINE | ID: mdl-33564873

ABSTRACT

In the past decade, deep learning, a subset of artificial intelligence and machine learning, has been used to identify patterns in big healthcare datasets for disease phenotyping, event prediction, and complex decision making. Public datasets for electrocardiograms (ECGs) have existed since the 1980s and have been used for very specific tasks in cardiology, such as arrhythmia, ischemia, and cardiomyopathy detection. Recently, private institutions have begun curating large ECG databases that are orders of magnitude larger than the public databases, for ingestion by deep learning models. These efforts have demonstrated not only improved performance and generalizability in the aforementioned tasks but also application to novel clinical scenarios. This review orients the clinician to the fundamental tenets of deep learning, the state of the art that preceded its use for ECG analysis, and current applications of deep learning to ECGs, as well as their limitations and future areas of improvement.
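To make the review's subject concrete, here is a minimal 1D convolutional network for ECG classification in PyTorch. The architecture, shapes, and hyperparameters are illustrative only and do not correspond to any model discussed in the review.

```python
# A toy 1D CNN for single-lead ECG classification (illustrative sketch).
import torch
import torch.nn as nn

class ECG1DCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(4),
            nn.AdaptiveAvgPool1d(1),                 # global pooling over time
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):                            # x: (batch, 1, samples)
        return self.classifier(self.features(x).squeeze(-1))

model = ECG1DCNN()
fake_batch = torch.randn(8, 1, 5000)                 # 8 ten-second ECGs at 500 Hz
print(model(fake_batch).shape)                       # torch.Size([8, 2])
```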


Subject(s)
Cardiology , Deep Learning , Artificial Intelligence , Electrocardiography , Humans , Machine Learning
5.
Blood Purif ; 50(4-5): 621-627, 2021.
Article in English | MEDLINE | ID: mdl-33631752

ABSTRACT

BACKGROUND/AIMS: Acute kidney injury (AKI) in critically ill patients is common, and continuous renal replacement therapy (CRRT) is a preferred mode of renal replacement therapy (RRT) in hemodynamically unstable patients. Prediction of clinical outcomes in patients on CRRT is challenging. We utilized several approaches to predict RRT-free survival (RRTFS) in critically ill patients with AKI requiring CRRT. METHODS: We used the Medical Information Mart for Intensive Care (MIMIC-III) database to identify patients ≥18 years old with AKI on CRRT, after excluding patients with ESRD on chronic dialysis or a kidney transplant. We defined RRTFS as being discharged alive without requiring RRT for ≥7 days prior to hospital discharge. We utilized all available biomedical data up to CRRT initiation. We evaluated 7 approaches, including logistic regression (LR), random forest (RF), support vector machine (SVM), adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), multilayer perceptron (MLP), and MLP with long short-term memory (MLP + LSTM). We evaluated model performance using area under the receiver operating characteristic (AUROC) curves. RESULTS: Of 684 patients with AKI on CRRT, 205 (30%) had RRTFS. The median age of patients was 63 years, and their median Simplified Acute Physiology Score (SAPS) II was 67 (interquartile range 52-84). The MLP + LSTM showed the highest AUROC (95% CI) of 0.70 (0.67-0.73), followed by MLP 0.59 (0.54-0.64), LR 0.57 (0.52-0.62), SVM 0.51 (0.46-0.56), AdaBoost 0.51 (0.46-0.55), RF 0.44 (0.39-0.48), and XGBoost 0.43 (0.38-0.47). CONCLUSIONS: An MLP + LSTM model outperformed other approaches for predicting RRTFS. Performance could be further improved by incorporating other data types.
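A condensed sketch of the model-comparison workflow the abstract describes, using the scikit-learn subset of the evaluated approaches (LR, RF, SVM, AdaBoost) on synthetic data; the MLP + LSTM and XGBoost variants are omitted for brevity.

```python
# Illustrative comparison of classifiers by held-out AUROC on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=684, n_features=30, weights=[0.7], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(random_state=0),
    "SVM": SVC(probability=True, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"{name}: AUROC={auc:.2f}")
```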


Subject(s)
Acute Kidney Injury/therapy , Renal Replacement Therapy , Acute Kidney Injury/diagnosis , Age Factors , Aged , Critical Care , Female , Humans , Logistic Models , Machine Learning , Male , Middle Aged , Prognosis
6.
Kidney Int ; 98(5): 1323-1330, 2020 11.
Article in English | MEDLINE | ID: mdl-32540406

ABSTRACT

Urinary tract stones have high heritability, indicating a strong genetic component. However, genome-wide association studies (GWAS) have uncovered only a few genome-wide significant single nucleotide polymorphisms (SNPs). Polygenic risk scores (PRS) sum the cumulative effects of many SNPs and shed light on the underlying genetic architecture. Using GWAS summary statistics from 361,141 participants in the United Kingdom Biobank, we generated a PRS and determined its association with stone diagnosis in 28,877 participants in the Mount Sinai BioMe Biobank. In BioMe (1,071 cases and 27,806 controls), each standard deviation increase in the PRS was associated with a significant adjusted odds ratio of 1.2 (95% confidence interval 1.13-1.26). In comparison, a risk score composed of genome-wide significant SNPs was not significantly associated with diagnosis. After stratifying individuals into low- and high-risk categories on clinical risk factors, each standard deviation increment in PRS was associated with a significant adjusted odds ratio of 1.3 (1.12-1.6) in the low-risk and 1.2 (1.1-1.2) in the high-risk group. In a 14,348-participant validation cohort (Penn Medicine Biobank), each standard deviation increment was associated with a significant adjusted odds ratio of 1.1 (1.03-1.2). Thus, a genome-wide PRS is associated with urinary tract stones overall and in the absence of known clinical risk factors, illustrating their complex polygenic architecture.
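The core PRS construction is a weighted sum of risk-allele counts, standardized so that a logistic regression yields an odds ratio per standard deviation. A hedged sketch with simulated genotypes and effect sizes:

```python
# Sketch only: PRS as a weighted sum of allele counts, scaled per SD,
# then associated with case status via logistic regression. Simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n_people, n_snps = 5000, 1000
genotypes = rng.binomial(2, 0.3, size=(n_people, n_snps))   # 0/1/2 allele counts
weights = rng.normal(0, 0.05, n_snps)                       # GWAS effect sizes (betas)

prs = genotypes @ weights
prs_std = (prs - prs.mean()) / prs.std()                    # per-SD scaling

logit = -3.2 + 0.18 * prs_std
cases = rng.binomial(1, 1 / (1 + np.exp(-logit)))
fit = sm.Logit(cases, sm.add_constant(prs_std)).fit(disp=0)
print(f"OR per SD: {np.exp(fit.params[1]):.2f}")
```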


Subject(s)
Genome-Wide Association Study , Urinary Calculi , Genetic Predisposition to Disease , Humans , Multifactorial Inheritance , Polymorphism, Single Nucleotide , United Kingdom/epidemiology
7.
Curr Opin Nephrol Hypertens ; 29(3): 319-326, 2020 05.
Article in English | MEDLINE | ID: mdl-32235273

ABSTRACT

PURPOSE OF REVIEW: The universal adoption of electronic health records, improvements in technology, and the availability of continuous monitoring have generated large quantities of healthcare data. Machine learning is increasingly adopted by nephrology researchers to analyze these data in order to improve the care of their patients. RECENT FINDINGS: In this review, we provide a broad overview of the different types of machine learning algorithms currently available and how researchers have applied these methods in nephrology research. Current applications have included prediction of acute kidney injury and chronic kidney disease, along with progression of kidney disease. Researchers have demonstrated the ability of machine learning to read kidney biopsy samples, identify patient outcomes from unstructured data, and identify subtypes in complex diseases. We end with a discussion of the ethics and potential pitfalls of machine learning. SUMMARY: Machine learning gives researchers the ability to analyze data that were previously inaccessible. While the field is still burgeoning, several studies show promising results, which will enable researchers to perform larger-scale studies and clinicians to provide more personalized care. However, we must ensure that implementation aids providers and does not lead to patient harm.


Subject(s)
Kidney Diseases/therapy , Machine Learning , Algorithms , Humans , Natural Language Processing , Translational Research, Biomedical
8.
J Med Internet Res ; 22(11): e24018, 2020 11 06.
Article in English | MEDLINE | ID: mdl-33027032

ABSTRACT

BACKGROUND: COVID-19 has infected millions of people worldwide and is responsible for several hundred thousand fatalities. The COVID-19 pandemic has necessitated thoughtful resource allocation and early identification of high-risk patients. However, effective methods to meet these needs are lacking. OBJECTIVE: The aims of this study were to analyze the electronic health records (EHRs) of patients who tested positive for COVID-19 and were admitted to hospitals in the Mount Sinai Health System in New York City; to develop machine learning models for making predictions about the hospital course of the patients over clinically meaningful time horizons based on patient characteristics at admission; and to assess the performance of these models at multiple hospitals and time points. METHODS: We used Extreme Gradient Boosting (XGBoost) and baseline comparator models to predict in-hospital mortality and critical events at time windows of 3, 5, 7, and 10 days from admission. Our study population included harmonized EHR data from five hospitals in New York City for 4098 COVID-19-positive patients admitted from March 15 to May 22, 2020. The models were first trained on patients from a single hospital (n=1514) before or on May 1, externally validated on patients from four other hospitals (n=2201) before or on May 1, and prospectively validated on all patients after May 1 (n=383). Finally, we assessed model interpretability to identify and rank the variables that drive model predictions. RESULTS: Upon cross-validation, the XGBoost classifier outperformed baseline models, with an area under the receiver operating characteristic curve (AUC-ROC) for mortality of 0.89 at 3 days, 0.85 at 5 and 7 days, and 0.84 at 10 days. XGBoost also performed well for critical event prediction, with an AUC-ROC of 0.80 at 3 days, 0.79 at 5 days, 0.80 at 7 days, and 0.81 at 10 days. In external validation, XGBoost achieved an AUC-ROC of 0.88 at 3 days, 0.86 at 5 days, 0.86 at 7 days, and 0.84 at 10 days for mortality prediction. Similarly, the unimputed XGBoost model achieved an AUC-ROC of 0.78 at 3 days, 0.79 at 5 days, 0.80 at 7 days, and 0.81 at 10 days. Trends in performance on prospective validation sets were similar. At 7 days, acute kidney injury on admission, elevated LDH, tachypnea, and hyperglycemia were the strongest drivers of critical event prediction, while higher age, anion gap, and C-reactive protein were the strongest drivers of mortality prediction. CONCLUSIONS: We trained machine learning models for mortality and critical events in patients with COVID-19 at different time horizons and validated them externally and prospectively. These models identified at-risk patients and uncovered underlying relationships that predicted outcomes.
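An illustrative skeleton of the XGBoost classification setup described above, with synthetic admission features standing in for the harmonized EHR data:

```python
# Sketch only: XGBoost for admission-time mortality prediction on synthetic
# features; not the study's pipeline or data.
import numpy as np
import xgboost as xgb
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 4098
X = rng.normal(size=(n, 20))                      # admission labs/vitals (synthetic)
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1] - 2))))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = xgb.XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
clf.fit(X_tr, y_tr)
print(f"AUC-ROC: {roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]):.2f}")
```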


Subject(s)
Coronavirus Infections/diagnosis , Coronavirus Infections/mortality , Machine Learning/standards , Pneumonia, Viral/diagnosis , Pneumonia, Viral/mortality , Acute Kidney Injury/epidemiology , Adolescent , Adult , Aged , Aged, 80 and over , Betacoronavirus , COVID-19 , Cohort Studies , Electronic Health Records , Female , Hospital Mortality , Hospitalization/statistics & numerical data , Hospitals , Humans , Male , Middle Aged , New York City/epidemiology , Pandemics , Prognosis , ROC Curve , Risk Assessment/methods , Risk Assessment/standards , SARS-CoV-2 , Young Adult
9.
J Am Med Inform Assoc ; 31(9): 2097-2102, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-38687616

ABSTRACT

OBJECTIVES: The study developed a framework that leverages an open-source Large Language Model (LLM) to enable clinicians to ask plain-language questions about a patient's entire echocardiogram report history. This approach is intended to streamline the extraction of clinical insights from multiple echocardiogram reports, particularly in patients with complex cardiac diseases, thereby enhancing both patient care and research efficiency. MATERIALS AND METHODS: Data from over 10 years were collected, comprising echocardiogram reports from patients with more than 10 echocardiograms on file at the Mount Sinai Health System. These reports were converted into a single document per patient and broken into snippets; relevant snippets were retrieved using text similarity measures. The LLaMA-2 70B model was employed to analyze the text using a specially crafted prompt. The model's performance was evaluated against ground-truth answers created by faculty cardiologists. RESULTS: The study analyzed 432 reports from 37 patients, for a total of 100 question-answer pairs. The LLM correctly answered 90% of questions, with accuracies of 83% for temporality, 93% for severity assessment, 84% for intervention identification, and 100% for diagnosis retrieval. Errors mainly stemmed from the LLM's inherent limitations, such as misinterpreting numbers or hallucinating. CONCLUSION: The study demonstrates the feasibility and effectiveness of using a local, open-source LLM for querying and interpreting echocardiogram report data. This approach offers a significant improvement over traditional keyword-based searches, enabling more contextually relevant and semantically accurate responses; in turn, it shows promise for enhancing clinical decision-making and research by facilitating more efficient access to complex patient data.
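The retrieval step, splitting a patient's report history into snippets and ranking them by text similarity to the question, can be sketched as below. TF-IDF cosine similarity is an assumption standing in for whatever similarity measure the authors used.

```python
# Minimal sketch: rank report snippets by similarity to a clinical question
# and keep the top matches for an LLM prompt. Snippets are toy examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

snippets = [
    "2015: LVEF 60%, no valvular disease.",
    "2019: moderate mitral regurgitation noted.",
    "2023: LVEF 35%, severely dilated left ventricle.",
]
question = "How has the ejection fraction changed over time?"

vec = TfidfVectorizer().fit(snippets + [question])
sims = cosine_similarity(vec.transform([question]), vec.transform(snippets))[0]
top = sorted(zip(sims, snippets), reverse=True)[:2]   # top-2 snippets for the prompt
for score, text in top:
    print(f"{score:.2f}  {text}")
```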


Subject(s)
Echocardiography , Electronic Health Records , Natural Language Processing , Humans , Heart Diseases/diagnostic imaging , Confidentiality , Information Storage and Retrieval/methods
10.
medRxiv ; 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-39148855

ABSTRACT

Drug repurposing - identifying new therapeutic uses for approved drugs - is often serendipitous and opportunistic, expanding the use of drugs for new diseases. The clinical utility of drug repurposing AI models remains limited because the models focus narrowly on diseases for which some drugs already exist. Here, we introduce TXGNN, a graph foundation model for zero-shot drug repurposing, identifying therapeutic candidates even for diseases with limited treatment options or no existing drugs. Trained on a medical knowledge graph, TXGNN utilizes a graph neural network and metric-learning module to rank drugs as potential indications and contraindications across 17,080 diseases. When benchmarked against eight methods, TXGNN improves prediction accuracy for indications by 49.2% and contraindications by 35.1% under stringent zero-shot evaluation. To facilitate model interpretation, TXGNN's Explainer module offers transparent insights into multi-hop medical knowledge paths that form TXGNN's predictive rationales. Human evaluation of TXGNN's Explainer showed that TXGNN's predictions and explanations perform encouragingly on multiple axes of performance beyond accuracy. Many of TXGNN's novel predictions align with off-label prescriptions clinicians make in a large healthcare system. TXGNN's drug repurposing predictions are accurate, consistent with off-label drug use, and can be investigated by human experts through multi-hop interpretable rationales.

11.
Nat Med ; 2024 Sep 25.
Article in English | MEDLINE | ID: mdl-39322717

ABSTRACT

Drug repurposing-identifying new therapeutic uses for approved drugs-is often a serendipitous and opportunistic endeavour to expand the use of drugs for new diseases. The clinical utility of drug-repurposing artificial intelligence (AI) models remains limited because these models focus narrowly on diseases for which some drugs already exist. Here we introduce TxGNN, a graph foundation model for zero-shot drug repurposing, identifying therapeutic candidates even for diseases with limited treatment options or no existing drugs. Trained on a medical knowledge graph, TxGNN uses a graph neural network and metric learning module to rank drugs as potential indications and contraindications for 17,080 diseases. When benchmarked against 8 methods, TxGNN improves prediction accuracy for indications by 49.2% and contraindications by 35.1% under stringent zero-shot evaluation. To facilitate model interpretation, TxGNN's Explainer module offers transparent insights into multi-hop medical knowledge paths that form TxGNN's predictive rationales. Human evaluation of TxGNN's Explainer showed that TxGNN's predictions and explanations perform encouragingly on multiple axes of performance beyond accuracy. Many of TxGNN's new predictions align well with off-label prescriptions that clinicians previously made in a large healthcare system. TxGNN's drug-repurposing predictions are accurate, consistent with off-label drug use, and can be investigated by human experts through multi-hop interpretable rationales.
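At its core, the ranking step scores drug-disease pairs by comparing learned embeddings. The toy sketch below uses random embeddings and a dot-product score only to show the shape of that computation; TxGNN itself learns the embeddings with a graph neural network and a metric-learning module over a medical knowledge graph.

```python
# Toy sketch of embedding-based drug-disease scoring; embeddings here are
# random placeholders, not learned representations.
import torch

n_drugs, n_diseases, dim = 100, 50, 64
drug_emb = torch.nn.Embedding(n_drugs, dim)
disease_emb = torch.nn.Embedding(n_diseases, dim)

disease_id = torch.tensor([7])                                 # query disease
scores = drug_emb.weight @ disease_emb(disease_id).squeeze(0)  # dot-product scores
top5 = torch.topk(scores, k=5).indices
print("Top-5 candidate drug indices:", top5.tolist())
```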

12.
medRxiv ; 2024 Feb 18.
Article in English | MEDLINE | ID: mdl-38405776

ABSTRACT

Timely and accurate assessment of electrocardiograms (ECGs) is crucial for diagnosing, triaging, and clinically managing patients. Current workflows rely on computerized ECG interpretation using rule-based tools built into ECG signal acquisition systems, with limited accuracy and flexibility. In low-resource settings, specialists must review every single ECG for such decisions, as these computerized interpretations are not available. Additionally, high-quality interpretations are even more essential in such low-resource settings, as there is a higher burden of accuracy for automated reads when access to experts is limited. Artificial Intelligence (AI)-based systems have the prospect of greater accuracy yet are frequently limited to a narrow range of conditions and do not replicate the full diagnostic range. Moreover, these models often require raw signal data, which are unavailable to physicians and necessitate costly technical integrations that are currently limited. To overcome these challenges, we developed and validated a format-independent vision encoder-decoder model - ECG-GPT - that can generate free-text, expert-level diagnosis statements directly from ECG images. The model shows robust performance, validated on 2.6 million ECGs across 6 geographically distinct health settings: (1) 2 large and diverse US health systems - the Yale-New Haven and Mount Sinai Health Systems, (2) a consecutive ECG dataset from a central ECG repository in Minas Gerais, Brazil, (3) the prospective cohort study UK Biobank, (4) a Germany-based, publicly available repository, PTB-XL, and (5) a community hospital in Missouri. The model demonstrated consistently high performance (AUROC ≥0.81) across a wide range of rhythm and conduction disorders. It can be easily accessed via a web-based application capable of receiving ECG images and represents a scalable and accessible strategy for generating accurate, expert-level reports from images of ECGs, enabling accurate triage of patients globally, especially in low-resource settings.
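For orientation, here is what inference with a generic vision encoder-decoder looks like using the Hugging Face transformers API, the architecture class named above. The checkpoint id and image path are hypothetical placeholders; ECG-GPT itself is accessed through the authors' web application, not this code.

```python
# Hedged sketch of image-to-text inference with a generic vision
# encoder-decoder. The checkpoint name below is hypothetical.
from PIL import Image
from transformers import AutoImageProcessor, AutoTokenizer, VisionEncoderDecoderModel

checkpoint = "example-org/ecg-image-to-text"          # hypothetical model id
model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
processor = AutoImageProcessor.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

image = Image.open("ecg_scan.png").convert("RGB")     # a photographed/scanned ECG
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated = model.generate(pixel_values, max_new_tokens=64)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```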

13.
J Am Coll Cardiol ; 84(9): 815-828, 2024 Aug 27.
Article in English | MEDLINE | ID: mdl-39168568

ABSTRACT

BACKGROUND: Artificial intelligence-enhanced electrocardiogram (AI-ECG) analysis shows promise to detect biventricular pathophysiology. However, AI-ECG analysis remains underexplored in congenital heart disease (CHD). OBJECTIVES: The purpose of this study was to develop and externally validate an AI-ECG model to predict cardiovascular magnetic resonance (CMR)-defined biventricular dysfunction/dilation in patients with CHD. METHODS: We trained (80%) and tested (20%) a convolutional neural network on paired ECG-CMRs (≤30 days apart) from patients with and without CHD to detect left ventricular (LV) dysfunction (ejection fraction ≤40%), right ventricular (RV) dysfunction (ejection fraction ≤35%), and LV and RV dilation (end-diastolic volume z-score ≥4). Performance was assessed during internal testing and external validation in an outside health care system using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve. RESULTS: The internal and external cohorts comprised 8,584 ECG-CMR pairs (n = 4,941; median CMR age 20.7 years) and 909 ECG-CMR pairs (n = 746; median CMR age 25.4 years), respectively. Model performance was similar for internal testing (AUROC: LV dysfunction 0.87; LV dilation 0.86; RV dysfunction 0.88; RV dilation 0.81) and external validation (AUROC: LV dysfunction 0.89; LV dilation 0.83; RV dysfunction 0.82; RV dilation 0.80). Model performance was lowest in patients with functionally single ventricles. Patients with tetralogy of Fallot who were predicted to be at high risk of ventricular dysfunction had lower survival (P < 0.001). Model explainability via saliency mapping revealed that lateral precordial leads influence all outcome predictions, with high-risk features including QRS widening and T-wave inversions for RV dysfunction/dilation. CONCLUSIONS: AI-ECG shows promise to predict biventricular dysfunction/dilation, which may help inform CMR timing in CHD.
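A minimal sketch of gradient-based saliency mapping, the explainability technique the abstract names: the gradient of the predicted probability with respect to the input highlights which leads and samples influenced the prediction. The model and input below are placeholders.

```python
# Sketch only: input-gradient saliency for a toy 12-lead ECG classifier.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv1d(12, 8, 7, padding=3), nn.ReLU(),
                      nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(8, 1))
ecg = torch.randn(1, 12, 2500, requires_grad=True)   # 12-lead ECG, 5 s at 500 Hz

prob = torch.sigmoid(model(ecg)).squeeze()
prob.backward()                                      # gradients w.r.t. the input
saliency = ecg.grad.abs().squeeze(0)                 # (12, 2500) importance map
print("Most influential lead:", int(saliency.sum(dim=1).argmax()))
```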


Subject(s)
Deep Learning , Electrocardiography , Heart Defects, Congenital , Humans , Electrocardiography/methods , Female , Male , Heart Defects, Congenital/physiopathology , Heart Defects, Congenital/complications , Heart Defects, Congenital/diagnosis , Adult , Adolescent , Young Adult , Child , Ventricular Dysfunction, Left/physiopathology , Ventricular Dysfunction, Left/diagnosis , Ventricular Dysfunction, Left/diagnostic imaging , Ventricular Dysfunction, Right/physiopathology , Ventricular Dysfunction, Right/diagnostic imaging , Ventricular Dysfunction, Right/diagnosis , Magnetic Resonance Imaging, Cine/methods , Child, Preschool , Predictive Value of Tests
14.
J Am Med Inform Assoc ; 31(9): 1921-1928, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-38771093

ABSTRACT

BACKGROUND: Artificial intelligence (AI) and large language models (LLMs) can play a critical role in emergency room operations by augmenting decision-making about patient admission. However, no studies have evaluated LLMs using real-world data and scenarios in comparison with, and informed by, traditional supervised machine learning (ML) models. We evaluated the performance of GPT-4 for predicting patient admissions from emergency department (ED) visits and compared its performance to traditional ML models both naively and when informed by few-shot examples and/or numerical probabilities. METHODS: We conducted a retrospective study using electronic health records across 7 NYC hospitals. We trained Bio-Clinical-BERT and XGBoost (XGB) models on unstructured and structured data, respectively, and created an ensemble model reflecting ML performance. We then assessed GPT-4's capabilities in several scenarios: zero-shot; few-shot with and without retrieval-augmented generation (RAG); and with and without ML numerical probabilities. RESULTS: The Ensemble ML model achieved an area under the receiver operating characteristic curve (AUC) of 0.88, an area under the precision-recall curve (AUPRC) of 0.72 and an accuracy of 82.9%. The naïve GPT-4's performance (0.79 AUC, 0.48 AUPRC, and 77.5% accuracy) showed substantial improvement when given limited, relevant data to learn from (ie, RAG) and underlying ML probabilities (0.87 AUC, 0.71 AUPRC, and 83.1% accuracy). Interestingly, RAG alone boosted performance to near peak levels (0.82 AUC, 0.56 AUPRC, and 81.3% accuracy). CONCLUSIONS: The naïve LLM had limited performance but showed significant improvement in predicting ED admissions when supplemented with real-world examples to learn from, particularly through RAG, and/or numerical probabilities from traditional ML models. Its peak performance, although slightly lower than the pure ML model, is noteworthy given its potential for providing reasoning behind predictions. Further refinement of LLMs with real-world data is necessary for successful integration as decision-support tools in care settings.
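The RAG step described above amounts to retrieving the most similar past visits and packing them, optionally with an ML probability, into the prompt. A hedged sketch with toy visits and TF-IDF nearest neighbors (the actual retrieval method and prompt wording are assumptions):

```python
# Sketch only: assemble a retrieval-augmented few-shot prompt for admission
# prediction. Visits, retrieval method, and prompt text are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

past_visits = [
    ("72M, chest pain, troponin elevated", "ADMITTED"),
    ("25F, ankle sprain, vitals normal", "DISCHARGED"),
    ("60F, dyspnea, O2 sat 88%", "ADMITTED"),
]
new_visit = "68M, shortness of breath, low oxygen saturation"

vec = TfidfVectorizer().fit([t for t, _ in past_visits] + [new_visit])
nn_index = NearestNeighbors(n_neighbors=2).fit(vec.transform([t for t, _ in past_visits]))
_, idx = nn_index.kneighbors(vec.transform([new_visit]))

examples = "\n".join(f"Visit: {past_visits[i][0]} -> {past_visits[i][1]}" for i in idx[0])
prompt = (f"Prior similar visits:\n{examples}\n"
          f"ML model admission probability: 0.81\n"
          f"Visit: {new_visit}\nWill this patient be admitted? Answer ADMITTED or DISCHARGED.")
print(prompt)
```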


Subject(s)
Electronic Health Records , Emergency Service, Hospital , Patient Admission , Humans , Retrospective Studies , Artificial Intelligence , Natural Language Processing , Machine Learning , Supervised Machine Learning
15.
J Am Heart Assoc ; 13(1): e031671, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38156471

ABSTRACT

BACKGROUND: Right ventricular ejection fraction (RVEF) and end-diastolic volume (RVEDV) are not readily assessed through traditional modalities. Deep learning-enabled ECG analysis for estimation of right ventricular (RV) size or function is unexplored. METHODS AND RESULTS: We trained a deep learning-ECG model to predict RV dilation (RVEDV >120 mL/m2), RV dysfunction (RVEF ≤40%), and numerical RVEDV and RVEF from a 12-lead ECG paired with reference-standard cardiac magnetic resonance imaging volumetric measurements in the UK Biobank (UKBB; n=42 938). We fine-tuned the model in a multicenter health system (MSHoriginal [Mount Sinai Hospital]; n=3019) with prospective validation over 4 months (MSHvalidation; n=115). We evaluated performance with the area under the receiver operating characteristic curve for categorical measures and mean absolute error for continuous measures, overall and in key subgroups. We assessed the association of RVEF prediction with transplant-free survival with Cox proportional hazards models. The prevalence of RV dysfunction for the UKBB/MSHoriginal/MSHvalidation cohorts was 1.0%/18.0%/15.7%, respectively. RV dysfunction model area under the receiver operating characteristic curve for the UKBB/MSHoriginal/MSHvalidation cohorts was 0.86/0.81/0.77, respectively. The prevalence of RV dilation for the UKBB/MSHoriginal/MSHvalidation cohorts was 1.6%/10.6%/4.3%. RV dilation model area under the receiver operating characteristic curve for the UKBB/MSHoriginal/MSHvalidation cohorts was 0.91/0.81/0.92, respectively. MSHoriginal mean absolute error was 7.8% for RVEF and 17.6 mL/m2 for RVEDV. The performance of the RVEF model was similar in key subgroups, including those with and without left ventricular dysfunction. Over a median follow-up of 2.3 years, predicted RVEF was associated with adjusted transplant-free survival (hazard ratio, 1.40 for each 10% decrease; P=0.031). CONCLUSIONS: Deep learning-ECG analysis can identify significant cardiac magnetic resonance imaging RV dysfunction and dilation with good performance. Predicted RVEF is associated with clinical outcome.
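The survival analysis linking predicted RVEF to transplant-free survival can be sketched with a Cox proportional hazards model in lifelines, with the predictor scaled so the hazard ratio is per 10% decrease in RVEF. All data below are simulated.

```python
# Sketch (synthetic data): Cox model with hazard ratio per 10% RVEF decrease.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(5)
n = 2000
rvef = rng.normal(45, 10, n)                          # predicted RVEF (%)
hazard = np.exp(0.034 * (60 - rvef))                  # lower RVEF -> higher hazard
time = rng.exponential(5 / hazard)
df = pd.DataFrame({
    "rvef_per10_decrease": (60 - rvef) / 10,          # per-10%-decrease scaling
    "time": time,
    "event": (time < 2.3).astype(int),                # observed within follow-up
})

cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
print(cph.hazard_ratios_)                             # HR per 10% RVEF decrease
```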


Subject(s)
Ventricular Dysfunction, Right , Ventricular Function, Right , Humans , Stroke Volume , Magnetic Resonance Imaging/methods , Heart , Electrocardiography
16.
medRxiv ; 2024 Jun 29.
Article in English | MEDLINE | ID: mdl-38559021

ABSTRACT

Background: Point-of-care ultrasonography (POCUS) enables cardiac imaging at the bedside and in communities but is limited by abbreviated protocols and variation in quality. We developed and tested artificial intelligence (AI) models to automate the detection of underdiagnosed cardiomyopathies from cardiac POCUS. Methods: In a development set of 290,245 transthoracic echocardiographic videos across the Yale-New Haven Health System (YNHHS), we used augmentation approaches and a customized loss function weighted for view quality to derive a POCUS-adapted, multi-label, video-based convolutional neural network (CNN) that discriminates HCM (hypertrophic cardiomyopathy) and ATTR-CM (transthyretin amyloid cardiomyopathy) from controls without known disease. We evaluated the final model across independent, internal and external, retrospective cohorts of individuals who underwent cardiac POCUS across YNHHS and Mount Sinai Health System (MSHS) emergency departments (EDs) (2011-2024) to prioritize key views and validate the diagnostic and prognostic performance of single-view screening protocols. Findings: We identified 33,127 patients (median age 61 [IQR: 45-75] years, n=17,276 [52·2%] female) at YNHHS and 5,624 (57 [IQR: 39-71] years, n=1,953 [34·7%] female) at MSHS with 78,054 and 13,796 eligible cardiac POCUS videos, respectively. An AI-enabled single-view screening approach successfully discriminated HCM (AUROC of 0·90 [YNHHS] & 0·89 [MSHS]) and ATTR-CM (AUROC of 0·92 [YNHHS] & 0·99 [MSHS]). In YNHHS, 40 (58·0%) HCM and 23 (47·9%) ATTR-CM cases had a positive screen at a median of 2·1 [IQR: 0·9-4·5] and 1·9 [IQR: 1·0-3·4] years before clinical diagnosis. Moreover, among 24,448 participants without known cardiomyopathy followed over 2·2 [IQR: 1·1-5·8] years, AI-POCUS probabilities in the highest (vs lowest) quintile for HCM and ATTR-CM conferred a 15% (adj.HR 1·15 [95%CI: 1·02-1·29]) and 39% (adj.HR 1·39 [95%CI: 1·22-1·59]) higher age- and sex-adjusted mortality risk, respectively. Interpretation: We developed and validated an AI framework that enables scalable, opportunistic screening of treatable cardiomyopathies wherever POCUS is used. Funding: National Heart, Lung and Blood Institute, Doris Duke Charitable Foundation, BridgeBio.

17.
NPJ Digit Med ; 7(1): 226, 2024 Aug 24.
Article in English | MEDLINE | ID: mdl-39181999

ABSTRACT

Congenital long QT syndrome (LQTS) diagnosis is complicated by limited genetic testing at scale, low prevalence, and a normal corrected QT interval in patients with high-risk genotypes. We developed a deep learning approach combining electrocardiogram (ECG) waveform and electronic health record data to assess whether patients had pathogenic variants causing LQTS. We defined patients with high-risk genotypes as having ≥1 pathogenic variant in one of the LQTS-susceptibility genes. We trained the model using data from the United Kingdom Biobank (UKBB) and then fine-tuned it in a racially/ethnically diverse cohort using the Mount Sinai BioMe Biobank. Following group-stratified 5-fold splitting, the fine-tuned model achieved an area under the precision-recall curve of 0.29 (95% confidence interval [CI] 0.28-0.29) and an area under the receiver operating characteristic curve of 0.83 (0.82-0.83) on independent testing data from BioMe. Multimodal fusion learning shows promise for identifying individuals with pathogenic genetic mutations, enabling patient prioritization for further workup.
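Group-stratified 5-fold splitting, as used above, keeps all records from one patient in a single fold while approximately preserving the label ratio in each fold, preventing leakage across splits. scikit-learn's StratifiedGroupKFold implements this; the data below are placeholders.

```python
# Sketch only: group-stratified K-fold splitting on synthetic patient data.
import numpy as np
from sklearn.model_selection import StratifiedGroupKFold

rng = np.random.default_rng(6)
n = 1000
patient_id = rng.integers(0, 250, n)                 # multiple records per patient
y = rng.binomial(1, 0.05, n)                         # rare pathogenic-variant label
X = rng.normal(size=(n, 8))

sgkf = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(sgkf.split(X, y, groups=patient_id)):
    overlap = set(patient_id[train_idx]) & set(patient_id[test_idx])
    print(f"fold {fold}: {len(test_idx)} test rows, patient overlap={len(overlap)}")
```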

18.
NPJ Digit Med ; 7(1): 233, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-39237755

ABSTRACT

Increased intracranial pressure (ICP) ≥15 mmHg is associated with adverse neurological outcomes, but detecting it requires invasive intracranial monitoring. Using the publicly available MIMIC-III Waveform Database (2000-2013) from Boston, we developed an artificial intelligence-derived biomarker for elevated ICP (aICP) for adult patients. aICP uses routinely collected extracranial waveform data as input, reducing the need for invasive monitoring. We externally validated aICP with an independent dataset from the Mount Sinai Hospital (2020-2022) in New York City. The AUROC, accuracy, sensitivity, and specificity on the external validation dataset were 0.80 (95% CI, 0.80-0.80), 73.8% (95% CI, 72.0-75.6%), 73.5% (95% CI, 72.5-74.5%), and 73.0% (95% CI, 72.0-74.0%), respectively. We also present an exploratory analysis showing that aICP predictions are associated with clinical phenotypes. A ten-percentile increment was associated with brain malignancy (OR = 1.68; 95% CI, 1.09-2.60), intracerebral hemorrhage (OR = 1.18; 95% CI, 1.07-1.32), and craniotomy (OR = 1.43; 95% CI, 1.12-1.84; P < 0.05 for all).
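The phenotype associations above are expressed per ten-percentile increment of the aICP score. A minimal sketch of that analysis pattern, converting raw scores to percentiles and fitting a logistic regression on simulated data:

```python
# Sketch only: odds ratio per ten-percentile increment of a model score.
import numpy as np
from scipy.stats import rankdata
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 1694
score = rng.normal(size=n)                            # raw aICP model output
pct10 = rankdata(score) / n * 10                      # percentile in units of 10

phenotype = rng.binomial(1, 1 / (1 + np.exp(-(-3 + 0.16 * pct10))))
fit = sm.Logit(phenotype, sm.add_constant(pct10)).fit(disp=0)
print(f"OR per ten-percentile increment: {np.exp(fit.params[1]):.2f}")
```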

19.
medRxiv ; 2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38352556

ABSTRACT

Importance: Increased intracranial pressure (ICP) is associated with adverse neurological outcomes but requires invasive monitoring to detect. Objective: To develop and validate an AI approach for detecting increased ICP (aICP) using only non-invasive extracranial physiological waveform data. Design: Retrospective diagnostic study of AI-assisted detection of increased ICP. We developed an AI model using exclusively extracranial waveforms, externally validated it, and assessed associations with clinical outcomes. Setting: The MIMIC-III Waveform Database (2000-2013), derived from patients admitted to an ICU in an academic Boston hospital, was used for development of the aICP model and to report associations with neurologic outcomes. Data from Mount Sinai Hospital (2020-2022) in New York City were used for external validation. Participants: Patients were included if they were older than 18 years and were monitored with electrocardiograms, arterial blood pressure, respiratory impedance plethysmography, and pulse oximetry. Patients who additionally had intracranial pressure monitoring were used for development (N=157) and external validation (N=56). Patients without intracranial monitors were used for association with outcomes (N=1694). Exposures: Extracranial waveforms including electrocardiogram, arterial blood pressure, plethysmography, and SpO2. Main Outcomes and Measures: Intracranial pressure > 15 mmHg. Measures were the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and accuracy at a threshold of 0.5. We calculated odds ratios and p-values for phenotype associations. Results: The AUROC was 0.91 (95% CI, 0.90-0.91) on testing and 0.80 (95% CI, 0.80-0.80) on external validation. aICP had accuracy, sensitivity, and specificity of 73.8% (95% CI, 72.0%-75.6%), 99.5% (95% CI, 99.3%-99.6%), and 76.9% (95% CI, 74.0-79.8%) on external validation. A ten-percentile increment was associated with stroke (OR=2.12; 95% CI, 1.27-3.13), brain malignancy (OR=1.68; 95% CI, 1.09-2.60), subdural hemorrhage (OR=1.66; 95% CI, 1.07-2.57), intracerebral hemorrhage (OR=1.18; 95% CI, 1.07-1.32), and procedures like percutaneous brain biopsy (OR=1.58; 95% CI, 1.15-2.18) and craniotomy (OR=1.43; 95% CI, 1.12-1.84; P < 0.05 for all). Conclusions and Relevance: aICP provides accurate, non-invasive estimation of increased ICP and is associated with neurological outcomes and neurosurgical procedures in patients without intracranial monitoring.

20.
Sci Rep ; 13(1): 16492, 2023 10 01.
Article in English | MEDLINE | ID: mdl-37779171

ABSTRACT

The United States Medical Licensing Examination (USMLE) has been used to study the performance of artificial intelligence (AI) models. However, their performance on questions involving USMLE soft skills remains unexplored. This study aimed to evaluate ChatGPT and GPT-4 on USMLE questions involving communication skills, ethics, empathy, and professionalism. We used 80 USMLE-style questions involving soft skills, taken from the USMLE website and the AMBOSS question bank. A follow-up query was used to assess the models' consistency. The performance of the AI models was compared to that of previous AMBOSS users. GPT-4 outperformed ChatGPT, correctly answering 90% of questions compared with ChatGPT's 62.5%. GPT-4 showed more confidence, not revising any responses, while ChatGPT modified its original answers 82.5% of the time. The performance of GPT-4 was also higher than that of AMBOSS's past users. Both AI models, notably GPT-4, showed capacity for empathy, indicating AI's potential to meet the complex interpersonal, ethical, and professional demands intrinsic to the practice of medicine.


Subject(s)
Artificial Intelligence , Medicine , Empathy , Mental Processes