ABSTRACT
BACKGROUND: Substantial effort has been directed toward demonstrating uses of predictive models in health care. However, implementation of these models into clinical practice may influence patient outcomes, which in turn are captured in electronic health record data. As a result, deployed models may affect the predictive ability of current and future models. OBJECTIVE: To estimate changes in predictive model performance with use through 3 common scenarios: model retraining, sequentially implementing 1 model after another, and intervening in response to a model when 2 are simultaneously implemented. DESIGN: Simulation of model implementation and use in critical care settings at various levels of intervention effectiveness and clinician adherence. Models were either trained or retrained after simulated implementation. SETTING: Admissions to the intensive care unit (ICU) at Mount Sinai Health System (New York, New York) and Beth Israel Deaconess Medical Center (Boston, Massachusetts). PATIENTS: 130 000 critical care admissions across both health systems. INTERVENTION: Across 3 scenarios, interventions were simulated at varying levels of clinician adherence and effectiveness. MEASUREMENTS: Statistical measures of performance, including threshold-independent (area under the curve) and threshold-dependent measures. RESULTS: At a fixed 90% sensitivity, in scenario 1 a mortality prediction model lost 9% to 39% specificity after being retrained once, and in scenario 2 a mortality prediction model lost 8% to 15% specificity when it was created after implementation of an acute kidney injury (AKI) prediction model. In scenario 3, when models for AKI and mortality prediction were implemented simultaneously, each reduced the effective accuracy of the other by 1% to 28%. LIMITATIONS: In real-world practice, the effectiveness of and adherence to model-based recommendations are rarely known in advance. Only binary classifiers for tabular ICU admissions data were simulated. CONCLUSION: In simulated ICU settings, no universally effective model-updating approach for maintaining model performance appears to exist. Model use may need to be recorded to maintain the viability of predictive modeling. PRIMARY FUNDING SOURCE: National Center for Advancing Translational Sciences.
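To make scenario 1 concrete, the sketch below simulates a deployed mortality model whose alerts trigger interventions at a given clinician adherence and effectiveness, retrains on the resulting post-deployment labels, and compares specificity at fixed 90% sensitivity. All data, the alert threshold, and the 70%/50% adherence and effectiveness values are illustrative assumptions, not the study's actual simulation code.

# Hypothetical sketch of scenario 1 (retraining after deployment); synthetic data only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
n, p = 30000, 10
X = rng.normal(size=(n, p))
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1] - 1.5))))  # mortality labels

tr, dep, te = slice(0, 10000), slice(10000, 20000), slice(20000, 30000)
model = LogisticRegression().fit(X[tr], y[tr])                 # original model

# Deployment: clinicians act on alerts with some adherence; the intervention
# averts a fraction of deaths (effectiveness), altering the recorded outcomes.
adherence, effectiveness = 0.7, 0.5
alerts = model.predict_proba(X[dep])[:, 1] > 0.3
treated = alerts & (rng.random(alerts.size) < adherence)
y_post = y[dep].copy()
y_post[treated & (rng.random(alerts.size) < effectiveness) & (y[dep] == 1)] = 0

retrained = LogisticRegression().fit(X[dep], y_post)           # retrain on post-deployment data

def spec_at_90_sens(m, X_eval, y_eval):
    fpr, tpr, _ = roc_curve(y_eval, m.predict_proba(X_eval)[:, 1])
    return 1 - fpr[np.argmax(tpr >= 0.90)]

print("original  specificity:", round(spec_at_90_sens(model, X[te], y[te]), 3))
print("retrained specificity:", round(spec_at_90_sens(retrained, X[te], y[te]), 3))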
Subjects
Acute Kidney Injury; Artificial Intelligence; Humans; Intensive Care Units; Critical Care; Delivery of Health Care
ABSTRACT
PURPOSE OF REVIEW: Risk stratification for chronic kidney disease is becoming increasingly important as a clinical tool for both treatment and prevention measures. The goal of this review is to identify how machine learning tools contribute to and facilitate risk stratification in the clinical setting. RECENT FINDINGS: The two key machine learning paradigms for predictively stratifying kidney disease risk are genomics-based and electronic health record-based approaches. These methods can provide both quantitative information, such as relative risk, and qualitative information, such as characterizing risk by subphenotype. SUMMARY: The four key methods to stratify chronic kidney disease risk are genomics, multiomics, and supervised and unsupervised machine learning methods. Polygenic risk scores use whole genome sequencing data to generate an individual's relative risk compared with the population. Multiomic methods integrate information from multiple biomarkers to generate trajectories and prognosticate different outcomes. Supervised machine learning methods can directly use the growing compendia of electronic health records, such as laboratory results and notes, to generate direct risk predictions, while unsupervised machine learning methods can cluster individuals with chronic kidney disease into subphenotypes with differing approaches to care.
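As a toy illustration of the polygenic risk score concept mentioned above, the sketch below computes a weighted sum of risk-allele dosages and expresses it relative to the population; the variant effect sizes and genotypes are randomly generated assumptions, not any published score.

# Hypothetical polygenic risk score (PRS): weighted sum of risk-allele dosages,
# standardized against a reference population. Effect sizes are made up.
import numpy as np

rng = np.random.default_rng(1)
n_variants, n_people = 1000, 500
betas = rng.normal(0, 0.05, n_variants)               # per-allele log-odds weights (illustrative)
dosages = rng.integers(0, 3, (n_people, n_variants))  # 0/1/2 copies of each risk allele

prs = dosages @ betas                                  # raw score per person
z = (prs - prs.mean()) / prs.std()                     # relative to the reference population

high_risk = z > np.quantile(z, 0.9)                    # e.g. flag the top decile as high genetic risk
print(f"{high_risk.sum()} of {n_people} people in the top PRS decile")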
Subjects
Machine Learning; Renal Insufficiency, Chronic; Biomarkers; Electronic Health Records; Humans; Renal Insufficiency, Chronic/diagnosis; Renal Insufficiency, Chronic/genetics; Renal Insufficiency, Chronic/therapy; Risk Assessment
ABSTRACT
OBJECTIVE: To compare the efficacy and safety of dual antiplatelet therapy (DAPT) and triple therapy (TT, dual antiplatelet therapy plus warfarin) in patients with myocardial infarction (MI) or percutaneous coronary intervention with stenting (PCI-S) who also require chronic oral anticoagulation. BACKGROUND: Recommendations for the optimal antiplatelet/anticoagulant treatment regimen for patients undergoing PCI-S or experiencing MI who also require oral anticoagulation are largely based on evidence from observational studies and expert opinion. METHODS: A systematic search was performed for studies comparing TT vs. DAPT in patients after PCI-S or MI who required chronic anticoagulation. The primary outcome was all-cause mortality. Secondary outcomes were ischemic stroke, major bleeding, MI, and stent thrombosis. Pooled relative risks (RR) were calculated using a random-effects model. RESULTS: A total of 17 studies were included, with 14,921 patients [TT: 5,819 (39%) and DAPT: 9,102 (61%)] and a mean follow-up of 1.6 years. The majority of patients required oral anticoagulation for atrial fibrillation. Compared with DAPT, patients treated with TT had no significant difference in all-cause mortality [RR: 0.81, 95% confidence interval (CI): 0.61-1.08, P = 0.15], MI [RR 0.74, 95% CI: 0.51-1.06, P = 0.10], or stent thrombosis [RR 0.67, 95% CI: 0.35-1.30, P = 0.24]. Patients treated with TT had a significantly increased risk of major bleeding [RR 1.20, 95% CI: 1.03-1.39, P = 0.02], whereas the risk of ischemic stroke was significantly lower [RR 0.59, 95% CI: 0.38-0.92, P = 0.02]. CONCLUSIONS: All-cause mortality appears similar in patients treated with TT or DAPT, although TT was associated with higher rates of major bleeding and a lower risk of ischemic stroke. © 2015 Wiley Periodicals, Inc.
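For readers unfamiliar with the pooling step, the sketch below shows a DerSimonian-Laird random-effects calculation of a pooled relative risk; the three study arms are made-up numbers, not data from the 17 included studies.

# Random-effects (DerSimonian-Laird) pooling of study-level relative risks.
# The three studies below are hypothetical, not the studies in this meta-analysis.
import numpy as np

# events, total in the TT and DAPT arms for each (made-up) study
tt  = np.array([[30, 400], [55, 600], [20, 300]], float)
dap = np.array([[45, 450], [80, 650], [35, 320]], float)

rr = (tt[:, 0] / tt[:, 1]) / (dap[:, 0] / dap[:, 1])
log_rr = np.log(rr)
var = 1/tt[:, 0] - 1/tt[:, 1] + 1/dap[:, 0] - 1/dap[:, 1]     # variance of log RR

w = 1 / var                                                   # fixed-effect weights
q = np.sum(w * (log_rr - np.sum(w * log_rr) / w.sum()) ** 2)  # heterogeneity statistic
tau2 = max(0.0, (q - (len(rr) - 1)) / (w.sum() - np.sum(w**2) / w.sum()))

w_re = 1 / (var + tau2)                                       # random-effects weights
pooled = np.sum(w_re * log_rr) / w_re.sum()
se = np.sqrt(1 / w_re.sum())
print(f"pooled RR {np.exp(pooled):.2f} "
      f"(95% CI {np.exp(pooled - 1.96*se):.2f}-{np.exp(pooled + 1.96*se):.2f})")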
Subjects
Anticoagulants/administration & dosage; Atrial Fibrillation/drug therapy; Myocardial Infarction/therapy; Percutaneous Coronary Intervention; Platelet Aggregation Inhibitors/administration & dosage; Warfarin/administration & dosage; Administration, Oral; Anticoagulants/adverse effects; Atrial Fibrillation/complications; Atrial Fibrillation/diagnosis; Atrial Fibrillation/mortality; Chi-Square Distribution; Comorbidity; Drug Therapy, Combination; Hemorrhage/chemically induced; Humans; Myocardial Infarction/complications; Myocardial Infarction/diagnosis; Myocardial Infarction/mortality; Observational Studies as Topic; Odds Ratio; Percutaneous Coronary Intervention/adverse effects; Percutaneous Coronary Intervention/instrumentation; Platelet Aggregation Inhibitors/adverse effects; Risk Factors; Stents; Stroke/etiology; Stroke/prevention & control; Treatment Outcome; Warfarin/adverse effects
ABSTRACT
BACKGROUND: Artificial intelligence (AI) and large language models (LLMs) can play a critical role in emergency room operations by augmenting decision-making about patient admission. However, no studies have evaluated LLMs using real-world data and scenarios in comparison with, and informed by, traditional supervised machine learning (ML) models. We evaluated the performance of GPT-4 for predicting patient admissions from emergency department (ED) visits and compared its performance with that of traditional ML models, both naively and when informed by few-shot examples and/or numerical probabilities. METHODS: We conducted a retrospective study using electronic health records across 7 NYC hospitals. We trained Bio-Clinical-BERT and XGBoost (XGB) models on unstructured and structured data, respectively, and created an ensemble model reflecting ML performance. We then assessed GPT-4's capabilities in several scenarios: zero-shot, few-shot with and without retrieval-augmented generation (RAG), and with and without ML numerical probabilities. RESULTS: The ensemble ML model achieved an area under the receiver operating characteristic curve (AUC) of 0.88, an area under the precision-recall curve (AUPRC) of 0.72, and an accuracy of 82.9%. The naïve GPT-4's performance (0.79 AUC, 0.48 AUPRC, and 77.5% accuracy) improved substantially when the model was given limited, relevant data to learn from (ie, RAG) and underlying ML probabilities (0.87 AUC, 0.71 AUPRC, and 83.1% accuracy). Interestingly, RAG alone boosted performance to near-peak levels (0.82 AUC, 0.56 AUPRC, and 81.3% accuracy). CONCLUSIONS: The naïve LLM had limited performance but improved significantly at predicting ED admissions when supplemented with real-world examples to learn from, particularly through RAG, and/or numerical probabilities from traditional ML models. Its peak performance, although slightly lower than that of the pure ML model, is noteworthy given its potential for providing reasoning behind predictions. Further refinement of LLMs with real-world data is necessary for successful integration as decision-support tools in care settings.
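The sketch below illustrates one way such a few-shot, RAG-informed prompt could be assembled, combining retrieved similar visits with the structured-data model's probability; the embeddings, example visits, probability value, and prompt wording are hypothetical placeholders rather than the study's pipeline.

# Sketch of assembling a few-shot, RAG-style prompt that also exposes an ML probability.
# Embeddings, example notes, and the probability are synthetic placeholders.
import numpy as np

def retrieve(query_vec, example_vecs, k=2):
    """Return indices of the k most similar historical ED visits (cosine similarity)."""
    sims = example_vecs @ query_vec / (
        np.linalg.norm(example_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return np.argsort(-sims)[:k]

examples = [("72M, chest pain, rising troponin", "ADMIT"),
            ("24F, ankle sprain, vitals normal", "DISCHARGE"),
            ("61F, pneumonia on x-ray, hypoxic", "ADMIT")]
example_vecs = np.random.default_rng(2).normal(size=(3, 8))    # stand-in note embeddings
query = "68M, shortness of breath, new oxygen requirement"
query_vec = np.random.default_rng(3).normal(size=8)
xgb_probability = 0.74                                         # from the structured-data model

shots = "\n".join(f"Visit: {examples[i][0]}\nDecision: {examples[i][1]}"
                  for i in retrieve(query_vec, example_vecs))
prompt = (f"{shots}\n\nVisit: {query}\n"
          f"A supervised model estimates P(admission) = {xgb_probability:.2f}.\n"
          f"Decision (ADMIT or DISCHARGE):")
print(prompt)   # this prompt would then be sent to the LLM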
Subjects
Electronic Health Records; Emergency Service, Hospital; Patient Admission; Humans; Retrospective Studies; Artificial Intelligence; Natural Language Processing; Machine Learning; Supervised Machine Learning
ABSTRACT
Importance: Increased intracranial pressure (ICP) is associated with adverse neurological outcomes but requires invasive monitoring to detect. Objective: To develop and validate an AI approach for detecting increased ICP (aICP) using only non-invasive extracranial physiological waveform data. Design: Retrospective diagnostic study of AI-assisted detection of increased ICP. We developed an AI model using exclusively extracranial waveforms, externally validated it, and assessed associations with clinical outcomes. Setting: The MIMIC-III Waveform Database (2000-2013), derived from patients admitted to an ICU at an academic Boston hospital, was used for development of the aICP model and to report associations with neurologic outcomes. Data from Mount Sinai Hospital (2020-2022) in New York City were used for external validation. Participants: Patients were included if they were older than 18 years and were monitored with electrocardiography, arterial blood pressure, respiratory impedance plethysmography, and pulse oximetry. Patients who additionally had intracranial pressure monitoring were used for development (N=157) and external validation (N=56). Patients without intracranial monitors were used for association with outcomes (N=1694). Exposures: Extracranial waveforms including electrocardiogram, arterial blood pressure, plethysmography, and SpO2. Main Outcomes and Measures: Intracranial pressure > 15 mmHg. Measures were area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and accuracy at a threshold of 0.5. We calculated odds ratios and p-values for phenotype associations. Results: The AUROC was 0.91 (95% CI, 0.90-0.91) on testing and 0.80 (95% CI, 0.80-0.80) on external validation. aICP had accuracy, sensitivity, and specificity of 73.8% (95% CI, 72.0%-75.6%), 99.5% (95% CI, 99.3%-99.6%), and 76.9% (95% CI, 74.0%-79.8%) on external validation. A ten-percentile increment in aICP was associated with stroke (OR=2.12; 95% CI, 1.27-3.13), brain malignancy (OR=1.68; 95% CI, 1.09-2.60), subdural hemorrhage (OR=1.66; 95% CI, 1.07-2.57), intracerebral hemorrhage (OR=1.18; 95% CI, 1.07-1.32), and procedures such as percutaneous brain biopsy (OR=1.58; 95% CI, 1.15-2.18) and craniotomy (OR=1.43; 95% CI, 1.12-1.84; P < 0.05 for all). Conclusions and Relevance: aICP provides accurate, non-invasive estimation of increased ICP and is associated with neurological outcomes and neurosurgical procedures in patients without intracranial monitoring.
ABSTRACT
Purpose: Intravenous fluids are a mainstay of management of acute kidney injury (AKI) after sepsis but can cause fluid overload. Recent literature shows that a restrictive fluid strategy may be beneficial in some patients with AKI; however, identifying these patients is challenging. We aimed to develop and validate a machine learning algorithm to identify patients who would benefit from a restrictive fluid strategy. Methods: We included patients with sepsis who developed AKI within 48 hours of ICU admission and defined a restrictive fluid strategy as receiving <500 mL of fluids within 24 hours after AKI. Our primary outcome was early AKI reversal within 48 hours of AKI onset, and secondary outcomes included sustained AKI reversal and major adverse kidney events (MAKE) at discharge. We used a causal forest, a machine learning algorithm, to estimate individual treatment effects, and a policy tree algorithm to identify patients who would benefit from a restrictive fluid strategy. We developed the algorithm in MIMIC-IV and validated it in the eICU database. Results: Among 2,091 patients in the external validation cohort, the policy tree recommended restrictive fluids for 88.2%. Among these, patients who received restrictive fluids demonstrated a significantly higher rate of early AKI reversal (48.2% vs 39.6%, p<0.001) and sustained AKI reversal (36.7% vs 27.4%, p<0.001), and lower rates of MAKE by discharge (29.3% vs 35.1%, p=0.019). These results were consistent in adjusted analyses. Conclusion: A policy tree based on causal machine learning can identify septic patients with AKI who benefit from a restrictive fluid strategy. This approach needs to be validated in prospective trials.
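To illustrate the general idea of estimating individualized treatment effects and distilling them into a simple treatment rule, the sketch below uses a random-forest T-learner plus a shallow decision tree as a stand-in for the causal forest and policy tree used in the study; the covariates, treatment assignment, and outcome model are synthetic.

# Simplified illustration of per-patient treatment-effect estimation and a rule-based policy.
# A T-learner and a shallow decision tree stand in for the causal forest / policy tree;
# all data are synthetic and purely illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(4)
n = 5000
X = rng.normal(size=(n, 4))                      # e.g. creatinine, lactate, MAP, age (scaled)
restrictive = rng.binomial(1, 0.5, n)            # 1 = restrictive fluids (<500 mL/24 h)
# Benefit of restriction depends on covariates (purely illustrative).
p_reversal = 0.4 + 0.1 * restrictive * (X[:, 0] > 0) - 0.05 * restrictive * (X[:, 0] <= 0)
reversal = rng.binomial(1, np.clip(p_reversal, 0, 1))

# T-learner: separate outcome models under each strategy; CATE = difference in predictions.
m1 = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[restrictive == 1], reversal[restrictive == 1])
m0 = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[restrictive == 0], reversal[restrictive == 0])
cate = m1.predict_proba(X)[:, 1] - m0.predict_proba(X)[:, 1]

# Shallow tree mapping covariates to a recommended strategy (policy-tree stand-in).
policy = DecisionTreeClassifier(max_depth=2).fit(X, (cate > 0).astype(int))
print(export_text(policy, feature_names=["creatinine", "lactate", "MAP", "age"]))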
ABSTRACT
Increased intracranial pressure (ICP) ≥15 mmHg is associated with adverse neurological outcomes but requires invasive intracranial monitoring to detect. Using the publicly available MIMIC-III Waveform Database (2000-2013) from Boston, we developed an artificial intelligence-derived biomarker for elevated ICP (aICP) for adult patients. aICP uses routinely collected extracranial waveform data as input, reducing the need for invasive monitoring. We externally validated aICP with an independent dataset from the Mount Sinai Hospital (2020-2022) in New York City. The AUROC, accuracy, sensitivity, and specificity on the external validation dataset were 0.80 (95% CI, 0.80-0.80), 73.8% (95% CI, 72.0-75.6%), 73.5% (95% CI, 72.5-74.5%), and 73.0% (95% CI, 72.0-74.0%), respectively. We also present an exploratory analysis showing that aICP predictions are associated with clinical phenotypes. A ten-percentile increment in aICP was associated with brain malignancy (OR = 1.68; 95% CI, 1.09-2.60), intracerebral hemorrhage (OR = 1.18; 95% CI, 1.07-1.32), and craniotomy (OR = 1.43; 95% CI, 1.12-1.84; P < 0.05 for all).
ABSTRACT
The electrocardiogram (ECG) is a ubiquitous diagnostic modality. Convolutional neural networks (CNNs) applied to ECG analysis require large sample sizes, and transfer learning approaches for biomedical problems may result in suboptimal performance when pre-training is done on natural images. We leveraged masked image modeling to create a vision-based transformer model, HeartBEiT, for electrocardiogram waveform analysis. We pre-trained this model on 8.5 million ECGs and then compared its performance with standard CNN architectures for diagnosis of hypertrophic cardiomyopathy, low left ventricular ejection fraction, and ST elevation myocardial infarction using differing training sample sizes and independent validation datasets. We find that HeartBEiT has significantly higher performance at lower sample sizes compared to other models. We also find that HeartBEiT improves explainability of diagnosis by highlighting biologically relevant regions of the ECG compared with standard CNNs. Domain-specific pre-trained transformer models may exceed the classification performance of models trained on natural images, especially in very low data regimes. The combination of the architecture and such pre-training allows for more accurate, granular explainability of model predictions.
ABSTRACT
OBJECTIVE: The novel coronavirus disease 2019 (COVID-19) has heterogeneous clinical courses, indicating that there might be distinct subphenotypes in critically ill patients. Although prior research has identified these subphenotypes, the temporal pattern of multiple clinical features has not been considered in cluster models. We aimed to identify temporal subphenotypes in critically ill patients with COVID-19 using a novel sequence cluster analysis and associate them with clinically relevant outcomes. MATERIALS AND METHODS: We analyzed 1036 critically ill patients with laboratory-confirmed SARS-CoV-2 infection admitted to the Mount Sinai Health System in New York City. The agglomerative hierarchical clustering method was used with Levenshtein distance and Ward's minimum variance linkage. RESULTS: We identified four subphenotypes. Subphenotype I (N = 233 [22.5%]) included patients with rapid respirations and a rapid heartbeat but less need for invasive interventions within the first 24 hours, along with a relatively good prognosis. Subphenotype II (N = 418 [40.3%]) represented patients with the least degree of ailments, relatively low mortality, and the highest probability of discharge from the hospital. Subphenotype III (N = 259 [25.0%]) represented patients who experienced clinical deterioration during the first 24 hours of intensive care unit admission, leading to poor outcomes. Subphenotype IV (N = 126 [12.2%]) represented an acute respiratory distress syndrome trajectory with an almost universal need for mechanical ventilation. CONCLUSION: We used sequence cluster analysis to identify clinical subphenotypes in critically ill COVID-19 patients with distinct temporal patterns and different clinical outcomes. This study points toward the utility of including temporal information in subphenotyping approaches.
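A minimal sketch of the sequence-clustering step follows: pairwise Levenshtein distances between per-hour state strings, clustered with Ward linkage. The encoded trajectories and the three-cluster cut are illustrative stand-ins for the study's actual feature sequences and cluster number.

# Sketch of temporal subphenotyping: cluster state sequences by Levenshtein distance
# with Ward linkage. The hourly state strings are synthetic stand-ins for the
# first-24-hour clinical trajectories used in the study.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two state sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

# Each character encodes an hourly state, e.g. S=stable, T=tachypneic, V=ventilated.
trajectories = ["SSSTTTSS", "SSTTTTSS", "SSVVVVVV", "SVVVVVVV", "SSSSSSSS", "SSSSSSTS"]

d = np.array([[levenshtein(a, b) for b in trajectories] for a in trajectories], float)
Z = linkage(squareform(d), method="ward")   # Ward on a precomputed matrix is a pragmatic choice
labels = fcluster(Z, t=3, criterion="maxclust")
print(dict(zip(trajectories, labels)))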
Subjects
COVID-19; Respiratory Distress Syndrome; Cluster Analysis; Humans; Intensive Care Units; SARS-CoV-2
ABSTRACT
Sample size estimation is a crucial step in experimental design but is understudied in the context of deep learning. Currently, estimating the quantity of labeled data needed to train a classifier to a desired performance is largely based on prior experience with similar models and problems or on untested heuristics. In many supervised machine learning applications, data labeling can be expensive and time-consuming and would benefit from a more rigorous means of estimating labeling requirements. Here, we study the problem of estimating the minimum sample size of labeled training data necessary for training computer vision models as an exemplar for other deep learning problems. We consider the problem of identifying the minimal number of labeled data points needed to achieve a generalizable representation of the data, a minimum converging sample (MCS). We use autoencoder loss to estimate the MCS for fully connected neural network classifiers. At sample sizes smaller than the MCS estimate, fully connected networks fail to distinguish classes, and at sample sizes above the MCS estimate, generalizability strongly correlates with the loss function of the autoencoder. We provide an easily accessible, code-free, and dataset-agnostic tool to estimate sample sizes for fully connected networks. Taken together, our findings suggest that MCS and convergence estimation are promising methods to guide sample size estimates for data collection and labeling prior to training deep learning models in computer vision.
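The sketch below conveys the intuition behind MCS estimation: fit a small autoencoder at increasing sample sizes and look for the point at which its held-out reconstruction loss stops improving. It uses scikit-learn's digits dataset and an arbitrary 2% relative-improvement rule; the paper's tool uses its own datasets and convergence criteria.

# Simplified sketch of MCS intuition: reconstruction loss of a small autoencoder
# versus training-set size. The 2% convergence rule is an illustrative assumption.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

X = load_digits().data / 16.0
rng = np.random.default_rng(5)
holdout = X[-300:]

losses = {}
for n in [50, 100, 200, 400, 800, 1200]:
    idx = rng.choice(len(X) - 300, n, replace=False)
    ae = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
    ae.fit(X[idx], X[idx])                      # autoencoder: reconstruct the input
    losses[n] = mean_squared_error(holdout, ae.predict(holdout))

sizes = sorted(losses)
for prev, cur in zip(sizes, sizes[1:]):
    if (losses[prev] - losses[cur]) / losses[prev] < 0.02:   # loss has roughly converged
        print(f"estimated minimum converging sample ~ {cur}")
        break
print(losses)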
ABSTRACT
BACKGROUND AND OBJECTIVES: Left ventricular ejection fraction is disrupted in patients on maintenance hemodialysis and can be estimated using deep learning models on electrocardiograms. Smaller sample sizes within this population may be mitigated using transfer learning. DESIGN, SETTING, PARTICIPANTS, & MEASUREMENTS: We identified patients on hemodialysis with transthoracic echocardiograms within 7 days of an electrocardiogram using diagnostic/procedure codes. We developed four models: (1) trained from scratch in patients on hemodialysis, (2) pretrained on a publicly available set of natural images (ImageNet), (3) pretrained on all patients not on hemodialysis, and (4) pretrained on patients not on hemodialysis and fine-tuned on patients on hemodialysis. We assessed the ability of the models to classify left ventricular ejection fraction into clinically relevant categories of ≤40%, 41% to ≤50%, and >50%. We compared performance by area under the receiver operating characteristic curve. RESULTS: We extracted 705,075 electrocardiogram:echocardiogram pairs for 158,840 patients not on hemodialysis, used for development of models 3 and 4, and 18,626 electrocardiogram:echocardiogram pairs for 2168 patients on hemodialysis, used for models 1, 2, and 4. The transfer learning model achieved areas under the receiver operating characteristic curve of 0.86, 0.63, and 0.83 in predicting left ventricular ejection fraction categories of ≤40% (n=461), 41%-50% (n=398), and >50% (n=1309), respectively. For the same tasks, model 1 achieved areas under the receiver operating characteristic curve of 0.74, 0.55, and 0.71, respectively; model 2 achieved areas under the receiver operating characteristic curve of 0.71, 0.55, and 0.69, respectively; and model 3 achieved areas under the receiver operating characteristic curve of 0.80, 0.51, and 0.77, respectively. We found that predictions of left ventricular ejection fraction by the transfer learning model were associated with mortality in a Cox regression, with an adjusted hazard ratio of 1.29 (95% confidence interval, 1.04 to 1.59). CONCLUSION: A deep learning model can determine left ventricular ejection fraction for patients on hemodialysis following pretraining on electrocardiograms of patients not on hemodialysis. Predictions of low ejection fraction from this model were associated with mortality over a 5-year follow-up period. PODCAST: This article contains a podcast at https://www.asn-online.org/media/podcast/CJASN/2022_06_06_CJN16481221.mp3.
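A minimal sketch of the pretrain-then-fine-tune recipe behind model 4 follows; the tiny fully connected network, synthetic ECG features, and training loop are illustrative assumptions, not the study's architecture or data.

# Minimal sketch of transfer learning (model 4): pretrain on the large non-hemodialysis
# cohort, then fine-tune on the small hemodialysis cohort. Shapes and network are illustrative.
import torch
import torch.nn as nn

def make_net():
    return nn.Sequential(nn.Linear(500, 64), nn.ReLU(), nn.Linear(64, 3))  # 3 LVEF classes

def train(net, X, y, epochs=5, lr=1e-3):
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(net(X), y)
        loss.backward()
        opt.step()
    return net

# Synthetic stand-ins for flattened ECG features and LVEF category labels.
X_non_hd, y_non_hd = torch.randn(10000, 500), torch.randint(0, 3, (10000,))
X_hd, y_hd = torch.randn(800, 500), torch.randint(0, 3, (800,))

pretrained = train(make_net(), X_non_hd, y_non_hd)        # step 1: pretrain (non-HD cohort)
finetuned = train(pretrained, X_hd, y_hd, lr=1e-4)        # step 2: fine-tune (HD cohort) at a lower LR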
Subjects
Renal Dialysis; Ventricular Function, Left; Echocardiography; Electrocardiography; Humans; Renal Dialysis/adverse effects; Stroke Volume
ABSTRACT
Federated learning is a technique for training predictive models without sharing patient-level data, thus maintaining data security while allowing inter-institutional collaboration. We used federated learning to predict acute kidney injury within three and seven days of admission, using demographics, comorbidities, vital signs, and laboratory values, in 4029 adults hospitalized with COVID-19 at five sociodemographically diverse New York City hospitals between March and October 2020. Prediction performance of federated models was generally higher than that of single-hospital models and was comparable to that of pooled-data models. In this first use case of federated learning in kidney disease, federated learning improved prediction of a common complication of COVID-19 while preserving data privacy.
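A minimal federated averaging (FedAvg) sketch of the idea follows: each hospital fits a local logistic-regression update on its own patients, and only the model weights, never patient-level data, are shared and averaged. The hospital data, learning rate, and round count are synthetic assumptions, not the study's implementation.

# Minimal FedAvg sketch for a logistic-regression AKI model across five hospitals.
# All data are synthetic; only weight vectors cross institutional boundaries.
import numpy as np

rng = np.random.default_rng(6)
hospitals = [(rng.normal(size=(800, 20)), None) for _ in range(5)]
hospitals = [(X, rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - 0.5))))) for X, _ in hospitals]

def local_update(w, X, y, lr=0.1, steps=20):
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))
        w = w - lr * X.T @ (p - y) / len(y)     # gradient step on the local loss
    return w

w_global = np.zeros(20)
for _ in range(10):                             # communication rounds
    local_ws = [local_update(w_global.copy(), X, y) for X, y in hospitals]
    w_global = np.mean(local_ws, axis=0)        # server averages the local models

# The federated model can now be evaluated at each site without pooling data.
p = 1 / (1 + np.exp(-hospitals[0][0] @ w_global))
print("site-1 predicted AKI risk (first 5 patients):", np.round(p[:5], 2))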
ABSTRACT
Epidermal growth factor receptor (EGFR) has been an attractive target for treatment of epithelial cancers, including colorectal cancer (CRC). Evidence from clinical trials indicates that cetuximab and panitumumab (anti-EGFR monoclonal antibodies) have clinical activity in patients with metastatic CRC. The discovery of intrinsic resistance to EGFR blockade in Kirsten RAS (KRAS)-mutant patients led to the restriction of anti-EGFR antibodies to KRAS wild-type patients by the Food and Drug Administration and the European Medicines Agency. Studies have since focused on the evaluation of biomarkers to identify appropriate patient populations that may benefit from EGFR blockade. Accumulating evidence suggests that patients with mutations in EGFR downstream signaling pathways, including KRAS, BRAF, PIK3CA, and PTEN, could be intrinsically resistant to EGFR blockade. Recent whole genome studies also suggest that dynamic alterations in signaling pathways downstream of EGFR lead to distinct oncogenic signatures and subclones, which might have some impact on emerging resistance in KRAS wild-type patients. While anti-EGFR monoclonal antibodies have clear potential in the management of a subset of patients with metastatic CRC, further studies are warranted to uncover the exact mechanisms of acquired resistance to EGFR blockade.