Results 1 - 20 of 32
1.
Article in English | MEDLINE | ID: mdl-38703162

ABSTRACT

BACKGROUND: Prediction of drug-induced long QT syndrome (diLQTS) is of critical importance given its association with torsades de pointes. There is no reliable method for the outpatient prediction of diLQTS. OBJECTIVES: This study sought to evaluate the use of a convolutional neural network (CNN) applied to electrocardiograms (ECGs) to predict diLQTS in an outpatient population. METHODS: We identified all adult outpatients newly prescribed a QT-prolonging medication between January 1, 2003, and March 31, 2022, who had a 12-lead sinus ECG in the preceding 6 months. Using risk factor data and the ECG signal as inputs, the CNN QTNet was implemented in TensorFlow to predict diLQTS. RESULTS: Models were evaluated in a held-out test dataset of 44,386 patients (57% female) with a median age of 62 years. Compared with 3 other models relying on risk factors or ECG signal or baseline QTc alone, QTNet achieved the best (P < 0.001) performance with a mean area under the curve of 0.802 (95% CI: 0.786-0.818). In a survival analysis, QTNet also had the highest inverse probability of censorship-weighted area under the receiver-operating characteristic curve at day 2 (0.875; 95% CI: 0.848-0.904) and up to 6 months. In a subgroup analysis, QTNet performed best among males and patients ≤50 years or with baseline QTc <450 ms. In an external validation cohort of solely suburban outpatient practices, QTNet similarly maintained the highest predictive performance. CONCLUSIONS: An ECG-based CNN can accurately predict diLQTS in the outpatient setting while maintaining its predictive performance over time. In the outpatient setting, our model could identify higher-risk individuals who would benefit from closer monitoring.
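
The abstract describes QTNet only as a convolutional neural network implemented in TensorFlow that takes the 12-lead ECG signal plus risk-factor data as inputs. As a rough sketch of that kind of dual-input model (not the published QTNet architecture; input shapes, layer sizes, and names below are assumptions):

```python
# Illustrative dual-input ECG + risk-factor classifier; layer sizes and shapes
# are assumptions, not the published QTNet specification.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_ecg_risk_model(n_samples=5000, n_leads=12, n_risk_factors=10):
    # ECG branch: 1D convolutions over the time axis of a 12-lead recording.
    ecg_in = layers.Input(shape=(n_samples, n_leads), name="ecg")
    x = layers.Conv1D(32, kernel_size=16, strides=2, activation="relu")(ecg_in)
    x = layers.Conv1D(64, kernel_size=16, strides=2, activation="relu")(x)
    x = layers.GlobalAveragePooling1D()(x)

    # Tabular branch: clinical risk factors (age, sex, baseline QTc, ...).
    risk_in = layers.Input(shape=(n_risk_factors,), name="risk_factors")
    r = layers.Dense(32, activation="relu")(risk_in)

    # Fuse both branches and predict the probability of diLQTS.
    h = layers.Concatenate()([x, r])
    out = layers.Dense(1, activation="sigmoid", name="dilqts_risk")(h)

    model = Model(inputs=[ecg_in, risk_in], outputs=out)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC(name="auroc")])
    return model

model = build_ecg_risk_model()
model.summary()
```

In practice the two branches would be trained on the paired ECG and risk-factor data described above, with the sigmoid output read as the predicted probability of diLQTS.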

2.
BMC Med Inform Decis Mak ; 24(1): 111, 2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38664664

ABSTRACT

In cancer research there is much interest in building and validating outcome prediction models to support treatment decisions. However, because most outcome prediction models are developed and validated without regard to the causal aspects of treatment decision making, many published outcome prediction models may cause harm when used for decision making, despite being found accurate in validation studies. Guidelines on prediction model validation and the checklist for risk model endorsement by the American Joint Committee on Cancer do not protect against prediction models that are accurate during development and validation but harmful when used for decision making. We explain why this is the case and how to build and validate models that are useful for decision making.


Subject(s)
Algorithms, Humans, Causality, Clinical Decision-Making, Neoplasms/therapy, Quality Improvement
3.
Article in English | MEDLINE | ID: mdl-38518758

ABSTRACT

BACKGROUND: Myocardial infarction and heart failure are major cardiovascular diseases that affect millions of people in the US, with morbidity and mortality being highest among patients who develop cardiogenic shock. Early recognition of cardiogenic shock allows prompt implementation of treatment measures. Our objective is to develop a new dynamic risk score, called CShock, to improve early detection of cardiogenic shock in the cardiac intensive care unit (ICU). METHODS: We developed and externally validated a deep learning-based risk stratification tool, called CShock, for patients admitted to the cardiac ICU with acute decompensated heart failure and/or myocardial infarction to predict the onset of cardiogenic shock. We prepared a cardiac ICU dataset from the MIMIC-III database, annotated with physician-adjudicated outcomes. This dataset, which consisted of 1500 patients, 204 of whom had cardiogenic/mixed shock, was then used to train CShock. The features used to train the model included patient demographics, cardiac ICU admission diagnoses, routinely measured laboratory values and vital signs, and relevant features manually extracted from echocardiogram and left heart catheterization reports. We externally validated the risk model on the New York University (NYU) Langone Health cardiac ICU database, which was also annotated with physician-adjudicated outcomes. The external validation cohort consisted of 131 patients, 25 of whom experienced cardiogenic/mixed shock. RESULTS: CShock achieved an area under the receiver operating characteristic curve (AUROC) of 0.821 (95% CI 0.792-0.850). CShock was externally validated in the more contemporary NYU cohort and achieved an AUROC of 0.800 (95% CI 0.717-0.884), demonstrating its generalizability to other cardiac ICUs. Based on Shapley values, an elevated heart rate was the most predictive feature of cardiogenic shock development. The other top ten predictors were an admission diagnosis of myocardial infarction with ST-segment elevation, an admission diagnosis of acute decompensated heart failure, Braden Scale, Glasgow Coma Scale, blood urea nitrogen, systolic blood pressure, serum chloride, serum sodium, and arterial blood pH. CONCLUSIONS: The novel CShock score has the potential to provide automated detection and early warning of cardiogenic shock and to improve outcomes for the millions of patients who suffer from myocardial infarction and heart failure.
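
The AUROCs above are reported with 95% confidence intervals; the abstract does not say how those intervals were obtained, but a patient-level bootstrap is one common choice. A hedged sketch, assuming bootstrap resampling of the held-out set:

```python
# Illustrative bootstrap 95% CI for AUROC on a held-out set; the paper does not
# state how its confidence intervals were computed, so this is an assumption.
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auroc_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample with replacement
        if len(np.unique(y_true[idx])) < 2:              # need both classes present
            continue
        stats.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc_score(y_true, y_score), (lo, hi)
```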

4.
ArXiv ; 2023 Nov 21.
Article in English | MEDLINE | ID: mdl-38045479

ABSTRACT

Automatic assessment of impairment and disease severity is a key challenge in data-driven medicine. We propose a novel framework to address this challenge, which leverages AI models trained exclusively on healthy individuals. The COnfidence-Based chaRacterization of Anomalies (COBRA) score exploits the decrease in confidence of these models when presented with impaired or diseased patients to quantify their deviation from the healthy population. We applied the COBRA score to address a key limitation of current clinical evaluation of upper-body impairment in stroke patients. The gold-standard Fugl-Meyer Assessment (FMA) requires in-person administration by a trained assessor for 30-45 minutes, which restricts monitoring frequency and precludes physicians from adapting rehabilitation protocols to the progress of each patient. The COBRA score, computed automatically in under one minute, is shown to be strongly correlated with the FMA on an independent test cohort for two different data modalities: wearable sensors ($\rho = 0.845$, 95% CI [0.743, 0.908]) and video ($\rho = 0.746$, 95% CI [0.594, 0.847]). To demonstrate the generalizability of the approach to other conditions, the COBRA score was also applied to quantify severity of knee osteoarthritis from magnetic resonance imaging scans, again achieving significant correlation with an independent clinical assessment ($\rho = 0.644$, 95% CI [0.585, 0.696]).
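
The COBRA score is described as the drop in confidence of healthy-trained models when they see impaired patients. A minimal sketch of that idea, assuming "confidence" means the maximum softmax probability averaged over a patient's samples (the paper may define confidence differently per modality), with Spearman correlation against the clinical score:

```python
# Sketch of a COBRA-style score: one minus the mean confidence of a
# healthy-trained classifier over a patient's samples, then correlated with a
# clinical assessment. Max softmax probability as "confidence" is an assumption.
import numpy as np
from scipy.stats import spearmanr

def cobra_score(softmax_probs_per_sample):
    """softmax_probs_per_sample: (n_samples, n_classes) predictions for one patient."""
    confidence = softmax_probs_per_sample.max(axis=1)  # model confidence per sample
    return 1.0 - confidence.mean()                     # larger = more deviation from healthy

# Toy example: correlate per-patient COBRA scores with clinical (e.g. FMA) scores.
patient_probs = [np.random.dirichlet(np.ones(4), size=50) for _ in range(30)]
cobra = np.array([cobra_score(p) for p in patient_probs])
clinical = np.random.uniform(0, 66, size=30)
rho, pval = spearmanr(cobra, clinical)
```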

5.
Proc Mach Learn Res ; 206: 10343-10367, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37681192

ABSTRACT

Conditional randomization tests (CRTs) assess whether a variable x is predictive of another variable y, having observed covariates z. CRTs require fitting a large number of predictive models, which is often computationally intractable. Existing solutions to reduce the cost of CRTs typically split the dataset into a train and test portion, or rely on heuristics for interactions, both of which lead to a loss in power. We propose the decoupled independence test (DIET), an algorithm that avoids both of these issues by leveraging marginal independence statistics to test conditional independence relationships. DIET tests the marginal independence of two random variables: $F_{x \mid z}(x \mid z)$ and $F_{y \mid z}(y \mid z)$, where $F_{\cdot \mid z}(\cdot \mid z)$ is a conditional cumulative distribution function (CDF) for the distribution $p(\cdot \mid z)$. These variables are termed "information residuals." We give sufficient conditions for DIET to achieve finite sample type-1 error control and power greater than the type-1 error rate. We then prove that when using the mutual information between the information residuals as a test statistic, DIET yields the most powerful conditionally valid test. Finally, we show DIET achieves higher power than other tractable CRTs on several synthetic and real benchmarks.
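
To make the construction concrete, a hedged sketch of the DIET pipeline follows: estimate the two conditional CDFs, form the information residuals, and use an estimate of their mutual information as the test statistic. The Gaussian conditional model and the scikit-learn mutual-information estimator are simplifications, not the estimators used in the paper:

```python
# Hedged sketch of the DIET idea: information residuals from estimated
# conditional CDFs, tested for marginal dependence via mutual information.
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import mutual_info_regression

def information_residual(v, z):
    """Estimate F_{v|z}(v|z) under a simple Gaussian conditional model.
    v: (n,) outcome; z: (n, d) covariates."""
    reg = LinearRegression().fit(z, v)
    resid = v - reg.predict(z)
    return norm.cdf(resid / resid.std())

def diet_statistic(x, y, z):
    rx = information_residual(x, z)   # F_{x|z}(x|z)
    ry = information_residual(y, z)   # F_{y|z}(y|z)
    # Mutual information between the information residuals as the test statistic.
    return mutual_info_regression(rx.reshape(-1, 1), ry)[0]
```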

6.
Article in English | MEDLINE | ID: mdl-38645403

ABSTRACT

Deep Neural Networks (DNNs) are prone to learning spurious features that correlate with the label during training but are irrelevant to the learning problem. This hurts model generalization and poses problems when deploying them in safety-critical applications. This paper aims to better understand the effects of spurious features through the lens of the learning dynamics of the internal neurons during the training process. We make the following observations: (1) While previous works highlight the harmful effects of spurious features on the generalization ability of DNNs, we emphasize that not all spurious features are harmful. Spurious features can be "benign" or "harmful" depending on whether they are "harder" or "easier" to learn than the core features for a given model. This definition is model and dataset dependent. (2) We build upon this premise and use instance difficulty methods (like Prediction Depth (Baldock et al., 2021)) to quantify "easiness" for a given model and to identify this behavior during the training phase. (3) We empirically show that the harmful spurious features can be detected by observing the learning dynamics of the DNN's early layers. In other words, easy features learned by the initial layers of a DNN early during the training can (potentially) hurt model generalization. We verify our claims on medical and vision datasets, both simulated and real, and justify the empirical success of our hypothesis by showing the theoretical connections between Prediction Depth and information-theoretic concepts like 𝒱-usable information (Ethayarajh et al., 2021). Lastly, our experiments show that monitoring only accuracy during training (as is common in machine learning pipelines) is insufficient to detect spurious features. We, therefore, highlight the need for monitoring early training dynamics using suitable instance difficulty metrics.
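
A minimal sketch of the kind of instance-difficulty probe this relies on: prediction depth is roughly the earliest layer whose probe already classifies an example correctly. The original method uses k-NN probes fit on held-out data; the logistic-regression probes below are a simplification for illustration:

```python
# Simplified prediction-depth probe: earliest layer whose linear probe already
# gets the example right. In practice probes are fit on one split and evaluated
# on another to avoid leakage; k-NN probes are used in the original work.
import numpy as np
from sklearn.linear_model import LogisticRegression

def prediction_depth(layer_activations, labels, example_idx):
    """layer_activations: list of (n_examples, dim_l) arrays, one per layer."""
    depth = len(layer_activations)  # default: only resolved at the final layer (or never)
    for l, acts in enumerate(layer_activations):
        probe = LogisticRegression(max_iter=1000).fit(acts, labels)
        if probe.predict(acts[example_idx:example_idx + 1])[0] == labels[example_idx]:
            depth = l
            break
    return depth  # small depth => feature learned by early layers ("easy")
```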

7.
Proc AAAI Conf Artif Intell ; 37(12): 15305-15312, 2023 Jun 27.
Article in English | MEDLINE | ID: mdl-38464961

ABSTRACT

Methods which utilize the outputs or feature representations of predictive models have emerged as promising approaches for out-of-distribution (ood) detection of image inputs. However, these methods struggle to detect ood inputs that share nuisance values (e.g. background) with in-distribution inputs. The detection of shared-nuisance out-of-distribution (sn-ood) inputs is particularly relevant in real-world applications, as anomalies and in-distribution inputs tend to be captured in the same settings during deployment. In this work, we provide a possible explanation for sn-ood detection failures and propose nuisance-aware ood detection to address them. Nuisance-aware ood detection substitutes a classifier trained via Empirical Risk Minimization (erm) and cross-entropy loss with one that 1. is trained under a distribution where the nuisance-label relationship is broken and 2. yields representations that are independent of the nuisance under this distribution, both marginally and conditioned on the label. We can train a classifier to achieve these objectives using Nuisance-Randomized Distillation (NURD), an algorithm developed for ood generalization under spurious correlations. Output- and feature-based nuisance-aware ood detection perform substantially better than their original counterparts, succeeding even when detection based on domain generalization algorithms fails to improve performance.

8.
Sci Rep ; 12(1): 5848, 2022 04 07.
Article in English | MEDLINE | ID: mdl-35393451

ABSTRACT

Randomized controlled trials (RCTs) are the gold standard for estimating treatment effects, but some important situations in cancer care require treatment effect estimates from observational data. We developed "Proxy based individual treatment effect modeling in cancer" (PROTECT) to estimate treatment effects from observational data when there are unobserved confounders but proxy measurements of these confounders exist. We identified an unobserved confounder in observational cancer research: overall fitness. Proxy measurements of overall fitness, such as performance score, exist, but fitness as observed by the treating physician is unavailable for research. PROTECT reconstructs the distribution of the unobserved confounder from these proxy measurements to estimate the treatment effect. PROTECT was applied to an observational cohort of 504 stage III non-small cell lung cancer (NSCLC) patients treated with concurrent or sequential chemoradiation. Whereas conventional confounding adjustment methods appeared to overestimate the treatment effect, PROTECT provided credible treatment effect estimates.


Subject(s)
Non-Small Cell Lung Carcinoma, Lung Neoplasms, Non-Small Cell Lung Carcinoma/drug therapy, Chemoradiotherapy, Cohort Studies, Humans, Lung Neoplasms/therapy
9.
Proc Mach Learn Res ; 162: 26559-26574, 2022 Jul.
Article in English | MEDLINE | ID: mdl-37645424

ABSTRACT

Permutation invariant neural networks are a promising tool for making predictions from sets. However, we show that existing permutation invariant architectures, Deep Sets and Set Transformer, can suffer from vanishing or exploding gradients when they are deep. Additionally, layer norm, the normalization of choice in Set Transformer, can hurt performance by removing information useful for prediction. To address these issues, we introduce the "clean path principle" for equivariant residual connections and develop set norm (sn), a normalization tailored for sets. With these, we build Deep Sets++ and Set Transformer++, models that reach high depths with performance better than or comparable to their original counterparts on a diverse suite of tasks. We additionally introduce Flow-RBC, a new single-cell dataset and real-world application of permutation invariant prediction. We open-source our data and code here: https://github.com/rajesh-lab/deep_permutation_invariant.
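
A minimal NumPy sketch of set norm as described here, assuming each set is standardized over both its element and feature axes before a per-feature scale and shift (details may differ from the released code at the link above):

```python
# Hedged sketch of set norm (sn): standardize each set jointly over its element
# and feature axes, then apply per-feature learnable scale and shift.
import numpy as np

def set_norm(x, gamma, beta, eps=1e-5):
    """x: (batch, set_size, features); gamma, beta: (features,) learnable params."""
    mean = x.mean(axis=(1, 2), keepdims=True)   # one mean per set
    var = x.var(axis=(1, 2), keepdims=True)     # one variance per set
    x_hat = (x - mean) / np.sqrt(var + eps)     # permutation-equivariant standardization
    return gamma * x_hat + beta

x = np.random.randn(4, 100, 16)                 # 4 sets of 100 elements, 16 features
out = set_norm(x, gamma=np.ones(16), beta=np.zeros(16))
```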

10.
Proc Mach Learn Res ; 177: 290-301, 2022 Apr.
Article in English | MEDLINE | ID: mdl-37646010

ABSTRACT

Spurious correlations allow flexible models to predict well during training but poorly on related test populations. Recent work has shown that models that satisfy particular independencies involving correlation-inducing nuisance variables have guarantees on their test performance. Enforcing such independencies requires nuisances to be observed during training. However, nuisances, such as demographics or image background labels, are often missing. Enforcing independence on just the observed data does not imply independence on the entire population. Here we derive MMD estimators used for invariance objectives under missing nuisances. On simulations and clinical data, optimizing through these estimates achieves test performance similar to using estimators that make use of the full data.
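
For reference, the basic quantity these invariance objectives are built from is the kernel MMD between two groups of representations. A sketch of the standard biased RBF-kernel estimator of squared MMD (the paper's estimators under missing nuisances extend this and are not reproduced here):

```python
# Standard biased RBF-kernel estimator of squared MMD between two samples; the
# basic building block of MMD invariance penalties.
import numpy as np

def mmd2_rbf(x, y, sigma=1.0):
    """x: (n, d), y: (m, d) samples from the two groups."""
    def gram(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
        return np.exp(-d2 / (2 * sigma ** 2))
    return gram(x, x).mean() + gram(y, y).mean() - 2 * gram(x, y).mean()
```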

11.
Proc Mach Learn Res ; 182: 224-248, 2022 Aug.
Article in English | MEDLINE | ID: mdl-37706207

ABSTRACT

Survival analysis, the art of time-to-event modeling, plays an important role in clinical treatment decisions. Recently, continuous time models built from neural ODEs have been proposed for survival analysis. However, the training of neural ODEs is slow due to the high computational complexity of neural ODE solvers. Here, we propose an efficient alternative for flexible continuous time models, called Survival Mixture Density Networks (Survival MDNs). Survival MDN applies an invertible positive function to the output of Mixture Density Networks (MDNs). While MDNs produce flexible real-valued distributions, the invertible positive function maps the model into the time domain while preserving a tractable density. Using four datasets, we show that Survival MDN performs better than, or similarly to, continuous and discrete time baselines on concordance, integrated Brier score, and integrated binomial log-likelihood. Meanwhile, Survival MDNs are also faster than ODE-based models and circumvent binning issues in discrete models.
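
A hedged sketch of the change-of-variables step: if the MDN defines a Gaussian mixture density over the real line and the invertible positive map is taken to be softplus (an assumption; the paper's choice of map may differ), the density over event times follows directly:

```python
# Hedged sketch of the Survival MDN idea: a Gaussian mixture density over the
# real line pushed through an invertible positive map (softplus assumed here)
# gives a tractable density over event times via change of variables.
import numpy as np
from scipy.stats import norm

def mixture_pdf(z, weights, means, stds):
    return sum(w * norm.pdf(z, m, s) for w, m, s in zip(weights, means, stds))

def survival_mdn_pdf(t, weights, means, stds):
    """Density over times t > 0 for T = softplus(Z), Z ~ Gaussian mixture."""
    z = np.log(np.expm1(t))              # inverse softplus
    jac = np.exp(t) / np.expm1(t)        # |dz/dt|, Jacobian of the inverse map
    return mixture_pdf(z, weights, means, stds) * jac

t = np.linspace(0.1, 10, 100)
pdf = survival_mdn_pdf(t, weights=[0.5, 0.5], means=[0.0, 2.0], stds=[1.0, 0.5])
```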

13.
Proc Mach Learn Res ; 130: 1900-1908, 2021 Apr.
Article in English | MEDLINE | ID: mdl-34522887

ABSTRACT

The holdout randomization test (HRT) discovers a set of covariates most predictive of a response. Given the covariate distribution, HRTs can explicitly control the false discovery rate (FDR). However, if this distribution is unknown and must be estimated from data, HRTs can inflate the FDR. To alleviate the inflation of FDR, we propose the contrarian randomization test (CONTRA), which is designed explicitly for scenarios where the covariate distribution must be estimated from data and may even be misspecified. Our key insight is to use an equal mixture of two "contrarian" probabilistic models in determining the importance of a covariate. One model is fit with the real data, while the other is fit using the same data, but with the covariate being tested replaced with samples from an estimate of the covariate distribution. CONTRA is flexible enough to achieve a power of 1 asymptotically, can reduce the FDR compared to state-of-the-art CVS methods when the covariate distribution is misspecified, and is computationally efficient in high dimensions and large sample sizes. We further demonstrate the effectiveness of CONTRA on numerous synthetic benchmarks, and highlight its capabilities on a genetic dataset.
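
A hedged sketch of the "contrarian" pair described above: one model fit on the real data, one fit on the same data with the tested covariate replaced by draws from an estimate of its distribution, and an equal mixture of the two used to score the covariate. The exact test statistic and its calibration are not reproduced here:

```python
# Hedged sketch of CONTRA's contrarian model pair; the importance statistic and
# p-value construction from the paper are not reproduced.
import numpy as np
from sklearn.linear_model import Ridge

def contrarian_pair(X, y, j, sample_xj_given_rest):
    """sample_xj_given_rest: callable returning draws from an estimate of the
    tested covariate's distribution given the remaining covariates."""
    model_real = Ridge().fit(X, y)                                     # fit on the real data
    X_null = X.copy()
    X_null[:, j] = sample_xj_given_rest(np.delete(X, j, axis=1))       # resample column j
    model_null = Ridge().fit(X_null, y)                                # fit on the "nulled" data

    def mixture_pred(X_new):
        # Equal mixture of the two contrarian models, used to score covariate j.
        return 0.5 * (model_real.predict(X_new) + model_null.predict(X_new))

    return model_real, model_null, mixture_pred
```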

14.
Annu Rev Biomed Data Sci ; 4: 393-415, 2021 07 20.
Article in English | MEDLINE | ID: mdl-34465179

ABSTRACT

Machine learning can be used to make sense of healthcare data. Probabilistic machine learning models help provide a complete picture of observed data in healthcare. In this review, we examine how probabilistic machine learning can advance healthcare. We consider challenges in the predictive model building pipeline where probabilistic models can be beneficial, including calibration and missing data. Beyond predictive models, we also investigate the utility of probabilistic machine learning models in phenotyping, in generative models for clinical use cases, and in reinforcement learning.


Subject(s)
Delivery of Health Care, Machine Learning, Health Facilities, Statistical Models
15.
Trials ; 22(1): 537, 2021 Aug 16.
Article in English | MEDLINE | ID: mdl-34399832

ABSTRACT

BACKGROUND: Interest in the application of machine learning (ML) to the design, conduct, and analysis of clinical trials has grown, but the evidence base for such applications has not been surveyed. This manuscript reviews the proceedings of a multi-stakeholder conference to discuss the current and future state of ML for clinical research. Key areas of clinical trial methodology in which ML holds particular promise and priority areas for further investigation are presented alongside a narrative review of evidence supporting the use of ML across the clinical trial spectrum. RESULTS: Conference attendees included stakeholders such as biomedical and ML researchers, representatives from the US Food and Drug Administration (FDA), artificial intelligence technology and data analytics companies, non-profit organizations, patient advocacy groups, and pharmaceutical companies. ML contributions to clinical research were highlighted in the pre-trial phase, cohort selection and participant management, and data collection and analysis. Particular attention was paid to the operational and philosophical barriers to ML in clinical research. Peer-reviewed evidence was noted to be lacking in several areas. CONCLUSIONS: ML holds great promise for improving the efficiency and quality of clinical research, but substantial barriers remain, the surmounting of which will require addressing significant gaps in evidence.


Subject(s)
Artificial Intelligence, Machine Learning, Humans, United States, United States Food and Drug Administration
16.
Proc Mach Learn Res ; 130: 1459-1467, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33954293

ABSTRACT

While the need for interpretable machine learning has been established, many common approaches are slow, lack fidelity, or are hard to evaluate. Amortized explanation methods reduce the cost of providing interpretations by learning a global selector model that returns feature importances for a single instance of data. The selector model is trained to optimize the fidelity of the interpretations, as evaluated by a predictor model for the target. Popular methods learn the selector and predictor model in concert, which we show allows predictions to be encoded within interpretations. We introduce EVAL-X as a method to quantitatively evaluate interpretations and REAL-X as an amortized explanation method, which learns a predictor model that approximates the true data generating distribution given any subset of the input. We show that EVAL-X can detect when predictions are encoded in interpretations and demonstrate the advantages of REAL-X through quantitative and radiologist evaluation.

17.
Sci Transl Med ; 13(586)2021 03 24.
Article in English | MEDLINE | ID: mdl-33762434

ABSTRACT

Machine learning for health must be reproducible to ensure reliable clinical use. We evaluated 511 scientific papers across several machine learning subfields and found that machine learning for health compared poorly to other areas regarding reproducibility metrics, such as dataset and code accessibility. We propose recommendations to address this problem.


Subject(s)
Machine Learning, Reproducibility of Results
18.
Adv Neural Inf Process Syst ; 34: 2160-2172, 2021 Dec.
Article in English | MEDLINE | ID: mdl-35859987

ABSTRACT

Deep models trained through maximum likelihood have achieved state-of-the-art results for survival analysis. Despite this training scheme, practitioners evaluate models under other criteria, such as binary classification losses at a chosen set of time horizons, e.g. Brier score (BS) and Bernoulli log likelihood (BLL). Models trained with maximum likelihood may have poor BS or BLL since maximum likelihood does not directly optimize these criteria. Directly optimizing criteria like BS requires inverse-weighting by the censoring distribution. However, estimating the censoring model under these metrics requires inverse-weighting by the failure distribution. The objective for each model requires the other, but neither are known. To resolve this dilemma, we introduce Inverse-Weighted Survival Games. In these games, objectives for each model are built from re-weighted estimates featuring the other model, where the latter is held fixed during training. When the loss is proper, we show that the games always have the true failure and censoring distributions as a stationary point. This means models in the game do not leave the correct distributions once reached. We construct one case where this stationary point is unique. We show that these games optimize BS on simulations and then apply these principles on real world cancer and critically-ill patient data.
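
The inverse weighting referred to here is the standard IPCW construction; as a concrete reference point, a sketch of the IPCW Brier score at a horizon t, given an estimate of the censoring survival function (how the games parameterize and train that censoring model is not reproduced):

```python
# Sketch of the inverse-probability-of-censoring-weighted (IPCW) Brier score at
# horizon t, the kind of criterion the survival games target.
import numpy as np

def ipcw_brier(surv_prob_at_t, times, events, t, G_hat):
    """surv_prob_at_t: model's P(T > t | x_i); times/events: observed follow-up
    times and event indicators; G_hat: vectorized estimate of the censoring
    survival function (e.g. from a Kaplan-Meier fit on censoring times)."""
    surv_prob_at_t, times, events = map(np.asarray, (surv_prob_at_t, times, events))
    died_by_t = (times <= t) & (events == 1)
    alive_at_t = times > t
    term_event = (surv_prob_at_t ** 2) * died_by_t / np.maximum(G_hat(times), 1e-12)
    term_alive = ((1.0 - surv_prob_at_t) ** 2) * alive_at_t / np.maximum(G_hat(t), 1e-12)
    return (term_event + term_alive).mean()
```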

19.
Proc Mach Learn Res ; 139: 12427-12436, 2021 Jul.
Article in English | MEDLINE | ID: mdl-35860036

ABSTRACT

Deep generative models (dgms) seem a natural fit for detecting out-of-distribution (ood) inputs, but such models have been shown to assign higher probabilities or densities to ood images than images from the training distribution. In this work, we explain why this behavior should be attributed to model misestimation. We first prove that no method can guarantee performance beyond random chance without assumptions on which out-distributions are relevant. We then interrogate the typical set hypothesis, the claim that relevant out-distributions can lie in high likelihood regions of the data distribution, and that ood detection should be defined based on the data distribution's typical set. We highlight the consequences implied by assuming support overlap between in- and out-distributions, as well as the arbitrariness of the typical set for ood detection. Our results suggest that estimation error is a more plausible explanation than the misalignment between likelihood-based ood detection and out-distributions of interest, and we illustrate how even minimal estimation error can lead to ood detection failures, yielding implications for future work in deep generative modeling and ood detection.

20.
NPJ Digit Med ; 3: 130, 2020.
Article in English | MEDLINE | ID: mdl-33083565

ABSTRACT

The COVID-19 pandemic has challenged front-line clinical decision-making, leading to numerous published prognostic tools. However, few models have been prospectively validated and none report implementation in practice. Here, we use 3345 retrospective and 474 prospective hospitalizations to develop and validate a parsimonious model to identify patients with favorable outcomes within 96 h of a prediction, based on real-time lab values, vital signs, and oxygen support variables. In retrospective and prospective validation, the model achieves high average precision (88.6% 95% CI: [88.4-88.7] and 90.8% [90.8-90.8]) and discrimination (95.1% [95.1-95.2] and 86.8% [86.8-86.9]) respectively. We implemented and integrated the model into the EHR, achieving a positive predictive value of 93.3% with 41% sensitivity. Preliminary results suggest clinicians are adopting these scores into their clinical workflows.
