Search | VHL Regional Portal

A cost-aware framework for the development of AI models for healthcare applications.

Erion, Gabriel; Janizek, Joseph D; Hudelson, Carly; Utarnachitt, Richard B; McCoy, Andrew M; Sayre, Michael R; White, Nathan J; Lee, Su-In.

Nat Biomed Eng ; 6(12): 1384-1398, 2022 Dec.

Article in English | MEDLINE | ID: mdl-35393566

ABSTRACT

Accurate artificial intelligence (AI) for disease diagnosis could lower healthcare workloads. However, when time or financial resources for gathering input data are limited, as in emergency and critical-care medicine, developing accurate AI models, which typically require inputs for many clinical variables, may be impractical. Here we report a model-agnostic cost-aware AI (CoAI) framework for the development of predictive models that optimize the trade-off between prediction performance and feature cost. By using three datasets, each including thousands of patients, we show that relative to clinical risk scores, CoAI substantially reduces the cost and improves the accuracy of predicting acute traumatic coagulopathy in a pre-hospital setting, mortality in intensive-care patients and mortality in outpatient settings. We also show that CoAI outperforms state-of-the-art cost-aware prediction strategies in terms of predictive performance, model cost, training time and robustness to feature-cost perturbations. CoAI uses axiomatic feature-attribution methods for the estimation of feature importance and decouples feature selection from model training, thus allowing for a faster and more flexible adaptation of AI models to new feature costs and prediction budgets.

Subject(s)

Artificial Intelligence , Humans , Risk Factors
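The CoAI abstract describes two separable steps: estimate feature importance with an attribution method, then choose features under a cost budget without retraining the importance model. The sketch below illustrates only that decoupling. It assumes per-sample attribution values are already available, summarizes them as mean absolute attribution, and then applies a greedy importance-per-cost heuristic. The helper names, the greedy rule, the feature names, and the numbers are all hypothetical illustrations, not the paper's actual selection procedure or data.

```python
from typing import Dict, List, Sequence


def global_importance(attributions: Sequence[Sequence[float]], names: Sequence[str]) -> Dict[str, float]:
    """Mean absolute attribution per feature (rows = samples, columns = features)."""
    n_samples = len(attributions)
    return {
        name: sum(abs(row[j]) for row in attributions) / n_samples
        for j, name in enumerate(names)
    }


def select_under_budget(importance: Dict[str, float], cost: Dict[str, float], budget: float) -> List[str]:
    """Hypothetical greedy heuristic (not the CoAI method): take features in order of
    importance per unit cost while the running total cost stays within the budget."""
    if any(c <= 0 for c in cost.values()):
        raise ValueError("feature costs must be positive")
    ranked = sorted(importance, key=lambda f: importance[f] / cost[f], reverse=True)
    selected: List[str] = []
    spent = 0.0
    for feature in ranked:
        if spent + cost[feature] <= budget:
            selected.append(feature)
            spent += cost[feature]
    return selected


names = ["age", "heart_rate", "lactate", "inr"]  # illustrative feature names
phi = [[0.2, -0.3, 0.5, 0.4], [-0.4, 0.2, 0.3, -0.3]]  # made-up attributions
imp = global_importance(phi, names)
print(select_under_budget(imp, {"age": 1, "heart_rate": 1, "lactate": 5, "inr": 8}, budget=7))
# prints ['age', 'heart_rate', 'lactate']
```

Because selection works on precomputed importances and costs, a new cost table or budget only reruns the cheap selection step, which is the kind of flexibility the abstract attributes to decoupling selection from training.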

Forecasting adverse surgical events using self-supervised transfer learning for physiological signals.

Chen, Hugh; Lundberg, Scott M; Erion, Gabriel; Kim, Jerry H; Lee, Su-In.

NPJ Digit Med ; 4(1): 167, 2021 Dec 08.

Article in English | MEDLINE | ID: mdl-34880410

ABSTRACT

Hundreds of millions of surgical procedures take place annually across the world, generating a prevalent type of electronic health record (EHR) data: time series of physiological signals. Here, we present a transferable embedding method (i.e., a method to transform time series signals into input features for predictive machine learning models) named PHASE (PHysiologicAl Signal Embeddings) that enables us to more accurately forecast adverse surgical outcomes based on physiological signals. We evaluate PHASE on minute-by-minute EHR data of more than 50,000 surgeries from two operating room (OR) datasets and on patient stays in an intensive care unit (ICU) dataset. PHASE outperforms other state-of-the-art approaches, such as long short-term memory (LSTM) networks trained on raw data and gradient boosted trees trained on handcrafted features, in predicting six distinct outcomes: hypoxemia, hypocapnia, hypotension, hypertension, and the administration of phenylephrine and of epinephrine. In a transfer learning setting where we train embedding models in one dataset and then embed signals and predict adverse events in unseen data, PHASE achieves significantly higher prediction accuracy at lower computational cost than conventional approaches. Finally, given the importance of understanding models in clinical applications, we demonstrate that PHASE is explainable and validate our predictive models using local feature attribution methods.
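The PHASE abstract describes turning minute-by-minute signals into input features and then forecasting whether an adverse event occurs. A minimal sketch of that framing, under stated assumptions, is to slide a window over a signal, embed the past window, and label whether an adverse value appears within a future horizon. In PHASE the embedding is a learned, transferable model; here `placeholder_embed` (last value and window mean) is a hypothetical stand-in used only to make the pipeline runnable, and the SpO2 values and the `< 92` hypoxemia threshold are illustrative assumptions, not the paper's definitions.

```python
from typing import Callable, List, Sequence, Tuple


def make_examples(
    signal: Sequence[float],
    window: int,
    horizon: int,
    is_adverse: Callable[[float], bool],
    embed: Callable[[Sequence[float]], List[float]],
) -> Tuple[List[List[float]], List[int]]:
    """Pair an embedding of the past `window` minutes with a 0/1 label indicating
    whether any value in the next `horizon` minutes is adverse."""
    X: List[List[float]] = []
    y: List[int] = []
    for end in range(window, len(signal) - horizon + 1):
        past = signal[end - window:end]
        future = signal[end:end + horizon]
        X.append(embed(past))
        y.append(int(any(is_adverse(v) for v in future)))
    return X, y


def placeholder_embed(past: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a learned embedding: last value and window mean."""
    return [past[-1], sum(past) / len(past)]


# Illustrative SpO2 trace (%); the < 92 cutoff is an assumed hypoxemia threshold.
spo2 = [98, 97, 96, 95, 93, 91, 90]
X, y = make_examples(spo2, window=3, horizon=2, is_adverse=lambda v: v < 92, embed=placeholder_embed)
print(y)  # prints [0, 1, 1]
```

Swapping `placeholder_embed` for a model trained on another dataset corresponds to the transfer setting the abstract describes, where embeddings learned in one dataset are applied to unseen data.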

From Local Explanations to Global Understanding with Explainable AI for Trees.

Lundberg, Scott M; Erion, Gabriel; Chen, Hugh; DeGrave, Alex; Prutkin, Jordan M; Nair, Bala; Katz, Ronit; Himmelfarb, Jonathan; Bansal, Nisha; Lee, Su-In.

Nat Mach Intell ; 2(1): 56-67, 2020 Jan.

Article in English | MEDLINE | ID: mdl-32607472

ABSTRACT

Tree-based machine learning models such as random forests, decision trees, and gradient boosted trees are popular non-linear predictive models, yet comparatively little attention has been paid to explaining their predictions. Here, we improve the interpretability of tree-based models through three main contributions: (1) the first polynomial-time algorithm to compute optimal explanations based on game theory; (2) a new type of explanation that directly measures local feature interaction effects; and (3) a new set of tools for understanding global model structure based on combining many local explanations of each prediction. We apply these tools to three medical machine learning problems and show how combining many high-quality local explanations allows us to represent global structure while retaining local faithfulness to the original model. These tools enable us to (i) identify high-magnitude but low-frequency non-linear mortality risk factors in the US population, (ii) highlight distinct population subgroups with shared risk characteristics, (iii) identify non-linear interaction effects among risk factors for chronic kidney disease, and (iv) monitor a machine learning model deployed in a hospital by identifying which features are degrading the model's performance over time. Given the popularity of tree-based machine learning models, these improvements to their interpretability have implications across a broad set of domains.
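The game-theoretic explanations referred to above are Shapley values. To show what quantity is being computed, the sketch below evaluates the Shapley formula by brute force, enumerating every feature subset, which takes time exponential in the number of features; the abstract's contribution is computing such explanations for trees in polynomial time, and this sketch does not reproduce that algorithm. As a simplifying assumption, "absent" features are set to a single baseline value rather than handled through expectations along tree paths, and `toy_model` is a hypothetical tree-like function.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def shapley_values(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values by subset enumeration (exponential in len(x))."""
    n = len(x)

    def value(subset: Set[int]) -> float:
        # Features in `subset` take their value from x; the rest come from the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z: Sequence[float]) -> float:
    # Hypothetical tree-like model: a joint split on features 0 and 1, plus feature 2.
    return (3.0 if z[0] > 0.5 and z[1] > 0.5 else 0.0) + z[2]


phi = shapley_values(toy_model, x=[1, 1, 2], baseline=[0, 0, 0])
print([round(p, 6) for p in phi])  # prints [1.5, 1.5, 2.0]
```

The interaction term of 3.0 is split evenly between features 0 and 1, and the attributions sum to f(x) minus f(baseline). Averaging absolute attributions over many samples is one common way to build the kind of global summary the abstract describes from local explanations.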

Viral Genetic Linkage Analysis in the Presence of Missing Data.

Liu, Shelley H; Erion, Gabriel; Novitsky, Vladimir; De Gruttola, Victor.

PLoS One ; 10(8): e0135469, 2015.

Article in English | MEDLINE | ID: mdl-26301919

ABSTRACT

Analyses of viral genetic linkage can provide insight into HIV transmission dynamics and the impact of prevention interventions. For example, such analyses have the potential to determine whether recently infected individuals have acquired viruses circulating within or outside a given community. In addition, they have the potential to identify characteristics of chronically infected individuals that make their viruses likely to cluster with others circulating within a community. Such clustering can be related to the potential of such individuals to contribute to the spread of the virus, either directly through transmission to their partners or indirectly through further spread of HIV from those partners. Assessment of the extent to which individual (incident or prevalent) viruses are clustered within a community will be biased if only a subset of subjects is observed, especially if that subset is not representative of the entire HIV-infected population. To address this concern, we develop a multiple imputation framework in which missing sequences are imputed based on a model for the diversification of viral genomes. The imputation method decreases the bias in clustering that arises from informative missingness. Data from a household survey conducted in a village in Botswana are used to illustrate these methods. We demonstrate that the multiple imputation approach reduces bias in the overall proportion of clustering due to the presence of missing observations.

Subject(s)

Genetic Linkage , HIV Infections/genetics , HIV-1/genetics , Adult , Botswana , Female , HIV Infections/transmission , HIV Infections/virology , HIV-1/pathogenicity , Humans , Male , Models, Theoretical , Sexual Partners
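Multiple imputation produces several completed datasets, each yielding its own estimate (here, a clustering proportion) and variance. A standard way to combine them is Rubin's rules: average the estimates, and add the mean within-imputation variance to an inflated between-imputation variance. The sketch below shows only this pooling step. It does not reproduce the paper's imputation model for viral genome diversification, whether the authors pooled exactly this way is an assumption, and the input numbers are made up for illustration.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Rubin's rules: pooled estimate and total variance across m imputed datasets."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with one variance each")
    q_bar = mean(estimates)           # pooled point estimate
    w = mean(variances)               # within-imputation variance
    b = variance(estimates)           # between-imputation variance (denominator m - 1)
    t = w + (1 + 1 / m) * b           # total variance
    return q_bar, t


# Illustrative clustering proportions from three hypothetical imputed datasets.
q, t = pool_rubin([0.30, 0.34, 0.32], [0.001, 0.001, 0.001])
print(round(q, 4), round(t, 6))  # prints 0.32 0.001533
```

The between-imputation term is what carries the extra uncertainty due to the missing sequences; a single imputation would ignore it and understate the variance.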
