ABSTRACT
Feature attribution, the ability to localize regions of the input data that are relevant for classification, is an important capability for ML models in scientific and biomedical domains. Current methods for feature attribution, which rely on "explaining" the predictions of end-to-end classifiers, suffer from imprecise feature localization and are inadequate for use with small sample sizes and high-dimensional datasets due to computational challenges. We introduce prospector heads, an efficient and interpretable alternative to explanation-based attribution methods that can be applied to any encoder and any data modality. We show that prospector heads generalize across modalities through experiments on sequences (text), images (pathology), and graphs (protein structures), where they outperform baseline attribution methods by up to 26.3 points in mean localization AUPRC. We also demonstrate how prospector heads enable improved interpretation and discovery of class-specific patterns in input data. Through their high performance, flexibility, and generalizability, prospectors provide a framework for improving trust and transparency for ML models in complex domains.
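To make the localization metric concrete, the sketch below computes average precision, a common step-wise estimate of AUPRC, for one input whose elements (tokens, image patches, or graph nodes) each carry an attribution score and a binary ground-truth relevance label. The function name `average_precision` and the toy data are illustrative assumptions; the abstract does not state how the authors compute AUPRC, and this sketch does not handle tied scores specially.

```python
def average_precision(labels, scores):
    """Step-wise average precision for one input.

    labels: 1 if the element is truly class-relevant, else 0.
    scores: attribution score per element (higher = more relevant).
    Assumption: ties are broken by input order, not averaged.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank elements from highest to lowest attribution score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            # Precision measured at the rank of each true positive.
            precision_sum += hits / rank
    return precision_sum / n_pos


# Hypothetical toy example: five elements, three of them truly relevant.
toy_labels = [0, 1, 1, 0, 1]
toy_scores = [0.1, 0.9, 0.8, 0.7, 0.3]
toy_ap = average_precision(toy_labels, toy_scores)
print(f"average precision = {toy_ap:.3f}")
```

Averaging this value over all inputs in a test set would give a mean localization score of the kind the abstract reports, under the assumptions stated above.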
ABSTRACT
Undetected infection and delayed isolation of infected individuals are key factors driving the monkeypox virus (now termed mpox virus or MPXV) outbreak. To enable earlier detection of MPXV infection, we developed an image-based deep convolutional neural network (named MPXV-CNN) for the identification of the characteristic skin lesions caused by MPXV. We assembled a dataset of 139,198 skin lesion images, split into training/validation and testing cohorts, comprising non-MPXV images (n = 138,522) from eight dermatological repositories and MPXV images (n = 676) from the scientific literature, news articles, social media and a prospective cohort of the Stanford University Medical Center (n = 63 images from 12 patients, all male). In the validation and testing cohorts, the sensitivity of the MPXV-CNN was 0.83 and 0.91, the specificity was 0.965 and 0.898 and the area under the curve was 0.967 and 0.966, respectively. In the prospective cohort, the sensitivity was 0.89. The classification performance of the MPXV-CNN was robust across various skin tones and body regions. To facilitate the usage of the algorithm, we developed a web-based app by which the MPXV-CNN can be accessed for patient guidance. The capability of the MPXV-CNN for identifying MPXV lesions has the potential to aid in MPXV outbreak mitigation.
Subject(s)
Deep Learning; Mpox; Humans; Male; Prospective Studies; Monkeypox virus; Algorithms
ABSTRACT
OBJECTIVE: To develop prediction models for intensive care unit (ICU) vs non-ICU level-of-care need within 24 hours of inpatient admission for emergency department (ED) patients using electronic health record data. MATERIALS AND METHODS: Using records of 41 654 ED visits to a tertiary academic center from 2015 to 2019, we tested 4 algorithms (feed-forward neural networks, regularized regression, random forests, and gradient-boosted trees) to predict ICU vs non-ICU level-of-care within 24 hours and at the 24th hour following admission. Simple-feature models included patient demographics, Emergency Severity Index (ESI), and vital sign summary. Complex-feature models added all vital signs, lab results, and counts of diagnosis, imaging, procedures, medications, and lab orders. RESULTS: The best-performing model, a gradient-boosted tree using a full feature set, achieved an AUROC of 0.88 (95% CI: 0.87-0.89) and AUPRC of 0.65 (95% CI: 0.63-0.68) for predicting ICU care need within 24 hours of admission. The logistic regression model using ESI achieved an AUROC of 0.67 (95% CI: 0.65-0.70) and AUPRC of 0.37 (95% CI: 0.35-0.40). At a discrimination threshold of 0.6, for example, the positive predictive value, negative predictive value, sensitivity, and specificity were 85%, 89%, 30%, and 99%, respectively. Vital signs were the most important predictors. DISCUSSION AND CONCLUSIONS: Undertriaging admitted ED patients who subsequently require ICU care is common and associated with poorer outcomes. Machine learning models using readily available electronic health record data predict subsequent need for ICU admission with good discrimination, substantially better than the benchmarking ESI system. The results could be used in a multitiered clinical decision-support system to improve ED triage.
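The threshold-dependent metrics reported above can be illustrated with a minimal sketch that turns predicted probabilities into ICU vs non-ICU calls at a chosen cutoff and derives the four quantities from the resulting confusion matrix. The function name `metrics_at_threshold` and the ten-patient toy data are hypothetical and are not taken from the study.

```python
def metrics_at_threshold(y_true, y_score, threshold=0.6):
    """Return PPV, NPV, sensitivity, and specificity at a cutoff.

    y_true: 1 for ICU-level care needed, 0 otherwise.
    y_score: predicted probability of needing ICU-level care.
    A score >= threshold counts as a positive (ICU) prediction.
    """
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if predicted_positive and label:
            tp += 1
        elif predicted_positive and not label:
            fp += 1
        elif not predicted_positive and label:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical toy cohort: 3 ICU patients, 7 non-ICU patients.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_prob = [0.9, 0.7, 0.3, 0.65, 0.2, 0.1, 0.4, 0.5, 0.3, 0.05]
toy_metrics = metrics_at_threshold(toy_true, toy_prob, threshold=0.6)
for name, value in toy_metrics.items():
    print(f"{name}: {value:.2f}")
```

Raising the threshold generally trades sensitivity for specificity and PPV, which matches the pattern in the reported figures (low sensitivity, very high specificity at 0.6).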
Subject(s)
Emergency Service, Hospital; Triage; Hospitalization; Hospitals; Humans; Intensive Care Units; Machine Learning; Retrospective Studies
ABSTRACT
Identification of protein biomarkers for cancer diagnosis and prognosis remains a critical unmet clinical need. A major reason is that the dynamic relationship between proliferating and necrotic cell populations during vascularized tumor growth, and the associated extra- and intra-cellular protein outflux from these populations into blood circulation remains poorly understood. Complementary to experimental efforts, mathematical approaches have been employed to effectively simulate the kinetics of detectable surface proteins (e.g., CA-125) shed into the bloodstream. However, existing models can be difficult to tune and may be unable to capture the dynamics of non-extracellular proteins, such as those shed from necrotic and apoptosing cells. The models may also fail to account for intra-tumoral spatial and microenvironmental heterogeneity. We present a new multi-compartment model to simulate heterogeneously vascularized growing tumors and the corresponding protein outflux. Model parameters can be tuned from histology data, including relative vascular volume, mean vessel diameter, and distance from vasculature to necrotic tissue. The model enables evaluating the difference in shedding rates between extra- and non-extracellular proteins from viable and necrosing cells as a function of heterogeneous vascularization. Simulation results indicate that under certain conditions it is possible for non-extracellular proteins to have superior outflux relative to extracellular proteins. This work contributes towards the goal of cancer biomarker identification by enabling simulation of protein shedding kinetics based on tumor tissue-specific characteristics. Ultimately, we anticipate that models like the one introduced herein will enable examining origins and circulating dynamics of candidate biomarkers, thus facilitating marker selection for validation studies.