ABSTRACT
Importance: Artificial intelligence (AI) could support clinicians when diagnosing hospitalized patients; however, systematic bias in AI models could worsen clinician diagnostic accuracy. Recent regulatory guidance has called for AI models to include explanations to mitigate errors made by models, but the effectiveness of this strategy has not been established. Objectives: To evaluate the impact of systematically biased AI on clinician diagnostic accuracy and to determine if image-based AI model explanations can mitigate model errors. Design, Setting, and Participants: Randomized clinical vignette survey study administered between April 2022 and January 2023 across 13 US states involving hospitalist physicians, nurse practitioners, and physician assistants. Interventions: Clinicians were shown 9 clinical vignettes of patients hospitalized with acute respiratory failure, including their presenting symptoms, physical examination, laboratory results, and chest radiographs. Clinicians were then asked to determine the likelihood of pneumonia, heart failure, or chronic obstructive pulmonary disease as the underlying cause(s) of each patient's acute respiratory failure. To establish baseline diagnostic accuracy, clinicians were shown 2 vignettes without AI model input. Clinicians were then randomized to see 6 vignettes with AI model input with or without AI model explanations. Among these 6 vignettes, 3 vignettes included standard-model predictions, and 3 vignettes included systematically biased model predictions. Main Outcomes and Measures: Clinician diagnostic accuracy for pneumonia, heart failure, and chronic obstructive pulmonary disease. Results: Median participant age was 34 years (IQR, 31-39) and 241 (57.7%) were female. Four hundred fifty-seven clinicians were randomized and completed at least 1 vignette, with 231 randomized to AI model predictions without explanations, and 226 randomized to AI model predictions with explanations. Clinicians' baseline diagnostic accuracy was 73.0% (95% CI, 68.3% to 77.8%) for the 3 diagnoses. When shown a standard AI model without explanations, clinician accuracy increased over baseline by 2.9 percentage points (95% CI, 0.5 to 5.2) and by 4.4 percentage points (95% CI, 2.0 to 6.9) when clinicians were also shown AI model explanations. Systematically biased AI model predictions decreased clinician accuracy by 11.3 percentage points (95% CI, 7.2 to 15.5) compared with baseline and providing biased AI model predictions with explanations decreased clinician accuracy by 9.1 percentage points (95% CI, 4.9 to 13.2) compared with baseline, representing a nonsignificant improvement of 2.3 percentage points (95% CI, -2.7 to 7.2) compared with the systematically biased AI model. Conclusions and Relevance: Although standard AI models improve diagnostic accuracy, systematically biased AI models reduced diagnostic accuracy, and commonly used image-based AI model explanations did not mitigate this harmful effect. Trial Registration: ClinicalTrials.gov Identifier: NCT06098950.
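As a worked illustration of the outcome measure, the sketch below (hypothetical responses, not the trial's data or its statistical analysis) computes diagnostic accuracy per study condition and the percentage-point change from baseline, the quantity reported throughout the Results.

```python
import pandas as pd

# Hypothetical per-assessment data: one row per clinician-vignette-diagnosis call,
# with 'correct' = 1 when the clinician's judgment matched the reference label.
responses = pd.DataFrame({
    "condition": ["baseline", "baseline", "standard_ai", "standard_ai_explained",
                  "biased_ai", "biased_ai_explained"],
    "correct":   [1, 0, 1, 1, 0, 1],
})

accuracy = responses.groupby("condition")["correct"].mean() * 100    # percent correct per condition
delta_vs_baseline = accuracy - accuracy["baseline"]                  # percentage-point change from baseline
print(delta_vs_baseline.round(1))
```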
Subjects
Artificial Intelligence, Clinical Competence, Respiratory Insufficiency, Adult, Female, Humans, Male, Heart Failure/complications, Heart Failure/diagnosis, Pneumonia/complications, Pneumonia/diagnosis, Chronic Obstructive Pulmonary Disease/complications, Chronic Obstructive Pulmonary Disease/diagnosis, Respiratory Insufficiency/diagnosis, Respiratory Insufficiency/etiology, Diagnosis, Reproducibility of Results, Bias, Acute Disease, Hospitalists, Nurse Practitioners, Physician Assistants, United States
ABSTRACT
OBJECTIVE: When patients develop acute respiratory failure (ARF), accurately identifying the underlying etiology is essential for determining the best treatment. However, differentiating between common medical diagnoses can be challenging in clinical practice. Machine learning models could improve medical diagnosis by aiding in the diagnostic evaluation of these patients. MATERIALS AND METHODS: Machine learning models were trained to predict the common causes of ARF (pneumonia, heart failure, and/or chronic obstructive pulmonary disease [COPD]). Models were trained using chest radiographs and clinical data from the electronic health record (EHR) and applied to an internal and external cohort. RESULTS: The internal cohort of 1618 patients included 508 (31%) with pneumonia, 363 (22%) with heart failure, and 137 (8%) with COPD based on physician chart review. A model combining chest radiographs and EHR data outperformed models based on each modality alone. Models performed similarly to or better than a randomly selected physician reviewer. For pneumonia, the combined model area under the receiver operating characteristic curve (AUROC) was 0.79 (0.77-0.79), the image model AUROC was 0.74 (0.72-0.75), and the EHR model AUROC was 0.74 (0.70-0.76). For heart failure, combined: 0.83 (0.77-0.84), image: 0.80 (0.71-0.81), and EHR: 0.79 (0.75-0.82). For COPD, combined: 0.88 (0.83-0.91), image: 0.83 (0.77-0.89), and EHR: 0.80 (0.76-0.84). In the external cohort, performance was consistent for heart failure and increased for COPD, but declined slightly for pneumonia. CONCLUSIONS: Machine learning models combining chest radiographs and EHR data can accurately differentiate between common causes of ARF. Further work is needed to determine how these models could act as a diagnostic aid to clinicians in clinical settings.
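The abstract describes a model that fuses chest radiographs with EHR data; a minimal late-fusion sketch is shown below. The backbone choice (ResNet-18), layer sizes, and feature dimensions are illustrative assumptions rather than the authors' architecture; the three sigmoid outputs reflect that pneumonia, heart failure, and COPD are not mutually exclusive labels.

```python
import torch
import torch.nn as nn
from torchvision import models

class FusionARFModel(nn.Module):
    """Late fusion of a chest-radiograph CNN embedding with EHR features (illustrative sketch)."""

    def __init__(self, n_ehr_features: int, n_labels: int = 3):
        super().__init__()
        cnn = models.resnet18(weights=None)          # image encoder; backbone choice is an assumption
        cnn.fc = nn.Identity()                       # expose the 512-d image embedding
        self.image_encoder = cnn
        self.ehr_encoder = nn.Sequential(            # EHR branch (layer sizes are placeholders)
            nn.Linear(n_ehr_features, 128), nn.ReLU(), nn.Dropout(0.3)
        )
        self.head = nn.Linear(512 + 128, n_labels)   # one logit per diagnosis

    def forward(self, image, ehr):
        z = torch.cat([self.image_encoder(image), self.ehr_encoder(ehr)], dim=1)
        return self.head(z)                          # train with BCEWithLogitsLoss: labels can co-occur

# Example forward pass with dummy tensors
model = FusionARFModel(n_ehr_features=64)
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 64))
probs = torch.sigmoid(logits)                        # per-diagnosis probabilities for AUROC evaluation
```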
Subjects
Heart Failure, Chronic Obstructive Pulmonary Disease, Respiratory Insufficiency, Electronic Health Records, Heart Failure/diagnostic imaging, Humans, Machine Learning, Chronic Obstructive Pulmonary Disease/diagnosis, Respiratory Insufficiency/diagnostic imaging, X-Rays
ABSTRACT
Our goal in this paper is to investigate properties of 3D shape that can be determined from a single image. We define 3D shape attributes: generic properties of the shape that capture curvature, contact, and occupied space. Our first objective is to infer these 3D shape attributes from a single image. A second objective is to infer a 3D shape embedding, a low-dimensional vector representing the 3D shape. We study how the 3D shape attributes and embedding can be obtained from a single image by training a Convolutional Neural Network (CNN) for this task. We start with synthetic images so that the contribution of various cues and nuisance parameters can be controlled. We then turn to real images and introduce a large-scale image dataset of sculptures containing 143K images covering 2197 works from 242 artists. For the CNN trained on the sculpture dataset, we show the following: (i) which regions of the imaged sculpture are used by the CNN to infer the 3D shape attributes; (ii) that the shape embedding can be used to match previously unseen sculptures largely independent of viewpoint; and (iii) that the 3D attributes generalize to images of other (non-sculpture) object classes.
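A minimal sketch of the kind of network described, assuming a shared CNN backbone with two heads: one predicting binary 3D shape attributes and one producing a low-dimensional embedding used for viewpoint-tolerant matching. The backbone, attribute count, and embedding size are placeholders, not the authors' exact design.

```python
import torch
import torch.nn as nn
from torchvision import models

class ShapeAttributeNet(nn.Module):
    """Shared backbone with an attribute head and a shape-embedding head (illustrative sketch)."""

    def __init__(self, n_attributes: int = 12, embed_dim: int = 64):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()                      # 512-d global image feature
        self.backbone = backbone
        self.attr_head = nn.Linear(512, n_attributes)    # logits for shape attributes (curvature, contact, ...)
        self.embed_head = nn.Linear(512, embed_dim)      # low-dimensional 3D shape embedding

    def forward(self, image):
        f = self.backbone(image)
        attr_logits = self.attr_head(f)
        embedding = nn.functional.normalize(self.embed_head(f), dim=1)  # unit norm for cosine matching
        return attr_logits, embedding

# Matching a query image against a small gallery by embedding similarity
model = ShapeAttributeNet()
_, q = model(torch.randn(1, 3, 224, 224))
_, g = model(torch.randn(5, 3, 224, 224))
best_match = (q @ g.T).argmax(dim=1)                     # nearest gallery image by cosine similarity
```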
ABSTRACT
Measurements of the extreme ultraviolet (EUV) solar spectral irradiance (SSI) are essential for understanding drivers of space weather effects, such as radio blackouts, and aerodynamic drag on satellites during periods of enhanced solar activity. In this paper, we show how to learn a mapping from EUV narrowband images to spectral irradiance measurements using data from NASA's Solar Dynamics Observatory obtained between 2010 and 2014. We describe a protocol and baselines for measuring the performance of models. Our best-performing machine learning (ML) model, based on convolutional neural networks (CNNs), outperforms other ML models and a differential emission measure (DEM)-based approach, yielding average relative errors of under 4.6% (maximum error over emission lines) and more typically 1.6% (median). We also provide evidence that the proposed method solves this mapping in a way that makes physical sense, by paying attention to magnetic structures known to drive EUV SSI variability.
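A minimal sketch of the image-to-irradiance regression and the relative-error metrics quoted above, assuming a small CNN over a stack of EUV narrowband channels; the channel count, number of emission lines, and architecture are illustrative placeholders rather than the paper's model.

```python
import torch
import torch.nn as nn

class IrradianceCNN(nn.Module):
    """Regress a spectral irradiance vector from stacked EUV narrowband images (illustrative sketch)."""

    def __init__(self, n_channels: int = 9, n_lines: int = 39):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(n_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.regressor = nn.Linear(64, n_lines)          # one output per emission line

    def forward(self, x):
        return self.regressor(self.features(x))

def relative_error(pred, target, eps=1e-8):
    """Per-emission-line relative error |pred - target| / |target|."""
    return (pred - target).abs() / (target.abs() + eps)

# Dummy evaluation; channel and line counts are placeholders, not the instrument's actual values
model = IrradianceCNN()
x, y = torch.randn(4, 9, 256, 256), torch.rand(4, 39)
err = relative_error(model(x), y)
max_over_lines = err.max(dim=1).values.mean()            # analogous to "maximum error over emission lines"
median_err = err.median()                                # analogous to the "median" relative error
```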