ABSTRACT
Artificial intelligence (AI) models for medical imaging tasks, such as classification or segmentation, require large and diverse datasets of images. However, due to privacy and ethical issues, as well as data sharing infrastructure barriers, these datasets are scarce and difficult to assemble. Synthetic medical imaging data generated by AI from existing data could address this challenge by augmenting and anonymizing real imaging data. In addition, synthetic data enable new applications, including modality translation, contrast synthesis, and professional training for radiologists. However, the use of synthetic data also poses technical and ethical challenges. These challenges include ensuring the realism and diversity of the synthesized images while keeping data unidentifiable, evaluating the performance and generalizability of models trained on synthetic data, and high computational costs. Since existing regulations are not sufficient to guarantee the safe and ethical use of synthetic images, it becomes evident that updated laws and more rigorous oversight are needed. Regulatory bodies, physicians, and AI developers should collaborate to develop, maintain, and continually refine best practices for synthetic data. This review aims to provide an overview of the current knowledge of synthetic data in medical imaging and highlights current key challenges in the field to guide future research and development.
Subjects
Artificial Intelligence , Diagnostic Imaging , Humans , Diagnostic Imaging/methods
ABSTRACT
BACKGROUND: Adjustment for race is discouraged in lung-function testing, but the implications of adopting race-neutral equations have not been comprehensively quantified. METHODS: We obtained longitudinal data from 369,077 participants in the National Health and Nutrition Examination Survey, U.K. Biobank, the Multi-Ethnic Study of Atherosclerosis, and the Organ Procurement and Transplantation Network. Using these data, we compared the race-based 2012 Global Lung Function Initiative (GLI-2012) equations with race-neutral equations introduced in 2022 (GLI-Global). Evaluated outcomes included national projections of clinical, occupational, and financial reclassifications; individual lung-allocation scores for transplantation priority; and concordance statistics (C statistics) for clinical prediction tasks. RESULTS: Among the 249 million persons in the United States between 6 and 79 years of age who are able to produce high-quality spirometric results, the use of GLI-Global equations may reclassify ventilatory impairment for 12.5 million persons, medical impairment ratings for 8.16 million, occupational eligibility for 2.28 million, grading of chronic obstructive pulmonary disease for 2.05 million, and military disability compensation for 413,000. These potential changes differed according to race; for example, classifications of nonobstructive ventilatory impairment may change dramatically, increasing 141% (95% confidence interval [CI], 113 to 169) among Black persons and decreasing 69% (95% CI, 63 to 74) among White persons. Annual disability payments may increase by more than $1 billion among Black veterans and decrease by $0.5 billion among White veterans. 
GLI-2012 and GLI-Global equations had similar discriminative accuracy with regard to respiratory symptoms, health care utilization, new-onset disease, death from any cause, death related to respiratory disease, and death among persons on a transplant waiting list, with differences in C statistics ranging from -0.008 to 0.011. CONCLUSIONS: The use of race-based and race-neutral equations generated similarly accurate predictions of respiratory outcomes but assigned different disease classifications, occupational eligibility, and disability compensation for millions of persons, with effects diverging according to race. (Funded by the National Heart, Lung, and Blood Institute and the National Institute of Environmental Health Sciences.)
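As a rough illustration of the C statistic reported above, the following is a minimal sketch for the binary-outcome case only, not the study's analysis code. The function name `c_statistic` and the example scores are hypothetical. Time-to-event outcomes such as death would normally call for a censoring-aware variant, which this sketch does not cover.

```python
from itertools import product


def c_statistic(scores, outcomes):
    """Concordance (C) statistic for a binary outcome.

    Returns the fraction of (event, non-event) pairs in which the event
    case received the higher score; tied scores count as half-concordant.
    """
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0
        elif e == n:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


def c_difference(scores_a, scores_b, outcomes):
    """Difference in C statistic between two scoring rules on the same cases."""
    return c_statistic(scores_a, outcomes) - c_statistic(scores_b, outcomes)
```

Under these assumptions, a difference near zero from `c_difference` would correspond to the "similar discriminative accuracy" described in the abstract, even when the two scoring rules assign individual persons to different categories.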
Subjects
Respiratory Function Tests , Respiratory Insufficiency , Adolescent , Adult , Aged , Child , Female , Humans , Male , Middle Aged , Young Adult , Lung Diseases/diagnosis , Lung Diseases/economics , Lung Diseases/ethnology , Lung Diseases/therapy , Lung Transplantation/statistics & numerical data , Nutrition Surveys/statistics & numerical data , Pulmonary Disease, Chronic Obstructive/diagnosis , Pulmonary Disease, Chronic Obstructive/economics , Pulmonary Disease, Chronic Obstructive/ethnology , Pulmonary Disease, Chronic Obstructive/therapy , Racial Groups , Respiratory Function Tests/classification , Respiratory Function Tests/economics , Respiratory Function Tests/standards , Spirometry , United States/epidemiology , Respiratory Insufficiency/diagnosis , Respiratory Insufficiency/economics , Respiratory Insufficiency/ethnology , Respiratory Insufficiency/therapy , Black or African American/statistics & numerical data , White People/statistics & numerical data , Disability Evaluation , Veterans Disability Claims/classification , Veterans Disability Claims/economics , Veterans Disability Claims/statistics & numerical data , Persons with Disabilities/classification , Persons with Disabilities/statistics & numerical data , Occupational Diseases/diagnosis , Occupational Diseases/economics , Occupational Diseases/ethnology , Financing, Government/economics , Financing, Government/statistics & numerical data
ABSTRACT
This scoping review of randomised controlled trials on artificial intelligence (AI) in clinical practice reveals an expanding interest in AI across clinical specialties and locations. The USA and China are leading in the number of trials, with a focus on deep learning systems for medical imaging, particularly in gastroenterology and radiology. A majority of trials (70 [81%] of 86) report positive primary endpoints, primarily related to diagnostic yield or performance; however, the predominance of single-centre trials, little demographic reporting, and varying reports of operational efficiency raise concerns about the generalisability and practicality of these results. Despite these promising outcomes, it is crucial to consider the likelihood of publication bias and the need for more comprehensive research, including multicentre trials, diverse outcome measures, and improved reporting standards. Future AI trials should prioritise patient-relevant outcomes to fully understand AI's true effects and limitations in health care.
Subjects
Artificial Intelligence , Randomized Controlled Trials as Topic , Humans , Randomized Controlled Trials as Topic/methods , Deep Learning
ABSTRACT
The integration of artificial intelligence (AI) in medical image interpretation requires effective collaboration between clinicians and AI algorithms. Although previous studies demonstrated the potential of AI assistance in improving overall clinician performance, the individual impact on clinicians remains unclear. This large-scale study examined the heterogeneous effects of AI assistance on 140 radiologists across 15 chest X-ray diagnostic tasks and identified predictors of these effects. Surprisingly, conventional experience-based factors, such as years of experience, subspecialty and familiarity with AI tools, fail to reliably predict the impact of AI assistance. Additionally, lower-performing radiologists do not consistently benefit more from AI assistance, challenging prevailing assumptions. Instead, we found that the occurrence of AI errors strongly influences treatment outcomes, with inaccurate AI predictions adversely affecting radiologist performance on the aggregate of all pathologies and on half of the individual pathologies investigated. Our findings highlight the importance of personalized approaches to clinician-AI collaboration and the importance of accurate AI models. By understanding the factors that shape the effectiveness of AI assistance, this study provides valuable insights for targeted implementation of AI, enabling maximum benefits for individual clinicians in clinical practice.
Subjects
Algorithms , Artificial Intelligence , Humans , Radiologists
ABSTRACT
The complex relationships between continuously monitored health signals and therapeutic regimens can be modelled via machine learning. However, the clinical implementation of the models will require changes to clinical workflows. Here we outline ClinAIOps ('clinical artificial-intelligence operations'), a framework that integrates continuous therapeutic monitoring and the development of artificial intelligence (AI) for clinical care. ClinAIOps leverages three feedback loops to enable the patient to make treatment adjustments using AI outputs, the clinician to oversee patient progress with AI assistance, and the AI developer to receive continuous feedback from both the patient and the clinician. We lay out the central challenges and opportunities in the deployment of ClinAIOps by means of examples of its application in the management of blood pressure, diabetes and Parkinson's disease. By enabling more frequent and accurate measurements of a patient's health and more timely adjustments to their treatment, ClinAIOps may substantially improve patient outcomes.
ABSTRACT
Autonomous AI systems in medicine promise improved outcomes but raise concerns about liability, regulation, and costs. With the advent of large-language models, which can understand and generate medical text, the urgency for addressing these concerns increases as they create opportunities for more sophisticated autonomous AI systems. This perspective explores the liability implications for physicians, hospitals, and creators of AI technology, as well as the evolving regulatory landscape and payment models. Physicians may be favored in malpractice cases if they follow rigorously validated AI recommendations. However, AI developers may face liability for failing to adhere to industry-standard best practices during development and implementation. The evolving regulatory landscape, led by the FDA, seeks to ensure transparency, evaluation, and real-world monitoring of AI systems, while payment models such as MPFS, NTAP, and commercial payers adapt to accommodate them. The widespread adoption of autonomous AI systems can potentially streamline workflows and allow doctors to concentrate on the human aspects of healthcare.
ABSTRACT
Clinical decision support tools can improve diagnostic performance or reduce variability, but they are also subject to post-deployment underperformance. Although using AI in an assistive setting offsets many concerns with autonomous AI in medicine, systems that present all predictions equivalently fail to protect against key AI safety concerns. We design a decision pipeline that supports the diagnostic model with an ecosystem of models, integrating disagreement prediction, clinical significance categorization, and prediction quality modeling to guide prediction presentation. We characterize disagreement using data from a deployed chest X-ray interpretation aid and compare clinician burden in this proposed pipeline to the diagnostic model in isolation. The average disagreement rate is 6.5%, and the expected burden reduction is 4.8%, even if 5% of disagreements on urgent findings receive a second read. We conclude that, in our production setting, we can adequately balance risk mitigation with clinician burden if disagreement false positives are reduced.
Subjects
Artificial Intelligence , Radiologists , Humans , Clinical Relevance , Medicine , Retrospective Studies
ABSTRACT
Artificial intelligence (AI) models for automatic generation of narrative radiology reports from images have the potential to enhance efficiency and reduce the workload of radiologists. However, evaluating the correctness of these reports requires metrics that can capture clinically pertinent differences. In this study, we investigate the alignment between automated metrics and radiologists' scoring of errors in report generation. We address the limitations of existing metrics by proposing new metrics, RadGraph F1 and RadCliQ, which demonstrate stronger correlation with radiologists' evaluations. In addition, we analyze the failure modes of the metrics to understand their limitations and provide guidance for metric selection and interpretation. This study establishes RadGraph F1 and RadCliQ as meaningful metrics for guiding future research in radiology report generation.
ABSTRACT
The exceptionally rapid development of highly flexible, reusable artificial intelligence (AI) models is likely to usher in newfound capabilities in medicine. We propose a new paradigm for medical AI, which we refer to as generalist medical AI (GMAI). GMAI models will be capable of carrying out a diverse set of tasks using very little or no task-specific labelled data. Built through self-supervision on large, diverse datasets, GMAI will flexibly interpret different combinations of medical modalities, including data from imaging, electronic health records, laboratory results, genomics, graphs or medical text. Models will in turn produce expressive outputs such as free-text explanations, spoken recommendations or image annotations that demonstrate advanced medical reasoning abilities. Here we identify a set of high-impact potential applications for GMAI and lay out specific technical capabilities and training datasets necessary to enable them. We expect that GMAI-enabled applications will challenge current strategies for regulating and validating AI devices for medicine and will shift practices associated with the collection of large medical datasets.
Subjects
Artificial Intelligence , Medicine , Diagnostic Imaging , Electronic Health Records , Genomics , Datasets as Topic , Unsupervised Machine Learning , Humans
ABSTRACT
Pancreatic ductal adenocarcinoma (PDAC) has been left behind in the evolution of personalized medicine. Predictive markers of response to therapy are lacking in PDAC despite various histological and transcriptional classification schemes. We report an artificial intelligence (AI) approach to histologic feature examination that extracts a signature predictive of disease-specific survival (DSS) in patients with PDAC receiving adjuvant gemcitabine. We demonstrate that this AI-generated histologic signature is associated with outcomes following adjuvant gemcitabine, while three previously developed transcriptomic classification systems are not (n = 47). We externally validate this signature in an independent cohort of patients treated with adjuvant gemcitabine (n = 46). Finally, we demonstrate that the signature does not stratify survival outcomes in a third cohort of untreated patients (n = 161), suggesting that the signature is specifically predictive of treatment-related outcomes but is not generally prognostic. This imaging analysis pipeline has promise in the development of actionable markers in other clinical settings where few biomarkers currently exist.
Subjects
Carcinoma, Pancreatic Ductal , Pancreatic Neoplasms , Humans , Gemcitabine , Artificial Intelligence , Deoxycytidine/therapeutic use , Pancreatic Neoplasms/drug therapy , Pancreatic Neoplasms/pathology , Carcinoma, Pancreatic Ductal/drug therapy , Carcinoma, Pancreatic Ductal/genetics , Treatment Outcome , Biomarkers
ABSTRACT
Anticipation of clinical decompensation is essential for effective emergency and critical care. In this study, we develop a multimodal machine learning approach to predict the onset of new vital sign abnormalities (tachycardia, hypotension, hypoxia) in ED patients with normal initial vital signs. Our method combines standard triage data (vital signs, demographics, chief complaint) with features derived from a brief period of continuous physiologic monitoring, extracted via both conventional signal processing and transformer-based deep learning on ECG and PPG waveforms. We study 19,847 adult ED visits, divided into training (75%), validation (12.5%), and a chronologically sequential held-out test set (12.5%). The best-performing models use a combination of engineered and transformer-derived features, predicting in a 90-minute window new tachycardia with AUROC of 0.836 (95% CI, 0.800-0.870), new hypotension with AUROC 0.802 (95% CI, 0.747-0.856), and new hypoxia with AUROC 0.713 (95% CI, 0.680-0.745), in all cases significantly outperforming models using only standard triage data. Salient features include vital sign trends, PPG perfusion index, and ECG waveforms. This approach could improve the triage of apparently stable patients and be applied continuously for the prediction of near-term clinical deterioration.
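The data split described above can be sketched roughly as follows. This is an illustrative sketch, not the authors' code: the record field `arrival` and the function name are hypothetical, and for simplicity all three partitions are cut in time order, whereas the abstract only states that the held-out test set is chronologically sequential.

```python
def chronological_split(visits, train_frac=0.75, val_frac=0.125):
    """Split visit records into train / validation / test by arrival time.

    The latest visits form the held-out test set, so the model is always
    evaluated on data recorded after everything it was trained on.
    """
    # Sort oldest first; "arrival" is an assumed, hypothetical field name.
    ordered = sorted(visits, key=lambda v: v["arrival"])
    n = len(ordered)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)

    train = ordered[:n_train]
    val = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:]  # remainder, roughly 12.5% by default
    return train, val, test
```

Holding out the most recent visits, rather than a random sample, gives a more conservative estimate of how a model might behave when applied to future patients.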
ABSTRACT
In tasks involving the interpretation of medical images, suitably trained machine-learning models often exceed the performance of medical experts. Yet such a high level of performance typically requires that the models be trained with relevant datasets that have been painstakingly annotated by experts. Here we show that a self-supervised model trained on chest X-ray images that lack explicit annotations performs pathology-classification tasks with accuracies comparable to those of radiologists. On an external validation dataset of chest X-rays, the self-supervised model outperformed a fully supervised model in the detection of three pathologies (out of eight), and the performance generalized to pathologies that were not explicitly annotated for model training, to multiple image-interpretation tasks and to datasets from multiple institutions.
Subjects
Machine Learning , Supervised Machine Learning , X-Rays
ABSTRACT
The increasing availability of biomedical data from large biobanks, electronic health records, medical imaging, wearable and ambient biosensors, and the lower cost of genome and microbiome sequencing have set the stage for the development of multimodal artificial intelligence solutions that capture the complexity of human health and disease. In this Review, we outline the key applications enabled, along with the technical and analytical challenges. We explore opportunities in personalized medicine, digital clinical trials, remote monitoring and care, pandemic surveillance, digital twin technology and virtual health assistants. Further, we survey the data, modeling and privacy challenges that must be overcome to realize the full potential of multimodal artificial intelligence in health.
Subjects
Artificial Intelligence , Pandemics , Electronic Health Records , Humans , Privacy
ABSTRACT
OBJECTIVE: Chest pain is common, and current risk-stratification methods, requiring 12-lead electrocardiograms (ECGs) and serial biomarker assays, are static and restricted to highly resourced settings. Our objective was to predict myocardial injury using continuous single-lead ECG waveforms similar to those obtained from wearable devices and to evaluate the potential of transfer learning from labeled 12-lead ECGs to improve these predictions. METHODS: We studied 10,874 Emergency Department (ED) patients who received continuous ECG monitoring and troponin testing from 2020 to 2021. We defined myocardial injury as newly elevated troponin in patients with chest pain or shortness of breath. We developed deep learning models of myocardial injury using continuous lead II ECG from bedside monitors as well as conventional 12-lead ECGs from triage. We pretrained single-lead models on a pre-existing corpus of labeled 12-lead ECGs. We compared model predictions to those of ED physicians. RESULTS: A transfer learning strategy, whereby models for continuous single-lead ECGs were first pretrained on 12-lead ECGs from a separate cohort, predicted myocardial injury as accurately as models using patients' own 12-lead ECGs: area under the receiver operating characteristic curve 0.760 (95% confidence interval [CI], 0.721-0.799) and area under the precision-recall curve 0.321 (95% CI, 0.251-0.397). Models demonstrated a high negative predictive value for myocardial injury among patients with chest pain or shortness of breath, exceeding the predictive performance of ED physicians, while attending to known stigmata of myocardial injury. CONCLUSIONS: Deep learning models pretrained on labeled 12-lead ECGs can predict myocardial injury from noisy, continuous monitor data early in a patient's presentation.
The utility of continuous single-lead ECG in the risk stratification of chest pain has implications for wearable devices and preclinical settings, where external validation of the approach is needed.
Subjects
Chest Pain , Electrocardiography , Biomarkers , Chest Pain/diagnosis , Chest Pain/etiology , Dyspnea/diagnosis , Dyspnea/etiology , Electrocardiography/methods , Emergency Service, Hospital , Humans , Machine Learning , Troponin
ABSTRACT
The development of medical applications of machine learning has required manual annotation of data, often by medical experts. Yet, the availability of large-scale unannotated data provides opportunities for the development of better machine-learning models. In this Review, we highlight self-supervised methods and models for use in medicine and healthcare, and discuss the advantages and limitations of their application to tasks involving electronic health records and datasets of medical images, bioelectrical signals, and sequences and structures of genes and proteins. We also discuss promising applications of self-supervised learning for the development of models leveraging multimodal datasets, and the challenges in collecting unbiased data for their training. Self-supervised learning may accelerate the development of medical artificial intelligence.
Subjects
Artificial Intelligence , Medicine , Machine Learning , Supervised Machine Learning , Delivery of Health Care
ABSTRACT
The use of artificial intelligence (AI) has grown dramatically in the past few years in the United States and worldwide, with more than 300 AI-enabled devices approved by the U.S. Food and Drug Administration (FDA). Most of these AI-enabled applications focus on helping radiologists with detection, triage, and prioritization of tasks by using data from a single point, but clinical practice often encompasses a dynamic scenario wherein physicians make decisions on the basis of longitudinal information. Unfortunately, benchmark data sets incorporating clinical and radiologic data from several points are scarce, and, therefore, the machine learning community has not focused on developing methods and architectures suitable for these tasks. Current AI algorithms are not suited to tackle key image interpretation tasks that require comparisons to previous examinations. Focusing on the curation of data sets and algorithm development that allow for comparisons at different points will be required to advance the range of relevant tasks covered by future AI-enabled FDA-cleared devices.
Subjects
Artificial Intelligence , Radiology , Algorithms , Humans , Machine Learning , Radiologists
ABSTRACT
Artificial intelligence (AI) is poised to broadly reshape medicine, potentially improving the experiences of both clinicians and patients. We discuss key findings from a 2-year weekly effort to track and share key developments in medical AI. We cover prospective studies and advances in medical image analysis, which have reduced the gap between research and deployment. We also address several promising avenues for novel medical AI research, including non-image data sources, unconventional problem formulations and human-AI collaboration. Finally, we consider serious technical and ethical challenges in issues spanning from data scarcity to racial bias. As these challenges are addressed, AI's potential may be realized, making healthcare more accurate, efficient and accessible for patients worldwide.
Subjects
Artificial Intelligence , Delivery of Health Care , Medicine , Algorithms , Humans , Prospective Studies
ABSTRACT
Data labeling is often the limiting step in machine learning because it requires time from trained experts. To address the limitation on labeled data, contrastive learning, among other unsupervised learning methods, leverages unlabeled data to learn representations of data. Here, we propose a contrastive learning framework that utilizes metadata for selecting positive and negative pairs when training on unlabeled data. We demonstrate its application in the healthcare domain on heart and lung sound recordings. The increasing availability of heart and lung sound recordings due to adoption of digital stethoscopes lends itself as an opportunity to demonstrate the application of our contrastive learning method. Compared to contrastive learning with augmentations, the contrastive learning model leveraging metadata for pair selection utilizes clinical information associated with lung and heart sound recordings. This approach uses shared context of the recordings on the patient level using clinical information including age, sex, weight, location of sounds, etc. We show improvement in downstream tasks for diagnosing heart and lung sounds when leveraging patient-specific representations in selecting positive and negative pairs. This study paves the path for medical applications of contrastive learning that leverage clinical information. We have made our code available here: https://github.com/stanfordmlgroup/selfsupervised-lungandheartsounds.
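The idea of choosing contrastive pairs from metadata, rather than from augmentations, might be sketched as below. This is a simplified illustration and not the implementation in the linked repository: the function name `select_pairs` and the record fields `id`, `sex`, and `location` are hypothetical, and a real pipeline would feed the resulting pairs into a contrastive loss.

```python
from itertools import combinations


def select_pairs(recordings, key):
    """Build positive and negative pairs from a shared metadata field.

    Two recordings form a positive pair when they agree on `key`
    (for example the same recording location) and a negative pair otherwise.
    """
    positives, negatives = [], []
    for a, b in combinations(recordings, 2):
        pair = (a["id"], b["id"])
        if a[key] == b[key]:
            positives.append(pair)
        else:
            negatives.append(pair)
    return positives, negatives
```

Changing `key` (age group, sex, sound location, and so on) changes which recordings the encoder is encouraged to map close together, which is the design choice the abstract describes evaluating on downstream tasks.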