ABSTRACT
Importance: Myopic maculopathy (MM) is a major cause of vision impairment globally. Artificial intelligence (AI) and deep learning (DL) algorithms for detecting MM from fundus images could potentially improve diagnosis and assist screening in a variety of health care settings. Objectives: To evaluate DL algorithms for MM classification and segmentation and compare their performance with that of ophthalmologists. Design, Setting, and Participants: The Myopic Maculopathy Analysis Challenge (MMAC) was an international competition to develop automated solutions for 3 tasks: (1) MM classification, (2) segmentation of MM plus lesions, and (3) spherical equivalent (SE) prediction. Participants were provided 3 subdatasets containing 2306, 294, and 2003 fundus images, respectively, with which to build algorithms. A group of 5 ophthalmologists evaluated the same test sets for tasks 1 and 2 to ascertain performance. Results from model ensembles, which combined outcomes from multiple algorithms submitted by MMAC participants, were compared with each individually submitted algorithm. This study was conducted from March 1, 2023, to March 30, 2024, and data were analyzed from January 15, 2024, to March 30, 2024. Exposure: DL algorithms submitted as part of the MMAC competition or ophthalmologist interpretation. Main Outcomes and Measures: MM classification was evaluated by quadratic-weighted κ (QWK), F1 score, sensitivity, and specificity. MM plus lesions segmentation was evaluated by Dice similarity coefficient (DSC), and SE prediction was evaluated by R2 and mean absolute error (MAE). Results: The 3 tasks were completed by 7, 4, and 4 teams, respectively. MM classification algorithms achieved a QWK range of 0.866 to 0.901, an F1 score range of 0.675 to 0.781, a sensitivity range of 0.667 to 0.778, and a specificity range of 0.931 to 0.945. MM plus lesions segmentation algorithms achieved a DSC range of 0.664 to 0.687 for lacquer cracks (LC), 0.579 to 0.673 for choroidal neovascularization, and 0.768 to 0.841 for Fuchs spot (FS). SE prediction algorithms achieved an R2 range of 0.791 to 0.874 and an MAE range of 0.708 to 0.943. The model ensembles achieved better performance than each individually submitted algorithm, and the model ensemble outperformed ophthalmologists at MM classification in sensitivity (0.801; 95% CI, 0.764-0.840 vs 0.727; 95% CI, 0.684-0.768; P = .006) and specificity (0.946; 95% CI, 0.939-0.954 vs 0.933; 95% CI, 0.925-0.941; P = .009), LC segmentation (DSC, 0.698; 95% CI, 0.649-0.745 vs DSC, 0.570; 95% CI, 0.515-0.625; P < .001), and FS segmentation (DSC, 0.863; 95% CI, 0.831-0.888 vs DSC, 0.790; 95% CI, 0.742-0.830; P < .001). Conclusions and Relevance: In this diagnostic study, 15 AI models for MM classification and segmentation on a public dataset made available for the MMAC competition were validated and evaluated, with some models achieving better diagnostic performance than ophthalmologists.
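For orientation, the sketch below (Python; illustrative only, not the MMAC organisers' evaluation code — the function names, toy grades, and toy masks are assumptions) shows how the two headline metrics are conventionally computed: quadratic-weighted κ for ordinal MM grades and the Dice similarity coefficient for binary lesion masks.

```python
# Illustrative sketch of the MMAC headline metrics; not the organisers' official code.
import numpy as np
from sklearn.metrics import cohen_kappa_score, f1_score

def quadratic_weighted_kappa(y_true, y_pred):
    """Quadratic-weighted kappa for ordinal MM grades (e.g., 0-4)."""
    return cohen_kappa_score(y_true, y_pred, weights="quadratic")

def dice_coefficient(mask_pred, mask_true, eps=1e-7):
    """Dice similarity coefficient between two binary lesion masks."""
    mask_pred, mask_true = mask_pred.astype(bool), mask_true.astype(bool)
    intersection = np.logical_and(mask_pred, mask_true).sum()
    return (2.0 * intersection + eps) / (mask_pred.sum() + mask_true.sum() + eps)

# Toy example: five fundus images graded 0-4 by the reference and by a model.
y_true = np.array([0, 1, 2, 3, 4])
y_pred = np.array([0, 1, 2, 4, 4])
print("QWK:", quadratic_weighted_kappa(y_true, y_pred))
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))

# Toy example: 8x8 binary masks for a lacquer-crack lesion.
pred = np.zeros((8, 8), dtype=int); pred[2:5, 2:5] = 1
true = np.zeros((8, 8), dtype=int); true[3:6, 3:6] = 1
print("DSC:", dice_coefficient(pred, true))
```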
ABSTRACT
The widespread use of Chat Generative Pre-trained Transformer (known as ChatGPT) and other emerging technology that is powered by generative artificial intelligence (GenAI) has drawn attention to the potential ethical issues they can cause, especially in high-stakes applications such as health care, but ethical discussions have not yet been translated into operationalisable solutions. Furthermore, ongoing ethical discussions often neglect other types of GenAI that have been used to synthesise data (eg, images) for research and practical purposes, which resolve some ethical issues and expose others. We did a scoping review of the ethical discussions on GenAI in health care to comprehensively analyse gaps in the research. To reduce the gaps, we have developed a checklist for comprehensive assessment and evaluation of ethical discussions in GenAI research. The checklist can be integrated into peer review and publication systems to enhance GenAI research and might be useful for ethics-related disclosures for GenAI-powered products and health-care applications of such products and beyond.
Subjects
Artificial Intelligence, Checklist, Delivery of Health Care, Artificial Intelligence/ethics, Humans, Delivery of Health Care/ethics
ABSTRACT
The emergence of generative artificial intelligence (AI) has revolutionized various fields. In ophthalmology, generative AI has the potential to enhance efficiency, accuracy, personalization, and innovation in clinical practice and medical research, through processing data, streamlining medical documentation, facilitating patient-doctor communication, aiding in clinical decision-making, and simulating clinical trials. This review focuses on the development and integration of generative AI models into clinical workflows and scientific research in ophthalmology. It outlines the need for a standard framework for comprehensive assessment, robust evidence, and exploration of the potential of multimodal capabilities and intelligent agents. Additionally, the review addresses the risks of AI model development and application in ophthalmic clinical service and research, including data privacy, data bias, adaptation friction, over-dependence, and job replacement, and summarizes a risk management framework to mitigate these concerns. This review highlights the transformative potential of generative AI in enhancing patient care and improving operational efficiency in ophthalmic clinical service and research, and it advocates for a balanced approach to its adoption.
Subjects
Artificial Intelligence, Ophthalmology, Artificial Intelligence/trends, Humans, Ophthalmology/trends, Ophthalmology/methods
ABSTRACT
Generative artificial intelligence (GenAI) refers to algorithms capable of generating original content. The ability of GenAI to learn and generate novel outputs in a manner akin to human cognition has taken the world by storm and ushered in a new era. In this review, we explore the role of GenAI in healthcare, including clinical, operational, and research applications, and delve into the cybersecurity risks of this technology. We discuss risks such as data privacy breaches, data poisoning attacks, the propagation of bias, and hallucinations. We recommend risk mitigation strategies to enhance cybersecurity in GenAI technologies and further explore the use of GenAI itself as a tool to enhance cybersecurity across AI algorithms. GenAI is emerging as a pivotal catalyst across various industries, including the healthcare domain. Comprehending the intricacies of this technology and its potential risks will be imperative if we are to fully capitalise on the benefits that GenAI can bring.
Subjects
Artificial Intelligence, Computer Security, Humans, Algorithms, Delivery of Health Care
ABSTRACT
BACKGROUND: Discharge letters are a critical component in the continuity of care between specialists and primary care providers. However, these letters are time-consuming to write, underprioritized in comparison to direct clinical care, and are often tasked to junior doctors. Prior studies assessing the quality of discharge summaries written for inpatient hospital admissions show inadequacies in many domains. Large language models such as GPT have the ability to summarize large volumes of unstructured free text such as electronic medical records and have the potential to automate such tasks, providing time savings and consistency in quality. OBJECTIVE: The aim of this study was to assess the performance of GPT-4 in generating discharge letters written from urology specialist outpatient clinics to primary care providers and to compare their quality against letters written by junior clinicians. METHODS: Fictional electronic records were written by physicians simulating 5 common urology outpatient cases with long-term follow-up. Records comprised simulated consultation notes, referral letters and replies, and relevant discharge summaries from inpatient admissions. GPT-4 was tasked to write discharge letters for these cases with a specified target audience of primary care providers who would be continuing the patient's care. Prompts were written for safety, content, and style. Concurrently, junior clinicians were provided with the same case records and instructional prompts. GPT-4 output was assessed for instances of hallucination. A blinded panel of primary care physicians then evaluated the letters using a standardized questionnaire tool. RESULTS: GPT-4 outperformed human counterparts in information provision (mean 4.32, SD 0.95 vs 3.70, SD 1.27; P=.03) and had no instances of hallucination. There were no statistically significant differences in the mean clarity (4.16, SD 0.95 vs 3.68, SD 1.24; P=.12), collegiality (4.36, SD 1.00 vs 3.84, SD 1.22; P=.05), conciseness (3.60, SD 1.12 vs 3.64, SD 1.27; P=.71), follow-up recommendations (4.16, SD 1.03 vs 3.72, SD 1.13; P=.08), and overall satisfaction (3.96, SD 1.14 vs 3.62, SD 1.34; P=.36) between the letters generated by GPT-4 and humans, respectively. CONCLUSIONS: Discharge letters written by GPT-4 had equivalent quality to those written by junior clinicians, without any hallucinations. This study provides a proof of concept that large language models can be useful and safe tools in clinical documentation.
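As a rough illustration of the workflow described above, the snippet below shows how a discharge-letter prompt might be issued with the OpenAI Python client; the system prompt, model name, and low-temperature setting are illustrative assumptions rather than the study's actual configuration.

```python
# Illustrative sketch only: the study's actual prompts and model settings are not reproduced here.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "You are a urology specialist writing a discharge letter to the patient's "
    "primary care provider. Use only information present in the records, state "
    "follow-up and safety-netting advice clearly, and keep a collegial, concise tone."
)

def draft_discharge_letter(case_records: str, model: str = "gpt-4") -> str:
    """Ask the model to draft a discharge letter from free-text clinic records."""
    response = client.chat.completions.create(
        model=model,
        temperature=0.2,  # low temperature to limit embellishment/hallucination
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Clinic records:\n{case_records}\n\nWrite the discharge letter."},
        ],
    )
    return response.choices[0].message.content

# letter = draft_discharge_letter(open("case_1_records.txt").read())
```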
Subjects
Patient Discharge, Humans, Patient Discharge/standards, Electronic Health Records/standards, Single-Blind Method, Language
ABSTRACT
Introduction: Automated machine learning (autoML) removes technical and technological barriers to building artificial intelligence models. We aimed to summarise the clinical applications of autoML, assess the capabilities of utilised platforms, evaluate the quality of the evidence trialling autoML, and gauge the performance of autoML platforms relative to conventionally developed models, as well as each other. Methods: This review adhered to a prospectively registered protocol (PROSPERO identifier CRD42022344427). The Cochrane Library, Embase, MEDLINE and Scopus were searched from inception to 11 July 2022. Two researchers screened abstracts and full texts, extracted data and conducted quality assessment. Disagreement was resolved through discussion and, if required, arbitration by a third researcher. Results: There were 26 distinct autoML platforms featured in 82 studies. Brain and lung disease were the most common fields of study among the 22 specialties represented. AutoML exhibited variable performance: area under the receiver operating characteristic curve (AUCROC) 0.35-1.00, F1-score 0.16-0.99, area under the precision-recall curve (AUPRC) 0.51-1.00. AutoML exhibited the highest AUCROC in 75.6% of trials, the highest F1-score in 42.3% of trials, and the highest AUPRC in 83.3% of trials. In autoML platform comparisons, AutoPrognosis and Amazon Rekognition performed strongest with unstructured and structured data, respectively. Quality of reporting was poor, with a median DECIDE-AI score of 14 of 27. Conclusion: A myriad of autoML platforms have been applied in a variety of clinical contexts. The performance of autoML compares well to bespoke computational and clinical benchmarks. Further work is required to improve the quality of validation studies. AutoML may facilitate a transition to data-centric development, and integration with large language models may enable AI to build itself to fulfil user-defined goals.
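For readers unfamiliar with the three performance metrics summarised in this review, the short sketch below (illustrative toy data, not drawn from any of the included studies) shows how AUROC, AUPRC, and F1-score are computed with scikit-learn.

```python
# Illustrative computation of the metrics summarised in the review; the labels
# and probabilities below are toy values, not study data.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score, f1_score

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])                     # ground-truth labels
y_prob = np.array([0.1, 0.4, 0.8, 0.65, 0.9, 0.3, 0.55, 0.2])   # model probabilities

auroc = roc_auc_score(y_true, y_prob)                 # area under the ROC curve
auprc = average_precision_score(y_true, y_prob)       # area under the precision-recall curve
f1 = f1_score(y_true, (y_prob >= 0.5).astype(int))    # F1 at a 0.5 probability threshold

print(f"AUROC={auroc:.3f}, AUPRC={auprc:.3f}, F1={f1:.3f}")
```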
Subjects
Machine Learning, Humans, Lung Diseases/diagnosis, ROC Curve, Brain Diseases/diagnosis, Area Under Curve
ABSTRACT
BACKGROUND: Diabetic retinopathy (DR) and diabetic macular edema (DME) are major causes of visual impairment that challenge global vision health. New strategies are needed to tackle these growing global health problems, and the integration of artificial intelligence (AI) into ophthalmology has the potential to revolutionize DR and DME management to meet these challenges. MAIN TEXT: This review discusses the latest AI-driven methodologies in the context of DR and DME in terms of disease identification, patient-specific disease profiling, and short-term and long-term management. This includes current screening and diagnostic systems and their real-world implementation, lesion detection and analysis, disease progression prediction, and treatment response models. It also highlights the technical advancements that have been made in these areas. Despite these advancements, there are obstacles to the widespread adoption of these technologies in clinical settings, including regulatory and privacy concerns, the need for extensive validation, and integration with existing healthcare systems. We also explore the disparity between the potential of AI models and their actual effectiveness in real-world applications. CONCLUSION: AI has the potential to revolutionize the management of DR and DME, offering more efficient and precise tools for healthcare professionals. However, overcoming challenges in deployment, regulatory compliance, and patient privacy is essential for these technologies to realize their full potential. Future research should aim to bridge the gap between technological innovation and clinical application, ensuring AI tools integrate seamlessly into healthcare workflows to enhance patient outcomes.
ABSTRACT
Spectral-domain optical coherence tomography (SDOCT) is the gold standard for imaging the eye in clinics. Penetration depth with such devices is, however, limited, and visualization of the choroid, which is essential for diagnosing chorioretinal disease, remains poor. Whereas swept-source OCT (SSOCT) devices allow for visualization of the choroid, these instruments are expensive and their availability in practice is limited. We present an artificial intelligence (AI)-based solution to enhance the visualization of the choroid in OCT scans and allow for quantitative measurements of choroidal metrics using generative deep learning (DL). Synthetically enhanced SDOCT B-scans with improved choroidal visibility were generated, leveraging matching images to learn deep anatomical features during training. Using a single-center tertiary eye care institution cohort comprising a total of 362 SDOCT-SSOCT paired subjects, we trained our model with 150,784 images from 410 healthy, 192 glaucoma, and 133 diabetic retinopathy eyes. An independent external test dataset of 37,376 images from 146 eyes was deployed to assess the authenticity and quality of the synthetically enhanced SDOCT images. Experts' ability to differentiate real versus synthetic images was poor (47.5% accuracy). Measurements of choroidal thickness, area, volume, and vascularity index, from the reference SSOCT and synthetically enhanced SDOCT, showed high Pearson's correlations of 0.97 [95% CI: 0.96-0.98], 0.97 [0.95-0.98], 0.95 [0.92-0.98], and 0.87 [0.83-0.91], with intra-class correlation values of 0.99 [0.98-0.99], 0.98 [0.98-0.99], 0.95 [0.96-0.98], and 0.93 [0.91-0.95], respectively. Thus, our DL generative model successfully generated realistic enhanced SDOCT data that are indistinguishable from SSOCT images, providing improved visualization of the choroid. This technology enabled accurate measurements of choroidal metrics previously limited by the imaging depth constraints of SDOCT. The findings open new possibilities for utilizing affordable SDOCT devices in studying the choroid in both healthy and pathological conditions.
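A minimal sketch of the kind of agreement analysis described above — correlating paired choroidal-thickness measurements from reference SSOCT and synthetically enhanced SDOCT — is given below; the simulated data, variable names, and added Bland-Altman-style limits of agreement are illustrative assumptions, not the study's results.

```python
# Illustrative agreement analysis between paired choroidal-thickness measurements
# (reference SSOCT vs synthetically enhanced SDOCT); the data here are simulated.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
ssoct_thickness = rng.normal(280, 60, size=146)                  # reference measurements (um)
enhanced_sdoct = ssoct_thickness + rng.normal(0, 15, size=146)   # enhanced-SDOCT measurements

r, p = pearsonr(ssoct_thickness, enhanced_sdoct)
diff = enhanced_sdoct - ssoct_thickness
bias = np.mean(diff)                         # Bland-Altman-style mean difference
loa = 1.96 * np.std(diff, ddof=1)            # 95% limits-of-agreement half-width

print(f"Pearson r = {r:.3f} (p = {p:.1e}); bias = {bias:.1f} um; LoA = +/-{loa:.1f} um")
```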
ABSTRACT
Utilization of digital technologies for cataract screening in primary care is a potential solution for addressing the dilemma between the growing aging population and unequally distributed resources. Here, we propose a digital technology-driven hierarchical screening (DH screening) pattern implemented in China to promote the equity and accessibility of healthcare. It consists of home-based mobile artificial intelligence (AI) screening, community-based AI diagnosis, and referral to hospitals. We utilize decision-analytic Markov models to evaluate the cost-effectiveness and cost-utility of different cataract screening strategies (no screening, telescreening, AI screening and DH screening). A simulated cohort of 100,000 individuals from age 50 is built through a total of 30 1-year Markov cycles. The primary outcomes are incremental cost-effectiveness ratio and incremental cost-utility ratio. The results show that DH screening dominates no screening, telescreening and AI screening in urban and rural China. Annual DH screening emerges as the most economically effective strategy with 341 (338 to 344) and 1326 (1312 to 1340) years of blindness avoided compared with telescreening, and 37 (35 to 39) and 140 (131 to 148) years compared with AI screening in urban and rural settings, respectively. The findings remain robust across all sensitivity analyses conducted. Here, we report that DH screening is cost-effective in urban and rural China, and the annual screening proves to be the most cost-effective option, providing an economic rationale for policymakers promoting public eye health in low- and middle-income countries.
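The toy model below illustrates the general mechanics of a decision-analytic Markov cohort evaluation of the kind described above — propagating a cohort through annual cycles and deriving an incremental cost-effectiveness ratio; every transition probability, cost, and utility in it is a placeholder, not an input from this study.

```python
# Toy two-strategy Markov cohort model illustrating how an incremental
# cost-effectiveness ratio (ICER) is derived; all probabilities, costs and
# utilities below are illustrative placeholders, not the study's inputs.
import numpy as np

STATES = ["healthy", "cataract", "blind", "dead"]

def run_cohort(transition, cost_per_state, utility_per_state, cycles=30, n=100_000):
    """Propagate a cohort through annual cycles; return total cost and QALYs."""
    dist = np.array([n, 0, 0, 0], dtype=float)   # cohort starts disease-free at age 50
    total_cost = total_qaly = 0.0
    for _ in range(cycles):
        dist = dist @ transition                 # one 1-year Markov cycle
        total_cost += float(dist @ cost_per_state)
        total_qaly += float(dist @ utility_per_state)
    return total_cost, total_qaly

# Placeholder transition matrices: screening detects and treats cataract earlier
# (faster return to "healthy") at a small additional yearly cost.
no_screen = np.array([[0.90, 0.08, 0.01, 0.01],
                      [0.05, 0.85, 0.08, 0.02],
                      [0.00, 0.00, 0.97, 0.03],
                      [0.00, 0.00, 0.00, 1.00]])
screening = np.array([[0.90, 0.08, 0.01, 0.01],
                      [0.30, 0.62, 0.06, 0.02],
                      [0.00, 0.00, 0.97, 0.03],
                      [0.00, 0.00, 0.00, 1.00]])

costs_ns, costs_s = np.array([0, 50, 200, 0]), np.array([10, 60, 200, 0])
utils = np.array([1.0, 0.7, 0.3, 0.0])

c0, q0 = run_cohort(no_screen, costs_ns, utils)
c1, q1 = run_cohort(screening, costs_s, utils)
icer = (c1 - c0) / (q1 - q0)   # incremental cost per QALY gained
print(f"ICER = {icer:.0f} cost units per QALY")
```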
Subjects
Cataract, Cost-Benefit Analysis, Mass Screening, Humans, China/epidemiology, Cataract/economics, Cataract/diagnosis, Cataract/epidemiology, Middle Aged, Mass Screening/economics, Mass Screening/methods, Male, Digital Technology/economics, Female, Markov Chains, Aged, Artificial Intelligence, Telemedicine/economics, Telemedicine/methods
ABSTRACT
With the rapid growth of interest in and use of large language models (LLMs) across various industries, we are facing some crucial and profound ethical concerns, especially in the medical field. The unique technical architecture and purported emergent abilities of LLMs differentiate them substantially from other artificial intelligence (AI) models and natural language processing techniques used, necessitating a nuanced understanding of LLM ethics. In this Viewpoint, we highlight ethical concerns stemming from the perspectives of users, developers, and regulators, notably focusing on data privacy and rights of use, data provenance, intellectual property contamination, and broad applications and plasticity of LLMs. A comprehensive framework and mitigating strategies will be imperative for the responsible integration of LLMs into medical practice, ensuring alignment with ethical principles and safeguarding against potential societal risks.
Subjects
Artificial Intelligence, Natural Language Processing, Humans, Artificial Intelligence/ethics, Intellectual Property
ABSTRACT
Large language models (LLMs) underlie remarkable recent advances in natural language processing, and they are beginning to be applied in clinical contexts. We aimed to evaluate the clinical potential of state-of-the-art LLMs in ophthalmology using a more robust benchmark than raw examination scores. We trialled GPT-3.5 and GPT-4 on 347 ophthalmology questions before GPT-3.5, GPT-4, PaLM 2, LLaMA, expert ophthalmologists, and doctors in training were trialled on a mock examination of 87 questions. Performance was analysed with respect to question subject and type (first-order recall and higher-order reasoning). Masked ophthalmologists graded the accuracy, relevance, and overall preference of GPT-3.5 and GPT-4 responses to the same questions. The performance of GPT-4 (69%) was superior to GPT-3.5 (48%), LLaMA (32%), and PaLM 2 (56%). GPT-4 compared favourably with expert ophthalmologists (median 76%, range 64-90%), ophthalmology trainees (median 59%, range 57-63%), and unspecialised junior doctors (median 43%, range 41-44%). Low agreement between LLMs and doctors reflected idiosyncratic differences in knowledge and reasoning with overall consistency across subjects and types (p>0.05). All ophthalmologists preferred GPT-4 responses over GPT-3.5 and rated the accuracy and relevance of GPT-4 as higher (p<0.05). LLMs are approaching expert-level knowledge and reasoning skills in ophthalmology. In view of the comparable or superior performance to trainee-grade ophthalmologists and unspecialised junior doctors, state-of-the-art LLMs such as GPT-4 may provide useful medical advice and assistance where access to expert ophthalmologists is limited. Clinical benchmarks provide useful assays of LLM capabilities in healthcare before clinical trials can be designed and conducted.
ABSTRACT
With the rise of generative artificial intelligence (AI) and AI-powered chatbots, the landscape of medicine and healthcare is on the brink of significant transformation. This perspective delves into the prospective influence of AI on medical education, residency training and the continuing education of attending physicians or consultants. We begin by highlighting the constraints of the current education model, including limited faculty, the challenge of uniformity amidst burgeoning medical knowledge, and the limitations of 'traditional' linear knowledge acquisition. We introduce 'AI-assisted' and 'AI-integrated' paradigms for medical education and physician training, targeting a more universal, accessible, high-quality and interconnected educational journey. We differentiate between essential knowledge for all physicians, specialised insights for clinician-scientists and mastery-level proficiency for clinician-computer scientists. Given its transformative potential in healthcare and service delivery, AI is poised to reshape the pedagogy of medical education and residency training.
Subjects
Medical Education, Physicians, Humans, Artificial Intelligence, Prospective Studies, Continuing Education
ABSTRACT
Federated learning (FL) is a distributed machine learning framework that is gaining traction in view of increasing health data privacy protection needs. By conducting a systematic review of FL applications in healthcare, we identify relevant articles in scientific, engineering, and medical journals in English up to August 31st, 2023. Out of a total of 22,693 articles under review, 612 articles are included in the final analysis. The majority of articles are proof-of-concept studies, and only 5.2% are studies with real-life application of FL. Radiology and internal medicine are the most common specialties involved in FL. FL is robust to a variety of machine learning models and data types, with neural networks and medical imaging being the most common, respectively. We highlight the need to address the barriers to clinical translation and to assess its real-world impact in this new digital, data-driven healthcare landscape.
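As background for readers new to FL, the sketch below shows the core federated-averaging idea — local training on each site's private data with only model weights shared for averaging; it is a generic illustration, not any specific framework identified in the review.

```python
# Minimal federated-averaging (FedAvg) sketch in NumPy: each site trains a
# logistic-regression-style model on its own data, and only model weights,
# never patient records, are sent to the server for weighted averaging.
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=20):
    """A few epochs of local gradient descent on one site's private data."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))        # sigmoid predictions
        grad = X.T @ (p - y) / len(y)             # logistic-loss gradient
        w = w - lr * grad
    return w

def fed_avg(global_w, site_data, rounds=10):
    """One round = local training at every site + sample-size-weighted averaging."""
    for _ in range(rounds):
        local_ws, sizes = [], []
        for X, y in site_data:
            local_ws.append(local_update(global_w.copy(), X, y))
            sizes.append(len(y))
        shares = np.array(sizes) / sum(sizes)      # weight sites by sample count
        global_w = sum(w * s for w, s in zip(local_ws, shares))
    return global_w

rng = np.random.default_rng(42)
true_w = np.array([1.5, -2.0, 0.5])
sites = []
for _ in range(3):                                 # three hospitals with private data
    X = rng.normal(size=(200, 3))
    y = (1 / (1 + np.exp(-(X @ true_w))) > rng.uniform(size=200)).astype(float)
    sites.append((X, y))

w = fed_avg(np.zeros(3), sites)
print("Federated model weights:", np.round(w, 2))
```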
Subjects
Delivery of Health Care, Machine Learning, Humans, Neural Networks (Computer)
ABSTRACT
This perspective highlights the importance of addressing social determinants of health (SDOH) in patient health outcomes and health inequity, a global problem exacerbated by the COVID-19 pandemic. We provide a broad discussion on current developments in digital health and artificial intelligence (AI), including large language models (LLMs), as transformative tools in addressing SDOH factors, offering new capabilities for disease surveillance and patient care. Simultaneously, we bring attention to challenges, such as data standardization, infrastructure limitations, digital literacy, and algorithmic bias, that could hinder equitable access to AI benefits. For LLMs, we highlight potential unique challenges and risks including environmental impact, unfair labor practices, inadvertent disinformation or "hallucinations," proliferation of bias, and infringement of copyrights. We propose the need for a multitiered approach to digital inclusion as an SDOH and the development of ethical and responsible AI practice frameworks globally and provide suggestions on bridging the gap from development to implementation of equitable AI technologies.
Subjects
Artificial Intelligence, COVID-19, Humans, Pandemics, Social Determinants of Health, COVID-19/epidemiology, Language
ABSTRACT
Color fundus photography (CFP) and optical coherence tomography (OCT) images are two of the most widely used modalities in the clinical diagnosis and management of retinal diseases. Despite the widespread use of multimodal imaging in clinical practice, few methods for automated diagnosis of eye diseases utilize correlated and complementary information from multiple modalities effectively. This paper explores how to leverage the information from CFP and OCT images to improve the automated diagnosis of retinal diseases. We propose a novel multimodal learning method, named geometric correspondence-based multimodal learning network (GeCoM-Net), to achieve the fusion of CFP and OCT images. Specifically, inspired by clinical observations, we consider the geometric correspondence between the OCT slice and the CFP region to learn the correlated features of the two modalities for robust fusion. Furthermore, we design a new feature selection strategy to extract discriminative OCT representations by automatically selecting the important feature maps from OCT slices. Unlike the existing multimodal learning methods, GeCoM-Net is the first method that formulates the geometric relationships between the OCT slice and the corresponding region of the CFP image explicitly for CFP and OCT fusion. Experiments were conducted on a large-scale private dataset and a publicly available dataset to evaluate the effectiveness of GeCoM-Net for diagnosing diabetic macular edema (DME), impaired visual acuity (VA) and glaucoma. The empirical results show that our method outperforms the current state-of-the-art multimodal learning methods by improving the AUROC score by 0.4%, 1.9%, and 2.9% for DME, VA, and glaucoma detection, respectively.
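The sketch below is a deliberately simplified illustration of the general idea — encoding a CFP region and its corresponding OCT slice, re-weighting OCT feature maps by learned importance, and fusing the two representations; it is not the GeCoM-Net architecture, and all layer sizes and module names are assumptions made for exposition.

```python
# Illustrative PyTorch sketch of correspondence-based CFP/OCT fusion in general;
# NOT the GeCoM-Net architecture described in the abstract.
import torch
import torch.nn as nn

class OCTFeatureSelector(nn.Module):
    """Re-weight OCT feature maps by a learned channel-importance score."""
    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels), nn.Sigmoid(),
        )
    def forward(self, feat):                        # feat: (B, C, H, W)
        w = self.score(feat).view(feat.size(0), -1, 1, 1)
        return feat * w                             # suppress uninformative maps

class SimpleFusionNet(nn.Module):
    """Encode a CFP region and its corresponding OCT slice, then fuse for classification."""
    def __init__(self, n_classes: int = 2):
        super().__init__()
        def encoder():
            return nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            )
        self.cfp_enc, self.oct_enc = encoder(), encoder()
        self.oct_select = OCTFeatureSelector(32)
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, n_classes),
        )
    def forward(self, cfp_region, oct_slice):
        f_cfp = self.cfp_enc(cfp_region)
        f_oct = self.oct_select(self.oct_enc(oct_slice))
        fused = torch.cat([f_cfp, f_oct], dim=1)    # concatenate corresponding features
        return self.head(fused)

model = SimpleFusionNet()
logits = model(torch.randn(2, 3, 128, 128), torch.randn(2, 3, 128, 128))
print(logits.shape)   # torch.Size([2, 2])
```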
Subjects
Computer-Assisted Image Interpretation, Multimodal Imaging, Optical Coherence Tomography, Humans, Optical Coherence Tomography/methods, Multimodal Imaging/methods, Computer-Assisted Image Interpretation/methods, Algorithms, Retinal Diseases/diagnostic imaging, Retina/diagnostic imaging, Machine Learning, Photography/methods, Ophthalmological Diagnostic Techniques, Factual Databases
ABSTRACT
The advent of generative artificial intelligence and large language models has ushered in transformative applications within medicine. Specifically in ophthalmology, large language models offer unique opportunities to revolutionise digital eye care, address clinical workflow inefficiencies, and enhance patient experiences across diverse global eye care landscapes. Yet alongside these prospects lie tangible and ethical challenges, encompassing data privacy, security, and the intricacies of embedding large language models into clinical routines. This Viewpoint highlights the promising applications of large language models in ophthalmology, while weighing up the practical and ethical barriers towards their real-world implementation. This Viewpoint seeks to stimulate broader discourse on the potential of large language models in ophthalmology and to galvanise both clinicians and researchers into tackling the prevailing challenges and optimising the benefits of large language models while curtailing the associated risks.
Subjects
Medicine, Ophthalmology, Humans, Artificial Intelligence, Language, Privacy
ABSTRACT
Current and future healthcare professionals are generally not trained to cope with the proliferation of artificial intelligence (AI) technology in healthcare. To design a curriculum that caters to variable baseline knowledge and skills, clinicians may be conceptualized as "consumers", "translators", or "developers". The changes required of medical education because of AI innovation are linked to those brought about by evidence-based medicine (EBM). We outline a core curriculum for AI education of future consumers, translators, and developers, emphasizing the links between AI and EBM, with suggestions for how teaching may be integrated into existing curricula. We consider the key barriers to implementation of AI in the medical curriculum: time, resources, variable interest, and knowledge retention. By improving AI literacy rates and fostering a translator- and developer-enriched workforce, innovation may be accelerated for the benefit of patients and practitioners.
Subjects
Artificial Intelligence, Medical Education, Humans, Curriculum, Evidence-Based Medicine/education
ABSTRACT
In this issue of Cell Reports Medicine, Zhao and colleagues¹ report a multi-tasking artificial intelligence system that can assist the whole process of fundus fluorescein angiography (FFA) imaging and reduce the reliance on retinal specialists in FFA examination.