ABSTRACT
OBJECTIVES: Non-contrast computed tomography of the brain (NCCTB) is commonly used to detect intracranial pathology but is subject to interpretation errors. Machine learning can augment clinical decision-making and improve NCCTB scan interpretation. This retrospective detection accuracy study assessed the performance of radiologists assisted by a deep learning model and compared the standalone performance of the model with that of unassisted radiologists. METHODS: A deep learning model was trained on 212,484 NCCTB scans drawn from a private radiology group in Australia. Scans from inpatient, outpatient, and emergency settings were included. Scan inclusion criteria were age ≥ 18 years and series slice thickness ≤ 1.5 mm. Thirty-two radiologists reviewed 2848 scans with and without the assistance of the deep learning system and rated their confidence in the presence of each finding using a 7-point scale. Differences in AUC and Matthews correlation coefficient (MCC) were calculated using a ground-truth gold standard. RESULTS: The model demonstrated an average area under the receiver operating characteristic curve (AUC) of 0.93 across 144 NCCTB findings and significantly improved radiologist interpretation performance. Assisted and unassisted radiologists demonstrated an average AUC of 0.79 and 0.73 across 22 grouped parent findings and 0.72 and 0.68 across 189 child findings, respectively. When assisted by the model, radiologist AUC was significantly improved for 91 findings (158 findings were non-inferior), and reading time was significantly reduced. CONCLUSIONS: The assistance of a comprehensive deep learning model significantly improved radiologist detection accuracy across a wide range of clinical findings and demonstrated the potential to improve NCCTB interpretation. CLINICAL RELEVANCE STATEMENT: This study evaluated a comprehensive CT brain deep learning model, which performed strongly, improved the performance of radiologists, and reduced interpretation time. 
The model may reduce errors, improve efficiency, facilitate triage, and better enable the delivery of timely patient care. KEY POINTS: • This study demonstrated that the use of a comprehensive deep learning system assisted radiologists in the detection of a wide range of abnormalities on non-contrast brain computed tomography scans. • The deep learning model demonstrated an average area under the receiver operating characteristic curve of 0.93 across 144 findings and significantly improved radiologist interpretation performance. • The assistance of the comprehensive deep learning model significantly reduced the time required for radiologists to interpret computed tomography scans of the brain.
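As an illustration of the two metrics named in the methods above, the following minimal sketch computes AUC and MCC from 7-point confidence ratings. The ratings, the labels, and the binarisation threshold of 4 are hypothetical values chosen for illustration; the abstract does not state how ratings were binarised for MCC, and this is not the study's analysis code.

```python
from itertools import product
from math import sqrt


def auc_from_ratings(ratings, labels):
    """AUC as the probability that a positive case is rated higher than a
    negative case, with ties counted as half (Mann-Whitney formulation)."""
    pos = [r for r, y in zip(ratings, labels) if y == 1]
    neg = [r for r, y in zip(ratings, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def mcc_at_threshold(ratings, labels, threshold):
    """Matthews correlation coefficient after calling a finding present
    when the rating is at or above `threshold` (an assumed cut-off)."""
    tp = fp = tn = fn = 0
    for r, y in zip(ratings, labels):
        pred = 1 if r >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical ratings for one finding on eight scans (1 = finding present).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
unassisted = [6, 4, 3, 5, 2, 4, 1, 3]
assisted = [7, 5, 4, 6, 2, 3, 1, 3]

for name, ratings in (("unassisted", unassisted), ("assisted", assisted)):
    print(name, auc_from_ratings(ratings, labels),
          mcc_at_threshold(ratings, labels, threshold=4))
```

AUC uses the full rating scale, whereas MCC depends on the chosen cut-off, so the two can move differently when the threshold changes.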
Subjects
Deep Learning; Adolescent; Humans; Radiography; Radiologists; Retrospective Studies; Tomography, X-Ray Computed/methods; Adult
ABSTRACT
Artificial Intelligence (AI) carries the potential for unprecedented disruption in radiology, with possible positive and negative consequences. The integration of AI in radiology holds the potential to revolutionize healthcare practices by advancing diagnosis, quantification, and management of multiple medical conditions. Nevertheless, the ever-growing availability of AI tools in radiology highlights an increasing need to critically evaluate claims for its utility and to differentiate safe product offerings from potentially harmful, or fundamentally unhelpful ones. This multi-society paper, presenting the views of Radiology Societies in the USA, Canada, Europe, Australia, and New Zealand, defines the potential practical problems and ethical issues surrounding the incorporation of AI into radiological practice. In addition to delineating the main points of concern that developers, regulators, and purchasers of AI tools should consider prior to their introduction into clinical practice, this statement also suggests methods to monitor their stability and safety in clinical use, and their suitability for possible autonomous function. This statement is intended to serve as a useful summary of the practical issues which should be considered by all parties involved in the development of radiology AI resources, and their implementation as clinical tools.
Subjects
Artificial Intelligence; Radiology; Societies, Medical; Humans; Canada; Europe; New Zealand; United States; Australia
ABSTRACT
Machine learning may assist in medical student evaluation. This study involved scoring short answer questions administered at three centres. Bidirectional encoder representations from transformers were particularly effective for professionalism question scoring (accuracy ranging from 41.6% to 92.5%). In the scoring of 3-mark professionalism questions, as compared with clinical questions, machine learning had a lower classification accuracy (P < 0.05). The role of machine learning in medical professionalism evaluation warrants further investigation.
Subjects
Professionalism; Students, Medical; Humans; Machine Learning
ABSTRACT
The development and commercialisation of medical decision systems based on artificial intelligence (AI) far outpaces our understanding of their value for clinicians. Although applicable across many forms of medicine, we focus on characterising the diagnostic decisions of radiologists through the concept of ecologically bounded reasoning, review the differences between clinician decision making and medical AI model decision making, and reveal how these differences pose fundamental challenges for integrating AI into radiology. We argue that clinicians are contextually motivated, mentally resourceful decision makers, whereas AI models are contextually stripped, correlational decision makers, and discuss misconceptions about clinician-AI interaction stemming from this misalignment of capabilities. We outline how future research on clinician-AI interaction could better address the cognitive considerations of decision making and be used to enhance the safety and usability of AI models in high-risk medical decision-making contexts.
Subjects
Artificial Intelligence; Clinical Decision-Making; Humans; Clinical Decision-Making/methods; Cognition; Decision Support Systems, Clinical; Radiology
ABSTRACT
OBJECTIVE: In this prospective cohort study, we provide several prognostic models to predict functional status as measured by the modified Health Assessment Questionnaire (mHAQ). The early adoption of the treat-to-target strategy in this cohort offered a unique opportunity to identify predictive factors using longitudinal data across 20 years. METHODS: A cohort of 397 patients with early RA was used to develop statistical models to predict mHAQ score measured at baseline, 12 months, and 18 months post diagnosis, as well as serially measured mHAQ. Demographic data, clinical measures, autoantibodies, medication use, comorbid conditions, and baseline mHAQ were considered as predictors. RESULTS: The discriminative performance of models was comparable to previous work, with an area under the receiver operator curve ranging from 0.64 to 0.88. The most consistent predictive variable was baseline mHAQ. Patient-reported outcomes including early morning stiffness, tender joint count (TJC), fatigue, pain, and patient global assessment were positively predictive of a higher mHAQ at baseline and longitudinally, as was the physician global assessment and C-reactive protein. When considering future function, a higher TJC predicted persistent disability while a higher swollen joint count predicted functional improvements with treatment. CONCLUSION: In our study of mHAQ prediction in RA patients receiving treat-to-target therapy, patient-reported outcomes were most consistently predictive of function. Patients with high disease activity due predominantly to tenderness scores rather than swelling may benefit from less aggressive treatment escalation and an emphasis on non-pharmacological therapies, allowing for a more personalized approach to treatment. Key Points • Long-term use of the treat-to-target strategy in this patient cohort offers a unique opportunity to develop prognostic models for functional outcomes using extensive longitudinal data.
• Patient-reported outcomes were more consistent predictors of function than traditional prognostic markers. • Tender joint count and swollen joint count had discordant relationships with future function, adding weight to the possibility that disease activity may better guide treatment when the components are considered separately.
Subjects
Antirheumatic Agents; Arthritis, Rheumatoid; Mitoxantrone/analogs & derivatives; Humans; Prognosis; Prospective Studies; Arthritis, Rheumatoid/diagnosis; Arthritis, Rheumatoid/drug therapy; C-Reactive Protein; Severity of Illness Index; Antirheumatic Agents/therapeutic use
ABSTRACT
Purpose To investigate the issues of generalizability and replication of deep learning models by assessing performance of a screening mammography deep learning system developed at New York University (NYU) on a local Australian dataset. Materials and Methods In this retrospective study, all individuals with biopsy or surgical pathology-proven lesions and age-matched controls were identified from a South Australian public mammography screening program (January 2010 to December 2016). The primary outcome was deep learning system performance, measured with area under the receiver operating characteristic curve (AUC), in classifying invasive breast cancer or ductal carcinoma in situ (n = 425) versus no malignancy (n = 490) or benign lesions (n = 44). The NYU system, including models without (NYU1) and with (NYU2) heatmaps, was tested in its original form, after training from scratch (without transfer learning), and after retraining with transfer learning. Results The local test set comprised 959 individuals (mean age, 62.5 years ± 8.5 [SD]; all female). The original AUCs for the NYU1 and NYU2 models were 0.83 (95% CI: 0.82, 0.84) and 0.89 (95% CI: 0.88, 0.89), respectively. When NYU1 and NYU2 were applied in their original form to the local test set, the AUCs were 0.76 (95% CI: 0.73, 0.79) and 0.84 (95% CI: 0.82, 0.87), respectively. After local training without transfer learning, the AUCs were 0.66 (95% CI: 0.62, 0.69) and 0.86 (95% CI: 0.84, 0.88). After retraining with transfer learning, the AUCs were 0.82 (95% CI: 0.80, 0.85) and 0.86 (95% CI: 0.84, 0.88). Conclusion A deep learning system developed using a U.S. dataset showed reduced performance when applied "out of the box" to an Australian dataset. Local retraining with transfer learning using available model weights improved model performance. Keywords: Screening Mammography, Convolutional Neural Network (CNN), Deep Learning Algorithms, Breast Cancer. Supplemental material is available for this article. 
© RSNA, 2024 See also commentary by Cadrin-Chênevert in this issue.
Subjects
Breast Neoplasms; Deep Learning; Mammography; Humans; Mammography/methods; Female; Breast Neoplasms/diagnostic imaging; Breast Neoplasms/pathology; Middle Aged; Retrospective Studies; Early Detection of Cancer/methods; Aged; Radiographic Image Interpretation, Computer-Assisted/methods
ABSTRACT
Artificial Intelligence (AI) carries the potential for unprecedented disruption in radiology, with possible positive and negative consequences. The integration of AI in radiology holds the potential to revolutionize healthcare practices by advancing diagnosis, quantification, and management of multiple medical conditions. Nevertheless, the ever-growing availability of AI tools in radiology highlights an increasing need to critically evaluate claims for its utility and to differentiate safe product offerings from potentially harmful, or fundamentally unhelpful ones. This multi-society paper, presenting the views of Radiology Societies in the USA, Canada, Europe, Australia, and New Zealand, defines the potential practical problems and ethical issues surrounding the incorporation of AI into radiological practice. In addition to delineating the main points of concern that developers, regulators, and purchasers of AI tools should consider prior to their introduction into clinical practice, this statement also suggests methods to monitor their stability and safety in clinical use, and their suitability for possible autonomous function. This statement is intended to serve as a useful summary of the practical issues which should be considered by all parties involved in the development of radiology AI resources, and their implementation as clinical tools. Key points • The incorporation of artificial intelligence (AI) in radiological practice demands increased monitoring of its utility and safety. • Cooperation between developers, clinicians, and regulators will allow all involved to address ethical issues and monitor AI performance. • AI can fulfil its promise to advance patient well-being if all steps from development to integration in healthcare are rigorously evaluated.
ABSTRACT
Artificial intelligence (AI) carries the potential for unprecedented disruption in radiology, with possible positive and negative consequences. The integration of AI in radiology holds the potential to revolutionize healthcare practices by advancing diagnosis, quantification, and management of multiple medical conditions. Nevertheless, the ever-growing availability of AI tools in radiology highlights an increasing need to critically evaluate claims for its utility and to differentiate safe product offerings from potentially harmful, or fundamentally unhelpful ones. This multi-society paper, presenting the views of Radiology Societies in the USA, Canada, Europe, Australia, and New Zealand, defines the potential practical problems and ethical issues surrounding the incorporation of AI into radiological practice. In addition to delineating the main points of concern that developers, regulators, and purchasers of AI tools should consider prior to their introduction into clinical practice, this statement also suggests methods to monitor their stability and safety in clinical use, and their suitability for possible autonomous function. This statement is intended to serve as a useful summary of the practical issues which should be considered by all parties involved in the development of radiology AI resources, and their implementation as clinical tools.
Subjects
Artificial Intelligence; Radiology; Humans; United States; Societies, Medical; Europe; Canada; New Zealand; Australia
ABSTRACT
Artificial Intelligence (AI) carries the potential for unprecedented disruption in radiology, with possible positive and negative consequences. The integration of AI in radiology holds the potential to revolutionize healthcare practices by advancing diagnosis, quantification, and management of multiple medical conditions. Nevertheless, the ever-growing availability of AI tools in radiology highlights an increasing need to critically evaluate claims for its utility and to differentiate safe product offerings from potentially harmful, or fundamentally unhelpful ones. This multi-society paper, presenting the views of Radiology Societies in the USA, Canada, Europe, Australia, and New Zealand, defines the potential practical problems and ethical issues surrounding the incorporation of AI into radiological practice. In addition to delineating the main points of concern that developers, regulators, and purchasers of AI tools should consider prior to their introduction into clinical practice, this statement also suggests methods to monitor their stability and safety in clinical use, and their suitability for possible autonomous function. This statement is intended to serve as a useful summary of the practical issues which should be considered by all parties involved in the development of radiology AI resources, and their implementation as clinical tools. This article is simultaneously published in Insights into Imaging (DOI 10.1186/s13244-023-01541-3), Journal of Medical Imaging and Radiation Oncology (DOI 10.1111/1754-9485.13612), Canadian Association of Radiologists Journal (DOI 10.1177/08465371231222229), Journal of the American College of Radiology (DOI 10.1016/j.jacr.2023.12.005), and Radiology: Artificial Intelligence (DOI 10.1148/ryai.230513). Keywords: Artificial Intelligence, Radiology, Automation, Machine Learning Published under a CC BY 4.0 license. ©The Author(s) 2024. Editor's Note: The RSNA Board of Directors has endorsed this article. 
It has not undergone review or editing by this journal.
Subjects
Artificial Intelligence; Radiology; Humans; Canada; Radiography; Automation
ABSTRACT
Artificial Intelligence (AI) carries the potential for unprecedented disruption in radiology, with possible positive and negative consequences. The integration of AI in radiology holds the potential to revolutionize healthcare practices by advancing diagnosis, quantification, and management of multiple medical conditions. Nevertheless, the ever-growing availability of AI tools in radiology highlights an increasing need to critically evaluate claims for its utility and to differentiate safe product offerings from potentially harmful, or fundamentally unhelpful ones. This multi-society paper, presenting the views of Radiology Societies in the USA, Canada, Europe, Australia, and New Zealand, defines the potential practical problems and ethical issues surrounding the incorporation of AI into radiological practice. In addition to delineating the main points of concern that developers, regulators, and purchasers of AI tools should consider prior to their introduction into clinical practice, this statement also suggests methods to monitor their stability and safety in clinical use, and their suitability for possible autonomous function. This statement is intended to serve as a useful summary of the practical issues which should be considered by all parties involved in the development of radiology AI resources, and their implementation as clinical tools.
Subjects
Artificial Intelligence; Radiology; Humans; Canada; Societies, Medical; Europe
ABSTRACT
The Consolidated Standards of Reporting Trials extension for Artificial Intelligence interventions (CONSORT-AI) was published in September 2020. Since its publication, several randomised controlled trials (RCTs) of AI interventions have been published but their completeness and transparency of reporting is unknown. This systematic review assesses the completeness of reporting of AI RCTs following publication of CONSORT-AI and provides a comprehensive summary of RCTs published in recent years. 65 RCTs were identified, mostly conducted in China (37%) and USA (18%). Median concordance with CONSORT-AI reporting was 90% (IQR 77-94%), although only 10 RCTs explicitly reported its use. Several items were consistently under-reported, including algorithm version, accessibility of the AI intervention or code, and references to a study protocol. Only 3 of 52 included journals explicitly endorsed or mandated CONSORT-AI. Despite a generally high concordance amongst recent AI RCTs, some AI-specific considerations remain systematically poorly reported. Further encouragement of CONSORT-AI adoption by journals and funders may enable more complete adoption of the full CONSORT-AI guidelines.
Subjects
Artificial Intelligence; Randomized Controlled Trials as Topic; Randomized Controlled Trials as Topic/standards; Humans; Guidelines as Topic; Research Design/standards; Research Report/standards; China
ABSTRACT
INTRODUCTION: This study assessed replacing the traditional CT protocol (arterial phase chest and venous phase abdomen and pelvis) with a single-pass, single-bolus, venous phase CT chest, abdomen and pelvis (CAP) protocol in general oncology outpatients at a single centre. METHODS: A traditional protocol is an arterial phase chest followed by venous phase abdomen and pelvis. A venous CAP (vCAP) protocol is a single acquisition 60 s after contrast injection, with optional arterial phase upper abdomen based on the primary tumour. Consecutive eligible patients were assessed, using each patient's prior study as a comparator. Attenuation for various structures, lesion conspicuity and dose were compared. Subset analysis of dual-energy (DE) CT scans in the vCAP protocol was performed for lesion conspicuity on 50 keV virtual monoenergetic (VME) images. RESULTS: One hundred and eleven patients were assessed with both protocols. Forty-six patients had their vCAP scans using DECT. The vCAP protocol had no significant difference in the attenuation of abdominal structures, with reduced attenuation of mediastinal structures. There was a significant improvement in the visibility of pleural lesions (p < 0.001), a trend for improved mediastinal node assessment, and no significant difference for abdominal lesions. A significant increase in liver lesion conspicuity on 50 keV VME reconstructions was noted for both readers (p < 0.001). There were significant dose reductions with the vCAP protocol. CONCLUSION: A single-pass vCAP protocol offered an improved thoracic assessment with no loss of abdominal diagnostic confidence and significant dose reductions compared with the traditional protocol. Improved liver lesion conspicuity on 50 keV VME images across a range of cancers is promising.
Subjects
Liver Neoplasms; Radiography, Dual-Energy Scanned Projection; Humans; Outpatients; Tomography, X-Ray Computed/methods; Abdomen/diagnostic imaging; Pelvis/diagnostic imaging; Liver Neoplasms/diagnostic imaging; Retrospective Studies; Radiographic Image Interpretation, Computer-Assisted/methods; Contrast Media; Radiography, Dual-Energy Scanned Projection/methods
ABSTRACT
The inclusion and celebration of LGBTQIA+ staff in radiology and radiation oncology departments is crucial in developing a diverse and thriving workplace. Despite the substantial social change in Australia, LGBTQIA+ people still experience harassment and exclusion, negatively impacting their well-being and workplace productivity. We need to be proactive in creating policies that are properly implemented and translate to a safe and inclusive space for marginalised groups. In this work, we outline the role we all can play in creating inclusive environments, for both individuals and leaders working in radiology and radiation oncology. We can learn how to avoid normative assumptions about gender and sexuality, respect people's identities and speak out against witnessed discrimination or slights. Robust policies are needed to protect LGBTQIA+ members from discrimination and provide equal access across other pertinent parts of work life such as leave entitlements, representation in data collection and safe bathroom access. We all deserve to feel safe and respected at work and further effort is needed to ensure this extends to LGBTQIA+ staff in the radiology and radiation oncology workforces.
Subjects
Radiation Oncology; Sexual and Gender Minorities; Humans; Gender Identity; Workplace; Australia
ABSTRACT
BACKGROUND: Machine learning and deep learning models have been increasingly used to predict long-term disease progression in patients with chronic obstructive pulmonary disease (COPD). We aimed to summarise the performance of such prognostic models for COPD, compare their relative performances, and identify key research gaps. METHODS: We conducted a systematic review and meta-analysis to compare the performance of machine learning and deep learning prognostic models and identify pathways for future research. We searched PubMed, Embase, the Cochrane Library, ProQuest, Scopus, and Web of Science from database inception to April 6, 2023, for studies in English using machine learning or deep learning to predict patient outcomes at least 6 months after initial clinical presentation in those with COPD. We included studies comprising human adults aged 18-90 years and allowed for any input modalities. We reported area under the receiver operator characteristic curve (AUC) with 95% CI for predictions of mortality, exacerbation, and decline in forced expiratory volume in 1 s (FEV₁). We reported the degree of interstudy heterogeneity using Cochran's Q test (significant heterogeneity was defined as p≤0·10 or I²>50%). Reporting quality was assessed using the TRIPOD checklist and a risk-of-bias assessment was done using the PROBAST checklist. This study was registered with PROSPERO (CRD42022323052). FINDINGS: We identified 3620 studies in the initial search. 18 studies were eligible, and, of these, 12 used conventional machine learning and six used deep learning models. Seven models analysed exacerbation risk, with only six reporting AUC and 95% CI on internal validation datasets (pooled AUC 0·77 [95% CI 0·69-0·85]) and there was significant heterogeneity (I² 97%, p<0·0001). 11 models analysed mortality risk, with only six reporting AUC and 95% CI on internal validation datasets (pooled AUC 0·77 [95% CI 0·74-0·80]) with significant degrees of heterogeneity (I² 60%, p=0·027). 
Two studies assessed decline in lung function and were unable to be pooled. Machine learning and deep learning models did not show significant improvement over pre-existing disease severity scores in predicting exacerbations (p=0·24). Three studies directly compared machine learning models against pre-existing severity scores for predicting mortality and pooled performance did not differ (p=0·57). Of the five studies that performed external validation, performance was worse than or equal to regression models. Incorrect handling of missing data, not reporting model uncertainty, and use of datasets that were too small relative to the number of predictive features included provided the largest risks of bias. INTERPRETATION: There is limited evidence that conventional machine learning and deep learning prognostic models demonstrate superior performance to pre-existing disease severity scores. More rigorous adherence to reporting guidelines would reduce the risk of bias in future studies and aid study reproducibility. FUNDING: None.
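The heterogeneity statistics mentioned in the methods (Cochran's Q and I²) can be sketched as follows. This is a minimal illustration using fixed-effect inverse-variance weights and standard errors back-calculated from 95% CIs; the input AUCs and intervals are hypothetical, and the review's actual pooling model and software are not stated in the abstract.

```python
def se_from_ci(lower, upper, z=1.96):
    """Approximate standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)


def cochran_q_i2(estimates, std_errors):
    """Return (pooled estimate, Cochran's Q, I-squared in percent).

    Uses inverse-variance (fixed-effect) weights, which is the usual basis
    for Q; I-squared = max(0, (Q - df) / Q) * 100 with df = k - 1.
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need matching lists with at least two studies")
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i2


# Hypothetical study-level AUCs with 95% CIs (not taken from the review).
studies = [(0.70, 0.65, 0.75), (0.82, 0.78, 0.86), (0.77, 0.70, 0.84)]
aucs = [auc for auc, _, _ in studies]
ses = [se_from_ci(lo, hi) for _, lo, hi in studies]

pooled, q, i2 = cochran_q_i2(aucs, ses)
print(f"pooled AUC={pooled:.3f}  Q={q:.2f}  I2={i2:.0f}%")
```

The p-value for Q comes from a chi-square distribution with k − 1 degrees of freedom; it is omitted here to keep the sketch free of third-party dependencies.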
Subjects
Deep Learning; Pulmonary Disease, Chronic Obstructive; Adult; Humans; Reproducibility of Results; Quality of Life; Pulmonary Disease, Chronic Obstructive/diagnosis; Prognosis
ABSTRACT
Artificial intelligence as a medical device is increasingly being applied to healthcare for diagnosis, risk stratification and resource allocation. However, a growing body of evidence has highlighted the risk of algorithmic bias, which may perpetuate existing health inequity. This problem arises in part because of systemic inequalities in dataset curation, unequal opportunity to participate in research and inequalities of access. This study aims to explore existing standards, frameworks and best practices for ensuring adequate data diversity in health datasets. Exploring the body of existing literature and expert views is an important step towards the development of consensus-based guidelines. The study comprises two parts: a systematic review of existing standards, frameworks and best practices for healthcare datasets; and a survey and thematic analysis of stakeholder views of bias, health equity and best practices for artificial intelligence as a medical device. We found that the need for dataset diversity was well described in literature, and experts generally favored the development of a robust set of guidelines, but there were mixed views about how these could be implemented practically. The outputs of this study will be used to inform the development of standards for transparency of data diversity in health datasets (the STANDING Together initiative).
Subjects
Artificial Intelligence; Delivery of Health Care; Humans; Consensus; Systematic Reviews as Topic
ABSTRACT
Artificial intelligence systems for health care, like any other medical device, have the potential to fail. However, specific qualities of artificial intelligence systems, such as the tendency to learn spurious correlates in training data, poor generalisability to new deployment settings, and a paucity of reliable explainability mechanisms, mean they can yield unpredictable errors that might be entirely missed without proactive investigation. We propose a medical algorithmic audit framework that guides the auditor through a process of considering potential algorithmic errors in the context of a clinical task, mapping the components that might contribute to the occurrence of errors, and anticipating their potential consequences. We suggest several approaches for testing algorithmic errors, including exploratory error analysis, subgroup testing, and adversarial testing, and provide examples from our own work and previous studies. The medical algorithmic audit is a tool that can be used to better understand the weaknesses of an artificial intelligence system and put in place mechanisms to mitigate their impact. We propose that safety monitoring and medical algorithmic auditing should be a joint responsibility between users and developers, and encourage the use of feedback mechanisms between these groups to promote learning and maintain safe deployment of artificial intelligence systems.
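Subgroup testing, one of the approaches listed above, can be illustrated with a minimal sketch that tabulates error rates per subgroup so that a subgroup with disproportionate errors stands out. The record layout, subgroup names, and data below are hypothetical and are not drawn from the audit framework itself.

```python
from collections import defaultdict


def error_rate_by_subgroup(records):
    """Compute the error rate for each subgroup.

    `records` is an iterable of (subgroup, true_label, predicted_label)
    tuples; this layout is an assumption made for illustration.
    """
    counts = defaultdict(lambda: [0, 0])  # subgroup -> [errors, total]
    for subgroup, truth, pred in records:
        counts[subgroup][1] += 1
        if truth != pred:
            counts[subgroup][0] += 1
    return {group: errors / total for group, (errors, total) in counts.items()}


# Hypothetical model outputs on eight cases in two subgroups.
records = [
    ("subgroup_a", 1, 1), ("subgroup_a", 0, 0),
    ("subgroup_a", 1, 1), ("subgroup_a", 0, 1),
    ("subgroup_b", 1, 0), ("subgroup_b", 0, 0),
    ("subgroup_b", 1, 0), ("subgroup_b", 0, 0),
]

for group, rate in sorted(error_rate_by_subgroup(records).items()):
    print(f"{group}: error rate {rate:.2f}")
```

In practice the subgroups would be chosen from the clinical task (for example, by patient characteristics or acquisition setting), and small subgroups would need confidence intervals before any conclusion is drawn.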
Subjects
Artificial Intelligence; Delivery of Health Care; Health Facilities
ABSTRACT
Introduction: Machine learning (ML) methods are being increasingly applied to prognostic prediction for stroke patients with large vessel occlusion (LVO) treated with endovascular thrombectomy. This systematic review aims to summarize ML-based pre-thrombectomy prognostic models for LVO stroke and identify key research gaps. Methods: Literature searches were performed in Embase, PubMed, Web of Science, and Scopus. Meta-analyses of the area under the receiver operating characteristic curves (AUCs) of ML models were conducted to synthesize model performance. Results: Sixteen studies describing 19 models were eligible. The predicted outcomes included functional outcome at 90 days, successful reperfusion, and hemorrhagic transformation. Functional outcome was analyzed by 10 conventional ML models (pooled AUC=0.81, 95% confidence interval [CI]: 0.77-0.85, AUC range: 0.68-0.93) and four deep learning (DL) models (pooled AUC=0.75, 95% CI: 0.70-0.81, AUC range: 0.71-0.81). Successful reperfusion was analyzed by three conventional ML models (pooled AUC=0.72, 95% CI: 0.56-0.88, AUC range: 0.55-0.88) and one DL model (AUC=0.65, 95% CI: 0.62-0.68). Conclusions: Conventional ML and DL models have shown variable performance in predicting post-treatment outcomes of LVO without generally demonstrating superiority compared to existing prognostic scores. Most models were developed using small datasets, lacked solid external validation, and were at high risk of bias. There is considerable scope to improve study design and model performance. The application of ML and DL methods to improve the prediction of prognosis in LVO stroke, while promising, remains nascent. Systematic review registration: https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42021266524, identifier CRD42021266524.
ABSTRACT
Rheumatoid arthritis is an autoimmune condition that predominantly affects the synovial joints, causing joint destruction, pain, and disability. Historically, the standard for measuring the long-term efficacy of disease-modifying antirheumatic drugs has been the assessment of plain radiographs with scoring techniques that quantify joint damage. However, with significant improvements in therapy, current radiographic scoring systems may no longer be fit for purpose for the milder spectrum of disease seen today. We argue that artificial intelligence is an apt solution to further improve upon radiographic scoring, as it can readily learn to recognize subtle patterns in imaging data, which could not only improve efficiency but also increase sensitivity to variation in mild disease. Current work in the area demonstrates the feasibility of automating scoring but is yet to take full advantage of the strengths of artificial intelligence. By fully leveraging the power of artificial intelligence, faster and more sensitive scoring could enable the ongoing development of effective treatments for patients with rheumatoid arthritis.
Subjects
Antirheumatic Agents , Rheumatoid Arthritis , Humans , Artificial Intelligence , Disease Progression , Rheumatoid Arthritis/diagnostic imaging , Rheumatoid Arthritis/drug therapy , Antirheumatic Agents/therapeutic use , Joints
ABSTRACT
BACKGROUND: Proximal femoral fractures are an important clinical and public health issue associated with substantial morbidity and early mortality. Artificial intelligence might offer improved diagnostic accuracy for these fractures, but typical approaches to testing of artificial intelligence models can underestimate the risks of artificial intelligence-based diagnostic systems. METHODS: We present a preclinical evaluation of a deep learning model intended to detect proximal femoral fractures in frontal x-ray films in emergency department patients, trained on films from the Royal Adelaide Hospital (Adelaide, SA, Australia). This evaluation included a reader study comparing the performance of the model against five radiologists (three musculoskeletal specialists and two general radiologists) on a dataset of 200 fracture cases and 200 non-fractures (also from the Royal Adelaide Hospital), an external validation study using a dataset obtained from Stanford University Medical Center, CA, USA, and an algorithmic audit to detect any unusual or unexpected model behaviour. FINDINGS: In the reader study, the area under the receiver operating characteristic curve (AUC) for the performance of the deep learning model was 0·994 (95% CI 0·988-0·999) compared with an AUC of 0·969 (0·960-0·978) for the five radiologists. This strong model performance was maintained on external validation, with an AUC of 0·980 (0·931-1·000). However, the preclinical evaluation identified barriers to safe deployment, including a substantial shift in the model operating point on external validation and an increased error rate on cases with abnormal bones (eg, Paget's disease). INTERPRETATION: The model outperformed the radiologists tested and maintained performance on external validation, but showed several unexpected limitations during further testing. 
Thorough preclinical evaluation of artificial intelligence models, including algorithmic auditing, can reveal unexpected and potentially harmful behaviour even in high-performance artificial intelligence systems, which can inform future clinical testing and deployment decisions. FUNDING: None.
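Illustrative note: the abstract reports that AUC was maintained on external validation while the model operating point shifted. The sketch below shows, under stated assumptions, how both can hold at once: AUC depends only on how scores rank cases, so a uniform drop in scores leaves it unchanged while sensitivity and specificity at a fixed threshold move. All scores, the 0.25 shift, and the 0.5 threshold are hypothetical and are not taken from the study.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney formulation of AUC).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def sens_spec(scores_pos, scores_neg, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    sensitivity = sum(s >= threshold for s in scores_pos) / len(scores_pos)
    specificity = sum(s < threshold for s in scores_neg) / len(scores_neg)
    return sensitivity, specificity


# Hypothetical model scores on an internal test set.
internal_pos = [0.9, 0.8, 0.7, 0.6]    # fracture cases
internal_neg = [0.65, 0.45, 0.4, 0.3]  # non-fracture cases

# Hypothetical external site where every score is 0.25 lower,
# for example because of different acquisition hardware.
SHIFT = 0.25
external_pos = [s - SHIFT for s in internal_pos]
external_neg = [s - SHIFT for s in internal_neg]

THRESHOLD = 0.5  # operating point chosen on internal data (assumption)

if __name__ == "__main__":
    for name, pos, neg in [
        ("internal", internal_pos, internal_neg),
        ("external", external_pos, external_neg),
    ]:
        sens, spec = sens_spec(pos, neg, THRESHOLD)
        print(f"{name}: AUC={auc(pos, neg):.4f} sens={sens:.2f} spec={spec:.2f}")
```

In this toy setting the ranking of cases is identical at both sites, so the AUC is unchanged, yet half of the external fracture cases fall below the fixed threshold. This is one reason to report threshold-dependent metrics alongside AUC when a model is validated externally.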
Subjects
Deep Learning , Femoral Fractures , Artificial Intelligence , Hospital Emergency Service , Femoral Fractures/diagnostic imaging , Humans , Retrospective Studies
ABSTRACT
BACKGROUND: Previous studies in medical imaging have shown disparate abilities of artificial intelligence (AI) to detect a person's race, yet there is no known correlate of race on medical imaging that would be obvious to human experts when interpreting the images. We aimed to conduct a comprehensive evaluation of the ability of AI to recognise a patient's racial identity from medical images. METHODS: Using private (Emory CXR, Emory Chest CT, Emory Cervical Spine, and Emory Mammogram) and public (MIMIC-CXR, CheXpert, National Lung Cancer Screening Trial, RSNA Pulmonary Embolism CT, and Digital Hand Atlas) datasets, we first quantified the performance of deep learning models in detecting race from medical images, including the ability of these models to generalise to external environments and across multiple imaging modalities. Second, we assessed possible confounding by anatomic and phenotypic population features by assessing the ability of these hypothesised confounders to detect race in isolation using regression models, and by re-evaluating the deep learning models on datasets stratified by these hypothesised confounding variables. Last, by exploring the effect of image corruptions on model performance, we investigated the underlying mechanism by which AI models can recognise race. FINDINGS: In our study, we show that standard AI deep learning models can be trained to predict race from medical images with high performance across multiple imaging modalities, which was sustained under external validation conditions (x-ray imaging [area under the receiver operating characteristics curve (AUC) range 0·91-0·99], CT chest imaging [0·87-0·96], and mammography [0·81]). We also showed that this detection is not due to proxies or imaging-related surrogate covariates for race (eg, performance of possible confounders: body-mass index [AUC 0·55], disease distribution [0·61], and breast density [0·61]). 
Finally, we provide evidence to show that the ability of AI deep learning models to predict race persisted over all anatomical regions and frequency spectra of the images, suggesting that efforts to control this behaviour when it is undesirable will be challenging and demand further study. INTERPRETATION: The results from our study emphasise that the ability of AI deep learning models to predict self-reported race is itself not the issue of importance. However, our finding that AI can accurately predict self-reported race, even from corrupted, cropped, and noised medical images, often when clinical experts cannot, creates an enormous risk for all model deployments in medical imaging. FUNDING: National Institute of Biomedical Imaging and Bioengineering, MIDRC grant of National Institutes of Health, US National Science Foundation, National Library of Medicine of the National Institutes of Health, and Taiwan Ministry of Science and Technology.