ABSTRACT
Artificial intelligence (AI) has made impressive progress over the past few years, including many applications in medical imaging. Numerous commercial solutions based on AI techniques are now available for sale, forcing radiology practices to learn how to properly assess these tools. While several guidelines describing good practices for conducting and reporting AI-based research in medicine and radiology have been published, fewer efforts have focused on recommendations addressing the key questions to consider when critically assessing AI solutions before purchase. Commercial AI solutions are typically complex software products whose evaluation requires many factors to be considered. In this work, authors from academia and industry have joined efforts to propose a practical framework that will help stakeholders evaluate commercial AI solutions in radiology (the ECLAIR guidelines) and reach an informed decision. Topics to consider in the evaluation include the relevance of the solution from the point of view of each stakeholder, issues regarding performance and validation, usability and integration, regulatory and legal aspects, and financial and support services. KEY POINTS: • Numerous commercial solutions based on artificial intelligence techniques are now available for sale, and radiology practices have to learn how to properly assess these tools. • We propose a framework focusing on practical points to consider when assessing an AI solution in medical imaging, allowing all stakeholders to conduct relevant discussions with manufacturers and reach an informed decision as to whether to purchase an AI commercial solution for imaging applications. • Topics to consider in the evaluation include the relevance of the solution from the point of view of each stakeholder, issues regarding performance and validation, usability and integration, regulatory and legal aspects, and financial and support services.
Subject(s)
Artificial Intelligence, Radiology, Diagnostic Imaging, Humans, Radiography, Software
ABSTRACT
Artificial intelligence (AI) continues to garner substantial interest in medical imaging. The potential applications are vast and include the entirety of the medical imaging life cycle from image creation to diagnosis to outcome prediction. The chief obstacles to development and clinical implementation of AI algorithms include availability of sufficiently large, curated, and representative training data that includes expert labeling (eg, annotations). Current supervised AI methods require a curation process for data to optimally train, validate, and test algorithms. Currently, most research groups and industry have limited data access based on small sample sizes from small geographic areas. In addition, the preparation of data is a costly and time-intensive process, the results of which are algorithms with limited utility and poor generalization. In this article, the authors describe fundamental steps for preparing medical imaging data in AI algorithm development, explain current limitations to data curation, and explore new approaches to address the problem of data availability.
Subject(s)
Algorithms, Data Collection, Data Management, Diagnostic Imaging, Machine Learning, Humans
ABSTRACT
The purpose of this study was to establish the interobserver reproducibility of Young's modulus (YM) derived from ultrasound shear wave elastography (US-SWE) in the normal prostate and correlate it with multiparametric magnetic resonance imaging (mpMRI) tissue characteristics. Twenty men being screened for prostate cancer underwent same-day US-SWE (10 performed by two blinded, newly trained observers) and mpMRI, followed by 12-core biopsy. Bland-Altman plots established limits of agreement for YM. Quantitative data from the peripheral zone (PZ) and the transitional zone (TZ) were obtained for slice-matched prostate sextants: YM; apparent diffusion coefficient (ADC, mm²/s, from diffusion-weighted MRI); and, from dynamic contrast-enhanced MRI, Ktrans (volume transfer coefficient, min⁻¹), Ve (extravascular-extracellular space, %), Kep (rate constant, min⁻¹), and the initial area under the gadolinium concentration curve (IAUGC60, mmol/L/s). Interobserver intraclass correlation coefficients were fair to good for individual regions (PZ = 0.57, TZ = 0.65) and 0.67 for the whole gland (increasing to 0.81 when corrected for systematic observer bias). In the PZ, there were weak negative correlations between YM and ADC (p = 0.008) and Ve (p = 0.01), and a weak positive correlation with Kep (p = 0.003). No significant intermodality correlations were seen in the TZ. Transrectal prostate US-SWE performed without controlling manually applied probe pressure has fair-to-good interobserver reproducibility in inexperienced observers, with the potential to improve to excellent through standardization of probe contact pressure. Within the PZ, an increase in tissue stiffness is associated with reduced extracellular water (decreased ADC) and space (reduced Ve).
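As a toy illustration of the limits-of-agreement computation this study describes, the sketch below computes the Bland-Altman bias and 95% limits of agreement (bias ± 1.96 SD of the paired differences) for two observers. The YM readings are invented for demonstration and are not study data.

```python
# Minimal Bland-Altman sketch for two observers' Young's modulus (YM) readings.
# All values below are illustrative, not taken from the study.
import numpy as np

def bland_altman_limits(obs1, obs2):
    """Return (bias, lower LoA, upper LoA) for paired measurements."""
    obs1, obs2 = np.asarray(obs1, float), np.asarray(obs2, float)
    diffs = obs1 - obs2          # paired differences
    bias = diffs.mean()          # systematic observer bias
    sd = diffs.std(ddof=1)       # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical YM readings (kPa) for the same sextants by two observers
ym_a = [18.2, 20.1, 22.5, 19.8, 25.0, 21.3]
ym_b = [17.5, 21.0, 21.8, 20.4, 23.9, 22.0]
bias, lo, hi = bland_altman_limits(ym_a, ym_b)
```

Subtracting the estimated `bias` from one observer's readings is one simple way to apply the "corrected for systematic observer bias" adjustment the abstract mentions.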
Subject(s)
Elasticity Imaging Techniques/methods, Magnetic Resonance Imaging/methods, Prostate/anatomy & histology, Adult, Aged, Contrast Media, Elastic Modulus, Gadolinium, Humans, Image Enhancement/methods, Male, Middle Aged, Observer Variation, Prostate/diagnostic imaging, Reference Values, Reproducibility of Results
ABSTRACT
BACKGROUND: Intended use statements (IUSs) are mandatory to obtain regulatory clearance for artificial intelligence (AI)-based medical devices in the European Union. In order to guide the safe use of AI-based medical devices, IUSs need to contain comprehensive and understandable information. This study analyzes the IUSs of CE-marked AI products listed on AIforRadiology.com for ambiguity and completeness. METHODS: We retrieved 157 IUSs of CE-marked AI products listed on AIforRadiology.com in September 2022. Duplicate products (n = 1), discontinued products (n = 3), and duplicate statements (n = 14) were excluded. The resulting IUSs were assessed for the presence of 6 items: medical indication, part of the body, patient population, user profile, use environment, and operating principle. Disclaimers, defined as contra-indications or warnings in the IUS, were identified and compared with claims. RESULTS: Of 139 AI products, the majority (n = 78) of IUSs mentioned 3 or fewer items. IUSs of only 7 products mentioned all 6 items. The intended body part (n = 115) and the operating principle (n = 116) were the most frequently mentioned components, while the intended use environment (n = 24) and intended patient population (n = 29) were mentioned less frequently. Fifty-six statements contained disclaimers, which conflicted with the claims in 13 cases. CONCLUSION: The majority of IUSs of CE-marked AI-based medical devices lack substantial information and, in a few cases, contradict the claims of the product. CRITICAL RELEVANCE STATEMENT: To ensure correct usage and to avoid off-label use or foreseeable misuse of AI-based medical devices in radiology, manufacturers are encouraged to provide more comprehensive and less ambiguous intended use statements. KEY POINTS: • Radiologists must know AI products' intended use to avoid off-label use or misuse. • Ninety-five percent (n = 132/139) of the intended use statements analyzed were incomplete.
• Nine percent (n = 13) of the intended use statements held disclaimers contradicting the claim of the AI product. • Manufacturers and regulatory bodies must ensure that intended use statements are comprehensive.
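A crude keyword screen can illustrate the kind of 6-item completeness check the study performed manually. The keyword lists and the example statement below are invented for demonstration; real IUS assessment requires human judgment, not string matching.

```python
# Illustrative-only sketch of checking an intended use statement (IUS) for the
# six items assessed in the study. Keywords are hypothetical examples.
IUS_ITEMS = {
    "medical indication":  ["detect", "diagnos", "triage", "screen"],
    "part of the body":    ["brain", "chest", "lung", "prostate", "breast"],
    "patient population":  ["adult", "pediatric", "patients aged"],
    "user profile":        ["radiologist", "clinician", "trained"],
    "use environment":     ["hospital", "clinical setting", "emergency"],
    "operating principle": ["deep learning", "machine learning", "algorithm"],
}

def items_mentioned(statement):
    """Return the subset of the six items the statement appears to mention."""
    text = statement.lower()
    return [item for item, kws in IUS_ITEMS.items()
            if any(kw in text for kw in kws)]

example = ("Deep learning software intended to detect lung nodules on chest CT "
           "of adult patients, for use by radiologists in a hospital setting.")
found = items_mentioned(example)
```

An IUS like the hypothetical `example`, which touches all six items, was the exception in the study: only 7 of 139 products achieved it.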
ABSTRACT
BACKGROUND: Medical imaging and radiotherapy (MIRT) are at the forefront of artificial intelligence applications. The exponential increase in these applications has made governance frameworks necessary to uphold safe and effective clinical adoption. There is little information about how healthcare practitioners in MIRT in the UK use AI tools, their governance, and the associated challenges, opportunities, and priorities for the future. METHODS: This cross-sectional survey was open from November to December 2022 to MIRT professionals who had knowledge or made use of AI tools, in an attempt to map out current policy and practice and to identify future needs. The survey was distributed electronically to participants. Statistical analysis included descriptive and inferential statistics, performed in SPSS. Content analysis was employed for the open-ended questions. RESULTS: Among the 245 responses, the following were emphasised as central to AI adoption: governance frameworks, practitioner training, leadership, and teamwork within the AI ecosystem. Prior training was strongly correlated with increased knowledge about AI tools and frameworks. However, knowledge of related frameworks remained low, with different professionals showing different affinity for certain frameworks related to their respective roles. Common challenges and opportunities of AI adoption were also highlighted, with recommendations for future practice.
Subject(s)
Artificial Intelligence, Humans, Cross-Sectional Studies, Diagnostic Imaging, United Kingdom
ABSTRACT
Mobile apps are the primary means by which consumers access digital health and wellness software, with delivery dominated by the 'Apple App Store' and the 'Google Play Store'. Through these virtual storefronts, Apple and Google act as the distributor (and sometimes, importer) of many thousands of health and wellness apps into the EU, some of which have a medical purpose. As a result of changes to EU law which came into effect in May 2021, they must now ensure that apps are compliant with medical devices regulation and inform authorities of serious incidents arising from their use. The extent to which these new rules are being complied with in practice is uneven, and in some areas unclear. In light of EU legislation related to competition, which came into effect in November 2022, it is also unclear how conflicts of interest can be managed between Apple and Google's roles as gateway duopoly importers and distributors whilst also being developers of their own competing health products. Finally, under the proposed European health data space regulation, wellness apps will be voluntarily registered and labelled in a fashion more like medical devices than consumer software. We explore the implications of these new regulations and propose future models that could resolve the apparent conflicts. All stakeholders would benefit from improved app store models to sustainably evolve safer, better, and fairer provision of digital health applications in the EU. As EU legislation comes into force, it could serve as a template for other regions globally.
ABSTRACT
Technological advancements in computer science have started to bring artificial intelligence (AI) from the bench closer to the bedside. While much remains to be done and improved, AI models in medical imaging and radiotherapy are rapidly being developed and increasingly deployed in clinical practice. At the same time, AI governance frameworks are still under development. Clinical practitioners involved with procuring, deploying, and adopting AI tools in the UK should be well-informed about these AI governance frameworks. This scoping review aimed to map out available literature on AI governance in the UK, focusing on medical imaging and radiotherapy. Searches were performed on Google Scholar, Pubmed, and the Cochrane Library between June and July 2022. Of 4225 initially identified sources, 35 were finally included in this review. A comprehensive conceptual AI governance framework was proposed, guided by the need for rigorous AI validation and evaluation procedures, accreditation rules and standards, and the fundamental ethical principles of AI. Fairness, transparency, trustworthiness, and explainability should be drivers of all AI models deployed in clinical practice. Appropriate staff education is also mandatory to ensure AI's safe and responsible use. Multidisciplinary teams under robust leadership will facilitate AI adoption, and it is crucial to involve patients, the public, and practitioners in decision-making. Collaborative research should be encouraged to enhance and promote innovation, while attention should be paid to the ongoing auditing of AI tools to ensure safety and clinical effectiveness.
Subject(s)
Artificial Intelligence, Radiation Oncology, Humans, Diagnostic Imaging, Radiography, United Kingdom
ABSTRACT
With the exit of the UK from the European Union and European Union Regulation 2017/745 coming into effect on 26 May 2021, the regulatory landscape for medical devices is undergoing a substantial change, the implications of which will be felt by those procuring and using medical devices in clinical settings. This article outlines the changes that clinicians, as users of medical devices, should be aware of in the immediate future.
Subject(s)
Medical Device Legislation, European Union, Humans, United Kingdom
ABSTRACT
PURPOSE: To explore whether generative adversarial networks (GANs) can enable synthesis of realistic medical images that are indiscernible from real images, even by domain experts. MATERIALS AND METHODS: In this retrospective study, progressive growing GANs were used to synthesize mammograms at a resolution of 1280 × 1024 pixels by using images from 90,000 patients (average age, 56 years ± 9) collected between 2009 and 2019. To evaluate the results, a method to assess distributional alignment for ultra-high-dimensional pixel distributions was used, which was based on moment plots. This method was able to reveal potential sources of misalignment. A total of 117 volunteer participants (55 radiologists and 62 nonradiologists) took part in a study to assess the realism of synthetic images from GANs. RESULTS: A quantitative evaluation of distributional alignment showed a 60%-78% mutual-information score between the real and synthetic image distributions and 80%-91% overlap in their support, which are strong indications against mode collapse. It also revealed shape misalignment as the main difference between the two distributions. Obvious artifacts were found by an untrained observer in 13.6% and 6.4% of the synthetic mediolateral oblique and craniocaudal images, respectively. A reader study demonstrated that real and synthetic images are perceptually inseparable by the majority of participants, even by trained breast radiologists. Only one of the 117 participants was able to reliably distinguish real from synthetic images, and the cues they used to do so are discussed. CONCLUSION: On the basis of these findings, it appears possible to generate realistic synthetic full-field digital mammograms by using a progressive GAN architecture up to a resolution of 1280 × 1024 pixels. Supplemental material is available for this article. © RSNA, 2020.
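To give a concrete sense of what a mutual-information score between two distributions measures, the sketch below estimates mutual information from a 2-D histogram of paired samples. The data are synthetic and the estimator is a generic plug-in one; the study's actual alignment method (moment plots over ultra-high-dimensional pixel distributions) is considerably more involved.

```python
# Hedged sketch: histogram-based plug-in estimate of mutual information (nats).
# Not the study's method; a generic illustration of MI as a dependence measure.
import numpy as np

def mutual_information(x, y, bins=16):
    """Estimate MI of paired samples from their joint 2-D histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()                      # empirical joint distribution
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0                          # skip empty cells (log 0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
mi_dependent = mutual_information(x, x + 0.1 * rng.normal(size=5000))
mi_independent = mutual_information(x, rng.normal(size=5000))
```

Strongly dependent pairs yield a much higher score than independent ones, which is why a high real-vs-synthetic mutual-information score argues against mode collapse: the synthetic distribution tracks the real one rather than collapsing to a few modes.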
ABSTRACT
Although artificial intelligence (AI)-based algorithms for diagnosis hold promise for improving care, their safety and effectiveness must be ensured to facilitate wide adoption. Several recently proposed regulatory frameworks provide a solid foundation but do not address a number of issues that may prevent algorithms from being fully trusted. In this article, we review the major regulatory frameworks for software as a medical device applications, identify major gaps, and propose additional strategies to improve the development and evaluation of diagnostic AI algorithms. We identify the following major shortcomings of the current regulatory frameworks: (1) conflation of the diagnostic task with the diagnostic algorithm, (2) superficial treatment of the diagnostic task definition, (3) no mechanism to directly compare similar algorithms, (4) insufficient characterization of safety and performance elements, (5) lack of resources to assess performance at each installed site, and (6) inherent conflicts of interest. We recommend the following additional measures: (1) separate the diagnostic task from the algorithm, (2) define performance elements beyond accuracy, (3) divide the evaluation process into discrete steps, (4) encourage assessment by a third-party evaluator, and (5) incorporate these elements into the manufacturers' development process. Specifically, we recommend four phases of development and evaluation, analogous to those that have been applied to pharmaceuticals and proposed for software applications, to help ensure world-class performance of all algorithms at all installed sites. In the coming years, we anticipate the emergence of a substantial body of research dedicated to ensuring the accuracy, reliability, and safety of the algorithms.
Subject(s)
Algorithms, Artificial Intelligence, Diagnostic Imaging, Reproducibility of Results, Software
ABSTRACT
The arrival of artificially intelligent systems into the domain of medical imaging has focused attention and sparked much debate on the role and responsibilities of the radiologist. However, discussion about the impact of such technology on the radiographer role is lacking. This paper discusses the potential impact of artificial intelligence (AI) on the radiography profession by assessing current workflow and cross-mapping potential areas of AI automation such as procedure planning, image acquisition and processing. We also highlight the opportunities that AI brings including enhancing patient-facing care, increased cross-modality education and working, increased technological expertise and expansion of radiographer responsibility into AI-supported image reporting and auditing roles.
Subject(s)
Artificial Intelligence, Radiography, Radiology, Humans, Computer-Assisted Image Processing/methods, Computer-Assisted Image Processing/trends, Professional Role, Quality Control, Radiologists, Radiology/education, Computer-Assisted Radiotherapy Planning/methods, Computer-Assisted Radiotherapy Planning/trends, Workflow
ABSTRACT
Since its inception in 2017, npj Digital Medicine has attracted a disproportionate number of manuscripts reporting on uses of artificial intelligence. This field has matured rapidly in the past several years. There was initial fascination with the algorithms themselves (machine learning, deep learning, convolutional neural networks) and the use of these algorithms to make predictions that often surpassed prevailing benchmarks. As the discipline has matured, individuals have called attention to aberrancies in the output of these algorithms. In particular, criticisms have been widely circulated that algorithmically developed models may have limited generalizability due to overfitting to the training data and may systematically perpetuate various forms of biases inherent in the training data, including race, gender, age, and health state or fitness level (Challen et al. BMJ Qual. Saf. 28:231-237, 2019; O'Neil. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, Broadway Books, 2016). Given our interest in publishing the highest quality papers and the growing volume of submissions using AI algorithms, we offer a list of criteria that authors should consider before submitting papers to npj Digital Medicine.
ABSTRACT
OBJECTIVE: To systematically examine the design, reporting standards, risk of bias, and claims of studies comparing the performance of diagnostic deep learning algorithms for medical imaging with that of expert clinicians. DESIGN: Systematic review. DATA SOURCES: Medline, Embase, Cochrane Central Register of Controlled Trials, and the World Health Organization trial registry from 2010 to June 2019. ELIGIBILITY CRITERIA FOR SELECTING STUDIES: Randomised trial registrations and non-randomised studies comparing the performance of a deep learning algorithm in medical imaging with a contemporary group of one or more expert clinicians. Medical imaging has seen a growing interest in deep learning research. The main distinguishing feature of convolutional neural networks (CNNs) in deep learning is that when CNNs are fed with raw data, they develop their own representations needed for pattern recognition. The algorithm learns for itself the features of an image that are important for classification rather than being told by humans which features to use. The selected studies aimed to use medical imaging for predicting absolute risk of existing disease or classification into diagnostic groups (eg, disease or non-disease). For example, raw chest radiographs tagged with a label such as pneumothorax or no pneumothorax and the CNN learning which pixel patterns suggest pneumothorax. REVIEW METHODS: Adherence to reporting standards was assessed by using CONSORT (consolidated standards of reporting trials) for randomised studies and TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) for non-randomised studies. Risk of bias was assessed by using the Cochrane risk of bias tool for randomised studies and PROBAST (prediction model risk of bias assessment tool) for non-randomised studies. 
RESULTS: Only 10 records were found for deep learning randomised clinical trials, two of which have been published (with low risk of bias, except for lack of blinding, and high adherence to reporting standards) and eight are ongoing. Of 81 non-randomised clinical trials identified, only nine were prospective and just six were tested in a real world clinical setting. The median number of experts in the comparator group was only four (interquartile range 2-9). Full access to all datasets and code was severely limited (unavailable in 95% and 93% of studies, respectively). The overall risk of bias was high in 58 of 81 studies and adherence to reporting standards was suboptimal (<50% adherence for 12 of 29 TRIPOD items). 61 of 81 studies stated in their abstract that performance of artificial intelligence was at least comparable to (or better than) that of clinicians. Only 31 of 81 studies (38%) stated that further prospective studies or trials were required. CONCLUSIONS: Few prospective deep learning studies and randomised trials exist in medical imaging. Most non-randomised trials are not prospective, are at high risk of bias, and deviate from existing reporting standards. Data and code availability are lacking in most studies, and human comparator groups are often small. Future studies should diminish risk of bias, enhance real world clinical relevance, improve reporting and transparency, and appropriately temper conclusions. STUDY REGISTRATION: PROSPERO CRD42019123605.
Subject(s)
Deep Learning, Diagnostic Imaging, Computer-Assisted Image Processing, Research Design, Algorithms, Bias, Humans, Physicians, Randomized Controlled Trials as Topic, Research Design/standards
ABSTRACT
The Spanish abstract is available in the full text.
ABSTRACT The CONSORT 2010 statement provides minimum guidelines for reporting randomized trials. Its widespread use has been instrumental in ensuring transparency in the evaluation of new interventions. More recently, there has been a growing recognition that interventions involving artificial intelligence (AI) need to undergo rigorous, prospective evaluation to demonstrate impact on health outcomes. The CONSORT-AI (Consolidated Standards of Reporting Trials-Artificial Intelligence) extension is a new reporting guideline for clinical trials evaluating interventions with an AI component. It was developed in parallel with its companion statement for clinical trial protocols: SPIRIT-AI (Standard Protocol Items: Recommendations for Interventional Trials-Artificial Intelligence). Both guidelines were developed through a staged consensus process involving literature review and expert consultation to generate 29 candidate items, which were assessed by an international multi-stakeholder group in a two-stage Delphi survey (103 stakeholders), agreed upon in a two-day consensus meeting (31 stakeholders) and refined through a checklist pilot (34 participants). The CONSORT-AI extension includes 14 new items that were considered sufficiently important for AI interventions that they should be routinely reported in addition to the core CONSORT 2010 items. CONSORT-AI recommends that investigators provide clear descriptions of the AI intervention, including instructions and skills required for use, the setting in which the AI intervention is integrated, the handling of inputs and outputs of the AI intervention, the human-AI interaction and provision of an analysis of error cases. CONSORT-AI will help promote transparency and completeness in reporting clinical trials for AI interventions. 
It will assist editors and peer reviewers, as well as the general readership, to understand, interpret and critically appraise the quality of clinical trial design and risk of bias in the reported outcomes.
ABSTRACT
The Spanish abstract is available in the full text.
ABSTRACT The SPIRIT 2013 statement aims to improve the completeness of clinical trial protocol reporting by providing evidence-based recommendations for the minimum set of items to be addressed. This guidance has been instrumental in promoting transparent evaluation of new interventions. More recently, there has been a growing recognition that interventions involving artificial intelligence (AI) need to undergo rigorous, prospective evaluation to demonstrate their impact on health outcomes. The SPIRIT-AI (Standard Protocol Items: Recommendations for Interventional Trials-Artificial Intelligence) extension is a new reporting guideline for clinical trial protocols evaluating interventions with an AI component. It was developed in parallel with its companion statement for trial reports: CONSORT-AI (Consolidated Standards of Reporting Trials-Artificial Intelligence). Both guidelines were developed through a staged consensus process involving literature review and expert consultation to generate 26 candidate items, which were consulted upon by an international multi-stakeholder group in a two-stage Delphi survey (103 stakeholders), agreed upon in a consensus meeting (31 stakeholders) and refined through a checklist pilot (34 participants). The SPIRIT-AI extension includes 15 new items that were considered sufficiently important for clinical trial protocols of AI interventions. These new items should be routinely reported in addition to the core SPIRIT 2013 items. SPIRIT-AI recommends that investigators provide clear descriptions of the AI intervention, including instructions and skills required for use, the setting in which the AI intervention will be integrated, considerations for the handling of input and output data, the human-AI interaction and analysis of error cases. SPIRIT-AI will help promote transparency and completeness for clinical trial protocols for AI interventions. 
Its use will assist editors and peer reviewers, as well as the general readership, to understand, interpret and critically appraise the design and risk of bias for a planned clinical trial.