Results 1 - 20 of 119
1.
JMIR Res Protoc ; 13: e48156, 2024 Jul 11.
Article in English | MEDLINE | ID: mdl-38990628

ABSTRACT

BACKGROUND: The reporting of adverse events (AEs) relating to medical devices is a long-standing area of concern, with suboptimal reporting due to a range of factors including a failure to recognize the association of AEs with medical devices, lack of knowledge of how to report AEs, and a general culture of nonreporting. The introduction of artificial intelligence as a medical device (AIaMD) requires a robust safety monitoring environment that recognizes both generic risks of a medical device and some of the increasingly recognized risks of AIaMD (such as algorithmic bias). There is an urgent need to understand the limitations of current AE reporting systems and explore potential mechanisms for how AEs could be detected, attributed, and reported with a view to improving the early detection of safety signals. OBJECTIVE: The systematic review outlined in this protocol aims to yield insights into the frequency and severity of AEs while characterizing the events using existing regulatory guidance. METHODS: Publicly accessible AE databases will be searched to identify AE reports for AIaMD. Scoping searches have identified 3 regulatory territories for which public access to AE reports is provided: the United States, the United Kingdom, and Australia. AEs will be included for analysis if an artificial intelligence (AI) medical device is involved. Software as a medical device without AI is not within the scope of this review. Data extraction will be conducted using a data extraction tool designed for this review and will be done independently by AUK and a second reviewer. Descriptive analysis will be conducted to identify the types of AEs being reported, and their frequency, for different types of AIaMD. AEs will be analyzed and characterized according to existing regulatory guidance. RESULTS: Scoping searches are being conducted with screening to begin in April 2024. Data extraction and synthesis will commence in May 2024, with planned completion by August 2024. 
The review will highlight the types of AEs being reported for different types of AI medical devices and where the gaps are. It is anticipated that there will be particularly low rates of reporting for indirect harms associated with AIaMD. CONCLUSIONS: To our knowledge, this will be the first systematic review of 3 different regulatory sources reporting AEs associated with AIaMD. The review will focus on real-world evidence, which brings certain limitations, compounded by the opacity of regulatory databases generally. The review will outline the characteristics and frequency of AEs reported for AIaMD and help regulators and policy makers to continue developing robust safety monitoring processes. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): PRR1-10.2196/48156.


Subject(s)
Artificial Intelligence; Systematic Reviews as Topic; Humans; Equipment and Supplies/adverse effects; Equipment and Supplies/standards; Databases, Factual; United States; United Kingdom; Australia
2.
Med Image Anal ; 97: 103260, 2024 Jun 29.
Article in English | MEDLINE | ID: mdl-38970862

ABSTRACT

Robustness of deep learning segmentation models is crucial for their safe incorporation into clinical practice. However, these models can falter when faced with distributional changes. This challenge is evident in magnetic resonance imaging (MRI) scans due to the diverse acquisition protocols across various domains, leading to differences in image characteristics such as textural appearances. We posit that the restricted anatomical differences between subjects could be harnessed to refine the latent space into a set of shape components. The learned set then aims to encompass the relevant anatomical shape variation found within the patient population. We explore this by utilising multiple MRI sequences to learn texture-invariant and shape-equivariant features, which are used to construct a shape dictionary using vector quantisation. We investigate shape equivariance to a number of different types of groups. We hypothesise and prove that the greater the group order, i.e., the denser the constraint, the better the model robustness becomes. We achieve shape equivariance either with a contrastive-based approach or by imposing equivariant constraints on the convolutional kernels. The resulting shape-equivariant dictionary is then sampled to compose the segmentation output. Our method achieves state-of-the-art performance for the task of single-domain generalisation for prostate and cardiac MRI segmentation. Code is available at https://github.com/AinkaranSanthi/A_Geometric_Perspective_For_Robust_Segmentation.
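The core mechanism in this abstract, quantising continuous latent features against a learned shape dictionary, is standard vector quantisation: each feature vector is snapped to its nearest codebook entry. A minimal NumPy sketch of the lookup step (the codebook here is random rather than learned, and all names are illustrative):

```python
import numpy as np

def vector_quantise(features, codebook):
    """Map each feature vector to its nearest codebook entry (L2 distance).

    features: (N, D) array of continuous latent vectors
    codebook: (K, D) array of discrete shape components
    Returns (quantised vectors, indices into the codebook).
    """
    # Pairwise squared distances between features and codebook entries
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)  # index of the nearest entry per feature
    return codebook[idx], idx

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))  # K=8 entries, D=4 dims (toy values)
# Features near entries 2 and 5 should snap back to those entries
features = codebook[[2, 5]] + 0.01 * rng.normal(size=(2, 4))
quantised, idx = vector_quantise(features, codebook)
print(idx)
```

In the paper's setting the codebook is learned jointly with the encoder; the sampled entries are then composed into the segmentation output.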

3.
JMIR Res Protoc ; 13: e51614, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38941147

ABSTRACT

BACKGROUND: Artificial intelligence (AI) medical devices have the potential to transform existing clinical workflows and ultimately improve patient outcomes. AI medical devices have shown potential for a range of clinical tasks such as diagnostics, prognostics, and therapeutic decision-making such as drug dosing. There is, however, an urgent need to ensure that these technologies remain safe for all populations. Recent literature demonstrates the need for rigorous performance error analysis to identify issues such as algorithmic encoding of spurious correlations (eg, protected characteristics) or specific failure modes that may lead to patient harm. Guidelines for reporting on studies that evaluate AI medical devices require the mention of performance error analysis; however, there is still a lack of understanding around how performance errors should be analyzed in clinical studies, and what harms authors should aim to detect and report. OBJECTIVE: This systematic review will assess the frequency and severity of AI errors and adverse events (AEs) in randomized controlled trials (RCTs) investigating AI medical devices as interventions in clinical settings. The review will also explore how performance errors are analyzed including whether the analysis includes the investigation of subgroup-level outcomes. METHODS: This systematic review will identify and select RCTs assessing AI medical devices. Search strategies will be deployed in MEDLINE (Ovid), Embase (Ovid), Cochrane CENTRAL, and clinical trial registries to identify relevant papers. RCTs identified in bibliographic databases will be cross-referenced with clinical trial registries. The primary outcomes of interest are the frequency and severity of AI errors, patient harms, and reported AEs. Quality assessment of RCTs will be based on version 2 of the Cochrane risk-of-bias tool (RoB2). 
Data analysis will include a comparison of error rates and patient harms between study arms, and a meta-analysis of the rates of patient harm in control versus intervention arms will be conducted if appropriate. RESULTS: The project was registered on PROSPERO in February 2023. Preliminary searches have been completed and the search strategy has been designed in consultation with an information specialist and methodologist. Title and abstract screening started in September 2023. Full-text screening is ongoing and data collection and analysis began in April 2024. CONCLUSIONS: Evaluations of AI medical devices have shown promising results; however, reporting of studies has been variable. Detection, analysis, and reporting of performance errors and patient harms is vital to robustly assess the safety of AI medical devices in RCTs. Scoping searches have illustrated that the reporting of harms is variable, often with no mention of AEs. The findings of this systematic review will identify the frequency and severity of AI performance errors and patient harms and generate insights into how errors should be analyzed to account for both overall and subgroup performance. TRIAL REGISTRATION: PROSPERO CRD42023387747; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=387747. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): PRR1-10.2196/51614.


Subject(s)
Algorithms; Artificial Intelligence; Randomized Controlled Trials as Topic; Humans; Randomized Controlled Trials as Topic/methods; Systematic Reviews as Topic; Patient Harm/prevention & control; Equipment and Supplies/adverse effects; Equipment and Supplies/standards; Research Design
4.
Article in English | MEDLINE | ID: mdl-38740720

ABSTRACT

PURPOSE: Automated prostate disease classification on multi-parametric MRI has recently shown promising results with the use of convolutional neural networks (CNNs). The vision transformer (ViT) is a convolution-free architecture which only exploits the self-attention mechanism and has surpassed CNNs in some natural imaging classification tasks. However, these models are not very robust to textural shifts in the input space. In MRI, we often have to deal with textural shifts arising from varying acquisition protocols. Here, we focus on the ability of models to generalise well to new magnet strengths for MRI. METHOD: We propose a new framework to improve the robustness of vision transformer-based models for disease classification by constructing discrete representations of the data using vector quantisation. We sample a subset of the discrete representations to form the input into a transformer-based model. We use cross-attention in our transformer model to combine the discrete representations of T2-weighted and apparent diffusion coefficient (ADC) images. RESULTS: We analyse the robustness of our model by training on a 1.5 T scanner and testing on a 3 T scanner, and vice versa. Our approach achieves state-of-the-art performance for classification of lesions on prostate MRI and outperforms various other CNN and transformer-based models in terms of robustness to domain shift and perturbations in the input space. CONCLUSION: We develop a method to improve the robustness of transformer-based disease classification of prostate lesions on MRI using discrete representations of the T2-weighted and ADC images.

6.
Nat Methods ; 21(2): 182-194, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38347140

ABSTRACT

Validation metrics are key for tracking scientific progress and bridging the current chasm between artificial intelligence research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately. Although taking into account the individual strengths, weaknesses and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multistage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides a reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Although focused on biomedical image analysis, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. The work serves to enhance global comprehension of a key topic in image analysis validation.


Subject(s)
Artificial Intelligence
7.
Nat Methods ; 21(2): 195-212, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38347141

ABSTRACT

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint-a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.
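A concrete instance of the class of pitfall Metrics Reloaded guards against: on an imbalanced task, plain accuracy can reward a degenerate classifier that balanced accuracy (the mean of per-class recalls) immediately exposes. A toy sketch, not taken from the paper:

```python
def accuracy(y_true, y_pred):
    """Fraction of correct predictions: sensitive to class prevalence."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls: insensitive to class prevalence."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# 95 negatives, 5 positives; a degenerate model predicts all-negative
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(accuracy(y_true, y_pred))           # 0.95, looks strong
print(balanced_accuracy(y_true, y_pred))  # 0.5, chance level
```

The problem fingerprint formalises exactly these properties (prevalence, target structure, domain interest) so that such mismatches are caught before a metric is chosen.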


Subject(s)
Algorithms; Image Processing, Computer-Assisted; Machine Learning; Semantics
8.
Insights Imaging ; 15(1): 47, 2024 Feb 16.
Article in English | MEDLINE | ID: mdl-38361108

ABSTRACT

OBJECTIVES: MAchine Learning In MyelomA Response (MALIMAR) is an observational clinical study combining "real-world" and clinical trial data, both retrospective and prospective. Images were acquired on three MRI scanners over a 10-year window at two institutions, leading to a need for extensive curation. METHODS: Curation involved image aggregation, pseudonymisation, allocation between project phases, data cleaning, upload to an XNAT repository visible from multiple sites, annotation, incorporation of machine learning research outputs and quality assurance using programmatic methods. RESULTS: A total of 796 whole-body MR imaging sessions from 462 subjects were curated. A major change in scan protocol part way through the retrospective window meant that approximately 30% of available imaging sessions had properties that differed significantly from the remainder of the data. Issues were found with a vendor-supplied clinical algorithm for "composing" whole-body images from multiple imaging stations. Historic weaknesses in a digital video disk (DVD) research archive (already addressed by the mid-2010s) were highlighted by incomplete datasets, some of which could not be completely recovered. The final dataset contained 736 imaging sessions for 432 subjects. Software was written to clean and harmonise data. Implications for the subsequent machine learning activity are considered. CONCLUSIONS: MALIMAR exemplifies the vital role that curation plays in machine learning studies that use real-world data. A research repository such as XNAT facilitates day-to-day management, ensures robustness and consistency and enhances the value of the final dataset. The types of process described here will be vital for future large-scale multi-institutional and multi-national imaging projects. 
CRITICAL RELEVANCE STATEMENT: This article showcases innovative data curation methods using a state-of-the-art image repository platform; such tools will be vital for managing the large multi-institutional datasets required to train and validate generalisable ML algorithms and future foundation models in medical imaging. KEY POINTS: • Heterogeneous data in the MALIMAR study required the development of novel curation strategies. • Correction of multiple problems affecting the real-world data was successful, but implications for machine learning are still being evaluated. • Modern image repositories have rich application programming interfaces enabling data enrichment and programmatic QA, making them much more than simple "image marts".

9.
Commun Med (Lond) ; 4(1): 21, 2024 Feb 19.
Article in English | MEDLINE | ID: mdl-38374436

ABSTRACT

BACKGROUND: Breast density is an important risk factor for breast cancer complemented by a higher risk of cancers being missed during screening of dense breasts due to reduced sensitivity of mammography. Automated, deep learning-based prediction of breast density could provide subject-specific risk assessment and flag difficult cases during screening. However, there is a lack of evidence for generalisability across imaging techniques and, importantly, across race. METHODS: This study used a large, racially diverse dataset with 69,697 mammographic studies comprising 451,642 individual images from 23,057 female participants. A deep learning model was developed for four-class BI-RADS density prediction. A comprehensive performance evaluation assessed the generalisability across two imaging techniques, full-field digital mammography (FFDM) and two-dimensional synthetic (2DS) mammography. A detailed subgroup performance and bias analysis assessed the generalisability across participants' race. RESULTS: Here we show that a model trained on FFDM-only achieves a 4-class BI-RADS classification accuracy of 80.5% (79.7-81.4) on FFDM and 79.4% (78.5-80.2) on unseen 2DS data. When trained on both FFDM and 2DS images, the performance increases to 82.3% (81.4-83.0) and 82.3% (81.3-83.1). Racial subgroup analysis shows unbiased performance across Black, White, and Asian participants, despite a separate analysis confirming that race can be predicted from the images with a high accuracy of 86.7% (86.0-87.4). CONCLUSIONS: Deep learning-based breast density prediction generalises across imaging techniques and race. No substantial disparities are found for any subgroup, including races that were never seen during model development, suggesting that density predictions are unbiased.


Women with dense breasts have a higher risk of breast cancer. For dense breasts, it is also more difficult to spot cancer in mammograms, which are the X-ray images commonly used for breast cancer screening. Thus, knowing about an individual's breast density provides important information to doctors and screening participants. This study investigated whether an artificial intelligence algorithm (AI) can be used to accurately determine the breast density by analysing mammograms. The study tested whether such an algorithm performs equally well across different imaging devices, and importantly, across individuals from different self-reported race groups. A large, racially diverse dataset was used to evaluate the algorithm's performance. The results show that there were no substantial differences in the accuracy for any of the groups, providing important assurances that AI can be used safely and ethically for automated prediction of breast density.

10.
ArXiv ; 2024 Feb 23.
Article in English | MEDLINE | ID: mdl-36945687

ABSTRACT

Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation.

11.
Radiol Artif Intell ; 5(6): e230060, 2023 Nov.
Article in English | MEDLINE | ID: mdl-38074789

ABSTRACT

Purpose: To analyze a recently published chest radiography foundation model for the presence of biases that could lead to subgroup performance disparities across biologic sex and race. Materials and Methods: This Health Insurance Portability and Accountability Act-compliant retrospective study used 127 118 chest radiographs from 42 884 patients (mean age, 63 years ± 17 [SD]; 23 623 male, 19 261 female) from the CheXpert dataset that were collected between October 2002 and July 2017. To determine the presence of bias in features generated by a chest radiography foundation model and baseline deep learning model, dimensionality reduction methods together with two-sample Kolmogorov-Smirnov tests were used to detect distribution shifts across sex and race. A comprehensive disease detection performance analysis was then performed to associate any biases in the features to specific disparities in classification performance across patient subgroups. Results: Ten of 12 pairwise comparisons across biologic sex and race showed statistically significant differences in the studied foundation model, compared with four significant tests in the baseline model. Significant differences were found between male and female (P < .001) and Asian and Black (P < .001) patients in the feature projections that primarily capture disease. Compared with average model performance across all subgroups, classification performance on the "no finding" label decreased between 6.8% and 7.8% for female patients, and performance in detecting "pleural effusion" decreased between 10.7% and 11.6% for Black patients. Conclusion: The studied chest radiography foundation model demonstrated racial and sex-related bias, which led to disparate performance across patient subgroups; thus, this model may be unsafe for clinical applications. Keywords: Conventional Radiography, Computer Application-Detection/Diagnosis, Chest Radiography, Bias, Foundation Models. Supplemental material is available for this article. 
Published under a CC BY 4.0 license. See also commentary by Czum and Parr in this issue.
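The bias probe described above — dimensionality reduction followed by two-sample Kolmogorov-Smirnov tests between subgroups — can be sketched with NumPy/SciPy. Synthetic features stand in for the model's embeddings, and the PCA-via-SVD and Bonferroni details are illustrative assumptions, not the paper's exact pipeline:

```python
import numpy as np
from scipy.stats import ks_2samp

def subgroup_shift_test(features, groups, n_components=2, alpha=0.05):
    """KS-test each principal component's projections between two subgroups.

    features: (N, D) embedding matrix; groups: length-N array of 0/1 labels.
    Returns a list of (component, statistic, p_value) for components whose
    distributions differ significantly after Bonferroni correction.
    """
    X = features - features.mean(axis=0)
    # Project onto the top principal components via SVD
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    proj = X @ Vt[:n_components].T
    hits = []
    for k in range(n_components):
        stat, p = ks_2samp(proj[groups == 0, k], proj[groups == 1, k])
        if p < alpha / n_components:  # Bonferroni-corrected threshold
            hits.append((k, stat, p))
    return hits

rng = np.random.default_rng(1)
g = np.repeat([0, 1], 200)
feats = rng.normal(size=(400, 16))
feats[g == 1, 0] += 1.5  # inject a subgroup shift along one direction
print(subgroup_shift_test(feats, g))
```

In the study, a significant shift along disease-capturing projections was then linked to concrete per-subgroup classification gaps.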

12.
Nat Med ; 29(12): 3044-3049, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37973948

ABSTRACT

Artificial intelligence (AI) has the potential to improve breast cancer screening; however, prospective evidence of the safe implementation of AI into real clinical practice is limited. A commercially available AI system was implemented as an additional reader to standard double reading to flag cases for further arbitration review among screened women. Performance was assessed prospectively in three phases: a single-center pilot rollout, a wider multicenter pilot rollout and a full live rollout. The results showed that, compared to double reading, implementing the AI-assisted additional-reader process could achieve 0.7-1.6 additional cancer detections per 1,000 cases, with 0.16-0.30% additional recalls, 0-0.23% unnecessary recalls and a 0.1-1.9% increase in positive predictive value (PPV) after 7-11% additional human reads of AI-flagged cases (equating to 4-6% additional overall reading workload). The majority of cancerous cases detected by the AI-assisted additional-reader process were invasive (83.3%) and small-sized (≤10 mm, 47.0%). This evaluation suggests that using AI as an additional reader can improve the early detection of breast cancer with relevant prognostic features, with minimal to no unnecessary recalls. Although the AI-assisted additional-reader workflow requires additional reads, the higher PPV suggests that it can increase screening effectiveness.


Subject(s)
Breast Neoplasms; Female; Humans; Artificial Intelligence; Breast Neoplasms/diagnosis; Early Detection of Cancer/methods; Mammography/methods; Observer Variation; Prospective Studies; Retrospective Studies
13.
Nat Commun ; 14(1): 6608, 2023 10 19.
Article in English | MEDLINE | ID: mdl-37857643

ABSTRACT

Image-based prediction models for disease detection are sensitive to changes in data acquisition such as the replacement of scanner hardware or updates to the image processing software. The resulting differences in image characteristics may lead to drifts in clinically relevant performance metrics which could cause harm in clinical decision making, even for models that generalise in terms of area under the receiver-operating characteristic curve. We propose Unsupervised Prediction Alignment, a generic automatic recalibration method that requires no ground truth annotations and only limited amounts of unlabelled example images from the shifted data distribution. We illustrate the effectiveness of the proposed method to detect and correct performance drift in mammography-based breast cancer screening and on publicly available histopathology data. We show that the proposed method can preserve the expected performance in terms of sensitivity/specificity under various realistic scenarios of image acquisition shift, thus offering an important safeguard for clinical deployment.
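The abstract does not spell out the recalibration algorithm, but the underlying idea — adjusting model outputs so their distribution on shifted data matches the distribution seen on reference data, using only unlabelled examples — can be sketched as quantile matching of score distributions. This is an illustrative assumption, not the authors' exact method:

```python
import numpy as np

def align_scores(shifted_scores, reference_scores):
    """Map each score from the shifted domain to the reference-domain score
    at the same quantile (unsupervised: uses no ground-truth labels)."""
    ranks = np.searchsorted(np.sort(shifted_scores), shifted_scores, side="right")
    quantiles = ranks / len(shifted_scores)
    return np.quantile(reference_scores, np.clip(quantiles, 0.0, 1.0))

rng = np.random.default_rng(2)
reference = rng.beta(2, 5, size=5000)     # model scores on the original scanner
shifted = np.clip(reference + 0.2, 0, 1)  # acquisition shift inflates scores
# Only a limited unlabelled batch from the shifted domain is needed
aligned = align_scores(rng.choice(shifted, 500), reference)
# After alignment, a fixed operating threshold keeps its intended meaning
print(float(np.mean(shifted > 0.5)), float(np.mean(aligned > 0.5)))
```

The point of the sketch is the operating threshold: before alignment the shift inflates the recall rate at a fixed cutoff, while after alignment the exceedance rate returns to roughly its reference value, which is the sensitivity/specificity-preserving behaviour the paper targets.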


Subject(s)
Breast Neoplasms; Mammography; Humans; Female; Mammography/methods; Breast Neoplasms/diagnostic imaging; Sensitivity and Specificity; ROC Curve; Software; Image Processing, Computer-Assisted/methods
14.
IEEE Trans Med Imaging ; 42(11): 3323-3335, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37276115

ABSTRACT

This paper presents an effective and general data augmentation framework for medical image segmentation. We adopt a computationally efficient and data-efficient gradient-based meta-learning scheme to explicitly align the distribution of training and validation data which is used as a proxy for unseen test data. We improve the current data augmentation strategies with two core designs. First, we learn class-specific training-time data augmentation (TRA) effectively increasing the heterogeneity within the training subsets and tackling the class imbalance common in segmentation. Second, we jointly optimize TRA and test-time data augmentation (TEA), which are closely connected as both aim to align the training and test data distribution but were so far considered separately in previous works. We demonstrate the effectiveness of our method on four medical image segmentation tasks across different scenarios with two state-of-the-art segmentation models, DeepMedic and nnU-Net. Extensive experimentation shows that the proposed data augmentation framework can significantly and consistently improve the segmentation performance when compared to existing solutions. Code is publicly available at https://github.com/ZerojumpLine/JCSAugment.
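Test-time data augmentation (TEA), one half of the jointly optimised pair above, is straightforward to sketch: predictions are averaged over transformed copies of the input, with the inverse transform applied to each output before averaging. A minimal flip-based version for segmentation maps (illustrative only; the paper learns the augmentation policy rather than fixing it):

```python
import numpy as np

def tta_predict(model, image):
    """Average segmentation predictions over horizontal/vertical flips.
    Each prediction is un-flipped before averaging so outputs stay aligned."""
    axes = [(), (0,), (1,), (0, 1)]  # identity plus the three flip combinations
    preds = []
    for ax in axes:
        flipped = np.flip(image, axis=ax) if ax else image
        out = model(flipped)
        preds.append(np.flip(out, axis=ax) if ax else out)
    return np.mean(preds, axis=0)

# Toy "model": thresholds the image, so it is exactly flip-equivariant,
# and TTA must reproduce the single-pass prediction
model = lambda x: (x > 0.5).astype(float)
img = np.random.default_rng(3).random((4, 4))
assert np.allclose(tta_predict(model, img), model(img))
```

For a real network that is not flip-equivariant, the averaged map differs from any single pass, which is where aligning training-time and test-time augmentation (as the paper proposes) pays off.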

15.
Invest Radiol ; 58(12): 823-831, 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-37358356

ABSTRACT

OBJECTIVES: Whole-body magnetic resonance imaging (WB-MRI) has been demonstrated to be efficient and cost-effective for cancer staging. The study aim was to develop a machine learning (ML) algorithm to improve radiologists' sensitivity and specificity for metastasis detection and reduce reading times. MATERIALS AND METHODS: A retrospective analysis of 438 prospectively collected WB-MRI scans from multicenter Streamline studies (February 2013-September 2016) was undertaken. Disease sites were manually labeled using the Streamline reference standard. Whole-body MRI scans were randomly allocated to training and testing sets. A model for malignant lesion detection was developed based on convolutional neural networks and a 2-stage training strategy. The final algorithm generated lesion probability heat maps. Using a concurrent reader paradigm, 25 radiologists (18 experienced, 7 inexperienced in WB-/MRI) were randomly allocated WB-MRI scans with or without ML support to detect malignant lesions over 2 or 3 reading rounds. Reads were undertaken in the setting of a diagnostic radiology reading room between November 2019 and March 2020. Reading times were recorded by a scribe. Prespecified analysis included sensitivity, specificity, interobserver agreement, and reading time of radiology readers to detect metastases with or without ML support. Reader performance for detection of the primary tumor was also evaluated. RESULTS: Four hundred thirty-three evaluable WB-MRI scans were allocated to algorithm training (245) or radiology testing (50 patients with metastases, from primary colon [n = 117] or lung [n = 71] cancer). Among a total of 562 reads by experienced radiologists over 2 reading rounds, per-patient specificity was 86.2% (ML) and 87.7% (non-ML) (-1.5% difference; 95% confidence interval [CI], -6.4%, 3.5%; P = 0.39). Sensitivity was 66.0% (ML) and 70.0% (non-ML) (-4.0% difference; 95% CI, -13.5%, 5.5%; P = 0.344).
Among 161 reads by inexperienced readers, per-patient specificity in both groups was 76.3% (0% difference; 95% CI, -15.0%, 15.0%; P = 0.613), with sensitivity of 73.3% (ML) and 60.0% (non-ML) (13.3% difference; 95% CI, -7.9%, 34.5%; P = 0.313). Per-site specificity was high (>90%) for all metastatic sites and experience levels. There was high sensitivity for the detection of primary tumors (lung cancer detection rate of 98.6% with and without ML [0.0% difference; 95% CI, -2.0%, 2.0%; P = 1.00], colon cancer detection rate of 89.0% with and 90.6% without ML [-1.7% difference; 95% CI, -5.6%, 2.2%; P = 0.65]). When combining all reads from rounds 1 and 2, reading times fell by 6.2% (95% CI, -22.8%, 10.0%) when using ML. Round 2 read-times fell by 32% (95% CI, 20.8%, 42.8%) compared with round 1. Within round 2, there was a significant decrease in read-time when using ML support, estimated as 286 seconds (or 11%) quicker (P = 0.0281), using regression analysis to account for reader experience, read round, and tumor type. Interobserver variance suggests moderate agreement, Cohen κ = 0.64; 95% CI, 0.47, 0.81 (with ML), and Cohen κ = 0.66; 95% CI, 0.47, 0.81 (without ML). CONCLUSIONS: There was no evidence of a significant difference in per-patient sensitivity and specificity for detecting metastases or the primary tumor using concurrent ML compared with standard WB-MRI. Radiology read-times with or without ML support fell for round 2 reads compared with round 1, suggesting that readers familiarized themselves with the study reading method. During the second reading round, there was a significant reduction in reading time when using ML support.


Subject(s)
Colonic Neoplasms; Lung Neoplasms; Humans; Magnetic Resonance Imaging/methods; Retrospective Studies; Whole Body Imaging/methods; Lung; Lung Neoplasms/diagnostic imaging; Colonic Neoplasms/diagnostic imaging; Sensitivity and Specificity; Diagnostic Tests, Routine
16.
BMJ Open ; 13(5): e069594, 2023 05 23.
Article in English | MEDLINE | ID: mdl-37221026

ABSTRACT

INTRODUCTION: A significant environmental risk factor for neurodegenerative disease is traumatic brain injury (TBI). However, it is not clear how TBI results in ongoing chronic neurodegeneration. Animal studies show that systemic inflammation is signalled to the brain. This can result in sustained and aggressive microglial activation, which in turn is associated with widespread neurodegeneration. We aim to evaluate systemic inflammation as a mediator of ongoing neurodegeneration after TBI. METHODS AND ANALYSIS: TBI-braINFLAMM will combine data already collected from two large prospective TBI studies. The CREACTIVE study, a broad consortium which enrolled >8000 patients with TBI to have CT scans and blood samples in the hyperacute period, has data available from 854 patients. The BIO-AX-TBI study recruited 311 patients to have acute CT scans, longitudinal blood samples and longitudinal MRI brain scans. The BIO-AX-TBI study also has data from 102 healthy and 24 non-TBI trauma controls, comprising blood samples (both control groups) and MRI scans (healthy controls only). All blood samples from BIO-AX-TBI and CREACTIVE have already been tested for neuronal injury markers (GFAP, tau and NfL), and CREACTIVE blood samples have been tested for inflammatory cytokines. We will additionally test inflammatory cytokine levels from the already collected longitudinal blood samples in the BIO-AX-TBI study, as well as matched microdialysate and blood samples taken during the acute period from a subgroup of patients with TBI (n=18). We will use this unique dataset to characterise post-TBI systemic inflammation, and its relationships with injury severity and ongoing neurodegeneration. ETHICS AND DISSEMINATION: Ethical approval for this study has been granted by the London-Camberwell St Giles Research Ethics Committee (17/LO/2066).
Results will be submitted for publication in peer-review journals, presented at conferences and inform the design of larger observational and experimental medicine studies assessing the role and management of post-TBI systemic inflammation.


Subject(s)
Traumatic Brain Injuries, Neurodegenerative Diseases, Animals, Prospective Studies, Brain, Cytokines, Inflammation
17.
BMC Cancer ; 23(1): 460, 2023 May 19.
Article in English | MEDLINE | ID: mdl-37208717

ABSTRACT

BACKGROUND: Double reading (DR) in screening mammography increases cancer detection and lowers recall rates, but has sustainability challenges due to workforce shortages. Artificial intelligence (AI) as an independent reader (IR) in DR may provide a cost-effective solution with the potential to improve screening performance. Evidence for AI to generalise across different patient populations, screening programmes and equipment vendors, however, is still lacking. METHODS: This retrospective study simulated DR with AI as an IR, using data representative of real-world deployments (275,900 cases, 177,882 participants) from four mammography equipment vendors, seven screening sites, and two countries. Non-inferiority and superiority were assessed for relevant screening metrics. RESULTS: DR with AI, compared with human DR, showed at least non-inferior recall rate, cancer detection rate, sensitivity, specificity and positive predictive value (PPV) for each mammography vendor and site, and superior recall rate, specificity, and PPV for some. The simulation indicates that using AI would have increased arbitration rate (3.3% to 12.3%), but could have reduced human workload by 30.0% to 44.8%. CONCLUSIONS: AI has potential as an IR in the DR workflow across different screening programmes, mammography equipment and geographies, substantially reducing human reader workload while maintaining or improving standard of care. TRIAL REGISTRATION: ISRCTN18056078 (20/03/2019; retrospectively registered).
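The double-reading workflow in the abstract above, with AI acting as an independent reader and disagreements sent to human arbitration, can be sketched in a few lines. This is a hedged illustration of the simulation logic only, not the study's actual code; the function names and the per-case decision lists are assumptions.

```python
# Sketch of double reading (DR) with AI as an independent reader (IR).
# Each case gets one human read and one AI read; agreement decides the case,
# disagreement goes to arbitration. True means "recall".

def simulate_double_reading(human_recall, ai_recall):
    """Return per-case outcomes: 'recall', 'no_recall', or 'arbitration'."""
    outcomes = []
    for h, a in zip(human_recall, ai_recall):
        if h == a:
            outcomes.append("recall" if h else "no_recall")
        else:
            outcomes.append("arbitration")
    return outcomes

def workload_reduction(n_cases, outcomes):
    """Fraction of human reads saved versus human-human double reading."""
    # Standard DR: two human reads per case. With AI as the second reader:
    # one human read per case, plus one extra human read per arbitrated case.
    human_reads_standard = 2 * n_cases
    human_reads_ai = n_cases + sum(o == "arbitration" for o in outcomes)
    return 1 - human_reads_ai / human_reads_standard
```

With toy inputs, one disagreement among four cases yields a workload reduction of 37.5%, illustrating how the abstract's reported 30.0%-44.8% reduction trades off against a higher arbitration rate.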


Subject(s)
Breast Neoplasms, Humans, Female, Breast Neoplasms/diagnostic imaging, Mammography, Artificial Intelligence, Retrospective Studies, Early Detection of Cancer, Mass Screening
18.
Ophthalmol Sci ; 3(3): 100294, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37113474

ABSTRACT

Purpose: To study the individual course of retinal changes caused by healthy aging using deep learning. Design: Retrospective analysis of a large data set of retinal OCT images. Participants: A total of 85,709 adults between the ages of 40 and 75 years from whom OCT images were acquired in the scope of the UK Biobank population study. Methods: We created a counterfactual generative adversarial network (GAN), a type of neural network that learns from cross-sectional, retrospective data. It then synthesizes high-resolution counterfactual OCT images and longitudinal time series. These counterfactuals allow visualization and analysis of hypothetical scenarios in which certain characteristics of the imaged subject, such as age or sex, are altered, whereas other attributes, crucially the subject's identity and image acquisition settings, remain fixed. Main Outcome Measures: Using our counterfactual GAN, we investigated subject-specific changes in the retinal layer structure as a function of age and sex. In particular, we measured changes in the retinal nerve fiber layer (RNFL), combined ganglion cell layer plus inner plexiform layer (GCIPL), inner nuclear layer to the inner boundary of the retinal pigment epithelium (INL-RPE), and retinal pigment epithelium (RPE). Results: Our counterfactual GAN is able to smoothly visualize the individual course of retinal aging. Across all counterfactual images, the RNFL, GCIPL, INL-RPE, and RPE changed by -0.1 µm ± 0.1 µm, -0.5 µm ± 0.2 µm, -0.2 µm ± 0.1 µm, and 0.1 µm ± 0.1 µm, respectively, per decade of age. These results agree well with previous studies based on the same cohort from the UK Biobank population study. Beyond population-wide average measures, our counterfactual GAN allows us to explore whether the retinal layers of a given eye will increase in thickness, decrease in thickness, or stagnate as a subject ages. 
Conclusion: This study demonstrates how counterfactual GANs can aid research into retinal aging by generating high-resolution, high-fidelity OCT images, and longitudinal time series. Ultimately, we envision that they will enable clinical experts to derive and explore hypotheses for potential imaging biomarkers for healthy and pathologic aging that can be refined and tested in prospective clinical trials. Financial Disclosures: Proprietary or commercial disclosure may be found after the references.
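The per-decade thickness changes reported in the abstract above come from comparing counterfactual images of the same eye at different ages. A minimal sketch of that downstream computation, assuming layer thicknesses have already been segmented from the synthesized OCT images (the values and function name here are illustrative, not from the study):

```python
# Estimate per-decade thickness change for one eye from counterfactual
# measurements: each (age, thickness) pair would come from a GAN-synthesized
# image of the same eye at a different counterfactual age.
import statistics

def per_decade_change(thickness_by_age):
    """thickness_by_age: list of (age_years, thickness_um) for one eye.
    Returns mean thickness change in µm per decade from pairwise slopes."""
    pairs = sorted(thickness_by_age)
    slopes = []
    for (a0, t0), (a1, t1) in zip(pairs, pairs[1:]):
        slopes.append((t1 - t0) / (a1 - a0) * 10.0)  # scale to per-decade
    return statistics.mean(slopes)
```

For example, a GCIPL measured at 30.0 µm, 29.5 µm, and 29.0 µm at counterfactual ages 40, 50, and 60 gives -0.5 µm per decade, matching the magnitude of the cohort-level GCIPL change reported above.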

19.
IEEE Trans Med Imaging ; 42(6): 1885-1896, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37022408

ABSTRACT

Background samples provide key contextual information for segmenting regions of interest (ROIs). However, they always cover a diverse set of structures, causing difficulties for the segmentation model to learn good decision boundaries with high sensitivity and precision. The issue concerns the highly heterogeneous nature of the background class, resulting in multi-modal distributions. Empirically, we find that neural networks trained with heterogeneous background struggle to map the corresponding contextual samples to compact clusters in feature space. As a result, the distribution over background logit activations may shift across the decision boundary, leading to systematic over-segmentation across different datasets and tasks. In this study, we propose context label learning (CoLab) to improve the context representations by decomposing the background class into several subclasses. Specifically, we train an auxiliary network as a task generator, along with the primary segmentation model, to automatically generate context labels that positively affect the ROI segmentation accuracy. Extensive experiments are conducted on several challenging segmentation tasks and datasets. The results demonstrate that CoLab can guide the segmentation model to map the logits of background samples away from the decision boundary, resulting in significantly improved segmentation accuracy. Code is available at https://github.com/ZerojumpLine/CoLab.
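The core idea above is to decompose the heterogeneous background class into several subclasses so the model learns compact context clusters. CoLab learns those context labels with an auxiliary task-generator network trained jointly with the segmentation model; as a simplified stand-in (not the authors' method), the same decomposition can be illustrated with a basic k-means over background sample features:

```python
# Illustrative stand-in for CoLab's context-label decomposition: split
# background samples into subclasses by clustering their features, so each
# subclass can be supervised as its own context label.
import numpy as np

def context_labels(features, n_subclasses=3, n_iters=10):
    """Assign each background sample a context subclass via basic k-means."""
    feats = np.asarray(features, dtype=float).reshape(len(features), -1)
    # Deterministic init: spread initial centers across the sorted samples.
    order = np.argsort(np.linalg.norm(feats, axis=1))
    idx = np.linspace(0, len(feats) - 1, n_subclasses).astype(int)
    centers = feats[order[idx]].copy()
    for _ in range(n_iters):
        # Assign each sample to its nearest center, then update centers.
        d = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for k in range(n_subclasses):
            if np.any(assign == k):
                centers[k] = feats[assign == k].mean(axis=0)
    return assign
```

In CoLab the labels are instead generated to directly optimize ROI segmentation accuracy, rather than feature compactness alone; this sketch only shows the background-decomposition step.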


Subject(s)
Computer Neural Networks, Semantics, Computer-Assisted Image Processing
20.
Int J Comput Assist Radiol Surg ; 18(10): 1875-1883, 2023 Oct.
Article in English | MEDLINE | ID: mdl-36862365

ABSTRACT

PURPOSE: In curriculum learning, the idea is to train on easier samples first and gradually increase the difficulty, while in self-paced learning, a pacing function defines the speed at which training progresses. While both methods heavily rely on the ability to score the difficulty of data samples, an optimal scoring function is still under exploration. METHODOLOGY: Distillation is a knowledge transfer approach where a teacher network guides a student network by feeding a sequence of random samples. We argue that guiding student networks with an efficient curriculum strategy can improve model generalization and robustness. For this purpose, we design uncertainty-based paced curriculum learning in self-distillation for medical image segmentation. We fuse the prediction uncertainty and annotation boundary uncertainty to develop a novel paced-curriculum distillation (P-CD). We utilize the teacher model to obtain prediction uncertainty, and spatially varying label smoothing with a Gaussian kernel to generate segmentation boundary uncertainty from the annotation. We also investigate the robustness of our method by applying various types and severities of image perturbation and corruption. RESULTS: The proposed technique is validated on two medical datasets, breast ultrasound image segmentation and robot-assisted surgical scene segmentation, and achieves significantly better performance in terms of segmentation accuracy and robustness. CONCLUSION: P-CD improves performance and obtains better generalization and robustness under dataset shift. While curriculum learning requires extensive tuning of hyper-parameters for the pacing function, the scale of the performance improvement outweighs this limitation.
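The boundary-uncertainty step above, deriving soft labels from the annotation via spatially varying label smoothing with a Gaussian kernel, can be sketched as follows. This is a simplified illustration under the assumption that blurring the binary mask is an acceptable proxy for the paper's exact formulation; all function names are made up for the example.

```python
# Sketch of boundary-uncertainty generation: blur a binary annotation mask
# with a separable Gaussian so pixels near the segmentation boundary get
# soft labels, while pixels deep inside/outside a region stay near 1/0.
import numpy as np

def gaussian_kernel1d(sigma, radius):
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def soft_boundary_labels(mask, sigma=1.0):
    """Blur a 2D binary mask with a separable Gaussian to get soft labels."""
    radius = max(1, int(3 * sigma))
    k = gaussian_kernel1d(sigma, radius)
    m = np.asarray(mask, dtype=float)
    # Edge-pad, run the row pass then the column pass, then crop the pad.
    m = np.pad(m, radius, mode="edge")
    m = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, m)
    m = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, m)
    return m[radius:-radius, radius:-radius]
```

The resulting map is highest deep inside the annotated region, intermediate at its boundary, and near zero far outside, which is the uncertainty signal the paper fuses with the teacher's prediction uncertainty.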


Subject(s)
Curriculum, Distillation, Humans, Uncertainty, Learning, Algorithms, Computer-Assisted Image Processing