Search | BVS Regional Portal

1.

Robust prostate disease classification using transformers with discrete representations.

Santhirasekaram, Ainkaran; Winkler, Mathias; Rockall, Andrea; Glocker, Ben.

Int J Comput Assist Radiol Surg ; 2024 May 13.

Article in English | MEDLINE | ID: mdl-38740720

ABSTRACT

PURPOSE: Automated prostate disease classification on multi-parametric MRI has recently shown promising results with the use of convolutional neural networks (CNNs). The vision transformer (ViT) is a convolution-free architecture that exploits only the self-attention mechanism and has surpassed CNNs in some natural image classification tasks. However, these models are not very robust to textural shifts in the input space. In MRI, we often have to deal with textural shifts arising from varying acquisition protocols. Here, we focus on the ability of models to generalise well to new magnet strengths for MRI. METHOD: We propose a new framework to improve the robustness of vision transformer-based models for disease classification by constructing discrete representations of the data using vector quantisation. We sample a subset of the discrete representations to form the input into a transformer-based model. We use cross-attention in our transformer model to combine the discrete representations of T2-weighted and apparent diffusion coefficient (ADC) images. RESULTS: We analyse the robustness of our model by training on data from a 1.5 T scanner and testing on data from a 3 T scanner, and vice versa. Our approach achieves state-of-the-art (SOTA) performance for classification of lesions on prostate MRI and outperforms various other CNN and transformer-based models in terms of robustness to domain shift and perturbations in the input space. CONCLUSION: We develop a method to improve the robustness of transformer-based disease classification of prostate lesions on MRI using discrete representations of the T2-weighted and ADC images.
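
The vector quantisation step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the codebook, the feature vectors, and the function name `quantise` are hypothetical, and a learned codebook inside a neural network would replace the hand-written one here. The sketch only shows the core idea of mapping each continuous feature vector to the index of its nearest codebook entry.

```python
import math


def quantise(vectors, codebook):
    """Return, for each vector, the index of its nearest codebook entry
    (Euclidean distance). Hypothetical helper for illustration only."""
    indices = []
    for v in vectors:
        distances = [math.dist(v, c) for c in codebook]
        indices.append(distances.index(min(distances)))
    return indices


# Assumed toy codebook and features; real ones would be learned from images.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
features = [(0.1, -0.2), (0.9, 1.2), (3.5, 0.4), (1.1, 0.8)]

codes = quantise(features, codebook)          # discrete representation
reconstructed = [codebook[i] for i in codes]  # quantised feature vectors
print(codes)
```

Small perturbations of a feature vector leave its code unchanged as long as the nearest codebook entry stays the same, which is one intuition for why discrete representations might help with robustness.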

2.

TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods.

Collins, Gary S; Moons, Karel G M; Dhiman, Paula; Riley, Richard D; Beam, Andrew L; Van Calster, Ben; Ghassemi, Marzyeh; Liu, Xiaoxuan; Reitsma, Johannes B; van Smeden, Maarten; Boulesteix, Anne-Laure; Camaradou, Jennifer Catherine; Celi, Leo Anthony; Denaxas, Spiros; Denniston, Alastair K; Glocker, Ben; Golub, Robert M; Harvey, Hugh; Heinze, Georg; Hoffman, Michael M; Kengne, André Pascal; Lam, Emily; Lee, Naomi; Loder, Elizabeth W; Maier-Hein, Lena; Mateen, Bilal A; McCradden, Melissa D; Oakden-Rayner, Lauren; Ordish, Johan; Parnell, Richard; Rose, Sherri; Singh, Karandeep; Wynants, Laure; Logullo, Patricia.

BMJ ; 385: e078378, 2024 Apr 16.

Article in English | MEDLINE | ID: mdl-38626948

Subjects

Decision Support Techniques, Statistical Models, Humans, Prognosis, Checklist

3.

Generalisable deep learning method for mammographic density prediction across imaging techniques and self-reported race.

Khara, Galvin; Trivedi, Hari; Newell, Mary S; Patel, Ravi; Rijken, Tobias; Kecskemethy, Peter; Glocker, Ben.

Commun Med (Lond) ; 4(1): 21, 2024 Feb 19.

Article in English | MEDLINE | ID: mdl-38374436

ABSTRACT

BACKGROUND: Breast density is an important risk factor for breast cancer complemented by a higher risk of cancers being missed during screening of dense breasts due to reduced sensitivity of mammography. Automated, deep learning-based prediction of breast density could provide subject-specific risk assessment and flag difficult cases during screening. However, there is a lack of evidence for generalisability across imaging techniques and, importantly, across race. METHODS: This study used a large, racially diverse dataset with 69,697 mammographic studies comprising 451,642 individual images from 23,057 female participants. A deep learning model was developed for four-class BI-RADS density prediction. A comprehensive performance evaluation assessed the generalisability across two imaging techniques, full-field digital mammography (FFDM) and two-dimensional synthetic (2DS) mammography. A detailed subgroup performance and bias analysis assessed the generalisability across participants' race. RESULTS: Here we show that a model trained on FFDM-only achieves a 4-class BI-RADS classification accuracy of 80.5% (79.7-81.4) on FFDM and 79.4% (78.5-80.2) on unseen 2DS data. When trained on both FFDM and 2DS images, the performance increases to 82.3% (81.4-83.0) and 82.3% (81.3-83.1). Racial subgroup analysis shows unbiased performance across Black, White, and Asian participants, despite a separate analysis confirming that race can be predicted from the images with a high accuracy of 86.7% (86.0-87.4). CONCLUSIONS: Deep learning-based breast density prediction generalises across imaging techniques and race. No substantial disparities are found for any subgroup, including races that were never seen during model development, suggesting that density predictions are unbiased.

Women with dense breasts have a higher risk of breast cancer. For dense breasts, it is also more difficult to spot cancer in mammograms, which are the X-ray images commonly used for breast cancer screening. Thus, knowing about an individual's breast density provides important information to doctors and screening participants. This study investigated whether an artificial intelligence (AI) algorithm can be used to accurately determine breast density by analysing mammograms. The study tested whether such an algorithm performs equally well across different imaging devices and, importantly, across individuals from different self-reported race groups. A large, racially diverse dataset was used to evaluate the algorithm's performance. The results show that there were no substantial differences in accuracy for any of the groups, providing important assurances that AI can be used safely and ethically for automated prediction of breast density.

4.

Curation of myeloma observational study MALIMAR using XNAT: solving the challenges posed by real-world data.

Doran, Simon J; Barfoot, Theo; Wedlake, Linda; Winfield, Jessica M; Petts, James; Glocker, Ben; Li, Xingfeng; Leach, Martin; Kaiser, Martin; Barwick, Tara D; Chaidos, Aristeidis; Satchwell, Laura; Soneji, Neil; Elgendy, Khalil; Sheeka, Alexander; Wallitt, Kathryn; Koh, Dow-Mu; Messiou, Christina; Rockall, Andrea.

Insights Imaging ; 15(1): 47, 2024 Feb 16.

Article in English | MEDLINE | ID: mdl-38361108

ABSTRACT

OBJECTIVES: MAchine Learning In MyelomA Response (MALIMAR) is an observational clinical study combining "real-world" and clinical trial data, both retrospective and prospective. Images were acquired on three MRI scanners over a 10-year window at two institutions, leading to a need for extensive curation. METHODS: Curation involved image aggregation, pseudonymisation, allocation between project phases, data cleaning, upload to an XNAT repository visible from multiple sites, annotation, incorporation of machine learning research outputs and quality assurance using programmatic methods. RESULTS: A total of 796 whole-body MR imaging sessions from 462 subjects were curated. A major change in scan protocol part way through the retrospective window meant that approximately 30% of available imaging sessions had properties that differed significantly from the remainder of the data. Issues were found with a vendor-supplied clinical algorithm for "composing" whole-body images from multiple imaging stations. Historic weaknesses in a digital video disk (DVD) research archive (already addressed by the mid-2010s) were highlighted by incomplete datasets, some of which could not be completely recovered. The final dataset contained 736 imaging sessions for 432 subjects. Software was written to clean and harmonise data. Implications for the subsequent machine learning activity are considered. CONCLUSIONS: MALIMAR exemplifies the vital role that curation plays in machine learning studies that use real-world data. A research repository such as XNAT facilitates day-to-day management, ensures robustness and consistency and enhances the value of the final dataset. The types of process described here will be vital for future large-scale multi-institutional and multi-national imaging projects. 
CRITICAL RELEVANCE STATEMENT: This article showcases innovative data curation methods using a state-of-the-art image repository platform; such tools will be vital for managing the large multi-institutional datasets required to train and validate generalisable ML algorithms and future foundation models in medical imaging. KEY POINTS: • Heterogeneous data in the MALIMAR study required the development of novel curation strategies. • Correction of multiple problems affecting the real-world data was successful, but implications for machine learning are still being evaluated. • Modern image repositories have rich application programming interfaces enabling data enrichment and programmatic QA, making them much more than simple "image marts".

5.

Understanding metric-related pitfalls in image analysis validation.

Reinke, Annika; Tizabi, Minu D; Baumgartner, Michael; Eisenmann, Matthias; Heckmann-Nötzel, Doreen; Kavur, A Emre; Rädsch, Tim; Sudre, Carole H; Acion, Laura; Antonelli, Michela; Arbel, Tal; Bakas, Spyridon; Benis, Arriel; Buettner, Florian; Cardoso, M Jorge; Cheplygina, Veronika; Chen, Jianxu; Christodoulou, Evangelia; Cimini, Beth A; Farahani, Keyvan; Ferrer, Luciana; Galdran, Adrian; van Ginneken, Bram; Glocker, Ben; Godau, Patrick; Hashimoto, Daniel A; Hoffman, Michael M; Huisman, Merel; Isensee, Fabian; Jannin, Pierre; Kahn, Charles E; Kainmueller, Dagmar; Kainz, Bernhard; Karargyris, Alexandros; Kleesiek, Jens; Kofler, Florian; Kooi, Thijs; Kopp-Schneider, Annette; Kozubek, Michal; Kreshuk, Anna; Kurc, Tahsin; Landman, Bennett A; Litjens, Geert; Madani, Amin; Maier-Hein, Klaus; Martel, Anne L; Meijering, Erik; Menze, Bjoern; Moons, Karel G M; Müller, Henning.

Nat Methods ; 21(2): 182-194, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38347140

ABSTRACT

Validation metrics are key for tracking scientific progress and bridging the current chasm between artificial intelligence research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately. Although taking into account the individual strengths, weaknesses and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multistage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides a reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Although focused on biomedical image analysis, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. The work serves to enhance global comprehension of a key topic in image analysis validation.

Subjects

Artificial Intelligence

6.

Metrics reloaded: recommendations for image analysis validation.

Maier-Hein, Lena; Reinke, Annika; Godau, Patrick; Tizabi, Minu D; Buettner, Florian; Christodoulou, Evangelia; Glocker, Ben; Isensee, Fabian; Kleesiek, Jens; Kozubek, Michal; Reyes, Mauricio; Riegler, Michael A; Wiesenfarth, Manuel; Kavur, A Emre; Sudre, Carole H; Baumgartner, Michael; Eisenmann, Matthias; Heckmann-Nötzel, Doreen; Rädsch, Tim; Acion, Laura; Antonelli, Michela; Arbel, Tal; Bakas, Spyridon; Benis, Arriel; Blaschko, Matthew B; Cardoso, M Jorge; Cheplygina, Veronika; Cimini, Beth A; Collins, Gary S; Farahani, Keyvan; Ferrer, Luciana; Galdran, Adrian; van Ginneken, Bram; Haase, Robert; Hashimoto, Daniel A; Hoffman, Michael M; Huisman, Merel; Jannin, Pierre; Kahn, Charles E; Kainmueller, Dagmar; Kainz, Bernhard; Karargyris, Alexandros; Karthikesalingam, Alan; Kofler, Florian; Kopp-Schneider, Annette; Kreshuk, Anna; Kurc, Tahsin; Landman, Bennett A; Litjens, Geert; Madani, Amin.

Nat Methods ; 21(2): 195-212, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38347141

ABSTRACT

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint: a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.

Subjects

Algorithms, Computer-Assisted Image Processing, Machine Learning, Semantics

7.

Understanding metric-related pitfalls in image analysis validation.

Reinke, Annika; Tizabi, Minu D; Baumgartner, Michael; Eisenmann, Matthias; Heckmann-Nötzel, Doreen; Kavur, A Emre; Rädsch, Tim; Sudre, Carole H; Acion, Laura; Antonelli, Michela; Arbel, Tal; Bakas, Spyridon; Benis, Arriel; Blaschko, Matthew; Buettner, Florian; Cardoso, M Jorge; Cheplygina, Veronika; Chen, Jianxu; Christodoulou, Evangelia; Cimini, Beth A; Collins, Gary S; Farahani, Keyvan; Ferrer, Luciana; Galdran, Adrian; van Ginneken, Bram; Glocker, Ben; Godau, Patrick; Haase, Robert; Hashimoto, Daniel A; Hoffman, Michael M; Huisman, Merel; Isensee, Fabian; Jannin, Pierre; Kahn, Charles E; Kainmueller, Dagmar; Kainz, Bernhard; Karargyris, Alexandros; Karthikesalingam, Alan; Kenngott, Hannes; Kleesiek, Jens; Kofler, Florian; Kooi, Thijs; Kopp-Schneider, Annette; Kozubek, Michal; Kreshuk, Anna; Kurc, Tahsin; Landman, Bennett A; Litjens, Geert; Madani, Amin; Maier-Hein, Klaus.

ArXiv ; 2024 Feb 23.

Article in English | MEDLINE | ID: mdl-36945687

ABSTRACT

Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation.

8.

Risk of Bias in Chest Radiography Deep Learning Foundation Models.

Glocker, Ben; Jones, Charles; Roschewitz, Mélanie; Winzeck, Stefan.

Radiol Artif Intell ; 5(6): e230060, 2023 Nov.

Article in English | MEDLINE | ID: mdl-38074789

ABSTRACT

Purpose: To analyze a recently published chest radiography foundation model for the presence of biases that could lead to subgroup performance disparities across biologic sex and race. Materials and Methods: This Health Insurance Portability and Accountability Act-compliant retrospective study used 127 118 chest radiographs from 42 884 patients (mean age, 63 years ± 17 [SD]; 23 623 male, 19 261 female) from the CheXpert dataset that were collected between October 2002 and July 2017. To determine the presence of bias in features generated by a chest radiography foundation model and baseline deep learning model, dimensionality reduction methods together with two-sample Kolmogorov-Smirnov tests were used to detect distribution shifts across sex and race. A comprehensive disease detection performance analysis was then performed to associate any biases in the features to specific disparities in classification performance across patient subgroups. Results: Ten of 12 pairwise comparisons across biologic sex and race showed statistically significant differences in the studied foundation model, compared with four significant tests in the baseline model. Significant differences were found between male and female (P < .001) and Asian and Black (P < .001) patients in the feature projections that primarily capture disease. Compared with average model performance across all subgroups, classification performance on the "no finding" label decreased between 6.8% and 7.8% for female patients, and performance in detecting "pleural effusion" decreased between 10.7% and 11.6% for Black patients. Conclusion: The studied chest radiography foundation model demonstrated racial and sex-related bias, which led to disparate performance across patient subgroups; thus, this model may be unsafe for clinical applications. Keywords: Conventional Radiography, Computer Application-Detection/Diagnosis, Chest Radiography, Bias, Foundation Models. Supplemental material is available for this article. 
Published under a CC BY 4.0 license. See also commentary by Czum and Parr in this issue.
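
The two-sample Kolmogorov-Smirnov test named in the abstract above compares the empirical distributions of two groups. The sketch below computes only the test statistic (the largest gap between the two empirical cumulative distribution functions), not the P value, and it is not the study's code: the function name `ks_statistic` and the one-dimensional feature values for the two subgroups are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so it is enough to evaluate them at every observed point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Assumed toy values standing in for one projected feature per subgroup.
group_1 = [0.1, 0.2, 0.3, 0.4]
group_2 = [0.35, 0.45, 0.55, 0.65]
print(ks_statistic(group_1, group_2))
```

A statistic near 0 indicates similar distributions and a value near 1 indicates almost no overlap; a significance decision additionally requires the sample sizes.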

9.

Prospective implementation of AI-assisted screen reading to improve early detection of breast cancer.

Ng, Annie Y; Oberije, Cary J G; Ambrózay, Éva; Szabó, Endre; Serfozo, Orsolya; Karpati, Edit; Fox, Georgia; Glocker, Ben; Morris, Elizabeth A; Forrai, Gábor; Kecskemethy, Peter D.

Nat Med ; 29(12): 3044-3049, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37973948

ABSTRACT

Artificial intelligence (AI) has the potential to improve breast cancer screening; however, prospective evidence of the safe implementation of AI into real clinical practice is limited. A commercially available AI system was implemented as an additional reader to standard double reading to flag cases for further arbitration review among screened women. Performance was assessed prospectively in three phases: a single-center pilot rollout, a wider multicenter pilot rollout and a full live rollout. The results showed that, compared to double reading, implementing the AI-assisted additional-reader process could achieve 0.7-1.6 additional cancer detection per 1,000 cases, with 0.16-0.30% additional recalls, 0-0.23% unnecessary recalls and a 0.1-1.9% increase in positive predictive value (PPV) after 7-11% additional human reads of AI-flagged cases (equating to 4-6% additional overall reading workload). The majority of cancerous cases detected by the AI-assisted additional-reader process were invasive (83.3%) and small-sized (≤10 mm, 47.0%). This evaluation suggests that using AI as an additional reader can improve the early detection of breast cancer with relevant prognostic features, with minimal to no unnecessary recalls. Although the AI-assisted additional-reader workflow requires additional reads, the higher PPV suggests that it can increase screening effectiveness.

Subjects

Breast Neoplasms, Female, Humans, Artificial Intelligence, Breast Neoplasms/diagnosis, Early Detection of Cancer/methods, Mammography/methods, Observer Variation, Prospective Studies, Retrospective Studies

10.

Automatic correction of performance drift under acquisition shift in medical image classification.

Roschewitz, Mélanie; Khara, Galvin; Yearsley, Joe; Sharma, Nisha; James, Jonathan J; Ambrózay, Éva; Heroux, Adam; Kecskemethy, Peter; Rijken, Tobias; Glocker, Ben.

Nat Commun ; 14(1): 6608, 2023 Oct 19.

Article in English | MEDLINE | ID: mdl-37857643

ABSTRACT

Image-based prediction models for disease detection are sensitive to changes in data acquisition such as the replacement of scanner hardware or updates to the image processing software. The resulting differences in image characteristics may lead to drifts in clinically relevant performance metrics which could cause harm in clinical decision making, even for models that generalise in terms of area under the receiver-operating characteristic curve. We propose Unsupervised Prediction Alignment, a generic automatic recalibration method that requires no ground truth annotations and only limited amounts of unlabelled example images from the shifted data distribution. We illustrate the effectiveness of the proposed method to detect and correct performance drift in mammography-based breast cancer screening and on publicly available histopathology data. We show that the proposed method can preserve the expected performance in terms of sensitivity/specificity under various realistic scenarios of image acquisition shift, thus offering an important safeguard for clinical deployment.

Subjects

Breast Neoplasms, Mammography, Humans, Female, Mammography/methods, Breast Neoplasms/diagnostic imaging, Sensitivity and Specificity, ROC Curve, Software, Computer-Assisted Image Processing/methods

11.

Joint Optimization of Class-Specific Training- and Test-Time Data Augmentation in Segmentation.

Li, Zeju; Kamnitsas, Konstantinos; Dou, Qi; Qin, Chen; Glocker, Ben.

IEEE Trans Med Imaging ; 42(11): 3323-3335, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37276115

ABSTRACT

This paper presents an effective and general data augmentation framework for medical image segmentation. We adopt a computationally efficient and data-efficient gradient-based meta-learning scheme to explicitly align the distribution of training and validation data which is used as a proxy for unseen test data. We improve the current data augmentation strategies with two core designs. First, we learn class-specific training-time data augmentation (TRA) effectively increasing the heterogeneity within the training subsets and tackling the class imbalance common in segmentation. Second, we jointly optimize TRA and test-time data augmentation (TEA), which are closely connected as both aim to align the training and test data distribution but were so far considered separately in previous works. We demonstrate the effectiveness of our method on four medical image segmentation tasks across different scenarios with two state-of-the-art segmentation models, DeepMedic and nnU-Net. Extensive experimentation shows that the proposed data augmentation framework can significantly and consistently improve the segmentation performance when compared to existing solutions. Code is publicly available at https://github.com/ZerojumpLine/JCSAugment.

12.

Development and Evaluation of Machine Learning in Whole-Body Magnetic Resonance Imaging for Detecting Metastases in Patients With Lung or Colon Cancer: A Diagnostic Test Accuracy Study.

Rockall, Andrea G; Li, Xingfeng; Johnson, Nicholas; Lavdas, Ioannis; Santhakumaran, Shalini; Prevost, A Toby; Punwani, Shonit; Goh, Vicky; Barwick, Tara D; Bharwani, Nishat; Sandhu, Amandeep; Sidhu, Harbir; Plumb, Andrew; Burn, James; Fagan, Aisling; Wengert, Georg J; Koh, Dow-Mu; Reczko, Krystyna; Dou, Qi; Warwick, Jane; Liu, Xinxue; Messiou, Christina; Tunariu, Nina; Boavida, Peter; Soneji, Neil; Johnston, Edward W; Kelly-Morland, Christian; De Paepe, Katja N; Sokhi, Heminder; Wallitt, Kathryn; Lakhani, Amish; Russell, James; Salib, Miriam; Vinnicombe, Sarah; Haq, Adam; Aboagye, Eric O; Taylor, Stuart; Glocker, Ben.

Invest Radiol ; 58(12): 823-831, 2023 Dec 01.

Article in English | MEDLINE | ID: mdl-37358356

ABSTRACT

OBJECTIVES: Whole-body magnetic resonance imaging (WB-MRI) has been demonstrated to be efficient and cost-effective for cancer staging. The study aim was to develop a machine learning (ML) algorithm to improve radiologists' sensitivity and specificity for metastasis detection and reduce reading times. MATERIALS AND METHODS: A retrospective analysis of 438 prospectively collected WB-MRI scans from multicenter Streamline studies (February 2013-September 2016) was undertaken. Disease sites were manually labeled using Streamline reference standard. Whole-body MRI scans were randomly allocated to training and testing sets. A model for malignant lesion detection was developed based on convolutional neural networks and a 2-stage training strategy. The final algorithm generated lesion probability heat maps. Using a concurrent reader paradigm, 25 radiologists (18 experienced, 7 inexperienced in WB-MRI) were randomly allocated WB-MRI scans with or without ML support to detect malignant lesions over 2 or 3 reading rounds. Reads were undertaken in the setting of a diagnostic radiology reading room between November 2019 and March 2020. Reading times were recorded by a scribe. Prespecified analysis included sensitivity, specificity, interobserver agreement, and reading time of radiology readers to detect metastases with or without ML support. Reader performance for detection of the primary tumor was also evaluated. RESULTS: Four hundred thirty-three evaluable WB-MRI scans were allocated to algorithm training (245) or radiology testing (50 patients with metastases, from primary colon [n = 117] or lung [n = 71] cancer). Among a total 562 reads by experienced radiologists over 2 reading rounds, per-patient specificity was 86.2% (ML) and 87.7% (non-ML) (-1.5% difference; 95% confidence interval [CI], -6.4%, 3.5%; P = 0.39). Sensitivity was 66.0% (ML) and 70.0% (non-ML) (-4.0% difference; 95% CI, -13.5%, 5.5%; P = 0.344). 
Among 161 reads by inexperienced readers, per-patient specificity in both groups was 76.3% (0% difference; 95% CI, -15.0%, 15.0%; P = 0.613), with sensitivity of 73.3% (ML) and 60.0% (non-ML) (13.3% difference; 95% CI, -7.9%, 34.5%; P = 0.313). Per-site specificity was high (>90%) for all metastatic sites and experience levels. There was high sensitivity for the detection of primary tumors (lung cancer detection rate of 98.6% with and without ML [0.0% difference; 95% CI, -2.0%, 2.0%; P = 1.00], colon cancer detection rate of 89.0% with and 90.6% without ML [-1.7% difference; 95% CI, -5.6%, 2.2%; P = 0.65]). When combining all reads from rounds 1 and 2, reading times fell by 6.2% (95% CI, -22.8%, 10.0%) when using ML. Round 2 read-times fell by 32% (95% CI, 20.8%, 42.8%) compared with round 1. Within round 2, there was a significant decrease in read-time when using ML support, estimated as 286 seconds (or 11%) quicker (P = 0.0281), using regression analysis to account for reader experience, read round, and tumor type. Interobserver variance suggests moderate agreement: Cohen κ = 0.64 (95% CI, 0.47, 0.81) with ML and Cohen κ = 0.66 (95% CI, 0.47, 0.81) without ML. CONCLUSIONS: There was no evidence of a significant difference in per-patient sensitivity and specificity for detecting metastases or the primary tumor using concurrent ML compared with standard WB-MRI. Radiology read-times with or without ML support fell for round 2 reads compared with round 1, suggesting that readers familiarized themselves with the study reading method. During the second reading round, there was a significant reduction in reading time when using ML support.
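
The interobserver agreement in the abstract above is reported as Cohen κ, which corrects the observed agreement between two readers for the agreement expected by chance. The following is a minimal sketch of the standard formula, κ = (p_o - p_e) / (1 - p_e). The reader calls are invented for illustration and are not study data, and the function name `cohen_kappa` is hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must label the same non-empty case list")
    n = len(ratings_a)
    # Observed agreement: fraction of cases with identical labels.
    observed = sum(x == y for x, y in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is formally
        # undefined, and this sketch returns 1.0 as a convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Assumed toy calls: 1 = metastasis reported, 0 = not reported.
reader_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
reader_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
print(round(cohen_kappa(reader_1, reader_2), 3))
```

In this toy example the readers agree on 8 of 10 cases (p_o = 0.8) while chance agreement is p_e = 0.52, giving κ ≈ 0.583.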

Subjects

Colonic Neoplasms, Lung Neoplasms, Humans, Magnetic Resonance Imaging/methods, Retrospective Studies, Whole Body Imaging/methods, Lung, Lung Neoplasms/diagnostic imaging, Colonic Neoplasms/diagnostic imaging, Sensitivity and Specificity, Routine Diagnostic Tests

13.

Investigating the characteristics and correlates of systemic inflammation after traumatic brain injury: the TBI-BraINFLAMM study.

Li, Lucia M; Heslegrave, Amanda; Soreq, Eyal; Nattino, Giovanni; Rosnati, Margherita; Garbero, Elena; Zimmerman, Karl A; Graham, Neil S N; Moro, Federico; Novelli, Deborah; Gradisek, Primoz; Magnoni, Sandra; Glocker, Ben; Zetterberg, Henrik; Bertolini, Guido; Sharp, David J.

BMJ Open ; 13(5): e069594, 2023 May 23.

Article in English | MEDLINE | ID: mdl-37221026

ABSTRACT

INTRODUCTION: A significant environmental risk factor for neurodegenerative disease is traumatic brain injury (TBI). However, it is not clear how TBI results in ongoing chronic neurodegeneration. Animal studies show that systemic inflammation is signalled to the brain. This can result in sustained and aggressive microglial activation, which in turn is associated with widespread neurodegeneration. We aim to evaluate systemic inflammation as a mediator of ongoing neurodegeneration after TBI. METHODS AND ANALYSIS: TBI-BraINFLAMM will combine data already collected from two large prospective TBI studies. The CREACTIVE study, a broad consortium which enrolled >8000 patients with TBI to have CT scans and blood samples in the hyperacute period, has data available from 854 patients. The BIO-AX-TBI study recruited 311 patients to have acute CT scans, longitudinal blood samples and longitudinal MRI brain scans. The BIO-AX-TBI study also has data from 102 healthy and 24 non-TBI trauma controls, comprising blood samples (both control groups) and MRI scans (healthy controls only). All blood samples from BIO-AX-TBI and CREACTIVE have already been tested for neuronal injury markers (GFAP, tau and NfL), and CREACTIVE blood samples have been tested for inflammatory cytokines. We will additionally test inflammatory cytokine levels from the already collected longitudinal blood samples in the BIO-AX-TBI study, as well as matched microdialysate and blood samples taken during the acute period from a subgroup of patients with TBI (n=18). We will use this unique dataset to characterise post-TBI systemic inflammation, and its relationships with injury severity and ongoing neurodegeneration. ETHICS AND DISSEMINATION: Ethical approval for this study has been granted by the London-Camberwell St Giles Research Ethics Committee (17/LO/2066). 
Results will be submitted for publication in peer-review journals, presented at conferences and inform the design of larger observational and experimental medicine studies assessing the role and management of post-TBI systemic inflammation.

Subjects

Traumatic Brain Injuries, Neurodegenerative Diseases, Animals, Prospective Studies, Brain, Cytokines, Inflammation

14.

Multi-vendor evaluation of artificial intelligence as an independent reader for double reading in breast cancer screening on 275,900 mammograms.

Sharma, Nisha; Ng, Annie Y; James, Jonathan J; Khara, Galvin; Ambrózay, Éva; Austin, Christopher C; Forrai, Gábor; Fox, Georgia; Glocker, Ben; Heindl, Andreas; Karpati, Edit; Rijken, Tobias M; Venkataraman, Vignesh; Yearsley, Joseph E; Kecskemethy, Peter D.

BMC Cancer ; 23(1): 460, 2023 May 19.

Article in English | MEDLINE | ID: mdl-37208717

ABSTRACT

BACKGROUND: Double reading (DR) in screening mammography increases cancer detection and lowers recall rates, but has sustainability challenges due to workforce shortages. Artificial intelligence (AI) as an independent reader (IR) in DR may provide a cost-effective solution with the potential to improve screening performance. Evidence for AI to generalise across different patient populations, screening programmes and equipment vendors, however, is still lacking. METHODS: This retrospective study simulated DR with AI as an IR, using data representative of real-world deployments (275,900 cases, 177,882 participants) from four mammography equipment vendors, seven screening sites, and two countries. Non-inferiority and superiority were assessed for relevant screening metrics. RESULTS: DR with AI, compared with human DR, showed at least non-inferior recall rate, cancer detection rate, sensitivity, specificity and positive predictive value (PPV) for each mammography vendor and site, and superior recall rate, specificity, and PPV for some. The simulation indicates that using AI would have increased arbitration rate (3.3% to 12.3%), but could have reduced human workload by 30.0% to 44.8%. CONCLUSIONS: AI has potential as an IR in the DR workflow across different screening programmes, mammography equipment and geographies, substantially reducing human reader workload while maintaining or improving standard of care. TRIAL REGISTRATION: ISRCTN18056078 (20/03/2019; retrospectively registered).

Subjects

Breast Neoplasms; Humans; Female; Breast Neoplasms/diagnostic imaging; Mammography; Artificial Intelligence; Retrospective Studies; Early Detection of Cancer; Mass Screening
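
The arbitration-rate and workload figures in the abstract above come from simulating double reading with the AI in place of the second human reader. The sketch below illustrates that kind of bookkeeping on made-up cases. It assumes a simple rule that the abstract does not spell out: discordant reads are resolved by one extra human read (arbitration). The `Case` fields and the `simulate` function are hypothetical names, not the study's code.

```python
from dataclasses import dataclass


@dataclass
class Case:
    reader1_recall: bool  # first human reader's decision
    reader2_recall: bool  # second human reader's decision
    ai_recall: bool       # AI decision (hypothetical field)


def simulate(cases, use_ai):
    """Return (arbitration rate, human reads per case) for double reading."""
    arbitrations = 0
    human_reads = 0
    for c in cases:
        second = c.ai_recall if use_ai else c.reader2_recall
        human_reads += 1 if use_ai else 2
        if c.reader1_recall != second:
            arbitrations += 1
            human_reads += 1  # assumed: arbitration costs one extra human read
    n = len(cases)
    return arbitrations / n, human_reads / n


# Ten made-up cases, for illustration only.
cases = (
    [Case(False, False, False)] * 6
    + [Case(True, True, True)]
    + [Case(False, False, True)] * 2
    + [Case(True, False, False)]
)

human_arb, human_reads = simulate(cases, use_ai=False)
ai_arb, ai_reads = simulate(cases, use_ai=True)
workload_reduction = 1 - ai_reads / human_reads
```

In this toy set the AI disagrees with the first reader more often than the second human does, so the arbitration rate rises while the total number of human reads still falls. The abstract reports the same direction of effect.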

15.

Exploring Healthy Retinal Aging with Deep Learning.

Menten, Martin J; Holland, Robbie; Leingang, Oliver; Bogunovic, Hrvoje; Hagag, Ahmed M; Kaye, Rebecca; Riedl, Sophie; Traber, Ghislaine L; Hassan, Osama N; Pawlowski, Nick; Glocker, Ben; Fritsche, Lars G; Scholl, Hendrik P N; Sivaprasad, Sobha; Schmidt-Erfurth, Ursula; Rueckert, Daniel; Lotery, Andrew J.

Ophthalmol Sci ; 3(3): 100294, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37113474

ABSTRACT

Purpose: To study the individual course of retinal changes caused by healthy aging using deep learning. Design: Retrospective analysis of a large data set of retinal OCT images. Participants: A total of 85 709 adults between the age of 40 and 75 years of whom OCT images were acquired in the scope of the UK Biobank population study. Methods: We created a counterfactual generative adversarial network (GAN), a type of neural network that learns from cross-sectional, retrospective data. It then synthesizes high-resolution counterfactual OCT images and longitudinal time series. These counterfactuals allow visualization and analysis of hypothetical scenarios in which certain characteristics of the imaged subject, such as age or sex, are altered, whereas other attributes, crucially the subject's identity and image acquisition settings, remain fixed. Main Outcome Measures: Using our counterfactual GAN, we investigated subject-specific changes in the retinal layer structure as a function of age and sex. In particular, we measured changes in the retinal nerve fiber layer (RNFL), combined ganglion cell layer plus inner plexiform layer (GCIPL), inner nuclear layer to the inner boundary of the retinal pigment epithelium (INL-RPE), and retinal pigment epithelium (RPE). Results: Our counterfactual GAN is able to smoothly visualize the individual course of retinal aging. Across all counterfactual images, the RNFL, GCIPL, INL-RPE, and RPE changed by -0.1 µm ± 0.1 µm, -0.5 µm ± 0.2 µm, -0.2 µm ± 0.1 µm, and 0.1 µm ± 0.1 µm, respectively, per decade of age. These results agree well with previous studies based on the same cohort from the UK Biobank population study. Beyond population-wide average measures, our counterfactual GAN allows us to explore whether the retinal layers of a given eye will increase in thickness, decrease in thickness, or stagnate as a subject ages. 
Conclusion: This study demonstrates how counterfactual GANs can aid research into retinal aging by generating high-resolution, high-fidelity OCT images, and longitudinal time series. Ultimately, we envision that they will enable clinical experts to derive and explore hypotheses for potential imaging biomarkers for healthy and pathologic aging that can be refined and tested in prospective clinical trials. Financial Disclosures: Proprietary or commercial disclosure may be found after the references.

16.

Context Label Learning: Improving Background Class Representations in Semantic Segmentation.

Li, Zeju; Kamnitsas, Konstantinos; Ouyang, Cheng; Chen, Chen; Glocker, Ben.

IEEE Trans Med Imaging ; 42(6): 1885-1896, 2023 06.

Article in English | MEDLINE | ID: mdl-37022408

ABSTRACT

Background samples provide key contextual information for segmenting regions of interest (ROIs). However, they always cover a diverse set of structures, causing difficulties for the segmentation model to learn good decision boundaries with high sensitivity and precision. The issue concerns the highly heterogeneous nature of the background class, resulting in multi-modal distributions. Empirically, we find that neural networks trained with heterogeneous background struggle to map the corresponding contextual samples to compact clusters in feature space. As a result, the distribution over background logit activations may shift across the decision boundary, leading to systematic over-segmentation across different datasets and tasks. In this study, we propose context label learning (CoLab) to improve the context representations by decomposing the background class into several subclasses. Specifically, we train an auxiliary network as a task generator, along with the primary segmentation model, to automatically generate context labels that positively affect the ROI segmentation accuracy. Extensive experiments are conducted on several challenging segmentation tasks and datasets. The results demonstrate that CoLab can guide the segmentation model to map the logits of background samples away from the decision boundary, resulting in significantly improved segmentation accuracy. Code is available at https://github.com/ZerojumpLine/CoLab.

Subjects

Neural Networks, Computer; Semantics; Image Processing, Computer-Assisted

17.

Paced-curriculum distillation with prediction and label uncertainty for image segmentation.

Islam, Mobarakol; Seenivasan, Lalithkumar; Sharan, S P; Viekash, V K; Gupta, Bhavesh; Glocker, Ben; Ren, Hongliang.

Int J Comput Assist Radiol Surg ; 18(10): 1875-1883, 2023 Oct.

Article in English | MEDLINE | ID: mdl-36862365

ABSTRACT

PURPOSE: In curriculum learning, the idea is to train on easier samples first and gradually increase the difficulty, while in self-paced learning, a pacing function defines the speed to adapt the training progress. While both methods heavily rely on the ability to score the difficulty of data samples, an optimal scoring function is still under exploration. METHODOLOGY: Distillation is a knowledge transfer approach where a teacher network guides a student network by feeding a sequence of random samples. We argue that guiding student networks with an efficient curriculum strategy can improve model generalization and robustness. For this purpose, we design an uncertainty-based paced curriculum learning in self-distillation for medical image segmentation. We fuse the prediction uncertainty and annotation boundary uncertainty to develop a novel paced-curriculum distillation (P-CD). We utilize the teacher model to obtain prediction uncertainty and spatially varying label smoothing with Gaussian kernel to generate segmentation boundary uncertainty from the annotation. We also investigate the robustness of our method by applying various types and severity of image perturbation and corruption. RESULTS: The proposed technique is validated on two medical datasets of breast ultrasound image segmentation and robot-assisted surgical scene segmentation and achieved significantly better performance in terms of segmentation and robustness. CONCLUSION: P-CD improves the performance and obtains better generalization and robustness over the dataset shift. While curriculum learning requires extensive tuning of hyper-parameters for pacing function, the level of performance improvement suppresses this limitation.

Subjects

Curriculum; Distillation; Humans; Uncertainty; Learning; Algorithms; Image Processing, Computer-Assisted
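
The abstract above mentions label smoothing with a Gaussian kernel to obtain boundary uncertainty from an annotation. The sketch below shows the general idea on a 1-D binary mask. It is a simplification under stated assumptions, not the authors' implementation: real segmentation masks are 2-D or 3-D, and the function names, kernel radius, sigma, and uncertainty score are all chosen here for illustration.

```python
import math


def gaussian_kernel(radius, sigma):
    """Normalised 1-D Gaussian weights over [-radius, radius]."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_labels(mask, radius=2, sigma=1.0):
    """Convolve a binary mask with the kernel; border values are replicated."""
    kernel = gaussian_kernel(radius, sigma)
    n = len(mask)
    soft = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp the index at the borders
            acc += w * mask[j]
        soft.append(acc)
    return soft


def boundary_uncertainty(soft):
    """Assumed score: 0 for a confident label, 1 when the soft label is 0.5."""
    return [1.0 - abs(2.0 * p - 1.0) for p in soft]


mask = [0, 0, 0, 0, 1, 1, 1, 1]  # made-up annotation with one boundary
soft = smooth_labels(mask)
uncertainty = boundary_uncertainty(soft)
```

Soft labels stay at 0 or 1 far from the boundary and move towards 0.5 next to it, so the uncertainty score peaks at the two positions on either side of the label change.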

18.

Use of Support Vector Machines Approach via ComBat Harmonized Diffusion Tensor Imaging for the Diagnosis and Prognosis of Mild Traumatic Brain Injury: A CENTER-TBI Study.

Siqueira Pinto, Maíra; Winzeck, Stefan; Kornaropoulos, Evgenios N; Richter, Sophie; Paolella, Roberto; Correia, Marta M; Glocker, Ben; Williams, Guy; Vik, Anne; Posti, Jussi P; Haberg, Asta; Stenberg, Jonas; Guns, Pieter-Jan; den Dekker, Arnold J; Menon, David K; Sijbers, Jan; Van Dyck, Pieter; Newcombe, Virginia F J.

J Neurotrauma ; 40(13-14): 1317-1338, 2023 07.

Article in English | MEDLINE | ID: mdl-36974359

ABSTRACT

The prediction of functional outcome after mild traumatic brain injury (mTBI) is challenging. Conventional magnetic resonance imaging (MRI) does not do a good job of explaining the variance in outcome, as many patients with incomplete recovery will have normal-appearing clinical neuroimaging. More advanced quantitative techniques, such as diffusion MRI (dMRI), can detect microstructural changes not otherwise visible, and so may offer a way to improve outcome prediction. In this study, we explore the potential of linear support vector classifiers (linearSVCs) to identify dMRI biomarkers that can predict recovery after mTBI. Simultaneously, the harmonization of fractional anisotropy (FA) and mean diffusivity (MD) via ComBat was evaluated and compared for the classification performances of the linearSVCs. We included dMRI scans of 179 mTBI patients and 85 controls from the Collaborative European NeuroTrauma Effectiveness Research in Traumatic Brain Injury (CENTER-TBI), a multi-center prospective cohort study, up to 21 days post-injury. Patients were dichotomized according to their Extended Glasgow Outcome Scale (GOSE) scores at 6 months into complete (n = 92; GOSE = 8) and incomplete (n = 87; GOSE < 8) recovery. FA and MD maps were registered to a common space and harmonized via the ComBat algorithm. LinearSVCs were applied to distinguish: (1) mTBI patients from controls and (2) mTBI patients with complete from those with incomplete recovery. The linearSVCs were trained on (1) age and sex only, (2) non-harmonized, (3) two-category-harmonized ComBat, and (4) three-category-harmonized ComBat FA and MD images combined with age and sex. White matter FA and MD voxels and regions of interest (ROIs) within the Johns Hopkins University (JHU) atlas were examined. Recursive feature elimination was used to identify the 10% most discriminative voxels or the 10 most discriminative ROIs for each implementation.
mTBI patients displayed significantly higher MD and lower FA values than controls for the discriminative voxels and ROIs. For the analysis between mTBI patients and controls, the three-category-harmonized ComBat FA and MD voxel-wise linearSVC provided significantly higher classification scores (81.4% accuracy, 93.3% sensitivity, 80.3% F1-score, and 0.88 area under the curve [AUC], p < 0.05) compared with the classification based on age and sex only and the ROI approaches (accuracies: 59.8% and 64.8%, respectively). Similar to the analysis between mTBI patients and controls, the three-category-harmonized ComBat FA and MD maps voxelwise approach yields statistically significant prediction scores between mTBI patients with complete and those with incomplete recovery (71.8% specificity, 66.2% F1-score and 0.71 AUC, p < 0.05), which provided a modest increase in the classification score (accuracy: 66.4%) compared with the classification based on age and sex only and ROI-wise approaches (accuracy: 61.4% and 64.7%, respectively). This study showed that ComBat harmonized FA and MD may provide additional information for diagnosis and prognosis of mTBI in a multi-modal machine learning approach. These findings demonstrate that dMRI may assist in the early detection of patients at risk of incomplete recovery from mTBI.

Subjects

Brain Concussion; Brain Injuries, Traumatic; Humans; Brain Concussion/diagnosis; Diffusion Tensor Imaging/methods; Support Vector Machine; Prospective Studies; Prognosis; Anisotropy; Brain/pathology
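
The abstract above dichotomizes patients by GOSE score and reports accuracy, sensitivity, specificity, and F1-score. As a reminder of how these quantities relate, the sketch below applies the stated GOSE rule and computes the metrics from confusion-matrix counts. The counts are made up for illustration and are not taken from the study; the function names are hypothetical.

```python
def dichotomise_gose(score):
    """Rule from the abstract: complete recovery if GOSE = 8, incomplete if GOSE < 8."""
    if not 1 <= score <= 8:
        raise ValueError("GOSE ranges from 1 to 8")
    return "complete" if score == 8 else "incomplete"


def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification scores from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Made-up counts, for illustration only.
metrics = classification_metrics(tp=80, fp=40, tn=60, fn=20)
labels = [dichotomise_gose(s) for s in (8, 7, 5)]
```

Which class counts as "positive" is a convention the abstract does not state, so sensitivity and specificity swap roles if the labels are reversed.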

19.

Algorithmic encoding of protected characteristics in chest X-ray disease detection models.

Glocker, Ben; Jones, Charles; Bernhardt, Mélanie; Winzeck, Stefan.

EBioMedicine ; 89: 104467, 2023 Mar.

Article in English | MEDLINE | ID: mdl-36791660

ABSTRACT

BACKGROUND: It has been rightfully emphasized that the use of AI for clinical decision making could amplify health disparities. An algorithm may encode protected characteristics, and then use this information for making predictions due to undesirable correlations in the (historical) training data. It remains unclear how we can establish whether such information is actually used. Besides the scarcity of data from underserved populations, very little is known about how dataset biases manifest in predictive models and how this may result in disparate performance. This article aims to shed some light on these issues by exploring methodology for subgroup analysis in image-based disease detection models. METHODS: We utilize two publicly available chest X-ray datasets, CheXpert and MIMIC-CXR, to study performance disparities across race and biological sex in deep learning models. We explore test set resampling, transfer learning, multitask learning, and model inspection to assess the relationship between the encoding of protected characteristics and disease detection performance across subgroups. FINDINGS: We confirm subgroup disparities in terms of shifted true and false positive rates which are partially removed after correcting for population and prevalence shifts in the test sets. We find that transfer learning alone is insufficient for establishing whether specific patient information is used for making predictions. The proposed combination of test-set resampling, multitask learning, and model inspection reveals valuable insights about the way protected characteristics are encoded in the feature representations of deep neural networks. INTERPRETATION: Subgroup analysis is key for identifying performance disparities of AI models, but statistical differences across subgroups need to be taken into account when analyzing potential biases in disease detection. 
The proposed methodology provides a comprehensive framework for subgroup analysis enabling further research into the underlying causes of disparities. FUNDING: European Research Council Horizon 2020, UK Research and Innovation.

Subjects

Deep Learning; Humans; X-Rays; Neural Networks, Computer; Algorithms; Radiography
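
The abstract above describes test-set resampling to correct for population and prevalence shifts before comparing subgroups. The sketch below shows one way such a correction could look: each subgroup is resampled with replacement so that all subgroups share the same distribution over (age band, disease label). This is an assumed, simplified scheme, not the authors' procedure; the record fields, function names, and data are made up.

```python
import random
from collections import Counter, defaultdict


def resample_to_target(records, target, n_per_group, seed=0):
    """Resample each subgroup (with replacement) to a shared stratum distribution.

    `target` maps a stratum, here the tuple (age_band, label), to its desired share.
    """
    rng = random.Random(seed)
    pools = defaultdict(list)
    for r in records:
        pools[(r["group"], r["age_band"], r["label"])].append(r)
    resampled = []
    for group in sorted({r["group"] for r in records}):
        for (age_band, label), share in target.items():
            k = round(share * n_per_group)
            resampled += rng.choices(pools[(group, age_band, label)], k=k)
    return resampled


def rates_by_group(records):
    """(true positive rate, false positive rate) per subgroup."""
    counts = defaultdict(Counter)
    for r in records:
        outcome = ("t" if r["pred"] == r["label"] else "f") + ("p" if r["pred"] else "n")
        counts[r["group"]][outcome] += 1
    return {
        g: (c["tp"] / (c["tp"] + c["fn"]), c["fp"] / (c["fp"] + c["tn"]))
        for g, c in counts.items()
    }


def block(group, age_band, label, pred, n):
    return [{"group": group, "age_band": age_band, "label": label, "pred": pred} for _ in range(n)]


# Made-up test set: group B is older and has a higher disease prevalence than group A,
# but within each (age band, label) stratum the model performs identically on both.
records = (
    block("A", "<60", False, False, 54) + block("A", "<60", False, True, 6)
    + block("A", "<60", True, True, 9) + block("A", "<60", True, False, 1)
    + block("A", ">=60", False, False, 16) + block("A", ">=60", False, True, 4)
    + block("A", ">=60", True, True, 7) + block("A", ">=60", True, False, 3)
    + block("B", "<60", False, False, 18) + block("B", "<60", False, True, 2)
    + block("B", "<60", True, True, 9) + block("B", "<60", True, False, 1)
    + block("B", ">=60", False, False, 32) + block("B", ">=60", False, True, 8)
    + block("B", ">=60", True, True, 21) + block("B", ">=60", True, False, 9)
)
target = {("<60", False): 0.4, ("<60", True): 0.1, (">=60", False): 0.3, (">=60", True): 0.2}

raw_rates = rates_by_group(records)
resampled = resample_to_target(records, target, n_per_group=200)
resampled_rates = rates_by_group(resampled)
```

In this toy set the raw true and false positive rates differ between the groups only because of case mix. After resampling, both groups have identical stratum counts, so any remaining gap in `resampled_rates` reflects sampling noise here, and in real data would point to a difference not explained by the chosen strata.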

20.

Better Together: Data Harmonization and Cross-Study Analysis of Abdominal MRI Data From UK Biobank and the German National Cohort.

Gatidis, Sergios; Kart, Turkay; Fischer, Marc; Winzeck, Stefan; Glocker, Ben; Bai, Wenjia; Bülow, Robin; Emmel, Carina; Friedrich, Lena; Kauczor, Hans-Ulrich; Keil, Thomas; Kröncke, Thomas; Mayer, Philipp; Niendorf, Thoralf; Peters, Annette; Pischon, Tobias; Schaarschmidt, Benedikt M; Schmidt, Börge; Schulze, Matthias B; Umutle, Lale; Völzke, Henry; Küstner, Thomas; Bamberg, Fabian; Schölkopf, Bernhard; Rueckert, Daniel.

Invest Radiol ; 58(5): 346-354, 2023 05 01.

Article in English | MEDLINE | ID: mdl-36729536

ABSTRACT

OBJECTIVES: The UK Biobank (UKBB) and German National Cohort (NAKO) are among the largest cohort studies, capturing a wide range of health-related data from the general population, including comprehensive magnetic resonance imaging (MRI) examinations. The purpose of this study was to demonstrate how MRI data from these large-scale studies can be jointly analyzed and to derive comprehensive quantitative image-based phenotypes across the general adult population. MATERIALS AND METHODS: Image-derived features of abdominal organs (volumes of liver, spleen, kidneys, and pancreas; volumes of kidney hilum adipose tissue; and fat fractions of liver and pancreas) were extracted from T1-weighted Dixon MRI data of 17,996 participants of UKBB and NAKO based on quality-controlled deep learning generated organ segmentations. To enable valid cross-study analysis, we first analyzed the data generating process using methods of causal discovery. We subsequently harmonized data from UKBB and NAKO using the ComBat approach for batch effect correction. We finally performed quantile regression on harmonized data across studies providing quantitative models for the variation of image-derived features stratified for sex and dependent on age, height, and weight. RESULTS: Data from 8791 UKBB participants (49.9% female; age, 63 ± 7.5 years) and 9205 NAKO participants (49.1% female, age: 51.8 ± 11.4 years) were analyzed. Analysis of the data generating process revealed direct effects of age, sex, height, weight, and the data source (UKBB vs NAKO) on image-derived features. Correction of data source-related effects resulted in markedly improved alignment of image-derived features between UKBB and NAKO. Cross-study analysis on harmonized data revealed comprehensive quantitative models for the phenotypic variation of abdominal organs across the general adult population. 
CONCLUSIONS: Cross-study analysis of MRI data from UKBB and NAKO as proposed in this work can be helpful for future joint data analyses across cohorts linking genetic, environmental, and behavioral risk factors to MRI-derived phenotypes and provide reference values for clinical diagnostics.

Subjects

Biological Specimen Banks; Magnetic Resonance Imaging; Humans; Female; Male; Magnetic Resonance Imaging/methods; Cohort Studies; Abdomen/diagnostic imaging; United Kingdom
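
The abstract above harmonizes image-derived features across the two cohorts with ComBat. The sketch below is not ComBat itself: it omits covariate preservation and the empirical Bayes shrinkage that define that method, and only illustrates the underlying location-and-scale idea of removing a per-source offset and spread. The function name and the volumes are made up for illustration.

```python
from statistics import mean, pstdev


def harmonise(values, sources):
    """Align each source's mean and spread to the pooled mean and spread."""
    pooled_mean = mean(values)
    pooled_sd = pstdev(values)
    by_source = {}
    for v, s in zip(values, sources):
        by_source.setdefault(s, []).append(v)
    stats = {s: (mean(vs), pstdev(vs)) for s, vs in by_source.items()}
    return [
        pooled_mean + pooled_sd * (v - stats[s][0]) / stats[s][1]
        for v, s in zip(values, sources)
    ]


# Made-up organ volumes in mL from two data sources.
volumes = [1400, 1500, 1600, 1700, 1900, 2100]
sources = ["UKBB", "UKBB", "UKBB", "NAKO", "NAKO", "NAKO"]
harmonised = harmonise(volumes, sources)
```

After this step both sources share the same mean and standard deviation. The approach would also erase genuine differences between cohorts, such as the age difference reported in the abstract, which is one reason covariate-aware methods like ComBat are used instead.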
