1.
J Neurointerv Surg ; 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38302420

ABSTRACT

BACKGROUND: Outlining acutely infarcted tissue on non-contrast CT is a challenging task for which human inter-reader agreement is limited. We explored two different methods for training a supervised deep learning algorithm: one that used a segmentation defined by majority vote among experts and another that trained randomly on separate individual expert segmentations. METHODS: The data set consisted of 260 non-contrast CT studies in 233 patients with acute ischemic stroke recruited from the multicenter DEFUSE 3 (Endovascular Therapy Following Imaging Evaluation for Ischemic Stroke 3) trial. Additional external validation was performed using 33 patients with matched stroke onset times from the University Hospital Lausanne. A benchmark U-Net was trained on the reference annotations of three experienced neuroradiologists to segment ischemic brain tissue using the majority vote and random expert sampling training schemes. The medians of the volume, overlap, and distance segmentation metrics were determined for agreement in lesion segmentations between (1) the three experts, (2) the majority model and each expert, and (3) the random model and each expert. The two-sided Wilcoxon signed-rank test was used to compare performances (1) to (2) and (1) to (3). We further compared volumes with the 24 hour follow-up diffusion weighted imaging (DWI, final infarct core) and correlations with clinical outcome (modified Rankin Scale (mRS) at 90 days) using the Spearman method. RESULTS: The random model outperformed both the inter-expert agreement ((1) to (2)) and the majority model ((1) to (3)) (Dice 0.51±0.04 vs 0.36±0.05 (P<0.0001) vs 0.45±0.05 (P<0.0001)). The volume predicted by the random model correlated with clinical outcome (0.19, P<0.05), whereas the median expert volume and the majority model volume did not. There was no significant difference when comparing the volume correlations of the random model, the median expert volume, and the majority model with the 24 hour follow-up DWI volume (P>0.05, n=51). CONCLUSION: The random model for ischemic injury delineation on non-contrast CT surpassed both the inter-expert agreement ((1) to (2)) and the performance of the majority model ((1) to (3)). We showed that the volumetric measures of the random model were consistent with 24 hour follow-up DWI.
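The practical difference between the two training schemes comes down to how the target mask is chosen for each training sample. Below is a minimal Python sketch of that choice, assuming three binary expert masks per study stored as NumPy arrays; the function names are illustrative and not taken from the study's pipeline.

import numpy as np

def majority_vote_target(expert_masks: list[np.ndarray]) -> np.ndarray:
    """Majority model: the target is the voxel-wise majority of the experts."""
    stacked = np.stack(expert_masks, axis=0)      # shape (n_experts, ...)
    votes_needed = stacked.shape[0] // 2 + 1      # e.g. 2 of 3 experts
    return (stacked.sum(axis=0) >= votes_needed).astype(np.uint8)

def random_expert_target(expert_masks: list[np.ndarray],
                         rng: np.random.Generator) -> np.ndarray:
    """Random model: each training sample uses one randomly drawn expert mask."""
    idx = rng.integers(len(expert_masks))
    return expert_masks[idx]

# Toy example: three synthetic "expert segmentations" of the same slice
rng = np.random.default_rng(0)
experts = [(rng.random((4, 4)) > 0.5).astype(np.uint8) for _ in range(3)]
print(majority_vote_target(experts))
print(random_expert_target(experts, rng))

Drawing a different expert at each sampling step exposes the network to the full inter-expert variability instead of a single consensus mask, which is the behavior the random scheme relies on.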

2.
Med Image Anal ; 90: 102927, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37672900

ABSTRACT

Performance metrics for medical image segmentation models measure the agreement between the reference annotation and the predicted segmentation. Usually, overlap metrics such as the Dice coefficient are used to evaluate the performance of these models so that results are comparable. However, there is a mismatch between the distribution of cases and the difficulty of segmentation tasks in public data sets compared to clinical practice. Common metrics fail to capture the impact of this mismatch, particularly for clinical data sets that involve challenging segmentation tasks, pathologies with low signal, and reference annotations that are uncertain, small, or empty. These limitations of common metrics may result in ineffective machine learning research when designing and optimizing models. To effectively evaluate the clinical value of such models, it is essential to consider factors such as the uncertainty associated with reference annotations, the ability to accurately measure performance regardless of the size of the reference annotation volume, and the classification of cases where reference annotations are empty. We study how uncertain, small, and empty reference annotations influence the value of metrics on an in-house stroke data set, regardless of the model. We examine metric behavior on the predictions of a standard deep learning framework in order to identify suitable metrics in such a setting. We compare our results to the BRATS 2019 and Spinal Cord public data sets. We show how uncertain, small, or empty reference annotations require a rethinking of the evaluation. The evaluation code was released to encourage further analysis of this topic: https://github.com/SophieOstmeier/UncertainSmallEmpty.git.
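The fragility described above is easy to reproduce with the plain Dice coefficient. The following sketch is illustrative only (it is not the released evaluation code linked above) and assumes binary NumPy masks: a one-voxel prediction that narrowly misses a one-voxel reference scores zero, and the empty-reference case forces an explicit convention.

import numpy as np

def dice(pred: np.ndarray, ref: np.ndarray, empty_value: float = 1.0) -> float:
    """Dice = 2|P∩R| / (|P|+|R|); returns `empty_value` when both masks are empty."""
    intersection = np.logical_and(pred, ref).sum()
    denom = pred.sum() + ref.sum()
    if denom == 0:                 # empty reference and empty prediction
        return empty_value         # the convention must be chosen explicitly
    return 2.0 * intersection / denom

ref_small = np.zeros((10, 10), dtype=bool)
ref_small[5, 5] = True             # a 1-voxel reference lesion
pred = np.zeros((10, 10), dtype=bool)
pred[5, 6] = True                  # prediction off by one voxel
print(dice(pred, ref_small))       # 0.0 despite a near-miss
print(dice(np.zeros((10, 10), bool), np.zeros((10, 10), bool)))  # 1.0 by convention

The hard-coded return value for the empty-versus-empty case is exactly the kind of arbitrary choice that changes aggregate scores on data sets with many empty references.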

3.
Sci Rep ; 13(1): 16153, 2023 09 26.
Article in English | MEDLINE | ID: mdl-37752162

ABSTRACT

We determined whether a convolutional neural network (CNN) deep learning model can accurately segment acute ischemic changes on non-contrast CT compared to neuroradiologists. Non-contrast CT (NCCT) examinations from 232 acute ischemic stroke patients who were enrolled in the DEFUSE 3 trial were included in this study. Three experienced neuroradiologists independently segmented the hypodensity reflecting the ischemic core on each scan. The neuroradiologist with the most experience (expert A) served as the ground truth for deep learning model training. Segmentations from two additional neuroradiologists (experts B and C) were used for testing. The 232 studies were randomly split into training and test sets. The training set was further randomly divided into 5 folds with training and validation sets. A 3-dimensional CNN architecture was trained and optimized to predict the segmentations of expert A from NCCT. The performance of the model was assessed using a set of volume, overlap, and distance metrics with non-inferiority thresholds of 20%, 3 ml, and 3 mm, respectively. The optimized model trained on expert A was compared to test experts B and C. We used a one-sided Wilcoxon signed-rank test to test for non-inferiority of the model-expert agreement compared to the inter-expert agreement. The final model trained on expert A reached 0.46 ± 0.09 Surface Dice at Tolerance 5 mm and 0.47 ± 0.13 Dice for the ischemic core segmentation task. Compared to the two test neuroradiologists, the model-expert agreement was non-inferior to the inter-expert agreement, [Formula: see text]. Therefore, the CNN accurately delineates the hypodense ischemic core on NCCT in acute ischemic stroke patients with an accuracy comparable to neuroradiologists.
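For context, a one-sided Wilcoxon signed-rank non-inferiority comparison of per-case agreement scores can be set up as in the sketch below; the data are synthetic and the margin is an assumed value for illustration, not the study's thresholds.

import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(1)
inter_expert_dice = rng.uniform(0.3, 0.6, size=40)                 # toy paired per-case scores
model_expert_dice = inter_expert_dice + rng.normal(0.02, 0.05, size=40)

margin = 0.05  # assumed non-inferiority margin, not taken from the paper
# H0: the model is worse than the inter-expert agreement by at least `margin`
# H1: model-expert agreement is non-inferior (difference > -margin)
diff = model_expert_dice - inter_expert_dice + margin
stat, p_value = wilcoxon(diff, alternative="greater")
print(f"one-sided p = {p_value:.4f}")

A small p-value here would support non-inferiority under the chosen margin; the study applies this kind of paired, one-sided comparison separately to its volume, overlap, and distance metrics.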


Subject(s)
Deep Learning , Ischemic Stroke , Stroke , Humans , Ischemic Stroke/diagnostic imaging , Neural Networks, Computer , Radiologists , Tomography, X-Ray Computed , Stroke/diagnostic imaging