Search | VHL Regional Portal

1.

Limitations of Out-of-Distribution Detection in 3D Medical Image Segmentation.

Vasiliuk, Anton; Frolova, Daria; Belyaev, Mikhail; Shirokikh, Boris.

J Imaging ; 9(9)2023 Sep 18.

Article in English | MEDLINE | ID: mdl-37754955

ABSTRACT

Deep learning models perform unreliably when the data come from a distribution different from the training one. In critical applications such as medical imaging, out-of-distribution (OOD) detection methods help to identify such data samples, preventing erroneous predictions. In this paper, we further investigate OOD detection effectiveness when applied to 3D medical image segmentation. We designed several OOD challenges representing clinically occurring cases and found that none of the methods achieved acceptable performance. Methods not dedicated to segmentation severely failed to perform in the designed setups; the best mean false-positive rate at a 95% true-positive rate (FPR) was 0.59. Segmentation-dedicated methods still achieved suboptimal performance, with the best mean FPR being 0.31 (lower is better). To indicate this suboptimality, we developed a simple method called Intensity Histogram Features (IHF), which performed comparably or better in the same challenges, with a mean FPR of 0.25. Our findings highlight the limitations of the existing OOD detection methods with 3D medical images and present a promising avenue for improving them. To facilitate research in this area, we release the designed challenges as a publicly available benchmark and formulate practical criteria to test the generalization of OOD detection beyond the suggested benchmark. We also propose IHF as a solid baseline to contest emerging methods.
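The metric quoted in the abstract above, the false-positive rate at a 95% true-positive rate, can be sketched as follows. This is a minimal illustration, not the authors' evaluation code. It assumes that OOD samples are the positive class and that a higher score means "more likely OOD"; the function name `fpr_at_tpr` and the example scores are hypothetical.

```python
import math


def fpr_at_tpr(id_scores, ood_scores, target_tpr=0.95):
    """Fraction of in-distribution samples flagged as OOD at the threshold
    that detects at least `target_tpr` of the OOD samples.

    Assumption: higher score = more likely OOD, and OOD is the positive class.
    """
    if not id_scores or not ood_scores:
        raise ValueError("both score lists must be non-empty")
    ranked = sorted(ood_scores, reverse=True)
    # Number of OOD samples that must be detected to reach the target TPR.
    k = max(1, math.ceil(target_tpr * len(ranked)))
    threshold = ranked[k - 1]
    false_positives = sum(1 for s in id_scores if s >= threshold)
    return false_positives / len(id_scores)


# Hypothetical scores for illustration only.
id_scores = [0.1, 0.2, 0.65, 0.3]
ood_scores = [0.9, 0.8, 0.7, 0.6]
print(fpr_at_tpr(id_scores, ood_scores))
```

Lower values are better: a value of 0.25 means that a quarter of the in-distribution studies would be rejected in order to catch 95% of the OOD ones.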

2.

Interpretable vertebral fracture quantification via anchor-free landmarks localization.

Zakharov, Alexey; Pisov, Maxim; Bukharaev, Alim; Petraikin, Alexey; Morozov, Sergey; Gombolevskiy, Victor; Belyaev, Mikhail.

Med Image Anal ; 83: 102646, 2023 01.

Article in English | MEDLINE | ID: mdl-36279768

ABSTRACT

Vertebral body compression fractures are early signs of osteoporosis. Though these fractures are visible on Computed Tomography (CT) images, they are frequently missed by radiologists in clinical settings. Prior research on automatic vertebral fracture classification has demonstrated reliable quality; however, existing methods provide hard-to-interpret outputs and sometimes fail to process cases with severe abnormalities such as highly pathological vertebrae or scoliosis. We propose a new two-step algorithm to localize the vertebral column in 3D CT images and then detect individual vertebrae and quantify fractures in 2D simultaneously. We train neural networks for both steps using a simple 6-keypoint annotation scheme, which corresponds precisely to the current clinical recommendation. Our algorithm has no exclusion criteria, processes 3D CT in 2 seconds on a single GPU, and provides an interpretable and verifiable output. The method approaches expert-level performance and demonstrates state-of-the-art results in vertebrae 3D localization (the average error is 1 mm), vertebrae 2D detection (precision and recall are 0.99), and fracture identification (ROC AUC at the patient level is up to 0.96). Our anchor-free vertebra detection network shows excellent generalizability on a new domain by achieving ROC AUC 0.95, sensitivity 0.85, specificity 0.9 on a challenging VerSe dataset with many unseen vertebra types.

Subject(s)

Spinal Fractures , Humans , Spinal Fractures/diagnostic imaging

3.

Adaptation to CT Reconstruction Kernels by Enforcing Cross-Domain Feature Maps Consistency.

Shimovolos, Stanislav; Shushko, Andrey; Belyaev, Mikhail; Shirokikh, Boris.

J Imaging ; 8(9)2022 Aug 30.

Article in English | MEDLINE | ID: mdl-36135401

ABSTRACT

Deep learning methods provide significant assistance in analyzing coronavirus disease (COVID-19) in chest computed tomography (CT) images, including identification, severity assessment, and segmentation. Although the earlier developed methods address the lack of data and specific annotations, the current goal is to build a robust algorithm for clinical use, having a larger pool of available data. With the larger datasets, the domain shift problem arises, affecting the performance of methods on the unseen data. One of the critical sources of domain shift in CT images is the difference in reconstruction kernels used to generate images from the raw data (sinograms). In this paper, we show a decrease in the COVID-19 segmentation quality of the model trained on the smooth and tested on the sharp reconstruction kernels. Furthermore, we compare several domain adaptation approaches to tackle the problem, such as task-specific augmentation and unsupervised adversarial learning. Finally, we propose the unsupervised adaptation method, called F-Consistency, that outperforms the previous approaches. Our method exploits a set of unlabeled CT image pairs which differ only in reconstruction kernels within every pair. It enforces the similarity of the network's hidden representations (feature maps) by minimizing the mean squared error (MSE) between paired feature maps. We show our method achieving a 0.64 Dice Score on the test dataset with unseen sharp kernels, compared to the 0.56 Dice Score of the baseline model. Moreover, F-Consistency scores 0.80 Dice Score between predictions on the paired images, which almost doubles the baseline score of 0.46 and surpasses the other methods. We also show F-Consistency to better generalize on the unseen kernels and without the presence of the COVID-19 lesions than the other methods trained on unlabeled data.
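The consistency term described in the abstract above, the mean squared error between feature maps of two images that differ only in reconstruction kernel, can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the F-Consistency implementation: feature maps are represented as flattened lists of activations, the function name `feature_consistency_mse` is hypothetical, and the choice of layers and the training loop are not shown.

```python
def feature_consistency_mse(features_smooth, features_sharp):
    """Mean squared error between two flattened feature maps.

    Assumption: both lists hold activations of the same network layer for
    the same scan reconstructed with two different kernels.
    """
    if len(features_smooth) != len(features_sharp) or not features_smooth:
        raise ValueError("feature maps must be non-empty and of equal size")
    squared_errors = (
        (a - b) ** 2 for a, b in zip(features_smooth, features_sharp)
    )
    return sum(squared_errors) / len(features_smooth)


# Hypothetical activations for one paired image.
smooth = [1.0, 2.0, 3.0]
sharp = [1.0, 2.0, 5.0]
print(feature_consistency_mse(smooth, sharp))
```

In training, a quantity like this would be computed on the network's hidden activations and minimized on unlabeled image pairs, pushing the representation to be insensitive to the kernel.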

4.

Systematic Clinical Evaluation of a Deep Learning Method for Medical Image Segmentation: Radiosurgery Application.

Shirokikh, Boris; Dalechina, Alexandra; Shevtsov, Alexey; Krivov, Egor; Kostjuchenko, Valery; Durgaryan, Amayak; Galkin, Mikhail; Golanov, Andrey; Belyaev, Mikhail.

IEEE J Biomed Health Inform ; 26(7): 3037-3046, 2022 07.

Article in English | MEDLINE | ID: mdl-35213318

ABSTRACT

We systematically evaluate a Deep Learning model in a 3D medical image segmentation task. With our model, we address the flaws of manual segmentation: high inter-rater contouring variability and time consumption of the contouring process. The main extension over the existing evaluations is the careful and detailed analysis that could be further generalized to other medical image segmentation tasks. Firstly, we analyze the changes in the inter-rater detection agreement. We show that the model reduces the number of detection disagreements by [Formula: see text] [Formula: see text]. Secondly, we show that the model improves the inter-rater contouring agreement from [Formula: see text] to [Formula: see text] surface Dice Score [Formula: see text]. Thirdly, we show that the model accelerates the delineation process between [Formula: see text] and [Formula: see text] times [Formula: see text]. Finally, we design the setup of the clinical experiment to either exclude or estimate the evaluation biases, thus preserving the significance of the results. Besides the clinical evaluation, we also share intuitions and practical ideas for building an efficient DL-based model for 3D medical image segmentation.

Subject(s)

Deep Learning , Radiosurgery , Humans , Image Processing, Computer-Assisted/methods , Imaging, Three-Dimensional

5.

Accelerating 3D Medical Image Segmentation by Adaptive Small-Scale Target Localization.

Shirokikh, Boris; Shevtsov, Alexey; Dalechina, Alexandra; Krivov, Egor; Kostjuchenko, Valery; Golanov, Andrey; Gombolevskiy, Victor; Morozov, Sergey; Belyaev, Mikhail.

J Imaging ; 7(2)2021 Feb 13.

Article in English | MEDLINE | ID: mdl-34460634

ABSTRACT

The prevailing approach for three-dimensional (3D) medical image segmentation is to use convolutional networks. Recently, deep learning methods have achieved human-level performance in several important applied problems, such as volumetry for lung-cancer diagnosis or delineation for radiation therapy planning. However, state-of-the-art architectures, such as U-Net and DeepMedic, are computationally heavy and require workstations accelerated with graphics processing units for fast inference. Little research has been conducted on enabling fast central processing unit computations for such networks. Our paper fills this gap. We propose a new segmentation method with a human-like technique to segment a 3D study. First, we analyze the image at a small scale to identify areas of interest and then process only relevant feature-map patches. Our method not only reduces the inference time from 10 min to 15 s but also preserves state-of-the-art segmentation quality, as we illustrate in the set of experiments with two large datasets.

6.

Challenges in Building of Deep Learning Models for Glioblastoma Segmentation: Evidence from Clinical Data.

Kurmukov, Anvar; Dalechina, Aleksandra; Saparov, Talgat; Belyaev, Mikhail; Zolotova, Svetlana; Golanov, Andrey; Nikolaeva, Anna.

Stud Health Technol Inform ; 281: 298-302, 2021 May 27.

Article in English | MEDLINE | ID: mdl-34042753

ABSTRACT

In this article, we compare the performance of a state-of-the-art segmentation network (UNet) on two different glioblastoma (GB) segmentation datasets. Our experiments show that the same training procedure yields results almost twice as bad on the retrospective clinical data as on the BraTS challenge data (in terms of Dice score). We discuss possible reasons for such an outcome, including inter-rater variability and high variability in magnetic resonance imaging (MRI) scanners and scanner settings. The high performance of segmentation models, demonstrated on preselected imaging data, does not bring the community closer to using these algorithms in clinical settings. We believe that a clinically applicable deep learning architecture requires a shift from unified datasets to heterogeneous data.

Subject(s)

Deep Learning , Glioblastoma , Algorithms , Glioblastoma/diagnostic imaging , Humans , Magnetic Resonance Imaging , Retrospective Studies

7.

CT-Based COVID-19 triage: Deep multitask learning improves joint identification and severity quantification.

Goncharov, Mikhail; Pisov, Maxim; Shevtsov, Alexey; Shirokikh, Boris; Kurmukov, Anvar; Blokhin, Ivan; Chernina, Valeria; Solovev, Alexander; Gombolevskiy, Victor; Morozov, Sergey; Belyaev, Mikhail.

Med Image Anal ; 71: 102054, 2021 07.

Article in English | MEDLINE | ID: mdl-33932751

ABSTRACT

The current COVID-19 pandemic overloads healthcare systems, including radiology departments. Though several deep learning approaches were developed to assist in CT analysis, nobody considered study triage directly as a computer science problem. We describe two basic setups: Identification of COVID-19, to prioritize studies of potentially infected patients and isolate them as early as possible; and Severity quantification, to highlight patients with severe COVID-19 and thus direct them to a hospital or provide emergency medical care. We formalize these tasks as binary classification and estimation of affected lung percentage. Though similar problems were well-studied separately, we show that existing methods could provide reasonable quality only for one of these setups. We employ a multitask approach to consolidate both triage approaches and propose a convolutional neural network to leverage all available labels within a single model. In contrast with the related multitask approaches, we show the benefit from applying the classification layers to the most spatially detailed feature map at the upper part of U-Net instead of the less detailed latent representation at the bottom. We train our model on approximately 1500 publicly available CT studies and test it on the holdout dataset that consists of 123 chest CT studies of patients drawn from the same healthcare system, specifically 32 COVID-19 and 30 bacterial pneumonia cases, 30 cases with cancerous nodules, and 31 healthy controls. The proposed multitask model outperforms the other approaches and achieves ROC AUC scores of 0.87±0.01 vs. bacterial pneumonia, 0.93±0.01 vs. cancerous nodules, and 0.97±0.01 vs. healthy controls in Identification of COVID-19, and achieves 0.97±0.01 Spearman Correlation in Severity quantification. We have released our code and shared the annotated lesions masks for 32 CT images of patients with COVID-19 from the test dataset.

Subject(s)

COVID-19 , Deep Learning , Triage , COVID-19/diagnostic imaging , Humans , Pandemics , SARS-CoV-2 , Tomography, X-Ray Computed
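The abstract above reports Severity quantification quality as a Spearman correlation. A minimal sketch of that statistic follows, computed as the Pearson correlation of average ranks. It is an illustration only, not the authors' evaluation code; the function names and the example percentages are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_correlation(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical predicted vs. reference affected lung percentages.
predicted = [5.0, 12.5, 40.0, 22.0]
reference = [4.0, 15.0, 35.0, 30.0]
print(spearman_correlation(predicted, reference))
```

Because only the ordering of studies matters, this statistic rewards a model that ranks patients by severity correctly even if its absolute percentages are biased, which suits a triage setting.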

8.

Standardized Assessment of Automatic Segmentation of White Matter Hyperintensities and Results of the WMH Segmentation Challenge.

Kuijf, Hugo J; Biesbroek, J Matthijs; De Bresser, Jeroen; Heinen, Rutger; Andermatt, Simon; Bento, Mariana; Berseth, Matt; Belyaev, Mikhail; Cardoso, M Jorge; Casamitjana, Adria; Collins, D Louis; Dadar, Mahsa; Georgiou, Achilleas; Ghafoorian, Mohsen; Jin, Dakai; Khademi, April; Knight, Jesse; Li, Hongwei; Llado, Xavier; Luna, Miguel; Mahmood, Qaiser; McKinley, Richard; Mehrtash, Alireza; Ourselin, Sebastien; Park, Bo-Yong; Park, Hyunjin; Park, Sang Hyun; Pezold, Simon; Puybareau, Elodie; Rittner, Leticia; Sudre, Carole H; Valverde, Sergi; Vilaplana, Veronica; Wiest, Roland; Xu, Yongchao; Xu, Ziyue; Zeng, Guodong; Zhang, Jianguo; Zheng, Guoyan; Chen, Christopher; van der Flier, Wiesje; Barkhof, Frederik; Viergever, Max A; Biessels, Geert Jan.

IEEE Trans Med Imaging ; 38(11): 2556-2568, 2019 11.

Article in English | MEDLINE | ID: mdl-30908194

ABSTRACT

Quantification of cerebral white matter hyperintensities (WMH) of presumed vascular origin is of key importance in many neurological research studies. Currently, measurements are often still obtained from manual segmentations on brain MR images, which is a laborious procedure. Automatic WMH segmentation methods exist, but a standardized comparison of the performance of such methods is lacking. We organized a scientific challenge, in which developers could evaluate their methods on a standardized multi-center/-scanner image dataset, giving an objective comparison: the WMH Segmentation Challenge. Sixty T1 + FLAIR images from three MR scanners were released with the manual WMH segmentations for training. A test set of 110 images from five MR scanners was used for evaluation. The segmentation methods had to be containerized and submitted to the challenge organizers. Five evaluation metrics were used to rank the methods: 1) Dice similarity coefficient; 2) modified Hausdorff distance (95th percentile); 3) absolute log-transformed volume difference; 4) sensitivity for detecting individual lesions; and 5) F1-score for individual lesions. In addition, the methods were ranked on their inter-scanner robustness. Twenty participants submitted their methods for evaluation. This paper provides a detailed analysis of the results. In brief, there is a cluster of four methods that rank significantly better than the other methods, with one clear winner. The inter-scanner robustness ranking shows that not all the methods generalize to unseen scanners. The challenge remains open for future submissions and provides a public platform for method evaluation.

Subject(s)

Image Processing, Computer-Assisted/methods , Magnetic Resonance Imaging/methods , White Matter/diagnostic imaging , Aged , Algorithms , Female , Humans , Male , Middle Aged
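The first ranking metric named in the abstract above, the Dice similarity coefficient, can be sketched for binary masks as follows. This is a minimal illustration and not the challenge's evaluation code. Masks are assumed to be flattened sequences of 0/1 voxel labels, the function name `dice_coefficient` is hypothetical, and returning 1.0 when both masks are empty is a convention chosen here.

```python
def dice_coefficient(predicted_mask, reference_mask):
    """Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), for binary masks.

    Assumption: masks are flattened sequences of 0/1 voxel labels.
    """
    if len(predicted_mask) != len(reference_mask):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(
        1 for p, r in zip(predicted_mask, reference_mask) if p and r
    )
    total = sum(1 for p in predicted_mask if p) + sum(
        1 for r in reference_mask if r
    )
    if total == 0:
        # Both masks are empty; treated as perfect agreement (a convention).
        return 1.0
    return 2 * intersection / total


# Hypothetical four-voxel masks.
print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))
```

Dice weights every voxel equally, so it is dominated by large lesions. That is one reason a challenge of this kind also ranks methods on lesion-level sensitivity and F1-score.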

9.

ISLES 2016 and 2017-Benchmarking Ischemic Stroke Lesion Outcome Prediction Based on Multispectral MRI.

Winzeck, Stefan; Hakim, Arsany; McKinley, Richard; Pinto, José A A D S R; Alves, Victor; Silva, Carlos; Pisov, Maxim; Krivov, Egor; Belyaev, Mikhail; Monteiro, Miguel; Oliveira, Arlindo; Choi, Youngwon; Paik, Myunghee Cho; Kwon, Yongchan; Lee, Hanbyul; Kim, Beom Joon; Won, Joong-Ho; Islam, Mobarakol; Ren, Hongliang; Robben, David; Suetens, Paul; Gong, Enhao; Niu, Yilin; Xu, Junshen; Pauly, John M; Lucas, Christian; Heinrich, Mattias P; Rivera, Luis C; Castillo, Laura S; Daza, Laura A; Beers, Andrew L; Arbelaezs, Pablo; Maier, Oskar; Chang, Ken; Brown, James M; Kalpathy-Cramer, Jayashree; Zaharchuk, Greg; Wiest, Roland; Reyes, Mauricio.

Front Neurol ; 9: 679, 2018.

Article in English | MEDLINE | ID: mdl-30271370

ABSTRACT

Performance of models depends heavily not only on the algorithm used but also on the data set it was applied to. This makes the comparison of newly developed tools to previously published approaches difficult. Either researchers need to implement others' algorithms first, to establish an adequate benchmark on their data, or a direct comparison of new and old techniques is infeasible. The Ischemic Stroke Lesion Segmentation (ISLES) challenge, which has now run for 3 consecutive years, aims to address this problem of comparability. ISLES 2016 and 2017 focused on lesion outcome prediction after ischemic stroke: By providing a uniformly pre-processed data set, researchers from all over the world could apply their algorithm directly. A total of nine teams participated in ISLES 2015, and 15 teams participated in ISLES 2016. Their performance was evaluated in a fair and transparent way to identify the state-of-the-art among all submissions. Top ranked teams almost always employed deep learning tools, which were predominantly convolutional neural networks (CNNs). Despite the great efforts, lesion outcome prediction remains challenging. The annotated data set remains publicly available and new approaches can be compared directly via the online evaluation system, serving as a continuing benchmark (www.isles-challenge.org).
