ABSTRACT
BACKGROUND: UNet has achieved great success in medical image segmentation. However, due to the inherent locality of convolution operations, UNet is deficient in capturing global features and long-range dependencies of polyps, resulting in less accurate recognition of polyps with complex morphologies and backgrounds. Transformers, which process images as token sequences, are better at perceiving global features but lack low-level detail, which limits their localization ability. If the advantages of both architectures can be effectively combined, the accuracy of polyp segmentation can be further improved. METHODS: In this paper, we propose an attention and convolution-augmented UNet-Transformer network (ACU-TransNet) for polyp segmentation. The network is composed of a comprehensive attention UNet and a Transformer head, connected sequentially by a bridge layer. On the one hand, the comprehensive attention UNet enhances specific feature extraction through deformable convolution and channel attention in the first layer of the encoder and achieves more accurate shape extraction through spatial attention and channel attention in the decoder. On the other hand, the Transformer head supplements fine-grained information through convolutional attention and acquires hierarchical global characteristics from the feature maps. RESULTS: ACU-TransNet can comprehensively learn dataset features and enhance colonoscopy interpretability for polyp detection. CONCLUSION: Experimental results on the CVC-ClinicDB and Kvasir-SEG datasets demonstrate that ACU-TransNet outperforms existing state-of-the-art methods, showcasing its robustness.
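As an editorial illustration of the channel-attention component named above (the abstract does not specify the exact design, so the module, its name, and the reduction ratio are assumptions), here is a minimal squeeze-and-excitation-style sketch in PyTorch:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention: global average pooling
    followed by a two-layer bottleneck MLP that rescales each channel."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))   # (B, C) per-channel descriptors
        return x * w.view(b, c, 1, 1)     # rescale each feature map

# toy usage on a feature map
feats = torch.randn(2, 64, 32, 32)
print(ChannelAttention(64)(feats).shape)  # torch.Size([2, 64, 32, 32])
```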
ABSTRACT
Colonoscopy is widely recognized as the most effective method for the detection of colon polyps, which is crucial for early screening of colorectal cancer. Polyp identification and segmentation in colonoscopy images require specialized medical knowledge and are often labor-intensive and expensive. Deep learning provides an intelligent and efficient approach for polyp segmentation. However, the variability in polyp size and the heterogeneity of polyp boundaries and interiors pose challenges for accurate segmentation. Transformer-based methods have become a mainstream trend for polyp segmentation, but they tend to overlook local details due to the inherent characteristics of the Transformer architecture, leading to inferior results. Moreover, the computational burden of self-attention mechanisms hinders the practical application of these models. To address these issues, we propose a novel CNN-Transformer hybrid model for polyp segmentation (CTHP). CTHP combines the strengths of CNNs, which excel at modeling local information, and Transformers, which excel at modeling global semantics, to enhance segmentation accuracy. We factorize the self-attention computation over the entire feature map into the width and height directions, significantly improving computational efficiency. Additionally, we design a new information propagation module and introduce additional positional bias coefficients during the attention computation, which reduces the dispersal of information introduced by deep and mixed feature fusion in the Transformer. Extensive experimental results demonstrate that our proposed model achieves state-of-the-art performance on multiple benchmark datasets for polyp segmentation. Furthermore, cross-domain generalization experiments show that our model exhibits excellent generalization performance.
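The width/height factorization of self-attention described above is not given in detail, so the following is a hedged sketch of generic axial attention (the module name and head count are illustrative); it shows why the cost drops from O((HW)^2) for full 2-D attention to O(HW(H+W)):

```python
import torch
import torch.nn as nn

class AxialAttention(nn.Module):
    """Factorized self-attention: attend along W, then along H, so each token
    only attends within its row and then within its column."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        b, c, h, w = x.shape
        # attend along width: each row is an independent sequence of length W
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        rows, _ = self.row_attn(rows, rows, rows)
        x = rows.reshape(b, h, w, c)
        # attend along height: each column is a sequence of length H
        cols = x.permute(0, 2, 1, 3).reshape(b * w, h, c)
        cols, _ = self.col_attn(cols, cols, cols)
        return cols.reshape(b, w, h, c).permute(0, 3, 2, 1)  # (B, C, H, W)

feats = torch.randn(2, 64, 16, 16)
print(AxialAttention(64)(feats).shape)  # torch.Size([2, 64, 16, 16])
```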
Subject(s)
Colonic Polyps; Colonoscopy; Deep Learning; Humans; Colonic Polyps/pathology; Colonic Polyps/diagnostic imaging; Colonoscopy/methods; Colorectal Neoplasms/pathology; Colorectal Neoplasms/diagnostic imaging; Neural Networks, Computer; Image Processing, Computer-Assisted/methods; Algorithms
ABSTRACT
Colorectal cancer remains a leading cause of cancer-related deaths worldwide, with early detection and removal of polyps being critical in preventing disease progression. Automated polyp segmentation, particularly in colonoscopy images, is a challenging task due to the variability in polyp appearance and the low contrast between polyps and surrounding tissues. In this work, we propose an edge-enhanced network (EENet) designed to address these challenges by integrating two novel modules: the covariance edge-enhanced attention (CEEA) and cross-scale edge enhancement (CSEE) modules. The CEEA module leverages covariance-based attention to enhance boundary detection, while the CSEE module bridges multi-scale features to preserve fine-grained edge details. To further improve the accuracy of polyp segmentation, we introduce a hybrid loss function that combines cross-entropy loss with edge-aware loss. Extensive experiments show that the EENet achieves a Dice score of 0.9208 and an IoU of 0.8664 on the Kvasir-SEG dataset, surpassing state-of-the-art models such as Polyp-PVT and PraNet. Furthermore, it records a Dice score of 0.9316 and an IoU of 0.8817 on the CVC-ClinicDB dataset, demonstrating its strong potential for clinical application in polyp segmentation. Ablation studies further validate the contribution of the CEEA and CSEE modules.
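EENet's exact edge-aware term is not specified in the abstract; the following is a plausible minimal sketch of such a hybrid loss, assuming a Sobel-gradient boundary penalty added to binary cross-entropy (the `edge_weight` coefficient is an assumed hyperparameter):

```python
import torch
import torch.nn.functional as F

def sobel_edges(mask: torch.Tensor) -> torch.Tensor:
    """Approximate boundary map of a (B, 1, H, W) mask via Sobel gradients."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=mask.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(mask, kx, padding=1)
    gy = F.conv2d(mask, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def hybrid_loss(logits, target, edge_weight=0.5):
    """Cross-entropy on the region plus an L1 penalty on boundary maps."""
    bce = F.binary_cross_entropy_with_logits(logits, target)
    pred = torch.sigmoid(logits)
    edge = F.l1_loss(sobel_edges(pred), sobel_edges(target))
    return bce + edge_weight * edge

logits = torch.randn(2, 1, 64, 64)
target = (torch.rand(2, 1, 64, 64) > 0.5).float()
print(hybrid_loss(logits, target).item())
```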
ABSTRACT
Introduction: Colorectal cancer (CRC) is one of the main causes of death worldwide. Early detection and diagnosis of its precursor lesion, the polyp, is key to reducing its mortality and improving procedure efficiency. During the last two decades, several computational methods have been proposed to assist clinicians in detection, segmentation, and classification tasks, but the lack of a common public validation framework makes it difficult to determine which of them are ready to be deployed in the examination room. Methods: This study presents a complete validation framework, and we compare several methodologies for each of the polyp characterization tasks. Results: Results show that the majority of the approaches provide good performance for the detection and segmentation tasks, but that there is room for improvement in polyp classification. Discussion: While the studied methods show promising results in assisting with polyp detection and segmentation, further research is needed on the classification task to obtain results reliable enough to assist clinicians during the procedure. The presented framework provides a standardized method for evaluating and comparing different approaches, which could facilitate the identification of clinically ready assistive methods.
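For concreteness, the overlap metrics such a validation framework typically reports, per-image Dice and IoU for binary masks, can be computed as follows (a generic sketch, not the framework's actual code):

```python
import numpy as np

def dice_iou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7):
    """Per-image Dice and IoU for binary masks (arrays of 0/1)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = (2 * inter + eps) / (pred.sum() + gt.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, gt).sum() + eps)
    return dice, iou

pred = np.zeros((64, 64), dtype=np.uint8); pred[10:40, 10:40] = 1
gt = np.zeros((64, 64), dtype=np.uint8);   gt[15:45, 15:45] = 1
print(dice_iou(pred, gt))
```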
ABSTRACT
Automatic segmentation of polyps from colonoscopy images plays a critical role in the early diagnosis and treatment of colorectal cancer. Nevertheless, some bottlenecks still exist. In our previous work, we mainly focused on polyps with intra-class inconsistency and low contrast, using ICGNet to address them. Due to differences in equipment and in the specific locations and properties of polyps, the color distribution of the collected images is inconsistent. ICGNet was designed primarily around reverse-contour guide information and local-global context information, ignoring this inconsistent color distribution, which leads to overfitting and makes it difficult to focus only on beneficial image content. In addition, a trustworthy segmentation model should not only produce high-precision results but also provide a measure of uncertainty to accompany its predictions, so that physicians can make informed decisions. However, ICGNet gives only the segmentation result and lacks an uncertainty measure. To address these new bottlenecks, we extend the original ICGNet into a comprehensive and effective network (UM-Net) with two main contributions whose practical value has been demonstrated experimentally. First, we employ a color transfer operation to weaken the relationship between color and polyps, making the model focus more on polyp shape. Second, we provide uncertainty estimates to represent the reliability of the segmentation results and use variance to rectify the uncertainty. Our improved method is evaluated on five polyp datasets and shows competitive results compared to other advanced methods in both learning ability and generalization capability. The source code is available at https://github.com/dxqllp/UM-Net.
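The color transfer operation is not pinned down in the abstract; one common choice it could correspond to is Reinhard-style statistics matching in LAB space, sketched below (an assumption, not UM-Net's published implementation):

```python
import numpy as np
import cv2

def color_transfer(src: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Reinhard-style transfer: match per-channel LAB mean/std of src to ref,
    so polyp shape is preserved while the color distribution changes."""
    src_lab = cv2.cvtColor(src, cv2.COLOR_BGR2LAB).astype(np.float32)
    ref_lab = cv2.cvtColor(ref, cv2.COLOR_BGR2LAB).astype(np.float32)
    s_mu, s_std = src_lab.mean((0, 1)), src_lab.std((0, 1)) + 1e-6
    r_mu, r_std = ref_lab.mean((0, 1)), ref_lab.std((0, 1))
    out = (src_lab - s_mu) / s_std * r_std + r_mu
    return cv2.cvtColor(np.clip(out, 0, 255).astype(np.uint8),
                        cv2.COLOR_LAB2BGR)

src = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
ref = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
print(color_transfer(src, ref).shape)  # (64, 64, 3)
```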
ABSTRACT
Automatic analysis of colonoscopy images has been an active field of research motivated by the importance of early detection of precancerous polyps. However, detecting polyps during the live examination can be challenging due to various factors such as variation in skill and experience among endoscopists, lack of attentiveness, and fatigue, leading to a high polyp miss rate. Therefore, there is a need for an automated system that can flag missed polyps during the examination and improve patient care. Deep learning has emerged as a promising solution to this challenge, as it can assist endoscopists in detecting and classifying overlooked polyps and abnormalities in real time, improving the accuracy of diagnosis and enhancing treatment. In addition to an algorithm's accuracy, transparency and interpretability are crucial for explaining the whys and hows of its predictions. Further, conclusions based on incorrect decisions may be fatal, especially in medicine. Despite these pitfalls, most algorithms are developed on private data, in closed-source or proprietary software, and their methods lack reproducibility. Therefore, to promote the development of efficient and transparent methods, we organized the "Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image Segmentation (MedAI 2021)" competitions. The Medico 2020 challenge received submissions from 17 teams, while the MedAI 2021 challenge gathered submissions from another 17 distinct teams in the following year. We present a comprehensive summary, analyze each contribution, highlight the strengths of the best-performing methods, and discuss the possibility of translating such methods into the clinic. Our analysis revealed that the participants improved the Dice coefficient from 0.8607 in 2020 to 0.8993 in 2021, despite the addition of diverse and challenging frames (containing irregular, smaller, sessile, or flat polyps) that are frequently missed during routine clinical examination. For the instrument segmentation task, the best team obtained a mean Intersection over Union of 0.9364. For the transparency task, a multi-disciplinary team, including expert gastroenterologists, assessed each submission and evaluated the teams on open-source practices, failure case analysis, ablation studies, and the usability and understandability of their evaluations, to gain a deeper understanding of the models' credibility for clinical deployment. The best team obtained a final transparency score of 21 out of 25. Through this comprehensive analysis of the challenges, we not only highlight the advancements in polyp and surgical instrument segmentation but also encourage subjective evaluation for building more transparent and understandable AI-based colonoscopy systems. Moreover, we discuss the need for multi-center and out-of-distribution testing to address the current limitations of these methods, reduce the cancer burden, and improve patient care.
ABSTRACT
The performance of existing lesion semantic segmentation models has shown steady improvement with the introduction of mechanisms like attention, skip connections, and deep supervision. However, these advancements often come at the expense of computational requirements, necessitating powerful graphics processing units with substantial video memory. Consequently, certain models may exhibit poor or non-existent performance on more affordable edge devices, such as smartphones and other point-of-care devices. To tackle this challenge, our paper introduces a lesion segmentation model with a low parameter count and minimal operations. The model incorporates polar transformations to simplify images, facilitating faster training and improved performance. We leverage the characteristics of polar images by directing the model's focus to the areas most likely to contain segmentation information, achieved through a learning-efficient polar-based contrast attention (PCA). This design uses Hadamard products to implement a lightweight attention mechanism without significantly increasing model parameters or complexity. Furthermore, we present a novel skip cross-channel aggregation (SC2A) approach for sharing cross-channel corrections, introducing Gaussian depthwise convolution to enhance nonlinearity. Extensive experiments on the ISIC 2018 and Kvasir datasets demonstrate that our model surpasses state-of-the-art models while maintaining only about 25K parameters. Additionally, our model exhibits strong generalization to cross-domain data, as confirmed through experiments on the PH2 and CVC-Polyp datasets. We also evaluate the model's performance in a mobile setting against other lightweight models; notably, our model outperforms other advanced models in terms of IoU, Dice score, and running time.
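The polar transformation that simplifies lesion images can be sketched with OpenCV's warpPolar; centering on the image center is an assumption here (a real pipeline would presumably center on the lesion):

```python
import cv2
import numpy as np

def to_polar(img: np.ndarray, center=None) -> np.ndarray:
    """Map an image to polar coordinates around `center` (default: image
    center), so roughly round lesions become approximately horizontal bands."""
    h, w = img.shape[:2]
    if center is None:
        center = (w / 2, h / 2)
    max_radius = np.hypot(w / 2, h / 2)
    return cv2.warpPolar(img, (w, h), center, max_radius,
                         cv2.WARP_POLAR_LINEAR)

img = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
print(to_polar(img).shape)  # (128, 128, 3)
```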
Subject(s)
Neural Networks, Computer; Humans; Image Processing, Computer-Assisted/methods; Deep Learning; Algorithms
ABSTRACT
Colonoscopy is a reliable diagnostic method to detect colorectal polyps early on and prevent colorectal cancer. Current examination techniques face a significant challenge of high miss rates, resulting in numerous undetected polyps and irregularities. Automated and real-time segmentation methods can help endoscopists segment the shape and location of polyps from colonoscopy images and thereby facilitate timely diagnosis and intervention. Factors such as the varied shapes and small sizes of polyps, and their close resemblance to surrounding tissues, make this task challenging. Furthermore, high-definition image quality and reliance on the operator make real-time and accurate endoscopic image segmentation even harder. Deep learning models utilized for segmenting polyps, designed to capture diverse patterns, are becoming progressively complex, and this complexity poses challenges for real-time medical operations. In clinical settings, utilizing automated methods requires the development of accurate, lightweight models with minimal latency, ensuring seamless integration with endoscopic hardware devices. To address these challenges, this study proposes Enhanced Nanonet, a novel lightweight and more generalized model built on the NanonetB backbone for real-time and precise colonoscopy image segmentation. The proposed model improves on Nanonet's overall prediction scheme by applying data augmentation, Conditional Random Fields (CRF), and Test-Time Augmentation (TTA). Six publicly available datasets are utilized to perform thorough evaluations, assess generalizability, and validate the improvements: Kvasir-SEG, Endotect Challenge 2020, Kvasir-instrument, CVC-ClinicDB, CVC-ColonDB, and CVC-300. Through extensive experimentation on the Kvasir-SEG dataset, our model achieves a mIoU score of 0.8188 and a Dice coefficient of 0.8060 with only 132,049 parameters and minimal computational resources. A thorough cross-dataset evaluation was performed to assess the generalization capability of the proposed Enhanced Nanonet model across various publicly available polyp datasets for potential real-world applications. The results show that CRF and TTA enhance performance both within the same dataset and across diverse datasets, with a model size of just 132,049 parameters. The proposed method also shows improved results in detecting smaller, sessile, and flat polyps, which are significant contributors to high miss rates.
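Of the listed components, Test-Time Augmentation is the most self-contained; a minimal sketch of flip-based TTA averaging follows (the set of augmentations is an assumption, not Enhanced Nanonet's exact scheme):

```python
import torch

@torch.no_grad()
def tta_predict(model, image: torch.Tensor) -> torch.Tensor:
    """Average sigmoid predictions over flip augmentations; each prediction
    is un-flipped before averaging. image: (B, 3, H, W)."""
    flips = [[], [3], [2], [2, 3]]  # none, horizontal, vertical, both
    probs = []
    for dims in flips:
        aug = torch.flip(image, dims) if dims else image
        out = torch.sigmoid(model(aug))
        probs.append(torch.flip(out, dims) if dims else out)
    return torch.stack(probs).mean(0)

model = torch.nn.Conv2d(3, 1, 3, padding=1)  # stand-in segmentation model
img = torch.randn(1, 3, 64, 64)
print(tta_predict(model, img).shape)         # torch.Size([1, 1, 64, 64])
```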
ABSTRACT
Automatic polyp segmentation in endoscopic images is critical for the early diagnosis of colorectal cancer. Despite the availability of powerful segmentation models, two challenges still impede the accuracy of polyp segmentation algorithms. Firstly, during a colonoscopy, physicians frequently adjust the orientation of the colonoscope tip to capture underlying lesions, resulting in viewpoint changes in the colonoscopy images. These variations increase the diversity of polyp visual appearance, posing a challenge for learning robust polyp features. Secondly, polyps often exhibit properties similar to the surrounding tissues, leading to indistinct polyp boundaries. To address these problems, we propose a viewpoint-aware framework named VANet for precise polyp segmentation. In VANet, polyps are emphasized as a discriminative feature and thus can be localized by class activation maps in a viewpoint classification process. With these polyp locations, we design a viewpoint-aware Transformer (VAFormer) to alleviate the erosion of attention by the surrounding tissues, thereby inducing better polyp representations. Additionally, to enhance the polyp boundary perception of the network, we develop a boundary-aware Transformer (BAFormer) to encourage self-attention towards uncertain regions. As a consequence, the combination of the two modules is capable of calibrating predictions and significantly improving polyp segmentation performance. Extensive experiments on seven public datasets across six metrics demonstrate the state-of-the-art results of our method, and VANet can handle colonoscopy images in real-world scenarios effectively. The source code is available at https://github.com/1024803482/Viewpoint-Aware-Network.
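The localization step above relies on class activation maps from the viewpoint classifier; here is a generic CAM sketch, assuming a global-average-pooling plus linear classifier head (VANet's actual head may differ):

```python
import torch
import torch.nn.functional as F

def class_activation_map(features: torch.Tensor, fc_weight: torch.Tensor,
                         class_idx: int) -> torch.Tensor:
    """CAM for a GAP+linear classifier: weight the last conv feature maps by
    the chosen class's linear weights, then sum over channels.
    features: (B, C, H, W); fc_weight: (num_classes, C)."""
    cam = torch.einsum('c,bchw->bhw', fc_weight[class_idx], features)
    cam = F.relu(cam)
    cam = cam - cam.amin(dim=(1, 2), keepdim=True)
    return cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-6)

feats = torch.randn(2, 512, 8, 8)
w = torch.randn(4, 512)  # 4 viewpoint classes (an assumed count)
print(class_activation_map(feats, w, 0).shape)  # torch.Size([2, 8, 8])
```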
Subject(s)
Algorithms; Colonic Polyps; Colonoscopy; Humans; Colonic Polyps/diagnostic imaging; Colonoscopy/methods; Colorectal Neoplasms/diagnostic imaging; Image Interpretation, Computer-Assisted/methods
ABSTRACT
Early detection of polyps is essential to decrease colorectal cancer (CRC) incidence. Therefore, developing an efficient and accurate polyp segmentation technique is crucial for clinical CRC prevention. In this paper, we propose an end-to-end training approach for polyp segmentation that employs a diffusion model. The images are treated as priors, and segmentation is formulated as a mask generation process. In the sampling process, multiple predictions are generated for each input image using the trained model, and significant performance enhancements are achieved through a majority-vote strategy. Four public datasets and one in-house dataset are used to train and test the model. The proposed method achieves mDice scores of 0.934 and 0.967 on the Kvasir-SEG and CVC-ClinicDB datasets, respectively. Furthermore, cross-dataset validation is applied to test generalization, and to the best of our knowledge the proposed method outperforms previous state-of-the-art (SOTA) models. The proposed method also significantly improves segmentation accuracy and has strong generalization capability.
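The majority-vote fusion of the sampled masks is straightforward to sketch (the sample count here is illustrative):

```python
import torch

def majority_vote(masks: torch.Tensor) -> torch.Tensor:
    """Fuse N sampled binary masks (N, H, W) into one mask: a pixel is
    foreground if more than half of the samples mark it as foreground."""
    n = masks.shape[0]
    return (masks.sum(dim=0) > n / 2).float()

samples = (torch.rand(5, 64, 64) > 0.5).float()  # 5 diffusion samples
print(majority_vote(samples).shape)              # torch.Size([64, 64])
```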
Subject(s)
Colonic Polyps; Colorectal Neoplasms; Humans; Colonic Polyps/diagnostic imaging; Colorectal Neoplasms/diagnostic imaging; Models, Statistical; Image Interpretation, Computer-Assisted/methods; Algorithms
ABSTRACT
The prevalence of colorectal cancer, primarily emerging from polyps, underscores the importance of their early detection in colonoscopy images. Due to the inherent complexity and variability of polyp appearances, the task remains difficult despite recent advances in medical technology. To tackle these challenges, a deep learning model featuring a customized U-Net architecture, AdaptUNet, is proposed. Attention mechanisms and skip connections facilitate the effective combination of low-level details and high-level contextual information for accurate polyp segmentation. Further, wavelet transformations are used to extract useful features overlooked by conventional image processing. The model achieves benchmark results with a Dice coefficient of 0.9104, an Intersection over Union (IoU) of 0.8368, and a Balanced Accuracy of 0.9880 on the CVC-300 dataset. Additionally, it shows exceptional performance on other datasets, including Kvasir-SEG and ETIS-LaribDB. Training was performed on the HyperKvasir segmented images dataset, further evidencing the model's ability to handle diverse data inputs. The proposed method offers a comprehensive and efficient implementation for polyp detection without compromising performance, promising improved precision and reduced manual labour in colorectal polyp detection.
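The wavelet feature extraction can be illustrated with a single-level 2-D discrete wavelet transform via PyWavelets; the Haar wavelet and single decomposition level are assumptions, since the abstract does not specify them:

```python
import numpy as np
import pywt

def wavelet_features(gray: np.ndarray, wavelet: str = 'haar'):
    """Single-level 2-D DWT: returns the approximation band plus the
    horizontal/vertical/diagonal detail bands, each at half resolution."""
    cA, (cH, cV, cD) = pywt.dwt2(gray, wavelet)
    return cA, cH, cV, cD

img = np.random.rand(128, 128)
bands = wavelet_features(img)
print([b.shape for b in bands])  # four (64, 64) sub-bands
```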
ABSTRACT
Colorectal polyps serve as potential precursors of colorectal cancer and automating polyp segmentation aids physicians in accurately identifying potential polyp regions, thereby reducing misdiagnoses and missed diagnoses. However, existing models often fall short in accurately segmenting polyps due to the high degree of similarity between polyp regions and surrounding tissue in terms of color, texture, and shape. To address this challenge, this study proposes a novel three-stage polyp segmentation network, named Reverse Attention Feature Purification with Pyramid Vision Transformer (RAFPNet), which adopts an iterative feedback UNet architecture to refine polyp saliency maps for precise segmentation. Initially, a Multi-Scale Feature Aggregation (MSFA) module is introduced to generate preliminary polyp saliency maps. Subsequently, a Reverse Attention Feature Purification (RAFP) module is devised to effectively suppress low-level surrounding tissue features while enhancing high-level semantic polyp information based on the preliminary saliency maps. Finally, the UNet architecture is leveraged to further refine the feature maps in a coarse-to-fine approach. Extensive experiments conducted on five widely used polyp segmentation datasets and three video polyp segmentation datasets demonstrate the superior performance of RAFPNet over state-of-the-art models across multiple evaluation metrics.
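Reverse attention in this lineage of models gates features by the complement of the current saliency map; the following sketch shows that basic mechanism, under the assumption that RAFP follows the same core formulation:

```python
import torch
import torch.nn.functional as F

def reverse_attention(features: torch.Tensor,
                      saliency: torch.Tensor) -> torch.Tensor:
    """Gate features by the *complement* of the current polyp saliency map,
    so later layers focus on missed regions and boundaries rather than the
    already-confident interior. features: (B, C, H, W); saliency: (B, 1, h, w)."""
    sal = F.interpolate(saliency, size=features.shape[2:], mode='bilinear',
                        align_corners=False)
    return features * (1.0 - torch.sigmoid(sal))

feats = torch.randn(2, 64, 44, 44)
sal = torch.randn(2, 1, 11, 11)  # coarse saliency logits
print(reverse_attention(feats, sal).shape)  # torch.Size([2, 64, 44, 44])
```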
Subject(s)
Colonic Polyps; Humans; Colonic Polyps/diagnostic imaging; Image Interpretation, Computer-Assisted/methods; Algorithms
ABSTRACT
Automatically segmenting polyps from colonoscopy videos is crucial for developing computer-assisted diagnostic systems for colorectal cancer. Existing automatic polyp segmentation methods often struggle to fulfill the real-time demands of clinical applications due to their substantial parameter count and computational load, especially those based on Transformer architectures. To tackle these challenges, a novel lightweight long-range context fusion network, named LightCF-Net, is proposed in this paper. This network attempts to model long-range spatial dependencies while maintaining real-time performance, to better distinguish polyps from background noise and thus improve segmentation accuracy. A novel Fusion Attention Encoder (FAEncoder) is designed in the proposed network, which integrates Large Kernel Attention (LKA) and channel attention mechanisms to extract deep representational features of polyps and unearth long-range dependencies. Furthermore, a newly designed Visual Attention Mamba module (VAM) is added to the skip connections, modeling long-range context dependencies in the encoder-extracted features and reducing background noise interference through the attention mechanism. Finally, a Pyramid Split Attention module (PSA) is used in the bottleneck layer to extract richer multi-scale contextual features. The proposed method was thoroughly evaluated on four renowned polyp segmentation datasets: Kvasir-SEG, CVC-ClinicDB, BKAI-IGH, and ETIS. Experimental findings demonstrate that the proposed method delivers higher segmentation accuracy in less time, consistently outperforming the most advanced lightweight polyp segmentation networks.
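Large Kernel Attention has a standard decomposition in the Visual Attention Network literature; below is a sketch of that formulation, on the assumption that FAEncoder adopts it largely unchanged:

```python
import torch
import torch.nn as nn

class LKA(nn.Module):
    """Large Kernel Attention (VAN-style): a 21x21 receptive field decomposed
    into 5x5 depthwise + 7x7 dilated depthwise (dilation 3) + 1x1 convs; the
    result gates the input elementwise."""
    def __init__(self, dim: int):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.dw_dilated = nn.Conv2d(dim, dim, 7, padding=9, groups=dim,
                                    dilation=3)
        self.pw = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        attn = self.pw(self.dw_dilated(self.dw(x)))
        return x * attn

feats = torch.randn(2, 32, 64, 64)
print(LKA(32)(feats).shape)  # torch.Size([2, 32, 64, 64])
```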
ABSTRACT
Polyps are abnormal tissue clumps growing primarily on the inner linings of the gastrointestinal tract. While such clumps are generally harmless, they can potentially evolve into pathological tumors, and thus require long-term observation and monitoring. Polyp segmentation in gastrointestinal endoscopy images is an important stage for polyp monitoring and subsequent treatment. However, this segmentation task faces multiple challenges: the low contrast of polyp boundaries, varied polyp appearance, and the co-occurrence of multiple polyps. To address these challenges, an implicit edge-guided cross-layer fusion network (IECFNet) is proposed in this paper for polyp segmentation. An encoder-decoder pair generates an initial saliency map, an implicit edge-enhanced context attention module aggregates the feature maps output by the encoder and decoder to generate a rough prediction, and a multi-scale feature reasoning module generates the final predictions. Polyp segmentation experiments were conducted on five popular polyp image datasets (Kvasir, CVC-ClinicDB, ETIS, CVC-ColonDB, and CVC-300), and the results show that the proposed method significantly outperforms conventional methods, notably with an accuracy margin of 7.9% on the ETIS dataset.
Subject(s)
Colonic Polyps; Humans; Colonic Polyps/pathology; Colonic Polyps/diagnostic imaging; Algorithms; Image Processing, Computer-Assisted/methods; Neural Networks, Computer; Image Interpretation, Computer-Assisted/methods; Polyps/pathology; Polyps/diagnostic imaging; Gastrointestinal Endoscopy/methods
ABSTRACT
Accurate segmentation of polyps in colonoscopy images has gained significant attention in recent years, given its crucial role in automated colorectal cancer diagnosis. Many existing deep learning-based methods follow a one-stage processing pipeline, often involving feature fusion across different levels or utilizing boundary-related attention mechanisms. Drawing on the success of applying Iterative Feedback Units (IFU) in image polyp segmentation, this paper proposes FlowICBNet by extending the IFU to the domain of video polyp segmentation. By harnessing the unique capabilities of IFU to propagate and refine past segmentation results, our method proves effective in mitigating challenges linked to the inherent limitations of endoscopic imaging, notably the presence of frequent camera shake and frame defocusing. Furthermore, in FlowICBNet, we introduce two pivotal modules: Reference Frame Selection (RFS) and Flow Guided Warping (FGW). These modules play a crucial role in filtering and selecting the most suitable historical reference frames for the task at hand. The experimental results on a large video polyp segmentation dataset demonstrate that our method can significantly outperform state-of-the-art methods by notable margins achieving an average metrics improvement of 7.5% on SUN-SEG-Easy and 7.4% on SUN-SEG-Hard. Our code is available at https://github.com/eraserNut/ICBNet.
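Flow Guided Warping presumably resamples a reference frame's prediction with optical flow; here is a minimal grid_sample-based sketch, assuming the flow is given as per-pixel displacements (the module's actual flow source and details are not stated in the abstract):

```python
import torch
import torch.nn.functional as F

def flow_warp(prev_mask: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a past prediction to the current frame. prev_mask: (B, 1, H, W);
    flow: (B, 2, H, W) pixel displacements (dx, dy)."""
    b, _, h, w = prev_mask.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    grid = torch.stack((xs, ys), dim=0).float().to(prev_mask.device)  # (2,H,W)
    coords = grid.unsqueeze(0) + flow          # absolute sample positions
    # normalize to [-1, 1]; grid_sample expects (B, H, W, 2) ordered (x, y)
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    norm_grid = torch.stack((coords_x, coords_y), dim=-1)
    return F.grid_sample(prev_mask, norm_grid, align_corners=True)

mask = torch.rand(1, 1, 64, 64)
flow = torch.zeros(1, 2, 64, 64)  # zero flow: identity warp
print(torch.allclose(flow_warp(mask, flow), mask, atol=1e-5))  # True
```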
Subject(s)
Colonic Polyps; Humans; Colonic Polyps/diagnostic imaging; Colonoscopy/methods; Deep Learning; Image Interpretation, Computer-Assisted/methods; Video Recording; Colorectal Neoplasms/diagnostic imaging; Algorithms; Image Processing, Computer-Assisted/methods
ABSTRACT
Purpose: Colon cancer is one of the top three gastrointestinal cancers, and colon polyps are an important trigger of colon cancer. Early diagnosis and removal of colon polyps can prevent the incidence of colon cancer. Currently, colon polyp removal surgery is mainly based on artificial-intelligence (AI) assisted colonoscopy, supplemented by deep-learning technology to help doctors remove colon polyps. With the development of deep learning, the use of advanced AI technology to assist in medical diagnosis has become mainstream; it can make the most of a doctor's diagnostic time and help doctors formulate better medical plans. Approach: We propose a deep-learning model for segmenting colon polyps. The model adopts a dual-branch structure that combines a convolutional neural network (CNN) with a transformer, and it replaces ordinary convolution with depthwise separable convolution based on ResNet; a stripe pooling module is introduced to obtain more effective information. An aggregated attention module (AAM) is proposed for high-dimensional semantic information, effectively combining two different structures to address the high-dimensional information fusion problem. Deep supervision and multi-scale training are added to the training process to enhance the learning effect and generalization performance of the model. Results: The experimental results show that the proposed dual-branch structure is significantly better than a single-branch structure, and the model using the AAM shows a significant performance improvement over the model without it. Our model leads by 1.1% and 1.5% in mIoU and mDice, respectively, when compared with state-of-the-art models in fivefold cross-validation on the Kvasir-SEG dataset. Conclusions: We propose and validate a deep-learning model for segmenting colon polyps using a dual-branch network structure. Our results demonstrate the feasibility of combining traditional CNNs and transformers so that they complement each other, and we verified the feasibility of fusing different structures on high-dimensional semantics while effectively retaining the high-dimensional information of each.
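The stripe pooling module is not specified beyond its name; the sketch below follows the published strip-pooling idea (averages over full-height and full-width stripes used as a spatial gate), which is an assumption about this model's variant:

```python
import torch
import torch.nn as nn

class StripePooling(nn.Module):
    """Strip pooling: average over full-height and full-width stripes, expand
    back to the input size, fuse, and use the result as a spatial gate. Long,
    thin context is captured cheaply compared with large square kernels."""
    def __init__(self, dim: int):
        super().__init__()
        self.conv_h = nn.Conv2d(dim, dim, (3, 1), padding=(1, 0))
        self.conv_w = nn.Conv2d(dim, dim, (1, 3), padding=(0, 1))
        self.fuse = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        col = self.conv_h(x.mean(dim=3, keepdim=True)).expand(b, c, h, w)
        row = self.conv_w(x.mean(dim=2, keepdim=True)).expand(b, c, h, w)
        return x * torch.sigmoid(self.fuse(col + row))

feats = torch.randn(2, 32, 64, 64)
print(StripePooling(32)(feats).shape)  # torch.Size([2, 32, 64, 64])
```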
ABSTRACT
Colonoscopy is of great importance for early screening and clinical diagnosis of colon cancer, yet fine segmentation of polyps remains a challenging task. Existing state-of-the-art models still have limited segmentation ability because the boundaries between normal tissue and polyps are unclear and the two are highly similar. To deal with this problem, we propose a region self-attention enhancement network (RSAFormer) with a transformer encoder to capture more robust features. Unlike other strong methods, RSAFormer employs a dual-decoder structure to generate various feature maps; contrasting with traditional methods that typically employ a single decoder, this offers more flexibility and detail in feature extraction. RSAFormer also introduces a region self-attention enhancement (RSA) module to acquire more accurate feature information and foster a stronger interplay between low-level and high-level features. This module enhances uncertain areas, signified by regional context, to extract more precise boundary information. Extensive experiments on five prevalent polyp datasets demonstrate RSAFormer's proficiency: it achieves 92.2% and 83.5% mean Dice on Kvasir and ETIS, respectively, outperforming most state-of-the-art models.
Subject(s)
Colonoscopy; Image Processing, Computer-Assisted; Uncertainty
ABSTRACT
Polyp detection is a challenging task in the diagnosis of Colorectal Cancer (CRC), and it demands clinical expertise due to the diverse nature of polyps. Recent years have witnessed the development of automated polyp detection systems to assist experts in early diagnosis, considerably reducing time consumption and diagnostic errors. In automated CRC diagnosis, polyp segmentation is an important step carried out with deep learning segmentation models. Recently, Vision Transformers (ViT) have been slowly replacing these models due to their ability to capture long-range dependencies among image patches. However, existing ViTs for polyp segmentation do not fully harness the inherent self-attention capability and instead incorporate complex attention mechanisms. This paper presents Polyp-Vision Transformer (Polyp-ViT), a novel Transformer model based on the conventional Transformer architecture, enhanced with adaptive mechanisms for feature extraction and positional embedding. Polyp-ViT is tested on the Kvasir-SEG and CVC-ClinicDB datasets, achieving segmentation accuracies of 0.9891 ± 0.01 and 0.9875 ± 0.71 respectively, outperforming state-of-the-art models. Polyp-ViT is a promising tool for polyp segmentation that can be adapted to other medical image segmentation tasks as well, owing to its ability to generalize well.
Subject(s)
Polyps; Humans; Ambulatory Care Facilities; Diagnostic Errors; Electric Power Supplies; Colon; Image Processing, Computer-Assisted
ABSTRACT
BACKGROUND: Polyp detection and localization are essential tasks for colonoscopy. U-shaped convolutional neural networks have achieved remarkable segmentation performance for biomedical images, but their limited receptive fields restrict the modeling of long-range dependencies. PURPOSE: Our goal was to develop and test a novel architecture for polyp segmentation that combines learning of local information with long-range dependency modeling. METHODS: A novel architecture was developed that integrates a transformer into a multi-scale nested UNet structure for polyp segmentation. The proposed network takes advantage of both CNN and transformer components to extract distinct feature information. The transformer layer is embedded between the encoder and decoder of a U-shaped net to learn explicit global context and long-range semantic information. To address the challenge of varying polyp sizes, a multi-scale feature fusion (MSFF) unit was proposed to fuse features at multiple resolutions. RESULTS: Four public datasets and one in-house dataset were used to train and test the model. An ablation study was also conducted to verify each component of the model. On the Kvasir-SEG and CVC-ClinicDB datasets, the proposed model achieved mean Dice scores of 0.942 and 0.950 respectively, more accurate than the other methods. To show the generalization of the different methods, we performed two cross-dataset validations, in which the proposed model achieved the highest mean Dice score. The results demonstrate that the proposed network has powerful learning and generalization capability, significantly improving segmentation accuracy and outperforming state-of-the-art methods. CONCLUSIONS: The proposed model produced more accurate polyp segmentation than current methods on four public datasets and one in-house dataset. Its ability to segment polyps of different sizes shows its potential for clinical application.
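Embedding a transformer layer between a U-shaped encoder and decoder can be sketched generically as follows (positional encodings and the paper's multi-scale nesting are omitted for brevity, and the layer dimensions are illustrative):

```python
import torch
import torch.nn as nn

class TransformerBottleneck(nn.Module):
    """Flatten the U-Net bottleneck into a token sequence, run a standard
    Transformer encoder layer for global context, then fold back to 2-D."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)             # (B, H*W, C)
        tokens = self.block(tokens)
        return tokens.transpose(1, 2).reshape(b, c, h, w)

bottleneck = torch.randn(2, 256, 14, 14)
print(TransformerBottleneck(256)(bottleneck).shape)  # (2, 256, 14, 14)
```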
Subject(s)
Colonic Polyps; Colonoscopy; Neural Networks, Computer; Humans; Colonic Polyps/diagnostic imaging; Colonoscopy/methods; Algorithms; Image Processing, Computer-Assisted/methods; Colorectal Neoplasms/diagnostic imaging; Colorectal Neoplasms/pathology; Image Interpretation, Computer-Assisted/methods; Databases, Factual
ABSTRACT
In terms of speed and accuracy, deep learning-based polyp segmentation methods are superior. They are essential for the early detection and treatment of colorectal cancer and have the potential to greatly reduce the disease's overall prevalence. Due to the various forms and sizes of polyps, as well as the blurred boundaries between the polyp region and the surrounding mucosa, most existing algorithms are unable to provide highly accurate colorectal polyp segmentation. To overcome these obstacles, we propose an adaptive feature aggregation network (AFANet). It contains two main modules: the Multi-modal Balancing Attention Module (MMBA) and the Global Context Module (GCM). The MMBA extracts improved local characteristics for inference by integrating local contextual information while attending to three regions: foreground, background, and border. The GCM takes global information from the top of the encoder and sends it to the decoder layer in order to further exploit global contextual feature information in the pathological image. Comprehensive experimental validation of the proposed technique on two benchmark datasets, Kvasir-SEG and CVC-ClinicDB, achieves Dice scores of 92.11% and 94.76% and mIoU of 91.07% and 94.54%, respectively. The experimental results demonstrate that the strategy outperforms other cutting-edge approaches.