Results 1 - 20 of 115
1.
Sensors (Basel) ; 24(17)2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39275364

ABSTRACT

Different types of rural settlement agglomerations have formed and become spatially intermixed during the implementation of the rural revitalization strategy in China. Discriminating them from remote sensing images is of great significance for rural land planning and living environment improvement. Currently, there is a lack of automatic methods for obtaining information on rural settlement differentiation. In this paper, an improved encoder-decoder network structure, ASCEND-UNet, was designed based on the original UNet. It was implemented to segment and classify dispersed and clustered rural settlement buildings from high-resolution satellite images. The ASCEND-UNet model incorporated three components: first, the atrous spatial pyramid pooling (ASPP) multi-scale feature fusion module was added to the encoder; second, the spatial and channel squeeze and excitation (scSE) block was embedded at the skip connection; third, the hybrid dilated convolution (HDC) block was utilized in the decoder. In our proposed framework, the ASPP and HDC were used as multiple dilated convolution blocks to expand the receptive field by introducing a series of convolutions with different dilation rates. The scSE is an attention mechanism block focusing on features in both the spatial and channel dimensions. A series of model comparisons and accuracy assessments with the original UNet, PSPNet, DeepLabV3+, and SegNet verified the effectiveness of our proposed model. Compared with the original UNet model, ASCEND-UNet achieved improvements of 4.67%, 2.80%, 3.73%, and 6.28% in precision, recall, F1-score and MIoU, respectively. The contributions of the HDC, ASPP, and scSE modules were discussed in ablation experiments. Our proposed model obtained more accurate and stable results by integrating multiple dilated convolution blocks with an attention mechanism. This novel model enriches the automatic methods for semantic segmentation of different rural settlements from remote sensing images.
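
The abstract above relies on stacked dilated convolutions (ASPP, HDC) to expand the receptive field. As a rough illustration only, the sketch below computes the receptive field of a stack of stride-1 dilated convolutions. The function name and the dilation rates (1, 2, 5) are assumptions chosen for the example and are not taken from the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in input samples) of stacked stride-1 dilated convolutions.

    Each layer with kernel size k and dilation d widens the field by (k - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Three ordinary 3-tap layers versus three layers with illustrative rates 1, 2, 5.
plain = receptive_field(3, [1, 1, 1])
hybrid = receptive_field(3, [1, 2, 5])
print(plain, hybrid)
```

With the same number of layers and weights, the dilated stack sees a wider input window, which is the effect the abstract attributes to its dilated blocks.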

2.
Front Neurorobot ; 18: 1436052, 2024.
Article in English | MEDLINE | ID: mdl-39220588

ABSTRACT

To address the problems of traditional image super-resolution reconstruction algorithms, such as a small receptive field, insufficient multi-scale feature extraction, and easy loss of image feature information, this paper proposes a super-resolution reconstruction algorithm based on a multi-scale dilated convolution network. First, the algorithm extracts features from the same input image through dilated convolution kernels with different receptive fields to obtain feature maps at different scales. Then, a residual attention dense block further extracts the features of the original low-resolution images; local residual connections are added to fuse multi-scale feature information between multiple channels, and residual nested networks and skip connections are used together to speed up deep network convergence and avoid network degradation. Finally, the features extracted by the deep network are fused with the input features to increase the nonlinear expression ability of the network and enhance the super-resolution reconstruction effect. Experimental results show that, compared with the Bicubic, SRCNN, ESPCN, VDSR, DRCN, LapSRN, MemNet, and DSRNet algorithms on the Set5, Set14, BSDS100, and Urban100 test sets, the proposed algorithm improves the peak signal-to-noise ratio and structural similarity, and the reconstructed images have a better visual effect.

3.
Plants (Basel) ; 13(18)2024 Sep 22.
Article in English | MEDLINE | ID: mdl-39339630

ABSTRACT

Plants play a vital role in numerous domains, including medicine, agriculture, and environmental balance. Furthermore, they contribute to the production of oxygen and the retention of carbon dioxide, both of which are necessary for living beings. Numerous researchers have conducted thorough research in the classification of plant species where certain studies have focused on limited numbers of classes, while others have employed conventional machine-learning and deep-learning models to classify them. To address these limitations, this paper introduces a novel dual-stream neural architecture embedded with a soft-attention mechanism specifically developed for accurately classifying plant species. The proposed model utilizes residual and inception blocks enhanced with dilated convolutional layers for acquiring both local and global information. Following the extraction of features, both streams are combined, and a soft-attention technique is used to improve the distinct characteristics. The efficacy of the model is shown via extensive experimentation on varied datasets, including several plant species. Moreover, we have contributed a novel dataset that comprises 48 classes of different plant species. The results demonstrate a higher level of performance when compared to current models, emphasizing the capability of the dual-stream design in improving accuracy and model generalization. The integration of a dual-stream architecture, dilated convolutions, and soft attention provides a strong and reliable foundation for the botanical community, supporting advancement in the field of plant species classification.

4.
Genome Biol ; 25(1): 243, 2024 Sep 16.
Article in English | MEDLINE | ID: mdl-39285451

ABSTRACT

The process of splicing messenger RNA to remove introns plays a central role in creating genes and gene variants. We describe Splam, a novel method for predicting splice junctions in DNA using deep residual convolutional neural networks. Unlike previous models, Splam looks at a 400-base-pair window flanking each splice site, reflecting the biological splicing process that relies primarily on signals within this window. Splam also trains on donor and acceptor pairs together, mirroring how the splicing machinery recognizes both ends of each intron. Compared to SpliceAI, Splam is consistently more accurate, achieving 96% accuracy in predicting human splice junctions.


Subject(s)
Deep Learning , RNA Splice Sites , RNA Splicing , Humans , Introns , Sequence Alignment , Neural Networks, Computer
5.
Neural Netw ; 179: 106568, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39089152

ABSTRACT

Dilated convolution has been widely used in various computer vision tasks due to its ability to expand the receptive field while maintaining the resolution of feature maps. However, the critical challenge is the gridding problem caused by the isomorphic structure of the dilated convolution, where the holes filled in the dilated convolution destroy the integrity of the extracted information and cut off the relevance of neighboring pixels. In this work, a novel heterogeneous dilated convolution, called HDConv, is proposed to address this issue by setting independent dilation rates on grouped channels while keeping the general convolution operation. The heterogeneous structure can effectively avoid the gridding problem while introducing multi-scale kernels in the filters. Based on the heterogeneous structure of the proposed HDConv, we also explore the benefit of large receptive fields to feature extraction by comparing different combinations of dilated rates. Finally, a series of experiments are conducted to verify the effectiveness of some computer vision tasks, such as image segmentation and object detection. The results show the proposed HDConv can achieve a competitive performance on ADE20K, Cityscapes, COCO-Stuff10k, COCO, and a medical image dataset UESTC-COVID-19. The proposed module can readily replace conventional convolutions in existing convolutional neural networks (i.e., plug-and-play), and it is promising to further extend dilated convolution to wider scenarios in the field of image segmentation.
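
The gridding problem described above can be made concrete with a small 1D sketch. It lists which input offsets can influence one output position, first when every channel uses the same dilation rate and then when channel groups use different rates. The function names, the rates, and the assumption that channel groups are mixed between layers are illustrative and are not the HDConv implementation.

```python
def tap_offsets(kernel_size, rate):
    """Input offsets sampled by one odd-sized kernel with the given dilation rate."""
    half = kernel_size // 2
    return {rate * k for k in range(-half, half + 1)}


def covered_offsets(kernel_size, layers):
    """Offsets reachable from one output position after a stack of layers.

    `layers` is a list of layers; each layer is a list of dilation rates,
    one per channel group. Assumes groups are mixed between layers.
    """
    reach = {0}
    for group_rates in layers:
        taps = set()
        for rate in group_rates:
            taps |= tap_offsets(kernel_size, rate)
        reach = {r + t for r in reach for t in taps}
    return reach


uniform = covered_offsets(3, [[2], [2]])       # every channel uses rate 2
mixed = covered_offsets(3, [[1, 2], [1, 2]])   # two groups, rates 1 and 2
print(sorted(uniform))
print(sorted(mixed))
```

In the uniform case only even offsets are reached, so neighbouring pixels never contribute together; the mixed-rate case reaches every offset in the same span.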


Subject(s)
Neural Networks, Computer , Humans , Image Processing, Computer-Assisted/methods , Algorithms , COVID-19 , Deep Learning
6.
Sensors (Basel) ; 24(15)2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39124111

ABSTRACT

As population aging intensifies in modern society, the accurate and timely identification of, and response to, sudden abnormal behaviors of the elderly has become an urgent and important issue. In current research on computer vision-based abnormal behavior recognition, most algorithms have shown poor generalization and recognition abilities in practical applications, as well as issues with recognizing single actions. To address these problems, an MSCS-DenseNet-LSTM model based on a multi-scale attention mechanism is proposed. This model integrates the MSCS (Multi-Scale Convolutional Structure) module into the initial convolutional layer of the DenseNet model to form a multi-scale convolution structure. It introduces the improved Inception X module into the Dense Block to form an Inception Dense structure, and gradually performs feature fusion through each Dense Block module. The CBAM attention mechanism module is added to the dual-layer LSTM to enhance the model's generalization ability while ensuring the accurate recognition of abnormal actions. Furthermore, to address the issue of single-action abnormal behavior datasets, the RGB image dataset (RIDS) and the contour image dataset (CIDS), both containing various abnormal behaviors, were constructed. The experimental results show that the proposed MSCS-DenseNet-LSTM model achieved an accuracy, sensitivity, and specificity of 98.80%, 98.75%, and 98.82% and of 98.30%, 98.28%, and 98.38% on the two datasets, respectively.


Subject(s)
Algorithms , Neural Networks, Computer , Humans , Pattern Recognition, Automated/methods , Behavior/physiology , Image Processing, Computer-Assisted/methods
7.
Sensors (Basel) ; 24(11)2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38894398

ABSTRACT

Image denoising is regarded as an ill-posed problem in computer vision tasks that removes additive noise from imaging sensors. Recently, several convolution neural network-based image-denoising methods have achieved remarkable advances. However, it is difficult for a simple denoising network to recover aesthetically pleasing images owing to the complexity of image content. Therefore, this study proposes a multi-branch network to improve the performance of the denoising method. First, the proposed network is designed based on a conventional autoencoder to learn multi-level contextual features from input images. Subsequently, we integrate two modules into the network, including the Pyramid Context Module (PCM) and the Residual Bottleneck Attention Module (RBAM), to extract salient information for the training process. More specifically, PCM is applied at the beginning of the network to enlarge the receptive field and successfully address the loss of global information using dilated convolution. Meanwhile, RBAM is inserted into the middle of the encoder and decoder to eliminate degraded features and reduce undesired artifacts. Finally, extensive experimental results prove the superiority of the proposed method over state-of-the-art deep-learning methods in terms of objective and subjective performances.

8.
Sensors (Basel) ; 24(9)2024 May 02.
Article in English | MEDLINE | ID: mdl-38733020

ABSTRACT

To address the various challenges in aluminum surface defect detection, such as multiscale intricacies, sensitivity to lighting variations, occlusion, and noise, this study proposes the AluDef-ClassNet model. Firstly, a Gaussian difference pyramid is utilized to capture multiscale image features. Secondly, a self-attention mechanism is introduced to enhance feature representation. Additionally, an improved residual network structure incorporating dilated convolutions is adopted to increase the receptive field, thereby enhancing the network's ability to learn from extensive information. A small-scale dataset of high-quality aluminum surface defect images is acquired using a CCD camera. To better tackle the challenges in surface defect detection, advanced deep learning techniques and data augmentation strategies are employed. To address the difficulty of data labeling, a transfer learning approach based on fine-tuning is utilized, leveraging prior knowledge to enhance the efficiency and accuracy of model training. In dataset testing, our model achieved a classification accuracy of 97.6%, demonstrating significant advantages over other classification models.

9.
Biomed Tech (Berl) ; 69(5): 465-480, 2024 Oct 28.
Article in English | MEDLINE | ID: mdl-38712825

ABSTRACT

Subcortical brain structure segmentation plays an important role in the diagnosis of neuroimaging and has become the basis of computer-aided diagnosis. Due to the blurred boundaries and complex shapes of subcortical brain structures, labeling these structures by hand becomes a time-consuming and subjective task, greatly limiting their potential for clinical applications. Thus, this paper proposes the sparsification transformer (STF) module for accurate brain structure segmentation. The self-attention mechanism is used to establish global dependencies to efficiently extract the global information of the feature map with low computational complexity. Also, the shallow network is used to compensate for low-level detail information through the localization of convolutional operations to promote the representation capability of the network. In addition, a hybrid residual dilated convolution (HRDC) module is introduced at the bottom layer of the network to extend the receptive field and extract multi-scale contextual information. Meanwhile, the octave convolution edge feature extraction (OCT) module is applied at the skip connections of the network to pay more attention to the edge features of brain structures. The proposed network is trained with a hybrid loss function. The experimental evaluation on two public datasets: IBSR and MALC, shows outstanding performance in terms of objective and subjective quality.


Subject(s)
Brain , Neural Networks, Computer , Humans , Brain/diagnostic imaging , Algorithms , Neuroimaging/methods , Image Processing, Computer-Assisted/methods
10.
Med Eng Phys ; 126: 104138, 2024 04.
Article in English | MEDLINE | ID: mdl-38621836

ABSTRACT

Lung cancer is one of the deadliest diseases in the world, and its detection can save a patient's life. Although Computed Tomography (CT) is the best imaging tool in the medical sector, clinicians find it challenging to interpret CT scan data and detect cancer from it. Positron Emission Tomography (PET) imaging is one of the most effective ways to diagnose certain malignancies such as lung tumours. Many diagnosis models have been implemented to diagnose various diseases. Early lung cancer identification is very important for predicting the severity level of lung cancer in patients. To explore an effective model, an image fusion-based detection model is proposed for lung cancer detection using an improved heuristic algorithm together with a deep learning model. First, PET and CT images are gathered from the internet. These two collected images are then fused using the Adaptive Dilated Convolution Neural Network (AD-CNN), in which the hyperparameters are tuned by the Modified Initial Velocity-based Capuchin Search Algorithm (MIV-CapSA). Subsequently, the abnormal regions are segmented using TransUnet3+. Finally, the segmented images are fed into the Hybrid Attention-based Deep Networks (HADN) model, which incorporates Mobilenet and Shufflenet. The effectiveness of the novel detection model is analyzed using various metrics and compared with traditional approaches. The outcome indicates that the model aids early detection so that patients can be treated effectively.


Subject(s)
Deep Learning , Lung Neoplasms , Humans , Lung Neoplasms/diagnostic imaging , Heuristics , Tomography, X-Ray Computed , Positron-Emission Tomography , Algorithms
11.
Anal Biochem ; 690: 115491, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38460901

ABSTRACT

Bioactive peptides can hinder oxidative processes and microbial spoilage in foodstuffs and play important roles in treating diverse diseases and disorders. While most of the methods focus on single-functional bioactive peptides and have obtained promising prediction performance, it is still a significant challenge to accurately detect complex and diverse functions simultaneously with the quick increase of multi-functional bioactive peptides. In contrast to previous research on multi-functional bioactive peptide prediction based solely on sequence, we propose a novel multimodal dual-branch (MMDB) lightweight deep learning model that designs two different branches to effectively capture the complementary information of peptide sequence and structural properties. Specifically, a multi-scale dilated convolution with Bi-LSTM branch is presented to effectively model the different scales sequence properties of peptides while a multi-layer convolution branch is proposed to capture structural information. To the best of our knowledge, this is the first effective extraction of peptide sequence features using multi-scale dilated convolution without parameter increase. Multimodal features from both branches are integrated via a fully connected layer for multi-label classification. Compared to state-of-the-art methods, our MMDB model exhibits competitive results across metrics, with a 9.1% Coverage increase and 5.3% and 3.5% improvements in Precision and Accuracy, respectively.

12.
Sci Rep ; 14(1): 5087, 2024 03 01.
Article in English | MEDLINE | ID: mdl-38429300

ABSTRACT

When traditional EEG signals are collected based on the Nyquist theorem, long-time recordings of EEG signals will produce a large amount of data. At the same time, limited bandwidth, end-to-end delay, and memory space will bring great pressure on the effective transmission of data. The birth of compressed sensing alleviates this transmission pressure. However, using an iterative compressed sensing reconstruction algorithm for EEG signal reconstruction faces complex calculation problems and slow data processing speed, limiting the application of compressed sensing in EEG signal rapid monitoring systems. As such, this paper presents a non-iterative and fast algorithm for reconstructing EEG signals using compressed sensing and deep learning techniques. This algorithm uses the improved residual network model, extracts the feature information of the EEG signal by one-dimensional dilated convolution, directly learns the nonlinear mapping relationship between the measured value and the original signal, and can quickly and accurately reconstruct the EEG signal. The method proposed in this paper has been verified by simulation on the open BCI contest dataset. Overall, it is proved that the proposed method has higher reconstruction accuracy and faster reconstruction speed than the traditional CS reconstruction algorithm and the existing deep learning reconstruction algorithm. In addition, it can realize the rapid reconstruction of EEG signals.
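
The one-dimensional dilated convolution mentioned above can be sketched in a few lines. This is a generic, minimal illustration (a "valid" cross-correlation with gaps between taps), not the paper's network; the function name, the toy signal, and the kernel values are assumptions made for the example.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid 1D dilated convolution (cross-correlation form, as used in deep learning).

    Kernel taps are spaced `dilation` samples apart, so a short kernel
    covers a longer stretch of the signal without extra weights.
    """
    span = (len(kernel) - 1) * dilation
    out = []
    for i in range(len(signal) - span):
        out.append(sum(w * signal[i + j * dilation] for j, w in enumerate(kernel)))
    return out


samples = [1, 2, 3, 4, 5, 6]       # toy stand-in for an EEG trace
edge = [1, 0, -1]                  # illustrative 3-tap kernel
dense = dilated_conv1d(samples, edge, dilation=1)
sparse = dilated_conv1d(samples, edge, dilation=2)
print(dense, sparse)
```

With dilation 2 the same three weights span five samples instead of three, which is why the output is shorter when no padding is used.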


Subject(s)
Data Compression , Deep Learning , Signal Processing, Computer-Assisted , Data Compression/methods , Algorithms , Electroencephalography/methods
13.
Heliyon ; 10(5): e26589, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38468917

ABSTRACT

Roads are closely intertwined with human existence, and the process of extracting road networks has emerged as the most prominent task in remote sensing (RS). Automated road interpretation of remote sensing images (RSI) acquires road network data efficiently and at lower cost than traditional visual interpretation of RSI. However, the appearance of roads in RSI varies greatly because of large differences in the length, width, material, and shape of road networks in different areas. Thus, the extraction of road network data from RSI remains a complex issue. Recently, deep learning (DL)-based approaches have made notable progress in image segmentation, but many of them still cannot retain boundary information or produce high-resolution road segmentation maps when processing RSI. Traditional convolutional neural networks (CNNs) demonstrate impressive performance in road extraction tasks; however, they frequently encounter difficulties in capturing intricate details and contextual information. This study introduces a novel method, named Archimedes Optimisation Algorithm, Quantum Dilated Convolutional Neural Network for Road Extraction (AOA-QDCNNRE), to tackle these challenges in remote sensing images. The AOA-QDCNNRE technique aims to generate a high-resolution road segmentation map using DL with a hyperparameter tuning process. The technique primarily relies on the QDCNN model, which integrates quantum computing (QC) with dilated convolutions to augment the network's capacity to capture local as well as global contextual information. In addition, the dilated convolutional technique enlarges the receptive field without sacrificing spatial resolution, enabling the extraction of precise road features. To improve the road extraction outcomes of the QDCNN approach, AOA-based hyperparameter tuning is applied. The AOA-QDCNNRE system was tested in simulation on benchmark databases, and the results indicate that the AOA-QDCNNRE method surpasses recent algorithms.

14.
Sci Prog ; 107(1): 368504241231161, 2024.
Article in English | MEDLINE | ID: mdl-38400510

ABSTRACT

In modern urban traffic systems, intersection monitoring systems are used to monitor traffic flows and track vehicles by recognizing license plates. However, intersection monitors often produce motion-blurred images because of the rapid movement of cars. If a deep learning network is used for image deblurring, the blurring of the image can be eliminated first, and then the complete vehicle information can be obtained to improve the recognition rate. To restore a dynamic blurred image to a sharp image, this paper proposes a multi-scale modified U-Net image deblurring network using dilated convolution and employs a variable scaling iterative strategy to make the scheme more adaptable to actual blurred images. Multi-scale architecture uses scale changes to learn the characteristics of different scales of images, and the use of dilated convolution can improve the advantages of the receptive field and obtain more information from features without increasing the computational cost. Experimental results are obtained using a synthetic motion-blurred image dataset and a real blurred image dataset for comparison with existing deblurring methods. The experimental results demonstrate that the image deblurring method proposed in this paper has a favorable effect on actual motion-blurred images.

15.
Neural Netw ; 172: 106141, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38301340

ABSTRACT

Multi-view deep neural networks have shown excellent performance on 3D shape classification tasks. However, global features aggregated from multi-view data often lack content information and spatial relationships, which makes it difficult to identify the small variance among subcategories within the same category. To solve this problem, this paper proposes a novel multiscale dilated convolution neural network, termed MSDCNN, for multi-view fine-grained 3D shape classification. First, a sequence of views is rendered from 12 viewpoints around the input 3D shape by the sequential view capturing module. Then, the first 22 convolution layers of ResNeXt50 are employed to extract the semantic features of each view, and a global mixed feature map is obtained through the element-wise maximum operation over the 12 output feature maps. Furthermore, an attention dilated module (ADM), which combines four concatenated attention dilated blocks (ADBs), is designed to extract features with larger receptive fields from the global mixed feature map and enhance context information among the views. Specifically, each ADB consists of an attention mechanism module and a dilated convolution with different dilation rates. In addition, a prediction module with label smoothing, which contains a 3 × 3 convolution and adaptive average pooling, is proposed to classify features. The performance of our method is validated experimentally on the ModelNet10, ModelNet40 and FG3D datasets. Experimental results demonstrate the effectiveness and superiority of the proposed MSDCNN framework for fine-grained 3D shape classification.


Subject(s)
Neural Networks, Computer , Semantics
16.
Neural Netw ; 171: 466-473, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38150872

ABSTRACT

DNA molecules commonly exhibit wide interactions between the nucleobases. Modeling the interactions is important for obtaining accurate sequence-based inference. Although many deep learning methods have recently been developed for modeling DNA sequences, they still suffer from two major issues: 1) most existing methods can handle only short DNA fragments and fail to capture long-range information; 2) current methods always require massive supervised labels, which are hard to obtain in practice. We propose a new method to address both issues. Our neural network employs circular dilated convolutions as building blocks in the backbone. As a result, our network can take long DNA sequences as input without any condensation. We also incorporate the neural network into a self-supervised learning framework to capture inherent information in DNA without expensive supervised labeling. We have tested our model in two DNA inference tasks, the human variant effect and the open chromatin region of plants, where the experimental results show that our method outperforms five other deep learning models. Our code is available at https://github.com/wiedersehne/cdilDNA.
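
The idea of a circular dilated convolution can be illustrated with a minimal 1D sketch in which indices wrap around the ends of the sequence, so the output keeps the input length without zero padding. This is a generic illustration under assumed names and toy values; the authors' actual implementation is in the repository linked in the abstract and may differ.

```python
def circular_dilated_conv1d(seq, kernel, dilation=1):
    """1D dilated convolution with wrap-around (circular) indexing.

    The kernel is centred on each position; taps that fall off either
    end of the sequence wrap to the other end, so len(output) == len(seq).
    """
    n = len(seq)
    center = len(kernel) // 2
    return [
        sum(w * seq[(i + (j - center) * dilation) % n] for j, w in enumerate(kernel))
        for i in range(n)
    ]


values = [1, 2, 3, 4]              # toy stand-in for one encoded DNA channel
ones = [1, 1, 1]                   # illustrative 3-tap kernel
near = circular_dilated_conv1d(values, ones, dilation=1)
far = circular_dilated_conv1d(values, ones, dilation=2)
print(near, far)
```

Because the indexing is circular, every position has the same number of valid neighbours regardless of where it sits in the sequence.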


Subject(s)
DNA , Neural Networks, Computer , Humans , Base Sequence , DNA/genetics , Supervised Machine Learning
17.
Math Biosci Eng ; 20(11): 20135-20154, 2023 Nov 03.
Article in English | MEDLINE | ID: mdl-38052640

ABSTRACT

Accurate segmentation of infected regions in lung computed tomography (CT) images is essential for the detection and diagnosis of coronavirus disease 2019 (COVID-19). However, lung lesion segmentation has some challenges, such as obscure boundaries, low contrast and scattered infection areas. In this paper, the dilated multiresidual boundary guidance network (Dmbg-Net) is proposed for COVID-19 infection segmentation in CT images of the lungs. This method focuses on semantic relationship modelling and boundary detail guidance. First, to effectively minimize the loss of significant features, a dilated residual block is substituted for a convolutional operation, and dilated convolutions are employed to expand the receptive field of the convolution kernel. Second, an edge-attention guidance preservation block is designed to incorporate boundary guidance of low-level features into feature integration, which is conducive to extracting the boundaries of the region of interest. Third, the various depths of features are used to generate the final prediction, and the utilization of a progressive multi-scale supervision strategy facilitates enhanced representations and highly accurate saliency maps. The proposed method is used to analyze COVID-19 datasets, and the experimental results reveal that the proposed method has a Dice similarity coefficient of 85.6% and a sensitivity of 84.2%. Extensive experimental results and ablation studies have shown the effectiveness of Dmbg-Net. Therefore, the proposed method has a potential application in the detection, labeling and segmentation of other lesion areas.
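
The Dice similarity coefficient and sensitivity reported above are standard overlap metrics for binary segmentation masks. The sketch below computes both on flattened masks; the function name and the toy masks are assumptions for illustration and do not reproduce the paper's evaluation pipeline.

```python
def dice_and_sensitivity(pred, truth):
    """Dice coefficient and sensitivity for two flattened binary masks.

    Dice = 2*TP / (2*TP + FP + FN); sensitivity = TP / (TP + FN).
    Empty masks are treated as a perfect match by convention here.
    """
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 1.0
    return dice, sensitivity


predicted = [1, 1, 0, 0, 1]        # toy predicted lesion mask
reference = [1, 0, 0, 1, 1]        # toy ground-truth mask
dice, sensitivity = dice_and_sensitivity(predicted, reference)
print(round(dice, 3), round(sensitivity, 3))
```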


Subject(s)
COVID-19 , Humans , COVID-19/epidemiology , Algorithms , Semantics , Tomography, X-Ray Computed , Image Processing, Computer-Assisted
18.
Network ; : 1-19, 2023 Nov 21.
Article in English | MEDLINE | ID: mdl-38031802

ABSTRACT

Detecting and diagnosing leaf infections at an early stage can improve agricultural output and reduce monetary costs. Inaccurate segmentation may degrade the accuracy of disease classification because leaf diseases are varied and complex. Also, the adhesion and dimensions of disease spots can overlap, causing partial under-segmentation. Therefore, a novel robust Deep Encoder-Decoder Cascaded Network (DEDCNet) model is proposed in this manuscript for leaf image segmentation that precisely segments the diseased leaf spots and differentiates similar diseases. The model comprises an Infected Spot Recognition Network (ISRN) and an Infected Spot Segmentation Network (ISSN). First, the ISRN is designed by integrating a cascaded CNN with a Feature Pyramid Pooling layer to identify the infected leaf spot and avoid the impact of background details. Next, the ISSN is developed using an encoder-decoder network, which uses a multi-scale dilated convolution kernel to precisely segment the infected leaf spot. The resultant leaf segments are then provided to pre-trained CNN models to learn texture features, followed by the SVM algorithm to categorize leaf disease classes. The ODEDCNet delivers exceptional performance on both the Betel Leaf Image and PlantVillage datasets. On the Betel Leaf Image dataset, it achieves an accuracy of 94.89%, with high precision (94.35%), recall (94.77%), and F-score (94.56%), while maintaining low under-segmentation (6.2%) and over-segmentation rates (2.8%). It also achieves a remarkable Dice coefficient of 0.9822, all in just 0.10 seconds. On the PlantVillage dataset, the ODEDCNet outperforms other existing models with an accuracy of 96.5%, demonstrating high precision (96.61%), recall (96.5%), and F-score (96.56%). It excels in reducing under-segmentation to just 3.12% and over-segmentation to 2.56%. Furthermore, it achieves a Dice coefficient of 0.9834 in a mere 0.09 seconds. These results demonstrate greater efficiency in both segmentation and categorization of leaf diseases compared with existing models.

19.
Diagnostics (Basel) ; 13(21)2023 Nov 03.
Article in English | MEDLINE | ID: mdl-37958277

ABSTRACT

T2-weighted magnetic resonance imaging (MRI) and diffusion-weighted imaging (DWI) are essential components of cervical cancer diagnosis. However, combining these channels for the training of deep learning models is challenging due to image misalignment. Here, we propose a novel multi-head framework that uses dilated convolutions and shared residual connections for the separate encoding of multiparametric MRI images. We employ a residual U-Net model as a baseline, and perform a series of architectural experiments to evaluate the tumor segmentation performance based on multiparametric input channels and different feature encoding configurations. All experiments were performed on a cohort of 207 patients with locally advanced cervical cancer. Our proposed multi-head model using separate dilated encoding for T2W MRI and combined b1000 DWI and apparent diffusion coefficient (ADC) maps achieved the best median Dice similarity coefficient (DSC) score, 0.823 (confidence interval (CI), 0.595-0.797), outperforming the conventional multi-channel model, DSC 0.788 (95% CI, 0.568-0.776), although the difference was not statistically significant (p > 0.05). We investigated channel sensitivity using 3D GRAD-CAM and channel dropout, and highlighted the critical importance of T2W and ADC channels for accurate tumor segmentation. However, our results showed that b1000 DWI had a minor impact on the overall segmentation performance. We demonstrated that the use of separate dilated feature extractors and independent contextual learning improved the model's ability to reduce the boundary effects and distortion of DWI, leading to improved segmentation performance. Our findings could have significant implications for the development of robust and generalizable models that can extend to other multi-modal segmentation applications.

20.
Sensors (Basel) ; 23(22)2023 Nov 16.
Article in English | MEDLINE | ID: mdl-38005604

ABSTRACT

Monocular panoramic depth estimation has various applications in robotics and autonomous driving due to its ability to perceive the entire field of view. However, panoramic depth estimation faces two significant challenges: global context capturing and distortion awareness. In this paper, we propose a new framework for panoramic depth estimation that can simultaneously address panoramic distortion and extract global context information, thereby improving the performance of panoramic depth estimation. Specifically, we introduce an attention mechanism into the multi-scale dilated convolution and adaptively adjust the receptive field size between different spatial positions, designing the adaptive attention dilated convolution module, which effectively perceives distortion. At the same time, we design the global scene understanding module to integrate global context information into the feature maps generated using the feature extractor. Finally, we trained and evaluated our model on three benchmark datasets which contains the virtual and real-world RGB-D panorama datasets. The experimental results show that the proposed method achieves competitive performance, comparable to existing techniques in both quantitative and qualitative evaluations. Furthermore, our method has fewer parameters and more flexibility, making it a scalable solution in mobile AR.
