Results 1 - 20 of 48

1.
Neuroimage ; 292: 120608, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38626817

ABSTRACT

The morphological analysis and volume measurement of the hippocampus are crucial to the study of many brain diseases. Therefore, an accurate hippocampal segmentation method is beneficial for the development of clinical research in brain diseases. U-Net and its variants have become prevalent in hippocampus segmentation of Magnetic Resonance Imaging (MRI) due to their effectiveness, and Transformer-based architectures have also received some attention. However, some existing methods focus too much on the shape and volume of the hippocampus rather than its spatial information, and the features they extract are independent of one another, ignoring the correlation between local and global features. In addition, many methods cannot be effectively applied to practical medical image segmentation because of their large parameter counts and high computational complexity. To this end, we combined the advantages of CNNs and Vision Transformers (ViTs) and propose a simple and lightweight model, Light3DHS, for the segmentation of the 3D hippocampus. To obtain richer local contextual features, the encoder first applies a multi-scale convolutional attention module (MCA) to learn the spatial information of the hippocampus. Considering the importance of local features and global semantics for 3D segmentation, we use a lightweight ViT to learn high-level scale-invariant features and further fuse local-to-global representations. To evaluate the effectiveness of the encoder's feature representation, we designed three decoders of different complexity to generate segmentation maps. Experiments on three common hippocampus datasets demonstrate that the network achieves more accurate hippocampus segmentation with fewer parameters, and Light3DHS performs better than other state-of-the-art algorithms.
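As a rough, paper-agnostic sketch of the multi-scale convolution idea underlying modules such as the MCA (the 1-D signal, the box-filter kernels, and fusion by plain averaging are all simplifying assumptions here, not the Light3DHS design):

```python
def conv1d(signal, kernel):
    """1-D convolution with zero padding ('same' output length)."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(padded[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal))]

def multi_scale_features(signal, kernels):
    """Run one input through kernels of different widths (different
    receptive fields) and fuse the per-scale responses by averaging."""
    responses = [conv1d(signal, k) for k in kernels]
    return [sum(r[i] for r in responses) / len(responses)
            for i in range(len(signal))]

# Three receptive-field sizes, each kernel a simple box filter.
kernels = [[1 / 3] * 3, [1 / 5] * 5, [1 / 7] * 7]
fused = multi_scale_features([0, 0, 0, 1, 0, 0, 0], kernels)
```

A real MCA block would operate on 3-D feature volumes and weight the scales with learned attention; the averaging above merely stands in for that fusion step.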


Subjects
Hippocampus , Imaging, Three-Dimensional , Magnetic Resonance Imaging , Hippocampus/diagnostic imaging , Humans , Magnetic Resonance Imaging/methods , Imaging, Three-Dimensional/methods , Neural Networks, Computer , Deep Learning , Algorithms
2.
Sensors (Basel) ; 24(16)2024 Aug 12.
Article in English | MEDLINE | ID: mdl-39204917

ABSTRACT

No-reference image quality assessment aims to evaluate image quality in line with human subjective perception. Current methods struggle to attend to global and local information simultaneously and suffer information loss from image resizing. To address these issues, we propose a model that combines a Swin Transformer with natural scene statistics. The model uses the Swin Transformer to extract multi-scale features and incorporates a feature enhancement module and deformable convolution to improve feature representation and adapt better to structural variations in images; it applies dual-branch attention to focus on key areas, aligning the assessment more closely with human visual perception. The natural scene statistics branch compensates for the information loss caused by image resizing. Additionally, we use a normalized loss function to accelerate model convergence and enhance stability. We evaluate our model on six standard image quality assessment datasets (both synthetic and authentic) and show that it achieves advanced results across multiple datasets. Compared to the advanced DACNN method, our model achieved Spearman rank correlation coefficients of 0.922 and 0.923 on the KADID and KonIQ datasets, respectively, improvements of 1.9% and 2.4% over that method. It demonstrated outstanding performance in handling both synthetic and authentic scenes.
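For reference, the Spearman rank correlation coefficient (SRCC) reported above is computed from ranks alone; this small sketch shows the calculation on made-up scores (the five score pairs are illustrative, not data from the paper):

```python
def rank(values):
    """Average 1-based ranks, handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Pearson correlation of the rank vectors = Spearman's rho."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# Predicted quality scores vs. subjective opinion scores for five images.
srcc = spearman([0.2, 0.5, 0.3, 0.9, 0.7], [1.0, 2.0, 2.5, 4.8, 3.9])
```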

3.
Sensors (Basel) ; 24(4)2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38400274

ABSTRACT

Salient Object Detection (SOD) in RGB-D images plays a crucial role in the field of computer vision, its central aim being to identify and segment the most visually striking objects within a scene. However, optimizing the fusion of multi-modal and multi-scale features to enhance detection performance remains a challenge. To address this issue, we propose a network model based on semantic localization and multi-scale fusion (SLMSF-Net), specifically designed for RGB-D SOD. Firstly, we designed a Deep Attention Module (DAM), which extracts valuable depth feature information from both channel and spatial perspectives and efficiently merges it with RGB features. Subsequently, a Semantic Localization Module (SLM) is introduced to enhance the top-level modality fusion features, enabling the precise localization of salient objects. Finally, a Multi-Scale Fusion Module (MSF) is employed to perform inverse decoding on the modality fusion features, thus restoring the detailed information of the objects and generating high-precision saliency maps. Our approach has been validated across six RGB-D salient object detection datasets. The experimental results indicate improvements of 0.20~1.80%, 0.09~1.46%, 0.19~1.05%, and 0.0002~0.0062 in the maxF, maxE, S, and MAE metrics, respectively, compared to the best competing methods (AFNet, DCMF, and C2DFNet).

4.
BMC Bioinformatics ; 24(1): 334, 2023 Sep 07.
Article in English | MEDLINE | ID: mdl-37679724

ABSTRACT

BACKGROUND: Drug-target affinity (DTA) prediction is a critical step in drug discovery. In recent years, deep learning-based methods have emerged for DTA prediction. To fuse the substructure information of drug molecular graphs and exploit multi-scale protein information, this paper proposes a self-supervised pre-training model based on substructure extraction and multi-scale features. RESULTS: For drug molecules, the model obtains substructure information via a probability matrix, and contrastive learning between the graph-level and subgraph-level representations is used to pre-train the graph encoder for downstream tasks. For targets, a BiLSTM that integrates multi-scale features is used to capture long-distance relationships in the amino acid sequence. The experimental results showed that our model achieved better performance for DTA prediction. CONCLUSIONS: The proposed model improves DTA prediction performance, providing a novel strategy based on substructure extraction and multi-scale features.


Subjects
Drug Discovery , Amino Acid Sequence , Probability
5.
Sensors (Basel) ; 23(19)2023 Oct 09.
Article in English | MEDLINE | ID: mdl-37837167

ABSTRACT

Semantic segmentation is crucial for interpreting a scene in numerous applications, including autonomous driving and robotic navigation. Compared to single-modal data, multi-modal data allow a richer set of features to be extracted, which benefits segmentation accuracy. We propose a point cloud semantic segmentation method based on a fusion graph convolutional network (FGCN), which extracts the semantic information of each point from the two-modal data of images and point clouds. The two-channel k-nearest neighbors (KNN) module of the FGCN was created to address the poor efficiency of feature extraction from image data. Notably, the FGCN utilizes a spatial attention mechanism to better distinguish important features and fuses multi-scale features to enhance the generalization capability of the network and increase the accuracy of the semantic segmentation. In the experiments, a self-made semantic segmentation KITTI (SSKIT) dataset was constructed to evaluate the fusion effect; the mean intersection over union (MIoU) on SSKIT reaches 88.06%. On the public S3DIS dataset, our method enhances data features and outperforms other methods, with an MIoU of up to 78.55%. The segmentation accuracy is significantly improved compared with existing methods, which verifies the effectiveness of the improved algorithms.
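Setting the two-channel design aside, the KNN search such a module builds on is simple; a brute-force sketch over a toy point cloud (the coordinates and query point are invented for illustration):

```python
def knn(points, query, k):
    """Indices of the k nearest neighbours of `query` in `points`
    (squared Euclidean distance, brute force)."""
    d2 = [(sum((p - q) ** 2 for p, q in zip(pt, query)), i)
          for i, pt in enumerate(points)]
    return [i for _, i in sorted(d2)[:k]]

cloud = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (5, 5, 5)]
neighbours = knn(cloud, (0.1, 0.1, 0.0), k=2)
```

Production point-cloud pipelines replace the brute-force scan with a KD-tree or grid hashing, but the neighbourhood definition is the same.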

6.
Sensors (Basel) ; 23(1)2023 Jan 01.
Article in English | MEDLINE | ID: mdl-36617046

ABSTRACT

Magnetic fingerprints have many advantages for indoor positioning, but because the geomagnetic field is weak, the dynamic range of the data is limited, which directly affects positioning accuracy. To address the problem that indoor magnetic positioning results depend heavily on local magnetic characteristics, this paper proposes a deep-learning method that fuses the temporal and spatial characteristics of magnetic fingerprints, fully exploiting the magnetic signal to obtain stable and trustworthy positioning results. First, trajectories in the acquisition area are extracted with an improved random waypoint model to simulate pedestrian motion. The magnetic data are then mapped into magnetic sequences. Considering the scale characteristics of the sequences, a scale transformation unit is designed to obtain multi-scale features. Finally, a neural-network self-attention mechanism fuses the features and outputs the positioning result. Evaluated across dissimilar indoor scenes, the method adapts to diverse environments: the average positioning error in a corridor, an open area, and a complex area reaches 0.65 m, 0.93 m, and 1.38 m, respectively. The addition of multi-scale features provides a useful reference for improving positioning performance.


Subjects
Magnetic Fields , Pedestrians , Humans , Computer Simulation , Neural Networks, Computer , Physical Phenomena
7.
Sensors (Basel) ; 23(15)2023 Aug 03.
Article in English | MEDLINE | ID: mdl-37571662

ABSTRACT

In image classification, few-shot learning deals with recognizing visual categories from a few tagged examples. The degree of expressiveness of the encoded features in this scenario is a crucial question that needs to be addressed in the models being trained. Recent approaches have achieved encouraging results in improving few-shot models in deep learning, but designing a competitive and simple architecture is challenging, especially considering the requirements of many practical applications. This work proposes an improved few-shot model based on a multi-layer feature fusion (FMLF) method. The presented approach includes extended feature extraction and fusion mechanisms in the Convolutional Neural Network (CNN) backbone, as well as an effective metric to compute the divergences in the end. In order to evaluate the proposed method, a challenging visual classification problem, maize crop insect classification with specific pest and beneficial categories, is addressed, serving both as a test of our model and as a means to propose a novel dataset. Experiments were carried out to compare the results with ResNet50, VGG16, and MobileNetv2, used as feature extraction backbones, and the FMLF method demonstrated higher accuracy with fewer parameters. The proposed FMLF method improved accuracy scores by up to 3.62% in one-shot and 2.82% in five-shot classification tasks compared to a traditional backbone, which uses only global image features.

8.
J Digit Imaging ; 36(6): 2427-2440, 2023 12.
Article in English | MEDLINE | ID: mdl-37491542

ABSTRACT

Colonoscopy is acknowledged as the foremost technique for detecting polyps and facilitating early screening and prevention of colorectal cancer. In clinical settings, the segmentation of polyps from colonoscopy images holds paramount importance, as it furnishes critical diagnostic and surgical information. Nevertheless, the precise segmentation of colon polyp images is still a challenging task owing to the varied sizes and morphological features of colon polyps and the indistinct boundary between polyps and mucosa. In this study, we present a novel network architecture named ECTransNet to address the challenges in polyp segmentation. Specifically, we propose an edge complementary module that effectively fuses the differences between features with multiple resolutions. This enables the network to exchange features across different levels and results in a substantial improvement in the edge fineness of the polyp segmentation. Additionally, we utilize a feature aggregation decoder that leverages residual blocks to adaptively fuse high-order to low-order features. This strategy restores local edges in low-order features while preserving the spatial information of targets in high-order features, ultimately enhancing the segmentation accuracy. Extensive experiments demonstrate that ECTransNet outperforms most state-of-the-art approaches on five publicly available datasets. Specifically, our method achieved mDice scores of 0.901 and 0.923 on the Kvasir-SEG and CVC-ClinicDB datasets, respectively. On the Endoscene, CVC-ColonDB, and ETIS datasets, we obtained mDice scores of 0.907, 0.766, and 0.728, respectively.
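The mDice scores above are dataset averages of the Dice coefficient; a minimal sketch of Dice on flattened binary masks (the toy masks are illustrative, not taken from the datasets):

```python
def dice(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 lists:
    2*|intersection| / (|pred| + |target|)."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 2 * inter / total if total else 1.0

# 3 predicted foreground pixels, 2 ground-truth, 2 overlapping.
score = dice([1, 1, 0, 0, 1], [1, 0, 0, 0, 1])
```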

9.
BMC Bioinformatics ; 23(1): 362, 2022 Sep 02.
Article in English | MEDLINE | ID: mdl-36056300

ABSTRACT

BACKGROUND: Retrosynthesis prediction is the task of deducing reactants from reaction products, which is of great importance for designing the synthesis routes of target products. Product molecules are generally represented with descriptors such as the simplified molecular input line entry specification (SMILES) or molecular fingerprints in order to build prediction models. However, most existing models use only one molecular descriptor and consider it as a whole rather than mining multi-scale features, and thus cannot fully and finely exploit the features of molecules and their descriptors. RESULTS: We propose a novel model to address these concerns. Firstly, we build a new convolutional neural network (CNN) based feature extraction network that extracts multi-scale features from molecular descriptors by applying several filters of different sizes. Then, we use a two-branch feature extraction layer to fuse the multi-scale features of several molecular descriptors and perform retrosynthesis prediction without expert knowledge. Comparison with other models on the benchmark USPTO-50k chemical dataset shows that our model surpasses the state-of-the-art model by 7.4%, 10.8%, 11.7% and 12.2% in terms of top-1, top-3, top-5 and top-10 accuracy. Since there is no prior work on bioretrosynthesis prediction, owing to the fact that compounds in metabolic reactions are much more difficult to featurize than those in chemical reactions, we further test the feasibility of our model for bioretrosynthesis prediction on the well-known MetaNetX metabolic dataset, achieving top-1, top-3, top-5 and top-10 accuracies of 45.2%, 67.0%, 73.6% and 82.2%, respectively. CONCLUSION: The comparison on USPTO-50k indicates that our proposed model surpasses the existing state-of-the-art model. The evaluation on the MetaNetX dataset indicates that models used for retrosynthesis prediction can also be used for bioretrosynthesis prediction.
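The effect of convolution filters with several widths over a SMILES string can be imitated, as a loose analogy rather than the paper's CNN, by collecting character n-grams of several sizes:

```python
def multi_scale_ngrams(smiles, sizes=(1, 2, 3)):
    """Count character n-grams of several widths from a SMILES string,
    mimicking convolution filters with different receptive fields."""
    feats = {}
    for n in sizes:
        for i in range(len(smiles) - n + 1):
            g = smiles[i:i + n]
            feats[g] = feats.get(g, 0) + 1
    return feats

f = multi_scale_ngrams("CCO")  # ethanol
```

Small n-grams capture atom-level composition while larger ones capture local bonding patterns, which is the intuition behind using filters of different sizes.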


Subjects
Neural Networks, Computer , Research
10.
Pattern Recognit ; 124: 108452, 2022 Apr.
Article in English | MEDLINE | ID: mdl-34848897

ABSTRACT

Due to irregular shapes, various sizes and indistinguishable boundaries between normal and infected tissues, it is still a challenging task to accurately segment the infected lesions of COVID-19 on CT images. In this paper, a novel segmentation scheme is proposed for COVID-19 infections by enhancing supervised information and fusing multi-scale feature maps of different levels based on an encoder-decoder architecture. To this end, a deep collaborative supervision (co-supervision) scheme is proposed to guide the network in learning edge and semantic features. More specifically, an Edge Supervised Module (ESM) is first designed to highlight low-level boundary features by incorporating edge supervision into the initial stage of down-sampling. Meanwhile, an Auxiliary Semantic Supervised Module (ASSM) is proposed to strengthen high-level semantic information by integrating mask supervision into the later stage. An Attention Fusion Module (AFM) is then developed to fuse multi-scale feature maps of different levels with an attention mechanism, reducing the semantic gaps between high-level and low-level feature maps. Finally, the effectiveness of the proposed scheme is demonstrated on four different COVID-19 CT datasets. The results show that the three proposed modules are all promising. Relative to the baseline (ResUnet), using ESM, ASSM, or AFM alone increases the Dice metric on our dataset by 1.12%, 1.95%, and 1.63% respectively, while integrating all three raises it by 3.97%. Compared with existing approaches on various datasets, the proposed method obtains better segmentation performance in the main metrics and achieves the best generalization and comprehensive performance.
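Edge supervision as in the ESM presupposes a boundary map derived from the ground-truth mask. One common way to obtain it (an assumption for illustration, since the abstract does not spell out the operator) is to mark foreground pixels that touch the background:

```python
def mask_edges(mask):
    """Boundary pixels of a binary mask: foreground pixels with at
    least one background (or out-of-bounds) 4-neighbour."""
    h, w = len(mask), len(mask[0])
    edge = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                        edge[y][x] = 1
                        break
    return edge

m = [[0, 0, 0, 0],
     [0, 1, 1, 0],
     [0, 1, 1, 0],
     [0, 0, 0, 0]]
e = mask_edges(m)
```

For this tiny 2x2 blob every foreground pixel is a boundary pixel; on larger regions the interior drops out and only the contour remains to supervise the edge branch.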

11.
BMC Bioinformatics ; 22(1): 133, 2021 Mar 19.
Article in English | MEDLINE | ID: mdl-33740884

ABSTRACT

BACKGROUND: Non-coding RNA (ncRNA) and protein interactions play essential roles in various physiological and pathological processes. The experimental methods used for identifying ncRNA-protein interactions are time-consuming and labor-intensive. Therefore, there is an increasing demand for computational methods that accurately and efficiently predict ncRNA-protein interactions. RESULTS: In this work, we present an ensemble deep learning-based method, EDLMFC, to predict ncRNA-protein interactions using a combination of multi-scale features, including primary sequence features, secondary structure sequence features, and tertiary structure features. Conjoint k-mers were used to extract protein/ncRNA sequence features, which were integrated with tertiary structure features and fed into an ensemble deep learning model that combines a convolutional neural network (CNN), to learn dominant biological information, with a bi-directional long short-term memory network (BLSTM), to capture long-range dependencies among the features identified by the CNN. Compared with other state-of-the-art methods under five-fold cross-validation, EDLMFC shows the best performance, with accuracies of 93.8%, 89.7%, and 86.1% on the RPI1807, NPInter v2.0, and RPI488 datasets, respectively. The results of the independent test demonstrate that EDLMFC can effectively predict potential ncRNA-protein interactions from different organisms. Furthermore, EDLMFC is shown to successfully predict hub ncRNAs and proteins in ncRNA-protein networks of Mus musculus. CONCLUSIONS: In general, our proposed method EDLMFC improves the accuracy of ncRNA-protein interaction predictions and is expected to provide helpful guidance for research on ncRNA function. The source code of EDLMFC and the datasets used in this work are available at https://github.com/JingjingWang-87/EDLMFC .
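The k-mer features mentioned above reduce a variable-length sequence to a fixed-length composition vector. A minimal k-mer frequency sketch for an RNA sequence (the alphabet and normalization are common choices, not necessarily EDLMFC's exact settings):

```python
from itertools import product

def kmer_freq(seq, k=3, alphabet="ACGU"):
    """Normalised k-mer frequency vector (length |alphabet|^k)
    for an RNA sequence over the given alphabet."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = {km: 0 for km in kmers}
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    windows = max(len(seq) - k + 1, 1)
    return [counts[km] / windows for km in kmers]

vec = kmer_freq("ACGUACGU", k=2)  # 16-dimensional for k=2
```

For proteins, conjoint k-mer schemes first collapse the 20 amino acids into a few physico-chemical groups so the vector stays short; the counting step is the same.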


Subjects
Computational Biology , Deep Learning , Animals , Mice , Neural Networks, Computer , RNA, Untranslated , Software
12.
Sensors (Basel) ; 21(22)2021 Nov 11.
Article in English | MEDLINE | ID: mdl-34833577

ABSTRACT

We propose GourmetNet, a single-pass, end-to-end trainable network for food segmentation that achieves state-of-the-art performance. Food segmentation is an important problem as the first step for nutrition monitoring, food volume and calorie estimation. Our novel architecture incorporates both channel attention and spatial attention information in an expanded multi-scale feature representation using our advanced Waterfall Atrous Spatial Pooling module. GourmetNet refines the feature extraction process by merging features from multiple levels of the backbone through the two attention modules. The refined features are processed with the advanced multi-scale waterfall module that combines the benefits of cascade filtering and pyramid representations without requiring a separate decoder or post-processing. Our experiments on two food datasets show that GourmetNet significantly outperforms existing current state-of-the-art methods.


Subjects
Image Processing, Computer-Assisted , Neural Networks, Computer , Attention , Food
13.
Sensors (Basel) ; 21(15)2021 Jul 29.
Article in English | MEDLINE | ID: mdl-34372362

ABSTRACT

In the field of surface defect detection, the scale difference of product surface defects is often huge. Existing defect detection methods based on Convolutional Neural Networks (CNNs) are more inclined to express macro, abstract features, and their ability to express local, small defects is insufficient, resulting in an imbalance of feature expression capability. In this paper, a Multi-Scale Feature Learning Network (MSF-Net) based on a Dual Module Feature (DMF) extractor is proposed. The DMF extractor is mainly composed of optimized Concatenated Rectified Linear Units (CReLUs) and optimized Inception feature extraction modules, which increase the diversity of feature receptive fields while reducing the amount of computation. The middle-layer feature maps, with receptive fields of different sizes, are merged to increase the richness of the receptive fields of the last layer of feature maps. Residual shortcut connections, a batch normalization layer and an average pooling layer are used to replace the fully connected layer, improving training efficiency and making the multi-scale feature learning ability more balanced. Experiments on two representative multi-scale defect datasets verify the advancement and effectiveness of the proposed MSF-Net in detecting surface defects with multi-scale features.


Subjects
Neural Networks, Computer
14.
Sensors (Basel) ; 20(21)2020 Oct 23.
Article in English | MEDLINE | ID: mdl-33114173

ABSTRACT

Data-driven bearing-fault diagnosis methods have become a research hotspot recently. These methods rest on two premises: (1) the distributions of the data to be tested and the training data are the same; (2) there is a large amount of high-quality labeled data. However, machines usually work under different working conditions in practice, which challenges these premises because the data distributions under different working conditions differ. In this paper, the one-dimensional Multi-Scale Domain Adaptive Network (1D-MSDAN) is proposed to address this issue. The 1D-MSDAN is a deep transfer model that uses both feature adaptation and classifier adaptation to guide a multi-scale convolutional neural network to perform bearing-fault diagnosis under varying working conditions. Feature adaptation is performed by both multi-scale feature adaptation and multi-level feature adaptation, which helps find domain-invariant features by minimizing the distribution discrepancy between different working conditions using the Multi-kernel Maximum Mean Discrepancy (MK-MMD). Furthermore, classifier adaptation is performed by entropy minimization in the target domain to bridge the source and target classifiers and further eliminate domain discrepancy. The Case Western Reserve University (CWRU) bearing database is used to validate the proposed 1D-MSDAN. The experimental results show that the diagnostic accuracy for the 12 transfer tasks performed by 1D-MSDAN was superior to that of mainstream transfer learning models for bearing-fault diagnosis under variable working conditions. In addition, the transfer learning performance of 1D-MSDAN for multi-target domain adaptation and real industrial scenarios was also verified.
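MK-MMD, used above for feature adaptation, averages the squared maximum mean discrepancy over several kernel bandwidths. A bare-bones sketch on 1-D samples with RBF kernels (the bandwidth set and the biased estimator are illustrative choices, not the 1D-MSDAN configuration):

```python
from math import exp

def rbf(x, y, gamma):
    """Gaussian RBF kernel on scalars."""
    return exp(-gamma * (x - y) ** 2)

def mk_mmd(xs, ys, gammas=(0.5, 1.0, 2.0)):
    """Multi-kernel MMD^2: average the biased MMD^2 estimate
    over several RBF bandwidths."""
    def mmd2(g):
        kxx = sum(rbf(a, b, g) for a in xs for b in xs) / len(xs) ** 2
        kyy = sum(rbf(a, b, g) for a in ys for b in ys) / len(ys) ** 2
        kxy = sum(rbf(a, b, g) for a in xs for b in ys) / (len(xs) * len(ys))
        return kxx + kyy - 2 * kxy
    return sum(mmd2(g) for g in gammas) / len(gammas)

# Identical samples give ~0; well-separated samples give a large value.
same = mk_mmd([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
far = mk_mmd([0.0, 1.0, 2.0], [10.0, 11.0, 12.0])
```

In a domain-adaptive network, this quantity is computed between source-domain and target-domain feature batches and added to the loss, pushing the two feature distributions together.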

15.
Sensors (Basel) ; 19(7)2019 Apr 10.
Article in English | MEDLINE | ID: mdl-30974816

ABSTRACT

Deep learning models combining spectral and spatial features have been proven to be effective for hyperspectral image (HSI) classification. However, most spatial feature integration methods only consider a single input spatial scale, regardless of the various shapes and sizes of objects over the image plane, leading to missing scale-dependent information. In this paper, we propose hierarchical multi-scale convolutional neural networks (CNNs) with auxiliary classifiers (HMCNN-AC) to learn hierarchical multi-scale spectral-spatial features for HSI classification. First, to better exploit the spatial information, multi-scale image patches for each pixel are generated at different spatial scales. These multi-scale patches are all centered at the same central spectrum but with shrunken spatial scales. Then, we apply multi-scale CNNs to extract spectral-spatial features from each scale patch. The obtained multi-scale convolutional features are considered as structured sequential data with spectral-spatial dependency, and a bidirectional LSTM is proposed to capture the correlation and extract a hierarchical representation for each pixel. To better train the whole network, weighted auxiliary classifiers are employed for the multi-scale CNNs and optimized together with the main loss function. Experimental results on three public HSI datasets demonstrate the superiority of our proposed framework over some state-of-the-art methods.
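The multi-scale patch generation step can be sketched directly: square patches of growing half-width, all centred on the same pixel (clipping at the image border is an assumption made here for simplicity):

```python
def multi_scale_patches(image, y, x, scales=(1, 2)):
    """Square patches of half-width s centred at (y, x), one per scale,
    clipped at the image border. `image` is a list of rows."""
    h, w = len(image), len(image[0])
    patches = []
    for s in scales:
        rows = range(max(0, y - s), min(h, y + s + 1))
        patches.append([[image[r][c]
                         for c in range(max(0, x - s), min(w, x + s + 1))]
                        for r in rows])
    return patches

# A toy 5x5 "image" whose pixel value encodes its position.
img = [[r * 5 + c for c in range(5)] for r in range(5)]
p1, p2 = multi_scale_patches(img, 2, 2)  # 3x3 and 5x5 patches
```

Each patch then feeds its own CNN branch; because all patches share the central spectrum, the per-scale features describe the same pixel at different spatial contexts.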

16.
Cytometry A ; 89(10): 893-902, 2016 10.
Article in English | MEDLINE | ID: mdl-27560544

ABSTRACT

Islet cell quantification and function are important for developing novel therapeutic interventions for diabetes. Existing methods of pancreatic islet segmentation in histopathological images depend strongly on cell/nuclei detection, and thus are limited by the wide variance in the appearance of pancreatic islets. In this paper, we propose a supervised learning pipeline to segment pancreatic islets in histopathological images that does not require cell detection. The proposed framework first partitions images into superpixels, then extracts multi-scale color-texture features from each superpixel and processes these features using rolling guidance filters, in order to simultaneously reduce inter-class ambiguity and intra-class variation. Finally, a linear support vector machine (SVM) is trained and applied to segment the testing images. A total of 23 hematoxylin-and-eosin-stained histopathological images with pancreatic islets are used for verifying the framework. With an average accuracy of 95%, training time of 20 min and testing time of 1 min per image, the proposed framework outperforms existing approaches with better segmentation performance and lower computational cost. © 2016 International Society for Advancement of Cytometry.


Subjects
Diagnostic Imaging/methods , Islets of Langerhans/pathology , Animals , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Male , Mice , Pattern Recognition, Automated/methods , Support Vector Machine
17.
Comput Biol Chem ; 112: 108130, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38954849

ABSTRACT

Retrosynthesis is vital in synthesizing target products, guiding reaction pathway design crucial for drug and material discovery. Current models often neglect multi-scale feature extraction, limiting efficacy in leveraging molecular descriptors. Our proposed SB-Net model, a deep-learning architecture tailored for retrosynthesis prediction, addresses this gap. SB-Net combines CNN and Bi-LSTM architectures, excelling in capturing multi-scale molecular features. It integrates parallel branches for processing one-hot encoded descriptors and ECFP, merging through dense layers. Experimental results demonstrate SB-Net's superiority, achieving 73.6% top-1 and 94.6% top-10 accuracy on USPTO-50k data. Versatility is validated on MetaNetX, with rates of 52.8% top-1, 74.3% top-3, 79.8% top-5, and 83.5% top-10. SB-Net's success in bioretrosynthesis prediction tasks indicates its efficacy. This research advances computational chemistry, offering a robust deep-learning model for retrosynthesis prediction. With implications for drug discovery and synthesis planning, SB-Net promises innovative and efficient pathways.

18.
Comput Biol Med ; 170: 108013, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38271837

ABSTRACT

Accurate medical image segmentation is of great significance for subsequent diagnosis and analysis. The acquisition of multi-scale information plays an important role in segmenting regions of interest of different sizes. With the emergence of Transformers, numerous networks adopted hybrid structures incorporating Transformers and CNNs to learn multi-scale information. However, the majority of research has focused on the design and composition of CNN and Transformer structures, neglecting the inconsistencies in feature learning between Transformer and CNN. This oversight has resulted in the hybrid network's performance not being fully realized. In this work, we proposed a novel hybrid multi-scale segmentation network named HmsU-Net, which effectively fused multi-scale features. Specifically, HmsU-Net employed a parallel design incorporating both CNN and Transformer architectures. To address the inconsistency in feature learning between CNN and Transformer within the same stage, we proposed the multi-scale feature fusion module. For feature fusion across different stages, we introduced the cross-attention module. Comprehensive experiments conducted on various datasets demonstrate that our approach surpasses current state-of-the-art methods.


Subjects
Image Processing, Computer-Assisted , Learning
19.
Comput Biol Med ; 170: 108057, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38301516

ABSTRACT

Medical image segmentation is a fundamental research problem in the field of medical image processing. Recently, Transformers have achieved highly competitive performance in computer vision. Therefore, many methods combining Transformers with convolutional neural networks (CNNs) have emerged for segmenting medical images. However, these methods cannot effectively capture the multi-scale features in medical images, even though the texture and contextual information embedded in multi-scale features is extremely beneficial for segmentation. To alleviate this limitation, we propose a novel Transformer-CNN combined network using multi-scale feature learning for three-dimensional (3D) medical image segmentation, called MS-TCNet. The proposed model utilizes a shunted Transformer and a CNN to construct an encoder and pyramid decoder, allowing feature learning at six different scale levels. It captures multi-scale features with refinement at each scale level. Additionally, we propose a novel lightweight multi-scale feature fusion (MSFF) module that can fully fuse the different-scale semantic features generated by the pyramid decoder for each segmentation class, resulting in a more accurate segmentation output. We conducted experiments on three widely used 3D medical image segmentation datasets. The experimental results indicated that our method outperformed state-of-the-art medical image segmentation methods, suggesting its effectiveness, robustness, and superiority. Meanwhile, our model has a smaller number of parameters and lower computational complexity than conventional 3D segmentation networks. The results confirmed that the model is capable of effective multi-scale feature learning and that the learned multi-scale features are useful for improving segmentation performance. We open-sourced our code, which can be found at https://github.com/AustinYuAo/MS-TCNet.


Subjects
Image Processing, Computer-Assisted , Learning , Neural Networks, Computer
20.
J Imaging ; 10(1)2024 Jan 18.
Article in English | MEDLINE | ID: mdl-38249009

ABSTRACT

Camouflaged objects are not visually distinct from their surroundings, making it easy to overlook the difference between foreground and background; this places higher demands on detection systems. In this paper, we present a new framework for Camouflaged Object Detection (COD) named FSANet, which consists mainly of three operations: spatial detail mining (SDM), cross-scale feature combination (CFC), and a hierarchical feature aggregation decoder (HFAD). The framework simulates the three-stage detection process of the human visual mechanism when observing a camouflaged scene. Specifically, we extract five feature layers using the backbone and divide them into two parts, with the second layer as the boundary. The SDM module simulates a human's cursory inspection of the camouflaged objects to gather spatial details (such as edges and texture) and fuses the features to create a cursory impression. The CFC module observes high-level features from various viewing angles and extracts common features by thoroughly filtering features at various levels. We also design side-join multiplication in the CFC module to avoid detail distortion, using feature element-wise multiplication to filter out noise. Finally, we construct an HFAD module to deeply mine effective features from these two stages, direct the fusion of low-level features using high-level semantic knowledge, and improve the camouflage map using hierarchical cascade technology. Compared to nineteen deep-learning-based methods on seven widely used metrics, our proposed framework shows clear advantages on four public COD datasets, demonstrating the effectiveness and superiority of our model.
