Results 1 - 20 of 38
1.
Entropy (Basel) ; 26(2)2024 Feb 12.
Article in English | MEDLINE | ID: mdl-38392417

ABSTRACT

Joint entity and relation extraction methods have attracted an increasing amount of attention recently due to their capacity to extract relational triples from intricate texts. However, most of the existing methods ignore the association and difference between the Named Entity Recognition (NER) subtask features and the Relation Extraction (RE) subtask features, which leads to an imbalance in the interaction between these two subtasks. To solve the above problems, we propose a new joint entity and relation extraction method, FSN. It contains a Filter Separator Network (FSN) module that employs a bidirectional LSTM to filter and separate the information contained in a sentence and merges similar features through a splicing operation, thus solving the problem of the interaction imbalance between subtasks. In order to better extract the local feature information for each subtask, we designed a Named Entity Recognition Generation (NERG) module and a Relation Extraction Generation (REG) module by adopting the design idea of the decoder in the Transformer and average pooling operations to better capture the entity boundary information in the sentence and the entity pair boundary information for each relation in the relational triple, respectively. Additionally, we propose a dynamic loss function that dynamically adjusts the learning weight of each subtask in each epoch according to the proportion between the subtasks, thus narrowing the gap between the ideal and actual results. We thoroughly evaluated our model on the SciERC dataset and the ACE2005 dataset. The experimental results demonstrate that our model achieves satisfactory results compared to the baseline model.
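The dynamic loss function can be sketched as follows; the abstract does not give the exact weighting rule, so the proportional scheme, the function name, and the example losses below are assumptions for illustration only:

```python
def dynamic_task_weights(loss_ner, loss_re):
    """Illustrative dynamic weighting: each subtask's weight is its
    share of the total loss, so the currently harder subtask gets
    more emphasis in the next epoch."""
    total = loss_ner + loss_re
    return loss_ner / total, loss_re / total

# Recomputed at the end of each epoch from that epoch's mean losses.
w_ner, w_re = dynamic_task_weights(0.8, 0.2)
joint_loss = w_ner * 0.8 + w_re * 0.2
```

The weights always sum to one, so the joint loss stays on a stable scale while its balance shifts between epochs.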

2.
Sensors (Basel) ; 23(8)2023 Apr 18.
Article in English | MEDLINE | ID: mdl-37112404

ABSTRACT

Accurate and robust camera pose estimation is essential for high-level applications such as augmented reality and autonomous driving. Despite the development of global feature-based camera pose regression methods and local feature-based matching guided pose estimation methods, challenging conditions such as illumination changes and viewpoint changes, as well as inaccurate keypoint localization, continue to affect the performance of camera pose estimation. In this paper, we propose a novel relative camera pose regression framework that uses global features with rotation consistency and local features with rotation invariance. First, we apply a multi-level deformable network to detect and describe local features, which learns appearance and gradient information that is sensitive to rotation variation. Second, we guide the detection and description stages using pixel correspondences between the input image pairs. Finally, we propose a novel loss that combines a relative regression loss and an absolute regression loss, incorporating global features with geometric constraints to optimize the pose estimation model. Our extensive experiments report satisfactory accuracy on the 7Scenes dataset, with an average mean translation error of 0.18 m and a rotation error of 7.44° using image pairs as input. Ablation studies were also conducted to verify the effectiveness of the proposed method in the tasks of pose estimation and image matching using the 7Scenes and HPatches datasets.
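A hedged sketch of the combined-loss idea: the exact formulation, the `alpha`/`beta` weights, and the rotation parameterization are not specified in the abstract and are assumed here.

```python
import numpy as np

def pose_loss(t_pred, t_gt, r_pred, r_gt, alpha=1.0):
    # translation error plus weighted rotation error (alpha is assumed)
    return np.linalg.norm(t_pred - t_gt) + alpha * np.linalg.norm(r_pred - r_gt)

def combined_pose_loss(rel_terms, abs_terms, beta=0.5):
    """Combine a relative regression loss with an absolute regression
    loss; beta balancing the two is a hypothetical value."""
    return pose_loss(*rel_terms) + beta * pose_loss(*abs_terms)
```

A perfect prediction drives both terms to zero; the absolute term acts as a geometric regularizer on the relative regression.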

3.
Sensors (Basel) ; 23(18)2023 Sep 06.
Article in English | MEDLINE | ID: mdl-37765772

ABSTRACT

Three-dimensional face recognition is an important part of the field of computer vision. Point clouds are widely used in the field of 3D vision due to their simple mathematical expression. However, the disorder of the points makes it difficult for them to have ordered indexes in convolutional neural networks. In addition, point clouds lack detailed textures, which makes facial features easily affected by expression or head pose changes. To solve the above problems, this paper constructs a new face recognition network, which mainly consists of two parts. The first part is a novel operator based on a local feature descriptor to realize fine-grained feature extraction and the permutation invariance of point clouds. The second part is a feature enhancement mechanism to enhance the discrimination of facial features. In order to verify the performance of our method, we conducted experiments on three public datasets: CASIA-3D, Bosphorus, and Lock3Dface. The results show that the accuracy of our method is improved by 0.7%, 0.4%, and 0.8% compared with the latest methods on these three datasets, respectively.

4.
Sensors (Basel) ; 23(19)2023 Sep 28.
Article in English | MEDLINE | ID: mdl-37836968

ABSTRACT

Local feature extraction has been verified to be effective for person re-identification (re-ID) in recent literature. However, existing methods usually rely on extracting local features from a single part of a pedestrian image while neglecting the relationship of local features among different pedestrian images. As a result, local features contain limited information from one pedestrian image and cannot benefit from other pedestrian images. In this paper, we propose a novel approach named Local Relation-Aware Graph Convolutional Network (LRGCN) to learn the relationship of local features among different pedestrian images. In order to completely describe the relationship of local features among different pedestrian images, we propose an overlap graph and a similarity graph. The overlap graph formulates the edge weight as the number of overlapping nodes in the nodes' neighborhoods so as to learn robust local features, and the similarity graph defines the edge weight as the similarity between the nodes to learn discriminative local features. To propagate the information for different kinds of nodes effectively, we propose the Structural Graph Convolution (SGConv) operation. Different from traditional graph convolution operations, where all nodes share the same parameter matrix, SGConv learns different parameter matrices for the node itself and its neighbor nodes to improve the expressive power. We conduct comprehensive experiments to verify our method on four large-scale person re-ID databases, and the overall results show that LRGCN exceeds the state-of-the-art methods.
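The SGConv idea, separate parameter matrices for a node and its neighbors, can be sketched minimally; mean aggregation and the ReLU are assumptions, since the abstract does not specify the exact aggregation:

```python
import numpy as np

def sgconv(X, A, W_self, W_neigh):
    """Structural graph convolution sketch: unlike a standard GCN,
    where all nodes share one parameter matrix, the node itself and
    its neighbors use different matrices before the nonlinearity."""
    deg = A.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                # avoid division by zero
    neigh = (A @ X) / deg              # mean of neighbor features
    return np.maximum(X @ W_self + neigh @ W_neigh, 0.0)  # ReLU
```

With both matrices set to the identity, the output is simply each node's feature plus its neighbor mean, which makes the separation of the two roles easy to see.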

5.
Sensors (Basel) ; 23(7)2023 Mar 24.
Article in English | MEDLINE | ID: mdl-37050483

ABSTRACT

There are problems associated with facial expression recognition (FER), such as facial occlusion and head pose variations. These two problems lead to incomplete facial information in images, making feature extraction extremely difficult. Most current methods use prior knowledge or fixed-size patches to perform local cropping, thereby enhancing the ability to acquire fine-grained features. However, the former requires extra data processing work and is prone to errors, while the latter destroys the integrity of local features. In this paper, we propose a local Sliding Window Attention Network (SWA-Net) for FER. Specifically, we propose a sliding window strategy for feature-level cropping, which preserves the integrity of local features and does not require complex preprocessing. Moreover, a local feature enhancement module mines fine-grained features with intraclass semantics through a multiscale depth network. An adaptive local feature selection module is introduced to prompt the model to find more essential local features. Extensive experiments demonstrate that our SWA-Net model achieves performance comparable to state-of-the-art methods, with scores of 90.03% on RAF-DB, 89.22% on FERPlus, and 63.97% on AffectNet.
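Feature-level sliding-window cropping can be illustrated with a short sketch; the window size, stride, and feature-map layout below are assumed for illustration:

```python
import numpy as np

def sliding_window_crops(fmap, win, stride):
    """Feature-level sliding-window cropping: take overlapping
    win x win patches from a C x H x W feature map, so each local
    region stays intact and no image-level cropping is needed."""
    C, H, W = fmap.shape
    crops = []
    for y in range(0, H - win + 1, stride):
        for x in range(0, W - win + 1, stride):
            crops.append(fmap[:, y:y + win, x:x + win])
    return np.stack(crops)

# 3 window positions per axis on a 7x7 map -> 9 crops of shape (8, 3, 3)
crops = sliding_window_crops(np.zeros((8, 7, 7)), win=3, stride=2)
```

Each crop can then be fed to the enhancement and selection modules independently.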


Subject(s)
Facial Recognition, Face, Knowledge, Semantics, Facial Expression
6.
BMC Bioinformatics ; 23(1): 538, 2022 Dec 12.
Article in English | MEDLINE | ID: mdl-36503372

ABSTRACT

BACKGROUND: Investigating molecular heterogeneity provides insights into tumour origin and metabolomics. The increasing amount of data gathered makes manual analyses infeasible; therefore, automated unsupervised learning approaches are utilised for discovering tissue heterogeneity. However, automated analyses require experience setting the algorithms' hyperparameters and expert knowledge about the analysed biological processes. Moreover, feature engineering is needed to obtain valuable results because of the numerous features measured. RESULTS: We propose DiviK: a scalable stepwise algorithm with local data-driven feature space adaptation for segmenting high-dimensional datasets. The algorithm is compared to alternative solutions (regular k-means, spatial and spectral approaches) combined with different feature engineering techniques (None, PCA, EXIMS, UMAP, Neural Ions). Three quality indices, the Dice Index, Rand Index and EXIMS score, focusing on the overall composition of the clustering, coverage of the tumour region and spatial cluster consistency, are used to assess the quality of the unsupervised analyses. Algorithms were validated on mass spectrometry imaging (MSI) datasets: 2D human cancer tissue samples and 3D mouse kidney images. The DiviK algorithm performed the best among the four clustering algorithms compared (overall quality score 1.24, 0.58 and 162 for d(0, 0, 0), d(1, 1, 1) and the sum of ranks, respectively), with spectral clustering mostly second. Feature engineering techniques impact the overall clustering results less than the algorithms themselves (partial [Formula: see text] effect size: 0.141 versus 0.345; Kendall's concordance index: 0.424 versus 0.138 for d(0, 0, 0)). CONCLUSIONS: DiviK could be the default choice in the exploration of MSI data. Thanks to its unique GMM-based local optimisation of the feature space and deglomerative schema, DiviK results do not strongly depend on the feature engineering technique applied and can reveal the hidden structure in a tissue sample. Additionally, DiviK shows high scalability: it can process at once big omics data with more than 1.5 million instances and a few thousand features. Finally, due to its simplicity, DiviK is easily generalisable to an even more flexible framework. Therefore, it is helpful for other -omics data (such as single-cell spatial transcriptomics) or tabular data in general (including medical images after appropriate embedding). A generic implementation is freely available under the Apache 2.0 license at https://github.com/gmrukwa/divik .


Subject(s)
Algorithms, Metabolomics, Animals, Mice, Humans, Cluster Analysis, Mass Spectrometry, Big Data
7.
Exp Brain Res ; 240(3): 773-789, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35034179

ABSTRACT

Previous studies have paid special attention to the relationship between local features (e.g., raised dots) and human roughness perception. However, the relationship between global features (e.g., curved surface) and haptic roughness perception is still unclear. In the present study, a series of roughness estimation experiments was performed to investigate how global features affect human roughness perception. In each experiment, participants were asked to estimate the roughness of a series of haptic stimuli that combined local features (raised dots) and global features (sinusoidal-like curves). Experiments were designed to reveal whether global features changed their haptic roughness estimation. Furthermore, the present study tested whether the exploration method (direct, indirect, and static) changed haptic roughness estimations and examined the contribution of global features to roughness estimations. The results showed that sinusoidal-like curved surfaces with small periods were perceived to be rougher than those with large periods, while the direction of finger movement and indirect exploration did not change this phenomenon. Furthermore, the influence of global features on roughness was modulated by local features, regardless of whether raised-dot surfaces or smooth surfaces were used. Taken together, these findings suggested that an object's global features contribute to haptic roughness perceptions, while local features change the weight of the contribution that global features make to haptic roughness perceptions.


Subject(s)
Haptic Technology, Touch Perception, Fingers, Humans, Movement, Stereognosis, Touch
8.
Sensors (Basel) ; 22(24)2022 Dec 15.
Article in English | MEDLINE | ID: mdl-36560267

ABSTRACT

Local feature matching is a part of many large vision tasks. Local feature matching usually consists of three parts: feature detection, description, and matching. The matching task usually serves a downstream task, such as camera pose estimation, so geometric information is crucial for the matching task. We propose the geometric feature embedding matching method (GFM) for local feature matching. We propose an adaptive keypoint geometric embedding module, which dynamically adjusts keypoint position information, and an orientation geometric embedding module, which explicitly models the geometric information about rotation. Subsequently, we interleave the use of self-attention and cross-attention for local feature enhancement. The predicted correspondences are multiplied by the local features, and the correspondences are solved by computing a dual-softmax. An intuitive extraction and matching scheme is implemented. In order to verify the effectiveness of our proposed method, we performed validation on three datasets (MegaDepth, HPatches, Aachen Day-Night v1.1) according to their respective metrics, and the results showed that our method achieved satisfactory results in all scenes.
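Dual-softmax correspondence scoring, as used in the matching step, can be sketched as follows; descriptor normalization and any temperature scaling are omitted, and the function name is illustrative:

```python
import numpy as np

def dual_softmax_matches(desc_a, desc_b):
    """Dual-softmax scoring: softmax the similarity matrix along both
    axes and multiply, so a pair scores highly only when each point
    prefers the other mutually."""
    sim = desc_a @ desc_b.T

    def softmax(x, axis):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    return softmax(sim, axis=1) * softmax(sim, axis=0)
```

For well-separated descriptors the score matrix is strongly diagonal, and matches are read off with an argmax (optionally after mutual-check and thresholding).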


Subject(s)
Algorithms, Humans
9.
Sensors (Basel) ; 22(9)2022 Apr 22.
Article in English | MEDLINE | ID: mdl-35590899

ABSTRACT

The research of object classification and part segmentation is a hot topic in computer vision, robotics, and virtual reality. With the emergence of depth cameras, point clouds have become easier to collect and increasingly important because of their simple and unified structures. Recently, a considerable number of studies have been carried out about deep learning on 3D point clouds. However, data captured directly by sensors in the real world often suffer from severely incomplete sampling. The classical network is able to learn deep point set features efficiently, but it is not robust enough when points are missing. In this work, a novel and general network is proposed, whose effect does not depend on a large amount of point cloud input data. The mutual learning of neighboring points and the fusion between high and low feature layers can better promote the integration of local features, making the network more robust. Specific experiments were conducted on the ScanNet and ModelNet40 datasets with 84.5% and 92.8% accuracy, respectively, which proved that our model is comparable or even better than most existing methods for classification and segmentation tasks, and has good local feature integration ability. In particular, it can still maintain 87.4% accuracy when the number of input points is reduced to 128. The proposed model bridges the gap between classical networks and point cloud processing.


Subject(s)
Robotics, Virtual Reality, Cloud Computing, Neural Networks (Computer)
10.
Entropy (Basel) ; 24(2)2022 Feb 16.
Article in English | MEDLINE | ID: mdl-35205580

ABSTRACT

Handling missing values in matrix data is an important step in data analysis. To date, many methods to estimate missing values based on data pattern similarity have been proposed. Most previously proposed methods perform missing value imputation based on data trends over the entire feature space. However, individual missing values are likely to show similarity to data patterns in local feature space. In addition, most existing methods focus on single class data, while multiclass analysis is frequently required in various fields. Missing value imputation for multiclass data must consider the characteristics of each class. In this paper, we propose two methods based on closed itemsets, CIimpute and ICIimpute, to achieve missing value imputation using local feature space for multiclass matrix data. CIimpute estimates missing values using closed itemsets extracted from each class. ICIimpute is an improved method of CIimpute in which an attribute reduction process is introduced. Experimental results demonstrate that attribute reduction considerably reduces computational time and improves imputation accuracy. Furthermore, it is shown that, compared to existing methods, ICIimpute provides superior imputation accuracy but requires more computational time.
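A heavily simplified, hypothetical stand-in illustrates why multiclass imputation must treat each class separately; the closed-itemset machinery of CIimpute/ICIimpute is not reproduced here:

```python
import numpy as np

def class_mean_impute(X, y):
    """Fill each NaN with the mean of that feature computed within
    the sample's own class only, so one class's data pattern never
    contaminates another's (much simpler than closed itemsets)."""
    X = X.astype(float).copy()
    for c in np.unique(y):
        rows = np.where(y == c)[0]
        block = X[rows]                       # fancy indexing copies
        col_means = np.nanmean(block, axis=0)
        nan_r, nan_c = np.where(np.isnan(block))
        block[nan_r, nan_c] = col_means[nan_c]
        X[rows] = block
    return X
```

A class-wide mean uses the local (per-class) feature space in the crudest possible way; the paper's methods instead mine closed itemsets per class for a far finer notion of pattern similarity.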

11.
Entropy (Basel) ; 24(4)2022 Mar 23.
Article in English | MEDLINE | ID: mdl-35455106

ABSTRACT

Visible thermal person re-identification (VT Re-ID) is the task of matching pedestrian images collected by thermal and visible light cameras. The two main challenges presented by VT Re-ID are the intra-class variation between pedestrian images and the cross-modality difference between visible and thermal images. Existing works have principally focused on local representation through cross-modality feature distribution, but ignore the internal connections among the local features of pedestrian body parts. Therefore, this paper proposes a dual-path attention network model to establish the spatial dependency relationships between the local features of the pedestrian feature map and to effectively enhance feature extraction. Meanwhile, we propose a cross-modality dual-constraint loss, which adds center and boundary constraints for each class distribution in the embedding space to promote compactness within each class and enhance the separability between classes. Our experimental results show that our proposed approach has advantages over state-of-the-art methods on the two public datasets SYSU-MM01 and RegDB. On SYSU-MM01, the result is a Rank-1/mAP of 57.74%/54.35%; on RegDB, it is 76.07%/69.43%.

12.
Sensors (Basel) ; 21(22)2021 Nov 19.
Article in English | MEDLINE | ID: mdl-34833763

ABSTRACT

The goal of the WrightBroS project is to design a system supporting the training of pilots in a flight simulator. The desired software should work on smart glasses supplementing the visual information with augmented reality data, displaying, for instance, additional training information or descriptions of visible devices in real time. Therefore, the rapid recognition of observed objects and their exact positioning is crucial for successful deployment. The keypoint descriptor approach is a natural framework that is used for this purpose. For this to be applied, a thorough examination of specific keypoint location methods and types of keypoint descriptors is required first, as these are essential factors that affect the overall accuracy of the approach. In the presented research, we prepared a dedicated database presenting 27 different devices of a flight simulator. Then, we used it to compare existing state-of-the-art techniques and verify their applicability. We investigated the time necessary for the computation of a keypoint position, the time needed for the preparation of a descriptor, and the classification accuracy of the considered approaches. In total, we compared the outcomes of 12 keypoint location methods and 10 keypoint descriptors. The best score recorded for our database was almost 96% for a combination of the ORB method for keypoint localization followed by the BRISK approach as a descriptor.
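Binary descriptors such as those produced by ORB and BRISK are compared by Hamming distance; a minimal brute-force matcher over packed bit-strings might look like this (the function name is illustrative):

```python
import numpy as np

def hamming_match(desc_db, desc_query):
    """Brute-force Hamming matching of binary descriptors (the metric
    used for ORB/BRISK bit-strings): unpack each byte row into bits
    and pick, for each query descriptor, the database row with the
    fewest differing bits."""
    bits_db = np.unpackbits(desc_db, axis=1)
    bits_q = np.unpackbits(desc_query, axis=1)
    dists = (bits_q[:, None, :] != bits_db[None, :, :]).sum(axis=2)
    return dists.argmin(axis=1)
```

Classification of an observed device then reduces to voting over the nearest database descriptors.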


Subject(s)
Algorithms, Software, Databases, Factual
13.
Sensors (Basel) ; 21(22)2021 Nov 11.
Article in English | MEDLINE | ID: mdl-34833580

ABSTRACT

Finding news reports on the same case among large numbers of case-involved news items is an important basis for public opinion analysis. Existing text clustering methods are usually based on topic models that only use topic and case information as the global features of documents, so distinguishing between different cases of similar types remains a challenge. The contents of documents contain rich local features. Taking into account the internal features of news, the information of cases, and the contributions provided by different topics, we propose a clustering method for case-involved news that combines a topic network and a multi-head attention mechanism. Case information and topic information are used to construct a topic network, and the global features are then extracted by a graph convolutional network, realizing the combination of case information and topic information. At the same time, the local features are extracted by a multi-head attention mechanism. Finally, the fusion of global and local features is realized by a variational auto-encoder, and the learned latent representations are used for clustering. The experiments show that the proposed method significantly outperforms state-of-the-art unsupervised clustering methods.


Subject(s)
Learning, Cluster Analysis
14.
J Med Syst ; 43(6): 152, 2019 Apr 23.
Article in English | MEDLINE | ID: mdl-31016467

ABSTRACT

Accurate and reliable brain tumor segmentation is a critical component in cancer diagnosis. Based on a deep learning model, a novel brain tumor segmentation method is developed by integrating fully convolutional neural networks (FCNN) and dense micro-block difference features (DMDF) into a unified framework so as to obtain segmentation results with appearance and spatial consistency. Firstly, we propose a local feature to describe the rotation invariant property of the texture. In order to deal with the change of rotation and scale in texture images, the Fisher vector encoding method is used to analyze the texture feature, which can incorporate the scale information without increasing the dimension of the local feature. The obtained local features have strong robustness to rotation and gray intensity variation. Then, the non-quantifiable local feature is fused into the FCNN to perform fine boundary segmentation. Since brain tumors occupy a small portion of the image, deconvolutional layers are designed with skip connections to obtain a high-quality feature map. Compared with traditional MRI brain tumor segmentation methods, the experimental results show that the segmentation accuracy and stability have been greatly improved. The average Dice index reaches 90.98%. The proposed method also has very high real-time performance: a brain tumor image can be segmented within 1 s.


Subject(s)
Brain Neoplasms/pathology, Image Processing, Computer-Assisted/methods, Neural Networks (Computer), Algorithms, Deep Learning, Humans, Magnetic Resonance Imaging
15.
J Med Syst ; 43(7): 231, 2019 Jun 14.
Article in English | MEDLINE | ID: mdl-31201559

ABSTRACT

The traditional texture feature lacks directional analysis of graphical elements, so it cannot well distinguish thyroid nodule texture images formed by the rotation of graphical elements. A non-quantifiable local feature is adopted in this paper to design a robust texture descriptor so as to enhance the robustness of texture classification under rotation and scale changes, which can improve the diagnostic accuracy of thyroid nodules in ultrasound images. First of all, the concept of a local feature with rotational symmetry is introduced. It is found that many rotation invariant local features are rotationally symmetric to a certain degree. Therefore, we propose a novel local feature to describe the rotation invariant properties of the texture. In order to deal with the change of rotation and scale of ultrasound thyroid nodules in images, the pairwise rotation-invariant spatial context feature is adopted to analyze the texture feature, which can incorporate the scale information without increasing the dimension of the local feature. The adopted local features have strong robustness to rotation and gray intensity variation. The experimental results show that our proposed method outperforms the existing algorithms on thyroid ultrasound data sets, greatly improving the diagnostic accuracy of thyroid nodules.


Subject(s)
Image Interpretation, Computer-Assisted/methods, Thyroid Nodule/diagnosis, Ultrasonography/methods, Algorithms, Humans, Pattern Recognition, Automated/methods, Thyroid Nodule/diagnostic imaging
16.
Sensors (Basel) ; 18(7)2018 Jul 12.
Article in English | MEDLINE | ID: mdl-30002288

ABSTRACT

As the traditional single-camera endoscope can only provide clear images without 3D measurement and 3D reconstruction, a miniature binocular endoscope based on the principle of binocular stereoscopic vision to implement 3D measurement and 3D reconstruction in tight and restricted spaces is presented. In order to realize the exact matching of points of interest in the left and right images, a novel construction method for the weighted orthogonal-symmetric local binary pattern (WOS-LBP) descriptor is presented. Then a stereo matching algorithm based on the Gaussian-weighted AD-Census transform and improved cross-based adaptive regions is studied to realize 3D reconstruction of real scenes. In the algorithm, we adjust the determination criteria of adaptive regions for edge and discontinuous areas in particular, and extract mismatched pixels caused by occlusion through image entropy and a region-growing algorithm. This paper develops a binocular endoscope with an external diameter of 3.17 mm, and the above algorithms are applied to it. The endoscope contains two CMOS cameras and four fiber optics for illumination. Three conclusions are drawn from experiments: (1) the proposed descriptor has good rotation invariance, distinctiveness, and robustness to light changes as well as noise; (2) the proposed stereo matching algorithm has a mean relative error of 8.48% on the Middlebury standard pairs of images, and compared with several classical stereo matching algorithms, our algorithm performs better in edge and discontinuous areas; (3) the mean relative error of length measurement is 3.22%, and the endoscope can be utilized to measure and reconstruct real scenes effectively.


Subject(s)
Algorithms, Endoscopes, Image Processing, Computer-Assisted/instrumentation, Imaging, Three-Dimensional/instrumentation, Vision, Binocular, Depth Perception
17.
J Imaging ; 10(6)2024 May 29.
Article in English | MEDLINE | ID: mdl-38921610

ABSTRACT

Accurate and robust 3D human modeling from a single image presents significant challenges. Existing methods have shown potential, but they often fail to generate reconstructions that match the level of detail in the input image. These methods particularly struggle with loose clothing. They typically employ parameterized human models to constrain the reconstruction process, ensuring the results do not deviate too far from the model or produce anomalies. However, this also limits the recovery of loose clothing. To address this issue, we propose an end-to-end method called IHRPN for reconstructing clothed humans from a single 2D human image. This method includes a feature extraction module for semantic extraction of image features. We propose an image semantic feature extraction module aimed at achieving pixel-to-model space consistency and enhancing the robustness of loose clothing. We extract features from the input image to infer and recover the SMPL-X mesh, and then combine it with a normal map to guide the implicit function to reconstruct the complete clothed human. Unlike traditional methods, we use local features for implicit surface regression. Our experimental results show that our IHRPN method performs excellently on the CAPE and AGORA datasets, achieving good performance, and the reconstruction of loose clothing is noticeably more accurate and robust.

18.
J Imaging ; 10(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38667982

ABSTRACT

Local feature description of point clouds is essential in 3D computer vision. However, many local feature descriptors for point clouds struggle with inadequate robustness, excessive dimensionality, and poor computational efficiency. To address these issues, we propose a novel descriptor based on Planar Projection Contours, characterized by convex hull contour information. We construct the Local Reference Frame (LRF) through covariance analysis of the query point and its neighboring points. Neighboring points are projected onto three orthogonal planes defined by the LRF. The projected points on each plane are fitted into convex hull contours and encoded as local features. These planar features are then concatenated to create the Planar Projection Contour (PPC) descriptor. We evaluated the performance of the PPC descriptor against classical descriptors using the B3R, UWAOR, and Kinect datasets. Experimental results demonstrate that the PPC descriptor achieves an accuracy exceeding 80% across all recall levels, even under high-noise and point density variation conditions, underscoring its effectiveness and robustness.
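The first two stages of the PPC pipeline, covariance-based LRF construction and projection onto the three orthogonal planes, can be sketched as follows; the convex hull fitting and contour encoding steps are omitted:

```python
import numpy as np

def local_reference_frame(neighbors, query):
    """LRF from covariance analysis: the eigenvectors of the local
    covariance matrix give three orthogonal axes (sign disambiguation,
    which a real LRF needs, is omitted here)."""
    centered = neighbors - query
    cov = centered.T @ centered / len(neighbors)
    _, vecs = np.linalg.eigh(cov)
    return vecs  # columns are axes, ascending eigenvalue order

def planar_projections(neighbors, query):
    """Express neighbors in LRF coordinates, then drop one axis at a
    time to obtain the three orthogonal planar projections that the
    PPC descriptor would fit convex hull contours to."""
    axes = local_reference_frame(neighbors, query)
    local = (neighbors - query) @ axes
    return [np.delete(local, k, axis=1) for k in range(3)]
```

Each of the three 2D point sets would then be reduced to its convex hull contour and encoded; concatenating the three encodings yields the descriptor.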

19.
ISA Trans ; 146: 319-335, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38220542

ABSTRACT

Blind deconvolution can remove the effects of complex paths and extraneous disturbances, thus recovering simple features of the original fault source, and is used extensively in the field of fault diagnosis. However, it can only identify and extract the statistical mean of the fault impact features in a single domain and is unable to simultaneously highlight the local features of the signal in the time-frequency domain. Therefore, the extraction effect on weak fault signals is generally not ideal. In this paper, a new time-frequency slice extraction method is proposed. The method first computes a high temporal resolution spectrum of the signal by the short-time Fourier transform to obtain multiple frequency slices with distinct temporal waveforms. Subsequently, the constructed harmonic spectral feature index is used to quantify the intensity of feature information in each frequency slice, select the target slices, and enhance their fault characteristics using maximum correlated kurtosis deconvolution. Enhancing the local features of the selected frequency slice clusters can reduce noise interference and obtain signal components with more obvious fault signatures. Finally, the validity of the method was confirmed on a simulated signal, and fault diagnosis of the rolling bearing outer and inner rings was accomplished sequentially. Compared with other common deconvolution methods, the proposed method obtains more accurate and effective results in identifying fault information.
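The slice-extraction idea can be sketched with a hand-rolled STFT; the harmonic spectral feature index itself is not specified in the abstract, so plain spectral kurtosis is used below as a stand-in for ranking slices:

```python
import numpy as np

def frequency_slices(signal, win=64, hop=16):
    """Short-time Fourier transform by hand: each row of the result
    is one frequency slice, i.e. the magnitude of one frequency bin
    over time (Hann window; parameters are illustrative)."""
    frames = [signal[i:i + win] * np.hanning(win)
              for i in range(0, len(signal) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1)).T

def kurtosis(x):
    x = x - x.mean()
    return (x ** 4).mean() / (x ** 2).mean() ** 2

def most_impulsive_slice(signal):
    """Pick the frequency slice whose envelope is most impulsive
    (highest kurtosis), a simple stand-in for the paper's harmonic
    spectral feature index; the selected slice would then be
    enhanced by deconvolution."""
    S = frequency_slices(signal)
    return int(np.argmax([kurtosis(row) for row in S]))
```

Fault impacts concentrate energy impulsively in a few bands, so the highest-kurtosis slices are the natural candidates for the deconvolution step.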

20.
Comput Biol Med ; 168: 107717, 2024 01.
Article in English | MEDLINE | ID: mdl-38007973

ABSTRACT

Current medical image segmentation approaches have limitations in deeply exploring multi-scale information and effectively combining local detail textures with global contextual semantic information. This results in over-segmentation, under-segmentation, and blurred segmentation boundaries. To tackle these challenges, we explore multi-scale feature representations from different perspectives, proposing a novel, lightweight, multi-scale architecture (LM-Net) that integrates the advantages of both Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) to enhance segmentation accuracy. LM-Net employs a lightweight multi-branch module to capture multi-scale features at the same level. Furthermore, we introduce two modules to concurrently capture local detail textures and global semantics with multi-scale features at different levels: the Local Feature Transformer (LFT) and the Global Feature Transformer (GFT). The LFT integrates local window self-attention to capture local detail textures, while the GFT leverages global self-attention to capture global contextual semantics. By combining these modules, our model achieves complementarity between local and global representations, alleviating the problem of blurred segmentation boundaries in medical image segmentation. To evaluate the feasibility of LM-Net, extensive experiments have been conducted on three publicly available datasets with different modalities. Our proposed model achieves state-of-the-art results, surpassing previous methods, while requiring only 4.66G FLOPs and 5.4M parameters. These state-of-the-art results on three datasets with different modalities demonstrate the effectiveness and adaptability of our proposed LM-Net for various medical image segmentation tasks.
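Local window self-attention, as in the LFT, can be sketched in one dimension; the non-overlapping windows and the scaling are assumptions, and the real module operates on 2D feature maps:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(X, win):
    """Local window self-attention sketch: split the token sequence
    into non-overlapping windows and attend only within each window,
    capturing local detail at a cost linear in sequence length
    (projections for Q/K/V are omitted for brevity)."""
    N, d = X.shape
    out = np.empty_like(X)
    for s in range(0, N, win):
        W = X[s:s + win]
        A = softmax(W @ W.T / np.sqrt(d))
        out[s:s + win] = A @ W
    return out
```

Global self-attention (as in the GFT) would instead use a single window spanning all tokens, which is the complementary, quadratic-cost operation.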


Subject(s)
Neural Networks (Computer), Semantics, Image Processing, Computer-Assisted