Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 36
Filtrar
1.
Sensors (Basel) ; 24(12)2024 Jun 14.
Artículo en Inglés | MEDLINE | ID: mdl-38931629

RESUMEN

Existing end-to-end speech recognition methods typically employ hybrid decoders based on CTC and Transformer. However, the issue of error accumulation in these hybrid decoders hinders further improvements in accuracy. Additionally, most existing models are built upon Transformer architecture, which tends to be complex and unfriendly to small datasets. Hence, we propose a Nonlinear Regularization Decoding Method for Speech Recognition. Firstly, we introduce the nonlinear Transformer decoder, breaking away from traditional left-to-right or right-to-left decoding orders and enabling associations between any characters, mitigating the limitations of Transformer architectures on small datasets. Secondly, we propose a novel regularization attention module to optimize the attention score matrix, reducing the impact of early errors on later outputs. Finally, we introduce the tiny model to address the challenge of overly large model parameters. The experimental results indicate that our model demonstrates good performance. Compared to the baseline, our model achieves recognition improvements of 0.12%, 0.54%, 0.51%, and 1.2% on the Aishell1, Primewords, Free ST Chinese Corpus, and Common Voice 16.1 datasets of Uyghur, respectively.


Asunto(s)
Algoritmos , Software de Reconocimiento del Habla , Humanos , Habla/fisiología , Dinámicas no Lineales , Reconocimiento de Normas Patrones Automatizadas/métodos
2.
Sensors (Basel) ; 23(6)2023 Mar 13.
Artículo en Inglés | MEDLINE | ID: mdl-36991777

RESUMEN

At present, convolutional neural networks (CNNs) have been widely applied to the task of skin disease image segmentation due to the fact of their powerful information discrimination abilities and have achieved good results. However, it is difficult for CNNs to capture the connection between long-range contexts when extracting deep semantic features of lesion images, and the resulting semantic gap leads to the problem of segmentation blur in skin lesion image segmentation. In order to solve the above problems, we designed a hybrid encoder network based on transformer and fully connected neural network (MLP) architecture, and we call this approach HMT-Net. In the HMT-Net network, we use the attention mechanism of the CTrans module to learn the global relevance of the feature map to improve the network's ability to understand the overall foreground information of the lesion. On the other hand, we use the TokMLP module to effectively enhance the network's ability to learn the boundary features of lesion images. In the TokMLP module, the tokenized MLP axial displacement operation strengthens the connection between pixels to facilitate the extraction of local feature information by our network. In order to verify the superiority of our network in segmentation tasks, we conducted extensive experiments on the proposed HMT-Net network and several newly proposed Transformer and MLP networks on three public datasets (ISIC2018, ISBI2017, and ISBI2016) and obtained the following results. Our method achieves 82.39%, 75.53%, and 83.98% on the Dice index and 89.35%, 84.93%, and 91.33% on the IOU. Compared with the latest skin disease segmentation network, FAC-Net, our method improves the Dice index by 1.99%, 1.68%, and 1.6%, respectively. In addition, the IOU indicators have increased by 0.45%, 2.36%, and 1.13%, respectively. The experimental results show that our designed HMT-Net achieves state-of-the-art performance superior to other segmentation methods.


Asunto(s)
Suministros de Energía Eléctrica , Enfermedades de la Piel , Humanos , Aprendizaje , Redes Neurales de la Computación , Registros , Enfermedades de la Piel/diagnóstico por imagen , Procesamiento de Imagen Asistido por Computador
3.
Sensors (Basel) ; 23(13)2023 Jul 07.
Artículo en Inglés | MEDLINE | ID: mdl-37448077

RESUMEN

Although convolutional neural networks (CNNs) have produced great achievements in various fields, many scholars are still exploring better network models, since CNNs have an inherent limitation-that is, the remote modeling ability of convolutional kernels is limited. On the contrary, the transformer has been applied by many scholars to the field of vision, and although it has a strong global modeling capability, its close-range modeling capability is mediocre. While the foreground information to be segmented in medical images is usually clustered in a small interval in the image, the distance between different categories of foreground information is uncertain. Therefore, in order to obtain a perfect medical segmentation prediction graph, the network should not only have a strong learning ability for local details, but also have a certain distance modeling ability. To solve these problems, a remote feature exploration (RFE) module is proposed in this paper. The most important feature of this module is that remote elements can be used to assist in the generation of local features. In addition, in order to better verify the feasibility of the innovation in this paper, a new multi-organ segmentation dataset (MOD) was manually created. While both the MOD and Synapse datasets label eight categories of organs, there are some images in the Synapse dataset that label only a few categories of organs. The proposed method achieved 79.77% and 75.12% DSC on the Synapse and MOD datasets, respectively. Meanwhile, the HD95 (mm) scores were 21.75 on Synapse and 7.43 on the MOD dataset.


Asunto(s)
Algoritmos , Aprendizaje , Suministros de Energía Eléctrica , Inteligencia , Redes Neurales de la Computación , Procesamiento de Imagen Asistido por Computador
4.
Expert Syst Appl ; 228: 120389, 2023 Oct 15.
Artículo en Inglés | MEDLINE | ID: mdl-37193247

RESUMEN

Recent years have witnessed a growing interest in neural network-based medical image classification methods, which have demonstrated remarkable performance in this field. Typically, convolutional neural network (CNN) architectures have been commonly employed to extract local features. However, the transformer, a newly emerged architecture, has gained popularity due to its ability to explore the relevance of remote elements in an image through a self-attention mechanism. Despite this, it is crucial to establish not only local connectivity but also remote relationships between lesion features and capture the overall image structure to improve image classification accuracy. Therefore, to tackle the aforementioned issues, this paper proposes a network based on multilayer perceptrons (MLPs) that can learn the local features of medical images on the one hand and capture the overall feature information in both spatial and channel dimensions on the other hand, thus utilizing image features effectively. This paper has been extensively validated on COVID19-CT dataset and ISIC 2018 dataset, and the results show that the method in this paper is more competitive and has higher performance in medical image classification compared with existing methods. This shows that the use of MLP to capture image features and establish connections between lesions is expected to provide novel ideas for medical image classification tasks in the future.

5.
Sensors (Basel) ; 22(1)2022 Jan 02.
Artículo en Inglés | MEDLINE | ID: mdl-35009871

RESUMEN

Recently, many super-resolution reconstruction (SR) feedforward networks based on deep learning have been proposed. These networks enable the reconstructed images to achieve convincing results. However, due to a large amount of computation and parameters, SR technology is greatly limited in devices with limited computing power. To trade-off the network performance and network parameters. In this paper, we propose the efficient image super-resolution network via Self-Calibrated Feature Fuse, named SCFFN, by constructing the self-calibrated feature fuse block (SCFFB). Specifically, to recover the high-frequency detail information of the image as much as possible, we propose SCFFB by self-transformation and self-fusion of features. In addition, to accelerate the network training while reducing the computational complexity of the network, we employ an attention mechanism to elaborate the reconstruction part of the network, called U-SCA. Compared with the existing transposed convolution, it can greatly reduce the computation burden of the network without reducing the reconstruction effect. We have conducted full quantitative and qualitative experiments on public datasets, and the experimental results show that the network achieves comparable performance to other networks, while we only need fewer parameters and computational resources.


Asunto(s)
Algoritmos , Procesamiento de Imagen Asistido por Computador , Imagen por Resonancia Magnética
6.
Sensors (Basel) ; 22(18)2022 Sep 08.
Artículo en Inglés | MEDLINE | ID: mdl-36146132

RESUMEN

Doctors usually diagnose a disease by evaluating the pattern of abnormal blood vessels in the fundus. At present, the segmentation of fundus blood vessels based on deep learning has achieved great success, but it still faces the problems of low accuracy and capillary rupture. A good vessel segmentation method can guide the early diagnosis of eye diseases, so we propose a novel hybrid Transformer network (HT-Net) for fundus imaging analysis. HT-Net can improve the vessel segmentation quality by capturing detailed local information and implementing long-range information interactions, and it mainly consists of the following blocks. The feature fusion block (FFB) is embedded in the shallow levels, and FFB enriches the feature space. In addition, the feature refinement block (FRB) is added to the shallow position of the network, which solves the problem of vessel scale change by fusing multi-scale feature information to improve the accuracy of segmentation. Finally, HT-Net's bottom-level position can capture remote dependencies by combining the Transformer and CNN. We prove the performance of HT-Net on the DRIVE, CHASE_DB1, and STARE datasets. The experiment shows that FFB and FRB can effectively improve the quality of microvessel segmentation by extracting multi-scale information. Embedding efficient self-attention mechanisms in the network can effectively improve the vessel segmentation accuracy. The HT-Net exceeds most existing methods, indicating that it can perform the task of vessel segmentation competently.


Asunto(s)
Algoritmos , Vasos Retinianos , Fondo de Ojo , Procesamiento de Imagen Asistido por Computador/métodos , Vasos Retinianos/diagnóstico por imagen
7.
Sensors (Basel) ; 22(18)2022 Sep 16.
Artículo en Inglés | MEDLINE | ID: mdl-36146373

RESUMEN

The model, Transformer, is known to rely on a self-attention mechanism to model distant dependencies, which focuses on modeling the dependencies of the global elements. However, its sensitivity to the local details of the foreground information is not significant. Local detail features help to identify the blurred boundaries in medical images more accurately. In order to make up for the defects of Transformer and capture more abundant local information, this paper proposes an attention and MLP hybrid-encoder architecture combining the Efficient Attention Module (EAM) with a Dual-channel Shift MLP module (DS-MLP), called HEA-Net. Specifically, we effectively connect the convolution block with Transformer through EAM to enhance the foreground and suppress the invalid background information in medical images. Meanwhile, DS-MLP further enhances the foreground information via channel and spatial shift operations. Extensive experiments on public datasets confirm the excellent performance of our proposed HEA-Net. In particular, on the GlaS and MoNuSeg datasets, the Dice reached 90.56% and 80.80%, respectively, and the IoU reached 83.62% and 68.26%, respectively.


Asunto(s)
Procesamiento de Imagen Asistido por Computador , Redes Neurales de la Computación , Algoritmos , Procesamiento de Imagen Asistido por Computador/métodos
8.
Sensors (Basel) ; 22(8)2022 Apr 15.
Artículo en Inglés | MEDLINE | ID: mdl-35459043

RESUMEN

Recently, the feedforward architecture of a super-resolution network based on deep learning was proposed to learn the representation of a low-resolution (LR) input and the non-linear mapping from these inputs to a high-resolution (HR) output, but this method cannot completely solve the interdependence between LR and HR images. In this paper, we retain the feedforward architecture and introduce residuals to a dual-level; therefore, we propose the dual-level recurrent residual network (DLRRN) to generate an HR image with rich details and satisfactory vision. Compared with feedforward networks that operate at a fixed spatial resolution, the dual-level recurrent residual block (DLRRB) in DLRRN utilizes both LR and HR space information. The circular signals in DLRRB enhance spatial details by the mutual guidance between two directions (LR to HR and HR to LR). Specifically, the LR information of the current layer is generated by the HR and LR information of the previous layer. Then, the HR information of the previous layer and LR information of the current layer jointly generate the HR information of the current layer, and so on. The proposed DLRRN has a strong ability for early reconstruction and can gradually restore the final high-resolution image. An extensive quantitative and qualitative evaluation of the benchmark dataset was carried out, and the experimental results proved that our network achieved good results in terms of network parameters, visual effects and objective performance metrics.

9.
Sensors (Basel) ; 22(12)2022 Jun 14.
Artículo en Inglés | MEDLINE | ID: mdl-35746271

RESUMEN

Different feature learning strategies have enhanced performance in recent deep neural network-based salient object detection. Multi-scale strategy and residual learning strategies are two types of multi-scale learning strategies. However, there are still some problems, such as the inability to effectively utilize multi-scale feature information and the lack of fine object boundaries. We propose a feature refined network (FRNet) to overcome the problems mentioned, which includes a novel feature learning strategy that combines the multi-scale and residual learning strategies to generate the final saliency prediction. We introduce the spatial and channel 'squeeze and excitation' blocks (scSE) at the side outputs of the backbone. It allows the network to concentrate more on saliency regions at various scales. Then, we propose the adaptive feature fusion module (AFFM), which efficiently fuses multi-scale feature information in order to predict superior saliency maps. Finally, to supervise network learning of more information on object boundaries, we propose a hybrid loss that contains four fundamental losses and combines properties of diverse losses. Comprehensive experiments demonstrate the effectiveness of the FRNet on five datasets, with competitive results when compared to other relevant approaches.


Asunto(s)
Aprendizaje Automático , Redes Neurales de la Computación , Aprendizaje
10.
Sensors (Basel) ; 22(12)2022 Jun 19.
Artículo en Inglés | MEDLINE | ID: mdl-35746407

RESUMEN

Change detection (CD) is a particularly important task in the field of remote sensing image processing. It is of practical importance for people when making decisions about transitional situations on the Earth's surface. The existing CD methods focus on the design of feature extraction network, ignoring the strategy fusion and attention enhancement of the extracted features, which will lead to the problems of incomplete boundary of changed area and missing detection of small targets in the final output change map. To overcome the above problems, we proposed a hierarchical attention residual nested U-Net (HARNU-Net) for remote sensing image CD. First, the backbone network is composed of a Siamese network and nested U-Net. We remold the convolution block in nested U-Net and proposed ACON-Relu residual convolution block (A-R), which reduces the missed detection rate of the backbone network in small change areas. Second, this paper proposed the adjacent feature fusion module (AFFM). Based on the adjacency fusion strategy, the module effectively integrates the details and semantic information of multi-level features, so as to realize the feature complementarity and spatial mutual enhancement between adjacent features. Finally, the hierarchical attention residual module (HARM) is proposed, which locally filters and enhances the features in a more fine-grained space to output a much better change map. Adequate experiments on three challenging benchmark public datasets, CDD, LEVIR-CD and BCDD, show that our method outperforms several other state-of-the-art methods and performs excellent in F1, IOU and visual image quality.


Asunto(s)
Algoritmos , Redes Neurales de la Computación , Atención , Humanos , Procesamiento de Imagen Asistido por Computador/métodos , Tecnología de Sensores Remotos
11.
Entropy (Basel) ; 24(7)2022 Jul 06.
Artículo en Inglés | MEDLINE | ID: mdl-35885162

RESUMEN

Violence detection aims to locate violent content in video frames. Improving the accuracy of violence detection is of great importance for security. However, the current methods do not make full use of the multi-modal vision and audio information, which affects the accuracy of violence detection. We found that the violence detection accuracy of different kinds of videos is related to the change of optical flow. With this in mind, we propose an optical flow-aware-based multi-modal fusion network (OAMFN) for violence detection. Specifically, we use three different fusion strategies to fully integrate multi-modal features. First, the main branch concatenates RGB features and audio features and the optical flow branch concatenates optical flow features with RGB features and audio features, respectively. Then, the cross-modal information fusion module integrates the features of different combinations and applies weights to them to capture cross-modal information in audio and video. After that, the channel attention module extracts valuable information by weighting the integration features. Furthermore, an optical flow-aware-based score fusion strategy is introduced to fuse features of different modalities from two branches. Compared with methods on the XD-Violence dataset, our multi-modal fusion network yields APs that are 83.09% and 1.4% higher than those of the state-of-the-art methods in offline detection, and 78.09% and 4.42% higher than those of the state-of-the-art methods in online detection.

12.
Sensors (Basel) ; 21(17)2021 Aug 30.
Artículo en Inglés | MEDLINE | ID: mdl-34502731

RESUMEN

As a sub-direction of image retrieval, person re-identification (Re-ID) is usually used to solve the security problem of cross camera tracking and monitoring. A growing number of shopping centers have recently attempted to apply Re-ID technology. One of the development trends of related algorithms is using an attention mechanism to capture global and local features. We notice that these algorithms have apparent limitations. They only focus on the most salient features without considering certain detailed features. People's clothes, bags and even shoes are of great help to distinguish pedestrians. We notice that global features usually cover these important local features. Therefore, we propose a dual branch network based on a multi-scale attention mechanism. This network can capture apparent global features and inconspicuous local features of pedestrian images. Specifically, we design a dual branch attention network (DBA-Net) for better performance. These two branches can optimize the extracted features of different depths at the same time. We also design an effective block (called channel, position and spatial-wise attention (CPSA)), which can capture key fine-grained information, such as bags and shoes. Furthermore, based on ID loss, we use complementary triplet loss and adaptive weighted rank list loss (WRLL) on each branch during the training process. DBA-Net can not only learn semantic context information of the channel, position, and spatial dimensions but can integrate detailed semantic information by learning the dependency relationships between features. Extensive experiments on three widely used open-source datasets proved that DBA-Net clearly yielded overall state-of-the-art performance. Particularly on the CUHK03 dataset, the mean average precision (mAP) of DBA-Net achieved 83.2%.


Asunto(s)
Procesamiento de Imagen Asistido por Computador , Peatones , Algoritmos , Humanos , Investigación , Semántica
13.
Sensors (Basel) ; 21(15)2021 Jul 30.
Artículo en Inglés | MEDLINE | ID: mdl-34372409

RESUMEN

Considerable research and surveys indicate that skin lesions are an early symptom of skin cancer. Segmentation of skin lesions is still a hot research topic. Dermatological datasets in skin lesion segmentation tasks generated a large number of parameters when data augmented, limiting the application of smart assisted medicine in real life. Hence, this paper proposes an effective feedback attention network (FAC-Net). The network is equipped with the feedback fusion block (FFB) and the attention mechanism block (AMB), through the combination of these two modules, we can obtain richer and more specific feature mapping without data enhancement. Numerous experimental tests were given by us on public datasets (ISIC2018, ISBI2017, ISBI2016), and a good deal of metrics like the Jaccard index (JA) and Dice coefficient (DC) were used to evaluate the results of segmentation. On the ISIC2018 dataset, we obtained results for DC equal to 91.19% and JA equal to 83.99%, compared with the based network. The results of these two main metrics were improved by more than 1%. In addition, the metrics were also improved in the other two datasets. It can be demonstrated through experiments that without any enhancements of the datasets, our lightweight model can achieve better segmentation performance than most deep learning architectures.


Asunto(s)
Enfermedades de la Piel , Neoplasias Cutáneas , Retroalimentación , Humanos , Procesamiento de Imagen Asistido por Computador , Redes Neurales de la Computación , Manejo de Especímenes
14.
Sensors (Basel) ; 20(6)2020 Mar 11.
Artículo en Inglés | MEDLINE | ID: mdl-32168732

RESUMEN

The secure transmission of data within a network has received great attention. As the core of the security management mechanism, the key management scheme design needs further research. In view of the safety and energy consumption problems in recent papers, we propose a key management scheme based on the pairing-free identity based digital signature (PF-IBS) algorithm for heterogeneous wireless sensor networks (HWSNs). Our scheme uses the PF-IBS algorithm to complete message authentication, which is safer and more energy efficient than some recent schemes. Moreover, we use the base station (BS) as the processing center for the huge data in the network, thereby saving network energy consumption and improving the network life cycle. Finally, we indirectly prevent the attacker from capturing relay nodes that upload data between clusters in the network (some cluster head nodes cannot communicate directly). Through performance evaluation, the scheme we proposed reasonably sacrifices part of the storage space in exchange for entire network security while saving energy consumption.

15.
Entropy (Basel) ; 22(11)2020 Nov 07.
Artículo en Inglés | MEDLINE | ID: mdl-33287034

RESUMEN

Deep hashing is the mainstream algorithm for large-scale cross-modal retrieval due to its high retrieval speed and low storage capacity, but the problem of reconstruction of modal semantic information is still very challenging. In order to further solve the problem of unsupervised cross-modal retrieval semantic reconstruction, we propose a novel deep semantic-preserving reconstruction hashing (DSPRH). The algorithm combines spatial and channel semantic information, and mines modal semantic information based on adaptive self-encoding and joint semantic reconstruction loss. The main contributions are as follows: (1) We introduce a new spatial pooling network module based on tensor regular-polymorphic decomposition theory to generate rank-1 tensor to capture high-order context semantics, which can assist the backbone network to capture important contextual modal semantic information. (2) Based on optimization perspective, we use global covariance pooling to capture channel semantic information and accelerate network convergence. In feature reconstruction layer, we use two bottlenecks auto-encoding to achieve visual-text modal interaction. (3) In metric learning, we design a new loss function to optimize model parameters, which can preserve the correlation between image modalities and text modalities. The DSPRH algorithm is tested on MIRFlickr-25K and NUS-WIDE. The experimental results show that DSPRH has achieved better performance on retrieval tasks.

16.
Sensors (Basel) ; 18(9)2018 Aug 30.
Artículo en Inglés | MEDLINE | ID: mdl-30200195

RESUMEN

As an emerging type of Internet of Things (IoT), multimedia IoT (MIoT) has been widely used in the domains of healthcare, smart buildings/homes, transportation and surveillance. In the mobile surveillance system for vehicle tracking, multiple mobile camera nodes capture and upload videos to a cloud server to track the target. Due to the random distribution and mobility of camera nodes, wireless networks are chosen for video transmission. However, the tracking precision can be decreased because of degradation of video quality caused by limited wireless transmission resources and transmission errors. In this paper, we propose a joint source and channel rate allocation scheme to optimize the performance of vehicle tracking in cloud servers. The proposed scheme considers the video content features that impact tracking precision for optimal rate allocation. To improve the reliability of data transmission and the real-time video communication, forward error correction is adopted in the application layer. Extensive experiments are conducted on videos from the Object Tracking Benchmark using the H.264/AVC standard and a kernelized correlation filter tracking scheme. The results show that the proposed scheme can allocate rates efficiently and provide high quality tracking service under the total transmission rate constraints.

17.
Sci Rep ; 14(1): 9714, 2024 Apr 27.
Artículo en Inglés | MEDLINE | ID: mdl-38678063

RESUMEN

Medical image segmentation is a key task in computer aided diagnosis. In recent years, convolutional neural network (CNN) has made some achievements in medical image segmentation. However, the convolution operation can only extract features in a fixed size region at a time, which leads to the loss of some key features. The recently popular Transformer has global modeling capabilities, but it does not pay enough attention to local information and cannot accurately segment the edge details of the target area. Given these issues, we proposed dynamic regional attention network (DRA-Net). Different from the above methods, it first measures the similarity of features and concentrates attention on different dynamic regions. In this way, the network can adaptively select different modeling scopes for feature extraction, reducing information loss. Then, regional feature interaction is carried out to better learn local edge details. At the same time, we also design ordered shift multilayer perceptron (MLP) blocks to enhance communication within different regions, further enhancing the network's ability to learn local edge details. After several experiments, the results indicate that our network produces more accurate segmentation performance compared to other CNN and Transformer based networks.

18.
Sci Rep ; 14(1): 15013, 2024 07 01.
Artículo en Inglés | MEDLINE | ID: mdl-38951526

RESUMEN

Visual Transformers(ViT) have made remarkable achievements in the field of medical image analysis. However, ViT-based methods have poor classification results on some small-scale medical image classification datasets. Meanwhile, many ViT-based models sacrifice computational cost for superior performance, which is a great challenge in practical clinical applications. In this paper, we propose an efficient medical image classification network based on an alternating mixture of CNN and Transformer tandem, which is called Eff-CTNet. Specifically, the existing ViT-based method still mainly relies on multi-head self-attention (MHSA). Among them, the attention maps of MHSA are highly similar, which leads to computational redundancy. Therefore, we propose a group cascade attention (GCA) module to split the feature maps, which are provided to different attention heads to further improves the diversity of attention and reduce the computational cost. In addition, we propose an efficient CNN (EC) module to enhance the ability of the model and extract the local detail information in medical images. Finally, we connect them and design an efficient hybrid medical image classification network, namely Eff-CTNet. Extensive experimental results show that our Eff-CTNet achieves advanced classification performance with less computational cost on three public medical image classification datasets.


Asunto(s)
Redes Neurales de la Computación , Humanos , Procesamiento de Imagen Asistido por Computador/métodos , Algoritmos , Diagnóstico por Imagen/métodos , Interpretación de Imagen Asistida por Computador/métodos
19.
PLoS One ; 19(7): e0306596, 2024.
Artículo en Inglés | MEDLINE | ID: mdl-38985710

RESUMEN

The accurate early diagnosis of colorectal cancer significantly relies on the precise segmentation of polyps in medical images. Current convolution-based and transformer-based segmentation methods show promise but still struggle with the varied sizes and shapes of polyps and the often low contrast between polyps and their background. This research introduces an innovative approach to confronting the aforementioned challenges by proposing a Dual-Channel Hybrid Attention Network with Transformer (DHAFormer). Our proposed framework features a multi-scale channel fusion module, which excels at recognizing polyps across a spectrum of sizes and shapes. Additionally, the framework's dual-channel hybrid attention mechanism is innovatively conceived to reduce background interference and improve the foreground representation of polyp features by integrating local and global information. The DHAFormer demonstrates significant improvements in the task of polyp segmentation compared to currently established methodologies.


Asunto(s)
Pólipos del Colon , Humanos , Pólipos del Colon/patología , Neoplasias Colorrectales/patología , Neoplasias Colorrectales/diagnóstico por imagen , Redes Neurales de la Computación , Algoritmos , Procesamiento de Imagen Asistido por Computador/métodos , Pólipos/patología , Pólipos/diagnóstico por imagen , Interpretación de Imagen Asistida por Computador/métodos
20.
Sci Rep ; 14(1): 5791, 2024 03 09.
Artículo en Inglés | MEDLINE | ID: mdl-38461342

RESUMEN

Diabetic retinopathy (DR) is a serious ocular complication that can pose a serious risk to a patient's vision and overall health. Currently, the automatic grading of DR is mainly using deep learning techniques. However, the lesion information in DR images is complex, variable in shape and size, and randomly distributed in the images, which leads to some shortcomings of the current research methods, i.e., it is difficult to effectively extract the information of these various features, and it is difficult to establish the connection between the lesion information in different regions. To address these shortcomings, we design a multi-scale dynamic fusion (MSDF) module and combine it with graph convolution operations to propose a multi-scale dynamic graph convolutional network (MDGNet) in this paper. MDGNet firstly uses convolution kernels with different sizes to extract features with different shapes and sizes in the lesion regions, and then automatically learns the corresponding weights for feature fusion according to the contribution of different features to model grading. Finally, the graph convolution operation is used to link the lesion features in different regions. As a result, our proposed method can effectively combine local and global features, which is beneficial for the correct DR grading. We evaluate the effectiveness of method on two publicly available datasets, namely APTOS and DDR. Extensive experiments demonstrate that our proposed MDGNet achieves the best grading results on APTOS and DDR, and is more accurate and diverse for the extraction of lesion information.


Asunto(s)
Diabetes Mellitus , Retinopatía Diabética , Humanos , Retinopatía Diabética/diagnóstico por imagen , Ojo , Algoritmos , Cara , Proyectos de Investigación
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA