Results 1 - 20 of 35
1.
Sci Rep ; 14(1): 15013, 2024 Jul 1.
Article in English | MEDLINE | ID: mdl-38951526

ABSTRACT

Vision Transformers (ViTs) have made remarkable achievements in medical image analysis. However, ViT-based methods perform poorly on some small-scale medical image classification datasets, and many ViT-based models trade high computational cost for their performance, which is a serious obstacle in practical clinical applications. In this paper, we propose an efficient medical image classification network built from an alternating tandem of CNN and Transformer blocks, called Eff-CTNet. Existing ViT-based methods still rely mainly on multi-head self-attention (MHSA), whose attention maps are highly similar across heads, leading to computational redundancy. We therefore propose a group cascade attention (GCA) module that splits the feature maps and feeds the splits to different attention heads, improving attention diversity while reducing computational cost. In addition, we propose an efficient CNN (EC) module to strengthen the model's ability to extract local detail information from medical images. Finally, we connect the two modules to form an efficient hybrid medical image classification network, Eff-CTNet. Extensive experimental results show that Eff-CTNet achieves advanced classification performance at lower computational cost on three public medical image classification datasets.
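The cascade in GCA is the interesting part: each head sees only its own channel split plus the output of the previous head, which pushes the heads toward diverse attention maps at reduced cost. Below is a minimal sketch of this idea; the class name, per-head projections, and exact cascade rule are our assumptions, not the paper's verified design.

```python
import torch
import torch.nn as nn

class GroupCascadeAttention(nn.Module):
    """Sketch: split channels across heads; feed each head's output
    into the next head's input so attention maps diversify."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        # one small qkv projection per head, acting on a channel split
        self.qkv = nn.ModuleList(
            [nn.Linear(self.head_dim, self.head_dim * 3) for _ in range(num_heads)]
        )
        self.proj = nn.Linear(dim, dim)
        self.scale = self.head_dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim) -> split channels into per-head groups
        chunks = x.chunk(self.num_heads, dim=-1)
        outs, carry = [], 0
        for i, blk in enumerate(self.qkv):
            inp = chunks[i] + carry          # cascade: previous head output refines input
            q, k, v = blk(inp).chunk(3, dim=-1)
            attn = (q @ k.transpose(-2, -1)) * self.scale
            carry = attn.softmax(dim=-1) @ v
            outs.append(carry)
        return self.proj(torch.cat(outs, dim=-1))

# quick shape check
y = GroupCascadeAttention(64)(torch.randn(2, 49, 64))
print(y.shape)  # torch.Size([2, 49, 64])
```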


Subject(s)
Neural Networks, Computer , Humans , Image Processing, Computer-Assisted/methods , Algorithms , Diagnostic Imaging/methods , Image Interpretation, Computer-Assisted/methods
2.
Sensors (Basel) ; 24(12)2024 Jun 14.
Article in English | MEDLINE | ID: mdl-38931629

ABSTRACT

Existing end-to-end speech recognition methods typically employ hybrid decoders based on CTC and Transformer. However, error accumulation in these hybrid decoders hinders further improvements in accuracy, and most existing models are built on a Transformer architecture that tends to be complex and unfriendly to small datasets. Hence, we propose a nonlinear regularization decoding method for speech recognition. First, we introduce a nonlinear Transformer decoder that breaks away from traditional left-to-right or right-to-left decoding orders and enables associations between any characters, mitigating the limitations of Transformer architectures on small datasets. Second, we propose a novel regularization attention module that optimizes the attention score matrix, reducing the impact of early errors on later outputs. Finally, we introduce a tiny model to address the challenge of excessively large model parameters. The experimental results indicate that our model performs well: compared to the baseline, it achieves recognition improvements of 0.12%, 0.54%, 0.51%, and 1.2% on the Aishell1, Primewords, Free ST Chinese Corpus, and Uyghur Common Voice 16.1 datasets, respectively.


Subject(s)
Algorithms , Speech Recognition Software , Humans , Speech/physiology , Nonlinear Dynamics , Pattern Recognition, Automated/methods
3.
Sci Rep ; 14(1): 9714, 2024 Apr 27.
Article in English | MEDLINE | ID: mdl-38678063

ABSTRACT

Medical image segmentation is a key task in computer-aided diagnosis. In recent years, convolutional neural networks (CNNs) have made notable progress in medical image segmentation. However, a convolution can only extract features from a fixed-size region at a time, which loses some key features. The recently popular Transformer has global modeling capability, but it pays insufficient attention to local information and cannot accurately segment the edge details of the target area. Given these issues, we propose the dynamic regional attention network (DRA-Net). Unlike the methods above, it first measures feature similarity and concentrates attention on different dynamic regions, so the network can adaptively select different modeling scopes for feature extraction and reduce information loss. It then performs regional feature interaction to better learn local edge details. We also design ordered shift multilayer perceptron (MLP) blocks to enhance communication between regions, further strengthening the network's ability to learn local edge details. Experimental results indicate that our network produces more accurate segmentations than other CNN- and Transformer-based networks.
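One simple way to realize attention concentrated on dynamic regions is to let each query attend only to its top-k most similar keys, so the modeling scope adapts to image content. The sketch below is our illustrative reading, not the paper's exact DRA-Net module.

```python
import torch
import torch.nn as nn

class DynamicRegionalAttention(nn.Module):
    """Sketch: each query attends only to its top-k most similar keys,
    so the modeling scope adapts to the content ("dynamic regions")."""
    def __init__(self, dim: int, k: int = 16):
        super().__init__()
        self.k = k
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):
        # x: (batch, tokens, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = (q @ k.transpose(-2, -1)) * self.scale   # (B, N, N)
        kk = min(self.k, scores.size(-1))
        topv, topi = scores.topk(kk, dim=-1)              # keep k most similar keys
        mask = torch.full_like(scores, float('-inf'))
        mask.scatter_(-1, topi, topv)                     # -inf elsewhere -> 0 after softmax
        return self.proj(mask.softmax(dim=-1) @ v)

print(DynamicRegionalAttention(32)(torch.randn(1, 64, 32)).shape)
```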

4.
Sci Rep ; 14(1): 5791, 2024 Mar 9.
Article in English | MEDLINE | ID: mdl-38461342

ABSTRACT

Diabetic retinopathy (DR) is a serious ocular complication that can pose a serious risk to a patient's vision and overall health. Automatic DR grading currently relies mainly on deep learning techniques. However, the lesion information in DR images is complex, variable in shape and size, and randomly distributed across the image, so current methods struggle to extract these varied features effectively and to establish connections between lesion information in different regions. To address these shortcomings, we design a multi-scale dynamic fusion (MSDF) module and combine it with graph convolution operations, proposing a multi-scale dynamic graph convolutional network (MDGNet). MDGNet first uses convolution kernels of different sizes to extract lesion features of different shapes and sizes, then automatically learns fusion weights according to each feature's contribution to model grading. Finally, a graph convolution operation links the lesion features across regions. As a result, our method effectively combines local and global features, which benefits correct DR grading. We evaluate the effectiveness of our method on two publicly available datasets, APTOS and DDR. Extensive experiments demonstrate that MDGNet achieves the best grading results on both, extracting lesion information more accurately and with greater diversity.
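The MSDF idea as described (parallel kernels of several sizes plus fusion weights learned from each feature's contribution) can be sketched directly. Names and the gating design below are ours, and the graph convolution stage is omitted.

```python
import torch
import torch.nn as nn

class MultiScaleDynamicFusion(nn.Module):
    """Sketch: parallel 3x3/5x5/7x7 convs catch lesions of different
    sizes; a learned gate decides how much each scale contributes."""
    def __init__(self, channels: int):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, k, padding=k // 2) for k in (3, 5, 7)]
        )
        # global pooling -> per-branch fusion weights (softmax-normalised)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, len(self.branches))
        )

    def forward(self, x):
        feats = torch.stack([b(x) for b in self.branches], dim=1)  # (B, 3, C, H, W)
        w = self.gate(x).softmax(dim=-1)                           # (B, 3)
        return (feats * w[:, :, None, None, None]).sum(dim=1)

print(MultiScaleDynamicFusion(16)(torch.randn(2, 16, 32, 32)).shape)
```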


Subject(s)
Diabetes Mellitus , Diabetic Retinopathy , Humans , Diabetic Retinopathy/diagnostic imaging , Eye , Algorithms , Face , Research Design
5.
Sensors (Basel) ; 23(13)2023 Jul 07.
Article in English | MEDLINE | ID: mdl-37448077

ABSTRACT

Although convolutional neural networks (CNNs) have produced great achievements in various fields, many researchers are still exploring better network models, since CNNs have an inherent limitation: the long-range modeling ability of convolutional kernels is limited. Conversely, the Transformer has been widely applied to vision tasks; it has strong global modeling capability, but its short-range modeling capability is mediocre. In medical images, the foreground information to be segmented is usually clustered within a small region, while the distance between different categories of foreground information is uncertain. Therefore, to obtain an accurate segmentation prediction map, the network needs not only a strong ability to learn local details but also a degree of long-range modeling ability. To solve these problems, a remote feature exploration (RFE) module is proposed in this paper, whose key feature is that remote elements are used to assist the generation of local features. In addition, to better verify the feasibility of this work, we manually created a new multi-organ segmentation dataset (MOD). While both the MOD and Synapse datasets label eight organ categories, some images in the Synapse dataset label only a few of them. The proposed method achieved 79.77% and 75.12% DSC on the Synapse and MOD datasets, respectively, with HD95 (mm) scores of 21.75 on Synapse and 7.43 on MOD.


Subject(s)
Algorithms , Learning , Electric Power Supplies , Intelligence , Neural Networks, Computer , Image Processing, Computer-Assisted
6.
Expert Syst Appl ; 228: 120389, 2023 Oct 15.
Article in English | MEDLINE | ID: mdl-37193247

ABSTRACT

Recent years have witnessed growing interest in neural network-based medical image classification methods, which have demonstrated remarkable performance in this field. Convolutional neural network (CNN) architectures are commonly employed to extract local features, while the more recent Transformer architecture has gained popularity for its ability to relate remote elements of an image through self-attention. However, improving classification accuracy requires establishing not only local connectivity but also remote relationships between lesion features, while capturing the overall image structure. To tackle these issues, this paper proposes a network based on multilayer perceptrons (MLPs) that learns the local features of medical images while also capturing overall feature information in both spatial and channel dimensions, thus using image features effectively. The method is extensively validated on the COVID19-CT and ISIC 2018 datasets, and the results show that it is more competitive and achieves higher performance in medical image classification than existing methods. This suggests that using MLPs to capture image features and establish connections between lesions can provide novel ideas for future medical image classification tasks.
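An MLP that captures local (per-location, channel-wise) features while also mixing information across the whole spatial extent is commonly structured like an MLP-Mixer block. The sketch below shows that generic layout, which we assume resembles the paper's design without knowing its specifics.

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """Sketch: token-mixing MLP captures the overall spatial structure,
    channel-mixing MLP captures per-location (local) features."""
    def __init__(self, tokens: int, dim: int, hidden: int = 128):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(       # mixes across spatial positions
            nn.Linear(tokens, hidden), nn.GELU(), nn.Linear(hidden, tokens))
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(     # mixes across channels
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x):                     # x: (batch, tokens, dim)
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        return x + self.channel_mlp(self.norm2(x))

print(MixerBlock(tokens=49, dim=64)(torch.randn(2, 49, 64)).shape)
```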

7.
Sci Rep ; 13(1): 6342, 2023 Apr 18.
Article in English | MEDLINE | ID: mdl-37072483

ABSTRACT

Medical image segmentation provides effective methods for accurate and robust organ segmentation, lesion detection, and classification. Medical images have fixed structures, simple semantics, and diverse details, so fusing rich multi-scale features can improve segmentation accuracy. Given that the density of diseased tissue may be comparable to that of surrounding normal tissue, both global and local information are critical to the segmentation result. Considering the importance of multi-scale, global, and local information, we propose the dynamic hierarchical multi-scale fusion network with axial MLP (multilayer perceptron), DHMF-MLP, which integrates the proposed hierarchical multi-scale fusion (HMSF) module. Specifically, HMSF not only reduces the loss of detail information by integrating the features of each encoder stage, but also provides different receptive fields, improving the segmentation of small lesions and multi-lesion regions. Within HMSF, we propose the adaptive attention mechanism (ASAM) to adaptively resolve the semantic conflicts arising during fusion, and we introduce Axial-MLP to improve the global modeling capability of the network. Extensive experiments on public datasets confirm the excellent performance of our proposed DHMF-MLP: on the BUSI, ISIC 2018, and GlaS datasets, IoU reaches 70.65%, 83.46%, and 87.04%, respectively.


Subject(s)
Neural Networks, Computer , Semantics , Image Processing, Computer-Assisted
8.
Sensors (Basel) ; 23(6)2023 Mar 13.
Article in English | MEDLINE | ID: mdl-36991777

ABSTRACT

At present, convolutional neural networks (CNNs) are widely applied to skin lesion image segmentation thanks to their powerful information discrimination abilities, and they have achieved good results. However, CNNs struggle to capture long-range context when extracting deep semantic features of lesion images, and the resulting semantic gap leads to blurred segmentation boundaries. To solve these problems, we designed HMT-Net, a hybrid encoder network based on Transformer and multilayer perceptron (MLP) architectures. In HMT-Net, the attention mechanism of the CTrans module learns the global relevance of the feature map, improving the network's grasp of the overall foreground information of the lesion, while the TokMLP module effectively strengthens the network's ability to learn lesion boundary features. In TokMLP, a tokenized MLP axial shift operation strengthens connections between pixels, facilitating the extraction of local feature information. To verify the superiority of our network, we conducted extensive experiments comparing HMT-Net against several recently proposed Transformer and MLP networks on three public datasets (ISIC2018, ISBI2017, and ISBI2016). Our method achieves 82.39%, 75.53%, and 83.98% on the Dice index and 89.35%, 84.93%, and 91.33% on the IoU, respectively. Compared with FAC-Net, the latest skin lesion segmentation network, our method improves the Dice index by 1.99%, 1.68%, and 1.6%, and the IoU by 0.45%, 2.36%, and 1.13%, respectively. The experimental results show that HMT-Net achieves state-of-the-art performance superior to other segmentation methods.
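The tokenized MLP axial shift can be pictured as rolling channel groups along one spatial axis before a per-pixel MLP, so each pixel mixes with displaced neighbours; UNeXt popularized this pattern. The sketch below follows that pattern; the exact group counts and shift sizes in TokMLP are unknown to us.

```python
import torch
import torch.nn as nn

def axial_shift(x: torch.Tensor, axis: int, max_shift: int = 2) -> torch.Tensor:
    """Split channels into groups and roll each group by a different
    offset along one spatial axis, linking neighbouring pixels."""
    offsets = list(range(-max_shift, max_shift + 1))      # e.g. [-2,-1,0,1,2]
    groups = x.chunk(len(offsets), dim=1)                 # channel groups
    return torch.cat([g.roll(o, dims=axis) for g, o in zip(groups, offsets)], dim=1)

class TokenizedShiftMLP(nn.Module):
    """Sketch: height-axis shift + MLP, then width-axis shift + MLP,
    so the MLP sees displaced neighbours and learns boundary cues."""
    def __init__(self, channels: int):
        super().__init__()
        self.mlp_h = nn.Conv2d(channels, channels, 1)  # 1x1 conv == per-pixel MLP
        self.mlp_w = nn.Conv2d(channels, channels, 1)
        self.norm = nn.BatchNorm2d(channels)

    def forward(self, x):                              # x: (B, C, H, W)
        x = x + self.mlp_h(axial_shift(x, axis=2))     # shift along height
        x = x + self.mlp_w(axial_shift(x, axis=3))     # shift along width
        return self.norm(x)

print(TokenizedShiftMLP(32)(torch.randn(2, 32, 16, 16)).shape)
```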


Subject(s)
Electric Power Supplies , Skin Diseases , Humans , Learning , Neural Networks, Computer , Records , Skin Diseases/diagnostic imaging , Image Processing, Computer-Assisted
9.
Sci Rep ; 12(1): 20800, 2022 Dec 02.
Article in English | MEDLINE | ID: mdl-36460827

ABSTRACT

Typical existing combined-query image retrieval methods adopt Euclidean distance as the sample distance measure, and models trained with the triplet loss function blindly pursue absolute distance between samples, resulting in unsatisfactory retrieval performance. Meanwhile, these methods rely solely on a convolutional neural network (CNN) to extract reference image features; however, the locality of the convolutional receptive field easily loses edge feature information of the reference images. In view of these shortcomings, this paper proposes the following improvements: (1) We propose the Triangle Area Triple Loss Function (TATLF), which adopts triangle area (TA) as the measure of sample distance. TA jointly considers the absolute distance and the included angle between samples, so the trained model has better retrieval performance. (2) We combine a CNN with a Transformer to simultaneously extract local and edge features of reference images, effectively reducing the loss of reference image information: the CNN extracts local feature information, while the Transformer attends to edge feature information. Extensive experiments on two public datasets, Fashion200k and MIT-States, confirm the excellent performance of our proposed method.
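A triangle area over embeddings can be computed in any dimension from the Gram identity ||u x v||^2 = |u|^2 |v|^2 - (u.v)^2, which couples vector lengths with the included angle. The loss below is only one plausible formulation consistent with the description; the paper's actual TATLF may differ.

```python
import torch
import torch.nn.functional as F

def triangle_area(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Area of the triangle spanned by vectors u and v (batched).
    Uses ||u x v||^2 = |u|^2 |v|^2 - (u.v)^2, valid in any dimension."""
    uu = (u * u).sum(-1)
    vv = (v * v).sum(-1)
    uv = (u * v).sum(-1)
    return 0.5 * torch.sqrt((uu * vv - uv * uv).clamp(min=1e-12))

def triangle_area_triplet_loss(anchor, positive, negative, margin=1.0):
    """Hypothetical reading of a triangle-area triplet loss: the area of
    the (anchor, positive, negative) triangle couples absolute distances
    with the included angle; we ask it to exceed the anchor-positive
    distance by a margin, pulling positives in and pushing negatives out."""
    area = triangle_area(positive - anchor, negative - anchor)
    d_ap = F.pairwise_distance(anchor, positive)
    return F.relu(d_ap + margin - area).mean()

a, p, n = (torch.randn(8, 128) for _ in range(3))
print(triangle_area_triplet_loss(a, p, n))
```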

10.
PLoS One ; 17(11): e0277578, 2022.
Article in English | MEDLINE | ID: mdl-36409714

ABSTRACT

Skin lesion segmentation has recently become an essential direction in machine learning for medical applications. In a deep segmentation network, the convolutional neural network (CNN) uses convolution to capture local information for modeling, but it ignores relationships between distant pixels and still cannot meet the precise segmentation requirements of some complex, low-contrast datasets. The Transformer models global feature information well, but its ability to extract fine-grained local feature patterns is weak. In this work, we propose TC-Net, a dual-encoder fusion architecture that combines Transformer and CNN to merge local and global feature information more accurately and improve the segmentation of skin images. Our results demonstrate that the combination of CNN and Transformer brings a very significant improvement in global segmentation performance and outperforms pure single-network models. Quantitative and qualitative analysis on the three datasets illustrates the robustness of TC-Net: compared with Swin UNet on the ISIC2018 dataset, the Dice index increases by 2.46% and the JA index by about 4%; on the ISBI2017 dataset, the Dice and JA indices rise by about 4%.


Subject(s)
Image Processing, Computer-Assisted , Skin Diseases , Humans , Image Processing, Computer-Assisted/methods , Algorithms , Neural Networks, Computer , Skin Diseases/diagnostic imaging , Cluster Analysis
11.
Sci Rep ; 12(1): 16117, 2022 Sep 27.
Article in English | MEDLINE | ID: mdl-36167743

ABSTRACT

U-Net has become the baseline standard for medical image segmentation tasks, but it has limitations in explicitly modeling long-term dependencies. The Transformer can capture long-term relevance through its internal self-attention; however, it models the correlations of all elements, and its awareness of local foreground information is weak. Since medical images often present as regional blocks, local information is equally important. In this paper, we propose GPA-TUNet, which considers local and global information jointly. Specifically, we propose a new attention mechanism, group parallel axial attention (GPA), to highlight local foreground information, and we combine GPA with the Transformer in the encoder part of the model. This not only highlights the foreground information of samples but also reduces the negative influence of background information on the segmentation results. Meanwhile, we introduce the sMLP block to improve the global modeling capability of the network; applying it achieves sparse connectivity and weight sharing. Extensive experiments on public datasets confirm the excellent performance of our proposed GPA-TUNet: on the Synapse and ACDC datasets, mean DSC reached 80.37% and 90.37%, and mean HD95 (mm) reached 20.55 and 1.23, respectively.
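The sMLP block referenced here is a sparse-MLP design: one shared linear layer mixes along rows and another along columns, so connectivity is sparse and weights are shared across the orthogonal axis. A sketch, with the three-branch fusion as our assumption:

```python
import torch
import torch.nn as nn

class SparseMLP(nn.Module):
    """Sketch of an sMLP-style block: one linear layer mixes along the
    height axis and another along the width axis, with weights shared
    across all rows/columns (sparse connectivity + weight sharing)."""
    def __init__(self, channels: int, h: int, w: int):
        super().__init__()
        self.mix_h = nn.Linear(h, h)   # shared across channels and columns
        self.mix_w = nn.Linear(w, w)   # shared across channels and rows
        self.fuse = nn.Conv2d(channels * 3, channels, 1)

    def forward(self, x):              # x: (B, C, H, W)
        xh = self.mix_h(x.transpose(2, 3)).transpose(2, 3)  # mix along H
        xw = self.mix_w(x)                                  # mix along W (last dim)
        return self.fuse(torch.cat([x, xh, xw], dim=1))

print(SparseMLP(16, 14, 14)(torch.randn(2, 16, 14, 14)).shape)
```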


Subject(s)
Image Processing, Computer-Assisted , Neural Networks, Computer , Agriculture , Image Processing, Computer-Assisted/methods
12.
Sensors (Basel) ; 22(18)2022 Sep 08.
Article in English | MEDLINE | ID: mdl-36146132

ABSTRACT

Doctors usually diagnose disease by evaluating the pattern of abnormal blood vessels in the fundus. Deep learning-based segmentation of fundus blood vessels has achieved great success, but it still suffers from low accuracy and broken capillaries. Since a good vessel segmentation method can guide the early diagnosis of eye diseases, we propose a novel hybrid Transformer network (HT-Net) for fundus image analysis. HT-Net improves vessel segmentation quality by capturing detailed local information and modeling long-range interactions, and it consists mainly of the following blocks: a feature fusion block (FFB) embedded at the shallow levels, which enriches the feature space; a feature refinement block (FRB), also added at a shallow position, which handles vessel scale variation by fusing multi-scale feature information to improve segmentation accuracy; and a bottom-level block combining the Transformer and CNN to capture long-range dependencies. We evaluate the performance of HT-Net on the DRIVE, CHASE_DB1, and STARE datasets. The experiments show that FFB and FRB effectively improve the quality of microvessel segmentation by extracting multi-scale information, and that embedding efficient self-attention mechanisms in the network effectively improves vessel segmentation accuracy. HT-Net exceeds most existing methods, indicating that it performs the vessel segmentation task competently.


Subject(s)
Algorithms , Retinal Vessels , Fundus Oculi , Image Processing, Computer-Assisted/methods , Retinal Vessels/diagnostic imaging
13.
Sensors (Basel) ; 22(18)2022 Sep 16.
Article in English | MEDLINE | ID: mdl-36146373

ABSTRACT

The Transformer relies on a self-attention mechanism to model distant dependencies, focusing on the dependencies of global elements; its sensitivity to the local details of foreground information, however, is limited, and local detail features help identify blurred boundaries in medical images more accurately. To compensate for these defects of the Transformer and capture richer local information, this paper proposes HEA-Net, an attention and MLP hybrid-encoder architecture combining an Efficient Attention Module (EAM) with a Dual-channel Shift MLP module (DS-MLP). Specifically, we connect the convolution block with the Transformer through EAM to enhance the foreground and suppress invalid background information in medical images, while DS-MLP further enhances the foreground via channel and spatial shift operations. Extensive experiments on public datasets confirm the excellent performance of our proposed HEA-Net: on the GlaS and MoNuSeg datasets, Dice reached 90.56% and 80.80%, and IoU reached 83.62% and 68.26%, respectively.


Subject(s)
Image Processing, Computer-Assisted , Neural Networks, Computer , Algorithms , Image Processing, Computer-Assisted/methods
14.
Entropy (Basel) ; 24(7)2022 Jul 06.
Article in English | MEDLINE | ID: mdl-35885162

ABSTRACT

Violence detection aims to locate violent content in video frames, and improving its accuracy is of great importance for security. However, current methods do not make full use of multi-modal visual and audio information, which limits detection accuracy. We found that violence detection accuracy across different kinds of videos is related to changes in optical flow. With this in mind, we propose an optical flow-aware multi-modal fusion network (OAMFN) for violence detection. Specifically, we use three different fusion strategies to fully integrate the multi-modal features: the main branch concatenates RGB and audio features, while the optical flow branch concatenates optical flow features with RGB features and with audio features, respectively. A cross-modal information fusion module then integrates the features of the different combinations and applies weights to them to capture cross-modal information in audio and video, after which a channel attention module extracts valuable information by weighting the integrated features. Furthermore, an optical flow-aware score fusion strategy fuses the features of the different modalities from the two branches. On the XD-Violence dataset, our multi-modal fusion network achieves an AP of 83.09% in offline detection, 1.4% higher than the state-of-the-art, and 78.09% in online detection, 4.42% higher than the state-of-the-art.
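The three fusion strategies map naturally onto a two-branch layout with concatenation and a learned channel gate. The sketch below is a schematic of that layout with made-up feature dimensions; the real OAMFN's score-fusion stage is not reproduced here.

```python
import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    """Sketch of the two-branch layout: a main branch fuses RGB+audio
    features, an optical-flow branch fuses flow with each modality, and
    a channel-attention gate weights the concatenated result."""
    def __init__(self, d_rgb=512, d_audio=128, d_flow=256, d_out=256):
        super().__init__()
        self.main = nn.Linear(d_rgb + d_audio, d_out)
        self.flow_rgb = nn.Linear(d_flow + d_rgb, d_out)
        self.flow_audio = nn.Linear(d_flow + d_audio, d_out)
        self.gate = nn.Sequential(nn.Linear(3 * d_out, 3 * d_out), nn.Sigmoid())
        self.head = nn.Linear(3 * d_out, 1)     # violence score per clip

    def forward(self, rgb, audio, flow):
        f = torch.cat([
            self.main(torch.cat([rgb, audio], -1)),
            self.flow_rgb(torch.cat([flow, rgb], -1)),
            self.flow_audio(torch.cat([flow, audio], -1)),
        ], dim=-1)
        f = f * self.gate(f)                    # channel attention on fused features
        return self.head(f).squeeze(-1)

m = TwoBranchFusion()
print(m(torch.randn(4, 512), torch.randn(4, 128), torch.randn(4, 256)).shape)
```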

15.
Sci Rep ; 12(1): 11968, 2022 Jul 13.
Article in English | MEDLINE | ID: mdl-35831628

ABSTRACT

Deep learning-based change detection (CD) has become a hot research topic. In particular, feature pyramid networks (FPNs) are widely used in CD tasks to gradually fuse semantic features. However, existing FPN-based CD methods do not detect the complete change region and cannot accurately locate its boundaries. To solve these problems, we propose a new Multi-Scale Feature Progressive Fusion Network (MFPF-Net), which consists of three novel modules: a Layer Feature Fusion Module (LFFM), a Multi-Scale Feature Aggregation Module (MSFA), and a Multi-Scale Feature Distribution Module (MSFD). Specifically, we first concatenate the features of each layer extracted from the bi-temporal images with their difference maps, so the resulting change maps fuse richer semantic information while effectively representing change regions. The change maps of each layer are then aggregated directly, which improves the communication and full fusion of feature maps in CD while avoiding interference from indirect information. Finally, the aggregated feature maps are re-layered by pooling and convolution operations, and a pyramid-style feature fusion strategy merges layers from low to high to obtain richer contextual information, so that each layer retains its original semantics along with the semantic features of the other layers. Comprehensive experiments on three publicly available benchmark datasets, CDD, LEVIR-CD, and WHU-CD, verify the effectiveness of the method, and the experimental results show that it outperforms the comparison methods.
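The first step, concatenating bi-temporal features with their difference map, is easy to make concrete. A minimal per-layer fusion sketch (module name ours):

```python
import torch
import torch.nn as nn

class LayerFeatureFusion(nn.Module):
    """Sketch of the per-layer fusion idea: concatenate the two temporal
    feature maps with their absolute difference so the change map keeps
    both semantic context and explicit change evidence."""
    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(channels * 3, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))

    def forward(self, f_t1, f_t2):
        diff = (f_t1 - f_t2).abs()               # explicit change evidence
        return self.fuse(torch.cat([f_t1, f_t2, diff], dim=1))

f1, f2 = torch.randn(2, 32, 64, 64), torch.randn(2, 32, 64, 64)
print(LayerFeatureFusion(32)(f1, f2).shape)
```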

16.
Sensors (Basel) ; 22(12)2022 Jun 14.
Article in English | MEDLINE | ID: mdl-35746271

ABSTRACT

Varied feature learning strategies have improved performance in recent deep neural network-based salient object detection; multi-scale learning and residual learning are two such strategies. However, problems remain, such as the inability to use multi-scale feature information effectively and the lack of fine object boundaries. We propose a feature refined network (FRNet) to overcome these problems, with a novel feature learning strategy that combines the multi-scale and residual learning strategies to generate the final saliency prediction. We introduce the spatial and channel 'squeeze and excitation' blocks (scSE) at the side outputs of the backbone, allowing the network to concentrate more on saliency regions at various scales. We then propose the adaptive feature fusion module (AFFM), which efficiently fuses multi-scale feature information in order to predict superior saliency maps. Finally, to supervise the network's learning of more object boundary information, we propose a hybrid loss that combines the properties of four fundamental losses. Comprehensive experiments demonstrate the effectiveness of FRNet on five datasets, with competitive results when compared to other relevant approaches.
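scSE is a published module (concurrent spatial and channel squeeze-and-excitation), so its structure is well known: a channel gate computed from a pooled descriptor plus a spatial gate from a 1x1 convolution. A compact implementation:

```python
import torch
import torch.nn as nn

class SCSEBlock(nn.Module):
    """Concurrent spatial and channel squeeze-and-excitation (scSE):
    cSE re-weights channels from a global descriptor, sSE re-weights
    pixels from a 1x1 conv; the two recalibrations are combined."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.cse = nn.Sequential(                 # channel excitation
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.sse = nn.Sequential(                 # spatial excitation
            nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.cse(x) + x * self.sse(x)

print(SCSEBlock(32)(torch.randn(2, 32, 16, 16)).shape)
```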


Subject(s)
Machine Learning , Neural Networks, Computer , Learning
17.
Sensors (Basel) ; 22(12)2022 Jun 19.
Article in English | MEDLINE | ID: mdl-35746407

ABSTRACT

Change detection (CD) is a particularly important task in the field of remote sensing image processing, with practical value for decisions about transitions on the Earth's surface. Existing CD methods focus on the design of the feature extraction network while neglecting fusion strategies and attention enhancement for the extracted features, which leads to incomplete boundaries of changed areas and missed small targets in the final output change map. To overcome these problems, we propose a hierarchical attention residual nested U-Net (HARNU-Net) for remote sensing image CD. First, the backbone network combines a Siamese network with a nested U-Net, in which we remold the convolution block into the proposed ACON-ReLU residual convolution block (A-R), reducing the backbone's missed detection rate in small change areas. Second, we propose the adjacent feature fusion module (AFFM), which, based on an adjacency fusion strategy, effectively integrates the details and semantic information of multi-level features, achieving feature complementarity and mutual spatial enhancement between adjacent features. Finally, we propose the hierarchical attention residual module (HARM), which locally filters and enhances features at a finer spatial granularity to output a much better change map. Extensive experiments on three challenging public benchmark datasets, CDD, LEVIR-CD, and BCDD, show that our method outperforms several state-of-the-art methods and performs excellently in F1, IoU, and visual image quality.
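The A-R block is described only briefly, but the ACON-C activation it builds on has a known closed form: (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x with learnable per-channel parameters. Below, the activation follows that formula; the surrounding residual block structure is our guess.

```python
import torch
import torch.nn as nn

class AconC(nn.Module):
    """ACON-C activation (Ma et al., 'Activate or Not'):
    (p1-p2)*x*sigmoid(beta*(p1-p2)*x) + p2*x, learnable per channel."""
    def __init__(self, channels: int):
        super().__init__()
        self.p1 = nn.Parameter(torch.randn(1, channels, 1, 1))
        self.p2 = nn.Parameter(torch.randn(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.ones(1, channels, 1, 1))

    def forward(self, x):
        d = (self.p1 - self.p2) * x
        return d * torch.sigmoid(self.beta * d) + self.p2 * x

class AconResidualBlock(nn.Module):
    """Sketch of an ACON-based residual conv block (our guess at 'A-R')."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            AconC(channels),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.act = AconC(channels)

    def forward(self, x):
        return self.act(x + self.body(x))

print(AconResidualBlock(16)(torch.randn(1, 16, 32, 32)).shape)
```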


Subject(s)
Algorithms , Neural Networks, Computer , Attention , Humans , Image Processing, Computer-Assisted/methods , Remote Sensing Technology
18.
Comput Intell Neurosci ; 2022: 9637460, 2022.
Article in English | MEDLINE | ID: mdl-35586112

ABSTRACT

Some current algorithms lose important features through rough feature distillation and lose key information in some channels through compressed channel attention. To address this, we propose a progressive multistage distillation network that refines features gradually, in stages, to retain the maximum amount of key feature information. In addition, to maximize network performance, we propose a weight-sharing, information-lossless attention block that enhances channel characteristics through a weight-sharing auxiliary path and uses convolution layers to model inter-channel dependencies without compression, effectively avoiding the information loss of previous channel attention designs. Extensive experiments on several benchmark datasets show that our algorithm achieves a good balance between network performance, parameter count, and computational complexity, and delivers highly competitive results in both objective metrics and subjective visual quality, which indicates the advantages of our algorithm for image reconstruction. This coarse-to-fine, gradual feature distillation is effective in improving network performance. Our code is available at the following link: https://github.com/Cai631/PMDN.
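Modeling inter-channel dependencies "without compression" suggests replacing the usual squeeze bottleneck with a cheap convolution over the channel descriptor, in the spirit of ECA. A sketch of that reading (the weight-sharing auxiliary path is omitted):

```python
import torch
import torch.nn as nn

class UncompressedChannelAttention(nn.Module):
    """Sketch: model inter-channel dependencies with a 1-D convolution
    over the pooled channel descriptor instead of a squeeze bottleneck,
    so no channel information is compressed away (ECA-style)."""
    def __init__(self, kernel_size: int = 5):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                        # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                   # (B, C) global descriptor
        w = self.conv(w.unsqueeze(1)).squeeze(1) # local cross-channel interaction
        return x * torch.sigmoid(w)[:, :, None, None]

print(UncompressedChannelAttention()(torch.randn(2, 32, 8, 8)).shape)
```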


Subject(s)
Data Compression , Distillation , Algorithms , Image Processing, Computer-Assisted/methods , Neural Networks, Computer
19.
Sci Rep ; 12(1): 7082, 2022 Apr 30.
Article in English | MEDLINE | ID: mdl-35490175

ABSTRACT

Deep hashing is widely applied to image retrieval because of its low storage consumption and fast retrieval speed, but existing deep hashing methods extract insufficient semantic features when using a convolutional neural network (CNN). Some studies propose adding channel-based or spatial-based attention modules; however, embedding these modules into the network increases model complexity and can lead to overfitting during training. In this study, we propose a novel deep parameter-free attention hashing (DPFAH) method that adds a parameter-free attention (PFA) module to a ResNet18 network. PFA is a lightweight module that defines an energy function to measure the importance of each neuron and infers 3-D attention weights for the feature map of a layer; a fast closed-form solution of this energy function shows that the PFA module adds no parameters to the network. In addition, we design a novel hashing framework that includes a hash-code learning branch and a classification branch to exploit more label information, and we constrain the like-binary codes with a regularization term to reduce the quantization error of the continuous relaxation. Experiments on CIFAR-10, NUS-WIDE, and ImageNet-100 show that the DPFAH method achieves better performance.
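The energy-function description matches SimAM-style attention, where a closed-form inverse energy scores how much each neuron deviates from its channel mean and needs no learned parameters. A sketch under that assumption:

```python
import torch

def parameter_free_attention(x: torch.Tensor, lam: float = 1e-4) -> torch.Tensor:
    """SimAM-style parameter-free attention: an energy function scores
    each neuron by how much it stands out from its channel's mean, and
    the inverse energy becomes a 3-D attention weight. No parameters."""
    n = x.shape[2] * x.shape[3] - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)        # (x - mu)^2
    var = d.sum(dim=(2, 3), keepdim=True) / n                # channel variance
    importance = d / (4 * (var + lam)) + 0.5                 # inverse energy
    return x * torch.sigmoid(importance)

print(parameter_free_attention(torch.randn(2, 16, 8, 8)).shape)
```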


Subject(s)
Neural Networks, Computer , Semantics , Attention
20.
Sensors (Basel) ; 22(8)2022 Apr 15.
Article in English | MEDLINE | ID: mdl-35459043

ABSTRACT

Feedforward super-resolution networks based on deep learning learn a representation of a low-resolution (LR) input and a non-linear mapping from it to a high-resolution (HR) output, but this design cannot fully capture the interdependence between LR and HR images. In this paper, we retain the feedforward architecture and introduce residuals at a dual level, proposing the dual-level recurrent residual network (DLRRN) to generate HR images with rich detail and satisfactory visual quality. Unlike feedforward networks that operate at a fixed spatial resolution, the dual-level recurrent residual block (DLRRB) in DLRRN exploits information in both LR and HR space: the circular signals in the DLRRB enhance spatial details through mutual guidance between the two directions (LR to HR and HR to LR). Specifically, the LR information of the current layer is generated from the HR and LR information of the previous layer; then the HR information of the previous layer and the LR information of the current layer jointly generate the HR information of the current layer, and so on. The proposed DLRRN has a strong early-reconstruction ability and can gradually restore the final high-resolution image. An extensive quantitative and qualitative evaluation on benchmark datasets shows that our network achieves good results in terms of network parameters, visual effects, and objective performance metrics.
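The mutual-guidance recurrence can be sketched as a block that keeps both an LR and an HR state and updates each from the other with strided and transposed convolutions. Names and layer choices below are ours, not the paper's exact DLRRB:

```python
import torch
import torch.nn as nn

class DualLevelRecurrentBlock(nn.Module):
    """Sketch of the dual-level recurrence: the current LR state is made
    from the previous layer's HR and LR states, then the new HR state is
    made from the previous HR state and the current LR state."""
    def __init__(self, channels: int, scale: int = 2):
        super().__init__()
        self.down = nn.Conv2d(channels, channels, scale, stride=scale)         # HR -> LR
        self.up = nn.ConvTranspose2d(channels, channels, scale, stride=scale)  # LR -> HR
        self.lr_conv = nn.Conv2d(channels * 2, channels, 3, padding=1)
        self.hr_conv = nn.Conv2d(channels * 2, channels, 3, padding=1)

    def forward(self, lr_prev, hr_prev):
        lr_cur = self.lr_conv(torch.cat([lr_prev, self.down(hr_prev)], dim=1))
        hr_cur = self.hr_conv(torch.cat([hr_prev, self.up(lr_cur)], dim=1))
        return lr_cur, hr_cur

lr, hr = torch.randn(1, 16, 24, 24), torch.randn(1, 16, 48, 48)
lr2, hr2 = DualLevelRecurrentBlock(16)(lr, hr)
print(lr2.shape, hr2.shape)
```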
