Results 1 - 9 of 9
1.
Sensors (Basel) ; 23(21)2023 Oct 31.
Article in English | MEDLINE | ID: mdl-37960550

ABSTRACT

Vast amounts of monitoring data can be obtained through various optical sensors, and deep-learning-based mask detection brings neural networks into a variety of everyday applications. However, mask detection poses technical challenges such as small targets, complex scenes, and occlusions, which demand high accuracy and robustness from multi-scene target detection networks. Since multi-scale features can enlarge the receptive field and attention mechanisms can improve the detection of small targets, we propose the YOLO-MSM network, built from a multi-scale residual (MSR) block, a multi-scale residual cascaded channel-spatial attention (MSR-CCSA) block, an enhanced residual CCSA (ER-CCSA) block, and an enhanced residual PCSA (ER-PCSA) block. Balancing performance against parameter count, we use YOLOv5 as the baseline network. First, in the MSR block, we construct hierarchical residual connections within the residual blocks to extract multi-scale features and obtain finer features. Second, to realize joint channel and spatial attention, both the CCSA block and the PCSA block are adopted. In addition, we construct a new dataset named Multi-Scene-Mask, which covers varied scenes, crowd densities, and mask types. Experiments on this dataset show that YOLO-MSM achieves an average precision of 97.51%, outperforming other detection networks; compared with the baseline network, its mAP is 3.46% higher. Moreover, we propose a generalization improvement strategy (GIS) that trains YOLO-MSM on the dataset augmented with additive white Gaussian noise to improve the generalization ability of the network. The test results verify that GIS greatly improves the generalization of the network and that YOLO-MSM generalizes better than the baseline.
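The GIS augmentation step described above amounts to corrupting training images with additive white Gaussian noise. A minimal pure-Python sketch of that corruption (the function name, noise level, and flat-pixel-list representation are illustrative assumptions, not details from the paper):

```python
import random

def add_white_gaussian_noise(pixels, sigma=10.0, seed=0):
    """Augment a flat list of pixel intensities with additive white
    Gaussian noise, clipping results back to the valid [0, 255] range."""
    rng = random.Random(seed)
    return [min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]

clean = [0.0, 128.0, 255.0]
noisy = add_white_gaussian_noise(clean, sigma=10.0)
```

Training on a mixture of clean and noise-corrupted copies is what is meant to push the detector toward features that survive sensor noise.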

2.
Methods ; 220: 134-141, 2023 12.
Article in English | MEDLINE | ID: mdl-37967757

ABSTRACT

Automated 12-lead electrocardiogram (ECG) classification algorithms play an important role in the diagnosis of clinical arrhythmias. Current methods that perform well in automatic ECG classification are usually based on convolutional neural networks (CNNs) or the Transformer. However, due to the intrinsic locality of convolution operations, CNNs cannot capture long-range dependencies in the series. The Transformer, on the other hand, has a built-in global self-attention mechanism but does not pay enough attention to local features. In this paper, we propose DAMS-Net, which combines the advantages of the Transformer and CNNs: a CNN-Transformer hybrid encoder with a spatial attention module and a channel attention module adaptively focuses on the significant global and local features across space and channels. In addition, our model fuses multi-scale information through skip connections to capture high- and low-level semantic information. We evaluate our method on the 2018 Physiological Electrical Signaling Challenge dataset, where it achieves a precision of 83.6%, a recall of 84.7%, and an F1-score of 0.839, classification performance superior to all current single-model methods evaluated on this dataset. The experimental results demonstrate the promise of the proposed method for automatic 12-lead ECG classification tasks.
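For readers unfamiliar with the global self-attention that the Transformer branch contributes, a minimal pure-Python sketch of scaled dot-product self-attention over a toy sequence (identity Q/K/V projections for brevity; this is the generic mechanism, not the DAMS-Net implementation):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention over a sequence of feature
    vectors x (list of lists). Every position attends to every other,
    which is what gives the Transformer its global receptive field."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)   # attention weights sum to 1
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out

seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(seq)
```

Each output vector is a convex combination of all input vectors, so no position is out of reach, in contrast to a convolution's fixed local window.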


Subjects
Algorithms , Electrocardiography , Neural Networks (Computer) , Semantics , Signal Transduction , Image Processing (Computer-Assisted)
3.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37756592

ABSTRACT

The prediction of prognostic outcome is critical for the development of efficient cancer therapeutics and potential personalized medicine. However, due to the heterogeneity and diversity of multimodal cancer data, data integration and feature selection remain a challenge for prognostic outcome prediction. We propose CSAM-GAN, a deep learning method with a generative adversarial network based on sequential channel-spatial attention modules, as a multimodal data integration and feature selection approach for prognostic stratification tasks in cancer. Sequential channel-spatial attention modules equipped with an encoder-decoder are applied to the input features of the multimodal data to accurately refine the selected features. A discriminator network is trained against the generator in an adversarial manner so that the model accurately describes the complex heterogeneous information of the multiple data modalities. We conducted extensive experiments with various feature selection and classification methods and confirmed that CSAM-GAN with a multilayer deep neural network (DNN) classifier outperformed these baselines on two multimodal datasets combining miRNA expression, mRNA expression, and histopathological image data: lower-grade glioma and kidney renal clear cell carcinoma. CSAM-GAN with the multilayer DNN classifier bridges the gap between heterogeneous multimodal data and prognostic outcome prediction.


Subjects
Renal Cell Carcinoma , Glioma , Kidney Neoplasms , MicroRNAs , Humans , Prognosis
4.
J Imaging ; 9(7)2023 Jun 26.
Article in English | MEDLINE | ID: mdl-37504807

ABSTRACT

Vision-based human activity recognition (HAR) has emerged as one of the essential research areas in video analytics. Over the last decade, numerous advanced deep learning algorithms have been introduced to recognize complex human actions from video streams, with impressive performance on video analytics tasks. However, these methods focus exclusively on either model accuracy or computational efficiency, yielding a biased trade-off between robustness and efficiency when tackling the challenging HAR problem. To improve both accuracy and computational efficiency, this paper presents a computationally efficient yet generic spatial-temporal cascaded framework that exploits deep discriminative spatial and temporal features for HAR. For efficient representation of human actions, we propose a dual attentional convolutional neural network (DA-CNN) architecture that leverages a unified channel-spatial attention mechanism to extract human-centric salient features in video frames. The dual channel-spatial attention layers, together with the convolutional layers, learn to be more selective toward spatial receptive fields containing objects within the feature maps. The extracted discriminative salient features are then forwarded to a stacked bi-directional gated recurrent unit (Bi-GRU) for long-term temporal modeling and recognition of human actions using both forward- and backward-pass gradient learning. Extensive experiments on three publicly available human action datasets verify the effectiveness of the proposed framework (DA-CNN+Bi-GRU) over state-of-the-art methods in both model accuracy and inference runtime on each dataset. Experimental results show that the DA-CNN+Bi-GRU framework attains an execution-time improvement of up to 167× in frames per second compared with most contemporary action-recognition methods.
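The unified channel-spatial attention idea that recurs across these papers can be sketched in a few lines of pure Python: gate each channel by a function of its global average, then gate each spatial position by a function of the cross-channel mean. The sigmoid gating below is a common simplification (in the CBAM spirit), not the actual DA-CNN layers:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_spatial_attention(fmap):
    """fmap: feature map indexed as fmap[c][h][w]. Channel attention
    rescales each channel by a sigmoid of its global average; spatial
    attention then rescales each position by a sigmoid of its
    channel-wise mean."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    # Channel attention: one scalar gate per channel (global average pooling).
    gates = [sigmoid(sum(sum(row) for row in ch) / (H * W)) for ch in fmap]
    refined = [[[gates[c] * fmap[c][i][j] for j in range(W)] for i in range(H)]
               for c in range(C)]
    # Spatial attention: one gate per (i, j) position (cross-channel mean).
    smap = [[sigmoid(sum(refined[c][i][j] for c in range(C)) / C)
             for j in range(W)] for i in range(H)]
    return [[[smap[i][j] * refined[c][i][j] for j in range(W)] for i in range(H)]
            for c in range(C)]

x = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]  # 2 channels, 2x2
y = channel_spatial_attention(x)
```

Because every gate lies in (0, 1), the module can only attenuate features, which is how it suppresses background while keeping the salient, human-centric responses.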

5.
BMC Bioinformatics ; 24(1): 85, 2023 Mar 07.
Article in English | MEDLINE | ID: mdl-36882688

ABSTRACT

Although various methods based on convolutional neural networks have improved biomedical image segmentation enough to meet the precision requirements of medical imaging, deep-learning-based medical image segmentation still faces two problems: (1) difficulty extracting discriminative features of the lesion region during encoding, because lesions vary in size and shape; and (2) difficulty fusing the spatial and semantic information of the lesion region effectively during decoding, because of redundant information and the semantic gap. In this paper, we use attention-based Transformers in both the encoder and decoder stages, whose multi-head self-attention improves feature discrimination at the level of spatial detail and semantic location. We propose an architecture called EG-TransUNet, comprising three Transformer-improved modules: a progressive enhancement module, channel-spatial attention, and semantic guidance attention. The proposed EG-TransUNet architecture captures object variability with improved results on different biomedical datasets. EG-TransUNet outperforms other methods on two popular colonoscopy datasets (Kvasir-SEG and CVC-ClinicDB), achieving 93.44% and 95.26% mDice, respectively. Extensive experiments and visualization results demonstrate that our method advances performance on five medical segmentation datasets with better generalization ability.
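The mDice figures quoted above are mean Dice coefficients over the test images; for reference, the per-image Dice score on binary masks reduces to a few lines:

```python
def dice(pred, target, eps=1e-7):
    """Dice similarity coefficient between two flat binary masks
    (lists of 0/1); eps guards the degenerate empty-mask case."""
    inter = sum(p * t for p, t in zip(pred, target))
    return (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)

perfect = dice([1, 1, 0, 0], [1, 1, 0, 0])   # identical masks -> 1.0
half    = dice([1, 1, 0, 0], [1, 0, 1, 0])   # one overlapping pixel -> 0.5
```

Dice weights the overlap against the sizes of both masks, which is why it is preferred over plain pixel accuracy for small lesions that occupy a tiny fraction of the image.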


Subjects
Electric Power Supplies , Neural Networks (Computer) , Semantics
6.
ISA Trans ; 133: 369-383, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35798589

ABSTRACT

This paper proposes a selective kernel convolution deep residual network based on the channel-spatial attention mechanism and feature fusion for mechanical fault diagnosis. First, adjacent channel attention modules are connected with the spatial attention module; all channel and spatial features are then fused to construct a channel-spatial attention mechanism that forms the feature enhancement module. Second, the feature enhancement module is embedded in a series model based on selective kernel convolution and a deep residual network, combined with multi-layer feature fusion. Compared with traditional deep learning methods, the model extracts fault features from the vibration signal more effectively and improves fault recognition efficiency. Finally, the proposed method was used to diagnose bearing and gear faults experimentally, achieving identification accuracies of 99.87% and 97.77%, respectively. Compared with similar algorithms, the proposed method has higher fault identification ability, demonstrating the advantages of the channel-spatial attention network; the accuracy and robustness of the model were also verified.

7.
Sensors (Basel) ; 22(23)2022 Nov 24.
Article in English | MEDLINE | ID: mdl-36501811

ABSTRACT

Image super-resolution (SR) is an important image processing technique in computer vision for improving the resolution of images and videos. In recent years, deep convolutional neural networks (CNNs) have made significant progress in image SR; however, existing CNN-based SR methods cannot fully exploit background information during feature extraction. In addition, different scale factors are usually treated as separate tasks handled by separately trained models, which does not meet practical application requirements. To solve these problems, we propose a multi-scale learning wavelet attention network (MLWAN) for image SR. The proposed model consists of three parts. In the first part, low-level features are extracted from the input image through two convolutional layers, followed by a new channel-spatial attention mechanism (CSAM) block. In the second part, a CNN predicts the highest-level low-frequency wavelet coefficients, and in the third part, recurrent neural networks (RNNs) at different scales predict the wavelet coefficients of the remaining subbands. To further reduce network parameters and achieve a lightweight model, an effective channel attention recurrent module (ECARM) is proposed. Finally, the inverse discrete wavelet transform (IDWT) reconstructs the HR image. Experimental results on public large-scale datasets demonstrate the superiority of the proposed model in both quantitative metrics and visual quality.
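The wavelet analysis/synthesis pair at the heart of this pipeline is easiest to see in one dimension. A minimal single-level Haar DWT and its inverse (orthonormal Haar is assumed here purely for illustration; the paper's actual filter bank is not specified in the abstract):

```python
import math

def haar_dwt(signal):
    """Single-level 1-D Haar transform: split an even-length signal into
    a low-frequency (approximation) and a high-frequency (detail) subband."""
    s = math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return low, high

def haar_idwt(low, high):
    """Inverse transform: rebuild each sample pair from its subband values."""
    s = math.sqrt(2.0)
    out = []
    for l, h in zip(low, high):
        out.extend([(l + h) / s, (l - h) / s])
    return out

x = [4.0, 2.0, 5.0, 7.0]
low, high = haar_dwt(x)
rec = haar_idwt(low, high)   # perfect reconstruction of x
```

Predicting subband coefficients instead of pixels, then applying the IDWT, lets the network recover an HR image whose low- and high-frequency content are handled by separate branches.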


Subjects
Learning , Neural Networks (Computer) , Videotape Recording , Image Processing (Computer-Assisted) , Records
8.
Comput Biol Med ; 149: 105970, 2022 10.
Article in English | MEDLINE | ID: mdl-36058067

ABSTRACT

Diabetic retinopathy (DR) is currently one of the most common causes of blindness. DR grading methods, however, are still challenged by imbalanced class distributions, small lesions, low accuracy on small-sample classes, and poor explainability. To address these issues, we propose a resampling-based cost-loss attention network for explainable imbalanced diabetic retinopathy grading. First, a progressively-balanced resampling strategy creates balanced training data by mixing two sets of samples obtained from instance-based and class-based sampling. Next, a neuron and normalized channel-spatial attention module (Neu-NCSAM) is designed to learn global features with 3-D weights, and a weight sparsity penalty is applied to the attention module to suppress irrelevant channels or pixels, thereby capturing detailed small-lesion information. A weighted loss function combining Cost-Sensitive (CS) regularization and Gaussian label-smoothing loss, called the cost loss, is then proposed to penalize incorrect predictions and thus improve the grading accuracy of small-sample classes. Finally, Gradient-weighted Class Activation Mapping (Grad-CAM) is performed to obtain localization maps of questionable lesions, visually interpreting the behavior of our model. Comprehensive experiments on two public datasets show, both subjectively and objectively, that the proposed network outperforms state-of-the-art methods and achieves the best DR grading results: 83.46% Kappa, 60.44% BACC, 65.18% MCC, 63.69% F1, and 92.26% mAUC.
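Progressively-balanced resampling can be sketched as an interpolation between instance-based class frequencies and a uniform class distribution (the linear interpolation parameter `t` and the toy class counts below are illustrative assumptions, not the paper's exact schedule):

```python
def class_sampling_probs(counts, t):
    """Per-class sampling probabilities. t=0 reproduces instance-based
    sampling (proportional to class size); t=1 gives class-balanced
    (uniform) sampling; intermediate t progressively balances the mix."""
    n_total = sum(counts)
    n_classes = len(counts)
    return [(1.0 - t) * c / n_total + t / n_classes for c in counts]

counts = [900, 90, 10]   # toy imbalanced DR grade counts
instancewise = class_sampling_probs(counts, 0.0)   # [0.9, 0.09, 0.01]
balanced = class_sampling_probs(counts, 1.0)       # uniform thirds
```

Ramping `t` upward over training lets early epochs see the natural distribution while later epochs oversample the rare grades, which is the intent behind mixing the two sampling schemes.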


Subjects
Diabetes Mellitus , Diabetic Retinopathy , Diabetic Retinopathy/pathology , Humans
9.
Comput Med Imaging Graph ; 98: 102072, 2022 06.
Article in English | MEDLINE | ID: mdl-35594809

ABSTRACT

In clinical practice, automatic polyp segmentation from colonoscopy images is an effective aid in the early detection and prevention of colorectal cancer. This paper proposes a new deep model for accurate polyp segmentation based on an encoder-decoder framework. ResNet50 is adopted as the encoder, and three functional modules are introduced to improve performance. First, a hybrid channel-spatial attention module reweights the encoder features spatially and channel-wise, enhancing the features critical to the segmentation task while suppressing irrelevant ones. Second, a global context pyramid feature extraction module and a series of global context flows extract and deliver global context information: the former captures multi-scale, multi-receptive-field global context, while the latter explicitly transmits it to each decoder level. Finally, a feature fusion module effectively incorporates the high-level features, low-level features, and global context information while accounting for the gaps between them. These modules help the model fully exploit global context to infer complete polyp regions. Extensive experiments on five public colorectal polyp datasets demonstrate that the proposed network has strong learning and generalization capability, significantly improving segmentation accuracy and outperforming state-of-the-art methods.


Subjects
Image Processing (Computer-Assisted) , Neural Networks (Computer) , Attention , Image Processing (Computer-Assisted)/methods , Learning