ABSTRACT
Identification of retinal diseases in automated screening methods, such as those used in clinical settings or computer-aided diagnosis, usually depends on the localization and segmentation of the Optic Disc (OD) and fovea. However, this task is difficult because these anatomical features have irregular spatial, texture, and shape characteristics, sample sizes are limited, and data distributions differ across datasets (domain shift). This study proposes a novel Multiresolution Cascaded Attention U-Net (MCAU-Net) model that addresses these problems by optimally balancing receptive field size and computational efficiency. The MCAU-Net utilizes two skip connections to accurately localize and segment the OD and fovea in fundus images. We incorporated a Multiresolution Wavelet Pooling Module (MWPM) into the CNN at each stage of U-Net input to compensate for spatial information loss. Additionally, we integrated a cascaded connection of the spatial and channel attentions as a skip connection in MCAU-Net to concentrate precisely on the target object and improve model convergence for segmenting and localizing OD and fovea centers. The proposed model has a low parameter count of 0.8 million, improving computational efficiency and reducing the risk of overfitting. For OD segmentation, the MCAU-Net achieves high IoU values of 0.9771, 0.945, and 0.946 for the DRISHTI-GS, DRIONS-DB, and IDRiD datasets, respectively, outperforming previous results for all three datasets. For the IDRiD dataset, the MCAU-Net locates the OD center with a Euclidean distance (ED) of 16.90 pixels and the fovea center with an ED of 33.45 pixels, demonstrating its effectiveness in overcoming the common limitations of state-of-the-art methods.
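The two metrics reported in this abstract, IoU for segmentation and Euclidean distance for center localization, can be illustrated with a minimal sketch. The toy masks and coordinates below are hypothetical and are not taken from the paper; the convention of returning 1.0 for two empty masks is also an assumption.

```python
import math


def iou(pred_mask, true_mask):
    """Intersection over union of two binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0


def euclidean_distance(center_a, center_b):
    """Distance in pixels between two (x, y) center coordinates."""
    return math.hypot(center_a[0] - center_b[0], center_a[1] - center_b[1])


# Hypothetical 4x4 masks: the prediction misses one pixel of the ground truth.
pred = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
true = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
print(iou(pred, true))
print(euclidean_distance((120, 85), (123, 89)))
```

A lower Euclidean distance and a higher IoU both indicate better agreement with the annotation.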
Subject(s)
Fovea Centralis; Fundus Oculi; Optic Disk; Humans; Optic Disk/diagnostic imaging; Fovea Centralis/diagnostic imaging; Neural Networks, Computer; Algorithms; Image Processing, Computer-Assisted/methods
ABSTRACT
The motion of an object or camera platform makes the acquired image blurred. This degradation is a major reason to obtain a poor-quality image from an imaging sensor. Therefore, developing an efficient deep-learning-based image processing method to remove the blur artifact is desirable. Deep learning has recently demonstrated significant efficacy in image deblurring, primarily through convolutional neural networks (CNNs) and Transformers. However, the limited receptive fields of CNNs restrict their ability to capture long-range structural dependencies. In contrast, Transformers excel at modeling these dependencies, but they are computationally expensive for high-resolution inputs and lack the appropriate inductive bias. To overcome these challenges, we propose an Efficient Hybrid Network (EHNet) that employs CNN encoders for local feature extraction and Transformer decoders with a dual-attention module to capture spatial and channel-wise dependencies. This synergy facilitates the acquisition of rich contextual information for high-quality image deblurring. Additionally, we introduce the Simple Feature-Embedding Module (SFEM) to replace the pointwise and depthwise convolutions to generate simplified embedding features in the self-attention mechanism. This innovation substantially reduces computational complexity and memory usage while maintaining overall performance. Finally, through comprehensive experiments, our compact model yields promising quantitative and qualitative results for image deblurring on various benchmark datasets.
ABSTRACT
This study proposed an improved full-scale aggregated MobileUNet (FA-MobileUNet) model to achieve more complete detection results of oil spill areas using synthetic aperture radar (SAR) images. The convolutional block attention module (CBAM) in the FA-MobileUNet was modified based on morphological concepts. By introducing the morphological attention module (MAM), the improved FA-MobileUNet model can reduce the fragments and holes in the detection results, providing complete oil spill areas which were more suitable for describing the location and scope of oil pollution incidents. In addition, to overcome the inherent category imbalance of the dataset, label smoothing was applied in model training to reduce the model's overconfidence in majority class samples while improving the model's generalization ability. The detection performance of the improved FA-MobileUNet model reached an mIoU (mean intersection over union) of 84.55%, which was 17.15% higher than that of the original U-Net model. The effectiveness of the proposed model was then verified using the oil pollution incidents that significantly impacted Taiwan's marine environment. Experimental results showed that the extent of the detected oil spill was consistent with the oil pollution area recorded in the incident reports.
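As a rough illustration of the label smoothing mentioned above, the sketch below mixes a one-hot target with a uniform distribution and compares the cross-entropy of an overconfident prediction under both targets. The smoothing factor `epsilon=0.1`, the four-class setup, and the function names are assumptions; the abstract does not state the values used.

```python
import math


def smooth_labels(class_index, num_classes, epsilon=0.1):
    """Smoothed target: (1 - epsilon) * one_hot + epsilon / num_classes."""
    target = [epsilon / num_classes] * num_classes
    target[class_index] += 1.0 - epsilon
    return target


def cross_entropy(target, probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


# Hypothetical overconfident prediction for a sample of class 0.
probs = [0.97, 0.01, 0.01, 0.01]
hard_target = [1.0, 0.0, 0.0, 0.0]
soft_target = smooth_labels(0, 4, epsilon=0.1)

print(cross_entropy(hard_target, probs))
print(cross_entropy(soft_target, probs))
```

With the smoothed target, the near-one-hot prediction is penalized more, which is how smoothing discourages overconfidence.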
Subject(s)
Environmental Monitoring; Petroleum Pollution; Radar; Petroleum Pollution/analysis; Environmental Monitoring/methods; Taiwan; Algorithms
ABSTRACT
To assist grassroots sonographers in accurately and rapidly detecting intussusception lesions from children's abdominal ultrasound images, this paper proposes an improved YOLOv8n children's intussusception detection algorithm, called EMC-YOLOv8n. Firstly, the EfficientViT network with a cascaded group attention module was used as the backbone network to enhance the speed of target detection. Secondly, the improved C2fMBC module was used to replace the C2f module in the neck network to reduce network complexity, and the coordinate attention (CA) module was introduced after each C2fMBC module to enhance attention to positional information. Finally, experiments were conducted on the self-built dataset of intussusception in children. The results showed that the recall rate, average detection accuracy (mAP@0.5) and precision of the EMC-YOLOv8n algorithm improved by 3.9%, 2.1% and 0.9%, respectively, compared to the baseline algorithm. Despite slightly increased network parameters and computational load, significant improvements in detection accuracy enable efficient completion of detection tasks, demonstrating substantial economic and social value.
Subject(s)
Algorithms; Intussusception; Ultrasonography; Humans; Intussusception/diagnostic imaging; Ultrasonography/methods; Child; Image Processing, Computer-Assisted/methods
ABSTRACT
Hyperspectral image (HSI) data contain a wide range of valuable spectral information for numerous tasks. However, HSI data pose challenges such as small and scarce training samples and redundant information, and various approaches have been introduced to address them. Convolutional neural networks (CNNs) have achieved significant success in HSI classification. CNNs primarily extract low-level features from HSI data and have a limited ability to capture long-range dependencies because of their confined filter size. In contrast, vision transformers have succeeded in HSI classification by using attention mechanisms to learn long-range dependencies. The primary issue with these models is that they require sufficient labeled training data. To address this challenge, we propose a spectral-spatial feature extractor group attention transformer, which uses a multiscale feature extractor to extract low-level (shallow) features and a group attention mechanism to extract high-level semantic features. The proposed model is evaluated on four publicly available HSI datasets: Indian Pines, Pavia University, Salinas, and KSC. It achieved the best classification results in terms of overall accuracy (OA), average accuracy (AA), and Kappa coefficient while using only 5%, 1%, 1%, and 10% of the samples from the four datasets for training.
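The three scores named in this abstract (OA, AA, and the Kappa coefficient) can all be derived from a confusion matrix. The following is a minimal sketch using the standard definitions; the 3-class matrix is hypothetical, and the code assumes every class has at least one true sample.

```python
def classification_scores(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j.

    Returns (overall accuracy, average accuracy, Cohen's kappa).
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))
    overall_accuracy = correct / total

    # Average accuracy: mean of the per-class recalls.
    row_totals = [sum(row) for row in confusion]
    per_class = [confusion[i][i] / row_totals[i] for i in range(n_classes)]
    average_accuracy = sum(per_class) / n_classes

    # Kappa corrects the overall accuracy for chance agreement.
    col_totals = [sum(confusion[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    kappa = (overall_accuracy - expected) / (1.0 - expected)
    return overall_accuracy, average_accuracy, kappa


# Hypothetical confusion matrix for three land-cover classes.
confusion = [[9, 1, 0],
             [2, 6, 2],
             [0, 0, 5]]
oa, aa, kappa = classification_scores(confusion)
print(oa, aa, kappa)
```

OA weights every sample equally, whereas AA weights every class equally, so the two diverge on imbalanced datasets.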
ABSTRACT
Sorting recyclable trash is critical to reducing energy consumption and mitigating environmental pollution. Currently, trash sorting heavily relies on manpower. Computer vision technology enables automated trash sorting. However, existing trash image classification datasets contain a large number of images without backgrounds. Moreover, the models are vulnerable to background interference when categorizing images with complex backgrounds. In this work, we provide a recyclable trash dataset that supports model training and design a model specifically for trash sorting. Firstly, we introduce the TrashIVL dataset, an image dataset for recyclable trash sorting encompassing five classes (TrashIVL-5). All images are collected from public trash datasets, and the original images were captured by RGB imaging sensors, containing trash items with real-life backgrounds. To achieve refined recycling and improve sorting efficiency, the TrashIVL dataset can be further categorized into 12 classes (TrashIVL-12). Secondly, we propose the integrated parallel attention module (IPAM). Considering the susceptibility of sensor-based systems to background interference in real-world trash sorting scenarios, our IPAM is specifically designed to focus on the essential features of trash images from both channel and spatial perspectives. It can be inserted into convolutional neural networks (CNNs) as a plug-and-play module. We have constructed a recyclable trash sorting network building upon the IPAM, which achieves an accuracy of 97.42% on TrashIVL-5 and 94.08% on TrashIVL-12. Our work is an effective application of computer vision to recyclable trash sorting. It makes a positive contribution to environmental protection and sustainable development.
ABSTRACT
Electric motors play a crucial role in self-driving vehicles. Therefore, fault diagnosis in motors is important for ensuring the safety and reliability of vehicles. In order to improve fault detection performance, this paper proposes a motor fault diagnosis method based on vibration signals. Firstly, the vibration signals of each operating state of the motor at different frequencies are measured with vibration sensors. Secondly, Gram image coding is used to encode the time domain information, and the one-dimensional vibration signals are transformed into grayscale images to highlight their features. Finally, the lightweight neural network Xception is chosen as the main tool, and the attention mechanism Convolutional Block Attention Module (CBAM) is introduced into the model to reinforce the importance of the characteristic information of the motor faults and realize their accurate identification. Xception is a type of convolutional neural network; its lightweight design maintains excellent performance while significantly reducing the model's order of magnitude. Without affecting the computational complexity and accuracy of the network, the CBAM attention mechanism is added, and the Gramian angular field is combined with the improved lightweight neural network. The experimental results show that this model achieves a better recognition effect and faster iteration speed compared with the traditional Convolutional Neural Network (CNN), ResNet, and Xception networks.
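The signal-to-image step can be sketched as follows, under the assumption that the "Gram image coding" in this abstract refers to the Gramian angular field, here in its summation variant (the paper may use the difference variant or a different normalization). The signal values and function names are hypothetical, and the signal is assumed not to be constant.

```python
import math


def gramian_angular_field(signal):
    """Gramian angular summation field of a 1D signal (assumed non-constant)."""
    lo, hi = min(signal), max(signal)
    # Rescale to [-1, 1] so that each value can be read as a cosine.
    scaled = [2.0 * (x - lo) / (hi - lo) - 1.0 for x in signal]
    # Clamp before acos to guard against tiny floating-point overshoots.
    angles = [math.acos(max(-1.0, min(1.0, s))) for s in scaled]
    return [[math.cos(a + b) for b in angles] for a in angles]


def to_grayscale(field):
    """Map field values from [-1, 1] to integer gray levels in [0, 255]."""
    return [[round((v + 1.0) * 127.5) for v in row] for row in field]


# Hypothetical short vibration segment.
segment = [0.0, 1.0, 2.0]
field = gramian_angular_field(segment)
image = to_grayscale(field)
for row in image:
    print(row)
```

Each entry of the field encodes the temporal correlation between two time steps, so an N-sample segment becomes an N x N image that a CNN can consume.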
ABSTRACT
The synthesis of pseudo-healthy images, involving the generation of healthy counterparts for pathological images, is crucial for data augmentation, clinical disease diagnosis, and understanding pathology-induced changes. Recently, Generative Adversarial Networks (GANs) have shown substantial promise in this domain. However, the heterogeneity of intracranial infection symptoms caused by various infections complicates the model's ability to accurately differentiate between pathological and healthy regions, leading to the loss of critical information in healthy areas and impairing the precise preservation of the subject's identity. Moreover, for images with extensive lesion areas, the pseudo-healthy images generated by these methods often lack distinct organ and tissue structures. To address these challenges, we propose a three-stage method (localization, inpainting, synthesis) that achieves nearly perfect preservation of the subject's identity through precise pseudo-healthy synthesis of the lesion region and its surroundings. The process begins with a Segmentor, which identifies the lesion areas and differentiates them from healthy regions. Subsequently, a Vague-Filler fills the lesion areas to construct a healthy outline, thereby preventing structural loss in cases of extensive lesions. Finally, leveraging this healthy outline, a Generative Adversarial Network integrated with a contextual residual attention module generates a more realistic and clearer image. Our method was validated through extensive experiments across different modalities within the BraTS2021 dataset, achieving a healthiness score of 0.957. The visual quality of the generated images markedly exceeded those produced by competing methods, with enhanced capabilities in repairing large lesion areas. Further testing on the COVID-19-20 dataset showed that our model could effectively partially reconstruct images of other organs.
ABSTRACT
To enable the timely adjustment of the control strategy of automobile active safety systems, enhance their capacity to adapt to complex working conditions, and improve driving safety, this paper introduces a new method for predicting road surface state information and recognizing road adhesion coefficients using an enhanced version of the MobileNet V3 model. On one hand, the Squeeze-and-Excitation (SE) is replaced by the Convolutional Block Attention Module (CBAM). It can enhance the extraction of features effectively by considering both spatial and channel dimensions. On the other hand, the cross-entropy loss function is replaced by the Bias Loss function. It can reduce the random prediction problem occurring in the optimization process to improve identification accuracy. Finally, the proposed method is evaluated in an experiment with a four-wheel-drive ROS robot platform. Results indicate that a classification precision of 95.53% is achieved, which is higher than existing road adhesion coefficient identification methods.
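To make the channel half of CBAM concrete, here is a minimal sketch of its channel attention: average-pooled and max-pooled channel descriptors pass through a shared two-layer perceptron, are summed, and are squashed by a sigmoid. The weights `w1` and `w2` and the tiny feature map are hypothetical, and the spatial attention branch (a convolution over pooled channel maps) is omitted, so this is not the paper's implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(vector, w1, w2):
    """Two-layer perceptron with a ReLU hidden layer (no biases, for brevity)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vector))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(feature_map, w1, w2):
    """feature_map: list of channels, each a 2D list. Returns one weight per channel."""
    avg_desc, max_desc = [], []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        avg_desc.append(sum(values) / len(values))  # global average pooling
        max_desc.append(max(values))                # global max pooling
    avg_out = shared_mlp(avg_desc, w1, w2)
    max_out = shared_mlp(max_desc, w1, w2)
    return [sigmoid(a + m) for a, m in zip(avg_out, max_out)]


def apply_channel_attention(feature_map, weights):
    """Scale every value of each channel by that channel's attention weight."""
    return [[[v * w for v in row] for row in channel]
            for channel, w in zip(feature_map, weights)]


# Hypothetical 2-channel, 2x2 feature map and hand-picked weights.
features = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
w1 = [[1.0, 0.0]]     # 2 channels -> 1 hidden unit
w2 = [[1.0], [-1.0]]  # 1 hidden unit -> 2 channels
weights = channel_attention(features, w1, w2)
print(weights)
```

Squeeze-and-Excitation uses only the average-pooled descriptor; adding the max-pooled path and a spatial branch is what distinguishes CBAM.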
ABSTRACT
Conventionally diagnosing septic arthritis relies on detecting the causal pathogens in samples of synovial fluid, synovium, or blood. However, isolating these pathogens through cultures takes several days, thus delaying both diagnosis and treatment. Establishing a quantitative classification model from ultrasound images for rapid septic arthritis diagnosis is mandatory. For the study, a database composed of 342 images of non-septic arthritis and 168 images of septic arthritis produced by grayscale (GS) and power Doppler (PD) ultrasound was constructed. In the proposed architecture of fusion with attention and selective transformation (FAST), both groups of images were combined in a vision transformer (ViT) with the convolutional block attention module, which incorporates spatial, modality, and channel features. Fivefold cross-validation was applied to evaluate the generalized ability. The FAST architecture achieved the accuracy, sensitivity, specificity, and area under the curve (AUC) of 86.33%, 80.66%, 90.25%, and 0.92, respectively. These performances were higher than using conventional ViT (82.14%) and significantly better than using one modality alone (GS 73.88%, PD 72.02%), with the p-value being less than 0.01. Through the integration of multi-modality and the extraction of multiple channel features, the established model provided promising accuracy and AUC in septic arthritis classification. The end-to-end learning of ultrasound features can provide both rapid and objective assessment suggestions for future clinic use.
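For readers less familiar with the reported metrics, the sketch below computes accuracy, sensitivity, and specificity from confusion counts, and AUC as the probability that a positive case scores higher than a negative one. All counts and scores are hypothetical and are not the study's data.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion counts."""
    sensitivity = tp / (tp + fn)  # fraction of septic cases detected
    specificity = tn / (tn + fp)  # fraction of non-septic cases cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


def auc(scores_pos, scores_neg):
    """AUC via pairwise comparison; ties count as half a win."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts and classifier scores.
print(binary_metrics(tp=40, fn=10, tn=90, fp=10))
print(auc([0.9, 0.8, 0.4], [0.5, 0.3]))
```

The pairwise AUC is quadratic in the number of samples, which is fine for illustration but slow for large test sets.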
ABSTRACT
Sleep staging is the most crucial work before diagnosing and treating sleep disorders. Traditional manual sleep staging is time-consuming and depends on the skill of experts. Automatic sleep staging based on deep learning is therefore attracting more and more researchers. The salient waves in sleep signals contain the most important information for automatic sleep staging. However, this key information is not fully utilized in existing deep learning methods, since most of them use only a CNN or RNN, which cannot capture multi-scale features in salient waves effectively. To tackle this limitation, we propose a lightweight end-to-end network for sleep stage prediction based on a feature pyramid and joint attention. The feature pyramid module is designed to effectively extract multi-scale features in salient waves, and these features are then fed to the joint attention module to closely attend to the channel and location information of the salient waves. The proposed network has far fewer parameters and achieves a significant performance improvement over state-of-the-art results. The overall accuracy and macro F1 score are 90.1% and 87.8% on Sleep-EDF39, 87.4% and 84.4% on Sleep-EDF153, and 86.9% and 83.9% on SHHS, respectively. Ablation experiments confirm the effectiveness of each module.
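The macro F1 score reported alongside accuracy averages the per-stage F1 scores with equal weight, which matters because sleep stages are imbalanced. A minimal sketch follows; the stage labels and predictions are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of epochs whose predicted stage matches the annotation."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # Assumed convention: a class with no true or predicted samples scores 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical annotations for six 30-second epochs.
y_true = ["W", "W", "N2", "N2", "REM", "REM"]
y_pred = ["W", "N2", "N2", "N2", "REM", "W"]
print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred, labels=["W", "N2", "REM"]))
```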
Subject(s)
Sleep Stages; Humans; Neural Networks, Computer; Deep Learning; Algorithms
ABSTRACT
In low-light environments, the amount of light captured by the camera sensor is reduced, resulting in lower image brightness. This makes details in the image difficult to recognize or causes them to be lost entirely, which affects subsequent processing of low-light images. Low-light image enhancement methods can increase image brightness while better restoring color and detail information. A generative adversarial network is proposed for low-light image enhancement to improve the quality of low-light images. This network consists of a generative network and an adversarial network. Firstly, in the generative network, a multi-scale feature extraction module, which consists of dilated convolutions, regular convolutions, max pooling, and average pooling, is designed. This module can extract low-light image features from multiple scales, thereby obtaining richer feature information. Secondly, an illumination attention module is designed to reduce the interference of redundant features. This module assigns greater weight to important illumination features, enabling the network to extract illumination features more effectively. Finally, an encoder-decoder generative network is designed. It uses the multi-scale feature extraction module, illumination attention module, and other conventional modules to enhance low-light images and improve quality. Regarding the adversarial network, a dual-discriminator structure is designed. This network has a global adversarial network and a local adversarial network. They determine whether the input image is real or generated based on global and local features, enhancing the performance of the generator network. Additionally, an improved loss function is proposed by introducing color loss and perceptual loss into the conventional loss function. It can better measure the color loss between the generated image and a normally illuminated image, thus reducing color distortion during the enhancement process.
The proposed method, along with other methods, is tested using both synthesized and real low-light images. Experimental results show that, compared to other methods, the images enhanced by the proposed method are closer to normally illuminated images for synthetic low-light images. For real low-light images, the images enhanced by the proposed method retain more details, are more apparent, and exhibit higher performance metrics. Overall, compared to other methods, the proposed method demonstrates better image enhancement capabilities for both synthetic and real low-light images.
ABSTRACT
Driver monitoring systems (DMS) are crucial in autonomous driving systems (ADS) when users are concerned about driver/vehicle safety. In DMS, the significant influencing factor of driver/vehicle safety is the classification of driver distractions or activities. The driver's distractions or activities convey meaningful information to the ADS, enhancing driver/vehicle safety in real-time vehicle driving. The classification of driver distraction or activity is challenging due to the unpredictable nature of human driving. This paper proposes a convolutional block attention module embedded in Visual Geometry Group (CBAM VGG16) deep learning architecture to improve the classification performance of driver distractions. The proposed CBAM VGG16 architecture is a hybrid network of the CBAM layer with conventional VGG16 network layers. Adding a CBAM layer into a traditional VGG16 architecture enhances the model's feature extraction capacity and improves the driver distraction classification results. To validate the performance of our proposed CBAM VGG16 architecture, we tested our model on the American University in Cairo (AUC) distracted driver dataset version 2 (AUCD2) for cameras 1 and 2 images. Our experiment results show that the proposed CBAM VGG16 architecture achieved 98.65% classification accuracy for camera 1 and 97.85% for camera 2 AUCD2 datasets. We also compared the driver distraction classification performance of the CBAM VGG16 architecture with that of the DenseNet121, Xception, MobileNetV2, InceptionV3, and VGG16 architectures in terms of accuracy, loss, precision, F1 score, recall, and confusion matrix. The results indicate that the proposed CBAM VGG16 improves classification by 3.7% for AUCD2 camera 1 images and 5% for camera 2 images compared to the conventional VGG16 deep learning classification model.
We also tested our proposed architecture with different hyperparameter values and estimated the optimal values for best driver distraction classification. The significance of data augmentation techniques for the data diversity performance of the CBAM VGG16 model is also validated in terms of overfitting scenarios. The Grad-CAM visualization of our proposed CBAM VGG16 architecture is also considered in our study, and the results show that VGG16 architecture without CBAM layers is less attentive to the essential parts of the driver distraction images. Furthermore, we tested the effective classification performance of our proposed CBAM VGG16 architecture with the number of model parameters, model size, various input image resolutions, cross-validation, Bayesian search optimization and different CBAM layers. The results indicate that CBAM layers in our proposed architecture enhance the classification performance of conventional VGG16 architecture and outperform the state-of-the-art deep learning architectures.
Subject(s)
Automobile Driving; Deep Learning; Humans; Distracted Driving; Attention; Neural Networks, Computer
ABSTRACT
BACKGROUND AND OBJECTIVE: Coronary artery segmentation is a pivotal field that has received increasing attention in recent years. However, this task remains challenging because of the inhomogeneous distributions of the contrast agent and dim light, resulting in noise, vascular breakages and small vessel losses in the obtained segmentation results. METHODS: To acquire better automatic blood vessel segmentation results for coronary angiography images, a UNet-based segmentation network (SARC-UNet) is constructed for coronary artery segmentation; this approach is based on residual convolution and spatial attention. First, we use the low-light image enhancement (LIME) approach to increase the contrast and clarity levels of coronary angiography images. Then, we design two residual convolution fusion modules (RCFM1 and RCFM2) that can successfully fuse the local and global information of coronary images while also capturing the characteristics of finer-grained blood vessels, hence preventing the loss of tiny blood vessels in the segmentation findings. Finally, using a cascaded waterfall structure, we create a new location-enhanced spatial attention (LESA) mechanism that can efficiently improve the long-distance dependencies between coronary vascular pixel features, eradicating vascular ruptures and noise in the segmentation results. RESULTS: This article subjectively and objectively evaluates the experimental results. This method has performed well on five general indicators. Furthermore, it outperforms the connectivity indicators proposed in this article. This method can effectively segment blood vessels and obtain higher accuracy results. CONCLUSIONS: Numerous experiments have shown that the suggested method outperforms the state-of-the-art approaches, particularly in terms of vessel connectivity and small blood vessel segmentation.
Subject(s)
Algorithms; Coronary Angiography; Coronary Vessels; Image Processing, Computer-Assisted; Humans; Coronary Vessels/diagnostic imaging; Coronary Angiography/methods; Image Processing, Computer-Assisted/methods; Contrast Media
ABSTRACT
Multi-modal representation learning has received significant attention across diverse research domains due to its ability to model a scenario comprehensively. Learning the cross-modal interactions is essential to combining multi-modal data into a joint representation. However, conventional cross-attention mechanisms can produce noisy and non-meaningful values in the absence of useful cross-modal interactions among input features, thereby introducing uncertainty into the feature representation. These factors have the potential to degrade the performance of downstream tasks. This paper introduces a novel Pre-gating and Contextual Attention Gate (PCAG) module for multi-modal learning comprising two gating mechanisms that operate at distinct information processing levels within the deep learning model. The first gate filters out interactions that lack informativeness for the downstream task, while the second gate reduces the uncertainty introduced by the cross-attention module. Experimental results on eight multi-modal classification tasks spanning various domains show that the multi-modal fusion model with PCAG outperforms state-of-the-art multi-modal fusion models. Additionally, we elucidate how PCAG effectively processes cross-modality interactions.
Subject(s)
Attention; Deep Learning; Attention/physiology; Humans; Neural Networks, Computer; Algorithms
ABSTRACT
The identification of plant leaf diseases is crucial in precision agriculture, playing a pivotal role in advancing the modernization of agriculture. Timely detection and diagnosis of leaf diseases for preventive measures significantly contribute to enhancing both the quantity and quality of agricultural products, thereby fostering the in-depth development of precision agriculture. However, despite the rapid development of research on plant leaf disease identification, it still faces challenges such as insufficient agricultural datasets and deep learning-based disease identification models that have numerous training parameters and insufficient accuracy. This paper proposes a plant leaf disease identification method based on improved SinGAN and improved ResNet34 to address the aforementioned issues. Firstly, an improved SinGAN called Reconstruction-Based Single Image Generation Network (ReSinGN) is proposed for image enhancement. This network accelerates model training by using an autoencoder to replace the GAN in the SinGAN and incorporates a Convolutional Block Attention Module (CBAM) into the autoencoder to more accurately capture important features and structural information in the images. Random pixel shuffling is introduced in ReSinGN to enable the model to learn richer data representations, further enhancing the quality of generated images. Secondly, an improved ResNet34 is proposed for plant leaf disease identification. This involves adding CBAM modules to the ResNet34 to alleviate the limitations of parameter sharing, replacing the ReLU activation function with the LeakyReLU activation function to address the problem of neuron death, and utilizing transfer learning-based training methods to accelerate network training. This paper takes tomato leaf diseases as the experimental subject, and the experimental results demonstrate that: (1) ReSinGN generates high-quality images at least 44.6 times faster in training speed compared to SinGAN.
(2) The Tenengrad score of images generated by the ReSinGN model is 67.3, which is improved by 30.2 compared to the SinGAN, resulting in clearer images. (3) ReSinGN model with random pixel Shuffling outperforms SinGAN in both image clarity and distortion, achieving the optimal balance between image clarity and distortion. (4) The improved ResNet34 achieved an average recognition accuracy, recognition precision, recognition accuracy (redundant as it's similar to precision), recall, and F1 score of 98.57, 96.57, 98.68, 97.7, and 98.17%, respectively, for tomato leaf disease identification. Compared to the original ResNet34, this represents enhancements of 3.65, 4.66, 0.88, 4.1, and 2.47%, respectively.
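The motivation for swapping ReLU for LeakyReLU in this abstract ("neuron death") can be seen from the gradients: ReLU passes no gradient for negative inputs, while LeakyReLU keeps a small one. The sketch below assumes a negative slope of 0.01, a common default; the abstract does not state the value used.

```python
def relu(x):
    return x if x > 0 else 0.0


def relu_grad(x):
    # Zero gradient for negative inputs: a unit stuck here stops learning.
    return 1.0 if x > 0 else 0.0


def leaky_relu(x, negative_slope=0.01):
    return x if x > 0 else negative_slope * x


def leaky_relu_grad(x, negative_slope=0.01):
    # A small but nonzero gradient lets the unit recover.
    return 1.0 if x > 0 else negative_slope


for value in (-2.0, 3.0):
    print(value, relu(value), relu_grad(value),
          leaky_relu(value), leaky_relu_grad(value))
```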
ABSTRACT
As the sheep industry rapidly moves towards modernization, digitization, and intelligence, there is a need to build breeding farms integrated with big data. By collecting individual information on sheep, precision breeding can be conducted to improve breeding efficiency, reduce costs, and promote healthy breeding practices. In this context, the accurate identification of individual sheep is essential for establishing digitized sheep farms and precision animal husbandry. Currently, scholars utilize deep learning technology to construct recognition models, learning the biological features of sheep faces to achieve accurate identification. However, existing research methods are limited to pattern recognition at the image level, leading to a lack of diversity in recognition methods. Therefore, this study focuses on the small-tailed Han sheep and develops a sheep face recognition method based on three-dimensional reconstruction technology and feature point matching, aiming to enrich the theoretical research of sheep face recognition technology. The specific recognition approach is as follows: full-angle sheep face images of experimental sheep are collected, and corresponding three-dimensional sheep face models are generated using three-dimensional reconstruction technology, from which three-dimensional sheep face images are obtained from three different perspectives. Additionally, this study developed a sheep face orientation recognition algorithm (SFORA). The SFORA incorporates the ECA mechanism to further enhance recognition performance. Ultimately, the SFORA has a model size of only 5.3 MB, with accuracy and F1 score reaching 99.6% and 99.5%, respectively.
During the recognition task, the SFORA is first used for sheep face orientation recognition, followed by matching the recognition image with the corresponding three-dimensional sheep face image based on the established SuperGlue feature-matching algorithm, ultimately outputting the recognition result. Experimental results indicate that when the confidence threshold is set to 0.4, SuperGlue achieves the best matching performance, with matching accuracies for the front, left, and right faces reaching 96.0%, 94.2%, and 96.3%, respectively. This study enriches the theoretical research on sheep face recognition technology and provides technical support.
ABSTRACT
Gearbox fault diagnosis is crucial for ensuring the safety of mechanical equipment and the stable operation of the whole system. However, existing diagnostic methods still have limitations, such as the analysis of single-scale features and insufficient recognition of global temporal dependencies. To address these issues, this article proposes a new method for gearbox fault diagnosis based on MSCNN-LSTM-CBAM-SE. The output of the CBAM-SE module is deeply integrated with the multi-scale features from MSCNN and the temporal features from LSTM, constructing a comprehensive feature representation that provides richer and more precise information for fault diagnosis. The effectiveness of this method has been validated on two gearbox datasets and through ablation studies on this model. Experimental results show that the proposed model achieves excellent performance in terms of accuracy and F1 score, among other metrics. Finally, a comparison with other relevant fault diagnosis methods further verifies the advantages of the proposed model. This research offers a new solution for accurate fault diagnosis of gearboxes.
ABSTRACT
Convolutional neural networks (CNNs) have made significant progress in the field of facial expression recognition (FER). However, due to challenges such as occlusion, lighting variations, and changes in head pose, FER in real-world environments remains highly challenging. At the same time, methods based solely on CNNs rely heavily on local spatial features, lack global information, and struggle to balance computational complexity against recognition accuracy. Consequently, CNN-based models still fall short of addressing FER adequately. To address these issues, we propose a lightweight facial expression recognition method based on a hybrid vision transformer. This method captures multi-scale facial features through an improved attention module, achieving richer feature integration, enhancing the network's perception of key facial expression regions, and improving feature extraction capabilities. Additionally, to further enhance the model's performance, we have designed a patch dropping (PD) module. This module aims to emulate the way the human visual system allocates attention to local features, guiding the network to focus on the most discriminative features, reducing the influence of irrelevant features, and thereby lowering computational costs. Extensive experiments demonstrate that our approach significantly outperforms other methods, achieving an accuracy of 86.51% on RAF-DB and nearly 70% on FER2013, with a model size of only 3.64 MB. These results demonstrate that our method provides a new perspective for the field of facial expression recognition.
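The idea behind patch dropping can be illustrated with a short sketch that keeps only the highest-scoring patches and discards the rest. The abstract does not specify how the PD module scores or selects patches, so everything here is an assumption made for illustration: the function name `drop_patches`, the per-patch importance scores, and the fixed `keep_ratio` are hypothetical.

```python
import math


def drop_patches(patches, scores, keep_ratio=0.5):
    """Keep the most important patches and drop the others.

    patches    -- list of patch tokens (any objects)
    scores     -- one importance score per patch (hypothetical, e.g. attention)
    keep_ratio -- fraction of patches to keep (hypothetical hyperparameter)

    Returns (kept_patches, kept_indices) in the original patch order.
    """
    if len(patches) != len(scores):
        raise ValueError("patches and scores must have the same length")
    # Always keep at least one patch so later layers have some input.
    n_keep = max(1, math.ceil(len(patches) * keep_ratio))
    # Rank patch indices from highest to lowest score and take the top ones.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept_indices = sorted(ranked[:n_keep])  # restore original spatial order
    return [patches[i] for i in kept_indices], kept_indices


# Four toy patches with made-up importance scores.
kept, indices = drop_patches(["p0", "p1", "p2", "p3"], [0.1, 0.9, 0.4, 0.8])
print(kept, indices)
```

Because later layers process only the retained patches, the amount of computation shrinks roughly in proportion to the keep ratio, which is one plausible reading of the reduced cost mentioned above.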
Subject(s)
Facial Expression; Neural Networks, Computer; Humans; Automated Facial Recognition/methods; Algorithms; Image Processing, Computer-Assisted/methods; Face; Pattern Recognition, Automated/methods
ABSTRACT
The identification of cancer subtypes plays a very important role in medicine. Accurate identification of cancer subtypes is helpful for both cancer treatment and prognosis. Currently, most methods for cancer subtype identification are based on single-omics data, such as gene expression data. However, multi-omics data capture multiple characteristics of a cancer and can therefore improve the accuracy of cancer subtype identification. How to extract features from multi-omics data for cancer subtype identification is thus the main challenge currently faced by researchers. In this paper, we propose a cancer subtype identification method named CAEM-GBDT, which takes gene expression data, miRNA expression data, and DNA methylation data as input and adopts a convolutional autoencoder network to identify cancer subtypes. Through a convolutional encoder layer, the method performs feature extraction on the input data. Within the convolutional encoder layer, a convolutional self-attention module is embedded to recognize higher-level representations of the multi-omics data. The high-level representations extracted by the convolutional encoder are then concatenated with the input to the decoder. A GBDT (Gradient Boosting Decision Tree) is used for cancer subtype identification. In the experiments, we compare CAEM-GBDT with existing cancer subtype identification methods. Experimental results demonstrate that the proposed CAEM-GBDT outperforms other methods. The source code is available from GitHub at https://github.com/gxh-1/CAEM-GBDT.git.