Results 1-20 of 1,810
1.
Neural Netw ; 181: 106765, 2024 Sep 28.
Article in English | MEDLINE | ID: mdl-39357269

ABSTRACT

SNNs are gaining popularity in AI research as a low-power alternative in deep learning due to their sparsity and biological interpretability. Using SNNs for dense prediction tasks is becoming an important research area. In this paper, we first propose a novel modification of the conventional Spiking U-Net architecture that adjusts the firing positions of neurons. The modified network, named Analog Spiking U-Net (AS U-Net), is capable of incorporating the Convolutional Block Attention Module (CBAM) into the domain of SNNs. This is the first successful implementation of CBAM in SNNs, which has the potential to improve the segmentation performance of SNN models while decreasing information loss. The proposed AS U-Net (with CBAM and ViT) is then trained by direct encoding on a comprehensive dataset obtained by merging several diabetic retinal vessel segmentation datasets. Based on the experimental results, the proposed SNN model achieves the highest segmentation accuracy in retinal vessel segmentation for diabetes mellitus, surpassing other SNN-based models and most related ANN-based models. In addition, under the same structure, our model demonstrates performance comparable to its ANN counterpart. The novel model also achieves state-of-the-art (SOTA) results in comparative experiments when both accuracy and energy consumption are considered. Furthermore, the ablation analysis of CBAM confirms its feasibility and effectiveness in SNNs, suggesting a novel approach for subsequent deployment and hardware chip applications. Finally, we conduct extensive generalization experiments on the same type of segmentation task (ISBI and ISIC), the more complex multi-class segmentation task (Synapse), and a series of image generation tasks (MNIST, Day2night, Maps, Facades) to demonstrate the generality of the proposed method.
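
Since the contribution hinges on inserting CBAM into a spiking U-Net, a minimal PyTorch sketch of the standard CBAM block (channel attention followed by spatial attention) may help readers unfamiliar with the module; the reduction ratio, kernel size, and class names are illustrative assumptions, not the exact AS U-Net configuration.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: shared MLP over global avg- and max-pooled descriptors."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))        # (B, C) from average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))         # (B, C) from max pooling
        scale = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * scale

class SpatialAttention(nn.Module):
    """Spatial attention: 7x7 conv over channel-wise avg and max maps."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)          # (B, 1, H, W)
        mx = x.amax(dim=1, keepdim=True)           # (B, 1, H, W)
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * scale

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, as in the original CBAM."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))
```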

2.
Proteomics ; : e202400210, 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39361250

ABSTRACT

N-Linked glycosylation is crucial for various biological processes such as protein folding, immune response, and cellular transport. Traditional experimental methods for determining N-linked glycosylation sites entail substantial time and labor investment, which has led to the development of computational approaches as a more efficient alternative. However, due to the limited availability of 3D structural data, existing prediction methods often struggle to fully utilize structural information and fall short in integrating sequence and structural information effectively. Motivated by the progress of protein pretrained language models (pLMs) and the breakthrough in protein structure prediction, we introduce a high-accuracy model called CoNglyPred. Having compared various pLMs, we opt for the large-scale pLM ESM-2 to extract sequence embeddings, thus mitigating certain limitations associated with manual feature extraction. Meanwhile, our approach employs a graph transformer network to process the 3D protein structures predicted by AlphaFold2. The final graph output and the ESM-2 embedding are integrated through a co-attention mechanism. In a series of comprehensive experiments on the independent test dataset, CoNglyPred outperforms state-of-the-art models and demonstrates exceptional performance in a case study. In addition, we are the first to report the uncertainty of N-linked glycosylation predictors using expected calibration error and expected uncertainty calibration error.
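
To make the co-attention step concrete, below is a hedged PyTorch sketch of a bidirectional cross-attention fusion between per-residue ESM-2 sequence embeddings and graph-transformer structure features; the embedding sizes, head count, and the final per-residue prediction head are assumptions for illustration, not the published CoNglyPred architecture.

```python
import torch
import torch.nn as nn

class CoAttentionFusion(nn.Module):
    """Cross-attend sequence and structure features in both directions, then
    concatenate the attended features for per-residue prediction."""
    def __init__(self, seq_dim=1280, struct_dim=256, hidden=256, heads=8):
        super().__init__()
        self.seq_proj = nn.Linear(seq_dim, hidden)
        self.struct_proj = nn.Linear(struct_dim, hidden)
        self.struct_to_seq = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.seq_to_struct = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.head = nn.Linear(2 * hidden, 1)   # per-residue glycosylation logit

    def forward(self, seq_emb, struct_emb):
        # seq_emb: (B, L, seq_dim) from ESM-2; struct_emb: (B, L, struct_dim) from the graph branch
        s = self.seq_proj(seq_emb)
        g = self.struct_proj(struct_emb)
        s_att, _ = self.struct_to_seq(query=s, key=g, value=g)  # sequence attends to structure
        g_att, _ = self.seq_to_struct(query=g, key=s, value=s)  # structure attends to sequence
        fused = torch.cat([s_att, g_att], dim=-1)
        return self.head(fused).squeeze(-1)                     # (B, L) logits
```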

3.
Sci Rep ; 14(1): 23644, 2024 Oct 09.
Article in English | MEDLINE | ID: mdl-39384576

ABSTRACT

Coal-gangue recognition technology plays an important role in the intelligent realization of fully mechanized caving faces and the improvement of coal quality. Although great progress has been made in coal-gangue recognition in recent years, most existing methods have not taken into account the impact of the complex environment of top-coal caving on recognition performance. Herein, a hybrid multi-branch convolutional neural network (HMBCNN) is proposed for coal-gangue recognition, based on improved Mel-Frequency Cepstral Coefficients (MFCC), Mel spectrograms, and attention mechanisms. Firstly, the MFCC and its smoothed feature matrix are input into each branch of a one-dimensional multi-branch convolutional neural network, and the spliced features are extracted adaptively through a multi-head attention mechanism. Secondly, the Mel spectrogram and its first-order derivative are input into each branch of a two-dimensional multi-branch convolutional neural network, and the effective time-frequency information is emphasized through a soft attention mechanism. Finally, at the decision-making level, the two networks are fused to establish a model for feature fusion and classification, obtaining optimal fusion strategies for different features and networks. A database of sound pressure signals under different signal-to-noise ratios and equipment operations is constructed based on a large amount of data collected in the laboratory and on-site. Comparative experiments and discussions are conducted on this database with advanced algorithms and different neural network structures. The results show that the proposed method achieves higher recognition accuracy and better robustness in noisy environments.
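
As a rough illustration of the one-dimensional branches and the attention-based splicing described above, the following PyTorch sketch passes the MFCC matrix and its smoothed version through parallel 1-D CNN branches and refines the concatenated features with multi-head self-attention; channel counts, kernel sizes, and the pooled classification head are assumptions, not the published HMBCNN.

```python
import torch
import torch.nn as nn

class Branch1D(nn.Module):
    """A small 1-D CNN branch over a cepstral feature sequence shaped (B, C_in, T)."""
    def __init__(self, in_ch, out_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel_size=5, padding=2),
            nn.BatchNorm1d(out_ch), nn.ReLU(inplace=True),
            nn.Conv1d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm1d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)                                  # (B, out_ch, T)

class MultiBranchMFCCNet(nn.Module):
    """Two 1-D branches (MFCC and smoothed MFCC), spliced along time and
    re-weighted by multi-head self-attention before classification."""
    def __init__(self, n_mfcc=13, d_model=64, heads=4, n_classes=2):
        super().__init__()
        self.branch_raw = Branch1D(n_mfcc, d_model)
        self.branch_smooth = Branch1D(n_mfcc, d_model)
        self.attn = nn.MultiheadAttention(d_model, heads, batch_first=True)
        self.fc = nn.Linear(d_model, n_classes)

    def forward(self, mfcc, mfcc_smooth):
        tokens = torch.cat([self.branch_raw(mfcc),
                            self.branch_smooth(mfcc_smooth)], dim=2)  # splice along time
        tokens = tokens.transpose(1, 2)                     # (B, 2T, d_model)
        fused, _ = self.attn(tokens, tokens, tokens)
        return self.fc(fused.mean(dim=1))                   # pooled logits
```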

4.
Heliyon ; 10(18): e37916, 2024 Sep 30.
Article in English | MEDLINE | ID: mdl-39364248

ABSTRACT

In recent years, as China's industrialization level has advanced, the issue of environmental pollution, particularly mine water pollution, has become increasingly severe. Water quality prediction is a fundamental aspect of water resource protection and a critical approach to addressing the water resource crisis. To improve water quality prediction, this research first analyzes the characteristics of mine water quality changes and provides a brief overview of water quality prediction. Subsequently, the Long Short-Term Memory and Sequence to Sequence (Seq2Seq) models, derived from Artificial Neural Networks, are introduced, and a Seq2Seq water quality prediction model incorporating the attention mechanism is implemented. Experimental validation confirms the effectiveness of the proposed model. The results demonstrate that the attention-based Seq2Seq model accurately predicts parameters such as pH, Dissolved Oxygen, ammonia nitrogen, and Chemical Oxygen Demand, exhibiting a high degree of consistency with actual measurements. These parameters play a vital role in assessing the health of the water body and its ability to support aquatic life, and changes in these indicators reflect the degree and type of water pollution. Moreover, the Seq2Seq + attention model stands out with the lowest predicted Root Mean Square Error of 0.309. Notably, in comparison to the traditional Seq2Seq model, the incorporation of attention mechanisms results in a substantial 2.94 reduction in Mean Absolute Error. This research on the attention-based Seq2Seq water quality prediction model provides valuable insights and references for future work in water quality prediction.
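
For readers who want to see the attention-based Seq2Seq idea in code, here is a minimal PyTorch sketch of an LSTM encoder-decoder with Luong-style dot-product attention for multi-step forecasting of the four water-quality parameters; the hidden size, horizon, and the feedback of the previous prediction are illustrative choices, not the paper's exact model.

```python
import torch
import torch.nn as nn

class AttentionSeq2Seq(nn.Module):
    """LSTM encoder-decoder with dot-product attention for multi-step
    water-quality forecasting. Shapes and dimensions are illustrative."""
    def __init__(self, n_features=4, hidden=64, horizon=24):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTMCell(n_features, hidden)
        self.out = nn.Linear(2 * hidden, n_features)

    def forward(self, history):
        # history: (B, T_in, n_features), e.g. pH, DO, ammonia nitrogen, COD
        enc_out, (h, c) = self.encoder(history)            # enc_out: (B, T_in, H)
        h, c = h.squeeze(0), c.squeeze(0)
        step = history[:, -1]                              # last observation seeds the decoder
        preds = []
        for _ in range(self.horizon):
            h, c = self.decoder(step, (h, c))
            scores = torch.bmm(enc_out, h.unsqueeze(-1)).squeeze(-1)        # (B, T_in)
            weights = torch.softmax(scores, dim=1)
            context = torch.bmm(weights.unsqueeze(1), enc_out).squeeze(1)   # (B, H)
            step = self.out(torch.cat([h, context], dim=-1))                # next-step estimate
            preds.append(step)
        return torch.stack(preds, dim=1)                   # (B, horizon, n_features)
```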

5.
Heliyon ; 10(19): e37495, 2024 Oct 15.
Article in English | MEDLINE | ID: mdl-39381114

ABSTRACT

To more effectively address the issue of carbon emissions in the aviation industry, this study first analyzes the current development status of carbon offset and carbon neutrality strategies in the aviation industry and reviews the existing relevant research findings. Then, the Convolutional Neural Network is optimized to improve the accuracy and efficiency of the prediction model. These optimizations include architectural improvements, the use of attention mechanisms to focus more accurately on important features, and the adoption of multiscale feature extraction and advanced optimization algorithms to enhance the model's learning ability and convergence speed. These comprehensive improvements not only enhance the model's generalization ability but also significantly improve its applicability in complex environments. Finally, the proposed optimized algorithm is compared with Transformer Networks, Graph Convolutional Networks, Capsule Networks, Generative Adversarial Networks, and Temporal Convolutional Networks on datasets of airline carbon emissions and fuel usage, and its performance is validated through comparison of accuracy, precision, recall, and F1-score calculated from the data. Simultaneously, simulation experiments are conducted to validate the effectiveness and feasibility of the proposed algorithm by comparing prediction stability, strategy adaptability, response time, and long-term effectiveness. The experimental results show that the accuracy, precision, recall, and F1-score of the proposed optimized model reach up to 0.942, 0.967, 0.951, and 0.934, respectively, all higher than those of the compared models, validating its good performance. In the simulation experiments, the prediction stability and strategy adaptability scores of the proposed optimized model reach up to 0.944 and 0.953, respectively, much higher than those of other models. The response time is only 0.04 s at a data volume of 1000, and the computational advantage of the proposed model becomes more apparent as the data volume increases. In the comparison of long-term effectiveness, the advantage of the proposed model likewise grows with increasing data volume. Through simulation experiments, the performance of the model in practical application scenarios is further evaluated to ensure its practicability. Therefore, this study not only provides a new optimization tool for carbon emission strategies in the aviation industry but also contributes to research on environmental sustainability.

6.
Front Physiol ; 15: 1432987, 2024.
Article in English | MEDLINE | ID: mdl-39397853

ABSTRACT

Introduction: Ultrasound imaging has become a crucial tool in medical diagnostics, offering real-time visualization of internal organs and tissues. However, challenges such as low contrast, high noise levels, and variability in image quality hinder accurate interpretation. To enhance diagnostic accuracy and support treatment decisions, precise segmentation of organs and lesions in ultrasound images is essential. Recently, several deep learning methods, including convolutional neural networks (CNNs) and Transformers, have reached significant milestones in medical image segmentation. Nonetheless, there remains a pressing need for methods capable of seamlessly integrating global context with local fine-grained information, particularly in addressing the unique challenges posed by ultrasound images. Methods: In this paper, to address these issues, we propose DDTransUNet, a hybrid network combining Transformer and CNN, with a dual-branch encoder and a dual attention mechanism for ultrasound image segmentation. DDTransUNet adopts a Swin Transformer branch and a CNN branch to extract global context and local fine-grained information. The dual attention mechanism, comprising Global Spatial Attention (GSA) and Global Channel Attention (GCA) modules, captures long-range visual dependencies. A novel Cross Attention Fusion (CAF) module effectively fuses feature maps from both branches using cross-attention. Results: Experiments on three ultrasound image datasets demonstrate that DDTransUNet outperforms previous methods. On the TN3K dataset, DDTransUNet achieves IoU, Dice, HD95 and ACC metrics of 73.82%, 82.31%, 16.98 mm, and 96.94%, respectively. On the BUS-BRA dataset, DDTransUNet achieves 80.75%, 88.23%, 8.12 mm, and 98.00%. On the CAMUS dataset, DDTransUNet achieves 82.51%, 90.33%, 2.82 mm, and 96.87%. Discussion: These results indicate that our method can provide valuable diagnostic assistance to clinical practitioners.
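
As a rough illustration of the dual attention idea, the sketch below pairs an SE-style channel gate (a stand-in for GCA) with a non-local-style global spatial attention (a stand-in for GSA); both are generic formulations, and the exact GSA/GCA designs in DDTransUNet may differ.

```python
import torch
import torch.nn as nn

class GlobalChannelAttention(nn.Module):
    """SE-style channel gate: squeeze spatially, excite per channel."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))                   # (B, C)
        return x * w.unsqueeze(-1).unsqueeze(-1)

class GlobalSpatialAttention(nn.Module):
    """Non-local style spatial attention: every position attends to all others."""
    def __init__(self, channels, inner=None):
        super().__init__()
        inner = inner or max(channels // 2, 1)
        self.q = nn.Conv2d(channels, inner, 1)
        self.k = nn.Conv2d(channels, inner, 1)
        self.v = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)          # (B, HW, C')
        k = self.k(x).flatten(2)                          # (B, C', HW)
        v = self.v(x).flatten(2).transpose(1, 2)          # (B, HW, C)
        attn = torch.softmax(q @ k / (q.shape[-1] ** 0.5), dim=-1)   # (B, HW, HW)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + out                                    # residual connection
```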

7.
Brief Bioinform ; 25(6)2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39401144

ABSTRACT

Spatial transcriptomics reveals the spatial distribution of genes in complex tissues, providing crucial insights into biological processes, disease mechanisms, and drug development. The prediction of gene expression based on cost-effective histology images is a promising yet challenging field of research. Existing methods for gene prediction from histology images exhibit two major limitations. First, they ignore the intricate relationship between cell morphological information and gene expression. Second, these methods do not fully utilize the different latent stages of features extracted from the images. To address these limitations, we propose a novel hypergraph neural network model, HGGEP, to predict gene expression from histology images. HGGEP includes a gradient enhancement module to strengthen the model's perception of cell morphological information. A lightweight backbone network extracts multiple latent-stage features from the image, followed by attention mechanisms that refine the representation of features at each latent stage and capture their relations with nearby features. To explore higher-order associations among the multiple latent-stage features, we stack them and feed them into a hypergraph to establish associations among features at different scales. Experimental results on multiple datasets from disease samples, including cancers and tumors, demonstrate the superior performance of our HGGEP model over existing methods.


Subjects
Neural Networks, Computer; Humans; Gene Expression Profiling/methods; Computational Biology/methods; Algorithms; Neoplasms/genetics; Neoplasms/pathology; Image Processing, Computer-Assisted/methods
8.
Article in English | MEDLINE | ID: mdl-39397592

ABSTRACT

Electroencephalography analysis is critical for brain-computer interface research. The primary goal of a brain-computer interface is to establish communication between impaired people and others via brain signals. The classification of multi-level mental activities using the brain-computer interface has recently become more difficult, which affects classification accuracy. However, several deep learning-based techniques have attempted to identify mental tasks using multidimensional data. In this study, a hybrid capsule attention-based convolutional bidirectional gated recurrent unit model is introduced as a hybrid deep learning technique for multi-class mental task categorization. Initially, the acquired electroencephalography data are pre-processed with a digital low-pass Butterworth filter and a discrete wavelet transform to remove disturbances. The spectrally adaptive common spatial pattern is used to extract characteristics from the pre-processed electroencephalography data. The retrieved features are then fed into the proposed classification model, which extracts features deeply and classifies the mental tasks. To improve classification results, the model's parameters are fine-tuned using a dung beetle optimization approach. Finally, the proposed classifier is assessed on several types of mental task classification using the provided dataset. The simulation results are compared with existing state-of-the-art techniques in terms of accuracy, precision, recall, etc. The accuracy obtained using the proposed approach is 97.87%, which is higher than that of the other existing methods.
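
A minimal sketch of the described pre-processing stage is shown below, assuming a 250 Hz sampling rate, a 4th-order low-pass Butterworth filter with a 40 Hz cutoff, and a db4 wavelet; all of these parameter choices are illustrative, since the abstract does not state the exact values.

```python
import numpy as np
import pywt
from scipy.signal import butter, filtfilt

def preprocess_eeg(signal: np.ndarray, fs: float = 250.0,
                   cutoff: float = 40.0, wavelet: str = "db4", level: int = 4):
    """Low-pass Butterworth filtering followed by a discrete wavelet decomposition.
    Filter order, cutoff, wavelet family, and level are illustrative choices."""
    # 4th-order digital low-pass Butterworth, applied with zero-phase filtering
    b, a = butter(N=4, Wn=cutoff, btype="low", fs=fs)
    filtered = filtfilt(b, a, signal)

    # Multi-level DWT; thresholding or sub-band selection would follow here
    coeffs = pywt.wavedec(filtered, wavelet, level=level)
    return filtered, coeffs

# Example: one 4-second channel of synthetic EEG at 250 Hz
if __name__ == "__main__":
    x = np.random.randn(1000)
    filtered, coeffs = preprocess_eeg(x)
    print(len(coeffs), [c.shape for c in coeffs])
```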

9.
Sensors (Basel) ; 24(19)2024 Sep 24.
Article in English | MEDLINE | ID: mdl-39409221

ABSTRACT

With the rapid growth in demand for security surveillance, assisted driving, and remote sensing, object detection networks with robust environmental perception and high detection accuracy have become a research focus. However, single-modality image detection technologies face limitations in environmental adaptability, often affected by factors such as lighting conditions, fog, rain, and obstacles like vegetation, leading to information loss and reduced detection accuracy. We propose an object detection network, IV-YOLO, that integrates features from visible light and infrared images to address these challenges. This network is based on YOLOv8 (You Only Look Once v8) and employs a dual-branch fusion structure that leverages the complementary features of infrared and visible light images for target detection. We designed a Bidirectional Pyramid Feature Fusion structure (Bi-Fusion) to effectively integrate multimodal features, reducing errors from feature redundancy and extracting fine-grained features for small object detection. Additionally, we developed a Shuffle-SPP structure that combines channel and spatial attention to enhance the focus on deep features and extract richer information through upsampling. Regarding model optimization, we designed a loss function tailored for multi-scale object detection, accelerating the convergence speed of the network during training. Compared with the current state-of-the-art Dual-YOLO model, IV-YOLO achieves mAP improvements of 2.8%, 1.1%, and 2.2% on the Drone Vehicle, FLIR, and KAIST datasets, respectively. On the Drone Vehicle and FLIR datasets, IV-YOLO has a parameter count of 4.31 M and achieves a frame rate of 203.2 fps, significantly outperforming YOLOv8n (5.92 M parameters, 188.6 fps on the Drone Vehicle dataset) and YOLO-FIR (7.1 M parameters, 83.3 fps on the FLIR dataset), which had previously achieved the best performance on these datasets. This demonstrates that IV-YOLO achieves higher real-time detection performance while maintaining lower parameter complexity, making it highly promising for applications in autonomous driving, public safety, and beyond.

10.
Sensors (Basel) ; 24(19)2024 Sep 25.
Article in English | MEDLINE | ID: mdl-39409249

ABSTRACT

Object detection, as a crucial aspect of computer vision, plays a vital role in traffic management, emergency response, autonomous vehicles, and smart cities. Despite the significant advancements in object detection, detecting small objects in images captured by high-altitude cameras remains challenging, due to factors such as object size, distance from the camera, varied shapes, and cluttered backgrounds. To address these challenges, we propose small object detection YOLOv8 (SOD-YOLOv8), a novel model specifically designed for scenarios involving numerous small objects. Inspired by efficient generalized feature pyramid networks (GFPNs), we enhance multi-path fusion within YOLOv8 to integrate features across different levels, preserving details from shallower layers and improving small object detection accuracy. Additionally, we introduce a fourth detection layer to effectively utilize high-resolution spatial information. The efficient multi-scale attention module (EMA) in the C2f-EMA module further enhances feature extraction by redistributing weights and prioritizing relevant features. We introduce powerful-IoU (PIoU) as a replacement for CIoU, focusing on moderate quality anchor boxes and adding a penalty based on differences between predicted and ground truth bounding box corners. This approach simplifies calculations, speeds up convergence, and enhances detection accuracy. SOD-YOLOv8 significantly improves small object detection, surpassing widely used models across various metrics, without substantially increasing the computational cost or latency compared to YOLOv8s. Specifically, it increased recall from 40.1% to 43.9%, precision from 51.2% to 53.9%, mAP0.5 from 40.6% to 45.1%, and mAP0.5:0.95 from 24% to 26.6%. Furthermore, experiments conducted in dynamic real-world traffic scenes illustrated SOD-YOLOv8's significant enhancements across diverse environmental conditions, highlighting its reliability and effective object detection capabilities in challenging scenarios.
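
The abstract describes PIoU as an IoU-style loss with a penalty on the offsets between predicted and ground-truth box corners. The sketch below shows a generic corner-penalized IoU loss in that spirit; it is a stand-in for illustration only, not the exact PIoU formulation from the paper.

```python
import torch

def iou_with_corner_penalty(pred, target, beta=1.0, eps=1e-7):
    """Illustrative IoU loss with a penalty on corner offsets between predicted
    and ground-truth boxes given as (x1, y1, x2, y2). Generic stand-in, not the
    exact PIoU from the paper."""
    # Intersection rectangle
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)

    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Corner penalty: mean absolute offset of both corners, normalised by target size
    wh = torch.stack([target[:, 2] - target[:, 0],
                      target[:, 3] - target[:, 1]], dim=1) + eps
    corner_err = ((pred[:, :2] - target[:, :2]).abs() / wh +
                  (pred[:, 2:] - target[:, 2:]).abs() / wh).mean(dim=1)

    return 1.0 - iou + beta * corner_err       # per-box loss
```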

11.
Sensors (Basel) ; 24(19)2024 Sep 25.
Article in English | MEDLINE | ID: mdl-39409242

ABSTRACT

Urban traffic congestion poses significant economic and environmental challenges worldwide. To mitigate these issues, Adaptive Traffic Signal Control (ATSC) has emerged as a promising solution. Recent advancements in deep reinforcement learning (DRL) have further enhanced ATSC's capabilities. This paper introduces a novel DRL-based ATSC approach named the Sequence Decision Transformer (SDT), employing DRL enhanced with attention mechanisms and leveraging the robust capabilities of sequence decision models, akin to those used in advanced natural language processing, adapted here to tackle the complexities of urban traffic management. Firstly, the ATSC problem is modeled as a Markov Decision Process (MDP), with the observation space, action space, and reward function carefully defined. Subsequently, we propose SDT, specifically tailored to solve the MDP problem. The SDT model uses a transformer-based architecture with an encoder and decoder in an actor-critic structure. The encoder processes observations and outputs both encoded data for the decoder and value estimates for parameter updates. The decoder, as the policy network, outputs the agent's actions. Proximal Policy Optimization (PPO) is used to update the policy network based on historical data, enhancing decision-making in ATSC. This approach significantly reduces training times, effectively manages larger observation spaces, captures dynamic changes in traffic conditions more accurately, and enhances traffic throughput. Finally, the SDT model is trained and evaluated in synthetic scenarios by comparing the number of vehicles, average speed, and queue length against three baselines: PPO, a DQN tailored for ATSC, and FRAP, a state-of-the-art ATSC algorithm. SDT shows improvements of 26.8%, 150%, and 21.7% over traditional ATSC algorithms, and 18%, 30%, and 15.6% over FRAP. This research underscores the potential of integrating Large Language Models (LLMs) with DRL for traffic management, offering a promising solution to urban congestion.
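
Since PPO drives the policy update, here is a compact sketch of the standard clipped surrogate objective with a value term and an optional entropy bonus; the coefficients are common defaults rather than the values used to train SDT.

```python
import torch
import torch.nn.functional as F

def ppo_loss(new_logp, old_logp, advantages, returns, values,
             clip_eps=0.2, value_coef=0.5, entropy=None, entropy_coef=0.01):
    """Clipped PPO surrogate used to update the policy (decoder) and value
    (encoder) heads; advantages are assumed to be pre-normalised."""
    ratio = torch.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()
    value_loss = F.mse_loss(values, returns)
    loss = policy_loss + value_coef * value_loss
    if entropy is not None:
        loss = loss - entropy_coef * entropy.mean()   # encourage exploration
    return loss
```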

12.
Sensors (Basel) ; 24(19)2024 Sep 26.
Article in English | MEDLINE | ID: mdl-39409277

ABSTRACT

The reliable operation of scroll compressors is crucial for the efficiency of rotating machinery and refrigeration systems. To address the need for efficient and accurate fault diagnosis in scroll compressor technology under varying operating states, diverse failure modes, and different operating conditions, a multi-branch convolutional neural network fault diagnosis method (SSG-Net) has been developed. This method is based on the Swin Transformer, the Global Attention Mechanism (GAM), and the ResNet architecture. Initially, the one-dimensional time-series signal is converted into a two-dimensional image using the Short-Time Fourier Transform, thereby enriching the feature set for deep learning analysis. Subsequently, the method integrates the window attention mechanism of the Swin Transformer, the 2D convolution of GAM attention, and the shallow ResNet's two-dimensional convolution feature extraction branch network. This integration further optimizes the feature extraction process, enhancing the accuracy of fault feature recognition and sensitivity to data variability. Consequently, by combining the global and local features extracted from these three branch networks, the model significantly improves feature representation capability and robustness. Finally, experimental results on scroll compressor datasets and the CWRU dataset demonstrate diagnostic accuracies of 97.44% and 99.78%, respectively. These results surpass existing comparative models and confirm the model's superior recognition precision and rapid convergence capabilities in complex fault environments.
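
A hedged sketch of the first step, converting a one-dimensional vibration signal into a two-dimensional log-magnitude time-frequency image with the Short-Time Fourier Transform, is given below; the sampling rate, window length, and overlap are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft

def vibration_to_image(signal: np.ndarray, fs: float = 12_000.0,
                       nperseg: int = 256, noverlap: int = 192) -> np.ndarray:
    """Convert a 1-D vibration signal into a 2-D log-magnitude time-frequency
    image for the CNN branches. Window length and overlap are illustrative."""
    f, t, Z = stft(signal, fs=fs, nperseg=nperseg, noverlap=noverlap)
    spec = 20 * np.log10(np.abs(Z) + 1e-8)             # dB scale
    # Min-max normalise to [0, 1] so it can be treated like a grayscale image
    spec = (spec - spec.min()) / (spec.max() - spec.min() + 1e-8)
    return spec                                         # (n_freq_bins, n_frames)
```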

13.
Sensors (Basel) ; 24(19)2024 Sep 30.
Article in English | MEDLINE | ID: mdl-39409368

ABSTRACT

Counting shrimp larvae is an essential part of shrimp farming. Due to the larvae's tiny size and high density, this task is exceedingly difficult. Thus, we introduce an algorithm for counting densely packed shrimp larvae utilizing an enhanced You Only Look Once version 5 (YOLOv5) model with a regional segmentation approach. First, the C2f and convolutional block attention modules are used to improve the capability of YOLOv5 to recognize small shrimp. Moreover, employing a regional segmentation technique decreases the receptive field area, thereby enhancing the shrimp counter's detection performance. Finally, a strategy for stitching and deduplication is implemented to tackle the problem of double counting across segments. The experimental findings indicate that the proposed algorithm surpasses several other shrimp counting techniques in terms of accuracy. Notably, for high-density shrimp larvae in large quantities, the algorithm attains an accuracy exceeding 98%.
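
The regional segmentation with stitching and deduplication can be sketched as follows: run the detector on overlapping tiles, shift the detected boxes back to full-image coordinates, and remove double counts in the overlap regions with non-maximum suppression. The detector interface, tile size, and thresholds are assumptions, not the paper's exact strategy.

```python
import torch
from torchvision.ops import nms

def count_with_tiles(image, detector, tile=640, overlap=64, iou_thr=0.5):
    """Tile the image, detect per tile, then deduplicate across tile borders.
    `detector` is assumed to return (boxes_xyxy, scores) for one tile."""
    h, w = image.shape[-2:]
    all_boxes, all_scores = [], []
    step = tile - overlap
    for y in range(0, max(h - overlap, 1), step):
        for x in range(0, max(w - overlap, 1), step):
            crop = image[..., y:y + tile, x:x + tile]
            boxes, scores = detector(crop)
            if boxes.numel():
                # Map tile-local boxes back to full-image coordinates
                boxes = boxes + torch.tensor([x, y, x, y], dtype=boxes.dtype)
                all_boxes.append(boxes)
                all_scores.append(scores)
    if not all_boxes:
        return 0
    boxes = torch.cat(all_boxes)
    scores = torch.cat(all_scores)
    keep = nms(boxes, scores, iou_thr)   # removes double counts in overlap regions
    return int(keep.numel())
```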


Subjects
Algorithms; Larva; Animals; Larva/physiology; Image Processing, Computer-Assisted/methods
14.
Sensors (Basel) ; 24(19)2024 Sep 30.
Article in English | MEDLINE | ID: mdl-39409398

ABSTRACT

Interactive image segmentation greatly accelerates the generation of high-quality annotated image datasets, which are the pillars of deep learning applications. However, existing methods suffer from weak use of interaction information and excessively high optimization costs, resulting in unexpected segmentation outcomes and increased computational burden. To address these issues, this paper focuses on mining interaction information from both the network architecture and the optimization procedure. In terms of network architecture, the issue arises from two sources: the weakly representative features of interactive regions in each layer, and interaction information that is weakened by the network hierarchy. Therefore, the paper proposes a network called EnNet. The network addresses these two issues by employing attention mechanisms to integrate user interaction information across the entire image and by incorporating interaction information twice in a coarse-to-fine design. In terms of optimization, this paper proposes using zero-order optimization during the first four iterations of training, which reduces computational overhead with only a minimal reduction in accuracy. The experimental results on the GrabCut, Berkeley, DAVIS, and SBD datasets validate the effectiveness of the proposed method, with our approach achieving an average NOC@90 that surpasses RITM by 0.35.

15.
Sensors (Basel) ; 24(19)2024 Oct 02.
Article in English | MEDLINE | ID: mdl-39409423

ABSTRACT

This paper introduces a lightweight flame detection algorithm that enhances the accuracy and speed of gas-flame state recognition in low-pressure environments using an improved YOLOv8n model. Firstly, GhostNet is integrated into the backbone to form the GhostConv module, reducing the model's computational parameters. Secondly, the C2f module is improved by integrating RepGhost, forming the C2f_RepGhost module, which performs deep convolution, extends feature dimensions, and simplifies the inference structure. Additionally, the CBAM attention mechanism is added to enhance the model's ability to capture fine-grained features of flames in both channel and spatial dimensions. The replacement of CIoU with WIoU improves the sensitivity and accuracy of the model's regression loss. Experimental results on a simulated dataset from the theoretical testbed indicate that, compared to the original model, the proposed improvements achieve good performance in low-pressure flame state detection. The model's parameter count is reduced by 12.64%, the total floating-point operations are reduced by 12.2%, and the detection accuracy is improved by 21.2%. Although the detection frame rate decreases slightly, it still meets real-time detection requirements. These results demonstrate the feasibility and effectiveness of the proposed algorithm.
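
For context, the Ghost convolution idea referenced above generates part of the output channels with a cheap depthwise operation instead of a full convolution; the PyTorch sketch below follows the generic GhostNet formulation, with hyperparameters chosen for illustration rather than matching the modified YOLOv8n.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost convolution: a primary conv produces half of the output channels,
    and a cheap depthwise conv generates the remaining 'ghost' channels."""
    def __init__(self, in_ch, out_ch, kernel=1, stride=1):
        super().__init__()
        primary = out_ch // 2
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary, kernel, stride, kernel // 2, bias=False),
            nn.BatchNorm2d(primary), nn.SiLU(inplace=True),
        )
        self.cheap = nn.Sequential(
            nn.Conv2d(primary, primary, 5, 1, 2, groups=primary, bias=False),
            nn.BatchNorm2d(primary), nn.SiLU(inplace=True),
        )

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)   # (B, out_ch, H, W)
```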

16.
Sensors (Basel) ; 24(19)2024 Oct 05.
Article in English | MEDLINE | ID: mdl-39409486

ABSTRACT

The rapid growth of 3D sensor technologies has made point cloud data increasingly available in applications such as autonomous driving, robotics, and virtual and augmented reality. This raises a growing need for deep learning methods to process the data. Point clouds are difficult to use directly as inputs to many deep learning techniques because of their unstructured and unordered nature, so machine learning models built for images or videos cannot be applied directly to point cloud data. Although research on point clouds has gained considerable attention and different methods have been developed over the past decade, very few works operate directly on point cloud data; most convert it into 2D images or voxels through pre-processing that causes information loss. Methods that work directly on point clouds are still at an early stage, which affects the performance and accuracy of the models. Advanced techniques from classical convolutional neural networks, such as the attention mechanism, need to be transferred to methods that work directly with point clouds. In this research, an attention mechanism is proposed for deep convolutional neural networks that process point clouds directly. The attention module is based on specific pooling operations designed to be applied directly to point clouds to extract vital information. Segmentation of the ShapeNet dataset was performed to evaluate the method. The mean intersection over union (mIoU) score of the proposed framework increased after applying the attention method, compared to a baseline state-of-the-art framework without the attention mechanism.
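
A minimal sketch of a pooling-based attention module for per-point features is given below: global max- and average-pooling over the point dimension feed a shared MLP whose output gates the feature channels. This is an illustrative interpretation under stated assumptions, not the exact module proposed in the paper.

```python
import torch
import torch.nn as nn

class PoolingPointAttention(nn.Module):
    """Channel re-weighting for per-point features using global max- and
    average-pooling over the point dimension; input assumed shaped (B, C, N)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, feats):
        avg = self.mlp(feats.mean(dim=2))              # (B, C) from average pooling
        mx = self.mlp(feats.amax(dim=2))               # (B, C) from max pooling
        gate = torch.sigmoid(avg + mx).unsqueeze(-1)   # (B, C, 1)
        return feats * gate                            # re-weighted per-point features
```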

17.
Sensors (Basel) ; 24(19)2024 Oct 09.
Article in English | MEDLINE | ID: mdl-39409535

ABSTRACT

Strip steel surface defect detection has become a crucial step in ensuring the quality of strip steel production. To address the low detection accuracy and long detection times of strip steel surface defect detection algorithms, caused by varying defect sizes and blurred images during acquisition, this paper proposes a lightweight strip steel surface defect detection network, YOLO-SDS, based on an improved YOLOv8. Firstly, StarNet is utilized to replace the backbone network of YOLOv8, achieving lightweight optimization while maintaining accuracy. Secondly, a lightweight module, DWR, is introduced into the neck and combined with the C2f feature extraction module to enhance the model's multi-scale feature extraction capability. Finally, an occlusion-aware attention mechanism, SEAM, is incorporated into the detection head, enabling the model to better capture and process features of occluded objects, thus improving performance in complex scenarios. Experimental results on the open-source NEU-DET dataset show that the improved model reduces parameters by 34.4% compared with the original YOLOv8 algorithm while increasing average detection accuracy by 1.5%. It also shows good generalization performance on the deepPCB dataset. Compared with other defect detection models, YOLO-SDS offers significant advantages in terms of parameter count and detection speed. Additionally, ablation experiments validate the effectiveness of each module.

18.
Radiol Med ; 2024 Oct 16.
Article in English | MEDLINE | ID: mdl-39412688

ABSTRACT

PURPOSE: To develop a contrastive language-image pretraining (CLIP) model based on transfer learning and combined with a self-attention mechanism to predict the tumor-stroma ratio (TSR) in pancreatic ductal adenocarcinoma (PDAC) on preoperative enhanced CT images, in order to understand the biological characteristics of tumors for risk stratification and to guide feature fusion during artificial intelligence-based model representation. MATERIAL AND METHODS: This retrospective study collected a total of 207 PDAC patients from three hospitals. TSR assessments were performed on surgical specimens by pathologists and divided into high-TSR and low-TSR groups. This study developed a novel CLIP-adapter model that integrates the CLIP paradigm with a self-attention mechanism to better utilize features from multi-phase imaging, thereby enhancing the accuracy and reliability of tumor-stroma ratio predictions. Additionally, clinical variables, a traditional radiomics model, and deep learning models (ResNet50, ResNet101, ViT_Base_32, ViT_Base_16) were constructed for comparison. RESULTS: The models showed significant efficacy in predicting TSR in PDAC. The performance of the CLIP-adapter model based on multi-phase feature fusion was superior to that based on any single phase (arterial or venous). The CLIP-adapter model outperformed traditional radiomics models and deep learning models, with CLIP-adapter_ViT_Base_32 performing the best, achieving the highest AUC (0.978) and accuracy (0.921) in the test set. Kaplan-Meier survival analysis showed longer overall survival in patients with low TSR compared to those with high TSR. CONCLUSION: The CLIP-adapter model designed in this study provides a safe and accurate method for predicting the TSR in PDAC. The feature fusion module based on multi-modal (image and text) and multi-phase (arterial and venous phase) features significantly improves model performance.

19.
Comput Methods Programs Biomed ; 257: 108454, 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39369585

ABSTRACT

BACKGROUND AND OBJECTIVE: Integrating domain knowledge into deep learning models can improve their effectiveness and increase explainability. This study aims to enhance the classification performance of electrocardiograms (ECGs) by customizing specific guided mechanisms based on the characteristics of different cardiac abnormalities. METHODS: Two novel guided attention mechanisms, Guided Spatial Attention (GSA) and the CAM-based spatial guided attention mechanism (CGAM), were introduced. Different attention guidance labels were created based on clinical knowledge for four ECG abnormality classification tasks: ST change detection, premature contraction identification, Wolff-Parkinson-White syndrome (WPW) classification, and atrial fibrillation (AF) detection. The models were trained and evaluated separately for each classification task. Model explainability was quantified using Shapley values. RESULTS: GSA improved the F1 score of the model by 5.74%, 5%, 8.96%, and 3.91% for ST change detection, premature contraction identification, WPW classification, and AF detection, respectively. Similarly, CGAM exhibited improvements of 3.89%, 5.40%, 8.21%, and 1.80% for the respective tasks. The combined use of GSA and CGAM resulted in even higher improvements of 6.26%, 5.58%, 8.85%, and 4.03%, respectively. Moreover, when all four tasks were conducted simultaneously, a notable overall performance boost was achieved, demonstrating the broad adaptability of the proposed model. The quantified Shapley values demonstrated the effectiveness of the guided attention mechanisms in enhancing the model's explainability. CONCLUSIONS: The guided attention mechanisms, utilizing domain knowledge, effectively directed the model's attention, leading to improved classification performance and explainability. These findings have significant implications in facilitating accurate automated ECG classification.
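
As a hedged sketch of how an attention map over an ECG feature sequence can be supervised with clinically derived guidance labels, the module below computes spatial attention weights and, when a guidance mask is available, adds a binary cross-entropy auxiliary loss on those weights; this is an illustrative formulation, not the exact GSA or CGAM objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedSpatialAttention(nn.Module):
    """Spatial attention over an ECG feature map (B, C, T) whose weights can be
    supervised by a clinically derived guidance mask (B, T), e.g. marking the
    ST segment. Auxiliary-loss formulation is an assumption for illustration."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv1d(channels, 1, kernel_size=7, padding=3)

    def forward(self, feats, guidance=None):
        attn = torch.sigmoid(self.score(feats)).squeeze(1)   # (B, T) attention weights
        out = feats * attn.unsqueeze(1)                       # re-weighted features
        aux_loss = None
        if guidance is not None:
            # Encourage the attention map to match the clinical guidance label
            aux_loss = F.binary_cross_entropy(attn, guidance.float())
        return out, aux_loss
```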

20.
BMC Med Res Methodol ; 24(1): 232, 2024 Oct 07.
Article in English | MEDLINE | ID: mdl-39375589

ABSTRACT

BACKGROUND: Postoperative pain is a prevalent symptom experienced by patients undergoing surgical procedures. This study aims to develop deep learning algorithms for predicting acute postoperative pain using both essential patient details and real-time vital sign data during surgery. METHODS: Through a retrospective observational approach, we utilized Graph Attention Networks (GAT) and Graph Transformer Networks (GTN) deep learning algorithms to construct the DoseFormer model while incorporating an attention mechanism. This model employed patient information and intraoperative vital signs obtained during video-assisted thoracoscopic surgery (VATS) to anticipate postoperative pain. By categorizing the static and dynamic data, the DoseFormer model performed binary classification to predict the likelihood of postoperative acute pain. RESULTS: A total of 1758 patients were initially included, with 1552 patients remaining after data cleaning. These patients were then divided into a training set (n = 931) and a testing set (n = 621). In the testing set, the DoseFormer model exhibited a significantly higher AUROC (0.98) compared to classical machine learning algorithms. Furthermore, the DoseFormer model displayed a significantly higher F1 value (0.85) in comparison to other classical machine learning algorithms. Notably, the anesthesiologists' F1 values (attending: 0.49, fellow: 0.43, resident: 0.16) were significantly lower than those of the DoseFormer model in predicting acute postoperative pain. CONCLUSIONS: A deep learning model can predict postoperative acute pain events based on patients' basic information and intraoperative vital signs.
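
For readers unfamiliar with the graph-attention building block behind DoseFormer, a minimal dense single-head GAT layer is sketched below; it follows the generic graph attention formulation (learned pairwise attention over neighbours defined by an adjacency matrix), with dimensions chosen for illustration rather than the DoseFormer configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Single-head graph attention layer operating on a dense adjacency matrix.
    Self-loops are assumed to be included in `adj` so every node has a neighbour."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)
        self.leaky = nn.LeakyReLU(0.2)

    def forward(self, x, adj):
        # x: (N, in_dim) node features; adj: (N, N) 0/1 adjacency
        h = self.W(x)                                        # (N, out_dim)
        n = h.size(0)
        h_i = h.unsqueeze(1).expand(n, n, -1)                # source node features
        h_j = h.unsqueeze(0).expand(n, n, -1)                # neighbour node features
        e = self.leaky(self.a(torch.cat([h_i, h_j], dim=-1))).squeeze(-1)  # (N, N)
        e = e.masked_fill(adj == 0, float("-inf"))           # attend only to neighbours
        alpha = F.softmax(e, dim=1)                          # attention coefficients
        return alpha @ h                                     # aggregated node features
```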


Subjects
Deep Learning; Pain, Postoperative; Thoracic Surgery, Video-Assisted; Humans; Thoracic Surgery, Video-Assisted/methods; Thoracic Surgery, Video-Assisted/adverse effects; Pain, Postoperative/etiology; Pain, Postoperative/diagnosis; Retrospective Studies; Female; Male; Middle Aged; Algorithms; Aged; Adult; Acute Pain/diagnosis; Acute Pain/etiology