Results 1 - 20 of 112
1.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39175132

ABSTRACT

Numerous studies have demonstrated that microRNAs (miRNAs) are critically important for the prediction, diagnosis, and characterization of diseases. However, identifying miRNA-disease associations through traditional biological experiments is both costly and time-consuming. To further explore these associations, we propose a model based on hybrid high-order moments combined with element-level attention mechanisms (HHOMR). The model fuses hybrid high-order statistical information with structural and community information. Specifically, we first construct a heterogeneous graph from known miRNA-disease associations. HHOMR employs a structural fusion layer to capture structure-level embeddings and a hybrid high-order moments encoder layer to enhance features. Element-level attention mechanisms then adaptively integrate the features of these hybrid moments. Finally, a multi-layer perceptron calculates the association scores between miRNAs and diseases. In five-fold cross-validation on HMDD v2.0, HHOMR achieved a mean AUC of 93.28% and outperformed four state-of-the-art models. Case studies were also conducted on three diseases: esophageal neoplasms, lymphoma, and prostate neoplasms. Among the top 50 miRNAs ranked by predicted association score for each disease, 46, 47, and 45, respectively, were confirmed by the dbDEMC and miR2Disease databases. These results show that HHOMR not only outperforms existing models but also has significant potential for predicting miRNA-disease associations.


Subject(s)
MicroRNAs , MicroRNAs/genetics , Humans , Computational Biology/methods , Genetic Predisposition to Disease , Algorithms , Prostatic Neoplasms/genetics , Models, Genetic
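The element-level attention step described in entry 1 can be illustrated with a small sketch. It assumes each moment encoder outputs a vector of the same length and that attention logits are supplied per element (`scores` is a hypothetical input; HHOMR learns these). Softmax is taken across moments independently at each element position before the weighted sum. The helper that computes first and second moments over neighbor embeddings is likewise illustrative, not HHOMR's encoder.

```python
import math
from typing import List

# Illustrative sketch only; not the HHOMR authors' implementation.

def element_level_fusion(features: List[List[float]], scores: List[List[float]]) -> List[float]:
    """Fuse K feature vectors of length d with per-element softmax attention.

    features[k][i] is element i of the k-th moment embedding;
    scores[k][i] is its attention logit (hypothetical; normally learned).
    """
    k_count, dim = len(features), len(features[0])
    fused = []
    for i in range(dim):
        logits = [scores[k][i] for k in range(k_count)]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        weights = [e / total for e in exps]  # softmax across moments at element i
        fused.append(sum(w * features[k][i] for k, w in enumerate(weights)))
    return fused

def first_and_second_moments(vectors: List[List[float]]) -> List[List[float]]:
    """Element-wise mean and (population) variance over neighbor embeddings."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
    var = [sum((v[i] - mean[i]) ** 2 for v in vectors) / n for i in range(dim)]
    return [mean, var]

if __name__ == "__main__":
    moments = first_and_second_moments([[1.0, 2.0], [3.0, 2.0]])  # [[2.0, 2.0], [1.0, 0.0]]
    print(element_level_fusion(moments, [[0.0, 0.0], [0.0, 0.0]]))  # [1.5, 1.0]
```

With equal logits every moment contributes equally; learned logits let each element position favor a different moment.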
2.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38701417

ABSTRACT

Transcription factors (TFs) are proteins that regulate gene transcription by binding to transcription factor binding sites (TFBSs) in DNA sequences. Accurate prediction of TFBSs can contribute to the design and construction of TF-based metabolic regulatory systems. Although various deep-learning algorithms have been developed for predicting TFBSs, their prediction performance still needs improvement. This paper proposes a model based on bidirectional encoder representations from transformers (BERT), called BERT-TFBS, that predicts TFBSs solely from DNA sequences. The model consists of a pre-trained BERT module (DNABERT-2), a convolutional neural network (CNN) module, a convolutional block attention module (CBAM), and an output module. BERT-TFBS uses the pre-trained DNABERT-2 module to capture complex long-range dependencies in DNA sequences through transfer learning, and applies the CNN module and the CBAM to extract high-order local features. The model is trained and tested on 165 ENCODE ChIP-seq datasets. We conducted experiments with model variants, cross-cell-line validations, and comparisons with other models. The experimental results demonstrate the effectiveness and generalization capability of BERT-TFBS in predicting TFBSs and show that it outperforms other deep-learning models. The source code for BERT-TFBS is available at https://github.com/ZX1998-12/BERT-TFBS.


Subject(s)
Neural Networks, Computer , Transcription Factors , Transcription Factors/metabolism , Transcription Factors/genetics , Binding Sites , Algorithms , Computational Biology/methods , Humans , Deep Learning , Protein Binding
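To show what the CNN module in entry 2 computes at its simplest, here is a sketch of a one-dimensional convolution scanning a DNA sequence. BERT-TFBS convolves DNABERT-2 embeddings rather than one-hot bases, so this assumes one-hot input and a single hand-built filter; the motif `TGA` and all helper names are hypothetical.

```python
from typing import Dict, List

# Illustrative sketch: one-hot DNA input, one hand-built filter (assumptions).
BASES = "ACGT"

def one_hot(seq: str) -> List[List[float]]:
    """Encode a DNA string as a length x 4 one-hot matrix (columns A, C, G, T).

    Raises KeyError for symbols outside ACGT (e.g. N), which this sketch does not handle.
    """
    index: Dict[str, int] = {b: i for i, b in enumerate(BASES)}
    return [[1.0 if index[ch] == j else 0.0 for j in range(4)] for ch in seq.upper()]

def conv1d_valid(x: List[List[float]], kernel: List[List[float]]) -> List[float]:
    """Slide a width-w kernel (w x 4) along the sequence with no padding."""
    w = len(kernel)
    return [
        sum(x[pos + k][c] * kernel[k][c] for k in range(w) for c in range(4))
        for pos in range(len(x) - w + 1)
    ]

def motif_filter(motif: str) -> List[List[float]]:
    """Hypothetical filter scoring +1 for every base that matches the motif."""
    return one_hot(motif)

if __name__ == "__main__":
    scores = conv1d_valid(one_hot("ACTGAT"), motif_filter("TGA"))
    print(scores)  # [0.0, 0.0, 3.0, 0.0]
```

A learned CNN has many such filters with real-valued weights; max-pooling over positions then reports whether, not where, a pattern occurred.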
3.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38446739

ABSTRACT

Antimicrobial peptides (AMPs), short peptides with diverse functions, effectively target and combat various organisms. The widespread misuse of chemical antibiotics has led to increasing microbial resistance. Because of their low drug resistance and toxicity, AMPs are considered promising substitutes for traditional antibiotics. Although existing deep learning technology enhances AMP generation, it also presents certain challenges. First, AMP generation overlooks the complex interdependencies among amino acids. Second, current models fail to integrate crucial tasks such as screening, attribute prediction, and iterative optimization. We therefore develop an integrated deep learning framework, Diff-AMP, that automates AMP generation, identification, attribute prediction, and iterative optimization. We integrate kinetic diffusion and attention mechanisms into a reinforcement learning framework for efficient AMP generation. Our prediction module incorporates pre-training and transfer learning strategies for precise AMP identification and screening. We employ a convolutional neural network for multi-attribute prediction and a reinforcement learning-based iterative optimization strategy to produce diverse AMPs. By automating molecule generation, screening, attribute prediction, and optimization, this framework advances AMP research. We have also deployed Diff-AMP on a web server; code, data, and server details are available in the Data Availability section.


Subject(s)
Amino Acids , Antimicrobial Peptides , Anti-Bacterial Agents , Diffusion , Kinetics
4.
BMC Bioinformatics ; 25(1): 250, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39080535

ABSTRACT

BACKGROUND: Drug combination synergy offers significant potential benefits in cancer medicine, but the risks must be carefully managed because of possible increased toxicity. Although artificial intelligence applications have shown notable success in predicting drug combination synergy, several key challenges persist: (1) existing models often predict average synergy values across a restricted range of tested doses, neglecting crucial dose amounts and the mechanisms of action of the drugs involved; (2) many graph-based models rely on static protein-protein interactions and fail to adapt to dynamic and higher-order relationships. These limitations constrain the applicability of current methods. RESULTS: We introduce SAFER, a Sub-hypergraph Attention-based graph model that addresses these issues by incorporating complex relationships among biological knowledge networks and considering dosing effects on subject-specific networks. SAFER outperformed previous models on the benchmark and on the independent test set. Analysis of the subgraph attention weights for the lung cancer cell line highlighted the JAK-STAT signaling pathway, PRDM12, ZNF781, and CDC5L, which have been implicated in lung fibrosis. CONCLUSIONS: SAFER is an interpretable framework designed to identify drug-responsive signals. Tailored to dose effects in subject-specific molecular contexts, the model uniquely captures dose-level drug combination responses, opening avenues of investigation that were inaccessible to earlier models. Future studies can also use the SAFER framework to investigate molecular networks that uniquely characterize individual patients and to prioritize personalized, effective treatments based on safe dose combinations.


Subject(s)
Neural Networks, Computer , Humans , Cell Line, Tumor , Drug Synergism , Lung Neoplasms/drug therapy , Lung Neoplasms/metabolism , Dose-Response Relationship, Drug , Signal Transduction/drug effects , Antineoplastic Agents/pharmacology
5.
Brief Bioinform ; 23(6)2022 Nov 19.
Article in English | MEDLINE | ID: mdl-36242566

ABSTRACT

MOTIVATION: Discovering drug-target interactions (DTIs) is a crucial step in drug development, for example in identifying drug side effects and in drug repositioning. Because identifying DTIs through wet-lab experiments is time-consuming and costly, many computational approaches have been proposed and have become an efficient means of inferring potential interactions. Despite extensive effort on this task, prediction accuracy still needs improvement. In particular, heterogeneous network-based approaches do not fully consider the complex structure and rich semantic information of these networks, so predicting DTIs efficiently remains a challenge. RESULTS: In this study, we develop a novel method that uses Multiview heterogeneous information network embedding with Hierarchical Attention mechanisms to discover potential Drug-Target Interactions (MHADTI). First, MHADTI constructs different similarity networks for drugs and targets from their multisource information. Combined with the known DTI network, these yield three drug-target heterogeneous information networks (HINs) with different views. Second, MHADTI learns embeddings of drugs and targets from the multiview HINs with hierarchical attention mechanisms comprising node-level, semantic-level, and graph-level attention. Finally, MHADTI employs a multilayer perceptron to predict DTIs from the learned deep feature representations. The hierarchical attention mechanisms account for the importance of nodes, meta-paths, and graphs when learning the feature representations of drugs and targets, which makes their embeddings more comprehensive. Extensive experimental results demonstrate that MHADTI performs better than other state-of-the-art (SOTA) prediction models. Moreover, analysis of the predictions for selected drugs and targets of interest further indicates that MHADTI has advantages in discovering DTIs.
AVAILABILITY AND IMPLEMENTATION: https://github.com/pxystudy/MHADTI.


Subject(s)
Drug Repositioning , Neural Networks, Computer , Drug Interactions , Drug Development , Information Services
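Semantic-level attention, one of the three attention levels listed in entry 5, weights embeddings obtained under different meta-paths. The sketch below assumes the formulation popularized by the heterogeneous graph attention network (HAN), in which each meta-path is scored by averaging q . tanh(Wz + b) over nodes; the abstract does not say whether MHADTI uses exactly this form, and `w`, `b`, `q` stand in for learned parameters.

```python
import math
from typing import List, Tuple

# Illustrative sketch assuming HAN-style semantic attention; not MHADTI's code.
Vector = List[float]
Matrix = List[List[float]]

def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for small dense lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def semantic_attention(
    path_embeddings: List[List[Vector]], w: Matrix, b: Vector, q: Vector
) -> Tuple[List[Vector], List[float]]:
    """Fuse per-meta-path node embeddings with one softmax weight per meta-path.

    path_embeddings[p][n] is the embedding of node n under meta-path p.
    Each meta-path is scored by the mean over nodes of q . tanh(W z + b).
    """
    raw = []
    for nodes in path_embeddings:
        total = 0.0
        for z in nodes:
            h = [math.tanh(x + bias) for x, bias in zip(matvec(w, z), b)]
            total += sum(qi * hi for qi, hi in zip(q, h))
        raw.append(total / len(nodes))
    m = max(raw)  # subtract the max before exponentiating for stability
    exps = [math.exp(s - m) for s in raw]
    norm = sum(exps)
    betas = [e / norm for e in exps]
    n_nodes, dim = len(path_embeddings[0]), len(path_embeddings[0][0])
    fused = [
        [sum(beta * path_embeddings[p][n][d] for p, beta in enumerate(betas)) for d in range(dim)]
        for n in range(n_nodes)
    ]
    return fused, betas
```

Unlike element-level attention, a single weight per meta-path is shared by every node, which makes the learned betas directly readable as meta-path importance.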
6.
Biomed Eng Online ; 23(1): 76, 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39085884

ABSTRACT

BACKGROUND: Transcranial sonography (TCS) plays a crucial role in diagnosing Parkinson's disease (PD). However, the intricate nature of TCS pathological features, the lack of consistent diagnostic criteria, and the dependence on physicians' expertise can hinder accurate diagnosis. Current TCS-based diagnostic methods, which rely on machine learning, often involve complex feature engineering and may struggle to capture deep image features. Although deep learning offers advantages in image processing, it has not been tailored to the specific characteristics of TCS and movement disorders. Consequently, research on deep learning algorithms for TCS-based PD diagnosis is scarce. METHODS: This study introduces AMSNet, a deep residual network augmented with attention mechanisms and multi-scale feature extraction, to assist in accurate diagnosis. First, a multi-scale feature extraction module robustly handles the irregular morphological features and significant area information in TCS images while mitigating the effects of artifacts and noise. Combined with a convolutional attention module, it enhances the model's ability to learn features of lesion areas. Next, a residual network architecture integrated with channel attention captures hierarchical and detailed textures within the images, further enhancing the model's feature representation. RESULTS: The study compiled TCS images and personal data from 1109 participants. On this dataset, AMSNet achieved a classification accuracy of 92.79%, precision of 95.42%, and specificity of 93.1%, surpassing both previously employed machine learning algorithms in this domain and current general-purpose deep learning models.
CONCLUSION: Unlike traditional machine learning approaches that require intricate feature engineering, the proposed AMSNet automatically extracts and learns deep pathological features and can model complex data. This underscores the substantial potential of deep learning methods for diagnosing movement disorders from TCS images.


Subject(s)
Deep Learning , Image Processing, Computer-Assisted , Parkinson Disease , Ultrasonography, Doppler, Transcranial , Humans , Parkinson Disease/diagnostic imaging , Image Processing, Computer-Assisted/methods , Ultrasonography, Doppler, Transcranial/methods
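Entry 6 mentions a residual network with channel attention without specifying its form. A common choice is squeeze-and-excitation (SE), sketched here as an assumption: global average pooling per channel, a bottleneck of two fully connected layers (ReLU, then sigmoid), and channel-wise rescaling. Biases are omitted, and the weight matrices are hypothetical inputs.

```python
import math
from typing import List

# Illustrative squeeze-and-excitation sketch; AMSNet's exact module is not specified.
FeatureMap = List[List[List[float]]]  # channels x height x width

def squeeze_excitation(x: FeatureMap, w1: List[List[float]], w2: List[List[float]]) -> FeatureMap:
    """SE channel attention without biases.

    w1 has shape (C/r, C) and w2 has shape (C, C/r), where r is the reduction ratio.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    s = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: bottleneck FC + ReLU, then FC + sigmoid gives one gate per channel.
    z = [max(0.0, sum(w * v for w, v in zip(row, s))) for row in w1]
    a = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, z)))) for row in w2]
    # Scale: multiply every spatial value of channel c by its gate a[c].
    return [[[a[c] * v for v in row] for row in ch] for c, ch in enumerate(x)]
```

Each gate lies in (0, 1), so the module can only attenuate channels relative to one another; in a residual block the output is typically added back to the identity path.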
7.
BMC Med Inform Decis Mak ; 24(1): 19, 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38247009

ABSTRACT

BACKGROUND: In clinical medicine, fetal heart rate (FHR) monitoring using cardiotocography (CTG) is one of the most commonly used methods for assessing fetal acidosis. However, because visual interpretation of CTG depends on the clinician's subjective judgment, it suffers from high inter-observer and intra-observer variability, which motivates automated diagnostic techniques. METHODS: In this study, we propose a computer-aided diagnostic algorithm for fetal acidosis (Hybrid-FHR) to help physicians make objective decisions and intervene in time. Hybrid-FHR uses multi-modal features: one-dimensional FHR signals and three types of expert features designed from prior knowledge (morphological time-domain, frequency-domain, and nonlinear). To extract a spatiotemporal feature representation of the one-dimensional FHR signals, we designed a multi-scale squeeze-and-excitation temporal convolutional network (SE-TCN) backbone based on dilated causal convolution, which captures long-term dependencies in FHR signals by expanding the receptive field of each layer's convolution kernel while keeping the parameter count relatively small. In addition, we propose a cross-modal feature fusion (CMFF) method that uses multi-head attention to explore relationships between modalities, obtaining more informative feature representations and improving diagnostic accuracy. RESULTS: Our ablation experiments show that Hybrid-FHR outperforms previous methods, with average accuracy, specificity, sensitivity, precision, and F1 score of 96.8%, 97.5%, 96%, 97.5%, and 96.7%, respectively. CONCLUSIONS: Our algorithm enables automated CTG analysis, helping healthcare professionals identify fetal acidosis early and implement interventions promptly.


Subject(s)
Acidosis , Fetal Diseases , Female , Pregnancy , Humans , Acidosis/diagnosis , Algorithms , Cardiotocography , Decision Making , Artificial Intelligence
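The receptive-field argument in entry 7 can be made concrete. For stacked dilated causal convolutions with kernel size k and dilations d_1, ..., d_L, one output sees R = 1 + (k - 1)(d_1 + ... + d_L) samples (the current one and those before it), so doubling dilations (1, 2, 4, ...) grows R exponentially with depth while each layer keeps only k weights. This minimal sketch implements one such layer with implicit zero left-padding; it is not the SE-TCN backbone itself.

```python
from typing import List, Sequence

# Illustrative sketch of a dilated causal convolution; not the Hybrid-FHR code.

def dilated_causal_conv(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """y[t] = sum_i kernel[i] * x[t - i * dilation], treating x[t] = 0 for t < 0.

    The output has the same length as the input and never uses future samples.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            src = t - i * dilation
            if src >= 0:
                acc += w * x[src]
        y.append(acc)
    return y

def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input samples (current plus past) that one output of the stack depends on."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

For example, four layers with k = 3 and dilations 1, 2, 4, 8 give `receptive_field(3, [1, 2, 4, 8])` = 31; feeding a unit impulse through such a stack produces nonzero outputs exactly at t = 0, ..., 30.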
8.
Sensors (Basel) ; 24(1)2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38203134

ABSTRACT

In ocean remote sensing missions, recognizing underwater acoustic targets is a crucial technology for marine biological surveys, ocean exploration, and other scientific activities conducted in water. Complex acoustic propagation characteristics pose significant challenges for underwater acoustic target recognition (UATR). Proposed methods include extracting the DEMON spectrum of a signal and feeding it into an artificial neural network, and fusing multidimensional signal features for recognition. However, there is still room for improvement in noise immunity, computational performance, and reliance on specialized knowledge. In this article, we propose the Residual Attentional Convolutional Neural Network (RACNN), a convolutional neural network that quickly and accurately recognizes the type of ship-radiated noise. The network extracts internal features from the Mel Frequency Cepstral Coefficients (MFCC) of underwater ship-radiated noise. Experimental results show that the proposed model achieves an overall accuracy of 99.34% on the ShipsEar dataset, surpassing conventional recognition methods and other deep learning models.
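MFCC extraction, whose output RACNN takes as input in entry 8, ends with two steps that are easy to sketch: placing triangular filters at frequencies equally spaced on the mel scale, and applying a DCT-II to the log filterbank energies. This assumes the widely used HTK-style mel formula m = 2595 log10(1 + f/700); framing, windowing, and the FFT are omitted, and the article does not give its exact MFCC settings.

```python
import math
from typing import List

# Illustrative sketch of the mel-scale and cepstral steps of MFCC extraction.

def hz_to_mel(f: float) -> float:
    """HTK-style mel scale (an assumption; other variants exist)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(n_filters: int, f_min: float, f_max: float) -> List[float]:
    """n_filters + 2 frequencies in Hz, equally spaced on the mel scale.

    Triangular filter j spans edges[j]..edges[j + 2] and peaks at edges[j + 1].
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]

def cepstral_coefficients(log_energies: List[float], n_coeffs: int) -> List[float]:
    """Unnormalized DCT-II of log filterbank energies, giving the first n_coeffs MFCCs."""
    n = len(log_energies)
    return [
        sum(e * math.cos(math.pi * k * (j + 0.5) / n) for j, e in enumerate(log_energies))
        for k in range(n_coeffs)
    ]
```

The mel spacing packs filters densely at low frequencies and sparsely at high ones; the DCT then decorrelates neighboring band energies, so a flat spectrum yields energy only in coefficient 0.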

9.
Sensors (Basel) ; 24(8)2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38676273

ABSTRACT

Deep neural networks must address the dual challenge of delivering high-accuracy predictions and providing user-friendly explanations. Although deep models are widely used for time series modeling, deciphering the core principles that govern their outputs remains a significant challenge. Doing so is crucial for developing trusted models and enabling domain expert validation, so that users and domain experts can use these models confidently in high-risk decision-making contexts (e.g., decision-support systems in healthcare). In this work, we propose a deep prototype learning model that supports interpretable and manipulable modeling and classification of medical time series (here, ECG signals). Specifically, we first optimize the representation of single-heartbeat data using a bidirectional long short-term memory network with an attention mechanism, and then construct prototypes during training. The final classification (normal sinus rhythm, atrial fibrillation, or other rhythm) is determined by comparing the input with the learned prototypes. The model also provides a human-machine collaboration mechanism that lets domain experts refine the prototypes with their expertise to further improve performance. Unlike the human-in-the-loop paradigm, in which humans mainly act as supervisors or correctors who intervene when required, our approach treats humans and the machine as partners, enabling more fluid and integrated interaction.
The experiments show that on the binary classification task (normal sinus rhythm vs. atrial fibrillation), the proposed model performs marginally below some established baselines, such as Convolutional Neural Networks (CNNs) and bidirectional long short-term memory networks with attention (Bi-LSTMAttns), but clearly surpasses other state-of-the-art prototype baselines. On the three-class task (normal sinus rhythm, atrial fibrillation, and other rhythms), it performs significantly better than these prototype baselines, achieving a prediction accuracy of 0.8414 and macro precision, recall, and F1-score of 0.8449, 0.8224, and 0.8235, respectively, thus combining high classification accuracy with good interpretability.


Subject(s)
Electrocardiography , Neural Networks, Computer , Humans , Electrocardiography/methods , Atrial Fibrillation/physiopathology , Atrial Fibrillation/diagnosis , Deep Learning , Heart Rate/physiology , Algorithms , Signal Processing, Computer-Assisted
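Classification by comparison with prototypes, as described in entry 9, can be sketched as follows. The sketch assumes embeddings are already computed (by the Bi-LSTM with attention in the paper) and, as an assumption, scores each class with the log((d + 1)/(d + eps)) similarity used by earlier prototype networks; the labels and vectors in the test are hypothetical. Because experts refine prototypes directly, editing the prototype dictionary changes predictions without retraining.

```python
import math
from typing import Dict, List, Tuple

# Illustrative nearest-prototype sketch; not the paper's model.

def squared_distance(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify_by_prototypes(
    embedding: List[float], prototypes: Dict[str, List[List[float]]]
) -> Tuple[str, Dict[str, float]]:
    """Return the label of the closest prototype and a per-class similarity score.

    The similarity log((d + 1) / (d + eps)) decreases monotonically with distance d,
    so the highest-scoring class is the one owning the nearest prototype.
    """
    eps = 1e-4
    scores = {}
    for label, protos in prototypes.items():
        d = min(squared_distance(embedding, p) for p in protos)
        scores[label] = math.log((d + 1.0) / (d + eps))
    best = max(scores, key=scores.get)
    return best, scores
```

The per-class scores double as an explanation: each prediction can be traced to a specific stored prototype that a clinician can inspect or replace.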
10.
Sensors (Basel) ; 24(18)2024 Sep 11.
Article in English | MEDLINE | ID: mdl-39338650

ABSTRACT

With the rapid advancement of intelligent manufacturing technologies, the operating environments of modern robotic arms are becoming increasingly complex: objects are diverse, and the foreground is often highly similar to the background. Although traditional RGB-based object-detection models have achieved remarkable success in many fields, they still struggle to detect targets whose textures resemble the background. To address this issue, we introduce the WoodenCube dataset, which contains over 5000 images of 10 different types of blocks. All images are densely annotated with object-level categories, bounding boxes, and rotation angles. We also propose a new evaluation metric, Cube-mAP, to assess detection performance on cube-like objects more accurately. In addition, we develop a simple yet effective framework for WoodenCube, termed CS-SKNet, which captures strong texture features in the scene by enlarging the network's receptive field. The experimental results indicate that CS-SKNet achieves the best performance on the WoodenCube dataset under the Cube-mAP metric. We further evaluate CS-SKNet on the challenging DOTAv1.0 dataset, where consistent improvements demonstrate its strong generalization capability.

11.
Sensors (Basel) ; 24(18)2024 Sep 21.
Article in English | MEDLINE | ID: mdl-39338855

ABSTRACT

Accurate crop disease classification is crucial for ensuring food security and enhancing agricultural productivity. However, existing crop disease classification algorithms primarily focus on a single image modality and typically require a large number of samples. Our research addresses these issues by using pre-trained Vision-Language Models (VLMs), whose multimodal synergy yields better crop disease classification than traditional unimodal approaches. First, we apply the multimodal model Qwen-VL to generate detailed textual descriptions of representative disease images selected from the training set through clustering; these descriptions serve as prompt text for generating classifier weights. Compared with using a language model alone to generate prompt text, this approach better captures and conveys fine-grained, image-specific information, thereby improving prompt quality. Second, we integrate cross-attention and SE (Squeeze-and-Excitation) attention into the training-free mode VLCD (Vision-Language model for Crop Disease classification) and the training-required mode VLCD-T (VLCD-Training), respectively, to process the prompt text, enhancing the classifier weights by emphasizing key text features. The experimental results demonstrate the improved classification performance of our method in few-shot crop disease scenarios, addressing data limitations and the difficulty of recognizing intricate diseases. The method offers a practical tool for agricultural pathology and strengthens smart-farming surveillance infrastructure.


Subject(s)
Algorithms , Crops, Agricultural , Plant Diseases , Image Processing, Computer-Assisted/methods
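The training-free mode in entry 11 turns prompt text into classifier weights. A minimal sketch, assuming CLIP-style zero-shot classification: each class weight is the renormalized mean of its prompt-text embeddings, and an image is assigned to the class with the highest cosine similarity. The embeddings here are placeholder vectors, not Qwen-VL or VLM outputs, and the cross-attention and SE enhancements are not shown.

```python
import math
from typing import Dict, List

# Illustrative zero-shot classifier sketch; embeddings are hypothetical placeholders.

def normalize(v: List[float]) -> List[float]:
    """Scale a nonzero vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def build_classifier(text_embeddings: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """One weight vector per class: the renormalized mean of its prompt-text embeddings."""
    weights = {}
    for label, embs in text_embeddings.items():
        unit = [normalize(e) for e in embs]
        mean = [sum(col) / len(unit) for col in zip(*unit)]
        weights[label] = normalize(mean)
    return weights

def predict(image_embedding: List[float], weights: Dict[str, List[float]]) -> str:
    """Return the class whose weight vector has the highest cosine similarity to the image."""
    img = normalize(image_embedding)
    return max(weights, key=lambda c: sum(a * b for a, b in zip(img, weights[c])))
```

No parameters are trained here: better prompt text changes the class weights, and therefore the predictions, directly.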
12.
J Xray Sci Technol ; 2024 Sep 11.
Article in English | MEDLINE | ID: mdl-39269816

ABSTRACT

BACKGROUND: Content-based image retrieval (CBIR) systems are vital for managing the large volumes of data produced by medical imaging technologies. They enable efficient retrieval of relevant medical images from extensive databases, supporting clinical diagnosis, treatment planning, and medical research. OBJECTIVE: This study aims to make CBIR systems more effective for medical image analysis by introducing VisualSift Ensembling Integration with Attention Mechanisms (VEIAM). VEIAM seeks to improve diagnostic accuracy and retrieval efficiency by integrating robust feature extraction with dynamic attention mechanisms. METHODS: VEIAM combines the Scale-Invariant Feature Transform (SIFT) with selective attention mechanisms that dynamically emphasize crucial regions within medical images. Implemented in Python, the model integrates into existing medical image analysis workflows, providing a robust and accessible tool for clinicians and researchers. RESULTS: The proposed VEIAM model achieved an accuracy of 97.34% in classifying and retrieving medical images, indicating its ability to discern subtle patterns and textures critical for accurate diagnosis. CONCLUSIONS: By merging SIFT-based feature extraction with attention processes, VEIAM offers a highly discriminative approach to medical image analysis. Its high accuracy and efficiency in retrieving relevant medical images make it a promising tool for enhancing diagnostic processes and supporting medical research in CBIR systems.

13.
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi ; 41(3): 544-551, 2024 Jun 25.
Article in Chinese | MEDLINE | ID: mdl-38932541

ABSTRACT

Skin cancer is a significant public health issue, and computer-aided diagnosis technology can effectively alleviate this burden. Accurate identification of skin lesion types is crucial in computer-aided diagnosis. This study proposes a multi-level attention cascaded fusion model based on Swin-T and ConvNeXt. Hierarchical Swin-T and ConvNeXt extract global and local features, respectively, and residual channel attention and spatial attention modules are introduced for further feature extraction. Multi-level attention mechanisms process the multi-scale global and local features. To address the loss of shallow features caused by their distance from the classifier, a hierarchical inverted residual fusion module is proposed to dynamically adjust the extracted feature information. Balanced sampling strategies and focal loss are employed to tackle class imbalance among skin lesion categories. On the ISIC2018 dataset, the method achieved accuracy, precision, recall, and F1-score of 96.01%, 93.67%, 92.65%, and 93.11%, respectively; on ISIC2019, the corresponding values were 92.79%, 91.52%, 88.90%, and 90.15%. Compared with Swin-T, accuracy improved by 3.60% and 1.66% on ISIC2018 and ISIC2019, respectively; compared with ConvNeXt, it improved by 2.87% and 3.45%. The experiments demonstrate that the proposed method accurately classifies skin lesion images, providing a new solution for skin cancer diagnosis.


Subject(s)
Algorithms , Diagnosis, Computer-Assisted , Skin Neoplasms , Humans , Skin Neoplasms/pathology , Skin Neoplasms/diagnostic imaging , Skin Neoplasms/classification , Diagnosis, Computer-Assisted/methods , Skin/pathology , Image Interpretation, Computer-Assisted/methods
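Entry 13 uses focal loss to counter class imbalance. The standard form is FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), which reduces to cross-entropy when gamma = 0 and down-weights well-classified samples as gamma grows. The sketch below computes it for one sample from raw logits; the default gamma = 2 and the optional per-class alpha are common choices, not values reported by the authors.

```python
import math
from typing import List, Optional

# Illustrative multi-class focal loss for a single sample.

def focal_loss(
    logits: List[float], target: int, gamma: float = 2.0, alpha: Optional[List[float]] = None
) -> float:
    """-alpha_t * (1 - p_t)**gamma * log(p_t), with p_t the softmax probability of the target."""
    m = max(logits)  # shift logits for a numerically stable softmax
    exps = [math.exp(z - m) for z in logits]
    p_t = exps[target] / sum(exps)
    alpha_t = 1.0 if alpha is None else alpha[target]
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

For a confidently correct sample (logits [5, 0], target 0), the focal term shrinks the loss to a tiny fraction of cross-entropy, whereas a misclassified sample keeps almost its full loss; this is what shifts training effort toward rare or hard lesion classes.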
14.
Sensors (Basel) ; 23(5)2023 Feb 21.
Article in English | MEDLINE | ID: mdl-36904579

ABSTRACT

Speech enhancement for audio with a low signal-to-noise ratio (SNR) is challenging. Existing speech enhancement methods are mainly designed for high-SNR audio and usually use RNNs to model audio sequence features; RNNs struggle to learn long-distance dependencies, which limits performance in low-SNR speech enhancement. To overcome this problem, we design a complex transformer module with sparse attention. Unlike the traditional transformer, this model is extended to model complex-domain sequences effectively. It uses a sparse attention mask to balance the model's attention between long-distance and nearby relations, introduces a pre-layer positional embedding module to enhance the model's perception of positional information, and adds a channel attention module so that the model can dynamically adjust the weight distribution across channels according to the input audio. The experimental results show that, in low-SNR speech enhancement tests, our models achieve noticeable improvements in both speech quality and intelligibility.


Subject(s)
Speech Perception , Speech , Cognition , Learning
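Entry 14 balances long-distance and nearby attention with a sparse mask but does not describe the pattern. As an assumption, the sketch uses a common local-window-plus-strided pattern: query i may attend to keys within `window` positions and to every `stride`-th key. Masked positions receive zero weight in the softmax. Complex-valued inputs are not handled.

```python
import math
from typing import List

# Illustrative sparse attention mask; the paper's actual pattern is not specified.

def sparse_attention_mask(seq_len: int, window: int, stride: int) -> List[List[bool]]:
    """mask[i][j] is True if query i may attend to key j.

    Allowed: nearby keys (|i - j| <= window) plus every stride-th key for long-range context.
    """
    return [
        [abs(i - j) <= window or j % stride == 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]

def masked_softmax(scores: List[float], allowed: List[bool]) -> List[float]:
    """Softmax over allowed positions only; masked positions get weight 0.

    Assumes at least one position is allowed (the diagonal always is when window >= 0).
    """
    m = max(s for s, ok in zip(scores, allowed) if ok)
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]
```

Each row keeps roughly 2 * window + 1 + seq_len / stride keys instead of seq_len, which is what lets such a model reach distant frames without the full quadratic attention cost.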
15.
Sensors (Basel) ; 23(15)2023 Aug 07.
Article in English | MEDLINE | ID: mdl-37571773

ABSTRACT

Images captured under complex conditions are frequently of low quality; in particular, images captured in low light are poor and unsuitable for subsequent engineering processing. Low-light image enhancement aims to restore such images to normal illumination levels. Although many methods have emerged in this field, they handle noise, color deviation, and exposure issues inadequately. To address these issues, we present CGAAN, a new unsupervised generative adversarial network built on cycle-consistent generative adversarial networks. It combines a new attention module with a new normalization function and employs a global-local discriminator trained with unpaired low-light and normal-light images, together with a stylized region loss. Our attention module generates feature maps via global and average pooling, and the weights of the different feature maps are calculated by multiplying learnable parameters and feature maps in the appropriate order. These weights indicate the significance of the corresponding features. Specifically, our attention is a feature-map attention mechanism that improves the network's feature-extraction ability: by distinguishing the normal-light domain from the low-light domain, it obtains an attention map that addresses color bias and exposure problems. The style region loss guides the network to eliminate the effects of noise more effectively. The new normalization function preserves more semantic information while normalizing the image, guiding the model to recover more details and further improve image quality. The experimental results demonstrate that the proposed method produces good results that are useful for practical applications.

16.
Sensors (Basel) ; 23(13)2023 Jun 22.
Article in English | MEDLINE | ID: mdl-37447673

ABSTRACT

Safety helmets are essential in various indoor and outdoor workplaces, such as metallurgical high-temperature operations and high-rise building construction, to prevent injuries and ensure production safety. However, manual supervision is costly, prone to lax enforcement, and subject to interference from other human factors. Moreover, detection of small target objects frequently lacks precision. Improving helmet detection algorithms is a promising way to address these issues. In this study, we propose a modified version of YOLOv5s, a lightweight deep learning-based object detection network. The proposed model extends YOLOv5s and improves its performance by recalculating the prediction boxes, using the IoU metric for clustering, and modifying the anchor boxes with the K-means++ method. The global attention mechanism (GAM) and the convolutional block attention module (CBAM) are added to improve the YOLOv5s backbone and neck networks. By minimizing information loss and enhancing the representation of global interactions, these attention mechanisms strengthen the network's feature-extraction capacity. Furthermore, the CBAM is integrated into the CSP module to improve target feature extraction while minimizing computation. To substantially increase the efficiency and precision of prediction box regression, the proposed model also uses the recent SIoU (SCYLLA-IoU) loss as the bounding box loss function. On top of the improved YOLOv5s model, knowledge distillation is used to make the model lightweight, reducing its computational workload and increasing detection speed to meet the needs of real-time monitoring.
The experimental results demonstrate that the proposed model outperforms the original YOLOv5s model in precision, recall, and mean average precision (mAP). The proposed model may also identify helmet use more effectively in low-light conditions and at a variety of distances.


Subject(s)
Algorithms , Head Protective Devices , Humans , Cluster Analysis , Neural Networks, Computer
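Entry 16 re-clusters anchor boxes using an IoU metric with K-means++. A minimal sketch, assuming boxes are reduced to (width, height) pairs compared at a shared center and that the distance is 1 - IoU: K-means++ picks each new seed with probability proportional to squared distance to the nearest existing seed, then standard assignment and mean-update steps follow. This is a generic illustration, not YOLOv5's autoanchor routine, and it assumes the data contain at least k distinct box shapes.

```python
import random
from typing import List, Tuple

# Illustrative anchor clustering sketch; not the authors' or YOLOv5's implementation.
Box = Tuple[float, float]  # (width, height)

def wh_iou(a: Box, b: Box) -> float:
    """IoU of two boxes aligned at the same center, using only width and height."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeanspp_anchors(boxes: List[Box], k: int, iters: int = 50, seed: int = 0) -> List[Box]:
    """Cluster box shapes with distance 1 - IoU and K-means++ seeding; returns anchors by area."""
    rng = random.Random(seed)
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        # Next seed drawn with probability proportional to squared distance to the nearest seed.
        d2 = [min(1.0 - wh_iou(b, c) for c in centers) ** 2 for b in boxes]
        centers.append(rng.choices(boxes, weights=d2, k=1)[0])
    for _ in range(iters):
        clusters: List[List[Box]] = [[] for _ in range(k)]
        for b in boxes:
            # Highest IoU is the same as smallest 1 - IoU distance.
            nearest = max(range(k), key=lambda i: wh_iou(b, centers[i]))
            clusters[nearest].append(b)
        centers = [
            (sum(w for w, h in c) / len(c), sum(h for w, h in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centers, key=lambda c: c[0] * c[1])
```

Using 1 - IoU rather than Euclidean distance keeps large boxes from dominating the clustering, since IoU is scale-invariant.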
17.
Sensors (Basel) ; 23(17)2023 Aug 23.
Article in English | MEDLINE | ID: mdl-37687809

ABSTRACT

Road scene understanding has attracted increasing research attention in recent years. Developing road scene understanding capabilities applicable to real-world road scenarios has faced numerous complications, largely because of the cost and complexity of achieving human-level scene understanding, at which road scene elements are segmented with a mean intersection over union score close to 1.0. Self-driving systems need a more unified approach to road scene segmentation. Previous works have demonstrated how deep learning methods can be combined to improve the segmentation and perception performance of road scene understanding systems. This paper proposes a novel segmentation system that uses fully connected networks, attention mechanisms, and multiple-input data stream fusion to improve segmentation performance. The results show performance comparable to previous works, with a mean intersection over union of 87.4% on the Cityscapes dataset.
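The mean intersection over union quoted in entry 17 is computed per class as TP / (TP + FP + FN) and then averaged over classes. The sketch below builds a confusion matrix from flattened label arrays; it skips classes absent from both ground truth and prediction and ignores details of the official Cityscapes protocol, such as the void label.

```python
from typing import List, Sequence

# Illustrative mIoU computation; not the official Cityscapes evaluation script.

def confusion_matrix(gt: Sequence[int], pred: Sequence[int], n_classes: int) -> List[List[int]]:
    """cm[g][p] counts pixels with ground-truth class g predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm

def mean_iou(cm: List[List[int]]) -> float:
    """Mean over classes of TP / (TP + FP + FN), skipping classes with an empty union."""
    ious = []
    n = len(cm)
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # class-c pixels predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp     # other pixels predicted as class c
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)
```

For ground truth [0, 0, 1, 1] and prediction [0, 1, 1, 1], the class IoUs are 1/2 and 2/3, giving an mIoU of 7/12.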

18.
Sensors (Basel) ; 23(3)2023 Jan 26.
Article in English | MEDLINE | ID: mdl-36772423

ABSTRACT

As surveillance cameras are deployed ever more widely, the task of detecting abnormal behaviors in surveillance videos has gained widespread attention. The generalization ability and parameter overhead of a model affect the accuracy of its detection results. To address the poor generalization ability and high parameter overhead of existing anomaly detection methods, we propose a three-dimensional multi-branch convolutional fusion network named "Branch-Fusion Net". Its multi-branch structure not only significantly reduces parameter overhead but also improves generalization ability by interpreting the input feature map from different perspectives. To keep the model from relying on useless features during training, we propose a simple yet effective Channel Spatial Attention Module (CSAM), which sequentially focuses attention on key channels and spatial feature regions to suppress useless features and enhance important ones. We combine the Branch-Fusion Net and the CSAM as a local feature extraction network and use a Bi-Directional Gated Recurrent Unit (Bi-GRU) to extract global feature information. In experiments on a self-built Crimes-mini dataset, the accuracy of anomaly detection in surveillance videos reaches 93.55% on the test set. The results show that the proposed model significantly improves the accuracy of anomaly detection in surveillance videos with low parameter overhead.
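The abstract does not give the CSAM's internals, so the following is only a CBAM-style sketch of "channels first, then spatial regions" attention, written for a 2-D (C, H, W) feature map for brevity (the paper's network is 3-D). The shared MLP with reduction ratio r, the 7 × 7 spatial kernel, and all function names are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv2d_same(x, w):
    """Single-output-channel 2-D cross-correlation with zero 'same' padding.

    x: (C_in, H, W), w: (C_in, k, k) with odd k -> (H, W).
    """
    k = w.shape[1]
    p = k // 2
    h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((h, wd))
    for i in range(k):
        for j in range(k):
            out += np.einsum("c,chw->hw", w[:, i, j], xp[:, i:i + h, j:j + wd])
    return out


def channel_attention(x, w1, w2):
    """Reweight channels with a shared MLP over average- and max-pooled descriptors."""
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # C -> C/r -> C with ReLU
    a = sigmoid(mlp(x.mean(axis=(1, 2))) + mlp(x.max(axis=(1, 2))))  # (C,)
    return x * a[:, None, None]


def spatial_attention(x, w_conv):
    """Reweight positions with a conv over channel-wise average and max maps."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    a = sigmoid(conv2d_same(pooled, w_conv))  # (H, W)
    return x * a[None, :, :]


def channel_spatial_attention(x, w1, w2, w_conv):
    # Sequential order: key channels first, then key spatial regions
    return spatial_attention(channel_attention(x, w1, w2), w_conv)
```

Because both attention maps lie in (0, 1), the module can only attenuate activations; with all-zero weights every gate equals 0.5, so the output is exactly 0.25 times the input.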

19.
Sensors (Basel) ; 23(14)2023 Jul 15.
Article in English | MEDLINE | ID: mdl-37514723

ABSTRACT

With the wide application of visual sensors and the development of digital image processing technology, image copy-move forgery detection (CMFD) has become increasingly prevalent. Copy-move forgery copies one or more regions of an image and pastes them into another part of the same image, and CMFD is an efficient means of exposing such manipulation. Forged images are misused in industry, the military, and daily life. In this paper, we present an efficient end-to-end deep learning approach for CMFD that uses a span-partial structure and attention mechanism (SPA-Net). SPA-Net extracts coarse features with a pre-processing module and then extracts fine deep feature maps with its feature extractor module, which is built from the span-partial structure and attention mechanism. The span-partial structure is designed to reduce redundant feature information, while the attention mechanism within it focuses on the tampered region and suppresses the original semantic information. To explore the correlation between high-dimensional feature points, a deep feature matching module helps SPA-Net locate the copy-move areas by computing the similarity of the feature map. A feature upsampling module then upsamples the features to their original size and produces a copy-move mask. Furthermore, training SPA-Net without pretrained weights balances copy-move and semantic features, so the model can capture more features of copy-move forgery areas and is less confused by semantic objects.
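Computing "the similarity of the feature map" to find duplicated regions is often done with a self-correlation: every spatial descriptor is compared with every other, and a location scores high if some distant location looks almost identical. The sketch below assumes cosine similarity and excludes a small neighbourhood around each location; it illustrates the general idea and is not SPA-Net's actual matching module.

```python
import numpy as np


def self_correlation_map(feat, exclude_radius=1):
    """For each location of a (C, H, W) feature map, return the highest cosine
    similarity to any location outside its (2r+1) x (2r+1) neighbourhood."""
    c, h, w = feat.shape
    v = feat.reshape(c, h * w).T  # (N, C), one descriptor per location
    v = v / np.maximum(np.linalg.norm(v, axis=1, keepdims=True), 1e-12)
    sim = v @ v.T  # (N, N) cosine similarities
    ys, xs = np.divmod(np.arange(h * w), w)  # row-major (y, x) of each descriptor
    near = ((np.abs(ys[:, None] - ys[None, :]) <= exclude_radius)
            & (np.abs(xs[:, None] - xs[None, :]) <= exclude_radius))
    sim[near] = -np.inf  # ignore self-matches and adjacent, naturally similar cells
    return sim.max(axis=1).reshape(h, w)
```

Both the source and the pasted copy light up in this map, which is why a later stage (here, the upsampling module producing the copy-move mask) is still needed to turn it into a clean mask.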
In the experiments, we do not use pretrained weights or models from existing networks such as VGG16, which would cause the network to pay more attention to objects other than the copy-move areas. To deal with this problem, we generated a SPANet-CMFD dataset by applying various processes to benchmark images from the SUN and COCO datasets, and we trained our model on existing copy-move forgery datasets (CMH, MICC-F220, MICC-F600, GRIP, Coverage, and parts of USCISI-CMFD) together with our generated SPANet-CMFD dataset. In addition, the SPANet-CMFD dataset could prove useful for other forgery detection tasks, such as deepfake detection. We used the CASIA and CoMoFoD datasets as test sets to verify the performance of the proposed method, and calculated precision, recall, and F1 to evaluate the CMFD results. The comparison results showed that our model achieved satisfactory performance on both test datasets and outperformed existing methods.
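Precision, recall, and F1 for CMFD can be computed at pixel level by comparing a predicted copy-move mask with the ground-truth mask. The sketch below assumes binary masks and per-image scores; CMFD papers differ on image-level versus pixel-level evaluation and on how scores are averaged across images, and the abstract does not say which convention was used. `mask_prf` is a hypothetical helper.

```python
import numpy as np


def mask_prf(pred, gt):
    """Pixel-level precision, recall and F1 between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.logical_and(pred, gt).sum()   # forged pixels correctly flagged
    fp = np.logical_and(pred, ~gt).sum()  # pristine pixels wrongly flagged
    fn = np.logical_and(~pred, gt).sum()  # forged pixels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return float(precision), float(recall), float(f1)
```

F1 is the harmonic mean of precision and recall, so a model that marks the whole image as forged gets perfect recall but a low F1.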

20.
Sensors (Basel) ; 24(1)2023 Dec 29.
Article in English | MEDLINE | ID: mdl-38203066

ABSTRACT

To address the challenge of balancing accuracy and speed, as well as parameter count and FLOPs, in current insulator defect detection, we propose ML-YOLOv5, an enhanced insulator defect detection algorithm based on the YOLOv5 network. The backbone incorporates depthwise separable convolution, and the C3 feature fusion module is replaced with an improved C2f_DG module. Furthermore, we enhance the feature pyramid network (MFPN) and apply knowledge distillation with YOLOv5m as the teacher model. Experimental results demonstrate that this approach reduces the parameter count by 46.9% and FLOPs by 43.0% while maintaining 63.6 FPS. It exhibited good accuracy and detection speed on both the CPLID and IDID datasets, making it suitable for real-time inspection of high-altitude insulator defects.
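Why depthwise separable convolution cuts parameters and FLOPs can be shown with simple arithmetic: a standard k × k convolution from C_in to C_out channels has k²·C_in·C_out weights, while a depthwise k × k convolution followed by a 1 × 1 pointwise convolution has k²·C_in + C_in·C_out, a ratio of 1/C_out + 1/k². The layer sizes below are illustrative and are not taken from ML-YOLOv5; biases are omitted, and FLOP conventions (MACs versus 2 × MACs) vary between papers.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """k x k depthwise conv (one filter per input channel) + 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


def macs(params_per_position, h_out, w_out):
    """Multiply-accumulates: each weight is used once per output position."""
    return params_per_position * h_out * w_out


std = standard_conv_params(3, 256, 256)
dws = depthwise_separable_params(3, 256, 256)
print(std, dws, round(dws / std, 3))  # prints: 589824 67840 0.115
```

For a single 3 × 3, 256-to-256 layer the separable version needs about 11.5% of the weights and MACs; whole-network savings such as those reported above are smaller because only some layers are replaced.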
