Pesquisa | Biblioteca Virtual em Saúde

1.

SymTC: A symbiotic Transformer-CNN net for instance segmentation of lumbar spine MRI.

Chen, Jiasong; Qian, Linchen; Ma, Linhai; Urakov, Timur; Gu, Weiyong; Liang, Liang.

Comput Biol Med ; 179: 108795, 2024 Jul 01.

Artigo em Inglês | MEDLINE | ID: mdl-38955128

RESUMO

Intervertebral disc disease, a prevalent ailment, frequently leads to intermittent or persistent low back pain, and diagnosing and assessing of this disease rely on accurate measurement of vertebral bone and intervertebral disc geometries from lumbar MR images. Deep neural network (DNN) models may assist clinicians with more efficient image segmentation of individual instances (discs and vertebrae) of the lumbar spine in an automated way, which is termed as instance image segmentation. In this work, we proposed SymTC, an innovative lumbar spine MR image segmentation model that combines the strengths of Transformer and Convolutional Neural Network (CNN). Specifically, we designed a parallel dual-path architecture to merge CNN layers and Transformer layers, and we integrated a novel position embedding into the self-attention module of Transformer, enhancing the utilization of positional information for more accurate segmentation. To further improve model performance, we introduced a new data synthesis technique to create synthetic yet realistic MR image dataset, named SSMSpine, which is made publicly available. We evaluated our SymTC and the other 16 representative image segmentation models on our private in-house dataset and public SSMSpine dataset, using two metrics, Dice Similarity Coefficient and the 95th percentile Hausdorff Distance. The results indicate that SymTC surpasses the other 16 methods, achieving the highest dice score of 96.169 % for segmenting vertebral bones and intervertebral discs on the SSMSpine dataset. The SymTC code and SSMSpine dataset are publicly available at https://github.com/jiasongchen/SymTC.

2.

CTHNet: a network for wheat ear counting with local-global features fusion based on hybrid architecture.

Hong, Qingqing; Liu, Wei; Zhu, Yue; Ren, Tianyu; Shi, Changrong; Lu, Zhixin; Yang, Yunqin; Deng, Ruiting; Qian, Jing; Tan, Changwei.

Front Plant Sci ; 15: 1425131, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-39015290

RESUMO

Accurate wheat ear counting is one of the key indicators for wheat phenotyping. Convolutional neural network (CNN) algorithms for counting wheat have evolved into sophisticated tools, however because of the limitations of sensory fields, CNN is unable to simulate global context information, which has an impact on counting performance. In this study, we present a hybrid attention network (CTHNet) for wheat ear counting from RGB images that combines local features and global context information. On the one hand, to extract multi-scale local features, a convolutional neural network is built using the Cross Stage Partial framework. On the other hand, to acquire better global context information, tokenized image patches from convolutional neural network feature maps are encoded as input sequences using Pyramid Pooling Transformer. Then, the feature fusion module merges the local features with the global context information to significantly enhance the feature representation. The Global Wheat Head Detection Dataset and Wheat Ear Detection Dataset are used to assess the proposed model. There were 3.40 and 5.21 average absolute errors, respectively. The performance of the proposed model was significantly better than previous studies.

3.

ArabBert-LSTM: improving Arabic sentiment analysis based on transformer model and Long Short-Term Memory.

Alosaimi, Wael; Saleh, Hager; Hamzah, Ali A; El-Rashidy, Nora; Alharb, Abdullah; Elaraby, Ahmed; Mostafa, Sherif.

Front Artif Intell ; 7: 1408845, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-39015364

RESUMO

Sentiment analysis also referred to as opinion mining, plays a significant role in automating the identification of negative, positive, or neutral sentiments expressed in textual data. The proliferation of social networks, review sites, and blogs has rendered these platforms valuable resources for mining opinions. Sentiment analysis finds applications in various domains and languages, including English and Arabic. However, Arabic presents unique challenges due to its complex morphology characterized by inflectional and derivation patterns. To effectively analyze sentiment in Arabic text, sentiment analysis techniques must account for this intricacy. This paper proposes a model designed using the transformer model and deep learning (DL) techniques. The word embedding is represented by Transformer-based Model for Arabic Language Understanding (ArabBert), and then passed to the AraBERT model. The output of AraBERT is subsequently fed into a Long Short-Term Memory (LSTM) model, followed by feedforward neural networks and an output layer. AraBERT is used to capture rich contextual information and LSTM to enhance sequence modeling and retain long-term dependencies within the text data. We compared the proposed model with machine learning (ML) algorithms and DL algorithms, as well as different vectorization techniques: term frequency-inverse document frequency (TF-IDF), ArabBert, Continuous Bag-of-Words (CBOW), and skipGrams using four Arabic benchmark datasets. Through extensive experimentation and evaluation of Arabic sentiment analysis datasets, we showcase the effectiveness of our approach. The results underscore significant improvements in sentiment analysis accuracy, highlighting the potential of leveraging transformer models for Arabic Sentiment Analysis. The outcomes of this research contribute to advancing Arabic sentiment analysis, enabling more accurate and reliable sentiment analysis in Arabic text. The findings reveal that the proposed framework exhibits exceptional performance in sentiment classification, achieving an impressive accuracy rate of over 97%.

4.

Wearable Data From Subjects Playing Super Mario, Taking University Exams, or Performing Physical Exercise Help Detect Acute Mood Disorder Episodes via Self-Supervised Learning: Prospective, Exploratory, Observational Study.

Corponi, Filippo; Li, Bryan M; Anmella, Gerard; Valenzuela-Pascual, Clàudia; Mas, Ariadna; Pacchiarotti, Isabella; Valentí, Marc; Grande, Iria; Benabarre, Antoni; Garriga, Marina; Vieta, Eduard; Young, Allan H; Lawrie, Stephen M; Whalley, Heather C; Hidalgo-Mazzei, Diego; Vergari, Antonio.

JMIR Mhealth Uhealth ; 12: e55094, 2024 Jul 17.

Artigo em Inglês | MEDLINE | ID: mdl-39018100

RESUMO

BACKGROUND: Personal sensing, leveraging data passively and near-continuously collected with wearables from patients in their ecological environment, is a promising paradigm to monitor mood disorders (MDs), a major determinant of the worldwide disease burden. However, collecting and annotating wearable data is resource intensive. Studies of this kind can thus typically afford to recruit only a few dozen patients. This constitutes one of the major obstacles to applying modern supervised machine learning techniques to MD detection. OBJECTIVE: In this paper, we overcame this data bottleneck and advanced the detection of acute MD episodes from wearables' data on the back of recent advances in self-supervised learning (SSL). This approach leverages unlabeled data to learn representations during pretraining, subsequently exploited for a supervised task. METHODS: We collected open access data sets recording with the Empatica E4 wristband spanning different, unrelated to MD monitoring, personal sensing tasks-from emotion recognition in Super Mario players to stress detection in undergraduates-and devised a preprocessing pipeline performing on-/off-body detection, sleep/wake detection, segmentation, and (optionally) feature extraction. With 161 E4-recorded subjects, we introduced E4SelfLearning, the largest-to-date open access collection, and its preprocessing pipeline. We developed a novel E4-tailored transformer (E4mer) architecture, serving as the blueprint for both SSL and fully supervised learning; we assessed whether and under which conditions self-supervised pretraining led to an improvement over fully supervised baselines (ie, the fully supervised E4mer and pre-deep learning algorithms) in detecting acute MD episodes from recording segments taken in 64 (n=32, 50%, acute, n=32, 50%, stable) patients. RESULTS: SSL significantly outperformed fully supervised pipelines using either our novel E4mer or extreme gradient boosting (XGBoost): n=3353 (81.23%) against n=3110 (75.35%; E4mer) and n=2973 (72.02%; XGBoost) correctly classified recording segments from a total of 4128 segments. SSL performance was strongly associated with the specific surrogate task used for pretraining, as well as with unlabeled data availability. CONCLUSIONS: We showed that SSL, a paradigm where a model is pretrained on unlabeled data with no need for human annotations before deployment on the supervised target task of interest, helps overcome the annotation bottleneck; the choice of the pretraining surrogate task and the size of unlabeled data for pretraining are key determinants of SSL success. We introduced E4mer, which can be used for SSL, and shared the E4SelfLearning collection, along with its preprocessing pipeline, which can foster and expedite future research into SSL for personal sensing.

Assuntos

Transtornos do Humor , Aprendizado de Máquina Supervisionado , Dispositivos Eletrônicos Vestíveis , Humanos , Estudos Prospectivos , Dispositivos Eletrônicos Vestíveis/estatística & dados numéricos , Dispositivos Eletrônicos Vestíveis/normas , Masculino , Feminino , Transtornos do Humor/diagnóstico , Transtornos do Humor/psicologia , Adulto , Exercício Físico/psicologia , Exercício Físico/fisiologia , Universidades/estatística & dados numéricos , Universidades/organização & administração

5.

Bifurcation detection in intravascular optical coherence tomography using vision transformer based deep learning.

Zhu, Rongyang; Li, Qingrui; Ding, Zhenyang; Liu, Kun; Lin, Qiutong; Yu, Yin; Li, Yuanyao; Zhou, Shanshan; Kuang, Hao; Jiang, Junfeng; Liu, Tiegen.

Phys Med Biol ; 69(15)2024 Jul 18.

Artigo em Inglês | MEDLINE | ID: mdl-38981596

RESUMO

Objective. Bifurcation detection in intravascular optical coherence tomography (IVOCT) images plays a significant role in guiding optimal revascularization strategies for percutaneous coronary intervention (PCI). We propose a bifurcation detection method using vision transformer (ViT) based deep learning in IVOCT.Approach. Instead of relying on lumen segmentation, the proposed method identifies the bifurcation image using a ViT-based classification model and then estimate bifurcation ostium points by a ViT-based landmark detection model.Main results. By processing 8640 clinical images, the Accuracy and F1-score of bifurcation identification by the proposed ViT-based model are 2.54% and 16.08% higher than that of traditional non-deep learning methods, are similar to the best performance of convolutional neural networks (CNNs) based methods, respectively. The ostium distance error of the ViT-based model is 0.305 mm, which is reduced 68.5% compared with the traditional non-deep learning method and reduced 24.81% compared with the best performance of CNNs based methods. The results also show that the proposed ViT-based method achieves the highest success detection rate are 11.3% and 29.2% higher than the non-deep learning method, and 4.6% and 2.5% higher than the best performance of CNNs based methods when the distance section is 0.1 and 0.2 mm, respectively.Significance. The proposed ViT-based method enhances the performance of bifurcation detection of IVOCT images, which maintains a high correlation and consistency between the automatic detection results and the expert manual results. It is of great significance in guiding the selection of PCI treatment strategies.

Assuntos

Aprendizado Profundo , Processamento de Imagem Assistida por Computador , Tomografia de Coerência Óptica , Tomografia de Coerência Óptica/métodos , Humanos , Processamento de Imagem Assistida por Computador/métodos , Vasos Coronários/diagnóstico por imagem

6.

Masked pre-training of transformers for histology image analysis.

Jiang, Shuai; Hondelink, Liesbeth; Suriawinata, Arief A; Hassanpour, Saeed.

J Pathol Inform ; 15: 100386, 2024 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-39006998

RESUMO

In digital pathology, whole-slide images (WSIs) are widely used for applications such as cancer diagnosis and prognosis prediction. Vision transformer (ViT) models have recently emerged as a promising method for encoding large regions of WSIs while preserving spatial relationships among patches. However, due to the large number of model parameters and limited labeled data, applying transformer models to WSIs remains challenging. In this study, we propose a pretext task to train the transformer model in a self-supervised manner. Our model, MaskHIT, uses the transformer output to reconstruct masked patches, measured by contrastive loss. We pre-trained MaskHIT model using over 7000 WSIs from TCGA and extensively evaluated its performance in multiple experiments, covering survival prediction, cancer subtype classification, and grade prediction tasks. Our experiments demonstrate that the pre-training procedure enables context-aware understanding of WSIs, facilitates the learning of representative histological features based on patch positions and visual patterns, and is essential for the ViT model to achieve optimal results on WSI-level tasks. The pre-trained MaskHIT surpasses various multiple instance learning approaches by 3% and 2% on survival prediction and cancer subtype classification tasks, and also outperforms recent state-of-the-art transformer-based methods. Finally, a comparison between the attention maps generated by the MaskHIT model with pathologist's annotations indicates that the model can accurately identify clinically relevant histological structures on the whole slide for each task.

7.

A deep learning anthropomorphic model observer for a detection task in PET.

Shao, Muhan; Byrd, Darrin W; Mitra, Jhimli; Behnia, Fatemeh; Lee, Jean H; Iravani, Amir; Sadic, Murat; Chen, Delphine L; Wollenweber, Scott D; Abbey, Craig K; Kinahan, Paul E; Ahn, Sangtae.

Med Phys ; 2024 Jul 15.

Artigo em Inglês | MEDLINE | ID: mdl-39008812

RESUMO

BACKGROUND: Lesion detection is one of the most important clinical tasks in positron emission tomography (PET) for oncology. An anthropomorphic model observer (MO) designed to replicate human observers (HOs) in a detection task is an important tool for assessing task-based image quality. The channelized Hotelling observer (CHO) has been the most popular anthropomorphic MO. Recently, deep learning MOs (DLMOs), mostly based on convolutional neural networks (CNNs), have been investigated for various imaging modalities. However, there have been few studies on DLMOs for PET. PURPOSE: The goal of the study is to investigate whether DLMOs can predict HOs better than conventional MOs such as CHO in a two-alternative forced-choice (2AFC) detection task using PET images with real anatomical variability. METHODS: Two types of DLMOs were implemented: (1) CNN DLMO, and (2) CNN-SwinT DLMO that combines CNN and Swin Transformer (SwinT) encoders. Lesion-absent PET images were reconstructed from clinical data, and lesion-present images were reconstructed with adding simulated lesion sinogram data. Lesion-present and lesion-absent PET image pairs were labeled by eight HOs consisting of four radiologists and four image scientists in a 2AFC detection task. In total, 2268 pairs of lesion-present and lesion-absent images were used for training, 324 pairs for validation, and 324 pairs for test. CNN DLMO, CNN-SwinT DLMO, CHO with internal noise, and non-prewhitening matched filter (NPWMF) were compared in the same train-test paradigm. For comparison, six quantitative metrics including prediction accuracy, mean squared errors (MSEs) and correlation coefficients, which measure how well a MO predicts HOs, were calculated in a 9-fold cross-validation experiment. RESULTS: In terms of the accuracy and MSE metrics, CNN DLMO and CNN-SwinT DLMO showed better performance than CHO and NPWMF, and CNN-SwinT DLMO showed the best performance among the MOs evaluated. CONCLUSIONS: DLMO can predict HOs more accurately than conventional MOs such as CHO in PET lesion detection. Combining SwinT and CNN encoders can improve the DLMO prediction performance compared to using CNN only.

8.

MDC-RHT: Multi-Modal Medical Image Fusion via Multi-Dimensional Dynamic Convolution and Residual Hybrid Transformer.

Wang, Wenqing; He, Ji; Liu, Han; Yuan, Wei.

Sensors (Basel) ; 24(13)2024 Jun 21.

Artigo em Inglês | MEDLINE | ID: mdl-39000834

RESUMO

The fusion of multi-modal medical images has great significance for comprehensive diagnosis and treatment. However, the large differences between the various modalities of medical images make multi-modal medical image fusion a great challenge. This paper proposes a novel multi-scale fusion network based on multi-dimensional dynamic convolution and residual hybrid transformer, which has better capability for feature extraction and context modeling and improves the fusion performance. Specifically, the proposed network exploits multi-dimensional dynamic convolution that introduces four attention mechanisms corresponding to four different dimensions of the convolutional kernel to extract more detailed information. Meanwhile, a residual hybrid transformer is designed, which activates more pixels to participate in the fusion process by channel attention, window attention, and overlapping cross attention, thereby strengthening the long-range dependence between different modes and enhancing the connection of global context information. A loss function, including perceptual loss and structural similarity loss, is designed, where the former enhances the visual reality and perceptual details of the fused image, and the latter enables the model to learn structural textures. The whole network adopts a multi-scale architecture and uses an unsupervised end-to-end method to realize multi-modal image fusion. Finally, our method is tested qualitatively and quantitatively on mainstream datasets. The fusion results indicate that our method achieves high scores in most quantitative indicators and satisfactory performance in visual qualitative analysis.

9.

Distributed Vibration Monitoring System for 10 kV-400 kVA 3D Wound Core Transformer under Progressive Short-Circuit Impulses.

Tao, Jiagui; Zhang, Sicong; Dai, Jianzhuo; Zhu, Jinwei; Zhao, Heng.

Sensors (Basel) ; 24(13)2024 Jun 21.

Artigo em Inglês | MEDLINE | ID: mdl-39000841

RESUMO

As large-scale, high-proportion, and efficient distribution transformers surge into the grids, anti-short circuit capability testing of transformer windings in efficient distribution seems necessary and prominent. To deeply explore the influence of progressively short-circuit shock impulses on the core winding deformation of efficient power transformers, a finite element theoretical model was built by referring to a three-phase three-winding 3D wound core transformer with a model of S20-MRL-400/10-NX2. The distributions of internal equivalent force and total deformation of the 3D wound core transformer along different paths under progressively short-circuit shock impulses varying from 60% to 120% were investigated. Results show that the equivalent stress and total deformation change rate reach their maximum as the short-circuit current increases from 60% to 80%, and the maximum and average variation rate for the equivalent stress reach 177.75% and 177.43%, while the maximum and average variation rate for the total deformation corresponds to 178.30% and 177.45%, respectively. Meanwhile, the maximum equivalent stress and maximum total deformation reach 29.81 MPa and 38.70 µm, respectively, as the applied short-circuit current increased to 120%. In light of the above observations, the optimization and deployment of wireless sensor nodes was suggested. Therefore, a distributed monitoring system was developed for acquiring the vibration status of the windings in a 3D wound core transformer, which is a beneficial supplement to the traditional short-circuit reactance detection methods for an efficient grid access spot-check of distribution transformers.

10.

An Improved Method for Detecting Crane Wheel-Rail Faults Based on YOLOv8 and the Swin Transformer.

Li, Yunlong; Tang, Xiuli; Liu, Wusheng; Huang, Yuefeng; Li, Zhinong.

Sensors (Basel) ; 24(13)2024 Jun 24.

Artigo em Inglês | MEDLINE | ID: mdl-39000865

RESUMO

In the realm of special equipment, significant advancements have been achieved in fault detection. Nonetheless, faults originating in the equipment manifest with diverse morphological characteristics and varying scales. Certain faults necessitate the extrapolation from global information owing to their occurrence in localized areas. Simultaneously, the intricacies of the inspection area's background easily interfere with the intelligent detection processes. Hence, a refined YOLOv8 algorithm leveraging the Swin Transformer is proposed, tailored for detecting faults in special equipment. The Swin Transformer serves as the foundational network of the YOLOv8 framework, amplifying its capability to concentrate on comprehensive features during the feature extraction, crucial for fault analysis. A multi-head self-attention mechanism regulated by a sliding window is utilized to expand the observation window's scope. Moreover, an asymptotic feature pyramid network is introduced to augment spatial feature extraction for smaller targets. Within this network architecture, adjacent low-level features are merged, while high-level features are gradually integrated into the fusion process. This prevents loss or degradation of feature information during transmission and interaction, enabling accurate localization of smaller targets. Drawing from wheel-rail faults of lifting equipment as an illustration, the proposed method is employed to diagnose an expanded fault dataset generated through transfer learning. Experimental findings substantiate that the proposed method in adeptly addressing numerous challenges encountered in the intelligent fault detection of special equipment. Moreover, it outperforms mainstream target detection models, achieving real-time detection capabilities.

11.

A Multi-Task Joint Learning Model Based on Transformer and Customized Gate Control for Predicting Remaining Useful Life and Health Status of Tools.

Hou, Chunming; Zheng, Liaomo.

Sensors (Basel) ; 24(13)2024 Jun 25.

Artigo em Inglês | MEDLINE | ID: mdl-39000896

RESUMO

Previous studies have primarily focused on predicting the remaining useful life (RUL) of tools as an independent process. However, the RUL of a tool is closely related to its wear stage. In light of this, a multi-task joint learning model based on a transformer encoder and customized gate control (TECGC) is proposed for simultaneous prediction of tool RUL and tool wear stages. Specifically, the transformer encoder is employed as the backbone of the TECGC model for extracting shared features from the original data. The customized gate control (CGC) is utilized to extract task-specific features relevant to tool RUL prediction and tool wear stage and shared features. Finally, by integrating these components, the tool RUL and the tool wear stage can be predicted simultaneously by the TECGC model. In addition, a dynamic adaptive multi-task learning loss function is proposed for the model's training to enhance its calculation efficiency. This approach avoids unsatisfactory prediction performance of the model caused by unreasonable selection of trade-off parameters of the loss function. The effectiveness of the TECGC model is evaluated using the PHM2010 dataset. The results demonstrate its capability to accurately predict tool RUL and tool wear stages.

12.

LBRT: Local-Information-Refined Transformer for Image Copy-Move Forgery Detection.

Liang, Peng; Li, Ziyuan; Tu, Hang; Zhao, Huimin.

Sensors (Basel) ; 24(13)2024 Jun 26.

Artigo em Inglês | MEDLINE | ID: mdl-39000921

RESUMO

The current deep learning methods for copy-move forgery detection (CMFD) are mostly based on deep convolutional neural networks, which frequently discard a large amount of detail information throughout convolutional feature extraction and have poor long-range information extraction capabilities. The Transformer structure is adept at modeling global context information, but the patch-wise self-attention calculation still neglects the extraction of details in local regions that have been tampered with. A local-information-refined dual-branch network, LBRT (Local Branch Refinement Transformer), is designed in this study. It performs Transformer encoding on the global patches segmented from the image and local patches re-segmented from the global patches using a global modeling branch and a local refinement branch, respectively. The self-attention features from both branches are precisely fused, and the fused feature map is then up-sampled and decoded. Therefore, LBRT considers both global semantic information modeling and local detail information refinement. The experimental results show that LBRT outperforms several state-of-the-art CMFD methods on the USCISI dataset, CASIA CMFD dataset, and DEFACTO CMFD dataset.

13.

Enhanced Hybrid Vision Transformer with Multi-Scale Feature Integration and Patch Dropping for Facial Expression Recognition.

Li, Nianfeng; Huang, Yongyuan; Wang, Zhenyan; Fan, Ziyao; Li, Xinyuan; Xiao, Zhiguo.

Sensors (Basel) ; 24(13)2024 Jun 26.

Artigo em Inglês | MEDLINE | ID: mdl-39000930

RESUMO

Convolutional neural networks (CNNs) have made significant progress in the field of facial expression recognition (FER). However, due to challenges such as occlusion, lighting variations, and changes in head pose, facial expression recognition in real-world environments remains highly challenging. At the same time, methods solely based on CNN heavily rely on local spatial features, lack global information, and struggle to balance the relationship between computational complexity and recognition accuracy. Consequently, the CNN-based models still fall short in their ability to address FER adequately. To address these issues, we propose a lightweight facial expression recognition method based on a hybrid vision transformer. This method captures multi-scale facial features through an improved attention module, achieving richer feature integration, enhancing the network's perception of key facial expression regions, and improving feature extraction capabilities. Additionally, to further enhance the model's performance, we have designed the patch dropping (PD) module. This module aims to emulate the attention allocation mechanism of the human visual system for local features, guiding the network to focus on the most discriminative features, reducing the influence of irrelevant features, and intuitively lowering computational costs. Extensive experiments demonstrate that our approach significantly outperforms other methods, achieving an accuracy of 86.51% on RAF-DB and nearly 70% on FER2013, with a model size of only 3.64 MB. These results demonstrate that our method provides a new perspective for the field of facial expression recognition.

Assuntos

Expressão Facial , Redes Neurais de Computação , Humanos , Reconhecimento Facial Automatizado/métodos , Algoritmos , Processamento de Imagem Assistida por Computador/métodos , Face , Reconhecimento Automatizado de Padrão/métodos

14.

Bridging Convolutional Neural Networks and Transformers for Efficient Crack Detection in Concrete Building Structures.

Yadav, Dhirendra Prasad; Sharma, Bhisham; Chauhan, Shivank; Dhaou, Imed Ben.

Sensors (Basel) ; 24(13)2024 Jun 30.

Artigo em Inglês | MEDLINE | ID: mdl-39001034

RESUMO

Detecting cracks in building structures is an essential practice that ensures safety, promotes longevity, and maintains the economic value of the built environment. In the past, machine learning (ML) and deep learning (DL) techniques have been used to enhance classification accuracy. However, the conventional CNN (convolutional neural network) methods incur high computational costs owing to their extensive number of trainable parameters and tend to extract only high-dimensional shallow features that may not comprehensively represent crack characteristics. We proposed a novel convolution and composite attention transformer network (CCTNet) model to address these issues. CCTNet enhances crack identification by processing more input pixels and combining convolution channel attention with window-based self-attention mechanisms. This dual approach aims to leverage the localized feature extraction capabilities of CNNs with the global contextual understanding afforded by self-attention mechanisms. Additionally, we applied an improved cross-attention module within CCTNet to increase the interaction and integration of features across adjacent windows. The performance of CCTNet on the Historical Building Crack2019, SDTNET2018, and proposed DS3 has a precision of 98.60%, 98.93%, and 99.33%, respectively. Furthermore, the training validation loss of the proposed model is close to zero. In addition, the AUC (area under the curve) is 0.99 and 0.98 for the Historical Building Crack2019 and SDTNET2018, respectively. CCTNet not only outperforms existing methodologies but also sets a new standard for the accurate, efficient, and reliable detection of cracks in building structures.

15.

Bearing-DETR: A Lightweight Deep Learning Model for Bearing Defect Detection Based on RT-DETR.

Liu, Minggao; Wang, Haifeng; Du, Luyao; Ji, Fangsong; Zhang, Ming.

Sensors (Basel) ; 24(13)2024 Jun 30.

Artigo em Inglês | MEDLINE | ID: mdl-39001040

RESUMO

Detecting bearing defects accurately and efficiently is critical for industrial safety and efficiency. This paper introduces Bearing-DETR, a deep learning model optimised using the Real-Time Detection Transformer (RT-DETR) architecture. Enhanced with Dysample Dynamic Upsampling, Efficient Model Optimization (EMO) with Meta-Mobile Blocks (MMB), and Deformable Large Kernel Attention (D-LKA), Bearing-DETR offers significant improvements in defect detection while maintaining a lightweight framework suitable for low-resource devices. Validated on a dataset from a chemical plant, Bearing-DETR outperformed the standard RT-DETR, achieving a mean average precision (mAP) of 94.3% at IoU = 0.5 and 57.5% at IoU = 0.5-0.95. It also reduced floating-point operations (FLOPs) to 8.2 G and parameters to 3.2 M, underscoring its enhanced efficiency and reduced computational demands. These results demonstrate the potential of Bearing-DETR to transform maintenance strategies and quality control across manufacturing environments, emphasising adaptability and impact on sustainability and operational costs.

16.

A Novel Part Refinement Tandem Transformer for Human-Object Interaction Detection.

Su, Zhan; Yang, Hongzhe.

Sensors (Basel) ; 24(13)2024 Jul 01.

Artigo em Inglês | MEDLINE | ID: mdl-39001055

RESUMO

Human-object interaction (HOI) detection identifies a "set of interactions" in an image involving the recognition of interacting instances and the classification of interaction categories. The complexity and variety of image content make this task challenging. Recently, the Transformer has been applied in computer vision and received attention in the HOI detection task. Therefore, this paper proposes a novel Part Refinement Tandem Transformer (PRTT) for HOI detection. Unlike the previous Transformer-based HOI method, PRTT utilizes multiple decoders to split and process rich elements of HOI prediction and introduces a new part state feature extraction (PSFE) module to help improve the final interaction category classification. We adopt a novel prior feature integrated cross-attention (PFIC) to utilize the fine-grained partial state semantic and appearance feature output obtained by the PSFE module to guide queries. We validate our method on two public datasets, V-COCO and HICO-DET. Compared to state-of-the-art models, the performance of detecting human-object interaction is significantly improved by the PRTT.

17.

Transformer Discharge Carbon-Trace Detection Based on Improved MSRCR Image-Enhancement Algorithm and YOLOv8 Model.

Ji, Hongxin; Han, Peilin; Li, Jiaqi; Liu, Xinghua; Liu, Liqing.

Sensors (Basel) ; 24(13)2024 Jul 02.

Artigo em Inglês | MEDLINE | ID: mdl-39001089

RESUMO

It is difficult to visually detect internal defects in a large transformer with a metal closure. For convenient internal inspection, a micro-robot was adopted, and an inspection method based on an image-enhancement algorithm and an improved deep-learning network was proposed in this paper. Considering the dim environment inside the transformer and the problems of irregular imaging distance and fluctuating supplementary light conditions during image acquisition with the internal-inspection robot, an improved MSRCR algorithm for image enhancement was proposed. It could analyze the local contrast of the image and enhance the details on multiple scales. At the same time, a white-balance algorithm was introduced to enhance the contrast and brightness and solve the problems of overexposure and color distortion. To improve the target recognition performance of complex carbon-trace defects, the SimAM mechanism was incorporated into the Backbone network of the YOLOv8 model to enhance the extraction of carbon-trace features. Meanwhile, the DyHead dynamic detection Head framework was constructed at the output of the YOLOv8 model to improve the perception of local carbon traces with different sizes. To improve the defect target recognition speed of the transformer-inspection robot, a pruning operation was carried out on the YOLOv8 model to remove redundant parameters, realize model lightness, and improve detection efficiency. To verify the effectiveness of the improved algorithm, the detection model was trained and validated with the carbon-trace dataset. The results showed that the MSH-YOLOv8 algorithm achieved an accuracy of 91.80%, which was 3.4 percentage points higher compared to the original YOLOv8 algorithm, and had a significant advantage over other mainstream target-detection algorithms. Meanwhile, the FPS of the proposed algorithm was up to 99.2, indicating that the model computation and model complexity were successfully reduced, which meets the requirements for engineering applications of the transformer internal-inspection robot.

18.

Research on a Recognition Algorithm for Traffic Signs in Foggy Environments Based on Image Defogging and Transformer.

Liu, Zhaohui; Yan, Jun; Zhang, Jinzhao.

Sensors (Basel) ; 24(13)2024 Jul 05.

Artigo em Inglês | MEDLINE | ID: mdl-39001149

RESUMO

The efficient and accurate identification of traffic signs is crucial to the safety and reliability of active driving assistance and driverless vehicles. However, the accurate detection of traffic signs under extreme cases remains challenging. Aiming at the problems of missing detection and false detection in traffic sign recognition in fog traffic scenes, this paper proposes a recognition algorithm for traffic signs based on pix2pixHD+YOLOv5-T. Firstly, the defogging model is generated by training the pix2pixHD network to meet the advanced visual task. Secondly, in order to better match the defogging algorithm with the target detection algorithm, the algorithm YOLOv5-Transformer is proposed by introducing a transformer module into the backbone of YOLOv5. Finally, the defogging algorithm pix2pixHD is combined with the improved YOLOv5 detection algorithm to complete the recognition of traffic signs in foggy environments. Comparative experiments proved that the traffic sign recognition algorithm proposed in this paper can effectively reduce the impact of a foggy environment on traffic sign recognition. Compared with the YOLOv5-T and YOLOv5 algorithms in moderate fog environments, the overall improvement of this algorithm is achieved. The precision of traffic sign recognition of the algorithm in the fog traffic scene reached 78.5%, the recall rate was 72.2%, and mAP@0.5 was 82.8%.

19.

Prediction of Freezing of Gait in Parkinson's disease based on multi-channel time-series neural network.

Wang, Boyan; Hu, Xuegang; Ge, Rongjun; Xu, Chenchu; Zhang, Jinglin; Gao, Zhifan; Zhao, Shu; Polat, Kemal.

Artif Intell Med ; 154: 102932, 2024 Jul 06.

Artigo em Inglês | MEDLINE | ID: mdl-39004005

RESUMO

Freezing of Gait (FOG) is a noticeable symptom of Parkinson's disease, like being stuck in place and increasing the risk of falls. The wearable multi-channel sensor system is an efficient method to predict and monitor the FOG, thus warning the wearer to avoid falls and improving the quality of life. However, the existing approaches for the prediction of FOG mainly focus on a single sensor system and cannot handle the interference between multi-channel wearable sensors. Hence, we propose a novel multi-channel time-series neural network (MCT-Net) approach to merge multi-channel gait features into a comprehensive prediction framework, alerting patients to FOG symptoms in advance. Owing to the causal distributed convolution, MCT-Net is a real-time method available to give optimal prediction earlier and implemented in remote devices. Moreover, intra-channel and inter-channel transformers of MCT-Net extract and integrate different sensor position features into a unified deep learning model. Compared with four other state-of-the-art FOG prediction baselines, the proposed MCT-Net obtains 96.21% in accuracy and 80.46% in F1-score on average 2 s before FOG occurrence, demonstrating the superiority of MCT-Net.

20.

3D mobile regression vision transformer for collateral imaging in acute ischemic stroke.

Jung, Sumin; Yang, Hyun; Kim, Hyun Jeong; Roh, Hong Gee; Kwak, Jin Tae.

Int J Comput Assist Radiol Surg ; 2024 Jul 13.

Artigo em Inglês | MEDLINE | ID: mdl-39002099

RESUMO

PURPOSE: The accurate and timely assessment of the collateral perfusion status is crucial in the diagnosis and treatment of patients with acute ischemic stroke. Previous works have shown that collateral imaging, derived from CT angiography, MR perfusion, and MR angiography, aids in evaluating the collateral status. However, such methods are time-consuming and/or sub-optimal due to the nature of manual processing and heuristics. Recently, deep learning approaches have shown to be promising for generating collateral imaging. These, however, suffer from the computational complexity and cost. METHODS: In this study, we propose a mobile, lightweight deep regression neural network for collateral imaging in acute ischemic stroke, leveraging dynamic susceptibility contrast MR perfusion (DSC-MRP). Built based upon lightweight convolution and Transformer architectures, the proposed model manages the balance between the model complexity and performance. RESULTS: We evaluated the performance of the proposed model in generating the five-phase collateral maps, including arterial, capillary, early venous, late venous, and delayed phases, using DSC-MRP from 952 patients. In comparison with various deep learning models, the proposed method was superior to the competitors with similar complexity and was comparable to the competitors of high complexity. CONCLUSION: The results suggest that the proposed model is able to facilitate rapid and precise assessment of the collateral status of patients with acute ischemic stroke, leading to improved patient care and outcome.

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA