Results 1 - 20 of 45
1.
Magn Reson Med ; 91(2): 803-818, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37849048

ABSTRACT

PURPOSE: To present a Swin Transformer-based deep learning (DL) model (SwinIR) for denoising single-delay and multi-delay 3D arterial spin labeling (ASL) and to compare its performance with convolutional neural network (CNN) and other Transformer-based methods. METHODS: SwinIR and CNN-based spatial denoising models were developed for single-delay ASL. The models were trained on 66 subjects (119 scans) and tested on 39 subjects (44 scans) from three different vendors. Spatiotemporal denoising models were developed using another dataset (6 subjects, 10 scans) of multi-delay ASL. A range of input conditions was tested for denoising single-delay and multi-delay ASL. Performance was evaluated using similarity metrics, spatial SNR, and the quantification accuracy of cerebral blood flow (CBF) and arterial transit time (ATT). RESULTS: SwinIR outperformed CNN and other Transformer-based networks, and pseudo-3D models performed better than 2D models for denoising single-delay ASL. The similarity metrics and image quality (SNR) improved with more slices in pseudo-3D models and improved further when M0 was used as an additional input, although this introduced greater biases in CBF quantification. Pseudo-3D models with three slices achieved the optimal balance between SNR and accuracy and generalized across vendors. For multi-delay ASL, spatiotemporal denoising models performed better than spatial-only models, with reduced biases in the fitted CBF and ATT maps. CONCLUSIONS: SwinIR outperformed CNN and other Transformer-based methods for denoising both single-delay and multi-delay 3D ASL data. The proposed model offers the flexibility to improve image quality and/or reduce scan time for 3D ASL, facilitating its clinical use.
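
Editor's note: the pseudo-3D input described here is commonly assembled by stacking adjacent perfusion slices as channels, optionally appending M0 as an extra channel. Below is a minimal sketch under that assumption; the function and parameter names (make_pseudo3d_inputs, n_slices) are illustrative, not from the paper.

    import numpy as np

    def make_pseudo3d_inputs(perfusion, m0=None, n_slices=3):
        """Stack n_slices adjacent slices as channels for a slice-wise denoiser.

        perfusion: (S, H, W) ASL perfusion volume; m0: optional (S, H, W) M0 volume.
        Returns (S, C, H, W); edge slices are padded by replication.
        """
        half = n_slices // 2
        padded = np.pad(perfusion, ((half, half), (0, 0), (0, 0)), mode="edge")
        stacks = [padded[i:i + n_slices] for i in range(perfusion.shape[0])]
        x = np.stack(stacks)                              # (S, n_slices, H, W)
        if m0 is not None:
            x = np.concatenate([x, m0[:, None]], axis=1)  # append M0 channel
        return x.astype(np.float32)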


Subject(s)
Deep Learning , Magnetic Resonance Imaging , Humans , Magnetic Resonance Imaging/methods , Brain/diagnostic imaging , Brain/blood supply , Spin Labels , Arteries , Cerebrovascular Circulation/physiology , Image Processing, Computer-Assisted/methods
2.
Sensors (Basel) ; 24(7)2024 Mar 28.
Article in English | MEDLINE | ID: mdl-38610392

ABSTRACT

The decipherment of ancient Chinese scripts, such as oracle bone and bronze inscriptions, holds immense significance for understanding ancient Chinese history, culture, and civilization. Despite substantial progress in recognizing oracle bone script, research on the recognition of ancient Chinese characters as a whole remains somewhat lacking. To tackle this issue, we pioneered the construction of a large-scale image dataset comprising 9233 distinct ancient Chinese characters sourced from images obtained through archaeological excavations, and we propose the first model for recognizing common ancient Chinese characters. The model consists of four stages with Linear Embedding and Swin Transformer blocks, each supplemented by a CoT block to enhance local feature extraction. We also propose a two-step enhancement strategy: first, adaptive data augmentation is applied to the original data; second, the data are randomly resampled. With a top-one accuracy of 87.25% and a top-five accuracy of 95.81%, the experimental results demonstrate that the proposed method achieves remarkable performance. Furthermore, visualization of the model's attention shows that the proposed model, trained on a large number of images, captures the morphological characteristics of ancient Chinese characters to a certain extent.
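
Editor's note: one plausible reading of the random resampling step, given 9233 classes with a long-tailed frequency distribution, is inverse-frequency sampling. A sketch using PyTorch's WeightedRandomSampler follows; the helper name balanced_loader is hypothetical.

    from collections import Counter
    import torch
    from torch.utils.data import WeightedRandomSampler, DataLoader

    def balanced_loader(dataset, labels, batch_size=64):
        # Weight each sample by the inverse frequency of its class so rare
        # characters are drawn as often as common ones.
        freq = Counter(labels)
        weights = torch.tensor([1.0 / freq[y] for y in labels], dtype=torch.double)
        sampler = WeightedRandomSampler(weights, num_samples=len(labels),
                                        replacement=True)
        return DataLoader(dataset, batch_size=batch_size, sampler=sampler)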

3.
Sensors (Basel) ; 24(12)2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38931702

ABSTRACT

To tackle the low detection accuracy on images taken by unmanned aerial vehicles (UAVs), which arises from the diverse sizes and types of objects coupled with limited feature information, we present SRE-YOLOv8. Our method enhances the YOLOv8 object detection algorithm by leveraging the Swin Transformer and a lightweight residual feature pyramid network (RE-FPN) structure. First, we introduce an optimized Swin Transformer module into the backbone network to preserve ample global contextual information during feature extraction and to extract a broader spectrum of features using self-attention. Next, we integrate a Residual Feature Augmentation (RFA) module and a lightweight attention mechanism, ECA, transforming the original FPN structure into RE-FPN and intensifying the network's emphasis on critical features. Additionally, a small object detection (SOD) layer is incorporated to enhance the network's ability to capture spatial information, improving accuracy on small objects. Finally, we employ a Dynamic Head equipped with multiple attention mechanisms in the detection head to improve performance on low-resolution targets amid complex backgrounds. Evaluation on the VisDrone2021 dataset shows a 9.2% improvement over the original YOLOv8 algorithm.
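
Editor's note: ECA (Efficient Channel Attention) is a published module; a standard implementation is sketched below for reference. How the paper wires it into RE-FPN may differ.

    import math
    import torch
    import torch.nn as nn

    class ECA(nn.Module):
        """Efficient Channel Attention: a 1-D conv over pooled channel descriptors."""
        def __init__(self, channels, gamma=2, b=1):
            super().__init__()
            t = int(abs((math.log2(channels) + b) / gamma))
            k = t if t % 2 else t + 1          # adaptive, odd kernel size
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

        def forward(self, x):                  # x: (B, C, H, W)
            y = self.pool(x)                   # (B, C, 1, 1)
            y = self.conv(y.squeeze(-1).transpose(1, 2))        # (B, 1, C)
            y = torch.sigmoid(y.transpose(1, 2).unsqueeze(-1))  # (B, C, 1, 1)
            return x * y

Placed after a pyramid convolution, such a module reweights channels at negligible cost, which matches the "lightweight" claim.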

4.
Sensors (Basel) ; 24(13)2024 Jun 24.
Article in English | MEDLINE | ID: mdl-39000865

ABSTRACT

In the realm of special equipment, significant advancements have been achieved in fault detection. Nonetheless, equipment faults manifest with diverse morphological characteristics and varying scales, and certain faults occur in localized areas and must be inferred from global information. At the same time, the intricacies of the inspection area's background easily interfere with intelligent detection. Hence, a refined YOLOv8 algorithm leveraging the Swin Transformer is proposed for detecting faults in special equipment. The Swin Transformer serves as the foundational network of the YOLOv8 framework, amplifying its ability to concentrate on comprehensive features during feature extraction, which is crucial for fault analysis. A multi-head self-attention mechanism regulated by a sliding window is used to expand the scope of the observation window. Moreover, an asymptotic feature pyramid network is introduced to augment spatial feature extraction for smaller targets. Within this architecture, adjacent low-level features are merged while high-level features are gradually integrated into the fusion process, preventing the loss or degradation of feature information during transmission and interaction and enabling accurate localization of smaller targets. Taking wheel-rail faults of lifting equipment as an illustration, the proposed method is employed to diagnose an expanded fault dataset generated through transfer learning. Experimental findings substantiate that the proposed method adeptly addresses numerous challenges encountered in the intelligent fault detection of special equipment, outperforms mainstream target detection models, and achieves real-time detection.
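
Editor's note: the window-regulated self-attention mentioned here rests on two standard Swin operations, window partitioning and cyclic shifting, sketched below in their usual form.

    import torch

    def window_partition(x, ws):
        """Split a feature map (B, H, W, C) into non-overlapping ws x ws windows."""
        B, H, W, C = x.shape
        x = x.view(B, H // ws, ws, W // ws, ws, C)
        return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

    def shift_windows(x, ws):
        """Cyclically shift the map by ws // 2 so attention crosses window borders."""
        return torch.roll(x, shifts=(-(ws // 2), -(ws // 2)), dims=(1, 2))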

5.
Sensors (Basel) ; 24(9)2024 May 01.
Article in English | MEDLINE | ID: mdl-38733010

ABSTRACT

Underwater visual detection technology is crucial for marine exploration and monitoring. Given the growing demand for accurate underwater target recognition, this study introduces an innovative architecture, YOLOv8-MU, which significantly enhances detection accuracy. The model incorporates the large kernel block (LarK block) from UniRepLKNet to optimize the backbone network, achieving a broader receptive field without increasing the model's depth. Additionally, the integration of C2fSTR, which combines the Swin Transformer with the C2f module, and the SPPFCSPC_EMA module, which blends Cross-Stage Partial Fast Spatial Pyramid Pooling (SPPFCSPC) with attention mechanisms, notably improves detection accuracy and robustness for various biological targets. A fusion block from DAMO-YOLO further enhances multi-scale feature extraction in the model's neck. Moreover, the adoption of the MPDIoU loss function, designed around vertex distances, effectively addresses the challenges of localization accuracy and boundary clarity in underwater organism detection. Experimental results on the URPC2019 dataset show that YOLOv8-MU achieves an mAP@0.5 of 78.4%, a 4.0% improvement over the original YOLOv8 model. It further achieves 80.9% on the URPC2020 dataset and 75.5% on the Aquarium dataset, surpassing other models including YOLOv5 and YOLOv8n. An evaluation on the improved URPC2019 dataset demonstrates state-of-the-art (SOTA) performance with an mAP@0.5 of 88.1%. These results highlight the model's broad applicability and generalization across various underwater datasets.
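
Editor's note: a sketch of the MPDIoU loss, following the published MPDIoU formulation (IoU penalized by the normalized squared distances between corresponding top-left and bottom-right corners); the exact integration into YOLOv8-MU may differ.

    import torch

    def mpdiou_loss(pred, target, img_w, img_h, eps=1e-7):
        """L_MPDIoU = 1 - IoU + d1^2/(w^2 + h^2) + d2^2/(w^2 + h^2).

        pred, target: (N, 4) boxes as (x1, y1, x2, y2); d1, d2 are distances
        between the top-left and bottom-right corners of the two boxes.
        """
        x1 = torch.max(pred[:, 0], target[:, 0])
        y1 = torch.max(pred[:, 1], target[:, 1])
        x2 = torch.min(pred[:, 2], target[:, 2])
        y2 = torch.min(pred[:, 3], target[:, 3])
        inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
        area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
        area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
        iou = inter / (area_p + area_t - inter + eps)
        diag2 = img_w ** 2 + img_h ** 2
        d1 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
        d2 = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2
        return (1 - (iou - d1 / diag2 - d2 / diag2)).mean()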

6.
Sensors (Basel) ; 24(2)2024 Jan 19.
Article in English | MEDLINE | ID: mdl-38276329

ABSTRACT

Fatigue driving is a serious threat to road safety, so accurately identifying fatigue driving behavior and warning drivers in time is of great significance for improving traffic safety. However, accurately recognizing fatigue driving remains challenging due to large intra-class variations in facial expression, the continuity of behaviors, and varying illumination conditions. This paper proposes a fatigue driving recognition method based on feature parameter images and a residual Swin Transformer. First, the face region is detected through spatial pyramid pooling and a multi-scale feature output module. Then, a multi-scale facial landmark detector is used to locate 23 key points on the face. The aspect ratios of the eyes and mouth are calculated from the coordinates of these key points, yielding a feature parameter matrix for fatigue driving recognition. Finally, the feature parameter matrix is converted into an image, and a residual Swin Transformer network is used to recognize fatigue driving. Experimental results on the HNUFD dataset show that the proposed method achieves an accuracy of 96.512%, outperforming state-of-the-art methods.
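
Editor's note: the paper's 23-point landmark layout is not specified; for illustration, the classic six-point eye aspect ratio (EAR) is sketched below, with the mouth aspect ratio computed analogously.

    import numpy as np

    def eye_aspect_ratio(eye):
        """EAR from six eye landmarks p1..p6 (outer, two upper, inner, two lower)."""
        a = np.linalg.norm(eye[1] - eye[5])   # vertical distance 1
        b = np.linalg.norm(eye[2] - eye[4])   # vertical distance 2
        c = np.linalg.norm(eye[0] - eye[3])   # horizontal distance
        return (a + b) / (2.0 * c)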

7.
Sensors (Basel) ; 24(11)2024 May 30.
Article in English | MEDLINE | ID: mdl-38894331

ABSTRACT

In view of the frequent failures of rolling bearings, the strong background noise present in their signals, and the weakness of fault features that makes them difficult to extract, a method for enhancing and diagnosing rolling bearing faults based on coarse-grained lattice features (CGLFs) is proposed. First, the vibration signals of the bearings are adaptively filtered to eliminate background noise. Second, the signals are transformed to the frequency domain and the spectrum is segmented in a coarse-grained fashion; within each segment, amplitude-enhancement operations are executed, transforming the data into a CGLF graph with enhanced fault characteristics. This graph is then fed into a Swin Transformer-based pattern-recognition network. Third and finally, a high-precision fault diagnosis model is constructed using fully connected layers and Softmax, enabling the diagnosis of bearing faults. The fault recognition accuracy reaches 98.30% and 98.50% on public datasets and laboratory data, respectively, validating the feasibility and effectiveness of the proposed method. This research offers an efficient and feasible fault diagnosis approach for rolling bearings.
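
Editor's note: the abstract does not define the CGLF construction precisely; one plausible reading (segment the magnitude spectrum, enhance each segment's amplitudes, stack the segments into a 2-D map) is sketched below. The log enhancement and normalization are assumptions.

    import numpy as np

    def cglf_image(signal, n_segments=64):
        """One reading of the CGLF construction: coarse-grained spectrum
        segmentation with per-segment amplitude enhancement."""
        spec = np.abs(np.fft.rfft(signal))
        seg_len = len(spec) // n_segments
        spec = spec[:seg_len * n_segments].reshape(n_segments, seg_len)
        spec = np.log1p(spec)                         # assumed enhancement
        spec /= spec.max(axis=1, keepdims=True) + 1e-12
        return spec.astype(np.float32)                # image-like network input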

8.
Sensors (Basel) ; 24(15)2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39124022

ABSTRACT

Autonomous driving technology has become widely prevalent, and intelligent vehicles are equipped with various sensors (e.g., vision sensors, LiDAR, and depth cameras). Among them, vision systems with tailored semantic segmentation and perception algorithms play a critical role in scene understanding. However, traditional supervised semantic segmentation requires a large number of pixel-level manual annotations for model training. Although few-shot methods reduce the annotation work to some extent, they remain labor intensive. In this paper, a self-supervised few-shot semantic segmentation method based on Multi-task Learning and Dense Attention Computation (dubbed MLDAC) is proposed. The salient part of an image is split into two parts: one serves as the support mask for few-shot segmentation, while cross-entropy losses are calculated between the other part and the entire region against the predicted results separately, as multi-task learning, to improve the model's generalization ability. The Swin Transformer is used as the backbone to extract feature maps at different scales. These feature maps are then fed into multiple levels of dense attention computation blocks to enhance pixel-level correspondence. The final prediction is obtained through inter-scale mixing and feature skip connections. Experimental results indicate that MLDAC obtains 55.1% and 26.8% one-shot mIoU for self-supervised few-shot segmentation on the PASCAL-5i and COCO-20i datasets, respectively. In addition, it achieves 78.1% on the FSS-1000 few-shot dataset, proving its efficacy.

9.
Sensors (Basel) ; 24(4)2024 Feb 18.
Article in English | MEDLINE | ID: mdl-38400467

ABSTRACT

In this paper, we propose a novel method for monocular depth estimation using an hourglass neck module. The proposed method is original in the following respects. First, feature maps are extracted from Swin Transformer V2 using a masked image modeling (MIM) pretrained model. Since Swin Transformer V2 uses a different patch size at each attention stage, the vision transformer (ViT)-based encoder can more easily extract both local and global features from the input images. Second, to maintain the polymorphism and local inductive bias of the feature map extracted from Swin Transformer V2, the feature map is fed into the hourglass neck module. Third, deformable attention is used at the waist of the hourglass neck module to reduce the computation cost and highlight the locality of the feature map. Finally, the feature map traverses the neck and proceeds through a decoder, composed of a deconvolution layer and an upsampling layer, to generate a depth image. To evaluate the proposed method objectively, we compared it on the NYU Depth V2 dataset against methods published in other papers. The proposed method achieved an RMSE of 0.274, lower than the values reported in other papers; since a lower RMSE indicates better depth estimation, this demonstrates its efficiency compared with other techniques.
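
Editor's note: the RMSE figure quoted above is the standard depth-estimation metric on NYU Depth V2, computed over valid ground-truth pixels as below.

    import numpy as np

    def depth_rmse(pred, gt):
        """Root-mean-square error over pixels with valid ground truth."""
        mask = gt > 0                      # ignore pixels without ground truth
        return float(np.sqrt(np.mean((pred[mask] - gt[mask]) ** 2)))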

10.
Sensors (Basel) ; 24(3)2024 Jan 24.
Article in English | MEDLINE | ID: mdl-38339469

ABSTRACT

Deep learning (DL) in magnetic resonance imaging (MRI) shows excellent performance in image reconstruction from undersampled k-space data. Artifact-free, high-quality MRI reconstruction is essential for ensuring accurate diagnosis, supporting clinical decision-making, enhancing patient safety, facilitating efficient workflows, and contributing to the validity of research studies and clinical trials. Recently, deep learning has demonstrated several advantages over conventional MRI reconstruction methods. Conventional methods rely on manual feature engineering to capture complex patterns and are usually computationally demanding due to their iterative nature. Conversely, DL methods use neural networks with hundreds of thousands of parameters to automatically learn relevant features and representations directly from the data. Nevertheless, DL-based techniques for MRI reconstruction have some limitations, such as the need for large labeled datasets, the possibility of overfitting, and the complexity of model training. Researchers are striving to develop DL models that are more efficient, adaptable, and capable of providing valuable information for medical practitioners. We provide a comprehensive overview of current developments and clinical uses, focusing on state-of-the-art DL architectures and tools used in MRI reconstruction. This study has three objectives. Our main objective is to describe how various DL designs have evolved over time and to discuss cutting-edge strategies, including their advantages and disadvantages; to this end, data pre- and post-processing approaches are assessed using publicly available MRI datasets and source code. Second, this work provides an extensive overview of ongoing research on transformers and deep convolutional neural networks for rapid MRI reconstruction. Third, we discuss several network training strategies, such as supervised, unsupervised, transfer, and federated learning, for rapid and efficient MRI reconstruction. Consequently, this article provides significant resources for the future improvement of MRI data pre-processing and fast image reconstruction.
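
Editor's note: as background for the reconstruction task reviewed here, the conventional zero-filled baseline that DL methods improve upon is a masked inverse FFT of k-space; a minimal sketch follows (the fftshift convention shown is one common choice).

    import numpy as np

    def zero_filled_recon(kspace, mask):
        """Baseline: inverse FFT of the undersampled (masked) k-space.
        DL methods learn to remove the aliasing this zero-filling leaves behind."""
        return np.abs(np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace * mask))))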


Subject(s)
Deep Learning , Image Processing, Computer-Assisted , Magnetic Resonance Imaging , Magnetic Resonance Imaging/methods , Humans , Image Processing, Computer-Assisted/methods , Neural Networks, Computer , Algorithms
11.
Chin J Traumatol ; 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38762418

ABSTRACT

PURPOSE: Intertrochanteric fracture (ITF) classification is crucial for surgical decision-making, yet orthopedic trauma surgeons have shown lower-than-expected accuracy in ITF classification. The objective of this study was to use an artificial intelligence (AI) method to improve the accuracy of ITF classification. METHODS: All images were classified according to the AO/OTA 2018 classification system by 2 experienced trauma surgeons and verified by another expert in the field. Based on actual clinical needs and after discussion, we merged 8 subgroups into 5 new subgroups, and the dataset was divided into training, validation, and test sets at a ratio of 8:1:1. We trained a network called YOLOX-SwinT, based on the You Only Look Once X (YOLOX) object detection network with the Swin Transformer (SwinT) as its backbone, using 762 radiographic ITF examinations as the training set. Subsequently, we recruited 5 senior orthopedic trauma surgeons (SOTS) and 5 junior orthopedic trauma surgeons (JOTS) to classify the 85 original images in the test set, and then the same images accompanied by the network's predictions. Statistical analysis was performed using the Statistical Package for the Social Sciences (SPSS) 20.0 (IBM Corp., Armonk, NY, USA) to compare the SOTS, JOTS, SOTS + AI, JOTS + AI, SOTS + JOTS, and SOTS + JOTS + AI groups. RESULTS: The mean average precision at an intersection over union (IoU) of 0.5 (mAP50) for subgroup detection reached 90.29%. The classification accuracies of the SOTS, JOTS, SOTS + AI, and JOTS + AI groups were 56.24% ± 4.02%, 35.29% ± 18.07%, 79.53% ± 7.14%, and 71.53% ± 5.22%, respectively. Paired t-tests showed statistically significant differences between the SOTS and SOTS + AI groups, between the JOTS and JOTS + AI groups, and between the SOTS + JOTS and SOTS + JOTS + AI groups; the SOTS + JOTS versus SOTS + JOTS + AI difference was also statistically significant in each subgroup (all p < 0.05). Independent-samples t-tests showed a statistically significant difference between the SOTS and JOTS groups but not between the SOTS + AI and JOTS + AI groups. With AI assistance, the subgroup classification accuracy of both SOTS and JOTS improved significantly, and JOTS reached the same level as SOTS. CONCLUSION: The YOLOX-SwinT network algorithm enhances the accuracy of AO/OTA subgroup classification of ITF by orthopedic trauma surgeons.
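
Editor's note: the paired comparison used above can be reproduced with SciPy; the accuracy values below are hypothetical placeholders, not the study's data.

    from scipy import stats

    # Accuracy of each surgeon without and with AI assistance (hypothetical).
    acc_alone = [0.58, 0.52, 0.55, 0.60, 0.56]
    acc_with_ai = [0.81, 0.74, 0.78, 0.85, 0.80]

    t, p = stats.ttest_rel(acc_with_ai, acc_alone)   # paired t-test
    print(f"t = {t:.2f}, p = {p:.4f}")               # p < 0.05 -> significant gain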

12.
Article in English | MEDLINE | ID: mdl-38619385

ABSTRACT

Dementia, an increasingly prevalent neurological disorder with a projected threefold rise globally by 2050, necessitates early detection for effective management. The risk notably increases after age 65. Dementia leads to a progressive decline in cognitive functions, affecting memory, reasoning, and problem-solving abilities. This decline can impact the individual's ability to perform daily tasks and make decisions, underscoring the crucial importance of timely identification. With the advent of technologies like computer vision and deep learning, the prospect of early detection becomes even more promising. Employing sophisticated algorithms on imaging data, such as positron emission tomography scans, facilitates the recognition of subtle structural brain changes, enabling diagnosis at an earlier stage for potentially more effective interventions. In an experimental study, the Swin transformer algorithm demonstrated superior overall accuracy compared to the vision transformer and convolutional neural network, emphasizing its efficiency. Detecting dementia early is essential for proactive management, personalized care, and implementing preventive measures, ultimately enhancing outcomes for individuals and lessening the overall burden on healthcare systems.

13.
Sci Rep ; 14(1): 9127, 2024 04 21.
Article in English | MEDLINE | ID: mdl-38644396

ABSTRACT

Vitiligo is a hypopigmented skin disease characterized by the loss of melanin. Its progressive nature and widespread incidence necessitate timely and accurate detection. A single diagnostic test often falls short of definitive confirmation, requiring assessment by dermatologists who specialize in vitiligo; however, the current scarcity of such specialists presents a significant challenge. To mitigate this issue and enhance diagnostic accuracy, it is essential to build deep learning models that can support and expedite the detection process. This study establishes a deep learning framework to enhance the diagnostic accuracy of vitiligo. To this end, a comparative analysis of five models, comprising the ResNet series (ResNet34, ResNet50, and ResNet101) and the Swin Transformer series (Swin Transformer Base and Swin Transformer Large), was conducted under uniform conditions to identify the model with superior classification capability. The study also sought to augment the interpretability of these models by selecting one that not only provides accurate diagnostic outcomes but also offers visual cues highlighting the regions pertinent to vitiligo. The empirical findings reveal that the Swin Transformer Large model achieved the best classification performance, with an AUC, accuracy, sensitivity, and specificity of 0.94, 93.82%, 94.02%, and 93.5%, respectively. In terms of interpretability, the highlighted regions in the class activation map correspond to the lesion regions of the vitiligo images, showing that the model effectively indicates the category-specific regions underlying the dermatological diagnosis. Additionally, visualization of the feature maps generated in the middle layers provides insight into the model's internal mechanisms, which is valuable for improving interpretability, tuning performance, and enhancing clinical applicability. These outcomes underscore the significant potential of deep learning models to improve diagnostic accuracy and operational efficiency in medical diagnosis, and highlight the need for ongoing exploration to fully leverage deep learning technologies in medical diagnostics.


Subject(s)
Deep Learning , Vitiligo , Vitiligo/diagnosis , Humans
14.
Sci Rep ; 14(1): 4577, 2024 Feb 25.
Article in English | MEDLINE | ID: mdl-38403711

ABSTRACT

Change detection in remote sensing image processing is both difficult and important. It is extensively used in a variety of sectors, including land resource planning, monitoring and forecasting of agricultural plant health, and monitoring and assessment of natural disasters. Remote sensing images provide a large amount of long-term, fully covered data for earth environmental monitoring, and much progress has been made thanks to the rapid development of deep learning. However, the majority of deep learning-based change detection techniques currently in use rely on the well-known convolutional neural network (CNN); given the locality of the convolution operation, a CNN is unable to capture the interplay between global and distant semantic information. Some research has employed the Vision Transformer as a backbone in the remote sensing field. Inspired by this work, we propose Siam-Swin-Unet, a Siamese pure-Transformer network with a U-shaped construction for remote sensing image change detection. The Swin Transformer is a hierarchical vision transformer with shifted windows that can extract global features. To learn both local and global semantic feature information, the dual-time images are fed into Siam-Swin-Unet, which is composed of a Swin Transformer, a U-Net Siamese network, and two feature fusion modules; we adopted U-Net and the Siamese design because both are effective for change detection. The feature fusion module is designed to fuse dual-time image features and is efficient and low-compute, as confirmed by our experiments. Our network achieved an F1 score of 94.67 on the CDD dataset (season-varying).
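
Editor's note: the Siamese idea (one shared encoder applied to both dates, features fused per scale) can be sketched as follows. The concat-plus-absolute-difference fusion is an assumption for illustration, not the paper's exact module.

    import torch
    import torch.nn as nn

    class SiamFuse(nn.Module):
        """Shared encoder for both acquisition dates; fuse features for change maps."""
        def __init__(self, encoder, channels):
            super().__init__()
            self.encoder = encoder                   # e.g., a Swin backbone
            self.fuse = nn.Conv2d(channels * 3, channels, kernel_size=1)

        def forward(self, img_t1, img_t2):
            f1, f2 = self.encoder(img_t1), self.encoder(img_t2)  # shared weights
            return self.fuse(torch.cat([f1, f2, (f1 - f2).abs()], dim=1))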

15.
Nat Sci Sleep ; 16: 879-896, 2024.
Article in English | MEDLINE | ID: mdl-38974693

ABSTRACT

Purpose: This study aims to improve brain age estimation by developing a novel deep learning model utilizing overnight electroencephalography (EEG) data. Methods: We address limitations of current brain age prediction methods by proposing a model trained and evaluated on data from multiple cohorts covering a broad age range. The model employs a one-dimensional Swin Transformer to efficiently extract complex patterns from sleep EEG signals and a convolutional neural network with attention mechanisms to summarize sleep structural features. A multi-flow learning-based framework attentively merges these two features, employing sleep structural information to direct and augment the EEG features. A post-prediction model is designed to integrate the age-related features across the night. Furthermore, we propose a DecadeCE loss function to address the problem of an uneven age distribution. Results: We utilized 18,767 polysomnograms (PSGs) from 13,616 subjects to develop and evaluate the proposed model. The model achieves a mean absolute error (MAE) of 4.19 years and a correlation of 0.97 on the mixed-cohort test set, and an MAE of 6.18 years and a correlation of 0.78 on an independent test set. Our brain age estimation reduced the error by more than 1 year compared with other EEG-based studies, reaching the level of neuroimaging-based methods. The estimated brain age index demonstrated longitudinal sensitivity and was significantly higher, by 1.27 years, in individuals with psychiatric or neurological disorders relative to healthy individuals. Conclusion: The multi-flow deep learning model proposed in this study, based on overnight EEG, represents a more accurate approach to estimating brain age. The use of overnight sleep EEG for brain age prediction is both cost-effective and adept at capturing dynamic changes. These findings demonstrate the potential of EEG for predicting brain age, presenting a noninvasive and accessible method for assessing brain aging.

16.
Transl Oncol ; 46: 102034, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38875936

ABSTRACT

BACKGROUND: For pediatric patients with solid abdominal tumors, early diagnosis can guide clinical treatment decisions, and comprehensive preoperative evaluation is essential to reduce surgical risk. The aim of this study was to explore the feasibility of a transformer based on multiphase contrast-enhanced CT for the early diagnosis of tumors and the prediction of surgical risk events (SRE). METHODS: A total of 496 pediatric patients with solid abdominal tumors were enrolled in the study. Using the Swin Transformer, we constructed and trained two Swin-T models based on preoperative multiphase enhanced CT for personalized prediction of tumor type and SRE status. We then comprehensively evaluated the performance of each model and constructed four benchmark models for comparison. RESULTS: There was no significant difference in SRE status between tumor types. In the diagnostic task, the areas under the receiver operating characteristic curve (AUC) of the Swin-T model were 0.987 (95% CI, 0.973-0.997) and 0.844 (95% CI, 0.730-0.940) in the training and validation cohorts, respectively. In predicting SRE, the AUCs of the Swin-T model were 0.920 (95% CI, 0.885-0.948) and 0.741 (95% CI, 0.632-0.838) in the training and test cohorts, respectively. The Swin-T model achieved the best performance in both classification tasks compared with the benchmark models. CONCLUSION: The Swin-T model is a promising tool to assist pediatricians in the differential diagnosis of abdominal tumors and in comprehensive preoperative evaluation.
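
Editor's note: confidence intervals like those quoted above are commonly obtained by percentile bootstrap over the AUC; one such sketch (the study's exact CI method is not stated):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    def auc_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
        """Percentile-bootstrap 95% CI for the AUC."""
        rng = np.random.default_rng(seed)
        y_true, y_score = np.asarray(y_true), np.asarray(y_score)
        aucs = []
        for _ in range(n_boot):
            idx = rng.integers(0, len(y_true), len(y_true))
            if len(np.unique(y_true[idx])) < 2:   # need both classes present
                continue
            aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
        lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
        return roc_auc_score(y_true, y_score), lo, hi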

17.
Cancers (Basel) ; 16(5)2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38473348

ABSTRACT

Oral cancer, a pervasive and rapidly growing malignant disease, poses a significant global health concern. Early and accurate diagnosis is pivotal for improving patient outcomes. Automatic diagnosis methods based on artificial intelligence have shown promising results in the oral cancer field, but accuracy still needs to improve for realistic diagnostic scenarios. Vision Transformers (ViT) have recently outperformed CNN models in many computer vision benchmark tasks. This study explores the effectiveness of the Vision Transformer and the Swin Transformer, two cutting-edge variants of the transformer architecture, for mobile-based oral cancer image classification. The pre-trained Swin Transformer model achieved 88.7% accuracy in the binary classification task, outperforming the ViT model by 2.3%, while the conventional convolutional network models VGG19 and ResNet50 achieved 85.2% and 84.5% accuracy, respectively. Our experiments demonstrate that these transformer-based architectures outperform traditional convolutional neural networks for oral cancer image classification and underscore the potential of the ViT and the Swin Transformer in advancing the state of the art in oral cancer image analysis.
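
Editor's note: the pre-trained-then-fine-tune setup described here can be reproduced with the timm library; the specific Swin variant below is an assumption, since the paper does not name one.

    import timm
    import torch.nn as nn

    # Load an ImageNet-pretrained Swin and replace the head for binary
    # classification (normal vs. oral cancer).
    model = timm.create_model("swin_base_patch4_window7_224",
                              pretrained=True, num_classes=2)
    criterion = nn.CrossEntropyLoss()
    # Fine-tune end-to-end on oral cancer images resized to 224 x 224.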

18.
Health Inf Sci Syst ; 12(1): 33, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38685986

ABSTRACT

White blood cells (WBCs) play an effective role in the body's defense against parasites, viruses, and bacteria. WBCs are categorized into various subgroups based on their morphological structure, and the counts of these WBC types differ between healthy and diseased people, so WBC classification is quite significant for medical diagnosis. With the widespread use of deep learning in medical image analysis in recent years, it has also been applied to WBC classification. Moreover, the recently introduced ConvMixer and Swin Transformer models have achieved significant success by efficiently capturing long-range contextual characteristics. Building on this, a new multipath hybrid network using the ConvMixer and the Swin Transformer is proposed for WBC classification, called the Swin Transformer and ConvMixer based Multipath mixer (SC-MP-Mixer). In the SC-MP-Mixer, features with strong spatial details are first extracted with the ConvMixer; the Swin Transformer then handles these features effectively with its self-attention mechanism. In addition, the ConvMixer and Swin Transformer blocks have a multipath structure to obtain better patch representations. To test the performance of the SC-MP-Mixer, experiments were performed on three WBC datasets with 4 (BCCD), 8 (PBC), and 5 (Raabin) classes. The experiments yielded an accuracy of 99.65% for PBC, 98.68% for Raabin, and 95.66% for BCCD. Compared with studies in the literature and state-of-the-art models, the SC-MP-Mixer achieved more effective classification results.
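
Editor's note: for reference, a standard ConvMixer block (per the ConvMixer paper: residual depthwise convolution for spatial mixing, then a pointwise convolution for channel mixing) is sketched below; the multipath arrangement in SC-MP-Mixer is specific to the paper and not reproduced here.

    import torch.nn as nn

    class Residual(nn.Module):
        """Adds the block input back to its output (skip connection)."""
        def __init__(self, fn):
            super().__init__()
            self.fn = fn
        def forward(self, x):
            return self.fn(x) + x

    def conv_mixer_block(dim, kernel_size=9):
        """One ConvMixer block: spatial mixing then channel mixing."""
        return nn.Sequential(
            Residual(nn.Sequential(
                nn.Conv2d(dim, dim, kernel_size, groups=dim, padding="same"),
                nn.GELU(),
                nn.BatchNorm2d(dim))),
            nn.Conv2d(dim, dim, kernel_size=1),
            nn.GELU(),
            nn.BatchNorm2d(dim))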

19.
Phys Med Biol ; 69(10)2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38604178

ABSTRACT

Objective. Cardiac computed tomography (CT) is widely used for the diagnosis of cardiovascular disease, the leading cause of morbidity and mortality in the world. Diagnostic performance depends strongly on the temporal resolution of the CT images. To image the beating heart, one can reduce the scanning time by acquiring limited-angle projections. However, this leads to increased image noise and limited-angle-related artifacts. The goal of this paper is to reconstruct high-quality cardiac CT images from limited-angle projections. Approach. The ability to reconstruct high-quality images from limited-angle projections is highly desirable and remains a major challenge. With the development of deep learning networks such as U-Net and transformer networks, progress has been made in image reconstruction and processing. Here we propose a hybrid model based on the U-Net and Swin Transformer (U-Swin) networks: the U-Net can restore structural information lost to missing projection data and related artifacts, and the Swin Transformer then gathers a detailed global feature distribution. Main results. Using synthetic XCAT and clinical cardiac COCA datasets, we demonstrate that our proposed method outperforms state-of-the-art deep learning-based methods. Significance. The method has great potential to freeze the beating heart with a higher temporal resolution.


Subject(s)
Heart , Image Processing, Computer-Assisted , Tomography, X-Ray Computed , Image Processing, Computer-Assisted/methods , Heart/diagnostic imaging , Humans , Deep Learning
20.
Comput Biol Med ; 170: 108090, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38320341

ABSTRACT

The U-shaped convolutional neural network (CNN) has attained remarkable achievements in skin lesion segmentation. However, given the inherent locality of convolution, this architecture cannot effectively capture long-range pixel dependencies and multiscale global contextual information. Moreover, repeated convolutions and downsampling operations can readily result in the omission of intricate local fine-grained details. In this paper, we propose DBNet-SI, a U-shaped network equipped with a dual-branch module that combines shift window attention and inception structures. First, we propose the dual-branch module (MSI) to better capture multiscale global contextual information and long-range pixel dependencies. Specifically, we devised a cross-branch bidirectional interaction module within the MSI module to enable information complementarity between the two branches in the channel and spatial dimensions, so MSI can extract distinguishing and comprehensive features to accurately identify skin lesion boundaries. Second, we devised a progressive feature enhancement and information compensation module (PFEIC), which progressively compensates for fine-grained features through reconstructed skip connections and integrated global context attention modules. The experimental results show the superior segmentation performance of DBNet-SI compared with other deep learning models on the ISIC2017 and ISIC2018 datasets. Ablation studies demonstrate that our model can effectively extract rich multiscale global contextual information and compensate for the loss of local details.


Subject(s)
Neural Networks, Computer , Skin Diseases , Humans , Skin Diseases/diagnostic imaging , Image Processing, Computer-Assisted