ABSTRACT
Introduction: Currently, using machine learning methods for the precise analysis and improvement of swimming techniques holds significant research value and application prospects. Existing machine learning methods have improved the accuracy of action recognition to some extent. However, they still face several challenges, such as insufficient data feature extraction, limited model generalization ability, and poor real-time performance. Methods: To address these issues, this paper proposes an innovative approach called Swimtrans Net: a multimodal robotic system for swimming action recognition driven by Swin Transformer. By leveraging the powerful visual feature extraction capabilities of Swin Transformer, Swimtrans Net effectively extracts swimming image information. Additionally, to meet the requirements of multimodal tasks, we integrate the CLIP model into the system. Swin Transformer serves as the image encoder for CLIP, and through fine-tuning, the CLIP model becomes capable of understanding and interpreting swimming action data, learning the features and patterns associated with swimming. Finally, we introduce transfer learning for pre-training to reduce training time and computational resource requirements, thereby providing real-time feedback to swimmers. Results and discussion: Experimental results show that Swimtrans Net achieves a 2.94% improvement over current state-of-the-art methods in swimming motion analysis and prediction. This study introduces an innovative machine learning method that can help coaches and swimmers better understand and improve swimming techniques, ultimately improving swimming performance.
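The abstract does not specify implementation details, but the idea of using a Swin backbone as the image tower of a CLIP-style model can be illustrated with a short PyTorch sketch. This is a minimal, hypothetical example assuming the timm package is available; the class names, embedding dimension, and temperature are illustrative and are not taken from Swimtrans Net.

```python
import torch
import torch.nn as nn
import timm  # assumed available; provides Swin backbones


class SwinCLIPImageEncoder(nn.Module):
    """Hypothetical sketch: Swin Transformer as the image tower of a CLIP-style model."""

    def __init__(self, embed_dim: int = 512):
        super().__init__()
        # num_classes=0 makes timm return pooled features instead of class logits.
        # Set pretrained=True to start from ImageNet weights.
        self.backbone = timm.create_model(
            "swin_tiny_patch4_window7_224", pretrained=False, num_classes=0
        )
        self.proj = nn.Linear(self.backbone.num_features, embed_dim)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(images)              # (B, num_features)
        embeds = self.proj(feats)                  # (B, embed_dim)
        return nn.functional.normalize(embeds, dim=-1)


def contrastive_logits(image_embeds, text_embeds, temperature=0.07):
    """CLIP-style similarity matrix between image and text embeddings."""
    return image_embeds @ text_embeds.t() / temperature


if __name__ == "__main__":
    encoder = SwinCLIPImageEncoder()
    images = torch.randn(4, 3, 224, 224)                                 # dummy swimming frames
    text_embeds = nn.functional.normalize(torch.randn(4, 512), dim=-1)   # stand-in text features
    logits = contrastive_logits(encoder(images), text_embeds)
    print(logits.shape)  # torch.Size([4, 4])
```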
ABSTRACT
Sorting plastic waste (PW) from municipal solid waste (MSW) by material type is crucial for reuse and pollution reduction. However, current automatic separation methods are costly and inefficient, necessitating an advanced sorting process to ensure high feedstock purity. This study introduces a Swin Transformer-based model for effectively detecting PW in real-world MSW streams, leveraging both morphological and material properties. In addition, a dataset comprising 3560 optical images and infrared spectral data was created to support this task. This vision-based system can localize and classify PW into five categories: polypropylene (PP), polyethylene (PE), polyethylene terephthalate (PET), polyvinyl chloride (PVC), and polystyrene (PS). Performance evaluations reveal an accuracy of 99.75% and a mean Average Precision (mAP50) exceeding 91%. Compared with popular convolutional neural network (CNN)-based models, the well-trained Swin Transformer-based model offers greater convenience and better performance in the five-category PW detection task, maintaining an mAP50 above 80% in real-life deployment. The model's effectiveness is further supported by visualization of detection results on MSW streams and principal component analysis of classification scores. These results demonstrate the system's effectiveness in both lab-scale and real-life conditions, aligning with global regulations and strategies that promote innovative technologies for plastic recycling, thereby contributing to the development of a sustainable circular economy.
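The paper describes a full detection pipeline; as a much simpler illustration of the classification component only, the following sketch (assuming torchvision >= 0.13) fine-tunes a Swin-T head for the five plastic categories on dummy data. It is not the authors' detector.

```python
import torch
import torch.nn as nn
from torchvision.models import swin_t  # torchvision >= 0.13 assumed

PLASTIC_CLASSES = ["PP", "PE", "PET", "PVC", "PS"]

# Swin-T backbone; pass weights="IMAGENET1K_V1" instead of None to start from ImageNet weights.
model = swin_t(weights=None)
model.head = nn.Linear(model.head.in_features, len(PLASTIC_CLASSES))   # 5-way head

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on dummy data (real inputs would be cropped PW regions).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, len(PLASTIC_CLASSES), (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(float(loss))
```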
ABSTRACT
Lung adenocarcinoma and squamous cell carcinoma are the two most common pathological subtypes of lung cancer. Accurate diagnosis and pathological subtyping are crucial for lung cancer treatment. Solitary solid lung nodules with lobulation and spiculation signs are often indicative of lung cancer; however, in some cases, postoperative pathology reveals benign solid lung nodules. It is therefore critical to accurately identify solid lung nodules with lobulation and spiculation signs before surgery; however, traditional diagnostic imaging is prone to misdiagnosis, and studies on artificial intelligence-assisted diagnosis are few. Therefore, we introduce a volumetric Swin Transformer-based method: a multi-scale, multi-task, and highly interpretable model for distinguishing between benign solid lung nodules with lobulation and spiculation signs, lung adenocarcinoma, and lung squamous cell carcinoma. The technique's effectiveness was improved by using 3-dimensional (3D) computed tomography (CT) images instead of conventional 2-dimensional (2D) images to incorporate as much information as possible. The model was trained on 352 of the 441 CT image sequences and validated on the rest. The experimental results showed that our model could accurately differentiate between benign lung nodules with lobulation and spiculation signs, lung adenocarcinoma, and squamous cell carcinoma. On the test set, our model achieved an accuracy of 0.9888, precision of 0.9892, recall of 0.9888, and F1-score of 0.9888, along with class activation mapping (CAM) visualization of the 3D model. Consequently, our method could be used as a preoperative tool to assist in accurately diagnosing solitary solid lung nodules with lobulation and spiculation signs and provide a theoretical basis for developing appropriate clinical diagnosis and treatment plans for patients.
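For readers unfamiliar with volumetric transformers, a minimal sketch of a 3D Swin classifier for the three nodule classes is shown below, assuming torchvision >= 0.15 (which provides the video Swin3D models). The input size and single-channel handling are illustrative and do not reflect the paper's actual multi-scale, multi-task architecture.

```python
import torch
import torch.nn as nn
from torchvision.models.video import swin3d_t  # torchvision >= 0.15 assumed

NUM_CLASSES = 3  # benign nodule, adenocarcinoma, squamous cell carcinoma

model = swin3d_t(weights=None)                           # video Swin-T applied to 3D CT volumes
model.head = nn.Linear(model.head.in_features, NUM_CLASSES)

# A CT sub-volume cropped around the nodule; the single HU channel is repeated to
# match the 3-channel stem (shape: batch, channels, depth, height, width).
volume = torch.randn(1, 1, 16, 224, 224).repeat(1, 3, 1, 1, 1)
logits = model(volume)
print(logits.shape)  # torch.Size([1, 3])
```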
ABSTRACT
Lung and colon cancer (LCC) is a dominant life-threatening disease that requires timely attention and precise diagnosis for efficient treatment. Conventional diagnostic techniques for LCC regularly encounter constraints in terms of efficiency and accuracy, causing challenges in early recognition and treatment. Early diagnosis of the disease can greatly reduce the probability of death. In medical practice, the histopathological study of tissue samples generally follows a classical workflow, but automated systems that exploit artificial intelligence (AI) techniques produce efficient results in disease diagnosis. In histopathology, both machine learning (ML) and deep learning (DL) approaches can be deployed owing to their latent ability to analyze and predict physically accurate molecular phenotypes and microsatellite instability. Against this background, this study presents a novel technique called Lung and Colon Cancer using a Swin Transformer with an Ensemble Model on Histopathological Images (LCCST-EMHI). The proposed LCCST-EMHI method focuses on designing a DL model for the diagnosis and classification of LCC using histopathological images (HI). To achieve this, the LCCST-EMHI model uses the bilateral filtering (BF) technique to remove noise. The Swin Transformer (ST) model is then employed for feature extraction. For the LCC detection and classification process, an ensemble deep learning classifier is used comprising three techniques: bidirectional long short-term memory with multi-head attention (BiLSTM-MHA), Double Deep Q-Network (DDQN), and sparse stacked autoencoder (SSAE). Finally, hyperparameter selection for the three DL models is performed using the walrus optimization algorithm (WaOA). To illustrate the promising performance of the LCCST-EMHI approach, an extensive range of simulation analyses was conducted on a benchmark dataset. The experimental results demonstrated the promising performance of the LCCST-EMHI approach over other recent methods.
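A rough sketch of the first two stages described above (bilateral filtering followed by Swin feature extraction) is shown below, assuming OpenCV and timm are available; the filter parameters and input handling are illustrative, and the downstream ensemble classifiers are omitted.

```python
import cv2                     # OpenCV, assumed available
import numpy as np
import timm
import torch

# Stand-in for a histopathology tile; a real pipeline would read the actual image.
tile = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)

# Bilateral filtering: smooths noise while preserving tissue edges.
# The parameters (diameter 9, sigmaColor 75, sigmaSpace 75) are illustrative.
denoised = cv2.bilateralFilter(tile, 9, 75, 75)

# Swin Transformer as a feature extractor (num_classes=0 returns pooled features).
backbone = timm.create_model("swin_tiny_patch4_window7_224", pretrained=False, num_classes=0)
backbone.eval()

x = torch.from_numpy(denoised).float().permute(2, 0, 1).unsqueeze(0) / 255.0
x = torch.nn.functional.interpolate(x, size=(224, 224), mode="bilinear", align_corners=False)
with torch.no_grad():
    features = backbone(x)     # (1, 768) vector that would feed the ensemble classifiers
print(features.shape)
```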
ABSTRACT
Infertility affects a significant number of people, and assisted reproduction technology has been shown to ease infertility problems. In vitro fertilization (IVF) is one of the best options, and its success relies on selecting a high-quality embryo for transfer. This has traditionally been done manually by examining embryos under a microscope. The conventional morphological evaluation of embryos has predictable disadvantages, including being labor- and time-intensive and carrying risks of bias related to the subjective assessments of individual embryologists. Various computer vision (CV) and artificial intelligence (AI) techniques and devices have recently been applied in fertility clinics to improve efficacy. AI refers to the imitation of intellectual performance and the capability of technologies to simulate the cognitive learning, thinking, and problem-solving typically associated with humans. Deep learning (DL) and machine learning (ML) are advanced AI algorithms used in various fields and are considered the main algorithms for future human-assistive technology. This study presents an Embryo Development and Morphology Using a Computer Vision-Aided Swin Transformer with a Boosted Dipper-Throated Optimization (EDMCV-STBDTO) technique. The EDMCV-STBDTO technique aims to accurately and efficiently detect embryo development, which is critical for improving fertility treatments and advancing developmental biology, using medical CV techniques. First, the EDMCV-STBDTO method performs image preprocessing using a bilateral filter (BF) to remove noise. Next, the Swin Transformer method is implemented for feature extraction. The EDMCV-STBDTO model then employs a variational autoencoder (VAE) to classify human embryo development. Finally, hyperparameter selection for the VAE is performed using the boosted dipper-throated optimization (BDTO) technique. The efficiency of the EDMCV-STBDTO method is validated by comprehensive studies on a benchmark dataset. The experimental results show that the EDMCV-STBDTO method performs better than recent techniques.
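The abstract does not detail how a VAE is used as a classifier; one plausible reading, sketched below purely as an assumption, attaches a classification head to the VAE latent code and trains reconstruction, KL, and cross-entropy terms jointly. The feature dimension, number of classes, and loss weighting are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ClassifyingVAE(nn.Module):
    """Hypothetical sketch: a VAE whose latent code also feeds a classification head,
    roughly in the spirit of using a VAE to classify embryo development stages."""

    def __init__(self, in_dim=768, latent_dim=32, num_classes=5):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, in_dim))
        self.classifier = nn.Linear(latent_dim, num_classes)

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.dec(z), self.classifier(z), mu, logvar


def vae_classification_loss(x, recon, logits, labels, mu, logvar, beta=1e-3):
    recon_loss = F.mse_loss(recon, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + beta * kl + F.cross_entropy(logits, labels)


features = torch.randn(8, 768)                 # e.g. Swin features of embryo images
labels = torch.randint(0, 5, (8,))             # hypothetical development-stage labels
model = ClassifyingVAE()
recon, logits, mu, logvar = model(features)
print(float(vae_classification_loss(features, recon, logits, labels, mu, logvar)))
```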
ABSTRACT
Introduction: Early diagnosis of cervical cancer at the precancerous stage is critical for effective treatment and improved patient outcomes. Objective: This study aims to explore the use of SWIN Transformer and Convolutional Neural Network (CNN) hybrid models combined with transfer learning to classify precancerous colposcopy images. Methods: Out of 913 images from 200 cases obtained from the Colposcopy Image Bank of the International Agency for Research on Cancer, 898 met quality standards and were classified as normal, precancerous, or cancerous based on colposcopy and histopathological findings. The cases corresponding to the 360 precancerous images, along with an equal number of normal cases, were divided into a 70/30 train-test split. The SWIN Transformer and CNN hybrid model combines the advantages of local feature extraction by CNNs with the global context modeling by SWIN Transformers, resulting in superior classification performance and a more automated process. The hybrid model approach involves enhancing image quality through preprocessing, extracting local features with CNNs, capturing the global context with the SWIN Transformer, integrating these features for classification, and refining the training process by tuning hyperparameters. Results: The trained model achieved the following classification performances on fivefold cross-validation data: a 94% Area Under the Curve (AUC), an 88% F1 score, and 87% accuracy. On two completely independent test sets, which were never seen by the model during training, the model achieved an 80% AUC, a 75% F1 score, and 75% accuracy on the first test set (precancerous vs. normal) and an 82% AUC, a 78% F1 score, and 75% accuracy on the second test set (cancer vs. normal). Conclusions: These high-performance metrics demonstrate the models' effectiveness in distinguishing precancerous from normal colposcopy images, even with modest datasets, limited data augmentation, and the smaller effect size of precancerous images compared to malignant lesions. The findings suggest that these techniques can significantly aid in the early detection of cervical cancer at the precancerous stage.
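A generic CNN + Swin hybrid of the kind described, with concatenated local and global features feeding a small classifier, might look like the following PyTorch sketch (ResNet-18 stands in for the CNN branch; this is not the authors' exact architecture).

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, swin_t


class HybridCNNSwin(nn.Module):
    """Rough sketch of a CNN + Swin hybrid: local CNN features and global Swin
    features are concatenated and passed to a small classification head."""

    def __init__(self, num_classes=2):
        super().__init__()
        cnn = resnet18(weights=None)
        self.cnn = nn.Sequential(*list(cnn.children())[:-1])   # drop the final FC -> (B, 512, 1, 1)
        swin = swin_t(weights=None)
        swin_dim = swin.head.in_features
        swin.head = nn.Identity()                               # keep pooled Swin features
        self.swin = swin
        self.classifier = nn.Linear(512 + swin_dim, num_classes)

    def forward(self, x):
        local_feat = self.cnn(x).flatten(1)       # (B, 512) local texture features
        global_feat = self.swin(x)                # (B, 768) global context features
        return self.classifier(torch.cat([local_feat, global_feat], dim=1))


model = HybridCNNSwin(num_classes=2)              # e.g. precancerous vs. normal
print(model(torch.randn(2, 3, 224, 224)).shape)   # torch.Size([2, 2])
```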
ABSTRACT
Dental disorders are common worldwide, causing pain or infections and limiting mouth opening, so dental conditions impact productivity, work capability, and quality of life. Manual detection and classification of oral diseases is time-consuming and requires evaluation and examination by dentists. A dental disease detection and classification system based on machine learning and deep learning will aid early diagnosis of dental disease. Hence, this paper proposes a new diagnosis system for dental diseases using X-ray imaging. The framework includes a robust pre-processing phase that uses image normalization and adaptive histogram equalization to improve image quality and reduce variation. A dual-stream approach is used for feature extraction, utilizing the advantages of the Swin Transformer for capturing long-range dependencies and global context and MobileNetV2 for effective local feature extraction. A thorough representation of dental anomalies is produced by fusing the extracted features. Finally, a bagging ensemble classifier is used to obtain reliable and broadly applicable classification results. We evaluate our model on a benchmark dental radiography dataset. The experimental results and comparisons show the superiority of the proposed system, with 95.7% precision, 95.4% sensitivity, 95.7% specificity, a 95.5% Dice similarity coefficient, and 95.6% accuracy. The results demonstrate the effectiveness of our hybrid model integrating the MobileNetV2 and Swin Transformer architectures, outperforming state-of-the-art techniques in classifying dental diseases from dental panoramic X-ray imaging. This framework presents a promising method for robustly and accurately diagnosing dental diseases automatically, which may help dentists plan treatments and identify dental diseases early on.
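The dual-stream feature fusion followed by a bagging ensemble could be prototyped roughly as below, using frozen MobileNetV2 and Swin-T extractors and scikit-learn's BaggingClassifier; the feature dimensions, number of classes, and base estimator are assumptions for illustration.

```python
import numpy as np
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2, swin_t
from sklearn.ensemble import BaggingClassifier

# Two frozen feature extractors: MobileNetV2 for local detail, Swin-T for global context.
mobile = mobilenet_v2(weights=None)
mobile.classifier = nn.Identity()          # forward now returns (B, 1280) features
swin = swin_t(weights=None)
swin.head = nn.Identity()                  # forward now returns (B, 768) features
mobile.eval()
swin.eval()

def extract_features(images: torch.Tensor) -> np.ndarray:
    """Concatenate the two feature streams into one fused descriptor per image."""
    with torch.no_grad():
        fused = torch.cat([mobile(images), swin(images)], dim=1)
    return fused.numpy()

# Dummy data standing in for preprocessed panoramic X-ray crops and their labels.
X = extract_features(torch.randn(16, 3, 224, 224))
y = np.random.randint(0, 4, size=16)       # hypothetical four dental disease classes

# Bagging ensemble over the fused features (the default base estimator is a decision tree).
clf = BaggingClassifier(n_estimators=10).fit(X, y)
print(clf.predict(X[:2]))
```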
Subject(s)
Deep Learning, Humans, Machine Learning, Stomatognathic Diseases/diagnosis, Stomatognathic Diseases/diagnostic imaging, Algorithms
ABSTRACT
Remote sensing (RS) images contain a wealth of information with expansive potential for applications in image segmentation. However, Convolutional Neural Networks (CNNs) face challenges in fully harnessing global contextual information. Leveraging the formidable global information modeling capabilities of the Swin Transformer, a novel RS image segmentation model combining it with a CNN (GLE-Net) is introduced, giving rise to a revamped encoder structure. The sub-branch first extracts features at varying scales within the RS images using the Multiscale Feature Fusion Module (MFM), acquiring rich semantic information, discerning finer local features, and adeptly handling occlusions. Subsequently, a Feature Compression Module (FCM) is introduced in the main branch to downsize the feature map, effectively reducing information loss while preserving finer details and enhancing segmentation accuracy for smaller targets. Finally, we integrate local and global features through a Spatial Information Enhancement Module (SIEM) for comprehensive feature modeling, augmenting the segmentation capabilities of the model. We performed experiments on public datasets provided by ISPRS, yielding remarkable results that underscore the substantial potential of our model for RS image segmentation.
ABSTRACT
Transformers have dominated the landscape of Natural Language Processing (NLP) and revolutionized generative AI applications. Vision Transformers (VTs) have recently become a new state of the art for computer vision applications. Motivated by the success of VTs in capturing short- and long-range dependencies and their ability to handle class imbalance, this paper proposes an ensemble framework of VTs for the efficient classification of Alzheimer's Disease (AD). The framework consists of four vanilla VTs and ensembles formed using hard- and soft-voting approaches. The proposed model was tested on two popular AD datasets: OASIS and ADNI. The ADNI dataset was employed to assess the models' efficacy under imbalanced and data-scarce conditions. The ensemble of VTs saw an improvement of around 2% compared to the individual models. Furthermore, the results are compared with state-of-the-art and custom-built Convolutional Neural Network (CNN) architectures and Machine Learning (ML) models under varying data conditions. The experimental results demonstrated an overall performance gain of 4.14% and 4.72% in accuracy over the ML and CNN algorithms, respectively. The study also identifies specific limitations and proposes avenues for future research. The code used in the study is made publicly available.
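Hard and soft voting over the individual ViT outputs can be expressed in a few lines; the sketch below assumes each model already produces softmax probabilities and uses a hypothetical three-class setup.

```python
import torch

def soft_vote(prob_list):
    """Average class probabilities from several ViT models, then take the argmax."""
    return torch.stack(prob_list).mean(dim=0).argmax(dim=1)

def hard_vote(prob_list):
    """Each model casts one vote (its argmax); the majority class wins."""
    votes = torch.stack([p.argmax(dim=1) for p in prob_list])   # (n_models, batch)
    return votes.mode(dim=0).values

# Dummy softmax outputs from four vanilla ViTs on a 3-class problem (e.g. AD stages).
probs = [torch.softmax(torch.randn(5, 3), dim=1) for _ in range(4)]
print("soft voting:", soft_vote(probs))
print("hard voting:", hard_vote(probs))
```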
ABSTRACT
Accurate fruit detection is crucial for automated fruit picking. However, real-world scenarios, influenced by complex environmental factors such as illumination variation, occlusion, and overlap, pose significant challenges to accurate fruit detection. These challenges subsequently impact the commercialization of fruit-harvesting robots. A tomato detection model named YOLO-SwinTF, based on YOLOv7, is proposed to address these challenges. Integrating Swin Transformer (ST) blocks into the backbone network enables the model to capture global information by modeling long-range visual dependencies. Trident Pyramid Networks (TPN) are introduced to overcome the limitations of PANet's focus on communication-based processing. TPN incorporates multiple self-processing (SP) modules within the existing top-down and bottom-up architectures, allowing feature maps to generate new findings for communication. In addition, Focaler-IoU is introduced to reconstruct the original intersection-over-union (IoU) loss, allowing the loss function to adjust its focus based on the distribution of difficult and easy samples. The proposed model is evaluated on a tomato dataset, and the experimental results demonstrate that its detection recall, precision, F1 score, and AP reach 96.27%, 96.17%, 96.22%, and 98.67%, respectively. These represent improvements of 1.64%, 0.92%, 1.28%, and 0.88% over the original YOLOv7 model. Compared with other state-of-the-art detection methods, this approach achieves superior accuracy while maintaining comparable detection speed. In addition, the proposed model exhibits strong robustness under various lighting and occlusion conditions, demonstrating its significant potential in tomato detection.
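Focaler-IoU is commonly formulated as a linear remapping of the IoU onto an interval [d, u] before computing the loss; the sketch below follows that common formulation with illustrative interval values and is not taken from the YOLO-SwinTF implementation.

```python
import torch

def box_iou(a, b):
    """Element-wise IoU for boxes in (x1, y1, x2, y2) format."""
    x1 = torch.max(a[:, 0], b[:, 0])
    y1 = torch.max(a[:, 1], b[:, 1])
    x2 = torch.min(a[:, 2], b[:, 2])
    y2 = torch.min(a[:, 3], b[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter + 1e-7)

def focaler_iou_loss(pred, target, d=0.0, u=0.95):
    """Linearly remap IoU onto [d, u] (clamped to [0, 1]) before taking 1 - IoU,
    so the loss can emphasize easy or hard samples depending on the interval."""
    iou = box_iou(pred, target)
    iou_focaler = ((iou - d) / (u - d)).clamp(0.0, 1.0)
    return (1.0 - iou_focaler).mean()

pred = torch.tensor([[0.0, 0.0, 10.0, 10.0], [5.0, 5.0, 15.0, 15.0]])
target = torch.tensor([[1.0, 1.0, 10.0, 10.0], [0.0, 0.0, 10.0, 10.0]])
print(float(focaler_iou_loss(pred, target)))
```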
ABSTRACT
Background: Diabetic peripheral neuropathy (DPN) is common and can go unnoticed until it is firmly established. This study aims to establish a transformer-based deep learning algorithm (DLA) to classify corneal confocal microscopy (CCM) images, identifying DPN in diabetic patients. Methods: Our classification model differs from traditional convolutional neural networks (CNNs) in that it uses a Swin Transformer network with a hierarchical backbone architecture. Participants included those with (DPN+, n = 57) or without (DPN-, n = 37) DPN as determined by the updated Toronto consensus criteria. The CCM image dataset (consisting of 570 DPN+ and 370 DPN- images, with five images selected from each participant's left and right eyes) was randomly divided into training, validation, and test subsets at a 7:1:2 ratio, split at the participant level. The effectiveness of the algorithm was assessed using diagnostic accuracy measures, such as sensitivity, specificity, and accuracy, in conjunction with Grad-CAM visualization techniques to interpret the model's decisions. Results: In the DPN+ group (n = 12), the transformer model correctly predicted all participants, while in the DPN- group (n = 7), one participant was misclassified as DPN+, with an area under the curve (AUC) of 0.9405 (95% CI 0.8166, 1.0000). Among the DPN+ images (n = 120), 117 were correctly classified, and among the DPN- images (n = 70), 49 were correctly classified, with an AUC of 0.8996 (95% CI 0.8502, 0.9491). For single-image predictions, the transformer model achieved a superior AUC relative to the ResNet50 model (0.8761, 95% CI 0.8155, 0.9366), the Inception_v3 model (0.8802, 95% CI 0.8231, 0.9374), and the DenseNet121 model (0.8965, 95% CI 0.8438, 0.9491). Conclusion: Transformer-based networks outperform CNN-based networks in rapid binary DPN classification. Transformer-based DLAs have potential for clinical DPN screening.
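Reporting both image-level and participant-level results implies aggregating per-image scores; one simple aggregation rule (averaging each participant's image probabilities before computing the AUC) is sketched below on dummy data. The study's actual aggregation rule is not stated in the abstract.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Dummy per-image DPN+ probabilities for 5 participants with 10 CCM images each.
rng = np.random.default_rng(0)
image_probs = rng.random((5, 10))
participant_labels = np.array([1, 1, 1, 0, 0])       # 1 = DPN+, 0 = DPN-

# One simple aggregation rule: a participant's score is the mean of their image scores.
participant_scores = image_probs.mean(axis=1)
print("participant-level AUC:", roc_auc_score(participant_labels, participant_scores))
```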
ABSTRACT
Accurate recognition of nutritional components in food is crucial for dietary management and health monitoring. Current methods often rely on traditional chemical analysis techniques, which are time-consuming, require destructive sampling, and are not suitable for large-scale or real-time applications. Therefore, there is a pressing need for efficient, non-destructive, and accurate methods to identify and quantify nutrients in food. In this study, we propose a novel deep learning model that integrates EfficientNet, Swin Transformer, and Feature Pyramid Network (FPN) to enhance the accuracy and efficiency of food nutrient recognition. Our model combines the strengths of EfficientNet for feature extraction, Swin Transformer for capturing long-range dependencies, and FPN for multi-scale feature fusion. Experimental results demonstrate that our model significantly outperforms existing methods. On the Nutrition5k dataset, it achieves a Top-1 accuracy of 79.50% and a Mean Absolute Percentage Error (MAPE) for calorie prediction of 14.72%. On the ChinaMartFood109 dataset, the model achieves a Top-1 accuracy of 80.25% and a calorie MAPE of 15.21%. These results highlight the model's robustness and adaptability across diverse food images, providing a reliable and efficient tool for rapid, non-destructive nutrient detection. This advancement supports better dietary management and enhances the understanding of food nutrition, potentially leading to more effective health monitoring applications.
ABSTRACT
Introduction: Accurately recognizing and understanding human motion actions presents a key challenge in the development of intelligent sports robots. Traditional methods often encounter significant drawbacks, such as high computational resource requirements and suboptimal real-time performance. To address these limitations, this study proposes a novel approach called Sports-ACtrans Net. Methods: In this approach, the Swin Transformer processes visual data to extract spatial features, while the Spatio-Temporal Graph Convolutional Network (ST-GCN) models human motion as graphs to handle skeleton data. By combining these outputs, a comprehensive representation of motion actions is created. Reinforcement learning is employed to optimize the action recognition process, framing it as a sequential decision-making problem. Deep Q-learning is utilized to learn the optimal policy, thereby enhancing the robot's ability to accurately recognize and engage in motion. Results and discussion: Experiments demonstrate significant improvements over state-of-the-art methods. This research advances the fields of neural computation, computer vision, and neuroscience, aiding in the development of intelligent robotic systems capable of understanding and participating in sports activities.
ABSTRACT
Objective: This study aims to develop and validate SwinHS, a deep learning-based automatic segmentation model designed for precise hippocampus delineation in patients receiving hippocampus-protected whole-brain radiotherapy. By streamlining this process, we seek to significantly improve workflow efficiency for clinicians. Methods: A total of 100 three-dimensional T1-weighted MR images were collected, with 70 patients allocated for training and 30 for testing. Manual delineation of the hippocampus was performed according to RTOG 0933 guidelines. The SwinHS model, which incorporates a 3D ELSA Transformer module and an sSE CNN decoder, was trained and tested on these datasets. To demonstrate the effectiveness of SwinHS, this study compared its segmentation performance with that of V-Net, U-Net, ResNet, and ViT. Evaluation metrics included the Dice similarity coefficient (DSC), Jaccard similarity coefficient (JSC), and Hausdorff distance (HD). Dosimetric evaluation compared radiotherapy plans generated using automatic segmentation (plan AD) versus manual hippocampus segmentation (plan MD). Results: SwinHS outperformed four advanced deep learning-based models, achieving an average DSC of 0.894, a JSC of 0.817, and an HD of 3.430 mm. Dosimetric evaluation revealed that both plan AD and plan MD met treatment plan constraints for the planning target volume (PTV). However, the hippocampal Dmax in plan AD was significantly greater than that in plan MD, approaching the 17 Gy constraint limit. Nonetheless, there were no significant differences in D100% or in maximum doses to other critical structures between the two plans. Conclusion: Compared with manual delineation, SwinHS demonstrated superior segmentation performance and a significantly shorter delineation time. While plan AD met clinical requirements, caution should be exercised regarding hippocampal Dmax. SwinHS offers a promising tool to enhance workflow efficiency and facilitate hippocampal protection in radiotherapy planning for patients with brain metastases.
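The three segmentation metrics used here (DSC, JSC, and HD) can be computed from binary masks as in the following sketch, which uses SciPy's directed Hausdorff distance and reports HD in voxel units unless scaled by the voxel spacing.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice_jaccard(pred: np.ndarray, gt: np.ndarray):
    """DSC and JSC for binary masks (True = hippocampus, False = background)."""
    inter = np.logical_and(pred, gt).sum()
    dsc = 2 * inter / (pred.sum() + gt.sum() + 1e-7)
    jsc = inter / (np.logical_or(pred, gt).sum() + 1e-7)
    return dsc, jsc

def hausdorff(pred: np.ndarray, gt: np.ndarray):
    """Symmetric Hausdorff distance between the voxel coordinates of two masks
    (in voxel units; multiply by the voxel spacing to report millimetres)."""
    p, g = np.argwhere(pred), np.argwhere(gt)
    return max(directed_hausdorff(p, g)[0], directed_hausdorff(g, p)[0])

pred = np.zeros((32, 32, 32), dtype=bool)
gt = np.zeros((32, 32, 32), dtype=bool)
pred[10:20, 10:20, 10:20] = True
gt[11:21, 10:20, 10:20] = True
print(dice_jaccard(pred, gt), hausdorff(pred, gt))
```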
ABSTRACT
Colorectal cancer ranks as the second most prevalent cancer worldwide, with a high mortality rate. Colonoscopy stands as the preferred procedure for diagnosing colorectal cancer. Detecting polyps at an early stage is critical for effective prevention and diagnosis. However, challenges in colonoscopic procedures often lead medical practitioners to seek support from alternative techniques for timely polyp identification. Polyp segmentation emerges as a promising approach to identify polyps in colonoscopy images. In this paper, we propose an advanced method, PolySegNet, that leverages both the Vision Transformer and Swin Transformer, coupled with a Convolutional Neural Network (CNN) decoder. The fusion of these models facilitates a comprehensive analysis of the various modules in our proposed architecture. To assess the performance of PolySegNet, we evaluate it on three colonoscopy datasets, a combined dataset, and their augmented versions. The experimental results demonstrate that PolySegNet achieves competitive results in terms of polyp segmentation accuracy and efficacy, achieving a mean Dice score of 0.92 and a mean Intersection over Union (IoU) of 0.86. These metrics highlight the superior performance of PolySegNet in accurately delineating polyp boundaries compared to existing methods. PolySegNet has shown great promise in accurately and efficiently segmenting polyps in medical images. The proposed method could serve as the foundation for a new class of transformer-based segmentation models in medical image analysis.
ABSTRACT
Skin cancer is one of the three most hazardous cancer types, and it is caused by the abnormal proliferation of tumor cells. Diagnosing skin cancer accurately and early is crucial for saving patients' lives. However, it is a challenging task due to several significant issues, including lesion variations in texture, shape, color, and size; artifacts (hairs); uneven lesion boundaries; and poor contrast. To address these issues, this research proposes a novel Convolutional Swin Transformer (CSwinformer) method for segmenting and classifying skin lesions accurately. The framework involves phases of data preprocessing, segmentation, and classification. In the first phase, Gaussian filtering, Z-score normalization, and augmentation are applied to remove unnecessary noise, re-organize the data, and increase data diversity. In the segmentation phase, we design a new model, "Swinformer-Net," integrating Swin Transformer and U-Net frameworks to accurately define the region of interest. In the final classification phase, the segmented output is fed into the newly proposed module "Multi-Scale Dilated Convolutional Neural Network meets Transformer (MD-CNNFormer)," where the data samples are assigned to their respective classes. We use four benchmark datasets (HAM10000, ISBI 2016, PH2, and Skin Cancer ISIC) for evaluation. The results demonstrate the designed framework's better efficiency compared with traditional approaches. The proposed method achieved a classification accuracy of 98.72%, pixel accuracy of 98.06%, and Dice coefficient of 97.67%. The proposed method offers a promising solution for skin lesion segmentation and classification, supporting clinicians in accurately diagnosing skin cancer.
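The preprocessing phase (Gaussian filtering plus Z-score normalization) can be sketched as below with OpenCV; the kernel size is illustrative, and augmentation is only indicated in a comment.

```python
import numpy as np
import cv2

def preprocess(image: np.ndarray) -> np.ndarray:
    """Gaussian filtering to suppress noise, then per-image z-score normalization."""
    blurred = cv2.GaussianBlur(image, (5, 5), 0)          # kernel size is illustrative
    blurred = blurred.astype(np.float32)
    return (blurred - blurred.mean()) / (blurred.std() + 1e-7)

# Stand-in for a dermoscopic image; a real pipeline would also apply augmentation
# (flips, rotations, colour jitter) to increase data diversity.
image = (np.random.rand(224, 224, 3) * 255).astype(np.uint8)
processed = preprocess(image)
print(processed.mean(), processed.std())   # approximately 0 and 1 after z-scoring
```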
ABSTRACT
BACKGROUND AND OBJECTIVE: Transcranial focused ultrasound (tFUS) is an emerging non-invasive therapeutic technology that offers a new brain stimulation modality. Precise localization of the acoustic focus to the desired brain target throughout the procedure is needed to ensure the safety and effectiveness of the treatment, but acoustic distortion caused by the skull poses a challenge. Although computational methods can provide the estimated location and shape of the focus, the computation has not reached sufficient speed for real-time inference, which is demanded in real-world clinical situations. Leveraging the advantages of deep learning, we propose multi-modal networks capable of generating an intracranial pressure map in real time. METHODS: A dataset consisting of free-field pressure maps, intracranial pressure maps, medical images, and transducer placements was obtained from 11 human subjects. The free-field and intracranial pressure maps were computed using the k-space method. We developed network models based on convolutional neural networks and the Swin Transformer, featuring a multi-modal encoder and a decoder. RESULTS: Evaluations on foreseen data achieved high focal volume conformity of approximately 93% for both computed tomography (CT) and magnetic resonance (MR) data. For unforeseen data, the networks achieved a focal volume conformity of 88% for CT and 82% for MR. The inference time of the proposed networks was under 0.02 s, indicating the feasibility of real-time simulation. CONCLUSIONS: The results indicate that our networks can effectively and precisely perform real-time simulation of the intracranial pressure map during tFUS applications. Our work will enhance the safety and accuracy of treatments, representing significant progress for low-intensity focused ultrasound (LIFU) therapies.
ABSTRACT
BACKGROUND: The identification of infection in diabetic foot ulcers (DFUs) is challenging due to variability within classes, visual similarity between classes, reduced contrast with healthy skin, and the presence of artifacts. Existing studies focus on visual characteristics and tissue classification rather than infection detection, which is critical for assessing DFUs and predicting amputation risk. OBJECTIVE: To address these challenges, this study proposes a deep learning model using a hybrid CNN and Swin Transformer architecture for infection classification in DFU images. The aim is to leverage end-to-end mapping without prior knowledge, integrating local and global feature extraction to improve detection accuracy. METHODS: The proposed model utilizes a hybrid CNN and Swin Transformer architecture. It employs the Grad-CAM technique to visualize the decision-making process of the CNN and Transformer blocks. The DFUC Challenge dataset is used for training and evaluation, emphasizing the model's ability to accurately classify DFU images into infected and non-infected categories. RESULTS: The model achieves high performance metrics: sensitivity (95.98%), specificity (97.08%), accuracy (96.52%), and Matthews Correlation Coefficient (0.93). These results indicate the model's effectiveness in quickly diagnosing DFU infections, highlighting its potential as a valuable tool for medical professionals. CONCLUSION: The hybrid CNN and Swin Transformer architecture effectively combines strengths from both models, enabling accurate classification of DFU images as infected or non-infected, even in complex scenarios. The use of Grad-CAM provides insight into the model's decision process, aiding in identifying infected regions within DFU images. This approach shows promise for enhancing clinical assessment and management of DFU infections.
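Grad-CAM itself is model-agnostic; a minimal hook-based implementation applied to a plain ResNet-18 stand-in (not the paper's hybrid network) is sketched below to show how such infected-region heatmaps are typically produced.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

class GradCAM:
    """Minimal Grad-CAM: hooks capture the target layer's activations and gradients,
    and the heatmap is the ReLU of the gradient-weighted activation sum."""

    def __init__(self, model, target_layer):
        self.model = model.eval()
        self.acts, self.grads = None, None
        target_layer.register_forward_hook(self._save_acts)
        target_layer.register_full_backward_hook(self._save_grads)

    def _save_acts(self, module, inputs, output):
        self.acts = output.detach()

    def _save_grads(self, module, grad_input, grad_output):
        self.grads = grad_output[0].detach()

    def __call__(self, image, class_idx=None):
        logits = self.model(image)
        idx = int(logits.argmax(dim=1)) if class_idx is None else class_idx
        self.model.zero_grad()
        logits[0, idx].backward()
        weights = self.grads.mean(dim=(2, 3), keepdim=True)     # channel-wise gradient averages
        cam = F.relu((weights * self.acts).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
        return (cam - cam.min()) / (cam.max() - cam.min() + 1e-7)

model = resnet18(weights=None)            # stand-in backbone, not the paper's hybrid network
cam = GradCAM(model, model.layer4[-1])
heatmap = cam(torch.randn(1, 3, 224, 224))
print(heatmap.shape)                      # torch.Size([1, 1, 224, 224])
```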
ABSTRACT
Background: Cotton pests have a major impact on cotton quality and yield during cotton production and cultivation. With the rapid development of agricultural intelligence, accurate classification of cotton pests is a key factor in achieving precise pesticide application using unmanned aerial vehicles (UAVs), large application devices, and other equipment. Methods: In this study, a cotton insect pest classification model based on an improved Swin Transformer is proposed. The model introduces a residual module (skip connection) into the Swin Transformer to address the problem that pest features are easily confused with complex backgrounds, which leads to poor classification accuracy, and to enhance the recognition of cotton pests. In this study, 2705 leaf images of cotton insect pests (covering three pests: cotton aphids, cotton mirids, and cotton leaf mites) were collected in the field, and model training was performed after image preprocessing and data augmentation. Results: The test results showed that the accuracy of the improved model increased from 94.6% to 97.4% compared with the original model, and the prediction time for a single image was 0.00434 s. The improved Swin Transformer model was compared with seven classification models (VGG11, VGG11-bn, ResNet18, MobileNetV2, ViT, Swin Transformer small, and Swin Transformer base), and its accuracy was higher by 0.5%, 4.7%, 2.2%, 2.5%, 6.3%, 7.9%, and 8.0%, respectively. Discussion: This study therefore demonstrates that the improved Swin Transformer model significantly improves the accuracy and efficiency of cotton pest detection compared with other classification models and can be deployed on edge devices such as UAVs, thus providing important technological support and a theoretical basis for cotton pest control and precision pesticide application.
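The residual skip connection described above can be illustrated with a small wrapper module; the sketch below uses a stand-in convolutional stage, since wrapping an actual Swin stage would also have to account for its channel layout and downsampling.

```python
import torch
import torch.nn as nn

class ResidualWrapper(nn.Module):
    """Wrap a stage with a skip connection: output = stage(x) + projection(x).
    The 1x1 projection handles channel mismatch; names here are illustrative."""

    def __init__(self, stage: nn.Module, in_channels: int, out_channels: int):
        super().__init__()
        self.stage = stage
        self.proj = (nn.Identity() if in_channels == out_channels
                     else nn.Conv2d(in_channels, out_channels, kernel_size=1))

    def forward(self, x):
        return self.stage(x) + self.proj(x)

# Toy demonstration with a stand-in stage that keeps the spatial size unchanged.
stage = nn.Conv2d(96, 96, kernel_size=3, padding=1)
block = ResidualWrapper(stage, in_channels=96, out_channels=96)
print(block(torch.randn(1, 96, 56, 56)).shape)   # torch.Size([1, 96, 56, 56])
```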
ABSTRACT
Skin tumors, especially melanoma, which is highly aggressive and spreads quickly to other sites, are a health issue in many parts of the world, and detecting them at an early stage remains the key to saving lives. This study explores the application of advanced deep learning models for classifying benign and malignant melanoma using dermoscopic images. The aim of the study is to enhance the accuracy and efficiency of melanoma diagnosis with the ConvNeXt, Vision Transformer (ViT) Base-16, and Swin Transformer V2 Small (Swin V2 S) deep learning models. The ConvNeXt model, which integrates principles of both convolutional neural networks and transformers, demonstrated superior performance, with balanced precision and recall metrics. The dataset, sourced from Kaggle, comprises 13,900 uniformly sized images, preprocessed to standardize the inputs to the models. Experimental results revealed that ConvNeXt achieved the highest diagnostic accuracy among the tested models, with an accuracy of 91.5%, balanced precision and recall rates of 90.45% and 92.8% for benign cases, and 92.61% and 90.2% for malignant cases, respectively. The F1-scores for ConvNeXt were 91.61% for benign cases and 91.39% for malignant cases. This research highlights the potential of hybrid deep learning architectures in medical image analysis, particularly for early melanoma detection.