Results 1 - 20 of 353
1.
Microsc Res Tech ; 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-39351968

ABSTRACT

Lymph-node status is important in decision-making during early gastric cancer (EGC) treatment. Endoscopic submucosal dissection is currently the mainstream treatment for EGC, yet accurately diagnosing and treating EGC is challenging even for experienced endoscopists. Multiphoton microscopy can extract the morphological features of collagen fibers from tissues, and these characteristics can be used to assess lymph-node metastasis status in patients with EGC. First, we compared the accuracy of four deep learning models (VGG16, ResNet34, MobileNetV2, and PVTv2) on the preprocessed training images and test datasets. Next, we integrated the features of the best-performing model, PVTv2, with manual and clinical features to develop a novel model called AutoLNMNet. The prediction accuracy of AutoLNMNet for the no-metastasis (Ly0) and lymph-node-metastasis (Ly1) stages reached 0.92, which was 0.3% higher than that of PVTv2. The areas under the receiver operating characteristic curves of AutoLNMNet for the Ly0 and Ly1 stages were both 0.97. Therefore, AutoLNMNet is highly reliable and accurate in detecting lymph-node metastasis, providing an important tool for the early diagnosis and treatment of EGC.
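To make the fusion step concrete, here is a minimal PyTorch sketch of concatenating deep backbone features (e.g., from PVTv2) with manual and clinical features ahead of a binary Ly0/Ly1 head. All dimensions, names, and the MLP head are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): fuse deep, manual, and
# clinical features before a binary Ly0/Ly1 classifier head.
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, deep_dim=512, manual_dim=16, clinical_dim=8, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(deep_dim + manual_dim + clinical_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # Ly0 vs. Ly1
        )

    def forward(self, deep_feat, manual_feat, clinical_feat):
        fused = torch.cat([deep_feat, manual_feat, clinical_feat], dim=1)
        return self.mlp(fused)

# Example usage with random stand-in features for a batch of 4 patients.
model = FusionClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 16), torch.randn(4, 8))
print(logits.shape)  # torch.Size([4, 2])
```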

2.
Med Image Anal ; 99: 103356, 2024 Sep 30.
Article in English | MEDLINE | ID: mdl-39378568

ABSTRACT

Breast cancer is a significant global public health concern, with various treatment options available based on tumor characteristics. Pathological examination of excision specimens after surgery provides essential information for treatment decisions. However, the manual selection of representative sections for histological examination is laborious and subjective, leading to potential sampling errors and variability, especially in carcinomas previously treated with chemotherapy. Accurate identification of residual tumors is also challenging, emphasizing the need for systematic or assisted methods. Developing deep-learning algorithms for automated cancer detection on radiology images requires radiology-pathology registration, since aligning radiology and histopathology images establishes the accurately labeled ground truth needed for training. However, aligning these images is difficult due to differences in content and resolution, tissue deformation, artifacts, and imprecise correspondence. We present a novel deep learning-based pipeline for the affine registration of faxitron images, the X-ray representations of macrosections of ex-vivo breast tissue, with their corresponding histopathology images of tissue segments. The proposed model combines convolutional neural networks and vision transformers, allowing it to effectively capture both local and global information from the entire tissue macrosection as well as its segments. This integrated approach enables simultaneous registration and stitching of image segments, facilitating segment-to-macrosection registration through a puzzle-based mechanism. To address the lack of multi-modal ground truth data, we train the model on synthetic mono-modal data in a weakly supervised manner. The trained model performed well in multi-modal registration, yielding an average landmark error of 1.51 mm (±2.40) and a stitching distance of 1.15 mm (±0.94). The results indicate that the model performs significantly better than existing baselines, both deep learning-based and iterative, and it is approximately 200 times faster than the iterative approach. This work bridges a gap between current research and the clinical workflow and has the potential to improve efficiency and accuracy in breast cancer evaluation and streamline the pathology workflow.

3.
BMC Med Inform Decis Mak ; 24(1): 288, 2024 Oct 07.
Article in English | MEDLINE | ID: mdl-39375719

ABSTRACT

BACKGROUND: Histopathology is a gold standard for cancer diagnosis. It involves extracting tissue specimens from suspicious areas to prepare a glass slide for microscopic examination. However, histological tissue processing introduces artifacts, which are ultimately transferred to the digitized versions of glass slides, known as whole slide images (WSIs). Artifacts are diagnostically irrelevant areas and may result in wrong predictions from deep learning (DL) algorithms. Therefore, detecting and excluding artifacts in the computational pathology (CPATH) system is essential for reliable automated diagnosis. METHODS: In this paper, we propose a mixture of experts (MoE) scheme for detecting five notable artifacts in WSIs: damaged tissue, blur, folded tissue, air bubbles, and histologically irrelevant blood. First, we train independent binary DL models as experts to capture particular artifact morphology. Then, we ensemble their predictions using a fusion mechanism and apply probabilistic thresholding over the final probability distribution to improve the sensitivity of the MoE. We developed four DL pipelines to evaluate computational and performance trade-offs: two MoEs and two multiclass models, built on state-of-the-art deep convolutional neural networks (DCNNs) and vision transformers (ViTs). These pipelines are quantitatively and qualitatively evaluated on external and out-of-distribution (OoD) data to assess generalizability and robustness for the artifact detection application. RESULTS: We extensively evaluated the proposed MoE and multiclass models. The DCNN-based and ViT-based MoE schemes outperformed the simpler multiclass models when tested on datasets from different hospitals and cancer types, with the MoE using MobileNet DCNNs yielding the best results. The proposed MoE achieves 86.15% F1 and 97.93% sensitivity scores on unseen data, at a lower inference cost than the MoE using ViTs. This best performance of the MoEs comes at a relatively higher computational cost than the multiclass models. Furthermore, we apply post-processing to create an artifact segmentation mask, a potential artifact-free RoI map, a quality report, and an artifact-refined WSI for further computational analysis. In the qualitative evaluation, field experts assessed the predictive performance of the MoEs on OoD WSIs, rating artifact detection and artifact-free area preservation; the highest agreement translated to a Cohen's Kappa of 0.82, indicating substantial agreement on the overall diagnostic usability of the DCNN-based MoE scheme. CONCLUSIONS: The proposed artifact detection pipeline will not only ensure reliable CPATH predictions but may also provide quality control. In this work, the best-performing pipeline for artifact detection is the MoE with DCNNs. Our detailed experiments show that there is always a trade-off between performance and computational complexity, and no single DL solution equally suits all types of data and applications. The code and HistoArtifacts dataset can be found online at GitHub and Zenodo, respectively.
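The following NumPy sketch illustrates the mixture-of-experts idea described above: five independent binary artifact detectors whose positive-class probabilities are fused with a sensitivity-oriented threshold. The 0.3 threshold and the argmax fusion rule are illustrative assumptions, not the paper's exact mechanism.

```python
# Hedged sketch of a five-expert MoE with probabilistic thresholding.
import numpy as np

ARTIFACTS = ["damage", "blur", "fold", "air_bubble", "blood"]

def moe_predict(expert_probs: np.ndarray, threshold: float = 0.3):
    """expert_probs: (n_patches, 5) positive-class probability per expert."""
    flagged = expert_probs >= threshold          # per-artifact decisions
    artifact_free = ~flagged.any(axis=1)         # patch kept for diagnosis
    top_artifact = expert_probs.argmax(axis=1)   # dominant artifact type
    return flagged, artifact_free, top_artifact

probs = np.random.rand(3, 5)                     # stand-in expert outputs
flagged, free, top = moe_predict(probs)
for i in range(3):
    label = "artifact-free" if free[i] else ARTIFACTS[top[i]]
    print(f"patch {i}: {label}")
```

A lower threshold trades precision for sensitivity, which matches the abstract's stated goal of catching as many artifact patches as possible before downstream analysis.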


Subjects
Artifacts; Deep Learning; Humans; Neoplasms; Image Processing, Computer-Assisted/methods; Pathology, Clinical/standards; Image Interpretation, Computer-Assisted/methods
4.
Front Comput Neurosci ; 18: 1404623, 2024.
Article in English | MEDLINE | ID: mdl-39380741

ABSTRACT

Introduction: Following the great success of Transformers in machine learning, they are gradually attracting widespread interest in remote sensing (RS). However, research in RS has been hampered by the lack of large labeled datasets and by the inconsistency of data modes caused by the diversity of RS platforms. With the rise of self-supervised learning (SSL) algorithms in recent years, RS researchers have begun to explore the "pre-training and fine-tuning" paradigm. Yet there has been little research on multi-modal data fusion in RS: most approaches use only one modality or simply concatenate multiple modalities. Method: To study a more efficient multi-modal data fusion scheme, we propose a multi-modal fusion mechanism based on gated unit control (MGSViT). We pretrain a ViT model on the BigEarthNet dataset by combining two commonly used SSL algorithms, and propose intra-modal and inter-modal gated fusion units for feature learning that combine multispectral (MS) and synthetic aperture radar (SAR) data. Our method can effectively combine different modal data to extract key feature information. Results and discussion: In fine-tuning and comparison experiments, we outperform the most advanced algorithms on all downstream classification tasks, verifying the validity of the proposed method.
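A minimal PyTorch sketch of a gated fusion unit in the spirit of the abstract: a learned sigmoid gate weighs multispectral against SAR token features. The token shape and single-gate design are assumptions; the paper's intra-/inter-modal units will differ in detail.

```python
# Sketch of gated multi-modal fusion between MS and SAR token features.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, ms_tokens, sar_tokens):
        # Gate in [0, 1] decides, per feature, how much MS vs. SAR to keep.
        g = torch.sigmoid(self.gate(torch.cat([ms_tokens, sar_tokens], dim=-1)))
        return g * ms_tokens + (1 - g) * sar_tokens

fusion = GatedFusion()
fused = fusion(torch.randn(2, 196, 768), torch.randn(2, 196, 768))
print(fused.shape)  # torch.Size([2, 196, 768])
```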

5.
J Food Sci ; 2024 Oct 09.
Article in English | MEDLINE | ID: mdl-39385405

ABSTRACT

Pinelliae Rhizoma is a key ingredient in botanical supplements and is often adulterated with Rhizoma Pinelliae Pedatisectae, which is similar in appearance but less expensive. Accurate identification of these materials is crucial for both scientific and commercial purposes. Traditional morphological identification relies heavily on expert experience and is subjective, while chemical analysis and molecular biological identification are typically time-consuming and labor-intensive. This study employs a simpler, faster, and non-invasive image recognition technique to distinguish between these two highly similar plant materials: the vision transformer (ViT) algorithm, a cutting-edge image recognition technology. All samples were verified using DNA molecular identification before image analysis. The results demonstrate that the ViT algorithm achieves a classification accuracy exceeding 94%, significantly outperforming the convolutional neural network model's 60%-70% accuracy, highlighting the efficiency of this technology in identifying plant materials with similar appearances. This study marks the first application of the ViT algorithm to such a challenging task, showcasing its potential for precise botanical material identification and setting the stage for future advancements in the field.

6.
Brain Inform ; 11(1): 25, 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39363122

ABSTRACT

Transformers have dominated the landscape of Natural Language Processing (NLP) and revolutionized generative AI applications. Vision Transformers (VTs) have recently become a new state of the art for computer vision applications. Motivated by the success of VTs in capturing short- and long-range dependencies and their ability to handle class imbalance, this paper proposes an ensemble framework of VTs for the efficient classification of Alzheimer's Disease (AD). The framework consists of four vanilla VTs and ensembles formed using hard- and soft-voting approaches. The proposed model was tested on two popular AD datasets: OASIS and ADNI. The ADNI dataset was employed to assess the models' efficacy under imbalanced and data-scarce conditions. The ensemble of VTs improved accuracy by around 2% compared to the individual models. Furthermore, the results are compared with state-of-the-art and custom-built Convolutional Neural Network (CNN) architectures and Machine Learning (ML) models under varying data conditions. The experiments demonstrated an overall accuracy gain of 4.14% and 4.72% over the ML and CNN algorithms, respectively. The study also identifies specific limitations and proposes avenues for future research. The code used in the study is publicly available.
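The two ensembling rules named above are standard; this NumPy sketch shows both for four models on a binary AD/control task, with random probabilities standing in for model outputs.

```python
# Sketch of hard vs. soft voting over four classifiers (binary task).
import numpy as np

probs = np.random.dirichlet([1, 1], size=(4, 8))  # (models, samples, classes)

# Soft voting: average the predicted probabilities, then take the argmax.
soft_pred = probs.mean(axis=0).argmax(axis=1)

# Hard voting: each model votes with its argmax; the majority class wins.
votes = probs.argmax(axis=2)                      # (models, samples)
hard_pred = np.apply_along_axis(
    lambda v: np.bincount(v, minlength=2).argmax(), 0, votes)

print(soft_pred, hard_pred)
```

Soft voting uses the models' confidence and usually edges out hard voting when the members are well calibrated, which may explain the roughly 2% gain reported above.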

7.
Comput Methods Programs Biomed ; 257: 108464, 2024 Oct 18.
Article in English | MEDLINE | ID: mdl-39447437

ABSTRACT

BACKGROUND AND OBJECTIVE: Attaining global context along with local dependencies is of paramount importance for accurate segmentation of objects from image frames, and is challenging when developing deep learning-based biomedical image segmentation. Several transformer-based models have been proposed to handle this issue, yet segmentation accuracy remains an ongoing challenge, as these models often fall short of the target range due to their limited capacity to capture critical local and global contexts. Moreover, their quadratic computational complexity is a major limitation, and they require large datasets for training. METHODS: In this paper, we propose a novel multi-scale dual-channel decoder to mitigate these issues. The complete segmentation model uses two parallel encoders and a dual-channel decoder. The encoders are based on convolutional networks and capture features of the input images at multiple levels and scales. The decoder comprises a hierarchy of attention-gated Swin Transformers with a fine-tuning strategy. The hierarchical attention-gated Swin Transformers implement a multi-scale, multi-level feature embedding strategy that captures short- and long-range dependencies and leverages the necessary features without increasing computational load. At the final stage of the decoder, a fine-tuning strategy refines the features to retain rich features and reduce the possibility of over-segmentation. RESULTS: The proposed model is evaluated on the publicly available LiTS, 3DIRCADb, and spleen datasets from the Medical Segmentation Decathlon, as well as on a private dataset from Medical College Kolkata, India. We observe that the proposed model outperforms state-of-the-art models in liver tumor and spleen segmentation in terms of evaluation metrics, at a comparable computational cost. CONCLUSION: The novel dual-channel decoder embeds multi-scale features and efficiently creates a representation of both short- and long-range contexts, refining the features at the final stage to select only those necessary. As a result, we achieve better segmentation performance than state-of-the-art models.
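To make "attention-gated" concrete, here is an illustrative gate in the style of Attention U-Net, where a gating signal suppresses irrelevant skip-connection activations. The paper's Swin-based gate will differ; channel sizes here are assumptions.

```python
# Illustrative attention gate (Attention U-Net style), not the paper's code.
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, g_ch, x_ch, inter_ch):
        super().__init__()
        self.wg = nn.Conv2d(g_ch, inter_ch, kernel_size=1)
        self.wx = nn.Conv2d(x_ch, inter_ch, kernel_size=1)
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)

    def forward(self, g, x):
        # g: gating signal from a coarser decoder level; x: skip features.
        a = torch.sigmoid(self.psi(torch.relu(self.wg(g) + self.wx(x))))
        return x * a  # attenuate irrelevant skip activations

gate = AttentionGate(g_ch=256, x_ch=128, inter_ch=64)
out = gate(torch.randn(1, 256, 32, 32), torch.randn(1, 128, 32, 32))
print(out.shape)  # torch.Size([1, 128, 32, 32])
```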

8.
Comput Methods Programs Biomed ; 257: 108455, 2024 Oct 11.
Article in English | MEDLINE | ID: mdl-39447439

ABSTRACT

BACKGROUND AND OBJECTIVE: Sudden cardiac death (SCD) is a critical health issue characterized by the sudden failure of heart function, often caused by ventricular fibrillation (VF). Early prediction of SCD is crucial to enable timely interventions; however, current methods predict SCD only a few minutes before its onset, limiting intervention time. This study aims to develop a deep learning-based model for the early prediction of SCD using electrocardiography (ECG) signals. METHODS: A multimodal explainable deep learning-based model was developed to analyze ECG signals at discrete intervals ranging from 5 to 30 min before SCD onset. The raw ECG signals, 2D scalograms generated through the wavelet transform, and 2D Hilbert spectra generated through the Hilbert-Huang transform (HHT) were applied to multiple deep learning algorithms. For raw ECG, a combination of 1D convolutional neural networks (1D-CNN) and long short-term memory networks was employed for feature extraction and temporal pattern recognition. To extract and analyze features from the scalograms and Hilbert spectra, a Vision Transformer (ViT) and a 2D-CNN were used. RESULTS: The developed model achieved high performance, with accuracy, precision, recall, and F1-score of 98.81%, 98.83%, 98.81%, and 98.81%, respectively, when predicting SCD onset 30 min in advance. Furthermore, the proposed model can classify SCD patients and normal controls with 100% accuracy, outperforming existing state-of-the-art methods. CONCLUSIONS: The developed model captures diverse patterns in ECG signals recorded at multiple discrete time intervals (at 5-min increments from 5 to 30 min) prior to SCD onset that discriminate for SCD. The proposed model significantly improves early SCD prediction, providing a valuable tool for continuous ECG monitoring in high-risk patients.
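A short sketch of turning a 1-D ECG segment into a 2-D scalogram with a continuous wavelet transform, the step that feeds the 2D-CNN/ViT branches above. The Morlet wavelet, scale range, sampling rate, and synthetic signal are assumptions.

```python
# Sketch: ECG segment -> CWT scalogram image (assumed parameters).
import numpy as np
import pywt

fs = 360                                   # sampling rate (Hz), assumed
t = np.arange(0, 5, 1 / fs)                # 5-second ECG window
ecg = np.sin(2 * np.pi * 1.2 * t)          # synthetic stand-in signal

scales = np.arange(1, 128)
coeffs, freqs = pywt.cwt(ecg, scales, "morl", sampling_period=1 / fs)
scalogram = np.abs(coeffs)                 # (scales, time) image for the CNN
print(scalogram.shape)                     # (127, 1800)
```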

9.
Med Biol Eng Comput ; 2024 Oct 25.
Article in English | MEDLINE | ID: mdl-39453557

ABSTRACT

Deep neural networks (DNNs) have demonstrated exceptional performance in medical image analysis. However, recent studies have uncovered significant vulnerabilities in DNN models, particularly their susceptibility to adversarial attacks that manipulate these models into making inaccurate predictions. Vision Transformers (ViTs), despite their advanced capabilities in medical imaging tasks, have not been thoroughly evaluated for robustness against such attacks in this domain. This study addresses that gap by conducting an extensive analysis of various adversarial attacks on ViTs specifically within medical imaging contexts. We explore adversarial training as a potential defense mechanism and assess the resilience of ViT models against state-of-the-art adversarial attacks and defense strategies using publicly available benchmark medical image datasets. Our findings reveal that ViTs are vulnerable to adversarial attacks even with minimal perturbations, although adversarial training significantly enhances their robustness, achieving over 80% classification accuracy. Additionally, we perform a comparative analysis with state-of-the-art convolutional neural network models, highlighting the unique strengths and weaknesses of ViTs in handling adversarial threats. This research advances the understanding of ViT robustness in medical imaging and provides insights into their practical deployment in real-world scenarios.
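A minimal FGSM sketch to make "adversarial attack" and "adversarial training" concrete; the epsilon, toy model, and random data are placeholders, not the paper's setup.

```python
# FGSM attack plus one adversarial-training step (illustrative only).
import torch
import torch.nn as nn

def fgsm(model, x, y, eps=2 / 255):
    x = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # Perturb each pixel in the direction that increases the loss.
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))
x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 2, (4,))

x_adv = fgsm(model, x, y)                             # craft attack
loss = nn.functional.cross_entropy(model(x_adv), y)   # train on it
loss.backward()
print(float(loss))
```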

10.
J Imaging Inform Med ; 2024 Oct 25.
Article in English | MEDLINE | ID: mdl-39455543

ABSTRACT

Bladder cancer, often asymptomatic in its early stages, is a cancer for which early detection is crucial. Endoscopic images are meticulously evaluated by experts, sometimes across different disciplines, to identify tissue types, and a computer-aided decision support system could free up expert time for patient treatment. To this end, this study evaluates the performance of three models on a bladder tissue dataset. The first is a convolutional neural network (CNN)-based deep learning (DL) network; the second is a hybrid CNN-machine learning (ML) model (DL + ML), which classifies deep features obtained from a CNN-based network with ML; and the third, which achieved the best performance metrics, is a vision transformer (ViT) architecture. Furthermore, a graphical user interface (GUI) is provided for an accessible decision support system. The accuracy values for the DL, DL + ML, and ViT models are 0.9086, 0.8971, and 0.9257, and the F1 scores are 0.8884, 0.8496, and 0.8931, respectively.
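The "DL + ML" hybrid above follows a common recipe: freeze a CNN, use its penultimate activations as features, and classify them with a conventional ML model. A sketch under stated assumptions (random stand-in features, an assumed 512-d embedding, an SVM as the ML stage):

```python
# Sketch of the deep-features + ML classifier hybrid (illustrative data).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
deep_features = rng.normal(size=(200, 512))   # CNN embeddings, assumed 512-d
labels = rng.integers(0, 2, size=200)         # tissue class labels

Xtr, Xte, ytr, yte = train_test_split(deep_features, labels, random_state=0)
clf = SVC(kernel="rbf").fit(Xtr, ytr)
print("accuracy:", clf.score(Xte, yte))
```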

11.
Front Neurorobot ; 18: 1453571, 2024.
Article in English | MEDLINE | ID: mdl-39463860

ABSTRACT

Introduction: Assistive robots and human-robot interaction have become integral parts of sports training. However, existing methods often fail to provide real-time, accurate feedback, and they rarely integrate comprehensive multi-modal data. Methods: To address these issues, we propose CAM-Vtrans, a Cross-Attention Multi-modal Visual Transformer. By leveraging state-of-the-art techniques such as Vision Transformers (ViT) and models like CLIP, along with cross-attention mechanisms, CAM-Vtrans combines visual and textual information to provide athletes with accurate and timely feedback. Through the use of multi-modal robot data, CAM-Vtrans helps athletes optimize their performance while minimizing potential injury risks. This approach overcomes the limitations of existing methods and improves the precision and efficiency of sports training programs.
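The cross-attention mechanism named above can be sketched with PyTorch's built-in multi-head attention: visual tokens query text tokens, yielding text-conditioned visual features. Dimensions and token counts are illustrative assumptions.

```python
# Sketch of cross-attention between visual and textual tokens.
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)

visual = torch.randn(2, 196, 256)   # e.g. ViT patch tokens
text = torch.randn(2, 32, 256)      # e.g. CLIP text tokens

# Visual tokens attend to the text tokens (query=vision, key/value=text).
fused, _ = attn(query=visual, key=text, value=text)
print(fused.shape)  # torch.Size([2, 196, 256])
```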

12.
Biomed Eng Lett ; 14(6): 1421-1431, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39465118

ABSTRACT

Colorectal cancer ranks as the second most prevalent cancer worldwide, with a high mortality rate. Colonoscopy stands as the preferred procedure for diagnosing colorectal cancer. Detecting polyps at an early stage is critical for effective prevention and diagnosis. However, challenges in colonoscopic procedures often lead medical practitioners to seek support from alternative techniques for timely polyp identification. Polyp segmentation emerges as a promising approach to identify polyps in colonoscopy images. In this paper, we propose an advanced method, PolySegNet, that leverages both Vision Transformer and Swin Transformer, coupled with a Convolutional Neural Network (CNN) decoder. The fusion of these models facilitates a comprehensive analysis of the various modules in our proposed architecture. To assess the performance of PolySegNet, we evaluate it on three colonoscopy datasets, a combined dataset, and their augmented versions. The experimental results demonstrate that PolySegNet achieves competitive polyp segmentation accuracy and efficacy, with a mean Dice score of 0.92 and a mean Intersection over Union (IoU) of 0.86. These metrics highlight its superior performance in accurately delineating polyp boundaries compared to existing methods. PolySegNet has shown great promise in accurately and efficiently segmenting polyps in medical images, and the proposed method could be the foundation for a new class of transformer-based segmentation models in medical image analysis.
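The Dice and IoU figures quoted above follow from the standard overlap definitions, shown here in a small NumPy sketch on binary masks.

```python
# Dice coefficient and IoU for binary segmentation masks.
import numpy as np

def dice_iou(pred: np.ndarray, gt: np.ndarray, eps=1e-7):
    inter = np.logical_and(pred, gt).sum()
    dice = (2 * inter + eps) / (pred.sum() + gt.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, gt).sum() + eps)
    return dice, iou

pred = np.zeros((64, 64), bool); pred[10:40, 10:40] = True
gt = np.zeros((64, 64), bool); gt[15:45, 15:45] = True
print(dice_iou(pred, gt))
```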

13.
Sci Rep ; 14(1): 23879, 2024 Oct 12.
Article in English | MEDLINE | ID: mdl-39396096

ABSTRACT

Hyperspectral image (HSI) data contains a wide range of valuable spectral information for numerous tasks, but it poses challenges such as small training samples, label scarcity, and redundant information, and researchers have introduced various works to address them. Convolutional Neural Networks (CNNs) have achieved significant success in HSI classification; they primarily extract low-level features from HSI data but have a limited ability to capture long-range dependencies due to their confined filter size. In contrast, vision transformers have seen great success in HSI classification because their attention mechanisms learn long-range dependencies. The primary issue with these models, as noted above, is that they require sufficient labeled training data. To address this challenge, we propose a spectral-spatial feature extractor group attention transformer, which consists of a multiscale feature extractor for low-level or shallow features and a group attention mechanism for high-level semantic feature extraction. Our proposed model is evaluated on four publicly available HSI datasets: Indian Pines, Pavia University, Salinas, and KSC. It achieved the best classification results in terms of overall accuracy (OA), average accuracy (AA), and Kappa coefficient, while using only 5%, 1%, 1%, and 10% of the samples, respectively, from the four datasets for training.

14.
Neural Netw ; 181: 106749, 2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39357266

ABSTRACT

Unsupervised Domain Adaptation aims to leverage a source domain with ample labeled data to tackle tasks on an unlabeled target domain. However, this poses a significant challenge, particularly in scenarios exhibiting large disparities between the two domains, and prior methods often fall short due to the impact of incorrect pseudo-labeling noise and the limits of handcrafted domain alignment rules. In this paper, we propose a novel method called DCST (Dual Cross-Supervision Transformer), which improves upon existing methods in two key aspects. First, a vision transformer is combined with a dual cross-supervision learning strategy to enforce consistency learning across domains; the network accomplishes domain-specific self-training and cross-domain feature alignment in an adaptive manner. Second, to handle noise in challenging domains and reduce the risks of model collapse and overfitting, we propose a Domain Shift Filter. This module allows the model to leverage the memory of source-domain features to facilitate a smooth transition, and it improves the effectiveness of knowledge transfer between domains with significant gaps. We conduct extensive experiments on four benchmark datasets and achieve the best classification results: 94.3% on Office-31, 86.0% on Office-Home, 89.3% on VisDA-2017, and 48.8% on DomainNet. Code is available at https://github.com/Yislight/DCST.
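A hedged sketch of the general cross-supervision recipe on unlabeled target data: two branches exchange confident pseudo-labels. This illustrates the idea, not DCST's exact losses; the 0.9 confidence cutoff and 31-class logits are assumptions.

```python
# Cross-supervision with confidence-filtered pseudo-labels (illustrative).
import torch
import torch.nn.functional as F

def cross_supervision_loss(logits_a, logits_b, tau=0.9):
    prob_a, prob_b = logits_a.softmax(-1), logits_b.softmax(-1)
    conf_a, pseudo_a = prob_a.max(-1)
    conf_b, pseudo_b = prob_b.max(-1)
    # Each branch is trained on the other's confident predictions.
    loss_a = (F.cross_entropy(logits_a, pseudo_b, reduction="none")
              * (conf_b > tau)).mean()
    loss_b = (F.cross_entropy(logits_b, pseudo_a, reduction="none")
              * (conf_a > tau)).mean()
    return loss_a + loss_b

print(cross_supervision_loss(torch.randn(8, 31), torch.randn(8, 31)))
```

The confidence filter is what keeps incorrect pseudo-labeling noise, the failure mode named in the abstract, from dominating training.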

15.
Curr Med Imaging ; 2024 Oct 02.
Article in English | MEDLINE | ID: mdl-39360542

ABSTRACT

INTRODUCTION: In this study, we harnessed three cutting-edge algorithms to refine elbow fracture prediction through X-ray image analysis. Employing the YOLOv8 (You Only Look Once) algorithm, we first identified Regions of Interest (ROI) within the X-ray images, significantly augmenting fracture prediction accuracy. METHODS: Subsequently, we integrated and compared the ResNet, SeResNet (Squeeze-and-Excitation Residual Network), and ViT (Vision Transformer) algorithms to refine our predictive capabilities. To ensure optimal precision, we implemented a series of refinements, including recalibrating ROI regions to enable finer-grained identification of diagnostically significant areas and applying advanced image enhancement techniques to optimize the visual quality and structural clarity of the X-ray images. RESULTS: These methodological enhancements synergistically contributed to a substantial improvement in overall prediction accuracy. The dataset used for training, validation, testing, and comprehensive evaluation comprised exclusively elbow X-ray images. ResNet50 achieved an accuracy of 0.97, precision of 1.00, and recall of 0.95; SeResNet50 likewise achieved an accuracy of 0.97, precision of 1.00, and recall of 0.95; and ViT-B/16 achieved the highest accuracy of 0.99, with the same precision and a recall of 0.95. CONCLUSION: This approach has the potential to increase diagnostic precision, lessen the burden on radiologists, integrate easily into current medical imaging systems, and assist clinical decision-making, leading to better patient care and health outcomes overall.
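A pipeline sketch matching the two-stage design described above: YOLOv8 proposes ROIs, and each crop is handed to a classifier. The weight file and image paths are hypothetical placeholders, and the classifier step is left as a comment; only the ultralytics detection API shown here is real.

```python
# Two-stage sketch: YOLOv8 ROI detection -> crop -> fracture classifier.
from ultralytics import YOLO
from PIL import Image

detector = YOLO("elbow_roi_yolov8.pt")        # hypothetical trained weights
result = detector("elbow_xray.png")[0]        # hypothetical input image

image = Image.open("elbow_xray.png")
for box in result.boxes.xyxy.tolist():        # [x1, y1, x2, y2] per ROI
    crop = image.crop(tuple(box))
    # crop would be resized and passed to ResNet50 / SeResNet50 / ViT-B/16
    print("ROI:", [round(v, 1) for v in box])
```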

16.
Med Phys ; 2024 Oct 12.
Article in English | MEDLINE | ID: mdl-39395206

ABSTRACT

BACKGROUND: Although the uterus, bladder, and rectum are distinct organs, their muscular fasciae are often interconnected. Clinical experience suggests that they may share common risk factors and associations, and when one organ experiences prolapse, it can potentially affect the neighboring organs. However, the current assessment of disease severity still relies on manual measurements, which can yield varying results depending on the physician, leading to diagnostic inaccuracies. PURPOSE: This study aims to develop a multilabel grading model based on deep learning to classify the degree of pelvic organ prolapse (POP) of three organs in the female pelvis using stress magnetic resonance imaging (MRI), and to provide interpretable result analysis. METHODS: We utilized sagittal MRI sequences taken at rest and during maximum Valsalva maneuver from 662 subjects (training set n = 464, validation set n = 98, test set n = 100). We designed a feature extraction module specifically for pelvic floor MRI using the vision transformer architecture and employed a label masking training strategy and pre-training methods to enhance model convergence. The grading results were evaluated using Precision, Kappa, Recall, and Area Under the Curve (AUC). To validate its effectiveness, the designed model was compared with classic grading methods. Finally, we provide interpretability charts illustrating the model's operational principles on the grading task. RESULTS: In terms of POP grading detection, the model achieved an average Precision, Kappa coefficient, Recall, and AUC of 0.86, 0.77, 0.76, and 0.86, respectively. Compared to existing studies, our model achieved the highest performance metrics. The average time taken to diagnose a patient was 0.38 s. CONCLUSIONS: The proposed model achieved detection accuracy comparable to or exceeding that of physicians, demonstrating the effectiveness of the vision transformer architecture and the label masking training strategy for assisting in the grading of POP under static and maximum Valsalva conditions. This offers a promising option for computer-aided diagnosis and treatment planning of POP.
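One plausible reading of "label masking" in a multilabel grading task is a loss in which grades for the three organs are predicted jointly and masked entries (unknown, or deliberately hidden during training) contribute nothing to the loss. The sketch below is that interpretation, not the authors' code; shapes and grade counts are assumptions.

```python
# Masked multilabel grading loss (interpretive sketch).
import torch
import torch.nn.functional as F

def masked_grading_loss(logits, targets, mask):
    # logits: (batch, organs, grades); targets: (batch, organs); mask: 1=known
    loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten(),
                           reduction="none")
    return (loss * mask.flatten()).sum() / mask.sum().clamp(min=1)

logits = torch.randn(4, 3, 4)                  # 3 organs, 4 prolapse grades
targets = torch.randint(0, 4, (4, 3))
mask = torch.bernoulli(torch.full((4, 3), 0.7))
print(masked_grading_loss(logits, targets, mask))
```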

17.
Pathol Res Pract ; 263: 155644, 2024 Oct 05.
Article in English | MEDLINE | ID: mdl-39395299

ABSTRACT

Breast cancer (BC) is the most frequently occurring cancer in women after lung cancer. Among its types, invasive ductal BC causes the most deaths in women. In this work, three deep learning (DL) models, Vision Transformer (ViT), ConvMixer, and Visual Geometry Group-19 (VGG-19), are implemented for the detection and classification of breast cancer tumors using the BreakHis breast cancer histopathological image database. The performance of each model is evaluated using an 80:20 training scheme and measured in terms of accuracy, precision, recall, loss, F1-score, and area under the curve (AUC). ViT showed the best performance for binary classification of breast cancer tumors, with accuracy, precision, recall, and F1-score of 99.89%, 98.29%, 98.29%, and 98.29%, respectively. ViT also showed the best performance for eight-class classification, with an accuracy of 98.21%, average precision of 89.84%, recall of 89.97%, and F1-score of 88.75%. Moreover, we ensembled the ViT and ConvMixer models and observed that the ensemble performed worse than the ViT model alone. We also compared the performance of the proposed best model with other existing models reported by several research groups. The study will help find suitable models that increase accuracy in the early diagnosis of BC, and we hope it will also help minimize human errors in the early diagnosis of this fatal disease and support appropriate treatment. The proposed model may also be applied to the detection of other diseases with improved accuracy.

18.
Heliyon ; 10(17): e36611, 2024 Sep 15.
Article in English | MEDLINE | ID: mdl-39281453

ABSTRACT

Compressors are important production equipment in the petrochemical industry, and the accuracy of their fault diagnosis is critical. To detect and diagnose compressor equipment faults in a timely manner, this paper constructs a deep residual shrinkage visual network (DRS-ViT) comprising a modified residual network (ResNet) and a vision transformer (ViT). The acquired compressor vibration signals are transformed into Gramian angular summation field (GASF) plots using the Gramian angular field (GAF) method. Each resulting image is then passed through the modified ResNet to extract initial features, and the extracted feature maps are input into the ViT model for fault classification. The experimental results demonstrate that the DRS-ViT model achieves a fault diagnosis accuracy of 99.5%. Visualization of the model indicates that it can effectively identify the fault points. The validity and robustness of the DRS-ViT model are confirmed through comparison with various models.
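A NumPy sketch of the GASF encoding used above to turn a 1-D vibration signal into an image; the window length and synthetic signal are assumptions.

```python
# Gramian angular summation field (GASF) encoding of a 1-D signal.
import numpy as np

def gasf(signal: np.ndarray) -> np.ndarray:
    s = signal - signal.min()
    x = 2 * s / (s.max() + 1e-12) - 1             # rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1, 1))            # polar-angle encoding
    return np.cos(phi[:, None] + phi[None, :])    # GASF image

vibration = np.sin(np.linspace(0, 8 * np.pi, 224))
image = gasf(vibration)
print(image.shape)  # (224, 224)
```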

19.
Heliyon ; 10(17): e36361, 2024 Sep 15.
Article in English | MEDLINE | ID: mdl-39281639

ABSTRACT

Over the last decade, the use of machine learning in smart agriculture has surged in popularity. Deep learning, particularly Convolutional Neural Networks (CNNs), has been useful in identifying diseases in plants at an early stage. Recently, Vision Transformers (ViTs) have proven to be effective in image classification tasks, often outperforming state-of-the-art CNN models. However, the adoption of vision transformers in agriculture is still in its infancy. In this paper, we evaluated the performance of vision transformers in the identification of mango leaf diseases and compared them with popular CNNs. We proposed an optimized model based on a pretrained Data-efficient Image Transformer (DeiT) architecture that achieves 99.75% accuracy, better than many popular CNNs including SqueezeNet, ShuffleNet, EfficientNet, DenseNet121, and MobileNet. We also demonstrated that vision transformers can have a shorter training time than CNNs, as they require fewer epochs to achieve optimal results. Finally, we proposed a mobile app that uses the model as a backend to identify mango leaf diseases in real time.
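Fine-tuning a pretrained DeiT is straightforward with the timm library; a sketch in that spirit follows. The model name is a real timm identifier, but the class count, learning rate, and stand-in data are assumptions.

```python
# Sketch: fine-tune a pretrained DeiT with timm (assumed hyperparameters).
import timm
import torch

model = timm.create_model("deit_base_patch16_224", pretrained=True,
                          num_classes=8)        # e.g. 8 mango-leaf classes
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

images = torch.randn(2, 3, 224, 224)            # stand-in leaf images
labels = torch.randint(0, 8, (2,))
loss = torch.nn.functional.cross_entropy(model(images), labels)
loss.backward()
optimizer.step()
print(float(loss))
```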

20.
Heliyon ; 10(18): e37804, 2024 Sep 30.
Article in English | MEDLINE | ID: mdl-39323802

ABSTRACT

Brain tumors are one of the leading causes of cancer death; early screening is the best strategy for diagnosing and treating them. Magnetic Resonance Imaging (MRI) is extensively utilized for brain tumor diagnosis; nevertheless, achieving improved accuracy and performance, a critical challenge in most previously reported automated medical diagnostics, is a complex problem. This study introduces the Dual Vision Transformer-DSUNET model, which incorporates feature fusion techniques to provide precise and efficient differentiation between brain tumors and other brain regions by leveraging multi-modal MRI data. The impetus for this study is the need to automate brain tumor segmentation in medical imaging, a critical component of diagnosis and therapy planning. The BraTS 2020 dataset, an extensively utilized brain tumor segmentation benchmark, is employed; it encompasses multi-modal MRI images, including T1-weighted, T2-weighted, T1Gd (contrast-enhanced), and FLAIR modalities. The proposed model incorporates the dual vision idea to comprehensively capture the heterogeneous properties of brain tumors across several imaging modalities, and feature fusion techniques are implemented to improve the amalgamation of data from these modalities, enhancing the accuracy and dependability of tumor segmentation. The model's performance is evaluated using the Dice coefficient, a prevalent metric for quantifying segmentation accuracy. The experiments show remarkable performance, with Dice coefficients of 91.47% for enhancing tumor, 92.38% for tumor core, and 90.88% for edema, and a cumulative Dice score of 91.29% across all classes. In addition, the model attains an overall accuracy of roughly 99.93%, underscoring its robustness and efficacy in segmenting brain tumors. These findings demonstrate the soundness of the suggested architecture and its ability to improve detection accuracy across tumor sub-regions.
