Results 1 - 20 of 344
1.
Microsc Res Tech ; 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-39351968

ABSTRACT

Lymph-node status is important in decision-making during early gastric cancer (EGC) treatment. Currently, endoscopic submucosal dissection is the mainstream treatment for EGC. However, accurately diagnosing and treating EGC is challenging even for experienced endoscopists. Multiphoton microscopy can extract the morphological features of collagen fibers from tissues, and these features can be used to assess lymph-node metastasis status in patients with EGC. First, we compared the accuracy of four deep learning models (VGG16, ResNet34, MobileNetV2, and PVTv2) on preprocessed training images and a test dataset. Next, we integrated the features of the best-performing model, PVTv2, with manual and clinical features to develop a novel model called AutoLNMNet. The prediction accuracy of AutoLNMNet for the no-metastasis (Ly0) and lymph-node-metastasis (Ly1) stages reached 0.92, which was 0.3% higher than that of PVTv2. The areas under the receiver operating characteristic curves of AutoLNMNet for Ly0 and Ly1 were both 0.97. Therefore, AutoLNMNet is highly reliable and accurate in detecting lymph-node metastasis, providing an important tool for the early diagnosis and treatment of EGC.
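
As a rough illustration of the fusion step described above (deep features concatenated with manual and clinical features ahead of a final classifier), here is a minimal PyTorch sketch; all dimensions and layer sizes are assumptions for illustration, not taken from the paper:

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Late fusion: concatenate deep, manual, and clinical feature vectors,
    then classify. Dimensions are illustrative placeholders."""
    def __init__(self, deep_dim=512, manual_dim=16, clinical_dim=8, n_classes=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(deep_dim + manual_dim + clinical_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, deep_feats, manual_feats, clinical_feats):
        # Concatenate the three feature vectors per sample along dim 1.
        fused = torch.cat([deep_feats, manual_feats, clinical_feats], dim=1)
        return self.head(fused)
```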

2.
Front Comput Neurosci ; 18: 1404623, 2024.
Article in English | MEDLINE | ID: mdl-39380741

ABSTRACT

Introduction: Following the great success of Transformers in machine learning, they are gradually attracting widespread interest in remote sensing (RS). However, RS research has been hampered by the lack of large labeled datasets and by the inconsistency of data modes caused by the diversity of RS platforms. With the rise of self-supervised learning (SSL) algorithms in recent years, RS researchers have begun to explore the "pre-training and fine-tuning" paradigm. However, there is little research on multi-modal data fusion in RS: most work uses only one modality or simply concatenates multiple modalities. Method: To develop a more efficient multi-modal fusion scheme, we propose a multi-modal fusion mechanism based on gated unit control (MGSViT). We pretrain a ViT model on the BigEarthNet dataset by combining two commonly used SSL algorithms, and we propose intra-modal and inter-modal gated fusion units for feature learning that combine multispectral (MS) and synthetic aperture radar (SAR) data. Our method effectively combines different modalities to extract key feature information. Results and discussion: After fine-tuning and comparison experiments, our method outperforms the most advanced algorithms on all downstream classification tasks, verifying its validity.
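
The inter-modal gating described above can be sketched as a hypothetical PyTorch module (dimensions assumed, not the MGSViT implementation): a sigmoid gate computed from both modalities decides, feature-wise, how much of the MS versus SAR representation passes through.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Illustrative inter-modal gated fusion of MS and SAR token features."""
    def __init__(self, dim=768):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, ms_tokens, sar_tokens):
        # Gate in [0, 1] per feature, computed from both modalities jointly.
        g = self.gate(torch.cat([ms_tokens, sar_tokens], dim=-1))
        # Convex combination of the two modality representations.
        return g * ms_tokens + (1.0 - g) * sar_tokens
```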

3.
J Food Sci ; 2024 Oct 09.
Article in English | MEDLINE | ID: mdl-39385405

ABSTRACT

Pinelliae Rhizoma is a key ingredient in botanical supplements and is often adulterated with Rhizoma Pinelliae Pedatisectae, which is similar in appearance but less expensive. Accurate identification of these materials is crucial for both scientific and commercial purposes. Traditional morphological identification relies heavily on expert experience and is subjective, while chemical analysis and molecular biological identification are typically time-consuming and labor-intensive. This study employs a simpler, faster, non-invasive image-recognition technique to distinguish between these two highly similar plant materials, using the vision transformer (ViT) algorithm, a cutting-edge image-recognition technology. All samples were verified by DNA molecular identification before image analysis. The results demonstrate that the ViT algorithm achieves a classification accuracy exceeding 94%, significantly outperforming a convolutional neural network model's 60%-70% accuracy and highlighting the efficiency of this technology in identifying plant materials with similar appearances. This study is the first to apply the ViT algorithm to such a challenging task, showcasing its potential for precise botanical material identification and setting the stage for future advancements in the field.

4.
Neural Netw ; 181: 106749, 2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39357266

ABSTRACT

Unsupervised domain adaptation aims to leverage a source domain with ample labeled data to tackle tasks on an unlabeled target domain. This poses a significant challenge, particularly when there are large disparities between the two domains. Prior methods often fall short on challenging domains due to noise from incorrect pseudo-labeling and the limits of handcrafted domain-alignment rules. In this paper, we propose a novel method called DCST (Dual Cross-Supervision Transformer), which improves upon existing methods in two key aspects. First, a vision transformer is combined with a dual cross-supervision learning strategy to enforce consistency learning across domains; the network accomplishes domain-specific self-training and cross-domain feature alignment in an adaptive manner. Second, because challenging domains are noisy and the risks of model collapse and overfitting must be reduced, we propose a Domain Shift Filter. This module allows the model to leverage the memory of source-domain features to facilitate a smooth transition, and it improves the effectiveness of knowledge transfer between domains with significant gaps. We conducted extensive experiments on four benchmark datasets and achieved the best classification results: 94.3% on Office-31, 86.0% on Office-Home, 89.3% on VisDA-2017, and 48.8% on DomainNet. Code is available at https://github.com/Yislight/DCST.

5.
Curr Med Imaging ; 2024 Oct 02.
Article in English | MEDLINE | ID: mdl-39360542

ABSTRACT

INTRODUCTION: In this study, we harnessed three cutting-edge algorithms to refine elbow-fracture prediction from X-ray image analysis. Using the YOLOv8 (You Only Look Once) algorithm, we first identified regions of interest (ROIs) within the X-ray images, significantly improving fracture-prediction accuracy. METHODS: Subsequently, we integrated and compared the ResNet, SeResNet (Squeeze-and-Excitation Residual Network), and ViT (Vision Transformer) algorithms to refine our predictive capabilities. To ensure optimal precision, we implemented a series of refinements, including recalibrating the ROIs to enable finer-grained identification of diagnostically significant areas within the X-ray images and applying advanced image-enhancement techniques to optimize the visual quality and structural clarity of the X-rays. RESULTS: These methodological enhancements contributed to a substantial improvement in the overall accuracy of our fracture predictions. The dataset used for training, testing, validation, and evaluation comprised elbow X-ray images exclusively. The three algorithms predicted fractures as follows: ResNet50, accuracy 0.97, precision 1.00, recall 0.95; SeResNet50, accuracy 0.97, precision 1.00, recall 0.95; and ViT-B/16, accuracy 0.99, precision 1.00, recall 0.95. CONCLUSION: This approach has the potential to increase diagnostic precision, lessen the burden on radiologists, integrate easily into current medical imaging systems, and assist clinical decision-making, all of which could lead to better patient care and health outcomes overall.
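
A hedged sketch of the two-stage pipeline described above (YOLOv8 proposes ROIs, a classifier then judges each crop); the weight file and `fracture_model` are placeholders, not the authors' artifacts:

```python
from ultralytics import YOLO
from PIL import Image

detector = YOLO("yolov8n.pt")           # assume fine-tuned on elbow X-rays
results = detector("elbow_xray.png")    # standard ultralytics inference call

image = Image.open("elbow_xray.png")
for box in results[0].boxes.xyxy.tolist():   # [x1, y1, x2, y2] per detection
    x1, y1, x2, y2 = map(int, box)
    roi = image.crop((x1, y1, x2, y2))
    # `fracture_model` stands in for ResNet50 / SeResNet50 / ViT-B/16:
    # pred = fracture_model(preprocess(roi))
```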

6.
BMC Med Inform Decis Mak ; 24(1): 288, 2024 Oct 07.
Article in English | MEDLINE | ID: mdl-39375719

ABSTRACT

BACKGROUND: Histopathology is the gold standard for cancer diagnosis. It involves extracting tissue specimens from suspicious areas to prepare glass slides for microscopic examination. However, histological tissue-processing procedures introduce artifacts, which are ultimately transferred to the digitized versions of glass slides, known as whole slide images (WSIs). Artifacts are diagnostically irrelevant areas and may lead to wrong predictions from deep learning (DL) algorithms. Therefore, detecting and excluding artifacts in a computational pathology (CPATH) system is essential for reliable automated diagnosis. METHODS: In this paper, we propose a mixture-of-experts (MoE) scheme for detecting five notable artifacts in WSIs: damaged tissue, blur, folded tissue, air bubbles, and histologically irrelevant blood. First, we train independent binary DL models as experts to capture particular artifact morphologies. Then, we ensemble their predictions using a fusion mechanism and apply probabilistic thresholding over the final probability distribution to improve the sensitivity of the MoE. We developed four DL pipelines to evaluate computational and performance trade-offs: two MoEs and two multiclass models, built on state-of-the-art deep convolutional neural networks (DCNNs) and vision transformers (ViTs). These pipelines are quantitatively and qualitatively evaluated on external and out-of-distribution (OoD) data to assess generalizability and robustness for artifact detection. RESULTS: We extensively evaluated the proposed MoE and multiclass models. The DCNN-based and ViT-based MoE schemes outperformed the simpler multiclass models when tested on datasets from different hospitals and cancer types, with the MoE using MobileNet DCNNs yielding the best results. The proposed MoE achieves an 86.15% F1 score and 97.93% sensitivity on unseen data, at a lower inference cost than the MoE using ViTs. This best MoE performance comes with relatively higher computational trade-offs than the multiclass models. Furthermore, we apply post-processing to create an artifact segmentation mask, a potential artifact-free RoI map, a quality report, and an artifact-refined WSI for further computational analysis. In the qualitative evaluation, field experts assessed the predictive performance of the MoEs on OoD WSIs, rating artifact detection and artifact-free area preservation; the highest agreement translated to a Cohen's kappa of 0.82, indicating substantial agreement on the overall diagnostic usability of the DCNN-based MoE scheme. CONCLUSIONS: The proposed artifact-detection pipeline will not only ensure reliable CPATH predictions but may also provide quality control. In this work, the best-performing pipeline for artifact detection is the MoE with DCNNs. Our detailed experiments show that there is always a trade-off between performance and computational complexity, and no single DL solution equally suits all types of data and applications. The code and HistoArtifacts dataset can be found online at GitHub and Zenodo, respectively.
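
The fusion-plus-thresholding step can be sketched as follows; the per-class thresholds here are illustrative, not the tuned values from the paper:

```python
import numpy as np

def moe_predict(expert_probs, thresholds):
    """Each binary expert outputs a probability for its artifact class; a
    per-class threshold (tunable for sensitivity) makes the final call."""
    expert_probs = np.asarray(expert_probs)        # shape: (n_classes,)
    return expert_probs >= np.asarray(thresholds)  # boolean flag per artifact

# e.g. five experts: damaged tissue, blur, fold, air bubble, blood
flags = moe_predict([0.7, 0.2, 0.55, 0.1, 0.8], thresholds=[0.5] * 5)
```

Lowering a class threshold trades precision for sensitivity on that artifact, which matches the abstract's stated goal of improving MoE sensitivity.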


Subjects
Artifacts, Deep Learning, Humans, Neoplasms, Image Processing, Computer-Assisted/methods, Pathology, Clinical/standards, Image Interpretation, Computer-Assisted/methods
7.
Med Image Anal ; 99: 103356, 2024 Sep 30.
Article in English | MEDLINE | ID: mdl-39378568

ABSTRACT

Breast cancer is a significant global public health concern, with various treatment options available based on tumor characteristics. Pathological examination of excision specimens after surgery provides essential information for treatment decisions. However, the manual selection of representative sections for histological examination is laborious and subjective, leading to potential sampling errors and variability, especially in carcinomas previously treated with chemotherapy. Furthermore, the accurate identification of residual tumors presents significant challenges, emphasizing the need for systematic or assisted methods. To enable the development of deep-learning algorithms for automated cancer detection on radiology images, it is crucial to perform radiology-pathology registration, which ensures the generation of accurately labeled ground-truth data for training. However, aligning radiology and histopathology images is challenging due to their differences in content and resolution, tissue deformation, artifacts, and imprecise correspondence. We present a novel deep learning-based pipeline for the affine registration of faxitron images (x-ray representations of macrosections of ex-vivo breast tissue) and their corresponding histopathology images of tissue segments. The proposed model combines convolutional neural networks and vision transformers, allowing it to effectively capture both local and global information from the entire tissue macrosection as well as its segments. This integrated approach enables simultaneous registration and stitching of image segments, facilitating segment-to-macrosection registration through a puzzling-based mechanism. To address the limitations of multi-modal ground-truth data, we train the model using synthetic mono-modal data in a weakly supervised manner. The trained model performed well in multi-modal registration, yielding an average landmark error of 1.51 mm (±2.40) and a stitching distance of 1.15 mm (±0.94). The model performs significantly better than existing baselines, both deep learning-based and iterative, and is approximately 200 times faster than the iterative approach. This work bridges a gap between current research and the clinical workflow, with the potential to improve efficiency and accuracy in breast cancer evaluation and to streamline the pathology workflow.

8.
Brain Inform ; 11(1): 25, 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39363122

ABSTRACT

Transformers have dominated the landscape of natural language processing (NLP) and revolutionized generative AI applications. Vision Transformers (VTs) have recently become a new state of the art for computer vision applications. Motivated by the success of VTs in capturing short- and long-range dependencies and their ability to handle class imbalance, this paper proposes an ensemble framework of VTs for the efficient classification of Alzheimer's disease (AD). The framework consists of four vanilla VTs and ensembles formed using hard- and soft-voting approaches. The proposed model was tested on two popular AD datasets: OASIS and ADNI. The ADNI dataset was employed to assess the models' efficacy under imbalanced and data-scarce conditions. The ensemble of VTs improved on the individual models by around 2%. Furthermore, the results are compared with state-of-the-art and custom-built convolutional neural network (CNN) architectures and machine learning (ML) models under varying data conditions, demonstrating overall accuracy gains of 4.14% and 4.72% over the ML and CNN algorithms, respectively. The study also identifies specific limitations and proposes avenues for future research. The code used in the study is publicly available.
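
The hard- and soft-voting ensembling described above can be sketched in a few lines of PyTorch; this is a generic illustration, not the paper's code:

```python
import torch

def soft_vote(logits_list):
    """Average the softmax probabilities of several models, then argmax."""
    probs = torch.stack([l.softmax(dim=-1) for l in logits_list]).mean(dim=0)
    return probs.argmax(dim=-1)

def hard_vote(logits_list):
    """Majority vote over each model's argmax predictions."""
    preds = torch.stack([l.argmax(dim=-1) for l in logits_list])
    return preds.mode(dim=0).values
```

Soft voting retains each model's confidence, while hard voting only counts votes; under class imbalance the two can disagree, which is one reason to report both.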

9.
Skin Res Technol ; 30(9): e70040, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39221858

ABSTRACT

BACKGROUND: Skin cancer is one of the most common diseases in humans. Early detection and treatment are essential to reducing the malignancy of lesions. Deep learning techniques are supplementary tools that assist clinical experts in detecting and localizing skin lesions. Vision transformer (ViT) based multiclass classification of segmented images provides fairly accurate detection and is gaining popularity due to its legitimate multiclass prediction capabilities. MATERIALS AND METHODS: In this research, we propose a new ViT Gradient-weighted Class Activation Mapping (Grad-CAM) based architecture named ViT-GradCAM for detecting and classifying skin lesions by the spreading ratio on the lesion's surface area. The proposed system is trained and validated on the HAM10000 dataset, covering seven skin-lesion classes; the database comprises 10,015 dermatoscopic images of varied sizes. Data preprocessing and data augmentation techniques are applied to overcome class imbalance and improve the model's performance. RESULT: The proposed ViT-based algorithm classifies the dermatoscopic images into seven classes with an accuracy of 97.28%, precision of 98.51%, recall of 95.2%, and F1 score of 94.6%. ViT-GradCAM obtains better and more accurate detection and classification than other state-of-the-art deep learning-based skin-lesion detection models. Its outputs are extensively visualized to highlight the actual pixels in essential regions associated with skin-specific pathologies. CONCLUSION: This research proposes an alternative solution to the challenges of detecting and classifying skin lesions, using ViTs and Grad-CAM, which together enable accurate classification with visual explanations rather than reliance on deep learning models alone.
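
Grad-CAM adapted to a ViT, in the spirit of the ViT-GradCAM architecture above, can be sketched as below; the backbone, hooked layer, and 14x14 patch grid are assumptions for illustration, not the paper's implementation:

```python
import torch
import timm

# Hook the last block: gradients of the class score w.r.t. its patch tokens
# weight the token activations, and the result is reshaped to the patch grid.
model = timm.create_model("vit_base_patch16_224", pretrained=True)
model.eval()
acts, grads = {}, {}
layer = model.blocks[-1].norm1
layer.register_forward_hook(lambda m, i, o: acts.update(v=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

x = torch.randn(1, 3, 224, 224)       # stand-in for a dermatoscopic image
score = model(x)[0].max()             # top-class score
score.backward()

tokens_a = acts["v"][0, 1:]           # drop CLS token -> (196, 768)
tokens_g = grads["v"][0, 1:]
weights = tokens_g.mean(dim=0)        # channel-wise importance weights
cam = torch.relu(tokens_a @ weights).reshape(14, 14)
cam = cam / (cam.max() + 1e-8)        # normalized 14x14 heatmap
```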


Subjects
Algorithms, Deep Learning, Dermoscopy, Skin Neoplasms, Humans, Dermoscopy/methods, Skin Neoplasms/diagnostic imaging, Skin Neoplasms/classification, Skin Neoplasms/pathology, Image Interpretation, Computer-Assisted/methods, Databases, Factual, Skin/diagnostic imaging, Skin/pathology
10.
Heliyon ; 10(16): e36092, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39247290

ABSTRACT

Despite advances in deep learning for plant-leaf disease recognition, accurately distinguishing morphological features under varying environmental conditions remains challenging. Traditional deep learning models often fail to merge local and global information effectively, especially on small-scale datasets, impairing performance and raising training costs. Focusing on citrus diseases, we propose an improved FasterViT model, an advanced hybrid CNN-ViT framework that builds on FasterViT. The proposed model integrates a CNN's rapid local learning with a ViT's global information processing, effectively extracting complex textures and morphological features from images. Cross-stage alternating Mixup and Cutout are employed to enhance robustness and generalization, which is particularly valuable for fast learning on small-scale datasets because it simulates a more diverse training environment. Triplet Attention and AdaptiveAvgPool mechanisms are utilized to reduce training costs and optimize training performance. The proposed model is tested both on our specially constructed small-scale citrus disease dataset (the "in-field small dataset") and on the comprehensive PlantVillage dataset. The experimental results show that the model learns quickly and adapts well to small-sample training in plant-disease detection, and that our improvements raise accuracy while reducing training costs. Its strong performance in transfer-learning scenarios further underscores its adaptability and broad applicability. This study not only demonstrates the efficacy of the improved FasterViT model in addressing the complexities of plant-disease image recognition but also points the way toward efficient, scalable, and robust classification systems.
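
Of the two augmentations named above, Mixup can be sketched as follows (Cutout, not shown, instead masks a random square region of each image); this is a generic illustration, not the paper's cross-stage schedule:

```python
import torch

def mixup(x, y, alpha=0.2):
    """Blend random image pairs and return both label sets plus the mixing
    coefficient; the loss is then lam * CE(pred, y_a) + (1 - lam) * CE(pred, y_b)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x.size(0))          # random pairing within the batch
    x_mixed = lam * x + (1 - lam) * x[idx]
    return x_mixed, y, y[idx], lam
```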

11.
Heliyon ; 10(16): e36144, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39253215

ABSTRACT

Rationale and objectives: To develop and validate a deep learning (DL) model based on the Vision Transformer (ViT) to automatically diagnose muscle-invasive bladder cancer (MIBC) on MRI. Materials and methods: This multicenter retrospective study included patients with bladder cancer (BC) who presented to two institutions between January 2016 and June 2020 (training dataset) and a third institution between May 2017 and May 2022 (test dataset). The diagnostic model for MIBC and the segmentation model for BC on MRI were developed on the training dataset with 5-fold cross-validation. ViT- and convolutional neural network (CNN)-based diagnostic models were developed and their diagnostic performance compared using the area under the curve (AUC). The performance of the diagnostic model with manual and auto-generated regions of interest (ROImanual and ROIauto, respectively) was validated on the test dataset and compared to that of radiologists (three senior and three junior) using Vesical Imaging Reporting and Data System scoring. Results: The training and test datasets included 170 and 53 patients, respectively. The mean AUC of the top 10 ViT-based models under 5-fold cross-validation outperformed that of the CNN-based models (0.831 ± 0.003 vs. 0.713 ± 0.007 to 0.812 ± 0.006, p < .001). The diagnostic model with ROImanual achieved an AUC of 0.872 (95% CI: 0.777, 0.968), comparable to that of the junior radiologists (AUC = 0.862, 0.873, and 0.930). Semi-automated diagnosis with ROIauto achieved an AUC of 0.815 (95% CI: 0.696, 0.935). Conclusion: The DL model effectively diagnosed MIBC. The ViT-based model outperformed the CNN-based models, highlighting its utility in medical image analysis.

12.
New Microbes New Infect ; 62: 101457, 2024 Dec.
Article in English | MEDLINE | ID: mdl-39253407

ABSTRACT

Background: Large vision models (LVMs) pretrained on large datasets have demonstrated an enormous capacity to understand visual patterns and capture semantic information from images. We propose a novel method of knowledge-domain adaptation with a pretrained LVM to build a low-cost artificial intelligence (AI) model that quantifies the severity of SARS-CoV-2 pneumonia from frontal chest X-ray (CXR) images. Methods: Our method uses a pretrained LVM as the primary feature extractor and self-supervised contrastive learning for domain adaptation. An encoder with a 2048-dimensional feature vector output was first trained by self-supervised learning for knowledge-domain adaptation; a multi-layer perceptron (MLP) was then trained for the final severity prediction. A dataset of 2599 CXR images was used for model training and evaluation. Results: The model based on the pretrained vision transformer (ViT) and self-supervised learning achieved the best performance in cross-validation, with a mean squared error (MSE) of 23.83 (95% CI 22.67-25.00) and mean absolute error (MAE) of 3.64 (95% CI 3.54-3.73). Its predictions correlate with ground truth with an R² of 0.81 (95% CI 0.79-0.82) and Spearman ρ of 0.80 (95% CI 0.77-0.81), comparable to current state-of-the-art (SOTA) methods trained on much larger CXR datasets. Conclusion: The proposed method achieves SOTA performance in quantifying the severity of SARS-CoV-2 pneumonia at a significantly lower cost, and it can be extended to other infectious-disease detection or quantification tasks to expedite the application of AI in medical research.
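
The two-stage setup (pretrained encoder as a frozen feature extractor, small trainable MLP as the severity head) might look like the following sketch; the backbone choice, frozen weights, and layer sizes are assumptions, and the contrastive adaptation stage is taken as already done:

```python
import torch
import torch.nn as nn
import timm

# Pretrained ViT as feature extractor; num_classes=0 yields pooled features.
encoder = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)
for p in encoder.parameters():
    p.requires_grad = False               # freeze the pretrained LVM

mlp = nn.Sequential(                      # the trainable severity head
    nn.Linear(encoder.num_features, 256),
    nn.ReLU(),
    nn.Linear(256, 1),                    # scalar severity score
)

x = torch.randn(4, 3, 224, 224)           # stand-in batch of CXR images
severity = mlp(encoder(x))                # shape: (4, 1)
```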

13.
Sensors (Basel) ; 24(17)2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39275368

ABSTRACT

In online video understanding, which has a wide range of real-world applications, inference speed is crucial. Many approaches involve frame-level visual feature extraction, which often represents the biggest bottleneck. We propose RetinaViT, an efficient method for extracting frame-level visual features in an online video stream, aiming to fundamentally enhance the efficiency of online video understanding tasks. RetinaViT is composed of efficiently approximated Transformer blocks that only take changed tokens (event tokens) as queries and reuse the already processed tokens from the previous timestep for the others. Furthermore, we restrict keys and values to the spatial neighborhoods of event tokens to further improve efficiency. RetinaViT involves tuning multiple parameters, which we determine through a multi-step process. During model training, we randomly vary these parameters and then perform black-box optimization to maximize accuracy and efficiency on the pre-trained model. We conducted extensive experiments on various online video recognition tasks, including action recognition, pose estimation, and object segmentation, validating the effectiveness of each component in RetinaViT and demonstrating improvements in the speed/accuracy trade-off compared to baselines. In particular, for action recognition, RetinaViT built on ViT-B16 reduces inference time by approximately 61.9% on the CPU and 50.8% on the GPU, while achieving slight accuracy improvements rather than degradation.
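
The event-token selection at the heart of RetinaViT might be sketched as follows; the distance metric and threshold are illustrative assumptions, not the paper's exact rule:

```python
import torch

def select_event_tokens(curr_tokens, cached_tokens, tau=0.1):
    """Mark tokens whose content changed enough since the previous frame;
    only these "event tokens" are recomputed as queries."""
    delta = (curr_tokens - cached_tokens).norm(dim=-1)  # per-token change
    return delta > tau                                  # True = recompute

# Usage idea: run attention with queries restricted to curr_tokens[mask] and
# keys/values from their spatial neighborhoods, then splice the outputs back
# into the cached token map; non-event positions keep the cached outputs.
```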

14.
Comput Methods Programs Biomed ; 257: 108373, 2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39276667

ABSTRACT

Tumors are an important modern health concern, and breast cancer is among the most prevalent causes of death for women globally. Early detection allows patients to obtain appropriate therapy, increasing their probability of survival, and the adoption of 3-dimensional (3D) mammography for identifying breast abnormalities has reduced deaths dramatically. However, accurate detection and classification of breast lumps in 3D mammography is difficult due to factors such as inadequate contrast and normal fluctuations in tissue density, and several computer-aided diagnosis (CAD) solutions are under development to help radiologists classify breast abnormalities accurately. In this paper, a breast cancer diagnosis model is implemented to detect breast cancer in patients. 3D mammogram images gathered from the internet are first preprocessed using a median filter and image scaling, which enhance image quality, smooth out irregularities, adjust size and resolution, and remove noise and artifacts that may interfere with detecting abnormalities. The preprocessed images are then segmented using an Adaptive Thresholding with Region Growing Fusion Model (AT-RGFM), which combines the advantages of thresholding and region-growing techniques to accurately identify and delineate structures within the image, with the Modified Garter Snake Optimization Algorithm (MGSOA) used to optimize the segmentation parameters. The segmented image is then fed into the detection phase, where tumor detection is performed by a Vision Transformer-based Multiscale Adaptive EfficientNetB7 (ViT-MAENB7) model; its multiscale adaptive approach analyzes the image at various levels of detail, improving the overall accuracy of tumor detection, and the MGSOA algorithm is again used to optimize the model's parameters. The suggested system was compared to conventional cancer diagnosis models and showed high accuracy: the developed MGSOA-ViT-MAENB7 achieves 96.6%, while RNN, LSTM, EffNet, and ViT-MAENet achieve 90.31%, 92.79%, 94.46%, and 94.75%, respectively. The model's ability to analyze images at multiple scales, combined with MGSOA optimization, yields a highly accurate and efficient system for detecting tumors in medical images, helping healthcare professionals tailor treatment plans to individual patients and setting a new standard for precision and effectiveness in medical imaging.

15.
Curr Med Imaging ; 2024 Sep 10.
Article in English | MEDLINE | ID: mdl-39257152

ABSTRACT

BACKGROUND: Accurately modeling respiratory motion in medical images is crucial for various applications, including radiation-therapy planning. However, existing registration methods often struggle to extract local features effectively, limiting their performance. OBJECTIVE: We propose a new framework called CvTMorph, which utilizes a Convolutional vision Transformer (CvT) and convolutional neural networks (CNNs) to improve local feature extraction. METHODS: CvTMorph integrates CvT and CNN to construct a hybrid model that combines the strengths of both approaches, with scaling and square layers added to enhance registration performance. We evaluated CvTMorph on the 4D-Lung and DIR-Lab datasets and compared it with state-of-the-art methods to demonstrate its effectiveness. RESULTS: The experimental results show that CvTMorph outperforms existing methods in accuracy and robustness for respiratory-motion modeling in 4D images. Incorporating the convolutional vision transformer significantly improved registration performance and enhanced the representation of local structures. CONCLUSION: CvTMorph offers a promising solution for accurately modeling respiratory motion in 4D medical images. The hybrid model has proven effective in extracting local features and improving registration performance, highlighting its potential for applications such as radiation-therapy planning and providing a basis for further research in this field.

16.
Diagnostics (Basel) ; 14(17)2024 Aug 25.
Article in English | MEDLINE | ID: mdl-39272643

ABSTRACT

The accurate and efficient segmentation of the spine is important in the diagnosis and treatment of spine malfunctions and fractures. However, it remains challenging because of large inter-vertebral variations in shape and cross-image localization of the spine. Previous methods have widely applied convolutional neural networks (CNNs) as a vision backbone for this task, but the inherent locality of the convolution operation limits their ability to utilize global contextual information across the whole image. Compared with CNNs, the Vision Transformer (ViT) offers a high capacity to capture global contextual information. However, when employed for spine segmentation, the ViT treats all input tokens equally, whether vertebrae-related or not, and lacks the capability to locate regions of interest, lowering segmentation accuracy. To address this limitation, we propose a novel Vertebrae-aware Vision Transformer (VerFormer) for automatic spine segmentation from CT images. VerFormer incorporates a novel Vertebrae-aware Global (VG) block into the ViT backbone. In the VG block, vertebrae-related global contextual information is extracted by a Vertebrae-aware Global Query (VGQ) module and then incorporated into the query tokens to highlight vertebrae-related tokens in the multi-head self-attention module. The VG block can thus leverage global contextual information to effectively and efficiently locate spines across the whole input, improving segmentation accuracy. Driven by this design, VerFormer captures more discriminative dependencies and vertebrae-related context in automatic spine segmentation. Experimental results on two spine CT segmentation tasks demonstrate the effectiveness of the VG block; compared with other popular CNN- or ViT-based segmentation models, VerFormer shows superior segmentation accuracy and generalization.

17.
J Imaging Inform Med ; 2024 Sep 16.
Article in English | MEDLINE | ID: mdl-39284980

ABSTRACT

Conventional diagnosis of septic arthritis relies on detecting the causal pathogens in samples of synovial fluid, synovium, or blood. However, isolating these pathogens through culture takes several days, delaying both diagnosis and treatment, so a quantitative classification model for rapid septic-arthritis diagnosis from ultrasound images is needed. For this study, a database of 342 non-septic arthritis images and 168 septic arthritis images produced by grayscale (GS) and power Doppler (PD) ultrasound was constructed. In the proposed architecture of fusion with attention and selective transformation (FAST), both groups of images were combined in a vision transformer (ViT) with a convolutional block attention module, which incorporates spatial, modality, and channel features. Fivefold cross-validation was applied to evaluate generalization. The FAST architecture achieved accuracy, sensitivity, specificity, and area under the curve (AUC) of 86.33%, 80.66%, 90.25%, and 0.92, respectively. These results were higher than those of a conventional ViT (82.14% accuracy) and significantly better than either modality alone (GS 73.88%, PD 72.02%), with p < 0.01. By integrating multiple modalities and extracting multi-channel features, the established model provides promising accuracy and AUC for septic-arthritis classification, and its end-to-end learning of ultrasound features can provide rapid, objective assessment suggestions for future clinical use.

18.
Heliyon ; 10(17): e36611, 2024 Sep 15.
Article in English | MEDLINE | ID: mdl-39281453

ABSTRACT

Compressors are important production equipment in the petrochemical industry, and the accuracy of their fault diagnosis is critical. To detect and diagnose compressor faults in a timely manner, this paper constructs a deep residual shrinkage visual network (DRS-ViT) comprising a modified residual network (ResNet) and a vision transformer (ViT). The compressor vibration signals are transformed into Gramian angular summation field (GASF) plots using the Gramian angular field (GAF) encoding. Each resulting image is passed through the modified ResNet to extract initial features, which are then input to the ViT model for fault classification. The experimental results demonstrate that the DRS-ViT model achieves a fault-diagnosis accuracy of 99.5%. Visualization of the model indicates that it effectively identifies the fault points, and comparison with various models confirms the validity and robustness of DRS-ViT.
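
The GASF encoding used above to turn a 1-D vibration signal into an image follows a standard formula: rescale the signal to [-1, 1], map samples to angles via arccos, and form cos(φi + φj). A NumPy sketch, with a toy signal standing in for real compressor vibrations:

```python
import numpy as np

def gasf(signal):
    """Gramian angular summation field of a 1-D signal as an (N, N) image."""
    x = np.asarray(signal, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1   # rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1.0, 1.0))            # angular encoding
    return np.cos(phi[:, None] + phi[None, :])        # cos(phi_i + phi_j)

image = gasf(np.sin(np.linspace(0, 8 * np.pi, 128)))  # toy vibration signal
```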

19.
Heliyon ; 10(17): e36361, 2024 Sep 15.
Article in English | MEDLINE | ID: mdl-39281639

ABSTRACT

Over the last decade, the use of machine learning in smart agriculture has surged in popularity. Deep learning, particularly convolutional neural networks (CNNs), has been useful in identifying plant diseases at an early stage. Recently, Vision Transformers (ViTs) have proven effective in image classification tasks, often outperforming state-of-the-art CNN models; however, their adoption in agriculture is still in its infancy. In this paper, we evaluated the performance of vision transformers in identifying mango leaf diseases and compared them with popular CNNs. We propose an optimized model based on a pretrained Data-efficient Image Transformer (DeiT) architecture that achieves 99.75% accuracy, better than many popular CNNs including SqueezeNet, ShuffleNet, EfficientNet, DenseNet121, and MobileNet. We also demonstrate that vision transformers can train in less time than CNNs, requiring fewer epochs to reach optimal results, and we present a mobile app that uses the model as a backend to identify mango leaf diseases in real time.
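
Fine-tuning a pretrained DeiT of this kind is commonly done via timm; a minimal sketch, with the class count and training details as assumptions rather than the paper's configuration:

```python
import timm
import torch.nn as nn

num_diseases = 8                          # assumption: dataset class count
model = timm.create_model(
    "deit_base_patch16_224",              # real timm identifier for DeiT-Base
    pretrained=True,
    num_classes=num_diseases,             # replaces the classification head
)
criterion = nn.CrossEntropyLoss()
# ...standard supervised training loop over the leaf-image dataset...
```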

20.
Sci Rep ; 14(1): 21740, 2024 09 18.
Article in English | MEDLINE | ID: mdl-39289394

ABSTRACT

Kidney diseases pose a significant global health challenge, requiring precise diagnostic tools to improve patient outcomes. This study addresses this need by investigating three main categories of renal disease: kidney stones, cysts, and tumors. Using a comprehensive dataset of 12,446 whole-abdomen CT and urogram images, we developed an advanced AI-driven diagnostic system tailored to kidney-disease classification. Our approach combines the strengths of a traditional convolutional neural network architecture (AlexNet) with modern ConvNeXt architectures: by integrating AlexNet's robust feature extraction with ConvNeXt's advanced attention mechanisms, we achieved an exceptional classification accuracy of 99.85%. A key methodological advance lies in the strategic amalgamation of features from both networks; we concatenated hierarchical spatial information and incorporated self-attention mechanisms to enhance classification performance. Furthermore, we introduce a custom optimization technique inspired by the Adam optimizer that dynamically adjusts the step size based on gradient norms, facilitating faster convergence and more effective weight updates and improving model performance. The model demonstrated outstanding performance across various metrics, with an average precision of 99.89%, recall of 99.95%, and specificity of 99.83%. These results highlight the efficacy of the hybrid architecture and optimization strategy in accurately diagnosing kidney diseases. Our methodology also emphasizes interpretability and explainability, which are crucial for the clinical deployment of deep learning models.
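
The gradient-norm-based step-size adjustment described above might look like the following simplified sketch; this illustrates the scaling rule on plain SGD and is not the authors' Adam-inspired optimizer:

```python
import torch

@torch.no_grad()
def norm_scaled_step(params, lr=1e-3, max_norm=1.0):
    """Shrink the effective step when the global gradient norm is large."""
    grads = [p.grad for p in params if p.grad is not None]
    total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads)).item()
    scale = min(1.0, max_norm / (total_norm + 1e-8))   # dynamic step scaling
    for p in params:
        if p.grad is not None:
            p.add_(p.grad, alpha=-lr * scale)          # scaled gradient step
```

In an Adam-style optimizer the same scale factor would multiply the adaptive per-parameter step instead of the raw gradient.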


Subjects
Kidney Diseases, Neural Networks, Computer, Humans, Kidney Diseases/diagnosis, Kidney Diseases/diagnostic imaging, Tomography, X-Ray Computed/methods, Kidney Calculi/diagnosis, Kidney Calculi/diagnostic imaging, Deep Learning, Algorithms