Results 1 - 20 of 303
1.
Biomed Eng Comput Biol ; 15: 11795972241271569, 2024.
Article in English | MEDLINE | ID: mdl-39156985

ABSTRACT

Cancer is the leading cause of mortality worldwide, and lung and colon cancers are two of the most common causes of death and morbidity among all cancers. The aim of this study was to develop an automated lung and colon cancer classification system using histopathological images. The system was developed using histopathological images from the LC25000 dataset. Algorithm development included data splitting, deep neural network model selection, on-the-fly image augmentation, training, and validation. The core of the algorithm was a Swin Transformer V2 model, and 5-fold cross-validation was used to evaluate model performance. Performance was evaluated using accuracy, Kappa, the confusion matrix, precision, recall, and F1 score. Extensive experiments were conducted to compare the performance of different neural networks, including both mainstream convolutional neural networks and vision transformers. The Swin Transformer V2 model achieved 1 (100%) on all metrics, making it the first single model to obtain perfect results on this dataset. The Swin Transformer V2 model therefore has the potential to assist pathologists in classifying lung and colon cancers from histopathology images.
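
The pipeline described above (a pretrained Swin Transformer V2 fine-tuned with on-the-fly augmentation under 5-fold cross-validation) can be sketched roughly as below; the model variant, image size, folder layout, and hyperparameters are assumptions for illustration, not the authors' exact configuration.

```python
# Hedged sketch: fine-tuning a Swin Transformer V2 with 5-fold CV on LC25000-style
# data. Model variant, transforms, and hyperparameters are illustrative assumptions.
import numpy as np
import timm, torch
from sklearn.model_selection import StratifiedKFold
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

train_tf = transforms.Compose([                 # on-the-fly augmentation
    transforms.RandomResizedCrop(256),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("LC25000/", transform=train_tf)  # assumed class-per-folder layout
labels = [y for _, y in dataset.samples]

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(skf.split(np.zeros(len(labels)), labels)):
    model = timm.create_model("swinv2_tiny_window8_256",
                              pretrained=True, num_classes=len(dataset.classes))
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loader = DataLoader(Subset(dataset, train_idx), batch_size=16, shuffle=True)
    for x, y in loader:                          # one epoch shown for brevity
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        opt.step()
    # ...evaluate accuracy/precision/recall/F1/Kappa on Subset(dataset, val_idx)
```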

2.
Ultrasonics ; 143: 107403, 2024 Jul 14.
Article in English | MEDLINE | ID: mdl-39116790

ABSTRACT

This article presents a method that uses the dispersive behavior of ultrasonic guided waves and neural networks to determine the isotropic elastic constants of plate-like structures from dispersion images. Two different architectures are compared: one using convolutions and transfer learning based on EfficientNetB7, and a Vision Transformer-like approach. Simulated and measured dispersion images are generated; the former are used to design, train, and validate the neural networks, and the latter to test them. During training, distinct data augmentation layers are employed to introduce artifacts that appear in measurement data into the simulated data; these layers allow the neural networks to extrapolate from simulated to measured data. The trained neural networks are assessed using dispersion images from seven known material samples, and multiple variations of the measured dispersion images are tested to verify prediction stability. The study demonstrates that neural networks can learn to predict the isotropic elastic constants from measured dispersion images using only simulated dispersion images for training and validation, without needing an initial guess or manual feature extraction and independent of the measurement setup. Furthermore, the suitability of the different architectures for extracting information from dispersion images in general, as well as an image-to-regression visualisation technique, is discussed.
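
As a rough illustration of the convolutional branch, the sketch below builds a transfer-learning regressor on EfficientNetB7 with simple augmentation layers that inject measurement-like artifacts during training; the input size, layer choices, and two-constant output head are assumptions, not the authors' exact setup.

```python
# Hedged sketch: EfficientNetB7 transfer learning for regressing elastic constants
# from dispersion images. Augmentation layers and head size are illustrative.
import tensorflow as tf

inputs = tf.keras.Input(shape=(600, 600, 3))            # assumed image size
x = tf.keras.layers.GaussianNoise(0.05)(inputs)          # measurement-like noise (train-time only)
x = tf.keras.layers.RandomTranslation(0.02, 0.02)(x)     # small misalignment artifacts
base = tf.keras.applications.EfficientNetB7(include_top=False,
                                             weights="imagenet", pooling="avg")
base.trainable = False                                    # transfer learning: freeze the backbone
x = base(x, training=False)
x = tf.keras.layers.Dense(256, activation="relu")(x)
outputs = tf.keras.layers.Dense(2)(x)                     # e.g., two isotropic elastic constants
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
# model.fit(simulated_images, simulated_constants, validation_split=0.1, epochs=50)
```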

3.
Ophthalmol Sci ; 4(6): 100552, 2024.
Article in English | MEDLINE | ID: mdl-39165694

ABSTRACT

Objective: Vision transformers (ViTs) have shown promising performance in various classification tasks previously dominated by convolutional neural networks (CNNs). However, the performance of ViTs in referable diabetic retinopathy (DR) detection is relatively underexplored. In this study, we evaluated the comparative performance of ViTs and CNNs in detecting referable DR from retinal photographs. Design: Retrospective study. Participants: A total of 48 269 retinal images from the open-source Kaggle DR detection dataset, the Messidor-1 dataset, and the Singapore Epidemiology of Eye Diseases (SEED) study were included. Methods: Using 41 614 retinal photographs from the Kaggle dataset, we developed 5 CNN models (Visual Geometry Group 19, ResNet50, InceptionV3, DenseNet201, and EfficientNetV2S) and 4 ViT models (VAN_small, CrossViT_small, ViT_small, and Hierarchical Vision Transformer using Shifted Windows [SWIN]_tiny) for the detection of referable DR. We defined the presence of referable DR as eyes with moderate or worse DR. The comparative performance of all 9 models was evaluated in the Kaggle internal test dataset (1045 study eyes) and in 2 external test sets, the SEED study (5455 study eyes) and the Messidor-1 dataset (1200 study eyes). Main Outcome Measures: Area under the receiver operating characteristic curve (AUC), specificity, and sensitivity. Results: Among all models, the SWIN transformer displayed the highest AUC of 95.7% on the internal test set, significantly outperforming the CNN models (all P < 0.001). The same observation was confirmed in the external test sets, with the SWIN transformer achieving AUCs of 97.3% in SEED and 96.3% in Messidor-1. When the specificity level was fixed at 80% on the internal test set, the SWIN transformer achieved the highest sensitivity of 94.4%, significantly better than all the CNN models (sensitivity levels ranging between 76.3% and 83.8%; all P < 0.001). This trend was also consistently observed in both external test sets. Conclusions: Our findings demonstrate that ViTs provide superior performance over CNNs in detecting referable DR from retinal photographs. These results point to the potential of utilizing ViT models to improve and optimize retinal photo-based deep learning for referable DR detection. Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
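
For the fixed-specificity comparison reported above, sensitivity at a chosen specificity can be read directly off the ROC curve. A minimal sketch follows, assuming per-eye labels and referable-DR probabilities from any of the nine models; the toy arrays are illustrative only.

```python
# Hedged sketch: AUC and sensitivity at a fixed 80% specificity from model scores.
# y_true / y_score stand in for per-eye labels and referable-DR probabilities.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def sensitivity_at_specificity(y_true, y_score, specificity=0.80):
    fpr, tpr, _ = roc_curve(y_true, y_score)
    ok = fpr <= (1.0 - specificity)          # operating points meeting the specificity floor
    return tpr[ok].max() if ok.any() else 0.0

y_true = np.array([0, 0, 1, 1, 0, 1])        # toy example
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])
print("AUC:", roc_auc_score(y_true, y_score))
print("Sensitivity @ 80% specificity:", sensitivity_at_specificity(y_true, y_score))
```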

4.
Med Phys ; 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39137295

ABSTRACT

BACKGROUND: Precise glioma segmentation from multi-parametric magnetic resonance (MR) images is essential for brain glioma diagnosis. However, due to the indistinct boundaries between tumor sub-regions and the heterogeneous appearance of gliomas in volumetric MR scans, designing a reliable and automated glioma segmentation method remains challenging. Although existing 3D Transformer-based or convolution-based segmentation networks have obtained promising results via multi-modal feature fusion strategies or contextual learning methods, they largely lack the capability for hierarchical interaction between different modalities and cannot effectively learn comprehensive feature representations for all glioma sub-regions. PURPOSE: To overcome these problems, in this paper we propose a 3D hierarchical cross-modality interaction network (HCMINet) using Transformers and convolutions for accurate multi-modal glioma segmentation, which leverages an effective hierarchical cross-modality interaction strategy to learn modality-specific and modality-shared knowledge relevant to glioma sub-region segmentation from multi-parametric MR images. METHODS: In the HCMINet, we first design a hierarchical cross-modality interaction Transformer (HCMITrans) encoder to hierarchically encode and fuse heterogeneous multi-modal features through Transformer-based intra-modal embeddings and inter-modal interactions across multiple encoding stages, which effectively captures complex cross-modality correlations while modeling global contexts. Then, we pair the HCMITrans encoder with a modality-shared convolutional encoder to construct a dual-encoder architecture in the encoding stage, which learns abundant contextual information from both global and local perspectives. Finally, in the decoding stage, we present a progressive hybrid context fusion (PHCF) decoder to progressively fuse the local and global features extracted by the dual-encoder architecture, using a local-global context fusion (LGCF) module to efficiently alleviate the contextual discrepancy among the decoding features. RESULTS: Extensive experiments were conducted on two public and competitive glioma benchmark datasets, the BraTS2020 dataset with 494 patients and the BraTS2021 dataset with 1251 patients. Results show that our proposed method outperforms existing Transformer-based and CNN-based methods using other multi-modal fusion strategies in our experiments. Specifically, the proposed HCMINet achieves state-of-the-art mean DSC values of 85.33% and 91.09% on the BraTS2020 online validation dataset and the BraTS2021 local testing dataset, respectively. CONCLUSIONS: Our proposed method can accurately and automatically segment glioma regions from multi-parametric MR images, which is beneficial for the quantitative analysis of brain gliomas and helpful for reducing the annotation burden of neuroradiologists.

5.
Comput Biol Med ; 180: 109009, 2024 Aug 12.
Article in English | MEDLINE | ID: mdl-39137673

ABSTRACT

Accurate lung tumor segmentation from Computed Tomography (CT) scans is crucial for lung cancer diagnosis. Since 2D methods lack the volumetric information of lung CT images, 3D convolution-based and Transformer-based methods have recently been applied to lung tumor segmentation tasks using CT imaging. However, most existing 3D methods cannot effectively combine the local patterns learned by convolutions with the global dependencies captured by Transformers, and they largely ignore the important boundary information of lung tumors. To tackle these problems, we propose a 3D boundary-guided hybrid network using convolutions and Transformers for lung tumor segmentation, named BGHNet. In BGHNet, we first propose the Hybrid Local-Global Context Aggregation (HLGCA) module with parallel convolution and Transformer branches in the encoding phase. To aggregate local and global contexts in each branch of the HLGCA module, we design the Volumetric Cross-Stripe Window Transformer (VCSwin-Transformer) to build the Transformer branch with local inductive biases and large receptive fields, and the Volumetric Pyramid Convolution with transformer-based extensions (VPConvNeXt) to build the convolution branch with multi-scale global information. Then, we present a Boundary-Guided Feature Refinement (BGFR) module in the decoding phase, which explicitly leverages boundary information to refine multi-stage decoding features for better performance. Extensive experiments were conducted on two lung tumor segmentation datasets: a private dataset (HUST-Lung) and a public benchmark dataset (MSD-Lung). Results show that BGHNet outperforms other state-of-the-art 2D and 3D methods in our experiments and exhibits superior generalization in both non-contrast and contrast-enhanced CT scans.

6.
Sci Rep ; 14(1): 18506, 2024 Aug 09.
Article in English | MEDLINE | ID: mdl-39122773

ABSTRACT

This paper aims to improve the target-tracking capability of unmanned aerial vehicles (UAVs). First, a control model based on fuzzy logic is created, which adjusts the UAV's flight attitude in response to the target's motion status and changes in the surrounding environment. Then, an edge computing-based target tracking framework is created: by deploying edge devices around the UAV, the computation for target recognition and position prediction is offloaded from the central processing unit to the edge nodes. Finally, the latest Vision Transformer model is adopted for target recognition; the image is divided into uniform blocks, and an attention mechanism captures the relationships between blocks to realize real-time image analysis. To predict the target's position, a particle filter algorithm combines historical data and sensor inputs to produce a high-precision estimate of the target position. Experimental results in different scenes show that the average target capture time of the fuzzy-logic-based algorithm is shortened by 20% compared with the traditional proportional-integral-derivative (PID) method, from 5.2 s to 4.2 s. The average tracking error is reduced by 15%, from 0.8 m to 0.68 m. Meanwhile, under environmental changes and changes in target motion, the algorithm shows better robustness, with a tracking-error fluctuation range only half that of traditional PID. This shows that fuzzy logic control theory can be successfully applied to UAV target tracking and demonstrates the effectiveness of this method in improving tracking performance.
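
The position-prediction step above relies on a particle filter; a minimal 2D sketch is shown below, assuming a constant-velocity motion model and noisy position measurements (for example, detections produced by the ViT recognizer). The noise levels and particle count are illustrative.

```python
# Hedged sketch: a minimal 2D particle filter for target position estimation,
# assuming a constant-velocity motion model and noisy position measurements.
import numpy as np

rng = np.random.default_rng(0)
N = 1000
particles = rng.normal(0.0, 1.0, size=(N, 4))    # state: [x, y, vx, vy]
weights = np.full(N, 1.0 / N)

def predict(particles, dt=0.1, motion_noise=0.05):
    particles[:, 0:2] += particles[:, 2:4] * dt                  # move with current velocity
    particles += rng.normal(0.0, motion_noise, particles.shape)  # process noise
    return particles

def update(particles, weights, z, meas_noise=0.3):
    d = np.linalg.norm(particles[:, 0:2] - z, axis=1)            # distance to measurement
    weights = weights * np.exp(-0.5 * (d / meas_noise) ** 2)     # Gaussian likelihood
    weights += 1e-300                                            # avoid all-zero weights
    return weights / weights.sum()

def resample(particles, weights):
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

for z in [np.array([0.1, 0.2]), np.array([0.2, 0.35])]:          # toy measurements
    particles = predict(particles)
    weights = update(particles, weights, z)
    particles, weights = resample(particles, weights)
    print("estimated position:", np.average(particles[:, 0:2], axis=0, weights=weights))
```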

7.
Sensors (Basel) ; 24(15)2024 Jul 29.
Article in English | MEDLINE | ID: mdl-39123960

ABSTRACT

Visual object tracking, pivotal for applications like earth observation and environmental monitoring, encounters challenges under adverse conditions such as low light and complex backgrounds. Traditional tracking technologies often falter, especially when tracking dynamic objects like aircraft amidst rapid movements and environmental disturbances. This study introduces an innovative adaptive multimodal image object-tracking model that harnesses the capabilities of multispectral image sensors, combining infrared and visible light imagery to significantly enhance tracking accuracy and robustness. By employing the advanced vision transformer architecture and integrating token spatial filtering (TSF) and crossmodal compensation (CMC), our model dynamically adjusts to diverse tracking scenarios. Comprehensive experiments conducted on a private dataset and various public datasets demonstrate the model's superior performance under extreme conditions, affirming its adaptability to rapid environmental changes and sensor limitations. This research not only advances visual tracking technology but also offers extensive insights into multisource image fusion and adaptive tracking strategies, establishing a robust foundation for future enhancements in sensor-based tracking systems.

8.
Neuroscience ; 556: 42-51, 2024 Aug 03.
Article in English | MEDLINE | ID: mdl-39103043

ABSTRACT

Brain-computer interface (BCI) is a technology that directly connects signals between the human brain and a computer or other external device. Motor imagery electroencephalographic (MI-EEG) signals are considered a promising paradigm for BCI systems, with a wide range of potential applications in medical rehabilitation, human-computer interaction, and virtual reality. Accurate decoding of MI-EEG signals poses a significant challenge due to the quality of the collected EEG data and subject variability. Therefore, developing an efficient MI-EEG decoding network is crucial and warrants research. This paper proposes a joint-loss training model based on the vision transformer (ViT) and the temporal convolutional network, named EEG-VTTCNet, to classify MI-EEG signals. To take advantage of multiple modules together, EEG-VTTCNet adopts a shared-convolution strategy and a dual-branching strategy: the dual branches perform complementary learning and jointly train the shared convolutional modules for better performance. We conducted experiments on the BCI Competition IV-2a and IV-2b datasets, and the proposed network outperformed the current state-of-the-art techniques with accuracies of 84.58% and 90.94%, respectively, in the subject-dependent mode. In addition, we used t-SNE to visualize the features extracted by the proposed network, further demonstrating the effectiveness of the feature extraction framework. We also conducted extensive ablation and hyperparameter tuning experiments to construct a robust network architecture that generalizes well.
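
The joint-training idea (a shared convolutional feature extractor feeding two complementary branches trained with a combined loss) can be sketched generically as below. The layer sizes, branch designs, and equal loss weighting are assumptions for illustration and do not reproduce the EEG-VTTCNet specification.

```python
# Hedged sketch: shared convolution + two branches trained with a joint loss.
# Shapes and modules are illustrative stand-ins, not the EEG-VTTCNet definition.
import torch
import torch.nn as nn

class DualBranchNet(nn.Module):
    def __init__(self, n_channels=22, n_classes=4):
        super().__init__()
        self.shared = nn.Sequential(                     # shared temporal convolution
            nn.Conv1d(n_channels, 32, kernel_size=25, padding=12),
            nn.BatchNorm1d(32), nn.ELU(), nn.AvgPool1d(4))
        enc_layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
        self.branch_a = nn.TransformerEncoder(enc_layer, num_layers=2)   # attention branch
        self.branch_b = nn.Sequential(                                   # temporal-conv branch
            nn.Conv1d(32, 32, kernel_size=7, padding=6, dilation=2), nn.ELU())
        self.head_a = nn.Linear(32, n_classes)
        self.head_b = nn.Linear(32, n_classes)

    def forward(self, x):                                # x: (batch, channels, time)
        f = self.shared(x)
        a = self.head_a(self.branch_a(f.transpose(1, 2)).mean(dim=1))
        b = self.head_b(self.branch_b(f).mean(dim=-1))
        return a, b

model = DualBranchNet()
x = torch.randn(8, 22, 1000)                             # toy MI-EEG batch
y = torch.randint(0, 4, (8,))
out_a, out_b = model(x)
ce = nn.CrossEntropyLoss()
loss = 0.5 * ce(out_a, y) + 0.5 * ce(out_b, y)           # joint (weighted-sum) loss
loss.backward()
```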

9.
BMC Med Inform Decis Mak ; 24(1): 232, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39174951

ABSTRACT

BACKGROUND: Maxillary expansion is an important treatment for maxillary transverse hypoplasia. The appropriate expansion method depends on the midpalatal suture maturation stage, which orthodontists conventionally diagnose from palatal-plane cone beam computed tomography (CBCT) images; this approach suffers from low efficiency and strong subjectivity. This study develops and evaluates an enhanced vision transformer (ViT) to automatically classify CBCT images of midpalatal sutures at different maturation stages. METHODS: In recent years, the use of convolutional neural networks (CNNs) to classify images of the midpalatal suture at different maturation stages has supported decisions about the clinical maxillary expansion method. However, CNNs cannot adequately learn the long-distance dependencies between image regions and features, which are also required for global recognition of midpalatal suture CBCT images. The self-attention mechanism of the ViT can capture relationships between distant pixels of an image, but the ViT lacks the inductive biases of CNNs and requires more training data. To address this, a CNN-enhanced ViT model based on transfer learning is proposed to classify midpalatal suture CBCT images. In this study, 2518 CBCT images of the palatal plane were collected and divided into a training set of 1259 images, a validation set of 506 images, and a test set of 753 images. After preprocessing the training images, the CNN-enhanced ViT model was trained and tuned, and its generalization ability was evaluated on the test set. RESULTS: The classification accuracy of our proposed ViT model is 95.75%, and its macro-averaged area under the receiver operating characteristic curve (AUC) and micro-averaged AUC are 97.89% and 98.36%, respectively, on our test set. The classification accuracy of the best-performing CNN model, EfficientNetV2-S, was 93.76% on our test set, and the classification accuracy of the clinician was 89.10%. CONCLUSIONS: The experimental results show that this method can effectively classify CBCT images of midpalatal sutures by maturation stage, with performance better than that of a clinician. The model can therefore provide a valuable reference for orthodontists and assist them in making a correct diagnosis.


Subject(s)
Cone-Beam Computed Tomography, Neural Networks (Computer), Humans, Cranial Sutures/diagnostic imaging, Palatal Expansion Technique, Palate/diagnostic imaging, Machine Learning
10.
Stud Health Technol Inform ; 316: 919-923, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176942

ABSTRACT

Cilioretinal arteries (CRAs) are a common congenital anomaly of the retinal blood supply. This paper presents a deep learning-based approach for the automated detection of CRAs from color fundus images. Leveraging the Vision Transformer architecture, a pre-trained model from RETFound was fine-tuned to transfer knowledge from a broader dataset to our specific task. An initial dataset of 85 images was expanded to 170 images through data augmentation using self-supervised learning-driven techniques. To address the imbalance in the dataset and prevent overfitting, Focal Loss and Early Stopping were implemented. The model's performance was evaluated using a 70-30 split of the dataset for training and validation. The results showcase the potential of ophthalmic foundation models in enhancing the detection of CRAs and reducing the labeling effort required from retinal experts, as promising results could be achieved with only a small amount of training data through fine-tuning.
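
The imbalance handling described above (focal loss plus early stopping around a fine-tuned ViT) might look roughly like the sketch below. The ImageNet ViT backbone stands in for the RETFound weights, the toy data loaders stand in for the fundus images, and all hyperparameters are illustrative assumptions.

```python
# Hedged sketch: binary fine-tuning with focal loss and early stopping.
# Backbone, loaders, and hyperparameters are illustrative, not the RETFound config.
import timm, torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.ops import sigmoid_focal_loss

def toy_loader(n):                               # stand-in for fundus image loaders
    x = torch.randn(n, 3, 224, 224)
    y = torch.randint(0, 2, (n,))
    return DataLoader(TensorDataset(x, y), batch_size=4)

train_loader, val_loader = toy_loader(16), toy_loader(8)
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=1)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(100):
    model.train()
    for x, y in train_loader:
        opt.zero_grad()
        loss = sigmoid_focal_loss(model(x).squeeze(1), y.float(),
                                  alpha=0.25, gamma=2.0, reduction="mean")
        loss.backward()
        opt.step()
    model.eval()
    with torch.no_grad():
        val_loss = sum(sigmoid_focal_loss(model(x).squeeze(1), y.float(),
                                          reduction="mean").item()
                       for x, y in val_loader) / len(val_loader)
    if val_loss < best_val:                      # early stopping on validation loss
        best_val, bad_epochs = val_loss, 0
        torch.save(model.state_dict(), "best_cra_vit.pt")
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```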


Subject(s)
Fundus Oculi, Humans, Deep Learning, Ciliary Arteries/diagnostic imaging, Retinal Artery/diagnostic imaging, Computer-Assisted Image Interpretation/methods
11.
Sci Rep ; 14(1): 19281, 2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39164302

ABSTRACT

Medical practitioners examine medical images, such as X-rays, write reports based on the findings, and provide conclusive statements. Manual interpretation of the results and report writing are time-consuming processes that can delay diagnosis. We propose an automated report generation model for medical images that leverages an encoder-decoder architecture. Our model uses transformer architectures, including the Vision Transformer (ViT) and its variants the Data-Efficient Image Transformer (DeiT) and BERT pre-training of image transformers (BEiT), as the encoder; these transformers are adapted to extract visual information from medical images. Reports are transformed into text embeddings, and the Generative Pre-trained Transformer (GPT-2) model is used as the decoder to generate medical reports. Our model employs a cross-attention mechanism between the vision transformer and GPT-2, which enables it to create detailed and coherent medical reports based on the visual information extracted by the encoder. We also extend report generation with general knowledge that is independent of the inputs and provides a more comprehensive report in a broad sense. We conduct our experiments on the Indiana University X-ray dataset to demonstrate the effectiveness of our models. Generated reports are evaluated using word-overlap metrics such as BLEU and ROUGE-L, retrieval-augmented answer correctness, and similarity metrics such as skip-thought cosine similarity, greedy matching, vector extrema, and RAG answer similarity. Results show that our model outperforms recurrent models in terms of report generation, answer similarity, and word-overlap metrics. By automating the report generation process and incorporating advanced transformer architectures and general knowledge, our approach has the potential to significantly improve the efficiency and accuracy of medical image analysis and report generation.
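
A similar encoder-decoder wiring (a ViT image encoder with cross-attention into a GPT-2 decoder) can be assembled with the Hugging Face transformers library; the checkpoint names, blank stand-in image, and generation settings below are assumptions for illustration, not the authors' trained model.

```python
# Hedged sketch: ViT encoder + GPT-2 decoder with cross-attention for report
# generation. Checkpoints and settings are illustrative, not the paper's model.
from PIL import Image
from transformers import GPT2TokenizerFast, ViTImageProcessor, VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k", "gpt2")   # decoder gains randomly-initialized cross-attention
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

tokenizer.pad_token = tokenizer.eos_token           # GPT-2 has no pad token by default
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id

image = Image.new("RGB", (224, 224))                # stand-in for a chest X-ray image
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Fine-tuning would pass (pixel_values, labels=report_token_ids) to model(...);
# after training, a draft report is generated autoregressively:
ids = model.generate(pixel_values, max_length=128, num_beams=4)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```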


Subject(s)
Computer-Assisted Image Processing, Humans, Computer-Assisted Image Processing/methods, Diagnostic Imaging/methods, Algorithms
12.
Neural Netw ; 179: 106592, 2024 Aug 03.
Article in English | MEDLINE | ID: mdl-39168070

ABSTRACT

Brain age (BA) is defined as a measure of brain maturity and can help characterize both typical brain development and neuropsychiatric disorders in mammals. Various biological phenotypes have been successfully used to predict the BA of humans using chronological age (CA) as the label. However, whether the BA of the macaque, one of the most important animal models, can also be reliably predicted is largely unknown. To address this question, we propose a novel deep learning model called Multi-Branch Vision Transformer (MB-ViT) that fuses multi-scale (i.e., from coarse-grained to fine-grained) brain functional connectivity (FC) patterns derived from resting-state functional magnetic resonance imaging (rs-fMRI) data to predict the BA of macaques. The discriminative functional connections and the related brain regions contributing to the prediction are further identified using the Gradient-weighted Class Activation Mapping (Grad-CAM) method. Our proposed model successfully predicts the BA of 450 normal rhesus macaques from the publicly available PRIMatE Data Exchange (PRIME-DE) dataset with lower mean absolute error (MAE) and mean squared error (MSE) as well as higher Pearson's correlation coefficient (PCC) and coefficient of determination (R²) compared to other baseline models; the correlation between predicted BA and CA reaches 0.82 for our proposed method. Furthermore, our analysis reveals that the functional connections contributing most to the prediction are situated in the primary motor cortex (M1), visual cortex, area v23 in the posterior cingulate cortex, and the dysgranular temporal pole. In summary, our proposed deep learning model provides an effective tool to accurately predict the BA of primates (macaques in this study) and lays a solid foundation for future studies of age-related brain diseases in these animal models.

13.
Interdiscip Sci ; 16(2): 469-488, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38951382

ABSTRACT

Image classification, a fundamental task in computer vision, faces challenges concerning limited data, interpretability, feature representation, efficiency across diverse image types, and noisy data. Conventional architectural approaches have made insufficient progress in addressing these challenges, necessitating architectures capable of fine-grained classification, enhanced accuracy, and superior generalization. Among these, the vision transformer emerges as a noteworthy computer vision architecture; however, its reliance on substantial training data is a drawback due to its complexity and high data requirements. To surmount these challenges, this paper proposes an innovative approach, MetaV, which integrates meta-learning into a vision transformer for medical image classification. N-way K-shot learning is employed to train the model, drawing inspiration from human learning mechanisms that utilize past knowledge. Additionally, deformational convolution and patch merging techniques are incorporated into the vision transformer model to mitigate complexity and overfitting while enhancing feature representation. Augmentation methods such as perturbation and GridMask are introduced to address the scarcity and noise of medical images, particularly for rare diseases. The proposed model is evaluated on diverse datasets including BreakHis, ISIC 2019, SIPaKMeD, and STARE. The achieved accuracies of 89.89%, 87.33%, 94.55%, and 80.22% on BreakHis, ISIC 2019, SIPaKMeD, and STARE, respectively, validate the superior performance of the proposed model in comparison to conventional models, setting a new benchmark for meta-vision image classification models.
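
The N-way K-shot training regime mentioned above is built from episodes: N classes are sampled, K labeled examples per class form the support set, and held-out examples form the query set. The sketch below shows one such episode sampler with hypothetical file names; the episode sizes are illustrative, not the MetaV configuration.

```python
# Hedged sketch: sampling an N-way K-shot episode (support/query split) from a
# labeled image pool, as used in episodic meta-learning. Purely illustrative.
import random
from collections import defaultdict

def sample_episode(samples, n_way=5, k_shot=5, q_queries=15):
    """samples: list of (image_path, label) pairs."""
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)
    classes = random.sample(list(by_class), n_way)           # choose N classes
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        picks = random.sample(by_class[cls], k_shot + q_queries)
        support += [(p, episode_label) for p in picks[:k_shot]]
        query += [(p, episode_label) for p in picks[k_shot:]]
    return support, query                                     # adapt on support, evaluate on query

# toy usage with hypothetical file names
pool = [(f"img_{c}_{i}.png", c) for c in range(10) for i in range(40)]
support, query = sample_episode(pool, n_way=5, k_shot=5, q_queries=15)
print(len(support), len(query))                               # 25 75
```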


Subject(s)
Computer-Assisted Image Processing, Humans, Computer-Assisted Image Processing/methods, Algorithms, Machine Learning, Diagnostic Imaging, Deep Learning
14.
J Imaging Inform Med ; 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38977615

ABSTRACT

Automated and accurate classification of pneumonia plays a crucial role in improving the performance of computer-aided diagnosis systems for chest X-ray images. Nevertheless, it is a challenging task due to the difficulty of learning the complex structural information of lung abnormalities from chest X-ray images. In this paper, we propose a multi-view aggregation network with Transformer (TransMVAN) for pneumonia classification in chest X-ray images. Specifically, we incorporate knowledge from both glance and focus views to enrich the feature representation of lung abnormalities. Moreover, to capture the complex relationships among different lung regions, we propose a bi-directional multi-scale vision Transformer (biMSVT), with which informative messages between different lung regions are propagated in two directions. In addition, we propose a gated multi-view aggregation (GMVA) module to adaptively select feature information from the glance and focus views for further enhancement of pneumonia diagnosis. Our proposed method achieves AUCs of 0.9645 and 0.9550 for pneumonia classification on two different chest X-ray image datasets, an AUC of 0.9761 for evaluating positive and negative polymerase chain reaction (PCR) results, and an AUC of 0.9741 for classifying non-COVID-19 pneumonia, COVID-19 pneumonia, and normal cases. Experimental results demonstrate that our method outperforms the comparison methods for pneumonia diagnosis from chest X-ray images.

15.
BMC Med Inform Decis Mak ; 24(1): 191, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38978027

ABSTRACT

BACKGROUND: Recent advances in Vision Transformer (ViT)-based deep learning have significantly improved the accuracy of lung disease prediction from chest X-ray images. However, limited research exists on comparing the effectiveness of different optimizers for lung disease prediction within ViT models. This study systematically evaluates and compares the performance of various optimization methods for ViT-based models in predicting lung diseases from chest X-ray images. METHODS: This study utilized a chest X-ray image dataset comprising 19,003 images containing both normal cases and six lung diseases: COVID-19, viral pneumonia, bacterial pneumonia, Middle East Respiratory Syndrome (MERS), Severe Acute Respiratory Syndrome (SARS), and tuberculosis. Each ViT model (ViT, FastViT, and CrossViT) was individually trained with each optimization method (Adam, AdamW, NAdam, RAdam, SGDW, and Momentum) to assess its performance in lung disease prediction. RESULTS: When tested with ViT on the dataset with balanced class sizes, RAdam demonstrated superior accuracy compared to the other optimizers, achieving 95.87%. On the dataset with imbalanced class sizes, FastViT with NAdam achieved the best performance, with an accuracy of 97.63%. CONCLUSIONS: We provide comprehensive optimization strategies for developing ViT-based model architectures, which can enhance the performance of these models for lung disease prediction from chest X-ray images.
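
An optimizer comparison of this kind amounts to retraining the same backbone under different torch.optim choices. A minimal sketch follows; the backbone, learning rates, and 7-class head are assumptions, and SGDW/Momentum are approximated with standard SGD variants since PyTorch has no separate SGDW class.

```python
# Hedged sketch: training the same ViT backbone with different optimizers to
# compare their effect. Model choice and hyperparameters are illustrative.
import timm, torch

optimizer_factories = {
    "Adam":     lambda p: torch.optim.Adam(p, lr=1e-4),
    "AdamW":    lambda p: torch.optim.AdamW(p, lr=1e-4, weight_decay=0.05),
    "NAdam":    lambda p: torch.optim.NAdam(p, lr=1e-4),
    "RAdam":    lambda p: torch.optim.RAdam(p, lr=1e-4),
    "SGDW":     lambda p: torch.optim.SGD(p, lr=1e-2, momentum=0.9, weight_decay=1e-4),
    "Momentum": lambda p: torch.optim.SGD(p, lr=1e-2, momentum=0.9),
}

results = {}
for name, make_opt in optimizer_factories.items():
    # fresh model per optimizer so runs are comparable; 7 classes = normal + 6 diseases
    model = timm.create_model("vit_small_patch16_224", pretrained=True, num_classes=7)
    opt = make_opt(model.parameters())
    # train(model, opt, train_loader)              # assumed training routine
    # results[name] = evaluate(model, test_loader) # assumed accuracy evaluation
```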


Subject(s)
Deep Learning, Lung Diseases, Humans, Lung Diseases/diagnostic imaging, Thoracic Radiography/methods, Thoracic Radiography/standards, COVID-19/diagnostic imaging
16.
PeerJ Comput Sci ; 10: e2146, 2024.
Article in English | MEDLINE | ID: mdl-38983210

ABSTRACT

In recent years, the growing importance of accurate semantic segmentation of ultrasound images has led to numerous advances in deep learning-based techniques. In this article, we introduce a novel hybrid network that synergistically combines convolutional neural networks (CNNs) and Vision Transformers (ViTs) for ultrasound image semantic segmentation. Our primary contribution is the incorporation of multi-scale CNNs in both the encoder and decoder stages, enhancing feature learning across multiple scales. Further, the bottleneck of the network leverages the ViT to capture long-range, high-dimensional spatial dependencies, a critical factor often overlooked in conventional CNN-based approaches. We conducted extensive experiments on a public benchmark ultrasound nerve segmentation dataset. Our proposed method was benchmarked against 17 existing baseline methods and outperformed all of them, including a 4.6% Dice improvement over TransUNet, a 13.0% Dice improvement over Attention UNet, and a 10.5% precision improvement over UNet. This research offers significant potential for real-world applications in medical imaging, demonstrating the power of blending CNNs and ViTs in a unified framework.

17.
Sensors (Basel) ; 24(13)2024 Jul 01.
Article in English | MEDLINE | ID: mdl-39001055

ABSTRACT

Human-object interaction (HOI) detection identifies a "set of interactions" in an image, involving the recognition of interacting instances and the classification of interaction categories. The complexity and variety of image content make this task challenging. Recently, the Transformer has been applied in computer vision and has received attention in the HOI detection task. Therefore, this paper proposes a novel Part Refinement Tandem Transformer (PRTT) for HOI detection. Unlike previous Transformer-based HOI methods, PRTT utilizes multiple decoders to split and process the rich elements of HOI prediction and introduces a new part state feature extraction (PSFE) module to help improve the final interaction category classification. We adopt a novel prior feature integrated cross-attention (PFIC) to utilize the fine-grained partial state semantic and appearance features output by the PSFE module to guide queries. We validate our method on two public datasets, V-COCO and HICO-DET. Compared to state-of-the-art models, PRTT significantly improves human-object interaction detection performance.

18.
Magn Reson Imaging ; 113: 110219, 2024 Jul 26.
Article in English | MEDLINE | ID: mdl-39069027

ABSTRACT

This study investigated the use of a Vision Transformer (ViT) for reconstructing GABA-edited magnetic resonance spectroscopy (MRS) data from a reduced number of transients. Transients are the samples collected during an MRS acquisition by repeating the experiment to build a signal of sufficient quality. Specifically, 80 transients were used instead of the typical 320, aiming to reduce scan time. The 80 transients were pre-processed and converted into a spectrogram image representation using the Short-Time Fourier Transform (STFT). A pre-trained ViT, named Spectro-ViT, was fine-tuned and then tested using in-vivo GABA-edited MEGA-PRESS data. Its performance was compared against other pipelines in the literature using quantitative quality metrics and estimated metabolite concentration values, with the typical 320-transient scans serving as the reference. The Spectro-ViT model exhibited the best overall quality metrics among all compared pipelines. The metabolite concentrations from Spectro-ViT's reconstructions for GABA+ achieved the best average R² value of 0.67 and the best average mean absolute percentage error (MAPE) of 9.68%, with no statistically significant differences found compared to the 320-transient reference. The code to reproduce this research is available at https://github.com/MICLab-Unicamp/Spectro-ViT.
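
The STFT step that turns the reduced set of transients into an image can be sketched as below; the toy free-induction decay, sampling rate, and window parameters are illustrative values, not the Spectro-ViT preprocessing settings.

```python
# Hedged sketch: converting averaged MRS transients into a spectrogram image with
# the Short-Time Fourier Transform. All signal parameters are illustrative.
import numpy as np
from scipy.signal import stft

fs = 2000.0                                    # assumed spectral width in Hz
t = np.arange(2048) / fs
fid = np.exp(-t / 0.08) * np.cos(2 * np.pi * 150.0 * t)         # toy free-induction decay
transients = fid[None, :] + 0.05 * np.random.randn(80, t.size)   # 80 noisy transients

signal = transients.mean(axis=0)               # combine the reduced transient set
f, frames, Z = stft(signal, fs=fs, nperseg=256, noverlap=192)
spectrogram = np.abs(Z)                        # magnitude image fed to the ViT
print(spectrogram.shape)                       # (frequency bins, time frames)
```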

19.
Phys Med Biol ; 69(15)2024 Jul 18.
Article in English | MEDLINE | ID: mdl-38981596

ABSTRACT

Objective. Bifurcation detection in intravascular optical coherence tomography (IVOCT) images plays a significant role in guiding optimal revascularization strategies for percutaneous coronary intervention (PCI). We propose a bifurcation detection method for IVOCT using vision transformer (ViT)-based deep learning. Approach. Instead of relying on lumen segmentation, the proposed method identifies bifurcation images using a ViT-based classification model and then estimates bifurcation ostium points with a ViT-based landmark detection model. Main results. On 8640 clinical images, the accuracy and F1-score of bifurcation identification by the proposed ViT-based model are 2.54% and 16.08% higher than those of traditional non-deep learning methods and similar to the best-performing convolutional neural network (CNN)-based methods. The ostium distance error of the ViT-based model is 0.305 mm, a 68.5% reduction compared with the traditional non-deep learning method and a 24.81% reduction compared with the best CNN-based methods. The results also show that the proposed ViT-based method achieves the highest successful detection rates, which are 11.3% and 29.2% higher than the non-deep learning method and 4.6% and 2.5% higher than the best CNN-based methods at distance thresholds of 0.1 and 0.2 mm, respectively. Significance. The proposed ViT-based method enhances bifurcation detection in IVOCT images and maintains high correlation and consistency between automatic detection results and expert manual results. It is of great significance in guiding the selection of PCI treatment strategies.


Subject(s)
Deep Learning, Computer-Assisted Image Processing, Optical Coherence Tomography, Optical Coherence Tomography/methods, Humans, Computer-Assisted Image Processing/methods, Coronary Vessels/diagnostic imaging
20.
BMC Cancer ; 24(1): 795, 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38961418

ABSTRACT

BACKGROUND: Oral squamous cell carcinoma (OSCC) presents significant diagnostic challenges in its early and late stages. This study aims to use preoperative MRI and biochemical indicators of OSCC patients to predict tumor stage. METHODS: This study involved 198 patients from two medical centers. A detailed analysis of contrast-enhanced T1-weighted (ceT1W) and T2-weighted (T2W) MRI was conducted and integrated with biochemical indicators for a comprehensive evaluation. Initially, 42 clinical biochemical indicators were considered; through univariate and multivariate analysis, only those with p-values less than 0.05 were retained for model development. To extract imaging features, machine learning algorithms were used in conjunction with Vision Transformer (ViT) techniques, and these features were integrated with the biochemical indicators for predictive modeling. Model performance was evaluated using the receiver operating characteristic (ROC) curve. RESULTS: After rigorous screening of the biochemical indicators, four key markers were selected for the model: cholesterol, triglyceride, very low-density lipoprotein cholesterol, and chloride. The model, developed using radiomics and deep learning for feature extraction from ceT1W and T2W images, showed an area under the curve (AUC) of 0.85 in the validation cohort when using these imaging modalities alone. Integrating the biochemical indicators improved the model's performance, increasing the validation cohort AUC to 0.87. CONCLUSION: In this study, the performance of the model improved significantly after multimodal fusion, outperforming the single-modality approach. CLINICAL RELEVANCE STATEMENT: This integration of radiomics, ViT models, and lipid metabolite analysis presents a promising non-invasive technique for predicting OSCC staging.


Subject(s)
Mouth Neoplasms, Neoplasm Staging, Squamous Cell Carcinoma of Head and Neck, Adult, Aged, Female, Humans, Male, Middle Aged, Tumor Biomarkers, Lipids/blood, Machine Learning, Magnetic Resonance Imaging/methods, Mouth Neoplasms/diagnostic imaging, Mouth Neoplasms/pathology, Radiomics, ROC Curve, Squamous Cell Carcinoma of Head and Neck/diagnostic imaging, Squamous Cell Carcinoma of Head and Neck/pathology