Results 1 - 20 of 2,447
1.
Front Plant Sci ; 15: 1409544, 2024.
Article in English | MEDLINE | ID: mdl-39354942

ABSTRACT

In the current agricultural landscape, a significant portion of tomato plants suffer from leaf diseases, posing a major challenge to manual detection due to the task's extensive scope. Existing detection algorithms struggle to balance speed with accuracy, especially when identifying small-scale leaf diseases across diverse settings. Addressing this need, this study presents FCHF-DETR (Faster-Cascaded-attention-High-feature-fusion-Focaler Detection-Transformer), an innovative, high-precision, and lightweight detection algorithm based on RT-DETR-R18 (Real-Time Detection Transformer with a ResNet18 backbone). The algorithm was developed using a carefully curated dataset of 3147 RGB images showcasing tomato leaf diseases across a range of scenes and resolutions. FasterNet replaces ResNet18 in the backbone network to reduce the model's size and improve memory efficiency. Additionally, replacing the conventional AIFI (Attention-based Intra-scale Feature Interaction) module with Cascaded Group Attention, and the original CCFM (CNN-based Cross-scale Feature-fusion Module) with HSFPN (High-Level Screening-feature Fusion Pyramid Networks), in the Efficient Hybrid Encoder significantly enhanced detection accuracy without greatly affecting efficiency. To improve performance on hard samples, the Focaler-CIoU loss function was incorporated, refining the model's behavior across the dataset. Empirical results show that FCHF-DETR achieved 96.4% precision, 96.7% recall, 89.1% mAP50-95 (mean Average Precision), and 97.2% mAP50 on the test set, with a reduction of 9.2 GFLOPs (floating-point operations) and 3.6M parameters. These findings demonstrate that the proposed method improves detection accuracy while reducing computational complexity, addressing the dual challenges of precision and efficiency in tomato leaf disease detection.
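The Focaler-CIoU loss mentioned above builds on Focaler-IoU's idea of linearly remapping IoU into a tunable interval so that training emphasizes a chosen difficulty band. The sketch below shows only that remapping applied to plain IoU, omitting CIoU's center-distance and aspect-ratio penalties; the bounds d and u and the helper names are illustrative, not the paper's values.

```python
import torch

def box_iou(a, b):
    """IoU of paired axis-aligned boxes in (x1, y1, x2, y2) format; a, b: (N, 4)."""
    lt = torch.max(a[:, :2], b[:, :2])
    rb = torch.min(a[:, 2:], b[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter + 1e-7)

def focaler_iou_loss(pred, target, d=0.0, u=0.95):
    """Linear interval remapping of IoU so training focuses on a chosen
    difficulty band; d and u are tunable bounds (values here are guesses)."""
    iou = box_iou(pred, target)
    iou_f = ((iou - d) / (u - d)).clamp(0.0, 1.0)
    return (1.0 - iou_f).mean()
```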

2.
Front Plant Sci ; 15: 1389961, 2024.
Article in English | MEDLINE | ID: mdl-39354950

ABSTRACT

Introduction: In the field of agriculture, automated harvesting of Camellia oleifera fruit has become an important research area. However, accurately detecting Camellia oleifera fruit in natural environments is challenging: factors such as shadows can impede the performance of traditional detection techniques, highlighting the need for more robust methods. Methods: To overcome these challenges, we propose an efficient deep learning method called YOLO-CFruit, which is specifically designed to accurately detect Camellia oleifera fruits in challenging natural environments. First, we collected images of Camellia oleifera fruits and created a dataset, then applied data augmentation to further enhance the dataset's diversity. Our YOLO-CFruit model combines a CBAM module, for identifying regions of interest in scenes containing Camellia oleifera fruit, with a CSP module with Transformer for capturing global information. In addition, we improve YOLO-CFruit by replacing the CIoU loss in the original YOLOv5 with the EIoU loss. Results: In tests of the trained network, the method performs well, achieving an average precision of 98.2%, a recall of 94.5%, an accuracy of 98%, an F1 score of 96.2%, and an average inference time of 19.02 ms per frame. The experimental results show that our method improves the average precision by 1.2% over the conventional YOLOv5s network and achieves the highest accuracy and a higher F1 score than all compared state-of-the-art networks. Discussion: The robust performance of YOLO-CFruit under different real-world conditions, including varied lighting and shading scenarios, signifies its high reliability and lays a solid foundation for the development of automated picking devices.
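CBAM (Convolutional Block Attention Module) is a well-known published module; a minimal PyTorch rendition of its channel-then-spatial attention follows. This is a generic CBAM, not YOLO-CFruit's exact integration point.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, c, r=16):
        super().__init__()
        # shared MLP applied to both average- and max-pooled descriptors
        self.mlp = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(), nn.Linear(c // r, c))
    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        return torch.sigmoid(avg + mx)[:, :, None, None]

class SpatialAttention(nn.Module):
    def __init__(self, k=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2)
    def forward(self, x):
        # channel-wise mean and max maps, stacked and convolved
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return torch.sigmoid(self.conv(s))

class CBAM(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.ca, self.sa = ChannelAttention(c), SpatialAttention()
    def forward(self, x):
        x = x * self.ca(x)
        return x * self.sa(x)

out = CBAM(64)(torch.randn(2, 64, 32, 32))  # same shape as input
```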

3.
Neural Netw ; 181: 106749, 2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39357266

ABSTRACT

Unsupervised domain adaptation aims to leverage a source domain with ample labeled data to tackle tasks on an unlabeled target domain. This poses a significant challenge, particularly in scenarios exhibiting large disparities between the two domains. Prior methods often fall short in challenging domains due to noise from incorrect pseudo-labels and the limits of handcrafted domain-alignment rules. In this paper, we propose a novel method called DCST (Dual Cross-Supervision Transformer), which improves upon existing methods in two key aspects. First, a vision transformer is combined with a dual cross-supervision learning strategy to enforce consistency learning across domains; the network accomplishes domain-specific self-training and cross-domain feature alignment in an adaptive manner. Second, because of the noise present in challenging domains and the need to reduce the risks of model collapse and overfitting, we propose a Domain Shift Filter. Specifically, this module allows the model to leverage the memory of source-domain features to facilitate a smooth transition, and it improves the effectiveness of knowledge transfer between domains with significant gaps. We conducted extensive experiments on four benchmark datasets and achieved the best classification results, including 94.3% on Office-31, 86.0% on Office-Home, 89.3% on VisDA-2017, and 48.8% on DomainNet. Code is available at https://github.com/Yislight/DCST.
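As a rough illustration of cross-supervision between two heads, the sketch below has each classifier learn from the other's confident pseudo-labels on unlabeled target data. The confidence threshold and loss form are assumptions; DCST's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def cross_supervision_loss(logits1, logits2, conf_thresh=0.9):
    """Each head is trained on the other head's confident pseudo-labels
    (a generic scheme, not DCST's exact formulation)."""
    p1, p2 = logits1.softmax(dim=1), logits2.softmax(dim=1)
    conf1, pl1 = p1.max(dim=1)
    conf2, pl2 = p2.max(dim=1)
    # head 1 learns from head 2's confident predictions, and vice versa
    loss1 = (F.cross_entropy(logits1, pl2.detach(), reduction="none")
             * (conf2 >= conf_thresh)).mean()
    loss2 = (F.cross_entropy(logits2, pl1.detach(), reduction="none")
             * (conf1 >= conf_thresh)).mean()
    return loss1 + loss2

loss = cross_supervision_loss(torch.randn(8, 10), torch.randn(8, 10))
```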

4.
Curr Med Imaging ; 2024 Oct 02.
Article in English | MEDLINE | ID: mdl-39360542

ABSTRACT

INTRODUCTION: In this study, we harnessed the capabilities of three cutting-edge algorithms to refine the elbow fracture prediction process through X-ray image analysis. Employing the YOLOv8 (You Only Look Once, version 8) algorithm, we first identified regions of interest (ROIs) within the X-ray images, significantly augmenting fracture prediction accuracy. METHODS: Subsequently, we integrated and compared the ResNet, SeResNet (Squeeze-and-Excitation Residual Network), and ViT (Vision Transformer) algorithms to refine our predictive capabilities. Furthermore, to ensure optimal precision, we implemented a series of meticulous refinements. These included recalibrating ROI regions to enable finer-grained identification of diagnostically significant areas within the X-ray images. Additionally, advanced image enhancement techniques were applied to optimize the X-ray images' visual quality and structural clarity. RESULTS: These methodological enhancements synergistically contributed to a substantial improvement in the overall accuracy of our fracture predictions. The dataset used for training, testing, validation, and comprehensive evaluation exclusively comprised elbow X-ray images. Fracture prediction results for the three algorithms were: ResNet50, accuracy 0.97, precision 1.00, recall 0.95; SeResNet50, accuracy 0.97, precision 1.00, recall 0.95; and ViT-B/16, accuracy 0.99, precision 1.00, recall 0.95. CONCLUSION: This approach has the potential to increase the precision of diagnoses, lessen radiologists' workload, integrate easily into current medical imaging systems, and assist clinical decision-making, all of which could lead to better patient care and health outcomes overall.
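A hypothetical two-stage pipeline in the spirit of this study might chain an Ultralytics YOLOv8 detector with a timm ViT classifier over the detected crops. The weight files, image path, and the untrained two-class head are all placeholders.

```python
# Hypothetical two-stage pipeline: YOLOv8 finds ROIs, a ViT classifies each crop.
import torch
import timm
from PIL import Image
from ultralytics import YOLO

detector = YOLO("yolov8n.pt")  # stand-in weights; the study trained its own
classifier = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=2)
classifier.eval()

image = Image.open("elbow_xray.jpg").convert("RGB")  # hypothetical file
cfg = timm.data.resolve_data_config({}, model=classifier)
preprocess = timm.data.create_transform(**cfg)

for x1, y1, x2, y2 in detector(image)[0].boxes.xyxy.tolist():
    crop = image.crop((int(x1), int(y1), int(x2), int(y2)))
    with torch.no_grad():
        prob = classifier(preprocess(crop).unsqueeze(0)).softmax(dim=1)
    print(f"ROI ({x1:.0f},{y1:.0f},{x2:.0f},{y2:.0f}) fracture prob: {prob[0, 1]:.3f}")
```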

5.
Front Med (Lausanne) ; 11: 1402457, 2024.
Article in English | MEDLINE | ID: mdl-39359921

ABSTRACT

This study aims to evaluate the feasibility of using a large language model (LLM) to answer pathology questions based on pathology reports (PRs) of colorectal cancer (CRC). Four common questions (CQs) and corresponding answers about pathology were retrieved from public webpages. These questions were input as prompts for the Chat Generative Pretrained Transformer (ChatGPT, gpt-3.5-turbo). The quality indicators (understanding, scientificity, satisfaction) of all answers were evaluated by gastroenterologists. Standard PRs from five CRC patients who underwent radical surgery at Shanghai Changzheng Hospital were selected. Six report questions (RQs) and corresponding answers were generated by a gastroenterologist and a pathologist. We developed an interactive PR interpretation system that allows users to upload standard PRs as JPG images, from which ChatGPT's responses to the RQs were generated. The quality indicators of all answers were evaluated by gastroenterologists and outpatients. For the CQs, gastroenterologists rated AI answers similarly to non-AI answers in understanding, scientificity, and satisfaction. For RQ1-3, gastroenterologists and patients rated the AI mean scores higher than the non-AI scores across the quality indicators. However, for RQ4-6, gastroenterologists rated the AI mean scores lower than the non-AI scores in understanding and satisfaction. In RQ4, gastroenterologists rated the AI scores lower than the non-AI scores in scientificity (P = 0.011), and patients rated the AI scores lower than the non-AI scores in understanding (P = 0.004) and satisfaction (P = 0.011). In conclusion, the LLM could generate credible answers to common pathology questions and conceptual questions on the PRs. It holds great potential for improving doctor-patient communication.
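Submitting a question with report text to gpt-3.5-turbo can be sketched with the OpenAI Python SDK as below; the OCR step for the uploaded JPG reports is assumed to happen upstream, and the system prompt is illustrative.

```python
# Minimal sketch of querying gpt-3.5-turbo with report text extracted from a
# pathology-report image (OCR step assumed; prompt wording is illustrative).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_about_report(report_text: str, question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You answer questions about colorectal "
                                          "cancer pathology reports for patients."},
            {"role": "user", "content": f"Report:\n{report_text}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content

# answer = ask_about_report(report_text, "What is the tumor stage?")
```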

6.
Hu Li Za Zhi ; 71(5): 7-13, 2024 Oct.
Article in Chinese | MEDLINE | ID: mdl-39350704

ABSTRACT

Artificial intelligence (AI) is driving global change, and the implementation of generative AI in higher education is inevitable. AI language models such as the Chat Generative Pre-trained Transformer (ChatGPT) hold the potential to revolutionize the delivery of nursing education in the future. Nurse educators play a crucial role in preparing nursing students for a technology-integrated healthcare system. While the technology has limitations and potential biases, the emergence of ChatGPT presents both opportunities and challenges. It is critical for faculty to be familiar with the capabilities and limitations of this model to foster effective, ethical, and responsible use of AI technology while preparing students for the dynamic and rapidly advancing landscape of nursing and healthcare. This article therefore presents a strengths, weaknesses, opportunities, and threats (SWOT) analysis of integrating ChatGPT into nursing education, providing a guide for implementing ChatGPT in nursing education and offering a well-rounded assessment to help nurse educators make informed decisions.


Subject(s)
Artificial Intelligence , Education, Nursing , Humans
7.
Microsc Res Tech ; 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-39351968

ABSTRACT

Lymph-node status is important in decision-making during early gastric cancer (EGC) treatment. Currently, endoscopic submucosal dissection is the mainstream treatment for EGC. However, it is challenging even for experienced endoscopists to accurately diagnose and treat EGC. Multiphoton microscopy can extract the morphological features of collagen fibers from tissues, and these characteristics can be used to assess lymph-node metastasis status in patients with EGC. First, we compared the accuracy of four deep learning models (VGG16, ResNet34, MobileNetV2, and PVTv2) on preprocessed training images and test datasets. Next, we integrated the features of the best-performing model, PVTv2, with manual and clinical features to develop a novel model called AutoLNMNet. The prediction accuracy of AutoLNMNet for the no-metastasis (Ly0) and lymph-node-metastasis (Ly1) stages reached 0.92, which was 0.3% higher than that of PVTv2. The areas under the receiver operating characteristic curves of AutoLNMNet for the Ly0 and Ly1 stages were both 0.97. Therefore, AutoLNMNet is highly reliable and accurate in detecting lymph-node metastasis, providing an important tool for the early diagnosis and treatment of EGC.
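Fusing deep, handcrafted, and clinical features for a binary Ly0/Ly1 head could look like the following concatenation-plus-MLP sketch; all dimensions are illustrative and this is not AutoLNMNet's actual architecture.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Hypothetical fusion of deep, handcrafted, and clinical features for
    binary Ly0/Ly1 prediction; dimensions are illustrative."""
    def __init__(self, d_deep=512, d_hand=32, d_clin=8, n_classes=2):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_deep + d_hand + d_clin, 128),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(128, n_classes),
        )

    def forward(self, deep_feat, hand_feat, clin_feat):
        # simple early fusion: concatenate all feature vectors, then classify
        return self.mlp(torch.cat([deep_feat, hand_feat, clin_feat], dim=1))

logits = FusionHead()(torch.randn(4, 512), torch.randn(4, 32), torch.randn(4, 8))
```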

8.
BMC Med Inform Decis Mak ; 24(1): 288, 2024 Oct 07.
Article in English | MEDLINE | ID: mdl-39375719

ABSTRACT

BACKGROUND: Histopathology is a gold standard for cancer diagnosis. It involves extracting tissue specimens from suspicious areas to prepare glass slides for microscopic examination. However, histological tissue processing procedures introduce artifacts, which are ultimately transferred to the digitized versions of the glass slides, known as whole slide images (WSIs). Artifacts are diagnostically irrelevant areas and may result in wrong predictions from deep learning (DL) algorithms. Therefore, detecting and excluding artifacts in a computational pathology (CPATH) system is essential for reliable automated diagnosis. METHODS: In this paper, we propose a mixture-of-experts (MoE) scheme for detecting five notable artifacts in WSIs: damaged tissue, blur, folded tissue, air bubbles, and histologically irrelevant blood. First, we train independent binary DL models as experts to capture particular artifact morphologies. Then, we ensemble their predictions using a fusion mechanism and apply probabilistic thresholding over the final probability distribution to improve the sensitivity of the MoE. We developed four DL pipelines to evaluate computational and performance trade-offs: two MoEs and two multiclass models based on state-of-the-art deep convolutional neural networks (DCNNs) and vision transformers (ViTs). These pipelines are quantitatively and qualitatively evaluated on external and out-of-distribution (OoD) data to assess generalizability and robustness for the artifact detection application. RESULTS: We extensively evaluated the proposed MoE and multiclass models. The DCNN-based and ViT-based MoE schemes outperformed the simpler multiclass models and were tested on datasets from different hospitals and cancer types, where the MoE using MobileNet DCNNs yielded the best results. The proposed MoE achieves an 86.15% F1 score and 97.93% sensitivity on unseen data while incurring a lower inference cost than the ViT-based MoE. This best performance of the MoEs comes at a relatively higher computational cost than the multiclass models. Furthermore, we apply post-processing to create an artifact segmentation mask, a potential artifact-free RoI map, a quality report, and an artifact-refined WSI for further computational analysis. During the qualitative evaluation, field experts assessed the predictive performance of the MoEs on OoD WSIs. They rated artifact detection and artifact-free area preservation, where the highest agreement translated to a Cohen's kappa of 0.82, indicating substantial agreement on the overall diagnostic usability of the DCNN-based MoE scheme. CONCLUSIONS: The proposed artifact detection pipeline will not only ensure reliable CPATH predictions but may also provide quality control. In this work, the best-performing pipeline for artifact detection is the MoE with DCNNs. Our detailed experiments show that there is always a trade-off between performance and computational complexity, and no single DL solution equally suits all types of data and applications. The code and HistoArtifacts dataset are available online at GitHub and Zenodo, respectively.
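The ensemble-with-thresholding step might be sketched as below: each binary expert emits per-patch artifact probabilities, and per-class thresholds (values here are made up) flag patches, with any flag marking the patch as diagnostically unusable.

```python
import torch

def moe_artifact_mask(expert_probs, thresholds):
    """expert_probs: dict of artifact name -> (N,) tensor of per-patch
    probabilities from the corresponding binary expert. A patch is flagged
    if any expert exceeds its per-class (tunable) threshold."""
    flags = {name: p >= thresholds[name] for name, p in expert_probs.items()}
    has_artifact = torch.stack(list(flags.values())).any(dim=0)
    return flags, has_artifact

probs = {"blur": torch.rand(6), "fold": torch.rand(6), "blood": torch.rand(6)}
thresholds = {"blur": 0.4, "fold": 0.5, "blood": 0.3}  # illustrative values
per_class, has_artifact = moe_artifact_mask(probs, thresholds)
print(~has_artifact)  # True where the patch is artifact-free and usable as RoI
```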


Subject(s)
Artifacts , Deep Learning , Humans , Neoplasms , Image Processing, Computer-Assisted/methods , Pathology, Clinical/standards , Image Interpretation, Computer-Assisted/methods
9.
Front Comput Neurosci ; 18: 1404623, 2024.
Article in English | MEDLINE | ID: mdl-39380741

ABSTRACT

Introduction: With the great success of Transformers in machine learning, they are gradually attracting widespread interest in the field of remote sensing (RS). However, research in RS has been hampered by the lack of large labeled datasets and by the inconsistency of data modalities caused by the diversity of RS platforms. With the rise of self-supervised learning (SSL) algorithms in recent years, RS researchers have begun to explore the "pre-training and fine-tuning" paradigm in RS. However, there has been little research on multimodal data fusion in the RS field; most works use only one modality or simply concatenate multiple modalities. Method: To study a more efficient multimodal data fusion scheme, we propose a multimodal fusion mechanism based on gated unit control (MGSViT). We pretrain a ViT model on the BigEarthNet dataset by combining two commonly used SSL algorithms, and propose intra-modal and inter-modal gated fusion units for feature learning that combine multispectral (MS) and synthetic aperture radar (SAR) data. Our method can effectively combine different modalities to extract key feature information. Results and discussion: In fine-tuning and comparison experiments, our method outperforms the most advanced algorithms on all downstream classification tasks, verifying its validity.
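A generic gated fusion unit for two modality embeddings can be written in a few lines; this sketches the idea behind MGSViT's gated units rather than the paper's exact intra-/inter-modal design.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Generic gated fusion of two modality embeddings (e.g., MS and SAR);
    a sketch of the gated-unit idea, not MGSViT's exact module."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, ms, sar):
        # learned per-feature mixing weight in [0, 1]
        g = torch.sigmoid(self.gate(torch.cat([ms, sar], dim=-1)))
        return g * ms + (1 - g) * sar

fused = GatedFusion(256)(torch.randn(4, 256), torch.randn(4, 256))
```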

10.
Front Neurorobot ; 18: 1452019, 2024.
Article in English | MEDLINE | ID: mdl-39381775

ABSTRACT

Introduction: Using machine learning methods for precise analysis and improvement of swimming techniques holds significant research value and application prospects. Existing machine learning methods have improved the accuracy of action recognition to some extent, but they still face several challenges, such as insufficient data feature extraction, limited model generalization ability, and poor real-time performance. Methods: To address these issues, this paper proposes an innovative approach called Swimtrans Net: a multimodal robotic system for swimming action recognition driven by Swin Transformer. By leveraging the powerful visual feature extraction capabilities of Swin Transformer, Swimtrans Net effectively extracts swimming image information. Additionally, to meet the requirements of multimodal tasks, we integrate the CLIP model into the system: Swin Transformer serves as the image encoder for CLIP, and through fine-tuning, the CLIP model becomes capable of understanding and interpreting swimming action data, learning the relevant features and patterns associated with swimming. Finally, we introduce transfer learning for pre-training to reduce training time and computational resource requirements, thereby providing real-time feedback to swimmers. Results and discussion: Experimental results show that Swimtrans Net achieves a 2.94% improvement over current state-of-the-art methods in swimming motion analysis and prediction. This study introduces an innovative machine learning method that can help coaches and swimmers better understand and improve swimming techniques, ultimately improving swimming performance.

11.
J Environ Manage ; 370: 122742, 2024 Oct 08.
Article in English | MEDLINE | ID: mdl-39383749

ABSTRACT

Sorting plastic waste (PW) out of municipal solid waste (MSW) by material type is crucial for reutilization and pollution reduction. However, current automatic separation methods are costly and inefficient, necessitating an advanced sorting process that ensures high feedstock purity. This study introduces a Swin Transformer-based model for effectively detecting PW in real-world MSW streams, leveraging both morphological and material properties. A dataset comprising 3560 optical images and infrared spectral data was created to support this task. This vision-based system can localize and classify PW into five categories: polypropylene (PP), polyethylene (PE), polyethylene terephthalate (PET), polyvinyl chloride (PVC), and polystyrene (PS). Performance evaluations reveal an accuracy rate of 99.75% and a mean average precision (mAP50) exceeding 91%. Compared with popular convolutional neural network (CNN)-based models, the trained Swin Transformer-based model offers enhanced convenience and performance on the five-category PW detection task, maintaining a mAP50 over 80% in real-life deployment. The model's effectiveness is further supported by visualization of detection results on MSW streams and by principal component analysis of the classification scores. These results demonstrate the system's effectiveness in both lab-scale and real-life conditions, aligning with global regulations and strategies that promote innovative technologies for plastic recycling and thereby contributing to the development of a sustainable circular economy.
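The principal component analysis of classification scores mentioned above could be reproduced along these lines; file names, array shapes, and the 2-D scatter are assumptions.

```python
# Sketch of a PCA check on per-detection classification scores.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

scores = np.load("pw_class_scores.npy")  # hypothetical (N, 5) softmax scores
labels = np.load("pw_labels.npy")        # hypothetical (N,) integer labels
xy = PCA(n_components=2).fit_transform(scores)

for c, name in enumerate(["PP", "PE", "PET", "PVC", "PS"]):
    pts = xy[labels == c]
    plt.scatter(pts[:, 0], pts[:, 1], s=8, label=name)
plt.xlabel("PC1"); plt.ylabel("PC2"); plt.legend(); plt.show()
```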

12.
J Food Sci ; 2024 Oct 09.
Article in English | MEDLINE | ID: mdl-39385405

ABSTRACT

Pinelliae Rhizoma is a key ingredient in botanical supplements and is often adulterated with Rhizoma Pinelliae Pedatisectae, which is similar in appearance but less expensive. Accurate identification of these materials is crucial for both scientific and commercial purposes. Traditional morphological identification relies heavily on expert experience and is subjective, while chemical analysis and molecular biological identification are typically time-consuming and labor-intensive. This study employs a simpler, faster, and non-invasive image recognition technique, the vision transformer (ViT) algorithm, to distinguish between these two highly similar plant materials. All samples were verified using DNA molecular identification before image analysis. The results demonstrate that the ViT algorithm achieves a classification accuracy exceeding 94%, significantly outperforming a convolutional neural network model's 60%-70% accuracy and highlighting the efficiency of this technology in identifying plant materials with similar appearances. This study is the first to apply the ViT algorithm to such a challenging task, showcasing its potential for precise botanical material identification and setting the stage for future advances in the field.

13.
Neural Netw ; 181: 106782, 2024 Oct 05.
Article in English | MEDLINE | ID: mdl-39388995

ABSTRACT

Magnetic resonance imaging (MRI) plays a pivotal role in diagnosing and staging prostate cancer. Precise delineation of the peripheral zone (PZ) and transition zone (TZ) in prostate MRI is essential for accurate diagnosis and subsequent artificial intelligence-driven analysis. However, existing segmentation methods are limited by ambiguous boundaries, shape variations, and texture complexities between the PZ and TZ, and they suffer from inadequate modeling capability and limited receptive fields. To address these challenges, we propose an Enhanced MixFormer, which integrates window-based multi-head self-attention (W-MSA) and depth-wise convolution in a parallel design with cross-branch bidirectional interaction. We further introduce MixUNETR, which uses multiple Enhanced MixFormers as its encoder to extract features from both the PZ and TZ in prostate MRI. This design effectively enlarges the receptive field and enhances the modeling capability of W-MSA, ultimately improving the extraction of both global and local feature information from the PZ and TZ, thereby addressing mis-segmentation and the difficulty of delineating the boundary between them. Extensive experiments comparing MixUNETR with several state-of-the-art methods on the Prostate158 and ProstateX public datasets and a private dataset consistently demonstrate its accuracy and robustness in prostate MRI segmentation. Our code is available at https://github.com/skyous779/MixUNETR.git.
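A simplified parallel attention-plus-depthwise-convolution block is sketched below. Global multi-head self-attention stands in for windowed W-MSA, and the cross-branch bidirectional interaction is omitted, so this illustrates the parallel design only, not the Enhanced MixFormer itself.

```python
import torch
import torch.nn as nn

class ParallelAttnConvBlock(nn.Module):
    """Attention branch and depthwise-conv branch run in parallel and are
    summed with a residual. Global MSA replaces windowed W-MSA to keep the
    sketch short; cross-branch interaction is omitted."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

    def forward(self, x):  # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))  # (B, HW, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        attn_out = attn_out.transpose(1, 2).reshape(b, c, h, w)
        return x + attn_out + self.dwconv(x)

y = ParallelAttnConvBlock(64)(torch.randn(2, 64, 16, 16))
```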

14.
Neurourol Urodyn ; 2024 Oct 10.
Article in English | MEDLINE | ID: mdl-39390731

ABSTRACT

BACKGROUND: Artificial intelligence models are increasingly gaining popularity among patients and healthcare professionals. While it is impossible to restrict patients' access to different sources of information on the Internet, healthcare professionals need to be aware of the quality of the content available across different platforms. OBJECTIVE: To investigate the accuracy and completeness of the Chat Generative Pretrained Transformer (ChatGPT) in addressing frequently asked questions related to the management and treatment of female urinary incontinence (UI), compared with recommendations from guidelines. METHODS: This is a cross-sectional study. Two researchers developed 14 frequently asked questions related to UI, which were submitted to the ChatGPT platform on September 16, 2023. The accuracy (scores from 1 to 5) and completeness (scores from 1 to 3) of ChatGPT's answers were assessed individually by two experienced researchers in the women's health field, following the recommendations proposed by the guidelines for UI. RESULTS: Most of the answers were classified as "more correct than incorrect" (n = 6), followed by "more incorrect than correct" (n = 3), "approximately equal correct and incorrect" (n = 2), "nearly all correct" (n = 2), and "correct" (n = 1). Regarding completeness, most of the answers were classified as adequate, as they provided the minimum information expected to be classified as correct. CONCLUSION: These results show inconsistency in the accuracy of the answers generated by ChatGPT compared with scientific guidelines. Almost none of the answers provided the complete content expected or reported in previous guidelines, which highlights to healthcare professionals and the scientific community a concern about using artificial intelligence in patient counseling.

15.
Sci Rep ; 14(1): 23879, 2024 Oct 12.
Article in English | MEDLINE | ID: mdl-39396096

ABSTRACT

Hyperspectral image (HSI) data contain a wide range of valuable spectral information for numerous tasks, but they pose challenges such as small training samples, scarcity, and redundant information, and researchers have introduced various works to address these challenges. The convolutional neural network (CNN) has achieved significant success in HSI classification. A CNN's primary focus is extracting low-level features from HSI data, and it has a limited ability to detect long-range dependencies due to its confined filter size. In contrast, vision transformers exhibit great success in the HSI classification field through their use of attention mechanisms to learn long-range dependencies. The primary issue with these models is that they require sufficient labeled training data. To address this challenge, we propose a spectral-spatial feature extractor group attention transformer that consists of a multiscale feature extractor for low-level or shallow features and a proposed group attention mechanism for high-level semantic feature extraction. Our model is evaluated on four publicly available HSI datasets: Indian Pines, Pavia University, Salinas, and KSC. The proposed approach achieved the best classification results in terms of overall accuracy (OA), average accuracy (AA), and Kappa coefficient, while using only 5%, 1%, 1%, and 10% of the training samples from the four datasets, respectively.

16.
Trends Hear ; 28: 23312165241282872, 2024.
Article in English | MEDLINE | ID: mdl-39397786

ABSTRACT

Decoding speech envelopes from electroencephalogram (EEG) signals holds potential as a research tool for objectively assessing auditory processing, which could contribute to future developments in hearing loss diagnosis. However, current methods struggle to meet both high accuracy and interpretability. We propose a deep learning model called the auditory decoding transformer (ADT) network for speech envelope reconstruction from EEG signals to address these issues. The ADT network uses spatio-temporal convolution for feature extraction, followed by a transformer decoder to decode the speech envelopes. Through anticausal masking, the ADT considers only the current and future EEG features to match the natural relationship of speech and EEG. Performance evaluation shows that the ADT network achieves average reconstruction scores of 0.168 and 0.167 on the SparrKULee and DTU datasets, respectively, rivaling those of other nonlinear models. Furthermore, by visualizing the weights of the spatio-temporal convolution layer as time-domain filters and brain topographies, combined with an ablation study of the temporal convolution kernels, we analyze the behavioral patterns of the ADT network in decoding speech envelopes. The results indicate that low- (0.5-8 Hz) and high-frequency (14-32 Hz) EEG signals are more critical for envelope reconstruction and that the active brain regions are primarily distributed bilaterally in the auditory cortex, consistent with previous research. Visualization of attention scores further validated previous research. In summary, the ADT network balances high performance and interpretability, making it a promising tool for studying neural speech envelope tracking.
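The anticausal masking described above inverts the usual causal mask: each time step attends only to itself and the future. A minimal PyTorch construction, using the convention that True marks disallowed positions:

```python
import torch
import torch.nn as nn

def anticausal_mask(T: int) -> torch.Tensor:
    """Boolean attention mask where True marks *disallowed* positions
    (PyTorch convention): strictly-past positions are masked out, so each
    step attends only to itself and the future."""
    return torch.tril(torch.ones(T, T, dtype=torch.bool), diagonal=-1)

T, d = 64, 128
attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
x = torch.randn(2, T, d)  # EEG feature tokens (shapes are illustrative)
out, _ = attn(x, x, x, attn_mask=anticausal_mask(T))
```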


Subject(s)
Deep Learning , Electroencephalography , Signal Processing, Computer-Assisted , Speech Perception , Humans , Electroencephalography/methods , Speech Perception/physiology , Nonlinear Dynamics , Acoustic Stimulation/methods , Speech Acoustics , Neural Networks, Computer , Auditory Cortex/physiology
17.
Front Physiol ; 15: 1432987, 2024.
Article in English | MEDLINE | ID: mdl-39397853

ABSTRACT

Introduction: Ultrasound imaging has become a crucial tool in medical diagnostics, offering real-time visualization of internal organs and tissues. However, challenges such as low contrast, high noise levels, and variability in image quality hinder accurate interpretation. To enhance diagnostic accuracy and support treatment decisions, precise segmentation of organs and lesions in ultrasound images is essential. Recently, several deep learning methods, including convolutional neural networks (CNNs) and Transformers, have reached significant milestones in medical image segmentation. Nonetheless, there remains a pressing need for methods capable of seamlessly integrating global context with local fine-grained information, particularly for the unique challenges posed by ultrasound images. Methods: In this paper, to address these issues, we propose DDTransUNet, a hybrid network combining Transformer and CNN, with a dual-branch encoder and a dual attention mechanism for ultrasound image segmentation. DDTransUNet adopts a Swin Transformer branch and a CNN branch to extract global context and local fine-grained information, respectively. The dual attention mechanism, comprising Global Spatial Attention (GSA) and Global Channel Attention (GCA) modules, captures long-range visual dependencies, and a novel Cross Attention Fusion (CAF) module effectively fuses feature maps from both branches using cross-attention. Results: Experiments on three ultrasound image datasets demonstrate that DDTransUNet outperforms previous methods. On the TN3K dataset, DDTransUNet achieves IoU, Dice, HD95, and ACC metrics of 73.82%, 82.31%, 16.98 mm, and 96.94%, respectively; on the BUS-BRA dataset, 80.75%, 88.23%, 8.12 mm, and 98.00%; and on the CAMUS dataset, 82.51%, 90.33%, 2.82 mm, and 96.87%. Discussion: These results indicate that our method can provide valuable diagnostic assistance to clinical practitioners.
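Bidirectional cross-attention between the two branches' token sequences might be sketched as follows; this is a generic CAF-style fusion, with the dimensions and the simple additive merge as assumptions.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Generic bidirectional cross-attention between CNN and Swin token
    sequences; a sketch of the CAF idea, not the paper's exact module."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.cnn_from_swin = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.swin_from_cnn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, cnn_tokens, swin_tokens):  # both: (B, N, dim)
        # each branch queries the other branch's keys/values
        a, _ = self.cnn_from_swin(cnn_tokens, swin_tokens, swin_tokens)
        b, _ = self.swin_from_cnn(swin_tokens, cnn_tokens, cnn_tokens)
        return a + b  # fused tokens

fused = CrossAttentionFusion(96)(torch.randn(2, 196, 96), torch.randn(2, 196, 96))
```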

18.
JACC Adv ; 3(9): 101196, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39372455

ABSTRACT

Background: Ejection fraction (EF) estimation informs patient care plans in the ICU, and low EF can indicate ventricular systolic dysfunction, which increases the risk of adverse events including heart failure. Automated echocardiography models are an attractive solution to high-variance human EF estimation, and key to this goal are echocardiogram vector embeddings, a critical resource for computational researchers. Objectives: The authors aimed to extract vector embeddings from each echocardiogram in the EchoNet dataset using a classifier trained to label EF as healthy (>50%) or unhealthy (≤50%), creating an embeddings dataset for computational researchers. Methods: We repurposed an R3D transformer to classify whether patient EF is below or above 50%. Training, validation, and testing were done on the EchoNet dataset of 10,030 echocardiograms, and the resulting model generated embeddings for each of these videos. Results: We extracted 400-dimensional vector embeddings for each of the 10,030 EchoNet echocardiograms using the trained R3D model, which achieved a test AUC of 0.916 and 87.5% accuracy, approaching the performance of comparable studies. Conclusions: We present the 10,030 vector embeddings learned by this model as a resource to the cardiology research community, along with the trained model itself. These vectors enable algorithmic improvements and multimodal applications within automated echocardiography, benefiting the research community and those with ventricular systolic dysfunction (https://github.com/Team-Echo-MIT/r3d-v0-embeddings).
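Extracting penultimate-layer embeddings from a torchvision R3D-18 video classifier (a plausible stand-in; the paper's "R3D transformer" and its 400-dimensional embedding layer are not reproduced here) can be sketched as:

```python
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18, R3D_18_Weights

model = r3d_18(weights=R3D_18_Weights.KINETICS400_V1)
model.fc = nn.Linear(model.fc.in_features, 2)  # healthy vs. EF <= 50% head

# Embeddings: everything up to (but not including) the classification head.
backbone = nn.Sequential(*list(model.children())[:-1])
clip = torch.randn(1, 3, 16, 112, 112)  # (B, C, frames, H, W), illustrative
with torch.no_grad():
    emb = backbone(clip).flatten(1)  # 512-dim here; the paper reports 400
```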

19.
J Imaging Inform Med ; 2024 Oct 14.
Article in English | MEDLINE | ID: mdl-39402355

ABSTRACT

Lung adenocarcinoma and squamous cell carcinoma are the two most common pathological lung cancer subtypes, and accurate diagnosis and pathological subtyping are crucial for lung cancer treatment. Solitary solid lung nodules with lobulation and spiculation signs are often indicative of lung cancer; however, in some cases, postoperative pathology finds benign solid lung nodules. It is therefore critical to accurately identify solid lung nodules with lobulation and spiculation signs before surgery, yet traditional diagnostic imaging is prone to misdiagnosis and studies on artificial intelligence-assisted diagnosis are few. We introduce a volumetric Swin Transformer-based method: a multi-scale, multi-task, and highly interpretable model for distinguishing among benign solid lung nodules with lobulation and spiculation signs, lung adenocarcinoma, and lung squamous cell carcinoma. The technique's effectiveness was improved by using 3-dimensional (3D) computed tomography (CT) images instead of conventional 2-dimensional (2D) images to incorporate as much information as possible. The model was trained on 352 of the 441 CT image sequences and validated on the rest. The experimental results show that our model can accurately differentiate among benign lung nodules with lobulation and spiculation signs, lung adenocarcinoma, and squamous cell carcinoma. On the test set, our model achieves an accuracy of 0.9888, precision of 0.9892, recall of 0.9888, and an F1-score of 0.9888, along with class activation mapping (CAM) visualization of the 3D model. Consequently, our method could serve as a preoperative tool to assist in accurately diagnosing solitary solid lung nodules with lobulation and spiculation signs and provide a basis for developing appropriate clinical diagnosis and treatment plans.

20.
FASEB J ; 38(19): e70083, 2024 Oct 15.
Article in English | MEDLINE | ID: mdl-39373982

ABSTRACT

Drug-target binding affinity (DTA) prediction is vital for drug repositioning, yet the accuracy and generalizability of DTA models remain a major challenge. Here, we develop a model composed of a BERT-Trans Block, a Multi-Trans Block, and a DTI Learning module, referred to as Molecular Representation Encoder-based DTA prediction (MREDTA). MREDTA has three advantages: (1) extraction of both local and global molecular features simultaneously through skip connections; (2) improved sensitivity to molecular structures through the Multi-Trans Block; and (3) enhanced generalizability through the introduction of BERT. Compared with 12 advanced models, benchmark tests on the KIBA and Davis datasets demonstrated the best performance for MREDTA. In a case study, we applied MREDTA to 2034 FDA-approved drugs for treating non-small-cell lung cancer (NSCLC), all of which act on the mutant EGFR T790M protein. The corresponding molecular docking results demonstrated the robustness of MREDTA.
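A toy DTA regressor showing the common encode-each-molecule-then-regress pattern is sketched below; it uses a small transformer for SMILES tokens and a 1-D convolution for the protein sequence, and is not MREDTA's BERT-Trans/Multi-Trans design. Vocabulary sizes and dimensions are placeholders.

```python
import torch
import torch.nn as nn

class SimpleDTA(nn.Module):
    """Toy DTA regressor: transformer-encoded SMILES tokens plus CNN-encoded
    protein sequence, concatenated into an affinity MLP."""
    def __init__(self, drug_vocab=64, prot_vocab=26, d=128):
        super().__init__()
        self.drug_emb = nn.Embedding(drug_vocab, d)
        self.drug_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d, nhead=4, batch_first=True), num_layers=2)
        self.prot_emb = nn.Embedding(prot_vocab, d)
        self.prot_enc = nn.Conv1d(d, d, kernel_size=7, padding=3)
        self.head = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, 1))

    def forward(self, drug_tokens, prot_tokens):  # (B, Ld), (B, Lp) int tensors
        dz = self.drug_enc(self.drug_emb(drug_tokens)).mean(dim=1)      # pooled drug
        pz = self.prot_enc(self.prot_emb(prot_tokens).transpose(1, 2)).mean(dim=2)
        return self.head(torch.cat([dz, pz], dim=1)).squeeze(-1)       # affinity

aff = SimpleDTA()(torch.randint(0, 64, (2, 80)), torch.randint(0, 26, (2, 500)))
```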


Subject(s)
Molecular Docking Simulation , Humans , Carcinoma, Non-Small-Cell Lung/drug therapy , Carcinoma, Non-Small-Cell Lung/metabolism , Drug Repositioning/methods , Lung Neoplasms/drug therapy , Lung Neoplasms/metabolism , Lung Neoplasms/genetics , ErbB Receptors/metabolism , ErbB Receptors/chemistry , ErbB Receptors/genetics , Protein Binding , Antineoplastic Agents/pharmacology , Antineoplastic Agents/chemistry