Results 1 - 20 of 82
1.
Interdiscip Sci ; 2024 Jun 29.
Article in English | MEDLINE | ID: mdl-38951382

ABSTRACT

Image classification, a fundamental task in computer vision, faces challenges concerning limited data handling, interpretability, improved feature representation, efficiency across diverse image types, and processing noisy data. Conventional architectural approaches have made insufficient progress in addressing these challenges, necessitating architectures capable of fine-grained classification, enhanced accuracy, and superior generalization. Among these, the vision transformer emerges as a noteworthy computer vision architecture. However, its reliance on substantial data for training poses a drawback due to its complexity and high data requirements. To surmount these challenges, this paper proposes an innovative approach, MetaV, integrating meta-learning into a vision transformer for medical image classification. N-way K-shot learning is employed to train the model, drawing inspiration from human learning mechanisms that utilize past knowledge. Additionally, deformable convolution and patch merging techniques are incorporated into the vision transformer model to mitigate complexity and overfitting while enhancing feature representation. Augmentation methods such as perturbation and GridMask are introduced to address the scarcity and noise in medical images, particularly for rare diseases. The proposed model is evaluated on diverse datasets including BreakHis, ISIC 2019, SIPaKMeD, and STARE. The achieved accuracies of 89.89%, 87.33%, 94.55%, and 80.22% for BreakHis, ISIC 2019, SIPaKMeD, and STARE, respectively, validate the superior performance of the proposed model in comparison to conventional models, setting a new benchmark for meta-vision image classification models.
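The N-way K-shot training regime mentioned above can be illustrated with a minimal episodic-sampling sketch (the `sample_episode` helper and the toy dataset layout are hypothetical, not from the paper):

```python
import random

def sample_episode(dataset, n_way, k_shot, q_queries, rng):
    """Sample one N-way K-shot episode from a {class: [samples]} mapping.

    Returns a support set (N*K items) and a query set (N*Q items),
    each a list of (sample, episode_class_index) pairs.
    """
    classes = rng.sample(sorted(dataset), n_way)          # pick N classes
    support, query = [], []
    for idx, cls in enumerate(classes):
        picks = rng.sample(dataset[cls], k_shot + q_queries)
        support += [(s, idx) for s in picks[:k_shot]]
        query += [(s, idx) for s in picks[k_shot:]]
    return support, query

# toy "dataset": 6 classes with 5 samples each (integers standing in for images)
data = {c: list(range(c * 10, c * 10 + 5)) for c in range(6)}
support, query = sample_episode(data, n_way=3, k_shot=2, q_queries=1,
                                rng=random.Random(0))
```

The model is trained on many such episodes so that it learns to classify from only K labeled examples per class, which is what makes the approach attractive for rare diseases with scarce annotations.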

2.
Comput Biol Med ; 179: 108792, 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38964242

ABSTRACT

BACKGROUND AND OBJECTIVE: Concerns about patient privacy issues have limited the application of medical deep learning models in certain real-world scenarios. Differential privacy (DP) can alleviate this problem by injecting random noise into the model. However, naively applying DP to medical models will not achieve a satisfactory balance between privacy and utility due to the high dimensionality of medical models and the limited labeled samples. METHODS: This work proposed the DP-SSLoRA model, a privacy-preserving classification model for medical images combining differential privacy with self-supervised low-rank adaptation. In this work, a self-supervised pre-training method is used to obtain enhanced representations from unlabeled publicly available medical data. Then, a low-rank decomposition method is employed to mitigate the impact of differentially private noise and combined with pre-trained features to conduct the classification task on private datasets. RESULTS: In classification experiments using three real chest X-ray datasets, DP-SSLoRA achieves good performance with strong privacy guarantees. Under a privacy budget of ε=2, it achieves an AUC of 0.942 on RSNA, 0.9658 on Covid-QU-mini, and 0.9886 on Chest X-ray 15k. CONCLUSION: Extensive experiments on real chest X-ray datasets show that DP-SSLoRA can achieve satisfactory performance with stronger privacy guarantees. This study provides guidance for studying privacy preservation in the medical field. Source code is publicly available online: https://github.com/oneheartforone/DP-SSLoRA.
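The privacy/utility tension described above stems from the DP-SGD-style Gaussian mechanism, where per-sample gradients are clipped and noised before averaging; the noise scales with model dimensionality, which is why a low-rank parameterization helps. A minimal sketch of the generic clip-and-noise step (not the paper's actual training loop):

```python
import numpy as np

def dp_sanitize(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Gaussian mechanism behind DP-SGD: clip each per-sample gradient
    to clip_norm, average, then add calibrated Gaussian noise."""
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_sample_grads]
    mean = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(clipped),
                       size=mean.shape)
    return mean + noise

grads = [np.array([3.0, 4.0]),    # norm 5.0 -> clipped down to norm 1.0
         np.array([0.3, 0.4])]    # norm 0.5 -> left unchanged
noisy_mean = dp_sanitize(grads, clip_norm=1.0, noise_multiplier=1.1,
                         rng=np.random.default_rng(0))
```

With fewer trainable parameters (as in low-rank adaptation), the same privacy budget perturbs a smaller vector, preserving more utility.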

3.
Bioengineering (Basel) ; 11(6)2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38927807

ABSTRACT

Ameloblastoma (AM), periapical cyst (PC), and chronic suppurative osteomyelitis (CSO) are prevalent maxillofacial diseases with similar imaging characteristics but different treatments, thus making preoperative differential diagnosis crucial. Existing deep learning methods for diagnosis often require manual delineation in tagging the regions of interest (ROIs), which poses challenges in practical application. We propose a new model, Wavelet Extraction and Fusion Module with Vision Transformer (WaveletFusion-ViT), for automatic diagnosis using CBCT panoramic images. In this study, 539 samples containing healthy (n = 154), AM (n = 181), PC (n = 102), and CSO (n = 102) were acquired by CBCT for classification, with an additional 2000 healthy samples for pre-training the domain-adaptive network (DAN). The WaveletFusion-ViT model was initialized with pre-trained weights obtained from the DAN and further trained using semi-supervised learning (SSL) methods. After five-fold cross-validation, the model achieved average sensitivity, specificity, accuracy, and AUC scores of 79.60%, 94.48%, 91.47%, and 0.942, respectively. Remarkably, our method achieved 91.47% accuracy using less than 20% labeled samples, surpassing the fully supervised approach's accuracy of 89.05%. Despite these promising results, this study's limitations include a low number of CSO cases and a relatively lower accuracy for this condition, which should be addressed in future research. This research takes an innovative approach in that it deviates from the fully supervised learning paradigm typically employed in previous studies. The WaveletFusion-ViT model combines SSL methods to effectively diagnose three types of CBCT panoramic images using only a small portion of labeled data.

4.
Comput Biol Med ; 177: 108635, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38796881

ABSTRACT

Multimodal medical imaging plays a pivotal role in clinical diagnosis and research, as it combines information from various imaging modalities to provide a more comprehensive understanding of the underlying pathology. Recently, deep learning-based multimodal fusion techniques have emerged as powerful tools for improving medical image classification. This review offers a thorough analysis of the developments in deep learning-based multimodal fusion for medical classification tasks. We explore the complementary relationships among prevalent clinical modalities and outline three main fusion schemes for multimodal classification networks: input fusion, intermediate fusion (encompassing single-level fusion, hierarchical fusion, and attention-based fusion), and output fusion. By evaluating the performance of these fusion techniques, we provide insight into the suitability of different network architectures for various multimodal fusion scenarios and application domains. Furthermore, we delve into challenges related to network architecture selection, handling incomplete multimodal data, and the potential limitations of multimodal fusion. Finally, we spotlight the promising future of Transformer-based multimodal fusion techniques and give recommendations for future research in this rapidly evolving field.
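The input-fusion and output-fusion ends of the taxonomy above reduce to feature concatenation and probability combination, respectively; a toy sketch makes the contrast concrete (shapes, modality names, and values here are purely illustrative):

```python
import numpy as np

def input_fusion(feat_a, feat_b):
    """Early fusion: concatenate modality features before a shared classifier."""
    return np.concatenate([feat_a, feat_b], axis=-1)

def output_fusion(probs_a, probs_b, w=0.5):
    """Late fusion: combine the per-modality class probabilities."""
    return w * probs_a + (1 - w) * probs_b

ct_feat = np.array([0.2, 0.5, 0.1])      # toy CT feature vector
mri_feat = np.array([0.8, 0.1, 0.4])     # toy MRI feature vector
fused_features = input_fusion(ct_feat, mri_feat)       # single shared input
fused_probs = output_fusion(np.array([0.7, 0.3]),
                            np.array([0.5, 0.5]))      # averaged decisions
```

Intermediate fusion sits between the two: each modality is partially encoded, and the mid-level features (possibly reweighted by attention) are merged before the final layers.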


Subject(s)
Deep Learning , Multimodal Imaging , Humans , Multimodal Imaging/methods , Image Interpretation, Computer-Assisted/methods , Image Processing, Computer-Assisted/methods
5.
Artif Intell Med ; 153: 102897, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38810471

ABSTRACT

Convolutional neural networks (CNNs) are gradually being recognized in the neuroimaging community as a powerful tool for image analysis. Despite their outstanding performance, some aspects of CNN functioning are still not fully understood by human operators. We postulated that the interpretability of CNNs applied to neuroimaging data could be improved by investigating their behavior when they are fed data with known characteristics. We analyzed the ability of 3D CNNs to discriminate between original and altered whole-brain parametric maps derived from diffusion-weighted magnetic resonance imaging. The alteration consisted of linearly changing the voxel intensity of either one (monoregion) or two (biregion) anatomical regions in each brain volume, but without mimicking any neuropathology. Performing ten-fold cross-validation and using a hold-out set for testing, we assessed the CNNs' discrimination ability as a function of the intensity of the altered regions, comparing different region sizes and relative positions. Monoregion CNNs showed that the larger the modified region, the smaller the intensity increase needed to achieve good performance. Biregion CNNs systematically outperformed monoregion CNNs, but could only detect one of the two target regions when tested on the corresponding monoregion images. Exploiting prior information on training data allowed for a better understanding of CNN behavior, especially when altered regions were combined. This can shed light on the complexity of CNN pattern retrieval and help elucidate misclassified examples, which is particularly relevant for pathological data. The proposed analytical approach may serve to gain insights into CNN behavior and guide the design of enhanced detection systems exploiting our prior knowledge.


Subject(s)
Brain , Neural Networks, Computer , Humans , Brain/diagnostic imaging , Magnetic Resonance Imaging/methods , Imaging, Three-Dimensional/methods , Neuroimaging/methods , Image Processing, Computer-Assisted/methods , Diffusion Magnetic Resonance Imaging/methods , Image Interpretation, Computer-Assisted/methods , Male
6.
Comput Methods Programs Biomed ; 253: 108230, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38810377

ABSTRACT

BACKGROUND AND OBJECTIVE: The classification of diabetic retinopathy (DR) aims to utilize the implicit information in images for early diagnosis, to prevent and mitigate the further worsening of the condition. However, existing methods are often limited by the need to operate within large, annotated datasets to show significant advantages. Additionally, the number of samples for different categories within the dataset needs to be evenly distributed, because an imbalanced sample distribution can lead to excessive focus on high-frequency disease categories while neglecting less common but equally important disease categories. Therefore, there is an urgent need to develop a new classification method that can effectively alleviate the issue of sample distribution imbalance, thereby enhancing the accuracy of diabetic retinopathy classification. METHODS: In this work, we propose MediDRNet, a dual-branch network model based on prototypical contrastive learning. This model adopts prototype contrastive learning, creating prototypes for different levels of lesions, ensuring they represent the core features of each lesion level. It classifies by comparing the similarity between data points and their category prototypes. Our dual-branch network structure effectively resolves the issue of category imbalance and improves classification accuracy by emphasizing subtle differences in retinal lesions. Moreover, our approach combines a dual-branch network with specific lesion-level prototypes for core feature representation and incorporates the convolutional block attention module for enhanced lesion feature identification. RESULTS: Our experiments using both the Kaggle and UWF classification datasets have demonstrated that MediDRNet exhibits exceptional performance compared to other advanced models in the industry, especially on the UWF DR classification dataset, where it achieved state-of-the-art performance across all metrics.
On the Kaggle DR classification dataset, it achieved the highest average classification accuracy (0.6327) and Macro-F1 score (0.6361). Particularly in the classification tasks for minority categories of diabetic retinopathy on the Kaggle dataset (Grades 1, 2, 3, and 4), the model reached high classification accuracies of 58.08%, 55.32%, 69.73%, and 90.21%, respectively. In the ablation study, the MediDRNet model proved to be more effective in feature extraction from diabetic retinal fundus images compared to other feature extraction methods. CONCLUSIONS: This study employed prototype contrastive learning and bidirectional branch learning strategies, successfully constructing a grading system for diabetic retinopathy lesions within imbalanced diabetic retinopathy datasets. Through a dual-branch network, the feature learning branch effectively facilitated a smooth transition of features from the grading network to the classification learning branch, accurately identifying minority sample categories. This method not only effectively resolved the issue of sample imbalance but also provided strong support for the precise grading and early diagnosis of diabetic retinopathy in clinical applications, showcasing exceptional performance in handling complex diabetic retinopathy datasets. Moreover, this research significantly improved the efficiency of prevention and management of disease progression in diabetic retinopathy patients within medical practice. We encourage the use and modification of our code, which is publicly accessible on GitHub: https://github.com/ReinforceLove/MediDRNet.
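Classification by similarity to class prototypes, as described above, can be sketched in a few lines (a minimal Euclidean-distance variant; the paper's actual feature extractor and similarity measure may differ):

```python
import numpy as np

def class_prototypes(features, labels):
    """Prototype per class = mean feature vector of that class's samples."""
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

def classify_by_prototype(x, prototypes):
    """Assign x to the class whose prototype is nearest (Euclidean distance)."""
    return min(prototypes, key=lambda c: np.linalg.norm(x - prototypes[c]))

# toy 2-D features for two lesion grades
feats = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
labs = np.array([0, 0, 1, 1])
protos = class_prototypes(feats, labs)          # {0: [0, 1], 1: [10, 11]}
pred = classify_by_prototype(np.array([1.0, 1.0]), protos)
```

Because each class contributes exactly one prototype regardless of how many samples it has, the decision rule is not dominated by majority classes, which is the intuition behind using prototypes for imbalanced grading.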


Subject(s)
Diabetic Retinopathy , Diabetic Retinopathy/classification , Diabetic Retinopathy/diagnosis , Humans , Machine Learning , Neural Networks, Computer , Algorithms , Databases, Factual , Retina/diagnostic imaging , Image Interpretation, Computer-Assisted/methods
7.
Med Image Anal ; 95: 103199, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38759258

ABSTRACT

Accurate diagnosis of pathological subtypes of lung cancer is of significant importance for follow-up treatment and prognosis management. In this paper, we propose a self-generating hybrid feature network (SGHF-Net) for accurately classifying lung cancer subtypes on computed tomography (CT) images. Inspired by studies stating that cross-scale associations exist in the image patterns between the same case's CT images and its pathological images, we innovatively developed a pathological feature synthetic module (PFSM), which quantitatively maps cross-modality associations through deep neural networks, to derive the "gold standard" information contained in the corresponding pathological images from CT images. Additionally, we designed a radiological feature extraction module (RFEM) to directly acquire CT image information and integrated it with the pathological priors under an effective feature fusion framework, enabling the entire classification model to generate more indicative and specific pathologically related features and eventually output more accurate predictions. The superiority of the proposed model lies in its ability to self-generate hybrid features that contain multi-modality image information based on a single-modality input. To evaluate the effectiveness, adaptability, and generalization ability of our model, we performed extensive experiments on a large-scale multi-center dataset (i.e., 829 cases from three hospitals) to compare our model with a series of state-of-the-art (SOTA) classification models. The experimental results demonstrated the superiority of our model for lung cancer subtype classification, with significant improvements in terms of accuracy (ACC), area under the curve (AUC), positive predictive value (PPV), and F1-score.


Subject(s)
Lung Neoplasms , Tomography, X-Ray Computed , Humans , Lung Neoplasms/diagnostic imaging , Lung Neoplasms/classification , Tomography, X-Ray Computed/methods , Neural Networks, Computer , Radiographic Image Interpretation, Computer-Assisted/methods , Algorithms
8.
Entropy (Basel) ; 26(5)2024 May 01.
Article in English | MEDLINE | ID: mdl-38785649

ABSTRACT

Medical image diagnosis using deep learning has shown significant promise in clinical medicine. However, it often encounters two major difficulties in real-world applications: (1) domain shift, which invalidates the trained model on new datasets, and (2) class imbalance problems leading to model biases towards majority classes. To address these challenges, this paper proposes a transfer learning solution, named Dynamic Weighting Translation Transfer Learning (DTTL), for imbalanced medical image classification. The approach is grounded in information and entropy theory and comprises three modules: Cross-domain Discriminability Adaptation (CDA), Dynamic Domain Translation (DDT), and Balanced Target Learning (BTL). CDA connects discriminative feature learning between source and target domains using a synthetic discriminability loss and a domain-invariant feature learning loss. The DDT unit develops a dynamic translation process for imbalanced classes between two domains, utilizing a confidence-based selection approach to select the most useful synthesized images to create a pseudo-labeled balanced target domain. Finally, the BTL unit performs supervised learning on the reassembled target set to obtain the final diagnostic model. This paper delves into maximizing the entropy of class distributions, while simultaneously minimizing the cross-entropy between the source and target domains to reduce domain discrepancies. By incorporating entropy concepts into our framework, our method not only significantly enhances medical image classification in practical settings but also innovates the application of entropy and information theory within deep learning and medical image processing realms. Extensive experiments demonstrate that DTTL achieves the best performance compared to existing state-of-the-art methods for imbalanced medical image classification tasks.
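The entropy-maximization idea above can be made concrete with a small sketch: a balanced class distribution, like the pseudo-labeled target set the DDT unit aims to assemble, has strictly higher Shannon entropy than a skewed one (the toy distributions below are illustrative, not from the paper):

```python
import numpy as np

def class_entropy(p):
    """Shannon entropy of a class distribution (natural log, zeros skipped)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

balanced = [0.25, 0.25, 0.25, 0.25]   # the balanced target DDT aims for
skewed = [0.85, 0.05, 0.05, 0.05]     # an imbalanced source of model bias
```

Maximizing the entropy of the target-set class distribution therefore pushes it toward uniformity, counteracting the majority-class bias the abstract describes.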

9.
Med Biol Eng Comput ; 2024 May 10.
Article in English | MEDLINE | ID: mdl-38727760

ABSTRACT

Medical image classification plays a pivotal role within the field of medicine. Existing models predominantly rely on supervised learning methods, which necessitate large volumes of labeled data for effective training. However, acquiring and annotating medical image data is both an expensive and time-consuming endeavor. In contrast, semi-supervised learning methods offer a promising approach by harnessing limited labeled data alongside abundant unlabeled data to enhance the performance of medical image classification. Nonetheless, current methods often encounter confirmation bias due to noise inherent in self-generated pseudo-labels and the presence of boundary samples from different classes. To overcome these challenges, this study introduces a novel framework known as boundary sample-based class-weighted semi-supervised learning (BSCSSL) for medical image classification. Our method aims to alleviate the impact of intra- and inter-class boundary samples derived from unlabeled data. Specifically, we treat reliable high-confidence data and inter-class boundary samples separately through an inter-class boundary sample mining module. Additionally, we implement an intra-class boundary sample weighting mechanism to extract class-aware features specific to intra-class boundary samples. Rather than discarding such intra-class boundary samples outright, our approach acknowledges their intrinsic value despite the difficulty associated with accurate classification, as they contribute significantly to model prediction. Experimental results on widely recognized medical image datasets demonstrate the superiority of our proposed BSCSSL method over existing semi-supervised learning approaches. By enhancing the accuracy and robustness of medical image classification, our BSCSSL approach yields considerable implications for advancing medical diagnosis and future research endeavors.

10.
Med Phys ; 2024 May 20.
Article in English | MEDLINE | ID: mdl-38767532

ABSTRACT

BACKGROUND: Bladder prolapse is a common clinical disorder of pelvic floor dysfunction in women, and early diagnosis and treatment can help them recover. Pelvic magnetic resonance imaging (MRI) is one of the most important methods used by physicians to diagnose bladder prolapse; however, it is highly subjective and largely dependent on the clinical experience of physicians. The application of computer-aided diagnostic techniques to achieve a graded diagnosis of bladder prolapse can help improve its accuracy and shorten the learning curve. PURPOSE: The purpose of this study is to combine convolutional neural network (CNN) and vision transformer (ViT) for grading bladder prolapse in place of traditional neural networks, and to incorporate attention mechanisms into mobile vision transformer (MobileViT) for assisting in the grading of bladder prolapse. METHODS: This study focuses on the grading of bladder prolapse in pelvic organs using a combination of a CNN and a ViT. First, this study used MobileNetV2 to extract the local features of the images. Next, a ViT was used to extract the global features by modeling the non-local dependencies at a distance. Finally, a channel attention module (i.e., squeeze-and-excitation network) was used to improve the feature extraction network and enhance its feature representation capability. The final grading of the degree of bladder prolapse was thus achieved. RESULTS: Using pelvic MRI images provided by Huzhou Maternal and Child Health Care Hospital, this study used the proposed method to grade patients with bladder prolapse. The accuracy, Kappa value, sensitivity, specificity, precision, and area under the curve of our method were 86.34%, 78.27%, 83.75%, 95.43%, 85.70%, and 95.05%, respectively. In comparison with other CNN models, the proposed method performed better.
CONCLUSIONS: Thus, the model based on attention mechanisms exhibits better classification performance than existing methods for grading bladder prolapse in pelvic organs, and it can effectively assist physicians in achieving a more accurate bladder prolapse diagnosis.
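The squeeze-and-excitation channel attention used above follows a standard recipe: global-average-pool each channel, pass the channel statistics through a small bottleneck, and rescale channels by the resulting weights. A minimal numpy sketch (weights, shapes, and the reduction ratio are illustrative, not from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def squeeze_excite(feature_map, w1, w2):
    """SE channel attention on a (C, H, W) feature map.

    w1: (C//r, C) and w2: (C, C//r) form the bottleneck with ratio r.
    """
    squeezed = feature_map.mean(axis=(1, 2))     # squeeze: (C,) channel stats
    hidden = np.maximum(w1 @ squeezed, 0.0)      # ReLU bottleneck
    scale = sigmoid(w2 @ hidden)                 # per-channel weights in (0, 1)
    return feature_map * scale[:, None, None]    # excite: rescale channels

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))                   # toy map with C=4 channels
w1 = rng.normal(size=(2, 4))                     # reduction ratio r=2
w2 = rng.normal(size=(4, 2))
out = squeeze_excite(x, w1, w2)
```

Since each channel weight lies in (0, 1), the module can only suppress uninformative channels relative to informative ones, which is how it sharpens feature representation at negligible cost.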

11.
Comput Biol Med ; 173: 108388, 2024 May.
Article in English | MEDLINE | ID: mdl-38569235

ABSTRACT

The COVID-19 pandemic has resulted in hundreds of millions of cases and numerous deaths worldwide. Here, we develop CECT, a novel classification network built from a controllable ensemble of convolutional neural networks and a transformer, to provide a timely and accurate COVID-19 diagnosis. The CECT is composed of a parallel convolutional encoder block, an aggregate transposed-convolutional decoder block, and a windowed attention classification block. Each block captures features at different scales, from 28 × 28 to 224 × 224, from the input, composing enriched and comprehensive information. Different from existing methods, our CECT can capture features at both multi-local and global scales without any sophisticated module design. Moreover, the contribution of local features at different scales can be controlled with the proposed ensemble coefficients. We evaluate CECT on two public COVID-19 datasets, and it reaches the highest accuracy of 98.1% in the intra-dataset evaluation, outperforming existing state-of-the-art methods. Moreover, the developed CECT achieves an accuracy of 90.9% on the unseen dataset in the inter-dataset evaluation, showing extraordinary generalization ability. With remarkable feature capture ability and generalization ability, we believe CECT can be extended to other medical scenarios as a powerful diagnosis tool. Code is available at https://github.com/NUS-Tim/CECT.


Subject(s)
COVID-19 , Humans , COVID-19 Testing , Pandemics , Neural Networks, Computer , Image Processing, Computer-Assisted
12.
Sci Rep ; 14(1): 8071, 2024 04 05.
Article in English | MEDLINE | ID: mdl-38580700

ABSTRACT

Over recent years, researchers and practitioners have encountered massive and continuous improvements in the computational resources available for their use. This has allowed the use of resource-hungry machine learning (ML) algorithms to become feasible and practical. Moreover, several advanced techniques are being used to boost the performance of such algorithms even further, including various transfer learning techniques, data augmentation, and feature concatenation. Normally, the use of these advanced techniques highly depends on the size and nature of the dataset being used. In the case of fine-grained medical image sets, which have subcategories within the main categories in the image set, there is a need to find the combination of techniques that works best on these types of images. In this work, we utilize these advanced techniques to find the best combinations to build a state-of-the-art lumbar disc herniation computer-aided diagnosis system. We have evaluated the system extensively, and the results show that the diagnosis system achieves an accuracy of 98% when compared with human diagnosis.


Subject(s)
Intervertebral Disc Displacement , Humans , Intervertebral Disc Displacement/diagnostic imaging , Diagnosis, Computer-Assisted/methods , Algorithms , Machine Learning , Computers
13.
Diagnostics (Basel) ; 14(5)2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38473030

ABSTRACT

In the realm of liver transplantation, accurately determining hepatic steatosis levels is crucial. Recognizing the essential need for improved diagnostic precision, particularly for optimizing diagnosis time by swiftly handling easy-to-solve cases and allowing the expert time to focus on more complex cases, this study aims to develop cutting-edge algorithms that enhance the classification of liver biopsy images. Additionally, the challenge of maintaining data privacy arises when creating automated algorithmic solutions, as sharing patient data between hospitals is restricted, further complicating the development and validation process. This research tackles diagnostic accuracy by leveraging novel techniques from the rapidly evolving field of quantum machine learning, known for their superior generalization abilities. Concurrently, it addresses privacy concerns through the implementation of privacy-conscious collaborative machine learning with federated learning. We introduce a hybrid quantum neural network model that leverages real-world clinical data to assess non-alcoholic liver steatosis accurately. This model achieves an image classification accuracy of 97%, surpassing traditional methods by 1.8%. Moreover, by employing a federated learning approach that allows data from different clients to be shared while ensuring privacy, we maintain an accuracy rate exceeding 90%. This initiative marks a significant step towards a scalable, collaborative, efficient, and dependable computational framework that aids clinical pathologists in their daily diagnostic tasks.
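The privacy-preserving collaboration described above typically follows the FedAvg recipe: each site trains locally, and only model parameters, never patient images, are averaged centrally. A minimal sketch, assuming plain FedAvg rather than the paper's exact protocol (the hospital names and toy parameters are illustrative):

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """FedAvg aggregation: average client model parameters,
    weighted by each client's local dataset size."""
    total = sum(client_sizes)
    return sum((n / total) * w for w, n in zip(client_weights, client_sizes))

hospital_a = np.array([1.0, 2.0])   # toy parameter vectors from two sites
hospital_b = np.array([3.0, 6.0])
global_model = fed_avg([hospital_a, hospital_b], client_sizes=[100, 300])
# the aggregated model is then redistributed for the next local training round
```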

14.
Math Biosci Eng ; 21(2): 1959-1978, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38454670

ABSTRACT

The timely diagnosis of acute lymphoblastic leukemia (ALL) is of paramount importance for enhancing the treatment efficacy and the survival rates of patients. In this study, we seek to introduce an ensemble-ALL model for the image classification of ALL, with the goal of enhancing early diagnostic capabilities and streamlining the diagnostic and treatment processes for medical practitioners. In this study, a publicly available dataset is partitioned into training, validation, and test sets. A diverse set of convolutional neural networks, including InceptionV3, EfficientNetB4, ResNet50, CONV_POOL-CNN, ALL-CNN, Network in Network, and AlexNet, are employed for training. The top-performing four individual models are meticulously chosen and integrated with the squeeze-and-excitation (SE) module. Furthermore, the two most effective SE-embedded models are harmoniously combined to create the proposed ensemble-ALL model. This model leverages the Bayesian optimization algorithm to enhance its performance. The proposed ensemble-ALL model attains remarkable accuracy, precision, recall, F1-score, and kappa scores, registering at 96.26, 96.26, 96.26, 96.25, and 91.36%, respectively. These results surpass the benchmarks set by state-of-the-art studies in the realm of ALL image classification. This model represents a valuable contribution to the field of medical image recognition, particularly in the diagnosis of acute lymphoblastic leukemia, and it offers the potential to enhance the efficiency and accuracy of medical professionals in the diagnostic and treatment processes.


Subject(s)
Precursor Cell Lymphoblastic Leukemia-Lymphoma , Humans , Bayes Theorem , Precursor Cell Lymphoblastic Leukemia-Lymphoma/diagnostic imaging , Algorithms , Health Personnel , Neural Networks, Computer
15.
J Med Imaging (Bellingham) ; 11(2): 024503, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38525295

ABSTRACT

Purpose: Ischemic myocardial scarring (IMS) is a common outcome of coronary artery disease that potentially leads to lethal arrhythmias and heart failure. Late-gadolinium-enhanced cardiac magnetic resonance (CMR) imaging scans have served as the diagnostic bedrock for IMS, with recent advancements in machine learning enabling enhanced scar classification. However, the trade-off for these improvements is intensive computational and time demands. As a solution, we propose a combination of lightweight preprocessing (LWP) and template matching (TM) to streamline IMS classification. Approach: CMR images from 279 patients (151 IMS, 128 control) were classified for IMS presence using two convolutional neural networks (CNNs) and TM, both with and without LWP. Evaluation metrics included accuracy, sensitivity, specificity, F1-score, area under the receiver operating characteristic curve (AUROC), and processing time. External testing dataset analysis encompassed patient-level classifications (PLCs) and a CNN versus TM classification comparison (CVTCC). Results: LWP enhanced the speed of both CNNs (4.9x) and TM (21.9x). Furthermore, in the absence of LWP, TM outpaced CNNs by over 10x, while with LWP, TM was more than 100x faster. Additionally, TM performed similarly to the CNNs in accuracy, sensitivity, specificity, F1-score, and AUROC, with PLCs demonstrating improvements across all five metrics. Moreover, the CVTCC revealed a substantial 90.9% agreement. Conclusions: Our results highlight the effectiveness of LWP and TM in streamlining IMS classification. Anticipated enhancements to LWP's region of interest (ROI) isolation and TM's ROI targeting are expected to boost accuracy, positioning them as a potential alternative to CNNs for IMS classification, supporting the need for further research.
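Template matching of the kind compared against the CNNs can be sketched with zero-mean normalized cross-correlation, sliding the template across the image and keeping the best-scoring position (a generic implementation; the paper's preprocessing and ROI handling are not reproduced):

```python
import numpy as np

def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two same-size arrays."""
    a = patch - patch.mean()
    b = template - template.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def match_template(image, template):
    """Slide the template over the image; return the best NCC score
    and the top-left position where it occurs."""
    th, tw = template.shape
    best_score, best_pos = -1.0, (0, 0)
    for i in range(image.shape[0] - th + 1):
        for j in range(image.shape[1] - tw + 1):
            score = ncc(image[i:i + th, j:j + tw], template)
            if score > best_score:
                best_score, best_pos = score, (i, j)
    return best_score, best_pos

img = np.zeros((6, 6))
img[2:4, 3:5] = np.array([[1.0, 2.0], [3.0, 4.0]])   # embed a known pattern
score, pos = match_template(img, np.array([[1.0, 2.0], [3.0, 4.0]]))
```

Because the search is a fixed sweep of simple arithmetic with no learned parameters, it is easy to see why TM is dramatically faster than CNN inference once the images are tightly cropped.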

16.
Quant Imaging Med Surg ; 14(3): 2539-2555, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38545066

ABSTRACT

Background: Disease diagnosis in chest X-ray images has predominantly relied on convolutional neural networks (CNNs). However, Vision Transformer (ViT) offers several advantages over CNNs, as it excels at capturing long-range dependencies, exploring correlations, and extracting features with richer semantic information. Methods: We adapted ViT for chest X-ray image analysis by making the following three key improvements: (I) employing a sliding window approach in the image sequence feature extraction module to divide the input image into blocks to identify small and difficult-to-detect lesion areas; (II) introducing an attention region selection module in the encoder layer of the ViT model to enhance the model's ability to focus on relevant regions; and (III) constructing a parallel patient metadata feature extraction network on top of the image feature extraction network to integrate multi-modal input data, enabling the model to synergistically learn and expand image-semantic information. Results: The experimental results showed the effectiveness of our proposed model, which had an average area under the curve value of 0.831 in diagnosing 14 common chest diseases. The metadata feature network module effectively integrated patient metadata, further enhancing the model's accuracy in diagnosis. Our ViT-based model had a sensitivity of 0.863, a specificity of 0.821, and an accuracy of 0.834 in diagnosing these common chest diseases. Conclusions: Our model has good general applicability and shows promise in chest X-ray image analysis, effectively integrating patient metadata and enhancing diagnostic capabilities.

17.
Med Image Anal ; 94: 103107, 2024 May.
Article in English | MEDLINE | ID: mdl-38401269

ABSTRACT

We propose a novel semi-supervised learning method to leverage unlabeled data alongside minimal annotated data and improve medical imaging classification performance in realistic scenarios with limited labeling budgets for data annotation. Our method introduces distance correlation to minimize correlations between feature representations from different views of the same image encoded with non-coupled deep neural network architectures. In addition, it incorporates a data-driven graph-attention based regularization strategy to model affinities among images within the unlabeled data by exploiting their inherent relational information in the feature space. We conduct extensive experiments on four medical imaging benchmark data sets involving X-ray, dermoscopic, magnetic resonance, and computed tomography imaging on single and multi-label medical imaging classification scenarios. Our experiments demonstrate the effectiveness of our method in achieving very competitive performance and outperforming several state-of-the-art semi-supervised learning methods. Furthermore, they confirm the suitability of distance correlation as a versatile dependence measure and the benefits of the proposed graph-attention based regularization for semi-supervised learning in medical imaging analysis.
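The sample distance correlation used as the dependence measure above can be computed directly from the double-centered pairwise distance matrices of the two feature sets (rows = samples). This sketch shows the statistic itself, not the paper's training loop:

```python
import numpy as np

def distance_correlation(X, Y):
    """Sample distance correlation between two feature matrices with
    matching rows; 0 indicates no dependence, 1 a perfect (e.g. linear)
    relationship."""
    def centered_dist(A):
        d = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)
        # double-center: subtract row/column means, add back grand mean
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()
    a, b = centered_dist(X), centered_dist(Y)
    dcov2 = (a * b).mean()
    dvar = np.sqrt((a * a).mean() * (b * b).mean())
    return np.sqrt(max(dcov2, 0.0) / dvar) if dvar > 0 else 0.0
```

Minimizing this quantity between the two views' representations pushes the non-coupled encoders toward decorrelated, complementary features.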


Subject(s)
Benchmarking , Neural Networks, Computer , Humans , Supervised Machine Learning
18.
Neural Netw ; 173: 106183, 2024 May.
Article in English | MEDLINE | ID: mdl-38382397

ABSTRACT

The rising global incidence of human Mpox cases necessitates prompt and accurate identification for effective disease control. While previous studies have predominantly delved into traditional ensemble methods for detection, we introduce a novel approach by leveraging a metaheuristic-based ensemble framework. In this research, we present an innovative CGO-Ensemble framework designed to elevate the accuracy of detecting Mpox infection in patients. Initially, we employ five transfer learning base models that integrate feature integration layers and residual blocks. These components play a crucial role in capturing significant features from the skin images, thereby enhancing the models' efficacy. In the next step, we employ a weighted averaging scheme to consolidate predictions generated by distinct models. To achieve the optimal allocation of weights for each base model in the ensemble process, we leverage the Chaos Game Optimization (CGO) algorithm. This strategic weight assignment enhances classification outcomes considerably, surpassing the performance of randomly assigned weights. Implementing this approach yields notably enhanced prediction accuracy compared to using individual models. We evaluate the effectiveness of our proposed approach through comprehensive experiments conducted on two widely recognized benchmark datasets: the Mpox Skin Lesion Dataset (MSLD) and the Mpox Skin Image Dataset (MSID). To gain insights into the decision-making process of the base models, we have performed Gradient Class Activation Mapping (Grad-CAM) analysis. The experimental results showcase the outstanding performance of the CGO-ensemble, achieving an impressive accuracy of 100% on MSLD and 94.16% on MSID. Our approach significantly outperforms other state-of-the-art optimization algorithms, traditional ensemble methods, and existing techniques in the context of Mpox detection on these datasets.
These findings underscore the effectiveness and superiority of the CGO-Ensemble in accurately identifying Mpox cases, highlighting its potential in disease detection and classification.
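The weighted-averaging step above reduces to optimizing simplex weights against a validation objective. The sketch below substitutes a plain random search for the Chaos Game Optimization algorithm the paper actually uses (CGO is not implemented here); only the shape of the objective is the point:

```python
import numpy as np

def ensemble_predict(probs_list, weights):
    """Weighted average of per-model class-probability arrays."""
    w = np.asarray(weights, float)
    w = w / w.sum()
    return sum(wi * p for wi, p in zip(w, probs_list))

def fit_weights(probs_list, labels, trials=2000, seed=0):
    """Stand-in for CGO: random search over the probability simplex
    for weights maximizing validation accuracy."""
    rng = np.random.default_rng(seed)
    best_w, best_acc = None, -1.0
    for _ in range(trials):
        w = rng.dirichlet(np.ones(len(probs_list)))
        acc = (ensemble_predict(probs_list, w).argmax(1) == labels).mean()
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc
```

Any metaheuristic (CGO included) could be dropped in place of the random search; the gain over uniform or random weights comes from letting stronger base models dominate the average.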


Subject(s)
Mpox (monkeypox) , Humans , Algorithms , Neural Networks, Computer , Benchmarking , Learning
19.
Comput Biol Med ; 168: 107758, 2024 01.
Article in English | MEDLINE | ID: mdl-38042102

ABSTRACT

Convolutional neural networks (CNNs) have promoted the development of diagnostic technology for medical images. However, the performance of a CNN is limited by insufficient feature information and inaccurate attention weights. Previous works have improved the accuracy and speed of CNNs but ignored the uncertainty of the prediction; that is to say, the uncertainty of CNNs has not received enough attention. Therefore, extracting effective features and quantifying the uncertainty of medical deep learning models remain great challenges. To solve these problems, this paper proposes a novel convolutional neural network model named DM-CNN, which mainly contains four proposed sub-modules: a dynamic multi-scale feature fusion module (DMFF), hierarchical dynamic uncertainty-quantification attention (HDUQ-Attention), a multi-scale fusion pooling method (MF Pooling), and a multi-objective loss (MO loss). DMFF selects different convolution kernels according to the feature maps at different levels and extracts feature information at different scales, giving the feature information of each layer stronger representational ability for information fusion. HDUQ-Attention includes a tuning block that adjusts the attention weights according to the different information of each layer, and a Monte-Carlo (MC) dropout structure for quantifying uncertainty. MF Pooling is a pooling method designed for multi-scale models, which can speed up computation and prevent overfitting while retaining the most important information. Because the number of parameters in the backbone of DM-CNN differs from that of the other modules, MO loss is proposed; it has a fast optimization speed and a good classification effect. DM-CNN is evaluated on publicly available datasets in four areas of medicine (Dermatology, Histopathology, Respirology, Ophthalmology), achieving state-of-the-art classification performance on all datasets.
DM-CNN can not only maintain excellent performance but also quantify uncertainty, which is a very important task in the medical field. The code is available at: https://github.com/QIANXIN22/DM-CNN.
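The MC-dropout idea mentioned above (dropout kept active at inference, with the spread of repeated stochastic predictions serving as the uncertainty) can be illustrated on a toy one-layer "network". The layer, weights, and dropout rate here are stand-ins, not DM-CNN's architecture:

```python
import numpy as np

def mc_dropout_predict(x, weights, p_drop=0.5, passes=100, seed=0):
    """Monte-Carlo dropout: run many stochastic forward passes with
    dropout active; return mean prediction and per-class std as an
    uncertainty estimate (toy single-layer softmax model)."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(passes):
        mask = rng.random(weights.shape) >= p_drop       # drop units
        logits = x @ (weights * mask) / (1.0 - p_drop)   # inverted scaling
        e = np.exp(logits - logits.max())                # stable softmax
        preds.append(e / e.sum())
    preds = np.stack(preds)
    return preds.mean(0), preds.std(0)
```

A high standard deviation flags inputs the model is unsure about, which is the behavior a clinician would want surfaced rather than hidden behind a single point prediction.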


Subject(s)
Medicine , Neural Networks, Computer , Uncertainty , Algorithms , Monte Carlo Method
20.
Comput Biol Med ; 168: 107751, 2024 01.
Article in English | MEDLINE | ID: mdl-38016373

ABSTRACT

Computer-aided diagnosis (CAD) assists endoscopists in analyzing endoscopic images, reducing misdiagnosis rates and enabling timely treatment. A few studies have focused on CAD for gastroesophageal reflux disease, but CAD studies on reflux esophagitis (RE) are still inadequate. This paper presents a CAD study on RE using a dataset collected from a hospital, comprising over 3000 images. We propose an uncertainty-aware network with handcrafted features, utilizing representation and classifier decoupling with metric learning to address class imbalance and achieve fine-grained RE classification. To enhance interpretability, the network estimates uncertainty through test-time augmentation. The experimental results demonstrate that the proposed network surpasses previous methods, achieving an accuracy of 90.2% and an F1 score of 90.1%.
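Test-time augmentation as an uncertainty estimate works by averaging predictions over random perturbations of the input and scoring the spread of the result. A minimal sketch, assuming flip/shift augmentations and predictive entropy as the uncertainty score (the paper's actual augmentations and network are not specified here):

```python
import numpy as np

def tta_predict(img, predict_fn, n_aug=8, seed=0):
    """Test-time augmentation: average class probabilities over random
    flips and vertical shifts of the input; the entropy of the averaged
    distribution serves as the uncertainty score."""
    rng = np.random.default_rng(seed)
    probs = []
    for _ in range(n_aug):
        aug = img.copy()
        if rng.random() < 0.5:
            aug = np.fliplr(aug)                      # horizontal flip
        aug = np.roll(aug, rng.integers(-2, 3), axis=0)  # small shift
        probs.append(predict_fn(aug))
    mean = np.mean(probs, axis=0)
    entropy = -np.sum(mean * np.log(mean + 1e-12))
    return mean, entropy
```

High entropy marks images whose prediction is unstable under augmentation, which is exactly the interpretability signal the abstract describes.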


Subject(s)
Esophagitis, Peptic , Humans , Esophagitis, Peptic/diagnostic imaging , Uncertainty , Diagnosis, Computer-Assisted/methods , Learning