Results 1 - 20 of 476
1.
Artif Intell Med ; 157: 102987, 2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39357280

ABSTRACT

Parkinson's disease (PD) is the second most prevalent neurodegenerative disorder, and it remains incurable. Currently, there is no definitive biomarker for detecting PD, measuring its severity, or monitoring treatment. Recently, oculomotor fixation abnormalities have emerged as a sensitive biomarker for discriminating Parkinsonian patterns from a control population, even at early stages. For oculomotor analysis, current experimental setups use invasive and restrictive capture protocols that limit transfer into clinical routine. Alternatively, computational approaches to support PD diagnosis are strictly based on supervised strategies, depending on large labeled datasets and introducing an inherent expert bias. This work proposes a self-supervised architecture based on Riemannian deep representations to learn oculomotor fixation patterns from compact descriptors. First, deep convolutional features are recovered from oculomotor fixation video slices and encoded as compact symmetric positive definite (SPD) matrices that summarize second-order relationships. Each SPD input matrix is projected through a Riemannian encoder to obtain an SPD embedding, and a Riemannian decoder then reconstructs SPD matrices while preserving the geometric manifold structure. The proposed architecture recovers geometric patterns in the embeddings without any diagnostic label supervision and proves discriminative with respect to PD patterns. In a retrospective study involving 13 healthy adults and 13 patients diagnosed with PD, the proposed Riemannian representation achieved an average accuracy of 95.6% and an AUC of 99% in a binary classification task using a support vector machine.
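To make the second-order encoding step concrete, here is a minimal sketch (not the authors' code) of how deep features can be summarized as an SPD covariance descriptor and mapped to a Euclidean tangent space via the log-Euclidean matrix logarithm; the function names and the 64x49 feature shape are illustrative assumptions.

```python
import torch

def spd_descriptor(features: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Summarize CNN features (C x N) as a symmetric positive definite
    covariance matrix, i.e., a second-order descriptor."""
    c, n = features.shape
    centered = features - features.mean(dim=1, keepdim=True)
    cov = centered @ centered.T / (n - 1)
    # Regularize so the matrix is strictly positive definite.
    return cov + eps * torch.eye(c)

def log_euclidean(spd: torch.Tensor) -> torch.Tensor:
    """Matrix logarithm: maps an SPD matrix to its tangent (Euclidean) space,
    a common first step before feeding SPD data to a Riemannian network."""
    eigvals, eigvecs = torch.linalg.eigh(spd)
    return eigvecs @ torch.diag(torch.log(eigvals)) @ eigvecs.T

# Example: 64 channels, 49 spatial positions from one conv feature-map slice.
feats = torch.randn(64, 49)
emb = log_euclidean(spd_descriptor(feats))  # 64 x 64 symmetric matrix
```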

2.
Neural Netw ; 181: 106760, 2024 Sep 25.
Article in English | MEDLINE | ID: mdl-39362184

ABSTRACT

The progression of deep learning and the widespread adoption of sensors have facilitated automatic multi-view fusion (MVF) of cardiovascular system (CVS) signals. However, prevalent MVF model architectures often amalgamate CVS signals from the same temporal step but different views into a unified representation, disregarding the asynchronous nature of cardiovascular events and the inherent heterogeneity across views, leading to catastrophic view confusion. Efficient training strategies specifically tailored for MVF models to attain comprehensive representations also demand consideration. Crucially, real-world data frequently arrive with incomplete views, an aspect rarely addressed by researchers. Thus, the View-Centric Transformer (VCT) and Multitask Masked Autoencoder (M2AE) are specifically designed to emphasize the centrality of each view and harness unlabeled data to achieve superior fused representations. Additionally, we systematically define the missing-view problem for the first time and introduce prompt techniques to help pretrained MVF models adapt flexibly to various missing-view scenarios. Rigorous experiments on atrial fibrillation detection, blood pressure estimation, and sleep staging (typical health monitoring tasks) demonstrate the remarkable advantage of our method in MVF compared to prevailing methodologies. Notably, the prompt technique requires fine-tuning less than 3% of the model's parameters, substantially fortifying the model's resilience to missing views while circumventing the need for complete retraining. The results demonstrate the effectiveness of our approaches, highlighting their potential for practical applications in cardiovascular health monitoring. Code and models are released at URL.

3.
Comput Methods Programs Biomed ; 257: 108452, 2024 Oct 09.
Article in English | MEDLINE | ID: mdl-39393284

ABSTRACT

BACKGROUND AND OBJECTIVE: Electrocardiogram (ECG) is one of the most important diagnostic tools for cardiovascular diseases (CVDs). Recent studies show that deep learning models can be trained on labeled ECGs to detect CVDs automatically, assisting cardiologists in diagnosis. However, deep learning models rely heavily on labels during training, and manual labeling is costly and time-consuming. This paper proposes a new self-supervised learning (SSL) method for multilead ECGs, bootstrap each lead's latent (BELL), to reduce this reliance and boost model performance in various tasks, especially when training data are insufficient. METHOD: BELL is a variant of the well-known bootstrap your own latent (BYOL). It aims to learn prior knowledge from unlabeled ECGs by pretraining, benefitting downstream tasks, and it leverages the characteristics of multilead ECGs. First, BELL uses a multiple-branch skeleton, which is more effective for processing multilead ECGs. Moreover, it introduces intra-lead and inter-lead mean square error (MSE) losses to guide pretraining, and their fusion yields better performance. Additionally, BELL inherits the main advantage of BYOL: no negative pairs are used in pretraining, making it more efficient. RESULTS: In most cases, BELL surpasses previous works in our experiments. More importantly, pretraining improves model performance by 0.69% to 8.89% in downstream tasks when only 10% of the training data are available. Furthermore, BELL shows excellent adaptability to uncurated ECG data from a real-world hospital, with only slight performance degradation (<1% in most cases) on these data. CONCLUSION: The results suggest that BELL can alleviate the reliance on manual ECG labels from cardiologists, a critical bottleneck of current deep learning-based models. In this way, BELL can help deep learning extend its applications in automatic ECG analysis, reducing cardiologists' burden in real-world diagnosis.
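As an illustration of how the two MSE terms might be fused, the sketch below (hypothetical, not the published BELL code) combines an intra-lead term, matching each lead's online and target embeddings, with an inter-lead term over lead-averaged embeddings; `alpha` and the averaging scheme are assumptions.

```python
import torch
import torch.nn.functional as F

def bell_style_loss(online: torch.Tensor, target: torch.Tensor,
                    alpha: float = 0.5) -> torch.Tensor:
    """Hypothetical fusion of intra-lead and inter-lead MSE in a BYOL-style
    setup. online/target: (B, L, D) per-lead embeddings from the online and
    momentum (target) networks, respectively."""
    online = F.normalize(online, dim=-1)
    target = F.normalize(target, dim=-1).detach()  # no gradient to target net
    # Intra-lead: each lead's online view matches the same lead's target view.
    intra = F.mse_loss(online, target)
    # Inter-lead: lead-averaged online and target views should also agree,
    # encouraging a representation shared across leads.
    inter = F.mse_loss(online.mean(dim=1), target.mean(dim=1))
    return alpha * intra + (1 - alpha) * inter
```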

4.
Comput Biol Med ; 183: 109221, 2024 Oct 07.
Article in English | MEDLINE | ID: mdl-39378579

ABSTRACT

Diagnosing dental caries poses a significant challenge in dentistry, necessitating precise and early detection for effective management. This study utilizes Self-Supervised Learning (SSL) tasks to improve the classification of dental caries in Cone Beam Computed Tomography (CBCT) images, employing the International Caries Detection and Assessment System (ICDAS). Faced with scarce annotated medical images, our research employs SSL to exploit unlabeled data, thereby improving model performance. We developed a pipeline that extracts unlabeled data from CBCT exams and then trains models using SSL tasks. A distinctive aspect of our approach is the integration of image processing techniques with SSL tasks, along with an exploration of how much unlabeled data is actually needed. Our research aims to identify the most effective image processing techniques for data extraction, the most efficient deep learning architectures for caries classification, the impact of unlabeled dataset size on model performance, and the comparative effectiveness of different SSL approaches in this domain. Among the tested architectures, ResNet-18 combined with the SimCLR task achieved an average macro F1-score of 88.42%, macro precision of 90.44%, and macro sensitivity of 86.67%, a 5.5% increase in F1-score over models trained with the deep learning architecture alone. These results suggest that SSL can significantly enhance the accuracy and efficiency of caries classification in CBCT images.
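For reference, the SimCLR pretext task named above trains the encoder with the NT-Xent contrastive loss; a compact PyTorch version follows (a generic sketch, not the study's pipeline), where `z1` and `z2` are projections of two augmented views of the same batch.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """NT-Xent loss from SimCLR: two augmented views of the same image form a
    positive pair; every other sample in the batch acts as a negative."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # (2B, D) unit vectors
    sim = z @ z.T / tau                           # scaled cosine similarities
    n = sim.shape[0]
    sim.fill_diagonal_(float("-inf"))             # a view is not its own pair
    targets = torch.arange(n).roll(n // 2)        # view i pairs with view i±B
    return F.cross_entropy(sim, targets)

loss = nt_xent(torch.randn(8, 128), torch.randn(8, 128))
```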

5.
MAGMA ; 2024 Oct 09.
Article in English | MEDLINE | ID: mdl-39382814

ABSTRACT

Deep-learning-based MR image reconstruction in settings where collecting large fully sampled datasets is infeasible requires methods that effectively use both under-sampled and fully sampled data. This paper evaluates a weakly supervised, multi-coil, physics-guided approach to MR image reconstruction that leverages both dataset types to improve the quality and robustness of reconstruction. A physics-guided end-to-end variational network (VarNet) is pretrained in a self-supervised manner on a 4× under-sampled dataset following the self-supervised learning via data undersampling (SSDU) methodology. The pretrained weights are transferred to another VarNet, which is fine-tuned on a smaller, fully sampled dataset by optimizing a multi-scale structural similarity (MS-SSIM) loss in image space. The proposed methodology is compared with fully self-supervised and fully supervised training. Weak supervision is shown to improve SSIM, PSNR, and NRMSE when abundant training data are available (the high-data regime) and to enhance robustness when training data are scarce (the low-data regime), for knee and brain MR image reconstruction at 8× and 10× acceleration, respectively. Multi-coil physics-guided MR image reconstruction using both under-sampled and fully sampled datasets is thus achievable with transfer learning and fine-tuning, providing improved reconstruction quality in the high-data regime and improved robustness in the low-data regime at high acceleration rates.
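A minimal sketch of the described transfer-and-fine-tune step is given below, assuming a `TinyVarNet` stand-in for the actual physics-guided VarNet, an SSDU-stage checkpoint file (hypothetical), a dummy loader standing in for the small fully sampled set, and the third-party `pytorch_msssim` package for the MS-SSIM loss.

```python
import torch
import torch.nn as nn
from pytorch_msssim import ms_ssim  # third-party MS-SSIM implementation

class TinyVarNet(nn.Module):
    """Minimal stand-in for the physics-guided VarNet (not the real model)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(1, 1, 3, padding=1)
    def forward(self, zero_filled):
        return self.net(zero_filled)

model = TinyVarNet()
# model.load_state_dict(torch.load("varnet_ssdu.pt"))  # SSDU-pretrained weights (assumed file)

# Dummy stand-in for the small fully sampled set (pairs of input, target).
fully_sampled_loader = [(torch.rand(1, 1, 192, 192), torch.rand(1, 1, 192, 192))]

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for zero_filled, target in fully_sampled_loader:
    recon = model(zero_filled)
    # Fine-tune by maximizing multi-scale structural similarity in image space.
    loss = 1 - ms_ssim(recon, target, data_range=1.0)
    opt.zero_grad(); loss.backward(); opt.step()
```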

6.
Front Comput Neurosci ; 18: 1404623, 2024.
Article in English | MEDLINE | ID: mdl-39380741

ABSTRACT

Introduction: Following their great success in machine learning, Transformers are gradually attracting widespread interest in remote sensing (RS). Research in RS, however, has been hampered by the lack of large labeled datasets and by the inconsistency of data modes caused by the diversity of RS platforms. With the rise of self-supervised learning (SSL) algorithms in recent years, RS researchers have begun to explore the "pretraining and fine-tuning" paradigm, yet work on multi-modal data fusion in RS remains scarce: most approaches use only one modality or simply concatenate multiple modalities. Method: To study a more efficient multi-modal fusion scheme, we propose a multi-modal fusion mechanism based on gated unit control (MGSViT). We pretrain a ViT model on the BigEarthNet dataset by combining two commonly used SSL algorithms, and we propose intra-modal and inter-modal gated fusion units for feature learning that combine multispectral (MS) and synthetic aperture radar (SAR) data. Our method effectively combines the different modalities to extract key feature information. Results and discussion: In fine-tuning and comparison experiments, we outperform the most advanced algorithms on all downstream classification tasks, verifying the validity of the proposed method.
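The gated fusion idea can be illustrated in a few lines of PyTorch: a sigmoid gate computed from both modalities forms a per-feature convex combination of the multispectral and SAR branches. This is a generic sketch under assumed feature dimensions, not the MGSViT implementation.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Sketch of an inter-modal gated fusion unit: a learned gate decides,
    per feature, how much to trust the multispectral vs. the SAR branch."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, ms: torch.Tensor, sar: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([ms, sar], dim=-1))  # gate values in (0, 1)
        return g * ms + (1 - g) * sar                # convex combination

fused = GatedFusion(768)(torch.randn(4, 768), torch.randn(4, 768))
```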

7.
Heliyon ; 10(19): e37962, 2024 Oct 15.
Article in English | MEDLINE | ID: mdl-39381100

ABSTRACT

Transferring ImageNet pre-trained weights to various remote sensing tasks has produced acceptable results and reduced the need for labeled samples. However, domain differences between ground imagery and remote sensing images limit the performance of such transfer learning. Annotating remote sensing images is notoriously difficult, requiring domain experts and considerable time, whereas unlabeled data are readily available. Recently, self-supervised learning, a subset of unsupervised learning, has emerged and significantly improved representation learning, and recent research has demonstrated that self-supervised methods capture visual features that are more discriminative and transferable than supervised ImageNet weights. Motivated by these facts, we pre-train in-domain representations of remote sensing imagery using contrastive self-supervised learning and transfer the learned features to other related remote sensing datasets. Specifically, we used the SimSiam algorithm to pre-train in-domain knowledge of remote sensing datasets and then transferred the obtained weights to other scene classification datasets, obtaining state-of-the-art results on five land cover classification datasets with varying numbers of classes and spatial resolutions. In addition, through experiments that include feature pre-training on datasets with different attributes, we identified the most influential factors that make a dataset a good choice for obtaining in-domain features. We transferred the features obtained by pre-training SimSiam on remote sensing datasets to various downstream tasks, using them as initial weights for fine-tuning, and we linearly evaluated the obtained representations in cases where the number of samples per class is limited. Our experiments demonstrate that using a higher-resolution dataset during self-supervised pre-training yields more discriminative and general representations.
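For context, SimSiam trains with a symmetrized negative cosine similarity and a stop-gradient on the target projection, which is what lets it work without negative pairs; a standard formulation (not code from this study) is:

```python
import torch
import torch.nn.functional as F

def simsiam_loss(p1: torch.Tensor, p2: torch.Tensor,
                 z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
    """Symmetrized SimSiam loss: predictor outputs p match the *detached*
    projections z of the other view, so no negative pairs are needed."""
    def d(p, z):
        return -F.cosine_similarity(p, z.detach(), dim=-1).mean()
    return 0.5 * d(p1, z2) + 0.5 * d(p2, z1)
```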

8.
Neural Netw ; 181: 106773, 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-39383676

ABSTRACT

The recent advances in deep clustering have been made possible by significant progress in self-supervised and pseudo-supervised learning. However, the trade-off between self-supervision and pseudo-supervision gives rise to three primary issues: joint training causes Feature Randomness and Feature Drift, whereas independent training causes Feature Randomness and Feature Twist. In essence, using pseudo-labels generates random and unreliable features; combining pseudo-supervision with self-supervision drifts the reliable clustering-oriented features; and moving from self-supervision to pseudo-supervision can twist the curved latent manifolds. This paper addresses these limitations of existing deep clustering paradigms. We propose a new paradigm whose strategy replaces pseudo-supervision with a second round of self-supervised training. The new strategy makes the transition between instance-level and neighborhood-level self-supervision smoother and less abrupt, prevents the drifting effect caused by the strong competition between instance-level self-supervision and clustering-level pseudo-supervision, and, with pseudo-supervision absent, removes the risk of generating random features. With this novel approach, our paper introduces a Rethinking of the Deep Clustering Paradigms, denoted R-DC, a model specifically designed to address the three primary challenges of deep clustering: Feature Randomness, Feature Drift, and Feature Twist. Experiments conducted on six datasets show that the two-level self-supervision training yields substantial improvements, as evidenced by the clustering results and the ablation study, and comparisons with nine state-of-the-art clustering models clearly show that our strategy leads to a significant performance enhancement.

9.
Front Robot AI ; 11: 1407519, 2024.
Article in English | MEDLINE | ID: mdl-39403111

ABSTRACT

Predicting the consequences of an agent's actions on its environment is a pivotal challenge in robotic learning and plays a key role in developing higher cognitive skills for intelligent robots. While current methods have predominantly relied on vision and motion data to generate predicted videos, more comprehensive sensory perception is required for complex physical interactions such as contact-rich manipulation or highly dynamic tasks. In this work, we investigate the interdependence between vision and tactile sensation in dynamic robotic interaction. A multi-modal fusion mechanism is introduced into an action-conditioned video prediction model to forecast future scenes, enriching the single-modality prototype with a compressed latent representation of multiple sensory inputs. Additionally, to realize the interactive setting, we built a robotic interaction system equipped with both web cameras and vision-based tactile sensors to collect a dataset of vision-tactile sequences and the corresponding robot action data. Finally, through a series of qualitative and quantitative comparative studies of different prediction architectures and tasks, we present an insightful analysis of the cross-modal influence among vision, touch, and action, revealing the asymmetric impact the modalities have when contributing to the interpretation of environmental information. This opens possibilities for more adaptive and efficient robotic control in complex environments, with implications for dexterous manipulation and human-robot interaction.

10.
Magn Reson Med ; 2024 Oct 09.
Article in English | MEDLINE | ID: mdl-39385438

ABSTRACT

PURPOSE: To introduce a new method for generalized RF pulse design using physics-guided self-supervised learning (GPS), which uses the Bloch equations as the guiding physics model. THEORY AND METHODS: The GPS framework consists of a neural network module and a physics module, where the physics module is a Bloch simulator for MRI applications. For RF pulse design, the neural network module maps an input target profile to an RF pulse, which is subsequently loaded into the physics module. Through the supervision of the physics module, the neural network module designs an RF pulse corresponding to the target profile. GPS was applied to design 1D selective, $B_1$-insensitive, saturation, and multidimensional RF pulses, each conventionally requiring a dedicated design algorithm. We further demonstrate the method's flexibility and versatility by compensating for experimental and scanner imperfections through online adaptation. RESULTS: Both simulations and experiments show that GPS can design a variety of RF pulses whose profiles agree well with the target input. Despite these verifications, GPS-designed pulses differ from conventional designs in unique ways, such as achieving $B_1$-insensitivity through different mechanisms and using non-sampled regions of the conventional design to lower peak power. Experiments, both ex vivo and in vivo, further verify that the method can be used for online adaptation to correct system imperfections, such as $B_0$/$B_1^+$ inhomogeneity. CONCLUSION: This work demonstrates the generalizability, versatility, and flexibility of the GPS method for designing RF pulses and showcases its utility in several applications.
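A schematic of the training loop implied by this description: the network proposes a pulse, a differentiable Bloch simulator turns it into a profile, and the mismatch with the target profile supervises the network. The `bloch_simulate` below is a toy stand-in for the physics module (it uses the small-tip-angle approximation, under which the excitation profile is roughly the Fourier transform of the RF waveform), not the authors' simulator.

```python
import torch
import torch.nn as nn

def bloch_simulate(rf: torch.Tensor) -> torch.Tensor:
    """Toy stand-in for the Bloch solver: small-tip-angle approximation,
    profile ~ |FFT(rf)|. The real physics module solves the Bloch equations."""
    return torch.fft.fft(rf, dim=-1).abs()

def gps_train_step(net: nn.Module, target_profile: torch.Tensor,
                   opt: torch.optim.Optimizer) -> torch.Tensor:
    rf_pulse = net(target_profile)          # network proposes a pulse
    profile = bloch_simulate(rf_pulse)      # physics module maps pulse -> profile
    loss = nn.functional.mse_loss(profile, target_profile)
    opt.zero_grad(); loss.backward(); opt.step()  # physics supervises the net
    return loss
```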

11.
MethodsX ; 13: 102959, 2024 Dec.
Article in English | MEDLINE | ID: mdl-39329154

ABSTRACT

This paper describes a method for robust detection and classification of out-of-distribution rotated images in the medical domain. In real-world medical imaging, noise due to rotation of the body part is frequently observed; this noise reduces the accuracy of AI-based classification and prediction models, so it is important to develop rotation-invariant models. To that end, the proposed method, RISC (rotation invariant self-supervised vision framework), addresses rotational corruption. We present state-of-the-art rotation-invariant classification results and provide explainability for the performance in this domain. The proposed method is evaluated on real-world adversarial examples in medical imagery: OrganAMNIST, RetinaMNIST, and PneumoniaMNIST. RISC outperforms the rotation-affected benchmark methods, with accuracy boosts of 22%, 17%, and 2% on the OrganAMNIST, PneumoniaMNIST, and RetinaMNIST rotated baselines, respectively. Explainability results are also demonstrated. This methods paper describes:
•a representation learning approach that performs robust detection and classification of out-of-distribution rotated images in the medical domain;
•a method that incorporates self-supervised rotation invariance for correcting rotational corruption;
•GradCAM-based explainability for the rotational SSL pretext task and the downstream classification outcomes on the three benchmark datasets.
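The rotational SSL pretext task resembles the classic rotation-prediction setup, sketched below: each image is rotated by a multiple of 90 degrees and the network classifies which rotation was applied. The `encoder` producing 4-way logits is an assumption, and the sketch assumes square images (as in the MedMNIST benchmarks).

```python
import torch
import torch.nn.functional as F

def rotation_pretext_batch(images: torch.Tensor):
    """Rotation-prediction pretext task: rotate each (square) image by
    0/90/180/270 degrees and label it with the rotation index."""
    rotated, labels = [], []
    for k in range(4):  # k quarter-turns
        rotated.append(torch.rot90(images, k, dims=(-2, -1)))
        labels.append(torch.full((images.shape[0],), k, dtype=torch.long))
    return torch.cat(rotated), torch.cat(labels)

# Usage with an assumed `encoder` that outputs 4-way rotation logits:
# x_rot, y = rotation_pretext_batch(x)
# loss = F.cross_entropy(encoder(x_rot), y)
```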

12.
J Imaging Inform Med ; 2024 Sep 04.
Article in English | MEDLINE | ID: mdl-39231886

ABSTRACT

In recent years, X-ray low-dose computed tomography (LDCT) has garnered widespread attention due to its significant reduction in patient radiation exposure. However, LDCT images often contain substantial noise, adversely affecting diagnostic quality. To mitigate this, a plethora of LDCT denoising methods have been proposed. Among them, deep learning (DL) approaches have emerged as the most effective, owing to their robust feature extraction capabilities. Yet the prevalent supervised training paradigm is often impractical because acquiring paired low-dose and normal-dose CT scans in clinical settings is challenging. Consequently, unsupervised and self-supervised deep learning methods have been introduced for LDCT denoising and show considerable potential for clinical application. The efficacy of these methods hinges on their training strategies, yet no comprehensive review of these strategies appears to exist. Our review aims to fill this gap, offering insights and guidance for researchers and practitioners. Based on training strategy, we categorize LDCT methods into six groups: (i) cycle consistency-based, (ii) score matching-based, (iii) noise statistics-based, (iv) similarity-based, (v) LDCT synthesis model-based, and (vi) hybrid methods. For each category, we delve into the theoretical underpinnings, training strategies, strengths, and limitations. We also summarize the open-source code of the reviewed methods. Finally, the review concludes with a discussion of open issues and future research directions.

13.
Sci Rep ; 14(1): 20854, 2024 09 06.
Article in English | MEDLINE | ID: mdl-39242792

ABSTRACT

Progressive gait impairment is common among aging adults. Remote phenotyping of gait during daily living has the potential to quantify gait alterations and evaluate the effects of interventions that may prevent disability in the aging population. Here, we developed ElderNet, a self-supervised learning model for gait detection from wrist-worn accelerometer data. Validation involved two diverse cohorts: over 1000 participants without gait labels and 83 participants with labeled data, comprising older adults with Parkinson's disease, proximal femoral fracture, chronic obstructive pulmonary disease, or congestive heart failure, as well as healthy adults. ElderNet achieved high accuracy (96.43 ± 2.27), specificity (98.87 ± 2.15), recall (82.32 ± 11.37), precision (86.69 ± 17.61), and F1 score (82.92 ± 13.39), outperforming two state-of-the-art gait detection algorithms in accuracy and F1 score (p < 0.05). In an initial evaluation of construct validity, ElderNet identified differences in estimated daily walking duration across cohorts with different clinical characteristics, such as mobility disability (p < 0.001) and parkinsonism (p < 0.001). The proposed self-supervised method has the potential to serve as a valuable tool for remote phenotyping of gait function during daily living in aging adults, even among those with gait impairments.


Subjects
Accelerometry, Gait, Supervised Machine Learning, Humans, Aged, Male, Female, Gait/physiology, Accelerometry/methods, Accelerometry/instrumentation, Aged 80 and over, Activities of Daily Living, Wrist, Algorithms, Wearable Electronic Devices, Middle Aged
14.
Sensors (Basel) ; 24(17)2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39275768

ABSTRACT

The detection of anomalies in dam deformation is paramount for evaluating structural integrity and enabling early warnings, a critical aspect of dam health monitoring (DHM). Conventional data-driven methods for dam anomaly detection depend extensively on historical data, yet obtaining annotated data is expensive and labor-intensive; methodologies that leverage unlabeled or semi-labeled data are therefore gaining popularity. This paper introduces a spatiotemporal contrastive learning pretraining (STCLP) strategy designed to extract discriminative features from unlabeled dam deformation datasets. STCLP combines spatial contrastive learning with temporal contrastive learning to capture representations embodying both spatial and temporal characteristics. Building on this, a novel STCLP-based anomaly detection method for dam deformation is proposed that transfers the pretrained parameters to the targeted downstream classification task and leverages prior knowledge for enhanced fine-tuning. An arch dam serves as the validation case study. The results reveal that the proposed method demonstrates excellent performance, surpassing other benchmark models.

15.
New Microbes New Infect ; 62: 101457, 2024 Dec.
Article in English | MEDLINE | ID: mdl-39253407

ABSTRACT

Background: Large vision models (LVMs) pretrained on large datasets have demonstrated an enormous capacity to understand visual patterns and capture semantic information from images. We propose a novel knowledge domain adaptation method with a pretrained LVM to build a low-cost artificial intelligence (AI) model that quantifies the severity of SARS-CoV-2 pneumonia from frontal chest X-ray (CXR) images. Methods: Our method uses a pretrained LVM as the primary feature extractor and self-supervised contrastive learning for domain adaptation. An encoder with a 2048-dimensional feature vector output was first trained by self-supervised learning for knowledge domain adaptation, and a multi-layer perceptron (MLP) was then trained for the final severity prediction. A dataset of 2599 CXR images was used for model training and evaluation. Results: The model based on the pretrained vision transformer (ViT) and self-supervised learning achieved the best performance in cross-validation, with a mean squared error (MSE) of 23.83 (95% CI 22.67-25.00) and mean absolute error (MAE) of 3.64 (95% CI 3.54-3.73). Its predictions correlate with the reference scores with an R² of 0.81 (95% CI 0.79-0.82) and Spearman ρ of 0.80 (95% CI 0.77-0.81), comparable to current state-of-the-art (SOTA) methods trained on much larger CXR datasets. Conclusion: The proposed method achieves SOTA performance in quantifying the severity of SARS-CoV-2 pneumonia at a significantly lower cost. It can be extended to the detection or quantification of other infectious diseases, expediting the application of AI in medical research.
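In outline, the frozen-encoder-plus-MLP design reads as follows; the encoder here is a runnable toy stand-in for the contrastively adapted LVM backbone that, per the abstract, emits 2048-d features.

```python
import torch
import torch.nn as nn

# Toy stand-in for the adapted LVM encoder (the real one is a pretrained ViT
# adapted by self-supervised contrastive learning); LazyLinear keeps the
# example runnable on any input size.
encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(2048))

head = nn.Sequential(                 # MLP regression head for severity score
    nn.Linear(2048, 256), nn.ReLU(),
    nn.Linear(256, 1),
)

def predict_severity(cxr: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():             # encoder stays frozen after adaptation
        feats = encoder(cxr)          # (B, 2048) feature vectors
    return head(feats).squeeze(-1)    # one scalar severity per image

scores = predict_severity(torch.randn(2, 1, 224, 224))
```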

16.
Water Res ; 266: 122405, 2024 Sep 11.
Article in English | MEDLINE | ID: mdl-39265217

ABSTRACT

Researchers and practitioners have extensively utilized supervised deep learning methods to quantify floating litter in rivers and canals. These methods require large amounts of labeled training data, and the labeling work is expensive and laborious, so the open datasets available in the field are small compared to comprehensive computer vision datasets such as ImageNet. Fine-tuning models pre-trained on these larger datasets improves litter detection performance and reduces data requirements. Yet the usefulness of features learned from generic datasets is limited in large-scale monitoring, where automated detection must adapt across locations, environmental conditions, and sensor settings. To address this issue, we propose a two-stage semi-supervised learning method to detect floating litter based on Swapping Assignments between multiple Views of the same image (SwAV), a self-supervised learning approach that learns the underlying feature representation from unlabeled data. In the first stage, we used SwAV to pre-train a ResNet50 backbone on about 100k unlabeled images. In the second stage, we added new layers to the pre-trained ResNet50 to create a Faster R-CNN architecture and fine-tuned it with a limited number of labeled images (≈1.8k images with 2.6k annotated litter items). We developed and validated our semi-supervised floating litter detection methodology on images collected in canals and waterways of Delft (the Netherlands) and Jakarta (Indonesia), and we tested out-of-domain generalization in a zero-shot fashion using additional data from Ho Chi Minh City (Vietnam), Amsterdam, and Groningen (the Netherlands). We benchmarked our results against the same Faster R-CNN architecture trained via supervised learning alone by fine-tuning ImageNet pre-trained weights. The findings indicate that the semi-supervised method matches or surpasses the supervised benchmark on new images from the training locations, performs better when little data (≈200 images with about 300 annotated litter items) is available for fine-tuning, and reduces false positive predictions. More importantly, the proposed approach shows clear superiority in generalization to unseen locations, with improvements in average precision of up to 12.7%. We attribute this superior performance to more effective high-level feature extraction by SwAV pre-training on relevant unlabeled images. Our findings highlight a promising direction for leveraging semi-supervised learning to develop foundation models, which have revolutionized artificial intelligence applications in most fields. By scaling the proposed approach with more data and compute, significant strides can be made in monitoring the global challenge of litter pollution in water bodies.
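In torchvision terms, the second stage might look like the sketch below: load SwAV-pretrained ResNet50 weights into an FPN backbone and wrap it in Faster R-CNN with two classes (background and litter). This is a sketch under assumptions, not the authors' code: the checkpoint name is hypothetical, SwAV state dicts often carry prefixes and lack the fc layer (hence `strict=False`), and the backbone-builder signature varies across torchvision versions.

```python
import torch
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# ResNet50 + FPN backbone; weights=None because the SwAV checkpoint supplies them.
backbone = resnet_fpn_backbone(backbone_name="resnet50", weights=None)

# Hypothetical checkpoint from the ~100k-image SwAV pre-training stage.
# state = torch.load("swav_resnet50.pt")
# backbone.body.load_state_dict(state, strict=False)  # transfer SwAV weights

# Two classes (background + floating litter); fine-tune on ~1.8k labeled images.
model = FasterRCNN(backbone, num_classes=2)
```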

17.
J Imaging Inform Med ; 2024 Sep 19.
Article in English | MEDLINE | ID: mdl-39299958

ABSTRACT

Diabetic retinopathy (DR) is a retinal disease caused by diabetes that, without intervention, can lead to blindness; its detection is therefore of great significance for preventing blindness in patients. Most existing DR detection methods are supervised and usually require a large number of accurate pixel-level annotations. To solve this problem, we propose a self-supervised Equivariant Refinement Classification Network (ERCN) for DR classification. First, we use an unsupervised contrastive pre-training network to learn a more generalized representation. Second, the class activation map (CAM) is refined by self-supervised learning: a spatial masking method first suppresses low-confidence predictions, and feature similarity between pixels then encourages fine-grained activation for more accurate lesion localization. We propose a hybrid equivariant regularization loss to alleviate the degradation caused by local minima in the CAM refinement process. To further improve classification accuracy, we propose an attention-based multiple-instance learning (MIL) scheme that weights each element of the feature map as an instance, which is more effective than traditional patch-based instance extraction. Evaluated on the EyePACS and DAVIS datasets, our method achieved 87.4% and 88.7% test accuracy, respectively, outperforming other state-of-the-art self-supervised DR detection methods.

18.
J Neurosci Methods ; 411: 110269, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39222796

ABSTRACT

BACKGROUND: Image reconstruction is a critical task in brain decoding research, primarily utilizing functional magnetic resonance imaging (fMRI) data. However, challenges such as the limited number of samples in fMRI data often leave reconstruction quality poor. NEW METHOD: We propose a three-stage multi-level deep fusion model (TS-ML-DFM). The model employs a three-stage training process encompassing image encoders, generators, discriminators, and fMRI encoders, and it incorporates distinct supplementary features derived separately from depth images and original images. It further integrates a random shift module, a dual attention module, and a multi-level feature fusion module. RESULTS: In both qualitative and quantitative comparisons on the Horikawa17 and VanGerven10 datasets, our method exhibited excellent performance. COMPARISON WITH EXISTING METHODS: On the primary Horikawa17 dataset, our method was compared with other leading methods on average hash value, histogram similarity, mutual information, structural similarity accuracy, AlexNet(2), AlexNet(5), and pairwise human perceptual similarity accuracy, improving on the second-ranked result for each metric by 0.99%, 3.62%, 3.73%, 2.45%, 3.51%, 0.62%, and 1.03%, respectively. On the SwAV top-level semantic metric, a substantial improvement of 10.53% over the second-ranked pixel-level reconstruction method was achieved. CONCLUSIONS: The TS-ML-DFM method proposed in this study outperforms previous algorithms for decoding brain visual patterns from fMRI data, facilitating further advancements in this field.


Subjects
Brain, Computer-Assisted Image Processing, Magnetic Resonance Imaging, Humans, Magnetic Resonance Imaging/methods, Computer-Assisted Image Processing/methods, Brain/diagnostic imaging, Brain/physiology, Brain Mapping/methods, Deep Learning
19.
Phys Med Biol ; 2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39312947

ABSTRACT

Bone scans play an important role in skeletal lesion assessment, but gamma cameras suffer from low sensitivity and high noise levels. Deep learning (DL) has emerged as a promising way to enhance image quality without increasing radiation exposure or scan time. However, existing self-supervised denoising methods, such as Noise2Noise (N2N), may introduce deviations from the clinical-standard bone scan. This study proposes an improved self-supervised denoising technique that minimizes discrepancies between DL-denoised and full scan images. A retrospective analysis of 351 whole-body bone scan datasets was conducted. We used the N2N and Noise2FullCount (N2F) denoising models, along with an interpolated version of N2N (iN2N). Denoising networks were trained separately for each reduced scan time from 5% to 50%, and also on mixed training datasets that include all shortened scans. We performed quantitative analysis and clinical evaluation by nuclear medicine experts. The denoising networks effectively generated images resembling full scans: N2F revealed distinctive patterns for different scan times, N2N produced smooth textures with slight blurring, and iN2N closely mirrored full scan patterns. Quantitative analysis showed that denoising improved with longer input times and that mixed-count training outperformed fixed-count training; traditional denoising methods lagged behind DL-based denoising, and N2N showed limitations on long-scan images. Clinical evaluation favored N2N and iN2N in resolution, noise, blurriness, and findings, showcasing their potential for enhanced diagnostic performance in quarter-time scans. The improved self-supervised denoising technique presented in this study offers a viable way to enhance bone scan image quality while minimizing deviations from clinical standards. Its effectiveness was demonstrated quantitatively and clinically, showing promise for quarter-time scans without compromising diagnostic performance, and it holds potential for improving bone scan interpretation and supporting more accurate clinical diagnoses.
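For orientation, the N2N baseline referenced here trains a denoiser using only pairs of independent noisy realizations of the same scan (for example, count-split halves), with no clean target; a minimal training step under that assumption:

```python
import torch
import torch.nn as nn

def n2n_step(denoiser: nn.Module, noisy_a: torch.Tensor,
             noisy_b: torch.Tensor, opt: torch.optim.Optimizer) -> torch.Tensor:
    """One Noise2Noise step: two independent noisy realizations of the same
    bone scan supervise each other; the noisy target is unbiased in the mean."""
    pred = denoiser(noisy_a)
    loss = nn.functional.mse_loss(pred, noisy_b)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss
```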

20.
Front Neuroinform ; 18: 1277050, 2024.
Article in English | MEDLINE | ID: mdl-39315001

ABSTRACT

We present a novel neural network-based method for analyzing intra-voxel structures, addressing critical challenges in diffusion-weighted MRI analysis for brain connectivity and development studies. The architecture, called the Local Neighborhood Neural Network, is designed to use the spatial correlations of neighboring voxels for enhanced inference while reducing parameter overhead, exploiting these relationships to improve the analysis of complex structures in noisy data environments. We adopt a self-supervised approach to address the lack of ground truth data, generating voxel-neighborhood signals to build the training set; this eliminates the need for manual annotation and enables training under realistic conditions. Comparative analyses show that our method outperforms constrained spherical deconvolution (CSD) in quantitative and qualitative validations. On phantom images that mimic in vivo data, our approach improves angular error, volume fraction estimation accuracy, and success rate, and a qualitative comparison on real data shows better spatial consistency of the proposed method in areas of real brain images. The approach demonstrates enhanced intra-voxel structure analysis capabilities and holds promise for broader application in various imaging scenarios.
