Results 1 - 20 of 296
1.
Quant Imaging Med Surg ; 14(8): 5443-5459, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39144045

ABSTRACT

Background: The automated classification of histological images is crucial for the diagnosis of cancer. The limited availability of well-annotated datasets, especially for rare cancers, poses a significant challenge for deep learning methods due to the small number of relevant images. This has led to the development of few-shot learning approaches, which bear considerable clinical importance, as they are designed to overcome the challenges of data scarcity in deep learning for histological image classification. Traditional methods often ignore the challenges of intraclass diversity and interclass similarities in histological images. To address this, we propose a novel mutual reconstruction network model, aimed at meeting these challenges and improving the few-shot classification performance of histological images. Methods: The key to our approach is the extraction of subtle and discriminative features. We introduce a feature enhancement module (FEM) and a mutual reconstruction module to increase differences between classes while reducing variance within classes. First, we extract features of support and query images using a feature extractor. These features are then processed by the FEM, which uses a self-attention mechanism for self-reconstruction of features, enhancing the learning of detailed features. These enhanced features are then input into the mutual reconstruction module. This module uses enhanced support features to reconstruct enhanced query features and vice versa. The classification of query samples is based on weighted calculations of the distances between query features and reconstructed query features and between support features and reconstructed support features. Results: We extensively evaluated our model using a specially created few-shot histological image dataset. The results showed that in a 5-way 10-shot setup, our model achieved an impressive accuracy of 92.09%. 
This is a 23.59% improvement in accuracy compared to the model-agnostic meta-learning (MAML) method, which does not focus on fine-grained attributes. In the more challenging 5-way 1-shot setting, our model also performed well, demonstrating an 18.52% improvement over ProtoNet, which does not address this challenge. Additional ablation studies indicated the effectiveness and complementary nature of each module and confirmed our method's ability to parse small differences between classes and large variations within classes in histological images. These findings strongly support the superiority of our proposed method in the few-shot classification of histological images. Conclusions: The mutual reconstruction network provides outstanding performance in the few-shot classification of histological images, successfully overcoming the challenges of similarities between classes and diversity within classes. This marks a significant advancement in the automated classification of histological images.
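
Illustrative note on entry 1: the abstract says query samples are classified from a weighted combination of two reconstruction distances. The snippet below is only a minimal sketch of that final scoring step. The function name, the `weight` parameter and the distance values are assumptions made for illustration; the feature extractor, the FEM and the reconstruction itself are not modelled.

```python
# Hypothetical sketch: combine two per-class reconstruction distances and
# pick the class with the smallest weighted total. Names and the weight
# value are illustrative assumptions, not taken from the paper.

def classify_by_reconstruction(query_dists, support_dists, weight=0.5):
    """Return (best_class_index, scores).

    query_dists[c]   : distance between the query features and the query
                       features reconstructed from class c's support set
    support_dists[c] : distance between class c's support features and the
                       support features reconstructed from the query
    weight           : share given to the query-side distance (assumed)
    """
    if len(query_dists) != len(support_dists):
        raise ValueError("need one distance pair per class")
    scores = [
        weight * dq + (1.0 - weight) * ds
        for dq, ds in zip(query_dists, support_dists)
    ]
    # The smallest combined distance wins; ties go to the lowest index.
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores


# 5-way example with made-up distances
best_class, scores = classify_by_reconstruction(
    [0.9, 0.4, 0.7, 0.8, 0.6],
    [0.8, 0.3, 0.9, 0.7, 0.5],
    weight=0.5,
)
```

With these made-up numbers the second class (index 1) has the smallest combined distance, so it is returned as the prediction.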

2.
Sci Rep ; 14(1): 18319, 2024 08 07.
Article in English | MEDLINE | ID: mdl-39112791

ABSTRACT

Accurately assigning standardized diagnosis and procedure codes from clinical text is crucial for healthcare applications. However, this remains challenging due to the complexity of medical language. This paper proposes a novel model that incorporates extreme multi-label classification tasks to enhance International Classification of Diseases (ICD) coding. The model utilizes deformable convolutional neural networks to fuse representations from hidden layer outputs of pre-trained language models and external medical knowledge embeddings fused using a multimodal approach to provide rich semantic encodings for each code. A probabilistic label tree is constructed based on the hierarchical structure existing in ICD labels to incorporate ontological relationships between ICD codes and enable structured output prediction. Experiments on medical code prediction on the MIMIC-III database demonstrate competitive performance, highlighting the benefits of this technique for robust clinical code assignment.


Subjects
International Classification of Diseases , Neural Networks, Computer , Semantics , Humans , Natural Language Processing , Algorithms , Databases, Factual
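
Illustrative note on entry 2: a probabilistic label tree scores a leaf label by multiplying conditional probabilities along the path from the root. The sketch below shows that idea on a toy hierarchy. The codes, the prefix-based tree layout and the probabilities are invented for the example and are not taken from the paper or from MIMIC-III.

```python
# Illustrative probabilistic label tree (PLT). The ICD-like codes, the
# prefix-based hierarchy, and the probabilities are invented for the example.

def code_path(code):
    """Split a code such as 'I10.1' into its chain of ancestors.

    Assumed hierarchy: chapter letter -> 3-character category -> full code.
    """
    path = [code[0], code[:3]]
    if len(code) > 3:
        path.append(code)
    return path


def leaf_probability(code, cond_prob):
    """Multiply the conditional probabilities along the root-to-leaf path.

    cond_prob[node] is P(node is relevant | its parent is relevant).
    """
    p = 1.0
    for node in code_path(code):
        p *= cond_prob[node]
    return p


cond_prob = {"I": 0.9, "I10": 0.5, "I10.1": 0.4, "E": 0.2, "E11": 0.5}
p_i101 = leaf_probability("I10.1", cond_prob)  # 0.9 * 0.5 * 0.4
p_e11 = leaf_probability("E11", cond_prob)     # 0.2 * 0.5
```

In a real system each conditional probability would come from a trained node classifier; here they are fixed numbers so the path product is easy to follow.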
3.
Neural Netw ; 179: 106536, 2024 Jul 14.
Article in English | MEDLINE | ID: mdl-39089156

ABSTRACT

Cross-domain few-shot Learning (CDFSL) is proposed to first pre-train deep models on a source domain dataset where sufficient data is available, and then generalize models to target domains to learn from only limited data. However, the gap between the source and target domains greatly hampers the generalization and target-domain few-shot finetuning. To address this problem, we analyze the domain gap from the aspect of frequency-domain analysis. We find the domain gap could be reflected by the compositions of source-domain spectra, and the lack of compositions in the source datasets limits the generalization. Therefore, we aim to expand the coverage of spectra composition in the source datasets to help the source domain cover a larger range of possible target-domain information, to mitigate the domain gap. To achieve this goal, we propose the Spectral Decomposition and Transformation (SDT) method, which first randomly decomposes the spectrogram of the source datasets into orthogonal bases, and then randomly samples different coordinates in the space formed by these bases. We integrate the above process into a data augmentation module, and further design a two-stream network to handle augmented images and original images respectively. Experimental results show that our method achieves state-of-the-art performance in the CDFSL benchmark dataset.

4.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39133096

ABSTRACT

Molecular property prediction (MPP) plays a crucial role in the drug discovery process, providing valuable insights for molecule evaluation and screening. Although deep learning has achieved numerous advances in this area, its success often depends on the availability of substantial labeled data. Few-shot MPP is a more challenging scenario, which aims to identify unseen properties with only a few available molecules. In this paper, we propose an attribute-guided prototype network (APN) to address the challenge. APN first introduces a molecular attribute extractor, which can not only extract three different types of fingerprint attributes (single fingerprint attributes, dual fingerprint attributes, triplet fingerprint attributes) by considering seven circular-based, five path-based, and two substructure-based fingerprints, but also automatically extract deep attributes from self-supervised learning methods. Furthermore, APN designs the Attribute-Guided Dual-channel Attention module to learn the relationship between the molecular graphs and attributes and refine the local and global representation of the molecules. Compared with existing works, APN leverages high-level human-defined attributes and helps the model to explicitly generalize knowledge in molecular graphs. Experiments on benchmark datasets show that APN can achieve state-of-the-art performance in most cases and demonstrate that the attributes are effective for improving few-shot MPP performance. In addition, the strong generalization ability of APN is verified by conducting experiments on data from different domains.


Subjects
Deep Learning , Drug Discovery , Drug Discovery/methods , Humans , Algorithms , Neural Networks, Computer
5.
ACS Nano ; 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39137093

ABSTRACT

Ternary content-addressable memory (TCAM) is promising for data-intensive artificial intelligence applications due to its large-scale parallel in-memory computing capabilities. However, it is still challenging to build a reliable TCAM cell from a single circuit component. Here, we demonstrate a single transistor TCAM based on a floating-gate two-dimensional (2D) ambipolar MoTe2 field-effect transistor with graphene contacts. Our bottom graphene contacts scheme enables gate modulation of the contact Schottky barrier heights, facilitating carrier injection for both electrons and holes. The 2D nature of our channel and contact materials provides device scaling potentials beyond silicon. By integration with a floating-gate stack, a highly reliable nonvolatile memory is achieved. Our TCAM cell exhibits a resistance ratio larger than 1000 and symmetrical complementary states, allowing the implementation of large-scale TCAM arrays. Finally, we show through circuit simulations that in-memory Hamming distance computation is readily achievable based on our TCAM with array sizes up to 128 cells.
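
Illustrative note on entry 5: the abstract mentions in-memory Hamming distance computation with ternary cells. As a software analogy only, the sketch below counts mismatches between a query word and stored ternary words, where `X` is a "don't care" cell. The function names and the stored words are assumptions; nothing here models the device physics or the circuit simulations.

```python
# Software analogy of a ternary CAM lookup. 'X' marks a "don't care" cell.
# This models the matching logic only, not the device or circuit in the paper.

def ternary_hamming(stored, query):
    """Count positions where a stored 0/1 cell disagrees with the query bit."""
    if len(stored) != len(query):
        raise ValueError("word lengths differ")
    return sum(1 for s, q in zip(stored, query) if s != "X" and s != q)


def nearest_rows(table, query):
    """Return (minimum distance, indices of all rows at that distance)."""
    dists = [ternary_hamming(row, query) for row in table]
    best = min(dists)
    return best, [i for i, d in enumerate(dists) if d == best]


table = ["10X1", "0000", "1X10"]
best, rows = nearest_rows(table, "1011")
```

In hardware every row would be compared in parallel; the loop here is sequential and only shows what is being counted.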

6.
Sensors (Basel) ; 24(15)2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39124022

ABSTRACT

Nowadays, autonomous driving technology has become widely prevalent. The intelligent vehicles have been equipped with various sensors (e.g., vision sensors, LiDAR, depth cameras etc.). Among them, the vision systems with tailored semantic segmentation and perception algorithms play critical roles in scene understanding. However, the traditional supervised semantic segmentation needs a large number of pixel-level manual annotations to complete model training. Although few-shot methods reduce the annotation work to some extent, they are still labor intensive. In this paper, a self-supervised few-shot semantic segmentation method based on Multi-task Learning and Dense Attention Computation (dubbed MLDAC) is proposed. The salient part of an image is split into two parts; one of them serves as the support mask for few-shot segmentation, while cross-entropy losses are calculated between the other part and the entire region with the predicted results separately as multi-task learning so as to improve the model's generalization ability. Swin Transformer is used as our backbone to extract feature maps at different scales. These feature maps are then input to multiple levels of dense attention computation blocks to enhance pixel-level correspondence. The final prediction results are obtained through inter-scale mixing and feature skip connection. The experimental results indicate that MLDAC obtains 55.1% and 26.8% one-shot mIoU self-supervised few-shot segmentation on the PASCAL-5i and COCO-20i datasets, respectively. In addition, it achieves 78.1% on the FSS-1000 few-shot dataset, proving its efficacy.
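
Illustrative note on entry 6: the reported metric is mean intersection over union (mIoU). The sketch below computes a plain per-class IoU average over flat label lists. The helper name and the toy masks are assumptions, and benchmark protocols such as PASCAL-5i average over folds and episodes in ways not reproduced here.

```python
# Minimal mIoU sketch on flat lists of per-pixel class ids (toy data).

def mean_iou(pred, target, num_classes):
    """Average IoU over the classes that occur in pred or target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


pred = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
miou = mean_iou(pred, target, num_classes=2)
```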

8.
Sci Rep ; 14(1): 17900, 2024 08 02.
Article in English | MEDLINE | ID: mdl-39095389

ABSTRACT

Plant diseases pose significant threats to agriculture, impacting both food safety and public health. Traditional plant disease detection systems are typically limited to recognizing disease categories included in the training dataset, rendering them ineffective against new disease types. Although out-of-distribution (OOD) detection methods have been proposed to address this issue, the impact of fine-tuning paradigms on these methods has been overlooked. This paper focuses on studying the impact of fine-tuning paradigms on the performance of detecting unknown plant diseases. Currently, fine-tuning on visual tasks is mainly divided into visual-based models and visual-language-based models. We first discuss the limitations of large-scale visual language models in this task: textual prompts are difficult to design. To avoid the side effects of textual prompts, we further explore the effectiveness of purely visual pre-trained models for OOD detection in plant disease tasks. Specifically, we employed five publicly accessible datasets to establish benchmarks for open-set recognition, OOD detection, and few-shot learning in plant disease recognition. Additionally, we comprehensively compared various OOD detection methods, fine-tuning paradigms, and factors affecting OOD detection performance, such as sample quantity. The results show that visual prompt tuning outperforms full fine-tuning and linear probe tuning in out-of-distribution detection performance, especially in the few-shot scenarios. Notably, the max-logit-based method with visual prompt tuning achieves an AUROC score of 94.8% in the 8-shot setting, which is nearly comparable to full fine-tuning on the full dataset (95.2%), which implies that an appropriate fine-tuning paradigm can directly improve OOD detection performance. Finally, we visualized the prediction distributions of different OOD detection methods and discussed the selection of thresholds. 
Overall, this work lays the foundation for unknown plant disease recognition, providing strong support for the security and reliability of plant disease recognition systems. We will release our code at https://github.com/JiuqingDong/PDOOD to further advance this field.


Subjects
Plant Diseases , Algorithms
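
Illustrative note on entry 8: the abstract refers to max-logit OOD detection, threshold selection and AUROC. The sketch below shows the usual form of those three pieces on toy numbers. The logits, the threshold and the helper names are assumptions for illustration and do not come from the released PDOOD code.

```python
# Max-logit OOD scoring with a fixed threshold, plus a pairwise AUROC.
# All numbers are toy values chosen for the example.

def max_logit_score(logits):
    """Higher score means the input looks more like a known class."""
    return max(logits)


def is_ood(logits, threshold):
    """Flag the input as out-of-distribution when its best logit is low."""
    return max_logit_score(logits) < threshold


def auroc(in_scores, out_scores):
    """Probability that a known sample outscores an unknown one (ties = 0.5)."""
    wins = 0.0
    for a in in_scores:
        for b in out_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(in_scores) * len(out_scores))


known = [4.2, 0.3, -1.0]    # confident prediction for one class
unknown = [0.6, 0.4, 0.5]   # no class stands out
flags = [is_ood(known, 2.0), is_ood(unknown, 2.0)]
area = auroc([4.2, 3.0, 1.0], [0.6, 2.0])
```

AUROC summarises performance over all thresholds, which is why it can be reported without committing to a single cut-off.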
9.
Microsc Microanal ; 30(4): 741-750, 2024 Aug 21.
Article in English | MEDLINE | ID: mdl-39083424

ABSTRACT

Automated particle analysis (APA) provides a vast amount of compositional data via energy-dispersive X-ray spectroscopy along with size and shape data via scanning electron microscopy for individual particles in a sample. In many instances, APA data are leveraged to support identification of the source of a sample based on the detection of particles of a specific composition. Often, the particles that provide context make up a minuscule portion of the sample. Additionally, the interpretation of complex samples can be difficult due to the diversity of compositions both in the mixture and within a particle. In this work, we demonstrate a method to compute and cluster similarity graphs that describe inter-particle relationships within a sample using a multi-modal few-shot learning neural network. As a proof-of-concept, we show that samples known to have been exposed to gunshot residue can be distinguished from samples occasionally mistaken for gunshot residue. Our workflow builds upon standard APA techniques and data processing methods to unveil additional information in a readily interpretable and quantitatively comparable format.

11.
Neuropathol Appl Neurobiol ; 50(4): e12997, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39010256

ABSTRACT

AIMS: Recent advances in artificial intelligence, particularly with large language models like GPT-4Vision (GPT-4V), a derivative feature of ChatGPT, have expanded the potential for medical image interpretation. This study evaluates the accuracy of GPT-4V in image classification tasks of histopathological images and compares its performance with a traditional convolutional neural network (CNN). METHODS: We utilised 1520 images, including haematoxylin and eosin staining and tau immunohistochemistry, from patients with various neurodegenerative diseases, such as Alzheimer's disease (AD), progressive supranuclear palsy (PSP) and corticobasal degeneration (CBD). We assessed GPT-4V's performance using multi-step prompts to determine how textual context influences image interpretation. We also employed few-shot learning to improve GPT-4V's diagnostic performance in classifying three specific tau lesions (astrocytic plaques, neuritic plaques and tufted astrocytes) and compared the outcomes with the CNN model YOLOv8. RESULTS: GPT-4V accurately recognised staining techniques and tissue origin but struggled with specific lesion identification. The interpretation of images was notably influenced by the provided textual context, which sometimes led to diagnostic inaccuracies. For instance, when presented with images of the motor cortex, the diagnosis shifted inappropriately from AD to CBD or PSP. However, few-shot learning markedly improved GPT-4V's diagnostic capabilities, enhancing accuracy from 40% in zero-shot learning to 90% with 20-shot learning, matching the performance of YOLOv8, which required 100-shot learning to achieve the same accuracy. CONCLUSIONS: Although GPT-4V faces challenges in independently interpreting histopathological images, few-shot learning significantly improves its performance. This approach is especially promising for neuropathology, where acquiring extensive labelled datasets is often challenging.


Subjects
Neural Networks, Computer , Neurodegenerative Diseases , Humans , Neurodegenerative Diseases/pathology , Image Interpretation, Computer-Assisted/methods , Alzheimer Disease/pathology
12.
PeerJ Comput Sci ; 10: e2080, 2024.
Article in English | MEDLINE | ID: mdl-38983194

ABSTRACT

Poultry farming is an indispensable part of global agriculture, playing a crucial role in food safety and economic development. Managing and preventing diseases is a vital task in the poultry industry, where semantic segmentation technology can significantly enhance the efficiency of traditional manual monitoring methods. Furthermore, traditional semantic segmentation has achieved excellent results on extensively manually annotated datasets, facilitating real-time monitoring of poultry. Nonetheless, the model encounters limitations when exposed to new environments, diverse breeding varieties, or varying growth stages within the same species, necessitating extensive data retraining. Overreliance on large datasets results in higher costs for manual annotations and deployment delays, thus hindering practical applicability. To address this issue, our study introduces HSDNet, an innovative semantic segmentation model based on few-shot learning, for monitoring poultry farms. The HSDNet model adeptly adjusts to new settings or species with a single image input while maintaining substantial accuracy. In the specific context of poultry breeding, characterized by small congregating animals and the inherent complexities of agricultural environments, issues of non-smooth losses arise, potentially compromising accuracy. HSDNet incorporates a Sharpness-Aware Minimization (SAM) strategy to counteract these challenges. Furthermore, by considering the effects of imbalanced loss on convergence, HSDNet mitigates the overfitting issue induced by few-shot learning. Empirical findings underscore HSDNet's proficiency in poultry breeding settings, exhibiting a significant 72.89% semantic segmentation accuracy on single images, which is higher than SOTA's 68.85%.
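
Illustrative note on entry 12: Sharpness-Aware Minimization (SAM) takes each update from the gradient at a nearby perturbed point rather than at the current weights. The sketch below shows one SAM step on a toy quadratic loss. The loss, the learning rate and the radius `rho` are assumptions for the example; how HSDNet integrates SAM is not described in the abstract and is not reproduced here.

```python
import math

# One Sharpness-Aware Minimization (SAM) step on a toy problem.
# Toy loss: f(w) = 0.5 * (w0^2 + w1^2), whose gradient is simply w.

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """Return the weights after one SAM update."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)  # already at a stationary point
    # 1) climb to the approximate worst point within radius rho
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # 2) take the gradient there and apply it to the ORIGINAL weights
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def toy_grad(w):
    """Gradient of the toy loss f(w) = 0.5 * ||w||^2."""
    return list(w)


w0 = [3.0, 4.0]
w1 = sam_step(w0, toy_grad, lr=0.1, rho=0.05)
```

Each SAM step costs two gradient evaluations, which is the usual trade-off for seeking flatter minima.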

13.
Front Hum Neurosci ; 18: 1421922, 2024.
Article in English | MEDLINE | ID: mdl-39050382

ABSTRACT

This paper presents a systematic literature review, providing a comprehensive taxonomy of Data Augmentation (DA), Transfer Learning (TL), and Self-Supervised Learning (SSL) techniques within the context of Few-Shot Learning (FSL) for EEG signal classification. EEG signals have shown significant potential in various paradigms, including Motor Imagery, Emotion Recognition, Visual Evoked Potentials, Steady-State Visually Evoked Potentials, Rapid Serial Visual Presentation, Event-Related Potentials, and Mental Workload. However, challenges such as limited labeled data, noise, and inter/intra-subject variability have impeded the effectiveness of traditional machine learning (ML) and deep learning (DL) models. This review methodically explores how FSL approaches, incorporating DA, TL, and SSL, can address these challenges and enhance classification performance in specific EEG paradigms. It also delves into the open research challenges related to these techniques in EEG signal classification. Specifically, the review examines the identification of DA strategies tailored to various EEG paradigms, the creation of TL architectures for efficient knowledge transfer, and the formulation of SSL methods for unsupervised representation learning from EEG data. Addressing these challenges is crucial for enhancing the efficacy and robustness of FSL-based EEG signal classification. By presenting a structured taxonomy of FSL techniques and discussing the associated research challenges, this systematic review offers valuable insights for future investigations in EEG signal classification. The findings aim to guide and inspire researchers, promoting advancements in applying FSL methodologies for improved EEG signal analysis and classification in real-world settings.

14.
J Imaging ; 10(7)2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39057738

ABSTRACT

The limited availability of specialized image databases (particularly in hospitals, where tools vary between providers) makes it difficult to train deep learning models. This paper presents a few-shot learning methodology that uses a pre-trained ResNet integrated with an encoder as a backbone to encode conditional shape information for the classification of neonatal resuscitation equipment from less than 100 natural images. The model is also strengthened by incorporating a reliability score, which enriches the prediction with an estimation of classification reliability. The model, whose performance is cross-validated, reached a median accuracy performance of over 99% (and a lower limit of 73.4% for the least accurate model/fold) using only 87 meta-training images. During the test phase on complex natural images, performance was slightly degraded due to a sub-optimal segmentation strategy (FastSAM) required to maintain the real-time inference phase (median accuracy 87.25%). This methodology proves to be excellent for applying complex classification models to contexts (such as neonatal resuscitation) that are not available in public databases. Improvements to the automatic segmentation strategy prior to the extraction of conditional information will allow a natural application in simulation and hospital settings.

15.
Sensors (Basel) ; 24(13)2024 Jul 08.
Article in English | MEDLINE | ID: mdl-39001199

ABSTRACT

Automatic Modulation Recognition (AMR) is a key technology in the field of cognitive communication, playing a core role in many applications, especially in wireless security issues. Currently, deep learning (DL)-based AMR technology has achieved many research results, greatly promoting the development of AMR technology. However, the few-shot dilemma faced by DL-based AMR methods greatly limits their application in practical scenarios. Therefore, this paper endeavored to address the challenge of AMR with limited data and proposed a novel meta-learning method, the Multi-Level Comparison Relation Network with Class Reconstruction (MCRN-CR). Firstly, the method designs a structure of a multi-level comparison relation network, which involves embedding functions to output their feature maps hierarchically, comprehensively calculating the relation scores between query samples and support samples to determine the modulation category. Secondly, the embedding function integrates a reconstruction module, leveraging an autoencoder for support sample reconstruction, wherein the encoder serves dual purposes as the embedding mechanism. The training regimen incorporates a meta-learning paradigm, harmoniously combining classification and reconstruction losses to refine the model's performance. The experimental results on the RadioML2018 dataset show that our designed method can greatly alleviate the small sample problem in AMR and is superior to existing methods.

16.
Sci Rep ; 14(1): 16041, 2024 Jul 11.
Article in English | MEDLINE | ID: mdl-38992098

ABSTRACT

In the realm of prognosticating the remaining useful life (RUL) of pivotal components, such as aircraft engines, a prevalent challenge persists where the available historical life data often proves insufficient. This insufficiency engenders obstacles such as impediments in performance degradation feature extraction, inadequacies in capturing temporal relationships comprehensively, and diminished predictive accuracy. To address this issue, a 1D CNN-GRU prediction model for few-shot conditions is proposed in this paper. In pursuit of more comprehensive data feature extraction and enhanced RUL prognostication precision, the Convolutional Neural Network (CNN) is selected for its capacity to discern high-dimensional features amid the intricate dynamics of the data. Concurrently, the Gated Recurrent Unit (GRU) network is leveraged for its robust capability in extracting temporal features inherent within the data. We combine the two to construct a CNN-GRU hybrid network. Moreover, the integration of data distribution alongside correlation and monotonicity indices is employed to winnow the input of multi-sensor monitoring parameters into the CNN-GRU network. Finally, the engine RULs are predicted by the trained model. In this paper, experiments are conducted on a sub-dataset of the National Aeronautics and Space Administration (NASA) C-MAPSS multi-constraint dataset to validate the effectiveness of the method. Experimental results have demonstrated that this method has high accuracy in RUL prediction tasks, which can powerfully demonstrate its effectiveness.
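
Illustrative note on entry 16: the abstract says correlation and monotonicity indices are used to select sensor channels before training. The sketch below uses one common definition of each index (sign balance of successive differences, and absolute Pearson correlation with the time index). These definitions, the thresholds and the sensor readings are assumptions; the paper's exact formulas and its data-distribution criterion are not given in the abstract.

```python
import math

# Screening sensor channels by monotonicity and correlation with time.
# Definitions and thresholds are common choices assumed for this sketch.

def monotonicity(series):
    """|#increases - #decreases| / (n - 1); 1.0 means strictly monotonic."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    pos = sum(1 for d in diffs if d > 0)
    neg = sum(1 for d in diffs if d < 0)
    return abs(pos - neg) / len(diffs)


def time_correlation(series):
    """Absolute Pearson correlation between the readings and the time index."""
    n = len(series)
    t_mean = (n - 1) / 2
    s_mean = sum(series) / n
    cov = sum((t - t_mean) * (s - s_mean) for t, s in enumerate(series))
    var_t = sum((t - t_mean) ** 2 for t in range(n))
    var_s = sum((s - s_mean) ** 2 for s in series)
    if var_s == 0:
        return 0.0  # a constant channel carries no trend
    return abs(cov) / math.sqrt(var_t * var_s)


def select_sensors(readings, min_mon=0.5, min_corr=0.5):
    """Keep channels whose two indices both reach the thresholds."""
    return [
        name
        for name, s in readings.items()
        if monotonicity(s) >= min_mon and time_correlation(s) >= min_corr
    ]


readings = {
    "s1": [1.0, 1.2, 1.1, 1.5, 1.8, 2.0],  # mostly rising
    "s2": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],  # constant
    "s3": [0.3, 0.9, 0.2, 0.8, 0.1, 0.7],  # oscillating
}
selected = select_sensors(readings)
```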

17.
Sensors (Basel) ; 24(14)2024 Jul 17.
Article in English | MEDLINE | ID: mdl-39066027

ABSTRACT

Strip steel plays a crucial role in modern industrial production, where enhancing the accuracy and real-time capabilities of surface defect classification is essential. However, acquiring and annotating defect samples for training deep learning models are challenging, further complicated by the presence of redundant information in these samples. These issues hinder the classification of strip steel surface defects. To address these challenges, this paper introduces a high real-time network, ODNet (Orthogonal Decomposition Network), designed for few-shot strip steel surface defect classification. ODNet utilizes ResNet as its backbone and incorporates orthogonal decomposition technology to reduce the feature redundancies. Furthermore, it integrates skip connection to preserve essential correlation information in the samples, preventing excessive elimination. The model optimizes the parameter efficiency by employing Euclidean distance as the classifier. The orthogonal decomposition not only helps reduce redundant image information but also ensures compatibility with the Euclidean distance requirement for orthogonal input. Extensive experiments conducted on the FSC-20 benchmark demonstrate that ODNet achieves superior real-time performance, accuracy, and generalization compared to alternative methods, effectively addressing the challenges of few-shot strip steel surface defect classification.
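
Illustrative note on entry 17: using Euclidean distance as the classifier means no extra trainable parameters are needed at the classification stage. The sketch below shows a generic nearest-prototype classifier of that kind. The class names and the 2-D feature vectors are invented; the ResNet backbone and the orthogonal decomposition step of ODNet are not modelled.

```python
import math

# Parameter-free nearest-prototype classifier with Euclidean distance.
# Feature vectors and class names are toy values for illustration.

def prototype(vectors):
    """Mean of the support feature vectors of one class."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest_prototype(query, support):
    """Return the label whose prototype is closest to the query."""
    protos = {label: prototype(vs) for label, vs in support.items()}
    return min(protos, key=lambda label: math.dist(query, protos[label]))


support = {
    "scratch": [[1.0, 0.0], [0.8, 0.2]],
    "pit": [[0.0, 1.0], [0.2, 0.8]],
}
label = nearest_prototype([0.7, 0.1], support)
```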

18.
Interdiscip Sci ; 16(2): 469-488, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38951382

ABSTRACT

Image classification, a fundamental task in computer vision, faces challenges concerning limited data handling, interpretability, improved feature representation, efficiency across diverse image types, and processing noisy data. Conventional architectural approaches have made insufficient progress in addressing these challenges, necessitating architectures capable of fine-grained classification, enhanced accuracy, and superior generalization. Among these, the vision transformer emerges as a noteworthy computer vision architecture. However, its reliance on substantial data for training poses a drawback due to its complexity and high data requirements. To surmount these challenges, this paper proposes an innovative approach, MetaV, integrating meta-learning into a vision transformer for medical image classification. N-way K-shot learning is employed to train the model, drawing inspiration from human learning mechanisms utilizing past knowledge. Additionally, deformational convolution and patch merging techniques are incorporated into the vision transformer model to mitigate complexity and overfitting while enhancing feature representation. Augmentation methods such as perturbation and Grid Mask are introduced to address the scarcity and noise in medical images, particularly for rare diseases. The proposed model is evaluated using diverse datasets including Break His, ISIC 2019, SIPaKMed, and STARE. The achieved performance accuracies of 89.89%, 87.33%, 94.55%, and 80.22% for Break His, ISIC 2019, SIPaKMed, and STARE, respectively, present evidence validating the superior performance of the proposed model in comparison to conventional models, setting a new benchmark for meta-vision image classification models.


Subjects
Image Processing, Computer-Assisted , Humans , Image Processing, Computer-Assisted/methods , Algorithms , Machine Learning , Diagnostic Imaging , Deep Learning
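
Illustrative note on entry 18: N-way K-shot training draws small episodes, each with N classes, K labelled support examples per class and a few held-out queries. The sketch below shows a generic episode sampler. The dataset layout, the function name and the file names are assumptions; it is not the MetaV training code.

```python
import random

# Generic N-way K-shot episode sampler over a {label: [items]} dataset.
# The dataset contents are placeholder strings invented for the example.

def sample_episode(data, n_way, k_shot, n_query, rng):
    """Return (support, query) lists of (item, label) pairs."""
    classes = rng.sample(sorted(data), n_way)
    support, query = [], []
    for label in classes:
        # draw without replacement so support and query never overlap
        items = rng.sample(data[label], k_shot + n_query)
        support += [(x, label) for x in items[:k_shot]]
        query += [(x, label) for x in items[k_shot:]]
    return support, query


data = {f"class_{c}": [f"img_{c}_{i}" for i in range(10)] for c in range(6)}
support, query = sample_episode(
    data, n_way=3, k_shot=2, n_query=4, rng=random.Random(0)
)
```

Passing an explicit `random.Random` instance keeps episodes reproducible across runs.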
19.
JMIR Med Inform ; 12: e56243, 2024 Aug 19.
Article in English | MEDLINE | ID: mdl-39037700

ABSTRACT

BACKGROUND: Understanding the multifaceted nature of health outcomes requires a comprehensive examination of the social, economic, and environmental determinants that shape individual well-being. Among these determinants, behavioral factors play a crucial role, particularly the consumption patterns of psychoactive substances, which have important implications on public health. The Global Burden of Disease Study shows a growing impact in disability-adjusted life years due to substance use. The successful identification of patients' substance use information equips clinical care teams to address substance-related issues more effectively, enabling targeted support and ultimately improving patient outcomes. OBJECTIVE: Traditional natural language processing methods face limitations in accurately parsing diverse clinical language associated with substance use. Large language models offer promise in overcoming these challenges by adapting to diverse language patterns. This study investigates the application of the generative pretrained transformer (GPT) model, specifically GPT-3.5, for extracting tobacco, alcohol, and substance use information from patient discharge summaries in zero-shot and few-shot learning settings. This study contributes to the evolving landscape of health care informatics by showcasing the potential of advanced language models in extracting nuanced information critical for enhancing patient care. METHODS: The main data source for analysis in this paper is the Medical Information Mart for Intensive Care III data set. Among all notes in this data set, we focused on discharge summaries. Prompt engineering was undertaken, involving an iterative exploration of diverse prompts. Leveraging carefully curated examples and refined prompts, we investigated the model's proficiency through zero-shot as well as few-shot prompting strategies. 
RESULTS: Results show GPT's varying effectiveness in identifying mentions of tobacco, alcohol, and substance use across learning scenarios. Zero-shot learning showed high accuracy in identifying substance use, whereas few-shot learning reduced accuracy but improved in identifying substance use status, enhancing recall and F1-score at the expense of lower precision. CONCLUSIONS: Excellence of zero-shot learning in precisely extracting text span mentioning substance use demonstrates its effectiveness in situations in which comprehensive recall is important. Conversely, few-shot learning offers advantages when accurately determining the status of substance use is the primary focus, even if it involves a trade-off in precision. The results contribute to enhancement of early detection and intervention strategies, tailor treatment plans with greater precision, and ultimately, contribute to a holistic understanding of patient health profiles. By integrating these artificial intelligence-driven methods into electronic health record systems, clinicians can gain immediate, comprehensive insights into substance use that results in shaping interventions that are not only timely but also more personalized and effective.
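
Illustrative note on entry 19: the results describe a trade-off between precision on one side and recall and F1-score on the other. The sketch below shows how the three metrics follow from true-positive, false-positive and false-negative counts. The counts are made up to show the pattern and are not the study's results.

```python
# Precision, recall and F1 from confusion counts. The two count sets are
# invented to illustrate a precision-versus-recall trade-off.

def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1), using 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Setting A: few false alarms, many misses
setting_a = precision_recall_f1(tp=40, fp=2, fn=20)
# Setting B: more false alarms, far fewer misses
setting_b = precision_recall_f1(tp=55, fp=15, fn=5)
```

Setting B gives up some precision but gains recall and F1, which is the kind of trade-off the abstract describes.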

20.
Med Image Anal ; 97: 103258, 2024 Jul 04.
Article in English | MEDLINE | ID: mdl-38996667

ABSTRACT

Foundation models pre-trained on large-scale data have been widely witnessed to achieve success in various natural imaging downstream tasks. Parameter-efficient fine-tuning (PEFT) methods aim to adapt foundation models to new domains by updating only a small portion of parameters in order to reduce computational overhead. However, the effectiveness of these PEFT methods, especially in cross-domain few-shot scenarios, e.g., medical image analysis, has not been fully explored. In this work, we facilitate the study of the performance of PEFT when adapting foundation models to medical image classification tasks. Furthermore, to alleviate the limitations of prompt introducing ways and approximation capabilities on Transformer architectures of mainstream prompt tuning methods, we propose the Embedded Prompt Tuning (EPT) method by embedding prompt tokens into the expanded channels. We also find that there are anomalies in the feature space distribution of foundation models during pre-training process, and prompt tuning can help mitigate this negative impact. To explain this phenomenon, we also introduce a novel perspective to understand prompt tuning: Prompt tuning is a distribution calibrator. And we support it by analysing patch-wise scaling and feature separation operations contained in EPT. Our experiments show that EPT outperforms several state-of-the-art fine-tuning methods by a significant margin on few-shot medical image classification tasks, and completes the fine-tuning process within highly competitive time, indicating EPT is an effective PEFT method. The source code is available at github.com/zuwenqiang/EPT.
