Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 301
Filtrar
1.
Neural Netw ; 179: 106536, 2024 Jul 14.
Artículo en Inglés | MEDLINE | ID: mdl-39089156

RESUMEN

Cross-domain few-shot Learning (CDFSL) is proposed to first pre-train deep models on a source domain dataset where sufficient data is available, and then generalize models to target domains to learn from only limited data. However, the gap between the source and target domains greatly hampers the generalization and target-domain few-shot finetuning. To address this problem, we analyze the domain gap from the aspect of frequency-domain analysis. We find the domain gap could be reflected by the compositions of source-domain spectra, and the lack of compositions in the source datasets limits the generalization. Therefore, we aim to expand the coverage of spectra composition in the source datasets to help the source domain cover a larger range of possible target-domain information, to mitigate the domain gap. To achieve this goal, we propose the Spectral Decomposition and Transformation (SDT) method, which first randomly decomposes the spectrogram of the source datasets into orthogonal bases, and then randomly samples different coordinates in the space formed by these bases. We integrate the above process into a data augmentation module, and further design a two-stream network to handle augmented images and original images respectively. Experimental results show that our method achieves state-of-the-art performance in the CDFSL benchmark dataset.

3.
Sci Rep ; 14(1): 17900, 2024 08 02.
Artículo en Inglés | MEDLINE | ID: mdl-39095389

RESUMEN

Plant diseases pose significant threats to agriculture, impacting both food safety and public health. Traditional plant disease detection systems are typically limited to recognizing disease categories included in the training dataset, rendering them ineffective against new disease types. Although out-of-distribution (OOD) detection methods have been proposed to address this issue, the impact of fine-tuning paradigms on these methods has been overlooked. This paper focuses on studying the impact of fine-tuning paradigms on the performance of detecting unknown plant diseases. Currently, fine-tuning on visual tasks is mainly divided into visual-based models and visual-language-based models. We first discuss the limitations of large-scale visual language models in this task: textual prompts are difficult to design. To avoid the side effects of textual prompts, we futher explore the effectiveness of purely visual pre-trained models for OOD detection in plant disease tasks. Specifically, we employed five publicly accessible datasets to establish benchmarks for open-set recognition, OOD detection, and few-shot learning in plant disease recognition. Additionally, we comprehensively compared various OOD detection methods, fine-tuning paradigms, and factors affecting OOD detection performance, such as sample quantity. The results show that visual prompt tuning outperforms fully fine-tuning and linear probe tuning in out-of-distribution detection performance, especially in the few-shot scenarios. Notably, the max-logit-based on visual prompt tuning achieves an AUROC score of 94.8 % in the 8-shot setting, which is nearly comparable to the method of fully fine-tuning on the full dataset (95.2 % ), which implies that an appropriate fine-tuning paradigm can directly improve OOD detection performance. Finally, we visualized the prediction distributions of different OOD detection methods and discussed the selection of thresholds. Overall, this work lays the foundation for unknown plant disease recognition, providing strong support for the security and reliability of plant disease recognition systems. We will release our code at https://github.com/JiuqingDong/PDOOD to further advance this field.


Asunto(s)
Enfermedades de las Plantas , Algoritmos
4.
Artif Intell Med ; 156: 102949, 2024 Aug 16.
Artículo en Inglés | MEDLINE | ID: mdl-39178621

RESUMEN

The lack of annotated medical images limits the performance of deep learning models, which usually need large-scale labelled datasets. Few-shot learning techniques can reduce data scarcity issues and enhance medical image analysis speed and robustness. This systematic review gives a comprehensive overview of few-shot learning methods for medical image analysis, aiming to establish a standard methodological pipeline for future research reference. With a particular emphasis on the role of meta-learning, we analysed 80 relevant articles published from 2018 to 2023, conducting a risk of bias assessment and extracting relevant information, especially regarding the employed learning techniques. From this, we delineated a comprehensive methodological pipeline shared among all studies. In addition, we performed a statistical analysis of the studies' results concerning the clinical task and the meta-learning method employed while also presenting supplemental information such as imaging modalities and model robustness evaluation techniques. We discussed the findings of our analysis, providing a deep insight into the limitations of the state-of-the-art methods and the most promising approaches. Drawing on our investigation, we yielded recommendations on potential future research directions aiming to bridge the gap between research and clinical practice.

5.
Brief Bioinform ; 25(5)2024 Jul 25.
Artículo en Inglés | MEDLINE | ID: mdl-39133096

RESUMEN

The molecular property prediction (MPP) plays a crucial role in the drug discovery process, providing valuable insights for molecule evaluation and screening. Although deep learning has achieved numerous advances in this area, its success often depends on the availability of substantial labeled data. The few-shot MPP is a more challenging scenario, which aims to identify unseen property with only few available molecules. In this paper, we propose an attribute-guided prototype network (APN) to address the challenge. APN first introduces an molecular attribute extractor, which can not only extract three different types of fingerprint attributes (single fingerprint attributes, dual fingerprint attributes, triplet fingerprint attributes) by considering seven circular-based, five path-based, and two substructure-based fingerprints, but also automatically extract deep attributes from self-supervised learning methods. Furthermore, APN designs the Attribute-Guided Dual-channel Attention module to learn the relationship between the molecular graphs and attributes and refine the local and global representation of the molecules. Compared with existing works, APN leverages high-level human-defined attributes and helps the model to explicitly generalize knowledge in molecular graphs. Experiments on benchmark datasets show that APN can achieve state-of-the-art performance in most cases and demonstrate that the attributes are effective for improving few-shot MPP performance. In addition, the strong generalization ability of APN is verified by conducting experiments on data from different domains.


Asunto(s)
Aprendizaje Profundo , Descubrimiento de Drogas , Descubrimiento de Drogas/métodos , Humanos , Algoritmos , Redes Neurales de la Computación
6.
Sci Rep ; 14(1): 18319, 2024 08 07.
Artículo en Inglés | MEDLINE | ID: mdl-39112791

RESUMEN

Accurately assigning standardized diagnosis and procedure codes from clinical text is crucial for healthcare applications. However, this remains challenging due to the complexity of medical language. This paper proposes a novel model that incorporates extreme multi-label classification tasks to enhance International Classification of Diseases (ICD) coding. The model utilizes deformable convolutional neural networks to fuse representations from hidden layer outputs of pre-trained language models and external medical knowledge embeddings fused using a multimodal approach to provide rich semantic encodings for each code. A probabilistic label tree is constructed based on the hierarchical structure existing in ICD labels to incorporate ontological relationships between ICD codes and enable structured output prediction. Experiments on medical code prediction on the MIMIC-III database demonstrate competitive performance, highlighting the benefits of this technique for robust clinical code assignment.


Asunto(s)
Clasificación Internacional de Enfermedades , Redes Neurales de la Computación , Semántica , Humanos , Procesamiento de Lenguaje Natural , Algoritmos , Bases de Datos Factuales
7.
Sci Rep ; 14(1): 19470, 2024 Aug 22.
Artículo en Inglés | MEDLINE | ID: mdl-39174581

RESUMEN

With the rapid growth of social media, fake news (rumors) are rampant online, seriously endangering the health of mainstream social consciousness. Fake news detection (FEND), as a machine learning solution for automatically identifying fake news on Internet, is increasingly gaining the attentions of academic community and researchers. Recently, the mainstream FEND approaches relying on deep learning primarily involves fully supervised fine-tuning paradigms based on pre-trained language models (PLMs), relying on large annotated datasets. In many real scenarios, obtaining high-quality annotated corpora are time-consuming, expertise-required, labor-intensive, and expensive, which presents challenges in obtaining a competitive automatic rumor detection system. Therefore, developing and enhancing FEND towards data-scarce scenarios is becoming increasingly essential. In this work, inspired by the superiority of semi-/self- supervised learning, we propose a novel few-shot rumor detection framework based on semi-supervised adversarial learning and self-supervised contrastive learning, named Detection Yet See Few (DetectYSF). DetectYSF synergizes contrastive self-supervised learning and adversarial semi-supervised learning to achieve accurate and efficient FEND capabilities with limited supervised data. DetectYSF uses Transformer-based PLMs (e.g., BERT, RoBERTa) as its backbone and employs a Masked LM-based pseudo prompt learning paradigm for model tuning (prompt-tuning). Specifically, during DetectYSF training, the enhancement measures for DetectYSF are as follows: (1) We design a simple but efficient self-supervised contrastive learning strategy to optimize sentence-level semantic embedding representations obtained from PLMs; (2) We construct a Generation Adversarial Network (GAN), utilizing random noises and negative fake news samples as inputs, and employing Multi-Layer Perceptrons (MLPs) and an extra independent PLM encoder to generate abundant adversarial embeddings. Then, incorporated with the adversarial embeddings, we utilize semi-supervised adversarial learning to further optimize the output embeddings of DetectYSF during its prompt-tuning procedure. From the news veracity dissemination perspective, we found that the authenticity of the news shared by these collectives tends to remain consistent, either mostly genuine or predominantly fake, a theory we refer to as "news veracity dissemination consistency". By employing an adjacent sub-graph feature aggregation algorithm, we infuse the authenticity characteristics from neighboring news nodes of the constructed veracity dissemination network during DetectYSF inference. It integrates the external supervisory signals from "news veracity dissemination consistency" to further refine the news authenticity detection results of PLM prompt-tuning, thereby enhancing the accuracy of fake news detection. Furthermore, extensive baseline comparisons and ablated experiments on three widely-used benchmarks demonstrate the effectiveness and superiority of DetectYSF for few-shot fake new detection under low-resource scenarios.

8.
Brief Bioinform ; 25(5)2024 Jul 25.
Artículo en Inglés | MEDLINE | ID: mdl-39177261

RESUMEN

Large language models (LLMs) are sophisticated AI-driven models trained on vast sources of natural language data. They are adept at generating responses that closely mimic human conversational patterns. One of the most notable examples is OpenAI's ChatGPT, which has been extensively used across diverse sectors. Despite their flexibility, a significant challenge arises as most users must transmit their data to the servers of companies operating these models. Utilizing ChatGPT or similar models online may inadvertently expose sensitive information to the risk of data breaches. Therefore, implementing LLMs that are open source and smaller in scale within a secure local network becomes a crucial step for organizations where ensuring data privacy and protection has the highest priority, such as regulatory agencies. As a feasibility evaluation, we implemented a series of open-source LLMs within a regulatory agency's local network and assessed their performance on specific tasks involving extracting relevant clinical pharmacology information from regulatory drug labels. Our research shows that some models work well in the context of few- or zero-shot learning, achieving performance comparable, or even better than, neural network models that needed thousands of training samples. One of the models was selected to address a real-world issue of finding intrinsic factors that affect drugs' clinical exposure without any training or fine-tuning. In a dataset of over 700 000 sentences, the model showed a 78.5% accuracy rate. Our work pointed to the possibility of implementing open-source LLMs within a secure local network and using these models to perform various natural language processing tasks when large numbers of training examples are unavailable.


Asunto(s)
Procesamiento de Lenguaje Natural , Humanos , Redes Neurales de la Computación , Aprendizaje Automático
9.
ACS Nano ; 18(34): 23489-23496, 2024 Aug 27.
Artículo en Inglés | MEDLINE | ID: mdl-39137093

RESUMEN

Ternary content-addressable memory (TCAM) is promising for data-intensive artificial intelligence applications due to its large-scale parallel in-memory computing capabilities. However, it is still challenging to build a reliable TCAM cell from a single circuit component. Here, we demonstrate a single transistor TCAM based on a floating-gate two-dimensional (2D) ambipolar MoTe2 field-effect transistor with graphene contacts. Our bottom graphene contacts scheme enables gate modulation of the contact Schottky barrier heights, facilitating carrier injection for both electrons and holes. The 2D nature of our channel and contact materials provides device scaling potentials beyond silicon. By integration with a floating-gate stack, a highly reliable nonvolatile memory is achieved. Our TCAM cell exhibits a resistance ratio larger than 1000 and symmetrical complementary states, allowing the implementation of large-scale TCAM arrays. Finally, we show through circuit simulations that in-memory Hamming distance computation is readily achievable based on our TCAM with array sizes up to 128 cells.

10.
Sensors (Basel) ; 24(15)2024 Jul 31.
Artículo en Inglés | MEDLINE | ID: mdl-39124022

RESUMEN

Nowadays, autonomous driving technology has become widely prevalent. The intelligent vehicles have been equipped with various sensors (e.g., vision sensors, LiDAR, depth cameras etc.). Among them, the vision systems with tailored semantic segmentation and perception algorithms play critical roles in scene understanding. However, the traditional supervised semantic segmentation needs a large number of pixel-level manual annotations to complete model training. Although few-shot methods reduce the annotation work to some extent, they are still labor intensive. In this paper, a self-supervised few-shot semantic segmentation method based on Multi-task Learning and Dense Attention Computation (dubbed MLDAC) is proposed. The salient part of an image is split into two parts; one of them serves as the support mask for few-shot segmentation, while cross-entropy losses are calculated between the other part and the entire region with the predicted results separately as multi-task learning so as to improve the model's generalization ability. Swin Transformer is used as our backbone to extract feature maps at different scales. These feature maps are then input to multiple levels of dense attention computation blocks to enhance pixel-level correspondence. The final prediction results are obtained through inter-scale mixing and feature skip connection. The experimental results indicate that MLDAC obtains 55.1% and 26.8% one-shot mIoU self-supervised few-shot segmentation on the PASCAL-5i and COCO-20i datasets, respectively. In addition, it achieves 78.1% on the FSS-1000 few-shot dataset, proving its efficacy.

11.
Quant Imaging Med Surg ; 14(8): 5443-5459, 2024 Aug 01.
Artículo en Inglés | MEDLINE | ID: mdl-39144045

RESUMEN

Background: The automated classification of histological images is crucial for the diagnosis of cancer. The limited availability of well-annotated datasets, especially for rare cancers, poses a significant challenge for deep learning methods due to the small number of relevant images. This has led to the development of few-shot learning approaches, which bear considerable clinical importance, as they are designed to overcome the challenges of data scarcity in deep learning for histological image classification. Traditional methods often ignore the challenges of intraclass diversity and interclass similarities in histological images. To address this, we propose a novel mutual reconstruction network model, aimed at meeting these challenges and improving the few-shot classification performance of histological images. Methods: The key to our approach is the extraction of subtle and discriminative features. We introduce a feature enhancement module (FEM) and a mutual reconstruction module to increase differences between classes while reducing variance within classes. First, we extract features of support and query images using a feature extractor. These features are then processed by the FEM, which uses a self-attention mechanism for self-reconstruction of features, enhancing the learning of detailed features. These enhanced features are then input into the mutual reconstruction module. This module uses enhanced support features to reconstruct enhanced query features and vice versa. The classification of query samples is based on weighted calculations of the distances between query features and reconstructed query features and between support features and reconstructed support features. Results: We extensively evaluated our model using a specially created few-shot histological image dataset. The results showed that in a 5-way 10-shot setup, our model achieved an impressive accuracy of 92.09%. This is a 23.59% improvement in accuracy compared to the model-agnostic meta-learning (MAML) method, which does not focus on fine-grained attributes. In the more challenging, 5-way 1-shot setting, our model also performed well, demonstrating a 18.52% improvement over the ProtoNet, which does not address this challenge. Additional ablation studies indicated the effectiveness and complementary nature of each module and confirmed our method's ability to parse small differences between classes and large variations within classes in histological images. These findings strongly support the superiority of our proposed method in the few-shot classification of histological images. Conclusions: The mutual reconstruction network provides outstanding performance in the few-shot classification of histological images, successfully overcoming the challenges of similarities between classes and diversity within classes. This marks a significant advancement in the automated classification of histological images.

12.
Sensors (Basel) ; 24(13)2024 Jul 08.
Artículo en Inglés | MEDLINE | ID: mdl-39001199

RESUMEN

Automatic Modulation Recognition (AMR) is a key technology in the field of cognitive communication, playing a core role in many applications, especially in wireless security issues. Currently, deep learning (DL)-based AMR technology has achieved many research results, greatly promoting the development of AMR technology. However, the few-shot dilemma faced by DL-based AMR methods greatly limits their application in practical scenarios. Therefore, this paper endeavored to address the challenge of AMR with limited data and proposed a novel meta-learning method, the Multi-Level Comparison Relation Network with Class Reconstruction (MCRN-CR). Firstly, the method designs a structure of a multi-level comparison relation network, which involves embedding functions to output their feature maps hierarchically, comprehensively calculating the relation scores between query samples and support samples to determine the modulation category. Secondly, the embedding function integrates a reconstruction module, leveraging an autoencoder for support sample reconstruction, wherein the encoder serves dual purposes as the embedding mechanism. The training regimen incorporates a meta-learning paradigm, harmoniously combining classification and reconstruction losses to refine the model's performance. The experimental results on the RadioML2018 dataset show that our designed method can greatly alleviate the small sample problem in AMR and is superior to existing methods.

13.
PeerJ Comput Sci ; 10: e2080, 2024.
Artículo en Inglés | MEDLINE | ID: mdl-38983194

RESUMEN

Poultry farming is an indispensable part of global agriculture, playing a crucial role in food safety and economic development. Managing and preventing diseases is a vital task in the poultry industry, where semantic segmentation technology can significantly enhance the efficiency of traditional manual monitoring methods. Furthermore, traditional semantic segmentation has achieved excellent results on extensively manually annotated datasets, facilitating real-time monitoring of poultry. Nonetheless, the model encounters limitations when exposed to new environments, diverse breeding varieties, or varying growth stages within the same species, necessitating extensive data retraining. Overreliance on large datasets results in higher costs for manual annotations and deployment delays, thus hindering practical applicability. To address this issue, our study introduces HSDNet, an innovative semantic segmentation model based on few-shot learning, for monitoring poultry farms. The HSDNet model adeptly adjusts to new settings or species with a single image input while maintaining substantial accuracy. In the specific context of poultry breeding, characterized by small congregating animals and the inherent complexities of agricultural environments, issues of non-smooth losses arise, potentially compromising accuracy. HSDNet incorporates a Sharpness-Aware Minimization (SAM) strategy to counteract these challenges. Furthermore, by considering the effects of imbalanced loss on convergence, HSDNet mitigates the overfitting issue induced by few-shot learning. Empirical findings underscore HSDNet's proficiency in poultry breeding settings, exhibiting a significant 72.89% semantic segmentation accuracy on single images, which is higher than SOTA's 68.85%.

14.
Neuropathol Appl Neurobiol ; 50(4): e12997, 2024 Aug.
Artículo en Inglés | MEDLINE | ID: mdl-39010256

RESUMEN

AIMS: Recent advances in artificial intelligence, particularly with large language models like GPT-4Vision (GPT-4V)-a derivative feature of ChatGPT-have expanded the potential for medical image interpretation. This study evaluates the accuracy of GPT-4V in image classification tasks of histopathological images and compares its performance with a traditional convolutional neural network (CNN). METHODS: We utilised 1520 images, including haematoxylin and eosin staining and tau immunohistochemistry, from patients with various neurodegenerative diseases, such as Alzheimer's disease (AD), progressive supranuclear palsy (PSP) and corticobasal degeneration (CBD). We assessed GPT-4V's performance using multi-step prompts to determine how textual context influences image interpretation. We also employed few-shot learning to enhance improvements in GPT-4V's diagnostic performance in classifying three specific tau lesions-astrocytic plaques, neuritic plaques and tufted astrocytes-and compared the outcomes with the CNN model YOLOv8. RESULTS: GPT-4V accurately recognised staining techniques and tissue origin but struggled with specific lesion identification. The interpretation of images was notably influenced by the provided textual context, which sometimes led to diagnostic inaccuracies. For instance, when presented with images of the motor cortex, the diagnosis shifted inappropriately from AD to CBD or PSP. However, few-shot learning markedly improved GPT-4V's diagnostic capabilities, enhancing accuracy from 40% in zero-shot learning to 90% with 20-shot learning, matching the performance of YOLOv8, which required 100-shot learning to achieve the same accuracy. CONCLUSIONS: Although GPT-4V faces challenges in independently interpreting histopathological images, few-shot learning significantly improves its performance. This approach is especially promising for neuropathology, where acquiring extensive labelled datasets is often challenging.


Asunto(s)
Redes Neurales de la Computación , Enfermedades Neurodegenerativas , Humanos , Enfermedades Neurodegenerativas/patología , Interpretación de Imagen Asistida por Computador/métodos , Enfermedad de Alzheimer/patología
15.
Front Hum Neurosci ; 18: 1421922, 2024.
Artículo en Inglés | MEDLINE | ID: mdl-39050382

RESUMEN

This paper presents a systematic literature review, providing a comprehensive taxonomy of Data Augmentation (DA), Transfer Learning (TL), and Self-Supervised Learning (SSL) techniques within the context of Few-Shot Learning (FSL) for EEG signal classification. EEG signals have shown significant potential in various paradigms, including Motor Imagery, Emotion Recognition, Visual Evoked Potentials, Steady-State Visually Evoked Potentials, Rapid Serial Visual Presentation, Event-Related Potentials, and Mental Workload. However, challenges such as limited labeled data, noise, and inter/intra-subject variability have impeded the effectiveness of traditional machine learning (ML) and deep learning (DL) models. This review methodically explores how FSL approaches, incorporating DA, TL, and SSL, can address these challenges and enhance classification performance in specific EEG paradigms. It also delves into the open research challenges related to these techniques in EEG signal classification. Specifically, the review examines the identification of DA strategies tailored to various EEG paradigms, the creation of TL architectures for efficient knowledge transfer, and the formulation of SSL methods for unsupervised representation learning from EEG data. Addressing these challenges is crucial for enhancing the efficacy and robustness of FSL-based EEG signal classification. By presenting a structured taxonomy of FSL techniques and discussing the associated research challenges, this systematic review offers valuable insights for future investigations in EEG signal classification. The findings aim to guide and inspire researchers, promoting advancements in applying FSL methodologies for improved EEG signal analysis and classification in real-world settings.

17.
J Imaging ; 10(7)2024 Jul 13.
Artículo en Inglés | MEDLINE | ID: mdl-39057738

RESUMEN

The limited availability of specialized image databases (particularly in hospitals, where tools vary between providers) makes it difficult to train deep learning models. This paper presents a few-shot learning methodology that uses a pre-trained ResNet integrated with an encoder as a backbone to encode conditional shape information for the classification of neonatal resuscitation equipment from less than 100 natural images. The model is also strengthened by incorporating a reliability score, which enriches the prediction with an estimation of classification reliability. The model, whose performance is cross-validated, reached a median accuracy performance of over 99% (and a lower limit of 73.4% for the least accurate model/fold) using only 87 meta-training images. During the test phase on complex natural images, performance was slightly degraded due to a sub-optimal segmentation strategy (FastSAM) required to maintain the real-time inference phase (median accuracy 87.25%). This methodology proves to be excellent for applying complex classification models to contexts (such as neonatal resuscitation) that are not available in public databases. Improvements to the automatic segmentation strategy prior to the extraction of conditional information will allow a natural application in simulation and hospital settings.

18.
Bioengineering (Basel) ; 11(7)2024 Jul 05.
Artículo en Inglés | MEDLINE | ID: mdl-39061767

RESUMEN

Bioacoustic event detection is a demanding endeavor involving recognizing and classifying the sounds animals make in their natural habitats. Traditional supervised learning requires a large amount of labeled data, which are hard to come by in bioacoustics. This paper presents a few-shot learning (FSL) method incorporating transductive inference and data augmentation to address the issues of too few labeled events and small volumes of recordings. Here, transductive inference iteratively alters class prototypes and feature extractors to seize essential patterns, whereas data augmentation applies SpecAugment on Mel spectrogram features to augment training data. The proposed approach is evaluated by using the Detecting and Classifying Acoustic Scenes and Events (DCASE) 2022 and 2021 datasets. Extensive experimental results demonstrate that all components of the proposed method achieve significant F-score improvements of 27% and 10%, for the DCASE-2022 and DCASE-2021 datasets, respectively, compared to recent advanced approaches. Moreover, our method is helpful in FSL tasks because it effectively adapts to sounds from various animal species, recordings, and durations.

19.
Med Image Anal ; 97: 103258, 2024 Jul 04.
Artículo en Inglés | MEDLINE | ID: mdl-38996667

RESUMEN

Foundation models pre-trained on large-scale data have been widely witnessed to achieve success in various natural imaging downstream tasks. Parameter-efficient fine-tuning (PEFT) methods aim to adapt foundation models to new domains by updating only a small portion of parameters in order to reduce computational overhead. However, the effectiveness of these PEFT methods, especially in cross-domain few-shot scenarios, e.g., medical image analysis, has not been fully explored. In this work, we facilitate the study of the performance of PEFT when adapting foundation models to medical image classification tasks. Furthermore, to alleviate the limitations of prompt introducing ways and approximation capabilities on Transformer architectures of mainstream prompt tuning methods, we propose the Embedded Prompt Tuning (EPT) method by embedding prompt tokens into the expanded channels. We also find that there are anomalies in the feature space distribution of foundation models during pre-training process, and prompt tuning can help mitigate this negative impact. To explain this phenomenon, we also introduce a novel perspective to understand prompt tuning: Prompt tuning is a distribution calibrator. And we support it by analysing patch-wise scaling and feature separation operations contained in EPT. Our experiments show that EPT outperforms several state-of-the-art fine-tuning methods by a significant margin on few-shot medical image classification tasks, and completes the fine-tuning process within highly competitive time, indicating EPT is an effective PEFT method. The source code is available at github.com/zuwenqiang/EPT.

20.
Interdiscip Sci ; 16(2): 469-488, 2024 Jun.
Artículo en Inglés | MEDLINE | ID: mdl-38951382

RESUMEN

Image classification, a fundamental task in computer vision, faces challenges concerning limited data handling, interpretability, improved feature representation, efficiency across diverse image types, and processing noisy data. Conventional architectural approaches have made insufficient progress in addressing these challenges, necessitating architectures capable of fine-grained classification, enhanced accuracy, and superior generalization. Among these, the vision transformer emerges as a noteworthy computer vision architecture. However, its reliance on substantial data for training poses a drawback due to its complexity and high data requirements. To surmount these challenges, this paper proposes an innovative approach, MetaV, integrating meta-learning into a vision transformer for medical image classification. N-way K-shot learning is employed to train the model, drawing inspiration from human learning mechanisms utilizing past knowledge. Additionally, deformational convolution and patch merging techniques are incorporated into the vision transformer model to mitigate complexity and overfitting while enhancing feature representation. Augmentation methods such as perturbation and Grid Mask are introduced to address the scarcity and noise in medical images, particularly for rare diseases. The proposed model is evaluated using diverse datasets including Break His, ISIC 2019, SIPaKMed, and STARE. The achieved performance accuracies of 89.89%, 87.33%, 94.55%, and 80.22% for Break His, ISIC 2019, SIPaKMed, and STARE, respectively, present evidence validating the superior performance of the proposed model in comparison to conventional models, setting a new benchmark for meta-vision image classification models.


Asunto(s)
Procesamiento de Imagen Asistido por Computador , Humanos , Procesamiento de Imagen Asistido por Computador/métodos , Algoritmos , Aprendizaje Automático , Diagnóstico por Imagen , Aprendizaje Profundo
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA