ABSTRACT
Deep networks play a crucial role in the recognition of agricultural diseases. However, these networks often have numerous parameters and large sizes, which makes them difficult to deploy directly on the resource-limited edge computing devices of plant protection robots. To tackle this challenge for recognizing cotton diseases on edge devices, we adopt knowledge distillation to compress large networks, aiming to reduce their number of parameters and computational complexity. To obtain the best performance, we conduct combined comparison experiments across three aspects: teacher network, student network, and distillation algorithm. The teacher networks comprise three classical convolutional neural networks, while the student networks include six lightweight networks in two categories, homogeneous and heterogeneous structures. In addition, we investigate nine distillation algorithms using a spot-adaptive strategy. The results demonstrate that the combination of DenseNet40 as the teacher and ShuffleNetV2 as the student shows the best performance when using the NST algorithm, yielding a recognition accuracy of 90.59% and reducing FLOPs from 0.29 G to 0.045 G. The proposed method facilitates lightweight models for recognizing cotton diseases while maintaining high recognition accuracy and offers a practical solution for deploying deep models on edge computing devices.
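As context for the distillation setup above, the following PyTorch snippet is a minimal sketch of plain temperature-scaled soft-target distillation between a frozen teacher and a lightweight student; the spot-adaptive strategy and the NST loss used in the study are not reproduced here, and the temperature and weighting below are illustrative assumptions.

# Minimal soft-target knowledge distillation sketch (assumed hyperparameters).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Weighted sum of hard-label cross-entropy and soft-label KL divergence."""
    # Soft targets: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable to the hard loss.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Usage inside a training step (teacher kept frozen):
# with torch.no_grad():
#     t_logits = teacher(images)
# loss = distillation_loss(student(images), t_logits, labels)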
ABSTRACT
With the development of intelligent manufacturing technology, robots have become more widespread in milling. When milling difficult-to-machine alloy materials, the localized high temperature and large temperature gradient at the front face of the tool lead to shortened tool life and poor machining quality. Existing temperature field reconstruction methods rely on many assumptions and require heavy computation and long solution times. In this paper, an inverse heat conduction problem solution model based on a Gated Convolutional Recurrent Neural Network (CNN-GRU) is proposed for reconstructing the temperature field of the tool during milling. To ensure both the speed and the accuracy of the reconstruction, the model is compressed and accelerated through knowledge distillation (KD), which substantially reduces training time with only a small loss of optimality while preserving the accuracy and efficiency of the prediction model. With different levels of random noise added to the model input data, CNN-GRU + KD remains noise-resistant and shows good robustness and stability. The temperature field of the milling tool is reconstructed for three different working conditions: the best goodness of fit among the three conditions is 0.97 and the minimum root mean square error is 1.43 °C. The experimental results show that the model is feasible and effective for reconstructing the temperature field of the milling tool and is of great significance for improving the accuracy of milling robots.
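The abstract names a CNN-GRU surrogate for the inverse heat conduction problem but gives no architectural details, so the sketch below is only an assumed layout in PyTorch: 1-D convolutions over remote temperature-sensor sequences followed by a GRU and a linear head that predicts the temperature at a set of tool-face points. All layer sizes and the input/output format are hypothetical.

# Hypothetical CNN-GRU layout; sizes and I/O format are assumptions, not the paper's.
import torch
import torch.nn as nn

class CNNGRU(nn.Module):
    def __init__(self, n_sensors=4, hidden=64, n_field_points=100):
        super().__init__()
        # 1-D convolutions extract local temporal features from the sensor signals.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_sensors, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        # The GRU models the longer-range time dependence of heat conduction.
        self.gru = nn.GRU(input_size=64, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_field_points)

    def forward(self, x):                   # x: (batch, time, n_sensors)
        z = self.cnn(x.transpose(1, 2))     # (batch, 64, time)
        out, _ = self.gru(z.transpose(1, 2))
        return self.head(out[:, -1])        # temperature field at the last time step

model = CNNGRU()
pred = model(torch.randn(8, 50, 4))         # 8 samples, 50 time steps, 4 sensors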
ABSTRACT
Protein ubiquitination is a critical post-translational modification (PTM) involved in diverse biological processes and plays a pivotal role in regulating physiological mechanisms and disease states. Despite various efforts to develop ubiquitination site prediction tools across species, these tools mainly rely on predefined sequence features and machine learning algorithms, with species-specific variations in ubiquitination patterns remaining poorly understood. This study introduces a novel approach for predicting Arabidopsis thaliana ubiquitination sites using a neural network model based on knowledge distillation and natural language processing (NLP) of protein sequences. Our framework employs a multi-species "Teacher model" to guide a more compact, species-specific "Student model", with the "Teacher" generating pseudo-labels that enhance the "Student" model's learning and prediction robustness. Cross-validation results demonstrate that our model achieves superior performance, with an accuracy of 86.3% and an area under the curve (AUC) of 0.926, while independent testing confirmed these results with an accuracy of 86.3% and an AUC of 0.923. Comparative analysis with established predictors further highlights the model's superiority, emphasizing the effectiveness of integrating knowledge distillation and NLP in ubiquitination prediction tasks. This study presents a promising and efficient approach for ubiquitination site prediction, offering valuable insights for researchers in related fields. The code and resources are available on GitHub: https://github.com/nuinvtnu/KD_ArapUbi.
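A minimal sketch of the teacher-generated pseudo-labelling step described above, assuming a trained multi-species teacher classifier and a loader of unlabeled, already-encoded Arabidopsis sequence windows; the 0.9 confidence threshold is an illustrative choice, not the paper's value.

# Teacher-generated pseudo-labels for the species-specific student (assumed threshold).
import torch

@torch.no_grad()
def make_pseudo_labels(teacher, unlabeled_loader, threshold=0.9):
    """Keep only sequence windows on which the teacher is confident."""
    teacher.eval()
    kept_x, kept_y = [], []
    for x in unlabeled_loader:                    # x: encoded peptide windows
        probs = torch.softmax(teacher(x), dim=1)
        conf, pseudo = probs.max(dim=1)
        mask = conf >= threshold
        kept_x.append(x[mask])
        kept_y.append(pseudo[mask])
    return torch.cat(kept_x), torch.cat(kept_y)

# The student is then trained on labeled Arabidopsis data plus (kept_x, kept_y).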
ABSTRACT
Segment anything model (SAM) has attracted extensive interest as a potent large-scale image segmentation model, with prior efforts adapting it for use in medical imaging. However, the precise segmentation of cell nucleus instances remains a formidable challenge in computational pathology, given substantial morphological variations and the dense clustering of nuclei with unclear boundaries. This study presents an innovative cell segmentation algorithm named CellSAM, which has the potential to improve the effectiveness and precision of disease identification and therapy planning. As a variant of SAM, CellSAM integrates dual image encoders and employs techniques such as knowledge distillation and mask fusion. This innovative model exhibits promising capabilities in capturing intricate cell structures and ensuring adaptability in resource-constrained scenarios. The experimental results indicate that this structure effectively enhances the quality and precision of cell segmentation. Remarkably, CellSAM demonstrates outstanding results even with minimal training data. In evaluations on specific cell segmentation tasks, extensive comparative analyses show that CellSAM outperforms both general foundation models and state-of-the-art (SOTA) task-specific models. Comprehensive evaluation metrics yield scores of 0.884, 0.876, and 0.768 for mean accuracy, recall, and precision, respectively. Extensive experiments show that CellSAM excels in capturing subtle details and complex structures and can segment cells in images accurately. Additionally, CellSAM demonstrates excellent performance on clinical data, indicating its potential for robust applications in treatment planning and disease diagnosis, thereby further improving the efficiency of computer-aided medicine.
ABSTRACT
CRISPR/Cas9 is a popular genome editing technology, yet its clinical application is hindered by off-target effects. Many deep learning-based methods are available for off-target prediction, but few can predict off-target activities with insertions or deletions (indels) between single guide RNA and DNA sequence pairs. Additionally, the analysis of off-target data is challenging due to a data imbalance issue, and prediction accuracy and interpretability remain to be improved. Here, we introduce a deep learning-based framework, named Crispr-SGRU, to predict off-target activities with mismatches and indels. This model is based on Inception and stacked BiGRU and adopts a dice loss function to address the inherent imbalance issue. Experimental results show our model outperforms existing methods for off-target prediction in terms of accuracy and robustness. Finally, we study the interpretability of this model through Deep SHAP and teacher-student-based knowledge distillation, and find it can provide meaningful explanations for sequence patterns regarding off-target activity.
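Since the abstract attributes the handling of class imbalance to a dice loss, here is a minimal sketch of a soft Dice loss for binary off-target prediction; the smoothing constant is an illustrative choice.

# Soft Dice loss for an imbalanced binary prediction task (assumed smoothing constant).
import torch

def dice_loss(probs, targets, eps=1.0):
    """probs: predicted probabilities in [0, 1]; targets: 0/1 labels."""
    probs = probs.flatten()
    targets = targets.float().flatten()
    intersection = (probs * targets).sum()
    # 1 - Dice coefficient; the rare positive class dominates the overlap term,
    # so the loss is far less biased toward the majority (negative) class.
    return 1.0 - (2.0 * intersection + eps) / (probs.sum() + targets.sum() + eps)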
Subjects
CRISPR-Cas Systems; Gene Editing; INDEL Mutation; Gene Editing/methods; Deep Learning; RNA, Guide, CRISPR-Cas Systems/genetics; Humans; Base Pair Mismatch/genetics
ABSTRACT
The YOLOv8-based network for detecting cone buckets in the China University Student Driverless Formula Competition suffers from a complex structure, redundant parameters, and heavy computation, which significantly affect detection efficiency. To address these problems, a lightweight detection model based on YOLOv8 is proposed. The model improves the backbone network, the neck network, and the detection head, and introduces knowledge distillation and other techniques to construct a lightweight model. The specific improvements are as follows: first, the feature-extraction backbone is improved by introducing the ADown module from YOLOv9 to replace the convolution module used for downsampling in the YOLOv8 network; second, the FasterBlock from the FasterNet network is introduced to replace the fusion module in the YOLOv8 C2f block; then a self-developed lightweight detection head is introduced to improve detection performance while remaining lightweight; finally, detection performance is further improved by knowledge distillation. Experimental results on the public FSACOCO dataset show that the improved model achieves an accuracy, recall, and average precision of 92.7%, 84.6%, and 91%, respectively. Compared with the original YOLOv8n detection model, recall and average precision increase by 2.7 and 1.2 percentage points, the memory footprint is halved, and the model computation is 51% of the original. The model significantly reduces missed and false detections of cone buckets in real-vehicle tests while maintaining a detection speed that satisfies the deployment requirements of the tiny on-board devices used in the competition race car. The improved method can be applied to cone bucket detection in complex scenarios, and the underlying ideas carry over to the detection of other small targets.
ABSTRACT
Enhancing deep learning performance requires extensive datasets. Centralized training raises concerns about data ownership and security. Additionally, large models are often unsuitable for hospitals due to their limited resource capacities. Federated learning (FL) has been introduced to address these issues. However, FL faces challenges such as vulnerability to attacks, non-IID data, reliance on a central server, high communication overhead, and suboptimal model aggregation. Furthermore, FL is not optimized for realistic hospital database environments, where data are dynamically accumulated. To overcome these limitations, we propose federated influencer learning (FIL) as a secure and efficient collaborative learning paradigm. Unlike the server-client model of FL, FIL features an equal-status structure among participants, with an administrator overseeing the overall process. FIL comprises four stages: local training, qualification, screening, and influencing. Local training is similar to vanilla FL, except for the optional use of a shared dataset. In the qualification stage, participants are classified as influencers or followers. During the screening stage, the integrity of the logits from the influencer is examined; if the integrity is confirmed, the influencer shares their knowledge with the others. FIL is more secure than FL because it eliminates the need for model-parameter transactions, central servers, and generative models. Additionally, FIL supports model-agnostic training. These features make FIL particularly promising for fields such as healthcare, where maintaining confidentiality is crucial. Our experiments demonstrated the effectiveness of FIL, which outperformed several FL methods on large medical (X-ray, MRI, and PET) and natural (CIFAR-10) image datasets in a dynamically accumulating database environment, with consistently higher precision, recall, and Dice score and a lower standard deviation across participants. In particular, on the PET dataset, FIL achieved about a 40% improvement in Dice score and recall.
Subjects
Computer Security; Databases, Factual; Humans; Cooperative Behavior; Deep Learning; Learning
ABSTRACT
OBJECTIVE: Histological classification is a challenging task due to the diverse appearances, unpredictable variations, and blurry edges of histological tissues. Recently, many approaches based on large networks have achieved satisfactory performance. However, most of these methods rely heavily on substantial computational resources and large high-quality datasets, limiting their practical application. Knowledge Distillation (KD) offers a promising solution by enabling smaller networks to achieve performance comparable to that of larger networks. Nonetheless, KD is hindered by the problem of high-dimensional characteristics, which makes it difficult to capture tiny scattered features and often leads to the loss of edge feature relationships. METHODS: A novel cross-domain visual prompting distillation approach is proposed, compelling the teacher network to facilitate the extraction of significant high-dimensional features into low-dimensional feature maps, thereby aiding the student network in achieving superior performance. Additionally, a dynamic learnable temperature module based on a novel vector-based spatial proximity is introduced to further encourage the student to imitate the teacher. RESULTS: Experiments conducted on the widely accepted histological datasets NCT-CRC-HE-100K and LC25000 demonstrate the effectiveness of the proposed method and validate its robustness on the popular dermoscopic dataset ISIC-2019. Compared to state-of-the-art knowledge distillation methods, the proposed method achieves better performance and greater robustness with optimal domain adaptation. CONCLUSION: A novel distillation architecture, termed VPSP, tailored for histological classification, is proposed. This architecture achieves superior performance with optimal domain adaptation, enhancing the clinical applicability of histological classification. The source code will be released at https://github.com/xiaohongji/VPSP.
Subjects
Algorithms; Humans; Image Processing, Computer-Assisted/methods; Databases, Factual
ABSTRACT
Compared to pixel-level content loss, the domain-level style loss in CycleGAN-based dehazing algorithms imposes only relatively soft constraints on the intermediate translated images, making it difficult to accurately model haze-free features from real hazy scenes. Furthermore, a globally perceptual discriminator may misclassify real hazy images with significant scene-depth variations as clean in style, resulting in severe haze residue. To address these issues, we propose a pseudo-self-distillation-based CycleGAN with enhanced local adversarial interaction for image dehazing, termed PSD-ELGAN. On the one hand, we leverage CycleGAN's ability to generate pseudo image pairs during training. Knowledge distillation is employed in this unsupervised framework to transfer informative high-quality features from the self-reconstruction network of real clean images to the dehazing generator of paired pseudo hazy images, which effectively improves its haze-free feature representation ability without increasing the number of network parameters. On the other hand, from the output of the dehazing generator, four non-uniform image patches severely affected by residual haze are adaptively selected as input samples. The local discriminator can easily distinguish their hazy style, further compelling the dehazing generator to suppress haze residues in these regions and thus enhancing its dehazing performance. Extensive experiments show that PSD-ELGAN achieves promising results and better generality across various datasets.
ABSTRACT
Humans have the ability to continually learn new knowledge. For artificial intelligence, however, attempting to continuously learn new knowledge usually results in catastrophic forgetting, which existing regularization-based and dynamic-structure-based approaches have shown great potential for alleviating. Nevertheless, these approaches have certain limitations: they usually do not fully consider the problem of incompatible feature embeddings, tending to focus only on the features of new or previous classes rather than the model as a whole. Therefore, we propose a two-stage learning paradigm to solve the feature embedding incompatibility problem. Specifically, in the first stage we retain the previous model and freeze all its parameters while dynamically expanding a new module to alleviate feature embedding incompatibility. In the second stage, a fusion knowledge distillation approach is used to compress the redundant feature dimensions. Moreover, we propose weight pruning and consolidation approaches to improve the efficiency of the model. Our experimental results on the CIFAR-100, ImageNet-100, and ImageNet-1000 benchmark datasets show that the proposed approaches achieve the best performance among all compared approaches. For example, on the ImageNet-100 dataset, the maximal accuracy improvement is 5.08%. Code is available at https://github.com/ybyangjing/CIL-FCE.
ABSTRACT
Recent advancements in retinal vessel segmentation, which employ transformer-based and domain-adaptive approaches, show promise in addressing the complexity of ocular diseases such as diabetic retinopathy. However, current algorithms face challenges in effectively accommodating domain-specific variations and the limitations of training datasets, which fail to represent real-world conditions comprehensively. Manual inspection by specialists remains time-consuming despite technological progress in medical imaging, underscoring the pressing need for automated and robust segmentation techniques. Additionally, these methods have deficiencies in handling unlabeled target sets, requiring extra preprocessing steps and manual intervention, which hinders their scalability and practical application in clinical settings. This research introduces a novel framework that employs semi-supervised domain adaptation and contrastive pre-training to address these limitations. The proposed model learns effectively from target data by implementing a novel pseudo-labeling approach and feature-based knowledge distillation within a temporal convolutional network (TCN), extracting robust, domain-independent features. This approach strengthens cross-domain adaptation, significantly improving the model's versatility and performance in clinical settings. The semi-supervised domain adaptation component overcomes the challenges posed by domain shifts, while pseudo-labeling exploits the data's inherent structure for enhanced learning, which is particularly beneficial when labeled data are scarce. Evaluated on the DRIVE and CHASE_DB1 datasets, which contain clinical fundus images, the proposed model achieves outstanding performance, with accuracy, sensitivity, specificity, and AUC values of 0.9792, 0.8640, 0.9901, and 0.9868 on DRIVE, and 0.9830, 0.9058, 0.9888, and 0.9950 on CHASE_DB1, respectively, outperforming current state-of-the-art vessel segmentation methods. The partitioning of the datasets into training and testing sets ensures thorough validation, while extensive ablation studies, including sensitivity analysis of the model's parameters and of different percentages of labeled data, further validate its robustness.
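As a generic illustration of the feature-based knowledge distillation mentioned above (the TCN backbone and the paper's exact matching scheme are not reproduced), the sketch below projects a student feature map to the teacher's channel width and penalizes the mean-squared difference; the 1x1 projection, the single matched stage, and the assumption of 2-D feature maps are all illustrative choices.

# Feature-matching ("hint") distillation sketch; layer choice and projection are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureHint(nn.Module):
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # 1x1 convolution maps student channels onto the teacher's feature space.
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        aligned = self.proj(student_feat)
        if aligned.shape[-2:] != teacher_feat.shape[-2:]:
            aligned = F.interpolate(aligned, size=teacher_feat.shape[-2:])
        # The teacher features are detached so only the student is updated.
        return F.mse_loss(aligned, teacher_feat.detach())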
ABSTRACT
Black tea is the second most common type of tea in China. Fermentation is one of the most critical processes in its production, and either insufficient or excessive fermentation affects the quality of the finished product. At present, the degree of black tea fermentation is determined entirely by human experience, which leads to inconsistent quality. To solve this problem, this paper uses machine vision to discriminate the degree of black tea fermentation from images and proposes a lightweight convolutional neural network (CNN) combined with knowledge distillation for this task. After comparing 12 CNN models, and taking into account model size, discrimination performance, and the selection principles for teacher models, Shufflenet_v2_x1.0 is selected as the student model and Efficientnet_v2 as the teacher model. CrossEntropy Loss is then replaced by Focal Loss. Finally, four knowledge distillation methods, Soft Target Knowledge Distillation (ST), Masked Generative Distillation (MGD), Similarity-Preserving Knowledge Distillation (SPKD), and Attention Transfer (AT), are tested with Distillation Loss ratios of 0.6, 0.7, 0.8, and 0.9 for their performance in distilling knowledge into the Shufflenet_v2_x1.0 model. The results show that the distilled model performs best when the Distillation Loss ratio is 0.8 and the MGD method is used. This setup effectively improves discrimination performance without increasing the number of parameters or the computation volume. The model's P, R, and F1 values reach 0.9208, 0.9190, and 0.9192, respectively, achieving precise discrimination of the fermentation degree of black tea. This meets the requirements of objective black tea fermentation judgment and provides technical support for the intelligent processing of black tea.
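The abstract replaces cross-entropy with Focal Loss; a minimal multi-class focal loss sketch is shown below, with gamma = 2.0 as a common default rather than the paper's setting.

# Multi-class focal loss as a drop-in replacement for cross-entropy (assumed gamma).
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    log_p = F.log_softmax(logits, dim=1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log prob of true class
    pt = log_pt.exp()
    # Down-weight easy samples (pt close to 1) so hard samples dominate the update.
    return (-(1.0 - pt) ** gamma * log_pt).mean()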
Subjects
Fermentation; Neural Networks, Computer; Tea; Tea/chemistry; Distillation/methods; Camellia sinensis/chemistry; China
ABSTRACT
Federated learning enables multiple devices to collaboratively train a high-performance model on a central server while keeping their data on the devices themselves. However, due to the significant variability in data distribution across devices, the optimization direction of the aggregated global model may differ from that of the local models, causing clients to lose their personalization. To address this challenge, we propose a Bidirectional Decoupled Distillation for Heterogeneous Federated Learning (BDD-HFL) approach, which incorporates an additional private model within each local client. This design enables mutual knowledge exchange between the private and local models in a bidirectional manner. Specifically, previous one-way federated distillation methods mainly focused on learning features from the target class, which limits their ability to distill features from non-target classes and hinders the convergence of local models. To overcome this limitation, we decompose the network output into target and non-target class logits and distill them separately using a joint optimization of cross-entropy and decoupled relative-entropy losses. We evaluate the effectiveness of BDD-HFL through extensive experiments on three benchmarks under IID, non-IID, and unbalanced data distribution scenarios. Our results show that BDD-HFL outperforms state-of-the-art federated distillation methods across five baselines, achieving up to a 3% improvement in average classification accuracy on the CIFAR-10, CIFAR-100, and MNIST datasets. The experiments demonstrate the superiority and generalization capability of BDD-HFL in addressing personalization challenges in federated learning.
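A minimal sketch in the spirit of the target/non-target decoupling described above: the softened outputs are split into a binary target-class part and a re-normalised non-target part, each distilled with its own weight. The temperature and the weights are illustrative assumptions, and the cross-entropy term and federated bookkeeping are omitted.

# Decoupled target / non-target distillation sketch (assumed T, w_t, w_nt).
import torch
import torch.nn.functional as F

def decoupled_kd(student_logits, teacher_logits, labels, T=2.0, w_t=1.0, w_nt=2.0):
    s = F.softmax(student_logits / T, dim=1)
    t = F.softmax(teacher_logits / T, dim=1)
    tgt = F.one_hot(labels, s.size(1)).bool()

    # Target part: compare the binary split [p(target), p(not target)].
    s_t, t_t = s[tgt], t[tgt]
    s_bin = torch.stack([s_t, 1 - s_t], dim=1).clamp_min(1e-8)
    t_bin = torch.stack([t_t, 1 - t_t], dim=1).clamp_min(1e-8)
    target_kl = F.kl_div(s_bin.log(), t_bin, reduction="batchmean")

    # Non-target part: re-normalise the remaining classes and compare them.
    mask = (~tgt).float()
    s_nt = s * mask
    s_nt = s_nt / s_nt.sum(dim=1, keepdim=True)
    t_nt = t * mask
    t_nt = t_nt / t_nt.sum(dim=1, keepdim=True)
    non_target_kl = (t_nt * ((t_nt + 1e-8) / (s_nt + 1e-8)).log()).sum(dim=1).mean()

    return (w_t * target_kl + w_nt * non_target_kl) * T * T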
ABSTRACT
Colorectal cancer (CRC) is a clinically common malignancy and the second leading cause of cancer-related deaths. Recent studies have identified T-cell exhaustion as playing a crucial role in the pathogenesis of CRC. A long-standing challenge in the clinical management of CRC is to understand how T cells function during its progression and metastasis, and whether potential therapeutic targets for CRC treatment can be predicted through T cells. Here, we propose DeepTEX, a multi-omics deep learning approach that integrates cross-modal data to investigate the heterogeneity of T-cell exhaustion in CRC. DeepTEX uses a domain adaptation model to align the data distributions of two different modalities and applies a cross-modal knowledge distillation model to predict the heterogeneity of T-cell exhaustion across diverse patients, identifying key functional pathways and genes. DeepTEX offers valuable insights into the application of deep learning in multi-omics, providing crucial data for exploring the stages of T-cell exhaustion associated with CRC and relevant therapeutic targets.
Subjects
Colorectal Neoplasms; RNA-Seq; Single-Cell Analysis; T-Lymphocytes; Colorectal Neoplasms/genetics; Colorectal Neoplasms/pathology; Colorectal Neoplasms/immunology; Humans; Single-Cell Analysis/methods; RNA-Seq/methods; T-Lymphocytes/immunology; T-Lymphocytes/metabolism; Deep Learning; Sequence Analysis, RNA/methods; Gene Expression Regulation, Neoplastic; Computational Biology/methods; Gene Expression Profiling/methods; T-Cell Exhaustion
ABSTRACT
Objective. In this study, we propose a semi-supervised learning (SSL) scheme using a patch-based deep learning (DL) framework to tackle the challenge of high-precision classification of seven lung tumor growth patterns, despite having only a small amount of labeled data in whole slide images (WSIs). This scheme aims to enhance generalization ability with limited data and reduce dependence on large amounts of labeled data, addressing the common challenge of high demand for labeled data in medical image analysis. Approach. To address these challenges, the study employs an SSL approach enhanced by a dynamic confidence threshold mechanism, which adjusts according to the quantity and quality of the pseudo-labels generated. This dynamic thresholding helps avoid both the class imbalance of pseudo-labels and the low number of pseudo-labels that can result from a higher fixed threshold. Furthermore, the research introduces a multi-teacher knowledge distillation (MTKD) technique, which adaptively weights predictions from multiple teacher models to transfer reliable knowledge and safeguard the student model from low-quality teacher predictions. Main results. The framework underwent rigorous training and evaluation using a dataset of 150 WSIs, each representing one of the seven growth patterns. The experimental results demonstrate that the framework is highly accurate in classifying lung tumor growth patterns in histopathology images; notably, its performance is comparable to that of fully supervised models and human pathologists. In addition, the framework's evaluation metrics on a publicly available dataset are higher than those of previous studies, indicating good generalizability. Significance. This research demonstrates that an SSL approach can achieve results comparable to fully supervised models and expert pathologists, opening new possibilities for efficient and cost-effective medical image analysis. The implementation of dynamic confidence thresholding and MTKD techniques represents a significant advancement in applying DL to complex medical image analysis tasks and could lead to faster and more accurate diagnoses, ultimately improving patient outcomes and fostering the overall progress of healthcare technology.
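The MTKD idea of adaptively weighting several teachers can be sketched as below; the abstract does not specify the weighting rule, so weighting each teacher by its per-sample softmax confidence is purely an illustrative assumption.

# Multi-teacher soft-label ensemble with per-sample adaptive weights (assumed rule).
import torch
import torch.nn.functional as F

@torch.no_grad()
def ensemble_soft_labels(teachers, x, T=2.0):
    probs = torch.stack([F.softmax(t(x) / T, dim=1) for t in teachers])  # (K, B, C)
    conf = probs.max(dim=2).values                   # (K, B) peak confidence per teacher
    weights = F.softmax(conf, dim=0).unsqueeze(-1)   # normalise across the K teachers
    return (weights * probs).sum(dim=0)              # (B, C) blended soft labels

def mtkd_loss(student_logits, soft_labels, T=2.0):
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    soft_labels, reduction="batchmean") * T * T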
Subjects
Adenocarcinoma of Lung; Image Processing, Computer-Assisted; Lung Neoplasms; Humans; Lung Neoplasms/diagnostic imaging; Lung Neoplasms/pathology; Adenocarcinoma of Lung/diagnostic imaging; Adenocarcinoma of Lung/pathology; Image Processing, Computer-Assisted/methods; Supervised Machine Learning; Deep Learning
ABSTRACT
While Graph Neural Networks (GNNs) have demonstrated their effectiveness in processing non-Euclidean structured data, the neighborhood fetching of GNNs is time-consuming and computationally intensive, making them difficult to deploy in low-latency industrial applications. A feasible solution is graph knowledge distillation (KD), which can learn high-performance student Multi-layer Perceptrons (MLPs) to replace GNNs by mimicking the superior output of teacher GNNs. However, state-of-the-art graph knowledge distillation methods are mainly based on distilling deep features from intermediate hidden layers, which causes the significance of logit-layer distillation to be greatly overlooked. To provide a novel viewpoint for studying logits-based KD methods, we introduce the idea of decoupling into graph knowledge distillation. Specifically, we first reformulate the classical graph knowledge distillation loss into two parts, i.e., the target class graph distillation (TCGD) loss and the non-target class graph distillation (NCGD) loss. Next, we decouple the negative correlation between the GNN's prediction confidence and the NCGD loss, and eliminate the fixed weight between TCGD and NCGD. We name this logits-based method Decoupled Graph Knowledge Distillation (DGKD). It can flexibly adjust the weights of TCGD and NCGD for different data samples, thereby improving the prediction accuracy of the student MLP. Extensive experiments conducted on public benchmark datasets show the effectiveness of our method. Additionally, DGKD can be incorporated into any existing graph knowledge distillation framework as a plug-and-play loss function, further improving distillation performance. The code is available at https://github.com/xsk160/DGKD.
Subjects
Neural Networks, Computer; Machine Learning; Algorithms; Knowledge; Logistic Models
ABSTRACT
This work addresses the challenge of democratizing advanced Large Language Models (LLMs) by compressing their mathematical reasoning capabilities into sub-billion-parameter Small Language Models (SLMs) without compromising performance. We introduce Equation-of-Thought Distillation (EoTD), a novel technique that encapsulates the reasoning process in equation-based representations to construct an EoTD dataset for fine-tuning SLMs. Additionally, we propose the Ensemble Thoughts Distillation (ETD) framework to enhance the reasoning performance of SLMs. This involves creating a reasoning dataset with multiple thought processes, including Chain-of-Thought (CoT), Program-of-Thought (PoT), and Equation-of-Thought (EoT), and using it for fine-tuning. Our experimental results demonstrate that EoTD significantly boosts the reasoning abilities of SLMs, while ETD enables these models to achieve state-of-the-art reasoning performance.
Subjects
Language; Humans; Thinking/physiology; Neural Networks, Computer; Models, Theoretical; Mathematics
ABSTRACT
Data heterogeneity (non-IID data) in Federated Learning (FL) is a widely recognized problem that leads to local model drift and performance degradation. Because of its advantages, knowledge distillation has been explored in recent work to refine global models. However, these approaches rely on a proxy dataset or a data generator: in many FL scenarios a proxy dataset does not necessarily exist on the server, and the quality of generator-produced data is unstable while the generator itself depends on the server's computing resources. In this work, we propose a novel data-free knowledge distillation approach via generator-free data generation for non-IID FL, dubbed FedF2DG. Specifically, FedF2DG requires only local models to generate pseudo datasets for each client, and can generate hard samples by adding a regularization term that exploits disagreements between the local model and the global model. Meanwhile, FedF2DG enables flexible utilization of computational resources by generating the pseudo dataset either locally or on the server. To address the label distribution shift in non-IID FL, we propose a Data Generation Principle that adaptively controls the label distribution and size of the pseudo dataset based on the client's current state, which allows more client knowledge to be extracted. Knowledge distillation is then performed to transfer the knowledge in the local models to the global model. Extensive experiments demonstrate that the proposed method significantly outperforms state-of-the-art FL methods and can serve as a plug-in for existing federated learning methods such as FedAvg and FedProx, improving their performance.
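A minimal sketch of generator-free pseudo-data synthesis in the spirit described above: random inputs are optimised so that the local model fits a chosen label while a disagreement regularizer pushes the local prediction away from the global model's. The objective form, weights, and step counts are illustrative assumptions, and both models are assumed frozen and in eval mode.

# Generator-free pseudo-sample synthesis with a disagreement regularizer (assumed setup).
import torch
import torch.nn.functional as F

def synthesize(local_model, global_model, labels, shape=(3, 32, 32),
               steps=200, lr=0.1, beta=1.0):
    x = torch.randn(labels.size(0), *shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        local_logits = local_model(x)
        with torch.no_grad():
            global_probs = F.softmax(global_model(x), dim=1)
        # Fit the target labels under the local model ...
        ce = F.cross_entropy(local_logits, labels)
        # ... while pushing the local prediction away from the global one (hard samples).
        disagree = F.kl_div(F.log_softmax(local_logits, dim=1),
                            global_probs, reduction="batchmean")
        loss = ce - beta * disagree
        loss.backward()
        opt.step()
    return x.detach(), labels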
Subjects
Machine Learning; Knowledge; Neural Networks, Computer; Algorithms
ABSTRACT
Continuous Sign Language Recognition (CSLR) is the task of converting a sign language video into a gloss sequence. Existing deep learning-based sign language recognition methods usually rely on large-scale training data and rich supervised information. However, current sign language datasets are limited, and they are annotated only at the sentence level rather than the frame level. Inadequate supervision of sign language data poses a serious challenge for sign language recognition and may result in insufficient training of recognition models. To address these problems, we propose a cross-modal knowledge distillation method for continuous sign language recognition, which contains two teacher models and one student model. One teacher model is the Sign2Text dialogue teacher model, which takes a sign language video and a dialogue sentence as input and outputs the sign language recognition result. The other teacher model is the Text2Gloss translation teacher model, which aims to translate a text sentence into a gloss sequence. Both teacher models can provide information-rich soft labels to assist the training of the student model, which is a general sign language recognition model. We conduct extensive experiments on multiple commonly used sign language datasets, i.e., PHOENIX 2014T, CSL-Daily and QSL; the results show that the proposed cross-modal knowledge distillation method can effectively improve sign language recognition accuracy by transferring multi-modal information from the teacher models to the student model. Code is available at https://github.com/glq-1992/cross-modal-knowledge-distillation_new.
Subjects
Deep Learning; Sign Language; Humans; Neural Networks, Computer; Distillation/methods
ABSTRACT
This paper introduces an innovative image classification technique utilizing knowledge distillation, tailored to a lightweight model structure. The core of the approach is a modified version of the AlexNet architecture, enhanced with depthwise-separable convolution layers. A unique aspect of this work is the Teacher-Student Collaborative Knowledge Distillation (TSKD) method. Unlike conventional knowledge distillation techniques, TSKD employs a dual-layered learning strategy in which the student model learns from both the final output and the intermediate layers of the teacher model. This collaborative learning approach enables the student model to engage actively in the learning process, resulting in more efficient knowledge transfer. The paper emphasizes the model's suitability for scenarios with limited computational resources, achieved through architectural optimizations and the introduction of specialized loss functions that balance the trade-off between model complexity and computational efficiency. The study demonstrates that, despite its lightweight nature, the model maintains high accuracy and robustness in image classification tasks. Key contributions of the paper include the innovative use of depthwise-separable convolution in AlexNet, the TSKD approach for enhanced knowledge transfer, and the development of unique loss functions. These advancements collectively contribute to the model's effectiveness in environments with computational constraints, making it a valuable contribution to the field of image classification.
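For reference, a minimal sketch of the depthwise-separable convolution block used to slim a backbone such as AlexNet: a per-channel depthwise convolution followed by a 1x1 pointwise convolution; the BatchNorm and ReLU placement here is a common convention, not necessarily the paper's.

# Depthwise-separable convolution block (generic sketch, assumed normalization/activation).
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super().__init__()
        # Depthwise: one spatial filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride, padding,
                                   groups=in_ch, bias=False)
        # Pointwise: 1x1 convolution mixes channels at a fraction of the parameters.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))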