Results 1 - 20 of 227
1.
Proc Natl Acad Sci U S A ; 121(14): e2317422121, 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38530895

ABSTRACT

Stochastic reaction networks are widely used to model stochastic systems across diverse domains such as biology, chemistry, physics, and ecology. However, understanding the dynamic behavior of stochastic reaction networks is a formidable undertaking, primarily because the number of possible states or trajectories grows exponentially as the dimension of the state space increases. In this study, we introduce a knowledge distillation method based on reinforcement learning principles that compresses the dynamical knowledge encoded in stochastic reaction networks into a single neural network. When prompted with rate parameters, initial conditions, and time values, the trained neural network accurately predicts the state-conditional joint probability distribution corresponding to the given query context. This obviates the need to track the dynamical process and enables direct estimation of normalized state and trajectory probabilities without integrating over the complete state space. Applying our method to representative examples, we observed a high degree of accuracy in both multimodal and high-dimensional systems. Additionally, the trained neural network can serve as a foundation model for developing efficient algorithms for parameter inference and trajectory ensemble generation. Together, these results underscore the efficacy of our approach as a universal means of distilling knowledge from stochastic reaction networks. Importantly, our methodology also highlights the potential of harnessing a single, pretrained, large-scale model to encapsulate the solution space of a wide spectrum of stochastic dynamical systems.
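
The abstract contrasts the trained network with approaches that must track the dynamical process. For context, the sketch below shows that baseline on an assumed toy system (a one-species birth-death network, not one of the paper's examples): Gillespie's direct method estimates the state distribution at a given time by simulating many trajectories, work that must be repeated for every new rate parameter or initial condition. The function names are hypothetical.

```python
import random


def gillespie_birth_death(k, gamma, x0, t_end, rng):
    """Simulate one trajectory of the toy network 0 -> X (rate k),
    X -> 0 (rate gamma * x) with Gillespie's direct method and
    return the copy number of X at time t_end."""
    t, x = 0.0, x0
    while True:
        a_birth = k            # propensity of the birth reaction
        a_death = gamma * x    # propensity of the death reaction
        a_total = a_birth + a_death
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            return x
        # Pick which reaction fires, proportionally to its propensity.
        if rng.random() * a_total < a_birth:
            x += 1
        else:
            x -= 1


def empirical_distribution(k, gamma, x0, t_end, n_samples, seed=0):
    """Estimate P(X = x at t_end) from n_samples independent trajectories."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_samples):
        x = gillespie_birth_death(k, gamma, x0, t_end, rng)
        counts[x] = counts.get(x, 0) + 1
    return {x: c / n_samples for x, c in sorted(counts.items())}
```

For this particular network the distribution at long times approaches a Poisson distribution with mean k/gamma, which makes it a convenient sanity check; a learned model of the kind described would instead return such a distribution directly from (k, gamma, x0, t).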

2.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38145950

ABSTRACT

Single-cell sequencing technology has provided unprecedented opportunities to comprehensively decipher cell heterogeneity. Nevertheless, the high dimensionality and intricate nature of cell heterogeneity present substantial challenges to computational methods. Numerous novel clustering methods have been proposed to address this issue; however, none of them performs consistently better across different biological scenarios. In this study, we developed CAKE, a novel and scalable self-supervised clustering method that consists of a contrastive learning model with mixture neighborhood augmentation for cell representation learning and a self-Knowledge Distiller model for refining the clustering results. These designs yield more condensed, cluster-friendly cell representations and improve clustering accuracy and robustness. Furthermore, in addition to accurately identifying the major cell types, CAKE can also find more biologically meaningful cell subgroups and rare cell types. Comprehensive experiments on real single-cell RNA sequencing datasets demonstrated the superiority of CAKE over competing methods in visualization and clustering and indicated its broad applicability to cell heterogeneity analysis. Contact: Ruiqing Zheng (rqzheng@csu.edu.cn).


Subjects
Algorithms , Learning , Cluster Analysis , Sequence Analysis, RNA
3.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37099690

ABSTRACT

Rapid and accurate prediction of drug-target affinity can accelerate and improve the drug discovery process. Recent studies show that deep learning models may be able to provide fast and accurate drug-target affinity predictions. However, existing deep learning models still have shortcomings that prevent them from completing the task satisfactorily: complex-based models rely heavily on a time-consuming docking process, and complex-free models lack interpretability. In this study, we introduced a novel knowledge-distillation-inspired drug-target affinity prediction model with feature fusion inputs that makes fast, accurate, and explainable predictions. We benchmarked the model on public affinity prediction and virtual screening datasets. The results show that it outperformed previous state-of-the-art models and achieved performance comparable to previous complex-based models. Finally, we studied the interpretability of this model through visualization and found that it can provide meaningful explanations of pairwise interactions. We believe this model can further improve drug-target affinity prediction owing to its higher accuracy and reliable interpretability.


Subjects
Benchmarking , Drug Discovery , Drug Delivery Systems
4.
J Cell Mol Med ; 28(18): e70101, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39344205

ABSTRACT

Colorectal cancer (CRC) is a relatively common malignancy and the second leading cause of cancer-related deaths. Recent studies have identified T-cell exhaustion as playing a crucial role in the pathogenesis of CRC. A long-standing challenge in the clinical management of CRC is understanding how T cells function during its progression and metastasis, and whether T cells can be used to predict potential therapeutic targets for CRC treatment. Here, we propose DeepTEX, a multi-omics deep learning approach that integrates cross-modal data to investigate the heterogeneity of T-cell exhaustion in CRC. DeepTEX uses a domain adaptation model to align the data distributions of two different modalities and applies a cross-modal knowledge distillation model to predict the heterogeneity of T-cell exhaustion across diverse patients, identifying key functional pathways and genes. DeepTEX offers valuable insights into the application of deep learning to multi-omics and provides crucial data for exploring the stages of T-cell exhaustion associated with CRC and relevant therapeutic targets.


Subjects
Colorectal Neoplasms , RNA-Seq , Single-Cell Analysis , T-Lymphocytes , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology , Colorectal Neoplasms/immunology , Humans , Single-Cell Analysis/methods , RNA-Seq/methods , T-Lymphocytes/immunology , T-Lymphocytes/metabolism , Deep Learning , Sequence Analysis, RNA/methods , Gene Expression Regulation, Neoplastic , Computational Biology/methods , Gene Expression Profiling/methods , T-Cell Exhaustion
5.
Biol Proced Online ; 26(1): 10, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38632527

ABSTRACT

BACKGROUND: Neoadjuvant therapy followed by surgery has become the standard of care for locally advanced esophageal squamous cell carcinoma (ESCC), and accurate pathological response assessment is critical for evaluating therapeutic efficacy. However, this assessment is laborious, and different observers may assess the same specimen inconsistently. Hence, we aimed to develop an interpretable deep-learning model for efficient pathological response assessment following neoadjuvant therapy in ESCC. METHODS: This retrospective study analyzed 337 ESCC resection specimens from 2020-2021 at the Pudong-Branch (Cohort 1) and 114 from 2021-2022 at the Puxi-Branch (External Cohort 2) of Fudan University Shanghai Cancer Center. Whole slide images (WSIs) from these two cohorts were generated using different scanners to test the model's ability to handle color variations. Four pathologists independently assessed the pathological response, and the senior pathologists annotated tumor beds and residual tumor percentages on the WSIs to determine consensus labels. Furthermore, 1850 image patches were randomly extracted from Cohort 1 WSIs and binarily classified for tumor viability. A deep-learning model employing knowledge distillation was developed to automatically classify positive patches for each WSI and estimate the viable residual tumor percentage. Spatial heatmaps were output for model explanation and visualization. RESULTS: The approach achieved high concordance with the pathologist consensus, with an R^2 of 0.8437, a RAcc_0.1 of 0.7586, and a RAcc_0.3 of 0.9885. These values were comparable to those of two senior pathologists (R^2 of 0.9202/0.9619, RAcc_0.1 of 0.8506/0.9425, RAcc_0.3 of 1.000/1.000) and surpassed those of two junior pathologists (R^2 of 0.5592/0.5474, RAcc_0.1 of 0.5287/0.5287, RAcc_0.3 of 0.9080/0.9310). Visualizations enabled localization of residual viable tumor to augment microscopic assessment.
CONCLUSION: This work illustrates the potential of deep learning to assist pathological response assessment. Spatial heatmaps and patch examples provide intuitive explanations of model predictions, engendering clinical trust and adoption (code and data will be available at https://github.com/WinnieLaugh/ESCC_Percentage once the paper has been conditionally accepted). Integrating interpretable computational pathology could help enhance the efficiency and consistency of tumor response assessment and empower precise oncology treatment decisions.
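
The abstract reports R^2, RAcc_0.1, and RAcc_0.3 but does not define RAcc. The sketch below assumes RAcc_δ is the fraction of slides whose predicted residual tumor percentage (as a fraction in [0, 1]) lies within δ of the consensus value, and assumes the slide-level percentage is the share of tumor-bed patches classified as viable. Both readings, the 0.5 threshold, and the helper names are hypothetical.

```python
def residual_tumor_fraction(patch_probs, threshold=0.5):
    """Hypothetical slide-level estimate: share of tumor-bed patches whose
    predicted viability probability reaches the threshold."""
    if not patch_probs:
        return 0.0
    positives = sum(1 for p in patch_probs if p >= threshold)
    return positives / len(patch_probs)


def r_squared(y_true, y_pred):
    """Coefficient of determination of predictions against consensus labels."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def racc(y_true, y_pred, tol):
    """Assumed definition: fraction of slides with |prediction - label| <= tol."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tol)
    return hits / len(y_true)
```

Under this reading, RAcc_0.3 is always at least RAcc_0.1, which is consistent with every pair of values reported above.
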

6.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34929738

ABSTRACT

The prediction of drug-target affinity (DTA) plays an increasingly important role in drug discovery. Many current prediction methods focus on the feature encoding of drugs and proteins but ignore the importance of feature aggregation. Moreover, increasingly complex encoder networks lead to the loss of implicit information and excessive model size. To this end, we propose a deep-learning-based approach named FusionDTA. To address the loss of implicit information, a novel multi-head linear attention mechanism replaces the coarse pooling method. This allows FusionDTA to aggregate global information based on attention weights instead of selecting the largest value as max-pooling does. To address parameter redundancy, we applied knowledge distillation in FusionDTA, transferring learnable information from a teacher model to a student model. Results show that FusionDTA outperforms existing models in the test domain on all evaluation metrics. We obtained concordance index (CI) values of 0.913 and 0.906 on the Davis and KIBA datasets, respectively, compared with 0.893 and 0.891 for the previous state-of-the-art model. Under the cold-start constraint, our model proved more robust and more effective with unseen inputs than the baseline methods. In addition, knowledge distillation halved the number of model parameters with only a 0.006 reduction in CI. Even FusionDTA with half the parameters easily exceeded the baselines on all metrics. Overall, our model achieves superior performance and improves drug-target interaction (DTI) prediction. Visualization of DTI can effectively help predict the binding regions of proteins during structure-based drug design.


Subjects
Drug Development , Proteins , Drug Discovery , Humans , Knowledge , Proteins/chemistry
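
The concordance index reported on the Davis and KIBA benchmarks measures how often a model ranks pairs of drug-target affinities in the correct order. What follows is a minimal O(n^2) sketch of the commonly used definition, in which pairs with equal true affinity are skipped and tied predictions count as half-concordant. The function name is hypothetical.

```python
def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs (different true affinities) whose
    predictions are ordered the same way; tied predictions count 0.5."""
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # pairs with equal true affinity are not comparable
            comparable += 1
            # Orient the pair so that `hi` has the larger true affinity.
            hi, lo = (i, j) if y_true[i] > y_true[j] else (j, i)
            if y_pred[hi] > y_pred[lo]:
                concordant += 1.0
            elif y_pred[hi] == y_pred[lo]:
                concordant += 0.5
    return concordant / comparable
```

A CI of 0.5 corresponds to random ordering and 1.0 to perfect ranking, so the reported gain from 0.893 to 0.913 on Davis means roughly two more correctly ordered pairs per hundred.
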
7.
J Biomed Inform ; 154: 104651, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38703936

ABSTRACT

OBJECTIVE: Chatbots have the potential to improve user compliance in electronic Patient-Reported Outcome (ePRO) systems. Compared with rule-based chatbots, Large Language Models (LLMs) offer advantages such as a simpler development process and greater conversational flexibility. However, practical applications of LLMs in ePRO systems are currently lacking. Therefore, this study used ChatGPT to develop the Chat-ePRO system and designed a pilot study to explore the feasibility of building an ePRO system based on an LLM. MATERIALS AND METHODS: This study employed prompt engineering and offline knowledge distillation to design a dialogue algorithm and built the Chat-ePRO system on the WeChat Mini Program platform. To compare Chat-ePRO with the form-based ePRO and the rule-based chatbot ePRO used in previous studies, we conducted a pilot study that applied the three ePRO systems sequentially at the Sir Run Run Shaw Hospital to collect patients' PRO data. RESULT: Chat-ePRO correctly generated conversations based on PRO forms (success rate: 95.7 %) and accurately extracted PRO data from the conversations in real time (Macro-F1: 0.95). The majority of subjective evaluations from doctors (>70 %) suggest that Chat-ePRO comprehends questions and consistently generates responses. The pilot study showed that Chat-ePRO achieved a higher response rate (9/10, 90 %) and a longer interaction time (10.86 s/turn) than the other two methods. CONCLUSION: Our study demonstrated the feasibility of using algorithms such as prompt engineering to drive an LLM to complete ePRO data collection tasks and validated that the Chat-ePRO system can effectively enhance patient compliance.


Subjects
Algorithms , Patient Reported Outcome Measures , Pilot Projects , Humans , Male , Female , Electronic Health Records , Middle Aged , Adult
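
Macro-F1, used above to score PRO data extraction, averages the per-class F1 scores so that rare answer categories weigh as much as common ones. The sketch below uses hypothetical symptom-severity labels; how the study grouped extracted fields into classes is not stated in the abstract.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Hypothetical extracted answers versus the answers patients actually gave.
truth = ["none", "mild", "mild", "severe"]
extracted = ["none", "mild", "severe", "severe"]
```

Here `macro_f1(truth, extracted)` is 7/9: the "none" class scores 1.0 while "mild" and "severe" each score 2/3 because of the single confusion between them.
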
8.
J Biomed Inform ; 158: 104728, 2024 Sep 21.
Article in English | MEDLINE | ID: mdl-39307515

ABSTRACT

OBJECTIVE: Histological classification is a challenging task due to the diverse appearances, unpredictable variations, and blurry edges of histological tissues. Recently, many approaches based on large networks have achieved satisfactory performance. However, most of these methods rely heavily on substantial computational resources and large high-quality datasets, limiting their practical application. Knowledge Distillation (KD) offers a promising solution by enabling smaller networks to achieve performance comparable to that of larger networks. Nonetheless, KD is hindered by the problem of high-dimensional characteristics, which makes it difficult to capture tiny scattered features and often leads to the loss of edge feature relationships. METHODS: A novel cross-domain visual prompting distillation approach is proposed, compelling the teacher network to facilitate the extraction of significant high-dimensional features into low-dimensional feature maps, thereby aiding the student network in achieving superior performance. Additionally, a dynamic learnable temperature module based on novel vector-based spatial proximity is introduced to further encourage the student to imitate the teacher. RESULTS: Experiments conducted on widely accepted histological datasets, NCT-CRC-HE-100K and LC25000, demonstrate the effectiveness of the proposed method and validate its robustness on the popular dermoscopic dataset ISIC-2019. Compared to state-of-the-art knowledge distillation methods, the proposed method achieves better performance and greater robustness with optimal domain adaptation. CONCLUSION: A novel distillation architecture, termed VPSP, tailored for histological classification, is proposed. This architecture achieves superior performance with optimal domain adaptation, enhancing the clinical application of histological classification. The source code will be released at https://github.com/xiaohongji/VPSP.
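
VPSP learns its distillation temperature from a vector-based spatial proximity measure that the abstract does not specify, so that module is not reproduced here. The sketch only illustrates why a temperature is worth learning: dividing the logits by a larger temperature flattens the teacher's softmax output and raises its entropy, exposing more of the information carried by the non-maximal classes. The logits are made-up values.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax with max subtraction for stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


teacher_logits = [4.0, 1.0, 0.5]  # hypothetical teacher outputs for 3 tissue classes
entropies = {t: entropy(softmax(teacher_logits, t)) for t in (1.0, 2.0, 4.0)}
```

With these logits the entropy grows from about 0.31 nats at T = 1 toward the uniform bound log 3 (about 1.10 nats) as T increases.
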

9.
J Biomech Eng ; 146(3)2024 Mar 01.
Article in English | MEDLINE | ID: mdl-37490328

ABSTRACT

Accurate occupant injury prediction in near-collision scenarios is vital for guiding intelligent vehicles toward the collision condition with minimal injury risk. Existing studies focused on boosting prediction performance by introducing deep-learning models but encountered computational burdens due to their inherently high model complexity. To better balance these two traditionally contradictory factors, this study proposed a training method for pre-crash injury prediction models, namely knowledge distillation (KD)-based training. This method was inspired by knowledge distillation, an emerging model compression method. Technically, we first trained a high-accuracy injury prediction model using informative post-crash sequence inputs (i.e., vehicle crash pulses) and a relatively complex network architecture as an experienced "teacher". Following this, a lightweight pre-crash injury prediction model ("student") learned both from the ground truth at the output layer (i.e., conventional prediction loss) and from its teacher at intermediate layers (i.e., distillation loss). In this step-by-step teaching framework, the pre-crash model significantly improved the prediction accuracy of the occupant's head Abbreviated Injury Scale (AIS) (i.e., from 77.2% to 83.2%) without sacrificing computational efficiency. Multiple validation experiments demonstrated the effectiveness of the proposed KD-based training framework. This study is expected to provide a reference for balancing the prediction accuracy and computational efficiency of pre-crash injury prediction models, promoting further safety improvements in next-generation intelligent vehicles.


Subjects
Accidents, Traffic , Wounds and Injuries , Humans , Risk , Abbreviated Injury Scale
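
The student in this framework learns from the ground truth at the output and from the teacher at intermediate layers. Below is a minimal sketch of such a combined loss, assuming cross-entropy for the AIS class prediction, mean squared error for feature matching, hidden vectors of equal size, and a hypothetical weight `lam`. In practice the teacher's features would come from the post-crash model and be held fixed; if the sizes differed, a learned projection would normally be inserted first (not shown).

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class (clipped to avoid log(0))."""
    return -math.log(max(probs[label], 1e-12))


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def kd_training_loss(student_probs, label, student_hidden, teacher_hidden, lam=0.5):
    """Prediction loss on the ground truth plus a distillation loss pulling the
    student's intermediate features toward the frozen teacher's features."""
    prediction_loss = cross_entropy(student_probs, label)
    distillation_loss = mse(student_hidden, teacher_hidden)
    return prediction_loss + lam * distillation_loss
```

Only the student receives gradients from this loss; the teacher's features act as fixed targets, which is what lets the cheaper pre-crash model inherit structure learned from the richer post-crash inputs.
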
10.
Lasers Med Sci ; 39(1): 129, 2024 May 13.
Article in English | MEDLINE | ID: mdl-38735976

ABSTRACT

Diabetic nephropathy is a serious complication of diabetes, and primary Sjögren's syndrome is a disease that poses a major threat to women's health. Studying these two diseases therefore has practical significance. In the field of spectral analysis, although common Raman spectral feature selection models can effectively extract features, they alter the characteristics of the original data. A teacher-student network combined with Raman spectroscopy can perform feature selection while retaining the original features, transferring the performance of a complex deep neural network to a lightweight network model. This study selects five flow learning models as teacher networks, builds a neural network as the student network, uses a multi-layer perceptron for classification, and selects the optimal features based on the evaluation metrics accuracy, precision, recall, and F1-score. With five-fold cross-validation, the results show that in the diagnosis of diabetic nephropathy the optimal accuracy reaches 98.3%, 14.02% higher than existing research, and in the diagnosis of primary Sjögren's syndrome the optimal accuracy reaches 100%, 10.48% higher than existing research. By producing good experimental results in the diagnosis of diabetic nephropathy and primary Sjögren's syndrome, this study demonstrated the feasibility of combining Raman spectroscopy with a teacher-student network for disease diagnosis.


Subjects
Diabetic Nephropathies , Neural Networks, Computer , Sjogren's Syndrome , Spectrum Analysis, Raman , Humans , Spectrum Analysis, Raman/methods , Diabetic Nephropathies/diagnosis , Sjogren's Syndrome/diagnosis , Female
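
The accuracies above are reported after five-fold cross-validation. The following sketch shows the index split, assuming plain shuffled folds; the abstract does not say whether folds were stratified by diagnosis, and the function name is hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once.
    Fold sizes differ by at most one when n_samples is not divisible by k."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

Each spectrum is used for testing exactly once, and the reported accuracy is then typically the mean over the five held-out folds.
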
11.
Sensors (Basel) ; 24(11)2024 May 30.
Article in English | MEDLINE | ID: mdl-38894313

ABSTRACT

The purpose of this paper is to propose a novel transfer learning regularization method based on knowledge distillation. Transfer learning methods have recently been used in various fields. However, problems such as knowledge loss still occur when transferring to a new target dataset. Various regularization methods based on knowledge distillation techniques have been proposed to address these problems. In this paper, we propose a transfer learning regularization method based on the feature map alignment used in knowledge distillation. The proposed method is composed of two attention-based submodules: self-pixel attention (SPA) and global channel attention (GCA). The self-pixel attention submodule uses the feature maps of both the source and target models, allowing the features of the target and the knowledge of the source to be considered jointly. The global channel attention submodule determines the importance of channels across all layers, unlike existing methods that compute it only within a single layer. Accordingly, transfer learning regularization is performed by considering both the interior of each layer and the depth of the entire network. Consequently, the proposed method using both submodules achieved higher overall classification accuracy than existing methods in classification experiments on commonly used datasets.
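
The proposed regularizer aligns target-model feature maps with source-model feature maps while weighting them by learned attention. The sketch below keeps the alignment idea but replaces the learned SPA and GCA submodules with a fixed, hypothetical channel weighting (a softmax over each channel's mean absolute source activation), so it illustrates only the shape of the loss. Feature maps are flattened to one list of spatial values per channel.

```python
import math


def channel_weights(source_maps):
    """Hypothetical stand-in for channel attention: softmax over each
    channel's mean absolute activation in the source model."""
    energy = [sum(abs(v) for v in ch) / len(ch) for ch in source_maps]
    m = max(energy)
    exps = [math.exp(e - m) for e in energy]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_alignment_loss(target_maps, source_maps):
    """Regularizer pulling target-model feature maps toward source-model maps,
    with each channel's squared error scaled by its estimated importance."""
    weights = channel_weights(source_maps)
    loss = 0.0
    for w, t_ch, s_ch in zip(weights, target_maps, source_maps):
        loss += w * sum((t - s) ** 2 for t, s in zip(t_ch, s_ch)) / len(t_ch)
    return loss
```

The effect is that drifting away from the source model on an "important" channel costs more than the same drift on a weak one, which is the intuition behind attention-weighted feature-map regularization.
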

12.
Sensors (Basel) ; 24(6)2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38544077

ABSTRACT

In recent computer vision research, the pursuit of improved classification performance often leads to the adoption of complex, large-scale models. However, the actual deployment of such extensive models poses significant challenges in environments constrained by limited computing power and storage capacity. Consequently, this study is dedicated to addressing these challenges by focusing on innovative methods that enhance the classification performance of lightweight models. We propose a novel method to compress the knowledge learned by a large model into a lightweight one so that the latter can also achieve good performance in few-shot classification tasks. Specifically, we propose a dual-faceted knowledge distillation strategy that combines output-based and intermediate feature-based methods. The output-based method concentrates on distilling knowledge related to base class labels, while the intermediate feature-based approach, augmented by feature error distribution calibration, tackles the potential non-Gaussian nature of feature deviations, thereby boosting the effectiveness of knowledge transfer. Experiments conducted on MiniImageNet, CIFAR-FS, and CUB datasets demonstrate the superior performance of our method over state-of-the-art lightweight models, particularly in five-way one-shot and five-way five-shot tasks.
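
The five-way one-shot and five-way five-shot results refer to the standard episodic evaluation protocol. Here is a sketch of an episode sampler; `n_query=15` is a common but assumed default, and the data layout (a dict from class to a list of examples) and function name are hypothetical.

```python
import random


def sample_episode(data_by_class, n_way=5, k_shot=1, n_query=15, seed=None):
    """Sample one few-shot episode: n_way classes, k_shot support and n_query
    query examples per class, relabelled 0..n_way-1 within the episode."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        # Draw support and query together so they never share an example.
        examples = rng.sample(data_by_class[cls], k_shot + n_query)
        support += [(x, episode_label) for x in examples[:k_shot]]
        query += [(x, episode_label) for x in examples[k_shot:]]
    return support, query
```

Accuracy is then averaged over many such episodes drawn from the held-out classes, so the lightweight student must classify queries from only one (or five) labelled examples per class.
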

13.
Sensors (Basel) ; 24(13)2024 Jun 25.
Article in English | MEDLINE | ID: mdl-39000906

ABSTRACT

Rock image classification is a challenging fine-grained image classification task characterized by subtle differences among closely related rock categories. The contrastive learning methods prevalent in fine-grained image classification restrict the model's capacity to discern critical features contrastively from image pairs, and they are typically too large for deployment on the mobile devices used for in situ rock identification. In this work, we introduce an innovative and compact model generation framework anchored by the design of a Feature Positioning Comparison Network (FPCN). The FPCN facilitates interaction between feature vectors from localized regions within image pairs, capturing both shared and distinctive features. Further, it accommodates the variable scales of the objects depicted in images, which correspond to differing quantities of inherent object information, directing the network's attention to additional contextual details according to object size. Leveraging knowledge distillation, the architecture is streamlined, with a focus on nuanced information at activation boundaries to learn precise fine-grained decision boundaries, thereby enhancing the small model's accuracy. Empirical evidence demonstrates that our proposed FPCN-based method improves the classification accuracy of mobile lightweight models by nearly 2% while maintaining the same time and space consumption.
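
Distilling activation boundaries has been formulated in the literature (Heo et al., 2019) as a hinge-type loss that makes each student neuron fire exactly when the corresponding teacher neuron fires. The abstract does not give FPCN's exact loss, so the following is a sketch of that general idea under an assumed margin, with teacher and student pre-activations assumed to have matching shapes. The function name is hypothetical.

```python
def activation_boundary_loss(teacher_pre, student_pre, margin=1.0):
    """Squared-hinge loss on pre-activations: where the teacher neuron is
    active (> 0) the student is pushed above +margin, otherwise below -margin."""
    loss = 0.0
    for t, s in zip(teacher_pre, student_pre):
        if t > 0:
            loss += max(0.0, margin - s) ** 2
        else:
            loss += max(0.0, margin + s) ** 2
    return loss
```

Unlike a plain feature-matching loss, this ignores how strongly a neuron fires and only penalizes the student for landing on the wrong side of (or too close to) the activation boundary, which is why it is associated with learning sharp decision boundaries.
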

14.
Sensors (Basel) ; 24(6)2024 Mar 08.
Article in English | MEDLINE | ID: mdl-38544021

ABSTRACT

Compared to fault diagnosis across operating conditions, the differences in data distribution between devices are more pronounced and better aligned with practical application needs. However, current research on transfer learning inadequately addresses fault diagnosis issues across devices. To better balance the relationship between computational resources and diagnostic accuracy, a knowledge distillation-based lightweight transfer learning framework for rolling bearing diagnosis is proposed in this study. Specifically, a deep teacher-student model based on variable-scale residual networks is constructed to learn domain-invariant features relevant to fault classification within both the source and target domain data. Subsequently, a knowledge distillation framework incorporating a temperature factor is established to transfer fault features learned by the large teacher model in the source domain to the smaller student model, thereby reducing computational and parameter overhead. Finally, a multi-kernel domain adaptation method is employed to capture the feature probability distribution distance of fault characteristics between the source and target domains in Reproducing Kernel Hilbert Space (RKHS), and domain-invariant features are learned by minimizing the distribution distance between them. The effectiveness and applicability of the proposed method in situations of incomplete data across device types were validated through two engineering cases, spanning device models and transitioning from laboratory equipment to real-world operational devices.
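
The domain adaptation step minimizes a multi-kernel distance between source and target feature distributions in an RKHS; such a distance is most commonly the multi-kernel maximum mean discrepancy (MK-MMD), and we assume that here. The following is a sketch of a biased squared-MMD estimate with an equally weighted sum of Gaussian kernels. The bandwidths and equal weighting are assumptions, not the paper's settings.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """RBF kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mk_mmd2(source, target, bandwidths=(0.5, 1.0, 2.0)):
    """Biased estimate of squared MMD between two feature sets using an
    equally weighted sum of Gaussian kernels (a simple multi-kernel choice)."""
    def mean_kernel(xs, ys):
        return sum(
            sum(gaussian_kernel(x, y, bw) for bw in bandwidths)
            for x in xs for y in ys
        ) / (len(xs) * len(ys))

    return (mean_kernel(source, source) + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))
```

Used as a training loss alongside the classification and distillation terms, this pushes the student's features for source and target devices toward the same distribution.
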

15.
Sensors (Basel) ; 24(5)2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38475094

ABSTRACT

The identification and classification of traditional Chinese herbal medicines demand significant time and expertise. We propose the dual-teacher supervised decay (DTSD) approach, an enhancement for Chinese herbal medicine recognition utilizing a refined knowledge distillation model. The DTSD method refines output soft labels, adapts attenuation parameters, and employs a dynamic combination loss in the teacher model. Implemented on the lightweight MobileNet_v3 network, the methodology is deployed successfully in a mobile application. Experimental results reveal that incorporating the exponential warmup learning rate reduction strategy during training optimizes the knowledge distillation model, achieving an average classification accuracy of 98.60% for 10 types of Chinese herbal medicine images. The model boasts an average detection time of 0.0172 s per image, with a compressed size of 10 MB. Comparative experiments demonstrate the superior performance of our refined model over DenseNet121, ResNet50_vd, Xception65, and EfficientNetB1. This refined model not only introduces an approach to Chinese herbal medicine image recognition but also provides a practical solution for lightweight models in mobile applications.


Subjects
Drugs, Chinese Herbal , Mobile Applications , Knowledge , Learning , Recognition, Psychology
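
The abstract credits an exponential warmup learning-rate reduction strategy without giving its form. One plausible reading, sketched here as an assumption with hypothetical parameter values, is a linear warmup to the base rate followed by exponential decay.

```python
def warmup_exponential_lr(epoch, base_lr=0.01, warmup_epochs=5, decay=0.9):
    """Hypothetical schedule: ramp linearly up to base_lr over warmup_epochs,
    then multiply by `decay` once per subsequent epoch."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    return base_lr * decay ** (epoch - warmup_epochs + 1)
```

The warmup keeps early updates small while the student's outputs are still far from the teachers' soft labels, and the decay then lets training settle.
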
16.
Sensors (Basel) ; 24(5)2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38475151

ABSTRACT

An equalizer based on a recurrent neural network (RNN), especially one with a bidirectional gated recurrent unit (biGRU) structure, is a good choice for dealing with nonlinear impairments and inter-symbol interference (ISI) in optical communication systems because of its excellent performance in processing time-series information. However, its recursive structure prevents parallel computation, resulting in a low equalization rate. To improve speed without compromising equalization performance, we propose a minimalist 1D convolutional neural network (CNN) equalizer converted from a biGRU through knowledge distillation (KD). In this work, we applied KD to regression problems and explain how KD helps the student learn from the teacher in solving regression problems. In addition, we compared the biGRU, the 1D-CNN after KD, and the 1D-CNN without KD in terms of Q-factor and equalization speed. The experimental data showed that the Q-factor of the 1D-CNN increased by 1 dB after KD learning from the biGRU, and KD improved the RoP sensitivity of the 1D-CNN by 0.89 dB at the HD-FEC threshold of 1 × 10^-3. At the same time, compared with the biGRU, the proposed 1D-CNN equalizer reduced computation time by 97% and the number of trainable parameters by 99.3%, with only a 0.5 dB Q-factor penalty. The results demonstrate that the proposed minimalist 1D-CNN equalizer holds significant promise for future practical deployment in optical wireless communication systems.
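
A common way to apply KD to a regression target such as equalized symbols is to blend the loss against the ground truth with a loss against the teacher's outputs. The sketch assumes mean squared error for both terms and a hypothetical mixing weight `alpha`; here the teacher would be the biGRU and the student the 1D-CNN.

```python
def regression_kd_loss(student_out, teacher_out, target, alpha=0.5):
    """(1 - alpha) * MSE(student, ground truth) + alpha * MSE(student, teacher)."""
    n = len(target)
    hard = sum((s - y) ** 2 for s, y in zip(student_out, target)) / n
    soft = sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / n
    return (1 - alpha) * hard + alpha * soft
```

Because the teacher's outputs already reflect how the channel distorts each symbol, the soft term gives the student a smoother target than the raw transmitted symbols, which is one intuition for why KD helps in regression.
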

17.
Sensors (Basel) ; 24(15)2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39123989

ABSTRACT

In order to shorten detection times and improve average precision in embedded devices, a lightweight and high-accuracy model is proposed to detect passion fruit in complex environments (e.g., with backlighting, occlusion, overlap, sun, cloud, or rain). First, replacing the backbone network of YOLOv5 with a lightweight GhostNet model reduces the number of parameters and computational complexity while improving the detection speed. Second, a new feature branch is added to the backbone network and the feature fusion layer in the neck network is reconstructed to effectively combine the lower- and higher-level features, which improves the accuracy of the model while maintaining its lightweight nature. Finally, a knowledge distillation method is used to transfer knowledge from the more capable teacher model to the less capable student model, significantly improving the detection accuracy. The improved model is denoted as G-YOLO-NK. The average accuracy of the G-YOLO-NK network is 96.00%, which is 1.00% higher than that of the original YOLOv5s model. Furthermore, the model size is 7.14 MB, half that of the original model, and its real-time detection frame rate is 11.25 FPS when implemented on the Jetson Nano. The proposed model is found to outperform state-of-the-art models in terms of average precision and detection performance. The present work provides an effective model for real-time detection of passion fruit in complex orchard scenes, offering valuable technical support for the development of orchard picking robots and greatly improving the intelligence level of orchards.
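
Average precision for a detector such as YOLOv5 is computed by matching predicted boxes to ground-truth boxes, typically accepting a match when their intersection over union (IoU) exceeds a threshold such as 0.5. Here is a sketch of the IoU step only, for boxes given as corner coordinates; the function name is hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # 0 if boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2 × 2 boxes offset by one unit in each direction overlap in a 1 × 1 square, giving an IoU of 1/7, well below a 0.5 match threshold.
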

18.
Sensors (Basel) ; 24(11)2024 May 28.
Article in English | MEDLINE | ID: mdl-38894276

ABSTRACT

Malicious social bots pose a serious threat to social network security by spreading false information and guiding bad opinions in social networks. The singularity and scarcity of single organization data and the high cost of labeling social bots have given rise to the construction of federated models that combine federated learning with social bot detection. In this paper, we first combine the federated learning framework with the Relational Graph Convolutional Neural Network (RGCN) model to achieve federated social bot detection. A class-level cross entropy loss function is applied in the local model training to mitigate the effects of the class imbalance problem in local data. To address the data heterogeneity issue from multiple participants, we optimize the classical federated learning algorithm by applying knowledge distillation methods. Specifically, we adjust the client-side and server-side models separately: training a global generator to generate pseudo-samples based on the local data distribution knowledge to correct the optimization direction of client-side classification models, and integrating client-side classification models' knowledge on the server side to guide the training of the global classification model. We conduct extensive experiments on widely used datasets, and the results demonstrate the effectiveness of our approach in social bot detection in heterogeneous data scenarios. Compared to baseline methods, our approach achieves a nearly 3-10% improvement in detection accuracy when the data heterogeneity is larger. Additionally, our method achieves the specified accuracy with minimal communication rounds.
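
The baseline being modified is classical federated averaging, in which the server averages client parameters weighted by local data size. The sketch pairs it with a simple server-side ensemble of client predictions, one common way to aggregate client knowledge as a distillation teacher for the global model. The paper's pseudo-sample generator is not reproduced, parameters are flattened into plain lists for illustration, and the function names are hypothetical.

```python
def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def ensemble_soft_labels(client_probs):
    """Average the clients' predicted class distributions for one sample; the
    server can distill this ensemble into the global classification model."""
    k = len(client_probs)
    return [sum(p[c] for p in client_probs) / k for c in range(len(client_probs[0]))]
```

Under strong data heterogeneity, plain parameter averaging can pull the global model in conflicting directions, which is the motivation for adding a distillation signal on top of it.
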

19.
Sensors (Basel) ; 24(11)2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38894408

ABSTRACT

Most logit-based knowledge distillation methods transfer soft labels from the teacher model to the student model via the Kullback-Leibler divergence based on softmax, an exponential normalization function. However, the exponential nature of softmax tends to prioritize the largest class (the target class) while neglecting smaller ones (non-target classes), so the significance of the non-target classes is overlooked. To address this issue, we propose Non-Target-Class-Enhanced Knowledge Distillation (NTCE-KD) to amplify the role of non-target classes in terms of both magnitude and diversity. Specifically, we present a magnitude-enhanced Kullback-Leibler (MKL) divergence that applies multiple shrinkings to the target class to enhance the impact of non-target classes in terms of magnitude. Additionally, to enrich the diversity of non-target classes, we introduce a diversity-based data augmentation strategy (DDA), further enhancing overall performance. Extensive experimental results on the CIFAR-100 and ImageNet-1k datasets demonstrate that non-target classes are of great significance and that our method achieves state-of-the-art performance across a wide range of teacher-student pairs.
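
For reference, the softmax-plus-KL objective that NTCE-KD modifies is the standard logit distillation loss of Hinton et al. (2015), scaled by T^2. The sketch implements that baseline, plus a helper showing how little probability mass the non-target classes receive at low temperature. The paper's MKL divergence and DDA augmentation are not reproduced, and the function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax with max subtraction for stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=4.0):
    """Standard logit distillation term: T^2 * KL(p_teacher^T || p_student^T)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def non_target_mass(logits, target, temperature=1.0):
    """Probability mass the softened distribution assigns to non-target classes."""
    return 1.0 - softmax(logits, temperature)[target]
```

With confident logits such as `[6, 1, 0]`, the non-target classes share well under 1% of the mass at T = 1, so their relative ordering contributes almost nothing to the KL term; this is the imbalance the abstract sets out to correct.
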

20.
Sensors (Basel) ; 24(14)2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39066137

ABSTRACT

In response to the increasing number of agents and changing task scenarios in multi-agent collaborative systems, existing collaborative strategies struggle to effectively adapt to new task scenarios. To address this challenge, this paper proposes a knowledge distillation method combined with a domain separation network (DSN-KD). This method leverages the well-performing policy network from a source task as the teacher model, utilizes a domain-separated neural network structure to correct the teacher model's outputs as supervision, and guides the learning of agents in new tasks. The proposed method does not require the pre-design or training of complex state-action mappings, thereby reducing the cost of transfer. Experimental results in scenarios such as UAV surveillance and UAV cooperative target occupation, robot cooperative box pushing, UAV cooperative target strike, and multi-agent cooperative resource recovery in a particle simulation environment demonstrate that the DSN-KD transfer method effectively enhances the learning speed of new task policies and improves the proximity of the policy model to the theoretically optimal policy in practical tasks.
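
Domain separation networks (Bousmalis et al., 2016) split features into a shared part and a private part and add a difference loss that pushes the two toward orthogonality. The abstract does not say how DSN-KD combines this structure with the teacher's policy outputs, so the sketch shows only the difference loss, with rows as samples and columns as feature dimensions. The function name is hypothetical.

```python
def difference_loss(shared, private):
    """Soft orthogonality penalty ||S^T P||_F^2 between a shared feature
    matrix S and a private feature matrix P (same number of rows)."""
    n_s, n_p = len(shared[0]), len(private[0])
    total = 0.0
    for i in range(n_s):
        for j in range(n_p):
            # Entry (i, j) of S^T P: dot product of column i of S and column j of P.
            c = sum(s_row[i] * p_row[j] for s_row, p_row in zip(shared, private))
            total += c * c
    return total
```

Keeping the shared and task-specific parts decorrelated is what lets the shared part carry over from the source task, while the private part absorbs what is new in the target scenario.
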
