Results 1 - 7 of 7
1.
Article in English | MEDLINE | ID: mdl-38758623

ABSTRACT

Excessive invalid exploration at the beginning of training puts a deep reinforcement learning process at risk of overfitting, which in turn produces spurious decisions that hinder the agent in subsequent states and explorations. This phenomenon is termed primacy bias in online reinforcement learning. This work systematically investigates primacy bias in online reinforcement learning, discussing its cause and analyzing its characteristics. To learn a policy that generalizes to subsequent states and explorations, we develop an online reinforcement learning framework, termed self-distillation reinforcement learning (SDRL), based on knowledge distillation: at regular intervals the agent transfers its learned knowledge into a randomly initialized policy, and this new policy network replaces the original one in subsequent training. The core idea is that distilling knowledge from the trained policy into another policy can filter biases out, yielding a more generalized policy during learning. Moreover, to prevent the new policy from overfitting due to excessive distillation, we add an additional loss to the knowledge distillation process, using L2 regularization to improve generalization, and introduce a self-imitation mechanism to accelerate learning on the current experiences. The results of several experiments on DMC and Atari 100k suggest that the proposal can eliminate primacy bias in reinforcement learning methods, and that the distilled policy helps agents reach higher scores more quickly.
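A minimal sketch of what one periodic self-distillation step might look like, assuming a discrete-action policy network in PyTorch; the KL distillation loss, L2 weight, interval, and function names are illustrative placeholders rather than the paper's exact formulation.

```python
import copy
import torch
import torch.nn.functional as F

def self_distillation_step(policy, obs_batch, distill_epochs=10, l2_weight=1e-3, lr=3e-4):
    """Distill the trained `policy` into a freshly initialized copy (hypothetical SDRL-style step)."""
    student = copy.deepcopy(policy)
    # Re-initialize the student so it starts from random weights.
    for m in student.modules():
        if hasattr(m, "reset_parameters"):
            m.reset_parameters()

    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    with torch.no_grad():
        teacher_logits = policy(obs_batch)

    for _ in range(distill_epochs):
        student_logits = student(obs_batch)
        # KL distillation loss: match the teacher's action distribution.
        distill_loss = F.kl_div(
            F.log_softmax(student_logits, dim=-1),
            F.softmax(teacher_logits, dim=-1),
            reduction="batchmean",
        )
        # Extra L2 regularization term to curb overfitting from repeated distillation.
        l2_loss = sum((p ** 2).sum() for p in student.parameters())
        loss = distill_loss + l2_weight * l2_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return student  # replaces the original policy in subsequent training
```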

2.
Article in English | MEDLINE | ID: mdl-37339032

ABSTRACT

Introducing deep learning technologies into medical image processing requires accuracy guarantees, especially for high-resolution images relayed through endoscopes. Moreover, methods relying on supervised learning are powerless when labeled samples are inadequate. Therefore, for end-to-end medical image detection with demanding efficiency and accuracy requirements in endoscopy, an ensemble-learning-based model with a semi-supervised mechanism is developed in this work. To obtain more accurate results from multiple detection models, we propose a new ensemble mechanism, termed the alternative adaptive boosting method (Al-Adaboost), which combines the decisions of two hierarchical models. Specifically, the proposal consists of two modules: a local region proposal model with attentive temporal-spatial pathways for bounding-box regression and classification, and a recurrent attention model (RAM) that provides more precise inferences for further classification based on the regression result. Al-Adaboost adaptively adjusts the weights of the labeled samples and of the two classifiers, and unlabeled samples are assigned pseudolabels by our model. We evaluate Al-Adaboost on colonoscopy and laryngoscopy data from CVC-ClinicDB and the affiliated hospital of Kaohsiung Medical University. The experimental results demonstrate the feasibility and superiority of our model.
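A rough sketch of one boosting round in the spirit described above, assuming two generic pre-trained classifiers: labeled-sample weights and classifier weights are updated AdaBoost-style, and confident ensemble predictions on the unlabeled pool become pseudolabels. The weighting scheme, threshold, and names are illustrative assumptions, not the paper's exact update rule.

```python
import numpy as np

def al_adaboost_round(f1, f2, X_lab, y_lab, X_unlab, w, conf_threshold=0.9):
    """One hypothetical boosting round combining two pre-trained classifiers.

    f1, f2: callables returning class probabilities of shape (n, n_classes).
    w: current weights on labeled samples.
    """
    alphas = []
    for f in (f1, f2):
        pred = f(X_lab).argmax(axis=1)
        err = np.clip(np.average(pred != y_lab, weights=w), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)       # AdaBoost-style classifier weight
        w = w * np.exp(alpha * (pred != y_lab))     # up-weight misclassified samples
        w = w / w.sum()
        alphas.append(alpha)

    # Ensemble the two classifiers on the unlabeled pool and keep confident
    # predictions as pseudolabels for the next training round.
    prob = alphas[0] * f1(X_unlab) + alphas[1] * f2(X_unlab)
    prob = prob / prob.sum(axis=1, keepdims=True)
    confident = prob.max(axis=1) >= conf_threshold
    pseudo_labels = prob.argmax(axis=1)[confident]
    return w, alphas, confident, pseudo_labels
```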

3.
J Sep Sci ; 46(13): e2201057, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37031438

ABSTRACT

The ability to extract peptides and proteins from biological samples with excellent reusability, high adsorption capacity, and great selectivity is essential in scientific research and medical applications. Inspired by the advantages of core-shell materials, we fabricated a core-shell material using amino-functionalized silica as the core. Benzene-1,3,5-tricarbaldehyde and 3,5-diaminobenzoic acid were used as model organic ligands to construct a shell coating by alternately reacting the two monomers on the surface of the silica microspheres. The resultant material showed an outstanding capability for adsorbing cationic peptides, most likely owing to its porous structure, large number of carboxylic functional groups, and low mass-transfer resistance. The maximum saturated adsorption capacity reached 833.3 mg/g, and the adsorption process took only 20 min. Under optimized adsorption conditions, the core-shell material was used to selectively adsorb cationic peptides from the tryptic digest of lysozyme and bovine serum albumin. Specifically, the analysis detected seven cationic peptides in the eluate and twenty anionic peptides in the supernatant, indicating that most cationic peptides in the digest were efficiently trapped.


Subject(s)
Peptides , Serum Albumin, Bovine , Adsorption , Serum Albumin, Bovine/chemistry , Silicon Dioxide/chemistry , Microspheres
4.
IEEE Trans Cybern ; 53(3): 1699-1711, 2023 Mar.
Article in English | MEDLINE | ID: mdl-34506297

ABSTRACT

Some researchers have introduced transfer learning mechanisms into multiagent reinforcement learning (MARL). However, existing work on cross-task transfer for multiagent systems was designed only for homogeneous agents or similar domains. This work proposes an all-purpose cross-task transfer method, called multiagent lateral transfer (MALT), which helps MARL alleviate the training burden. We discuss several challenges in developing an all-purpose multiagent cross-task transfer learning method and provide a feasible way to reuse knowledge for MARL. In the developed method, inspired by progressive networks, we take features rather than policies or experiences as the transfer object. To achieve more efficient transfer, we assign pretrained policy networks to agents based on clustering, and an attention module is introduced to enhance the transfer framework. The proposed method imposes no strict requirements on the source and target tasks. Compared with existing work, our method can transfer knowledge among heterogeneous agents and also avoids negative transfer when the tasks are entirely different. To the best of our knowledge, this article is the first work devoted to all-purpose cross-task transfer for MARL. Several experiments in various scenarios were conducted to compare the performance of the proposed method with baselines. The results demonstrate that the method is sufficiently flexible for most settings, including cooperative, competitive, homogeneous, and heterogeneous configurations.
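A minimal sketch of feature-level lateral transfer in the spirit of progressive networks, assuming a frozen pretrained encoder (selected per agent, e.g. by clustering) whose features are fused into the target policy through an attention weight. Layer sizes, the fusion rule, and all names are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LateralTransferPolicy(nn.Module):
    """Hypothetical policy network with attention-weighted lateral feature transfer."""

    def __init__(self, obs_dim, n_actions, hidden=128, pretrained_encoder=None):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, n_actions)
        self.pretrained = pretrained_encoder            # frozen source-task encoder
        if self.pretrained is not None:
            for p in self.pretrained.parameters():
                p.requires_grad = False
            # Attention over {target features, transferred features}.
            self.attn = nn.Linear(2 * hidden, 2)

    def forward(self, obs):
        h = self.encoder(obs)
        if self.pretrained is not None:
            h_src = self.pretrained(obs)                # transferred features
            weights = F.softmax(self.attn(torch.cat([h, h_src], dim=-1)), dim=-1)
            h = weights[..., :1] * h + weights[..., 1:] * h_src
        return self.head(h)                             # action logits
```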

5.
Entropy (Basel) ; 24(1)2022 Jan 14.
Article in English | MEDLINE | ID: mdl-35052153

ABSTRACT

Continuous-variable measurement-device-independent quantum key distribution (CV-MDI QKD) was proposed to remove all imperfections originating from detection. However, some inevitable imperfections remain in a practical CV-MDI QKD system; for example, the channel transmittance fluctuates in complex communication environments. Here we investigate the security of the system under fluctuating channel transmittance, whereas in theory the transmittance is regarded as a fixed value determined by the communication distance. We first discuss parameter estimation under fluctuating channel transmittance based on the established channel models, which deviates noticeably from the parameters estimated in the ideal case. Then, we show the evaluated results when the channel transmittance obeys a two-point distribution or a uniform distribution; in particular, both distributions can easily be realized under the manipulation of eavesdroppers. Finally, we analyze the secret key rate of the system when the channel transmittance obeys the above distributions. The simulation analysis indicates that a slight fluctuation of the channel transmittance may seriously reduce the performance of the system, especially in the extreme asymmetric case, and the communication between Alice, Bob, and Charlie may even be interrupted. Therefore, eavesdroppers can manipulate the channel transmittance to mount a denial-of-service attack on a practical CV-MDI QKD system. To resist this attack, the Gaussian post-selection method can be exploited to calibrate the parameter estimation and reduce the degradation of system performance.
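To illustrate why fluctuating transmittance biases parameter estimation, here is a Monte Carlo sketch for a simplified one-way Gaussian-modulated link (not the full CV-MDI setting): the covariance-based estimator recovers the square of the mean of sqrt(T) rather than the mean transmittance, and the leftover variance is misattributed to excess noise. Variances are in shot-noise units and all numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_channel(V_A=4.0, xi=0.01, n=200_000, T_values=(0.16, 0.24), probs=(0.5, 0.5)):
    """Bias of covariance-based channel estimation under a two-point transmittance distribution."""
    x = rng.normal(0.0, np.sqrt(V_A), n)                        # Alice's modulation
    T = rng.choice(T_values, size=n, p=probs)                   # fluctuating transmittance
    noise = rng.normal(0.0, np.sqrt(1.0 + np.mean(T) * xi), n)  # vacuum + excess noise (approximate)
    y = np.sqrt(T) * x + noise                                  # Bob's quadrature

    t_hat = np.cov(x, y)[0, 1] / np.var(x)                      # estimates E[sqrt(T)], not sqrt(E[T])
    T_hat = t_hat ** 2
    xi_hat = (np.var(y) - T_hat * V_A - 1.0) / T_hat            # apparent excess noise
    return T_hat, xi_hat

T_hat, xi_hat = estimate_channel()
print(f"estimated T  = {T_hat:.4f}  (true mean T = 0.20)")
print(f"estimated xi = {xi_hat:.4f} (true xi = 0.01)")          # inflated by the fluctuation
```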

6.
Entropy (Basel) ; 23(2)2021 Jan 30.
Article in English | MEDLINE | ID: mdl-33573307

ABSTRACT

In quantum key distribution (QKD), security loopholes are opened by gaps between the theoretical model and the practical system, and they may be exploited by an eavesdropper (Eve) to obtain secret key information without being detected. This is an effective quantum hacking strategy that seriously threatens the security of practical QKD systems. In this paper, we propose a new quantum hacking attack on an integrated silicon photonic continuous-variable quantum key distribution (CVQKD) system, known as a power analysis attack. The attack is implemented by analyzing, with the help of machine learning, the power consumed by the integrated electrical control circuit during state preparation, a stage assumed to be perfect in initial security proofs. Specifically, we describe a possible power model and present a complete attack based on a support vector regression (SVR) algorithm. The simulation results show that the extractable secret key information decreases as the accuracy of the attack increases, especially when the excess noise is low. In particular, Eve does not have to intrude into the transmitter chip (Alice) and may perform a similar attack on practical chip-based discrete-variable quantum key distribution (DVQKD) systems. To resist this attack, the electrical control circuit should be improved to randomize the corresponding power consumption. In addition, the power signature can be reduced by utilizing dynamic voltage and frequency scaling (DVFS) technology.
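A sketch of the machine-learning side of such an attack: regress the secret modulation value used in state preparation from features of a measured power trace with SVR. The synthetic power model below (linear in the modulation value plus noise) is an assumption for illustration only, not the paper's power model.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(1)
n_traces, trace_len = 2000, 64
modulation = rng.uniform(-1.0, 1.0, n_traces)                   # secret preparation values
# Hypothetical power traces: a baseline plus a modulation-dependent component and noise.
power = 0.5 + 0.3 * modulation[:, None] + 0.05 * rng.normal(size=(n_traces, trace_len))

X_train, X_test, y_train, y_test = train_test_split(power, modulation, test_size=0.25, random_state=0)
scaler = StandardScaler().fit(X_train)

svr = SVR(kernel="rbf", C=10.0, epsilon=0.01)
svr.fit(scaler.transform(X_train), y_train)

pred = svr.predict(scaler.transform(X_test))
print("correlation between predicted and true modulation:",
      np.corrcoef(pred, y_test)[0, 1])
```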

7.
ISA Trans ; 98: 434-444, 2020 Mar.
Article in English | MEDLINE | ID: mdl-31543262

ABSTRACT

In a deep reinforcement learning (DRL) system, it is difficult to design a reward function for complex tasks, so this paper proposes a behavior-fusion framework for the actor-critic architecture that learns the policy from an advantage function composed of two value functions. Firstly, the proposed method decomposes a complex task into several sub-tasks and merges the trained policies for those sub-tasks into a unified policy for the complex task, instead of designing a new reward function and training a policy from scratch. Each sub-task is trained individually by an actor-critic algorithm using a simple reward function, and these pre-trained sub-task policies serve as building blocks for rapidly assembling a prototype of the complicated task. Secondly, the proposed method integrates the modules in the policy-gradient calculation by computing accumulated returns, which reduces variance. Thirdly, two alternative methods for acquiring integrated returns for the complicated task are also proposed. The Atari 2600 Pong game and a wafer probe task are used to validate the performance of the proposed methods against a method using a gate network.
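A minimal sketch of how an advantage built from two sub-task value functions could drive the policy gradient; the simple weighted sum of the two advantages is an illustrative fusion rule and the names are placeholders, since the paper proposes its own ways of integrating the returns.

```python
import torch
import torch.nn.functional as F

def fused_policy_gradient_loss(policy, critic_a, critic_b, obs, actions,
                               returns_a, returns_b, w_a=0.5, w_b=0.5):
    """Behavior-fusion-style actor-critic loss with two sub-task critics (sketch)."""
    values_a = critic_a(obs).squeeze(-1)
    values_b = critic_b(obs).squeeze(-1)
    # Fused advantage from the two sub-task accumulated returns and value estimates.
    advantage = w_a * (returns_a - values_a) + w_b * (returns_b - values_b)

    log_probs = F.log_softmax(policy(obs), dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(-1)).squeeze(-1)

    actor_loss = -(chosen * advantage.detach()).mean()
    critic_loss = F.mse_loss(values_a, returns_a) + F.mse_loss(values_b, returns_b)
    return actor_loss + 0.5 * critic_loss
```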
