1.
Article in English | MEDLINE | ID: mdl-38669170

ABSTRACT

In this article, we propose a distributional policy-gradient method that combines distributional reinforcement learning (RL) with policy gradients. Conventional RL algorithms typically estimate the expectation of the return for a given state-action pair. In contrast, distributional RL algorithms treat the return as a random variable and estimate its full distribution, which characterizes the probability of different returns arising from environmental uncertainty. The return distribution therefore carries more information than its expectation and generally leads to superior policies. Although distributional RL has been studied widely in value-based methods, very few policy-gradient methods exploit it. To bridge this gap, we propose a distributional policy-gradient method that introduces a distributional value function into the policy gradient (DVDPG): we estimate the distribution of the policy gradient instead of the expectation used in conventional policy-gradient methods. We further propose two mechanisms for sampling policy-gradient values during policy improvement. The first is a distribution-probability-sampling method that samples the policy-gradient value according to the quantile probabilities of the return distribution; the second samples uniformly. Both mechanisms increase the stochasticity of the policy gradient, improving exploration efficiency and helping the policy avoid local optima. In sparse-reward tasks, distribution-probability sampling outperforms uniform sampling, while in dense-reward tasks the two perform similarly. Moreover, we show that the conventional policy-gradient method is a special case of the proposed method. Experiments on various sparse-reward and dense-reward OpenAI Gym tasks demonstrate the efficiency of the proposed method, which outperforms the baselines in almost all environments.
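The following minimal sketch (not the authors' implementation; function names, shapes, and the quantile-critic assumption are illustrative) shows the core idea of weighting the policy gradient with a sampled quantile of the return distribution rather than its expectation.

```python
# Illustrative sketch of sampling a return quantile to weight the policy gradient.
# Assumes a quantile critic that outputs N quantile estimates of the return.
import numpy as np

def sampled_return_weight(quantile_values, quantile_probs=None, rng=None):
    """Pick one return quantile to weight the log-probability gradient.

    quantile_values: (N,) critic estimates of the return's quantiles.
    quantile_probs:  (N,) probability of each quantile; None means uniform sampling.
    """
    rng = rng or np.random.default_rng()
    n = len(quantile_values)
    probs = quantile_probs if quantile_probs is not None else np.full(n, 1.0 / n)
    idx = rng.choice(n, p=probs)
    return quantile_values[idx]

# Conventional policy gradient weights grad_log_pi(a|s) by the expected return E[G];
# the distributional variant sketched here replaces E[G] with a sampled quantile,
# which adds stochasticity to the gradient and encourages exploration.
```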

2.
IEEE Trans Neural Netw Learn Syst ; 33(12): 7502-7512, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34143742

ABSTRACT

The Preisach model and neural networks are two of the most popular strategies for modeling hysteresis. In this article, we first prove mathematically that the rate-independent Preisach model is in fact a diagonal recurrent neural network (dRNN) with a binary step activation function. For the first time, the hysteretic nature of the classical dRNN with the tanh activation function, and the conditions under which it arises, are derived and analyzed mathematically rather than treated with the usual black-box approach and its variants. We show that, under specific conditions, the dRNN neuron is a versatile rate-dependent hysteresis system. A dRNN composed of such neurons can model rate-dependent hysteresis and, with suitable parameters, can approximate the Preisach model to arbitrary precision for rate-independent hysteresis modeling. Experiments show that the classical dRNN models both kinds of hysteresis more accurately and efficiently than the Preisach model.
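As a rough illustration (not code from the paper; parameter names and the scalar formulation are assumptions), a diagonal recurrent neuron feeds back only its own past output, and swapping the tanh activation for a binary step gives a relay-like (hysteron) behavior reminiscent of the Preisach model's building block.

```python
# Sketch of a single diagonal-recurrent neuron run over an input sequence.
import numpy as np

def drnn_neuron(inputs, w_in, w_rec, bias, activation=np.tanh):
    """One diagonal RNN neuron: the recurrent weight connects the neuron
    only to its own previous output, which is what "diagonal" refers to."""
    y = 0.0
    outputs = []
    for u in inputs:
        y = activation(w_in * u + w_rec * y + bias)   # self-feedback only
        outputs.append(y)
    return np.array(outputs)

# Relay-like activation: with this step function the neuron behaves like a
# two-state hysteron, the elementary operator of the Preisach model.
binary_step = lambda x: np.where(x >= 0.0, 1.0, -1.0)
```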


Subjects
Neural Networks, Computer; Neurons
3.
IEEE Trans Biomed Eng ; 60(6): 1518-27, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23380840

ABSTRACT

This paper presents an efficient approach to flocking of microparticles using robotics and optical-tweezers technologies. All particles trapped by the optical tweezers are moved automatically toward a predefined region without collisions. The main contribution lies in several solutions for flocking manipulation of microparticles in microenvironments. First, a simple flocking controller generates the desired positions and velocities for particle motion. Second, a velocity-saturation method keeps the desired velocities below a safe limit. Third, a two-layer control architecture is proposed for motion control of the optical tweezers; this architecture makes many robotic manipulations achievable in microenvironments. The approach applies to many bioapplications, especially in cell engineering and biomedicine. Experiments on yeast cells with a robot-tweezers system verify its effectiveness.
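A minimal sketch of the velocity-saturation step, assuming it amounts to rescaling each particle's desired velocity so its magnitude never exceeds a safe limit (function and parameter names are illustrative, not taken from the paper):

```python
# Clamp a desired velocity from the flocking controller to a safe magnitude
# before it is commanded to the optical traps.
import numpy as np

def saturate_velocity(v_desired, v_max):
    """Scale the desired velocity vector so that ||v|| <= v_max."""
    speed = np.linalg.norm(v_desired)
    if speed > v_max:
        return v_desired * (v_max / speed)
    return v_desired

# Example: a desired velocity of magnitude 5.0 is rescaled to magnitude 2.0.
v_cmd = saturate_velocity(np.array([3.0, 4.0]), v_max=2.0)
```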


Subjects
Flocculation; Micromanipulation/instrumentation; Micromanipulation/methods; Optical Tweezers; Models, Theoretical; Robotics/instrumentation; Yeasts/cytology