Results 1 - 8 of 8
1.
IEEE Trans Pattern Anal Mach Intell ; 45(4): 4152-4166, 2023 Apr.
Article in English | MEDLINE | ID: mdl-35853052

ABSTRACT

Goal-conditioned Hierarchical Reinforcement Learning (HRL) is a promising approach for scaling up reinforcement learning (RL) techniques. However, it often suffers from training inefficiency because the action space of the high-level policy, i.e., the goal space, is large. Searching in a large goal space poses difficulties for both high-level subgoal generation and low-level policy learning. In this article, we show that this problem can be effectively alleviated by restricting the high-level action space from the whole goal space to a k-step adjacent region of the current state using an adjacency constraint. We theoretically prove that in a deterministic Markov Decision Process (MDP) the proposed adjacency constraint preserves the optimal hierarchical policy, while in a stochastic MDP it induces a bounded state-value suboptimality determined by the MDP's transition structure. We further show that this constraint can be implemented in practice by training an adjacency network that discriminates between adjacent and non-adjacent subgoals. Experimental results on discrete and continuous control tasks, including challenging simulated robot locomotion and manipulation tasks, show that incorporating the adjacency constraint significantly boosts the performance of state-of-the-art goal-conditioned HRL approaches.
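
A minimal Python sketch of this kind of constraint is given below. It is illustrative only: the adjacency network is replaced by a hand-weighted distance function, and the candidate-goal pool, feature space, and threshold k are assumptions, not the authors' implementation.

    # Sketch: restrict high-level subgoal proposals to a k-step adjacent region
    # of the current state, as judged by a (stand-in) adjacency/distance model.
    import numpy as np

    def adjacency_distance(state, goal, weights):
        """Stand-in for a trained adjacency network: estimates how many steps
        separate a state from a candidate subgoal."""
        return float(np.abs(np.asarray(goal) - np.asarray(state)) @ weights)

    def propose_subgoal(state, candidate_goals, weights, k=5):
        """Keep only candidates judged reachable within k steps, then sample
        one uniformly (a real high-level policy would do the picking)."""
        adjacent = [g for g in candidate_goals
                    if adjacency_distance(state, g, weights) <= k]
        pool = adjacent if adjacent else candidate_goals  # fall back if none pass
        return pool[np.random.randint(len(pool))]

    # Toy usage in a 2-D goal space with hand-set "network" weights.
    candidate_goals = [np.array([x, y]) for x in range(10) for y in range(10)]
    state = np.array([2.0, 3.0])
    weights = np.ones(2)
    print(propose_subgoal(state, candidate_goals, weights, k=5))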

2.
IEEE Trans Neural Netw Learn Syst ; 34(12): 10359-10373, 2023 Dec.
Article in English | MEDLINE | ID: mdl-35468065

ABSTRACT

Undiscounted return is an important setting in reinforcement learning (RL) and characterizes many real-world problems. However, optimizing an undiscounted return often causes training instability, and the causes of this instability have not been analyzed in depth by existing studies. In this article, the problem is analyzed from the perspective of value estimation. The analysis indicates that the instability originates from transient traps caused by inconsistently selected actions. However, always selecting one consistent action in the same state limits exploration. To balance exploration effectiveness and training stability, a novel sampling method called last-visit sampling (LVS) is proposed to ensure that a subset of actions is selected consistently in the same state. The LVS method decomposes the state-action value into two parts, i.e., the last-visit (LV) value and the revisit value. This decomposition ensures that the LV value is determined by consistently selected actions. We prove that the LVS method eliminates transient traps while preserving optimality. We also show empirically that the method stabilizes training on five typical tasks, including vision-based navigation and manipulation tasks.
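
A toy rendering of the consistent-selection idea is sketched below; the mixing probability, the epsilon-greedy fallback, and the class itself are illustrative assumptions and do not reproduce the paper's LVS value decomposition.

    # Sketch: keep a per-state record of the last action so part of the action
    # selection stays consistent across visits, while the rest still explores.
    import random
    from collections import defaultdict

    class LastVisitSampler:
        def __init__(self, n_actions, consistent_prob=0.5, epsilon=0.1):
            self.n_actions = n_actions
            self.consistent_prob = consistent_prob   # assumed mixing coefficient
            self.epsilon = epsilon
            self.last_action = {}                    # state -> action taken on last visit
            self.q = defaultdict(lambda: [0.0] * n_actions)

        def select(self, state):
            if state in self.last_action and random.random() < self.consistent_prob:
                action = self.last_action[state]     # repeat the last-visit action
            elif random.random() < self.epsilon:
                action = random.randrange(self.n_actions)
            else:
                action = max(range(self.n_actions), key=lambda a: self.q[state][a])
            self.last_action[state] = action
            return action

    sampler = LastVisitSampler(n_actions=4)
    print([sampler.select("s0") for _ in range(5)])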

3.
IEEE Trans Neural Netw Learn Syst ; 33(11): 6458-6472, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34115593

ABSTRACT

Auxiliary rewards are widely used in complex reinforcement learning tasks. However, previous work can hardly prevent auxiliary rewards from interfering with the pursuit of the main rewards, which can destroy the optimal policy. It is therefore challenging but essential to balance the main and auxiliary rewards. In this article, we explicitly formulate the reward-balancing problem as the search for a Pareto optimal solution, with the overall objective of preserving the policy's optimization orientation toward the main rewards (i.e., the policy driven by the balanced rewards is consistent with the policy driven by the main rewards alone). To this end, we propose a variant of Pareto optimization and show that it can effectively guide the policy search toward more main rewards. Furthermore, we establish an iterative learning framework for reward balancing and theoretically analyze its convergence and time complexity. Experiments in both discrete (grid world) and continuous (Doom) environments demonstrate that our algorithm balances rewards effectively and achieves remarkable performance compared with RL methods that use heuristically designed rewards. On the ViZDoom platform, our algorithm learns expert-level policies.
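
The goal of never updating against the main rewards can be illustrated with a generic gradient-projection sketch; this is a common multi-objective trick shown only for intuition, not the paper's variant Pareto procedure.

    # Sketch: if the auxiliary-reward gradient conflicts with the main-reward
    # gradient (negative dot product), remove the conflicting component before
    # combining, so the update never points against the main objective.
    import numpy as np

    def balanced_update(grad_main, grad_aux):
        grad_main = np.asarray(grad_main, dtype=float)
        grad_aux = np.asarray(grad_aux, dtype=float)
        dot = grad_aux @ grad_main
        if dot < 0.0:
            grad_aux = grad_aux - dot / (grad_main @ grad_main) * grad_main
        return grad_main + grad_aux

    print(balanced_update([1.0, 0.0], [-0.5, 1.0]))  # -> [1. 1.], conflict removed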


Subject(s)
Neural Networks, Computer; Reinforcement, Psychology; Computer Simulation; Reward; Learning
4.
Cell Cycle ; 21(3): 228-246, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34965191

ABSTRACT

As LINC01615 is a newly discovered cancer-related molecule, we explored its previously unreported mechanism in colon cancer. LINC01615 expression in clinical samples and cell lines was measured. The effects of LINC01615 silencing/overexpression on the malignant development of colon cancer cells were analyzed through cell function experiments. Changes at the molecular level were detected by quantitative real-time polymerase chain reaction and Western blot. Bioinformatics analysis and a dual luciferase reporter assay were used to identify and verify the targeted binding sequences. Rescue tests and correlation analyses examined the relationship among LINC01615, miR-3653-3p, and zinc finger E-box binding homeobox 2 (ZEB2) in colon cancer cells. A xenograft experiment and immunohistochemistry were performed to verify these results. TCGA data suggested that LINC01615 was highly expressed in colon cancer, which was verified in clinical and cell samples, and patients with LINC01615 overexpression had a poor prognosis. Silencing LINC01615 blocked the malignant development of colon cancer cells by regulating the expression of related genes, while overexpressing LINC01615 had the opposite effect. LINC01615, which was targeted by miR-3653-3p, partially offset the inhibitory effect of miR-3653-3p on colon cancer cells. ZEB2, a downstream target gene of miR-3653-3p, was also highly expressed in colon cancer. MiR-3653-3p was negatively correlated with LINC01615 and ZEB2, while LINC01615 was positively correlated with ZEB2. Accordingly, LINC01615 induced ZEB2 up-regulation, while miR-3653-3p reduced ZEB2 levels. The in vivo results were consistent with the cell experiments. LINC01615 competitively binds miR-3653-3p to regulate ZEB2 and promote the carcinogenesis of colon cancer cells.


Subject(s)
Colonic Neoplasms; MicroRNAs; RNA, Untranslated/genetics; Carcinogenesis/genetics; Cell Line, Tumor; Cell Proliferation/genetics; Colonic Neoplasms/genetics; Gene Expression Regulation, Neoplastic/genetics; Humans; MicroRNAs/genetics; MicroRNAs/metabolism; Zinc Finger E-box Binding Homeobox 2/genetics; Zinc Finger E-box Binding Homeobox 2/metabolism
5.
IEEE Trans Pattern Anal Mach Intell ; 44(9): 5572-5589, 2022 Sep.
Article in English | MEDLINE | ID: mdl-33764874

ABSTRACT

It is difficult for reinforcement learning (RL) algorithms to solve complex tasks that involve large state spaces and long-term decision processes. A common and promising way to address this challenge is to compress a large RL problem into a small one. Toward this goal, the compression should be state-temporal and optimality-preserving (i.e., the optimal policy of the compressed problem should correspond to that of the uncompressed problem). In this paper, we propose a reward-restricted geodesic (RRG) metric, which can be learned by a neural network, to perform state-temporal compression in RL. We prove that compression based on the RRG metric is approximately optimality-preserving for the raw RL problem endowed with temporally abstract actions. Building on this compression, we design an RRG metric-based reinforcement learning (RRG-RL) algorithm to solve complex tasks. Experiments in both discrete (2D Minecraft) and continuous (Doom) environments demonstrate the superiority of our method over existing RL approaches.
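
As a rough illustration, the compression step could amount to grouping states whose metric distance falls below a threshold; in the sketch below a plain Euclidean distance stands in for the learned RRG metric, and the greedy clustering is an assumption rather than the authors' procedure.

    # Sketch: greedy state compression under a (stand-in) learned metric.
    import numpy as np

    def compress_states(states, metric, threshold):
        """Each state joins the first existing cluster whose representative
        lies within `threshold` of it; otherwise it starts a new cluster."""
        clusters = []  # list of (representative, members)
        for s in states:
            for rep, members in clusters:
                if metric(rep, s) <= threshold:
                    members.append(s)
                    break
            else:
                clusters.append((s, [s]))
        return clusters

    toy_states = [np.array([0.3 * i, 0.0]) for i in range(10)]
    euclidean = lambda a, b: float(np.linalg.norm(a - b))
    print(len(compress_states(toy_states, euclidean, threshold=0.5)))  # fewer abstract states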

6.
IEEE Trans Neural Netw Learn Syst ; 31(6): 1884-1898, 2020 Jun.
Article in English | MEDLINE | ID: mdl-31395557

ABSTRACT

Lifelong learning is a crucial issue in advanced artificial intelligence. It requires the learning system to learn and accumulate knowledge from sequential tasks while dealing with an increasing number of domains and tasks. We argue that the key to an effective and efficient lifelong learning system is the ability to memorize and recall learned knowledge using neural networks. Following this idea, we propose Generative Memory (GM) as a novel memory module; the resulting lifelong learning system is referred to as the GM Net (GMNet). To make GMNet feasible, we propose a novel learning mechanism, referred to as the P-invariant learning method. It replaces a memory of the real data with a memory of the data distribution, which allows the learning system to accurately and continuously accumulate learned experience. We demonstrate that GMNet achieves state-of-the-art performance on lifelong learning tasks.
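
A toy stand-in for remembering a data distribution rather than the raw data is sketched below; the per-task Gaussian is purely illustrative and is not the GM module or GMNet architecture.

    # Sketch: a "generative memory" that stores distribution parameters per task
    # and replays pseudo-samples instead of storing the original data.
    import numpy as np

    class GenerativeMemory:
        def __init__(self):
            self.params = {}  # task_id -> (mean, std) of observed feature vectors

        def memorize(self, task_id, data):
            data = np.asarray(data, dtype=float)
            self.params[task_id] = (data.mean(axis=0), data.std(axis=0) + 1e-6)

        def replay(self, task_id, n_samples):
            mean, std = self.params[task_id]
            rng = np.random.default_rng()
            return rng.normal(mean, std, size=(n_samples, mean.shape[0]))

    memory = GenerativeMemory()
    memory.memorize("task_A", np.random.randn(100, 3) + 5.0)
    print(memory.replay("task_A", n_samples=2).shape)  # pseudo-data for rehearsal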

7.
IEEE Trans Cybern ; 50(3): 1347-1354, 2020 Mar.
Article in English | MEDLINE | ID: mdl-30295641

ABSTRACT

Hidden Markov models (HMMs) underpin the solution to many problems in computational neuroscience. However, it is still unclear how HMM inference could be implemented by a network of neurons in the brain, and existing methods suffer from being nonspiking and inaccurate. Here, we build a precise equivalence between the inference equation of HMMs with time-invariant hidden variables and the dynamics of spiking winner-take-all (WTA) neural networks. We show that the membrane potential of each spiking neuron in the WTA circuit encodes the logarithm of the posterior probability of the hidden variable being in the corresponding state, and that each neuron's firing rate is proportional to that posterior probability. We prove that the time course of the neural firing rate can implement posterior inference in HMMs. Theoretical analysis and experimental results show that the proposed WTA circuit produces accurate HMM inference results.
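
The stated correspondence can be illustrated with a rate-based sketch: a log-space accumulator plays the role of the membrane potentials, and its normalized exponential the role of the relative firing rates. This is a rendering of the equivalence for a time-invariant hidden variable, not the paper's spiking WTA circuit.

    # Sketch: Bayesian filtering of a time-invariant hidden variable reduces to
    # accumulating log-likelihoods; the softmax of the accumulator gives the
    # posterior (the analogue of normalized firing rates).
    import numpy as np

    def posterior_trace(log_prior, log_likelihoods):
        membrane = np.array(log_prior, dtype=float)   # analogue of membrane potentials
        trace = []
        for ll in log_likelihoods:
            membrane = membrane + ll                  # accumulate evidence in log space
            posterior = np.exp(membrane - membrane.max())
            posterior /= posterior.sum()              # analogue of relative firing rates
            trace.append(posterior)
        return np.array(trace)

    # Two hidden states; observations increasingly favor state 1.
    log_prior = np.log([0.5, 0.5])
    log_lik = np.log([[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]])
    print(posterior_trace(log_prior, log_lik)[-1])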


Subject(s)
Action Potentials/physiology; Markov Chains; Models, Neurological; Neural Networks, Computer; Brain/physiology; Competitive Behavior; Humans
8.
IEEE Trans Cybern ; 49(1): 133-145, 2019 Jan.
Article in English | MEDLINE | ID: mdl-29990165

ABSTRACT

Numerous experimental findings from neuroscience and psychological science suggest that the human brain uses Bayesian principles to cope with a complex environment, and hierarchical Bayesian inference has been proposed as an appropriate theoretical framework for modeling cortical processing. However, it remains unknown how such a computation is organized in a network of biologically plausible spiking neurons. In this paper, we propose a hierarchical network of winner-take-all circuits that can carry out hierarchical Bayesian inference and learning through a spike-based variational expectation maximization (EM) algorithm. In particular, we show how the firing activities of spiking neurons in response to input stimuli and the spike-timing-dependent plasticity rule can be understood as the variational E-step and M-step, respectively. Finally, we demonstrate the utility of this spiking neural network on the MNIST benchmark for unsupervised classification of handwritten digits.
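
A rate-based toy analogue of this interpretation is sketched below: a soft winner-take-all assignment of each input to a hidden cause serves as the E-step, and a Hebbian-style weight update toward the input serves as the M-step. It is not the spiking network or STDP rule from the paper, only an illustration of the variational-EM reading.

    # Sketch: one pass of soft-WTA competitive learning viewed as variational EM.
    import numpy as np

    def wta_em_pass(inputs, weights, lr=0.05):
        for x in inputs:
            drive = weights @ x                        # input drive to each hidden unit
            p = np.exp(drive - drive.max())
            p /= p.sum()                               # E-step: posterior over hidden causes
            weights += lr * p[:, None] * (x[None, :] - weights)  # M-step: Hebbian-style pull
        return weights

    rng = np.random.default_rng(0)
    W = 0.1 * rng.normal(size=(3, 4))                  # 3 hidden causes, 4-D inputs
    X = rng.normal(size=(20, 4))
    print(wta_em_pass(X, W).shape)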
