Results 1 - 3 of 3
1.
Sensors (Basel); 19(14), 2019 Jul 22.
Article in English | MEDLINE | ID: mdl-31336630

ABSTRACT

This paper presents a detailed experimental assessment of Gaussian Process (GP) regression for air-to-ground communication channel prediction for relay missions in urban environments. Given the restrictions on outdoor urban flight experiments, a way to simulate complex urban environments at indoor room scale is introduced. Since water significantly absorbs wireless communication signals, water containers are used in place of the buildings of a real-world city. To evaluate the performance of the GP-based channel prediction approach, several indoor experiments were conducted in this artificial urban environment. The performance of the GP-based and empirical model-based prediction methods for a relay mission was evaluated by measuring and comparing the communication signal strength at the optimal relay position obtained from each method. The GP-based prediction approach shows an advantage over the model-based one, as it provides reasonable performance in dynamic urban environments without requiring a priori information about the environment (e.g., a 3D map of the city or communication model parameters).
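
As a rough illustration of the kind of GP-based prediction described in this abstract, the sketch below fits a Gaussian Process to hypothetical position/signal-strength measurements and picks the candidate relay position with the best predicted signal. All data, kernel choices, and variable names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: GP regression of received signal strength (RSSI) over 2D
# positions, then choosing the candidate relay position with the best
# predicted signal. Illustrative only; not the paper's implementation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical measurements collected while the relay moves through the scene.
X_train = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [3.0, 2.0]])  # positions (m)
y_train = np.array([-40.0, -52.0, -60.0, -71.0])                      # RSSI (dBm)

# RBF kernel captures smooth spatial correlation; WhiteKernel models measurement noise.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_train, y_train)

# Predict mean RSSI (and uncertainty) over a grid of candidate relay positions.
xs, ys = np.meshgrid(np.linspace(0, 3, 31), np.linspace(0, 2, 21))
X_grid = np.column_stack([xs.ravel(), ys.ravel()])
mean, std = gp.predict(X_grid, return_std=True)

best = X_grid[np.argmax(mean)]
print(f"Predicted best relay position: {best}, RSSI about {mean.max():.1f} dBm")
```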

2.
IEEE Trans Neural Netw Learn Syst; 33(5): 2045-2056, 2022 May.
Article in English | MEDLINE | ID: mdl-34559664

ABSTRACT

In this article, we consider a subclass of partially observable Markov decision process (POMDP) problems which we term confounding POMDPs. In these POMDPs, temporal difference (TD)-based reinforcement learning (RL) algorithms struggle, as the TD error cannot be easily derived from observations. We solve these problems using a new bio-inspired neural architecture that combines a modulated Hebbian network (MOHN) with a deep Q-network (DQN), which we call the modulated Hebbian plus Q-network architecture (MOHQA). The key idea is to use a Hebbian network with rarely correlated, bio-inspired neural traces to bridge the temporal delays between actions and rewards when confounding observations and sparse rewards result in inaccurate TD errors. In MOHQA, the DQN learns low-level features and control, while the MOHN contributes to high-level decisions by associating rewards with past states and actions. The proposed architecture thus combines two modules with significantly different learning algorithms, a Hebbian associative network and a classical DQN pipeline, exploiting the advantages of both. Simulations on a set of POMDPs and on the Malmo environment show that the proposed algorithm improved on DQN's results and even outperformed control tests with advantage actor-critic (A2C), quantile regression DQN with long short-term memory (QRDQN + LSTM), Monte Carlo policy gradient (REINFORCE), and aggregated memory for reinforcement learning (AMRL) algorithms on the most difficult POMDPs with confounding stimuli and sparse rewards.


Subjects
Neural Networks (Computer); Reinforcement (Psychology); Algorithms; Markov Chains; Reward
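
A toy sketch of the core idea in this abstract follows: a TD-learned value head (a tabular stand-in for the DQN) is combined with a modulated Hebbian layer whose reward-gated eligibility trace associates delayed rewards with past observation-action pairs. The sizes, update rules, and combination scheme are simplified assumptions, not the published MOHQA architecture.

```python
# Toy sketch of the MOHQA idea: Q-learning provides low-level action values,
# while a modulated Hebbian layer keeps a reward-gated eligibility trace that
# links delayed/sparse rewards to past observation-action pairs.
# Purely illustrative; not the published implementation.
import numpy as np

n_obs, n_act = 8, 4
rng = np.random.default_rng(0)

Q = np.zeros((n_obs, n_act))        # tabular stand-in for the DQN head
W_hebb = np.zeros((n_obs, n_act))   # Hebbian associative weights
trace = np.zeros((n_obs, n_act))    # eligibility trace over (obs, action)

alpha, gamma, lam, eta, eps = 0.1, 0.95, 0.9, 0.05, 0.1

def act(obs):
    # Combined score: TD-learned values plus Hebbian reward associations.
    scores = Q[obs] + W_hebb[obs]
    return int(rng.integers(n_act)) if rng.random() < eps else int(np.argmax(scores))

def step_update(obs, a, r, next_obs):
    global W_hebb, trace
    # Standard TD(0) update for the Q head.
    td_err = r + gamma * Q[next_obs].max() - Q[obs, a]
    Q[obs, a] += alpha * td_err

    # The trace decays and is stamped by the visited (obs, action) pair; the
    # reward acts as a modulatory signal gating the Hebbian weight change,
    # which helps bridge the delay between actions and sparse rewards.
    trace *= lam
    trace[obs, a] += 1.0
    W_hebb += eta * r * trace
```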
3.
Front Neurorobot; 14: 578675, 2020.
Article in English | MEDLINE | ID: mdl-33424575

ABSTRACT

The ability of an agent to detect changes in an environment is key to successful adaptation. This ability involves at least two phases: learning a model of an environment, and detecting that a change is likely to have occurred when this model is no longer accurate. This task is particularly challenging in partially observable environments, such as those modeled with partially observable Markov decision processes (POMDPs). Some predictive learners are able to infer the state from observations and thus perform better under partial observability. Predictive state representations (PSRs) and neural networks are two such tools that can be trained to predict the probabilities of future observations. However, most existing methods of this kind focus primarily on static problems in which only one environment is learned. In this paper, we propose an algorithm that uses statistical tests to estimate the probability that each of several predictive models fits the current environment. We exploit the underlying probability distributions of the predictive models to provide a fast and explainable method for assessing and justifying the model's beliefs about the current environment. Crucially, by doing so, the method can label incoming data as fitting different models and can thus continuously train separate models in different environments. This new method is shown to prevent catastrophic forgetting when new environments, or tasks, are encountered. The method can also be useful when AI-informed decisions require justification, because its beliefs are based on statistical evidence from observations. We empirically demonstrate the benefit of the novel method with simulations in a set of POMDP environments.
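
A minimal sketch of the model-selection step described in this abstract, assuming observations are discrete symbols: each learned predictive model's predicted observation distribution is compared against recent empirical counts with a chi-square goodness-of-fit test, and incoming data are routed to the best-fitting model. The paper's exact statistical procedure may differ; all names and data here are illustrative.

```python
# Sketch: decide which learned predictive model fits the current observation
# stream by testing each model's predicted observation distribution against
# recent empirical counts. Illustrative only.
import numpy as np
from scipy.stats import chisquare

def model_fit_pvalue(predicted_probs, recent_obs, n_symbols):
    """p-value that the recent observations were drawn from predicted_probs."""
    counts = np.bincount(recent_obs, minlength=n_symbols)
    expected = predicted_probs * counts.sum()   # expected counts under the model
    _, p = chisquare(f_obs=counts, f_exp=expected)
    return p

# Two hypothetical environment models over 3 observation symbols.
model_a = np.array([0.7, 0.2, 0.1])
model_b = np.array([0.2, 0.3, 0.5])

# Recent window of observations (illustrative data).
window = np.array([0, 0, 1, 0, 0, 2, 0, 1, 0, 0])

p_a = model_fit_pvalue(model_a, window, 3)
p_b = model_fit_pvalue(model_b, window, 3)
best = "A" if p_a > p_b else "B"
print(f"p(model A)={p_a:.3f}, p(model B)={p_b:.3f} -> route data to model {best}")
```

The p-values also give the agent a statistical justification for its belief about which environment it is in, which is the explainability aspect mentioned in the abstract.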
