1.
IEEE Trans Cybern ; 52(3): 1515-1526, 2022 Mar.
Article in English | MEDLINE | ID: mdl-32452788

ABSTRACT

Training agents via deep reinforcement learning with sparse rewards for robotic control tasks in a vast state space is a major challenge because successful experiences are rare. To address this problem, two recent methods, hindsight experience replay (HER) and aggressive rewards to counter bias in HER (ARCHER), reuse unsuccessful experiences by treating them as successful experiences that achieve different goals, that is, as hindsight experiences. In these methods, hindsight experience is used at a fixed sampling rate throughout training. However, this use of hindsight experience introduces bias, due to a distinct optimal policy, and does not allow hindsight experience to take on variable importance at different stages of training. In this article, we investigate the impact of a variable sampling rate, representing the variable rate of hindsight experience, on training performance, and we propose a sampling rate decay strategy that decreases the number of hindsight experiences as training proceeds. The proposed method is validated on three robotic control tasks included in the OpenAI Gym suite. The experimental results demonstrate that the proposed method achieves improved training performance and faster convergence than HER and ARCHER on two of the three tasks, and comparable training performance and convergence speed on the third.


Subject(s)
Robotics, Reinforcement (Psychology)
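
The sampling rate decay idea in the abstract above can be illustrated with a small sketch. The abstract does not state the decay schedule or the replay buffer layout, so the linear schedule, the default rates, and the names `hindsight_rate` and `sample_batch` are all assumptions made for illustration, not the authors' implementation.

```python
import random


def hindsight_rate(step, total_steps, initial_rate=0.8, final_rate=0.0):
    """Fraction of each minibatch drawn from hindsight experiences.

    Assumption: a linear decay from initial_rate to final_rate; the
    abstract only says the rate decreases as training proceeds.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return initial_rate + (final_rate - initial_rate) * progress


def sample_batch(real, hindsight, batch_size, rate, rng=random):
    """Draw a minibatch in which about `rate` of the items are hindsight ones."""
    n_hindsight = round(batch_size * rate)
    n_real = batch_size - n_hindsight
    # Sampling with replacement keeps the sketch short.
    return rng.choices(hindsight, k=n_hindsight) + rng.choices(real, k=n_real)


if __name__ == "__main__":
    real = [("real", i) for i in range(100)]
    hindsight = [("hindsight", i) for i in range(100)]
    for step in (0, 50, 100):
        rate = hindsight_rate(step, total_steps=100)
        batch = sample_batch(real, hindsight, batch_size=10, rate=rate)
        n_h = sum(1 for kind, _ in batch if kind == "hindsight")
        print(step, round(rate, 2), n_h)
```

A fixed-rate baseline such as the one described for HER corresponds to calling `sample_batch` with the same `rate` at every step.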
2.
PeerJ Comput Sci ; 7: e718, 2021.
Article in English | MEDLINE | ID: mdl-34616894

ABSTRACT

In multi-agent reinforcement learning, the cooperative learning behavior of agents is very important. Heterogeneous multi-agent reinforcement learning pursues cooperative behavior among different types of agents in a group. Learning a joint-action set during centralized training is an attractive way to obtain such cooperative behavior; however, this method yields limited learning performance with heterogeneous agents. To improve the learning performance of heterogeneous agents during centralized training, two-stage heterogeneous centralized training, which allows the training of multiple roles of heterogeneous agents, is proposed. During training, two training processes are conducted in series. One stage trains each agent according to its role, aiming to maximize individual role rewards. The other trains the agents as a whole so that they learn cooperative behaviors while attempting to maximize shared collective rewards, i.e., team rewards. Because these two training processes are conducted in series in every time step, agents can learn to maximize role rewards and team rewards simultaneously. The proposed method is applied to 5 versus 5 AI robot soccer for validation. The experiments are performed in a robot soccer environment using the Webots robot simulation software. Simulation results show that the proposed method can train the robots of the robot soccer team effectively, achieving higher role rewards and higher team rewards than three other approaches that can be used to train cooperative multi-agent systems. Quantitatively, a team trained by the proposed method improves the score concede rate by 5% to 30% compared to teams trained with the other approaches in matches against evaluation teams.

3.
Sensors (Basel) ; 17(3), 2017 Mar 21.
Article in English | MEDLINE | ID: mdl-28335561

ABSTRACT

A cooperative cognitive radio scheme exploiting primary signals for energy harvesting is proposed. The relay sensor node, denoted as the secondary transmitter (ST), harvests energy from the primary signal transmitted by the primary transmitter, and then uses it to transmit power-superposed codes of the secrecy signal of the secondary network (SN) and of the primary signal of the primary network (PN). The harvested energy is split into two parts according to a power splitting ratio, one for decoding the primary signal and the other for charging the battery. In power superposition coding, the fraction of power allocated to the primary signal is determined by another power allocation parameter, i.e., the power sharing coefficient. Our main concern is to investigate the impact of the two power parameters on the performance of the PN and the SN. Analytical expressions for the outage probabilities of the PN and the SN are derived in terms of the power parameters, the location of the ST, the channel gain, and other system-related parameters. A jointly optimal power splitting ratio and power sharing coefficient for achieving target outage probabilities of the PN and the SN are found using these expressions and validated by simulations.
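
As a rough illustration of the two power parameters described above, the sketch below applies a power splitting ratio to the harvested energy and a power sharing coefficient to the transmit power. The abstract does not say which share of the split goes to decoding, so assigning the fraction `rho` to decoding is an assumption, and the function names and symbols `rho` and `alpha` are hypothetical. The outage probability expressions from the paper are not reproduced here.

```python
def split_harvested_energy(harvested, rho):
    """Split harvested energy into (decoding, battery) parts.

    Assumption: the fraction rho goes to decoding the primary signal
    and the remaining (1 - rho) charges the battery.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return rho * harvested, (1.0 - rho) * harvested


def superpose_power(tx_power, alpha):
    """Split transmit power into (primary, secondary) signal parts.

    alpha is the fraction allocated to the primary signal, as in the
    abstract; the secondary signal receives the rest.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * tx_power, (1.0 - alpha) * tx_power


if __name__ == "__main__":
    decoding, battery = split_harvested_energy(2.0, rho=0.25)
    primary, secondary = superpose_power(battery, alpha=0.75)
    print(decoding, battery, primary, secondary)
```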

4.
Sensors (Basel) ; 17(1), 2017 Jan 10.
Article in English | MEDLINE | ID: mdl-28075372

ABSTRACT

Provision of energy to wireless sensor networks is crucial for their sustainable operation. Sensor nodes are typically equipped with batteries as their operating energy sources. However, when the sensor nodes are sited in almost inaccessible locations, replacing their batteries incurs a high maintenance cost. Under such conditions, wireless charging of sensor nodes by a mobile charger with an antenna can be an efficient solution. When charging distributed sensor nodes, a directional antenna is more energy-efficient than an omnidirectional antenna because a smaller proportion of the radiation is off-target. In addition, for densely distributed sensor nodes, it can be more effective for some undercharged sensor nodes to harvest energy from neighboring overcharged sensor nodes than from the remote mobile charger, because the shorter distances reduce the path loss of the charging signal. In this paper, we propose a hybrid charging scheme that combines charging by a mobile charger with a directional antenna and energy trading, i.e., transferring and harvesting, between neighboring sensor nodes. The proposed scheme is compared with other charging schemes. Simulations demonstrate that the hybrid charging scheme with a directional antenna achieves a significant reduction in the total charging time required for all sensor nodes to reach a target energy level.

5.
Opt Express ; 23(9): 11264-71, 2015 May 04.
Article in English | MEDLINE | ID: mdl-25969222

ABSTRACT

We present a reduced-phase triple-illumination interferometer (RPTII) as a novel single-shot technique to increase the precision of dual-illumination optical phase unwrapping techniques. The technique employs two measurement ranges to record both low-precision unwrapped and high-precision wrapped phases. To unwrap the high-precision phase, a hierarchical optical phase unwrapping algorithm is used with the low-precision unwrapped phase. The feasibility of this technique is demonstrated by measuring a stepped object with a height 2100 times greater than the wavelength of the source. The phase is reconstructed without applying any numerical unwrapping algorithms, and its noise level is decreased by a factor of ten.
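
The step of unwrapping a high-precision wrapped phase with the help of a low-precision unwrapped phase can be sketched generically as follows. This is the standard coarse-guides-fine calculation, not necessarily the authors' exact algorithm; the names `wrap`, `hierarchical_unwrap`, and `scale` (the assumed ratio between the fine and coarse phase sensitivities) are hypothetical. The result is correct only when the coarse estimate, after scaling, is within π of the true fine phase.

```python
import math

TWO_PI = 2.0 * math.pi


def wrap(phase):
    """Wrap a phase in radians into the interval [-pi, pi)."""
    return (phase + math.pi) % TWO_PI - math.pi


def hierarchical_unwrap(coarse_unwrapped, fine_wrapped, scale):
    """Unwrap a fine (high-precision) phase using a coarse unwrapped phase.

    The scaled coarse phase predicts the fine phase; the nearest integer
    number of 2*pi cycles separating prediction and wrapped measurement
    is the fringe order added back to the fine measurement.
    """
    predicted = coarse_unwrapped * scale
    order = round((predicted - fine_wrapped) / TWO_PI)
    return fine_wrapped + TWO_PI * order


if __name__ == "__main__":
    true_fine = 40.0                     # true fine phase in radians
    scale = 10.0                         # assumed sensitivity ratio
    coarse = true_fine / scale + 0.05    # coarse phase with a small error
    recovered = hierarchical_unwrap(coarse, wrap(true_fine), scale)
    print(recovered)
```

The recovered value keeps the noise level of the fine measurement, because the coarse phase is used only to select the integer fringe order.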
