Search | Portal Regional de la BVS

Off-Policy Prediction Learning: An Empirical Study of Online Algorithms.

Ghiassian, Sina; Rafiee, Banafsheh; Sutton, Richard S.

IEEE Trans Neural Netw Learn Syst ; PP, 2024 Jun 10.

Article in English | MEDLINE | ID: mdl-38857133

ABSTRACT

Off-policy prediction (learning the value function for one policy from data generated while following another policy) is one of the most challenging problems in reinforcement learning. This article makes two main contributions: 1) it empirically studies 11 off-policy prediction learning algorithms with emphasis on their sensitivity to parameters, learning speed, and asymptotic error and 2) based on the empirical results, it proposes two step-size adaptation methods that help the algorithm with the lowest error from the experimental study learn faster. Many off-policy prediction learning algorithms have been proposed in the past decade, but it remains unclear which algorithms learn faster than others. In this article, we empirically compare 11 off-policy prediction learning algorithms with linear function approximation on three small tasks: the Collision task, the Rooms task, and a third task. The Collision task is a small off-policy problem analogous to that of an autonomous car trying to predict whether it will collide with an obstacle. The Rooms task and the third task are designed such that learning fast in them is challenging. In the Rooms task, the product of importance sampling ratios can be as large as 2^14. To control the high variance caused by the product of the importance sampling ratios, the step size should be set small, which, in turn, slows down learning. The third task is more extreme in that the product of the ratios can become as large as 2^14 × 25. The algorithms considered are Off-policy TD, five Gradient-TD algorithms, two Emphatic-TD algorithms, Vtrace, and variants of Tree Backup and ABQ that are applicable to the prediction setting. We found that the algorithms' performance is highly affected by the variance induced by the importance sampling ratios. Tree Backup, Vtrace, and ABTD are not affected by the high variance as much as other algorithms, but they restrict the effective bootstrapping parameter in a way that is too limiting for tasks where high variance is not present. We observed that Emphatic TD tends to have lower asymptotic error than other algorithms but might learn more slowly in some cases. Based on the empirical results, we propose two step-size adaptation algorithms, which we collectively refer to as the Ratchet algorithms, with the same underlying idea: keep the step-size parameter as large as possible and ratchet it down only when necessary to avoid overshoot. We show that the Ratchet algorithms are effective by comparing them with other popular step-size adaptation algorithms, such as the Adam optimizer.
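The role of the importance sampling ratio in the abstract above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the one-step Off-policy TD(0) form with linear features, and the example probabilities (a target policy that always takes an action the behavior policy picks with probability 0.25, for 7 steps) are assumptions chosen only to show how a product of ratios reaches the order of 2^14.

```python
from typing import List, Sequence


def ratio_product(target_probs: Sequence[float],
                  behavior_probs: Sequence[float]) -> float:
    """Product of per-step importance sampling ratios pi(a|s) / b(a|s)."""
    product = 1.0
    for pi, b in zip(target_probs, behavior_probs):
        if b <= 0.0:
            raise ValueError("behavior policy must give the action nonzero probability")
        product *= pi / b
    return product


def off_policy_td0_update(w: Sequence[float], x: Sequence[float],
                          x_next: Sequence[float], reward: float,
                          rho: float, alpha: float, gamma: float) -> List[float]:
    """One Off-policy TD(0) step with linear function approximation.

    w: weight vector, x / x_next: feature vectors of the current and next
    state, rho: importance sampling ratio for the action taken,
    alpha: step size, gamma: discount factor.
    """
    v = sum(wi * xi for wi, xi in zip(w, x))
    v_next = sum(wi * xi for wi, xi in zip(w, x_next))
    td_error = reward + gamma * v_next - v
    # The ratio rho scales the whole update, so large ratios mean large,
    # high-variance steps unless alpha is made small.
    return [wi + alpha * rho * td_error * xi for wi, xi in zip(w, x)]


# Hypothetical numbers: ratio 1.0 / 0.25 = 4 on each of 7 steps.
print(ratio_product([1.0] * 7, [0.25] * 7))  # 4**7 == 2**14
print(off_policy_td0_update([0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                            reward=1.0, rho=2.0, alpha=0.1, gamma=0.9))
```

With ratios of this size multiplying the update, the step size has to shrink by a comparable factor to keep the weights stable, which is the learning-speed trade-off the abstract describes.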

From eye-blinks to state construction: Diagnostic benchmarks for online representation learning.

Rafiee, Banafsheh; Abbas, Zaheer; Ghiassian, Sina; Kumaraswamy, Raksha; Sutton, Richard S; Ludvig, Elliot A; White, Adam.

Adapt Behav ; 31(1): 3-19, 2023 Feb.

Article in English | MEDLINE | ID: mdl-36618906

ABSTRACT

We present three new diagnostic prediction problems inspired by classical-conditioning experiments to facilitate research in online prediction learning. Experiments in classical conditioning show that animals such as rabbits, pigeons, and dogs can make long temporal associations that enable multi-step prediction. To replicate this remarkable ability, an agent must construct an internal state representation that summarizes its interaction history. Recurrent neural networks can automatically construct state and learn temporal associations. However, the current training methods are prohibitively expensive for online prediction (continual learning on every time step), which is the focus of this paper. Our proposed problems test the learning capabilities that animals readily exhibit and highlight the limitations of the current recurrent learning methods. While the proposed problems are nontrivial, they are still amenable to extensive testing and analysis in the small-compute regime, thereby enabling researchers to study issues in isolation, ultimately accelerating progress towards scalable online representation learning methods.

Electrocatalytic oxidation and determination of insulin at nickel oxide nanoparticles-multiwalled carbon nanotube modified screen printed electrode.

Rafiee, Banafsheh; Fakhari, Ali Reza.

Biosens Bioelectron ; 46: 130-5, 2013 Aug 15.

Article in English | MEDLINE | ID: mdl-23531859

ABSTRACT

A nickel oxide nanoparticle-modified Nafion-multiwalled carbon nanotube screen-printed electrode (NiONPs/Nafion-MWCNTs/SPE) was prepared by pulsed electrodeposition of NiONPs on the MWCNTs/SPE surface. The size, distribution and structure of the NiONPs/Nafion-MWCNTs were characterized by transmission electron microscopy (TEM) and X-ray diffraction (XRD); the results show that NiO nanoparticles were homogeneously electrodeposited on the surfaces of the MWCNTs. The electrochemical behavior of NiONPs/Nafion-MWCNTs composites in aqueous alkaline solutions of insulin was studied by cyclic voltammetry, chronoamperometry and electrochemical impedance spectroscopy (EIS). The prepared nanoparticles were found to have excellent electrocatalytic activity towards insulin oxidation, attributed to the special properties of NiO nanoparticles. Cyclic voltammetric studies showed that the NiONPs/Nafion-MWCNTs film-modified SPE lowers the overpotential and improves the electrochemical behavior of insulin oxidation compared to the bare SPE. Amperometry was also used to evaluate the analytical performance of the modified electrode in the quantitation of insulin. Excellent analytical features, including high sensitivity (1.83 µA/µM), low detection limit (6.1 nM) and satisfactory dynamic range (20.0-260.0 nM), were achieved under optimized conditions. Moreover, these sensors show good repeatability and high stability over time and under successive potential cycling.
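The reported figures of merit can be turned into a small worked conversion. The sketch below assumes a linear amperometric response with zero intercept inside the stated dynamic range; the abstract gives only the sensitivity and the range, so the zero intercept and the function names are assumptions for illustration, not the authors' calibration.

```python
SENSITIVITY_UA_PER_UM = 1.83          # reported sensitivity, µA per µM
DYNAMIC_RANGE_NM = (20.0, 260.0)      # reported dynamic range, nM


def _check_range(conc_nm: float) -> None:
    low, high = DYNAMIC_RANGE_NM
    if not low <= conc_nm <= high:
        raise ValueError(f"{conc_nm} nM is outside the {low}-{high} nM dynamic range")


def current_from_concentration(conc_nm: float) -> float:
    """Expected current in µA, assuming a zero-intercept linear response."""
    _check_range(conc_nm)
    return SENSITIVITY_UA_PER_UM * conc_nm / 1000.0  # nM -> µM


def concentration_from_current(current_ua: float) -> float:
    """Insulin concentration in nM implied by a measured current in µA."""
    conc_nm = current_ua / SENSITIVITY_UA_PER_UM * 1000.0  # µM -> nM
    _check_range(conc_nm)
    return conc_nm


print(current_from_concentration(100.0))   # current for 100 nM insulin, in µA
print(concentration_from_current(0.183))   # concentration for 0.183 µA, in nM
```

Under this assumption, 100 nM insulin corresponds to about 0.183 µA; a real calibration curve would also carry an intercept and its uncertainty.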

Subject(s)

Biosensing Techniques/instrumentation; Electrochemical Techniques/instrumentation; Insulin/analysis; Nanoparticles/chemistry; Nanotubes, Carbon/chemistry; Nickel/chemistry; Electrodes; Electroplating; Limit of Detection; Nanoparticles/ultrastructure; Nanotubes, Carbon/ultrastructure; Oxidation-Reduction; Reproducibility of Results
