Results 1 - 15 of 15
1.
Sci Rep ; 14(1): 2843, 2024 Feb 03.
Article in English | MEDLINE | ID: mdl-38310201

ABSTRACT

One of the challenges of technology-assisted motor learning is how to adapt practice to facilitate learning. Random practice has been shown to promote long-term learning, but it does not adapt to the learner's specific requirements. Previous attempts to adapt practice considered the skill level of learners from past training sessions. This study investigates the effects of personalizing practice in real time through a curriculum learning approach, in which a curriculum of tasks is built from the consecutive performance differences observed for each task. Twelve participants were allocated to each of three training conditions in an experiment that required performing a steering task to drive a cursor through an arc channel. The curriculum learning approach was compared with two other conditions: random practice and an adaptive practice that does not consider the learning evolution. Curriculum learning outperformed random practice in increasing movement smoothness at post-test, and outperformed both random practice and the adaptive practice on transfer tests. Adapting practice through curriculum learning also made learners' skills more uniform. Based on these findings, we anticipate that future research will explore curriculum learning in interactive training tools that support motor skill learning, such as in rehabilitation.
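
The abstract describes building a curriculum by ranking tasks on consecutive performance differences. A minimal sketch of one way such a scheduler could work; the scoring rule, the epsilon mixing with random practice, and all names are assumptions for illustration, not the authors' implementation:

```python
import random

class CurriculumScheduler:
    """Pick the next practice task from recent learning progress.

    Learning progress for a task is approximated by the difference
    between its two most recent performance scores (assumed rule:
    higher difference = currently improving, so practice it more).
    """

    def __init__(self, tasks, epsilon=0.1):
        self.history = {t: [] for t in tasks}  # performance scores per task
        self.epsilon = epsilon                 # residual random practice (assumed)

    def record(self, task, score):
        self.history[task].append(score)

    def progress(self, task):
        scores = self.history[task]
        if len(scores) < 2:
            return float("inf")        # unseen tasks get tried first
        return scores[-1] - scores[-2]  # consecutive performance difference

    def next_task(self):
        if random.random() < self.epsilon:  # keep some random practice
            return random.choice(list(self.history))
        return max(self.history, key=self.progress)
```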

3.
Commun Biol ; 3(1): 34, 2020 01 21.
Article in English | MEDLINE | ID: mdl-31965053

ABSTRACT

Can decisions be made solely by chance? Is variability intrinsic to the decision-maker, or is it inherited from environmental conditions? To investigate these questions, we designed a deterministic setting in which mice are rewarded for non-repetitive choice sequences, and modeled the experiment using reinforcement learning. We found that mice progressively increased their choice variability. Although an optimal strategy based on sequence learning was theoretically possible and would have been more rewarding, the animals used a pseudo-random selection that ensured a high success rate. This was not the case when animals were exposed to a uniform probabilistic reward delivery. We also show that mice were blind to changes in the temporal structure of reward delivery once they had learned to choose at random. Overall, our results demonstrate that a decision-making process can self-generate variability and randomness, even when the rules governing reward delivery are neither stochastic nor volatile.
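
A deterministic rule rewarding non-repetitive choice sequences can be made concrete with a toy instantiation. The window length and the exact repetition criterion below are guesses; the abstract does not specify the actual rule used with the mice:

```python
from collections import deque

def make_reward_rule(seq_len=3):
    """Toy deterministic reward rule: a binary choice is rewarded only
    if the last `seq_len` choices differ from the previously produced
    window (assumed criterion, for illustration only)."""
    recent = deque(maxlen=seq_len)
    last_window = [None]

    def reward(choice):
        recent.append(choice)
        if len(recent) < seq_len:
            return 0                     # not enough history yet
        window = tuple(recent)
        r = 0 if window == last_window[0] else 1  # deterministic, not stochastic
        last_window[0] = window
        return r

    return reward
```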


Subjects
Animal Behavior, Choice Behavior, Algorithms, Animals, Bayes Theorem, Learning, Male, Markov Chains, Memory, Mice, Theoretical Models
4.
Front Robot AI ; 7: 61, 2020.
Article in English | MEDLINE | ID: mdl-33501229

ABSTRACT

Producing feasible motions for highly redundant robots, such as humanoids, is a complicated, high-dimensional problem. Model-based whole-body control of such robots can generate complex dynamic behaviors through the simultaneous execution of multiple tasks. Unfortunately, tasks are generally planned without close consideration of the underlying controller or of the other tasks being executed, and are often infeasible when executed on the robot. Consequently, there is no guarantee that the motion will be accomplished. In this work, we develop a proof-of-concept optimization loop that automatically improves task feasibility using model-free policy search in conjunction with model-based whole-body control. This combination allows problems to be solved that would be intractable using either approach alone. Through experiments on both the simulated and the real iCub humanoid robot, we show that by optimizing task feasibility, initially infeasible complex dynamic motions can be realized, specifically a sit-to-stand transition. These experiments can be viewed in the accompanying Video S1.
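
The outer loop described here, model-free search over task parameters evaluated by the model-based controller, can be sketched schematically. The simple random-search update, the `rollout` interface, and the cost definition below are assumptions, not the paper's actual optimizer:

```python
import numpy as np

def optimize_task_feasibility(rollout, theta0, sigma=0.05, iters=100, pop=8):
    """Model-free policy search over task parameters theta.

    `rollout(theta)` is assumed to run the model-based whole-body
    controller with task parameters theta and return a feasibility
    cost (e.g. tracking error plus constraint violations; lower is better).
    """
    theta, best = theta0.copy(), rollout(theta0)
    for _ in range(iters):
        # propose a population of perturbed task parameterizations
        candidates = theta + sigma * np.random.randn(pop, theta.size)
        costs = np.array([rollout(c) for c in candidates])
        if costs.min() < best:                 # keep the best perturbation
            best, theta = costs.min(), candidates[costs.argmin()]
    return theta, best
```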

5.
Neural Netw ; 113: 28-40, 2019 May.
Article in English | MEDLINE | ID: mdl-30780043

ABSTRACT

Continuous-action policy search is currently the focus of intensive research, driven both by the recent success of deep reinforcement learning algorithms and by the emergence of competitors based on evolutionary algorithms. In this paper, we present a broad survey of policy search methods, providing a unified perspective on very different approaches, including Bayesian optimization and directed exploration methods. The main message of this overview concerns the relationships between the families of methods, but we also outline some factors underlying the sample-efficiency properties of the various approaches.
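
Despite their different origins, the surveyed families share the same outer loop: sample policy parameters, evaluate the return, update the sampling distribution. A generic sketch of that loop; the Gaussian search distribution and elite fraction are illustrative choices in the spirit of evolutionary strategies, not any specific surveyed method:

```python
import numpy as np

def episodic_policy_search(evaluate, dim, iters=50, pop=32, elite_frac=0.25):
    """Generic episodic policy search (a cross-entropy-style loop).

    `evaluate(theta)` is assumed to return the episodic return of the
    policy with parameters theta (higher is better).
    """
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        thetas = mean + std * np.random.randn(pop, dim)   # sample policies
        returns = np.array([evaluate(t) for t in thetas])  # evaluate them
        elite = thetas[np.argsort(returns)[-n_elite:]]     # keep the best
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean
```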


Subjects
Deep Learning/trends, Policies, Reinforcement (Psychology), Algorithms, Bayes Theorem
6.
Front Neurorobot ; 12: 59, 2018.
Article in English | MEDLINE | ID: mdl-30319388

ABSTRACT

Reinforcement learning (RL) aims at building a policy that maximizes a task-related reward within a given domain. When the domain is known, i.e., when its states, actions and reward are defined, Markov Decision Processes (MDPs) provide a convenient theoretical framework for formalizing RL. But in an open-ended learning process, an agent or robot must solve an unbounded sequence of tasks that are not known in advance, and the corresponding MDPs cannot be built at design time. This defines the main challenge of open-ended learning: how can the agent learn to behave appropriately when adequate state, action and reward representations are not given? In this paper, we propose a conceptual framework to address this question. We assume an agent endowed with low-level perception and action capabilities. This agent receives an external reward when it faces a task, and must discover the state and action representations that will let it cast the task as an MDP in order to solve it by RL. The relevance of the state and action representations is critical for the agent to learn efficiently. Considering that the agent starts with low-level, task-agnostic state and action spaces based on its perception and action capabilities, we describe open-ended learning as the challenge of building adequate representations of states and actions, i.e., of redescribing the available representations. We suggest an iterative approach to this problem based on successive Representational Redescription processes, and highlight the corresponding challenges, in which intrinsic motivations play a key role.
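
As a reference point, the object the open-ended learner must eventually construct can be written down explicitly. A minimal sketch of the MDP tuple the abstract refers to; the field names and types are illustrative, not part of the paper's framework:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class MDP:
    """The tuple an open-ended learner must construct for each task:
    unlike in standard RL, none of these are given at design time."""
    states: Sequence        # discovered state representation S
    actions: Sequence       # discovered action representation A
    transition: Callable    # T(s, a) -> next state (or a distribution over states)
    reward: Callable        # R(s, a) -> float, grounded in the external reward
```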

7.
Front Hum Neurosci ; 12: 143, 2018.
Article in English | MEDLINE | ID: mdl-29697699

ABSTRACT

[This corrects the article on p. 615 in vol. 11, PMID: 29379424.].

8.
Front Hum Neurosci ; 12: 76, 2018.
Article in English | MEDLINE | ID: mdl-29488507

ABSTRACT

[This corrects the article on p. 615 in vol. 11, PMID: 29379424.].

9.
Front Robot AI ; 5: 70, 2018.
Article in English | MEDLINE | ID: mdl-33500949

ABSTRACT

Perceiving the surrounding environment in terms of objects is useful for any general-purpose intelligent agent. In this paper, we investigate a fundamental mechanism that makes object perception possible, namely the identification of spatio-temporally invariant structures in the sensorimotor experience of an agent. Taking inspiration from the Sensorimotor Contingencies Theory, we define a computational model of this mechanism through a sensorimotor, unsupervised and predictive approach, based on processing the unsupervised interaction of an artificial agent with its environment. We show how spatio-temporally invariant structures in the environment induce regularities in the agent's sensorimotor experience, and how the agent, while building a predictive model of that experience, can capture them as densely connected subgraphs in a graph of sensory states connected by motor commands. Our approach focuses on elementary mechanisms and is illustrated with a set of simple experiments in which an agent interacts with an environment. We show how the agent can build an internal model of moving but spatio-temporally invariant structures by performing Spectral Clustering on the graph modeling its overall sensorimotor experience. We systematically examine properties of the model, shedding light more broadly on how the paradigm differs from methods based on the supervised processing of collections of static images.
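
The clustering step described here can be sketched directly: sensory states are nodes, motor commands define edges, and spectral clustering of the resulting graph yields candidate invariant structures. A minimal sketch with scikit-learn; the graph construction, transition format, and cluster count are assumptions, not the paper's pipeline:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_sensorimotor_graph(transitions, n_states, n_clusters=4):
    """Cluster a graph of sensory states linked by motor commands.

    `transitions` is assumed to be a list of (state_i, state_j) pairs:
    sensory states observed before and after some motor command.
    """
    adj = np.zeros((n_states, n_states))
    for i, j in transitions:
        adj[i, j] += 1.0
        adj[j, i] += 1.0  # symmetrize: spectral clustering expects an affinity
    labels = SpectralClustering(
        n_clusters=n_clusters, affinity="precomputed"
    ).fit_predict(adj + 1e-9)  # tiny offset keeps the graph connected
    return labels  # densely connected subgraphs ~ invariant structures
```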

10.
Front Hum Neurosci ; 11: 615, 2017.
Article in English | MEDLINE | ID: mdl-29379424

ABSTRACT

Two basic trade-offs interact while our brain decides how to move our body. First, with the cost-benefit trade-off, the brain trades the value of moving faster toward a more rewarding target against the increased muscular cost of a faster movement. Second, with the speed-accuracy trade-off, the brain trades how accurate the movement needs to be against the time it takes to achieve that accuracy. So far, these two trade-offs have been studied in isolation, despite their obvious interdependence. To overcome this limitation, we propose a new model that accounts for both trade-offs simultaneously. The model assumes that the central nervous system maximizes the expected utility resulting from the potential reward and the cost over the repetition of many movements, taking into account the probability of missing the target. To validate the proposed hypothesis, we compare the properties of the computational model with data from an experimental study in which subjects had to reach for targets by performing arm movements in a horizontal plane. The results qualitatively show that the proposed model successfully accounts for both the cost-benefit and speed-accuracy trade-offs.
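
One hedged way to write the quantity the model is described as maximizing, making explicit how the two trade-offs enter. This is an illustrative reconstruction from the abstract's wording; the paper's exact reward and cost terms may differ:

```latex
% Schematic expected utility of a reaching movement of duration T
% (illustrative, not the paper's exact form):
%   - P_hit(T) grows with T: slower movements are more accurate
%     (speed-accuracy trade-off)
%   - C(T) grows as T shrinks: faster movements cost more muscularly
%     (cost-benefit trade-off)
\[
  EU(T) \;=\; P_{\mathrm{hit}}(T)\, R \;-\; C(T),
  \qquad T^{*} \;=\; \arg\max_{T} \, EU(T)
\]
```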

11.
Neural Netw ; 69: 60-79, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26087306

ABSTRACT

Regression is the process of learning relationships between inputs and continuous outputs from example data, enabling predictions for novel inputs. The history of regression is closely tied to the history of artificial neural networks, since the seminal work of Rosenblatt (1958). The aims of this paper are to provide an overview of many regression algorithms and to demonstrate that the function representations whose parameters they regress fall into two classes: a weighted sum of basis functions, or a mixture of linear models; furthermore, we show that the former is a special case of the latter. Our ambition is thus to provide a deep understanding of the relationship between these algorithms which, despite being derived from very different principles, use function representations that can be captured within one unified model. Finally, step-by-step derivations of the algorithms from first principles and visualizations of their inner workings allow this article to be used as a tutorial by those new to regression.
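
The two representation classes, and the special-case claim, can be stated compactly. The notation below is mine, chosen to mirror the abstract's wording rather than the paper's own symbols:

```latex
% Weighted sum of basis functions:
\[
  f(\mathbf{x}) \;=\; \sum_{k=1}^{K} w_k\, \phi_k(\mathbf{x})
\]
% Mixture of linear models, gated by the same basis functions:
\[
  f(\mathbf{x}) \;=\; \sum_{k=1}^{K} \phi_k(\mathbf{x})
    \left( \mathbf{a}_k^{\top} \mathbf{x} + b_k \right)
\]
% Setting a_k = 0 and b_k = w_k in the mixture recovers the weighted
% sum, so the former is a special case of the latter.
```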


Subjects
Algorithms, Neural Networks (Computer), Humans, Linear Models, Theoretical Models, Normal Distribution
12.
J Physiol Paris ; 109(1-3): 78-86, 2015.
Article in English | MEDLINE | ID: mdl-24954026

ABSTRACT

Gaining a better understanding of the biological mechanisms underlying the individual variation observed in responses to rewards and reward cues could help to identify and treat individuals more prone to disorders of impulse control, such as addiction. Variation in response to reward cues is captured in rats undergoing autoshaping experiments, where the appearance of a lever precedes food delivery. Although no response is required for food to be delivered, some rats (goal-trackers) learn to approach and avidly engage the food magazine until food delivery, whereas other rats (sign-trackers) come to approach and avidly engage the lever. The impulsive and often maladaptive characteristics of the latter response are reminiscent of addictive behaviour in humans. In a previous article, we developed a computational model accounting for a set of experimental data regarding sign-trackers and goal-trackers. Here we present new simulations of the model to draw experimental predictions that could help further validate or refute it. In particular, we apply the model to new experimental protocols, such as injecting flupentixol locally into the core of the nucleus accumbens rather than systemically, and lesioning the core of the nucleus accumbens before or after conditioning. In addition, we discuss the possibility of removing the food magazine during the inter-trial interval. The predictions from this revised model should help clarify the role of different brain regions in the behaviours expressed by sign-trackers and goal-trackers.


Subjects
Computer Simulation, Goals, Neurological Models, Reward, Animals, Classical Conditioning, Cues (Psychology), Predictive Value of Tests, Probability
13.
PLoS One ; 9(10): e111050, 2014.
Article in English | MEDLINE | ID: mdl-25347531

ABSTRACT

Animals, including humans, are prone to developing persistent, maladaptive and suboptimal behaviours. Some of these behaviours have been suggested to arise from interactions between brain systems for Pavlovian conditioning, the acquisition of responses to initially neutral stimuli previously paired with rewards, and instrumental conditioning, the acquisition of active behaviours leading to rewards. However, the mechanics of these systems and of their interactions are still unclear: while each has been extensively studied independently, few models have been developed to account for their interactions. In some experiments, pigeons display a maladaptive behaviour that has been suggested to involve conflicts between Pavlovian and instrumental conditioning. In a procedure referred to as negative automaintenance, a key light is paired with the subsequent delivery of food, but any peck at the key light results in the omission of the reward. Studies have shown that in this procedure some pigeons persist in pecking at a substantial rate despite its negative consequences, while others learn to refrain from pecking and maximize their cumulative rewards. Furthermore, the pigeons that were unable to refrain from pecking could nevertheless shift their pecks towards a harmless alternative key light. We confronted a computational model combining dual learning systems and factored representations, recently developed to account for sign-tracking and goal-tracking behaviours in rats, with these negative automaintenance data. We show that it can explain the variability of the observed behaviours and the capacity of alternative key lights to distract pigeons from their detrimental behaviour. These results confirm the proposed model as an interesting tool for reproducing experiments that may involve interactions between Pavlovian and instrumental conditioning. The model allows us to draw predictions that can be experimentally verified, which could help further investigate the neural mechanisms underlying these interactions.


Subjects
Animal Behavior, Columbidae, Algorithms, Animals, Learning, Theoretical Models
14.
Front Neurorobot ; 8: 5, 2014.
Article in English | MEDLINE | ID: mdl-24596554

ABSTRACT

We hypothesize that the initiative of a robot during a collaborative task with a human can influence the pace of the interaction, the human's response to attention cues, and the perceived engagement. We propose an object-learning experiment in which a human interacts in a natural way with the humanoid iCub. In a two-phase scenario, the human teaches the robot the properties of some objects. We compare the effect of which partner initiates the task in the teaching phase (human or robot) on the rhythm of the interaction in the verification phase, and we measure the reaction time of the human's gaze when responding to attention utterances of the robot. Our experiments show that when the robot initiates the learning task, the pace of the interaction is higher and reactions to attention cues are faster. Subjective evaluations suggest, however, that the initiating role of the robot does not affect perceived engagement. Moreover, subjective and third-person evaluations of the interaction suggest that the attentive mechanism we implemented in iCub can arouse engagement and make the robot's behavior readable.

15.
PLoS Comput Biol ; 10(2): e1003466, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24550719

ABSTRACT

Reinforcement learning has greatly influenced models of conditioning, providing powerful explanations of acquired behaviour and of the underlying physiological observations. However, in recent autoshaping experiments in rats, variation in the form of Pavlovian conditioned responses (CRs) and in the associated dopamine activity has called into question the classical hypothesis that phasic dopamine activity corresponds to a reward prediction error-like signal arising from a classical Model-Free system necessary for Pavlovian conditioning. Over the course of Pavlovian conditioning using food as the unconditioned stimulus (US), some rats (sign-trackers) come to approach and engage the conditioned stimulus (CS) itself, a lever, more and more avidly, whereas other rats (goal-trackers) learn to approach the location of food delivery upon CS presentation. Importantly, although both sign-trackers and goal-trackers learn the CS-US association equally well, only in sign-trackers does phasic dopamine activity show classical reward prediction error-like bursts. Furthermore, neither the acquisition nor the expression of a goal-tracking CR is dopamine-dependent. Here we present a computational model that can account for such individual variation. We show that a combination of a Model-Based system and a revised Model-Free system can account for the development of distinct CRs in rats. Moreover, we show that revising a classical Model-Free system to process stimuli individually, using factored representations, can explain why classical dopaminergic patterns are observed in some rats but not in others, depending on the CR they develop. In addition, the model accounts for other behavioural and pharmacological results obtained using the same, or similar, autoshaping procedures. Finally, the model makes it possible to draw a set of experimental predictions that may be verified in a modified experimental protocol. We suggest that further investigation of factored representations in computational neuroscience studies may be useful.
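
The combination described, a Model-Based system plus a revised Model-Free system over factored stimulus representations, can be sketched as a weighted integration of the two systems' values. The omega weighting and softmax choice below follow a common convention in dual-system models, not necessarily the paper's exact equations; all names are assumptions:

```python
import numpy as np

def action_probabilities(q_mf, a_mb, omega, beta=3.0):
    """Blend Model-Free values and Model-Based advantages per stimulus.

    q_mf, a_mb: arrays of values for each available action on each
    stimulus (factored representation: e.g. the lever and the magazine
    are evaluated separately rather than as one state). omega in [0, 1]
    sets the MB/MF balance and could capture individual variation
    (sign-trackers vs goal-trackers); beta is the choice temperature.
    """
    integrated = omega * a_mb + (1.0 - omega) * q_mf
    exp_v = np.exp(beta * (integrated - integrated.max()))  # stable softmax
    return exp_v / exp_v.sum()
```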


Subjects
Psychological Conditioning, Psychological Models, Algorithms, Animals, Animal Behavior/drug effects, Animal Behavior/physiology, Brain/physiology, Computational Biology, Psychological Conditioning/drug effects, Psychological Conditioning/physiology, Dopamine/physiology, Dopamine Antagonists/administration & dosage, Flupentixol/administration & dosage, Neurological Models, Rats, Reinforcement (Psychology), Systems Biology