Results 1 - 4 of 4
1.
Neuron; 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-39357519

ABSTRACT

Efficient musculoskeletal simulators and powerful learning algorithms provide computational tools to tackle the grand challenge of understanding biological motor control. Our winning solution for the inaugural NeurIPS MyoChallenge leverages an approach mirroring human skill learning. Using a novel curriculum learning approach, we trained a recurrent neural network to control a realistic model of the human hand with 39 muscles to rotate two Baoding balls in the palm of the hand. In agreement with data from human subjects, the policy uncovers a small number of kinematic synergies, even though it is not explicitly biased toward low-dimensional solutions. However, by selectively inactivating parts of the control signal, we found that more dimensions contribute to task performance than traditional synergy analysis suggests. Overall, our work illustrates the emerging possibilities at the interface of musculoskeletal physics engines, reinforcement learning, and neuroscience to advance our understanding of biological motor control.
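As an illustration of the "traditional synergy analysis" the abstract contrasts itself with: a common approach is principal component analysis (PCA) on recorded joint-angle trajectories, counting how many components are needed to explain most of the kinematic variance. The sketch below uses hypothetical data shapes and a 90% variance threshold; it is not the authors' exact pipeline.

```python
# Illustrative sketch of a standard kinematic-synergy analysis (PCA on joint
# angles). The array shapes and the 90% threshold are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical recording: 5000 time steps of 23 hand joint angles. A real
# analysis would load trajectories from the simulator or from human subjects.
joint_angles = rng.standard_normal((5000, 23)) @ rng.standard_normal((23, 23))

# Center the data and diagonalize its covariance matrix.
centered = joint_angles - joint_angles.mean(axis=0)
cov = centered.T @ centered / (centered.shape[0] - 1)
eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigenvalues, sorted descending

# "Synergies" are often counted as the number of principal components
# needed to explain, e.g., 90% of the kinematic variance.
explained = np.cumsum(eigvals) / eigvals.sum()
n_synergies = int(np.searchsorted(explained, 0.90) + 1)
print(f"{n_synergies} components explain 90% of the variance")
```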

2.
bioRxiv; 2023 Nov 14.
Article in English | MEDLINE | ID: mdl-38014166

ABSTRACT

To thrive in complex environments, animals and artificial agents must learn to act adaptively to maximize fitness and rewards. Such adaptive behavior can be learned through reinforcement learning [1], a class of algorithms that has been successful at training artificial agents [2-6] and at characterizing the firing of dopamine neurons in the midbrain [7-9]. In classical reinforcement learning, agents discount future rewards exponentially according to a single timescale, controlled by the discount factor. Here, we explore the presence of multiple timescales in biological reinforcement learning. We first show that reinforcement learning agents learning at a multitude of timescales possess distinct computational benefits. Next, we report that dopamine neurons in mice performing two behavioral tasks encode reward prediction error with a diversity of discount time constants. Our model explains the heterogeneity of temporal discounting in both cue-evoked transient responses and slower-timescale fluctuations known as dopamine ramps. Crucially, the measured discount factor of individual neurons is correlated across the two tasks, suggesting that it is a cell-specific property. Together, our results provide a new paradigm for understanding functional heterogeneity in dopamine neurons and a mechanistic basis for the empirical observation that humans and animals use non-exponential discounts in many situations [10-14], and they open new avenues for the design of more efficient reinforcement learning algorithms.
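To make the contrast concrete: a classical agent values a reward stream with a single exponential discount, V = Σ_t γ^t r_t, whereas a multi-timescale agent maintains a population of value channels, each with its own γ. The sketch below uses a made-up reward stream and discount factors; it also shows that averaging exponential discounts across timescales yields an effectively non-exponential (roughly hyperbolic) discount curve, consistent with the non-exponential discounting the abstract mentions.

```python
# Minimal sketch of discounting at multiple timescales versus the single
# discount factor of classical RL. Rewards and gammas are illustrative values.
import numpy as np

rewards = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0])  # hypothetical stream
t = np.arange(len(rewards))

# Classical agent: one exponential discount, V = sum_t gamma^t * r_t.
gamma = 0.9
v_single = np.sum(gamma ** t * rewards)

# Multi-timescale agent: a population of value channels, each with its own
# gamma (cf. the cell-specific discount factors reported for dopamine neurons).
gammas = np.array([0.6, 0.8, 0.9, 0.95, 0.99])
v_population = (gammas[:, None] ** t * rewards).sum(axis=1)

# Averaging exponential discounts across timescales produces a discount curve
# that is no longer a single exponential.
avg_discount = (gammas[:, None] ** t).mean(axis=0)
print(v_single, v_population, avg_discount)
```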

3.
Behav Res Methods; 54(1): 233-251, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34145547

ABSTRACT

When people seek to understand concepts from an incomplete set of examples and counterexamples, there is usually an exponentially large number of classification rules that can correctly classify the observed data, depending on which features of the examples are used to construct these rules. A mechanistic approximation of human concept learning should help to explain how humans prefer some rules over others when many fit the observed data equally well. Here, we exploit the tools of propositional logic to develop an experimental framework that controls the minimal rules that are simultaneously consistent with the presented examples. For example, our framework allows us to present participants with concepts consistent with a disjunction and also with a conjunction, depending on which features are used to build the rule. Similarly, it allows us to present concepts that are simultaneously consistent with two or more rules of different complexity that use different features. Importantly, our framework fully controls which minimal rules compete to explain the examples and is able to recover the features used by the participant to build the classification rule, without relying on supplementary attention-tracking mechanisms (e.g., eye tracking). We exploit our framework in an experiment with a sequence of such competitive trials, illustrating the emergence of various transfer effects that bias participants' prior attention to specific sets of features during learning.
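To illustrate the kind of stimulus construction the framework enables, the sketch below builds a hypothetical set of labeled binary-feature examples that is simultaneously consistent with a two-feature conjunction and a two-feature disjunction, depending on which features a rule uses. The feature vectors are invented for illustration and are not the paper's stimuli.

```python
# Hypothetical labeled examples that the same data leave ambiguous between a
# conjunction over one feature pair and a disjunction over another pair.
from itertools import combinations

# Each example: a feature vector (f0, f1, f2, f3) and a membership label.
examples = [
    ((1, 1, 1, 0), True),
    ((1, 1, 0, 1), True),
    ((0, 1, 0, 0), False),
    ((1, 0, 0, 0), False),
    ((0, 0, 0, 0), False),
]

def consistent(rule):
    """A rule is consistent if it reproduces every observed label."""
    return all(rule(f) == label for f, label in examples)

# Enumerate all two-feature conjunctions and disjunctions.
for i, j in combinations(range(4), 2):
    conj = lambda f, i=i, j=j: bool(f[i] and f[j])
    disj = lambda f, i=i, j=j: bool(f[i] or f[j])
    if consistent(conj):
        print(f"conjunction f{i} AND f{j} explains the data")
    if consistent(disj):
        print(f"disjunction f{i} OR f{j} explains the data")

# Prints both "f0 AND f1" and "f2 OR f3": two minimal rules, over different
# features, compete to explain the same examples.
```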


Subject(s)
Concept Formation, Logic, Bias, Humans, Learning
4.
Phys Rev E; 101(4-1): 042128, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32422757

ABSTRACT

Recent approaches to human concept learning have successfully combined the power of symbolic, infinitely productive rule systems with statistical learning to explain our ability to learn new concepts from just a few examples. Most of these studies aim to reveal the underlying language structuring these representations and providing a general substrate for thought. However, a model of thought that is fixed once trained is at odds with the extensive literature showing how experience shapes concept learning. Here, we ask about the plasticity of these symbolic descriptive languages. We perform a concept learning experiment demonstrating that humans can very rapidly change the repertoire of symbols they use to identify concepts, by compiling frequently used expressions into new symbols of the language. The pattern of concept learning times is accurately described by a Bayesian agent that rationally updates the probability of compiling a new expression according to how useful it has been for compressing concepts so far. By portraying the language of thought as a flexible system of rules, we also highlight the difficulty of pinning it down empirically.
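A minimal sketch of the compression idea, under illustrative assumptions: the agent accumulates evidence for compiling a candidate symbol in proportion to how many symbols the shorthand would have saved when describing past concepts (an MDL-style likelihood), then converts the accumulated log-odds into a posterior probability. The prior, the description lengths, and the evidence-per-saved-symbol are all assumptions, not the paper's fitted model.

```python
# Toy Bayesian update for compiling a frequently used expression into a new
# symbol. All numbers here are illustrative assumptions.
import math

# Description length (in symbols) of each past concept, written without and
# with a compiled shorthand for a recurring sub-expression.
concepts = [
    {"len_without": 9, "len_with": 4},  # shorthand helps a lot
    {"len_without": 7, "len_with": 5},  # helps a little
    {"len_without": 5, "len_with": 5},  # does not occur in this concept
]

log_odds = math.log(0.1 / 0.9)  # assumed prior odds of compiling the symbol

for c in concepts:
    # Shorter descriptions are exponentially more probable (an MDL-style
    # likelihood), so each saved symbol adds a fixed amount of evidence.
    saved = c["len_without"] - c["len_with"]
    log_odds += saved * math.log(2)  # one bit of evidence per saved symbol

p_compile = 1 / (1 + math.exp(-log_odds))
print(f"posterior probability of compiling the symbol: {p_compile:.3f}")
```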
