Results 1 - 20 of 17,764

1.
Cell; 186(5): 975-986.e13, 2023 03 02.
Article in English | MEDLINE | ID: mdl-36868215

ABSTRACT

Gas vesicles are gas-filled nanocompartments that allow a diverse group of bacteria and archaea to control their buoyancy. The molecular basis of their properties and assembly remains unclear. Here, we report the 3.2 Å cryo-EM structure of the gas vesicle shell made from the structural protein GvpA that self-assembles into hollow helical cylinders closed off by cone-shaped tips. Two helical half shells connect through a characteristic arrangement of GvpA monomers, suggesting a mechanism of gas vesicle biogenesis. The fold of GvpA features a corrugated wall structure typical for force-bearing thin-walled cylinders. Small pores enable gas molecules to diffuse across the shell, while the exceptionally hydrophobic interior surface effectively repels water. Comparative structural analysis confirms the evolutionary conservation of gas vesicle assemblies and demonstrates molecular features of shell reinforcement by GvpC. Our findings will further research into gas vesicle biology and facilitate molecular engineering of gas vesicles for ultrasound imaging.


Subject(s)
Archaea; Biological Evolution; Cryoelectron Microscopy; Engineering; Reinforcement, Psychology
2.
Cell; 183(4): 954-967.e21, 2020 11 12.
Article in English | MEDLINE | ID: mdl-33058757

ABSTRACT

The curse of dimensionality plagues models of reinforcement learning and decision making. The process of abstraction solves this by constructing variables describing features shared by different instances, reducing dimensionality and enabling generalization in novel situations. Here, we characterized neural representations in monkeys performing a task described by different hidden and explicit variables. Abstraction was defined operationally using the generalization performance of neural decoders across task conditions not used for training, which requires a particular geometry of neural representations. Neural ensembles in prefrontal cortex, hippocampus, and simulated neural networks simultaneously represented multiple variables in a geometry reflecting abstraction but that still allowed a linear classifier to decode a large number of other variables (high shattering dimensionality). Furthermore, this geometry changed in relation to task events and performance. These findings elucidate how the brain and artificial systems represent variables in an abstract format while preserving the advantages conferred by high shattering dimensionality.
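The operational definition of abstraction used here (a linear decoder trained on some task conditions and tested on held-out ones) can be illustrated with a toy simulation. The factorized two-variable geometry, noise level, and least-squares decoder below are illustrative assumptions, not the study's data or analysis pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Four conditions defined by two binary variables (A, B). In a factorized
# ("abstract") geometry, A and B are coded along orthogonal population axes.
n_trials, n_neurons, noise = 200, 50, 0.5
axis_A = rng.normal(size=n_neurons)
axis_A /= np.linalg.norm(axis_A)
axis_B = rng.normal(size=n_neurons)
axis_B -= (axis_B @ axis_A) * axis_A      # orthogonalize against axis_A
axis_B /= np.linalg.norm(axis_B)

def responses(a, b):
    """Noisy population activity for condition (A=a, B=b)."""
    mean = (2 * a - 1) * axis_A + (2 * b - 1) * axis_B
    return mean + noise * rng.normal(size=(n_trials, n_neurons))

# Decode variable A after training only on the B=0 conditions...
X_train = np.vstack([responses(0, 0), responses(1, 0)])
y_train = np.r_[-np.ones(n_trials), np.ones(n_trials)]
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# ...and test on the held-out B=1 conditions (cross-condition generalization).
X_test = np.vstack([responses(0, 1), responses(1, 1)])
y_test = np.r_[-np.ones(n_trials), np.ones(n_trials)]
ccgp = float(np.mean(np.sign(X_test @ w) == y_test))
print(f"cross-condition generalization: {ccgp:.2f}")
```

With this factorized geometry, a decoder trained only on B=0 conditions transfers almost perfectly to B=1; a geometry that entangled the two variables would not support this transfer.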


Subject(s)
Hippocampus/anatomy & histology; Prefrontal Cortex/anatomy & histology; Animals; Behavior, Animal; Brain Mapping; Computer Simulation; Hippocampus/physiology; Learning; Macaca mulatta; Male; Models, Neurological; Neural Networks, Computer; Neurons/physiology; Prefrontal Cortex/physiology; Reinforcement, Psychology; Task Performance and Analysis
3.
Cell; 183(1): 211-227.e20, 2020 10 01.
Article in English | MEDLINE | ID: mdl-32937106

ABSTRACT

The striosome compartment within the dorsal striatum has been implicated in reinforcement learning and regulation of motivation, but how striosomal neurons contribute to these functions remains elusive. Here, we show that a genetically identified striosomal population, which expresses the Teashirt family zinc finger 1 (Tshz1) and belongs to the direct pathway, drives negative reinforcement and is essential for aversive learning in mice. Contrasting a "conventional" striosomal direct pathway, the Tshz1 neurons cause aversion, movement suppression, and negative reinforcement once activated, and they receive a distinct set of synaptic inputs. These neurons are predominantly excited by punishment rather than reward and represent the anticipation of punishment or the motivation for avoidance. Furthermore, inhibiting these neurons impairs punishment-based learning without affecting reward learning or movement. These results establish a major role of striosomal neurons in behaviors reinforced by punishment and moreover uncover functions of the direct pathway unaccounted for in classic models.


Subject(s)
Avoidance Learning/physiology; Corpus Striatum/physiology; Homeodomain Proteins/genetics; Repressor Proteins/genetics; Animals; Basal Ganglia; Female; Homeodomain Proteins/metabolism; Learning/physiology; Male; Mice; Mice, Inbred C57BL; Mice, Knockout; Motivation; Neurons/physiology; Punishment; Reinforcement, Psychology; Repressor Proteins/metabolism
4.
Annu Rev Neurosci; 46: 359-380, 2023 07 10.
Article in English | MEDLINE | ID: mdl-37068787

ABSTRACT

Striosomes form neurochemically specialized compartments of the striatum embedded in a large matrix made up of modules called matrisomes. Striosome-matrix architecture is multiplexed with the canonical direct-indirect organization of the striatum. Striosomal functions remain to be fully clarified, but key information is emerging. First, striosomes powerfully innervate nigral dopamine-containing neurons and can completely shut down their activity, with a following rebound excitation. Second, striosomes receive limbic and cognition-related corticostriatal afferents and are dynamically modulated in relation to value-based actions. Third, striosomes are spatially interspersed among matrisomes and interneurons and are influenced by local and global neuromodulatory and oscillatory activities. Fourth, striosomes tune engagement and the motivation to perform reinforcement learning, to manifest stereotypical behaviors, and to navigate valence conflicts and valence discriminations. We suggest that, at an algorithmic level, striosomes could serve as distributed scaffolds to provide formats of the striatal computations generated through development and refined through learning. We propose that striosomes affect subjective states. By transforming corticothalamic and other inputs to the functional formats of the striatum, they could implement state transitions in nigro-striato-nigral circuits to affect bodily and cognitive actions according to internal motives whose functions are compromised in neuropsychiatric conditions.


Subject(s)
Basal Ganglia; Volition; Basal Ganglia/physiology; Corpus Striatum/physiology; Interneurons; Reinforcement, Psychology
5.
Nature; 626(7999): 583-592, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38092040

ABSTRACT

Animals exhibit a diverse behavioural repertoire when exploring new environments and can learn which actions or action sequences produce positive outcomes. Dopamine release after encountering a reward is critical for reinforcing reward-producing actions1-3. However, it has been challenging to understand how credit is assigned to the exact action that produced the dopamine release during continuous behaviour. Here we investigated this problem in mice using a self-stimulation paradigm in which specific spontaneous movements triggered optogenetic stimulation of dopaminergic neurons. Dopamine self-stimulation rapidly and dynamically changes the structure of the entire behavioural repertoire. Initial stimulations reinforced not only the stimulation-producing target action, but also actions similar to the target action and actions that occurred a few seconds before stimulation. Repeated pairings led to a gradual refinement of the behavioural repertoire to home in on the target action. Reinforcement of action sequences revealed further temporal dependencies of refinement. Action pairs spontaneously separated by long time intervals promoted a stepwise credit assignment, with early refinement of actions most proximal to stimulation and subsequent refinement of more distal actions. Thus, a retrospective reinforcement mechanism promotes not only reinforcement, but also gradual refinement of the entire behavioural repertoire to assign credit to specific actions and action sequences that lead to dopamine release.
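Retrospective credit assignment of the kind described above is commonly modeled with an exponentially decaying eligibility trace, so that actions emitted shortly before stimulation are reinforced, the nearest most strongly. The sketch below is such a generic model; the action set, parameters, and the trace-reset simplification are my own assumptions, not the authors' analysis:

```python
import numpy as np

rng = np.random.default_rng(4)

# Five candidate actions; emitting action `target` triggers "stimulation".
n_actions, target = 5, 2
decay, lr = 0.5, 0.5          # trace decay per action; learning rate
weights = np.ones(n_actions)  # action propensities (sampled proportionally)
trace = np.zeros(n_actions)   # eligibility of recently emitted actions

for _ in range(2000):
    p = weights / weights.sum()
    a = rng.choice(n_actions, p=p)
    trace *= decay
    trace[a] += 1.0
    if a == target:                    # dopamine stimulation on the target
        weights += lr * trace          # credit recent actions, nearest most
        trace[:] = 0.0                 # simplification: trace cleared at stim

p_final = weights / weights.sum()
print(np.round(p_final, 2))
```

Early stimulations also credit the actions that happened to precede the target, but because the target is eligible on every stimulation, repeated pairings progressively refine the repertoire toward it.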


Subject(s)
Dopamine; Learning; Reinforcement, Psychology; Reward; Animals; Mice; Decision Making/physiology; Dopamine/metabolism; Dopaminergic Neurons/metabolism; Learning/physiology; Optogenetics; Time Factors; Models, Psychological; Models, Neurological
6.
Nature; 630(8015): 141-148, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38778097

ABSTRACT

Fentanyl is a powerful painkiller that elicits euphoria and positive reinforcement1. Fentanyl also leads to dependence, defined by the aversive withdrawal syndrome, which fuels negative reinforcement2,3 (that is, individuals retake the drug to avoid withdrawal). Positive and negative reinforcement maintain opioid consumption, which leads to addiction in one-fourth of users, the largest fraction for all addictive drugs4. Among the opioid receptors, µ-opioid receptors have a key role5, yet the induction loci of circuit adaptations that eventually lead to addiction remain unknown. Here we injected mice with fentanyl to acutely inhibit γ-aminobutyric acid-expressing neurons in the ventral tegmental area (VTA), causing disinhibition of dopamine neurons, which eventually increased dopamine in the nucleus accumbens. Knockdown of µ-opioid receptors in VTA abolished dopamine transients and positive reinforcement, but withdrawal remained unchanged. We identified neurons expressing µ-opioid receptors in the central amygdala (CeA) whose activity was enhanced during withdrawal. Knockdown of µ-opioid receptors in CeA eliminated aversive symptoms, suggesting that they mediate negative reinforcement. Thus, optogenetic stimulation caused place aversion, and mice readily learned to press a lever to pause optogenetic stimulation of CeA neurons that express µ-opioid receptors. Our study parses the neuronal populations that trigger positive and negative reinforcement in VTA and CeA, respectively. We lay out the circuit organization to develop interventions for reducing fentanyl addiction and facilitating rehabilitation.


Subject(s)
Fentanyl; Receptors, Opioid, mu; Reinforcement, Psychology; Animals; Female; Male; Mice; Analgesics, Opioid/pharmacology; Analgesics, Opioid/administration & dosage; Central Amygdaloid Nucleus/cytology; Central Amygdaloid Nucleus/drug effects; Central Amygdaloid Nucleus/metabolism; Dopamine/metabolism; Dopaminergic Neurons/drug effects; Dopaminergic Neurons/metabolism; Fentanyl/pharmacology; Mice, Inbred C57BL; Nucleus Accumbens/cytology; Nucleus Accumbens/drug effects; Nucleus Accumbens/metabolism; Opioid-Related Disorders/metabolism; Opioid-Related Disorders/pathology; Optogenetics; Receptors, Opioid, mu/metabolism; Substance Withdrawal Syndrome/metabolism; Substance Withdrawal Syndrome/pathology; Ventral Tegmental Area/cytology; Ventral Tegmental Area/drug effects; Ventral Tegmental Area/metabolism
7.
Annu Rev Neurosci; 44: 173-195, 2021 07 08.
Article in English | MEDLINE | ID: mdl-33667115

ABSTRACT

Addiction is a disease characterized by compulsive drug seeking and consumption observed in 20-30% of users. An addicted individual will favor drug reward over natural rewards, despite major negative consequences. Mechanistic research on rodents modeling core components of the disease has identified altered synaptic transmission as the functional substrate of pathological behavior. While the initial version of a circuit model for addiction focused on early drug adaptive behaviors observed in all individuals, it fell short of accounting for the stochastic nature of the transition to compulsion. The model builds on the initial pharmacological effect common to all addictive drugs-an increase in dopamine levels in the mesolimbic system. Here, we consolidate this early model by integrating circuits underlying compulsion and negative reinforcement. We discuss the genetic and epigenetic correlates of individual vulnerability. Many recent data converge on a gain-of-function explanation for circuit remodeling, revealing blueprints for novel addiction therapies.


Subject(s)
Behavior, Addictive; Substance-Related Disorders; Drug-Seeking Behavior; Humans; Reinforcement, Psychology; Reward
8.
Nature; 614(7946): 108-117, 2023 02.
Article in English | MEDLINE | ID: mdl-36653449

ABSTRACT

Spontaneous animal behaviour is built from action modules that are concatenated by the brain into sequences1,2. However, the neural mechanisms that guide the composition of naturalistic, self-motivated behaviour remain unknown. Here we show that dopamine systematically fluctuates in the dorsolateral striatum (DLS) as mice spontaneously express sub-second behavioural modules, despite the absence of task structure, sensory cues or exogenous reward. Photometric recordings and calibrated closed-loop optogenetic manipulations during open field behaviour demonstrate that DLS dopamine fluctuations increase sequence variation over seconds, reinforce the use of associated behavioural modules over minutes, and modulate the vigour with which modules are expressed, without directly influencing movement initiation or moment-to-moment kinematics. Although the reinforcing effects of optogenetic DLS dopamine manipulations vary across behavioural modules and individual mice, these differences are well predicted by observed variation in the relationships between endogenous dopamine and module use. Consistent with the possibility that DLS dopamine fluctuations act as a teaching signal, mice build sequences during exploration as if to maximize dopamine. Together, these findings suggest a model in which the same circuits and computations that govern action choices in structured tasks have a key role in sculpting the content of unconstrained, high-dimensional, spontaneous behaviour.


Subject(s)
Behavior, Animal; Reinforcement, Psychology; Reward; Animals; Mice; Corpus Striatum/metabolism; Dopamine/metabolism; Cues; Optogenetics; Photometry
9.
Nature; 614(7947): 294-302, 2023 02.
Article in English | MEDLINE | ID: mdl-36653450

ABSTRACT

Recent success in training artificial agents and robots derives from a combination of direct learning of behavioural policies and indirect learning through value functions1-3. Policy learning and value learning use distinct algorithms that optimize behavioural performance and reward prediction, respectively. In animals, behavioural learning and the role of mesolimbic dopamine signalling have been extensively evaluated with respect to reward prediction4; however, so far there has been little consideration of how direct policy learning might inform our understanding5. Here we used a comprehensive dataset of orofacial and body movements to understand how behavioural policies evolved as naive, head-restrained mice learned a trace conditioning paradigm. Individual differences in initial dopaminergic reward responses correlated with the emergence of learned behavioural policy, but not the emergence of putative value encoding for a predictive cue. Likewise, physiologically calibrated manipulations of mesolimbic dopamine produced several effects inconsistent with value learning but predicted by a neural-network-based model that used dopamine signals to set an adaptive rate, not an error signal, for behavioural policy learning. This work provides strong evidence that phasic dopamine activity can regulate direct learning of behavioural policies, expanding the explanatory power of reinforcement learning models for animal learning6.


Subject(s)
Behavior, Animal; Dopamine; Learning; Neural Pathways; Reinforcement, Psychology; Animals; Mice; Algorithms; Dopamine/metabolism; Neural Networks, Computer; Reward; Datasets as Topic; Cues; Conditioning, Psychological; Movement; Head
10.
Nat Rev Neurosci; 24(12): 761-777, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37891399

ABSTRACT

Many social behaviours are evolutionarily conserved and are essential for the healthy development of an individual. The neuropeptide oxytocin (OXT) is crucial for the fine-tuned regulation of social interactions in mammals. The advent and application of state-of-the-art methodological approaches that allow the activity of neuronal circuits involving OXT to be monitored and functionally manipulated in laboratory mammals have deepened our understanding of the roles of OXT in these behaviours. In this Review, we discuss how OXT promotes the sensory detection and evaluation of social cues, the subsequent approach and display of social behaviour, and the rewarding consequences of social interactions in selected reproductive and non-reproductive social behaviours. Social stressors - such as social isolation, exposure to social defeat or social trauma, and partner loss - are often paralleled by maladaptations of the OXT system, and restoring OXT system functioning can reinstate socio-emotional allostasis. Thus, the OXT system acts as a dynamic mediator of appropriate behavioural adaptations to environmental challenges by enhancing and reinforcing social salience and buffering social stress.


Subject(s)
Cues; Oxytocin; Animals; Humans; Reinforcement, Psychology; Social Behavior; Mammals; Receptors, Oxytocin/physiology
11.
Nature; 602(7896): 223-228, 2022 02.
Article in English | MEDLINE | ID: mdl-35140384

ABSTRACT

Many potential applications of artificial intelligence involve making real-time decisions in physical systems while interacting with humans. Automobile racing represents an extreme example of these conditions; drivers must execute complex tactical manoeuvres to pass or block opponents while operating their vehicles at their traction limits1. Racing simulations, such as the PlayStation game Gran Turismo, faithfully reproduce the non-linear control challenges of real race cars while also encapsulating the complex multi-agent interactions. Here we describe how we trained agents for Gran Turismo that can compete with the world's best e-sports drivers. We combine state-of-the-art, model-free, deep reinforcement learning algorithms with mixed-scenario training to learn an integrated control policy that combines exceptional speed with impressive tactics. In addition, we construct a reward function that enables the agent to be competitive while adhering to racing's important, but under-specified, sportsmanship rules. We demonstrate the capabilities of our agent, Gran Turismo Sophy, by winning a head-to-head competition against four of the world's best Gran Turismo drivers. By describing how we trained championship-level racers, we demonstrate the possibilities and challenges of using these techniques to control complex dynamical systems in domains where agents must respect imprecisely defined human norms.


Subject(s)
Automobile Driving; Deep Learning; Reinforcement, Psychology; Sports; Video Games; Automobile Driving/standards; Competitive Behavior; Humans; Reward; Sports/standards
12.
Nature; 608(7922): 368-373, 2022 08.
Article in English | MEDLINE | ID: mdl-35896744

ABSTRACT

Ketamine is used clinically as an anaesthetic and a fast-acting antidepressant, and recreationally for its dissociative properties, raising concerns of addiction as a possible side effect. Addictive drugs such as cocaine increase the levels of dopamine in the nucleus accumbens. This facilitates synaptic plasticity in the mesolimbic system, which causes behavioural adaptations and eventually drives the transition to compulsion1-4. The addiction liability of ketamine is a matter of much debate, in part because of its complex pharmacology that among several targets includes N-methyl-D-aspartic acid (NMDA) receptor (NMDAR) antagonism5,6. Here we show that ketamine does not induce the synaptic plasticity that is typically observed with addictive drugs in mice, despite eliciting robust dopamine transients in the nucleus accumbens. Ketamine nevertheless supported reinforcement through the disinhibition of dopamine neurons in the ventral tegmental area (VTA). This effect was mediated by NMDAR antagonism in GABA (γ-aminobutyric acid) neurons of the VTA, but was quickly terminated by type-2 dopamine receptors on dopamine neurons. The rapid off-kinetics of the dopamine transients along with the NMDAR antagonism precluded the induction of synaptic plasticity in the VTA and the nucleus accumbens, and did not elicit locomotor sensitization or uncontrolled self-administration. In summary, the dual action of ketamine leads to a unique constellation of dopamine-driven positive reinforcement, but low addiction liability.


Subject(s)
Ketamine; Substance-Related Disorders; Animals; Dopamine/metabolism; Dopaminergic Neurons/drug effects; Dopaminergic Neurons/metabolism; Ketamine/adverse effects; Ketamine/pharmacology; Mice; Neuronal Plasticity/drug effects; Nucleus Accumbens/drug effects; Nucleus Accumbens/metabolism; Receptors, N-Methyl-D-Aspartate/antagonists & inhibitors; Receptors, N-Methyl-D-Aspartate/metabolism; Reinforcement, Psychology; Self Administration; Substance-Related Disorders/etiology; Substance-Related Disorders/prevention & control; Ventral Tegmental Area/cytology; Ventral Tegmental Area/drug effects
13.
Proc Natl Acad Sci U S A; 121(12): e2317751121, 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38489382

ABSTRACT

Do people's attitudes toward the (a)symmetry of an outcome distribution affect their choices? Financial investors seek return distributions with frequent small returns but few large ones, consistent with leading models of choice in economics and finance that assume right-skewed preferences. In contrast, many experiments in which decision-makers learn about choice options through experience find the opposite choice tendency, in favor of left-skewed options. To reconcile these seemingly contradicting findings, the present work investigates the effect of skewness on choices in experience-based decisions. Across seven studies, we show that apparent preferences for left-skewed outcome distributions are a consequence of those distributions having a higher value in most direct outcome comparisons, a "frequent-winner effect." By manipulating which option is the frequent winner, we show that choice tendencies for frequent winners can be obtained even with identical outcome distributions. Moreover, systematic choice tendencies in favor of right- or left-skewed options can be obtained by manipulating which option is experienced as the frequent winner. We also find evidence for an intrinsic preference for right-skewed outcome distributions. The frequent-winner phenomenon is robust to variations in outcome distributions and experimental paradigms. These findings are confirmed by computational analyses in which a reinforcement-learning model capturing frequent winning and intrinsic skewness preferences provides the best account of the data. Our work reconciles conflicting findings of aggregated behavior in financial markets and experiments and highlights the need for theories of decision-making sensitive to joint outcome distributions of the available options.
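The frequent-winner effect rests on simple arithmetic: a left-skewed option can match a right-skewed option in expected value yet deliver the higher outcome in most trial-by-trial comparisons. A minimal simulation with two hypothetical two-outcome gambles (not the paper's stimuli):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Two gambles with the same expected value (0) but mirrored skew.
# Left-skewed: usually a small gain (+1), rarely a large loss (-9).
# Right-skewed: usually a small loss (-1), rarely a large gain (+9).
left_skewed = np.where(rng.random(n) < 0.9, 1.0, -9.0)
right_skewed = np.where(rng.random(n) < 0.9, -1.0, 9.0)

# Trial-by-trial comparisons: how often is the left-skewed outcome higher?
win_rate = float(np.mean(left_skewed > right_skewed))
print(f"left-skewed option wins {win_rate:.0%} of direct comparisons")
```

Both options have expected value 0, but the left-skewed one wins roughly 81% of paired draws (0.9 x 0.9), so a learner tracking direct outcome comparisons will come to favor it even with no intrinsic skew preference.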


Subject(s)
Choice Behavior; Decision Making; Humans; Learning; Reinforcement, Psychology
14.
Proc Natl Acad Sci U S A; 121(15): e2317618121, 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38557193

ABSTRACT

Throughout evolution, bacteria and other microorganisms have learned efficient foraging strategies that exploit characteristic properties of their unknown environment. While much research has been devoted to the exploration of statistical models describing the dynamics of foraging bacteria and other (micro-) organisms, little is known, regarding the question of how good the learned strategies actually are. This knowledge gap is largely caused by the absence of methods allowing to systematically develop alternative foraging strategies to compare with. In the present work, we use deep reinforcement learning to show that a smart run-and-tumble agent, which strives to find nutrients for its survival, learns motion patterns that are remarkably similar to the trajectories of chemotactic bacteria. Strikingly, despite this similarity, we also find interesting differences between the learned tumble rate distribution and the one that is commonly assumed for the run and tumble model. We find that these differences equip the agent with significant advantages regarding its foraging and survival capabilities. Our results uncover a generic route to use deep reinforcement learning for discovering search and collection strategies that exploit characteristic but initially unknown features of the environment. These results can be used, e.g., to program future microswimmers, nanorobots, and smart active particles for tasks like searching for cancer cells, micro-waste collection, or environmental remediation.
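For reference, the classic run-and-tumble heuristic that the learned policies resemble can be simulated in a few lines: run straight, and tumble less often while the nutrient concentration is improving. The environment, speed, and tumble rates below are illustrative assumptions, not the paper's setup or its learned tumble-rate distribution:

```python
import numpy as np

rng = np.random.default_rng(3)
peak = np.array([5.0, 5.0])             # nutrient source

def concentration(pos):
    return -np.linalg.norm(pos - peak)  # higher closer to the source

pos = np.zeros(2)
heading = rng.normal(size=2)
heading /= np.linalg.norm(heading)
prev_c = concentration(pos)

for _ in range(2000):
    pos = pos + 0.05 * heading                    # run
    c = concentration(pos)
    tumble_rate = 0.05 if c > prev_c else 0.5     # run longer when improving
    if rng.random() < tumble_rate:                # tumble: random new heading
        heading = rng.normal(size=2)
        heading /= np.linalg.norm(heading)
    prev_c = c

final_dist = float(np.linalg.norm(pos - peak))
print(f"distance to source: started at 7.07, ended at {final_dist:.2f}")
```

The gradient-dependent tumble rate turns an unbiased random walk into a drift up the concentration gradient; the paper's point is that a deep RL agent rediscovers this motif while learning a tumble-rate distribution that differs in detail.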


Subject(s)
Learning; Reinforcement, Psychology; Models, Statistical; Motion; Bacteria
15.
Proc Natl Acad Sci U S A; 121(9): e2313073121, 2024 Feb 27.
Article in English | MEDLINE | ID: mdl-38381794

ABSTRACT

Theories of moral development propose that empathy is transmitted across individuals. However, the mechanisms through which empathy is socially transmitted remain unclear. Here, we combine computational learning models and functional MRI to investigate whether, and if so, how empathic and non-empathic responses observed in others affect the empathy of female observers. The results of three independent studies showed that watching empathic or non-empathic responses generates a learning signal that respectively increases or decreases empathy ratings of the observer. A fourth study revealed that the learning-related transmission of empathy is stronger when observing human rather than computer demonstrators. Finally, we show that the social transmission of empathy alters empathy-related responses in the anterior insula, i.e., the same region that correlated with empathy baseline ratings, as well as its functional connectivity with the temporoparietal junction. Together, our findings provide a computational and neural mechanism for the social transmission of empathy that accounts for changes in individual empathic responses in empathic and non-empathic social environments.


Subject(s)
Brain; Empathy; Humans; Female; Brain/physiology; Learning; Reinforcement, Psychology; Social Environment
16.
Proc Natl Acad Sci U S A; 121(30): e2405451121, 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39008663

ABSTRACT

Reinforcement learning inspires much theorizing in neuroscience, cognitive science, machine learning, and AI. A central question concerns the conditions that produce the perception of a contingency between an action and reinforcement-the assignment-of-credit problem. Contemporary models of associative and reinforcement learning do not leverage the temporal metrics (measured intervals). Our information-theoretic approach formalizes contingency by time-scale invariant temporal mutual information. It predicts that learning may proceed rapidly even with extremely long action-reinforcer delays. We show that rats can learn an action after a single reinforcement, even with a 16-min delay between the action and reinforcement (15-fold longer than any delay previously shown to support such learning). By leveraging metric temporal information, our solution obviates the need for windows of associability, exponentially decaying eligibility traces, microstimuli, or distributions over Bayesian belief states. Its three equations have no free parameters; they predict one-shot learning without iterative simulation.


Subject(s)
Reinforcement, Psychology; Animals; Rats; Learning/physiology; Time Factors; Bayes Theorem
17.
Proc Natl Acad Sci U S A; 121(16): e2303165121, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38607932

ABSTRACT

Antimicrobial resistance was estimated to be associated with 4.95 million deaths worldwide in 2019. It is possible to frame the antimicrobial resistance problem as a feedback-control problem. If we could optimize this feedback-control problem and translate our findings to the clinic, we could slow, prevent, or reverse the development of high-level drug resistance. Prior work on this topic has relied on systems where the exact dynamics and parameters were known a priori. In this study, we extend this work using a reinforcement learning (RL) approach capable of learning effective drug cycling policies in a system defined by empirically measured fitness landscapes. Crucially, we show that it is possible to learn effective drug cycling policies despite the problems of noisy, limited, or delayed measurement. Given access to a panel of 15 β-lactam antibiotics with which to treat the simulated Escherichia coli population, we demonstrate that RL agents outperform two naive treatment paradigms at minimizing the population fitness over time. We also show that RL agents approach the performance of the optimal drug cycling policy. Even when stochastic noise is introduced to the measurements of population fitness, we show that RL agents are capable of maintaining evolving populations at lower growth rates compared to controls. We further tested our approach in arbitrary fitness landscapes of up to 1,024 genotypes. We show that minimization of population fitness using drug cycles is not limited by increasing genome size. Our work represents a proof-of-concept for using AI to control complex evolutionary processes.
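A stripped-down version of the drug-cycling problem conveys the idea: tabular Q-learning (a deliberately simple stand-in for the agents used in the study) over a made-up two-drug, four-genotype fitness landscape with collateral sensitivity. Landscapes, dynamics, and parameters below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Four genotypes on a 2-locus hypercube, two hypothetical drugs.
# Rows are drugs, columns are genotypes 00, 01, 10, 11.
fitness = np.array([
    [0.2, 0.3, 1.0, 0.4],   # drug A: genotype 10 is resistant
    [1.0, 0.4, 0.2, 0.3],   # drug B: genotype 00 is resistant
])
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}

def evolve(g, drug):
    """Population moves to a random fitter one-mutation neighbor, if any."""
    fitter = [n for n in neighbors[g] if fitness[drug, n] > fitness[drug, g]]
    g = int(rng.choice(fitter)) if fitter else g
    return g, -fitness[drug, g]          # reward = negative population fitness

# Tabular Q-learning over genotype states and drug actions.
Q = np.zeros((4, 2))
alpha, gamma, eps = 0.1, 0.9, 0.1
g = 0
for _ in range(20_000):
    a = int(rng.integers(2)) if rng.random() < eps else int(Q[g].argmax())
    g2, r = evolve(g, a)
    Q[g, a] += alpha * (r + gamma * Q[g2].max() - Q[g, a])
    g = g2

def mean_fitness(policy, steps=50):
    """Average population fitness under a treatment policy, from genotype 00."""
    g, total = 0, 0.0
    for _ in range(steps):
        g, r = evolve(g, policy(g))
        total -= r
    return total / steps

rl = mean_fitness(lambda g: int(Q[g].argmax()))
mono = min(mean_fitness(lambda g: 0), mean_fitness(lambda g: 1))
print(f"learned cycling policy: {rl:.2f}, best monotherapy: {mono:.2f}")
```

On this toy landscape the learned policy alternates drugs to hold the population in low-fitness genotypes, while either monotherapy lets it climb to a resistant peak.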


Subject(s)
Anti-Infective Agents; Learning; Reinforcement, Psychology; Drug Resistance, Microbial; Bicycling; Escherichia coli/genetics
18.
Proc Natl Acad Sci U S A; 121(20): e2316658121, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38717856

ABSTRACT

Individual survival and evolutionary selection require biological organisms to maximize reward. Economic choice theories define the necessary and sufficient conditions, and neuronal signals of decision variables provide mechanistic explanations. Reinforcement learning (RL) formalisms use predictions, actions, and policies to maximize reward. Midbrain dopamine neurons code reward prediction errors (RPE) of subjective reward value suitable for RL. Electrical and optogenetic self-stimulation experiments demonstrate that monkeys and rodents repeat behaviors that result in dopamine excitation. Dopamine excitations reflect positive RPEs that increase reward predictions via RL; against increasing predictions, obtaining similar dopamine RPE signals again requires better rewards than before. The positive RPEs drive predictions higher again and thus advance a recursive reward-RPE-prediction iteration toward better and better rewards. Agents also avoid dopamine inhibitions that lower reward prediction via RL, which allows smaller rewards than before to elicit positive dopamine RPE signals and resume the iteration toward better rewards. In this way, dopamine RPE signals serve a causal mechanism that attracts agents via RL to the best rewards. The mechanism improves daily life and benefits evolutionary selection but may also induce restlessness and greed.
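The recursive reward-RPE-prediction iteration described above has a compact textbook form, sketched here as a Rescorla-Wagner-style delta rule (learning rate and reward magnitudes are arbitrary):

```python
# One predictive state, fixed reward; the prediction V is updated by the RPE.
alpha, reward, V = 0.2, 1.0, 0.0
rpes = []
for _ in range(30):
    rpe = reward - V          # dopamine-like reward prediction error
    V += alpha * rpe          # positive RPE raises the reward prediction
    rpes.append(rpe)

# Against the raised prediction, the same reward no longer produces a
# positive RPE; only a better reward does, driving the iteration onward.
print(f"first RPE: {rpes[0]:.2f}, RPE after learning: {rpes[-1]:.3f}")
print(f"RPE for an improved reward (2.0): {2.0 - V:.2f}")
```

Once V has converged to the current reward, obtaining another positive RPE requires a better reward, which in turn raises the prediction again: the recursion that attracts agents toward the best available rewards.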


Subject(s)
Dopamine; Dopaminergic Neurons; Reward; Animals; Dopamine/metabolism; Dopaminergic Neurons/physiology; Dopaminergic Neurons/metabolism; Humans; Reinforcement, Psychology
19.
PLoS Biol; 21(7): e3002201, 2023 07.
Article in English | MEDLINE | ID: mdl-37459394

ABSTRACT

When observing the outcome of a choice, people are sensitive to the choice's context, such that the experienced value of an option depends on the alternatives: getting $1 when the possibilities were 0 or 1 feels much better than when the possibilities were 1 or 10. Context-sensitive valuation has been documented within reinforcement learning (RL) tasks, in which values are learned from experience through trial and error. Range adaptation, wherein options are rescaled according to the range of values yielded by available options, has been proposed to account for this phenomenon. However, we propose that other mechanisms-reflecting a different theoretical viewpoint-may also explain this phenomenon. Specifically, we theorize that internally defined goals play a crucial role in shaping the subjective value attributed to any given option. Motivated by this theory, we develop a new "intrinsically enhanced" RL model, which combines extrinsically provided rewards with internally generated signals of goal achievement as a teaching signal. Across 7 different studies (including previously published data sets as well as a novel, preregistered experiment with replication and control studies), we show that the intrinsically enhanced model can explain context-sensitive valuation as well as, or better than, range adaptation. Our findings indicate a more prominent role of intrinsic, goal-dependent rewards than previously recognized within formal models of human RL. By integrating internally generated signals of reward, standard RL theories should better account for human behavior, including context-sensitive valuation and beyond.
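The core of an "intrinsically enhanced" teaching signal (extrinsic reward combined with a goal-achievement bonus) can be sketched with a delta rule. The goal definition, bonus weight, and contexts below are illustrative assumptions, not the paper's fitted model:

```python
# Delta-rule learning of one option's value from a combined teaching signal:
# the extrinsic outcome plus a bonus when the outcome meets the context's
# goal (here: earning the best amount available). w_goal is arbitrary.
alpha, w_goal = 0.1, 5.0

def learned_value(outcome, context_best, trials=200):
    q = 0.0
    for _ in range(trials):
        r = outcome + w_goal * (outcome == context_best)  # combined signal
        q += alpha * (r - q)
    return q

q_low = learned_value(1.0, context_best=1.0)    # $1 when options were $0 / $1
q_high = learned_value(1.0, context_best=10.0)  # $1 when options were $1 / $10
print(f"learned value of $1: {q_low:.2f} (low context) vs {q_high:.2f} (high)")
```

The same $1 outcome acquires a much higher learned value when it is the best the context affords, reproducing context-sensitive valuation without explicitly rescaling outcomes by the context's range.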


Subject(s)
Reinforcement, Psychology; Reward; Humans; Learning; Motivation
20.
PLoS Biol; 21(11): e3002373, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37939126

ABSTRACT

Corrective feedback received on perceptual decisions is crucial for adjusting decision-making strategies to improve future choices. However, its complex interaction with other decision components, such as previous stimuli and choices, challenges a principled account of how it shapes subsequent decisions. One popular approach, based on animal behavior and extended to human perceptual decision-making, employs "reinforcement learning," a principle proven successful in reward-based decision-making. The core idea behind this approach is that decision-makers, although engaged in a perceptual task, treat corrective feedback as rewards from which they learn choice values. Here, we explore an alternative idea, which is that humans consider corrective feedback on perceptual decisions as evidence of the actual state of the world rather than as rewards for their choices. By implementing these "feedback-as-reward" and "feedback-as-evidence" hypotheses on a shared learning platform, we show that the latter outperforms the former in explaining how corrective feedback adjusts the decision-making strategy along with past stimuli and choices. Our work suggests that humans learn about what has happened in their environment rather than the values of their own choices through corrective feedback during perceptual decision-making.


Subject(s)
Choice Behavior; Decision Making; Animals; Humans; Feedback; Reward; Reinforcement, Psychology